Data Loss Prevention

Every once in a while I get to write a neat piece of code that I can share. This is one of those times. I realize it is not large and, by PerlMonks standards, not very elegant. The problem with that lies in maintainability over the next few years. Regardless, I like what I wrote and would like to share it.

At the Circus we suspected that we had some data leakage. Nothing like people taking off with everything needed to get home loans and rip off customers, just people not thinking about what they send through email. We didn’t know the extent of the problem, or even for certain that we had one. Our C-level executives didn’t believe that employees would be so careless with customer data. We decided to find out.

I must say that the results were actually quite positive. We had a couple of people email work-related data home so they could work over the weekend, and a few emails regarding employment, but those were originated by the prospective employee.

Regardless, in order for us to find out I wrote a few scripts that hook into our email system. One that I am particularly proud of recurses through a directory of email messages and attachments scanning each file for relevant data.

Please note that by the time these scripts touch the data it has been scrubbed by the antivirus and other checks we have in place. I am only looking for keywords or regular expressions that would indicate customer related data loss.

Let me explain the directory structure. Under the email system is the directory /var/spool/filter that contains every email message that has been sent in the last 30 minutes. There is a cleanup process that erases all the files in that directory and that is actually where I wrote the hook, in the cleanup process. Here is a sample listing of the directory.

#ls -1 /var/spool/filter/

As you can see, each email header ends with a .hed extension and the message is in .txt format. The ETP.doc file is an attachment.

#ls -1 /var/spool/filter/msg-1299451626-29523-0/

The subroutine I am most pleased with is the one that recurses through the directory structure. The slurp_tree function returns a hash reference in which each subdirectory is itself a hash reference. I look for subdirectories with the following line of code.

if (ref $structure->{$key} eq 'HASH')

That is how I find subdirectories to push onto the stack of recursive calls. As it traverses each directory it just looks at each file extension and makes a determination as to what to do with it.

I realize most system administrators are asking why I didn’t use the file command to make sure the script was acting appropriately for each file type, but that does not work with the new Microsoft document types, which file reports as plain Zip archives.

# file Test-Excel.xlsx
Test-Excel.xlsx: Zip archive data, at least v2.0 to extract
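
Since file cannot tell the newer Office formats apart, dispatching on the extension is the simpler route. Below is a minimal shell sketch of that dispatch, separate from the Perl script. The names classify_file and classify_tree are hypothetical, the globs only roughly mirror the script's regular expressions, and the functions just report which handler would be chosen rather than calling it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a file name to the name of the parser that
# would handle it, roughly mirroring the extension checks described above.
classify_file() {
    case "$1" in
        *.doc)        echo "parse_doc" ;;
        *.xlsx|*.xls) echo "parse_excel" ;;
        *.txt)        echo "parse_message" ;;
        *.pdf)        echo "parse_pdf" ;;
        *.hed)        echo "parse_head" ;;
        *)            echo "skip" ;;
    esac
}

# Walk a directory tree and print the handler chosen for every file.
classify_tree() {
    find "$1" -type f | while IFS= read -r path; do
        printf '%s %s\n' "$(classify_file "$path")" "$path"
    done
}

# Example (not run here): classify_tree /var/spool/filter
```

Anything with an unrecognized extension falls through to "skip", which is the same as the Perl script simply ignoring it.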

I thought it was a fun project and I enjoyed writing what I felt was an interesting piece of code.

# 2011-01-12 Jud Bishop
# This script goes looking for customer data being sent out through email and
# flags it for further review.
use strict;
use warnings;
use File::Find;
use File::Basename;
use File::Copy::Recursive qw(fcopy dircopy rcopy);
use File::Slurp::Tree;

#my $dir = "/home/jud/TestMessages";
#my $log = "/home/jud/TestMessages/violation";
#my $auditdir = "/home/jud/TestMessages/Trash/";
my $dir = "/var/spool/filter";
my $log = "/var/log/hipaa/violation";
my $auditdir = "/opt/smtpaudit/";
my $debug = 0;

my %tree;
my $tree = slurp_tree($dir);

open (LOG, '>>', $log) or die $!;

traverse_structure($dir, $tree);

close LOG or die $!;

# This does the heavy lifting of the whole program. It recursively
# iterates through the directory structure and works on a file accordingly.
# Each directory is a hash key.
sub traverse_structure {
    if($debug){print "##traverse_structure\n";}
    my ($base, $structure) = @_;
    my $path;
    my @violation;
    my $secure;
    foreach my $key ( keys %$structure) {
        $path = $base . "/" . $key;
        $secure = 0;
        ## If it's a HASH then it's a directory.
        if (ref $structure->{$key} eq 'HASH'){
            if($debug){print "key: $key\n"};
            traverse_structure( $path, $structure->{$key} );
        } else {
            if($debug){print "file : $key\n"};
            if($debug){print "base : $base\n"};
            if($debug){print "path : $path\n"};
            if($debug){print "secure: $secure\n"};
            if($debug){print "violation: $#violation\n"};

            ## If the file is not being used...
            if ($path =~ m/doc$/){
                parse_doc($path, \@violation);
            } elsif ($path =~ m/xlsx$|xls$/) {
                parse_excel($path, \@violation);
            } elsif ($path =~ m/txt$/) {
                parse_message($path, \@violation);
            } elsif ($path =~ m/pdf$/) {
                parse_pdf($path, \@violation);
            } elsif ($path =~ m/hed$/) {
                parse_head($path, \@violation, \$secure);
                # If it is a secure email then it is encrypted on the fly and not a violation.
                if ( ($secure == 0) && ($#violation > 3) ){
                    push (@violation, "EMAIL: " . $base);
                    # Assumed: the two calls below were lost from this listing.
                    log_it(@violation);
                    copy_dir($base);
                }
            }
        }
    }
}

# For later review.
sub copy_dir {
    my $path = shift;
    if($debug){print "##copy_dir $path\n";}
    my $file = fileparse($path);

    if ($file =~ m/^msg/){
        my $basename = basename($path);
        my $newpath = $auditdir . $basename;

        if($debug){print "dircopy $path $newpath\n";}
        # Assumed: the copy call itself was lost from this listing.
        dircopy($path, $newpath) or warn "dircopy $path failed: $!";
    }
}

# Log file that is easy to read because an employee goes through
# this file and decides if it is a REAL violation.
sub log_it {
    my @text = @_;
    my $line;
    if($debug){print "##log_it\n";}
    print LOG "---------------------------------------------\n";
    foreach $line (@text) {
        print LOG "$line\n";
    }
    print LOG "---------------------------------------------\n";
}

sub parse_head {
    my ($file, $violation_ref, $secure_ref) = @_;
    my @body;
    my $line;
    if($debug){print "##parse_head $file\n";}

    open(FILE,$file) || return 0;
    @body = <FILE>;
    close(FILE);

    foreach $line (@body) {
        if ($line =~ m/^From/){
            push (@$violation_ref, $line);
        } elsif ($line =~ m/^To/) {
            push (@$violation_ref, $line);
        } elsif ($line =~ m/^Subject/) {
            push (@$violation_ref, $line);
        }
        # Checked on its own: a line that starts with "Subject" can never
        # also start with "secure".
        if ($line =~ m/^secure/i ) {
            $$secure_ref = 1;
        }
    }
}

sub parse_pdf {
    my ($file, $violation_ref) = @_;
    my @body;
    my $new_file = $file . ".txt";
    my $CMD;

    if($debug){print "##parse_pdf $dir $file\n";}
    $CMD = "/usr/bin/pdftotext \"" . $file . "\" > \"" . $new_file . "\"";
    if($debug){print "CMD: $CMD\n";}
    # Assumed: the line that runs the command was lost from this listing.
    system($CMD);
    parse_text ($new_file, $violation_ref);
}

sub parse_doc {
    my ($file, $violation_ref) = @_;
    my @body;
    my $new_file = $file . ".txt";
    my $CMD;

    if($debug){print "##parse_doc $dir $file\n";}
    $CMD = "/usr/bin/antiword -st \"" . $file . "\" > \"" . $new_file . "\"";
    if($debug){print "CMD: $CMD\n";}
    # Assumed: the line that runs the command was lost from this listing.
    system($CMD);
    parse_text ($new_file, $violation_ref);
}

sub parse_excel {
    my ($file, $violation_ref) = @_;
    my @body;
    my $new_file = $file . ".txt";
    my $CMD;

    if($debug){print "##parse_excel $file\n";}
    $CMD = "/usr/local/bin/antiexcel \"" . $file . "\" > \"" . $new_file . "\"";
    if($debug){print "CMD: $CMD\n";}
    # Assumed: the line that runs the command was lost from this listing.
    system($CMD);
    parse_text ($new_file, $violation_ref);
}

sub parse_text {
    my ($file, $violation_ref) = @_;
    my @body;
    if($debug){print "##parse_text $file\n";}

    open(FILE,$file) || return 0;
    @body = <FILE>;
    close(FILE);

    compare_text(\@body, $violation_ref);
}

sub parse_message {
    my ($file, $violation_ref) = @_;
    my @body;
    if($debug){print "##parse_message $file\n";}

    open(FILE,$file) || return 0;
    @body = <FILE>;
    close(FILE);

    compare_text(\@body, $violation_ref);
}

# All of the earlier subroutines call this one.
# It takes the text and looks for keywords.
sub compare_text {
    my ($text_ref, $violation_ref) = @_;
    my @difference;
    my @text_array;
    my @elements;
    my %count;
    my %rules;
    my $element;
    if($debug){print "##compare_text\n";}

    # Break every line into whitespace-separated words.
    foreach $element (@$text_ref){
        @elements = split(' ', $element);
        push (@text_array, @elements);
    }

    # The parser was already created above.
    my @rule = ("DOB", "D.O.B.", "d.o.b.", "dob", "death:", "release", "admit", "admission", "Age:", "SSN", "Social", "Security", "Account", "Acct", "claimant", "MRI", "myelogram", "credit", "card");

    # Me being lazy.
    foreach $element (@rule) {
        $rules{$element} = 1;
    }

    foreach $element (@text_array) {
        if (exists $rules{$element}) {
            if($debug){print "$element\n";}
            $element = "VIOLATION: " . $element;
            push (@$violation_ref, $element);
        # Social Security Number
        } elsif($element =~ /\d{3}-?\d{2}-?\d{4}/) {
            if($debug){print "$element\n";}
            $element = "VIOLATION: " . $element;
            push (@$violation_ref, $element);
        # Credit Card Number or MRN
        } elsif($element =~ /\d{4}-?\d{4}-?\d{4}-?\d{4}/) {
            if($debug){print "$element\n";}
            $element = "VIOLATION: " . $element;
            push (@$violation_ref, $element);
        }
    }
}
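
The number matching in compare_text can be tried out on its own from the command line. The following is only a shell sketch of the two numeric patterns, not the script itself: scan_tokens is a hypothetical name, the keyword list is left out, and the sample data is made up.

```shell
#!/usr/bin/env bash
# Print "VIOLATION: <token>" for every whitespace-separated token on stdin
# that looks like a Social Security number or a 16-digit card number.
scan_tokens() {
    tr -s '[:space:]' '\n' |
        grep -E '[0-9]{3}-?[0-9]{2}-?[0-9]{4}|[0-9]{4}-?[0-9]{4}-?[0-9]{4}-?[0-9]{4}' |
        sed 's/^/VIOLATION: /'
}

# Example with made-up data:
echo "Patient SSN 123-45-6789 on file" | scan_tokens
```

As in the Perl version the patterns are not anchored, so any token that merely contains nine digits in a row is flagged too. That errs on the side of false positives, which is tolerable when a person reviews the log afterwards.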

Posted in Code, Linux

Multicast Coolness

It’s amazing what passes for “cool” when you spend your days and nights staring at router configurations. The last two nights I have been working on a very simple sparse-dense lab that had been causing me problems. I finally figured it out and thought I would document some of the interesting bits I found.

When I was working to pass my TSHOOT exam I figured out which few commands worked best for me when troubleshooting.

Can I ping the gateway? No, check layer 2 issues first:
sh vlan br
sh int trunk
sh int status

Can I ping past the gateway? No, check layer 3 routing and protocols:
sh ip route

sh ip ospf ne
sh ip ospf int

sh ip eigrp ne
sh ip eigrp int

You get the idea, just a few commands to learn the most about a protocol quickly. For debugging multicast routing my new favorite commands are:
debug ip mpacket
sh ip mroute
sh ip pim rpf

My favorite is debug ip mpacket; let me show you why. This ping was not working when it began; in the middle, however, I changed the unicast routing protocol to advertise the RP address. Notice that the error actually states the RPF failed to find the route to the RP.


R6(config)#do debug ip mpacket

R6(config)#do ping rep 100 

Type escape sequence to abort.
Sending 100, 100-byte ICMP Echos to, timeout is 2 seconds:

IP(0): s= (FastEthernet0/0.146) d= id=1463, ttl=254, prot=1, len=114(100), mroute olist null
Reply to request 0 from, 44 ms
IP(0): s= (FastEthernet0/0.146) d= id=1464, ttl=254, prot=1, len=114(100), mroute olist null
Reply to request 1 from, 28 ms
Reply to request 2 from, 28 ms
Reply to request 3 from, 28 ms
Reply to request 4 from, 28 ms
Reply to request 5 from, 28 ms


Posted in Routing


I wish.

I have added a CCIE category to the blog. Although I have been studying steadily I have only posted testlab scripts to date. I will most likely post very little CCIE specific content while I continue to study for the written until I am much closer to my written test day.

I have really struggled to structure my studying for the written. I am a hands-on learner, and cramming a bunch of reading without application makes it difficult for me to remember and understand the nuances of a technology. I would rather play with a protocol and learn about it through interaction than try to memorize a bunch of random facts for the written test. Recently I have been doing INE Workbook 1 labs as I feel they complement my reading well. They are not difficult and explore the intricacies of one protocol at a time. It is easy for me to do a lab and play around with the protocol to learn.

Reading another candidate’s blog I ran across his study plan, which was taken from this post. At the end of that blog entry is a list of core INE Workbook 1 labs you should do while preparing for the lab. They are below for convenience.

Bridging & Switching: 1.1-1.15
Frame-Relay: 2.1-2.10
IP Routing: 3.1-3.11
RIP: 4.1-4.6
EIGRP: 5.1-5.8
OSPF: 6.1-6.11, 6.21-6.31
BGP: 7.1-7.9, 7.16-7.26
IPv6: 9.1-9.5, 9.12-9.14, 9.17-9.20, 9.29-9.31
MPLS VPN: 14.1-14.7

That is 114 labs that INE recommends you complete before moving to more advanced labs, which gives me a goal and structure. I plan to do these labs in the coming months to complement my reading. That also means I need only do about 11 labs per month, so I will cut back on labs on the weekends and do more reading and note taking. Actually I have already done ~25 of these labs so it is even fewer labs I need to do, but I will do many of them multiple times so I’m not going to quibble with the numbers.
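
The total can be tallied from the ranges listed above with a few lines of shell. This is only a sketch: count_labs is a hypothetical helper, and it assumes every range has the form chapter.first-chapter.last with both ends in the same chapter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add up the labs in ranges such as 6.21-6.31.
count_labs() {
    local total=0 range first last
    for range in "$@"; do
        first=${range%-*}   # e.g. 6.21
        last=${range#*-}    # e.g. 6.31
        first=${first#*.}   # drop the chapter number, leaving 21
        last=${last#*.}     # drop the chapter number, leaving 31
        total=$(( total + last - first + 1 ))
    done
    echo "$total"
}

count_labs 1.1-1.15 2.1-2.10 3.1-3.11 4.1-4.6 5.1-5.8 \
           6.1-6.11 6.21-6.31 7.1-7.9 7.16-7.26 \
           9.1-9.5 9.12-9.14 9.17-9.20 9.29-9.31 14.1-14.7
```

For the ranges exactly as listed this prints 114.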

I have completely finished reading and taking notes for TCP/IP Vol I and am halfway through Vol II. I will then read the Switching Exam Certification Guide and the QoS Exam Certification Guide again, followed by the CCIEv4 Exam Certification Guide. My goal is to pass the written next winter.

To put dates to my goals:
31 March — Finish Volume II
(Second child is due in April.)
30 June — Finish reading Switching Exam Certification Guide
31 August — Finish reading QoS Exam Certification Guide
31 October — Finish reading CCIEv4 Exam Certification Guide

Finally when I begin reading the CCIEv4 Exam Certification Guide I will begin to post more of my notes. What I found when studying for my CCNP was that immediately after I finished putting all of my notes on the web for a test was when I was the most prepared for theory based exams.

Posted in CCIE

Password Aging

At the Circus we have a password policy to change all passwords every 90 days. Today it was brought to my attention that one of the Linux servers was not following that policy. I confirmed that was true and after a little digging I found that it was only accounts that had been migrated from AIX to Linux. But we couldn’t force around 2000 users to all change their passwords at the same time because we would inundate the help desk.

This is the script that I wrote to fix the problem and distribute the password changes over a month. The result is that only 78 users per day are forced to change their password over that 28 day period.

#!/bin/bash
# 2011-01-28
# Jud Bishop
# Checks for passwords set to never expire and gives an expiration date.
# Distributes the password changes over a 28 day spread.

# Day-of-month counter, cycles through 1..28.
X=0

for I in `cat /etc/passwd | cut -d: -f 1`
do
        #echo $I
        #chage -l $I | egrep "Password expires" | cut -d : -f 2

        DATE=`chage -l $I | egrep "Password expires" | cut -d : -f 2 | cut -d \  -f 2`
        if [ "$DATE" = "never" ]
        then
                echo $I
                if [ $X -le "27" ]
                then
                        X=`expr $X + 1`
                else
                        # Assumed: wrap back to day 1 (this branch was lost from the listing).
                        X=1
                fi
                echo $X $I
                chage -d  2010-11-$X -M 90 $I
        fi
done
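
Before touching real accounts it can help to preview which date each user would be given. The following is a dry-run sketch of the same round-robin idea, under a few assumptions: next_day and plan_days are hypothetical names, the account names are placeholders, and nothing here calls chage.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the last day handed out, return the next day
# in a repeating 1..span cycle (span defaults to 28).
next_day() {
    local last=$1 span=${2:-28}
    echo $(( last % span + 1 ))
}

# Dry run: print the date each account would get instead of calling chage.
plan_days() {
    local day=0 user
    for user in "$@"; do
        day=$(next_day "$day")
        printf '%s 2010-11-%02d\n' "$user" "$day"
    done
}

# Placeholder account names.
plan_days alice bob carol
```

With a 90 day maximum age, last-changed dates spread across November 2010 put the forced changes roughly a day apart for each group of users.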
Posted in Code, Linux

Military Personnel

I just finished reading an article in the Atlantic concerning military personnel and recruitment titled “Why Our Best Officers Are Leaving.” As a veteran and former officer who enjoyed my time in service I felt the urge to comment publicly.

My stint in the Air Force covered the formative years of my professional development. The military formed many of the core beliefs I have today. My views on documentation and succession are much different from those of most of my private sector peers. One maxim that my wife and I do not see eye to eye on is, “Early is on time, on time is late, late you have a problem.” Another saying that was often repeated was, “Do your current job well and your next job will take care of itself.”

Today my hair is just as short as when I was in, if not shorter and my shoes are still spit shined. Only my uniform has changed; from blues or BDUs to khakis and polos or slacks and dress shirts.

But what I really wanted to comment on was my career development. My first encounter with the Air Force Personnel Center (AFPC) was even before I went active duty. When we got our assignments as college seniors mine was to Offutt AFB, Nebraska. As a cyclist I was not pleased to be heading to a station with a 2 month summer, with fall and spring similarly abbreviated. My commanding officer at the detachment asked me if I wanted him to make a call to AFPC on my behalf and see if I could get a more amenable station. I declined, stating that I didn’t want to start my time in the Air Force fighting the system.

I worked hard at Offutt. Not as many hours as I do now, but I learned a great deal. The one big project I handled was the leg work, research and negotiation to settle a $1M lawsuit against the base. Our Colonel had given three of us the project and I was the one that finished the job. The other two lieutenants just didn’t find it interesting.

An aside. I was also given the task to get a squadron t-shirt designed and approved, but I just couldn’t find the time. Someone else finally did it. Now I believe it would have been a good experience because you had to work through all the red tape, but I just didn’t find that appealing.

When it came time for me to change duty stations my commanding officer called me into his office and told me he had made some phone calls and found me a position at the Air Force Logistics Management Agency (AFLMA).

I don’t believe what I did at the AFLMA was outstanding; I ran a website for the Air Force the last couple of years I was in. The website had pretty high visibility and I gave presentations to nearly every full bird Colonel and met privately with every General in my career field. I traveled extensively during this time and gave presentations like I was a salesman.

Another aside. I got married on Saturday and left for Washington, DC Sunday to give a presentation Monday morning at the Pentagon.

When I declared my intention to leave the Air Force the AFPC representative for my career field took me to lunch. He offered to station me anywhere in the world. When I told him my wife was English/South African and we were considering moving to England he offered to double billet me in England. Next he offered me a nice opening in New Zealand where I would be in charge of my own office. I declined them both and ended up in graduate school.

I just figured every Lieutenant and Captain had the same experience I did. You work hard, show initiative and let your mentors steer you through the maze of jobs and promotions. Imagine my surprise when I found that is not the case in the private sector.

Posted in Thoughts

Another Test Lab Script

I’m sorry that all of these TestLab scripts are a recurring theme. Work purchased four 3560s and two 1841s for the lab so I have been updating all of my scripts. When I was working on the lab I kept having sessions hang so I wrote a quick script to clear all of the lines on the terminal server.

# 2010-12-14 Jud Bishop
# tl-clear
# A short script to handle logging into a router in the lab.

set host ""
set pass "CHANGEME"
set enable "CHANGEME2"
set ctrlz \032

# Should not need any more changes.
set router [lindex $argv 0]

spawn telnet $host
expect "Password:"
send "$pass\r"
expect "testlab>"
send "enable\r"
expect "Password:"
send "$enable\r"
expect "testlab#"
sleep 1

for { set i 1} {$i < 48} {incr i 1} {
	send "clear line $i\r"
	expect {
		-re ".*confirm.*" {send "y \r"}
		-re ".*Not allowed to clear current line.*" {send "\r"}
		-re ".*Invalid input detected at.*" {send "\r"}
	}
}
Posted in Code, Routing

Two Variables, One Line

This morning I could not for the life of me remember how to read two variables from one line in bash. As a result I am putting this simple script up here so that I have an easy place to reference.

The input was a listing of translated printer IP addresses in the file /tmp/printers.txt, and it looks like this.

cat /tmp/printers.txt = = =

Here is the simple code to read both variables.

#!/bin/bash
# 2010-08-24 Jud Bishop
# Simple script to find names of local and remote printers
# that are translated.

# Setting IFS to "=" splits each line into the two variables.
while IFS== read remote local
do
name=`dig +short -x $local`
echo -e "$name,$remote,$local"
done < /tmp/printers.txt
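
The splitting itself can be tried without dig or the real input file. This is a small self-contained sketch: split_pairs is a hypothetical name, and the addresses are placeholders from the documentation ranges rather than real printers.

```shell
#!/usr/bin/env bash
# Split each "remote=local" line into two variables by setting IFS to "="
# for the duration of the read only.
split_pairs() {
    local remote local_ip
    while IFS='=' read -r remote local_ip; do
        printf '%s,%s\n' "$remote" "$local_ip"
    done
}

# Placeholder addresses, not real printers.
split_pairs <<'EOF'
192.0.2.10=198.51.100.10
192.0.2.11=198.51.100.11
EOF
```

Because IFS is set only on the read command, the rest of the script keeps the default field separator.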

But it came out in this format, which is not much of a problem, but I prefer it more legible.
, ,, ,, ,

So I cleaned up the output. The first sed stanza deletes the third “.” in the output and the second sed stanza deletes the spaces.

./ | sed 's/\.//3
s/\ //g'
,,,,,,
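
To see what the two sed expressions do, here is a sketch run against one made-up line in the same name,remote,local layout. The function name tidy_line and the sample host and addresses are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Remove the third "." on each line (in this sample, the trailing dot that
# follows the fully qualified name) and then strip every space.
tidy_line() {
    sed -e 's/\.//3' -e 's/ //g'
}

# Made-up sample line.
echo "printer1.example.com.,192.0.2.10, 198.51.100.10" | tidy_line
```

Counting dots only works when every name has the same number of labels. A name with more or fewer dots would lose the wrong one.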

Posted in Code, Linux

I got pulled over.

This story comes from the annals of my cycling adventures.

I was in the Air Force from 1995 to 2000 and was stationed at Gunter in Montgomery, AL in the spring of 1997. I rode quite a bit in college but when I was stationed in Nebraska I did not ride much because of the long cold winters. So when I moved to Montgomery I was looking for routes and stopped in the local bike shop.

The shop was Cycle Escape and the owners would eventually introduce me to my wife, but that is a story for another day. I went in and they had the mechanics stands where you could chat and watch the guys work on bikes. One of the part-time wrenches was a sheriff’s deputy and when I asked about routes he told me about a route out by Emerald Mountain. He told me where to park and gave me a couple of different distance options which became some of my standard routes.

When I was in the military we regularly played golf on Fridays. Not every Friday but once a month, your tax dollars at “work.” We had gone to play golf one Friday south of Montgomery and on the way back I stopped to ride in the Pike Road community because the roads and the traffic were both good. I literally just parked on the side of the road, changed into my cycling clothes and got in a quick ride.

After the ride, on my way back home I was followed by a police officer for miles, even though I had slowed to the speed limit. I got it into my head that someone had seen me changing clothes on the side of the road and had called the cops. I had no idea how I was going to tell my Colonel that I was busted for public nudity for changing on the side of the road.

The rest of the cops must have finally arrived because all of a sudden an SUV pulled in front of me, a second cop was beside me and a sheriff was behind me in a rolling blockade. Of course I pulled over. They all got out with their weapons drawn and one of them over the loudspeaker told me to move to the back of the car with my hands up and get spread eagle on the trunk.

I was wearing spandex, I couldn’t hide anything if I wanted. They asked if they could go through my car and I said yes. When the sheriff popped my trunk he said, “Nice bike.” Then they went through my golf clubs in the back seat.

Finally the sheriff turned to me and said, “You don’t remember me do you?” I replied that I did not and he told me that he was the mechanic who had told me to park up by Emerald Mountain. A woman was abducted around that area and because my car was parked there I became a suspect. If it hadn’t been for the sheriff who told me to park there, and the fact that he was involved in my traffic stop, who knows what would have happened.

Needless to say I have been more careful where I park and especially where I change into my cycling gear.


Posted in Cycling, Thoughts





You can download the initial configuration files here.

The only difference between this lab and CBWFQ is making it LLQ. I would not do these one after the other; intersperse another lab, or do CBWFQ and then change it to LLQ.

To prepare for this lab, turn on only one 800K link between R1 and R2, and the link between R4 and R1 for traffic generation. If you want traffic to make it around the lab then configure to your heart’s content.

On R1:

  • Create access lists web, control and print.
  • Create class maps web, control and print.
  • Put http, ftp, pop3 and smtp in the web class map.
  • Put ntp, ssh, telnet and x11 in control.
  • Put 9100 in print.
  • Create the policy map cbwfq and give these bandwidth percentages; web: 30%, control: 20%, print: 10%.
  • Make the control group the priority queue.
  • Apply the configuration to S0/0.
  • Confirm the configuration and debug it.

Here are the protocols for which traffic is generated from our traffic generation configuration file:

for I in `grep dest-port r4-basic-tgn.cfg | cut -d\   -f 3`; do grep [[:space:]]$I/tcp /etc/services; done
telnet          23/tcp
http            80/tcp
ftp             21/tcp
ntp             123/tcp
pop3            110/tcp
smtp            25/tcp
ssh             22/tcp
x11             6000/tcp
jetdirect       9100/tcp

Answer is below:

R1 Configuration for CBWFQ:

! CEF must be turned on.
ip cef
! 1.  Create the access-list.
ip access-list extended control
 permit tcp any any eq 123
 permit tcp any any eq telnet
 permit tcp any any eq 22
 permit tcp any any eq 6000
ip access-list extended print
 permit tcp any any eq 9100
ip access-list extended web
 permit tcp any any eq www
 permit tcp any any eq ftp
 permit tcp any any eq pop3
 permit tcp any any eq smtp
! 2.  Create the class-map.
class-map match-any control
 match access-group name control
class-map match-any web
 match access-group name web
class-map match-any print
 match access-group name print
! 3. Create the policy-map.
policy-map cbwfq
 class web
  bandwidth percent 30
 class control
! 3a.  Notice this is the only difference between LLQ and CBWFQ.
  priority percent 20
 class print
  bandwidth percent 10
 class class-default
! 4.  Apply it to the interface.
interface Serial0/0
 bandwidth 800
 ip address
 service-policy output cbwfq

Confirm the configuration:

sh int s0/0
sh queueing
sh policy-map int s0/0
sh policy-map int s0/0 output class control

Debug the configuration:

debug priority
Posted in Uncategorized

Red Hat Upgrades

Now that RHEL 6 is out I’ve begun playing with upgrading our older RHEL servers, starting with anything that is RHEL 4 and then moving forward to RHEL5. The simplest one to start with was one of our network management servers. They are vital to our job but not customer facing and we can deal with some downtime on them.

My original plan was to install a base RHEL4 on the new server, which would give us a platform with the base system installed and the restore software hosted, then restore from backup over the top.

I took an old server we had that still had maintenance and pressed it into service. I wanted to take the opportunity to test our backups as well as the upgrade path from RHEL4 to RHEL6. So I installed RHEL4 on the new server and ran up2date to make sure it was the “latest” and greatest.

On the old server I ran up2date and then queried the rpm database to see what packages were installed. The problem is that you cannot pipe rpm -qa output as input into an update script. Up2date wants “freetype-devel” as the package name, not the whole package name as listed in the output, “freetype-devel-2.1.9-17.el4_8.1”, which includes the version number. Notice also that some package names have multiple dashes while others have only one dash.

[root@server] # rpm -qa

I could not easily use cut so I wrote the following quick hack:


# 2010-12-03 Jud Bishop
# Run:
# rpm -qa >/tmp/installed-software.log
# Then run:
# for I in ``; do up2date $I; done

use strict;

my $log_file = "/tmp/installed-software.log";

open (FILE, $log_file) or die "Error: can't open log file\n $! \n";
while (<FILE>) {
	chomp;
	my (@log) = split /-/;
	my $package = "";

	# Keep only the dash-separated pieces that start with a letter;
	# the version and release pieces start with a digit.
	for (my $i = 0; $i <= $#log; $i++) {
		if ( $log[$i]  =~ /^[a-zA-Z]/ ) {
			if ( $i > 0 ) {
				$package = $package . "-" . $log[$i];
			} else {
				$package = $log[0];
			}
		}
	}
	print "$package \n";
}

close FILE or die "Error: can't close file\n $! \n";
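
The same name extraction can also be sketched in plain shell. This is a different technique from the Perl hack above: instead of testing each piece for a leading letter, it relies on the rpm convention that neither the version nor the release contains a dash, and strips the last two dash-separated fields. The function name pkg_name is hypothetical.

```shell
#!/usr/bin/env bash
# Strip the trailing "-version-release" from an rpm -qa style name.
# Assumes neither the version nor the release contains a "-".
pkg_name() {
    echo "${1%-*-*}"
}

pkg_name freetype-devel-2.1.9-17.el4_8.1
```

rpm can also print bare names directly with a query format, for example rpm -qa --qf '%{NAME}\n', which avoids parsing altogether.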

It has been interesting testing our backup software. It appears it will take some refinement for us to get the restore process worked out.

Posted in Linux