Parsing Barracuda Log Files

We have a Barracuda web filter here at the Circus. In general we are pleased with its performance. However, as our internet usage has climbed, the number of days of history it keeps has declined. This inverse relationship comes from a “ring buffer” of 250,000 entries in the history log: once the buffer is full, each new entry overwrites the oldest one. At times we have only a few hours of history, and that doesn’t sit well when a manager wants to see the browsing history for a user or a PC. So we turned on syslog logging to a remote logging server and evaluated the log parsing packages that claimed to parse Barracuda logs. Let me give you a hint: there are not many.
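The arithmetic behind that inverse relationship is simple. A buffer with a fixed capacity holds capacity divided by the arrival rate worth of history. The hypothetical helper below sketches that calculation. It assumes one history entry per logged request, and the function name is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: hours of history a fixed-size ring buffer can hold.
# Assumes one history entry per logged request; 250,000 is the Barracuda
# history log size mentioned above.
ring_buffer_hours() {
    local capacity=250000
    local entries_per_hour=$1
    if (( entries_per_hour <= 0 )); then
        echo "entries per hour must be a positive integer" >&2
        return 1
    fi
    # Once the buffer is full, every new entry overwrites the oldest one,
    # so the history window is capacity divided by the arrival rate.
    echo $(( capacity / entries_per_hour ))
}

ring_buffer_hours 50000   # prints 5
```

For example, 250,000 entries at 50,000 requests an hour leaves five hours of history. To estimate the rate, you could count a day's `http_scan` lines in the remote log with `grep -c http_scan /var/log/barracuda.log` and divide by 24. This assumes the file covers one full day.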

My first hack was just to see what was going on and to get a rudimentary understanding of the logs. It can be downloaded here.

#!/bin/bash
# Count how often each matched part (awk field 28) appears in http_scan lines containing $1.
grep -- "$1" /var/log/barracuda.log | grep http_scan | awk '{print $28}' | sort | uniq -c | sort -n

Which gives output like this:

  
     63 autotrader.com
     72 google.com
     78 edmunds.com
     81 ad.doubleclick.net
     82 charter.com
    121 dealer.com
    138 alagasco.com
    194 synacor.com
    334 charter.net

That script gave me enough understanding of the log format to write the following script, which can be downloaded here.

#!/usr/bin/perl -w

use Getopt::Long;
use Number::Format;
use Tie::IxHash;


###############################################
# 2010-02-05 Jud Bishop
# This script parses /var/log/barracuda.log.
# Released under the GNU GPLv2.
###############################################
# Options that can be passed in:
# -u username
# -s source ip address
# -d destination url
# -c category
# -days number of days 
# -pc the pcname
###############################################
# The format of a log file message.
# Used this document:
# http://www.barracudanetworks.ca/download/barracuda-web-filter-syslog.pdf
# And this one liner:
# tail -50 /var/log/barracuda.log | awk '{print $28}' && tail -1 /var/log/barracuda.log
#
# 0.  month
# 1.  day
# 2.  time
# 3.  barracuda_ip
# 4.  http_scan[process_id]
# 5.  md5sum
# 6.  number1
# 7.  source_ip
# 8.  destination_ip
# 9.  content_type
# 10. source_ip
# 11. destination_address/URL
# 12. data_size
# 13. BYF
# 14. action (ALLOWED, BLOCKED, DETECTED)
# 15. reason (CLEAN, VIRUS, SPYWARE)
# 16. format_ver == 2 (Version of the policy engine output.)
# 17. match_flag (Whether an existing policy matched the traffic: 1=Yes, 0=No.) 
# 18. tq_flag (Time qualified flag; 1=Yes 0=No.) 
# 19. action_type (The documentation for this flag is incorrect.) 
# 20. source_type (0=Any source, 1=group, 2=ipv4addr, 3=login, 4=auth_user, 5=min_score)
# 21. src_detail Detail related to matched source or "(-)" if not.
# 22. dest_type If there is a destination match, what type (0=any, 1=particular category, 2=any category,
#                 3=domain, 4=mimetype, 5=spyware, 6=uri_path_regex, 7=uri_regex, 8=application)
# 23. dest_detail Matched category or "(-)" if not matched.
# 24. spyware (If it is spyware, 0=allow, not spyware, 1=block, 2=infection.)
# 25. spyware_id (Name of the spyware if matched, if not "-".)
# 26. infection_weight Weight of the infection, mostly 0.
# 27. matched_part Part of the rule that matched.
# 28. matched_category Comma delimited category name that matched traffic.
# 29. user_name Username, ([ANON], [ldap0:jud], [username:jud])

###############################################
# Variables you can change.
###############################################
my $debug = 0;
my $log_file = "/var/log/barracuda.log";
#my $log_file = "/var/log/barracuda.test";

###############################################
# Variables you should NOT change.
###############################################
my $arg_username = 0;
my $arg_pc = 0;
my $arg_source_ip = 0;
my $arg_dest_url = 0;
my $arg_category = 0;
my $arg_days = 0;
my $arg_help = 0;
my %table; # holds all the data for each session.
tie %table, "Tie::IxHash";
my %categories; # holds all of the categories this person went to.
my $data_sum = 0; # holds the total of all data for a user.
my $session_sum = 0; # holds the number of sessions for a user.
# These variables are named so that I don't have to keep looking
# above to figure out which array index holds which item.
my $user = 29;
my $md5sum = 5;
my $month = 0;
my $day = 1;
my $time = 2;
my $source_ip = 7;
my $destination_ip = 8;
my $url = 11;
my $data_size = 12;
my $action = 14;
my $part = 27;
my $category = 28;

# Reads the options passed in.
sub get_options {
        GetOptions(
                'help|?|h!' => \$arg_help,
                'u=s' => \$arg_username,
                's=s' => \$arg_source_ip,
                'd=s' => \$arg_dest_url,
                'c=s' => \$arg_category,
                'days=i' => \$arg_days,
                'pc=s' => \$arg_pc);

        if ($debug)
        {
                print "username == $arg_username\n";
                print "source_ip == $arg_source_ip\n";
                print "pc == $arg_pc\n";
                print "dest_url == $arg_dest_url\n";
                print "category == $arg_category\n";
                print "days == $arg_days\n";
                print "help == $arg_help\n";
        }

        if ($arg_help)
        {
                print "usage: user-report.pl -days number [-u username] [-s source-ip] [-d destination-url] ";
                print "[-c category] [-pc pc_name] [--help|-?]\n";
                exit;
        }
}

# Parses the logs.
# Days equals log file days, makes it easy.
sub parse_logs {

        my ($search_field, $search_equals, $days) = @_;

        if ($debug){
                print "--------------------\n";
                print "parse_logs\n";
                print "search_field == $search_field\n";
                print "search_equals == $search_equals\n";
                print "days == $days\n";
        }

        # Loop through the log files based on number of days:
        # 0 == today (barracuda.log)
        # 1 == barracuda.log.1, one day past, and so on.
        for (my $i = 0; $i <= $days; $i++)
        {
                # Use $i (not $days) so each day's rotated file is read once.
                my $file = ($i == 0) ? $log_file : "$log_file.$i";
                if ($debug) {print "open $file\n"};
                open (my $fh, '<', $file) or die "Error: can't open $file\n $! \n";
                while (<$fh>)
                {
                        chomp;
                        # split ' ' splits on runs of whitespace and skips leading
                        # whitespace, the same way awk splits fields.
                        my (@log) = split ' ';
                        # The defined() check skips short lines such as:
                        # Feb  9 06:51:09 last message repeated 8 times
                        if (defined ($log[$search_field]) and ($log[$search_field] eq $search_equals))
                        {
                                if ($debug){
                                        print "$log[$search_field] $search_equals\n";
                                        for (my $j = 0; $j <= $#log; $j++)
                                        {
                                                print "log[$j] == $log[$j]\n ";
                                        }
                                }
                                # Each session gets a different md5sum, which is why it is the key in the table.
                                if ( not exists $table{$log[$md5sum]} ){
                                        $table{$log[$md5sum]} = {
                                                'user' => $log[$user], 'month' => $log[$month],
                                                'day' => $log[$day], 'time' => $log[$time],
                                                'source_ip' => $log[$source_ip], 'destination_ip' => $log[$destination_ip],
                                                'url' => $log[$url], 'data_size' => $log[$data_size],
                                                'action' => $log[$action], 'category' => $log[$category],
                                                'total_data' => $log[$data_size], 'session_count' => 1 };

                                        $data_sum += $log[$data_size];
                                        $session_sum += 1;
                                        # Categories are counted once per session.
                                        $categories{$log[$category]} += 1;

                                        if($debug){
                                                print "does not exist\n";
                                                print "log user = $log[$user] \n";
                                                print "table user = $table{$log[$md5sum]}->{user} \n";
                                                print "sessions = $session_sum\n";
                                                print "bandwidth = $data_sum\n";
                                        }
                                } else {
                                        # Later lines of the same session only add bytes.
                                        $table{$log[$md5sum]}->{total_data} += $log[$data_size];
                                        $data_sum += $log[$data_size];
                                        if($debug){
                                                print "exists \n";
                                                print "bandwidth = $data_sum\n";
                                        }
                                }
                        }
                }
                close $fh or die "Error: can't close $file\n $! \n";
        }
}

sub print_report {

        if ($debug) {print "print_report\n";}
        if ($arg_username)
        {
                print "Usage report for: $arg_username\n";
        } elsif ($arg_pc) {
                print "Usage report for: $arg_pc\n";
        } elsif ($arg_source_ip) {
                print "Usage report for: $arg_source_ip\n";
        } elsif ($arg_dest_url) {
                print "Usage report for: $arg_dest_url\n";
        } elsif ($arg_category) {
                print "Usage report for: $arg_category\n";
        }
        print "Number of sessions: $session_sum\n";

        my $x = Number::Format->new;
        my $formatted = $x->format_bytes($data_sum);
        print "Total bandwidth consumed: $formatted\n\n";

        # One heading per category, followed by every session in that category.
        foreach my $category (sort (keys(%categories)))
        {
                printf "%s\n", uc($category);

                foreach my $key (keys(%table))
                {
                        if ( $table{$key}->{category} eq $category)
                        {
                                # Truncate long URLs to 35 characters to keep lines short.
                                my $url = substr($table{$key}->{url}, 0, 35);
                                print "$table{$key}->{month} $table{$key}->{day} $table{$key}->{time} $url $table{$key}->{action}\n";
                        }
                }
                print "\n";
        }
}

###############################################
# main
###############################################
my $search_field;
my $search_equals;

get_options();

if ($arg_username or $arg_pc) {
        $search_field = $user;
        if ($arg_username eq "ANON"){
                # Anonymous users are logged as [ANON] in the user field.
                $search_equals = "[ANON]";
        } else {
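                # With only -pc given, search for the PC name instead
                # (assumes it is logged in the same [ldap0:...] form).
                $search_equals = sprintf("[ldap0:%s]", $arg_username || $arg_pc);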
        }
} elsif ($arg_source_ip) {
        $search_field = $source_ip;
        $search_equals = $arg_source_ip;
} elsif ($arg_dest_url) {
        $search_field = $url;
        $search_equals = $arg_dest_url;
} elsif ($arg_category) {
        $search_field = $category;
        $search_equals = $arg_category;
} else {
        die "Error: specify one of -u, -pc, -s, -d or -c (see --help)\n";
}
if ($debug) {print "search_field == $search_field\n"}

parse_logs($search_field, $search_equals, $arg_days);

print_report();

This script produces output like the following:

Usage report for: circus-user
Number of sessions: 401
Total bandwidth consumed: 19.39M

ADVERTISEMENT-POP-UPS,GAME-MEDIA,CUSTOM-2
Feb 17 15:44:17 http://games.mochiads.com/c/p/the-r ALLOWED

AUCTIONS
Feb 17 17:08:21 http://rover.ebay.com/ar/1/56033/1? ALLOWED

AUCTIONS,MOTOR-VEHICLES,CUSTOM-1
Feb 17 17:13:44 http://edmunds.autotrader.com/js/jq ALLOWED
Feb 17 17:13:45 http://edmunds.autotrader.com/inc/g ALLOWED
Feb 17 17:13:46 http://edmunds.autotrader.com/inc/j ALLOWED
Feb 17 17:13:46 http://edmunds.autotrader.com/inc/j ALLOWED
Feb 17 17:13:47 http://edmunds.autotrader.com/dwr/i ALLOWED
Feb 17 17:13:50 http://edmunds.autotrader.com/no_ca ALLOWED

BUSINESS
Feb 17 15:09:08 http://www.statcounter.com/counter/ ALLOWED
Feb 17 15:10:15 http://www.alagasco.com/fw/_css/fle ALLOWED
Feb 17 15:10:16 http://www.alagasco.com/scripts/jFa ALLOWED
Feb 17 15:10:17 http://www.alagasco.com/fw/_js/flex ALLOWED
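The core of the Perl script treats each md5sum as one session. The first line of a session adds one session and one category count, and every line adds to the byte total. For comparison, here is a rough sketch of the same aggregation as an awk program wrapped in a bash function. The function name `user_summary` is hypothetical. Its awk field numbers are simply the script's zero-based indices plus one. It prints raw byte totals instead of using Number::Format.

```shell
#!/bin/bash
# Sketch of the per-user summary in awk rather than Perl (hypothetical helper).
# awk field = zero-based index from the Perl script + 1:
#   $6 = md5sum (session key), $13 = data_size, $29 = matched_category, $30 = user_name
# usage: user_summary '[ldap0:jud]' /var/log/barracuda.log [/var/log/barracuda.log.1 ...]
user_summary() {
    local who=$1
    shift
    awk -v who="$who" '
        $30 == who {
            bytes += $13                      # every line adds to the byte total
            if (!($6 in seen)) {              # first line of a new session
                seen[$6] = 1
                sessions++
                per_category[$29]++           # categories counted once per session
            }
        }
        END {
            printf "Number of sessions: %d\n", sessions
            printf "Total bytes: %d\n", bytes
            for (c in per_category) printf "%s %d\n", c, per_category[c]
        }' "$@"
}
```

For example, `user_summary '[ldap0:jud]' /var/log/barracuda.log /var/log/barracuda.log.1` would cover today and yesterday, following the same rotation naming the Perl script assumes. The awk `for (c in ...)` loop does not guarantee any order, so pipe the output through `sort` if you want categories listed alphabetically.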

This entry was posted in Code, Linux.