We have a Barracuda web filter here at the Circus. In general we are pleased with its performance; however, as our internet usage has climbed, the days of history it keeps have declined. The inverse relationship is due to a “ring buffer” of 250,000 entries in the history log. There are times when we have only a few hours of history, and that doesn't sit well when a manager wants to see the browsing history for a user or a PC. So we turned on syslog logging to a remote logging server and evaluated the log-parsing packages that claimed to parse Barracuda logs. Here's a hint: there are not many.
My first hack was just to see what was going on and get a rudimentary understanding of the logs. It can be downloaded here.
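```bash
#!/bin/bash
# Count the matched part of the rule (awk field 28, usually a domain)
# for every http_scan line that contains the first argument.
grep "$1" /var/log/barracuda.log | grep http_scan | awk '{print $28}' | sort | uniq -c | sort -n
```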
It gives output like this:
```
63 autotrader.com
72 google.com
78 edmunds.com
81 ad.doubleclick.net
82 charter.com
121 dealer.com
138 alagasco.com
194 synacor.com
334 charter.net
```
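If you would rather stay in Perl, the same grep, awk, sort and uniq pipeline can be folded into one function. This is a minimal sketch, not one of the downloadable scripts: the function name `tally_domains` and the script name `tally.pl` are hypothetical, and, like the one-liner, it assumes the 28th whitespace-separated field (index 27, matched_part, in the Perl script's field list) holds the domain.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of:
#   grep "$1" barracuda.log | grep http_scan | awk '{print $28}' | sort | uniq -c | sort -n
# Note: index() is a fixed-string match (like grep -F), not a regex.
sub tally_domains {
    my ($needle, $fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        next unless index($line, $needle) >= 0;       # grep "$1"
        next unless index($line, 'http_scan') >= 0;   # grep http_scan
        my @f = split ' ', $line;                      # awk-style field split
        next unless defined $f[27];                    # awk's $28 is index 27
        $count{ $f[27] }++;                            # sort | uniq -c
    }
    # sort -n: ascending by count; ties broken by name so output is stable.
    return map { [ $count{$_}, $_ ] }
        sort { $count{$a} <=> $count{$b} or $a cmp $b } keys %count;
}

# Hypothetical usage: perl tally.pl jud /var/log/barracuda.log
if (@ARGV == 2) {
    my ($needle, $file) = @ARGV;
    open my $fh, '<', $file or die "Error: can't open $file: $!\n";
    printf "%7d %s\n", @$_ for tally_domains($needle, $fh);
    close $fh;
}
```

Reading through a filehandle keeps memory flat even on a large log, and it lets the function be fed an in-memory string when testing.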
That script gave me enough understanding of the log format to write the following Perl script, which can be downloaded here.
```perl
#!/usr/bin/perl -w
use Getopt::Long;
use Number::Format;
use Tie::IxHash;

###############################################
# 2010-02-05 Jud Bishop
# This script parses /var/log/barracuda.log.
# Released under the GNU GPLv2.
###############################################
# Options that can be passed in:
# -u    username
# -s    source ip address
# -d    destination url
# -c    category
# -days number of days
# -pc   the pcname
###############################################
# The format of a log file message.
# Used this document:
# http://www.barracudanetworks.ca/download/barracuda-web-filter-syslog.pdf
# And this one liner:
# tail -50 /var/log/barracuda.log | awk '{print $28}' && tail -1 /var/log/barracuda.log
#
# 0.  month
# 1.  day
# 2.  time
# 3.  barracuda_ip
# 4.  http_scan[process_id]
# 5.  md5sum
# 6.  number1
# 7.  source_ip
# 8.  destination_ip
# 9.  content_type
# 10. source_ip
# 11. destination_address/URL
# 12. data_size
# 13. BYF
# 14. action (ALLOWED, BLOCKED, DETECTED)
# 15. reason (CLEAN, VIRUS, SPYWARE)
# 16. format_ver == 2 (Version of the policy engine output.)
# 17. match_flag (Whether an existing policy matched the traffic: 1=Yes, 0=No.)
# 18. tq_flag (Time qualified flag; 1=Yes 0=No.)
# 19. action_type (The documentation for this flag is incorrect.)
# 20. source_type (0=Any source, 1=group, 2=ipv4addr, 3=login, 4=auth_user, 5=min_score)
# 21. src_detail Detail related to matched source or "(-)" if not.
# 22. dest_type If there is a destination match, what type (0=any, 1=particular category,
#     2=any category, 3=domain, 4=mimetype, 5=spyware, 6=uri_path_regex, 7=uri_regex, 8=application)
# 23. dest_detail Matched category or "(-)" if not matched.
# 24. spyware (If it is spyware, 0=allow, not spyware, 1=block, 2=infection.)
# 25. spyware_id (Name of the spyware if matched, if not "-".)
# 26. infection_weight Weight of the infection, mostly 0.
# 27. matched_part Part of the rule that matched.
# 28. matched_category Comma delimited category name that matched traffic.
# 29. user_name Username, ([ANON], [ldap0:jud], [username:jud])
###############################################
# Variables you can change.
###############################################
my $debug = 0;
my $log_file = "/var/log/barracuda.log";
#my $log_file = "/var/log/barracuda.test";

###############################################
# Variables you should NOT change.
###############################################
my $arg_username = 0;
my $arg_pc = 0;
my $arg_source_ip = 0;
my $arg_dest_url = 0;
my $arg_category = 0;
my $arg_days = 0;
my $arg_help = 0;

my %table;              # holds all the data for each session.
tie %table, "Tie::IxHash";
my %categories;         # holds all of the categories this person went to.
my $data_sum = 0;       # holds the total of all data for a user.
my $session_sum = 0;    # holds the number of sessions for a user.

# The names of these variables are so that I don't have to keep looking
# above to figure out what name is what item in the array.
my $user = 29;
my $md5sum = 5;
my $month = 0;
my $day = 1;
my $time = 2;
my $source_ip = 7;
my $destination_ip = 8;
my $url = 11;
my $data_size = 12;
my $action = 14;
my $part = 27;
my $category = 28;

# Reads the options passed in.
sub get_options {
    GetOptions(
        'help|?|h!' => \$arg_help,
        'u=s'       => \$arg_username,
        's=s'       => \$arg_source_ip,
        'd=s'       => \$arg_dest_url,
        'c=s'       => \$arg_category,
        'days=i'    => \$arg_days,
        'pc=s'      => \$arg_pc);

    if ($debug) {
        print "username == $arg_username\n";
        print "source_ip == $arg_source_ip\n";
        print "pc == $arg_pc\n";
        print "dest_url == $arg_dest_url\n";
        print "category == $arg_category\n";
        print "days == $arg_days\n";
        print "help == $arg_help\n";
    }

    if ($arg_help) {
        print "usage: user-report.pl -days number [-u username] [-s source-ip] [-d destination-url] ";
        print "[-c category] [-pc pc_name] [--help|-?]\n";
        exit;
    }
}

# Parses the logs.
# Days equals log file days, makes it easy.
sub parse_logs {
    my ($search_field, $search_equals, $days) = @_;

    if ($debug) {
        print "--------------------\n";
        print "parse_logs\n";
        print "search_field == $search_field\n";
        print "search_equals == $search_equals\n";
        print "days == $days\n";
    }

    # Loop through the log files based on number of days:
    # 0 == barracuda.log (today)
    # 1 == barracuda.log.1 (one day past), and so on.
    for (my $i = 0; $i <= $days; $i++) {
        # Use the loop counter, not $days, so each rotated file is read once.
        my $file = $i == 0 ? $log_file : "$log_file.$i";
        if ($debug) { print "open $file\n" }
        open(FILE, '<', $file) or die "Error: can't open $file\n $! \n";

        while (<FILE>) {
            chomp;
            # split ' ' works like awk: it splits on runs of whitespace
            # and ignores leading whitespace.
            my @log = split ' ';

            # The defined() check skips short messages such as:
            # Feb 9 06:51:09 last message repeated 8 times
            if (defined($log[$search_field]) and ($log[$search_field] eq $search_equals)) {
                if ($debug) {
                    print "$log[$search_field] $search_equals\n";
                    for (my $j = 0; $j <= $#log; $j++) {
                        print "log[$j] == $log[$j]\n ";
                    }
                }

                # Each session gets a different md5sum, which is why it is the key in the table.
                if (not exists $table{$log[$md5sum]}) {
                    $table{$log[$md5sum]} = {
                        'user'           => $log[$user],
                        'month'          => $log[$month],
                        'day'            => $log[$day],
                        'time'           => $log[$time],
                        'source_ip'      => $log[$source_ip],
                        'destination_ip' => $log[$destination_ip],
                        'url'            => $log[$url],
                        'data_size'      => $log[$data_size],
                        'action'         => $log[$action],
                        'category'       => $log[$category],
                        'total_data'     => $log[$data_size],
                        'session_count'  => 1,
                    };
                    $data_sum += $log[$data_size];
                    $session_sum += 1;
                    $categories{$log[$category]}++;

                    if ($debug) {
                        print "does not exist\n";
                        print "log user = $log[$user] \n";
                        print "table user = $table{$log[$md5sum]}->{user} \n";
                        print "sessions = $session_sum\n";
                        print "bandwidth = $data_sum\n";
                    }
                } else {
                    # Later lines of the same session only add to its byte count.
                    $table{$log[$md5sum]}->{total_data} += $log[$data_size];
                    $data_sum += $log[$data_size];
                    if ($debug) {
                        print "exists \n";
                        print "bandwidth = $data_sum\n";
                    }
                }
            }
        }
        close FILE or die "Error: can't close file\n $! \n";
    }
}

sub print_report {
    if ($debug) { print "print_report\n" }

    if ($arg_username) {
        print "Usage report for: $arg_username\n";
    } elsif ($arg_pc) {
        print "Usage report for: $arg_pc\n";
    } elsif ($arg_source_ip) {
        print "Usage report for: $arg_source_ip\n";
    }
    print "Number of sessions: $session_sum\n";

    my $x = Number::Format->new;
    my $formatted = $x->format_bytes($data_sum);
    print "Total bandwidth consumed: $formatted\n\n";

    # One section per category, listing every session in that category.
    foreach my $category (sort keys %categories) {
        printf "%s\n", uc($category);
        foreach my $key (keys %table) {
            if ($table{$key}->{category} eq $category) {
                my $url = substr($table{$key}->{url}, 0, 35);
                print "$table{$key}->{month} $table{$key}->{day} $table{$key}->{time} $url $table{$key}->{action}\n";
            }
        }
        print "\n";
    }
}

###############################################
# main
###############################################
my $search_field;
my $search_equals;

get_options();

if ($arg_username or $arg_pc) {
    $search_field = $user;
    # Assumption: a PC name appears in the user_name field in the same
    # [ldap0:name] form as a user name.
    my $name = $arg_username || $arg_pc;
    if ($name eq "ANON") {
        # Anonymous traffic is logged as [ANON].
        $search_equals = "[ANON]";
    } else {
        $search_equals = sprintf("[ldap0:%s]", $name);
    }
} elsif ($arg_source_ip) {
    $search_field = $source_ip;
    $search_equals = $arg_source_ip;
} elsif ($arg_dest_url) {
    $search_field = $url;
    $search_equals = $arg_dest_url;
} elsif ($arg_category) {
    $search_field = $category;
    $search_equals = $arg_category;
}

if (not defined $search_field) {
    print "Error: one of -u, -pc, -s, -d or -c is required.\n";
    exit 1;
}

if ($debug) { print "search_field == $search_field\n" }

parse_logs($search_field, $search_equals, $arg_days);
print_report();
```
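The heart of `parse_logs` is three steps: pick the right rotated file for each day, split each line into fields by position, and fold every line that shares an md5sum into one session. The sketch below pulls those steps out into small functions so they can be exercised without a real log file. The names `log_files`, `parse_line` and `aggregate` are hypothetical, and the sketch assumes the field positions in the script's comment block match your firmware's syslog format.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Field positions (0-indexed) copied from the comment block in the script.
my %FIELD = (
    month     => 0,  day            => 1,  time => 2,  md5sum    => 5,
    source_ip => 7,  destination_ip => 8,  url  => 11, data_size => 12,
    action    => 14, category       => 28, user => 29,
);

# barracuda.log is today; barracuda.log.N is N days back.
sub log_files {
    my ($base, $days) = @_;
    return map { $_ == 0 ? $base : "$base.$_" } 0 .. $days;
}

# Turn one log line into a hashref of named fields. Returns nothing for
# lines too short to hold a user_name field, such as
# "Feb 9 06:51:09 last message repeated 8 times".
sub parse_line {
    my ($line) = @_;
    my @log = split ' ', $line;
    return if @log <= $FIELD{user};
    my %rec;
    @rec{ keys %FIELD } = @log[ values %FIELD ];  # keys/values share order
    return \%rec;
}

# Group records by md5sum the way parse_logs does: the first record of a
# session is kept and later records only add to its byte total.
sub aggregate {
    my @records = @_;
    my (%session, @order);
    for my $r (@records) {
        my $key = $r->{md5sum};
        if (exists $session{$key}) {
            $session{$key}{total_data} += $r->{data_size};
        } else {
            $session{$key} = { %$r, total_data => $r->{data_size} };
            push @order, $key;    # first-seen order, as Tie::IxHash keeps it
        }
    }
    return map { $session{$_} } @order;
}
```

Keeping the file naming in one place makes it harder to reintroduce the kind of bug where every iteration opens the same rotated file.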
This script produces output like the following:
```
Usage report for: circus-user
Number of sessions: 401
Total bandwidth consumed: 19.39M

ADVERTISEMENT-POP-UPS,GAME-MEDIA,CUSTOM-2
Feb 17 15:44:17 http://games.mochiads.com/c/p/the-r ALLOWED

AUCTIONS
Feb 17 17:08:21 http://rover.ebay.com/ar/1/56033/1? ALLOWED

AUCTIONS,MOTOR-VEHICLES,CUSTOM-1
Feb 17 17:13:44 http://edmunds.autotrader.com/js/jq ALLOWED
Feb 17 17:13:45 http://edmunds.autotrader.com/inc/g ALLOWED
Feb 17 17:13:46 http://edmunds.autotrader.com/inc/j ALLOWED
Feb 17 17:13:46 http://edmunds.autotrader.com/inc/j ALLOWED
Feb 17 17:13:47 http://edmunds.autotrader.com/dwr/i ALLOWED
Feb 17 17:13:50 http://edmunds.autotrader.com/no_ca ALLOWED

BUSINESS
Feb 17 15:09:08 http://www.statcounter.com/counter/ ALLOWED
Feb 17 15:10:15 http://www.alagasco.com/fw/_css/fle ALLOWED
Feb 17 15:10:16 http://www.alagasco.com/scripts/jFa ALLOWED
Feb 17 15:10:17 http://www.alagasco.com/fw/_js/flex ALLOWED
```
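The report boils down to grouping sessions by category and printing a human-readable byte total. Here is a sketch of that step that runs on core Perl alone. `report_lines` and `human_bytes` are hypothetical names, and `human_bytes` is only a rough approximation of Number::Format's `format_bytes`, assuming 1024-based units and two decimal places.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rough stand-in for Number::Format's format_bytes (assumed: 1024-based
# units, two decimals, trailing zeros dropped) so this runs on core Perl.
sub human_bytes {
    my ($n) = @_;
    my @units = ('', 'K', 'M', 'G', 'T');
    my $i = 0;
    while ($n >= 1024 && $i < $#units) {
        $n /= 1024;
        $i++;
    }
    return "$n" unless $i;
    my $s = sprintf '%.2f', $n;
    $s =~ s/\.?0+$//;    # "1.50" -> "1.5", "1.00" -> "1"
    return $s . $units[$i];
}

# Build the report body in one pass: bucket sessions by category, then
# emit the categories in sorted order with the URL cut to 35 characters.
sub report_lines {
    my @sessions = @_;
    my %by_cat;
    push @{ $by_cat{ $_->{category} } }, $_ for @sessions;
    my @out;
    for my $cat (sort keys %by_cat) {
        push @out, uc $cat;
        for my $s (@{ $by_cat{$cat} }) {
            push @out, join ' ', @{$s}{qw(month day time)},
                substr($s->{url}, 0, 35), $s->{action};
        }
        push @out, '';    # blank line between categories
    }
    return @out;
}
```

Bucketing the sessions once avoids rescanning the whole session table for every category, which is what the nested loop in `print_report` does; on a few hundred sessions the difference is negligible, but it adds up over many days of logs.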