
Software Engineering, Architecture, Web Development and beyond…

Analyzing Log Files with awk (gawk), grep

Suppose you want to analyze server logs to find out whether the application server failed at startup, that is, whether any error messages or exceptions have been logged since the start script was executed, or whether it started successfully without any failure. Or maybe you just want to grep for something between two points in time. Here is how you can implement such a script using awk and grep, without needing other programming languages like Perl, Python, etc.

Here is what my log file looks like:

INFO   | jvm 1    | 2012/01/04 17:52:17 | INFO: JK: ajp13 listening on /0.0.0.0:8009
INFO   | jvm 1    | 2012/01/04 17:52:17 | 04.01.2012 17:52:17 org.apache.jk.server.JkMain start
INFO   | jvm 1    | 2012/01/04 17:52:17 | INFO: Jk running ID=0 time=0/38  config=null
INFO   | jvm 1    | 2012/01/04 17:52:17 | 04.01.2012 17:52:17 org.apache.catalina.startup.Catalina start
INFO   | jvm 1    | 2012/01/04 17:52:17 | INFO: Server startup in 20703 ms

To compare two dates semantically (I don't mean a lexical comparison in this case), we need to convert the string expressions to an actual time value, so that we can determine numerically whether the first date is later than the second one or vice versa. You could simply use grep to find occurrences of a string in a text, but in this case we want to narrow the context down to a time interval first and grep within it.
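To see why a lexical comparison is not enough, here is a minimal sketch using the day-first timestamps that appear in the message part of the log lines above. The helper name to_epoch and the two sample timestamps are made up for illustration. gawk is assumed; the fallback to plain awk only works if that awk also provides mktime().

```shell
#!/bin/sh
# Assumption: gawk is available; a plain awk is only usable if it has mktime().
AWK=$(command -v gawk || command -v awk)

# to_epoch (hypothetical helper): "DD.MM.YYYY HH:MM:SS" -> seconds since the epoch
to_epoch() {
  "$AWK" -v ts="$1" 'BEGIN {
    split(ts, p, /[. :]/)   # p[1]=day p[2]=month p[3]=year p[4..6]=time
    # %d also drops leading zeros before the values reach mktime()
    print mktime(sprintf("%d %d %d %d %d %d", p[3], p[2], p[1], p[4], p[5], p[6]))
  }'
}

a="04.01.2012 17:52:17"   # 4 January
b="03.02.2012 09:00:00"   # 3 February, which is the later date

# Lexical: plain string ordering puts "04..." after "03...".
later_lexical=$(printf '%s\n' "$a" "$b" | LC_ALL=C sort | tail -n 1)

# Numerical: compare the epoch seconds instead.
if [ "$(to_epoch "$a")" -gt "$(to_epoch "$b")" ]; then
  later_numeric=$a
else
  later_numeric=$b
fi

echo "lexical comparison picks:   $later_lexical"
echo "numerical comparison picks: $later_numeric"
```

The string ordering picks the January timestamp as the later one, while the comparison of epoch seconds picks the February one, which is chronologically correct.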

#!/bin/bash
# (c) Erhan Bagdemir 2012 GPL 2.0 or later
 
server_startup="clear"
 
# usage: grep_interval <logfile> <line limit> <reference time "YYYY MM DD HH MM SS"> <pattern>
function grep_interval() {
 
found=$(tail -n "$2" "$1" | gawk -v d="$3" '{
  # date and time fields; adjust the positions to your log layout
  # (splitting the sample lines above on whitespace puts them in $6 and $7)
  t=$5" "$6
  regex="(^20[0-9][0-9]).(0[1-9]|1[0-2]).([0-2][0-9]|30|31)[[:space:]](0[0-9]|1[0-9]|2[0-3]).([0-5][0-9]).([0-5][0-9])"
  match(t,regex,arr)
  ref=mktime(d)
  # strip leading zeros from the captured groups
  for (i = 1; i < 7; i++) {
      sub(/^0/,"",arr[i]);
  }
  # no DST flag, so both timestamps are interpreted the same way
  time_in_log=mktime(arr[1]" "arr[2]" "arr[3]" "arr[4]" "arr[5]" "arr[6])
  if (time_in_log > ref) {
      print $0
  }
 }' | grep -E -w -i "$4")
 
 if [ -n "$found" ]; then
      server_startup=$found
 fi
 
}
# search for errors logged after this point in time
remote_start_time=$(date +"%Y %m %d %H %M %S")
 
# start tomcat server
cd /usr/local/tomcat/bin
./catalina.sh start
 
# log file
log=catalina.out
 
# limit the number of lines so that we do not grep the whole file
limit=100
 
# grep
grep_interval "$log" "$limit" "$remote_start_time" "FATAL|ERROR|SERVER\sSTARTUP"
 
# output
echo "$server_startup"

The variable “t” in the AWK script holds the date (the 5th element in the line, $5) and the time (the 6th position), which are extracted from the log file, like “2012/01/04 17:52:17”.
With AWK’s “match” function we put all the groups captured by the regular expression “regex” into an array, arr. ref is the startup time, which is passed to awk with the -v parameter, as in gawk -v d="$3". Since mktime() doesn’t like leading zeros, we iterate over the array and apply the sub() function to each element to remove these unwanted zeros. time_in_log is the timestamp from the log converted to seconds since the epoch (mktime() returns seconds, not milliseconds). With time_in_log > ref we select the log entries written after the reference time (server startup), comparing the two values numerically rather than just matching two strings. Finally, grep -E -w -i "$4" searches the remaining lines for occurrences of the expression "FATAL|ERROR|SERVER\sSTARTUP".
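The filtering step can be tried out on a few fixed lines without starting a server. The following is only a sketch under stated assumptions: the helpers sample_log and newer_than are hypothetical, the first sample line is a made-up older entry, and the timestamp is taken apart with gsub() and split() on a "|"-separated field instead of match() with a capture array, so it does not validate the timestamp the way the regular expression does. gawk is assumed; a plain awk only works if it also provides mktime().

```shell
#!/bin/sh
# Assumption: gawk is available; a plain awk is only usable if it has mktime().
AWK=$(command -v gawk || command -v awk)

# sample_log (hypothetical): the first line is invented, the other two are from the post.
sample_log() {
cat <<'EOF'
INFO   | jvm 1    | 2012/01/04 17:52:10 | SEVERE: ERROR in a made-up older entry
INFO   | jvm 1    | 2012/01/04 17:52:17 | INFO: JK: ajp13 listening on /0.0.0.0:8009
INFO   | jvm 1    | 2012/01/04 17:52:17 | INFO: Server startup in 20703 ms
EOF
}

# newer_than (hypothetical): print the lines stamped after the reference time.
# $1 = reference time as "YYYY MM DD HH MM SS"; log lines are read from stdin.
newer_than() {
  "$AWK" -F'|' -v d="$1" '
    # %d strips leading zeros before the values are handed to mktime()
    function epoch(a) {
      return mktime(sprintf("%d %d %d %d %d %d", a[1], a[2], a[3], a[4], a[5], a[6]))
    }
    BEGIN { split(d, r, " "); ref = epoch(r) }
    {
      t = $3                      # " 2012/01/04 17:52:17 " when split on "|"
      gsub(/[\/:]/, " ", t)       # work on a copy so the printed line stays intact
      split(t, p, " ")
      if (epoch(p) > ref) print
    }'
}

sample_log | newer_than "2012 01 04 17 52 15" |
  grep -E -w -i "FATAL|ERROR|SERVER[[:space:]]STARTUP"
```

With the reference time set to 17:52:15, the older ERROR line is dropped by the time filter, the ajp13 line passes the filter but does not match the pattern, and only the "Server startup in 20703 ms" line is printed.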

With this smart shell script you can easily find out whether your server started successfully or not, without needing any other programming language or tool on the shell.

PS: You will need to adjust the regular expression and the positions of the date/time fields to match your own log format.



Writing a First Test Blog from my smartphone.

Here is my test article, written on my iPhone thanks to the WordPress app, which makes it possible. I think that I now have a plausible reason to buy a new tablet device.


Making things @Cacheable

Performance optimization with caching, whether in general or with caching frameworks like OSCache and EHCache, is inevitable in software development and especially critical in web development. If you serve content from your data stores and have hundreds of requests per minute or even more, as on high-traffic portals, then you must consider caching your content, objects, method calls, etc. efficiently in every layer. Most ORM frameworks like Hibernate support first-level and second-level caching out of the box, and there are many other caching strategies, such as distributed caching with Oracle Coherence or the JBoss distributed caching facilities. But we won’t discuss all of them in detail. The aim of this post is to show application developers, from a programmer’s point of view, how to easily integrate a custom caching mechanism using Java annotations (Java 1.5+).

Imagine that you’re developing a website whose content is supplied by a commercial partner as XML, and you have to render this content on the fly over a reliable connection and a responsive server. You could also cache the content in a relational database, but to reduce database requests it’s a best practice to integrate custom caching in your service layer. Say the content is weather information from stations and you’re about to build a weather web application. Without any caching mechanism, every time your page gets called your service has to call the partner service as well. This can cause overhead on your network and load on the backend. For that reason alone, you might have to renegotiate your contract.

EHCache is a pretty good solution and a well-known caching framework that is widely used to cache “data” (and not only data) programmatically, by creating and implementing your own CacheKeys. It is easy to integrate into Spring web applications. But things get even better with the ehcache-spring-annotations project on Google Code, which gives the application developer a chance to cache data just by supplying some meta information (god bless annotations), without any coding effort.

The common scenario is that we have an MVC application and the following controller handles user requests to “http://{yourhost.com}/{appcontext}/weatherchannel/index.html?sid=4567”.
“sid” stands for “station id” and is passed as a GET parameter to the controller.

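import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.ui.ModelMap;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;

@Controller
@RequestMapping("/weatherchannel")
public class WeatherChannelController {
 
@Autowired
private IWeatherDataGateway weatherGateway;
 
// handles GET /weatherchannel/index.html?sid=...
@RequestMapping(value="/index.html", method=RequestMethod.GET)
public String handleUserRequest(@RequestParam("sid") String stationId, ModelMap model) {
     // the "sid" request parameter is bound to stationId and handed to the partner gateway
     List<WeatherStationInfoBean> stationInfos = weatherGateway.callPartnerEndpointRestfully(stationId);
     model.addAttribute("sinfos", stationInfos);
     return "stationinfo";
}
 
}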


erhan

Author


Hello, I'm Erhan Bagdemir and this is my blog. I talk about Java, J2EE, Frameworks, web application development, OOAD and various other topics often related to programming.
