Apache Log File Into MySQL Table

Apache can be set up to log all sorts of information. As of Apache 2.2 the basic log file format that a fresh install of Apache will produce will have the following format:

%h %l %u %t "%r" %>s %b

Which doesn't mean a lot to the uninitiated, so here is a short explanation of each.

  • %h - The remote host. This is the IP address of the user connecting to the server.
  • %l - The remote logname. This is not always present.
  • %u - The remote user from auth (nothing if authentication is not used).
  • %t - The time in a common log format.
  • "%r" - The first line of the request, basically the method used (GET/POST) the URL that was accessed and the HTTP protocol level that was used. This is enclosed in quotes.
  • %>s - %s returns the status of the original request request. For some requests Apache will internally create a secondary request, so %>s prints out the last request staus.
  • %b - This is the number of bytes transmitted to the user.

This would produce the following sort out output. - - [17/Apr/2009:14:12:20 +0100] "GET / HTTP/1.1" 200 515

This information can be converted into a database format by using the LOAD DATA command. First, lets create the table we need to store this log format.

DROP TABLE IF EXISTS `test`.`apachelog`;
CREATE TABLE  `test`.`apachelog` (
  `remote_host` varchar(17) DEFAULT NULL,
  `remote_logname` varchar(45) DEFAULT NULL,
  `remote_user` varchar(45) DEFAULT NULL,
  `time1` varchar(22) DEFAULT NULL,
  `time2` varchar(7) DEFAULT NULL,
  `first_line_of_request` text,
  `last_request_status` varchar(4) DEFAULT NULL,
  `bytes_sent` varchar(10) DEFAULT NULL

Here is the command that is used to convert the log file into that table.

LOAD DATA INFILE 'C:/Program Files/Apache Software Foundation/Apache2.2/logs/access.log' INTO TABLE apachelog

Note that our table contains two fields for the time. This is because of two factors. The first is that each field is defined by a space, and because the time value contains a space it is split into two fields. The second is that although we say OPTIONALLY ENCLOSED BY '"' to stop the %r output being split apart, we can't give the LOAD DATA command more than one of these fields. As a result the time is split into the two fields, so we just create a table with more than one field for time to accommodate this.

To improve the table we can use a different output format. Take the following slight alteration to our log file format.

LogFormat "%h,%l,%u,%t,"%r",%>s,%b"

This line can be found in your http.conf file or your httpd-vhosts.conf if you have set up virtual hosts.

This will cause our output to look like the following:,-,-,[20/Apr/2009:14:42:01 +0100],"GET / HTTP/1.1",200,515

We can now change out two time columns to a single one, setting the datatype to VARCHAR(30) and using the following LOAD DATA syntax to load our data.

LOAD DATA INFILE 'C:/Program Files/Apache Software Foundation/Apache2.2/logs/access.log' INTO TABLE apachelog

This gives us a much better set of data in our table.

You can also use the following three MySQL commands to convert from the old table format to the new one.

ALTER TABLE apachelog CHANGE time1 time VARCHAR(40);
UPDATE apachelog SET time = CONCAT(time,' ',time2);
ALTER TABLE apachelog DROP COLUMN time2;

The time2 column will now no longer exist.


Would love to see some example usage stats, such as % of returning visitors based on IP, average time spent of website etc etc
Did not think it would be so easy! Now I would like to put a command in Linux' crontab to have my table updated on a daily basis. So we need to care to avoid more than a single copy of each line in the apache log. Thank you!
I'll take this post like a little tutorial, when it comes to MSQL I need all the help I can get to get it right. Obviously you're explaining here some advanced features, I'll keep a closer look on you blog, I need it. Donnald

Add new comment

The content of this field is kept private and will not be shown publicly.
4 + 5 =
Solve this simple math problem and enter the result. E.g. for 1+3, enter 4.
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.