Apache can be set up to log all sorts of information. As of Apache 2.2 the basic log file format that a fresh install of Apache will produce will have the following format:
%h %l %u %t "%r" %>s %b
Which doesn't mean a lot to the uninitiated, so here is a short explanation of each.
- %h - The remote host. This is the IP address of the user connecting to the server.
- %l - The remote logname. This is not always present.
- %u - The remote user from auth (nothing if authentication is not used).
- %t - The time in a common log format.
- "%r" - The first line of the request, basically the method used (GET/POST) the URL that was accessed and the HTTP protocol level that was used. This is enclosed in quotes.
- %>s - %s returns the status of the original request request. For some requests Apache will internally create a secondary request, so %>s prints out the last request staus.
- %b - This is the number of bytes transmitted to the user.
This would produce the following sort out output.
127.0.0.1 - - [17/Apr/2009:14:12:20 +0100] "GET / HTTP/1.1" 200 515
This information can be converted into a database format by using the LOAD DATA command. First, lets create the table we need to store this log format.
DROP TABLE IF EXISTS `test`.`apachelog`; CREATE TABLE `test`.`apachelog` ( `remote_host` varchar(17) DEFAULT NULL, `remote_logname` varchar(45) DEFAULT NULL, `remote_user` varchar(45) DEFAULT NULL, `time1` varchar(22) DEFAULT NULL, `time2` varchar(7) DEFAULT NULL, `first_line_of_request` text, `last_request_status` varchar(4) DEFAULT NULL, `bytes_sent` varchar(10) DEFAULT NULL )
Here is the command that is used to convert the log file into that table.
LOAD DATA INFILE 'C:/Program Files/Apache Software Foundation/Apache2.2/logs/access.log' INTO TABLE apachelog FIELDS TERMINATED BY ' ' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '' LINES TERMINATED BY 'rn';
Note that our table contains two fields for the time. This is because of two factors. The first is that each field is defined by a space, and because the time value contains a space it is split into two fields. The second is that although we say OPTIONALLY ENCLOSED BY '"' to stop the %r output being split apart, we can't give the LOAD DATA command more than one of these fields. As a result the time is split into the two fields, so we just create a table with more than one field for time to accommodate this.
To improve the table we can use a different output format. Take the following slight alteration to our log file format.
This line can be found in your http.conf file or your httpd-vhosts.conf if you have set up virtual hosts.
This will cause our output to look like the following:
127.0.0.1,-,-,[20/Apr/2009:14:42:01 +0100],"GET / HTTP/1.1",200,515
We can now change out two time columns to a single one, setting the datatype to VARCHAR(30) and using the following LOAD DATA syntax to load our data.
LOAD DATA INFILE 'C:/Program Files/Apache Software Foundation/Apache2.2/logs/access.log' INTO TABLE apachelog FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '' LINES TERMINATED BY 'rn';
This gives us a much better set of data in our table.
You can also use the following three MySQL commands to convert from the old table format to the new one.
ALTER TABLE apachelog CHANGE time1 time VARCHAR(40); UPDATE apachelog SET time = CONCAT(time,' ',time2); ALTER TABLE apachelog DROP COLUMN time2;
The time2 column will now no longer exist.