Apache can be set up to log all sorts of information. As of Apache 2.2 the basic log file format that a fresh install of Apache will produce will have the following format:
%h %l %u %t "%r" %>s %b
Which doesn't mean a lot to the uninitiated, so here is a short explanation of each.
- %h - The remote host. This is the IP address of the user connecting to the server.
- %l - The remote logname. This is not always present.
- %u - The remote user from auth (nothing if authentication is not used).
- %t - The time in a common log format.
- "%r" - The first line of the request, basically the method used (GET/POST) the URL that was accessed and the HTTP protocol level that was used. This is enclosed in quotes.
- %>s - %s returns the status of the original request request. For some requests Apache will internally create a secondary request, so %>s prints out the last request staus.
- %b - This is the number of bytes transmitted to the user.
This would produce the following sort out output.
127.0.0.1 - - [17/Apr/2009:14:12:20 +0100] "GET / HTTP/1.1" 200 515
This information can be converted into a database format by using the LOAD DATA command. First, lets create the table we need to store this log format.
- <p>Here is the command that is used to convert the log file into that table.</p>
- <pre language=">LOAD DATA INFILE 'C:/Program Files/Apache Software Foundation/Apache2.2/logs/access.log' INTO TABLE apachelog
- FIELDS TERMINATED BY ' ' OPTIONALLY ENCLOSED BY '"' ESCAPED BY ''
- LINES TERMINATED BY 'rn';
Note that our table contains two fields for the time. This is because of two factors. The first is that each field is defined by a space, and because the time value contains a space it is split into two fields. The second is that although we say OPTIONALLY ENCLOSED BY '"' to stop the %r output being split apart, we can't give the LOAD DATA command more than one of these fields. As a result the time is split into the two fields, so we just create a table with more than one field for time to accommodate this.
To improve the table we can use a different output format. Take the following slight alteration to our log file format.
This line can be found in your http.conf file or your httpd-vhosts.conf if you have set up virtual hosts.
This will cause our output to look like the following:
127.0.0.1,-,-,[20/Apr/2009:14:42:01 +0100],"GET / HTTP/1.1",200,515
We can now change out two time columns to a single one, setting the datatype to VARCHAR(30) and using the following LOAD DATA syntax to load our data.
- LOAD DATA INFILE 'C:/Program Files/Apache Software Foundation/Apache2.2/logs/access.log' INTO TABLE apachelog
- FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY ''
- LINES TERMINATED BY 'rn';
This gives us a much better set of data in our table.
You can also use the following three MySQL commands to convert from the old table format to the new one.
- ALTER TABLE apachelog CHANGE time1 time VARCHAR(40);
- UPDATE apachelog SET time = CONCAT(time,' ',time2);
- ALTER TABLE apachelog DROP COLUMN time2;
The time2 column will now no longer exist.