[Yanel-dev] Boost access log file format

Cedric Staub cedric.staub at wyona.com
Tue Jun 29 09:28:13 CEST 2010


Hello there

> Cedric pointed out to me that it might make sense to have a different 
> format for the boost access log. It currently reads
> 
> Date, log-level, URL, realm-id, boost-cookie, user (if available), 
> Referer, User-Agent, E-Mail (if available)

A single log entry currently looks like this:
---
http://www.example.org/index.html r:example c:YA-1 ref:null
ua:Mozilla/4.0 (example)
---

While the field:value pairs are actually quite useful and make parsing
a log entry easier, the problem is that the values in the fields (e.g.
the user agent) can also contain colons/spaces, making it in certain
cases impossible to tell where a field starts or ends. 

I suggest we escape the colons and spaces, e.g. by using URL encoding.
This can easily be done using java.net.URLEncoder on the logging side
and java.net.URLDecoder on the parsing side.
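
A minimal sketch of that round trip, assuming UTF-8 as the charset. The
class name LogValueCodec and its methods are made up for illustration
and are not existing Yanel code. The Charset overloads used here need
Java 10 or later; older runtimes have the overloads that take the
charset name as a string ("UTF-8") instead.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Yanel: encodes/decodes one field value.
public class LogValueCodec {

    // Logging side: spaces become '+', colons and other reserved
    // characters become %XX escapes, so the value contains neither.
    public static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Parsing side: restores the original value.
    public static String decode(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String userAgent = "Mozilla/4.0 (example)";
        String encoded = encode(userAgent);
        System.out.println("ua:" + encoded);   // ua:Mozilla%2F4.0+%28example%29
        System.out.println(decode(encoded));   // Mozilla/4.0 (example)
    }
}
```

After encoding, a space can only be a field separator and the first
colon in a token can only be the end of the field name.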

I would also suggest that we add "url:" to the front of the URL in
order to make parsing more robust. Right now the parser just assumes
that the URL is the first field. With the prefix we could change the
format later (e.g. move the URL to another position) without breaking
the parser.
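
To illustrate why the prefix helps, here is a rough sketch of a parser
that looks fields up by name instead of by position. It assumes the
values are URL-encoded as suggested above and that fields are separated
by single spaces. AccessLogParser is a hypothetical name, not the
existing parser.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser sketch, not the existing Yanel code.
public class AccessLogParser {

    // Splits an entry into name -> decoded value pairs.
    public static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String token : line.trim().split(" +")) {
            int sep = token.indexOf(':');
            if (sep <= 0) {
                continue; // no field name in this token, skip it
            }
            String name = token.substring(0, sep);
            String value = URLDecoder.decode(
                    token.substring(sep + 1), StandardCharsets.UTF_8);
            fields.put(name, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        // The url field is deliberately not first here.
        String line = "r:example url:http%3A%2F%2Fwww.example.org%2Findex.html"
                + " c:YA-1 ua:Mozilla%2F4.0+%28example%29";
        Map<String, String> fields = parse(line);
        System.out.println(fields.get("url")); // http://www.example.org/index.html
        System.out.println(fields.get("ua"));  // Mozilla/4.0 (example)
    }
}
```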

Then we'd just have to make sure any module that appends data actually
uses URL encoding and doesn't just print it verbatim. Maybe we can add
a convenience function somewhere to make this easier?
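
Such a function could look roughly like the sketch below. The class
LogFields and its append method are placeholders I made up, and writing
a missing value as "null" is only an assumption based on the "ref:null"
in the sample entry above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical convenience class, not existing Yanel code.
public final class LogFields {

    private LogFields() {
    }

    // Appends "name:encoded-value" to the entry, separated from any
    // previous field by a single space. Callers never encode by hand.
    public static StringBuilder append(StringBuilder entry, String name, String value) {
        if (entry.length() > 0) {
            entry.append(' ');
        }
        String safeValue = (value == null) ? "null" : value; // assumption, see above
        return entry.append(name)
                .append(':')
                .append(URLEncoder.encode(safeValue, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        StringBuilder entry = new StringBuilder();
        append(entry, "url", "http://www.example.org/index.html");
        append(entry, "ua", "Mozilla/4.0 (example)");
        // url:http%3A%2F%2Fwww.example.org%2Findex.html ua:Mozilla%2F4.0+%28example%29
        System.out.println(entry);
    }
}
```

A module that wants to add its own field would then only call append
with a field name and the raw value.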

Anyway, I will try to reply with a patch implementing this shortly.

Greetings
Cedric

