webalizer
NAME
SYNOPSIS
DESCRIPTION
RUNNING THE WEBALIZER
INCREMENTAL PROCESSING
REVERSE DNS LOOKUPS
COMMAND LINE OPTIONS
CONFIGURATION FILES
FILES
BUGS
COPYRIGHT
AUTHOR
NAME
|
webalizer - A web server log file analysis tool. |
SYNOPSIS
|
webalizer [ option ... ] [ log-file
] |
|
webazolver [ option ... ] [ log-file
] |
DESCRIPTION
|
The Webalizer is a web server log file analysis
program which produces usage statistics in HTML format for
viewing with a browser. The results are presented in both
columnar and graphical format, which facilitates
interpretation. Yearly, monthly, daily and hourly usage
statistics are presented, along with the ability to display
usage by site, URL, referrer, user agent (browser),
username, search strings, entry/exit pages, and country
(some information may not be available if not present in the
log file being processed). |
|
The Webalizer supports CLF (common log format)
log files, as well as Combined log formats as defined
by NCSA and others, and variations of these which it
attempts to handle intelligently. In addition, the
Webalizer also supports wu-ftpd xferlog
formatted log files, allowing analysis of ftp servers, and
squid proxy logs. Logs may also be compressed, via
gzip. If a compressed log file is detected, it will
be automatically uncompressed while it is read. Compressed
logs must have the standard gzip extension of
.gz. |
|
webazolver is normally just a symbolic link to the
webalizer. When run as webazolver, only DNS
file creation/updates are performed, and the program will
exit once complete. All normal options and configuration
directives are available, although many will not be used. In
addition, a DNS cache file must be specified. If the number
of DNS child processes to use is not specified,
webazolver will default to 5. |
|
This documentation applies to The Webalizer Version
2.01 |
RUNNING THE WEBALIZER
|
The Webalizer was designed to be run from a Unix
command line prompt or as a crond(8) job. Once
executed, the general flow of the program is: |
|
The program looks for a default configuration file. A file named
webalizer.conf is searched for in the current
directory, and if found, its configuration data is parsed.
If the file is not present in the current directory, the
file /etc/webalizer.conf is searched for and, if
found, is used instead. |
|
Any command line arguments given to the program are parsed.
This may include the specification of a configuration file,
which is processed at the time it is
encountered. |
|
If a log file was specified, it is opened and made ready for
processing. If no log file was given, STDIN is used
for input. If the log filename '-' is specified,
STDIN will be forced. |
|
If an output directory was specified, the program does a
chdir(2) to that directory in preparation for
generating output. If no output directory was given, the
current directory is used. |
|
If a non-zero number of DNS Children processes were
specified, they will be started, and the specified log file
will be processed, creating or updating the specified DNS
cache file. |
|
If no hostname was given, the program attempts to get the
hostname using a uname(2) system call. If that fails,
localhost is used. |
|
A history file is searched for in the current directory
(output directory) and read if found. This file keeps totals
for previous months, which are used in the main
index.html HTML document. Note: The file
location can now be specified with the HistoryName
configuration option. |
|
If incremental processing was specified, a data file is
searched for and loaded if found, containing the 'internal
state' data of the program at the end of a previous run.
Note: The file location can now be specified with the
IncrementalName configuration option. |
|
Main processing begins on the log file. If the log spans
multiple months, a separate HTML document is created for
each month. |
|
After main processing, the main index.html page is
created, which has totals by month and links to each month's
HTML document. |
|
A new history file is saved to disk, which includes totals
generated by The Webalizer during the current
run. |
|
If incremental processing was specified, a data file is
written that contains the 'internal state' data at the end
of this run. |
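
The default configuration file search described in the first step above can be sketched as follows. This is only an illustration of the lookup order (current directory first, then /etc), not The Webalizer's own code; the function name find_default_config and its parameters are hypothetical.

```python
from pathlib import Path


def find_default_config(search_dirs=(Path("."), Path("/etc")),
                        name="webalizer.conf"):
    """Return the first existing default configuration file, or None.

    The directories are tried in order, mirroring the documented
    behaviour: the current directory, then /etc.
    """
    for directory in search_dirs:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_default_config()
    print(found if found is not None else "no default configuration file")
```

A configuration file named on the command line is handled separately: it is processed at the point where it appears among the arguments.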
INCREMENTAL PROCESSING
|
Version 1.2x of The Webalizer adds incremental run
capability. Simply put, this allows processing large log
files by breaking them up into smaller pieces, and
processing these pieces instead. What this means in real
terms is that you can now rotate your log files as often as
you want, and still be able to produce monthly usage
statistics without the loss of any detail. Basically, The
Webalizer saves and restores all internal data in a
file named webalizer.current. This allows the program
to 'start where it left off' so to speak, and allows the
preservation of detail from one run to the next. The data
file is placed in the current output directory, and is a
plain ASCII text file that can be viewed with any standard
text editor. Its location and name may be changed using the
IncrementalName configuration keyword. |
|
Some special precautions need to be taken when using the
incremental run capability of The Webalizer.
Configuration options should not be changed between runs, as
that could cause corruption of the internal data stored. For
example, changing the MangleAgents level will cause
different representations of user agents to be stored,
producing invalid results in the user agents section of the
report. If you need to change configuration options, do it
at the end of the month after normal processing of the
previous month and before processing the current month. You
may also want to delete the webalizer.current file at that
time. |
|
The Webalizer also attempts to prevent data
duplication by keeping track of the timestamp of the last
record processed. This timestamp is then compared to current
records being processed, and any records that were logged
previous to that timestamp are ignored. This, in theory,
should allow you to re-process logs that have already been
processed, or process logs that contain a mix of
processed/not yet processed records, and not produce
duplication of statistics. The only time this may break is
if two separate log files contain duplicate timestamps: any
records in the second log file that have the same timestamp
as the last record processed in the previous log file will
be discarded as if they had already been processed. There
are several ways to prevent this; for example, stopping the
web server before rotating logs avoids the situation. This
scheme also requires that you always process logs in
chronological order, otherwise data loss will occur as a
result of the timestamp comparison. |
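
The timestamp comparison described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the program's actual implementation: records are modelled as hypothetical (timestamp, line) tuples, and the function name filter_new_records is invented for the example.

```python
from datetime import datetime


def filter_new_records(records, last_seen):
    """Drop records already covered by a previous run.

    records   -- iterable of (timestamp, line) tuples (assumed shape)
    last_seen -- timestamp of the last record processed by the
                 previous run, or None on a first run

    A record is discarded when its timestamp is earlier than OR EQUAL
    to last_seen, which is why duplicate timestamps across two log
    files lose data.
    """
    kept = [(ts, line) for ts, line in records
            if last_seen is None or ts > last_seen]
    # The newest kept timestamp becomes the marker for the next run.
    new_last = max((ts for ts, _ in kept), default=last_seen)
    return kept, new_last


if __name__ == "__main__":
    previous = datetime(2000, 1, 1, 0, 0, 10)
    log = [(datetime(2000, 1, 1, 0, 0, s), "hit") for s in (5, 10, 11, 11)]
    kept, marker = filter_new_records(log, previous)
    print(len(kept), marker)
```

Note that the record logged at exactly the saved timestamp is dropped even if it was never processed, which is the edge case the text warns about.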
REVERSE DNS LOOKUPS
|
The Webalizer supports reverse DNS lookups through a DNS
cache file that is either created/updated at run-time,
or has been previously created, either by a previous run of
the webalizer, or by running the stand-alone version,
webazolver. In order to perform reverse DNS lookups,
a DNSCache filename must be specified. In order to
create/update the cache file at run-time, the
DNSChildren number must be non-zero. The
DNSChildren value specifies the number of children
processes to fork, each of which will perform reverse DNS
lookups in order to create/update the DNS cache file. See
the file DNS.README for additional
information. |
COMMAND LINE OPTIONS
|
The Webalizer supports many different configuration options
that will alter the way the program behaves and generates
output. Most of these can be specified on the command line,
while some can only be specified in a configuration file.
The command line options are listed below, with references
to the corresponding configuration file
keywords. |
|
Display all available command line options and exit
program. |
|
Display program version and exit program. |
|
Debug. Display debugging information for errors and
warnings. |
|
IgnoreHist. Ignore history. USE WITH CAUTION.
This will cause The Webalizer to ignore any previous
monthly history file only. Incremental data (if present) is
still processed. |
|
Incremental. Preserve internal data between
runs. |
|
Quiet. Suppress informational messages. Does not
suppress warnings or errors. |
|
ReallyQuiet. Suppress all messages including warnings
and errors. |
|
TimeMe. Force display of timing information at end of
processing. |
|
Use configuration file file. |
|
HostName. Use the hostname name. |
|
OutputDir. Use output directory
dir. |
|
ReportTitle. Use name for report
title. |
|
LogType. Specify log type to be processed. Value can
be either clf, ftp or squid format. If
not specified, will default to CLF format. FTP
logs must be in standard wu-ftpd xferlog
format. |
|
FoldSeqErr. Fold out of sequence log records back
into analysis, by treating as if they were the same
date/time as the last good record. Normally, out of sequence
log records are simply ignored. |
|
CountryGraph. Suppress country graph. |
|
HourlyGraph. Suppress hourly graph. |
|
HTMLExtension. Defines HTML file extension to use. If
not specified, defaults to html. Do not include the
leading period. |
|
HourlyStats. Suppress hourly statistics. |
|
GraphLegend. Suppress color coded graph
legends. |
|
GraphLines. Specify number of background lines.
Default is 2. Use zero ('0') to disable the
lines. |
|
PageType. Specify file extensions that are considered
pages. Sometimes referred to as
pageviews. |
|
VisitTimeout. Specify the Visit timeout period.
Specified in number of seconds. Default is 1800 seconds (30
minutes). |
|
IndexAlias. Use the filename name as an
additional alias for index.*. |
|
MangleAgents. Mangle user agent names according to
the mangle level specified by num. Mangle levels
are: |
|
5 Browser name and major version. |
|
4 Browser name, major and minor version. |
|
3 Browser name, major version, minor version to two
decimal places. |
|
2 Browser name, major and minor versions and
sub-version. |
|
1 Browser name, version and machine type if
possible. |
|
0 All information (left unchanged). |
|
GroupDomains. Automatically group sites by domain.
The grouping level specified by num can be thought of
as 'the number of dots' to display in the grouping. The
default value of 0 disables any domain
grouping. |
|
DNSCache. Use the DNS cache file
name. |
|
DNSChildren. Use num DNS children processes to
perform DNS lookups, either creating or updating the DNS
cache file. Specify zero (0) to disable cache file
creation/updates. If given, a DNS cache filename must be
specified. |
|
HideAgent. Hide user agents matching
name. |
|
HideReferrer. Hide referrer matching
name. |
|
HideSite. Hide site matching
name. |
|
HideAllSites. Hide all individual sites (only display
groups). |
|
HideURL. Hide URL matching name. |
|
TopAgents. Display the top num user agents
table. |
|
TopReferrers. Display the top num referrers
table. |
|
TopSites. Display the top num sites
table. |
|
TopURLs. Display the top num URLs
table. |
|
TopCountries. Display the top num countries
table. |
|
TopEntry. Display the top num entry pages
table. |
|
TopExit. Display the top num exit pages
table. |
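
The difference between the default handling of out of sequence records and the FoldSeqErr behaviour described in the list above can be sketched as follows. This is a simplified model, not the program's code: timestamps are plain numbers of seconds, and the function name apply_sequence_policy is hypothetical.

```python
def apply_sequence_policy(timestamps, fold):
    """Return the timestamps that would enter the analysis.

    A record is out of sequence when its timestamp is earlier than
    that of the last good record. With fold=False such records are
    ignored; with fold=True they are kept but treated as if they had
    the timestamp of the last good record.
    """
    accepted = []
    last_good = None
    for ts in timestamps:
        if last_good is not None and ts < last_good:
            if not fold:
                continue          # default: drop the record
            ts = last_good        # FoldSeqErr: reuse last good time
        else:
            last_good = ts
        accepted.append(ts)
    return accepted


if __name__ == "__main__":
    sample = [1, 3, 2, 4]
    print(apply_sequence_policy(sample, fold=False))
    print(apply_sequence_policy(sample, fold=True))
```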
CONFIGURATION FILES
|
Configuration files are standard ascii(7) text files
that may be created or edited using any standard editor.
Blank lines and lines that begin with a pound sign ('#') are
ignored. Any other lines are considered to be configuration
lines, and have the form "Keyword Value", where
the Keyword is one of the currently available configuration
keywords defined below, and 'Value' is the value to assign
to that particular option. Any text found after the keyword
up to the end of the line is considered the keyword's value,
so you should not include anything after the actual value on
the line that is not actually part of the value being
assigned. The file sample.conf provided with the
distribution contains lots of useful documentation and
examples as well. |
|
General Configuration Keywords |
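
The line format described at the start of this section (blank lines and lines beginning with '#' are ignored, everything after the keyword is its value) can be sketched as a small parser. This is a minimal illustration, not the program's own parser. It assumes that leading whitespace before a keyword or '#' is tolerated, which the text above does not state, and the name parse_config is hypothetical.

```python
def parse_config(text):
    """Parse configuration text into a list of (keyword, value) pairs.

    A list is used rather than a dict because some keywords, such as
    SearchEngine, may appear on more than one line.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()  # assumption: surrounding whitespace is ignored
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0]
        # Everything after the keyword, up to end of line, is the value.
        value = parts[1] if len(parts) > 1 else ""
        entries.append((keyword, value))
    return entries


if __name__ == "__main__":
    sample = "# sample\n\nHostName www.example.com\nCountryGraph no\n"
    for keyword, value in parse_config(sample):
        print(keyword, "=", value)
```

Because the whole remainder of the line becomes the value, a trailing comment on a keyword line would end up inside the value, which is why the text advises against adding anything after it.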
|
Use log file named name. If none specified,
STDIN will be used. |
|
Specify log file type as name. Values can be either
web, squid or ftp, with the default
being web. |
|
Create output in the directory dir. If none
specified, the current directory will be used. |
|
Filename to use for history file. Relative to output
directory unless absolute name is given (i.e. starts with
'/'). Defaults to webalizer.hist in the standard
output directory. |
|
Use the title string name for the report title. If
none specified, use the default of (in English)
"Usage Statistics for ". |
|
Set the hostname for the report as name. If none
specified, an attempt will be made to gather the hostname
via a uname(2) system call. If that fails,
localhost will be used. |
|
Use https:// on links to URLs, instead of the default
http://, in the 'Top URLs'
table. |
|
Suppress informational messages. Warning and Error messages
will not be suppressed. |
|
Suppress all messages, including Warning and Error
messages. |
|
Print extra debugging information on Warnings and
Errors. |
|
Force timing information at end of processing. |
|
Use GMT (UTC) time instead of local timezone
for reports. |
|
Ignore previous monthly history file. USE WITH
CAUTION. Does not prevent Incremental file
processing. |
|
Fold out of sequence log records back into analysis by
treating them as if they had the same date/time as the last
good record. Normally, out of sequence log records are
ignored. |
|
CountryGraph ( yes | no ) |
|
Display Country Usage Graph in output report. |
|
Display Daily Graph in output report. |
|
Display Daily Statistics in output report. |
|
Display Hourly Graph in output report. |
|
Display Hourly Statistics in output report. |
|
Define the file extensions to consider as a page. If
a file is found to have the same extension as name,
it will be counted as a page (sometimes called a
pageview). |
|
Allows the color coded graph legends to be
enabled/disabled. |
|
Specify the number of background reference lines displayed
on the graphs produced. Disable by using zero ('0'),
default is 2. |
|
Specifies the visit timeout value. Default is 1800
seconds (30 minutes). A visit is determined by looking
at the difference in time between the current and last
request from a specific site. If the difference is greater
than or equal to the timeout value, the request is counted as a
new visit. Specified in seconds. |
|
Use name as an additional alias for
index.*. |
|
Mangle user agent names based on mangle level num.
See the -M command line switch for mangle levels and
their meaning. The default is 0, which doesn't mangle
user agents at all. |
|
SearchEngine name variable |
|
Allows the specification of search engines and their query
strings. The name is the name to match against the
referrer string for a given search engine. The
variable is the cgi variable that the search engine
uses for queries. See the sample.conf file for
example usage with common search engines. |
|
Enable Incremental mode processing. |
|
Filename to use for incremental data. Relative to output
directory unless an absolute name is given (i.e. starts with
'/'). Defaults to webalizer.current in the standard
output directory. |
|
Filename to use for the DNS cache. Relative to output
directory unless an absolute name is given (i.e. starts with
'/'). |
|
Number of children DNS processes to run in order to
create/update the DNS cache file. Specify zero (0) to
disable. |
|
Display the top num User Agents table. Use zero to
disable. |
|
Create separate HTML page with All User
Agents. |
|
Display the top num Referrers table. Use zero to
disable. |
|
AllReferrers ( yes | no ) |
|
Create separate HTML page with All
Referrers. |
|
Display the top num Sites table. Use zero to
disable. |
|
Display the top num Sites (by KByte) table. Use zero
to disable. |
|
Create separate HTML page with All
Sites. |
|
Display the top num URLs table. Use zero to
disable. |
|
Display the top num URLs (by KByte) table. Use zero
to disable. |
|
Create separate HTML page with All URLs. |
|
Display the top num Countries in the table. Use zero
to disable. |
|
Display the top num Entry Pages in the table. Use
zero to disable. |
|
Display the top num Exit Pages in the table. Use zero
to disable. |
|
Display the top num Search Strings in the table. Use
zero to disable. |
|
AllSearchStr ( yes | no ) |
|
Create separate HTML page with All Search
Strings. |
|
Display the top num Usernames in the table. Use zero
to disable. Usernames are only available if using HTTP-based
authentication. |
|
Create separate HTML page with All
Usernames. |
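
The visit rule given in the visit timeout description above (a request starts a new visit when the gap since the previous request from the same site is greater than or equal to the timeout) can be sketched as follows. This is an illustration only. It assumes requests arrive in chronological order and that the first request seen from a site counts as a visit; times are plain seconds, and count_visits is a hypothetical name.

```python
def count_visits(requests, timeout=1800):
    """Count visits per site.

    requests -- iterable of (site, time_in_seconds) pairs, assumed to
                be in chronological order
    timeout  -- visit timeout in seconds (1800 is the documented default)
    """
    last_request = {}
    visits = {}
    for site, ts in requests:
        previous = last_request.get(site)
        # New visit on first sight of a site (assumption) or when the
        # gap reaches the timeout.
        if previous is None or ts - previous >= timeout:
            visits[site] = visits.get(site, 0) + 1
        last_request[site] = ts
    return visits


if __name__ == "__main__":
    hits = [("a", 0), ("a", 1799), ("a", 3599), ("b", 10)]
    print(count_visits(hits))
```

In the sample, the third request from site "a" arrives exactly 1800 seconds after the second and therefore opens a new visit.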
|
Hide/Ignore/Group/Include Keywords |
|
Hide User Agents that match name. |
|
Hide Referrers that match name. |
|
Hide Sites that match name. |
|
HideAllSites ( yes | no ) |
|
Hide all individual sites. This causes only grouped sites to
be displayed. |
|
Hide URLs that match name. |
|
Hide Usernames that match name. |
|
Ignore User Agents that match name. |
|
Ignore Referrers that match name. |
|
Ignore Sites that match name. |
|
Ignore URLs that match name. |
|
Ignore Usernames that match name. |
|
Group User Agents that match name. Display
Label in 'Top Agent' table if given (instead of
name). |
|
GroupReferrer name [Label] |
|
Group Referrers that match name. Display Label
in 'Top Referrer' table if given (instead of
name). |
|
Group Sites that match name. Display Label in
'Top Site' table if given (instead of
name). |
|
Automatically group sites by domain. The value num
specifies the level of grouping, and can be thought of as
the 'number of dots' to be displayed. The default value of
0 disables domain grouping. |
|
Group URLs that match name. Display Label in
'Top URL' table if given (instead of
name). |
|
Group Usernames that match name. Display Label
in 'Top Usernames' table if given (instead of
name). |
|
Force inclusion of sites that match name. Takes
precedence over Ignore* keywords. |
|
Force inclusion of URLs that match name. Takes
precedence over Ignore* keywords. |
|
Force inclusion of Referrers that match name. Takes
precedence over Ignore* keywords. |
|
Force inclusion of User Agents that match name. Takes
precedence over Ignore* keywords. |
|
Force inclusion of Usernames that match name. Takes
precedence over Ignore* keywords. |
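
The 'number of dots' idea behind the automatic domain grouping keyword in the list above can be sketched as follows. This is one reading of that description, not the program's code: a level of num is taken to mean that the last num + 1 labels of a hostname are kept. Leaving short hostnames and numeric addresses unchanged is an assumption made for the example, and group_domain is a hypothetical name.

```python
def group_domain(hostname, level):
    """Return the domain group for hostname at the given level.

    level 0 disables grouping. Otherwise the result contains `level`
    dots, i.e. the last level + 1 labels of the hostname.
    """
    if level <= 0:
        return hostname
    labels = hostname.split(".")
    # Assumption: unresolved numeric addresses are not grouped.
    if all(label.isdigit() for label in labels):
        return hostname
    # Assumption: names that are already short enough stay as they are.
    if len(labels) <= level + 1:
        return hostname
    return ".".join(labels[-(level + 1):])


if __name__ == "__main__":
    for level in (0, 1, 2):
        print(level, group_domain("a.b.example.com", level))
```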
|
Defines the HTML file extension to use. Default is
html. Do not include the leading period! |
|
Insert text at the very beginning of the generated
HTML file. Defaults to a standard HTML 3.2 DOCTYPE
record. |
|
Insert text within the <HEAD></HEAD>
block of the HTML file. |
|
Insert text in HTML page, starting with the
<BODY> tag. If used, the first line must be a
<BODY ...> tag. Multiple lines may be
specified. |
|
Insert text at top (before horizontal rule) of HTML
pages. Multiple lines may be specified. |
|
Insert text at bottom of the HTML page. The
text is top and right aligned within a table column
at the end of the report. |
|
Insert text at the very end of the HTML page. If not
specified, the default is to insert the ending </BODY>
and </HTML> tags. If used, you must supply
these tags yourself. |
|
The Webalizer allows you to export processed data to other
programs by using tab delimited text files. The
Dump* commands specify which files are to be written,
and where. |
|
Save dump files in directory name. If not specified,
the default output directory will be used. Do not specify a
trailing slash (/). |
|
Use name as the filename extension for dump files. If
not given, the default of tab will be
used. |
|
Print a column header as the first record of the
file. |
|
Dump the sites data to a tab delimited file. |
|
Dump the URL data to a tab delimited file. |
|
DumpReferrers ( yes | no ) |
|
Dump the referrer data to a tab delimited file. This data is
only available if using a log that contains referrer
information (e.g. a combined format web log). |
|
Dump the user agent data to a tab delimited file. This data
is only available if using a log that contains user agent
information (e.g. a combined format web log). |
|
Dump the username data to a tab delimited file. This data is
only available if processing a wu-ftpd xferlog or a web log
that contains HTTP authentication information. |
|
DumpSearchStr ( yes | no ) |
|
Dump the search string data to a tab delimited file. This
data is only available if processing a web log that contains
referrer information and had search string information
present. |
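
Since the dump files are tab delimited text, another program can read them with very little code. The sketch below shows one way to do so in Python. The column names in the sample are invented for illustration (the actual columns are not listed here), and read_dump is a hypothetical helper.

```python
import csv
import io


def read_dump(stream, has_header):
    """Split a tab delimited dump into (header, rows).

    has_header should be True when the file was written with a column
    header as its first record; header is None otherwise.
    """
    rows = list(csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE))
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows


if __name__ == "__main__":
    # Made-up sample data; real dump columns may differ.
    sample = io.StringIO("Hits\tHostname\n42\thost.example.com\n")
    header, rows = read_dump(sample, has_header=True)
    print(header, rows)
```

For a real file, open it in text mode with newline="" and pass the file object in place of the StringIO sample.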
FILES
|
Default configuration file. Is searched for in the current
directory and if not found, in the /etc/
directory. |
|
Monthly history file for previous 12 months. (can be
changed) |
|
Current state data file (Incremental processing). (can be
changed) |
|
Various monthly HTML output files produced.
(extension can be changed) |
|
Various monthly image files used in the
reports. |
|
Monthly tab delimited text files. (extension can be
changed) |
BUGS
|
Report bugs to brad@mrunix.net. |
COPYRIGHT
|
Copyright (C) 1997-2000 by Bradford L. Barrett. Distributed
under the GNU GPL. See the files "COPYING"
and "Copyright", supplied with all
distributions for additional information. |
AUTHOR
|
Bradford L. Barrett
<brad@mrunix.net> |
|