http-analyze(8L) http-analyze(8L) Local Commands

NAME
     http-analyze - a real fast log analyzer for web servers

SYNOPSIS
     http-analyze [-{d|m|h}] [-nrstuvxz] [-c cfgfile] [-i logfile]
                  [-o outdir] [-p privdir] [-N #s|u] [-H homepage]
                  [-S srvname] [-T title] [file]

DESCRIPTION
     http-analyze analyzes logfiles of web servers and creates detailed
     statistics of the server's access load in graphical and tabular
     form. http-analyze expects logfile entries in common logfile
     format, which is used by web servers such as Netscape's, NCSA's,
     and CERN's httpd. If your server uses another format, http-analyze
     can't read the logfile.

     http-analyze has been highly optimized to process large logfiles at
     the maximum possible speed. This is achieved by using a history
     mechanism to skip logfile entries which have already been processed
     in a previous run of the program, and by using two modes of
     operation (named after their maximum useful update interval) with
     different levels of detail in the analysis of the logfile entries:

     daily mode (option -d):
          http-analyze generates a short summary showing the hits per
          day only. By using a history to skip entries already processed
          and by avoiding detailed analysis of each log entry,
          http-analyze requires only a fraction of the time needed for a
          full report.

     monthly mode (option -m):
          In this mode, a full report with much more detail is
          generated. The history is used to produce a summary for the
          last 12 months.

     If your logfiles are rather large, you can use an update interval
     in the range of one to 24 hours to generate short statistics more
     frequently and an update interval of one to 30 days to generate a
     full report. Since http-analyze maintains a history of the results
     from previous runs, you may rotate the logfile on a daily basis
     when generating short (daily) reports.
     However, to generate a full (monthly) report you have to feed all
     logfiles of the appropriate summary period to http-analyze at once,
     because the program needs to do further analysis on all logfile
     entries. After generating a detailed report for a month, you can
     save the corresponding logfile(s) on tape and remove them from your
     system.

HTML OUTPUT FILES
     In daily mode, http-analyze writes the short summary into the
     output file stats.html and updates the daily values in the history
     file. The short summary includes the following information by day
     (see the following section for an explanation of these numbers):

     - the total number of hits
     - the total number of 304's (Not Modified responses)
     - the total number of files transferred
     - the total number of unique sites
     - the amount of data sent by the server

     In monthly mode, http-analyze updates the short summary in
     stats.html and the monthly values in the history file.
     Additionally, it creates the following files:

     statsMMYY.html
          contains the detailed summary for the period determined by
          analyzing the logfile. MM and YY are replaced by the month and
          the year respectively.

     filesMMYY.html
          lists the URLs of all documents sent by your server. This file
          is created by default, but you can suppress its creation with
          the option -u if you want to exclude the list from the
          statistics.

     sitesMMYY.html
          lists the hostnames of all sites accessing your server if the
          server could successfully resolve the IP address. Again, this
          file is created by default unless you explicitly suppress its
          creation with the option -s.

     statsYYYY.html or index.html
          contains a summary of the last 12 months. Which name is chosen
          depends on the date of the last logfile entry processed: if
          the last entry indicates that http-analyze is analyzing the
          current month's log, the name index.html is used for easy
          reference of the statistics pages. In all other cases the name
          statsYYYY.html is used.
          This naming convention allows you to create reports for
          previous summary periods (e.g. for last year) without
          affecting the results for the current period.

     gr-icon.gif
          a small icon for your link to the statistics page (59x41
          pixels).

     All files are created in the current directory unless you
     explicitly specify an output directory for the HTML files.
     Furthermore, the files containing the detailed lists of sites and
     URLs may be created in a private directory to protect them by
     authorization.

     The full summary (statsMMYY.html) contains the following
     information:

     - the total number of hits/304's/files/KB for this month
     - the amount of data requested/transferred/saved by cache
     - the total number of unique URLs/sites for this month
     - the numbers of response codes other than 200 (OK) or 304 (NoMod)
     - the maximum/average hits per day/hour
     - the total number of hits/files/304's/sites/KB by day
     - the top 5 seconds, 5 minutes, and 24 hours of the summary period
     - the top 10 sites accessing your server most often
     - the top 30 most commonly accessed URLs
     - the last 10 frequently accessed URLs
     - the hits/304's/KB sent by Country

     The following section describes the meaning of all entries in the
     summary report that are not self-explanatory:

     Hits (color key: green)
          The total number of hits processed by the server, including
          requests which generated an invalid response.

     Files (color key: blue)
          The total number of files sent by the server (OK responses).
          Here "file" means any kind of file, thus including not only
          documents, but also images, CGI scripts, audio and video
          clips, etc.

     304's (color key: yellow)
          A code 304 (Not Modified) response is sent by the server if a
          document hasn't been updated since the last time it was
          requested.
          This field therefore contains the total number of requests
          which didn't cause the transmission of a file because of
          various caching mechanisms used by proxies and browsers.

     Other responses
          The total number of all answers from the server which are not
          OK (200) or Not Modified (304) responses. The full summary
          includes a list of all those other responses.

     Unique URLs
          This field contains the total number of unique URLs (not
          counting erroneous requests).

     Unique sites (color key: red)
          In the Totals section, this is the total number of unique
          sites per month, while in the "Hits by day" section it
          reflects the number of unique sites per day. Therefore, the
          sum of all sites shown in the "Hits by day" section is not
          equal to the total number of unique sites.

     KBytes requested
          The amount of data requested by the users of your server.
          http-analyze computes this number by adding the values of the
          next two fields (see below).

     KBytes transferred (color key: orange)
          The amount of data sent as reported by the server.

     KBytes saved by cache
          The amount of data saved by various caching mechanisms. Its
          value is computed by multiplying the number of Not Modified
          responses for each page by the size of the document (if
          known).

     Note: Because http-analyze can determine the size of a page only if
     the page has been requested successfully at least once in the same
     summary period, the values for "KB saved by cache" and "KB
     requested" are just approximations of the real values.

OPTIONS
     -h
          print a short help list explaining the usage of the options.

     -d
          (daily mode) generate short statistics for the current month
          only. If a history file exists, the values for previous days
          are read from this file and the corresponding logfile entries
          are skipped. If the history file does not exist, the whole
          logfile will be processed and a history will be created.
          (This option is set by default.)

     -m
          (monthly mode) generate full statistics for a whole month.
          Although the values from the history file are usually used to
          create a summary for the last 12 months, the actual logfile
          entries always have precedence over any records in the
          history file. This means that you should rotate your logfile
          at least on a monthly basis. The option -m includes -d.

     -n
          (no update) don't create or update the history file. Useful if
          you want to generate statistics for previous summary periods
          (before the last month) without overwriting the current state
          of the history.

     -r
          don't create a list of all URLs for hidden items (if any) in
          the full statistics.

     -s
          (no sitelist) don't create a list of all sites in the full
          statistics.

     -t
          (no TOP lists) don't create the top seconds/minutes/hours
          lists. Also suppresses the "Hits by hours" bar chart.

     -u
          (no URL list) don't create a list of all requested URLs in the
          full statistics.

     -v
          (verbose) comment on ongoing processing.

     -x
          Don't group images by default. Normally, http-analyze sums up
          the values of all images (*.gif, *.jpg, *.ief, *.pcd, *.rgb,
          *.xbm, *.xpm, *.xwd, *.tif) and hides them under the item "All
          images" to avoid getting the top lists filled up with lots of
          image URLs. If -x is given, images are accounted for as single
          items.

     -z
          don't create graphical representations of the results.

     -c cfgfile
          Use cfgfile as the configuration file. By using a config file,
          http-analyze allows you to define some options and to tailor
          the basic HTML page layout somewhat. See "CONFIGURATION FILE"
          below for a description of the config file format.

     -i logfile
          Use logfile as the server's logfile. If `-' is given, stdin is
          processed. See also the HTTPLogFile entry in the config file.

     -o outdir
          This is the name of the directory where the HTML output files
          should be created. If no directory is given, the files are
          created in the current directory.
          See also the HTMLDir entry in the config file.

     -p privdir
          Use this directory for the list of all URLs/sites
          (filesMMYY.html and sitesMMYY.html). This is useful if you
          want to grant public access to your web server's statistics
          while restricting access to the detailed lists to the staff
          only by using server authentication. See also the PrivateDir
          entry in the config file.

     -H homepage
          Use homepage as an alternate name for homepages. If your index
          files are named index.html, there is no need to define this
          option. However, if your server looks for more than one
          filename (e.g. index.html, Welcome.html, and home.html), you
          must define the latter two explicitly. http-analyze truncates
          the URLs containing a homepage name so that they merge with
          `/' or their "base URL", respectively. (For example, the "base
          URL" for /dir/index.html is /dir/.) You can define up to three
          alternate names in addition to index.html. See also the
          Homepage entry in the config file.

     -N #{sul}
          This option defines the number of entries in the top site (s
          or S), top URL (u or U), or last URL (l or L) list. # is
          either a positive number or the value 0 to suppress the
          corresponding list. Note that the list of last frequently
          accessed URLs is generated only if the number of all unique
          URLs is greater than the sum of the entries in the top and
          last URL lists. See also the entries TopSites, TopURLs, and
          LastURLs in the config file.

     -S srvname
          Use srvname as the name of the server in the title of the HTML
          files. If undefined, http-analyze tries to determine the
          server name itself.

          Note: http-analyze uses either the uname(2) or the
          gethostname(2) function to determine the server name,
          depending on what has been defined at compilation time. On
          most System V implementations, uname returns the nodename
          (e.g. host), while gethostname often returns the fully
          qualified domain name (FQDN, e.g. host.my.domain).
          See also the ServerName entry in the config file.

     -T title
          Use title as the document title and header for the HTML files.
          http-analyze appends the server name and the current summary
          period to this string. If left undefined, a default phrase is
          used. See also the DocTitle entry in the config file.

CONFIGURATION FILE
     When specified with the option -c, http-analyze reads some defaults
     from the named configuration file. Parameters defined with options
     always take precedence over the definitions in this configuration
     file. The configuration file contains one entry per line. Each
     entry has a name field and one or two value fields, which must be
     separated by one or more tabulator characters (not blanks!). All
     names are case-insensitive.

     ServerName
          The name of your server (same as option -S).

     HTTPLogFile
          The name of the server's logfile. Note that if you define a
          default name of the logfile, this file gets processed if no
          other file is explicitly defined at the invocation of
          http-analyze. Without this definition, http-analyze processes
          stdin if no file is given. To process stdin even if a default
          name has been defined, use `-' as the filename for the
          logfile.

     DefaultMode
          Defines the default operation mode of http-analyze. The value
          field contains either the keyword daily or monthly. If left
          undefined, the default is the daily mode (-d).

     Homepage
          Up to three alternate names for homepages in addition to
          index.html (same as option -H). All URLs containing one of the
          homepage names will get truncated so they merge with `/' or
          the base URL respectively.

     HTMLDir
          The name of the directory where the HTML output files should
          be created (same as option -o). If left undefined, files are
          created in the current directory.

     PrivateDir
          The name of a private directory where the detailed site and
          URL lists should be created (same as option -p).
          Access to this private directory may be granted to staff only
          by using server authentication. Pathnames not beginning with a
          `/' are relative to HTMLDir.

     TopSites, TopURLs, and LastURLs
          The number of entries in the top site, top URLs, and last
          frequently used URLs lists (same as option -N). If set to
          zero, the corresponding list will be suppressed.

     DocTitle
          The document title and header to use in the HTML output files
          (same as option -T). http-analyze appends the server's name
          and the current summary period to this string.

     HeadPrefix
          The prefix string to output before the document header (after
          the HTML <TITLE> tag). If HeadPrefix is defined, it must
          include the HTML <BODY> tag. If left undefined, HeadPrefix
          defaults to:

               HeadPrefix <BODY BGCOLOR="#D6D6D6"><P><HR SIZE="8">

     HeadSuffix
          The suffix string to output after the document header (after
          DocTitle). Useful if you define left- or right-aligned images
          in HeadPrefix with the headline floating around.

     DocTrailer
          The trailer string to output at end of page. Useful to define
          a link back to your homepage, as in

               DocTrailer <BR><FONT SIZE="-1"><A HREF="/">Back</A> to my homepage</FONT>

     HideSys and HideURL
          These two entries let you define names of sites or URLs which
          should be hidden under some arbitrary text. Hidden items are
          accounted for separately, but in the summary they appear
          grouped under the description defined here. Both entries have
          two value fields: the first field following the name defines a
          site or a URL and the second field defines the text under
          which this item is to be hidden. The URL/site may begin or end
          with a `*' as a wildcard. However, inside strings, a `*' is
          taken literally. If the text an item is hidden under begins
          with a `[' character, the item is not shown in the top
          sites/URLs lists, but it will always be shown in the detailed
          sites/URLs lists. Note that URLs are case-sensitive, while
          sitenames are not.
          Note also that images are hidden automatically unless the
          option -x is specified at invocation of http-analyze. See the
          sample.conf file for examples on how to use HideSys and
          HideURL.

EXAMPLES
     First of all, you must know the name of your server's logfile. If,
     for example, the name is /usr/ns-home/httpd-80/logs/access, you can
     create full statistics for the current month with the following
     command:

          http-analyze -vm -S www.myserver.com /usr/ns-home/httpd-80/logs/access

     This command will create a yearly summary in the file index.html
     (or statsYYYY.html for previous years) and a monthly summary in
     the file statsMMYY.html, where MM is replaced by the month and YY
     is replaced by the year. If the period determined by analyzing the
     logfile is the current month, http-analyze also creates an
     up-to-date daily summary in the file stats.html. All files are
     created in the current directory.

     Assuming that your old logfiles have been saved under the name
     logYYYY/access.MM in the server's log directory, use the commands

          cd /usr/ns-home/httpd-80/logs
          http-analyze -vmn -o /usr/htdocs/stats log1996/access.01

     to create full statistics for January '96 in the directory
     /usr/htdocs/stats while preserving the current history (option
     -n).

     Note: Generating statistics for previous summary periods without
     the -n option will overwrite newer values in the history file. To
     reconstruct the history, you would have to run http-analyze for
     each following month until the very last one (this situation may be
     avoided in a following version of the program).

     Note also that immediately after generating the statistics for the
     last month you should run http-analyze -m on the current logfile to
     create an up-to-date index file (index.html). Remember that this
     index file is created automatically only when creating a monthly
     summary for the current month.
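
     The history reconstruction described in the note above can be
     sketched as a small shell helper. This is only an illustration, not
     part of http-analyze: the function name rebuild_cmds is
     hypothetical, and it assumes the saved logfiles follow the
     logYYYY/access.MM layout used in the example above. It prints one
     command per month, oldest first, and deliberately omits -n so that
     each run updates the history.

```shell
#!/bin/sh
# Hypothetical helper (not part of http-analyze): print one
# "http-analyze -m" command per saved monthly logfile, oldest first.
# Assumes logfiles are saved as logYYYY/access.MM.
# Usage: rebuild_cmds YEAR FIRST_MONTH LAST_MONTH OUTDIR
# Pass month numbers without leading zeros (e.g. 8, not 08).
rebuild_cmds() {
    year=$1 first=$2 last=$3 outdir=$4
    m=$first
    while [ "$m" -le "$last" ]; do
        mm=$(printf '%02d' "$m")          # two-digit month, e.g. 01
        echo "http-analyze -m -o $outdir log$year/access.$mm"
        m=$((m + 1))
    done
}

# Print the commands for October through December 1996.
rebuild_cmds 1996 10 12 /usr/htdocs/stats
```

     The helper only prints the commands so they can be reviewed first;
     piping its output to sh would run them in ascending order of date.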

     The following command creates statistics for the months January
     through September of a year using a customized configuration file
     and reading the log entries from a pipe:

          gzcat log1996/access.0?.gz | http-analyze -vm -c /usr/local/bin/sample.conf -

REGULAR INVOCATION VIA CRON
     To have statistics generated on a regular basis, use the following
     scheme:

     1) Optionally install a cron job which calls http-analyze -d
        frequently to create a daily summary. The execution interval may
        range from once per day up to twice per hour depending on the
        size of your logfile and the time needed to analyze it. On my
        server, I run the daily statistics once per hour.

     2) Install a cron job which calls http-analyze -m to create a
        monthly summary once per week or once per day (again depending
        on the size of your logfile). Note that monthly summaries
        (statsMMYY.html) are created for the first time on the second
        day of a new month. On my server, I create a monthly summary
        twice per day.

     3) Create a script which rotates the server's logfile, restarts the
        http server, and creates the final summary for this period. Have
        cron execute this script at 00:00 on the first day of a new
        month. See the script rotate-httpd for an example on how to do
        this for several virtual web servers running on the same
        machine.

     4) Because of cron's scheduling overhead and delays in execution of
        the script which rotates the logfile, heavily used servers
        sometimes write a few entries for the new month into the old
        logfile. http-analyze usually ignores this kind of "white noise"
        at the end of a month. However, to get correct figures, in this
        last step you should run http-analyze -m on the logfile for the
        current month immediately after generating the statistics for
        the previous month.
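
     The order of operations in steps 3 and 4 above might be sketched as
     follows. This is a dry-run illustration under stated assumptions,
     not the rotate-httpd script itself: the function name rotate_steps
     and the restart command are hypothetical placeholders, and the
     paths are borrowed from the examples in this manual page. The
     sketch only prints the commands it would run.

```shell
#!/bin/sh
# Dry-run sketch of steps 3 and 4. It prints commands, it does not
# run them. rotate_steps and RESTART_CMD are hypothetical.
LOG=/usr/ns-home/httpd-80/logs/access   # live logfile (example path)
CONF=/usr/httpd/analyze.conf            # config file (example path)
RESTART_CMD="restart-httpd"             # placeholder: depends on your server

rotate_steps() {
    saved=$1    # destination of the old logfile, e.g. log1996/access.12
    echo "mv $LOG $saved"                    # step 3: rotate the logfile
    echo "$RESTART_CMD"                      # step 3: restart the http server
    echo "http-analyze -m -c $CONF $saved"   # step 3: final summary, old month
    echo "http-analyze -m -c $CONF $LOG"     # step 4: refresh current month
}

rotate_steps /usr/ns-home/httpd-80/logs/log1996/access.12
```

     Running the current-month report last means any stray entries for
     the new month are accounted for and index.html is brought up to
     date right after the rotation.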

     Note that the cron jobs must run with the uid of the owner of the
     directory where the HTML output files are going to be created,
     except for the rotate script, which usually must run with the uid
     of the server. You should also take care to avoid running more than
     one of the cron jobs related to http-analyze at the same time.

     Here are some sample crontab(1) entries for the scheme described
     above:

     # Generate a full report twice per day at 01:17 and 13:17
     17 1,13 * * * /usr/local/bin/http-analyze -m -c /usr/httpd/analyze.conf

     # Generate a short summary each hour except at 00:17, 01:17, and 13:17
     17 2-12 * * * /usr/local/bin/http-analyze -d -c /usr/httpd/analyze.conf
     17 14-23 * * * /usr/local/bin/http-analyze -d -c /usr/httpd/analyze.conf

     # Rotate the HTTPD logfiles at the first day, 00:00 of a new month
     0 0 1 * * /usr/local/bin/rotate-httpd

COPYRIGHT
     Copyright (c) 1996 by Stefan Stapelberg, RENT-A-GURU(Reg.)

     Permission to use, copy, modify, and distribute this software and
     its documentation for any purpose and without fee is hereby
     granted, provided that the above copyright notice appear in all
     copies and in all HTML output files, that both that copyright
     notice and this permission notice appear in the supporting
     documentation, and that the hypertext link to the homepage of
     http-analyze which the program produces is left intact.

     This software is provided "as is" without express or implied
     warranty.

     Credit for http-analyze must be given to RENT-A-GURU(Reg.) in all
     derived works. This does not affect your ownership of the derived
     work itself, and the intent is to assure proper credit for
     RENT-A-GURU(Reg.), not to interfere with your use of this software.
     If you have questions, ask.

     You may use this software at no cost on any installation, even at
     commercial sites.
     However, IT IS STRICTLY FORBIDDEN to sell or lease this software in
     whole or in part or to include it in whole or in part in a
     commercial product. If you plan to run http-analyze on a commercial
     installation and you need support, or if you would like to bundle
     the program with your products, you must sign an appropriate
     license agreement available from RENT-A-GURU(Reg.). Please send an
     email to <office@rent-a-guru.de>.

     RENT-A-GURU(Reg.) is a registered trademark of Martin Weitzel,
     Stefan Stapelberg, and Walter Mecky.

AUTHOR
     Stefan Stapelberg, <stefan@rent-a-guru.de>

CREDITS
     Thanks to the over 50 beta testers of http-analyze for their
     feedback. Special thanks to <Lars-Owe.Ivarsson@its.uu.se> for his
     suggestions to optimize the parser algorithm and the code he
     provided as an example. Thanks also to Thomas Boutell
     (http://www.boutell.com) for his great GD library for fast GIF
     creation, without which http-analyze couldn't produce such fancy
     graphics in the summary reports (gd 1.2 is copyright 1994, 1995,
     Quest Protein Database Center, Cold Spring Harbor Labs).

FILES
     Note: output files are always created in the directory given with
     the -o option, with the HTMLDir entry in the config file, or in the
     current directory (in this order). See also HTML OUTPUT FILES
     above.

     index.html       summary report for last 12 months
     statsYYYY.html   summary report for year YYYY
     stats.html       short summary (daily mode)
     statsMMYY.html   full summary for MM/YY (monthly mode)
     filesMMYY.html   list of all URLs requested in MM/YY
     sitesMMYY.html   list of all sites accessing the server in MM/YY
     stats.hist       the history file for the last 12 months and last
                      N days
     avloadMMYY.gif   the Hits by hours bar chart image (492x190)
     statsMMYY.gif    the Hits/Files/Sites/KB by day bar chart image
                      (492x317)
     cntryMMYY.gif    the Total transfers by Country pie chart image
                      (492x320)
     graphMMYY.gif    the Hits/Files/Sites/KB graph image (490x317)
     sq_*.gif         icons for creating bars in the full summary (10x8)
     gr-icon.gif      an icon for making links to your statistics page
                      (59x41)

NOTES
     If you are going to analyze different logfiles in one invocation of
     http-analyze, you must sort them in ascending order of their date;
     otherwise the logfiles processed after the first logfile will be
     silently ignored.

SEE ALSO
     3Dstats(8L)
          A 3D Access Statistics Generator

     http://www.netstore.de/Supply/http-analyze/
          The homepage of http-analyze

BUGS
     You tell me.

Formatted: April 26, 2024