projects@yorhel.nl

AWStats Data File Shrinker

Anyone who runs AWStats on large log files has most likely noticed that the data files can grow quite large, which both wastes disk space and slows down generation of the AWStats pages. I wrote a small script that analyzes these data files and can remove any information you think is unnecessary.

Download: awshrink (copy to /usr/bin to install).

Important

Do NOT use this script on data files that are not complete yet (i.e. the data file of the current month). Doing so will result in inaccurate sorting of visits, pages, referers and whatever other list you’re shrinking. Also, keep in mind that this is just a quickly written Perl hack: it is by no means fast and may hog some memory while shrinking data files.

Usage

awshrink [-c -s] [-SECTION LINES] [..] datafile
-s  Show statistics
-c  Overwrite datafile instead of writing to a backupfile (datafile~)
-SECTION LINES
  Shrink the selected SECTION to LINES lines. (See example below)
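
For unattended use, a wrapper along the following lines could pick the data file of the previous month, which is always complete, and shrink it in place. This is only a rough sketch and not part of awshrink: the function names, the data directory, the install path in the crontab comment and the limit of 100 lines are all assumptions, and it relies on GNU date for the -d option.

```shell
#!/bin/sh
# Hypothetical monthly wrapper around awshrink (not shipped with the script).

# Print the MMYYYY stamp of the month before the given date (default: today).
# Anchoring on the 15th avoids end-of-month surprises with "-1 month".
# Requires GNU date.
prev_month_stamp() {
    ref=${1:-$(date +%Y-%m-%d)}
    date -d "$(date -d "$ref" +%Y-%m-15) -1 month" +%m%Y
}

# Shrink last month's data file in place.
#   $1 = AWStats data directory (assumed, e.g. /var/lib/awstats)
#   $2 = AWStats config name (the "a" in awstats122007.a.txt)
shrink_last_month() {
    file="$1/awstats$(prev_month_stamp).$2.txt"
    [ -f "$file" ] || return 0   # nothing to do if the file does not exist
    awshrink -c -VISITOR 100 -SEARCHWORDS 100 -SIDER 100 "$file"
}

# Only act when called with both arguments, e.g. from a crontab entry like:
#   0 3 2 * * /usr/local/bin/awshrink-monthly /var/lib/awstats a
if [ $# -eq 2 ]; then
    shrink_last_month "$1" "$2"
fi
```

Computing the file name from the previous month, rather than globbing over all data files, keeps the current month's file out of reach of the script.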

Typical command-line usage

While awshrink is most useful for monthly cron jobs, here’s an example of basic command-line usage to demonstrate what the script can do:

$ wc -c awstats122007.a.txt
29916817 awstats122007.a.txt

$ awshrink -s awstats122007.a.txt
               Section  Size (Bytes)   Lines
           SCREENSIZE*            74       0
                WORMS            131       0
        EMAILRECEIVER            135       0
          EMAILSENDER            143       0
              CLUSTER*           144       0
                LOGIN            155       0
               ORIGIN*           178       6
               ERRORS*           229      10
              SESSION*           236       7
            FILETYPES*           340      12
                 MISC*           341      10
              GENERAL*           362       8
                   OS*           414      29
          SEREFERRALS            587      34
                 TIME*          1270      24
                  DAY*          1293      31
                ROBOT           1644      40
              BROWSER           1992     127
               DOMAIN           2377     131
UNKNOWNREFERERBROWSER           5439     105
       UNKNOWNREFERER          20585     317
            SIDER_404          74717    2199
             PAGEREFS         130982    2500
             KEYWORDS         288189   27036
                SIDER        1058723   25470
          SEARCHWORDS        5038611  157807
              VISITOR       23285662  416084
* = not shrinkable

$ awshrink -s -c -VISITOR 100 -SEARCHWORDS 100 -SIDER 100 awstats122007.a.txt
               Section  Size (Bytes)   Lines
           SCREENSIZE*            74       0
                WORMS            131       0
        EMAILRECEIVER            135       0
          EMAILSENDER            143       0
              CLUSTER*           144       0
                LOGIN            155       0
               ORIGIN*           178       6
               ERRORS*           229      10
              SESSION*           236       7
            FILETYPES*           340      12
                 MISC*           341      10
              GENERAL*           362       8
                   OS*           414      29
          SEREFERRALS            587      34
                 TIME*          1270      24
                  DAY*          1293      31
                ROBOT           1644      40
              BROWSER           1992     127
          SEARCHWORDS           2289     100
               DOMAIN           2377     131
                SIDER           3984     100
UNKNOWNREFERERBROWSER           5439     105
              VISITOR           5980     100
       UNKNOWNREFERER          20585     317
            SIDER_404          74717    2199
             PAGEREFS         130982    2500
             KEYWORDS         288189   27036
* = not shrinkable

$ wc -c awstats122007.a.txt
546074 awstats122007.a.txt
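
As a quick sanity check on the numbers above, the bytes saved in the three shrunk sections should add up to the difference between the two wc results. The snippet below only redoes that arithmetic with the figures from the tables; the variable names are mine.

```shell
before=29916817   # size before shrinking (first wc -c)
after=546074      # size after shrinking (second wc -c)

# Bytes saved per shrunk section: VISITOR, SEARCHWORDS, SIDER
saved=$(( (23285662 - 5980) + (5038611 - 2289) + (1058723 - 3984) ))

echo "saved in sections: $saved"                # 29370743
echo "file size difference: $(( before - after ))"   # 29370743
awk -v b="$before" -v a="$after" \
    'BEGIN { printf "reduction: %.1f%%\n", (1 - a / b) * 100 }'   # 98.2%
```

So the whole reduction of roughly 98% comes from trimming just those three sections to 100 lines each.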