AWStats Data File Shrinker
People who run AWStats on large log files have most likely noticed: the data files can grow quite large, resulting in both a waste of disk space and longer page generation times for the AWStats pages. I wrote a small script that analyzes these data files and can remove any information you think is unnecessary.
Download: awshrink ⓘ (copy to /usr/bin to install).
Important
Do NOT use this script on data files that are not completed yet (i.e. data files of the month you’re living in). This will result in inaccurate sorting of visits, pages, referers and whatever other list you’re shrinking. Also, keep in mind that this is just a fast written perl hack, it is by no means fast and may hog some memory while shrinking data files.
Usage
awshrink [-c -s] [-SECTION LINES] [..] datafile
-s Show statistics
-c Overwrite datafile instead of writing to a backupfile (datafile~)
-SECTION LINES
Shrink the selected SECTION to LINES lines. (See example below)
Typical command-line usage
While awshrink is most useful for monthly cron jobs, here’s an example of basic command line usage to demonstrate what the script can do:
$ wc -c awstats122007.a.txt
29916817 awstats122007.a.txt
$ awshrink -s awstats122007.a.txt
Section Size (Bytes) Lines
SCREENSIZE* 74 0
WORMS 131 0
EMAILRECEIVER 135 0
EMAILSENDER 143 0
CLUSTER* 144 0
LOGIN 155 0
ORIGIN* 178 6
ERRORS* 229 10
SESSION* 236 7
FILETYPES* 340 12
MISC* 341 10
GENERAL* 362 8
OS* 414 29
SEREFERRALS 587 34
TIME* 1270 24
DAY* 1293 31
ROBOT 1644 40
BROWSER 1992 127
DOMAIN 2377 131
UNKNOWNREFERERBROWSER 5439 105
UNKNOWNREFERER 20585 317
SIDER_404 74717 2199
PAGEREFS 130982 2500
KEYWORDS 288189 27036
SIDER 1058723 25470
SEARCHWORDS 5038611 157807
VISITOR 23285662 416084
* = not shrinkable
$ awshrink -s -c -VISITOR 100 -SEARCHWORDS 100 -SIDER 100 awstats122007.a.txt
Section Size (Bytes) Lines
SCREENSIZE* 74 0
WORMS 131 0
EMAILRECEIVER 135 0
EMAILSENDER 143 0
CLUSTER* 144 0
LOGIN 155 0
ORIGIN* 178 6
ERRORS* 229 10
SESSION* 236 7
FILETYPES* 340 12
MISC* 341 10
GENERAL* 362 8
OS* 414 29
SEREFERRALS 587 34
TIME* 1270 24
DAY* 1293 31
ROBOT 1644 40
BROWSER 1992 127
SEARCHWORDS 2289 100
DOMAIN 2377 131
SIDER 3984 100
UNKNOWNREFERERBROWSER 5439 105
VISITOR 5980 100
UNKNOWNREFERER 20585 317
SIDER_404 74717 2199
PAGEREFS 130982 2500
KEYWORDS 288189 27036
* = not shrinkable
$ wc -c awstats122007.a.txt
546074 awstats122007.a.txt