Import old apache access logs to webalizer - ignoring records - apache2

I installed webalizer on my Apache 2 web server yesterday and ran into the problem that none of the old access logs are used. The directory listing looks like this:
/var/log/apache2/
access.log
access.log1
access.log.10.gz
access.log.11.gz
...
How can I import all my files at once?
I tried several things, but webalizer kept telling me that the records were ignored.
Hope someone can help. Thanks!

I ran into the same problem. I had just installed webalizer, and changed it to incremental mode (here are the relevant entries from my /etc/webalizer/webalizer.conf):
LogFile /var/log/apache2/access.log.1
OutputDir /var/www/htdocs/w
Incremental yes
IncrementalName webalizer.current
And then I ran webalizer by hand, which initialized the non-gz files in my logs directory. After that, any attempt to manually import an older gz logfile (by running webalizer /var/log/apache2/access.log.2.gz for instance) resulted in all of the entries being ignored.
I suspect this is because the entries found in the gz logs were older than the last import. I had to delete my webalizer.current file (really, I cleared the whole directory; either way should work). After that, in reverse order (oldest first), I could import the old gz files one at a time:
bhs128#home:~$ cd /var/log/apache2
bhs128#home:/var/log/apache2$ sudo rm -rf /var/www/htdocs/w/*
bhs128#home:/var/log/apache2$ ls -1t /var/log/apache2/access.log*gz | grep -o '[0-9]*' | tail -n1
52
bhs128#home:/var/log/apache2$ for i in {52..2}; do webalizer /var/log/apache2/access.log.$i.gz; done
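If you'd rather not hard-code the highest suffix, the order can be derived automatically. This is just a sketch, assuming the standard logrotate naming shown above, where a higher number means an older file:
# Import every rotated gz log, oldest first (highest suffix = oldest)
for f in $(ls /var/log/apache2/access.log.*.gz | sort -t. -k3,3 -rn); do
    webalizer "$f"
done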

I just had the same problem, and I took a look into the webalizer.current file:
$ head -n 2 webalizer.current
# Webalizer V2.21-02 Incremental Data - 11/05/2019 22:29:02
2019 11 5 22 29 2
The second line seems to contain the timestamp of the last run, so I just changed the year to 2018. After that, I was able to import older log files than the last imported ones, without having to delete all the data first.
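If you prefer not to edit the file by hand, the same tweak can be scripted. A minimal sketch with sed, assuming the header format shown above and that webalizer.current sits in your OutputDir (the /var/www/htdocs/w path is taken from the earlier answer; adjust the years and path for your own data):
# Rewind the recorded year on line 2 from 2019 to 2018
sed -i '2s/^2019 /2018 /' /var/www/htdocs/w/webalizer.current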

Related

How to track Icecast2 visits with Matomo?

My beloved web radio has an icecast2 instance and it just works. We also have a Matomo instance to track visits on our WordPress website, using only Free/Libre and open source software.
The main issue is that, since Matomo tracks visits via JavaScript, direct visits to the web-radio stream are not intercepted by Matomo by default.
How to use Matomo to track visits to Icecast2 audio streams?
Yep, it's possible. Here's my way.
First of all, try the Matomo internal import script. Be sure to set your --idsite= and the correct path to your Matomo installation:
su www-data -s /bin/bash
python2.7 /var/www/matomo/misc/log-analytics/import_logs.py --show-progress --url=https://matomo.example.com --idsite=1 --recorders=2 --enable-http-errors --log-format-name=icecast2 --strip-query-string /var/log/icecast2/access.log
NOTE: if you see this error:
[INFO] Error when connecting to Matomo: HTTP Error 400: Bad Request
be sure to have all the needed plugins activated:
Administration > System > Plugins > Bulk plugin
So, if the script works, it should start printing something like this:
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
Parsing log /var/log/icecast2/access.log...
1013 lines parsed, 200 lines recorded, 99 records/sec (avg), 200 records/sec (current)
If so, immediately stop the script (with CTRL+C) so that you don't import duplicate entries before the definitive solution is in place.
Now we need to run this script every time the log is rotated, before rotation.
The official documentation suggests a crontab, but I don't recommend that solution; I suggest configuring logrotate instead.
Configure the file /etc/logrotate.d/icecast2. From:
/var/log/icecast2/*.log {
...
weekly
...
}
To:
/var/log/icecast2/*.log {
...
daily
prerotate
su www-data -s /bin/bash --command 'python2.7 ... /var/log/icecast2/access.log' > /var/log/logrotate-icecast2-matomo.log
endscript
...
}
IMPORTANT: In the above example replace ... with the right command.
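For reference, a filled-in prerotate block might look like the following, reusing the exact import command from earlier (the Matomo path, URL, and --idsite=1 are the assumptions from that example; keep whatever other directives your distribution already ships in this file):
/var/log/icecast2/*.log {
        daily
        prerotate
                su www-data -s /bin/bash --command 'python2.7 /var/www/matomo/misc/log-analytics/import_logs.py --url=https://matomo.example.com --idsite=1 --recorders=2 --enable-http-errors --log-format-name=icecast2 --strip-query-string /var/log/icecast2/access.log' > /var/log/logrotate-icecast2-matomo.log
        endscript
}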
Now you can also try it manually:
logrotate -vf /etc/logrotate.d/icecast2
From another terminal you should be able to see its result in real-time with:
tail -f /var/log/logrotate-icecast2-matomo.log
If it works, everything will run automatically from now on, importing all visits every day without duplicates and without missing any lines.
More documentation about the import script itself:
https://github.com/matomo-org/matomo-log-analytics
More documentation about logrotate:
https://linux.die.net/man/8/logrotate

Greenplum: Purging database Logs

Is there any direct utility available to purge older logs from a GP database? If I do it manually it takes a lot of time: there are 100+ segments, and I have to go to each server and delete the log files by hand.
Other details: GP version - 4.3.X.X(Software Only Solution)
Cluster Config- 2+10
Thanks
I suggest you create a cron job and use gpssh to do this. For example:
gpssh -f ~/host_list -e 'for i in $(find /data/primary/gpseg*/pg_log/ -name "*.csv" -ctime +60); do rm $i; done'
This will remove files in pg_log on all segments that are over 2 months old. Of course, you should test this and make sure the path to pg_log is correct.
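A minimal sketch of how that could be scheduled, using the gpadmin user's crontab and find's -delete flag in place of the loop (same paths as above; test it before trusting it):
# Every Sunday at 03:00, purge segment csv logs older than 60 days
0 3 * * 0 gpssh -f ~/host_list -e 'find /data/primary/gpseg*/pg_log/ -name "*.csv" -ctime +60 -delete'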

Mercurial getting file-specific log/history information

Is there a way to get file-specific information, similar to
hg log
I basically want the committer, date/time, and the commit summary, but for just a single file.
You can filter the results of the hg log command by including a filename like so:
hg log file.txt
That will give you the standard log for every changeset where file.txt was changed. You can use
hg log file.txt -l 10 -r "not merge()"
to limit it to only the last 10 changesets, as well as to exclude merge changesets using revsets.
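If you only want the committer, date, and summary line, a template trims the output to exactly those fields. A minimal sketch (the template keywords are standard Mercurial; the layout is just an example):
hg log file.txt -l 10 -r "not merge()" --template "{author}  {date|isodate}  {desc|firstline}\n"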

copy SVN modified files including directory to a another directory

I have a list of files in my current working copy that have been modified locally. There are about 50 files that have been changed.
I am using the following command to copy files that have been modified in Subversion to a folder called /backup. Is there a way to do this while maintaining the directories they are in, so that it does something similar to exporting an SVN diff of files? For example, if I changed a file called /usr/lib/SPL/RFC.php, it would also copy the usr/lib/SPL directory structure to backup.
cp `svn st | ack '^M' | cut -b 8-` backup
It looks strange, but it is really easy to copy files with tar. E.g.
tar -cf - $( svn st | ack '^M' | cut -b 8- ) |
tar -C /backup -xf -
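If you have GNU coreutils, cp can do the path preservation itself via --parents, which recreates each file's directory chain under the target. A sketch under that assumption, reusing the same file list:
svn st | ack '^M' | cut -b 8- | xargs cp --parents -t backup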
Why not create a patch of your changes? That way you have one file containing all of your changes which you can timestamp in the name, something like 2012-05-28-17-30-00-UnitTestChanges.patch, one per day.
Then you can roll up your changes to a fresh checkout once you're ready, and then commit them.
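A minimal sketch of that workflow, assuming a Subversion 1.7+ client for svn patch (older clients can apply the file with patch -p0 instead):
# Save today's local modifications as one timestamped patch
svn diff > 2012-05-28-17-30-00-UnitTestChanges.patch
# Later, in a fresh checkout, re-apply and commit
svn patch 2012-05-28-17-30-00-UnitTestChanges.patch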
FYI: Subversion 1.8 should have checkpointing / shelving (which is what you seem to want to do), but that's a long way off, and might only be added in Subversion 1.9.

Mercurial, stop versioning cache directory but keep directory

I have a CakePHP project under Mercurial version control. Right now all the files in the app/tmp directory are being versioned, and they are always changing.
I do not want to version control these files.
I know I can stop by running hg forget app/tmp/*
But this will also forget the file structure, which I want to keep.
Now I know that Mercurial doesn't version directories, just files, but the CakePHP folks were also smart enough to put an empty file called empty in every empty directory (I am guessing for this reason).
So what I want to do is tell Mercurial to forget every file under app/tmp except files whose name is exactly empty.
What would the command be for this?
Well, if nothing else works, you can always just ask Mercurial to forget everything, and then revert empty before committing:
Here's how I reproduced it. First, create the initial repo:
hg init
md app
md app\tmp
echo a>app\empty
echo a>app\tmp\empty
hg commit -m "initial" -A
Then add some files we later want to get rid of:
echo a >app\tmp\test1.txt
echo a >app\tmp\test2.txt
hg commit -m "adding" -A
Then forget the files we don't want:
hg forget app\tmp\*
hg status <-- will show all 3 files
hg revert app\tmp\empty
hg status <-- now empty is gone
echo glob:app/tmp>.hgignore
hg commit -m "ignored" -A
Note that all .hgignore does is prevent Mercurial from discovering new files during addremove or commit -A; if you have explicitly tracked files that match your ignore filter, Mercurial will still track changes to them.
In other words, even though I asked Mercurial to ignore app/tmp above, the file empty inside will not be ignored, or removed, since I have explicitly asked Mercurial to track it.
At least theoretically (I don't have time to try it right now), pattern matching should work with the hg forget command. So, you could do something like hg forget -X empty while in the directory (-X means "exclude").
You may want to consider using .hgignore, of course.
Since you only need to do it once, I'd just do this:
find app/tmp -type f | grep -v empty | xargs hg forget
hg commit
From then on, just put this in your .hgignore:
^app/tmp
Mercurial has built-in support for globbing and regexes, as explained in the relevant chapter of the Mercurial book. The Python regex implementation is used.
This should work for you:
hg forget "re:app/tmp/.*(?<!/empty)$"
