Deleting a large number of files from a folder in Linux - filesystems

I am working on a Linux production environment where a folder contains about 20 million marker files, and these are increasing at a rate of about 10,000 per day.
I need to perform cleanup on this folder and delete all files older than 5 days.
I cannot delete the entire folder as it is an active production environment and is mounted on other servers as well.
I tried using a find command as below:
find /dirpath -name "*.fileExtension" -mtime 5 | xargs rm {}
and I also tried
find /dirpath -name "*.fileExtension" -mtime 5 | exec rm {}
but the rate at which these commands delete the files is very slow.
Is there a faster way to perform this activity?
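One approach that is often markedly faster (a sketch, not from the original question, and worth testing on non-critical data first) is to let find remove the matches itself instead of piping to rm, and to use -mtime +5 so that only files strictly older than 5 days are matched:

# GNU find: delete matches directly, with no separate rm process per file
find /dirpath -name "*.fileExtension" -type f -mtime +5 -delete

# If -delete is not available, batch the deletions and handle odd file names safely
find /dirpath -name "*.fileExtension" -type f -mtime +5 -print0 | xargs -0 rm -f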

Related

Plesk-Scheduled-Tasks reporting "No such file or directory"

I have a working Centos/Plesk (18.0.40 Update #1) environment running Plesk-Scheduled-Tasks with no problems, and I have a new machine that should be a duplicate of that machine (Plesk 18.0.42 Update #1) that is failing to run the Plesk-Scheduled-Tasks (reporting "No such file or directory" on all the tasks that I have added).
Eliminating as many permissions factors as possible, I am testing a scriptless task that runs "whoami": it works on the original machine but shows an "-: whoami: command not found" error message on the new one.
Note that I am declaring tasks at the domain level - if I add a top-level task (where it prompts you for the System user) it can use root and therefore works - but I do not want these tasks to run under root.
Clicking "Run Now" gives the following:
Hiho.
The scheduled tasks, and also the shell access if it's enabled for your subscription, usually run chrooted, so you only have a minimal set of commands available there.
If you open your subscription via an FTP client you should see a bin folder in there. The bin folder contains all the commands you are able to use in the chrooted shell.
Example on one of my subscriptions:
bash cat chmod cp curl du false grep groups gunzip gzip head id less ln ls
mkdir more mv pwd rm rmdir scp sh tail tar touch true unrar unzip vi wget
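For example, since whoami is not in that list but id is, a possible workaround (an illustration, not from the original answer) is to have the scheduled task call a command that does exist in the chroot:

# id is available in the chrooted bin folder and can stand in for whoami
id -un    # prints the effective user name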

Linux bash find and delete older than

I'm using this command to find and delete all files with the same name in the current directory and all its subfolders:
find . -name backupname.tar.gz -exec rm -rf {} \;
Now I want to find and delete all files with the same name, but keep the newest one.
I need it to keep the newest backup of my sites and delete all the old backups.
How can I do that?
Thanks
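One possible approach (a sketch, not from the original thread; it assumes GNU find, sort and head, and file names without embedded newlines) is to list the matches sorted by modification time, drop the newest one, and delete the rest:

# Print "<mtime> <path>" for each match, oldest first, drop the newest (last line),
# strip the timestamp, then delete what remains
find . -name backupname.tar.gz -printf '%T@ %p\n' \
  | sort -n \
  | head -n -1 \
  | cut -d' ' -f2- \
  | xargs -r -d '\n' rm -f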

archive_cleanup_command does not clear the archived wal files

Main question:
archive_cleanup_command in the postgresql.conf file does not clear the archived WAL files. How can I get it to clear them?
Relevant information:
My OS is Linux, Ubuntu v18.04 LTS.
The database is PostgreSQL version 13.
My current settings:
/etc/postgresql/13/main/postgresql.conf file:
wal_level = replica
wal_compression = on
wal_recycle = on
checkpoint_timeout = 5min
max_wal_size = 1GB
min_wal_size = 80MB
archive_mode = on
archive_command = 'pxz --compress --keep --force -6 --to-stdout --quiet %p > /datadrive/postgresql/13/wal_archives/%f.xz'
archive_timeout = 10min
restore_command = 'pxz --decompress --keep --force -6 --to-stdout --quiet /datadrive/postgresql/13/wal_archives/%f.xz > %p'
archive_cleanup_command = 'pg_archivecleanup -d -x .xz /datadrive/postgresql/13/wal_archives %r >> /datadrive/postgresql/13/wal_archives/archive_cleanup_command.log 2>&1'
archive_cleanup_command.log has 777 permissions.
I have a master database doing logical replication with a publication and a slave database subscribing to that publication. It is on the slave that I am intending to do the archiving and restore points.
What am I expecting to happen?
The checkpoint_timeout setting in the postgresql.conf file means that a restart point is created at least every 5 minutes, and the archive_timeout setting of 10 minutes means that PostgreSQL forces a log file segment switch every 10 minutes. Therefore, at least every 10 minutes, a restart point is created. Whenever a restart point is created, the archive_cleanup_command is run, and when it runs it should clear all the .xz files older than that restart point. Therefore the wal_archives directory should not really have .xz files older than 20 minutes, or even 2 hours.
What is actually happening?
The /datadrive/postgresql/13/wal_archives directory piles up with lots of .xz files that never get cleared.
cat archive_cleanup_command.log shows an empty file. Nothing is ever writing to it.
When I run the pg_archivecleanup command manually via bash, it works (i.e. it clears all the archive files before the one specified, and cat archive_cleanup_command.log shows the files that were cleared).
Example:
pg_archivecleanup -d -x .xz /datadrive/postgresql/13/wal_archives 000000010000045E000000E5 >> /datadrive/postgresql/13/wal_archives/archive_cleanup_command.log 2>&1
Then running cat archive_cleanup_command.log gives this:
pg_archivecleanup: keeping WAL file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E5" and later
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000DE.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000DF.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E0.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E1.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E2.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E3.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E4.xz"
What have I tried?
I have tried various permission settings (for example, chmod 777 on the wal_archives directory, adding other users to the postgres group, etc.).
Extensively and thoroughly read the PostgreSQL documentation and looked at at least 20 different related Stack Overflow posts.
Initially tried the 7zip command-line tool to do the compression instead of pxz.
Successfully restarted the database multiple times using the following commands:
sudo systemctl stop postgresql@13-main
sudo systemctl start postgresql@13-main
Dropped the logical replication and re-created the publication on the master and subscription on the slave.
Enabled checkpoints on the master itself.
Looked at /var/log/postgresql/postgresql-13-main.log. Unfortunately no relevant errors show up in this log.
Restartpoints, restore_command and archive_cleanup_command only apply to streaming ("physical") replication, or to recovery in general, not to logical replication.
A logical replication standby is not in recovery; it is open for reading and writing. In that state, recovery settings like archive_cleanup_command are ignored.
You will have to find another mechanism to delete old WAL archives, ideally in combination with your backup solution.
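As an illustration only (not part of the original answer; the retention period and schedule are placeholders and must be aligned with your backup strategy so that no base backup still needs the removed segments), a time-based cron job is one such mechanism:

# Run daily at 03:00: remove compressed WAL archives older than 7 days
0 3 * * * find /datadrive/postgresql/13/wal_archives -name '*.xz' -mtime +7 -delete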

cron job to move 10-day-old files to a different directory

I need a script that I can run from a cron job to move files that are 10 days old or older to a different directory. Being a Windows sysadmin, I have no idea what I'm doing in Linux. :( Can someone point me in the right direction?
Thanks in advance.
find /source/directory/* -mtime +10 -exec mv "{}" /my/directory \;
Use 'crontab -e' to open your crontab and set it to run as needed.
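As a hedged example (the schedule is a placeholder; the paths are taken from the command above), the crontab entry could look like this, running the move once a day at 02:00:

# m h dom mon dow   command
0 2 * * * find /source/directory/* -mtime +10 -exec mv "{}" /my/directory \;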

How to get the current dev_appserver version?

How can I get the GAE SDK to tell me what version it is? I could not find anything like this:
dev_appserver.py --version
Note that this is different from os.environ['CURRENT_VERSION_ID'], which returns the application version, and it seems that os.environ['SERVER_SOFTWARE'] always returns Development/1.0 when I run it inside the Interactive Console.
I would like to create a GAE SDK updater script that performs the following logic:
Checks to see what the latest version of the GAE SDK for Python on Linux is (as of this writing, 1.7.5, which is available for download at https://storage.googleapis.com/appengine-sdks/deprecated/175/google_appengine_1.7.5.zip).
Checks the currently installed version of the GAE SDK.
If the available version > installed version, downloads the latest package and unzips it into the correct directory.
If there is no "supported" way to do step #1, I am willing to hard-code the "latest version" in the script, but I still only want to download/install it once even if the script itself is run multiple times. In other words, the script should be idempotent.
The directory where the GAE SDK zip is unpacked contains a VERSION file with the following contents:
release: "1.7.5"
timestamp: 1357690550
api_versions: ['1']
So I wrote a script to pull the version out of there:
#!/bin/sh
# Read the installed release from the local VERSION file and compare it against the known latest.
INSTALLEDVERSION=`cat /usr/local/google_appengine/VERSION | grep release | cut -d: -f 2 | cut -d\" -f 2`
LATESTVERSION="1.7.5"
if [ "$INSTALLEDVERSION" != "$LATESTVERSION" ]; then
    echo "Update GAE SDK"
fi
Or, you can use this to obtain the version string on non-default installs, but readlink may not work correctly on Linux:
INSTALLEDDIR=`which dev_appserver.py | xargs readlink | xargs dirname`
INSTALLEDVERSION=`cat $INSTALLEDDIR/VERSION | grep release | cut -d: -f 2 | cut -d\" -f 2`
But this still does not provide a way to perform step 1, which would query the web for the latest version and do auto-updating.
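Lacking a supported way to query the latest version, a sketch of the hard-coded, idempotent fallback described in the question (the download URL pattern is copied from the question and may change; the install location under /usr/local is an assumption) could look like this:

#!/bin/sh
# Idempotent updater: only download and unpack if the installed release differs
# from the hard-coded latest version.
LATESTVERSION="1.7.5"
INSTALLEDVERSION=`grep release /usr/local/google_appengine/VERSION | cut -d\" -f 2`
if [ "$INSTALLEDVERSION" != "$LATESTVERSION" ]; then
    VERNODOTS=`echo "$LATESTVERSION" | tr -d .`
    # URL pattern taken from the question; verify it before relying on it
    wget -q "https://storage.googleapis.com/appengine-sdks/deprecated/${VERNODOTS}/google_appengine_${LATESTVERSION}.zip" -O /tmp/google_appengine.zip
    unzip -oq /tmp/google_appengine.zip -d /usr/local
fi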
