Main question:
archive_cleanup_command in the postgresql.conf file does not clear the archived wal files. How can I get it to clear the archived wal files?
Relevant information:
My OS is Linux, Ubuntu v18.04 LTS.
Database is Postgresql version 13
My current settings:
/etc/postgresql/13/main/postgresql.conf file:
wal_level = replica
wal_compression = on
wal_recycle = on
checkpoint_timeout = 5min
max_wal_size = 1GB
min_wal_size = 80MB
archive_mode = on
archive_command = 'pxz --compress --keep --force -6 --to-stdout --quiet %p > /datadrive/postgresql/13/wal_archives/%f.xz'
archive_timeout = 10min
restore_command = 'pxz --decompress --keep --force -6 --to-stdout --quiet /datadrive/postgresql/13/wal_archives/%f.xz > %p'
archive_cleanup_command = 'pg_archivecleanup -d -x .xz /datadrive/postgresql/13/wal_archives %r >> /datadrive/postgresql/13/wal_archives/archive_cleanup_command.log 2>&1'
archive_cleanup_command.log has 777 permissions.
I have a master database doing logical replication with a publication and a slave database subscribing to that publication. It is on the slave that I am intending to do the archiving and restore points.
What am I expecting to happen?
The checkpoint_timeout setting in postgresql.conf means that a restart point is created at least every 5 minutes, and the archive_timeout setting of 10 minutes means that PostgreSQL forces a log file segment switch at least every 10 minutes. Therefore, at least every 10 minutes, a restart point is created. Whenever a restart point is created, archive_cleanup_command is run, and it should remove all .xz files older than that restart point. So the wal_archives directory should not really contain .xz files older than 20 minutes, or even 2 hours.
What is actually happening?
The /datadrive/postgresql/13/wal_archives directory piles up with lots of .xz files that never get cleared.
cat archive_cleanup_command.log shows an empty file. Nothing is ever writing to it.
When I run the pg_archivecleanup command manually via bash, it works (i.e. it clears all the archive files before the one specified, and cat archive_cleanup_command.log shows the files that were cleared).
Example:
pg_archivecleanup -d -x .xz /datadrive/postgresql/13/wal_archives 000000010000045E000000E5 >> /datadrive/postgresql/13/wal_archives/archive_cleanup_command.log 2>&1
Then running cat archive_cleanup_command.log gives this:
pg_archivecleanup: keeping WAL file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E5" and later
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000DE.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000DF.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E0.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E1.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E2.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E3.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E4.xz"
What have I tried?
I have tried various permission settings (examples: chmod 777 the wal_archive directory, add other users to the postgres group, etc...)
Read the PostgreSQL documentation extensively and thoroughly, and looked at at least 20 different related Stack Overflow posts.
Initially tried 7zip cmd line tool to do the zipping instead of pxz.
Successfully restarted the database multiple times using the following commands:
sudo systemctl stop postgresql@13-main
sudo systemctl start postgresql@13-main
Dropped the logical replication and re-created the publication on the master and subscription on the slave.
Enabled checkpoints on the master itself.
Looked at /var/log/postgresql/postgresql-13-main.log. Unfortunately no relevant errors show up in this log.
Restartpoints, restore_command and archive_cleanup_command only apply to streaming ("physical") replication, or to recovery in general, not to logical replication.
A logical replication standby is not in recovery, it is open for reading and writing. In that status, recovery settings like archive_cleanup_command are ignored.
You will have to find another mechanism to delete old WAL archives, ideally in combination with your backup solution.
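As one possible "other mechanism", here is a minimal sketch of an age-based cleanup that could be run from cron. The function name prune_wal_archive and the 7-day retention in the usage line are hypothetical; it assumes GNU find, and it assumes that every base backup you still want to restore from is younger than the retention period, otherwise segments needed for recovery would be deleted.

```shell
# Hypothetical helper: delete compressed WAL segments older than a given age.
# $1 = archive directory, $2 = retention in days.
# Assumption: every base backup you still need is younger than the retention,
# so no segment required to restore it gets removed.
prune_wal_archive() {
    # -maxdepth 1 : do not descend into subdirectories
    # -name '*.xz': only touch compressed segments, not logs or other files
    # -mtime +N   : age in whole days is greater than N
    find "$1" -maxdepth 1 -type f -name '*.xz' -mtime +"$2" -print -delete
}

# Example call (hypothetical retention of 7 days):
# prune_wal_archive /datadrive/postgresql/13/wal_archives 7
```

The function prints each file it removes, so the output can be appended to a log file in the same way as the pg_archivecleanup call in the question.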
I have a working Centos/Plesk (18.0.40 Update #1) environment running Plesk-Scheduled-Tasks with no problems, and I have a new machine that should be a duplicate of that machine (Plesk 18.0.42 Update #1) that is failing to run the Plesk-Scheduled-Tasks (reporting "No such file or directory" on all the tasks that I have added).
To eliminate as many permission factors as possible, I am testing a scriptless task that just runs "whoami": it works on the original machine but shows a "-: whoami: command not found" error message on the new one.
Note, I am also declaring tasks at the domain level - if I was to add a top level task (where it prompts you for the System user) then it can use root and therefore works - but I do not want these tasks to run under root.
Clicking "Run Now" gives the following:
Hiho.
Scheduled tasks, and also shell access if it's enabled for your subscription, mostly run chrooted. So you only have a minimal set of commands that you can use there.
If you open your subscription via FTP Client you should see a bin folder in there. In the bin folder are all commands you are able to use in the chrooted shell.
Example on one of my subscriptions:
bash cat chmod cp curl du false grep groups gunzip gzip head id less ln ls
mkdir more mv pwd rm rmdir scp sh tail tar touch true unrar unzip vi wget
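To see which commands a task can actually use, something like the following could be run as a scheduled task in the subscription. This is only a sketch: check_commands is a hypothetical helper name, and it assumes the chrooted shell provides the command builtin (bash and POSIX sh do).

```shell
# Hypothetical helper: report which of the given commands can be found
# in the current (possibly chrooted) shell environment.
check_commands() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd: available"
        else
            echo "$cmd: missing"
        fi
    done
}

# Example call:
# check_commands whoami id ls
```

In the listing above, id is present and whoami is not, so on such a subscription id -un would be the way to print the user name.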
I am trying to back up my Kiwi TCMS data following the steps at http://kiwitcms.org/blog/atodorov/2018/07/30/how-to-backup-docker-volumes-for-kiwi-tcms/. I have some questions and need help.
1. What kind of data is stored in kiwi_uploads? Should I also run "docker volume rm kiwi_uploads" before restoring it, the same as in the "Backing up the database" steps?
2. The errors below occur when restoring kiwi_uploads using "cat uploads.tar | docker exec -i kiwi_web /bin/tar -x". Even with the errors, I can log in and find my previous data (plans, runs, test cases...). Of course, I restored kiwi_db_data successfully.
cat uploads.tar | docker exec -i kiwi_web /bin/tar -x
/bin/tar: This does not look like a tar archive
/bin/tar: Skipping to next header
/bin/tar: Exiting with failure status due to previous errors
3. "cat database.json | docker exec -i kiwi_web /Kiwi/manage.py loaddata --format json -". There is no parameter after the last -. Is something missing, or is the command correct as written?
1) kiwi_uploads is for all files that are uploaded (or attached) to documents like Test Plan, Test Case, etc.
The instructions in the blog should work for you. Usually there's no need to remove the volume but if you are restoring everything it doesn't really matter.
2) For the errors you have
/bin/tar: This does not look like a tar archive
so whatever file you ended up with is not a tar archive and everything else fails.
3) The last - means to read the input data from stdin. You have to copy the backup and restore commands verbatim.
All commands are designed to be executed from a Linux host. I don't have access to a Windows or Mac OS box so I don't know if they will work there at all.
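A quick way to catch the "This does not look like a tar archive" case before restoring is to list the archive first. A minimal sketch, assuming GNU tar on the host; is_tar_archive is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if the file can be listed as a tar archive.
is_tar_archive() {
    tar -tf "$1" >/dev/null 2>&1
}

# Example use before restoring the uploads volume:
# if is_tar_archive uploads.tar; then
#     cat uploads.tar | docker exec -i kiwi_web /bin/tar -x
# else
#     echo "uploads.tar is not a valid tar archive, re-create the backup" >&2
# fi
```

If the check fails, the backup step itself produced a bad file, so the backup command needs to be re-run rather than the restore.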
EDIT-2
I found out that the database doesn't even start after making the file location change.
This is with the default file location:
$pg_isready
/var/run/postgresql:5432 - accepting connections
$pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
9.5 main 5432 online postgres /var/lib/postgresql/9.5/main /var/log/postgresql/postgresql-9.5-main.log
pg_lsclusters output is green.
After the file location has changed on postgresql.conf:
$pg_isready
/var/run/postgresql:5432 - no response
$pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
9.5 main 5432 down root /mnt/Data/postgresdb/postgresql/9.5/main /var/log/postgresql/postgresql-9.5-main.log
Here the output is red.
Following this post here, I tried to start the cluster manually:
$pg_ctlcluster 9.5 main start
Warning: the cluster will not be running as a systemd service. Consider using systemctl:
sudo systemctl start postgresql@9.5-main
Error: You must run this program as the cluster owner (root) or root
I tried the same command with sudo:
Error: Config owner (postgres:124) and data owner (root:0) do not match, and config owner is not root
This again makes me think the problem might lie with the permissions of the directory. The directory is owned by root, and I am unable to change its ownership.
EDIT-1
I've been working on this and I'd like to distill this post further to give more specifics. This is my current situation:
I installed postgres: sudo apt-get install postgresql and postgresql-contrib
I used sudo -u postgres psql to get into the postgres shell (I'm not sure if this is what I need to do)
show data_directory returns: /var/lib/postgresql/9.5/main
The data directory is located on the ext4-formatted Ubuntu drive. I also have a 1 TB NTFS-formatted hard disk mounted on /mnt/Data (it is mounted automatically on boot). What I tried:
Stop the postgres service: sudo systemctl stop postgresql
Create a new directory /mnt/Data/postgresdb and copy contents of the previous main to this which gives me a full path of /mnt/Data/postgresdb/postgresql/9.5/main using: sudo rsync -av /var/lib/postgresql/ /mnt/Data/postgresdb/postgresql/
Edit /etc/postgresql/9.5/main/postgresql.conf to change data_directory from the path mentioned above to /mnt/Data/postgresdb/postgresql/9.5/main
Start the postgres service: sudo systemctl start postgresql
Run sudo -u postgres psql but get the error that was mentioned in the original post.
These are the permissions on the respective main directories:
ls -l /var/lib/postgresql/9.5/
total 4.0K drwx------ 19 postgres postgres 4.0K Jan 16 12:40 main
ls -l /mnt/Data/postgresdb/postgresql/9.5/
total 4.0K drwxrwxrwx 1 root root 4.0K Jan 16 12:13 main
From the looks of it, the default directory is owned by "postgres" and the new directory is owned by root. However, when I try to change ownership to postgres: chown -R postgres main, it doesn't output any error, but the ownership doesn't change. I'm curious whether this is because this drive is NTFS formatted and is mounted.
Here is my /etc/fstab:
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
# / was on /dev/sda5 during installation
UUID=3f5a9875-89a3-4ce5-b778-9d7aaf148ed6 / ext4 errors=remount-ro 0 1
# swap was on /dev/sda6 during installation
UUID=85c3f4d4-e450-435b-8dd6-cf1b2cbd8fc2 none swap sw 0 0
/dev/disk/by-label/Data /mnt/Data auto nosuid,nodev,nofail,x-gvfs-show 0 0
Any ideas on how I can go about fixing this?
ORIGINAL POST
Recently, I installed Postgresql for storing some data for my research. The dataset came with instructions on how to setup the data on a Postgresql database (if interested, more info on that here and here). I installed Postgresql and set up a "role" and used the script that was provided for loading the database. It worked but I underestimated the size of the dataset and the script quit saying there was no more space.
I have two drives on my computer a 250G SSD drive with Windows and Ubuntu installed (125G each). And a 1TB HDD NTFS formatted where I store my data. So I thought moving the database to a folder on the other drive would be helpful. I purged all the data and the database to start afresh and followed the instructions here to move the database directory. However, after moving the directory, when I try to connect using psql I get the following error:
~ psql -U username -d postgres 14:48:33
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
How can I fix this? I am running 64-bit Ubuntu 16.04 with PostgreSQL 9.5. As mentioned earlier, I moved the DB directory to an NTFS-formatted filesystem (not sure if that causes any problems).
Thanks.
As mentioned in the comments, NTFS was the problem. I ended up repartitioning my bigger hard drive to create a 100 GB ext4 partition and was able to launch postgres with the new data directory without any problems.
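For anyone hitting the same thing, the filesystem type of a candidate data directory can be checked before moving anything. This is a rough sketch, assuming GNU df; the helper names and the lists of filesystem types are illustrative and not exhaustive. NTFS mounted through ntfs-3g usually shows up as fuseblk.

```shell
# Hypothetical helper: print the filesystem type backing a directory (GNU df).
fstype_of() {
    df --output=fstype "$1" | tail -n 1
}

# Hypothetical helper: classify a filesystem type string.
# Returns 0 if it is expected to honour chown/chmod as PostgreSQL needs,
# 1 if it typically does not, and 2 if the type is unknown to this sketch.
supports_unix_ownership() {
    case "$1" in
        ext2|ext3|ext4|xfs|btrfs) return 0 ;;
        ntfs|fuseblk|vfat|exfat|msdos) return 1 ;;
        *) return 2 ;;
    esac
}

# Example call:
# supports_unix_ownership "$(fstype_of /mnt/Data/postgresdb)" \
#     || echo "chown to postgres will probably not stick here"
```

This matches the symptom in the question, where chown -R postgres main reported no error but the owner stayed root.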
I am new to Postgresql and Pgpool II setup. I have configured the Postgresql HA/Load balancing using Pgpool and Repmgr.
The setup consists of 3 nodes; the application and OS versions are listed below:
**Pgpool node** => 192.168.0.4, **Postgresql Nodes** => 192.168.0.6, 192.168.0.7
**OS version** => CentOS 6.8 (On all the 3 nodes)
**Pgpool II version** => pgpool-II version 3.5.0 (ekieboshi).
**Postgresql Version** => PostgreSQL 9.4.8
**Repmgr Version** => repmgr 3.1.3 (PostgreSQL 9.4.8)
I have followed the link to do the setup.
When I bring down the master node, the failover happens successfully and the Slave node takes over as the new Master node.
After failover, I have to recover the failed node manually and sync it with the new Master node.
I want to automate the recovery process.
The pgpool.conf file on the Pgpool node contains the parameter recovery_1st_stage_command.
From the sources I found online, this parameter should be set in pgpool.conf on the Pgpool node.
I have set recovery_1st_stage_command = 'basebackup.sh'.
I have placed the script 'basebackup.sh' on both PostgreSQL nodes under the data directory '/var/lib/pgsql/9.4/data'.
I have also placed the script 'pgpool_remote_start' on both database nodes under the directory '/var/lib/pgsql/9.4/data'.
I have also created the pgpool extensions pgpool_recovery and pgpool_adm on both database nodes.
When the Master node is stopped, the failover happens but the recovery script 'basebackup.sh' is not executed.
I have checked the pgpool logs and enabled debug level as well. Still cannot find whether the script got executed or not.
Please help me with the automatic online recovery of the failed node. The scripts I am using are below.
basebackup.sh
#!/bin/bash
# first stage recovery
# $1 datadir
# $2 desthost
# $3 destdir
#as I'm using repmgr it's not necessary for me to know datadir(master) $1
RECOVERY_NODE=$2
CLUSTER_PATH=$3
#repmgr needs to know the master's ip
MASTERNODE=`/sbin/ifconfig eth0 | grep inet | awk '{print $2}' | sed 's/addr://'`
cmd1=`ssh postgres@$RECOVERY_NODE "repmgr -D $CLUSTER_PATH --force standby clone $MASTERNODE"`
echo $cmd1
pgpool_remote_start script.
#! /bin/sh
if [ $# -ne 2 ]
then
echo "pgpool_remote_start remote_host remote_datadir"
exit 1
fi
DEST=$1
DESTDIR=$2
PGCTL=/usr/pgsql-9.4/bin/pg_ctl
ssh -T $DEST $PGCTL -w -D $DESTDIR start 2>/dev/null 1>/dev/null < /dev/null &
Thanks.
I think this is as designed. When a master fails, there is a failover and the slave gets promoted, but the old master is not automatically recovered as a slave. On the contrary, the failover script will usually try to shut down the failed master for good and prevent it from restarting (if possible; the node may be down and unreachable), in order to avoid a split-brain.
If you really want that, you could modify the failover script so that it runs the pcp_recovery operation on the old master after the slave is promoted. But then what you are in fact doing is a switchover, which should be scripted as a series of steps. A failover is for when there is a real issue with the master (like the machine not responding).
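For reference, recovery_1st_stage_command only runs as part of online recovery, which is triggered through pcp_recovery_node and not by the failover itself. Below is a minimal sketch of building that call. The helper name recovery_command is hypothetical, and so are the PCP user "pgpool", the use of the default PCP port 9898, and node id 0; the option-style flags are assumed to match the pgpool-II 3.5 PCP tools, so check pcp_recovery_node --help on your installation first.

```shell
# Hypothetical helper: print the pcp_recovery_node call for a detached node.
# $1 = host running pgpool, $2 = backend node id as defined in pgpool.conf.
# Assumptions: PCP listens on port 9898, the PCP user is "pgpool", and its
# password is stored in ~/.pcppass so that -w (never prompt) can be used.
recovery_command() {
    printf 'pcp_recovery_node -h %s -p 9898 -U pgpool -w -n %s\n' "$1" "$2"
}

# Dry run, only prints the command:
# recovery_command 192.168.0.4 0
# To actually run it:
# eval "$(recovery_command 192.168.0.4 0)"
```

Printing the command first makes it easy to test by hand before wiring anything into a failover script.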
I'm trying to test a small PostgreSQL setup, so I cobbled together a quick local install. However, when I'm trying to create my personal db with createdb, it chokes on errors like this (notably, it starts with base/16384 the first time, and increments each time I run it). Anyone know what's going on here, or if there's some trivial config I missed that would cause this? Thanks, and this is somewhat time-critical, so please respond if you do know anything. Thanks!
UPDATES:
I'm running this on a CentOS 5 server, apologies that I don't have too many further details (it's a shared account on that server). uname -a has the following output:
Linux {OMITTED} 2.6.18-194.11.4.el5 #1 SMP Tue Sep 21 05:04:09 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
I installed PostgreSQL from source from:
http://wwwmaster.postgresql.org/download/mirrors-ftp/source/v9.0.1/postgresql-9.0.1.tar.bz2
built in my home directory and installed to prefix=$HOME/local/pgsql.
Here's a terminal readout for me attempting to create my user's db on a fresh data setup:
[htung@{OMITTED}:~]$ killall postgres
LOG: autovacuum launcher shutting down
LOG: received smart shutdown request
LOG: shutting down
LOG: database system is shut down
[htung@{OMITTED}:~]$ rm -r tmp
mk[1]+ Done ../local/pgsql/bin/postgres -D $HOME/tmp (wd: ~/tmp)
(wd now: ~)
[htung@{OMITTED}:~]$ mkdir tmp
[htung@{OMITTED}:~]$ local/pgsql/bin/initdb -D $HOME/tmp
The files belonging to this database system will be owned by user "htung".
This user must also own the server process.
The database cluster will be initialized with locale en_US.UTF-8.
The default database encoding has accordingly been set to UTF8.
The default text search configuration will be set to "english".
fixing permissions on existing directory /afs/{OMITTED}/htung/tmp ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in /afs/{OMITTED}/htung/tmp/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the -A option the
next time you run initdb.
Success. You can now start the database server using:
local/pgsql/bin/postgres -D /afs/{OMITTED}/htung/tmp
or
local/pgsql/bin/pg_ctl -D /afs/{OMITTED}/htung/tmp -l logfile start
[htung@{OMITTED}:~]$ local/pgsql/bin/postgres -D $HOME/tmp
LOG: database system was shut down at 2010-11-15 13:47:25 PST
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
[1]+ Stopped local/pgsql/bin/postgres -D $HOME/tmp
[htung@{OMITTED}:~]$ bg
[1]+ local/pgsql/bin/postgres -D $HOME/tmp &
[htung@{OMITTED}:~]$ local/pgsql/bin/createdb
ERROR: could not fsync file "base/16384": Invalid argument
STATEMENT: CREATE DATABASE htung;
createdb: database creation failed: ERROR: could not fsync file "base/16384": Invalid argument
[htung@{OMITTED}:~]$
I would guess that you're possibly running into SELinux here. I'd recommend either turning off SELinux to see if that helps, or installing from the RPMs available on the PostgreSQL website.