Btrfs restore from incremental backups - filesystems

I have a couple of backup files created with btrfs send.
I can restore the subvolume from those files. However, I cannot keep using them for incremental backups.
Here is an example:
# Create first snapshot
btrfs subvolume snapshot -r <original_subvol> <snapshot_1>
# Write some stuff to <original_subvol>
btrfs subvolume snapshot -r <original_subvol> <snapshot_2>
# Create the backup files
btrfs send -f <snapshot_file_1> <snapshot_1>
btrfs send -f <snapshot_file_2> -p <snapshot_1> <snapshot_2>
# Suppose you lost <original_subvol> and you restore to a brand new filesystem
btrfs receive -f <snapshot_file_1> <dest>
btrfs receive -f <snapshot_file_2> <dest>
# Create a writable snapshot from the last restored snapshot
btrfs subvolume snapshot <restore_snapshot_2> <restore_subvol>
# Write some stuff to <restore_subvol> and do an incremental backup
btrfs subvolume snapshot -r <restore_subvol> <snapshot_3>
btrfs send -f <snapshot_file_3> -p <restore_snapshot_2> <snapshot_3>
# Suppose you lost <restore_subvol> (again!) and you restore to a brand new filesystem
btrfs receive -f <snapshot_file_1> <dest>
btrfs receive -f <snapshot_file_2> <dest>
btrfs receive -f <snapshot_file_3> <dest>
=> ERROR: could not find parent subvolume
What am I doing wrong?
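One diagnostic worth running (a hedged sketch; the paths are placeholders from the example above): btrfs receive matches the -p parent on the destination by its "Received UUID", so the restored parent snapshot must still carry that UUID:
# Inspect the restored parent snapshot on the destination
btrfs subvolume show <dest>/<restore_snapshot_2> | grep -i 'received uuid'
If the Received UUID is missing, the incremental stream cannot locate its parent, which would match the error above.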

Related

Create SQL Server docker image with restored backup database using purely a Dockerfile

The following Dockerfile creates a custom SQL Server image with a database restored from a backup (rmsdev.bak).
FROM mcr.microsoft.com/mssql/server:2019-latest
ENV MSSQL_PID=Developer
ENV SA_PASSWORD=Password1?
ENV ACCEPT_EULA=Y
USER mssql
COPY rmsdev.bak /var/opt/mssql/backup/
# Launch SQL Server, confirm startup is complete, restore the database, then terminate SQL Server.
RUN ( /opt/mssql/bin/sqlservr & ) | grep -q "Service Broker manager has started" \
&& /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P $SA_PASSWORD -Q 'RESTORE DATABASE rmsdev FROM DISK = "/var/opt/mssql/backup/rmsdev.bak" WITH MOVE "rmsdev" to "/var/opt/mssql/data/rmsdev.mdf", MOVE "rmsdev_Log" to "/var/opt/mssql/data/rmsdev_log.ldf", NOUNLOAD, STATS = 5' \
&& pkill sqlservr
CMD ["/opt/mssql/bin/sqlservr"]
The issue is that, once the restore is complete, the backup file is not required anymore and I would like to remove it from the image.
Unfortunately, due to how Docker images are formed (layers), I cannot simply rm the file as I would like to.
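A quick illustration of that layering behavior (a sketch; the path comes from the Dockerfile above):
# This only adds a new layer that masks the file; the earlier COPY layer
# still ships rmsdev.bak, so the image does not get any smaller.
RUN rm /var/opt/mssql/backup/rmsdev.bak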
A multi-stage Dockerfile does not seem easily applicable in this case.
Another way would be to run the container, restore the backup and then commit a new image, but what I am looking to do is to use only docker build with the proper Dockerfile.
Does anyone know a way?
If you know where the data directory is in the image, and the image does not declare that directory as a VOLUME, then you can use a multi-stage build for this. The first stage would set up the data directory as you show. The second stage would copy the populated data directory from the first stage but not the backup file. This trick might depend on the two stages running identical builds of the underlying software.
For SQL Server, the Docker Hub page and GitHub repo are both tricky to find, and surprisingly neither addresses the issue of data storage (as @HansKillian notes in a comment, you would almost always want to store the database data in some sort of volume). The GitHub repo does include a Helm chart built around a Kubernetes StatefulSet, and from that we can discover that a data directory would be mounted on /var/opt/mssql.
So I might write a multi-stage build like so:
# Put common setup steps in an initial stage
FROM mcr.microsoft.com/mssql/server:2019-latest AS setup
ENV MSSQL_PID=Developer
# (weak password, easily extracted with `docker inspect`)
ENV SA_PASSWORD=Password1?
# (legally, probably the end user needs to accept this, not the image builder)
ENV ACCEPT_EULA=Y
# Have a stage specifically to populate the data directory
FROM setup AS data
# (copy-and-pasted from the question)
USER mssql
# not under /var/opt/mssql
COPY rmsdev.bak /
RUN ( /opt/mssql/bin/sqlservr & ) | grep -q "Service Broker manager has started" \
&& /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P $SA_PASSWORD -Q 'RESTORE DATABASE rmsdev FROM DISK = "/rmsdev.bak" WITH MOVE "rmsdev" to "/var/opt/mssql/data/rmsdev.mdf", MOVE "rmsdev_Log" to "/var/opt/mssql/data/rmsdev_log.ldf", NOUNLOAD, STATS = 5' \
&& pkill sqlservr
# Final stage that will actually be run.
FROM setup
# Copy the prepopulated data tree, but not the backup file
COPY --from=data /var/opt/mssql /var/opt/mssql
# Use the default USER, CMD, etc. from the base SQL Server image
The standard Docker Hub open-source database images like mysql and postgres generally declare a VOLUME in their Dockerfile for the database data, which forces the data to be stored in a volume. Importantly, this means you can't set up data in the image like this; you have to populate the data externally, and then copy the data tree around outside of the Docker image system.
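As a minimal sketch of that limitation (the flag file name is illustrative):
FROM postgres:15
# The postgres image already declares VOLUME /var/lib/postgresql/data, so any
# changes a build step makes under that path are discarded when the step ends:
RUN touch /var/lib/postgresql/data/seeded.flag
# A container started from the resulting image will NOT contain seeded.flag.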

archive_cleanup_command does not clear the archived wal files

Main question:
archive_cleanup_command in the postgresql.conf file does not clear the archived wal files. How can I get it to clear the archived wal files?
Relevant information:
My OS is Linux, Ubuntu v18.04 LTS.
Database is Postgresql version 13
My current settings:
/etc/postgresql/13/main/postgresql.conf file:
wal_level = replica
wal_compression = on
wal_recycle = on
checkpoint_timeout = 5min
max_wal_size = 1GB
min_wal_size = 80MB
archive_mode = on
archive_command = 'pxz --compress --keep --force -6 --to-stdout --quiet %p > /datadrive/postgresql/13/wal_archives/%f.xz'
archive_timeout = 10min
restore_command = 'pxz --decompress --keep --force -6 --to-stdout --quiet /datadrive/postgresql/13/wal_archives/%f.xz > %p'
archive_cleanup_command = 'pg_archivecleanup -d -x .xz /datadrive/postgresql/13/wal_archives %r >> /datadrive/postgresql/13/wal_archives/archive_cleanup_command.log 2>&1'
archive_cleanup_command.log has 777 permissions.
I have a master database doing logical replication with a publication and a slave database subscribing to that publication. It is on the slave that I am intending to do the archiving and restore points.
What am I expecting to happen?
The checkpoint_timeout setting in the postgresql.conf file means that a restart point is created at least every 5 minutes. And the archive_timeout setting of 10 minutes means that PostgreSQL forces a logfile segment switch every 10 minutes. Therefore, at least every 10 minutes, a restart point is created. Whenever a restart point is created, the archive_cleanup_command is run. When this command runs, it should clear all the .xz files older than that restart point. Therefore the wal_archives directory should not really have .xz files older than 20 minutes, let alone 2 hours.
What is actually happening?
The /datadrive/postgresql/13/wal_archives directory piles up with lots of .xz files that never get cleared.
cat archive_cleanup_command.log shows an empty file. Nothing is ever writing to it.
When I run the pg_archivecleanup command manually via bash, it works (i.e. it clears all the archive files before the one specified, and cat archive_cleanup_command.log shows the files that were cleared).
Example:
pg_archivecleanup -d -x .xz /datadrive/postgresql/13/wal_archives 000000010000045E000000E5 >> /datadrive/postgresql/13/wal_archives/archive_cleanup_command.log 2>&1
Then running cat archive_cleanup_command.log gives this:
pg_archivecleanup: keeping WAL file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E5" and later
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000DE.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000DF.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E0.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E1.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E2.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E3.xz"
pg_archivecleanup: removing file "/datadrive/postgresql/13/wal_archives/000000010000045E000000E4.xz"
What have I tried?
I have tried various permission settings (examples: chmod 777 on the wal_archives directory, adding other users to the postgres group, etc.).
Extensively and thoroughly read the PostgreSQL documentation and looked at at least 20 different related Stack Overflow posts.
Initially tried the 7zip command-line tool to do the compression instead of pxz.
Successfully restarted the database multiple times using the following commands:
sudo systemctl stop postgresql@13-main
sudo systemctl start postgresql@13-main
Dropped the logical replication and re-created the publication on the master and subscription on the slave.
Enabled checkpoints on the master itself.
Looked at /var/log/postgresql/postgresql-13-main.log. Unfortunately no relevant errors show up in this log.
Restartpoints, restore_command and archive_cleanup_command only apply to streaming ("physical") replication, or to recovery in general, not to logical replication.
A logical replication standby is not in recovery; it is open for reading and writing. In that state, recovery settings like archive_cleanup_command are ignored.
You will have to find another mechanism to delete old WAL archives, ideally in combination with your backup solution.
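A minimal sketch of such a mechanism (the path is from the question; the 7-day retention window is illustrative and must be long enough for your oldest base backup) is a daily cron job:
# Remove compressed WAL archives older than 7 days
find /datadrive/postgresql/13/wal_archives -name '*.xz' -mtime +7 -delete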

Unable to restore SEC filings preloaded database from arelle.org, postgres pg_dump gzip file

I was trying to restore an SEC form preloaded database from Arelle.org using postgres. Below is the link:
http://arelle.org/documentation/xbrl-database/
It's the one towards the bottom of the page where it says "Preloaded Database".
I was able to download the file, but was unable to gunzip it at first. So I copied the file and renamed it with a .gz extension instead of .gzip. Then I was able to gunzip it, but I am not sure if that affects the file.
After that I tried the following commands in Postgres to restore the dump into the database that I created:
psql -U username -d mydb -f secfile.pg (no luck)
I also tried:
pg_restore -C -d mydb secfile.pg (also no luck)
I am not sure if it's because I copied and renamed the file. But, I'd really appreciate it if anyone could help.
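One hedged suggestion, since the dump format determines the restore tool: renaming .gzip to .gz does not change the file's contents, so that is unlikely to be the problem. The file utility can usually identify the format (the file name is from the question):
file secfile.pg
# "PostgreSQL custom database dump"  -> use pg_restore:
#   pg_restore -U username -d mydb secfile.pg
# plain SQL text                     -> use psql:
#   psql -U username -d mydb -f secfile.pg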

Starting and populating a Postgres container in Docker

I have a Docker container that contains my Postgres database. It's using the official Postgres image which has a CMD entry that starts the server on the main thread.
I want to populate the database by running RUN psql -U postgres postgres < /dump/dump.sql before it starts listening to queries.
I don't understand how this is possible with Docker. If I place the RUN command after CMD, it will of course never be reached because Docker has finished reading the Dockerfile. But if I place it before the CMD, it will run before psql even exists as a process.
How can I prepopulate a Postgres database in Docker?
After a lot of fighting, I have found a solution ;-)
A comment posted by "justfalter" here was very useful for me: https://registry.hub.docker.com/_/postgres/
Anyway, I have done it this way:
# Dockerfile
FROM postgres:9.4
RUN mkdir -p /tmp/psql_data/
COPY db/structure.sql /tmp/psql_data/
COPY scripts/init_docker_postgres.sh /docker-entrypoint-initdb.d/
db/structure.sql is an SQL dump, useful to initialize the first tablespace.
Then, the init_docker_postgres.sh:
#!/bin/bash
# this script is run when the docker container is first started
# it imports the base database structure and creates the database for the tests
DATABASE_NAME="db_name"
DB_DUMP_LOCATION="/tmp/psql_data/structure.sql"
echo "*** CREATING DATABASE ***"
# create default database
gosu postgres postgres --single <<EOSQL
CREATE DATABASE "$DATABASE_NAME";
GRANT ALL PRIVILEGES ON DATABASE "$DATABASE_NAME" TO postgres;
EOSQL
# clean sql_dump - because I want to have a one-line command
# remove indentation
sed "s/^[ \t]*//" -i "$DB_DUMP_LOCATION"
# remove comments
sed '/^--/ d' -i "$DB_DUMP_LOCATION"
# remove new lines
sed ':a;N;$!ba;s/\n/ /g' -i "$DB_DUMP_LOCATION"
# remove other spaces
sed 's/ */ /g' -i "$DB_DUMP_LOCATION"
# remove firsts line spaces
sed 's/^ *//' -i "$DB_DUMP_LOCATION"
# append new line at the end (suggested by #Nicola Ferraro)
sed -e '$a\' -i "$DB_DUMP_LOCATION"
# import sql_dump
gosu postgres postgres --single "$DATABASE_NAME" < "$DB_DUMP_LOCATION";
echo "*** DATABASE CREATED! ***"
So finally:
# no postgres is running
[myserver]# psql -h 127.0.0.1 -U postgres
psql: could not connect to server: Connection refused
Is the server running on host "127.0.0.1" and accepting
TCP/IP connections on port 5432?
[myserver]# docker build -t custom_psql .
[myserver]# docker run -d --name custom_psql_running -p 5432:5432 custom_psql
[myserver]# docker ps -a
CONTAINER ID   IMAGE                COMMAND                CREATED         STATUS         PORTS                    NAMES
ce4212697372   custom_psql:latest   "/docker-entrypoint.   9 minutes ago   Up 9 minutes   0.0.0.0:5432->5432/tcp   custom_psql_running
[myserver]# psql -h 127.0.0.1 -U postgres
psql (9.2.10, server 9.4.1)
WARNING: psql version 9.2, server version 9.4.
Some psql features might not work.
Type "help" for help.
postgres=#
# postgres is now initialized with the dump
Hope it helps!
For those who want to initialize a PostgreSQL DB with millions of records during the first run.
Import using *.sql dump
You can make a simple SQL dump and copy the dump.sql file into /docker-entrypoint-initdb.d/. The problem is speed. My dump.sql script is about 17MB (small DB: 10 tables, with 100k rows in only one of them) and the initialization takes over a minute (!). That is unacceptable for local development, unit tests, etc.
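For reference, that variant needs nothing more than this (a minimal sketch; the file name is illustrative):
FROM postgres:11
# Any *.sql file in this directory is executed on first container start
COPY dump.sql /docker-entrypoint-initdb.d/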
Import using binary dump
The solution is to make a binary PostgreSQL dump and use the shell script initialization support.
Then the same DB is initialized in about 500ms instead of 1 minute.
1. Create the dump.pgdata binary dump of a DB named "my-db"
Directly from within a container, or from your local DB:
pg_dump -U postgres --format custom my-db > "dump.pgdata"
Or from the host, from a running container (postgres-container):
docker exec postgres-container pg_dump -U postgres --format custom my-db > "dump.pgdata"
2. Create a Docker image with a given dump and initialization script
$ tree
.
├── Dockerfile
└── docker-entrypoint-initdb.d
    ├── 01-restore.sh
    ├── 02-small-updates.sql
    └── dump.pgdata
$ cat Dockerfile
FROM postgres:11
COPY ./docker-entrypoint-initdb.d/ /docker-entrypoint-initdb.d/
$ cat docker-entrypoint-initdb.d/01-restore.sh
#!/bin/bash
file="/docker-entrypoint-initdb.d/dump.pgdata"
dbname=my-db
echo "Restoring DB using $file"
pg_restore -U postgres --dbname=$dbname --verbose --single-transaction < "$file" || exit 1
$ cat docker-entrypoint-initdb.d/02-small-updates.sql
-- some updates on your DB, for example for next application version
-- this file will be executed on DB during next release
UPDATE ... ;
3. Build an image and run it
$ docker build -t db-test-img .
$ docker run -it --rm --name db-test db-test-img
Alternatively, you can just mount a volume to /docker-entrypoint-initdb.d/ that contains all your DDL scripts. You can put in *.sh, *.sql, or *.sql.gz files and it will take care of executing those on start-up.
e.g. (assuming you have your scripts in /tmp/my_scripts)
docker run -v /tmp/my_scripts:/docker-entrypoint-initdb.d postgres
There is yet another option available that utilises Flocker:
Flocker is a container data volume manager that is designed to allow databases like PostgreSQL to easily run in containers in production. When running a database in production, you have to think about things like recovering from host failure. Flocker provides tools for managing data volumes across a cluster of machines like you have in a production environment. For example, as a Postgres container is scheduled between hosts in response to server failure, Flocker can automatically move its associated data volume between hosts at the same time. This means that when your Postgres container starts up on a new host, it has its data. This operation can be accomplished manually using the Flocker API or CLI, or automatically by a container orchestration tool that Flocker integrates with, for example Docker Swarm, Kubernetes or Mesos.
I followed the same solution as @damoiser. The only difference was that I wanted to import all the dump data.
Please follow the solution below. (I have not done any kind of checks.)
Dockerfile
FROM postgres:9.5
RUN mkdir -p /tmp/psql_data/
COPY db/structure.sql /tmp/psql_data/
COPY scripts/init_docker_postgres.sh /docker-entrypoint-initdb.d/
then the init_docker_postgres.sh script
#!/bin/bash
DB_DUMP_LOCATION="/tmp/psql_data/structure.sql"
echo "*** CREATING DATABASE ***"
psql -U postgres < "$DB_DUMP_LOCATION";
echo "*** DATABASE CREATED! ***"
and then you can build your image as
docker build -t abhije***/postgres-data .
docker run -d abhije***/postgres-data
My solution is inspired by Alex Dguez's answer, which unfortunately doesn't work for me because:
I used the pg-9.6 base image, and RUN /docker-entrypoint.sh --help never ran through for me; it always complained with "The command '/bin/sh -c /docker-entrypoint.sh -' returned a non-zero code: 1"
I don't want to pollute the /docker-entrypoint-initdb.d dir
The following answer is originally from my reply in another post: https://stackoverflow.com/a/59303962/4440427. It should be noted that the solution is for restoring from a binary dump instead of from plain SQL as asked by the OP, but it can be modified slightly to adapt to the plain SQL case (see the sketch after the scripts below).
Dockerfile:
FROM postgres:9.6.16-alpine
LABEL maintainer="lu@cobrainer.com"
LABEL org="Cobrainer GmbH"
ARG PG_POSTGRES_PWD=postgres
ARG DBUSER=someuser
ARG DBUSER_PWD=P@ssw0rd
ARG DBNAME=sampledb
ARG DB_DUMP_FILE=example.pg
ENV POSTGRES_DB launchpad
ENV POSTGRES_USER postgres
ENV POSTGRES_PASSWORD ${PG_POSTGRES_PWD}
ENV PGDATA /pgdata
COPY wait-for-pg-isready.sh /tmp/wait-for-pg-isready.sh
COPY ${DB_DUMP_FILE} /tmp/pgdump.pg
RUN set -e && \
nohup bash -c "docker-entrypoint.sh postgres &" && \
/tmp/wait-for-pg-isready.sh && \
psql -U postgres -c "CREATE USER ${DBUSER} WITH SUPERUSER CREATEDB CREATEROLE ENCRYPTED PASSWORD '${DBUSER_PWD}';" && \
psql -U ${DBUSER} -d ${POSTGRES_DB} -c "CREATE DATABASE ${DBNAME} TEMPLATE template0;" && \
pg_restore -v --no-owner --role=${DBUSER} --exit-on-error -U ${DBUSER} -d ${DBNAME} /tmp/pgdump.pg && \
psql -U postgres -c "ALTER USER ${DBUSER} WITH NOSUPERUSER;" && \
rm -rf /tmp/pgdump.pg
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
CMD pg_isready -U postgres -d launchpad
where the wait-for-pg-isready.sh is:
#!/bin/bash
set -e
get_non_lo_ip() {
local _ip _non_lo_ip _line _nl=$'\n'
while IFS=$': \t' read -a _line ;do
[ -z "${_line%inet}" ] &&
_ip=${_line[${#_line[1]}>4?1:2]} &&
[ "${_ip#127.0.0.1}" ] && _non_lo_ip=$_ip
done< <(LANG=C /sbin/ifconfig)
printf ${1+-v} $1 "%s${_nl:0:$[${#1}>0?0:1]}" $_non_lo_ip
}
get_non_lo_ip NON_LO_IP
until pg_isready -h $NON_LO_IP -U "postgres" -d "launchpad"; do
>&2 echo "Postgres is not ready - sleeping..."
sleep 4
done
>&2 echo "Postgres is up - you can execute commands now"
The above scripts together with a more detailed README are available at https://github.com/cobrainer/pg-docker-with-restored-db
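For the plain-SQL case mentioned above, a hedged adaptation (the file name is illustrative) is to COPY the .sql file instead of the binary dump and swap the pg_restore line in the RUN chain for a psql call:
# COPY dump.sql /tmp/dump.sql   (instead of COPY ${DB_DUMP_FILE} /tmp/pgdump.pg)
psql -U ${DBUSER} -d ${DBNAME} -f /tmp/dump.sql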
I was able to load the data in by prepending the RUN command in the Dockerfile with /etc/init.d/postgresql start. My Dockerfile has the following line, which is working for me:
RUN /etc/init.d/postgresql start && /usr/bin/psql -a < /tmp/dump.sql
For E2E tests, in which we need a database with structure and data already saved in the Docker image, we have done the following:
Dockerfile:
FROM postgres:9.4.24-alpine
ENV POSTGRES_USER postgres
ENV POSTGRES_PASSWORD postgres
ENV PGDATA /pgdata
COPY database.backup /tmp/
COPY database_restore.sh /docker-entrypoint-initdb.d/
RUN /docker-entrypoint.sh --help
RUN rm -rf /docker-entrypoint-initdb.d/database_restore.sh
RUN rm -rf /tmp/database.backup
database_restore.sh:
#!/bin/sh
set -e
pg_restore -C -d postgres /tmp/database.backup
To create the image:
docker build .
To start the container:
docker run --name docker-postgres -d -p 5432:5432 <Id-docker-image>
This does not restore the database every time the container is booted. The structure and data of the database are already contained in the created Docker image.
We based this on the following article, but eliminated the multi-stage part:
Creating Fast, Lightweight Testing Databases in Docker
Edit: version 9.4-alpine does not work now because it does not run the database_restore.sh script. Use version 9.4.24-alpine.
My goal was to have an image that contains the database, i.e. saving the time to rebuild it every time I do docker run or docker-compose up.
We would just have to manage to get the line exec "$@" out of docker-entrypoint.sh. So I added this to my Dockerfile:
# Copy my SQL scripts into the image to /docker-entrypoint-initdb.d:
COPY ./init_db /docker-entrypoint-initdb.d
#init db
RUN grep -v 'exec "$@"' /usr/local/bin/docker-entrypoint.sh > /tmp/docker-entrypoint-without-serverstart.sh && \
chmod a+x /tmp/docker-entrypoint-without-serverstart.sh && \
/tmp/docker-entrypoint-without-serverstart.sh postgres && \
rm -rf /docker-entrypoint-initdb.d/* /tmp/docker-entrypoint-without-serverstart.sh

PostgreSQL - Backup and Restore Database Tables with Partitions

I'm working on PostgreSQL 8.4 and I'd like to do a backup and restore (from Ubuntu 11.10 to Ubuntu 12.04).
I want to include all partitions, clusters, roles and stuff.
My commands:
Back up:
pg_dumpall > filename
Compress:
zip -f mybackup
Uncompress and restore:
sudo gunzip -c /home/ubuntu/Desktop/backupFile.zip | psql -U postgres
The issue is in the restore process; I got an error:
invalid command \.
ERROR: syntax error at or near "2"
LINE 1: 2 2 1
^
invalid command \.
ERROR: syntax error at or near "1"
LINE 1: ...
^
out of memory
Plus, the tables with partitions were not restored. Also, some tables were restored without any data!
Please help!
EDIT
I used pgAdmin to do the backup, using the "backup server" option.
If you used zip to compress the output, then you should use unzip to uncompress it, not gunzip; they use different formats/algorithms.
I'd suggest you use gzip and gunzip instead. For instance, if you generated a backup named mybackup.sql, you can gzip it with:
gzip mybackup.sql
It will generate a file named mybackup.sql.gz. Then, to restore, you can use:
gunzip -c mybackup.sql.gz | psql -U postgres
Also, I'd suggest you avoid using pgAdmin to do the dump. Not that it can't do it; it is just that you can't automate it. You can easily use pg_dumpall the same way:
pg_dumpall -U postgres -f mybackup.sql
You can also dump and compress without intermediate files using a pipe:
pg_dumpall -U postgres | gzip -c > mybackup.sql.gz
BTW, I'd really suggest avoiding pg_dumpall and using pg_dump with the custom format for each database, as with that you already get the result compressed and it is easier to use later. But pg_dumpall is OK for small databases.
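For example (a sketch; the database and file names are illustrative):
# Dump a single database in the custom format (compressed by default)
pg_dump -U postgres -Fc -f mybackup.dump mydb
# Restore it later with pg_restore, optionally with parallel jobs
pg_restore -U postgres -d mydb -j 4 mybackup.dump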
