Improve PostgreSQL pg_restore Performance from 130 hours

I am trying to improve the time taken to restore a PostgreSQL database backup using pg_restore. The 29 GB gzip-compressed backup file is created from a 380 GB PostgreSQL database using pg_dump -Z0 -Fc piped into pigz.
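For reference, the dump pipeline described above would look roughly like this (the database name and output file name are assumptions based on the description, not taken from the original setup); -Z0 disables pg_dump's built-in compression so that pigz can do the compression in parallel:
pg_dump -Z0 -Fc my_database | pigz > backup_2020-02-29.gz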
During pg_restore, the database size is increasing at a rate of about 50 MB/minute, estimated with the SELECT pg_size_pretty(pg_database_size(...)) query. At this rate, it will take approximately 130 hours to complete the restore, which is a very long time.
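The 130-hour estimate follows directly from those numbers: 380 GB ≈ 389,000 MB, and 389,000 MB ÷ 50 MB/min ≈ 7,780 minutes ≈ 130 hours. The size was sampled with a query along these lines (the database name is taken from the restore steps below):
SELECT pg_size_pretty(pg_database_size('database_development'));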
On further investigation, it appears that the CPU usage is low despite setting pg_restore to use 4 workers.
The disk write speed and IOPS are also very low.
Benchmarking the system's I/O with fio shows that it can sustain 300 MB/s writes and 2,000 IOPS, so the restore is using only about 20% of the potential I/O capacity.
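For comparison, a fio benchmark of that kind could be run roughly as follows (job names, block sizes and file sizes are assumptions; the quoted 300 MB/s and 2,000 IOPS figures come from the original test):
fio --name=seqwrite --rw=write --bs=1M --size=2G
fio --name=randwrite --rw=randwrite --bs=8k --size=1G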
Is there any way to speed up the database restore?
System
Ubuntu 18.04.3
1 vCPU, 2 GB RAM, 4 GB swap
500 GB ZFS (2-way mirror array)
PostgreSQL 11.6
TimescaleDB 1.6.0
Steps taken to perform restore:
Decompress the .gz file to /var/lib/postgresql/backups/backup_2020-02-29 (~40 minutes)
Modify postgresql.conf settings
work_mem = 32MB
shared_buffers = 1GB
maintenance_work_mem = 1GB
full_page_writes = off
autovacuum = off
wal_buffers = -1
pg_ctl restart
Run the following commands inside psql:
CREATE DATABASE database_development;
\c database_development
CREATE EXTENSION timescaledb;
SELECT timescaledb_pre_restore();
\! time pg_restore -j 4 -Fc -d database_development /var/lib/postgresql/backups/backup_2020-02-29
SELECT timescaledb_post_restore();

Your database system is I/O bound, as you can see from the %iowait value of 63.62.
Increasing maintenance_work_mem might improve the situation a little, but essentially you need faster storage.
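To watch where the time is going during the restore, per-device utilisation and %iowait can be monitored with iostat from the sysstat package, for example:
iostat -x 5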

Related

Database migration with Dbeaver using 14 GB RAM

I'm doing a database migration from Oracle tables to SQL Server tables. Two of the three tables were successful on the first try, mostly because they didn't have as many rows as the third (about 3.5 million rows with around 30 columns). It took me around 15 attempts to complete the migration, because DBeaver used all the available RAM (around 14 GB).
Migrating in segments of 10,000/100,000 rows, CPU usage went to 100% for many minutes and DBeaver crashed because the JVM used all of its assigned memory.
After increasing the JVM memory to 14 GB, the migration crashed because the system had no more RAM available.
I changed the segment size many times with no results. I ended up using the 'direct query' option, and after 1.5 hours it finished successfully.
The question is: why does DBeaver keep using RAM without the GC cleaning it up?
How can I change the behaviour of the GC to be more 'eager'?
Thanks.
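One knob that is often adjusted for this kind of problem is the JVM configuration in dbeaver.ini; the lines below are only a sketch, assuming a HotSpot JVM (the exact file location and useful values depend on the DBeaver build and installed JVM):
-vmargs
-Xmx4g
-XX:+UseG1GC
-XX:MinHeapFreeRatio=10
-XX:MaxHeapFreeRatio=20
A smaller -Xmx makes the GC collect sooner, and the heap-free-ratio settings encourage the JVM to shrink the heap after a full GC rather than keep holding the memory.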

Postgres SQL Database consuming all space in hard disk

I have a serious problem with a Postgres 8.3 database running on CentOS. A closed-source service uses this database to store analog variables in a specific table every 5 minutes, and has done so for about 2 years; this information is precious to my company.
One day the Postgres process stopped and the server's hard drive was full. We ordered 2 more drives compatible with the server; they will take 30 days to arrive, but the application has to keep working until they do.
Today the application inserts around 400 MB per day into the database.
I have symlinked some folders to an external 1 TB USB 2.0 hard drive.
The current situation is:
df -h
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 ext4 1010G 1009G 224M 100% /
/dev/sda2 ext4 99M 17M 78M 18% /boot
and the $PGDATA folder has:
/var/pgsql/data 1005G
The main table is around 600 GB.
I made a Backup.tar of the database, deleted a lot of older rows, and tried to run the VACUUM command; in the middle of the process an error about disk space occurred and the VACUUM did not finish.
I have already tried VACUUM, REINDEX, and CLUSTER, and I get the same error.
I'm thinking of DROP TABLE and CREATE TABLE. Is that the best option? Will it free the disk space?
Does someone have a tip or a solution for this case?
Thanks, and sorry for the English.
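As a first step, a query along these lines (it works on 8.3) shows which relations are actually eating the space; note that a plain DELETE does not return disk space to the operating system, while DROP TABLE and TRUNCATE do:
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
ORDER BY pg_relation_size(oid) DESC
LIMIT 10;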

most impactful Postgres settings to tweak when host has lots of free RAM

My employer runs Postgres on a decently "large" VM. It is currently configured with 24 cores and 128 GB physical RAM.
Our monitoring solution indicates that the Postgres processes never consume more than about 11 GB of RAM even during periods of heaviest load. Presumably all the remaining free RAM is used by the OS to cache the filesystem.
My question: What configuration settings, if tweaked, are most likely to provide performance gains given a workload that's a mixture of transactional and analytical?
In other words, given there's an embarrassingly large amount of free RAM, where am I likely to derive the most "bang for my buck" settings-wise?
EDITED TO ADD:
Here are the current values for some settings frequently mentioned in tuning guides. Note: I didn't set these values; I'm just reading what's in the conf file:
shared_buffers = 32GB
work_mem = 144MB
effective_cache_size = 120GB
"sort_mem" and "max_fsm_pages" weren't set anywhere in the file.
The Postgres version is 9.3.5.
The setting that controls Postgres memory usage is shared_buffers. The recommended setting is 25% of RAM with a maximum of 8GB.
Since 11GB is close to 8GB, it seems your system is tuned well. You could use effective_cache_size to tell Postgres you have a server with a large amount of memory for OS disk caching.
Two good places for starting Postgres performance tuning:
Turn on SQL query logging and EXPLAIN ANALYZE slow or frequent queries (see the sketch after this list)
Use pg_activity (a "top" for Postgres) to see what keeps your server busy
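For the query-logging suggestion, a minimal postgresql.conf sketch (the threshold is an assumption; pick whatever counts as "slow" for your workload):
log_min_duration_statement = 500   # log every statement that takes longer than 500 ms
Anything that shows up repeatedly in the log is then a candidate for EXPLAIN ANALYZE.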

SQL Server long running query taking hours but using low CPU

I'm running some stored procedures in SQL Server 2012 under Windows Server 2012 in a dedicated server with 32 GB of RAM and 8 CPU cores. The CPU usage is always below 10% and the RAM usage is at 80% because SQL Server has 20 GB (of 32 GB) assigned.
Some stored procedures take 4 hours on some days, while on other days, with almost the same data, they take 7 or 8 hours.
I'm using the least restrictive isolation level so I think this should not be a locking problem. The database size is around 100 GB and the biggest table has around 5 million records.
The processes have bulk inserts, updates and deletes (in some cases I can use truncate to avoid generating logs and save some time). I'm making some full-text-search queries in one table.
I have full control of the server so I can change any configuration parameter.
I have a few questions:
Is it possible to improve the performance of the queries using parallelism?
Why is the CPU usage so low?
What are the best practices for configuring SQL Server?
What are the best free tools for auditing the server? I tried one from Microsoft called SQL Server 2012 BPA but the report is always empty with no warnings.
EDIT:
I checked the log and I found this:
03/18/2015 11:09:25,spid26s,Unknown,SQL Server has encountered 82 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [C:\Program Files\Microsoft SQL Server\MSSQL11.HLSQLSERVER\MSSQL\DATA\templog.ldf] in database [tempdb] (2). The OS file handle is 0x0000000000000BF8. The offset of the latest long I/O is: 0x00000001fe4000
Bump up max server memory to 24 GB.
Move tempdb off the C: drive and consider multiple tempdb files, with autogrowth of at least 128 MB or 256 MB.
Install the Performance Dashboard and run its reports to see what queries are running and to check waits.
If you are using 10% autogrowth on user data and log files, change that to something similar to the tempdb growth above.
Using the Performance Dashboard, check for obvious missing indexes that predict a 95% or higher improvement impact.
Disregard the naysayers who say not to do what I'm suggesting. If you do these five things and you're still having trouble, post some of the results from the Performance Dashboard, which by the way is free.
One more thing that may be helpful: download and install the sp_whoisactive stored procedure, run it, and see what processes are running. Research the queries you find after running sp_whoisactive.
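A rough T-SQL sketch of the first two suggestions (the memory value and the logical file names are assumptions based on a default tempdb layout):
EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 24576; RECONFIGURE;
-- fixed-size autogrowth for tempdb instead of percentage growth
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILEGROWTH = 256MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILEGROWTH = 256MB);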
query taking hours but using low CPU
You say that as if CPU mattered for most database operations. Hint: it does not.
Databases need I/O. RAM in some cases helps mitigate this, but in the end it comes down to I/O.
And what do I see in your question? CPU and memory (somehow assuming 32 GB is impressive), but not a word about the disk layout.
And that is what matters: disks, and the distribution of files to spread the load.
If you look at the performance counters you will see very high disk latency, because whatever "pathetic" (in SQL Server terms) disk layout you have there, it simply is not up to the task.
Time to start buying. SSDs are a lot cheaper than spinning disks. You may ask how they can be cheaper: well, you do not buy GB, you buy I/O. Last time I checked, SSDs did not cost 100 times the price of disks, but they deliver 100 times or more the I/O, and we are always talking about random I/O here.
Then isolate tempdb on a separate SSD; tempdb either does very little or does a ton of work, and you want to see which.
Then isolate the log file.
Make multiple data files, for both the database and tempdb (particularly tempdb: as many as you have cores).
And yes, this will cost money. But in the end you need I/O, and like most developers you bought CPU. That is bad for a database.
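Moving tempdb to a separate drive and adding per-core data files looks roughly like this (the drive letter, paths and sizes are assumptions; the file move only takes effect after a SQL Server restart):
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'D:\tempdb\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'D:\tempdb\templog.ldf');
ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = 'D:\tempdb\tempdb2.ndf', SIZE = 1GB);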

Postgresql - calculate database size

I need to build a deployment plan for a medium-sized application that contains many Postgres databases (720). The data model of almost all of them is similar, but I have to keep them separate for management and performance reasons. Each database will have about 400,000 to 1,000,000 records, with both reads and writes.
I have three questions:
1. How can I calculate the number of databases per machine (CentOS, 2.08 GHz CPU and 4 GB RAM)? In other words, how many databases can I deploy on each machine? I guess the concurrency will be about 10.
2. Is there any tutorial on how to calculate database size?
3. Can a Postgres database run in an "active - standby" configuration?
I don't think that your server can possibly handle such a load (if these are your true numbers).
A simple calculation: let's round up your 720 databases to 1,000.
Also, let's round up your average row width of 7,288 bytes to 10,000 (10 KB).
Assume that every database will hold 1 million rows.
Given all of that, the total database size in bytes can be estimated as:
1,000 * 10,000 * 1,000,000 = 10,000,000,000,000 bytes = 10 TB
In other words, you will need at least a few of the biggest hard drives money can buy (probably 4 TB each), and then you will need hardware or software RAID to get adequate reliability out of them.
Note that I did not account for indexes. Depending on the nature of your data and your queries, indexes can take anything from 10% to 100% of your data size; full-text search indexes can take 5x more than the raw data.
At any rate, your server with just 4 GB of RAM will barely be able to move trying to serve such a huge installation.
However, it should be able to serve not 1,000, but probably 10 or slightly more databases with your setup.
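For measuring the actual on-disk size once the databases exist, the built-in size functions are enough, for example:
SELECT datname, pg_size_pretty(pg_database_size(datname))
FROM pg_database
ORDER BY pg_database_size(datname) DESC;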
