Purging data from a large Postgres table

We store logs in a Postgres table, and over time that table has grown to more than 600 GB. We implemented a purging strategy to delete the table's old data. The purge query runs fine and returns the number of rows deleted, yet the table size is still not shrinking.
We deleted the data with DELETE queries, removing more than 348 million rows, but the size still did not go down. We also monitored the autovacuum process, which ran fine, and the size still did not shrink.
We are using RDS Postgres; the instance's current storage capacity is around 1200 GB, with around 100 GB free. I want to know if I am missing anything.
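For context, a plain DELETE followed by (auto)vacuum only marks the freed space as reusable inside the table's files; it does not return the space to the operating system, so the visible table size stays the same. A minimal sketch of how to verify this and force a shrink, assuming the log table is named app_logs (a placeholder name):

SELECT pg_size_pretty(pg_total_relation_size('app_logs'));              -- size on disk, including indexes
SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = 'app_logs';  -- dead rows awaiting vacuum
VACUUM (FULL, VERBOSE) app_logs;  -- rewrites the table into new files, returning space to the OS; takes an ACCESS EXCLUSIVE lock

On RDS, the pg_repack extension can perform the same rewrite without holding the exclusive lock for the whole operation.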

Related

Reading a table from Oracle is 10 times slower than reading from SQL Server

I have a table with just 5 columns and 8500 rows. The following simple query on an Oracle database takes around 8 seconds, whereas if I import the same table into a SQL Server database it takes less than 1 second.
SELECT CustomerList.* FROM CustomerList ORDER BY TypeID, OrderNr, Title
QUESTION: I am completely new to databases and have just started learning about them, but 8 seconds for 8500 records is way too long. Is there anything I can try to resolve the issue?
UPDATE: I exported the table from the Oracle database as a text file and then imported that text file into a fresh Oracle database to create the same table. When I executed the above query against this newly created database, the execution time was again the same as before (around 8 seconds).
Regarding the High Water Mark (HWM): in Oracle, space for a table's rows is allocated in big chunks called extents. When an extent fills up with rows of data, a new extent is allocated. The HWM is the pointer to the highest allocated address.
If rows are deleted, the space they occupied remains allocated to that table and is available for new rows without having to acquire more space, and the HWM remains. Even if you delete ALL of the rows (a simple DELETE FROM MYTABLE), all of the space remains allocated to the table and available for new rows without having to acquire more space. And the HWM remains.
So say you have a table with 1 billion rows, and you then delete all but one of them. You still have the space for 1 billion rows, with the HWM set accordingly. Now, if you select from that table without a WHERE condition that would use an index (thus forcing a Full Table Scan, or FTS), Oracle still has to scan that billion-row space to find the remaining rows, which could be scattered across the whole space. But when you insert those rows into another database (or even another table in the same database), you only need enough space for those rows, so selecting against the new table is accordingly faster.
That is ONE possible cause of your issue.
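If the HWM is indeed the cause, one way to lower it without recreating the table is an online segment shrink. A sketch, assuming the table lives in an ASSM (automatic segment space management) tablespace:

ALTER TABLE CustomerList ENABLE ROW MOVEMENT;
ALTER TABLE CustomerList SHRINK SPACE;   -- compacts the rows and moves the HWM down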

PostgreSQL failed imports still claim hard disk space? Need to clear cache?

I have a PostgreSQL (10.0 on OS X) database with a single table for the moment. I have noticed something weird when importing a CSV file into that table.
When the import fails for various reasons (e.g. one extra row in the CSV file or too many characters in a column for a given row), no rows are added to the table, but PostgreSQL still claims that space on my hard disk.
Now, I have a very big CSV to import, and it failed several times because the file was not compliant to begin with, so I had tons of failed imports that I fixed and retried. What I've realized now is that my computer's storage has shrunk by 30-50 GB or so because of this, and my database is still empty.
Is that normal?
I suspect this is somewhere in my database cache. Is there a way for me to clear that cache or do I have to fully reinstall my database?
Thanks!
Inserting rows into the database will increase the table size.
Even if the COPY statement fails, the rows that have been inserted so far remain in the table, but they are dead rows since the transaction that inserted them failed.
In PostgreSQL, the SQL statement VACUUM will free that space. That typically does not shrink the table, but it makes the space available for future inserts.
Normally, this is done automatically in the background by the autovacuum daemon.
There are several possibilities:
You disabled autovacuum.
Autovacuum is not fast enough at cleaning up the table, so the next load cannot reuse the space yet.
What you can do:
Run VACUUM (VERBOSE) on the table to remove the dead rows manually.
If you want to reduce the table size, run VACUUM (FULL) on the table. That will lock the table for the duration of the operation.
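As a minimal sketch of those steps (mytable is a placeholder name):

SHOW autovacuum;           -- check that autovacuum is enabled ('on' by default)
VACUUM (VERBOSE) mytable;  -- remove dead rows so their space can be reused
VACUUM (FULL) mytable;     -- only if the file itself must shrink; takes an exclusive lock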

Tablespace is not freed after dropping tables (Oracle 11g)

I have an Oracle 11g database with block size = 8192, so, if I'm correct, the maximum datafile size will be 32 GB.
I have a huge table containing around 10 million records. Data in this table is purged often. For purging we chose CTAS (CREATE TABLE AS SELECT) as the better option, since we delete the greater portion of the data.
We drop the old table after the CTAS, but the dropped tables are not releasing their space for the new tables. I understand that a tablespace has an AUTOEXTEND option but no AUTOSHRINK, yet the space occupied by the old tables should still be available for new tables, which is not happening in this case.
I'm getting an exception saying:
ORA-01652: unable to extend temp segment by 8192 in tablespace
FYI, the only operations happening are the CTAS and dropping the old table, nothing else. The first time this works fine, but when the same operation is done a second time, the exception arises.
I tried adding an additional datafile to the tablespace, but after a few more purge operations on the table it also filled up to 32 GB, and the issue continues.
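One cause worth ruling out (an assumption on my part; the question doesn't mention it): from Oracle 10g onward, a plain DROP TABLE only moves the table to the recycle bin, so its extents stay allocated in the tablespace until they are purged:

DROP TABLE old_log_table PURGE;   -- old_log_table is a placeholder; PURGE bypasses the recycle bin
PURGE RECYCLEBIN;                 -- frees space from tables already dropped without PURGE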

Speeding up "Tasks" > "Import Data..." in SQL Server 2012 (indexes and file growth)?

I'm copying 99 million rows from one SQL Server instance to another using the right-click "Tasks" > "Import Data" method. It's just a straight copy into a new, empty table in a new, empty NDF file. I'm using identity insert when doing the copy so that the IDs stay intact. It was going very slowly (30 million records after 12 hours), so my boss told me to cancel it, remove all indexes from the new empty table, then run it again.
Will removing indexes on the new table really speed up the transfer of records, and why? I imagine I can create the indexes after the table is filled.
What is the underlying process behind the right-click "Import Data"? Is it using SqlBulkCopy, or is it logging tons of stuff? I know it's not running in a single transaction, because cancelling it stopped it immediately and the already-inserted rows were still there.
The file growth on the NDF file that holds the table is 20 MB. Will increasing this speed up the process when importing 99 million records? It's just an idea I had.
Yes, it should. Each new row being inserted causes each index to be updated with the new data. It's worth noting that if you remove the indexes, import, and then re-add the indexes, those indexes will still take a long time to build.
It essentially runs as a very simple SSIS package: it reads rows from the source and inserts them in chunks, each chunk as a transaction. If your recovery model is set to Full, you could switch it to Bulk Logged for the import. This should only be done when you're bulk-moving data and no other updates to the database will be happening, though.
I would try to size the MDF/NDF close to what you'd expect the end result to be. Autogrowth can take time, especially if you have it set low.
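A sketch of those last two suggestions in T-SQL (TargetDb and TargetDb_Data2 are placeholder names):

ALTER DATABASE TargetDb SET RECOVERY BULK_LOGGED;                          -- minimally log the bulk insert
ALTER DATABASE TargetDb MODIFY FILE (NAME = TargetDb_Data2, SIZE = 50GB);  -- pre-size the NDF so autogrowth doesn't stall the load
-- ... run the import, then switch back and take a log backup to restart the log chain:
ALTER DATABASE TargetDb SET RECOVERY FULL;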

DB2 - Reclaiming disk space used by dropped tables

I have an application that logs to a DB2 database. Each day's logs are stored in a separate daily table, so I have several tables, one per day.
Since the application has been running for quite some time, I dropped some of the older daily tables, but the disk space was not reclaimed.
I understand this is normal in DB2, so I googled and found out that the following command can be used to reclaim space:
db2 alter tablespace <table space> reduce max
Since the tablespace that stores the daily log tables is called USERSPACE1, I executed the following command successfully:
db2 alter tablespace userspace1 reduce max
Unfortunately the disk space used by the DB2 instance is still the same...
I've read somewhere that the REORG command can be executed, but from what I've seen it is used to reorganize tables. Since I dropped the tables, how can I use REORG?
Is there any other way to do this?
Thanks
Reducing the size of a tablespace is very complex. The extents (sets of contiguous pages; the unit of tablespace allocation) belonging to one table are not laid out sequentially. When you reorg a table, the rows are reorganized into pages, and the new pages are normally written at the end of the tablespace. Sometimes the high water mark will even increase, and your tablespace will get bigger.
You need to reorg all the tables in a tablespace in order to "defrag" them. Then you have to perform a second reorg so that the tables can move into the space freed at the start of the tablespace.
However, many factors affect the organization of the tables in a tablespace: new extents are created (new rows, row overflow due to updates), and compression may be activated after a reorg.
What you can do is assign a few tables, or just one, per tablespace; however, you will waste a lot of space (overhead, empty pages, etc.).
The command you are using is an automatic way to do this, but it does not always work as desired: http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0055392.html
If you want to see how the tables are distributed in your tablespace, you can use db2dart. That gives you an idea of which tables to reorg (move).
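A sketch of that two-pass sequence from the command line (mydb and myschema.log_table are placeholder names; run the REORG for every remaining table in the tablespace):

db2 connect to mydb
db2 "reorg table myschema.log_table"           -- first pass: defragment each table
db2 "reorg table myschema.log_table"           -- second pass: pack tables into the space freed by the first
db2 "alter tablespace userspace1 reduce max"   -- then lower the high water mark and release the extents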
Sorry guys,
The command that I mentioned in the original post works after all; the space was just reclaimed very slowly.
Thanks for the help
