Delays in basic PostgreSQL operations

I'm facing a lot of delay when running some basic commands in PostgreSQL, such as creating an index or deleting a single row from a table.
The table I was trying to delete a row from is "A", which is referenced by other tables "B", "C" and "D".
SELECT queries against table "A" work normally and return the data very quickly; it has around 100k rows. But when I try to DELETE even a single row, it takes more than an hour and then my session expires.
I noticed that table "D" doesn't have an index on the column that references table "A", and when I try to create one, I again face an hour of delay and then my session expires.

Did you enable ON DELETE CASCADE on your table? This can make your system very slow. In that case, create indexes for the foreign keys (see the sketch below).
Sometimes this can also happen due to corrupted indexes.
(I once faced a similar kind of issue due to corrupted indexes. It's a very sticky situation: it happened because a VM image backup and a DB backup ran at the same time, and that process left me with corrupted indexes.)
Try recreating those.
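A minimal sketch of the usual fix, assuming the referencing column in "D" is called a_id (check the actual foreign key definition). CREATE INDEX CONCURRENTLY builds the index without blocking writes, though it cannot run inside a transaction block:
-- Index the foreign key column in "D" that points at "A", so each DELETE on "A"
-- no longer has to sequentially scan "D" to check the constraint.
CREATE INDEX CONCURRENTLY idx_d_a_id ON "D" (a_id);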

Related

Strategies to modify huge database

I am testing different strategies for an incoming breaking change. The problem is that each experiment carries some cost in Azure.
The data is huge and can have some inconsistencies, due to many years of fixes and transactions from before I even joined the company.
I need to change a column in a table with millions of records and dozens of indexes. This will cause a big downtime.
ALTER TABLE X ALTER COLUMN A1 decimal(15, 4) --The original column is int
One of the initial ideas (now I know this is not possible) was to have a secondary replica, make the changes there and, when the changes finish, swap primary and secondary... zero or almost zero downtime. I am referring to a "live", redundant replica, not just a "copy".
EDIT:
Throwing out some new ideas:
Variation on what has been mentioned in one of the answers: create a table replica (not the whole DB, just the table), apply an INSERT INTO... SELECT and swap the tables at the end of the process. Or... do the swap early to minimize downtime, in exchange for a delay while the remaining records are added from the source afterwards.
I have tried this, but it takes AGES to complete. Also, some NULL and FK violations make the process fail after several hours of processing.
"Resuming" could be an option, but it makes the process slower with each execution. Without some kind of "resume", each failure has to be repeated from scratch.
An acceptable improvement could be to IGNORE the errors (while logging them, of course) and apply fixes after the migration. But as far as I know, neither Azure SQL nor SQL Server offers an "ignore" option.
Drop all indexes, constraints and dependencies on the column that needs to be modified, modify the column, and apply all the indexes, constraints and dependencies again.
I also tried this one. Some indexes take AGES to complete, but for now it seems to be the best bet.
There is a possible variation: apply ROW COMPRESSION before the datatype change, but I don't think it will improve the real bottleneck: index re-creation.
Create a new column with the target datatype, copy the data from the source column, drop the old column and rename the new one.
This strategy also requires dropping and regenerating the indexes, so it will not offer much gain (if any) compared to #2.
A friend thought of a variation on this: duplicate the needed indexes ONLINE for the column copy and, in the meantime, replay all changes on the source column to the copy via a trigger.
For any of the mentioned strategies, some gain can be obtained by increasing the processing power. But we are considering increasing the power with any of the approaches anyway, so this is common to all solutions.
When you need to update A LOT of rows as a one-time event, it may be more effective to use the following migration technique (sketched below):
create a new target table
use INSERT INTO ... SELECT to fill the new table with the correct / updated values
rename the old and new tables
create the indexes for the new table
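A rough sketch of that technique under assumptions: the table and column are the X and A1 from the question, and there is an Id key column; batching, constraints and permissions are left out.
-- New target table already using the target datatype
CREATE TABLE dbo.X_new (Id int NOT NULL PRIMARY KEY, A1 decimal(15, 4) NOT NULL);
INSERT INTO dbo.X_new (Id, A1)
SELECT Id, CAST(A1 AS decimal(15, 4)) FROM dbo.X;
-- Swap the tables with two quick renames, then rebuild indexes on the new one
BEGIN TRAN;
EXEC sp_rename 'dbo.X', 'X_old';
EXEC sp_rename 'dbo.X_new', 'X';
COMMIT;
CREATE NONCLUSTERED INDEX IX_X_A1 ON dbo.X (A1);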
After many tests and backups, we finally used the following approach:
Create a new column [columnName_NEW] with the desired format change. Allow NULLs.
Create a trigger on INSERT to fill the new column with the value from the column being replaced.
Copy the old column's values to the new column in batches.
This operation is very time consuming. We ran a batch every day in a maintenance window (2 hours a day over 4 days). Our batches filled the values taking the oldest rows first; we counted on the trigger filling in the new ones.
Once #3 is complete, don't allow NULLs anymore on the new column, but set a default value so the INSERT trigger doesn't crash.
Create all the needed indexes and views on the new column. This is very time consuming but can be done ONLINE
Allow NULLs on the old column.
Remove the insert trigger - downtime starts now!
Rename the old column to [columnName_OLD] and the new one to [columnName]. This requires only a few seconds of downtime!
--> You can consider it done at this point!
After some safe period of time, you can back up the result and remove [columnName_OLD] with all of its dependencies.
I selected the other answer because I think it would also be useful in most situations. This one has more steps, but it has very little downtime and is reversible at any step but the last.
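A condensed sketch of those steps, with all names assumed (table dbo.X, column columnName, an Id key used to join against inserted):
-- New nullable column plus an INSERT trigger that keeps it in sync
ALTER TABLE dbo.X ADD columnName_NEW decimal(15, 4) NULL;
GO
CREATE TRIGGER trg_X_FillNew ON dbo.X AFTER INSERT AS
BEGIN
    SET NOCOUNT ON;
    UPDATE x SET columnName_NEW = x.columnName
    FROM dbo.X AS x JOIN inserted AS i ON i.Id = x.Id;
END
GO
-- Back-fill the old values in batches (the original ran these in maintenance windows)
WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) dbo.X SET columnName_NEW = columnName WHERE columnName_NEW IS NULL;
    IF @@ROWCOUNT = 0 BREAK;
END
-- Downtime window: drop the trigger and swap the column names
DROP TRIGGER trg_X_FillNew;
EXEC sp_rename 'dbo.X.columnName', 'columnName_OLD', 'COLUMN';
EXEC sp_rename 'dbo.X.columnName_NEW', 'columnName', 'COLUMN';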

Clustered Column Store Index gets created in the beginning and vanishes once the job is completed

I have an existing application with many SQL Server stored procedures that run as below. These stored procs are applied to a data file and the computation is done according to some business rules.
1) Pre-process
2) Process
3) Post-Process
In the pre-process step, we create 'n' tables with a clustered columnstore index in place. When the job kicks off, the tables get created with the clustered columnstore index, but the indexes vanish once the job is completed. (This happens only for a large input data file.)
When I run the job on a small data file, the clustered columnstore index gets created on the tables and it still exists after the job completes.
Note: the code is the same when I executed it for both the small and the large data files.
Can somebody share their thoughts on this if they have encountered a similar problem?
Two things will cause an already fully established Index to 'vanish' from a table:
A process or user deletes it.
The transaction in which the index was created is rolled back, either because an exception was raised later in the transaction, the transaction wasn't recoverable, or via an explicit Rollback.
And that's it. Your answer lies in one of the two above.
I know this is not the answer you were looking for; it is, however, guaranteed to be THE answer. Somewhere your code is failing, and that's why the indexes are vanishing.
SQL Server isn't a slapdash RDBMS - if it arbitrarily dropped indexes at random, you know we'd all be over it. By your own admission, you have complicated code.
Our data warehouse routinely drops and rebuilds indexes of all sorts - the only times it has been 'missing' them has been the result of a bug in our code.
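As a starting point, a small check to run right after the job (the table name dbo.PreProcessTable is an assumption), which confirms whether the columnstore index survived; remember that if the pre-process runs inside an explicit transaction, any later error followed by a ROLLBACK undoes the CREATE CLUSTERED COLUMNSTORE INDEX as well:
-- Does the clustered columnstore index still exist on the table?
SELECT i.name, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.PreProcessTable')   -- assumed table name
  AND i.type_desc = 'CLUSTERED COLUMNSTORE';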

Best way to handle updates on a table

I am looking for a better way to update tables using SSIS. Specifically, I want to optimize the updates on these tables (around 10 tables use the same logic).
The logic is:
Select the source data from staging, then insert it into a physical temp table in the DW (i.e. TMP_Tbl).
Update all rows in MyTbl that match TMP_Tbl on the customerId column.
Insert all rows from TMP_Tbl1 whose customerId does not yet exist in MyTbl.
Using the above steps, it takes some time to populate TMP_Tbl. Hence, I planned to change the logic to delete-insert, but according to this:
In SQL, is UPDATE always faster than DELETE+INSERT? - that would be a recipe for pain.
Given:
no indexes/keys are used on the tables
some tables contain 5M rows, some contain 2k rows
each table update takes up to 2-3 minutes, which adds up to about 15 to 20 minutes in total
these updates run simultaneously, each in a separate sequence container
Does anyone know the best way to do this? It seems like the physical temp table needs to be removed - is this normal?
With SSIS you usually BULK INSERT, not INSERT. So if you don't mind the DELETE, re-inserting the rows should in general outperform the UPDATE.
Considering this, the faster approach would be:
[Execute SQL Task] Delete all records which you need to update (depending on your DB design and queries, an index may help here); see the sketch after these steps.
[Data Flow Task] Fast load (using OLE DB Destination, data access mode: Table or view - fast load) both the updated and the new records from the source into MyTbl. No need for temp tables here.
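A minimal sketch of the Execute SQL Task statement, assuming the staging rows land in TMP_Tbl and match MyTbl on customerId as described in the question:
-- Remove the rows that are about to be re-loaded, so the fast load can insert
-- both changed and new customers in one pass.
DELETE m
FROM MyTbl AS m
WHERE EXISTS (SELECT 1 FROM TMP_Tbl AS t WHERE t.customerId = m.customerId);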
If you cannot or don't want to DELETE records, your current approach is OK too.
You just need to fix the performance of that UPDATE query (adding an index should help). 2-3 minutes per record updated would be way too long.
If it is 2-3 minutes for updating millions of records, though, then it's acceptable.
Adding the correct non-clustered index to a table should not result in "much more time on the updates".
There will be a slight overhead, but if it helps your UPDATE to seek instead of scanning a big table, it is usually well worth it.
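For the UPDATE path, a hedged sketch of the kind of index meant here (the column name is taken from the question; include whatever columns your join and update actually use):
-- Lets the UPDATE (and the DELETE approach above) seek on customerId instead of scanning.
CREATE NONCLUSTERED INDEX IX_MyTbl_customerId ON MyTbl (customerId);
CREATE NONCLUSTERED INDEX IX_TMP_Tbl_customerId ON TMP_Tbl (customerId);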

How can I block users while I truncate a SQL Table

We have a SQL Server 2008 R2 table that tracks incremented unique key values, like transaction numbers, etc. It's like a bad version of sequence objects. There are about 300 of these unique keys in the table.
My problem is that the table grows by several hundred thousand rows every day, because it keeps the previously used numbers.
My issue is that we have to clean out the table once a week or performance suffers. I want to use a TRUNCATE and then kick off the SP that generates the next incremented value for each of the 300 keys. This works with a run time of about 5 minutes, but during that time the system keeps trying to use the table and throws errors because there is no data.
Is there any way to lock the table, preventing user access, truncate and then lift the lock?
TRUNCATE will automatically lock the whole table. A DELETE statement will use row locking, which will not interfere with your user queries. You might want to think about purging old records during off hours, for example in batches as sketched below.
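A hedged sketch of such an off-hours purge (table, column and retention period are assumptions), deleting in small batches so locks stay short:
WHILE 1 = 1
BEGIN
    -- Hypothetical retention rule: keep only the last 7 days of used numbers
    DELETE TOP (5000) FROM dbo.KeyTable
    WHERE CreatedDate < DATEADD(DAY, -7, GETDATE());
    IF @@ROWCOUNT = 0 BREAK;
END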
This will require cooperation from the readers. If you want to avoid using a highly blocking isolation level like serializable, you can use sp_getapplock and sp_releaseapplock to protect the table during the regeneration process, for example as sketched below. https://msdn.microsoft.com/en-us/library/ms189823.aspx
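A minimal sketch of that applock approach, with all object names assumed; every reader takes the same lock in Shared mode before touching the table:
-- Writer side: hold the applock while the table is truncated and reseeded.
EXEC sp_getapplock @Resource = 'KeyTableRebuild', @LockMode = 'Exclusive', @LockOwner = 'Session', @LockTimeout = 300000;
TRUNCATE TABLE dbo.KeyTable;          -- assumed table name
EXEC dbo.RegenerateNextKeys;          -- assumed SP that reseeds the 300 keys
EXEC sp_releaseapplock @Resource = 'KeyTableRebuild', @LockOwner = 'Session';
-- Reader side (wrapping the code that fetches a key):
-- EXEC sp_getapplock @Resource = 'KeyTableRebuild', @LockMode = 'Shared', @LockOwner = 'Session', @LockTimeout = 300000;
-- ... read the next key value ...
-- EXEC sp_releaseapplock @Resource = 'KeyTableRebuild', @LockOwner = 'Session';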
An alternative might be to build your new set in another table and then use sp_rename to swap them out.
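And a sketch of that swap variant (table names assumed): regenerate the 300 next values into a staging table ahead of time, so the visible outage is just two renames.
-- Build dbo.KeyTable_New offline with the regenerated values, then:
BEGIN TRAN;
EXEC sp_rename 'dbo.KeyTable', 'KeyTable_Old';
EXEC sp_rename 'dbo.KeyTable_New', 'KeyTable';
COMMIT;
DROP TABLE dbo.KeyTable_Old;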

Deleting Rows from a SQL Table marked for Replication

I erroneously deleted all the rows from an MS SQL 2000 table that is used in merge replication (the table is on the publisher). I then compounded the issue by using a DTS operation to retrieve the rows from a backup database and repopulate the table.
This has created the following issue:
The delete operation marked the rows for deletion on the subscribers, but the DTS operation bypasses the replication triggers, so the imported rows are not marked for insertion on the subscribers. In effect the subscribers lose the data although it is on the publisher.
So I thought "no worries", I will just delete the rows again and then add them back correctly via an INSERT statement, and they will then be marked for insertion on the subscribers.
This is my problem:
I cannot delete the DTSed rows because I get a "Cannot insert duplicate key row in object 'MSmerge_tombstone' with unique index 'uc1MSmerge_tombstone'." error. What I would like to do is somehow delete the rows from the table while bypassing the merge replication trigger. Is this possible? I don't want to remove and redo the replication because the subscribers are 50+ Windows Mobile devices.
Edit: I have tried the TRUNCATE TABLE command. It gives the following error: "Cannot truncate table xxxx because it is published for replication".
Have you tried truncating the table?
You may have to truncate the table and reset the ID field back to 0 if you need the inserted rows to have the same ID. If not, just truncate and it should be fine.
You also could look into temporarily dropping the unique index and adding it back when you're done.
Look into sp_mergedummyupdate
Would creating a second table be an option? You could create a second table, populate it with the needed data, add the constraints/indexes, then drop the first table and rename the second one. This should give you the data with the right keys... and it should all consist of SQL statements that are allowed to trickle down through replication. It just probably isn't the best for performance... and it definitely carries some risk.
I haven't tried this first-hand in a replicated environment... but it may at least be worth trying out.
Thanks for the tips...I eventually found a solution:
I deleted the merge delete trigger from the table
Deleted the DTSed rows
Recreated the merge delete trigger
Added my rows correctly using an insert statement.
I was a little worried about fiddling with the merge triggers, but everything appears to be working correctly.
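A rough sketch of that fix, with every name assumed (the real merge delete trigger has a replication-generated name; script it out before touching it so it can be restored exactly). Disabling the trigger is shown here as a shortcut for the drop/recreate in the steps above:
-- Keep the merge delete trigger from logging tombstones while the DTSed rows are removed.
ALTER TABLE dbo.Orders DISABLE TRIGGER del_merge_Orders      -- assumed table and trigger names
DELETE FROM dbo.Orders WHERE ImportedByDTS = 1               -- hypothetical way to identify the DTSed rows
ALTER TABLE dbo.Orders ENABLE TRIGGER del_merge_Orders
-- Re-insert through a normal INSERT so the merge insert trigger marks the rows
-- for propagation to the subscribers.
INSERT INTO dbo.Orders (OrderId, Amount)
SELECT OrderId, Amount FROM BackupDb.dbo.Orders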
