I need to update 2 million records across 120 tables. I have created an index on each table, since the same column is referenced in the WHERE clause and in the UPDATE statement. Each table has 100,000 records on average. I have disabled the foreign key constraints before the update.
For this I have written a procedure that fetches the records and runs the update statements. From various blogs I learned that BULK COLLECT and FORALL are a good option in Oracle, but I can see only a small difference in the time it takes to run the updates. Is there any other approach to increase performance and reduce the time needed to update the records in these tables?
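For reference, the procedure follows roughly this BULK COLLECT / FORALL pattern (a simplified sketch; the table and column names are placeholders for the real ones):

DECLARE
  -- Placeholder cursor: the real one selects the key and the new value for each table
  CURSOR c_src IS
    SELECT id, new_value FROM changes_src;
  TYPE t_id_tab  IS TABLE OF changes_src.id%TYPE;
  TYPE t_val_tab IS TABLE OF changes_src.new_value%TYPE;
  l_ids  t_id_tab;
  l_vals t_val_tab;
BEGIN
  OPEN c_src;
  LOOP
    -- Fetch in chunks to keep memory use bounded
    FETCH c_src BULK COLLECT INTO l_ids, l_vals LIMIT 10000;
    EXIT WHEN l_ids.COUNT = 0;

    -- One bulk-bound UPDATE per chunk instead of a row-by-row loop
    FORALL i IN 1 .. l_ids.COUNT
      UPDATE target_table
         SET some_col = l_vals(i)
       WHERE id = l_ids(i);

    COMMIT;
  END LOOP;
  CLOSE c_src;
END;
/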
I have a table that has approximately 450,000 records per month going into it. It is an audit table of sorts that tracks changes to other tables in the database, i.e. inserts, updates and deletes of records. Typically this table is not queried (perhaps only 2-3 times per month, to examine how data in other tables changed, and only under very specific circumstances).
It has been put to me that we should consider partitioning this table to help improve database performance. If the table is only being inserted into 99.9% of the time and rarely queried, would there be any tangible benefit to partitioning it?
Thanks.
If the table is only being inserted into 99.9% of the time and rarely queried, would there be any tangible benefit to partitioning it?
Partitioning is mostly a manageability feature. I would expect no difference in insert performance with or without table partitioning. For SELECT queries, partitioning may improve the performance of large scans if partitions can be eliminated (i.e. the partitioning column is specified in the WHERE clause), but indexing and query tuning are usually the key to performance.
Partitioning can improve the performance of purge operations. For example, you could use a monthly sliding window to purge an entire month of data at once rather than deleting individual rows. I don't know if that's worth the trouble with only 450K rows/month, though.
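A rough sketch of that sliding-window purge, assuming (hypothetically) the audit table were partitioned by month on its date column with a partition function named pfAuditMonthly; all names and the boundary date here are illustrative:

-- Switch the oldest monthly partition into an empty table with the same
-- structure, then empty it with a metadata-only operation.
ALTER TABLE dbo.AuditLog
    SWITCH PARTITION 1 TO dbo.AuditLog_Purge;

TRUNCATE TABLE dbo.AuditLog_Purge;

-- Remove the now-empty boundary so the monthly window keeps sliding.
ALTER PARTITION FUNCTION pfAuditMonthly()
    MERGE RANGE ('2013-01-01');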
I think you want to get fast access to your recent data.
Add the date column as the first column of the clustered primary key instead of partitioning.
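Something along these lines, assuming the audit table has an AuditDate column and an AuditId surrogate key (both names are made up for the example):

-- Rebuild the clustered primary key with the date column leading,
-- so recent rows are stored together and date-range reads stay cheap.
ALTER TABLE dbo.AuditLog
    DROP CONSTRAINT PK_AuditLog;

ALTER TABLE dbo.AuditLog
    ADD CONSTRAINT PK_AuditLog
    PRIMARY KEY CLUSTERED (AuditDate, AuditId);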
We have a requirement to insert a large number of records (about 2 to 3 million) into a table. However, we should be able to validate and segregate invalid records - primary key, foreign key and non-null violations - into a separate error table for later reference. From what I have read, BULK INSERT in SQL Server works well for the inserting itself, but I'm not able to figure out the best way to filter out bad data records. Does having a staging table in between help? Although we could check for violations using some queries against the staging table, we would still have to load the good records into the actual table with another insert - either through INSERT...SELECT or MERGE - but is this an efficient approach? I'm concerned that it would be akin to doing the inserts twice.
I'm planning to use .NET's SqlBulkCopy for the bulk insertions, and it doesn't have clear error reporting either.
Can somebody point me to a more efficient solution?
EDIT: If this approach is the only solution, what method do you think is best for the second insert? INSERT...SELECT or MERGE? Would either match the efficiency and speed of BULK INSERT? Or is there a better alternative?
Thanks!
Personally, I would not consider 2-3M records a large amount.
Unless you need the data in seconds, a single (non-bulk) insert will perform adequately.
If I'm nervous about the source data quality, I like to load into a staging table first and then do "soft RI" - checking for PKs, UQs, FKs etc. using SQL.
If I'm worried about numeric/non-numeric or bad date type issues, I make the staging table VARCHAR(8000) for all columns and use TRY_CONVERT when reading from it.
Once the data is in staging, you can easily filter out only the good rows and report in detail on the bad rows.
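A minimal sketch of that pattern, assuming a hypothetical all-VARCHAR staging table stg.Orders loaded by SqlBulkCopy, a real table dbo.Orders and an error table dbo.Orders_Errors (all names invented for the example):

-- Good rows: every column converts cleanly and the key is not already present.
INSERT INTO dbo.Orders (OrderId, OrderDate, Amount)
SELECT TRY_CONVERT(int, s.OrderId),
       TRY_CONVERT(date, s.OrderDate),
       TRY_CONVERT(decimal(18, 2), s.Amount)
FROM   stg.Orders AS s
WHERE  TRY_CONVERT(int, s.OrderId)             IS NOT NULL
  AND  TRY_CONVERT(date, s.OrderDate)          IS NOT NULL
  AND  TRY_CONVERT(decimal(18, 2), s.Amount)   IS NOT NULL
  AND  NOT EXISTS (SELECT 1 FROM dbo.Orders AS o
                   WHERE o.OrderId = TRY_CONVERT(int, s.OrderId));

-- Bad rows: keep them, with a reason, for later review.
INSERT INTO dbo.Orders_Errors (OrderId, OrderDate, Amount, ErrorReason)
SELECT s.OrderId, s.OrderDate, s.Amount, 'Invalid value or duplicate key'
FROM   stg.Orders AS s
WHERE  TRY_CONVERT(int, s.OrderId)             IS NULL
   OR  TRY_CONVERT(date, s.OrderDate)          IS NULL
   OR  TRY_CONVERT(decimal(18, 2), s.Amount)   IS NULL
   OR  EXISTS (SELECT 1 FROM dbo.Orders AS o
               WHERE o.OrderId = TRY_CONVERT(int, s.OrderId));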
I have a SQL query composed of joins across 3 tables. I need to synchronize the query result with a single target table every 24 hours.
The idea is to run the query every 24 hours, compare the result with the target table, and then delete rows that have disappeared, insert new ones, and update existing rows.
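The delete/insert/update step could presumably be a single MERGE along these lines (just a sketch, assuming SQL Server; the query and all table/column names are placeholders):

MERGE dbo.TargetTable AS t
USING (
        -- the query that joins the 3 source tables
        SELECT a.key_col, a.col1, b.col2
        FROM   dbo.TableA AS a
        JOIN   dbo.TableB AS b ON b.key_col = a.key_col
        JOIN   dbo.TableC AS c ON c.key_col = a.key_col
      ) AS s
ON    t.key_col = s.key_col
WHEN MATCHED AND (t.col1 <> s.col1 OR t.col2 <> s.col2) THEN
      UPDATE SET t.col1 = s.col1, t.col2 = s.col2
WHEN NOT MATCHED BY TARGET THEN
      INSERT (key_col, col1, col2) VALUES (s.key_col, s.col1, s.col2)
WHEN NOT MATCHED BY SOURCE THEN
      DELETE;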
I'm asking for "best-practices" to deal with this kind of situation.
Thank you.
Context:
I have a system that acts as a Web UI for a legacy accounting system. This legacy system sends me a large text file, several times a day, so I can update a CONTRACT table in my database (the file can contain new contracts, or just updated values for existing contracts). This table currently has around 2M rows and about 150 columns. I can't have downtime during these updates, since they happen during the day and there are usually about 40 users logged in at any given time.
My system's users can't update the CONTRACT table, but they can insert records in tables that reference the CONTRACT table (Foreign Keys to the CONTRACT table's ID column).
To update my CONTRACT table I first load the text file into a staging table, using a bulk insert, and then I use a MERGE statement to create or update the rows, in batches of 100k records. And here's my problem - during the MERGE statement, because I'm using READ COMMITTED SNAPSHOT isolation, the users can keep viewing the data, but they can't insert anything - their transactions time out because the CONTRACT table is locked.
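For context, the sync roughly follows this shape (a simplified sketch - the staging table, the RowNum batching column and the column list are placeholders for the real ones):

-- Merge the staging rows into CONTRACT in 100k-row slices so that
-- each transaction stays as short as possible.
DECLARE @BatchStart int = 1, @BatchSize int = 100000, @MaxRow int;
SELECT @MaxRow = MAX(RowNum) FROM staging.Contract;

WHILE @BatchStart <= @MaxRow
BEGIN
    MERGE dbo.Contract AS t
    USING (SELECT *
           FROM staging.Contract
           WHERE RowNum BETWEEN @BatchStart AND @BatchStart + @BatchSize - 1) AS s
    ON t.Id = s.Id
    WHEN MATCHED THEN
        UPDATE SET t.SomeColumn = s.SomeColumn   -- ...plus the other ~150 columns
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, SomeColumn) VALUES (s.Id, s.SomeColumn);

    SET @BatchStart += @BatchSize;
END;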
Question: does anyone know of a way to quickly update this large amount of rows, while enforcing data integrity and without blocking inserts on referencing tables?
I've thought about a few workarounds, but I'm hoping there's a better way:
Drop the foreign keys. - I'd like to enforce data consistency, so this doesn't sound like a good solution.
Decrease the batch size on the MERGE statement so that the transaction is fast enough not to cause timeouts on other transactions. - I have tried this, but the sync process becomes too slow; as I mentioned above, I receive the update files frequently and it's vital that the updated data is available shortly after.
Create an intermediate table, with a single CONTRACTID column, and have the other tables reference that table instead of the CONTRACT table. This would allow me to update CONTRACT much faster while keeping decent integrity. - I guess it would work, but it sounds convoluted.
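In case it helps clarify that third idea, this is roughly the structure I'm imagining (all names are placeholders):

-- Narrow key-only table that changes far less often than CONTRACT.
CREATE TABLE dbo.ContractKey
(
    ContractId int NOT NULL PRIMARY KEY
);

-- Referencing tables point at the narrow key table instead of the
-- wide, frequently merged CONTRACT table.
ALTER TABLE dbo.ContractNote
    ADD CONSTRAINT FK_ContractNote_ContractKey
    FOREIGN KEY (ContractId) REFERENCES dbo.ContractKey (ContractId);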
Update:
I ended up dropping my foreign keys. Since the system has been in production for some time and the logs don't ever show foreign key constraint violations, I'm pretty sure no inconsistent data will be created. Thanks to everyone who commented.
I'm taking over a database with a table that is growing out of control. It has transaction records for 2011, 2012, 2013 and into the future.
The table is crucial to the company's operation, but it is growing out of control: it already has 730k records, with more transactions being added bi-weekly.
I do not wish to alter the existing structure of the table because many existing operations depend on it; so far it has an index on the transaction ID and the transaction date. But it is becoming very cumbersome to query the table.
Would it be wise, or even possible, to index just the year of the transaction dates by using left(date,4) as part of the index?
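What I had in mind is roughly this, assuming the column is an actual date type (the table and column names here are only illustrative):

-- Persisted computed column holding the transaction year, which can then
-- be indexed, instead of indexing an expression like left(date,4) directly.
ALTER TABLE dbo.Transactions
    ADD TransactionYear AS YEAR(TransactionDate) PERSISTED;

CREATE NONCLUSTERED INDEX IX_Transactions_Year
    ON dbo.Transactions (TransactionYear)
    INCLUDE (TransactionID);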
EDIT: the table is not normalized (and I don't see the point of normalizing it, since each row is unique to its claim number), and there are 168 fields in each record, with 5 different "memo" fields of varchar(255).
One of the options is to create a filtered index - an index that covers only the group of records matching specific criteria.
In your case you could create several indexes, each one filtering the records for a specific year.
For example:
CREATE NONCLUSTERED INDEX IndexFor2013Year
ON MyTable.BillOfMaterials (SomeDate)
WHERE SomeDate >= '2013-01-01' AND SomeDate < '2014-01-01';
GO
Keep in mind, though, that creating many indexes on a table that frequently undergoes DML operations (UPDATE/INSERT/DELETE) can actually hurt performance.
You should perform some tests and compare the execution plans.
And please note that this is just an example - which index you create depends on what exactly your query is. Sometimes, when I look at the execution plans of my queries, SQL Server Management Studio (2012) suggests which indexes could lead to better performance.