SQL Server Performance: Delete/Insert Vs Select/Update - sql-server

I have a very large database, little over 60 gigs, with many tables with millions of rows. I am getting some timeout errors, so I am rethinking some of my code design.
Currently, my pseduo code is like this:
delete from table where person=123 (deletes about 200 rows)
Then I re-insert the updated data (again, 200 rows). The data is always different, as it's time sensitive.
If I was to do an update, instead of insert, I'd have to select the row first (I'm using an ORM in c#).
tl;dr
I am just wondering, simple question, what is more cost effective.
Select / Update or Delete/Insert?

If you update any column that is part of the clustered index key then your update is handled internally as a delete/insert anyway
How would you handle the difference in cardinality with an UPDATE? Ie. person=123 has 200 rows to delete, but only 199 to insert. Update would not be able to handle this.
Your best approach should be to use a MERGE statement and a table valued parameter with the new values. Of course, no ORM can handle this, but you mention 'performance', and the terms 'performance' and 'ORM' cannot be used in the same sentence...

With Delete/Insert, you will be writing to the database twice. One time to delete and one time to insert. You will also be logging both of those transactions separately, unless you are properly wrapping the entire process in a single transaction.
You could test both methods and watch the results in SQL Profiler, but 9/10 Update will be quicker.
Could of cavets, I'd make sure the person key is indexed so that you are not doing a complete table scan to find the affected records.
Finally, as #Mundu say, you may want to do this using a parametrized query via ADO.NET instead of the ORM.

Related

Best way to handle updates on a table

I am looking for much more better way to update tables using SSIS. Specifically, i wanted to optimize the updates on tables (around 10 tables uses same logic).
The logic is,
Select the source data from staging then inserts into physical temp table in the DW (i.e TMP_Tbl)
Update all data matching by customerId column from TMP_Tbl to MyTbl.
Inserts all non-existing customerId column from TMP_Tbl1 to MyTbl.
Using the above steps, this takes some time populating TMP_Tbl. Hence, i planned to change the logic to delete-insert but according to this:
In SQL, is UPDATE always faster than DELETE+INSERT? this would be a recipe for pain.
Given:
no index/keys used on the tables
some tables contains 5M rows, some contains 2k rows
each table update took up to 2-3 minutes, which took for about (15 to 20 minutes) all in all
these updates we're in separate sequence container simultaneously runs
Anyone knows what's the best way to use, seems like using physical temp table needs to be remove, is this normal?
With SSIS you usually BULK INSERT, not INSERT. So if you do not mind DELETE - reinserting the rows should in general outperform UPDATE.
Considering this the faster approach will be:
[Execute SQL Task] Delete all records which you need to update. (Depending on your DB design and queries, some index may help here).
[Data Flow Task] Fast load (using OLE DB Destination, Data access mode: Table of fiew - fast load) both updated and new records from source into MyTbl. No need for temp tables here.
If you cannot/don't want to DELETE records - your current approach is OK too.
You just need to fix the performance of that UPDATE query (adding an index should help). 2-3 minutes per every record updated is way too long.
If it is 2-3 minutes for updating millions of records though - then it's acceptable.
Adding the correct non-clustered index to a table should not result in "much more time on the updates".
There will be a slight overhead, but if it helps your UPDATE to seek instead of scanning a big table - it is usually well worth it.

Tradeoffs with bulk update on high-traffic table

Suppose I have a SQL Server table that has millions of rows and receives over 2000 inserts per minute. A separate process needs to do a bulk update on this table, let's say with a where clause that will update 1000 rows. But it doesn't care about performance and could optionally run 1000 single-row updates using the primary key.
If the bulk update runs too long, it will block the incoming insertions, right? Whereas updating rows individually will allow insertions to squeak through the cracks and not block? So from the standpoint of optimizing performance for the insertions, am I better off running the updates one row at a time?
Updates will not block the insert but you might get an unexpected behavior if the where condition of the where condition is not applied to the new inserted rows.. So it's better to review the logic of the application to make sure that the new inserted rows are not needed in the update.
But in general the bulk update is much better than single updates.

Faster SQL Performance

I have to insert one record per tables across 30 tables. The data coming from some other System. I have to insert data in the tables for the first time, then if any update happened, then I need to update tables in the SQL Server. I have two options:
a) I can check timestamp for individual table rows and update if the timestamp is greater.
b) Everytime I can stateway delete records and insert data.
Which one will be faster in SQL Server Database? Is there any other option to address the situatation?
If you are not changing the index fields of the record, the stategy of trying to update first and then insert is usually faster than drop/insert as you don't force the database into updating a bunch of index info.
If using Sql2008+ you should be using the merge command, as it explictly handles the update/insert condition cleanly and clearly
ADDED
I should also add that is you know the usage pattern in rarely update (i.e., 90% insert), you may have a case when drop/insert in faster than update/insert -- depends on lots of details. Regardless, merge is the clear winner if using 2008+
I generally like drop and re-insert. I find it to be cleaner and easier to code. However, if this is happening very frequently and you're worried about concurrency issues, you're probably better off with option 1.
Also, another thing to factor in is how often does the timestamp check fail (where you don't have to insert nor update). If 99% of data is redundant/outdated data, you're probably better off with option 1 regardless.

SQL Server Optimize Large Changing Table

I have reports that perform some time consuming data calculations for each user in my database, and the result is 10 to 20 calculated new records for each user. To improve report responsiveness, a nightly job was created to run the calculations and dump the results to a snapshot table in the database. It only runs for active users.
So with 50k users, 30k of which are active, the job "updates" 300k to 600k records in the large snapshot table. The method it currently uses is it deletes all previous records for a given user, then inserts the new set. There is no PK on the table, only a business key is used to group the sets of data.
So my question is, when removing and adding up to 600k records every night, are there techniques to optimize the table to handle this? For instance, since the data can be recreated on demand, is there a way to disable logging for the table as these changes are made?
UPDATE:
One issue is I cannot do this in batch because the way the script works, it's examining one user at a time, so it looks at a user, deletes the previous 10-20 records, and inserts a new set of 10-20 records. It does this over and over. I am worried that the transaction log will run out of space or other performance issues could occur. I would like to configure the table to now worry about data preservation or other items that could slow it down. I cannot drop the indexes and all that because people are accessing the table concurrently to it being updated.
It's also worth noting that indexing could potentially speed up this bulk update rather than slow it down, because UPDATE and DELETE statements still need to be able to locate the affected rows in the first place, and without appropriate indexes it will resort to table scans.
I would, at the very least, consider a non-clustered index on the column(s) that identify the user, and (assuming you are using 2008) consider the MERGE statement, which can definitely avoid the shortcomings of the mass DELETE/INSERT method currently employed.
According to The Data Loading Performance Guide (MSDN), MERGE is minimally logged for inserts with the use of a trace flag.
I won't say too much more until I know which version of SQL Server you are using.
This is called Bulk Insert, you have to drop all indexes in destination table and send insert commands in large packs (hundreds of insert statements) separated by ;
Another way is to use BULK INSERT statement http://msdn.microsoft.com/en-us/library/ms188365.aspx
but it involves dumping data to file.
See also: Bulk Insert Sql Server millions of record
It really depends upon many things
speed of your machine
size of the records being processed
network speed
etc.
Generally it is quicker to add records to a "heap" or an un-indexed table. So dropping all of your indexes and re-creating them after the load may improve your performance.
Partitioning the table may see performance benefits if you partition by active and inactive users (although the data set may be a little small for this)
Ensure you test how long each tweak adds or reduces your load and work from there.

Handling a Huge Data of Record in 1 table

I would like to ask couple question how to handle a huge 100 million of data in 1 single table.
The table will perform INSERT, SELECT & UPDATE.
I have got some advise that to Index the table and Archive the table into couple table.
Any other suggestion that can help to tweak the SQL Performance.
Case:
SQL Server 2008.
Most of the time the update regarding decimal value, and status of tiny int.
The INSERT statement will not using BULK INSERT since I'm assuming that per min that there'r a lot of users let said 10000-500000 performing INSERT statement and Update the table.
You should consider what kind of columns you have.
The more nvarchar/text/etc columns you have included in the different indexes, the slower the index will be.
Also what RDBMS are you going to use? You have different options based on SQL Server, Oracle and MySQL...
But the crucial thing is differently to build the right index's that you would use...
One other thing, you could use BULK INSERT on SQL Server to speed up the inserts.
But ask away, i have dealt with databases being populated with 70 mill data rows pr day ;)
EDIT ----- After more information has come
I'll try to take a little other approach to the case and compare it to data scraping.
There are no doubt that INSERTs are faster than UPDATEs. And you might want to make a table that acts as a "collect" table. What I mean is that it only get inserts all the time. No updates, all is handle with inserts.
Then you use a trigger/event/scheduler to handle what has come into that table and populate what you need to another(s) table(s).
This way you will be able to apply a little business logic to the "cleanup" (update) and keep the performance on the DB Server and not hold up a connection while these things are done.
This of course also have something to do with what the "final" data are to be used for...
\T
Clearly SQL 2008 is capable of 100 million records but a lot of details to look at that just do not come into play at 100 thousand. Pick a good primary key. Fill factor. Other indexes (will slow down insert but speed select). Concurrency (locking). If you can accept dirty reads then that will help performance. This question needs a lot more detail. You need to post the table design and your select, update, and insert TSQL statements. I did not vote your question down but if you don't provide more detail it will get voted down.
For insert be aware you can insert multiple rows at once and is much faster than multiple insert statements if BULK INSERT is not an option.
INSERT INTO Production.UnitMeasure
VALUES (N'FT2', N'Square Feet ', '20080923'), (N'Y', N'Yards', '20080923'), (N'Y3', N'Cubic Yards', '20080923');

Resources