Update Rows in batches SQL Server - sql-server

I have a SQL Server table with approximately 900 million rows and an auto-increment Id column. My goal is to update the table 40,000 rows at a time: fetch the first 40,000 rows, generate records for them by calling an API, and update them in the table; then take the next 40,000 rows starting from Id 40001, generate their records, and store them in the table.
For this process I create a temp table, insert 40,000 records into it from the target table, process them, and update the target table. In the next iteration I truncate the temp table, take the next 40,000 rows from the target table, insert them into the temp table, and process those.
I need the temp table because I want to get the max Id from it, so that in the next iteration I can select rows from the target table whose Id is greater than that max Id.
Is there any better process to do it?

How about simply using your ID column to limit the range?
SELECT TOP (40000)
*
FROM table
WHERE id > #id
ORDER BY id ASC;
Then loop through in your code, making the start id the last id returned by the prior select.
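Spelled out as a T-SQL loop (the same idea works from application code), the pattern could look roughly like this; dbo.BigTable and Id are placeholder names, and the processing step stands in for your API call:
DECLARE @LastId bigint = 0;
DECLARE @Batch TABLE (Id bigint PRIMARY KEY);

WHILE 1 = 1
BEGIN
    DELETE FROM @Batch;

    -- grab the next 40,000 ids after the last one processed
    INSERT INTO @Batch (Id)
    SELECT TOP (40000) Id
    FROM dbo.BigTable
    WHERE Id > @LastId
    ORDER BY Id ASC;

    IF @@ROWCOUNT = 0 BREAK;   -- nothing left to process

    -- process the rows in @Batch here (call the API, update dbo.BigTable)

    SELECT @LastId = MAX(Id) FROM @Batch;
END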

Related

Select query takes a long time even for 10 records from a very big table

I have a very big table with about 6.5 million records. When I try to select even 10 records from it, I have to wait a long (and unpredictable) time.
SELECT [Column1], [Column2], [Column3], [Column4], [Column5]
FROM [table]
WHERE deviceDataId = '640'
ORDER BY id ASC OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY
My database is deployed on Azure; I also downloaded it and deployed it on my local system, but it takes the same time.
Query execution plan:
Now that we have your real query, we can see that you appear to have no index on the column deviceDataId, which means the entire table has to be scanned. So even though you only want 10 rows, all 6.5M rows must be scanned and the value of deviceDataId checked.
If you create an index on deviceDataId and, at minimum, INCLUDE the other columns in your query, you'll have a covering index, which will help greatly. This also assumes id is the column your CLUSTERED INDEX is ordered on.
CREATE NONCLUSTERED INDEX IX_Table_DeviceTableID_Cols1_5
ON dbo.[table] (deviceDataId)
INCLUDE ([Column1], [Column2], [Column3], [Column4], [Column5]);
Also, as I note in the comments, if deviceDataId is an int, use an int value in your WHERE clause. Don't wrap numerical values in single quotes; that is for literal strings.
WHERE deviceDataId = 640

best practice to update data from the last 3 months in a very large table

I have 2 tables:
History Table A (10 billion records, nonclustered columnstore index on all columns, partitioned every 6 months)
Staging Table B (2000 records, no partition)
I want to compare Table B with the last 3 months of data in Table A, and if the data in Table A is older, update it with the data from Table B.
My update query looks like this:
UPDATE A
SET    A.[Column] = B.[Column]
FROM   TableA AS A
JOIN   TableB AS B ON B.[Key] = A.[Key]
WHERE  B.[DateColumn] > A.[DateColumn]
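In full T-SQL, with the three-month window included, the intent is roughly the following (table, key, and column names are placeholders):
UPDATE A
SET    A.[Column] = B.[Column]
FROM   dbo.HistoryA AS A
JOIN   dbo.StagingB AS B ON B.[Key] = A.[Key]
WHERE  A.[DateColumn] >= DATEADD(MONTH, -3, CAST(GETDATE() AS date))  -- only the last 3 months of A
  AND  B.[DateColumn] > A.[DateColumn];                               -- B is newer than A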
The problem is that this query takes 5 hours to update only 2000 records; the JOIN between Table A and Table B takes too long.
I have tried many ways to modify the query (e.g. adding a nonclustered index on the date column), but with no better results.
How can I improve the performance of this update query?

sql select performance (Microsoft)

I have a very big table with many rows (50 million) and more than 500 columns. It is indexed on period and client. For a given period I need to extract the client and one other column (which is not indexed), and it takes too much time. So I'm trying to understand why:
If I do:
select count(*)
from table
where cd_periodo=201602
It takes less than 1 second and returns a count of about 2 million.
If I select just the period into a temp table, it is also fast (about 2 seconds):
select cd_periodo
into #table
from table
where cd_periodo=201602
But if I select another column that is not part of an index, it takes more than 3 minutes:
select not_index_column
into #table
from table
where cd_periodo=201602
Why is this happening? I'm not filtering on that column.
When you select only an indexed column, SQL Server doesn't have to go into the table and read the entire row: the index alone covers the query, so the value can be returned without touching the underlying rows.
When you select a non-indexed column, the opposite happens: every qualifying row has to be read from the table to get that column's value.
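If this query pattern is common, one option is a covering index so the extra column can be read from the index instead of the table. A sketch, assuming the column names from the question:
CREATE NONCLUSTERED INDEX IX_table_cd_periodo_incl
ON dbo.[table] (cd_periodo)
INCLUDE (not_index_column);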

Get a list of columns and widths for a specific record

I want a list of properties about a given table, and for a specific record of data from that table, all in one result.
Something like this:
Column Name, DataLength, SchemaLengthMax
...and for only one record (based on a where filter)
So what I'm thinking is something like this:
- Get a list of columns from sys.columns and also the schema-based maxlength value
- populate column names into a temp table that includes (column_name, data_length, schema_size_max)
- now loop over that temp table and for each column name, fetch the data for that column based on a specific record, then update the temp table with the length of this data
- finally, select from the temp table
sound reasonable?
Yup. That way works. Not sure if it's the best, since it involves one iteration per column along with the where condition on the source table.
Consider this instead:
1. Get the candidate records into a temporary table after applying the where condition. Make sure to get a primary key; if there is no primary key, get a ROWID (assuming SQL Server 2005 or above).
2. Create a temporary table (say, #RecValueLens) that has three columns: Primary_Key_Value, MyColumnName, MyValueLen.
3. Loop through the list of column names (after taking only the column names into another temporary table) and build the SQL statement shown in Step 4.
4. One statement per column, of the form:
Insert Into #RecValueLens (Primary_Key_Value, MyColumnName, MyValueLen)
Select Max(Primary_Key_Goes_Here),
       Max('Column_Name_Goes_Here') As MyColumnName,
       Len(Max(Column_Name_Goes_Here)) As MyValueLen
From   Source_Table_Goes_Here
Group By Primary_Key_Goes_Here
So, if there are 10 columns, you will have 10 insert statements. You could either insert them into a temporary table and run them in a loop or, if the number of columns is small, concatenate all the statements into a single batch.
5. Run the SQL statement(s) from above. Now you have record-wise, column-wise value lengths; what is left is the column definition.
6. Get the column definition from sys.columns into a temporary table and join it with #RecValueLens to get the output.
Do you want me to write it for you?
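For steps 3 and 4, the per-column statements could also be generated with dynamic SQL, along these lines (just a sketch; dbo.SourceTable and its Id key are placeholders, and #RecValueLens is assumed to already exist in the session):
DECLARE @sql nvarchar(max) = N'';

-- build one INSERT per column from sys.columns
SELECT @sql = @sql + N'
INSERT INTO #RecValueLens (Primary_Key_Value, MyColumnName, MyValueLen)
SELECT Id, ''' + c.name + N''',
       LEN(CONVERT(nvarchar(max), ' + QUOTENAME(c.name) + N'))
FROM dbo.SourceTable
WHERE Id = @IdFilter;'
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.SourceTable');

-- run the whole batch for one specific record
EXEC sp_executesql @sql, N'@IdFilter int', @IdFilter = 42;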

how to increment a sequence number while triggering data from one table to another

I want to write a trigger that transfers some columns of all inserted rows in a table to another table, while incrementing the maximum value of a sequence-number field in the destination table. This field is not auto-increment, but it is a primary key field.
What I used to do was find the max sequence number in the destination table, increment it, and insert the new value. This worked fine when data was inserted one row at a time, but when many rows are inserted by a single query, how can I increment the sequence number? A sample of the problem follows:
insert into [mssql].mssql.dbo.destination_table (name,seq_no)
select name,?
from inserted
Even a few thousand rows can be inserted at once.
seq_no is part of a composite primary key, so, for example, rows inserted under a different name get their own seq_no numbering. (This requirement can wait until I can increment seq_no at all, without worrying about its role in the primary key.)
Okay, I see your problem; try this:
insert into [mssql].mssql.dbo.destination_table (name, seq_no)
select name, x.MaxSeq + row_number() over (order by name)
from inserted
cross join (select isnull(max(seq_no), 0) as MaxSeq   -- current max in the destination (0 if empty)
            from [mssql].mssql.dbo.destination_table) x
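If the numbering really has to restart per name (the composite-key point in the question), a variant along these lines might work (untested sketch):
insert into [mssql].mssql.dbo.destination_table (name, seq_no)
select i.name,
       isnull(x.MaxSeq, 0) + row_number() over (partition by i.name order by i.name)
from inserted as i
outer apply (select max(d.seq_no) as MaxSeq          -- current max for this name (null if none yet)
             from [mssql].mssql.dbo.destination_table as d
             where d.name = i.name) as x;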
