I would like to update one column's value for all 19k rows in a table, and I would like to know the fastest way of updating thousands of rows in a SQL database. Please suggest.
Below is the code I tried, but it's taking a full day to execute, which makes the existing application freeze.
update table_name
set column_name_value = 2
If you are simply changing the whole column to a single value, I would ask what the point of having the column is in the first place. Either way, if you're updating a live production transactional database, I'd take a backup first.
If you are allowed, switch the database to the SIMPLE recovery model; this will reduce the amount of space the transaction log uses. The BULK_LOGGED recovery model is another option. This may not be possible in your environment.
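For example (a sketch with a hypothetical database name; it assumes you can briefly change the recovery model and switch it back afterwards):
ALTER DATABASE MyDb SET RECOVERY SIMPLE;  -- MyDb is a placeholder name
-- ... run the big update here ...
ALTER DATABASE MyDb SET RECOVERY FULL;    -- restore the original model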
That said, you can drop the column and re-add it with a default:
ALTER TABLE table_name DROP COLUMN column_name;
ALTER TABLE table_name ADD column_name INTEGER NOT NULL DEFAULT 2;
You may also do something like this:
UPDATE table_name WITH (TABLOCKX) SET column_name = 2;
If you cannot do any of the preceding, taking the database out of commission, then you may have to chunk your updates.
Try the following (if you happen to have an identity column on the table):
DECLARE @num_chunks INT = 10 -- this will break your update into 10 smaller updates
DECLARE @counter INT = 0
WHILE @counter < @num_chunks
BEGIN
    UPDATE table_name SET column_name = 2
    WHERE (int_identity_column - @counter) % @num_chunks = 0
    PRINT 'DONE WITH GROUP ' + CAST(@counter AS VARCHAR) + ' AT ' + CAST(GETDATE() AS VARCHAR)
    SET @counter = @counter + 1
END
Using this method also lets you restart the process from where it left off if it gets interrupted.
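If the table has no identity column, a common alternative (a sketch, assuming every row should end up with the value 2, so finished rows can be filtered out) is to batch with TOP:
WHILE 1 = 1
BEGIN
    UPDATE TOP (5000) table_name
    SET column_name = 2
    WHERE column_name <> 2 OR column_name IS NULL  -- skip rows already done
    IF @@ROWCOUNT = 0 BREAK
END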
We are building a protected INSTEAD OF UPDATE trigger to monitor and control the updates of several tables (most of the MasterData tables of the system, about 150 of them). So we try to keep the code as reusable as possible across installations and updates (no hard-coded field names or table names).
To track the current version of a row, an _ACTIVE field exists, and its value decreases with each new version (the active row gets _ACTIVE = 1). Unfortunately, we cannot use the temporal tables feature for backwards-compatibility reasons (plenty of business logic is built on this scheme).
The update logic treats the OLD and NEW lines before they affect the table, and once everything is processed, it updates the table: not only the affected rows, but all rows with the same uniqueness-key fields (identifying the uniqueness fields is meant to be dynamic too; in the following example, the WHERE clause is constructed dynamically in the variable @toWhereOnClause).
The real table undergoes two actions: first, a batch of new lines is inserted with _ACTIVE = 2; second, all rows that need updating get _ACTIVE -= 1, leaving the newest version of each row set to 1.
The problem arises because this second action, the update, needs to be built dynamically to avoid hard-coding the table name and setting @toWhereOnClause manually. This fires the trigger once more, and (we believe) because it is dynamic SQL it is not caught by the TRIGGER_NESTLEVEL() = 1 check.
The code structure is as follows:
CREATE OR ALTER TRIGGER [schema].[triggerName] ON [schema].[table]
INSTEAD OF UPDATE
AS
BEGIN
    DECLARE @tableName NVARCHAR(260), @schema NVARCHAR(130), @table NVARCHAR(130),
            @fieldNameS NVARCHAR(MAX), @uniqueFieldSelector NVARCHAR(MAX),
            @toWhereOnClause NVARCHAR(MAX), @statementINSERT NVARCHAR(MAX),
            @statementUPDATE NVARCHAR(MAX), @CONTINUE_TRIGGER BIT

    SET @tableName = '[schema].[table]' -- only line to modify for different tables

    -- TRIGGER PREPARATION
    SET @schema = (SELECT SUBSTRING(@tableName, 1, (CHARINDEX('].', @tableName))))
    SET @table = (SELECT SUBSTRING(@tableName, (CHARINDEX('[', @tableName, 3)), LEN(@tableName)))
    SET @fieldNameS = (SELECT + ',' + QUOTENAME(COLUMN_NAME)
                       FROM INFORMATION_SCHEMA.COLUMNS
                       WHERE TABLE_SCHEMA = @schema
                         AND TABLE_NAME = @table
                       ORDER BY ORDINAL_POSITION
                       FOR XML PATH(''));
    SET @uniqueFieldSelector = (SELECT + ' AND LeftTable.' + COLUMN_NAME + ' = #INSERTED.' + COLUMN_NAME
                                FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
                                WHERE TABLE_NAME = @table
                                  AND COLUMN_NAME NOT LIKE '%Active%'
                                FOR XML PATH(''))
    SET @toWhereOnClause = (SELECT (SUBSTRING(@uniqueFieldSelector, 5, LEN(@uniqueFieldSelector))))

    -- DUPLICATE TRIGGER TABLES INTO TEMP TABLES TO WORK ON THE NEW AND OLD LINES
    SELECT * INTO #INSERTED FROM INSERTED -- the logical table INSERTED can't be modified, so copy it to a temp table
    SELECT * INTO #DELETED FROM DELETED

    -- SEVERAL INSTRUCTIONS TO TREAT THE OLD AND NEW LINES (not shown here)
    -- AND DECIDE WHETHER THE UPDATE IS LEGAL (@CONTINUE_TRIGGER)
    ...

    -- REAL UPDATE
    IF TRIGGER_NESTLEVEL() = 1 AND @CONTINUE_TRIGGER = 1
    -- https://stackoverflow.com/questions/1529412/how-do-i-prevent-a-database-trigger-from-recursing
    BEGIN
        SET @statementINSERT = N'INSERT INTO ' + @tableName + ' ( ... )
            SELECT ... FROM #INSERTED ';
        EXECUTE sp_executesql @statementINSERT

        SET @statementUPDATE = N'UPDATE TheRealTable
            SET TheRealTable._ACTIVE -= 1
            FROM ' + @tableName + ' AS TheRealTable
            INNER JOIN #INSERTED ON ' + @toWhereOnClause;
        EXECUTE sp_executesql @statementUPDATE
    END
END
Yes, we know it is complex, but legacy doesn't give many options.
So:
Is there any way to keep the dynamic SQL from firing the trigger again?
(The system runs on Windows Server and Azure instances, all at compatibility level 120 at least.)
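One possible workaround (a sketch of my own, not a tested solution; CONTEXT_INFO is used here because SESSION_CONTEXT requires compatibility level 130+) is to set a session flag before executing the dynamic SQL and have the trigger bail out when it sees the flag:
-- At the very top of the trigger body:
IF CONTEXT_INFO() = 0x1 RETURN  -- we were fired by our own dynamic SQL; do nothing
...
-- Around the dynamic statements:
SET CONTEXT_INFO 0x1
EXECUTE sp_executesql @statementUPDATE
SET CONTEXT_INFO 0x0
CONTEXT_INFO is session-scoped, so the nested trigger invocation sees the flag even though the dynamic SQL runs in its own batch.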
I have a trigger in mssql in which I want to compare each column from the inserted table with the deleted table to check if the value has changed...
If the value has changed I want to insert the column name into a temp table.
My code until now:
declare columnCursor CURSOR FOR
    SELECT COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'MyTable' AND TABLE_SCHEMA = 'dbo'

-- save inserted and deleted into temp tables
select * into #row1 from Inserted
select * into #row2 from Deleted

declare @tmp table(column_name nvarchar(max))
declare @column nvarchar(50)

OPEN columnCursor
FETCH NEXT FROM columnCursor INTO @column
while @@FETCH_STATUS = 0 begin
    declare @out bit
    declare @sql nvarchar(max) = N'
        select @out = case when r1.' + @column + ' <> r2.' + @column + ' then 1 else 0 end
        from #row1 r1
        left join #row2 r2 on r1.sys_volgnr = r2.sys_volgnr'
    exec sp_executesql @sql, N'@out bit OUTPUT', @out = @out OUTPUT
    if( @out = 1 ) begin
        insert into @tmp VALUES(@column)
    end
    FETCH NEXT FROM columnCursor INTO @column
end
CLOSE columnCursor;
DEALLOCATE columnCursor;
Is there an easier way to accomplish this?
Yes, there is.
You can use the COLUMNS_UPDATED function to determine the columns that had actually changed values, though it's not a very friendly function in terms of code readability.
Read this article from Microsoft support called Proper Use of the COLUMNS_UPDATED() Function to see what I mean.
I came across an article called A More Performant Alternative To COLUMNS_UPDATED(); perhaps it can help you, or at least inspire you.
I will note that you should resist the temptation to use the UPDATE() function, as it may return true even if no data was changed.
Here is the relevant part from its MSDN page:
UPDATE() returns TRUE regardless of whether an INSERT or UPDATE attempt is successful.
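For illustration, a minimal sketch of the bitmask test inside a trigger body (my example, not taken from the articles; the ordinal position comes from INFORMATION_SCHEMA.COLUMNS):
DECLARE @ordinal INT = 3  -- ordinal position of the column you care about
-- COLUMNS_UPDATED() packs one bit per column, low byte first
IF (SUBSTRING(COLUMNS_UPDATED(), (@ordinal - 1) / 8 + 1, 1)
        & POWER(2, (@ordinal - 1) % 8)) > 0
    PRINT 'Column at that ordinal position appeared in the SET list'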
Looks like you're trying to build a dynamic solution, which might be useful if you expect the schema to change often (new columns added, etc.). You could do something like this (in pseudo-code).
Build dynamic SQL for the column names, based on the catalog views (INFORMATION_SCHEMA.COLUMNS):
insert into table ...
select
    function_to_split_by_comma (
        case when I.col1 <> U.col1 then 'col1,' else '' end +
        case when I.col2 <> U.col2 then 'col2,' else '' end +
        ...
    )
where
    I.key_column1 = U.key_column1 ...
These names (col1, col2) come from the catalog query, with one CASE per column, plus the fixed SQL part at the beginning; you'll also need to figure out how to join inserted and deleted, which requires the primary key.
For splitting the data into rows, you can use for example the DelimitedSplit8K function by Jeff Moden (http://www.sqlservercentral.com/articles/Tally+Table/72993/).
Also as Damien pointed out, there can be more than one row in the inserted / deleted tables.
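To make the pseudo-code concrete, the generated statement might look roughly like this (a sketch with hypothetical names: sys_id stands in for the primary key, #changed_columns for your result table; it assumes the DelimitedSplit8K splitter mentioned above, whose output column is named Item):
INSERT INTO #changed_columns (column_name)
SELECT s.Item
FROM inserted I
JOIN deleted D ON D.sys_id = I.sys_id
CROSS APPLY dbo.DelimitedSplit8K(
        CASE WHEN I.col1 <> D.col1 THEN 'col1,' ELSE '' END +
        CASE WHEN I.col2 <> D.col2 THEN 'col2,' ELSE '' END,
        ',') s
WHERE s.Item <> ''  -- note: <> misses NULL-to-value changes; handle NULLs if needed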
The year is 2010.
SQL Server licenses are not cheap.
And yet, this error still does not indicate the row or the column or the value that produced the problem. Hell, it can't even tell you whether it was "string" or "binary" data.
Am I missing something?
A quick-and-dirty way of fixing these is to select the rows into a new physical table like so:
SELECT * INTO dbo.MyNewTable FROM <the rest of the offending query goes here>
...and then compare the schema of this table to the schema of the table into which the INSERT was previously going - and look for the larger column(s).
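A possible comparison query (a sketch; the table names are placeholders, adjust them to your own):
SELECT s.COLUMN_NAME,
       s.CHARACTER_MAXIMUM_LENGTH AS source_len,
       t.CHARACTER_MAXIMUM_LENGTH AS target_len
FROM INFORMATION_SCHEMA.COLUMNS s
JOIN INFORMATION_SCHEMA.COLUMNS t ON t.COLUMN_NAME = s.COLUMN_NAME
WHERE s.TABLE_NAME = 'MyNewTable'      -- table created by SELECT INTO
  AND t.TABLE_NAME = 'MyTargetTable'   -- original INSERT target
  AND s.CHARACTER_MAXIMUM_LENGTH > t.CHARACTER_MAXIMUM_LENGTH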
I realize that this is an old one. Here's a small piece of code that I use that helps.
What this does is return a table of the max lengths in the table you're trying to select from. You can then compare the field lengths to the max returned for each column and figure out which ones are causing the issue. Then it's just a simple query to clean up the data or exclude it.
DECLARE @col NVARCHAR(50)
DECLARE @sql NVARCHAR(MAX);

CREATE TABLE ##temp (colname nvarchar(50), maxVal int)

DECLARE oloop CURSOR FOR
    SELECT COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'SOURCETABLENAME' AND TABLE_SCHEMA = 'dbo'

OPEN oloop
FETCH NEXT FROM oloop INTO @col;
WHILE (@@FETCH_STATUS = 0)
BEGIN
    SET @sql = '
        DECLARE @val INT;
        SELECT @val = MAX(LEN(' + @col + ')) FROM dbo.SOURCETABLENAME;
        INSERT INTO ##temp
                ( colname, maxVal )
        VALUES  ( N''' + @col + ''', -- colname - nvarchar(50)
                  @val               -- maxVal - int
                )';
    EXEC(@sql);
    FETCH NEXT FROM oloop INTO @col;
END
CLOSE oloop;
DEALLOCATE oloop

SELECT * FROM ##temp
DROP TABLE ##temp;
Another way here is to use binary search.
Comment out half of the columns in your code and try again. If the error persists, comment out half of that half and try again. You will narrow down your search to just two columns in the end.
You could check the length of each inserted value with an IF condition, and if the value needs more width than the current column allows, truncate the value and throw a custom error.
That should work if you just need to identify which field is causing the problem. I don't know if there's any better way to do this, though.
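A sketch of that idea (hypothetical variable and column names; 50 stands in for the column's declared width):
IF LEN(@value) > 50  -- 50 = declared width of the target column
BEGIN
    RAISERROR('Value too long for MyColumn: %s', 16, 1, @value)
    SET @value = LEFT(@value, 50)  -- truncate so the insert can proceed
END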
Recommend you vote for the enhancement request on Microsoft's site. It's been active for 6 years now so who knows if Microsoft will ever do anything about it, but at least you can be a squeaky wheel: Microsoft Connect
For string truncation, I came up with the following solution to find the max lengths of all of the columns:
1) Select all of the data into a temporary table (supplying column names where needed), e.g.
SELECT col1
,col2
,col3_4 = col3 + '-' + col4
INTO #temp;
2) Run the following SQL Statement in the same connection (adjust the temporary table name if needed):
DECLARE @table VARCHAR(MAX) = '#temp'; -- change this to your temp table name
DECLARE @select VARCHAR(MAX) = '';
DECLARE @prefix VARCHAR(256) = 'MAX(LEN(';
DECLARE @suffix VARCHAR(256) = ')) AS max_';
DECLARE @nl CHAR(2) = CHAR(13) + CHAR(10);

SELECT @select = @select + @prefix + name + @suffix + name + @nl + ','
FROM tempdb.sys.columns
WHERE object_id = OBJECT_ID('tempdb..' + @table);

SELECT @select = 'SELECT ' + @select + '0' + @nl + 'FROM ' + @table

EXEC(@select);
It will return a result set with the column names prefixed with 'max_' and show the max length of each column.
Once you identify the faulty column you can run other select statements to find extra long rows and adjust your code/data as needed.
I can't think of a good way really.
I once spent a lot of time debugging a very informative "Division by zero" message.
Usually you comment out various pieces of output code to find the one causing problems.
Then you take the piece you found and make it return a value that indicates there's a problem instead of the actual value (in your case, replace the string output with LEN(output)). Then manually compare that to the length of the column you're inserting it into.
From the line number in the error message, you should be able to identify the INSERT query that is causing the error. Modify it into a SELECT query and add AND LEN(your_expression_or_column_here) > CONSTANT_COL_INT_LEN for the various string columns in your query. Look at the output and it will give you the bad rows.
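For instance (hypothetical names; 50 stands in for the target column's declared length):
SELECT *
FROM dbo.SourceTable
WHERE LEN(some_column) > 50  -- rows that would be truncated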
Technically, there isn't a row to point to because SQL didn't write the data to the table. I typically just capture the trace, run it in Query Analyzer (unless the problem is already obvious from the trace, which it may be in this case), and quickly debug from there with the ages-old "modify my UPDATE to a SELECT" method. Doesn't it really just break down to one of two things:
a) Your column definition is wrong, and the width needs to be changed
b) Your column definition is right, and the app needs to be more defensive
?
The best thing that worked for me was to put the rows into a temporary table first, using SELECT ... INTO #temptable.
Then I took the max length of each column in that temp table, e.g. SELECT MAX(LEN(jobid)) AS JobId, ...
and then compared that to the field definitions of the destination table.
I am working on a client's database, and there are about 1 million rows that need to be deleted due to a bug in the software. Is there a more efficient way to delete them than:
DELETE FROM table_1 where condition1 = 'value' ?
Here is a structure for a batched delete as suggested above. Do not try 1M at once...
The size of the batch and the WAITFOR delay are obviously quite variable, and depend on your server's capabilities as well as your need to mitigate contention. You may need to manually delete some rows, measure how long they take, and adjust your batch size to something your server can handle. As mentioned above, anything over 5000 rows can cause lock escalation (which I was not aware of).
This would be best done after hours... but 1M rows is really not a lot for SQL Server to handle. If you watch the Messages tab in SSMS, it may take a while for the print output to show; it will appear after several batches, just be aware it won't update in real time.
Edit: added a stop time, @MAXRUNTIME and @BSTOPATMAXTIME. If you set @BSTOPATMAXTIME to 1, the script will stop on its own at the desired time, say 8:00 AM. This way you can schedule it nightly to start at, say, midnight, and it will stop before production hours at 8 AM.
Edit: this answer is pretty popular, so I have added RAISERROR in lieu of PRINT per the comments.
DECLARE @BATCHSIZE INT, @WAITFORVAL VARCHAR(8), @ITERATION INT, @TOTALROWS INT, @MAXRUNTIME VARCHAR(8), @BSTOPATMAXTIME BIT, @MSG VARCHAR(500)
SET DEADLOCK_PRIORITY LOW;
SET @BATCHSIZE = 4000
SET @WAITFORVAL = '00:00:10'
SET @MAXRUNTIME = '08:00:00' -- 8AM
SET @BSTOPATMAXTIME = 1 -- ENFORCE 8AM STOP TIME
SET @ITERATION = 0 -- LEAVE THIS
SET @TOTALROWS = 0 -- LEAVE THIS
WHILE @BATCHSIZE > 0
BEGIN
    -- IF @BSTOPATMAXTIME = 1, THEN WE'LL STOP THE WHOLE JOB AT A SET TIME...
    IF CONVERT(VARCHAR(8), GETDATE(), 108) >= @MAXRUNTIME AND @BSTOPATMAXTIME = 1
    BEGIN
        RETURN
    END
    DELETE TOP(@BATCHSIZE)
    FROM SOMETABLE
    WHERE 1=2 -- replace with your delete condition
    SET @BATCHSIZE = @@ROWCOUNT
    SET @ITERATION = @ITERATION + 1
    SET @TOTALROWS = @TOTALROWS + @BATCHSIZE
    SET @MSG = 'Iteration: ' + CAST(@ITERATION AS VARCHAR) + ' Total deletes:' + CAST(@TOTALROWS AS VARCHAR)
    RAISERROR (@MSG, 0, 1) WITH NOWAIT
    WAITFOR DELAY @WAITFORVAL
END
BEGIN TRANSACTION
DoAgain:
DELETE TOP (1000)
FROM <YourTable>
IF ##ROWCOUNT > 0
GOTO DoAgain
COMMIT TRANSACTION
Maybe this solution from Uri Dimant will help:
WHILE 1 = 1
BEGIN
    DELETE TOP(2000)
    FROM Foo
    WHERE <predicate>;
    IF @@ROWCOUNT < 2000 BREAK;
END
(Link: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/b5225ca7-f16a-4b80-b64f-3576c6aa4d1f/how-to-quickly-delete-millions-of-rows?forum=transactsql)
Here is something I have used:
If the bad data is mixed in with the good:
INSERT INTO #table
SELECT columns
FROM old_table
WHERE statement to exclude bad rows
TRUNCATE TABLE old_table
INSERT INTO old_table
SELECT columns FROM #table
Not sure how good this would be, but what if you do the following (provided table_1 is a stand-alone table, i.e. not referenced by any other table):
create a duplicate of table_1 (same structure) named table_1_dup, then:
insert into table_1_dup select * from table_1 where condition1 <> 'value';
drop table table_1;
exec sp_rename 'table_1_dup', 'table_1';
If you cannot afford to get the database out of production while repairing, do it in small batches. See also: How to efficiently delete rows while NOT using Truncate Table in a 500,000+ rows table
If you are in a hurry and need the fastest way possible:
take the database out of production
drop all non-clustered indexes and triggers
delete the records (or if the majority of records is bad, copy+drop+rename the table)
(if applicable) fix the inconsistencies caused by the fact that you dropped triggers
re-create the indexes and triggers
bring the database back in production
I need to iterate through the fields on a table and do something if a field's value does not equal its default value.
I'm in a trigger and so I know the table name. I then loop through each of
the fields using this loop:
select @field = 0, @maxfield = max(ORDINAL_POSITION) from
INFORMATION_SCHEMA.COLUMNS where TABLE_NAME = @TableName
while @field < @maxfield
begin
...
I can then get the field name on each iteration through the loop:
select @fieldname = COLUMN_NAME from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = @TableName
and ORDINAL_POSITION = @field
And I can get the default value for that column:
select @ColDefault = SUBSTRING(Column_Default, 2, LEN(Column_Default) - 2)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE Table_Name = @TableName
AND Column_name = @fieldname
I have everything I need, but I can't see how to compare the two. Because I don't have the field name as a constant, only in a variable, I can't see how to get the value out of the 'inserted' table (remember, I'm in a trigger) in order to see if it is the same as the default value (held now in @ColDefault as a varchar).
First, remember that a trigger can be fired with multiple records coming in simultaneously. If I do this:
INSERT INTO dbo.MyTableWithTrigger
SELECT * FROM dbo.MyOtherTable
then my trigger on the MyTableWithTrigger will need to handle more than one record. The "inserted" pseudotable will have more than just one record in it.
Having said that, to compare the data, you can run a select statement like this:
DECLARE @sqlToExec VARCHAR(8000)
SET @sqlToExec = 'SELECT * FROM INSERTED WHERE [' + @fieldname + '] <> ' + @ColDefault
EXEC(@sqlToExec)
That will return all rows from the inserted pseudotable that don't match the defaults. It sounds like you want to DO something with those rows, so what you might want to do is create a temp table before you execute that @sqlToExec string, and instead of just selecting the data, insert it into the temp table. Then you can use those rows to do whatever exception handling you need.
One catch - this T-SQL only works for numeric fields. You'll probably want to build separate handling for different types of fields. You might have varchars, numerics, blobs, etc., and you'll need different ways of comparing those.
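For character columns, a hedged sketch of the same idea (the default text needs quoting and embedded quotes need doubling); note that dynamic SQL runs in its own batch and cannot see the inserted pseudotable directly, so copy it into a temp table first, as the other trigger examples on this page do:
SELECT * INTO #INSERTED FROM inserted  -- dynamic SQL can't reference the inserted pseudotable itself
SET @sqlToExec = 'SELECT * FROM #INSERTED WHERE [' + @fieldname + '] <> N'''
               + REPLACE(@ColDefault, '''', '''''') + ''''
EXEC(@sqlToExec)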
I suspect you can do this using dynamic SQL and EXEC.
But why not just generate the code once? It will be more performant.