Why does an UPDATE take much longer than a SELECT? - sql-server

I have the following select statement that finishes almost instantly.
declare @weekending varchar(6)
set @weekending = 100103
select InvoicesCharges.orderaccnumber, Accountnumbersorders.accountnumber
from Accountnumbersorders, storeinformation, routeselecttable,InvoicesCharges, invoice
where InvoicesCharges.pubid = Accountnumbersorders.publication
and Accountnumbersorders.actype = 0
and Accountnumbersorders.valuezone = 'none'
and storeinformation.storeroutename = routeselecttable.istoreroutenumber
and storeinformation.storenumber = invoice.store_number
and InvoicesCharges.invoice_number = invoice.invoice_number
and convert(varchar(6),Invoice.bill_to,12) = @weekending
However, the equivalent update statement takes 1m40s
declare @weekending varchar(6)
set @weekending = 100103
update InvoicesCharges
set InvoicesCharges.orderaccnumber = Accountnumbersorders.accountnumber
from Accountnumbersorders, storeinformation, routeselecttable,InvoicesCharges, invoice
where InvoicesCharges.pubid = Accountnumbersorders.publication
and Accountnumbersorders.actype = 0
and dbo.Accountnumbersorders.valuezone = 'none'
and storeinformation.storeroutename = routeselecttable.istoreroutenumber
and storeinformation.storenumber = invoice.store_number
and InvoicesCharges.invoice_number = invoice.invoice_number
and convert(varchar(6),Invoice.bill_to,12) = @weekending
Even if I add:
and InvoicesCharges.orderaccnumber <> Accountnumbersorders.accountnumber
at the end of the update statement, reducing the number of rows actually written to zero, it takes the same amount of time.
Am I doing something wrong here? Why is there such a huge difference?

transaction log file writes
index updates
foreign key lookups
foreign key cascades
indexed views
computed columns
check constraints
locks
latches
lock escalation
snapshot isolation
DB mirroring
file growth
other processes reading/writing
page splits / unsuitable clustered index
forward pointer/row overflow events
poor indexes
statistics out of date
poor disk layout (eg one big RAID for everything)
Check constraints with UDFs that have table access
...
Although, the usual suspect is a trigger...
Also, your extra condition changes little: how would SQL Server know to skip the work? An update plan is still generated with most of the baggage... even the trigger will still fire. Locks must still be held while rows are searched against the other conditions, for example.
Edited Sep 2011 and Feb 2012 with more options
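If a trigger or a rogue index is the culprit, a quick way to check (a sketch only; the table name is taken from the question) is to query the catalog views:
select t.name, t.is_disabled, t.is_instead_of_trigger
from sys.triggers t
where t.parent_id = object_id('dbo.InvoicesCharges');
select i.name, i.type_desc, i.is_disabled
from sys.indexes i
where i.object_id = object_id('dbo.InvoicesCharges');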

The update has to lock and modify the data in the table, and also log the changes to the transaction log. The select does not have to do any of those things.
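To see where the extra time goes, a simple comparison (nothing here is specific to the question's schema) is to run both statements with statistics enabled and compare the output in the Messages tab:
set statistics io on;
set statistics time on;
-- run the SELECT, then the UPDATE, and compare the
-- logical reads, CPU time and elapsed time reported for each
set statistics io off;
set statistics time off;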

Because reading does not affect indices, triggers, and what have you?

On slow servers or large databases I usually use UPDATE DELAYED, which waits for a "break" to update the database itself.

Related

Set values on a new table from a historical table, before adding new table back into historical table

I would like to create a historical view of alerts in an application. To do this, I am grabbing all events and timestamping them, then uploading them into a MS SQL table. I would also like to be able to exempt certain objects from the total count by flagging either the finding (to exclude the finding across all systems) or the object in the finding (to exclude the object from all findings).
The idea is, I will have all previous alerts in the main table, then I will set an 'exemptobject' or 'exemptfinding' bit column in the row. When I re-run the script weekly, I will upload the results directly into a temporary table and then I would like to compare either the object or the finding for each object in the temporary table to the main database's 'object' or 'finding' and set the respective 'exemptobject' or 'exemptfinding' bit. Once all the temporary table's objects have any exemption bits set, insert the temporary table into the main table and drop the temporary table to keep a historical record.
This will give me duplicate findings and objects, so I am having difficulty with the merge command:
BEGIN TRANSACTION
MERGE INTO [dbo].[temp_table]
USING [dbo].[historical]
ON [dbo].[temp_table].[object] = [dbo].[historical].[object] OR
[dbo].[temp_table].[finding] = [dbo].[historical].[finding]
WHEN MATCHED THEN
UPDATE
SET [exemptfinding] = [dbo].[historical].[exemptfinding]
,[exemptobject] = [dbo].[historical].[exemptobject]
,[exemptdate] = [dbo].[historical].[exemptdate]
,[comments] = [dbo].[historical].[comments];
COMMIT
This seems to do what I want, but I see that the results are going to grow exponentially and I think it won't be sustainable for long.
BEGIN TRANSACTION
UPDATE temp
SET temp.[exemptfinding] = historical.[exemptfinding]
,temp.[exemptobject] = historical.[exemptobject]
,temp.[exemptdate] = historical.[exemptdate]
,temp.[comments] = historical.[comments]
FROM [dbo].[temp] temp
INNER JOIN [dbo].[historical] historical
ON (
temp.[finding] = historical.[finding] OR
temp.[object] = historical.[object]
) AND
(
historical.[exemptfinding] = 1 OR
historical.[exemptobject] = 1
)
COMMIT
I feel like I need to normalize the database, but I can't think of a way to separate things out and be able to:
See a count of each finding based on date the script was run
Be able to drill down into each day and see all the findings, objects and recommendations for each
Control the count shown for each finding by removing 'exempted' findings OR objects.
I feel like there's something obvious I'm missing or I'm thinking about this incorrectly. Any help would be greatly appreciated!
EDIT - The following seems to do what I want, but as soon as I add an additional WHERE condition to the final result, the query time goes from 7 seconds to 90 seconds, so I fear it will not scale.
BEGIN TRANSACTION
UPDATE [dbo].[temp]
SET [dbo].[temp].[exemptrecommendation] = [historical].[exemptrecommendation]
,[dbo].[temp].[exemptfinding] = [historical].[exemptfinding]
,[dbo].[temp].[exemptobject] = [historical].[exemptobject]
,[dbo].[temp].[exemptdate] = [historical].[exemptdate]
,[dbo].[temp].[comments] = [historical].[comments]
FROM (
SELECT *
FROM historical h
WHERE EXISTS (
SELECT id
,recommendation
FROM temp t
WHERE (
t.id = h.id OR
t.recommendation = h.recommendation
)
)
) historical
WHERE [dbo].[temp].[recommendation] = [historical].[recommendation] OR
[dbo].[temp].[id] = [historical].[id]
COMMIT
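For reference, one way to avoid the OR'd join that usually defeats index seeks (a sketch only, assuming the column names above) is to split the work into two separate UPDATEs, one per matching column:
BEGIN TRANSACTION
UPDATE t
SET t.[exemptfinding] = h.[exemptfinding]
,t.[exemptdate] = h.[exemptdate]
,t.[comments] = h.[comments]
FROM [dbo].[temp] t
INNER JOIN [dbo].[historical] h ON t.[finding] = h.[finding]
WHERE h.[exemptfinding] = 1
UPDATE t
SET t.[exemptobject] = h.[exemptobject]
,t.[exemptdate] = h.[exemptdate]
,t.[comments] = h.[comments]
FROM [dbo].[temp] t
INNER JOIN [dbo].[historical] h ON t.[object] = h.[object]
WHERE h.[exemptobject] = 1
COMMIT
Each statement can then use a simple equality join, which tends to scale far better than an OR across two columns.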

How do I set the correct transaction level?

I am using Dapper on ADO.NET. So at present I am doing the following:
using (IDbConnection conn = new SqlConnection("MyConnectionString"))
{
conn.Open();
using (IDbTransaction transaction = conn.BeginTransaction())
{
// ...
However, there are various transaction isolation levels that can be set; I think those are the relevant settings.
My first question is how do I set the transaction level (where I am using Dapper)?
My second question is what is the correct level for each of the following cases? In each of these cases we have multiple instances of a web worker (Azure) service running that will be hitting the DB at the same time.
I need to run monthly charges on subscriptions. So in a transaction I need to read a record and if it's due for a charge create the invoice record and mark the record as processed. Any other read of that record for the same purpose needs to fail. But any other reads of that record that are just using it to verify that it is active need to succeed.
So what transaction do I use for the access that will be updating the processed column? And what transaction do I use for the other access that just needs to verify that the record is active?
In this case it's fine if a conflict causes the charge to not be run (we'll get it the next day). But it is critical that we not charge someone twice. And it is critical that the read to verify that the record is active succeed immediately while the other operation is in its transaction.
I need to update a record where I am setting just a couple of columns. One use case is I set a new password hash for a user record. It's fine if other access occurs during this except for deleting the record (I think that's the only problem use case). If another web service is also updating that's the user's problem for doing this in 2 places simultaneously.
But it's key that the record stays consistent. This includes the use case of "set NumUses = NumUses + @ParamNum", so it needs to treat the read, calculation, and write of the column value as an atomic action. And if I am setting 3 column values, they all get written together.
1) Assuming that the invoicing process is a stored procedure with multiple statements, your best bet is to create another "lock" table to store the fact that the invoicing job is already running, e.g.
CREATE TABLE InvoicingJob( JobStarted DATETIME, IsRunning BIT NOT NULL )
-- Table will only ever have one record
INSERT INTO InvoicingJob
SELECT NULL, 0
EXEC InvoicingProcess
ALTER PROCEDURE InvoicingProcess
AS
BEGIN
DECLARE #InvoicingJob TABLE( IsRunning BIT )
-- Try to acquire lock
UPDATE InvoicingJob WITH( TABLOCK )
SET JobStarted = GETDATE(), IsRunning = 1
OUTPUT INSERTED.IsRunning INTO #InvoicingJob( IsRunning )
WHERE IsRunning = 0
-- job has been running for more than a day, i.e. likely crashed without releasing the lock:
-- OR ( IsRunning = 1 AND JobStarted <= DATEADD( DAY, -1, GETDATE()) )
IF NOT EXISTS( SELECT * FROM #InvoicingJob )
BEGIN
PRINT 'Another Job is already running'
RETURN
END
ELSE
RAISERROR( 'Start Job', 0, 0 ) WITH NOWAIT
-- Do invoicing tasks
WAITFOR DELAY '00:01:00' -- to simulate execution time
-- Release lock
UPDATE InvoicingJob
SET IsRunning = 0
END
2) Read about how transactions work: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/transactions-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql?view=sql-server-2017
Your second question is quite broad.
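For what it's worth, one common pattern for the "charge once" requirement (a sketch using hypothetical table and column names, not your schema) is to claim a row with UPDLOCK and READPAST hints so that competing workers skip rows that are already being processed:
DECLARE @SubscriptionId int;
BEGIN TRANSACTION;
-- claim one subscription that is due; READPAST makes competing workers skip
-- rows that are already locked, UPDLOCK keeps the claim until COMMIT
SELECT TOP (1) @SubscriptionId = s.SubscriptionId
FROM dbo.Subscriptions AS s WITH (UPDLOCK, ROWLOCK, READPAST)
WHERE s.IsProcessed = 0;
IF @SubscriptionId IS NOT NULL
BEGIN
-- ... INSERT the invoice row here ...
UPDATE dbo.Subscriptions
SET IsProcessed = 1
WHERE SubscriptionId = @SubscriptionId;
END
COMMIT;
Plain "is this record active" readers at the default READ COMMITTED level are not blocked by the claim until the UPDATE runs; if they must never block at all, enabling READ_COMMITTED_SNAPSHOT on the database is the usual companion setting.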

Merge statement optimization

I have two tables in SQL Server; one is the source for a MERGE operation into the other.
The source table has 30 million records, the target table has 180 million records, and both tables have 227 columns.
I do have SSIS, but I'm told that in this case a MERGE statement is the better option. Below is a shortened version of it:
;WITH MySource as (
SELECT * FROM [STAGE].[dbo].[STAGE_TABLE]
)
MERGE [EDW].[dbo].[TARGET_TABLE] AS MyTarget
USING MySource
ON MySource.[ID_FIELD] = MyTarget.[ID_FIELD]
AND MySource.[LoadDate] >= MyTarget.[LoadDate]
WHEN MATCHED THEN UPDATE SET
<<Target Column>> = MySource.<<Source Columns>> --227 columns
WHEN NOT MATCHED THEN INSERT
(
[ID_FIELD],
[LoadDate],
<<225 Other Columns>>
)
VALUES (
MySource.[ID_FIELD],
MySource.[LoadDate],
MySource.<<225 other columns>>
);
The only change I made to the script above is truncating the list of columns to keep the code block here short.
My problem is that the execution hangs. The profiler screen shows a CXPACKET suspension with the message: cwaitpipenewrow, node=2.
How do I troubleshoot this? Thank you.
It seems that CXPACKET with a suspended state means that some parallel threads have completed and are waiting on other threads that have not completed yet.
Please check the links below. The query needs to update up to a billion values in the table, so it is going to be a slow-running query.
https://dba.stackexchange.com/questions/96346/cxpacket-suspended-and-null-wait-type
https://www.sqlshack.com/troubleshooting-the-cxpacket-wait-type-in-sql-server/
Hope these articles help you debug.
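If the wait really is parallelism-related, one low-risk experiment (a sketch, not a guaranteed fix) is to cap the degree of parallelism for just this statement with a query hint:
-- appended at the very end of the MERGE statement, replacing its terminating semicolon
OPTION ( MAXDOP 1 );
Batching the MERGE by ranges of [ID_FIELD] is another common way to keep each transaction, and its CXPACKET exposure, small.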

How to move a row from a table to another table if a Column's value changes in SQL?

I have two tables, Hosts and UnusedHosts. Hosts has 17 columns, and UnusedHosts has 14 columns, where the first 12 are the same as in Hosts, the 13th is the UserName of whoever moved a host to UnusedHosts, and the 14th is the date when they did it. In Hosts there is a column Unused, which is False. I want to do the following: if I change this value in Hosts to True, the row should automatically be moved to UnusedHosts.
How can I do this? Could someone provide an example?
P.S.: My SQL knowledge is very limited; I can only use very simple select, update, insert, and delete commands.
Thanks!
There are two main types of DML trigger in SQL Server - the AFTER and the INSTEAD OF. They work much as they sound - the AFTER trigger runs after your original statement completes, and the INSTEAD OF trigger runs in place of the original statement. You can use either in this case, though in different ways.
AFTER:
create trigger hosts_unused
on Hosts
after UPDATE
as
insert into UnusedHosts
select h.<<your_columns>>...
from Hosts h
where h.unused = 1 --Or however else you may be denoting True
delete from Hosts
where unused = 1 --Or however else you may be denoting True
GO
INSTEAD OF:
create trigger hosts_unused
on Hosts
instead of UPDATE
as
insert into UnusedHosts
select i.<<your_columns>>...
from inserted i
where i.unused = 1 --Or however else you may be denoting True
delete h
from inserted i inner join
Hosts h on i.host_id = h.host_id
where i.unused = 1 --Or however else you may be denoting True
update h
set hosts_column_1 = i.hosts_column_1,
hosts_column_2 = i.hosts_column_2,
etc
from inserted i inner join
Hosts h on i.host_id = h.host_id
where i.unused = 0 --Or however else you may be denoting False
GO
It's always important to think of performance when applying triggers. If you have a lot of updates on the Hosts table, but only a few of them are setting the unused column, then the AFTER trigger is probably going to give you better performance. The AFTER trigger also has the benefit that you can simply put in , INSERT after the after UPDATE bit, and it'll work for inserts too.
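For example, a sketch of the same AFTER trigger widened to cover inserts as well (the body is unchanged):
create trigger hosts_unused
on Hosts
after UPDATE, INSERT
as
insert into UnusedHosts
select h.<<your_columns>>...
from Hosts h
where h.unused = 1 --Or however else you may be denoting True
delete from Hosts
where unused = 1 --Or however else you may be denoting True
GO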
Check out Books Online on the subject.

How to explicitly lock a table in Microsoft SQL Server (looking for a hack - uncooperative client)

This was my original question:
I am trying to figure out how to enforce EXCLUSIVE table locks in SQL Server. I need to work around uncooperative readers (beyond my control, closed source stuff) which explicitly set their ISOLATION LEVEL to READ UNCOMMITTED. The effect is that no matter how many locks and what kind of isolation I specify while doing an insert/update, a client just needs to set the right isolation and is back to reading my garbage-in-progress.
The answer turned out to be quite simple -
while there is no way to trigger an explicit lock, any DDL change triggers the lock I was looking for.
While this situation is not ideal (the client blocks instead of witnessing repeatable reads), it is much better than letting the client override the isolation and reading dirty data. Here is the full example code with the dummy-trigger lock mechanism
WINNING!
#!/usr/bin/env perl
use Test::More;
use warnings;
use strict;
use DBI;
my ($dsn, $user, $pass) = @ENV{ map { "DBICTEST_MSSQL_ODBC_$_" } qw/DSN USER PASS/ };
my @coninf = ($dsn, $user, $pass, {
AutoCommit => 1,
LongReadLen => 1048576,
PrintError => 0,
RaiseError => 1,
});
if (! fork) {
my $reader = DBI->connect(@coninf);
$reader->do('SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED');
warn "READER $$: waiting for table creation";
sleep 1;
for (1..5) {
is_deeply (
$reader->selectall_arrayref ('SELECT COUNT(*) FROM artist'),
[ [ 0 ] ],
"READER $$: does not see anything in db, sleeping for a sec " . time,
);
sleep 1;
}
exit;
}
my $writer = DBI->connect(@coninf);
eval { $writer->do('DROP TABLE artist') };
$writer->do('CREATE TABLE artist ( name VARCHAR(20) NOT NULL PRIMARY KEY )');
# the dummy trigger must exist before it can be toggled
# (the original CREATE TRIGGER line appears to have been garbled; this is a reconstruction)
$writer->do('CREATE TRIGGER _lock_artist ON artist INSTEAD OF UPDATE AS SET NOCOUNT ON');
$writer->do('DISABLE TRIGGER _lock_artist ON artist');
sleep 1;
is_deeply (
$writer->selectall_arrayref ('SELECT COUNT(*) FROM artist'),
[ [ 0 ] ],
'No rows to start with',
);
$writer->begin_work;
$writer->prepare("INSERT INTO artist VALUES ('bupkus') ")->execute;
# this is how we lock
$writer->do('ENABLE TRIGGER _lock_artist ON artist');
$writer->do('DISABLE TRIGGER _lock_artist ON artist');
is_deeply (
$writer->selectall_arrayref ('SELECT COUNT(*) FROM artist'),
[ [ 1 ] ],
'Writer sees inserted row',
);
# delay reader
sleep 2;
$writer->rollback;
# should not affect reader
sleep 2;
is_deeply (
$writer->selectall_arrayref ('SELECT COUNT(*) FROM artist'),
[ [ 0 ] ],
'Nothing committed (writer)',
);
wait;
done_testing;
RESULT:
READER 27311: waiting for table creation at mssql_isolation.t line 27.
ok 1 - READER 27311: does not see anything in db, sleeping for a sec 1310555569
ok 1 - No rows to start with
ok 2 - Writer sees inserted row
ok 2 - READER 27311: does not see anything in db, sleeping for a sec 1310555571
ok 3 - READER 27311: does not see anything in db, sleeping for a sec 1310555572
ok 3 - Nothing committed (writer)
ok 4 - READER 27311: does not see anything in db, sleeping for a sec 1310555573
ok 5 - READER 27311: does not see anything in db, sleeping for a sec 1310555574
One hack hack hack way to do this is to force an operation on the table which takes a Sch-M (schema modification) lock, which will prevent reads against the table even in the READ UNCOMMITTED isolation level. E.g., doing an operation like ALTER TABLE ... REBUILD (perhaps on a specific empty partition to reduce the performance impact) as part of your operation will prevent all concurrent access to the table until you commit.
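A minimal sketch of that approach, reusing the artist table from the example above (the REBUILD is what takes the Sch-M lock):
BEGIN TRANSACTION;
INSERT INTO artist VALUES ('bupkus');
-- the Sch-M lock taken here is held until COMMIT, so even
-- READ UNCOMMITTED readers block instead of seeing the dirty row
ALTER TABLE artist REBUILD;
-- ... rest of the work ...
COMMIT;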
Add a locking hint to your SELECT:
SELECT COUNT(*) FROM artist WITH (TABLOCKX)
and put your INSERT into a transaction.
If your initial statement is in an explicit transaction, the SELECT will wait for a lock before it processes.
There's no direct way to force locking when a connection is in the READ UNCOMMITTED isolation level.
A solution would be to create views over the tables being read that supply the READCOMMITTED table hint. If you control the table names used by the reader, this could be pretty straightforward. Otherwise, you'll have quite a chore as you'll have to either modify writers to write to new tables or create INSTEAD OF INSERT/UPDATE triggers on the views.
Edit:
Michael Fredrickson is correct in pointing out that a view simply defined as a select from a base table with a table hint wouldn't require any trigger definitions to be updatable. If you were to rename the existing problematic tables and replace them with views, the third-party client ought to be none the wiser.
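A sketch of that view approach, reusing the artist table from the earlier example and a hypothetical artist_base name for the renamed table:
-- rename the real table, then expose a view under the old name
-- so the uncooperative reader transparently picks up the hint
EXEC sp_rename 'artist', 'artist_base';
GO
CREATE VIEW artist
AS
SELECT name
FROM dbo.artist_base WITH (READCOMMITTED);
GO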
