I have a unit test that enters a user into the system through the UI, but before that it removes the existing entry for that user.
I have three delete queries, and each one should delete only a single record, but in my unit test the deletes fail with a timeout error.
I don't know how to optimise these queries; I'd appreciate any help.
delete from CustomerRoles where RegisteredCustomerId = (select Id from RegisteredCustomers where Email = 'boltestsignupseller@yahoo.com')
delete from SellerInfos where RegisteredCustomerId = (select Id from RegisteredCustomers where Email = 'boltestsignupseller@yahoo.com')
DELETE FROM RegisteredCustomers where Email = 'boltestsignupseller@yahoo.com'
The second and third queries take more than two minutes and eventually time out.
without knowledge of the database, this is impossible to comment on, but common causes would include:
a missing index on the column being used to filter (or an unusable index - perhaps due to varchar vs nvarchar, etc.) - see the example indexes after this list
blocking due to conflicting operations
the existence of triggers performing an unbounded amount of additional hidden work
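For the first of those, a rough sketch of the kind of indexes that would support these filters (the index names are my own; check what already exists before adding anything):

CREATE INDEX IX_CustomerRoles_RegisteredCustomerId ON CustomerRoles (RegisteredCustomerId);
CREATE INDEX IX_SellerInfos_RegisteredCustomerId ON SellerInfos (RegisteredCustomerId);
CREATE INDEX IX_RegisteredCustomers_Email ON RegisteredCustomers (Email);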
Since the queries appear to be expecting a single RegisteredCustomers record, you can possibly reduce some overhead by capturing the located Id into a local variable at the start, and using that local in all three deletes, but this isn't a magic wand:
declare @id int = (
    select Id from RegisteredCustomers where Email = 'boltestsignupseller@yahoo.com');

delete from CustomerRoles where RegisteredCustomerId = @id;
delete from SellerInfos where RegisteredCustomerId = @id;
delete from RegisteredCustomers where Id = @id;
Most likely, though, you'll need to actually investigate what is happening (look at blocks, look at the query plan, look at the IO stats, look at the indexing etc).
If there are lots of foreign keys on the tables, and those foreign keys are poorly indexed, deletes can take a non-trivial amount of time simply because the engine has to do a lot of work to ensure that the deletes don't violate referential integrity. In some cases it is preferable to perform a logical delete rather than a physical delete to avoid this work - i.e. have a column that signifies deletion, and do an update ... set DeletionDate = GETUTCDATE() ... where ... rather than a delete (but: you then need to remember to filter on this column in your queries).
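As a rough sketch of that logical-delete pattern, assuming a nullable DeletionDate column has been added to RegisteredCustomers (the column name is only an example):

update RegisteredCustomers
set DeletionDate = GETUTCDATE()
where Email = 'boltestsignupseller@yahoo.com'
and DeletionDate is null;

-- every query that should only see "live" customers then needs: where DeletionDate is null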
Related
We have a process in our project where records in a table with a specific flag are deleted and the remaining records' flag is updated.
The table has approximately 45 million records; half of them have flag = 'C' and the remaining half have flag = 'P'.
The process runs once a day to delete all the records with flag 'P' and then update the flag on all the remaining records (those with flag 'C').
Below are the two statements, which are run through an SSIS package.
DELETE FROM dbo.RTL_Valuation WITH (TABLOCK)
WHERE Valuation_Age_Flag = 'P';
UPDATE dbo.RTL_Valuation WITH (TABLOCK)
SET Valuation_Age_Flag = 'P'
WHERE Valuation_Age_Flag = 'C';
Currently the process takes 60 minutes to complete. Is there any way the processing time could be improved?
Thanks
You need to do 10000 rows at a time. You are creating one enormous transaction that takes up a lot of room in the transaction log (so it can be rolled back).
set nocount on

DELETE TOP (10000) FROM dbo.RTL_Valuation WHERE Valuation_Age_Flag = 'P';

while @@ROWCOUNT > 0
begin
    DELETE TOP (10000) FROM dbo.RTL_Valuation WHERE Valuation_Age_Flag = 'P';
end
You can try 1,000, 5,000 or some other number to determine the best 'magic' number for quickly deleting rows from a large table on your install of SQL Server. But it will be a lot faster than doing one big delete. The same logic applies to the update.
OK. I assume that when you perform your delete and update statements, they result in two scans of the entire table (one to identify the rows to delete and one to identify the rows to update), and then you have to perform fully logged delete and update operations over it.
There is a nice trick for situations like this if your database is in the simple recovery model. However, whether it is suitable for you depends on other circumstances (e.g. how many indexes your table has, whether there are references to it, data types, ...) that I am not able to assess from your description. It requires more coding, but it usually results in much better performance. You would have to test whether it works better for you than your original approach.
Anyway, the trick works like this:
Instead of delete and update operations, just select the rows you want to keep (including the change of the flag) into a new table using the "SELECT INTO" construct. This results in a minimally logged insert operation and a single table scan. You can also use the "INSERT INTO ... SELECT" construct, but there you must fulfill some additional conditions to get a minimally logged insert.
Once the data is in the new table, you have to build all the required indexes on it.
Once all indexes are built, you just truncate the original table and, using the SWITCH command, switch the data back into the original table and drop the "new table". This also works on the Standard edition of SQL Server.
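A minimal sketch of that outline, assuming the simple recovery model; the column list and the index are hypothetical, and the new table must match the original's structure (columns, constraints, indexes) before the SWITCH is accepted:

SELECT Valuation_Id,               -- hypothetical column
       Valuation_Amount,           -- hypothetical column
       'P' AS Valuation_Age_Flag   -- the flag is flipped during the copy
INTO dbo.RTL_Valuation_New
FROM dbo.RTL_Valuation WITH (TABLOCK)
WHERE Valuation_Age_Flag = 'C';

-- rebuild whatever indexes the original table has, for example:
CREATE CLUSTERED INDEX IX_RTL_Valuation_New ON dbo.RTL_Valuation_New (Valuation_Id);

TRUNCATE TABLE dbo.RTL_Valuation;
ALTER TABLE dbo.RTL_Valuation_New SWITCH TO dbo.RTL_Valuation;
DROP TABLE dbo.RTL_Valuation_New;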
I have a table in a SQL Server database (table name: Material, 6 columns). It contains 2.6 million records. I need to update this table based on two column values; each update is taking 2 seconds.
Please help me optimize the query below.
UPDATE Material
SET Value = @Value,
    Format = @Format,
    SValue = @SValue,
    CGroup = @CGroup
WHERE
    SM = @SM
    AND Characteristic = @Characteristic
You really need to provide the query plan before we can tell you with any certainty what, if anything, might help.
Having said that, the first thing I would check is whether the plan shows a great deal of time spent on a table scan. If so, and the table is large, you could improve performance substantially by adding an index on SM and Characteristic - that will allow the optimizer to perform an index seek instead of a table scan, and could improve performance dramatically.
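For example, a possible shape for that index (the name is mine; whether SM or Characteristic should lead depends on their selectivity):

CREATE NONCLUSTERED INDEX IX_Material_SM_Characteristic
ON Material (SM, Characteristic);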
As you have a lot of data, a few tweaks can increase query performance:
(1) If a column being updated is indexed, remove the index
(2) Execute the update in smaller batches
DECLARE @i INT = 1

WHILE (@i <= 10)
BEGIN
    UPDATE TOP (20000) Material
    SET Value = @Value,
        Format = @Format,
        SValue = @SValue,
        CGroup = @CGroup
    WHERE
        SM = @SM
        AND Characteristic = @Characteristic

    SET @i = @i + 1
END
(3) Disabling update triggers (if any)
Hope this helps!
Try putting a composite index on SM and Characteristic. By doing this, SQL Server will be able to locate the records more easily. Operation-wise, an update is a combination of an insert and a delete. If your table has many columns, that may slow down your update even if you are not updating all of the columns.
Steps I prefer:
Try to put a composite index on SM and Characteristic.
Try to recreate the table with only the required columns and use joins wherever needed.
2.6 million rows is not that much; 2 seconds for an update is probably too much.
Having said that, the update time could depend on two things.
First, how many rows are being updated with a single update command, i.e. is it just one row or some larger set? You can't really do much about that, just saying it should be taken into consideration.
The other thing is indexes - you could have either too many of them or not enough.
If the table is missing an index on (SM, Characteristic) -- or (Characteristic, SM), depending on the selectivity -- then it's probably a full table scan every time. If the update touches only a couple of rows, that scan is a waste of effort. So it's the first thing to check.
If there are too many indexes on the affected columns, this can slow down updates as well, because those indexes have to be maintained with every change to the data. You can check the usefulness of indexes by querying the sys.dm_db_index_usage_stats DMV (plenty of explanation on the internet, so I won't get into it here) and remove the unused ones. Just be careful with this and test thoroughly.
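A sketch of the usual DMV check, restricted to the table from the question (indexes with many user_updates but few or no seeks/scans/lookups are removal candidates):

SELECT i.name AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id = i.object_id
      AND s.index_id = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('Material');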
One other thing to check is whether the affected columns are part of some foreign key constraint. In that case, the engine must check the validity of the constraint every time (iow, check if the new value exists in the referenced table, or check if there's data in referencing tables, depending on which side of the FK the column is). If there are no supporting indexes for this check, it would cause (again) a scan on the other tables involved.
But to really make sure, check the execution plan and the IO stats (SET STATISTICS IO ON); they will tell you exactly what is going on.
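For example, run the statement once with the stats switched on and read the Messages tab:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- run the UPDATE here and compare logical reads / CPU time before and after any index change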
I am running SQL Server 2012 and this one query is killing my database performance.
My text message provider does not support scheduled text messages so I have a text message engine that picks up messages from the database and sends them at the scheduled time. I put this query together that gets the messages from the database and also changes their status so that they do not get picked up again.
The query works fine; it is just causing wait times on the CPU, especially since it runs every other second. I installed database performance monitoring software, and it said this query accounts for 92% of instance execution time. The software also said that every single execution does 347,267 logical reads.
Any ideas on how to make this perform better?
Should I maybe select into a temporary table and update those results before returning them?
Here is the current query:
UPDATE TOP (30) dbo.Outgoing
SET Status = 2
OUTPUT INSERTED.OutgoingID, INSERTED.[Message], n.PhoneNumber, c.OptInStatus
FROM dbo.Outgoing o
JOIN Numbers n on n.NumberID = o.NumberID
LEFT JOIN Contacts c on c.ContactID = o.ContactID
WHERE Scheduled <= GETUTCDATE() AND SmsId IS NULL AND Status = 1
Here is the execution plan
There are three tables involved in this query: Outgoing, Numbers, & Contacts
Outgoing is the main table that this query deals with. There are only two indexes right now, a clustered primary key index on OutgoingID [PK, bigint, not null] and a non-clustered, non-unique index on SmsId [varchar(255), null] which is an identifier sent back from our text message provider once the messages are successfully received in their system. The Status column is just an integer column that relates to a few different statuses (Scheduled, Queued, Sent, Failed, Etc)
Numbers is just a simple table where we store unique cell phone numbers, some different formats of that number, and some basic information identifying the customer such as First name, carrier, etc. It just has a clustered primary key index on NumberID [bigint]. The PhoneNumber column is just a varchar(15).
The Contacts table just connects the individual person (phone number) to one of our merchants and keeps up with the number's opt in status, and other information related to the customer/merchant relationship. The only columns related to this query are OptInStatus [bit, not null] and ContactID [PK, bigint, not null]
--UPDATE--
Added a non-clustered index on the Outgoing table with columns (Scheduled, SmsId, Status), and that seems to have brought the execution time down from 2+ seconds to milliseconds. I will check in with my performance monitoring software tomorrow to see how things have improved. Thank you everyone for the help so far!
As several commenters have already pointed out, you need a new index on the dbo.Outgoing table. The server is struggling to find the rows to update/output. This is most probably where the problem is:
WHERE Scheduled <= GETUTCDATE() AND SmsId IS NULL AND Status = 1
To improve performance you should create an index on dbo.Outgoing that includes these columns. This will make it easier for SQL Server to find the correct rows. On the other hand, it will create some extra work for the actual update, since there will be a new index that needs to be maintained when updating.
While you're working on this, it would likely be a good idea to shorten the SmsId column unless you actually need it to be 255 chars long. Preferably before you create the index.
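One possible shape for it is a filtered index over the pending rows; the key and included columns here are my assumption, based on the query shown above:

CREATE NONCLUSTERED INDEX IX_Outgoing_Pending
ON dbo.Outgoing (Scheduled)
INCLUDE (NumberID, ContactID, [Message])
WHERE SmsId IS NULL AND Status = 1;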
As an alternate solution you might think about having separate tables for the messages that are outgoing and those that are outgone. Then you can:
insert all records from Outgoing to Outgone
delete all records from Outgoing, with an OUTPUT clause like you are currently using.
Make sure though that the insert and the delete operations are done in one transaction or you will soon have weird inconsistencies in the database.
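As a sketch, one way to keep the move atomic is to let the delete's OUTPUT ... INTO clause write the rows into Outgone within the same statement (the Outgone column list here is an assumption; a single statement is atomic on its own):

DELETE TOP (30) FROM dbo.Outgoing
OUTPUT DELETED.OutgoingID, DELETED.[Message], DELETED.NumberID, DELETED.ContactID
    INTO dbo.Outgone (OutgoingID, [Message], NumberID, ContactID)
WHERE Scheduled <= GETUTCDATE() AND SmsId IS NULL AND Status = 1;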
it is just causing wait times on the CPU especially since it runs every other second.
Get rid of the TOP 30 and run it much less often than once every other second... maybe every two or three minutes.
You can increase the max degree of parallelism setting on your SQL Server for faster processing.
I've been trying for weeks to figure out an issue that happens once every 100,000+ transactions. I've tried dozens of variations and have run out of ideas, so I'm hoping someone has seen this before.
In summary, I have a table that acts like a queue. Records are inserted either singly or in transacted batches, and sometimes one record is "dependent" on another (so that it is not eligible to be removed from the queue until the record it depends on has been removed first). The basic structure of the table includes these columns:
item_id - a GUID that uniquely identifies the record
depend_id - a GUID that identifies the record that this record is dependent on (or NULL if it's not dependent on anything)
item_lock - a GUID that starts out NULL, but is set to the "owner process" when the record is "locked down" to be worked on (when the work is done, the record is deleted from the table)
A simplified version of the polling query that is called to "lock down" the next "ready" record is:
UPDATE TOP(1) Q1
SET lock_id = @lock
FROM item_queue Q1
WHERE (lock_id IS NULL)
AND (depend_id IS NULL OR depend_id NOT IN (SELECT item_id FROM item_queue))
AND execute_at < GETUTCDATE()
My objective here (and it works almost all of the time) is that the NOT IN SELECT ensures that an item which is otherwise eligible to have its lock_id set won't be chosen if its depend_id matches another item that's still in the table. But in 1 out of 100,000+ calls to the stored procedure, that constraint fails and a record whose depend_id does match an item_id that's still in the table gets chosen.
I have tried various alternatives to the NOT IN SELECT; all methods "work" but all fail in the same way. It is always the case that "dependent" records are inserted with their dependencies within a committed transaction.
Any and all ideas welcomed...I'm stumped.
PS - I should mention that there are many different threads on different client machines adding to and polling/locking/deleting records in this table. One of my working theories is that there is some sort of locking/contention that occasionally causes the record that is being depended on to not show up in the NOT IN SELECT subquery, causing the dependent record to become eligible (but I have not been able to come up with the specific scenario for that to happen).
EDIT: More on transaction isolation level: I'm running with the default READ COMMITTED isolation level. Is it possible that this is causing the "depended on" record to be omitted from the NOT IN SELECT subquery in the "race condition" case where another thread has just updated it? If so, I'm not entirely clear on what isolation level I need to ensure that any record that's still in the table (whether it's being updated or not) comes back in that query.
Have you tried doing it as a correlated update? By left-joining the item_queue table to a second instance of itself on depend_id = item_id, you should not be locking anything in the second instance, and the criteria just check whether that left-join result is NULL. If depend_id is null, it won't find a match via the left join. If it DID have a value in depend_id that does not exist as an item_id value, that too results in null. Only if the depend_id MATCHED an ITEM_ID would it NOT be NULL and thus be excluded from consideration.
UPDATE TOP(1) item_queue
SET lock_id = @lock
FROM item_queue
LEFT JOIN item_queue IQ2
    ON item_queue.depend_id = IQ2.item_id
WHERE
    item_queue.lock_id IS NULL
    AND IQ2.item_id IS NULL
    AND item_queue.execute_at < GETUTCDATE()
Based on the clarification of your TOP(1), I might switch where the locking attempt is done by doing the following. Pseudocode within your thread:
Select all pending POSSIBLE item_queue entries.
Scan through the list of records returned:
    update the lock_id = uniqueGUID
    where the item queue ID is the one you are working with LOCALLY
    AND lock_id IS NULL
    if the number of records updated = 1, then you got it:
        process the item queue ID you successfully "locked"
    else
        it came back with 0 records updated, so someone else hit it
        before you... continue with the next available LOCAL record.
End scan of available LOCAL POSSIBLE queue records.
The premise is this. Everyone queries all POSSIBLE queue items, and everyone tries to hit an update with their GUID. Since the update call here is specifically looking to update where the LOCK_ID IS NULL, if someone else hit it first, it is no longer null and won't update it (0 records updated). If so, try the next one and go through the list.
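A sketch of that per-candidate claim attempt in T-SQL (the variable names are placeholders; @candidate_item would come from the list the thread already fetched):

DECLARE @lock uniqueidentifier = NEWID();
DECLARE @candidate_item uniqueidentifier;   -- set this from the locally scanned list

UPDATE item_queue
SET lock_id = @lock
WHERE item_id = @candidate_item
  AND lock_id IS NULL;

IF @@ROWCOUNT = 1
    PRINT 'claimed - process this item';   -- we own the row now
ELSE
    PRINT 'someone else got it first - move on to the next local candidate';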
If no records are available after scanning through the entire available list, you may want to wrap that in a loop of say 2-3 times by getting a fresh list each cycle to remove any "in process" items now being processed and a fresh list of available to try with.
Other similar approaches have been to update with a numeric counter for a lock column and to set it to 1 more than the last value where the value equals what you originally retrieved (vs null in this case). This way, it allows a sort of tracker to how many TIMES a given record has been locked for updating.
I have a table called ticket, and it has a field called number and a foreign key called client. The number needs to work much like an auto-incrementing field (incrementing by 1 for each new record), except that the client chain needs to be able to specify the starting number. It isn't a unique field because multiple clients will undoubtedly use the same numbers (e.g. start at 1001). In my application I'm fetching the row with the highest number and using that number + 1 for the next record's number. This all takes place inside a single transaction (the fetching and the saving of the new record). Is it true that I won't have to worry about a ticket ever getting an incorrect (duplicate) number under a high-load situation, or will the transaction protect against that possibility? (Note: I'm using PostgreSQL 9.x.)
Without locking the whole table on every insert/update, no. The way transactions work in PostgreSQL means that new rows appearing as a result of concurrent transactions never conflict with each other, and that's exactly what would be happening here.
You need to make sure that the updates actually cause the same rows to conflict. You would basically need to implement something similar to the mechanism used by PostgreSQL's native sequences.
What I would do is add another column to the table referenced by your client column to hold the last_val of the per-client sequence you'll be using. Each transaction would then look something like this:
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE clients
SET footable_last_val = footable_last_val + 1
WHERE clients.id = :client_id;
INSERT INTO footable(somecol, client_id, number)
VALUES (:somevalue,
:client_id,
(SELECT footable_last_val
FROM clients
WHERE clients.id = :client_id));
COMMIT;
That way, if a concurrent transaction has already updated the same clients row, the update of the clients table fails due to a version conflict before the insert is ever reached.
You do have to worry about duplicate numbers.
The typical problematic scenario is: transaction T1 reads N, and creates a new row with N+1. But before T1 commits, another transaction T2 sees N as the max for this client and creates another new row with N+1 => conflict.
There are many ways to avoid this; here is a simple piece of plpgsql code that implements one of them, assuming a unique index on (client, number). The idea is to let the inserts run concurrently and, in the event of a unique-index violation, retry with refreshed values until the insert is accepted (it's not a busy loop, though, since concurrent inserts are blocked until the other transactions commit).
do
$$
begin
  loop
    begin
      -- client number is assumed to be 1234 for the sake of simplicity
      insert into the_table(client, number)
      select 1234, 1 + coalesce(max(number), 0) from the_table where client = 1234;
      exit;
    exception
      when unique_violation then
        -- nothing (keep looping)
    end;
  end loop;
end$$;
This example is a bit similar to the UPSERT implementation from the PG documentation.
It's easily transferable into a plpgsql function taking the client id as input.
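For instance, such a function might look roughly like this (the function name is mine, and I keep the answer's the_table / client / number names; a real ticket table would have more columns):

create function next_ticket_number(p_client int) returns int
language plpgsql as
$func$
declare
  v_number int;
begin
  loop
    begin
      insert into the_table(client, number)
      select p_client, 1 + coalesce(max(number), 0)
      from the_table
      where client = p_client
      returning number into v_number;
      exit;
    exception
      when unique_violation then
        -- another transaction took that number first; retry with a fresh max
    end;
  end loop;
  return v_number;
end;
$func$;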