tsql bulk update - sql-server

MyTableA has several million records. On regular occasions every row in MyTableA needs to be updated with values from TheirTableA.
Unfortunately I have no control over TheirTableA and there is no field to indicate if anything in TheirTableA has changed so I either just update everything or I update based on comparing every field which could be different (not really feasible as this is a long and wide table).
Unfortunately the transaction log is ballooning doing a straight update so I wanted to chunk it by using UPDATE TOP, however, as I understand it I need some field to determine if the records in MyTableA have been updated yet or not otherwise I'll end up in an infinite loop:
declare @again as bit;
set @again = 1;
while @again = 1
begin
    update top (10000) MyTableA
    set my.A1 = their.A1, my.A2 = their.A2, my.A3 = their.A3
    from MyTableA my
    join TheirTableA their on my.Id = their.Id
    if @@ROWCOUNT > 0
        set @again = 1
    else
        set @again = 0
end
Is the only way to make this work to add a filter such as
where my.A1 <> their.A1 and my.A2 <> their.A2 and my.A3 <> their.A3
This seems like it would be horribly inefficient with many columns to compare.
I'm sure I'm missing an obvious alternative?

Assuming both tables have the same structure, you can get a result set of the rows that differ using
SELECT * into #different_rows from MyTable EXCEPT select * from TheirTable
and then update from that using whatever key fields are available.
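To make the EXCEPT idea concrete, here is a rough sketch of how it could be combined with batching so that each UPDATE stays small. It assumes (none of this is confirmed by the question) that both tables expose the columns Id, A1, A2, A3 with matching types, that Id is an int key unique in both tables, and that each batch commits on its own (autocommit). The names #changed_rows and @done are made up for the example.

```sql
-- Sketch only: column names, the int key and the batch size are assumptions.

-- 1. Collect the source rows that have no identical row in MyTableA.
--    EXCEPT compares whole rows and treats two NULLs as equal.
SELECT d.Id, d.A1, d.A2, d.A3
INTO #changed_rows
FROM (
    SELECT Id, A1, A2, A3 FROM TheirTableA
    EXCEPT
    SELECT Id, A1, A2, A3 FROM MyTableA
) AS d;

CREATE UNIQUE CLUSTERED INDEX IX_changed_rows ON #changed_rows (Id);

-- 2. Apply the changes in batches, removing each processed key from the work list.
DECLARE @done TABLE (Id int PRIMARY KEY);

WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) my
    SET A1 = c.A1, A2 = c.A2, A3 = c.A3
    OUTPUT inserted.Id INTO @done (Id)
    FROM MyTableA AS my
    JOIN #changed_rows AS c ON c.Id = my.Id;

    -- Nothing left that matches a row in MyTableA: stop.
    IF @@ROWCOUNT = 0 BREAK;

    DELETE c
    FROM #changed_rows AS c
    JOIN @done AS d ON d.Id = c.Id;

    DELETE FROM @done;
END

DROP TABLE #changed_rows;
```

Only rows that actually differ are touched, so the amount of log written should track the number of changes rather than the size of the table. Rows that exist only in TheirTableA stay in #changed_rows and are simply ignored by the join.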

Well, the first, and simplest solution, would obviously be if you could change the schema to include a timestamp for last update - and then only update the rows with a timestamp newer than your last change.
But if that is not possible, another way to go could be to use the HashBytes function, perhaps by concatenating the fields into an xml that you then compare. The caveat here is an 8kb limit (https://connect.microsoft.com/SQLServer/feedback/details/273429/hashbytes-function-should-support-large-data-types) EDIT: Once again, I have stolen code, this time from:
http://sqlblogcasts.com/blogs/tonyrogerson/archive/2009/10/21/detecting-changed-rows-in-a-trigger-using-hashbytes-and-without-eventdata-and-or-s.aspx
His example is:
select batch_id
from (
    select distinct batch_id, hash_combined = hashbytes( 'sha1', combined )
    from ( select batch_id,
                  combined =( select batch_id, batch_name, some_parm, some_parm2
                              from deleted c -- need old values
                              where c.batch_id = d.batch_id
                              for xml path( '' ) )
           from deleted d
           union all
           select batch_id,
                  combined =( select batch_id, batch_name, some_parm, some_parm2
                              from some_base_table c -- need current values (could use inserted here)
                              where c.batch_id = d.batch_id
                              for xml path( '' ) )
           from deleted d
         ) as r
) as c
group by batch_id
having count(*) > 1
A last resort (and my original suggestion) is to try Binary_Checksum? As noted in the comment, this does open the risk for a rather high collision rate.
http://msdn.microsoft.com/en-us/library/ms173784.aspx
I have stolen the following example from lessthandot.com - link to the full SQL (and other cool functions) is below.
--Data Mismatch
SELECT 'Data Mismatch', t1.au_id
FROM( SELECT BINARY_CHECKSUM(*) AS CheckSum1 ,au_id FROM pubs..authors) t1
JOIN(SELECT BINARY_CHECKSUM(*) AS CheckSum2,au_id FROM tempdb..authors2) t2 ON t1.au_id =t2.au_id
WHERE CheckSum1 <> CheckSum2
Example taken from http://wiki.lessthandot.com/index.php/Ten_SQL_Server_Functions_That_You_Have_Ignored_Until_Now
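Applied to the batching loop from the question, the checksum comparison might look roughly like the sketch below. It assumes the columns Id, A1, A2, A3 from the question exist in both tables with identical data types; the explicit column list is used instead of * so that only the compared columns take part in the checksum.

```sql
-- Sketch only: assumes Id, A1, A2, A3 exist in both tables with the same types.
WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) my
    SET A1 = their.A1, A2 = their.A2, A3 = their.A3
    FROM MyTableA AS my
    JOIN TheirTableA AS their ON their.Id = my.Id
    -- Only rows whose checksums differ still need work. Once a row is
    -- updated its checksum matches, so the loop terminates on its own.
    WHERE BINARY_CHECKSUM(my.A1, my.A2, my.A3)
       <> BINARY_CHECKSUM(their.A1, their.A2, their.A3);

    IF @@ROWCOUNT = 0 BREAK;
END
```

The collision caveat still applies: two different rows that happen to produce the same checksum are treated as equal and silently skipped.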

I don't know if this is better than adding where my.A1 <> their.A1 and my.A2 <> their.A2 and my.A3 <> their.A3, but I would definitely give it a try (assuming SQL Server 2005+):
declare @again as bit;
set @again = 1;
declare @idlist table (Id int);
while @again = 1
begin
    update top (10000) MyTableA
    set my.A1 = their.A1, my.A2 = their.A2, my.A3 = their.A3
    output inserted.Id into @idlist (Id)
    from MyTableA my
    join TheirTableA their on my.Id = their.Id
    left join @idlist i on my.Id = i.Id
    where i.Id is null
    /* alternatively (instead of left join + where):
    where not exists (select * from @idlist where Id = my.Id) */
    if @@ROWCOUNT > 0
        set @again = 1
    else
        set @again = 0
end
That is, declare a table variable for collecting the IDs of the rows being updated and use that table for looking up (and omitting) IDs that have already been updated.
A slight variation on the method would be to use a local temporary table instead of a table variable. That way you would be able to create an index on the ID lookup table, which might result in better performance.
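A sketch of that temporary-table variation could look like this. It keeps the assumptions of the code above (an int Id column and the columns A1 to A3); the primary key on #idlist is my own addition to give the lookup an index.

```sql
-- Sketch only: same assumptions as the table-variable version above.
CREATE TABLE #idlist (Id int NOT NULL PRIMARY KEY);

WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) my
    SET A1 = their.A1, A2 = their.A2, A3 = their.A3
    OUTPUT inserted.Id INTO #idlist (Id)
    FROM MyTableA AS my
    JOIN TheirTableA AS their ON their.Id = my.Id
    -- Skip every row whose Id was recorded by an earlier batch.
    WHERE NOT EXISTS (SELECT 1 FROM #idlist AS i WHERE i.Id = my.Id);

    IF @@ROWCOUNT = 0 BREAK;
END

DROP TABLE #idlist;
```

Whether the indexed lookup actually beats the table variable depends on the row counts involved, so it is worth comparing both on real data.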

If a schema change is not possible, how about using a trigger to save off the Ids of the rows that have changed, and then import/export only those rows?
Or use a trigger to export each change immediately.

Related

Can I use a variable inside cursor declaration?

Is this code valid?
-- Requester login ID
DECLARE @ZadavatelLoginId nvarchar(max) =
(SELECT TOP 1 LoginId
FROM
(SELECT Z.LoginId, z.Prijmeni, k.spojeni
FROM TabCisZam Z
LEFT JOIN TabKontakty K ON Z.ID = K.IDCisZam
WHERE druh IN (6,10)) t1
LEFT JOIN
(SELECT ko.Prijmeni, k.spojeni, ko.Cislo
FROM TabCisKOs KO
LEFT JOIN TabKontakty K ON K.IDCisKOs = KO.id
WHERE druh IN (6, 10)) t2 ON t1.spojeni = t2.spojeni
AND t1.Prijmeni = t2.Prijmeni
WHERE
t2.Cislo = (SELECT CisloKontOsoba
FROM TabKontaktJednani
WHERE id = @IdKJ))
-- If the solver team is empty
IF NOT EXISTS (SELECT * FROM TabKJUcastZam WHERE IDKJ = @IdKJ)
BEGIN
DECLARE ac_loginy CURSOR FAST_FORWARD LOCAL FOR
-- Requester
SELECT @ZadavatelLoginId
END
ELSE BEGIN
I am trying to pass the variable @ZadavatelLoginId into the cursor declaration and SSMS keeps telling me there is a problem with the code even though it is working.
Msg 116, Level 16, State 1, Procedure et_TabKontaktJednani_ANAFRA_Tis_Notifikace, Line 575 [Batch Start Line 7]
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS
Can anyone help?
I do not see anything in your posted query that could trigger the specific message that you listed. You might get an error if the subquery (SELECT CisloKontOsoba FROM TabKontaktJednani WHERE id = @IdKJ) returned more than one value, but that error would be a very specific "Subquery returned more than 1 value...".
However, as written, your cursor query is a single select of a scalar, which would never yield anything other than a single row.
If you need to iterate over multiple user IDs, but wish to separate your selection query from your cursor definition, what you likely need is a table variable that can hold multiple user IDs instead of a scalar variable.
Something like:
DECLARE @ZadavatelLoginIds TABLE (LoginId nvarchar(max))
INSERT @ZadavatelLoginIds
SELECT t1.LoginId
FROM ...
DECLARE ac_loginy CURSOR FAST_FORWARD LOCAL FOR
SELECT LoginId
FROM @ZadavatelLoginIds
OPEN ac_loginy
DECLARE @LoginId nvarchar(max)
FETCH NEXT FROM ac_loginy INTO @LoginId
WHILE @@FETCH_STATUS = 0
BEGIN
    ... Send email to @LoginId ...
    FETCH NEXT FROM ac_loginy INTO @LoginId
END
CLOSE ac_loginy
DEALLOCATE ac_loginy
A #Temp table can also be used in place of the table variable with the same results, but the table variable is often more convenient to use.
As others have mentioned, I believe that your login selection query is overly complex. Although this was not the focus of your question, I still suggest that you attempt to simplify it.
An alternative might be something like:
SELECT Z.LoginId
FROM TabKontaktJednani KJ
JOIN TabCisKOs KO ON KO.Cislo = KJ.CisloKontOsoba
JOIN TabCisZam Z ON Z.Prijmeni = KO.Prijmeni
JOIN TabKontakty K ON K.IDCisZam = Z.ID
WHERE KJ.id = @IdKJ
AND K.druh IN (6,10)
The above is my attempt to rewrite your posted query after tracing the relationships. I did not see any LEFT JOINS that were not superseded by other conditions that forced them into effectively being inner joins, so the above uses inner joins for everything. I have assumed that the druh column is in the TabKontakty table. Otherwise I see no need for that table. I do not guarantee that my re-interpretation is correct though.
How about creating a #temp table for each subquery, since the problem seems to come from the subqueries? Note that column names in CREATE TABLE cannot carry a table alias prefix:
CREATE TABLE #TEMP1
(
    LoginID nvarchar(max)
)
CREATE TABLE #TEMP2
(
    Prijmeni nvarchar(max),
    spojeni nvarchar(max),
    Cislo nvarchar(max)
)

select query on a table is hanging when a stored procedure to update the table is run

I have a simple stored procedure that updates a table, shown below.
The procedure updates the table properly, but when I run a select query on the po_tran table, the query hangs.
Is there a mistake in the stored procedure?
alter procedure po_tran_upd @locid char(3)
as
SET NOCOUNT ON;
begin
    update t
    set t.lastndaysale = (select isnull(sum( qty)*-1, 0)
                          from exp_tran
                          where exp_tran.loc_id =h.loc_id and
                                item_code = t.item_code and
                                exp_tran.doc_date > dateadd(dd,-30,getdate() )
                                and exp_tran.doc_type in ('PI', 'IN', 'SR')),
        t.stk_qty = (select isnull(sum( qty), 0)
                     from exp_tran
                     where exp_tran.loc_id =h.loc_id and
                           item_code = t.item_code )
    from po_tran t, po_hd h
    where t.entry_no=h.entry_no and
          h.loc_id=@locid and
          h.entry_date> getdate()-35
end
;
Try the following possible ways to optimize your procedure.
Read this article, where I have explained the same example using a CURSOR; there I also update a field of the table using a CURSOR.
Important: remove the subqueries. As I can see, you have used a subquery to update each field.
You can use a join instead, or save the result of each query in a variable and use that variable in the update.
e.g.
DECLARE @lastndaysale AS FLOAT
DECLARE @stk_qty AS INT
select @lastndaysale = isnull(sum( qty)*-1, 0) from exp_tran where exp_tran.loc_id =h.loc_id and
item_code = t.item_code and exp_tran.doc_date > dateadd(dd,-30,getdate() ) and exp_tran.doc_type in ('PI', 'IN', 'SR')
select @stk_qty = isnull(sum( qty), 0) from exp_tran where exp_tran.loc_id =h.loc_id and item_code = t.item_code
update t set t.lastndaysale =@lastndaysale,
t.stk_qty = @stk_qty
from po_tran t, po_hd h where t.entry_no=h.entry_no and h.loc_id=@locid and h.entry_date> getdate()-35
This is just a sample, not something to run as-is: h and t are not in scope in the standalone SELECT statements, so the values would have to be computed per item. Make whatever changes you need.
I added a possibly more performant update, however, I do not fully understand your question. If "any" query is running slow against the po_tran, then I suggest you examine the indexing on that table and ensure it has a proper clustered index. If "this" query is running slow then I suggest you look into "covering indexes". The two fields entry_no and item_code seem like good candidates to include in a covering index.
update t
set t.lastndaysale =
        CASE WHEN e.doc_date > dateadd(dd,-30,getdate()) AND e.doc_type in ('PI', 'IN', 'SR') THEN
            isnull(sum(qty) OVER (PARTITION BY e.loc_id, t.item_code) *-1, 0)
        ELSE 0
        END,
    t.stk_qty = isnull(SUM(qty) OVER (PARTITION BY e.loc_id, t.item_code),0)
from
    po_tran t
    INNER JOIN po_hd h ON h.entry_no=t.entry_no AND h.entry_date> getdate()-35
    INNER JOIN exp_tran e ON e.loc_id = h.loc_id AND e.item_code = t.item_code
where
    h.loc_id=@locid
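One caveat: SQL Server only permits window functions in the SELECT and ORDER BY clauses, so a windowed SUM directly in the SET clause of an UPDATE is rejected. A variant that aims for the same result is to aggregate exp_tran once per location and item in a derived table and join to it. The sketch below assumes, as the original subqueries suggest, that item_code and qty are columns of exp_tran; the value assigned to @locid is a made-up placeholder standing in for the procedure parameter.

```sql
-- Sketch only: inside the procedure, @locid is the parameter; it is
-- declared here with a hypothetical value so the batch stands alone.
DECLARE @locid char(3) = 'ABC';

UPDATE t
SET t.lastndaysale = ISNULL(a.lastndaysale, 0),
    t.stk_qty      = ISNULL(a.stk_qty, 0)
FROM po_tran AS t
JOIN po_hd AS h
    ON h.entry_no = t.entry_no
LEFT JOIN (
    -- One pass over exp_tran, one result row per location and item.
    SELECT e.loc_id,
           e.item_code,
           SUM(CASE WHEN e.doc_date > DATEADD(dd, -30, GETDATE())
                     AND e.doc_type IN ('PI', 'IN', 'SR')
                    THEN e.qty ELSE 0 END) * -1 AS lastndaysale,
           SUM(e.qty) AS stk_qty
    FROM exp_tran AS e
    WHERE e.loc_id = @locid
    GROUP BY e.loc_id, e.item_code
) AS a
    ON a.loc_id = h.loc_id
   AND a.item_code = t.item_code
WHERE h.loc_id = @locid
  AND h.entry_date > GETDATE() - 35;
```

The LEFT JOIN keeps po_tran rows that have no matching exp_tran rows and sets both columns to 0 for them, which is what the ISNULL calls in the original subqueries did. A shorter-running update also holds its locks on po_tran for less time, which may help with the blocked select.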

SQL using a function in a trigger

I am creating a trigger in SQL that will insert into another table after an insert on its own table. However, I need to fetch a value from the target table and increment it for use in the insert.
I have a AirVisionSiteLog table. On insert on the table I would like for it to insert into another SiteLog table. However in order to do this I need to fetch the last Entry Number of the Site from the SiteLog table. Then on its insert take that result and increase by one for the new Entry Number. I am new to Triggers and Functions so I am not sure how to use them correctly. I believe I have a function to retrieve and increment the Entry Number however I am not sure how to use it in the Trigger.
My Function -
CREATE FUNCTION AQB_RMS.F_GetLogEntryNumber
(@LocationID int)
RETURNS INTEGER
AS
BEGIN
    DECLARE
        @MaxEntry Integer,
        @EntryNumber Integer
    Set @MaxEntry = (Select Max(SL.EntryNumber) FROM AQB_MON.AQB_RMS.SiteLog SL
                     WHERE SL.LocationID = @LocationID)
    SET @EntryNumber = @MaxEntry + 1
    RETURN @EntryNumber
END
My Trigger and attempt to use the Function -
CREATE TRIGGER [AQB_RMS].[SiteLogCreate] on [AQB_MON].[AQB_RMS].[AirVisionSiteLog]
AFTER INSERT
AS
BEGIN
declare @entrynumber int
declare @corrected int
set @corrected = 0
INSERT INTO [AQB_MON].[AQB_RMS].[SiteLog]
([SiteLogTypeID],[LocationID],[EntryNumber],[SiteLogEntry]
,[EntryDate],[Corrected],[DATE_CREATED],[CREATED_BY])
SELECT st.SiteLogTypeID, l.LocationID,
(select AQB_RMS.F_GetLogEntryNumber from [AQB_MON].[AQB_RMS].[SiteLog] sl
where sl.LocationID = l.LocationID)
, i.SiteLogEntry, i.EntryDate, @corrected, i.DATE_CREATED, i.CREATED_BY
from inserted i
left join AQB_MON.[AQB_RMS].[SiteLogType] st on st.SiteLogType = i.SiteLogType
left join AQB_MON.AQB_RMS.Location l on l.SourceSiteID = i.SourceSiteID
END
GO
I believe that you are close.
At this part of the query in the trigger (I set the columns vertically so that the difference is more noticeable):
SELECT st.SiteLogTypeID,
l.LocationID,
(select AQB_RMS.F_GetLogEntryNumber from [AQB_MON].[AQB_RMS].[SiteLog] sl where sl.LocationID = l.LocationID),
i.SiteLogEntry,
i.EntryDate,
@corrected,
i.DATE_CREATED,
i.CREATED_BY
...should be:
SELECT st.SiteLogTypeID,
l.LocationID,
AQB_RMS.F_GetLogEntryNumber(l.LocationID),
i.SiteLogEntry,
i.EntryDate,
@corrected,
i.DATE_CREATED,
i.CREATED_BY
So basically, you call the function by name and pass the location ID as its argument. A scalar function is called directly in the select list; it is not selected from a table.
Note that in my modified example I pass l.LocationID, which is my guess at the value the function needs, so change that to match your needs. If there are other issues, add a comment.
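Two things about the function itself may be worth checking: MAX returns NULL for a location with no entries yet (so NULL + 1 stays NULL), and a multi-row insert would give every row for the same location the same number. The sketch below shows one possible way around both by computing the number inside the trigger. It assumes the trigger is created while connected to the AQB_MON database and that EntryDate is an acceptable ordering for rows inserted together; neither is confirmed by the question.

```sql
-- Sketch only: assumes the current database is AQB_MON and that the
-- column names match those shown in the question.
CREATE TRIGGER [AQB_RMS].[SiteLogCreate] ON [AQB_RMS].[AirVisionSiteLog]
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO [AQB_RMS].[SiteLog]
        ([SiteLogTypeID], [LocationID], [EntryNumber], [SiteLogEntry],
         [EntryDate], [Corrected], [DATE_CREATED], [CREATED_BY])
    SELECT st.SiteLogTypeID,
           l.LocationID,
           -- Highest existing number for the location (0 if there is none) ...
           ISNULL((SELECT MAX(sl.EntryNumber)
                   FROM [AQB_RMS].[SiteLog] AS sl
                   WHERE sl.LocationID = l.LocationID), 0)
           -- ... plus this row's position among the inserted rows for that location.
           + ROW_NUMBER() OVER (PARTITION BY l.LocationID ORDER BY i.EntryDate),
           i.SiteLogEntry,
           i.EntryDate,
           0,  -- Corrected
           i.DATE_CREATED,
           i.CREATED_BY
    FROM inserted AS i
    LEFT JOIN [AQB_RMS].[SiteLogType] AS st ON st.SiteLogType = i.SiteLogType
    LEFT JOIN [AQB_RMS].[Location] AS l ON l.SourceSiteID = i.SourceSiteID;
END
GO
```

This does not protect against two sessions inserting for the same location at the same moment; if that can happen, a unique constraint on (LocationID, EntryNumber) would at least surface the conflict.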

SQL using UPDLOCK in query to update top 1 record after filtering and ordering table

I have a stored procedure as follows:
CREATE PROCEDURE [dbo].[RV_SM_WORKITEM_CHECKWORKBYTYPE]
(
    @v_ServiceName Nvarchar(20)
    ,@v_WorkType Nvarchar(20)
    ,@v_WorkItemThreadId nvarchar(50)
)
AS BEGIN
    ;WITH updateView AS
    (
        SELECT TOP 1 *
        FROM rv_sm_workitem WITH (UPDLOCK)
        WHERE stateofitem = 0
        AND itemtype = @v_worktype
        ORDER BY ITEMPRIORITY
    )
    UPDATE updateView
    SET assignedto = @v_ServiceName,
        stateofitem = 1,
        dateassigned = getdate(),
        itemthreadid = @v_WorkItemThreadId
    OUTPUT INSERTED.*
END
It does the job I need it to do, namely, grab the one record with the highest priority, change its state from Available (0) to Not-Available (1), and return the record so that work can be done with it. I should be able to have many threads (above 20) use this proc and have all 20 constantly running and grabbing new work items. However, I am finding that beyond 2 threads, additional threads are waiting on locks; I'm guessing the UPDLOCK is causing this.
I have 2 questions, is there a better way to do this?
Can I do this without the UPDLOCK in the cte since the update statement by default uses UPDLOCK? Note, at any given time, there are over 400,000 records in this table.
I had to do something similar once and this is what I would suggest:
AS BEGIN
    DECLARE @results table (id int, otherColumns varchar(50))
    WHILE (NOT EXISTS(SELECT TOP 1 * FROM @results))
    BEGIN
        ;WITH updateView AS
        (
            SELECT TOP 1 *
            FROM rv_sm_workitem
            WHERE stateofitem = 0
            AND itemtype = @v_worktype
            ORDER BY ITEMPRIORITY
        )
        UPDATE updateView
        SET assignedto = @v_ServiceName,
            stateofitem = 1,
            dateassigned = getdate(),
            itemthreadid = @v_WorkItemThreadId
        OUTPUT INSERTED.* into @results
        where stateofitem = 0
    END
END
This ensures that an item cannot be processed twice (because of the where clause on the update statement).
There are other variations of this idea, but this is an easy way to convey it. This is not production-ready code though, as it will keep spinning in the while loop until there is something to process. I leave it to you to decide whether to break out of the loop, or not loop at all and return an empty result (and let the client-side code deal with it).
Here is the answer that helped me when I had this issue.
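A pattern often used for this kind of work queue in SQL Server is to add the READPAST hint, so that a session skips rows another session has already locked instead of waiting on them. The sketch below is the procedure from the question with only the hints changed; the procedure name is made up so it does not clash with the original, and whether ROWLOCK is needed depends on your indexes.

```sql
-- Sketch only: hypothetical procedure name, same table and columns as the question.
CREATE PROCEDURE [dbo].[RV_SM_WORKITEM_CHECKWORKBYTYPE_READPAST]
(
    @v_ServiceName nvarchar(20),
    @v_WorkType nvarchar(20),
    @v_WorkItemThreadId nvarchar(50)
)
AS
BEGIN
    SET NOCOUNT ON;

    WITH updateView AS
    (
        SELECT TOP (1) *
        -- UPDLOCK claims the row, READPAST skips rows that other sessions
        -- have locked, ROWLOCK asks for row-level locks.
        FROM rv_sm_workitem WITH (UPDLOCK, READPAST, ROWLOCK)
        WHERE stateofitem = 0
          AND itemtype = @v_WorkType
        ORDER BY itempriority
    )
    UPDATE updateView
    SET assignedto = @v_ServiceName,
        stateofitem = 1,
        dateassigned = GETDATE(),
        itemthreadid = @v_WorkItemThreadId
    OUTPUT inserted.*;
END
```

With READPAST a caller may receive the second-highest priority item while another session is still claiming the highest one, which is usually acceptable for a queue. An index that covers itemtype, stateofitem and itempriority should help keep each call from scanning all 400,000 rows.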

Exists vs select count

In SQL Server, performance wise, it is better to use IF EXISTS (select * ...) than IF (select count(1)...) > 0...
However, it looks like Oracle does not allow EXISTS inside the IF statement, what would be an alternative to do that because using IF select count(1) into... is very inefficient performance wise?
Example of code:
IF (select count(1) from _TABLE where FIELD IS NULL) > 0 THEN
UPDATE TABLE _TABLE
SET FIELD = VAR
WHERE FIELD IS NULL;
END IF;
The best way to write your code snippet is
UPDATE _TABLE
SET FIELD = VAR
WHERE FIELD IS NULL;
i.e. just do the update. It will either process rows or not. If you need to check whether it processed any rows, add afterwards
if (sql%rowcount > 0)
then
...
Generally, in cases where you have logic like
declare
v_cnt number;
begin
select count(*)
into v_cnt
from TABLE
where ...;
if (v_cnt > 0) then..
it's best to use ROWNUM = 1, because you DON'T CARE if there are 40 million rows; just have Oracle stop after finding 1 row.
declare
v_cnt number;
begin
select count(*)
into v_cnt
from TABLE
where rownum = 1
and ...;
if (v_cnt > 0) then..
or
select count(*)
into v_cnt
from dual
where exists (select null
from TABLE
where ...);
whichever syntax you prefer.
As Per:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:3069487275935
You could try:
for x in ( select count(*) cnt
from dual
where exists ( select NULL from foo where bar ) )
loop
if ( x.cnt = 1 )
then
found do something
else
not found
end if;
end loop;
is one way (very fast, only runs the subquery as long as it "needs" to, where exists
stops the subquery after hitting the first row)
That loop always executes at least once and at most once since a count(*) on a table
without a group by clause ALWAYS returns at LEAST one row and at MOST one row (even if
the table itself is empty!)
