Which is faster: ISNULL(@SKU, '') = '' or (@SKU IS NULL OR @SKU = '')? - sql-server

In one IF statement I am using IF @SKU IS NULL OR @SKU = '', but my friend says it takes more time compared to IF ISNULL(@SKU, '') = '', so I should use IF ISNULL(@SKU, '') = '' instead. I think my version is fine. Which one runs faster?
This is my stored procedure:
CREATE PROCEDURE USP_GetExistingRefunds
(
@OrderNo VARCHAR(50),
@SKU VARCHAR(255),
@ProfileID INT
)
AS
BEGIN
--IF ISNULL(@SKU, '') = '' --is this faster, or
IF @SKU IS NULL OR @SKU = '' --is this faster?
BEGIN
SELECT OrderNo, SKU, ISNULL(Quantity, 0) Quantity, ISNULL(Amount, 0) Amount
FROM StoreRefundOrder SRO
INNER JOIN StoreRefundOrderItem SROI ON SRO.ID = SROI.RefundOrderID
WHERE SRO.OrderNo = @OrderNo
AND ProfileID = @ProfileID
END
ELSE
BEGIN
SELECT OrderNo, SKU, ISNULL(SUM(Quantity), 0) Quantity, ISNULL(SUM(Amount), 0) Amount
FROM StoreRefundOrder SRO
INNER JOIN StoreRefundOrderItem SROI ON SRO.ID = SROI.RefundOrderID
WHERE SRO.OrderNo = @OrderNo
AND SROI.SKU = @SKU
AND ProfileID = @ProfileID
GROUP BY OrderNo, SKU
END
END

In the context of an IF/ELSE procedural batch
It doesn't make any difference whatsoever. It takes effectively 0.00 ms to determine whether a value is blank or unknown, and effectively 0.00 ms to determine whether ISNULL(@SKU, '') = ''. If there is a difference, it would likely be measured in nanoseconds, IMO. Again, this is in the context of a procedural batch, where the expression is evaluated only once.
In the context of a filter (e.g. an ON, WHERE or HAVING clause)
Here the difference can be enormous; it cannot be overstated. This is tricky to explain with parameters and variables involved, so, for brevity, I'll show you an example with this sample data:
IF OBJECT_ID('tempdb..#things','U') IS NOT NULL DROP TABLE #things;
SELECT TOP (10000) Txt = SUBSTRING(LEFT(NEWID(),36),1,ABS(CHECKSUM(NEWID())%x.N))
INTO #things
FROM (VALUES(1),(30),(40),(NULL)) AS x(N)
CROSS JOIN sys.all_columns;
UPDATE #things SET Txt = NEWID() WHERE txt = '0';
CREATE NONCLUSTERED INDEX nc_things1 ON #things(Txt);
The following queries find rows that either are, or are not, blank or NULL:
-- Finding things that are blank or NULL
SELECT t.Txt
FROM #things AS t
WHERE t.Txt IS NULL OR t.Txt = '';
-- Finding things that are NOT blank or NULL
SELECT t.Txt
FROM #things AS t
WHERE NOT(t.Txt IS NULL OR t.Txt = '');
SELECT t.Txt
FROM #things AS t
WHERE t.Txt > '';
-- Finding things that are blank or NULL
SELECT t.Txt
FROM #things AS t
WHERE ISNULL(t.Txt,'') = '';
-- Finding things that are NOT blank or NULL
SELECT t.Txt
FROM #things AS t
WHERE ISNULL(t.Txt,'') <> '';
The first three queries are SARGable; the last two are not, because of ISNULL. Even though there's an index to help, wrapping the column in ISNULL renders it useless here. It's the difference between asking someone to look in a phone book for everyone whose name begins with "A" and asking them to find everyone whose name ends with "A".
SARGable predicates allow a query to seek into a portion of an index, whereas non-SARGable predicates force a query to scan the entire index or table, REGARDLESS of how many matching rows exist (if any). When you are dealing with millions or billions of rows joined to many other tables, the difference can be a query that runs in seconds versus one that, in some cases, runs for hours or even weeks (I've seen a few).
EXECUTION PLANS: (shown as images in the original answer; not reproduced here)
Note that the third query, WHERE t.Txt > '', works too. Any non-NULL value that is not empty (or all spaces) is > '', and if t.Txt is NULL the comparison is not true either, so the row is filtered out. I include it because this expression works in filtered indexes. The only catch is that you can't rely on it where implicit conversion turns the comparison into a numeric one, because '' then converts to the number 0. Note these queries:
IF '' = 0 PRINT 'True' ELSE PRINT 'False'; -- Returns True
IF '' = '0' PRINT 'True' ELSE PRINT 'False'; -- Returns False
IF '' > -1 PRINT 'True' ELSE PRINT 'False'; -- Returns True
IF '' > '-1' PRINT 'True' ELSE PRINT 'False'; -- Returns False
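To make the filtered-index point concrete, here is a minimal sketch, assuming SQL Server 2008 or later and a cut-down #things table with four hand-picked rows instead of the random sample above; the index name nc_things_nonblank is made up for the example. A filtered index predicate cannot contain OR, which is why the > '' form is the one that fits.

```sql
-- Cut-down stand-in for #things: NULL, empty, all-spaces and a real value.
IF OBJECT_ID('tempdb..#things','U') IS NOT NULL DROP TABLE #things;
CREATE TABLE #things (Txt varchar(36) NULL);
INSERT INTO #things (Txt) VALUES (NULL), (''), ('   '), ('abc');

-- Filtered index over the non-blank rows only.
-- Filter predicates may not use OR, so "Txt IS NULL OR Txt = ''"
-- (or NOT of it) could not be used here, but "Txt > ''" can.
CREATE NONCLUSTERED INDEX nc_things_nonblank ON #things (Txt) WHERE Txt > '';

-- Only 'abc' qualifies: NULL > '' is UNKNOWN, and '   ' compares equal to ''.
SELECT t.Txt
FROM #things AS t
WHERE t.Txt > '';
```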

IF @SKU IS NULL OR @SKU = '' checks for both NULL and blank directly. In the second case, ISNULL(@SKU, '') replaces a NULL value with '' before the comparison. The two work differently.

Which is faster (ISNULL(@SKU, '') = '') or (@SKU IS NULL OR @SKU = '')
It really doesn't matter in this case. If you were comparing against a column then (SKU IS NULL OR SKU = '') would be preferable as it can use an index but any difference for a single comparison against a variable will be in the order of microseconds and dwarfed by the execution times of the SELECT statements.
To simplify the IF statement, I'd probably invert it anyway, as below:
IF @SKU <> '' --Not NULL or empty string
BEGIN
SELECT OrderNo, SKU, ISNULL(SUM(Quantity), 0) Quantity, ISNULL(SUM(Amount), 0) Amount
FROM StoreRefundOrder SRO
INNER JOIN StoreRefundOrderItem SROI ON SRO.ID = SROI.RefundOrderID
WHERE SRO.OrderNo = @OrderNo
AND SROI.SKU = @SKU
AND ProfileID = @ProfileID
GROUP BY OrderNo, SKU
END
ELSE
BEGIN
SELECT OrderNo, SKU, ISNULL(Quantity, 0) Quantity, ISNULL(Amount, 0) Amount
FROM StoreRefundOrder SRO
INNER JOIN StoreRefundOrderItem SROI ON SRO.ID = SROI.RefundOrderID
WHERE SRO.OrderNo = @OrderNo
AND ProfileID = @ProfileID
END
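The inverted test works because comparing NULL with '' yields UNKNOWN, which IF treats the same as false, so a NULL @SKU falls through to the ELSE branch. A quick sketch to check the branch choice for a few sample values, evaluating the same predicate through CASE WHEN (which also treats UNKNOWN as not true); the @cases and #branch names are illustrative only.

```sql
-- Sample @SKU values: NULL, empty, all spaces, and a real SKU.
DECLARE @cases TABLE (label varchar(10), sku varchar(255));
INSERT INTO @cases (label, sku)
VALUES ('NULL', NULL), ('empty', ''), ('spaces', '   '), ('value', 'ABC-1');

IF OBJECT_ID('tempdb..#branch') IS NOT NULL DROP TABLE #branch;

-- Same predicate as IF @SKU <> '': only TRUE picks the first branch;
-- FALSE and UNKNOWN (the NULL case) both fall through to the ELSE branch.
SELECT label,
       branch = CASE WHEN sku <> '' THEN 'filter by SKU' ELSE 'all SKUs' END
INTO #branch
FROM @cases;

SELECT label, branch FROM #branch;
```

An all-spaces value also takes the ELSE branch, just as it satisfies @SKU = '' in the original procedure, because SQL Server ignores trailing spaces in = and <> string comparisons.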

A little long for a comment.
As has been noted by many already, in this case your two options don't really make any appreciable difference. But in the future, when you think of a couple of different ways to code something, there are standard practices you can implement quickly and easily to test for yourself.
At the top of your code block, add this command:
SET STATISTICS TIME, IO ON;
You can use either TIME or IO, but I almost always want to see both, so I always turn them both on at the same time.
The output from this addition will show up in your Messages window after your query or queries run and will give you tangible information about which method is faster, or causes less stress on the SQL Server engine.
You'll want to run a few tests with each option, especially warm cache versus cold cache, but either way a few iterations are best, to get an average or eliminate outlier results.
I'm weird, so I always close my code block with:
SET STATISTICS TIME, IO OFF;
But strictly speaking that's unnecessary. I just have a thing about resetting anything I change, just to avoid any possibility of forgetting to reset something that will matter.
Kendra Little has an informative blog post on using STATISTICS.
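For a single IF like the one in the question, STATISTICS TIME will usually report 0 ms for both versions, so another option is to evaluate each form many times in a loop and time the loop with SYSDATETIME(). This is a rough micro-benchmark sketch, not a rigorous harness: the loop overhead dominates, the 100,000 iteration count is arbitrary, and the #timings table name is made up.

```sql
-- Time many evaluations of each IF form; one evaluation is too fast to measure.
DECLARE @SKU        varchar(255) = NULL,  -- also try '' and 'ABC-1'
        @iterations int = 100000,
        @i          int,
        @hits       int,
        @t0         datetime2(7);

IF OBJECT_ID('tempdb..#timings') IS NOT NULL DROP TABLE #timings;
CREATE TABLE #timings (form varchar(20), hits int, elapsed_ms int);

-- Form 1: explicit IS NULL OR = ''
SELECT @i = 0, @hits = 0, @t0 = SYSDATETIME();
WHILE @i < @iterations
BEGIN
    IF @SKU IS NULL OR @SKU = '' SET @hits += 1;
    SET @i += 1;
END;
INSERT INTO #timings VALUES ('IS NULL OR', @hits, DATEDIFF(ms, @t0, SYSDATETIME()));

-- Form 2: ISNULL(...) = ''
SELECT @i = 0, @hits = 0, @t0 = SYSDATETIME();
WHILE @i < @iterations
BEGIN
    IF ISNULL(@SKU, '') = '' SET @hits += 1;
    SET @i += 1;
END;
INSERT INTO #timings VALUES ('ISNULL', @hits, DATEDIFF(ms, @t0, SYSDATETIME()));

SELECT form, hits, elapsed_ms FROM #timings;
```

Run it a few times with different @SKU values; differences of a few milliseconds either way are likely to be noise.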

Related

(SOLVED) - First iteration of WHILE loop runs out of memory despite manual reconstruction of query succeeding

Environment: SQL Server 2019 (v15).
I have a large query that uses too much space when run as a single SELECT statement. When I try to run it, I get the following error:
Could not allocate a new page for database 'TEMPDB' because of insufficient disk space in filegroup 'DEFAULT'.
However, the problem breaks down naturally into a dozen or so pieces, so I wrote a WHILE loop to iterate through each piece and insert into a results table. Unfortunately, the first iteration of the WHILE loop also returns the same memory error. All the WHILE loop is doing is changing a few values in the WHERE clause.
The key thing confusing me here is that when I manually run one iteration of the INSERT statement, without any of the looping logic, it works perfectly.
Manually coding the first iteration to use the first institution_name just works, so I don't think the joins here are going wrong and causing the memory error.
WITH my_cte AS
(
SELECT [columns]
FROM mytable a
INNER JOIN bigtable b ON a.institution_name = b.institution_name
AND a.personID = b.personID
WHERE a.institution_name = 'ABC'
AND b.institution_name = 'ABC'
)
INSERT INTO results (personID, institution_name, ...)
SELECT personID, institution_name, [some aggregations]
FROM my_cte
GROUP BY personID, institution_name;
The version with the WHILE loop fails. I need to run the query with different values for institution_name.
Here I show three different values but even just the first iteration fails.
DECLARE @INSTITUTION varchar(10)
DECLARE @COUNTER int
SET @COUNTER = 0
DECLARE @LOOKUP table (temp_val varchar(10), temp_id int)
INSERT INTO @LOOKUP (temp_val, temp_id)
VALUES ('ABC', 1), ('DEF', 2), ('GHI', 3)
WHILE @COUNTER < 3
BEGIN
SET @COUNTER = @COUNTER + 1
SELECT @INSTITUTION = temp_val
FROM @LOOKUP
WHERE temp_id = @COUNTER;
WITH my_cte AS
(
SELECT [columns]
FROM mytable a
INNER JOIN bigtable b ON a.institution_name = b.institution_name
AND a.personID = b.personID
WHERE a.institution_name = @INSTITUTION
AND b.institution_name = @INSTITUTION
)
INSERT INTO results (personID, institution_name, ...)
SELECT personID, institution_name, [some aggregations]
FROM my_cte
GROUP BY personID, institution_name
END
As I write this question, I have quite literally just copy-pasted the INSERT statement a dozen times, changed the relevant WHERE clause, and run it without errors. Could it be some kind of datatype issue, where the query can subset properly when a string literal is put in the WHERE clause, but the lookup on my temporary table fails due to the datatype? I notice that mytable.institution_name is varchar(10) while bigtable.institution_name is nvarchar(10). Setting the temp table to use nvarchar(10) didn't fix it either.

Check condition in WHERE clause

I have the following dynamic WHERE condition in an XML mapping, which works fine:
WHERE
IncomingFlightId=#{flightId}
<if test="screenFunction == 'MAIL'.toString()">
and ContentCode = 'M'
</if>
<if test="screenFunction == 'CARGO'.toString()">
and ContentCode Not IN('M')
</if>
order by ContentCode ASC
I'm trying to run the query below in an IDE, but unfortunately it's not working.
Can anybody please explain what I'm doing wrong?
WHERE
IncomingFlightId = 2568648
AND (IF 'MAIL' = 'MAIL'
BEGIN
SELECT 'and ContentCode = "M"'
END ELSE BEGIN
SELECT 'and ContentCode Not IN("M")'
END)
order by ContentCode ASC
You can't use IF in a plain SQL statement; use CASE WHEN test THEN value_if_true ELSE value_if_false END instead (if you have to use conditional logic).
That said, it's probably avoidable if you do something like this:
WHERE
(somecolumn = 'MAIL' AND ContentCode = 'M') OR
(somecolumn <> 'MAIL' and ContentCode <> 'M')
Example of conditional logic in plain SQL:
SELECT * FROM table
WHERE
CASE WHEN col > 0 THEN 1 ELSE 0 END = 1
CASE WHEN runs a test and returns a value. You always have to compare the return value to something else; you can't use it to do something that doesn't return a value.
It's kinda dumb here, though, because anything you can express as the truth of a CASE WHEN can be expressed more simply and readably as the truth of the WHERE clause directly:
SELECT * FROM table
WHERE
CASE WHEN type = 'x'
THEN (SELECT count(*) FROM x)
ELSE (SELECT count(*) FROM y)
END = 1
Versus
SELECT * FROM table
WHERE
(type = 'x' AND (SELECT count(*) FROM x) = 1) OR
(type <> 'x' AND (SELECT count(*) FROM y) = 1)
It's useful for things like this though:
SELECT
bustourname,
SUM(CASE WHEN age > 60 THEN 1 ELSE 0 END) as count_of_old_people
FROM table
GROUP BY bustourname
If you're looking to write a stored procedure that conditionally builds an SQL, then sure, you can do that...
DECLARE @sql NVARCHAR(MAX) = N'SELECT * FROM SomeTable WHERE 1 = 1';
IF blah SET @sql = CONCAT(@sql, N' AND somecolumn = 1')
IF otherblah SET @sql = CONCAT(@sql, N' AND othercolumn = 1')
EXEC sp_executesql @sql
But this only works in a stored procedure or procedure-like SQL script that builds a string that looks like SQL and then executes it dynamically. You cannot use IF in a plain SELECT statement.
You are running a query which (besides being syntactically incorrect SQL) has nothing to do with the query generated and used by MyBatis.
You need to understand how the if element in a MyBatis mapper works.
The if element is evaluated before the query is executed, at the stage where the SQL query text is generated. If the value of test is true, the content of the if element is included in the resulting query. In your case, depending on the screenFunction parameter passed to the MyBatis mapper method, one of three queries is generated.
If the value of screenFunction is MAIL, then:
WHERE
IncomingFlightId=#{flightId}
and ContentCode = 'M'
order by ContentCode ASC
If the value of screenFunction is CARGO, then:
WHERE
IncomingFlightId=#{flightId}
and ContentCode Not IN('M')
order by ContentCode ASC
Otherwise (if the value of screenFunction is neither MAIL nor CARGO):
WHERE
IncomingFlightId=#{flightId}
order by ContentCode ASC
Only after the query text is generated is it executed via JDBC against the database.
So if you want to run the query manually, you need to try one of these queries.
One thing you may want to do to make this easier is to enable logging of SQL queries and the parameters passed to them, so you can rerun them more easily.
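If you want a single statement you can paste into the IDE that mirrors all three generated variants, one option is to combine them with a variable, along the lines of the first answer's suggestion. This is a sketch under assumptions: the question doesn't show the FROM clause, so #FlightContent and its sample rows are hypothetical, and @screenFunction stands in for the mapper parameter.

```sql
-- Hypothetical table standing in for the one behind the mapper.
IF OBJECT_ID('tempdb..#FlightContent') IS NOT NULL DROP TABLE #FlightContent;
CREATE TABLE #FlightContent (IncomingFlightId int, ContentCode char(1));
INSERT INTO #FlightContent VALUES
    (2568648, 'M'), (2568648, 'C'), (2568648, 'B'), (1, 'M');

DECLARE @flightId       int         = 2568648,
        @screenFunction varchar(10) = 'MAIL';   -- also try 'CARGO' and NULL

IF OBJECT_ID('tempdb..#result') IS NOT NULL DROP TABLE #result;
SELECT IncomingFlightId, ContentCode
INTO #result
FROM #FlightContent
WHERE IncomingFlightId = @flightId
  AND (   (@screenFunction = 'MAIL'  AND ContentCode = 'M')          -- first <if>
       OR (@screenFunction = 'CARGO' AND ContentCode NOT IN ('M'))   -- second <if>
       OR @screenFunction IS NULL                                    -- neither <if>
       OR @screenFunction NOT IN ('MAIL', 'CARGO'));

SELECT IncomingFlightId, ContentCode
FROM #result
ORDER BY ContentCode ASC;
```

This kind of catch-all predicate is handy for checking results by hand; for production use, the separate queries MyBatis generates for each case generally give the optimizer an easier job.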

Need help removing functions from CASE WHEN

I have a situation where I have created a script to select data in our company's environment. In doing so, I decided to use functions for some pattern matching and stripping of characters in a CASE WHEN.
However, one of our clients doesn't want to let us put their data in our local environment, so I now have to adapt the script to run in their environment, which essentially means removing the functions. I am having trouble working out how to move things around to do so.
An example of the function call would be:
SELECT ....
CASE WHEN Prp = 'Key Cabinet'
AND SerialNumber IS NOT NULL
AND dbo.fnRemoveNonNumericCharacters(SerialNumber) <> ''
THEN dbo.fnRemoveNonNumericCharacters(SerialNumber)
....
INTO #EmpProperty
FROM ....
Here, Prp is a column that contains the property type, and SerialNumber is a column that contains a serial number, but also some other random garbage because data entry was sloppy.
The function definition is:
WHILE PATINDEX('%[^0-9]%', @strText) > 0
BEGIN
SET @strText = STUFF(@strText, PATINDEX('%[^0-9]%', @strText), 1, '')
END
RETURN @strText
where @strText is the SerialNumber I am passing in.
I may be stuck in analysis paralysis because I just can't figure out a good way to do this. I don't need a full-on solution per se; just point me in a direction you know will work. Let me know if you would like some sample DDL/DML to play around with.
Example 'SerialNumber' values: CA100 (Trash bins), T110, 101B.
There are also a bunch of other types of values, such as all text or all numbers, but we are filtering those out. The current pattern matching is good enough.
So I think you mean you can't use a function... so, perhaps:
declare @table table (SomeCol varchar(4000))
insert into #table values
('1 ab2cdefghijk3lmnopqr4stuvwxyz5 6 !7##$8%^&9*()-10_=11+[]{}12\|;:13></14? 15'),
('CA100 (Trash bins), T110, 101B')
;with cte as (
select top (100) --raise this if SomeCol values can be longer than 100 characters
N=row_number() over (order by @@spid) from sys.all_columns),
Final as (
select SomeCol, Col
from @table
cross apply (
select (select X + ''
from (select N, substring(SomeCol, N, 1) X
from cte
where N<=datalength(SomeCol)) [1]
where X between '0' and '9'
order by N
for xml path(''))
) Z (Col)
where Z.Col is not NULL
)
select
SomeCol
,cast(Col as varchar(4000)) CleanCol --change this to BIGINT if it isn't too large
from Final
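To slot the same idea back into the original CASE WHEN without a scalar function, one option is to compute the cleaned value once per row in a CROSS APPLY and reference it from the CASE. This is a sketch, assuming SQL Server 2008 or later; @src is a hypothetical stand-in for the real source table, the 100-row tally caps serial numbers at 100 characters, and the result lands in #EmpProperty as in the question.

```sql
-- Hypothetical stand-in for the real source table.
DECLARE @src TABLE (Prp varchar(50), SerialNumber varchar(100));
INSERT INTO @src (Prp, SerialNumber)
VALUES ('Key Cabinet', 'CA100'), ('Key Cabinet', 'T110'), ('Key Cabinet', '101B'),
       ('Key Cabinet', 'ABC'),   ('Desk', '999'),         ('Key Cabinet', NULL);

IF OBJECT_ID('tempdb..#EmpProperty') IS NOT NULL DROP TABLE #EmpProperty;

WITH Tally AS
(
    -- Positions 1..100; raise TOP if serial numbers can be longer.
    SELECT TOP (100) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns
)
SELECT s.Prp,
       s.SerialNumber,
       CleanSerial = CASE WHEN s.Prp = 'Key Cabinet'
                           AND s.SerialNumber IS NOT NULL
                           AND d.Digits <> ''
                          THEN d.Digits
                     END
INTO #EmpProperty
FROM @src AS s
CROSS APPLY
(
    -- Digits only, in their original order; NULL when there are none.
    SELECT Digits = (SELECT SUBSTRING(s.SerialNumber, t.N, 1)
                     FROM Tally AS t
                     WHERE t.N <= LEN(s.SerialNumber)
                       AND SUBSTRING(s.SerialNumber, t.N, 1) LIKE '[0-9]'
                     ORDER BY t.N
                     FOR XML PATH(''))
) AS d;

SELECT Prp, SerialNumber, CleanSerial FROM #EmpProperty;
```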

SQL using UPDLOCK in query to update top 1 record after filtering and ordering table

I have a stored procedure as follows:
CREATE PROCEDURE [dbo].[RV_SM_WORKITEM_CHECKWORKBYTYPE]
(
@v_ServiceName Nvarchar(20)
,@v_WorkType Nvarchar(20)
,@v_WorkItemThreadId nvarchar(50)
)
AS BEGIN
;WITH updateView AS
(
SELECT TOP 1 *
FROM rv_sm_workitem WITH (UPDLOCK)
WHERE stateofitem = 0
AND itemtype = @v_worktype
ORDER BY ITEMPRIORITY
)
UPDATE updateView
SET assignedto = #v_ServiceName,
stateofitem = 1,
dateassigned = getdate(),
itemthreadid = @v_WorkItemThreadId
OUTPUT INSERTED.*
END
It does the job I need it to do, namely: grab the 1 record with the highest priority, change its state from Available (0) to Not-Available (1), and return the record so work can be done with it. I should be able to have many threads (above 20) use this proc and have all of them constantly running and grabbing a new work item. However, I am finding that beyond 2 threads, additional threads are waiting on locks; I'm guessing the UPDLOCK is causing this.
I have 2 questions. Is there a better way to do this?
Can I do this without the UPDLOCK in the CTE, since the UPDATE statement takes update locks by default? Note that at any given time there are over 400,000 records in this table.
I had to do something similar once, and this is what I would suggest:
AS BEGIN
DECLARE @results table (id int, otherColumns varchar(50))
WHILE NOT EXISTS (SELECT 1 FROM @results) --keep trying until a row has been claimed
BEGIN
;WITH updateView AS
(
SELECT TOP 1 *
FROM rv_sm_workitem
WHERE stateofitem = 0
AND itemtype = @v_worktype
ORDER BY ITEMPRIORITY
)
UPDATE updateView
SET assignedto = #v_ServiceName,
stateofitem = 1,
dateassigned = getdate(),
itemthreadid = @v_WorkItemThreadId
OUTPUT INSERTED.* into @results
where stateofitem = 0
END
END
This ensures that an item cannot be processed twice (because of the WHERE clause on the UPDATE statement).
There are other variations of this idea, but this is an easy way to convey it. This is not production-ready code, though, as it will keep circling in the WHILE loop until there is something to process. I leave it to you to decide how to break out, or whether to skip the loop and return empty (and let the client-side code deal with it).
Here is the answer that helped me when I had this issue.
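A commonly used variation for queue-style tables, which is not taken from either answer above, is to add the READPAST hint next to UPDLOCK, so that concurrent callers skip rows another session has already locked instead of waiting on them. A sketch, assuming a hypothetical #workitem table that mirrors the columns the procedure uses, and that a lower ITEMPRIORITY value means higher priority (as the original ORDER BY implies).

```sql
-- Hypothetical minimal queue table standing in for rv_sm_workitem.
IF OBJECT_ID('tempdb..#workitem') IS NOT NULL DROP TABLE #workitem;
CREATE TABLE #workitem (
    id           int IDENTITY(1,1) PRIMARY KEY,
    itemtype     nvarchar(20) NOT NULL,
    itempriority int          NOT NULL,
    stateofitem  tinyint      NOT NULL DEFAULT 0,  -- 0 = available, 1 = taken
    assignedto   nvarchar(20) NULL,
    dateassigned datetime     NULL,
    itemthreadid nvarchar(50) NULL
);
INSERT INTO #workitem (itemtype, itempriority)
VALUES (N'A', 5), (N'A', 1), (N'B', 0), (N'A', 3);

DECLARE @v_ServiceName      nvarchar(20) = N'svc1',
        @v_WorkType         nvarchar(20) = N'A',
        @v_WorkItemThreadId nvarchar(50) = N'thread-1';

WITH updateView AS
(
    -- UPDLOCK keeps the chosen row reserved until the UPDATE completes;
    -- READPAST skips rows that other sessions currently hold locks on;
    -- ROWLOCK requests row-level rather than page-level locks.
    SELECT TOP (1) *
    FROM #workitem WITH (ROWLOCK, UPDLOCK, READPAST)
    WHERE stateofitem = 0
      AND itemtype = @v_WorkType
    ORDER BY itempriority
)
UPDATE updateView
SET assignedto   = @v_ServiceName,
    stateofitem  = 1,
    dateassigned = GETDATE(),
    itemthreadid = @v_WorkItemThreadId
OUTPUT INSERTED.*;
```

An index that supports the WHERE and ORDER BY, for example on (stateofitem, itemtype, ITEMPRIORITY), is also likely to help, since it lets each caller find its row without reading through the whole table.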

tsql bulk update

MyTableA has several million records. On regular occasions every row in MyTableA needs to be updated with values from TheirTableA.
Unfortunately I have no control over TheirTableA and there is no field to indicate if anything in TheirTableA has changed so I either just update everything or I update based on comparing every field which could be different (not really feasible as this is a long and wide table).
Unfortunately the transaction log balloons when doing a straight update, so I wanted to chunk it using UPDATE TOP. However, as I understand it, I need some field to determine whether the records in MyTableA have been updated yet; otherwise I'll end up in an infinite loop:
declare @again as bit;
set @again = 1;
while @again = 1
begin
update top (10000) my
set my.A1 = their.A1, my.A2 = their.A2, my.A3 = their.A3
from MyTableA my
join TheirTableA their on my.Id = their.Id
if @@ROWCOUNT > 0
set @again = 1
else
set @again = 0
end
Is the only way to make this work to add a
where my.A1 <> their.A1 or my.A2 <> their.A2 or my.A3 <> their.A3
filter? This seems like it will be horribly inefficient with many columns to compare.
I'm sure I'm missing an obvious alternative?
Assuming both tables have the same structure, you can get a result set of the rows that are different using
SELECT * INTO #different_rows FROM MyTableA EXCEPT SELECT * FROM TheirTableA and then update from that using whatever key fields are available.
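Building on the EXCEPT idea, the comparison can also go straight into the batched UPDATE from the question, so no extra table is needed. WHERE EXISTS (SELECT my.cols EXCEPT SELECT their.cols) is true only when at least one column differs, and EXCEPT treats two NULLs as equal, so NULL-to-value changes are caught without ISNULL wrappers. Because rows that have been updated no longer match, the loop ends on its own. A sketch, assuming Id is the key and A1 to A3 stand in for the real column list; the temp tables and the batch size of 2 are for demonstration only.

```sql
-- Hypothetical stand-ins for MyTableA / TheirTableA with three data columns.
IF OBJECT_ID('tempdb..#MyTableA') IS NOT NULL DROP TABLE #MyTableA;
IF OBJECT_ID('tempdb..#TheirTableA') IS NOT NULL DROP TABLE #TheirTableA;
CREATE TABLE #MyTableA    (Id int PRIMARY KEY, A1 int NULL, A2 varchar(10) NULL, A3 date NULL);
CREATE TABLE #TheirTableA (Id int PRIMARY KEY, A1 int NULL, A2 varchar(10) NULL, A3 date NULL);

INSERT INTO #MyTableA VALUES
    (1, 1, 'a', '2020-01-01'),   -- identical in both tables
    (2, 2, NULL, NULL),          -- A2 changes from NULL to 'b'
    (3, 3, 'c', '2020-01-03'),   -- A1 changes from 3 to 30
    (4, 4, 'd', NULL);           -- A3 changes from NULL to a date
INSERT INTO #TheirTableA VALUES
    (1, 1, 'a', '2020-01-01'),
    (2, 2, 'b', NULL),
    (3, 30, 'c', '2020-01-03'),
    (4, 4, 'd', '2020-01-04');

DECLARE @batch int = 2,   -- something like 10000 on real data
        @rows  int = 1;

WHILE @rows > 0
BEGIN
    UPDATE TOP (@batch) my
    SET my.A1 = their.A1, my.A2 = their.A2, my.A3 = their.A3
    FROM #MyTableA AS my
    JOIN #TheirTableA AS their ON my.Id = their.Id
    -- True only if at least one column differs; NULLs compare as equal here.
    WHERE EXISTS (SELECT my.A1, my.A2, my.A3
                  EXCEPT
                  SELECT their.A1, their.A2, their.A3);

    SET @rows = @@ROWCOUNT;
END;
```

Each UPDATE here runs in its own autocommit transaction, so under the SIMPLE recovery model the log space can typically be reused between batches; under FULL recovery, log backups are still needed to keep the log from growing.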
Well, the first, and simplest solution, would obviously be if you could change the schema to include a timestamp for last update - and then only update the rows with a timestamp newer than your last change.
But if that is not possible, another way to go could be to use the HASHBYTES function, perhaps by concatenating the fields into an XML string that you then compare. The caveat here is an 8,000-byte input limit in SQL Server 2014 and earlier (https://connect.microsoft.com/SQLServer/feedback/details/273429/hashbytes-function-should-support-large-data-types). EDIT: Once again, I have stolen code, this time from:
http://sqlblogcasts.com/blogs/tonyrogerson/archive/2009/10/21/detecting-changed-rows-in-a-trigger-using-hashbytes-and-without-eventdata-and-or-s.aspx
His example is:
select batch_id
from (
select distinct batch_id, hash_combined = hashbytes( 'sha1', combined )
from ( select batch_id,
combined =( select batch_id, batch_name, some_parm, some_parm2
from deleted c -- need old values
where c.batch_id = d.batch_id
for xml path( '' ) )
from deleted d
union all
select batch_id,
combined =( select batch_id, batch_name, some_parm, some_parm2
from some_base_table c -- need current values (could use inserted here)
where c.batch_id = d.batch_id
for xml path( '' ) )
from deleted d
) as r
) as c
group by batch_id
having count(*) > 1
A last resort (and my original suggestion) is to try BINARY_CHECKSUM. As noted in the comments, this does open up the risk of a rather high collision rate.
http://msdn.microsoft.com/en-us/library/ms173784.aspx
I have stolen the following example from lessthandot.com - link to the full SQL (and other cool functions) is below.
--Data Mismatch
SELECT 'Data Mismatch', t1.au_id
FROM( SELECT BINARY_CHECKSUM(*) AS CheckSum1 ,au_id FROM pubs..authors) t1
JOIN(SELECT BINARY_CHECKSUM(*) AS CheckSum2,au_id FROM tempdb..authors2) t2 ON t1.au_id =t2.au_id
WHERE CheckSum1 <> CheckSum2
Example taken from http://wiki.lessthandot.com/index.php/Ten_SQL_Server_Functions_That_You_Have_Ignored_Until_Now
I don't know if this is better than adding where my.A1 <> their.A1 or my.A2 <> their.A2 or my.A3 <> their.A3, but I would definitely give it a try (assuming SQL Server 2005+):
declare @again as bit;
set @again = 1;
declare @idlist table (Id int);
while @again = 1
begin
update top (10000) my
set my.A1 = their.A1, my.A2 = their.A2, my.A3 = their.A3
output inserted.Id into @idlist (Id)
from MyTableA my
join TheirTableA their on my.Id = their.Id
left join @idlist i on my.Id = i.Id
where i.Id is null
/* alternatively (instead of left join + where):
where not exists (select * from @idlist where Id = my.Id) */
if @@ROWCOUNT > 0
set @again = 1
else
set @again = 0
end
That is, declare a table variable for collecting the IDs of the rows being updated and use that table for looking up (and omitting) IDs that have already been updated.
A slight variation on the method would be to use a local temporary table instead of a table variable. That way you would be able to create an index on the ID lookup table, which might result in better performance.
If a schema change is not possible, how about using a trigger to save off the Ids that have changed, and only importing/exporting those rows?
Or use a trigger to export them immediately.
