Exists vs select count - sql-server

In SQL Server, performance wise, it is better to use IF EXISTS (select * ...) than IF (select count(1)...) > 0...
However, it looks like Oracle does not allow EXISTS inside the IF statement, what would be an alternative to do that because using IF select count(1) into... is very inefficient performance wise?
Example of code:
IF (select count(1) from _TABLE where FIELD IS NULL) > 0 THEN
UPDATE TABLE _TABLE
SET FIELD = VAR
WHERE FIELD IS NULL;
END IF;

the best way to write your code snippet is
UPDATE TABLE _TABLE
SET FIELD = VAR
WHERE FIELD IS NULL;
i.e. just do the update. it will either process rows or not. if you needed to check if it did process rows then add afterwards
if (sql%rowcount > 0)
then
...
generally in cases where you have logic like
declare
v_cnt number;
begin
select count(*)
into v_cnt
from TABLE
where ...;
if (v_cnt > 0) then..
its best to use ROWNUM = 1 because you DON'T CARE if there are 40 million rows..just have Oracle stop after finding 1 row.
declare
v_cnt number;
begin
select count(*)
into v_cnt
from TABLE
where rownum = 1
and ...;
if (v_cnt > 0) then..
or
select count(*)
into v_cnt
from dual
where exists (select null
from TABLE
where ...);
whichever syntax you prefer.

As Per:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:3069487275935
You could try:
for x in ( select count(*) cnt
from dual
where exists ( select NULL from foo where bar ) )
loop
if ( x.cnt = 1 )
then
found do something
else
not found
end if;
end loop;
is one way (very fast, only runs the subquery as long as it "needs" to, where exists
stops the subquery after hitting the first row)
That loop always executes at least once and at most once since a count(*) on a table
without a group by clause ALWAYS returns at LEAST one row and at MOST one row (even of
the table itself is empty!)

Related

Performance issue with larger resultsets MSSQL

I currently have a stored procedure in MSSQL where I execute a SELECT-statement multiple times based on the variables I give the stored procedure. The stored procedure counts how many results are going to be returned for every filter a user can enable.
The stored procedure isn't the issue, I transformed the select statement from te stored procedure to a regular select statement which looks like:
DECLARE #contentRootId int = 900589
DECLARE #RealtorIdList varchar(2000) = ';880;884;1000;881;885;'
DECLARE #publishSoldOrRentedSinceDate int = 8
DECLARE #isForSale BIT= 1
DECLARE #isForRent BIT= 0
DECLARE #isResidential BIT= 1
--...(another 55 variables)...
--Table to be returned
DECLARE #resultTable TABLE
(
variableName varchar(100),
[value] varchar(200)
)
-- Create table based of inputvariable. Example: turns ';18;118;' to a table containing two ints 18 AND 118
DECLARE #RealtorIdTable table(RealtorId int)
INSERT INTO #RealtorIdTable SELECT * FROM dbo.Split(#RealtorIdList,';') option (maxrecursion 150)
INSERT INTO #resultTable ([value], variableName)
SELECT [Value], VariableName FROM(
Select count(*) as TotalCount,
ISNULL(SUM(CASE WHEN reps.ForRecreation = 1 THEN 1 else 0 end), 0) as ForRecreation,
ISNULL(SUM(CASE WHEN reps.IsQualifiedForSeniors = 1 THEN 1 else 0 end), 0) as IsQualifiedForSeniors,
--...(A whole bunch more SUM(CASE)...
FROM TABLE1 reps
LEFT JOIN temp t on
t.ContentRootID = #contentRootId
AND t.RealEstatePropertyID = reps.ID
WHERE
(EXISTS(select 1 from #RealtorIdTable where RealtorId = reps.RealtorID))
AND (#SelectedGroupIds IS NULL OR EXISTS(select 1 from #SelectedGroupIdtable where GroupId = t.RealEstatePropertyGroupID))
AND (ISNULL(reps.IsForSale,0) = ISNULL(#isForSale,0))
AND (ISNULL(reps.IsForRent, 0) = ISNULL(#isForRent,0))
AND (ISNULL(reps.IsResidential, 0) = ISNULL(#isResidential,0))
AND (ISNULL(reps.IsCommercial, 0) = ISNULL(#isCommercial,0))
AND (ISNULL(reps.IsInvestment, 0) = ISNULL(#isInvestment,0))
AND (ISNULL(reps.IsAgricultural, 0) = ISNULL(#isAgricultural,0))
--...(Around 50 more of these WHERE-statements)...
) as tbl
UNPIVOT (
[Value]
FOR [VariableName] IN(
[TotalCount],
[ForRecreation],
[IsQualifiedForSeniors],
--...(All the other things i selected in above query)...
)
) as d
select * from #resultTable
The combination of a Realtor- and contentID gives me a set default set of X amount of records. When I choose a Combination which gives me ~4600 records, the execution time is around 250ms. When I execute the sattement with a combination that gives me ~600 record, the execution time is about 20ms.
I would like to know why this is happening. I tried removing all SUM(CASE in the select, I tried removing almost everything from the WHERE-clause, and I tried removing the JOIN. But I keep seeing the huge difference between the resultset of 4600 and 600.
Table variables can perform worse when the number of records is large. Consider using a temporary table instead. See When should I use a table variable vs temporary table in sql server?
Also, consider replacing the UNPIVOT by alternative SQL code. Writing your own TSQL code will give you more control and even increase performance. See for example PIVOT, UNPIVOT and performance

SQL using UPDLOCK in query to update top 1 record after filtering and ordering table

I have a stored procedure as follows:
CREATE PROCEDURE [dbo].[RV_SM_WORKITEM_CHECKWORKBYTYPE]
(
#v_ServiceName Nvarchar(20)
,#v_WorkType Nvarchar(20)
,#v_WorkItemThreadId nvarchar(50)
)
AS BEGIN
;WITH updateView AS
(
SELECT TOP 1 *
FROM rv_sm_workitem WITH (UPDLOCK)
WHERE stateofitem = 0
AND itemtype = #v_worktype
ORDER BY ITEMPRIORITY
)
UPDATE updateView
SET assignedto = #v_ServiceName,
stateofitem = 1,
dateassigned = getdate(),
itemthreadid = #v_WorkItemThreadId
OUTPUT INSERTED.*
END
It does the job I need it to do, namely, grab 1 record with a highest priority, change it's state from Available(0) to Not-Available(1), and return the record for work to be done with it. I should be able to have many threads (above 20) use this proc and have all 20 constantly running/grabbing a new workitem. However I am finding that beyond 2 threads, addition threads are waiting on locks; I'm guessing the UPDLOCK is causing this.
I have 2 questions, is there a better way to do this?
Can I do this without the UPDLOCK in the cte since the update statement by default uses UPDLOCK? Note, at any given time, there are over 400,000 records in this table.
I had to so something similar once and this is what I would suggest:
AS BEGIN
DECLARE #results table (id int, otherColumns varchar(50))
WHILE (EXISTS(SELECT TOP 1 * FROM #results))
BEGIN
;WITH updateView AS
(
SELECT TOP 1 *
FROM rv_sm_workitem
WHERE stateofitem = 0
AND itemtype = #v_worktype
ORDER BY ITEMPRIORITY
)
UPDATE updateView
SET assignedto = #v_ServiceName,
stateofitem = 1,
dateassigned = getdate(),
itemthreadid = #v_WorkItemThreadId
OUTPUT INSERTED.* into #results
where stateofitem = 0
END
END
This ensures that the call cannot not allow a item to be double processed. (because of the where clause on the update statement).
There are other variations of this idea, but this is an easy way to convey it. This is not production ready code though, as it will continually circle in the while loop until there is something to process. But I leave it to you to decide how to break out or not loop and return empty (and let the client side code deal with it.)
Here is the answer that helped me when I had this issue.

tsql bulk update

MyTableA has several million records. On regular occasions every row in MyTableA needs to be updated with values from TheirTableA.
Unfortunately I have no control over TheirTableA and there is no field to indicate if anything in TheirTableA has changed so I either just update everything or I update based on comparing every field which could be different (not really feasible as this is a long and wide table).
Unfortunately the transaction log is ballooning doing a straight update so I wanted to chunk it by using UPDATE TOP, however, as I understand it I need some field to determine if the records in MyTableA have been updated yet or not otherwise I'll end up in an infinite loop:
declare #again as bit;
set #again = 1;
while #again = 1
begin
update top (10000) MyTableA
set my.A1 = their.A1, my.A2 = their.A2, my.A3 = their.A3
from MyTableA my
join TheirTableA their on my.Id = their.Id
if ##ROWCOUNT > 0
set #again = 1
else
set #again = 0
end
is the only way this will work if I add in a
where my.A1 <> their.A1 and my.A2 <> their.A2 and my.A3 <> their.A3
this seems like it will be horribly inefficient with many columns to compare
I'm sure I'm missing an obvious alternative?
Assuming both tables are the same structure, you can get a resultset of rows that are different using
SELECT * into #different_rows from MyTable EXCEPT select * from TheirTable and then update from that using whatever key fields are available.
Well, the first, and simplest solution, would obviously be if you could change the schema to include a timestamp for last update - and then only update the rows with a timestamp newer than your last change.
But if that is not possible, another way to go could be to use the HashBytes function, perhaps by concatenating the fields into an xml that you then compare. The caveat here is an 8kb limit (https://connect.microsoft.com/SQLServer/feedback/details/273429/hashbytes-function-should-support-large-data-types) EDIT: Once again, I have stolen code, this time from:
http://sqlblogcasts.com/blogs/tonyrogerson/archive/2009/10/21/detecting-changed-rows-in-a-trigger-using-hashbytes-and-without-eventdata-and-or-s.aspx
His example is:
select batch_id
from (
select distinct batch_id, hash_combined = hashbytes( 'sha1', combined )
from ( select batch_id,
combined =( select batch_id, batch_name, some_parm, some_parm2
from deleted c -- need old values
where c.batch_id = d.batch_id
for xml path( '' ) )
from deleted d
union all
select batch_id,
combined =( select batch_id, batch_name, some_parm, some_parm2
from some_base_table c -- need current values (could use inserted here)
where c.batch_id = d.batch_id
for xml path( '' ) )
from deleted d
) as r
) as c
group by batch_id
having count(*) > 1
A last resort (and my original suggestion) is to try Binary_Checksum? As noted in the comment, this does open the risk for a rather high collision rate.
http://msdn.microsoft.com/en-us/library/ms173784.aspx
I have stolen the following example from lessthandot.com - link to the full SQL (and other cool functions) is below.
--Data Mismatch
SELECT 'Data Mismatch', t1.au_id
FROM( SELECT BINARY_CHECKSUM(*) AS CheckSum1 ,au_id FROM pubs..authors) t1
JOIN(SELECT BINARY_CHECKSUM(*) AS CheckSum2,au_id FROM tempdb..authors2) t2 ON t1.au_id =t2.au_id
WHERE CheckSum1 <> CheckSum2
Example taken from http://wiki.lessthandot.com/index.php/Ten_SQL_Server_Functions_That_You_Have_Ignored_Until_Now
I don't know if this is better than adding where my.A1 <> their.A1 and my.A2 <> their.A2 and my.A3 <> their.A3, but I would definitely give it a try (assuming SQL Server 2005+):
declare #again as bit;
set #again = 1;
declare #idlist table (Id int);
while #again = 1
begin
update top (10000) MyTableA
set my.A1 = their.A1, my.A2 = their.A2, my.A3 = their.A3
output inserted.Id into #idlist (Id)
from MyTableA my
join TheirTableA their on my.Id = their.Id
left join #idlist i on my.Id = i.Id
where i.Id is null
/* alternatively (instead of left join + where):
where not exists (select * from #idlist where Id = my.Id) */
if ##ROWCOUNT > 0
set #again = 1
else
set #again = 0
end
That is, declare a table variable for collecting the IDs of the rows being updated and use that table for looking up (and omitting) IDs that have already been updated.
A slight variation on the method would be to use a local temporary table instead of a table variable. That way you would be able to create an index on the ID lookup table, which might result in better performance.
If schema change is not possible. How about using trigger to save off the Ids that have changed. And only import/export those rows.
Or use trigger to export it immediately.

How do i check if something exist without using count(*) ... limit 1

My code is SELECT COUNT(*) FROM name_list WHERE [name]='a' LIMIT 1
It appears there is no limit clause in SQL Server. So how do i say tell me if 'a' exist in name_list.name?
IF EXISTS(SELECT * FROM name_list WHERE name = 'a')
BEGIN
-- such a record exists
END
ELSE
BEGIN
-- such a record does not exist
END
Points to note:
don't worry about the SELECT * - the database engine knows what you are asking
the IF is just for illustration - the EXISTS(SELECT ...) expression is what answers your question
the BEGIN and END are strictly speaking unnecessary if there is only one statement in the block
COUNT(*) returns a single row anyway, no need to limit.
The ANSI equivalent for LIMIT is TOP: SELECT TOP(1) ... FROM ... WHERE...
And finally, there is EXISTS: IF EXISTS (SELECT * FROM ... WHERE ...).
The TOP clause is the closest equivalent to LIMIT. The following will return all of the fields in the first row whose name field equals 'a' (altough if more than one row matches, the row that ets returned will be undefined unless you also provide an ORDER BY clause).
SELECT TOP 1 * FROM name_list WHERE [name]='a'
But there's no need to use it if you're doing a COUNT(*). The following will return a single row with a single field that is number of rows whose name field eqals 'a' in the whole table.
SELECT COUNT(*) FROM name_list WHERE [name]='a'
IF (EXISTS(SELECT [name] FROM name_list where [name] = 'a'))
begin
//do other work if exists
end
You can also do the opposite:
IF (NOT EXISTS(SELECT [name] FROM name_list where [name] = 'a'))
begin
//do other work if not exists
end
No nono that is wrong.
First there is top, so you have to say something like:
select top 1 1 from name_list where [name]='a'
You'll get a row with only a unnamed field with 1 out of the query if there is data, and no rows at all if there is no data.
This query returns exactly what you intended:
SELECT TOP 1 CASE WHEN EXISTS(SELECT * WHERE [name] = 'a') THEN 1 ELSE 0 END FROM name_list

Saving a select count(*) value to an integer (SQL Server)

I'm having some trouble with this statement, owing no doubt to my ignorance of what is returned from this select statement:
declare #myInt as INT
set #myInt = (select COUNT(*) from myTable as count)
if(#myInt <> 0)
begin
print 'there's something in the table'
end
There are records in myTable, but when I run the code above the print statement is never run. Further checks show that myInt is in fact zero after the assignment above. I'm sure I'm missing something, but I assumed that a select count would return a scalar that I could use above?
If #myInt is zero it means no rows in the table: it would be NULL if never set at all.
COUNT will always return a row, even for no rows in a table.
Edit, Apr 2012: the rules for this are described in my answer here:Does COUNT(*) always return a result?
Your count/assign is correct but could be either way:
select #myInt = COUNT(*) from myTable
set #myInt = (select COUNT(*) from myTable)
However, if you are just looking for the existence of rows, (NOT) EXISTS is more efficient:
IF NOT EXISTS (SELECT * FROM myTable)
select #myInt = COUNT(*) from myTable
Declare #MyInt int
Set #MyInt = ( Select Count(*) From MyTable )
If #MyInt > 0
Begin
Print 'There''s something in the table'
End
I'm not sure if this is your issue, but you have to esacpe the single quote in the print statement with a second single quote. While you can use SELECT to populate the variable, using SET as you have done here is just fine and clearer IMO. In addition, you can be guaranteed that Count(*) will never return a negative value so you need only check whether it is greater than zero.
[update] -- Well, my own foolishness provides the answer to this one. As it turns out, I was deleting the records from myTable before running the select COUNT statement.
How did I do that and not notice? Glad you asked. I've been testing a sql unit testing platform (tsqlunit, if you're interested) and as part of one of the tests I ran a truncate table statement, then the above. After the unit test is over everything is rolled back, and records are back in myTable. That's why I got a record count outside of my tests.
Sorry everyone...thanks for your help.

Resources