I am running SQL Server 2012 and have constructed a query that subtracts the rows found in several other tables from one primary table. I have done this a few ways, one being:
SELECT
campaignContact_id,
nlLogID,
emailAddress
FROM
sm_rel_campaign_contact rcc
WHERE
rcc.campaignContact_id NOT IN (SELECT campaignContact_id
FROM SM_BOUNCES)
AND rcc.campaigncontact_id NOT IN (SELECT campaignContact_id
FROM SM_DEFERRALS )
AND rcc.campaignContact_id NOT IN (SELECT campaignContact_id
FROM SM_FAILS)
AND rcc.campaignContact_id NOT IN (SELECT campaignContact_id
FROM SM_SENDS )
Another being:
ALTER VIEW SM_QUEUE
AS
(
SELECT
campaignContact_id,
nlLogID,
emailAddress
FROM
sm_rel_campaign_contact rcc
WHERE
NOT EXISTS (SELECT * FROM SM_BOUNCES smb
WHERE rcc.campaignContact_id = smb.campaignContact_id)
AND NOT EXISTS (SELECT * FROM SM_DEFERRALS smd
WHERE rcc.campaignContact_id = smd.campaignContact_id)
AND NOT EXISTS (SELECT * FROM SM_FAILS smf
WHERE rcc.campaignContact_id = smf.campaignContact_id)
AND NOT EXISTS (SELECT * FROM SM_SENDS sms
WHERE rcc.campaignContact_id = sms.campaignContact_ID)
)
The issue is that when I run the following after creating the view (either way):
SELECT count(*)
FROM SM_QUEUE
WHERE nlLogID = 81505
it runs incredibly slowly! I know you can index views, but I wanted to see if anyone had a better suggestion. LEFT OUTER JOINs, maybe?
Appreciate any feedback in advance.
You won't be able to index this view: subqueries and outer joins both make a view unindexable.
You are probably missing some useful indexes on the base tables, though.
A different approach that may perform better is:
WITH Excludes
AS (SELECT campaignContact_id
FROM SM_BOUNCES
UNION ALL
SELECT campaignContact_id
FROM SM_DEFERRALS
UNION ALL
SELECT campaignContact_id
FROM SM_FAILS
UNION ALL
SELECT campaignContact_id
FROM SM_SENDS)
SELECT campaignContact_id,
nlLogID,
emailAddress
FROM sm_rel_campaign_contact rcc
WHERE NOT EXISTS (SELECT *
FROM Excludes e
WHERE e.campaignContact_id = rcc.campaignContact_id)
If that doesn't help, edit your question to include the CREATE TABLE statements (including indexes) and the sizes of the tables involved.
NOT IN is often harder to optimize than NOT EXISTS, particularly when the compared columns are nullable, because the extra NULL handling tends to produce more expensive plans (often with full table scans). See if you can change the NOT IN to a NOT EXISTS in the first query.
I can't really try it, since SQL Fiddle is down and I have no SQL Server until Monday (maybe you need a few more parentheses):
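Apart from plan quality, the two forms also differ in how they treat NULLs, which is worth checking before swapping one for the other. A minimal sketch, using hypothetical table variables @contacts and @bounces rather than the tables from the question:

```sql
-- Hypothetical data: the excluded list contains one NULL.
DECLARE @contacts TABLE (campaignContact_id int NOT NULL);
DECLARE @bounces  TABLE (campaignContact_id int NULL);

INSERT INTO @contacts (campaignContact_id) SELECT 1 UNION ALL SELECT 2;
INSERT INTO @bounces  (campaignContact_id) SELECT 1 UNION ALL SELECT NULL;

-- NOT IN: "2 NOT IN (1, NULL)" evaluates to UNKNOWN,
-- so this query returns no rows at all.
SELECT c.campaignContact_id
FROM @contacts c
WHERE c.campaignContact_id NOT IN (SELECT b.campaignContact_id FROM @bounces b);

-- NOT EXISTS: the NULL row never matches the correlation predicate,
-- so contact 2 is returned.
SELECT c.campaignContact_id
FROM @contacts c
WHERE NOT EXISTS (SELECT *
                  FROM @bounces b
                  WHERE b.campaignContact_id = c.campaignContact_id);
```

If campaignContact_id is declared NOT NULL in every excluded table, the two forms return the same rows.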
WITH cte AS (
SELECT campaignContact_id FROM sm_rel_campaign_contact rcc
except
SELECT campaignContact_id FROM SM_BOUNCES
except
SELECT campaignContact_id FROM SM_DEFERRALS
except
SELECT campaignContact_id FROM SM_FAILS
except
SELECT campaignContact_id FROM SM_SENDS)
SELECT *
from cte
join sm_rel_campaign_contact rcc on rcc.campaignContact_id = cte.campaignContact_id
I would expect the second version (with WHERE NOT EXISTS) to run quite a bit faster than the one with the WHERE NOT IN construction. However, this requires an index on the campaignContact_id field in each of the referenced tables (SM_BOUNCES, SM_DEFERRALS, SM_FAILS and SM_SENDS). Additionally, since you say these are big tables, do you have an index on the nlLogID field to start with? It might well be that your query spends more time scanning the sm_rel_campaign_contact table than figuring out whether it has relevant data in the other tables, especially since you say both versions in the question run 'about the same'.
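The indexes described above could look roughly like this. The index names are made up, and the statements assume the SM_* objects are base tables (if they are views, the indexes belong on the underlying tables instead):

```sql
-- Hypothetical index names; adjust to the real schema.
-- One index per excluded table so each NOT EXISTS probe can be a seek.
CREATE NONCLUSTERED INDEX IX_SM_BOUNCES_campaignContact_id
    ON SM_BOUNCES (campaignContact_id);

CREATE NONCLUSTERED INDEX IX_SM_DEFERRALS_campaignContact_id
    ON SM_DEFERRALS (campaignContact_id);

CREATE NONCLUSTERED INDEX IX_SM_FAILS_campaignContact_id
    ON SM_FAILS (campaignContact_id);

CREATE NONCLUSTERED INDEX IX_SM_SENDS_campaignContact_id
    ON SM_SENDS (campaignContact_id);

-- Supports the WHERE nlLogID = ... filter. campaignContact_id is added as a
-- second key column so the COUNT(*) query can read both columns it needs
-- from this index without going back to the base table.
CREATE NONCLUSTERED INDEX IX_sm_rel_campaign_contact_nlLogID
    ON sm_rel_campaign_contact (nlLogID, campaignContact_id);
```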
TIP: try running the query with 'Include Actual Execution Plan' enabled and interpret what's going on there, or simply take a screenshot and attach it to your question above so we can see if we can get any insights from it.
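Alongside the graphical plan, SET STATISTICS IO and SET STATISTICS TIME report per-table read counts and timings in the Messages tab, which can make it easier to see which table dominates. A sketch against the view from the question:

```sql
-- Report logical/physical reads per table and CPU/elapsed time per statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT COUNT(*)
FROM SM_QUEUE
WHERE nlLogID = 81505;

-- Switch the extra output off again for the rest of the session.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```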
I have a table with hierarchical data (containing ~5,000,000 records). I use the following query to retrieve all child records of a specific record:
with TEMP_PACKAGES as
(
select ID AS PACKAGE_ID,PARENT_PACKAGE_ID,1 as LEVEL from PACKAGES where PACKAGES.ID = 5405988
union all
select A.ID AS PACKAGE_ID, B.PARENT_PACKAGE_ID,B.LEVEL+1 AS LEVEL
from PACKAGES as A join TEMP_PACKAGES B on (A.PARENT_PACKAGE_ID = B.PACKAGE_ID)
)
select PACKAGE_ID,PARENT_PACKAGE_ID,LEVEL from TEMP_PACKAGES
So far so good: the above query executes instantly (0 ms).
Now, when I add one more field (named RESERVED) to the select, the query execution time goes from 0 ms to 15,000 ms (15 s)(!):
with TEMP_PACKAGES as
(
select ID AS PACKAGE_ID,PARENT_PACKAGE_ID,RESERVED,1 as LEVEL from PACKAGES where PACKAGES.ID = 5405988
union all
select A.ID AS PACKAGE_ID, B.PARENT_PACKAGE_ID,A.RESERVED,B.LEVEL+1 AS LEVEL
from PACKAGES as A join TEMP_PACKAGES B on (A.PARENT_PACKAGE_ID = B.PACKAGE_ID)
)
select PACKAGE_ID,PARENT_PACKAGE_ID,RESERVED,LEVEL from TEMP_PACKAGES
Note that:
All the appropriate indexes exist (ID, PARENT_PACKAGE_ID)
The type of the RESERVED field is bit (NULL)
Any ideas why this is happening?
Also note that if I modify the query like this:
with TEMP_PACKAGES as
(
select ID AS PACKAGE_ID,PARENT_PACKAGE_ID,1 as LEVEL from PACKAGES where PACKAGES.ID = 5405988
union all
select A.ID AS PACKAGE_ID, B.PARENT_PACKAGE_ID,B.LEVEL+1 AS LEVEL
from PACKAGES as A join TEMP_PACKAGES B on (A.PARENT_PACKAGE_ID = B.PACKAGE_ID)
)
select P.ID,P.PARENT_PACKAGE_ID,P.RESERVED,TP.LEVEL
from TEMP_PACKAGES as TP join PACKAGES as P on TP.PACKAGE_ID=P.ID
the query is also instant (0 ms), like the first query.
Update (2022.04.13)
Thank you for your answers. I attached both execution plans (fast query and slow query), as many of you requested.
Also, the SQL Server edition is 2008 64-bit (SP3).
Execution plans image
You should include the RESERVED column as part of the index on the ID column.
Before you added the RESERVED column, your query used only the index and did not touch the table for any I/O.
As soon as you added the RESERVED column, every CTE iteration needed to look up the RESERVED value from the table using ID.
If you cover the RESERVED column with an index on ID, you will get the performance you seek.
See MS documentation on this here
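A sketch of what a covering index could look like here. The answer above names the index on ID; this sketch assumes instead that ID is the clustered primary key and that the recursive step seeks on a separate nonclustered index on PARENT_PACKAGE_ID, in which case that is the index that would need to carry RESERVED. The temp table #PACKAGES and the index name are stand-ins, not the real schema.

```sql
-- Hypothetical stand-in for PACKAGES; ID is assumed to be the clustered key.
CREATE TABLE #PACKAGES (
    ID                int NOT NULL PRIMARY KEY CLUSTERED,
    PARENT_PACKAGE_ID int NULL,
    RESERVED          bit NULL
);

-- The recursive step joins on PARENT_PACKAGE_ID, so this index carries
-- RESERVED as an included column; ID is present as the clustered key.
CREATE NONCLUSTERED INDEX IX_PACKAGES_PARENT
    ON #PACKAGES (PARENT_PACKAGE_ID) INCLUDE (RESERVED);

INSERT INTO #PACKAGES (ID, PARENT_PACKAGE_ID, RESERVED)
SELECT 1, NULL, 0 UNION ALL SELECT 2, 1, 1 UNION ALL SELECT 3, 2, NULL;

-- Same shape as the slow query: RESERVED is read inside the recursion.
WITH TEMP_PACKAGES AS (
    SELECT ID AS PACKAGE_ID, PARENT_PACKAGE_ID, RESERVED, 1 AS LEVEL
    FROM #PACKAGES
    WHERE ID = 1
    UNION ALL
    SELECT A.ID, A.PARENT_PACKAGE_ID, A.RESERVED, B.LEVEL + 1
    FROM #PACKAGES AS A
    JOIN TEMP_PACKAGES AS B ON A.PARENT_PACKAGE_ID = B.PACKAGE_ID
)
SELECT PACKAGE_ID, PARENT_PACKAGE_ID, RESERVED, LEVEL
FROM TEMP_PACKAGES;

DROP TABLE #PACKAGES;
```

Whether the optimizer actually picks the index on the real table is best confirmed by comparing the execution plans before and after creating it.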
In order to use newid() in a UDF, I've created a view that can help me:
create view RandomUUID as select newid() as UUID
In this way, UDFs can now get access to newid(). Cool.
My question, however, is what is the best way to select this? Does it make sense to add a TOP 1 or (nolock) to the query in my UDF? As in:
select UUID from RandomUUID
vs.
select top 1 UUID from RandomUUID (nolock) -- Or any other combo of query modifiers
UPDATE:
This SqlFiddle demonstrates how this is being used.
There is no reason to add (nolock) because there is no record involved to lock! For the record, table hints should be written as WITH (NOLOCK); omitting the WITH keyword is deprecated in recent versions of SQL Server.
TOP (1) will add a SORT operator here that will be extraneous, since there is only ever one row created.
You can create it like this:
create view RandomUUID as select newid() as UUID;
GO
create function give_me_a_new_id ()
returns uniqueidentifier as
begin
return (select UUID from RandomUUID);
end;
GO
Note: (nolock) will be optimized away, but TOP(1) adds a SORT operation as seen here (expand the Execution plans).
Suppose I have
INSERT INTO #tmp1 (ID) SELECT ID FROM Table1 WHERE Name = 'A'
INSERT INTO #tmp2 (ID) SELECT ID FROM Table2 WHERE Name = 'B'
SELECT ID FROM #tmp1 UNION ALL SELECT ID FROM #tmp2
I would like to run queries 1 & 2 in parallel, and then combine results after they are finished.
Is there a way to do this in pure T-SQL, or a way to check if it will do this automatically?
Some background for those who want it: I am investigating a complex search with multiple conditions that are later combined (term OR (term2 AND term3) OR term4 AND item5=term5), and I want to know whether it would be useful to execute those largely unrelated conditions in parallel and then combine the resulting tables (calculating ranks, weights, and so on).
E.g. there should be several result sets:
SELECT COUNT(*) FROM (#tmp1 union #tmp2)
SELECT ID from (#tmp1 union #tmp2) WHERE ...
SELECT * from TABLE3 where ID IN (SELECT ID FROM #tmp1 union #tmp2)
SELECT * from TABLE4 where ID IN (SELECT ID FROM #tmp1 union #tmp2)
You don't. SQL doesn't work like that: it isn't procedural. Trying to do so leads to race conditions and data issues because of other connections.
Table variables are also scoped to the batch and connection, so you can't share results across two connections, in case you're wondering.
In any case, all you need is this, unless you gave us a bad example:
SELECT ID FROM Table1 WHERE Name = 'A'
UNION
SELECT ID FROM Table2 WHERE Name = 'B'
I suspect you're thinking of "run in parallel" because of this procedural thinking. What is the actual problem you are trying to solve?
Note: statements that modify table variables cannot use parallel plans; see "Can queries that read table variables generate parallel execution plans in SQL Server 2008?"
You don't decide what to parallelise - SQL Server's optimizer does. And the largest unit of work that the optimizer will work with is a single statement - so, you find a way to express your query as a single statement, and then rely on SQL Server to do its job, which it will usually do quite well.
If, having constructed your query, the performance isn't acceptable, then you can look at applying hints or forcing certain plans to be used. A lot of people break their queries into multiple statements, either believing that they can do a better job than SQL Server, or because it's how they "naturally" think of the task at hand. Both are "wrong" (for certain values of wrong), but if there's a natural breakdown, you may be able to replicate it using Common Table Expressions - these would allow you to name each sub-part of the problem, and then combine them together, all as part of a single statement.
E.g.:
;WITH TabA AS (
SELECT ID FROM Table1 WHERE Name = 'A'
), TabB AS (
SELECT ID FROM Table2 WHERE Name = 'B'
)
SELECT ID FROM TabA UNION ALL SELECT ID FROM TabB
And this will allow the server to decide how best to resolve this query (e.g. deciding whether to store intermediate results in "temp" tables).
I see from one of your other comments that you need to "work with" the intermediate results. This can still be done with CTEs (if it's not just a case of being unable to express the "final" result as a single query), e.g.:
;WITH TabA AS (
SELECT ID FROM Table1 WHERE Name = 'A'
), TabAWithCalcs AS (
SELECT ID,(ID*5+6) as ModID from TabA
)
SELECT * FROM TabAWithCalcs
Why not just:
SELECT ID FROM Table1 WHERE Name = 'A'
UNION ALL
SELECT ID FROM Table2 WHERE Name = 'B'
Then, if SQL Server wants to run the two selects in parallel, it will do so of its own volition.
Otherwise we need more context for what you're trying to achieve if this isn't practical.
I have the following query, and I need it to fetch data from SomeTable based on the filter criteria present in SomeOtherTable. If there is nothing present in SomeOtherTable, the query should return all the data present in SomeTable.
SQL Server 2005
SomeOtherTable does not have any indexes or constraints; all fields are char(50).
The following query works fine for my requirements, but it causes performance problems when I have lots of parameters.
Due to a client requirement, we have to keep all the WHERE clause data in SomeOtherTable. Depending on subid, the data will be joined with one of the columns in SomeTable.
For example, the query can be:
SELECT
*
FROM
SomeTable
WHERE
1=1
AND
(
SomeTable.ID in (SELECT DISTINCT ID FROM SomeOtherTable WHERE Name = 'ABC' and subid = 'EF')
OR
0=(SELECT Count(1) FROM SomeOtherTable WHERE spName = 'ABC' and subid = 'EF')
)
AND
(
SomeTable.date =(SELECT date FROM SomeOtherTable WHERE Name = 'ABC' and subid = 'Date')
OR
0=(SELECT Count(1) FROM SomeOtherTable WHERE spName = 'ABC' and subid = 'Date')
)
EDIT----------------------------------------------
I think I might have to explain my problem in detail:
We have developed an ASP.NET application that is used to invoke parameterized Crystal Reports. Parameters are not passed to the reports using the default Crystal Reports method.
In the ASP.NET application we have created wizards that are used to pass the parameters to the reports. These parameters are not consumed directly by the Crystal Report but by the query embedded inside the report, or by the stored procedure the report uses.
This is achieved using a table (SomeOtherTable) that holds parameter data for as long as the report is running, after which the data is deleted. As such, we can assume that SomeOtherTable has at most 2 to 3 rows at any given point in time.
So if we look at the above query, the initial part can be regarded as the report query, and the WHERE clause is used to get the user input from SomeOtherTable.
So I don't think it will be useful to create indexes etc. (maybe I am wrong).
"SomeOtherTable does not have any indexes or any constraint all fields are char(50)"
Well, there's your problem. There's nothing you can do to a query like this which will improve its performance if you create it like this.
You need a proper primary or other candidate key designated on all of your tables. That is to say, you need at least ONE unique index on the table. You can do this by designating one or more fields as the PK, or you can add a UNIQUE constraint or index.
You need to define your fields properly. Does the field store integers? Well then, an INT field may just be a better bet than a CHAR(50).
You can't "optimize" a query that is based on an unsound schema.
Try:
SELECT
*
FROM
SomeTable
LEFT JOIN SomeOtherTable ON SomeTable.ID=SomeOtherTable.ID AND Name = 'ABC'
WHERE
1=1
AND
(
SomeOtherTable.ID IS NOT NULL
OR
0=(SELECT Count(1) FROM SomeOtherTable WHERE spName = 'ABC')
)
You can also put WITH (NOLOCK) after each table name to reduce blocking, at the cost of allowing dirty reads.
The following might speed you up
SELECT *
FROM SomeTable
WHERE
SomeTable.ID in
(SELECT DISTINCT ID FROM SomeOtherTable Where Name = 'ABC')
UNION
SELECT *
FROM SomeTable
Where
NOT EXISTS (Select spName From SomeOtherTable Where spName = 'ABC')
The UNION effectively splits this into two simpler queries, which can be optimised separately (whether this actually improves performance depends very much on the DBMS, table sizes, etc., but it's always worth a try).
The EXISTS keyword is more efficient than SELECT COUNT(1), as it returns true as soon as the first row is encountered.
Or check whether the value exists in the database first.
You can also remove the DISTINCT keyword from your query; it is useless here.
if EXISTS (Select spName From SomeOtherTable Where spName = 'ABC')
begin
SELECT *
FROM SomeTable
WHERE
SomeTable.ID in
(SELECT ID FROM SomeOtherTable Where Name = 'ABC')
end
else
begin
SELECT *
FROM SomeTable
end
Aloha
Try
select t.* from SomeTable t
left outer join SomeOtherTable o
on t.id = o.id
where (not exists (select id from SomeOtherTable where spname = 'adbc')
OR spname = 'adbc')
-Edoode
Change all the SELECT statements in your WHERE clause to inner joins.
The OR conditions should be UNION ALL-ed.
Also make sure your indexing is OK.
Sometimes it pays to have an intermediate table for temp results that you can join to.
It seems to me that there is no need for the "1=1 AND" in your query. 1=1 will always evaluate to be true, leaving the software to evaluate the next part... why not just skip the 1=1 and evaluate the juicy part?
I am going to stick to my original Query.
I have a table in SQL server that has the normal tree structure of Item_ID, Item_ParentID.
Suppose I want to iterate and get all CHILDREN of a particular Item_ID (at any level).
Recursion seems an intuitive candidate for this problem and I can write an SQL Server function to do this.
Will this affect performance if my table has many many records?
How do I avoid recursion and simply query the table? Any suggestions, please?
With MS SQL 2005 you could use the WITH keyword.
Check out this question and particularly this answer.
With Oracle you could use CONNECT BY keyword to generate hierarchical queries (syntax).
AFAIK with MySQL you'll have to use the recursion.
Alternatively you could always build a cache table for your records parent->child relationships
As a general answer, it is possible to do some pretty sophisticated stuff in SQL Server that normally needs recursion, simply by using an iterative algorithm. I managed to write an XHTML parser in Transact-SQL that worked surprisingly well. The code prettifier I wrote was done in a stored procedure. It ain't elegant (it is rather like watching a buffalo doing ballet), but it works.
Are you using SQL 2005?
If so you can use Common Table Expressions for this. Something along these lines:
;with CTE (Some, Columns, ItemId, ParentId) as
(
select Some, Columns, ItemId, ParentId
from myTable
where ItemId = @itemID
union all
select a.Some, a.Columns, a.ItemId, a.ParentId
from myTable as a
inner join CTE as b on a.ParentId = b.ItemId
where a.ItemId <> b.ItemId
)
select * from CTE
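On the performance question, it may help to know that SQL Server stops a recursive CTE after 100 levels of recursion by default and raises an error; the limit can be changed per query with OPTION (MAXRECURSION n), where 0 removes it. A small self-contained sketch, using a hypothetical table variable @Items in place of the real table and an added Depth column to show how deep each row sits:

```sql
-- Hypothetical sample tree: 1 is the root, 2 and 4 are its children, 3 is under 2.
DECLARE @Items TABLE (ItemId int NOT NULL PRIMARY KEY, ParentId int NULL);
INSERT INTO @Items (ItemId, ParentId)
SELECT 1, NULL UNION ALL SELECT 2, 1 UNION ALL SELECT 3, 2 UNION ALL SELECT 4, 1;

DECLARE @itemID int;
SET @itemID = 1;

WITH Tree (ItemId, ParentId, Depth) AS (
    -- anchor: the starting item
    SELECT ItemId, ParentId, 0
    FROM @Items
    WHERE ItemId = @itemID
    UNION ALL
    -- recursive step: children of the rows found so far
    SELECT a.ItemId, a.ParentId, b.Depth + 1
    FROM @Items AS a
    INNER JOIN Tree AS b ON a.ParentId = b.ItemId
)
SELECT ItemId, ParentId, Depth
FROM Tree
OPTION (MAXRECURSION 200);  -- raise the default cap of 100 levels
```

The cap counts levels of the hierarchy, not rows, so a wide but shallow tree is unaffected by it.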
The problem you will face with recursion and performance is how many times it will have to recurse to return the results. Each recursive call is another separate call that will have to be joined into the total results.
In SQL 2k5 you can use a common table expression to handle this recursion:
WITH Managers AS
(
--initialization
SELECT EmployeeID, LastName, ReportsTo
FROM Employees
WHERE ReportsTo IS NULL
UNION ALL
--recursive execution
SELECT e.employeeID,e.LastName, e.ReportsTo
FROM Employees e INNER JOIN Managers m
ON e.ReportsTo = m.employeeID
)
SELECT * FROM Managers
or another solution is to flatten the hierarchy into another table
Employee_Managers
ManagerId (PK, FK to Employee table)
EmployeeId (PK, FK to Employee table)
All the parent-child relationships would be stored in this table, so if manager 1 manages manager 2, who manages employee 3, the table would look like:
ManagerId EmployeeId
1 2
1 3
2 3
This allows the hierarchy to be easily queried:
select * from employee_managers em
inner join employee e on e.employeeid = em.employeeid and em.managerid = 42
This would return all employees that have manager 42, directly or indirectly. The upside is greater query performance; the downside is having to maintain the flattened hierarchy.
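One way to keep the maintenance cost down is to rebuild the flattened table from the parent pointers with a recursive CTE. A minimal sketch, using hypothetical table variables in place of the Employees and Employee_Managers tables:

```sql
-- Hypothetical sample data: 1 manages 2, who manages 3.
DECLARE @Employees TABLE (EmployeeID int NOT NULL PRIMARY KEY, ReportsTo int NULL);
INSERT INTO @Employees (EmployeeID, ReportsTo)
SELECT 1, NULL UNION ALL SELECT 2, 1 UNION ALL SELECT 3, 2;

DECLARE @Employee_Managers TABLE (
    ManagerId  int NOT NULL,
    EmployeeId int NOT NULL,
    PRIMARY KEY (ManagerId, EmployeeId)
);

WITH Chain (ManagerId, EmployeeId) AS (
    -- direct reports
    SELECT ReportsTo, EmployeeID
    FROM @Employees
    WHERE ReportsTo IS NOT NULL
    UNION ALL
    -- indirect reports: anyone reporting to an employee already in the chain
    SELECT c.ManagerId, e.EmployeeID
    FROM @Employees AS e
    INNER JOIN Chain AS c ON e.ReportsTo = c.EmployeeId
)
INSERT INTO @Employee_Managers (ManagerId, EmployeeId)
SELECT ManagerId, EmployeeId
FROM Chain;

SELECT ManagerId, EmployeeId
FROM @Employee_Managers
ORDER BY ManagerId, EmployeeId;
-- Rows: (1, 2), (1, 3), (2, 3)
```

A full rebuild like this is simple but only suits hierarchies that change rarely; frequent changes would need incremental maintenance instead.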
Joe Celko has a book (<- link to Amazon) specifically on tree structures in SQL databases. While you would need recursion for your model and there would definitely be a potential for performance issues there, there are alternative ways to model a tree structure depending on what your specific problem involves which could avoid recursion and give better performance.
Perhaps some more detail is in order.
If you have a master-detail relationship as you describe, then won't a simple JOIN get what you need?
As in:
SELECT
SOME_FIELDS
FROM
MASTER_TABLE MT
,CHILD_TABLE CT
WHERE CT.PARENT_ID = MT.ITEM_ID
You shouldn't need recursion for children, since you're only looking at the level directly below (i.e. select * from T where ParentId = @parent); you only need recursion for all descendants.
In SQL 2005 you can get the descendants with:
with AllDescendants (ItemId, ItemText) as (
select t.ItemId, t.ItemText
from [TableName] t
where t.ItemId = @ancestorId
union all
select sub.ItemId, sub.ItemText
from [TableName] sub
inner join AllDescendants tree
on tree.ItemId = sub.ParentItemId
)
select * from AllDescendants
You don't need recursion at all...
Note: I changed the columns to ItemID and ItemParentID for ease of typing. Items is a placeholder for your table name, and TempTable is assumed to exist already.
DECLARE @intLevel INT
DECLARE @TargetLevel INT
SET @intLevel = 1
SET @TargetLevel = 3 -- example value: the level you want to retrieve
-- Level 1: the root items
INSERT INTO TempTable(ItemID, ItemParentID, Level)
SELECT ItemID, ItemParentID, @intLevel
FROM Items
WHERE ItemParentID IS NULL
WHILE @intLevel < @TargetLevel
BEGIN
    SET @intLevel = @intLevel + 1
    -- Add the children of the rows inserted at the previous level
    INSERT INTO TempTable(ItemID, ItemParentID, Level)
    SELECT ItemID, ItemParentID, @intLevel
    FROM Items
    WHERE ItemParentID IN (SELECT ItemID FROM TempTable WHERE Level = @intLevel-1)
    -- If no rows are inserted then there are no children
    IF @@ROWCOUNT = 0
        BREAK
END
SELECT ItemID FROM TempTable WHERE Level = @TargetLevel