NOT IN statement is slowing down my query - sql-server

I have a problem with my query. I have a simple example here that illustrates the code I have.
SELECT distinct ID
FROM Table
WHERE IteamNumber in (132,434,675) AND Year(DateCreated) = 2019
AND ID NOT IN (
SELECT Distinct ID FROM Table
WHERE IteamNumber in (132,434,675) AND DateCreated < '2019-01-01')
As you can see, I'm retrieving the unique IDs that were created in 2019 and not earlier.
The SELECT statements work fine on their own, but once I add the NOT IN condition the query can easily take more than a minute.
My other question: could this be related to the performance of the computer/server that runs SQL Server for Microsoft Business Central? The same query, including the NOT IN condition, ran perfectly well on the SQL Server behind Microsoft Dynamics C5.
So my question is: is there something wrong with my query, or is it mainly a server issue?
UPDATE: here is a real example: this takes 25 seconds to retrieve 500 rows
Select count(distinct b.No_),'2014'
from [Line] c
inner join [Header] a
on a.CollectionNo = c.CollectionNo
Inner join [Customer] b
on b.No_ = a.CustomerNo
where c.No_ in('2101','2102','2103','2104','2105')
and year(Enrollmentdate)= 2014
and(a.Resignationdate < '1754-01-01 00:00:00.000' OR a.Resignationdate >= '2014-12-31')
and NOT EXISTS(Select distinct x.No_
from [Line] c
inner join [Header] a
on a.CollectionNo = c.CollectionNo
Inner join [Customer] x
on x.No_ = a.CustomerNo
where x.No_ = b.No_ and
c.No_ in('2101','2102','2103','2104','2105')
and Enrollmentdate < '2014-01-01'
and(a.Resignationdate < '1754-01-01 00:00:00.000' OR a.Resignationdate > '2014-12-31'))

If I understand correctly you can write the query as a GROUP BY query with a HAVING clause:
SELECT ID
FROM t
WHERE IteamNumber in (132, 434, 675)
GROUP BY ID
HAVING MIN(DateCreated) >= '20190101' -- no row earlier than 2019
AND MIN(DateCreated) < '20200101' -- at least one row less than 2020
This will remove rows for which an earlier record exists. You can further improve the performance by creating a covering index:
CREATE INDEX IX_t_0001 ON t (ID) INCLUDE (IteamNumber, DateCreated)
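To check that the GROUP BY / HAVING form returns the same IDs as the NOT IN form, here is a small sketch using Python's built-in sqlite3 module. The sample rows are made up, SQLite is only a stand-in for SQL Server, and Year(DateCreated) = 2019 is written as a date range because SQLite has no Year() function. It says nothing about speed, only about the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (ID INTEGER, IteamNumber INTEGER, DateCreated TEXT);
    INSERT INTO t VALUES
        (1, 132, '2019-03-01'),  -- first seen in 2019: keep
        (2, 434, '2018-06-15'),  -- seen before 2019 ...
        (2, 434, '2019-02-10'),  -- ... so ID 2 is excluded
        (3, 675, '2020-01-05'),  -- first seen in 2020: excluded
        (4, 999, '2019-05-05'),  -- item number not in the list
        (5, 132, '2019-07-01'),
        (5, 675, '2019-09-09');
""")

# Original shape: rows from 2019 whose ID has no earlier row.
not_in_sql = """
    SELECT DISTINCT ID FROM t
    WHERE IteamNumber IN (132, 434, 675)
      AND DateCreated >= '2019-01-01' AND DateCreated < '2020-01-01'
      AND ID NOT IN (SELECT ID FROM t
                     WHERE IteamNumber IN (132, 434, 675)
                       AND DateCreated < '2019-01-01')
    ORDER BY ID
"""

# Rewritten shape: the earliest row per ID must fall inside 2019.
having_sql = """
    SELECT ID FROM t
    WHERE IteamNumber IN (132, 434, 675)
    GROUP BY ID
    HAVING MIN(DateCreated) >= '2019-01-01'
       AND MIN(DateCreated) <  '2020-01-01'
    ORDER BY ID
"""

not_in_ids = [row[0] for row in conn.execute(not_in_sql)]
having_ids = [row[0] for row in conn.execute(having_sql)]
print(not_in_ids, having_ids)
```

On this sample both queries return IDs 1 and 5. The rewrite reads the table once instead of twice, which is presumably where any gain would come from.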

I usually prefer JOINs to INs. You get the same result, but the engine tends to be able to optimize it better.
You join your main query (T1) with what was the IN subquery (T2) and filter on T2.ID being null, which ensures that no record matching those conditions was found.
SELECT distinct T1.ID
FROM Table T1
LEFT JOIN Table T2 on T2.ID = T1.ID AND
T2.IteamNumber in (132,434,675) AND T2.DateCreated < '2019-01-01'
WHERE T1.IteamNumber in (132,434,675) AND Year(T1.DateCreated) = 2019 AND
T2.ID is null
UPDATE: Here is the proposal updated with your real query. Since your subquery has inner joins, I have created a CTE so that you can left join to it. It works the same way: you left join your main query with the subquery and return only the rows for which no matching record is found in the subquery.
with previous as (
Select x.No_
from [Line] c
inner join [Header] a on a.CollectionNo = c.CollectionNo
inner join [Customer] x on x.No_ = a.CustomerNo
where c.No_ in ('2101','2102','2103','2104','2105')
and Enrollmentdate < '2014-01-01'
and (a.Resignationdate < '1754-01-01 00:00:00.000' OR a.Resignationdate > '2014-12-31')
)
Select count(distinct b.No_),'2014'
from [Line] c
inner join [Header] a on a.CollectionNo = c.CollectionNo
inner join [Customer] b on b.No_ = a.CustomerNo
left join previous p on p.No_ = b.No_
where c.No_ in ('2101','2102','2103','2104','2105')
and year(Enrollmentdate)= 2014
and (a.Resignationdate < '1754-01-01 00:00:00.000' OR a.Resignationdate >= '2014-12-31')
and p.No_ is null
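A minimal sketch of the same CTE plus LEFT JOIN ... IS NULL pattern, run against SQLite through Python's sqlite3 module so it can be tried without a SQL Server instance. The enrollment table and its columns are hypothetical and much simpler than the real Line/Header/Customer schema; the point is only that the anti-join and NOT EXISTS count the same customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical single table standing in for the joined Line/Header/Customer rows.
    CREATE TABLE enrollment (customer_no TEXT, enrolled TEXT);
    INSERT INTO enrollment VALUES
        ('C1', '2014-02-01'),
        ('C2', '2013-05-01'),  -- C2 was already enrolled before 2014
        ('C2', '2014-03-01'),
        ('C3', '2014-07-01'),
        ('C3', '2014-08-01');
""")

# Anti-join: left join to the "previous" set and keep rows with no match.
anti_join_sql = """
    WITH previous AS (
        SELECT customer_no FROM enrollment WHERE enrolled < '2014-01-01'
    )
    SELECT COUNT(DISTINCT e.customer_no)
    FROM enrollment AS e
    LEFT JOIN previous AS p ON p.customer_no = e.customer_no
    WHERE e.enrolled >= '2014-01-01' AND e.enrolled < '2015-01-01'
      AND p.customer_no IS NULL
"""

# Same question asked with a correlated NOT EXISTS.
not_exists_sql = """
    SELECT COUNT(DISTINCT e.customer_no)
    FROM enrollment AS e
    WHERE e.enrolled >= '2014-01-01' AND e.enrolled < '2015-01-01'
      AND NOT EXISTS (SELECT 1 FROM enrollment AS x
                      WHERE x.customer_no = e.customer_no
                        AND x.enrolled < '2014-01-01')
"""

anti_join_count = conn.execute(anti_join_sql).fetchone()[0]
not_exists_count = conn.execute(not_exists_sql).fetchone()[0]
print(anti_join_count, not_exists_count)
```

Both forms count C1 and C3. Which one is faster on SQL Server is something only the actual execution plans can tell you.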

The issue is your IN statement. In my opinion it is preferable to avoid IN here; instead, join to the subquery and filter the data in the WHERE clause.
With IN, each record of your table may be compared against all the records of the subquery, which slows the query down.
If you must use IN, back it with an index. Creating proper indexes on the relevant columns will improve performance.
Instead of IN you can also use EXISTS, which may improve the performance of your query.
An example with EXISTS:
SELECT distinct ID
FROM Table AS T1
WHERE IteamNumber in (132,434,675) AND Year(DateCreated) = 2019
AND NOT EXISTS (
SELECT Distinct ID FROM Table AS T2
WHERE T1.ID=T2.ID
AND IteamNumber in (132,434,675) AND DateCreated < '2019-01-01' )
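Apart from speed, NOT IN and NOT EXISTS differ in how they treat NULLs, which is worth knowing before swapping one for the other. The sketch below uses SQLite through Python's sqlite3 module with two made-up tables. The three-valued logic shown is standard SQL behaviour, but SQLite is only a stand-in here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_rows (id INTEGER);
    CREATE TABLE excluded  (id INTEGER);
    INSERT INTO main_rows VALUES (1), (2), (3);
    INSERT INTO excluded  VALUES (2), (NULL);  -- note the NULL
""")

# "id NOT IN (2, NULL)" is never true: for id = 1 it evaluates to
# "1 <> 2 AND 1 <> NULL", which is unknown, so the row is filtered out.
not_in_rows = conn.execute("""
    SELECT id FROM main_rows
    WHERE id NOT IN (SELECT id FROM excluded)
    ORDER BY id
""").fetchall()

# NOT EXISTS only asks whether a matching row exists; the NULL row never matches.
not_exists_rows = conn.execute("""
    SELECT id FROM main_rows AS m
    WHERE NOT EXISTS (SELECT 1 FROM excluded AS e WHERE e.id = m.id)
    ORDER BY id
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```

If the ID column in the subquery can contain NULL, the NOT IN version silently returns nothing, while NOT EXISTS returns the rows you probably expected.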

Related

SQL Server Never Ending when Join Two Tables

I have one source table in the DB. I need to group and sum it to get one bridging table, extract the supplier info into another bridging table, and then join the two on part_number.
If I run the subqueries separately, T1 gives me 54699 records and T2 gives approximately 10 times as many rows as T1.
Next I do a left join. I expected it to return 54699 records, but the query never finishes, and it had returned 50 million records by the time I scrolled to the end. I have to stop the query manually. I realize there must be something wrong with my query, but I cannot figure it out. I would appreciate any ideas. Thank you!
SELECT
T1.*, T2.SUPPLIER
FROM
(SELECT
T.PART_NUMBER,T.YEAR, T.WEEK,
SUM(T.QTY_FILLED) TOTAL_FILLED,
SUM(T.QTY_ORDERED) TOTAL_ORDERED,
COUNT(T.LINE_NUMBER) ORDER_TIMES
FROM
DBO.TABLE1 T
WHERE
T.YEAR IS NOT NULL
GROUP BY
PART_NUMBER, T.YEAR, T.WEEK) T1
LEFT JOIN
(SELECT
T.PART_NUMBER, T.SUPPLIER
FROM
DBO.TABLE1 T) T2 ON T1.PART_NUMBER = T2.PART_NUMBER
ORDER BY
T1.PART_NUMBER, T1.YEAR, T1.WEEK
I also tried rewriting it with CTEs, but still no luck.
WITH T1 AS
(
SELECT
T.PART_NUMBER,T.YEAR, T.WEEK,
SUM(T.QTY_FILLED) TOTAL_FILLED,
SUM(T.QTY_ORDERED) TOTAL_ORDERED,
COUNT(T.LINE_NUMBER) ORDER_TIMES
FROM
DBO.TABLE1 T
WHERE
T.YEAR IS NOT NULL
GROUP BY
PART_NUMBER, T.YEAR, T.WEEK
), T2 AS
(
SELECT T.PART_NUMBER, T.SUPPLIER
FROM DBO.TABLE1 T
)
SELECT
T1.*, T2.SUPPLIER
FROM
T1
LEFT JOIN
T2 ON T1.PART_NUMBER = T2.PART_NUMBER
ORDER BY
T1.PART_NUMBER, T1.YEAR, T1.WEEK

First of all, the join does not return only 54699 rows. You join without DISTINCT, so every T1 row is matched with every TABLE1 row that has the same PART_NUMBER. Depending on the values in your table, the result could be as large as 50.000 x 5.000.000 rows.
If you use SQL Server 2017 or newer, try something like this:
SELECT
T.PART_NUMBER,T.YEAR, T.WEEK,
SUM(T.QTY_FILLED) TOTAL_FILLED,
SUM(T.QTY_ORDERED) TOTAL_ORDERED,
COUNT(T.LINE_NUMBER) ORDER_TIMES,
STRING_AGG (SUPPLIER, ', ') AS SUPPLIER
FROM
DBO.TABLE1 T
WHERE
T.YEAR IS NOT NULL
GROUP BY
PART_NUMBER, T.YEAR, T.WEEK
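The row multiplication is easy to reproduce on a toy table. The sketch below uses SQLite through Python's sqlite3 module with four made-up rows. GROUP_CONCAT is SQLite's rough counterpart of STRING_AGG and is used here only as a stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (part_number TEXT, year INTEGER, week INTEGER,
                         qty_filled INTEGER, supplier TEXT);
    INSERT INTO table1 VALUES
        ('P1', 2020, 1, 5, 'Acme'),
        ('P1', 2020, 1, 7, 'Acme'),
        ('P1', 2020, 2, 3, 'Bolt'),
        ('P2', 2020, 1, 4, 'Acme');
""")

grouped = """
    SELECT part_number, year, week, SUM(qty_filled) AS total_filled
    FROM table1
    GROUP BY part_number, year, week
"""

def count(sql):
    """Return the number of rows a query produces."""
    return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]

# One row per (part_number, year, week) group.
grouped_count = count(grouped)

# Joining back to the raw table on part_number alone repeats each group
# once per raw row of that part: 2 groups x 3 rows for P1, plus 1 for P2.
fanned_out_count = count(f"""
    SELECT t1.*, t2.supplier
    FROM ({grouped}) AS t1
    LEFT JOIN table1 AS t2 ON t1.part_number = t2.part_number
""")

# Aggregating the supplier inside the same GROUP BY keeps one row per group.
aggregated_count = count("""
    SELECT part_number, year, week, SUM(qty_filled) AS total_filled,
           GROUP_CONCAT(DISTINCT supplier) AS suppliers
    FROM table1
    GROUP BY part_number, year, week
""")

print(grouped_count, fanned_out_count, aggregated_count)
```

With only four source rows the join already returns 7 rows instead of 3; on a table with hundreds of thousands of rows per join key the same effect produces the millions of rows seen in the question.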

Find most recent date for ID with multiple records in MS Access

I have a field, "ID", and it has repeat values. (In the example; A21, B42, and C14). My two other fields in the table are "Date" and "Measurement". I want to create a query that will call the previous date WITH matching ID and display the results of that previous row. My end goal is to have a field in my query that will find the change between the current measurement for the ID and the measurement from the date prior.
I have attached an image of the table I have and what I want the query to display.

Sadly MS Access does not support lag(). This however can be emulated with a self-join and a not exists condition with a correlated subquery:
select
t.id,
tprev.date as previous_date,
tprev.measurement as previous_measurement
from Table1 as t
left join Table1 as tprev
on (tprev.id = t.id)
and (tprev.date < t.date)
and (not exists (
select 1
from Table1 as t1
where
t1.id = t.id
and t1.date < t.date
and t1.date > tprev.date
))

This is how to make the described query function:
SELECT t.NUM, t.ID, tprev.Date_ AS previous_date, tprev.Measurement AS previous_measurement
FROM Table1 AS t LEFT JOIN Table1 AS tprev ON (tprev.Date_ < t.Date_) AND (tprev.id = t.id)
WHERE not exists
(select 1
from Table1 AS t1
where
t1.ID = t.ID
and t1.Date_ < t.Date_
and t1.Date_ > tprev.Date_);
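To see the self-join produce the change between consecutive measurements (the stated end goal), here is a sketch that runs the same query shape in SQLite through Python's sqlite3 module. The sample values are invented and SQLite is only a stand-in for Access; the column names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID TEXT, Date_ TEXT, Measurement REAL);
    INSERT INTO Table1 VALUES
        ('A21', '2020-01-01', 10.0),
        ('A21', '2020-01-05', 12.5),
        ('A21', '2020-01-09', 11.0),
        ('B42', '2020-01-02', 7.0);
""")

# For each row, tprev is the latest earlier row with the same ID:
# the NOT EXISTS rejects any tprev that has another row between it and t.
sql = """
    SELECT t.ID, t.Date_, tprev.Date_ AS previous_date,
           t.Measurement - tprev.Measurement AS change
    FROM Table1 AS t
    LEFT JOIN Table1 AS tprev
           ON tprev.ID = t.ID AND tprev.Date_ < t.Date_
    WHERE NOT EXISTS (
        SELECT 1 FROM Table1 AS t1
        WHERE t1.ID = t.ID
          AND t1.Date_ < t.Date_
          AND t1.Date_ > tprev.Date_)
    ORDER BY t.ID, t.Date_
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

The first row of each ID has no predecessor, so its previous_date and change come back as NULL (None in Python) rather than being dropped.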

SQL Server 2008 - create columns from multi row data

I have the following code:
IF (OBJECT_ID('tempdb..#Data') IS NOT NULL)
BEGIN
DROP TABLE #Data
END
SELECT
t.Name, x.Time, x.Date, x.Total,
xo.DrvCommTotal, x.Name2, x.Street, x.Zip,
r.Route1
INTO
#Data
FROM
table1 xo WITH(NOLOCK)
LEFT JOIN
Table2 t WITH(NOLOCK) ON t.ID = x.ID
LEFT JOIN
Route1 r ON r.RouteID = x.RouteID
WHERE
x.Client = 1
AND x.Date = '9/13/2018'
GROUP BY
t.Name, x.Time, x.Date, x.Total, xo.DrvCommTotal, x.Name2,
x.Street, x.Zip, r.Route1
ORDER BY
Route1
SELECT DISTINCT
F.*, F2.NumOrders
FROM
#Data F
LEFT JOIN
(SELECT
Route1, COUNT(*) NumOrders
FROM
#Data
GROUP BY
Route1) F2 ON F2.Route1 = F.Route1
LEFT OUTER JOIN
(SELECT
Street + ',' + Zip Stops, Time, RouteN1
FROM
#Data
GROUP BY
RouteNo1, street, Zip) F3 ON F3.Route1 = F.Route1
WHERE
F.Route1 IS NOT NULL
ORDER BY
F.Route1
and it provides me with a list of routes and stops. The column NumOrders lets me know how many orders are on each route. I need the stops to become individual columns I will label Stop1, Stop2, etc. so that each route is only one row and all the information is contained on the row for one route.
I'm currently using the temp table because the data is so large. I can play with my SELECT statement without having to re-run the entire code.
How do I move the stops for each route into columns?

Hmm, I'm not quite sure I understand the question, but it sounds like you want to pivot the data so that the routes break into columns. If so, I would use a SQL PIVOT. Here is an example from the documentation:
USE AdventureWorks2014;
GO
SELECT VendorID, [250] AS Emp1, [251] AS Emp2, [256] AS Emp3, [257] AS Emp4, [260] AS Emp5
FROM
(SELECT PurchaseOrderID, EmployeeID, VendorID
FROM Purchasing.PurchaseOrderHeader) p
PIVOT
(
COUNT (PurchaseOrderID)
FOR EmployeeID IN
( [250], [251], [256], [257], [260] )
) AS pvt
ORDER BY pvt.VendorID;
Also, here is the link to how to use pivot: https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017
Since you already have all the data in your temp table, you could pivot that on the way out.
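As a rough illustration of turning stops into Stop1, Stop2, ... columns, the sketch below numbers the stops within each route and then spreads them out with conditional aggregation. It is not the PIVOT operator itself, it runs on SQLite through Python's sqlite3 module, and the stops table with its two columns is a hypothetical reduction of #Data. On SQL Server you would probably number the rows with ROW_NUMBER() instead of the correlated count used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical reduction of #Data: one row per stop on a route.
    CREATE TABLE stops (route TEXT, stop TEXT);
    INSERT INTO stops VALUES
        ('R1', '1 Main St,10001'),
        ('R1', '5 Oak Ave,10002'),
        ('R2', '9 Pine Rd,10003');
""")

sql = """
    WITH numbered AS (
        -- n = position of the stop within its route, ordered by stop text
        SELECT s.route, s.stop,
               (SELECT COUNT(*) FROM stops AS s2
                WHERE s2.route = s.route AND s2.stop <= s.stop) AS n
        FROM stops AS s
    )
    SELECT route,
           COUNT(*) AS NumOrders,
           MAX(CASE WHEN n = 1 THEN stop END) AS Stop1,
           MAX(CASE WHEN n = 2 THEN stop END) AS Stop2
    FROM numbered
    GROUP BY route
    ORDER BY route
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

The number of StopN columns is fixed in the query text, so this only works when you know the maximum number of stops per route in advance. The same limitation applies to PIVOT, which also needs its column list spelled out.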

optimize complex sql query

I am using an Azure SQL Server database. I have written a SQL query to generate a report. Here it is:
;WITH cte AS
(
SELECT ProjectID, CreatedDateUTC, ProductID, LicenseID, BackgroundID from Project p
WHERE CAST(p.CreatedDateUTC AS DATE) >= @StartDate and CAST(p.CreatedDateUTC AS DATE) <= @EndDate
and IsBackgroundUsed = 1
and s7ImageGenerated = 1 and p.SiteCode in ('b2c' )
)
SELECT ProjectID , CreatedDateUTC,
(SELECT BackgroundName from Background b WHERE b.BackgroundID = cte.BackgroundID) AS BackgroundName,
(SELECT Name FROM Product pr WHERE pr.ProductID = cte.ProductID) AS ProductName,
Case WHEN LicenseID is null THEN 'Standard' ELSE (SELECT LicenseName from License l WHERE l.LicenseID = cte.LicenseID) END AS CLA,
(SELECT PurchaseFG from Product_background pb WHERE pb.BackgroundID = cte.BackgroundID and pb.ProductId = cte.productID) AS PurchaseFG,
(SELECT FGcode from Product pr WHERE pr.ProductID = cte.ProductID) AS ProductFGCode,
--(Select dbo.[getProjectFGCodeByBackground](cte.ProductID, cte.BackgroundID)) AS FGCode,
'' AS ERPOrderNumber,
0 AS DesignQuanity
from cte
WHERE (SELECT count(*) from Approval.OrderDetail od WHERE od.ProjectID = cte.ProjectID) = 0
Is there any way to optimize this query? I am getting timeouts. I have put this query in a stored procedure and call that stored procedure through LINQ with Entity Framework.
Earlier I used joins, but that was even slower, so I tried subqueries. It worked for more than a year, but now it no longer does.

This will definitely improve the performance, especially if the table Approval.OrderDetail is large:
...WHERE not exists
(SELECT 1 from Approval.OrderDetail od WHERE od.ProjectID = cte.ProjectID)

Writing a sub-select for every single field is a terrible way to retrieve data, as you'll likely end up with a lot of Loop Joins which have terrible performance over large data sets.
Your original JOIN method is the way to go, but you need to ensure you have appropriate indexes on your joining columns.
You can also replace the WHERE clause, with a LEFT JOIN and IS NULL combination
LEFT JOIN Approval.OrderDetail od
ON od.ProjectID = p.ProjectID
...
AND od.ProjectID IS NULL;
or a NOT EXISTS (although that is more likely to have to SCAN a wider range of rows for each row returned by the main query).
WHERE NOT EXISTS
(SELECT 1 FROM Approval.OrderDetail od WHERE od.ProjectID = cte.ProjectID)
In either case, make sure your Project table is appropriately indexed on (IsBackgroundUsed, s7ImageGenerated, SiteCode, CreatedDateUTC) and that all joins are appropriately indexed.
I'd also question whether you actually need to cast your CreatedDateUTC fields to DATE types?
A possible simplification could be:
SELECT
p.ProjectID,
p.CreatedDateUTC,
b.BackgroundName,
pr.Name,
IIF(p.LicenseID IS NULL, 'Standard', l.LicenseName) AS CLA,
pb.PurchaseFG,
pr.FGCode AS ProductFGCode,
'' AS ERPOrderNumber,
0 AS DesignQuantity
FROM Project p
LEFT JOIN Approval.OrderDetail od
ON od.ProjectID = p.ProjectID
LEFT JOIN Background b
ON b.BackgroundID = p.BackgroundID
LEFT JOIN Product pr
ON pr.ProductID = p.ProductID
LEFT JOIN License l
ON l.LicenseID = p.LicenseID
LEFT JOIN Product_Background pb
ON pb.BackgroundID = p.BackgroundID
AND pb.ProductID = p.ProductID
WHERE p.CreatedDateUTC >= @StartDate AND p.CreatedDateUTC <= @EndDate
AND p.IsBackgroundUsed = 1
AND p.s7ImageGenerated = 1
AND p.SiteCode = 'b2c'
AND od.ProjectID IS NULL;
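The general reason for dropping the cast is that a function wrapped around an indexed column usually stops the engine from seeking on that index. The sketch below shows the effect with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. The project table and index are hypothetical, and SQLite's planner is not SQL Server's, so treat this as an illustration of the principle and check the real execution plan on your own server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT, created TEXT);
    CREATE INDEX ix_project_created ON project (created);
""")

def plan(sql):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text

# Function applied to the column: the index on created cannot be used for a seek.
wrapped = plan("""
    SELECT name FROM project
    WHERE date(created) >= '2018-02-01' AND date(created) <= '2018-02-28'
""")

# Bare column compared against a half-open range: the index can be used.
bare = plan("""
    SELECT name FROM project
    WHERE created >= '2018-02-01' AND created < '2018-03-01'
""")

print("wrapped:", wrapped)
print("bare:   ", bare)
```

The first plan is a scan of the table and the second a search using ix_project_created; the exact wording of the plan text varies between SQLite versions.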

WHERE CAST(p.CreatedDateUTC AS DATE) >= @StartDate and CAST(p.CreatedDateUTC AS DATE) <= @EndDate
Make this predicate SARGable and create a nonclustered index on CreatedDateUTC.
Suppose these are the parameters:
declare @StartDate datetime='2018-02-01'
declare @EndDate datetime='2018-02-28'
Then move @EndDate to the last second of that day:
set @EndDate=dateadd(second,-1,dateadd(day,1,@EndDate))
Now you can safely do this:
WHERE p.CreatedDateUTC >= @StartDate and p.CreatedDateUTC <= @EndDate
I think @Mark Sinkinson's query will work better than the subquery version. (I would also try the NOT EXISTS clause once.)
Use INNER JOIN if possible.
I hope you are using a stored procedure and calling that.
Create indexes on all join columns.
Since your subqueries return the correct output without TOP 1, it appears that all the tables have a one-to-one relation with Project.
CREATE NONCLUSTERED INDEX IX_Project ON project (
CreatedDateUTC
,IsBackgroundUsed
,s7ImageGenerated
,SiteCode
) include (ProductID,LicenseID,BackgroundID);
I hope ProjectID is already the clustered index.
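One caveat with the "last second of the day" adjustment: a datetime column can hold fractions of a second, so a row stamped in the final second of the day can slip past an inclusive upper bound. The sketch below reproduces the arithmetic with Python's datetime module. The late_row value is invented, and whether such rows exist depends on how CreatedDateUTC is populated. A half-open range (less than the start of the next day) avoids the question.

```python
from datetime import datetime, timedelta

start_date = datetime(2018, 2, 1)
end_date = datetime(2018, 2, 28)

# Mirrors dateadd(second, -1, dateadd(day, 1, @EndDate)): 2018-02-28 23:59:59
inclusive_end = end_date + timedelta(days=1) - timedelta(seconds=1)

# Half-open alternative: everything strictly before 2018-03-01 00:00:00
exclusive_end = end_date + timedelta(days=1)

# A hypothetical row created half a second before midnight on the last day.
late_row = datetime(2018, 2, 28, 23, 59, 59, 500000)

caught_by_inclusive = start_date <= late_row <= inclusive_end
caught_by_half_open = start_date <= late_row < exclusive_end

print(inclusive_end, caught_by_inclusive)
print(exclusive_end, caught_by_half_open)
```

The inclusive bound misses the late row while the half-open bound catches it. In the WHERE clause this would correspond to p.CreatedDateUTC < dateadd(day,1,@EndDate) with @EndDate left unadjusted.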

This might not be much faster, but it is easier for me to read.
You should be able to adjust @StartDate and @EndDate so that you do not have to cast to date.
Have an index on all join and WHERE conditions.
If those are FKs you should be able to use an inner join (and should).
SELECT P.ProjectID , P.CreatedDateUTC,
b.BackgroundName,
pr.Name AS ProductName,
isnull(l.LicenseName, 'Standard') as CLA,
pb.PurchaseFG,
pr.FGcode AS ProductFGCode,
'' AS ERPOrderNumber,
0 AS DesignQuanity
from Project p
left join Background b
on b.BackgroundID = p.BackgroundID
left join Product pr
on pr.ProductID = p.ProductID
left join License l
on l.LicenseID = p.LicenseID
left join Product_background pb
on pb.BackgroundID = p.BackgroundID
and pb.ProductId = p.productID
WHERE CAST(p.CreatedDateUTC AS DATE) >= @StartDate
and CAST(p.CreatedDateUTC AS DATE) <= @EndDate
and p.IsBackgroundUsed = 1
and p.s7ImageGenerated = 1
and p.SiteCode = 'b2c'
and not exists (SELECT 1
from Approval.OrderDetail od
WHERE od.ProjectID = p.ProjectID)

List data in a table that is not in a view

Table1 has columns
Id int, Date smalldatetime.
View1 has, among many other columns, column Id int.
View1 has a maximum of 2000 rows, but there are some rather complex computations to determine the values of all the columns.
What is the most efficient way to return all Table1.Id that are not in View1.Id for Table1.Date between '2012-05-30' and '2012-05-31' ?
The filtered selection from Table1 typically returns about 200 unique Table1.Id.
When I do a SELECT * FROM View1, the total data is always returned in under one second. When I do a SELECT Id from Table1 WHERE Date BETWEEN '2012-05-30' AND '2012-05-31', the result is always instantaneous.
The moment I tried SELECT Table1.Id from Table1 T1 WHERE Date BETWEEN .. AND .. AND NOT EXISTS (SELECT Id from View1 WHERE ViewId=T1.Id), it takes ages (almost 20s).
I tried using a CTE also, WITH V1 as (SELECT Id from View1) SELECT T1.Id FROM Table1 T1 WHERE Date BETWEEN ... and ... AND NOT EXISTS (SELECT Id from V1 WHERE V1.Id=T1.Id), and it also took ages.
Thanks.

Try something like this:
SELECT t.Id, t.[Date]
FROM dbo.Table1 AS t
LEFT HASH JOIN dbo.View1 AS v ON v.Id = t.Id
WHERE t.[Date] >= '20120530' AND t.[Date] < '20120531'
AND v.Id IS NULL
The HASH hint forces the SQL Server query optimizer to evaluate the view only once.
Another way would be to use a table variable to store the result of the view:
DECLARE @ViewResult TABLE (Id int PRIMARY KEY)
INSERT INTO @ViewResult
SELECT Id FROM dbo.View1
SELECT Id, [Date]
FROM dbo.Table1
WHERE [Date] >= '20120530' AND [Date] < '20120531'
AND Id NOT IN (SELECT Id FROM @ViewResult)
Razvan

With little else to go on, I would say that if the view must be referenced then:
SELECT t.Id, t.[Date]
FROM dbo.Table1 AS t
WHERE t.[Date] >= '20120530'
AND t.[Date] < '20120531'
AND NOT EXISTS
(
SELECT 1 FROM dbo.View1 AS v
WHERE v.Id = t.Id
);
But I suspect you can do something more efficient here if you can bypass the view.
