Suggestions for improving slow performance of subquery

I've tried to illustrate the problem in the (made-up) example below. Essentially, I want to filter records in the primary table based on content in a secondary table. When I attempted this using subqueries, our application performance took a big hit (some queries nearly 10x slower).
In this example I want to return all case notes for a customer EXCEPT for the ones that have references to products 1111 and 2222 in the detail table:
select cn.id, cn.summary from case_notes cn
where customer_id = 2
and exists (
select 1 from case_note_details cnd
where cnd.case_note_id = cn.id
and cnd.product_id not in (1111,2222)
)
I tried using a join as well:
select distinct cn.id, cn.summary from case_notes cn
join case_note_details cnd
on cnd.case_note_id = cn.id
and cnd.product_id not in (1111,2222)
where customer_id = 2
In both cases the execution plan shows two clustered index scans. Any suggestions for other methods or tweaks to improve performance?
Schema:
CREATE TABLE case_notes
(
id int primary key,
employee_id int,
customer_id int,
order_id int,
summary varchar(50)
);
CREATE TABLE case_note_details
(
id int primary key,
case_note_id int,
product_id int,
detail varchar(1024)
);
Sample data:
INSERT INTO case_notes
(id, employee_id, customer_id, order_id, summary)
VALUES
(1, 1, 2, 1000, 'complaint1'),
(2, 1, 2, 1001, 'complaint2'),
(3, 1, 2, 1002, 'complaint3'),
(4, 1, 2, 1003, 'complaint4');
INSERT INTO case_note_details
(id, case_note_id, product_id, detail)
VALUES
(1, 1, 1111, 'Note 1, order 1000, complaint about product 1111'),
(2, 1, 2222, 'Note 1, order 1000, complaint about product 2222'),
(3, 2, 1111, 'Note 2, order 1001, complaint about product 1111'),
(4, 2, 2222, 'Note 2, order 1001, complaint about product 2222'),
(5, 3, 3333, 'Note 3, order 1002, complaint about product 3333'),
(6, 3, 4444, 'Note 3, order 1002, complaint about product 4444'),
(7, 4, 5555, 'Note 4, order 1003, complaint about product 5555'),
(8, 4, 6666, 'Note 4, order 1003, complaint about product 6666');

You have a clustered index scan because you are not accessing your case_note_details table by its id but via non-indexed columns.
I suggest adding an index on (case_note_id, product_id) to the case_note_details table.
If you are always accessing the case_note_details via the case_note_id, you might also restructure your primary key to be case_note_id, detail_id. There is no need for an independent id as primary key for dependent records. This would let you re-use your detail primary key index for joins with the header table.
Edit: add an index on customer_id as well to the case_notes table, as Manuel Rocha suggested.
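The index suggestions above could be sketched roughly as follows. The index names are hypothetical, and the INCLUDE (summary) clause is my own assumption so that the sample query can be answered from the index alone; drop it if the extra storage is not worth it.

```sql
-- Hypothetical index names; adjust to your own naming convention.

-- Supports the correlated lookup on case_note_id and the product_id filter.
CREATE NONCLUSTERED INDEX IX_case_note_details_case_note_id_product_id
    ON case_note_details (case_note_id, product_id);

-- Supports the customer_id filter on the header table.
-- INCLUDE (summary) is an assumption, not part of the original suggestion.
CREATE NONCLUSTERED INDEX IX_case_notes_customer_id
    ON case_notes (customer_id)
    INCLUDE (summary);
```

After creating the indexes, compare the execution plans again to check whether the clustered index scans have turned into seeks.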

When using "exists" I always limit results with "TOP", as below:
select cn.id
,cn.summary
from case_notes as cn
where customer_id = 2
and exists (
select TOP 1 1
from case_note_details as cnd
where cnd.case_note_id = cn.id
and cnd.product_id not in (1111,2222)
)

On table case_notes, create an index on customer_id, and on table case_note_details create an index on case_note_id and product_id.
Then try executing both queries. They should perform better now.
Also try this query:
select
cn.id,
cn.summary
from
case_notes cn
where
cn.customer_id = 2 and
cn.id in
(
select
distinct cnd.case_note_id
from
case_note_details cnd
where
cnd.product_id not in (1111,2222)
)

Did you try "in" instead of "exists"? This sometimes performs differently:
select cn.id, cn.summary from case_notes cn
where customer_id = 2
and cn.id in (
select cnd.case_note_id from case_note_details cnd
where cnd.product_id not in (1111,2222)
)
Of course, check indexes.

Related

Select all Main table rows with detail table column constraints with GROUP BY

I have two tables, tblMain and tblDetail, on SQL Server that are linked with tblMain.id=tblDetail.OrderID and used for orders. I have not found exactly the same situation on Stack Overflow.
Here below is the sample table design:
/* create and populate tblMain: */
CREATE TABLE tblMain (
ID int IDENTITY(1,1) NOT NULL,
DateOrder datetime NULL,
CONSTRAINT PK_tblMain PRIMARY KEY
(
ID ASC
)
)
GO
INSERT INTO tblMain (DateOrder) VALUES('2021-05-20T12:12:10');
INSERT INTO tblMain (DateOrder) VALUES('2021-05-21T09:13:13');
INSERT INTO tblMain (DateOrder) VALUES('2021-05-22T21:30:28');
GO
/* create and populate tblDetail: */
CREATE TABLE tblDetail (
ID int IDENTITY(1,1) NOT NULL,
OrderID int NULL,
Gencod VARCHAR(255),
Quantity float,
Price float,
CONSTRAINT PK_tblDetail PRIMARY KEY
(
ID ASC
)
)
GO
INSERT INTO tblDetail (OrderID, Gencod, Quantity, Price) VALUES(1, '1234567890123', 8, 12.30);
INSERT INTO tblDetail (OrderID, Gencod, Quantity, Price) VALUES(1, '5825867890321', 2, 2.88);
INSERT INTO tblDetail (OrderID, Gencod, Quantity, Price) VALUES(3, '7788997890333', 1, 1.77);
INSERT INTO tblDetail (OrderID, Gencod, Quantity, Price) VALUES(3, '9882254656215', 3, 5.66);
INSERT INTO tblDetail (OrderID, Gencod, Quantity, Price) VALUES(3, '9665464654654', 4, 10.64);
GO
Here is my SELECT with grouping:
SELECT tblMain.id,SUM(tblDetail.Quantity*tblDetail.Price) AS TotalPrice
FROM tblMain LEFT JOIN tblDetail ON tblMain.id=tblDetail.orderid
WHERE (tblDetail.Quantity<>0) GROUP BY tblMain.id;
GO
This gives:
The wished output:
We see that id=2 is not shown even with LEFT JOIN, as there are no records with OrderID=2 in tblDetail.
How can I design a new query that shows tblMain.id = 2? Meanwhile, I must keep the WHERE (tblDetail.Quantity<>0) constraint. Many thanks.
EDIT:
The above query serves as CTE (Common Table Expression) for a main query that takes into account payments table tblPayments again.
After testing, both solutions work.
In my case, the main table has 15K records, while the detail table has several million. With (tblDetail.Quantity<>0 OR tblDetail.Quantity IS NULL) AND tblDetail.IsActive=1 added to the JOIN ... ON clause, the query takes 37s to run, while with @pwilcox's first solution, where the condition is added to the WHERE clause, it finishes in 29s. That is a gain of about 20%.
The tblDetail.IsActive column lets me temporarily ignore detail rows by setting it to false.
So for me it's @pwilcox's answer:
where (tblDetail.quantity <> 0 or tblDetail.quantity is null)

Change
WHERE (tblDetail.Quantity<>0)
to
where (tblDetail.quantity <> 0 or tblDetail.quantity is null)
as the former will omit id = 2 because the corresponding quantity would be null in a left join.
And as HABO mentions, you can also make the condition part of your join logic instead of your where clause, avoiding the need for the 'or' condition.
select m.id,
totalPrice = sum(d.quantity * d.price)
from tblMain m
left join tblDetail d
on m.id = d.orderid
and d.quantity <> 0
group by m.id;

The maximum recursion 100 has been exhausted before statement completion (SQL Server)

In SQL Server, I have this simplified table and I'm trying to get a list of all employees with their domain manager:
IF OBJECT_ID('tempdb.dbo.#employees') IS NOT NULL DROP TABLE #employees
CREATE TABLE #employees (
empid int,
empname varchar(50),
mgrid int,
func varchar(50)
)
INSERT INTO #employees VALUES(1, 'Jeff', 2, 'Designer')
INSERT INTO #employees VALUES(2, 'Luke', 4, 'Head of designers')
INSERT INTO #employees VALUES(3, 'Vera', 2, 'Designer')
INSERT INTO #employees VALUES(4, 'Peter', 5, 'Domain Manager')
INSERT INTO #employees VALUES(5, 'Olivia', NULL, 'CEO')
;
WITH Emp_CTE AS (
SELECT empid, empname, func, mgrid AS dommgr
FROM #employees
UNION ALL
SELECT e.empid, e.empname, e.func, e.mgrid AS dommgr
FROM #employees e
INNER JOIN Emp_CTE ecte ON ecte.empid = e.mgrid
WHERE ecte.func <> 'Domain Manager'
)
SELECT * FROM Emp_CTE
So the output I want is:
empid empname func dommgr
1 Jeff Designer 4
2 Luke Head of designers 4
3 Vera Designer 4
Instead I get this error:
Msg 530, Level 16, State 1, Line 17
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
What am I doing wrong? Is it actually possible with CTE?
Edit: There was indeed an error in the data, the error has gone now, but the result isn't what I want:
empid empname func dommgr
1 Jeff Designer 2
2 Luke Head of designers 4
3 Vera Designer 2
4 Peter Domain Manager 5
5 Olivia CEO NULL
4 Peter Domain Manager 5
1 Jeff Designer 2
3 Vera Designer 2

You had two employees referencing each other in mgrid, so each was the manager of the other. That caused the infinite recursion. There was also a gap in the recursion tree because the domain manager was not referenced anywhere. You have fixed the sample data by changing Luke's mgrid to 4. Now there is no gap and no logical issue anymore.
But you also had no root entry for the recursion: the first query has no filter.
You can use this query:
WITH DomainManager AS (
SELECT empid, empname, func, dommgr = empid, Hyrarchy = 1
FROM #employees
WHERE func = 'Domain Manager'
UNION ALL
SELECT e.empid, e.empname, e.func, dommgr, Hyrarchy = Hyrarchy +1
FROM #employees e
INNER JOIN DomainManager dm ON dm.empid = e.mgrid
)
SELECT * FROM DomainManager
WHERE func <> 'Domain Manager'
ORDER BY empid
Note that the entry/root point for the CTE is the Domain Manager, because you want to find every employee's domain manager id. This id is carried down the hierarchy. The final select needs to filter out the Domain Manager because you only want his id for every employee; you don't want to include him in the result set.
The result of the query is:
empid empname func dommgr Hyrarchy
1 Jeff Designer 4 3
2 Luke Head of designers 4 2
3 Vera Designer 4 3

The error message is raised because the data contains a circular reference between Luke and Vera.
It's easier to perform hierarchical queries if you add a hierarchyid field. SQL Server provides functions that return descendants, ancestors and the level in a hierarchy. hierarchyid fields can be indexed resulting in improved performance.
In the employee example, you can add a level field :
declare @employees table (
empid int PRIMARY KEY,
empname varchar(50),
mgrid int,
func varchar(50),
level hierarchyid not null,
INDEX IX_Level (level)
)
INSERT INTO @employees VALUES
(1, 'Jeff', 2, 'Designer' ,'/5/4/2/1/'),
(2, 'Luke', 4, 'Head of designers','/5/4/2/'),
(3, 'Vera', 2, 'Designer' ,'/5/4/2/3/'),
(4, 'Peter', 5, 'Domain Manager' ,'/5/4/'),
(5, 'Olivia', NULL, 'CEO' ,'/5/')
;
/5/4/2/1/ is the string representation of a hierarchyid value. It's essentially the path in the hierarchy that leads to a particular row.
To find all subordinates of domain managers, excluding the managers themselves, you can write:
with DMs as
(
select EmpID,level
from @employees
where func='Domain Manager'
)
select
PCs.empid,
PCs.empname as Name,
PCs.func as Class,
DMs.empid as DM,
PCs.level.GetLevel() as THAC0,
PCs.level.GetLevel()- DMs.level.GetLevel() as NextLevel
from
@employees PCs
inner join DMs on PCs.level.IsDescendantOf(DMs.level)=1
where DMs.EmpID<>PCs.empid;
The CTE is only used for convenience.
The result is:
empid Name Class DM THAC0 NextLevel
1 Jeff Designer 4 4 2
2 Luke Head of designers 4 3 1
3 Vera Designer 4 4 2
The CTE returns all DMs and their hierarchyid value. The IsDescendantOf() check tests whether a row is a descendant of a DM or not. GetLevel() returns the level of the row in the hierarchy. By subtracting the DM's level from the employee's, we get the distance between them.

Like others said, you have here a problem with data (Vera).
IF OBJECT_ID('tempdb.dbo.#employees') IS NOT NULL
DROP TABLE #employees
CREATE TABLE #employees (
empid int,
empname varchar(50),
mgrid int,
func varchar(50)
)
INSERT INTO #employees VALUES(1, 'Jeff', 2, 'Designer')
INSERT INTO #employees VALUES(2, 'Luke', 3, 'Head of designers')
INSERT INTO #employees VALUES(3, 'Vera', 4, 'Designer') --**mgrid = 4 instead 2**
INSERT INTO #employees VALUES(4, 'Peter', 5, 'Domain Manager')
INSERT INTO #employees VALUES(5, 'Olivia', NULL, 'CEO')
;WITH Emp_CTE AS
(
SELECT empid, empname, func, mgrid AS dommgr, 0 AS Done
FROM #employees
UNION ALL
SELECT ecte.empid, ecte.empname, ecte.func,
CASE WHEN e.func = 'Domain Manager' THEN e.empid ELSE e.mgrid END AS dommgr,
CASE WHEN e.func = 'Domain Manager' THEN 1 ELSE 0 END AS Done
FROM Emp_CTE AS ecte
INNER JOIN #employees AS e ON
ecte.dommgr = e.empid
WHERE ecte.Done = 0--emp.func <> 'Domain Manager'
)
SELECT *
FROM Emp_CTE
WHERE Done = 1

SQL- Change column value based on other column values in same group & different row

I have a table in SQL Server:
Unique ITEM_ID are part of a group (GROUP_NUMBER). IS_ACTIVE and IS_LAST are either 1 - true , or 0 - false.
What I want to do:
I want to go through this table and, for every active ITEM_ID (IS_ACTIVE = 1) that is the ONLY active ITEM_ID in its group (GROUP_NUMBER), set that row's IS_LAST to 1.
So for example, in the table above, the row for ITEM_ID = 6, I want IS_LAST to be 1
I am not sure how to do this as I am not that versed in SQL. I am trying to use a partition by command to maybe split each group up but doing the check to see if an ITEM_ID is the only active in its group seems challenging.
Any help or guidance here is appreciated.
It should be noted that I do not want to do an update or change the actual table in any way, just design a query that can do the changing and spit out an altered version of that table.

Here is a solution:
DECLARE @t TABLE(ITEM_ID INT, GROUP_NUMBER INT, IS_ACTIVE BIT, IS_LAST bit)
INSERT INTO @t VALUES
(1, 1, 1, 0),
(2, 1, 1, 0),
(3, 2, 0, 0),
(4, 2, 0, 0),
(5, 2, 0, 0),
(6, 3, 1, 0),
(7, 3, 0, 0)
SELECT t1.ITEM_ID,
t1.GROUP_NUMBER,
t1.IS_ACTIVE,
CASE WHEN t1.IS_ACTIVE = 1 AND
NOT EXISTS(SELECT * FROM @t t2
WHERE t2.IS_ACTIVE = 1 AND t1.GROUP_NUMBER = t2.GROUP_NUMBER AND
t1.ITEM_ID <> t2.ITEM_ID)
THEN 1 ELSE 0 END AS IS_LAST
FROM @t t1

The following query returns the ITEM_ID's where only one is active in the group:
select ITEM_ID from MyTable M
where IS_ACTIVE = 1 and
not exists (select null
from MyTable N where n.IS_ACTIVE = 1 and
M.GROUP_NUMBER = N.GROUP_NUMBER and M.ITEM_ID <> N.ITEM_ID)
You can then left join this query with MyTable. Something like:
select *
from MyTable A left join (<query above>) B on A.ITEM_ID = B.ITEM_ID
If B.ITEM_ID is not Null then IS_LAST = 1.
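Putting the two pieces above together, a complete version might look like the sketch below. It assumes the table is called MyTable, as in this answer, and uses a CASE expression (my choice, not stated in the answer) to turn the null check into the IS_LAST value.

```sql
-- Sketch only: MyTable is the placeholder table name used in this answer.
select A.ITEM_ID,
       A.GROUP_NUMBER,
       A.IS_ACTIVE,
       -- A match in B means the row is the only active item in its group.
       case when B.ITEM_ID is not null then 1 else 0 end as IS_LAST
from MyTable A
left join (
    -- Active items with no other active item in the same group.
    select M.ITEM_ID
    from MyTable M
    where M.IS_ACTIVE = 1
      and not exists (select null
                      from MyTable N
                      where N.IS_ACTIVE = 1
                        and N.GROUP_NUMBER = M.GROUP_NUMBER
                        and N.ITEM_ID <> M.ITEM_ID)
) B on A.ITEM_ID = B.ITEM_ID;
```

This only reads from the table, so it matches the requirement of not updating the stored data.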

I think you're on the right track with Partition By. The best way I can think of to do this is with some code like:
SELECT Item_ID, Group_Number, Is_Active,
COUNT(*) OVER (PARTITION BY Group_Number, Is_Active ORDER BY Group_Number) [Members_In_Set],
CASE WHEN Is_Active = 1 AND COUNT(*) OVER (PARTITION BY Group_Number, Is_Active ORDER BY Group_Number) = 1 THEN 1 ELSE 0 END [Is_Last]
FROM My_Table
The Members_In_Set column is to demonstrate what the count returns when partitioning and then the CASE shows how to use this value along with Is_Active to get the result you're after

Recursive CTE possible for this?

I have product data structured in the following format:
ProductID OptionID Lvl OptionDescription SubOptionID SubOptionDescription
HPH 6 1 Model 10 Studio
HPH 6 1 Model 11 DJ
HPH 7 2 Device 12 Bluetooth
HPH 7 2 Device 13 Cable
HPH 7 2 Device 14 Remote
There could be any number of levels to the product. I need to traverse the hierarchy and produce the following output - a description for each product option:
Studio-Bluetooth
Studio-Cable
Studio-Remote
DJ-Bluetooth
DJ-Cable
DJ-Remote
I've looked at CTEs, but the examples tend to use adjacency lists (employeeID; managerID, etc.), which don't seem appropriate here.
How can I achieve this output?
Thanks.
CREATE TABLE [dbo].[Products](
[ProductID] [varchar](50) NULL,
[OptionID] [int] NULL,
[Lvl] [int] NULL,
[OptionDescription] [varchar](50) NULL,
[SubOptionID] [int] NULL,
[SubOptionDescription] [varchar](50) NULL
) ON [PRIMARY]
insert into Products (ProductID, OptionID, Lvl, OptionDescription, SubOptionID, SubOptionDescription) values ('HPH', 6, 1, 'Model', 10, 'Studio')
insert into Products (ProductID, OptionID, Lvl, OptionDescription, SubOptionID, SubOptionDescription) values ('HPH', 6, 1, 'Model', 11, 'DJ')
insert into Products (ProductID, OptionID, Lvl, OptionDescription, SubOptionID, SubOptionDescription) values ('HPH', 7, 2, 'Device', 12, 'Bluetooth')
insert into Products (ProductID, OptionID, Lvl, OptionDescription, SubOptionID, SubOptionDescription) values ('HPH', 7, 2, 'Device', 13, 'Cable')
insert into Products (ProductID, OptionID, Lvl, OptionDescription, SubOptionID, SubOptionDescription) values ('HPH', 7, 2, 'Device', 14, 'Remote')

with cte as (
-- Root level
select p.Lvl, cast(p.SubOptionDescription as varchar(max)) as [ProductOption]
from #Products p where p.Lvl = 1
union all
-- Recursive part - cartesian here?
select p.Lvl, c.ProductOption + '-' + p.SubOptionDescription
from #Products p
inner join cte c on c.Lvl = p.Lvl - 1
)
select c.ProductOption from cte c;
A couple of notes.
Right now your sample answer implies that you need to create a cartesian product. I hope this is not the case, because the number of rows will increase explosively. If there are other join conditions which are not apparent from your sample, you can introduce them in the recursive part of the CTE.
You would probably also want to return only leaf rows. There are several ways to do it - there may be some attribute in your actual data, or a combination of rank() and top (1) with ties will do the trick, although it won't be particularly efficient.
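One simple way to keep only the leaf rows, sketched under assumptions: the data lives in the dbo.Products table from the question (rather than #Products), the deepest Lvl of each product marks its leaves, and levels are joined per ProductID. None of this is confirmed by the question, so treat it as an illustration only.

```sql
-- Sketch only: assumes dbo.Products from the question and that the
-- highest Lvl for a product identifies its leaf rows.
with cte as (
    -- Anchor member: every level-1 sub-option starts a path.
    select p.ProductID,
           p.Lvl,
           cast(p.SubOptionDescription as varchar(max)) as ProductOption
    from dbo.Products p
    where p.Lvl = 1
    union all
    -- Recursive member: append each sub-option of the next level.
    select p.ProductID,
           p.Lvl,
           c.ProductOption + '-' + p.SubOptionDescription
    from dbo.Products p
    inner join cte c
        on c.ProductID = p.ProductID
       and c.Lvl = p.Lvl - 1
)
select c.ProductOption
from cte c
-- Keep only the paths that reached the deepest level of their product.
where c.Lvl = (select max(p.Lvl)
               from dbo.Products p
               where p.ProductID = c.ProductID);
```

With the sample rows from the question, this filters out the intermediate Studio and DJ rows and leaves the six combined descriptions.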

Multiple Insert for each of the Account IDs?

How do I effectively insert multiple rows without using loop for all of the Account-ID values?
INSERT INTO Table1
(AccountID, ShowColumns, GroupColumns, AvgColumnsFlag)
VALUES
(1, 'foo1', 'foo2', 'foo3'),
(1, 'abc1', 'abc2', 'abc3'),
(1, 'xyz1', 'xyz1', 'xyz1')
In this case, I have over 20,000 account ids. I can use another table with unique account IDs and do some kind of join to get them, then use that in place of the Account-ID of "1" shown in the example.
I don't know how you handle multiple inserts for each Account-ID.
Thanks...
[Edit]
I found a way to insert using data from other table recently but unfortunately I can only insert 1 row, not multiple rows. :-( See code below... Is it possible to consolidate 3 of them into 1 instead?
INSERT INTO tblDealerSavedDataMyInventorySavedBuilds
(AccountId, LoadDefault, BuildName, ColumnShowAndSortOrderValues, ColumnGroupByValues, ColumnSortAverageValues)
SELECT DISTINCT tblaAccounts.AccountID, 0, 'My Inventory by Count', 'ImportStatus|StockNumber|Vin|Year|Make ASC|Model ASC|Trim|Mileage|PurchasePrice|StockDate|RepairCost|TotalCost|DaysInInventory|InventoryTrackerLocation|Category', 'Make|Model', 'MyInventoryCount-SortOrderByCount'
FROM tblaAccounts
ORDER BY tblaAccounts.AccountID ASC
INSERT INTO tblDealerSavedDataMyInventorySavedBuilds
(AccountId, LoadDefault, BuildName, ColumnShowAndSortOrderValues, ColumnGroupByValues, ColumnSortAverageValues)
SELECT DISTINCT tblaAccounts.AccountID, 0, 'My Inventory by Make', 'ImportStatus|StockNumber|Vin|Year|Make ASC|Model ASC|Trim|Mileage|PurchasePrice|StockDate|RepairCost|TotalCost|DaysInInventory|InventoryTrackerLocation|Category', 'Make|Model', 'MyInventoryCount-SortOrderByMake'
FROM tblaAccounts
ORDER BY tblaAccounts.AccountID ASC
INSERT INTO tblDealerSavedDataMyInventorySavedBuilds
(AccountId, LoadDefault, BuildName, ColumnShowAndSortOrderValues, ColumnGroupByValues, ColumnSortAverageValues)
SELECT DISTINCT tblaAccounts.AccountID, 0, 'My Inventory by Purchase Price', 'ImportStatus|StockNumber|Vin|Year|Make ASC|Model ASC|Trim|Mileage|PurchasePrice|StockDate|RepairCost|TotalCost|DaysInInventory|InventoryTrackerLocation|Category', 'Make|Model', 'MyInventoryCount-SortOrderByCost'
FROM tblaAccounts
ORDER BY tblaAccounts.AccountID ASC

First insert into #SourceTable all your values.
Then use this statement:
INSERT INTO Table1
SELECT *
FROM #SourceTable
It may look the same, but it's different, since you are actually addressing the table once instead of 20,000 times.
You can also do it this way:
INSERT INTO Table1
SELECT 1, 'foo1', 'foo2', 'foo3'
UNION ALL
SELECT 2, 'abc11', 'abc2', 'abc3'
UNION ALL
...

To insert multiple rows with hard-coded values, use
insert into table (col1, col2, col3, col4)
select 1, 'foo1', 'foo2', 'foo3'
union all
select 2, 'abc11', 'abc2', 'abc3'
etc.
To insert from existing data
insert into table (col1, col2, col3)
select srccol1, srccol22, srccol33
from TableOrView
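For the three separate statements in the question's edit, one possible consolidation is to cross join the accounts with a small list of the values that differ between builds. This is a sketch, not a tested replacement: the aliases a, b and SortValue are hypothetical, and the VALUES derived table assumes SQL Server 2008 or later.

```sql
-- Sketch only: one insert producing three rows per account.
INSERT INTO tblDealerSavedDataMyInventorySavedBuilds
    (AccountId, LoadDefault, BuildName, ColumnShowAndSortOrderValues, ColumnGroupByValues, ColumnSortAverageValues)
SELECT a.AccountID,
       0,
       b.BuildName,
       'ImportStatus|StockNumber|Vin|Year|Make ASC|Model ASC|Trim|Mileage|PurchasePrice|StockDate|RepairCost|TotalCost|DaysInInventory|InventoryTrackerLocation|Category',
       'Make|Model',
       b.SortValue
FROM (SELECT DISTINCT AccountID FROM tblaAccounts) AS a
-- Each account is paired with every build definition below.
CROSS JOIN (VALUES
    ('My Inventory by Count',          'MyInventoryCount-SortOrderByCount'),
    ('My Inventory by Make',           'MyInventoryCount-SortOrderByMake'),
    ('My Inventory by Purchase Price', 'MyInventoryCount-SortOrderByCost')
) AS b (BuildName, SortValue);
```

The ORDER BY from the original statements is left out because it does not change which rows are inserted.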
