TSQL optimizing code for NOT IN - sql-server

I inherited an old SQL script that I want to optimize, but after several attempts I must admit that every rewrite only produces a huge query with repetitive blocks. Can someone propose better code for the following pattern (see code below)? I don't want to use temporary tables (WITH). For simplicity, I only show 3 levels (TMP_C, TMP_D and TMP_E), but the original SQL has 8 levels.
WITH
TMP_A AS (
SELECT
ID,
Field_X
FROM A),
TMP_B AS(
SELECT DISTINCT
B.ID,
Field_Y,
CASE
WHEN Field_Z IN ('TEST_1','TEST_2') THEN 'CATEG_1'
WHEN Field_Z IN ('TEST_3','TEST_4') THEN 'CATEG_2'
WHEN Field_Z IN ('TEST_5','TEST_6') THEN 'CATEG_3'
ELSE 'CATEG_4'
END AS CATEG
FROM B
INNER JOIN TMP_A
ON TMP_A.ID=B.ID),
TMP_C AS (
SELECT DISTINCT
ID,
CATEG
FROM TMP_B
WHERE CATEG='CATEG_1'),
TMP_D AS (
SELECT DISTINCT
ID,
CATEG
FROM TMP_B
WHERE CATEG='CATEG_2' AND ID NOT IN (SELECT ID FROM TMP_C)),
TMP_E AS (
SELECT DISTINCT
ID,
CATEG
FROM TMP_B
WHERE CATEG='CATEG_3'
AND ID NOT IN (SELECT ID FROM TMP_C)
AND ID NOT IN (SELECT ID FROM TMP_D))
SELECT * FROM TMP_C
UNION
SELECT * FROM TMP_D
UNION
SELECT * FROM TMP_E
Many thanks in advance for your help.

First off, SELECT DISTINCT already removes duplicates from the result set, so you are overworking the condition. Adding the "WITH" definitions and nesting their use only makes the query harder to follow. The data ultimately all comes from the "B" table, restricted to rows that also have a key match in "A", so let's start with just that. And since nothing from (B)Field_Y or (A)Field_X appears in your result set, leave them out to avoid confusion.
SELECT DISTINCT
B.ID,
CASE WHEN B.Field_Z IN ('TEST_1','TEST_2') THEN 'CATEG_1'
WHEN B.Field_Z IN ('TEST_3','TEST_4') THEN 'CATEG_2'
WHEN B.Field_Z IN ('TEST_5','TEST_6') THEN 'CATEG_3'
ELSE 'CATEG_4'
END AS CATEG
FROM
B JOIN A ON B.ID = A.ID
WHERE
B.Field_Z IN ( 'TEST_1', 'TEST_2', 'TEST_3', 'TEST_4', 'TEST_5', 'TEST_6' )
The WHERE clause keeps only the Field_Z values that map to one of the categories you want, and the CASE expression still labels each row with its category.
Now, if you actually needed other values from your "Field_Y" or "Field_X", then that would call for a different query. However, your Tmp_C, Tmp_D and Tmp_E only ask for the ID and CATEG columns anyhow.
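The original CTE chain keeps each ID only in the first category it qualifies for (CATEG_1 before CATEG_2 before CATEG_3), whereas the query above returns one row per ID and category. If that priority matters, one possible way to keep it without the NOT IN chain is to map each category to a level number and take the minimum per ID. This is only a sketch: the table variables @A and @B and their sample rows are made up for illustration and stand in for the real A and B tables, and the multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical sample data standing in for tables A and B
DECLARE @A TABLE (ID int);
DECLARE @B TABLE (ID int, Field_Z varchar(10));
INSERT INTO @A (ID) VALUES (1), (2), (3);
INSERT INTO @B (ID, Field_Z) VALUES
    (1, 'TEST_1'), (1, 'TEST_3'),  -- qualifies for CATEG_1 and CATEG_2
    (2, 'TEST_4'),                 -- qualifies for CATEG_2 only
    (3, 'TEST_9');                 -- qualifies for none of the three

-- The lowest level number wins, which mirrors the NOT IN chain
SELECT
    B.ID,
    'CATEG_' + CAST(MIN(CASE
        WHEN B.Field_Z IN ('TEST_1','TEST_2') THEN 1
        WHEN B.Field_Z IN ('TEST_3','TEST_4') THEN 2
        WHEN B.Field_Z IN ('TEST_5','TEST_6') THEN 3
    END) AS varchar(10)) AS CATEG
FROM @B AS B
JOIN @A AS A ON B.ID = A.ID
WHERE B.Field_Z IN ('TEST_1','TEST_2','TEST_3','TEST_4','TEST_5','TEST_6')
GROUP BY B.ID;
-- With the sample rows: ID 1 -> CATEG_1, ID 2 -> CATEG_2
```

With this shape, adding a level means adding one WHEN branch rather than another block with one more NOT IN.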

Splitting the work into one SELECT per category may perform better:
SELECT DISTINCT B.ID, 'CATEG_1'
FROM
B JOIN A ON B.ID = A.ID
WHERE
B.Field_Z IN ( 'TEST_1', 'TEST_2')
UNION
SELECT DISTINCT B.ID, 'CATEG_2'
FROM
B JOIN A ON B.ID = A.ID
WHERE
B.Field_Z IN ( 'TEST_3', 'TEST_4')
...

Related

Incorrect Sum On Table Join

I am writing a query but I'm getting the wrong result. The tables are as follows:
Tbl1(ProId, price,VId)
Tbl2(ProId, price, VId)
I have written this query:
SELECT
a.ProId, b.ProId,
SUM(a.price) - SUM(b.price) AS TotalPro
FROM
tbl1 AS a
INNER JOIN
tbl2 AS b ON a.ProId = b.ProId
WHERE
a.VId = '1234'
GROUP BY
a.ProId, b.ProId;
This query returns an incorrect answer. When I sum the price from each table separately and subtract one total from the other, the answer is fine. But when I join, I don't know why I get the wrong answer. ProId is the same in both tables, and the values are the same.
I guess you want something like below:
SELECT ProId, SUM(price) AS TotalPro
FROM (
SELECT a.ProId, a.price
FROM Tbl1 a
WHERE a.VId='1234'
UNION ALL
SELECT b.ProId, -b.price -- negated so that the outer SUM subtracts Tbl2 prices
FROM Tbl2 b
--WHERE b.VId ='1234' (?)
) sub
GROUP BY ProId;
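The issue with the JOIN is that when a ProId has more than one row in either table, every row from one table is paired with every matching row from the other, so some prices are summed multiple times.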
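To make the double counting concrete, here is a small sketch. The table variables and their rows are invented for illustration (they are not the asker's data), and the multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical sample data: two rows per ProId in each table
DECLARE @Tbl1 TABLE (ProId int, price int, VId varchar(10));
DECLARE @Tbl2 TABLE (ProId int, price int, VId varchar(10));
INSERT INTO @Tbl1 VALUES (1, 10, '1234'), (1, 20, '1234');
INSERT INTO @Tbl2 VALUES (1, 5, '1234'), (1, 5, '1234');

-- Join first: 2 x 2 = 4 joined rows, so SUM(a.price) = 60 and SUM(b.price) = 20
SELECT a.ProId, SUM(a.price) - SUM(b.price) AS TotalPro  -- 40
FROM @Tbl1 AS a
INNER JOIN @Tbl2 AS b ON a.ProId = b.ProId
WHERE a.VId = '1234'
GROUP BY a.ProId;

-- Aggregate each table first, then join the one-row-per-ProId totals
SELECT a.ProId, a.total - b.total AS TotalPro            -- 30 - 10 = 20
FROM (SELECT ProId, SUM(price) AS total
      FROM @Tbl1 WHERE VId = '1234' GROUP BY ProId) AS a
INNER JOIN (SELECT ProId, SUM(price) AS total
            FROM @Tbl2 GROUP BY ProId) AS b
    ON a.ProId = b.ProId;
```

Aggregating before joining is an alternative to the UNION ALL form. Note that, unlike UNION ALL, the inner join drops any ProId that is missing from one of the tables.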

SQL Server : trying to insert a count of zero when a record doesn't exist

I am trying to modify the results of a query to populate a zero when a certain status doesn't exist.
In my base result I have something that looks like this:
But when a certain example doesn't appear in my table, I need a way to have a row show up with a zero for reporting needs, something like this:
I was trying to use a CTE to supply those values and left join to it, but it doesn't seem to be working the way I want.
WITH DummyValues AS
(
SELECT 'Yellow' AS Val
UNION ALL
SELECT 'Red'
UNION ALL
SELECT 'Gray'
)
SELECT D.Val, V.PlntCd, COUNT(UpgradeMeasure)
FROM reporting.vw_SOTAgingView V
LEFT OUTER JOIN DummyValues D ON D.Val = V.UpgradeMeasure
GROUP BY D.Val, V.PlntCd
Is there something simple I am missing?
You can use a LEFT OUTER JOIN like this to always include the statuses (I switched the order of the tables since that is usually easier to read for most people):
SELECT
D.Val,
V.PlntCd,
COUNT(V.UpgradeMeasure) AS [Count] -- COUNT already returns 0 when nothing matches
FROM (SELECT 'Yellow' AS Val UNION ALL SELECT 'Red' UNION ALL SELECT 'Gray') D
LEFT OUTER JOIN reporting.vw_SOTAgingView V
ON D.Val = V.UpgradeMeasure
GROUP BY D.Val, V.PlntCd
Just note that this won't exactly get your desired set. The "PlntCd" will be NULL if no match is found. If you want to ensure you cover all your plants, you need to start with a complete listing of plants and CROSS JOIN that source to statuses first. This might look like:
SELECT
D.Val, -- From cross-join
P.PlntCd, -- From source
COUNT(V.UpgradeMeasure) AS [Count] -- COUNT already returns 0 when nothing matches
FROM (SELECT DISTINCT PlntCd FROM reporting.vw_SOTAgingView) P
CROSS JOIN (SELECT 'Yellow' AS Val UNION ALL SELECT 'Red' UNION ALL SELECT 'Gray') D
LEFT OUTER JOIN reporting.vw_SOTAgingView V
ON D.Val = V.UpgradeMeasure
AND P.PlntCd = V.PlntCd -- Also join to source to prevent dupes
GROUP BY D.Val, P.PlntCd -- Use source plant code
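A self-contained sketch of the same CROSS JOIN pattern may make the zero rows easier to see. The table variable @V and its rows are hypothetical stand-ins for reporting.vw_SOTAgingView, and the VALUES table constructor assumes SQL Server 2008 or later.

```sql
-- Hypothetical rows standing in for reporting.vw_SOTAgingView
DECLARE @V TABLE (PlntCd varchar(10), UpgradeMeasure varchar(10));
INSERT INTO @V VALUES
    ('P1', 'Yellow'), ('P1', 'Yellow'), ('P1', 'Red'),
    ('P2', 'Gray');

SELECT D.Val, P.PlntCd, COUNT(V.UpgradeMeasure) AS [Count]
FROM (SELECT DISTINCT PlntCd FROM @V) AS P               -- every plant
CROSS JOIN (VALUES ('Yellow'), ('Red'), ('Gray')) AS D(Val)  -- every status
LEFT OUTER JOIN @V AS V
    ON V.UpgradeMeasure = D.Val
   AND V.PlntCd = P.PlntCd
GROUP BY D.Val, P.PlntCd
ORDER BY P.PlntCd, D.Val;
-- P1: Gray 0, Red 1, Yellow 2
-- P2: Gray 1, Red 0, Yellow 0
```

Counting a column from the left-joined side is what produces the zeros: unmatched combinations carry a NULL there, and COUNT skips NULLs.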
You have the join backwards: the table with all the values goes on the left, and you LEFT JOIN the subset to it.
(Or keep your table order and use a RIGHT OUTER JOIN, though right joins are rarely used.)
SELECT
*
FROM
TableWithAllData AS a -- ALL and SOME are reserved words, so use other aliases
LEFT JOIN TableWithSomeData AS s ON s.Id = a.Id

How to use Join on linked servers

I am trying to get results for a customer from two linked servers remotely. I need to sum the points for each cust_id, but I am having problems with my query:
SELECT sum(cust_point) as total
FROM [192.168.23.9].[POSDBV4].[dbo].[loyal_summery_branch] where cust_id='0100015388'
INNER JOIN [192.168.13.4].[POSDBV4].[dbo].[loyal_summery_branch]
ON cust_id.[192.168.23.9].[POSDBV4].[dbo].[loyal_summery_branch]=cust_id.[192.168.13.4].[POSDBV4].[dbo].[loyal_summery_branch];
I think you have your query syntax a little scrambled there. Try this.
SELECT sum(A.cust_point + B.cust_point) as total -- both tables have these columns, so qualify them
FROM [192.168.23.9].[POSDBV4].[dbo].[loyal_summery_branch] A
INNER JOIN [192.168.13.4].[POSDBV4].[dbo].[loyal_summery_branch] B ON A.cust_id=B.cust_id
WHERE A.cust_id='0100015388'
Since you want the sum of cust_point from both tables, try the query below:
Select( (SELECT sum(cust_point)
FROM [192.168.23.9].[POSDBV4].[dbo].[loyal_summery_branch] where cust_id='0100015388') +
(SELECT sum(cust_point)
FROM [192.168.13.4].[POSDBV4].[dbo].[loyal_summery_branch] where cust_id='0100015388') ) as total
You can also use a UNION ALL here if you like. This also lets you select other fields if you include a GROUP BY:
SELECT SUM(cust_point) AS total
FROM (
SELECT cust_point
FROM [192.168.23.9].[POSDBV4].[dbo].[loyal_summery_branch]
WHERE cust_id = '0100015388'
UNION ALL
SELECT cust_point
FROM [192.168.13.4].[POSDBV4].[dbo].[loyal_summery_branch]
WHERE cust_id = '0100015388'
) t
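One practical difference between adding two scalar subqueries and the UNION ALL form shows up when one server has no rows for the customer. The sketch below uses hypothetical table variables in place of the two linked-server tables, with invented point values.

```sql
-- Hypothetical stand-ins for the two linked-server tables
DECLARE @Branch1 TABLE (cust_id varchar(10), cust_point int);
DECLARE @Branch2 TABLE (cust_id varchar(10), cust_point int);
INSERT INTO @Branch1 VALUES ('0100015388', 40), ('0100015388', 10);
-- @Branch2 deliberately has no rows for this customer

-- Scalar subqueries: SUM over no rows is NULL, and 50 + NULL is NULL
SELECT (SELECT SUM(cust_point) FROM @Branch1 WHERE cust_id = '0100015388')
     + (SELECT SUM(cust_point) FROM @Branch2 WHERE cust_id = '0100015388') AS total;

-- UNION ALL: the empty side simply contributes no rows, so the total is 50
SELECT SUM(cust_point) AS total
FROM (
    SELECT cust_point FROM @Branch1 WHERE cust_id = '0100015388'
    UNION ALL
    SELECT cust_point FROM @Branch2 WHERE cust_id = '0100015388'
) AS t;
```

If the scalar-subquery form is preferred, wrapping each subquery in ISNULL(..., 0) should avoid the NULL total.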

Join the table valued function in the query

I have one table, vwuser. I want to join this table with the table-valued function fnuserrank(userID), so I need to CROSS APPLY the function:
SELECT *
FROM vwuser AS a
CROSS APPLY fnuserrank(a.userid)
For each userID it generates multiple records. I only want the last record for each empid that does not have a Rank of Term(inated). How can I do this?
Data:
HistoryID empid Rank MonitorDate
1 A1 E1 2012-8-9
2 A1 E2 2012-9-12
3 A1 Term 2012-10-13
4 A2 E3 2011-10-09
5 A2 TERM 2012-11-9
From this data, the 2nd and 4th records must be selected.
In SQL Server 2005+ you can use this Common Table Expression (CTE) to determine the latest record by MonitorDate that doesn't have a Rank of 'Term':
WITH EmployeeData AS
(
SELECT *
, ROW_NUMBER() OVER (PARTITION BY empId ORDER BY MonitorDate DESC) AS RowNumber
FROM vwuser AS a
CROSS APPLY fnuserrank(a.userid)
WHERE Rank != 'Term'
)
SELECT *
FROM EmployeeData AS ed
WHERE ed.RowNumber = 1;
Note: The statement before this CTE will need to end in a semi-colon. Because of this, I have seen many people write them like ;WITH EmployeeData AS...
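As a rough check of the ROW_NUMBER approach against the sample data from the question, the sketch below loads those five rows into a table variable. @History is a hypothetical stand-in for the output of vwuser CROSS APPLY fnuserrank, and the date type assumes SQL Server 2008 or later.

```sql
-- Sample rows copied from the question; @History is a hypothetical stand-in
-- for the rows produced by vwuser CROSS APPLY fnuserrank
DECLARE @History TABLE (HistoryID int, empid varchar(10), [Rank] varchar(10), MonitorDate date);
INSERT INTO @History VALUES
    (1, 'A1', 'E1',   '2012-08-09'),
    (2, 'A1', 'E2',   '2012-09-12'),
    (3, 'A1', 'Term', '2012-10-13'),
    (4, 'A2', 'E3',   '2011-10-09'),
    (5, 'A2', 'TERM', '2012-11-09');

WITH EmployeeData AS
(
    SELECT h.*,
           ROW_NUMBER() OVER (PARTITION BY h.empid
                              ORDER BY h.MonitorDate DESC) AS RowNumber
    FROM @History AS h
    WHERE UPPER(h.[Rank]) <> 'TERM'  -- UPPER guards against a case-sensitive collation
)
SELECT HistoryID, empid, [Rank], MonitorDate
FROM EmployeeData
WHERE RowNumber = 1;
-- Returns HistoryID 2 (A1, E2) and HistoryID 4 (A2, E3)
```

The sample data mixes 'Term' and 'TERM', so the comparison is normalized with UPPER here; on a case-insensitive collation the plain comparison would behave the same.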
You'll have to play with this; I'm having trouble mocking your schema on sqlfiddle.
Select foo.*
from
(
SELECT *
FROM vwuser AS a
CROSS APPLY fnuserrank(a.userid)
where rank != 'TERM'
) foo
left join
(
SELECT *
FROM vwuser AS b
CROSS APPLY fnuserrank(b.userid)
where rank != 'TERM'
) bar
on foo.empId = bar.empId
and foo.MonitorDate < bar.MonitorDate -- bar holds any later row for the same employee
where bar.empid is null -- no later row exists, so foo is the latest
I always need to test the direction of the date comparison in these left outer joins. The way it works is that you left join the set to itself on a higher monitor date. Every row EXCEPT one per user has row(s) with a higher monitor date, and that one row is the one you want. I usually use an example from my code, but I'm on the wrong laptop. To get it working, you can select foo.*, bar.* and look at the results, spot the row you want, and correct the condition.
You could also do this, which is easier to remember
select foo.*
from
(
SELECT *
FROM vwuser AS a
CROSS APPLY fnuserrank(a.userid)
) foo
join
(
select empid, max(monitordate) maxdate
FROM vwuser AS b
CROSS APPLY fnuserrank(b.userid)
where rank != 'TERM'
group by empid -- one max date per employee
) bar
on foo.empid = bar.empid
and foo.monitordate = bar.maxdate
I usually prefer to use set based logic over aggregate functions, but whatever works. You can tweak it also by caching the results of your TVF join into a table variable.
EDIT:
http://www.sqlfiddle.com/#!3/613e4/17 - I mocked up your TVF here. Apparently sqlfiddle didn't like "go".
select foo.*, bar.*
from
(
SELECT f.*
FROM vwuser AS a
join fnuserrank f
on a.empid = f.empid
where rank != 'TERM'
) foo
left join
(
SELECT f1.empid [barempid], f1.monitordate [barmonitordate]
FROM vwuser AS b
join fnuserrank f1
on b.empid = f1.empid
where rank != 'TERM'
) bar
on foo.empId = bar.barempid
and foo.MonitorDate < bar.barmonitordate
where bar.barempid is null

Join subquery with min

I'm pulling my hair out over a subquery that I'm using to avoid about 100 duplicates (out of about 40k records). The records that are duplicated are showing up because they have 2 dates in h2.datecreated for a valid reason, so I can't just scrub the data.
I'm trying to get only the earliest date to return. The first subquery (that starts with "select distinct address_id", with the MIN) works fine on its own...no duplicates are returned. So it would seem that the left join (or just plain join...I've tried that too) couldn't possibly see the second h2.datecreated, since it doesn't even show up in the subquery. But when I run the whole query, it's returning 2 values for some ipc.mfgid's, one with the h2.datecreated that I want, and the other one that I don't want.
I know it's got to be something really simple, or something that just isn't possible. It really seems like it should work! This is MSSQL. Thanks!
select distinct ipc.mfgid as IPC, h2.datecreated,
case when ad.Address is null
then ad.buildingname end as Address, cast(trace.name as varchar)
+ '-' + cast(trace.Number as varchar) as ONT,
c.ACCOUNT_Id,
case when h.datecreated is not null then h.datecreated
else h2.datecreated end as Install
from equipmentjoin as ipc
left join historyjoin as h on ipc.id = h.EQUIPMENT_Id
and h.type like 'add'
left join circuitjoin as c on ipc.ADDRESS_Id = c.ADDRESS_Id
and c.GRADE_Code like '%hpna%'
join (select distinct address_id, equipment_id,
min(datecreated) as datecreated, comment
from history where comment like 'MAC: 5%' group by equipment_id, address_id, comment)
as h2 on c.address_id = h2.address_id
left join (select car.id, infport.name, carport.number, car.PCIRCUITGROUP_Id
from circuit as car (NOLOCK)
join port as carport (NOLOCK) on car.id = carport.CIRCUIT_Id
and carport.name like 'lead%'
and car.GRADE_Id = 29
join circuit as inf (NOLOCK) on car.CCIRCUITGROUP_Id = inf.PCIRCUITGROUP_Id
join port as infport (NOLOCK) on inf.id = infport.CIRCUIT_Id
and infport.name like '%olt%' )
as trace on c.ccircuitgroup_id = trace.pcircuitgroup_id
join addressjoin as ad (NOLOCK) on ipc.address_id = ad.id
The typical approach to only getting the lowest row is one of the following. You didn't bother to specify what version of SQL Server you're using, what you want to do with ties, and I have little interest to try to work this into your complex query, so I'll show you an abstract simplification for different versions.
SQL Server 2000
SELECT x.grouping_column, x.min_column, x.other_columns ...
FROM dbo.foo AS x
INNER JOIN
(
SELECT grouping_column, min_column = MIN(min_column)
FROM dbo.foo GROUP BY grouping_column
) AS y
ON x.grouping_column = y.grouping_column
AND x.min_column = y.min_column;
SQL Server 2005+
;WITH x AS
(
SELECT grouping_column, min_column, other_columns,
rn = ROW_NUMBER() OVER (PARTITION BY grouping_column ORDER BY min_column)
FROM dbo.foo
)
SELECT grouping_column, min_column, other_columns
FROM x
WHERE rn = 1;
This subquery:
select distinct address_id, equipment_id,
min(datecreated) as datecreated, comment
from history where comment like 'MAC: 5%' group by equipment_id, address_id, comment
It will probably return multiple rows, because the comment is part of the GROUP BY and is not guaranteed to be the same.
Try this instead:
CROSS APPLY (
SELECT TOP 1 H2.DateCreated, H2.Comment -- H2.Equipment_id wasn't used
FROM History H2
WHERE
H2.Comment LIKE 'MAC: 5%'
AND C.Address_ID = H2.Address_ID
ORDER BY DateCreated
) H2
Switch that to OUTER APPLY in case you want rows that don't have a matching desired history entry.
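The difference between the two APPLY forms can be seen in a small sketch. The table variables @Circuit and @History and their rows are invented for illustration and only loosely mirror the tables in the question.

```sql
-- Hypothetical stand-ins for the circuit and history tables
DECLARE @Circuit TABLE (Address_ID int);
DECLARE @History TABLE (Address_ID int, DateCreated datetime, Comment varchar(50));
INSERT INTO @Circuit VALUES (1), (2);
INSERT INTO @History VALUES
    (1, '2012-01-05', 'MAC: 5A'),
    (1, '2012-03-01', 'MAC: 5B');  -- second date for the same address
-- Address 2 has no history rows

SELECT C.Address_ID, H2.DateCreated, H2.Comment
FROM @Circuit AS C
OUTER APPLY (
    SELECT TOP 1 H.DateCreated, H.Comment
    FROM @History AS H
    WHERE H.Comment LIKE 'MAC: 5%'
      AND H.Address_ID = C.Address_ID
    ORDER BY H.DateCreated          -- earliest entry first
) AS H2;
-- Address 1 -> one row, dated 2012-01-05, comment 'MAC: 5A'
-- Address 2 -> one row with NULL DateCreated and Comment
-- With CROSS APPLY instead, address 2 would not be returned at all
```

Because TOP 1 runs once per outer row, each address gets at most one history row regardless of how many comments it has.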
