SQL WHERE clause performance degradation - sql-server

I have a query that I am working on and it is displaying performance issues that I would not have expected. Here is the query so far.
INSERT INTO #Bridge (PolicyNumber, ProducerCode, BridgeDate, EffectiveDate, FirstName, LastName, LicenseNumber, BirthDate, Address, City, State, ZipCode)
SELECT tab.col.value('@PolicyNumber', 'VARCHAR(10)') AS PolicyNumber,
tab.col.value('@ProducerCode','VARCHAR(10)') as ProducerCode,
tab.col.value('@BridgeDate','DATETIME') AS BridgeDate,
tab.col.value('@EffectiveDate', 'DATETIME') as EffectiveDate,
tab.col.value('@FirstName', 'VARCHAR(200)') as FirstName,
tab.col.value('@LastName', 'VARCHAR(200)') as LastName,
CASE
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%0000%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%1111%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%2222%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%3333%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%4444%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%5555%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%6666%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%7777%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%8888%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%9999%' THEN NULL
ELSE tab.col.value('@LicenseNumber','VARCHAR(50)')
END as LicenseNumber,
tab.col.value('@BirthDate','DATETIME') as BirthDate,
REPLACE(tab.col.value('@Address1','VARCHAR(300)'), ' APT ',' #') as Address1,
tab.col.value('@City','VARCHAR(300)') as City,
tab.col.value('@State','VARCHAR(5)') as State,
tab.col.value('@ZipCode','VARCHAR(10)') as Zip
FROM @xml.nodes('//rows/datarow') as tab(col)
SELECT B.PolicyNumber,
B.ProducerCode,
B.BridgeDate,
B.EffectiveDate,
H.current_policy,
H.cancel_date,
H.first_eff_date,
H.display_address,
H.city,
H.state,
H.zip
FROM #Bridge B
LEFT JOIN (
SELECT P.policy_id,
P.current_policy,
CASE
WHEN A.pobox <> '' THEN 'PO BOX ' + REPLACE(A.pobox,'PO BOX ','')
ELSE RTRIM(A.house_num + ' ' + A.street_name + ' ' + CASE
WHEN A.apt_num = '' THEN ''
ELSE '#' + A.apt_num
END)
END as display_address,
A.pobox,
A.house_num,
A.street_name,
A.apt_num,
A.city,
MAX(A.policyimage_num) as policimage_num, --this is just to limit the results to the most recent
S.state,
A.zip,
P.first_eff_date,
P.cancel_date
FROM Diamond.dbo.Policy P WITH (NOLOCK)
LEFT JOIN Diamond.dbo.Address A WITH (NOLOCK)
ON P.policy_id = A.policy_id
AND A.nameaddresssource_id = 3
LEFT JOIN Diamond.dbo.State S WITH (NOLOCK)
ON A.state_id = S.state_id
WHERE A.state_id IS NOT NULL
AND P.current_policy NOT IN (SELECT PolicyNumber FROM #Bridge)
GROUP BY P.policy_id,
P.current_policy,
P.cancel_date,
P.first_eff_date,
A.pobox,
A.house_num,
A.street_name,
A.apt_num,
A.city,
S.state,
A.zip) AS H
ON B.Address = H.display_address
AND B.State = H.state
AND B.City = H.city
AND SUBSTRING(B.ZipCode,1,5) = SUBSTRING(H.Zip,1,5)
AND B.PolicyNumber != H.current_policy
WHERE H.current_policy IS NOT NULL
This query, run by itself, finishes in about 1:30 seconds. But if I add the following to the WHERE clause
AND B.EffectiveDate != H.first_eff_date
Suddenly the query takes far longer to return results. (We are at over 15 minutes and still going as I write this.) I would think that simply adding a clause to weed out a few additional rows wouldn't have such a drastic effect, but apparently it does. I know how to get around it; I am just curious whether anyone has any ideas as to why it has this effect.

Without hands-on access I can only guess at this, but here are some places where I think you can tidy up and probably shave off some run time.
1, You duplicate the effort required to make sure policy numbers don't match. I would suggest trying each check on its own to see which is faster. Bear in mind that the NOT IN is the stricter of the two (it excludes every policy found in #Bridge, not just the current row's), so keeping only the != can return more rows.
i.e. with this in place:
AND P.current_policy NOT IN (SELECT PolicyNumber FROM #Bridge)
you don't also need this:
AND B.PolicyNumber != H.current_policy
2, It's worth trying to remove all that grouping from your subquery: you don't actually use policimage_num for anything, so why do the grouping? If you are worried that many rows are returned from Address, you can use DISTINCT on your column set instead, which may be faster.
3, Is A.state_id a nullable value? If not consider trying an INNER JOIN to Address and removing the null check.
4, In all honesty I'm not seeing an obvious reason for that subquery at all, it seems to be over-complicating matters. Can you not simply join the tables together without it (again using DISTINCT if required)?
In other words get tweaking, I bet you can get it below the original run time if you try a few of these ideas.
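To see how the two policy-number filters from point 1 relate, here is a minimal sketch. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, and the bridge and policy tables and their contents are made up for illustration. It suggests that the NOT IN filter returns a subset of what the != filter returns, so the second check is redundant only while the first one is present.

```python
import sqlite3

# Hypothetical miniature of #Bridge and Diamond.dbo.Policy (names and data invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bridge (policy_number TEXT, address TEXT);
    CREATE TABLE policy (current_policy TEXT, address TEXT);
    INSERT INTO bridge VALUES ('P1', '1 Main St'), ('P2', '1 Main St');
    INSERT INTO policy VALUES ('P1', '1 Main St'), ('P2', '1 Main St'), ('P9', '1 Main St');
""")

# Filter 1: drop every policy that appears anywhere in bridge.
not_in_rows = conn.execute("""
    SELECT b.policy_number, p.current_policy
    FROM bridge AS b
    JOIN policy AS p ON p.address = b.address
    WHERE p.current_policy NOT IN (SELECT policy_number FROM bridge)
    ORDER BY 1, 2
""").fetchall()

# Filter 2: drop only the row's own policy number.
not_equal_rows = conn.execute("""
    SELECT b.policy_number, p.current_policy
    FROM bridge AS b
    JOIN policy AS p ON p.address = b.address
    WHERE p.current_policy <> b.policy_number
    ORDER BY 1, 2
""").fetchall()

print(not_in_rows)
print(not_equal_rows)
```

With this data the second filter still pairs P1 with P2 (two bridged policies at the same address), which the first filter removes. Which behaviour is wanted is a business decision that the query alone cannot answer.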

Related

CASE Statement causing execute time to sky rocket

There are two distinct databases where I work. In creating a report (using T-SQL) to share between departments, I was asked to add a field showing whether the information from the primary database (kept on the college's database) has also been entered into a second database (which a specific department uses for communicating with a federal program). Without checking the second database with the CASE statement, the query for the rest of the information takes less than a second. With the CASE statement (for which CTEs were created to perform the check), it ran for 15 minutes without finishing before I manually ended the execution. Here is the code (CASE statement currently commented out):
With POWERFAIDS_CHECK as
(
Select distinct NAME_MASTER.ID_NUM,
(CAST (NAME_MASTER.ID_NUM as VARCHAR) + CAST (EX_SCHOLARSHIP_RECIPIENTS.AID_ELEMENT as VARCHAR) ) as CHECK_ID
From NAME_MASTER
JOIN EX_SCHOLARSHIP_RECIPIENTS on NAME_MASTER.ID_NUM = EX_SCHOLARSHIP_RECIPIENTS.ID_NUM
JOIN SCHOLARSHIP on EX_SCHOLARSHIP_RECIPIENTS.AID_ELEMENT = SCHOLARSHIP.AID_ELEMENT
JOIN PF_FUND_CDE_MSTR on EX_SCHOLARSHIP_RECIPIENTS.AID_ELEMENT = PF_FUND_CDE_MSTR.RPT_CATEGORY
JOIN PowerFAIDS_Production.dbo.student on NAME_MASTER.ID_NUM = PowerFAIDS_Production.dbo.student.alternate_id
JOIN PowerFAIDS_Production.dbo.funds on PF_FUND_CDE_MSTR.FUND_CDE = PowerFAIDS_Production.dbo.funds.fund_ledger_number
JOIN PowerFAIDS_Production.dbo.stu_award_year on PowerFAIDS_Production.dbo.student.student_token = PowerFAIDS_Production.dbo.stu_award_year.student_token
JOIN PowerFAIDS_Production.dbo.stu_award on PowerFAIDS_Production.dbo.stu_award_year.stu_award_year_token = PowerFAIDS_Production.dbo.stu_award.stu_award_year_token
JOIN YEAR_TERM_TABLE on (YEAR_TERM_TABLE.YR_CDE = EX_SCHOLARSHIP_RECIPIENTS.YR_CDE) and (YEAR_TERM_TABLE.TRM_CDE = EX_SCHOLARSHIP_RECIPIENTS.TRM_CDE)
Where EX_SCHOLARSHIP_RECIPIENTS.YR_CDE = '2021'
and EX_SCHOLARSHIP_RECIPIENTS.TRM_CDE = 'FA'
and YEAR_TERM_TABLE.TRM_BEGIN_DTE = PowerFAIDS_Production.dbo.stu_award.award_period_begin_dt
and EX_SCHOLARSHIP_RECIPIENTS.AWARD_AMT = PowerFAIDS_Production.dbo.stu_award.actual_amt
and stu_award.status = 'A'
),
AWARDED_SCHOLARSHIPS as
(Select distinct NAME_MASTER.ID_NUM, (NAME_MASTER.FIRST_NAME + ' ' + NAME_MASTER.LAST_NAME) as STUDENT_NAME,
SCHOLARSHIP.DESCRIPTION,
Format (EX_SCHOLARSHIP_RECIPIENTS.AWARD_AMT, 'C','en-us') as AWARD_AMT,
EX_SCHOLARSHIP_RECIPIENTS.COMMENTS,
YR_DESC, TRM_DESC, EX_SCHOLARSHIP_RECIPIENTS.AID_ELEMENT,
(CAST (NAME_MASTER.ID_NUM as VARCHAR) + CAST (EX_SCHOLARSHIP_RECIPIENTS.AID_ELEMENT as VARCHAR) ) as CHECK_ID,
NAME_MASTER.LAST_NAME, NAME_MASTER.FIRST_NAME
From NAME_MASTER
JOIN EX_SCHOLARSHIP_RECIPIENTS on NAME_MASTER.ID_NUM = EX_SCHOLARSHIP_RECIPIENTS.ID_NUM
JOIN SCHOLARSHIP on EX_SCHOLARSHIP_RECIPIENTS.AID_ELEMENT = SCHOLARSHIP.AID_ELEMENT
JOIN YEAR_DEF on EX_SCHOLARSHIP_RECIPIENTS.YR_CDE = YEAR_DEF.YR_CDE
JOIN TERM_DEF on EX_SCHOLARSHIP_RECIPIENTS.TRM_CDE = TERM_DEF.TRM_CDE
Where EX_SCHOLARSHIP_RECIPIENTS.TRM_CDE = 'FA'
and EX_SCHOLARSHIP_RECIPIENTS.YR_CDE = '2021'
and EX_SCHOLARSHIP_RECIPIENTS.AID_ELEMENT not between '5000' and '5999'
)
Select distinct AWARDED_SCHOLARSHIPS.ID_NUM, STUDENT_NAME,
AWARDED_SCHOLARSHIPS.DESCRIPTION, AWARDED_SCHOLARSHIPS.AWARD_AMT,
--CASE
-- WHEN AWARDED_SCHOLARSHIPS.CHECK_ID not in (Select CHECK_ID from POWERFAIDS_CHECK)
-- THEN 'No'
-- ELSE 'Yes'
--END as Processed_FA_Award,
AWARDED_SCHOLARSHIPS.COMMENTS,
AWARDED_SCHOLARSHIPS.YR_DESC, AWARDED_SCHOLARSHIPS.TRM_DESC, AWARDED_SCHOLARSHIPS.LAST_NAME, AWARDED_SCHOLARSHIPS.FIRST_NAME
From AWARDED_SCHOLARSHIPS
LEFT OUTER JOIN POWERFAIDS_CHECK on AWARDED_SCHOLARSHIPS.ID_NUM = POWERFAIDS_CHECK.ID_NUM
Where AWARDED_SCHOLARSHIPS.ID_NUM in (Select ID_NUM from AWARDED_SCHOLARSHIPS)
Order by AWARDED_SCHOLARSHIPS.DESCRIPTION, LAST_NAME, FIRST_NAME
Any insights much appreciated, thanks!
It's hard to give a definitive answer here without knowing what your data and indexes look like. If you can share the execution plan for both versions of your query, it would be much easier to give you a clear answer.
You can share your execution plans here: https://www.brentozar.com/pastetheplan/
Without the execution plan here are a couple of ideas to try:
EXISTS often performs better than IN/NOT IN:
CASE WHEN EXISTS (SELECT * FROM POWERFAIDS_CHECK WHERE CHECK_ID = AWARDED_SCHOLARSHIPS.CHECK_ID) THEN [...]
Adding a new CTE that contains only the subset of data you need might help:
), VALID_CHECK_IDS AS ( SELECT DISTINCT CHECK_ID FROM POWERFAIDS_CHECK )
Then use this CTE in your CASE expression.
Add a second join to POWERFAIDS_CHECK:
LEFT JOIN POWERFAIDS_CHECK AS VALIDCHECKID ON AWARDED_SCHOLARSHIPS.CHECK_ID = VALIDCHECKID.CHECK_ID
and update your CASE to
CASE WHEN VALIDCHECKID.CHECK_ID IS NOT NULL THEN 'Yes' ELSE 'No' [...]
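Besides speed, NOT IN and EXISTS can also differ in results when the subquery returns a NULL. The sketch below illustrates that with SQLite via Python's sqlite3 module standing in for SQL Server; the awarded and processed tables and their values are hypothetical, and whether CHECK_ID can actually be NULL in the real data is an assumption worth checking.

```python
import sqlite3

# Invented stand-ins for the AWARDED_SCHOLARSHIPS and POWERFAIDS_CHECK CTEs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE awarded (check_id TEXT);
    CREATE TABLE processed (check_id TEXT);
    INSERT INTO awarded VALUES ('1001A'), ('1002B');
    INSERT INTO processed VALUES ('1001A'), (NULL);
""")

def flags(case_sql):
    """Return (check_id, flag) pairs for the given CASE expression."""
    query = f"SELECT a.check_id, {case_sql} FROM awarded AS a ORDER BY a.check_id"
    return conn.execute(query).fetchall()

# NOT IN against a list containing NULL evaluates to unknown, so the ELSE branch wins.
not_in = flags("""CASE WHEN a.check_id NOT IN (SELECT check_id FROM processed)
                  THEN 'No' ELSE 'Yes' END""")

# EXISTS only asks whether a matching row exists, so the NULL is harmless.
exists = flags("""CASE WHEN EXISTS (SELECT 1 FROM processed AS p
                                    WHERE p.check_id = a.check_id)
                  THEN 'Yes' ELSE 'No' END""")

print(not_in)
print(exists)
```

Here '1002B' has no match, yet the NOT IN version reports 'Yes' because of the NULL row, while the EXISTS version reports 'No'.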

SQL combine two queries result into one dataset

I am trying to combine two SQL queries the first is
SELECT
EAC.Person.FirstName,
EAC.Person.Id,
EAC.Person.LastName,
EAC.Person.EmployeeId,
EAC.Person.IsDeleted,
Controller.Cards.SiteCode,
Controller.Cards.CardCode,
Controller.Cards.ActivationDate,
Controller.Cards.ExpirationDate,
Controller.Cards.Status,
EAC.[Group].Name
FROM
EAC.Person
INNER JOIN
Controller.Cards ON EAC.Person.Id = Controller.Cards.PersonId
INNER JOIN
EAC.GroupPersonMap ON EAC.Person.Id = EAC.GroupPersonMap.PersonId
INNER JOIN
EAC.[Group] ON EAC.GroupPersonMap.GroupId = EAC.[Group].Id
And the second one is
SELECT
IsActive, ActivationDateUTC, ExpirationDateUTC,
Sitecode + '-' + Cardcode AS Credential, 'Badge' AS Type,
CASE
WHEN isActive = 0
THEN 'InActive'
WHEN ActivationDateUTC > GetUTCDate()
THEN 'Pending'
WHEN ExpirationDAteUTC < GetUTCDate()
THEN 'Expired'
ELSE 'Active'
END AS Status
FROM
EAC.Credential
JOIN
EAC.WiegandCredential ON Credential.ID = WiegandCredential.CredentialId
WHERE
PersonID = '32'
Where I would like to run the second query for each user of the first query using EAC.Person.Id instead of the '32'.
I would like all the data to be returned in one Dataset so I can use it in Report Builder.
I have been fighting with this all day and am hoping one of you smart guys can give me a hand. Thanks in advance.
Based on your description in the comments, I understand that the connection between the two datasets is actually the PersonID field, which exists in both EAC.Credential and EAC.Person; however, in EAC.Credential, duplicate values exist for PersonID, and you want only the most recent one for each PersonID.
There are a few ways to do this, and it will depend on the number of rows returned, the indexes, etc., but I think maybe you're looking for something like this...?
SELECT
EAC.Person.FirstName
,EAC.Person.Id
,EAC.Person.LastName
,EAC.Person.EmployeeId
,EAC.Person.IsDeleted
,Controller.Cards.SiteCode
,Controller.Cards.CardCode
,Controller.Cards.ActivationDate
,Controller.Cards.ExpirationDate
,Controller.Cards.Status
,EAC.[Group].Name
,X.IsActive
,X.ActivationDateUTC
,X.ExpirationDateUTC
,X.Credential
,X.Type
,X.Status
FROM EAC.Person
INNER JOIN Controller.Cards
ON EAC.Person.Id = Controller.Cards.PersonId
INNER JOIN EAC.GroupPersonMap
ON EAC.Person.Id = EAC.GroupPersonMap.PersonId
INNER JOIN EAC.[Group]
ON EAC.GroupPersonMap.GroupId = EAC.[Group].Id
CROSS APPLY
(
SELECT TOP 1
IsActive
,ActivationDateUTC
,ExpirationDateUTC
,Sitecode + '-' + Cardcode AS Credential
,'Badge' AS Type
,'Status' =
CASE
WHEN isActive = 0
THEN 'InActive'
WHEN ActivationDateUTC > GETUTCDATE()
THEN 'Pending'
WHEN ExpirationDateUTC < GETUTCDATE()
THEN 'Expired'
ELSE 'Active'
END
FROM EAC.Credential
INNER JOIN EAC.WiegandCredential
ON EAC.Credential.ID = EAC.WiegandCredential.CredentialId
WHERE EAC.Credential.PersonID = EAC.Person.Id
ORDER BY EAC.Credential.ID DESC
) AS X
-- Optionally, you can also add conditions to return specific rows, i.e.:
-- WHERE EAC.Person.Id = 32
This option uses a CROSS APPLY, which means that every row of the first dataset will return additional values from the second dataset, based on the criteria that you described. In this CROSS APPLY, I'm correlating the two datasets on the person key (Id in EAC.Person from your first dataset, PersonID in EAC.Credential). I then specify that I want only the TOP 1 row for each person, with an ORDER BY specifying that we want the most recent (highest) value of ID for each PersonID.
The CROSS APPLY is aliased as "X", so in your original SELECT you now have several values prefixed with the X. alias, which just means that you're taking these fields from the second query and attaching them to your original results.
CROSS APPLY requires that a matching entry exists in both subsets of data, much like an INNER JOIN, so you'll want to check and make sure that the relevant values exist and are returned correctly.
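The inner-join-like behaviour described above can be sketched as follows. SQLite has no CROSS APPLY, so this example (run through Python's sqlite3 module) approximates "latest credential per person" with a correlated MAX(id) subquery instead; the person and credential tables and their rows are invented for illustration. The LEFT JOIN variant plays the role that OUTER APPLY would play in T-SQL.

```python
import sqlite3

# Hypothetical, simplified versions of EAC.Person and EAC.Credential.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE credential (id INTEGER PRIMARY KEY, person_id INTEGER, is_active INTEGER);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO credential VALUES (10, 1, 0), (11, 1, 1), (12, 2, 1);
""")

# Keep only the credential with the highest id for each person.
LATEST = "c.id = (SELECT MAX(c2.id) FROM credential AS c2 WHERE c2.person_id = p.id)"

# Inner join: people without any credential disappear (CROSS APPLY-like).
inner_rows = conn.execute(f"""
    SELECT p.first_name, c.id, c.is_active
    FROM person AS p
    JOIN credential AS c ON c.person_id = p.id AND {LATEST}
    ORDER BY p.id
""").fetchall()

# Left join: people without a credential are kept with NULLs (OUTER APPLY-like).
outer_rows = conn.execute(f"""
    SELECT p.first_name, c.id, c.is_active
    FROM person AS p
    LEFT JOIN credential AS c ON c.person_id = p.id AND {LATEST}
    ORDER BY p.id
""").fetchall()

print(inner_rows)
print(outer_rows)
```

If the report should list people who have no credential yet, the second form is the one to reach for.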
I think this is pretty close to the direction you're trying to go. If not, let me know and I'll update the answer. Good luck!
Try it like this:
select Query1.*, Query2.* from (
SELECT
EAC.Person.FirstName,
EAC.Person.Id as PersonId,
EAC.Person.LastName,
EAC.Person.EmployeeId,
EAC.Person.IsDeleted,
Controller.Cards.SiteCode,
Controller.Cards.CardCode,
Controller.Cards.ActivationDate,
Controller.Cards.ExpirationDate,
Controller.Cards.Status,
EAC.[Group].Name
FROM
EAC.Person
INNER JOIN
Controller.Cards ON EAC.Person.Id = Controller.Cards.PersonId
INNER JOIN
EAC.GroupPersonMap ON EAC.Person.Id = EAC.GroupPersonMap.PersonId
INNER JOIN
EAC.[Group] ON EAC.GroupPersonMap.GroupId = EAC.[Group].Id)
Query1 inner join (SELECT top 100
EAC.Credential.PersonID, IsActive, ActivationDateUTC, ExpirationDateUTC,
Sitecode + '-' + Cardcode AS Credential, 'Badge' AS Type,
CASE
WHEN isActive = 0
THEN 'InActive'
WHEN ActivationDateUTC > GetUTCDate()
THEN 'Pending'
WHEN ExpirationDAteUTC < GetUTCDate()
THEN 'Expired'
ELSE 'Active'
END AS Status
FROM
EAC.Credential
JOIN
EAC.WiegandCredential ON Credential.ID = WiegandCredential.CredentialId
ORDER BY EAC.Credential.ID DESC) Query2 ON Query1.PersonId = Query2.PersonID
Just wrap the two queries as derived tables (Query1 and Query2) and join them on their PersonId values.

T-SQL: Ugly SQL

EDIT: Just so folks understand, I'm not worried about the formatting, I'm worried about the usage of the GROUP By and the usage of the aggregate fields when it doesn't make a whole lot of sense.
I've been tasked with making some SQL more readable. While I generally know what to do, this particular query escapes me. The gist of the query involves the writer grouping by a whole bunch of fields, and adding those fields to the query results. For fields that he/she doesn't GROUP BY, they use a MIN aggregate function I guess to "make the error go away"
MIN(ISNULL(dbo.V_CONNECT_ContactPartnerDetail_0010.SAPNr, N'')) AS sapkunr,
My difficulty comes from the fact that I can stuff the GROUP BY into a CTE, and then branch out from there, but I've never gotten the row counts to match up between the query I've created and the original one. Any help on making this SQL more readable and making its intent more clear (no functions to make the error go away) would be greatly appreciated.
SELECT dbo.V_CONNECT_ContactPartnerDetail_0010.ID_FI AS firmencode,
dbo.V_CONNECT_ContactPartnerDetail_0010.ID_KP AS partnercode,
dbo.V_CONNECT_ContactPartnerDetail_0010.Nachname,
Min(Isnull(dbo.V_CONNECT_ContactPartnerDetail_0010.Vorname, '')) AS vname,
Min(CASE V_CONNECT_ContactPartnerDetail_0010.Anrede
WHEN 'Frau' THEN 2
ELSE 1
END) AS anrede,
Min(Isnull(dbo.V_CONNECT_ContactPartnerDetail_0010.EMail, N'')) AS mail,
Min(Isnull(dbo.V_CONNECT_ContactPartnerDetail_0010.SAPNr, N'')) AS sapkunr,
Isnull(dbo.V_CONNECT_ContactPartnerDetail_0010.Titel, N'') AS titel
FROM dbo.V_CONNECT_ContactPartnerDetail_0010
INNER JOIN dbo.V_CONNECT_ContactPartnerPivot
ON dbo.V_CONNECT_ContactPartnerDetail_0010.ID_C005 = dbo.V_CONNECT_ContactPartnerPivot.ID_C005
LEFT OUTER JOIN dbo.V_CONNECT_Firmen_PZ_Download
ON dbo.V_CONNECT_ContactPartnerDetail_0010.ID_VF = dbo.V_CONNECT_Firmen_PZ_Download.ID_VF
WHERE ( dbo.V_CONNECT_ContactPartnerDetail_0010.VKO = '0010' )
GROUP BY dbo.V_CONNECT_ContactPartnerDetail_0010.ID_FI,
dbo.V_CONNECT_ContactPartnerDetail_0010.ID_KP,
dbo.V_CONNECT_ContactPartnerDetail_0010.Nachname,
dbo.V_CONNECT_ContactPartnerDetail_0010.Ort,
dbo.V_CONNECT_ContactPartnerPivot.flg_spl,
dbo.V_CONNECT_ContactPartnerPivot.flg_ha,
dbo.V_CONNECT_ContactPartnerPivot.flg_fu,
dbo.V_CONNECT_ContactPartnerPivot.flg_ma,
dbo.V_CONNECT_ContactPartnerPivot.flg_ph,
Isnull(dbo.V_CONNECT_ContactPartnerDetail_0010.Titel, N'')
This is much more "readable" to me
SELECT cpd.ID_FI AS firmencode
, cpd.ID_KP AS partnercode
, cpd.Nachname AS Nachname
, MIN(ISNULL( cpd.Vorname ,'')) AS vname
, MIN(CASE cpd.Anrede WHEN 'Frau' THEN 2 ELSE 1 END) AS anrede
, MIN(ISNULL( cpd.EMail ,N'')) AS mail
, MIN(ISNULL( cpd.SAPNr ,N'')) AS sapkunr
, ISNULL( cpd.Titel ,N'') AS titel
FROM dbo.V_CONNECT_ContactPartnerDetail_0010 cpd
JOIN dbo.V_CONNECT_ContactPartnerPivot cpp
ON cpd.ID_C005 = cpp.ID_C005
LEFT
JOIN dbo.V_CONNECT_Firmen_PZ_Download fpd
ON fpd.ID_VF = cpd.ID_VF
WHERE cpd.VKO = '0010'
GROUP
BY cpd.ID_FI
, cpd.ID_KP
, cpd.Nachname
, cpd.Ort
, cpp.flg_spl
, cpp.flg_ha
, cpp.flg_fu
, cpp.flg_ma
, cpp.flg_ph
, ISNULL(cpd.Titel ,N'')
EDIT
If I was "tasked with making some SQL more readable", I'd start with the changes above.
Beyond that, it's not clear why the GROUP BY clause includes expressions that aren't in the SELECT list. It's valid to do that. But what's curious is that if there are multiple rows from "cpd" that have different values of "Ort", then there's a potential to get multiple rows returned with the same values of "ID_FI", "ID_KP", "Nachname".
What really sticks out though is the outer join to "fpd", and apart from the reference to the "ID_VF" column in the join condition, there aren't any references to columns from "fpd" anywhere else in the query. It seems like if that outer join were removed, we'd get the same result.
The first structural change I would propose would be the removal of the join to "fpd".
SELECT cpd.ID_FI AS firmencode
, cpd.ID_KP AS partnercode
, cpd.Nachname AS Nachname
, MIN(ISNULL( cpd.Vorname ,'')) AS vname
, MIN(CASE cpd.Anrede WHEN 'Frau' THEN 2 ELSE 1 END) AS anrede
, MIN(ISNULL( cpd.EMail ,N'')) AS mail
, MIN(ISNULL( cpd.SAPNr ,N'')) AS sapkunr
, ISNULL( cpd.Titel ,N'') AS titel
FROM dbo.V_CONNECT_ContactPartnerDetail_0010 cpd
JOIN dbo.V_CONNECT_ContactPartnerPivot cpp
ON cpp.ID_C005 = cpd.ID_C005
WHERE cpd.VKO = '0010'
GROUP
BY cpd.ID_FI
, cpd.ID_KP
, cpd.Nachname
, cpd.Ort
, ISNULL( cpd.Titel ,N'')
, cpp.flg_spl
, cpp.flg_ha
, cpp.flg_fu
, cpp.flg_ma
, cpp.flg_ph
We can rearrange the expressions in the GROUP BY clause to move "Titel" up with the other columns from "cpd"; the order of the GROUP BY expressions doesn't change the result, and without an ORDER BY clause there's no guarantee what order the rows will be returned in anyway.
We can't tell (from the query, and from the information provided) whether the "ID_C005" column is the PRIMARY KEY or a UNIQUE KEY in either "cpd" or "cpp".
And not knowing that, we can't really make other changes to the query without potentially changing the result. If "ID_C005" is unique in "cpp", then we could eliminate all of the "cpp" column references from the GROUP BY.
If the purpose of the inner join (to "cpp") is to filter out rows from "cpd" that don't have a matching row in "cpp", we could make some other changes to the query. And that might make it more "readable".
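Two of the observations above can be checked on a toy dataset: an unused outer join does not change a result built only from GROUP BY and MIN, and grouping by a column that is not selected ("Ort") can return rows that look like duplicates. This is only a sketch, using SQLite through Python's sqlite3 module, with invented cpd and fpd tables and a reduced column set.

```python
import sqlite3

# Invented, cut-down versions of the two views (only a few columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cpd (id_fi INTEGER, nachname TEXT, ort TEXT, vorname TEXT, id_vf INTEGER);
    CREATE TABLE fpd (id_vf INTEGER);
    INSERT INTO cpd VALUES
        (1, 'Meier', 'Bern',   'Anna', 7),
        (1, 'Meier', 'Zurich', 'Beat', 7),
        (2, 'Huber', 'Basel',  'Carl', 8);
    INSERT INTO fpd VALUES (7), (7);
""")

SELECT_LIST = "SELECT c.id_fi, c.nachname, MIN(c.vorname) FROM cpd AS c"

# Original shape: unused outer join, grouping includes the unselected "ort".
with_join = conn.execute(f"""
    {SELECT_LIST}
    LEFT JOIN fpd AS f ON f.id_vf = c.id_vf
    GROUP BY c.id_fi, c.nachname, c.ort
    ORDER BY 1, 2, 3
""").fetchall()

# Same query without the outer join.
without_join = conn.execute(f"""
    {SELECT_LIST}
    GROUP BY c.id_fi, c.nachname, c.ort
    ORDER BY 1, 2, 3
""").fetchall()

# Grouping only by the selected columns collapses the two 'Meier' rows.
selected_only = conn.execute(f"""
    {SELECT_LIST}
    GROUP BY c.id_fi, c.nachname
    ORDER BY 1, 2, 3
""").fetchall()

print(with_join, without_join, selected_only)
```

The join is harmless here only because MIN ignores the duplicated rows it produces; an aggregate such as COUNT or SUM would be inflated by the same join, so removing it would change the result in that case.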

TSQL Select Statement using Case or Join

I am a little stuck on a situation that I have been trying to fight through. I have a page that allows a user to select all the filter options they want to search by and then it runs the query on that data.
Every field requires something to be picked but on a new field I am introducing, it's going to be optional.
It allows you to provide a list of supervisors and it will then provide all records where the agents supervisor is in the list provided; pretty straight forward. However, I am trying to make this optional as I don't want to always search by users. If I don't provide a name in the UI to pass to the stored procedure, then I want to ignore this part of the statement and get me everything regardless of the manager.
Here is the query I am working with:
SELECT a.[escID],
a.[escReasonID],
b.[ArchibusLocationName],
c.[ArchibusLocationName],
b.[DepartmentDesc],
c.[DepartmentDesc],
a.[escCreatedBy],
a.[escWorkedBy],
a.[escNotes],
a.[preventable],
a.[escalationCreated],
a.[escalationTracked],
a.[feedbackID],
typ.[EscalationType],
typ.[EscalationTypeText] AS escalationType,
d.reasonText AS reasonText
FROM [red].[dbo].[TFS_Escalations] AS a
LEFT OUTER JOIN
red.dbo.EmployeeTable AS b
ON a.escCreatedBy = b.QID
LEFT OUTER JOIN
red.dbo.EmployeeTable AS c
ON a.escWorkedBy = c.QID
LEFT OUTER JOIN
red.dbo.TFS_Escalation_Reasons AS d
ON a.escReasonID = d.ReasonID
INNER JOIN
dbo.TFS_EscalationTypes AS typ
ON d.escType = typ.EscalationType
WHERE B.[ArchibusLocationName] IN (SELECT location
FROM #tmLocations)
AND C.[ArchibusLocationName] IN (SELECT location
FROM #subLocations)
AND B.[DepartmentDesc] IN (SELECT department
FROM #tmDepartments)
AND C.[DepartmentDesc] IN (SELECT department
FROM #subDepartments)
AND DATEDIFF(second, '19700101', CAST (CONVERT (DATETIME, A.[escalationCreated], 121) AS INT)) >= @startDate
AND DATEDIFF(second, '19700101', CAST (CONVERT (DATETIME, A.[escalationCreated], 121) AS INT)) <= @endDate
AND a.[PREVENTABLE] IN (SELECT PREVENTABLE FROM #preventable)
AND b.MgrQID IN (SELECT leaderQID FROM #sourceLeaders)
The part that I am trying to make optional is the very last line of the query:
AND b.MgrQID IN (SELECT leaderQID FROM #sourceLeaders)
Essentially, if there is no data in the temp table #sourceLeaders then it should ignore that piece of the query.
In all of the other instances of the WHERE clause, something is always required for those fields, which is why that all works fine. I just can't figure out the best way to make this piece optional depending on whether the temp table has data in it (the temp table is populated by the names entered in the UI that a user COULD search by).
So this line should be TRUE if something matches data in the table variable OR there is nothing in the table variable
AND
(
b.MgrQID IN (SELECT leaderQID FROM #sourceLeaders)
OR
NOT EXISTS (SELECT 1 FROM #sourceLeaders)
)
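A small sketch of how this predicate behaves with an empty and a populated filter table, using SQLite via Python's sqlite3 module in place of SQL Server. The escalation and source_leaders tables are simplified, hypothetical versions of the ones in the question.

```python
import sqlite3

# Hypothetical miniature of the escalations table and the leader filter table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE escalation (esc_id INTEGER, mgr_qid TEXT);
    CREATE TABLE source_leaders (leader_qid TEXT);
    INSERT INTO escalation VALUES (1, 'M1'), (2, 'M2'), (3, 'M1');
""")

# The filter applies only when source_leaders contains at least one row.
QUERY = """
    SELECT e.esc_id
    FROM escalation AS e
    WHERE e.mgr_qid IN (SELECT leader_qid FROM source_leaders)
       OR NOT EXISTS (SELECT 1 FROM source_leaders)
    ORDER BY e.esc_id
"""

def matching_ids():
    return [row[0] for row in conn.execute(QUERY)]

all_ids = matching_ids()            # filter table empty: everything comes back
conn.execute("INSERT INTO source_leaders VALUES ('M1')")
filtered_ids = matching_ids()       # filter table populated: only M1's escalations

print(all_ids, filtered_ids)
```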
Similar to Nick.McDermaid's, but uses a case statement instead :
AND
(
1 = CASE WHEN NOT EXISTS(SELECT 1 FROM #sourceLeaders) THEN 1
WHEN b.MgrQID IN (SELECT leaderQID FROM #sourceLeaders) THEN 1
ELSE 0
END
)
Maybe at the top so you have a single check
DECLARE @EmptySourceLeaders CHAR(1)
IF EXISTS (SELECT 1 FROM #sourceLeaders)
SET @EmptySourceLeaders = 'N'
ELSE
SET @EmptySourceLeaders = 'Y'
Then in the joins
LEFT OUTER JOIN #SourceLeaders SL
ON b.MgrQID = SL.leaderQID
Then in the WHERE
AND (@EmptySourceLeaders = 'Y' OR SL.leaderQID IS NOT NULL)
lots of ways to do it.

Join subquery with min

I'm pulling my hair out over a subquery that I'm using to avoid about 100 duplicates (out of about 40k records). The records that are duplicated are showing up because they have 2 dates in h2.datecreated for a valid reason, so I can't just scrub the data.
I'm trying to get only the earliest date to return. The first subquery (the one that starts with "select distinct address_id", with the MIN) works fine on its own: no duplicates are returned. So it would seem that the left join (or just plain join; I've tried that too) couldn't possibly see the second h2.datecreated, since it doesn't even show up in the subquery. But when I run the whole query, it's returning 2 values for some ipc.mfgid's, one with the h2.datecreated that I want, and the other one that I don't want.
I know it's got to be something really simple, or something that just isn't possible. It really seems like it should work! This is MSSQL. Thanks!
select distinct ipc.mfgid as IPC, h2.datecreated,
case when ad.Address is null
then ad.buildingname end as Address, cast(trace.name as varchar)
+ '-' + cast(trace.Number as varchar) as ONT,
c.ACCOUNT_Id,
case when h.datecreated is not null then h.datecreated
else h2.datecreated end as Install
from equipmentjoin as ipc
left join historyjoin as h on ipc.id = h.EQUIPMENT_Id
and h.type like 'add'
left join circuitjoin as c on ipc.ADDRESS_Id = c.ADDRESS_Id
and c.GRADE_Code like '%hpna%'
join (select distinct address_id, equipment_id,
min(datecreated) as datecreated, comment
from history where comment like 'MAC: 5%' group by equipment_id, address_id, comment)
as h2 on c.address_id = h2.address_id
left join (select car.id, infport.name, carport.number, car.PCIRCUITGROUP_Id
from circuit as car (NOLOCK)
join port as carport (NOLOCK) on car.id = carport.CIRCUIT_Id
and carport.name like 'lead%'
and car.GRADE_Id = 29
join circuit as inf (NOLOCK) on car.CCIRCUITGROUP_Id = inf.PCIRCUITGROUP_Id
join port as infport (NOLOCK) on inf.id = infport.CIRCUIT_Id
and infport.name like '%olt%' )
as trace on c.ccircuitgroup_id = trace.pcircuitgroup_id
join addressjoin as ad (NOLOCK) on ipc.address_id = ad.id
The typical approach to only getting the lowest row is one of the following. You didn't bother to specify what version of SQL Server you're using, what you want to do with ties, and I have little interest to try to work this into your complex query, so I'll show you an abstract simplification for different versions.
SQL Server 2000
SELECT x.grouping_column, x.min_column, x.other_columns ...
FROM dbo.foo AS x
INNER JOIN
(
SELECT grouping_column, min_column = MIN(min_column)
FROM dbo.foo GROUP BY grouping_column
) AS y
ON x.grouping_column = y.grouping_column
AND x.min_column = y.min_column;
SQL Server 2005+
;WITH x AS
(
SELECT grouping_column, min_column, other_columns,
rn = ROW_NUMBER() OVER (PARTITION BY grouping_column ORDER BY min_column)
FROM dbo.foo
)
SELECT grouping_column, min_column, other_columns
FROM x
WHERE rn = 1;
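The first pattern (joining back to a grouped subquery) can be tried on a toy table. This is a sketch only: it runs on SQLite through Python's sqlite3 module, and the foo table with its three rows is made up to mirror the placeholder names used above.

```python
import sqlite3

# Toy table mirroring the placeholder names in the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (grouping_column TEXT, min_column INTEGER, other_columns TEXT);
    INSERT INTO foo VALUES ('a', 3, 'x'), ('a', 1, 'y'), ('b', 2, 'z');
""")

# Find the minimum per group, then join back to fetch the rest of that row.
lowest_rows = conn.execute("""
    SELECT x.grouping_column, x.min_column, x.other_columns
    FROM foo AS x
    JOIN (
        SELECT grouping_column, MIN(min_column) AS min_column
        FROM foo
        GROUP BY grouping_column
    ) AS y
      ON x.grouping_column = y.grouping_column
     AND x.min_column = y.min_column
    ORDER BY x.grouping_column
""").fetchall()

print(lowest_rows)
```

Note that this form returns every tied row when two rows in a group share the same minimum, whereas the ROW_NUMBER form keeps exactly one row per group.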
This subquery:
select distinct address_id, equipment_id,
min(datecreated) as datecreated, comment
from history where comment like 'MAC: 5%' group by equipment_id, address_id, comment
Probably will return multiple rows because the comment is not guaranteed to be the same.
Try this instead:
CROSS APPLY (
SELECT TOP 1 H2.DateCreated, H2.Comment -- H2.Equipment_id wasn't used
FROM History H2
WHERE
H2.Comment LIKE 'MAC: 5%'
AND C.Address_ID = H2.Address_ID
ORDER BY DateCreated
) H2
Switch that to OUTER APPLY in case you want rows that don't have a matching desired history entry.
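To illustrate why grouping by comment can defeat the MIN, here is a minimal sketch on SQLite via Python's sqlite3 module. The history rows are invented, and it assumes (as the answer suspects) that the two entries for one address carry different comments.

```python
import sqlite3

# Invented history rows: address 100 has two entries with different comments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (address_id INTEGER, datecreated TEXT, comment TEXT);
    INSERT INTO history VALUES
        (100, '2012-01-05', 'MAC: 5A'),
        (100, '2012-03-09', 'MAC: 5B'),
        (200, '2012-02-01', 'MAC: 5C');
""")

# Grouping by comment as well: each distinct comment forms its own group.
with_comment = conn.execute("""
    SELECT address_id, MIN(datecreated), comment
    FROM history
    WHERE comment LIKE 'MAC: 5%'
    GROUP BY address_id, comment
    ORDER BY address_id, 2
""").fetchall()

# Grouping by address only: one earliest date per address.
without_comment = conn.execute("""
    SELECT address_id, MIN(datecreated)
    FROM history
    WHERE comment LIKE 'MAC: 5%'
    GROUP BY address_id
    ORDER BY address_id
""").fetchall()

print(with_comment)
print(without_comment)
```

Address 100 appears twice in the first result, which is how the second date would leak into the outer query.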

Resources