Need subquery for conditional output - sql-server

I am working with a set of vehicle data that uses the following query:
SELECT
VIN_NUM AS [Registration VIN]
,REGION_IND AS [Location of Registration]
,REG_CHANGE AS [Changed Location Since Last Check]
,CASE
WHEN REG_CHANGE = '' THEN REGION_IND
ELSE REG_CHANGE
END AS [Final Location]
FROM
dbo.All_Tests
WHERE
VIN_NUM LIKE '1FM%' AND
CASE
WHEN REGION_IND = '1' THEN 'Upstate'
WHEN REGION_IND = '2' THEN 'Downstate'
ELSE 'Unknown'
END = 'Downstate'
The query pulls from a table a vehicle VIN (VIN_NUM) and whether it is located in one of two regions (REGION_IND), "1" or "2". It also pulls a column, "REG_CHANGE" checking if the vehicle registration has changed location between the two regions since last report. All three come from the same table.
REG_CHANGE is blank (not NULL) if there was no change, and contains the new region location, '1' or '2', if there was a change. This is used in a CASE statement with REGION_IND to give a current location to all vehicles in the database, alias name [Final Location].
The code works if I want the original regions since REGION_IND is a table column. However, I can't use [Final Location] because WHERE statements don't allow aliases. I'm thinking this would be a subquery construct within the SELECT columns, but I'm not certain how it would be structured.
Does anyone have any suggestions?

A useful approach for this is to use an apply operator within the from clause, which then permits the use of that column alias within the where clause:
SELECT
VIN_NUM AS [registration vin]
, REGION_IND AS [location of registration]
, REG_CHANGE AS [changed location since last check]
, ca.[final location]
FROM dbo.All_Tests
CROSS APPLY (
SELECT
CASE
WHEN REG_CHANGE = '' THEN REGION_IND
ELSE REG_CHANGE
END AS [final location]
) ca
WHERE VIN_NUM LIKE '1FM%'
AND ca.[final location] = '2' -- '2' is the Downstate region code
Any further uses of apply that follow can also use these column aliases.
By the way: although a SQL select query starts with the select clause, that clause is evaluated after the from and where clauses. So defining a column alias in the from clause makes that alias available much earlier in the execution of the query.
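Since the question asked about a subquery, the same idea can also be sketched with a derived table: compute the alias in an inner SELECT, then filter on it in the outer WHERE. This is only an illustration; the table variable @All_Tests and its sample rows are hypothetical stand-ins for dbo.All_Tests, and the column sizes are assumptions.

```sql
-- Hypothetical sample data standing in for dbo.All_Tests
DECLARE @All_Tests TABLE (
    VIN_NUM    varchar(17),
    REGION_IND char(1),
    REG_CHANGE varchar(1)
);

INSERT INTO @All_Tests (VIN_NUM, REGION_IND, REG_CHANGE)
VALUES ('1FMAAA', '1', ''),   -- stayed in region 1
       ('1FMBBB', '1', '2'),  -- moved from region 1 to region 2
       ('1FMCCC', '2', ''),   -- stayed in region 2
       ('1FMDDD', '2', '1'),  -- moved from region 2 to region 1
       ('2GCEEE', '2', '');   -- VIN prefix does not match the filter

-- The inner SELECT defines the alias; the outer WHERE can then use it
SELECT t.[Registration VIN],
       t.[Final Location]
FROM (
    SELECT VIN_NUM AS [Registration VIN],
           CASE
               WHEN REG_CHANGE = '' THEN REGION_IND
               ELSE REG_CHANGE
           END AS [Final Location]
    FROM @All_Tests
) AS t
WHERE t.[Registration VIN] LIKE '1FM%'
  AND t.[Final Location] = '2';  -- '2' = Downstate
```

With this sample data, only the two '1FM' vehicles whose final region is '2' (1FMBBB and 1FMCCC) pass the filter.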

You can write your WHERE clause like below:
SELECT
VIN_NUM AS [Registration VIN]
,REGION_IND AS [Location of Registration]
,REG_CHANGE AS [Changed Location Since Last Check]
,CASE
WHEN REG_CHANGE = '' THEN REGION_IND
ELSE REG_CHANGE
END AS [Final Location]
FROM
dbo.All_Tests
WHERE
VIN_NUM LIKE '1FM%' AND
(
(REG_CHANGE = '' AND REGION_IND = '2') OR -- no change, still registered Downstate
(REG_CHANGE = '2') -- changed to Downstate
)

Related

Execution order of CASE statement SQL Server

Does a CASE statement that has multiple parts and is part of an INSERT statement execute in order, and do the 'rules', for lack of a better word, stay in place even after the next line? In the query below, does the PO_TYPE assignment overrule the next condition (to look in a list of articles, for example)? So even if that article was in the list in the second part of the statement, if it was type 05 or 07 will it still assign to Andrew?
Thanks.
/*INSERT values into the table using SELECT making sure to exclude vendor 20800 - (see last line of code)*/
INSERT INTO SCM_PO_EMPLOYEE_NAME (PO_NUMBER, PO_ITEM_NUMBER, MATERIAL, BUSINESS_UNIT_CODE,PO_TYPE,TEAM_MEMBER_NAME)
SELECT I.PO_NUMBER,
I.PO_ITEM_NUMBER,
I.MATERIAL,
B.BU_CODE,
H.PO_TYPE,
CASE WHEN H.PO_TYPE IN ('05','07') -- Promo PO type - should be on both po type and stock category
AND I.STOCK_CATEGORY LIKE ('A60383%') -- stock category is second part of the check
THEN 'AZ'
WHEN H.PO_TYPE = '02' -- ma PO type
THEN 'MB'
WHEN I.MATERIAL IN ( SELECT ARTICLE
FROM ADI_USER_MAINTAINED.dbo.SCM_EMPLOYEE_ARTICLE A ) -- Check the Employee to article table next
THEN A.TEAM_MEMBER_NAME -- If the PO number matches that conditions then assign the employee from the employee article table
WHEN M.BUSINESS_UNIT_CODE = B.BU_CODE -- if not use then go to the BU assignment (below)
THEN B.TEAM_MEMBER_NAME --- Use the team member name from the Employee_BU table
END AS [TEAM_MEMBER_NAME]
FROM PDX_SAP_USER.dbo.VW_PO_HEADER H
JOIN PDX_SAP_USER.dbo.VW_PO_ITEM I ON H.PO_NUMBER = I.PO_NUMBER
JOIN PDX_SAP_USER.dbo.VW_MM_MATERIAL M ON I.MATERIAL = M.MATERIAL
JOIN ADI_USER_MAINTAINED.dbo.SCM_EMPLOYEE_ARTICLE A ON I.MATERIAL = A.ARTICLE
JOIN ADI_USER_MAINTAINED.dbo.SCM_EMPLOYEE_BU B ON B.BU_CODE = M.BUSINESS_UNIT_CODE
WHERE H.VENDOR_NO <> '20800'; --Exclude '20800' as a vendor!!
A case expression is evaluated sequentially.
So, the second when condition is only evaluated if the first when condition is not true. As an extreme example of this, consider:
select (case when 1=1 then 'true'
when 1/0 = 0 then 'error'
end)
This returns 'true' instead of raising a divide-by-zero error.
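To make the "first match wins" behaviour concrete, here is a small sketch with made-up rows. The column names mirror the question, but the values and the branch labels are hypothetical.

```sql
-- Each row is tested against the WHEN branches from top to bottom;
-- the first branch whose condition is true supplies the result.
SELECT v.PO_TYPE,
       v.MATERIAL,
       CASE
           WHEN v.PO_TYPE IN ('05', '07') THEN 'first branch'
           WHEN v.MATERIAL = 'X1'         THEN 'second branch'
           ELSE 'no match'
       END AS Assigned
FROM (VALUES ('05', 'X1'),   -- satisfies both conditions
             ('02', 'X1'),   -- satisfies only the second condition
             ('02', 'Y9'))   -- satisfies neither
     AS v (PO_TYPE, MATERIAL);
```

The row ('05', 'X1') satisfies both conditions but gets 'first branch', because the later WHEN is never reached for it. The other two rows get 'second branch' and 'no match' respectively.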

SSIS package failing on first run because of too many rows of data

I have an SSIS package that I am trying to run, but it always fails because the buffer gets too much data. This is on my first run. I then thought I should only grab the data from today's date back 30 days and insert that. My question is: how would I grab the data for the last 30 days, then do it again for the next 30 days, and again until all the data is inserted into my data warehouse?
My query looks like this:
SELECT db_name() dbname, TicketType, TicketNo source_bk, UniqueID, ItemNo, CASE WHEN VehicleID = '' THEN '-1' ELSE VehicleID END VehicleID
, CASE WHEN TicketID = '' THEN '-1' ELSE TicketID END TicketID,
case when p.purchaseOrder = '' then 'unknown' else p.PurchaseOrder end as PurchaseOrder, TicketDate, TicketTime, S1.LocationID
, S1.CustomerID, S1.OrderID, OrderItem, ProductID, MixID, S1.TaxCodeID, S1.CarrierID, Description, DeliveryAddress1
, Gross, Tare, Net, Qty, Unit, FreightQty, FreightPayQty, S1.Price, S1.FreightRate, S1.FreightAmount, S1.FreightPay
, FreightPayAmount, TodayLoads, TodayQty, OrderLoads, OrderQty, AltTicketQty, AltTicketQtyEdited, TodayAmount
, 'Posted' as [Source] FROM tkhist1 S1 WITH (NOLOCK)
join [dbo].[Slordnam] p
on s1.customerID = p.CustomerID
where s1.TicketDate >= CURRENT_TIMESTAMP -30
UNION
SELECT db_name() dbname, TicketType, TicketNo source_bk, UniqueID, ItemNo, CASE WHEN VehicleID = '' THEN '-1' ELSE VehicleID END VehicleID
, CASE WHEN TicketID = '' THEN '-1' ELSE TicketID END TicketID,
case when p.purchaseOrder = '' then 'unknown' else p.PurchaseOrder end as PurchaseOrder, TicketDate, TicketTime, S1.LocationID
, S1.CustomerID, S1.OrderID, OrderItem, ProductID, MixID, S1.TaxCodeID, S1.CarrierID, Description, DeliveryAddress1
, Gross, Tare, Net, Qty, Unit, FreightQty, FreightPayQty, S1.Price, S1.FreightRate, S1.FreightAmount, S1.FreightPay
, FreightPayAmount, TodayLoads, TodayQty, OrderLoads, OrderQty, AltTicketQty, AltTicketQtyEdited, TodayAmount
, 'Posted' as [Source] FROM Tkbatch S1 WITH (NOLOCK)
join [dbo].[Slordnam] p
on s1.customerID = p.CustomerID
where s1.TicketDate >= CURRENT_TIMESTAMP -30
UNION
SELECT db_name() dbname, TicketType, TicketNo source_bk, UniqueID, ItemNo, CASE WHEN VehicleID = '' THEN '-1' ELSE VehicleID END VehicleID
, CASE WHEN TicketID = '' THEN '-1' ELSE TicketID END TicketID,
case when p.purchaseOrder = '' then 'unknown' else p.PurchaseOrder end as PurchaseOrder, TicketDate, TicketTime, S1.LocationID
, S1.CustomerID, S1.OrderID, OrderItem, ProductID, MixID, S1.TaxCodeID, S1.CarrierID, Description, DeliveryAddress1
, Gross, Tare, Net, Qty, Unit, FreightQty, FreightPayQty, S1.Price, S1.FreightRate, S1.FreightAmount, S1.FreightPay
, FreightPayAmount, TodayLoads, TodayQty, OrderLoads, OrderQty, AltTicketQty, AltTicketQtyEdited, TodayAmount
, 'Posted' as [Source] FROM Tkscale S1 WITH (NOLOCK)
join [dbo].[Slordnam] p
on s1.customerID = p.CustomerID
where s1.TicketDate >= CURRENT_TIMESTAMP -30
After it finished inserting these rows, I would want it to do this again on the next run, but for the next 30-day window after this one. Because the data comes from many databases, I already have a For Each loop that does this for each database. I want it to get 30 days from now and then repeat the process over and over for the first run, because I can't get it to run at all as it is. After that I would use CDC to load only new data.
Two answers to consider...
One is, what do you mean by "too much data"? SSIS doesn't have a limit on how much data you can put through a data flow. Are you saying the server can't handle the query, or that you are doing something like a lookup in SSIS and running out of memory, or something else? Bottom line: it sounds like you might be wise to approach this in a different way, since there should be no such limitation.
Two is, if you really want to process sets of rows at a time, make your query dynamic and use a For Loop that increments your date range by 30 days on every iteration. You can create dynamic SQL in a number of ways, and those methods depend on what type of connection you are using and what method you prefer, as they all have pros and cons. The most flexible and consistent way, which works with any relational connection, is to create a string variable and set its value from a VB script.
I hope that helps.
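As a rough illustration of the second suggestion, the window arithmetic can be sketched in plain T-SQL. This is not the SSIS For Loop itself: the variable names and the @OldestDate cutoff are assumptions, and the loop only prints each window instead of running the extract.

```sql
-- Walk backwards from today in 30-day windows until a cutoff date is reached
DECLARE @WindowEnd   date = DATEADD(DAY, 1, CAST(GETDATE() AS date)); -- exclusive upper bound (tomorrow)
DECLARE @OldestDate  date = '2015-01-01';  -- assumption: earliest date worth loading
DECLARE @WindowStart date;

WHILE @WindowEnd > @OldestDate
BEGIN
    SET @WindowStart = DATEADD(DAY, -30, @WindowEnd);
    IF @WindowStart < @OldestDate
        SET @WindowStart = @OldestDate;

    -- In the real package, the extract query would filter each source table with:
    --   WHERE s1.TicketDate >= @WindowStart AND s1.TicketDate < @WindowEnd
    PRINT CONCAT('Load window: ',
                 CONVERT(char(10), @WindowStart, 120), ' to ',
                 CONVERT(char(10), @WindowEnd, 120));

    -- The next iteration picks up where this window began
    SET @WindowEnd = @WindowStart;
END;
```

Using a half-open range (>= start, < end) means consecutive windows neither overlap nor leave gaps, so no ticket should be loaded twice.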

How to use GROUPING function in SQL common table expression - CTE

I have the below T-SQL CTE code where I'm trying to do some row grouping on four columns, i.e. Product, ItemClassification, Name and Number.
;WITH CTE_FieldData
AS (
SELECT
CASE(GROUPING(M.CodeName))
WHEN 0 THEN M.CodeName
WHEN 1 THEN 'Total'
END AS Product,
CASE(GROUPING(KK.ItemClassification))
WHEN 0 THEN KK.[ItemClassification]
WHEN 1 THEN 'N/A'
END AS [ItemClassification],
CASE(GROUPING(C.[Name]))
WHEN 0 THEN ''
WHEN 1 THEN 'Category - '+ '('+ItemClassification+')'
END AS [Name],
CASE(GROUPING(PYO.Number))
WHEN 0 THEN PYO.Number
WHEN 1 THEN '0'
END AS [Number],
ISNULL(C.[Name],'') AS ItemCode,
MAX(ISNULL(PYO.Unit, '')) AS Unit,
MAX(ISNULL(BT.TypeName, '')) AS [Water Type],
MAX(ISNULL(PYO.OrderTime, '')) AS OrderTime,
MAX(ISNULL(BUA.Event, '')) AS Event,
MAX(ISNULL(PYO.Remarks, '')) AS Remarks,
GROUPING(M.CodeName) AS ProductGrouping,
GROUPING(KK.ItemClassification) AS CategoryGrouping,
GROUPING(C.[Name]) AS ItemGrouping
FROM CTable C INNER JOIN CTableProducts CM ON C.Id = CM.Id
INNER JOIN MyData R ON R.PId = CM.PId
INNER JOIN MyDataDetails PYO ON PYO.CId = C.CId AND PYO.ReportId = R.ReportId
INNER JOIN ItemCategory KK ON C.KId = KK.KId
INNER JOIN Product M ON R.ProductId = M.ProductId
INNER JOIN WaterType BT ON PYO.WId = BT.WId
INNER JOIN WaterUnit BUA ON PYO.WUId = BUA.WUId
WHERE R.ReportId = 4360
GROUP BY M.CodeName, KK.ItemClassification, C.Name, PYO.Number
WITH ROLLUP
)
SELECT
Product,
[Name] AS Category,
Number,
Unit as ItemCode,
[Water Type],
OrderTime,
[Event],
[Comment]
FROM CTE_FieldData
Below are the problems with the data returned by the script above that I'm trying to fix.
1) At the end of each ItemClassification grouping, an extra record is being added even though it does not exist in the table (see lines 4 and 10 in the attached sample query results screenshot).
2) I want the ItemClassification grouping in column 2 to be at the beginning of the group, not at the end. That way, ItemClassification "Category- (One)" would be at line 1 instead of the current line 5, and "Category- (Two)" would be at line 5 instead of the current line 11.
3) Where the ItemClassification is displayed, I would like the columns (Number, ItemCode, [Water Type], [OrderTime], [Event], [Comment]) to display NULL. In the attached sample query results screenshot, those would be rows 11 and 5.
4) The last row (13) is also unwanted.
I'm trying to understand SQL CTEs and the GROUPING function, but I'm not getting things right.
It looks like this is mostly caused by WITH ROLLUP and GROUPING. ROLLUP adds what are essentially subtotal lines for your groupings. On those extra rows, the grouping columns that have been rolled up come back as NULL. You use GROUPING() in conjunction with ROLLUP to then label those NULLs as 'Total' or '0' or 'Category', as your query does.
1) Caused by GROUPING and ROLLUP. Take away both and this should be resolved.
2) Not sure what determines your groups and what would be defined as beginning or end. ORDER BY should suffice.
3) Use ISNULL or CASE WHEN. If the ItemClassification has a non-NULL or non-blank value, NULL each field out.
4) Take off WITH ROLLUP.
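If the per-category header rows are actually wanted, an alternative to dropping ROLLUP entirely is to keep it, filter out the grand-total row, and sort each subtotal row ahead of its detail rows. The sketch below uses a hypothetical three-row table variable, not the tables from the question.

```sql
-- Hypothetical sample data
DECLARE @Items TABLE (Classification varchar(10), ItemName varchar(10), Qty int);

INSERT INTO @Items (Classification, ItemName, Qty)
VALUES ('One', 'A', 1),
       ('One', 'B', 2),
       ('Two', 'C', 3);

SELECT CASE
           WHEN GROUPING(ItemName) = 1 THEN 'Category - (' + Classification + ')'
           ELSE ItemName
       END AS Label,
       CASE
           WHEN GROUPING(ItemName) = 1 THEN NULL   -- blank out values on the header row
           ELSE SUM(Qty)
       END AS Qty
FROM @Items
GROUP BY ROLLUP (Classification, ItemName)
HAVING GROUPING(Classification) = 0                -- drops the grand-total row
ORDER BY Classification,
         GROUPING(ItemName) DESC,                  -- header row first within each category
         ItemName;
```

GROUPING(col) returns 1 only on rows where that column was rolled up, which is what lets the same function drive the label, the NULL-ing of detail columns, and the sort order.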

TSQL Select Statement using Case or Join

I am a little stuck on a situation that I have been trying to fight through. I have a page that allows a user to select all the filter options they want to search by and then it runs the query on that data.
Every field requires something to be picked but on a new field I am introducing, it's going to be optional.
It allows you to provide a list of supervisors and it will then return all records where the agent's supervisor is in the list provided; pretty straightforward. However, I am trying to make this optional as I don't want to always search by users. If I don't provide a name in the UI to pass to the stored procedure, then I want to ignore this part of the statement and get everything regardless of the manager.
Here is the query I am working with:
SELECT a.[escID],
a.[escReasonID],
b.[ArchibusLocationName],
c.[ArchibusLocationName],
b.[DepartmentDesc],
c.[DepartmentDesc],
a.[escCreatedBy],
a.[escWorkedBy],
a.[escNotes],
a.[preventable],
a.[escalationCreated],
a.[escalationTracked],
a.[feedbackID],
typ.[EscalationType],
typ.[EscalationTypeText] AS escalationType,
d.reasonText AS reasonText
FROM [red].[dbo].[TFS_Escalations] AS a
LEFT OUTER JOIN
red.dbo.EmployeeTable AS b
ON a.escCreatedBy = b.QID
LEFT OUTER JOIN
red.dbo.EmployeeTable AS c
ON a.escWorkedBy = c.QID
LEFT OUTER JOIN
red.dbo.TFS_Escalation_Reasons AS d
ON a.escReasonID = d.ReasonID
INNER JOIN
dbo.TFS_EscalationTypes AS typ
ON d.escType = typ.EscalationType
WHERE B.[ArchibusLocationName] IN (SELECT location
FROM #tmLocations)
AND C.[ArchibusLocationName] IN (SELECT location
FROM #subLocations)
AND B.[DepartmentDesc] IN (SELECT department
FROM #tmDepartments)
AND C.[DepartmentDesc] IN (SELECT department
FROM #subDepartments)
AND DATEDIFF(second, '19700101', CAST (CONVERT (DATETIME, A.[escalationCreated], 121) AS INT)) >= @startDate
AND DATEDIFF(second, '19700101', CAST (CONVERT (DATETIME, A.[escalationCreated], 121) AS INT)) <= @endDate
AND a.[PREVENTABLE] IN (SELECT PREVENTABLE FROM #preventable)
AND b.MgrQID IN (SELECT leaderQID FROM #sourceLeaders)
The part that I am trying to make optional is the very last line of the query:
AND b.MgrQID IN (SELECT leaderQID FROM #sourceLeaders)
Essentially, if there is no data in the temp table #sourceLeaders then it should ignore that piece of the query.
In all of the other instances of the WHERE clause, something is always required for those fields, which is why that all works fine. I just can't figure out the best way to make this piece optional depending on whether the temp table has data in it (the temp table is populated by the names entered in the UI that a user COULD search by).
So this line should be TRUE if something matches data in the temp table OR there is nothing in the temp table:
AND
(
b.MgrQID IN (SELECT leaderQID FROM #sourceLeaders)
OR
NOT EXISTS (SELECT 1 FROM #sourceLeaders)
)
Similar to Nick.McDermaid's, but uses a case expression instead:
AND
(
1 = CASE WHEN NOT EXISTS(SELECT 1 FROM #sourceLeaders) THEN 1
WHEN b.MgrQID IN (SELECT leaderQID FROM #sourceLeaders) THEN 1
ELSE 0
END
)
Maybe do it at the top so you have a single check:
DECLARE @EmptySourceLeaders CHAR(1)
IF EXISTS (SELECT 1 FROM #sourceLeaders)
SET @EmptySourceLeaders = 'N'
ELSE
SET @EmptySourceLeaders = 'Y'
Then in the joins:
LEFT OUTER JOIN #sourceLeaders SL
ON b.MgrQID = SL.leaderQID
Then in the WHERE:
AND (@EmptySourceLeaders = 'Y' OR SL.leaderQID IS NOT NULL)
There are lots of ways to do it.
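To see the optional-filter pattern behave both ways, here is a small self-contained sketch. The @Employees table variable and its rows are hypothetical; only #sourceLeaders, leaderQID, QID and MgrQID come from the question.

```sql
-- Hypothetical stand-ins for the real tables
CREATE TABLE #sourceLeaders (leaderQID varchar(10));
DECLARE @Employees TABLE (QID varchar(10), MgrQID varchar(10));

INSERT INTO @Employees (QID, MgrQID)
VALUES ('E1', 'M1'),
       ('E2', 'M2'),
       ('E3', 'M1');

-- Filter table is empty: NOT EXISTS is true, so every employee is returned
SELECT e.QID
FROM @Employees AS e
WHERE e.MgrQID IN (SELECT leaderQID FROM #sourceLeaders)
   OR NOT EXISTS (SELECT 1 FROM #sourceLeaders);

INSERT INTO #sourceLeaders (leaderQID) VALUES ('M1');

-- Filter table now has a row: only employees reporting to M1 (E1, E3) are returned
SELECT e.QID
FROM @Employees AS e
WHERE e.MgrQID IN (SELECT leaderQID FROM #sourceLeaders)
   OR NOT EXISTS (SELECT 1 FROM #sourceLeaders);

DROP TABLE #sourceLeaders;
```

The same WHERE clause runs unchanged in both cases, which is the point of the pattern: the stored procedure does not need a separate code path when no supervisors are supplied.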

SQL WHERE clause performance degradation

I have a query that I am working on and it is displaying performance issues that I would not have expected. Here is the query so far.
INSERT INTO #Bridge (PolicyNumber, ProducerCode, BridgeDate, EffectiveDate, FirstName, LastName, LicenseNumber, BirthDate, Address, City, State, ZipCode)
SELECT tab.col.value('@PolicyNumber', 'VARCHAR(10)') AS PolicyNumber,
tab.col.value('@ProducerCode','VARCHAR(10)') as ProducerCode,
tab.col.value('@BridgeDate','DATETIME') AS BridgeDate,
tab.col.value('@EffectiveDate', 'DATETIME') as EffectiveDate,
tab.col.value('@FirstName', 'VARCHAR(200)') as FirstName,
tab.col.value('@LastName', 'VARCHAR(200)') as LastName,
CASE
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%0000%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%1111%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%2222%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%3333%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%4444%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%5555%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%6666%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%7777%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%8888%' THEN NULL
WHEN tab.col.value('@LicenseNumber','VARCHAR(50)') LIKE '%9999%' THEN NULL
ELSE tab.col.value('@LicenseNumber','VARCHAR(50)')
END as LicenseNumber,
tab.col.value('@BirthDate','DATETIME') as BirthDate,
REPLACE(tab.col.value('@Address1','VARCHAR(300)'), ' APT ',' #') as Address1,
tab.col.value('@City','VARCHAR(300)') as City,
tab.col.value('@State','VARCHAR(5)') as State,
tab.col.value('@ZipCode','VARCHAR(10)') as Zip
FROM @xml.nodes('//rows/datarow') as tab(col)
SELECT B.PolicyNumber,
B.ProducerCode,
B.BridgeDate,
B.EffectiveDate,
H.current_policy,
H.cancel_date,
H.first_eff_date,
H.display_address,
H.city,
H.state,
H.zip
FROM #Bridge B
LEFT JOIN (
SELECT P.policy_id,
P.current_policy,
CASE
WHEN A.pobox <> '' THEN 'PO BOX ' + REPLACE(A.pobox,'PO BOX ','')
ELSE RTRIM(A.house_num + ' ' + A.street_name + ' ' + CASE
WHEN A.apt_num = '' THEN ''
ELSE '#' + A.apt_num
END)
END as display_address,
A.pobox,
A.house_num,
A.street_name,
A.apt_num,
A.city,
MAX(A.policyimage_num) as policimage_num, --this is just to limit the results to the most recent
S.state,
A.zip,
P.first_eff_date,
P.cancel_date
FROM Diamond.dbo.Policy P WITH (NOLOCK)
LEFT JOIN Diamond.dbo.Address A WITH (NOLOCK)
ON P.policy_id = A.policy_id
AND A.nameaddresssource_id = 3
LEFT JOIN Diamond.dbo.State S WITH (NOLOCK)
ON A.state_id = S.state_id
WHERE A.state_id IS NOT NULL
AND P.current_policy NOT IN (SELECT PolicyNumber FROM #Bridge)
GROUP BY P.policy_id,
P.current_policy,
P.cancel_date,
P.first_eff_date,
A.pobox,
A.house_num,
A.street_name,
A.apt_num,
A.city,
S.state,
A.zip) AS H
ON B.Address = H.display_address
AND B.State = H.state
AND B.City = H.city
AND SUBSTRING(B.ZipCode,1,5) = SUBSTRING(H.Zip,1,5)
AND B.PolicyNumber != H.current_policy
WHERE H.current_policy IS NOT NULL
This query, run by itself, finishes in about a minute and a half (1:30). But if I add the following to the WHERE clause:
AND B.EffectiveDate != H.first_eff_date
Suddenly the query takes far longer to return results. (We are at over 15 minutes and still going while I am writing this.) I would think that simply having a clause to weed out a few additional rows wouldn't have such a drastic effect, but apparently it does. I know how to get around it; I am just curious whether anyone has any ideas as to why it has this effect.
Without hands-on access I can only guess at this, but here are some places I think you can tidy up and probably shave off run time.
1, You duplicate the effort required to make sure policy numbers don't match. Pick one of the two you have, not both. I would suggest trying each to see which is faster.
i.e. this:
AND P.current_policy NOT IN (SELECT PolicyNumber FROM #Bridge)
Will do the same as this, you don't need both.
AND B.PolicyNumber != H.current_policy
2, It's worth a try to remove all that grouping from your sub query - you don't actually use policimage_num for anything. So why do the grouping? If you are worried that many rows are returned from Address, then you can use DISTINCT on your column set instead, that may be faster.
3, Is A.state_id a nullable value? If not consider trying an INNER JOIN to Address and removing the null check.
4, In all honesty I'm not seeing an obvious reason for that subquery at all, it seems to be over-complicating matters. Can you not simply join the tables together without it (again using DISTINCT if required)?
In other words get tweaking, I bet you can get it below the original run time if you try a few of these ideas.

Resources