SQL Server: LAG with multiple NULLs

I am trying to "fill in the blanks" with my table below.
I have created a PDF reader that spits out a JSON file, which I have manipulated into the format below.
There is very little scope for changing the source data or for using a secondary helper table, because this is meant to handle new data that hasn't been "seen" before.
What I need help with is filling in the Class and Group columns. The Class column always has a value in the last row, and I need that value repeated upwards until it reaches a non-blank value in the column. That value then needs to be repeated upwards until the next non-blank value, and so on.
Similarly, the Group column needs the same treatment, but working from the first row downwards.
I have tried LAG(), LEAD(), etc. with default values, but they don't handle runs of multiple NULLs.
I also need the Group column to show the Class value when Class is not blank.
I have had a look at CTEs but I'm not overly familiar with them, and I have tied myself in knots today!
Any help is appreciated.
Current Data
ID, Class, Group, Total, Account
1, null, INCOME, null, Fencing
2, null, null, null, Crop
3, Net Income, null, null, Net Income
4, null, Farm Expenditure, null, Irrigation
5, null, null, null, electricity
6, Surplus, null, null, Surplus
7, null, GST, null, GST
8, Closing Balance, null, null, Closing Balance
What I want
ID, Class, Group, Total, Account
1, Net Income, INCOME, null, Fencing
2, Net Income, INCOME, null, Crop
3, Net Income, INCOME, null, Net Income
4, Surplus, Farm Expenditure, null, Irrigation
5, Surplus, Farm Expenditure, null, electricity
6, Surplus, Farm Expenditure, null, Surplus
7, Closing Balance, GST, null, GST
8, Closing Balance, GST, null, Closing Balance

This gives you the output you want with the data you provided. In your sample data there is only one "group" in each "class", but it looks like there could be multiple groups per class. If that is the case, it will be a bit more complicated, but the principle will be the same.
CREATE TABLE #data (ID INT, Class VARCHAR(50), [Group] VARCHAR(50), Total INT, Account VARCHAR(50));
INSERT INTO #data(ID, Class, [Group], Total, Account) VALUES
(1, null, 'INCOME', null, 'Fencing'),
(2, null, null, null, 'Crop'),
(3, 'Net Income', null, null, 'Net Income'),
(4, null, 'Farm Expenditure', null, 'Irrigation'),
(5, null, null, null, 'electricity'),
(6, 'Surplus', null, null, 'Surplus'),
(7, null, 'GST', null, 'GST'),
(8, 'Closing Balance', null, null, 'Closing Balance');
-- Find the break points that signify the end of a Class
WITH breaks as(
SELECT IIF(Class IS NOT NULL, 1, 0) AS breakpoint, ID, Class, [Group], Total, Account
FROM #data
),
-- count the breakpoints passed so each group will have a number we can group by
grp AS (
SELECT ISNULL(SUM(breakpoint) OVER (ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),0) AS grp,
ID,Class,[Group],Total,Account
FROM breaks
)
SELECT MAX(grp.Class) OVER (PARTITION BY grp.grp) AS Class,
MAX(grp.[Group]) OVER (PARTITION BY grp.grp) AS [Group],
grp.Total,
grp.Account
FROM grp
ORDER BY grp.ID;
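To check the running-count idea outside SQL Server, here is a minimal sketch that runs the same query against SQLite through Python's built-in sqlite3 module. It assumes SQLite 3.25 or later, the version that added window functions. ISNULL and IIF are swapped for the portable COALESCE and CASE, a table called data stands in for #data, the second CTE is renamed numbered, and the output columns are called filled_class and filled_group. Otherwise the steps match the answer.

```python
import sqlite3

# Sample rows from the question: (ID, Class, Group, Total, Account)
ROWS = [
    (1, None, "INCOME", None, "Fencing"),
    (2, None, None, None, "Crop"),
    (3, "Net Income", None, None, "Net Income"),
    (4, None, "Farm Expenditure", None, "Irrigation"),
    (5, None, None, None, "electricity"),
    (6, "Surplus", None, None, "Surplus"),
    (7, None, "GST", None, "GST"),
    (8, "Closing Balance", None, None, "Closing Balance"),
]

QUERY = """
WITH breaks AS (
    -- 1 marks a row whose Class value closes a block
    SELECT CASE WHEN Class IS NOT NULL THEN 1 ELSE 0 END AS breakpoint,
           ID, Class, "Group", Account
    FROM data
),
numbered AS (
    -- count the breakpoints strictly before this row, so a closing row
    -- shares its block number with the blank rows above it
    SELECT COALESCE(SUM(breakpoint) OVER (
               ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS grp,
           ID, Class, "Group", Account
    FROM breaks
)
SELECT ID,
       MAX(Class)   OVER (PARTITION BY grp) AS filled_class,
       MAX("Group") OVER (PARTITION BY grp) AS filled_group,
       Account
FROM numbered
ORDER BY ID
"""


def fill_blanks(rows):
    """Load the rows into an in-memory SQLite table and run the fill query."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE data (ID INTEGER, Class TEXT, "Group" TEXT, '
                'Total INTEGER, Account TEXT)')
    con.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


filled = fill_blanks(ROWS)
for row in filled:
    print(row)
```

The frame stops at 1 PRECEDING because each row with a Class value is the last member of its block. The closing row counts the breakpoints above it, not its own.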

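Thanks to James, I have tweaked his code to give the correct Group, as Class and Group need different logic. A Class value closes a block, so its running count excludes the current row. A Group value opens a block, so its running count includes the current row.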
-- Find the break points that signify the end of a Class
WITH classbreaks as(
SELECT IIF(Class IS NOT NULL, 1, 0) AS breakpoint, ID, Class, [Group], Total, Account
FROM [PedGroup_db].[dbo].[Cashflow]
),
-- Find the break points that signify the end of a Group
grpbreaks as(
SELECT IIF([Group] IS NOT NULL, 1, 0) AS breakpoint, ID, Class, [Group], Total, Account
FROM [PedGroup_db].[dbo].[Cashflow]
),
-- count the breakpoints passed so each class will have a number we can group by
clss AS (
SELECT ISNULL(SUM(breakpoint) OVER (ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),0) AS grp,
ID,Class,[Group],Total,Account
FROM classbreaks
),
-- count the breakpoints passed so each group will have a number we can group by
grp AS (
SELECT ISNULL(SUM(breakpoint) OVER (ORDER BY ID),0) AS grp,
ID,Class,[Group],Total,Account
FROM grpbreaks
)
--join the two subqueries together on ID
SELECT MAX(clss.Class) OVER (PARTITION BY clss.grp) AS Class,
CASE WHEN clss.Class = grp.Account THEN grp.Account ELSE MAX(grp.[Group]) OVER (PARTITION BY grp.grp) END AS [Group],
grp.Total,
grp.Account
FROM clss LEFT JOIN grp ON grp.ID = clss.ID
ORDER BY clss.ID;
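Here is a sketch of the tweaked logic in the same SQLite-via-Python setup, again assuming SQLite 3.25+. It computes both running counts in a single pass instead of two CTEs joined on ID. It also fills Group from Class using `Class IS NOT NULL`, which is how the requirement is worded in the question. The code above compares Class with Account instead, and both rules give the same result on this sample because those values coincide.

```python
import sqlite3

# Sample rows from the question: (ID, Class, Group, Total, Account)
ROWS = [
    (1, None, "INCOME", None, "Fencing"),
    (2, None, None, None, "Crop"),
    (3, "Net Income", None, None, "Net Income"),
    (4, None, "Farm Expenditure", None, "Irrigation"),
    (5, None, None, None, "electricity"),
    (6, "Surplus", None, None, "Surplus"),
    (7, None, "GST", None, "GST"),
    (8, "Closing Balance", None, None, "Closing Balance"),
]

QUERY = """
WITH marks AS (
    SELECT ID, Class, "Group", Account,
           CASE WHEN Class   IS NOT NULL THEN 1 ELSE 0 END AS class_break,
           CASE WHEN "Group" IS NOT NULL THEN 1 ELSE 0 END AS group_break
    FROM data
),
numbered AS (
    SELECT ID, Class, "Group", Account,
           -- a Class value closes its block: count breaks before this row only
           COALESCE(SUM(class_break) OVER (
               ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS class_grp,
           -- a Group value opens its block: include the current row in the count
           SUM(group_break) OVER (
               ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS group_grp
    FROM marks
)
SELECT ID,
       MAX(Class) OVER (PARTITION BY class_grp) AS filled_class,
       -- a row that carries its own Class shows that value in Group
       CASE WHEN Class IS NOT NULL THEN Class
            ELSE MAX("Group") OVER (PARTITION BY group_grp) END AS filled_group,
       Account
FROM numbered
ORDER BY ID
"""


def fill_class_and_group(rows):
    """Run the combined Class/Group fill against an in-memory SQLite table."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE data (ID INTEGER, Class TEXT, "Group" TEXT, '
                'Total INTEGER, Account TEXT)')
    con.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


filled = fill_class_and_group(ROWS)
for row in filled:
    print(row)
```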

Related

SQL Server 2016 Compare values from multiple columns in multiple rows in single table

USE dev_db
GO
CREATE TABLE T1_VALS
(
[SITE_ID] [int] NULL,
[LATITUDE] [numeric](10, 6) NULL,
[UNIQUE_ID] [int] NULL,
[COLLECT_RANK] [int] NULL,
[CREATED_RANK] [int] NULL,
[UNIQUE_ID_RANK] [int] NULL,
[UPDATE_FLAG] [int] NULL
)
GO
INSERT INTO T1_VALS
(SITE_ID,LATITUDE,UNIQUE_ID,COLLECT_RANK,CREATED_RANK,UNIQUE_ID_RANK)
VALUES
(207442,40.900470,59664,1,1,1),
(207442,40.900280,61320,1,1,2),
(204314,40.245220,48685,1,2,2),
(204314,40.245910,59977,1,1,1),
(202416,39.449530,9295,1,1,2),
(202416,39.449680,62264,1,1,1);
I generated the COLLECT_RANK and CREATED_RANK columns from two date columns (not shown here) and the UNIQUE_ID_RANK column from the UNIQUE_ID column shown here.
I used ranking functions with an OVER clause to generate these columns. A _RANK value of 1 means the latest date or the greatest UNIQUE_ID value. I thought my solution would be pretty straightforward using these rank values via array and cursor processing, but I seem to have painted myself into a corner.
My problem: I need to choose the LATITUDE value and its UNIQUE_ID based on the following business rules, and set the update value (1) in the UPDATE_FLAG column for that record:
(1) Select the record with the most recent Collection Date (COLLECT_RANK = 1) for a given SITE_ID.
(2) If multiple records have the same Collection Date (i.e. the same rank value), select the record with the most recent Created Date (CREATED_RANK = 1) for that SITE_ID.
(3) If multiple records have the same Created Date, select the record with the highest UNIQUE_ID (UNIQUE_ID_RANK = 1) for that SITE_ID.
Your suggestions would be most appreciated.
I think you can use TOP and ORDER BY:
select top 1 t.*
from t1_vals t
order by collect_rank asc, created_rank asc, unique_id desc;
If you want one row per site, which might be what your question is asking, use row_number():
select t.*
from (select v.*,
row_number() over (partition by site_id order by collect_rank asc, created_rank asc, unique_id desc) as seqnum
from t1_vals v
) t
where seqnum = 1;
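The row_number() query identifies the winning rows, but it doesn't set UPDATE_FLAG. Here is a minimal sketch of that last step against SQLite via Python's sqlite3, assuming SQLite 3.25+ for ROW_NUMBER(). SQLite's rowid identifies the winning rows. In SQL Server, the usual equivalent is to update through a CTE, for example `WITH ranked AS (SELECT UPDATE_FLAG, ROW_NUMBER() OVER (...) AS seqnum FROM T1_VALS) UPDATE ranked SET UPDATE_FLAG = 1 WHERE seqnum = 1;`.

Like the answer, this sketch breaks the final tie on UNIQUE_ID DESC rather than on UNIQUE_ID_RANK. In the sample data, site 207442 has UNIQUE_ID_RANK = 1 on 59664 even though 61320 is higher, so the two orderings pick different rows for that site.

```python
import sqlite3

# Sample rows from the question:
# (SITE_ID, LATITUDE, UNIQUE_ID, COLLECT_RANK, CREATED_RANK, UNIQUE_ID_RANK)
ROWS = [
    (207442, 40.900470, 59664, 1, 1, 1),
    (207442, 40.900280, 61320, 1, 1, 2),
    (204314, 40.245220, 48685, 1, 2, 2),
    (204314, 40.245910, 59977, 1, 1, 1),
    (202416, 39.449530, 9295, 1, 1, 2),
    (202416, 39.449680, 62264, 1, 1, 1),
]

# Flag the first row per site: latest collection, then latest creation,
# then highest UNIQUE_ID.
FLAG_QUERY = """
UPDATE t1_vals
SET update_flag = 1
WHERE rowid IN (
    SELECT rid
    FROM (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (
                   PARTITION BY site_id
                   ORDER BY collect_rank, created_rank, unique_id DESC
               ) AS seqnum
        FROM t1_vals
    )
    WHERE seqnum = 1
)
"""


def flag_latest_per_site(rows):
    """Return (site_id, unique_id) for every row whose flag was set."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t1_vals (site_id INTEGER, latitude REAL, unique_id INTEGER, "
        "collect_rank INTEGER, created_rank INTEGER, unique_id_rank INTEGER, "
        "update_flag INTEGER)"
    )
    con.executemany(
        "INSERT INTO t1_vals (site_id, latitude, unique_id, collect_rank, "
        "created_rank, unique_id_rank) VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    con.execute(FLAG_QUERY)
    flagged = con.execute(
        "SELECT site_id, unique_id FROM t1_vals WHERE update_flag = 1 ORDER BY site_id"
    ).fetchall()
    con.close()
    return flagged


flagged = flag_latest_per_site(ROWS)
print(flagged)
```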

T-SQL prepare dynamic COALESCE

As shown in the attached screenshots (not reproduced here), there are two tables: Configuration and Detail.
Using the Configuration and Detail tables, I would like to populate the IdentificationType and IDerivedIdentification columns in the Detail table.
The following logic should be used when deriving these columns:
The Configuration table holds an order of preference, which the user can change dynamically. For example, if the country is Austria, the ID preference should be LEI, then TIN if LEI is blank, then CONCAT if both are blank, which uses some other logic.
In the case of contact ID = 3, the country is BG, so LEI should be checked first; since it is NULL, CCPT = 456 will be picked.
I could have used COALESCE and CASE statements if hardcoding were allowed.
Can you please suggest an alternative approach?
Regards
Digant
Assuming that this is some horrendous data dump and you are trying to clean it up, here is some SQL to throw at it. :) First, I was able to capture the text from your image via Adobe Acrobat > Excel.
(I also built the schema for you at: http://sqlfiddle.com/#!6/8f404/12)
The correct thing to do is to fix the glaring problem, which is the table structure. Assuming you can't, here's a solution.
The query unpivots the LEI, NIND, CCPT and TIN columns from the Detail table, as well as the FirstPref, SecondPref and ThirdPref columns from the Configuration table. Doing this effectively normalizes the data, although it will cost you significant performance if you cannot fix the data structure or have no plans to.
After that, you simply join Detail.ContactId to DerivedTypes.ContactId, then DerivedPrefs.ISOCountryCode to Detail.CountrylSOCountryCode and DerivedTypes.ldentificationType to DerivedPrefs.ldentificationType.
If you use an inner join rather than the left join, you can remove the RANK() function, but the query will then not show all ContactIds, only those that have a value in their LEI, NIND, CCPT or TIN columns. I think that's a better solution anyway: why would you want to see an error mixed into a report? Write a separate report for those with no values in those columns.
Lastly, TOP (1) WITH TIES lets you display one record per ContactId while still displaying the record with the error. Hope this helps.
CREATE TABLE Configuration
(ISOCountryCode varchar(2), CountryName varchar(8), FirstPref varchar(6), SecondPref varchar(6), ThirdPref varchar(6))
;
INSERT INTO Configuration
(ISOCountryCode, CountryName, FirstPref, SecondPref, ThirdPref)
VALUES
('AT', 'Austria', 'LEI', 'TIN', 'CONCAT'),
('BE', 'Belgium', 'LEI', 'NIND', 'CONCAT'),
('BG', 'Bulgaria', 'LEI', 'CCPT', 'CONCAT'),
('CY', 'Cyprus', 'LEI', 'NIND', 'CONCAT')
;
CREATE TABLE Detail
(ContactId int, FirstName varchar(1), LastName varchar(3), BirthDate varchar(4), CountrylSOCountryCode varchar(2), Nationality varchar(2), LEI varchar(9), NIND varchar(9), CCPT varchar(9), TIN varchar(9))
;
INSERT INTO Detail
(ContactId, FirstName, LastName, BirthDate, CountrylSOCountryCode, Nationality, LEI, NIND, CCPT, TIN)
VALUES
(1, 'A', 'DES', NULL, 'AT', 'AT', '123', '4345', NULL, NULL),
(2, 'B', 'DEG', NULL, 'BE', 'BE', NULL, '890', NULL, NULL),
(3, 'C', 'DEH', NULL, 'BG', 'BG', NULL, '123', '456', NULL),
(4, 'D', 'DEi', NULL, 'BG', 'BG', NULL, NULL, NULL, NULL)
;
SELECT TOP (1) with ties Detail.ContactId,
FirstName,
LastName,
BirthDate,
CountrylSOCountryCode,
Nationality,
LEI,
NIND,
CCPT,
TIN,
ISNULL(DerivedPrefs.ldentificationType, 'ERROR') ldentificationType,
IDerivedIdentification,
RANK() OVER (PARTITION BY Detail.ContactId ORDER BY
CASE WHEN Pref = 'FirstPref' THEN 1
WHEN Pref = 'SecondPref' THEN 2
WHEN Pref = 'ThirdPref' THEN 3
ELSE 99 END) AS PrefRank
FROM
Detail
LEFT JOIN
(
SELECT
ContactId,
LEI,
NIND,
CCPT,
TIN
FROM Detail
) DetailUNPVT
UNPIVOT
(IDerivedIdentification FOR ldentificationType IN
(LEI, NIND, CCPT, TIN)
)AS DerivedTypes
ON DerivedTypes.ContactId = Detail.ContactId
LEFT JOIN
(
SELECT
ISOCountryCode,
CountryName,
FirstPref,
SecondPref,
ThirdPref
FROM
Configuration
) ConfigUNPIVOT
UNPIVOT
(ldentificationType FOR Pref IN
(FirstPref, SecondPref, ThirdPref)
)AS DerivedPrefs
ON DerivedPrefs.ISOCountryCode = Detail.CountrylSOCountryCode
and DerivedTypes.ldentificationType = DerivedPrefs.ldentificationType
ORDER BY RANK() OVER (PARTITION BY Detail.ContactId ORDER BY
CASE WHEN Pref = 'FirstPref' THEN 1
WHEN Pref = 'SecondPref' THEN 2
WHEN Pref = 'ThirdPref' THEN 3
ELSE 99 END)
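For comparison, here is a minimal sketch of the same preference lookup in SQLite via Python's sqlite3. SQLite has no UNPIVOT, so both unpivots are written as UNION ALL. ROW_NUMBER() (SQLite 3.25+) then picks the best-ranked match, and a contact with no usable ID falls back to 'ERROR', as in the answer.

The tables are trimmed copies of the answer's schema, with a hypothetical CountryCode column standing in for CountrylSOCountryCode. CONCAT is a preference value rather than a column, so it never matches here, and the question's "some other logic" for that case is left out.

```python
import sqlite3

# Trimmed copies of the answer's tables. CountryCode is a hypothetical stand-in
# for CountrylSOCountryCode, and name/birth-date columns are omitted.
SCHEMA = """
CREATE TABLE Configuration (ISOCountryCode TEXT, FirstPref TEXT, SecondPref TEXT, ThirdPref TEXT);
CREATE TABLE Detail (ContactId INTEGER, CountryCode TEXT, LEI TEXT, NIND TEXT, CCPT TEXT, TIN TEXT);
INSERT INTO Configuration VALUES
    ('AT', 'LEI', 'TIN', 'CONCAT'), ('BE', 'LEI', 'NIND', 'CONCAT'),
    ('BG', 'LEI', 'CCPT', 'CONCAT'), ('CY', 'LEI', 'NIND', 'CONCAT');
INSERT INTO Detail VALUES
    (1, 'AT', '123', '4345', NULL, NULL), (2, 'BE', NULL, '890', NULL, NULL),
    (3, 'BG', NULL, '123', '456', NULL), (4, 'BG', NULL, NULL, NULL, NULL);
"""

QUERY = """
WITH detail_ids AS (   -- unpivot the ID columns, skipping NULLs
    SELECT ContactId, 'LEI' AS IdType, LEI AS IdValue FROM Detail WHERE LEI IS NOT NULL
    UNION ALL SELECT ContactId, 'NIND', NIND FROM Detail WHERE NIND IS NOT NULL
    UNION ALL SELECT ContactId, 'CCPT', CCPT FROM Detail WHERE CCPT IS NOT NULL
    UNION ALL SELECT ContactId, 'TIN',  TIN  FROM Detail WHERE TIN  IS NOT NULL
),
prefs AS (             -- unpivot the preference columns into (country, type, rank)
    SELECT ISOCountryCode, FirstPref AS IdType, 1 AS PrefRank FROM Configuration
    UNION ALL SELECT ISOCountryCode, SecondPref, 2 FROM Configuration
    UNION ALL SELECT ISOCountryCode, ThirdPref,  3 FROM Configuration
),
candidates AS (        -- ID values allowed for the contact's country, best first
    SELECT i.ContactId, i.IdType, i.IdValue,
           ROW_NUMBER() OVER (PARTITION BY i.ContactId ORDER BY p.PrefRank) AS rn
    FROM detail_ids AS i
    JOIN Detail AS d ON d.ContactId = i.ContactId
    JOIN prefs  AS p ON p.ISOCountryCode = d.CountryCode AND p.IdType = i.IdType
)
SELECT d.ContactId,
       COALESCE(c.IdType, 'ERROR') AS IdentificationType,
       c.IdValue AS DerivedIdentification
FROM Detail AS d
LEFT JOIN candidates AS c ON c.ContactId = d.ContactId AND c.rn = 1
ORDER BY d.ContactId
"""


def build_db():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    return con


def derive_ids(con):
    """Return (ContactId, IdentificationType, DerivedIdentification) rows."""
    return con.execute(QUERY).fetchall()


con = build_db()
print(derive_ids(con))
```

Because the preference order lives in Configuration, changing a country's FirstPref, SecondPref or ThirdPref changes the result without touching the query.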

Min date for a guest across all their accounts

So let me explain our setup first. We have two tables on SQL Server 2012 we are looking at:
PL_Guest and PL_MergedGuests
PL_Guest structure is as follows:
Create Table PL_Guest(
GuestID [int] IDENTITY(1,1) NOT NULL Primary Key,
CreatedDate [date] NOT NULL)
PL_MergedGuests structure is as follows:
Create Table PL_MergedGuests(
MergeID [int] IDENTITY(1,1) NOT NULL Primary Key,
VictimID [int] NOT NULL,
SurvivorID [int] NOT NULL)
So the situation is this:
The PL_Guest table keeps a record of every guest ever created and never has any records removed from it. The PL_MergedGuests table houses a list of VictimID and SurvivorID pairs, recorded when we merge two accounts together. We do this if, for instance, a person got more than one GuestID assigned to them for some reason. When the merge is done, the front-line employee just picks which of the multiple accounts the person will keep. This is usually based on the card the guest has in hand, so as to involve fewer changes for the guest. If a guest had GuestID 5 with CreatedDate 1/1/2013 and GuestID 10 with CreatedDate 10/1/2015, and the merge was done so that GuestID 5 was merged into GuestID 10, then GuestID 5 becomes the victim and GuestID 10 becomes the survivor. When we run reports we only look at survivor accounts. However, we are being asked to find the oldest CreatedDate for each guest. So for the above example, they would want an entry that returns GuestID 10 with CreatedDate 1/1/2013, because the guest who has GuestID 10 also had GuestID 5, which had the older CreatedDate of 1/1/2013.
Now for the really hard part: there is no limit to the number of times a guest could have been merged, and these tables hold over 100 million records each. I was thinking this would require some kind of looping (I think this might be referred to as recursive code, though I am unsure of that), but I am at a loss for how to write that code. I do have access to create new tables if that will help, but I cannot modify the current tables.
Due to the lack of "real" examples, I defined several sample guests and merges myself. I used a recursive CTE to evaluate the desired "min creation date". I don't know how fast or slow this will be on your tables, but at least it should provide a usable starting point for further development:
DECLARE @PL_Guest TABLE(
GuestID INT NOT NULL,
CreatedDate [date] NOT NULL
)
DECLARE @PL_MergedGuests TABLE(
MergeID INT NOT NULL,
VictimID [int] NOT NULL,
SurvivorID [int] NOT NULL
)
INSERT INTO @PL_Guest
VALUES (1, '2016-11-01'), (2, '2016-12-01'), (3, '2016-11-01'), (4, '2016-12-01'), (5, '2017-01-01'), (6, '2017-01-01'), (7, '2017-02-01'), (8, '2017-02-01'), (9, '2017-03-01'), (10, '2017-04-01');
INSERT INTO @PL_MergedGuests
VALUES (1, 3, 4), (2, 4, 6), (3, 9, 6), (4, 10, 2), (5, 8, 5);
WITH cteRecursive AS(
SELECT mg1.SurvivorID, mg1.VictimID, 1 AS lvl, mg1.SurvivorID AS LastSurvivor, pg1.CreatedDate AS LastSurvivorCreatedDate
FROM @PL_MergedGuests mg1
JOIN @PL_Guest pg1 ON pg1.GuestID = mg1.SurvivorID
UNION ALL
SELECT mg2.SurvivorID, mg2.VictimID, c.lvl + 1 AS lvl, c.LastSurvivor, c.LastSurvivorCreatedDate
FROM @PL_MergedGuests mg2
JOIN cteRecursive c ON mg2.SurvivorID = c.VictimID
),
cteGrouped AS(
SELECT LastSurvivor, LastSurvivorCreatedDate, MIN(CreatedDate) AS MinCreatedDate
FROM cteRecursive
JOIN @PL_Guest AS pg ON pg.GuestID = VictimID
WHERE LastSurvivor NOT IN (SELECT VictimID FROM @PL_MergedGuests AS pmg)
GROUP BY LastSurvivor, LastSurvivorCreatedDate
UNION ALL
SELECT GuestID, CreatedDate, CreatedDate
FROM @PL_Guest pg
WHERE GuestID NOT IN (SELECT VictimID FROM @PL_MergedGuests UNION ALL SELECT SurvivorID FROM @PL_MergedGuests)
)
SELECT LastSurvivor, IIF(MinCreatedDate < LastSurvivorCreatedDate, MinCreatedDate, LastSurvivorCreatedDate) AS MinCreatedDate
FROM cteGrouped cg
ORDER BY LastSurvivor
OPTION (MAXRECURSION 0);
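To make the chain-walking idea concrete, here is a small Python sketch that follows VictimID-to-SurvivorID links to each guest's final survivor and keeps the earliest CreatedDate per survivor. This is a different technique from the recursive CTE, a dictionary walk rather than set-based SQL. It is only meant to show the logic on the sample data, not to process 100 million rows, and it assumes each account is merged away at most once.

```python
from datetime import date


def min_created_dates(created, merges):
    """Map each surviving GuestID to the earliest CreatedDate among all
    accounts merged into it, directly or through a chain of merges.

    created: dict of GuestID -> CreatedDate
    merges:  iterable of (VictimID, SurvivorID) pairs; assumes a victim is
             merged at most once
    """
    survivor_of = {victim: survivor for victim, survivor in merges}

    def final_survivor(guest):
        # Follow victim -> survivor links until reaching an account that was
        # never merged away; stop on a cycle instead of looping forever.
        seen = set()
        while guest in survivor_of:
            if guest in seen:
                raise ValueError(f"merge cycle involving GuestID {guest}")
            seen.add(guest)
            guest = survivor_of[guest]
        return guest

    earliest = {}
    for guest, created_on in created.items():
        root = final_survivor(guest)
        if root not in earliest or created_on < earliest[root]:
            earliest[root] = created_on
    return earliest


# Same sample data as the recursive-CTE answer above.
CREATED = {
    1: date(2016, 11, 1), 2: date(2016, 12, 1), 3: date(2016, 11, 1),
    4: date(2016, 12, 1), 5: date(2017, 1, 1), 6: date(2017, 1, 1),
    7: date(2017, 2, 1), 8: date(2017, 2, 1), 9: date(2017, 3, 1),
    10: date(2017, 4, 1),
}
MERGES = [(3, 4), (4, 6), (9, 6), (10, 2), (8, 5)]  # (VictimID, SurvivorID)

result = min_created_dates(CREATED, MERGES)
for guest_id in sorted(result):
    print(guest_id, result[guest_id])
```

On this data, it returns the same survivors and dates as the recursive CTE. Guest 6 inherits 2016-11-01 from guest 3 through the 3 -> 4 -> 6 chain.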

Copy Distinct Records Based on 3 Cols

I have loads of data in a table called Temp. This data contains duplicates.
Not entire rows, but rows with the same data in 3 columns: HouseNo, DateofYear and TimeOfDay.
I want to copy only the distinct rows from "Temp" into another table, "ThermData".
Basically, I want to copy all rows from Temp to ThermData that are distinct on (HouseNo, DateofYear, TimeOfDay). Something like that.
I know we can't do that directly. Is there an alternative way to do it?
Please help me out. I have tried lots of things but haven't solved it.
Sample data; the repeated values look like this.
I want to remove the duplicate rows based on the values of HouseNo, DateofYear and TimeOfDay:
HouseNo, DateofYear, TimeOfDay, Count
102, 10/1/2009, 0:00:02 AM, 2
102, 10/1/2009, 1:00:02 AM, 2
102, 10/1/2009, 10:00:02 AM, 2
Here is a Northwind example based on the Orders table.
There are duplicates based on the (EmployeeID , ShipCity , ShipCountry) columns.
If you only execute the code between these 2 lines:
/* Run everything below this line to show crux of the fix */
/* Run everything above this line to show crux of the fix */
you'll see how it works. Basically:
(1) You run a GROUP BY on the 3 columns of interest. (derived1Duplicates)
(2) Then you join back to the table using these 3 columns. (on ords.EmployeeID = derived1Duplicates.EmployeeID and ords.ShipCity = derived1Duplicates.ShipCity and ords.ShipCountry = derived1Duplicates.ShipCountry)
(3) Then, within each group, you number the rows 1, 2, 3, 4, etc. (using ROW_NUMBER())
(4) Then you keep the row in each group that has row number 1. (where derived2DuplicatedEliminated.RowIDByGroupBy = 1)
Use Northwind
GO
declare @DestinationVariableTable table (
NotNeededButForFunRowIDByGroupBy int not null ,
NotNeededButForFunDuplicateCount int not null ,
[OrderID] [int] NOT NULL,
[CustomerID] [nchar](5) NULL,
[EmployeeID] [int] NULL,
[OrderDate] [datetime] NULL,
[RequiredDate] [datetime] NULL,
[ShippedDate] [datetime] NULL,
[ShipVia] [int] NULL,
[Freight] [money] NULL,
[ShipName] [nvarchar](40) NULL,
[ShipAddress] [nvarchar](60) NULL,
[ShipCity] [nvarchar](15) NULL,
[ShipRegion] [nvarchar](15) NULL,
[ShipPostalCode] [nvarchar](10) NULL,
[ShipCountry] [nvarchar](15) NULL
)
INSERT INTO @DestinationVariableTable (NotNeededButForFunRowIDByGroupBy , NotNeededButForFunDuplicateCount , OrderID,CustomerID,EmployeeID,OrderDate,RequiredDate,ShippedDate,ShipVia,Freight,ShipName,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry )
Select RowIDByGroupBy , MyDuplicateCount , OrderID,CustomerID,EmployeeID,OrderDate,RequiredDate,ShippedDate,ShipVia,Freight,ShipName,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry
From
(
/* Run everything below this line to show crux of the fix */
Select
RowIDByGroupBy = ROW_NUMBER() OVER(PARTITION BY ords.EmployeeID , ords.ShipCity , ords.ShipCountry ORDER BY ords.OrderID )
, derived1Duplicates.MyDuplicateCount
, ords.*
from
[dbo].[Orders] ords
join
(
select EmployeeID , ShipCity , ShipCountry , COUNT(*) as MyDuplicateCount from [dbo].[Orders] GROUP BY EmployeeID , ShipCity , ShipCountry /*HAVING COUNT(*) > 1*/
) as derived1Duplicates
on ords.EmployeeID = derived1Duplicates.EmployeeID and ords.ShipCity = derived1Duplicates.ShipCity and ords.ShipCountry = derived1Duplicates.ShipCountry
/* Run everything above this line to show crux of the fix */
)
as derived2DuplicatedEliminated
where derived2DuplicatedEliminated.RowIDByGroupBy = 1
select * from @DestinationVariableTable
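The GROUP BY and join only supply the duplicate count. ROW_NUMBER() with PARTITION BY on the three key columns already numbers the rows within each duplicate group, so it is enough on its own to pick one row per group. Dropping the join also avoids a side effect of it: an equality join silently drops rows where any of the key columns is NULL.

Here is a minimal sketch against SQLite via Python's sqlite3 (SQLite 3.25+), using the question's Temp and ThermData names. The Reading column and its values are hypothetical. "First" is defined by SQLite's rowid, which SQL Server doesn't have; there, you would order by whichever column defines the copy to keep.

```python
import sqlite3

# Hypothetical Temp rows: the question only names HouseNo, DateofYear and
# TimeOfDay, so Reading is an invented extra column that differs between
# some duplicates.
TEMP_ROWS = [
    (102, "10/1/2009", "0:00:02 AM", 20.5),
    (102, "10/1/2009", "0:00:02 AM", 20.5),
    (102, "10/1/2009", "1:00:02 AM", 19.0),
    (102, "10/1/2009", "1:00:02 AM", 19.1),
    (102, "10/1/2009", "10:00:02 AM", 22.0),
    (102, "10/1/2009", "10:00:02 AM", 22.0),
]

COPY_DISTINCT = """
INSERT INTO ThermData (HouseNo, DateofYear, TimeOfDay, Reading)
SELECT HouseNo, DateofYear, TimeOfDay, Reading
FROM (
    SELECT HouseNo, DateofYear, TimeOfDay, Reading,
           ROW_NUMBER() OVER (
               PARTITION BY HouseNo, DateofYear, TimeOfDay
               ORDER BY rowid          -- keep the first-inserted copy
           ) AS rn
    FROM Temp
)
WHERE rn = 1
"""


def copy_distinct(rows):
    """Copy one row per (HouseNo, DateofYear, TimeOfDay) from Temp to ThermData."""
    con = sqlite3.connect(":memory:")
    for table in ("Temp", "ThermData"):
        con.execute(f"CREATE TABLE {table} (HouseNo INTEGER, DateofYear TEXT, "
                    "TimeOfDay TEXT, Reading REAL)")
    con.executemany("INSERT INTO Temp VALUES (?, ?, ?, ?)", rows)
    con.execute(COPY_DISTINCT)
    copied = con.execute("SELECT * FROM ThermData").fetchall()
    con.close()
    return copied


copied = copy_distinct(TEMP_ROWS)
print(len(copied), "rows copied")
```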

SQL Query to return an item within range or nearest range

I have a table of ranges that looks like
CREATE TABLE [dbo].[WeightRange](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Description] [nvarchar](50) NULL,
[LowerBound] [decimal](18, 2) NULL,
[UpperBound] [decimal](18, 2) NULL,
[GroupID] [int] NULL
)
Given a weight and group id I need to find the matching (or nearest) range id.
Example
WeightRanges
1, 0-100kgs, 0, 100, 1
2, 101-250kgs, 101, 250, 1
3, 501-1000kgs, 501, 1000, 1
If the weight is 10 then it should return ID 1, if the weight is 1500 it should return ID 3, and if the weight is 255 it should return ID 2. I have left the group out of the example for simplicity.
At this stage I don't really want to change the database design.
I'd use a CASE statement to create a column with the "distance", and then order by distance and take the first item.
Snippet which may help:
DECLARE @weight DECIMAL(18, 2) = 255; -- example weight from the question
SELECT TOP 1 d.id
FROM (
SELECT id, CASE WHEN (@weight >= LowerBound)
AND (@weight <= UpperBound) THEN 0
WHEN (@weight < LowerBound) THEN LowerBound-@weight
WHEN (@weight > UpperBound) THEN @weight-UpperBound
END AS distance
FROM WeightRange
) d
WHERE d.distance IS NOT NULL
ORDER BY d.distance ASC;
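Here is a minimal sketch of the CASE-distance approach above, run against SQLite via Python's sqlite3 with the question's three sample ranges. LIMIT 1 replaces TOP 1. The optional group_id parameter is an addition, since the question left the group out. Ordering by ID as a second key is an assumption, added so that a weight equidistant from two ranges gets a deterministic answer.

```python
import sqlite3

NEAREST = """
SELECT ID
FROM (
    SELECT ID,
           CASE WHEN :weight BETWEEN LowerBound AND UpperBound THEN 0
                WHEN :weight < LowerBound THEN LowerBound - :weight
                WHEN :weight > UpperBound THEN :weight - UpperBound
           END AS distance
    FROM WeightRange
    WHERE :group_id IS NULL OR GroupID = :group_id
)
WHERE distance IS NOT NULL      -- rows with a NULL bound cannot be measured
ORDER BY distance, ID           -- ID breaks ties between equidistant ranges
LIMIT 1
"""


def build_db():
    """Create WeightRange with the three sample rows from the question."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE WeightRange (ID INTEGER PRIMARY KEY, Description TEXT, "
                "LowerBound REAL, UpperBound REAL, GroupID INTEGER)")
    con.executemany(
        "INSERT INTO WeightRange VALUES (?, ?, ?, ?, ?)",
        [(1, "0-100kgs", 0, 100, 1),
         (2, "101-250kgs", 101, 250, 1),
         (3, "501-1000kgs", 501, 1000, 1)],
    )
    return con


def nearest_range(con, weight, group_id=None):
    """Return the ID of the range containing weight, or the closest one."""
    row = con.execute(NEAREST, {"weight": weight, "group_id": group_id}).fetchone()
    return row[0] if row else None


con = build_db()
for w in (10, 255, 1500):
    print(w, "->", nearest_range(con, w))
```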
I think this stored function should do the trick. It uses a CTE (Common Table Expression) internally, so it will work with SQL Server 2005 and up:
CREATE FUNCTION dbo.FindClosestID(@WeightValue DECIMAL(17,2))
RETURNS INT
AS BEGIN
DECLARE @ReturnID INT;
WITH WeightDistance AS
(
SELECT ID, ABS(Lowerbound - @WeightValue) AS Distance
FROM WeightRange
UNION ALL
SELECT ID, ABS(upperbound - @WeightValue) AS Distance
FROM WeightRange
)
SELECT TOP 1 @ReturnID = ID
FROM WeightDistance
ORDER BY Distance;
RETURN @ReturnID;
END
This query returns the following values:
SELECT
dbo.FindClosestID(75.0),
dbo.FindClosestID(300.0),
dbo.FindClosestID(380.0),
dbo.FindClosestID(525.0),
dbo.FindClosestID(1500.0)
1 2 3 3 3
Marc
