I am trying to "fill in the blanks" with my table below.
I have created a PDF reader that spits out a JSON file that I have manipulated into the below format.
There is very little scope in changing source data, or using a secondary table as a helper, due to the fact that this is meant to handle new data not "seen" before.
What I need help with is getting the Class and Group columns filled in. By this I mean that the Class Column will always have a value in the last row, and I need this repeated (upwards) until it comes across a non-blank value in the column. It then needs to repeat this value, until it comes across the next non-blank and so on.
Similarly the Group Column needs the same solution but starting from the first row down.
I have tried LAG() LEAD() etc with default values, but it doesn't handle the multiple nulls.
I also need the Group column to show the class value when not blank.
I have had a look at CTEs, but I am not overly familiar with them and have gotten myself tied in knots today!
Any help is appreciated.
Current Data
ID, Class, Group, Total, Account
1, null, INCOME, null, Fencing
2, null, null, null, Crop
3, Net Income, null, null, Net Income
4, null, Farm Expenditure, null, Irrigation
5, null, null, null, electricity
6, Surplus, null, null, Surplus
7, null, GST, null, GST
8, Closing Balance, null, null, Closing Balance
What I want
ID, Class, Group, Total, Account
1, Net Income, INCOME, null, Fencing
2, Net Income, INCOME, null, Crop
3, Net Income, INCOME, null, Net Income
4, Surplus, Farm Expenditure, null, Irrigation
5, Surplus, Farm Expenditure, null, electricity
6, Surplus, Farm Expenditure, null, Surplus
7, Closing Balance, GST, null, GST
8, Closing Balance, GST, null, Closing Balance
This gives you the output you want with the data you give. In your sample data there is only one "group" in each "class", but it looks like there could be multiple groups per class. If that is the case, it will be a bit more complicated, but the principle will be the same.
CREATE TABLE #data (ID INT, Class VARCHAR(50), [Group] VARCHAR(50), Total INT, Account VARCHAR(50));
INSERT INTO #data(ID, Class, [Group], Total, Account) VALUES
(1, null, 'INCOME', null, 'Fencing'),
(2, null, null, null, 'Crop'),
(3, 'Net Income', null, null, 'Net Income'),
(4, null, 'Farm Expenditure', null, 'Irrigation'),
(5, null, null, null, 'electricity'),
(6, 'Surplus', null, null, 'Surplus'),
(7, null, 'GST', null, 'GST'),
(8, 'Closing Balance', null, null, 'Closing Balance');
-- Find the break points that signify the end of a Class
WITH breaks as(
SELECT IIF(Class IS NOT NULL, 1, 0) AS breakpoint, ID, Class, [Group], Total, Account
FROM #data
),
-- count the breakpoints passed so each group will have a number we can group by
grp AS (
SELECT ISNULL(SUM(breakpoint) OVER (ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),0) AS grp,
ID,Class,[Group],Total,Account
FROM breaks
)
SELECT MAX(grp.Class) OVER (PARTITION BY grp.grp) AS Class,
MAX(grp.[Group]) OVER (PARTITION BY grp.grp) AS [Group],
grp.Total,
grp.Account
FROM grp
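To see why partitioning by grp works, it can help to look at the intermediate values on their own. The following is only an illustrative sketch; it assumes the #data temp table created above and simply exposes the breakpoint flag and the running count for each row.

```sql
-- Assumes the #data temp table from the answer above already exists.
WITH breaks AS (
    SELECT ID, Class, IIF(Class IS NOT NULL, 1, 0) AS breakpoint
    FROM #data
)
SELECT ID,
       Class,
       breakpoint,
       -- Count the breakpoints on the rows strictly before this one,
       -- so a row that closes a Class still belongs to that Class.
       ISNULL(SUM(breakpoint) OVER (ORDER BY ID
              ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS grp
FROM breaks
ORDER BY ID;
-- ID  Class            breakpoint  grp
-- 1   NULL             0           0
-- 2   NULL             0           0
-- 3   Net Income       1           0
-- 4   NULL             0           1
-- 5   NULL             0           1
-- 6   Surplus          1           1
-- 7   NULL             0           2
-- 8   Closing Balance  1           2
```

Each grp value contains exactly one non-null Class, which is why MAX(Class) OVER (PARTITION BY grp) can spread it across the whole partition.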
Thanks to James, I have now tweaked his code to give the correct Group, as the Class and Group columns needed different logic.
-- Find the break points that signify the end of a Class
WITH classbreaks as(
SELECT IIF(Class IS NOT NULL, 1, 0) AS breakpoint, ID, Class, [Group], Total, Account
FROM [PedGroup_db].[dbo].[Cashflow]
),
-- Find the break points that signify the end of a Group
grpbreaks as(
SELECT IIF([Group] IS NOT NULL, 1, 0) AS breakpoint, ID, Class, [Group], Total, Account
FROM [PedGroup_db].[dbo].[Cashflow]
),
-- count the breakpoints passed so each class will have a number we can group by
clss AS (
SELECT ISNULL(SUM(breakpoint) OVER (ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),0) AS grp,
ID,Class,[Group],Total,Account
FROM classbreaks
),
-- count the breakpoints passed so each group will have a number we can group by
grp AS (
SELECT ISNULL(SUM(breakpoint) OVER (ORDER BY ID),0) AS grp,
ID,Class,[Group],Total,Account
FROM grpbreaks
)
--join the two sub queries together on ID
SELECT MAX(clss.Class) OVER (PARTITION BY clss.grp) AS Class,
case when clss.Class = grp.Account Then grp.Account else MAX(grp.[Group]) OVER (PARTITION BY grp.grp) end AS [Group],
grp.Total,
grp.Account
FROM clss left join grp on grp.ID = clss.ID
I have a few tables and am building Telerik reports on top of them. The structure and sample data are given below:
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('Leave'))
BEGIN;
DROP TABLE [Leave];
END;
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('Addition'))
BEGIN;
DROP TABLE [Addition];
END;
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('Deduction'))
BEGIN;
DROP TABLE [Deduction];
END;
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('EmployeeInfo'))
BEGIN;
DROP TABLE [EmployeeInfo];
END;
GO
CREATE TABLE [EmployeeInfo] (
[EmpID] INT NOT NULL PRIMARY KEY,
[EmployeeName] VARCHAR(255)
);
CREATE TABLE [Addition] (
[AdditionID] INT NOT NULL PRIMARY KEY,
[AdditionType] VARCHAR(255),
[Amount] VARCHAR(255),
[EmpID] INT FOREIGN KEY REFERENCES EmployeeInfo(EmpID)
);
CREATE TABLE [Deduction] (
[DeductionID] INT NOT NULL PRIMARY KEY,
[DeductionType] VARCHAR(255),
[Amount] VARCHAR(255),
[EmpID] INT FOREIGN KEY REFERENCES EmployeeInfo(EmpID)
);
CREATE TABLE [Leave] (
[LeaveID] INT NOT NULL PRIMARY KEY,
[LeaveType] VARCHAR(255) NULL,
[DateFrom] VARCHAR(255),
[DateTo] VARCHAR(255),
[Approved] Binary,
[EmpID] INT FOREIGN KEY REFERENCES EmployeeInfo(EmpID)
);
GO
INSERT INTO EmployeeInfo([EmpID], [EmployeeName]) VALUES
(1, 'Marcia'),
(2, 'Lacey'),
(3, 'Fay'),
(4, 'Mohammad'),
(5, 'Mike')
INSERT INTO Addition([AdditionID], [AdditionType], [Amount], [EmpID]) VALUES
(1, 'Bonus', '2000', 2),
(2, 'Increment', '5000', 5)
INSERT INTO Deduction([DeductionID], [DeductionType], [Amount], [EmpID]) VALUES
(1, 'Late Deductions', '2000', 4),
(2, 'Delayed Project Completion', '5000', 1)
INSERT INTO Leave([LeaveID],[LeaveType],[DateFrom],[DateTo], [Approved], [EmpID]) VALUES
(1, 'Annual Leave','2018-01-08 04:52:03','2018-01-10 20:30:53', 1, 1),
(2, 'Sick Leave','2018-02-10 03:34:41','2018-02-14 04:52:14', 1, 2),
(3, 'Casual Leave','2018-01-04 11:06:18','2018-01-05 04:11:00', 1, 3),
(4, 'Annual Leave','2018-01-17 17:09:34','2018-01-21 14:30:44', 1, 4),
(5, 'Casual Leave','2018-01-09 23:31:16','2018-01-12 15:11:17', 1, 3),
(6, 'Annual Leave','2018-02-16 18:01:03','2018-02-19 17:16:04', 1, 2)
The query I am using to get the output is something like this:
SELECT Info.EmployeeName, Addition.AdditionType, Addition.Amount, Deduction.DeductionType, Deduction.Amount,
Leave.LeaveType,
SUM(DATEDIFF(Day, Leave.DateFrom, Leave.DateTo)) [#OfLeaves],
DatePart(MONTH, Leave.DateFrom)
FROM EmployeeInfo Info
LEFT JOIN Leave
ON Info.EmpID = Leave.EmpID
LEFT JOIN Addition
ON Info.EmpID = Addition.EmpID
LEFT JOIN Deduction
ON Info.EmpID = Deduction.EmpID
WHERE Approved = 1
GROUP BY Info.EmployeeName, Addition.AdditionType, Addition.Amount, Deduction.DeductionType, Deduction.Amount,
Leave.LeaveType,
DatePart(MONTH, Leave.DateFrom)
I want output that I can show directly on the report, but because I'm using joins the data repeats across multiple rows for the same employee, so it also appears multiple times on the report.
The output I am getting is something like this
Fay NULL NULL NULL NULL Casual Leave 4 1
Lacey Bonus 2000 NULL NULL Annual Leave 3 2
Lacey Bonus 2000 NULL NULL Sick Leave 4 2
Marcia NULL NULL Delayed Project Completion 5000 Annual Leave 2 1
Mohammad NULL NULL Late Deductions 2000 Annual Leave 4 1
Although what I want it looks something like this:
Fay NULL NULL NULL NULL Casual Leave 4 1
Lacey Bonus 2000 NULL NULL Annual Leave 3 2
Lacey NULL NULL NULL NULL Sick Leave 4 2
Marcia NULL NULL Delayed Project Completion 5000 Annual Leave 2 1
Mohammad NULL NULL Late Deductions 2000 Annual Leave 4 1
As there was only one bonus and it was not awarded multiple times, it should appear only once. I am stuck on formatting the table layout in the report, so I am hoping for a hint on shaping the output in the query itself so that I won't have to do it there.
My recommendation in this case is to replace the left joins with a single derived table, as follows:
select
info.employeename, additiontype, additionamount, deductiontype, deductionamount, leavetype, #ofleaves, leavemth
from Employeeinfo info
join
(
Select
Leave.empid, null as additiontype, null as additionamount, null as deductiontype, null as deductionamount, leave.leavetype, DATEDIFF(Day, Leave.DateFrom, Leave.DateTo) [#OfLeaves], DatePart(MONTH, DateFrom) leavemth
from leave
where approved = 1
Union all
Select
Addition.empid, additiontype, amount, null, null, null, null, null
From addition
Union all
Select empid, null, null, deductiontype, amount, null, null, null
From deduction
) payadj on payadj.empid= info.empid
This approach separates the different pay adjustments into different columns and also ensures that you don't get the double-ups that occur when the joins match the same employee ID multiple times.
You might need to explicitly name all the null columns in each branch of the union. I haven't tested it, but I believe you only need to name the columns once, in the first SELECT of a UNION ALL.
The output comes in the format below:
employeename bonus leavetype
Lacey 2000 null
Lacey null Sick Leave
Lacey null Annual Leave
Rather than type out the full result set, here is a link to sqlfiddle:
http://sqlfiddle.com/#!18/935e9/5/0
The problem you're facing is based on how you are joining the tables together. It's not syntax that's necessarily wrong but how we look at the data and how we understand the relationships between the tables. When doing the LEFT JOINs your query is able to find EmpIDs in each table and it is happy with that and grabs the records (or returns NULL if there are no records matching the EmpID). That isn't really what you're looking for since it can join too much together. So let's see why this is happening. If we take out the join to the Addition table your results would look like this:
Fay NULL NULL Casual Leave 4 1
Lacey NULL NULL Annual Leave 3 2
Lacey NULL NULL Sick Leave 4 2
Marcia Delayed Project Completion 5000 Annual Leave 2 1
Mohammad Late Deductions 2000 Annual Leave 4 1
You are still left with two rows for Lacey. The reason for these two rows is because of the join to the Leave table. Lacey has taken two leaves of absence. One for Sick Leave and the other for Annual Leave. Both of those records share the same EmpID of 2. So when you join to the Addition table (and/or to the rest of the tables) on EmpID the join looks for all matching records to complete that join. There's a single Addition record that matches two Leave records joined on EmpID. Thus, you end up with two Bonus results--the same Addition record for the two Leave records. Try running this query and check the results, it should also illustrate the problem:
SELECT l.LeaveType, l.EmpID, a.AdditionType, a.Amount
FROM Leave l
LEFT JOIN Addition a ON a.EmpID = l.EmpID
The results using your provided data would be:
Annual Leave 1 NULL NULL
Sick Leave 2 Bonus 2000
Casual Leave 3 NULL NULL
Annual Leave 4 NULL NULL
Casual Leave 3 NULL NULL
Annual Leave 2 Bonus 2000
So the data itself isn't wrong. It's just that when joining on EmpID in this way the relationships may be confusing.
So the problem is the relationship between the Leave table and the others. It doesn't make sense to join Leave to the Addition or Deduction tables directly on EmpID because it may look as though Lacey received a bonus for each leave of absence for example. This is what you are experiencing here.
I would suggest three separate queries (and potentially three reports). One to return the leave of absence data and the others for the Addition and Deduction data. Something like:
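--Return each employee's leaves of absence
SELECT e.EmployeeName
, l.LeaveType
, SUM(DATEDIFF(Day, l.DateFrom, l.DateTo)) [#OfLeaves]
, DatePart(MONTH, l.DateFrom) [LeaveMonth]
FROM EmployeeInfo e
LEFT JOIN Leave l ON e.EmpID = l.EmpID
WHERE l.Approved = 1
--SUM() requires every non-aggregated column to be grouped
GROUP BY e.EmployeeName, l.LeaveType, DatePart(MONTH, l.DateFrom)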
--Return each employee's Additions
SELECT e.EmployeeName
, a.AdditionType
, a.Amount
FROM EmployeeInfo e
LEFT JOIN Addition a ON e.EmpID = a.EmpID
--Return each employee's Deductions
SELECT e.EmployeeName
, d.DeductionType
, d.Amount
FROM EmployeeInfo e
LEFT JOIN Deduction d ON e.EmpID = d.EmpID
Having three queries should better represent the relationship the EmployeeInfo table has with each of the others and separate concerns. From there you can GROUP BY the different types of data and aggregate the values and get total counts and sums.
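If a single result set is still needed for the report, one possible sketch is to number each employee's leave rows and attach the Addition and Deduction data only to the first of them. This is an illustration rather than a tested solution: the CTE names and column aliases are made up for the example, it assumes each employee has at most one row in Addition and in Deduction, and the choice of which leave row carries the amounts (here the first by LeaveType) is arbitrary.

```sql
-- Summarise approved leave per employee, leave type and month first,
-- so the join to Addition/Deduction cannot multiply the leave rows.
WITH LeaveSummary AS (
    SELECT l.EmpID,
           l.LeaveType,
           SUM(DATEDIFF(DAY, l.DateFrom, l.DateTo)) AS NumLeaves,
           DATEPART(MONTH, l.DateFrom)              AS LeaveMonth
    FROM Leave l
    WHERE l.Approved = 1
    GROUP BY l.EmpID, l.LeaveType, DATEPART(MONTH, l.DateFrom)
),
Numbered AS (
    -- rn = 1 marks the single row per employee that will carry the amounts
    SELECT ls.EmpID, ls.LeaveType, ls.NumLeaves, ls.LeaveMonth,
           ROW_NUMBER() OVER (PARTITION BY ls.EmpID
                              ORDER BY ls.LeaveType, ls.LeaveMonth) AS rn
    FROM LeaveSummary ls
)
SELECT e.EmployeeName,
       a.AdditionType,
       a.Amount AS AdditionAmount,
       d.DeductionType,
       d.Amount AS DeductionAmount,
       n.LeaveType,
       n.NumLeaves,
       n.LeaveMonth
FROM EmployeeInfo e
JOIN Numbered n       ON n.EmpID = e.EmpID
LEFT JOIN Addition a  ON a.EmpID = e.EmpID AND n.rn = 1
LEFT JOIN Deduction d ON d.EmpID = e.EmpID AND n.rn = 1
ORDER BY e.EmployeeName, n.rn;
```

With the sample data this should put Lacey's bonus on her Annual Leave row only and leave her Sick Leave row with NULLs. Like the original query, it leaves out employees with no approved leave.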
Here are some resources which may help if you hadn't found these already:
Explanation of SQL Joins: https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
SQL Join Examples: https://www.w3schools.com/sql/sql_join.asp
Telerik Reporting Documentation: https://docs.telerik.com/reporting/overview
As shown in the attached screenshot, there are two tables.
Configuration:
Detail
Using the Configuration and Detail tables, I would like to populate the IdentificationType and IDerivedIdentification columns in the Detail table.
The following logic should be used when deriving these columns:
The Configuration table holds an order of preference, which the user can change dynamically. For example, if the country is Austria, the ID preference should be LEI, then TIN (if LEI is blank), then CONCAT (if both are blank, some other logic applies).
For ContactId = 3 the country is BG, so LEI should be checked first; since it is NULL, CCPT = 456 will be picked.
I could have used COALESCE and a CASE statement if hardcoding were allowed.
Can you please suggest an alternative approach?
Regards
Digant
Assuming that this is some horrendous data dump and you are trying to clean it up, here is some SQL to throw at it. :) I was able to capture your image text via Adobe Acrobat > Excel.
(I also built the schema for you at: http://sqlfiddle.com/#!6/8f404/12)
Firstly, the correct thing to do is fix the glaring problem, and that's the table structure. Assuming you can't, here's a solution.
The query below unpivots the columns LEI, NIND, CCPT and TIN from the Detail table, as well as FirstPref, SecondPref and ThirdPref from the Configuration table. This effectively normalizes the data, although it will cost you significant performance for as long as the data structure stays unfixed.
After that you simply join Detail.ContactId to DerivedTypes.ContactId, then DerivedPrefs.ISOCountryCode to Detail.CountrylSOCountryCode and DerivedTypes.ldentificationType to DerivedPrefs.ldentificationType.
If you use inner joins rather than left joins you can remove the RANK() function, but the query will then not show all ContactIds, only those that have a value in their LEI, NIND, CCPT or TIN columns. I think that's a better solution anyway, because why would you want to see an error mixed into a report? Write a separate report for those with no values in those columns.
Lastly, TOP (1) WITH TIES displays one record per ContactId and allows the record with the error to still display. Hope this helps.
CREATE TABLE Configuration
(ISOCountryCode varchar(2), CountryName varchar(8), FirstPref varchar(6), SecondPref varchar(6), ThirdPref varchar(6))
;
INSERT INTO Configuration
(ISOCountryCode, CountryName, FirstPref, SecondPref, ThirdPref)
VALUES
('AT', 'Austria', 'LEI', 'TIN', 'CONCAT'),
('BE', 'Belgium', 'LEI', 'NIND', 'CONCAT'),
('BG', 'Bulgaria', 'LEI', 'CCPT', 'CONCAT'),
('CY', 'Cyprus', 'LEI', 'NIND', 'CONCAT')
;
CREATE TABLE Detail
(ContactId int, FirstName varchar(1), LastName varchar(3), BirthDate varchar(4), CountrylSOCountryCode varchar(2), Nationality varchar(2), LEI varchar(9), NIND varchar(9), CCPT varchar(9), TIN varchar(9))
;
INSERT INTO Detail
(ContactId, FirstName, LastName, BirthDate, CountrylSOCountryCode, Nationality, LEI, NIND, CCPT, TIN)
VALUES
(1, 'A', 'DES', NULL, 'AT', 'AT', '123', '4345', NULL, NULL),
(2, 'B', 'DEG', NULL, 'BE', 'BE', NULL, '890', NULL, NULL),
(3, 'C', 'DEH', NULL, 'BG', 'BG', NULL, '123', '456', NULL),
(4, 'D', 'DEi', NULL, 'BG', 'BG', NULL, NULL, NULL, NULL)
;
SELECT TOP (1) with ties Detail.ContactId,
FirstName,
LastName,
BirthDate,
CountrylSOCountryCode,
Nationality,
LEI,
NIND,
CCPT,
TIN,
ISNULL(DerivedPrefs.ldentificationType, 'ERROR') ldentificationType,
IDerivedIdentification,
RANK() OVER (PARTITION BY Detail.ContactId ORDER BY
CASE WHEN Pref = 'FirstPref' THEN 1
WHEN Pref = 'SecondPref' THEN 2
WHEN Pref = 'ThirdPref' THEN 3
ELSE 99 END) AS PrefRank
FROM
Detail
LEFT JOIN
(
SELECT
ContactId,
LEI,
NIND,
CCPT,
TIN
FROM Detail
) DetailUNPVT
UNPIVOT
(IDerivedIdentification FOR ldentificationType IN
(LEI, NIND, CCPT, TIN)
)AS DerivedTypes
ON DerivedTypes.ContactId = Detail.ContactId
LEFT JOIN
(
SELECT
ISOCountryCode,
CountryName,
FirstPref,
SecondPref,
ThirdPref
FROM
Configuration
) ConfigUNPIVOT
UNPIVOT
(ldentificationType FOR Pref IN
(FirstPref, SecondPref, ThirdPref)
)AS DerivedPrefs
ON DerivedPrefs.ISOCountryCode = Detail.CountrylSOCountryCode
and DerivedTypes.ldentificationType = DerivedPrefs.ldentificationType
ORDER BY RANK() OVER (PARTITION BY Detail.ContactId ORDER BY
CASE WHEN Pref = 'FirstPref' THEN 1
WHEN Pref = 'SecondPref' THEN 2
WHEN Pref = 'ThirdPref' THEN 3
ELSE 99 END)
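The same preference lookup can also be written without UNPIVOT, by building the two unpivoted lists inline with VALUES inside an OUTER APPLY. This is only a sketch of that alternative, not the approach from the answer above; the aliases pick, p, v, PrefOrder, IdType and IdValue are made up for the example, and it keeps the answer's behaviour of reporting 'ERROR' when no preferred identifier has a value (CONCAT is not a column, so it never matches).

```sql
-- Assumes the Configuration and Detail tables created above.
SELECT d.ContactId,
       ISNULL(pick.IdType, 'ERROR') AS IdentificationType,
       pick.IdValue                 AS DerivedIdentification
FROM Detail d
LEFT JOIN Configuration c
       ON c.ISOCountryCode = d.CountrylSOCountryCode
OUTER APPLY (
    -- For this one row: pair each preference with the matching column value
    -- and keep the highest-ranked preference that is not NULL.
    SELECT TOP (1) p.IdType, v.IdValue
    FROM (VALUES (1, c.FirstPref),
                 (2, c.SecondPref),
                 (3, c.ThirdPref)) AS p(PrefOrder, IdType)
    JOIN (VALUES ('LEI',  d.LEI),
                 ('NIND', d.NIND),
                 ('CCPT', d.CCPT),
                 ('TIN',  d.TIN)) AS v(IdType, IdValue)
      ON v.IdType = p.IdType
    WHERE v.IdValue IS NOT NULL
    ORDER BY p.PrefOrder
) AS pick
ORDER BY d.ContactId;
-- ContactId  IdentificationType  DerivedIdentification
-- 1          LEI                 123
-- 2          NIND                890
-- 3          CCPT                456
-- 4          ERROR               NULL
```

Because the preference order is still read from the Configuration table, changing a country's preferences needs no change to the query.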
I'm trying to build a CTE which will pull back all records which are related to a given, arbitrary record in the database.
Create table Requests (
Id bigint,
OriginalId bigint NULL,
FollowupId bigint NULL
)
insert into Requests VALUES (1, null, 3)
insert into Requests VALUES (2, 1, 8)
insert into Requests VALUES (3, 1, 4)
insert into Requests VALUES (4, 3, null)
insert into Requests VALUES (5, null, null)
insert into Requests VALUES (6, null, 7)
insert into Requests VALUES (7, 6, null)
insert into Requests VALUES (8, 2, null)
OriginalId is always the Id of a previous record (or null). FollowupId points to the most recent followup record (which, in turn, points back via OriginalId) and can probably be ignored, but it's there if it's helpful.
I can easily pull back either all ancestors or all descendants of a given record using the following CTE
;With TransactionList (Id, originalId, followupId, Steps)
AS
(
Select Id, originalId, followupId, 0 as Steps from requests where Id = @startId
union all
select reqs.Id, reqs.originalId, reqs.followupId, Steps + 1 from requests reqs
inner join TransactionList tl on tl.Id = reqs.originalId --or tl.originalId = reqs.Id
)
SELECT Id from TransactionList
However, if I use both join conditions (including the commented-out one), the recursion never terminates, hits the recursion limit, and bombs out. Even combining both sets, I don't get the entire tree, just one branch of it.
I don't care about anything other than the list of Ids. They don't need to be sorted, or to display their relationship or anything. Doesn't hurt, but not necessary. But I need every Id in a given tree to pull back the same list when it's passed as @startId.
As an example of what I'd like to see, this is what the output should be when @startId is set to any value 1-4 or 8:
1
2
3
4
8
And for either 6 or 7, I get back both 6 and 7.
You can just create 2 CTE's.
The first CTE will get the Root of the hierarchy, and the second will use the Root ID to get the descendants of the Root.
;WITH cteRoot AS (
SELECT *, 0 [Level]
FROM Requests
WHERE Id = @startId
UNION ALL
SELECT r.*, [Level] + 1
FROM Requests r
JOIN cteRoot cte ON r.Id = cte.OriginalID
),
cteDesc AS (
SELECT *
FROM cteRoot
WHERE OriginalId IS NULL
UNION ALL
SELECT r.*, [Level] + 1
FROM Requests r
JOIN cteDesc cte ON r.OriginalId = cte.Id
)
SELECT * FROM cteDesc
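As a usage sketch, the same two-step idea can be trimmed down to return only the Ids for one starting record. The variable name @startId and the value 8 are just assumptions for the example, and it relies on the Requests table and sample rows from the question.

```sql
DECLARE @startId bigint = 8;

WITH cteRoot AS (
    -- Step 1: walk up from the starting record until OriginalId is NULL
    SELECT Id, OriginalId
    FROM Requests
    WHERE Id = @startId
    UNION ALL
    SELECT r.Id, r.OriginalId
    FROM Requests r
    JOIN cteRoot cte ON r.Id = cte.OriginalId
),
cteDesc AS (
    -- Step 2: start at the root found above and walk down to every descendant
    SELECT Id
    FROM cteRoot
    WHERE OriginalId IS NULL
    UNION ALL
    SELECT r.Id
    FROM Requests r
    JOIN cteDesc cte ON r.OriginalId = cte.Id
)
SELECT Id
FROM cteDesc
ORDER BY Id;
-- Returns 1, 2, 3, 4, 8 for @startId = 8
```

Each of the two CTEs only ever moves in one direction, which is why neither can loop back on itself the way the combined OR condition does.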
I have product data structured in the following format:
ProductID OptionID Lvl OptionDescription SubOptionID SubOptionDescription
HPH 6 1 Model 10 Studio
HPH 6 1 Model 11 DJ
HPH 7 2 Device 12 Bluetooth
HPH 7 2 Device 13 Cable
HPH 7 2 Device 14 Remote
There could be any number of levels to the product. I need to traverse the hierarchy and produce the following output - a description for each product option:
Studio-Bluetooth
Studio-Cable
Studio-Remote
DJ-Bluetooth
DJ-Cable
DJ-Remote
I've looked at CTEs, but the examples tend to use adjacency lists (employeeID, managerID, etc.), which don't seem appropriate here.
How can I achieve this output?
Thanks.
CREATE TABLE [dbo].[Products](
[ProductID] [varchar](50) NULL,
[OptionID] [int] NULL,
[Lvl] [int] NULL,
[OptionDescription] [varchar](50) NULL,
[SubOptionID] [int] NULL,
[SubOptionDescription] [varchar](50) NULL
) ON [PRIMARY]
insert into Products (ProductID, OptionID, Lvl, OptionDescription, SubOptionID, SubOptionDescription) values ('HPH', 6, 1, 'Model', 10, 'Studio')
insert into Products (ProductID, OptionID, Lvl, OptionDescription, SubOptionID, SubOptionDescription) values ('HPH', 6, 1, 'Model', 11, 'DJ')
insert into Products (ProductID, OptionID, Lvl, OptionDescription, SubOptionID, SubOptionDescription) values ('HPH', 7, 2, 'Device', 12, 'Bluetooth')
insert into Products (ProductID, OptionID, Lvl, OptionDescription, SubOptionID, SubOptionDescription) values ('HPH', 7, 2, 'Device', 13, 'Cable')
insert into Products (ProductID, OptionID, Lvl, OptionDescription, SubOptionID, SubOptionDescription) values ('HPH', 7, 2, 'Device', 14, 'Remote')
with cte as (
-- Anchor member: the root level
select p.Lvl, cast(p.SubOptionDescription as varchar(max)) as [ProductOption]
from Products p where p.Lvl = 1
union all
-- Recursive member: pairs every row built so far with every row of the next level (a cartesian product)
select p.Lvl, c.ProductOption + '-' + p.SubOptionDescription
from Products p
inner join cte c on c.Lvl = p.Lvl - 1
)
select c.ProductOption from cte c;
A couple of notes.
Right now your sample answer implies that you need to create a cartesian product. I hope this is not the case, because the number of rows will grow explosively. If there are other join conditions which are not apparent from your sample, you can introduce them in the recursive part of the CTE (the second SELECT, after the union all).
You would probably also want to return only leaf rows. There are several ways to do it - there may be some attribute in your actual data, or a combination of rank() and top (1) with ties will do the trick, although it won't be particularly efficient.
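One way to keep only the leaf rows, sketched here under a few assumptions, is to carry ProductID and Lvl through the CTE and keep the rows that reached the deepest level of their product. It assumes the Products table from the question, that levels are numbered contiguously from 1, and that combinations should only be formed within the same ProductID; none of this is confirmed by the question.

```sql
WITH cte AS (
    -- Anchor member: every option at the first level
    SELECT p.ProductID,
           p.Lvl,
           CAST(p.SubOptionDescription AS varchar(max)) AS ProductOption
    FROM Products p
    WHERE p.Lvl = 1
    UNION ALL
    -- Recursive member: extend each partial description by one level,
    -- staying within the same product
    SELECT p.ProductID,
           p.Lvl,
           c.ProductOption + '-' + p.SubOptionDescription
    FROM Products p
    JOIN cte c
      ON c.ProductID = p.ProductID
     AND c.Lvl = p.Lvl - 1
)
SELECT c.ProductOption
FROM cte c
-- Keep only descriptions that reached the product's deepest level
WHERE c.Lvl = (SELECT MAX(p.Lvl)
               FROM Products p
               WHERE p.ProductID = c.ProductID);
```

With the sample rows this should return the six Model-Device combinations and drop the partial rows Studio and DJ.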