Merge rows based on the same date? - sql-server

I have a table that looks like the below
Date | ID | Period | ArchivedBy | ArchivedFlag | Value
2018-01-20 12:23 |23344 | Q1 | NULL | NULL | 200
2018-01-20 12:20 |23344 | NULL | P.Tills | 1 | NULL
2018-01-20 12:19 |23344 | NULL | NULL | 1 | NULL
This table represents all edits made to an agreement (each new edit gets its own row). If a value wasn't changed by an edit, it is NULL in that row.
Ideally, the rows above would be merged into the following:
Date | ID | Period | ArchivedBy | ArchivedFlag | Value
2018-01-20 |23344 | Q1 | P.Tills | 1 | 200
This returned row should show the latest state of the agreement based on the date. So for the date in my example (2018-01-20) this one row would be returned, combining all changes that were made throughout the day into 1 row which shows how it looks following all the changes throughout the day.
I hope this makes sense?
Thank you!

Here is one way, using ROW_NUMBER and GROUP BY:
SELECT [Date] = Cast([Date] AS DATE),
    ID,
    Period = Max(Period),
    ArchivedBy = Max(ArchivedBy),
    ArchivedFlag = Max(ArchivedFlag),
    [Value] = Max(CASE WHEN rn = 1 THEN [Value] END) -- value from the latest edit of the day
FROM (SELECT *,
        Rn = Row_number() OVER(PARTITION BY Cast([Date] AS DATE), ID ORDER BY [Date] DESC)
    FROM Yourtable) a
GROUP BY Cast([Date] AS DATE),
    ID

I would propose 2 solutions.
Simple
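For each ID and day, select the top 1 non-NULL value of each column. The subqueries must be correlated on ID as well as on the day, and OUTER APPLY keeps a day even when a column was never set:
SELECT G.ID, G.GD AS [Date], P.Period, A.ArchivedBy, F.ArchivedFlag, V.[Value]
FROM (SELECT DISTINCT ID, CAST([Date] AS date) AS GD FROM T) G
OUTER APPLY (SELECT TOP 1 x.Period FROM T x WHERE x.Period IS NOT NULL AND x.ID = G.ID AND CAST(x.[Date] AS date) = G.GD ORDER BY x.[Date] DESC) P
OUTER APPLY (SELECT TOP 1 x.ArchivedBy FROM T x WHERE x.ArchivedBy IS NOT NULL AND x.ID = G.ID AND CAST(x.[Date] AS date) = G.GD ORDER BY x.[Date] DESC) A
OUTER APPLY (SELECT TOP 1 x.ArchivedFlag FROM T x WHERE x.ArchivedFlag IS NOT NULL AND x.ID = G.ID AND CAST(x.[Date] AS date) = G.GD ORDER BY x.[Date] DESC) F
OUTER APPLY (SELECT TOP 1 x.[Value] FROM T x WHERE x.[Value] IS NOT NULL AND x.ID = G.ID AND CAST(x.[Date] AS date) = G.GD ORDER BY x.[Date] DESC) V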
Optimized (intuitively; not tested)
Use varbinary sorting rules and aggregation, manually order NULLs:
SELECT CAST(Date AS Date), ID,
CAST(SUBSTRING(MAX(Arch),9, LEN(MAX(Arch))) AS varchar(10)) ArchivedBy --unbox
--other columns
FROM
(
SELECT Date, ID,
CAST(CASE WHEN ArchivedBy IS NOT NULL THEN ROW_NUMBER() OVER (PARTITION BY ID, CAST(Date AS Date) ORDER BY Date) ELSE 0 END AS varbinary(MAX))+CAST(ArchivedBy AS varbinary(MAX)) Arch --box: 8-byte row number prefix, then the value
--other columns
FROM T
) Tab
GROUP BY ID, CAST(Date AS Date)
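
As a further sketch (not taken from either answer above), the "latest non-NULL value per column" rule can also be written with one ROW_NUMBER per column that sorts non-NULL rows first and then by time, followed by conditional aggregation. The table variable @Edits, its column types, and the rn* names are assumptions made for illustration; ArchivedFlag is declared as int here because MAX does not accept bit columns.

```sql
-- Hypothetical sample data mirroring the question (column types are assumed).
DECLARE @Edits TABLE (
    [Date] datetime2(0), ID int, Period varchar(10),
    ArchivedBy varchar(50), ArchivedFlag int, [Value] int
);
INSERT INTO @Edits ([Date], ID, Period, ArchivedBy, ArchivedFlag, [Value]) VALUES
    ('2018-01-20 12:23', 23344, 'Q1', NULL,      NULL, 200),
    ('2018-01-20 12:20', 23344, NULL, 'P.Tills', 1,    NULL),
    ('2018-01-20 12:19', 23344, NULL, NULL,      1,    NULL);

WITH ranked AS (
    SELECT CAST([Date] AS date) AS EditDate, ID, Period, ArchivedBy, ArchivedFlag, [Value],
           -- For each column: non-NULL rows first, then the most recent edit first.
           ROW_NUMBER() OVER (PARTITION BY ID, CAST([Date] AS date)
               ORDER BY CASE WHEN Period IS NULL THEN 1 ELSE 0 END, [Date] DESC) AS rnPeriod,
           ROW_NUMBER() OVER (PARTITION BY ID, CAST([Date] AS date)
               ORDER BY CASE WHEN ArchivedBy IS NULL THEN 1 ELSE 0 END, [Date] DESC) AS rnArchivedBy,
           ROW_NUMBER() OVER (PARTITION BY ID, CAST([Date] AS date)
               ORDER BY CASE WHEN ArchivedFlag IS NULL THEN 1 ELSE 0 END, [Date] DESC) AS rnArchivedFlag,
           ROW_NUMBER() OVER (PARTITION BY ID, CAST([Date] AS date)
               ORDER BY CASE WHEN [Value] IS NULL THEN 1 ELSE 0 END, [Date] DESC) AS rnValue
    FROM @Edits
)
SELECT EditDate AS [Date], ID,
       MAX(CASE WHEN rnPeriod = 1 THEN Period END)             AS Period,
       MAX(CASE WHEN rnArchivedBy = 1 THEN ArchivedBy END)     AS ArchivedBy,
       MAX(CASE WHEN rnArchivedFlag = 1 THEN ArchivedFlag END) AS ArchivedFlag,
       MAX(CASE WHEN rnValue = 1 THEN [Value] END)             AS [Value]
FROM ranked
GROUP BY EditDate, ID;
```

For the three sample rows this should collapse to the single row shown in the question (2018-01-20, 23344, Q1, P.Tills, 1, 200). Unlike a plain MAX per column, it picks the most recent non-NULL edit rather than the largest value.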

Related

Compare dates between rows from the same input file based on ID and replicate rows by increment date till the last working date using SQL Server

I am trying to duplicate rows by comparing the date of the current row with the date of the next row for the same user ID. Each row should be duplicated with the date incremented for as long as it stays earlier than the date of the next row.
The last row for each ID should be incremented up to the lastworkingdate. If the lastworkingdate is NULL, the date should be incremented up to the current date.
Input:
Output expected
Please suggest if we can implement this logic using SQL Server.
I have tried the below code
WITH cte (User_ID, Start_DateMonth, Start_DateDAY, Last_working_date_text, lead_start_datemonth) AS
(SELECT User_ID,
CONVERT(date, CAST(Start_DateMonth AS varchar(50)) + '01') AS Start_DateMonth,
Start_DateDAY,
Last_working_date_text,
LEAD(CONVERT(datetime, CAST(Start_DateMonth AS varchar(MAX)) + '01')) OVER (PARTITION BY User_ID
ORDER BY CONVERT(date, CAST(Start_DateMonth AS varchar(50)) + '01')) AS lead_start_datemonth
FROM [dbo].[Historic_Headcount3] --mytable
UNION ALL
SELECT User_ID,
CONVERT(date, DATEADD(MONTH, 1, ISNULL(Start_DateMonth, GETDATE()))),
Start_DateDAY,
Last_working_date_text,
CONVERT(datetime, CAST(lead_start_datemonth AS varchar(MAX)) + '01') AS lead_start_datemonth
FROM cte
WHERE DATEADD(MONTH, 1, Start_DateMonth) < ISNULL(lead_start_datemonth,
CASE
WHEN ISDATE(Last_working_date_text) = 1
AND Last_working_date_text != '#' THEN CONVERT(date, Last_working_date_text)
ELSE GETDATE()
END))
SELECT User_ID,
Start_DateMonth,
Start_DateDAY,
Last_working_date_text
FROM cte
ORDER BY User_ID,
Start_DateMonth;
I am getting error
The conversion of a varchar data type to a datetime data type resulted in an out-of-range value.
This script will hopefully give you enough understanding of how to utilise a numbers table to increment your months, so that you can apply it once you get your data cleaning and transformation working as required:
-- Define test data
declare @t table(UserID int, StartDate date, EndDate date);
insert into @t values(1,'20190901','20200217'),(2,'20200202','20200205'),(3,'20200108',null);
-- Find the maximum possible number of month iterations required
declare @Months int = (select datediff(month,min(StartDate),getdate())+1 from @t);
-- Query the data
-- t: a table with 10 rows in
-- n: cross join t to itself to return a row_number() up to a possible 10*10*10*10 = 10,000 rows,
--    using TOP to limit this to what is actually required
with t(t) as (select t from (values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(t))
,n(n) as (select top(@Months) row_number() over (order by (select null))-1 from t t1,t t2,t t3,t t4)
select t.UserID
    ,dateadd(month,n.n,t.StartDate) as StartDateMonth
    ,isnull(t.EndDate,getdate()) as EndDate
from @t as t
    join n
        on dateadd(month,n.n,t.StartDate) <= isnull(t.EndDate,getdate()) -- join the dates to the row_number, incrementing the months as required
order by UserID
    ,StartDateMonth;
Output:
+--------+----------------+------------+
| UserID | StartDateMonth | EndDate |
+--------+----------------+------------+
| 1 | 2019-09-01 | 2020-02-17 |
| 1 | 2019-10-01 | 2020-02-17 |
| 1 | 2019-11-01 | 2020-02-17 |
| 1 | 2019-12-01 | 2020-02-17 |
| 1 | 2020-01-01 | 2020-02-17 |
| 1 | 2020-02-01 | 2020-02-17 |
| 2 | 2020-02-02 | 2020-02-05 |
| 3 | 2020-01-08 | 2020-02-26 |
| 3 | 2020-02-08 | 2020-02-26 |
+--------+----------------+------------+
This code worked for me:
with cte ([CountryId],[Is_EO_EmployeeId],[DepartmentId],[FunctionId],[Employee_StatusId],[Event_ReasonId],User_ID, Start_DateMonth, Start_DateDAY, Last_working_date_text, lead_start_datemonth) as (
SELECT [CountryId],[Is_EO_EmployeeId],[DepartmentId],[FunctionId],[Employee_StatusId],[Event_ReasonId], User_ID, Start_DateMonth,Start_DateDAY,Last_working_date_text, CASE WHEN lead_start_datemonth IS NULL THEN NULL ELSE Convert(datetime, CAST(lead_start_datemonth AS Nvarchar(max))+'01')END AS lead_start_datemonth FROM (
select
[CountryId],
[Is_EO_EmployeeId],
[DepartmentId],
[FunctionId],
[Employee_StatusId],
[Event_ReasonId],
User_ID,
CONVERT(datetime, CAST(Start_DateMonth AS varchar(50)) + '01') AS Start_DateMonth,
Start_DateDAY,
Last_working_date_text,
lead(Start_DateMonth) over(partition by User_ID order by CONVERT(datetime, CAST(Start_DateMonth AS varchar(50)) + '01')) lead_start_datemonth
from [dbo].[Historic_Headcount3]) T--mytable
union all
select
[CountryId],
[Is_EO_EmployeeId],
[DepartmentId],
[FunctionId],
[Employee_StatusId],
[Event_ReasonId],
User_ID,
Convert(datetime,DateAdd(month,1, ISNULL(Start_DateMonth,GetDate()))),
Start_DateDAY,
Last_working_date_text,
lead_start_datemonth
from cte
where DateAdd(month,1, Start_DateMonth) < ISNULL(lead_start_datemonth,CASE WHEN ISDATE(Last_working_date_text)=1 AND Last_working_date_text != '#' THEN CONVERT(datetime,Last_working_date_text) ELSE GETDATE() END)
)
select [CountryId],[Is_EO_EmployeeId],[DepartmentId],[FunctionId],[Employee_StatusId],[Event_ReasonId],User_ID, LEFT(CONVERT(varchar, Start_DateMonth,112),6) AS Start_DateMonth, Start_DateDAY, Last_working_date_text
from cte
order by User_ID, Start_DateMonth
OPTION (MAXRECURSION 0)

Return the current month salary and previous month salary in a same table

I have a task to prepare a report generated from a run control page that retrieves the current month's salary and the previous month's salary. On that page, the user chooses the cal_id they want; in this example, the user chooses cal_id = FEB. Assume a table named table_salary as below:
emplid | cal_id | salary | pymt_date
101 | JAN | 10000 | 2018-01-01
101 | FEB | 15000 | 2018-02-01
And my expected output is
emplid | cur_sal| prev_sal
101 | 15000 | 10000
What I have done so far is like below
SELECT
A.EMPLID, A.SALARY AS CUR_SAL, B.SALARY AS PREV_SAL
FROM
TABLE_SALARY A
LEFT OUTER JOIN
TABLE_SALARY B ON A.EMPLID AND B.EMPLID
AND A.CAL_ID = B.CAL_ID
AND B.PYMT_DT = (SELECT MAX(B1.PYMT_DT)
FROM TABLE_SALARY B1
WHERE B1.EMPLID = B.EMPLID
AND B1.PYMT_DT >= DATEADD(mm, DATEDIFF(mm, 0, B.PYMT_DT) - 1, 0)
AND B1.PYMT_DT < DATEADD(mm, DATEDIFF(mm, 0, PYMT_DT), 0))
But above SQL didn't return the expected output.
Does anyone have an idea how to achieve my expected output?
It should be like this. Use LEAD instead of LAG:
CREATE TABLE #t (id int IDENTITY(1,1), Empid int, [Month] varchar(10), Salary int, Paymentdate date)
INSERT INTO #t (Empid, [Month], Salary, Paymentdate) SELECT 1, 'Jan', 1000, '2018-01-01'
INSERT INTO #t (Empid, [Month], Salary, Paymentdate) SELECT 1, 'Feb', 1500, '2018-02-01'
SELECT * FROM #t
-- LEAD over a descending order looks at the next-older row, i.e. the previous month
SELECT TOP 1
    Empid, Salary AS CUR_SAL, LEAD(Salary, 1, 0) OVER (ORDER BY Paymentdate DESC) AS PREV_SAL
FROM #t
ORDER BY Paymentdate DESC
-- For comparison: LAG over the same order finds no earlier row for the latest month and returns the default 0
SELECT TOP 1
    Empid, Salary AS CUR_SAL, LAG(Salary, 1, 0) OVER (ORDER BY Paymentdate DESC) AS PREV_SAL
FROM #t
ORDER BY Paymentdate DESC
Use a window function to retrieve the previous row in a sorted set. I think this should work.
SELECT TOP 1
EMPLID, SALARY AS CUR_SAL, LEAD(SALARY, 1, 0) OVER (ORDER BY PYMT_DT DESC) AS PREV_SAL
FROM
TABLE_SALARY
ORDER BY
PYMT_DT DESC
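
The queries above return only the single latest row across the whole table. A minimal sketch of a per-employee variant that also honours the cal_id picked on the run control page could look like the following; the table variable @table_salary and the @cal_id variable are stand-ins assumed for illustration, not part of the original question.

```sql
-- Hypothetical sample data mirroring the question's table_salary.
DECLARE @table_salary TABLE (emplid int, cal_id varchar(10), salary int, pymt_date date);
INSERT INTO @table_salary (emplid, cal_id, salary, pymt_date) VALUES
    (101, 'JAN', 10000, '2018-01-01'),
    (101, 'FEB', 15000, '2018-02-01');

-- Assumed stand-in for the value chosen on the run control page.
DECLARE @cal_id varchar(10) = 'FEB';

WITH s AS (
    SELECT emplid,
           cal_id,
           salary AS cur_sal,
           -- Previous payment for the same employee, in chronological order.
           LAG(salary) OVER (PARTITION BY emplid ORDER BY pymt_date) AS prev_sal
    FROM @table_salary
)
SELECT emplid, cur_sal, prev_sal
FROM s
WHERE cal_id = @cal_id;  -- filter after LAG so the previous row is still visible to the window
```

The filter sits outside the CTE on purpose: applying it before the window function would remove the JAN row, and prev_sal would come back NULL.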

Select First, Max, and Last non-null value per group

Trying to select, per group, the first and last values (chronologically) as well as the max value. I had written a query that works fine except it does not handle the NULL values. I need it to ignore NULL values.
Here's an example:
DECLARE @T table (
    LabName VARCHAR(20)
    , CreatedOn date
    , LabValue int
)
INSERT INTO @T
    ( LabName,CreatedOn,LabValue )
VALUES
    ('Creatinine', '2016-01-01', NULL)
    , ('Creatinine', '2016-02-01', 15)
    , ('Creatinine', '2016-03-01', 20)
    , ('Creatinine', '2016-04-01', 19)
    , ('SGOT (ST)', '2016-01-01', 25)
    , ('SGOT (ST)', '2016-02-01', 31)
    , ('SGOT (ST)', '2016-03-01', 25)
    , ('SGOT (ST)', '2016-04-01', NULL)
SELECT DISTINCT
    *
FROM (
    SELECT
        LabName
        , FIRST_VALUE(LabValue) OVER(PARTITION BY LabName ORDER BY CreatedOn ASC) AS FirstValue
        , MAX(LabValue) OVER(PARTITION BY LabName) AS MaxValue
        , LAST_VALUE(LabValue) OVER(PARTITION BY LabName ORDER BY CreatedOn ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) LastValue
    FROM @T
) AS T
It was working fine until I realized some labs aren't run on some dates. Once I put some NULLs into the test data, the results for First and Last will include them.
Here is the result I get:
+------------+------------+----------+-----------+
| LabName | FirstValue | MaxValue | LastValue |
+------------+------------+----------+-----------+
| Creatinine | NULL | 20 | 19 |
| SGOT (ST) | 25 | 31 | NULL |
+------------+------------+----------+-----------+
Here is the result I want:
+------------+------------+----------+-----------+
| LabName | FirstValue | MaxValue | LastValue |
+------------+------------+----------+-----------+
| Creatinine | 15 | 20 | 19 |
| SGOT (ST) | 25 | 31 | 25 |
+------------+------------+----------+-----------+
Use conditional aggregation with ROW_NUMBER():
SELECT LabName,
MAX(CASE WHEN seqnum_asc = 1 THEN LabValue END) as FirstValue,
MAX(LabValue) as MaxValue,
MAX(CASE WHEN seqnum_desc = 1 THEN LabValue END) as LastValue
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY LabName
ORDER BY (CASE WHEN LabValue IS NOT NULL THEN 1 ELSE 2 END),
CreatedOn
) as seqnum_asc,
ROW_NUMBER() OVER (PARTITION BY LabName
ORDER BY (CASE WHEN LabValue IS NOT NULL THEN 1 ELSE 2 END),
CreatedOn DESC
) as seqnum_desc
FROM @T t
) T
GROUP BY LabName;
As you said, there are 13 such columns where you need to check for non-NULL values.
I think you should first filter out the NULL values using a CTE, and then write your actual query against that CTE. The CTE reduces the result set, and applying the window functions to the reduced result set should perform better.
By the way, 13 such columns looks like bad database design; you may have to query 100 of them in future.
IMHO, DISTINCT more often indicates a bad database design than a bad query.
;With CTE as
(-- try to reduce resultset if possible
SELECT * FROM @T
WHERE LabValue IS NOT NULL
)
SELECT DISTINCT
*
FROM (
SELECT
LabName
, FIRST_VALUE(LabValue) OVER(PARTITION BY LabName ORDER BY CreatedOn ASC) AS FirstValue
, MAX(LabValue) OVER(PARTITION BY LabName) AS MaxValue
, LAST_VALUE(LabValue) OVER(PARTITION BY LabName ORDER BY CreatedOn ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) LastValue
FROM CTE
) AS T
Your database is handling NULL values properly.
First value for Creatinine is actually null and last value for SGOT (ST) is null as well.
If you wish to discard rows with null values just add it in the WHERE clause:
SELECT DISTINCT
*
FROM (
SELECT
LabName
, FIRST_VALUE(LabValue) OVER(PARTITION BY LabName ORDER BY CreatedOn ASC) AS FirstValue
, MAX(LabValue) OVER(PARTITION BY LabName) AS MaxValue
, LAST_VALUE(LabValue) OVER(PARTITION BY LabName ORDER BY CreatedOn ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) LastValue
FROM @T
WHERE LabValue IS NOT NULL
) AS T;
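
Filtering rows in a WHERE clause only works when a single column is involved. As a hedged alternative, SQL Server 2022 and Azure SQL accept IGNORE NULLS on FIRST_VALUE and LAST_VALUE, which skips NULLs per column without removing rows. The sketch below assumes such a version is available and reuses the question's sample data; note that both functions need the full-partition frame, otherwise FIRST_VALUE still returns NULL for rows that precede the first non-NULL value.

```sql
-- Requires SQL Server 2022 (or Azure SQL); earlier versions reject IGNORE NULLS.
DECLARE @T table (LabName varchar(20), CreatedOn date, LabValue int);
INSERT INTO @T (LabName, CreatedOn, LabValue) VALUES
      ('Creatinine', '2016-01-01', NULL)
    , ('Creatinine', '2016-02-01', 15)
    , ('Creatinine', '2016-03-01', 20)
    , ('Creatinine', '2016-04-01', 19)
    , ('SGOT (ST)',  '2016-01-01', 25)
    , ('SGOT (ST)',  '2016-02-01', 31)
    , ('SGOT (ST)',  '2016-03-01', 25)
    , ('SGOT (ST)',  '2016-04-01', NULL);

SELECT DISTINCT
      LabName
      -- First non-NULL value in chronological order, evaluated over the whole partition.
    , FIRST_VALUE(LabValue) IGNORE NULLS OVER (
          PARTITION BY LabName ORDER BY CreatedOn
          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS FirstValue
    , MAX(LabValue) OVER (PARTITION BY LabName) AS MaxValue
      -- Last non-NULL value in chronological order, same frame.
    , LAST_VALUE(LabValue) IGNORE NULLS OVER (
          PARTITION BY LabName ORDER BY CreatedOn
          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastValue
FROM @T;
```

Because no rows are removed, the same pattern can be repeated for each additional column independently.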

SQL multiple start dates to end date

I have a table with the following format (which I cannot change)
ClientID | RefAd1 | Cluster Start Date | Cluster End Date
100001 | R1234 | 2014-11-01 |
100001 | R1234 | 2014-11-10 |
100001 | R1234 | 2014-11-20 |
What I would like to come out with is:
ClientID | RefAd1 | Cluster Start Date | Cluster End Date
100001 | R1234 | 2014-11-01 | 2014-11-10
100001 | R1234 | 2014-11-10 | 2014-11-20
100001 | R1234 | 2014-11-20 | NULL
I've searched on here, and had many attempts myself, but just can't get it working.
I can't update the source table (or add another table into the database) so I'm going to do this in a view (which I can save)
Any help would be gratefully appreciated, been going round in circles with this for a day and a bit now!
Use a self join to get the next record:
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER(ORDER BY [Cluster Start Date])RNO,*
FROM YOURTABLE
)
SELECT C1.ClientID,C1.RefAd1,C1.[Cluster Start Date],C2.[Cluster Start Date] [Cluster End Date]
FROM CTE C1
LEFT JOIN CTE C2 ON C1.RNO=C2.RNO-1
EDIT :
To update the table, you can use the below query
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER(ORDER BY [Cluster Start Date])RNO,*
FROM #TEMP
)
UPDATE #TEMP SET [Cluster End Date] = TAB.[Cluster End Date]
FROM
(
SELECT C1.ClientID,C1.RefAd1,C1.[Cluster Start Date],C2.[Cluster Start Date] [Cluster End Date]
FROM CTE C1
LEFT JOIN CTE C2 ON C1.RNO=C2.RNO-1
)TAB
WHERE TAB.[Cluster Start Date]=#TEMP.[Cluster Start Date]
EDIT 2 :
If you want this to be done for ClientId and RefAd1.
;WITH CTE AS
(
-- Get current date and next date for each type of ClientId and RefAd1
SELECT ROW_NUMBER() OVER(PARTITION BY ClientID,RefAd1 ORDER BY [Cluster Start Date])RNO,*
FROM #TEMP
)
UPDATE #TEMP SET [Cluster End Date] = TAB.[Cluster End Date]
FROM
(
SELECT C1.ClientID,C1.RefAd1,C1.[Cluster Start Date],C2.[Cluster Start Date] [Cluster End Date]
FROM CTE C1
LEFT JOIN CTE C2 ON C1.RNO=C2.RNO-1 AND C1.ClientID=C2.ClientID AND C1.RefAd1=C2.RefAd1
)TAB
WHERE TAB.[Cluster Start Date]=#TEMP.[Cluster Start Date] AND TAB.ClientID=#TEMP.ClientID AND TAB.RefAd1=#TEMP.RefAd1
If you want to do it only for ClientId, remove the conditions for RefAd1
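Here is the script if you just want the view you described. The subquery is correlated on ClientID and RefAd1 so that one client's start dates are never used as another client's end date:
CREATE VIEW v_name AS
SELECT
    t.ClientID,
    t.RefAd1,
    t.[Cluster Start Date],
    ( SELECT
        MIN(t2.[Cluster Start Date])
      FROM yourTable t2
      WHERE
        t2.ClientID = t.ClientID
        AND t2.RefAd1 = t.RefAd1
        AND t2.[Cluster Start Date] > t.[Cluster Start Date]
    ) AS [Cluster End Date]
FROM yourTable t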
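
On SQL Server 2012 or later, the same view can be sketched with LEAD, which reads the next start date within each ClientID and RefAd1 group directly. The view name v_cluster_ranges is hypothetical, and yourTable stands in for the real table as in the answers above.

```sql
-- Hypothetical view name; yourTable is the placeholder used in the answers above.
CREATE VIEW v_cluster_ranges AS
SELECT
    ClientID,
    RefAd1,
    [Cluster Start Date],
    -- The next start date in the same group becomes this row's end date;
    -- the last row of each group has no next row and gets NULL.
    LEAD([Cluster Start Date]) OVER (
        PARTITION BY ClientID, RefAd1
        ORDER BY [Cluster Start Date]
    ) AS [Cluster End Date]
FROM yourTable;
```

This avoids both the self join and the correlated subquery, and it does not modify the source table.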

SQL Find pairs of data in rows and convert to columns

I'm trying to set up a query to pull employee tenure reports. I have an employee status table that tracks events for each employee (e.g. Hire Date, Term Date, Salary Change, etc.). The table looks like this:
EmployeeID | Date | Event
1 | 1/1/99 | 1
2 | 1/2/99 | 1
1 | 1/3/99 | 2
1 | 1/4/99 | 1
I used a pivot table to move the table from a vertical layout to a horizontal layout
SELECT [FK_EmployeeID], MAX([1]) AS [Hire Date], ISNULL(MAX([2]), DATEADD(d, 1, GETDATE())) AS [Term Date]
FROM DT_EmployeeStatusEvents PIVOT (MAX([Date]) FOR [EventType] IN ([1], [2])) T
GROUP BY [FK_EmployeeID]
I get a result like this:
EmployeeID | 1 | 2
1 | 1/4/99 | 1/3/99
2 | 1/2/99 | *null*
However, the problem I run into is that I need both sets of values for each employee. (We hire a lot of recurring seasonals) What I would like is a way to convert the columns to rows selecting the hire date (1) and the very next term date (2) for each employee like this:
EmployeeID | 1 | 2
1 | 1/1/99 | 1/3/99
2 | 1/2/99 | *null*
1 | 1/4/99 | *null*
Is this possible? I've looked at a lot of the PIVOT examples and they all show an aggregate function.
The problem is that you are attempting to pivot a datetime value so you are limited to using either max or min as the aggregate function. When you use those you will only return one row for each employeeid.
In order to get past this you will need to have some value that will be used during the grouping of your data - I would suggest using a windowing function like row_number(). You can make your subquery:
select employeeid, date, event
, row_number() over(partition by employeeid, event
order by date) seq
from DT_EmployeeStatusEvents
This creates a sequence number within each employeeId and event combination. This new number will then be grouped on so you can return multiple rows. Your full query will be:
select employeeid, [1], [2]
from
(
select employeeid, date, event
, row_number() over(partition by employeeid, event
order by date) seq
from DT_EmployeeStatusEvents
) d
pivot
(
max(date)
for event in ([1], [2])
) piv
order by employeeid;
This should get you started...
DECLARE @EMP TABLE (EMPID INT, dDATE DATETIME, EVENTTYPE INT)
INSERT INTO @EMP
SELECT 1,'1/1/99',1 UNION ALL
SELECT 2,'1/2/99',1 UNION ALL
SELECT 1,'1/3/99',2 UNION ALL
SELECT 1,'1/4/99',1;
-- Number each hire and each term per employee, then pivot on the instance number
SELECT EMPID, HIRE, TERM
FROM (SELECT EMPID, dDATE, 'HIRE' AS X, ROW_NUMBER() OVER(PARTITION BY EMPID, EVENTTYPE ORDER BY DDATE) AS INSTANCE FROM @EMP WHERE EVENTTYPE=1
UNION ALL
SELECT EMPID, dDATE, 'TERM' AS X, ROW_NUMBER() OVER(PARTITION BY EMPID, EVENTTYPE ORDER BY DDATE) AS INSTANCE FROM @EMP WHERE EVENTTYPE=2) DATATABLE
PIVOT (MIN([DDATE])
FOR X IN ([HIRE],[TERM])) PIVOTTABLE
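
Both answers pair the n-th hire with the n-th term, which assumes the two event types strictly alternate. A minimal sketch that instead pairs each hire with the first term date after it, using OUTER APPLY rather than PIVOT, is shown below. The table variable @Events, its column names, and the unambiguous yyyymmdd date literals are assumptions for illustration.

```sql
-- Hypothetical sample data; names and date interpretation are illustrative only.
DECLARE @Events TABLE (EmployeeID int, EventDate date, EventType int);
INSERT INTO @Events (EmployeeID, EventDate, EventType) VALUES
    (1, '19990101', 1),
    (2, '19990102', 1),
    (1, '19990103', 2),
    (1, '19990104', 1);

SELECT h.EmployeeID,
       h.EventDate AS HireDate,
       t.TermDate
FROM @Events AS h
OUTER APPLY (
    -- Earliest termination strictly after this hire; NULL if there is none yet.
    SELECT MIN(e.EventDate) AS TermDate
    FROM @Events AS e
    WHERE e.EmployeeID = h.EmployeeID
      AND e.EventType = 2
      AND e.EventDate > h.EventDate
) AS t
WHERE h.EventType = 1
ORDER BY h.EmployeeID, h.EventDate;
```

If two hires are recorded with no termination between them, both will show the same term date, so the data should be checked for that case first.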
