SQL Find pairs of data in rows and convert to columns

I'm trying to set up a query to pull employee tenure reports. I have an employee status table that tracks events for each employee (e.g. hire date, term date, salary change, etc.). The table looks like this:
EmployeeID | Date | Event
1 | 1/1/99 | 1
2 | 1/2/99 | 1
1 | 1/3/99 | 2
1 | 1/4/99 | 1
I used PIVOT to rotate the table from a vertical layout to a horizontal one:
SELECT [FK_EmployeeID], MAX([1]) AS [Hire Date], ISNULL(MAX([2]), DATEADD(d, 1, GETDATE())) AS [Term Date]
FROM DT_EmployeeStatusEvents PIVOT (MAX([Date]) FOR [EventType] IN ([1], [2])) T
GROUP BY [FK_EmployeeID]
I get a result like this:
EmployeeID | 1 | 2
1 | 1/4/99 | 1/3/99
2 | 1/2/99 | *null*
However, the problem is that I need every set of values for each employee (we hire a lot of recurring seasonal workers). What I would like is a way to return one row per hire, selecting the hire date (1) and the very next term date (2) for each employee, like this:
EmployeeID | 1 | 2
1 | 1/1/99 | 1/3/99
2 | 1/2/99 | *null*
1 | 1/4/99 | *null*
Is this possible? I've looked at a lot of PIVOT examples, and they all use an aggregate function.

The problem is that you are pivoting a datetime value, so you are limited to MAX or MIN as the aggregate function. With either one, you will only get one row per employeeid.
To get past this, you need an extra value that takes part in the grouping of your data. I would suggest a window function like row_number(). Your subquery can be:
select employeeid, date, event
, row_number() over(partition by employeeid, event
order by date) seq
from DT_EmployeeStatusEvents
This creates a unique sequence number for each employeeid and event combination. Because PIVOT implicitly groups on every column it does not pivot, this new number becomes part of the grouping, so you can return multiple rows per employee. Your full query will be:
select employeeid, [1], [2]
from
(
select employeeid, date, event
, row_number() over(partition by employeeid, event
order by date) seq
from DT_EmployeeStatusEvents
) d
pivot
(
max(date)
for event in ([1], [2])
) piv
order by employeeid;
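To sanity-check the ROW_NUMBER pairing outside SQL Server, here is a minimal sketch using Python's built-in sqlite3 module (it assumes the bundled SQLite is 3.25 or newer, which added window functions). SQLite has no PIVOT operator, so conditional aggregation (MAX(CASE ...)) stands in for it. The table name status_events and its columns are hypothetical, and dates are stored as ISO-8601 strings so that they sort chronologically.

```python
import sqlite3

# Hypothetical sample: (employee_id, ISO date, event) where 1 = hire, 2 = term.
SAMPLE_EVENTS = [
    (1, "1999-01-01", 1),
    (2, "1999-01-02", 1),
    (1, "1999-01-03", 2),
    (1, "1999-01-04", 1),
]


def pair_hire_term(events):
    """Return (employee_id, hire_date, term_date) rows, pairing the n-th hire
    with the n-th termination of each employee."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE status_events (employee_id INTEGER, event_date TEXT, event INTEGER)"
    )
    con.executemany("INSERT INTO status_events VALUES (?, ?, ?)", events)
    rows = con.execute(
        """
        WITH numbered AS (
            SELECT employee_id, event_date, event,
                   -- sequence number per employee and event type
                   ROW_NUMBER() OVER (PARTITION BY employee_id, event
                                      ORDER BY event_date) AS seq
            FROM status_events
        )
        -- conditional aggregation replaces PIVOT; grouping on seq keeps one row per pair
        SELECT employee_id,
               MAX(CASE WHEN event = 1 THEN event_date END) AS hire_date,
               MAX(CASE WHEN event = 2 THEN event_date END) AS term_date
        FROM numbered
        GROUP BY employee_id, seq
        ORDER BY employee_id, seq
        """
    ).fetchall()
    con.close()
    return rows


print(pair_hire_term(SAMPLE_EVENTS))
# [(1, '1999-01-01', '1999-01-03'), (1, '1999-01-04', None), (2, '1999-01-02', None)]
```

Like the PIVOT version, this pairs the n-th hire with the n-th termination, which assumes each employee's hires and terminations alternate.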

This should get you started...
DECLARE @EMP TABLE (EMPID INT, dDATE DATETIME, EVENTTYPE INT)
INSERT INTO @EMP
SELECT 1,'1/1/99',1 UNION ALL
SELECT 2,'1/2/99',1 UNION ALL
SELECT 1,'1/3/99',2 UNION ALL
SELECT 1,'1/4/99',1
-- number hires and terms separately per employee, then pivot on HIRE/TERM
SELECT EMPID, HIRE, TERM
FROM (SELECT EMPID, dDATE, 'HIRE' AS X, ROW_NUMBER() OVER(PARTITION BY EMPID, EVENTTYPE ORDER BY dDATE) AS INSTANCE FROM @EMP WHERE EVENTTYPE=1
UNION ALL
SELECT EMPID, dDATE, 'TERM' AS X, ROW_NUMBER() OVER(PARTITION BY EMPID, EVENTTYPE ORDER BY dDATE) AS INSTANCE FROM @EMP WHERE EVENTTYPE=2) DATATABLE
PIVOT (MIN([dDATE])
FOR X IN ([HIRE],[TERM])) PIVOTTABLE
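The question literally asks for each hire date plus the very next term date. A different technique from the ROW_NUMBER answers here, a correlated subquery that picks the earliest termination after each hire, expresses that directly. This is a sketch in Python's sqlite3 with a hypothetical status_events table and ISO-8601 date strings; the same subquery shape should carry over to T-SQL.

```python
import sqlite3

# Hypothetical sample: (employee_id, ISO date, event) where 1 = hire, 2 = term.
SAMPLE_EVENTS = [
    (1, "1999-01-01", 1),
    (2, "1999-01-02", 1),
    (1, "1999-01-03", 2),
    (1, "1999-01-04", 1),
]


def next_term_after_each_hire(events):
    """Return (employee_id, hire_date, term_date): one row per hire, with the
    earliest termination strictly after that hire (or None)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE status_events (employee_id INTEGER, event_date TEXT, event INTEGER)"
    )
    con.executemany("INSERT INTO status_events VALUES (?, ?, ?)", events)
    rows = con.execute(
        """
        SELECT h.employee_id,
               h.event_date AS hire_date,
               -- earliest termination for the same employee after this hire
               (SELECT MIN(t.event_date)
                FROM status_events AS t
                WHERE t.employee_id = h.employee_id
                  AND t.event = 2
                  AND t.event_date > h.event_date) AS term_date
        FROM status_events AS h
        WHERE h.event = 1
        ORDER BY h.employee_id, h.event_date
        """
    ).fetchall()
    con.close()
    return rows


print(next_term_after_each_hire(SAMPLE_EVENTS))
```

The trade-off: if an employee has two hires with no termination between them, both hires report the same later termination, whereas ROW_NUMBER pairing would give that termination to the first hire and leave the second one open.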

Related

Get the sum of every 2 days of data within a selected date range

I have a table like the one below in SQL Server. I need to get the data within a date range, for example StartDate = '2020-09-01' and EndDate = '2020-09-11'. Getting data within a date range is simple; the complicated part is that I need to sum the data for every 2 days in the selected range.
For example, for each SKU I need the sum of the quantities for each pair of consecutive days, in a single column. Could anyone help me with a query for this output?
CREATE TABLE #Temp
(
Sku Nvarchar(50),
OrderDate DateTime,
Quantity Int,
)
INSERT INTO #Temp(Sku,OrderDate,Quantity)Values('#xyz','2020-09-01 00:00:00.000',2)
INSERT INTO #Temp(Sku,OrderDate,Quantity)Values('#xyz','2020-09-02 00:00:00.000',1)
INSERT INTO #Temp(Sku,OrderDate,Quantity)Values('#xyz','2020-09-03 00:00:00.000',3)
INSERT INTO #Temp(Sku,OrderDate,Quantity)Values('#xyz','2020-09-04 00:00:00.000',4)
INSERT INTO #Temp(Sku,OrderDate,Quantity)Values('#xyz','2020-09-05 00:00:00.000',5)
INSERT INTO #Temp(Sku,OrderDate,Quantity)Values('#xyz','2020-09-06 00:00:00.000',6)
INSERT INTO #Temp(Sku,OrderDate,Quantity)Values('#xyz','2020-09-07 00:00:00.000',2)
INSERT INTO #Temp(Sku,OrderDate,Quantity)Values('#xyz','2020-09-08 00:00:00.000',1)
INSERT INTO #Temp(Sku,OrderDate,Quantity)Values('#xyz','2020-09-09 00:00:00.000',3)
INSERT INTO #Temp(Sku,OrderDate,Quantity)Values('#xyz','2020-09-10 00:00:00.000',1)
INSERT INTO #Temp(Sku,OrderDate,Quantity)Values('#xyz','2020-09-11 00:00:00.000',10)
INSERT INTO #Temp(Sku,OrderDate,Quantity)Values('#abc','2020-09-01 00:00:00.000',1)
INSERT INTO #Temp(Sku,OrderDate,Quantity)Values('#abc','2020-09-02 00:00:00.000',10)
INSERT INTO #Temp(Sku,OrderDate,Quantity)Values('#abc','2020-09-03 00:00:00.000',10)
select * from #Temp

Use the row_number() window function to generate a sequence number per Sku, then GROUP BY (rn - 1) / 2 so that rows 1 and 2, rows 3 and 4, and so on fall into the same group. HAVING COUNT(*) = 2 keeps only complete pairs.
; with
cte as
(
select *, rn = row_number() over (partition by Sku order by OrderDate)
from #Temp
)
select Sku, sum(Quantity)
from cte
group by Sku, (rn - 1) / 2
having count(*) = 2
order by Sku , (rn - 1) / 2
Use STRING_AGG (SQL Server 2017 and later) if you want the result as a comma-separated list.
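A minimal sqlite3 sketch of the (rn - 1) / 2 grouping, with one addition the T-SQL above leaves out: a filter on the question's date range, applied before numbering. The table name orders and its columns are hypothetical stand-ins for #Temp, and dates are stored as plain 'YYYY-MM-DD' strings (an assumption) so that BETWEEN compares them correctly; SQLite's integer division plays the same role as in T-SQL.

```python
import sqlite3

# Sample data from the question, with dates simplified to 'YYYY-MM-DD'.
SAMPLE_ORDERS = [
    ("#xyz", f"2020-09-{day:02d}", qty)
    for day, qty in enumerate([2, 1, 3, 4, 5, 6, 2, 1, 3, 1, 10], start=1)
] + [
    ("#abc", f"2020-09-{day:02d}", qty)
    for day, qty in enumerate([1, 10, 10], start=1)
]


def pairwise_sums(rows, start, end):
    """Sum each Sku's quantity over consecutive pairs of rows within [start, end]."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (sku TEXT, order_date TEXT, quantity INTEGER)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        WITH cte AS (
            SELECT sku, quantity,
                   ROW_NUMBER() OVER (PARTITION BY sku ORDER BY order_date) AS rn
            FROM orders
            WHERE order_date BETWEEN ? AND ?   -- filter before numbering
        )
        SELECT sku, SUM(quantity) AS pair_sum
        FROM cte
        GROUP BY sku, (rn - 1) / 2             -- rows 1-2 -> 0, rows 3-4 -> 1, ...
        HAVING COUNT(*) = 2                    -- drop an unpaired last row
        ORDER BY sku, (rn - 1) / 2
        """,
        (start, end),
    ).fetchall()
    con.close()
    return result


print(pairwise_sums(SAMPLE_ORDERS, "2020-09-01", "2020-09-11"))
```

For the full range this should reproduce the sums listed in the answers below (#abc 11; #xyz 3, 7, 11, 3, 4).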

With ROW_NUMBER() and LAG() window functions:
select Sku, Quantity
from (
select Sku,
row_number() over (partition by Sku order by OrderDate) rn,
Quantity + lag(Quantity) over (partition by Sku order by OrderDate) Quantity
from #Temp
where OrderDate between '20200901' and '20200911'
) t
where rn % 2 = 0
order by Sku, rn;
Results:
Sku | Quantity
#abc | 11
#xyz | 3
#xyz | 7
#xyz | 11
#xyz | 3
#xyz | 4

Something like this, which uses STRING_AGG to return one comma-separated row per Sku:
;with
string_cte(Sku, OrderDate, Quantity, rn_grp) as(
select *, (row_number() over (partition by Sku order by OrderDate)+1)/2
from #Temp),
sum_cte(Sku, rn_grp, sum_quantity) as (
select Sku, rn_grp, sum(quantity)
from string_cte
group by Sku, rn_grp
having count(*)>1)
select
Sku, string_agg(sum_quantity, ',') within group (order by rn_grp) SecondDaySumUp
from sum_cte
group by Sku
order by 1 desc;
Output
Sku SecondDaySumUp
#xyz 3,7,11,3,4
#abc 11

Select ID for corresponding max date using GROUP BY

My table structure is as follows:
Category | Sex | Last Modified Date | Id
7 | 2 | 2015-01-16 | 87603
7 | 1 | 2014-11-27 | 87729
7 | 2 | 2018-09-06 | 87135
7 | 1 | 2017-12-27 | 87568
My SQL query is:
SELECT
MAX(Id) AS Id
FROM
Table
GROUP BY
Category, Sex
The result is:
87603
87729
But I want the Id of the row with the latest Last Modified Date in each group. The correct result should be:
87135
87568

You can use ROW_NUMBER() to find the most recent row per group:
SELECT Id, LastModifiedDate
FROM (
SELECT Id, LastModifiedDate, ROW_NUMBER() OVER (PARTITION BY Category, Sex ORDER BY LastModifiedDate DESC) AS rnk
FROM t
) AS cte
WHERE rnk = 1
Use RANK() instead if you want all rows that tie for the latest LastModifiedDate.
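The same ROW_NUMBER filter, run against the question's sample rows in an in-memory SQLite database through Python's sqlite3 module. This is a sketch: the table name t and the snake_case column names are assumptions, and SQLite 3.25+ is needed for window functions.

```python
import sqlite3

# Sample rows from the question: (category, sex, last_modified, id).
SAMPLE_ROWS = [
    (7, 2, "2015-01-16", 87603),
    (7, 1, "2014-11-27", 87729),
    (7, 2, "2018-09-06", 87135),
    (7, 1, "2017-12-27", 87568),
]


def latest_id_per_group(rows):
    """Return the id of the most recently modified row for each (category, sex)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (category INTEGER, sex INTEGER, last_modified TEXT, id INTEGER)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    ids = [
        row[0]
        for row in con.execute(
            """
            SELECT id FROM (
                SELECT id, category, sex,
                       -- 1 = newest row within each (category, sex) group
                       ROW_NUMBER() OVER (PARTITION BY category, sex
                                          ORDER BY last_modified DESC) AS rnk
                FROM t
            )
            WHERE rnk = 1
            ORDER BY category, sex
            """
        )
    ]
    con.close()
    return ids


print(latest_id_per_group(SAMPLE_ROWS))  # [87568, 87135]
```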

You can also get it by joining the table to its per-group maximum dates:
SELECT T.*
FROM
(
SELECT Sex,
MAX([Last Modified Date]) [Last Modified Date],
Category
FROM T
GROUP BY Sex,
Category
) TT INNER JOIN T ON T.[Last Modified Date] = TT.[Last Modified Date]
WHERE T.Sex = TT.Sex
AND
T.Category = TT.Category;
Returns:
+----------+-----+---------------------+-------+
| Category | Sex | Last Modified Date | Id |
+----------+-----+---------------------+-------+
| 7 | 2 | 06/09/2018 00:00:00 | 87135 |
| 7 | 1 | 27/12/2017 00:00:00 | 87568 |
+----------+-----+---------------------+-------+

We can get the solution by joining the same table with its grouped set:
SELECT MIN(T.Id)
FROM Table T
INNER JOIN (SELECT Category,
Sex,
MAX(LastModifiedDate) AS LastModifiedDate
FROM Table
GROUP BY Category, Sex) GT
ON GT.Category = T.Category
AND GT.Sex = T.Sex
AND GT.LastModifiedDate = T.LastModifiedDate
GROUP BY T.Category, T.Sex

Another option is to use a correlated subquery:
select t.*
from table t
where t.LastModifiedDate = (select max(t1.LastModifiedDate)
from table t1
where t1.Category = t.Category and t1.Sex = t.Sex
);

Here are a few different approaches... (in no particular order)
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
GO
CREATE TABLE #TestData (
Category TINYINT NOT NULL,
Sex TINYINT NOT NULL,
LastModifiedDate DATE NOT NULL,
Id INT NOT NULL
);
GO
INSERT #TestData(Category, Sex, LastModifiedDate, Id) VALUES
(7, 2, '2015-01-16', 87603),
(7, 1, '2014-11-27', 87729),
(7, 2, '2018-09-06', 87135),
(7, 1, '2017-12-27', 87568);
GO
/* nonclustered index to support the query. */
CREATE UNIQUE NONCLUSTERED INDEX ix_TestData_Category_Sex_LastModifiedDate
ON #TestData (Category ASC, Sex ASC, LastModifiedDate DESC)
INCLUDE (Id);
GO
--====================================================
-- option 1: TOP(n) WITH TIES...
SELECT TOP (1) WITH TIES
td.Id
FROM
#TestData td
ORDER BY
ROW_NUMBER() OVER (PARTITION BY td.Category, td.Sex ORDER BY td.LastModifiedDate DESC);
GO
-----------------------------------------------------
-- option 2: Filter on ROW_NUMBER()...
WITH
cte_AddRN AS (
SELECT
td.Id,
rn = ROW_NUMBER() OVER (PARTITION BY td.Category, td.Sex ORDER BY td.LastModifiedDate DESC)
FROM
#TestData td
)
SELECT
arn.Id
FROM
cte_AddRN arn
WHERE
arn.rn = 1;
GO
-----------------------------------------------------
-- option 3: binary concatenation...
SELECT
Id = CONVERT(INT, SUBSTRING(MAX(bv.bin_val), 4, 4))
FROM
#TestData td
CROSS APPLY ( VALUES (CONVERT(BINARY(3), td.LastModifiedDate) + CONVERT(BINARY(4), td.Id)) ) bv (bin_val)
GROUP BY
td.Category,
td.Sex;
GO
--====================================================

Return the current month salary and previous month salary in the same table

I have to prepare a report, driven by a run control page, that retrieves the current month's salary and the previous month's salary. On that page, the user chooses the cal_id they want; in this example, the user chooses cal_id = FEB. Assume the table below, named table_salary:
emplid | cal_id | salary | pymt_date
101 | JAN | 10000 | 2018-01-01
101 | FEB | 15000 | 2018-02-01
My expected output is:
emplid | cur_sal| prev_sal
101 | 15000 | 10000
Here is what I have done so far:
SELECT
A.EMPLID, A.SALARY AS CUR_SAL, B.SALARY AS PREV_SAL
FROM
TABLE_SALARY A
LEFT OUTER JOIN
TABLE_SALARY B ON A.EMPLID AND B.EMPLID
AND A.CAL_ID = B.CAL_ID
AND B.PYMT_DT = (SELECT MAX(B1.PYMT_DT)
FROM TABLE_SALARY B1
WHERE B1.EMPLID = B.EMPLID
AND B1.PYMT_DT >= DATEADD(mm, DATEDIFF(mm, 0, B.PYMT_DT) - 1, 0)
AND B1.PYMT_DT < DATEADD(mm, DATEDIFF(mm, 0, PYMT_DT), 0))
But the SQL above doesn't return the expected output.
Does anyone have an idea how to achieve my expected output?

It should be like this. Use LEAD instead of LAG: the rows are ordered by payment date descending, so the previous month's salary is on the next row.
Create table #t ( id int identity (1,1), Empid int, Month varchar(10), Salary int, Paymentdate date )
insert into #t (Empid, Month, Salary, Paymentdate) Select 1, 'Jan', 1000, '2018-01-01'
insert into #t (Empid, Month, Salary, Paymentdate) Select 1, 'Feb', 1500, '2018-02-01'
Select * from #t
-- LEAD reads the next row in descending date order (the earlier month): CUR_SAL 1500, PREV_SAL 1000
SELECT TOP 1
Empid, SALARY AS CUR_SAL, LEAD(SALARY, 1, 0) OVER (ORDER BY PaymentDate DESC) AS PREV_SAL
FROM #t
ORDER BY Paymentdate DESC
-- for comparison, LAG reads the preceding (later) row, so the latest row gets the default 0
SELECT TOP 1
Empid, SALARY AS CUR_SAL, LAG(SALARY, 1, 0) OVER (ORDER BY PaymentDate DESC) AS PREV_SAL
FROM #t
ORDER BY Paymentdate DESC

Use a window function to retrieve the previous row in a sorted set. I think this should work.
SELECT TOP 1
EMPLID, SALARY AS CUR_SAL, LEAD(SALARY, 1, 0) OVER (ORDER BY PYMT_DT DESC) AS PREV_SAL
FROM
TABLE_SALARY
ORDER BY
PYMT_DT DESC
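The queries above neither partition by employee nor apply the cal_id the user picks. Here is a minimal sketch of one way to do both, using Python's sqlite3 and LAG ordered ascending (equivalent to LEAD ordered descending): compute the previous salary over all rows first, then filter on the chosen cal_id in the outer query, because filtering first would remove the previous month's row before LAG can see it. Table and column names follow the question; the sketch assumes one row per employee per month with no gaps, since LAG returns the previous row rather than the previous calendar month.

```python
import sqlite3

# Sample rows from the question: (emplid, cal_id, salary, pymt_date).
SALARIES = [
    (101, "JAN", 10000, "2018-01-01"),
    (101, "FEB", 15000, "2018-02-01"),
]


def salary_with_previous(rows, chosen_cal_id):
    """Return (emplid, cur_sal, prev_sal) for the chosen pay period."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table_salary (emplid INTEGER, cal_id TEXT, salary INTEGER, pymt_date TEXT)"
    )
    con.executemany("INSERT INTO table_salary VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT emplid, cur_sal, prev_sal
        FROM (
            SELECT emplid, cal_id,
                   salary AS cur_sal,
                   -- previous payment row for the same employee
                   LAG(salary) OVER (PARTITION BY emplid ORDER BY pymt_date) AS prev_sal
            FROM table_salary
        )
        WHERE cal_id = ?   -- filter only after LAG has been computed
        ORDER BY emplid
        """,
        (chosen_cal_id,),
    ).fetchall()
    con.close()
    return result


print(salary_with_previous(SALARIES, "FEB"))  # [(101, 15000, 10000)]
```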

Merge rows based on the same date?

I have a table that looks like the one below:
Date | ID | Period | ArchivedBy | ArchivedFlag | Value
2018-01-20 12:23 |23344 | Q1 | NULL | NULL | 200
2018-01-20 12:20 |23344 | NULL | P.Tills | 1 | NULL
2018-01-20 12:19 |23344 | NULL | NULL | 1 | NULL
This table represents all edits made to an agreement (each new edit gets its own row). If a value was not changed in an edit, it is NULL.
So ideally, the rows above would be merged into the following:
Date | ID | Period | ArchivedBy | ArchivedFlag | Value
2018-01-20 |23344 | Q1 | P.Tills | 1 | 200
The returned row should show the latest state of the agreement for that date. So for the date in my example (2018-01-20), this single row would be returned, combining all changes made throughout the day into one row that shows how the agreement looks after all of them.
I hope this makes sense?
Thank you!

Here is one way using ROW_NUMBER and GROUP BY:
SELECT [Date] = Cast([Date] AS DATE),
ID,
Max(Period) AS Period,
Max(ArchivedBy) AS ArchivedBy,
Max(ArchivedFlag) AS ArchivedFlag,
Max(CASE WHEN rn = 1 THEN [Value] END) AS [Value]
FROM (SELECT *,
Rn = Row_number()OVER(partition BY Cast([Date] AS DATE), ID ORDER BY [Date] DESC)
FROM Yourtable)a
GROUP BY Cast([Date] AS DATE),
ID

I would propose two solutions.
Simple
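For each day and ID, select the latest non-NULL value of each column:
SELECT G.ID, G.GD AS [Date], Period.*, ArchivedBy.*, ArchivedFlag.*, Value.* FROM
(SELECT DISTINCT ID, CAST([Date] AS Date) GD FROM T) G
OUTER APPLY (SELECT TOP 1 Period FROM T WHERE T.ID = G.ID AND Period IS NOT NULL AND CAST([Date] AS Date) = G.GD ORDER BY [Date] DESC) Period
OUTER APPLY (SELECT TOP 1 ArchivedBy FROM T WHERE T.ID = G.ID AND ArchivedBy IS NOT NULL AND CAST([Date] AS Date) = G.GD ORDER BY [Date] DESC) ArchivedBy
OUTER APPLY (SELECT TOP 1 ArchivedFlag FROM T WHERE T.ID = G.ID AND ArchivedFlag IS NOT NULL AND CAST([Date] AS Date) = G.GD ORDER BY [Date] DESC) ArchivedFlag
OUTER APPLY (SELECT TOP 1 Value FROM T WHERE T.ID = G.ID AND Value IS NOT NULL AND CAST([Date] AS Date) = G.GD ORDER BY [Date] DESC) Value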
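A sketch of the "latest non-NULL value per column" idea in Python's sqlite3, correlated on both ID and day. SQLite has no APPLY, so each column becomes a scalar subquery with ORDER BY ... LIMIT 1. The table name edits and the snake_case column names are hypothetical, and timestamps are assumed to be 'YYYY-MM-DD HH:MM' strings so that SQLite's date() can extract the day.

```python
import sqlite3

# Columns to merge; a fixed constant, so formatting it into SQL is safe.
COLUMNS = ("period", "archived_by", "archived_flag", "value")

# Sample rows from the question: (edited_at, id, period, archived_by, archived_flag, value).
SAMPLE_EDITS = [
    ("2018-01-20 12:23", 23344, "Q1", None, None, 200),
    ("2018-01-20 12:20", 23344, None, "P.Tills", 1, None),
    ("2018-01-20 12:19", 23344, None, None, 1, None),
]


def merge_day(rows):
    """Return one row per (day, id) holding the latest non-NULL value of each column."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE edits (edited_at TEXT, id INTEGER, period TEXT, "
        "archived_by TEXT, archived_flag INTEGER, value INTEGER)"
    )
    con.executemany("INSERT INTO edits VALUES (?, ?, ?, ?, ?, ?)", rows)
    # One correlated subquery per column: latest non-NULL value for that ID and day.
    picks = ",\n".join(
        f"""(SELECT e.{c} FROM edits AS e
             WHERE e.id = g.id AND date(e.edited_at) = g.day AND e.{c} IS NOT NULL
             ORDER BY e.edited_at DESC LIMIT 1) AS {c}"""
        for c in COLUMNS
    )
    sql = f"""SELECT g.day, g.id, {picks}
              FROM (SELECT DISTINCT date(edited_at) AS day, id FROM edits) AS g
              ORDER BY g.day, g.id"""
    result = con.execute(sql).fetchall()
    con.close()
    return result


print(merge_day(SAMPLE_EDITS))  # [('2018-01-20', 23344, 'Q1', 'P.Tills', 1, 200)]
```

Generating one subquery per column keeps the query uniform when more columns are added; unlike MAX(), each subquery returns the most recent non-NULL value rather than the largest one.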
Optimized (intuitively, not tested)
Use varbinary sorting rules and aggregation, manually order NULLs:
SELECT CAST(Date AS Date), ID,
CAST(SUBSTRING(MAX(Arch),9, LEN(MAX(Arch))) AS varchar(10)) ArchivedBy --unbox
--other columns
FROM
(
SELECT Date, ID,
CAST(CASE WHEN ArchivedBy IS NOT NULL THEN ROW_NUMBER() OVER (PARTITION BY CAST(Date AS Date) ORDER BY Date) ELSE 0 END AS varbinary(MAX))+CAST(ArchivedBy AS varbinary(MAX)) Arch --box
--other columns
FROM T
) Tab
GROUP BY ID, CAST(Date AS Date)

How to use RANK() in SQL Server

I have a problem using RANK() in SQL Server.
Here’s my code:
SELECT contendernum,
totals,
RANK() OVER (PARTITION BY ContenderNum ORDER BY totals ASC) AS xRank
FROM (
SELECT ContenderNum,
SUM(Criteria1+Criteria2+Criteria3+Criteria4) AS totals
FROM Cat1GroupImpersonation
GROUP BY ContenderNum
) AS a
The results for that query are:
contendernum totals xRank
1 196 1
2 181 1
3 192 1
4 181 1
5 179 1
My desired result is:
contendernum totals xRank
1 196 1
2 181 3
3 192 2
4 181 3
5 179 4
I want to rank the results by totals. If two contenders have the same total, like 181, they should get the same xRank.

Change:
RANK() OVER (PARTITION BY ContenderNum ORDER BY totals ASC) AS xRank
to:
RANK() OVER (ORDER BY totals DESC) AS xRank
Note that your desired output (179 ranked 4th, not 5th) matches DENSE_RANK rather than RANK. Compare the documentation for RANK (Transact-SQL) and DENSE_RANK (Transact-SQL):
RANK (Transact-SQL)
If two or more rows tie for a rank, each tied row receives the same rank. For example, if the two top salespeople have the same SalesYTD value, they are both ranked one. The salesperson with the next highest SalesYTD is ranked number three, because there are two rows that are ranked higher. Therefore, the RANK function does not always return consecutive integers.
DENSE_RANK (Transact-SQL)
Returns the rank of rows within the partition of a result set, without any gaps in the ranking. The rank of a row is one plus the number of distinct ranks that come before the row in question.
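To see the difference on the question's own totals, here is a small sketch with Python's sqlite3. SQLite (3.25+) provides RANK and DENSE_RANK with the semantics quoted above; the table name totals is hypothetical. Only the DENSE_RANK column reproduces the desired xRank.

```python
import sqlite3

# (contendernum, total) pairs from the question's output.
TOTALS = [(1, 196), (2, 181), (3, 192), (4, 181), (5, 179)]


def rank_variants(rows):
    """Return (contendernum, total, rank, dense_rank), highest total ranked first."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE totals (contendernum INTEGER, total INTEGER)")
    con.executemany("INSERT INTO totals VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT contendernum, total,
               RANK()       OVER (ORDER BY total DESC) AS rnk,        -- gaps after ties
               DENSE_RANK() OVER (ORDER BY total DESC) AS dense_rnk   -- no gaps
        FROM totals
        ORDER BY contendernum
        """
    ).fetchall()
    con.close()
    return result


for row in rank_variants(TOTALS):
    print(row)
# (1, 196, 1, 1)
# (2, 181, 3, 3)
# (3, 192, 2, 2)
# (4, 181, 3, 3)
# (5, 179, 5, 4)
```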

To answer your question title, "How to use Rank() in SQL Server," this is how it works:
I will use this set of data as an example:
create table #tmp
(
column1 varchar(3),
column2 varchar(5),
column3 datetime,
column4 int
)
insert into #tmp values ('AAA', 'SKA', '2013-02-01 00:00:00', 10)
insert into #tmp values ('AAA', 'SKA', '2013-01-31 00:00:00', 15)
insert into #tmp values ('AAA', 'SKB', '2013-01-31 00:00:00', 20)
insert into #tmp values ('AAA', 'SKB', '2013-01-15 00:00:00', 5)
insert into #tmp values ('AAA', 'SKC', '2013-02-01 00:00:00', 25)
You have a partition which basically specifies grouping.
In this example, if you partition by column2, the rank function will create ranks for groups of column2 values. There will be different ranks for rows where column2 = 'SKA' than rows where column2 = 'SKB' and so on.
The ranks are decided like this:
Within its partition, each row's rank is one plus the number of rows that come strictly before it in the ORDER BY order. The rank only increases when the ORDER BY value(s) differ from those of the preceding row. If the ORDER BY values are the same, the rows tie and receive the same rank, and the next distinct value skips ahead by the number of tied rows.
Knowing this, if we only wanted to select one row from each column2 group (the one with the earliest column3), we could use this query:
with cte as
(
select *,
rank() over (partition by column2
order by column3) rnk
from #tmp
) select * from cte where rnk = 1 order by column3;
Result:
COLUMN1 | COLUMN2 | COLUMN3 |COLUMN4 | RNK
------------------------------------------------------------------------------
AAA | SKB | January, 15 2013 00:00:00+0000 |5 | 1
AAA | SKA | January, 31 2013 00:00:00+0000 |15 | 1
AAA | SKC | February, 01 2013 00:00:00+0000 |25 | 1
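To see the tie and gap rule inside a partition, here is a sketch with Python's sqlite3 using the sample rows plus one hypothetical extra row (invented for illustration) that ties with an existing SKA row on column3. The two tied SKA rows both get rank 1 and the next SKA row gets rank 3; with this extra row, the rnk = 1 filter would also return two SKA rows.

```python
import sqlite3

ROWS = [
    ("AAA", "SKA", "2013-02-01", 10),
    ("AAA", "SKA", "2013-01-31", 15),
    ("AAA", "SKB", "2013-01-31", 20),
    ("AAA", "SKB", "2013-01-15", 5),
    ("AAA", "SKC", "2013-02-01", 25),
    ("AAA", "SKA", "2013-01-31", 7),  # hypothetical extra row that ties on column3
]


def ranks(rows):
    """Return (column2, column3, column4, rank within column2 ordered by column3)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tmp (column1 TEXT, column2 TEXT, column3 TEXT, column4 INTEGER)"
    )
    con.executemany("INSERT INTO tmp VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT column2, column3, column4,
               RANK() OVER (PARTITION BY column2 ORDER BY column3) AS rnk
        FROM tmp
        ORDER BY column2, column3, column4
        """
    ).fetchall()
    con.close()
    return result


for row in ranks(ROWS):
    print(row)
```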

You have to use DENSE_RANK rather than RANK; the only difference is that it doesn't leave gaps. You also shouldn't partition by ContenderNum, otherwise you're ranking each contender in a separate group, so each one is ranked 1st in its own group!
SELECT contendernum,totals, DENSE_RANK() OVER (ORDER BY totals desc) AS xRank FROM
(
SELECT ContenderNum ,SUM(Criteria1+Criteria2+Criteria3+Criteria4) AS totals
FROM dbo.Cat1GroupImpersonation
GROUP BY ContenderNum
) AS a
order by contendernum
A hint for using Stack Overflow: please post DDL and sample data so people can help you while spending less of their own time!
create table Cat1GroupImpersonation (
contendernum int,
criteria1 int,
criteria2 int,
criteria3 int,
criteria4 int);
insert Cat1GroupImpersonation select
1,196,0,0,0 union all select
2,181,0,0,0 union all select
3,192,0,0,0 union all select
4,181,0,0,0 union all select
5,179,0,0,0;

DENSE_RANK() is a rank with no gaps, i.e. it is “dense”.
select Name,EmailId,salary,DENSE_RANK() over(order by salary asc) from [dbo].[Employees]
RANK() leaves gaps in the ranking after ties.
select Name,EmailId,salary,RANK() over(order by salary asc) from [dbo].[Employees]

You have already grouped by ContenderNum, so there is no need to partition by it again.
Use DENSE_RANK() and order by totals DESC.
In short,
SELECT contendernum,totals, DENSE_RANK()
OVER (ORDER BY totals DESC)
AS xRank
FROM
(
SELECT ContenderNum ,SUM(Criteria1+Criteria2+Criteria3+Criteria4) AS totals
FROM dbo.Cat1GroupImpersonation
GROUP BY ContenderNum
) AS a

SELECT contendernum,totals, RANK() OVER (ORDER BY totals ASC) AS xRank FROM
(
SELECT ContenderNum ,SUM(Criteria1+Criteria2+Criteria3+Criteria4) AS totals
FROM dbo.Cat1GroupImpersonation
GROUP BY ContenderNum
) AS a

RANK() is good, but it assigns the same rank to equal values. If you need a unique rank for every row, ROW_NUMBER() solves this problem (ties are then broken arbitrarily unless you add more ORDER BY columns):
ROW_NUMBER() OVER (ORDER BY totals DESC) AS xRank

Select T.Tamil, T.English, T.Maths, T.Total, Dense_Rank()Over(Order by T.Total Desc) as Std_Rank From (select Tamil,English,Maths,(Tamil+English+Maths) as Total From Student) as T
