return each date that a value changed - sql-server

I have a table with a PersonID, TeamID, and DateEffective. I'd like to return the earliest date that each person changed team. It is possible that a person can move back into a team they were previously allocated to.
Create Table PHistory
(PersonID INT,
TeamID VARCHAR(8),
DateEffective datetime)
Insert INTO PHistory
Values (1, 'TeamA', '2017-07-01'),
(1, 'TeamA', '2017-07-02'),
(1, 'TeamB', '2017-07-03'),
(1, 'TeamA', '2017-07-04'),
(2, 'TeamA', '2017-07-01'),
(2, 'TeamA', '2017-07-02'),
(2, 'TeamA', '2017-07-03'),
(2, 'TeamA', '2017-07-04')
I'd like the query to return:
1|'TeamA'|2017-07-01
1|'TeamB'|2017-07-03
1|'TeamA'|2017-07-04
2|'TeamA'|2017-07-01
I've messed around with MIN() but it doesn't cope with the fact that people can move back into old teams.
I've seen this post that uses a window function, but it only returns the latest record and I need to see all team changes (I couldn't see how to adapt it to my needs).

Here is one approach using row_number
select
PersonId, minDate = min(DateEffective)
from (
select
*, rn1 = row_number() over (partition by PersonId order by DateEffective)
, rn2 = row_number() over (partition by PersonId, TeamId order by DateEffective)
from
PHistory
) t
group by PersonId, rn1 - rn2
order by 1,2
Output
1 2017-07-01 00:00:00.000
1 2017-07-03 00:00:00.000
1 2017-07-04 00:00:00.000
2 2017-07-01 00:00:00.000

Related

Snowflake : IN operator

so I want something as below in my query
select * from table a
where a.id in(select id, max(date) from table a group by id)
I am getting error here , as IN is equivalent to = .
how to do it?
example :
id
date
1
2022-31-01
1
2022-21-03
2
2022-01-01
2
2022-02-01
I need to get only one record based on date(max). The table has more columns than just id and date
so I need to something like this in snowflake
select * from table a
where id in(select id,max(date) from table a group by id)
```-----------------------
All solutions are working , if i select from table .
but i have case statement in view where duplicate records are coming
example :
create or replace view v_test
as
select * from
(
select id,lastdatetime,*,
case when start_date < timestamp and timestamp < end
and move_date = '9999-12-31' then 'Y'
else 'N' end as IND
from table a
) a
so if any one select view where IND= 'Y', more than 1 records are coming
what i want is to select latest records for ID where IND='Y' and max(lastdatetime)
how to incorporate this logic in view?
I think you are trying to get the latest record for each id?
select *
from table a
qualify row_number() over (partition by id order by date desc) = 1
So if we look at your sub-select:
using this "data" for the examples:
with data (id, _date) as (
select column1, to_date(column2, 'yyyy-dd-mm') from values
(1, '2022-31-01'),
(1, '2022-21-03'),
(2, '2022-01-01'),
(2, '2022-02-01')
)
select id, max(_date)
from data
group by 1;
it gives:
ID
MAX(_DATE)
1
2022-03-21
2
2022-01-02
which makes it seem you want the "the last date, per id"
which can classically (ansi sql) be written:
select d.*
from data as d
join (
select
id,
max(_date) as max_date
from data
group by 1
) as c
on d.id = c.id and d._date = c.max_date
;
ID
_DATE
1
2022-03-21
2
2022-01-02
which gives you "all the rows values". BUT if you have many rows with the same last date, you will get those, in the output.
Another methods is to use a ROW_NUMBER to pick one and only one row, which is the style of answer Mike has given:
with data (id, _date, extra) as (
select column1, to_date(column2, 'yyyy-dd-mm'), column3 from values
(1, '2022-31-01', 'extra_a'),
(1, '2022-21-03', 'extra_b_double_a'),
(1, '2022-21-03', 'extra_b_double_b'),
(2, '2022-01-01', 'extra_c'),
(2, '2022-02-01', 'extra_d')
)
select *
from data
qualify row_number() over (partition by id order by _date desc) =1 ;
gives:
ID
_DATE
EXTRA
1
2022-03-21
extra_b_double_a
2
2022-01-02
extra_d
now if you want the "all rows of the last day" you method works, albeit the QUALIFY/ROW_NUMBER is faster. You can use RANK
with data (id, _date, extra) as (
select column1, to_date(column2, 'yyyy-dd-mm'), column3 from values
(1, '2022-31-01', 'extra_a'),
(1, '2022-21-03', 'extra_b_double_a'),
(1, '2022-21-03', 'extra_b_double_b'),
(2, '2022-01-01', 'extra_c'),
(2, '2022-02-01', 'extra_d')
)
select *
from data
qualify dense_rank() over (partition by id order by _date desc) =1 ;
ID
_DATE
EXTRA
1
2022-03-21
extra_b_double_a
1
2022-03-21
extra_b_double_b
2
2022-01-02
extra_d
Now the last thing that it almost seems you are asking for, is "how do find the ID with the most recent data (here 1) and get all rows for that"
with data (id, _date, extra) as (
select column1, to_date(column2, 'yyyy-dd-mm'), column3 from values
(1, '2022-31-01', 'extra_a'),
(1, '2022-21-03', 'extra_b_double_a'),
(1, '2022-21-03', 'extra_b_double_b'),
(2, '2022-01-01', 'extra_c'),
(2, '2022-02-01', 'extra_d')
)
select *
from data
qualify id = last_value(id) over (order by _date);
Here is an example of how to use the in operator with a subquery:
select * from table1 t1 where t1.id in (select t2.id from table2 t2);
Usage of IN is possible to match on both columns:
select *
from tab AS a
where (a.id, a.date) in (select id, max(date) from tab group by id);
For sample data:
CREATE TABLE tab (id, date)
AS
SELECT column1, to_date(column2, 'yyyy-dd-mm')
FROM VALUES
(1, '2022-31-01'),
(1, '2022-21-03'),
(2, '2022-01-01'),
(2, '2022-02-01');
Output:

Need help pulling specific type of ID when there are multiple IDs

I have a dataset where an employee with one SSN can have multiple employee IDs. In that situation I need to only brings back records where the employee ID begins with '200.' In most situations there will only be one employee ID or the employee ID is null(which is okay to bring back).
This is a sample dataset:
declare #t table(id int, name varchar(100), ssn int, eeid int)
insert into #t
values(1, 'John Smith', '55512', '2006544'),
(1, 'John Smith', '55512', '12345'),
(2, 'Bob Johnson', '55514', '200454'),
(3, 'Tom Smith', '44454', NULL),
(4, 'John Thompson', '45434', '204435'),
(4, 'John Thompson', '45434', '12353568')
The output should look like this:
Id Name SSN EEID
1 John Smith 55512 2006544
2 Bob Johnson 55514 200454
3 Tom Smith 44454 NULL
4 John Thompson 45434 204435
I tried playing with a Window function but got stuck. I tried using Rownum but it didn't give the correct result with 'John Smith.'
select *,
row_number()over(partition by ssn ORDER BY case when EEID like '200%'
then 1 end) AS ROWNUM
from #t
ROW_NUMBER, on it's own, doesn't change the returned rows, it just "numbers" them. You'd either need to put the expression in a CTE/Subquery and then filter to the first row in the WHERE, or you use TOP and put the expression in the ORDER BY:
--CTE solution
WITH CTE AS(
SELECT ID,
[Name],
SSN,
EEID,
ROW_NUMBER() OVER (PARTITION BY SSD ORDER BY CASE WHEN eeid LIKE '200%' THEN 1 ELSE 2 END, eeid ASC) AS RN
FROM dbo.YourTable)
SELECT ID,
[Name],
SSN,
EEID
FROM CTE
WHERE RN = 1;
--ORDER BY and TOP solution
SELECT TOP (1) WITH TIES
ID,
[Name],
SSN,
EEID
FROM dbo.YourTable
ORDER BY ROW_NUMBER() OVER (PARTITION BY SSD ORDER BY CASE WHEN eeid LIKE '200%' THEN 1 ELSE 2 END, eeid);

Only show value of Max rows with partition by?

the title might be a bit off however i'm trying to remove the values of a row without removing the actual row.
This is my table:
SELECT ID,CustomerID,Weight FROM Orders
What am i trying to accomplish is this:
The MAX() value of ID Group By CustomerID that would give me null values in Weight where max and group by is not set
Is it possible to do this in one line? with a partiton by?
Something like:
SELECT MAX(ID) over (partition by CustomerID,Weight).... I know this is wrong but if possible to do without a join or CTE and only in one line in the select statement that would be great.
One possible approach is using ROW_NUMBER:
SELECT
ID,
CustomerID,
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY ID DESC) = 1 THEN [Weight]
ELSE Null
END AS [Weight]
FROM #Orders
ORDER BY ID
Input:
CREATE TABLE #Orders (
ID int,
CustomerID int,
[Weight] int
)
INSERT INTO #Orders
(ID, CustomerID, [Weight])
VALUES
(1, 11, 100),
(2, 11, 17),
(3, 11, 35),
(4, 22, 26),
(5, 22, 78),
(6, 22, 10030)
Output:
ID CustomerID Weight
1 11 NULL
2 11 NULL
3 11 35
4 22 NULL
5 22 NULL
6 22 10030
Try this
;WITH CTE
AS
(
SELECT
MAX_ID = MAX(ID) OVER(PARTITION BY CustomerId),
ID,
CustomerId,
Weight
FROM Orders
)
SELECT
ID,
CustomerId,
Weight = CASE WHEN ID = MAX_ID THEN Weight ELSE NULL END
FROM CTE
You can try this.
SELECT ID,CustomerId,CASE WHEN ID= MAX(ID) OVER(PARTITION BY CustomerId) THEN Weight ELSE NULL END AS Weight FROM Orders

Sort by most recent but keep together by another ID column

I am trying to get some sorting and keep together (not really grouping) working.
In my sample data I would like to keep the DealerIDs together, sorted by IsPrimaryDealer DESC, but show the group (ok maybe it is grouping) of dealers by the ones with the most recent entry.
Result set 2 is the closest, but Grant and his brother should be displayed as the first two rows, in that order. (Grant should be row 1, Grants Brother row 2 because Grants Brother was the most recently added)
DECLARE #temp TABLE (
DealerPK int not null IDENTITY(1,1), DealerID int,
IsPrimaryDealer bit, DealerName varchar(50), DateAdded datetime
)
INSERT INTO #temp VALUES
(1, 1, 'Bob', GETDATE() - 7),
(2, 1, 'Robert', GETDATE() - 7),
(3, 1, 'Grant', GETDATE() - 7),
(3, 0, 'Grants Brother', GETDATE() - 1),
(2, 0, 'Roberts Nephew', GETDATE() - 2),
(1, 0, 'Bobs Cousin', GETDATE() - 3)
-- Data As Entered
SELECT * FROM #temp
-- Data Attempt at Row Numbering
SELECT *, intPosition =
ROW_NUMBER() OVER (PARTITION BY IsPrimaryDealer ORDER BY DealerID, IsPrimaryDealer DESC)
FROM #temp
ORDER BY DateAdded DESC
-- Data Attempt By DateAdded
SELECT *, intPosition =
ROW_NUMBER() OVER (PARTITION BY DealerID ORDER BY DateAdded DESC)
FROM #temp
ORDER BY intPosition, DateAdded
Expected Result
PK DID IsPr Name DateAdded
3 3 1 Grant 2015-10-08 17:14:26.497
4 3 0 Grants Brother 2015-10-14 17:14:26.497
2 2 1 Robert 2015-10-08 17:14:26.497
5 2 0 Roberts Nephew 2015-10-13 17:14:26.497
1 1 1 Bob 2015-10-08 17:14:26.497
6 1 0 Bobs Cousin 2015-10-12 17:14:26.497
As requested by OP:
;WITH Cte AS(
SELECT *,
mx = MAX(DateAdded) OVER(PARTITION BY DealerID) FROM #temp
)
SELECT *
FROM Cte
ORDER BY mx DESC, DealerID, IsPrimaryDealer DESC
Hope i understood your question,
This query results expected output :
SELECT Row_number()
OVER (
PARTITION BY DealerID
ORDER BY DealerPK)RN,
DealerPK,
DealerID,
IsPrimaryDealer,
DealerName,
DateAdded
FROM #temp
ORDER BY DealerID DESC

SQL - Filter on dates X number of days apart from the previous

I have a table containing orders. I would like to select those orders that are a certain number of days apart for a specific client. For example, in the table below I would like to select all of the orders for CustomerID = 10 that are at least 30 days apart from the previous instance. With the starting point to be the first occurrence (07/05/2014 in this data).
OrderID | CustomerID | OrderDate
==========================================
1 10 07/05/2014
2 10 07/15/2014
3 11 07/20/2014
4 11 08/20/2014
5 11 09/21/2014
6 10 09/23/2014
7 10 10/15/2014
8 10 10/30/2014
I would want to select OrderIDs (1,6,8) since they are 30 days apart from each other and all from CustomerID = 10. OrderIDs 2 and 7 would not be included as they are within 30 days of the previous order for that customer.
What confuses me is how to set the "checkpoint" to the last valid date. Here is a little "pseudo" SQL.
SELECT OrderID
FROM Orders
WHERE CusomerID = 10
AND OrderDate > LastValidOrderDate + 30
i came here and i saw #SveinFidjestøl already posted answer but i can't control my self after by long tried :
with the help of LAG and LEAD we can comparison between same column
and as per your Q you are looking 1,6,8. might be this is helpful
SQL SERVER 2012 and after
declare #temp table
(orderid int,
customerid int,
orderDate date
);
insert into #temp values (1, 10, '07/05/2014')
insert into #temp values (2, 10, '07/15/2014')
insert into #temp values (3, 11, '07/20/2014')
insert into #temp values (4, 11, '08/20/2014')
insert into #temp values (5, 11, '09/21/2014')
insert into #temp values (6, 10, '09/23/2014')
insert into #temp values (7, 10, '10/15/2014')
insert into #temp values (8, 10, '10/30/2014');
with cte as
(SELECT orderid,customerid,orderDate,
LAG(orderDate) OVER (ORDER BY orderid ) PreviousValue,
LEAD(orderDate) OVER (ORDER BY orderid) NextValue,
rownum = ROW_NUMBER() OVER (ORDER BY orderid)
FROM #temp
WHERE customerid = 10)
select orderid,customerid,orderDate from cte
where DATEDIFF ( day , PreviousValue , orderDate) > 30
or PreviousValue is null or NextValue is null
SQL SERVER 2005 and after
WITH CTE AS (
SELECT
rownum = ROW_NUMBER() OVER (ORDER BY p.orderid),
p.orderid,
p.customerid,
p.orderDate
FROM #temp p
where p.customerid = 10)
SELECT CTE.orderid,CTE.customerid,CTE.orderDate,
prev.orderDate PreviousValue,
nex.orderDate NextValue
FROM CTE
LEFT JOIN CTE prev ON prev.rownum = CTE.rownum - 1
LEFT JOIN CTE nex ON nex.rownum = CTE.rownum + 1
where CTE.customerid = 10
and
DATEDIFF ( day , prev.orderDate , CTE.orderDate) > 30
or prev.orderDate is null or nex.orderDate is null
GO
You can use the LAG() function, available in SQL Server 2012, together with a Common Table Expression. You calculate the days between the customer's current order and the customer's previous order and then query the Common Table Expression using the filter >= 30
with cte as
(select OrderId
,CustomerId
,datediff(d
,lag(orderdate) over (partition by CustomerId order by OrderDate)
,OrderDate) DaysSinceLastOrder
from Orders)
select OrderId, CustomerId, DaysSinceLastOrder
from cte
where DaysSinceLastOrder >= 30 or DaysSinceLastOrder is null
Results:
OrderId CustomerId DaysSinceLastOrder
1 10 NULL
6 10 70
3 11 NULL
4 11 31
5 11 32
(Note that 1970-01-01 is chosen arbitrarily, you may choose any date)
Update
A slighty more reliable way of doing it will involve a temporary table. But the original table tbl can be left unchanged. See here:
CREATE TABLE #tmp (id int); -- set-up temp table
INSERT INTO #tmp VALUES (1); -- plant "seed": first oid
WHILE (##ROWCOUNT>0)
INSERT INTO #tmp (id)
SELECT TOP 1 OrderId FROM tbl
WHERE OrderId>0 AND CustomerId=10
AND OrderDate>(SELECT max(OrderDate)+30 FROM tbl INNER JOIN #tmp ON id=OrderId)
ORDER BY OrderDate;
-- now list all found entries of tbl:
SELECT * FROM tbl WHERE EXISTS (SELECT 1 FROM #tmp WHERE id=OrderId)
#tinka shows how to use CTEs to do the trick, and the new windowed functions (for 2012 and later) are probably the best answer. There is also the option, assuming you do not have a very large data set, to use a recursive CTE.
Example:
declare #customerid int = 10;
declare #temp table
(orderid int,
customerid int,
orderDate date
);
insert into #temp values (1, 10, '07/05/2014')
insert into #temp values (2, 10, '07/15/2014')
insert into #temp values (3, 11, '07/20/2014')
insert into #temp values (4, 11, '08/20/2014')
insert into #temp values (5, 11, '09/21/2014')
insert into #temp values (6, 10, '09/23/2014')
insert into #temp values (7, 10, '10/15/2014')
insert into #temp values (8, 10, '10/30/2014');
with datefilter AS
(
SELECT row_number() OVER(PARTITION BY CustomerId ORDER BY OrderDate) as RowId,
OrderId,
CustomerId,
OrderDate,
DATEADD(day, 30, OrderDate) as FilterDate
from #temp
WHERE CustomerId = #customerid
)
, firstdate as
(
SELECT RowId, OrderId, CustomerId, OrderDate, FilterDate
FROM datefilter
WHERE rowId = 1
union all
SELECT datefilter.RowId, datefilter.OrderId, datefilter.CustomerId,
datefilter.OrderDate, datefilter.FilterDate
FROM datefilter
join firstdate
on datefilter.CustomerId = firstdate.CustomerId
and datefilter.OrderDate > firstdate.FilterDate
WHERE NOT EXISTS
(
SELECT 1 FROM datefilter betweens
WHERE betweens.CustomerId = firstdate.CustomerId
AND betweens.orderdate > firstdate.FilterDate
AND datefilter.orderdate > betweens.orderdate
)
)
SELECT * FROM firstdate

Resources