Segregate same columns into two columns - snowflake-cloud-data-platform

I have a table with columns:
Id- unique id for each name
Name- client name
Date- account opening date
Sample table:
Id   Name  Date
---  ----  ---------
101  a     8/7/2022
102  b     6/6/2022
101  a     16/8/2022
104  d     13/8/2022
105  e     23/4/2022
Query I am using:
Select id, name,
Max(date) over (partition by id) as max,
Min(date) over (partition by id) as min
From table
Output:
Id   Name  Date     Max      Min
---  ----  -------  -------  -------
101  a     8/7/22   16/8/22  8/7/22
102  b     6/6/22   6/6/22   6/6/22
101  a     16/8/22  16/8/22  8/7/22
104  d     13/8/22  13/8/22  13/8/22
105  e     23/4/22  23/4/22  23/4/22
I want to split the date column into a maximum date and a minimum date per id. In addition, wherever the maximum and minimum are the same date, that date should appear only in the max column, with null in the min column.
The expected output below should make this clear.
Expected output:
Id   Name  Date     Max      Min
---  ----  -------  -------  -------
101  a     8/7/22   16/8/22  8/7/22
102  b     6/6/22   6/6/22
101  a     16/8/22  16/8/22  8/7/22
104  d     13/8/22  13/8/22
105  e     23/4/22  23/4/22
Any help with this would be appreciated.

You can use the IFF function or a CASE expression:
;With SampleData As
(
Select Id, Name, To_Date(Date) AS Date
From
(
Values(101, 'a', '07/08/2022')
, (102, 'b', '06/06/2022')
, (101, 'a', '08/16/2022')
, (104, 'd', '08/13/2022')
, (105, 'e', '04/23/2022')
) AS Data(Id, Name, Date)
)
Select id
, name
, Max(date) over (partition by id) as max
, IFF(Min(date) over (partition by id) = Max(date) over (partition by id)
, NULL
, Min(date) over (partition by id)) as min
From SampleData
Please note that my date format is different than yours.

Mova's answer is correct and concise. It can also be expressed as a nested select, again with a CTE for the data:
with data(Id, Name, Date) as (
select column1, column2, to_date(column3, 'dd/mm/yyyy') from values
(101, 'a', '8/7/2022'),
(102, 'b', '6/6/2022'),
(101, 'a', '16/8/2022'),
(104, 'd', '13/8/2022'),
(105, 'e', '23/4/2022')
)
select
id,
name,
date,
max_date,
iff(max_date != min_date, min_date, null) as min_date
from(
select
id
,name
,date
,max(date) over(partition by id, name) as max_date
,min(date) over(partition by id, name) as min_date
from data
)
ID   NAME  DATE        MAX_DATE    MIN_DATE
---  ----  ----------  ----------  ----------
101  a     2022-07-08  2022-08-16  2022-07-08
102  b     2022-06-06  2022-06-06  null
101  a     2022-08-16  2022-08-16  2022-07-08
104  d     2022-08-13  2022-08-13  null
105  e     2022-04-23  2022-04-23  null
or a CASE can be used instead of the IFF
select
id,
name,
date,
max_date,
case when max_date != min_date then min_date end as min_date
from(
...
This shows that we want either a null or the value itself, which is exactly what NULLIF does, so it can be collapsed into one short expression:
select
id
,name
,date
,max(date) over(partition by id, name) as max_date
,nullif(min(date) over(partition by id, name), max(date) over(partition by id, name)) as min_date
from data
And we can reuse the max_date alias to make it shorter still (Snowflake lets a later expression in the same SELECT list refer to an earlier alias):
select
id
,name
,date
,max(date) over(partition by id, name) as max_date
,nullif(min(date) over(partition by id, name), max_date) as min_date
from data
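To see the NULLIF behaviour concretely, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. It is an illustration under assumptions, not the Snowflake setup from the question: the table name accounts and the column opened are made up, the dates are stored as ISO strings so that MIN/MAX order them chronologically, and the window expression is repeated because the alias reuse shown above is a Snowflake feature. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical stand-in for the question's table; dates are ISO-8601 text
# so that MIN/MAX compare them chronologically.
rows = [
    (101, "a", "2022-07-08"),
    (102, "b", "2022-06-06"),
    (101, "a", "2022-08-16"),
    (104, "d", "2022-08-13"),
    (105, "e", "2022-04-23"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, name TEXT, opened TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)

query = """
SELECT id,
       name,
       opened,
       MAX(opened) OVER (PARTITION BY id) AS max_date,
       -- NULLIF(a, b) returns NULL when a = b, otherwise a
       NULLIF(MIN(opened) OVER (PARTITION BY id),
              MAX(opened) OVER (PARTITION BY id)) AS min_date
FROM accounts
ORDER BY id, opened
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Ids with a single row come back with None (NULL) in min_date, while id 101 keeps its real minimum on both of its rows.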

Related

Repeated data on inserted rows

--demo setup
drop table if exists dbo.product
go
create table dbo.Product
(
ProductId int,
ProductTitle varchar(55),
ProductCategory varchar(255),
Loaddate datetime
)
insert into dbo.Product
values (1, 'Table', 'ABCD', '3/4/2018'),
(1, 'Table', 'ABCD', '3/5/2018'),
(1, 'Table', 'ABCD', '3/6/2018'),
(1, 'Table', 'XYZ', '3/7/2018'),
(1, 'Table', 'XYZ', '3/8/2018'),
(1, 'Table', 'XYZ', '3/9/2018'),
(1, 'Table', 'GHI', '3/10/2018'),
(1, 'Table', 'GHI', '3/11/2018'),
(1, 'Table', 'XYZ', '3/12/2018'),
(1, 'Table', 'XYZ', '3/13/2018')
SELECT
product.productid,
product.producttitle,
product.productcategory,
MIN(product.loaddate) AS BeginDate,
-- ,max(product.LoadDate) as BeginDate1
CASE
WHEN MAX(product.loaddate) = MAX(oa.enddate1)
THEN '12/31/9999'
ELSE MAX(product.loaddate)
END AS EndDate
FROM
dbo.product product
CROSS APPLY
(SELECT MAX(subproduct.loaddate) EndDate1
FROM dbo.product subproduct
WHERE subproduct.productid = product.productid) oa
GROUP BY
productid, producttitle, productcategory
Output:
productid  producttitle  productcategory  BeginDate                EndDate
---------  ------------  ---------------  -----------------------  -----------------------
1          Table         ABCD             2018-03-04 00:00:00.000  2018-03-06 00:00:00.000
1          Table         XYZ              2018-03-07 00:00:00.000  9999-12-31 00:00:00.000
1          Table         GHI              2018-03-10 00:00:00.000  2018-03-11 00:00:00.000
Desired output:
productid  producttitle  productcategory  BeginDate                EndDate
---------  ------------  ---------------  -----------------------  -----------------------
1          Table         ABCD             2018-03-04 00:00:00.000  2018-03-06 00:00:00.000
1          Table         XYZ              2018-03-07 00:00:00.000  2018-03-09 00:00:00.000
1          Table         GHI              2018-03-10 00:00:00.000  2018-03-11 00:00:00.000
1          Table         XYZ              2018-03-12 00:00:00.000  9999-12-31 00:00:00.000
The last two inserted rows repeat the data (category XYZ) from Loaddate '3/7/2018'-'3/9/2018'. The problem does not occur when none of the newly inserted rows repeats earlier data. The only thing that changes is the LoadDate, which gives me incorrect output. How can I get the desired output?

Well, first of all, you need a sequence number over all your records. If you already have a primary key, that's good. In the example you gave us there's no such column, so let's generate it.
Then we pair each row with the next one to get start and end dates for each category change, and assign a group to each run of consecutive rows that share a category.
Finally, a simple group by does the rest:
;
with cte as ( select *,
row_number() over(partition by ProductId order by Loaddate) as rn
from product
), cte2 as ( select t1.ProductId,
t1.ProductTitle,
t1.ProductCategory,
t1.Loaddate as BeginDate,
case
when t1.ProductCategory <> t2.ProductCategory
then t1.Loaddate
else coalesce(t2.Loaddate, null)
end as EndDate,
row_number() over(order by t1.ProductId, t1.Loaddate) as rn_overall,
row_number() over(partition by t1.ProductId, t1.ProductCategory order by t1.Loaddate) as rn_category
from cte as t1
left join cte as t2
on t2.ProductId = t1.ProductId
and t2.rn = t1.rn + 1
), cte3 as ( select *,
min(rn_overall) over (partition by ProductId, ProductCategory, rn_overall - rn_category) as product_group
from cte2
)
select ProductId, ProductTitle, ProductCategory,
min(BeginDate) as BeginDate,
case
when max(case when EndDate is null then 1 else 0 end) = 0
then max(EndDate)
else null
end as EndDate
from cte3
group by ProductId, ProductTitle, ProductCategory, product_group
order by ProductId, BeginDate
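The steps described above (order the rows, detect each category change, then collapse every unbroken run into one begin/end pair) can also be sketched outside SQL. The Python below is only an illustration of that idea on the question's sample rows, not a translation of the query. The helper collapse_runs is a made-up name, and like the query it leaves the end date of the last run empty (None) rather than filling in 12/31/9999.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question:
# (ProductId, ProductTitle, ProductCategory, Loaddate)
rows = [
    (1, "Table", "ABCD", date(2018, 3, 4)),
    (1, "Table", "ABCD", date(2018, 3, 5)),
    (1, "Table", "ABCD", date(2018, 3, 6)),
    (1, "Table", "XYZ", date(2018, 3, 7)),
    (1, "Table", "XYZ", date(2018, 3, 8)),
    (1, "Table", "XYZ", date(2018, 3, 9)),
    (1, "Table", "GHI", date(2018, 3, 10)),
    (1, "Table", "GHI", date(2018, 3, 11)),
    (1, "Table", "XYZ", date(2018, 3, 12)),
    (1, "Table", "XYZ", date(2018, 3, 13)),
]


def collapse_runs(rows):
    """Collapse consecutive rows with the same product and category into
    (ProductId, ProductTitle, ProductCategory, BeginDate, EndDate) tuples."""
    ordered = sorted(rows, key=lambda r: (r[0], r[3]))
    out = []
    for _, product_rows in groupby(ordered, key=lambda r: r[0]):
        # groupby only merges adjacent equal keys, so a category that
        # reappears later (XYZ here) starts a new run.
        runs = [list(g) for _, g in groupby(product_rows, key=lambda r: (r[1], r[2]))]
        for i, run in enumerate(runs):
            is_last = i == len(runs) - 1
            end = None if is_last else run[-1][3]  # current run stays open-ended
            out.append((run[0][0], run[0][1], run[0][2], run[0][3], end))
    return out


periods = collapse_runs(rows)
for p in periods:
    print(p)
```

Because only adjacent rows are merged, the second XYZ run stays separate from the first, which is the behaviour the plain GROUP BY in the question was missing.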

How to update rows between two different sets of criteria in SQL Server without using a loop

Issue: How to update rows between two different sets of criteria in SQL Server without using a loop (SQL Server 2014). In other words, for each row in a result set, how to update every row between the first occurrence (with one criterion) and the second occurrence (with different criteria). I think part of the issue is trying to run a TOP N query for every row in the query.
Specifically:
In the example starting table below, how can I update the last 2 columns of dates where:
Update rows between the null Category rows and the last consecutive "M" Category row if the null Category row is preceded by a "S" Category. Category can contain any order of "S", "M", or null.
Set StartDate = IDEndDate+1 day of the "S" row preceding the null row.
Set EndDate = IDEndDate of the last row with a "M" Category.
Here is a SQLFiddle.
Notes: I have done this in the past with a loop (fetch..) but I am trying to do this with a few queries instead kind of like:
step 1: Get work: select all valid null rows (beginning of range)
step 2: for each row above, select the related last "M" row (end of range) and then run a query to update the StartDate, EndDates in each range.
Starting Table:
ID IDStartDate IDEndDate Category
------------------------------------
11 2017-01-01 2017-01-31 S
11 2017-02-02 2017-02-03 null
11 2017-02-03 2017-03-31 M
11 2017-04-01 2017-04-30 M
22 2017-05-01 2017-06-15 S
22 2017-06-16 2017-06-20 null
22 2017-06-21 2017-06-25 M
22 2017-06-26 2017-06-27 null
22 2017-06-28 2017-06-29 S
22 2017-06-30 2017-07-05 M
33 2017-06-30 2017-07-14 M
33 2017-07-15 2017-07-20 S
33 2017-07-21 2017-07-25 null
44 2018-06-30 2018-07-14 S
44 2018-07-15 2018-07-20 M
44 2018-07-21 2018-07-25 null
Desired Ending Table:
ID IDStartDate IDEndDate Category StartDate EndDate
----------------------------------------------------------
11 2017-01-01 2017-01-31 S
11 2017-02-02 2017-02-03 null 2017-02-01 2017-04-30
11 2017-02-03 2017-03-31 M 2017-02-01 2017-04-30
11 2017-04-01 2017-04-30 M 2017-02-01 2017-04-30
22 2017-05-01 2017-06-15 S
22 2017-06-16 2017-06-20 null 2017-06-16 2017-06-25
22 2017-06-21 2017-06-25 M 2017-06-16 2017-06-25
22 2017-06-26 2017-06-27 null
22 2017-06-28 2017-06-29 S
22 2017-06-30 2017-07-05 M
33 2017-06-30 2017-07-14 M
33 2017-07-15 2017-07-20 S
33 2017-07-21 2017-07-25 null
44 2018-06-30 2018-07-14 S
44 2018-07-15 2018-07-20 M
44 2018-07-21 2018-07-25 null
Below is some SQL to create the table and view the query results that I have started. I tried cte, cross apply, outer apply, inner joins... with no luck.
thanks so much!
CREATE TABLE test (
ID INT,
IDStartDate date,
IDEndDate date,
Category VARCHAR (2),
StartDate date,
EndDate date
);
INSERT INTO test (ID, IDStartDate, IDEndDate, Category)
VALUES
(11, '2017-01-01', '2017-01-31', 'S')
,(11, '2017-02-02', '2017-02-03', null)
,(11, '2017-02-03', '2017-03-31', 'M')
,(11, '2017-04-01', '2017-04-30', 'M')
,(22, '2017-05-01', '2017-06-15', 'S')
,(22, '2017-06-16', '2017-06-20', null)
,(22, '2017-06-21', '2017-06-25', 'M')
,(22, '2017-06-26', '2017-06-27', null)
,(22, '2017-06-28', '2017-06-29', 'S')
,(22, '2017-06-30', '2017-07-05', 'M')
,(33, '2017-06-30', '2017-07-14', 'M')
,(33, '2017-07-15', '2017-07-20', 'S')
,(33, '2017-07-21', '2017-07-25', null)
,(44, '2018-06-30', '2018-07-14', 'S')
,(44, '2018-07-15', '2018-07-20', 'M')
,(44, '2018-07-21', '2018-07-25', null);
--**************************
--results: shows first rows of each range
--**************************
;with cte as
(
select *
,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS RowNum
,LAG(IDEndDate) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS lastIDEndDate
,LAG(Category) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS lastCategory
,LEAD(Category) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS nextCategory
from test
)
select * --select first row of each range to update
from cte
where Category is null and lastCategory = 'S' and nextCategory = 'M'
--*******************************
--6 of 8 "new" values are correct (missing NewEndDate for first range)
--*******************************
;with cte as
(
SELECT *
,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS RowNum
,LAG(IDEndDate) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS lastIDEndDate
,LAG(Category) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS lastCategory
,LEAD(Category) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS nextCategory
FROM test
), cte2 as
(
select * --find the first/start row of each range
,LAG(RowNum) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS lastRowNum
,IIF(Category is null and lastCategory = 'S' and nextCategory = 'M', DateAdd(day, 1, lastIDEndDate), null) as NewStartDate
,IIF(Category is null and lastCategory = 'S' and nextCategory = 'M', RowNum, null) as NewStartRowNum
from cte
)
select t1.*, t3.*
from cte2 t1
outer apply
(
select top 1 --find the last/ending row of each range
t2.lastIDEndDate as NewEndDate
,t2.lastRowNum as NewEndRowNum
from cte2 t2
where t1.ID = t2.ID
and t1.NewStartRowNum < t2.RowNum
and t2.nextCategory <> 'M'
order by t2.ID, t2.RowNum
) t3
order by t1.ID, t1.RowNum

Here's an attempt on this SQL puzzle.
Basically, it updates from a CTE.
First it calculates a cumulative sum to create a kind of ranking.
Then it calculates the dates only for ranks 2 and 3.
;WITH CTE AS
(
SELECT ID, IDStartDate, IDEndDate, Category, StartDate, EndDate,
DATEADD(day,1, FIRST_VALUE(IDEndDate) OVER (PARTITION BY ID ORDER BY IDStartDate)) AS NewStartDate,
FIRST_VALUE(IDEndDate) OVER (PARTITION BY ID ORDER BY IDStartDate DESC) AS NewEndDate
FROM
(
SELECT ID, IDStartDate, IDEndDate, Category, StartDate, EndDate,
SUM(CASE WHEN Category = 'S' THEN 2 WHEN Category IS NULL THEN 1 END) OVER (PARTITION BY ID ORDER BY IDStartDate) AS cSum
FROM test t
) q
WHERE cSum IN (2, 3)
)
UPDATE CTE
SET
StartDate = NewStartDate,
EndDate = NewEndDate
WHERE (Category IS NULL OR Category = 'M');
A test on rextester here

I answered my own question. I had two major errors:
1) A Cross Apply (or Outer Apply) is needed for the Top N query to work properly.
Using a cross apply, the Top N query will be run for each row from the inner query.
Using an inner join (or left join), all rows will be returned first from the inner query and the Top N query runs only once.
2) Filtering on "[column] <> 'M'" tripped me up because it does not match NULLs: the comparison evaluates to unknown for a NULL, so those rows were dropped. I had to use "[column] = 'S' or [column] is null" instead.
Final SQL found in rextester
Working code below:
;with cte as
(
SELECT *
,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS RowNum
,LAG(IDEndDate) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS lastIDEndDate
,LAG(Category) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS lastCategory
,LEAD(Category) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS nextCategory
FROM test
), cte2 as
(
select t1.ID, t1.IDStartDate, t1.IDEndDate --find the first/start row of the range
,IIF(Category is null and lastCategory = 'S' and nextCategory = 'M', DateAdd(day, 1, lastIDEndDate), null) as NewStartDate
,IIF(Category is null and lastCategory = 'S' and nextCategory = 'M', RowNum, null) as NewStartRowNum
,t3.*
from cte t1
cross apply
(
select top 1 --find the last/ending row of the range
t2.IDEndDate as NewEndDate
,t2.RowNum as NewEndRowNum
from cte t2
where t1.ID = t2.ID
and t1.RowNum < t2.RowNum
and (t2.nextCategory ='S' or t2.nextCategory is null)
order by t2.ID, t2.RowNum
) t3
where Category is null and lastCategory = 'S' and nextCategory = 'M'
)
update t4
set StartDate = NewStartDate
,EndDate = NewEndDate
from cte t4
inner join cte2 t5
on t4.ID = t5.ID
and t4.RowNum Between NewStartRowNum and NewEndRowNum
select * from test
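Point 2 above (a "<> 'M'" filter not matching NULL rows) is easy to reproduce in isolation. The sketch below uses an in-memory SQLite table from Python purely as a stand-in; the table t and the column cat are invented for the example. The same three-valued logic applies in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cat TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("S",), ("M",), (None,)])

# NULL <> 'M' evaluates to unknown, so the NULL row is filtered out.
not_m = conn.execute(
    "SELECT cat FROM t WHERE cat <> 'M' ORDER BY cat"
).fetchall()

# Spelling out the NULL case keeps that row.
s_or_null = conn.execute(
    "SELECT cat FROM t WHERE cat = 'S' OR cat IS NULL ORDER BY cat"
).fetchall()

print(not_m)
print(s_or_null)
```

The first query returns only the 'S' row, while the second returns both the NULL row and the 'S' row.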

SQL frequency count

I have a table in SSMS:
Id Date Value
111 1/1/18 x
111 1/2/18 x
111 1/3/18 y
111 1/4/18 y
111 1/5/18 x
111 1/6/18 x
222 1/3/18 z
222 1/6/18 y
222 1/8/18 y
I want to count the frequency of the latest value, so the output will be:
Id Value Days
111 x 2 (for 1/5/18 & 1/6/18)
222 y 3 (for 1/6/18 & 1/8/18; here I assume 1/7/18 is a weekend or holiday. Even though my table skips the weekend, we still want to count days for the weekend)
How would this be done? Many thanks!

Use lag to get the previous row's value and then a running sum to assign groups. Thereafter count the number in the first group.
select id,val,datediff(day,min(date),max(date))+1 as days
from (select t.*,sum(case when val=prev_val then 0 else 1 end) over(partition by id order by date desc) as grp
from (select t.*,lag(val) over(partition by id order by date desc) as prev_val
from tbl t
) t
) t
where grp=1
group by id,val
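As a cross-check of the logic (walk each id from its latest date backwards and stop at the first change of value), here is a small Python sketch on the question's sample data. It is not the SQL answer itself: latest_streak_days is a hypothetical helper, and it counts calendar days between the first and last date of the streak inclusively, like the datediff(...)+1 above.

```python
from datetime import date

# Sample rows from the question: (Id, Date, Value)
rows = [
    (111, date(2018, 1, 1), "x"),
    (111, date(2018, 1, 2), "x"),
    (111, date(2018, 1, 3), "y"),
    (111, date(2018, 1, 4), "y"),
    (111, date(2018, 1, 5), "x"),
    (111, date(2018, 1, 6), "x"),
    (222, date(2018, 1, 3), "z"),
    (222, date(2018, 1, 6), "y"),
    (222, date(2018, 1, 8), "y"),
]


def latest_streak_days(rows):
    """Return {id: (latest_value, days)} for the most recent run of equal values."""
    by_id = {}
    for id_, d, value in rows:
        by_id.setdefault(id_, []).append((d, value))

    result = {}
    for id_, items in by_id.items():
        items.sort(reverse=True)  # latest date first
        latest_date, latest_value = items[0]
        earliest = latest_date
        for d, value in items[1:]:
            if value != latest_value:
                break  # the run of the latest value ends here
            earliest = d
        # Inclusive calendar span, so skipped days inside the run are counted.
        result[id_] = (latest_value, (latest_date - earliest).days + 1)
    return result


streaks = latest_streak_days(rows)
print(streaks)
```

For id 222 the run covers 1/6/18 to 1/8/18, so the skipped 1/7/18 is included in the count, as the question asks.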

Try:
SELECT COUNT(*) FROM Table1 WHERE Value =
(
SELECT Value FROM Table1 WHERE Id = MAX(Id)
)

I hope you want this
select Id, count(Date) as "Days", Value from SSMS
group by ID, Value
correct me if I'm wrong

This answer should account for the weekends and holiday assumptions you have made (with another test case).
SELECT
T.Id, T.val, DATEDIFF(DD, COALESCE(T.MaxSwitch, T.MinMatch, T.MaxDate), T.MaxDate) + 1 AS [Days]
FROM (
SELECT
T.Id,
MAX(CASE WHEN T.LastValue IS NULL THEN T.val ELSE '' END) AS [val],
MAX(T.Date) AS [MaxDate],
MAX(CASE WHEN t.val <> t.LastValue THEN T.RunningDate ELSE NULL END) AS [MaxSwitch],
MIN(CASE WHEN t.val = t.LastValue THEN T.[Date] ELSE NULL END) AS [MinMatch]
FROM (SELECT *, LAG(val) OVER (PARTITION BY Id ORDER BY DATE DESC) AS LastValue,
LAG([Date]) OVER (PARTITION BY Id ORDER BY DATE DESC) AS RunningDate FROM #T) T
GROUP BY
T.Id
) T
This approach uses LAG to track previous value and date so that it can determine (1) the last value to get running match, (2) the latest date when value switched to most recent value, and (3) the earliest date with value matching final date. It then calculates the date difference to account for skipping days in table from priority of (A) latest date value switched to recent value, (B) or if no switch occurred, then earliest date with value matching final date.
For the sample data below:
CREATE TABLE #T (
Id INT, [Date] DATE, val VARCHAR(10)
)
INSERT #T VALUES
('111', '1/1/18', 'x'),
('111', '1/2/18', 'x'),
('111', '1/3/18', 'y'),
('111', '1/4/18', 'y'),
('111', '1/5/18', 'x'),
('111', '1/6/18', 'x'),
('222', '1/2/18', 'y'),
('222', '1/3/18', 'z'),
('222', '1/6/18', 'y'),
('222', '1/8/18', 'y'),
('333', '1/9/18', 'a')
The following output is given:
Id val Days
----------- ---------- -----------
111 x 2 (from OP example)
222 y 3 (from OP example)
333 a 1 (case of single value)

Getting sum value of a column having minimum date?

I have data like this:
DATE ID weight
---- ---- -------
2017-04-25 11:05:42.273 247 0.418
2017-04-25 11:05:42.310 248 0.568
2017-04-25 13:57:55.327 247 0.418
2017-04-25 13:57:55.360 247 0.534
2017-04-25 13:57:55.397 248 0.568
2017-04-25 13:57:55.453 248 0.448
Now the requirement is to sum the weight for each barcode ID (the ID column) at its minimum date.
Here the output should be (0.418 + 0.568), because those are the weights at the minimum date for barcodes 247 and 248 respectively.

Use a window function to assign a row number that starts over for each partition (ID), then sum only the rows with row number 1. A CTE or subquery is needed because RN is not available to filter on at the same query level.
A partition is just a grouping of records by the columns specified, so IDs 247 and 248 are different groups, and row number 1 is assigned to the earliest date in each partition. When we then say where rn = 1, we only get the weights for the earliest date of each ID.
WITH CTE AS (SELECT A.*
, Row_NUMBER() Over (Partition by ID order by Date asc) RN
FROM TABLE A)
SELECT Sum(Weight)
FROM CTE
WHERE RN = 1
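The query above uses TABLE as a placeholder name, so here is a runnable sketch of the same pattern against an in-memory SQLite database from Python. The table name weights and the column name dt are assumptions made for the example, and the timestamps are stored as text, which sorts correctly for this fixed format. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from the question: (DATE, ID, weight)
rows = [
    ("2017-04-25 11:05:42.273", 247, 0.418),
    ("2017-04-25 11:05:42.310", 248, 0.568),
    ("2017-04-25 13:57:55.327", 247, 0.418),
    ("2017-04-25 13:57:55.360", 247, 0.534),
    ("2017-04-25 13:57:55.397", 248, 0.568),
    ("2017-04-25 13:57:55.453", 248, 0.448),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weights (dt TEXT, id INTEGER, weight REAL)")
conn.executemany("INSERT INTO weights VALUES (?, ?, ?)", rows)

query = """
WITH cte AS (
    SELECT w.*,
           -- numbering restarts at 1 for every id, earliest timestamp first
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt ASC) AS rn
    FROM weights AS w
)
SELECT SUM(weight) FROM cte WHERE rn = 1
"""
total = conn.execute(query).fetchone()[0]
print(round(total, 3))
```

Only the earliest row of each ID survives the rn = 1 filter, so the sum is taken over one weight per ID.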

Edit: Well I have egg on my face. Fixed
I believe a simple sub query will suffice
SELECT sum(weight)
FROM Table t1
WHERE DATE = (select min(DATE) from Table t2 where t1.ID = t2.ID group by id)

;With cte([DATE],ID,[weight])
AS
(
SELECT '2017-04-25 11:05:42.273', 247, 0.418 Union all
SELECT '2017-04-25 11:05:42.310', 248, 0.568 Union all
SELECT '2017-04-25 13:57:55.327', 247, 0.418 Union all
SELECT '2017-04-25 13:57:55.360', 247, 0.534 Union all
SELECT '2017-04-25 13:57:55.397', 248, 0.568 Union all
SELECT '2017-04-25 13:57:55.453', 248, 0.448
)
SELECT Sum(MinWeight) [SumOFweight] From
(
SELECT ID,DATE,Min([weight])OVER(Partition by DATE) AS MinWeight ,Row_NUMBER() Over (Partition by ID order by Date asc) RN From
(
SELECT DATE,ID,SUM([weight])[weight] FROM cte
GROUP by ID,DATE
)dt
)Final
where Final.RN=1
OutPut
SumOFweight
-------------
0.986

Determine consecutive date count in SQL Server

I have some data that looks like this:
id date
--------------------------------
123 2013-04-08 00:00:00.000
123 2013-04-07 00:00:00.000
123 2013-04-06 00:00:00.000
123 2013-04-04 00:00:00.000
123 2013-04-03 00:00:00.000
I need to return a count of the most recent consecutive date streak for a given ID, which in this case would be 3 for id 123. I have no idea if this can be done in SQL. Any suggestions?

The way to do this is to subtract a sequence of numbers from the dates. The difference is constant within a run of consecutive dates. Here is an example to get the length of all sequences for an id:
select id, grp, count(*) as NumInSequence, min(date), max(date)
from (select t.*,
(date - row_number() over (partition by id order by date)) as grp
from data t
) t
group by id, grp
To get the longest one, I would use row_number() again:
select t.*
from (select id, grp, count(*) as NumInSequence,
min(date) as mindate, max(date) as maxdate,
row_number() over (partition by id order by count(*) desc) as seqnum
from (select t.*,
(date - row_number() over (partition by id order by date)) as grp
from data t
) t
group by id, grp
) t
where seqnum = 1
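To make the "date minus row number" trick tangible, here is a short Python sketch on the question's five dates. It is an illustration only: the helper name streaks is invented, and duplicate dates are removed first as a simplifying assumption. Note that the query above returns the longest streak, while the question asks for the most recent one; the sketch shows how to pick either from the same grouping.

```python
from collections import defaultdict
from datetime import date, timedelta

# Dates for id 123 from the question
dates = [
    date(2013, 4, 8),
    date(2013, 4, 7),
    date(2013, 4, 6),
    date(2013, 4, 4),
    date(2013, 4, 3),
]


def streaks(dates):
    """Group distinct dates into runs of consecutive days.

    Within one run, (date - row_number) is the same value, so it
    serves as the group key, exactly like grp in the SQL above.
    """
    groups = defaultdict(list)
    for row_number, d in enumerate(sorted(set(dates)), start=1):
        groups[d - timedelta(days=row_number)].append(d)
    # Each value is one run in ascending order; sort the runs by start date.
    return sorted(groups.values())


runs = streaks(dates)
most_recent = runs[-1]          # the run that ends on the latest date
longest = max(runs, key=len)    # what the seqnum = 1 query selects
print(len(most_recent), len(longest))
```

For this data both choices give the same run (2013-04-06 to 2013-04-08), but they can differ when an older streak is longer than the latest one.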
