Snowflake - How to create summary table containing unique records - snowflake-cloud-data-platform

I'm looking for some Snowflake syntax assistance in how to generate a summary table or view from an existing table. My summary table should have 1 row per unique id from the existing table along with boolean values indicating if the various milestones (as per the summary column names) have been hit. Any help is appreciated as I am a Snowflake novice. Thanks.
Existing Table
Desired Summary Table/View

So using Himanshu's data, thank you:
WITH fake_data(id, updated, pipeline_id, stage_id) AS (
SELECT column1, to_date(column2,'mm/dd/yyyy hh24:mi:ss'), column3, column4
FROM VALUES
(1111, '02/01/2022 09:01:00', 'A', '1' ),
(1111, '02/01/2022 10:01:00', 'A', '2' ),
(1111, '02/01/2022 11:01:00', 'B', '5' ),
(2222, '02/02/2022 13:01:00', 'A', '1' ),
(2222, '02/03/2022 18:01:00', 'B', '5' ),
(2222, '02/04/2022 07:01:00', 'B', '6' ),
(3333, '02/02/2022 14:01:00', 'A', '1' ),
(3333, '02/03/2022 18:01:00', 'A', '2' ),
(3333, '02/03/2022 07:01:00', 'C', '7' ),
(3333, '02/03/2022 21:01:00', 'C', '8' ),
(3333, '02/05/2022 17:01:00', 'C', '9' )
)
We aggregate across each id and use COUNT_IF to count how many rows meet each criterion; if the count is greater than 0, the milestone has been hit:
SELECT
id,
count_if(pipeline_id='A')>0 AS hit_stage_a,
count_if(pipeline_id='B')>0 AS hit_stage_b,
count_if(pipeline_id='C')>0 AS hit_stage_c,
count_if(stage_id='4')>0 AS hit_stage_4,
count_if(stage_id='5')>0 AS hit_stage_5,
count_if(stage_id='6')>0 AS hit_stage_6
FROM fake_data
GROUP BY 1
ORDER BY 1;
gives:
ID    HIT_STAGE_A  HIT_STAGE_B  HIT_STAGE_C  HIT_STAGE_4  HIT_STAGE_5  HIT_STAGE_6
1111  TRUE         TRUE         FALSE        FALSE        TRUE         FALSE
2222  TRUE         TRUE         FALSE        FALSE        TRUE         TRUE
3333  TRUE         FALSE        TRUE         FALSE        FALSE        FALSE
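The same conditional-aggregation idea can be tried without a Snowflake account. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in engine. SQLite has no COUNT_IF, so SUM(condition) > 0 is used in its place, the updated column is left out because the query never reads it, and booleans come back as 1/0 rather than TRUE/FALSE.

```python
import sqlite3

# Same sample rows as above, without the timestamp column.
rows = [
    (1111, "A", "1"), (1111, "A", "2"), (1111, "B", "5"),
    (2222, "A", "1"), (2222, "B", "5"), (2222, "B", "6"),
    (3333, "A", "1"), (3333, "A", "2"), (3333, "C", "7"),
    (3333, "C", "8"), (3333, "C", "9"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fake_data (id INTEGER, pipeline_id TEXT, stage_id TEXT)")
conn.executemany("INSERT INTO fake_data VALUES (?, ?, ?)", rows)

# A comparison evaluates to 1 or 0 in SQLite, so SUM(cond) counts matching
# rows, which plays the role of Snowflake's COUNT_IF(cond).
summary = conn.execute("""
    SELECT id,
           SUM(pipeline_id = 'A') > 0 AS hit_stage_a,
           SUM(pipeline_id = 'B') > 0 AS hit_stage_b,
           SUM(pipeline_id = 'C') > 0 AS hit_stage_c,
           SUM(stage_id = '4') > 0 AS hit_stage_4,
           SUM(stage_id = '5') > 0 AS hit_stage_5,
           SUM(stage_id = '6') > 0 AS hit_stage_6
    FROM fake_data
    GROUP BY id
    ORDER BY id
""").fetchall()

for row in summary:
    print(row)
```

Each id collapses to one row, and every flag is 1 when at least one source row for that id matched the condition.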

Try this and see if it gets you what you want.
SELECT ID, decode(HIT_PIPELINE_A, NULL,FALSE,TRUE) ,
decode(HIT_PIPELINE_B, NULL,FALSE,TRUE),
decode(HIT_PIPELINE_C, NULL,FALSE,TRUE),
decode(HIT_STAGE_4, NULL,FALSE,TRUE),
decode(HIT_STAGE_5, NULL,FALSE,TRUE),
decode(HIT_STAGE_6, NULL,FALSE,TRUE) FROM
(
SELECT * from tab1
PIVOT(MAx(PIPELINE_ID) FOR stage_id IN ('1','2','3','4','5','6'))
AS P(ID,DT,HIT_PIPELINE_A,HIT_PIPELINE_B,HIT_PIPELINE_C,HIT_STAGE_4,HIT_STAGE_5,HIT_STAGE_6)
) order by ID;
create or replace table Tab1 (ID varchar2(100), updated date, pipeline_id varchar2(100), stage_id varchar2(10));
insert into tab1 values(1111, to_date('02/01/2022 09:01:00','mm/dd/yyyy hh24:mi:ss'), 'A', '1' );
insert into tab1 values(1111, to_date('02/01/2022 10:01:00','mm/dd/yyyy hh24:mi:ss'), 'A', '2' );
insert into tab1 values(1111, to_date('02/01/2022 11:01:00','mm/dd/yyyy hh24:mi:ss'), 'B', '5' );
insert into tab1 values(2222, to_date('02/02/2022 13:01:00','mm/dd/yyyy hh24:mi:ss'), 'A', '1' );
insert into tab1 values(2222, to_date('02/03/2022 18:01:00','mm/dd/yyyy hh24:mi:ss'), 'B', '5' );
insert into tab1 values(2222, to_date('02/04/2022 07:01:00','mm/dd/yyyy hh24:mi:ss'), 'B', '6' );
insert into tab1 values(3333, to_date('02/02/2022 14:01:00','mm/dd/yyyy hh24:mi:ss'), 'A', '1' );
insert into tab1 values(3333, to_date('02/03/2022 18:01:00','mm/dd/yyyy hh24:mi:ss'), 'A', '2' );
insert into tab1 values(3333, to_date('02/03/2022 07:01:00','mm/dd/yyyy hh24:mi:ss'), 'C', '7' );
insert into tab1 values(3333, to_date('02/03/2022 21:01:00','mm/dd/yyyy hh24:mi:ss'), 'C', '8' );
insert into tab1 values(3333, to_date('02/05/2022 17:01:00','mm/dd/yyyy hh24:mi:ss'), 'C', '9' );

Related

Create hash by ignoring null in snowflake

I want to create a hash of ('a', 'b', 'c', null) that ignores the null. I used the statement below, but it returns null.
I want the equivalent of SHA2_HEX('a|b|c'), whereas the statement below evaluates to SHA2_HEX(null):
(select SHA2_HEX(CONCAT_WS('|', 'a', 'b', 'c', null)))
CONCAT_WS will produce NULL as soon as one of the values is NULL. Try adding coalesce(your_column, ''), so that at least the final output of CONCAT_WS is not NULL. But the result is still not correct, because you will get a|b|c| (note the trailing |).
select SHA2_HEX(CONCAT_WS('|', 'a', 'b', 'c', coalesce(null, '')))
Otherwise just do CONCAT('a', '|', 'b', '|', 'c', coalesce(null, ''))
ARRAY_CONSTRUCT_COMPACT drops NULLs, and ARRAY_TO_STRING then gives the string you are looking for:
select
CONCAT_WS('|', 'a', 'b', 'c', null) as d1
,array_construct('a', 'b', 'c', null) as a1
,ARRAY_TO_STRING(a1, '|') as d2
,array_construct_compact('a', 'b', 'c', null) as a2
,ARRAY_TO_STRING(a2, '|') as d3
;
D1    A1                            D2      A2                 D3
null  [ "a", "b", "c", undefined ]  a|b|c|  [ "a", "b", "c" ]  a|b|c
thus
select SHA2_HEX(ARRAY_TO_STRING(array_construct_compact('a', 'b', 'c', null),'|'));
gives:
SHA2_HEX(ARRAY_TO_STRING(ARRAY_CONSTRUCT_COMPACT('A', 'B', 'C', NULL),'|'))
a52dd81bfd5e4e66d96b9f598382f6cbf8c5c3897654e6ae9055e03620fcf38e
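The drop-the-NULLs-then-join logic can also be checked outside Snowflake. Below is a minimal sketch in Python using only hashlib. The helper name hash_ignoring_nulls is hypothetical, and SHA-256 is assumed because that is SHA2_HEX's default digest size.

```python
import hashlib


def hash_ignoring_nulls(values, sep="|"):
    """Join the non-None values with sep and return the SHA-256 hex digest."""
    # Dropping None first mirrors ARRAY_CONSTRUCT_COMPACT; the join mirrors
    # ARRAY_TO_STRING, so no trailing separator is produced.
    present = [str(v) for v in values if v is not None]
    return hashlib.sha256(sep.join(present).encode("utf-8")).hexdigest()


digest = hash_ignoring_nulls(["a", "b", "c", None])
print(digest)
```

One consequence of ignoring NULLs is that the position of the NULL is lost: ('a', null, 'b', 'c') and ('a', 'b', 'c', null) hash to the same value. If that distinction matters, a placeholder for NULL is needed instead.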

SQL Server - How can I remove row based on column value from previous date?

I have a question about how to accomplish something in SQL Server. Basically, I want to take a set of data that comes from a certain time period and remove any rows where a column value, in this case SerialNumber, has been entered in the previous 3 weeks with a passing mark. I filter based on the current date to return any potentially relevant entries. Below is that data.
The issue is that the final entry in the data above has an entry date before the current date and within the previous 3-week period, and its result has a Pass of '1'. As such, I'd like to remove any entries for that SerialNumber value so it's not listed in today's results. The desired data is below.
Hopefully this makes sense; it's hard for me to describe. My current query is below, if needed. It doesn't attempt to implement the desired functionality, as I'm not sure how to go about it.
Select * From
(SELECT A.SerialNumber
,[EndTime] as Date
,ROW_NUMBER() over (partition by A.SerialNumber order by EndTime desc) Entry
,[Pass]
,A.EntryTotal
,A.Passes
,CycleType
From
(
SELECT max([SerialNumber]) as SerialNumber
,Count(*) as EntryTotal
,sum(convert(int,TD.Pass)) as Passes
FROM [FlowDB2].[dbo].[TimeAnalyticsData] TD
where Pass is not null
group by SerialNumber
)
as A join [FlowDB2].[dbo].[TimeAnalyticsData] as TAD on A.SerialNumber = TAD.SerialNumber
inner join [FlowDB2].[dbo].[TimeAnalytics] as TA on TAD.DurationID = TA.DurationID
where
Pass is not null
and
(EndTime >= '2020-08-24 16:00:00' and EndTime < '2020-08-25 4:00:00')
) as B
A correlated subquery allows you to compare data to itself.
The following looks for the non-existence of rows with the same SerialNumber as the current row with pass=1 less than 21 days ago.
The final filter simply makes sure you're looking at a different date than the current one.
select *
from original_data od1
where not exists (select null
from original_data od2
where od2.pass = 1
and od2.serialnumber = od1.serialnumber
and od2."Date" > DATEADD(day, -21, od1."Date")
and od2."Date" <> od1."Date"
);
with your original data recreated as follows:
CREATE TABLE original_data (
"SerialNumber" BIGINT,
"Date" datetime,
"Entry" INTEGER,
"Pass" INTEGER,
"EntryTtl" INTEGER,
"Passes" INTEGER,
"CycleTy" VARCHAR(2)
);
INSERT INTO original_data
("SerialNumber", "Date", "Entry", "Pass", "EntryTtl", "Passes", "CycleTy")
VALUES
('6102046905', '2020-08-24 21:03:20.000', '1', '1', '2', '1', 'PA'),
('6102046905', '2020-08-24 19:47:23.000', '2', '0', '2', '1', 'PA'),
('6102046906', '2020-08-24 22:45:16.000', '1', '1', '2', '1', 'PA'),
('6102046906', '2020-08-24 19:47:23.000', '2', '0', '2', '1', 'PA'),
('6102047024', '2020-08-24 21:03:20.000', '1', '1', '2', '1', 'PA'),
('6102047024', '2020-08-24 19:47:23.000', '2', '0', '2', '1', 'PA'),
('6102047028', '2020-08-24 18:04:48.000', '1', '1', '2', '1', 'PA');
See how it works in this Fiddle.
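For readers without access to the Fiddle, here is a minimal sketch of the same NOT EXISTS pattern, assuming Python's built-in sqlite3 module as a stand-in for SQL Server. The four sample rows and the column names serial_number, entry_date and pass are made up for illustration, and SQLite's datetime(x, '-21 days') replaces DATEADD(day, -21, x).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE original_data (serial_number INTEGER, entry_date TEXT, pass INTEGER)"
)
conn.executemany(
    "INSERT INTO original_data VALUES (?, ?, ?)",
    [
        (1, "2020-08-24 19:47:23", 0),  # a pass exists 14 days earlier
        (1, "2020-08-10 09:00:00", 1),
        (2, "2020-08-24 19:47:23", 0),  # the only pass is more than 21 days old
        (2, "2020-07-01 09:00:00", 1),
    ],
)

# Keep a row only if no *other* row for the same serial number has pass = 1
# with a date later than 21 days before the current row's date.
kept = conn.execute("""
    SELECT od1.serial_number, od1.entry_date
    FROM original_data AS od1
    WHERE NOT EXISTS (
        SELECT 1
        FROM original_data AS od2
        WHERE od2.pass = 1
          AND od2.serial_number = od1.serial_number
          AND od2.entry_date > datetime(od1.entry_date, '-21 days')
          AND od2.entry_date <> od1.entry_date
    )
    ORDER BY od1.serial_number, od1.entry_date
""").fetchall()

for row in kept:
    print(row)
```

The 2020-08-24 row for serial number 1 is dropped because of the recent pass, while the one for serial number 2 survives. As in the query above, the subquery has no upper bound on the date, so a later pass would also suppress a row; adding a "less than the current row's date" condition would restrict it to earlier passes only.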

SQL query to determine if groups of rows have a particular value in a particular column

I am working in SQL Server 2016. I have the following table and sample data:
CREATE TABLE A
(
col1 char(1)
,col2 int
,indicator_flag char(4)
)
;
INSERT INTO A
VALUES
('A', 1, 'Pass')
,('A', 2, 'Pass')
,('A', 3, 'Fail')
,('B', 10, 'Pass')
,('C', 19, 'Fail')
,('D', 1, 'Fail')
,('D', 2, 'Fail')
,('E', 1, 'Pass')
,('E', 2, 'Pass')
,('F', 20, 'Fail')
,('F', 21, 'Fail')
,('F', 100, 'Pass')
;
The indicator_flag column will only ever hold values 'Pass' and 'Fail'. For every distinct value in col1, I want to return a collapsed indicator_flag value according to the following rule -- if all values are 'Pass', then 'Pass'; else, 'Fail'.
So, for the sample data, I expect the following output:
col1 collapsed_indicator_flag
A Fail
B Pass
C Fail
D Fail
E Pass
F Fail
How can I achieve this output? The solution needs to perform well. (My actual table is very large.)
One method is to use aggregation:
select col1, min(indicator_flag) as indicator_flag
from a
group by col1;
This uses the observation that 'Pass' > 'Fail'.
If you want performance, then you could speed this up if you have the right indexes and another table with just col1 values:
select t.col1, coalesce(a.indicator_flag, 'Pass') as indicator_flag
from col1table t outer apply
(select top 1 a.*
from a
where a.col1 = t.col1 and a.indicator_flag = 'Fail'
) a;
The index for this query would be a(col1, indicator_flag).
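The MIN() trick can be verified against the sample data from the question. This is a minimal sketch, assuming Python's built-in sqlite3 module in place of SQL Server; it relies on 'Fail' sorting before 'Pass', which holds under SQLite's default binary collation as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (col1 TEXT, col2 INTEGER, indicator_flag TEXT)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [
        ("A", 1, "Pass"), ("A", 2, "Pass"), ("A", 3, "Fail"),
        ("B", 10, "Pass"),
        ("C", 19, "Fail"),
        ("D", 1, "Fail"), ("D", 2, "Fail"),
        ("E", 1, "Pass"), ("E", 2, "Pass"),
        ("F", 20, "Fail"), ("F", 21, "Fail"), ("F", 100, "Pass"),
    ],
)

# 'Fail' < 'Pass', so MIN() returns 'Fail' as soon as a group contains
# at least one failing row, and 'Pass' only when every row passes.
collapsed = conn.execute("""
    SELECT col1, MIN(indicator_flag) AS collapsed_indicator_flag
    FROM a
    GROUP BY col1
    ORDER BY col1
""").fetchall()

for row in collapsed:
    print(row)
```

The output matches the expected table in the question. The approach depends on the column holding only these two values; a third value would need an explicit CASE expression instead.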

Filtering data using group by with condition into every group

I have data in a table like below:
Table name: Employee Name
Column Name: Carrier and Error
The contents of the table:
**Carrier** **Error**
'A' 'Invalid'
'A' ''
'C' 'Invalid'
'D' ''
I want one row per carrier group: since there are 3 distinct carriers, I need 3 rows from the table. For the data above, the output should look like this:
**Carrier** **Error**
'A' 'Invalid'
'C' 'Invalid'
'D' ''
Here carrier 'A' has two rows, so for 'A' I need to display the row whose Error is not empty.
Thanks!
You can do simple grouping like:
CREATE TABLE Employee_Name
(
Carrier NVARCHAR(100) NOT NULL ,
Error NVARCHAR(100) NULL
);
INSERT INTO Employee_Name
VALUES ( '''A''', '''Invalid''' ),
( '''A''', '''''' ),
( '''C''', '''Invalid''' ),
( '''D''', '''''' );
--Query
SELECT Carrier ,
MAX(Error) Error
FROM Employee_Name
GROUP BY Carrier;
The result will be :

Can't add data to datetime2 field

I'm using SQL Server Express 2008 and I'm trying to add data to a field in a table which has a datatype of datetime2(7).
This is what I'm trying to add:
'2012-02-02 12:32:10.1234'
But I am getting the error
Msg 8152, Level 16, State 4, Line 1
String or binary data would be truncated.
The statement has been terminated.
Does this mean that it's too long to be added to the field and should be cut down a bit? If so, can you give me an example of how it should look?
Note - I've also tried it in this format:
'01/01/98 23:59:59.999'
Thanks
**EDIT**
The actual statement:
INSERT INTO dbo.myTable
(
nbr,
id,
name,
dsc,
start_date,
end_date,
last_date,
condition,
condtion_dsc,
crte_dte,
someting,
activation_date,
denial_date,
another_date,
a_name,
prior_auth_start_date,
prior_auth_end_date,
history_cmnt,
cmnt,
source,
program,
[IC-code],
[IC-description],
another_start_date,
another_start_date,
ver_nbr,
created_by,
creation_date,
updated_by,
updated_date)
VALUES
(
26,
'a',
'sometinh',
'c',
01/01/98 23:59:59.999,
01/01/98 23:59:59.999,
01/01/98 23:59:59.999,
'as',
'asdf',
01/01/98 23:59:59.999,
'lkop',
01/01/98 23:59:59.999,
01/01/98 23:59:59.999,
01/01/98 23:59:59.999,
'a',
01/01/98 23:59:59.999,
01/01/98 23:59:59.999,
'b',
'c',
'd',
'b',
'c',
'd',
01/01/98 23:59:59.999,
01/01/98 23:59:59.999,
423,
'Monkeys',
01/01/98 23:59:59.999,
'Goats',
01/01/98 23:59:59.999
);
Take a close look at the table you are trying to insert into. I bet one of the values you're trying to insert into a char/varchar/nchar/nvarchar column is too long.
SELECT
name,
max_length / CASE WHEN system_type_id IN (231, 239)
THEN 2 ELSE 1 END
FROM sys.columns
WHERE [object_id] = OBJECT_ID('dbo.TargetTableName')
AND system_type_id IN (167, 175, 231, 239);
This will get you a list like:
name
-------- --------
col1 32
col5 64
col7 12
Now, compare this list to the literals you have in your VALUES clause. As I suggested in a comment, I bet one of these has more characters than the table allows.
There's a chance there are binary or varbinary columns, and the issue is there, but I strongly suspect this is a simple "string is too long" problem - and has absolutely nothing to do with your DATETIME2(7) value.
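The comparison step can be done by eye, but it can also be scripted. The following is a rough sketch in Python; the helper find_oversized, the column limits (taken from the example list above) and the sample values are all hypothetical and would need to be replaced with the real output of the sys.columns query and the real literals.

```python
def find_oversized(values, max_lengths):
    """Return (column, actual_length, allowed_length) for each string that will not fit."""
    return [
        (col, len(val), max_lengths[col])
        for col, val in values.items()
        if col in max_lengths and isinstance(val, str) and len(val) > max_lengths[col]
    ]


# Hypothetical limits, as returned by the sys.columns query.
max_lengths = {"col1": 32, "col5": 64, "col7": 12}

# Hypothetical literals from a VALUES clause, keyed by target column.
values = {"col1": "a", "col5": "sometinh", "col7": "a value longer than twelve"}

problems = find_oversized(values, max_lengths)
for col, actual, allowed in problems:
    print(f"{col}: {actual} characters, column allows {allowed}")
```

Any column reported here is a candidate for the "String or binary data would be truncated" error.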
