TSQL : conditional query issue - sql-server

I need to create a flag to identify all Room_IDs where the following conditions are met:
1. A "Qc-" Status is present within one Hotel_ID.
2. The "Qc-" Status has a corresponding non-"Qc-" Status (e.g. 'qc-occupied' & 'occupied').
3. The "Qc-" Status has a smaller Room_ID than the non-"Qc-" Status (e.g. Status = 'qc-occupied' has Room_ID = 1 and Status = 'occupied' has Room_ID = 5).
This is a simplified table (tableX) I am using as an example:
**Hotel_ID Room_Id Status**
1 1 vacant
1 2 qc-occupied
1 3 vacant
2 1 occupied
2 2 qc-vacant
2 3 vacant
3 1 qc-vacant
4 1 vacant
4 2 occupied
4 3 qc-vacant
5 1 vacant
I need the following as a result:
**Hotel_ID Room_Id Status flag**
1 1 vacant 0
1 2 qc-occupied 0
1 3 vacant 0
2 1 occupied 0
2 2 qc-vacant 1
2 3 vacant 1
3 1 qc-vacant 0
4 1 vacant 0
4 2 occupied 0
4 3 qc-vacant 0
5 1 vacant 0
Thank you in advance !

This is a literal translation of the requirements into rather inelegant code. It can certainly be improved, e.g. by removing your first requirement (a "qc-" status is present), since it is implicit in the other two. The second requirement is in turn implicit in the third, which allows a further simplification.
-- Sample data.
declare @TableX as Table ( Hotel_Id Int, Room_Id Int, Stat VarChar(16) );
insert into @TableX ( Hotel_Id, Room_Id, Stat ) values
  ( 1, 1, 'vacant' ), ( 1, 2, 'qc-occupied' ), ( 1, 3, 'vacant' ),
  ( 2, 1, 'occupied' ), ( 2, 2, 'qc-vacant' ), ( 2, 3, 'vacant' ),
  ( 3, 1, 'qc-vacant' ),
  ( 4, 1, 'vacant' ), ( 4, 2, 'occupied' ), ( 4, 3, 'qc-vacant' ),
  ( 5, 1, 'vacant' );
select * from @TableX;
-- Literal translation of requirements.
declare @False as Bit = 0, @True as Bit = 1;
select Hotel_Id, Room_Id, Stat,
  QC_In_Hotel, QC_And_NonQC_In_Hotel, QC_Precedes_NonQC_In_Hotel,
  case when QC_In_Hotel = @True and QC_And_NonQC_In_Hotel = @True and
    QC_Precedes_NonQC_In_Hotel = @True then @True else @False end as Flag
  from (
    select Hotel_Id, Room_Id, Stat,
      -- Req: a "Qc-" Status is present within one Hotel_ID.
      case when exists ( select 42 from @TableX as I
        where I.Hotel_Id = O.Hotel_Id and I.Stat like 'qc-%' )
        then @True else @False end as QC_In_Hotel,
      -- Req: the "Qc-" Status has a corresponding non "Qc-" Status (e.g. 'qc-occupied' & 'occupied').
      case when exists ( select 42 from @TableX as I
        where I.Hotel_Id = O.Hotel_Id and
          ( ( I.Stat like 'qc-' + O.Stat ) or ( O.Stat like 'qc-' + I.Stat ) ) )
        then @True else @False end as QC_And_NonQC_In_Hotel,
      -- Req: the "Qc-" Status has to have a smaller Room_ID than the non "Qc-" Status.
      case when exists ( select 42 from @TableX as I
        where I.Hotel_Id = O.Hotel_Id and
          ( ( I.Room_Id < O.Room_Id and I.Stat like 'qc-' + O.Stat ) or
            ( O.Room_Id < I.Room_Id and O.Stat like 'qc-' + I.Stat ) ) )
        then @True else @False end as QC_Precedes_NonQC_In_Hotel
      from @TableX as O ) as PH
  order by Hotel_Id, Room_Id;
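The simplification mentioned above, keeping only the third check, might look like the sketch below. It is an illustration rather than the answer's own code: it runs the query through Python's sqlite3 module as a portable stand-in for SQL Server, so string concatenation uses || instead of +, and the prefix match is an equality test rather than LIKE. The compute_flags helper and the permanent table named TableX are assumptions made for the example.

```python
import sqlite3

# Sample rows from the question: (Hotel_Id, Room_Id, Stat).
ROWS = [
    (1, 1, "vacant"), (1, 2, "qc-occupied"), (1, 3, "vacant"),
    (2, 1, "occupied"), (2, 2, "qc-vacant"), (2, 3, "vacant"),
    (3, 1, "qc-vacant"),
    (4, 1, "vacant"), (4, 2, "occupied"), (4, 3, "qc-vacant"),
    (5, 1, "vacant"),
]

# Only the third requirement is tested; the first two follow from it.
# A row is flagged when it is one half of a qc-/non-qc pair in the same
# hotel and the qc- half has the smaller Room_Id.
FLAG_SQL = """
SELECT O.Hotel_Id, O.Room_Id, O.Stat,
       CASE WHEN EXISTS (
           SELECT 1 FROM TableX AS I
           WHERE I.Hotel_Id = O.Hotel_Id
             AND ( (I.Room_Id < O.Room_Id AND I.Stat = 'qc-' || O.Stat)
                OR (O.Room_Id < I.Room_Id AND O.Stat = 'qc-' || I.Stat) )
       ) THEN 1 ELSE 0 END AS Flag
FROM TableX AS O
ORDER BY O.Hotel_Id, O.Room_Id
"""


def compute_flags(rows):
    """Load the rows into an in-memory database and return the flagged rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableX (Hotel_Id INT, Room_Id INT, Stat TEXT)")
    conn.executemany("INSERT INTO TableX VALUES (?, ?, ?)", rows)
    result = conn.execute(FLAG_SQL).fetchall()
    conn.close()
    return result


flagged = compute_flags(ROWS)
for row in flagged:
    print(row)
```

On the sample data only the two rows of hotel 2 that form the 'qc-vacant' / 'vacant' pair end up flagged, which matches the result requested in the question.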

Workaround for Snowflake compiler errors

Sometimes the Snowflake SQL compiler tries to be too smart for its own good. This is a follow-up to a previous question here, where a clever solution was provided for my given use-case, but I have run into some limitations of that solution.
A brief background: I have a JS-UDTF that takes 3 float arguments and returns rows representing a series, GENERATE_SERIES(FLOAT,FLOAT,FLOAT), and a SQL-UDTF GENERATE_SERIES(INT,INT,INT) that casts the params to floats, invokes the JS-UDTF, and then casts the result back to ints. My original version of this wrapper UDTF was:
CREATE OR REPLACE FUNCTION generate_series(FIRST_VALUE INTEGER, LAST_VALUE INTEGER, STEP_VALUE INTEGER)
RETURNS TABLE (GS_VALUE INTEGER)
AS
$$
SELECT GS_VALUE::INTEGER AS GS_VALUE FROM table(generate_series(FIRST_VALUE::DOUBLE,LAST_VALUE::DOUBLE,STEP_VALUE::DOUBLE))
$$;
This would fail in most cases where the inputs were not constants, e.g.:
WITH report_params AS (
SELECT
1::integer as first_value,
3::integer as last_value,
1::integer AS step_value
)
SELECT
*
FROM
report_params, table(
generate_series(
first_value,
last_value,
step_value
)
)
Would return error:
SQL compilation error: Unsupported subquery type cannot be evaluated
The provided solution to trick the SQL compiler to behave was to encapsulate the function params into a VALUES table and cross-join the inner UDTF:
CREATE OR REPLACE FUNCTION generate_series_int(FIRST_VALUE INTEGER, LAST_VALUE INTEGER, STEP_VALUE INTEGER)
RETURNS TABLE (GS_VALUE INTEGER)
AS
$$
SELECT GS_VALUE::INTEGER AS GS_VALUE
FROM (VALUES (first_value, last_value, step_value)),
table(generate_series(first_value::double,last_value::double,step_value::double))
$$;
This worked lovely for most invocations, however I've discovered a situation where the SQL compiler is at it again. Here is a simplified example that reproduces the problem:
WITH report_params AS (
SELECT
1::integer AS first_value,
DATEDIFF('DAY','2020-01-01'::date,'2020-02-01'::date)::integer AS last_value,
1::integer AS step_value
)
SELECT
*
FROM
report_params, table(
COMMON.FN.generate_series(
first_value,
last_value,
step_value
)
);
This results in the error:
SQL compilation error: Invalid expression [CORRELATION(SYS_VW.LAST_VALUE_3)] in VALUES clause
The error seems obvious enough (I think) that the compiler is trying to embed the function code into the outer queries treating the function like a macro before runtime.
The answer at this point might just be that I am asking too much out of Snowflake's current capabilities, but in the interest of learning and continuing to build out what I think is a very helpful UDF library, am curious if there is a solution I am missing.
The major problem is you have written a correlated sub query.
WITH report_params AS (
SELECT * FROM VALUES
(1, 30, 1)
v(first_value,last_value, step_value)
)
SELECT
*
FROM
report_params, table(
COMMON.FN.generate_series(
first_value,
last_value,
step_value
)
);
When you add a second row to your CTE:
WITH report_params AS (
SELECT * FROM VALUES
(1, 30, 1),
(2, 40, 2)
v(first_value,last_value, step_value)
)
SELECT
*
FROM
report_params, table(
COMMON.FN.generate_series(
first_value,
last_value,
step_value
)
);
it becomes more obvious that this is correlated, and it is not so obvious how Snowflake should execute it.
For the above data it would ideally look like this (if it were valid SQL):
WITH report_params AS (
SELECT *
,mod(v.first_value,v.step_value) as mod_offset
FROM VALUES
(0, 5, 20, 1),
(1, 3, 15, 3),
(2, 4, 15, 3),
(3, 5, 15, 3)
v(id, first_value,last_value, step_value)
), report_ranges AS (
SELECT min(first_value) as mmin,
max(last_value) as mmax
FROM report_params
WHERE first_value <= last_value AND step_value > 0
), all_range AS (
SELECT
row_number() over (order by seq8()) + rr.mmin - 1 as seq
FROM report_ranges rr,
TABLE(GENERATOR( ROWCOUNT => (rr.mmax - rr.mmin) + 1 ))
)
SELECT
ar.seq
,rp.id, rp.first_value, rp.last_value, rp.step_value, rp.mod_offset
FROM all_range as ar
JOIN report_params as rp ON ar.seq BETWEEN rp.first_value AND rp.last_value AND mod(ar.seq, rp.step_value) = rp.mod_offset
ORDER BY 2,1;
but if you are generating it in a stored procedure (or externally), the min and max could be substituted in directly:
WITH report_params AS (
SELECT *
,mod(v.first_value,v.step_value) as mod_offset
FROM VALUES
(0, 5, 20, 1),
(1, 3, 15, 3),
(2, 4, 15, 3),
(3, 5, 15, 3)
v(id, first_value,last_value, step_value)
), all_range AS (
SELECT
row_number() over (order by seq8()) + 3 /*min*/ - 1 as seq
FROM TABLE(GENERATOR( ROWCOUNT => (20/*max*/ - 3/*min*/) + 1 ))
)
SELECT
ar.seq
,rp.id
,rp.first_value, rp.last_value, rp.step_value, rp.mod_offset
FROM all_range as ar
JOIN report_params as rp ON ar.seq BETWEEN rp.first_value AND rp.last_value AND mod(ar.seq, rp.step_value) = rp.mod_offset
ORDER BY 2,1;
giving:
SEQ ID FIRST_VALUE LAST_VALUE STEP_VALUE MOD_OFFSET
5 0 5 20 1 0
6 0 5 20 1 0
7 0 5 20 1 0
8 0 5 20 1 0
9 0 5 20 1 0
10 0 5 20 1 0
11 0 5 20 1 0
12 0 5 20 1 0
13 0 5 20 1 0
14 0 5 20 1 0
15 0 5 20 1 0
16 0 5 20 1 0
17 0 5 20 1 0
18 0 5 20 1 0
19 0 5 20 1 0
20 0 5 20 1 0
3 1 3 15 3 0
6 1 3 15 3 0
9 1 3 15 3 0
12 1 3 15 3 0
15 1 3 15 3 0
4 2 4 15 3 1
7 2 4 15 3 1
10 2 4 15 3 1
13 2 4 15 3 1
5 3 5 15 3 2
8 3 5 15 3 2
11 3 5 15 3 2
14 3 5 15 3 2
What I cannot guess at is the motivation: it feels like you are either trying to hide some complexity behind the table functions and JS functions, or have made things overly complex for an unstated reason.
[edit speaking to the 1-9 comment]
The major difference between generate_series and GENERATOR is that the former is almost a UDF or CTE, whereas in Snowflake you have to put the GENERATOR in its own sub-select or you will get messed-up results.
with s1 as (
SELECT
row_number() over (order by seq8()) -1 as seq
FROM
TABLE(GENERATOR( ROWCOUNT => 3 ))
), s2 as (
SELECT
row_number() over (order by seq8()) -1 as seq
FROM
TABLE(GENERATOR( ROWCOUNT => 3 ))
)
select s1.seq as a, s2.seq as b
from s1, s2
order by 1,2;
gives 9 rows of the two sequences mixed, like you would want,
whereas
with s1 as (
SELECT
row_number() over (order by seq8()) -1 as seq
FROM
TABLE(GENERATOR( ROWCOUNT => 3 ))
)
SELECT
row_number() over (order by seq8()) -1 as a
,s1.seq as b
FROM
TABLE(GENERATOR( ROWCOUNT => 3 )), s1;
gives 1-9, because the GENERATOR (the creator of rows) has been crossed with the other data before the sequence code has run.
Another version of the original solution provided, is
WITH report_params AS (
SELECT *
,trunc(div0((last_value-first_value),step_value)) as steps
FROM VALUES
(0, 5, 20, 1),
(1, 3, 15, 3),
(2, 4, 15, 3),
(3, 5, 15, 3)
v(id, first_value,last_value, step_value)
), large_range AS (
SELECT
row_number() over (order by seq8()) -1 as seq
FROM
TABLE(GENERATOR( ROWCOUNT => 1000 ))
)
select rp.id
,rp.first_value + (lr.seq*rp.step_value) as val
from report_params as rp
join large_range as lr on lr.seq <= rp.steps
order by 1,2;
which I like more, as the nature of the mixing is clearer. But it still speaks to the mindset difference between Snowflake and other RDBMSs. In Postgres there is no cost to doing per-row operations, because it was born in an era when everything was a per-row operation, but Snowflake has no per-row options, and because it cannot do things on each row, it can process many rows independently. It means all per-row expressions need to be moved to the front and then joined, which is what the above is trying to show.
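To make the "generate one large range up front, then join" idea concrete, here is a minimal sketch that reproduces the last query's arithmetic outside Snowflake. It uses Python's sqlite3 module, with a recursive CTE standing in for TABLE(GENERATOR(...)) and integer division standing in for trunc(div0(...)); both substitutions are assumptions of this example, as is the params table name.

```python
import sqlite3

# (id, first_value, last_value, step_value), as in the answer's VALUES list.
PARAMS = [(0, 5, 20, 1), (1, 3, 15, 3), (2, 4, 15, 3), (3, 5, 15, 3)]

SERIES_SQL = """
WITH RECURSIVE large_range(seq) AS (
    -- Stand-in for TABLE(GENERATOR(ROWCOUNT => 1000)): seq runs 0..999.
    SELECT 0
    UNION ALL
    SELECT seq + 1 FROM large_range WHERE seq < 999
),
report_params AS (
    -- Integer division truncates, giving the number of steps per row.
    SELECT id, first_value, step_value,
           (last_value - first_value) / step_value AS steps
    FROM params
)
SELECT rp.id, rp.first_value + lr.seq * rp.step_value AS val
FROM report_params AS rp
JOIN large_range AS lr ON lr.seq <= rp.steps
ORDER BY rp.id, val
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE params (id INT, first_value INT, last_value INT, step_value INT)"
)
conn.executemany("INSERT INTO params VALUES (?, ?, ?, ?)", PARAMS)

# Collect the generated values per parameter row.
series = {}
for row_id, val in conn.execute(SERIES_SQL):
    series.setdefault(row_id, []).append(val)
conn.close()

for row_id, values in series.items():
    print(row_id, values)
```

The range is built once, independently of the parameter rows, and each parameter row then picks the part of it that it needs through the join condition, so no per-row function call is involved.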

Select the IDs using group by condition

I have a dataset where I need to find the diseased patients in consecutive rows.
I'll share my sample dataset with a clear explanation.
ID Normal Des1 Des2 Des3 Des4
12 0 1 0 0 0
12 1 0 1 0 0
12 1 0 1 0 0
12 1 0 1 0 0
14 0 1 0 1 0
18 1 0 0 0 0
18 1 0 0 0 0
18 1 0 0 0 0
11 0 1 0 0 0
11 0 1 0 0 0
11 0 1 0 0 0
22 1 0 0 0 0
Here I have listed the disease flags in the dataset. I need the IDs of patients who stay in the same disease category for the whole period.
Suppose I need the patients who never fall under any disease criterion (IDs 18, 22), which I store as a new set (Undiseased). Later I need the same for Des1 patients (ID 11). I tried the code below to fetch the data, but it returns only partial output.
select ID from tablename where
(normal = '1' and Des1 = '0' and Des2 = '0' and Des3 = '0' and Des4 = '0')
group by ID
You can try the query below, which uses the COUNT (Transact-SQL) function.
Create table MySampleTable (Id int, Des1 int, Des2 int, Des3 int)
insert into MySampleTable Values
(12, 0, 1, 0),
(12, 1, 0, 1),
(12, 1, 0, 1),
(18, 1, 0, 0),
(18, 1, 0, 0),
(11, 0, 1, 0),
(11, 0, 1, 0)
; with cte as (Select Id
, Count(distinct Des1) as TotDes1
, Count(distinct Des2) as TotDes2
, Count(distinct Des3) as TotDes3
from MySampleTable
group by Id
)
Select Id from cte where TotDes1 = 1
and TotDes2 = 1 and TotDes3 = 1
Here is the live db<>fiddle demo.
You can also use the having clause as shown in the query below.
Select Id
/*
, Count(distinct Des1) as TotDes1
, Count(distinct Des2) as TotDes2
, Count(distinct Des3) as TotDes3
*/
from MySampleTable
group by Id
having Count(distinct Des1) = 1 and Count(distinct Des2) = 1
and Count(distinct Des3) = 1
Demo on db<>fiddle
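The COUNT(DISTINCT ...) queries above return every ID whose flags never change, whatever the category. If a specific category is wanted, as in the question (always Normal, or always Des1 only), one possible variation is to test MIN and MAX of each flag in the HAVING clause. The sketch below is an assumption-laden illustration, not one of the posted answers: it uses Python's sqlite3 module and a table named patients that holds the question's sample rows.

```python
import sqlite3

# Rows from the question: (ID, Normal, Des1, Des2, Des3, Des4).
PATIENT_ROWS = [
    (12, 0, 1, 0, 0, 0), (12, 1, 0, 1, 0, 0),
    (12, 1, 0, 1, 0, 0), (12, 1, 0, 1, 0, 0),
    (14, 0, 1, 0, 1, 0),
    (18, 1, 0, 0, 0, 0), (18, 1, 0, 0, 0, 0), (18, 1, 0, 0, 0, 0),
    (11, 0, 1, 0, 0, 0), (11, 0, 1, 0, 0, 0), (11, 0, 1, 0, 0, 0),
    (22, 1, 0, 0, 0, 0),
]

# MIN(flag) = 1 means the flag is set on every row of the group;
# MAX(flag) = 0 means it is set on none of them.
ALWAYS_NORMAL_SQL = """
SELECT ID FROM patients
GROUP BY ID
HAVING MIN(Normal) = 1
   AND MAX(Des1) = 0 AND MAX(Des2) = 0 AND MAX(Des3) = 0 AND MAX(Des4) = 0
ORDER BY ID
"""

ALWAYS_DES1_ONLY_SQL = """
SELECT ID FROM patients
GROUP BY ID
HAVING MIN(Des1) = 1
   AND MAX(Normal) = 0 AND MAX(Des2) = 0 AND MAX(Des3) = 0 AND MAX(Des4) = 0
ORDER BY ID
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients "
    "(ID INT, Normal INT, Des1 INT, Des2 INT, Des3 INT, Des4 INT)"
)
conn.executemany("INSERT INTO patients VALUES (?, ?, ?, ?, ?, ?)", PATIENT_ROWS)

undiseased = [r[0] for r in conn.execute(ALWAYS_NORMAL_SQL)]
des1_only = [r[0] for r in conn.execute(ALWAYS_DES1_ONLY_SQL)]
conn.close()

print("always normal:", undiseased)
print("always Des1 only:", des1_only)
```

The same HAVING clauses should carry over to T-SQL unchanged, since MIN, MAX and GROUP BY behave the same way there for integer columns.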
You can achieve it in this simple way
;WITH cte_TempTable AS(
Select DISTINCT Id, Des1, Des2, Des3
from MySampleTable
)
SELECT Id
FROM cte_TempTable
GROUP BY Id
HAVING COUNT(Id) = 1
You can use apply:
select t.id
from table t cross apply
( values (Des1, 'Des1'), (Des2, 'Des2'), (Des3, 'Des3'), (Des4, 'Des4')
) tt(DiseasFlag, DiseasName)
where DiseasFlag = 1
group by t.id
having count(distinct DiseasName) = 1;

Query to identify contiguous ranges

I'm trying to write a query on the below data set to add a new column which has some sort of "period_id_group".
contiguous new_period row_nr new_period_starting_id
0 0 1 0
1 1 2 2
1 0 3 0
1 0 4 0
1 1 5 5
1 0 6 0
What I'm trying to get is:
contiguous new_period row_nr new_period_starting_id period_id_group
0 0 1 0 0
1 1 2 2 2
1 0 3 0 2
1 0 4 0 2
1 1 5 5 5
1 0 6 0 5
The logic is that for each 0 value in the new_period_starting_id, it has to get the >0 value from the row above.
So, for row_nr = 1 since there is no row before it, period_id_group is 0.
For row_nr = 2 since this is a new period (marked by new_period = 1), the period_id_group is 2 (the id of this row).
For row_nr = 3 since it's part of a contiguous range (because contiguous = 1), but is not the start of the range, because it's not a new_period (new_period = 0), its period_id_group should inherit the value from the previous row (which is the start of the contiguous range) - in this case period_id_group = 2 also.
I've tried multiple versions but couldn't get a good solution for SQL Server 2008R2, since I can't use LAG().
What I have, so far, is a shameful:
select *
from #temp2 t1
left join (select distinct new_period_starting_id from #temp2) t2
on t1.new_period_starting_id >= t2.new_period_starting_id
where 1 = case
when contiguous = 0
then 1
when contiguous = 1 and t2.new_period_starting_id > 0
then 1
else 1
end
order by t1.rn
Sample data script:
declare @tmp2 table (contiguous int
, new_period int
, row_nr int
, new_period_starting_id int);
insert into @tmp2 values (0, 0, 1, 0)
, (1, 1, 2, 2)
, (1, 0, 3, 0)
, (1, 0, 4, 0)
, (1, 1, 5, 5)
, (1, 0, 6, 0);
Any help is appreciated.
So, if I'm understanding you correctly, you just need one additional column.
SELECT t1.contiguous, t1.new_period, t1.row_nr, t1.new_period_starting_id,
  ISNULL((SELECT TOP 1 t2.new_period_starting_id
          FROM YourTable t2
          WHERE t2.row_nr <= t1.row_nr
            AND t2.new_period_starting_id > 0 /* only rows that start a period */
          ORDER BY t2.row_nr DESC /* nearest preceding start */),
         0) AS period_id_group /* 0 when no period has started yet */
FROM YourTable t1
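As a quick way to check the "nearest preceding period start" logic against the expected output, here is a minimal sketch using Python's sqlite3 module instead of SQL Server 2008 R2. SQLite has no TOP, so LIMIT 1 is used, and COALESCE supplies the 0 for rows that precede any period start; the table name tmp2 and both substitutions are assumptions of this example.

```python
import sqlite3

# (contiguous, new_period, row_nr, new_period_starting_id) from the question.
SAMPLE = [
    (0, 0, 1, 0),
    (1, 1, 2, 2),
    (1, 0, 3, 0),
    (1, 0, 4, 0),
    (1, 1, 5, 5),
    (1, 0, 6, 0),
]

# For each row, look backwards (including the row itself) for the latest
# row with a non-zero starting id, and fall back to 0 when there is none.
GROUP_SQL = """
SELECT t1.row_nr,
       COALESCE((SELECT t2.new_period_starting_id
                 FROM tmp2 AS t2
                 WHERE t2.row_nr <= t1.row_nr
                   AND t2.new_period_starting_id > 0
                 ORDER BY t2.row_nr DESC
                 LIMIT 1), 0) AS period_id_group
FROM tmp2 AS t1
ORDER BY t1.row_nr
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tmp2 "
    "(contiguous INT, new_period INT, row_nr INT, new_period_starting_id INT)"
)
conn.executemany("INSERT INTO tmp2 VALUES (?, ?, ?, ?)", SAMPLE)
period_groups = conn.execute(GROUP_SQL).fetchall()
conn.close()

for row_nr, group_id in period_groups:
    print(row_nr, group_id)
```

No LAG() is needed, which is the constraint stated in the question; the cost is one correlated lookup per row.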
Here is yet another option for this.
select t1.contiguous
, t1.new_period
, t1.row_nr
, t1.new_period_starting_id
, x.new_period_starting_id
from @tmp2 t1
outer apply
(
select top 1 *
from @tmp2 t2
where (t2.row_nr = 1
or t2.new_period_starting_id > 0)
and t1.row_nr >= t2.row_nr
order by t2.row_nr desc
) x
Found the solution:
select *
, case
when contiguous = 0
then f1
when contiguous = 1 and new_periods = 1
then f1
when contiguous = 1 and new_periods = 0
then v
else NULL
end [period_group]
from (
select *
, (select max(f1) from #temp2 where new_period_starting_id > 0 and rn < t1.rn) [v]
from #temp2 t1
) rs
order by rn

How to get the values in comma separated using joins?

I am working on SQL, I have two tables
EId Ename
1 john
2 alex
3 piers
4 sara
And the second table is
PID PNAME EID
1 mcndd 1
2 carter 1
3 leare 2
4 jain 2
The result should be
EID count PID
1 2 1
1 2 2
2 2 3
2 2 4
I want a query for this. I tried the following:
SELECT t1.EID, COUNT(t1.EID) count,PID
from Employertable t1
INNER JOIN persontable P ON P.EID=t1.EID
Group By t1.EID Having Count(T1.EID) > 1
You can do this using window functions. With those functions you can combine aggregated data with non-aggregated data:
DECLARE @t1 TABLE ( EID INT )
DECLARE @t2 TABLE ( PID INT, EID INT )
INSERT INTO @t1
VALUES ( 1 ),
       ( 2 ),
       ( 3 ),
       ( 4 )
INSERT INTO @t2
VALUES ( 1, 1 ),
       ( 2, 1 ),
       ( 3, 2 ),
       ( 4, 2 )
SELECT *
FROM ( SELECT t1.EID ,
              COUNT(*) OVER ( PARTITION BY t2.EID ) AS C ,
              t2.PID
       FROM @t1 t1
       JOIN @t2 t2 ON t2.EID = t1.EID
     ) t
WHERE t.C > 1
Output:
EID C PID
1 2 1
1 2 2
2 2 3
2 2 4
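For comparison, the same rows can also be produced without a window function by joining each person row to a grouped subquery that carries the per-EID count. The sketch below is an alternative formulation and not the answer's method; it uses Python's sqlite3 module, and the table name persontable with the question's sample rows is assumed.

```python
import sqlite3

# Second table from the question: (PID, PNAME, EID).
PERSONS = [(1, "mcndd", 1), (2, "carter", 1), (3, "leare", 2), (4, "jain", 2)]

# The subquery aggregates once per EID; the join copies that count back
# onto every detail row, which is what COUNT(*) OVER (PARTITION BY EID)
# does in a single step.
COUNT_SQL = """
SELECT p.EID, c.cnt, p.PID
FROM persontable AS p
JOIN (SELECT EID, COUNT(*) AS cnt
      FROM persontable
      GROUP BY EID
      HAVING COUNT(*) > 1) AS c ON c.EID = p.EID
ORDER BY p.EID, p.PID
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persontable (PID INT, PNAME TEXT, EID INT)")
conn.executemany("INSERT INTO persontable VALUES (?, ?, ?)", PERSONS)
counted = conn.execute(COUNT_SQL).fetchall()
conn.close()

for row in counted:
    print(row)
```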

Create "Sets" within the same table based on multi-column criteria

For every unique combination of BoxId and Revision with a single UnitTypeId of 1 and a single UnitTypeId of 2 both having a NULL SetNumber, assign a SetNumber of 1.
Table and data setup:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[UnitTypes]') AND type in (N'U'))
Drop Table dbo.UnitTypes
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Tracking]') AND type in (N'U'))
DROP TABLE [dbo].[Tracking]
GO
CREATE TABLE dbo.UnitTypes
(
Id int NOT NULL,
Notes varchar(80)
)
GO
CREATE TABLE dbo.Tracking
(
Id int NOT NULL IDENTITY (1, 1),
BoxId int NOT NULL,
Revision int NOT NULL,
UnitValue int NULL,
UnitTypeId int NULL,
SetNumber int NULL
)
GO
ALTER TABLE dbo.Tracking ADD CONSTRAINT
PK_Tracking PRIMARY KEY CLUSTERED
(
Id
)
GO
Insert Into dbo.UnitTypes (Id, Notes) Values (1, 'X Coord'),
(2, 'Y Coord'),
(3, 'Weight'),
(4, 'Length')
Go
Insert Into dbo.Tracking (BoxId, Revision, UnitValue, UnitTypeId, SetNumber)
Values (1165, 1, 150, 1, NULL),
(1165, 1, 1477, 2, NULL),
(1165, 1, 31, 4, NULL),
(1166, 1, 425, 1, 1),
(1166, 1, 1146, 2, 1),
(1166, 1, 438, 1, NULL),
(1166, 1, 1163, 2, NULL),
(1167, 1, 560, 1, NULL),
(1167, 1, 909, 2, NULL),
(1167, 1, 12763, 3, NULL),
(1168, 1, 21, 1, NULL),
(1168, 1, 13109, 3, NULL)
The ideal results would be:
Id BoxId Revision UnitValue UnitTypeId SetNumber
1 1165 1 150 1 1
2 1165 1 1477 2 1
3 1165 1 31 4 1
4 1166 1 425 1 1
5 1166 1 1146 2 1
6 1166 1 438 1 NULL <--NULL Because there is already an existing Set
7 1166 1 1163 2 NULL <--NULL Because there is already an existing Set
8 1167 1 560 1 1
9 1167 1 909 2 1
10 1167 1 12763 3 1
11 1168 1 21 1 NULL <--NULL Because there is not exactly one UnitTypeId of 1 and exactly one UnitTypeId of 2 for this BoxId\Revision combination.
12 1168 1 13109 3 NULL <--NULL Because there is not exactly one UnitTypeId of 1 and exactly one UnitTypeId of 2 for this BoxId\Revision combination.
EDIT:
The question is how can I update the SetNumber, given the constraints above, using pure TSQL?
If I understand your question correctly, you could do this with a subquery that demands all conditions are met:
update t1
set SetNumber = 1
from dbo.Tracking t1
where SetNumber is null
and 1 =
(
select case
when count(case when t2.UnitTypeId = 1 then 1 end) <> 1 then 0
when count(case when t2.UnitTypeId = 2 then 1 end) <> 1 then 0
when count(t2.SetNumber) <> 0 then 0
else 1
end
from dbo.Tracking t2
where t1.BoxId = t2.BoxId
and t1.Revision = t2.Revision
)
The count(t2.SetNumber) is a bit tricky: this will only count rows where SetNumber is not null. So this meets the criterion that no other set with the same (BoxId, Revision) exists.
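The NULL-skipping behaviour of COUNT(column) is easy to see by computing the three counts per (BoxId, Revision) group. The sketch below does that with Python's sqlite3 module rather than SQL Server, which is an assumption of this example, and it only selects the qualifying groups instead of running the UPDATE.

```python
import sqlite3

# (BoxId, Revision, UnitValue, UnitTypeId, SetNumber) from the question.
TRACKING = [
    (1165, 1, 150, 1, None), (1165, 1, 1477, 2, None), (1165, 1, 31, 4, None),
    (1166, 1, 425, 1, 1), (1166, 1, 1146, 2, 1),
    (1166, 1, 438, 1, None), (1166, 1, 1163, 2, None),
    (1167, 1, 560, 1, None), (1167, 1, 909, 2, None), (1167, 1, 12763, 3, None),
    (1168, 1, 21, 1, None), (1168, 1, 13109, 3, None),
]

# COUNT(expr) counts only rows where expr is not NULL, so:
#  - the CASE expressions count rows of one UnitTypeId,
#  - COUNT(SetNumber) counts rows that already belong to a set.
COUNTS_SQL = """
SELECT BoxId, Revision,
       COUNT(CASE WHEN UnitTypeId = 1 THEN 1 END) AS n_type1,
       COUNT(CASE WHEN UnitTypeId = 2 THEN 1 END) AS n_type2,
       COUNT(SetNumber) AS n_in_set,
       COUNT(*) AS n_rows
FROM Tracking
GROUP BY BoxId, Revision
ORDER BY BoxId, Revision
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tracking "
    "(BoxId INT, Revision INT, UnitValue INT, UnitTypeId INT, SetNumber INT)"
)
conn.executemany("INSERT INTO Tracking VALUES (?, ?, ?, ?, ?)", TRACKING)
group_counts = conn.execute(COUNTS_SQL).fetchall()
conn.close()

# A group qualifies when it has exactly one type 1 row, exactly one
# type 2 row, and no row that already has a SetNumber.
qualifying = [
    (box, rev)
    for box, rev, n1, n2, n_in_set, _ in group_counts
    if n1 == 1 and n2 == 1 and n_in_set == 0
]

for row in group_counts:
    print(row)
print("qualifying groups:", qualifying)
```

Box 1166 has four rows but COUNT(SetNumber) is 2 there, which is why it is excluded, while boxes 1165 and 1167 qualify, matching the ideal results listed in the question.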
Try this out; it returns the same results that you gave. The WITH statement sets up a CTE to query from. ROW_NUMBER() with PARTITION BY numbers the rows within each partition, which does what you want:
;WITH BoxSets AS (
SELECT
ID
,BoxId
,Revision
,UnitValue
,UnitTypeId
,CASE WHEN UnitTypeId IN (1,2) THEN 1 ELSE 0 END ValidUnit
,ROW_NUMBER() OVER (PARTITION BY BoxID,UnitTypeID ORDER BY BoxID,UnitTypeID,UnitValue ) SetNumber
FROM Tracking
)
SELECT
b.ID
,b.BoxId
,b.Revision
,b.UnitValue
,b.UnitTypeId
,CASE ISNULL(b1.ValidUnits,0) WHEN 0 THEN NULL ELSE CASE b.SetNumber WHEN 1 THEN b.SetNumber ELSE NULL END END
FROM BoxSets AS b
LEFT JOIN (SELECT
BoxID
,SUM(ValidUnit) AS ValidUnits
FROM BoxSets
GROUP BY BoxId
HAVING SUM(ValidUnit) > 1) AS b1 ON b.BoxId = b1.BoxId
