Excluding results over a period - snowflake-cloud-data-platform

I have an issue comparing data over time.
I would like to exclude data that already existed in the previous period, but I am missing something.
I want to retrieve only new refs that did not exist during the previous period. (This query is part of a bigger query, which is why you will find Won / Lost in my test query.)
Another thing I don't understand is why I get this error with production data:
SQL compilation error: Unsupported subquery type cannot be evaluated
CREATE OR REPLACE TEMPORARY TABLE "TMP_TEST" (
"Period" TIMESTAMP,
"Country" VARCHAR,
"Ref" VARCHAR,
"Name" VARCHAR,
"Tag" VARCHAR
);
INSERT INTO "TMP_TEST"
VALUES
('01/01/2020','US','WZ32 ','WKDM2 ','123'),
('01/01/2020','US','PZ56 ','2GFSDG37 ','456'),
('01/02/2020','US','OD59 ','ORD56 ','123'),
('01/03/2020','US','OD59 ','ORD56 ','123'),
('01/03/2020','US','OD59 ','ORD56 ','456'),
('01/03/2020','US','NULL ','24GFDSGF2 ','123'),
('01/03/2020','US','RL04 ','24GSFD1 ','123'),
('01/04/2020','US','RL04 ','24GSFD1 ','123');
SELECT * FROM "TMP_TEST";
SELECT A."Ref",A."Period",A."Name",A."Country",A."Tag", 1 AS "Won",0 AS "Lost"
FROM "TMP_TEST" A
WHERE A."Ref" NOT IN (SELECT B."Ref" FROM "TMP_TEST" B WHERE B."Period" = DATEADD(MONTH, -1,A."Period"))
GROUP BY 1,2,3,4,5;
Wanted result:
Period | Country | Ref | Name | Tag
01/01/2020 | US | WZ32 | WKDM2 | 123
01/01/2020 | US | PZ56 | 2GFSDG37 | 456
01/02/2020 | US | OD59 | ORD56 | 123
01/03/2020 | US | OD59 | ORD56 | 456
01/03/2020 | US | NULL | 24GFDSGF2 | 123
01/03/2020 | US | RL04 | 24GSFD1 | 123

Based on the sample data provided, I have got this working solution:
SELECT A."Ref",A."Period",A."Name",A."Country",A."Tag", 1 AS "Won",0 AS "Lost"
FROM "TMP_TEST" A
WHERE A."Ref" NOT IN (SELECT B."Ref" FROM "TMP_TEST" B WHERE B."Period" = DATEADD(DAY, -1,A."Period"))
OR A."Tag" NOT IN (SELECT B."Tag" FROM "TMP_TEST" B WHERE B."Period" = DATEADD(Day, -1,A."Period"))
GROUP BY 1,2,3,4,5;
The DATEADD interval is changed from MONTH to DAY because Snowflake by default reads these values as MM/DD/YYYY, so the periods are consecutive days (1 to 4 January 2020), not consecutive months.
With MONTH as the interval, the subquery looks for rows dated in December 2019, which do not exist, so nothing is excluded and the result is the same as the input table.
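A minimal sketch of the same filter written as a LEFT JOIN anti-join, run against SQLite through Python's sqlite3 module so the logic can be checked. Assumptions: "new" means a Ref/Tag combination that did not exist in the previous period (the wanted result implies this, since OD59/456 is kept on 01/03 even though OD59 itself existed on 01/02), periods are one day apart, and the dates are stored as ISO strings with the trailing spaces trimmed. Replacing a correlated NOT IN with a join may also sidestep Snowflake's "Unsupported subquery type" error; in Snowflake the join condition would use DATEADD(DAY, -1, A."Period") instead of SQLite's date().

```python
import sqlite3

# Sample data from the question; dates converted to ISO (assumes MM/DD/YYYY input)
# and trailing spaces trimmed. 'NULL' is a literal string, as in the original.
SAMPLE = [
    ("2020-01-01", "US", "WZ32", "WKDM2", "123"),
    ("2020-01-01", "US", "PZ56", "2GFSDG37", "456"),
    ("2020-01-02", "US", "OD59", "ORD56", "123"),
    ("2020-01-03", "US", "OD59", "ORD56", "123"),
    ("2020-01-03", "US", "OD59", "ORD56", "456"),
    ("2020-01-03", "US", "NULL", "24GFDSGF2", "123"),
    ("2020-01-03", "US", "RL04", "24GSFD1", "123"),
    ("2020-01-04", "US", "RL04", "24GSFD1", "123"),
]

# Anti-join: keep rows of a that have no matching Ref/Tag one period (day) earlier.
NEW_REFS_SQL = """
SELECT DISTINCT a.Period, a.Country, a.Ref, a.Name, a.Tag, 1 AS Won, 0 AS Lost
FROM tmp_test AS a
LEFT JOIN tmp_test AS b
       ON b.Ref = a.Ref
      AND b.Tag = a.Tag
      AND b.Period = date(a.Period, '-1 day')  -- Snowflake: DATEADD(DAY, -1, A."Period")
WHERE b.Ref IS NULL                            -- no match in the previous period
ORDER BY a.Period, a.Ref, a.Tag
"""


def new_refs(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tmp_test (Period TEXT, Country TEXT, Ref TEXT, Name TEXT, Tag TEXT)"
    )
    con.executemany("INSERT INTO tmp_test VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(NEW_REFS_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in new_refs(SAMPLE):
        print(row)
```

On the sample data this returns the six rows of the wanted result. Matching on the Ref/Tag pair is stricter than testing Ref and Tag separately with OR, which only happens to agree on this sample.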

Related

T-SQL Issue using lead or lag

I have a table that has columns EVENT_ACTION and TIMESTAMP; in column EVENT_ACTION there are two possible values, 225 and 226.
225 represents the start time and 226 represents the end time; since they are in two different rows, I'm trying to use LAG or LEAD and am having some issues.
Here is what I have so far; the column MRDF is my unique id:
SELECT
f.EVENT_ACTION ,
(f.TIMESTAMP) AS starttime,
LEAD(f.TIMESTAMP) OVER (ORDER BY f.MRDF) AS endtime
FROM
dbo.flext f
WHERE
EVENT_ACTION IN (225,226)
ORDER BY
MRDF, EVENT_ACTION
This is what I'm getting: it's not getting the next row's timestamp as I thought it would.
I'm getting a NULL value for my last EVENT_ACTION 226. I'm planning to put this into a temp table and only take EVENT_ACTION 225.
As you can see, I'm lost :-).
Any help would be appreciated.
Mike
I think you want to use f.TIMESTAMP as your ORDER BY for the LEAD(). I think your query should look something more like this:
SELECT
f.EVENT_ACTION ,
(f.TIMESTAMP) AS starttime,
LEAD(f.TIMESTAMP) OVER (ORDER BY f.TIMESTAMP ASC) AS endtime
FROM
dbo.flext f
WHERE
EVENT_ACTION IN (225,226)
ORDER BY MRDF, EVENT_ACTION
However, this will still leave you with a NULL for the endtime of your last 226 record, so you can add a default value to the LEAD() function for this situation. The syntax is:
LEAD ( scalar_expression [ , offset ] [ , default ] ) OVER ( [ partition_by_clause ] order_by_clause )
Using this syntax, your LEAD() would then become:
LEAD(f.TIMESTAMP, 1, GETDATE()) OVER (ORDER BY f.TIMESTAMP ASC) AS endtime
You can replace the GETDATE() with whatever you'd want the default value to be when there is no leading record.
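The whole approach, ordering LEAD() by the timestamp and then keeping only the start rows instead of using a temp table, can be sketched in SQLite via Python's sqlite3. Assumptions: start (225) and end (226) events strictly alternate; with several concurrent sessions you would need a PARTITION BY on whatever column identifies a session, which the question does not show. The 'open' default is purely illustrative (SQLite is dynamically typed); in T-SQL the default must be compatible with the timestamp type, such as GETDATE().

```python
import sqlite3

# Pair each start event (225) with the timestamp of the row that follows it.
# The window is computed before the outer filter, so endtime is the next 226 row.
# Unpaired starts get the default value instead of NULL (third LEAD argument).
PAIRS_SQL = """
WITH ordered AS (
    SELECT MRDF,
           EVENT_ACTION,
           "TIMESTAMP" AS starttime,
           LEAD("TIMESTAMP", 1, 'open') OVER (ORDER BY "TIMESTAMP") AS endtime
    FROM flext
    WHERE EVENT_ACTION IN (225, 226)
)
SELECT MRDF, starttime, endtime
FROM ordered
WHERE EVENT_ACTION = 225   -- keep only the start rows
ORDER BY starttime
"""


def start_end_pairs(rows):
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE flext (MRDF INTEGER, EVENT_ACTION INTEGER, "TIMESTAMP" TEXT)')
    con.executemany("INSERT INTO flext VALUES (?, ?, ?)", rows)
    result = con.execute(PAIRS_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    # Hypothetical events: two complete start/end pairs and one start still open.
    events = [
        (1, 225, "2016-01-04 08:00:00"),
        (2, 226, "2016-01-04 09:15:00"),
        (3, 225, "2016-01-04 10:00:00"),
        (4, 226, "2016-01-04 10:45:00"),
        (5, 225, "2016-01-04 11:30:00"),
    ]
    for pair in start_end_pairs(events):
        print(pair)
```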

SUM() column based on other columns

I have a table with weekly sales plan data, which consists of a few columns:
SAL_DTDGID -- the date of every Sunday, for example 20160110, 20160117
SAL_MQuantity --sum of sales plan value
SAL_MQuantityYTD --sum of plans since first day of the year
SAL_CoreElement --sales plan data for a few core elements
SAL_Site --unique identifier of the place where the sale happened
How do I sum the values of SAL_MQuantity into SAL_MQuantityYTD, from the first records in 2016 up to 'now', for every site and every core element?
Every site in SAL_Site has 52 rows (matching the number of weeks in a year) along with 5 different SAL_CoreElement values.
Example:
SAL_DTDGID|SAL_MQuantity|SAL_MQuantityYTD|SAL_CoreElement|SAL_Site
20160110 |20000 |20000 |1 |1234
20160117 |10000 |30000 |1 |1234
20160124 |30000 |60000 |1 |1234
If something isn't clear I'll try to explain.
Not sure I completely understand your question, but this should allow you to recreate the running sum for SAL_MQuantityYTD. Replace #test with whatever your table/view is called.
SELECT *,
(SELECT SUM(SAL_MQuantity)
FROM #test T2
WHERE T2.SAL_DTDGID <= T1.SAL_DTDGID
AND T2.SAL_Site = T1.SAL_Site
AND T2.SAL_coreElement = T1.SAL_coreElement) AS RunningTotal
FROM #test T1
If you wanted the yearly total instead, you could also use a correlated subquery like this:
SELECT *,
(SELECT SUM(SAL_MQuantity)
FROM #test T2
WHERE cast(left(T2.SAL_DTDGID,4) as integer) = cast(left(T1.SAL_DTDGID,4) as integer)
AND T2.SAL_Site = T1.SAL_Site
AND T2.SAL_coreElement = T1.SAL_coreElement) AS YearlyTotal
FROM #test T1
Edit: I've just seen another answer that does basically the same thing using a window function.
Let me explain an idea. Please try the query below:
Select A, B,
(Select SUM(SAL_MQuantity)
FROM [Your Table]
WHERE [your date column] between '20160101' AND '[Present date]') AS SAL_MQuantityYTD
FROM [Your Table]
My understanding of your question is that you want the YTD sum of SAL_MQuantity for each year (you can simply add a WHERE afterwards if you only want 2016), SAL_Site and SAL_CoreElement.
The code below should achieve that and will run on SQL Server 2008 R2 (I'm running 2005).
'##t1' is the temp table name I used to test, replace it with your table name.
Select distinct
sum (SAL_MQuantity) over (partition by
left (cast (cast (SAL_DTDGID as int) as varchar (8)),4)
, SAL_Site
, SAL_CoreElement
) as Sum_SAL_DTDGID
,left (cast (cast (SAL_DTDGID as int) as varchar (8)),4) as Time_Period
, SAL_Site
, SAL_CoreElement
from ##t1
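For a running total that resets each year (matching the example, where 20000, 10000 and 30000 accumulate to 20000, 30000 and 60000), a windowed SUM with ORDER BY can be used. A minimal sketch in SQLite via Python's sqlite3; it assumes SQL Server 2012 or later if ported back to T-SQL, because 2008 R2 does not support ORDER BY inside an aggregate's OVER clause. The 2017 row in the usage data is hypothetical, added only to show the reset.

```python
import sqlite3

# Running year-to-date total per site and core element.
# SAL_DTDGID / 10000 extracts the year from a yyyymmdd integer (integer division),
# so the running sum restarts in each new year.
YTD_SQL = """
SELECT SAL_DTDGID,
       SAL_Site,
       SAL_CoreElement,
       SAL_MQuantity,
       SUM(SAL_MQuantity) OVER (
           PARTITION BY SAL_Site, SAL_CoreElement, SAL_DTDGID / 10000
           ORDER BY SAL_DTDGID
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS SAL_MQuantityYTD
FROM sales_plan
ORDER BY SAL_Site, SAL_CoreElement, SAL_DTDGID
"""


def ytd_totals(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE sales_plan (SAL_DTDGID INTEGER, SAL_MQuantity INTEGER, "
        "SAL_CoreElement INTEGER, SAL_Site INTEGER)"
    )
    con.executemany("INSERT INTO sales_plan VALUES (?, ?, ?, ?)", rows)
    result = con.execute(YTD_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    plan = [
        (20160110, 20000, 1, 1234),
        (20160117, 10000, 1, 1234),
        (20160124, 30000, 1, 1234),
        (20170108, 5000, 1, 1234),  # hypothetical: first week of the next year
    ]
    for row in ytd_totals(plan):
        print(row)
```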

How can I get what I'm looking for here?

I think this question has been answered, but I am not skilled enough (yet!) to recognize how someone else's answer will help me fix my problem, so I apologize if this feels like a repost.
I am using SQL Server 2012.
I need the following results from a query:
LoanNumber | OpenDate | CreditLimit | CaptureDate | CaptureBalance | TodayDate | TodayBalance
LoanNumber is a unique identifier
OpenDate is the date the credit line was opened
CaptureDate is OpenDate + 6 days
CaptureBalance is what we consider to be the initial balance on the credit line and is defined as the balance 6 days after it was opened
TodayDate is today
TodayBalance is the balance today
I want to be able to look at a credit line and compare the initial balance (aka CaptureBalance) to the credit limit as well as compare that to the balance today.
Here's my code; see below for more definitions:
select top 100
L1.LOANNUMBER as 'LoanNumber'
,L1.OPENDATE as 'OpenDate' --this is stored as Date
,L2.OPENDATE+6 as 'CaptureDate'
,L1.CREDITLIMIT as 'CreditLimit'
,( Select L2.BALANCE
From LOAN as L2
INNER JOIN LOAN as L1 on L2.LOANNUMBER = L1.LOANNUMBER
Where CONVERT(datetime,convert(char(8),L2.RUNDATE )) = L2.OPENDATE+6
) as 'CaptureBalance'
From LOAN as L1
INNER JOIN LOAN as L2 on L1.LOANNUMBER = L2.LOANNUMBER
Where L1.RUNDATE = 20151130 -- this is stored as INT
and L1.[TYPE] = 'Line of Credit'
RUNDATE is important because every day our system logs a snapshot of that loan. Where L1.RUNDATE = 20151130 tells the system to give me the balance on Nov 30 2015. I also need the balance 6 days after the date the loan was opened, which means I have to reference 2 different run dates.
I have to compare the run date (INT) to OpenDate (Date) so I used CONVERT(datetime,convert(char(8),L2.RUNDATE )) to convert the run date INT --> Date so I can effectively compare the two dates.
When I run this I get:
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
Initially I was running all of this off of the same table. Then I decided to try giving the loan table 2 different aliases and that's where I stopped.
Is the way I'm using that subquery resulting in "more than 1 value" because each result of that query is trying to get listed as a column header? If yes, I still don't know how to get what I'm looking for.
HELP!?
I am pretty sure this is what you want, or at least one approach to it:
select top 100
L1.LOANNUMBER as 'LoanNumber'
,L1.OPENDATE as 'OpenDate' --this is stored as Date
,L2.RUNDATE as 'CaptureDate'
,L1.CREDITLIMIT as 'CreditLimit'
,L2.BALANCE as 'CaptureBalance'
,L1.RUNDATE as 'TodayDate'
,L1.BALANCE as 'TodayBalance'
From LOAN as L1
INNER JOIN LOAN as L2
on L1.LOANNUMBER = L2.LOANNUMBER
AND CONVERT(date, CONVERT(char(8), L2.RUNDATE), 112) = DATEADD(dd, 6, L1.OPENDATE) -- RUNDATE is INT (yyyymmdd), convert before comparing
Where L1.RUNDATE = 20151130 -- this is stored as INT
and L1.[TYPE] = 'Line of Credit'
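The self-join idea (L1 is today's snapshot, L2 is the snapshot taken 6 days after opening) can be checked with a small sketch in SQLite via Python's sqlite3. Because RUNDATE is an INT such as 20151130, the sketch turns OPENDATE + 6 days into the same yyyymmdd integer rather than converting RUNDATE; in SQL Server the conversion would be done with CONVERT(date, CONVERT(char(8), RUNDATE), 112) or similar. The loan rows are hypothetical, and loans without a snapshot on day 6 drop out of the INNER JOIN (a LEFT JOIN would keep them with a NULL CaptureBalance).

```python
import sqlite3

# Self-join on the daily snapshot table: L1 = requested run date, L2 = open date + 6 days.
CAPTURE_SQL = """
SELECT L1.LOANNUMBER  AS LoanNumber,
       L1.OPENDATE    AS OpenDate,
       L2.RUNDATE     AS CaptureDate,
       L1.CREDITLIMIT AS CreditLimit,
       L2.BALANCE     AS CaptureBalance,
       L1.RUNDATE     AS TodayDate,
       L1.BALANCE     AS TodayBalance
FROM LOAN AS L1
JOIN LOAN AS L2
  ON L2.LOANNUMBER = L1.LOANNUMBER
 -- OPENDATE + 6 days rendered as a yyyymmdd integer to match the INT RUNDATE
 AND L2.RUNDATE = CAST(strftime('%Y%m%d', date(L1.OPENDATE, '+6 days')) AS INTEGER)
WHERE L1.RUNDATE = ?
  AND L1.[TYPE] = 'Line of Credit'
ORDER BY L1.LOANNUMBER
"""


def capture_balances(rows, run_date):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE LOAN (LOANNUMBER INTEGER, OPENDATE TEXT, CREDITLIMIT INTEGER, "
        "RUNDATE INTEGER, BALANCE INTEGER, [TYPE] TEXT)"
    )
    con.executemany("INSERT INTO LOAN VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(CAPTURE_SQL, (run_date,)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    # Hypothetical snapshots: one row per loan per run date.
    snapshots = [
        (1001, "2015-11-01", 5000, 20151107, 1200, "Line of Credit"),
        (1001, "2015-11-01", 5000, 20151130, 900, "Line of Credit"),
        (1002, "2015-11-20", 2000, 20151126, 300, "Line of Credit"),
        (1002, "2015-11-20", 2000, 20151130, 450, "Line of Credit"),
        (1003, "2015-11-02", 1000, 20151130, 100, "Term Loan"),
    ]
    for row in capture_balances(snapshots, 20151130):
        print(row)
```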

Gaps in recurring series of a group with datetime [duplicate]

We have a table with following data
Id,ItemId,SeqNumber,DateTimeTrx
1,100,254,2011-12-01 09:00:00
2,100,1,2011-12-01 09:10:00
3,200,7,2011-12-02 11:00:00
4,200,5,2011-12-02 10:00:00
5,100,255,2011-12-01 09:05:00
6,200,3,2011-12-02 09:00:00
7,300,0,2011-12-03 10:00:00
8,300,255,2011-12-03 11:00:00
9,300,1,2011-12-03 10:30:00
Id is an identity column.
The sequence for an ItemId starts from 0, goes up to 255 and then resets to 0. All this information is stored in a table called Item. The order of the sequence numbers is determined by DateTimeTrx, but such data can enter the system at any time. The expected output is shown below:
ItemId,PrevorNext,SeqNumber,DateTimeTrx,MissingNumber
100,Previous,255,2011-12-01 09:05:00,0
100,Next,1,2011-12-01 09:10:00,0
200,Previous,3,2011-12-02 09:00:00,4
200,Next,5,2011-12-02 10:00:00,4
200,Previous,5,2011-12-02 10:00:00,6
200,Next,7,2011-12-02 11:00:00,6
300,Previous,1,2011-12-03 10:30:00,2
300,Next,255,2011-12-03 16:30:00,2
We need to get those rows one before and one after the missing sequence. In the above example for ItemId 300 - the record with sequence 1 has entered first (2011-12-03 10:30:00) and then 255(2011-12-03 16:30:00), hence the missing number here is 2. So 1 is previous and 255 is next and 2 is the first missing number. Coming to ItemId 100, the record with sequence 255 has entered first (2011-12-02 09:05:00) and then 1 (2011-12-02 09:10:00), hence 255 is previous and then 1, hence 0 is the first missing number.
In the above expected result, the MissingNumber column is the first missing number, included just to illustrate the example.
We will not have a case where we would have a complete series reset at one time i.e. it can be either a series rundown from 255 to 0 as in for itemid 100 or 0 to 255 as in ItemId 300. Hence we need to identify sequence missing when in ascending order (0,1,...255) or either in descending order (254,254,0,2) etc.
How can we accomplish this in T-SQL?
Could work like this:
;WITH b AS (
SELECT *
,row_number() OVER (ORDER BY ItemId, DateTimeTrx, SeqNumber) AS rn
FROM tbl
), x AS (
SELECT
b.Id
,b.ItemId AS prev_Itm
,b.SeqNumber AS prev_Seq
,c.ItemId AS next_Itm
,c.SeqNumber AS next_Seq
FROM b
JOIN b c ON c.rn = b.rn + 1 -- next row
WHERE c.ItemId = b.ItemId -- only with same ItemId
AND c.SeqNumber <> (b.SeqNumber + 1)%256 -- Seq cycles modulo 256
)
SELECT Id, prev_Itm, 'Previous' AS PrevNext, prev_Seq
FROM x
UNION ALL
SELECT Id, next_Itm ,'Next', next_Seq
FROM x
ORDER BY Id, PrevNext DESC
Produces exactly the requested result.
See a complete working demo on data.SE.
This solution takes gaps in the Id column into consideration, as there is no mention of a gapless sequence of Ids in the question.
Edit2: Answer to updated question:
I updated the CTE in the query above to match your latest version - or so I think.
Use those columns that define the sequence of rows. Add as many columns to your ORDER BY clause as necessary to break ties.
The explanation of your latest update is not entirely clear to me, but I think you only need to squeeze in DateTimeTrx to achieve what you want. I added SeqNumber to the ORDER BY to break ties left by identical DateTimeTrx values. I edited the query above.
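On SQL Server 2012 or later, the row_number() self-join can also be expressed with LEAD() partitioned by ItemId. A minimal sketch in SQLite via Python's sqlite3, using the sample rows from the question; it assumes ascending sequences that wrap modulo 256 (the descending case mentioned in the question is not handled) and returns one row per gap with both neighbours, rather than separate Previous/Next rows, which could be produced with a UNION ALL as in the answer.

```python
import sqlite3

# Sample rows from the question: (Id, ItemId, SeqNumber, DateTimeTrx).
SAMPLE = [
    (1, 100, 254, "2011-12-01 09:00:00"),
    (2, 100, 1, "2011-12-01 09:10:00"),
    (3, 200, 7, "2011-12-02 11:00:00"),
    (4, 200, 5, "2011-12-02 10:00:00"),
    (5, 100, 255, "2011-12-01 09:05:00"),
    (6, 200, 3, "2011-12-02 09:00:00"),
    (7, 300, 0, "2011-12-03 10:00:00"),
    (8, 300, 255, "2011-12-03 11:00:00"),
    (9, 300, 1, "2011-12-03 10:30:00"),
]

# Compare each row with the next one for the same ItemId in DateTimeTrx order.
# Sequence numbers wrap at 256, so the expected successor is (SeqNumber + 1) % 256.
GAPS_SQL = """
WITH ordered AS (
    SELECT ItemId,
           SeqNumber,
           DateTimeTrx,
           LEAD(SeqNumber)   OVER (PARTITION BY ItemId ORDER BY DateTimeTrx, SeqNumber) AS NextSeq,
           LEAD(DateTimeTrx) OVER (PARTITION BY ItemId ORDER BY DateTimeTrx, SeqNumber) AS NextTrx
    FROM Item
)
SELECT ItemId,
       SeqNumber   AS PrevSeq,
       DateTimeTrx AS PrevTrx,
       NextSeq,
       NextTrx,
       (SeqNumber + 1) % 256 AS MissingNumber
FROM ordered
WHERE NextSeq IS NOT NULL
  AND NextSeq <> (SeqNumber + 1) % 256
ORDER BY ItemId, PrevTrx
"""


def find_gaps(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Item (Id INTEGER, ItemId INTEGER, SeqNumber INTEGER, DateTimeTrx TEXT)"
    )
    con.executemany("INSERT INTO Item VALUES (?, ?, ?, ?)", rows)
    result = con.execute(GAPS_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for gap in find_gaps(SAMPLE):
        print(gap)
```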

Merge rows based on date in SQL Server

I want to display data based on start date and end date. A code can have several date ranges. If any time intervals are continuous, I need to merge those rows and display them as a single row.
Here is sample data
Code Start_Date End_Date Volume
470 24-Oct-10 30-Oct-10 28
470 17-Oct-10 23-Oct-10 2
470 26-Sep-10 2-Oct-10 2
471 22-Aug-10 29-Aug-10 2
471 15-Aug-10 21-Aug-10 2
The output result I want is
Code Start_Date End_Date Volume
470 17-Oct-10 30-Oct-10 30
470 26-Sep-10 2-Oct-10 2
471 15-Aug-10 29-Aug-10 4
A code can have any number of time intervals. Please help. Thank you.
Based on your sample data (which I've put in a table called Test), and assuming no overlaps:
;with Ranges as (
select Code,Start_Date,End_Date,Volume from Test
union all
select r.Code,r.Start_Date,t.End_Date,(r.Volume + t.Volume)
from
Ranges r
inner join
Test t
on
r.Code = t.Code and
DATEDIFF(day,r.End_Date,t.Start_Date) = 1
), ExtendedRanges as (
select Code,MIN(Start_Date) as Start_Date,End_Date,MAX(Volume) as Volume
from Ranges
group by Code,End_Date
)
select Code,Start_Date,MAX(End_Date),MAX(Volume)
from ExtendedRanges
group by Code,Start_Date
Explanation:
The Ranges CTE contains all rows from the original table (because some of them might be relevant) and all rows we can form by joining ranges together (both original ranges, and any intermediate ranges we construct - we're doing recursion here).
Then ExtendedRanges (poorly named) finds, for any particular End_Date, the earliest Start_Date that can reach it.
Finally, we query this second CTE, to find, for any particular Start_Date, the latest End_Date that is associated with it.
These two queries combine to basically filter the Ranges CTE down to "the widest possible Start_Date/End_Date pair" in each set of overlapping date ranges.
Sample data setup:
create table Test (
Code int not null,
Start_Date date not null,
End_Date date not null,
Volume int not null
)
insert into Test(Code, Start_Date, End_Date, Volume)
select 470,'24-Oct-10','30-Oct-10',28 union all
select 470,'17-Oct-10','23-Oct-10',2 union all
select 470,'26-Sep-10','2-Oct-10',2 union all
select 471,'22-Aug-10','29-Aug-10',2 union all
select 471,'15-Aug-10','21-Aug-10',2
go
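A non-recursive alternative is the "gaps and islands" pattern: flag each row whose Start_Date does not follow the previous End_Date by exactly one day, turn the flags into island numbers with a running SUM, then group by island. A minimal sketch in SQLite via Python's sqlite3 using the sample data above; like the answer, it assumes no overlapping ranges, and in T-SQL (2012 or later, for LAG and windowed SUM) the date arithmetic would be DATEADD(day, -1, Start_Date).

```python
import sqlite3

# Sample data from the question, dates in ISO form.
SAMPLE = [
    (470, "2010-10-24", "2010-10-30", 28),
    (470, "2010-10-17", "2010-10-23", 2),
    (470, "2010-09-26", "2010-10-02", 2),
    (471, "2010-08-22", "2010-08-29", 2),
    (471, "2010-08-15", "2010-08-21", 2),
]

MERGE_SQL = """
WITH flagged AS (
    -- new_island = 1 when this range does not start the day after the previous one ends
    SELECT Code, Start_Date, End_Date, Volume,
           CASE WHEN LAG(End_Date) OVER (PARTITION BY Code ORDER BY Start_Date)
                     = date(Start_Date, '-1 day')
                THEN 0 ELSE 1 END AS new_island
    FROM Test
), islands AS (
    -- running sum of the flags numbers the contiguous groups within each Code
    SELECT Code, Start_Date, End_Date, Volume,
           SUM(new_island) OVER (PARTITION BY Code ORDER BY Start_Date
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS island
    FROM flagged
)
SELECT Code, MIN(Start_Date) AS Start_Date, MAX(End_Date) AS End_Date, SUM(Volume) AS Volume
FROM islands
GROUP BY Code, island
ORDER BY Code, MIN(Start_Date)
"""


def merge_ranges(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Test (Code INTEGER, Start_Date TEXT, End_Date TEXT, Volume INTEGER)")
    con.executemany("INSERT INTO Test VALUES (?, ?, ?, ?)", rows)
    result = con.execute(MERGE_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in merge_ranges(SAMPLE):
        print(row)
```

Summing Volume per island gives the totals directly, so no MAX(Volume) trick is needed.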
If I understand your request, you're looking for something like:
select code, min(Start_date), max(end_date), sum(volume)
from yourtable
group by code
