Binary operator OR in TSQL? - sql-server

What I'm trying to achieve is to count occurrences on a sort of timeline, treating overlapping events as a single one, starting from a field like this and using TSQL:
Pattern (JSON array of pairs of values indicating
the start day and the duration of the event)
----------------------------------------------------
[[0,22],[0,24],[18,10],[30,3]]
----------------------------------------------------
For this example the expected result is 30.
What I need is a TSQL function to obtain this number.
Even though I'm not sure it's the right path to follow, I'm trying to simulate a sort of BINARY OR between the rows of my dataset.
After some attempts I managed to turn my dataset into something like this:
start | length | pattern
----------------------------------------------------
0 | 22 | 1111111111111111111111
0 | 24 | 111111111111111111111111
18 | 10 | 000000000000000001111111111
30 | 3 | 000000000000000000000000000000111
----------------------------------------------------
But now I don't know how to proceed in TSQL =)
A solution, as I said, could be a binary OR between the "pattern" fields to obtain something like this:
1111111111111111111111...........
111111111111111111111111.........
000000000000000001111111111......
000000000000000000000000000000111
--------------------------------------
111111111111111111111111111000111
Is it possible to do this in TSQL?
Maybe I'm just complicating things here. Do you have other ideas?
Don't forget that I only need the resulting number!
Thank you all

Only the total number of days on which an event occurs needs to be returned.
But I was wondering how hard it would be to actually calculate that binary OR'd pattern.
declare @T table (start int, length int);
insert into @T values
(0,22),
(0,24),
(18,10),
(30,3);
WITH
DIGITS as (
    select n
    from (values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) D(n)
),
NUMBERS as (
    select (10*d2.n + d1.n) as n
    from DIGITS d1, DIGITS d2
    where (10*d2.n + d1.n) < (select max(start+length) from @T)
),
CALC as (
    select N.n, max(case when N.n between IIF(T.start>0,T.start,1) and IIF(T.start>0,T.start,1)+T.length-1 then 1 else 0 end) as ranged
    from @T T
    cross apply NUMBERS N
    group by N.n
)
select SUM(c.ranged) as total,
    stuff(
    (
        select ranged as 'text()'
        from CALC
        order by n
        for xml path('')
    ),1,1,'') as pattern
from CALC c;
Result:
total pattern
30 11111111111111111111111111100111
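To make the binary OR idea concrete, here is a minimal Python sketch (not T-SQL, and not part of the answer above). It assumes the same convention as the query, namely that a start of 0 is treated as day 1. The names `covered_days` and `pattern` are made up for illustration.

```python
def covered_days(events):
    """OR one bitmask per event together; bit d set means day d has an event."""
    combined = 0
    for start, length in events:
        first_day = max(start, 1)  # assumption: start 0 is treated as day 1, as in the query
        combined |= ((1 << length) - 1) << first_day
    return combined


def pattern(mask):
    """Render the mask as a 0/1 string, day 1 first."""
    return "".join("1" if (mask >> day) & 1 else "0"
                   for day in range(1, mask.bit_length()))


events = [(0, 22), (0, 24), (18, 10), (30, 3)]
mask = covered_days(events)
total = bin(mask).count("1")  # number of days with at least one event
print(total)          # 30
print(pattern(mask))  # the OR'd pattern, one character per day
```

Counting the set bits of the combined mask gives the total directly, so the pattern string is only needed for display.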

Depending on your input data, you should be able to do something like the following to calculate your Days With An Event.
The CTE generates a table of dates whose start and end are defined by the two date variables. Ideally these would be derived from your source data. If you have to use numbered day values, you could simply return incrementing numbers instead of incrementing dates:
declare @Events table (StartDate date
                      ,DaysLength int
                      )
insert into @Events values
 ('20160801',22)
,('20160801',24)
,('20160818',10)
,('20160830',3)
declare @StartDate date = getdate()-30
       ,@EndDate date = getdate()+30
;with Dates As
(
    select DATEADD(day,1,@StartDate) as Dates
    union all
    select DATEADD(day,1, Dates)
    from Dates
    where Dates < @EndDate
)
select count(distinct d.Dates) as EventingDays
from Dates d
    inner join @Events e
        on(d.Dates between e.StartDate and dateadd(d,e.DaysLength-1,e.StartDate)
          )
option(maxrecursion 0)
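As an alternative to joining against a calendar, the distinct days can also be counted by sorting the events and merging overlapping spans. The Python sketch below illustrates that different technique; it is not a translation of the query above, and `eventing_days` is a hypothetical name. It uses the four sample events and treats each event as covering `DaysLength` consecutive days starting on `StartDate`.

```python
from datetime import date, timedelta


def eventing_days(events):
    """Count distinct days covered by (start_date, length_in_days) events."""
    # Turn each event into an inclusive (first_day, last_day) span, sorted by start.
    spans = sorted((start, start + timedelta(days=length - 1))
                   for start, length in events)
    if not spans:
        return 0
    total = 0
    run_start, run_end = spans[0]
    for first, last in spans[1:]:
        if first <= run_end + timedelta(days=1):  # overlaps or touches the current run
            run_end = max(run_end, last)
        else:                                     # gap: close the run and start a new one
            total += (run_end - run_start).days + 1
            run_start, run_end = first, last
    return total + (run_end - run_start).days + 1


events = [
    (date(2016, 8, 1), 22),
    (date(2016, 8, 1), 24),
    (date(2016, 8, 18), 10),
    (date(2016, 8, 30), 3),
]
print(eventing_days(events))  # 30
```

The work here grows with the number of events rather than with the length of the date range, which may matter when the range is long.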

Related

Creating rows in a table based on min and max date in Snowflake SQL

Is there a relatively simple way to create rows in a table based on a range of dates?
For example, given:
ID | Date_min   | Date_max
---+------------+-----------
1  | 2022-02-01 | 2022-20-05
2  | 2022-02-09 | 2022-02-12
I want to output:
ID | Date_in_Range
---+--------------
1  | 2022-02-01
1  | 2022-02-02
1  | 2022-02-03
1  | 2022-02-04
1  | 2022-02-05
2  | 2022-02-09
2  | 2022-02-10
2  | 2022-02-11
2  | 2022-02-12
I saw a solution when the range is integer based (How to create rows based on the range of all values between min and max in Snowflake (SQL)?)
But in order to use that approach GENERATOR(ROWCOUNT => 1000) I have to convert my dates to integers and back, and it just gets very messy very quick, especially since I need to apply this to millions of rows.
So, I was wondering if there is a simpler way to do it when dealing with dates instead of integers? Any hints anyone can provide?
Another one without using generator -
with data (ID,Date_min,Date_max) as (
select * from values
(1,to_date('2022-02-01','YYYY-DD-MM'),to_date('2022-20-05','YYYY-DD-MM')),
(2,to_date('2022-02-09','YYYY-DD-MM'),to_date('2022-02-12','YYYY-DD-MM'))
)
select id,
Date_min,
Date_max,
dateadd(day, index, Date_min) day_slots from data,
table(split_to_table(repeat(',',datediff(day, Date_min, Date_max)-1),','));
SQL with first date -
with data (ID,Date_min,Date_max) as (
select * from values
(1,to_date('2022-02-01','YYYY-DD-MM'),to_date('2022-20-05','YYYY-DD-MM')),
(2,to_date('2022-02-09','YYYY-DD-MM'),to_date('2022-02-12','YYYY-DD-MM'))
)
select id,
dateadd(month, index-1, Date_min) day_slots from data,
table(split_to_table(repeat(',',datediff(month, Date_min, Date_max)),','));
"But in order to use that approach GENERATOR(ROWCOUNT => 1000) I have to convert my dates to integers and back, and it just gets very messy very quick, especially since I need to apply this to millions of rows."
There is no need to convert dates to integers and back; a simple DATEADD('day', num, start_date) is enough.
Pseudocode:
WITH sample_data(id, date_min, date_max) AS (
SELECT 1, '2022-02-01'::DATE, '2022-02-05'::DATE
UNION
SELECT 2, '2022-02-09'::DATE, '2022-02-12'::DATE
) , numbers AS (
SELECT ROW_NUMBER() OVER(ORDER BY SEQ4())-1 AS num -- 0 based
FROM TABLE(GENERATOR(ROWCOUNT => 1000)) -- should match max anticipated span
)
SELECT s.id, DATEADD(DAY, n.num, s.date_min) AS calculated_date
FROM sample_data AS s
JOIN numbers AS n
ON DATEADD('DAY', n.num, s.date_min) BETWEEN s.date_min AND s.date_max
ORDER BY s.id, calculated_date;
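The same idea (add a 0-based offset to the start date until the end date is reached) can be sketched in Python to show the expected rows. This is only an illustration of the logic, not Snowflake code; `expand_ranges` is a hypothetical helper and the sample rows are the ones used in the query above.

```python
from datetime import date, timedelta


def expand_ranges(rows):
    """Yield (id, day) for every day from date_min to date_max, inclusive."""
    for row_id, date_min, date_max in rows:
        span = (date_max - date_min).days
        for offset in range(span + 1):  # 0-based offset, like num in the query
            yield row_id, date_min + timedelta(days=offset)


sample = [
    (1, date(2022, 2, 1), date(2022, 2, 5)),
    (2, date(2022, 2, 9), date(2022, 2, 12)),
]
result = list(expand_ranges(sample))
for row_id, day in result:
    print(row_id, day.isoformat())
```

No integer conversion of the dates is needed at any point; only the offset is an integer.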

SQL Query to list all hours of the day in datetime format in one column

I need a query that returns all the hours of the day in 12-hour format,
e.g. 12:00 am, 1:00 am, 2:00 am, etc. This will be used in SSRS as the list of available values for a time parameter. I need to select records within a date range and then within a time range inside that date range. I have this query, which returns the time in 24-hour format, but it is not working properly in SSRS:
With CTE(N)
AS
(
SELECT 0
UNION ALL
SELECT N+30
FROM CTE
WHERE N+5<24*60
)
SELECT CONVERT(TIME,DATEADD(minute,N,0) ,108)
FROM CTE
OPTION (MAXRECURSION 0)
This is how I would do it:
DECLARE @t time(1) = '00:00'; --I use 1 as when I use REPLACE later it means that I can "identify" the correct :00 to remove
WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS(
    SELECT TOP 24 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS I
    FROM N N1, N N2),
Times AS(
    SELECT DATEADD(HOUR, I,@t) AS [Time]
    FROM Tally)
SELECT T.[Time],
       REPLACE(CONVERT(varchar(12),T.Time,9),':00.0',' ') AS TimeString
FROM Times T
ORDER BY T.[Time] ASC;
Note that I return both a time and varchar datatype; both are important as the ordering of the data for a varchar would be quite different to start with and if you are using SSRS, I suspect you want the value of TimeString as a presentation thing and not the actual value.
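The point about keeping both a sortable value and a display string can be illustrated with a short Python sketch. It is not SSRS or T-SQL, and `hour_options` is a made-up name; it simply builds the 24 value/label pairs and shows that sorting the labels as text does not give chronological order.

```python
from datetime import time


def hour_options():
    """Return 24 (value, label) pairs: a real time value plus a 12-hour label."""
    options = []
    for hour in range(24):
        suffix = "AM" if hour < 12 else "PM"
        label = f"{hour % 12 or 12}:00 {suffix}"  # 0 -> 12, 13 -> 1, and so on
        options.append((time(hour, 0), label))
    return options


options = hour_options()
labels_sorted_as_text = sorted(label for _, label in options)
print(options[0][1], options[13][1])  # 12:00 AM 1:00 PM
print(labels_sorted_as_text[0])       # 10:00 AM, not 12:00 AM
```

Ordering by the time value and displaying the label avoids that problem.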

How to sum any credits before debits SQL Server?

I'm trying to sum all the credits that occur before a debit, then sum all the debits that follow a credit, within a 4-day time period.
Table
ACCT |Date | Amount | Credit or debit
-----+----------+---------+----------------
152 |8/14/2017 | 48 | C
152 |8/12/2017 | 22.5 | D
152 |8/12/2017 | 40 | D
152 |8/11/2017 | 226.03 | C
152 |8/10/2017 | 143 | D
152 |8/10/2017 | 107.23 | C
152 |8/10/2017 | 20 | D
152 |8/10/2017 | 49.41 | C
My query should only sum if there is a credit before the debit. The result will have 3 rows with the data above.
Output needed:
acct DateRange credit_amount debit_amount
--------------------------------------------------------------------------
152 2017-10-14 to 2017-10-18 49.41 20
152 2017-10-14 to 2017-10-18 107.23 143
152 2017-10-14 to 2017-10-18 226.03 62.5
The last one is summing the two debits until there is a credit.
First find the first credit.
Sum the credits if there is more than one before a debit.
Then find the debit and sum the debits together until the next credit.
I only need the case where the credit date is before the debit date. The 48 on 8/14 is ignored because there is no debit after it.
The logic is to see if the account was credited then debited after it.
My attempt
DECLARE @StartDate DATE
DECLARE @EndDate DATE
DECLARE @OverallEndDate DATE
SET @OverallEndDate = '2017-08-14'
SET @StartDate = '2017-08-10'
SET @EndDate = dateadd(dd, 4, @startDate);
WITH Dates
AS (
SELECT @StartDate AS sd, @EndDate AS ed, @OverallEndDate AS od
UNION ALL
SELECT dateadd(dd, 1, sd), DATEADD(dd, 1, ed), od
FROM Dates
WHERE od > sd
), credits
AS (
SELECT DISTINCT A.Acct, LEFT(CONVERT(VARCHAR, @StartDate, 120), 10) + 'to' + LEFT(CONVERT(VARCHAR, @EndDate, 120), 10) AS DateRange, credit_amount, debit_amount
FROM (
SELECT t1.acct, sum(amount) AS credit_amount, MAX(t1.datestart) AS c_datestart
FROM [Transactions] T1
WHERE Credit_or_debit = 'C' AND T1.Datestart BETWEEN @StartDate AND @EndDate AND T1.[acct] = '152' AND T1.Datestart <= (
SELECT MIN(D1.Datestart)
FROM [Transactions] D1
WHERE T1.acct = D1.acct AND D1.Credit_or_debit = 'D' AND D1.Datestart BETWEEN @StartDate AND @EndDate
)
GROUP BY T1.acct
) AS A
CROSS JOIN (
SELECT t2.acct, sum(amount) AS debit_amount, MAX(t2.datestart) AS c_datestart
FROM [Transactions] T2 AND T2.DBCR = 'D' AND T2.Datestart BETWEEN @StartDate AND @EndDate AND T2.[acct] = '152' AND T2.Datestart <= (
SELECT MAX(D1.Datestart)
FROM [Transactions] D1
WHERE T2.acct = D1.acct AND D1.Credit_or_debit = 'D' AND D1.Datestart BETWEEN @StartDate AND @EndDate
)
GROUP BY T2.acct
) AS B
WHERE A.acct = B.acct AND A.c_datestart <= B.d_datestart
)
SELECT *
FROM credits
OPTION (MAXRECURSION 0)
Update:
The stored date actually includes a timestamp. That is how I verify whether the debit comes after the credit.
It should be clear now that you definitely need a column that specifies the sequential order of transactions, because otherwise you can't decide whether a debit is placed before or after a credit when both have the same datestart. Assuming that you have such a column (in my query I named it ID), a solution could be as follows, without recursion and without a self-join. The problem can be solved using window functions (ROW_NUMBER and DENSE_RANK), which SQL Server has supported since 2005.
My solution processes the data in several steps that I implemented as a sequence of 2 CTEs and a final PIVOT query:
DECLARE @StartDate DATE = '20170810';
DECLARE @EndDate DATE = dateadd(dd, 4, @StartDate);
DECLARE @DateRange nvarchar(24);
SET @DateRange =
    CONVERT(nvarchar(10), @StartDate, 120) + ' to '
    + CONVERT(nvarchar(10), @EndDate, 120);
WITH
blocks (acct, CD, amount, blockno, r_blockno) AS (
    SELECT acct, Credit_or_debit, amount
        , ROW_NUMBER() OVER (PARTITION BY acct ORDER BY ID ASC)
        - ROW_NUMBER() OVER (PARTITION BY acct, Credit_or_debit ORDER BY ID ASC)
        , ROW_NUMBER() OVER (PARTITION BY acct ORDER BY ID DESC)
        - ROW_NUMBER() OVER (PARTITION BY acct, Credit_or_debit ORDER BY ID DESC)
    FROM Transactions
    WHERE datestart BETWEEN @StartDate AND @EndDate
        AND Credit_or_debit IN ('C','D') -- not needed, if always true
),
blockpairs (acct, CD, amount, pairno) AS (
    SELECT acct, CD, amount
        , DENSE_RANK() OVER (PARTITION BY acct, CD ORDER BY blockno)
    FROM blocks
    WHERE (blockno > 0 OR CD = 'C') -- remove leading debits
        AND (r_blockno > 0 OR CD = 'D') -- remove trailing credits
)
SELECT acct, @DateRange AS DateRange
    , amt.C AS credit_amount, amt.D AS debit_amount
FROM blockpairs PIVOT (SUM(amount) FOR CD IN (C, D)) amt
ORDER BY acct, pairno;
And this is how it works:
blocks
Here, the relevant data is retrieved from the table: the date range filter is applied, and another filter on the Credit_or_debit column makes sure that only the values C and D are contained in the result (if this is the case by design in your table, that part of the WHERE clause can be omitted). The essential part of this CTE is the difference of two row numbers (blockno). Credits and debits are numbered separately, and their respective row number is subtracted from the overall row number. Within a consecutive block of debits or credits, these numbers will be the same for each record, and they will be different (higher) in later blocks of the same type. The main use of this numbering is to identify the very first block (number 0) so that it can be excluded from further processing in the next step if it is a debit block. To also identify the very last block (and filter it out in the next step if it is a credit block), a similar block numbering is made in reverse order (r_blockno). The result (which I ordered just for visualization with your sample data) will look like this:
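As a rough illustration of the row-number difference, the Python sketch below recomputes blockno and r_blockno for the eight sample transactions. It assumes the sequence order (the hypothetical ID column) follows the question's table read from bottom to top, oldest first; `block_numbers` is a made-up helper name.

```python
from collections import defaultdict


def block_numbers(kinds):
    """Overall row number minus per-kind row number, for each row in order."""
    seen = defaultdict(int)  # running row number per kind ('C' or 'D')
    result = []
    for overall, kind in enumerate(kinds, start=1):
        seen[kind] += 1
        result.append(overall - seen[kind])
    return result


# Assumed order by ID: 49.41 C, 20 D, 107.23 C, 143 D, 226.03 C, 40 D, 22.5 D, 48 C
kinds = ["C", "D", "C", "D", "C", "D", "D", "C"]
blockno = block_numbers(kinds)
r_blockno = block_numbers(kinds[::-1])[::-1]  # same numbering in reverse order

for kind, b, r in zip(kinds, blockno, r_blockno):
    print(kind, b, r)
```

The two consecutive debits share blockno 3, and the trailing credit is the only row with r_blockno 0, which is what the filters in the next CTE rely on.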
blockpairs
In this CTE, as described before, the very first block is filtered out if it's a debit block, and the very last block is filtered out if it's a credit block. After that, the number of remaining blocks must be even, and the logical order of blocks must be a sequence of pairs of credit and debit blocks, each pair starting with a credit block followed by its associated debit block. Each pair of credit/debit blocks will result in a single row in the end. To associate the credit and debit blocks correctly in the query, I give them the same number by using separate numberings per type (the n-th credit block and the n-th debit block are associated by giving them the same number n). For this numbering, I use the DENSE_RANK function, so that all records in a block obtain the same number (pairno) and the numbering is gapless. For numbering the blocks of the same type, I reuse the blockno field described above for ordering. The result in your example (again sorted for visualization):
The final PIVOT query
Finally, the credit_amount and debit_amount are aggregated over the respective blocks, grouping by acct and pairno, and then displayed side by side using a PIVOT query.
Although the column pairno isn't visible, it is used for sorting the resulting records.
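The overall effect of the three steps (form consecutive blocks, drop a leading debit block and a trailing credit block, pair each credit block with the debit block that follows) can be sketched procedurally in Python. This uses `itertools.groupby` instead of window functions, so it is an equivalent illustration rather than a translation of the query; the function name and the assumed transaction order (the question's table read bottom to top) are mine.

```python
from decimal import Decimal
from itertools import groupby


def credit_debit_pairs(transactions):
    """transactions: (kind, amount) tuples in sequence order, kind 'C' or 'D'."""
    # Collapse consecutive rows of the same kind into (kind, total) blocks.
    blocks = [(kind, sum(amount for _, amount in rows))
              for kind, rows in groupby(transactions, key=lambda t: t[0])]
    if blocks and blocks[0][0] == "D":   # remove a leading debit block
        blocks = blocks[1:]
    if blocks and blocks[-1][0] == "C":  # remove a trailing credit block
        blocks = blocks[:-1]
    # The remaining blocks alternate C, D, C, D, ... so pair them up.
    return [(blocks[i][1], blocks[i + 1][1]) for i in range(0, len(blocks), 2)]


transactions = [
    ("C", Decimal("49.41")), ("D", Decimal("20")),
    ("C", Decimal("107.23")), ("D", Decimal("143")),
    ("C", Decimal("226.03")), ("D", Decimal("40")), ("D", Decimal("22.5")),
    ("C", Decimal("48")),
]
for credit, debit in credit_debit_pairs(transactions):
    print(credit, debit)
```

With the sample data this yields three credit/debit pairs, and the final credit of 48 is dropped because no debit follows it.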

Find mathematical complement of a set of intervals with SQL

I'm pondering the following problem, which needs to be solved via SQL. Let there be an interval [a, b] of natural numbers, and a (finite) set A of intervals that are all subsets of [a, b]. We want to determine the complement of A, that is, a set of intervals B such that A + B = [a, b] and A and B are disjoint.
For example: given [a, b] = the days of 2017 ("all days"), and the intervals March-June, April, April-July and November ("possible days"), produce the intervals Jan-Feb, Aug-Oct and Dec ("impossible days"). All intervals are, or should be, defined by a start date and an end date.
I tried the following: produce a calendar of 2017 and check for every day whether it is contained in none of the intervals. From these days, construct the corresponding intervals. So far it seems complicated, and I'm starting to think that this approach is a poor fit for SQL. But maybe it's just my implementation. What do you think? Would you know a better way?
Greetings from Frankfurt,
Johannes
So long as you're working with a known fixed range then you can easily find the candidates to be a start or end that is outside of any current interval. And then just pair those up based on date order:
declare @RangeStart date
declare @RangeEnd date
select @RangeStart = '20170101',@RangeEnd = '20171231'
declare @intervals table (
    StartAt date not null,
    EndAt date not null
)
insert into @intervals (StartAt,EndAt) values
('20170301','20170630'),
('20170401','20170430'),
('20170401','20170731'),
('20171101','20171130')
;With Starts as (
    select
        @RangeStart as StartDT
    where
        not exists (select * from @intervals i where @RangeStart between i.StartAt and i.EndAt) --Start outside an interval
    union all
    select
        DATEADD(day,1,i1.EndAt)
    from
        @intervals i1
            left join
        @intervals i2
            on
                DATEADD(day,1,i1.EndAt) between i2.StartAt and i2.EndAt --No succeeding interval
    where
        i2.EndAt is null
), Ends as (
    select
        @RangeEnd as EndDT
    where
        not exists (select * from @intervals i where @RangeEnd between i.StartAt and i.EndAt) --End outside an interval
    union all
    select
        DATEADD(day,-1,i1.StartAt)
    from
        @intervals i1
            left join
        @intervals i2
            on
                DATEADD(day,-1,i1.StartAt) between i2.StartAt and i2.EndAt --No preceding interval
    where
        i2.StartAt is null
), OrderedStarts as (
    select StartDT,ROW_NUMBER() OVER (ORDER BY StartDT) as rn
    from Starts where StartDT between @RangeStart and @RangeEnd
), OrderedEnds as (
    select EndDT,ROW_NUMBER() OVER (ORDER BY EndDt) as rn
    from Ends where EndDT between @RangeStart and @RangeEnd
)
select
    os.StartDT,oe.EndDT
from
    OrderedStarts os
        inner join
    OrderedEnds oe
        on
            os.rn = oe.rn
Result:
StartDT EndDT
---------- ----------
2017-01-01 2017-02-28
2017-08-01 2017-10-31
2017-12-01 2017-12-31
That is - valid start dates are the start of our range or the day after any other interval, provided that doesn't overlap with another interval. Similarly for valid ends.
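For comparison, the complement can also be computed with a single sorted sweep. The Python sketch below shows that alternative technique, not the set-based pairing used in the query; `complement` is a hypothetical name, and it assumes, as the question states, that every interval lies inside the overall range and that all bounds are inclusive.

```python
from datetime import date, timedelta


def complement(range_start, range_end, intervals):
    """Return the inclusive gaps of [range_start, range_end] not covered by intervals."""
    one_day = timedelta(days=1)
    gaps = []
    cursor = range_start  # first day not yet known to be covered
    for start, end in sorted(intervals):
        if start > cursor:  # uncovered stretch before this interval
            gaps.append((cursor, start - one_day))
        cursor = max(cursor, end + one_day)
    if cursor <= range_end:  # uncovered tail after the last interval
        gaps.append((cursor, range_end))
    return gaps


intervals = [
    (date(2017, 3, 1), date(2017, 6, 30)),
    (date(2017, 4, 1), date(2017, 4, 30)),
    (date(2017, 4, 1), date(2017, 7, 31)),
    (date(2017, 11, 1), date(2017, 11, 30)),
]
for gap_start, gap_end in complement(date(2017, 1, 1), date(2017, 12, 31), intervals):
    print(gap_start, gap_end)
```

For the sample intervals this produces the same three gaps as the result table above.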

Using t-sql to select aggregate when date difference is not just equal but small

I have a table where I want to select the maximum of a column, but based on when the date difference is zero or small (let's say 3 days). When two subsequent dates are very close, the data are likely spurious, and I want to get the highest state when that happens.
My data looks similar to this
DECLARE @TestingResults TABLE (
    IDNumber varchar(100),
    DateSeen date,
    [state] int)
INSERT INTO @TestingResults VALUES
('A','2015-04-21',2),
('A','2015-05-08',2),
('A','2015-07-01',3),
('B','2014-06-18',100), -- this is the one I want
('B','2014-06-19',2),
('B','2014-07-31',2),
('B','2014-08-11',3),
('B','2014-09-24',3),
('B','2014-10-24',3),
('B','2014-11-24',3),
('B','2014-12-15',3),
('B','2015-01-12',3),
('B','2015-01-13',400), -- this is the one I want
('B','2015-04-06',10), -- either will do
('B','2015-04-07',10),
('B','2015-07-06',3), -- either will do
('B','2015-07-07',3),
('B','2015-10-12',3),
('C','2012-02-20',3),
('C','2012-03-12',3),
('C','2012-04-02',3),
('C','2012-11-21',3)
What I really want is something like this where I take the maximum of state when the difference between dates is < 3 (note, some of the data may have the same state even when the differences in date are small ...) :
IDNumber DateSeen state
A 2015-04-21 2
A 2015-05-08 2
A 2015-07-01 3
-- if there are observations < 3 days apart, take MAX
B 2014-06-18 100
B 2014-07-31 2
B 2014-08-11 3
B 2014-09-24 3
B 2014-10-24 3
B 2014-11-24 3
B 2014-12-15 3
-- if there are observations < 3 days apart, take MAX
B 2015-01-13 400
-- if there are observations < 3 days apart, take MAX
B 2015-04-07 10
-- if there are observations < 3 days apart, take MAX
B 2015-07-07 3
B 2015-10-12 3
C 2012-02-20 3
C 2012-03-12 3
C 2012-04-02 3
C 2012-11-21 3
I guess I could create another table variable to hold it and then query it, but there are a couple of problems. As you can see, IDNumber='B' has a couple of triggers in its sequence of dates, so I am thinking there should be a 'smarter' way.
Thanks!
After your clarifying comments (thanks for that!), I would do this as follows:
SELECT ISNULL(high.IDNumber, results.IDNumber) AS IDNumber,
       ISNULL(high.DateSeen, results.DateSeen) AS DateSeen,
       ISNULL(high.[state], results.[state]) AS [state]
FROM @TestingResults results
OUTER APPLY
(
    SELECT TOP 1 IDNumber, DateSeen, [state]
    FROM @TestingResults highest
    WHERE highest.DateSeen <= results.DateSeen -- <= so the current row also competes for the highest state
        AND highest.IDNumber = results.IDNumber
        AND DATEDIFF(DAY,highest.DateSeen,results.DateSeen) <=3
    ORDER BY [state] DESC, [DateSeen] DESC
) high
WHERE NOT EXISTS
(
    SELECT 1
    FROM @TestingResults nearFuture
    WHERE nearFuture.DateSeen > results.DateSeen
        AND nearFuture.IDNumber = results.IDNumber
        AND DATEDIFF(DAY,results.DateSeen,nearFuture.DateSeen) <=3
)
This is almost certainly not the most elegant way to achieve this (I suspect it could be done more efficiently with window functions, a recursive CTE or similar), but I believe it gives you the behaviour and results you desire.
This should do it using a recursive CTE:
WITH TestingResults AS (
SELECT
*
,ROW_NUMBER() OVER(ORDER BY IDNumber, DateSeen) AS RowNum
FROM @TestingResults
), Data AS (
SELECT
tmp1.IDNumber,
tmp1.DateSeen,
tmp1.state,
tmp1.RowNum,
tmp1.RowNum AS GroupID
FROM (
SELECT
*
,ABS(DATEDIFF(DAY, DateSeen, LAG(DateSeen, 1, NULL) OVER(PARTITION BY IDNumber ORDER BY DateSeen))) AS AbsPrev
FROM TestingResults
) AS tmp1
WHERE tmp1.AbsPrev IS NULL OR tmp1.AbsPrev >= 3 --the first date in a sequence
UNION ALL
SELECT
r.IDNumber,
r.DateSeen,
r.state,
r.RowNum,
d.GroupID
FROM Data d
INNER JOIN TestingResults r ON
r.IDNumber = d.IDNumber
AND DATEDIFF(DAY, d.DateSeen, r.DateSeen) < 3
AND d.RowNum+1 = r.RowNum
)
SELECT MIN(d.IDNumber) AS IDNumber, MAX(d.DateSeen) AS DateSeen, MAX(d.state) AS state
FROM Data d
GROUP BY d.GroupID
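The grouping rule used by the recursive CTE (a row joins the previous row's group when it has the same IDNumber and is fewer than 3 days later; each group then reports its latest date and highest state) can be sketched in Python as follows. This is an illustration only: `collapse_close_dates` is a hypothetical name, and the input is a subset of the question's sample rows, assumed to be sorted by IDNumber and DateSeen.

```python
from datetime import date


def collapse_close_dates(rows, min_gap_days=3):
    """rows: (id, date_seen, state) sorted by id, then date. Returns one row per group."""
    groups = []
    for id_number, seen, state in rows:
        last = groups[-1] if groups else None
        if last and last["id"] == id_number and (seen - last["date"]).days < min_gap_days:
            last["date"] = seen                        # like MAX(DateSeen)
            last["state"] = max(last["state"], state)  # like MAX(state)
        else:
            groups.append({"id": id_number, "date": seen, "state": state})
    return [(g["id"], g["date"], g["state"]) for g in groups]


rows = [
    ("A", date(2015, 4, 21), 2),
    ("B", date(2014, 6, 18), 100),
    ("B", date(2014, 6, 19), 2),
    ("B", date(2014, 7, 31), 2),
    ("B", date(2015, 1, 12), 3),
    ("B", date(2015, 1, 13), 400),
]
collapsed = collapse_close_dates(rows)
for row in collapsed:
    print(*row)
```

Note that, like the query, this reports the latest date of each group, so the first B group comes out as 2014-06-19 with state 100 rather than 2014-06-18.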
