SQL - DATEDIFF analysis across different rows / CURSOR usage - sql-server

I need to identify bookings for a hotel that are made within 5 days of each other.
I need the query to check the other Holiday dates (for each supplierID), and identify just the ones that are within 5 days of each other.
I've heard that using a CURSOR is the way to do this, perhaps in combination with DATEDIFF and OVER (PARTITION BY SupplierID), but I have no experience with CURSOR or how to use it.
The output should be something like this...
And my query so far is...
SELECT
SupplierID AS 'Hotel',
B.ID AS BookingID,
B.Depart,
?? AS '5 days apart'
FROM Bookings B
ORDER by B.SupplierID, B.Depart
Help much appreciated...

create table datestable (hotel int, booking int, holday date)
insert into datestable values
(1,111111,'20140604'),
(1,111112,'20140606'),
(1,111113,'20141012'),
(1,111114,'20141230'),
(5,211111,'20150214'),
(5,211112,'20150217'),
(5,211113,'20150328'),
(5,211114,'20150523')
SELECT *
,(CASE WHEN (
SELECT TOP 1 1
FROM datestable d2
WHERE d1.hotel = d2.hotel AND d1.holday <> d2.holday
AND datediff(day, d2.holday, d1.holday) BETWEEN - 5 AND 5
) = 1 THEN 'y' ELSE 'n' END
) as '5 days apart'
FROM datestable d1

You don't necessarily need a cursor here; you can do it with a query like this:
select b1.SupplierID sid1, b1.ID id1, b1.Depart d1,
iif(count(*)>1,'y','n') as within5days
from Bookings b1
left join Bookings b2
on b1.SupplierID=b2.SupplierID and abs(datediff(day, b1.Depart, b2.Depart))<=5
group by b1.SupplierID, b1.ID, b1.Depart;
If you have trouble with the performance, then a cursor might be a better choice, indeed.
edit: added the restriction in the on clause to join only the same suppliers
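Another set-based option is to compare each booking only with its nearest neighbours by date, because the closest booking for the same hotel is always the previous or the next one. The sketch below is an illustration under assumptions, not part of either answer: it runs in SQLite through Python's sqlite3 module (SQLite 3.25 or later is assumed for window functions), reuses the datestable sample rows with ISO date strings, and subtracts julianday() values where SQL Server would use DATEDIFF(day, ...). LAG and LEAD exist in SQL Server 2012 and later, so the same idea should carry over.

```python
import sqlite3

# Sample rows from the datestable answer, with dates rewritten as ISO
# strings so that SQLite's julianday() can parse them.
ROWS = [
    (1, 111111, "2014-06-04"),
    (1, 111112, "2014-06-06"),
    (1, 111113, "2014-10-12"),
    (1, 111114, "2014-12-30"),
    (5, 211111, "2015-02-14"),
    (5, 211112, "2015-02-17"),
    (5, 211113, "2015-03-28"),
    (5, 211114, "2015-05-23"),
]

# A booking is flagged when the previous or the next booking of the same
# hotel is at most 5 days away. NULL comparisons (first/last row of a
# partition) fall through to the ELSE branch.
QUERY = """
SELECT hotel, booking, holday,
       CASE WHEN julianday(holday) - julianday(LAG(holday) OVER w) <= 5
              OR julianday(LEAD(holday) OVER w) - julianday(holday) <= 5
            THEN 'y' ELSE 'n' END AS within5days
FROM datestable
WINDOW w AS (PARTITION BY hotel ORDER BY holday)
ORDER BY hotel, holday
"""


def flag_close_bookings(rows):
    """Load the rows into an in-memory table and return the flagged result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE datestable (hotel INT, booking INT, holday TEXT)")
    conn.executemany("INSERT INTO datestable VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in flag_close_bookings(ROWS):
        print(row)
```

Unlike the correlated subquery above, this version also flags two bookings on the same date, since their difference is 0 days.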

Related

How to filter IDs based on dates?

I have the following table:
ID | DATES
---+-----------
1  | 02-09-2010
2  | 03-08-2011
1  | 08-01-2011
3  | 04-03-2010
I am looking for IDs that have at least one date before 05-01-2010 AND at least one date after 05-02-2010.
I tried the following:
WHERE tb1.DATES < '05-01-2010' AND tb1.DATES > '05-02-2010'
I don't think it's correct because I wasn't getting the right IDs when I did that and there's something wrong with that logic.
Can someone explain what I am doing wrong here?
The SQL command SELECT * FROM tb1 WHERE tb1.DATES < '05-01-2010' AND tb1.DATES > '05-02-2010' is asking "find all the rows where the 'dates' field is before 1 May and after 2 May" which - when put in English - is obviously none of them.
Instead, the command should be asking "find all the IDs which have a record that is before 1 May, and another record after 2 May" - creating the need to look at multiple records for each ID.
As @Martheen suggested, you could do this with two (sub)queries e.g.,
SELECT A.ID
FROM
(SELECT DISTINCT tb1.ID
FROM mytable tb1
WHERE tb1.[dates] < '20100501'
) AS A
INNER JOIN
(SELECT DISTINCT tb1.ID
FROM mytable tb1
WHERE tb1.[dates] > '20100502'
) AS B
ON A.ID = B.ID;
or using INTERSECT
SELECT DISTINCT tb1.ID
FROM mytable tb1
WHERE tb1.[dates] < '20100501'
INTERSECT
SELECT mt2.ID
FROM mytable mt2
WHERE mt2.[dates] > '20100502';
The use of DISTINCT in the above is so that you only get one row per ID, no matter how many rows they have before/after the relevant dates.
You could also do it via GROUP BY and HAVING. In this particular case that is easy: if any of an ID's dates are before 1 May, then its earliest date must be before 1 May (and correspondingly for its latest date and 2 May) e.g.,
SELECT mt1.ID
FROM mytable mt1
GROUP BY mt1.ID
HAVING MIN(mt1.[dates]) < '20100501' AND MAX(mt1.[dates]) > '20100502';
Here is a db<>fiddle with all 3 of these; all provide the same answer (one row, with ID = 1).
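For readers without access to the fiddle, the three approaches can be checked locally. The sketch below is only an illustration: it uses SQLite via Python's sqlite3 module rather than SQL Server, assumes the question's dates are month-day-year, and stores them as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

# Assumption: the question's dates are month-day-year; stored as ISO strings.
ROWS = [
    (1, "2010-02-09"),
    (2, "2011-03-08"),
    (1, "2011-08-01"),
    (3, "2010-04-03"),
]

QUERIES = {
    # Two DISTINCT subqueries joined on ID.
    "join": """
        SELECT a.id
        FROM (SELECT DISTINCT id FROM mytable WHERE dates < '2010-05-01') AS a
        JOIN (SELECT DISTINCT id FROM mytable WHERE dates > '2010-05-02') AS b
          ON a.id = b.id
        ORDER BY a.id
    """,
    # INTERSECT removes duplicates on its own.
    "intersect": """
        SELECT id FROM mytable WHERE dates < '2010-05-01'
        INTERSECT
        SELECT id FROM mytable WHERE dates > '2010-05-02'
        ORDER BY id
    """,
    # Earliest date before the first cutoff, latest date after the second.
    "having": """
        SELECT id FROM mytable
        GROUP BY id
        HAVING MIN(dates) < '2010-05-01' AND MAX(dates) > '2010-05-02'
        ORDER BY id
    """,
}


def run_all(rows):
    """Run every query against the same in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INT, dates TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    results = {name: [r[0] for r in conn.execute(sql)] for name, sql in QUERIES.items()}
    conn.close()
    return results


if __name__ == "__main__":
    for name, ids in run_all(ROWS).items():
        print(name, ids)
```

Under the month-day-year assumption, ID 1 is the only ID with a date on each side of the cutoffs, which matches the result stated above.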
Finally, you should use an unambiguous format for your dates. My preferred one of these is 'yyyyMMdd' with no dashes/slashes/etc (as these make them ambiguous).
Different countries/servers/etc. will convert the dates you have there differently; see, e.g., "SQL Server UTC string comparison not working".
One solution is to use BETWEEN to specify a range.
SELECT * from Table_name where
From_date BETWEEN '2013-01-03' AND '2013-01-09'
The other solution is the one you mentioned, but make sure the logic is correct:
SELECT * from Table_name where
From_date > '2010-01-05' AND From_date < '2010-02-05'

SQL Server : Join If Between

I have 2 tables:
Query1: contains 3 columns, Due_Date, Received_Date, Diff
where Diff is the difference in the two dates in days
QueryHol with 2 columns, Date, Count
This has a list of dates and the count is set to 1 for everything. All these dates represent public holidays.
I want to be able to get the sum of QueryHol["Count"] if QueryHol["Date"] is between Query1["Due_Date"] and Query1["Received_Date"]
Result Wanted: a column joined onto Query1 to state how many public holidays fell into the date range so they can be subtracted from the Query1["Diff"] column to give a reflection of working days.
Because 01-01-19 is a bank holiday, I would want to subtract it from the Diff to end up with results like below
Let me know if you require any more info.
Here's an option:
SELECT query1.due_date
, query1.received_date
, query1.diff
, queryhol.count
, COALESCE(query1.diff - queryhol.count, query1.diff) as DiffCount
FROM Query1
OUTER APPLY(
SELECT COUNT(*) AS count
FROM QueryHol
WHERE QueryHol.Date <= Query1.Received_Date
AND QueryHol.Date >= Query1.Due_Date
) AS queryhol
You may need to play around with the join condition. As written, it assumes that Received_Date is always later than Due_Date, and there is not enough data here to know all of the use cases.
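If Received_Date can fall before Due_Date, one way to relax that assumption is to order the two bounds inside the range test. The sketch below is hypothetical throughout: the rows, the hol_date column name and the holiday list are invented for illustration, and it runs in SQLite via Python's sqlite3 module, where a correlated subquery stands in for OUTER APPLY and the two-argument min()/max() functions pick the lower and upper bound. In T-SQL a CASE expression could play the same role.

```python
import sqlite3

# Hypothetical sample data (not from the question).
QUERY1_ROWS = [
    ("2018-12-24", "2019-01-02"),  # received after the due date
    ("2019-01-03", "2018-12-31"),  # received before the due date
]
HOLIDAY_ROWS = [("2018-12-25",), ("2018-12-26",), ("2019-01-01",)]

# Count the holidays between the two dates, whichever one comes first.
HOLIDAY_COUNT_SQL = """
SELECT q.due_date,
       q.received_date,
       (SELECT COUNT(*)
          FROM QueryHol AS h
         WHERE h.hol_date BETWEEN min(q.due_date, q.received_date)
                              AND max(q.due_date, q.received_date)) AS holidays
FROM Query1 AS q
ORDER BY q.due_date
"""


def holiday_counts(query1_rows, holiday_rows):
    """Return (due_date, received_date, holidays) for each Query1 row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Query1 (due_date TEXT, received_date TEXT)")
    conn.execute("CREATE TABLE QueryHol (hol_date TEXT)")
    conn.executemany("INSERT INTO Query1 VALUES (?, ?)", query1_rows)
    conn.executemany("INSERT INTO QueryHol VALUES (?)", holiday_rows)
    result = conn.execute(HOLIDAY_COUNT_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in holiday_counts(QUERY1_ROWS, HOLIDAY_ROWS):
        print(row)
```

Because COUNT(*) returns 0 rather than NULL when no holiday matches, the count can be subtracted from Diff directly.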
If I understand your problem, I think this is a possible solution:
select due_date,
receive_date,
diff,
(select sum(table2.count)
from table2
where table2.date between table1.due_date and table1.receive_date) sum_holi,
table1.diff - (select sum(table2.count)
from table2
where table2.date between table1.due_date and table1.receive_date) diff_holi
from table1
where [...] --here your conditions over table1.

Transaction data aggregate

As a disclaimer, I am not entirely sure the title of the question is best, if not I apologize.
I am trying to calculate cycle times for individuals, but files are occasionally transferred out of their work queues and eventually back. There are no unique transaction IDs recorded just a date and time stamp.
I tried looking for aggregate GROUP BY functions that could do this and was told that is not a feature SQL Server has.
I started by trying to identify the first and last transaction and was going to build out the query from there but it wasn't too helpful. Any insight would be very helpful.
Changedate is when the transfer from one person to another is recorded (year, month, day, time)
select a.claimId,
a.claimincidentID,
cast(a.changeDate as date) changedate,
a.claimNum,
a.Coverage,
a.AssignedAdjID,
a.AssignedAdj,
a.AssignedUnit,
a.TransferedAdjID,
a.TransferedAdj,
a.TransferedUnit,
a.usertypeid,
a.ChangedBy,
b.Feature_Create_Date,
DATEDIFF(day, b.Feature_Create_Date, a.changedate) transfer1,
cast(FIRST_VALUE(changeDate) OVER (ORDER BY changedate ASC)as date) AS firstchangedate,
cast(LAST_VALUE(changeDate) OVER (ORDER BY a.changedate ASC)as date) AS lastchangedate
from DB1.dbo.Assign_Transfer a
left join DB2.claimslist b on a.claimid=b.claimId
group by a.claimId, a.claimincidentID, a.changeDate, a.claimNum, a.Coverage, a.AssignedAdjID, a.AssignedAdj, a.AssignedUnit, a.TransferedAdjID, a.TransferedAdj, a.TransferedUnit, a.usertypeid, a.ChangedBy, b.Feature_Create_Date
Think of each of these rows as a Start (because the most recent one hasn't ended)
We would need to generate the complement End for this person in the chain.
Then with pairs of Start/End one could create GrossDuration.
Even after we get an assignment's start and end date/time,
we will have workday (8-4, or 9-5, or noon-8, ...) considerations,
also Sat/Sun/Hol and Vacation/out-of-office.
All of which affect Duration--- For Each Person differently.
Which would need to be factored by workday/etc into AdjDuration.
Let's say we can sequence these
Row_Number() Over (Partition by claimID Order by changeDate) as tfrNum
Assigned is the prior, and Transfered is the next
1, 2, 3, ... thru N
V
a.changeDate -- NOW()
V V
a.AssignedAdjID, | a.TransferedAdjID,
a.AssignedAdj, | a.TransferedAdj,
a.AssignedUnit, | a.TransferedUnit,
|
a.usertypeid,
a.ChangedBy,
So, is tfrNum=1 or tfrNum=N the oddball??
Let's look at pairs: each pair goes StartFrom->EndTo
1-2, 2-3, 3-4, 4-5, 5-6, 6-Now
----
From row1 we get TransferredID Start(changeDate) and
from row2 we get AssignedAdjID End (changeDate)
-- 2-3, 3-4, 4-5, etc repeating
--except for
From row6 we get TransferredID Start(changeDate) and
from default (still them) End (Now)
-- -- except again when TransferredUnit is "Closed"
After getting these pairs and their Start and End, we can do the Duration calc.
I need to visualize this problem before I try to run some sql. Real data would help.
Let's start with this; I can expand on it later, after you get it working and we look at some data--
With cte_tfrNum (claimID, changeDate, tfrNum, tfrMax) AS
(
SELECT
a.claimId
,a.changeDate
,ROW_NUMBER() Over ( Partition By a.claimId Order By a.changeDate) as tfrNum
,b.tfrMax
FROM DB1.dbo.Assign_Transfer a
-- just for giggles, lets also get the max# of transfers for this claim
Left Join
(SELECT claimId, COUNT(*) as tfrMax
FROM DB1.dbo.Assign_Transfer
Group By claimId
) as b
On b.claimId = a.claimId
)
-- Statement using the CTE
Select
tfrTo.*
From cte_tfrNum as tfrTo
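The Start/End pairing described above can be sketched with LEAD, which fetches the next transfer's changeDate as the End of the current assignment and falls back to an "as of" date for the last row. This is a minimal illustration under assumptions: the rows and the as_of value are hypothetical, only gross calendar days are computed (no workday, holiday or "Closed" handling), and it runs in SQLite via Python's sqlite3 module, with julianday() subtraction standing in for DATEDIFF(day, ...).

```python
import sqlite3

# Hypothetical transfers: (claimId, changeDate, TransferedAdj).
ROWS = [
    (100, "2023-01-02", "Ann"),
    (100, "2023-01-05", "Bob"),
    (100, "2023-01-12", "Ann"),
    (200, "2023-02-01", "Cy"),
]

# Each row starts an assignment for TransferedAdj; the next row of the
# same claim ends it. The last row of a claim ends at :as_of.
PAIRS_SQL = """
SELECT claimId, holder, start_date, end_date,
       CAST(julianday(end_date) - julianday(start_date) AS INTEGER) AS gross_days
FROM (
    SELECT claimId,
           TransferedAdj AS holder,
           changeDate AS start_date,
           COALESCE(LEAD(changeDate) OVER (PARTITION BY claimId
                                           ORDER BY changeDate),
                    :as_of) AS end_date
    FROM Assign_Transfer
)
ORDER BY claimId, start_date
"""


def assignment_durations(rows, as_of):
    """Return one (claimId, holder, start, end, gross_days) row per transfer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Assign_Transfer (claimId INT, changeDate TEXT, TransferedAdj TEXT)"
    )
    conn.executemany("INSERT INTO Assign_Transfer VALUES (?, ?, ?)", rows)
    result = conn.execute(PAIRS_SQL, {"as_of": as_of}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in assignment_durations(ROWS, "2023-02-10"):
        print(row)
```

Summing gross_days per holder would then give a rough cycle time per person, before any working-time adjustments.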
Thank you! I was able to take what you gave me and add a few things to be able to look at what I needed.
select
-- order the window by changeDate so that LAG returns the previous transfer
case when abc.tfrMax > abc.tfrnum then datediff(day,lag(abc.changedate) over(partition by abc.claimID order by abc.changeDate),abc.changeDate)
when abc.tfrMax = abc.tfrnum then datediff(day,lag(abc.changedate) over(partition by abc.claimID order by abc.changeDate),abc.changeDate)
end as test
, abc.*
from
(
SELECT
a.claimId
,a.changeDate
,a.AssignedAdj
,a.TransferedAdj
,a.Coverage
,ROW_NUMBER() Over ( Partition By a.claimId Order By a.changeDate) as tfrNum
,b.tfrMax
FROM db1.dbo.Assign_Transfer a
Left Join
(SELECT claimId, COUNT(*) as tfrMax
FROM db1.dbo.Assign_Transfer
Group By claimId
) as b
On b.claimId = a.claimId
) abc
group by
abc.claimId
,abc.changeDate
,abc.AssignedAdj
,abc.TransferedAdj
,abc.Coverage
,abc.tfrMax
,abc.tfrNum

SUMIF greater than and Workday column

So I'm trying to convert an Excel table into SQL and I'm having difficulty coming up with the last 2 columns. Below, find my Excel table that is fully functional (in green) and a table for the code that I have in SQL so far (in yellow). I need help replicating columns C and D, I pasted the Excel formula I'm using so you can understand what I'm trying to do:
Here's the code that I have so far:
WITH
cte_DistinctScheduling AS (
SELECT DISTINCT
s.JobNo
FROM
dbo.Scheduling s
WHERE
s.WorkCntr = 'Framing')
SELECT
o.OrderNo,
o.Priority AS [P],
SUM(r.TotEstHrs)/ROUND((8*w.CapacityFactor*(w.UtilizationPct/100)),2) AS
[Work Days Left],
Cast(GetDate()+ROUND(SUM(r.TotEstHrs)/ROUND((8*w.CapacityFactor*
(w.UtilizationPct/100)),2),3) AS DATE) AS DueDate
FROM OrderDet o JOIN cte_DistinctScheduling ds ON o.JobNo = ds.JobNo
JOIN OrderRouting r ON o.JobNo = r.JobNo
JOIN WorkCntr w ON r.WorkCntr = w.ShortName
WHERE r.WorkCntr = 'Framing'
AND o.OrderNo NOT IN ('44444', '77777')
GROUP BY o.OrderNo, o.Priority, ROUND((8*w.CapacityFactor*
(w.UtilizationPct/100)),2)
ORDER BY o.Priority DESC;
My work days left column in SQL gets the right amount for that particular row, but I need it to sum itself and everything with a P value above it and then add that to today's date, while taking workdays into account. I don't see a Workday function in SQL from what I've been reading, so I'm wondering what are some creative solutions? Could perhaps a CASE statement be the answer to both of my questions? Thanks in advance
Took me a while to understand how is the Excel helpful, and I'm still having a hard time absorbing the rest, can't tell if it's a me thing or a you thing, in any case...
First, I've mocked up something to test SUM per your rationale, the idea is doing a self-JOIN and summing everything from that JOIN side, relying on the fact that NULLs will come up for anything that shouldn't be summed:
DECLARE @TABLE TABLE(P int, [Value] int)
INSERT INTO @TABLE SELECT 1, 5
INSERT INTO @TABLE SELECT 2, 6
INSERT INTO @TABLE SELECT 3, 2
INSERT INTO @TABLE SELECT 4, 4
INSERT INTO @TABLE SELECT 5, 9
SELECT T1.P, [SUM] = SUM(ISNULL(T2.[Value], 0))
FROM @TABLE AS T1
LEFT JOIN @TABLE AS T2 ON T2.P <= T1.P
GROUP BY T1.P
ORDER BY P DESC
Second, workdays is a topic that comes up regularly. If you haven't already, consider reading a little about it in previous questions; I even posted an answer on one question last week, and the thread as a whole had several references.
Thirdly, we could use table definitions and sample data loaded on SQL itself, something like I did above.
Lastly, could you please check result of UtilizationPct / 100? If that's an integer-like data type, you're probably getting a bad result on it.
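Two of the points above can be tried out quickly. The sketch below is an illustration, not the answerer's code: it runs in SQLite via Python's sqlite3 module (3.25 or later assumed), uses a hypothetical table t with a column val in place of [Value], compares the self-join running total with a windowed SUM ... OVER (ORDER BY P), and shows how dividing one integer by another truncates. SQL Server also supports SUM with OVER (ORDER BY ...) from 2012 onward and also truncates integer division.

```python
import sqlite3

# Same mock data as the table variable above: (P, value).
ROWS = [(1, 5), (2, 6), (3, 2), (4, 4), (5, 9)]

# Running total via a self-join, as in the answer.
SELF_JOIN_SQL = """
SELECT t1.P, SUM(COALESCE(t2.val, 0)) AS running
FROM t AS t1
LEFT JOIN t AS t2 ON t2.P <= t1.P
GROUP BY t1.P
ORDER BY t1.P DESC
"""

# Running total via a window function; P is unique, so the default frame
# (start of partition up to the current row) gives the same totals.
WINDOW_SQL = """
SELECT P, SUM(val) OVER (ORDER BY P) AS running
FROM t
ORDER BY P DESC
"""


def running_totals(rows):
    """Return the results of both running-total queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (P INT, val INT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    by_join = conn.execute(SELF_JOIN_SQL).fetchall()
    by_window = conn.execute(WINDOW_SQL).fetchall()
    conn.close()
    return by_join, by_window


def integer_division_demo():
    """Integer / integer truncates; making one operand decimal avoids it."""
    conn = sqlite3.connect(":memory:")
    row = conn.execute("SELECT 85 / 100, 85 / 100.0").fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    print(running_totals(ROWS))
    print(integer_division_demo())
```

If UtilizationPct turns out to be an integer column, dividing by 100.0 instead of 100 is one way to keep the fractional part.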

How do I fill in missing dates as rows and give other values? (exceptional case)

I have a lot of explaining to do for the context of this question so bear with me.
At my company, we have a SQL Server database and I'm working in SQL Server Management Studio 2014.
We have a table that's called Jobstatistics, which displays how many Jobs are done during Intervals of one hour each.
The table looks like this
The station field is basically different areas jobs can be done at.
As you can see, some rows are missing for certain intervals and this is because of the way this table gets filled with data. To fill this table we have a script running that looks at another table and aggregates the amount of jobs for all dates between this interval. In other words, if there aren't any jobs, there won't be a row inserted because there will be nothing to insert (no rows from the other table to aggregate any jobs on).
What I want to do here is fill in these extra intervals with 0 as the amount of Jobs. So there will always be the 24 intervals (hours) for each day and for each station. On top of that we have set targets which we would like to achieve and I declared these in another table, called JobstatisticsTargets, which you could call a calendar table to join the Jobstatistics table on.
The calendar table looks like this
I have tried doing a left or right join so the missing intervals would get filled in and the Jobs would at least get NULL values, but the join clause doesn't do what I expect it to.
This is my attempt
SELECT a.[Station], a.[Interval], a.[Jobs], b.[28JPH], b.[35JPH]
FROM [JobStatistics] a
RIGHT JOIN [JobStatisticsTargets] b
ON CONVERT(VARCHAR(10),a.Interval,108) = b.Interval
WHERE DATEDIFF(DAY, a.Interval, GETDATE()) < 12
AND Station LIKE '138010'
ORDER BY a.Station, a.Interval
The LEFT JOIN does exactly the same as I would expect a normal join to do and it doesn't append any intervals with NULL values. (the query is just for one station and a few days so I could test easily)
Any help is much appreciated. I will check this topic regularly, so be sure to ask any questions regarding the context if you have any and I will try to explain it as well as I can!
EDIT
With some help the query now looks like this
SELECT a.[Station], b.[Interval], a.[Jobs], b.[28JPH], b.[35JPH]
FROM [JobStatistics] a
RIGHT JOIN [JobStatisticsTargets] b
ON CONVERT(VARCHAR(10),a.Interval,108) = b.Interval
AND CONVERT(VARCHAR(10),a.Interval,110) = CONVERT(VARCHAR(10),GETDATE(),110)
AND Station LIKE '138010'
ORDER BY b.Interval
I filter on today's date now because otherwise the extra rows aren't what I want them to be at all. The problem is that I don't know an easy way of filling in my stations. I suppose I need a subquery for those or is there another way?
The problem now as well is that I can't do this query for different stations. I would expect 24 rows for each station representing all the intervals, but I get this as a result:
Station Interval Jobs 28JPH 35JPH
NULL 00:30:00 NULL 0 0
NULL 01:30:00 NULL 0 0
NULL 02:30:00 NULL 0 0
NULL 03:30:00 NULL 0 0
134040 04:30:00 2 0 0
136060 04:30:00 2 0 0
131080 04:30:00 2 0 0
138010 05:30:00 2 0 0
NULL 06:30:00 NULL 0 0
NULL 07:30:00 NULL 28 35
NULL 08:30:00 NULL 28 35
...
You filter on a field from the table whose rows may be missing from the join result: AND Station LIKE '138010'
You should change your query and put this condition in the ON clause, not in WHERE.
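The difference between the two placements, and one way to get a full set of intervals per station, can be sketched as follows. Everything here is hypothetical: the table and column names are simplified stand-ins (targets for JobStatisticsTargets, jobs for JobStatistics), the data is invented, and the queries run in SQLite via Python's sqlite3 module using LEFT JOIN with the calendar table on the left, which is equivalent to the RIGHT JOIN in the question.

```python
import sqlite3

# Hypothetical miniature versions of the two tables.
TARGETS = [("04:30",), ("05:30",), ("06:30",)]
JOBS = [("138010", "05:30", 2), ("134040", "04:30", 2)]

# Filter in WHERE: rows without a match have station NULL and are dropped,
# so the outer join behaves like an inner join.
FILTER_IN_WHERE = """
SELECT t.interval, j.station, j.jobs
FROM targets AS t
LEFT JOIN jobs AS j ON j.interval = t.interval
WHERE j.station = '138010'
ORDER BY t.interval
"""

# Filter in ON: every interval survives; unmatched ones carry NULLs.
FILTER_IN_ON = """
SELECT t.interval, j.station, j.jobs
FROM targets AS t
LEFT JOIN jobs AS j ON j.interval = t.interval AND j.station = '138010'
ORDER BY t.interval
"""

# Build the full station x interval grid first, then attach the jobs.
FULL_GRID = """
SELECT s.station, t.interval, COALESCE(j.jobs, 0) AS jobs
FROM (SELECT DISTINCT station FROM jobs) AS s
CROSS JOIN targets AS t
LEFT JOIN jobs AS j ON j.station = s.station AND j.interval = t.interval
ORDER BY s.station, t.interval
"""


def run_queries():
    """Return the rows of the three queries, in the order defined above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE targets (interval TEXT)")
    conn.execute("CREATE TABLE jobs (station TEXT, interval TEXT, jobs INT)")
    conn.executemany("INSERT INTO targets VALUES (?)", TARGETS)
    conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)", JOBS)
    results = [conn.execute(q).fetchall() for q in (FILTER_IN_WHERE, FILTER_IN_ON, FULL_GRID)]
    conn.close()
    return results


if __name__ == "__main__":
    for rows in run_queries():
        print(rows)
```

The cross join is what supplies the station value for intervals that have no jobs, so no station column comes back as NULL.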
check this script and let me know,
declare @t table(interval datetime,jobs int)
insert into @t VALUES('2017-04-28 05:30',1),('2017-04-28 06:30',5),('2017-04-29 06:30',5)
--select * from @t
;With CTE as
(
select cast('00:00' as time) as IntervalTime
union ALL
select DATEADD(MINUTE,30,IntervalTime)
from cte
where IntervalTime<'23:30'
)
,CTE1 AS(
select interval,jobs
,dense_rank()over( order by cast(interval as date))rn
from @t
)
select * FROM
(
select distinct case when t.interval is null then
DATEADD(day, DATEDIFF(day, 0,
(select top 1 interval from cte1 where rn=n.number)), cast(c.IntervalTime as datetime))
else t.interval end Newinterval,isnull(t.jobs,0) Jobs
from CTE c
left join cte1 t
on c.IntervalTime=cast(t.interval as time)
cross apply(select number from master.dbo.spt_values
where name is null and number<=(select max(rn) from cte1))n
)t4
where Newinterval is not null

Resources