Select a grouped table (by Id) filtered by a datetime column according to NULL and MAX(date) values - sql-server

Imagine that I have a table with quite a few columns, but the result has to be filtered just by Id and EndDate.
Id  EndDate           ...
1   NULL
1   01.01.2022 15:25
1   01.01.2022 15:24
2   15.01.2022 10:00
2   15.01.2022 11:00
2   17.01.2022 00:00
3   NULL
3   10.10.2022 22:12
4   18.05.2022 17:15
4   18.05.2022 17:17
4   19.05.2022 00:00
The resulting table must be the following:
Id  EndDate           ...
1   NULL
2   17.01.2022 00:00
3   NULL
4   19.05.2022 00:00
For each Id, the record to pick is the one with a NULL EndDate if there is one, otherwise the one with the MAX EndDate. As the resulting table shows, Id = 1 has a record with a NULL EndDate, so that record must be picked; Id = 4 has no NULL EndDate, so the record with MAX(EndDate) must be returned.
I have tried different approaches with joins and UNIONs, but it seems hopeless. I also considered CTEs, but they didn't seem relevant. The solution should also be efficient, because the resulting table will be joined to another table.
If anyone has even an idea of how to get the desired result, I would appreciate it.

You can use ROW_NUMBER in a common table expression to define the priority. Just replace the NULL with a date far in the future like 9999-12-31, then order by the date in descending order so that the NULL row (if any) comes first within each Id.
WITH cte
AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY ISNULL(EndDate,'99991231') DESC) AS RN
FROM dbo.myTable
)
SELECT *
FROM cte
WHERE cte.RN = 1;
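To try the idea without a SQL Server instance, here is a minimal sketch that runs the same pattern against an in-memory SQLite database through Python's sqlite3 module. It is an assumption-laden stand-in, not the original setup: dates are stored as ISO text, COALESCE replaces ISNULL, the function name latest_or_open is made up for illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO text (assumption).
SAMPLE = [
    (1, None), (1, "2022-01-01 15:25"), (1, "2022-01-01 15:24"),
    (2, "2022-01-15 10:00"), (2, "2022-01-15 11:00"), (2, "2022-01-17 00:00"),
    (3, None), (3, "2022-10-10 22:12"),
    (4, "2022-05-18 17:15"), (4, "2022-05-18 17:17"), (4, "2022-05-19 00:00"),
]


def latest_or_open(rows):
    """Return one (Id, EndDate) per Id: the NULL row if present, else the latest."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (Id INTEGER, EndDate TEXT)")
    conn.executemany("INSERT INTO myTable VALUES (?, ?)", rows)
    query = """
        WITH cte AS (
            SELECT Id, EndDate,
                   ROW_NUMBER() OVER (
                       PARTITION BY Id
                       -- NULL becomes a far-future date, so it sorts first
                       ORDER BY COALESCE(EndDate, '9999-12-31') DESC
                   ) AS RN
            FROM myTable
        )
        SELECT Id, EndDate FROM cte WHERE RN = 1 ORDER BY Id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(latest_or_open(SAMPLE))
```

Because the CTE selects whole rows before filtering on RN = 1, any additional columns of the picked record can be added to the SELECT list.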

With simple aggregation and a CASE expression where you check if there are any null dates for each Id:
SELECT Id,
CASE WHEN COUNT(*) = COUNT(EndDate) THEN MAX(EndDate) END AS EndDate
FROM tablename
GROUP BY Id;
The condition COUNT(*) = COUNT(EndDate) is satisfied only if all dates are not null.
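The two counts can be inspected side by side. The sketch below is only an illustration using SQLite via Python's sqlite3 (not SQL Server), with ISO text dates and a hypothetical helper name end_date_per_id; it returns both counts next to the resulting EndDate.

```python
import sqlite3


def end_date_per_id(rows):
    """Per Id: total rows, non-NULL EndDate rows, and the chosen EndDate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (Id INTEGER, EndDate TEXT)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?)", rows)
    query = """
        SELECT Id,
               COUNT(*)       AS all_rows,       -- counts every row
               COUNT(EndDate) AS non_null_rows,  -- skips NULL EndDate values
               CASE WHEN COUNT(*) = COUNT(EndDate) THEN MAX(EndDate) END AS EndDate
        FROM tablename
        GROUP BY Id
        ORDER BY Id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


rows = [
    (1, None), (1, "2022-01-01 15:25"), (1, "2022-01-01 15:24"),
    (2, "2022-01-15 10:00"), (2, "2022-01-15 11:00"), (2, "2022-01-17 00:00"),
]
print(end_date_per_id(rows))
```

For Id 1 the counts differ (3 vs 2), so the CASE falls through to NULL; for Id 2 they match and MAX(EndDate) is returned. Note that this form returns only the grouped and aggregated columns, so other columns of the table would need a join back or the ROW_NUMBER approach.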

Related

How to return results from 2 SQL Server tables where one column in common

I've been reading for about 2 hours this afternoon and trying different things to get the results that I need but so far have failed.
Table: Schedule
ScheduleID NOT NULL
EmployeeID NOT NULL
ItemDate NOT NULL
Table: Holiday
HolidayID NOT NULL
EmployeeID NOT NULL
ItemDate NOT NULL
I want to return a result set that has all of the Schedule dates and all of the Holiday dates for a given EmployeeID
Sample data:
Schedule:
ScheduleID EmployeeID ItemDate
------------------------------------
1 1 1/1/2021
2 1 3/1/2021
Holiday:
HolidayID EmployeeID ItemDate
-----------------------------------
1 1 2/1/2021
Should return the following result set
ScheduleID 1 EmployeeID 1 ItemDate 1/1/2021
HolidayID 1 EmployeeID 1 ItemDate 2/1/2021
ScheduleID 2 EmployeeID 1 ItemDate 3/1/2021
I have tried all sorts of joins, inner, outer, right, left but I can't seem to find any scenario that works for what I want.
I'm happy to have NULL values for any of the columns in the returned result set as I can handle this in the code.
The closest I've got is this but I need to have the HolidayID (even if NULL) and/or the ScheduleID (even if NULL) in the results.
SELECT ScheduleID, HolidayID, EmployeeID, ItemDate
FROM Schedule
FULL OUTER JOIN Holiday ON Holiday.EmployeeID = Schedule.EmployeeID
WHERE EmployeeID = 1
ORDER BY ItemDate
Thanks
A simple way to do this is with a UNION operator: https://www.w3schools.com/sql/sql_union.asp
A union appends the results of multiple SELECT statements into one result set. The SELECTs must return the same number of columns, in the same order, with compatible data types. Note that UNION removes duplicate rows; use UNION ALL to keep them. I am putting the union into a WITH clause (a CTE), which lets you filter for a specific employee ID in one place. Without it you would need a WHERE clause in each SELECT of the union.
WITH Dates AS (
SELECT ScheduleID, EmployeeID, ItemDate
FROM Schedule
UNION
SELECT HolidayID, EmployeeID, ItemDate
FROM Holiday
)
SELECT *
FROM Dates
WHERE EmployeeID = 1
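Since the question asks for both ScheduleID and HolidayID in the output (NULL where not applicable), one possible variation is to pad each SELECT with a NULL placeholder and use UNION ALL. This is a sketch of that variation rather than the answer's exact query; it runs on SQLite through Python's sqlite3, and the dates are assumed to be month/day/year and stored as ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Schedule (ScheduleID INTEGER, EmployeeID INTEGER, ItemDate TEXT);
CREATE TABLE Holiday  (HolidayID  INTEGER, EmployeeID INTEGER, ItemDate TEXT);
INSERT INTO Schedule VALUES (1, 1, '2021-01-01'), (2, 1, '2021-03-01');
INSERT INTO Holiday  VALUES (1, 1, '2021-02-01');
""")

query = """
WITH Dates AS (
    -- Each branch fills the other table's ID column with NULL
    SELECT ScheduleID, NULL AS HolidayID, EmployeeID, ItemDate FROM Schedule
    UNION ALL
    SELECT NULL, HolidayID, EmployeeID, ItemDate FROM Holiday
)
SELECT ScheduleID, HolidayID, EmployeeID, ItemDate
FROM Dates
WHERE EmployeeID = ?
ORDER BY ItemDate
"""
combined = conn.execute(query, (1,)).fetchall()
for row in combined:
    print(row)
```

With the placeholder columns, a row's source is visible from which ID is non-NULL, and UNION ALL avoids collapsing a schedule row and a holiday row that happen to share the same ID, employee, and date.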

Getting the Min(startdate) and Max(enddate) for an ID when that ID shows up multiple times

I have a table with columns for ID, StartDate, EndDate, and a flag for whether there was a gap between the EndDate of that row and the next StartDate. If there was only one continuous block for each ID, I know that I could just do
SELECT min(startdate),max(enddate)
FROM table
GROUP BY ID
However, I have multiple instances of these IDs in several non-connected timespans. So if I were to do that, I would get the very first start date and the last end date from a different block of time for that ID. How would I go about making sure I get the min and max dates for each specific block of time?
I thought about potentially creating a new column where it would have a number for each set of time. So for the first set of time that has no gaps, it would have 1, then when the next row has a gap it will add +1 corresponding to a new set of time. but I am not really sure how to go about that. Here is some sample data to illustrate what I am working with:
ID StartDate EndDate NextDate Gap_ind
001 1/1/2018 1/31/2018 2/1/2018 N
001 2/1/2018 2/30/2018 3/1/2018 N
001 3/1/2018 3/31/2018 5/1/2018 Y
001 5/1/2018 5/31/2018 6/1/2018 N
001 6/1/2018 6/30/2018 6/30/2018 N
This is a classic "gaps and islands" problem, where you are trying to define the boundaries of your islands, and which you can solve by using some windowing functions.
Your initial effort is on track. Rather than getting the next start date, though, I used the previous end date to calculate the groupings.
The innermost subquery below gets the previous end date for each of your date ranges, and also assigns a row number that we use later to keep our groupings in order.
The next subquery out uses the previous end date to identify which groups of date ranges go together (overlap, or nearly so).
The outermost query is the end result you're looking for.
SELECT
Grp.ID,
MIN(Grp.StartDate) AS GroupingStartDate,
MAX(Grp.EndDate) AS GroupingEndDate
FROM
(
SELECT
PrevDt.ID,
PrevDt.StartDate,
PrevDt.EndDate,
SUM(CASE WHEN DATEADD(DAY,1,PrevDt.PreviousEndDate) >= PrevDt.StartDate THEN 0 ELSE 1 END)
OVER (PARTITION BY PrevDt.ID ORDER BY PrevDt.RN) AS GrpNum
FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY StartDate, EndDate) as RN,
ID,
StartDate,
EndDate,
LAG(EndDate,1) OVER (PARTITION BY ID ORDER BY StartDate) AS PreviousEndDate
FROM
tbl
) AS PrevDt
) AS Grp
GROUP BY
Grp.ID,
Grp.GrpNum;
Results:
+-----+-------------------+-----------------+
| ID  | GroupingStartDate | GroupingEndDate |
+-----+-------------------+-----------------+
| 001 | 2018-01-01        | 2018-03-01      |
| 001 | 2018-05-01        | 2018-06-01      |
+-----+-------------------+-----------------+
Further reading:
The SQL of Gaps and Islands in Sequences
Gaps and Islands Across Date Ranges
This is an example of a gaps-and-islands problem. A simple solution is to use lag() to determine if there are overlaps. When there is none, you have the start of a group. A cumulative sum defines the group -- and you aggregate on that.
select t.id, min(startdate), max(enddate)
from (select t.*,
sum(case when prev_enddate >= dateadd(day, -1, startdate)
then 0 else 1
end) over (partition by id order by startdate) as grp
from (select t.*, lag(enddate) over (partition by id order by startdate) as prev_enddate
from t
) t
) t
group by id, grp;
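To see the lag-plus-running-sum idea work end to end, here is a rough sketch on SQLite via Python's sqlite3 (window functions need SQLite 3.25+). Several things are assumptions for the sake of a runnable example: dates are ISO text, SQLite's date(x, '+1 day') stands in for DATEADD, the helper name islands is invented, and the question's invalid 2/30/2018 is replaced by 2018-02-28.

```python
import sqlite3

# Question's sample ranges as ISO text; 2/30/2018 replaced by 2018-02-28 (assumption).
RANGES = [
    ("001", "2018-01-01", "2018-01-31"),
    ("001", "2018-02-01", "2018-02-28"),
    ("001", "2018-03-01", "2018-03-31"),
    ("001", "2018-05-01", "2018-05-31"),
    ("001", "2018-06-01", "2018-06-30"),
]


def islands(rows):
    """Collapse adjacent or overlapping date ranges per ID into islands."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (ID TEXT, StartDate TEXT, EndDate TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT ID, MIN(StartDate), MAX(EndDate)
        FROM (
            SELECT ID, StartDate, EndDate,
                   -- 1 marks the start of a new island; the running sum numbers them
                   SUM(CASE WHEN date(PrevEnd, '+1 day') >= StartDate THEN 0 ELSE 1 END)
                       OVER (PARTITION BY ID ORDER BY StartDate) AS Grp
            FROM (
                SELECT ID, StartDate, EndDate,
                       LAG(EndDate) OVER (PARTITION BY ID ORDER BY StartDate) AS PrevEnd
                FROM t
            ) AS prev
        ) AS grouped
        GROUP BY ID, Grp
        ORDER BY ID, MIN(StartDate)
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(islands(RANGES))
```

The first row of each ID has no previous end date, so the comparison is NULL and the CASE falls to ELSE 1, which opens the first island. Under the assumptions above, the April gap splits the data into a January to March island and a May to June island.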

Choosing distinct ID with differing column values

Let's say I have this query:
SELECT id, date, amount, cancelled
FROM transactions
Which gives me the following results:
id date amount cancelled
1 01/2019 25.10 0
1 02/2019 19.55 1
1 06/2019 20.33 0
2 10/2019 11.00 0
If there are duplicate IDs, how can I get the one with the latest date? So it would look like this:
id date amount cancelled
1 06/2019 20.33 0
2 10/2019 11.00 0
One method is with ROW_NUMBER and a common table expression like this example. In a multi-statement batch, be mindful to terminate the preceding statement with a semi-colon to avoid parsing errors.
WITH data_with_date_sequence AS (
SELECT
id
, date
, amount
, cancelled
, ROW_NUMBER() OVER(PARTITION BY id ORDER BY date DESC) AS seq
FROM dbo.SomeTable
)
SELECT
id
, date
, amount
, cancelled
FROM data_with_date_sequence
WHERE seq = 1;
One option could be to use ROW_NUMBER function, which will group rows by id and order them by date within same id.
;WITH max_dates AS (
SELECT id
, date
, amount
, cancelled
, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS Position
FROM transactions
)
SELECT * FROM max_dates WHERE Position = 1

GROUP BY on a column to remove nulls

I have the following table
Order_ID Loc_ID OrderDate ShippingDate DeliveryDate
10 2 10/12/2018 null null
10 2 null null 18/12/2018
10 2 null 13/12/2018 null
Basically, every time a date is recorded, it is added as a row. I want the table to look like this:
Order_ID Loc_ID OrderDate ShippingDate DeliveryDate
10 2 10/12/2018 13/12/2018 18/12/2018
Can someone tell me how I should do this?
Use MAX:
SELECT Order_ID,
Loc_ID,
MAX(OrderDate) AS OrderDate,
MAX(ShippingDate) AS ShippingDate,
MAX(DeliveryDate) AS DeliveryDate
FROM dbo.YourTable
GROUP BY Order_ID,
Loc_ID;
Aggregate functions such as MAX ignore NULL values, so within each group MAX returns the single non-NULL date in every column. (When ordering data in SQL Server, NULL also sorts as the lowest value, so any non-NULL value counts as "greater".)
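A quick way to check this behaviour is a small sketch on SQLite through Python's sqlite3 (used here only as a convenient stand-in for SQL Server). The dates are assumed to be day/month/year and are stored as ISO text; the helper name collapse_orders is hypothetical.

```python
import sqlite3


def collapse_orders(rows):
    """Collapse one-date-per-row records into a single row per order and location."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE YourTable (Order_ID INTEGER, Loc_ID INTEGER, "
        "OrderDate TEXT, ShippingDate TEXT, DeliveryDate TEXT)"
    )
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT Order_ID, Loc_ID,
               MAX(OrderDate)    AS OrderDate,     -- NULLs are skipped by MAX
               MAX(ShippingDate) AS ShippingDate,
               MAX(DeliveryDate) AS DeliveryDate
        FROM YourTable
        GROUP BY Order_ID, Loc_ID
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


orders = [
    (10, 2, "2018-12-10", None, None),
    (10, 2, None, None, "2018-12-18"),
    (10, 2, None, "2018-12-13", None),
]
print(collapse_orders(orders))
```

If a column is NULL in every row of a group, MAX returns NULL for that column.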
A simple aggregation should do the trick
Example
Select Order_ID
,Loc_ID
,OrderDate = max(OrderDate)
,ShippingDate = max(ShippingDate)
,DeliveryDate = max(DeliveryDate)
From YourTable
Group By Order_ID,Loc_ID

SQL Server CTE Use Previous Computed Date as Next Start Date

I have a table that holds tasks. Each task has an allotted number of hours that it's supposed to take to complete the task.
I'm storing the data in a table, like so:
create table #fromtable (recordid int identity(1,1), orderdate date, deptid int, task varchar(500), estimatedhours int);
I also have a function that calculates the completion date of the task, based on the start date, estimated hours, and department, and some other math that computes headcount, hours available to work, etc.
dbo.fn_getCapEndDate(aStartDate,estimatedHours,deptID)
I need to generate the start and end date for each record in #fromtable. The first record will start with column orderdate as the start date for the computation, then each subsequent record will use the previous record's computedEndDate as their start date.
What I'm trying to achieve:
Here's what I have started with:
with MyCTE as
(
select mt.recordID, mt.deptID, mt.estimatedhours, mt.JobNumber, ROW_NUMBER() over (order by recordID) as RowNum,
convert(date,mt.orderdate) as computedStart,
case when mt.recordID = 1 then convert(date,dbo.fn_getCapEndDate(mt.orderdate,mt.estimatedhours,mt.deptid)) end as computedEnd
from #fromtable mt
)
select c1.*, c2.recordID,
case when c2.recordid is null then c1.computedStart else c2.computedEnd end as StartDate,
case when c2.recordid is null then c1.computedEnd else dbo.fn_getCapEndDate(c2.computedEnd,c1.estimatedhours,c1.deptid) end as computedEnd
from MyCTE c1
left join MyCTE c2 on c1.RowNum = c2.RowNum + 1;
With this, the first two rows have the correct start/end dates. Every row after that computes NULL for its start and end values. It "loses" the value of the previous row's computed end date.
What can I do to fix the issue and return the values as needed?
EDIT: Sample data in text format:
estimatedhours OrderDate
0 1/1/2017
0 1/1/2017
0 1/1/2017
0 1/1/2017
500 1/1/2017
32 1/1/2017
0 1/1/2017
0 1/1/2017
320 1/1/2017
0 1/1/2017
5 1/1/2017
0 1/1/2017
4 1/1/2017
You can use lead as below:
select RecordId, EstimatedHours, StartDate,
ComputedEnd = LEAD(StartDate) over (order by RecordId)
From yourTable
