SQL: How to join two tables by their date ranges - sql-server

I have a table with a history of assigning Eployee Type to a Work item, like follows:
| WorkItemID | EmployeeTypeID | ValidFrom | ValidTo |
| 1 | 1 | 2017-03-01 12:19:20.000 | 2017-03-05 14:11:20.000 |
| 1 | 1 | 2017-03-10 17:00:20.000 | NULL |
| 1 | 2 | 2017-05-12 12:19:20.000 | 2017-05-29 14:11:20.000 |
| 1 | 2 | 2017-07-01 12:19:20.000 | NULL |
| 2 | 1 | 2017-01-01 15:19:20.000 | 2017-03-01 11:29:20.000 |
| 2 | 1 | 2017-04-03 16:19:20.000 | NULL |
NULL means that there's no End date for the last assignment and it is still valid.
I also have a table with a history of assigning Eployee Type to an Employee:
| EmployeeID | EmployeeTypeID | ValidFrom | ValidTo |
| 1 | 1 | 2017-01-01 12:19:20.000 | 2017-03-05 14:11:20.000 |
| 1 | 2 | 2017-03-05 14:11:20.000 | NULL |
| 2 | 1 | 2016-05-05 15:19:20.000 | 2017-03-01 11:29:20.000 |
| 2 | 2 | 2017-03-01 11:29:20.000 | NULL |
For a given EmployeeID and WorkItemID, I need to select a minimum date within these date ranges where their EmployeeTypeID matched (if there is any).
For example, for EmployeeID = 1 And WorkItemID = 1 the minimum date when their Employeetypes matched is 2017-03-01 (disregard the time part).
How do I write an SQL query to join these two tables correctly and select the desired date?

The following way appeared to be correct for me:
Firstly, I select Min Date from table 1 that match with table 2 by date ranges and they should overlap as well:
DECLARE #MinDate1 datetime
DECLARE #MinDate2 datetime
SELECT #MinDate1 =
(SELECT MIN(t1.ValidFrom)
FROM Table1 t1
JOIN Table2 t2 ON t1.EmployeeTypeID = t2.EmployeeTypeID
WHERE t1.WorkItemID = 1 AND t2.EmployeeID = 1
AND (t1.ValidFrom <= t2.ValidTo OR t2.ValidTo IS NULL)
AND (t1.ValidTo >= t2.ValidFrom OR t1.ValidTo IS NULL))
Then I select Min Date from table 2 that match with table 1 by date ranges and they should overlap as well:
SELECT #MinDate2 =
(SELECT MIN(t2.ValidFrom)
FROM Table1 t1
JOIN Table2 t2 ON t1.EmployeeTypeID = t2.EmployeeTypeID
WHERE t1.WorkItemID = 1 AND t2.EmployeeID = 1
AND (t1.ValidFrom <= t2.ValidTo OR t2.ValidTo IS NULL)
AND (t1.ValidTo >= t2.ValidFrom OR t1.ValidTo IS NULL))
And finaly, I select the max date of two which would be the min date when the two ranges actually overlap and have the same EmployeeTypeID
SELECT CASE WHEN #MinDate1 > #MinDate2 THEN #MinDate1 ELSE #MinDate2 END AS MinOverlapDate
The output would be:
| MinOverlapDate |
| 2017-03-01 12:19:20.000 |

So it should be something like this:
SELECT MIN(Date)
FROM table1 t1
JOIN table2 t2 ON t1.EmployeeTypeID = t2.EmployeeTypeID
WHERE t1.EmployeeID = givenValue AND t2.WorkitemID = givenValue
But again if you dont know from which table the result goes you cant write a query for that.
What you should do is do at least 3 tables or maybe more
Would contain Employee informations
Items jobs dates whatever is connected to WORK
Some connection between them (Emp 1 has Work 2) (Emp 2 has Work 4) and so on
You CANNOT have same values in two tables without knowing from which one you want to get tha data!
OR .. You can do it into one table.
Columns: WorkItem | EmployeeID | EmployeeType | Date | Date

Actually, my variant still does not work correctly. The #MinDate1 and #MinDate2 should be compared by each EmployeeTypeID one by one. There it was compared independently.
Here is correct variant of solving this problem:
SELECT MIN(CASE WHEN t1.ValidFrom > t2.ValidFrom THEN t1.ValidFrom ELSE t2.ValidFrom END) AS MinOverlapDate
FROM Table1 t1
JOIN Table2 t2 ON t1.EmployeeTypeID = t2.EmployeeTypeID
WHERE t1.WorkItemID = 1 AND t2.EmployeeID = 1
AND (t1.ValidFrom <= t2.ValidTo OR t2.ValidTo IS NULL)
AND (t1.ValidTo >= t2.ValidFrom OR t1.ValidTo IS NULL)

Don't use >=, <=, = or between when comparing datetime fields. Since all of the mention operator would check against time as well. You would want to use datediff to check against the smallest interval according to your needs
select
Min_Overlap_Per_Section = (select MAX(ValidFrom)
FROM (VALUES (t1.ValidFrom), (t2.ValidFrom)) as ValidFrom(ValidFrom))
, Section_From = (select MAX(ValidFrom)
FROM (VALUES (t1.ValidFrom), (t2.ValidFrom)) as ValidFrom(ValidFrom))
, Section_To = (select MIN(ValidTo)
FROM (VALUES (t1.ValidTo), (t2.ValidTo)) as ValidTo(ValidTo))
from Table1
JOIN Table2 t2 ON t1.EmployeeTypeID = t2.EmployeeTypeID
where (
datediff(day, t1.ValidFrom, t2.ValidTo) >= 0
or t2.ValidTo IS NULL
)
and (
datediff(day, t2.ValidFrom, t1.ValidTo) >= 0
or t1.ValidTo IS NULL
)

Related

Join created table under condition

I am creating a code to join two different tables under a certain condition. The tables look like this
(TABLE 2)
date | deal_code | originator | servicer | random |
-----------------------------------------------------
2011 | 001 | commerzbank | SPV1 | 1 |
2012 | 001 | commerzbank | SPV1 | 12 |
2013 | 001 | commerzbank | SPV1 | 7 |
2013 | 005 | unicredit | SPV2 | 7 |
and another table
(TABLE 1)
date | deal_code | amount |
---------------------------
2011 | 001 | 100 |
2012 | 001 | 100 |
2013 | 001 | 100 |
2013 | 005 | 200 |
I would like to have this as the final result
date | deal_code | amount | originator | servicer | random |
--------------------------------------------------------------
2013 | 001 | 100 | commerzbank | SPV1 | 7 |
2013 | 005 | 200 | unicredit | SPV2 | 7 |
I created the following code
select q1.deal_code, q1.date
from table1 q1
where q1.date = (SELECT MAX(t4.date)
FROM table1 t4
WHERE t4.deal_code = q1.deal_code)
that gives me:
(TABLE 3)
date | deal_code | amount |
---------------------------
2013 | 001 | 100 |
2013 | 005 | 200 |
That is the latest observation for table 1, now I would like to have the originator and servicer information given the deal_code and date. Any suggestion? I hope to have been clear enough. Thanks.
This should do what you are looking for. Please be careful when naming columns. Date is a reserved word and is too ambiguous to be a good name for a column.
declare #Something table
(
SomeDate int
, deal_code char(3)
, originator varchar(20)
, servicer char(4)
, random int
)
insert #Something values
(2011, '001', 'commerzbank', 'SPV1', 1)
, (2012, '001', 'commerzbank', 'SPV1', 12)
, (2013, '001', 'commerzbank', 'SPV1', 7)
, (2013, '005', 'unicredit ', 'SPV2', 7)
declare #SomethingElse table
(
SomeDate int
, deal_code char(3)
, amount int
)
insert #SomethingElse values
(2011, '001', '100')
, (2012, '001', '100')
, (2013, '001', '100')
, (2013, '005', '200')
select x.SomeDate
, x.deal_code
, x.originator
, x.servicer
, x.random
, x.amount
from
(
select s.SomeDate
, s.deal_code
, s.originator
, s.servicer
, s.random
, se.amount
, RowNum = ROW_NUMBER()over(partition by s.deal_code order by s.SomeDate desc)
from #Something s
join #SomethingElse se on se.SomeDate = s.SomeDate and se.deal_code = s.deal_code
) x
where x.RowNum = 1
Looks like this would work:
DECLARE #MaxYear INT;
SELECT #MaxYear = MAX(date)
FROM table1 AS t1
INNER JOIN table2 AS t2
ON t1.deal_code = t2.deal_code;
SELECT t1.date,
t1.deal_code,
t1.amount,
t2.originator,
t2.servicer,
t2.random
FROM table1 AS t1
INNER JOIN table2 AS t2
ON t1.date = #MaxYear
AND t1.deal_code = t2.deal_code;
I agree with Sean Lange about the date column name. His method gets around the dependency on the correlated sub-query, but at the heart of things, you really just need to add an INNER JOIN to your existing query in order to get the amount column into your result set.
select
q2.date,
q2.deal_code,
q1.amount,
q2.originator,
q2.servicer,
q2.random
from
table1 q1
join
table2 q2
on q1.date = q2.date
and q1.deal_code = q2.deal_code
where q1.date = (SELECT MAX(t4.date)
FROM table1 t4
WHERE t4.deal_code = q1.deal_code)

Comparing records within the same table in SQL Server 2014

I have a table that shows the entry and exit of items into the warehouse. The Camera 1 and Camera 2 document the entry time and exit time respectively of that item. The cameras then classify the item as it enters and leaves the checkpoint with the help of lasers. Eg: Big box: Class 5, Medium Box: Class 3, Small Box: Class 2.
Sometimes, the cameras classification doesn't match each other. Eg: Classification at entry can be Medium box and on exit can be Small box.
I need to find the number of transactions where the class didn't match for the same TransactionDetail and then a percentage of those class mismatches against all the transaction for a certain time range.
My table looks somewhat like this:
---------------------------------------------------------------------------
| AVDetailID | TransDetailID | AVClassID | CamID | CreatedDate |
---------------------------------------------------------------------------
| 20101522 | 54125478 | 5 | 1 | 2017-05-08 10:15:01:560|
| 20101523 | 54125478 | 5 | 2 | 2017-05-08 10:15:01:620|
| 20101524 | 54125479 | 3 | 1 | 2017-05-08 10:15:03:120|
| 20101525 | 54125479 | 2 | 2 | 2017-05-08 10:15:03:860|
| 20101526 | 54125480 | 4 | 1 | 2017-05-08 10:15:06:330|
| 20101527 | 54125480 | 4 | 2 | 2017-05-08 10:15:06:850|
---------------------------------------------------------------------------
So, in the above case the class changes from 3 to 2 in record 3 and 4. That is one transaction where the class changed. I need to get a percentage of all transactions that where the class changed between each cameras.
The code I tried so far, unsuccessfully is:
SELECT
COUNT(TransDetailId)
FROM
[AVTransDetail]
WHERE
((SELECT AVCClassId WHERE CamId = 1) <> (SELECT AVCClassId WHERE DetectionZoneId = 2))
AND CreatedDate >= '2017-04-01'
AND CreatedDAte <= '2017-04-07'
GROUP BY
TransDetailId
You can try to join the table on itself like this:
SELECT tdBefore.TransDetailId
FROM AVTransDetail AS tdBefore
INNER JOIN AVTransDetail AS tdAfter
ON tdBefore.TransDetailID = tdAfter.TransDetailID
AND tdBefore.CamID = 1
AND tdAfter.CamID = 2
WHERE tdBefore.AVClassID <> tdAfter.AVClassID
AND tdBefore.CreatedDate >= '2017-04-01'
AND tdAfter.CreatedDate <= '2017-04-07'
Then to get the percentage:
DECLARE #MinDate DATE = '20170401',
#MaxDate DATE = '20170407';
SELECT tdBefore.TransDetailId,
COUNT(tdAfter.TransDetailID) OVER() AS NumDifferent,
((CONVERT(DECIMAL(3, 2), COUNT(tdAfter.TransDetailID) OVER())) / allRecords.Count) * 100 AS DiffPercent,
FROM AVTransDetail AS tdBefore
INNER JOIN AVTransDetail AS tdAfter
ON tdBefore.TransDetailID = tdAfter.TransDetailID
AND tdBefore.CamID = 1
AND tdBefore.CamID = 2
CROSS APPLY
(
SELECT COUNT(*) AS [Count]
FROM AVTransDetail
WHERE tdBefore.CreatedDate >= #MinDate
AND tdAfter.CreatedDate <= #MaxDate
) AS allRecords
WHERE tdBefore.AVClassID <> tdAfter.AVClassID
AND tdBefore.CreatedDate >= #MinDate
AND tdAfter.CreatedDate <= #MaxDate

Update All other Records Based on a single record

I have a table with a million records. I need to update some columns which are null based on the existing 'not null' records of a particular id based columns. I've tried with one query, it seems to be working fine but I don't have confidence in it that it will be able to update all those 1 million records exactly the way I need. I'm providing you some sample data how my table looks like.Any help will be appreciated
SELECT * INTO #TEST FROM (
SELECT 1 AS EMP_ID,10 AS DEPT_ID,15 AS ITEM_NBR ,NULL AS AMOUNT,NULL AS ITEM_NME
UNION ALL
SELECT 1,20,16,500,'ABCD'
UNION ALL
SELECT 1,30,17,NULL,NULL
UNION ALL
SELECT 2,10,15,1000,'XYZ'
UNION ALL
SELECT 2,30,16,NULL,NULL
UNION ALL
SELECT 2,40,17,NULL,NULL
) AS A
Sample data:
+--------+---------+----------+--------+----------+
| EMP_ID | DEPT_ID | ITEM_NBR | AMOUNT | ITEM_NME |
+--------+---------+----------+--------+----------+
| 1 | 10 | 15 | NULL | NULL |
| 1 | 20 | 16 | 500 | ABCD |
| 1 | 30 | 17 | NULL | NULL |
| 2 | 10 | 15 | 1000 | XYZ |
| 2 | 30 | 16 | NULL | NULL |
| 2 | 40 | 17 | NULL | NULL |
+--------+---------+----------+--------+----------+
Expected result:
+--------+---------+----------+--------+----------+
| EMP_ID | DEPT_ID | ITEM_NBR | AMOUNT | ITEM_NME |
+--------+---------+----------+--------+----------+
| 1 | 10 | 15 | 500 | ABCD |
| 1 | 20 | 16 | 500 | ABCD |
| 1 | 30 | 17 | 500 | ABCD |
| 2 | 10 | 15 | 1000 | XYZ |
| 2 | 30 | 16 | 1000 | XYZ |
| 2 | 40 | 17 | 1000 | XYZ |
+--------+---------+----------+--------+----------+
I tried this but I'm unable to conclude whether it is updating all the 1 million records properly.
SELECT * FROM #TEST T
inner JOIN #TEST T1 ON T1.EMP_ID=T.EMP_ID
WHERE T1.AMOUNT IS NOT NULL
UPDATE T SET AMOUNT=T1.AMOUNT
FROM #TEST T
inner JOIN #TEST T1 ON T1.EMP_ID=T.EMP_ID
WHERE T1.AMOUNT IS not NULL
I have used UPDATE using inner join
UPDATE T
SET T.AMOUNT = X.AMT,T.ITEM_NME=X.I_N
FROM #TEST T
JOIN
(SELECT EMP_ID,MAX(AMOUNT) AS AMT,MAX(ITEM_NME) AS I_N
FROM #TEST
GROUP BY EMP_ID) X ON X.EMP_ID = T.EMP_ID
SELECT * into #Test1
FROM #TEST
WHERE AMOUNT IS NOT NULL
For records validation run this query first
SELECT T.AMOUNT, T1.AMOUNT, T1.EMP_ID,T1.EMP_ID
FROM #TEST T
inner JOIN #TEST1 T1 ON T1.EMP_ID=T.EMP_ID
WHERE T.AMOUNT IS NULL
Begin Trans
UPDATE T
SET T.AMOUNT=T1.AMOUNT, T.ITEM_NME= = T1.ITEM_NME
FROM #TEST T
inner JOIN #TEST1 T1 ON T1.EMP_ID=T.EMP_ID
WHERE T.AMOUNT IS NULL
rollback
SELECT EMP_ID,MAX(AMOUNT) as AMOUNT MAX(ITEM_NAME) as ITEM_NAME
INTO #t
FROM #TEST
GROUP BY EMP_ID
UPDATE t SET t.AMOUNT = t1.AMOUNT, t.ITEM_NAME = t1.ITEM_NAME
FROM #TEST t INNER JOIN #t t1
ON t.emp_id = t1.emp_id
WHERE t.AMOUNT IS NULL and t.ITEM_NAME IS NULL
Use MAX aggregate function to get amount and item name for each employee and then replace null values of amount and item name with those values. For validation use COUNT function to calculate the number of rows with values of amount and item name as null. If the number of rows is zero then table is updated correctly

calculate days with gaps

table:
+-----------+--------------+------------+------------+
| RequestID | RequestStaus | StartDate | EndDate |
+-----------+--------------+------------+------------+
| 1 | pending | 9/1/2015 | 10/2/2015 |
| 1 | in progress | 10/2/2015 | 10/20/2015 |
| 1 | completed | 10/20/2015 | 11/3/2015 |
| 1 | reopened | 11/3/2015 | null |
| 2 | pending | 9/5/2015 | 9/7/2015 |
| 2 | in progress | 9/7/2015 | 9/25/2015 |
| 2 | completed | 9/25/2015 | 10/7/2015 |
| 2 | reopened | 10/10/2015 | 10/16/2015 |
| 2 | completed | 10/16/2015 | null |
+-----------+--------------+------------+------------+
I would like to calculate the days opened but exclude the days between completed and reopened. For example, RequestID 1, the days opened will be (11/3/2015 - 9/1/2015) + (GetDate() - 11/3/2015), for request 2, the total days will be (10/7/2015 - 9/5/2015) + ( 10/16/2015 - 10/10/2015).
The result I want will be something like:
+-----------+-------------------------------+
| RequestID | DaysOpened |
+-----------+-------------------------------+
| 1 | 63 + (getdate() - 11/3/2015) |
| 2 | 38 |
+-----------+-------------------------------+
How do I approach this problem? thank you!
Tested. Works well. :)
Note:
1) I suppose the required result = (FirstCompleteEndDate - PendingStartDate)+(Sum of all the Reopen duration)
2) So I used the self joins. Table b provides the exact completed record which immediately follows the in process record for each RequestID. Table c provides Sum of all the Reopen duration.
--create tbl structure
create table #test (RequestID int, RequestStatus varchar(20), StartDate date, EndDate date)
go
--insert sample data
insert #test
select 1,'pending','09/01/2015','10/2/2015'
union all
select 1,'in progress','10/2/2015','10/20/2015'
union all
select 1,'completed','10/20/2015','11/3/2015'
union all
select 1,'reopened','11/3/2015',null
union all
select 2,'pending','09/05/2015','9/7/2015'
union all
select 2,'in progress','09/07/2015','9/25/2015'
union all
select 2,'completed','9/25/2015','10/7/2015'
union all
select 2,'reopened','10/10/2015','10/16/2015'
union all
select 2, 'completed','10/16/2015','11/12/2015'
union all
select 2,'reopened','11/20/2015',null
select * from #test
--below is solution
select a.RequestID, a.Startdate as [PendingStartDate], b.enddate as [FirstCompleteEndDate], c.startdate as [LatestReopenStartDate],
datediff(day,a.startdate,b.enddate)+c.ReopenDays as [days] from #test a
join (
select *, row_number()over(partition by RequestID,RequestStatus order by StartDate) as rid from #test
) as b
on a.RequestID = b.RequestID
join (
select distinct RequestID, RequestStatus, max(StartDate)over(partition by RequestID,RequestStatus) as StartDate,
Sum(Case when enddate is null then datediff(day,startdate,getdate())
when enddate is not null then datediff(day,startdate,enddate)
end)over(partition by RequestID,RequestStatus) as [ReopenDays]
from #test
where RequestStatus = 'reopened'
) as c
on b.RequestID = c.RequestID
where a.RequestStatus ='pending' and b.RequestStatus = 'completed' and b.rid = 1
Result:

SQL statement - join based on date

I need to write a statement joining two tables based on dates.
Table 1 contains time recording entries.
+----+-----------+--------+---------------+
| ID | Date | UserID | DESC |
+----+-----------+--------+---------------+
| 1 | 1.10.2010 | 5 | did some work |
| 2 | 1.10.2011 | 5 | did more work |
| 3 | 1.10.2012 | 4 | me too |
| 4 | 1.11.2012 | 4 | me too |
+----+-----------+--------+---------------+
Table 2 contains the position of each user in the company. The ValidFrom date is the date at which the user has been or will be promoted.
+----+-----------+--------+------------+
| ID | ValidFrom | UserID | Pos |
+----+-----------+--------+------------+
| 1 | 1.10.2009 | 5 | PM |
| 2 | 1.5.2010 | 5 | Senior PM |
| 3 | 1.10.2010 | 4 | Consultant |
+----+-----------+--------+------------+
I need a query which outputs table one with one added column which is the position of the user at the time the entry has been made. (the Date column)
All date fileds are of type date.
I hope someone can help. I tried a lot but don't get it working.
Try this using a subselect in the where clause:
SQL Fiddle
MS SQL Server 2008 Schema Setup:
CREATE TABLE TimeRecord
(
ID INT,
[Date] Date,
UserID INT,
Description VARCHAR(50)
)
INSERT INTO TimeRecord
VALUES (1,'2010-01-10',5,'did some work'),
(2, '2011-01-10',5,'did more work'),
(3, '2012-01-10', 4, 'me too'),
(4, '2012-11-01',4,'me too')
CREATE TABLE UserPosition
(
ID Int,
ValidFrom Date,
UserId INT,
Pos VARCHAR(50)
)
INSERT INTO UserPosition
VALUES (1, '2009-01-10', 5, 'PM'),
(2, '2010-05-01', 5, 'Senior PM'),
(3, '2010-01-10', 4, 'Consultant ')
Query 1:
SELECT TR.ID,
TR.[Date],
TR.UserId,
TR.Description,
UP.Pos
FROM TimeRecord TR
INNER JOIN UserPosition UP
ON UP.UserId = TR.UserId
WHERE UP.ValidFrom = (SELECT MAX(ValidFrom)
FROM UserPosition UP2
WHERE UP2.UserId = UP.UserID AND
UP2.ValidFrom <= TR.[Date])
Results:
| ID | Date | UserId | Description | Pos |
|----|------------|--------|---------------|-------------|
| 1 | 2010-01-10 | 5 | did some work | PM |
| 2 | 2011-01-10 | 5 | did more work | Senior PM |
| 3 | 2012-01-10 | 4 | me too | Consultant |
| 4 | 2012-11-01 | 4 | me too | Consultant |
You can do it using OUTER APPLY:
SELECT ID, [Date], UserID, [DESC], x.Pos
FROM table1 AS t1
OUTER APPLY (
SELECT TOP 1 Pos
FROM table2 AS t2
WHERE t2.UserID = t1.UserID AND t2.ValidFrom <= t1.[Date]
ORDER BY t2.ValidFrom DESC) AS x(Pos)
For every row of table1 OUTER APPLY operation fetches all table2 rows of the same user that have a ValidFrom date that is older or the same as [Date]. These rows are sorted in descending order and the most recent of these is finally returned.
Note: If no match is found by the OUTER APPLY sub-query then a NULL value is returned, meaning that no valid position exists in table2 for the corresponding record in table1.
Demo here
This works by using a rank function and subquery. I tested it with some sample data.
select sub.ID,sub.Date,sub.UserID,sub.Description,sub.Position
from(
select rank() over(partition by t1.userID order by t2.validfrom desc)
as 'rank', t1.ID as'ID',t1.Date as'Date',t1.UserID as'UserID',t1.Descr
as'Description',t2.pos as'Position', t2.validfrom as 'validfrom'
from temployee t1 inner join jobs t2 on -- replace join tables with your own table names
t1.UserID=t2.UserID
) as sub
where rank=1
This query would work
select t1.*,t2.pos from Table1 t1 left outer join Table2 t2 on
t1.Date=t2.Date and t1.UserID=t2.UserID

Resources