Right join isn't pulling in all rows from right table - sql-server

I've got two tables for a call center in a Microsoft SQL Server database: fact_queue, which holds the number of calls received, and dim_interval, which converts the interval number (0-95) into a time range (e.g. 07:15-07:30). It's set up this way so you can easily change the time zone the data is pulled in.
I'm trying to get a result that shows all 96 intervals regardless of whether there were any calls, but it's not working as expected.
Here's an example of what's in the tables:
Fact_Queue
date_id queue_id interval_id calls_offered
7780 40 0 1
7780 40 2 5
7780 40 3 6
7780 40 5 10
Dim_Interval
interval_id interval_name
0 00:00 - 00:15
1 00:15 - 00:30
2 00:30 - 00:45
3 00:45 - 01:00
-- --
95 23:45 - 24:00
I've played around with a couple of variations of a query and I believe the following should work, but it doesn't:
SELECT dim_interval.interval_name
,fact_queue.calls_offered
FROM dim_interval
RIGHT JOIN fact_queue
ON fact_queue.interval_id = dim_interval.interval_id
WHERE fact_queue.date_id= '7780'
AND fact_queue.queue_id = '40'
ORDER BY dim_interval.interval_id
This just results in
interval_name calls_offered
00:00 - 00:15 1
00:30 - 00:45 5
00:45 - 01:00 6
01:15 - 01:30 10
but what I want is
interval_name calls_offered
00:00 - 00:15 1
00:15 - 00:30 null
00:30 - 00:45 5
00:45 - 01:00 6
01:00 - 01:15 null
Why is the query not working? If it matters I'm using DBeaver version 21.0.3.202104181339

In the snippet dim_interval RIGHT JOIN fact_queue, you've placed the dimension table on the left, not the right. Since you want all of the dimension table's rows, you want a left outer join...
FROM dim_interval LEFT JOIN fact_queue
That only gets you halfway there, though, because the WHERE clause is applied After the join. Intervals with no calls come out of the join with NULL in every fact_queue column, so the WHERE clause filters those rows out again.
So, you need to do the filtering During the join...
SELECT dim_interval.interval_name
,fact_queue.calls_offered
FROM dim_interval
LEFT JOIN fact_queue
ON fact_queue.interval_id = dim_interval.interval_id
AND fact_queue.date_id= '7780'
AND fact_queue.queue_id = '40'
ORDER BY dim_interval.interval_id
Some people prefer to do the filtering Before the join, but that's not necessary and typically yields the same execution plan...
SELECT dim_interval.interval_name
,fact_queue.calls_offered
FROM dim_interval
LEFT JOIN (
SELECT *
FROM fact_queue
WHERE date_id= '7780'
AND queue_id = '40'
) AS fact_queue
ON fact_queue.interval_id = dim_interval.interval_id
ORDER BY dim_interval.interval_id
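To see the difference between filtering in WHERE and filtering in ON, here is a minimal sketch you can run on its own. The table variables and the extra row for date_id 7781 are made up for illustration; only the column names come from the question.

```sql
-- Hypothetical miniature copies of the two tables (only 4 intervals).
DECLARE @dim_interval TABLE (interval_id INT PRIMARY KEY, interval_name VARCHAR(20));
DECLARE @fact_queue TABLE (date_id INT, queue_id INT, interval_id INT, calls_offered INT);

INSERT INTO @dim_interval VALUES
    (0, '00:00 - 00:15'), (1, '00:15 - 00:30'),
    (2, '00:30 - 00:45'), (3, '00:45 - 01:00');
INSERT INTO @fact_queue VALUES
    (7780, 40, 0, 1), (7780, 40, 2, 5), (7780, 40, 3, 6),
    (7781, 40, 1, 9);  -- a different day, added to show it is excluded

-- Filter in WHERE: interval 1 only matches the 7781 row, which the WHERE
-- clause then rejects, so interval 1 disappears (3 rows returned).
SELECT d.interval_name, f.calls_offered
FROM @dim_interval AS d
LEFT JOIN @fact_queue AS f
    ON f.interval_id = d.interval_id
WHERE f.date_id = 7780
  AND f.queue_id = 40
ORDER BY d.interval_id;

-- Filter in ON: the 7781 row never matches, so interval 1 is kept with a
-- NULL calls_offered (4 rows returned).
SELECT d.interval_name, f.calls_offered
FROM @dim_interval AS d
LEFT JOIN @fact_queue AS f
    ON f.interval_id = d.interval_id
   AND f.date_id = 7780
   AND f.queue_id = 40
ORDER BY d.interval_id;
```

Run both in the same batch, since table variables only live for the duration of the batch.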

Related

Get value where ID in one table equals to the ID in another table

I am stuck on this SQL problem, which may be easier than I think. In a nutshell, how do I select the cost from the appropriate garage when the GarageHistID in the GarageCosts table equals the ID in the GarageHistory table?
GarageCosts
GarageID Cost Version GarageHistID
950 213 1 455
950 342 3 NULL
GarageHistory
ID VendorID Version GarageID
454 44 1 NULL
455 2 1 950
456 44 2 NULL
Expected Output:
VendorID Cost Version
2 213 1
44 0 1
44 0 2
This is just a left join coalescing a null to zero.
SELECT
gh.VendorID,
ISNULL(gc.Cost,0) AS Cost,
gh.Version
FROM GarageHistory gh
LEFT JOIN GarageCosts gc
ON gh.GarageID = gc.GarageID
AND gh.[Version] = gc.[Version]
There is no (specific) need to have bi-directional keys in your 2 tables, but you could use either for the join (along with Version).
The following query gives the exact results you mentioned in your question. It uses a left join between the two tables on the GarageHistID field in the GarageCosts table and the ID field in the GarageHistory table.
SELECT
gh.VendorID,
ISNULL(gc.Cost,0) AS Cost,
gh.[Version]
FROM GarageHistory gh
left JOIN GarageCosts gc
ON gh.ID = gc.GarageHistID
order by gc.Cost desc

Time Complexity of SQL Cursor

I am using a CURSOR to implement the following in SQL Server. I am only iterating through the table once, so I think the time complexity is O(n). But everywhere I read about cursors, it says using them is bad practice. Is there a better way to implement the following?
Existing Table
month value
1 92
4 20
9 92
New Table
month value
1 92
2 92
3 92
4 20
5 20
6 20
7 20
8 20
9 92
10 92
11 92
12 92
The use of cursor isn't (primarily) bad because it has poor time complexity, but because it is more error-prone and harder to read than a simple query. You are correct that iterating over a table via cursor is O(n).
On to your problem at hand. If you have the months (1..12) stored somewhere, say Months, then you can do it like this:
WITH matchingMonths AS (
SELECT m.month, MAX(mav.month) as matchedMonth
FROM Months m, MonthsAndValues mav
WHERE m.month >= mav.month
GROUP BY m.month
)
SELECT mm.month, mav.value
FROM matchingMonths mm
JOIN MonthsAndValues mav on mav.month = mm.matchedMonth
Without such a table Months, you could generate it on-the-fly:
WITH Months(month) AS (
SELECT 1
UNION ALL
SELECT month + 1 FROM Months WHERE month < 12
),
matchingMonths AS (
SELECT m.month, MAX(mav.month) as matchedMonth
FROM Months m, MonthsAndValues mav
WHERE m.month >= mav.month
GROUP BY m.month
)
SELECT mm.month, mav.value
FROM matchingMonths mm
JOIN MonthsAndValues mav on mav.month = mm.matchedMonth
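Another way to express the same "latest earlier month" lookup is CROSS APPLY with TOP (1). This is only a sketch: the table variable @MonthsAndValues stands in for the real table, and the sample rows are the ones from the question.

```sql
-- Hypothetical stand-in for the existing table.
DECLARE @MonthsAndValues TABLE ([month] INT PRIMARY KEY, [value] INT);
INSERT INTO @MonthsAndValues VALUES (1, 92), (4, 20), (9, 92);

-- Generate months 1..12 with a recursive CTE.
WITH Months([month]) AS (
    SELECT 1
    UNION ALL
    SELECT [month] + 1 FROM Months WHERE [month] < 12
)
SELECT m.[month], v.[value]
FROM Months AS m
CROSS APPLY (
    -- Most recent row at or before the current month.
    SELECT TOP (1) mav.[value]
    FROM @MonthsAndValues AS mav
    WHERE mav.[month] <= m.[month]
    ORDER BY mav.[month] DESC
) AS v
ORDER BY m.[month];
```

CROSS APPLY drops any month that has no earlier value at all; switch to OUTER APPLY if you would rather keep those months with a NULL value.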

T-SQL how to calculate datediff from previous or next row on log?

I use MS SSMS 2008 R2 to extract data from our company management software, which registers our employee actions and schedules. The table has an ID field, which is unique to each entry. job is the activity the user is performing. user is the user ID. start_time and duration are exactly that. Then there is a "type", where 0 is login (the user logs into the job) and 1 is available time (while performing a job the user may be available or not). "reason" is the reason why the user has become unavailable (break, coffee, lunch, training, etc). Type 0 entries have no reason, so reason is always null for them.
I need to extract the unavailable times by reason, and all I've been able to achieve is a DATEADD of duration to start_time to get end_time, after which I use Excel to manually calculate the times for each row.
The SQL table looks like this:
id job user start_time duration type reason
4436812 3 758 05-06-2015 09:00 125670 0 NULL
4436814 3 758 05-06-2015 09:00 6970 1 1004
4436944 3 758 05-06-2015 09:14 39280 1 1004
4437119 3 758 05-06-2015 10:20 0 1 1002
4437172 3 758 05-06-2015 10:35 18470 1 1004
4437312 3 758 05-06-2015 11:09 3960 1 1004
4437350 3 758 05-06-2015 11:16 0 1 1006
4437360 3 758 05-06-2015 11:19 30080 1 1004
4437638 3 758 05-06-2015 12:13 6730 1 1004
4437695 3 758 05-06-2015 12:24 0 1 1007
4438227 3 758 05-06-2015 13:43 NULL 0 NULL
4438228 3 758 05-06-2015 13:43 NULL 1 NULL
(job = 3 and user = 758)
This is the query I made:
select CONVERT(date,start_time) Data, a.job, a.[user], convert(varchar(15),convert(datetime,a.start_time),108) StartTime, a.duration duracao,
convert(varchar(15),convert(datetime,DATEADD(second,a.duration/10,a.start_time)),108) EndTime, a.type, a.reason
from schedule_log a
where a.job = 3
and a.[user] = 758
and CONVERT(date,start_time) = '20150605'
order by a.start_time, a.type
Which translates to:
Date job user LogTime Avail NotAvail
2015-06-05 3 758 04:44:01 04:10:23 00:33:38
So, for each reason, I have to do a DATEDIFF from the end time (start + duration) to either the next type 1 start_time or the previous type 0 end time, whichever happened first (the user may become unavailable and then log off).
How do I do this?
PS: duration is in tenths of a second.
OK, here is my updated suggestion. It is broken into four steps for clarity, but the temp tables are unnecessary - they could become subqueries.
Step 1: Calculate the end time for each period of activity, excluding logins.
Step 2: Join each row to the row that occurred immediately after it, to get the unavailable time following each reason. Note: some of your timestamps do not line up properly, possibly as a result of storing duration in tenths of a second but timestamps only to the minute.
Step 3: Total the unavailable time, and subtract from the duration of the login to get the available time.
Step 4: Total the unavailable time by reason.
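-- Step 1: end time of each availability period (type 1 only), numbered in time order per job and user
SELECT *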
,dateadd(s, duration / 10, start_time) AS Endtime
,row_number() OVER (
PARTITION BY job ,[user] ORDER BY start_time, [type]
) AS RN
INTO #temp2
FROM MyTable
WHERE [type] = 1
-- Step 2: pair each period with the next one; the gap between them is the unavailable time
SELECT a.[user]
,a.job
,a.reason
,a.start_time
,a.type
,a.duration / 10 AS AvailableSeconds
,datediff(s, a.Endtime, b.start_time) AS UnavailableSeconds
INTO #temp3
FROM #temp2 a
LEFT JOIN #temp2 b
ON a.[user] = b.[user]
AND a.job = b.job
AND a.RN = b.RN - 1
-- Step 3: total unavailable time per day; available time is the login duration minus that total
SELECT cast(a.start_time AS DATE) AS [Date]
,a.job
,a.[user]
,b.duration / 10 AS LogTime
,b.duration / 10 - sum(UnavailableSeconds) AS Avail
,sum(UnavailableSeconds) AS NotAvail
FROM #temp3 a
LEFT JOIN MyTable b
ON a.job = b.job
AND a.[user] = b.[user]
AND b.[type] = 0
AND b.duration IS NOT NULL
GROUP BY cast(a.start_time AS DATE)
,a.job
,a.[user]
,b.duration
-- Step 4: unavailable time broken down by reason
SELECT cast(a.start_time AS DATE) AS [Date]
,a.job
,a.[user]
,a.reason
,sum(UnavailableSeconds) AS NotAvail
FROM #temp3 a
where reason is not null
GROUP BY cast(a.start_time AS DATE)
,a.job
,a.[user]
,a.reason
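As mentioned, the temp tables are optional. Here is a rough sketch of steps 1, 2 and 4 written as CTEs instead. The table variable @schedule_log and its four sample rows are stand-ins taken from the question, the timestamps are assumed to be day-month-year and exact to the second, and using id as a tie-breaker for rows with the same start_time is my assumption.

```sql
-- Hypothetical stand-in for the real log table, with a few rows from the question.
DECLARE @schedule_log TABLE (
    id INT, job INT, [user] INT, start_time DATETIME,
    duration INT, [type] INT, reason INT
);
INSERT INTO @schedule_log VALUES
    (4436814, 3, 758, '2015-06-05T09:00:00',  6970, 1, 1004),
    (4436944, 3, 758, '2015-06-05T09:14:00', 39280, 1, 1004),
    (4437119, 3, 758, '2015-06-05T10:20:00',     0, 1, 1002),
    (4437172, 3, 758, '2015-06-05T10:35:00', 18470, 1, 1004);

WITH periods AS (
    -- Step 1: end time per availability period (duration is in tenths of a second).
    SELECT [user], job, reason, start_time,
           DATEADD(second, duration / 10, start_time) AS end_time,
           ROW_NUMBER() OVER (PARTITION BY job, [user]
                              ORDER BY start_time, id) AS rn
    FROM @schedule_log
    WHERE [type] = 1
),
gaps AS (
    -- Step 2: gap between the end of one period and the start of the next.
    SELECT a.[user], a.job, a.reason, a.start_time,
           DATEDIFF(second, a.end_time, b.start_time) AS unavailable_seconds
    FROM periods AS a
    LEFT JOIN periods AS b
        ON b.[user] = a.[user]
       AND b.job = a.job
       AND b.rn = a.rn + 1
)
-- Step 4: unavailable seconds per day and reason.
SELECT CAST(start_time AS DATE) AS [Date], job, [user], reason,
       SUM(unavailable_seconds) AS NotAvail
FROM gaps
WHERE reason IS NOT NULL
GROUP BY CAST(start_time AS DATE), job, [user], reason;
```

The last period of the day has no following row, so its gap is NULL and SUM simply ignores it.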

SQL Query to Calculate the Rolling Difference by Date

I cannot seem to work this one out to be exactly what I need.
I'm using MS SQL Management Studio 2008.
I have a table (several actually) but let's keep it simple. The table contains daily stock figures for each item (SKU).
SKU DataDate Web_qty
2 2014-11-17 00:00:00 404
2 2014-11-18 00:00:00 373
2 2014-11-19 00:00:00 1350
66 2014-11-17 00:00:00 3624
66 2014-11-18 00:00:00 3576
66 2014-11-19 00:00:00 3570
67 2014-11-17 00:00:00 9353
67 2014-11-18 00:00:00 9297
67 2014-11-19 00:00:00 9250
I simply need the Select Query to return this:
SKU DataDate Difference
2 2014-11-17 00:00:00 ---
2 2014-11-18 00:00:00 -31
2 2014-11-19 00:00:00 +977
66 2014-11-17 00:00:00 ---
66 2014-11-18 00:00:00 -48
66 2014-11-19 00:00:00 -6
67 2014-11-17 00:00:00 ---
67 2014-11-18 00:00:00 -56
67 2014-11-19 00:00:00 -47
I do not need the --- parts, I have just shown that to draw attention to the fact that this one cannot be calculated as it is the first record.
I've tried using derived tables, but it's getting a little confusing. I need to play with a working example so I can understand it better.
If someone could point me in the right direction I'm sure I'll be able to join the other tables back together (i.e. SKU Description and prices).
Really appreciate everyone's time
Kev
Try this. It uses a correlated subquery to find the rolling difference against the previous day's row:
CREATE TABLE #tem
(SKU INT,DataDate DATETIME,Web_qty INT)
INSERT #tem
VALUES( 2,'2014-11-17 00:00:00',404),
(2,'2014-11-18 00:00:00',373),
(2,'2014-11-19 00:00:00',1350),
(66,'2014-11-17 00:00:00',3624),
(66,'2014-11-18 00:00:00',3576),
(66,'2014-11-19 00:00:00',3570),
(67,'2014-11-17 00:00:00',9353),
(67,'2014-11-18 00:00:00',9297),
(67,'2014-11-19 00:00:00',9250)
SELECT *,
Web_qty - (SELECT Web_qty
FROM #tem a
WHERE a.sku = b.SKU
AND a.DataDate = Dateadd(dd, -1, b.DataDate)) Roll_diff
FROM #tem b
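The correlated subquery above looks for a row dated exactly one day earlier, so a missing day produces a NULL difference. If your data can have gaps, one option that should still work on SQL Server 2008 is to number the rows per SKU and join each row to the one before it. This is a sketch only: @stock is a made-up table variable, and the third sample row is deliberately moved to the 21st to show the gap case.

```sql
-- Hypothetical sample data with a gap between the 18th and the 21st.
DECLARE @stock TABLE (SKU INT, DataDate DATETIME, Web_qty INT);
INSERT INTO @stock VALUES
    (2, '20141117', 404),
    (2, '20141118', 373),
    (2, '20141121', 1350);

WITH numbered AS (
    -- Number each SKU's rows in date order.
    SELECT SKU, DataDate, Web_qty,
           ROW_NUMBER() OVER (PARTITION BY SKU ORDER BY DataDate) AS rn
    FROM @stock
)
SELECT cur.SKU, cur.DataDate,
       cur.Web_qty - prev.Web_qty AS Roll_diff  -- NULL for the first row of each SKU
FROM numbered AS cur
LEFT JOIN numbered AS prev
    ON prev.SKU = cur.SKU
   AND prev.rn = cur.rn - 1
ORDER BY cur.SKU, cur.DataDate;
```

With these three rows the differences come out as NULL, -31 and 977, even though the last two rows are three days apart.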
I know this is an old thread, but I had a similar problem and ended up solving it with window functions. It works in SQL Server 2014; LAG, LEAD and DATEFROMPARTS were all introduced in SQL Server 2012, so it will not run on 2008.
It also handles potentially non-continuous data as well as rows with no changes. Hopefully it helps someone out there!
CREATE TABLE #tem
(SKU INT,DataDate DATETIME,Web_qty INT)
INSERT #tem
VALUES( 2,'2014-11-17 00:00:00',404),
(2,'2014-11-18 00:00:00',373),
(2,'2014-11-19 00:00:00',1350),
(2,'2014-11-20 00:00:00',1350),
(2,'2014-11-21 00:00:00',1350),
(66,'2014-11-17 00:00:00',3624),
(66,'2014-11-18 00:00:00',3576),
(66,'2014-11-19 00:00:00',3570),
(66,'2014-11-20 00:00:00',3590),
(66,'2014-11-21 00:00:00',3578),
(67,'2014-11-17 00:00:00',9353),
(67,'2014-11-18 00:00:00',9297),
(67,'2014-11-19 00:00:00',9250),
(67,'2014-11-20 00:00:00',9250),
(67,'2014-11-21 00:00:00',9240)
;WITH A AS (
SELECT
SKU,
DataDate,
Web_Qty,
Web_qty - LAG(Web_qty,1, 0)
OVER (PARTITION BY SKU ORDER BY DataDate) Roll_diff
FROM #tem b
)
SELECT
SKU,
DataDate ValidFromDate,
Lead(DataDate, 1, DateFromParts(9999,12,31)) OVER (PARTITION BY SKU ORDER BY DataDate) ValidToDate,
Web_Qty
FROM A WHERE Roll_diff <> 0
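If all you need is the per-row difference asked for in the question, LAG on its own is enough. A minimal sketch, assuming SQL Server 2012 or later; @stock is a made-up table variable holding some of the question's rows.

```sql
-- Hypothetical stand-in for the stock table.
DECLARE @stock TABLE (SKU INT, DataDate DATETIME, Web_qty INT);
INSERT INTO @stock VALUES
    (2,  '20141117',  404), (2,  '20141118',  373), (2,  '20141119', 1350),
    (66, '20141117', 3624), (66, '20141118', 3576), (66, '20141119', 3570);

SELECT SKU, DataDate,
       -- LAG without a default returns NULL on the first row of each SKU,
       -- so the difference is NULL there instead of a misleading number.
       Web_qty - LAG(Web_qty) OVER (PARTITION BY SKU ORDER BY DataDate) AS Difference
FROM @stock
ORDER BY SKU, DataDate;
```

For SKU 2 this gives NULL, -31 and 977, and for SKU 66 it gives NULL, -48 and -6, matching the output requested in the question.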

SQL cursor over categories?

I tried to post full SQL code to walk you through the data and conversions, but it wouldn't post here. Long story short, I end up with a data table like this:
Location Date Direction PreviousDirection Offset
site1 2013-07-22 11:30:45.000 302 302 0
site1 2013-07-22 11:31:45.000 322 302 20
site1 2013-07-22 11:32:45.000 9 322 47
site1 2013-07-22 11:33:45.000 9 9 0
site1 2013-07-22 11:34:45.000 0 9 -9
site2 2013-07-22 11:30:45.000 326 326 0
site2 2013-07-22 11:31:45.000 2 326 36
site2 2013-07-22 11:32:45.000 2 2 0
site2 2013-07-22 11:33:45.000 2 2 0
site2 2013-07-22 11:34:45.000 2 2 0
Location,Date is the primary key. I need help generating an [AdjustedDirection] column calculated as follows:
For first row (for each Location e.g. site1, site2): Since there is no previous row to calculate on, AdjustedDirection = first row's Direction.
After that, Second row AdjustedDirection: It's the first row's AdjustedDirection plus second row's offset.
Third row AdjustedDirection: It's the second row's AdjustedDirection plus third row's offset.
and so on...
I think this requires a cursor, but I don't know the syntax to do a cursor over multiple categories (Locations) and/or maybe there is a different answer. I can't describe how many steps and how complicated the process was to get to this step. I'm so close to the end and totally stuck here!
If anyone has a clue how to populate these AdjustedDirection values, please prove your awesomeness. Thanks!!
Results should look like this (date truncated for spacing, previous adjusted direction shown for clarity of how current row Adjusted is calculated):
Location Date Direction Offset PrevAdjDirection AdjustedDirection
site1 11:30:45.000 302 0 302 302
site1 11:31:45.000 322 20 302 322
site1 11:32:45.000 9 47 322 369
site1 11:33:45.000 9 0 369 369
site1 11:34:45.000 0 -9 369 360
site2 11:30:45.000 326 0 326 326
site2 11:31:45.000 2 36 326 362
site2 11:32:45.000 2 0 362 362
site2 11:33:45.000 2 0 362 362
site2 11:34:45.000 2 0 362 362
thanks!
Here is a solution using correlated subqueries, some of which can be replaced by window functions (the version of SQL Server makes a difference here).
It helps to restate your logic. An equivalent formulation is:
For the first row, use the Direction
For subsequent rows, use the cumulative sum of the offsets excluding the first offset plus the direction from the first row.
The following calculates the appropriate variables using correlated subqueries, and then combines them using simple logic:
select t.*,
-- first row: SumEarlierOffsets is NULL, so the coalesce yields 0 and the result is FirstDirection
FirstDirection + coalesce(SumEarlierOffsets - FirstOffset + Offset, 0) as AdjustedDirection
from (select t.*,
(select top 1 Direction
from t t2
where t2.location = t.location
order by date asc
) as FirstDirection,
(select SUM(offset)
from t t2
where t2.location = t.location and
t2.date < t.date
) as SumEarlierOffsets,
(select top 1 Offset
from t t2
where t2.location = t.location
order by date asc
) as FirstOffset
from t
) t
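On SQL Server 2012 or later, the same logic can be sketched with window functions instead of correlated subqueries. The table variable @t is a stand-in for your table, loaded with the rows from the question, and the Date and Offset columns are bracketed just to be safe.

```sql
-- Hypothetical stand-in for the real table.
DECLARE @t TABLE (Location VARCHAR(10), [Date] DATETIME, Direction INT, [Offset] INT);
INSERT INTO @t VALUES
    ('site1', '2013-07-22T11:30:45', 302,  0),
    ('site1', '2013-07-22T11:31:45', 322, 20),
    ('site1', '2013-07-22T11:32:45',   9, 47),
    ('site1', '2013-07-22T11:33:45',   9,  0),
    ('site1', '2013-07-22T11:34:45',   0, -9),
    ('site2', '2013-07-22T11:30:45', 326,  0),
    ('site2', '2013-07-22T11:31:45',   2, 36),
    ('site2', '2013-07-22T11:32:45',   2,  0),
    ('site2', '2013-07-22T11:33:45',   2,  0),
    ('site2', '2013-07-22T11:34:45',   2,  0);

SELECT Location, [Date], Direction, [Offset],
       -- first row's Direction, plus the running sum of offsets,
       -- minus the first row's own offset
       FIRST_VALUE(Direction) OVER (PARTITION BY Location ORDER BY [Date])
     + SUM([Offset]) OVER (PARTITION BY Location ORDER BY [Date]
                           ROWS UNBOUNDED PRECEDING)
     - FIRST_VALUE([Offset]) OVER (PARTITION BY Location ORDER BY [Date])
       AS AdjustedDirection
FROM @t
ORDER BY Location, [Date];
```

For the sample rows this returns 302, 322, 369, 369, 360 for site1 and 326, 362, 362, 362, 362 for site2, the same AdjustedDirection values as in the question.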
I ended up dumping the current data into a temp table and doing a WHILE UPDATE like this:
SELECT Location, Date, Direction, Offset, Adjusted = NULL
INTO #results
FROM t1
WHILE (
SELECT COUNT(*) FROM #results WHERE Adjusted IS NULL
) > 0
UPDATE TOP (1) t1
SET Adjusted = ISNULL(t2.Adjusted,ISNULL(t2.Direction,t1.Direction)) + t1.Offset
FROM #results t1
LEFT JOIN #results t2 ON t2.Location = t1.Location AND t2.Date = DateAdd(minute,-1,t1.Date)
WHERE t1.Adjusted IS NULL
Thanks for the input and inspiration!
