How to repeat values in case of null values on left join

How to repeat values in case of null values on left join - sql-server

I have a calendar table and a rates table. The rates table has no rows for weekend days. I'm trying to join the two so that every day has a rate, with weekend days taking the latest available rate. Instead of showing NULL, as a left join does when no matching record exists, the result should repeat the previous value.
I have the below code, which works, but it takes 2 min to do on 7,397 rows, which is way too long.
Does anyone know a faster way to get the same results?
SELECT
c.CalendarID,
MAX(r.RateID)
FROM Dim_Calendar c
LEFT JOIN Dim_Rates r ON r.RateDate <= c.CalendarID
GROUP BY c.CalendarID
With = instead of <= in the join condition, I get the following:
CalendarID | RateID
20131001 | 2
20131002 | 3
20131003 | 4
20131004 | 5
20131005 | NULL
20131006 | NULL
20131007 | 6
And this is the desired table:
CalendarID | RateID
20131001 | 2
20131002 | 3
20131003 | 4
20131004 | 5
20131005 | 5
20131006 | 5
20131007 | 6

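You can use the LAG() window function. With two LAG() calls, as below, this fills gaps of up to two consecutive days (such as a weekend):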
SELECT c.CalendarID,
COALESCE(
r.RateID,
LAG(r.RateID, 1) OVER (ORDER BY c.CalendarID),
LAG(r.RateID, 2) OVER (ORDER BY c.CalendarID)
) RateID
FROM Dim_Calendar c LEFT JOIN Dim_Rates r
ON r.RateDate = c.CalendarID
ORDER BY c.CalendarID
Results:
> CalendarID | RateID
> ---------: | :-----
> 20131001 | 2
> 20131002 | 3
> 20131003 | 4
> 20131004 | 5
> 20131005 | 5
> 20131006 | 5
> 20131007 | 6
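If the gaps can be longer than two days, a running maximum is one alternative. This is only a sketch: it assumes RateID always grows with RateDate (which the MAX(r.RateID) in the question also relies on), SQL Server 2012 or later, and int date keys. The table variables @Calendar and @Rates are stand-ins for Dim_Calendar and Dim_Rates.

```sql
-- Stand-ins for Dim_Calendar and Dim_Rates (assumed int date keys), filled with the question's sample
DECLARE @Calendar TABLE (CalendarID int PRIMARY KEY);
DECLARE @Rates    TABLE (RateID int, RateDate int);

INSERT INTO @Calendar VALUES (20131001),(20131002),(20131003),(20131004),
                             (20131005),(20131006),(20131007);
INSERT INTO @Rates VALUES (2,20131001),(3,20131002),(4,20131003),
                          (5,20131004),(6,20131007);

SELECT c.CalendarID,
       -- running maximum over every row up to the current date; MAX ignores the NULLs
       MAX(r.RateID) OVER (ORDER BY c.CalendarID ROWS UNBOUNDED PRECEDING) AS RateID
FROM @Calendar c
LEFT JOIN @Rates r ON r.RateDate = c.CalendarID
ORDER BY c.CalendarID;
```

On the sample data this returns 2, 3, 4, 5, 5, 5, 6 for the seven dates. If RateID does not grow with RateDate, this shortcut does not apply.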

You could use a correlated subquery to fill the gaps:
SELECT
c.CalendarID,
(SELECT TOP 1 r.RateID FROM Dim_Rates r
WHERE r.RateDate <= c.CalendarID AND r.RateID IS NOT NULL
ORDER BY r.RateDate DESC) AS RateID
FROM Dim_Calendar c
ORDER BY c.CalendarID;
This query can be improved by using the following index:
CREATE INDEX idx ON Dim_Rates (RateDate, RateID);

As pointed out, you need to check for proper, covering indexes. It appears you are running against a data warehouse database; if that is the case, you can replace the CTE with indexed temp tables if the estimated row count in the query plan is way off.
;WITH NormalizedData AS
(
SELECT
RateID,CalendarID,
VirtualGroupID = SUM(LastRecordBeforeGap) OVER (ORDER BY CalendarID ROWS UNBOUNDED PRECEDING)
FROM
(
SELECT RateID,CalendarID,
LastRecordBeforeGap = CASE WHEN LEAD(RateID) OVER(ORDER BY CalendarID) IS NULL AND RateID IS NOT NULL THEN 1 ELSE 0 END
FROM
Dim_Calendar c
LEFT JOIN Dim_Rates r ON r.RateDate = c.CalendarID
)AS x
)
SELECT
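-- each group starts at the last rate before a gap; FIRST_VALUE picks that rate,
-- whereas SUM would add up every rate that falls in the same group
RateID = ISNULL(RateID, FIRST_VALUE(RateID) OVER (PARTITION BY VirtualGroupID ORDER BY CalendarID)),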
CalendarID
FROM
NormalizedData

Related

Joining two tables and need to have MAX aggregate function in ON clause

This is my code. I want to pass a part ID and a purchase order ID to my report and get back all the related information for them. The important thing is that if several rows share the same purchase order ID and part ID, the query should return only the one with the highest transaction ID. The following code does not return what I expected. Could you please help me?
SELECT MAX(INVENTORY_TRANS.TRANSACTION_ID), INVENTORY_TRANS.PART_ID
, INVENTORY_TRANS.PURC_ORDER_ID, TRACE_INV_TRANS.QTY, TRACE_INV_TRANS.CREATE_DATE, TRACE_INV_TRANS.TRACE_ID
FROM INVENTORY_TRANS
JOIN TRACE_INV_TRANS ON INVENTORY_TRANS.TRANSACTION_ID = TRACE_INV_TRANS.TRANSACTION_ID
WHERE INVENTORY_TRANS.PART_ID = @PartID
AND INVENTORY_TRANS.PURC_ORDER_ID = @PurchaseOrderID
GROUP BY TRACE_INV_TRANS.QTY, TRACE_INV_TRANS.CREATE_DATE, TRACE_INV_TRANS.TRACE_ID, INVENTORY_TRANS.PART_ID
, INVENTORY_TRANS.PURC_ORDER_ID
A sample of the TRACE_INV_TRANS table:
part_id | trace_id | transaction_id | qty | create_date
x | 1 | 10 | |
x | 2 | 11 | |
x | 3 | 12 | |
A sample of the INVENTORY_TRANS table:
transaction_id | part_id | purc_order_id
11 | x | p20
12 | x | p20
I wanted to have the result of biggest transaction which is transaction 12 but it shows me transaction 11

I would use a sub-query to find the MAX value, then join that result to the other table.
The ORDER BY + TOP (1) returns the MAX value for transaction_id.
SELECT
inv.transaction_id
,inv.part_id
,inv.purc_order_id
,tr.qty
,tr.create_date
,tr.trace_id
FROM
(
SELECT TOP (1)
transaction_id,
part_id,
purc_order_id
FROM
INVENTORY_TRANS
WHERE
part_id = @PartID
AND
purc_order_id = @PurchaseOrderID
ORDER BY
transaction_id DESC
) AS inv
JOIN
TRACE_INV_TRANS AS tr
ON inv.transaction_id = tr.transaction_id;
Results:
+----------------+---------+---------------+------+-------------+----------+
| transaction_id | part_id | purc_order_id | qty | create_date | trace_id |
+----------------+---------+---------------+------+-------------+----------+
| 12 | x | p20 | NULL | NULL | 3 |
+----------------+---------+---------------+------+-------------+----------+
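If the report ever needs the highest transaction for several part/order pairs at once, ROW_NUMBER() is one way to sketch it. The table variables and column types below are assumptions based on the sample data in the question; qty and create_date are left NULL as in the sample.

```sql
-- Assumed shapes for the two tables, filled with the question's sample rows
DECLARE @INVENTORY_TRANS TABLE (transaction_id int, part_id varchar(10), purc_order_id varchar(10));
DECLARE @TRACE_INV_TRANS TABLE (part_id varchar(10), trace_id int, transaction_id int,
                                qty int, create_date date);

INSERT INTO @INVENTORY_TRANS VALUES (11,'x','p20'),(12,'x','p20');
INSERT INTO @TRACE_INV_TRANS (part_id, trace_id, transaction_id)
VALUES ('x',1,10),('x',2,11),('x',3,12);

WITH ranked AS (
    SELECT inv.transaction_id, inv.part_id, inv.purc_order_id,
           -- rn = 1 marks the highest transaction_id within each part/order pair
           ROW_NUMBER() OVER (PARTITION BY inv.part_id, inv.purc_order_id
                              ORDER BY inv.transaction_id DESC) AS rn
    FROM @INVENTORY_TRANS inv
)
SELECT r.transaction_id, r.part_id, r.purc_order_id, tr.qty, tr.create_date, tr.trace_id
FROM ranked r
JOIN @TRACE_INV_TRANS tr ON tr.transaction_id = r.transaction_id
WHERE r.rn = 1;
```

For the sample rows this returns the single row for transaction 12 with trace_id 3. A WHERE clause on part_id and purc_order_id inside the CTE would narrow it to one pair, as in the TOP (1) version.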

Sum variable amount of intervals together

We just changed our telephony system, and every agent is now logged in 15-minute intervals, but we need one line per event.
table event:
empid | code | timestamp | duration
5111 | 5 | 09:45:00 | 45
5222 | 2 | 09:58:00 | 120
5111 | 5 | 10:00:00 | 900
5111 | 5 | 10:15:00 | 900
5111 | 5 | 10:15:30 | 30
5222 | 5 | 11:00:00 | 8
5222 | 5 | 11:00:05 | 5
The timestamp is written after the fact, so a timestamp of 9:45:00 with a duration of 45 means the event started at 9:44:15; since the interval ended at 9:45, it was written at that time, but I need 9:44:15 saved.
result should give me
empid | code | timestamp | duration
5111 | 5 | 09:44:15 | 1875
5222 | 2 | 09:56:00 | 120
5222 | 5 | 10:59:52 | 13
The problem is that the phones lock after a maximum of 2 hours, and as you can see with employee 5222, he spent 13 seconds across two lines. I could join the same table 10 times, plus once more to exclude rows that continue a previous line (same code, with the end time of the previous line equal to the start time of the new line).
this is on MSSQL 2008
Select e.empid
,e.code
,convert(time(0),DATEADD(ss,- e.Duration, e.timestamp))
,e.duration + isnull(e1.duration,0) + isnull(e2.duration,0)
from [event] e
left join [event] e0 on
convert(TIME(0),DATEADD(ss,- e.Duration, e.timestamp)) = e0.timestamp
and
e.empid = e0.empid
and
e.code = e0.code
left join [event] e1 on
convert(TIME(0),DATEADD(ss,- e1.Duration, e1.timestamp)) = e.timestamp
and
e.empid = e1.empid
and
e.code = e1.code
left join [event] e2 on
convert(TIME(0),DATEADD(ss,- e2.Duration, e2.timestamp)) = e1.timestamp
and
e2.empid = e1.empid
and
e2.code = e1.code
--etc......
where isnull(e0.duration,'-10') = '-10'
This works, but it is far from optimal.
I would rather use an aggregate function, but I don't know how to write it, as this table has no common key other than the previous timestamp matching the new timestamp minus the duration.
It is important to know that agent 5111 could go on code 5 again on the same day, and I would need 2 lines for that. If not, it would have been too easy!
Thank you in advance!

Try this. I have commented the code, but the basic algorithm is:
1. Find rows which are continuations, i.e. there exists a row which matches once you subtract the duration.
2. Find the "originals", i.e. the start of each call, by subtracting the continuations.
3. For each original, find the next original so we can determine a range of times to look for continuations.
4. Join it all together and add the total duration from the continuations appropriate to each original.
Hope this helps, it was an interesting challenge!
create table #data
(
empid int,
code int,
[timestamp] time,
duration int
);
insert into #data values(5111,5,'09:45',45),
(5222,2,'09:58',120),
(5111,5,'10:00',900),
(5111,5,'10:15',900),
(5111,5,'10:15:30',30),
(5222,5,'11:00',8),
(5222,5,'11:00:05',5),
-- added these rows to include the situation you describe where 5111 goes again on code 5:
(5111,5,'13:00',45),
(5111,5,'13:15',900),
(5111,5,'13:15:25',25);
-- find where a row is a continuation
with continuations as (
select a.empid, a.code, a.[timestamp] , a.duration
from #data a
inner join #data b on a.empid = b.empid
and a.code = b.code
where dateadd(ss, -a.duration, a.[timestamp]) = b.[timestamp]
),
-- find the "original" rows as the complement of continuations
originals as
(
select d.empid, d.code, d.[timestamp], d.duration
from #data d
left outer join continuations c on d.empid = c.empid and d.code = c.code and d.timestamp = c.timestamp
where c.empid is null
),
-- to handle the situation where we have more than one call for same agent and code,
-- find the next timestamp for each empid/code
nextcall as (
select a.*, a2.[timestamp] nex
from originals a
outer apply (
select top 1 [timestamp]
from originals a2
where a2.[timestamp] > a.[timestamp]
and a.empid = a2.empid
and a.code = a2.code
order by a2.[timestamp] asc -- earliest later original, i.e. the next call
) a2
)
select o.empid,
o.code,
dateadd(ss, -o.duration, o.timestamp) as [timestamp],
o.duration + isnull(sum(c.duration),0) as duration
from originals o
left outer join nextcall n on o.empid = n.empid and o.code = n.code and o.[timestamp] = n.[timestamp]
left outer join continuations c on o.empid = c.empid
and o.code = c.code
-- filter the continuations on the range of times based on finding the next one
and c.[timestamp] > o.[timestamp]
and (n.nex is null or c.[timestamp] < n.nex)
group by o.empid,
o.code,
o.duration,
o.[timestamp]
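A recursive CTE is another way to chain the rows, and it also runs on SQL Server 2008. This is a sketch, not a tuned solution: the table variable @event and the time(0) column type are assumptions, and it assumes no row has a duration of 0 (such a row would match itself and recurse until the MAXRECURSION limit is hit).

```sql
-- Assumed shape of the event table, filled with the question's sample rows
DECLARE @event TABLE (empid int, code int, [timestamp] time(0), duration int);

INSERT INTO @event VALUES
(5111,5,'09:45:00',45),  (5222,2,'09:58:00',120),
(5111,5,'10:00:00',900), (5111,5,'10:15:00',900),
(5111,5,'10:15:30',30),  (5222,5,'11:00:00',8),
(5222,5,'11:00:05',5);

WITH chain AS (
    -- anchor: rows whose computed start time is not another row's timestamp
    SELECT e.empid, e.code,
           CAST(DATEADD(ss, -e.duration, e.[timestamp]) AS time(0)) AS start_time,
           e.[timestamp] AS end_time,
           e.duration
    FROM @event e
    WHERE NOT EXISTS (SELECT 1 FROM @event p
                      WHERE p.empid = e.empid AND p.code = e.code
                        AND p.[timestamp] = DATEADD(ss, -e.duration, e.[timestamp]))
    UNION ALL
    -- recursive step: append the row that starts exactly where the chain currently ends
    SELECT c.empid, c.code, c.start_time, n.[timestamp], n.duration
    FROM chain c
    JOIN @event n ON n.empid = c.empid AND n.code = c.code
                 AND DATEADD(ss, -n.duration, n.[timestamp]) = c.end_time
)
SELECT empid, code, start_time AS [timestamp], SUM(duration) AS duration
FROM chain
GROUP BY empid, code, start_time
ORDER BY empid, start_time;
```

On the question's seven rows this gives the three expected lines: 5111 at 09:44:15 with 1875, 5222 code 2 at 09:56:00 with 120, and 5222 code 5 at 10:59:52 with 13. Because each chain keeps its own start_time, a second visit to the same code later in the day becomes a separate line.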

Netezza: Show dates even if 0 data for that day

I have this query through an odbc connection in excel for a refreshable report with data for every 4 weeks. I need to show the dates in each of the 4 weeks even if there is no data for that day because this data is then linked to a Graph. Is there a way to do this?
thanks.
Select b.INV_DT, sum( a.ORD_QTY) as Ordered, sum( a.SHIPPED_QTY) as Shipped
from fct_dly_invoice_detail a, fct_dly_invoice_header b, dim_invoice_customer c
where a.INV_HDR_SK = b.INV_HDR_SK
and b.DIM_INV_CUST_SK = c.DIM_INV_CUST_SK
and a.SRC_SYS_CD = 'ABC'
and a.NDC_NBR is not null
and b.inv_dt between CURRENT_DATE - 16 and CURRENT_DATE
and b.store_nbr in (2851, 2963, 3249, 3385, 3447, 3591, 3727, 4065, 4102, 4289, 4376, 4793, 5209, 5266, 5312, 5453, 5569, 5575, 5892, 6534, 6571, 7110, 9057, 9262, 9652, 9742, 10373, 12392, 12739, 13870
)
group by 1

The general purpose solution to this is to create a date dimension table, and then perform an outer join to that date dimension table on the INV_DT column.
There are tons of good resources you can search for on creating a good date dimension table, so I'll just create a quick and dirty (and trivial) example here. I highly recommend some research in that area if you'll be doing a lot of BI/reporting.
If our table we want to report from looks like this:
Table "TABLEZ"
Attribute | Type | Modifier | Default Value
-----------+--------+----------+---------------
AMOUNT | BIGINT | |
INV_DT | DATE | |
Distributed on random: (round-robin)
select * from tablez order by inv_dt
AMOUNT | INV_DT
--------+------------
1 | 2015-04-04
1 | 2015-04-04
1 | 2015-04-06
1 | 2015-04-06
(4 rows)
and our report looks like this:
SELECT inv_dt,
SUM(amount)
FROM tablez
WHERE inv_dt BETWEEN CURRENT_DATE - 5 AND CURRENT_DATE
GROUP BY inv_dt;
INV_DT | SUM
------------+-----
2015-04-04 | 2
2015-04-06 | 2
(2 rows)
We can create a date dimension table that contains a row for every date (or at least 1024 days in the past and 1024 days in the future, using the _v_vector_idx view in this example).
create table date_dim (date_dt date);
insert into date_dim select current_date - idx from _v_vector_idx;
insert into date_dim select current_date + idx +1 from _v_vector_idx;
Then our query would look like this:
SELECT d.date_dt,
SUM(amount)
FROM tablez a
RIGHT OUTER JOIN date_dim d
ON a.inv_dt = d.date_dt
WHERE d.date_dt BETWEEN CURRENT_DATE -5 AND CURRENT_DATE
GROUP BY d.date_dt;
DATE_DT | SUM
------------+-----
2015-04-01 |
2015-04-02 |
2015-04-03 |
2015-04-04 | 2
2015-04-05 |
2015-04-06 | 2
(6 rows)
If you actually needed a zero value instead of a NULL for the days where you had no data, you could use a COALESCE or NVL like this:
SELECT d.date_dt,
COALESCE(SUM(amount),0)
FROM tablez a
RIGHT OUTER JOIN date_dim d
ON a.inv_dt = d.date_dt
WHERE d.date_dt BETWEEN CURRENT_DATE -5 AND CURRENT_DATE
GROUP BY d.date_dt;
DATE_DT | COALESCE
------------+----------
2015-04-01 | 0
2015-04-02 | 0
2015-04-03 | 0
2015-04-04 | 2
2015-04-05 | 0
2015-04-06 | 2
(6 rows)

I agree with @ScottMcG that you need to get the list of dates. However, if you are in a situation where you aren't allowed to create a table, you can simplify things: all you need is a table that has at least 28 rows. Using your example, this should work.
select date_list.dt_nm, nvl(results.Ordered,0) as Ordered, nvl(results.Shipped,0) as Shipped
from
(select row_number() over(order by sub.arb_nbr)+ (current_date -28) as dt_nm
from (select rowid as arb_nbr
from fct_dly_invoice_detail b
limit 28) sub ) date_list left outer join
( Select b.INV_DT, sum( a.ORD_QTY) as Ordered, sum( a.SHIPPED_QTY) as Shipped
from fct_dly_invoice_detail a inner join
fct_dly_invoice_header b
on a.INV_HDR_SK = b.INV_HDR_SK
and a.SRC_SYS_CD = 'ABC'
and a.NDC_NBR is not null
and b.inv_dt between CURRENT_DATE - 16 and CURRENT_DATE
and b.store_nbr in (2851, 2963, 3249, 3385, 3447, 3591, 3727, 4065, 4102, 4289, 4376, 4793, 5209, 5266, 5312, 5453, 5569, 5575, 5892, 6534, 6571, 7110, 9057, 9262, 9652, 9742, 10373, 12392, 12739, 13870)
inner join
dim_invoice_customer c
on b.DIM_INV_CUST_SK = c.DIM_INV_CUST_SK
group by 1 ) results
on date_list.dt_nm = results.inv_dt

selecting all but the last X rows from a join in sql server

I'm having trouble with some maintenance on my database. We have a table A and a table B with a one-to-many relationship. Right now there are between 1 and 10 rows in table B for every row in table A, and I want to see every row except the 5 most recent. If there are 5 or fewer rows in B for a row in A, I don't want to see them because I don't care about that data.
Here's the query I have so far:
WITH cte (id, number)
AS
(
SELECT A.id, COUNT(*)
FROM A INNER JOIN B ON A.id=B.a
GROUP BY A.id
)
SELECT c.id, B.id, number
FROM cte c
INNER JOIN B ON B.a=c.id
WHERE number > 5
ORDER BY c.id, B.id DESC;
GO
It will give me the IDs of the rows in A and B, and the number is just to help me see what is going on (it will be 10 if there are 10 matching rows, 9 if 9, etc).
I just don't really know where to go next. I have a list of rows in A and their matches in B and I want to see only the last 5 rows in B for every row in A. My data might look like this:
A | B | number
---------
1 | 7 | 7
1 | 6 | 7
1 | 5 | 7
1 | 4 | 7
1 | 3 | 7
1 | 2 | 7
1 | 1 | 7
2 | 9 | 2
2 | 8 | 2
And what I want is this:
A | B | number
---------
1 | 2 | 7
1 | 1 | 7
So really my question is - how can I filter out the last 5 rows in B for every row in A like this? I don't even know if I am heading in the right direction with what I've got so far, but it seemed like a reasonable starting point.

Try this query:
SELECT *
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY b.A ORDER BY b.id DESC) AS RowNum, ... other columns from b ...
FROM dbo.B as b
) x
WHERE x.RowNum > 5
Note: ROW_NUMBER() OVER(PARTITION BY b.A ORDER BY b.id DESC) starts numbering rows from 1 for every b.A in descending order, so the last row has RowNum = 1, the last but one row has RowNum = 2, etc.
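Filled in with the sample data from the question, the idea can be sketched as below. The table variable @B and its columns id and a are assumptions taken from the question's join (B.a = A.id), and a higher B.id is assumed to mean a more recent row. COUNT(*) OVER supplies the number column without a separate CTE.

```sql
-- Assumed shape of table B, filled with the question's sample rows
DECLARE @B TABLE (id int, a int);
INSERT INTO @B VALUES (1,1),(2,1),(3,1),(4,1),(5,1),(6,1),(7,1),(8,2),(9,2);

SELECT x.a AS A, x.id AS B, x.number
FROM (
    SELECT b.a, b.id,
           -- 1 = most recent row for this a, 2 = second most recent, ...
           ROW_NUMBER() OVER (PARTITION BY b.a ORDER BY b.id DESC) AS RowNum,
           -- total rows for this a, repeated on every row
           COUNT(*)     OVER (PARTITION BY b.a)                    AS number
    FROM @B AS b
) x
WHERE x.RowNum > 5          -- drops the 5 most recent rows per a
ORDER BY x.a, x.id DESC;
```

This returns the two rows from the desired output, (1, 2, 7) and (1, 1, 7). Rows for A = 2 disappear on their own, because none of them has a RowNum above 5.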

WITH cte (id, number) AS (
SELECT A.id, COUNT(*)
FROM A INNER JOIN B ON A.id=B.a
GROUP BY A.id
)
SELECT TOP 5 c.id, B.id, number
FROM cte c
INNER JOIN B ON B.a=c.id
WHERE number > 5
ORDER BY c.id ASC, B.id ASC;
GO

Get value with MAX(date) from two table

I have two tables.
MainTable:
MainID | LastValue | LastReadingDate
1 | 234 | 01.01.2012
2 | 534 | 03.02.2012
Readings:
MainID | ValueRead | ReadingDate
1 | 123 | 03.02.2012
1 | 488 | 04.03.2012
2 | 324 | 03.02.2012
2 | 683 | 05.04.2012
I want to get
SELECT MainTable.MainID, MainTable.LastValue, MainTable.LastReadingDate, (SELECT ValueRead, MAX(ReadingDate)
FROM Readings
WHERE Readings.MainID=MainTable.MainID ORDER BY ValueRead)
In other words, I want to get the current LastValue and LastReadingDate from MainTable along side the ValueRead with the most recent ReadingDate from Readings.

Here is a query you could use. It shows all MainTable entries, including those that don't have a "Reading" entry yet. Change the LEFT JOIN to an INNER JOIN if you don't want that.
WITH LastReads AS (
SELECT ROW_NUMBER() OVER (PARTITION BY MainID ORDER BY ReadingDate DESC) AS ReadingNumber,
MainID,
ValueRead,
ReadingDate
FROM Readings
)
SELECT M.MainID, M.LastValue, M.LastReadingDate, R.ValueRead, R.ReadingDate
FROM MainTable M
LEFT OUTER JOIN LastReads R
ON M.MainID = R.MainID
AND R.ReadingNumber = 1 -- Last reading, use 2 or 3 to get the 2nd newest, 3rd newest, etc.
SQLFiddle-link: http://sqlfiddle.com/#!3/16c68/3
Another link with N number of readings per mainid: http://sqlfiddle.com/#!3/16c68/4

Not tried this myself, but here goes. Please try
select max(r.readingdate), max(t.lastvalue), max(t.lastreadingdate)
from readings r inner join
( select MainID, LastValue, LastReadingDate
from MainTable m
where LastReadingDate =
(select max(minner.LastReadingDate)
from MainTable minner
where minner.MainID = m.MainID
)
) t
on (r.mainid = t.mainid)

try this:
select M.LastValue, M.LastReadingDate,
(select top 1 ValueRead from Readings where MainID=M.MainID order by ReadingDate desc)
from MainTable M
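If both ValueRead and ReadingDate of the latest reading are needed, OUTER APPLY is one way to extend the TOP 1 idea. This is a sketch with table variables standing in for the real tables; the column types are assumptions, and the sample dates are read as dd.mm.yyyy.

```sql
-- Assumed shapes for MainTable and Readings, filled with the question's sample rows
DECLARE @MainTable TABLE (MainID int, LastValue int, LastReadingDate date);
DECLARE @Readings  TABLE (MainID int, ValueRead int, ReadingDate date);

INSERT INTO @MainTable VALUES (1,234,'20120101'),(2,534,'20120203');
INSERT INTO @Readings  VALUES (1,123,'20120203'),(1,488,'20120304'),
                              (2,324,'20120203'),(2,683,'20120405');

SELECT M.MainID, M.LastValue, M.LastReadingDate, R.ValueRead, R.ReadingDate
FROM @MainTable M
OUTER APPLY (
    -- most recent reading for the current MainTable row; NULLs if there is none
    SELECT TOP (1) rd.ValueRead, rd.ReadingDate
    FROM @Readings rd
    WHERE rd.MainID = M.MainID
    ORDER BY rd.ReadingDate DESC
) R;
```

For the sample data this pairs MainID 1 with reading 488 (2012-03-04) and MainID 2 with reading 683 (2012-04-05). Using CROSS APPLY instead would drop MainTable rows that have no readings.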
