Netezza: Show dates even if 0 data for that day - netezza

I have this query through an odbc connection in excel for a refreshable report with data for every 4 weeks. I need to show the dates in each of the 4 weeks even if there is no data for that day because this data is then linked to a Graph. Is there a way to do this?
thanks.
Select b.INV_DT, sum( a.ORD_QTY) as Ordered, sum( a.SHIPPED_QTY) as Shipped
from fct_dly_invoice_detail a, fct_dly_invoice_header b, dim_invoice_customer c
where a.INV_HDR_SK = b.INV_HDR_SK
and b.DIM_INV_CUST_SK = c.DIM_INV_CUST_SK
and a.SRC_SYS_CD = 'ABC'
and a.NDC_NBR is not null
**and b.inv_dt between CURRENT_DATE - 16 and CURRENT_DATE**
and b.store_nbr in (2851, 2963, 3249, 3385, 3447, 3591, 3727, 4065, 4102, 4289, 4376, 4793, 5209, 5266, 5312, 5453, 5569, 5575, 5892, 6534, 6571, 7110, 9057, 9262, 9652, 9742, 10373, 12392, 12739, 13870
)
group by 1

The general purpose solution to this is to create a date dimension table, and then perform an outer join to that date dimension table on the INV_DT column.
There are tons of good resources you can search for on creating a good date dimension table, so I'll just create a quick and dirty (and trivial) example here. I highly recommend some research in that area if you'll be doing a lot of BI/reporting.
If our table we want to report from looks like this:
Table "TABLEZ"
Attribute | Type | Modifier | Default Value
-----------+--------+----------+---------------
AMOUNT | BIGINT | |
INV_DT | DATE | |
Distributed on random: (round-robin)
select * from tablez order by inv_dt
AMOUNT | INV_DT
--------+------------
1 | 2015-04-04
1 | 2015-04-04
1 | 2015-04-06
1 | 2015-04-06
(4 rows)
and our report looks like this:
SELECT inv_dt,
SUM(amount)
FROM tablez
WHERE inv_dt BETWEEN CURRENT_DATE - 5 AND CURRENT_DATE
GROUP BY inv_dt;
INV_DT | SUM
------------+-----
2015-04-04 | 2
2015-04-06 | 2
(2 rows)
We can create a date dimension table that contains a row for every date (or ate last 1024 days in the past and 1024 days in the future using the _v_vector_idx view in this example).
create table date_dim (date_dt date);
insert into date_dim select current_date - idx from _v_vector_idx;
insert into date_dim select current_date + idx +1 from _v_vector_idx;
Then our query would look like this:
SELECT d.date_dt,
SUM(amount)
FROM tablez a
RIGHT OUTER JOIN date_dim d
ON a.inv_dt = d.date_dt
WHERE d.date_dt BETWEEN CURRENT_DATE -5 AND CURRENT_DATE
GROUP BY d.date_dt;
DATE_DT | SUM
------------+-----
2015-04-01 |
2015-04-02 |
2015-04-03 |
2015-04-04 | 2
2015-04-05 |
2015-04-06 | 2
(6 rows)
If you actually needed a zero value instead of a NULL for the days where you had no data, you could use a COALESCE or NVL like this:
SELECT d.date_dt,
COALESCE(SUM(amount),0)
FROM tablez a
RIGHT OUTER JOIN date_dim d
ON a.inv_dt = d.date_dt
WHERE d.date_dt BETWEEN CURRENT_DATE -5 AND CURRENT_DATE
GROUP BY d.date_dt;
DATE_DT | COALESCE
------------+----------
2015-04-01 | 0
2015-04-02 | 0
2015-04-03 | 0
2015-04-04 | 2
2015-04-05 | 0
2015-04-06 | 2
(6 rows)

I agree with #ScottMcG that you need to get the list of dates. However if you are in a situation where you aren't allowed to create a table. You can simplify things. All you need is a table that has at least 28 rows. Using your example, this should work.
select date_list.dt_nm, nvl(results.Ordered,0) as Ordered, nvl(results.Shipped,0) as Shipped
from
(select row_number() over(order by sub.arb_nbr)+ (current_date -28) as dt_nm
from (select rowid as arb_nbr
from fct_dly_invoice_detail b
limit 28) sub ) date_list left outer join
( Select b.INV_DT, sum( a.ORD_QTY) as Ordered, sum( a.SHIPPED_QTY) as Shipped
from fct_dly_invoice_detail a inner join
fct_dly_invoice_header b
on a.INV_HDR_SK = b.INV_HDR_SK
and a.SRC_SYS_CD = 'ABC'
and a.NDC_NBR is not null
**and b.inv_dt between CURRENT_DATE - 16 and CURRENT_DATE**
and b.store_nbr in (2851, 2963, 3249, 3385, 3447, 3591, 3727, 4065, 4102, 4289, 4376, 4793, 5209, 5266, 5312, 5453, 5569, 5575, 5892, 6534, 6571, 7110, 9057, 9262, 9652, 9742, 10373, 12392, 12739, 13870)
inner join
dim_invoice_customer c
on b.DIM_INV_CUST_SK = c.DIM_INV_CUST_SK
group by 1 ) results
on date_list.dt_nm = results.inv_dt

Related

How to repeat values in case of null values on left join

I have a table with a calendar, and a table with rates. In the table with the rates, there are no values existing for days in the weekend. I'm trying to join the two, in order to have a table where there is a rate for all days, and I need the rates in the weekend to be the latest available rate. Instad of it showing NULL values, as it would when you make a left join and the record doesn't exist, it should just take the latest available, repeating the previous value.
I have the below code, which works, but it takes 2 min to do on 7,397 rows, which is way too long.
Does anyone know a faster way to get the same results?
SELECT
c.CalendarID,
MAX(r.RateID)
FROM Dim_Calendar c
LEFT JOIN Dim_Rates r ON r.RateDate <= c.CalendarID
What I get without <= and just an = is the following
CalendarID | RateID
20131001 | 2
20131002 | 3
20131003 | 4
20131004 | 5
20131005 | NULL
20131006 | NULL
20131007 | 6
And this is the desired table:
CalendarID | RateID
20131001 | 2
20131002 | 3
20131003 | 4
20131004 | 5
20131005 | 5
20131006 | 5
20131007 | 6
You can use LAG() window function:
SELECT c.CalendarID,
COALESCE(
r.RateID,
LAG(r.RateID, 1) OVER (ORDER BY c.CalendarID),
LAG(r.RateID, 2) OVER (ORDER BY c.CalendarID)
) RateID
FROM Dim_Calendar c LEFT JOIN Dim_Rates r
ON r.RateDate = c.CalendarID
ORDER BY c.CalendarID
See the demo.
Results:
> CalendarID | RateID
> ---------: | :-----
> 20131001 | 2
> 20131002 | 3
> 20131003 | 4
> 20131004 | 5
> 20131005 | 5
> 20131006 | 5
> 20131007 | 6
You could use a correlated subquery to fill the gaps:
SELECT
c.CalendarID,
(SELECT TOP 1 r.RateID FROM Dim_Rates r
WHERE r.RateDate <= c.CalendarID AND r.RateID IS NOT NULL
ORDER BY r.RateDate DESC) AS RateID
FROM Dim_Calendar c
ORDER BY c.CalendarID;
This query can be improved by using the following index:
CREATE INDEX idx ON Dim_Rates (RateDate, RateID);
As pointed out, you need to check for proper and covering indexing. It appears you are running a against a DW DB and if that is the case then you can replace the CTE with indexed temp tables if the esitmated row count approximation is way off in the query plan.
;WITH NormalizedData AS
(
SELECT
RateID,CalendarID,
VirtualGroupID = SUM(LastRecordBeforeGap) OVER (ORDER BY CalendarID ROWS UNBOUNDED PRECEDING)
FROM
(
SELECT RateID,CalendarID,
LastRecordBeforeGap = CASE WHEN LEAD(RateID) OVER(ORDER BY CalendarID) IS NULL AND RateID IS NOT NULL THEN 1 ELSE 0 END
FROM
Dim_Calendar c
LEFT JOIN Dim_Rates r ON r.RateDate = c.CalendarID
)AS x
)
SELECT
RateID = ISNULL(RateID, SUM(RateID) OVER(PARTITION BY VirtualGroupID)),
CalendarID
FROM
NormalizedData

Joining two tables and need to have MAX aggregate function in ON clause

This is my code! I want to give a part id and purchase order id to my report and it brings all the related information with those specification. The important thing is that, if we have same purchase order id and part id we need the code to return the result with the highest transaction id. The following code is not providing what I expected. Could you please help me?
SELECT MAX(INVENTORY_TRANS.TRANSACTION_ID), INVENTORY_TRANS.PART_ID
, INVENTORY_TRANS.PURC_ORDER_ID, TRACE_INV_TRANS.QTY, TRACE_INV_TRANS.CREATE_DATE, TRACE_INV_TRANS.TRACE_ID
FROM INVENTORY_TRANS
JOIN TRACE_INV_TRANS ON INVENTORY_TRANS.TRANSACTION_ID = TRACE_INV_TRANS.TRANSACTION_ID
WHERE INVENTORY_TRANS.PART_ID = #PartID
AND INVENTORY_TRANS.PURC_ORDER_ID = #PurchaseOrderID
GROUP BY TRACE_INV_TRANS.QTY, TRACE_INV_TRANS.CREATE_DATE, TRACE_INV_TRANS.TRACE_ID, INVENTORY_TRANS.PART_ID
, INVENTORY_TRANS.PURC_ORDER_ID
The sample of trace_inventory_trans table is :
part_id trace_id transaction id qty create_date
x 1 10
x 2 11
x 3 12
the sample of inventory_trans table is :
transaction_id part_id purc_order_id
11 x p20
12 x p20
I wanted to have the result of biggest transaction which is transaction 12 but it shows me transaction 11
I would use a sub-query to find the MAX value, then join that result to the other table.
The ORDER BY + TOP (1) returns the MAX value for transaction_id.
SELECT
inv.transaction_id
,inv.part_id
,inv.purc_order_id
,tr.qty
,tr.create_date
,tr.trace_id
FROM
(
SELECT TOP (1)
transaction_id,
part_id,
purc_order_id
FROM
INVENTORY_TRANS
WHERE
part_id = #PartID
AND
purc_order_id = #PurchaseOrderID
ORDER BY
transaction_id DESC
) AS inv
JOIN
TRACE_INV_TRANS AS tr
ON inv.transaction_id = tr.transaction_id;
Results:
+----------------+---------+---------------+------+-------------+----------+
| transaction_id | part_id | purc_order_id | qty | create_date | trace_id |
+----------------+---------+---------------+------+-------------+----------+
| 12 | x | p20 | NULL | NULL | 3 |
+----------------+---------+---------------+------+-------------+----------+
Rextester Demo

using all values from one column in another query

I am trying to find a solution for the following issue that I have in sql-server:
I have one table t1 of which I want to use each date for each agency and loop it through the query to find out the avg_rate. Here is my table t1:
Table T1:
+--------+-------------+
| agency | end_date |
+--------+-------------+
| 1 | 2017-10-01 |
| 2 | 2018-01-01 |
| 3 | 2018-05-01 |
| 4 | 2012-01-01 |
| 5 | 2018-04-01 |
| 6 | 2017-12-01l |
+--------+-------------+
I literally want to use all values in the column end_date and plug it into the query here (I marked it with ** **):
with averages as (
select a.id as agency
,c.rate
, avg(c.rate) over (partition by a.id order by a.id ) as avg_cost
from table_a as a
join rates c on a.rate_id = c.id
and c.end_date = **here I use all values from t1.end_date**
and c.Start_date = **here I use all values from above minus half a year** = dateadd(month,-6,end_date)
group by a.id
,c.rate
)
select distinct agency, avg_cost from averages
order by 1
The reason why I need two dynamic dates is that the avg_rates vary if you change the timeframe between these dates.
My problem and my question is now:
How can you take the end_date from table t1 plug it into the query where c.end_date is and loop if through all values in t1.end_date?
I appreciate your help!
Do you really need a windowed average? Try this out.
;with timeRanges AS
(
SELECT
T.end_date,
start_date = dateadd(month,-6, T.end_date)
FROM
T1 AS T
)
select
a.id as agency,
c.rate,
T.end_date,
T.start_date,
avg_cost = avg(c.rate)
from
table_a as a
join rates c on a.rate_id = c.id
join timeRanges AS T ON A.DateColumn BETWEEN T.start_date AND T.end_date
group by
a.id ,
c.rate,
T.end_date,
T.start_date
You need a date column to join your data against T1 (I called it DateColumn in this example), otherwise all time ranges would return the same averages.
I can think of several ways to do this - Cursor, StoredProcedure, Joins ...
Given the simplicity of your query, a cartesian product (Cross Join) of Table T1 against the averages CTE should do the magic.

TSQL Conditional Where or Group By?

I have a table like the following:
id | type | duedate
-------------------------
1 | original | 01/01/2017
1 | revised | 02/01/2017
2 | original | 03/01/2017
3 | original | 10/01/2017
3 | revised | 09/01/2017
Where there may be either one or two rows for each id. If there are two rows with same id, there would be one with type='original' and one with type='revised'. If there is one row for the id, type will always be 'original'.
What I want as a result are all the rows where type='revised', but if there is only one row for a particular id (thus type='original') then I want to include that row too. So desired output for the above would be:
id | type | duedate
1 | revised | 02/01/2017
2 | original | 03/01/2017
3 | revised | 09/01/2017
I do not know how to construct a WHERE clause that conditionally checks whether there are 1 or 2 rows for a given id, nor am I sure how to use GROUP BY because the revised date could be greater than or less than than the original date so use of aggregate functions MAX or MIN don't work. I thought about using CASE somehow, but also do not know how to construct a conditional that chooses between two different rows of data (if there are two rows) and display one of them rather than the other.
Any suggested approaches would be appreciated.
Thanks!
you can use row number for this.
WITH T AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Type DESC) AS RN
FROM YourTable
)
SELECT *
FROM T
WHERE RN = 1
Is something like this sufficient?
SELECT *
FROM mytable m1
WHERE type='revised'
or 1=(SELECT COUNT(*) FROM mytable m2 WHERE m2.id=m1.id)
You could use a subquery to take the MAX([type]). In this case it works for [type] since alphabetically we want revised first, then original and "r" comes after "o" in the alphabet. We can then INNER JOIN back on the same table with the matching conditions.
SELECT T2.*
FROM (
SELECT id, MAX([type]) AS [MAXtype]
FROM myTABLE
GROUP BY id
) AS dT INNER JOIN myTable T2 ON dT.id = T2.id AND dT.[MAXtype] = T2.[type]
ORDER BY T2.[id]
Gives output:
id type duedate
1 revised 2017-02-01
2 original 2017-03-01
3 revised 2017-09-01
Here is the sqlfiddle: http://sqlfiddle.com/#!6/14121f/6/0

Updating 1 table from another using wheres

Trying to update one column, from another table with the highest Date.
Table 1 Example:
PartNumber | Cost
1000 | .10
1001 | .20
Table 2 Example:
PartNumber | Cost | Date
1000 | .10 | 2017-01-01
1000 | .50 | 2017-02-01
1001 | .20 | 2017-01-01
1002 | .50 | 2017-02-02
I would like to update table 1 with the most recent values from table2, which would be .50 for each... The query I use to update this has worked just fine until I realized I was not grabbing the correct Cost because there were multiples.. I now want to grab the highest dated revision.
My query:
UPDATE dex_mfgx..insp_master
SET dex_mfgx..insp_master.costperpart = t2.sct_cst_tot
FROM dex_mfgx..insp_master AS t1
INNER JOIN qad_repl..sct_det_sql AS t2
ON t1.partnum = t2.sct_part
WHERE t1.partnum = t2.sct_part and t2.sct_cst_date = MAX(t2.sct_cst_date) ;
My Error:
Msg 147, Level 15, State 1, Line 6
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.
Not having much luck with HAVING or GROUPING, although I havent used them much..
Any have an idea that would help?
I think I understand what you are trying to solve now. Thanks to Lamak for setting me straight as I was way off base originally.
Something like this I think is what you are looking for.
with TotalCosts as
(
SELECT t2.sct_cst_tot
, t1.partnum
, RowNum = ROW_NUMBER() over(partition by t1.partnun order by t2.sct_cst_date desc)
FROM dex_mfgx..insp_master AS t1
INNER JOIN qad_repl..sct_det_sql AS t2 ON t1.partnum = t2.sct_part
)
update t1
set costperpart = tc.sct_cst_tot
from dex_mfgx..insp_master AS t1
join TotalCosts tc on tc.partnum = t1.partnum
where tc.RowNum = 1

Resources