How to Optimize for Windowed Aggregate Partition-By Clause in Netezza

I have a 10 billion row FACT table in Netezza, and I want to compute ROW_NUMBER(), MAX() OVER, and SUM() OVER in one query. When I do, the query runs for more than 3 hours. Is there any way to improve the query performance? The table is distributed on the 4 columns that are part of the partition clause (COLA, COLB, COLC, COLD).
For example:
SUM(STR_QTY) OVER (
PARTITION BY
COLA
,COLB
,COLC
,COLD
) AS SLS_RTRN_QTY
,
SUM(STR_QTY_1) OVER (
PARTITION BY
COLA
,COLB
,COLC
,COLD
) AS VAL_QTY
,MIN(ITM_FST_DT) OVER (
PARTITION BY COLA
,COLB
) AS FIRST_DT
,MAX(ITM_LST_DT) OVER (
PARTITION BY COLA
,COLB
) AS LAST_DT
Edit 1: Original query
SELECT a.*
FROM (
SELECT F.DT_KEY AS DT_KEY
,F.COL_KEY AS COL_KEY
,F.PCK_ITM_KEY AS PCK_ITM_KEY
,F.COLC AS COLC
,F.COLD AS COLD
,F.COLA AS COLA
,F.COLB AS COLB
,F.COLC AS COLC
,F.SH_QTY AS SH_QTY
,SUM(F.SLS_QTY) OVER (
PARTITION BY F.COLD
,F.COLA
,F.COLB
,F.COLC
) AS SLS_QTY
,SUM(F.SLS_RTRN_QTY) OVER (
PARTITION BY F.COLD
,F.COLA
,F.COLB
,F.COLC
) AS SLS_RTRN_QTY
,SUM(F.PCHSE_QTY) OVER (
PARTITION BY F.COLD
,F.COLA
,F.COLB
,F.COLC
) AS PCHSE_QTY
,MAX(F.LST_ML_DT) OVER (
PARTITION BY F.COLA
,F.COLC
) AS LST_ML_DT
,F.LST_MODFD_DTTM AS LST_MODFD_DTTM
,ROW_NUMBER() OVER (
PARTITION BY F.COLD
,F.COLA
,F.COLB
,F.COLC
,F.COLE ORDER BY F.DT_KEY DESC
) AS RNK
FROM FCT_ITEM F
) a
WHERE a.RNK = 1;

This query will cause the entire table to be redistributed on COLA and COLB. If the set of distribution columns is a subset of the partition columns, then you won't have the expensive redistribution.
As a general rule, use the fewest columns possible in your distribution clause while still maintaining a fairly even distribution.
If just COLA or COLB alone would give an even distribution, then go with one of those.
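To test that advice, a CTAS is usually the quickest way to try a new distribution and compare plans. This is only a sketch under the assumption that COLA alone gives an even spread; FCT_ITEM_COLA is a hypothetical name for the copy, and the skew check uses Netezza's datasliceid pseudo-column.
-- Hypothetical: redistribute a copy of the fact table on a single column that
-- appears in every PARTITION BY clause, so the window functions can run co-located.
CREATE TABLE FCT_ITEM_COLA AS
SELECT *
FROM FCT_ITEM
DISTRIBUTE ON (COLA);
-- Sanity-check the skew: row counts per data slice should be roughly even.
SELECT DATASLICEID, COUNT(*) AS ROW_CNT
FROM FCT_ITEM_COLA
GROUP BY DATASLICEID
ORDER BY ROW_CNT DESC;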

Related

How to select the top 1 in case distinct returns 2 rows

I have a SELECT DISTINCT query that can return 2 rows with the same code, since not all columns have the same value. Now my boss wants to get the first one. How do I do it? Below is the sample result. I want to return only the first row for each of the two unique pro values.
Use ROW_NUMBER() in your query. See the documentation on ROW_NUMBER() for more info.
; with cte as (
select row_number() over (partition by pro order by actual_quantity) as Slno, * from yourtable
) select * from cte where slno = 1
Your chances of getting the proper answer are much higher if you spend some time preparing the question properly. Provide the DDL and sample data, and add the desired result.
To solve your problem, you need to know the right uniqueness order to get one record per window group. Search for window functions. In my example the uniqueness rule is: a single row for every pro, with the earliest proforma_invoice_received_date and the smallest actual_quantity for that date.
DROP TABLE IF EXISTS #tmp;
GO
CREATE TABLE #tmp
(
pro VARCHAR(20) ,
actual_quantity DECIMAL(12, 2) ,
proforma_invoice_received_date DATE ,
import_permit DATE
);
GO
INSERT INTO #tmp
( pro, actual_quantity, proforma_invoice_received_date, import_permit )
VALUES ( 'N19-00945', 50000, '20190516', '20190517' ),
( 'N19-00945', 50001, '20190516', '20190517' )
, ( 'N19-00946', 50002, '20190516', '20190517' )
, ( 'N19-00946', 50003, '20190516', '20190517' );
SELECT a.pro ,
a.actual_quantity ,
a.proforma_invoice_received_date ,
a.import_permit
FROM ( SELECT pro ,
actual_quantity ,
proforma_invoice_received_date ,
import_permit ,
ROW_NUMBER() OVER ( PARTITION BY pro ORDER BY proforma_invoice_received_date, actual_quantity ) AS rn
FROM #tmp
) a
WHERE rn = 1;
-- you can also use WITH TIES for that to save some lines of code
SELECT TOP ( 1 ) WITH TIES
pro ,
actual_quantity ,
proforma_invoice_received_date ,
import_permit
FROM #tmp
ORDER BY ROW_NUMBER() OVER ( PARTITION BY pro ORDER BY proforma_invoice_received_date, actual_quantity );
DROP TABLE #tmp;
Try this-
SELECT * FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY pro ORDER BY Pro) RN
-- You need to add other columns to the ORDER BY clause along with 'pro'
-- to get your desired row; otherwise you get whichever row the query
-- returns first when ordering only by 'pro', and that can vary between executions.
FROM your_table
)A
WHERE RN = 1
CREATE TABLE T (
A [numeric](10, 2) NULL,
B [numeric](10, 2) NULL
)
INSERT INTO T VALUES (100,20)
INSERT INTO T VALUES (100,30)
INSERT INTO T VALUES (200,40)
INSERT INTO T VALUES (200,50)
select *
from T
/*
A B
100.00 20.00
100.00 30.00
200.00 40.00
200.00 50.00
*/
select U.A, U.B
from
(select row_number() over(Partition By A Order By B) as row_num, *
from T ) U
where row_num = 1
/*
A B
100.00 20.00
200.00 40.00
*/

SQL Server 2008 ROW_NUMBER() order by Clustered PK slow

I have a simple query below that joins 3 of my tables. Since the OFFSET and FETCH statements are not available in SQL Server 2008, I have implemented ROW_NUMBER() in one of my paginated order reports.
SELECT * FROM
(
SELECT
ROW_NUMBER() OVER ( ORDER BY OrderProductDetail.ID ) AS RowNum,
*
FROM
Order JOIN
OrderProduct ON Order.ID = OrderProduct.OrderID JOIN
OrderProductDetail ON OrderProduct.ID = OrderProductDetail.OrderProductID
WHERE
Order.Date BETWEEN '2018-01-01 00:00:00.000' AND '2018-02-01 00:00:00.000'
) AS OrderDetailView
WHERE RowNum BETWEEN 1 AND 1000;
With over 3M records in the table, the above query took 1 minute to complete; the records returned are capped at 1000.
However, if I simply remove the RowNum condition from the WHERE clause, the query completes within 3 seconds and a total of 1700 records is returned. (The result is the same if I only run the sub-query portion.)
SELECT * FROM
(
SELECT
ROW_NUMBER() OVER ( ORDER BY OrderProductDetail.ID ) AS RowNum,
*
FROM
Order JOIN
OrderProduct ON Order.ID = OrderProduct.OrderID JOIN
OrderProductDetail ON OrderProduct.ID = OrderProductDetail.OrderProductID
WHERE
Order.Date BETWEEN '2018-01-01 00:00:00.000' AND '2018-02-01 00:00:00.000'
) AS OrderDetailView
Order.ID = Unique Clustered PK (Int)
Order.Date = Non-Clustered Index (Timestamp)
OrderProduct.ID = Unique Clustered PK (Int)
OrderProductDetail.ID = Unique Clustered PK (Int)
Some other test cases I've performed:
( ORDER BY Order.Date ) AS RowNumber >> Fast
( ORDER BY Order.ID ) AS RowNumber >> Fast
Question: How can I improve the performance?
UPDATE STATISTICS Order WITH FULLSCAN;
UPDATE STATISTICS OrderProduct WITH FULLSCAN;
UPDATE STATISTICS OrderProductDetail WITH FULLSCAN;
The query finally went back to normal after I executed the above commands; my DBA didn't include the FULLSCAN option on the first attempt, which is why it wasn't working.
Thanks @Jeroen Mostert!
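For anyone hitting the same thing, a quick way to see whether statistics are stale before reaching for FULLSCAN is to check their last update dates. A minimal sketch, assuming the three tables live in the dbo schema (table names taken from the question):
-- List every statistics object on the joined tables and when it was last updated.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id IN (OBJECT_ID('dbo.[Order]'),
                      OBJECT_ID('dbo.OrderProduct'),
                      OBJECT_ID('dbo.OrderProductDetail'))
ORDER BY last_updated;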

how to use Group by with top count in MDX query

I have an example where we prepared a query in SQL to fetch the appropriate results.
SQL query:
select partnerid,BrandDesc,ActualRetailValue
from
(
select DENSE_RANK() over (partition by partnerid order by sum(ActualRetailValue) desc) as rnk,
partnerid,BrandDesc,sum(ActualRetailValue) as ActualRetailValue
from JDASales
where partnerid in (693,77)
group by partnerid,BrandDesc
) as A
where rnk <=5
order by partnerid,rnk
Output:
I want this result with an MDX query. I have even tried this code:
SELECT
NON EMPTY
{[Measures].[Actual Retail Value]} ON COLUMNS
,NON EMPTY
[DimBrands].[Brand].[Brand].ALLMEMBERS
*
TopCount
(
[DimPartners].[Partner].[Partner].ALLMEMBERS
*
[DimSKU].[XXX Desc].[XXX Desc].ALLMEMBERS
,5
,[Measures].[Actual Retail Value]
) ON ROWS
FROM
(
SELECT
{[DimPartners].[Partner].&[1275]} ON COLUMNS
FROM
(
SELECT
{[Dim Date].[Fiscal Year].&[2014-01-01T00:00:00]} ON COLUMNS
FROM [SALES]
)
)
WHERE
[Dim Date].[Fiscal Year].&[2014-01-01T00:00:00];
You can amend the rows snippet to use the GENERATE function:
SELECT
NON EMPTY
{[Measures].[Actual Retail Value]} ON 0
,NON EMPTY
GENERATE(
[DimBrands].[Brand].[Brand].ALLMEMBERS AS B
,
TopCount(
B.CURRENTMEMBER
*[DimPartners].[Partner].[Partner].ALLMEMBERS
*[DimSKU].[XXX Desc].[XXX Desc].ALLMEMBERS
,5
,[Measures].[Actual Retail Value]
)
) ON ROWS
...
...
This function's usage is detailed here: https://msdn.microsoft.com/en-us/library/ms145526.aspx

TSQL matching the first instances of multiple values in a resultset

Say I have part of a large query, as below, that returns a resultset with multiple rows of the same key information (PolNum) with different value information (PolPremium), in a random order.
Would it be possible to select the first matching PolNum rows and sum up the PolPremium? In this case I know that there are 2 PolNums used, so given the screenshot of the resultset (yes, I know it starts at row 14, for illustration purposes), return the first values and sum the result.
First match for PolNum 000035789547:
(ROW 14) PolPremium - 32.00
First match for PolNum 000071709897:
(ROW 16) PolPremium - 706043.00
Total summed should be 32.00 + 706043.00 = 706075.00
Query
OUTER APPLY
(
SELECT PolNum, PolPremium
FROM PN20
WHERE PolNum IN(SELECT PolNum FROM SvcPlanPolicyView
WHERE SvcPlanPolicyView.ControlNum IN (SELECT val AS ServedCoverages FROM ufn_SplitMax(
(SELECT TOP 1 ServicedCoverages FROM SV91 WHERE SV91.AccountKey = 3113413), ';')))
ORDER BY PN20.PolEffDate DESC
)
Resultset
Suppose that picture is the final result your query produces. Then you can do something like:
DECLARE #t TABLE
(
PolNum VARCHAR(20) ,
PolPremium MONEY
)
INSERT INTO #t
VALUES ( '000035789547', 32 ),
( '000035789547', 76 ),
( '000071709897', 706043.00 ),
( '000071709897', 1706043.00 )
SELECT t.PolNum ,
SUM(PolPremium) AS PolPremium
FROM ( SELECT * ,
ROW_NUMBER() OVER ( PARTITION BY PolNum ORDER BY PolPremium ) AS rn
FROM #t
) t
WHERE rn = 1
GROUP BY GROUPING SETS(t.PolNum, ( ))
Output:
PolNum PolPremium
000035789547 32.00
000071709897 706043.00
NULL 706075.00
Just replace #t with your query. Also, I assume that the row with the minimum premium is the first one. You could probably filter the top row in the OUTER APPLY part itself, but it's not really clear to me what is going on there without some sample data.
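Following up on that last point, here is a rough sketch of filtering the first row per PolNum inside the OUTER APPLY itself, assuming PolEffDate DESC is what defines "first" (table, view, and function names are taken from the question; FirstPremium is just a hypothetical alias):
OUTER APPLY
(
    SELECT x.PolNum, x.PolPremium
    FROM (
        -- Number the rows per policy, newest effective date first.
        SELECT PN20.PolNum, PN20.PolPremium,
               ROW_NUMBER() OVER (PARTITION BY PN20.PolNum
                                  ORDER BY PN20.PolEffDate DESC) AS rn
        FROM PN20
        WHERE PN20.PolNum IN (SELECT PolNum FROM SvcPlanPolicyView
              WHERE SvcPlanPolicyView.ControlNum IN (SELECT val FROM ufn_SplitMax(
                    (SELECT TOP 1 ServicedCoverages FROM SV91 WHERE SV91.AccountKey = 3113413), ';')))
    ) x
    WHERE x.rn = 1   -- keep only the first row for each PolNum
) AS FirstPremium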

need to update first non null field_x in non-normalised table

I have the following table that I have to work with.
SQL Fiddle
Basically, it is a product table that stores up to 10 barcodes per product code (simplified example). At any time, any number of those 10 barcode fields might have a value.
I have another table that holds a list of product codes and barcodes, and I need to add these to the product_barcodes table.
I need to perform an update so that each barcode in barcodes_to_import is appended to the product_barcodes table, into the first empty (null) barcode column.
table product_barcodes
product_Code  barcode_1  barcode_2  barcode_3  barcode_4  barcode_5
ABC           1          2          3
BCD           4
table barcodes_to_import
product_code  barcode
ABC           7
BCD           8
Expected output:
product_Code  barcode_1  barcode_2  barcode_3  barcode_4  barcode_5
ABC           1          2          3          7
BCD           4          8
create table product_barcodes(product_Code varchar(10),barcode_1 int,barcode_2 int,barcode_3 int
,barcode_4 int,barcode_5 int,barcode_6 int,barcode_7 int,barcode_8 int,barcode_9 int,barcode_10 int)
create table barcodes_to_import(product_code varchar(10),barcode int)
--Sample values inserted as shown above
SELECT * FROM product_barcodes
SELECT * FROM barcodes_to_import
--Output Query
;with cte
as
(
select product_code,data,col_name
from product_barcodes
unpivot
(
data for col_name in (
barcode_1,barcode_2,barcode_3,barcode_4,barcode_5
,barcode_6,barcode_7,barcode_8,barcode_9,barcode_10
)
) upvt
)
,cte1
as
(
select *,ROW_NUMBER() OVER(PARTITION BY product_code ORDER BY col_name) as rn
from
(
select product_code, data,col_name from cte
union all
select product_code,barcode,'barcode_z' as col_name from barcodes_to_import
) t
)
select
product_Code
,SUM([1]) as barcode_1
,SUM([2]) as barcode_2
,SUM([3]) as barcode_3
,SUM([4]) as barcode_4
,SUM([5]) as barcode_5
,SUM([6]) as barcode_6
,SUM([7]) as barcode_7
,SUM([8]) as barcode_8
,SUM([9]) as barcode_9
,SUM([10]) as barcode_10
from cte1
PIVOT
(
AVG(data) for rn in ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10])
) pvt
GROUP BY product_Code
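Since the question asks for an UPDATE rather than a pivoted SELECT, here is a minimal alternative sketch, assuming at most one new barcode per product_code in barcodes_to_import and that the populated columns are left-packed with no gaps (column names come from the question's DDL; extend the same CASE pattern through barcode_10):
-- Each CASE sees the pre-update values, so exactly one column changes per row:
-- the first barcode column that is still NULL receives the imported barcode.
UPDATE pb
SET barcode_1 = CASE WHEN pb.barcode_1 IS NULL
                     THEN i.barcode ELSE pb.barcode_1 END,
    barcode_2 = CASE WHEN pb.barcode_1 IS NOT NULL AND pb.barcode_2 IS NULL
                     THEN i.barcode ELSE pb.barcode_2 END,
    barcode_3 = CASE WHEN pb.barcode_2 IS NOT NULL AND pb.barcode_3 IS NULL
                     THEN i.barcode ELSE pb.barcode_3 END,
    barcode_4 = CASE WHEN pb.barcode_3 IS NOT NULL AND pb.barcode_4 IS NULL
                     THEN i.barcode ELSE pb.barcode_4 END,
    barcode_5 = CASE WHEN pb.barcode_4 IS NOT NULL AND pb.barcode_5 IS NULL
                     THEN i.barcode ELSE pb.barcode_5 END
FROM product_barcodes pb
JOIN barcodes_to_import i ON i.product_code = pb.product_Code;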
