I am going to post this question divided in sections. Your help would be much appreciated!
Overview of the Issue
I have an issue with data refresh. Right now, I have a pipeline running on Azure Data Factory which is composed of 7 blocks. Each block hosts a stored procedure coded in SQL. Each block generates a table used by the next block. Tables average 3GB in weight. Each block takes between one and five hours to execute.
What I did so far
Given that I am pretty new to this world, I tried to abide by best practices. For example, each statement begins with SET NOCOUNT ON. I always specified column names in select statement, avoiding code blocks like SELECT * FROM TABLE. Also, each table has well defined Primary Keys (defined in the same order for each table) and, being SQL server, they also act as clustered indexes. As for the exemple reported below, these are two screenshots of the indexes:
and
I tried to exploit the MERGE statement. An example of one stored procedure is reported below. Note that I took the max() of all dimension fields because I joined sales and stock together at the level of Week, Product, Season, and Store and then grouped only by Week, Product, and Season. Hence, if I had stores that had a product but did not sale it I would have doubled the rows on NULL attributes.
CREATE PROCEDURE dbo.SP_W_LINE_LEVEL_RT_WORLD2
AS
SET NOCOUNT ON;
MERGE [dbo].[W_LINE_LEVEL_RT_WORLD2] AS TARGET
USING
(
SELECT
COALESCE(sales.FISCAL_YEARWEEK_NM, stock.FISCAL_YEARWEEK_NM) AS FISCAL_YEARWEEK_NM,
COALESCE(sales.BTN, stock.BTN_CD) AS BTN,
COALESCE(sales.ARTICLE_SEASON_DS, stock.ARTICLE_SEASON_DS) AS ARTICLE_SEASON_DS,
MAX(PRODUCT_IMAGE) AS PRODUCT_IMAGE,
--MAX(SUBCHANNEL) AS SUBCHANNEL,
MAX(GENDER) AS GENDER,
MAX(PRODUCT_CATEGORY) AS PRODUCT_CATEGORY,
MAX(FIRST_SEASON) AS FIRST_SEASON,
MAX(LAST_SEASON) AS LAST_SEASON,
MAX(TREND_DS) AS TREND_DS,
MAX(TREND_CD) AS TREND_CD,
MAX(COLOR) AS COLOR,
MAX(MATERIAL) AS MATERIAL,
MAX(FINISH) AS FINISH,
MAX(DEPARTMENT) AS DEPARTMENT,
MAX(PRODUCT_SUBCATEGORY) AS PRODUCT_SUBCATEGORY,
MAX(THEME_DS) AS THEME_DS,
MAX(THEME_FIX) AS THEME_FIX,
MAX(PRODUCT_SUBGROUP_DS) AS PRODUCT_SUBGROUP_DS,
MAX(PRODUCT_GROUP_DS) AS PRODUCT_GROUP_DS,
MAX(MODEL_DS) AS MODEL_DS,
MAX(PRODUCT_FAMILY_DS) AS PRODUCT_FAMILY_DS,
MAX(LAST_YEAR_WEEK_ABILITAZIONE) AS LAST_YEAR_WEEK_ABILITAZIONE,
-- AVERAGE LIFE TO DATE METRICS
MAX(FULL_PRICE_SALES_LTD) / NULLIF(MAX(TOTAL_QTY_LTD),0) AS AVERAGE_FULL_PRICE_CHF,
MAX(NET_REVENUES_LTD) / NULLIF(MAX(TOTAL_QTY_LTD),0) AS AVERAGE_SELLING_PRICE_CHF,
MAX(SALES_VALUE_STD_LTD) / NULLIF(MAX(TOTAL_QTY_LTD),0) AS STANDARD_COST_CHF,
MAX(SALES_VALUE_STD_LTD) / NULLIF(MAX(NET_REVENUES_LTD),0) AS MARGIN,
-- WEEK BY WEEK ANALYSIS
MAX(LAST_YEAR_MOTNH_ABILITAZIONE) AS LAST_YEAR_MOTNH_ABILITAZIONE,
SUM(NET_REVENUES) AS NET_REVENUES,
SUM(NET_SALES) AS NET_SALES,
SUM(FULL_PRICE_SALES) AS FULL_PRICE_SALES,
SUM(VAL_STDCOST) AS VAL_STDCOST,
SUM(TOT_PROMOTION) AS TOT_PROMOTION,
SUM(NET_SALES_MARKDOWN) AS NET_SALES_MARKDOWN,
SUM(NET_REVENUES_MARKDOWN) AS NET_REVENUES_MARKDOWN,
SUM(TOT_DISCOUNT) AS TOT_DISCOUNT,
SUM(SALES_QTY) AS SALES_QTY,
COUNT(DISTINCT STORE_SELLING) AS STORE_SELLING,
COUNT(DISTINCT COALESCE(sales.CLIENT_CD, stock.CLIENT_CD)) AS TOTAL_STORES_LW,
-- LIFE TO DATE ANALYSIS
MAX(NET_SALES_LTD) AS NET_SALES_LTD,
MAX(NET_REVENUES_LTD) AS NET_REVENUES_LTD,
MAX(FULL_PRICE_SALES_LTD) AS FULL_PRICE_SALES_LTD,
MAX(SALES_VALUE_STD_LTD) AS SALES_VALUE_STD_LTD,
MAX(FULL_PRICE_QTY_LTD) AS FULL_PRICE_QTY_LTD,
MAX(TOTAL_QTY_LTD) AS TOTAL_QTY_LTD,
MAX(NET_SALES_LTD_REGION) AS NET_SALES_LTD_REGION,
MAX(NET_REVENUES_LTD_REGION) AS NET_REVENUES_LTD_REGION,
MAX(FULL_PRICE_SALES_LTD_REGION) AS FULL_PRICE_SALES_LTD_REGION,
MAX(SALES_VALUE_STD_LTD_REGION) AS SALES_VALUE_STD_LTD_REGION,
MAX(FULL_PRICE_QTY_LTD_REGION) AS FULL_PRICE_QTY_LTD_REGION,
MAX(TOTAL_QTY_LTD_REGION) AS TOTAL_QTY_LTD_REGION,
COALESCE(MAX(WEEKS_ON_FLOOR_SEASONAL_AW_LTD),0) + COALESCE(MAX(WEEKS_ON_FLOOR_CARRY_AW_LTD),0) + COALESCE(MAX(WEEKS_ON_FLOOR_SEASONAL_SS_LTD),0) + COALESCE(MAX(WEEKS_ON_FLOOR_CARRY_SS_LTD),0) AS WEEKS_ON_FLOOR_LTD,
COALESCE(MAX(WEEKS_ON_FLOOR_SEASONAL_AW_LTD_REGION),0) + COALESCE(MAX(WEEKS_ON_FLOOR_CARRY_AW_LTD_REGION),0) + COALESCE(MAX(WEEKS_ON_FLOOR_SEASONAL_SS_LTD_REGION),0) + COALESCE(MAX(WEEKS_ON_FLOOR_CARRY_SS_LTD_REGION),0) AS WEEKS_ON_FLOOR_LTD_REGION,
COALESCE(MAX(RUNNING_STORE_SELLING_AW_CARRY_LTD),0) + COALESCE(MAX(RUNNING_STORE_SELLING_AW_SEASONAL_LTD),0) + COALESCE(MAX(RUNNING_STORE_SELLING_SS_CARRY_LTD),0) + COALESCE(MAX(RUNNING_STORE_SELLING_SS_SEASONAL_LTD),0) AS STORES_SELLING_LTD,
COALESCE(MAX(RUNNING_STORE_SELLING_AW_CARRY_LTD_REGION),0) + COALESCE(MAX(RUNNING_STORE_SELLING_AW_SEASONAL_LTD_REGION),0) + COALESCE(MAX(RUNNING_STORE_SELLING_SS_CARRY_LTD_REGION),0) + COALESCE(MAX(RUNNING_STORE_SELLING_SS_SEASONAL_LTD_REGION),0) AS STORES_SELLING_LTD_REGION,
-- STOCK DATA
COUNT(DISTINCT stock.CLIENT_CD) AS STOCK_STORES,
SUM(STOCK_QTY_NM) AS STOCK_QTY_NM,
SUM(STOCK_IN_STORE) AS STOCK_IN_STORE,
SUM(STOCK_IN_TRANSIT_TO_STORE) AS STOCK_IN_TRANSIT_TO_STORE,
SUM(STOCK_IN_REGIONAL_DC) AS STOCK_IN_REGIONAL_DC,
SUM(STOCK_IN_STORE) + SUM(STOCK_IN_TRANSIT_TO_STORE) + SUM(STOCK_IN_TRANSIT_TO_STORE) + SUM(STOCK_IN_REGIONAL_DC) AS TOTAL_STOCK_ON_HAND,
SUM(STOCK_ALLOCATED_IN_CENTRAL_WH) AS STOCK_ALLOCATED_IN_CENTRAL_WH,
SUM(STOCK_IN_TRANSIT_TO_REGION) AS STOCK_IN_TRANSIT_TO_REGION,
SUM(STOCK_FREE_CENTRAL_WH) AS STOCK_FREE_CENTRAL_WH
FROM
W_SALES_LINE_LEVEL_RT_WORLD2 sales
FULL OUTER JOIN
W_STOCK_LINE_LEVEL_RT_WORLD stock
ON
sales.FISCAL_YEARWEEK_NM = stock.FISCAL_YEARWEEK_NM
AND sales.BTN = stock.BTN_CD
AND sales.ARTICLE_SEASON_DS = stock.ARTICLE_SEASON_DS
AND sales.CLIENT_CD = stock.CLIENT_CD
GROUP BY
COALESCE(sales.FISCAL_YEARWEEK_NM, stock.FISCAL_YEARWEEK_NM),
COALESCE(sales.BTN, stock.BTN_CD),
COALESCE(sales.ARTICLE_SEASON_DS, stock.ARTICLE_SEASON_DS)
) NUOVA
ON
TARGET.FISCAL_YEARWEEK_NM = NUOVA.FISCAL_YEARWEEK_NM
AND TARGET.BTN = NUOVA.BTN
AND TARGET.ARTICLE_SEASON_DS = NUOVA.ARTICLE_SEASON_DS
WHEN MATCHED
THEN UPDATE SET
TARGET.FISCAL_YEARWEEK_NM = NUOVA.FISCAL_YEARWEEK_NM,
TARGET.BTN = NUOVA.BTN,
TARGET.ARTICLE_SEASON_DS = NUOVA.ARTICLE_SEASON_DS,
TARGET.PRODUCT_IMAGE = NUOVA.PRODUCT_IMAGE,
--TARGET.SUBCHANNEL = NUOVA.SUBCHANNEL,
TARGET.GENDER = NUOVA.GENDER,
TARGET.PRODUCT_CATEGORY = NUOVA.PRODUCT_CATEGORY,
TARGET.FIRST_SEASON = NUOVA.FIRST_SEASON,
TARGET.LAST_SEASON = NUOVA.LAST_SEASON,
TARGET.TREND_DS = NUOVA.TREND_DS,
TARGET.TREND_CD = NUOVA.TREND_CD,
TARGET.COLOR = NUOVA.COLOR,
TARGET.MATERIAL = NUOVA.MATERIAL,
TARGET.FINISH = NUOVA.FINISH,
TARGET.DEPARTMENT = NUOVA.DEPARTMENT,
TARGET.PRODUCT_SUBCATEGORY = NUOVA.PRODUCT_SUBCATEGORY,
TARGET.THEME_DS = NUOVA.THEME_DS,
TARGET.THEME_FIX = NUOVA.THEME_FIX,
TARGET.PRODUCT_SUBGROUP_DS = NUOVA.PRODUCT_SUBGROUP_DS,
TARGET.PRODUCT_GROUP_DS = NUOVA.PRODUCT_GROUP_DS,
TARGET.MODEL_DS = NUOVA.MODEL_DS,
TARGET.PRODUCT_FAMILY_DS = NUOVA.PRODUCT_FAMILY_DS,
TARGET.LAST_YEAR_WEEK_ABILITAZIONE = NUOVA.LAST_YEAR_WEEK_ABILITAZIONE,
TARGET.AVERAGE_FULL_PRICE_CHF = NUOVA.AVERAGE_FULL_PRICE_CHF,
TARGET.AVERAGE_SELLING_PRICE_CHF = NUOVA.AVERAGE_SELLING_PRICE_CHF,
TARGET.STANDARD_COST_CHF = NUOVA.STANDARD_COST_CHF,
TARGET.MARGIN = NUOVA.MARGIN,
TARGET.LAST_YEAR_MOTNH_ABILITAZIONE = NUOVA.LAST_YEAR_MOTNH_ABILITAZIONE,
TARGET.NET_REVENUES = NUOVA.NET_REVENUES,
TARGET.NET_SALES = NUOVA.NET_SALES,
TARGET.FULL_PRICE_SALES = NUOVA.FULL_PRICE_SALES,
TARGET.VAL_STDCOST = NUOVA.VAL_STDCOST,
TARGET.TOT_PROMOTION = NUOVA.TOT_PROMOTION,
TARGET.NET_SALES_MARKDOWN = NUOVA.NET_SALES_MARKDOWN,
TARGET.NET_REVENUES_MARKDOWN = NUOVA.NET_REVENUES_MARKDOWN,
TARGET.TOT_DISCOUNT = NUOVA.TOT_DISCOUNT,
TARGET.SALES_QTY = NUOVA.SALES_QTY,
TARGET.STORE_SELLING = NUOVA.STORE_SELLING,
TARGET.TOTAL_STORES_LW = NUOVA.TOTAL_STORES_LW,
TARGET.NET_SALES_LTD = NUOVA.NET_SALES_LTD,
TARGET.NET_REVENUES_LTD = NUOVA.NET_REVENUES_LTD,
TARGET.FULL_PRICE_SALES_LTD = NUOVA.FULL_PRICE_SALES_LTD,
TARGET.SALES_VALUE_STD_LTD = NUOVA.SALES_VALUE_STD_LTD,
TARGET.FULL_PRICE_QTY_LTD = NUOVA.FULL_PRICE_QTY_LTD,
TARGET.TOTAL_QTY_LTD = NUOVA.TOTAL_QTY_LTD,
TARGET.NET_SALES_LTD_REGION = NUOVA.NET_SALES_LTD_REGION,
TARGET.NET_REVENUES_LTD_REGION = NUOVA.NET_REVENUES_LTD_REGION,
TARGET.FULL_PRICE_SALES_LTD_REGION = NUOVA.FULL_PRICE_SALES_LTD_REGION,
TARGET.SALES_VALUE_STD_LTD_REGION = NUOVA.SALES_VALUE_STD_LTD_REGION,
TARGET.FULL_PRICE_QTY_LTD_REGION = NUOVA.FULL_PRICE_QTY_LTD_REGION,
TARGET.TOTAL_QTY_LTD_REGION = NUOVA.TOTAL_QTY_LTD_REGION,
TARGET.WEEKS_ON_FLOOR_LTD = NUOVA.WEEKS_ON_FLOOR_LTD,
TARGET.WEEKS_ON_FLOOR_LTD_REGION = NUOVA.WEEKS_ON_FLOOR_LTD_REGION,
TARGET.STORES_SELLING_LTD = NUOVA.STORES_SELLING_LTD,
TARGET.STORES_SELLING_LTD_REGION = NUOVA.STORES_SELLING_LTD_REGION,
TARGET.STOCK_STORES = NUOVA.STOCK_STORES,
TARGET.STOCK_QTY_NM = NUOVA.STOCK_QTY_NM,
TARGET.STOCK_IN_STORE = NUOVA.STOCK_IN_STORE,
TARGET.STOCK_IN_TRANSIT_TO_STORE = NUOVA.STOCK_IN_TRANSIT_TO_STORE,
TARGET.STOCK_IN_REGIONAL_DC = NUOVA.STOCK_IN_REGIONAL_DC,
TARGET.TOTAL_STOCK_ON_HAND = NUOVA.TOTAL_STOCK_ON_HAND,
TARGET.STOCK_ALLOCATED_IN_CENTRAL_WH = NUOVA.STOCK_ALLOCATED_IN_CENTRAL_WH,
TARGET.STOCK_IN_TRANSIT_TO_REGION = NUOVA.STOCK_IN_TRANSIT_TO_REGION,
TARGET.STOCK_FREE_CENTRAL_WH = NUOVA.STOCK_FREE_CENTRAL_WH
WHEN NOT MATCHED BY TARGET
THEN INSERT
(FISCAL_YEARWEEK_NM,
BTN,
ARTICLE_SEASON_DS,
PRODUCT_IMAGE,
--SUBCHANNEL,
GENDER,
PRODUCT_CATEGORY,
FIRST_SEASON,
LAST_SEASON,
TREND_DS,
TREND_CD,
COLOR,
MATERIAL,
FINISH,
DEPARTMENT,
PRODUCT_SUBCATEGORY,
THEME_DS,
THEME_FIX,
PRODUCT_SUBGROUP_DS,
PRODUCT_GROUP_DS,
MODEL_DS,
PRODUCT_FAMILY_DS,
LAST_YEAR_WEEK_ABILITAZIONE,
AVERAGE_FULL_PRICE_CHF,
AVERAGE_SELLING_PRICE_CHF,
STANDARD_COST_CHF,
MARGIN,
LAST_YEAR_MOTNH_ABILITAZIONE,
NET_REVENUES,
NET_SALES,
FULL_PRICE_SALES,
VAL_STDCOST,
TOT_PROMOTION,
NET_SALES_MARKDOWN,
NET_REVENUES_MARKDOWN,
TOT_DISCOUNT,
SALES_QTY,
STORE_SELLING,
TOTAL_STORES_LW,
NET_SALES_LTD,
NET_REVENUES_LTD,
FULL_PRICE_SALES_LTD,
SALES_VALUE_STD_LTD,
FULL_PRICE_QTY_LTD,
TOTAL_QTY_LTD,
NET_SALES_LTD_REGION,
NET_REVENUES_LTD_REGION,
FULL_PRICE_SALES_LTD_REGION,
SALES_VALUE_STD_LTD_REGION,
FULL_PRICE_QTY_LTD_REGION,
TOTAL_QTY_LTD_REGION,
WEEKS_ON_FLOOR_LTD,
WEEKS_ON_FLOOR_LTD_REGION,
STORES_SELLING_LTD,
STORES_SELLING_LTD_REGION,
STOCK_STORES,
STOCK_QTY_NM,
STOCK_IN_STORE,
STOCK_IN_TRANSIT_TO_STORE,
STOCK_IN_REGIONAL_DC,
TOTAL_STOCK_ON_HAND,
STOCK_ALLOCATED_IN_CENTRAL_WH,
STOCK_IN_TRANSIT_TO_REGION,
STOCK_FREE_CENTRAL_WH)
VALUES
(NUOVA.FISCAL_YEARWEEK_NM,
NUOVA.BTN,
NUOVA.ARTICLE_SEASON_DS,
NUOVA.PRODUCT_IMAGE,
--NUOVA.SUBCHANNEL,
NUOVA.GENDER,
NUOVA.PRODUCT_CATEGORY,
NUOVA.FIRST_SEASON,
NUOVA.LAST_SEASON,
NUOVA.TREND_DS,
NUOVA.TREND_CD,
NUOVA.COLOR,
NUOVA.MATERIAL,
NUOVA.FINISH,
NUOVA.DEPARTMENT,
NUOVA.PRODUCT_SUBCATEGORY,
NUOVA.THEME_DS,
NUOVA.THEME_FIX,
NUOVA.PRODUCT_SUBGROUP_DS,
NUOVA.PRODUCT_GROUP_DS,
NUOVA.MODEL_DS,
NUOVA.PRODUCT_FAMILY_DS,
NUOVA.LAST_YEAR_WEEK_ABILITAZIONE,
NUOVA.AVERAGE_FULL_PRICE_CHF,
NUOVA.AVERAGE_SELLING_PRICE_CHF,
NUOVA.STANDARD_COST_CHF,
NUOVA.MARGIN,
NUOVA.LAST_YEAR_MOTNH_ABILITAZIONE,
NUOVA.NET_REVENUES,
NUOVA.NET_SALES,
NUOVA.FULL_PRICE_SALES,
NUOVA.VAL_STDCOST,
NUOVA.TOT_PROMOTION,
NUOVA.NET_SALES_MARKDOWN,
NUOVA.NET_REVENUES_MARKDOWN,
NUOVA.TOT_DISCOUNT,
NUOVA.SALES_QTY,
NUOVA.STORE_SELLING,
NUOVA.TOTAL_STORES_LW,
NUOVA.NET_SALES_LTD,
NUOVA.NET_REVENUES_LTD,
NUOVA.FULL_PRICE_SALES_LTD,
NUOVA.SALES_VALUE_STD_LTD,
NUOVA.FULL_PRICE_QTY_LTD,
NUOVA.TOTAL_QTY_LTD,
NUOVA.NET_SALES_LTD_REGION,
NUOVA.NET_REVENUES_LTD_REGION,
NUOVA.FULL_PRICE_SALES_LTD_REGION,
NUOVA.SALES_VALUE_STD_LTD_REGION,
NUOVA.FULL_PRICE_QTY_LTD_REGION,
NUOVA.TOTAL_QTY_LTD_REGION,
NUOVA.WEEKS_ON_FLOOR_LTD,
NUOVA.WEEKS_ON_FLOOR_LTD_REGION,
NUOVA.STORES_SELLING_LTD,
NUOVA.STORES_SELLING_LTD_REGION,
NUOVA.STOCK_STORES,
NUOVA.STOCK_QTY_NM,
NUOVA.STOCK_IN_STORE,
NUOVA.STOCK_IN_TRANSIT_TO_STORE,
NUOVA.STOCK_IN_REGIONAL_DC,
NUOVA.TOTAL_STOCK_ON_HAND,
NUOVA.STOCK_ALLOCATED_IN_CENTRAL_WH,
NUOVA.STOCK_IN_TRANSIT_TO_REGION,
NUOVA.STOCK_FREE_CENTRAL_WH)
WHEN NOT MATCHED BY SOURCE
THEN DELETE;
The Road Ahead
Now, the query reported above alone took five hours to complete. I am sure there are better ways out there to improve this terrible performance. I hope some of you will land a hand here.
If I can be clearer on something, just drop a comment!
Thanks!