Stored Procedure Executed in ADF Pipeline - Performance Issues - sql-server
I will split this question into sections. Your help would be much appreciated!
Overview of the Issue
I have an issue with data refresh. I currently run a pipeline in Azure Data Factory composed of 7 blocks. Each block executes a T-SQL stored procedure, and each procedure generates a table consumed by the next block. The tables average about 3 GB in size, and each block takes between one and five hours to run.
What I did so far
Given that I am pretty new to this world, I have tried to follow best practices. For example, every procedure begins with SET NOCOUNT ON. I always list column names explicitly in SELECT statements, avoiding patterns like SELECT * FROM TABLE. Also, each table has well-defined primary keys (declared in the same order for each table), and since this is SQL Server they also act as the clustered indexes. For the example reported below, these are two screenshots of the indexes:
and
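In code terms, the keys look roughly like this. This is my reconstruction from the screenshots, not the actual DDL; the column types and constraint name are assumptions:

```sql
-- Reconstructed sketch, not the real definition: the composite primary key
-- doubles as the clustered index, and its columns match the MERGE join keys.
CREATE TABLE dbo.W_LINE_LEVEL_RT_WORLD2
(
    FISCAL_YEARWEEK_NM varchar(6)  NOT NULL,
    BTN                varchar(20) NOT NULL,
    ARTICLE_SEASON_DS  varchar(50) NOT NULL,
    -- ... measure and attribute columns omitted ...
    CONSTRAINT PK_W_LINE_LEVEL_RT_WORLD2
        PRIMARY KEY CLUSTERED (FISCAL_YEARWEEK_NM, BTN, ARTICLE_SEASON_DS)
);
```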
I tried to exploit the MERGE statement; an example of one stored procedure is reported below. Note that I take MAX() of all the dimension fields because I join sales and stock at the level of Week, Product, Season, and Store, but then group only by Week, Product, and Season. Without the MAX(), any store that stocked a product but never sold it would produce duplicate rows with NULL attributes.
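To make the row-doubling concrete, here is a toy version with invented tables and values (not the real schema):

```sql
-- Invented toy tables for illustration only:
DECLARE @sales TABLE (WK int, PRODUCT varchar(10), STORE varchar(10), GENDER varchar(1), QTY int);
DECLARE @stock TABLE (WK int, PRODUCT varchar(10), STORE varchar(10), STOCK_QTY int);

INSERT INTO @sales VALUES (202301, 'P1', 'S1', 'F', 3);  -- S1 sold P1
INSERT INTO @stock VALUES (202301, 'P1', 'S1', 10),      -- S1 stocks it
                          (202301, 'P1', 'S2', 5);       -- S2 stocks it but never sold it

-- FULL OUTER JOIN at store level, then group by week/product only:
SELECT
    COALESCE(s.WK, k.WK)           AS WK,
    COALESCE(s.PRODUCT, k.PRODUCT) AS PRODUCT,
    MAX(s.GENDER)                  AS GENDER,    -- collapses the NULL coming from the S2 row;
    SUM(s.QTY)                     AS SALES_QTY, -- grouping by GENDER instead would split the
    SUM(k.STOCK_QTY)               AS STOCK_QTY  -- output into one row per attribute value
FROM @sales s
FULL OUTER JOIN @stock k
  ON s.WK = k.WK AND s.PRODUCT = k.PRODUCT AND s.STORE = k.STORE
GROUP BY COALESCE(s.WK, k.WK), COALESCE(s.PRODUCT, k.PRODUCT);
```

This returns a single row per (WK, PRODUCT), which is the behavior the MAX() calls in the real procedure are there to guarantee.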
CREATE PROCEDURE dbo.SP_W_LINE_LEVEL_RT_WORLD2
AS
SET NOCOUNT ON;
MERGE [dbo].[W_LINE_LEVEL_RT_WORLD2] AS TARGET
USING
(
SELECT
COALESCE(sales.FISCAL_YEARWEEK_NM, stock.FISCAL_YEARWEEK_NM) AS FISCAL_YEARWEEK_NM,
COALESCE(sales.BTN, stock.BTN_CD) AS BTN,
COALESCE(sales.ARTICLE_SEASON_DS, stock.ARTICLE_SEASON_DS) AS ARTICLE_SEASON_DS,
MAX(PRODUCT_IMAGE) AS PRODUCT_IMAGE,
--MAX(SUBCHANNEL) AS SUBCHANNEL,
MAX(GENDER) AS GENDER,
MAX(PRODUCT_CATEGORY) AS PRODUCT_CATEGORY,
MAX(FIRST_SEASON) AS FIRST_SEASON,
MAX(LAST_SEASON) AS LAST_SEASON,
MAX(TREND_DS) AS TREND_DS,
MAX(TREND_CD) AS TREND_CD,
MAX(COLOR) AS COLOR,
MAX(MATERIAL) AS MATERIAL,
MAX(FINISH) AS FINISH,
MAX(DEPARTMENT) AS DEPARTMENT,
MAX(PRODUCT_SUBCATEGORY) AS PRODUCT_SUBCATEGORY,
MAX(THEME_DS) AS THEME_DS,
MAX(THEME_FIX) AS THEME_FIX,
MAX(PRODUCT_SUBGROUP_DS) AS PRODUCT_SUBGROUP_DS,
MAX(PRODUCT_GROUP_DS) AS PRODUCT_GROUP_DS,
MAX(MODEL_DS) AS MODEL_DS,
MAX(PRODUCT_FAMILY_DS) AS PRODUCT_FAMILY_DS,
MAX(LAST_YEAR_WEEK_ABILITAZIONE) AS LAST_YEAR_WEEK_ABILITAZIONE,
-- AVERAGE LIFE TO DATE METRICS
MAX(FULL_PRICE_SALES_LTD) / NULLIF(MAX(TOTAL_QTY_LTD),0) AS AVERAGE_FULL_PRICE_CHF,
MAX(NET_REVENUES_LTD) / NULLIF(MAX(TOTAL_QTY_LTD),0) AS AVERAGE_SELLING_PRICE_CHF,
MAX(SALES_VALUE_STD_LTD) / NULLIF(MAX(TOTAL_QTY_LTD),0) AS STANDARD_COST_CHF,
MAX(SALES_VALUE_STD_LTD) / NULLIF(MAX(NET_REVENUES_LTD),0) AS MARGIN,
-- WEEK BY WEEK ANALYSIS
MAX(LAST_YEAR_MOTNH_ABILITAZIONE) AS LAST_YEAR_MOTNH_ABILITAZIONE,
SUM(NET_REVENUES) AS NET_REVENUES,
SUM(NET_SALES) AS NET_SALES,
SUM(FULL_PRICE_SALES) AS FULL_PRICE_SALES,
SUM(VAL_STDCOST) AS VAL_STDCOST,
SUM(TOT_PROMOTION) AS TOT_PROMOTION,
SUM(NET_SALES_MARKDOWN) AS NET_SALES_MARKDOWN,
SUM(NET_REVENUES_MARKDOWN) AS NET_REVENUES_MARKDOWN,
SUM(TOT_DISCOUNT) AS TOT_DISCOUNT,
SUM(SALES_QTY) AS SALES_QTY,
COUNT(DISTINCT STORE_SELLING) AS STORE_SELLING,
COUNT(DISTINCT COALESCE(sales.CLIENT_CD, stock.CLIENT_CD)) AS TOTAL_STORES_LW,
-- LIFE TO DATE ANALYSIS
MAX(NET_SALES_LTD) AS NET_SALES_LTD,
MAX(NET_REVENUES_LTD) AS NET_REVENUES_LTD,
MAX(FULL_PRICE_SALES_LTD) AS FULL_PRICE_SALES_LTD,
MAX(SALES_VALUE_STD_LTD) AS SALES_VALUE_STD_LTD,
MAX(FULL_PRICE_QTY_LTD) AS FULL_PRICE_QTY_LTD,
MAX(TOTAL_QTY_LTD) AS TOTAL_QTY_LTD,
MAX(NET_SALES_LTD_REGION) AS NET_SALES_LTD_REGION,
MAX(NET_REVENUES_LTD_REGION) AS NET_REVENUES_LTD_REGION,
MAX(FULL_PRICE_SALES_LTD_REGION) AS FULL_PRICE_SALES_LTD_REGION,
MAX(SALES_VALUE_STD_LTD_REGION) AS SALES_VALUE_STD_LTD_REGION,
MAX(FULL_PRICE_QTY_LTD_REGION) AS FULL_PRICE_QTY_LTD_REGION,
MAX(TOTAL_QTY_LTD_REGION) AS TOTAL_QTY_LTD_REGION,
COALESCE(MAX(WEEKS_ON_FLOOR_SEASONAL_AW_LTD),0) + COALESCE(MAX(WEEKS_ON_FLOOR_CARRY_AW_LTD),0) + COALESCE(MAX(WEEKS_ON_FLOOR_SEASONAL_SS_LTD),0) + COALESCE(MAX(WEEKS_ON_FLOOR_CARRY_SS_LTD),0) AS WEEKS_ON_FLOOR_LTD,
COALESCE(MAX(WEEKS_ON_FLOOR_SEASONAL_AW_LTD_REGION),0) + COALESCE(MAX(WEEKS_ON_FLOOR_CARRY_AW_LTD_REGION),0) + COALESCE(MAX(WEEKS_ON_FLOOR_SEASONAL_SS_LTD_REGION),0) + COALESCE(MAX(WEEKS_ON_FLOOR_CARRY_SS_LTD_REGION),0) AS WEEKS_ON_FLOOR_LTD_REGION,
COALESCE(MAX(RUNNING_STORE_SELLING_AW_CARRY_LTD),0) + COALESCE(MAX(RUNNING_STORE_SELLING_AW_SEASONAL_LTD),0) + COALESCE(MAX(RUNNING_STORE_SELLING_SS_CARRY_LTD),0) + COALESCE(MAX(RUNNING_STORE_SELLING_SS_SEASONAL_LTD),0) AS STORES_SELLING_LTD,
COALESCE(MAX(RUNNING_STORE_SELLING_AW_CARRY_LTD_REGION),0) + COALESCE(MAX(RUNNING_STORE_SELLING_AW_SEASONAL_LTD_REGION),0) + COALESCE(MAX(RUNNING_STORE_SELLING_SS_CARRY_LTD_REGION),0) + COALESCE(MAX(RUNNING_STORE_SELLING_SS_SEASONAL_LTD_REGION),0) AS STORES_SELLING_LTD_REGION,
-- STOCK DATA
COUNT(DISTINCT stock.CLIENT_CD) AS STOCK_STORES,
SUM(STOCK_QTY_NM) AS STOCK_QTY_NM,
SUM(STOCK_IN_STORE) AS STOCK_IN_STORE,
SUM(STOCK_IN_TRANSIT_TO_STORE) AS STOCK_IN_TRANSIT_TO_STORE,
SUM(STOCK_IN_REGIONAL_DC) AS STOCK_IN_REGIONAL_DC,
SUM(STOCK_IN_STORE) + SUM(STOCK_IN_TRANSIT_TO_STORE) + SUM(STOCK_IN_REGIONAL_DC) AS TOTAL_STOCK_ON_HAND,
SUM(STOCK_ALLOCATED_IN_CENTRAL_WH) AS STOCK_ALLOCATED_IN_CENTRAL_WH,
SUM(STOCK_IN_TRANSIT_TO_REGION) AS STOCK_IN_TRANSIT_TO_REGION,
SUM(STOCK_FREE_CENTRAL_WH) AS STOCK_FREE_CENTRAL_WH
FROM
W_SALES_LINE_LEVEL_RT_WORLD2 sales
FULL OUTER JOIN
W_STOCK_LINE_LEVEL_RT_WORLD stock
ON
sales.FISCAL_YEARWEEK_NM = stock.FISCAL_YEARWEEK_NM
AND sales.BTN = stock.BTN_CD
AND sales.ARTICLE_SEASON_DS = stock.ARTICLE_SEASON_DS
AND sales.CLIENT_CD = stock.CLIENT_CD
GROUP BY
COALESCE(sales.FISCAL_YEARWEEK_NM, stock.FISCAL_YEARWEEK_NM),
COALESCE(sales.BTN, stock.BTN_CD),
COALESCE(sales.ARTICLE_SEASON_DS, stock.ARTICLE_SEASON_DS)
) NUOVA
ON
TARGET.FISCAL_YEARWEEK_NM = NUOVA.FISCAL_YEARWEEK_NM
AND TARGET.BTN = NUOVA.BTN
AND TARGET.ARTICLE_SEASON_DS = NUOVA.ARTICLE_SEASON_DS
WHEN MATCHED
THEN UPDATE SET
-- join-key columns are not re-assigned: matched rows already share these values,
-- and rewriting clustered-key columns only adds work
TARGET.PRODUCT_IMAGE = NUOVA.PRODUCT_IMAGE,
--TARGET.SUBCHANNEL = NUOVA.SUBCHANNEL,
TARGET.GENDER = NUOVA.GENDER,
TARGET.PRODUCT_CATEGORY = NUOVA.PRODUCT_CATEGORY,
TARGET.FIRST_SEASON = NUOVA.FIRST_SEASON,
TARGET.LAST_SEASON = NUOVA.LAST_SEASON,
TARGET.TREND_DS = NUOVA.TREND_DS,
TARGET.TREND_CD = NUOVA.TREND_CD,
TARGET.COLOR = NUOVA.COLOR,
TARGET.MATERIAL = NUOVA.MATERIAL,
TARGET.FINISH = NUOVA.FINISH,
TARGET.DEPARTMENT = NUOVA.DEPARTMENT,
TARGET.PRODUCT_SUBCATEGORY = NUOVA.PRODUCT_SUBCATEGORY,
TARGET.THEME_DS = NUOVA.THEME_DS,
TARGET.THEME_FIX = NUOVA.THEME_FIX,
TARGET.PRODUCT_SUBGROUP_DS = NUOVA.PRODUCT_SUBGROUP_DS,
TARGET.PRODUCT_GROUP_DS = NUOVA.PRODUCT_GROUP_DS,
TARGET.MODEL_DS = NUOVA.MODEL_DS,
TARGET.PRODUCT_FAMILY_DS = NUOVA.PRODUCT_FAMILY_DS,
TARGET.LAST_YEAR_WEEK_ABILITAZIONE = NUOVA.LAST_YEAR_WEEK_ABILITAZIONE,
TARGET.AVERAGE_FULL_PRICE_CHF = NUOVA.AVERAGE_FULL_PRICE_CHF,
TARGET.AVERAGE_SELLING_PRICE_CHF = NUOVA.AVERAGE_SELLING_PRICE_CHF,
TARGET.STANDARD_COST_CHF = NUOVA.STANDARD_COST_CHF,
TARGET.MARGIN = NUOVA.MARGIN,
TARGET.LAST_YEAR_MOTNH_ABILITAZIONE = NUOVA.LAST_YEAR_MOTNH_ABILITAZIONE,
TARGET.NET_REVENUES = NUOVA.NET_REVENUES,
TARGET.NET_SALES = NUOVA.NET_SALES,
TARGET.FULL_PRICE_SALES = NUOVA.FULL_PRICE_SALES,
TARGET.VAL_STDCOST = NUOVA.VAL_STDCOST,
TARGET.TOT_PROMOTION = NUOVA.TOT_PROMOTION,
TARGET.NET_SALES_MARKDOWN = NUOVA.NET_SALES_MARKDOWN,
TARGET.NET_REVENUES_MARKDOWN = NUOVA.NET_REVENUES_MARKDOWN,
TARGET.TOT_DISCOUNT = NUOVA.TOT_DISCOUNT,
TARGET.SALES_QTY = NUOVA.SALES_QTY,
TARGET.STORE_SELLING = NUOVA.STORE_SELLING,
TARGET.TOTAL_STORES_LW = NUOVA.TOTAL_STORES_LW,
TARGET.NET_SALES_LTD = NUOVA.NET_SALES_LTD,
TARGET.NET_REVENUES_LTD = NUOVA.NET_REVENUES_LTD,
TARGET.FULL_PRICE_SALES_LTD = NUOVA.FULL_PRICE_SALES_LTD,
TARGET.SALES_VALUE_STD_LTD = NUOVA.SALES_VALUE_STD_LTD,
TARGET.FULL_PRICE_QTY_LTD = NUOVA.FULL_PRICE_QTY_LTD,
TARGET.TOTAL_QTY_LTD = NUOVA.TOTAL_QTY_LTD,
TARGET.NET_SALES_LTD_REGION = NUOVA.NET_SALES_LTD_REGION,
TARGET.NET_REVENUES_LTD_REGION = NUOVA.NET_REVENUES_LTD_REGION,
TARGET.FULL_PRICE_SALES_LTD_REGION = NUOVA.FULL_PRICE_SALES_LTD_REGION,
TARGET.SALES_VALUE_STD_LTD_REGION = NUOVA.SALES_VALUE_STD_LTD_REGION,
TARGET.FULL_PRICE_QTY_LTD_REGION = NUOVA.FULL_PRICE_QTY_LTD_REGION,
TARGET.TOTAL_QTY_LTD_REGION = NUOVA.TOTAL_QTY_LTD_REGION,
TARGET.WEEKS_ON_FLOOR_LTD = NUOVA.WEEKS_ON_FLOOR_LTD,
TARGET.WEEKS_ON_FLOOR_LTD_REGION = NUOVA.WEEKS_ON_FLOOR_LTD_REGION,
TARGET.STORES_SELLING_LTD = NUOVA.STORES_SELLING_LTD,
TARGET.STORES_SELLING_LTD_REGION = NUOVA.STORES_SELLING_LTD_REGION,
TARGET.STOCK_STORES = NUOVA.STOCK_STORES,
TARGET.STOCK_QTY_NM = NUOVA.STOCK_QTY_NM,
TARGET.STOCK_IN_STORE = NUOVA.STOCK_IN_STORE,
TARGET.STOCK_IN_TRANSIT_TO_STORE = NUOVA.STOCK_IN_TRANSIT_TO_STORE,
TARGET.STOCK_IN_REGIONAL_DC = NUOVA.STOCK_IN_REGIONAL_DC,
TARGET.TOTAL_STOCK_ON_HAND = NUOVA.TOTAL_STOCK_ON_HAND,
TARGET.STOCK_ALLOCATED_IN_CENTRAL_WH = NUOVA.STOCK_ALLOCATED_IN_CENTRAL_WH,
TARGET.STOCK_IN_TRANSIT_TO_REGION = NUOVA.STOCK_IN_TRANSIT_TO_REGION,
TARGET.STOCK_FREE_CENTRAL_WH = NUOVA.STOCK_FREE_CENTRAL_WH
WHEN NOT MATCHED BY TARGET
THEN INSERT
(FISCAL_YEARWEEK_NM,
BTN,
ARTICLE_SEASON_DS,
PRODUCT_IMAGE,
--SUBCHANNEL,
GENDER,
PRODUCT_CATEGORY,
FIRST_SEASON,
LAST_SEASON,
TREND_DS,
TREND_CD,
COLOR,
MATERIAL,
FINISH,
DEPARTMENT,
PRODUCT_SUBCATEGORY,
THEME_DS,
THEME_FIX,
PRODUCT_SUBGROUP_DS,
PRODUCT_GROUP_DS,
MODEL_DS,
PRODUCT_FAMILY_DS,
LAST_YEAR_WEEK_ABILITAZIONE,
AVERAGE_FULL_PRICE_CHF,
AVERAGE_SELLING_PRICE_CHF,
STANDARD_COST_CHF,
MARGIN,
LAST_YEAR_MOTNH_ABILITAZIONE,
NET_REVENUES,
NET_SALES,
FULL_PRICE_SALES,
VAL_STDCOST,
TOT_PROMOTION,
NET_SALES_MARKDOWN,
NET_REVENUES_MARKDOWN,
TOT_DISCOUNT,
SALES_QTY,
STORE_SELLING,
TOTAL_STORES_LW,
NET_SALES_LTD,
NET_REVENUES_LTD,
FULL_PRICE_SALES_LTD,
SALES_VALUE_STD_LTD,
FULL_PRICE_QTY_LTD,
TOTAL_QTY_LTD,
NET_SALES_LTD_REGION,
NET_REVENUES_LTD_REGION,
FULL_PRICE_SALES_LTD_REGION,
SALES_VALUE_STD_LTD_REGION,
FULL_PRICE_QTY_LTD_REGION,
TOTAL_QTY_LTD_REGION,
WEEKS_ON_FLOOR_LTD,
WEEKS_ON_FLOOR_LTD_REGION,
STORES_SELLING_LTD,
STORES_SELLING_LTD_REGION,
STOCK_STORES,
STOCK_QTY_NM,
STOCK_IN_STORE,
STOCK_IN_TRANSIT_TO_STORE,
STOCK_IN_REGIONAL_DC,
TOTAL_STOCK_ON_HAND,
STOCK_ALLOCATED_IN_CENTRAL_WH,
STOCK_IN_TRANSIT_TO_REGION,
STOCK_FREE_CENTRAL_WH)
VALUES
(NUOVA.FISCAL_YEARWEEK_NM,
NUOVA.BTN,
NUOVA.ARTICLE_SEASON_DS,
NUOVA.PRODUCT_IMAGE,
--NUOVA.SUBCHANNEL,
NUOVA.GENDER,
NUOVA.PRODUCT_CATEGORY,
NUOVA.FIRST_SEASON,
NUOVA.LAST_SEASON,
NUOVA.TREND_DS,
NUOVA.TREND_CD,
NUOVA.COLOR,
NUOVA.MATERIAL,
NUOVA.FINISH,
NUOVA.DEPARTMENT,
NUOVA.PRODUCT_SUBCATEGORY,
NUOVA.THEME_DS,
NUOVA.THEME_FIX,
NUOVA.PRODUCT_SUBGROUP_DS,
NUOVA.PRODUCT_GROUP_DS,
NUOVA.MODEL_DS,
NUOVA.PRODUCT_FAMILY_DS,
NUOVA.LAST_YEAR_WEEK_ABILITAZIONE,
NUOVA.AVERAGE_FULL_PRICE_CHF,
NUOVA.AVERAGE_SELLING_PRICE_CHF,
NUOVA.STANDARD_COST_CHF,
NUOVA.MARGIN,
NUOVA.LAST_YEAR_MOTNH_ABILITAZIONE,
NUOVA.NET_REVENUES,
NUOVA.NET_SALES,
NUOVA.FULL_PRICE_SALES,
NUOVA.VAL_STDCOST,
NUOVA.TOT_PROMOTION,
NUOVA.NET_SALES_MARKDOWN,
NUOVA.NET_REVENUES_MARKDOWN,
NUOVA.TOT_DISCOUNT,
NUOVA.SALES_QTY,
NUOVA.STORE_SELLING,
NUOVA.TOTAL_STORES_LW,
NUOVA.NET_SALES_LTD,
NUOVA.NET_REVENUES_LTD,
NUOVA.FULL_PRICE_SALES_LTD,
NUOVA.SALES_VALUE_STD_LTD,
NUOVA.FULL_PRICE_QTY_LTD,
NUOVA.TOTAL_QTY_LTD,
NUOVA.NET_SALES_LTD_REGION,
NUOVA.NET_REVENUES_LTD_REGION,
NUOVA.FULL_PRICE_SALES_LTD_REGION,
NUOVA.SALES_VALUE_STD_LTD_REGION,
NUOVA.FULL_PRICE_QTY_LTD_REGION,
NUOVA.TOTAL_QTY_LTD_REGION,
NUOVA.WEEKS_ON_FLOOR_LTD,
NUOVA.WEEKS_ON_FLOOR_LTD_REGION,
NUOVA.STORES_SELLING_LTD,
NUOVA.STORES_SELLING_LTD_REGION,
NUOVA.STOCK_STORES,
NUOVA.STOCK_QTY_NM,
NUOVA.STOCK_IN_STORE,
NUOVA.STOCK_IN_TRANSIT_TO_STORE,
NUOVA.STOCK_IN_REGIONAL_DC,
NUOVA.TOTAL_STOCK_ON_HAND,
NUOVA.STOCK_ALLOCATED_IN_CENTRAL_WH,
NUOVA.STOCK_IN_TRANSIT_TO_REGION,
NUOVA.STOCK_FREE_CENTRAL_WH)
WHEN NOT MATCHED BY SOURCE
THEN DELETE;
The Road Ahead
Now, the query reported above alone took five hours to complete. I am sure there are better ways out there to improve this terrible performance, and I hope some of you will lend a hand here.
If I can be clearer on something, just drop a comment!
Thanks!