Stored Procedure Executed in ADF Pipeline - Performance Issues - sql-server

I am going to post this question divided into sections. Your help would be much appreciated!
Overview of the Issue
I have an issue with data refresh. Right now, I have a pipeline running on Azure Data Factory composed of 7 blocks. Each block hosts a stored procedure written in SQL, and each block generates a table used by the next one. The tables average about 3 GB in size, and each block takes between one and five hours to execute.
What I did so far
Given that I am pretty new to this world, I tried to abide by best practices. For example, each procedure begins with SET NOCOUNT ON. I always specify column names in SELECT statements, avoiding code like SELECT * FROM TABLE. Also, each table has well-defined primary keys (defined in the same order for each table) and, this being SQL Server, they also act as clustered indexes. For the example reported below, these are two screenshots of the indexes:
I tried to exploit the MERGE statement. An example of one stored procedure is reported below. Note that I took the MAX() of all dimension fields because I joined sales and stock together at the level of Week, Product, Season, and Store and then grouped only by Week, Product, and Season. Hence, if a store carried a product but did not sell it, I would have ended up with duplicate rows carrying NULL attributes.
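To illustrate that point with a toy example outside SQL (hypothetical data): SQL's MAX() ignores NULLs, so within one (week, product, season) group it picks the non-NULL dimension value contributed by the sales-side row. Filtering out None mimics that behavior:

```python
# Toy illustration (hypothetical data) of why MAX() recovers the dimension
# values: rows contributed only by the stock side of the FULL OUTER JOIN
# have NULL dimension attributes, and SQL's MAX() ignores NULLs.
# None stands in for NULL below.
group_rows = [
    {"week": "202401", "product": "BTN1", "gender": None},     # stock-only row
    {"week": "202401", "product": "BTN1", "gender": "WOMEN"},  # sales row
]

# Equivalent of MAX(GENDER) over the group: skip NULLs, take the max.
gender = max((r["gender"] for r in group_rows if r["gender"] is not None),
             default=None)
print(gender)  # → WOMEN
```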
CREATE PROCEDURE dbo.SP_W_LINE_LEVEL_RT_WORLD2
AS
SET NOCOUNT ON;
MERGE [dbo].[W_LINE_LEVEL_RT_WORLD2] AS TARGET
USING
(
SELECT
COALESCE(sales.FISCAL_YEARWEEK_NM, stock.FISCAL_YEARWEEK_NM) AS FISCAL_YEARWEEK_NM,
COALESCE(sales.BTN, stock.BTN_CD) AS BTN,
COALESCE(sales.ARTICLE_SEASON_DS, stock.ARTICLE_SEASON_DS) AS ARTICLE_SEASON_DS,
MAX(PRODUCT_IMAGE) AS PRODUCT_IMAGE,
--MAX(SUBCHANNEL) AS SUBCHANNEL,
MAX(GENDER) AS GENDER,
MAX(PRODUCT_CATEGORY) AS PRODUCT_CATEGORY,
MAX(FIRST_SEASON) AS FIRST_SEASON,
MAX(LAST_SEASON) AS LAST_SEASON,
MAX(TREND_DS) AS TREND_DS,
MAX(TREND_CD) AS TREND_CD,
MAX(COLOR) AS COLOR,
MAX(MATERIAL) AS MATERIAL,
MAX(FINISH) AS FINISH,
MAX(DEPARTMENT) AS DEPARTMENT,
MAX(PRODUCT_SUBCATEGORY) AS PRODUCT_SUBCATEGORY,
MAX(THEME_DS) AS THEME_DS,
MAX(THEME_FIX) AS THEME_FIX,
MAX(PRODUCT_SUBGROUP_DS) AS PRODUCT_SUBGROUP_DS,
MAX(PRODUCT_GROUP_DS) AS PRODUCT_GROUP_DS,
MAX(MODEL_DS) AS MODEL_DS,
MAX(PRODUCT_FAMILY_DS) AS PRODUCT_FAMILY_DS,
MAX(LAST_YEAR_WEEK_ABILITAZIONE) AS LAST_YEAR_WEEK_ABILITAZIONE,
-- AVERAGE LIFE TO DATE METRICS
MAX(FULL_PRICE_SALES_LTD) / NULLIF(MAX(TOTAL_QTY_LTD),0) AS AVERAGE_FULL_PRICE_CHF,
MAX(NET_REVENUES_LTD) / NULLIF(MAX(TOTAL_QTY_LTD),0) AS AVERAGE_SELLING_PRICE_CHF,
MAX(SALES_VALUE_STD_LTD) / NULLIF(MAX(TOTAL_QTY_LTD),0) AS STANDARD_COST_CHF,
MAX(SALES_VALUE_STD_LTD) / NULLIF(MAX(NET_REVENUES_LTD),0) AS MARGIN,
-- WEEK BY WEEK ANALYSIS
MAX(LAST_YEAR_MOTNH_ABILITAZIONE) AS LAST_YEAR_MOTNH_ABILITAZIONE,
SUM(NET_REVENUES) AS NET_REVENUES,
SUM(NET_SALES) AS NET_SALES,
SUM(FULL_PRICE_SALES) AS FULL_PRICE_SALES,
SUM(VAL_STDCOST) AS VAL_STDCOST,
SUM(TOT_PROMOTION) AS TOT_PROMOTION,
SUM(NET_SALES_MARKDOWN) AS NET_SALES_MARKDOWN,
SUM(NET_REVENUES_MARKDOWN) AS NET_REVENUES_MARKDOWN,
SUM(TOT_DISCOUNT) AS TOT_DISCOUNT,
SUM(SALES_QTY) AS SALES_QTY,
COUNT(DISTINCT STORE_SELLING) AS STORE_SELLING,
COUNT(DISTINCT COALESCE(sales.CLIENT_CD, stock.CLIENT_CD)) AS TOTAL_STORES_LW,
-- LIFE TO DATE ANALYSIS
MAX(NET_SALES_LTD) AS NET_SALES_LTD,
MAX(NET_REVENUES_LTD) AS NET_REVENUES_LTD,
MAX(FULL_PRICE_SALES_LTD) AS FULL_PRICE_SALES_LTD,
MAX(SALES_VALUE_STD_LTD) AS SALES_VALUE_STD_LTD,
MAX(FULL_PRICE_QTY_LTD) AS FULL_PRICE_QTY_LTD,
MAX(TOTAL_QTY_LTD) AS TOTAL_QTY_LTD,
MAX(NET_SALES_LTD_REGION) AS NET_SALES_LTD_REGION,
MAX(NET_REVENUES_LTD_REGION) AS NET_REVENUES_LTD_REGION,
MAX(FULL_PRICE_SALES_LTD_REGION) AS FULL_PRICE_SALES_LTD_REGION,
MAX(SALES_VALUE_STD_LTD_REGION) AS SALES_VALUE_STD_LTD_REGION,
MAX(FULL_PRICE_QTY_LTD_REGION) AS FULL_PRICE_QTY_LTD_REGION,
MAX(TOTAL_QTY_LTD_REGION) AS TOTAL_QTY_LTD_REGION,
COALESCE(MAX(WEEKS_ON_FLOOR_SEASONAL_AW_LTD),0) + COALESCE(MAX(WEEKS_ON_FLOOR_CARRY_AW_LTD),0) + COALESCE(MAX(WEEKS_ON_FLOOR_SEASONAL_SS_LTD),0) + COALESCE(MAX(WEEKS_ON_FLOOR_CARRY_SS_LTD),0) AS WEEKS_ON_FLOOR_LTD,
COALESCE(MAX(WEEKS_ON_FLOOR_SEASONAL_AW_LTD_REGION),0) + COALESCE(MAX(WEEKS_ON_FLOOR_CARRY_AW_LTD_REGION),0) + COALESCE(MAX(WEEKS_ON_FLOOR_SEASONAL_SS_LTD_REGION),0) + COALESCE(MAX(WEEKS_ON_FLOOR_CARRY_SS_LTD_REGION),0) AS WEEKS_ON_FLOOR_LTD_REGION,
COALESCE(MAX(RUNNING_STORE_SELLING_AW_CARRY_LTD),0) + COALESCE(MAX(RUNNING_STORE_SELLING_AW_SEASONAL_LTD),0) + COALESCE(MAX(RUNNING_STORE_SELLING_SS_CARRY_LTD),0) + COALESCE(MAX(RUNNING_STORE_SELLING_SS_SEASONAL_LTD),0) AS STORES_SELLING_LTD,
COALESCE(MAX(RUNNING_STORE_SELLING_AW_CARRY_LTD_REGION),0) + COALESCE(MAX(RUNNING_STORE_SELLING_AW_SEASONAL_LTD_REGION),0) + COALESCE(MAX(RUNNING_STORE_SELLING_SS_CARRY_LTD_REGION),0) + COALESCE(MAX(RUNNING_STORE_SELLING_SS_SEASONAL_LTD_REGION),0) AS STORES_SELLING_LTD_REGION,
-- STOCK DATA
COUNT(DISTINCT stock.CLIENT_CD) AS STOCK_STORES,
SUM(STOCK_QTY_NM) AS STOCK_QTY_NM,
SUM(STOCK_IN_STORE) AS STOCK_IN_STORE,
SUM(STOCK_IN_TRANSIT_TO_STORE) AS STOCK_IN_TRANSIT_TO_STORE,
SUM(STOCK_IN_REGIONAL_DC) AS STOCK_IN_REGIONAL_DC,
SUM(STOCK_IN_STORE) + SUM(STOCK_IN_TRANSIT_TO_STORE) + SUM(STOCK_IN_REGIONAL_DC) AS TOTAL_STOCK_ON_HAND,
SUM(STOCK_ALLOCATED_IN_CENTRAL_WH) AS STOCK_ALLOCATED_IN_CENTRAL_WH,
SUM(STOCK_IN_TRANSIT_TO_REGION) AS STOCK_IN_TRANSIT_TO_REGION,
SUM(STOCK_FREE_CENTRAL_WH) AS STOCK_FREE_CENTRAL_WH
FROM
W_SALES_LINE_LEVEL_RT_WORLD2 sales
FULL OUTER JOIN
W_STOCK_LINE_LEVEL_RT_WORLD stock
ON
sales.FISCAL_YEARWEEK_NM = stock.FISCAL_YEARWEEK_NM
AND sales.BTN = stock.BTN_CD
AND sales.ARTICLE_SEASON_DS = stock.ARTICLE_SEASON_DS
AND sales.CLIENT_CD = stock.CLIENT_CD
GROUP BY
COALESCE(sales.FISCAL_YEARWEEK_NM, stock.FISCAL_YEARWEEK_NM),
COALESCE(sales.BTN, stock.BTN_CD),
COALESCE(sales.ARTICLE_SEASON_DS, stock.ARTICLE_SEASON_DS)
) NUOVA
ON
TARGET.FISCAL_YEARWEEK_NM = NUOVA.FISCAL_YEARWEEK_NM
AND TARGET.BTN = NUOVA.BTN
AND TARGET.ARTICLE_SEASON_DS = NUOVA.ARTICLE_SEASON_DS
WHEN MATCHED
THEN UPDATE SET
TARGET.FISCAL_YEARWEEK_NM = NUOVA.FISCAL_YEARWEEK_NM,
TARGET.BTN = NUOVA.BTN,
TARGET.ARTICLE_SEASON_DS = NUOVA.ARTICLE_SEASON_DS,
TARGET.PRODUCT_IMAGE = NUOVA.PRODUCT_IMAGE,
--TARGET.SUBCHANNEL = NUOVA.SUBCHANNEL,
TARGET.GENDER = NUOVA.GENDER,
TARGET.PRODUCT_CATEGORY = NUOVA.PRODUCT_CATEGORY,
TARGET.FIRST_SEASON = NUOVA.FIRST_SEASON,
TARGET.LAST_SEASON = NUOVA.LAST_SEASON,
TARGET.TREND_DS = NUOVA.TREND_DS,
TARGET.TREND_CD = NUOVA.TREND_CD,
TARGET.COLOR = NUOVA.COLOR,
TARGET.MATERIAL = NUOVA.MATERIAL,
TARGET.FINISH = NUOVA.FINISH,
TARGET.DEPARTMENT = NUOVA.DEPARTMENT,
TARGET.PRODUCT_SUBCATEGORY = NUOVA.PRODUCT_SUBCATEGORY,
TARGET.THEME_DS = NUOVA.THEME_DS,
TARGET.THEME_FIX = NUOVA.THEME_FIX,
TARGET.PRODUCT_SUBGROUP_DS = NUOVA.PRODUCT_SUBGROUP_DS,
TARGET.PRODUCT_GROUP_DS = NUOVA.PRODUCT_GROUP_DS,
TARGET.MODEL_DS = NUOVA.MODEL_DS,
TARGET.PRODUCT_FAMILY_DS = NUOVA.PRODUCT_FAMILY_DS,
TARGET.LAST_YEAR_WEEK_ABILITAZIONE = NUOVA.LAST_YEAR_WEEK_ABILITAZIONE,
TARGET.AVERAGE_FULL_PRICE_CHF = NUOVA.AVERAGE_FULL_PRICE_CHF,
TARGET.AVERAGE_SELLING_PRICE_CHF = NUOVA.AVERAGE_SELLING_PRICE_CHF,
TARGET.STANDARD_COST_CHF = NUOVA.STANDARD_COST_CHF,
TARGET.MARGIN = NUOVA.MARGIN,
TARGET.LAST_YEAR_MOTNH_ABILITAZIONE = NUOVA.LAST_YEAR_MOTNH_ABILITAZIONE,
TARGET.NET_REVENUES = NUOVA.NET_REVENUES,
TARGET.NET_SALES = NUOVA.NET_SALES,
TARGET.FULL_PRICE_SALES = NUOVA.FULL_PRICE_SALES,
TARGET.VAL_STDCOST = NUOVA.VAL_STDCOST,
TARGET.TOT_PROMOTION = NUOVA.TOT_PROMOTION,
TARGET.NET_SALES_MARKDOWN = NUOVA.NET_SALES_MARKDOWN,
TARGET.NET_REVENUES_MARKDOWN = NUOVA.NET_REVENUES_MARKDOWN,
TARGET.TOT_DISCOUNT = NUOVA.TOT_DISCOUNT,
TARGET.SALES_QTY = NUOVA.SALES_QTY,
TARGET.STORE_SELLING = NUOVA.STORE_SELLING,
TARGET.TOTAL_STORES_LW = NUOVA.TOTAL_STORES_LW,
TARGET.NET_SALES_LTD = NUOVA.NET_SALES_LTD,
TARGET.NET_REVENUES_LTD = NUOVA.NET_REVENUES_LTD,
TARGET.FULL_PRICE_SALES_LTD = NUOVA.FULL_PRICE_SALES_LTD,
TARGET.SALES_VALUE_STD_LTD = NUOVA.SALES_VALUE_STD_LTD,
TARGET.FULL_PRICE_QTY_LTD = NUOVA.FULL_PRICE_QTY_LTD,
TARGET.TOTAL_QTY_LTD = NUOVA.TOTAL_QTY_LTD,
TARGET.NET_SALES_LTD_REGION = NUOVA.NET_SALES_LTD_REGION,
TARGET.NET_REVENUES_LTD_REGION = NUOVA.NET_REVENUES_LTD_REGION,
TARGET.FULL_PRICE_SALES_LTD_REGION = NUOVA.FULL_PRICE_SALES_LTD_REGION,
TARGET.SALES_VALUE_STD_LTD_REGION = NUOVA.SALES_VALUE_STD_LTD_REGION,
TARGET.FULL_PRICE_QTY_LTD_REGION = NUOVA.FULL_PRICE_QTY_LTD_REGION,
TARGET.TOTAL_QTY_LTD_REGION = NUOVA.TOTAL_QTY_LTD_REGION,
TARGET.WEEKS_ON_FLOOR_LTD = NUOVA.WEEKS_ON_FLOOR_LTD,
TARGET.WEEKS_ON_FLOOR_LTD_REGION = NUOVA.WEEKS_ON_FLOOR_LTD_REGION,
TARGET.STORES_SELLING_LTD = NUOVA.STORES_SELLING_LTD,
TARGET.STORES_SELLING_LTD_REGION = NUOVA.STORES_SELLING_LTD_REGION,
TARGET.STOCK_STORES = NUOVA.STOCK_STORES,
TARGET.STOCK_QTY_NM = NUOVA.STOCK_QTY_NM,
TARGET.STOCK_IN_STORE = NUOVA.STOCK_IN_STORE,
TARGET.STOCK_IN_TRANSIT_TO_STORE = NUOVA.STOCK_IN_TRANSIT_TO_STORE,
TARGET.STOCK_IN_REGIONAL_DC = NUOVA.STOCK_IN_REGIONAL_DC,
TARGET.TOTAL_STOCK_ON_HAND = NUOVA.TOTAL_STOCK_ON_HAND,
TARGET.STOCK_ALLOCATED_IN_CENTRAL_WH = NUOVA.STOCK_ALLOCATED_IN_CENTRAL_WH,
TARGET.STOCK_IN_TRANSIT_TO_REGION = NUOVA.STOCK_IN_TRANSIT_TO_REGION,
TARGET.STOCK_FREE_CENTRAL_WH = NUOVA.STOCK_FREE_CENTRAL_WH
WHEN NOT MATCHED BY TARGET
THEN INSERT
(FISCAL_YEARWEEK_NM,
BTN,
ARTICLE_SEASON_DS,
PRODUCT_IMAGE,
--SUBCHANNEL,
GENDER,
PRODUCT_CATEGORY,
FIRST_SEASON,
LAST_SEASON,
TREND_DS,
TREND_CD,
COLOR,
MATERIAL,
FINISH,
DEPARTMENT,
PRODUCT_SUBCATEGORY,
THEME_DS,
THEME_FIX,
PRODUCT_SUBGROUP_DS,
PRODUCT_GROUP_DS,
MODEL_DS,
PRODUCT_FAMILY_DS,
LAST_YEAR_WEEK_ABILITAZIONE,
AVERAGE_FULL_PRICE_CHF,
AVERAGE_SELLING_PRICE_CHF,
STANDARD_COST_CHF,
MARGIN,
LAST_YEAR_MOTNH_ABILITAZIONE,
NET_REVENUES,
NET_SALES,
FULL_PRICE_SALES,
VAL_STDCOST,
TOT_PROMOTION,
NET_SALES_MARKDOWN,
NET_REVENUES_MARKDOWN,
TOT_DISCOUNT,
SALES_QTY,
STORE_SELLING,
TOTAL_STORES_LW,
NET_SALES_LTD,
NET_REVENUES_LTD,
FULL_PRICE_SALES_LTD,
SALES_VALUE_STD_LTD,
FULL_PRICE_QTY_LTD,
TOTAL_QTY_LTD,
NET_SALES_LTD_REGION,
NET_REVENUES_LTD_REGION,
FULL_PRICE_SALES_LTD_REGION,
SALES_VALUE_STD_LTD_REGION,
FULL_PRICE_QTY_LTD_REGION,
TOTAL_QTY_LTD_REGION,
WEEKS_ON_FLOOR_LTD,
WEEKS_ON_FLOOR_LTD_REGION,
STORES_SELLING_LTD,
STORES_SELLING_LTD_REGION,
STOCK_STORES,
STOCK_QTY_NM,
STOCK_IN_STORE,
STOCK_IN_TRANSIT_TO_STORE,
STOCK_IN_REGIONAL_DC,
TOTAL_STOCK_ON_HAND,
STOCK_ALLOCATED_IN_CENTRAL_WH,
STOCK_IN_TRANSIT_TO_REGION,
STOCK_FREE_CENTRAL_WH)
VALUES
(NUOVA.FISCAL_YEARWEEK_NM,
NUOVA.BTN,
NUOVA.ARTICLE_SEASON_DS,
NUOVA.PRODUCT_IMAGE,
--NUOVA.SUBCHANNEL,
NUOVA.GENDER,
NUOVA.PRODUCT_CATEGORY,
NUOVA.FIRST_SEASON,
NUOVA.LAST_SEASON,
NUOVA.TREND_DS,
NUOVA.TREND_CD,
NUOVA.COLOR,
NUOVA.MATERIAL,
NUOVA.FINISH,
NUOVA.DEPARTMENT,
NUOVA.PRODUCT_SUBCATEGORY,
NUOVA.THEME_DS,
NUOVA.THEME_FIX,
NUOVA.PRODUCT_SUBGROUP_DS,
NUOVA.PRODUCT_GROUP_DS,
NUOVA.MODEL_DS,
NUOVA.PRODUCT_FAMILY_DS,
NUOVA.LAST_YEAR_WEEK_ABILITAZIONE,
NUOVA.AVERAGE_FULL_PRICE_CHF,
NUOVA.AVERAGE_SELLING_PRICE_CHF,
NUOVA.STANDARD_COST_CHF,
NUOVA.MARGIN,
NUOVA.LAST_YEAR_MOTNH_ABILITAZIONE,
NUOVA.NET_REVENUES,
NUOVA.NET_SALES,
NUOVA.FULL_PRICE_SALES,
NUOVA.VAL_STDCOST,
NUOVA.TOT_PROMOTION,
NUOVA.NET_SALES_MARKDOWN,
NUOVA.NET_REVENUES_MARKDOWN,
NUOVA.TOT_DISCOUNT,
NUOVA.SALES_QTY,
NUOVA.STORE_SELLING,
NUOVA.TOTAL_STORES_LW,
NUOVA.NET_SALES_LTD,
NUOVA.NET_REVENUES_LTD,
NUOVA.FULL_PRICE_SALES_LTD,
NUOVA.SALES_VALUE_STD_LTD,
NUOVA.FULL_PRICE_QTY_LTD,
NUOVA.TOTAL_QTY_LTD,
NUOVA.NET_SALES_LTD_REGION,
NUOVA.NET_REVENUES_LTD_REGION,
NUOVA.FULL_PRICE_SALES_LTD_REGION,
NUOVA.SALES_VALUE_STD_LTD_REGION,
NUOVA.FULL_PRICE_QTY_LTD_REGION,
NUOVA.TOTAL_QTY_LTD_REGION,
NUOVA.WEEKS_ON_FLOOR_LTD,
NUOVA.WEEKS_ON_FLOOR_LTD_REGION,
NUOVA.STORES_SELLING_LTD,
NUOVA.STORES_SELLING_LTD_REGION,
NUOVA.STOCK_STORES,
NUOVA.STOCK_QTY_NM,
NUOVA.STOCK_IN_STORE,
NUOVA.STOCK_IN_TRANSIT_TO_STORE,
NUOVA.STOCK_IN_REGIONAL_DC,
NUOVA.TOTAL_STOCK_ON_HAND,
NUOVA.STOCK_ALLOCATED_IN_CENTRAL_WH,
NUOVA.STOCK_IN_TRANSIT_TO_REGION,
NUOVA.STOCK_FREE_CENTRAL_WH)
WHEN NOT MATCHED BY SOURCE
THEN DELETE;
The Road Ahead
Now, the query reported above took five hours to complete on its own. I am sure there are better ways out there to improve this terrible performance. I hope some of you will lend a hand here.
If I can be clearer on something, just drop a comment!
Thanks!

Related

MatLab - Creating an array of images depending on their correlation

I've created a program for a project that tests images against one another to see whether or not they are the same image. I've decided to use correlation since the images I am using are styled in the same way, and with this I've been able to get everything working up to this point.
I now wish to create an array of images again, but this time in order of their correlation. So for example, if I'm testing a 50 pence coin and I test 50 images against it, I want the 5 highest correlations to be stored in an array for later use. But I'm unsure how to do this, as each item in the array will need to hold more than one variable: the image location/name and its correlation percentage.
%Program Created By Ben Parry, 2016.
clc(); %Simply clears the console window
%Targets the image the user picks
inputImage = imgetfile();
%Targets all the images inside this directory
referenceFolder = 'M:\Project\MatLab\Coin Image Processing\Saved_Images';
if ~isdir(referenceFolder)
errorMessage = sprintf('Error: Folder does not exist!');
uiwait(warndlg(errorMessage)); %Displays an error if the folder doesn't exist
return;
end
filePattern = fullfile(referenceFolder, '*.jpg');
jpgFiles = dir(filePattern);
for i = 1:length(jpgFiles)
baseFileName = jpgFiles(i).name;
fullFileName = fullfile(referenceFolder, baseFileName);
fprintf(1, 'Reading %s\n', fullFileName);
imageArray = imread(fullFileName);
imshow(imageArray);
firstImage = imread(inputImage); %Reading the image
%Converting the images to Black & White
firstImageBW = im2bw(firstImage);
secondImageBW = im2bw(imageArray);
%Finding the correlation, then converting it into a percentage
c = corr2(firstImageBW, secondImageBW);
corrValue = sprintf('%.0f%%',100*c);
%Custom messaging for the possible outcomes
corrMatch = sprintf('The images are the same (%s)',corrValue);
corrUnMatch = sprintf('The images are not the same (%s)',corrValue);
%Looping for the possible two outcomes
if c >=0.99 %Define a percentage for the correlation to reach
disp(' ');
disp('Images Tested:');
disp(inputImage);
disp(fullFileName);
disp (corrMatch);
disp(' ');
else
disp(' ');
disp('Images Tested:');
disp(inputImage);
disp(fullFileName);
disp(corrUnMatch);
disp(' ' );
end;
imageArray = imread(fullFileName);
imshow(imageArray);
end
You can use the struct() function to create structures.
Initializing an array of struct:
imStruct = struct('fileName', '', 'image', [], 'correlation', 0);
imData = repmat(imStruct, length(jpgFiles), 1);
Setting field values:
for i = 1:length(jpgFiles)
% ...
imData(i).fileName = fullFileName;
imData(i).image = imageArray;
imData(i).correlation = c; % store the numeric value, not the formatted string
end
Extract the values of the correlation field and select the 5 highest correlations:
corrList = [imData.correlation];
[~, sortedInd] = sort(corrList, 'descend');
selectedData = imData(sortedInd(1:5));
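For what it's worth, the selection step above boils down to keeping each file name paired with its numeric score, sorting by the score, and slicing off the five best. A language-agnostic sketch in Python (made-up file names and correlation values):

```python
# Each entry pairs an image's file name with its (made-up) correlation value,
# mirroring the fileName/correlation fields of the struct array above.
scores = [
    ("coin_01.jpg", 0.91),
    ("coin_02.jpg", 0.99),
    ("coin_03.jpg", 0.42),
    ("coin_04.jpg", 0.97),
    ("coin_05.jpg", 0.88),
    ("coin_06.jpg", 0.73),
]

# Sort by the correlation value, highest first, and slice the top five.
top5 = sorted(scores, key=lambda pair: pair[1], reverse=True)[:5]
print(top5[0])  # → ('coin_02.jpg', 0.99)
```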

How to loop through table based on unique date in MATLAB

I have this table named BondData which contains the following:
Settlement Maturity Price Coupon
8/27/2016 1/12/2017 106.901 9.250
8/27/2016 1/27/2017 104.79 7.000
8/28/2016 3/30/2017 106.144 7.500
8/28/2016 4/27/2017 105.847 7.000
8/29/2016 9/4/2017 110.779 9.125
For each day in this table, I want to perform a certain task: assign several values to variables and perform the necessary computations. The logic is like:
do while Settlement is the same
m_settle=current_row_settlement_value
m_maturity=current_row_maturity_value
and so on...
my_computation_here...
end
It's like I want to loop through my settlement dates and perform the task for as long as the date is the same.
EDIT: Just to clarify my issue, I am implementing yield curve fitting using the Nelson-Siegel and Svensson models. Here are my codes so far:
function NS_SV_Models()
load bondsdata
BondData=table(Settlement,Maturity,Price,Coupon);
BondData.Settlement = categorical(BondData.Settlement);
Settlements = categories(BondData.Settlement); % get all unique Settlement
for k = 1:numel(Settlements)
rows = BondData.Settlement==Settlements(k);
Bonds.Settle = Settlements(k); % current_row_settlement_value
Bonds.Maturity = BondData.Maturity(rows); % current_row_maturity_value
Bonds.Prices=BondData.Price(rows);
Bonds.Coupon=BondData.Coupon(rows);
Settle = Bonds.Settle;
Maturity = Bonds.Maturity;
CleanPrice = Bonds.Prices;
CouponRate = Bonds.Coupon;
Instruments = [Settle Maturity CleanPrice CouponRate];
Yield = bndyield(CleanPrice,CouponRate,Settle,Maturity);
NSModel = IRFunctionCurve.fitNelsonSiegel('Zero',Settlements(k),Instruments);
SVModel = IRFunctionCurve.fitSvensson('Zero',Settlements(k),Instruments);
NSModel.Parameters
SVModel.Parameters
end
end
Again, my main objective is to get each model's parameters (beta0, beta1, beta2, etc.) on a per-day basis. I am getting an error in Instruments = [Settle Maturity CleanPrice CouponRate]; because Settle contains only one record (8/27/2016); it's supposed to have two, since there are two rows for this date. Also, I noticed that Maturity, CleanPrice and CouponRate contain all records. They should only contain the respective data for each day.
Hope I made my issue clearer now. By the way, I am using MATLAB R2015a.
Use a categorical array. Here is your function (without its header line, and with all rows I can't run commented out):
BondData = table(datetime(Settlement),datetime(Maturity),Price,Coupon,...
'VariableNames',{'Settlement','Maturity','Price','Coupon'});
BondData.Settlement = categorical(BondData.Settlement);
Settlements = categories(BondData.Settlement); % get all unique Settlement
for k = 1:numel(Settlements)
rows = BondData.Settlement==Settlements(k);
Settle = BondData.Settlement(rows); % current_row_settlement_value
Mature = BondData.Maturity(rows); % current_row_maturity_value
CleanPrice = BondData.Price(rows);
CouponRate = BondData.Coupon(rows);
Instruments = [datenum(char(Settle)) datenum(char(Mature))...
CleanPrice CouponRate];
% Yield = bndyield(CleanPrice,CouponRate,Settle,Mature);
%
% NSModel = IRFunctionCurve.fitNelsonSiegel('Zero',Settlements(k),Instruments);
% SVModel = IRFunctionCurve.fitSvensson('Zero',Settlements(k),Instruments);
%
% NSModel.Parameters
% SVModel.Parameters
end
Keep in mind the following:
You cannot concatenate different types of variables as you try to do in Instruments = [Settle Maturity CleanPrice CouponRate];
There is no need for the Bonds structure; you don't use it (e.g. Settle = Bonds.Settle;).
Use the relevant functions to convert between a datetime object and strings or numbers. For instance, in the code above: datenum(char(Settle)). I don't know what kind of input you need to pass to the commented-out functions.
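As an aside, the per-settlement-date grouping that both snippets implement can be sketched generically; here is a minimal Python version using rows copied from the question's table:

```python
from itertools import groupby
from operator import itemgetter

# Rows mirroring the BondData table: (settlement, maturity, price, coupon).
rows = [
    ("8/27/2016", "1/12/2017", 106.901, 9.250),
    ("8/27/2016", "1/27/2017", 104.790, 7.000),
    ("8/28/2016", "3/30/2017", 106.144, 7.500),
    ("8/28/2016", "4/27/2017", 105.847, 7.000),
    ("8/29/2016", "9/4/2017", 110.779, 9.125),
]

# groupby only merges adjacent rows, so sort by the grouping key first.
rows.sort(key=itemgetter(0))
by_settlement = {settle: [r[1:] for r in grp]
                 for settle, grp in groupby(rows, key=itemgetter(0))}

print(len(by_settlement["8/27/2016"]))  # → 2 (two bonds settle that day)
```

Each group then holds exactly the maturities, prices, and coupons for that one settlement date, which is what the Instruments matrix needs per iteration.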

String or binary data would be truncated when updating datatable

I am trying to update customer info. I use this code to load 5 fields:
cust_cn.Open()
Dim cust_da As New SqlDataAdapter("SELECT * FROM [customers] where [custID]=" & txtCustPhone.Text, cust_cn)
cust_da.Fill(cust_datatable)
txtCustPhone.Text = cust_datatable.Rows(0).Item("custID")
txtCustFirstName.Text = cust_datatable.Rows(0).Item("first")
txtCustLastName.Text = cust_datatable.Rows(0).Item("last")
txtCustAddress.Text = cust_datatable.Rows(0).Item("address")
txtCustZip.Text = cust_datatable.Rows(0).Item("zip")
and this works fine. When I try to modify one of the fields (change zip code on an existing customer)
with this code:
If cust_datatable.Rows.Count <> 0 Then
cust_datatable.Rows(0).Item("custID") = txtCustPhone.Text
cust_datatable.Rows(0).Item("first") = txtCustFirstName.Text
cust_datatable.Rows(0).Item("last") = txtCustLastName
cust_datatable.Rows(0).Item("address") = txtCustAddress.Text
cust_datatable.Rows(0).Item("zip") = txtCustZip.Text
'cust_datatable.Rows(custrecord)("custID") = txtCustPhone.Text
'cust_datatable.Rows(custrecord)("first") = txtCustFirstName.Text
'cust_datatable.Rows(custrecord)("last") = txtCustLastName.Text
'cust_datatable.Rows(custrecord)("address") = txtCustAddress.Text
'cust_datatable.Rows(custrecord)("zip") = txtCustZip.Text
cust_DA.Update(cust_datatable)
End If
I get the error: "String or binary data would be truncated"
I originally tried to update using the commented section, but it was only modifying the first record in the database.
Any thoughts?
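This error generally means one of the assigned values is longer than its column allows. Note, for instance, that cust_datatable.Rows(0).Item("last") = txtCustLastName assigns the TextBox control itself rather than its .Text, and the control's string representation can easily exceed the column width. A quick way to locate an offending field is to compare each value's length against its column's declared size; a minimal sketch in Python, with hypothetical column limits:

```python
# Hypothetical column sizes; the real ones come from the customers table
# definition in SQL Server (each column's varchar(n) length).
column_limits = {"custID": 10, "first": 25, "last": 25, "address": 50, "zip": 10}

def find_truncations(row, limits):
    """Return the names of fields whose value would not fit its column."""
    return [name for name, value in row.items()
            if len(str(value)) > limits[name]]

# Example row: the zip value is longer than the assumed 10-character column.
row = {"custID": "5551234567", "first": "Jo", "last": "Smith",
       "address": "123 Main St", "zip": "12345-6789-0000"}
print(find_truncations(row, column_limits))  # → ['zip']
```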

Is it better to change the db schema?

I'm building a web app with Django. I use PostgreSQL for the db. The app code is getting really messy (my beginner skills being a big factor) and slow, even when I run the app locally.
This is an excerpt of my models.py file:
REPEATS_CHOICES = (
(NEVER, 'Never'),
(DAILY, 'Daily'),
(WEEKLY, 'Weekly'),
(MONTHLY, 'Monthly'),
...some more...
)
class Transaction(models.Model):
name = models.CharField(max_length=30)
type = models.IntegerField(max_length=1, choices=TYPE_CHOICES) # 0 = 'Income' , 1 = 'Expense'
amount = models.DecimalField(max_digits=12, decimal_places=2)
date = models.DateField(default=date.today)
frequency = models.IntegerField(max_length=2, choices=REPEATS_CHOICES)
ends = models.DateField(blank=True, null=True)
active = models.BooleanField(default=True)
category = models.ForeignKey(Category, related_name='transactions', blank=True, null=True)
account = models.ForeignKey(Account, related_name='transactions')
The problem is with date, frequency and ends. With this info I can know all the dates on which transactions occur and use them to fill a cashflow table. Doing things this way involves creating a lot of structures (dictionaries, lists and tuples) and iterating over them a lot. Maybe there is a very simple way of solving this with the actual schema, but I couldn't figure out how.
I think that the app would be easier to code if, at the creation of a transaction, I could save all the dates in the db. I don't know if it's possible or if it's a good idea.
I'm reading a book about Google App Engine and the datastore's multivalued properties. What do you think about this for solving my problem?
Edit: I didn't know about the PickleField. I'm now reading about it; maybe I could use it to store all the transaction's datetime objects.
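Pre-computing the occurrence dates, as the edit suggests, could look like this minimal sketch. The helper name is hypothetical, and it assumes the start day exists in every month (i.e. day <= 28), so no end-of-month clamping is done:

```python
from datetime import date

def monthly_occurrences(start, end):
    """Expand a 'Monthly' transaction into concrete dates, start through end.
    Hypothetical helper; assumes start.day exists in every month, so no
    end-of-month clamping is done."""
    out, y, m = [], start.year, start.month
    while True:
        d = date(y, m, start.day)
        if d > end:
            return out
        out.append(d)
        m += 1
        if m > 12:
            y, m = y + 1, 1

dates = monthly_occurrences(date(2016, 1, 15), date(2016, 4, 30))
print(len(dates))  # → 4 (Jan 15, Feb 15, Mar 15, Apr 15)
```

Whether to persist these dates (e.g. in a related table, one row per occurrence) or generate them on the fly is the schema trade-off the question is asking about.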
Edit2: This is an excerpt of my cashflow2 view (sorry for the horrible code):
def cashflow2(request, account_name="Initial"):
if account_name == "Initial":
uri = "/cashflow/new_account"
return HttpResponseRedirect(uri)
month_info = {}
cat_info = {}
m_y_list = [] # [(month,year),]
trans = []
min, max = [] , []
account = Account.objects.get(name=account_name, user=request.user)
categories = account.categories.all()
for year in range(2006,2017):
for month in range(1,13):
month_info[(month, year)] = [0, 0, 0]
for cat in categories:
cat_info[(cat, month, year)] = 0
previous_months = 1 # previous months from actual
next_months = 5
dates_list = month_year_list(previous_months, next_months) # Returns [(month,year)] from the requested range
m_y_list = [(date.month, date.year) for date in month_year_list(1,5)]
min, max = dates_list[0], dates_list[-1]
INCOME = 0
EXPENSE = 1
ONHAND = 2
transacs_in_dates = []
txs = account.transactions.order_by('date')
for tx in txs:
monthyear = ()
monthyear = (tx.date.month, tx.date.year)
if tx.frequency == 0:
if tx.type == 0:
month_info[monthyear][INCOME] += tx.amount
if tx.category:
cat_info[(tx.category, monthyear[0], monthyear[1])] += tx.amount
else:
month_info[monthyear][EXPENSE] += tx.amount
if tx.category:
cat_info[(tx.category, monthyear[0], monthyear[1])] += tx.amount
if monthyear in m_y_list:
if tx not in transacs_in_dates:
transacs_in_dates.append(tx)
elif tx.frequency == 4: # frequency = 'Monthly'
months_dif = relativedelta.relativedelta(tx.ends, tx.date).months
if tx.ends.day < tx.date.day:
months_dif += 1
years_dif = relativedelta.relativedelta(tx.ends, tx.date).years
dif = months_dif + (years_dif*12)
dates_range = dif + 1
for i in range(dates_range):
dt = tx.date+relativedelta.relativedelta(months=+i)
if (dt.month, dt.year) in m_y_list:
if tx not in transacs_in_dates:
transacs_in_dates.append(tx)
if tx.type == 0:
month_info[(dt.month, dt.year)][INCOME] += tx.amount
if tx.category:
cat_info[(tx.category, dt.month, dt.year)] += tx.amount
else:
month_info[(dt.month, dt.year)][EXPENSE] += tx.amount
if tx.category:
cat_info[(tx.category, dt.month, dt.year)] += tx.amount
import operator
thelist = []
thelist = sorted((my + tuple(v) for my, v in month_info.iteritems()),
key = operator.itemgetter(1, 0))
thelistlist = []
for atuple in thelist:
thelistlist.append(list(atuple))
for i in range(len(thelistlist)):
if i != 0:
thelistlist[i][4] = thelistlist[i-1][2] - thelistlist[i-1][3] + thelistlist[i-1][4]
list = []
for el in thelistlist:
if (el[0],el[1]) in m_y_list:
list.append(el)
transactions = account.transactions.all()
cats_in_dates_income = []
cats_in_dates_expense = []
for t in transacs_in_dates:
if t.category and t.type == 0:
if t.category not in cats_in_dates_income:
cats_in_dates_income.append(t.category)
elif t.category and t.type == 1:
if t.category not in cats_in_dates_expense:
cats_in_dates_expense.append(t.category)
cat_infos = []
for k, v in cat_info.items():
cat_infos.append((k[0], k[1], k[2], v))
It depends on how relevant App Engine is here. P.S. If you'd like to store pickled objects as well as JSON objects in the Google Datastore, check out these two code snippets:
http://kovshenin.com/archives/app-engine-json-objects-google-datastore/
http://kovshenin.com/archives/app-engine-python-objects-in-the-google-datastore/
Also note that the Google Datastore is a non-relational database, so you might run into other trouble refactoring your code to switch to it.
Cheers and good luck!

Mom file creation (5 product limit)

Ok, I realize this is a very niche issue, but I'm hoping the process is straightforward enough...
I'm tasked with creating a data file out of customer/order information. Problem is, the data file has a 5-product max limit.
Basically, I get my data, group by cust_id, create the file structure, and within that loop, group by product_id and rewrite the fields in the previous file struct with new product info. That worked all well and good until a user exceeded that max.
A brief example.. (keep in mind, the structure of the array is set by another process, this CANNOT change)
orderArray = arrayNew(2);
set order = 1;
loop over cust_id;
field[order][1] = "field(1)"; // cust_id
field[order][2] = "field(2)"; // name
field[order][3] = "field(3)"; // phone
field[order][4] = ""; // product_1
field[order][5] = ""; // quantity_1
field[order][6] = ""; // product_2
field[order][7] = ""; // quantity_2
field[order][8] = ""; // product_3
field[order][9] = ""; // quantity_3
field[order][10] = ""; // product_4
field[order][11] = ""; // quantity_4
field[order][12] = ""; // product_5
field[order][13] = ""; // quantity_5
field[order][14] = "field(4)"; // trx_id
field[order][15] = "field(5)"; // total_cost
counter = 0;
loop over product_id
field[order][4+counter] = productCode;
field[order][5+counter] = quantity;
counter = counter + 2;
end inner loop;
order = order + 1;
end outer loop;
Like I said, this worked fine until I had a user who ordered more than 5 products.
What I basically want to do is check the number of products for each user and, if that number is greater than 5, start a new line in the text file, but I'm stuck on how to get there.
I've tried numerous fixes, but nothing gives the results I need.
I can send the entire file if It can help, but I don't want to post it all here.
You need to move the inserting of the header and footer fields (e.g. the cust_id and trx_id fields) into the product loop.
Here's a rough idea of one way you can go about this, based on the pseudo code you provided. I'm sure there are more elegant ways you could code this.
set order = 0;
loop over cust_id;
counter = 1;
order = order + 1;
loop over product_id
if (counter == 1 || counter == 6) {
if (counter == 6) {
counter = 1;
order = order + 1;
}
field[order][1] = "field(1)"; // cust_id
field[order][2] = "field(2)"; // name
field[order][3] = "field(3)"; // phone
}
field[order][counter*2+2] = productCode; // product slots: fields 4,6,8,10,12
field[order][counter*2+3] = quantity; // quantity slots: fields 5,7,9,11,13
counter = counter + 1;
if (counter == 6) {
field[order][14] = "field(4)"; // trx_id
field[order][15] = "field(5)"; // total_cost
}
end inner loop;
if (counter == 6) {
// loop here to insert blank columns and the totals field to fill out the row.
}
end outer loop;
One thing does concern me. If you start a new line every five products, then your transaction id and total cost are going to be entered into the file more than once. You know the receiving system; it may be a non-issue.
Hope this helps
As you put the data into the row, you need to check if there are more than 5 products and then create an additional line.
loop over product_id
if (counter mod 10 == 0 and counter > 0) {
// create the new row, and mark it as a continuation of the previous order
counter = 0;
order = order + 1;
field[order][1] = "";
...
field[order][15] = "";
}
field[order][4+counter] = productCode;
field[order][5+counter] = quantity;
counter = counter + 2;
end inner loop;
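The counter-reset logic in both answers amounts to chunking the product list into groups of five and emitting one output row per chunk. A minimal Python sketch of that idea (product codes are made up for illustration):

```python
def chunk(items, size):
    """Split items into consecutive groups of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Seven products exceed the 5-product limit, so two output rows are needed.
products = ["P1", "P2", "P3", "P4", "P5", "P6", "P7"]
rows_needed = chunk(products, 5)
print(len(rows_needed))  # → 2
print(rows_needed[1])    # → ['P6', 'P7']
```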
I've actually done the export from an ecommerce system to MOM, but that code has since been lost. I have samples of code in classic ASP.
