Writing formulation in SAS - arrays

I have the following SAS dataset:
       A    B   C1  C2  C3
Part1  100  50  20  2   0.1
Part2  100  10  30  5   0.5
Part3  100  80  15  9   0.7
Part4  100  60  58  3   0.9
I have, for each Part, a cost function defined as such:
F1(Part1) = C1,part1 + C2,part1*P1 + C3,part1*P1^2
F2(Part2) = C1,part2 + C2,part2*P2 + C3,part2*P2^2
F3(Part3) = C1,part3 + C2,part3*P3 + C3,part3*P3^2
F4(Part4) = C1,part4 + C2,part4*P4 + C3,part4*P4^2
I defined/declared the following parameters:
set <str> Parts; *set for all parts
num nseg{Parts}; *number of segments for each part cost function
num coef{nseg,Parts} *the C values from the dataset
I'm trying to write a formulation to represent the sum of the cost functions
F1(Part1)+ ... +FN(PartN), and came up with this:
TotalCost = sum{p in Parts, j in 1..nseg[p]} (Coef[n,p] + Coef[n,p]*P[p] + Coef[n,p]*P[p]^2).
Unfortunately I'm not getting it quite right. Any suggestions on best way to do this?
Thank you very much!
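For reference, the intended total can be written out explicitly. This is only a sketch under the assumption that coef[j,p] holds the j-th coefficient C_j of part p and that P[p] is the decision variable for part p; the point is that the inner index j (not n) selects both the coefficient and the power of P[p]:

```latex
\mathrm{TotalCost}
  = \sum_{p \in \mathrm{Parts}} \; \sum_{j=1}^{\mathrm{nseg}[p]}
    \mathrm{coef}[j,p] \, P[p]^{\,j-1}
  = \sum_{p \in \mathrm{Parts}}
    \left( C_{1,p} + C_{2,p} P_p + C_{3,p} P_p^{2} \right)
  \quad \text{when } \mathrm{nseg}[p] = 3 .
```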

Related

Given an array of `first n` natural numbers with one element missing and one duplicate element. Figure out both numbers [duplicate]

Given an array of the first n natural numbers with one element missing and one duplicate element, figure out both numbers. I was asked this question in an interview.
For example, the sample array is
5, 2, 1, 4, 5
where n=5
Answer for the given array should be:
missing element = 3
duplicate element = 5
Is it possible to come up with a solution without iterating over the array?
I was able to come up with a solution with O(n log n) complexity and no extra space. Does an O(n) solution exist (without extra space)?
Let's say that you have the numbers 1,...,n. Denote their sum 1 + 2 + ... + n as S. If we denote the missing number as j and the duplicate one as k, the modified sum is S' = S - j + k, which is one equation in two variables. We can repeat the same procedure, this time summing the squares: S_2 = 1 + 4 + ... + n^2. For the sequence with one missing and one duplicate number, the result is S_2' = S_2 - j*j + k*k. Thus we get two equations in two variables.
In total, we have:
S' = S - j + k
S_2' = S_2 - j*j + k*k
therefore
k - j = S' - S =: a
k*k - j*j = S_2' - S_2 =: b
where we introduced the symbols a and b to simplify the notation.
Now, k*k - j*j = (k - j)*(k + j), thus:
k - j = a
k*k - j*j = a * (k + j) = b
summing both equations gives:
2*k = b/a + a
2*j = b/a - a
For your particular example:
S = 1 + 2 + 3 + 4 + 5 = 15
S_2 = 1 + 4 + 9 + 16 + 25 = 55
For the series with one missing and one duplicate element:
S' = 5 + 2 + 1 + 4 + 5 = 17
S_2' = 25 + 4 + 1 + 16 + 25 = 71
then:
a = S' - S = 2
b = S_2' - S_2 = 16
therefore:
2*k = 8 + 2 = 10
2*j = 8 - 2 = 6
which gives:
k = 5, j = 3
In practice (as noted by @HermanTheGermanHesse), one can obtain closed-form expressions for S and S_2 as polynomials in n (so-called Faulhaber's formulae), namely S = n*(n+1)/2 and S_2 = n*(n+1)*(2*n+1)/6. Therefore it suffices to go over the input data once, accumulate S' and S_2', and use the formulae for k and j given above.
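A minimal Python sketch of this procedure, assuming the input really contains the numbers 1..n with exactly one value missing and one duplicated (the function name is made up for illustration):

```python
def find_missing_and_duplicate(values):
    """Return (missing, duplicate) for a list holding 1..n with one swap-in."""
    n = len(values)
    s = n * (n + 1) // 2                  # S   = 1 + 2 + ... + n
    s2 = n * (n + 1) * (2 * n + 1) // 6   # S_2 = 1 + 4 + ... + n^2
    a = sum(values) - s                   # a = k - j (non-zero, since k != j)
    b = sum(v * v for v in values) - s2   # b = k*k - j*j = a * (k + j)
    k_plus_j = b // a                     # exact division under the assumption
    duplicate = (k_plus_j + a) // 2       # 2*k = b/a + a
    missing = (k_plus_j - a) // 2         # 2*j = b/a - a
    return missing, duplicate


print(find_missing_and_duplicate([5, 2, 1, 4, 5]))  # (3, 5)
```

This makes a single pass over the data per sum and keeps only a few integers. In languages with fixed-width integers the sum of squares can overflow for large n, which Python's arbitrary-precision integers avoid.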

Loops and output

I am trying to get more comfortable with loops and output statements. Currently I have a loan which amortizes as follows:
data have;
input Payment2017 Payment2018 Payment2019 Payment2020;
datalines;
100 10 10 10
;
run;
I'm trying to create a maturity and re-issuance profile that looks like this (I explain the logic further down, together with my current code):
data want;
input P2017 P2018 P2019 P2020 F1 F2 F3 MP2017 MP2018 MP2019 MP2020 NI2017 NI2018 NI2019 NI2020;
datalines;
100 10 10 10 0.1 0.1 0.1 100 10 10 10 0 0 0 0
100 10 10 10 0.1 0.1 0.1 0 10 1 1 0 10 0 0
100 10 10 10 0.1 0.1 0.1 0 0 11 1.1 0 0 11 0
100 10 10 10 0.1 0.1 0.1 0 0 0 12.1 0 0 0 12.1
;
run;
so the logic is that:
Payment2017 = the balance at the start of the year
Payment2018 - 2020 = the amount paid each period
F1-F3 is the fraction of the loan that is being paid each period.
MP2017-MP2020 is the amount of the loan that is paid back - essentially it is
mp(i) = p(i) *f(i)
NI2017-NI2020 is the amount that is newly issued, assuming that each time I pay off part of the loan, that amount is added back onto the loan. The code I am currently using looks like this, but I'm having some issues with the output and loops.
data want;
set have;
array MaturityProfile(4) MaturityProfile&StartDate-MaturityProfile&EndDate;
array NewIssuance(4) NewIssuance&StartDate - NewIssuance&EndDate;
array p(4) payment&StartDate-payment&EndDate;
array fraction(3); * track constant fraction determined at start of profile;
MaturityProfile(1) = P(1);
do i = 1 to 3;
fraction(i) = p(i+1) / p(1);
end;
iter=2;
do j = 1 to 2;
do i = iter to 4;
MaturityProfile(i) = P(j) * Fraction(i-j);
newissuance(i) = MaturityProfile(i);
end;
output;
iter=iter+1;
end;
output;
*MaturityProfile(4) = ( P(3) + MaturityProfile(2) ) * Fraction(1);
*output;
drop i;
drop j;
drop iter;
run;
I'm trying to find a way to keep the first two rows as they are currently output, while the third row needs the sum of the column for the second row (or NewIssuance2019) multiplied by fraction 1,
so that the output looks like the table in the data want step above.
TIA.
I managed to fix this by doing:
data want;
set have;
array MaturityProfile(4) MaturityProfile&StartDate-MaturityProfile&EndDate;
array NewIssuance(4) NewIssuance&StartDate - NewIssuance&EndDate;
array p(4) payment&StartDate-payment&EndDate;
array fraction(3); * track constant fraction determined at start of profile;
array Total(4) Total1-Total4;
MaturityProfile(1) = P(1);
do i = 1 to 3;
fraction(i) = p(i+1) / p(1);
end;
iter=2;
do j = 1 to 2;
do i = iter to 4;
MaturityProfile(i) = P(j) * Fraction(i-j);
Total(i)=MaturityProfile(i) + P(i);
end;
output;
iter=iter+1;
end;
MaturityProfile(4) = Total(3) * Fraction(1);
output;
drop i;
drop j;
drop iter;
run;

SQL - update each row of a column using while loop

My original table looks like:
week volume cost
1 11 null
2 32 null
3 80 null
4 75 null
5 50 null
...
51 28 null
I want to update the cost field by applying a more intelligent rule as follows:
if volume < 13, then use a rateA (loose shipment price)
if volume >= 13 and < 25, then use a rateB (20' container price)
if volume >= 25 and < 45, then use a rateC (40' container price)
I want to get the lowest cost by flexibly using above 3 different rates according to the "volume". For instance, in week 4, the "REMAINING VOLUME" initially is 75, I should apply one 40'container cost to load a portion of the volume. Then the "REMAINING VOLUME" is 30, I should apply a 20'container cost to load a portion of the rest. Then the "REMAINING VOLUME" is 5, I should apply the loose shipment price, keep doing in such way until "REMAINING VOLUME" = 0. By doing so it will give me the best combination to minimise the cost per week. Therefore, a while loop needs to be applied to the "REMAINING VOLUME" which can give indication about how to choose the different rates.
the final updated table should look like:
week volume cost
1 11 rateA
2 32 rateB + rateA
3 80 rateC + rateB + rateA
4 75 rateC + rateB + rateA
5 50 rateC + rateA
...
51 28 rateB + rateA
If the cases are fixed and the number of them is small, the following approach works perfectly fine (with SQL Server, anyway; I don't know about other DB's -- see note below) ...
UPDATE #Orders
SET [Cost] = CASE
    WHEN 0 * 44 + 0 * 24 + 1 * 12 >= [Volume] THEN 'rateA'
    WHEN 0 * 44 + 1 * 24 + 0 * 12 >= [Volume] THEN 'rateB'
    WHEN 0 * 44 + 1 * 24 + 1 * 12 >= [Volume] THEN 'rateB + rateA'
    WHEN 1 * 44 + 0 * 24 + 0 * 12 >= [Volume] THEN 'rateC'
    WHEN 1 * 44 + 0 * 24 + 1 * 12 >= [Volume] THEN 'rateC + rateA'
    WHEN 1 * 44 + 1 * 24 + 0 * 12 >= [Volume] THEN 'rateC + rateB'
    WHEN 1 * 44 + 1 * 24 + 1 * 12 >= [Volume] THEN 'rateC + rateB + rateA'
    ELSE 'too big'
END
FROM #Orders
;
No looping needed!
Notice the straightforward progression of the cases; you can extend the pattern to include larger weights if you want.
This works with SQL Server because you are guaranteed that the first matching condition is used, so if it matches the third WHEN case it won't try to evaluate the fourth, fifth, etc. I don't know if this is true for other databases.
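To make the pattern behind the CASE branches explicit, here is a small Python sketch that builds the same lookup programmatically. The capacities 44, 24 and 12 are taken from the CASE expression above; the function name and the "at most one of each container" restriction are assumptions that simply mirror that expression, not a general cost optimiser:

```python
from itertools import product

# Capacities as used in the CASE expression above.
CAPACITY = {"rateC": 44, "rateB": 24, "rateA": 12}


def cost_label(volume):
    """Return the label of the smallest combination that can hold `volume`."""
    options = []
    for use in product((0, 1), repeat=len(CAPACITY)):
        names = [name for name, used in zip(CAPACITY, use) if used]
        if names:
            total = sum(CAPACITY[name] for name in names)
            options.append((total, " + ".join(names)))
    options.sort()  # ascending capacity, like the order of the WHEN branches
    for total, label in options:
        if total >= volume:
            return label  # first match wins, as in CASE
    return "too big"


for week, volume in [(1, 11), (2, 32), (3, 80), (4, 75), (5, 50), (51, 28)]:
    print(week, volume, cost_label(volume))
```

Ordering by capacity only minimises cost if a larger total capacity is never cheaper than a smaller one, which is an assumption about the actual rates.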

Find timeline for duration values in Matlab

I have the following time-series:
b = [2 5 110 113 55 115 80 90 120 35 123];
Each number in b is one data point at a time instant. I computed the duration values from b. A duration is a run of consecutive numbers in b that are greater than or equal to 100 (all other numbers are discarded). A maximum gap of one number smaller than 100 is allowed. This is what the code for duration looks like:
N = 2; % maximum allowed gap
duration = cellfun(@numel, regexp(char((b>=100)+'0'), [repmat('0',1,N) '+'], 'split'));
giving the following duration values for b:
duration = [4 3];
I want to find the positions (time-lines) within b for each value in duration. Next, I want to replace the other positions located outside duration with zeros. The result would look like this:
result = [0 0 3 4 5 6 0 0 9 10 11];
If anyone could help, it would be great.
Answer to original question: pattern with at most one value below 100
Here's an approach using a regular expression to detect the desired pattern. I'm assuming that one value <100 is allowed only between (not after) values >=100. So the pattern is: one or more values >=100 with a possible value <100 in between.
b = [2 5 110 113 55 115 80 90 120 35 123]; %// data
B = char((b>=100)+'0'); %// convert to string of '0' and '1'
[s, e] = regexp(B, '1+(.1+|)', 'start', 'end'); %// find pattern
y = 1:numel(B);
c = any(bsxfun(@ge, y, s(:)) & bsxfun(@le, y, e(:))); %// filter by locations of pattern
y = y.*c; %// result
This gives
y =
0 0 3 4 5 6 0 0 9 10 11
Answer to edited question: pattern with at most n values in a row below 100
The regexp needs to be modified, and it has to be dynamically built as a function of n:
b = [2 5 110 113 55 115 80 90 120 35 123]; %// data
n = 2;
B = char((b>=100)+'0'); %// convert to string of '0' and '1'
r = sprintf('1+(.{1,%i}1+)*', n); %// build the regular expression from n
[s, e] = regexp(B, r, 'start', 'end'); %// find pattern
y = 1:numel(B);
c = any(bsxfun(@ge, y, s(:)) & bsxfun(@le, y, e(:))); %// filter by locations of pattern
y = y.*c; %// result
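For comparison, the same idea can be sketched in Python with the standard re module. This is an illustrative port, not the answerer's code: the function name is made up, and the pattern uses 0{1,n} instead of .{1,n} so that a gap can only consist of below-threshold values. It assumes max_gap >= 1.

```python
import re


def run_positions(b, threshold=100, max_gap=1):
    """Return 1-based positions inside accepted runs, 0 elsewhere."""
    # Encode the data as a string of '1' (>= threshold) and '0' (< threshold).
    mask = "".join("1" if x >= threshold else "0" for x in b)
    # A run: ones, optionally followed by (gap of 1..max_gap zeros, then ones).
    pattern = re.compile(r"1+(?:0{1,%d}1+)*" % max_gap)
    result = [0] * len(b)
    for m in pattern.finditer(mask):
        for idx in range(m.start(), m.end()):
            result[idx] = idx + 1  # keep MATLAB-style 1-based positions
    return result


b = [2, 5, 110, 113, 55, 115, 80, 90, 120, 35, 123]
print(run_positions(b, max_gap=1))  # [0, 0, 3, 4, 5, 6, 0, 0, 9, 10, 11]
```

With max_gap=2 the two runs in this example merge into one, because the gap 80, 90 is then short enough to be bridged.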
Here is another solution, not using regexp. It naturally generalizes to arbitrary gap sizes and thresholds. Not sure whether there is a better way to fill the gaps. Explanation in comments:
% maximum step size and threshold
N = 2;
threshold = 100;
% data
b = [2 5 110 113 55 115 80 90 120 35 123];
% find valid data
B = b >= threshold;
B_ind = find(B);
% find lengths of gaps
step_size = diff(B_ind);
% find acceptable steps (and ignore step size 1)
permissible_steps = 1 < step_size & step_size <= N;
% find beginning and end of runs
good_begin = B_ind([permissible_steps, false]);
good_end = good_begin + step_size(permissible_steps);
% fill gaps in B
for ii = 1:numel(good_begin)
B(good_begin(ii):good_end(ii)) = true;
end
% find durations of runs in B. This finds points where we switch from 0 to
% 1 and vice versa. Due to padding the first match is always a start of a
% run, the last one always an end. There will be an even number of matches,
% so we can reshape and diff and thus find the durations
durations = diff(reshape(find(diff([false, B, false])), 2, []));
% get positions of 'good' data
outpos = zeros(size(b));
outpos(B) = find(B);

A case of a making-change (sort of). Find the minimal composition of weights

TLDR: (part 1) Need to print out the best composition of weights to reach a target weight. (part 2) Don't know what approach to choose. (part 3) Also, recursion is not my friend.
I am not asking for a solution, I am just looking for a direction.
PART 1
Some text first.
The input to the program is:
a number of weights
weights themselves
target weights I am supposed to compose
There always has to be a weight that = 1, so all the weights can be composed exactly.
I am supposed to print out the optimal composition of weights, for example
number of weights: 4
weights: 1, 3, 7, 10
target weight: 14
output: 2 x 7
PART 2
The first thing that came to my mind was the unbounded knapsack problem, where I would set all the values for weights to "1" and then I'd look for the lowest possible value in the knapsack. The problem is, my programming skills don't reach that level and my googling skills failed me when I wanted to find a fine article/code/video/whatever to understand it.
Then someone pointed out the making-change problem. The problem there is that it usually uses an array and I am expecting really large numbers, I cannot afford to alloc an array of size = target weight. Also, it seems to require quite a lot of magic if I want not only the lowest possible number of weights, but the exact counts.
My solution for now:
1. sort the weights in descending order
2. count the number of weights yielded from the greedy algorithm
3. remove one biggest weight used and try to compose the weight without it
4. repeat 3 until I have removed all the "biggest weights" or the number of weights started to grow again
(for weights = 1, 3, 7, 10 and target = 14, greedy would give me 1 x 10 + 1 x 3 + 1 x 1, after the third step I would get (0 x 10 +) 2 x 7)
I got here. Only I need to repeat this not outside the recursive function (like I was doing until I realised it still doesn't give me the right results) but I need to move the loop into the recursive function.
PART 3
This is how parts of my code look now:
for ( i = 0; i < weights_cnt; i ++ )
    for ( j = 0; j <= weight / *(weights + i); j ++ )
    {
        *counter = 0;
        if ( (res = greedyMagicImproved(weights + i, weight / *(weights + i) - j, weight, counter, min)) == 0 || min > *counter ) break;
        else min = *counter;
    }
It's a mess, I know. (the first recursive function I've ever written, sorry for that)
int greedyMagicImproved (int * weights, int limit, int weight, int * counter, int min)
{
    if ( *counter > min ) return 0;
    else if ( weight % *weights == 0 )
    {
        printf ("%d * %d\n", limit, *weights);
        *counter += limit;
        return *counter;
    }
    else if ( weight == 0 ) return *counter;
    else if ( weight / *weights )
    {
        printf ("%d * %d + ", limit, *weights);
        *counter += limit;
        return (greedyMagicImproved(weights + 1, (weight - *weights * limit) / *(weights + 1), (weight - *weights * limit) % *weights, counter, min));
    }
    else return greedyMagicImproved(weights + 1, weight / *(weights + 1), weight, counter, min);
}
This one produces something like this:
Number of weights:
8
Actual weights of weights:
1 2 4 5 10 20 60 95
Weights to be composed:
124
GREEDY = 1 * 95 + 1 * 20 + 1 * 5 + 1 * 4
IMP = 1 * 95 + 1 * 20 + 1 * 5 + 1 * 4
2 * 60 + 1 * 4
6 * 20 + 1 * 4
... some more irrelevant results I'll deal with later
28
GREEDY = 1 * 20 + 1 * 5 + 1 * 2 + 1 * 1
IMP = 1 * 20 + 1 * 5 + 1 * 2 + 1 * 1
1 * 20 + 1 * 5 + 1 * 2 + 1 * 1
1 * 20 + 1 * 5 + 1 * 2 + 1 * 1
2 * 10 + 1 * 5 + 1 * 2 + 1 * 1
5 * 5 + 1 * 2 + 1 * 1
... some more results as well
While I get to see the correct result in the first case, I do not in the second.
So basically, my question is: Should I try to move the loop part into the recursion (and write it basically all over again because I have no idea how to do it) or should I go stealing/packing and making change?
Here is a DP formulation:
Let w[i], i=0,1,...p be the coin weights and f(t,i) be the number of coins needed to hit target t using only w[k], k >= i. When there is no possible way to make change, then f(t,i) is infinite. With this we know
f(t,i) = min_{n} [ n + f(t - n * w[i], i + 1) ]
In English, to hit the target with the minimum of coins using only coins i, i+1, ... p, choose all possible quantities n of coin i and then use the same DP to make change for the remaining amount using only coins i+1,i+2,..., finally choosing the solution that produced the minimum.
The base cases are common sense. For example f(0,_) = 0. You don't need any coins to hit a target of zero.
If T is the problem target, then the answer will be f(T,0).
Since you don't want the answer, I'll let you convert this to code. It's likely you'll get to answers faster if the weights are sorted in descending order.
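For readers who do want to see the recurrence in code, here is one possible Python sketch of f(t, i) with memoisation. The function name and the (count, composition) return shape are choices made for this illustration only. The memo table can grow large for very big targets, so treat it as a direct transcription of the recurrence rather than a tuned solution.

```python
import math
from functools import lru_cache


def min_weights(weights, target):
    """Return (count, ((n, weight), ...)) for the fewest weights summing to target."""
    w = tuple(sorted(weights, reverse=True))  # descending order, as suggested

    @lru_cache(maxsize=None)
    def f(t, i):
        if t == 0:
            return 0, ()             # base case: nothing left to compose
        if i == len(w):
            return math.inf, ()      # no weights left, target not reachable
        best = (math.inf, ())
        # Try every quantity n of weight w[i], largest first.
        for n in range(t // w[i], -1, -1):
            sub_count, sub_combo = f(t - n * w[i], i + 1)
            if n + sub_count < best[0]:
                combo = ((n, w[i]),) + sub_combo if n else sub_combo
                best = (n + sub_count, combo)
        return best

    return f(target, 0)


print(min_weights([1, 3, 7, 10], 14))  # (2, ((2, 7),))
```

For the weights 1 2 4 5 10 20 60 95 from the question, this returns a count of 3 both for target 124 (2 x 60 + 1 x 4) and for target 28 (1 x 20 + 2 x 4), the second of which the greedy variants in the question do not find.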
