Bound my result to [-1,1] in SAS - arrays

I am calculating a fraction based on the payments made in four years, and I wish to put a cap on my fraction such that it can only be between -1 and 1. Subsequently I'd like to make the following fractions 0 once the cap is maxed out - an example would be:
data want;
input payment1 payment2 payment3 payment4 fraction1 fraction2 fraction3;
datalines;
100 25 25 25 0.25 0.25 0.25
150 50 50 50 0.33 0.33 0.33
50 10 10 10 0.2 0.2 0.2
10 50 60 70 1 0 0
;
run;
I've been looking at the ceiling function with the following code
data want2;
set want;
array fraction(3) fraction1 - fraction3;
array payment(4) payment1 - payment4;
do i = 2 to 4;
fraction(i-1) = payment(i)/payment(1);
end;
run;
data want3;
set want2;
array fraction(3) fraction1 - fraction3;
array fract(3) fract1-fract3;
do i = 1 to 3;
fract = ceil (fraction,1);
end;
drop i;
run;
but I am getting this error
ERROR 72-185: The CEIL function call has too many arguments.
So in all, I'm looking for a way to calculate the fraction of the payments and then cap it at one; once the cap is hit, the subsequent fractions must be zero (which could be done, I suppose, with a simple IF-THEN).

The ceil function is a type of rounding. You need min and max, applied to each array element:
do i = 1 to 3;
fract(i) = min(max(fraction(i), -1), 1);
end;
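Outside SAS, the intended clamp-then-zero behaviour can be sketched in Python; the function name is hypothetical, and treating a hit on the -1 bound the same as a hit on +1 is my own assumption:

```python
def cap_fractions(fractions, lo=-1.0, hi=1.0):
    """Clamp each fraction into [lo, hi]; once any fraction hits a bound,
    force every subsequent fraction to zero (per the question's requirement).
    Treating the -1 bound like the +1 bound is an assumption."""
    out = []
    capped = False
    for f in fractions:
        if capped:
            out.append(0.0)
            continue
        c = min(max(f, lo), hi)  # the same min(max(...)) idea as the SAS answer
        out.append(c)
        if c in (lo, hi):
            capped = True
    return out
```

On the last example row (payments 10 50 60 70), the raw fractions 5, 6, 7 become 1, 0, 0, matching the want dataset.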

Convert continuous binary fraction to decimal fraction in C

I implemented a digit-by-digit calculation of the square root of two. Each round it outputs one bit of the fractional part, e.g.
1 0 1 1 0 1 0 1 etc.
I want to convert this output to decimal numbers:
4 1 4 2 1 3 6 etc.
The issue I'm facing is that this would generally work like this:
1 * 2^-1 + 0 * 2^-2 + 1 * 2^-3 etc.
I would like to avoid fractions altogether and work with integers to convert from binary to decimal. I would also like to print each decimal digit as soon as it has been computed.
Converting to hex is trivial, as I only have to wait for 4 bits. Is there a smart approach to convert to base 10 which allows observing only a part of the whole output and, ideally, removing digits from the equation once we are certain that they won't change anymore? I.e.
1   0
2   0.25
3   0.375
4   0.375
5   0.40625
6   0.40625
7   0.4140625
8   0.4140625
After processing the 8th bit, I'm pretty sure that 4 is the first decimal fraction digit. Therefore I would like to remove the 0.4 completely from the equation, to reduce the number of bits I need to take care of.
Is there a smart approach to convert to base10 which allows to observe only a part of the whole output and ideally remove digits from the equation, once we are certain that it won't change anymore (?)
Yes, eventually, in practice; but in theory, no, in select cases.
This is akin to the Table-maker's dilemma.
Consider the below handling of a value near 0.05. As long as the binary sequence is .0001 1001 1001 1001 1001 ..., we cannot know whether the decimal equivalent is 0.04999999... or 0.05000000... (eventually followed by a non-zero digit).
#include <math.h>   /* nextafter */
#include <stdio.h>

int main(void) {
    double a;
    a = nextafter(0.05, 0);
    printf("%20a %.20f\n", a, a);
    a = 0.05;
    printf("%20a %.20f\n", a, a);
    a = nextafter(0.05, 1);
    printf("%20a %.20f\n", a, a);
    return 0;
}
0x1.9999999999999p-5 0.04999999999999999584
0x1.999999999999ap-5 0.05000000000000000278
0x1.999999999999bp-5 0.05000000000000000971
Code can analyse the incoming sequence of binary fraction bits and then, after each bit, ask two questions: "if the remaining bits are all 0, what is it in decimal?" and "if the remaining bits are all 1, what is it in decimal?". In many cases the answers will share common leading significant digits. Yet as shown above, as long as 1001 keeps being received, there are no common significant decimal digits.
A usual "out" is to have an upper bound on the number of decimal digits that will ever be shown. In that case the code is only presenting a rounded result, and that can be deduced in finite time even if the binary input sequence remains 1001 ad nauseam.
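That two-question analysis can be sketched in Python with exact rational arithmetic; `common_digits` is a hypothetical helper name, and treating the all-ones tail as a closed upper bound is a simplifying assumption:

```python
from fractions import Fraction

def common_digits(bits, max_digits=10):
    """Decimal digits already certain after seeing `bits`: the digits on
    which the 'remaining bits all 0' value and the 'remaining bits all 1'
    value agree (the all-ones tail is treated as a closed bound)."""
    lo = Fraction(0)
    for k, b in enumerate(bits, 1):
        lo += Fraction(b, 2 ** k)
    hi = lo + Fraction(1, 2 ** len(bits))
    digits = []
    for _ in range(max_digits):
        lo *= 10
        hi *= 10
        d_lo, d_hi = int(lo), int(hi)
        if d_lo != d_hi:
            break  # bounds disagree: this digit is not settled yet
        digits.append(d_lo)
        lo -= d_lo
        hi -= d_hi
    return digits
```

With the first eight fraction bits of sqrt(2), 0 1 1 0 1 0 1 0 (value 0.4140625), the bounds already agree on the digits 4 and 1; a digit is only emitted once the two bounds' leading digits coincide.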
The issue I'm facing is that this would generally work like this:
1 * 2^-1 + 0 * 2^-2 + 1 * 2^-3 etc.
Well, 1/2 = 5/10 and 1/4 = 25/100 and so on, which means you will need powers of 5 and to shift the values by powers of 10,
so given 0 1 1 0 1
[1] 0 * 5 = 0
[2] 0 * 10 + 1 * 25 = 25
[3] 25 * 10 + 1 * 125 = 375
[4] 375 * 10 + 0 * 625 = 3750
[5] 3750 * 10 + 1 * 3125 = 40625
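That scheme - shift the running total by a power of 10 and add the next power of 5 - can be sketched with integer-only Python (the helper name is mine):

```python
def bits_to_scaled_decimal(bits):
    """Return sum(b_k * 2^-k) scaled by 10^len(bits), using only integers.
    Works because 2^-k = 5^k / 10^k, so each step multiplies the running
    total by 10 and adds b_k * 5^k."""
    acc = 0
    pow5 = 1
    for b in bits:
        pow5 *= 5
        acc = acc * 10 + b * pow5
    return acc
```

For the bits 0 1 1 0 1 this yields 40625, i.e. 0.40625, matching the worked example above.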
Edit:
Is there a smart approach to convert to base10 which allows to observe only a part of the whole output and ideally remove digits from the equation, once we are certain that it won't change anymore
It might actually be possible to pop the most significant digits (MSD) in this case. This will be a bit long, but please bear with me.
Consider the values X and Y:
If X has the same number of digits as Y, then the MSD will change.
10000 + 10000 = 20000
If Y has 1 or more digits less than X, then the MSD can change.
19000 + 1000 = 20000
19900 + 100 = 20000
So the first point is self-explanatory, but the second point is what will allow us to pop the MSD. The first thing we need to know is that the values we are adding are continuously being halved every iteration. This means that if we only consider the MSD, the largest value in base 10 is 9, which produces the sequence
9 > 4 > 2 > 1 > 0
If we sum up these values it equals 16, but if we also consider the values of the next digits (e.g. 9.9 or 9.999), the sum actually approaches 20 without exceeding it. What this means is that if X has n digits and Y has n-1 digits, the MSD of X can still change. But if X has n digits and Y has n-2 digits, then as long as the (n-1)th digit of X is less than 8, the MSD will not change (otherwise it could be 8 + 2 = 10 or 9 + 2 = 11, which means the MSD changes). Here are some examples.
Assuming X is the running sum of sqrt(2) and Y is 5^n:
1. If X = 10000 and Y = 9000 then the MSD of X can change.
2. If X = 10000 and Y = 900 then the MSD of X will not change.
3. If X = 19000 and Y = 900 then the MSD of X can change.
4. If X = 18000 and Y = 999 then the MSD of X can change.
5. If X = 17999 and Y = 999 then the MSD of X will not change.
6. If X = 19990 and Y = 9 then the MSD of X can change.
In the example above, on point #2 and #5, the 1 can already be popped. However for point #6, it is possible to have 19990 + 9 + 4 = 20003, but this also means that both 2 and 0 can be popped after that happened.
Here's a simulation for sqrt(2)
 i   Out    X                  Y                flag
 ---------------------------------------------------
  1         0                  5                  0
  2         25                 25                 1
  3         375                125                1
  4         3,750              625                0
  5         40,625             3,125              1
  6         406,250            15,625             0
  7   4     140,625            78,125             1
  8   4     1,406,250          390,625            0
  9   4     14,062,500         1,953,125          0
 10   41    40,625,000         9,765,625          0
 11   41    406,250,000        48,828,125         0
 12   41    4,062,500,000      244,140,625        0
 13   41    41,845,703,125     1,220,703,125      1
 14   414   18,457,031,250     6,103,515,625      0
 15   414   184,570,312,500    30,517,578,125     0
 16   414   1,998,291,015,625  152,587,890,625    1
 17   4142  0,745,849,609,375  762,939,453,125    1
You can use a multiply-and-divide approach to reduce the floating point arithmetic.
1 0 1 1
which is equivalent to 1*2^0 + 0*2^(-1) + 1*2^(-2) + 1*2^(-3) and can be simplified to (1*2^3 + 0*2^2 + 1*2^1 + 1*2^0)/(2^3). Only the division remains a floating point operation; all the rest is integer arithmetic. Multiplication by 2 can be implemented as a left shift.

Loops and output

I am trying to get a bit handy with my loop and output statements. Currently I have a loan which amortizes like this:
data have;
input Payment2017 Payment2018 Payment2019 Payment2020;
datalines;
100 10 10 10
;
run;
I'm trying to create a maturity and re-issuance profile that looks like this (I'll explain the logic below, alongside my current code):
data want;
input P2017 P2018 P2019 P2020 F1 F2 F3 MP2017 MP2018 MP2019 MP2020 NI2017 NI2018 NI2019 NI2020;
datalines;
100 10 10 10 0.1 0.1 0.1 100 10 10 10 0 0 0 0
100 10 10 10 0.1 0.1 0.1 0 10 1 1 0 10 0 0
100 10 10 10 0.1 0.1 0.1 0 0 11 1.1 0 0 11 0
100 10 10 10 0.1 0.1 0.1 0 0 0 12.1 0 0 0 12.1
;
run;
so the logic is that:
Payment2017 = the balance at the start of the year
Payment2018 - 2020 = the amount paid each period
F1-F3 is the fraction of the loan that is being paid each period.
MP2017-MP2020 is the amount of the loan that is paid back - essentially it is
mp(i) = p(i) *f(i)
NI2017-NI2020 is the amount that is newly issued, if you assume that each time I pay off a bit of the loan, it is added back onto the loan. So the current code which I am using looks like this, but I'm having some issues with the output and loops.
data want;
set have;
array MaturityProfile(4) MaturityProfile&StartDate-MaturityProfile&EndDate;
array NewIssuance(4) NewIssuance&StartDate - NewIssuance&EndDate;
array p(4) payment&StartDate-payment&EndDate;
array fraction(3); * track constant fraction determined at start of profile;
MaturityProfile(1) = P(1);
do i = 1 to 3;
fraction(i) = p(i+1) / p(1);
end;
iter=2;
do j = 1 to 2;
do i = iter to 4;
MaturityProfile(i) = P(j) * Fraction(i-j);
newissuance(i) = MaturityProfile(i);
end;
output;
iter=iter+1;
end;
output;
*MaturityProfile(4) = ( P(3) + MaturityProfile(2) ) * Fraction(1);
*output;
drop i;
drop j;
drop iter;
run;
I'm trying to find a way of keeping the first two rows as they currently output, but the third row needs the sum of the second row's column (i.e. NewIssuance2019), multiplied by fraction 1 - so basically I want the output to look like the table I've put in the data want step.
TIA.
I managed to fix this by doing:
data want;
set have;
array MaturityProfile(4) MaturityProfile&StartDate-MaturityProfile&EndDate;
array NewIssuance(4) NewIssuance&StartDate - NewIssuance&EndDate;
array p(4) payment&StartDate-payment&EndDate;
array fraction(3); * track constant fraction determined at start of profile;
array Total(4) Total1-Total4;
MaturityProfile(1) = P(1);
do i = 1 to 3;
fraction(i) = p(i+1) / p(1);
end;
iter=2;
do j = 1 to 2;
do i = iter to 4;
MaturityProfile(i) = P(j) * Fraction(i-j);
Total(i)=MaturityProfile(i) + P(i);
end;
output;
iter=iter+1;
end;
MaturityProfile(4) = Total(3) * Fraction(1);
output;
drop i;
drop j;
drop iter;
run;
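For intuition, the re-issuance logic behind the fixed data step can be sketched outside SAS: the total maturing in each year is that year's scheduled payment plus a fraction of every earlier year's re-issued total. A minimal Python illustration (the helper name and list layout are my own assumptions):

```python
def maturity_totals(payments, fractions):
    """Total amount maturing each year when every maturing amount is
    immediately re-issued with the same payment fractions.
    payments[0] is the opening balance; fractions[k] is the share paid
    k+1 years after a loan is issued."""
    n = len(payments)
    totals = [float(payments[0])] + [0.0] * (n - 1)
    for year in range(1, n):
        totals[year] = float(payments[year])  # scheduled payment on the original loan
        for prior in range(1, year):          # plus payments on each re-issued loan
            totals[year] += totals[prior] * fractions[year - prior - 1]
    return totals
```

With the have dataset (payments 100, 10, 10, 10 and fractions of 0.1), this gives 100, 10, 11, 12.1 - the maturing amounts down the diagonal of the want table.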

Correct way to get weighted average of concrete array-values along continuous interval

I've been searching the web for a while; however, possibly or probably, I am missing the right terminology.
I have arbitrary sized arrays of scalars ...
array = [n_0, n_1, n_2, ..., n_m]
I also have a function f: x -> y, with 0 <= x <= 1, and y an interpolated value from array. Examples:
array = [1,2,9]
f(0) = 1
f(0.5) = 2
f(1) = 9
f(0.75) = 5.5
My problem is that I want to compute the average value over some interval r = [a..b], where a ∈ [0..1] and b ∈ [0..1], i.e. I want to generalize my interpolation function f: x -> y to compute the average along r.
My mind boggles slightly w.r.t. finding the right weighting. Imagine I want to compute f([0.2, 0.8]):
array --> 1 | 2 | 9
[0..1] --> 0.00 0.25 0.50 0.75 1.00
[0.2,0.8] --> ^___________________^
The latter being the range of values I want to compute the average of.
Would it be mathematically correct to compute the average like this?
        1 * (1-0.8)   <- 0.2 'translated' to [0..0.25]
      + 2 * 1
avg = + 9 * 0.2       <- 0.8 'translated' to [0.75..1]
      -------------
        1.4           <-- the sum of weights
This looks correct.
In your example, your interval's length is 0.6. In that interval, your number 2 takes up (0.75-0.25)/0.6 = 0.5/0.6 = 10/12 of the space. Your number 1 takes up (0.25-0.2)/0.6 = 0.05/0.6 = 1/12 of the space, likewise your number 9.
This sums up to 10/12 + 1/12 + 1/12 = 1.
For better intuition, think about it like this: the problem is to determine how much space each array element covers along the interval. The rest is just plugging into the machinery described at http://en.wikipedia.org/wiki/Weighted_average#Mathematical_definition.
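That weighting - each element's weight proportional to the length of its cell's overlap with [a, b] - can be sketched in Python. The function name is hypothetical, and the midpoint cell boundaries are an assumption matching the asker's diagram:

```python
def interval_average(values, a, b):
    """Average of the piecewise-constant interpolant of `values` over [a, b].
    Each value covers the cell between the midpoints of neighbouring sample
    positions (as in the asker's diagram); its weight is the length of that
    cell's overlap with [a, b]."""
    m = len(values)
    pos = [i / (m - 1) for i in range(m)]  # sample positions 0 .. 1
    edges = [0.0] + [(pos[i] + pos[i + 1]) / 2 for i in range(m - 1)] + [1.0]
    total = 0.0
    for v, lo, hi in zip(values, edges, edges[1:]):
        overlap = max(0.0, min(hi, b) - max(lo, a))
        total += v * overlap
    return total / (b - a)
```

For values [1, 2, 9] over [0.2, 0.8] the weights come out 1/12, 10/12, 1/12, giving an average of 2.5.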

Appending for loop / recursion / strange error

I have a matlab/octave for loop which gives me Inf error messages along with incorrect data.
I'm trying to get 240, 120, 60, 30, 15, ... - every number is divided by two, and then that result is divided by two again.
But the code below gives the wrong value when the number hits 30 and 5, and at a couple of other points it doesn't divide by two.
ang=240;
for aa=2:2:10
ang=[ang;ang/aa];
end
240
120
60
30
40
20
10
5
30
15
7.5
3.75
5
2.5
1.25
0.625
24
12
6
3
4
2
1
0.5
3
1.5
0.75
0.375
0.5
0.25
0.125
0.0625
PS: I will be accessing these values from different arrays; that's why I used a for loop, so I can access the values using their indexes.
In addition to the divide-by-zero error you were starting with (fixed in the edit), the approach you're taking isn't actually doing what you think it is. If you print out each step, you'll see why.
Instead of that approach, I suggest taking more of a "matlab way": avoid the loop by making use of vectorized operations.
orig = 240;
divisor = 2.^(0:5); #% vector of 2 to the power of [0 1 2 3 4 5]
ans = orig./divisor;
output:
ans = [240 120 60 30 15 7.5]
Try the following:
ang=240;
for aa=1:5
% sz=size(ang,1);
% ang=[ang;ang(sz)/2];
ang=[ang;ang(end)/2];
end
You should be getting warning: division by zero if you're running it in Octave. That says pretty much everything.
When you divide by zero, you get Inf. Because of your recursion... you see the problem.
You can simultaneously generalise and vectorise by using logic:
ang=240; %Replace 240 with any positive integer you like
ang=ang*2.^-(0:log2(ang));
ang=ang(1:sum(ang==floor(ang)));
This will work for any positive integer (to make it work for negatives as well, replace log2(ang) with log2(abs(ang))), and will produce the vector down to the point at which it goes odd, at which point the vector ends. It's also faster than jitendra's solution:
octave:26> tic; for i=1:100000 ang=240; ang=ang*2.^-(0:log2(ang)); ang=ang(1:sum(ang==floor(ang))); end; toc;
Elapsed time is 3.308 seconds.
octave:27> tic; for i=1:100000 ang=240; for aa=1:5 ang=[ang;ang(end)/2]; end; end; toc;
Elapsed time is 5.818 seconds.
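For reference, the same stop-when-odd halving can be sketched in Python (helper name is mine):

```python
def halvings(n):
    """Repeatedly halve n, stopping once the current value is odd -
    the same stopping rule as the vectorised one-liner above."""
    out = [n]
    while out[-1] % 2 == 0:
        out.append(out[-1] // 2)
    return out
```

halvings(240) produces [240, 120, 60, 30, 15], stopping at the first odd value.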

Finding the row with max separation between elements of an array in matlab

I have an array of size m x n. Each row has n elements, each showing some probability (between 0 and 1). I want to find the row which has the max difference between its elements, while it would be better if its nonzero elements are greater as well.
For example in array Arr:
Arr = [0.1 0 0.33 0 0.55 0;
0.01 0 0.10 0 0.2 0;
1 0.1 0 0 0 0;
0.55 0 0.33 0 0.15 0;
0.17 0.17 0.17 0.17 0.17 0.17]
the best row would be the 3rd row, because it has more distinct values and greater values. How can I compute this using Matlab?
It seems that you're looking for the row with the greatest standard deviation, which is basically a measure of how much the values vary from the average.
If you want to ignore zero elements, use Shai's useful suggestion to replace zero elements to NaN. Indeed, some of MATLAB's built-in functions allow ignoring them:
Arr2 = Arr;
Arr2(~Arr) = NaN;
To find the standard deviation we'll employ nanstd (not std, because it doesn't ignore NaN values) along the rows, i.e. the 2nd dimension:
nanstd(Arr2, 0, 2)
To find the greatest standard deviation and its corresponding row index, we'll apply nanmax and obtain both output variables:
[stdmax, idx] = nanmax(nanstd(Arr2, 0, 2));
Now idx holds the index of the desired row.
Example
Let's run this code on the input that you provided in your question:
Arr = [0.1 0 0.33 0 0.55 0;
0.01 0 0.10 0 0.2 0;
1 0.1 0 0 0 0;
0.55 0 0.33 0 0.15 0;
0.17 0.17 0.17 0.17 0.17 0.17];
Arr2 = Arr;
Arr2(~Arr) = NaN;
[maxstd, idx] = nanmax(nanstd(Arr2, 0, 2))
idx =
3
Note that the values in row #3 differ from one another much more than those in row #1, and therefore the standard deviation of row #3 is greater. This also corresponds to your comment:
... ergo a row with 3 zero and 3 non-zero but close values is worse than a row with 4 zeros and 2 very different values.
For this reason I believe that in this case 3 is indeed the correct answer.
It seems like you wish to ignore 0s in your matrix. You may achieve this by setting them to NaN and proceed using special build-in functions that ignore NaNs (e.g., nanmin, nanmax, etc.)
Here is a sample code for finding the row (ri) with the largest difference between the minimal (nonzero) response and the maximal response:
nArr = Arr;
nArr( Arr == 0 ) = NaN; % replace zeros with NaNs
mn = nanmin(nArr, [], 2); % find minimal, non zero response at each row
mx = nanmax(nArr, [], 2); % maximal response
[~, ri] = nanmax( mx - mn ); % find the row with maximal difference
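A pure-Python stand-in for the standard-deviation approach (hypothetical helper name; zeros are dropped instead of being set to NaN, and the returned index is 0-based):

```python
from statistics import stdev

def best_row(arr):
    """Index (0-based) of the row whose nonzero entries have the largest
    sample standard deviation; dropping zeros mirrors the NaN trick."""
    def row_std(row):
        vals = [v for v in row if v != 0]
        return stdev(vals) if len(vals) > 1 else 0.0
    return max(range(len(arr)), key=lambda i: row_std(arr[i]))
```

On the question's Arr this returns 2, i.e. the 3rd row, agreeing with the nanstd/nanmax result.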
