Calculating the drawdown within a NumPy array (Python)

I am trying to write a function that calculates the biggest dip in each array. The function below takes the difference between the global minimum and the maximum that precedes it, but it does not produce the expected output. The result of calc(C) should be -62: in the segment 11, 66, 45, 4 the value falls from 66 to 4, so the dip is 62 points below 66. How can I fix the function below? (The sample code was taken from a linked issue.)
def calc(arr):
    try:
        _min = min(arr)
        index_min = np.where(arr == _min)[0][0]  # first occurrence
        _max = max(arr[:index_min])
        print(_min - _max)
    except:
        print('No drawdown')
A = np.array([0,2,5,44,-12,3,-5])
B = np.array([0,10,-110,23,45,66,30,2,12])
C = np.array([0,10,11,-23,45,11,66,45,4,12])
D = np.array([0,5,6,7,8])
E = np.array([0,10,5,6,8])
calc(A)
calc(B)
calc(C)
calc(D)
calc(E)
Output:
-56
-120
-34
No drawdown
No drawdown
Expected Output:
-56
-120
-62
No drawdown
-5

The biggest dip does not necessarily start at the global maximum or end at the global minimum. We need an exhaustive approach to find the largest dip:
compute the maximum value seen so far at each position, for which we can use numpy.maximum.accumulate;
calculate the dip at each position as the value minus that running maximum;
take the largest dip (the most negative difference) among all positions.
def calc(a):
    acc_max = np.maximum.accumulate(a)
    return (a - acc_max).min()
calc(A)
# -56
calc(B)
# -120
calc(C)
# -62
calc(D)
# 0
calc(E)
# -5
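Note that the answer returns 0 for D, whereas the question expected "No drawdown". A minimal sketch of one way to reconcile the two, assuming that "no drawdown" should be reported as None (the name max_drawdown and the None convention are illustrative choices, not part of the original answer):

```python
import numpy as np

def max_drawdown(values):
    """Largest peak-to-trough drop as a negative number, or None if the series never falls."""
    a = np.asarray(values)
    if a.size == 0:
        return None
    running_max = np.maximum.accumulate(a)   # highest value seen so far at each position
    dip = (a - running_max).min()            # most negative distance below the running peak
    return None if dip == 0 else dip

C = np.array([0, 10, 11, -23, 45, 11, 66, 45, 4, 12])
D = np.array([0, 5, 6, 7, 8])
print(max_drawdown(C))  # -62
print(max_drawdown(D))  # None
```

The caller can then print 'No drawdown' when the result is None, which keeps the calculation separate from the reporting.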

Related

Finding minimum positive value and its position in each column of a matrix

I need to find the minimum positive values in each column and its position inside the column of a certain matrix. So if I have:
A = [1 4
2 3
3 6]
I need to obtain the values 1 and 3, and the positions 1 and 2. Doing this inside a for loop I obtain correctly the minimum values and its position, but it also catches the negative values:
for bit = 1:2
    [y(bit),x(bit)] = min(A(:,bit));
end
And if I use:
[y(bit),x(bit)] = min(A(A(:,bit)>0));
I don't get the expected result. What am I doing wrong? Thanks.
This can be easily achieved using inf and min...
New method using inf and no looping
Take some random example:
% Generated using A = randi([-100, 100], 10, 3)
A = [ 31 41 -12
-93 -94 -24
70 -45 53
87 -91 59
36 -81 -63
52 65 -2
49 39 -11
-22 -37 29
31 90 42
-66 -94 51];
Set all negative values to positive infinity, which will ensure they are never the minimum value in the column.
A(A<=0) = inf;
% if you want to preserve A, use A2=A; A2(A<=0)=inf;
Now you can just use the min function as expected.
[mins, idx] = min(A);
% mins = 31, 39, 29: as expected
% idx = 1, 7, 8: the indices of the above values in each column as expected.
By default, min returns the column-wise minimum, as you want. To specify this explicitly, use min(A,[],1); see the documentation for more details.
Note that you could achieve the same result by using NaN instead of inf.
Your method
As for why you were getting an unexpected result: you weren't selecting the column of A in your second attempt. It should be corrected to
[y(bit),x(bit)] = min(A(A(:,bit)>0, bit));
However, this will still give an unexpected result! The minimums will be correct, but their indices will be lower than expected. This is because the indices will only count the positive values in each column, so you will get the nth positive number rather than the nth number. The easiest "workaround" is to abandon this method and use the quicker one above which doesn't require looping.
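For readers following along in NumPy rather than MATLAB, the same masking idea can be sketched as below. This is an illustrative translation, not part of the original answer: the function name min_positive_per_column is hypothetical, row indices are 0-based (MATLAB's are 1-based), and a column with no positive entry yields inf.

```python
import numpy as np

def min_positive_per_column(A):
    """Return (minimum positive value, 0-based row index) for each column of A."""
    A = np.asarray(A, dtype=float)
    masked = np.where(A > 0, A, np.inf)   # non-positive entries can never win the minimum
    return masked.min(axis=0), masked.argmin(axis=0)

A = np.array([[ 1, -4],
              [-2,  3],
              [ 3,  6]])
mins, rows = min_positive_per_column(A)
print(mins)  # [1. 3.]
print(rows)  # [0 1]
```

Because the mask keeps every row in place, the returned indices refer to rows of the full matrix, which avoids the index shift described above.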

Find timeline for duration values in Matlab

I have the following time-series:
b = [2 5 110 113 55 115 80 90 120 35 123];
Each number in b is one data point at a time instant. I computed the duration values from b. Duration is represented by all numbers within b larger or equal to 100 and arranged consecutively (all other numbers are discarded). A maximum gap of one number smaller than 100 is allowed. This is how the code for duration looks like:
N = 2; % maximum allowed gap
duration = cellfun(@numel, regexp(char((b>=100)+'0'), [repmat('0',1,N) '+'], 'split'));
giving the following duration values for b:
duration = [4 3];
I want to find the positions (time-lines) within b for each value in duration. Next, I want to replace the other positions located outside duration with zeros. The result would look like this:
result = [0 0 3 4 5 6 0 0 9 10 11];
If anyone could help, it would be great.
Answer to original question: pattern with at most one value below 100
Here's an approach using a regular expression to detect the desired pattern. I'm assuming that one value <100 is allowed only between (not after) values >=100. So the pattern is: one or more values >=100, optionally followed by a single value <100 and one or more further values >=100.
b = [2 5 110 113 55 115 80 90 120 35 123]; %// data
B = char((b>=100)+'0'); %// convert to string of '0' and '1'
[s, e] = regexp(B, '1+(.1+|)', 'start', 'end'); %// find pattern
y = 1:numel(B);
c = any(bsxfun(@ge, y, s(:)) & bsxfun(@le, y, e(:))); %// filter by locations of pattern
y = y.*c; %// result
This gives
y =
0 0 3 4 5 6 0 0 9 10 11
Answer to edited question: pattern with at most n values in a row below 100
The regexp needs to be modified, and it has to be dynamically built as a function of n:
b = [2 5 110 113 55 115 80 90 120 35 123]; %// data
n = 2;
B = char((b>=100)+'0'); %// convert to string of '0' and '1'
r = sprintf('1+(.{1,%i}1+)*', n); %// build the regular expression from n
[s, e] = regexp(B, r, 'start', 'end'); %// find pattern
y = 1:numel(B);
c = any(bsxfun(@ge, y, s(:)) & bsxfun(@le, y, e(:))); %// filter by locations of pattern
y = y.*c; %// result
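The regular-expression idea carries over to other languages. Below is a rough Python sketch under stated assumptions: the function name positions_in_runs is hypothetical, n is the largest number of consecutive below-threshold values tolerated inside a run (it must be at least 1), and positions are reported 1-based to match the MATLAB result. It uses an explicit '0' where the answer above uses '.'.

```python
import re
import numpy as np

def positions_in_runs(b, threshold=100, n=1):
    """1-based positions inside runs of values >= threshold; 0 elsewhere.

    Up to n consecutive values below the threshold are tolerated inside a run,
    but only between (never after) values at or above the threshold.
    """
    flags = "".join("1" if v >= threshold else "0" for v in b)
    pattern = re.compile(r"1+(?:0{1,%d}1+)*" % n)
    out = np.zeros(len(b), dtype=int)
    for m in pattern.finditer(flags):
        # m.start() is 0-based and m.end() is exclusive, so shift by one
        out[m.start():m.end()] = np.arange(m.start() + 1, m.end() + 1)
    return out

b = [2, 5, 110, 113, 55, 115, 80, 90, 120, 35, 123]
print(positions_in_runs(b))       # [ 0  0  3  4  5  6  0  0  9 10 11]
print(positions_in_runs(b, n=2))  # [ 0  0  3  4  5  6  7  8  9 10 11]
```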
Here is another solution, not using regexp. It naturally generalizes to arbitrary gap sizes and thresholds. Not sure whether there is a better way to fill the gaps. Explanation in comments:
% maximum step size and threshold
N = 2;
threshold = 100;
% data
b = [2 5 110 113 55 115 80 90 120 35 123];
% find valid data
B = b >= threshold;
B_ind = find(B);
% find lengths of gaps
step_size = diff(B_ind);
% find acceptable steps (and ignore step size 1)
permissible_steps = 1 < step_size & step_size <= N;
% find beginning and end of runs
good_begin = B_ind([permissible_steps, false]);
good_end = good_begin + step_size(permissible_steps);
% fill gaps in B
for ii = 1:numel(good_begin)
    B(good_begin(ii):good_end(ii)) = true;
end
% find durations of runs in B. This finds points where we switch from 0 to
% 1 and vice versa. Due to padding the first match is always a start of a
% run, the last one always an end. There will be an even number of matches,
% so we can reshape and diff and thus find the durations
durations = diff(reshape(find(diff([false, B, false])), 2, []));
% get positions of 'good' data
outpos = zeros(size(b));
outpos(B) = find(B);

How can I find minimum values from array in matlab?

I want to extract the two points (i.e. their values) that are marked with a black outline in the figure. These minima are points 2 and 5. After extracting the coordinates of these marked points, I want to calculate the distance between them.
The code that I am using to plot the average values of the image and to calculate the minima and their locations is
I1=imread('open.jpg');
I2=rgb2gray(I1);
figure, title('open');
plot(1:size(I2,1), mean(I2,2));
hold on
horizontalAverages = mean(I2 , 2);
plot(1:size(I2,1) , horizontalAverages)
[Minimas locs] = findpeaks(-horizontalAverages)
plot(locs , -1*Minimas , 'r*')
Minima
-86.5647
-80.3647
-81.3588
-106.9882
-77.0765
-77.8235
-92.2353
-106.2235
-115.3118
-98.3706
locs =
30
34
36
50
93
97
110
121
127
136
It is a bit unclear from your question what you are actually looking for, but the following one liner will get you the local minima:
% Some dummy data
x = 1:11;
y = [3 2 1 0.5 1 2 1 0 1 2 3];
min_idx = ([0 sign(diff(y))] == -1) & ([sign(diff(y)) 0] == 1);
figure
plot(x, y);
hold on;
scatter(x(min_idx), y(min_idx))
hold off;
Use the 'findpeaks' function, if you have the signal processing toolbox.
[y,locs]=findpeaks(-x)
will find the local minima. This function has a ton of options to handle all kinds of special cases, so is very useful.
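The sign-of-difference one-liner from the first answer can also be written in NumPy. This is a sketch only: the function name local_minima_mask is illustrative, indices are 0-based, and, like the MATLAB one-liner, it marks only strict minima (a point where the signal falls into it and rises out of it), so flat-bottomed minima are not detected.

```python
import numpy as np

def local_minima_mask(y):
    """Boolean mask that is True where y falls into a point and rises out of it."""
    s = np.sign(np.diff(y))
    falls_in = np.concatenate(([0], s)) == -1   # slope arriving at each point is negative
    rises_out = np.concatenate((s, [0])) == 1   # slope leaving each point is positive
    return falls_in & rises_out

y = np.array([3, 2, 1, 0.5, 1, 2, 1, 0, 1, 2, 3])
idx = np.flatnonzero(local_minima_mask(y))
print(idx)     # [3 7]
print(y[idx])  # [0.5 0. ]
```

Once the indices of the two minima of interest are known, the distance between them along the x-axis is simply the difference of their positions.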

Bad value returned by calculation

This function is meant to sum all of the numbers at even indexes of the list, and then multiply this sum by the last number of the list.
checkio = [-37,-36,-19,-99,29,20,3,-7,-64,84,36,62,26,-76,55,-24,84,49,-65,41]
def checkzi(array):
    if len(array) != 0:
        sum_array = 0
        for i in array:
            x = array.index(i)
            if (x % 2 == 0):
                sum_array += int(i)
                print (sum_array)
        print (sum_array)
        answer = (sum_array) * (array[len(array)-1])
        return (answer)
    else:
        return 0
checkzi(checkio)
the 'print' output I get is:
-37
-56
-27
-24
-88
-52
-26
29
-36
-36
.
From this I can see that the last number that was added correctly was 55. After 55, 84 wasn't added.
Moreover, the final result that I get is -1476, while it is supposed to be 1968.
I can't find any reason for this, at least not one I can see.
Any ideas, anyone?
Thanks!!
array.index() will always return the first index at which a value is found. So you're looping through every element, and then looking to see what index it's at--but if there are duplicate elements (which there are), then you only see the index of the first one, leading you to always add (or always exclude) that number whenever you encounter it.
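A quick check on the list from the question illustrates this. The variable names first and all_positions are just for illustration:

```python
checkio = [-37, -36, -19, -99, 29, 20, 3, -7, -64, 84,
           36, 62, 26, -76, 55, -24, 84, 49, -65, 41]

# list.index() stops at the first match, so both occurrences of 84 map to index 9
first = checkio.index(84)
print(first)  # 9

# enumerate() reports the true position of every occurrence
all_positions = [i for i, v in enumerate(checkio) if v == 84]
print(all_positions)  # [9, 16]
```

The second 84 sits at the even index 16, but index() reports the odd index 9, so it is skipped.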
A much cleaner (and quicker) way to do this is to only iterate over the even elements of the list in the first place, using Python's slice notation:
checkio = [-37,-36,-19,-99,29,20,3,-7,-64,84,36,62,26,-76,55,-24,84,49,-65,41]
def checkzi(array):
    sum_array = 0
    for value in array[::2]:  # loop over all values at even indexes
        sum_array += value
    return sum_array * array[-1]  # multiply by the last element in the original array
Using the built-in sum function, you could even one-line this whole thing:
def checkzi(array):
    return sum(array[::2]) * array[-1]
The problem is that array.index() will return the first instance of a value. You have the value 84 twice - so since the first index is odd, you never add it.
You really need to keep track of the index, not rely on uniqueness of the values. You do this with
for idx, val in enumerate(array):
Now the first variable is the index and the second is the value. Test idx % 2 == 0 and you can figure it out from here.
Update: here is the complete code, making clear (I hope) how this works:
checkio = [-37,-36,-19,-99,29,20,3,-7,-64,84,36,62,26,-76,55,-24,84,49,-65,41]
def checkzi(array):
    if len(array) != 0:
        sum_array = 0
        for idx, x in enumerate(array):
            print("testing element", idx, " which has value ", x)
            if (idx % 2 == 0):
                sum_array += x
                print("sum is now ", sum_array)
            else:
                print("odd element - not summing")
        print(sum_array)
        answer = (sum_array) * (array[len(array)-1])
        return (answer)
    else:
        return 0
checkzi(checkio)
Output:
testing element 0 which has value -37
sum is now -37
testing element 1 which has value -36
odd element - not summing
testing element 2 which has value -19
sum is now -56
testing element 3 which has value -99
odd element - not summing
testing element 4 which has value 29
sum is now -27
testing element 5 which has value 20
odd element - not summing
testing element 6 which has value 3
sum is now -24
testing element 7 which has value -7
odd element - not summing
testing element 8 which has value -64
sum is now -88
testing element 9 which has value 84
odd element - not summing
testing element 10 which has value 36
sum is now -52
testing element 11 which has value 62
odd element - not summing
testing element 12 which has value 26
sum is now -26
testing element 13 which has value -76
odd element - not summing
testing element 14 which has value 55
sum is now 29
testing element 15 which has value -24
odd element - not summing
testing element 16 which has value 84
sum is now 113
testing element 17 which has value 49
odd element - not summing
testing element 18 which has value -65
sum is now 48
testing element 19 which has value 41
odd element - not summing
48
You obviously want to take the print statements out - I added them to help explain the program flow.

Using a for loop to generate elements of a vector

I am trying to compute the expression (q + NA - 1)! / (q! (NA - 1)!) for several values of q,
and I would like to store each value in a row vector. Here is my attempt:
multiA = [1];
multiB = [];
NA = 6;
NB = 4;
q = [0,1,2,3,4,5,6];
for i=2:7
    multiA = [multiA(i-1), (factorial(q(i) + NA - 1))/(factorial(q(i))*factorial(NA-1))];
    %multiA = [multiA, multiA(i)];
end
multiA
But this does not work. I get the error message
Attempted to access multiA(3); index out of bounds because numel(multiA)=2.
multiA = [multiA(i-1), (factorial(q(i) + NA - 1))/(factorial(q(i))*factorial(NA-1))];
Is my code even remotely close to what I want to achieve? What can I do to fix it?
You don't need any loop, just use the vector directly.
NA = 6;
q = [0,1,2,3,4,5,6];
multiA = factorial(q + NA - 1)./(factorial(q).*factorial(NA-1))
gives
multiA =
1 6 21 56 126 252 462
For multiple N a loop isn't necessary either:
N = [6,8,10];
q = [0,1,2,3,4,5,6];
[N,q] = meshgrid(N,q)
multiA = factorial(q + N - 1)./(factorial(q).*factorial(N-1))
Also consider the following remarks from the documentation of f = factorial(n) regarding the loss of accuracy for n > 21:
Limitations
The result is only accurate for double-precision values of n that are less than or equal to 21. A larger value of n produces a result that has the correct order of magnitude and is accurate for the first 15 digits. This is because double-precision numbers are only accurate up to 15 digits.
For single-precision input, the result is only accurate for values of n that are less than or equal to 13. A larger value of n produces a result that has the correct order of magnitude and is accurate for the first 8 digits. This is because single-precision numbers are only accurate up to 8 digits.
Factorials of moderately large numbers can cause overflow. Two possible approaches to prevent that:
Avoid computing terms that will cancel. This approach is especially suited to the case when q is of the form 1,2,... as in your example. It also has the advantage that, for each value of q, the result for the previous value is reused, thus minimizing the number of operations:
>> q = 1:6;
>> multiA = cumprod((q+NA-1)./q)
multiA =
6 21 56 126 252 462
Note that 0 is not allowed in q. But the result for 0 is just 1, so the final result would be just [1 multiA].
For q arbitrary (not necessarily of the form 1,2,...), you can use the gammaln function, which gives the logarithms of the factorials:
>> q = [0 1 2 6 3];
>> multiA = exp(gammaln(q+NA)-gammaln(q+1)-gammaln(NA));
multiA =
1.0000 6.0000 21.0000 462.0000 56.0000
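The log-gamma trick is not specific to MATLAB. A minimal Python sketch using the standard library's math.lgamma is shown below; the function name multiplicity_via_lgamma is hypothetical. As in the MATLAB version, the result is a floating-point approximation, so it is rounded before being shown.

```python
import math

def multiplicity_via_lgamma(q, n):
    """Approximate (q + n - 1)! / (q! * (n - 1)!) without forming large factorials."""
    # lgamma(k + 1) equals log(k!), so the ratio becomes a sum of logarithms
    log_value = math.lgamma(q + n) - math.lgamma(q + 1) - math.lgamma(n)
    return math.exp(log_value)

NA = 6
values = [round(multiplicity_via_lgamma(q, NA)) for q in [0, 1, 2, 6, 3]]
print(values)  # [1, 6, 21, 462, 56]
```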
You want to append a new element to the end of 'multiA':
for i=2:7
    multiA = [multiA, (factorial(q(i) + NA - 1))/(factorial(q(i))*factorial(NA-1))];
end
A function handle makes it much simpler:
%define:
omega=@(q,N)(factorial(q + N - 1))./(factorial(q).*factorial(N-1))
%use:
omega(0:6,4) %q=0..6, N=4
It might be better to use nchoosek as opposed to factorial, since the expression is the binomial coefficient "q+NA-1 choose q". The factorials lose accuracy and can overflow quite easily.
multiA=nan(1,7);
for i=1:7
    multiA(i)=nchoosek(q(i)+NA-1, q(i));
end
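For comparison, the same binomial-coefficient formulation can be sketched in Python, where math.comb (available from Python 3.8) works with exact integers, so neither overflow nor rounding is a concern. The variable names mirror the question; nothing else is taken from the original posts.

```python
import math

NA = 6
q = range(7)  # q = 0, 1, ..., 6

# (q + NA - 1)! / (q! * (NA - 1)!) is the binomial coefficient C(q + NA - 1, q)
multiA = [math.comb(k + NA - 1, k) for k in q]
print(multiA)  # [1, 6, 21, 56, 126, 252, 462]
```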
