Finding minimum positive value and its position in each column of a matrix - arrays

I need to find the minimum positive values in each column and its position inside the column of a certain matrix. So if I have:
A = [1 4
2 3
3 6]
I need to obtain the values 1 and 3, and the positions 1 and 2. Doing this inside a for loop, I correctly obtain the minimum values and their positions, but it also picks up negative values:
for bit = 1:2
[y(bit),x(bit)] = min(A(:,bit));
end
And if I use:
[y(bit),x(bit)] = min(A(A(:,bit)>0));
I don't receive the expected result. What am I doing wrong? Thanks.

This can be easily achieved using inf and min...
New method using inf and no looping
Take some random example:
% Generated using A = randi([-100, 100], 10, 3)
A = [ 31 41 -12
-93 -94 -24
70 -45 53
87 -91 59
36 -81 -63
52 65 -2
49 39 -11
-22 -37 29
31 90 42
-66 -94 51];
Set all non-positive values (negatives and zeros) to positive infinity, which ensures they are never the minimum value in the column.
A(A<=0) = inf;
% if you want to preserve A, use A2=A; A2(A<=0)=inf;
Now you can just use the min function as expected.
[mins, idx] = min(A);
% mins = 31, 39, 29: as expected
% idx = 1, 7, 8: the indices of the above values in each column as expected.
By default, min returns the column-wise minimum, which is what you want. To specify this explicitly, use min(A,[],1); see the documentation for more details.
Note that you could achieve the same result by using NaN instead of inf, since min ignores NaN values by default.
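The same masking idea carries over to other languages. Below is a minimal Python sketch, assuming the matrix is stored as a list of rows and that every column contains at least one positive entry. The function name min_positive_per_column is made up for this illustration, and positions are reported 1-based to match MATLAB.

```python
import math

def min_positive_per_column(rows):
    """Return (mins, positions) of the smallest positive entry in each column.

    Positions are 1-based row numbers, to match MATLAB's convention.
    Assumes every column has at least one positive entry.
    """
    mins, positions = [], []
    for column in zip(*rows):  # iterate column by column
        # Mask non-positive entries with +inf so they can never be the minimum.
        masked = [v if v > 0 else math.inf for v in column]
        smallest = min(masked)
        mins.append(smallest)
        positions.append(masked.index(smallest) + 1)  # first occurrence
    return mins, positions

A = [[31, 41, -12], [-93, -94, -24], [70, -45, 53], [87, -91, 59],
     [36, -81, -63], [52, 65, -2], [49, 39, -11], [-22, -37, 29],
     [31, 90, 42], [-66, -94, 51]]
print(min_positive_per_column(A))  # ([31, 39, 29], [1, 7, 8])
```

Because the masked column keeps its original length, the reported positions refer to rows of the full matrix.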
Your method
As for why you were getting an unexpected result: you weren't selecting the column of A when indexing inside your loop. The second attempt should be corrected to
[y(bit),x(bit)] = min(A(A(:,bit)>0, bit));
However, this will still give an unexpected result! The minimums will be correct, but their indices may be lower than expected. This is because min only sees the positive values of each column, so the index it returns counts positions among those positive values: you get "the nth positive number" rather than "the nth number" of the column. The easiest "workaround" is to abandon this method and use the quicker one above, which doesn't require looping.
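To see the index shift concretely, here is a small Python sketch using the second column of the example matrix. It also shows one possible way to map the position in the filtered list back to the original row. All variable names are illustrative, and positions are 1-based as in MATLAB.

```python
column = [41, -94, -45, -91, -81, 65, 39, -37, 90, -94]

# 1-based row numbers of the positive entries, like find(A(:,bit) > 0) in MATLAB.
positive_rows = [i + 1 for i, v in enumerate(column) if v > 0]
positives = [column[i - 1] for i in positive_rows]

smallest = min(positives)
filtered_pos = positives.index(smallest) + 1    # position among positives only
original_pos = positive_rows[filtered_pos - 1]  # position in the full column

print(smallest, filtered_pos, original_pos)  # 39 3 7
```

The minimum 39 is the third positive value but sits in row 7 of the column, which is the mismatch described above.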


Generating random poll numbers

I struggle with this simple problem: I want to create some random poll numbers. I have 4 variables I need to fill with data (actually an array of integers). These numbers should represent random percentages, and all the percentages added together must be 100%. Sounds simple.
But I think it isn't that easy. My first attempt was to generate a random number between 10 and base (base = 100), and subtract the number from the base. I did this 3 times, and the last value was assigned the remaining base. Is there a more elegant way to do that?
My question in a few words:
How can I fill this array with random values, which will be 100 when added together?
int values[4];
You need to write your code to emulate what you are simulating.
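So if you have four choices, draw one random integer from 0 to 3 for each respondent in your sample (for example, a random number in [0, 1) multiplied by 4 and truncated, so 4 itself is never picked), then count how many 0's, 1's, 2's, and 3's were drawn. Finally, divide the counts by the sample size.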
for (int i = 0; i < samples; ++i) {
    int poll = rand() % choices;  /* random choice in 0 .. choices-1 */
    survey[poll] += 1;            /* tally one vote for that choice */
}
It's easy to use a computer to simulate things, simple simulations are very fast.
Keep in mind that you are working with integers, and integers don't divide nicely without converting them to floats or doubles. If you are missing a few percentage points, odds are it has to do with your integers dividing with remainders.
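A minimal Python sketch of this simulation follows, including the integer-division pitfall just mentioned. The function name simulate_poll, the sample size of 1000, and the seed are assumptions chosen for illustration only.

```python
import random

def simulate_poll(choices, samples, seed=None):
    """Count how often each of `choices` options is picked in `samples` draws."""
    rng = random.Random(seed)
    survey = [0] * choices
    for _ in range(samples):
        survey[rng.randrange(choices)] += 1  # randrange never returns `choices`
    return survey

samples = 1000
counts = simulate_poll(4, samples, seed=1)
percent_float = [100.0 * c / samples for c in counts]  # sums to 100 up to rounding error
percent_int = [100 * c // samples for c in counts]     # truncates, so may sum to less than 100
print(counts, percent_float, percent_int)
```

With a sample size of exactly 100, the counts themselves are already integer percentages that add up to 100, which sidesteps the rounding problem.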
What you have here is a problem of partitioning the number 100 into 4 random integers. This is called partitioning in number theory.
This problem has been addressed here.
The solution presented there does essentially the following:
It computes how many partitions of an integer n there are in O(n^2) time. This produces a table of size O(n^2), which can then be used to generate the kth partition of n, for any integer k, in O(n) time.
In your case, n = 100, and k = 4.
Generate x1 in the range <0..1>, subtract it from 1, then generate x2 in the range <0..1-x1>, and so on. The last value should not be random; in your case it equals 1-x1-x2-x3.
I don't think this is a whole lot prettier than what it sounds like you've already done, but it does work. (The only advantage is it's scalable if you want more than 4 elements).
Make sure you #include <stdlib.h>
int prev_sum = 0, j = 0;
for (j = 0; j < 3; ++j)
{
    values[j] = rand() % (100 - prev_sum);  /* 0 .. (99 - prev_sum) */
    prev_sum += values[j];
}
values[3] = 100 - prev_sum;  /* the last value takes whatever is left */
It takes some work to get a truly unbiased solution to the "random partition" problem. But it's first necessary to understand what "unbiased" means in this context.
One line of reasoning is based on the intuition of a random coin toss. An unbiased coin will come up heads as often as it comes up tails, so we might think that we could produce an unbiased partition of 100 tosses into two parts (head-count and tail-count) by tossing the unbiased coin 100 times and counting. That's the essence of Edwin Buck's proposal, modified to produce a four-partition instead of a two-partition.
However, what we'll find is that many partitions never show up. There are 101 two-partitions of 100 ({0, 100}, {1, 99} … {100, 0}), but the coin sampling solution finds fewer than half of them in 10,000 tries. As might be expected, the partition {50, 50} is the most common (7.8%), while all of the partitions from {0, 100} to {39, 61} in total accounted for only about 1.7% (and, in the trial I did, the partitions from {0, 100} to {31, 69} didn't show up at all.) [Note 1]
So that doesn't seem like an unbiased sample of possible partitions. An unbiased sample of partitions would return every partition with equal probability.
So another temptation would be to select the size of the first part of the partition from all the possible sizes, and then the size of the second part from whatever is left, and so on until we've reached one less than the size of the partition at which point anything left is in the last part. However, this will turn out to be biased as well, because the first part is much more likely to be large than any other part.
Finally, we could enumerate all the possible partitions, and then choose one of them at random. That will obviously be unbiased, but unfortunately there are a lot of possible partitions. For the case of 4-partitions of 100, for example, there are 176,851 possibilities. Perhaps that is feasible in this case, but it doesn't seem like it will lead to a general solution.
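The number of partitions in this sense (ordered, with zero-sized parts allowed, as in the two-part list above) can be checked with the standard stars-and-bars formula C(n+k-1, k-1). A short Python sketch; the function name is illustrative and math.comb requires Python 3.8 or later:

```python
from math import comb

def count_ordered_partitions(n, k):
    """Number of ways to write n as an ordered sum of k non-negative integers."""
    return comb(n + k - 1, k - 1)  # stars and bars

print(count_ordered_partitions(100, 2))  # 101, matching the two-part count above
print(count_ordered_partitions(100, 4))
```

The count grows roughly like n^(k-1), which is why full enumeration stops being practical as n or k increases.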
For a better algorithm, we can start with the observation that a partition
{p1, p2, p3, p4}
could be rewritten without bias as a cumulative distribution function (CDF):
{p1, p1+p2, p1+p2+p3, p1+p2+p3+p4}
where the last term is just the desired sum, in this case 100.
That is still a collection of four integers in the range [0, 100]; however, it is guaranteed to be in non-decreasing order.
It's not easy to generate a random sorted sequence of four numbers ending in 100, but it is trivial to generate three random integers no greater than 100, sort them, and then find adjacent differences. And that leads to an almost unbiased solution, which is probably close enough for most practical purposes, particularly since the implementation is almost trivial:
(Python)
from random import randrange

def random_partition(n, k):
    d = sorted(randrange(n + 1) for i in range(k - 1))  # k-1 sorted cut points in [0, n]
    return [b - a for a, b in zip([0] + d, d + [n])]    # adjacent differences
Unfortunately, this is still biased because of the sort. The unsorted list is selected without bias from the universe of possible lists, but the sorting step is not a simple one-to-one match: lists with repeated elements have fewer permutations than lists without repeated elements, so the probability of a particular sorted list without repeats is much higher than the probability of a sorted list with repeats.
As n grows large with respect to k, the proportion of lists with repeats declines rapidly. (These correspond to final partitions in which one or more of the parts is 0.) In the limit, where we are selecting from a continuum and collisions have probability 0, the algorithm is unbiased. Even in the case of n=100, k=4, the bias is probably ignorable for many practical applications. Increasing n to 1000 or 10000 (and then scaling the resulting random partition) would reduce the bias.
There are algorithms which can produce unbiased integer partitions, but they are typically either hard to understand or slow. The slow one, which takes O(n) time, is similar to reservoir sampling; for a faster algorithm, see the work of Jeffrey Vitter.
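One simple way to draw such a partition uniformly, offered here as an illustrative sketch rather than as the algorithm referred to above, is the stars-and-bars construction: choose k-1 distinct "bar" positions among n+k-1 slots and read off the gaps between them. It assumes that parts of size 0 are allowed and that order matters, as in the rest of this answer. The function name uniform_partition is made up.

```python
import random

def uniform_partition(n, k, rng=random):
    """Split n into k ordered non-negative parts, each split equally likely."""
    # Pick k-1 distinct "bar" slots out of n+k-1; the remaining n slots are "stars".
    bars = sorted(rng.sample(range(n + k - 1), k - 1))
    # Each part is the number of stars between consecutive bars.
    edges = [-1] + bars + [n + k - 1]
    return [b - a - 1 for a, b in zip(edges, edges[1:])]

parts = uniform_partition(100, 4)
print(parts, sum(parts))  # four non-negative integers; the sum is always 100
```

Because the bar positions are sampled without replacement, every set of positions is equally likely and each one corresponds to exactly one partition, so there are no collisions to introduce the bias described above.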
Notes
Here's the quick-and-dirty Python + shell test:
$ python -c '
from random import randrange
n = 2
for i in range(10000):
    d = n * [0]
    for j in range(100):
        d[randrange(n)] += 1
    print(" ".join(str(f) for f in d))
' | sort -n | uniq -c
1 32 68
2 34 66
5 35 65
15 36 64
45 37 63
40 38 62
66 39 61
110 40 60
154 41 59
219 42 58
309 43 57
385 44 56
462 45 55
610 46 54
648 47 53
717 48 52
749 49 51
779 50 50
788 51 49
723 52 48
695 53 47
591 54 46
498 55 45
366 56 44
318 57 43
234 58 42
174 59 41
118 60 40
66 61 39
45 62 38
22 63 37
21 64 36
15 65 35
2 66 34
4 67 33
2 68 32
1 70 30
1 71 29
You can brute-force it by writing a function that adds up the numbers in your array. If they do not add up to 100, regenerate the random values in the array and check again.
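A minimal Python sketch of this brute-force idea, assuming each value is drawn uniformly from 0 to 100; the function name and defaults are illustrative:

```python
import random

def brute_force_partition(total=100, parts=4, rng=random):
    """Redraw all values until they happen to add up to `total`."""
    while True:
        values = [rng.randint(0, total) for _ in range(parts)]
        if sum(values) == total:
            return values

print(brute_force_partition())
```

Only a small fraction of random draws add up to exactly 100, so with four values this typically needs hundreds of redraws before it succeeds, and the cost rises quickly as the number of values grows.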

How to identify breaks within an array of MATLAB?

I have an array in MATLAB containing elements such as
A=[12 13 14 15 30 31 32 33 58 59 60];
How can I identify breaks in values of data? For example, the above data exhibits breaks at elements 15 and 33. The elements are arranged in ascending order and have an increment of one. How can I identify the location of breaks of this pattern in an array? I have achieved this using a for and if statement (code below). Is there a better method to do so?
count=0;
for i=1:numel(A)-1
    if(A(i+1)==A(i)+1)
        continue;
    else
        count=count+1;
        q(count)=i;
    end
end
Good time to use diff and find those neighbouring differences that aren't equal to 1. However, diff returns an array with one element fewer than your input array, because it computes pairwise differences between neighbours. As such, if you want the locations just after each jump, add 1 to the locations where the difference isn't equal to 1:
>> A=[12 13 14 15 30 31 32 33 58 59 60];
>> q = find(diff(A) ~= 1) + 1
q =
5 9
This tells us that locations 5 and 9 in your array are where the jumps happen, and that's right for your example data.
However, if you want to find the locations before the jump happens, such as in your code, don't add 1 to the result:
>> q = find(diff(A) ~= 1)
q =
4 8
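For comparison, the same neighbouring-difference idea can be sketched in plain Python. The function name find_breaks is made up, and positions are reported 1-based so that they line up with the MATLAB results above.

```python
def find_breaks(values, step=1):
    """1-based positions of elements that are followed by a jump."""
    return [i + 1 for i in range(len(values) - 1)
            if values[i + 1] - values[i] != step]

A = [12, 13, 14, 15, 30, 31, 32, 33, 58, 59, 60]
before = find_breaks(A)          # last element of each run
after = [i + 1 for i in before]  # first element after each jump
print(before, after)  # [4, 8] [5, 9]
```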

Matlab - replicating arrays values according to occurrences array

For example, A = [19 20 21 22 23 24 25]; B = [2 0 3 0 0 0 2];
How can we get a new array that repeats each value of A as many times as the corresponding entry of B says?
For example, the answer here is: [19 19 21 21 21 25 25].
Please note that I am only allowed to use a for loop combined with a repmat call.
If you are only allowed to use repmat and a for loop, you can do the following:
S = [];
for idx = 1 : length(B)
    S = [S repmat(A(idx), 1, B(idx))];
end
S is initially a blank array, then for as many values as there are in B (or A since they're both equal in length), simply concatenate S with each value in A that is repeated by the corresponding number in B. S will contain the output.
By running the above example, I get:
S =
19 19 21 21 21 25 25
However, I highly recommend you use more vectorized approaches. I'll leave that to you as an exercise.
Good luck!
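For comparison outside MATLAB, the same replication can be written in Python as a single comprehension. This is only an illustration of the idea and does not use the repmat-plus-loop form the question asks for.

```python
A = [19, 20, 21, 22, 23, 24, 25]
B = [2, 0, 3, 0, 0, 0, 2]

# Repeat each value of A as many times as the matching entry of B says.
S = [a for a, count in zip(A, B) for _ in range(count)]
print(S)  # [19, 19, 21, 21, 21, 25, 25]
```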

Bad value returned by calculation

This function is meant to sum all of the numbers that are at an even index of the list, and then multiply this sum by the last number of the list.
checkio = [-37,-36,-19,-99,29,20,3,-7,-64,84,36,62,26,-76,55,-24,84,49,-65,41]
def checkzi(array):
    if len(array) != 0:
        sum_array = 0
        for i in array:
            x = array.index(i)
            if (x % 2 == 0):
                sum_array += int(i)
                print (sum_array)
        print (sum_array)
        answer = (sum_array) * (array[len(array)-1])
        return (answer)
    else:
        return 0
checkzi(checkio)
the 'print' output I get is:
-37
-56
-27
-24
-88
-52
-26
29
-36
-36
.
From this I can tell that the last number that was added correctly was 55; after 55, 84 wasn't added.
On top of that, the final result I get is -1476, while it is supposed to be 1968.
I can't find any reason for this. not something I can see anyway.
Any idea anyone?
Thanks!!
array.index() will always return the first index at which a value is found. So you're looping through every element, and then looking to see what index it's at--but if there are duplicate elements (which there are), then you only see the index of the first one, leading you to always add (or always exclude) that number whenever you encounter it.
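A short Python check on the list from the question makes this visible. The value 84 appears at two positions, but index() only ever reports the first one:

```python
checkio = [-37, -36, -19, -99, 29, 20, 3, -7, -64, 84,
           36, 62, 26, -76, 55, -24, 84, 49, -65, 41]

# Every position that holds the value 84 (0-based).
positions = [i for i, v in enumerate(checkio) if v == 84]
print(positions)          # [9, 16]
print(checkio.index(84))  # 9, the first match only, so index 16 is never seen
```

Since 9 is odd, the 84 at even index 16 is treated as if it were at an odd index and is skipped.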
A much cleaner (and quicker) way to do this is to iterate only over the elements at even indexes in the first place, using Python's slice notation:
checkio = [-37,-36,-19,-99,29,20,3,-7,-64,84,36,62,26,-76,55,-24,84,49,-65,41]
def checkzi(array):
    sum_array = 0
    for value in array[::2]: #loop over all values at even indexes
        sum_array += value
    return sum_array * array[-1] # multiply by the last element in the original array
Using the built-in sum function, you could even one-line this whole thing:
def checkzi(array):
    return sum(array[::2]) * array[-1]
The problem is that array.index() will return the first instance of a value. You have the value 84 twice - so since the first index is odd, you never add it.
You really need to keep track of the index, not rely on uniqueness of the values. You do this with
for idx, val in enumerate(array):
Now the first value will be the index, and the second will be the value. Test idx%2==0 and you can figure it out from here.
Update: here is the complete code, making clear (I hope) how this works:
checkio = [-37,-36,-19,-99,29,20,3,-7,-64,84,36,62,26,-76,55,-24,84,49,-65,41]
def checkzi(array):
    if len(array) != 0:
        sum_array = 0
        for idx, x in enumerate(array):
            print("testing element", idx, "which has value", x)
            if (idx % 2 == 0):
                sum_array += x
                print("sum is now", sum_array)
            else:
                print("odd element - not summing")
        print(sum_array)
        answer = (sum_array) * (array[len(array)-1])
        return (answer)
    else:
        return 0
checkzi(checkio)
Output:
testing element 0 which has value -37
sum is now -37
testing element 1 which has value -36
odd element - not summing
testing element 2 which has value -19
sum is now -56
testing element 3 which has value -99
odd element - not summing
testing element 4 which has value 29
sum is now -27
testing element 5 which has value 20
odd element - not summing
testing element 6 which has value 3
sum is now -24
testing element 7 which has value -7
odd element - not summing
testing element 8 which has value -64
sum is now -88
testing element 9 which has value 84
odd element - not summing
testing element 10 which has value 36
sum is now -52
testing element 11 which has value 62
odd element - not summing
testing element 12 which has value 26
sum is now -26
testing element 13 which has value -76
odd element - not summing
testing element 14 which has value 55
sum is now 29
testing element 15 which has value -24
odd element - not summing
testing element 16 which has value 84
sum is now 113
testing element 17 which has value 49
odd element - not summing
testing element 18 which has value -65
sum is now 48
testing element 19 which has value 41
odd element - not summing
48
You obviously want to take the print statements out - I added them to help explain the program flow.

find and replace values in cell array

I have a cell array like this: [...
0
129
8...2...3...4
6...4
0
I just want to find and replace specific values, but I can't use the ordinary functions because the cells have different lengths. I need to replace many specific values at the same time, and there is no general rule for how values are replaced. However, sometimes several input values should be replaced by the same output.
so I want to say
for values 1:129
'if 0, then 9'
'elseif 1 then 50'
'elseif 2 or 3 or 4 then 61'
etc...up to 129
where these rules are applied to the entire array.
I've tried to work it out myself, but still getting nowhere. Please help!
Since your values appear to span the range 0 to 129, one solution is to add one to these values (so they span the range 1 to 130) and use them as indices into a vector of replacement values. Then you can apply this operation to each cell using the function CELLFUN. For example:
>> C = {0, 129, [8 2 3 4], [6 4], 0}; %# The sample cell array you give above
>> replacement = [9 50 61 61 61 100.*ones(1,125)]; %# A 1-by-130 array of
%# replacement values (I
%# added 125 dummy values)
>> C = cellfun(@(v) {replacement(v+1)},C); %# Perform the replacement
>> C{:} %# Display the contents of C
ans =
9
ans =
100
ans =
100 61 61 61
ans =
100 61
ans =
9
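The lookup-table idea is not specific to MATLAB. Below is a minimal Python sketch, assuming the cell array is represented as a list of lists; the 125 entries of 100 are dummy values, as in the example above.

```python
C = [[0], [129], [8, 2, 3, 4], [6, 4], [0]]

# Lookup table indexed directly by value (Python lists are 0-based, so no +1 shift).
# The 125 entries of 100 are dummy values, as in the MATLAB example.
replacement = [9, 50, 61, 61, 61] + [100] * 125

replaced = [[replacement[v] for v in cell] for cell in C]
print(replaced)  # [[9], [100], [100, 61, 61, 61], [100, 61], [9]]
```

Several inputs map to the same output simply by storing the same number in several table slots, as with 2, 3 and 4 here.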
