Algorithm to get group index of item in chunked array - arrays

So I have an arbitrary array of items:
array = [0,1,2,3,4];
and when it's been chunked it looks like:
array.chunk(2) => [[0,1],[2,3],[4]];
array.chunk(3) => [[0,1,2],[3,4]];
What I'd like is an algorithm to get the index of the group that an item's index falls into, based on the group size.
For instance, running the algorithm on each element in array would yield:
array.chunkIndex( chunkSize = 2, index = n )
0 => 0
1 => 0
2 => 1
3 => 1
4 => 2
array.chunkIndex( chunkSize = 3, index = n )
0 => 0
1 => 0
2 => 0
3 => 1
4 => 1
So running the algorithm on the index with chunkSize = 1 would always yield the original index.
How would I go about doing this? To be clear, I don't want to chunk the array, just determine which group it would be in, without looping and without built-in functions, if possible.

In pseudocode:
chunkIndex = index / chunkSize
It's simple integer division, which means the only case you have to be careful of is languages where division returns a float/decimal/real. In those cases, you will need a floor function to keep just the integer part of the result. You may also wish to handle negative values.

floor(index / chunkSize) should work!
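For comparison, here is a minimal Python sketch of the same idea (the function name chunk_index is hypothetical, mirroring the question's array.chunkIndex). Python's // operator already floors, so no separate floor call is needed, and a negative index maps to a negative group instead of being truncated toward zero.

```python
def chunk_index(index: int, chunk_size: int) -> int:
    """Return the group that position `index` falls in when chunked by `chunk_size`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Floor division: no loop and no actual chunking required.
    return index // chunk_size


print([chunk_index(i, 2) for i in range(5)])  # [0, 0, 1, 1, 2]
print([chunk_index(i, 3) for i in range(5)])  # [0, 0, 0, 1, 1]
```

With chunk_size = 1 every index maps to itself, matching the observation in the question.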

Related

How can I find the nonzero values in a MATLAB cells array?

The following code generates a cell array Index [1x29], where each cell is a [29x6] array:
for i = 1 : size(P1_cell,1)
for j = 1 : size(P1_cell,2)
[Lia,Lib] = ismember(P1_cell{i,j},PATTERNS_FOR_ERANOS_cell{1},'rows');
Index1(i,j) = Lib % 29x6
end
Index{i} = Index1; % 1x29
end
How can I find the nonzero values in the Index array, i.e. generate an array with the number of non-zero values in each row of each Index1 array? I tried the following loop, but it doesn't work; it conflicts with the previous one:
for i = 1 : length(Index)
for j = 1 : length(Index)
Non_ceros = length(find(Index{:,i}(j,:))); %% I just need the length of the find function output
end
end
I need help, Thanks in advance.
The nnz() (number of non-zeros) function returns the number of non-zero elements. To obtain the non-zero values themselves, you can index the array with the indices returned by the find() function. I used some random test data, but it should work for 29-by-6 arrays as well.
%Random test data%
Index{1} = [5 2 3 0 zeros(1,25)];
Index{2} = [9 2 3 1 zeros(1,25)];
Index{3} = [5 5 5 5 zeros(1,25)];
%Initializing an array to count the number of non-zero elements%
Non_Zero_Counts = zeros(length(Index),1);
for Row_Index = 1: length(Index)
%Evaluating the number of non-zero values%
Array = Index{Row_Index};
Non_Zero_Counts(Row_Index) = nnz(Array);
%Retrieving the non-zero values%
Positive_Indices = find(Array);
PositiveElements{Row_Index} = Array(Positive_Indices);
disp(Non_Zero_Counts(Row_Index) + " Non-Zero Elements ");
disp(PositiveElements{Row_Index});
end
Ran using MATLAB R2019b
for i = 1 : numel(Index)
for j = 1 : size(Index{i},1)
Non_ceros(i,j) = nnz(Index{i}(j,:));
end
end
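The per-row count is what the question asks for. As a language-neutral illustration, here is a plain-Python sketch of the same computation, assuming a list of nested lists stands in for the MATLAB cell array of 29x6 matrices; both function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def nonzero_counts_per_row(matrices: List[Matrix]) -> List[List[int]]:
    """For each matrix, count the non-zero entries in every row (like nnz per row)."""
    return [[sum(1 for v in row if v != 0) for row in m] for m in matrices]


def nonzero_values_per_row(matrices: List[Matrix]) -> List[List[List[float]]]:
    """For each matrix, keep only the non-zero entries of every row (like M(j, find(M(j,:))))."""
    return [[[v for v in row if v != 0] for row in m] for m in matrices]
```

Counting and filtering are kept separate so the counts can be computed without building the filtered rows.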

Random based on area

I have an array of elements:
$arr = array(
'0' => 265000, // Area
'1' => 190000,
'2' => 30000,
'3' => 1300
);
I want to get a random index based on the area (the array value): areas with bigger values should be selected more frequently.
How can I do this?
What I have now:
$random_idx = mt_rand(0, count($arr)-1);
$selected_area = (object)$arr[$random_idx];
Thanks!
1. Repeated values
Let's suppose we have an array in which every value corresponds to the relative probability of its index. For example, given a coin, the possible outcomes of a toss are 50% tails and 50% heads. We can represent those probabilities with an array like this (I'll use PHP, as this seems to be the language used by the OP):
$coin = array(
'head' => 1,
'tails' => 1
);
While the results of rolling two dice can be represented as:
$dice = array( '2' => 1, '3' => 2, '4' => 3, '5' => 4, '6' => 5, '7' => 6,
'8' => 5, '9' => 4, '10' => 3, '11' => 2, '12' => 1
);
An easy way to pick a random key (index) with a probability proportional to the values of those arrays (and therefore consistent with the underlying model) is to create another array whose elements are the keys of the original one, each repeated as many times as its value indicates, and then return a random element of it. For example, for the dice array:
$arr = array( 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, ...
Doing so, we can be confident that each key will be picked with the right relative probability. We can encapsulate all the logic in a class with a constructor, which builds the helper array, and a function that returns a random index using mt_rand():
class RandomKeyMultiple {
private $pool = array();
private $max_range;
function __construct( $source ) {
// build the look-up array
foreach ( $source as $key => $value ) {
for ( $i = 0; $i < $value; $i++ ) {
$this->pool[] = $key;
}
}
$this->max_range = count($this->pool) - 1;
}
function get_random_key() {
$x = mt_rand(0, $this->max_range);
return $this->pool[$x];
}
}
The usage is simple, just create an object of the class passing the source array and then each call of the function will return a random key:
$test = new RandomKeyMultiple($dice);
echo $test->get_random_key();
The problem is that the OP's array contains big values, which results in a very big helper array (486300 elements; still manageable, even without dividing all the values by 100).
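For readers outside PHP, a Python sketch of the same repeated-keys pool. random.Random.choice stands in for mt_rand, and the optional rng parameter is an addition (not part of the PHP class) so the picker can be seeded for reproducible tests. Weights are assumed to be non-negative integers.

```python
import random
from typing import Dict, Hashable, List, Optional


class RandomKeyMultiple:
    """Weighted key picker that repeats each key `weight` times in a pool."""

    def __init__(self, source: Dict[Hashable, int], rng: Optional[random.Random] = None):
        self._pool: List[Hashable] = []
        for key, weight in source.items():
            self._pool.extend([key] * weight)  # key repeated `weight` times
        if not self._pool:
            raise ValueError("all weights are zero")
        self._rng = rng or random.Random()

    def get_random_key(self) -> Hashable:
        # Every pool slot is equally likely, so each key's probability is
        # proportional to the number of slots it occupies.
        return self._rng.choice(self._pool)
```

The memory cost is the sum of the weights, which is exactly the drawback described above for the OP's array.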
2. Steps
In general, a discrete probability distribution may be more complicated, with float values that cannot easily be translated into a number of repetitions.
Another way to solve the problem is to consider the values in the array as the measures of intervals that divide the global range of all possible values:
+----------------+--------------+-------+----+
|                |              |       |    |
|<--- 265000 --->|<-- 190000 -->|<30000>|1300|
|<---------- 455000 ----------->|       |    |
|<-------------- 485000 --------------->|    |
|<----------------- 486300 ----------------->|
Then we can choose a random integer from 0 to 486299 (the global range) and look up the right index: the odds of each index are proportional to the length of its segment, giving the correct probability distribution. Something like:
$x = mt_rand(0, 486299);
if ( $x < 265000 )
return 0;
elseif ( $x < 455000 )
return 1;
elseif ( $x < 485000 )
return 2;
else
return 3;
We can generalize the algorithm and encapsulate all the logic in a class (using a helper array to store the partial sums):
class RandomKey {
private $steps = array();
private $last_key;
private $max_range;
function __construct( $source ) {
// sort in ascending order to partially avoid numerical issues
asort($source);
// calculate the partial sums. Considering OP's array:
//
// 1300 ----> 0
// 30000 ----> 1300
// 190000 ----> 31300
// 265000 ----> 221300 ending with $partial = 486300
//
$partial = 0;
$temp = 0;
foreach ( $source as $k => &$v ) {
$temp = $v;
$v = $partial;
$partial += $temp;
}
// scale the steps to cover the entire mt_rand() range
$factor = mt_getrandmax() / $partial;
foreach ( $source as $k => &$v ) {
$v *= $factor;
}
// Having the most probable outcomes first minimizes the look-up of
// the correct index; passing true preserves the original keys
$this->steps = array_reverse($source, true);
// remove the last element (not needed during checks) but save its key
end($this->steps);
$this->last_key = key($this->steps);
array_pop($this->steps);
}
function get_random_key() {
$x = mt_rand();
foreach ( $this->steps as $key => $value ) {
if ( $x > $value ) {
return $key;
}
}
return $this->last_key;
}
}
Here or here there are live demos with some examples and helper functions to check the probability distribution of the keys.
For bigger arrays, a binary search to look-up the index may also be considered.
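A Python sketch of that binary-search variant, assuming Python's bisect and itertools.accumulate in place of the PHP loop. Unlike the class above, it keeps the keys in their original order (no sorting is needed because the look-up is a binary search) and draws a float, so non-integer weights work too. The class name and the rng parameter are hypothetical.

```python
import bisect
import itertools
import random
from typing import Hashable, List, Mapping, Optional


class RandomKeyBisect:
    """Weighted key picker using partial sums and a binary search."""

    def __init__(self, source: Mapping[Hashable, float], rng: Optional[random.Random] = None):
        self._keys: List[Hashable] = list(source.keys())
        weights = list(source.values())
        if any(w < 0 for w in weights):
            raise ValueError("weights must be non-negative")
        # Running totals: the right edge of each key's segment.
        self._edges = list(itertools.accumulate(weights))
        self._total = self._edges[-1] if self._edges else 0
        if self._total <= 0:
            raise ValueError("total weight must be positive")
        self._rng = rng or random.Random()

    def get_random_key(self) -> Hashable:
        x = self._rng.random() * self._total
        while x >= self._total:  # guard against float rounding up to the total
            x = self._rng.random() * self._total
        # First segment whose right edge is strictly greater than x;
        # zero-weight keys have empty segments and are never chosen.
        return self._keys[bisect.bisect_right(self._edges, x)]
```

Building the partial sums is O(n) once; each draw is then O(log n).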
This solution is based on the element's index, not on its value, so the array needs to be sorted in ascending order to ensure that an element with a bigger value has a bigger index.
A uniform random index generator can be represented as the linear dependency x = y:
random number (x): 0  1  2  3  4
array index (y):   0  1  2  3  4
We need to generate indices non-linearly (a bigger index means a higher probability):
random number (x): 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14
array index (y):   0  1  1  2  2  2  3  3  3  3  4  4  4  4  4
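To find the range of x values for an array of length c, we can calculate the sum of all numbers in the range 0..c (x then takes the values 0 to that sum minus 1):
(c * (c + 1)) / 2;
The first x that maps to index y is y * (y + 1) / 2, so to find y for any x, let's solve the quadratic equation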
y ^ 2 + y - 2 * x = 0;
Having solved this we get
y = (sqrt(8 * x + 1) - 1) / 2;
Now let's put it all together:
$c = count($arr);
$range = ($c * ($c + 1)) / 2; // number of possible x values
$random_x = mt_rand(0, $range - 1);
$random_idx = (int) floor((sqrt(8 * $random_x + 1) - 1) / 2);
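In terms of performance, this solution suits big arrays best: each draw takes constant time, regardless of the array size and type. Keep in mind that the weights come from the positions (index + 1), not from the array values.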
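A Python sketch of the same formula (not the answer's PHP). It draws x from 0 to c * (c + 1) / 2 - 1, so the top index never exceeds c - 1, and it uses math.isqrt (Python 3.8+) instead of a floating-point square root so that large x values map exactly. Both function names are hypothetical.

```python
import math
import random
from typing import Optional


def index_from_x(x: int) -> int:
    """Map x to the index y with y*(y+1)/2 <= x < (y+1)*(y+2)/2."""
    # Exact integer version of floor((sqrt(8x + 1) - 1) / 2).
    return (math.isqrt(8 * x + 1) - 1) // 2


def random_index_by_position(c: int, rng: Optional[random.Random] = None) -> int:
    """Pick an index in 0..c-1 with probability proportional to index + 1."""
    if c <= 0:
        raise ValueError("c must be positive")
    rng = rng or random.Random()
    total = c * (c + 1) // 2   # number of x values: 1 + 2 + ... + c
    x = rng.randrange(total)   # uniform in 0..total-1
    return index_from_x(x)
```

For c = 5, index_from_x maps x = 0..14 to 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, matching the table above.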
This problem is somewhat similar to the way operating systems can identify the next thread to run with lottery scheduling.
The idea is to assign each area a number of tickets depending on its size and number all those tickets. Depending on which random number was chosen you know which ticket won and thus the winning area.
First you will need to sum up all the areas and find a random number up to this total. Now you just iterate through your array and look for the first element whose summed up total to this point is larger than the random number.
Assuming you are looking for a solution in PHP:
function get_random_index($array) {
// generate total
$total = array_sum($array);
// get a random number in the required range
$random_number = rand(0, $total-1);
// temporary sum needed to find the 'winning' area
$temp_total = 0;
// this variable helps us identify the winning area
$current_area_index = 0;
foreach ($array as $area) {
// add the area to our temporary total
$temp_total = $temp_total + $area;
// check if we already have the right ticket
if($temp_total > $random_number) {
return $current_area_index;
}
else {
// this area didn't win, so check the next one
$current_area_index++;
}
}
}
Your array describes a discrete probability distribution. Each array value ('area' or 'weight') relates to the probability of a discrete random variable taking a specific value from the range of array keys.
/**
* Draw a pseudorandom sample from the given discrete probability distribution.
* The input array values will be normalized and do not have to sum up to one.
*
* @param array $arr Array of samples => discrete probabilities (weights).
* @return mixed sample
*/
function draw_discrete_sample($arr) {
$rand = mt_rand(0, array_sum($arr) - 1);
foreach ($arr as $key => $weight) {
if (($rand -= $weight) < 0) return $key;
}
}
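Replace the first line of the function body with $rand = mt_rand() / (mt_getrandmax() + 1) * array_sum($arr); if you want to support non-integer weights / probabilities (dividing by mt_getrandmax() + 1 keeps $rand below the total, so the loop always returns a key).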
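A Python sketch of the same subtract-until-negative loop in its float form, assuming random.random() * total as the analogue of the float variant above. The fallback at the end only covers the rare case where floating-point rounding leaves r at or above zero after the last key; the rng parameter is an addition for seeding.

```python
import random
from typing import Hashable, Mapping, Optional


def draw_discrete_sample(weights: Mapping[Hashable, float],
                         rng: Optional[random.Random] = None) -> Hashable:
    """Return a key with probability proportional to its non-negative weight."""
    rng = rng or random.Random()
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("total weight must be positive")
    r = rng.random() * total  # uniform in [0, total); float weights allowed
    last_positive = None
    for key, weight in weights.items():
        if weight > 0:
            last_positive = key
        r -= weight
        if r < 0:
            return key
    # Only reachable through rounding: fall back to the last weighted key.
    return last_positive
```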
You might also want to have a look at similar questions asked here. If you are only interested in sampling a small set of known distributions, I recommend the analytic approach outlined by Oleg Mikhailov.

Reverse lookup with non-unique values

What I'm trying to do
I have an array of numbers:
>> A = [2 2 2 2 1 3 4 4];
And I want to find the array indices where each number can be found:
>> B = arrayfun(@(x) {find(A==x)}, 1:4);
In other words, this B should tell me:
>> for ii=1:4, fprintf('Item %d in location %s\n',ii,num2str(B{ii})); end
Item 1 in location 5
Item 2 in location 1 2 3 4
Item 3 in location 6
Item 4 in location 7 8
It's like the 2nd output argument of unique, but instead of the first (or last) occurrence, I want all the occurrences. I think this is called a reverse lookup (where the original key is the array index), but please correct me if I'm wrong.
How can I do it faster?
What I have above gives the correct answer, but it scales terribly with the number of unique values. For a real problem (where A has 10M elements with 100k unique values), even this stupid for loop is 100x faster:
>> B = cell(max(A),1);
>> for ii=1:numel(A), B{A(ii)}(end+1)=ii; end
But I feel like this can't possibly be the best way to do it.
We can assume that A contains only integers from 1 to the max (because if it doesn't, I can always pass it through unique to make it so).
That's a simple task for accumarray:
out = accumarray(A(:),(1:numel(A)).',[],@(x) {x}) %'
out{1} = 5
out{2} = 3 4 2 1
out{3} = 6
out{4} = 8 7
However accumarray suffers from not being stable (in the sense of unique's feature), so you might want to have a look here for a stable version of accumarray, if that's a problem.
The above solution also assumes A to be filled with integers, preferably with no gaps in between. If that is not the case, there is no way around calling unique in advance:
A = [2.1 2.1 2.1 2.1 1.1 3.1 4.1 4.1];
[~,~,subs] = unique(A)
out = accumarray(subs(:),(1:numel(A)).',[],@(x) {x})
To sum up, the most generic solution, working with floats and returning a sorted output could be:
[~,~,subs] = unique(A)
[subs(:,end:-1:1), I] = sortrows(subs(:,end:-1:1)); %// optional
vals = 1:numel(A);
vals = vals(I); %// optional
out = accumarray(subs, vals , [],@(x) {x});
out{1} = 5
out{2} = 1 2 3 4
out{3} = 6
out{4} = 7 8
Benchmark
function [t] = bench()
%// data
a = rand(100);
b = repmat(a,100);
A = b(randperm(10000));
%// functions to compare
fcns = {
@() thewaywewalk(A(:).');
@() cst(A(:).');
};
% timeit
t = zeros(2,1);
for ii = 1:100;
t = t + cellfun(@timeit, fcns);
end
format long
end
function out = thewaywewalk(A)
[~,~,subs] = unique(A);
[subs(:,end:-1:1), I] = sortrows(subs(:,end:-1:1));
idx = 1:numel(A);
out = accumarray(subs, idx(I), [],@(x) {x});
end
function out = cst(A)
[B, IX] = sort(A);
out = mat2cell(IX, 1, diff(find(diff([-Inf,B,Inf])~=0)));
end
0.444075509687511 %// thewaywewalk
0.221888202987325 %// CST-Link
Surprisingly, the version with the stable accumarray is faster than the unstable one, because MATLAB prefers to work on sorted arrays.
This solution should work in O(N*log(N)) due to sorting, but it is quite memory intensive (it requires 3x the amount of input memory):
[U, X] = sort(A);
B = mat2cell(X, 1, diff(find(diff([Inf,U,-Inf])~=0)));
I am curious about the performance though.
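For comparison outside MATLAB, a Python sketch of the same sort-then-split idea: sort the positions by value (Python's sort is stable, so positions within a group stay ascending), then cut wherever the value changes. It returns 0-based positions in a dict rather than a cell array, and the values are assumed to be mutually comparable; the function name is hypothetical.

```python
from itertools import groupby
from typing import Any, Dict, List, Sequence


def reverse_lookup(a: Sequence[Any]) -> Dict[Any, List[int]]:
    """Map each distinct value to all 0-based positions where it occurs."""
    # O(N log N): stable sort of positions by their value.
    order = sorted(range(len(a)), key=lambda i: a[i])
    # After sorting, equal values are adjacent, so groupby splits at value changes.
    return {value: list(group) for value, group in groupby(order, key=lambda i: a[i])}
```

For A = [2 2 2 2 1 3 4 4] this gives {1: [4], 2: [0, 1, 2, 3], 3: [5], 4: [6, 7]}, the 0-based counterpart of the sorted output above.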

Function in Matlab that returns indices

How to write a function in Matlab that takes a matrix with a single 1 value in each column and returns the index of this 1.
Ex:
if the input is x=[0 0 1;1 0 0;0 1 0] it will return indices=[2 3 1]
find is indeed the way to go
[indices,~] = find(x);
If you want to do it more cryptically, or hate find for some reason, you could also use cumsum:
indices = size(x,1) + 1 - sum(cumsum(x,1),1);
If you're looking for the row index of the ones, this should do the trick:
[indices,~] = ind2sub(size(x),find(x))
You could also use the second output of max:
[~, result] = max(x==1, [], 1);
A slightly more esoteric approach:
result = nonzeros(bsxfun(@times, x==1, (1:size(x,1)).'));
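For readers working outside MATLAB, a plain-Python sketch of the same column-wise search. It returns 1-based row indices to match the example in the question and raises an error if a column does not contain exactly one 1 (a check the MATLAB one-liners above do not perform). The function name is hypothetical.

```python
from typing import List, Sequence


def row_of_one_per_column(x: Sequence[Sequence[int]]) -> List[int]:
    """Return, for each column, the 1-based row index of its single 1."""
    n_rows = len(x)
    n_cols = len(x[0]) if n_rows else 0
    result = []
    for col in range(n_cols):
        rows = [r + 1 for r in range(n_rows) if x[r][col] == 1]
        if len(rows) != 1:
            raise ValueError(f"column {col + 1} does not contain exactly one 1")
        result.append(rows[0])
    return result
```

With x = [[0, 0, 1], [1, 0, 0], [0, 1, 0]] it returns [2, 3, 1].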

Find repeated element in array

Consider an array of positive integers:
{1,3,6,4,7,6,9,2,6,6,6,6,8}
Given that only one number is repeated, return that number and its positions using an efficient algorithm.
Any ideas for efficient algorithms?
One possible solution is to maintain an external hash map. Iterate the array, and place the indices of values found into the hash map. When done, you now know which number was duplicated and the indices of the locations it was found at.
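A Python sketch of that hash map approach, using a dict from value to the list of positions where it occurs. It assumes, as in the question, that at most one value is repeated; the function name is hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def find_repeated(a: Sequence[int]) -> Optional[Tuple[int, List[int]]]:
    """Return (value, positions) for the repeated value, or None if none repeats."""
    positions: Dict[int, List[int]] = {}
    for i, v in enumerate(a):
        positions.setdefault(v, []).append(i)  # O(1) average per element
    for v, idx in positions.items():
        if len(idx) > 1:
            return v, idx
    return None
```

For {1,3,6,4,7,6,9,2,6,6,6,6,8} it returns (6, [2, 5, 8, 9, 10, 11]) in O(n) time and O(n) extra space.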
In an interview situation, I guess it's your chance to ask around the question: for example, how many numbers are there? What is their range? You could point out that the optimal algorithm may change depending on the answers.
That gives you a chance to show how you solve problems.
If the range of ints in the array is small enough, you could create another array to count the number of times each integer is found, then go linearly through the array accumulating occurrence counts, stopping when you reach an occurrence count of two.
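A Python sketch of the counting-array idea, assuming the caller knows an upper bound max_value on the values (a hypothetical parameter, not from the question). Because the question also asks for the positions, they are collected with a second linear pass once the repeated value is known.

```python
from typing import List, Optional, Sequence, Tuple


def find_repeated_small_range(a: Sequence[int], max_value: int) -> Optional[Tuple[int, List[int]]]:
    """Counting-array variant for values known to lie in 0..max_value."""
    counts = [0] * (max_value + 1)  # one counter per possible value
    for v in a:
        counts[v] += 1
        if counts[v] == 2:  # first value to reach an occurrence count of two
            # Second linear pass to collect every position of that value.
            return v, [i for i, w in enumerate(a) if w == v]
    return None
```

This stays O(n) time, but the extra space is O(max_value), which is only attractive when the range is small.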
A hash will do just fine here. Add the numbers to it one by one, each time checking whether the number is already in there.
Well, there probably is some trick (there usually is). But just off the cuff, you should be able to sort the list (O(n log n)). Then it's just a matter of finding a number that is the same as the next one (linear search, O(n)). You'd have to sort tuples of values and original indices, of course, so that you can return the indices you are looking for. But the point is that the upper bound for an algorithm that does the job should be O(n log n).
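A Python sketch of the sort-based approach: sort (value, original index) tuples, then look for two equal neighbours. Tuples compare by value first and index second, so the positions of the repeated value come out in ascending order. The function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def find_repeated_by_sorting(a: Sequence[int]) -> Optional[Tuple[int, List[int]]]:
    """O(n log n) variant: sort (value, index) pairs, then scan neighbours."""
    pairs = sorted((v, i) for i, v in enumerate(a))
    for k in range(len(pairs) - 1):
        if pairs[k][0] == pairs[k + 1][0]:
            value = pairs[k][0]
            # Equal values are adjacent after sorting; later pairs with a
            # different value are larger and are filtered out.
            return value, [i for v, i in pairs[k:] if v == value]
    return None
```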
If you just go through the list linearly, you could take each element, then search through the rest of the list after it for a matching value. I think that's roughly equivalent to the work done in a bubble sort, so it would probably be O(n^2), but simple.
I really hate trick questions as interview questions. They are kind of like optical illusions: Either you see it or you don't, but it doesn't really say anything bad about you if you don't see the trick.
I'd try this:
all elements of the list have to be looked at (=> loop over the list)
before the repeated element is known, store element => location/index in a hash/dictionary
as soon as the second occurrence of the repeated element is found, store its first position (from the hash) and the current position in the result array
compare further elements of the list against the repeated element, append found locations to the result array
in code:
Function locRep( aSrc )
' to find repeated elm quickly
Dim dicElms : Set dicElms = CreateObject( "Scripting.Dictionary" )
' to store the locations
Dim aLocs : aLocs = Array()
' once found, simple comparison is enough
Dim vRepElm : vRepElm = Empty
Dim nIdx
For nIdx = 0 To UBound( aSrc )
If vRepElm = aSrc( nIdx ) Then ' repeated elm known, just store location
ReDim Preserve aLocs( UBound( aLocs ) + 1 )
aLocs( UBound( aLocs ) ) = nIdx
Else ' repeated elm not known
If dicElms.Exists( aSrc( nIdx ) ) Then ' found it
vRepElm = aSrc( nIdx )
ReDim aLocs( UBound( aLocs ) + 2 )
' location of first occurrence
aLocs( UBound( aLocs ) - 1 ) = dicElms( aSrc( nIdx ) )
' location of this occurrence
aLocs( UBound( aLocs ) ) = nIdx
Else
' location of first occurrence
dicElms( aSrc( nIdx ) ) = nIdx
End If
End If
Next
locRep = aLocs
End Function
Test run:
-------------------------------------------------
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
Src: 1 3 6 4 7 6 9 2 6 6 6 6 8
Res: 2 5 8 9 10 11
ok
Src:
Res:
ok
Src: 1 2 3
Res:
ok
Src: 1 1 2 3 4 5 6
Res: 0 1
ok
Src: 1 2 3 4 5 6 6
Res: 5 6
ok
=================================================
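#include <list>
#include <unordered_map>
#include <vector>
using namespace std;

// Returns the indices of every occurrence of the (single) repeated value,
// or an empty list if no value repeats.
list<int> find_duplicate_idx(const vector<int>& A)
{
    unordered_map<int, int> X;  // value -> index of its first occurrence
    list<int> idx;
    for ( int i = 0; i < (int)A.size(); ++i ) {
        unordered_map<int, int>::iterator it = X.find(A[i]);
        if ( it != X.end() ) {
            // second occurrence: record the first and the current index...
            idx.push_back(it->second);
            idx.push_back(i);
            // ...then scan the rest of the array for further occurrences
            for ( int j = i + 1; j < (int)A.size(); ++j )
                if ( A[j] == A[i] )
                    idx.push_back(j);
            return idx;
        }
        X[A[i]] = i;
    }
    return idx;
}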
This is a solution my friend provided. Thank you SETI from mitbbs.com
Use a hash map to solve it:
private int getRepeatedElementIndex(int[] arr) {
Map<Integer, Integer> map = new HashMap<>();
// return the index of the first element whose value was already seen
for (int i = 0; i < arr.length; i++) {
if(map.containsKey(arr[i])) {
return i;
} else {
map.put(arr[i], i);
}
}
throw new RuntimeException("No repeated element found");
}
Time complexity : O(n)
Space complexity : O(n)
