Max frequency 1d array in a 2d numpy array

I've a 2d numpy array:
array([[21, 17, 11],
[230, 231, 232],
[21, 17, 11]], dtype=uint8)
I want to find the 1d array that occurs most frequently. For the above 2d array it is:
[21, 17, 11]. It is something like the mode in statistics.

We can use np.unique with its optional arg return_counts to get the counts for each unique row and finally get the argmax() to choose the one with the max count -
# a is input array
unq, count = np.unique(a, axis=0, return_counts=True)
out = unq[count.argmax()]
For uint8 data, we can also reduce each row to a single scalar, which turns the problem into a 1D one, and then use np.unique -
s = 256**np.arange(a.shape[-1])
_, idx, count = np.unique(a.dot(s), return_index=True, return_counts=True)
out = a[idx[count.argmax()]]
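To see why the scalar reduction is safe for uint8 rows, here is a small sketch that compares it with the row-wise np.unique approach on the array from the question. The explicit int64 cast and the names keys, via_keys and via_rows are assumptions added for illustration; the cast is only there so the products cannot overflow on platforms with a narrower default integer.

```python
import numpy as np

a = np.array([[21, 17, 11],
              [230, 231, 232],
              [21, 17, 11]], dtype=np.uint8)

# Weights 1, 256, 256**2: every distinct uint8 row maps to a distinct integer.
s = 256 ** np.arange(a.shape[-1], dtype=np.int64)
keys = a.astype(np.int64) @ s  # one scalar key per row

# Most frequent key, mapped back to the first row that produced it.
_, idx, count = np.unique(keys, return_index=True, return_counts=True)
via_keys = a[idx[count.argmax()]]

# Reference result from the row-wise approach.
unq, count_rows = np.unique(a, axis=0, return_counts=True)
via_rows = unq[count_rows.argmax()]

print(via_keys, via_rows)
```

Both approaches should agree whenever the row values fit in one byte each, since the weights then act like digits in base 256.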
If we are working with color images that are 3D (the last axis being the color channel) and want to get the most dominant color, we need to reshape with a.reshape(-1,a.shape[-1]) and then feed it to the proposed methods.
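As a rough sketch of the reshape step for a color image, the snippet below flattens the pixel grid before counting rows. The helper name most_frequent_row and the tiny 2x2 image are made up for illustration and are not part of the original answer.

```python
import numpy as np

def most_frequent_row(a):
    """Return the row of a 2D array that occurs most often."""
    unq, count = np.unique(a, axis=0, return_counts=True)
    return unq[count.argmax()]

# Hypothetical 2x2 RGB image; three of its four pixels share one color.
img = np.array([[[10, 20, 30], [10, 20, 30]],
                [[40, 50, 60], [10, 20, 30]]], dtype=np.uint8)

# (H, W, C) -> (H*W, C): each row is now one pixel's color.
pixels = img.reshape(-1, img.shape[-1])
dominant = most_frequent_row(pixels)
print(dominant)
```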

Related

Pythonic way to assign 3rd Dimension of Numpy array to 1D Array

I'm trying to flatten an image that's been converted to a 3D numpy array into three separate 1D arrays, representing RGB channels.
The image array is shaped (HEIGHT, WIDTH, RGB), and I've tried in vain to use both index slicing and unzipping to just return the 3rd dimension values.
Ideally, three separate arrays represent each RGB channel,
Example:
print(image)
[
[ [56, 6, 3], [23, 32, 53], [27, 33, 56] ],
[ [57, 2, 3], [23, 246, 49], [29, 253, 58] ]
]
red_channel, green_channel, blue_channel = get_third(image)
print(red_channel)
[56, 23, 27, 57, 23, 29]
I've thought of just using a nested for loop to iterate over the first two dimensions and then add each RGB array to a list or whatnot, but it's my understanding that this would be both inefficient and a bit of an eyesore.
Thanks in advance!
EDIT
Clarification: By unzipping I mean using the star operator (*) within the zip function, like so:
zip(*image)
Also to clarify, I don't intend to retain the width and height; I essentially just want to flatten the array and return the values along its 3rd dimension.
red_channel, green_channel, blue_channel = np.transpose(np.reshape(image, (-1, 3)))
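A short sketch of why this one-liner works, using the example image from the question: reshaping to (-1, 3) lines the pixels up as rows, and transposing makes each channel a row, so tuple unpacking along the first axis yields one 1D array per channel.

```python
import numpy as np

image = np.array([[[56, 6, 3], [23, 32, 53], [27, 33, 56]],
                  [[57, 2, 3], [23, 246, 49], [29, 253, 58]]])

# (H, W, 3) -> (H*W, 3) -> (3, H*W); unpacking iterates over the first axis.
red_channel, green_channel, blue_channel = image.reshape(-1, 3).T

print(red_channel)  # [56 23 27 57 23 29]
```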

Get the maximum N elements (along with their indices) of an Array

I've got an array that contains Integers as the one shown below:
val my_array = Array(10, 20, 6, 31, 0, 2, -2)
I need to get the maximum 3 elements of this array along with their corresponding indices (either using a single function or two separate funcs).
For example, the output might be something like:
// max values
Array(31, 20, 10)
// max indices
Array(3, 1, 0)
Although the operations look simple, I was not able to find any relevant functions around.
Here's a straightforward way - zipWithIndex followed by sorting:
val (values, indices) = my_array
  .zipWithIndex        // add indices
  .sortBy(t => -t._1)  // sort by values (descending)
  .take(3)             // take first 3
  .unzip               // "unzip" the array-of-tuples into tuple-of-arrays
Here's another way to do it:
(my_array zip Stream.from(0)).
  sortWith(_._1 > _._1).
  take(3)
res1: Array[(Int, Int)] = Array((31,3), (20,1), (10,0))

Element by Element Comparison of Multiple Arrays in MATLAB

I have multiple input arrays and I want to generate one output array where the value is 0 if all elements in a column are the same and the value is 1 if they are not all the same.
For example, if there are three arrays :
A = [28, 28, 43, 43]
B = [28, 43, 43, 28]
C = [28, 28, 43, 43]
Output = [0, 1, 0, 1]
The arrays can be of any size and any number, but the arrays are always the same size.
A non-loopy way is to use diff and any to advantage:
A = [28, 28, 43,43];
B = [28, 43, 43,28];
C = [28, 28, 43,43];
D = any(diff([A;B;C])) %Combine all three (or all N) vectors into a matrix. Using the Diff to find the difference between each element from row to row. If any of them is non-zero, then return 1, else return 0.
D = 0 1 0 1
There are several easy ways to do it.
Let's start by putting the relevant vectors in a matrix:
M = [A; B; C];
Now we can do things like:
idx = min(M)==max(M);
or
idx = ~var(M);
No one seems to have addressed that you have a variable number of arrays. You have three in your example, but you said the number could vary. I'd also like to take a stab at this using broadcasting.
You can create a function that will take a variable number of arrays, and the output will give you an array of an equal number of columns shared among all arrays that conform to the output you're speaking of.
First create a larger matrix that concatenates all of the arrays together, then use bsxfun to take advantage of broadcasting the first row and ensuring that you find columns that are all equal. You can use all to complete this step:
function out = array_compare(varargin)
    matrix = vertcat(varargin{:});
    out = ~all(bsxfun(@eq, matrix(1,:), matrix), 1);
end
This will take the first row of the stacked matrix and see if this row is the same among all of the rows in the stacked matrix for every column and returns a corresponding vector where 0 denotes each column being all equal and 1 otherwise.
Save this function in MATLAB and call it array_compare.m, then you can call it in MATLAB like so:
A = [28, 28, 43, 43];
B = [28, 43, 43, 28];
C = [28, 28, 43, 43];
Output = array_compare(A, B, C);
We get in MATLAB:
>> Output
Output =
0 1 0 1
Not fancy but will do the trick
Output=nan(length(A),1); %preallocation and check if an index isn't reached
for i=1:length(A)
    Output(i)= ~isequal(A(i),B(i),C(i));
end
If someone has an answer without the loop, take that, but I feel like performance is not an issue here.

Python 2.7: looping over 1D fibers in a multidimensional Numpy array

I am looking for a way to loop over 1D fibers (row, column, and multi-dimensional equivalents) along any dimension in a 3+-dimensional array.
In a 2D array this is fairly trivial since the fibers are rows and columns, so just saying for row in A gets the job done. But for 3D arrays for example, this expression iterates over 2D slices, not 1D fibers.
A working solution is the one below:
import numpy as np
A = np.arange(27).reshape((3,3,3))
func = np.sum
for fiber_index in np.ndindex(A.shape[:-1]):
    print func(A[fiber_index])
However, I am wondering whether there is something that is:
More idiomatic
Faster
Hope you can help!
I think you might be looking for numpy.apply_along_axis
In [10]: def my_func(x):
    ...:     return x**2 + x
In [11]: np.apply_along_axis(my_func, 2, A)
Out[11]:
array([[[ 0, 2, 6],
[ 12, 20, 30],
[ 42, 56, 72]],
[[ 90, 110, 132],
[156, 182, 210],
[240, 272, 306]],
[[342, 380, 420],
[462, 506, 552],
[600, 650, 702]]])
Although many NumPy functions (including sum) have their own axis argument to specify which axis to use:
In [12]: np.sum(A, axis=2)
Out[12]:
array([[ 3, 12, 21],
[30, 39, 48],
[57, 66, 75]])
numpy provides a number of different ways of looping over 1 or more dimensions.
Your example:
func = np.sum
for fiber_index in np.ndindex(A.shape[:-1]):
    print fiber_index
    print A[fiber_index]
produces something like:
(0, 0)
[0 1 2]
(0, 1)
[3 4 5]
(0, 2)
[6 7 8]
...
np.ndindex generates all index combinations over the first 2 dimensions, which gives your function the 1D fiber along the last.
Look at the code for ndindex. It's instructive. I tried to extract its essence in https://stackoverflow.com/a/25097271/901925.
It uses as_strided to generate a dummy matrix over which an nditer iterates. It uses the 'multi_index' mode to generate an index set, rather than elements of that dummy. The iteration itself is done with a __next__ method. This is the same style of indexing that is currently used in numpy compiled code.
http://docs.scipy.org/doc/numpy-dev/reference/arrays.nditer.html
Iterating Over Arrays has a good explanation, including an example of doing so in Cython.
Many functions, among them sum, max, product, let you specify which axis (axes) you want to iterate over. Your example, with sum, can be written as:
np.sum(A, axis=-1)
np.sum(A, axis=(1,2)) # sum over 2 axes
An equivalent is
np.add.reduce(A, axis=-1)
np.add is a ufunc, and reduce specifies an iteration mode. There are many other ufuncs, and other iteration modes - accumulate, reduceat. You can also define your own ufunc.
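As a small illustration of the ufunc iteration modes mentioned above, the sketch below compares reduce with np.sum and shows what accumulate returns along the last axis. The variable names total and running are only illustrative.

```python
import numpy as np

A = np.arange(27).reshape((3, 3, 3))

# reduce collapses the chosen axis, like np.sum(A, axis=-1).
total = np.add.reduce(A, axis=-1)

# accumulate keeps the axis and stores the running sum along it.
running = np.add.accumulate(A, axis=-1)

print(total[0])       # sums of the fibers A[0, j, :]
print(running[0, 0])  # running sum of the fiber [0, 1, 2]
```

The last entry of each accumulated fiber equals the reduced value for that fiber, which is a quick way to check that both modes walk the same axis.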
xnx suggests
np.apply_along_axis(np.sum, 2, A)
It's worth digging through apply_along_axis to see how it steps through the dimensions of A. In your example, it steps over all possible i,j in a while loop, calculating:
outarr[(i,j)] = np.sum(A[(i, j, slice(None))])
Including slice objects in the indexing tuple is a nice trick. Note that it edits a list, and then converts it to a tuple for indexing. That's because tuples are immutable.
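A minimal sketch of the slice-in-a-tuple trick described above, assuming a 3x3x3 array like the one in the question. The names index, axis and fiber are chosen for illustration; this mirrors the idea, not the actual source of apply_along_axis.

```python
import numpy as np

A = np.arange(27).reshape((3, 3, 3))

axis = 1                    # the axis along which we want the 1D fiber
index = [0, 0, 2]           # a list, because it has to be edited in place
index[axis] = slice(None)   # slice(None) means ':' on that axis

# Convert to a tuple for indexing: equivalent to A[0, :, 2].
fiber = A[tuple(index)]
print(fiber)
```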
Your iteration can be applied along any axis by rolling that axis to the end. This is a 'cheap' operation since it just changes the strides.
def with_ndindex(A, func, ax=-1):
    # apply func along axis ax
    A = np.rollaxis(A, ax, A.ndim)  # roll ax to end (changes strides)
    shape = A.shape[:-1]
    B = np.empty(shape, dtype=A.dtype)
    for ii in np.ndindex(shape):
        B[ii] = func(A[ii])
    return B
I did some timings on 3x3x3, 10x10x10 and 100x100x100 A arrays. This np.ndindex approach is consistently a third faster than the apply_along_axis approach. Direct use of np.sum(A, -1) is much faster.
So if func is limited to operating on a 1D fiber (unlike sum), then the ndindex approach is a good choice.
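One way to sanity-check the ndindex approach is to compare it with apply_along_axis on every axis. The sketch below is written for Python 3 and swaps in np.moveaxis for np.rollaxis, which is an assumption of this example rather than what the answer above uses; spread is a hypothetical function that only makes sense on a 1D fiber.

```python
import numpy as np

def with_ndindex(A, func, ax=-1):
    # Move the chosen axis to the end (a view; only the strides change).
    A = np.moveaxis(A, ax, -1)
    shape = A.shape[:-1]
    B = np.empty(shape, dtype=A.dtype)
    for ii in np.ndindex(shape):
        B[ii] = func(A[ii])  # A[ii] is the 1D fiber at index ii
    return B

def spread(fiber):
    # Hypothetical scalar-valued function of a single 1D fiber.
    return fiber.max() - fiber.min()

A = np.arange(24).reshape((2, 3, 4))
results = {ax: with_ndindex(A, spread, ax) for ax in range(A.ndim)}
print(results[2])
```

Because B takes its dtype from A, this sketch only suits functions whose result fits that dtype; a function returning floats on an integer array would need a different output dtype.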

Populating array in mathematica

I have a set of around 500 (x,y,z) real values. Since I will need to bin the values based on their (x,y) coordinates, I stripped the z values and stored them in a separate list. I am left with only the x,y values; I rescaled and rounded them to index pairs in the range 1..100.
Now I want to populate an array with the z values in a 100x100 matrix at the particular (x,y) coordinates.
More precisely,
I have a set of values for example : data = {{2.62399, 0.338057, 2.09629}, {1.8424, 0.135817, 3.21925}, {0.702257, 1.14502, 3.9335}...
I stripped it of its zvalues and store it in zvalues list:
zvalues = {2.09629, 3.21925, 3.9335....
I rounded, rescaled and created a new array of indices
indices = {{53, 7}, {37, 3}, {14, 23}...
I want to create a new 100x100 matrix and place the zvalues on the coordinates corresponding to the indices matrix
For example, in pseudocode
For (int i = 1, i < 101, i++){
NewArray(indices[i]) = zvalues[i];
}
The first time the loop will run, it should do NewArray(53,7) = 2.09629.
I want to know the syntax to loop through the indices array and populate the 2 dimensional 100x100 NewArray with zvalues
To follow your basic approach, you need to initialize the array:
newArray=Table[,{100},{100}]
Then in the loop the syntax is:
newArray[[indices[[i,1]],indices[[i,2]]]]=zvalues[[i]]
Note the double square brackets for referencing parts of arrays (or lists in Mathematica terminology).
A better approach would be to create a SparseArray, which for one thing would not require pre-initialization, or even knowing the dimensions in advance.
Finally, in Mathematica you can usually use a functional approach, avoiding the "Do" loop altogether:
data = {{1.5, 1.1, 1.1}, {2.2, 2.2, 2.2}, {1.01, 2.3, 1.2}};
m1 = Table[, {2}, {2}];
(m1[[Floor[#[[1]]], Floor[#[[2]]]]] = #[[3]]) & /@ data;
m1
m2 = SparseArray[ Floor[#[[1 ;; 2]]] -> #[[3]] & /@ data , Automatic,];
Normal[m2]
{{1.1, 1.2}, {Null, 2.2}}
{{1.1, 1.2}, {Null, 2.2}}
While I don't understand why you want to create a new way of indexing your array, this will do what you want :
data = {{2.62399, 0.338057, 2.09629}, {1.8424, 0.135817, 3.21925}, {0.702257, 1.14502, 3.9335}};
zvalues = {2.09629, 3.21925, 3.9335};
indices = {{53, 7}, {37, 3}, {14, 23}};
newArray[xIndex_, yIndex_]:=Take[data, {Position[indices, {xIndex, yIndex}][[1, 1]]}][[1, 3]]
newArray[53, 7]
(* 2.09629 *)
