Pythonic way to assign 3rd Dimension of Numpy array to 1D Array - arrays

I'm trying to flatten an image that's been converted to a 3D numpy array into three separate 1D arrays, representing RGB channels.
The image array is shaped (HEIGHT, WIDTH, RGB), and I've tried in vain to use both index slicing and unzipping to just return the 3rd dimension values.
Ideally, I'd end up with three separate arrays, one per RGB channel.
Example:
print(image)
[
[ [56, 6, 3], [23, 32, 53], [27, 33, 56] ],
[ [57, 2, 3], [23, 246, 49], [29, 253, 58] ]
]
red_channel, green_channel, blue_channel = get_third(image)
print(red_channel)
[56, 23, 27, 57, 23, 29]
I've thought of just using a nested for loop to iterate over the first two dimensions and then add each RGB array to a list or whatnot, but it's my understanding that this would be both inefficient and a bit of an eyesore.
Thanks in advance!
EDIT
Clarification: By unzipping I mean using the star operator (*) within the zip function, like so:
zip(*image)
Also to clarify, I don't intend to retain the width and height; I essentially just want to flatten and return the 3rd dimension of the array.

red_channel, green_channel, blue_channel = np.transpose(np.reshape(image, (-1, 3)))
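A minimal runnable sketch of the one-liner above, using the example array from the question (the printed values are converted to lists only to keep the output easy to read):

```python
import numpy as np

# Example image from the question: shape (HEIGHT, WIDTH, RGB) = (2, 3, 3)
image = np.array([
    [[56, 6, 3], [23, 32, 53], [27, 33, 56]],
    [[57, 2, 3], [23, 246, 49], [29, 253, 58]],
])

# Collapse height and width into one axis, giving shape (6, 3),
# then transpose to (3, 6) so that unpacking yields one row per channel.
red_channel, green_channel, blue_channel = np.transpose(np.reshape(image, (-1, 3)))

print(red_channel.tolist())    # [56, 23, 27, 57, 23, 29]
print(green_channel.tolist())  # [6, 32, 33, 2, 246, 253]
print(blue_channel.tolist())   # [3, 53, 56, 3, 49, 58]
```

The shorter spelling image.reshape(-1, 3).T should give the same three arrays.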

Related

Max frequency 1d array in a 2d numpy array

I've a 2d numpy array:
array([[21, 17, 11],
[230, 231, 232],
[21, 17, 11]], dtype=uint8)
I want to find the 1d array (row) that is most frequent. For the above 2d array it is:
[21, 17, 11]. It is something like the mode in stats.
We can use np.unique with its optional arg return_counts to get the counts for each unique row and finally get the argmax() to choose the one with the max count -
# a is input array
unq, count = np.unique(a, axis=0, return_counts=True)
out = unq[count.argmax()]
For uint8 type data, we can also convert to 1D by reducing each row to a single scalar and then use np.unique -
s = 256**np.arange(a.shape[-1])
_, idx, count = np.unique(a.dot(s), return_index=True, return_counts=True)
out = a[idx[count.argmax()]]
If we are working with color images that are 3D (the last axis being the color channel) and want to get the most dominant color, we need to reshape with a.reshape(-1,a.shape[-1]) and then feed it to the proposed methods.
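As a rough sketch of that last step, here is the np.unique approach applied to a small made-up 2x2 RGB image (the array img and its values are assumptions for illustration, not data from the question):

```python
import numpy as np

# Hypothetical 2x2 RGB image; three of the four pixels share one color.
img = np.array([
    [[21, 17, 11], [230, 231, 232]],
    [[21, 17, 11], [21, 17, 11]],
], dtype=np.uint8)

# Flatten height and width so that every row is one pixel: shape (4, 3)
pixels = img.reshape(-1, img.shape[-1])

# Count each distinct row and pick the one that occurs most often
unq, count = np.unique(pixels, axis=0, return_counts=True)
dominant = unq[count.argmax()]

print(dominant.tolist())  # [21, 17, 11]
```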

combine multiple numpy ndarrays as list

I have three equally dimensioned numpy arrays.
I would like to store the data from all three in an array of the same dimensions and size.
To do this, I would like to store three bytes of information per item in the array. I assume this would be a list.
e.g.
>>>red = np.array([[150,25],[37,214]])
>>>green = np.array([[190,27],[123,231]])
>>>blue = np.array([[10,112],[123,119]])
insert combination magic to make a combined array called RGB
>>>RGB
array([(150,190,10),(25,27,112)],[(37,123,123),(214,231,119)])
For a start, each is 2x2. Combining them in a list and passing that to np.array, the same construction used to make red, produces a 3x2x2 array.
In [344]: red = np.array([[150,25],[37,214]])
In [345]: green = np.array([[190,27],[123,231]])
In [346]: blue = np.array([[10,112],[123,119]])
In [347]: np.array([red,green,blue])
Out[347]:
array([[[150, 25],
[ 37, 214]],
[[190, 27],
[123, 231]],
[[ 10, 112],
[123, 119]]])
In [348]: _.shape
Out[348]: (3, 2, 2)
That's not the order you want, but we can easily reshape, and if needed transpose.
The target, with an added set of [], is:
In [350]: np.array([[(150,190,10),(25,27,112)],[(37,123,123),(214,231,119)]])
Out[350]:
array([[[150, 190, 10],
[ 25, 27, 112]],
[[ 37, 123, 123],
[214, 231, 119]]])
In [351]: _.shape
Out[351]: (2, 2, 3)
so try moving the 3 shape to the end with transpose:
In [352]: np.array([red,green,blue]).transpose(1,2,0)
Out[352]:
array([[[150, 190, 10],
[ 25, 27, 112]],
[[ 37, 123, 123],
[214, 231, 119]]])
===========================
I should have suggested stack. This is a newish version of concatenate that lets us join arrays on a new dimension. With axis=0 it behaves like np.array. But to join on the last axis, putting the rgb dimension last, use:
In [467]: np.stack((red,green,blue),axis=-1)
Out[467]:
array([[[150, 190, 10],
[ 25, 27, 112]],
[[ 37, 123, 123],
[214, 231, 119]]])
In [468]: _.shape
Out[468]: (2, 2, 3)
Note that this expression does not assume anything about the shape of red, etc, except that they are equal. So it will work with 3d arrays as well.
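A short check, offered as a sketch, that the stack and transpose routes agree, and that the channels can be pulled back out again (np.moveaxis is used here only as one convenient way to do the unpacking):

```python
import numpy as np

red = np.array([[150, 25], [37, 214]])
green = np.array([[190, 27], [123, 231]])
blue = np.array([[10, 112], [123, 119]])

# Two ways of putting the color axis last; both give shape (2, 2, 3)
stacked = np.stack((red, green, blue), axis=-1)
transposed = np.array([red, green, blue]).transpose(1, 2, 0)

print(np.array_equal(stacked, transposed))  # True
print(stacked[0, 0].tolist())               # [150, 190, 10]

# Moving the last axis to the front lets us unpack the three channels again
r, g, b = np.moveaxis(stacked, -1, 0)
print(np.array_equal(r, red))  # True
```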

Confusion with Fancy indexing (for non-fancy people)

Let's assume a multi-dimensional array
import numpy as np
foo = np.random.rand(102,43,35,51)
I know that those last dimensions represent a 2D space (35,51) of which I would like to index a range of rows of a column
Let's say I want to have rows 8 to 30 of column 0
From my understanding of indexing I should call
foo[0][0][8::30][0]
Knowing my data though (unlike the random data used here), this is not what I expected
I could try this that does work but looks ridiculous
foo[0][0][[8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30],0]
Now from what I can find in this documentation I can also use
something like:
foo[0][0][[8,30],0]
which only gives me the values of rows 8 and 30
while this:
foo[0][0][[8::30],0]
gives an error
File "<ipython-input-568-cc49fe1424d1>", line 1
foo[0][0][[8::30],0]
^
SyntaxError: invalid syntax
I don't understand why the :: argument cannot be passed here. What is then a way to indicate a range in your indexing syntax?
So I guess my overall question is what would be the proper pythonic equivalent of this syntax:
foo[0][0][[8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30],0]
Instead of
foo[0][0][8::30][0]
try
foo[0, 0, 8:30, 0]
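(Note that 8:30 stops at row 29; use 8:31 if row 30 should be included, as in your explicit list.)
The foo[0][0] part is the same as foo[0, 0, :, :], selecting a 2d array (35 x 51). But foo[0][0][8::30] selects only a subset of those rows, because the 30 is read as a step.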
Consider what happens when I use 0::30 on a 2d array:
In [490]: np.zeros((35,51))[0::30].shape
Out[490]: (2, 51)
In [491]: np.arange(35)[0::30]
Out[491]: array([ 0, 30])
The 30 is the step, not the stop value of the slice.
The last [0] then picks the first of those rows. With 0::30 the end result is the same as foo[0,0,0,:]; with your 8::30 it is foo[0,0,8,:].
It is better, in most cases, to index multiple dimensions with the comma syntax. And if you want the first 30 rows use 0:30, not 0::30 (that's basic slicing notation, applicable to lists as well as arrays).
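The difference between a step and a stop can be checked directly. This sketch uses a smaller stand-in array with the same last two dimensions (35, 51) as the question; the leading dimensions are shrunk only to keep it light:

```python
import numpy as np

# Smaller stand-in for the array in the question; last two axes are (35, 51)
foo = np.random.rand(2, 3, 35, 51)

# 8::30 means "start at 8, step 30"; with only 35 rows that selects row 8 alone
print(foo[0][0][8::30].shape)  # (1, 51)

# 8:31 means rows 8 through 30, because the stop value is exclusive
rows = foo[0, 0, 8:31, 0]
print(rows.shape)  # (23,)

# Same values as the explicit list of indices 8, 9, ..., 30
explicit = foo[0][0][list(range(8, 31)), 0]
print(np.array_equal(rows, explicit))  # True
```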
As for:
foo[0][0][[8::30],0]
simplify it to x[[8::30], 0]. The Python interpreter accepts x[1:2:3, 0], translating the index into the tuple (slice(1,2,3), 0) and passing it to a __getitem__ method. But the colon syntax is accepted only in that very specific context, directly inside the indexing brackets. The interpreter is treating that inner set of brackets as a list, and colons are not accepted there.
foo[0,0,[1,2,3],0]
is ok, because the inner brackets are a list, and the numpy getitem can handle those.
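To see what the interpreter actually hands to __getitem__, a tiny probe class can help. ShowKey below is a hypothetical helper written for this illustration, not part of numpy:

```python
class ShowKey:
    """Hypothetical helper that returns whatever key the indexing syntax passes in."""

    def __getitem__(self, key):
        return key


probe = ShowKey()

print(probe[1:2:3, 0])      # (slice(1, 2, 3), 0)
print(probe[8:30])          # slice(8, 30, None)
print(probe[[1, 2, 3], 0])  # ([1, 2, 3], 0)

# A slice object can also be built explicitly and used anywhere a value is allowed
print(list(range(40))[slice(8, 30, 10)])  # [8, 18, 28]
```

Writing probe[[8::30], 0] fails before __getitem__ is ever called, because the inner brackets are parsed as a list literal.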
numpy has a tool for converting a slice notation into a list of numbers. Play with that if it is still confusing:
In [495]: np.r_[8:30]
Out[495]:
array([ 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29])
In [496]: np.r_[8::30]
Out[496]: array([0])
In [497]: np.r_[8:30:2]
Out[497]: array([ 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

How to handle large files in python?

I am new to Python. I have asked another question, "How to arrange three lists in such a way that the sum of corresponding elements if greater then appear first?" Now the problem is the following:
I am working with a large text file that has 419040 rows and 6 columns of floats. I take the first 3 columns to generate those three lists, so each list I am actually working with has 419040 entries. While I was running the Python code to extract the three columns into three lists, the Python shell stopped responding. I suspected the large number of entries was the cause. I used this code:
file=open("file_location","r")
a=[]
b=[]
c=[]
for lines in file:
    x=lines.split(" ")
    a.append(float(x[0]))
    b.append(float(x[1]))
    c.append(float(x[2]))
Note: for a small file this code was running perfectly.
To avoid this problem I am using the following code:
import numpy as np
a = []
b = []
c = []
a,b,c = np.genfromtxt('file_location',usecols = [0,1,2], unpack=True)
So when I am running the code given in answers to my previous question the same problem is happening. So what will be the corresponding code using numpy? Or, any other solutions?
If you're going to use numpy, then I suggest using ndarrays, rather than lists. You can use loadtxt since you don't have to handle missing data. I assume it'll be faster.
a = np.loadtxt('file.txt', usecols=(0, 1, 2))
a is now a two-dimensional array, stored as an np.ndarray datatype. It should look like:
>>> a
array([[ 1, 20, 400],
[ 5, 30, 500],
[ 3, 50, 100],
[ 2, 40, 300],
[ 4, 10, 200]])
However, you now need to re-do what you did in the previous question, but using numpy arrays rather than lists. This can be easily achieved like so:
>>> b = a.sum(axis=1)
>>> b
Out[21]: array([421, 535, 153, 342, 214])
>>> i = np.argsort(b)[::-1]
>>> i
Out[26]: array([1, 0, 3, 4, 2])
>>> a[i, :]
Out[27]:
array([[ 5, 30, 500],
[ 1, 20, 400],
[ 2, 40, 300],
[ 4, 10, 200],
[ 3, 50, 100]])
The steps involved are described in a little greater detail here.
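Putting the pieces together, a self-contained sketch might look like the following. The io.StringIO object is only a stand-in for the real file (which is assumed to be whitespace-separated with 6 columns), and the names row_sums, order and sorted_rows are chosen for illustration:

```python
import io
import numpy as np

# Stand-in for the real file: 6 space-separated columns, only the first 3 are used
text = io.StringIO(
    "1 20 400 0 0 0\n"
    "5 30 500 0 0 0\n"
    "3 50 100 0 0 0\n"
    "2 40 300 0 0 0\n"
    "4 10 200 0 0 0\n"
)

a = np.loadtxt(text, usecols=(0, 1, 2))  # float array of shape (5, 3)

row_sums = a.sum(axis=1)            # one sum per row
order = np.argsort(row_sums)[::-1]  # row indices, largest sum first
sorted_rows = a[order]              # rows reordered by descending sum

print(row_sums.tolist())        # [421.0, 535.0, 153.0, 342.0, 214.0]
print(order.tolist())           # [1, 0, 3, 4, 2]
print(sorted_rows[0].tolist())  # [5.0, 30.0, 500.0]
```

With a real file, passing its path to np.loadtxt in place of the StringIO object should behave the same way.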

Sorting an array with pre-sorted partitions

I have an array that is already sorted in partitions of 4:
2, 23, 45, 55, 1, 4, 23, 74545, 75, 234, 323, 9090, 2, 43, 6342, 323452
What would be the most efficient way to sort this array? Note: the array size is always even and the program knows that every 4 elements are sorted.
I think you can use merge sort for problems like this.
You might be able to use strand sort for this.
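As one possible illustration of the merge idea (a sketch using Python's standard library, not necessarily what either answer had in mind), the pre-sorted runs of 4 can be fed to heapq.merge, which performs a k-way merge of already-sorted inputs:

```python
import heapq

data = [2, 23, 45, 55, 1, 4, 23, 74545, 75, 234, 323, 9090, 2, 43, 6342, 323452]

# Split the array into the runs of 4 that are already sorted
runs = [data[i:i + 4] for i in range(0, len(data), 4)]

# heapq.merge lazily merges already-sorted iterables into one sorted stream
result = list(heapq.merge(*runs))

print(result[:4])              # [1, 2, 2, 4]
print(result == sorted(data))  # True
```

This skips the work of sorting within each run and only does the merging step of a merge sort.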
