combine multiple numpy ndarrays as list

I have three equally dimensioned numpy arrays.
I would like to store the data from all three in an array of the same dimensions and size.
To do this, I would like to store three bytes of information per item in the array. I assume this would be a list.
e.g.
>>> red = np.array([[150,25],[37,214]])
>>> green = np.array([[190,27],[123,231]])
>>> blue = np.array([[10,112],[123,119]])
# insert combination magic to make a combined array called RGB
>>> RGB
array([(150,190,10),(25,27,112)],[(37,123,123),(214,231,119)])

For a start, each array is 2x2. Combining them in a list with np.array (the same construction used to make red) produces a 3x2x2 array.
In [344]: red = np.array([[150,25],[37,214]])
In [345]: green = np.array([[190,27],[123,231]])
In [346]: blue = np.array([[10,112],[123,119]])
In [347]: np.array([red,green,blue])
Out[347]:
array([[[150,  25],
        [ 37, 214]],

       [[190,  27],
        [123, 231]],

       [[ 10, 112],
        [123, 119]]])
In [348]: _.shape
Out[348]: (3, 2, 2)
That's not the order you want, but we can easily rearrange the axes with transpose (a plain reshape to (2, 2, 3) would keep the values in the wrong order).
The target, with an added set of [] (the outer brackets are missing in the question):
In [350]: np.array([[(150,190,10),(25,27,112)],[(37,123,123),(214,231,119)]])
Out[350]:
array([[[150, 190,  10],
        [ 25,  27, 112]],

       [[ 37, 123, 123],
        [214, 231, 119]]])
In [351]: _.shape
Out[351]: (2, 2, 3)
So try moving the size-3 axis to the end with transpose:
In [352]: np.array([red,green,blue]).transpose(1,2,0)
Out[352]:
array([[[150, 190,  10],
        [ 25,  27, 112]],

       [[ 37, 123, 123],
        [214, 231, 119]]])
===========================
I should have suggested np.stack. This is a newer relative of concatenate that joins arrays along a new dimension, which can be placed at any position. With axis=0 it behaves like np.array. To join on the last axis, putting the rgb dimension last, use:
In [467]: np.stack((red,green,blue),axis=-1)
Out[467]:
array([[[150, 190,  10],
        [ 25,  27, 112]],

       [[ 37, 123, 123],
        [214, 231, 119]]])
In [468]: _.shape
Out[468]: (2, 2, 3)
Note that this expression does not assume anything about the shapes of red, etc., except that they are equal, so it works with 3d arrays as well.
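A minimal sketch that checks the transpose and stack approaches agree on the arrays from the question. The 3-D arrays r3, g3 and b3 are hypothetical zero-filled inputs, used only to show that np.stack does not care about the input rank as long as the shapes match.

```python
import numpy as np

red = np.array([[150, 25], [37, 214]])
green = np.array([[190, 27], [123, 231]])
blue = np.array([[10, 112], [123, 119]])

# Approach 1: stack on a new first axis, then move that axis to the end.
rgb_t = np.array([red, green, blue]).transpose(1, 2, 0)

# Approach 2: stack directly on a new last axis.
rgb_s = np.stack((red, green, blue), axis=-1)

print(rgb_s.shape)                   # (2, 2, 3)
print(rgb_s[0, 1])                   # [ 25  27 112]
print(np.array_equal(rgb_t, rgb_s))  # True

# The same call works for inputs of any (equal) shape, e.g. hypothetical 3-D arrays.
r3, g3, b3 = (np.zeros((4, 5, 6)) for _ in range(3))
print(np.stack((r3, g3, b3), axis=-1).shape)  # (4, 5, 6, 3)
```

Stacking on axis=-1 is the more direct choice here, because it says "add the channel dimension last" without a separate axis permutation.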

Related

Pythonic way to assign 3rd Dimension of Numpy array to 1D Array

I'm trying to flatten an image that's been converted to a 3D numpy array into three separate 1D arrays, representing RGB channels.
The image array is shaped (HEIGHT, WIDTH, RGB), and I've tried in vain to use both index slicing and unzipping to just return the 3rd dimension values.
Ideally, there would be three separate arrays, one for each RGB channel.
Example:
print(image)
[
[ [56, 6, 3], [23, 32, 53], [27, 33, 56] ],
[ [57, 2, 3], [23, 246, 49], [29, 253, 58] ]
]
red_channel, green_channel, blue_channel = get_third(image)
print(red_channel)
[56, 23, 27, 57, 23, 29]
I've thought of just using a nested for loop to iterate over the first two dimensions and then add each RGB array to a list or whatnot, but it's my understanding that this would be both inefficient and a bit of an eyesore.
Thanks in advance!
EDIT
Clarification: By unzipping I mean using the star operator (*) within the zip function, like so:
zip(*image)
Also, to clarify: I don't intend to retain the width and height; I just want to flatten the array and return the values along its third dimension.

red_channel, green_channel, blue_channel = np.transpose(np.reshape(image, (-1, 3)))
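A runnable sketch of this one-liner on the example image from the question. It also shows an equivalent spelling with ellipsis indexing, image[..., 0].ravel(), which is an alternative added here rather than part of the original answer. The results are 1-D numpy arrays; call .tolist() if plain lists are needed.

```python
import numpy as np

# The example image from the question: shape (HEIGHT=2, WIDTH=3, RGB=3).
image = np.array([
    [[56, 6, 3], [23, 32, 53], [27, 33, 56]],
    [[57, 2, 3], [23, 246, 49], [29, 253, 58]],
])

# reshape(-1, 3) gives one row per pixel; .T gives one row per channel,
# and unpacking a (3, N) array yields its three rows.
red_channel, green_channel, blue_channel = np.reshape(image, (-1, 3)).T
print(red_channel)  # [56 23 27 57 23 29]

# Equivalent alternative: select one channel with ellipsis indexing and flatten it.
red_alt = image[..., 0].ravel()
print(np.array_equal(red_channel, red_alt))  # True
```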

How to make a query with a custom order by parameter using array?

I have an algorithm that outputs an array in a particular order. Example:
arr = [0, 1, 21, 2, 22, 23, 24, 25, 3, 27, 35, 36, 28, 37, 38, 4, 29, 5, 34, 6, 7, 8, 9, 10, 11, 12]
The array will be different depending on the user's input, so the example above is only one of an unbounded number of possibilities: longer, shorter, or with different values (all values will be integers). So I won't be able to use a CASE expression in my query.
I want to write an SQL Server query in my views.py that displays all objects in my model in that exact order.
Here is my "query" at the moment but obviously it doesn't work.
test = QuoteAssemblies.objects.raw("""SELECT qmaQuoteAssemblyID,
qmaPartID,
qmaLevel,
qmaPartShortDescription,
qmaQuantityPerParent
FROM QuoteAssemblies
WHERE qmaQuoteAssemblyID IN arr
ORDER BY qmaQuoteAssemblyID = arr""")
In essence, I want the query to be ordered by qmaQuoteAssemblyID following the order of the array (not ASC, DESC, etc.):
qmaQuoteAssemblyID = 0
qmaQuoteAssemblyID = 1
qmaQuoteAssemblyID = 21
qmaQuoteAssemblyID = 2
etc...
There is a similar example for MySQL here. I just need something like that, but for MSSQL. Cheers.

If your version of SQL Server supports JSON querying (i.e. 2016+, with database compatibility level 130 or higher), you can use the openjson() function to number the elements of your array, and then use that number for sorting:
declare @Arr nvarchar(max) = '[0, 1, 21, 2, 22, 23, 24, 25, 3, 27, 35, 36, 28, 37, 38, 4, 29, 5, 34, 6, 7, 8, 9, 10, 11, 12]';
SELECT q.qmaQuoteAssemblyID,
       q.qmaPartID,
       q.qmaLevel,
       q.qmaPartShortDescription,
       q.qmaQuantityPerParent
FROM dbo.QuoteAssemblies q
inner join openjson(@Arr) ar on ar.[value] = q.qmaQuoteAssemblyID
ORDER BY cast(ar.[key] as int);
Without a WITH clause, openjson() returns [key] as a string, so the cast is needed; otherwise position 10 would sort between positions 1 and 2.
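On the Django side, a minimal sketch, assuming the JSON query is sent through Manager.raw() with the array passed as a single JSON string parameter instead of being pasted into the SQL text. build_ordered_query is a hypothetical helper; Django's raw() takes %s placeholders plus a params list, and it needs the primary key among the selected columns, which this sketch assumes is qmaQuoteAssemblyID. The ORDER BY casts [key] to int because openjson() returns keys as strings.

```python
import json


def build_ordered_query(ids):
    """Hypothetical helper: return (sql, params) for QuoteAssemblies.objects.raw().

    The id list is serialized to JSON and passed as one query parameter,
    so the SQL text stays the same no matter how long the list is.
    """
    sql = """
        SELECT q.qmaQuoteAssemblyID,
               q.qmaPartID,
               q.qmaLevel,
               q.qmaPartShortDescription,
               q.qmaQuantityPerParent
        FROM dbo.QuoteAssemblies q
        INNER JOIN OPENJSON(%s) ar ON ar.[value] = q.qmaQuoteAssemblyID
        ORDER BY CAST(ar.[key] AS int);
    """
    return sql, [json.dumps(ids)]


arr = [0, 1, 21, 2, 22, 23, 24, 25, 3, 27]
sql, params = build_ordered_query(arr)
print(params)  # ['[0, 1, 21, 2, 22, 23, 24, 25, 3, 27]']
# In views.py (not executed here):
# test = QuoteAssemblies.objects.raw(sql, params)
```

Passing the list as a parameter also avoids building SQL by string concatenation from user input.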
If you can't use JSON for this task, you will need to produce a rowset in which your array elements are correctly numbered, and use it in a similar fashion. There are lots of ways to achieve this, and it doesn't necessarily have to be done on the server side. For example, you can create a two-column key-value user-defined table type in your database and provide the data as a table-valued parameter to your query.
Another approach is to supply the data in the form of XML, something like this:
declare @Ax xml = N'<r>
  <i n="0" v="0" />
  <i n="1" v="1" />
  <i n="2" v="21" />
  ...
</r>';
SELECT q.qmaQuoteAssemblyID,
       q.qmaPartID,
       q.qmaLevel,
       q.qmaPartShortDescription,
       q.qmaQuantityPerParent
FROM dbo.QuoteAssemblies q
inner join @Ax.nodes('/r/i') ar(c) on ar.c.value('./@v', 'int') = q.qmaQuoteAssemblyID
ORDER BY ar.c.value('./@n', 'int');
Still, the XML nodes are better numbered by the application, since there is no efficient way to do this on the database side. Performance may also be noticeably worse than with the JSON option.
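A minimal Python sketch of numbering the XML nodes in the application, assuming the <r>/<i n=... v=...> layout used in the XML query above. build_ordered_xml is a hypothetical helper; its output string would be supplied as the query's XML parameter.

```python
import xml.etree.ElementTree as ET


def build_ordered_xml(ids):
    """Hypothetical helper: number each id in application code.

    Produces <r><i n="position" v="id" />...</r>, where n is read back by the
    query for ORDER BY and v is joined against qmaQuoteAssemblyID.
    """
    root = ET.Element("r")
    for position, value in enumerate(ids):
        ET.SubElement(root, "i", n=str(position), v=str(value))
    return ET.tostring(root, encoding="unicode")


print(build_ordered_xml([0, 1, 21]))
# <r><i n="0" v="0" /><i n="1" v="1" /><i n="2" v="21" /></r>
```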

Confusion with Fancy indexing (for non-fancy people)

Let's assume a multi-dimensional array
import numpy as np
foo = np.random.rand(102,43,35,51)
I know that the last two dimensions represent a 2D space (35, 51), from which I would like to index a range of rows of one column.
Let's say I want rows 8 to 30 of column 0.
From my understanding of indexing, I should call
foo[0][0][8::30][0]
Knowing my data (unlike the random data used here), I can tell this is not what I expected.
I could try this, which does work but looks ridiculous:
foo[0][0][[8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30],0]
Now, from what I can find in this documentation, I can also use something like:
foo[0][0][[8,30],0]
which only gives me the values of rows 8 and 30
while this:
foo[0][0][[8::30],0]
gives an error
File "<ipython-input-568-cc49fe1424d1>", line 1
foo[0][0][[8::30],0]
^
SyntaxError: invalid syntax
I don't understand why the :: argument cannot be passed here. What is then a way to indicate a range in your indexing syntax?
So I guess my overall question is what would be the proper pythonic equivalent of this syntax:
foo[0][0][[8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30],0]

Instead of
foo[0][0][8::30][0]
try
foo[0, 0, 8:30, 0]
The foo[0][0] part is the same as foo[0, 0, :, :], selecting a 2d array (35 x 51). But foo[0][0][8::30] selects a subset of those rows.
Consider what happens when you use 0::30 on a 2d array:
In [490]: np.zeros((35,51))[0::30].shape
Out[490]: (2, 51)
In [491]: np.arange(35)[0::30]
Out[491]: array([ 0, 30])
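The 30 is the step, not the stop value of the slice.
The last [0] then picks the first of those rows, so with 0::30 the end result is the same as foo[0,0,0,:]. With the question's 8::30 only row 8 is selected (the next candidate, row 38, is out of range), so foo[0][0][8::30][0] is the same as foo[0,0,8,:].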
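It is better, in most cases, to index multiple dimensions with the comma syntax. And if you want the first 30 rows, use 0:30, not 0::30 (that's basic slicing notation, applicable to lists as well as arrays). Note also that the stop is exclusive: 8:30 gives rows 8 through 29, so use 8:31 to include row 30 as in your explicit list.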
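A quick check of these points, as a sketch on a smaller random array; the shape (2, 3, 35, 51) is an assumption to keep memory low, since only the last two dimensions matter here.

```python
import numpy as np

# Smaller stand-in for foo; the last two dimensions match the question's (35, 51).
rng = np.random.default_rng(0)
foo = rng.random((2, 3, 35, 51))

# 8::30 means start=8, step=30: on 35 rows only row 8 qualifies (38 is out of range),
# so the chained expression from the question returns all 51 columns of row 8.
chained = foo[0][0][8::30][0]
print(chained.shape)                          # (51,)
print(np.array_equal(chained, foo[0, 0, 8]))  # True

# Comma syntax with a start:stop slice; the stop is exclusive, so 8:31 covers rows 8..30.
col = foo[0, 0, 8:31, 0]
explicit = foo[0][0][list(range(8, 31)), 0]
print(col.shape)                      # (23,)
print(np.array_equal(col, explicit))  # True
```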
As for:
foo[0][0][[8::30],0]
Simplify it to x[[8::30], 0]. The Python interpreter accepts x[1:2:3, 0], translating the index into the tuple (slice(1, 2, 3), 0) and passing that to the object's __getitem__ method. But the colon syntax is accepted only in that very specific context, directly inside the indexing brackets. The interpreter treats the inner set of brackets as a list literal, and colons are not allowed there.
foo[0,0,[1,2,3],0]
is fine, because the inner brackets are a list, and numpy's __getitem__ can handle those.
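A small sketch of the translation described above. ShowKey is a hypothetical class whose __getitem__ simply returns the key it receives, which makes the tuple of slice objects visible; the rest shows that numpy gives the same rows for the colon syntax, an explicit tuple, and an index array built with np.arange.

```python
import numpy as np


class ShowKey:
    """Hypothetical helper: __getitem__ just returns whatever key Python passes in."""

    def __getitem__(self, key):
        return key


print(ShowKey()[1:2:3, 0])  # (slice(1, 2, 3), 0)

x = np.arange(35 * 4).reshape(35, 4)

# x[8:31, 0] is shorthand for indexing with the tuple (slice(8, 31), 0).
by_syntax = x[8:31, 0]
by_tuple = x[(slice(8, 31), 0)]
print(np.array_equal(by_syntax, by_tuple))  # True

# Colons are not allowed inside a list literal, but np.arange builds the same
# row numbers as an index array, which numpy's __getitem__ accepts.
by_array = x[np.arange(8, 31), 0]
print(np.array_equal(by_syntax, by_array))  # True
```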
numpy has a tool for converting a slice notation into a list of numbers. Play with that if it is still confusing:
In [495]: np.r_[8:30]
Out[495]:
array([ 8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
       25, 26, 27, 28, 29])
In [496]: np.r_[8::30]
Out[496]: array([0])
In [497]: np.r_[8:30:2]
Out[497]: array([ 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

How to handle large files in python?

I am new to Python. I asked another question, How to arrange three lists in such a way that the sum of corresponding elements if greater then appear first? Now the problem is the following:
I am working with a large text file which has 419040 rows and 6 columns of floats. I take the first 3 columns to generate those three lists, so each list I am actually working with has 419040 entries. When I ran the Python code below to extract the three columns into three lists, the Python shell stopped responding; I suspect the large number of entries is the cause. This is the code I used:
file = open("file_location", "r")
a = []
b = []
c = []
for lines in file:
    x = lines.split(" ")
    a.append(float(x[0]))
    b.append(float(x[1]))
    c.append(float(x[2]))
Note: for small files this code runs perfectly.
To avoid this problem I am using the following code:
import numpy as np
a = []
b = []
c = []
a,b,c = np.genfromtxt('file_location',usecols = [0,1,2], unpack=True)
But when I run the code given in the answers to my previous question, the same problem happens. What would the corresponding code look like using numpy? Or are there any other solutions?

If you're going to use numpy, then I suggest using ndarrays, rather than lists. You can use loadtxt since you don't have to handle missing data. I assume it'll be faster.
a = np.loadtxt('file.txt', usecols=(0, 1, 2))
a is now a two-dimensional np.ndarray. With some small illustrative values it would look like this (loadtxt actually returns floats, e.g. 1. rather than 1):
>>> a
array([[  1,  20, 400],
       [  5,  30, 500],
       [  3,  50, 100],
       [  2,  40, 300],
       [  4,  10, 200]])
However, you now need to redo what you did in the previous question, but with numpy arrays rather than lists. This is easily done like so:
>>> b = a.sum(axis=1)
>>> b
array([421, 535, 153, 342, 214])
>>> i = np.argsort(b)[::-1]
>>> i
array([1, 0, 3, 4, 2])
>>> a[i, :]
array([[  5,  30, 500],
       [  1,  20, 400],
       [  2,  40, 300],
       [  4,  10, 200],
       [  3,  50, 100]])
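A self-contained sketch of the whole pipeline, using an in-memory io.StringIO stand-in for the real file; the five sample rows and the extra zero columns are made up to mimic the 6-column layout. It also shows unpack=True, which returns the selected columns as separate 1-D arrays if you still want a, b and c as in the question.

```python
import io

import numpy as np

# Hypothetical stand-in for the real file: 6 whitespace-separated columns per row.
sample = """\
1 20 400 0 0 0
5 30 500 0 0 0
3 50 100 0 0 0
2 40 300 0 0 0
4 10 200 0 0 0
"""

# Read only the first three columns into one 2-D float array.
a = np.loadtxt(io.StringIO(sample), usecols=(0, 1, 2))

# Row indices ordered by row sum, largest first, then reorder the rows.
order = np.argsort(a.sum(axis=1))[::-1]
ranked = a[order]
print(ranked[:, 0])  # [5. 1. 2. 4. 3.]

# Alternatively, get the three columns as separate 1-D arrays.
x, y, z = np.loadtxt(io.StringIO(sample), usecols=(0, 1, 2), unpack=True)
```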
The steps involved are described in a little greater detail here.

Searching through an unsorted nonuniform pair array for closest entry

I have an array that looks something like this:
[[320, 80], [300, 70], [300, 80], [270, 75], [260, 70], [280, 70]]
That is just a snippet; the actual array has 338 entries.
I am trying to find the next logical element in the array based on some input. For example, if I feed in two numbers, e.g. 315, 80, the next logical one is 320, 80 if you want a bigger entry.
I don't want to equate "logical" with "closest", because it depends on whether you want a bigger or a smaller element. So I suppose by "logical" I mean "closest in the required direction".
As an additional requirement, the second number should try to stay as close as possible to the entered value, OR the first number should try to stay as close as possible to the original number.
I am having issues with cases such as 275, 70 when I want to find the next smallest. That should be 260, 70, but my implementation keeps picking 280, 70.
My current implementation combines the differences of the two numbers into a single score and looks for the smallest one. I'm not sure how to enforce a direction.
Python example (although I'm really looking for a language-agnostic solution):
elements = [[320, 80],
            [300, 70],
            [300, 80],
            [270, 75],
            [260, 70],
            [280, 70]]
target = [275, 70]
bestMatch = []
bestDifference = 0
for e in elements:
    currentDifference = abs((target[0] - e[0]) - (target[1] - e[1]))
    if not bestMatch or currentDifference < bestDifference:
        bestMatch = e
        bestDifference = currentDifference
print(bestMatch)

Based on your description and example input, I interpret this as: take the minimum of the two absolute differences, rather than the difference between them. Then you'll pick the element that has the smallest change in either of the two numbers.
To go in the right direction, you can check whether the element you are currently looking at is larger or smaller than the target (comparing the first number).
Doing that you'll get the following:
elements = [[320, 80],
            [300, 70],
            [300, 80],
            [270, 75],
            [260, 70],
            [280, 70]]

def nextLogicalElement(target, bigger=True):
    bestScore = 0
    bestMatch = []
    for e in elements:
        # Score: the smaller of the two absolute differences.
        score = min(abs(target[0] - e[0]), abs(target[1] - e[1]))
        # Skip elements in the wrong direction (judged by the first number).
        if bigger and target[0] > e[0] or not bigger and target[0] < e[0]:
            continue
        if not bestMatch or score < bestScore:
            bestMatch = e
            bestScore = score
    return bestMatch
Output:
>>> print(nextLogicalElement([315, 80], bigger=True))
[320, 80]
>>> print(nextLogicalElement([275, 70], bigger=False))
[260, 70]
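The same scoring and direction rule can also be expressed with numpy, which avoids the Python loop for larger arrays. This is a sketch that reuses the sample data; next_logical_element is a hypothetical vectorized counterpart of nextLogicalElement. Like the loop, it returns [] when nothing lies in the requested direction, and np.argmin keeps the first of several equally good matches, which mirrors the loop's strict < comparison.

```python
import numpy as np

elements = np.array([[320, 80], [300, 70], [300, 80],
                     [270, 75], [260, 70], [280, 70]])


def next_logical_element(target, bigger=True):
    """Hypothetical vectorized version of nextLogicalElement (same rules)."""
    target = np.asarray(target)
    # Direction filter on the first number; equality is allowed, as in the loop.
    mask = elements[:, 0] >= target[0] if bigger else elements[:, 0] <= target[0]
    candidates = elements[mask]
    if len(candidates) == 0:
        return []
    # Score: the smaller of the two absolute differences, per candidate row.
    scores = np.abs(candidates - target).min(axis=1)
    return candidates[np.argmin(scores)].tolist()


print(next_logical_element([315, 80], bigger=True))   # [320, 80]
print(next_logical_element([275, 70], bigger=False))  # [260, 70]
```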