Indexing 3D arrays with NumPy

I have a three-dimensional array with axes (x, y, z) and an index vector whose length equals the size of the x axis. For each x, the vector selects one y position and keeps all of its z values, so the expected result has shape (x, z).
I wrote code that works as expected, but is there a NumPy function that can replace the for loop and solve the problem more efficiently?
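import numpy as np
arr = np.random.rand(100, 5, 2)
result = np.empty((100, 2))  # output of shape (x, z)
ids = [np.random.randint(0, 5) for _ in range(100)]  # one y index per x (renamed from id, which shadows a built-in)
for i in range(100):
    result[i] = arr[i, ids[i]]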

You can do this with integer (advanced) indexing, pairing each row index with its y index:
import numpy as np
arr = np.random.randn(100, 5, 2)
ids = np.random.randint(0, 5, size=100)
res = arr[range(100), ids]
res.shape # (100, 2)
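As a sanity check, the sketch below compares the advanced-indexing answer against the loop from the question and against np.take_along_axis, another NumPy function that performs the same per-row selection. The data are illustrative random values (seeded for reproducibility), and the sketch assumes NumPy 1.17 or later for np.random.default_rng.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.random((100, 5, 2))
ids = rng.integers(0, 5, size=100)

# Advanced indexing: row i of the output is arr[i, ids[i], :]
fancy = arr[np.arange(arr.shape[0]), ids]

# take_along_axis needs an index array with the same ndim as arr;
# (100, 1, 1) broadcasts against (100, 5, 2) on every axis except axis=1
taken = np.take_along_axis(arr, ids[:, None, None], axis=1)[:, 0, :]

# Reference loop from the question
looped = np.empty((arr.shape[0], arr.shape[2]))
for i in range(arr.shape[0]):
    looped[i] = arr[i, ids[i]]

print(fancy.shape)  # (100, 2)
```

Advanced indexing is the shorter of the two; take_along_axis is mainly useful when the index array already has the same number of dimensions as the data.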

Related

How to convert a tuple into a 2D matrix

I have a tuple a with shape (3, 1) and I would like to construct a 2D matrix X with shape (3, 2). After X is constructed, I need to compute X'*X, which should have shape (2, 2).
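import numpy as np
thistuple = (1, 2, 3)
# Start from a (len(thistuple), 2) matrix of ones
arr = np.ones(shape=(len(thistuple), 2))
tuple_index = 0
# Fill the matrix row by row with the tuple's values; the remaining cells stay 1
for i in range(len(arr)):
    for j in range(len(arr[0])):
        if tuple_index >= len(thistuple):
            break
        arr[i][j] = thistuple[tuple_index]
        tuple_index += 1
rez = arr.T
result = np.dot(rez, arr)  # X' * X, shape (2, 2)
print(result)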
The code above works for a tuple of any length n (i.e. shape (n, 1)) in Python.
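The same fill can be done without Python loops. This is a minimal sketch: tuple_to_matrix is a hypothetical helper name, and it reproduces the loop's behaviour of writing the tuple's values row by row into a matrix of ones while leaving the remaining cells at 1.

```python
import numpy as np

def tuple_to_matrix(values, ncols=2):
    """Fill a (len(values), ncols) matrix of ones row by row with `values`."""
    n = len(values)
    flat = np.ones(n * ncols)
    flat[:n] = values  # the first n cells in row-major order get the tuple values
    return flat.reshape(n, ncols)

X = tuple_to_matrix((1, 2, 3))
gram = X.T @ X  # X' * X, shape (2, 2)
print(gram)  # for (1, 2, 3): [[11, 6], [6, 6]]
```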

Find common elements in subarrays of arrays

I have two NumPy arrays, arr1 with shape (~140000, 3) and arr2 with shape (~450000, 10). In both arrays, the first 3 elements of each row are coordinates (z, y, x). I want to find the rows of arr2 whose coordinates also appear in arr1 (arr1 can be considered a subset of arr2).
For example:
arr1 = [[1,2,3],[1,2,5],[1,7,8],[5,6,7]]
arr2 = [[1,2,3,7,66,4,3,44,8,9],[1,3,9,6,7,8,3,4,5,2],[1,5,8,68,7,8,13,4,53,2],[5,6,7,6,67,8,63,4,5,20], ...]
I want to find the rows with common coordinates (same first 3 elements):
list_arr = [[1,2,3,7,66,4,3,44,8,9], [5,6,7,6,67,8,63,4,5,20], ...]
At the moment I'm doing this double loop, which is extremely slow:
list_arr = []
for i in arr1:
    for j in arr2:
        if i[0] == j[0] and i[1] == j[1] and i[2] == j[2]:
            list_arr.append(j)
I also tried, after the first loop, to create a subarray of arr2 filtered on the value of i[0] (arr2_filt = [el for el in arr2 if el[0]==i[0]]). This speeds things up a bit, but it is still very slow.
Can you help me with this?
Approach #1
Here's a vectorized approach using void views:
# https://stackoverflow.com/a/45313353/ #Divakar
def view1D(a, b):  # a, b are arrays with the same dtype and number of columns
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    # View each row as a single opaque (void) element so rows compare as units
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()
a, b = view1D(arr1, arr2[:, :3])
out = arr2[np.in1d(b, a)]
Approach #2
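Another approach, using dimensionality reduction for non-negative integer coordinates:
d = np.maximum(arr2[:,:3].max(0),arr1.max(0)) + 1  # radix per column: max value + 1, so distinct rows get distinct keys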
s = np.r_[1,d[:-1].cumprod()]
a,b = arr1.dot(s),arr2[:,:3].dot(s)
out = arr2[np.in1d(b,a)]
Improvement #1
We could use np.searchsorted to replace np.in1d in both of the approaches above:
unq_a = np.unique(a)
idx = np.searchsorted(unq_a,b)
idx[idx==len(unq_a)] = 0  # clamp positions past the end of unq_a before indexing
out = arr2[unq_a[idx] == b]
Improvement #2
Improvement #1 relies on np.unique, which sorts a. Instead, we could compute the sort order once with argsort and pass it to np.searchsorted as sorter:
sidx = a.argsort()
idx = np.searchsorted(a,b,sorter=sidx)
idx[idx==len(a)] = 0
out = arr2[a[sidx[idx]]==b]
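To check Approach #2 combined with Improvement #2 end to end, here is a self-contained sketch run on the sample rows from the question (only the four rows shown, not the elided ones). rows_with_matching_coords is a hypothetical helper name, and the sketch assumes non-negative integer coordinates whose encoded keys fit in the integer dtype. The per-column radix is the maximum plus one; using the maximum alone would let distinct coordinates map to the same key.

```python
import numpy as np

def rows_with_matching_coords(arr1, arr2, ncoords=3):
    """Return the rows of arr2 whose first `ncoords` columns appear as a row of arr1."""
    coords2 = arr2[:, :ncoords]
    # Mixed-radix encoding: radix per column is one more than the largest value seen
    radix = np.maximum(coords2.max(0), arr1.max(0)) + 1
    scale = np.r_[1, radix[:-1].cumprod()]
    a = arr1.dot(scale)     # one integer key per row of arr1
    b = coords2.dot(scale)  # one integer key per row of arr2
    # Membership test via argsort + binary search
    sidx = a.argsort()
    idx = np.searchsorted(a, b, sorter=sidx)
    idx[idx == len(a)] = 0  # clamp out-of-range positions before indexing
    return arr2[a[sidx[idx]] == b]

arr1 = np.array([[1, 2, 3], [1, 2, 5], [1, 7, 8], [5, 6, 7]])
arr2 = np.array([[1, 2, 3, 7, 66, 4, 3, 44, 8, 9],
                 [1, 3, 9, 6, 7, 8, 3, 4, 5, 2],
                 [1, 5, 8, 68, 7, 8, 13, 4, 53, 2],
                 [5, 6, 7, 6, 67, 8, 63, 4, 5, 20]])
out = rows_with_matching_coords(arr1, arr2)
print(out[:, :3])  # coordinates (1, 2, 3) and (5, 6, 7)
```

Because the result is a boolean mask over arr2, rows keep their original order and duplicated coordinates in arr2 are all returned.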
You can also do it with Python sets:
import numpy as np
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2 = np.array([[7,8,9,11,14,34],[23,12,11,10,12,13],[1,2,3,4,5,6]])
# create array from arr2 with only first 3 columns
temp = [i[:3] for i in arr2]
aset = set([tuple(x) for x in arr])
bset = set([tuple(x) for x in temp])
np.array([x for x in aset & bset])
Output (the row order may vary, because sets are unordered):
array([[7, 8, 9],
[1, 2, 3]])
Edit
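To get the full matching rows of arr2, use a list comprehension. Note that i[:3] in arr only tests whether any single element of arr matches, not a whole row, so compare coordinate tuples against the set instead:
l = [i.tolist() for i in arr2 if tuple(i[:3]) in aset]
print(l)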
Output:
[[7, 8, 9, 11, 14, 34], [1, 2, 3, 4, 5, 6]]
For integers, Divakar already gave an excellent answer. If you want to compare floats, you have to take rounding into account, for example:
1.+1e-15==1.
False
1.+1e-16==1.
True
If this behaviour could cause problems in your code, I would recommend performing a nearest-neighbour search and checking whether the distances are within a specified threshold.
import numpy as np
from scipy import spatial
def get_indices_of_nearest_neighbours(arr1, arr2):
    # Build a k-d tree on the (z, y, x) coordinates of arr2
    tree = spatial.cKDTree(arr2[:, 0:3])
    # For each row of arr1, find the nearest coordinate row of arr2
    dist, ind = tree.query(arr1, k=1)
    # You can check here whether dist is small enough and otherwise raise an error
    return ind
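If you want the matching rows of arr2 directly, one option is to build the tree on arr1 instead and ask only for neighbours within a tolerance. This is a sketch: matching_rows_within_tol and its default tol are hypothetical choices. It relies on cKDTree.query reporting an infinite distance when no neighbour lies within distance_upper_bound, which yields a boolean mask over arr2.

```python
import numpy as np
from scipy import spatial

def matching_rows_within_tol(arr1, arr2, tol=1e-9):
    """Rows of arr2 whose first three columns lie within `tol` (Euclidean)
    of some row of arr1. `tol` is an assumed threshold; tune it to your data."""
    tree = spatial.cKDTree(arr1[:, :3])
    # Neighbours farther than tol are reported with distance inf
    dist, _ = tree.query(arr2[:, :3], k=1, distance_upper_bound=tol)
    return arr2[np.isfinite(dist)]

arr1 = np.array([[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]])
arr2 = np.array([[1.0 + 1e-12, 2.0, 3.0, 9.0],   # matches (1, 2, 3) up to rounding
                 [4.0, 4.0, 4.0, 0.0],           # no match
                 [5.0, 6.0, 7.0 - 1e-12, 1.0]])  # matches (5, 6, 7) up to rounding
out = matching_rows_within_tol(arr1, arr2)
print(out)
```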

Python: append kmeans.labels_ to NumPy array

The shapes of my two NumPy arrays are:
(406, 278)
(406,)
However, an error occurs when I append them:
ValueError: all the input arrays must have same number of dimensions
Code:
y = numpy.array(kmeans.labels_, copy=True)
x = numpy.append(x, y, axis=1)  # ValueError
x = numpy.append(x, y, axis=0)  # ValueError
As the error says, you are trying to append a 1D array to a 2D array with an axis argument, and according to the docs:
When axis is specified, values must have the correct shape.
You need to reshape y into a 2D array first. Since y holds one label per row of x (406 values), make it a (406, 1) column and append along axis=1. Both of these work:
np.append(x, y[:, None], axis=1)
np.append(x, y.reshape(-1, 1), axis=1)
According to the NumPy documentation,
When axis is specified, values must have the correct shape.
So if you want to append the vector y = [0 1 2] to the matrix x = [[0, 0],[1, 1],[2, 2]] with axis=1, you first need to turn y into a 2D array and then transpose it into a column:
x = numpy.zeros((406,278))
y = numpy.zeros((406,))
x = numpy.append(x, numpy.transpose([y]), axis=1)
print(x.shape) # gives (406,279)
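A shorter alternative is np.column_stack, which treats 1D inputs as columns, so no manual reshape or transpose is needed. In this sketch labels = np.arange(406) is hypothetical stand-in data for kmeans.labels_.

```python
import numpy as np

x = np.zeros((406, 278))
labels = np.arange(406)  # hypothetical stand-in for kmeans.labels_

# column_stack turns the 1D labels into a (406, 1) column before stacking
with_labels = np.column_stack((x, labels))
# Equivalent explicit form: make labels an (n, 1) column and append along axis 1
with_labels_2 = np.append(x, labels[:, None], axis=1)

print(with_labels.shape)  # (406, 279)
```

Note that the integer labels are upcast to float64 to match x.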

Mapping elements in 3D lower "triangle" to linear structure

This is the 3D version of an existing question.
A 3D array M[x,y,z] of shape (n,n,n) should be mapped to a flat vector containing only the elements with x<=y<=z in order to save space. So what I need is an expression similar to the 2D case (index := x + (y+1)*y/2). I tried to derive some formulas but just can't get it right. Note that the element order inside the vector doesn't matter.
This is an extension of user3386109's answer for mapping an array of arbitrary dimension d with shape (n,...,n) into a vector of size size(d,n) = C(n+d-1, d), containing only the elements whose indices satisfy X_1 <= X_2 <= ... <= X_d.
The 3D version of the equation is
index := (z * (z+1) * (z+2)) / 6 + (y * (y+1))/2 + x
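One way to convince yourself that the 3D formula is correct is to check by brute force that it maps every triple with x <= y <= z to a distinct position in 0 .. C(n+2, 3) - 1. In the sketch below, tri3_index is a hypothetical helper name and n = 5 is an arbitrary choice. The first term counts all triples with a smaller z, the second counts pairs with the same z and a smaller y, and x is the offset within that run.

```python
from itertools import combinations_with_replacement
from math import comb

def tri3_index(x, y, z):
    """Flat index for x <= y <= z, following the 3D formula above."""
    # Integer division is exact: z(z+1)(z+2) is divisible by 6, y(y+1) by 2
    return z * (z + 1) * (z + 2) // 6 + y * (y + 1) // 2 + x

n = 5
# combinations_with_replacement yields sorted tuples, i.e. exactly x <= y <= z
indices = [tri3_index(*t) for t in combinations_with_replacement(range(n), 3)]
size = comb(n + 2, 3)  # number of elements with x <= y <= z
print(size)  # 35
```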
In case anyone is interested, here is letmaik's answer implemented in Python:
import math
from itertools import combinations_with_replacement
import numpy as np
ndim = 3  # The number of dimensions you'd like
size = 4  # The size of each dimension
array = np.full([size] * ndim, -1.0)  # -1 marks entries outside the "triangle"
indexes = combinations_with_replacement(range(size), ndim)  # sorted index tuples
def index(*args):
    acc = []
    for idx, val in enumerate(args):
        # val * (val + 1) * ... * (val + idx) / (idx + 1)!  ==  C(val + idx, idx + 1)
        rx = np.prod([val + i for i in range(idx + 1)])
        acc.append(rx / math.factorial(idx + 1))
    return sum(acc)
for args in indexes:
    array[args] = index(*args)
print(array)
I must confess it could be improved, though, as the order of the elements does not seem natural.
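The same formula can be written with exact integer arithmetic by using math.comb (Python 3.8+) instead of the float product and factorial, and then used to pack an array into the flat vector. This is a sketch: tri_index and pack are hypothetical helper names, and the formula is the d-dimensional sum of C(X_k + k - 1, k) over k = 1..d, which is what the index function above computes.

```python
import numpy as np
from itertools import combinations_with_replacement
from math import comb

def tri_index(*coords):
    """Flat index of a sorted index tuple X_1 <= ... <= X_d:
    sum over k of C(X_k + k - 1, k), with k counted from 1."""
    return sum(comb(x + k - 1, k) for k, x in enumerate(coords, start=1))

def pack(M):
    """Copy the elements of an (n, ..., n) array whose indices are sorted into a flat vector."""
    d, n = M.ndim, M.shape[0]
    flat = np.empty(comb(n + d - 1, d), dtype=M.dtype)
    # Every sorted tuple gets a distinct slot, so all of flat is filled
    for idx in combinations_with_replacement(range(n), d):
        flat[tri_index(*idx)] = M[idx]
    return flat

M = np.arange(27).reshape(3, 3, 3)
flat = pack(M)
print(flat.size)  # 10 elements with x <= y <= z for n = 3
```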

Julia Array Concatenation dimension mismatch

I get a dimension mismatch when using y = [x, a] to concatenate my two arrays:
x = reshape(1:16, 4, 4)
x = mean((x ./ mean(x,1)),2)'
a = zeros(3)
println(x)
y =[x,a]
print(y)
When I try to combine them, I get this error:
mismatch in dimension 2
Both x and a appear to have the same dimensions in the console:
println(x)
[0.7307313376278893 0.9102437792092966 1.0897562207907034 1.2692686623721108]
println(a)
[0.0,0.0,0.0]
But x lies along dimension 2. Is there a way to combine the arrays so that I get a single array along dimension 1, like this?
y = [0.7307313376278893 0.9102437792092966 1.0897562207907034 1.2692686623721108, 0.0,0.0,0.0]
The problem is that by transposing x (putting a ' at the end of the line) you end up with the following:
julia> size(x)
(1,4)
julia> size(a)
(3,)
So when you try y = [x, a], Julia rightly complains that it cannot concatenate them.
There are (at least) two solutions:
1) Don't transpose x:
x = reshape(1:16, 4, 4)
x = mean((x ./ mean(x,1)),2)
a = zeros(3)
println(x)
y =[x,a]
print(y)
2) Also transpose a, and concatenate with a space instead of a comma:
x = reshape(1:16, 4, 4)
x = mean((x ./ mean(x,1)),2)'
a = zeros(3)'
println(x)
y =[x a]
print(y)
In the first case you will have size(y) = (7, 1) and in the second case you will have size(y) = (1,7), so which option you choose will depend on what you want for the size of y.
