How to efficiently generate an array of tuples using numpy [duplicate]

This question already has answers here:
Combinations from range of values for given sizes
(3 answers)
Closed 3 years ago.
I would like to efficiently generate a numpy array of tuples whose size is the product of the dimensions of each axis, using numpy.arange() and only numpy functions. For example, the size of a_list below is max_i*max_j*max_k.
Moreover, the array that I would like to obtain for the example below looks like this: [(0, 0, 0), (0, 0, 1), ..., (0, 0, 14), (0, 1, 0), (0, 1, 1), ..., (9, 4, 14)]
a_list = list()
max_i = 10
max_j = 5
max_k = 15
for i in range(0, max_i):
    for j in range(0, max_j):
        for k in range(0, max_k):
            a_list.append((i, j, k))
The loop above, relying on a list and nested for loops, runs in O(max_i*max_j*max_k) Python-level steps. I would like a vectorized way to generate an equivalent array of tuples in numpy. Is that possible?

I like Divakar's solution in the comments better, but here's another.
What you're describing is a cartesian product. With some help from this post, you can achieve it as follows:
import numpy as np
# Input
max_i, max_j, max_k = (10, 5, 15)
# Build sequence arrays 0, 1, ..., N-1
arr_i = np.arange(max_i)
arr_j = np.arange(max_j)
arr_k = np.arange(max_k)
# Build the cartesian product of the sequence arrays.
# indexing='ij' keeps the (i, j, k) axis order; the default 'xy' swaps the
# first two axes, so the rows would not come out in nested-loop order.
grid = np.meshgrid(arr_i, arr_j, arr_k, indexing='ij')
cartprod = np.stack(grid, axis=-1).reshape(-1, 3)
# Convert to list of tuples
result = list(map(tuple, cartprod))
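A related option is np.indices, which builds the full index grid directly in row-major order. It sidesteps np.meshgrid's default indexing='xy', which swaps the first two axes. This is a minimal sketch and not necessarily the solution Divakar posted in the comments. The names max_i, max_j and max_k follow the question.

```python
import numpy as np

max_i, max_j, max_k = 10, 5, 15

# np.indices returns shape (3, max_i, max_j, max_k); entry [d, i, j, k] holds
# the d-th coordinate of grid point (i, j, k). Reshaping to (3, -1) and
# transposing gives one (i, j, k) row per grid point, in nested-loop order.
cartprod = np.indices((max_i, max_j, max_k)).reshape(3, -1).T

# Optional: a list of plain Python tuples, as in the question
result = list(map(tuple, cartprod.tolist()))
```

If you only need vectorized numpy operations afterwards, keeping cartprod as a 2-D array of shape (max_i*max_j*max_k, 3) is usually cheaper than converting it to tuples.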

Related

Plotting a list vs a list of arrays with matplotlib

Let's say I have two lists, a and b, where b is a list of arrays:
a = [1200, 1400, 1600, 1800]
b = [array([ 1.84714754, 4.94204658, 11.61580355, ..., 17.09772144,
17.09537562, 17.09499705]), array([ 3.08541849, 5.11338795, 10.26957508, ..., 16.90633304,
16.90417909, 16.90458781]), array([ 4.61916789, 4.58351918, 4.37590053, ..., -2.76705271,
-2.46715664, -1.94577492]), array([7.11040853, 7.79529924, 8.48873734, ..., 7.78736448, 8.47749987,
9.36040364])]
The shape of both is said to be (4,)
If I now try to plot these via plt.scatter(a, b)
I get an error I can't relate to: ValueError: setting an array element with a sequence.
In the end I want a plot where, for the n-th value in a, all the values stored in the n-th array of b are plotted.
I'm pretty sure I've done this before, but I can't get it to work.
Any ideas? Thanks!
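You need to repeat each element of a so that it lines up with the values of the corresponding array in b:
import numpy as np
import matplotlib.pyplot as plt
len_b = [len(sub_array) for sub_array in b]
# repeat a[i] once for every value in b[i]
a = [repeat_a for i, repeat_a in enumerate(a) for _ in range(len_b[i])]
# flatten the list of arrays into a single list of values
# (np.ravel does not flatten a list of arrays with different lengths)
b = np.concatenate(b).tolist()
# check that the lengths match
assert len(a) == len(b)
# if they do, this should now work
plt.scatter(a, b)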
I am afraid repetition it is. If all lists in b have the same length, you can use numpy.repeat:
import numpy as np
import matplotlib.pyplot as plt
#fake data
np.random.seed(123)
a = [1200, 1400, 1600, 1800]
b = np.random.randint(1, 100, (4, 11)).tolist()
plt.scatter(np.repeat(a, len(b[0])), b)
plt.show()
If you are not sure and want to be on the safe side, list comprehension it is.
import numpy as np
import matplotlib.pyplot as plt
#fake data
np.random.seed(123)
a = [1200, 1400, 1600, 1800]
b = np.random.randint(1, 100, (4, 11)).tolist()
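# flatten both sides so that arrays of different lengths in b are handled too
x = [x_n for x_n, sub in zip(a, b) for _ in sub]
y = [value for sub in b for value in sub]
plt.scatter(x, y)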
plt.show()
The output is the same.
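np.repeat also accepts one repeat count per element, so the repetition can stay vectorized even when the arrays in b have different lengths. The sketch below uses made-up ragged data (hypothetical values) and only builds the x and y arrays. Plotting is left to plt.scatter(x, y).

```python
import numpy as np

# Hypothetical ragged data: the arrays in b have different lengths
a = [1200, 1400, 1600]
b = [np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
     np.array([6.0, 7.0, 8.0, 9.0]),
     np.array([10.0, 11.0])]

# Repeat each a[n] exactly len(b[n]) times (one repeat count per element)
x = np.repeat(a, [len(sub) for sub in b])
# Flatten the list of 1-D arrays into one 1-D array of y values
y = np.concatenate(b)
```

After this, x and y have the same length, and plt.scatter(x, y) draws every value of b[n] above a[n].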
Referring to the suggestion of #sai I tried
import numpy as np
arr0 = np.array([1, 2, 3, 4, 5])
arr1 = np.array([6, 7, 8, 9])
arr2 = np.array([10, 11])
old_b = [arr0, arr1, arr2]
b = np.ravel(old_b).tolist()
print(len(b))
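This gives me a length of 3 instead of the 11 I expected (older NumPy versions build a 3-element object array here; NumPy 1.24 and later raise a ValueError instead). How can I collapse a list of arrays into a single list?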
edit:
b = np.concatenate(old_b).ravel().tolist()
will lead to the desired result. Thanks all.

How to remove all occurrences of an element from NumPy array? [duplicate]

This question already has answers here:
Deleting certain elements from numpy array using conditional checks
(3 answers)
Remove all occurrences of a value from a list?
(26 answers)
Closed 4 years ago.
The title is pretty self-explanatory: I have a numpy array (let's say of ints) such as
[ 1 2 10 2 12 2 ] and I would like to remove all occurrences of 2, so that the resulting array is [ 1 10 12 ]. Preferably I would like to do this as fast as possible, because I am using relatively large arrays.
NumPy has a function called numpy.delete() but it takes the indexes as an argument, which I do not have.
Edit: The question is indeed different from Deleting certain elements from numpy array using conditional checks, which is I guess a more "general" case. However, the idea of removing occurrences from an array is fundamental enough to merit its own explicit question, so I am keeping the question.
You can use boolean indexing:
import numpy as np
arr = np.array([1, 2, 10, 2, 12, 2])
print(arr[arr != 2])
# [ 1 10 12]
Timing is pretty good:
import numpy as np
from timeit import Timer
arr = np.arange(5000)
# best of 500 runs, each timing 500 calls
print(min(Timer(lambda: arr[arr != 4999]).repeat(500, 500)))
# 0.004942436999999522
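You can use another numpy function, numpy.setdiff1d(ar1, ar2, assume_unique=False), which finds the set difference of two arrays. Note that it returns the sorted unique values of ar1 that are not in ar2, so the original order and any repeated values are not preserved.
import numpy as np
a = np.array([1, 2, 10, 2, 12, 2])
b = np.array([2])
# keep the default assume_unique=False, because a contains duplicates
c = np.setdiff1d(a, b)
print(c)
# [ 1 10 12]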
There are several ways to do this. I suggest you use a mask:
import numpy as np
a = np.array([ 1, 2 ,10, 2, 12, 2 ])
a[~np.isin(a, 2)]
>> array([ 1, 10, 12])
np.isin is convenient because you can apply the filter to multiple elements at once if you need to:
a[~np.isin(a, (1,2))]
>> array([10, 12])
Also note that boolean-mask indexing such as a[mask] always returns a new array (a copy), not a view of the original, so a itself is left untouched and no extra .copy() is needed:
b = a[~np.isin(a, (1,2))]
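The three approaches above behave differently once the array contains repeated values other than the one being removed. The sketch below uses a hypothetical array with an extra trailing 1 to make that visible. The mask-based versions keep order and duplicates. np.setdiff1d returns sorted unique values.

```python
import numpy as np

arr = np.array([1, 2, 10, 2, 12, 2, 1])

# Boolean mask: keeps the original order and repeated values
masked = arr[arr != 2]

# np.isin with ~: same behaviour, but can drop several values at once
masked_many = arr[~np.isin(arr, [1, 2])]

# np.setdiff1d: sorted unique values of arr that are not in [2]
diffed = np.setdiff1d(arr, [2])
```

For large arrays where order matters, the mask versions are the safer choice. np.setdiff1d is convenient when a sorted, de-duplicated result is what you want anyway.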

Find common elements in subarrays of arrays

I have two numpy arrays, arr1 with shape (~140000, 3) and arr2 with shape (~450000, 10). The first 3 elements of each row, in both arrays, are coordinates (z, y, x). I want to find the rows of arr2 that have the same coordinates as a row of arr1 (arr1 can be considered a subset of arr2).
For example:
arr1 = [[1,2,3],[1,2,5],[1,7,8],[5,6,7]]
arr2 = [[1,2,3,7,66,4,3,44,8,9],[1,3,9,6,7,8,3,4,5,2],[1,5,8,68,7,8,13,4,53,2],[5,6,7,6,67,8,63,4,5,20], ...]
I want to find common coordinates (same first 3 elements):
list_arr = [[1,2,3,7,66,4,3,44,8,9], [5,6,7,6,67,8,63,4,5,20], ...]
At the moment I'm doing this double loop, which is extremely slow:
list_arr = []
for i in arr1:
    for j in arr2:
        if i[0] == j[0] and i[1] == j[1] and i[2] == j[2]:
            list_arr.append(j)
I also tried to create (after the first loop) a subarray of arr2 filtered on the value of i[0] (arr2_filt = [el for el in arr2 if el[0]==i[0]]). This speeds things up a bit, but it is still really slow.
Can you help me with this?
Approach #1
Here's a vectorized one with views -
import numpy as np
# https://stackoverflow.com/a/45313353/ #Divakar
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    # view each row as one opaque "void" scalar so whole rows compare at once
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()
a, b = view1D(arr1, arr2[:, :3])
# np.isin replaces np.in1d, which is deprecated since NumPy 2.0
out = arr2[np.isin(b, a)]
Approach #2
Another with dimensionality-reduction for ints (assumes non-negative integer coordinates) -
d = np.maximum(arr2[:,:3].max(0),arr1.max(0)) + 1  # radix per column: max + 1
s = np.r_[1,d[:-1].cumprod()]
a,b = arr1.dot(s),arr2[:,:3].dot(s)
out = arr2[np.isin(b,a)]
Improvement #1
We could use np.searchsorted to replace the membership test (np.in1d / np.isin) in both of the approaches listed earlier -
unq_a = np.unique(a)
idx = np.searchsorted(unq_a,b)
idx[idx==len(unq_a)] = 0  # clamp out-of-range positions to a valid index
out = arr2[unq_a[idx] == b]
Improvement #2
To avoid the np.unique call used in the last improvement, we could use argsort instead -
sidx = a.argsort()
idx = np.searchsorted(a,b,sorter=sidx)
idx[idx==len(a)] = 0
out = arr2[a[sidx[idx]]==b]
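Putting the dimensionality reduction of Approach #2 together with the argsort/searchsorted lookup of Improvement #2, here is a self-contained sketch on the example data from the question. The helper name match_rows is hypothetical. It assumes non-negative integer coordinates. Each column's radix is its max + 1, because with max alone two different rows can map to the same code. It also assumes the product of the radices fits in an int64.

```python
import numpy as np

# Example data from the question (integer coordinates in the first 3 columns)
arr1 = np.array([[1, 2, 3], [1, 2, 5], [1, 7, 8], [5, 6, 7]])
arr2 = np.array([[1, 2, 3, 7, 66, 4, 3, 44, 8, 9],
                 [1, 3, 9, 6, 7, 8, 3, 4, 5, 2],
                 [1, 5, 8, 68, 7, 8, 13, 4, 53, 2],
                 [5, 6, 7, 6, 67, 8, 63, 4, 5, 20]])

def match_rows(arr1, arr2):
    """Rows of arr2 whose first 3 columns equal some row of arr1 (hypothetical helper)."""
    coords2 = arr2[:, :3]
    # Mixed-radix encoding: radix per column is max + 1, so distinct rows get distinct codes
    d = np.maximum(coords2.max(0), arr1.max(0)) + 1
    s = np.r_[1, d[:-1].cumprod()]
    a, b = arr1.dot(s), coords2.dot(s)
    # Binary-search each code of arr2 among the sorted codes of arr1
    sidx = a.argsort()
    idx = np.searchsorted(a, b, sorter=sidx)
    idx[idx == len(a)] = 0  # out-of-range positions; they fail the equality check below
    return arr2[a[sidx[idx]] == b]

out = match_rows(arr1, arr2)
```

Both the sort and the binary search run in O(n log n), instead of the O(len(arr1) * len(arr2)) of the double loop.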
You can do it with the help of sets:
import numpy as np
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2 = np.array([[7,8,9,11,14,34],[23,12,11,10,12,13],[1,2,3,4,5,6]])
# create array from arr2 with only first 3 columns
temp = [i[:3] for i in arr2]
aset = set([tuple(x) for x in arr])
bset = set([tuple(x) for x in temp])
# common coordinates (the order of set elements is arbitrary)
np.array([x for x in aset & bset])
Output
array([[7, 8, 9],
[1, 2, 3]])
Edit
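Use a list comprehension. A membership test like i[:3] in arr only checks whether any single element of arr matches, so compare whole rows instead:
l = [i.tolist() for i in arr2 if (arr == i[:3]).all(axis=1).any()]
print(l)
Output:
[[7, 8, 9, 11, 14, 34], [1, 2, 3, 4, 5, 6]]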
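The set idea can also return the full rows of arr2 directly. Hash the coordinates of arr once, then build a boolean mask over the rows of arr2. This is a minimal sketch that reuses the small arrays from this answer. It costs one Python-level set lookup per row of arr2, which is linear overall but slower per element than the vectorized approaches.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2 = np.array([[7, 8, 9, 11, 14, 34],
                 [23, 12, 11, 10, 12, 13],
                 [1, 2, 3, 4, 5, 6]])

# Hash the reference coordinates once; set lookups are O(1) on average
coords = set(map(tuple, arr.tolist()))

# True for each row of arr2 whose first three columns appear in coords
mask = np.array([tuple(row) in coords for row in arr2[:, :3].tolist()], dtype=bool)

rows = arr2[mask]  # full rows, in arr2's original order
```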
For integers Divakar already gave an excellent answer. If you want to compare floats you have to consider e.g. the following:
1.+1e-15==1.
False
1.+1e-16==1.
True
If this behaviour could lead to problems in your code, I would recommend performing a nearest-neighbour search and checking whether the distances are within a specified threshold.
import numpy as np
from scipy import spatial
def get_indices_of_nearest_neighbours(arr1, arr2):
    tree = spatial.cKDTree(arr2[:, 0:3])
    # You can check here if the distance is small enough and otherwise raise an error
    dist, ind = tree.query(arr1, k=1)
    return ind
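For small inputs, a brute-force tolerance check with np.isclose and broadcasting expresses the same threshold idea without building a tree. It allocates an intermediate array of shape (len(arr2), len(arr1), 3), so treat it as a sketch for modest sizes only. The helper name rows_close_to_any, the tolerance atol=1e-9 and the sample data are all hypothetical choices.

```python
import numpy as np

def rows_close_to_any(arr1, arr2, atol=1e-9):
    """Rows of arr2 whose first 3 columns match some row of arr1 within atol
    (hypothetical helper; brute force via broadcasting)."""
    coords2 = arr2[:, None, :3]   # shape (n2, 1, 3)
    coords1 = arr1[None, :, :]    # shape (1, n1, 3)
    # Absolute-tolerance comparison per coordinate, then all 3 must match
    close = np.isclose(coords2, coords1, rtol=0.0, atol=atol).all(axis=2)
    return arr2[close.any(axis=1)]

arr1 = np.array([[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]])
arr2 = np.array([[1.0 + 1e-15, 2.0, 3.0, 9.0],
                 [1.0, 3.0, 9.0, 6.0],
                 [5.0, 6.0, 7.0, 20.0]])
out = rows_close_to_any(arr1, arr2)
```

The first row matches even though 1.0 + 1e-15 == 1.0 is False, because the difference is within atol. For large arrays, the KD-tree query above scales much better.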

Get the maximum N elements (along with their indices) of an Array

I've got an array of integers like the one shown below:
val my_array = Array(10, 20, 6, 31, 0, 2, -2)
I need to get the maximum 3 elements of this array along with their corresponding indices (either using a single function or two separate funcs).
For example, the output might be something like:
// max values
Array(31, 20, 10)
// max indices
Array(3, 1, 0)
Although the operations look simple, I was not able to find any relevant functions around.
Here's a straightforward way - zipWithIndex followed by sorting:
val (values, indices) = my_array
.zipWithIndex // add indices
.sortBy(t => -t._1) // sort by values (descending)
.take(3) // take first 3
.unzip // "unzip" the array-of-tuples into tuple-of-arrays
Here's another way to do it:
(my_array zip Stream.from(0)).
sortWith(_._1 > _._1).
take(3)
res1: Array[(Int, Int)] = Array((31,3), (20,1), (10,0))

Looping through slices of Theano tensor

I have two 2D Theano tensors, call them x_1 and x_2, and suppose for the sake of example, both x_1 and x_2 have shape (1, 50). Now, to compute their mean squared error, I simply run:
T.sqr(x_1 - x_2).mean(axis = -1).
However, what I wanted to do was construct a new tensor that consists of their mean squared error in chunks of 10. In other words, since I'm more familiar with NumPy, what I had in mind was to create the following tensor M in Theano:
M = [theano.tensor.sqr(x_1[:, i:i+10] - x_2[:, i:i+10]).mean(axis = -1) for i in xrange(0, 50, 10)]
Now, since Theano doesn't have for loops but uses scan instead (of which map is a special case), I thought I would try the following:
sequence = T.arange(0, 50, 10)
M = theano.map(lambda i: theano.tensor.sqr(x_1[:, i:i+10] - x_2[:, i:i+10]).mean(axis = -1), sequence)
However, this does not seem to work, as I get the error:
only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
Is there a way to loop through the slices using theano.scan (or map)? Thanks in advance, as I'm new to Theano!
Similar to what can be done in numpy, a solution would be to reshape the (1, 50) tensor of squared differences into a (1, 5, 10) tensor (or even a (5, 10) tensor), i.e. five consecutive chunks of ten, and then compute the mean along the last axis.
To illustrate this with numpy, suppose I want to compute means by slices of 2:
import numpy as np
x = np.array([0, 2, 0, 4, 0, 6])
x = x.reshape([3, 2])
np.mean(x, axis=1)
outputs
array([1., 2., 3.])
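Applied to the question's setting, the chunked mean squared error becomes a single reshape followed by a mean. The sketch below is a plain NumPy stand-in with random (1, 50) inputs, checked against the list comprehension from the question. Theano tensors also offer reshape and mean, so T.sqr(x_1 - x_2).reshape((1, 5, 10)).mean(axis=-1) should be the analogous expression, but treat that Theano line as untested.

```python
import numpy as np

rng = np.random.default_rng(0)
x_1 = rng.normal(size=(1, 50))
x_2 = rng.normal(size=(1, 50))

# Split the 50 columns into 5 consecutive chunks of 10 -> shape (1, 5, 10),
# then average the squared error within each chunk (last axis) -> shape (1, 5)
M = np.square(x_1 - x_2).reshape(1, 5, 10).mean(axis=-1)

# Reference: the loop-based version from the question, stacked to shape (1, 5)
M_loop = np.stack([np.square(x_1[:, i:i+10] - x_2[:, i:i+10]).mean(axis=-1)
                   for i in range(0, 50, 10)], axis=-1)
```

No loop or scan is needed. The reshape only reinterprets the row-major layout, so each group of 10 consecutive columns becomes one chunk.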
