Suppose I have a numpy array

a = np.array([[1, 2, 3, 4],
              [3, 4, 5, 6],
              [2, 3, 4, 4],
              [3, 3, 1, 2]])
I want to delete the submatrix [[3, 4], [3, 1]]. I can do it as follows:

mask = np.ones(a.shape, dtype=bool)
mask[2:, 1:-1] = False
a_new = a[mask, ...]
print(a_new)  # output: array([1, 2, 3, 4, 3, 4, 5, 6, 2, 4, 3, 2])
However, I want the output to be

np.array([[1, 2, 3, 4],
          [3, 4, 5, 6],
          [2, 4, 0, 0],
          [3, 2, 0, 0]])

I just want numpy to remove the submatrix and shift the other elements left, filling the empty places with 0. How can I do this?
I cannot find a function that does exactly what you ask, but combining np.roll with a mask in the routine below produces your output. Perhaps there is a more elegant way:
a = np.array([[1, 2, 3, 4],
              [3, 4, 5, 6],
              [2, 3, 4, 4],
              [3, 3, 1, 2]])
mask = np.ones(a.shape, dtype=bool)
mask[2:, 1:-1] = False
mask2 = mask.copy()
mask2[2:, 1:] = False
n = 2  # shift length
a[~mask2] = np.roll((a * mask)[~mask2], -n)
a
array([[1, 2, 3, 4],
       [3, 4, 5, 6],
       [2, 4, 0, 0],
       [3, 2, 0, 0]])
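As a possibly more compact alternative (a sketch of my own, not part of the original answer): build the same mask, then left-justify the surviving elements of each row with a second boolean mask that marks the leftmost slots, and leave zeros elsewhere:

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [3, 4, 5, 6],
              [2, 3, 4, 4],
              [3, 3, 1, 2]])
mask = np.ones(a.shape, dtype=bool)
mask[2:, 1:-1] = False          # elements of the submatrix to drop

out = np.zeros_like(a)
counts = mask.sum(axis=1)       # surviving elements per row
left = np.arange(a.shape[1]) < counts[:, None]  # leftmost slots per row
out[left] = a[mask]             # both index in row-major order, so rows align
print(out)
# [[1 2 3 4]
#  [3 4 5 6]
#  [2 4 0 0]
#  [3 2 0 0]]
```

This works because boolean indexing extracts and assigns elements in row-major order, so the kept elements of each row land in that row's leftmost slots.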
You can simply set those entries to zero (note that this zeroes the block in place; it does not shift the remaining elements left):
a = np.array([[1, 2, 3, 4],
              [3, 4, 5, 6],
              [2, 3, 4, 4],
              [3, 3, 1, 2]])
a[2:, 2:] = 0
which returns

array([[1, 2, 3, 4],
       [3, 4, 5, 6],
       [2, 3, 0, 0],
       [3, 3, 0, 0]])
I have a very large array but here I will show a simplified case:
a = np.array([[3, 0, 5, 0], [8, 7, 6, 10], [5, 4, 0, 10]])
array([[ 3,  0,  5,  0],
       [ 8,  7,  6, 10],
       [ 5,  4,  0, 10]])
I want to argsort() the array but have a way to distinguish 0s. I tried to replace it with NaN:
a = np.array([[3, np.nan, 5, np.nan], [8, 7, 6, 10], [5, 4, np.nan, 10]])
a.argsort()
array([[0, 2, 1, 3],
       [2, 1, 0, 3],
       [1, 0, 3, 2]])
But the NaNs are still being sorted. Is there any way to make argsort give them a value of -1 or something? Or is there another option besides NaN to replace the 0s with? I tried math.inf with no success as well. Does anybody have any ideas?
The purpose of doing this is that I have a cosine similarity matrix, and I want to exclude those instances where similarities are 0. I am using argsort() to get the highest similarities, which will give me the indices to another table with mappings to labels. If an array's entire similarity is 0 ([0,0,0]), then I want to ignore it. So if I can get argsort() to output it as [-1,-1,-1] after sorting, I can check to see if the entire array is -1 and exclude it.
EDIT:
So output should be:
array([[ 0,  2, -1, -1],
       [ 2,  1,  0,  3],
       [ 1,  0,  3, -1]])
So, using the last row to refer back to a: the smallest will be a[1], which is 4, followed by a[0], which is 5, then a[3], which is 10, and last -1, which stands for the 0.
You may want to use numpy.ma.array() like this:

a = np.array([[3, 4, 5], [8, 7, 6], [5, 4, 0]])

Mask this array with the condition a == 0:

a_mask = np.ma.array(a, mask=(a == 0))
print(a_mask)
# output
masked_array(
  data=[[3, 4, 5],
        [8, 7, 6],
        [5, 4, --]],
  mask=[[False, False, False],
        [False, False, False],
        [False, False,  True]],
  fill_value=999999)

print(a_mask.mask)
# output
array([[False, False, False],
       [False, False, False],
       [False, False,  True]])
You can then use the mask attribute of the masked_array to distinguish the elements you want to label and fill in other values.
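Building on that idea (a sketch of my own, not from the original answer): the endwith argument of MaskedArray.argsort pushes masked entries to the end of each row, after which the trailing slots can be overwritten with -1 to get the exact output requested in the question:

```python
import numpy as np

a = np.array([[3, 0, 5, 0], [8, 7, 6, 10], [5, 4, 0, 10]])
am = np.ma.array(a, mask=(a == 0))

# masked (zero) entries sort to the end of each row
idx = np.asarray(am.argsort(axis=1, endwith=True))

# replace the trailing slots that came from masked values with -1
n_masked = am.mask.sum(axis=1, keepdims=True)
cols = np.arange(a.shape[1])
idx[cols >= a.shape[1] - n_masked] = -1
print(idx)
# [[ 0  2 -1 -1]
#  [ 2  1  0  3]
#  [ 1  0  3 -1]]
```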
If by "distinguish 0s" you mean treating them as the highest or the lowest values, I would suggest trying:
a[a==0]=(a.max()+1)
or:
a[a==0]=(a.min()-1)
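For example (a quick sketch of the max() + 1 variant; kind="stable" is my addition so that tied replacement values keep their original order), the zeros then argsort last in each row:

```python
import numpy as np

a = np.array([[3, 0, 5, 0], [8, 7, 6, 10], [5, 4, 0, 10]])
b = a.copy()
b[b == 0] = b.max() + 1    # zeros become 11, larger than every real value
idx = b.argsort(kind="stable")
print(idx)
# [[0 2 1 3]
#  [2 1 0 3]
#  [1 0 3 2]]
```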
One way to achieve the task is to first generate a boolean mask checking for zero values (since you want to distinguish these in the array), then argsort the array, and finally use the boolean mask to set the desired value (e.g., -1):
# your unmodified input array
In [294]: a
Out[294]:
array([[3, 4, 5],
       [8, 7, 6],
       [5, 4, 0]])
# boolean mask checking for zero
In [295]: zero_bool_mask = a == 0
In [296]: zero_bool_mask
Out[296]:
array([[False, False, False],
       [False, False, False],
       [False, False,  True]])
# usual argsort
In [297]: sorted_idxs = np.argsort(a)
In [298]: sorted_idxs
Out[298]:
array([[0, 1, 2],
       [2, 1, 0],
       [2, 1, 0]])
# replace the indices of 0 with desired value (e.g., -1)
In [299]: sorted_idxs[zero_bool_mask] = -1
In [300]: sorted_idxs
Out[300]:
array([[ 0,  1,  2],
       [ 2,  1,  0],
       [ 2,  1, -1]])
Following this, to account for the correct sorting indices after the substitution value (e.g., -1), we have to perform this final step:
In [327]: sorted_idxs - (sorted_idxs == -1).sum(1)[:, None]
Out[327]:
array([[ 0,  1,  2],
       [ 2,  1,  0],
       [ 1,  0, -2]])
So now the sorted_idxs with negative values are the locations where you had zeros in the original array.
Thus, we can have a custom function like so:
def argsort_excluding_zeros(arr, replacement_value):
    zero_bool_mask = arr == 0
    sorted_idxs = np.argsort(arr)
    sorted_idxs[zero_bool_mask] = replacement_value
    return sorted_idxs - (sorted_idxs == replacement_value).sum(1)[:, None]
# another array
In [339]: a
Out[339]:
array([[0, 4, 5],
       [8, 7, 6],
       [5, 4, 0]])
# sample run
In [340]: argsort_excluding_zeros(a, replacement_value=-1)
Out[340]:
array([[-2,  0,  1],
       [ 2,  1,  0],
       [ 1,  0, -2]])
Using #kmario23's and #ScienceSnake's code, I came up with this solution:
a = np.array([[3, 0, 5, 0], [8, 7, 6, 10], [5, 4, 0, 10]])
b = np.where(a == 0, np.inf, a) # Replace 0 -> inf to make them sorted last
s = b.copy() # make a copy of b to sort it
s.sort()
mask = s == np.inf # create a mask to get inf locations after sorting
c = b.argsort()
d = np.where(mask, -1, c) # Replace where the zeros were originally with -1
Out:
array([[ 0,  2, -1, -1],
       [ 2,  1,  0,  3],
       [ 1,  0,  3, -1]])
Not the most efficient solution, because it sorts twice.
There might be a slightly more efficient alternative, but this works in pure numpy and is very transparent.
import numpy as np
a = np.array([[3, 0, 5, 0], [8, 7, 6, 10], [5, 4, 0, 10]])
b = np.where(a == 0, np.inf, a) # Replace 0 -> inf to make them sorted last
c = b.argsort()
d = np.where(a == 0, -1, c) # Replace where the zeros were originally with -1
print(d)
outputs
[[ 0 -1  1 -1]
 [ 2  1  0  3]
 [ 1  0 -1  2]]
To save memory, some of the in-between assignments can be skipped, but I left it this way for clarity.
*** EDIT ***
The OP has clarified exactly what output they want. This is my new solution which has only one sort.
a = np.array([[3, 0, 5, 0], [8, 7, 6, 10], [5, 4, 0, 10]])
b = np.where(a == 0, np.inf, a).argsort()
def remove_invalid_entries(row, num_valid):
    row[num_valid.pop():] = -1
    return row
num_valid = np.flip(np.count_nonzero(a, 1)).tolist()
b = np.apply_along_axis(remove_invalid_entries, 1, b, num_valid)
print(b)
[[ 0  2 -1 -1]
 [ 2  1  0  3]
 [ 1  0  3 -1]]
The start is as before. Then we go through the argsorted list row by row and replace the last n elements with -1, where n is the number of 0s in the corresponding row of the original array. The fastest way of doing this is with np.apply_along_axis. Here I counted the zeros in each row of a and turned the counts into a list (in reversed order) so that pop() yields the number of elements to keep in the current row of b being iterated over by np.apply_along_axis.
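The same idea can be written without apply_along_axis (a fully vectorized sketch of my own, not part of the original answer): compare each column position against the per-row count of nonzero entries, and blank the trailing slots in one np.where:

```python
import numpy as np

a = np.array([[3, 0, 5, 0], [8, 7, 6, 10], [5, 4, 0, 10]])
order = np.where(a == 0, np.inf, a).argsort(axis=1)  # zeros sort last
valid = np.count_nonzero(a, axis=1)[:, None]         # nonzeros per row
out = np.where(np.arange(a.shape[1]) < valid, order, -1)
print(out)
# [[ 0  2 -1 -1]
#  [ 2  1  0  3]
#  [ 1  0  3 -1]]
```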
We are given an array sample a, shown below, and a constant c.
import numpy as np
a = np.array([[1, 3, 1, 11, 9, 14],
              [2, 12, 1, 10, 7, 6],
              [6, 7, 2, 14, 2, 15],
              [14, 8, 1, 3, -7, 2],
              [0, -3, 0, 3, -3, 0],
              [2, 2, 3, 3, 12, 13],
              [3, 14, 4, 12, 1, 4],
              [0, 13, 13, 4, 0, 3]])
c = 2
It is convenient, in this problem, to think of each array row as being composed of three pairs, so the 1st row is [1,3, 1,11, 9,14].
DEFINITION: d_min is the minimum difference between the elements of two consecutive pairs.
The PROBLEM: I want to retain rows of array a, where all consecutive pairs have d_min <= c. Otherwise, the rows should be eliminated.
In the 1st array row, the 1st pair (1,3) and the 2nd pair (1,11) have d_min = |1-1| = 0.
The 2nd pair (1,11) and the 3rd pair (9,14) have d_min = |11-9| = 2. (In both cases d_min <= c, so we keep this row in a.)
In the 2nd array row, the 1st pair (2,12) and the 2nd pair (1,10) have d_min = |2-1| = 1.
But the 2nd pair (1,10) and the 3rd pair (7,6) have d_min = |10-7| = 3. (3 > c, so this row should be eliminated from array a.)
Current efforts: I currently handle this problem with nested for-loops (2 deep).
The outer loop runs through the rows of array a, determining d_min between the first two pairs using:

for r in a:
    d_min = np.amin(np.abs(np.subtract.outer(r[:2], r[2:4])))
The inner loop uses the same method to determine the d_min between the last two pairs.
Further processing is done only when d_min <= c for both sets of consecutive pairs.
I'm really hoping there is a way to avoid the for-loops. I eventually need to deal with 8-column arrays, and my current approach would involve 3-deep looping.
In the example, there are 4 row eliminations. The final result should look like:
a = np.array([[1, 3, 1, 11, 9, 14],
              [0, -3, 0, 3, -3, 0],
              [3, 14, 4, 12, 1, 4],
              [0, 13, 13, 4, 0, 3]])
Assume the number of elements in each row is always even:
import numpy as np
a = np.array([[1, 3, 1, 11, 9, 14],
              [2, 12, 1, 10, 7, 6],
              [6, 7, 2, 14, 2, 15],
              [14, 8, 1, 3, -7, 2],
              [0, -3, 0, 3, -3, 0],
              [2, 2, 3, 3, 12, 13],
              [3, 14, 4, 12, 1, 4],
              [0, 13, 13, 4, 0, 3]])
c = 2
# separate the array as previous pairs and next pairs
sx, sy = a.shape
prev_shape = sx, (sy - 2) // 2, 1, 2
next_shape = sx, (sy - 2) // 2, 2, 1
prev_pairs = a[:, :-2].reshape(prev_shape)
next_pairs = a[:, 2:].reshape(next_shape)
# subtract which will effectively work as outer subtraction due to numpy broadcasting, and
# calculate the minimum difference for each pair
pair_diff_min = np.abs(prev_pairs - next_pairs).min(axis=(2, 3))
# calculate the filter condition as boolean array
to_keep = pair_diff_min.max(axis=1) <= c
print(a[to_keep])
# [[ 1  3  1 11  9 14]
#  [ 0 -3  0  3 -3  0]
#  [ 3 14  4 12  1  4]
#  [ 0 13 13  4  0  3]]
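Since the question mentions eventually handling 8-column arrays, the same broadcasting trick can be wrapped in a small helper that works for any even row width (a sketch based on the answer above; the function name is my own):

```python
import numpy as np

def keep_close_pair_rows(a, c):
    """Keep rows whose consecutive (x, y) pairs all have d_min <= c."""
    sx, sy = a.shape
    # pair k of each row vs. pair k+1, via a broadcasted outer subtraction
    prev_pairs = a[:, :-2].reshape(sx, (sy - 2) // 2, 1, 2)
    next_pairs = a[:, 2:].reshape(sx, (sy - 2) // 2, 2, 1)
    d_min = np.abs(prev_pairs - next_pairs).min(axis=(2, 3))
    return a[d_min.max(axis=1) <= c]

a = np.array([[1, 3, 1, 11, 9, 14],
              [2, 12, 1, 10, 7, 6],
              [6, 7, 2, 14, 2, 15],
              [14, 8, 1, 3, -7, 2],
              [0, -3, 0, 3, -3, 0],
              [2, 2, 3, 3, 12, 13],
              [3, 14, 4, 12, 1, 4],
              [0, 13, 13, 4, 0, 3]])
result = keep_close_pair_rows(a, 2)
print(result)
# [[ 1  3  1 11  9 14]
#  [ 0 -3  0  3 -3  0]
#  [ 3 14  4 12  1  4]
#  [ 0 13 13  4  0  3]]
```

For an 8-column array the two reshapes automatically yield three pair comparisons per row instead of two, so no extra loop depth is needed.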
I am in the following situation. I have:
a, a multidimensional numpy array of n dimensions
t, an array of k rows (tuples), each with n elements; in other words, each row in this array is an index into a
What I want: from a, return an array b with k scalar elements, the ith element in b being the result of indexing a with the ith tuple from t.
Seems trivial enough. The following approach, however, does not work
def get(a, t):
    # wrong result + takes way too long
    return a[t]
I have to resort to doing this iteratively i.e. the following works correctly:
def get(a, t):
    res = []
    for ind in t:
        a_scalar = a
        for i in ind:
            a_scalar = a_scalar[i]
        # a_scalar is now a scalar
        res.append(a_scalar)
    return res
This works, except that since each dimension in a has over 30 elements, the procedure gets really slow once n exceeds 5. I understand it would be slow regardless; however, I would like to exploit numpy's capabilities, as I believe that would speed up this process considerably.
The key to getting this right is to understand the roles of indexing lists and tuples. Often the two are treated the same, but in numpy indexing, tuples, lists and arrays convey different information.
In [1]: a = np.arange(12).reshape(3,4)
In [2]: t = np.array([(0,0),(1,1),(2,2)])
In [4]: a
Out[4]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [5]: t
Out[5]:
array([[0, 0],
       [1, 1],
       [2, 2]])
You tried:
In [6]: a[t]
Out[6]:
array([[[ 0,  1,  2,  3],
        [ 0,  1,  2,  3]],

       [[ 4,  5,  6,  7],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [ 8,  9, 10, 11]]])
So what's wrong with it? It ran, but selected a (3,2) array of rows of a. That is, it applied t to just the first dimension, effectively a[t, :]. You want to index on all dimensions, some sort of a[t1, t2]. That's the same as a[(t1,t2)] - a tuple of indices.
In [10]: a[tuple(t[0])] # a[(0,0)]
Out[10]: 0
In [11]: a[tuple(t[1])] # a[(1,1)]
Out[11]: 5
In [12]: a[tuple(t[2])]
Out[12]: 10
or doing all at once:
In [13]: a[(t[:,0], t[:,1])]
Out[13]: array([ 0, 5, 10])
Another way to write it, is n lists (or arrays), one for each dimension:
In [14]: a[[0,1,2],[0,1,2]]
Out[14]: array([ 0, 5, 10])
In [18]: tuple(t.T)
Out[18]: (array([0, 1, 2]), array([0, 1, 2]))
In [19]: a[tuple(t.T)]
Out[19]: array([ 0, 5, 10])
More generally, in a[idx1, idx2] array idx1 is broadcast against idx2 to produce a full selection array. Here the 2 arrays are 1d and match, the selection is your t set of pairs. But the same principle applies to selecting a set of rows and columns, a[ [[0],[2]], [0,2,3] ].
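To illustrate that last point (a small sketch of my own, not from the original answer): the column index list broadcasts against the (2, 1) row index array, selecting a 2x3 block:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
# rows 0 and 2 crossed with columns 0, 2, 3
sel = a[[[0], [2]], [0, 2, 3]]
print(sel)
# [[ 0  2  3]
#  [ 8 10 11]]
```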
Using the ideas in [10] and following, your get could be sped up with:
In [20]: def get(a, t):
...: res = []
...: for ind in t:
...: res.append(a[tuple(ind)]) # index all dimensions at once
...: return res
...:
In [21]: get(a,t)
Out[21]: [0, 5, 10]
If t really was a list of tuples (as opposed to an array built from them), your get could be:
In [23]: tl = [(0,0),(1,1),(2,2)]
In [24]: [a[ind] for ind in tl]
Out[24]: [0, 5, 10]
Explore using np.ravel_multi_index.
Create some test data:
arr = np.arange(10**4)
arr.shape = 10, 10, 10, 10
t = []
for j in range(5):
    t.append(tuple(np.random.randint(10, size=4)))
print(t)
# [(1, 8, 2, 0),
#  (2, 3, 3, 6),
#  (1, 4, 8, 5),
#  (2, 2, 6, 3),
#  (0, 5, 0, 2)]
ta = np.array(t).T
print(ta)
# array([[1, 2, 1, 2, 0],
#        [8, 3, 4, 2, 5],
#        [2, 3, 8, 6, 0],
#        [0, 6, 5, 3, 2]])
arr.ravel()[np.ravel_multi_index(tuple(ta), (10, 10, 10, 10))]
# array([1820, 2336, 1485, 2263,  502])
np.ravel_multi_index basically computes, from the tuple of index arrays, the indices into the flattened version of an array of the given shape (in this case (10, 10, 10, 10)).
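As a quick sanity check (a sketch of my own, not from the answer), the flat lookup agrees with direct tuple indexing:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = np.arange(10**4).reshape(10, 10, 10, 10)
t = rng.integers(10, size=(5, 4))     # five 4-d indices, one per row

flat_idx = np.ravel_multi_index(tuple(t.T), arr.shape)
# flattened lookup and direct multi-dimensional indexing give the same values
assert (arr.ravel()[flat_idx] == arr[tuple(t.T)]).all()
print("flat lookup matches tuple indexing")
```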
Does this do what you need? Is it fast enough?
I am trying to find the common elements in two arrays.
pairs = Array.new
a = exchange_one.get_symbols
b = exchange_two.get_symbols
c = a+b
c.uniq{|pair| pairs << pair}
I am combining the two arrays using +
Then I am calling uniq to remove the duplicates, passing it a block so that the found duplicates can be added to an array before they are deleted.
For some reason the pairs array ends up being the entire c array.
What is the correct way to find the common elements of two arrays?
If your goal is simply to determine which elements are the same between two arrays, you can use the intersection operator Array#&.
a = exchange_one.get_symbols
b = exchange_two.get_symbols
intersection = a & b
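For instance (a small sketch with made-up symbol lists, since exchange_one / exchange_two are not shown):

```ruby
# hypothetical symbol lists standing in for the exchange data
a = [:btc_usd, :eth_usd, :ltc_usd]
b = [:eth_usd, :xrp_usd, :btc_usd]

intersection = a & b   # order follows the first array; duplicates removed
p intersection         # => [:btc_usd, :eth_usd]
```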
First understand what you are doing and what you want.
For example:
a = 15.times.map { rand 6 }
#=> [1, 0, 5, 3, 1, 3, 4, 1, 3, 2, 1, 2, 4, 2, 3]
b = 15.times.map { rand 6 }
#=> [3, 3, 3, 1, 3, 1, 3, 1, 5, 1, 4, 2, 0, 0, 4]
Now, what you are doing:
c = a + b
#=> [1, 0, 5, 3, 1, 3, 4, 1, 3, 2, 1, 2, 4, 2, 3, 3, 3, 3, 1, 3, 1, 3, 1, 5, 1, 4, 2, 0, 0, 4]
c only combines the arrays irrespective of content, hence you get all the values.
Now
pairs = Array.new
c.uniq{|pair| pairs << pair}
Here uniq just acts as an iterator: it yields every value of c to the block, and each of those values is inserted into the pairs array.
Check this:
c.uniq{|pair| puts pair}
That's why you are getting all the values in the pairs array.
The best way to find the common elements of two arrays is (a & b). You could change your code as follows:
pairs = (arr1 + arr2).uniq # unique values of both arrays combined, not the intersection
OR
pairs = arr1 & arr2 # best and most efficient way
Suppose:
arr1 = 15.times.map { rand 6 }
#=> [1, 0, 4, 0, 2, 3, 1, 0, 2, 4, 4, 1, 3, 1, 1]
arr2 = 15.times.map { rand 6 }
#=> [5, 5, 4, 1, 5, 1, 5, 0, 4, 0, 2, 0, 4, 5, 0]
arr1 contains 5 1s and arr2 contains 2 1s. If by "common elements" you wish to report that both arrays contain [5, 2].min #=> 2 1s, and similar counts for the other elements that appear in either array, you can do the following, using this helper:

def count(arr)
  arr.each_with_object(Hash.new(0)) { |n,h| h[n] += 1 }
end

h1 = count(arr1)
#=> {1=>5, 0=>3, 4=>3, 2=>2, 3=>2}
h2 = count(arr2)
#=> {5=>5, 4=>3, 1=>2, 0=>4, 2=>1}
(h1.keys | h2.keys).each_with_object({}) { |k,h| h[k] = [h1[k], h2[k]].min }
#=> {1=>2, 0=>3, 4=>3, 2=>1, 3=>0, 5=>0}
How do I find columns in a numpy array that are all-zero and then delete them from the array? I'm looking for a way to both get the column indices and then use those indices to delete.
You could use np.argwhere together with np.all to find the indices of the all-zero columns, then use np.delete to remove them.
Example:
Find your 0 columns:
a = np.array([[1, 2, 0, 3, 0],
              [4, 5, 0, 6, 0],
              [7, 8, 0, 9, 0]])
idx = np.argwhere(np.all(a[..., :] == 0, axis=0))
>>> idx
array([[2],
       [4]])
Delete your columns
a2 = np.delete(a, idx, axis=1)
>>> a2
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
Here is a solution I got.
Let's say that OriginMat is the matrix with the original data, and Result is the matrix I would like to place the result in. Then:
Result = OriginMat[:, ~np.all(OriginMat == 0, axis=0)]
Breaking it down:
~np.all(OriginMat == 0, axis=0)
checks, column by column (axis 0), whether all the values are 0, and then negates the result, so the all-zero columns come out as False. The result is a vector with False where all the elements of a column are 0 and True where they are not.
The last step just picks the columns that are True (hence not all 0).
I got this solution thanks to the website below:
https://www.science-emergence.com/Articles/How-to-remove-array-rows-that-contain-only-0-in-python/
# Some random array of 1's and 0's
x = np.random.randint(0,2, size=(3, 100))
# Find where all values in the columns are zero
mask = (x == 0).all(0)
# Find the indices of these columns
column_indices = np.where(mask)[0]
# Update x to only include the columns where non-zero values occur.
x = x[:,~mask]
The following works, simplifying #sacuL's answer:

a = np.array([[1, 2, 0, 3, 0],
              [4, 5, 0, 6, 0],
              [7, 8, 0, 9, 0]])
a = a[:, np.any(a, axis=0)]
>>> a
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])