Numpy argsort while distinguishing values of 0 - arrays

I have a very large array but here I will show a simplified case:
a = np.array([[3, 0, 5, 0], [8, 7, 6, 10], [5, 4, 0, 10]])
array([[ 3, 0, 5, 0],
[ 8, 7, 6, 10],
[ 5, 4, 0, 10]])
I want to argsort() the array but have a way to distinguish 0s. I tried to replace it with NaN:
a = np.array([[3, np.nan, 5, np.nan], [8, 7, 6, 10], [5, 4, np.nan, 10]])
a.argsort()
array([[0, 2, 1, 3],
[2, 1, 0, 3],
[1, 0, 3, 2]])
But the NaNs are still being sorted. Is there any way to make argsort give them a value of -1 or something similar? Or is there another option besides NaN for replacing the 0s? I tried math.inf as well, with no success. Does anybody have any ideas?
The purpose of doing this is that I have a cosine similarity matrix, and I want to exclude those instances where similarities are 0. I am using argsort() to get the highest similarities, which will give me the indices to another table with mappings to labels. If an array's entire similarity is 0 ([0,0,0]), then I want to ignore it. So if I can get argsort() to output it as [-1,-1,-1] after sorting, I can check to see if the entire array is -1 and exclude it.
EDIT:
So output should be:
array([[0, 2, -1, -1],
[2, 1, 0, 3],
[1, 0, 3, -1]])
So when using the last row of the output to refer back to the last row of a: the smallest value is a[2][1], which is 4, followed by a[2][0], which is 5, then a[2][3], which is 10, and finally -1, which marks the 0.

You may want to use numpy.ma.array(), like this:
a = np.array([[3,4,5],[8,7,6],[5,4,0]])
Mask this array with the condition a==0:
a_mask = np.ma.array(a, mask=(a==0))
print(a_mask)
# output
masked_array(
data=[[3, 4, 5],
[8, 7, 6],
[5, 4, --]],
mask=[[False, False, False],
[False, False, False],
[False, False, True]],
fill_value=999999)
print(a_mask.mask)
# outputs
array([[False, False, False],
[False, False, False],
[False, False, True]])
You can then use the mask attribute of the masked_array to distinguish the elements you want to label and fill in other values for them.
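As a minimal sketch of how the masked array could be taken one step further to produce the -1 output the question asks for: the variable names order, sorted_mask and result are illustrative and not part of the original answer. It relies on MaskedArray.argsort placing masked entries at the end of each row by default (endwith=True).

```python
import numpy as np

a = np.array([[3, 0, 5, 0], [8, 7, 6, 10], [5, 4, 0, 10]])
a_mask = np.ma.array(a, mask=(a == 0))

# Masked entries are sorted to the end of each row (endwith=True is the default).
order = a_mask.argsort(axis=1)

# Reorder the mask the same way to find which sorted slots hold masked entries.
sorted_mask = np.take_along_axis(np.ma.getmaskarray(a_mask), order, axis=1)

# Replace the indices that point at masked (zero) entries with -1.
result = np.where(sorted_mask, -1, order)
print(result)
```

A row that consists only of zeros would come out as all -1, which is the check the question wants to perform.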

If you mean "distinguish 0s" as the highest value or lowest values, I would suggest trying:
a[a==0]=(a.max()+1)
or:
a[a==0]=(a.min()-1)

One way to achieve the task is to first generate a boolean mask checking for zero values (since you want to distinguish this in the array), then sort it and then use the boolean mask to set the desired values (e.g., -1)
# your unmodified input array
In [294]: a
Out[294]:
array([[3, 4, 5],
[8, 7, 6],
[5, 4, 0]])
# boolean mask checking for zero
In [295]: zero_bool_mask = a == 0
In [296]: zero_bool_mask
Out[296]:
array([[False, False, False],
[False, False, False],
[False, False, True]])
# usual argsort
In [297]: sorted_idxs = np.argsort(a)
In [298]: sorted_idxs
Out[298]:
array([[0, 1, 2],
[2, 1, 0],
[2, 1, 0]])
# replace the indices of 0 with desired value (e.g., -1)
In [299]: sorted_idxs[zero_bool_mask] = -1
In [300]: sorted_idxs
Out[300]:
array([[ 0, 1, 2],
[ 2, 1, 0],
[ 2, 1, -1]])
Following this, to account for the correct sorting indices after the substitution value (e.g., -1), we have to perform this final step:
In [327]: sorted_idxs - (sorted_idxs == -1).sum(1)[:, None]
Out[327]:
array([[ 0, 1, 2],
[ 2, 1, 0],
[ 1, 0, -2]])
So now the sorted_idxs with negative values are the locations where you had zeros in the original array.
Thus, we can have a custom function like so:
def argsort_excluding_zeros(arr, replacement_value):
    zero_bool_mask = arr == 0
    sorted_idxs = np.argsort(arr)
    sorted_idxs[zero_bool_mask] = replacement_value
    return sorted_idxs - (sorted_idxs == replacement_value).sum(1)[:, None]
# another array
In [339]: a
Out[339]:
array([[0, 4, 5],
[8, 7, 6],
[5, 4, 0]])
# sample run
In [340]: argsort_excluding_zeros(a, replacement_value=-1)
Out[340]:
array([[-2, 0, 1],
[ 2, 1, 0],
[ 1, 0, -2]])

Using @kmario23's and @ScienceSnake's code, I came up with this solution:
a = np.array([[3, 0, 5, 0], [8, 7, 6, 10], [5, 4, 0, 10]])
b = np.where(a == 0, np.inf, a) # Replace 0 -> inf to make them sorted last
s = b.copy() # make a copy of b to sort it
s.sort()
mask = s == np.inf # create a mask to get inf locations after sorting
c = b.argsort()
d = np.where(mask, -1, c) # Replace where the zeros were originally with -1
Out:
array([[ 0, 2, -1, -1],
[ 2, 1, 0, 3],
[ 1, 0, 3, -1]])
This is not the most efficient solution, because it sorts twice.

There might be a slightly more efficient alternative, but this works in pure numpy and is very transparent.
import numpy as np
a = np.array([[3, 0, 5, 0], [8, 7, 6, 10], [5, 4, 0, 10]])
b = np.where(a == 0, np.inf, a) # Replace 0 -> inf to make them sorted last
c = b.argsort()
d = np.where(a == 0, -1, c) # Replace where the zeros were originally with -1
print(d)
outputs
[[ 0 -1 1 -1]
[ 2 1 0 3]
[ 1 0 -1 2]]
To save memory, some of the in-between assignments can be skipped, but I left it this way for clarity.
*** EDIT ***
The OP has clarified exactly what output they want. This is my new solution which has only one sort.
a = np.array([[3, 0, 5, 0], [8, 7, 6, 10], [5, 4, 0, 10]])
b = np.where(a == 0, np.inf, a).argsort()
def remove_invalid_entries(row, num_valid):
    row[num_valid.pop():] = -1
    return row
num_valid = np.flip(np.count_nonzero(a, 1)).tolist()
b = np.apply_along_axis(remove_invalid_entries, 1, b, num_valid)
print(b)
> [[ 0 2 -1 -1]
[ 2 1 0 3]
[ 1 0 3 -1]]
The start is as before. Then we go through the argsorted array row by row and replace the last n elements with -1, where n is the number of 0s in the corresponding row of the original array. Here this is done with np.apply_along_axis, which is convenient but still calls the Python function once per row. I count the non-zero entries in each row of a and turn the counts into a list (in reversed order), so that pop() returns the number of elements to keep in the current row of b as np.apply_along_axis iterates over it.
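The same idea can probably be written without the per-row Python call. The following is a sketch rather than the answerer's code: the function name argsort_zeros_last and the fill parameter are made up for illustration. It compares each column position against the per-row count of non-zero entries through broadcasting.

```python
import numpy as np

def argsort_zeros_last(a, fill=-1):
    """Argsort each row, treating zeros as missing and marking them with `fill`."""
    a = np.asarray(a)
    # Zeros become inf so they sort to the end of each row.
    order = np.where(a == 0, np.inf, a).argsort(axis=1)
    # Number of real (non-zero) entries per row.
    num_valid = np.count_nonzero(a, axis=1)
    # Column positions at or beyond the valid count belong to the zeros.
    cols = np.arange(a.shape[1])
    order[cols >= num_valid[:, None]] = fill
    return order

a = np.array([[3, 0, 5, 0], [8, 7, 6, 10], [5, 4, 0, 10]])
print(argsort_zeros_last(a))
```

This still sorts only once, and the masking step is a single vectorised assignment.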

Related

How to optimise code that parses a 2-d array in Ruby

Note: This question poses a problem that I have already solved, however I feel my solution is very rudimentary and that other people, like myself, would benefit from a discussion with input from more experienced developers. Different approaches to solving the problem, as well as more sophisticated methods and algorithms would be really appreciated. I feel this is a good place to learn how Ruby can tackle what I consider to be a fairly difficult problem for a beginner.
Given a 6x6 2D Array arr:
1 1 1 0 0 0
0 1 0 0 0 0
1 1 1 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
We define an hourglass in arr to be a subset of values with indices falling in this pattern in arr's graphical representation:
a b c
d
e f g
There are 16 hourglasses in arr and an hourglass sum is the sum of an hourglass' values. Calculate the hourglass sum for every hourglass in arr, then print the maximum hourglass sum.
For example, given the 2D array:
arr = [
[-9, -9, -9, 1, 1, 1],
[ 0, -9, 0, 4, 3, 2],
[-9, -9, -9, 1, 2, 3],
[ 0, 0, 8, 6, 6, 0],
[ 0, 0, 0, -2, 0, 0],
[ 0, 0, 1, 2, 4, 0]
]
We calculate the following hourglass values:
-63, -34, -9, 12,
-10, 0, 28, 23,
-27, -11, -2, 10,
9, 17, 25, 18
Our highest hourglass value is from the hourglass:
0 4 3
1
8 6 6
My solution is:
def hourglass_sum(arr)
  hourglasses = []
  arr.each_with_index do |row, i|
    # guard to prevent iterating outside the array
    unless arr[i].nil?
      arr[i].length.times do |iteration|
        # generate n 3x3 arrays
        r1 = arr[i][iteration...iteration+3]
        r2 = arr[i+1][iteration...iteration+3] if arr[i+1] != nil
        r3 = arr[i+2][iteration...iteration+3] if arr[i+2] != nil
        # guard to stop creating 3x3 arrays that fall outside the given input array
        if arr[i+1] != nil && arr[i+2] != nil
          # take all values except indices 3 and 5 from the 9 element array
          result = r1 + [r2[1]] + r3
          # keep only complete 7-element hourglasses; slices near the right edge are shorter
          hourglasses << result.sum if result.size == 7 && !result.include?(nil)
        end
      end
    end
  end
  p hourglasses.max
end
arr = [[-9, -9, -9, 1, 1, 1], [0, -9, 0, 4, 3, 2], [-9, -9, -9, 1, 2, 3], [0, 0, 8, 6, 6, 0], [0, 0 ,0, -2, 0, 0], [0, 0, 1, 2, 4, 0]]
hourglass_sum(arr)
# => 28
One option is to use Matrix methods.
require 'matrix'
ma = Matrix[*arr]
#=> Matrix[[-9, -9, -9, 1, 1, 1],
# [ 0, -9, 0, 4, 3, 2],
# [-9, -9, -9, 1, 2, 3],
# [ 0, 0, 8, 6, 6, 0],
# [ 0, 0, 0, -2, 0, 0],
# [ 0, 0, 1, 2, 4, 0]]
mi = Matrix.build(6-3+1) { |i,j| [i,j] }
#=> Matrix[[[0, 0], [0, 1], [0, 2], [0, 3]],
# [[1, 0], [1, 1], [1, 2], [1, 3]],
# [[2, 0], [2, 1], [2, 2], [2, 3]],
# [[3, 0], [3, 1], [3, 2], [3, 3]]]
def hourglass_val(r,c,ma)
  mm = ma.minor(r,3,c,3)
  mm.sum - mm[1,0] - mm[1,2]
end
max_hg = mi.max_by { |r,c| hourglass_val(r,c,ma) }
#=> [1,2]
hourglass_val(*max_hg,ma)
#=> 28
[1,2] are the row and column indices of the top-left corner of an optimal hourglass in arr.
Here is an option I came up with.
def width_height(matrix)
  [matrix.map(&:size).max || 0, matrix.size]
end
def sum_with_weight_matrix(number_matrix, weight_matrix)
  number_width, number_height = width_height(number_matrix)
  weight_width, weight_height = width_height(weight_matrix)
  width_diff = number_width - weight_width
  height_diff = number_height - weight_height
  0.upto(height_diff).map do |y|
    0.upto(width_diff).map do |x|
      weight_height.times.sum do |ry|
        weight_width.times.sum do |rx|
          weight = weight_matrix.dig(ry, rx) || 0
          number = number_matrix.dig(y + ry, x + rx) || 0
          number * weight
        end
      end
    end
  end
end
arr = [
[-9, -9, -9, 1, 1, 1],
[ 0, -9, 0, 4, 3, 2],
[-9, -9, -9, 1, 2, 3],
[ 0, 0, 8, 6, 6, 0],
[ 0, 0, 0, -2, 0, 0],
[ 0, 0, 1, 2, 4, 0],
]
weights = [
[1, 1, 1],
[0, 1, 0],
[1, 1, 1],
]
sum_matrix = sum_with_weight_matrix(arr, weights)
#=> [
# [-63, -34, -9, 12],
# [-10, 0, 28, 23],
# [-27, -11, -2, 10],
# [ 9, 17, 25, 18]
# ]
max_sum = sum_matrix.flatten.max
#=> 28
This solution uses width_diff and height_diff to create an output matrix (4x4 for the sample data: 0.upto(6 - 3).to_a #=> [0, 1, 2, 3]). The indexes of the weight_matrix (rx and ry) are used as relative indexes into the larger number_matrix.
If your 2d array always has the same number of elements in each sub-array, you can replace matrix.map(&:size).max with matrix[0]&.size || 0 to speed up determining the matrix width. The current solution uses the maximum size of the sub-arrays. Sub-arrays with a smaller size will use 0 for the missing elements, thus not affecting the sum.
My solution might be a bit variable heavy. I've done this to have descriptive variable names that hopefully tell you most of what you need to know about the solution. You can shorten the variable names, or remove them completely if you feel you don't need them.
If something isn't clear just ask away in the comments.
Without using the Matrix class, here's how I've done it for any arbitrary rectangular array:
offsets = [[-1, -1], [-1, 0], [-1, 1], [0, 0], [1, -1], [1, 0], [1, 1]]
sums = 1.upto(arr.length - 2).flat_map do |i|
  1.upto(arr[0].length - 2).map do |j|
    offsets.map {|(x, y)| arr[i+x][j+y] }.sum
  end
end
puts sums.max
The values we're interested in are just offsets from a current position. We can map out the values in the array relative to the current position by some row and column offset, sum them, then select the max of the sums.

Removing submatrix from numpy array by shifting other elements [duplicate]

Suppose I have a numpy array
a = np.array([[1,2,3,4],
[3,4,5,6],
[2,3,4,4],
[3,3,1,2]])
I want to delete the submatrix [[3,4],[3,1]]. I can do it as follows:
mask = np.ones(a.shape,dtype=bool)
mask[2:,1:-1] = False
a_new = a[mask,...]
print(a_new) #output array([1, 2, 3, 4, 3, 4, 5, 6, 2, 4, 3, 2])
However, I want the output to be
np.array([[1,2,3,4],
[3,4,5,6],
[2,4,0,0],
[3,2,0,0]])
I just want numpy to remove the submatrix and shift the other elements left, filling the emptied places with 0. How can I do this?
I cannot find a function that does what you ask, but combining np.roll with a mask with this routine produces your output. Perhaps there is a more elegant way:
a = np.array([[1,2,3,4],
[3,4,5,6],
[2,3,4,4],
[3,3,1,2]])
mask = np.ones(a.shape,dtype=bool)
mask[2:,1:-1] = False
mask2 = mask.copy()
mask2[2:, 1:] = False
n = 2 #shift length
a[~mask2] = np.roll((a * mask)[~mask2],-n)
a
>>array([[1, 2, 3, 4],
[3, 4, 5, 6],
[2, 4, 0, 0],
[3, 2, 0, 0]])
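A possibly more general alternative to the np.roll routine is sketched below. It assumes the goal is to push the kept elements of each row to the left and fill the rest with a constant. The function name remove_and_shift and its fill parameter are illustrative only. It uses a stable argsort of the inverted mask so that kept elements retain their original order.

```python
import numpy as np

def remove_and_shift(a, mask, fill=0):
    """Drop entries where mask is False, shift the rest left, pad with `fill`."""
    # False (kept) sorts before True (removed); a stable sort preserves order.
    order = np.argsort(~mask, axis=1, kind="stable")
    shifted = np.take_along_axis(a, order, axis=1)
    kept = np.take_along_axis(mask, order, axis=1)
    return np.where(kept, shifted, fill)

a = np.array([[1, 2, 3, 4],
              [3, 4, 5, 6],
              [2, 3, 4, 4],
              [3, 3, 1, 2]])
mask = np.ones(a.shape, dtype=bool)
mask[2:, 1:-1] = False
print(remove_and_shift(a, mask))
```

Unlike the np.roll version, this does not depend on the shift length being the same for every affected row, and it leaves the input array unchanged.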
You can simply set those entries to zero.
a = np.array([[1,2,3,4],
[3,4,5,6],
[2,3,4,4],
[3,3,1,2]])
a[2:, 2:] = 0
returns
array([[1, 2, 3, 4],
[3, 4, 5, 6],
[2, 3, 0, 0],
[3, 3, 0, 0]])

Find and delete all-zero columns from Numpy array using fancy indexing

How do I find columns in a numpy array that are all-zero and then delete them from the array? I'm looking for a way to both get the column indices and then use those indices to delete.
You could use np.argwhere, with np.all to find your indices. To delete them, use np.delete.
Example:
Find your 0 columns:
a = np.array([[1, 2, 0, 3, 0],
[4, 5, 0, 6, 0],
[7, 8, 0, 9, 0]])
idx = np.argwhere(np.all(a[..., :] == 0, axis=0))
>>> idx
array([[2],
[4]])
Delete your columns
a2 = np.delete(a, idx, axis=1)
>>> a2
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Here is a solution I found.
Say that OriginMat is the matrix with the original data and Result is the matrix that should hold the result. Then:
Result = OriginMat[:,~np.all(OriginMat == 0, axis = 0)]
Breaking it down:
The expression below checks, column by column (along axis 0), whether all the values are 0, and then negates the result, so that all-zero columns come out as False:
~np.all(OriginMat == 0, axis = 0)
The result is a boolean vector that is False for columns in which all elements are 0 and True otherwise.
The last step then picks the columns that are True (hence not all-zero).
I got this solution thanks to the website below:
https://www.science-emergence.com/Articles/How-to-remove-array-rows-that-contain-only-0-in-python/
# Some random array of 1's and 0's
x = np.random.randint(0,2, size=(3, 100))
# Find where all values in the columns are zero
mask = (x == 0).all(0)
# Find the indices of these columns
column_indices = np.where(mask)[0]
# Update x to only include the columns where non-zero values occur.
x = x[:,~mask]
The following works, simplifying @sacuL's answer:
$ a = np.array([[1, 2, 0, 3, 0],
[4, 5, 0, 6, 0],
[7, 8, 0, 9, 0]])
$ a = a[:, np.any(a, axis=0)]
$ a
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
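Since the question asks for both the column indices and the deletion, the two steps can be combined as in the sketch below. The names zero_cols and trimmed are illustrative. np.flatnonzero is used here instead of np.argwhere so that the indices come back as a flat 1-D array.

```python
import numpy as np

a = np.array([[1, 2, 0, 3, 0],
              [4, 5, 0, 6, 0],
              [7, 8, 0, 9, 0]])

# Indices of the columns in which every entry is zero.
zero_cols = np.flatnonzero(~a.any(axis=0))

# Remove those columns by index.
trimmed = np.delete(a, zero_cols, axis=1)

print(zero_cols)
print(trimmed)
```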

Add multiple elements in single dimension array at odd intervals

I have to write a function that inserts multiple elements into a single dimension array of unknown length.
For example:
input_array = [1, 2, 3, 4, 5]
Inserting two zeroes between each element, to give:
output_array = [1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0, 5]
.....
Any ideas?
Here are two versions of the code.
Simple for loop:
input_array = [1, 2, 3, 4, 5]
output_array = []
for k in input_array:
    output_array.append(k)
    output_array.append(0)
    output_array.append(0)
print(output_array)
Using list comprehensions:
input_array = [1, 2, 3, 4, 5]
output_array = [item for sublist in [[x, 0, 0] for x in input_array] for item in sublist]
print(output_array)
I can't say whether the asker, as @Willem notes, was looking for a faster solution than the one they were able to come up with themselves. In reality, this seems like a simple task:
def fill(iterable, padding: tuple):
    result = list()
    for i in iterable:
        # The * symbol is a sequence unpacking and it serves to flatten the values inside result
        # For example, [*(0, 1, 2)] equals [0, 1, 2] and not [(0, 1, 2)]
        result.extend([i, *padding])
    return result
if __name__ == "__main__":
    data = range(1, 6)
    padding = (0, 0)
    print(fill(data, padding))
I could obviously choose any other container type in place of list assigned to result.
Below is what the above script outputs when running on my machine:
None@vacuum:~$ python3.6 ./test.py
[1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0, 5, 0, 0]
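Note that the output above ends with the padding, while the output_array in the question stops at the last element. A small variation that inserts the padding only between elements might look like the sketch below. The function name intersperse is an illustrative choice, not something from the answers.

```python
def intersperse(values, padding):
    """Return a list with `padding` inserted between consecutive elements."""
    result = []
    for index, value in enumerate(values):
        if index > 0:
            # Padding goes before every element except the first,
            # so nothing is appended after the last one.
            result.extend(padding)
        result.append(value)
    return result

print(intersperse([1, 2, 3, 4, 5], (0, 0)))
```

Because it only iterates over the input, it works for any iterable of unknown length, including an empty one.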

Subtract from specified elements in a 2D numpy array

Assume there is a matrix X, a mask and a vector y
>>> X
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
>>> mask
array([[False, True, True, True],
[ True, False, True, True],
[ True, True, False, True],
[ True, True, True, False]], dtype=bool)
>>> y
[8, 9, 10]
I want to subtract y from the elements of each row of X where mask is True. Doing it this way, I get the result
>>> X[mask].reshape(4,3)-y
array([[-7, -7, -7],
[-4, -3, -3],
[ 0, 0, 1],
[ 4, 4, 4]])
But I want to keep X as a 4*4 matrix. That means that where the mask is False, the value of X should not be changed. What should I do? Thanks.
Two approaches could be suggested for in-place edits.
Approach #1 : Boolean-index into X and reshape the result so that each row has as many elements as y. Subtract y from it, thus leveraging broadcasting. Finally, index into X with the same mask and assign the flattened subtracted values -
X[mask] = (X[mask].reshape(X.shape[0],-1) - y).ravel()
Approach #2 : Resize y to have as many elements as there are True elements in mask and simply subtract it from the masked places in X -
X[mask] -= np.resize(y,mask.sum())
Sample runs -
In [55]: X # Input array
Out[55]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
# Using approach #1
In [56]: X[mask] = (X[mask].reshape(X.shape[0],-1) - y).ravel()
In [57]: X # Changed input array
Out[57]:
array([[ 0, -7, -7, -7],
[-4, 5, -3, -3],
[ 0, 0, 10, 1],
[ 4, 4, 4, 15]])
In [59]: X # Input array
Out[59]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
# Using approach #2
In [60]: X[mask] -= np.resize(y,mask.sum())
In [61]: X # Changed input array
Out[61]:
array([[ 0, -7, -7, -7],
[-4, 5, -3, -3],
[ 0, 0, 10, 1],
[ 4, 4, 4, 15]])
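The two sample runs can be reproduced end to end with the sketch below. It rebuilds X, mask and y from the question and applies each approach to a copy so the input stays untouched. The names out1 and out2 are illustrative. Both approaches assume that every row of mask has the same number of True entries as y has elements.

```python
import numpy as np

X = np.arange(16).reshape(4, 4)
mask = ~np.eye(4, dtype=bool)   # True everywhere except the diagonal
y = np.array([8, 9, 10])

# Approach #1: reshape the masked values to (rows, len(y)) and broadcast y.
out1 = X.copy()
out1[mask] = (out1[mask].reshape(X.shape[0], -1) - y).ravel()

# Approach #2: repeat y to match the number of True entries in mask.
out2 = X.copy()
out2[mask] -= np.resize(y, mask.sum())

print(out1)
print(np.array_equal(out1, out2))
```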
