How to optimise code that parses a 2-d array in Ruby


Note: This question poses a problem that I have already solved; however, I feel my solution is very rudimentary, and that other people like me would benefit from a discussion with input from more experienced developers. Different approaches to the problem, as well as more sophisticated methods and algorithms, would be really appreciated. I feel this is a good place to learn how Ruby can tackle what I consider a fairly difficult problem for a beginner.
Given a 6x6 2D Array arr:
1 1 1 0 0 0
0 1 0 0 0 0
1 1 1 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
We define an hourglass in arr to be a subset of values with indices falling in this pattern in arr's graphical representation:
a b c
  d
e f g
There are 16 hourglasses in arr and an hourglass sum is the sum of an hourglass' values. Calculate the hourglass sum for every hourglass in arr, then print the maximum hourglass sum.
For example, given the 2D array:
arr = [
[-9, -9, -9, 1, 1, 1],
[ 0, -9, 0, 4, 3, 2],
[-9, -9, -9, 1, 2, 3],
[ 0, 0, 8, 6, 6, 0],
[ 0, 0, 0, -2, 0, 0],
[ 0, 0, 1, 2, 4, 0]
]
We calculate the following hourglass values:
-63, -34, -9, 12,
-10, 0, 28, 23,
-27, -11, -2, 10,
9, 17, 25, 18
Our highest hourglass value is from the hourglass:
0 4 3
  1
8 6 6
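Each hourglass can be identified by the top-left corner (r, c) of its 3x3 window. A corner only fits where there is room for two more rows and two more columns, so an n x m array has (n - 2) * (m - 2) hourglasses, which gives the 4 * 4 = 16 mentioned above. A minimal sketch of both ideas, assuming a rectangular input; hourglass_at and hourglass_count are illustrative helper names, not from the original:

```ruby
# Sum of the hourglass whose 3x3 window has its top-left corner at (r, c):
# the whole top row, the centre of the middle row, and the whole bottom row.
def hourglass_at(arr, r, c)
  arr[r][c, 3].sum + arr[r + 1][c + 1] + arr[r + 2][c, 3].sum
end

# Number of 3x3 windows that fit in a rectangular array.
def hourglass_count(arr)
  (arr.size - 2) * (arr.first.size - 2)
end

arr = [
  [-9, -9, -9,  1, 1, 1],
  [ 0, -9,  0,  4, 3, 2],
  [-9, -9, -9,  1, 2, 3],
  [ 0,  0,  8,  6, 6, 0],
  [ 0,  0,  0, -2, 0, 0],
  [ 0,  0,  1,  2, 4, 0]
]
p hourglass_count(arr)     # prints 16
p hourglass_at(arr, 0, 0)  # prints -63
p hourglass_at(arr, 1, 2)  # prints 28 (the 0 4 3 / 1 / 8 6 6 hourglass)
```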
My solution is:
def hourglass_sum(arr)
  hourglasses = []
  arr.each_with_index do |row, i|
    # guard against a nil row (a plain nil check, not a rescue clause)
    unless arr[i].nil?
      # a 3-wide window fits at only (length - 2) starting columns
      (arr[i].length - 2).times do |iteration|
        # slice the three rows of the 3x3 window starting at column `iteration`
        r1 = arr[i][iteration...iteration+3]
        r2 = arr[i+1][iteration...iteration+3] if arr[i+1] != nil
        r3 = arr[i+2][iteration...iteration+3] if arr[i+2] != nil
        # skip windows that would extend past the last row
        if arr[i+1] != nil && arr[i+2] != nil
          # take all values of the 9-element window except flat indices 3 and 5
          result = r1 + [r2[1]] + r3
          hourglasses << result.sum unless result.include? nil
        end
      end
    end
  end
  p hourglasses.max
end
arr = [[-9, -9, -9, 1, 1, 1], [0, -9, 0, 4, 3, 2], [-9, -9, -9, 1, 2, 3], [0, 0, 8, 6, 6, 0], [0, 0, 0, -2, 0, 0], [0, 0, 1, 2, 4, 0]]
hourglass_sum(arr)
# => 28

One option is to use Matrix methods.
require 'matrix'
ma = Matrix[*arr]
#=> Matrix[[-9, -9, -9, 1, 1, 1],
# [ 0, -9, 0, 4, 3, 2],
# [-9, -9, -9, 1, 2, 3],
# [ 0, 0, 8, 6, 6, 0],
# [ 0, 0, 0, -2, 0, 0],
# [ 0, 0, 1, 2, 4, 0]]
mi = Matrix.build(6-3+1) { |i,j| [i,j] }
#=> Matrix[[[0, 0], [0, 1], [0, 2], [0, 3]],
# [[1, 0], [1, 1], [1, 2], [1, 3]],
# [[2, 0], [2, 1], [2, 2], [2, 3]],
# [[3, 0], [3, 1], [3, 2], [3, 3]]]
def hourglass_val(r,c,ma)
  mm = ma.minor(r,3,c,3)
  mm.sum - mm[1,0] - mm[1,2]
end
max_hg = mi.max_by { |r,c| hourglass_val(r,c,ma) }
#=> [1,2]
hourglass_val(*max_hg,ma)
#=> 28
[1,2] are the row and column indices of the top-left corner of an optimal hourglass in arr.
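The same max_by idea also works on plain nested arrays, which avoids the require 'matrix' dependency (matrix stopped being a default gem and became a bundled gem in Ruby 3.1). A minimal sketch, assuming a rectangular input; hourglass_sum_at and best_hourglass are hypothetical names:

```ruby
# Hourglass sum for the 3x3 window whose top-left corner is (r, c).
def hourglass_sum_at(arr, r, c)
  arr[r][c, 3].sum + arr[r + 1][c + 1] + arr[r + 2][c, 3].sum
end

# Returns [[row, col], sum] for the hourglass with the largest sum.
def best_hourglass(arr)
  rows = (0..arr.size - 3).to_a
  cols = (0..arr.first.size - 3).to_a
  corners = rows.product(cols) # every valid top-left corner, like `mi` above
  best = corners.max_by { |row, col| hourglass_sum_at(arr, row, col) }
  [best, hourglass_sum_at(arr, *best)]
end

arr = [[-9, -9, -9, 1, 1, 1], [0, -9, 0, 4, 3, 2], [-9, -9, -9, 1, 2, 3],
       [0, 0, 8, 6, 6, 0], [0, 0, 0, -2, 0, 0], [0, 0, 1, 2, 4, 0]]
p best_hourglass(arr) # prints [[1, 2], 28]
```

Returning the corner together with the sum saves the second call that the Matrix version makes after max_by.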

Here is an option I came up with.
def width_height(matrix)
  [matrix.map(&:size).max || 0, matrix.size]
end
def sum_with_weight_matrix(number_matrix, weight_matrix)
  number_width, number_height = width_height(number_matrix)
  weight_width, weight_height = width_height(weight_matrix)
  width_diff = number_width - weight_width
  height_diff = number_height - weight_height
  0.upto(height_diff).map do |y|
    0.upto(width_diff).map do |x|
      weight_height.times.sum do |ry|
        weight_width.times.sum do |rx|
          weight = weight_matrix.dig(ry, rx) || 0
          number = number_matrix.dig(y + ry, x + rx) || 0
          number * weight
        end
      end
    end
  end
end
arr = [
[-9, -9, -9, 1, 1, 1],
[ 0, -9, 0, 4, 3, 2],
[-9, -9, -9, 1, 2, 3],
[ 0, 0, 8, 6, 6, 0],
[ 0, 0, 0, -2, 0, 0],
[ 0, 0, 1, 2, 4, 0],
]
weights = [
[1, 1, 1],
[0, 1, 0],
[1, 1, 1],
]
sum_matrix = sum_with_weight_matrix(arr, weights)
#=> [
# [-63, -34, -9, 12],
# [-10, 0, 28, 23],
# [-27, -11, -2, 10],
# [ 9, 17, 25, 18]
# ]
max_sum = sum_matrix.flatten.max
#=> 28
This solution uses width_diff and height_diff to determine the size of the output matrix (4x4 for the sample data: 0.upto(6 - 3).to_a #=> [0, 1, 2, 3]). The indexes of the weight_matrix (rx and ry) are used as offsets relative to the current position (x, y) in the larger number_matrix.
If every sub-array of your 2D array has the same number of elements, you can replace matrix.map(&:size).max with matrix[0]&.size || 0 to speed up determining the matrix width. The current solution uses the maximum size of the sub-arrays; shorter sub-arrays use 0 for the missing elements, so they don't affect the sum.
My solution might be a bit variable-heavy. I've done this to have descriptive variable names that hopefully tell you most of what you need to know about the solution. You can shorten the variable names, or remove them entirely if you don't need them.
If something isn't clear just ask away in the comments.
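To make the width trade-off concrete: both calculations agree on rectangular input, but they differ when a later row is longer than the first one. A small sketch; width_height is copied from the answer, and width_height_rectangular is a hypothetical name for the suggested matrix[0]&.size || 0 variant:

```ruby
# Width from the answer: the longest row wins, so shorter rows are
# effectively zero-padded by the `|| 0` fallbacks in the summing loop.
def width_height(matrix)
  [matrix.map(&:size).max || 0, matrix.size]
end

# Suggested shortcut for rectangular input: only the first row is inspected.
# `&.` yields nil for an empty matrix, and `|| 0` turns that into 0.
def width_height_rectangular(matrix)
  [matrix[0]&.size || 0, matrix.size]
end

rect = [[1, 2, 3], [4, 5, 6]]
ragged = [[1], [2, 3]]
p width_height(rect), width_height_rectangular(rect)     # [3, 2] twice
p width_height(ragged), width_height_rectangular(ragged) # [2, 2] vs [1, 2]
```

With ragged input the shortcut under-reports the width, so columns beyond the first row's length would never be visited; that is why it is only safe when all rows have the same size.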

Without using the Matrix class, here's how I've done it for any arbitrary rectangular array:
offsets = [[-1, -1], [-1, 0], [-1, 1], [0, 0], [1, -1], [1, 0], [1, 1]]
sums = 1.upto(arr.length - 2).flat_map do |i|
  1.upto(arr[0].length - 2).map do |j|
    offsets.map {|(x, y)| arr[i+x][j+y] }.sum
  end
end
puts sums.max
The values we're interested in are just offsets from a current position. We can map out the values in the array relative to the current position by some row and column offset, sum them, then select the max of the sums.
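The offset list can also be passed in as a parameter, so the same loop handles any pattern that fits inside a 3x3 neighbourhood. A sketch under that assumption (offsets are [row, column] deltas within -1..1); max_pattern_sum and HOURGLASS are hypothetical names, and Array#sum with a block replaces the map/sum pair:

```ruby
# [row, column] deltas from the centre cell that make up an hourglass.
HOURGLASS = [[-1, -1], [-1, 0], [-1, 1], [0, 0], [1, -1], [1, 0], [1, 1]].freeze

# The centre (i, j) stays one cell away from every edge, so any offset
# in -1..1 stays inside a rectangular array.
def max_pattern_sum(arr, offsets = HOURGLASS)
  sums = 1.upto(arr.length - 2).flat_map do |i|
    1.upto(arr[0].length - 2).map do |j|
      offsets.sum { |dx, dy| arr[i + dx][j + dy] }
    end
  end
  sums.max
end

arr = [[-9, -9, -9, 1, 1, 1], [0, -9, 0, 4, 3, 2], [-9, -9, -9, 1, 2, 3],
       [0, 0, 8, 6, 6, 0], [0, 0, 0, -2, 0, 0], [0, 0, 1, 2, 4, 0]]
p max_pattern_sum(arr) # prints 28
```

A plus-shaped pattern, for example, would be max_pattern_sum(arr, [[-1, 0], [0, -1], [0, 0], [0, 1], [1, 0]]).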

Related

Numpy argsort while distinguishing values of 0

I have a very large array but here I will show a simplified case:
a = np.array([[3, 0, 5, 0], [8, 7, 6, 10], [5, 4, 0, 10]])
array([[ 3, 0, 5, 0],
[ 8, 7, 6, 10],
[ 5, 4, 0, 10]])
I want to argsort() the array but have a way to distinguish the 0s. I tried replacing them with NaN:
a = np.array([[3, np.nan, 5, np.nan], [8, 7, 6, 10], [5, 4, np.nan, 10]])
a.argsort()
array([[0, 2, 1, 3],
[2, 1, 0, 3],
[1, 0, 3, 2]])
But the NaNs are still being sorted. Is there any way to make argsort give them a value of -1 or something similar? Or is there an option other than NaN to replace the 0s with? I tried math.inf as well, with no success. Does anybody have any ideas?
The purpose of doing this is that I have a cosine similarity matrix, and I want to exclude those instances where similarities are 0. I am using argsort() to get the highest similarities, which will give me the indices to another table with mappings to labels. If an array's entire similarity is 0 ([0,0,0]), then I want to ignore it. So if I can get argsort() to output it as [-1,-1,-1] after sorting, I can check to see if the entire array is -1 and exclude it.
EDIT:
So output should be:
array([[0, 2, -1, -1],
[2, 1, 0, 3],
[1, 0, 3, -1]])
So, using the last row to refer back to a: the smallest is a[2][1], which is 4, followed by a[2][0], which is 5, then a[2][3], which is 10, and finally -1, which stands for the 0.

You may want to use numpy.ma.array(), like this:
import numpy as np
a = np.array([[3,4,5],[8,7,6],[5,4,0]])
Mask this array with the condition a == 0:
a_mask = np.ma.array(a, mask=(a==0))
print(repr(a_mask))
# output
masked_array(
data=[[3, 4, 5],
[8, 7, 6],
[5, 4, --]],
mask=[[False, False, False],
[False, False, False],
[False, False, True]],
fill_value=999999)
print(repr(a_mask.mask))
# output
array([[False, False, False],
[False, False, False],
[False, False, True]])
You can then use the mask attribute of the masked array to tell apart the elements you want to label, and fill in the other values.

If by "distinguish 0s" you mean treating them as the highest or the lowest values, I would suggest trying:
a[a==0]=(a.max()+1)
or:
a[a==0]=(a.min()-1)

One way to do this is to first build a boolean mask of the zero values (since those are what you want to distinguish), then argsort the array, and finally use the boolean mask to set the desired value (e.g., -1):
# your unmodified input array
In [294]: a
Out[294]:
array([[3, 4, 5],
[8, 7, 6],
[5, 4, 0]])
# boolean mask checking for zero
In [295]: zero_bool_mask = a == 0
In [296]: zero_bool_mask
Out[296]:
array([[False, False, False],
[False, False, False],
[False, False, True]])
# usual argsort
In [297]: sorted_idxs = np.argsort(a)
In [298]: sorted_idxs
Out[298]:
array([[0, 1, 2],
[2, 1, 0],
[2, 1, 0]])
# replace the indices of 0 with desired value (e.g., -1)
In [299]: sorted_idxs[zero_bool_mask] = -1
In [300]: sorted_idxs
Out[300]:
array([[ 0, 1, 2],
[ 2, 1, 0],
[ 2, 1, -1]])
Finally, to shift the remaining sorting indices to account for the substituted value (e.g., -1), subtract from every entry in a row the number of substituted entries in that row:
In [327]: sorted_idxs - (sorted_idxs == -1).sum(1)[:, None]
Out[327]:
array([[ 0, 1, 2],
[ 2, 1, 0],
[ 1, 0, -2]])
So now the sorted_idxs with negative values are the locations where you had zeros in the original array.
Thus, we can have a custom function like so:
import numpy as np
def argsort_excluding_zeros(arr, replacement_value):
    zero_bool_mask = arr == 0
    sorted_idxs = np.argsort(arr)
    sorted_idxs[zero_bool_mask] = replacement_value
    return sorted_idxs - (sorted_idxs == replacement_value).sum(1)[:, None]
# another array
In [339]: a
Out[339]:
array([[0, 4, 5],
[8, 7, 6],
[5, 4, 0]])
# sample run
In [340]: argsort_excluding_zeros(a, replacement_value=-1)
Out[340]:
array([[-2, 0, 1],
[ 2, 1, 0],
[ 1, 0, -2]])

Using @kmario23's and @ScienceSnake's code, I came up with this solution:
import numpy as np
a = np.array([[3, 0, 5, 0], [8, 7, 6, 10], [5, 4, 0, 10]])
b = np.where(a == 0, np.inf, a) # Replace 0 -> inf to make them sorted last
s = b.copy() # make a copy of b to sort it
s.sort()
mask = s == np.inf # create a mask to get inf locations after sorting
c = b.argsort()
d = np.where(mask, -1, c) # Replace where the zeros were originally with -1
Out:
array([[ 0, 2, -1, -1],
[ 2, 1, 0, 3],
[ 1, 0, 3, -1]])
Not the most efficient solution, because it sorts twice.

There might be a slightly more efficient alternative, but this works in pure numpy and is very transparent.
import numpy as np
a = np.array([[3, 0, 5, 0], [8, 7, 6, 10], [5, 4, 0, 10]])
b = np.where(a == 0, np.inf, a) # Replace 0 -> inf to make them sorted last
c = b.argsort()
d = np.where(a == 0, -1, c) # Replace where the zeros were originally with -1
print(d)
outputs
[[ 0 -1 1 -1]
[ 2 1 0 3]
[ 1 0 -1 2]]
To save memory, some of the in-between assignments can be skipped, but I left it this way for clarity.
*** EDIT ***
The OP has clarified exactly what output they want. This is my new solution which has only one sort.
a = np.array([[3, 0, 5, 0], [8, 7, 6, 10], [5, 4, 0, 10]])
b = np.where(a == 0, np.inf, a).argsort()
def remove_invalid_entries(row, num_valid):
    row[num_valid.pop():] = -1
    return row
num_valid = np.flip(np.count_nonzero(a, 1)).tolist()
b = np.apply_along_axis(remove_invalid_entries, 1, b, num_valid)
print(b)
[[ 0 2 -1 -1]
[ 2 1 0 3]
[ 1 0 3 -1]]
The start is as before. Then we go through the argsorted array row by row and replace the last n elements with -1, where n is the number of 0s in the corresponding row of the original array. Here I do this with np.apply_along_axis: I counted the non-zero entries in each row of a and turned the counts into a list in reverse order, so that pop() returns the number of elements to keep in the row of b that np.apply_along_axis is currently processing.

Numpy - Indexing one dimension of a multidimensional array

I have a NumPy array like this, with shape (6, 2, 4):
x = np.array([[[0, 3, 2, 0],
[1, 3, 1, 1]],
[[3, 2, 3, 3],
[0, 3, 2, 0]],
[[1, 0, 3, 1],
[3, 2, 3, 3]],
[[0, 3, 2, 0],
[1, 3, 2, 2]],
[[3, 0, 3, 1],
[1, 0, 1, 1]],
[[1, 3, 1, 1],
[3, 1, 3, 3]]])
And I have a choices array like this:
choices = np.array([[1, 1, 1, 1],
[0, 1, 1, 0],
[1, 1, 1, 1],
[1, 0, 0, 0],
[1, 0, 1, 1],
[0, 0, 0, 1]])
How can I use the choices array to index only the middle dimension (of size 2) and get a new NumPy array with shape (6, 4) in the most efficient way possible?
The result would be this:
[[1 3 1 1]
[3 3 2 3]
[3 2 3 3]
[1 3 2 0]
[1 0 1 1]
[1 3 1 3]]
I've tried to do it by x[:, choices, :] but this doesn't return what I want. I also tried to do x.take(choices, axis=1) but no luck.

Use np.take_along_axis to index along the second axis -
In [16]: np.take_along_axis(x,choices[:,None],axis=1)[:,0]
Out[16]:
array([[1, 3, 1, 1],
[3, 3, 2, 3],
[3, 2, 3, 3],
[1, 3, 2, 0],
[1, 0, 1, 1],
[1, 3, 1, 3]])
Or with explicit integer-array indexing -
In [22]: m,n = choices.shape
In [23]: x[np.arange(m)[:,None],choices,np.arange(n)]
Out[23]:
array([[1, 3, 1, 1],
[3, 3, 2, 3],
[3, 2, 3, 3],
[1, 3, 2, 0],
[1, 0, 1, 1],
[1, 3, 1, 3]])

As I recently had this issue, I found @divakar's answer useful, but I still wanted a general function for it (independent of the number of dimensions, etc.), so here it is:
import numpy as np
def take_indices_along_axis(array, choices, choice_axis):
    """
    array is N-dim
    choices are integers of N-1 dim,
    with values between 0 and array.shape[choice_axis] - 1
    choice_axis is the axis along which you want to take indices
    """
    nb_dims = len(array.shape)
    list_indices = []
    for this_axis, this_axis_size in enumerate(array.shape):
        if this_axis == choice_axis:
            # means this is the axis along which we want to choose
            list_indices.append(choices)
            continue
        # else, we want arange(this_axis_size), but reshaped to match the purpose
        this_indices = np.arange(this_axis_size)
        reshape_target = [1 for _ in range(nb_dims)]
        reshape_target[this_axis] = this_axis_size  # replace the corresponding axis with the right range
        del reshape_target[choice_axis]  # remove the choice_axis
        list_indices.append(
            this_indices.reshape(tuple(reshape_target))
        )
    tuple_indices = tuple(list_indices)
    return array[tuple_indices]
# test it!
array = np.random.random(size=(10, 10, 10, 10))
choices = np.random.randint(10, size=(10, 10, 10))
assert take_indices_along_axis(array, choices, choice_axis=0)[5, 5, 5] == array[choices[5, 5, 5], 5, 5, 5]
assert take_indices_along_axis(array, choices, choice_axis=2)[5, 5, 5] == array[5, 5, choices[5, 5, 5], 5]

Linear sum of shifted numpy arrays

Given a (m,n) numpy array A, I would like to construct the (m-1,n-1) numpy array B such that B[i,j] equals
A[i+1,j+1]+A[i,j]-A[i+1,j]-A[i,j+1]

In this specific case you can use np.diff twice:
B = np.diff(np.diff(A, axis=0), axis=1)
OR
(probably slower but more general) use linear convolution:
from scipy import signal
B = signal.convolve(A, ((1, -1), (-1, 1)), mode='valid')

B = A[:-1, :-1] + A[1:, 1:] - A[1:, :-1] - A[:-1, 1:]
For example,
In [37]: A = np.arange(24).reshape((6,4))
In [38]: A
Out[38]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])
In [39]: B = A[:-1, :-1] + A[1:, 1:] - A[1:, :-1] - A[:-1, 1:]
In [40]: B
Out[40]:
array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]])
This avoids loops by taking advantage of the fact that NumPy array arithmetic is performed element-wise. So instead of defining B[i,j] in a loop, you express the entire calculation as a sum of array slices.

Sorting an array by corresponding ranges

My goal is to build an array from this list:
list = [-6, -3, -2, -1, 0, 1, 3, 4, 5, 7, 8, 9, 10, 12, 13, 15]
following these rules:
Individual integers that are not followed by their successor (the value plus 1) are added directly to the result array.
Integers that are followed by their successor are added to a range array; that array is then added to the result and reset, to be used for the next range.
The correct output should be:
solution(list)
# => [-6, [-3, -2, -1, 0, 1], [3, 4, 5], [7, 8, 9, 10], [12, 13], 15]
My code and my output are below.
def solution(list)
  result = []
  idx = 0
  loop do
    range = []
    loop do
      if list[idx+1] - list[idx] == 1
        range << list[idx]
        idx += 1
      else
        result << list[idx]
        idx += 1
        break
      end
    end
    result << range
    break if idx == list.size - 1
  end
  result
end
solution(list)
# => [-6, [], 1, [-3, -2, -1, 0], 5, [3, 4], 10, [7, 8, 9], 13, [12]]
The code is not correct. Can you tell me what I am missing?

You're missing chunk_while.
list.chunk_while{|a, b| a.next == b}.map{|a| a.one? ? a.first : a}
# => [-6, [-3, -2, -1, 0, 1], [3, 4, 5], [7, 8, 9, 10], [12, 13], 15]
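If you'd rather keep an explicit loop, here is a sketch of the manual approach with the problems in the original fixed: there, the last member of each run goes straight into result (the else branch) instead of into range, range is appended afterwards even when it is empty, and the outer loop stops before the last element is handled. group_ranges is a hypothetical name:

```ruby
# Walk the list once, growing the current run while each value is exactly
# one more than the previous one; flush the run whenever that breaks.
def group_ranges(list)
  result = []
  run = []
  list.each do |n|
    if run.empty? || run.last + 1 == n
      run << n                                # extend the current run
    else
      result << (run.one? ? run.first : run)  # single values stay unwrapped
      run = [n]                               # start a new run with n
    end
  end
  result << (run.one? ? run.first : run) unless run.empty?
  result
end

list = [-6, -3, -2, -1, 0, 1, 3, 4, 5, 7, 8, 9, 10, 12, 13, 15]
p group_ranges(list)
# => [-6, [-3, -2, -1, 0, 1], [3, 4, 5], [7, 8, 9, 10], [12, 13], 15]
```

The `run.one? ? run.first : run` step plays the same role as the map in the chunk_while version.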

Search for zero in 2D array and make a corresponding row and col 0

This is my code, which works, but it's too big. I want to refactor it.
req_row = -1
req_col = -1
a.each_with_index do |row, index|
  row.each_with_index do |col, i|
    if col == 0
      req_row = index
      req_col = i
      break
    end
  end
end
if req_col > -1 and req_row > -1
  a.each_with_index do |row,index|
    row.each_with_index do |col, i|
      print (req_row == index or i == req_col) ? 0 : col
      print " "
    end
    puts "\r"
  end
end
Input: 2D Array
1 2 3 4
5 6 7 8
9 10 0 11
12 13 14 15
Required output:
1 2 0 4
5 6 0 8
0 0 0 0
12 13 0 15

I'm surprised the Matrix class is not used more:
a = [[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 0, 11],
[12, 13, 14, 15]]
require 'matrix'
m = Matrix.rows(a)
#=> Matrix[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 0, 11], [12, 13, 14, 15]]
r, c = m.index(0)
#=> [2, 2]
Matrix.build(m.row_count, m.column_count) {|i,j| (i==r || j==c) ? 0 : m[i,j]}.to_a
#=> [[ 1, 2, 0, 4],
# [ 5, 6, 0, 8],
# [ 0, 0, 0, 0],
# [12, 13, 0, 15]]
Note that Matrix objects are immutable: to change individual elements, you must create a new matrix.
A slight modification is required if you wish to do this for every zero in the matrix:
a = [[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 0, 11],
[ 0, 13, 14, 15]]
require 'set'
m = Matrix.rows(a)
#=> Matrix[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 0, 11], [0, 13, 14, 15]]
zero_rows = Set.new
zero_columns = Set.new
m.each_with_index { |e,i,j| (zero_rows << i; zero_columns << j) if e.zero? }
zero_rows
#=> #<Set: {2, 3}>
zero_columns
#=> #<Set: {2, 0}>
Matrix.build(m.row_count, m.column_count) do |i,j|
(zero_rows.include?(i) || zero_columns.include?(j)) ? 0 : m[i,j]
end.to_a
#=> [[0, 2, 0, 4],
# [0, 6, 0, 8],
# [0, 0, 0, 0],
# [0, 0, 0, 0]]
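The same every-zero logic also works on plain nested arrays without Matrix, building a new array so the input is left unchanged. A minimal sketch, assuming a rectangular array of numbers; zero_rows_and_columns is a hypothetical name:

```ruby
# Collect every row and column index that holds a 0, then build a new
# array in which those rows and columns are zeroed.
def zero_rows_and_columns(a)
  zero_rows = []
  zero_cols = []
  a.each_with_index do |row, i|
    row.each_with_index do |e, j|
      next unless e.zero?
      zero_rows << i
      zero_cols << j
    end
  end
  a.map.with_index do |row, i|
    row.map.with_index do |e, j|
      (zero_rows.include?(i) || zero_cols.include?(j)) ? 0 : e
    end
  end
end

a = [[ 1,  2,  3,  4],
     [ 5,  6,  7,  8],
     [ 9, 10,  0, 11],
     [ 0, 13, 14, 15]]
p zero_rows_and_columns(a)
# => [[0, 2, 0, 4], [0, 6, 0, 8], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Because the zero positions are collected before anything is written, zeros introduced by the clearing step never trigger further clearing.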

Try this code:
req_row = req_col = -1
a.each_with_index do |row, index|
  req_col = row.index(0) # index of the first 0 in this row, or nil
  if req_col
    req_row = index
    break
  end
end
a.each_with_index do |row,index|
  row.each_with_index do |col, i|
    print ((req_row == index or i == req_col) ? 0 : col).to_s + " "
  end
  puts "\r"
end

Based on the title of your question, here's a solution that first searches for the positions of the zero values (fixate), then actually zeros out the corresponding rows and columns (clear, which is more in line with the contents of your question):
def fixate matrix, search=0, replace=0
  rcs = []
  matrix.each_with_index do |row,r|
    row.each_with_index do |col,c|
      rcs << [ r, c ] if col == search
    end
  end
  rcs.each do |(row, col)|
    clear matrix, row, col, replace
  end
  matrix
end
def clear matrix, row, col, val=0
  matrix[row].map! { |_| val } # Clear rows
  matrix.each { |r| r[col] = val } # Clear columns
  matrix
end
Quick test:
fixate [ # [
[ 1, 2, 3, 4 ], # [ 1, 2, 0, 4 ],
[ 5, 6, 7, 8 ], # [ 5, 6, 0, 8 ],
[ 9, 10, 0, 11 ], # [ 0, 0, 0, 0 ],
[ 12, 13, 14, 15 ] # [ 12, 13, 0, 15 ]
] # ]

Here's what I came up with:
zero_cols = []
a.map do |row|
  row.each_with_index do |el, index|
    zero_cols.push(index) if el == 0
  end
  row.fill(0) if row.include?(0)
end
a.map { |row| zero_cols.each { |index| row[index] = 0 } }
First, use map to iterate through the rows and fill a row with zeros if it contains at least one 0; while doing so, record the indexes of the columns that contain a 0 in the zero_cols array.
Afterwards, iterate through the array once more and, in each row, set the entries whose index is in zero_cols to 0.
You may know the map method as collect. They do the same thing.
Side note:
If the array contains multiple zeros, this code will zero out every applicable row and column. The OP's example and some other answers here will only zero out the first row and column in which a 0 is found. If that is the behavior you expect, see @Doguita's answer.

I don't know if this code is better than the other answers. I will test it later:
ary = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 0, 11], [12, 13, 14, 15]]
col = nil
row = ary.index{ |a| col = a.index(0) }
ary.each_with_index { |a, i| i == row ? a.fill(0) : a[col] = 0 } if col
p ary
# => [[1, 2, 0, 4], [5, 6, 0, 8], [0, 0, 0, 0], [12, 13, 0, 15]]
Note: This answer assumes there is only one 0 in the array.
