Linear sum of shifted numpy arrays

Given a (m,n) numpy array A, I would like to construct the (m-1,n-1) numpy array B such that B[i,j] equals
A[i+1,j+1]+A[i,j]-A[i+1,j]-A[i,j+1]

In this specific case you can use np.diff twice:
B = np.diff(np.diff(A, axis=0), axis=1)
or (probably slower but more general) use a 2D convolution:
from scipy import signal
B = signal.convolve(A, ((1, -1), (-1, 1)), mode='valid')

Alternatively, B can be written directly in terms of four shifted slices:
B = A[:-1, :-1] + A[1:, 1:] - A[1:, :-1] - A[:-1, 1:]
For example,
In [37]: A = np.arange(24).reshape((6,4))
In [38]: A
Out[38]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])
In [39]: B = A[:-1, :-1] + A[1:, 1:] - A[1:, :-1] - A[:-1, 1:]
In [40]: B
Out[40]:
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])
This avoids loops by taking advantage of the fact that NumPy array arithmetic is performed element-wise. So instead of defining B[i,j] in a loop, you express the entire calculation as a sum of array slices.
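As a quick cross-check (a minimal sketch, not part of the original answers, assuming scipy is available), all three approaches agree on a random array:
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
A = rng.integers(0, 10, size=(5, 6))

B_diff = np.diff(np.diff(A, axis=0), axis=1)
B_conv = signal.convolve(A, [[1, -1], [-1, 1]], mode='valid')
B_slice = A[:-1, :-1] + A[1:, 1:] - A[1:, :-1] - A[:-1, 1:]

print(np.array_equal(B_diff, B_conv) and np.array_equal(B_diff, B_slice))  # True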


Confusion regarding resulting shape of a multi-dimensional slicing of a numpy array

Suppose we have
t = np.random.rand(2,3,4)
i.e., a 2x3x4 tensor.
I'm having trouble understanding why the shape of t[0][:][:2] is 2x4 rather than 3x2.
Aren't we taking the 0th index along the 1st dimension, everything along the 2nd, and the first two along the 3rd, in which case that would give us a 3x2 tensor?
In [1]: t = np.arange(24).reshape(2,3,4)
In [2]: t
Out[2]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
Select the 1st plane:
In [3]: t[0]
Out[3]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
[:] selects everything, i.e. no change:
In [4]: t[0][:]
Out[4]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Select first 2 rows of that plane:
In [5]: t[0][:][:2]
Out[5]:
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
While [i][j] works for integer indices, it shouldn't be used for slices or arrays. Each [] acts on the result of the previous one; [:] is not a 'placeholder' for the 'middle dimension'. Python syntax and execution order apply even when using numpy: numpy just adds an array class and functions, it doesn't change the syntax.
Instead you want:
In [6]: t[0,:,:2]
Out[6]:
array([[0, 1],
[4, 5],
[8, 9]])
The result is the 1st plane, all rows, and the first 2 columns. With all 3 indices in one [], they are applied together (in a coordinated manner), not sequentially.
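To make the evaluation order concrete, here is a small sketch (mine, not from the original answer) spelling out the shape after each bracket:
import numpy as np

t = np.arange(24).reshape(2, 3, 4)

step1 = t[0]              # shape (3, 4): the first plane
step2 = step1[:]          # shape (3, 4): a full slice, no change
step3 = step2[:2]         # shape (2, 4): first two rows of that plane
combined = t[0, :, :2]    # shape (3, 2): plane 0, all rows, first two columns
print(step3.shape, combined.shape)   # (2, 4) (3, 2)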
There is a gotcha when using a slice in the middle of 'advanced' indices: the same selection as before, but transposed.
In [8]: t[0,:,[0,1]]
Out[8]:
array([[0, 4, 8],
[1, 5, 9]])
For this you need a partial decomposition, applying the 0 first:
In [9]: t[0][:,[0,1]]
Out[9]:
array([[0, 1],
[4, 5],
[8, 9]])
There is a big indexing page in the numpy docs that you need to study sooner or later.

Eliminating array rows based on a property of consecutive pairs of elements

We are given an array sample a, shown below, and a constant c.
import numpy as np
a = np.array([[1, 3, 1, 11, 9, 14],
              [2, 12, 1, 10, 7, 6],
              [6, 7, 2, 14, 2, 15],
              [14, 8, 1, 3, -7, 2],
              [0, -3, 0, 3, -3, 0],
              [2, 2, 3, 3, 12, 13],
              [3, 14, 4, 12, 1, 4],
              [0, 13, 13, 4, 0, 3]])
c = 2
It is convenient, in this problem, to think of each array row as being composed of three pairs, so the 1st row is [1,3, 1,11, 9,14].
DEFINITION: d_min is the minimum absolute difference between the elements of two consecutive pairs.
The PROBLEM: I want to retain the rows of array a where every pair of consecutive pairs has d_min <= c. Otherwise, the rows should be eliminated.
In the 1st array row, the 1st pair (1,3) and the 2nd pair (1,11) have d_min = 1-1=0.
The 2nd pair (1,11) and the 3rd pair(9,14) have d_min = 11-9=2. (in both cases, d_min<=c, so we keep this row in a)
In the 2nd array row, the 1st pair (2,12) and the 2nd pair (1,10) have d_min = 2-1=1.
But, the 2nd pair (1,10) and the 3rd pair(7,6) have d_min = 10-7=3. (3 > c, so this row should be eliminated from array a)
Current efforts: I currently handle this problem with nested for-loops (2 deep).
The outer loop runs through the rows of array a, determining d_min between the first two pairs using:
for r in a:
    d_min = np.amin(np.abs(np.subtract.outer(r[:2], r[2:4])))
The inner loop uses the same method to determine the d_min between the last two pairs.
Further processing is done only when d_min <= c for both sets of consecutive pairs.
I'm really hoping there is a way to avoid the for-loops. I eventually need to deal with 8-column arrays, and my current approach would involve 3-deep looping.
In the example, there are 4 row eliminations. The final result should look like:
a = np.array([[1, 3, 1, 11, 9, 14],
              [0, -3, 0, 3, -3, 0],
              [3, 14, 4, 12, 1, 4],
              [0, 13, 13, 4, 0, 3]])
Assume the number of elements in each row is always even:
import numpy as np
a = np.array([[1, 3, 1, 11, 9, 14],
              [2, 12, 1, 10, 7, 6],
              [6, 7, 2, 14, 2, 15],
              [14, 8, 1, 3, -7, 2],
              [0, -3, 0, 3, -3, 0],
              [2, 2, 3, 3, 12, 13],
              [3, 14, 4, 12, 1, 4],
              [0, 13, 13, 4, 0, 3]])
c = 2
# separate the array as previous pairs and next pairs
sx, sy = a.shape
prev_shape = sx, (sy - 2) // 2, 1, 2
next_shape = sx, (sy - 2) // 2, 2, 1
prev_pairs = a[:, :-2].reshape(prev_shape)
next_pairs = a[:, 2:].reshape(next_shape)
# subtract; numpy broadcasting makes this an outer subtraction, then
# calculate the minimum absolute difference for each pair of consecutive pairs
pair_diff_min = np.abs(prev_pairs - next_pairs).min(axis=(2, 3))
# calculate the filter condition as boolean array
to_keep = pair_diff_min.max(axis=1) <= c
print(a[to_keep])
#[[ 1 3 1 11 9 14]
# [ 0 -3 0 3 -3 0]
# [ 3 14 4 12 1 4]
# [ 0 13 13 4 0 3]]
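To address the 8-column case mentioned in the question, the same logic can be wrapped in a function; this is a minimal sketch of mine (the name keep_close_pair_rows is not from the answer), and it works for any even number of columns:
def keep_close_pair_rows(a, c):
    # sketch wrapper, not from the original answer
    sx, sy = a.shape                  # sy must be even
    n = (sy - 2) // 2                 # number of consecutive-pair comparisons
    prev_pairs = a[:, :-2].reshape(sx, n, 1, 2)
    next_pairs = a[:, 2:].reshape(sx, n, 2, 1)
    pair_diff_min = np.abs(prev_pairs - next_pairs).min(axis=(2, 3))
    return a[pair_diff_min.max(axis=1) <= c]

print(keep_close_pair_rows(a, c))     # same four rows as above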

How to optimise code that parses a 2-d array in Ruby

Note: This question poses a problem that I have already solved, however I feel my solution is very rudimentary and that other people, like myself, would benefit from a discussion with input from more experienced developers. Different approaches to solving the problem, as well as more sophisticated methods and algorithms would be really appreciated. I feel this is a good place to learn how Ruby can tackle what I consider to be a fairly difficult problem for a beginner.
Given a 6x6 2D Array arr:
1 1 1 0 0 0
0 1 0 0 0 0
1 1 1 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
We define an hourglass in arr to be a subset of values with indices falling in this pattern in arr's graphical representation:
a b c
d
e f g
There are 16 hourglasses in arr and an hourglass sum is the sum of an hourglass' values. Calculate the hourglass sum for every hourglass in arr, then print the maximum hourglass sum.
For example, given the 2D array:
arr = [
[-9, -9, -9, 1, 1, 1],
[ 0, -9, 0, 4, 3, 2],
[-9, -9, -9, 1, 2, 3],
[ 0, 0, 8, 6, 6, 0],
[ 0, 0, 0, -2, 0, 0],
[ 0, 0, 1, 2, 4, 0]
]
We calculate the following hourglass values:
-63, -34, -9, 12,
-10, 0, 28, 23,
-27, -11, -2, 10,
9, 17, 25, 18
Our highest hourglass value is from the hourglass:
0 4 3
1
8 6 6
My solution is:
def hourglass_sum(arr)
  hourglasses = []
  arr.each_with_index do |row, i|
    # guard to prevent iterating outside the array
    unless arr[i].nil?
      arr[i].length.times do |iteration|
        # generate n 3x3 arrays
        r1 = arr[i][iteration...iteration+3]
        r2 = arr[i+1][iteration...iteration+3] if arr[i+1] != nil
        r3 = arr[i+2][iteration...iteration+3] if arr[i+2] != nil
        # guard to stop creating 3x3 arrays that fall outside the given input array
        if arr[i+1] != nil && arr[i+2] != nil
          # keep the top row, the centre of the middle row, and the bottom row (7 of the 9 values)
          result = r1 + [r2[1]] + r3
          hourglasses << result.sum unless result.include? nil
        end
      end
    end
  end
  p hourglasses.max
end
arr = [[-9, -9, -9, 1, 1, 1], [0, -9, 0, 4, 3, 2], [-9, -9, -9, 1, 2, 3], [0, 0, 8, 6, 6, 0], [0, 0 ,0, -2, 0, 0], [0, 0, 1, 2, 4, 0]]
hourglass_sum(arr)
# => 28
One option is to use Matrix methods.
require 'matrix'
ma = Matrix[*arr]
#=> Matrix[[-9, -9, -9, 1, 1, 1],
# [ 0, -9, 0, 4, 3, 2],
# [-9, -9, -9, 1, 2, 3],
# [ 0, 0, 8, 6, 6, 0],
# [ 0, 0, 0, -2, 0, 0],
# [ 0, 0, 1, 2, 4, 0]]
mi = Matrix.build(6-3+1) { |i,j| [i,j] }
#=> Matrix[[[0, 0], [0, 1], [0, 2], [0, 3]],
# [[1, 0], [1, 1], [1, 2], [1, 3]],
# [[2, 0], [2, 1], [2, 2], [2, 3]],
# [[3, 0], [3, 1], [3, 2], [3, 3]]]
def hourglass_val(r,c,ma)
  mm = ma.minor(r,3,c,3)
  mm.sum - mm[1,0] - mm[1,2]
end
max_hg = mi.max_by { |r,c| hourglass_val(r,c,ma) }
#=> [1,2]
hourglass_val(*max_hg,ma)
#=> 28
[1,2] are the row and column indices of the top-left corner of an optimal hourglass in arr.
Here is an option I came up with.
def width_height(matrix)
  [matrix.map(&:size).max || 0, matrix.size]
end

def sum_with_weight_matrix(number_matrix, weight_matrix)
  number_width, number_height = width_height(number_matrix)
  weight_width, weight_height = width_height(weight_matrix)
  width_diff = number_width - weight_width
  height_diff = number_height - weight_height

  0.upto(height_diff).map do |y|
    0.upto(width_diff).map do |x|
      weight_height.times.sum do |ry|
        weight_width.times.sum do |rx|
          weight = weight_matrix.dig(ry, rx) || 0
          number = number_matrix.dig(y + ry, x + rx) || 0
          number * weight
        end
      end
    end
  end
end
arr = [
[-9, -9, -9, 1, 1, 1],
[ 0, -9, 0, 4, 3, 2],
[-9, -9, -9, 1, 2, 3],
[ 0, 0, 8, 6, 6, 0],
[ 0, 0, 0, -2, 0, 0],
[ 0, 0, 1, 2, 4, 0],
]
weights = [
[1, 1, 1],
[0, 1, 0],
[1, 1, 1],
]
sum_matrix = sum_with_weight_matrix(arr, weights)
#=> [
# [-63, -34, -9, 12],
# [-10, 0, 28, 23],
# [-27, -11, -2, 10],
# [ 9, 17, 25, 18]
# ]
max_sum = sum_matrix.flatten.max
#=> 28
This solution uses width_diff and height_diff to create the output matrix (4x4 for the sample data, since 0.upto(6 - 3).to_a #=> [0, 1, 2, 3]). The indices of the weight_matrix (rx and ry) are used as offsets relative to the larger number_matrix.
If your 2d array always has the same number of elements in each sub-array, you can replace matrix.map(&:size).max with matrix[0]&.size || 0 to speed up determining the matrix width. The current solution uses the maximum size of the sub-arrays; smaller sub-arrays use 0 for the missing elements, thus not affecting the sum.
My solution might be a bit variable heavy. I've done this to have descriptive variable names, that hopefully tell you most you need to know about the solution. You can shorten variable names, or remove them completely when you feel like you don't need them.
If something isn't clear just ask away in the comments.
Without using the Matrix class, here's how I've done it for any arbitrary rectangular array:
offsets = [[-1, -1], [-1, 0], [-1, 1], [0, 0], [1, -1], [1, 0], [1, 1]]
sums = 1.upto(arr.length - 2).flat_map do |i|
  1.upto(arr[0].length - 2).map do |j|
    offsets.map { |(x, y)| arr[i + x][j + y] }.sum
  end
end
puts sums.max
The values we're interested in are just offsets from a current position: map the array values relative to that position by each row and column offset, sum them, then take the maximum of the sums.

NumPy: indexing array by list of tuples - how to do it correctly?

I am in the following situation. I have:
a, a multidimensional numpy array of n dimensions
t, an array of k rows (tuples), each with n elements. In other words, each row in this array is an index into a.
What I want: from a, return an array b with k scalar elements, the ith element in b being the result of indexing a with the ith tuple from t.
Seems trivial enough. The following approach, however, does not work
def get(a, t):
    # wrong result + takes way too long
    return a[t]
I have to resort to doing this iteratively, i.e. the following works correctly:
def get(a, t):
    res = []
    for ind in t:
        a_scalar = a
        for i in ind:
            a_scalar = a_scalar[i]
        # a_scalar is now a scalar
        res.append(a_scalar)
    return res
This works, but given that each dimension of a has over 30 elements, the procedure gets really slow once n exceeds 5. I understand that it would be slow regardless; however, I would like to exploit numpy's capabilities, as I believe that would speed up this process considerably.
The key to getting this right is to understand the roles of indexing lists and tuples. Often the two are treated the same, but in numpy indexing, tuples, lists and arrays convey different information.
In [1]: a = np.arange(12).reshape(3,4)
In [2]: t = np.array([(0,0),(1,1),(2,2)])
In [4]: a
Out[4]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [5]: t
Out[5]:
array([[0, 0],
[1, 1],
[2, 2]])
You tried:
In [6]: a[t]
Out[6]:
array([[[ 0,  1,  2,  3],
        [ 0,  1,  2,  3]],

       [[ 4,  5,  6,  7],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [ 8,  9, 10, 11]]])
So what's wrong with it? It ran, but selected a (3,2) array of rows of a. That is, it applied t to just the first dimension, effectively a[t, :]. You want to index on all dimensions, some sort of a[t1, t2]. That's the same as a[(t1,t2)] - a tuple of indices.
In [10]: a[tuple(t[0])] # a[(0,0)]
Out[10]: 0
In [11]: a[tuple(t[1])] # a[(1,1)]
Out[11]: 5
In [12]: a[tuple(t[2])]
Out[12]: 10
or doing all at once:
In [13]: a[(t[:,0], t[:,1])]
Out[13]: array([ 0, 5, 10])
Another way to write it is with n lists (or arrays), one for each dimension:
In [14]: a[[0,1,2],[0,1,2]]
Out[14]: array([ 0, 5, 10])
In [18]: tuple(t.T)
Out[18]: (array([0, 1, 2]), array([0, 1, 2]))
In [19]: a[tuple(t.T)]
Out[19]: array([ 0, 5, 10])
More generally, in a[idx1, idx2] the array idx1 is broadcast against idx2 to produce a full selection array. Here the 2 arrays are 1d and match, so the selection is your t set of pairs. But the same principle applies to selecting a set of rows and columns, a[ [[0],[2]], [0,2,3] ].
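To illustrate that broadcasting claim, a small sketch (mine, reusing the 3x4 a defined above): a column of row indices against a row of column indices selects a block:
rows = np.array([[0], [2]])      # shape (2, 1)
cols = np.array([0, 2, 3])       # shape (3,)
a[rows, cols]
# array([[ 0,  2,  3],
#        [ 8, 10, 11]])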
Using the ideas in [10] and following, your get could be sped up with:
In [20]: def get(a, t):
...: res = []
...: for ind in t:
...: res.append(a[tuple(ind)]) # index all dimensions at once
...: return res
...:
In [21]: get(a,t)
Out[21]: [0, 5, 10]
If t really was a list of tuples (as opposed to an array built from them), your get could be:
In [23]: tl = [(0,0),(1,1),(2,2)]
In [24]: [a[ind] for ind in tl]
Out[24]: [0, 5, 10]
Explore using np.ravel_multi_index
Create some test data
arr = np.arange(10**4)
arr.shape = 10, 10, 10, 10
t = []
for j in range(5):
    t.append(tuple(np.random.randint(10, size=4)))
print(t)
# [(1, 8, 2, 0),
#  (2, 3, 3, 6),
#  (1, 4, 8, 5),
#  (2, 2, 6, 3),
#  (0, 5, 0, 2)]
ta = np.array(t).T
print(ta)
# array([[1, 2, 1, 2, 0],
# [8, 3, 4, 2, 5],
# [2, 3, 8, 6, 0],
# [0, 6, 5, 3, 2]])
arr.ravel()[np.ravel_multi_index(tuple(ta), (10,10,10,10))]
# array([1820, 2336, 1485, 2263,  502])
np.ravel_multi_index computes, from the tuple of index arrays, the corresponding indices into the flattened version of an array of the given shape (here (10, 10, 10, 10)).
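As a small sketch of what that means here (mine, reusing the ta array built above), the flat index for C order is just the stride arithmetic i*1000 + j*100 + k*10 + l:
flat_by_hand = ta[0]*1000 + ta[1]*100 + ta[2]*10 + ta[3]
print(np.array_equal(flat_by_hand, np.ravel_multi_index(tuple(ta), (10, 10, 10, 10))))  # True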
Does this do what you need? Is it fast enough?

Use 2d array as list of indices for n-D array

If I have the following data:
A = np.random.random((3, 4, 5))
# np.all(indices < A.shape) is true
indices = np.array([
[0, 0, 0],
[1, 2, 4],
...
[2, 3, 4]
])
How can I use each row of indices as a set of axis indices into A to give the following?
B = np.array([
A[0, 0, 0],
A[1, 2, 4],
...
A[2, 3, 4]
])
Here's a 2d example:
In [1]: A=np.arange(10,22).reshape(3,4)
In [2]: A
Out[2]:
array([[10, 11, 12, 13],
[14, 15, 16, 17],
[18, 19, 20, 21]])
In [3]: ind=np.array([[0,1],[1,3],[2,0],[0,2]])
In [4]: ind
Out[4]:
array([[0, 1],
[1, 3],
[2, 0],
[0, 2]])
In [5]: A[ind[:,0],ind[:,1]]
Out[5]: array([11, 17, 18, 12])
or for your variables,
A[indices[:,0], indices[:,1], indices[:,2]]
Or more generally:
In [8]: tuple(ind.T)
Out[8]: (array([0, 1, 2, 0]), array([1, 3, 0, 2]))
In [9]: A[tuple(ind.T)]
Out[9]: array([11, 17, 18, 12])
This is based on the idea that A[a,b] is the same as A[(a,b)]. And when a and b are matching lists or arrays, it selects values by pairing them up, roughly the same as
[A[i,j] for i,j in zip(a,b)]
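A quick sketch of that equivalence (mine, reusing A and ind from the 2d example above):
rows, cols = ind[:, 0], ind[:, 1]
looped = np.array([A[i, j] for i, j in zip(rows, cols)])
print(np.array_equal(A[rows, cols], looped))   # True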
For product-like (outer) indexing, the index arrays need more dimensions; np.ix_ is a handy way of generating such arrays:
In [53]: np.ix_(ind[:,0],ind[:,1])
Out[53]:
(array([[0],
[1],
[2],
[0]]), array([[1, 3, 0, 2]]))
In [54]: A[np.ix_(ind[:,0],ind[:,1])]
Out[54]:
array([[11, 13, 10, 12],
[15, 17, 14, 16],
[19, 21, 18, 20],
[11, 13, 10, 12]])
In [56]: A[ind[:,[0]],ind[:,1]]
Out[56]:
array([[11, 13, 10, 12],
[15, 17, 14, 16],
[19, 21, 18, 20],
[11, 13, 10, 12]])
You could use np.ravel_multi_index to generate the linear indices and then extract the selected elements from A with linear indexing using np.take, like so:
np.take(A,np.ravel_multi_index(indices.T,A.shape))
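A quick cross-check of this against plain tuple indexing (a sketch of mine, using only the three index rows listed explicitly in the question; the elided "..." rows are omitted):
import numpy as np

A = np.random.random((3, 4, 5))
indices = np.array([[0, 0, 0], [1, 2, 4], [2, 3, 4]])

via_take = np.take(A, np.ravel_multi_index(indices.T, A.shape))
via_tuple = A[tuple(indices.T)]
print(np.allclose(via_take, via_tuple))   # True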
