Axes Swapping in higher degree numpy arrays

So I was embarking on a mission to figure out how the numpy swapaxes function operates, and reached a roadblock when it came to swapping axes in arrays of more than 3 dimensions.
Say
import numpy as np
array=np.arange(24).reshape(3,2,2,2)
This would create a numpy array of shape (3,2,2,2) with elements 0-23. Can someone explain to me how exactly axes swapping works in this case, where we cannot visualise the four axes separately?
Say I want to swap axes 0 and 2.
array.swapaxes(0,2)
It would be great if someone could actually describe the abstract swapping which is occurring when there are 4 or more axes. Thanks!

How do you 'describe' a 4d array? We don't have intuitions to match; the best we can do is project from 2d experience: rows, cols, planes, and then what?
This array is small enough to show the actual print:
In [271]: arr = np.arange(24).reshape(3,2,2,2)
In [272]: arr
Out[272]:
array([[[[ 0,  1],
         [ 2,  3]],

        [[ 4,  5],
         [ 6,  7]]],


       [[[ 8,  9],
         [10, 11]],

        [[12, 13],
         [14, 15]]],


       [[[16, 17],
         [18, 19]],

        [[20, 21],
         [22, 23]]]])
The print marks the higher dimensions with extra [] and blank lines.
In [273]: arr.swapaxes(0,2)
Out[273]:
array([[[[ 0,  1],
         [ 8,  9],
         [16, 17]],

        [[ 4,  5],
         [12, 13],
         [20, 21]]],


       [[[ 2,  3],
         [10, 11],
         [18, 19]],

        [[ 6,  7],
         [14, 15],
         [22, 23]]]])
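In index terms, swapping axes 0 and 2 just permutes the index tuple: the element at (i, j, k, l) in the original shows up at (k, j, i, l) in the result. A quick brute-force check of that (a sketch; any small array would do):

import numpy as np

arr = np.arange(24).reshape(3, 2, 2, 2)
swapped = arr.swapaxes(0, 2)   # shape (2, 2, 3, 2)

# swapped[k, j, i, l] is the very same element as arr[i, j, k, l]
for i in range(3):
    for j in range(2):
        for k in range(2):
            for l in range(2):
                assert swapped[k, j, i, l] == arr[i, j, k, l]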
To see what's actually being done, we have to look at the underlying properties of the arrays:
In [274]: arr.__array_interface__
Out[274]:
{'data': (188452024, False),
 'descr': [('', '<i4')],
 'shape': (3, 2, 2, 2),
 'strides': None,   # arr.strides = (32, 16, 8, 4)
 'typestr': '<i4',
 'version': 3}
In [275]: arr.swapaxes(0,2).__array_interface__
Out[275]:
{'data': (188452024, False),
 'descr': [('', '<i4')],
 'shape': (2, 2, 3, 2),
 'strides': (8, 16, 32, 4),
 'typestr': '<i4',
 'version': 3}
The data attributes are the same - the swap is a view, sharing the data buffer with the original. So no numbers are moved around.
The shape change is obvious - that's what we told it to swap. (Sometimes it helps to make all the dimensions different, e.g. (2,3,4).)
It has also swapped two of the strides values, though how that affects the display is harder to explain. We have to know something about how shape and strides work together to create a multidimensional array (from a flat data buffer).
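Here's a rough sketch of that shape/strides machinery, using np.lib.stride_tricks.as_strided and assuming the 4-byte ints and (32, 16, 8, 4) byte strides shown above. The element at index (i, j, k, l) lives at byte offset i*32 + j*16 + k*8 + l*4, so reordering the strides together with the shape is all the swap needs - no data moves:

import numpy as np
from numpy.lib.stride_tricks import as_strided

arr = np.arange(24, dtype='<i4').reshape(3, 2, 2, 2)   # strides (32, 16, 8, 4)

# reorder shape and strides together; the data buffer is untouched
manual_swap = as_strided(arr, shape=(2, 2, 3, 2), strides=(8, 16, 32, 4))

assert (manual_swap == arr.swapaxes(0, 2)).all()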

Related

reshaping tensors for multi head attention in pytorch - view vs transpose

I'm learning about the attention operator in the deep learning domain. I understand that to compute multi head attention efficiently in parallel, the input tensors (query, key, value) have to be reshaped properly.
Assuming query, key and value are three tensors of identical shape [N, L, D], in which
N is the batch size
L is the sequence length
D is the hidden/embedding size,
they should be turned into [N*N_H, L, D_H] tensors, where N_H is the number of heads for the attention layer and D_H is the embedding size of each head.
The pytorch code seems to do exactly that. Below is the code for reshaping the query tensor (key and value are handled the same way):
q = q.contiguous().view(tgt_len, bsz * num_heads, head_dim).transpose(0, 1)
I don't get why they perform both a view and a transpose call, when the result would be the same by just doing
q = q.contiguous().view(bsz * num_heads, tgt_len, head_dim)
Other than avoiding an additional function call, using view alone also guarantees that the resulting tensor is still contiguous in memory, whereas this doesn't hold (to the best of my knowledge) for transpose. I suppose working with contiguous data is beneficial whenever possible to make computations potentially faster (fewer memory accesses, better exploitation of spatial locality, etc.).
What's the use case for having a transpose call after a view?
The results AREN'T necessarily the same:
import torch

a = torch.arange(0, 2 * 3 * 4)

b = a.view(2, 3, 4).transpose(1, 0)
# tensor([[[ 0,  1,  2,  3],
#          [12, 13, 14, 15]],
#
#         [[ 4,  5,  6,  7],
#          [16, 17, 18, 19]],
#
#         [[ 8,  9, 10, 11],
#          [20, 21, 22, 23]]])

c = a.view(3, 2, 4)
# tensor([[[ 0,  1,  2,  3],
#          [ 4,  5,  6,  7]],
#
#         [[ 8,  9, 10, 11],
#          [12, 13, 14, 15]],
#
#         [[16, 17, 18, 19],
#          [20, 21, 22, 23]]])
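The same regrouping happens with the attention shapes. A minimal shape-only sketch, assuming q arrives in the (tgt_len, bsz, embed_dim) layout the PyTorch snippet above operates on: view splits the embedding dimension into heads, and the transpose then brings the combined batch*heads axis to the front without mixing values across sequence positions, whereas the direct view regroups elements across positions, just like b vs c above:

import torch

tgt_len, bsz, num_heads, head_dim = 4, 2, 3, 5
q = torch.randn(tgt_len, bsz, num_heads * head_dim)

via_transpose = q.contiguous().view(tgt_len, bsz * num_heads, head_dim).transpose(0, 1)
via_view_only = q.contiguous().view(bsz * num_heads, tgt_len, head_dim)

print(via_transpose.shape)                        # torch.Size([6, 4, 5])
print(via_view_only.shape)                        # torch.Size([6, 4, 5])
print(torch.equal(via_transpose, via_view_only))  # False: same shape, different grouping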

Confusion regarding resulting shape of a multi-dimensional slicing of a numpy array

Suppose we have
t = np.random.rand(2,3,4)
i.e., a 2x3x4 tensor.
I'm having trouble understanding why the shape of t[0][:][:2] is 2x4 rather than 3x2.
Aren't we taking the 0th, all, and first two indices of the 1st, 2nd, and 3rd dimensions respectively, in which case that would give us a 3x2 tensor?
In [1]: t = np.arange(24).reshape(2,3,4)
In [2]: t
Out[2]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
Select the 1st plane:
In [3]: t[0]
Out[3]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
[:] selects everything - i.e. no change:
In [4]: t[0][:]
Out[4]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Select first 2 rows of that plane:
In [5]: t[0][:][:2]
Out[5]:
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
While [i][j] works for integer indices, it shouldn't be used for slices or arrays. Each [] acts on the result of the previous one; [:] is not a 'placeholder' for the 'middle dimension'. Python syntax and execution order apply even when using numpy. numpy just adds an array class and functions; it doesn't change the syntax.
Instead you want:
In [6]: t[0,:,:2]
Out[6]:
array([[0, 1],
       [4, 5],
       [8, 9]])
The result is the 1st plane and first 2 columns (and all rows). With all 3 dimensions in one [], they are applied in a coordinated manner, not sequentially.
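A compact way to see the two behaviours side by side (a small sketch):

import numpy as np

t = np.arange(24).reshape(2, 3, 4)

print(t[0][:][:2].shape)   # (2, 4): each [] acts on the previous result
print(t[0, :, :2].shape)   # (3, 2): one coordinated selection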
There is a gotcha when using a slice in the middle of 'advanced' indices. Same selection as before, but transposed:
In [8]: t[0,:,[0,1]]
Out[8]:
array([[0, 4, 8],
       [1, 5, 9]])
For this you need a partial decomposition - applying the 0 first:
In [9]: t[0][:,[0,1]]
Out[9]:
array([[0, 1],
       [4, 5],
       [8, 9]])
There is a big indexing page in the numpy docs that you need to study sooner or later.

Splitting a matrix into arrays with different column sizes

I want to split a matrix into arrays with different column sizes. I'm able to do it with a for loop, however I'm curious if it could be done in a faster way using some command.
Let's say for example that I have the following matrix:
matrix = [[ 1,  2,  3,  4],
          [ 5,  6,  7,  8],
          [ 9, 10, 11, 12],
          [13, 14, 15, 16]]
Now I would like to obtain a 2D-array which looks as follows:
desired_array = [[1],
                 [5, 6],
                 [9, 10, 11],
                 [13, 14, 15, 16]]
I want this since I would like the sum per row of the desired_array. Maybe there is another solution to obtain that sum, without using a for loop?
Thank you!
You just want the row-wise sum of the lower triangular matrix.
>>> np.tril(matrix).sum(1)
array([ 1, 11, 30, 58])
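Spelled out as a runnable snippet (a sketch): np.tril zeroes out everything above the main diagonal, so each row sum only includes the columns you wanted to keep:

import numpy as np

matrix = np.array([[ 1,  2,  3,  4],
                   [ 5,  6,  7,  8],
                   [ 9, 10, 11, 12],
                   [13, 14, 15, 16]])

lower = np.tril(matrix)
# array([[ 1,  0,  0,  0],
#        [ 5,  6,  0,  0],
#        [ 9, 10, 11,  0],
#        [13, 14, 15, 16]])

row_sums = lower.sum(axis=1)   # array([ 1, 11, 30, 58])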

NumPy: indexing array by list of tuples - how to do it correctly?

I am in the following situation:
Multidimensional numpy array a of n dimensions
t, an array of k rows (tuples), each with n elements. In other words, each row in this array is an index in a
What I want: from a, return an array b with k scalar elements, the ith element in b being the result of indexing a with the ith tuple from t.
Seems trivial enough. The following approach, however, does not work:
def get(a, t):
    # wrong result + takes way too long
    return a[t]
I have to resort to doing this iteratively, i.e. the following works correctly:
def get(a, t):
    res = []
    for ind in t:
        a_scalar = a
        for i in ind:
            a_scalar = a_scalar[i]
        # a_scalar is now a scalar
        res.append(a_scalar)
    return res
This works, except that, with each dimension in a having over 30 elements, the procedure gets really slow once n exceeds 5. I understand it would be slow regardless; however, I would like to exploit numpy's capabilities, as I believe that would speed this process up considerably.
The key to getting this right is to understand the roles of indexing lists and tuples. Often the two are treated the same, but in numpy indexing, tuples, lists and arrays convey different information.
In [1]: a = np.arange(12).reshape(3,4)
In [2]: t = np.array([(0,0),(1,1),(2,2)])
In [4]: a
Out[4]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [5]: t
Out[5]:
array([[0, 0],
       [1, 1],
       [2, 2]])
You tried:
In [6]: a[t]
Out[6]:
array([[[ 0,  1,  2,  3],
        [ 0,  1,  2,  3]],

       [[ 4,  5,  6,  7],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [ 8,  9, 10, 11]]])
So what's wrong with it? It ran, but selected a (3,2) array of rows of a. That is, it applied t to just the first dimension, effectively a[t, :]. You want to index on all dimensions, some sort of a[t1, t2]. That's the same as a[(t1,t2)] - a tuple of indices.
In [10]: a[tuple(t[0])] # a[(0,0)]
Out[10]: 0
In [11]: a[tuple(t[1])] # a[(1,1)]
Out[11]: 5
In [12]: a[tuple(t[2])]
Out[12]: 10
or doing all at once:
In [13]: a[(t[:,0], t[:,1])]
Out[13]: array([ 0, 5, 10])
Another way to write it is with n lists (or arrays), one for each dimension:
In [14]: a[[0,1,2],[0,1,2]]
Out[14]: array([ 0, 5, 10])
In [18]: tuple(t.T)
Out[18]: (array([0, 1, 2]), array([0, 1, 2]))
In [19]: a[tuple(t.T)]
Out[19]: array([ 0, 5, 10])
More generally, in a[idx1, idx2] array idx1 is broadcast against idx2 to produce a full selection array. Here the 2 arrays are 1d and match, the selection is your t set of pairs. But the same principle applies to selecting a set of rows and columns, a[ [[0],[2]], [0,2,3] ].
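For instance, a quick sketch of that last block selection: the (2,1) column of row indices broadcasts against the (3,) list of column indices, picking a 2x3 block of rows 0 and 2, columns 0, 2 and 3:

import numpy as np

a = np.arange(12).reshape(3, 4)

print(a[[[0], [2]], [0, 2, 3]])
# [[ 0  2  3]
#  [ 8 10 11]]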
Using the ideas in [10] and following, your get could be sped up with:
In [20]: def get(a, t):
    ...:     res = []
    ...:     for ind in t:
    ...:         res.append(a[tuple(ind)])  # index all dimensions at once
    ...:     return res
    ...:
In [21]: get(a,t)
Out[21]: [0, 5, 10]
If t really was a list of tuples (as opposed to an array built from them), your get could be:
In [23]: tl = [(0,0),(1,1),(2,2)]
In [24]: [a[ind] for ind in tl]
Out[24]: [0, 5, 10]
Explore using np.ravel_multi_index
Create some test data:
arr = np.arange(10**4)
arr.shape = 10, 10, 10, 10

t = []
for j in range(5):
    t.append(tuple(np.random.randint(10, size=4)))
print(t)
# [(1, 8, 2, 0),
#  (2, 3, 3, 6),
#  (1, 4, 8, 5),
#  (2, 2, 6, 3),
#  (0, 5, 0, 2)]

ta = np.array(t).T
print(ta)
# array([[1, 2, 1, 2, 0],
#        [8, 3, 4, 2, 5],
#        [2, 3, 8, 6, 0],
#        [0, 6, 5, 3, 2]])

arr.ravel()[np.ravel_multi_index(tuple(ta), (10, 10, 10, 10))]
# array([1820, 2336, 1485, 2263,  502])
np.ravel_multi_index basically calculates, from the tuple of index arrays, the index into the flattened version of an array of the given shape (in this case (10, 10, 10, 10)).
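As a quick sanity check of that flat-index arithmetic (a sketch): for a C-ordered (10, 10, 10, 10) array, the flat index of (i, j, k, l) is i*1000 + j*100 + k*10 + l, and since arr here is just np.arange(10**4), the looked-up value equals the flat index:

import numpy as np

flat = np.ravel_multi_index((1, 8, 2, 0), (10, 10, 10, 10))
print(flat)   # 1820 == 1*1000 + 8*100 + 2*10 + 0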
Does this do what you need? Is it fast enough?

Pythonic algorithm for applying a function to a 3D array along the first axis indices

I have a 3D array like A
A = np.random.randint(20, size=(4,2,2))
array([[[18,  8],
        [ 2, 11]],

       [[ 9,  8],
        [ 9, 10]],

       [[ 0,  1],
        [10,  6]],

       [[ 1,  8],
        [ 4,  2]]])
What I want to do is apply a function to some indices along axis=0. For example, I want to multiply A[1] and A[3] by 2 and add 10 to them. I know one option is this:
for index in [1, 3]:
    A[index] = A[index]*2 + 10
Which gives:
array([[[18,  8],
        [ 2, 11]],

       [[28, 26],
        [28, 30]],

       [[ 0,  1],
        [10,  6]],

       [[12, 26],
        [18, 14]]])
But my original array is of size (2500, 300, 300), and each time I need to apply the function to 500 non-consecutive indices along axis=0. Is there a faster and more pythonic way to do it?
You could use stepped slicing
A[1::2] = A[1::2] * 2 + 10
A
array([[[18,  8],
        [ 2, 11]],

       [[28, 26],
        [28, 30]],

       [[ 0,  1],
        [10,  6]],

       [[12, 26],
        [18, 14]]])
Or, assuming your slice is named slc:
A[slc] = A[slc] * 2 + 10
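Since the 500 indices in the question are non-consecutive, a regular slice may not exist; the same in-place pattern works with an integer index array (fancy indexing). A sketch with made-up indices (and a downsized array so it runs quickly):

import numpy as np

A = np.random.randint(20, size=(2500, 4, 4))
idx = np.array([1, 3, 7, 42])    # any non-consecutive indices along axis 0

A[idx] = A[idx] * 2 + 10         # one vectorized read/modify/write, no Python loop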
