Numpy reshape with remainder throws error - arrays

How can I partition this array into arrays of length 3, with a padded or unpadded remainder (it doesn't matter which)?
>>> np.array([0,1,2,3,4,5,6,7,8,9,10]).reshape([3,-1])
ValueError: cannot reshape array of size 11 into shape (3,newaxis)

### Two Examples Without Padding
x = np.array([0,1,2,3,4,5,6,7,8,9,10])
desired_length = 3
num_splits = int(np.ceil(x.shape[0]/desired_length))
print(np.array_split(x, num_splits))
# Prints:
# [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8]), array([ 9, 10])]
x = np.arange(13)
desired_length = 3
num_splits = int(np.ceil(x.shape[0]/desired_length))
print(np.array_split(x, num_splits))
# Prints:
# [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8]), array([ 9, 10]), array([11, 12])]
### One Example With Padding
x = np.arange(13)
desired_length = 3
num_splits = int(np.ceil(x.shape[0]/desired_length))
padding = num_splits*desired_length - x.shape[0]
x_pad = np.pad(x, (0,padding), 'constant', constant_values=0)
print(np.split(x_pad, num_splits))
# Prints:
# [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8]), array([ 9, 10, 11]), array([12, 0, 0])]
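One caveat with the unpadded examples: np.array_split balances the chunk sizes, so it does not always produce chunks of exactly desired_length. For 10 elements and 4 sections it returns lengths 3, 3, 2, 2. A minimal sketch that cuts at explicit multiples of the chunk length instead (split_fixed is a hypothetical helper name, not a NumPy function):

```python
import numpy as np

def split_fixed(x, length):
    """Split 1-D array x into chunks of `length`; only the last chunk may be shorter."""
    x = np.asarray(x)
    # Cut points at every multiple of `length` that lies strictly inside the array.
    cut_points = np.arange(length, x.size, length)
    return np.split(x, cut_points)

chunks = split_fixed(np.arange(10), 3)
print([c.tolist() for c in chunks])
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

Passing a sequence of indices to np.split means "cut here", whereas passing an integer means "make this many sections".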

If you want to avoid padding with zeros, the most elegant way to do it might be slicing in a list comprehension:
>>> import numpy as np
>>> x = np.arange(11)
>>> [x[i:i+3] for i in range(0, x.size, 3)]
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8]), array([ 9, 10])]

If you want to pad with zeros, ndarray.resize() does this for you, but you have to figure out the size of the expected array yourself:
import numpy as np
x = np.array([0,1,2,3,4,5,6,7,8,9,10])
cols = 3
rows = np.ceil(x.size / cols).astype(int)
x.resize((rows, cols))
print(x)
Which results in:
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10  0]]
As far as I can tell, this is hundreds of times faster than the list comprehension approach (see my other answer).
Note that resize() works in place and raises a ValueError if anything else references x (for example, a view you created before resizing). Either work on x.copy() or pass refcheck=False to resize().
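If the in-place behaviour of resize() is a problem, padding and reshaping gives the same result as a new array and leaves x untouched. This is only a sketch; pad_reshape and its fill parameter are names made up for this illustration:

```python
import numpy as np

def pad_reshape(x, cols, fill=0):
    """Return x padded with `fill` up to a multiple of `cols`, reshaped to (-1, cols)."""
    x = np.asarray(x)
    # Number of elements missing to reach the next multiple of `cols` (0 if none).
    pad = -x.size % cols
    padded = np.pad(x, (0, pad), mode="constant", constant_values=fill)
    return padded.reshape(-1, cols)

x = np.arange(11)
print(pad_reshape(x, 3))
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10  0]]
```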

Related

Removing submatrix from numpy array by shifting other elements [duplicate]

Suppose I have a numpy array
a = np.array([[1,2,3,4],
[3,4,5,6],
[2,3,4,4],
[3,3,1,2]])
I want to delete the submatrix [[3,4],[3,1]]. I can do it as follows:
mask = np.ones(a.shape,dtype=bool)
mask[2:,1:-1] = False
a_new = a[mask,...]
print(a_new) # output: [1 2 3 4 3 4 5 6 2 4 3 2]
However, I want the output as
np.array([[1,2,3,4],
[3,4,5,6],
[2,4,0,0],
[3,2,0,0]])
I just want numpy to remove the submatrix and shift the other elements, filling the empty places with 0. How can I do this?
I cannot find a function that does what you ask, but combining np.roll with two masks produces your output. Perhaps there is a more elegant way:
a = np.array([[1,2,3,4],
[3,4,5,6],
[2,3,4,4],
[3,3,1,2]])
mask = np.ones(a.shape,dtype=bool)
mask[2:,1:-1] = False
mask2 = mask.copy()
mask2[2:, 1:] = False
n = 2 #shift length
a[~mask2] = np.roll((a * mask)[~mask2],-n)
a
>>array([[1, 2, 3, 4],
[3, 4, 5, 6],
[2, 4, 0, 0],
[3, 2, 0, 0]])
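Alternatively, you can simply set those entries to zero. Note that this zeroes the lower-right block without shifting the remaining elements, so the result differs from the requested output: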
a = np.array([[1,2,3,4],
[3,4,5,6],
[2,3,4,4],
[3,3,1,2]])
a[2:, 2:] = 0
returns
array([[1, 2, 3, 4],
[3, 4, 5, 6],
[2, 3, 0, 0],
[3, 3, 0, 0]])
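For a version that does not depend on the shift length, one possible approach is a stable sort of each row by the keep mask, which moves the kept elements to the left while preserving their order. This is a sketch under the assumption that "shifting" means shifting left within each row; remove_and_shift is a hypothetical helper and it needs np.take_along_axis (NumPy 1.15 or later):

```python
import numpy as np

def remove_and_shift(a, keep, fill=0):
    """Drop elements where `keep` is False, shift the rest left per row, pad with `fill`."""
    # Stable argsort on ~keep puts kept columns (False) first, in original order.
    order = np.argsort(~keep, axis=1, kind="stable")
    shifted = np.take_along_axis(a, order, axis=1)
    kept = np.take_along_axis(keep, order, axis=1)
    # Positions that now hold removed elements are overwritten with `fill`.
    return np.where(kept, shifted, fill)

a = np.array([[1, 2, 3, 4],
              [3, 4, 5, 6],
              [2, 3, 4, 4],
              [3, 3, 1, 2]])
keep = np.ones(a.shape, dtype=bool)
keep[2:, 1:-1] = False
print(remove_and_shift(a, keep))
# [[1 2 3 4]
#  [3 4 5 6]
#  [2 4 0 0]
#  [3 2 0 0]]
```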

NumPy: indexing array by list of tuples - how to do it correctly?

I am in the following situation - I have the following:
Multidimensional numpy array a of n dimensions
t, an array of k rows (tuples), each with n elements. In other words, each row in this array is an index in a
What I want: from a, return an array b with k scalar elements, the ith element in b being the result of indexing a with the ith tuple from t.
Seems trivial enough. The following approach, however, does not work
def get(a, t):
    # wrong result + takes way too long
    return a[t]
I have to resort to doing this iteratively, i.e. the following works correctly:
def get(a, t):
    res = []
    for ind in t:
        a_scalar = a
        for i in ind:
            a_scalar = a_scalar[i]
        # a_scalar is now a scalar
        res.append(a_scalar)
    return res
This works, except for the fact that given that each dimension in a has over 30 elements, the procedure does get really slow when n gets to more than 5. I understand that it would be slow regardless, however, I would like to exploit numpy's capabilities as I believe it would speed up this process considerably.
The key to getting this right is to understand the roles of indexing lists and tuples. Often the two are treated the same, but in numpy indexing, tuples, lists and arrays convey different information.
In [1]: a = np.arange(12).reshape(3,4)
In [2]: t = np.array([(0,0),(1,1),(2,2)])
In [4]: a
Out[4]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [5]: t
Out[5]:
array([[0, 0],
[1, 1],
[2, 2]])
You tried:
In [6]: a[t]
Out[6]:
array([[[ 0, 1, 2, 3],
[ 0, 1, 2, 3]],
[[ 4, 5, 6, 7],
[ 4, 5, 6, 7]],
[[ 8, 9, 10, 11],
[ 8, 9, 10, 11]]])
So what's wrong with it? It ran, but selected a (3,2) array of rows of a. That is, it applied t to just the first dimension, effectively a[t, :]. You want to index on all dimensions, some sort of a[t1, t2]. That's the same as a[(t1,t2)] - a tuple of indices.
In [10]: a[tuple(t[0])] # a[(0,0)]
Out[10]: 0
In [11]: a[tuple(t[1])] # a[(1,1)]
Out[11]: 5
In [12]: a[tuple(t[2])]
Out[12]: 10
or doing all at once:
In [13]: a[(t[:,0], t[:,1])]
Out[13]: array([ 0, 5, 10])
Another way to write it is as n lists (or arrays), one for each dimension:
In [14]: a[[0,1,2],[0,1,2]]
Out[14]: array([ 0, 5, 10])
In [18]: tuple(t.T)
Out[18]: (array([0, 1, 2]), array([0, 1, 2]))
In [19]: a[tuple(t.T)]
Out[19]: array([ 0, 5, 10])
More generally, in a[idx1, idx2] array idx1 is broadcast against idx2 to produce a full selection array. Here the 2 arrays are 1d and match, the selection is your t set of pairs. But the same principle applies to selecting a set of rows and columns, a[ [[0],[2]], [0,2,3] ].
Using the ideas in [10] and following, your get could be sped up with:
In [20]: def get(a, t):
    ...:     res = []
    ...:     for ind in t:
    ...:         res.append(a[tuple(ind)]) # index all dimensions at once
    ...:     return res
    ...:
In [21]: get(a,t)
Out[21]: [0, 5, 10]
If t really was a list of tuples (as opposed to an array built from them), your get could be:
In [23]: tl = [(0,0),(1,1),(2,2)]
In [24]: [a[ind] for ind in tl]
Out[24]: [0, 5, 10]
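Putting the a[tuple(t.T)] idea from [19] into the shape of the original get removes the Python loop entirely. A minimal sketch, assuming t can be converted to a (k, n) integer array (get_vectorized is just an illustrative name):

```python
import numpy as np

def get_vectorized(a, t):
    """Return the k scalars a[t[0]], a[t[1]], ... for a (k, n) collection of index tuples."""
    t = np.asarray(t)
    # Transposing gives one index array per dimension of a; the tuple tells
    # NumPy to apply them to separate dimensions rather than only the first.
    return a[tuple(t.T)]

a = np.arange(12).reshape(3, 4)
t = [(0, 0), (1, 1), (2, 2)]
print(get_vectorized(a, t))  # [ 0  5 10]
```

The result is an array rather than a list; call .tolist() on it if a list is needed.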
Explore using np.ravel_multi_index
Create some test data
arr = np.arange(10**4)
arr.shape=10,10,10,10
t = []
for j in range(5):
    t.append( tuple(np.random.randint(10, size = 4)))
print(t)
# [(1, 8, 2, 0),
# (2, 3, 3, 6),
# (1, 4, 8, 5),
# (2, 2, 6, 3),
# (0, 5, 0, 2),]
ta = np.array(t).T
print(ta)
# array([[1, 2, 1, 2, 0],
# [8, 3, 4, 2, 5],
# [2, 3, 8, 6, 0],
# [0, 6, 5, 3, 2]])
arr.ravel()[np.ravel_multi_index(tuple(ta), (10,10,10,10))]
# array([1820, 2336, 1485, 2263,  502])
np.ravel_multi_index takes a tuple of index arrays (one per dimension) and computes the corresponding indices into the flattened version of an array with the given shape, in this case (10, 10, 10, 10).
Does this do what you need? Is it fast enough?
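To make the flat-index calculation concrete, here is a small sketch that reproduces it by hand for a C-ordered array. The variable names (steps, flat_manual, flat_numpy) are only for illustration:

```python
import numpy as np

shape = (10, 10, 10, 10)
arr = np.arange(10**4).reshape(shape)
t = np.array([(1, 8, 2, 0), (2, 3, 3, 6), (0, 5, 0, 2)])

# Elements skipped per unit step along each axis in C order: 1000, 100, 10, 1.
steps = np.cumprod((shape[1:] + (1,))[::-1])[::-1]

flat_manual = t @ steps                               # sum of index * step per row
flat_numpy = np.ravel_multi_index(tuple(t.T), shape)  # same computation done by NumPy

print(flat_manual)              # [1820 2336  502]
print(arr.ravel()[flat_numpy])  # [1820 2336  502]
```

Because arr was built with np.arange, each value equals its own flat index, which is why the two printed lines agree.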

Easy way of printing two numpy arrays with each element in a different line?

Let's say I have a 1D numpy array x and another one y = x ** 2.
I am looking for an easier alternative to
for i in range(x.size):
    print(x[i], y[i])
With one array one can do print(*x, sep = '\n') which is easier than a for loop. I'm thinking of something like converting x and y to arrays of strings and then adding them up into an array z and then using print(*z, sep = '\n'). However, I tried to do that but numpy gives an error when the add operation is performed.
Edit: This is the function I use for this
def to_str(*args):
    return '\n'.join([' '.join([str(ls[i]) for ls in args]) for i in range(len(args[0]))]) + '\n'
>>> x = np.arange(10)
>>> y = x ** 2
>>> print(to_str(x,y))
0 0
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
9 81
>>>
or if something quick and dirty is enough:
print(np.array((x,y)).T)
You could do something along these lines -
# Input arrays
In [238]: x
Out[238]: array([14, 85, 79, 89, 41])
In [239]: y
Out[239]: array([13, 79, 13, 79, 11])
# Join arrays with " "
In [240]: z = [" ".join(item) for item in np.column_stack((x,y)).astype(str)]
# Finally print it
In [241]: print(*z, sep='\n')
14 13
85 79
79 13
89 79
41 11
# Original approach for printing
In [242]: for i in range(x.size):
     ...:     print(x[i], y[i])
     ...:
14 13
85 79
79 13
89 79
41 11
To make things a bit more compact, np.column_stack((x,y)) could be replaced by np.vstack((x,y)).T.
There are a few other methods to create z, as listed below -
z = [str(i)[1:-1] for i in zip(x,y)] # Prints commas between elems
z = [str(i)[1:-1] for i in np.column_stack((x,y))]
z = [str(i)[1:-1] for i in np.vstack((x,y)).T]
Here is one way without a loop:
print(np.array2string(np.column_stack((x, y)),separator=',').replace(' [ ','').replace('],', '').strip('[ ]'))
Demo:
In [86]: x
Out[86]: array([0, 1, 2, 3, 4])
In [87]: y
Out[87]: array([ 0, 1, 4, 9, 16])
In [85]: print(np.array2string(np.column_stack((x, y)),separator=',').replace(' [ ','').replace('],', '').strip('[ ]'))
0, 0
1, 1
2, 4
3, 9
4,16
There are 2 issues - combining the 2 arrays, and printing the result
In [1]: a = np.arange(4)
In [2]: b = a**2
In [3]: ab = [a,b] # join arrays in a simple list
In [4]: ab
Out[4]: [array([0, 1, 2, 3]), array([0, 1, 4, 9])]
In [6]: list(zip(*ab)) # 'transpose' that list
Out[6]: [(0, 0), (1, 1), (2, 4), (3, 9)]
That zip(*) is a useful tool or idiom.
I could use your print(*a, sep...) method with this
In [11]: print(*list(zip(*ab)), sep='\n')
(0, 0)
(1, 1)
(2, 4)
(3, 9)
Using sep is a neat py3 trick, but is rarely used. I'm not even sure how to do the equivalent with the older py2 print statement.
But if we convert the list of arrays into a 2d array we have more options.
In [12]: arr = np.array(ab)
In [13]: arr
Out[13]:
array([[0, 1, 2, 3],
[0, 1, 4, 9]])
In [14]: np.vstack(ab) # does the same thing
Out[14]:
array([[0, 1, 2, 3],
[0, 1, 4, 9]])
For simply looking at the 2 arrays together this arr is quite useful. And if the lines get too long, transpose it:
In [15]: arr.T
Out[15]:
array([[0, 0],
[1, 1],
[2, 4],
[3, 9]])
In [16]: print(arr.T)
[[0 0]
[1 1]
[2 4]
[3 9]]
Note that the array print format is different from that for nested lists. That's intentional.
The brackets seldom get in the way of understanding the display. They even help when the array becomes 3d and higher.
For printing a file that can be read by other programs, np.savetxt is quite useful. It lets me specify the delimiter, and the format for each column or line.
In [17]: np.savetxt('test.csv', arr.T, delimiter=',',fmt='%10d')
In ipython I can look at the file with a simple system call:
In [18]: cat test.csv
0, 0
1, 1
2, 4
3, 9
I can omit the delimiter parameter.
I can reload it with loadtxt
In [20]: np.loadtxt('test.csv',delimiter=',',dtype=int)
Out[20]:
array([[0, 0],
[1, 1],
[2, 4],
[3, 9]])
In Py3 it is hard to write savetxt to the screen. It operates on a byte string file, and sys.stdout is unicode. In Py2 np.savetxt(sys.stdout, ...) might work.
savetxt is not sophisticated. In this example, it is essentially doing a fwrite equivalent of:
In [21]: for row in arr.T:
    ...:     print('%10d,%10d'%tuple(row))
    ...:
0, 0
1, 1
2, 4
3, 9
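Regarding the remark about Python 3 above: as far as I know, newer NumPy releases (1.14 and later, when the encoding parameter was added to savetxt) accept text-mode file objects, so writing to sys.stdout or to an io.StringIO should work directly. A sketch under that version assumption:

```python
import io
import sys
import numpy as np

x = np.arange(4)
y = x ** 2
pairs = np.column_stack((x, y))

# Write straight to the terminal, one "x y" pair per line.
np.savetxt(sys.stdout, pairs, fmt="%d")

# Or capture the same text in a string for later use.
buf = io.StringIO()
np.savetxt(buf, pairs, fmt="%d")
text = buf.getvalue()
```

The default delimiter is a single space, so this gives the same layout as the original for loop.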

How to select a portion of a NumPy array efficiently?

I'm switching from Matlab/Octave to NumPy/SciPy.
To select a segment of a Matlab array, it was quite easy.
e.g.
>> x = [1, 2, 3, 4; 5, 6, 7, 8; 9, 10, 11, 12]
x =
1 2 3 4
5 6 7 8
9 10 11 12
>> y = x(2:3, 1:2)
y =
5 6
9 10
How can the same thing be done with NumPy when
x = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
As Indexing > Other indexing options in the NumPy documentation mentions,
The slicing and striding works exactly the same way it does for lists and tuples except that they can be applied to multiple dimensions as well.
For your example, this means
import numpy as np
x = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
# array([[ 1, 2, 3, 4],
# [ 5, 6, 7, 8],
# [ 9, 10, 11, 12]])
x[1:3, 0:2]
# => array([[ 5, 6],
# [ 9, 10]])
The most notable difference from Matlab is probably that indexing is zero-based (i.e., the first element has index 0) and that index ranges (called 'slices' in Python) are expressed with an exclusive upper bound: l[4:7] gets l[4], l[5] and l[6] (the 5th to the 7th element), but not l[7] (the 8th element).
The Python tutorial's section on lists will give you a feeling for how indexing and slicing works for normal (1-dimensional) collections.
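One more difference worth knowing when coming from Matlab: a basic NumPy slice is a view onto the original data, not a copy, so writing to the slice also changes x. A short sketch of that behaviour (y_copy is just an illustrative name):

```python
import numpy as np

x = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

y = x[1:3, 0:2]                # Matlab: x(2:3, 1:2)
print(np.shares_memory(x, y))  # True

y_copy = x[1:3, 0:2].copy()    # independent copy, as in Matlab
y_copy[0, 0] = -1
print(x[1, 0])                 # 5

y[0, 0] = -1                   # writing through the view changes x
print(x[1, 0])                 # -1
```

Use .copy() whenever the selected portion should be modified independently of the original array.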

Numpy: What is the most efficient way to rearrange a matrix to have every row stacked with its left/right context?

Let me explain it by a small example:
>>> x = np.array([[1,2], [3,4], [5,6], [7,8]])
>>> x
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
I want to have a new array that has the form
array([[0, 0, 1, 2, 3, 4],
[1, 2, 3, 4, 5, 6],
[3, 4, 5, 6, 7, 8],
[5, 6, 7, 8, 0, 0]])
Here, the context has the size +/-1, but I'd like to keep it variable.
What I'm doing so far is appending zeros to the original array:
>>> y = np.concatenate((np.zeros((1, 2)), x, np.zeros((1, 2))), axis=0)
>>> y
array([[ 0., 0.],
[ 1., 2.],
[ 3., 4.],
[ 5., 6.],
[ 7., 8.],
[ 0., 0.]])
And putting the values into a new array by reading rows of the new size:
>>> z = np.empty((x.shape[0], x.shape[1]*3))
>>> for i in range(x.shape[0]): z[i] = y[i:i+3].flatten()
That kind of works, but I find it slow, ugly and unpythonic.
Can you think of a better way to do this rearrangement? Additional thumbsup for an in-place-ish solution :)
Using stride_tricks is an option, but I would not call it the best answer: while it is "the most efficient way", it hurts readability and it is playing with fire.
# We make it flat (and copy if necessary) to be on the safe side, and because
# it is more obvious this way with stride tricks (or my function below):
y = y.ravel()
# the new shape is (y.shape[0]//2-2, 6). When looking at the raveled y, the first
# dimension takes steps of 2 elements (so y.strides[0]*2) and the second is
# just the old one:
z = np.lib.stride_tricks.as_strided(y, shape=(y.shape[0]//2-2, 6),
strides=(y.strides[0]*2, y.strides[0]))
Note that z here is only a view, so use z.copy() to avoid any unexpected things before editing it, otherwise in your example all 1s will change if you edit one of them. On the up side, if you mean this by "in-place", you can now change elements in y and z will change too.
If you want to do more of this magic, maybe check out my rolling_window function from https://gist.github.com/3430219, which replaces the last line with:
# 6 values long window, but only every 2nd step on the original array:
z = rolling_window(y, 6, asteps=2)
Important: np.lib.stride_tricks.as_strided by itself is generally not safe and must be used with care as it can create segmentation faults.
Indexing should work:
y = np.concatenate(([0, 0], x.flat, [0, 0])) # or use np.pad with NumPy 1.7
i = np.tile(np.arange(6), (4, 1)) + np.arange(4)[:, None] * 2
z = y[i]
Note that y[i] uses fancy indexing, which returns a copy, so z is a new array and does not share memory with y.
To see how this works, take a look at the i indexing array:
array([[ 0, 1, 2, 3, 4, 5],
[ 2, 3, 4, 5, 6, 7],
[ 4, 5, 6, 7, 8, 9],
[ 6, 7, 8, 9, 10, 11]])
Making it flexible:
context = 1
h, w = x.shape
zeros = np.zeros((context, w), dtype=x.dtype)
y = np.concatenate((zeros, x, zeros), axis=0).ravel()
i = np.tile(np.arange(w + 2 * context * w), (h, 1)) + np.arange(h)[:, None] * w
z = y[i]
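On newer NumPy versions (1.20 and later, as far as I know) the same rearrangement can be written with sliding_window_view, which wraps the stride logic safely. This is only a sketch; stack_context is a hypothetical helper name:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def stack_context(x, context=1):
    """Stack every row of x with `context` rows before and after it, zero-padded at the edges."""
    h, w = x.shape
    # Add `context` rows of zeros above and below, no padding on the columns.
    padded = np.pad(x, ((context, context), (0, 0)), mode="constant")
    # One full-width window of 2*context+1 rows per original row: shape (h, 1, 2*context+1, w).
    windows = sliding_window_view(padded, (2 * context + 1, w))
    # Flattening each window gives the stacked rows; this reshape makes a copy.
    return windows.reshape(h, (2 * context + 1) * w)

x = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
print(stack_context(x))
# [[0 0 1 2 3 4]
#  [1 2 3 4 5 6]
#  [3 4 5 6 7 8]
#  [5 6 7 8 0 0]]
```

Unlike the as_strided version, the result here does not share memory with x, so it is not an in-place solution.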

Resources