extract blocks of columns (as separated subarrays) indicated by 1D binary array - arrays

Based on a 1D binary mask, for example np.array([0,0,0,1,1,1,0,0,1,1,0]), I would like to extract the columns of another array, indicated by the 1's in the binary mask, as sub-arrays/separate blocks, like [9, 3.5, 7] and [2.8, 9.1] (I am just making up the numbers to illustrate the point).
So far, this is what I have (again, just as a demo to illustrate my goal, not the data on which this operation will be performed):
import numpy as np
import torch

arr = torch.from_numpy(np.array([0,0,0,1,1,1,0,0,1,1,0]))
split_idx = torch.where(torch.diff(arr) == 1)[0]+1
torch.tensor_split(arr, split_idx.tolist())
The output is:
(tensor([0, 0, 0]),
tensor([1, 1, 1]),
tensor([0, 0]),
tensor([1, 1]),
tensor([0]))
What I would like to have in the end is:
(tensor([1, 1, 1]),
tensor([1, 1]))
Do you know how to implement this, preferably in PyTorch? NumPy functions are also fine. A million thanks in advance!

You can construct your tensor of slice indices with your approach. The only thing missing is the index marking the end of each slice. You can do something like:
>>> slices = arr.diff().abs().nonzero().flatten()+1
tensor([ 3, 6, 8, 10])
Then apply tensor_split and slice to only keep every other element:
>>> torch.tensor_split(arr, slices)[1::2]
(tensor([1, 1, 1]), tensor([1, 1]))
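If you prefer plain NumPy, here is a minimal equivalent sketch that also extracts the column blocks of a data array (the 2-row data array is made up for illustration); filtering runs by their first mask element also covers the case where the mask starts with 1:
import numpy as np
mask = np.array([0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0])
data = np.arange(22, dtype=float).reshape(2, 11)   # hypothetical 2-row data array
edges = np.flatnonzero(np.diff(mask)) + 1          # positions where the mask flips
col_blocks = np.split(data, edges, axis=1)         # column blocks in alternating 0/1 runs
mask_blocks = np.split(mask, edges)
kept = [blk for blk, m in zip(col_blocks, mask_blocks) if m[0] == 1]
# kept[0] holds columns 3-5, kept[1] holds columns 8-9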

Related

Deleting and adding numpy array rows in a for loop to create a dynamic subarray from a larger numpy array

Summary of problem
Ultimate goal
I would like to take a sub-array from a large input numpy array. This sub array is dynamic, and every iteration through the larger numpy input array will change the sub array so that I can perform a set of calculations that depend on previous iterations of the array. This involves nested for loops, which I realize is not very pythonic, but I don't know of another way.
Problem
The problem arises when I add to the existing dynamic sub-array: it seems to grow extra bracketing. This seems simple to fix, but I am having trouble adapting my MATLAB knowledge of array indexing to numpy indexing. I have not even started implementing my calculations yet, but I cannot seem to get the structure of this loop correct.
What I've tried
I have [tried this originally in Pandas][1]. Originally, I thought I could write a pretty simple program to do this using pandas indexing and column naming. But it was SLOW! So I am trying to streamline this by
changing the architecture and
relying on numpy instead of Pandas.
Below is a simple program that emulates what I want to do. I am sure I will have other questions, but this is the start. I have a simple (5, 2) array whose rows I loop through. With each row after row 0, I add the new row to the top of the temp sub-array and delete the last row of the array, maintaining a (2, 2) array throughout. However, as you will see when you run this code, it results in some strange behavior that prevents writing the results into the output array. You will also see that I have tried several ways to add and delete rows. Whether these are optimal is beside the point; the current code is the closest I have gotten to running this program!
Some Example code
This code 'works' in the sense that it doesn't throw errors. However, it doesn't produce the desired results. In this case that would be an output array with the same values as the input (because I am not doing any calculations; this is just to get the architecture correct). The desired result is that each loop creates a sub-array in this order:
n=1 [1 1]
n=2 [[1,1], [2,2]]
n=3 [[2, 2], [3, 3]]
n=4 [[3, 3], [4, 4]]
...
N [[N-1, N-1], [N, N]].
This does not need to be limited to 2 items (if a list) or rows (if an array); the length will be set by an input variable. Thus, the size of this array must be dynamic (set during the call of the function). Furthermore, I supply a simple example here, but each loop will basically need to add a row from the input, so it will be a little more advanced than a simple 2-member ndarray. Lists have the advantage of the .append and .pop methods, but as far as I can tell arrays do not. I present the following code example using only arrays.
import numpy as np
a = np.array([[1, 1], [2, 2], [3, 3], [4,4], [5,5]])
print('Original a array: ', a)
out = np.empty_like(a)
b = np.empty(len(a[0,:]))
for ii, rr in enumerate(a):
    if ii == 0:
        c = [a[ii]]
    else:
        print('Before: ', c)
        # Add next row from array a to the temp array for calculations
        c = np.insert(c, 1, [rr], axis=0)
        print('During: ', c)
        # Remove the last row of the temp array prior to calculations
        #indices_to_remove = [0]
        #d = c[~np.isin(np.arange(c.size), [indices_to_remove])]
        d = c[1::]
        c = [d]
        print('After: ', c)
        # Add the temp array to the output array after calculations
        # THIS THROWS ERRORS, AND I THINK IT IS DUE TO THE INCREASING NUMBERS OF BRACKETS.
        #out[ii, :] = c
        #print(c)
[1]: https://stackoverflow.com/questions/70186681/nested-loops-altering-rows-in-pandas-avoiding-a-value-is-trying-to-be-set-on?noredirect=1#comment124076103_70186681
MATLAB uses 1-based indexing whereas Python uses 0-based indexing.
Let's say we have a 2D array like this:
a = [[1, 2],
     [3, 4],
     [5, 6]]
In MATLAB, a(1, 1) gives you 1; in Python (with a as a NumPy array), a[1, 1] gives you 4. Also, as you know, MATLAB supports linear indexing, so a(6) gives 6; if you do a[5] on a 2D array in Python you get an IndexError. The closest equivalent is flat indexing via a.flat, which is almost the same except that it starts from 0 (and follows row-major rather than column-major order).
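A quick sketch of the difference, assuming a is a NumPy array:
import numpy as np
a = np.array([[1, 2], [3, 4], [5, 6]])
print(a[1, 1])    # 4: 0-based row 1, column 1
print(a.flat[5])  # 6: flat indexing, but row-major, unlike MATLAB's column-major a(6)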
Here is the working example with your desired results.
import numpy as np
a = np.array([[1, 1], [2, 2], [3, 3], [4,4], [5,5]])
test = []
for idx in range(len(a)):
    if idx == 0:
        test.append(a[idx])
    test.append(a[idx:idx+2, :])
# remove the last [5, 5]
test.pop(-1)
for i in test:
    print(i, end=',\n')
output
[1 1],
[[1 1]
[2 2]],
[[2 2]
[3 3]],
[[3 3]
[4 4]],
[[4 4]
[5 5]],
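Since the question asks for a window whose length is set by an input variable, one way to get the append/pop behavior mentioned above for any window size is collections.deque with maxlen. A minimal sketch (the window length n and the print are placeholders for the real calculations):
import numpy as np
from collections import deque
a = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]])
n = 2                        # window length, set at call time
window = deque(maxlen=n)     # the oldest row falls off automatically when full
for row in a:
    window.append(row)
    sub = np.array(window)   # (<= n, 2) sub-array, ready for calculations
    print(sub)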

Is there any way to increment past an array element in Python? (Like how you can with pointer arithmetic in C?)

I have
arr = [6, 5, 4, 3, 2, 1]
I want to use the ...5, 4, 3, 2, 1] part of the array in a recursive call while keeping the 6 in the array (at its current position) for future use.
It feels very similar to pointer arithmetic in C, I'm just not sure how to implement something like that in Python (ver 3.7). I'm lost as to how to preserve the 6 in the array at its position, which is essential as the array needs to be maintained in sorted descending order.
Any guidance on how to get around this is appreciated.
You can access the elements of the arr list in the following manner without disturbing it:
>>> arr[2:]
[4, 3, 2, 1]
>>> arr[1:]
[5, 4, 3, 2, 1]
>>> arr
[6, 5, 4, 3, 2, 1]
>>> arr[2:4]
[4, 3]
List slicing doesn't affect the original list (each slice is a new list). I hope this answers your question.
You can use a part of an array in a recursive call, without changing the original array, by passing start and end indices.
Given arr = [6, 5, 4, 3, 2, 1], if you want to use arr from index 1 onward, just pass the indices along: call fun(arr, start, end) with start=1 and end=len(arr)-1. This will not modify the original array, and at the same time you can work with the portion from start to end in the recursive calls.
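For example, a recursion that carries a start index instead of slicing (rec_sum is a made-up function, just to show the pattern):
def rec_sum(arr, start=0):
    # operates on arr[start:] without copying or modifying arr
    if start == len(arr):
        return 0
    return arr[start] + rec_sum(arr, start + 1)
arr = [6, 5, 4, 3, 2, 1]
print(rec_sum(arr, 1))  # 15: uses the ...5, 4, 3, 2, 1] part
print(arr)              # [6, 5, 4, 3, 2, 1] -- the 6 stays in place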

comparing two numpy arrays and adding same rows

I have two large data files, one with two columns and one with three columns. I want to select all the rows from the second file that are contained in the first array. My idea was to compare the numpy arrays.
Let's say I have:
a = np.array([[1, 2, 3], [3, 4, 5], [1, 4, 6]])
b = np.array([[1, 2], [3, 4]])
and the result should look like this:
[[1, 2, 3], [3, 4, 5]]
Any advice on that?
EDIT:
So in the end this works. Not very handy but it works.
for ii in range(a.shape[0]):
    u, v, w = a[ii, :]
    for jj in range(b.shape[0]):
        if u == b[jj, 0] and v == b[jj, 1]:
            print([u, v, w])
The numpy_indexed package (disclaimer: I am its author) contains functionality to solve such problems efficiently, without using any python loops:
import numpy_indexed as npi
a[npi.contains(b, a[:, :2])]
If you prefer not to use another library but to do this in numpy only, you can do something similar to what is suggested here and here, namely to use np.in1d (see docs), which gives you a mask indicating whether an element of one 1D array exists in another 1D array. As the name indicates, this function only works for 1D arrays, but you can use a structured array view (via ndarray.view) to trick numpy into thinking you have 1D arrays. One caveat, though: you need a deep copy of the first array a, since view doesn't mix well with slices. But if that is not too big an issue for you, something along the lines of:
a_cp = a[:, :2].copy()
a[np.in1d(a_cp.view((np.void, a_cp.dtype.itemsize*a_cp.shape[1])).ravel(),
          b.view((np.void, b.dtype.itemsize*b.shape[1])).ravel())]
might work for you.
This directly uses the masked array to return the correct values from your array a.
Check this, #Ernie. It may help you to get to the solution. ;D
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.in1d.html
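If you'd rather avoid the void-view trick altogether, a loop-free sketch using broadcasting also works (note the intermediate comparison array has shape (len(a), len(b), 2), so memory grows with both inputs):
import numpy as np
a = np.array([[1, 2, 3], [3, 4, 5], [1, 4, 6]])
b = np.array([[1, 2], [3, 4]])
# compare the first two columns of every a-row with every b-row, then reduce
match = (a[:, None, :2] == b[None, :, :]).all(axis=2).any(axis=1)
print(a[match])  # [[1 2 3], [3 4 5]]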

How to vectorize NumPy polyder function?

I would like to vectorize the NumPy function polyder, which computes derivatives of polynomials. Is there a simple way or a built-in function to do it?
With vectorize, I mean that if the input is an array of polynomials, the output would be the array with the derivative of the polynomials.
An example:
p = np.array([[3,4,5], [1,2]])
the output should be something like
np.array([[6, 4], [1]])
Since your subarrays, both input and output, can have different lengths, you are better off treating both as lists.
In [97]: [np.polyder(d) for d in [[3,4,5],[1,2]]]
Out[97]: [array([6, 4]), array([1])]
Your p is just a list in an expensive (timewise) array wrapper.
In [100]: p=np.array([[3,4,5],[1,2]])
In [101]: p
Out[101]: array([[3, 4, 5], [1, 2]], dtype=object)
There is little that you can do with such an array that you can't do just as well with a list. Do some time tests. You will probably find that iterating over an object array is slower than iterating over the equivalent list, especially if you take into account the time it takes to convert a list to an array.
It can also be tricky to create such arrays. If all the sublists are the same length the result will be a 2d array; forcing them to be an object array takes special initialization.
A general rule of thumb is: if individual steps work with arrays or lists of different lengths, you probably can't vectorize. That is, you can't form a rectangular 2d array and apply vector operations.
If the polynomial lists were all the same length, then p could be 2d, and the result could also be that:
In [107]: p=np.array([[3,4,5],[0,1,2]])
In [108]: p
Out[108]:
array([[3, 4, 5],
[0, 1, 2]])
In [109]: np.array([np.polyder(i) for i in p])
Out[109]:
array([[6, 4],
[0, 1]])
In effect it is iterating over the rows of p, and then reassembling the result into an array. There are some numpy functions that streamline iteration (but don't speed it up much), but I see little need for those here.
Looking at the code of this function, the core is:
p = NX.asarray(p)
n = len(p) - 1
y = p[:-1] * NX.arange(n, 0, -1)
which for this 2d array (rows of length 3) is:
In [117]: p[:,:-1]*np.arange(2,0,-1)
Out[117]:
array([[6, 4],
[0, 1]])
So if the polynomials are all the same length, this simple multiplication gives the 1st-order derivative coefficients. And of course the rows can be padded so they are all the same length. So 'vectorization' is easier than I initially thought.
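A sketch of that padding idea, assuming leading zero coefficients are acceptable (they don't change the polynomial):
import numpy as np
polys = [[3, 4, 5], [1, 2]]
width = max(len(p) for p in polys)
# left-pad with zeros so every coefficient row has the same length
P = np.array([[0] * (width - len(p)) + list(p) for p in polys])
n = P.shape[1] - 1
D = P[:, :-1] * np.arange(n, 0, -1)  # row-wise derivative coefficients
print(D)  # [[6 4], [0 1]]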
import numpy as np
p = np.array([[3,4,5], [1,2]])
np.array([np.polyder(coefficients) for coefficients in p]) # array([[6 4], [1]], dtype=object)
would fulfill your interface for your specific example. But as hpaulj mentions, there's little sense in working with NumPy arrays instead of normal python lists here, and no actual (hardware-level) vectorization will happen. (Though, as with list comprehensions in general, the interpreter would be free to employ other means of parallelism to compute them.)

Python 2.7: looping over 1D fibers in a multidimensional Numpy array

I am looking for a way to loop over 1D fibers (row, column, and multi-dimensional equivalents) along any dimension in a 3+-dimensional array.
In a 2D array this is fairly trivial since the fibers are rows and columns, so just saying for row in A gets the job done. But for 3D arrays for example, this expression iterates over 2D slices, not 1D fibers.
A working solution is the one below:
import numpy as np
A = np.arange(27).reshape((3,3,3))
func = np.sum
for fiber_index in np.ndindex(A.shape[:-1]):
    print func(A[fiber_index])
However, I am wondering whether there is something that is:
More idiomatic
Faster
Hope you can help!
I think you might be looking for numpy.apply_along_axis
In [10]: def my_func(x):
    ...:     return x**2 + x
In [11]: np.apply_along_axis(my_func, 2, A)
Out[11]:
array([[[ 0, 2, 6],
[ 12, 20, 30],
[ 42, 56, 72]],
[[ 90, 110, 132],
[156, 182, 210],
[240, 272, 306]],
[[342, 380, 420],
[462, 506, 552],
[600, 650, 702]]])
Although many NumPy functions (including sum) have their own axis argument to specify which axis to use:
In [12]: np.sum(A, axis=2)
Out[12]:
array([[ 3, 12, 21],
[30, 39, 48],
[57, 66, 75]])
numpy provides a number of different ways of looping over 1 or more dimensions.
Your example:
func = np.sum
for fiber_index in np.ndindex(A.shape[:-1]):
    print func(fiber_index)
    print A[fiber_index]
produces something like:
(0, 0)
[0 1 2]
(0, 1)
[3 4 5]
(0, 2)
[6 7 8]
...
np.ndindex generates all index combinations over the first 2 dimensions, giving your function the 1D fiber on the last.
Look at the code for ndindex. It's instructive. I tried to extract its essence in https://stackoverflow.com/a/25097271/901925.
It uses as_strided to generate a dummy matrix over which an nditer iterates. It uses the 'multi_index' mode to generate an index set, rather than the elements of that dummy. The iteration itself is done with a __next__ method. This is the same style of indexing that is currently used in numpy compiled code.
http://docs.scipy.org/doc/numpy-dev/reference/arrays.nditer.html
Iterating Over Arrays has a good explanation, including an example of doing so in cython.
Many functions, among them sum, max, product, let you specify which axis (axes) you want to iterate over. Your example, with sum, can be written as:
np.sum(A, axis=-1)
np.sum(A, axis=(1,2)) # sum over 2 axes
An equivalent is
np.add.reduce(A, axis=-1)
np.add is a ufunc, and reduce specifies an iteration mode. There are many other ufuncs, and other iteration modes - accumulate, reduceat. You can also define your own ufunc.
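For instance, on the same A the other iteration modes look like this:
np.add.reduce(A, axis=-1)            # same as np.sum(A, axis=-1)
np.add.accumulate(A, axis=-1)        # running sums along each 1D fiber
np.add.reduceat(A, [0, 2], axis=-1)  # sums over the index ranges [0:2) and [2:]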
xnx suggests
np.apply_along_axis(np.sum, 2, A)
It's worth digging through apply_along_axis to see how it steps through the dimensions of A. In your example, it steps over all possible i,j in a while loop, calculating:
outarr[(i,j)] = np.sum(A[(i, j, slice(None))])
Including slice objects in the indexing tuple is a nice trick. Note that it edits a list, and then converts it to a tuple for indexing. That's because tuples are immutable.
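In other words, something like this (reusing the A from above):
idx = (0, 1) + (slice(None),)  # builds the tuple (0, 1, slice(None))
A[idx]                         # equivalent to A[0, 1, :]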
Your iteration can be applied along any axis by rolling that axis to the end. This is a 'cheap' operation since it just changes the strides.
def with_ndindex(A, func, ax=-1):
    # apply func along axis ax
    A = np.rollaxis(A, ax, A.ndim)  # roll ax to end (changes strides)
    shape = A.shape[:-1]
    B = np.empty(shape, dtype=A.dtype)
    for ii in np.ndindex(shape):
        B[ii] = func(A[ii])
    return B
I did some timings on 3x3x3, 10x10x10 and 100x100x100 A arrays. This np.ndindex approach is consistently a third faster than the apply_along_axis approach. Direct use of np.sum(A, -1) is much faster.
So if func is limited to operating on a 1D fiber (unlike sum), then the ndindex approach is a good choice.
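For example, with a func that is only defined for 1D input (a made-up one here, since sum already has an axis argument):
A = np.arange(27).reshape(3, 3, 3)
def first_minus_last(v):
    # only makes sense for a 1D fiber
    return v[0] - v[-1]
print(with_ndindex(A, first_minus_last))  # a (3, 3) array filled with -2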
