I have an array of 0s and 1s. I want to compute a cumulative sum over each run of consecutive 1s, resetting to 0 each time a 0 is met. I want to use numpy because I have thousands of arrays, each with thousands of rows and columns.
I can do it with loops, but I suspect that will not be efficient.
Is there a smarter and faster way to run it on the array?
Here is a short example of the input and the expected output:
import numpy as np
arr_in = np.array([[1,1,1,1,1,1], [0,0,0,0,0,0], [1,0,1,0,1,1], [0,1,1,1,0,0]])
print(arr_in)
print("expected result:")
arr_out = np.array([[1,2,3,4,5,6], [0,0,0,0,0,0], [1,0,1,0,1,2], [0,1,2,3,0,0]])
print(arr_out)
When you run it:
[[1 1 1 1 1 1]
[0 0 0 0 0 0]
[1 0 1 0 1 1]
[0 1 1 1 0 0]]
expected result:
[[1 2 3 4 5 6]
[0 0 0 0 0 0]
[1 0 1 0 1 2]
[0 1 2 3 0 0]]
With numba.vectorize you can define a custom numpy ufunc to use for accumulation.
import numba as nb # v0.56.4, no support for numpy >= 1.22.0
import numpy as np # v1.21.6
@nb.vectorize([nb.int64(nb.int64, nb.int64)])
def reset_cumsum(x, y):
    # x is the running total, y is the next element: reset to 0 when y is 0
    return x + y if y else 0
arr_in = np.array([[1,1,1,1,1,1],
[0,0,0,0,0,0],
[1,0,1,0,1,1],
[0,1,1,1,0,0]])
reset_cumsum.accumulate(arr_in, axis=1)
Output
array([[1, 2, 3, 4, 5, 6],
[0, 0, 0, 0, 0, 0],
[1, 0, 1, 0, 1, 2],
[0, 1, 2, 3, 0, 0]])
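To see what accumulate computes with this binary rule, a pure-Python sketch using itertools.accumulate may help. It is only an illustration for 0/1 input, not what Numba generates, and it will be slow on large arrays. The name reset_step is made up for this example.

```python
from itertools import accumulate

import numpy as np


def reset_step(x, y):
    # x is the running total so far, y is the next element:
    # keep counting while y is non-zero, otherwise reset to 0
    return x + y if y else 0


arr_in = np.array([[1, 1, 1, 1, 1, 1],
                   [0, 0, 0, 0, 0, 0],
                   [1, 0, 1, 0, 1, 1],
                   [0, 1, 1, 1, 0, 0]])

# apply the rule along each row, mirroring accumulate(..., axis=1)
out = np.array([list(accumulate(row.tolist(), reset_step)) for row in arr_in])
print(out)
```

The first element of each row is kept as-is and every later element is combined with the running total, which is why a 0 wipes the count.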
You can compute the cumsum for the 1s, then identify the 0s and forward-fill the cumulated sum to subtract it:
# identify 0s
mask = arr_in==0
# get classical cumsum
cs = arr_in.cumsum(axis=1)
# ffill the cumsum value on 1s
# subtract from cumsum
out = cs-np.maximum.accumulate(np.where(mask, cs, 0), axis=1)
Output:
[[1 2 3 4 5 6]
[0 0 0 0 0 0]
[1 0 1 0 1 2]
[0 1 2 3 0 0]]
Output on second example:
[[1 2 3 4 5 6 0 1]
[0 1 2 0 0 0 1 0]]
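As a sanity check, the masking approach can be wrapped in a function and compared against a plain loop. This is a sketch that assumes the input contains only 0s and 1s. The function names are made up, and the small input below was reconstructed from the second output above, so treat it as an assumption.

```python
import numpy as np


def reset_cumsum_mask(arr):
    """Cumulative sum of 1s along axis 1, reset at every 0 (0/1 input assumed)."""
    cs = arr.cumsum(axis=1)
    # value of the plain cumsum at the most recent 0 (0 if no 0 seen yet)
    last_reset = np.maximum.accumulate(np.where(arr == 0, cs, 0), axis=1)
    return cs - last_reset


def reset_cumsum_loop(arr):
    """Slow reference implementation, used only for checking."""
    out = np.zeros_like(arr)
    for i, row in enumerate(arr):
        run = 0
        for j, v in enumerate(row):
            run = run + 1 if v else 0
            out[i, j] = run
    return out


# input reconstructed from the second output above (an assumption)
arr2 = np.array([[1, 1, 1, 1, 1, 1, 0, 1],
                 [0, 1, 1, 0, 0, 0, 1, 0]])
print(reset_cumsum_mask(arr2))

# compare both implementations on random 0/1 data
rng = np.random.default_rng(0)
rand = rng.integers(0, 2, size=(50, 200))
same = np.array_equal(reset_cumsum_mask(rand), reset_cumsum_loop(rand))
print(same)
```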
I have the following array:
[[1 2 1 0 2 0]
[1 2 1 0 2 0]
[1 2 1 0 2 0]
[1 2 1 0 2 0]
[0 1 2 1 0 0]
[0 1 2 1 0 0]
[0 0 1 0 1 0]
[0 0 0 1 1 0]
[0 0 0 0 1 0]
[0 0 0 0 0 1]]
I need to add a column to this array that holds a counter starting at 3 and increasing by 1 whenever a row differs from the previous row. The result would look like this:
[[1 2 1 0 2 0 3]
[1 2 1 0 2 0 3]
[1 2 1 0 2 0 3]
[1 2 1 0 2 0 3]
[0 1 2 1 0 0 4]
[0 1 2 1 0 0 4]
[0 0 1 0 1 0 5]
[0 0 0 1 1 0 6]
[0 0 0 0 1 0 7]
[0 0 0 0 0 1 8]]
Thank you
If a is your array:
a = np.array([[1, 2, 1, 0, 2, 0], [1, 2, 1, 0, 2, 0], [1, 2, 1, 0, 2, 0], [1, 2, 1, 0, 2, 0],
[0, 1, 2, 1, 0, 0], [0, 1, 2, 1, 0, 0], [0, 0, 1, 0, 1, 0], [0, 0, 0, 1, 1, 0],
[0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1]])
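the following code will get you the result:
n = 3
a = a.tolist()
for i, j in enumerate(a):
    if i == 0:
        # first row always gets the starting number
        j.append(n)
    elif j == a[i-1][:-1]:
        # same as the previous row (ignoring its appended number): reuse n
        j.append(n)
    else:
        # row changed: move on to the next number
        n += 1
        j.append(n)
a = np.array(a)  # convert back to a numpy array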
which will give:
[[1 2 1 0 2 0 3]
[1 2 1 0 2 0 3]
[1 2 1 0 2 0 3]
[1 2 1 0 2 0 3]
[0 1 2 1 0 0 4]
[0 1 2 1 0 0 4]
[0 0 1 0 1 0 5]
[0 0 0 1 1 0 6]
[0 0 0 0 1 0 7]
[0 0 0 0 0 1 8]]
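If the array is large, the same labelling can probably be done without a Python loop. The sketch below is one possible vectorized variant, not the method above: it flags the rows that differ from their predecessor and takes a cumulative sum of those flags. The names changed and labels are made up for the example.

```python
import numpy as np

a = np.array([[1, 2, 1, 0, 2, 0], [1, 2, 1, 0, 2, 0], [1, 2, 1, 0, 2, 0], [1, 2, 1, 0, 2, 0],
              [0, 1, 2, 1, 0, 0], [0, 1, 2, 1, 0, 0], [0, 0, 1, 0, 1, 0], [0, 0, 0, 1, 1, 0],
              [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1]])

start = 3
# True where a row differs from the row just above it
changed = np.any(a[1:] != a[:-1], axis=1)
# the first row gets `start`; every change bumps the counter by 1
labels = start + np.concatenate(([0], np.cumsum(changed)))
out = np.column_stack([a, labels])
print(out)
```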
I have two arrays, one looks like this:
[[1 2 1 0 2 0 1]
[1 2 1 0 2 0 1]
[1 2 1 0 2 0 1]
[1 2 1 0 2 0 1]
[0 1 2 1 0 0 2]
[0 1 2 1 0 0 2]
[0 0 1 0 1 0 3]
[0 0 0 1 1 0 4]
[0 0 0 0 1 0 5]
[0 0 0 0 0 1 6]]
The other looks like this:
[[1 2 1 0 2 0]
[1 1 1 0 2 0]
[1 1 1 0 2 0]
[1 2 1 0 2 0]
[0 3 2 2 0 0]
[0 1 2 1 0 0]
[0 2 1 2 1 0]
[0 0 0 1 1 0]
[0 0 0 0 1 0]
[0 0 0 0 0 1]
...
[0 3 2 2 0 0]
[0 1 2 1 0 0]
[0 2 1 2 1 0]
[0 0 0 1 1 0]
[0 0 0 0 1 0]
[0 0 0 0 0 1]]
Whenever a row in the second array matches the first six values of a row in the first array, I need to append the last (7th) element of that row of the first array to the matching row of the second array. When there is no match, I need to append a 0. The result would look like this:
[[1 2 1 0 2 0 1]
[1 1 1 0 2 0 0]
[1 1 1 0 2 0 0]
[1 2 1 0 2 0 1]
[0 3 2 2 0 0 0]
[0 1 2 1 0 0 2]
[0 2 1 2 1 0 0]
[0 0 0 1 1 0 4]
[0 0 0 0 1 0 5]
[0 0 0 0 0 1 6]
...
[0 3 2 2 0 0 0]
[0 1 2 1 0 0 2]
[0 2 1 2 1 0 0]
[0 0 0 1 1 0 4]
[0 0 0 0 1 0 5]
[0 0 0 0 0 1 6]]
You could use:
import numpy as np
m = (B == A[:,None,:6]).all(2)
new_A = np.c_[B, np.where(m.any(0), np.take(A[:,6], m.argmax(0)), 0)]
How it works:
1- Use broadcasting to compare every row of B with every row of A (limited to its first 6 columns) and build a boolean mask.
2- Use numpy.where to check the condition: if at least one row of A matches, use numpy.argmax to get the index of the first match and numpy.take to get the value from A's last column; otherwise, assign 0.
3- Concatenate B and the newly built column.
output:
array([[1, 2, 1, 0, 2, 0, 1],
[1, 1, 1, 0, 2, 0, 0],
[1, 1, 1, 0, 2, 0, 0],
[1, 2, 1, 0, 2, 0, 1],
[0, 3, 2, 2, 0, 0, 0],
[0, 1, 2, 1, 0, 0, 2],
[0, 2, 1, 2, 1, 0, 0],
[0, 0, 0, 1, 1, 0, 4],
[0, 0, 0, 0, 1, 0, 5],
[0, 0, 0, 0, 0, 1, 6],
[0, 3, 2, 2, 0, 0, 0],
[0, 1, 2, 1, 0, 0, 2],
[0, 2, 1, 2, 1, 0, 0],
[0, 0, 0, 1, 1, 0, 4],
[0, 0, 0, 0, 1, 0, 5],
[0, 0, 0, 0, 0, 1, 6]])
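To make the three steps easier to follow, here is the same one-liner unpacked on a smaller pair of arrays, with the intermediate shapes printed. The small A and B and the intermediate names (has_match, first_match, last_col) are made up for illustration.

```python
import numpy as np

A = np.array([[1, 2, 1, 0, 2, 0, 1],
              [0, 1, 2, 1, 0, 0, 2],
              [0, 0, 1, 0, 1, 0, 3]])
B = np.array([[1, 2, 1, 0, 2, 0],
              [1, 1, 1, 0, 2, 0],
              [0, 1, 2, 1, 0, 0],
              [0, 0, 1, 0, 1, 0]])

# step 1: (3, 1, 6) compared with (4, 6) broadcasts to (3, 4, 6);
# .all(2) collapses the 6 columns, so m[i, j] is True when A[i, :6] equals B[j]
m = (B == A[:, None, :6]).all(2)
print(m.shape)

# step 2: for each row of B, is there a match and where is the first one?
has_match = m.any(0)
first_match = m.argmax(0)  # 0 when there is no match, hence the where() below
last_col = np.where(has_match, np.take(A[:, 6], first_match), 0)

# step 3: append the new column to B
result = np.c_[B, last_col]
print(result)
```

Note that the mask has len(A) * len(B) entries, so memory use grows with the product of the two row counts.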
inputs:
A = [[1, 2, 1, 0, 2, 0, 1],
[1, 2, 1, 0, 2, 0, 1],
[1, 2, 1, 0, 2, 0, 1],
[1, 2, 1, 0, 2, 0, 1],
[0, 1, 2, 1, 0, 0, 2],
[0, 1, 2, 1, 0, 0, 2],
[0, 0, 1, 0, 1, 0, 3],
[0, 0, 0, 1, 1, 0, 4],
[0, 0, 0, 0, 1, 0, 5],
[0, 0, 0, 0, 0, 1, 6]]
A = np.array(A)
B = [[1, 2, 1, 0, 2, 0],
[1, 1, 1, 0, 2, 0],
[1, 1, 1, 0, 2, 0],
[1, 2, 1, 0, 2, 0],
[0, 3, 2, 2, 0, 0],
[0, 1, 2, 1, 0, 0],
[0, 2, 1, 2, 1, 0],
[0, 0, 0, 1, 1, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1],
[0, 3, 2, 2, 0, 0],
[0, 1, 2, 1, 0, 0],
[0, 2, 1, 2, 1, 0],
[0, 0, 0, 1, 1, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1]]
B = np.array(B)
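If A and B are both large, the broadcast mask may not fit in memory. A dictionary lookup is one possible alternative. The sketch below is a different technique from the broadcasting answer, shown on a smaller made-up A and B. It keeps the first match for duplicated rows of A, like argmax does.

```python
import numpy as np

A = np.array([[1, 2, 1, 0, 2, 0, 1],
              [0, 1, 2, 1, 0, 0, 2],
              [0, 0, 1, 0, 1, 0, 3]])
B = np.array([[1, 2, 1, 0, 2, 0],
              [1, 1, 1, 0, 2, 0],
              [0, 1, 2, 1, 0, 0]])

# map the first six values of each row of A to its 7th value;
# setdefault keeps the first occurrence when A contains duplicate rows
lookup = {}
for row in A:
    lookup.setdefault(tuple(row[:6].tolist()), int(row[6]))

# look up each row of B, falling back to 0 when there is no match
extra = np.array([lookup.get(tuple(row.tolist()), 0) for row in B])
result = np.column_stack([B, extra])
print(result)
```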
I have an array with the shape (10000,6). For example:
a = np.array([[5, 5, 5, 5, 5, 5], [10, 10, 10, 10, 10, 10], [15, 15, 15, 15, 15, 15], ...])
I want to take every 25th row and subtract its values from the row itself and from the next 25 rows, until a new subtraction row is selected. For example, if the first row is:
[10, 10, 10, 10, 10]
then these values should be subtracted from that row itself and from the next 25 rows, until a new subtraction row is selected, for example:
[2, 2, 2, 2, 2]
then that row's values should be subtracted from the row itself and from the following 25 rows.
This means that after the operation every 25th row will be:
[0, 0, 0, 0, 0]
because it has been subtracted from itself.
Here's what I would do:
import numpy as np
arr = np.random.randint(0, 10, (9, 3))
group_size = 3
# select the vectors you want to subtract and repeat each of them group_size times
selected = arr[::group_size].repeat(group_size, axis=0)
# subtract selected vectors from all vectors in the group
sub_arr = arr-selected
output:
arr =
[[9 6 3]
[8 3 3]
[2 0 4]
[0 3 9]
[3 9 9]
[0 8 6]
[4 0 0]
[6 1 9]
[2 6 4]]
selected =
[[9 6 3]
[9 6 3]
[9 6 3]
[0 3 9]
[0 3 9]
[0 3 9]
[4 0 0]
[4 0 0]
[4 0 0]]
sub_arr =
[[ 0 0 0]
[-1 -3 0]
[-7 -6 1]
[ 0 0 0]
[ 3 6 0]
[ 0 5 -3]
[ 0 0 0]
[ 2 1 9]
[-2 6 4]]
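The repeat trick as written needs the number of rows to be a multiple of the group size. If that is not guaranteed, one possible tweak is to trim the repeated array back to the original length. This is a sketch, and the function name subtract_group_leader is made up.

```python
import numpy as np


def subtract_group_leader(arr, group_size):
    """Subtract the first row of each group from every row in that group."""
    # repeat each leader group_size times, then trim so the last
    # (possibly shorter) group still lines up with arr
    leaders = arr[::group_size].repeat(group_size, axis=0)[:len(arr)]
    return arr - leaders


arr = np.arange(14).reshape(7, 2)  # 7 rows: two full groups of 3 plus one leftover row
sub_arr = subtract_group_leader(arr, 3)
print(sub_arr)
```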
You can reshape your array so that each chunk has the right number of rows, and then simply subtract the first row of each chunk (this requires the number of rows to be a multiple of the chunk size):
import numpy as np
a = np.arange(10000)[:, None] * np.ones(6)
a = a.reshape(-1, 25, 6)
a -= a[:, 0, :][:, None, :]
a = a.reshape(-1, 6)
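A quick way to convince yourself that the reshape version does the right thing is to run it on data where the expected result is obvious. In the sketch below every row holds its own index, so after the subtraction row i should hold i % 25. The function name subtract_chunk_first is made up, and it returns a new array instead of modifying the input in place.

```python
import numpy as np


def subtract_chunk_first(a, chunk):
    """Subtract the first row of every chunk of `chunk` rows from that chunk."""
    n_rows, n_cols = a.shape
    if n_rows % chunk:
        raise ValueError("number of rows must be a multiple of chunk")
    chunks = a.reshape(-1, chunk, n_cols)
    # chunks[:, :1, :] keeps the middle axis, so it broadcasts over each chunk
    return (chunks - chunks[:, :1, :]).reshape(n_rows, n_cols)


a = np.arange(100)[:, None] * np.ones(6)  # row i is filled with the value i
out = subtract_chunk_first(a, 25)

print(np.all(out[::25] == 0))                      # every 25th row is all zeros
print(np.array_equal(out[:, 0], np.arange(100) % 25))
```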