Given a 10 x N matrix of 1s and 0s, such as:
1 1 1 1 1 1 1 1
1 1 0 1 1 0 0 1
0 0 0 1 1 0 0 1
0 0 0 1 1 0 0 0
1 0 0 0 0 1 0 0
1 0 1 1 1 1 1 1
1 0 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
Notes:
The zeroes in a column always lie between two runs of consecutive 1s. For example, a column such as 1 1 1 0 1 0 1 1 1 1 is not permitted.
There must be a gap of at least one zero in each column, i.e. a column such as 1 1 1 1 1 1 1 1 1 1 is not allowed.
I want to find the longest consecutive streak of zeroes from left to right. In this case, that would be 4, which corresponds to the path starting in the second column of the 5th row from the top.
The second longest is 3 and there are 3 examples of that.
I'm a bit stumped on this, especially for very large N (~10M). I am looking for suggestions for the right approach/data structure to use, or a similar problem and the algorithm used there. Another way to model the problem is with two lists:
L1 = [2, 2, 1, 4, 4, 1, 1, 3]
L2 = [6, 3, 5, 5, 5, 5, 5, 5]
but I'm still not quite sure how to come up with an efficient solution.
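One possible reading of the two-list model (an interpretation inferred from the matrix, not stated in the question): L1[c] is the number of leading 1s in column c and L2[c] the number of trailing 1s, so column c is zero exactly in rows L1[c] through 10 - L2[c] - 1. Under that assumption, the sketch below derives both lists from the matrix and then finds the longest horizontal run of zeros with NumPy in O(10 × N) work, which should stay practical for N around 10M. The helper names are hypothetical. Derived this way from the matrix as printed, the sixth entry of L2 comes out as 6 rather than 5.

```python
import numpy as np

m = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
])


def lists_from_matrix(mat):
    """Hypothetical helper: count the leading and trailing 1s in each column."""
    is_zero = np.asarray(mat) == 0
    # argmax on a boolean array returns the index of the first True,
    # i.e. the number of 1s above the first zero in each column
    leading = is_zero.argmax(axis=0)
    # the same trick on the upside-down matrix counts the trailing 1s
    trailing = is_zero[::-1].argmax(axis=0)
    return leading, trailing


def longest_zero_run(leading, trailing, n_rows=10):
    """Hypothetical helper: (length, row, start_col) of the longest zero run."""
    leading = np.asarray(leading)
    trailing = np.asarray(trailing)
    rows = np.arange(n_rows)[:, None]
    # cell (r, c) is zero iff leading[c] <= r < n_rows - trailing[c]
    zero = (rows >= leading) & (rows < n_rows - trailing)
    best = (0, -1, -1)
    for r in range(n_rows):
        # pad with 0 on both sides so every run has a rising and a falling edge
        padded = np.concatenate(([0], zero[r].astype(np.int8), [0]))
        edges = np.diff(padded)
        starts = np.flatnonzero(edges == 1)
        ends = np.flatnonzero(edges == -1)
        if starts.size:
            lengths = ends - starts
            i = int(lengths.argmax())
            if lengths[i] > best[0]:
                best = (int(lengths[i]), r, int(starts[i]))
    return best


L1, L2 = lists_from_matrix(m)
print(L1, L2)
print(longest_zero_run(L1, L2, n_rows=m.shape[0]))  # (4, 4, 1)
```

Rows and columns in the result are 0-based, so (4, 4, 1) is the streak of length 4 in the 5th row starting at the 2nd column. Only the two lists and one boolean row per iteration are really needed, so the full mask could also be built one row at a time if memory matters.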
A solution using the itertools.groupby(), sum() and max() functions:
import itertools
m = [
[1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 0, 1, 1, 0, 0, 1],
[0, 0, 0, 1, 1, 0, 0, 1],
[0, 0, 0, 1, 1, 0, 0, 0],
[1, 0, 0, 0, 0, 1, 0, 0],
[1, 0, 1, 1, 1, 1, 1, 1],
[1, 0, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1]
]
max_zero_len = max(sum(1 for z in g if z == 0)
for l in m for k,g in itertools.groupby(l))
print(max_zero_len)
The output:
4
for l in m for k,g in itertools.groupby(l) - generates a separate group for each run of consecutive equal values (1s or 0s) in each row (like [1, 1], [0], [1, 1], [0, 0], ...)
sum(1 for z in g if z == 0) - counts only the zeros in a group, so a run of 1s contributes 0 and a run of 0s contributes its length
max(...) - takes the maximum length over all runs of zeros
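The question also asks about the second-longest streak and how often it occurs. A small extension of the same groupby idea (a sketch; zero_runs is a hypothetical helper name) records each run's length together with its row and starting column (both 0-based), so ties can be counted. Any pure-Python loop like this still visits all 10 × N cells, so for N around 10M a vectorized NumPy approach is likely to be considerably faster.

```python
import itertools
from collections import Counter

m = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
]


def zero_runs(matrix):
    """Hypothetical helper: yield (length, row, start_col) for each run of zeros."""
    for r, row in enumerate(matrix):
        col = 0
        for value, group in itertools.groupby(row):
            length = sum(1 for _ in group)
            if value == 0:
                yield length, r, col
            col += length


runs = sorted(zero_runs(m), reverse=True)
print(runs[0])  # (4, 4, 1): length 4, row 4, starting at column 1
counts = Counter(length for length, _, _ in runs)
print(counts[3])  # 3
```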
Related
For example, I have the following array:
[0, 0, 0, 1, 0, 0, 0]
What I want is:
[0, 0, 1, 1, 1, 0, 0]
If the 1 is at either end, for example [1, 0, 0, 0], a 1 should be added on only one side: [1, 1, 0, 0].
How do I add a 1 on each side while keeping the array the same length? I have looked at the NumPy pad function, but that didn't seem like the right approach.
One way is to use numpy.convolve with mode="same":
import numpy as np
np.convolve([0, 0, 0, 1, 0, 0, 0], [1,1,1], "same")
Output:
array([0, 0, 1, 1, 1, 0, 0])
With other examples:
np.convolve([1,0,0,0], [1,1,1], "same")
# array([1, 1, 0, 0])
np.convolve([0,0,0,1], [1,1,1], "same")
# array([0, 0, 1, 1])
np.convolve([1,0,0,0,1,0,0,0], [1,1,1], "same")
# array([1, 1, 0, 1, 1, 1, 0, 0])
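Because convolution sums each neighbourhood, two 1s that are two positions apart both spread into the cell between them, which then holds 2 instead of 1. If the result has to stay binary, clipping it afterwards is one option (a sketch on a made-up input):

```python
import numpy as np

a = np.array([1, 0, 1, 0, 0, 0, 1])
spread = np.convolve(a, [1, 1, 1], "same")
print(spread)   # [1 2 1 1 0 1 1]
# cap every entry at 1 so overlapping neighbourhoods do not produce 2s
binary = np.minimum(spread, 1)
print(binary)   # [1 1 1 1 0 1 1]
```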
You can use np.pad to create two shifted copies of the array: one shifted one position to the left (e.g. 0 1 0 -> 1 0 0) and one shifted one position to the right (e.g. 0 1 0 -> 0 0 1).
Then you can add all three arrays together:
0 1 0
1 0 0
+ 0 0 1
-------
1 1 1
Code:
output = a + np.pad(a, (1,0))[:-1] + np.pad(a, (0,1))[1:]
# (1, 0) says to pad 1 time at the start of the array and 0 times at the end
# (0, 1) says to pad 0 times at the start of the array and 1 time at the end
Output:
# Original array
>>> a = np.array([1, 0, 0, 0, 1, 0, 0, 0])
>>> a
array([1, 0, 0, 0, 1, 0, 0, 0])
# New array
>>> output = a + np.pad(a, (1,0))[:-1] + np.pad(a, (0,1))[1:]
>>> output
array([1, 1, 0, 1, 1, 1, 0, 0])
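The shift-and-add idea generalizes to spreading each 1 over k neighbours on each side. The sketch below (spread_ones and k are hypothetical names) uses slice assignment with a logical OR instead of addition, which keeps the result binary even where neighbourhoods overlap and avoids building padded copies:

```python
import numpy as np


def spread_ones(a, k=1):
    """Hypothetical helper: set every cell within k positions of a 1 to 1."""
    a = np.asarray(a)
    hit = a.astype(bool)
    out = hit.copy()
    for shift in range(1, k + 1):
        out[shift:] |= hit[:-shift]   # spread each 1 to the right
        out[:-shift] |= hit[shift:]   # spread each 1 to the left
    return out.astype(a.dtype)


print(spread_ones([1, 0, 0, 0, 1, 0, 0, 0]))      # [1 1 0 1 1 1 0 0]
print(spread_ones([0, 0, 0, 1, 0, 0, 0], k=2))    # [0 1 1 1 1 1 0]
```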
I have the following array:
[[1 2 1 0 2 0]
[1 2 1 0 2 0]
[1 2 1 0 2 0]
[1 2 1 0 2 0]
[0 1 2 1 0 0]
[0 1 2 1 0 0]
[0 0 1 0 1 0]
[0 0 0 1 1 0]
[0 0 0 0 1 0]
[0 0 0 0 0 1]]
I need to add a column to this array holding a number that starts at 3 and increases by 1 whenever a row differs from the row before it. So the result would look like this:
[[1 2 1 0 2 0 3]
[1 2 1 0 2 0 3]
[1 2 1 0 2 0 3]
[1 2 1 0 2 0 3]
[0 1 2 1 0 0 4]
[0 1 2 1 0 0 4]
[0 0 1 0 1 0 5]
[0 0 0 1 1 0 6]
[0 0 0 0 1 0 7]
[0 0 0 0 0 1 8]]
Thank you
If a is your array:
import numpy as np
a = np.array([[1, 2, 1, 0, 2, 0], [1, 2, 1, 0, 2, 0], [1, 2, 1, 0, 2, 0], [1, 2, 1, 0, 2, 0],
[0, 1, 2, 1, 0, 0], [0, 1, 2, 1, 0, 0], [0, 0, 1, 0, 1, 0], [0, 0, 0, 1, 1, 0],
[0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1]])
the following code will give you the result:
n = 3
a = a.tolist()
for i, j in enumerate(a):
    if i == 0:
        j.append(n)
    elif j == a[i-1][:-1]:
        # same as the previous row (ignoring the counter already appended to it)
        j.append(n)
    else:
        n += 1
        j.append(n)
a = np.array(a)  # convert back to a NumPy array
which will give:
[[1 2 1 0 2 0 3]
[1 2 1 0 2 0 3]
[1 2 1 0 2 0 3]
[1 2 1 0 2 0 3]
[0 1 2 1 0 0 4]
[0 1 2 1 0 0 4]
[0 0 1 0 1 0 5]
[0 0 0 1 1 0 6]
[0 0 0 0 1 0 7]
[0 0 0 0 0 1 8]]
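For large arrays, the row-by-row Python loop can be replaced by a vectorized sketch: compare each row with the one above it, take a cumulative sum of the changes and offset it by 3. Like the loop, it compares only neighbouring rows, so a row that reappears after a different one gets a new number.

```python
import numpy as np

a = np.array([[1, 2, 1, 0, 2, 0], [1, 2, 1, 0, 2, 0], [1, 2, 1, 0, 2, 0], [1, 2, 1, 0, 2, 0],
              [0, 1, 2, 1, 0, 0], [0, 1, 2, 1, 0, 0], [0, 0, 1, 0, 1, 0], [0, 0, 0, 1, 1, 0],
              [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1]])

start = 3
# True wherever a row differs from the row directly above it
changed = np.any(a[1:] != a[:-1], axis=1)
# the first row gets `start`; every change bumps the counter by 1
counter = start + np.concatenate(([0], np.cumsum(changed)))
result = np.column_stack((a, counter))
print(result[:, -1])  # [3 3 3 3 4 4 5 6 7 8]
```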
I have two arrays. The first one looks like this:
[[1 2 1 0 2 0 1]
[1 2 1 0 2 0 1]
[1 2 1 0 2 0 1]
[1 2 1 0 2 0 1]
[0 1 2 1 0 0 2]
[0 1 2 1 0 0 2]
[0 0 1 0 1 0 3]
[0 0 0 1 1 0 4]
[0 0 0 0 1 0 5]
[0 0 0 0 0 1 6]]
The other looks like this:
[[1 2 1 0 2 0]
[1 1 1 0 2 0]
[1 1 1 0 2 0]
[1 2 1 0 2 0]
[0 3 2 2 0 0]
[0 1 2 1 0 0]
[0 2 1 2 1 0]
[0 0 0 1 1 0]
[0 0 0 0 1 0]
[0 0 0 0 0 1]
...
[0 3 2 2 0 0]
[0 1 2 1 0 0]
[0 2 1 2 1 0]
[0 0 0 1 1 0]
[0 0 0 0 1 0]
[0 0 0 0 0 1]]
Whenever a row in the second array matches the first six values of a row in the first array, I need to append that row's last (7th) element to the end of the matching row in the second array; when there is no match, I need to append a 0. The result would look like this:
[[1 2 1 0 2 0 1]
[1 1 1 0 2 0 0]
[1 1 1 0 2 0 0]
[1 2 1 0 2 0 1]
[0 3 2 2 0 0 0]
[0 1 2 1 0 0 2]
[0 2 1 2 1 0 0]
[0 0 0 1 1 0 4]
[0 0 0 0 1 0 5]
[0 0 0 0 0 1 6]
...
[0 3 2 2 0 0 0]
[0 1 2 1 0 0 2]
[0 2 1 2 1 0 0]
[0 0 0 1 1 0 4]
[0 0 0 0 1 0 5]
[0 0 0 0 0 1 6]]
You could use:
import numpy as np
m = (B == A[:,None,:6]).all(2)
new_A = np.c_[B, np.where(m.any(0), np.take(A[:,6], m.argmax(0)), 0)]
How it works:
1- Use broadcasting to compare each row of B with every row of A (restricted to A's first 6 columns) and build a boolean mask of shape (len(A), len(B)).
2- Use numpy.where to check the condition: if at least one row of A matches, use numpy.argmax to get the index of the first match and numpy.take to fetch the value from A's last column; otherwise, assign 0.
3- Concatenate B and the newly built column with np.c_.
output:
array([[1, 2, 1, 0, 2, 0, 1],
[1, 1, 1, 0, 2, 0, 0],
[1, 1, 1, 0, 2, 0, 0],
[1, 2, 1, 0, 2, 0, 1],
[0, 3, 2, 2, 0, 0, 0],
[0, 1, 2, 1, 0, 0, 2],
[0, 2, 1, 2, 1, 0, 0],
[0, 0, 0, 1, 1, 0, 4],
[0, 0, 0, 0, 1, 0, 5],
[0, 0, 0, 0, 0, 1, 6],
[0, 3, 2, 2, 0, 0, 0],
[0, 1, 2, 1, 0, 0, 2],
[0, 2, 1, 2, 1, 0, 0],
[0, 0, 0, 1, 1, 0, 4],
[0, 0, 0, 0, 1, 0, 5],
[0, 0, 0, 0, 0, 1, 6]])
inputs:
A = [[1, 2, 1, 0, 2, 0, 1],
[1, 2, 1, 0, 2, 0, 1],
[1, 2, 1, 0, 2, 0, 1],
[1, 2, 1, 0, 2, 0, 1],
[0, 1, 2, 1, 0, 0, 2],
[0, 1, 2, 1, 0, 0, 2],
[0, 0, 1, 0, 1, 0, 3],
[0, 0, 0, 1, 1, 0, 4],
[0, 0, 0, 0, 1, 0, 5],
[0, 0, 0, 0, 0, 1, 6]]
A = np.array(A)
B = [[1, 2, 1, 0, 2, 0],
[1, 1, 1, 0, 2, 0],
[1, 1, 1, 0, 2, 0],
[1, 2, 1, 0, 2, 0],
[0, 3, 2, 2, 0, 0],
[0, 1, 2, 1, 0, 0],
[0, 2, 1, 2, 1, 0],
[0, 0, 0, 1, 1, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1],
[0, 3, 2, 2, 0, 0],
[0, 1, 2, 1, 0, 0],
[0, 2, 1, 2, 1, 0],
[0, 0, 0, 1, 1, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1]]
B = np.array(B)
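The broadcasting answer builds an intermediate boolean array of shape (len(A), len(B), 6), which may get large when both arrays are long. An alternative sketch, assuming exact integer matches are what you want, stores the first six values of each row of A as a dict key so each row of B needs a single lookup; append_lookup is a hypothetical name:

```python
import numpy as np

A = np.array([[1, 2, 1, 0, 2, 0, 1], [1, 2, 1, 0, 2, 0, 1], [1, 2, 1, 0, 2, 0, 1],
              [1, 2, 1, 0, 2, 0, 1], [0, 1, 2, 1, 0, 0, 2], [0, 1, 2, 1, 0, 0, 2],
              [0, 0, 1, 0, 1, 0, 3], [0, 0, 0, 1, 1, 0, 4], [0, 0, 0, 0, 1, 0, 5],
              [0, 0, 0, 0, 0, 1, 6]])
B = np.array([[1, 2, 1, 0, 2, 0], [1, 1, 1, 0, 2, 0], [1, 1, 1, 0, 2, 0], [1, 2, 1, 0, 2, 0],
              [0, 3, 2, 2, 0, 0], [0, 1, 2, 1, 0, 0], [0, 2, 1, 2, 1, 0], [0, 0, 0, 1, 1, 0],
              [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1], [0, 3, 2, 2, 0, 0], [0, 1, 2, 1, 0, 0],
              [0, 2, 1, 2, 1, 0], [0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1]])


def append_lookup(A, B):
    """Hypothetical helper: append A[:, 6] to each row of B equal to some A[:, :6]."""
    lookup = {}
    for row in A:
        # setdefault keeps the first match, like argmax in the broadcasting answer
        lookup.setdefault(tuple(row[:6].tolist()), row[6])
    # rows of B with no match in A get 0
    extra = [lookup.get(tuple(row.tolist()), 0) for row in B]
    return np.column_stack((B, extra))


result = append_lookup(A, B)
print(result[:, 6])  # [1 0 0 1 0 2 0 4 5 6 0 2 0 4 5 6]
```

This trades the O(len(A) × len(B)) comparison for roughly O(len(A) + len(B)) dictionary operations, at the cost of a Python-level loop.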
How can I set the values in a column to zero from a negative entry all the way up to the top of that column?
import numpy as np
data = np.array([[1, 1, 1, 2], [0, 1, 0, -1], [-1, 0, 1, 0], [1, 1, 1, 1]])
resetneg_data = np.where(data<0, 0, data)
print(resetneg_data)
This gives me:
[[1 1 1 2]
[0 1 0 0]
[0 0 1 0]
[1 1 1 1]]
But what I want is:
[[0 1 1 0]
[0 1 0 0]
[0 0 1 0]
[1 1 1 1]]
That is: zero where the value is negative, and zero everywhere above a negative value in the same column, but not zero above other zeros. If a column drops below zero in some row, all the rows above it in that column reset to zero.
Can I mask the values somehow by finding the specific ranges:
mask_end = np.where(data < 0)
print(mask_end)
gives:
(array([1, 2]), array([3, 0]))
Maybe I could use those indices to set everything in each column, up to that row, to zero?
# mark negative values and every value above a negative value in the same column:
# a running minimum taken from the bottom up is negative exactly there
mask = np.minimum.accumulate(data[::-1])[::-1] < 0
# set the values at the masked positions to 0
data[mask] = 0
print(data)
#[[0 1 1 0]
# [0 1 0 0]
# [0 0 1 0]
# [1 1 1 1]]
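The running-minimum trick works because the minimum of a column from some row downward is negative exactly when there is a negative value at or below that row. The same mask can be expressed as a cumulative logical OR over data < 0, taken from the bottom up; this sketch uses np.where so the original data is left untouched:

```python
import numpy as np

data = np.array([[1, 1, 1, 2], [0, 1, 0, -1], [-1, 0, 1, 0], [1, 1, 1, 1]])
negative = data < 0
# walk each column from the bottom up: once a negative value has been seen,
# every cell above it is flagged as well
mask = np.logical_or.accumulate(negative[::-1], axis=0)[::-1]
result = np.where(mask, 0, data)
print(result)
```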
In Python 3 (3.6.5), I have data in a (much longer) NumPy array that looks like this:
data = np.array([[16347, 0, 60],[16353, 0, 92],[16382, 0, 1],[17867, 0, 2],[20188, 0, 3],[21459, 0, 512],[21873, 0, 71],[22031, 0, 4],[23072, 0, 61],[25378, 0, 60],[25385, 0, 82],[25410, 0, 1],[26895, 0, 2],[29233, 0, 3],[31695, 0, 71],[31845, 0, 4],[32886, 0, 61],[35069, 0, 60],[35075, 0, 90],[35104, 0, 1]])
The first two columns can be ignored for the purpose of this question. In the third, I would like to replace every 2 with the value in the same column two rows earlier. For instance, in the example data there is a 2 in the 4th row, and it should be replaced by the number 92 from row 2. Similarly, the 2 in row 13 needs to be replaced by 82 from row 11, and so on.
In short, I need to find every 2 in a column of a NumPy array and replace it with whatever value was in the same column 2 rows before.
I'd appreciate any tips or ideas.
Thanks!
You can use np.where() and np.roll() to do this:
data[:,-1] = np.where(data[:,-1]==2, np.roll(data[:,-1],2), data[:,-1])
Here
data[:,-1]
Isolates the 3rd column you are interested in, and np.where() returns an array whose values depend on the condition: where the value equals 2, it takes the corresponding value from
np.roll(data[:,-1],2)
which is the original column shifted forward by 2, with the last two values wrapping around to become the first two; everywhere else it keeps the original value.
The input array:
[[16347 0 60]
[16353 0 92]
[16382 0 1]
[17867 0 2]
[20188 0 3]
[21459 0 512]
[21873 0 71]
[22031 0 4]
[23072 0 61]
[25378 0 60]
[25385 0 82]
[25410 0 1]
[26895 0 2]
[29233 0 3]
[31695 0 71]
[31845 0 4]
[32886 0 61]
[35069 0 60]
[35075 0 90]
[35104 0 1]]
becomes the desired result:
[[16347 0 60]
[16353 0 92]
[16382 0 1]
[17867 0 92]
[20188 0 3]
[21459 0 512]
[21873 0 71]
[22031 0 4]
[23072 0 61]
[25378 0 60]
[25385 0 82]
[25410 0 1]
[26895 0 82]
[29233 0 3]
[31695 0 71]
[31845 0 4]
[32886 0 61]
[35069 0 60]
[35075 0 90]
[35104 0 1]]