How to delete specific rows from a numpy array using a condition? - arrays

This is the code
a = np.array([[ 0, 1],
[ 3, 11],
[4,2]])
This is what I tried
a= a[a[0]>0,:]
It works fine when I only have two elements, but anything more it throws an error.What I am trying to do is that in the first column if there's a value less than one than I need to delete that entire row.
so the expected output is
([ 3, 11],
[4,2]])
I was hoping for a solution which I could generalize even if there were more than 2 elements per item such as
([2,3,4,5],
[8,2,4,6],
[2,4,9,1],
[5,3,2,0],)
then the application of the code will give a result such as
([2,3,4,5],
[8,2,4,6],
[2,4,9,1],)
Any suggestions.

For just the first column use a[:,0] > 0 which will pull all the values from the first column and check which are > 0 or whatever condition you want:
In [50]: a = np.array([[ 0, 1],
[ 3, 11],
[4,2]])
In [51]: a[a[:,0] > 0]
Out[51]:
array([[ 3, 11],
[ 4, 2]])
You can use all if you want to check all values in each row:
In [43]: a = np.array([[ 0, 1],
[ 3, 11],
[4,2]])
In [44]: a[(a >= 0).all(axis=1)]
Out[44]:
array([[ 3, 11],
[ 4, 2]])
In [45]: a = np.array ([[2,3,4,5],
[8,2,4,6],
[2,4,9,1],
[5,3,2,0]])
In [46]: a[(a > 0).all(axis=1)]
Out[46]:
array([[2, 3, 4, 5],
[8, 2, 4, 6],
[2, 4, 9, 1]])

Related

Confusion regarding resulting shape of a multi-dimensional slicing of a numpy array

Suppose we have
t = np.random.rand(2,3,4)
i.e., a 2x3x4 tensor.
I'm having trouble understanding why the shape of t[0][:][:2] is 2x4 rather than 3x2.
Aren't we taking the 0th, all, and the first indices of the 1st, 2nd, and 3rd dimensions, in which case that would give us a 3x2 tensor?
In [1]: t = np.arange(24).reshape(2,3,4)
In [2]: t
Out[2]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
Select the 1st plane:
In [3]: t[0]
Out[3]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
[:] selects everything - ie. no change
In [4]: t[0][:]
Out[4]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Select first 2 rows of that plane:
In [5]: t[0][:][:2]
Out[5]:
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
While [i][j] works for integer indices, it shouldn't be used for slices or arrays. Each [] acts on the result of the previous. [:] is not a 'placeholder' for the 'middle dimension'. Python syntax and execution order applies, even when using numpy. numpy just adds an array class, and functions. It doesn't change the syntax.
Instead you want:
In [6]: t[0,:,:2]
Out[6]:
array([[0, 1],
[4, 5],
[8, 9]])
The result is the 1st plane, and first 2 columns, (and all rows). With all 3 dimensions in one [] they are applied in a coordinated manner, not sequentially.
There is a gotcha, when using a slice in the middle of 'advanced indices'. Same selection as before, but transposed.
In [8]: t[0,:,[0,1]]
Out[8]:
array([[0, 4, 8],
[1, 5, 9]])
For this you need a partial decomposition - applying the 0 first
In [9]: t[0][:,[0,1]]
Out[9]:
array([[0, 1],
[4, 5],
[8, 9]])
There is a big indexing page in the numpy docs that you need to study sooner or later.

array rows where the random-integer elements may have different ranges

Consider the following code fragment:
import numpy as np
mask = np.array([True, True, False, True, True, False])
val = np.array([9, 3])
arr = np.random.randint(1, 9, size = (5,len(mask)))
As expected, we get an array of random integers, 1 to 9, with 5 rows and 6 columns as below. The val array has not been used yet.
[[2, 7, 6, 9, 7, 5],
[7, 2, 9, 7, 8, 3],
[9, 1, 3, 5, 7, 3],
[5, 7, 4, 4, 5, 2],
[7, 7, 9, 6, 9, 8]]
Now I'll introduce val = [9, 3].
Where mask = True, I want the row element to be taken randomly from 1 to 9.
Where mask = False, I want the row element to be taken randomly from 1 to 3.
How can this be done efficiently? A sample output is shown below.
[[2, 7, 2, 9, 7, 1],
[7, 2, 1, 7, 8, 3],
[9, 1, 3, 5, 7, 3],
[5, 7, 1, 4, 5, 2],
[7, 7, 2, 6, 9, 1]]
One idea is to sample randomly between 0 to 1, then multiply with 9 or 3 depending on mask, and finally add 1 to move the sample.
rand = np.random.rand(5,len(mask))
is3 = (1-mask).astype(int)
# out is random from 0-8 or 0-2 depending on `is3`
out = (rand*val[is3]).astype(int)
# move out by `1`:
out = (out + 1)
Output:
array([[4, 9, 3, 6, 2, 1],
[1, 8, 2, 7, 1, 3],
[8, 2, 1, 2, 3, 2],
[4, 3, 2, 2, 3, 2],
[5, 8, 1, 5, 6, 1]])

drop np array rows based on element uniqueness and one other condition

Consider the 2d integer array below:
import numpy as np
arr = np.array([[1, 3, 5, 2, 8],
[9, 6, 1, 7, 6],
[4, 4, 1, 8, 0],
[2, 3, 1, 8, 5],
[1, 2, 3, 4, 5],
[6, 6, 7, 9, 1],
[5, 3, 1, 8, 2]])
PROBLEM: Eliminate rows from arr that meet two conditions:
a) The row's elements MUST be unique
b) From these unique-element rows, I want to eliminate the permutation duplicates.
All other rows in arr are kept.
In the example given above, the rows with indices 0,3,4, and 6 meet condition a). Their elements are unique.
Of these 4 rows, the ones with indices 0,3,6 are permutations of each other: I want to keep
one of them, say index 0, and ELIMINATE the other two.
The output would look like:
[[1, 3, 5, 2, 8],
[9, 6, 1, 7, 6],
[4, 4, 1, 8, 0],
[1, 2, 3, 4, 5],
[6, 6, 7, 9, 1]])
I can identify the rows that meet condition a) with something like:
s = np.sort(arr,axis=1)
arr[~(s[:,:-1] == s[:,1:]).any(1)]
But, I'm not sure at all how to eliminate the permutation duplicates.
Here's one way -
# Sort along row
b = np.sort(arr,axis=1)
# Mask of rows with unique elements and select those rows
m = (b[:,:-1] != b[:,1:]).all(1)
d = b[m]
# Indices of uniq rows
idx = np.flatnonzero(m)
# Get indices of rows among them that are unique as per possible permutes
u,stidx,c = np.unique(d, axis=0, return_index=True, return_counts=True)
# Concatenate unique ones among these and non-masked ones
out = arr[np.sort(np.r_[idx[stidx], np.flatnonzero(~m)])]
Alternatively, final step could be optimized further, with something like this -
m[idx[stidx]] = 0
out = arr[~m]

Eliminating array rows based on a property of consecutive pairs of elements

We are given an array sample a, shown below, and a constant c.
import numpy as np
a = np.array([[1, 3, 1, 11, 9, 14],
[2, 12, 1, 10, 7, 6],
[6, 7, 2, 14, 2, 15],
[14, 8, 1, 3, -7, 2],
[0, -3, 0, 3, -3, 0],
[2, 2, 3, 3, 12, 13],
[3, 14, 4, 12, 1, 4],
[0, 13, 13, 4, 0, 3]])
c = 2
It is convenient, in this problem, to think of each array row as being composed of three pairs, so the 1st row is [1,3, 1,11, 9,14].
DEFINITION: d_min is the minimum difference between the elements of two consecutive pairs.
The PROBLEM: I want to retain rows of array a, where all consecutive pairs have d_min <= c. Otherwise, the rows should be eliminated.
In the 1st array row, the 1st pair (1,3) and the 2nd pair (1,11) have d_min = 1-1=0.
The 2nd pair (1,11) and the 3rd pair(9,14) have d_min = 11-9=2. (in both cases, d_min<=c, so we keep this row in a)
In the 2nd array row, the 1st pair (2,12) and the 2nd pair (1,10) have d_min = 2-1=1.
But, the 2nd pair (1,10) and the 3rd pair(7,6) have d_min = 10-7=3. (3 > c, so this row should be eliminated from array a)
Current efforts: I currently handle this problem with nested for-loops (2 deep).
The outer loop runs through the rows of array a, determining d_min between the first two pairs using:
for r in a
d_min = np.amin(np.abs(np.subtract.outer(r[:2], r[2:4])))
The inner loop uses the same method to determine the d_min between the last two pairs.
Further processing only is done only when d_min<= c for both sets of consecutive pairs.
I'm really hoping there is a way to avoid the for-loops. I eventually need to deal with 8-column arrays, and my current approach would involve 3-deep looping.
In the example, there are 4 row eliminations. The final result should look like:
a = np.array([[1, 3, 1, 11, 9, 14],
[0, -3, 0, 3, -3, 0],
[3, 14, 4, 12, 1, 4],
[0, 13, 13, 4, 0, 3]])
Assume the number of elements in each row is always even:
import numpy as np
a = np.array([[1, 3, 1, 11, 9, 14],
[2, 12, 1, 10, 7, 6],
[6, 7, 2, 14, 2, 15],
[14, 8, 1, 3, -7, 2],
[0, -3, 0, 3, -3, 0],
[2, 2, 3, 3, 12, 13],
[3, 14, 4, 12, 1, 4],
[0, 13, 13, 4, 0, 3]])
c = 2
# separate the array as previous pairs and next pairs
sx, sy = a.shape
prev_shape = sx, (sy - 2) // 2, 1, 2
next_shape = sx, (sy - 2) // 2, 2, 1
prev_pairs = a[:, :-2].reshape(prev_shape)
next_pairs = a[:, 2:].reshape(next_shape)
# subtract which will effectively work as outer subtraction due to numpy broadcasting, and
# calculate the minimum difference for each pair
pair_diff_min = np.abs(prev_pairs - next_pairs).min(axis=(2, 3))
# calculate the filter condition as boolean array
to_keep = pair_diff_min.max(axis=1) <= c
print(a[to_keep])
#[[ 1 3 1 11 9 14]
# [ 0 -3 0 3 -3 0]
# [ 3 14 4 12 1 4]
# [ 0 13 13 4 0 3]]
Demo Link

How do I split a 9x9 array into 9 3x3 components

I have a 9x9 multidimensional array that represents a sudoku game. I need to break it into it's 9 3x3 many components. How would this be done? I have absolutely no idea where to begin, here.
game = [
[1, 3, 2, 5, 7, 9, 4, 6, 8],
[4, 9, 8, 2, 6, 1, 3, 7, 5],
[7, 5, 6, 3, 8, 4, 2, 1, 9],
[6, 4, 3, 1, 5, 8, 7, 9, 2],
[5, 2, 1, 7, 9, 3, 8, 4, 6],
[9, 8, 7, 4, 2, 6, 5, 3, 1],
[2, 1, 4, 9, 3, 5, 6, 8, 7],
[3, 6, 5, 8, 1, 7, 9, 2, 4],
[8, 7, 9, 6, 4, 2, 1, 5, 3]
]
Split into chunks, it becomes
chunk_1 = [
[1, 3, 2],
[4, 9, 8],
[7, 5, 6]
]
chunk_2 = [
[5, 7, 9],
[2, 6, 1],
[3, 8, 4]
]
...and so on
That was a fun exercise!
Answer
game.each_slice(3).map{|stripe| stripe.transpose.each_slice(3).map{|chunk| chunk.transpose}}.flatten(1)
It would be cumbersome and not needed to define every chunk_1, chunk_2, ....
If you want chunk_2, you can use extract_chunks(game)[1]
It outputs [chunk_1, chunk_2, chunk_3, ..., chunk_9], so it's an Array of Arrays of Arrays :
1 3 2
4 9 8
7 5 6
5 7 9
2 6 1
3 8 4
4 6 8
3 7 5
2 1 9
6 4 3
5 2 1
...
You can define a method to check if this grid is valid (it is) :
def extract_chunks(game)
game.each_slice(3).map{|stripe| stripe.transpose.each_slice(3).map{|chunk| chunk.transpose}}.flatten(1)
end
class Array # NOTE: Use refinements if you don't want to patch Array
def has_nine_unique_elements?
self.flatten(1).uniq.size == 9
end
end
def valid?(game)
game.has_nine_unique_elements? &&
game.all?{|row| row.has_nine_unique_elements? } &&
game.all?{|column| column.has_nine_unique_elements? } &&
extract_chunks(game).all?{|chunk| chunk.has_nine_unique_elements? }
end
puts valid?(game) #=> true
Theory
The big grid can be sliced in 3 stripes, each containing 3 rows of 9 cells.
The first stripe will contain chunk_1, chunk_2 and chunk_3.
We need to cut the strip vertically into 3 chunks. To do so :
We transpose the strip,
Cut it horizontally with each_slice,
transpose back again.
We do the same for stripes #2 and #3.
To avoid returning an Array of Stripes of Chunks of Rows of Cells, we use flatten(1) to remove one level and return an Array of Chunks of Rows of Cells. :)
The method Matrix#minor is tailor-made for this:
require 'matrix'
def sub3x3(game, i, j)
Matrix[*game].minor(3*i, 3, 3*j, 3).to_a
end
chunk1 = sub3x3(game, 0, 0)
#=> [[1, 3, 2], [4, 9, 8], [7, 5, 6]]
chunk2 = sub3x3(game, 0, 1)
#=> [[5, 7, 9], [2, 6, 1], [3, 8, 4]]
chunk3 = sub3x3(game, 0, 2)
#=> [[4, 6, 8], [3, 7, 5], [2, 1, 9]]
chunk4 = sub3x3(game, 1, 0)
#=> [[6, 4, 3], [5, 2, 1], [9, 8, 7]]
...
chunk9 = sub3x3(game, 2, 2)
#=> [[6, 8, 7], [9, 2, 4], [1, 5, 3]]
Ruby has not concept of "rows" and "columns" of arrays. For convenience, therefore, I will refer to the 3x3 "subarray" of game, at offsets i and j (i = 0,1,2, j = 0,1,2), as the 3x3 submatrix of m = Matrix[*game] whose upper left value is at row offset 3*i and column offset 3*j of m, converted to an array.
This is relatively inefficient as a new matrix is created for the calculation of each "chunk". Considering the size of the array, this is not a problem, but rather than making that more efficient you really need to rethink the overall design. Creating nine local variables (rather than, say, an array of nine arrays) is not the way to go.
Here's a suggestion for checking the validity of game (that uses the method sub3x3 above) once all the open cells have been filled. Note that I've used the Wiki description of the game, in which the only valid entries are the digits 1-9, and I have assumed the code enforces that requirement when players enter values into cells.
def invalid_vector_index(game)
game.index { |vector| vector.uniq.size < 9 }
end
def sub3x3_invalid?(game, i, j)
sub3x3(game, i, j).flatten.uniq.size < 9
end
def valid?(game)
i = invalid_vector_index(game)
return [:ROW_ERR, i] if i
j = invalid_vector_index(game.transpose)
return [:COL_ERR, j] if j
m = Matrix[*game]
(0..2).each do |i|
(0..2).each do |j|
return [:SUB_ERR, i, j] if sub3x3_invalid?(game, i, j)
end
end
true
end
valid?(game)
#=> true
Notice this either returns true, meaning game is valid, or an array that both signifies that the solution is not valid and contains information that can be used to inform the player of the reason.
Now try
game[5], game[6] = game[6], game[5]
so
game
#=> [[1, 3, 2, 5, 7, 9, 4, 6, 8],
# [4, 9, 8, 2, 6, 1, 3, 7, 5],
# [7, 5, 6, 3, 8, 4, 2, 1, 9],
# [6, 4, 3, 1, 5, 8, 7, 9, 2],
# [5, 2, 1, 7, 9, 3, 8, 4, 6],
# [2, 1, 4, 9, 3, 5, 6, 8, 7],
# [9, 8, 7, 4, 2, 6, 5, 3, 1],
# [3, 6, 5, 8, 1, 7, 9, 2, 4],
# [8, 7, 9, 6, 4, 2, 1, 5, 3]]
valid?(game)
#=> [:SUB_ERR, 1, 0]
The rows and columns are obviously still valid, but this return value indicates that at least one 3x3 subarray is invalid and the array
[[6, 4, 3],
[5, 2, 1],
[2, 1, 4]]
was the first found to be invalid.
You could create a method that generates a single 3X3 chunk from a given index. since the sudoku board is of length 9, that will produce 9 3X3 chunks for you. see below.
#steps
#you'll loop through each index of the board
#to get the x value
#you divide the index by 3 and multiply by 3
#to get the y value
#you divide the index by 3, take remainder and multiply by 3
#for each x value, you can get 3 y values
#this will give you a single 3X3 box from one index so
def three_by3(index, sudoku)
#to get x value
x=(index/3)*3
#to get y value
y=(index%3)*3
(x...x+3).each_with_object([]) do |x,arr|
(y...y+3).each do |y|
arr<<sudoku[x][y]
end
end
end
sudoku = [ [1,2,3,4,5,6,7,8,9],
[2,3,4,5,6,7,8,9,1],
[3,4,5,6,7,8,9,1,2],
[1,2,3,4,5,6,7,8,9],
[2,3,4,5,6,7,8,9,1],
[3,4,5,6,7,8,9,1,2],
[1,2,3,4,5,6,7,8,9],
[2,3,4,5,6,7,8,9,1],
[3,4,5,6,7,8,9,1,2]]
p (0...sudoku.length).map {|i| three_by3(i,sudoku)}
#output:
#[[1, 2, 3, 2, 3, 4, 3, 4, 5],
# [4, 5, 6, 5, 6, 7, 6, 7, 8],
# [7, 8, 9, 8, 9, 1, 9, 1, 2],
# [1, 2, 3, 2, 3, 4, 3, 4, 5],
# [4, 5, 6, 5, 6, 7, 6, 7, 8],
# [7, 8, 9, 8, 9, 1, 9, 1, 2],
# [1, 2, 3, 2, 3, 4, 3, 4, 5],
# [4, 5, 6, 5, 6, 7, 6, 7, 8],
# [7, 8, 9, 8, 9, 1, 9, 1, 2]]

Resources