Removing array rows based on certain matches between elements - arrays

Consider the small sample of a 6-column integer array:
import numpy as np
J = np.array([[1, 3, 1, 3, 2, 5],
[2, 6, 3, 4, 2, 6],
[1, 7, 2, 5, 2, 5],
[4, 2, 8, 3, 8, 2],
[0, 3, 0, 3, 0, 3],
[2, 2, 3, 3, 2, 3],
[4, 3, 4, 3, 3, 4])
I want to remove, from J:
a) all rows where the first and second PAIRS of elements are exact matches
(this remove rows like [1,3, 1,3, 2,5])
b) all rows where the second and third PAIRS of elements are exact matches
(this remove rows like [1,7, 2,5, 2,5])
Matches between any other pairs are OK.
I have a solution, below, but it is handled in two steps. If there is a more direct, cleaner, or more readily extendable approach, I'd be very interested.
K = J[~(np.logical_and(J[:,0] == J[:,2], J[:,1] == J[:,3]))]
L = K[~(np.logical_and(K[:,2] == J[:,4], K[:,3] == K[:,5]))]
K removes the 1st, 5th, and 7th rows from J, leaving
K = [[2, 6, 3, 4, 2, 6],
[1, 7, 2, 5, 2, 5],
[4, 2, 8, 3, 8, 2],
[2, 2, 3, 3, 2, 3]])
L removes the 2nd row from K, giving the final outcome.
L = [[2, 6, 3, 4, 2, 6],
[4, 2, 8, 3, 8, 2],
[2, 2, 3, 3, 2, 3]])
I'm hoping for an efficient solution because, learning from this problem, I need to extend these ideas to 8-column arrays where
I eliminate rows having exact matches between the 1st and 2nd PAIRS, the 2nd and 3rd PAIRS, and the 3rd and 4th PAIRS.

Since we are checking for adjacent pairs for equality, a differencing on 3D reshaped data seems would be one way to do it for a cleaner vectorized one -
# a is input array
In [117]: b = a.reshape(a.shape[0],-1,2)
In [118]: a[~(np.diff(b,axis=1)==0).all(2).any(1)]
Out[118]:
array([[2, 6, 3, 4, 2, 6],
[4, 2, 8, 3, 8, 2],
[2, 2, 3, 3, 2, 3]])
If you are going for performance, skip the differencing and go for sliced equality -
In [142]: a[~(b[:,:-1] == b[:,1:]).all(2).any(1)]
Out[142]:
array([[2, 6, 3, 4, 2, 6],
[4, 2, 8, 3, 8, 2],
[2, 2, 3, 3, 2, 3]])
Generic no. of cols
Extends just as well on generic no. of cols -
In [156]: a
Out[156]:
array([[1, 3, 1, 3, 2, 5, 1, 3, 1, 3, 2, 5],
[2, 6, 3, 4, 2, 6, 2, 6, 3, 4, 2, 6],
[1, 7, 2, 5, 2, 5, 1, 7, 2, 5, 2, 5],
[4, 2, 8, 3, 8, 2, 4, 2, 8, 3, 8, 2],
[0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3],
[2, 2, 3, 3, 2, 3, 2, 2, 3, 3, 2, 3],
[4, 3, 4, 3, 3, 4, 4, 3, 4, 3, 3, 4]])
In [158]: b = a.reshape(a.shape[0],-1,2)
In [159]: a[~(b[:,:-1] == b[:,1:]).all(2).any(1)]
Out[159]:
array([[4, 2, 8, 3, 8, 2, 4, 2, 8, 3, 8, 2],
[2, 2, 3, 3, 2, 3, 2, 2, 3, 3, 2, 3]])
Of course, we are assuming the number of cols allows pairing.

What you have is quite reasonable. Here's what I would write:
def eliminate_pairs(x: np.ndarray) -> np.ndarray:
first_second = (x[:, 0] == x[:, 2]) & (x[:, 1] == x[:, 3])
second_third = (x[:, 1] == x[:, 3]) & (x[:, 2] == x[:, 4])
return x[~(first_second | second_third)]
You could also apply DeMorgan's theorem and eliminate an extra not operation, but that's less important than clarity.

Let's try a loop:
mask = False
for i in range(0,3,2):
mask = (J[:,i:i+2]==J[:,i+2:i+4]).all(1) | mask
J[~mask]
Output:
array([[2, 6, 3, 4, 2, 6],
[4, 2, 8, 3, 8, 2],
[2, 2, 3, 3, 2, 3]])

Related

Sort the rows of the array by the value of the element of the main diagonal in each of the rows (in the initial array)

Sort the rows of the array by the value of the element of the main diagonal in each of the rows (in the initial array)
[[3, 2, 7, 1, 3, 7, 2, 6, 4, 8],
[5, 3, 7, 1, 1, 1, 6, 4, 6, 7],
[1, 9, 7, 8, 2, 1, 3, 7, 9, 8],
[1, 7, 3, 7, 6, 6, 6, 8, 4, 8],
[4, 2, 3, 2, 2, 3, 2, 4, 7, 6]]
There is such an array, how should it look as a result?

numpy arrays: building a 3d array by adding 2d slices one at a time

Looking for some help with numpy and building a 3d array from multiply 2d arrays. I want to make a loop, such that on every iteration I make a new 2d array and make it a new slice in an existing 3d array. Here's my code sample.
import numpy as np
import random
import array
a = np.random.randint(0, 9, size=(10, 10)) <-- make random 10x10 matrix
b = a <-- save copy
a = np.random.randint(0, 9, size=(10, 10)) <-- make random 10x10 matrix
a.shape
(10, 10) <-- verify it's 10x10
b.shape
(10, 10) <-- verify it's 10x10
b = np.array([b, a]) <-- convert two 2d matrix into one 3d matrix
b.shape
(2, 10, 10) <-- verify it's a 3d matrix with two planes
a = np.random.randint(0, 9, size=(10, 10)) <-- make new random 10x10 matrix
b = np.array([b, a]) <-- add new 2d plane to the 3d matrix
b.shape
(2,) <-- should be (3, 10, 10)
Can anyone see what I'm doing wrong?
When you combine two arrays by using np.array([...]), they have to be the same shape. If they aren't numpy treats them not as numpy arrays, but as dumb/blind objects. There should have been a warning when you ran the last b = np.array([b, a]):
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
Instead, use np.stack
b = np.stack([*b, a])
*b basically expands the children of b, so the above is equivalent to b = np.stack([b[0], b[1], a])
Or you can use np.vstack (vertical stack):
b = np.vstack([b, a[None]])
a[None] basically wraps a in another array. a.shape == (10, 10), a[None].shape == (1, 10, 10)
Both of the above produce the following:
>>> b.shape
(3, 10, 10)
>>> b
array([[[3, 8, 0, 2, 8, 0, 0, 5, 7, 7],
[0, 5, 2, 8, 8, 2, 1, 4, 5, 8],
[3, 2, 2, 4, 1, 8, 2, 0, 7, 5],
[5, 6, 5, 0, 8, 7, 4, 0, 4, 6],
[6, 2, 3, 7, 4, 3, 6, 6, 4, 8],
[2, 5, 1, 7, 1, 3, 0, 6, 0, 5],
[3, 4, 0, 7, 3, 4, 5, 0, 7, 4],
[0, 7, 2, 8, 7, 7, 4, 3, 2, 6],
[4, 6, 2, 5, 5, 8, 5, 8, 0, 8],
[3, 4, 1, 0, 3, 7, 0, 6, 7, 3]],
[[4, 0, 6, 2, 4, 4, 7, 0, 7, 2],
[5, 8, 5, 8, 2, 8, 3, 7, 4, 6],
[2, 1, 2, 0, 4, 5, 6, 3, 0, 0],
[8, 7, 3, 0, 8, 8, 0, 4, 1, 4],
[0, 2, 5, 7, 5, 3, 0, 5, 1, 7],
[1, 5, 8, 0, 2, 6, 5, 0, 3, 2],
[4, 4, 4, 3, 3, 8, 6, 6, 5, 5],
[5, 3, 6, 8, 0, 3, 0, 8, 8, 3],
[4, 2, 6, 6, 6, 2, 0, 0, 6, 2],
[7, 3, 8, 0, 7, 1, 1, 8, 6, 2]],
[[6, 6, 1, 1, 6, 4, 6, 2, 6, 7],
[0, 5, 6, 7, 5, 0, 0, 5, 8, 2],
[6, 6, 1, 5, 2, 3, 2, 3, 3, 2],
[0, 3, 7, 6, 4, 5, 3, 1, 7, 2],
[7, 6, 3, 0, 1, 7, 8, 3, 8, 5],
[3, 1, 8, 6, 1, 5, 0, 8, 6, 1],
[1, 4, 8, 1, 7, 0, 1, 1, 5, 3],
[2, 1, 4, 8, 2, 3, 1, 6, 8, 7],
[8, 1, 1, 0, 6, 1, 0, 6, 1, 6],
[1, 8, 4, 7, 7, 5, 0, 3, 8, 6]]])

array rows where the random-integer elements may have different ranges

Consider the following code fragment:
import numpy as np
mask = np.array([True, True, False, True, True, False])
val = np.array([9, 3])
arr = np.random.randint(1, 9, size = (5,len(mask)))
As expected, we get an array of random integers, 1 to 9, with 5 rows and 6 columns as below. The val array has not been used yet.
[[2, 7, 6, 9, 7, 5],
[7, 2, 9, 7, 8, 3],
[9, 1, 3, 5, 7, 3],
[5, 7, 4, 4, 5, 2],
[7, 7, 9, 6, 9, 8]]
Now I'll introduce val = [9, 3].
Where mask = True, I want the row element to be taken randomly from 1 to 9.
Where mask = False, I want the row element to be taken randomly from 1 to 3.
How can this be done efficiently? A sample output is shown below.
[[2, 7, 2, 9, 7, 1],
[7, 2, 1, 7, 8, 3],
[9, 1, 3, 5, 7, 3],
[5, 7, 1, 4, 5, 2],
[7, 7, 2, 6, 9, 1]]
One idea is to sample randomly between 0 to 1, then multiply with 9 or 3 depending on mask, and finally add 1 to move the sample.
rand = np.random.rand(5,len(mask))
is3 = (1-mask).astype(int)
# out is random from 0-8 or 0-2 depending on `is3`
out = (rand*val[is3]).astype(int)
# move out by `1`:
out = (out + 1)
Output:
array([[4, 9, 3, 6, 2, 1],
[1, 8, 2, 7, 1, 3],
[8, 2, 1, 2, 3, 2],
[4, 3, 2, 2, 3, 2],
[5, 8, 1, 5, 6, 1]])

np array rows with unique elements

Consider the numpy array below. I'd hoping to find a fast way to remove rows not having 4 distinct values.
import numpy as np
D = np.array([[2, 3, 6, 7],
[2, 4, 3, 4],
[4, 9, 0, 1],
[5, 5, 2, 5],
[7, 5, 4, 8],
[7, 5, 4, 7]])
In the small sample array show, the output should be:
D = np.array([[2, 3, 6, 7],
[4, 9, 0, 1],
[7, 5, 4, 8]])
Here's one way -
In [94]: s = np.sort(D,axis=1)
In [95]: D[(s[:,:-1] == s[:,1:]).sum(1) ==0]
Out[95]:
array([[2, 3, 6, 7],
[4, 9, 0, 1],
[7, 5, 4, 8]])
Alternatively -
In [107]: D[~(s[:,:-1] == s[:,1:]).any(1)]
Out[107]:
array([[2, 3, 6, 7],
[4, 9, 0, 1],
[7, 5, 4, 8]])
Or -
In [112]: D[(s[:,:-1] != s[:,1:]).all(1)]
Out[112]:
array([[2, 3, 6, 7],
[4, 9, 0, 1],
[7, 5, 4, 8]])
With pandas -
In [121]: import pandas as pd
In [122]: D[pd.DataFrame(D).nunique(1)==4]
Out[122]:
array([[2, 3, 6, 7],
[4, 9, 0, 1],
[7, 5, 4, 8]])
A working answer with np.unique
I found no way to use the axis keyword in np.unique to get rid of the list compression, perhaps someone can help?
D[np.array([np.max(np.unique(_,return_counts=True)[-1]) for _ in D])==1]

How do I split a 9x9 array into 9 3x3 components

I have a 9x9 multidimensional array that represents a sudoku game. I need to break it into it's 9 3x3 many components. How would this be done? I have absolutely no idea where to begin, here.
game = [
[1, 3, 2, 5, 7, 9, 4, 6, 8],
[4, 9, 8, 2, 6, 1, 3, 7, 5],
[7, 5, 6, 3, 8, 4, 2, 1, 9],
[6, 4, 3, 1, 5, 8, 7, 9, 2],
[5, 2, 1, 7, 9, 3, 8, 4, 6],
[9, 8, 7, 4, 2, 6, 5, 3, 1],
[2, 1, 4, 9, 3, 5, 6, 8, 7],
[3, 6, 5, 8, 1, 7, 9, 2, 4],
[8, 7, 9, 6, 4, 2, 1, 5, 3]
]
Split into chunks, it becomes
chunk_1 = [
[1, 3, 2],
[4, 9, 8],
[7, 5, 6]
]
chunk_2 = [
[5, 7, 9],
[2, 6, 1],
[3, 8, 4]
]
...and so on
That was a fun exercise!
Answer
game.each_slice(3).map{|stripe| stripe.transpose.each_slice(3).map{|chunk| chunk.transpose}}.flatten(1)
It would be cumbersome and not needed to define every chunk_1, chunk_2, ....
If you want chunk_2, you can use extract_chunks(game)[1]
It outputs [chunk_1, chunk_2, chunk_3, ..., chunk_9], so it's an Array of Arrays of Arrays :
1 3 2
4 9 8
7 5 6
5 7 9
2 6 1
3 8 4
4 6 8
3 7 5
2 1 9
6 4 3
5 2 1
...
You can define a method to check if this grid is valid (it is) :
def extract_chunks(game)
game.each_slice(3).map{|stripe| stripe.transpose.each_slice(3).map{|chunk| chunk.transpose}}.flatten(1)
end
class Array # NOTE: Use refinements if you don't want to patch Array
def has_nine_unique_elements?
self.flatten(1).uniq.size == 9
end
end
def valid?(game)
game.has_nine_unique_elements? &&
game.all?{|row| row.has_nine_unique_elements? } &&
game.all?{|column| column.has_nine_unique_elements? } &&
extract_chunks(game).all?{|chunk| chunk.has_nine_unique_elements? }
end
puts valid?(game) #=> true
Theory
The big grid can be sliced in 3 stripes, each containing 3 rows of 9 cells.
The first stripe will contain chunk_1, chunk_2 and chunk_3.
We need to cut the strip vertically into 3 chunks. To do so :
We transpose the strip,
Cut it horizontally with each_slice,
transpose back again.
We do the same for stripes #2 and #3.
To avoid returning an Array of Stripes of Chunks of Rows of Cells, we use flatten(1) to remove one level and return an Array of Chunks of Rows of Cells. :)
The method Matrix#minor is tailor-made for this:
require 'matrix'
def sub3x3(game, i, j)
Matrix[*game].minor(3*i, 3, 3*j, 3).to_a
end
chunk1 = sub3x3(game, 0, 0)
#=> [[1, 3, 2], [4, 9, 8], [7, 5, 6]]
chunk2 = sub3x3(game, 0, 1)
#=> [[5, 7, 9], [2, 6, 1], [3, 8, 4]]
chunk3 = sub3x3(game, 0, 2)
#=> [[4, 6, 8], [3, 7, 5], [2, 1, 9]]
chunk4 = sub3x3(game, 1, 0)
#=> [[6, 4, 3], [5, 2, 1], [9, 8, 7]]
...
chunk9 = sub3x3(game, 2, 2)
#=> [[6, 8, 7], [9, 2, 4], [1, 5, 3]]
Ruby has not concept of "rows" and "columns" of arrays. For convenience, therefore, I will refer to the 3x3 "subarray" of game, at offsets i and j (i = 0,1,2, j = 0,1,2), as the 3x3 submatrix of m = Matrix[*game] whose upper left value is at row offset 3*i and column offset 3*j of m, converted to an array.
This is relatively inefficient as a new matrix is created for the calculation of each "chunk". Considering the size of the array, this is not a problem, but rather than making that more efficient you really need to rethink the overall design. Creating nine local variables (rather than, say, an array of nine arrays) is not the way to go.
Here's a suggestion for checking the validity of game (that uses the method sub3x3 above) once all the open cells have been filled. Note that I've used the Wiki description of the game, in which the only valid entries are the digits 1-9, and I have assumed the code enforces that requirement when players enter values into cells.
def invalid_vector_index(game)
game.index { |vector| vector.uniq.size < 9 }
end
def sub3x3_invalid?(game, i, j)
sub3x3(game, i, j).flatten.uniq.size < 9
end
def valid?(game)
i = invalid_vector_index(game)
return [:ROW_ERR, i] if i
j = invalid_vector_index(game.transpose)
return [:COL_ERR, j] if j
m = Matrix[*game]
(0..2).each do |i|
(0..2).each do |j|
return [:SUB_ERR, i, j] if sub3x3_invalid?(game, i, j)
end
end
true
end
valid?(game)
#=> true
Notice this either returns true, meaning game is valid, or an array that both signifies that the solution is not valid and contains information that can be used to inform the player of the reason.
Now try
game[5], game[6] = game[6], game[5]
so
game
#=> [[1, 3, 2, 5, 7, 9, 4, 6, 8],
# [4, 9, 8, 2, 6, 1, 3, 7, 5],
# [7, 5, 6, 3, 8, 4, 2, 1, 9],
# [6, 4, 3, 1, 5, 8, 7, 9, 2],
# [5, 2, 1, 7, 9, 3, 8, 4, 6],
# [2, 1, 4, 9, 3, 5, 6, 8, 7],
# [9, 8, 7, 4, 2, 6, 5, 3, 1],
# [3, 6, 5, 8, 1, 7, 9, 2, 4],
# [8, 7, 9, 6, 4, 2, 1, 5, 3]]
valid?(game)
#=> [:SUB_ERR, 1, 0]
The rows and columns are obviously still valid, but this return value indicates that at least one 3x3 subarray is invalid and the array
[[6, 4, 3],
[5, 2, 1],
[2, 1, 4]]
was the first found to be invalid.
You could create a method that generates a single 3X3 chunk from a given index. since the sudoku board is of length 9, that will produce 9 3X3 chunks for you. see below.
#steps
#you'll loop through each index of the board
#to get the x value
#you divide the index by 3 and multiply by 3
#to get the y value
#you divide the index by 3, take remainder and multiply by 3
#for each x value, you can get 3 y values
#this will give you a single 3X3 box from one index so
def three_by3(index, sudoku)
#to get x value
x=(index/3)*3
#to get y value
y=(index%3)*3
(x...x+3).each_with_object([]) do |x,arr|
(y...y+3).each do |y|
arr<<sudoku[x][y]
end
end
end
sudoku = [ [1,2,3,4,5,6,7,8,9],
[2,3,4,5,6,7,8,9,1],
[3,4,5,6,7,8,9,1,2],
[1,2,3,4,5,6,7,8,9],
[2,3,4,5,6,7,8,9,1],
[3,4,5,6,7,8,9,1,2],
[1,2,3,4,5,6,7,8,9],
[2,3,4,5,6,7,8,9,1],
[3,4,5,6,7,8,9,1,2]]
p (0...sudoku.length).map {|i| three_by3(i,sudoku)}
#output:
#[[1, 2, 3, 2, 3, 4, 3, 4, 5],
# [4, 5, 6, 5, 6, 7, 6, 7, 8],
# [7, 8, 9, 8, 9, 1, 9, 1, 2],
# [1, 2, 3, 2, 3, 4, 3, 4, 5],
# [4, 5, 6, 5, 6, 7, 6, 7, 8],
# [7, 8, 9, 8, 9, 1, 9, 1, 2],
# [1, 2, 3, 2, 3, 4, 3, 4, 5],
# [4, 5, 6, 5, 6, 7, 6, 7, 8],
# [7, 8, 9, 8, 9, 1, 9, 1, 2]]

Resources