drop np array rows based on element uniqueness and one other condition

Consider the 2d integer array below:
import numpy as np
arr = np.array([[1, 3, 5, 2, 8],
[9, 6, 1, 7, 6],
[4, 4, 1, 8, 0],
[2, 3, 1, 8, 5],
[1, 2, 3, 4, 5],
[6, 6, 7, 9, 1],
[5, 3, 1, 8, 2]])
PROBLEM: Eliminate rows from arr that meet two conditions:
a) The row's elements MUST be unique
b) From these unique-element rows, I want to eliminate the permutation duplicates.
All other rows in arr are kept.
In the example given above, the rows with indices 0,3,4, and 6 meet condition a). Their elements are unique.
Of these 4 rows, the ones with indices 0,3,6 are permutations of each other: I want to keep
one of them, say index 0, and ELIMINATE the other two.
The output would look like:
[[1, 3, 5, 2, 8],
[9, 6, 1, 7, 6],
[4, 4, 1, 8, 0],
[1, 2, 3, 4, 5],
[6, 6, 7, 9, 1]]
I can identify the rows that meet condition a) with something like:
s = np.sort(arr,axis=1)
arr[~(s[:,:-1] == s[:,1:]).any(1)]
But, I'm not sure at all how to eliminate the permutation duplicates.

Here's one way -
# Sort along row
b = np.sort(arr,axis=1)
# Mask of rows with unique elements and select those rows
m = (b[:,:-1] != b[:,1:]).all(1)
d = b[m]
# Indices of uniq rows
idx = np.flatnonzero(m)
# Get indices of rows among them that are unique as per possible permutes
u,stidx,c = np.unique(d, axis=0, return_index=True, return_counts=True)
# Concatenate unique ones among these and non-masked ones
out = arr[np.sort(np.r_[idx[stidx], np.flatnonzero(~m)])]
Alternatively, the final step could be simplified by updating the mask in place, so that it marks only the rows to drop -
m[idx[stidx]] = 0
out = arr[~m]
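The steps above can be wrapped into a single function. This is only a minimal sketch: the name drop_permutation_duplicates is hypothetical (it is not a NumPy function), and it assumes a 2D array with at least two columns.

```python
import numpy as np

def drop_permutation_duplicates(arr):
    """Drop rows whose elements are all unique and that are a permutation
    of an earlier all-unique row. Every other row is kept."""
    # Sort each row so that permutations become identical rows
    b = np.sort(arr, axis=1)
    # True for rows whose elements are all distinct
    m = (b[:, :-1] != b[:, 1:]).all(axis=1)
    # Original indices of those rows
    idx = np.flatnonzero(m)
    # Position of the first occurrence of each distinct sorted row
    _, first = np.unique(b[m], axis=0, return_index=True)
    # Un-flag the first occurrences; m now marks only the rows to drop
    m[idx[first]] = False
    return arr[~m]

arr = np.array([[1, 3, 5, 2, 8],
                [9, 6, 1, 7, 6],
                [4, 4, 1, 8, 0],
                [2, 3, 1, 8, 5],
                [1, 2, 3, 4, 5],
                [6, 6, 7, 9, 1],
                [5, 3, 1, 8, 2]])
out = drop_permutation_duplicates(arr)
print(out)
```

Because return_index reports the first occurrence of each distinct sorted row, the row that survives from each permutation group is the one with the lowest index, and the original row order is preserved.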

Related

counting DISTINCT copies of row elements

Consider the array sample A.
import numpy as np
A = np.array([[2, 3, 6, 7, 3, 6, 7, 2],
[2, 3, 6, 7, 3, 6, 7, 7],
[2, 4, 3, 4, 6, 4, 9, 4],
[4, 9, 0, 1, 2, 5, 3, 0],
[5, 5, 2, 5, 4, 3, 7, 5],
[7, 5, 4, 8, 0, 1, 2, 6],
[7, 5, 4, 7, 3, 8, 0, 7]])
PROBLEM: I want to identify rows that have a specified number of DISTINCT element copies. The code needs to be able to answer questions like "which rows of A have exactly 4 elements that appear twice?" or "which rows of A have exactly 1 element that appears three times?" The following code comes close:
r,c = A.shape
nCopies = 4
s = np.sort(A,axis=1)
out = A[((s[:,1:] != s[:,:-1]).sum(axis=1)+1 == c - nCopies)]
This produces 2 output rows, both having 4 copied elements.
The 1st row has copies of 2,3,6,7. The 2nd row has copies of 3,6,7,7:
array([[2, 3, 6, 7, 3, 6, 7, 2],
[2, 3, 6, 7, 3, 6, 7, 7]])
My problem is that I don't want the 2nd output row because it only has 3 DISTINCT copies (ie: 3,6,7)
How can the code be modified to identify only distinct copies?
If I understand correctly, you want the rows of A that have exactly 4 distinct values, each of which appears at least twice. You can use np.unique with return_counts=True, which returns two arrays: the distinct values and the count of each value.
counts = [np.unique(row,return_counts=True) for row in A ]
valid_indices = [ np.all(row[1] > 1) and row[0].shape[0] == 4 for row in counts ]
valid_rows = A[valid_indices]
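The question also asks about cases like "exactly 1 element that appears three times". One possible reading of that, sketched below, is to count how many distinct values in a row occur exactly n_times. The function name rows_with_copies and its parameters are hypothetical, and this interpretation is an assumption rather than something the question confirms.

```python
import numpy as np

def rows_with_copies(A, n_values, n_times):
    """Return the rows of A in which exactly n_values distinct values
    each occur exactly n_times."""
    keep = []
    for row in A:
        # counts[i] is how often the i-th distinct value occurs in this row
        _, counts = np.unique(row, return_counts=True)
        keep.append(np.count_nonzero(counts == n_times) == n_values)
    return A[np.array(keep, dtype=bool)]

A = np.array([[2, 3, 6, 7, 3, 6, 7, 2],
              [2, 3, 6, 7, 3, 6, 7, 7],
              [2, 4, 3, 4, 6, 4, 9, 4],
              [4, 9, 0, 1, 2, 5, 3, 0],
              [5, 5, 2, 5, 4, 3, 7, 5],
              [7, 5, 4, 8, 0, 1, 2, 6],
              [7, 5, 4, 7, 3, 8, 0, 7]])

print(rows_with_copies(A, 4, 2))  # rows with exactly 4 values that appear twice
print(rows_with_copies(A, 1, 3))  # rows with exactly 1 value that appears three times
```

Under this reading the second row of A is rejected for the first query, because only 3 and 6 appear exactly twice there (7 appears three times).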

Eliminating array rows based on a property of consecutive pairs of elements

We are given an array sample a, shown below, and a constant c.
import numpy as np
a = np.array([[1, 3, 1, 11, 9, 14],
[2, 12, 1, 10, 7, 6],
[6, 7, 2, 14, 2, 15],
[14, 8, 1, 3, -7, 2],
[0, -3, 0, 3, -3, 0],
[2, 2, 3, 3, 12, 13],
[3, 14, 4, 12, 1, 4],
[0, 13, 13, 4, 0, 3]])
c = 2
It is convenient, in this problem, to think of each array row as being composed of three pairs, so the 1st row is [1,3, 1,11, 9,14].
DEFINITION: d_min is the minimum difference between the elements of two consecutive pairs.
The PROBLEM: I want to retain rows of array a, where all consecutive pairs have d_min <= c. Otherwise, the rows should be eliminated.
In the 1st array row, the 1st pair (1,3) and the 2nd pair (1,11) have d_min = 1-1=0.
The 2nd pair (1,11) and the 3rd pair(9,14) have d_min = 11-9=2. (in both cases, d_min<=c, so we keep this row in a)
In the 2nd array row, the 1st pair (2,12) and the 2nd pair (1,10) have d_min = 2-1=1.
But, the 2nd pair (1,10) and the 3rd pair(7,6) have d_min = 10-7=3. (3 > c, so this row should be eliminated from array a)
Current efforts: I currently handle this problem with nested for-loops (2 deep).
The outer loop runs through the rows of array a, determining d_min between the first two pairs using:
for r in a:
    d_min = np.amin(np.abs(np.subtract.outer(r[:2], r[2:4])))
The inner loop uses the same method to determine the d_min between the last two pairs.
Further processing is done only when d_min <= c for both sets of consecutive pairs.
I'm really hoping there is a way to avoid the for-loops. I eventually need to deal with 8-column arrays, and my current approach would involve 3-deep looping.
In the example, there are 4 row eliminations. The final result should look like:
a = np.array([[1, 3, 1, 11, 9, 14],
[0, -3, 0, 3, -3, 0],
[3, 14, 4, 12, 1, 4],
[0, 13, 13, 4, 0, 3]])
Assume the number of elements in each row is always even:
import numpy as np
a = np.array([[1, 3, 1, 11, 9, 14],
[2, 12, 1, 10, 7, 6],
[6, 7, 2, 14, 2, 15],
[14, 8, 1, 3, -7, 2],
[0, -3, 0, 3, -3, 0],
[2, 2, 3, 3, 12, 13],
[3, 14, 4, 12, 1, 4],
[0, 13, 13, 4, 0, 3]])
c = 2
# separate the array as previous pairs and next pairs
sx, sy = a.shape
prev_shape = sx, (sy - 2) // 2, 1, 2
next_shape = sx, (sy - 2) // 2, 2, 1
prev_pairs = a[:, :-2].reshape(prev_shape)
next_pairs = a[:, 2:].reshape(next_shape)
# subtract which will effectively work as outer subtraction due to numpy broadcasting, and
# calculate the minimum difference for each pair
pair_diff_min = np.abs(prev_pairs - next_pairs).min(axis=(2, 3))
# calculate the filter condition as boolean array
to_keep = pair_diff_min.max(axis=1) <= c
print(a[to_keep])
#[[ 1 3 1 11 9 14]
# [ 0 -3 0 3 -3 0]
# [ 3 14 4 12 1 4]
# [ 0 13 13 4 0 3]]
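Since the question mentions 8-column arrays, the same broadcasting idea can be packaged as a function and cross-checked against a plain loop. This is a sketch under the assumption that every row has an even number of columns (at least four); the names keep_rows_by_pair_gap and keep_rows_loop are hypothetical.

```python
import numpy as np

def keep_rows_by_pair_gap(a, c):
    """Keep rows where every two consecutive pairs have d_min <= c."""
    n_rows, n_cols = a.shape
    n_gaps = (n_cols - 2) // 2  # number of consecutive pair-to-pair comparisons
    prev_pairs = a[:, :-2].reshape(n_rows, n_gaps, 1, 2)
    next_pairs = a[:, 2:].reshape(n_rows, n_gaps, 2, 1)
    # Broadcasting yields a 2x2 table of differences per comparison
    d_min = np.abs(prev_pairs - next_pairs).min(axis=(2, 3))
    return a[(d_min <= c).all(axis=1)]

def keep_rows_loop(a, c):
    """Reference version with explicit loops, used only for checking."""
    kept = []
    for r in a:
        if all(np.abs(np.subtract.outer(r[k:k + 2], r[k + 2:k + 4])).min() <= c
               for k in range(0, len(r) - 2, 2)):
            kept.append(r)
    return np.array(kept)

a = np.array([[1, 3, 1, 11, 9, 14],
              [2, 12, 1, 10, 7, 6],
              [6, 7, 2, 14, 2, 15],
              [14, 8, 1, 3, -7, 2],
              [0, -3, 0, 3, -3, 0],
              [2, 2, 3, 3, 12, 13],
              [3, 14, 4, 12, 1, 4],
              [0, 13, 13, 4, 0, 3]])
# A small 8-column array: four pairs per row, three comparisons
b = np.array([[0, 1, 2, 3, 10, 11, 12, 13],
              [0, 1, 1, 2, 3, 4, 4, 5]])

print(keep_rows_by_pair_gap(a, 2))
print(keep_rows_by_pair_gap(b, 2))
```

Nothing in the vectorized function depends on the column count beyond the reshape, so wider rows need no additional loop nesting.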

Ruby save modified array in a variable without it changing the original array

I'd like to save in two variables the values of an array excluding the first and last elements.
For example:
prices = [9, 3, 5, 2, 1]
The elements I need are:
prices_excl_first = [3, 5, 2, 1]
prices_excl_last = [9, 3, 5, 2]
I figured out how to remove an element from an array a few ways, including slicing off the value by passing its index to the slice method like so:
first_price = prices.slice(0)
last_price = prices.slice(-1)
We could then save the modified arrays into variables:
array_except_first_price = prices.delete(first_price) #=> [3, 5, 2, 1]
array_except_last_index = prices.delete(last_price) #=> [3, 5, 2]
There are two problems with this:
array_except_last_index doesn't contain the first element now
I still need access to the full, original array prices later
So essentially, how can I just temporarily modify the elements in the array when necessary in the problem?
Destructive methods such as delete and slice! permanently modify the array.
Ruby has first and last, which return copies of the leading or trailing elements and leave the original untouched.
Ask for the last and the first prices.size-1 elements.
prices = [9, 3, 5, 2, 1]
except_first = prices.last(prices.size-1)
except_last = prices.first(prices.size-1)
Schwern's answer is probably the best you can get. Here's the second best:
prices = [9, 3, 5, 2, 1]
prices[1..-1] # => [3, 5, 2, 1]
prices[0..-2] # => [9, 3, 5, 2]
Or drop/take (which more closely map to the wording of your question).
prices.drop(1) # => [3, 5, 2, 1]
prices.take(prices.size-1) # => [9, 3, 5, 2]
You could use each_cons:
a, b = prices.each_cons(prices.size - 1).to_a
a #=> [9, 3, 5, 2]
b #=> [3, 5, 2, 1]
Splat it.
*a, d = prices
c, *b = prices
a #=> [9, 3, 5, 2]
b #=> [3, 5, 2, 1]
You can use dup to duplicate the array before performing destructive operations.
prices = [9, 3, 5, 2, 1]
except_first = prices.dup
except_first.delete_at(0)
except_last = prices.dup
except_last.delete_at(-1)
This does end up duplicating the array a couple of times. If you're dealing with large arrays, this may be a problem.

Finding the first combination of two integers in an array whose latter element appears the earliest and sum matches a given value

I have array and sum_of_two:
array = [10, 5, 1, 9, 7, 8, 2, 4, 6, 9, 3, 2, 1, 4, 8, 7, 5]
sum_of_two = 10
I'm trying to find the combination of two integers in array whose latter element of the two appears the earliest among those of such combinations whose sum equals sum_of_two. For example, both [5, 5] and [1, 9] are candidates for such combinations, but 9 of [1, 9] (which appears later than 1 in array) appears earlier than the second 5 of [5, 5] (which is the last element in array). So I would like to return [1, 9].
I tried using combination and find:
array.combination(2).find{|x,y| x + y == sum_of_two} #=> [5, 5]
However, it returns a combination of the first 5 in the array and another 5 further along the array.
If I use find_all instead of find, I get all combinations of two integers that add up to sum_of_two:
array.combination(2).find_all{|x,y| x + y == sum_of_two}
#=> [[5, 5], [1, 9], [1, 9], [9, 1], [7, 3], [8, 2], [8, 2], [2, 8], [4, 6], [6, 4], [9, 1], [3, 7], [2, 8]]
But then I'm not sure how to get the first one.
I would use Set (which would be a bit more efficient than using Array#include?) and do something like this:
array = [10, 5, 1, 9, 7, 8, 2, 4, 6, 9, 3, 2, 1, 4, 8, 7, 5]
sum_of_two = 10
require 'set'
array.each_with_object(Set.new) do |element, set|
  if set.include?(sum_of_two - element)
    break [sum_of_two - element, element]
  else
    set << element
  end
end
#=> [1, 9]
x = array.find.with_index{|e, i| array.first(i).include?(sum_of_two - e)}
[sum_of_two - x, x] # => [1, 9]
Array#combination(n) does not give the elements in the order you want, so you must build the pairs yourself. It's easy if you begin from the second index. Here is a lazy implementation (O(n^2) in the worst case, since it may enumerate every pair); let's call the input xs:
pairs = (1...xs.size).lazy.flat_map { |j| (0...j).lazy.map { |i| [xs[i], xs[j]] } }
first_matching_pair = pairs.detect { |i, j| i + j == 10 }
#=> [1, 9]

How do I arrange an array as table rows?

I am trying to learn the Ruby way of array processing. What is a succinct way to write the following function?
def columnize(items, n_cols)
Items is a 1D array of arbitrary length. I want to return an array of rows, each having a length of n_cols, that includes all of the items column-wise, possibly with nils padding the last column. For example:
items = [1, 2, 3, 4, 5, 6, 7]
table = columnize items, 3
This should produce a table of:
[[1, 4, 7],
[2, 5, nil],
[3, 6, nil]]
Note that it's possible for the last column to be all nils as in:
columnize [1, 2, 3, 4, 5, 6, 7, 8, 9], 4
This is a real problem I need to solve for report generation. I have a Ruby newbie solution that is not very satisfying and can post it if desired.
You can use the Matrix class.
items = [1, 2, 3, 4, 5, 6, 7]
require 'matrix'
# ⇒ true
m = Matrix.build(3) { |row, col| items[row+col*3] }
# ⇒ Matrix[[1, 4, 7], [2, 5, nil], [3, 6, nil]]
Ruby's Array class has transpose which is designed to convert rows into columns. Using it in conjunction with fill and Enumerable's each_slice gives:
require 'pp'
def columnize(items, cols)
  ary = items.dup.fill(nil, items.size, cols - items.size % cols)
  ary.each_slice(ary.size / cols).to_a.transpose
end
items = [1, 2, 3, 4, 5, 6, 7]
pp columnize(items, 3)
pp columnize [1, 2, 3, 4, 5, 6, 7, 8, 9], 4
Which outputs:
[[1, 4, 7], [2, 5, nil], [3, 6, nil]]
[[1, 4, 7, nil], [2, 5, 8, nil], [3, 6, 9, nil]]
Except for adding columns that contain only nil elements, this will do (fdiv is needed because integer division would truncate before ceil is applied):
first, *rest = items.each_slice(items.length.fdiv(n_cols).ceil).to_a
first.zip(*rest)

Resources