Extract a fixed number of elements per row in a numpy array

Suppose I have an array a and a boolean array b. I want to extract a fixed number of elements from the valid elements in each row of a, where the valid elements are the ones marked True in b.
Here is an example:
import numpy as np
a = np.arange(24).reshape(4,6)
b = np.array([[0,0,1,1,0,0],[0,1,0,1,0,1],[0,1,1,1,1,0],[0,0,0,0,1,1]]).astype(bool)
x = []
for i in range(a.shape[0]):
    c = a[i,b[i]]                  # valid elements of row i
    d = np.random.choice(c, 2)     # note: samples with replacement by default
    x.append(d)
Here I used a for loop, which will be slow if these arrays are big and high-dimensional. Is there a more efficient way to do this? Thanks.

1. Generate a matrix of random values, uniform on [0, 1), with the same shape as a.
2. Multiply this matrix by the mask b to set the invalid elements to zero.
3. Select the indices of the k largest values in each row. This simulates an unbiased random sample of k elements (without replacement) from only the valid elements of that row, provided the row has at least k valid elements.
4. (Optional) Use these indices to get the elements.
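import numpy as np
a = np.arange(24).reshape(4,6)
b = np.array([[0,0,1,1,0,0],[0,1,0,1,0,1],[0,1,1,1,1,0],[0,0,0,0,1,1]])
k = 2
r = np.random.uniform(size=a.shape)
# argpartition on -r*b moves the k largest values of r*b to the front of each row
indices = np.argpartition(-r * b, k)[:,:k]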
To get the elements from the indices:
>>> indices
array([[3, 2],
       [5, 1],
       [3, 2],
       [4, 5]])
>>> a[np.arange(a.shape[0])[:,None], indices]
array([[ 3,  2],
       [11,  7],
       [15, 14],
       [22, 23]])
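Putting the steps together, here is a minimal sketch as a reusable function, assuming every row has at least k valid entries. It deviates from the snippet above in two small, deliberate ways: invalid entries get a key of -1 instead of 0 (so they can never tie with a valid entry), and it partitions at kth=k-1, which also works when k equals the row length. The helper name sample_valid_per_row is hypothetical. Unlike np.random.choice(c, 2) in the question, which samples with replacement by default, this always picks k distinct positions per row.

```python
import numpy as np

def sample_valid_per_row(a, b, k, rng=None):
    """Pick k distinct valid elements from each row of `a` (hypothetical helper).

    `b` marks the valid entries; every row is assumed to have at least k of them.
    Returns (values, column_indices), each of shape (a.shape[0], k).
    """
    rng = np.random.default_rng() if rng is None else rng
    b = np.asarray(b, dtype=bool)
    if (b.sum(axis=1) < k).any():
        raise ValueError("every row needs at least k valid elements")
    # Random key in [0, 1) for valid entries and -1 for invalid ones,
    # so every invalid entry ranks below every valid entry.
    keys = np.where(b, rng.random(a.shape), -1.0)
    # Partitioning the negated keys puts the k largest keys first in each row.
    idx = np.argpartition(-keys, k - 1, axis=1)[:, :k]
    rows = np.arange(a.shape[0])[:, None]
    return a[rows, idx], idx

a = np.arange(24).reshape(4, 6)
b = np.array([[0,0,1,1,0,0],[0,1,0,1,0,1],[0,1,1,1,1,0],[0,0,0,0,1,1]])
values, idx = sample_valid_per_row(a, b, 2, np.random.default_rng(0))
print(values)
```

Passing a seeded Generator makes the draw reproducible; omit it to get a fresh sample each call.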

Related

sets of numpy 2D array rows having unique values in each row position

Consider an array a holding some of the permutations of 1, 2, 3, 4 (my actual arrays may be larger):
import numpy as np
n = 3
a = np.array([[1, 2, 3, 4],
              [1, 2, 4, 3],
              [1, 3, 4, 2],
              [1, 4, 3, 2],
              [2, 3, 4, 1],
              [2, 4, 3, 1],
              [3, 1, 4, 2],
              [4, 1, 2, 3]])
I want to identify sets of n rows (in this example, n = 3) such that, at every position in the row, the selected rows all hold different values.
In this example, the output would be:
out = [[0, 4, 7],
       [2, 5, 7],
       [3, 4, 7]]
The first row of out indicates that a[0], a[4], and a[7] hold different values at every position.
When n = 2, there are 11 row pairs that match the criteria: [[0,4], [0,6], [0,7], [1,5], ...].
When n = 4, no set of rows matches the criteria.
I'm new enough to Python that I can't find a good way to approach this situation.
Solving this problem efficiently is far from easy. Indeed, the brute-force solution consisting in using n nested loops is very inefficient: its complexity is O(c r! / (r-n)!), where r is the number of rows of a and c is the number of columns of a (note that ! is the factorial). Since r is a number of permutations, which can already grow factorially with the number of unique items in a, the complexity of this solution is really bad.
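For comparison, a brute-force baseline can be sketched with itertools.combinations, which visits each unordered set of n rows once (about C(r, n) candidates instead of the r!/(r-n)! ordered tuples produced by naive nested loops). It is only practical for small inputs, but it is handy for checking the output of faster methods. The name brute_force_sets is hypothetical.

```python
import numpy as np
from itertools import combinations

def brute_force_sets(a, n):
    """Every set of n row indices whose rows differ at every position (hypothetical helper)."""
    a = np.asarray(a)
    out = []
    for rows in combinations(range(a.shape[0]), n):
        # Sort each column of the candidate rows: the values in a column are
        # all distinct exactly when no two neighbours in the sorted column are equal.
        s = np.sort(a[list(rows)], axis=0)
        if (s[1:] != s[:-1]).all():
            out.append(list(rows))
    return out

a = np.array([[1, 2, 3, 4], [1, 2, 4, 3], [1, 3, 4, 2], [1, 4, 3, 2],
              [2, 3, 4, 1], [2, 4, 3, 1], [3, 1, 4, 2], [4, 1, 2, 3]])
print(brute_force_sets(a, 3))   # [[0, 4, 7], [2, 5, 7], [3, 4, 7]]
```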
A more efficient solution (but still not great) is to pick a row, filter the other rows that can match with it (i.e. rows with no equal item at the same position), and then recursively do the same thing n times, picking each next row only among the filtered ones. The sets of row indices can be appended to a list during the recursion. It is hard to evaluate the complexity of this solution, but it is much faster in practice, since most rows hopefully do not match together and the number of filtered rows tends to decrease exponentially too. That being said, the complexity is certainly still exponential, since the size of the output appears to grow exponentially too and the output needs to be written.
Here is the implementation:
import numpy as np

def recursiveFindSets(a, i, n, availableRows, rowIndices, results):
    if availableRows.size == 0:
        return
    for k in availableRows:
        # Save the current choice
        rowIndices[i] = k
        # The next selected rows need to be bigger than `k` to prevent duplicate sets
        newAvailableRows = availableRows[availableRows > k]
        # If there is no solution with a[k], then choose another row
        if newAvailableRows.size == 0:
            continue
        # Find the rows whose items all differ from a[k] at the same position
        goodMatches = np.all(a[newAvailableRows] != a[k], axis=1)
        # Keep indices relative to `a`, not to `a[newAvailableRows]`
        newAvailableRows = newAvailableRows[goodMatches]
        # If there is no solution with a[k], then choose another row
        if newAvailableRows.size == 0:
            continue
        if i == n-2:
            # Each remaining row completes a solution
            for k2 in newAvailableRows:
                rowIndices[i+1] = k2
                results.append(rowIndices.copy())
        elif i < n-2:
            recursiveFindSets(a, i+1, n, newAvailableRows, rowIndices, results)

def findSets(a, n):
    # Assumes n >= 2
    availableRows = np.arange(a.shape[0], dtype=int)  # Candidate rows
    rowIndices = np.empty(n, dtype=np.int_)           # Current set of row indices
    results = []                                       # List of all the sets
    recursiveFindSets(a, 0, n, availableRows, rowIndices, results)
    if len(results) == 0:
        return np.empty((0, n), dtype=int)
    return np.vstack(results)

findSets(a, 3)
# Output:
# array([[0, 4, 7],
#        [2, 5, 7],
#        [3, 4, 7]])
You can reduce this problem to finding all cliques of size n in an undirected graph. Nodes in the graph are given by row indices of a. There is an edge between i and j if (a[i] != a[j]).all().
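The edge condition can also be evaluated for all pairs at once with NumPy broadcasting, avoiding a Python loop over row pairs. Below is a minimal sketch (compatibility_matrix is a hypothetical name). As a sanity check on the resulting graph, the number of edges and, through the standard identity "triangles = trace(A^3) / 6", the number of cliques of size 3 can be read directly from the adjacency matrix A. Note the intermediate array has shape (r, r, c), so memory grows quadratically with the number of rows.

```python
import numpy as np

def compatibility_matrix(a):
    """Boolean adjacency matrix: True where rows i and j differ at every position."""
    a = np.asarray(a)
    # Shape (r, r, c): compare every row with every other row, column by column.
    return (a[:, None, :] != a[None, :, :]).all(axis=2)

a = np.array([[1, 2, 3, 4], [1, 2, 4, 3], [1, 3, 4, 2], [1, 4, 3, 2],
              [2, 3, 4, 1], [2, 4, 3, 1], [3, 1, 4, 2], [4, 1, 2, 3]])
adj = compatibility_matrix(a)   # diagonal is False: a row never differs from itself
A = adj.astype(int)
n_edges = int(A.sum()) // 2                    # each edge appears twice in A
n_triangles = int(np.trace(A @ A @ A)) // 6    # each triangle is counted 6 times
print(n_edges, n_triangles)   # 11 3
```

These counts match the 11 pairs and 3 triples listed in the question. If networkx is installed, nx.from_numpy_array(A) should build the same graph without the pairwise loop.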
Here is one implementation based on networkx. The function enumerate_all_cliques(g) iterates over the cliques in g in order of increasing size. We discard all cliques of size less than n, keep those of size n, and stop once the first clique of size greater than n is found or the cliques run out.
import numpy as np
from itertools import combinations
import networkx as nx

def f(arr, n):
    nodes = range(arr.shape[0])
    g = nx.Graph()
    g.add_nodes_from(nodes)
    # Connect rows i and j if they differ at every position
    for i, j in combinations(nodes, 2):
        if (arr[i] != arr[j]).all():
            g.add_edge(i, j)
    # Cliques are generated in order of increasing size
    for c in nx.algorithms.clique.enumerate_all_cliques(g):
        lc = len(c)
        if lc == n:
            yield c
        elif lc > n:
            break

print(list(f(a, 3)))
# [[0, 4, 7], [2, 5, 7], [3, 4, 7]]
Here is another approach: find all maximal cliques and take all subsets of size n from each clique. Different maximal cliques can share a subset of size n, so a set is used to discard duplicates before returning.
def f(arr, n):
    nodes = range(arr.shape[0])
    g = nx.Graph()
    g.add_nodes_from(nodes)
    for i, j in combinations(nodes, 2):
        if (arr[i] != arr[j]).all():
            g.add_edge(i, j)
    cliques = set()
    # iterate over maximal cliques
    for c in nx.algorithms.clique.find_cliques(g):
        # add every size-n subset of c to the clique set
        cliques.update(map(frozenset, combinations(c, n)))
    # return all cliques of size n without double-counting
    return [sorted(c) for c in cliques]

print(f(a, 3))
# [[2, 5, 7], [3, 4, 7], [0, 4, 7]]  (the order of the sets may vary)
The performance of either approach will vary depending on input values.

circularArrayRotation algorithm ruby

I am using HackerRank and I do not understand why my Ruby code only passes one test case out of about 20. Here is the question:
John Watson knows of an operation called a right circular rotation on an array of integers. One rotation operation moves the last array element to the first position and shifts all remaining elements right one. To test Sherlock's abilities, Watson provides Sherlock with an array of integers. Sherlock is to perform the rotation operation a number of times then determine the value of the element at a given position.
For each array, perform a number of right circular rotations and return the values of the elements at the given indices.
Function Description
Complete the circularArrayRotation function in the editor below.
circularArrayRotation has the following parameter(s):
int a[n]: the array to rotate
int k: the rotation count
int queries[q]: the indices to report
Returns
int[q]: the values in the rotated a as requested in queries
Input Format
The first line contains 3 space-separated integers, n, k, and q: the number of elements in the integer array, the rotation count and the number of queries. The second line contains n space-separated integers, where each integer i describes array element a[i] (where 0 <= i < n). Each of the q subsequent lines contains a single integer, queries[i], an index of an element in a to return.
Constraints
Sample Input 0
3 2 3
1 2 3
0
1
2
Sample Output 0
2
3
1
Here is my code :
def circularArrayRotation(a, k, queries)
  q = []
  while k >= 1
    m = a.pop()
    a.unshift m
    k = k - 1
  end
  for i in queries do
    v = a[queries[i]]
    q.push v
  end
  return q
end
It only works for the sample test case, but I can't figure out why. Thanks for any help you can provide.
I haven't run any benchmarks, but this seems like a job for the aptly named Array#rotate method:
def index_at_rotation(array, num_rotations, queries)
  array = array.rotate(-num_rotations)
  queries.map { |q| array[q] }
end
a = [1, 2, 3]
k = 2
q = [0,1, 2]
index_at_rotation(a, k, q)
#=> [2, 3, 1]
Handles negative rotation values and nil results as well:
a = [1, 6, 9, 11]
k = -1
q = (1..4).to_a
index_at_rotation(a, k, q)
#=> [9, 11, 1, nil]
I don't see any errors in your code, but I would like to suggest a more efficient way of making the calculation.
First observe that after q rotations the element at index i will be at index (i+q) % n.
For example, suppose
n = 3
a = [1,2,3]
q = 5
Then after q rotations the array will be as follows.
arr = Array.new(3)
arr[(0+5) % 3] = a[0] #=> arr[2] = 1
arr[(1+5) % 3] = a[1] #=> arr[0] = 2
arr[(2+5) % 3] = a[2] #=> arr[1] = 3
arr #=> [2,3,1]
We therefore can write
def doit(n,a,q,queries)
  n.times.with_object(Array.new(n)) do |i,arr|
    arr[(i+q) % n] = a[i]
  end.values_at(*queries)
end
doit(3,[1,2,3],5,[0,1,2])
#=> [2,3,1]
doit(3,[1,2,3],5,[2,1])
#=> [1, 3]
doit(3,[1,2,3],2,[0,1,2])
#=> [2, 3, 1]
p doit(3,[1,2,3],0,[0,1,2])
#=> [1,2,3]
doit(20,(0..19).to_a,25,(0..19).to_a.reverse)
#=> [14, 13, 12, 11, 10, 9, 8, 7, 6, 5,
# 4, 3, 2, 1, 0, 19, 18, 17, 16, 15]
Alternatively, we may observe that after q rotations the element at index j was initially at index (j-q) % n.
For the earlier example, after q rotations the array will be
[a[(0-5) % 3], a[(1-5) % 3], a[(2-5) % 3]]
#=> [a[1], a[2], a[0]]
#=> [2,3,1]
We therefore could instead write
def doit(n,a,q,queries)
  n.times.map { |j| a[(j-q) % n] }.values_at(*queries)
end

Determine indices of N number of non-zero minimum values in array

I have an array of size x and need to determine the indices of the n smallest values. I found this link (I have need the N minimum (index) values in a numpy array) discussing how to get multiple minimum values, but it doesn't work as well when my array has zeros in it.
For example:
x = [10, 12, 11, 9, 0, 1, 15, 4, 10]
n = 3
I need to find the indices of the 3 lowest non-zero values so the result would be
non_zero_min_ind = [5, 7, 3]
They don't need to be in any order. I am trying to do this in Python 3. Any help would be greatly appreciated.
Using numpy:
import numpy as np
y = np.argsort(x)              # indices of x in ascending order of value
y[np.array(x)[y]!=0][:n]       # drop the zero entries, keep the first n
# array([5, 7, 3])
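For large arrays a full argsort is not strictly needed. A minimal sketch using np.flatnonzero and np.argpartition, which selects the n smallest values in average linear time rather than sorting everything; it assumes x has at least n non-zero entries, and nonzero_min_indices is a hypothetical name. The indices come back in no particular order, which the question allows.

```python
import numpy as np

def nonzero_min_indices(x, n):
    """Indices of the n smallest non-zero values of x, in no particular order.

    Hypothetical helper; assumes x has at least n non-zero entries.
    """
    x = np.asarray(x)
    nz = np.flatnonzero(x)                     # positions of the non-zero entries
    part = np.argpartition(x[nz], n - 1)[:n]   # n smallest values among them
    return nz[part]                            # map back to positions in x

x = [10, 12, 11, 9, 0, 1, 15, 4, 10]
print(sorted(nonzero_min_indices(x, 3).tolist()))   # [3, 5, 7]
```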

Divide array into equal sized subarrays

We are given an array of size n (n is even). We have to divide it into two equal-sized subarrays, array1 and array2, of size n/2 each, such that the product of all the numbers in array1 equals the product of all the numbers in array2.
Given array:
arr = [2, 4, 5, 12, 15, 18]
solution:
array2 = [4, 5, 18]
array1 = [2, 12, 15]
Explanation:
product of all elements in array1 is 360
product of all elements in array2 is 360.
This problem might be solved using dynamic programming. You need to find a subset of size n/2 whose product equals p = sqrt(overall_product). Note that there is no solution when overall_product is not a perfect square.
The recursion might look like:
solution(p, n/2, arr) = any valid solution among
    solution(p / arr[i], n/2 - 1, arr without arr[i]), for each arr[i] that divides p
with solution(1, 0, ...) = true as the base case.
Use memoization or a table to solve the problem for reasonably large values of n.
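A minimal memoized version of this recursion, assuming all elements are positive integers. Instead of removing arr[i] from the array, it walks the array left to right and either takes or skips each element, so the memoized state is (position, elements still needed, remaining product). The name split_equal_product is hypothetical.

```python
from functools import lru_cache
from math import isqrt, prod

def split_equal_product(arr):
    """Split arr into two halves of equal size and equal product (hypothetical helper).

    Assumes arr holds positive integers. Returns (array1, array2) or None.
    """
    n = len(arr)
    if n % 2:
        return None
    total = prod(arr)
    p = isqrt(total)
    if p * p != total:              # the overall product must be a perfect square
        return None

    @lru_cache(maxsize=None)
    def solve(i, count, target):
        # Can `count` elements of arr[i:] be chosen with product `target`?
        if count == 0:
            return target == 1
        if n - i < count:
            return False
        x = arr[i]
        # Take arr[i] (only if it divides the target), otherwise skip it.
        if target % x == 0 and solve(i + 1, count - 1, target // x):
            return True
        return solve(i + 1, count, target)

    if not solve(0, n // 2, p):
        return None
    # Replay the memoized decisions to recover one valid subset.
    chosen, count, target = set(), n // 2, p
    for i, x in enumerate(arr):
        if count and target % x == 0 and solve(i + 1, count - 1, target // x):
            chosen.add(i)
            count, target = count - 1, target // x
    array1 = [x for i, x in enumerate(arr) if i in chosen]
    array2 = [x for i, x in enumerate(arr) if i not in chosen]
    return array1, array2

print(split_equal_product([2, 4, 5, 12, 15, 18]))   # ([2, 12, 15], [4, 5, 18])
```

The number of distinct target values is bounded by the number of divisors of p, which is what keeps the memo table manageable.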

How to extract lines in an array, which contain a certain value? (numpy, scipy)

I have a NumPy 2D array, and I want to get the values in column c for the rows r where the entry at (r, c-1) (row r, column c-1) equals a certain value (int n).
I don't want to iterate over the rows writing something like
stored = []
for r in range(array.shape[0]):
    if array[r, c-1] == 1:
        stored.append(array[r, c])
because there are 4000 rows, and this 2D array is just one of 20 that I have to look through.
I found "filter" but don't know how to use it (I found no documentation).
Is there a function that provides such a search?
I hope I understood your question correctly. Let's say you have an array a
>>> import numpy as np
>>> a = np.array(list(range(7)) * 3).reshape(7, 3)
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 0, 1],
       [2, 3, 4],
       [5, 6, 0],
       [1, 2, 3],
       [4, 5, 6]])
and you want to extract all rows where the first entry is 2. This can be done like this:
>>> a[a[:, 0] == 2]
array([[2, 3, 4]])
a[:,0] denotes the first column of the array, == 2 returns a Boolean array marking the entries that match, and then we use advanced indexing to extract the respective rows.
Of course, NumPy needs to iterate over all entries, but this will be much faster than doing it in Python.
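Applied to the original question (the values in column c for the rows where column c-1 equals n), the same boolean mask can be combined with a column index in a single indexing step. A small sketch; the helper name column_where and the example values of c and n are hypothetical.

```python
import numpy as np

def column_where(arr, c, n):
    """Values of column c in the rows where column c-1 equals n.

    Assumes c >= 1: with c = 0, c - 1 = -1 would silently wrap to the last column.
    """
    mask = arr[:, c - 1] == n    # one boolean per row
    return arr[mask, c]          # boolean row selection plus an integer column index

a = np.array(list(range(7)) * 3).reshape(7, 3)
print(column_where(a, 1, 2))   # [3]  (the row [2, 3, 4])
print(column_where(a, 2, 6))   # [0]  (the row [5, 6, 0])
```

For the 20 arrays mentioned in the question, a list comprehension such as [column_where(arr, c, n) for arr in arrays] keeps the per-array work vectorized.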
NumPy arrays are not indexed (in the database sense). If you need to perform this specific operation in better than linear time in the array size, then you need to use something other than NumPy.
