How to extract rows of an array that contain a certain value? (numpy, scipy)

I have a NumPy 2D array and I want to return the values in column c for every row r where the entry at (r, c-1) equals a certain value (int n).
I don't want to iterate over the rows writing something like
for r in range(len(array)):
    if array[r, c-1] == n:
        store array[r, c]
because there are 4000 of them and this 2D array is just one of 20 I have to look through.
I found "filter" but don't know how to use it (I found no documentation).
Is there a function that provides such a search?

I hope I understood your question correctly. Let's say you have an array a
import numpy as np
a = np.array(list(range(7)) * 3).reshape(7, 3)
print(a)
[[0 1 2]
 [3 4 5]
 [6 0 1]
 [2 3 4]
 [5 6 0]
 [1 2 3]
 [4 5 6]]
and you want to extract all lines where the first entry is 2. This can be done like this:
print(a[a[:, 0] == 2])
[[2 3 4]]
a[:,0] denotes the first column of the array, == 2 returns a Boolean array marking the entries that match, and then we use advanced indexing to extract the respective rows.
Of course, NumPy needs to iterate over all entries, but this will be much faster than doing it in Python.
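Applied to the original question, the same Boolean mask can pick out only the values in column c for the rows where column c-1 equals n. The following is a minimal sketch; the names values_after_match, arr, c, and n are hypothetical and chosen only for illustration.

```python
import numpy as np

def values_after_match(arr, c, n):
    """Return arr[r, c] for every row r where arr[r, c - 1] == n.

    Hypothetical helper, not part of NumPy.
    """
    mask = arr[:, c - 1] == n   # Boolean array, one entry per row
    return arr[mask, c]         # column c of the matching rows only

a = np.array(list(range(7)) * 3).reshape(7, 3)
result = values_after_match(a, 1, 2)   # rows whose column 0 equals 2
print(result)
```

Because the mask is computed for one array at a time, the same call can be repeated for each of the 20 arrays without writing a Python-level loop over rows.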

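NumPy arrays are not indexed in the database sense: there is no lookup structure behind them, so a search by value is always a linear scan. If you need to perform this specific operation more efficiently than linear in the array size, you need to use something other than a plain NumPy array.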
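One possible workaround, offered as a sketch and not as what this answer had in mind, is to sort the key column once and then answer each lookup with a binary search via np.searchsorted. This only pays off when many lookups are made against the same column. The helper names build_lookup and rows_equal_to are hypothetical.

```python
import numpy as np

def build_lookup(keys):
    """Sort the key column once so later lookups can use binary search."""
    order = np.argsort(keys, kind="stable")
    return order, keys[order]

def rows_equal_to(order, sorted_keys, value):
    """Return the original row numbers whose key equals value."""
    lo = np.searchsorted(sorted_keys, value, side="left")
    hi = np.searchsorted(sorted_keys, value, side="right")
    return order[lo:hi]

a = np.array(list(range(7)) * 3).reshape(7, 3)
order, sorted_keys = build_lookup(a[:, 0])   # one-off O(n log n) cost
rows = rows_equal_to(order, sorted_keys, 2)  # O(log n) per lookup
print(a[rows])
```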

Related

Select rows by minimum values of a column considering unique values of another column (numpy array)

I want to select only the rows for each unique value of a column (first column) that have a minimum value in another column (second column).
How can I do it?
Let's say I have this array:
[[10, 1], [10, 5], [10, 2], [20, 4], [20, 1], [20, 7], [20, 2], [40, 7], [40, 4], [40, 5]]
I would like to obtain the following array:
[[10, 1], [20, 1], [40, 4]]
I was trying selecting rows in this way:
d = {i: array[array[:, 0] == i] for i in np.unique(array[:, 0])}
but then I don't know how to find the row with the minimum value in the second column.
What you want is the idea of groupby, as implemented in pandas for instance. As we don't have that in numpy, let's implement something similar to this other answer.
Let's call your input array A. So first, sort the rows by the values in the first column. We do this so that all entries with the same value appear one after the other.
sor = A[A[:,0].argsort()]
And get the indices where new unique values are found.
uniq=np.unique(sor[:,0],return_index=True)[1]
print(uniq)
>>> array([0, 3, 7])
This indicates the places of the array where we need to cut to get groups. Now split the second column into such groups. That way you get chunks of elements of the second column, grouped by the elements on the first column.
grp=np.split(sor[:,1],uniq[1:])
print(grp)
>>> [array([1, 5, 2]), array([4, 1, 7, 2]), array([7, 4, 5])]
Last step is to get the index of the minimum value out of each of these groups
ind=np.array(list(map(np.argmin,grp))) + uniq
print(ind)
>>> array([0, 4, 8])
The first part maps the np.argmin function to every group in grp. The + uniq part is there for mapping every one of these minimum arguments into the original scale.
Now you only need to index your sorted array using these indices.
print(sor[ind])
>>> array([[10, 1],
           [20, 1],
           [40, 4]])
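The steps above can be collected into a single function. This is a sketch under the same assumptions as the answer (a 2D integer array with the group key in column 0 and the value in column 1); the function name min_rows_per_group is hypothetical.

```python
import numpy as np

def min_rows_per_group(A):
    """For each unique value in column 0, return the row with the smallest column 1."""
    # Sort rows by the first column so equal keys sit next to each other
    sor = A[A[:, 0].argsort(kind="stable")]
    # Start position of each group in the sorted array
    uniq = np.unique(sor[:, 0], return_index=True)[1]
    # Second column, cut into one chunk per group
    grp = np.split(sor[:, 1], uniq[1:])
    # argmin inside each chunk, shifted back to positions in the sorted array
    ind = np.array([g.argmin() for g in grp]) + uniq
    return sor[ind]

A = np.array([[10, 1], [10, 5], [10, 2], [20, 4], [20, 1],
              [20, 7], [20, 2], [40, 7], [40, 4], [40, 5]])
result = min_rows_per_group(A)
print(result)
```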

Loop to perform operation on i+1 in numpy array

I have a numpy array. I'd like to take the 3 numbers in each row, subtract the next row from them, and store those values in another array.
something like
for i in array:
a = i - i+1
I know this is very wrong, but at least this gives the idea of what I want.
Obviously i+1 will just result in the value + 1 and then all I have is a = 1,1,1
When I say i+1 I mean the next in line.
So for example:
input = np.array([[4,4,5], [2,3,1],[1,2,0]])
output = np.array([[2,1,4], [1,1,1]]) etc....
What would be the best way to do this iteratively on thousands of rows?
IIUC, instead of looping, you can just shift your arrays 1 up using np.roll, subtract that from your original input, and take all the resulting arrays except the last (because there will be nothing to subtract from the last array):
>>> inp = np.array([[4,4,5], [2,3,1],[1,2,0]])
>>> inp
array([[4, 4, 5],
       [2, 3, 1],
       [1, 2, 0]])
>>> (inp - np.roll(inp,-1,axis=0))[:-1]
array([[2, 1, 4],
       [1, 1, 1]])
Or, a more straightforward way would just be to use numpy indexing:
>>> inp[:-1] - inp[1:]
array([[2, 1, 4],
       [1, 1, 1]])
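A third option, not mentioned in the answer, is np.diff. It computes each row minus the previous one along the chosen axis, so the sign has to be flipped to get "row minus the next row". A minimal sketch:

```python
import numpy as np

inp = np.array([[4, 4, 5], [2, 3, 1], [1, 2, 0]])

# np.diff(inp, axis=0) is inp[1:] - inp[:-1], so negate it
# to obtain each row minus the row that follows it.
out = -np.diff(inp, axis=0)
print(out)
```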

Algorithm Logic, Splitting Arrays

I'm not looking for a solution just pseudo code or logic that would help me derive an answer.
Given an array:
[1,2,3,4]
I want to split this into two arrays of varying lengths and contents whose lengths sum to the length of the given array. Ideally no element is repeated.
Example output:
[[1],[2, 3, 4]]
[[1, 2], [3, 4]]
[[1, 3], [2, 4]]
[[1, 4],[2, 3]]
[[1, 2, 3], [4]]
[[2], [1, 3, 4]]
[[2, 4], [1, 3]]
[[3], [1, 2, 4]]
More example:
[[1, 3, 4, 6, 8], [2, 5, 7]] // this is a possible combination of a 1 through 8 array
Intuitions:
My first attempt pushed the starting number array[i] into result array[0], with a second loop moving the start index for a third loop that grabbed sublists. The other list was then filled with the remaining indices. It was poorly conceived...
Second idea is permutations. Write an algorithm that reorganizes the array into every possible combination. Then, perform the same split operation on those lists at different indexes keeping track of unique lists as strings in a dictionary.
[1,2,3,4,5,6,7,8]
^
split
[1,2,3,4,5,6,7,8]
^
split
[1,3,4,5,6,7,8,2]
^
split
I'm confident that this will produce the lists I'm looking for. However, I'm afraid it may be less efficient than I'd like, because checking for duplicates requires sorting and generating permutations is expensive in the first place.
Please respond with how you would approach this problem, and why.
Pseudocode. The idea is to start with an item in one of the bags, and then to place the next item once in the same bag, once in the other.
function f(A):
    // Recursive function to collect arrangements
    function g(l, r, i):
        // Base case: no more items
        if i == length(A):
            return [[l, r]]
        // Place the item in the left bag
        return g(l with A[i], r, i + 1)
            // Also return a version where the item
            // is placed in the right bag
            concatenated with g(l, r with A[i], i + 1)
    // Check that we have at least one item
    if A is empty:
        return []
    // Start the recursion with one item placed
    return g([A[0]], [], 1)
(PS see revisions for JavaScript code.)
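For readers who prefer something runnable, here is one way the pseudocode could be written in Python. It is a direct transcription and not the answerer's own code; the names two_bag_splits and place are hypothetical.

```python
def two_bag_splits(items):
    """List every way to put items into two bags, with items[0] fixed in the left bag."""
    def place(left, right, i):
        # Base case: no more items to place
        if i == len(items):
            return [[left, right]]
        # Place item i in the left bag, then also in the right bag
        return (place(left + [items[i]], right, i + 1)
                + place(left, right + [items[i]], i + 1))

    # Check that we have at least one item
    if not items:
        return []
    # Start the recursion with the first item already placed
    return place([items[0]], [], 1)

splits = two_bag_splits([1, 2, 3, 4])
for s in splits:
    print(s)
```

Fixing the first item in the left bag is what prevents mirrored duplicates, so n items give 2^(n-1) arrangements with no sorting or de-duplication step. One of those arrangements leaves the right bag empty; filter it out if both arrays must be non-empty.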

Iterating Over a List of Lists in Haskell?

I'm having some difficulties understanding how to iterate through a list in Haskell. I've been trying to work with mapM but for some reason I keep on coming up with parsing errors. I know that this can be done recursively, but the code within the iteration/for loop is only a small part of the whole function so I wouldn't want to recursively call the function. So for example, if I have a list of lists like
[[0, 1, 2], [2, 3, 4], [4, 5, 6]]
how would I go about first iterating through each list to see if the sum of values in each list is > 5 and then within each list, iterating through the individual values to check if there is an integer = 2 in the list (and return True in that case)?
"how would I go about first iterating through each list to see if the sum of values in each list is > 5"
Let's say your list is
l = [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
You can get the lists whose values sum to more than five using filter. The first argument of filter is a function, here (\xs -> sum xs > 5), which takes a list xs and decides whether the sum of its elements is greater than 5.
> filter (\xs -> sum xs > 5) l
[[2,3,4],[4,5,6]]
"and then within each list, iterating through the individual values to check if there is an integer = 2 in the list"
Same as before, you use filter, but now you check whether the number 2 is an element of each list xs
> filter (\xs -> 2 `elem` xs) l
[[0,1,2],[2,3,4]]
It's not 100% clear to me exactly what you want to do, but here are some building blocks that may help:
Find which lists have a sum greater than 5:
Prelude> filter ((>5) . sum) [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
[[2,3,4],[4,5,6]]
Find whether a list contains the number 2:
Prelude> any (==2) [1,2,3]
True
Prelude> any (==2) [4,5,6]
False
Combine the above, to give True / False for each list whose sum is greater than 5:
Prelude> (map (any (==2)) . filter ((>5) . sum)) [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
[True,False]
List comprehensions are iterative. I'm not sure exactly what you want, so the following builds one tuple for each list whose sum is greater than 5. The first item of the tuple is the sum, and the second is a list that contains True for every element equal to 2, so it is empty when the list has no 2. The check for elements equal to 2 is itself a short list comprehension.
[(a,[True|t<-b,t == 2])|b<-[[0,1,2],[2,3,4,5],[6,7,8]], a<-[sum b], a>5]
[(14,[True]),(21,[])]

numpy using multidimensional index array on another multidimensional array

I have two multidimensional arrays, and I'd like to use one as the index to produce a new multidimensional array. For example:
a = np.array([[4, 3, 2, 5],
              [7, 8, 6, 8],
              [3, 1, 5, 6]])
b = np.array([[0,2],[1,1],[3,1]])
I want to use the first array in b to return those indexed elements in the first array of a, and so on. So I want the output to be:
array([[4,2],[8,8],[6,1]])
This is probably simple but I couldn't find an answer by searching. Thanks.
This is a little tricky, but the following will do it:
>>> a[np.arange(3)[:, np.newaxis], b]
array([[4, 2],
       [8, 8],
       [6, 1]])
You need to index both the rows and the columns of the a array, so to match your b array you would need an array like this:
rows = np.array([[0, 0],
                 [1, 1],
                 [2, 2]])
And then a[rows, b] would clearly return what you are after. You can get the same result relying on broadcasting as above, replacing the rows array with np.arange(3)[:, np.newaxis], which is equivalent to np.arange(3).reshape(3, 1).
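As a cross-check, the broadcasting form can be compared with np.take_along_axis, which is not mentioned in the answer and is available in NumPy 1.15 and later. A short sketch using the arrays from the question:

```python
import numpy as np

a = np.array([[4, 3, 2, 5],
              [7, 8, 6, 8],
              [3, 1, 5, 6]])
b = np.array([[0, 2], [1, 1], [3, 1]])

# Broadcasting form: a (3, 1) column of row numbers is paired
# with the (3, 2) array of column indices.
rows = np.arange(a.shape[0])[:, np.newaxis]
by_broadcast = a[rows, b]

# Same selection, with the row pairing handled by NumPy
by_take = np.take_along_axis(a, b, axis=1)
print(by_take)
```

Using a.shape[0] in place of the literal 3 keeps the broadcasting form valid for arrays with a different number of rows.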
