How can I efficiently sort an array of objects on two or more attributes in Numpy?
import numpy as np

class Obj():
    def __init__(self, a, b):
        self.a = a
        self.b = b

arr = np.array([], dtype=Obj)
for i in range(10):
    arr = np.append(arr, Obj(i, 10-i))
arr_sort = np.sort(arr, order=a,b) ???
Thx, Willem-Jan
The order parameter only applies to structured arrays:
In [383]: arr=np.zeros((10,),dtype='i,i')
In [385]: for i in range(10):
     ...:     arr[i] = (i,10-i)
In [386]: arr
Out[386]:
array([(0, 10), (1, 9), (2, 8), (3, 7), (4, 6), (5, 5), (6, 4), (7, 3), (8, 2), (9, 1)],
dtype=[('f0', '<i4'), ('f1', '<i4')])
In [387]: np.sort(arr, order=['f0','f1'])
Out[387]:
array([(0, 10), (1, 9), (2, 8), (3, 7), (4, 6), (5, 5), (6, 4), (7, 3), (8, 2), (9, 1)],
dtype=[('f0', '<i4'), ('f1', '<i4')])
In [388]: np.sort(arr, order=['f1','f0'])
Out[388]:
array([(9, 1), (8, 2), (7, 3), (6, 4), (5, 5), (4, 6), (3, 7), (2, 8),
(1, 9), (0, 10)],
dtype=[('f0', '<i4'), ('f1', '<i4')])
With a 2d array, lexsort provides a similar 'ordered' sort (the last key passed in is the primary one):
In [402]: arr=np.column_stack((np.arange(10),10-np.arange(10)))
In [403]: np.lexsort((arr[:,1],arr[:,0]))
Out[403]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
In [404]: np.lexsort((arr[:,0],arr[:,1]))
Out[404]: array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0], dtype=int32)
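The data above has no ties, so the effect of the secondary key is not visible. Here is a minimal sketch with made-up rows (the values are an assumption, chosen so that the first column contains duplicates), showing that the last key given to lexsort is the primary one:

```python
import numpy as np

# Hypothetical rows with ties in column 0, so the secondary key matters.
arr = np.array([[1, 9],
                [0, 5],
                [1, 2],
                [0, 7]])

# The LAST key is the primary key: sort by column 0, break ties with column 1.
idx = np.lexsort((arr[:, 1], arr[:, 0]))

# lexsort returns indices; use them to reorder the rows.
sorted_rows = arr[idx]
print(sorted_rows)
```

Because lexsort returns an index array, the same idx can reorder any other array that is aligned with arr.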
With your object array, I could extract the attributes into either of these structures:
In [407]: np.array([(a.a, a.b) for a in arr])
Out[407]:
array([[ 0, 10],
[ 1, 9],
[ 2, 8],
....
[ 7, 3],
[ 8, 2],
[ 9, 1]])
In [408]: np.array([(a.a, a.b) for a in arr],dtype='i,i')
Out[408]:
array([(0, 10), (1, 9), (2, 8), (3, 7), (4, 6), (5, 5), (6, 4), (7, 3),
(8, 2), (9, 1)],
dtype=[('f0', '<i4'), ('f1', '<i4')])
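As a rough sketch of the structured-array route, the fields can be given the same names as the attributes, so the order list reads naturally. The Obj values here are made up (an assumption) so that the first field has ties:

```python
import numpy as np

class Obj:
    def __init__(self, a, b):
        self.a = a
        self.b = b

# Hypothetical objects: attribute a repeats, so b acts as the tie-breaker.
objs = [Obj(i % 2, 10 - i) for i in range(4)]

# Copy the attributes into a structured array with named fields.
rec = np.array([(o.a, o.b) for o in objs], dtype=[("a", "i4"), ("b", "i4")])

# Sort on a first, then on b.
rec_sorted = np.sort(rec, order=["a", "b"])
print(rec_sorted)
```

Note that this sorts a copy of the attribute values, not the original objects.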
The Python sorted function will work on arr (or its list equivalent):
In [421]: arr
Out[421]:
array([<__main__.Obj object at 0xb0f2d24c>,
<__main__.Obj object at 0xb0f2dc0c>,
....
<__main__.Obj object at 0xb0f35ecc>], dtype=object)
In [422]: sorted(arr, key=lambda a: (a.b,a.a))
Out[422]:
[<__main__.Obj at 0xb0f35ecc>,
<__main__.Obj at 0xb0f3570c>,
...
<__main__.Obj at 0xb0f2dc0c>,
<__main__.Obj at 0xb0f2d24c>]
Your Obj class is missing a nice __str__ method. I have to use something like [(i.a, i.b) for i in arr] to see the values of the arr elements.
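One possible way to make the elements readable is to define __repr__, which is what the interpreter uses when displaying objects inside containers such as lists. The format string below is only a suggestion:

```python
class Obj:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __repr__(self):
        # Used when the object is displayed on its own or inside a list.
        return f"Obj(a={self.a}, b={self.b})"

objs = [Obj(i, 10 - i) for i in range(3)]
print(objs)
```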
As I stated in the comment, for this example, a list is much nicer than an object array.
In [423]: alist=[]
In [424]: for i in range(10):
     ...:     alist.append(Obj(i,10-i))
List append is faster than repeated np.append, which copies the whole array on every call. And object arrays don't add much functionality compared to a list, especially when they are 1d and the objects are custom classes like this. You can't do any math on arr, and as you can see, sorting isn't any easier.
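A minimal sketch of the list-first approach: build and sort the objects as a plain list, and convert to an object array once at the end only if an array is really needed. Using operator.attrgetter as the sort key is my choice here, not something the question requires:

```python
import numpy as np
from operator import attrgetter

class Obj:
    def __init__(self, a, b):
        self.a = a
        self.b = b

# Build the objects in a list; no array is copied along the way.
alist = [Obj(i, 10 - i) for i in range(10)]

# Sort in place on b first, then on a.
alist.sort(key=attrgetter("b", "a"))

# Convert once, only if an object array is needed afterwards.
arr = np.array(alist, dtype=object)
```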
Related
Given an ndarray:
np.array(
(
(1, 2, 3, 3, 2),
(4, 5, 4, 3, 2),
(1, 1, 1, 1, 1),
(0, 0, 0, 0, 0),
(0, 2, 3, 4, 0),
)
)
extract the mean of the values bounded by a rectangle with coordinates: (1, 1), (3, 1), (1, 3), (3, 3).
The extracted region of the array would be:
5, 4, 3,
1, 1, 1,
0, 0, 0,
And the mean would be ~1.666666667
import numpy as np
arr = np.array(
(
(1, 2, 3, 3, 2),
(4, 5, 4, 3, 2),
(1, 1, 1, 1, 1),
(0, 0, 0, 0, 0),
(0, 2, 3, 4, 0),
)
)
mean = arr[1:4, 1:4].mean()
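The slice stops at 4 because slice ends are exclusive, while the rectangle's corners are inclusive. A small helper can make that explicit; region_mean and its parameter names are hypothetical, not part of NumPy:

```python
import numpy as np

def region_mean(arr, top, left, bottom, right):
    """Mean of arr over rows top..bottom and columns left..right, inclusive."""
    # Slices exclude the stop index, so add 1 to include the far edge.
    return arr[top:bottom + 1, left:right + 1].mean()

arr = np.array(
    (
        (1, 2, 3, 3, 2),
        (4, 5, 4, 3, 2),
        (1, 1, 1, 1, 1),
        (0, 0, 0, 0, 0),
        (0, 2, 3, 4, 0),
    )
)
print(region_mean(arr, 1, 1, 3, 3))
```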
According to the question presented here: Python itertools.combinations: how to obtain the indices of the combined numbers, given the following code:
import itertools
my_list = [7, 5, 5, 4]
pairs = list(itertools.combinations(my_list, 2))
#pairs = [(7, 5), (7, 5), (7, 4), (5, 5), (5, 4), (5, 4)]
indexes = [(i, j) for (i, _), (j, _) in itertools.combinations(enumerate(my_list), 2)]
#indexes = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
Is there any way to obtain pairs and indexes in a single line so I can have a lower complexity in my code (e.g. using enumerate or something likewise)?
@Maf - try this; as @jonsharpe suggested earlier, use zip:
from pprint import pprint
from itertools import combinations
my_list = [7, 5, 5, 4]
>>> pprint(list(zip(combinations(enumerate(my_list),2), combinations(my_list,2))))
[(((0, 7), (1, 5)), (7, 5)),
(((0, 7), (2, 5)), (7, 5)),
(((0, 7), (3, 4)), (7, 4)),
(((1, 5), (2, 5)), (5, 5)),
(((1, 5), (3, 4)), (5, 4)),
(((2, 5), (3, 4)), (5, 4))]
(Explicit is better than implicit. Simple is better than complex.)
I would use a list comprehension for its flexibility:
[(x, (x[0][1], x[1][1])) for x in combinations(enumerate(my_list), 2)]
This can be further extended using the likes of operator.itemgetter.
Also, the idea is to consume the iterator only once, so that the method can potentially be applied to other non-deterministic iterators as well, say, a generator that yields from random.choices.
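To illustrate the single-pass idea, here is one possible sketch that walks combinations(enumerate(...)) once and splits each combination into an index tuple and a value tuple. The variable names are just for illustration:

```python
from itertools import combinations

my_list = [7, 5, 5, 4]

indexes, pairs = [], []
for combo in combinations(enumerate(my_list), 2):
    # combo is a pair of (index, value) tuples; zip(*combo) transposes it
    # into one tuple of indices and one tuple of values.
    idx, vals = zip(*combo)
    indexes.append(idx)
    pairs.append(vals)

print(indexes)
print(pairs)
```

Since the source is consumed only once, the two lists stay consistent even if the input is a one-shot iterator.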
So, I have the following array (typed as Array{Tuple{Int,Float64,Int,Int},1}, but it could also be an Array of Arrays), where the first element of each tuple is an ID and the second is a number indicating a cost. What I want to do is group by ID and then take the difference between the cheapest and the second-cheapest cost for each ID; if there is no second cost, the difference should be typemax(Float64) - firstcost. For the third and fourth elements of the tuple, I want to keep those of the firstcost entry (that is, the one with the minimum cost).
Example of what I have
(1, 223.2, 2, 2)
(1, 253.2, 3, 2)
(2, 220.0, 4, 6)
(3, 110.0, 1, 4)
(3, 100.0, 3, 8)
Example of what I want:
(1, 30.0, 2, 2)
(2, typemax(Float64)-220.0, 4, 6)
(3, 10.0, 3, 8)
This is one way of doing it:
A = [(1, 223.2, 2, 2), (1, 253.2, 3, 2), (2, 220.0, 4, 6), (3, 110.0, 1, 4), (3, 100.0, 3, 8)]
function f(a)
    aux(b::Vector) = (b[1][1], (length(b) == 1 ? typemax(Float64) : b[2][2]) - b[1][2], b[1][3:4]...)
    sort([aux(sort(filter(x -> x[1] == i, a))) for i in Set(map(first, a))])
end
@show f(A)
There's SplitApplyCombine.jl, which implements (unsurprisingly) a split-apply-combine logic like that found in DataFrames. This is an example where I would stay away from simple one-liners / short solutions and write things out more explicitly, in the interest of making the code readable and understandable if someone else (or you yourself in a few months' time!) reads it:
julia> tups = [(1, 223.2, 2, 2)
(1, 253.2, 3, 2)
(2, 220.0, 4, 6)
(3, 110.0, 1, 4)
(3, 100.0, 3, 8)]
5-element Array{Tuple{Int64,Float64,Int64,Int64},1}:
(1, 223.2, 2, 2)
(1, 253.2, 3, 2)
(2, 220.0, 4, 6)
(3, 110.0, 1, 4)
(3, 100.0, 3, 8)
julia> using SplitApplyCombine
julia> function my_fun(x) # function to apply
           costs = sort(getindex.(x, 2))          # costs in ascending order
           cheapest = x[argmin(getindex.(x, 2))]  # tuple with the minimum cost
           if length(x) == 1
               return (cheapest[1], typemax(Float64) - costs[1], cheapest[3], cheapest[4])
           else
               return (cheapest[1], diff(costs[1:2])[1], cheapest[3], cheapest[4])
           end
       end
my_fun (generic function with 1 method)
julia> [my_fun(x) for x in group(first, tups)] # apply function group wise
3-element Array{Tuple{Int64,Float64,Int64,Int64},1}:
 (2, Inf, 4, 6)
 (3, 10.0, 3, 8)
 (1, 30.0, 2, 2)
If performance is a concern you might want to think about my_fun and do some profiling to see if and how you can improve it - the only thing I've done here is to use diff to subtract the first from the second element of the sorted array to avoid sorting twice.
How do you implement (or create) an array sort of a list of tuples?
The following was gleaned from my code. Essentially I created an array of tuples
and populated it via for loop; after which I tried to sort it.
var myStringArray: (String,Int)[]? = nil
...
myStringArray += (kind,number)
...
myStringArray.sort{$0 > $1}
This is what Xcode gave me before I could build:
test.swift:57:9: '(String, Int)[]?' does not have a member named 'sort'
You have two problems. First, myStringArray is an Optional; you must "unwrap" it before you can call methods on it. Second, there is no > operator for tuples, so you must do the comparison yourself:
if let myStringArray = myStringArray {
    myStringArray.sort { $0.0 == $1.0 ? $0.1 > $1.1 : $0.0 > $1.0 }
}
Actually, what I was looking for, is the tuple with the largest integer value:
var myStringArray: (String,Int)[]? = nil
...
println("myStringArray: \(myStringArray)\n")
myStringArray!.sort {$0.1 > $1.1}
println("myStringArray: \(myStringArray)\n")
...
Original:
myStringArray: [(One, 1), (Square, 1), (Square, 4), (Square, 9),
(Square, 16), (Square, 25), (Prime, 2), (Prime, 3), (Prime, 5),
(Prime, 7), (Prime, 11), (Prime, 13), (Fibonacci, 1), (Fibonacci, 1),
(Fibonacci, 2), (Fibonacci, 3), (Fibonacci, 5), (Fibonacci, 8)]
Sorted:
myStringArray: [(Square, 25), (Square, 16), (Prime, 13), (Prime, 11),
(Square, 9), (Fibonacci, 8), (Prime, 7), (Prime, 5), (Fibonacci, 5),
(Square, 4), (Prime, 3), (Fibonacci, 3), (Prime, 2), (Fibonacci, 2),
(One, 1), (Square, 1), (Fibonacci, 1), (Fibonacci, 1)]
...so it's the "square" having the largest integer: 25.
I am new to Numpy and I am not an expert programmer at all...
This is my issue:
I have array a and array b (b has fewer rows than a).
I want to substitute some rows of a with all the rows of b (in order).
Each row of a to be substituted has one value in common with a row of b.
for example:
a is:
1 a 2
3 b 4
0 z 0
5 c 6
0 y 0
b is
1 z 1
1 y 1
In this case I want to substitute rows 3 and 5 of a with rows 1 and 2 of b. The arrays are very big and, as in the example, some columns hold characters (so I set the array to dtype=object).
First, I would recommend changing your dtype from object to [('a',int),('letter','S1'),('b',int)]. What this does is allow you to have columns of different dtypes (note the length of that dtype list is the number of columns). Then you have:
In [63]: a = np.array(
....: [(1,'a',2),(3,'b',4),(0,'z',0),(5,'c',6),(0,'y',0)],
....: dtype=[('a',int),('letter','S1'),('b',int)])
In [64]: a
Out[64]:
array([(1, 'a', 2), (3, 'b', 4), (0, 'z', 0), (5, 'c', 6), (0, 'y', 0)],
dtype=[('a', '<i8'), ('letter', 'S1'), ('b', '<i8')])
In [65]: a['a']
Out[65]: array([1, 3, 0, 5, 0])
In [66]: a['b']
Out[66]: array([2, 4, 0, 6, 0])
In [67]: a['letter']
Out[67]:
array(['a', 'b', 'z', 'c', 'y'],
dtype='|S1')
Then you can use the same dtype for b:
In [69]: b = np.array([(1,'z',1),(1,'y',1)], dtype=a.dtype)
In [70]: b
Out[70]:
array([(1, 'z', 1), (1, 'y', 1)],
dtype=[('a', '<i8'), ('letter', 'S1'), ('b', '<i8')])
In [71]: b['letter']
Out[71]:
array(['z', 'y'],
dtype='|S1')
Now, you can simply replace the parts of a with the parts of b where the letters match:
In [73]: a
Out[73]:
array([(1, 'a', 2), (3, 'b', 4), (0, 'z', 0), (5, 'c', 6), (0, 'y', 0)],
dtype=[('a', '<i8'), ('letter', 'S1'), ('b', '<i8')])
In [74]: b
Out[74]:
array([(1, 'z', 1), (1, 'y', 1)],
dtype=[('a', '<i8'), ('letter', 'S1'), ('b', '<i8')])
In [75]: for row in b:
....: a[a['letter']==row['letter']] = row
....:
In [76]: a
Out[76]:
array([(1, 'a', 2), (3, 'b', 4), (1, 'z', 1), (5, 'c', 6), (1, 'y', 1)],
dtype=[('a', '<i8'), ('letter', 'S1'), ('b', '<i8')])
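Since the arrays are described as very big, looping over every row of b may be slow. Below is a rough vectorized sketch using np.searchsorted. It assumes the letters in b are unique, and it uses 'U1' (unicode) instead of 'S1' so that it behaves the same on Python 3. The variable names order, pos and match are mine:

```python
import numpy as np

dt = [("a", int), ("letter", "U1"), ("b", int)]
a = np.array([(1, "a", 2), (3, "b", 4), (0, "z", 0), (5, "c", 6), (0, "y", 0)],
             dtype=dt)
b = np.array([(1, "z", 1), (1, "y", 1)], dtype=dt)

# Sort b's letters once, then look up every letter of a in the sorted keys.
order = np.argsort(b["letter"])
sorted_letters = b["letter"][order]
pos = np.searchsorted(sorted_letters, a["letter"])
pos = np.clip(pos, 0, len(b) - 1)            # keep lookups in range

# True where a's letter actually occurs in b.
match = sorted_letters[pos] == a["letter"]

# Overwrite the matching rows of a with the corresponding rows of b.
a[match] = b[order][pos[match]]
print(a)
```

This replaces the Python-level loop with a single sort of b and a binary search per row of a. If b can contain repeated letters, the loop version's "last one wins" behaviour is not reproduced here.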