Permutations with a limited length in Julia - permutation

In Python, you can use itertools to generate permutations like so:
>>> list(itertools.permutations("ABC", 2))
[('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
Julia has a similar permutations function, but it only accepts one argument. What is the best way to emulate the second argument in the Python function?

The subsets function from Iterators.jl with k=2 should give you every subset of size 2; you can then take every permutation of each subset.
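The "subsets first, then permute each subset" idea can be sketched in Python, where itertools.combinations plays the role of subsets. This only illustrates the technique and is not Julia code; the helper name k_permutations is made up for the example.

```python
from itertools import combinations, permutations

def k_permutations(items, k):
    """Yield every length-k permutation by permuting each size-k subset."""
    for subset in combinations(items, k):
        # Each subset of size k contributes all k! orderings of itself.
        yield from permutations(subset)

# Same elements as itertools.permutations("ABC", 2), possibly in another order.
result = sorted(k_permutations("ABC", 2))
expected = sorted(permutations("ABC", 2))
```

Because the orderings are grouped by subset, the results come out in a different order than the direct call, so sort them if the order matters.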

Related

Select records of specific data type from numpy recarray

I have a numpy recarray, that has records of different data types or dtypes.
import numpy as np
a = np.array([1,2,3,4], dtype=int)
b = np.array([6,6,6,6], dtype=int)
c = np.array(['p', 'q', 'r', 's'], dtype=object)
d = np.array(['a', 'b', 'c', 'd'], dtype=object)
X = np.rec.fromarrays([a, b, c, d], names=['a', 'b', 'c', 'd'])
X
>>> rec.array([(1, 6, 'p', 'a'), (2, 6, 'q', 'b'), (3, 6, 'r', 'c'),
(4, 6, 's', 'd')],
dtype=[('a', '<i8'), ('b', '<i8'), ('c', 'O'), ('d', 'O')])
I tried to select the records of object data type using select_dtypes, but I get an AttributeError:
X.select_dtypes(include='object')
>>>AttributeError: recarray has no attribute select_dtypes
Is there an equivalent of the select_dtypes function for numpy recarrays, so that I can select the columns of a specific data type?
In [74]: X
Out[74]:
rec.array([(1, 6, 'p', 'a'), (2, 6, 'q', 'b'), (3, 6, 'r', 'c'),
(4, 6, 's', 'd')],
dtype=[('a', '<i4'), ('b', '<i4'), ('c', 'O'), ('d', 'O')])
A recarray lets you access a field either as an attribute or by indexing:
In [75]: X.a
Out[75]: array([1, 2, 3, 4])
In [76]: X['a']
Out[76]: array([1, 2, 3, 4])
In [77]: X.dtype.fields
Out[77]:
mappingproxy({'a': (dtype('int32'), 0),
'b': (dtype('int32'), 4),
'c': (dtype('O'), 8),
'd': (dtype('O'), 16)})
Testing the pandas approach:
In [78]: import pandas as pd
In [79]: df=pd.DataFrame(X)
In [80]: df
Out[80]:
a b c d
0 1 6 p a
1 2 6 q b
2 3 6 r c
3 4 6 s d
In [83]: df.select_dtypes(include=object)
Out[83]:
c d
0 p a
1 q b
2 r c
3 s d
Exploring the dtype:
In [84]: X.dtype
Out[84]: dtype((numpy.record, [('a', '<i4'), ('b', '<i4'), ('c', 'O'), ('d', 'O')]))
In [85]: X.dtype.fields
Out[85]:
mappingproxy({'a': (dtype('int32'), 0),
'b': (dtype('int32'), 4),
'c': (dtype('O'), 8),
'd': (dtype('O'), 16)})
Checking dtype by field:
In [89]: X['a'].dtype
Out[89]: dtype('int32')
In [90]: X['c'].dtype
Out[90]: dtype('O')
In [91]: X['c'].dtype == object
Out[91]: True
So a list comprehension works:
In [93]: [name for name in X.dtype.names if X[name].dtype==object]
Out[93]: ['c', 'd']
df.select_dtypes is written in Python, but it is fairly complex because it handles the include and exclude lists, so it is slower than the list comprehension:
In [95]: timeit [name for name in X.dtype.names if X[name].dtype==object]
16.5 µs ± 269 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [96]: timeit df.select_dtypes(include=object)
110 µs ± 2.24 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
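The list comprehension above can be wrapped in a small helper that also returns the matching columns. This is a minimal sketch; the function name fields_of_dtype is hypothetical and not part of numpy.

```python
import numpy as np

def fields_of_dtype(rec, dtype):
    """Return the names of the fields in `rec` whose dtype equals `dtype`."""
    wanted = np.dtype(dtype)
    # rec.dtype.fields maps each name to a (dtype, offset) tuple.
    return [name for name in rec.dtype.names
            if rec.dtype.fields[name][0] == wanted]

a = np.array([1, 2, 3, 4], dtype=int)
b = np.array([6, 6, 6, 6], dtype=int)
c = np.array(['p', 'q', 'r', 's'], dtype=object)
d = np.array(['a', 'b', 'c', 'd'], dtype=object)
X = np.rec.fromarrays([a, b, c, d], names=['a', 'b', 'c', 'd'])

object_names = fields_of_dtype(X, object)
# Indexing with a list of field names keeps only those columns.
object_part = X[object_names]
```

Checking the dtype through rec.dtype.fields avoids extracting each column just to read its dtype.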

Python itertools.combinations: how to obtain the indices of the combined numbers within the combinations at the same time

According to the question presented here: Python itertools.combinations: how to obtain the indices of the combined numbers, given the following code:
import itertools
my_list = [7, 5, 5, 4]
pairs = list(itertools.combinations(my_list, 2))
# pairs = [(7, 5), (7, 5), (7, 4), (5, 5), (5, 4), (5, 4)]
indexes = list(itertools.combinations(range(len(my_list)), 2))
# indexes = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
Is there any way to obtain pairs and indexes in a single line so I can have a lower complexity in my code (e.g. using enumerate or something likewise)?
@Maf, try this. As @jonsharpe suggested earlier, use zip:
from pprint import pprint
from itertools import combinations
my_list = [7, 5, 5, 4]
>>> pprint(list(zip(combinations(enumerate(my_list),2), combinations(my_list,2))))
[(((0, 7), (1, 5)), (7, 5)),
(((0, 7), (2, 5)), (7, 5)),
(((0, 7), (3, 4)), (7, 4)),
(((1, 5), (2, 5)), (5, 5)),
(((1, 5), (3, 4)), (5, 4)),
(((2, 5), (3, 4)), (5, 4))]
(Explicit is better than implicit. Simple is better than complex.)
I would use a list comprehension for its flexibility:
list((x, (x[0][1], x[1][1])) for x in list(combinations(enumerate(my_list), 2)))
This can be further extended using the likes of operator.itemgetter.
Also, the idea is to consume the iterator only once, so that the method can potentially be applied to non-deterministic iterators as well, say, one that yields from random.choices.
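A minimal sketch of the single-pass idea: iterate over combinations(enumerate(my_list), 2) once and unpack each element into its index pair and its value pair. The variable names are illustrative only.

```python
from itertools import combinations

my_list = [7, 5, 5, 4]

indexes, pairs = [], []
# Each combination is ((i, x), (j, y)); split it into indices and values.
for (i, x), (j, y) in combinations(enumerate(my_list), 2):
    indexes.append((i, j))
    pairs.append((x, y))
```

Since the source iterator is consumed only once, indexes and pairs are guaranteed to stay aligned with each other.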

create an array based on grouping (and conditions) from an array

So, I have the following array (structured as Array{Tuple{Int,Float64,Int,Int},1}, but it can also be an Array of Arrays), where the first element of each tuple is an ID and the second is a number indicating a cost. What I want to do is group by ID and then take the difference between the cheapest and the second-cheapest cost for each ID; if there is no second cost, the difference should be typemax(Float64) - firstcost. For the third and fourth elements of the tuple, I want to keep those of the firstcost row (that is, the row with the minimum cost).
Example of what I have
(1, 223.2, 2, 2)
(1, 253.2, 3, 2)
(2, 220.0, 4, 6)
(3, 110.0, 1, 4)
(3, 100.0, 3, 8)
Example of what I want:
(1, 30.0, 2, 2)
(2, typemax(Float64)-220.0, 4, 6)
(3, 10.0, 3, 8)
This is one way of doing it:
A = [(1, 223.2, 2, 2), (1, 253.2, 3, 2), (2, 220.0, 4, 6), (3, 110.0, 1, 4), (3, 100.0, 3, 8)]
function f(a)
aux(b::Vector) = (b[1][1], (length(b) == 1 ? typemax(Float64) : b[2][2]) - b[1][2], b[1][3:4]...)
sort([aux(sort(filter(x -> x[1] == i, a))) for i in Set(map(first, a))])
end
@show f(A)
There's SplitApplyCombine.jl, which implements (unsurprisingly) a split-apply-combine logic like that found in DataFrames. This is an example where I would stay away from one-liners and short solutions and write things out more explicitly, so that the code stays readable and understandable if someone else (or you yourself in a few months' time!) reads it:
julia> tups = [(1, 223.2, 2, 2)
(1, 253.2, 3, 2)
(2, 220.0, 4, 6)
(3, 110.0, 1, 4)
(3, 100.0, 3, 8)]
5-element Array{Tuple{Int64,Float64,Int64,Int64},1}:
(1, 223.2, 2, 2)
(1, 253.2, 3, 2)
(2, 220.0, 4, 6)
(3, 110.0, 1, 4)
(3, 100.0, 3, 8)
julia> using SplitApplyCombine
julia> function my_fun(x) # function to apply
    costs = getindex.(x, 2)
    best = x[argmin(costs)] # row with the minimum cost
    if length(x) == 1
        return (best[1], typemax(Float64) - best[2], best[3], best[4])
    else
        return (best[1], diff(sort(costs)[1:2])[1], best[3], best[4])
    end
end
my_fun (generic function with 1 method)
julia> [my_fun(x) for x in group(first, tups)] # apply function group wise
3-element Array{Tuple{Int64,Float64,Int64,Int64},1}:
(2, Inf, 4, 6)
(3, 10.0, 3, 8)
(1, 30.0, 2, 2)
If performance is a concern you might want to think about my_fun and do some profiling to see if and how you can improve it - the only thing I've done here is to use diff to subtract the first from the second element of the sorted array to avoid sorting twice.
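For comparison, the same split-apply-combine logic can be sketched in plain Python. This is an illustration under stated assumptions, not a translation of either Julia answer: math.inf stands in for typemax(Float64), and the function name cost_gaps is made up.

```python
import math
from collections import defaultdict

rows = [(1, 223.2, 2, 2), (1, 253.2, 3, 2), (2, 220.0, 4, 6),
        (3, 110.0, 1, 4), (3, 100.0, 3, 8)]

def cost_gaps(rows):
    # Split: collect the rows of each ID.
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    result = []
    for key in sorted(groups):
        # Apply: sort once by cost, cheapest row first.
        ordered = sorted(groups[key], key=lambda r: r[1])
        best = ordered[0]
        # Assumption: math.inf plays the role of typemax(Float64).
        second_cost = ordered[1][1] if len(ordered) > 1 else math.inf
        # Combine: keep the third and fourth fields of the cheapest row.
        result.append((key, second_cost - best[1], best[2], best[3]))
    return result

gaps = cost_gaps(rows)
```

Sorting each group once gives both the cheapest row and the second-cheapest cost, which is the same reason the Julia version sorts only once.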

Sort array of objects in numpy?

How can I efficiently sort an array of objects on two or more attributes in Numpy?
class Obj():
    def __init__(self,a,b):
        self.a = a
        self.b = b
arr = np.array([],dtype=Obj)
for i in range(10):
    arr = np.append(arr,Obj(i, 10-i))
arr_sort = np.sort(arr, order=a,b) ???
Thx, Willem-Jan
The order parameter only applies to structured arrays:
In [383]: arr=np.zeros((10,),dtype='i,i')
In [385]: for i in range(10):
     ...:     arr[i] = (i,10-i)
In [386]: arr
Out[386]:
array([(0, 10), (1, 9), (2, 8), (3, 7), (4, 6), (5, 5), (6, 4), (7, 3), (8, 2), (9, 1)],
dtype=[('f0', '<i4'), ('f1', '<i4')])
In [387]: np.sort(arr, order=['f0','f1'])
Out[387]:
array([(0, 10), (1, 9), (2, 8), (3, 7), (4, 6), (5, 5), (6, 4), (7, 3), (8, 2), (9, 1)],
dtype=[('f0', '<i4'), ('f1', '<i4')])
In [388]: np.sort(arr, order=['f1','f0'])
Out[388]:
array([(9, 1), (8, 2), (7, 3), (6, 4), (5, 5), (4, 6), (3, 7), (2, 8),
(1, 9), (0, 10)],
dtype=[('f0', '<i4'), ('f1', '<i4')])
With a 2d array, lexsort provides a similar 'ordered' sort:
In [402]: arr=np.column_stack((np.arange(10),10-np.arange(10)))
In [403]: np.lexsort((arr[:,1],arr[:,0]))
Out[403]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
In [404]: np.lexsort((arr[:,0],arr[:,1]))
Out[404]: array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0], dtype=int32)
With your object array, I could extract the attributes into either of these structures:
In [407]: np.array([(a.a, a.b) for a in arr])
Out[407]:
array([[ 0, 10],
[ 1, 9],
[ 2, 8],
....
[ 7, 3],
[ 8, 2],
[ 9, 1]])
In [408]: np.array([(a.a, a.b) for a in arr],dtype='i,i')
Out[408]:
array([(0, 10), (1, 9), (2, 8), (3, 7), (4, 6), (5, 5), (6, 4), (7, 3),
(8, 2), (9, 1)],
dtype=[('f0', '<i4'), ('f1', '<i4')])
The Python sorted function will work on arr (or its list equivalent):
In [421]: arr
Out[421]:
array([<__main__.Obj object at 0xb0f2d24c>,
<__main__.Obj object at 0xb0f2dc0c>,
....
<__main__.Obj object at 0xb0f35ecc>], dtype=object)
In [422]: sorted(arr, key=lambda a: (a.b,a.a))
Out[422]:
[<__main__.Obj at 0xb0f35ecc>,
<__main__.Obj at 0xb0f3570c>,
...
<__main__.Obj at 0xb0f2dc0c>,
<__main__.Obj at 0xb0f2d24c>]
Your Obj class is missing a nice __str__ method. I have to use something like [(i.a, i.b) for i in arr] to see the values of the arr elements.
As I stated in the comment, for this example a list is much nicer than an object array.
In [423]: alist=[]
In [424]: for i in range(10):
     ...:     alist.append(Obj(i,10-i))
List append is faster than repeated array append. Object arrays don't add much functionality compared to a list, especially when they are 1d and the objects are custom classes like this. You can't do any math on arr and, as you can see, sorting isn't any easier.
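Putting the pieces together, here is a minimal sketch of two ways to sort the objects on b and then a: sorted with operator.attrgetter on a plain list, and np.lexsort on the extracted attributes to reorder an object array. The __repr__ method is added only to make the objects readable; it is not in the original class.

```python
import numpy as np
from operator import attrgetter

class Obj:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __repr__(self):
        # Added for readable output; not part of the original class.
        return f"Obj(a={self.a}, b={self.b})"

objs = [Obj(i, 10 - i) for i in range(10)]

# Plain Python: sort on b first, then a.
by_b_then_a = sorted(objs, key=attrgetter("b", "a"))

# NumPy: extract the attributes, lexsort them, and reorder the object array.
arr = np.empty(len(objs), dtype=object)
arr[:] = objs
a_vals = np.array([o.a for o in arr])
b_vals = np.array([o.b for o in arr])
order = np.lexsort((a_vals, b_vals))  # the last key (b) is the primary key
arr_sorted = arr[order]
```

Note that np.lexsort treats the last key in the tuple as the primary sort key, which is why b_vals comes last.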

Numpy rows substitution

I am new to Numpy and I am not an expert programmer at all...
This is my issue:
I have array a and array b (b has fewer rows than a).
I want to substitute some rows of a with all the rows of b (in order).
The rows in a to be substituted have one value in common with b rows.
for example:
a is:
1 a 2
3 b 4
0 z 0
5 c 6
0 y 0
b is
1 z 1
1 y 1
In this case I will want to substitute rows 3 and 5 in a with 1 and 2 in b. The arrays are very big and, as in the example, there are some character types (so I set the array to dtype=object).
First, I would recommend changing your dtype from object to [('a',int),('letter','S1'),('b',int)]. What this does is allow you to have columns of different dtypes (note the length of that dtype list is the number of columns). Then you have:
In [63]: a = np.array(
....: [(1,'a',2),(3,'b',4),(0,'z',0),(5,'c',6),(0,'y',0)],
....: dtype=[('a',int),('letter','S1'),('b',int)])
In [64]: a
Out[64]:
array([(1, 'a', 2), (3, 'b', 4), (0, 'z', 0), (5, 'c', 6), (0, 'y', 0)],
dtype=[('a', '<i8'), ('letter', 'S1'), ('b', '<i8')])
In [65]: a['a']
Out[65]: array([1, 3, 0, 5, 0])
In [66]: a['b']
Out[66]: array([2, 4, 0, 6, 0])
In [67]: a['letter']
Out[67]:
array(['a', 'b', 'z', 'c', 'y'],
dtype='|S1')
Then you can use the same dtype for b:
In [69]: b = np.array([(1,'z',1),(1,'y',1)], dtype=a.dtype)
In [70]: b
Out[70]:
array([(1, 'z', 1), (1, 'y', 1)],
dtype=[('a', '<i8'), ('letter', 'S1'), ('b', '<i8')])
In [71]: b['letter']
Out[71]:
array(['z', 'y'],
dtype='|S1')
Now, you can simply replace the parts of a with the parts of b where the letters match:
In [73]: a
Out[73]:
array([(1, 'a', 2), (3, 'b', 4), (0, 'z', 0), (5, 'c', 6), (0, 'y', 0)],
dtype=[('a', '<i8'), ('letter', 'S1'), ('b', '<i8')])
In [74]: b
Out[74]:
array([(1, 'z', 1), (1, 'y', 1)],
dtype=[('a', '<i8'), ('letter', 'S1'), ('b', '<i8')])
In [75]: for row in b:
   ....:     a[a['letter']==row['letter']] = row
   ....:
In [76]: a
Out[76]:
array([(1, 'a', 2), (3, 'b', 4), (1, 'z', 1), (5, 'c', 6), (1, 'y', 1)],
dtype=[('a', '<i8'), ('letter', 'S1'), ('b', '<i8')])
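For very large arrays, the Python loop over b can be replaced by a lookup with np.searchsorted. The following is a sketch under two assumptions that are not in the original answer: the letters in b are unique, and the letter field uses 'U1' (unicode) rather than 'S1' so that the comparison behaves the same way on Python 3.

```python
import numpy as np

# Assumption: 'U1' instead of 'S1' so plain str letters compare directly.
dt = [("a", int), ("letter", "U1"), ("b", int)]
a = np.array([(1, "a", 2), (3, "b", 4), (0, "z", 0), (5, "c", 6), (0, "y", 0)],
             dtype=dt)
b = np.array([(1, "z", 1), (1, "y", 1)], dtype=dt)

# Sort b by letter so that searchsorted can look up every row of a at once.
b_sorted = b[np.argsort(b["letter"])]
pos = np.searchsorted(b_sorted["letter"], a["letter"])
pos = np.clip(pos, 0, len(b_sorted) - 1)  # keep indices in range

# A row of a is replaced only if the letter found at pos really matches.
match = b_sorted["letter"][pos] == a["letter"]
a[match] = b_sorted[pos[match]]
```

This does one sort of b and one binary search per row of a, instead of one full scan of a per row of b.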
