I have two lists of arrays of data (cost and account) identified by the list number. The arrays in the lists have different lengths, but each cost array has a corresponding account array of the same length.
I would like to remove the duplicates in number and, for each duplicate, concatenate the corresponding data in cost and account. The ordering is important. Here's an example of the lists I have:
number = [4, 6, 8, 4, 8]
cost = [array([1,2,3], dtype=uint64), array([5,6,7,8], dtype=uint64), array([9,10,11], dtype=uint64), array([13,14,15], dtype=uint64), array([17,18], dtype=uint64)]
account = [array([.1,.2,.3], dtype=float32), array([.5,.6,.7,.8], dtype=float32), array([.5,.10,.11], dtype=float32), array([.13,.14,.15], dtype=float32), array([32,.18], dtype=float32)]
The desired result is to have:
number = [4,6,8]
cost = [[1,2,3,13,14,15],[5,6,7,8],[9,10,11,17,18]]
account = [[.1,.2,.3,.13,.14,.15],[.5,.6,.7,.8],[.5,.10,.11,32,.18]]
Is there an easy way to do this with indexing or dictionaries?
If the order of number is not important (e.g. [8, 4, 6]), you can do as follows:
number = [4, 6, 8, 4, 8]
cost = [[1,2,3],[5,6,7],[9,10,11],[13,14,15],[17,18,19]]
account = [[.1,.2,.3],[.5,.6,.7],[.9,.0,.1],[.3,.4,.5],[.7,.8,.9]]
duplicates = lambda lst, item: [i for i, x in enumerate(lst) if x == item]
indexes = dict((n, duplicates(number, n)) for n in set(number))
number = list(set(number))
cost = [sum([cost[num] for num in val], []) for val in indexes.values()]
account = [sum([account[num] for num in val], []) for val in indexes.values()]
Here indexes is a dictionary mapping each number to the list of its positions, found with the duplicates lambda.
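The snippet above drops the original ordering (set is unordered). Since the question says ordering is important, here is a minimal order-preserving sketch; it assumes plain Python lists (for NumPy arrays, np.concatenate would replace extend) and relies on dicts keeping insertion order (Python 3.7+):

```python
# Merge duplicate keys while preserving first-appearance order.
number = [4, 6, 8, 4, 8]
cost = [[1, 2, 3], [5, 6, 7, 8], [9, 10, 11], [13, 14, 15], [17, 18]]
account = [[.1, .2, .3], [.5, .6, .7, .8], [.5, .10, .11], [.13, .14, .15], [32, .18]]

merged = {}  # dicts preserve insertion order in Python 3.7+
for n, c, a in zip(number, cost, account):
    entry = merged.setdefault(n, ([], []))  # one (cost, account) pair per key
    entry[0].extend(c)
    entry[1].extend(a)

number = list(merged)                   # [4, 6, 8]
cost = [c for c, _ in merged.values()]  # [[1, 2, 3, 13, 14, 15], ...]
account = [a for _, a in merged.values()]
```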
You can do:
import numpy as np

# Find unique values in "number"
number, inv = np.unique(number, return_inverse=True)
# concatenate "cost" based on unique values
cost = [np.asarray(cost)[np.where(inv == i)[0]].flatten().tolist()
        for i in range(len(number))]
# concatenate "account" based on unique values
account = [np.asarray(account)[np.where(inv == i)[0]].flatten().tolist()
           for i in range(len(number))]
# Check
In [248]: number
Out[248]: array([4, 6, 8])

In [249]: cost
Out[249]: [[1, 2, 3, 13, 14, 15], [5, 6, 7], [9, 10, 11, 17, 18, 19]]

In [250]: account
Out[250]: [[0.1, 0.2, 0.3, 0.3, 0.4, 0.5], [0.5, 0.6, 0.7], [0.9, 0.0, 0.1, 0.7, 0.8, 0.9]]
np.asarray() and tolist() are unnecessary if your inputs are numpy arrays, so you might want to get rid of them. I just added them so that they work for Python lists too.
Suppose I have an array a and a boolean array b. I want to extract a fixed number of elements from the valid elements of each row of a, where the valid elements are those indicated by b.
Here is an example:
import numpy as np

a = np.arange(24).reshape(4, 6)
b = np.array([[0,0,1,1,0,0],[0,1,0,1,0,1],[0,1,1,1,1,0],[0,0,0,0,1,1]]).astype(bool)
x = []
for i in range(a.shape[0]):
    c = a[i, b[i]]              # valid elements of row i
    d = np.random.choice(c, 2)  # sample 2 of them (with replacement by default)
    x.append(d)
Here I used a for loop, which will be slow if these arrays are big and high-dimensional. Is there a more efficient way to do this? Thanks.
1. Generate a random uniform [0, 1] matrix with the same shape as a.
2. Multiply this matrix by the mask b to set invalid elements to zero.
3. Select the k maximum indices from each row (simulating an unbiased random k-sample from only the valid elements in that row).
4. (Optional) Use these indices to get the elements.
import numpy as np

a = np.arange(24).reshape(4, 6)
b = np.array([[0,0,1,1,0,0],[0,1,0,1,0,1],[0,1,1,1,1,0],[0,0,0,0,1,1]])
k = 2

r = np.random.uniform(size=a.shape)
indices = np.argpartition(-r * b, k)[:, :k]
To get the elements from the indices:
>>> indices
array([[3, 2],
[5, 1],
[3, 2],
[4, 5]])
>>> a[np.arange(a.shape[0])[:,None], indices]
array([[ 3, 2],
[11, 7],
[15, 14],
[22, 23]])
I took inspiration from a diamond-pattern loop. Here the diamond pattern is replaced by a list of lists, as in the code below. When I print it, the actual result does not line up as rows and columns, while the expected result should. The dummy coordinates (dum_coords) represent data to be placed into a table (row, column).
What is the rule for changing a value that is None, or a value inside a list, when a row has fewer entries than required? I can replace a value with a new list. Is it better to work with the value itself or with its index in the list?
I would appreciate advice and suggestions.
l = 4
dum_coords = [None, [[1,2,3],6,7,8], [9,[1,2,3],11,12], [[1,2,3],14,15,16]]
for i in range(l):
    print("row:{}, {}".format(i, dum_coords[i]))
actual result:
row:0>> [None]
row:1>> [[1,2,3],6,7,8]
row:2>> [9,[1,2,3],11,12]
row:3>> [[1,2,3],14,15,16]
expected result:
row:0>> [[None],[None],[None],[None]]
row:1>> [[None],[1,2,3],6,7]
row:2>> [9,[1,2,3],11,[1,2,3]]
row:3>> [[1,2,3],14,15,0]
Updated question:
@Shishir Naresha suggested the code below, and its result matches the expected result. But I need more than that: I also need to be able to replace values, whether they are None or not.
Here is my code combined with @Shishir Naresha's code:
import random

dum_fish_space = [None] * 5
dum_fish_pop = [None] * 3
dum_fish_preying = [{"current": random.uniform(0, 1) * 2}]
dum_fish_following = [{"current": random.uniform(0, 1) * 1, "target": random.uniform(0, 1)}]
dum_fish_swarming = [{"current": random.uniform(0, 1)}]
dum_fish_randoming = [{"current": random.uniform(0, 1) * 3}]
dum_fish_battle = []
idxrange = []
new_idxrange = []
for i in range(len(dum_fish_pop)):
    temp = random.randrange(len(dum_fish_space))
    idxrange.append(random.randrange(len(dum_fish_space)))
    if temp not in idxrange:  # "is not" compared an int with a list and was always True
        new_idxrange.append(temp)
l = 4
dumy_coords = [None, [[1, 2, 3], 6, 7], [9, [1, 2, 3], 11, 12], [[1, 2, 3], 14, 15, 16]]
for i in range(l):
    if dumy_coords[i] is None:
        print("row:{}, {}".format(i, [None] * l))
    else:
        if len(dumy_coords[i]) < l:
            dif = l - len(dumy_coords[i])
            print("row:{}, {}".format(i, [None] * dif + dumy_coords[i]))
        else:
            print("row:{}, {}".format(i, dumy_coords[i]))
I thought the values inside the table would change based on the index in the list of lists; I expected the result to be like this:
expected result before changing the value:
row:0>> [[None],[None],[None],[None]]
row:1>> [[None],[1,2,3],6,7]
row:2>> [9,[1,2,3],11,[1,2,3]]
row:3>> [[1,2,3],14,15,0]
expected result after change the value:
row:0>> [[1,2,3],[1,2,3],[None],[None]]
row:1>> [[None],[None],6,7]
row:2>> [9,[None],11,[1,2,3]]
row:3>> [[1,2,3],14,15,0]
Try the below code. I hope this will help.
l = 4
dumy_coords = [None, [[1,2,3],6,7], [9,[1,2,3],11,12], [[1,2,3],14,15,16]]
for i in range(l):
    if dumy_coords[i] is None:
        print("row:{}, {}".format(i, [None]*l))
    else:
        if len(dumy_coords[i]) < l:
            dif = l - len(dumy_coords[i])
            print("row:{}, {}".format(i, [None]*dif + dumy_coords[i]))
        else:
            print("row:{}, {}".format(i, dumy_coords[i]))
Output will be as shown below:
row:0, [None, None, None, None]
row:1, [None, [1, 2, 3], 6, 7]
row:2, [9, [1, 2, 3], 11, 12]
row:3, [[1, 2, 3], 14, 15, 16]
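For the updated question (replacing individual cells afterwards), one possible extension of this idea is to build the padded rows as real lists first and then assign by (row, column) index; the replacement coordinates below are invented for illustration, not the exact cells from the expected output:

```python
l = 4
dumy_coords = [None, [[1, 2, 3], 6, 7], [9, [1, 2, 3], 11, 12], [[1, 2, 3], 14, 15, 16]]

# Build the padded rows as data instead of printing them directly.
table = []
for row in dumy_coords:
    if row is None:
        table.append([None] * l)
    else:
        table.append([None] * (l - len(row)) + list(row))

# Replace cells by (row, column) index; these coordinates are just examples.
table[0][0] = [1, 2, 3]
table[2][1] = None

for i, row in enumerate(table):
    print("row:{}, {}".format(i, row))
```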
In my program I have an array with several million entries, like this:
arr = [(1, 0.5), (4, 0.2), (321, 0.01), (2, 0.042), (1, 0.01), ...]
I could instead make two arrays in the same order (rather than one array of tuples) if that helps.
For sorting this array I know I can use radix sort, so that it has this structure:
arr_sorted = [(1, 0.5), (1, 0.01), (2, 0.042), ...]
Now I want to sum all the values from the array that have the key 1, then all that have the key 2, and so on. The result should be written into a new array like this:
arr_summed = [(1, 0.51), (2, 0.042), ...]
Obviously this array would be much smaller, although still on the order of 100000 entries. Now my question is: what's the best parallel approach to my problem in CUDA? I am using NumbaPro.
Edit for clarity
I would have two arrays instead of a list of tuples like this:
keys = [1, 2, 5, 2, 6, 4, 4, 65, 3215, 1, .....]
values = [0.1, 0.4, 0.123, 0.01, 0.23, 0.1, 0.1, 0.4 ...]
They are initially numpy arrays that get copied to the device.
What I want is to reduce them by key and if possible set missing key values (for example if three doesn't appear in the array) to zero.
So I would want it to become:
keys = [1, 2, 3, 4, 5, 6, 7, 8, ...]
values = [0.11, 0.41, 0, 0.2, ...] # <- Summed by key
I know how big the final array will be beforehand.
I don't know Numba, but in simple Python:
arr = [(1, 0.5), (4, 0.2), (321, 0.01), (2, 0.042), (1, 0.01)]  # ...

indexmax = max(k for k, v in arr)  # largest key
res = [0.0] * (indexmax + 1)
for k, v in arr:
    res[k] += v
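If the keys and values are already NumPy arrays on the host (as in the edit), the same reduction can be sketched with np.bincount, which sums weights per integer key and pads missing keys with zero. This is only a vectorized CPU baseline, not a CUDA kernel, and the small key/value arrays below are made up for illustration:

```python
import numpy as np

# Made-up sample data; the real arrays would have millions of entries.
keys = np.array([1, 2, 5, 2, 6, 4, 4, 1])
values = np.array([0.1, 0.4, 0.123, 0.01, 0.23, 0.1, 0.1, 0.5])

# bincount sums the weights grouped by key; minlength pads absent keys with 0.
summed = np.bincount(keys, weights=values, minlength=keys.max() + 1)
# summed[k] is the total for key k; summed[3] is 0.0 because key 3 never occurs.
```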
I am trying to do the equivalent of this loop in Haskell, basically:
for (int i = 0; i < city_Permutation_Route.length - 1; i++) {
    route_Distance = route_Distance + city_Distance_Matrix[city_Permutation_Route[i]][city_Permutation_Route[i + 1]];
}
There I get the weight of each route and compare it against the others, so that I can print out the route with the lowest weight, as follows:
Route Weight = 453.4
Route = 0,1,2,3,4,5,6,7,8
I have functions to get the total route and all the other data, but I do not understand how to get values from the matrix.
Question: how do I do this in Haskell?
I want to get the values from my distance matrix using the permutation values as indices into it.
Given a permutation, for example [3, 2, 7, 5, 4, 6, 0, 1] you can compute all of the legs by zipping it with its own tail.
zip [3, 2, 7, 5, 4, 6, 0, 1]
    (tail [3, 2, 7, 5, 4, 6, 0, 1])
  =
zip [3, 2, 7, 5, 4, 6, 0, 1]
    [2, 7, 5, 4, 6, 0, 1]
  =
[(3,2),(2,7),(7,5),(5,4),(4,6),(6,0),(0,1)]
These are the indices into the distance matrix for the cost of traveling between two points. If we look them up in city_Distance_Matrix using the list indexing function !!, we get the cost of each leg:
map (\(c0, c1) -> city_Distance_Matrix !! c0 !! c1)
    [(3,2),(2,7),(7,5),(5,4),(4,6),(6,0),(0,1)]
  =
[97.4, 71.6, 111.0, 138.0, 85.2, 86.3, 129.0]
If we total these, we get the total cost for traveling the legs of this permutation.
sum [97.4, 71.6, 111.0,138.0,85.2 ,86.3 ,129.0] = 718.5
Putting this all together, we can define a function that computes the total length of all the legs of a permutation of the cities. We can simplify the function by using zipWith which is a combination of zip and map.
totalLength :: [Int] -> Double
totalLength cities = sum $ zipWith (\c0 c1 -> city_Distance_Matrix !! c0 !! c1) cities (tail cities)
You should be able to use this, together with permutations from Data.List and minimumBy (comparing totalLength) from Data.List and Data.Ord, to find the permutation whose totalLength is minimal.
If I have a sorted array of numerical values such as Double, Integer, or Time, what is the general logic for finding a complement?
Over my CS career in college, I've gotten better at understanding complements and edge cases for ranges. As I now help students whose skill level matches mine when I wrote this, I need a generalized way to convey this concept to them for single elements and for ranges.
Try something like this:
def complement(l, universe=None):
    """
    Return the complement of a list of integers, as compared to
    a given "universe" set. If no universe is specified,
    consider the universe to be all integers between
    the minimum and maximum values of the given list.
    """
    if universe is not None:
        universe = set(universe)
    else:
        universe = set(range(min(l), max(l)+1))
    return sorted(universe - set(l))
then
l = [1,3,5,7,10]
complement(l)
yields:
[2, 4, 6, 8, 9]
Or you can specify your own universe:
complement(l, range(12))
yields:
[0, 2, 4, 6, 8, 9, 11]
To add another option, using a data type that is always useful to learn about for these kinds of operations:
a = set([1, 3, 5, 7, 10])
b = set(range(1, 11))
c = sorted(b.symmetric_difference(a))
print(c)
[2, 4, 6, 8, 9]
Note that symmetric_difference coincides with the plain difference b - a here only because a is a subset of b.
This one maps each odd n to n + 1 and each even n to n - 1, which happens to produce the complement for this particular list:
>>> nums = [1, 3, 5, 7, 10]
>>> [n + ((n&1)*2-1) for n in nums]
[2, 4, 6, 8, 9]
The easiest way is to iterate from the beginning of your list to the second-to-last element. Set j equal to the current value + 1; while j is less than the next number in your list, append it to your list of complements and increment it.
# find the skipped numbers in a list sorted in ascending order
def getSkippedNumbers(arr):
    complement = []
    for i in range(0, len(arr) - 1):  # xrange in Python 2
        j = arr[i] + 1
        while j < arr[i + 1]:
            complement.append(j)
            j += 1
    return complement

test = [1, 3, 5, 7, 10]
print(getSkippedNumbers(test))  # prints [2, 4, 6, 8, 9]
You can find the complement using a list comprehension. Here we take the relative complement of a set y in a set x, i.e. the elements of x that are not in y:
>>> x = [1, 3, 5, 7, 10]
>>> y = [1, 2, 3, 4, 8, 9, 20]
>>> z = [n for n in x if not n in y]
>>> z
[5, 7, 10]