My list looks like this:
['"date","supermarket","categoryA",10',
'"date","candy store","categoryB",5',
'"date","drugstore","categoryC",6',
'"date","supermarket","categoryA",20',
'"date","candy store","categoryB",2',
'"date","drugstore","categoryC",90']
etc
I'm trying to aggregate the numbers per category -- categoryA B C etc
So far, it's been three days of mostly sideways action. I really should get a book on Python, as I've just jumped in, and now here I am asking you guys.
I know how to do this in mysql but that logic is not helping me here.
My code:
for x in range(0 , len(list)):
    for y in list[x][2]:
        value += list[x][3]
Tearing my hair out, and I don't have much of it left...
Use a dictionary to hold the aggregation, and iterate over the list directly with for ... in:
aggregate = {}
for x in list:
    if x[2] not in aggregate:
        aggregate[x[2]] = 0
    aggregate[x[2]] += x[3]
The above assumes your list of lists looks like this:
[
["date","supermarket","categoryA",10],
["date","candy store","categoryB",5]
]
Using Python dictionaries simplifies a lot of things. This would work:
category_aggregate_dictionary = {}
for x in range(len(list)):
    # list[x][2] is the category name; use it directly as the key
    y = list[x][2]
    value = list[x][3]
    if category_aggregate_dictionary.get(y) is None:
        category_aggregate_dictionary[y] = 0
    category_aggregate_dictionary[y] += float(value)
Finally, category_aggregate_dictionary["categoryA"] should give you the aggregate number for categoryA.
Hope it Helps : )
Here I've assumed you actually have a list of lists. (See my value for "entries" below.)
from collections import Counter
entries = [
["date", "supermarket", "categoryA", 10],
["date", "candy store", "categoryB", 5],
["date", "drugstore", "categoryC", 6],
["date", "supermarket", "categoryA", 20],
["date", "candy store", "categoryB", 2],
["date", "drugstore", "categoryC", 90]
]
# A Counter is much like a dictionary with a default value of 0
category_counts = Counter()
for entry in entries:
    category = entry[2]
    count = entry[3]
    category_counts[category] += count
# You have the counts already at this point. This loop will
# just print them out in sorted order (by category name).
for category in sorted(category_counts.keys()):
    print('{}: {}'.format(category, category_counts[category]))
# Output:
# categoryA: 30
# categoryB: 7
# categoryC: 96
If you are dealing with a list of strings like that, you can use ast.literal_eval() to evaluate each string as a tuple, then use a defaultdict to aggregate the numbers:
>>> from collections import defaultdict
>>> from ast import literal_eval
>>> d = defaultdict(int)
>>> for item in lst:
...     *_, cat, num = literal_eval(item)
...     d[cat]+=num
...
>>> d
defaultdict(<class 'int'>, {'categoryA': 30, 'categoryB': 7, 'categoryC': 96})
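Since each string in the question's list is a CSV record, another option is to let the standard csv module do the parsing. The following is only a minimal sketch: it assumes every record ends with a category followed by an integer amount, and aggregate_by_category is a name made up for illustration.

```python
import csv
from collections import defaultdict

rows = [
    '"date","supermarket","categoryA",10',
    '"date","candy store","categoryB",5',
    '"date","drugstore","categoryC",6',
    '"date","supermarket","categoryA",20',
    '"date","candy store","categoryB",2',
    '"date","drugstore","categoryC",90',
]


def aggregate_by_category(lines):
    """Sum the last field of each CSV record, grouped by the second-to-last field."""
    totals = defaultdict(int)
    # csv.reader accepts any iterable of strings and strips the quotes for us
    for *_, category, amount in csv.reader(lines):
        totals[category] += int(amount)  # unquoted numbers still arrive as strings
    return dict(totals)


totals = aggregate_by_category(rows)
print(totals)
```

Unlike literal_eval, this never evaluates the input as Python expressions, which may matter if the strings come from an untrusted source.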
I have two numpy arrays of shape arr1=(~140000, 3) and arr2=(~450000, 10). The first 3 elements of each row, for both the arrays, are coordinates (z,y,x). I want to find the rows of arr2 that have the same coordinates of arr1 (which can be considered a subgroup of arr2).
for example:
arr1 = [[1,2,3],[1,2,5],[1,7,8],[5,6,7]]
arr2 = [[1,2,3,7,66,4,3,44,8,9],[1,3,9,6,7,8,3,4,5,2],[1,5,8,68,7,8,13,4,53,2],[5,6,7,6,67,8,63,4,5,20], ...]
I want to find common coordinates (same first 3 elements):
list_arr = [[1,2,3,7,66,4,3,44,8,9], [5,6,7,6,67,8,63,4,5,20], ...]
At the moment I'm doing this double loop, which is extremely slow:
list_arr=[]
for i in arr1:
    for j in arr2:
        if i[0]==j[0] and i[1]==j[1] and i[2]==j[2]:
            list_arr.append (j)
I also tried to create (after the 1st loop) a subarray of arr2, filtered on the value of i[0] (arr2_filt = [el for el in arr2 if el[0]==i[0]]). This speeds up the operation a bit, but it is still really slow.
Can you help me with this?
Approach #1
Here's a vectorized one with views -
import numpy as np

# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    # View each row as a single opaque (void) element so rows compare as scalars
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

a,b = view1D(arr1,arr2[:,:3])
out = arr2[np.in1d(b,a)]
Approach #2
Another with dimensionality-reduction for ints -
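# Assumes non-negative integer coordinates. The radix for each column must be
# max + 1; using the bare max lets distinct rows collide on the same key.
d = np.maximum(arr2[:,:3].max(0),arr1.max(0)) + 1
s = np.r_[1,d[:-1].cumprod()]
a,b = arr1.dot(s),arr2[:,:3].dot(s)
out = arr2[np.in1d(b,a)]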
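To make the dimensionality-reduction idea concrete, here is a small self-contained sketch on the question's example data. It assumes non-negative integer coordinates and uses max + 1 per column as the radix, so that distinct rows cannot map to the same key. encode_rows is a hypothetical helper name, and np.isin is used here as the newer spelling of np.in1d.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [1, 2, 5], [1, 7, 8], [5, 6, 7]])
arr2 = np.array([
    [1, 2, 3, 7, 66, 4, 3, 44, 8, 9],
    [1, 3, 9, 6, 7, 8, 3, 4, 5, 2],
    [1, 5, 8, 68, 7, 8, 13, 4, 53, 2],
    [5, 6, 7, 6, 67, 8, 63, 4, 5, 20],
])


def encode_rows(coords, radix):
    """Collapse each (z, y, x) row into one integer via mixed-radix encoding."""
    strides = np.r_[1, radix[:-1].cumprod()]  # e.g. radix (6, 8, 10) -> (1, 6, 48)
    return coords.dot(strides)


# One shared radix for both arrays, so equal rows get equal keys
radix = np.maximum(arr1.max(axis=0), arr2[:, :3].max(axis=0)) + 1
keys1 = encode_rows(arr1, radix)
keys2 = encode_rows(arr2[:, :3], radix)

out = arr2[np.isin(keys2, keys1)]
print(out)
```

For very large coordinate ranges the product of the radices could overflow a 64-bit integer, in which case the view-based approach is the safer choice.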
Improvement #1
We could use np.searchsorted to replace np.in1d for both of the approaches listed earlier -
unq_a = np.unique(a)
idx = np.searchsorted(unq_a,b)
# Clamp against len(unq_a), not len(a): unq_a is shorter whenever a has duplicates
idx[idx==len(unq_a)] = 0
out = arr2[unq_a[idx] == b]
Improvement #2
The np.searchsorted improvement above still calls np.unique; we could avoid that by using argsort instead -
sidx = a.argsort()
idx = np.searchsorted(a,b,sorter=sidx)
idx[idx==len(a)] = 0
out = arr2[a[sidx[idx]]==b]
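As an illustration of the searchsorted-based membership test, here is a minimal sketch on plain 1D integer keys. isin_via_searchsorted is a hypothetical name, and the key values are made-up sample data; the point is that out-of-range insertion positions are clamped to 0 before indexing, after which an equality check decides membership.

```python
import numpy as np


def isin_via_searchsorted(values, candidates):
    """Return a boolean mask: which entries of `values` occur in `candidates`."""
    sorted_unique = np.unique(candidates)  # sorted, duplicates removed
    idx = np.searchsorted(sorted_unique, values)
    # Values larger than every candidate get position len(sorted_unique),
    # which is out of bounds; point them at slot 0 and let the == test reject them.
    idx[idx == len(sorted_unique)] = 0
    return sorted_unique[idx] == values


a = np.array([157, 253, 427, 377, 157])  # keys from the small array (one duplicate)
b = np.array([157, 451, 415, 377, 999])  # keys from the large array

mask = isin_via_searchsorted(b, a)
print(mask)
```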
You can do it with the help of sets:
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2 = np.array([[7,8,9,11,14,34],[23,12,11,10,12,13],[1,2,3,4,5,6]])
# create array from arr2 with only first 3 columns
temp = [i[:3] for i in arr2]
aset = set([tuple(x) for x in arr])
bset = set([tuple(x) for x in temp])
np.array([x for x in aset & bset])
Output
array([[7, 8, 9],
[1, 2, 3]])
Edit
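Use a list comprehension to keep the full rows of arr2 (reusing aset from above):
# Note: `i[:3] in arr` on NumPy arrays is true if any single element matches,
# so compare whole rows as tuples instead.
l = [i.tolist() for i in arr2 if tuple(i[:3]) in aset]
print(l)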
Output:
[[7, 8, 9, 11, 14, 34], [1, 2, 3, 4, 5, 6]]
For integers Divakar already gave an excellent answer. If you want to compare floats you have to consider e.g. the following:
1.+1e-15==1.
False
1.+1e-16==1.
True
If this behaviour could lead to problems in your code, I would recommend performing a nearest-neighbour search and checking whether the distances are within a specified threshold.
import numpy as np
from scipy import spatial
def get_indices_of_nearest_neighbours(arr1,arr2):
    tree=spatial.cKDTree(arr2[:,0:3])
    #You can check here if the distance is small enough and otherwise raise an error
    dist,ind=tree.query(arr1, k=1)
    return ind
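To show what a threshold check can look like, here is a NumPy-only sketch that matches rows when every coordinate differs by at most a tolerance. It is a brute-force comparison whose memory use grows with len(arr1) * len(arr2), so it is only meant for small inputs; for arrays of the size in the question, a KD-tree as above is the more realistic route. rows_matching_within and the tol default are assumptions made for illustration.

```python
import numpy as np


def rows_matching_within(arr1, arr2, tol=1e-9):
    """Return the rows of arr2 whose first 3 columns are within tol of some row of arr1."""
    # Shape (len(arr2), len(arr1), 3): per-coordinate absolute differences
    diff = np.abs(arr2[:, None, :3] - arr1[None, :, :3])
    close = (diff <= tol).all(axis=2)  # all three coordinates close enough
    return arr2[close.any(axis=1)]     # keep rows with at least one partner


arr1 = np.array([[0.1 + 0.2, 1.0, 2.0]])  # 0.1 + 0.2 is not exactly 0.3
arr2 = np.array([[0.3, 1.0, 2.0, 42.0],
                 [5.0, 5.0, 5.0, 0.0]])

matched = rows_matching_within(arr1, arr2)
print(matched)
```

An exact comparison would miss the first row here, because 0.1 + 0.2 and 0.3 differ in the last bit.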
I'm trying to write a program in Ruby that allows one array to receive information from another array. Basically, I have a multidimensional array called "student_array" that contains information on a few students
student_array = [["Mike", 13, "American", "male"],
["Grace", 12, "Canadian", "female"],
["Joey", 13, "American", "male"],
["Lily", 13, "American", "female"]
]
I also initialized two other arrays that will count nationalities:
nationality_array = Array.new
nationality_count = Array.new
The purpose of this program is to loop through student array, count the different nationalities of the students, and create a CSV file that will contain the headers of the different nationalities, and a count for each one.
Expected output.csv
American, Canadian
3, 1
Here is the code I have so far
student_array.each do |student|
  #pushes the nationality string into the nationality array
  nationality_array.push(student[2])
end
so the nationality_array should currently look like this:
nationality_array = ["American", "Canadian", "American", "American"];
nationality_array.uniq = ["American", "Canadian"];
So I will have two headers - "American" and "Canadian"
Now I need a way to loop through the student_array, count up each instance of "American" and "Canadian", and somehow assign it back to the nationality array. I'm having a hard time visualizing how to go about this. This is what I have so far--
american_count = 0;
canadian_count = 0;
student_array.each do |student|
  # == compares; a single = would assign. Ruby has no ++ operator, so use += 1.
  if student[2] == "American"
    american_count += 1
  elsif student[2] == "Canadian"
    canadian_count += 1
  end
end
nationality_count.push(american_count);
nationality_count.push(canadian_count);
Okay, now I have those counts in the nationality_count array, but how can I pass it to a CSV, making sure that they are assigned to the right headers? I have a feeling that my code is very awkward and could be much more streamlined as well.
It would probably look something like this?
CSV.open("output/redemptions.csv", "wb") do |csv|
  csv << [nationality_array]
  csv << [nationality_count]
end
Can anyone provide any insight into a cleaner way to go about this?
You could use a Hash to group the counts by nationality instead of different arrays.
nationalities_count = student_array.each_with_object(Hash.new(0)) do |student, hash|
  nationality = student[2]
  hash[nationality] += 1
end
That will give you a Hash that would look like
{ "American" => 3, "Canadian" => 1 }
You could then use Hash#to_a and Array#transpose like so:
2.4.2 :001 > hsh = { "American" => 3, "Canadian" => 1 }
=> {"American"=>3, "Canadian"=>1}
2.4.2 :002 > hsh.to_a
=> [["American", 3], ["Canadian", 1]]
2.4.2 :003 > hsh.to_a.transpose
=> [["American", "Canadian"], [3, 1]]
Finally, to output the CSV file, all you need to do is write the two arrays into the file:
require 'csv'

nationalities_with_count = nationalities_count.to_a.transpose
CSV.open("output/redemptions.csv", "wb") do |csv|
  csv << nationalities_with_count[0]
  csv << nationalities_with_count[1]
end
Enumerable#group_by in Ruby core and Hash#transform_values (from ActiveSupport, and in Ruby core since 2.4) are two very versatile methods that can be used here:
require 'active_support/all'
require 'csv'
student_array = [
["Mike", 13, "American", "male"],
["Grace", 12, "Canadian", "female"],
["Joey", 13, "American", "male"],
["Lily", 13, "American", "female"]
]
counts = student_array.group_by { |attrs| attrs[2] }.transform_values(&:length)
# => {"American"=>3, "Canadian"=>1}
CSV.open("output/redemptions.csv", "wb") do |csv|
  csv << counts.keys
  csv << counts.values
end
puts File.read "output/redemptions.csv"
# => American,Canadian
# 3,1
.group_by { |attrs| attrs[2] } turns the array into a hash, where keys are the unique values for attrs[2], and values are a list of elements that have that attrs[2]. At this point you can use transform_values to turn those values into numbers representing their length (meaning, how many elements have that specific attrs[2]). The keys and values can then be extracted from the hash as separate arrays.
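For comparison, the same count-then-write-two-rows idea can be sketched in Python. This is not a translation of the Ruby code above, just an illustration under the assumption that the nationality sits at index 2 of each record; it writes to an in-memory buffer rather than output/redemptions.csv so it can run anywhere.

```python
import csv
import io
from collections import Counter

student_array = [
    ["Mike", 13, "American", "male"],
    ["Grace", 12, "Canadian", "female"],
    ["Joey", 13, "American", "male"],
    ["Lily", 13, "American", "female"],
]

# Counter keeps keys in first-seen order, so headers and counts stay aligned
counts = Counter(student[2] for student in student_array)

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(counts.keys())    # header row: one column per nationality
writer.writerow(counts.values())  # data row: the matching counts

print(buffer.getvalue())
```

Writing the keys and the values of the same mapping, in that order, is what guarantees each count ends up under the right header.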
You don't even need a CSV tool here:
result =
  student_array.
    map { |a| a[2] }.              # get nationalities
    group_by { |e| e }.            # hash
    map { |n, c| [n, c.count] }.   # map values to count
    transpose.                     # put data in rows
    map { |row| row.join ',' }.    # join values in a row
    join($/)                       # join rows
#⇒ American,Canadian
#  3,1
Now you have a string that is valid CSV, just spit it out to the file.
i am basically trying to switch around an array of arrays; my initial data are:
array = [
[0,0,0],
[1,1,1]
]
the output should be:
[
[0,1],
[0,1],
[0,1]
]
however what i get is:
[]
i have tried doing the same thing without the loops but when i introduce them it just wont append!
see code here:
array = [
    [0,0,0],
    [1,1,1]
]
transformedArray = []
#add rows to transformed
for j in range(0, len(array) - 1):
    transformedArray.append([])
#for each row
for i in range(0, len(array[0]) - 1):
    #for each column
    for k in range(0, len(array) - 1):
        transformedArray[i].append(array[k][i])
can you help? i have not found any similar issues online so i am guessing i've missed something stupid!
Try nesting your loops:
array = [
[0,0,0],
[1,1,1]
]
transformedArray = [[0,0],[0,0],[0,0]]
# iterate through rows
for i in range(len(array)):
    # iterate through columns
    for j in range(len(array[0])):
        transformedArray[j][i] = array[i][j]
for res in transformedArray:
    print(res)
returns:
[0, 1]
[0, 1]
[0, 1]
Edited to Add explanation:
First, a list is defined as in the code above, aList = [ ... ], whereas an array would be defined as anArray = numpy.array([...]). So, to the point of the comments above, the question is about list processing, not true Python array processing. Next, elements are assigned to the list by index, so there has to be a place to put them; I handled that by creating a list with 3 two-element rows already in place. The original post would only create the first 2 rows and then hit an index failure when the 3rd row is needed. The nested for loops then iterate through the embedded lists.
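You could do it by mapping an index-access operation over all the inner lists (starting from an empty transformedArray):
for i in range( len( array[0] ) ):
    # list(...) is needed in Python 3, where map returns a lazy iterator
    transformedArray.append( list( map( lambda x: x[i], array ) ) )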
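A common idiom for this kind of row/column swap is zip with argument unpacking. The sketch below assumes all inner lists have the same length (zip silently truncates to the shortest one otherwise); the variable names are chosen for illustration.

```python
array = [
    [0, 0, 0],
    [1, 1, 1],
]

# zip(*array) pairs up the i-th element of every row, i.e. yields the columns;
# each column comes back as a tuple, so convert it to a list
transformed = [list(column) for column in zip(*array)]

for row in transformed:
    print(row)
```

Because nothing is indexed by hand, there is no range(...) bound to get wrong and no need to pre-allocate the result.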
Using just Ruby I am trying to
Generate an array of random numbers
Create a new 2 dimensional array containing x amount of arrays filled with x amount of samples from the original number list.
This is what I have...
a = 1000.times.map{rand(100)}.to_a
b = 5.times.map{a.sample}
#=> [3, 96, 23, 45, 41]
I basically want to be able to generate what I did in b, x amount of times.
Is this possible?
Thank you for the comments everyone!
Just wrap your definition of b in another map:
a = 1000.times.map{rand(100)} # to_a is unnecessary here, map returns an array
b = 5.times.map{5.times.map{a.sample}}
A one-liner to do what you want.
3.times.map {2.times.map {rand 1000} }
#=> [[267, 476], [109, 950], [345, 137]]
I don't have Rails installed at the moment, so here's a pure Ruby solution.
a = Array.new(1000) { rand(100) }   # exactly 1000 numbers; 0..1000 would give 1001
x = 2
b = Array.new(x) { a.sample(x) }    # x arrays of x samples each
# e.g. [[83, 73], [55, 93]]
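The same "wrap the sampling in another loop" idea, sketched in Python for comparison. sample_grid is a made-up helper name; it draws with replacement, like calling a.sample repeatedly in the question, and takes an optional random.Random instance so results can be made reproducible.

```python
import random


def sample_grid(pool, x, rng=random):
    """Build x lists, each holding x values drawn (with replacement) from pool."""
    return [[rng.choice(pool) for _ in range(x)] for _ in range(x)]


rng = random.Random(0)  # seeded generator, only so repeated runs agree
pool = [rng.randrange(100) for _ in range(1000)]

grid = sample_grid(pool, 5, rng)
for row in grid:
    print(row)
```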
How to get a list from an array of arrays?
I have a List of Lists, like: [[1,2,3],[1,2,3],[1,2,3]].
I want to have a List that contains the first element of each inner List.
In my example, I want to get list = [1,1,1].
If you also might want to get the second/third elements of each List, you can also use transpose:
def input = [[1,2,3],[1,2,3],[1,2,3]]
def output = input.transpose()
// All the lists are joined by element index
assert output == [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
// Grab the first one (1,1,1)
assert output[ 0 ] == [ 1,1,1 ]
If you know you always have a list of lists (i.e. the inner list always exists), you could do it like this:
def lists = [[1,2,3],[1,2,3],[1,2,3]]
def result = lists.collect { it[0] }
assert result == [1,1,1]