Can I "group by" an array of dictionaries in Julia?

I'm reading in an array from a JSON file because I need to perform a reduce on it before turning it into a DataFrame for further manipulation. For the sake of argument, let's say this is it:
a = [Dict("A" => 1, "B" => 1, "C" => "a")
Dict("A" => 1, "B" => 2, "C" => "b")
Dict("A" => 2, "B" => 1, "C" => "b")
Dict("A" => 2, "B" => 2, "C" => "a")]
Now, the reduce I'm performing would be greatly simplified if I could group the array by one or more keys (say, A and C), perform a simpler reduce on each group, and recombine the rows later into a larger array of Dicts that I can then easily turn into a DataFrame.
One solution would be to turn this into a DataFrame, split it into groups, turn individual groups into matrices, do the reduce (with some difficulty, because now I've lost the ability to refer to elements by their name), turn the reduced matrices back into (Sub?)DataFrames (with some more difficulty because names), and hope it all comes together nicely into one giant DataFrame.
Any easier and/or more practical way of doing this?
EDIT: Before somebody suggests I look at Query.jl: the reduce I'm running returns an array with fewer rows, because I'm squashing certain pairs of subsequent rows. If I can do such a thing with Query.jl, could somebody hint at how? The documentation isn't exactly clear on how to "aggregate" with anything that doesn't return a single value. Example:
A B C
-----------
1   a
2 1 a
3   b
4 2 b
should group by "C" and turn that table into something like
A B C
-----------
1 1 a
3 2 b
To clarify, the reduce is working, I only want to simplify it by not having to check if a row belongs to the same group of the previous row before doing the squashing.

It's still experimental, but SplitApplyCombine.jl might do the trick. You can group arbitrary iterables using any key function you want, and get a key -> group dict out at the end.
julia> ## Pkg.clone("https://github.com/JuliaData/SplitApplyCombine.jl.git")
julia> using SplitApplyCombine
julia> group(x->x["C"], a)
Dict{Any,Array{Dict{String,Any},1}} with 2 entries:
"b" => Dict{String,Any}[Dict{String,Any}(Pair{String,Any}("B", 2),Pair{String,Any}("A", 1),Pair{String,Any}("C", "b")), Dict{String,Any}(Pair{String,Any}("…
"a" => Dict{String,Any}[Dict{String,Any}(Pair{String,Any}("B", 1),Pair{String,Any}("A", 1),Pair{String,Any}("C", "a")), Dict{String,Any}(Pair{String,Any}("…
Then you can use standard [map]reduce operations (here using the SAC @_ macro for piping):
julia> @_ a |> group(x->x["C"], _) |> values(_) |> reduce(vcat, _)
4-element Array{Dict{String,Any},1}:
Dict{String,Any}(Pair{String,Any}("B", 2),Pair{String,Any}("A", 1),Pair{String,Any}("C", "b"))
Dict{String,Any}(Pair{String,Any}("B", 1),Pair{String,Any}("A", 2),Pair{String,Any}("C", "b"))
Dict{String,Any}(Pair{String,Any}("B", 1),Pair{String,Any}("A", 1),Pair{String,Any}("C", "a"))
Dict{String,Any}(Pair{String,Any}("B", 2),Pair{String,Any}("A", 2),Pair{String,Any}("C", "a"))
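The output above only groups and recombines; the per-group reduce from the question is not shown. As a language-neutral illustration of that middle step, here is a minimal sketch in Ruby (not Julia, and not SplitApplyCombine's API). It assumes the blank cells in the question's example table are missing B values, and it simplifies the squashing to "collapse each group into one row, keeping the first non-missing value per key". The name squash is hypothetical.

```ruby
# Rows from the question's example table; nil marks a missing B value (assumption).
rows = [
  { "A" => 1, "B" => nil, "C" => "a" },
  { "A" => 2, "B" => 1,   "C" => "a" },
  { "A" => 3, "B" => nil, "C" => "b" },
  { "A" => 4, "B" => 2,   "C" => "b" },
]

# Hypothetical helper: collapse one group into a single row,
# keeping the first non-nil value seen for each key.
def squash(group)
  group.reduce do |acc, row|
    acc.merge(row) { |_key, old, new| old.nil? ? new : old }
  end
end

# Split by "C", apply the reduce to each group, recombine into a flat array.
squashed = rows.group_by { |row| row["C"] }.values.map { |group| squash(group) }
```

Because each group is reduced on its own, the reduce never has to check whether a row belongs to the same group as the previous one.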

Related

Julia: Functional Programming :Validate array entries against another array of values

I'm trying to create a one-liner that filters an array against an array of values, meaning that I want to cycle through each element of A and compare it to the elements of B.
For example: What is safe to drink?
A = ["water";"beer";"ammonia";"bleach";"lemonade"]
B = ["water";"beer"; "lemonade"]
I slapped together this monstrosity, but I'm hoping someone has a more elegant approach:
julia> vcat(filter(w->length(w)!= 0, map(y->filter(z->z!="",(map(x-> begin x==y ? x = y : x = "" end,B))),A))...)
3-element Array{String,1}:
"water"
"beer"
"lemonade"
You can use filter to iterate over the available drinks and in to check whether the current element is in the list of safe drinks:
julia> drinks = ["water", "beer", "bleach"];
julia> safe = ["beer", "lemonade", "water"];
julia> filter(in(safe), drinks)
2-element Array{String,1}:
"water"
"beer"
The filter approach is very neat. You can also use a comprehension:
[a for a in A if a in B]

Correct way of maintaining array structure in R [duplicate]

I am working with 3D arrays. A function takes a 2D array slice (matrix) from the user and visualizes it, using row and column names (the corresponding dimnames of the array). It works fine if the array dimensions are > 1.
However, if I have a 1x1x1 array, I cannot extract the slice as a matrix:
a <- array(1, c(1,1,1), list(A="a", B="b", C="c"))
a[1,,]
[1] 1
It is a scalar with no dimnames, hence part of the necessary information is missing. If I add drop=FALSE, I don't get a matrix but retain the original array:
a[1,,,drop=FALSE]
, , C = c

   B
A   b
  a 1
The dimnames are here but it is still 3-dimensional. Is there an easy way to get a matrix slice from a 1x1x1 array that would look like the above, just without the third dimension:
   B
A   b
  a 1
I suspect the issue is that when indexing an array, we cannot distinguish between 'take 1 value' and 'take all values' in the case where 'all' is just a singleton...
The drop parameter of [ is all-or-nothing, but the abind package has an adrop function which will let you choose which dimension you want to drop:
abind::adrop(a, drop = 3)
##    B
## A   b
##   a 1
Without any extra packages, the best I could do was to use apply over the first two dimensions and return each element unchanged:
apply(a, 1:2, identity)
# or
apply(a, 1:2, I)
#    B
# A   b
#   a 1

Multiple assignment in Scala without using Array?

I have an input something like this: "1 2 3 4 5".
What I would like to do, is to create a set of new variables, let a be the first one of the sequence, b the second, and xs the rest as a sequence (obviously I can do it in 3 different lines, but I would like to use multiple assignment).
A bit of search helped me by finding the right-ignoring sequence patterns, which I was able to use:
val Array(a, b, xs @ _*) = "1 2 3 4 5".split(" ")
What I do not understand is why it doesn't work if I try it with a tuple. I get an error for this:
val (a, b, xs @ _*) = "1 2 3 4 5".split(" ")
The error message is:
<console>:1: error: illegal start of simple pattern
Are there any alternatives for multiple-assignment without using Array?
I have just started playing with Scala a few days ago, so please bear with me :-) Thanks in advance!
Other answers tell you why you can't use tuples, but arrays are awkward for this purpose. I prefer lists:
val a :: b :: xs = "1 2 3 4 5".split(" ").toList
Simple answer
val Array(a, b, xs @ _*) = "1 2 3 4 5".split(" ")
The syntax you are seeing here is a simple pattern-match. It works because "1 2 3 4 5".split(" ") evaluates to an Array:
scala> "1 2 3 4 5".split(" ")
res0: Array[java.lang.String] = Array(1, 2, 3, 4, 5)
Since the right-hand side is an Array, the pattern on the left-hand side must also be an Array.
The left-hand side can be a tuple only if the right-hand side evaluates to a tuple as well:
val (a, b, xs) = (1, 2, Seq(3,4,5))
More complex answer
Technically, what's happening here is that the pattern-match syntax is invoking the unapplySeq method on the Array object, which looks like this:
def unapplySeq[T](x: Array[T]): Option[IndexedSeq[T]] =
if (x == null) None else Some(x.toIndexedSeq)
Note that the method accepts an Array. This is what Scala must see on the right-hand side of the assignment. It returns a sequence (wrapped in an Option), which is what allows the @ _* syntax you used.
Your version with the tuple doesn't work because Tuple3's unapply is defined with a Product3 as its parameter, not an Array:
def unapply[T1, T2, T3](x: Product3[T1, T2, T3]): Option[Product3[T1, T2, T3]] =
Some(x)
You can actually write your own "extractors" like this that do whatever you want, simply by creating an object and writing an unapply or unapplySeq method.
The answer is:
val a :: b :: c = "1 2 3 4 5".split(" ").toList
Should clarify that in some cases one may want to bind just the first n elements in a list, ignoring the non-matched elements. To do that, just add a trailing underscore:
val a :: b :: c :: _ = "1 2 3 4 5".split(" ").toList
That way:
c = "3" vs. c = List("3","4","5")
I'm not an expert in Scala by any means, but I think this might have to do with the fact that Tuples in Scala are just syntactic sugar for classes ranging from Tuple2 to Tuple22.
Meaning, Tuples in Scala aren't flexible structures like in Python or other languages of the sort, so it can't really create a Tuple with an unknown a priori size.
We can use pattern matching to extract the values from the string and assign them to multiple variables. This requires two lines, though.
The pattern says that there are 3 single digits ([0-9]) separated by spaces. After the 3rd digit there may or may not be more text, which we don't care about (.*).
val pat = "([0-9]) ([0-9]) ([0-9]).*".r
val (a,b,c) = "1 2 3 4 5" match { case pat(a,b,c) => (a,b,c) }
Output
a: String = 1
b: String = 2
c: String = 3

Mongoid: Retrieving objects in the order of the

Suppose:
mentions=["2","1","3"]
unranked = User.where(:nickname.in => mentions).map
The output does not match the ordering of the provided array.
The output order is effectively random => 3, 1, 2
I want it to follow the original array => 2, 1, 3
I had the same problem, I solved it like this:
mentions=["foo","bar","baz"]
ranked = User.where(:nickname.in => mentions).sort do |a, b|
  mentions.index(a.nickname) <=> mentions.index(b.nickname)
end
Not really the most elegant solution since I'm sorting in the application and not on the database engine but hey.. it works (on small lists).
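Calling mentions.index inside the comparison rescans the array on every comparison. A small variation, sketched here in plain Ruby, builds the nickname-to-position lookup once and sorts with sort_by. FakeUser and fetched are hypothetical stand-ins for the Mongoid model and the query result, so the sketch runs without a database; the sorting still happens in the application, not in MongoDB.

```ruby
# Hypothetical stand-in for the Mongoid User model (assumption, no database needed).
FakeUser = Struct.new(:nickname)

mentions = ["2", "1", "3"]

# Pretend this is what the query returned, in arbitrary order.
fetched = [FakeUser.new("3"), FakeUser.new("1"), FakeUser.new("2")]

# Build the lookup once: nickname => position in the original array.
position = mentions.each_with_index.to_h

# Order the fetched records by their position in mentions.
ranked = fetched.sort_by { |user| position[user.nickname] }
```

With a real query, fetched would be replaced by the result of User.where(:nickname.in => mentions); everything else stays the same.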