I'm reading in an array from a JSON file because I need to perform a reduce on it before turning it into a DataFrame for further manipulation. For the sake of argument, let's say this is it:
a = [Dict("A" => 1, "B" => 1, "C" => "a")
Dict("A" => 1, "B" => 2, "C" => "b")
Dict("A" => 2, "B" => 1, "C" => "b")
Dict("A" => 2, "B" => 2, "C" => "a")]
Now, the reduce I'm performing would be greatly simplified if I could group the array by one or more keys (say, A and C), perform a simpler reduce on each group, and recombine the rows later into a larger array of Dicts that I can then easily turn into a DataFrame.
One solution would be to turn this into a DataFrame, split it into groups, turn individual groups into matrices, do the reduce (with some difficulty, because now I've lost the ability to refer to elements by their names), turn the reduced matrices back into (Sub?)DataFrames (with some more difficulty, again because of the names), and hope it all comes together nicely into one giant DataFrame.
Is there an easier and/or more practical way of doing this?
EDIT: Before somebody suggests Query.jl, note that the reduce I'm running returns an array with fewer rows, because I'm squashing certain pairs of consecutive rows. If I can do such a thing with Query.jl, could somebody hint at how? The documentation isn't exactly clear on how to "aggregate" with anything that doesn't return a single value. Example:
A   B   C
-----------
1       a
2   1   a
3       b
4   2   b
should group by "C" and turn that table into something like:
A   B   C
-----------
1   1   a
3   2   b
To clarify, the reduce is working; I only want to simplify it so that I don't have to check whether a row belongs to the same group as the previous row before doing the squashing.
It's still experimental, but SplitApplyCombine.jl might do the trick. You can group arbitrary iterables using any key function you want, and get a key -> group dict out at the end.
julia> ## Pkg.clone("https://github.com/JuliaData/SplitApplyCombine.jl.git")
julia> using SplitApplyCombine
julia> group(x->x["C"], a)
Dict{Any,Array{Dict{String,Any},1}} with 2 entries:
"b" => Dict{String,Any}[Dict{String,Any}(Pair{String,Any}("B", 2),Pair{String,Any}("A", 1),Pair{String,Any}("C", "b")), Dict{String,Any}(Pair{String,Any}("…
"a" => Dict{String,Any}[Dict{String,Any}(Pair{String,Any}("B", 1),Pair{String,Any}("A", 1),Pair{String,Any}("C", "a")), Dict{String,Any}(Pair{String,Any}("…
Then you can use standard [map]reduce operations (here using SplitApplyCombine's @_ macro for piping):
julia> @_ a |> group(x->x["C"], _) |> values(_) |> reduce(vcat, _)
4-element Array{Dict{String,Any},1}:
Dict{String,Any}(Pair{String,Any}("B", 2),Pair{String,Any}("A", 1),Pair{String,Any}("C", "b"))
Dict{String,Any}(Pair{String,Any}("B", 1),Pair{String,Any}("A", 2),Pair{String,Any}("C", "b"))
Dict{String,Any}(Pair{String,Any}("B", 1),Pair{String,Any}("A", 1),Pair{String,Any}("C", "a"))
Dict{String,Any}(Pair{String,Any}("B", 2),Pair{String,Any}("A", 2),Pair{String,Any}("C", "a"))
Suppose I have an array
array = [1,2,3,4,5]
I want to collect all the elements and all the indices of the array into two separate arrays, like
[[1,2,3,4,5], [0,1,2,3,4]]
How do I do this using a single Ruby collect statement?
I am trying to do it using this code:
array.each_with_index.collect do |v,k|
  # code
end
What should go in the code section to get the desired output?
Or even simpler:
[array, array.each_index.to_a]
I liked the first answer that was posted a while ago; I don't know why its author deleted it.
array.each_with_index.collect { |value, index| [value,index] }.transpose
Actually, I am using a custom vector class on which I am calling the each_with_index method.
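If the custom vector class includes Enumerable and defines each, it gets each_with_index and collect for free. In that case the transpose answer above should work on it unchanged. The class from the question isn't shown, so MyVector below is a hypothetical stand-in. Treat this as a minimal sketch under that assumption, not the asker's actual class.

```ruby
# MyVector is a hypothetical stand-in for the custom vector class in the question.
class MyVector
  include Enumerable # supplies each_with_index, collect, to_a, ... on top of each

  def initialize(*values)
    @values = values
  end

  # Yield each element; return an Enumerator when called without a block.
  def each(&block)
    return enum_for(:each) unless block

    @values.each(&block)
    self
  end
end

vec = MyVector.new(1, 2, 3, 4, 5)
# Build [value, index] pairs, then transpose them into two parallel arrays.
values, indexes = vec.each_with_index.collect { |value, index| [value, index] }.transpose
p [values, indexes]
# => [[1, 2, 3, 4, 5], [0, 1, 2, 3, 4]]
```

If the class defines its own each_with_index instead of including Enumerable, that method needs to return an Enumerator when called without a block. Otherwise the chained collect has nothing to iterate.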
Here's one simple way:
array = [1,2,3,4,5]
indexes = *array.size.times
p [ array, indexes ]
# => [[1, 2, 3, 4, 5], [0, 1, 2, 3, 4]]
See it on repl.it: https://repl.it/FmWg
Suppose there is an array
array A = ["a","b","c","d"] #Index is [0,1,2,3]
and it's sorted to:
array A = ["d","c","b","a"]
I need an array that gives, for each element in the sorted order, its original index:
[3,2,1,0]
I'm trying to find a solution to this in Ruby.
UPDATE to the question:
If A is sorted to:
array A = ["d","b","c","a"] #not a pure reverse
Then the returned index array should be
[3,1,2,0]
You need to create a mapping table that preserves the original order, then use that order to un-map the re-ordered version:
orig = %w[ a b c d ]
orig_order = orig.each_with_index.to_h
revised = %w[ d c b a ]
revised.map { |e| orig_order[e] }
# => [3, 2, 1, 0]
So long as your elements are unique, this will be able to track any shift in order.
Here is one way to do this:
original_array = ["a","b","c","d"]
jumbled_array = original_array.shuffle
jumbled_array.map {|i| original_array.index(i)}
#=> [1, 3, 0, 2]
Note:
In this sample, output will change for every run as we are using shuffle to demonstrate the solution.
The solution will work only as long as array has no duplicate values.
If you want the solution to work with arrays that contain duplicate values, one possibility is to compare the object_id of the array members when looking up the index.
jumbled_array.map {|i| original_array.map(&:object_id).index(i.object_id)}
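This works as long as jumbled_array contains the very same objects as original_array and none of them has been recreated (with dup or anything else that changes its object_id). It still can't tell duplicates apart when equal values share one object, as small integers and symbols do.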
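Here is another way to handle duplicates that doesn't depend on object identity. It is only a sketch. First record every position each value occupies in the original array. Then hand those positions out left to right while walking the reordered array. The method name original_indexes is made up for illustration.

Equal values are treated as interchangeable. When two elements are equal, the earlier one in the reordered array gets the earlier original index. If you need to follow specific objects, the object_id approach above is still the one to use. This version also avoids rescanning the original array with index for every element.

```ruby
# original_indexes is a hypothetical helper name.
# Returns, for each element of reordered, the index it had in original.
def original_indexes(original, reordered)
  # value => positions it occupies in original, in ascending order
  positions = Hash.new { |hash, key| hash[key] = [] }
  original.each_with_index { |value, index| positions[value] << index }

  reordered.map do |value|
    queue = positions[value]
    raise ArgumentError, "no remaining position for #{value.inspect} in original array" if queue.empty?

    queue.shift # consume the earliest unused position of this value
  end
end

p original_indexes(%w[a b c d], %w[d b c a])
# => [3, 1, 2, 0]
p original_indexes(%w[a b a c], %w[c a a b])
# => [3, 0, 2, 1]
```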
You can use the map and index methods.
arr = ["a","b","c","d"]
sort_arr = ["d","c","b","a"]
sort_arr.map{|s| arr.index(s)}
# => [3, 2, 1, 0]
I am trying to build a method that takes two arguments (source, in this case a hash of names, and thing_to_find, in this case an age as an integer) and returns just the hash keys whose value is that integer. The output should be an array of those keys (without the values).
I'm about fried trying to get this; help is very much appreciated. Here's what is given:
my_family_pets_ages = {"Evi" => 6, "Ditto" => 3, "Hoobie" => 3, "George" => 12, "Bogart" => 4, "Poly" => 4, "Annabelle" => 0}
def my_hash_finding_method(source, thing_to_find)
  # Here I've tried all manner of has_value?, map, include?, etc. I think I'm probably having a problem with syntax... am I close with this?
  source.select { |k, v| v.include? thing_to_find }
end
The output should look like:
my_hash_finding_method(my_family_pets_ages, 3)
#=> should return ["Hoobie", "Ditto"]
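The attempt above fails because v is an Integer. Integer has no include? method, so the block raises NoMethodError. A minimal sketch of a fix is to compare each value with == and then take the keys of the selected pairs.

Ruby hashes keep insertion order, so the result comes back in the order the pets appear in the hash: ["Ditto", "Hoobie"], not the order written in the question. Sort the result if you need a specific order.

```ruby
my_family_pets_ages = {"Evi" => 6, "Ditto" => 3, "Hoobie" => 3, "George" => 12, "Bogart" => 4, "Poly" => 4, "Annabelle" => 0}

# Keep the name => age pairs whose age equals thing_to_find, then drop the ages.
def my_hash_finding_method(source, thing_to_find)
  source.select { |_name, age| age == thing_to_find }.keys
end

p my_hash_finding_method(my_family_pets_ages, 3)
# => ["Ditto", "Hoobie"]
```

On Ruby 2.7 or later, source.filter_map { |name, age| name if age == thing_to_find } does the same in one pass.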
Say you have an array @a = qw/ a b c d/;
and a hash %a = ('a' => 1, 'b' => 1, 'c' => 1, 'd' => 1);
Is there any situation where creating the array version is better than creating the hash (other than when you have to iterate over all the values as in something like
for (@a) {
    ...
}
in which case you would have to use keys %a if you went with the hash)? Because testing whether a specific value is in a hash is always more efficient than doing so in an array, correct?
Arrays are indexed by numbers.
Hashes are keyed by strings.
All indexes up to the highest index exist in an array.
Hashes are sparsely indexed. (e.g. "a" and "c" can exist without "b".)
There are many emergent properties. Primarily,
Arrays can be used to store ordered lists.
It would be ugly and inefficient to use hashes that way.
It's not possible to delete an element from an array unless it's the highest indexed element.
You can delete from an ordered list implemented using an array, though it is inefficient to remove elements other than the first or last.
It's possible to delete an element from a hash, and it's efficient.
Arrays are ordered lists of values. They can contain duplicate values.
@array = qw(a b c a);
Hashes are a mapping between a key (which must be unique) and a value (which can be duplicated). Hashes are (effectively) unordered, which means that keys come out in apparently random order rather than the order in which they are entered.
%hash = (a => 1, b => 2, c => 3);
Hashes can also be used as sets when only the key matters. Sets are unordered and contain only unique "values" (the hash's keys).
%set = (a => undef, b => undef, c => undef);
Which one to use depends on your data and algorithm. Use an array when order matters (particularly if you can't sort to derive the order) or if duplicate values are possible. Use a set (i.e. use a hash as a set) when values must be unique and you don't care about order. Use a hash when uniqueness matters, order doesn't (or is easily sortable), and look-ups are based on arbitrary values rather than integers.
You can combine arrays and hashes (via references) to create arbitrarily complex data structures.
@aoa = ([1, 2, 3], [4, 5, 6]); # array of arrays ("2D" array)
%hoh = (a => { x => 1 }, b => { x => 2 }); # hash of hashes
@aoh = ({a => 1, b => 2}, {a => 3, b => 4}); # array of hashes
%hoa = (a => [1, 2], b => [3, 4]); # hash of arrays
...etc.
This is about using numbers as hash keys. It doesn't answer the question directly, as it doesn't compare the facilities that arrays provide, but I thought this would be a good place to put the information.
Suppose a hash with ten elements is built using code like this:
use strict;
use warnings;
my %hash;
my $n = 1000;
for (1 .. 10) {
    $hash{$n} = 1;
    $n *= 1000;
}
and then we query it, looking for keys that are powers of ten. Of course the easiest way to multiply an integer by ten is to append a zero, so it is fine to write:
my $m = '1';
for (1 .. 100) {
    print $m, "\n" if $hash{$m};
    $m .= 0;
}
which has the output
1000
1000000
1000000000
1000000000000
1000000000000000
1000000000000000000
We entered ten elements but this shows only six. What has happened? Let's take a look at what's in the hash.
use Data::Dump;
dd \%hash;
and this outputs
{
"1000" => 1,
"1000000" => 1,
"1000000000" => 1,
"1000000000000" => 1,
"1000000000000000" => 1,
"1000000000000000000" => 1,
"1e+021" => 1,
"1e+024" => 1,
"1e+027" => 1,
"1e+030" => 1,
}
so the hash doesn't use the keys that we imagined. It stringifies the numbers in a way that it would be foolish to try to emulate.
For a slightly more practical example, say we had some circles and wanted to collect them into sets by area. The obvious thing is to use the area as a hash key, as in this program, which creates 100,000 circles with random integer radii below 18 million.
use strict;
use warnings;
use 5.010;
package Circle;
use Math::Trig 'pi';
sub new {
    my $class = shift;
    my $self = { radius => shift };
    bless $self, $class;
}
sub area {
    my $self = shift;
    my $radius = $self->{radius};
    pi * $radius * $radius;
}
package main;
my %circles;
for (1 .. 100_000) {
    my $circle = Circle->new(int rand 18_000_000);
    push @{ $circles{$circle->area} }, $circle;
}
Now let's see how many of those hash keys use scientific notation:
say scalar grep /e/, keys %circles;
which says (randomly, of course)
861
so there really isn't a tidy way of knowing what string Perl will use if we specify a number as a hash key.
In Perl an @array is an ordered list of values ($v1, $v2, ...) accessed by an integer (both positive and negative), while a %hash is an unordered list of 'key => value' pairs (k1 => $v1, k2 => $v2, ...) accessed by a string.
There are modules on CPAN that implement ordered hashes, such as Hash::Ordered and Tie::IxHash.
You might want to use an array when you have ordered items, presumably a large number of them, for which using a %hash and sorting the keys and/or the values would be inefficient.