Convert an array into an index hash in Ruby - arrays

I have an array, and I want to make a hash so I can quickly ask "is X in the array?".
In perl, there is an easy (and fast) way to do this:
my #array = qw( 1 2 3 );
my %hash;
#hash{#array} = undef;
This generates a hash that looks like:
{
1 => undef,
2 => undef,
3 => undef,
}
The best I've come up with in Ruby is:
array = [1, 2, 3]
hash = Hash[array.map {|x| [x, nil]}]
which gives:
{1=>nil, 2=>nil, 3=>nil}
Is there a better Ruby way?
EDIT 1
No, Array.include? is not a good idea. Its slow. It does a query in O(n) instead of O(1). My example array had three elements for brevity; assume the actual one has a million elements. Let's do a little benchmarking:
#!/usr/bin/ruby -w
require 'benchmark'
array = (1..1_000_000).to_a
hash = Hash[array.map {|x| [x, nil]}]
Benchmark.bm(15) do |x|
x.report("Array.include?") { 1000.times { array.include?(500_000) } }
x.report("Hash.include?") { 1000.times { hash.include?(500_000) } }
end
Produces:
user system total real
Array.include? 46.190000 0.160000 46.350000 ( 46.593477)
Hash.include? 0.000000 0.000000 0.000000 ( 0.000523)

If all you need the hash for is membership, consider using a Set:
Set
Set implements a collection of unordered values with no
duplicates. This is a hybrid of Array's intuitive inter-operation
facilities and Hash's fast lookup.
Set is easy to use with Enumerable objects (implementing
each). Most of the initializer methods and binary operators accept
generic Enumerable objects besides sets and arrays. An
Enumerable object can be converted to Set using the
to_set method.
Set uses Hash as storage, so you must note the following points:
Equality of elements is determined according to Object#eql? and Object#hash.
Set assumes that the identity of each element does not change while it is stored. Modifying an element of a set will render the set to an
unreliable state.
When a string is to be stored, a frozen copy of the string is stored instead unless the original string is already frozen.
Comparison
The comparison operators <, >, <= and >= are implemented as
shorthand for the {proper_,}{subset?,superset?} methods. However, the
<=> operator is intentionally left out because not every pair of
sets is comparable. ({x,y} vs. {x,z} for example)
Example
require 'set'
s1 = Set.new [1, 2] # -> #<Set: {1, 2}>
s2 = [1, 2].to_set # -> #<Set: {1, 2}>
s1 == s2 # -> true
s1.add("foo") # -> #<Set: {1, 2, "foo"}>
s1.merge([2, 6]) # -> #<Set: {1, 2, "foo", 6}>
s1.subset? s2 # -> false
s2.subset? s1 # -> true
[...]
Public Class Methods
new(enum = nil)
Creates a new set containing the elements of the given enumerable
object.
If a block is given, the elements of enum are preprocessed by the
given block.

try this one:
a=[1,2,3]
Hash[a.zip]

You can do this very handy trick:
Hash[*[1, 2, 3, 4].map {|k| [k, nil]}.flatten]
=> {1=>nil, 2=>nil, 3=>nil, 4=>nil}

If you want to quickly ask "is X in the array?" you should use Array#include?.
Edit (in response to addition in OP):
If you want speedy look up times, use a Set. Having a Hash that points to all nils is silly. Conversion is an easy process too with Array#to_set.
require 'benchmark'
require 'set'
array = (1..1_000_000).to_a
set = array.to_set
Benchmark.bm(15) do |x|
x.report("Array.include?") { 1000.times { array.include?(500_000) } }
x.report("Set.include?") { 1000.times { set.include?(500_000) } }
end
Results on my machine:
user system total real
Array.include? 36.200000 0.140000 36.340000 ( 36.740605)
Set.include? 0.000000 0.000000 0.000000 ( 0.000515)
You should consider just using a set to begin with, instead of an array so that a conversion is never necessary.

I'm fairly certain that there isn't a one-shot clever way to construct this hash. My inclination would be to just be explicit and state what I'm doing:
hash = {}
array.each{|x| hash[x] = nil}
It doesn't look particularly elegant, but it's clear, and does the job.
FWIW, your original suggestion (under Ruby 1.8.6 at least) doesn't seem to work. I get an "ArgumentError: odd number of arguments for Hash" error. Hash.[] expects a literal, even-lengthed list of values:
Hash[a, 1, b, 2] # => {a => 1, b => 2}
so I tried changing your code to:
hash = Hash[*array.map {|x| [x, nil]}.flatten]
but the performance is dire:
#!/usr/bin/ruby -w
require 'benchmark'
array = (1..100_000).to_a
Benchmark.bm(15) do |x|
x.report("assignment loop") {hash = {}; array.each{|e| hash[e] = nil}}
x.report("hash constructor") {hash = Hash[*array.map {|e| [e, nil]}.flatten]}
end
gives
user system total real
assignment loop 0.440000 0.200000 0.640000 ( 0.657287)
hash constructor 4.440000 0.250000 4.690000 ( 4.758663)
Unless I'm missing something here, a simple assignment loop seems the clearest and most efficient way to construct this hash.

Rampion beat me to it. Set might be the answer.
You can do:
require 'set'
set = array.to_set
set.include?(x)

Your way of creating the hash looks good. I had a muck around in irb and this is another way
>> [1,2,3,4].inject(Hash.new) { |h,i| {i => nil}.merge(h) }
=> {1=>nil, 2=>nil, 3=>nil, 4=>nil}

I think chrismear's point on using assignment over creation is great. To make the whole thing a little more Ruby-esque, though, I might suggest assigning something other than nil to each element:
hash = {}
array.each { |x| hash[x] = 1 } # or true or something else "truthy"
...
if hash[376] # instead of if hash.has_key?(376)
...
end
The problem with assigning to nil is that you have to use has_key? instead of [], since [] give you nil (your marker value) if the Hash doesn't have the specified key. You could get around this by using a different default value, but why go through the extra work?
# much less elegant than above:
hash = Hash.new(42)
array.each { |x| hash[x] = nil }
...
unless hash[376]
...
end

Maybe I am misunderstanding the goal here; If you wanted to know if X was in the array, why not do array.include?("X") ?

Doing some benchmarking on the suggestions so far gives that chrismear and Gaius's assignment-based hash creation is slightly faster than my map method (and assigning nil is slightly faster than assigning true). mtyaka and rampion's Set suggestion is about 35% slower to create.
As far as lookups, hash.include?(x) is a very tiny amount faster than hash[x]; both are twice as a fast as set.include?(x).
user system total real
chrismear 6.050000 0.850000 6.900000 ( 6.959355)
derobert 6.010000 1.060000 7.070000 ( 7.113237)
Gaius 6.210000 0.810000 7.020000 ( 7.049815)
mtyaka 8.750000 1.190000 9.940000 ( 9.967548)
rampion 8.700000 1.210000 9.910000 ( 9.962281)
user system total real
times 10.880000 0.000000 10.880000 ( 10.921315)
set 93.030000 17.490000 110.520000 (110.817044)
hash-i 45.820000 8.040000 53.860000 ( 53.981141)
hash-e 47.070000 8.280000 55.350000 ( 55.487760)
Benchmarking code is:
#!/usr/bin/ruby -w
require 'benchmark'
require 'set'
array = (1..5_000_000).to_a
Benchmark.bmbm(10) do |bm|
bm.report('chrismear') { hash = {}; array.each{|x| hash[x] = nil} }
bm.report('derobert') { hash = Hash[array.map {|x| [x, nil]}] }
bm.report('Gaius') { hash = {}; array.each{|x| hash[x] = true} }
bm.report('mtyaka') { set = array.to_set }
bm.report('rampion') { set = Set.new(array) }
end
hash = Hash[array.map {|x| [x, true]}]
set = array.to_set
array = nil
GC.start
GC.disable
Benchmark.bmbm(10) do |bm|
bm.report('times') { 100_000_000.times { } }
bm.report('set') { 100_000_000.times { set.include?(500_000) } }
bm.report('hash-i') { 100_000_000.times { hash.include?(500_000) } }
bm.report('hash-e') { 100_000_000.times { hash[500_000] } }
end
GC.enable

If you're not bothered what the hash values are
irb(main):031:0> a=(1..1_000_000).to_a ; a.length
=> 1000000
irb(main):032:0> h=Hash[a.zip a] ; h.keys.length
=> 1000000
Takes a second or so on my desktop.

If you're looking for an equivalent of this Perl code:
grep {$_ eq $element} #array
You can just use the simple Ruby code:
array.include?(element)

Here's a neat way to cache lookups with a Hash:
a = (1..1000000).to_a
h = Hash.new{|hash,key| hash[key] = true if a.include? key}
Pretty much what it does is create a default constructor for new hash values, then stores "true" in the cache if it's in the array (nil otherwise). This allows lazy loading into the cache, just in case you don't use every element.

This preserves 0's if your hash was [0,0,0,1,0]
hash = {}
arr.each_with_index{|el, idx| hash.merge!({(idx + 1 )=> el }) }
Returns :
# {1=>0, 2=>0, 3=>0, 4=>1, 5=>0}

Related

In Ruby, during iteration of an array, how do I send multiple array elements as values for one specific hash key?

I know how to iterate through an array and I know how to create hash keys and values. I do not know how to create an array for the value and pass it multiple elements. My desired value for hash below is:
{'abc' => [1, 2, 3] , 'def' => [4,5,6,7]}
How would I achieve this hash, while iterating through array a and b below using each?
a = [1,2,3,4,5,6,7]
c = [1,2,3]
b = ['abc', 'def']
hash = {}
From your guidelines given in the comment:
While iterating through array a, if the element of iteration is included in array c, it is passed to the array value within key 'abc'. Otherwise, it is passed to other array value in key 'def'
You can do this:
hash = {}
hash['abc'] = a.select { |x| c.include?(x) }
hash['def'] = a.reject{ |x| c.include?(x) }
See Enumerable#select and Enumerable#reject. Also can take a look at Enumerable#partition which would be another good choice here, where you want to split an array into two arrays based on some condition:
in_a, not_in_a = a.partition { |x| c.include?(x) }
hash = { 'abc' => in_a, 'def' => not_in_a }
You can also do it with regular each if these fancy enumerable methods are bit too much for you:
hash = { 'abc' => [], 'def' => [] }
a.each do |x|
if c.include?(x)
hash['abc'].push(x)
else
hash['def'].push(x)
end
end
Unfortunately this question turned out not to be as interesting as I was hoping. I was hoping that the problem was this:
Knowing the hash key and a value, how can I make sure the key's value is an array and that the given value is appended to that?
For instance, start with h being {}. I have a key name :k and a value 1. I want h[:k], if it doesn't already exist, to be [1]. But if it does already exist, then it's an array and I want to append 1 to that array; for instance, if h[:k] is already [3,2], now it should be [3,2,1].
I can think of various ways to ensure that, but one possibility is this:
(hash[key] ||= []) << value
To see that this works, let's make it a method:
def add_to_hash_array_value(hash:, key:, value:)
(hash[key] ||= []) << value
end
Now I'll just call that a bunch of times:
h = {}
add_to_hash_array_value(hash:h, key:"abc", value:1)
add_to_hash_array_value(hash:h, key:"abc", value:2)
add_to_hash_array_value(hash:h, key:"def", value:4)
add_to_hash_array_value(hash:h, key:"def", value:5)
add_to_hash_array_value(hash:h, key:"abc", value:3)
puts h #=> {"abc"=>[1, 2, 3], "def"=>[4, 5]}
We got the right answer.
This is nice because suppose I have some way of knowing, given a value, what key it should be appended to. I can just repeatedly apply that decision-making process.
However, the trouble with trying to apply that to the original question is that the original question seems not to know exactly how to decide, given a value, what key it should be appended to.

What's the cleanest way to construct a Ruby array using a while loop?

Ruby has lots of nice ways of iterating and directly returning that result. This mostly involve array methods. For example:
def ten_times_tables
(1..5).map { |i| i * 10 }
end
ten_times_tables # => [10, 20, 30, 40, 50]
However, I sometimes want to iterate using while and directly return the resulting array. For example, the contents of the array may depend on the expected final value or some accumulator, or even on conditions outside of our control.
A (contrived) example might look like:
def fibonacci_up_to(max_number)
sequence = [1, 1]
while sequence.last < max_number
sequence << sequence[-2..-1].reduce(:+)
end
sequence
end
fibonacci_up_to(5) # => [1, 1, 2, 3, 5]
To me, this sort of approach feels quite "un-Ruby". The fact that I construct, name, and later return an array feels like an anti-pattern. So far, the best I can come up with is using tap, but it still feels quite icky (and quite nested):
def fibonacci_up_to(max_number)
[1, 1].tap do |sequence|
while sequence.last < max_number
sequence << sequence[-2..-1].reduce(:+)
end
end
end
Does anyone else have any cleverer solutions to this sort of problem?
Something you might want to look into for situations like this (though maybe your contrived example fits this a lot better than your actual use case) is creating an Enumerator, so your contrived example becomes:
From the docs for initialize:
fib = Enumerator.new do |y|
a = b = 1
loop do
y << a
a, b = b, a + b
end
end
and then call it:
p fib.take_while { |elem| elem <= 5 }
#=> [1, 1, 2, 3, 5]
So, you create an enumerator which iterates all your values and then once you have that, you can iterate through it and collect the values you want for your array in any of the usual Ruby-ish ways
Similar to Simple Lime's Enumerator solution, you can write a method that wraps itself in an Enumerator:
def fibonacci_up_to(max_number)
return enum_for(__callee__, max_number) unless block_given?
a = b = 1
while a <= max_number
yield a
a, b = b, a + b
end
end
fibonacci_up_to(5).to_a # => [1, 1, 2, 3, 5]
This achieves the same result as returning an Enumerator instance from a method, but it looks a bit nicer and you can use the yield keyword instead of a yielder block variable. It also lets you do neat things like:
fibonacci_up_to(5) do |i|
# ..
end

ruby how to add a hash element into an existing json array

I have code that outputs the following:
car_id = ["123"]
model = [:model].product(car_id).map { |k,v| {k=>v} }
model = [{:model=>"123"}]
I would like to then add a new hash :make into the json like this:
model_with_make = [{:model=>"123", :make => "acura"}]
How do I do this?
Unfortunately, every solution I find produces this:
[{:model=>"123"}, {:make => "acura"}] and that is not what I want.
Ruby convention for showing result of a calculation
I assume you mean that
model = [:model].product(car_id).map { |k,v| {k=>v} }
produces an array containing a single hash:
[{:model=>"123"}]
and that you are not then executing the statement:
model = [{:model=>"123"}]
which would overwrite the the value of model from the previous statement, rendering the previous statement meaningless. If so, the normal way to write that is as follows.
model = [:model].product(car_id).map { |k,v| {k=>v} }
#=> [{:model=>"123"}]
Computing the array
Next,
arr = [:model].product(car_id)
#=> [[:model, "123"]]
But why use Array#product when the arrays [:model] and car_id both contain a single element? It's simpler to just write
arr = [[:model, car_id.first]]
#=> [[:model, "123"]]
Converting the array to a hash
Since arr contains only one element, there's not much point to mapping it; just convert it to a hash:
Hash[arr]
#=> {:model=>"123"}
or (for Ruby versions 1.9+):
arr.to_h
#=> {:model=>"123"}
Add a key-value pair to the hash
If you wish to add the key-value pair :make => "acura" to
h = {:model=>"123"}
you can simply write
h[:make] = "acura"
#=> "acura"
h #=> {:model=>"123", :make=>"acura"}
or, in one line,
(h[:make] = "acura") && h
#=> {:model=>"123", :make=>"acura"}
Wrapping up
Putting this together, you could write
h = [[:model, car_id.first]].to_h
#=> {:model=>"123"}
(h[:make] = "acura") && h
#=> {:model=>"123", :make=>"acura"}
model = [{:model=>"123"}]
model_with_make = model.map { |m| m.merge(make: 'acura') }
#⇒ [{:model=>"123", :make => "acura"}]
If you are concerned that there is the only element in the array model, you might modify it inplace:
model.first.merge!(make: 'acura')
model
#⇒ [{:model=>"123", :make => "acura"}]

How to know if multiple arrays in a list are the same

I would like to know whether all arrays within a list are the same.
== compares two arrays, but I want to know if there is any library method to tell if all arrays within a list are the same.
You can traverse the list of arrays just once, comparing the first array with all the other arrays. If the first one is equal to all the others, then all arrays in the list are equal. Something like this will work:
arrays = [[1,3],[1,3],[1,3]]
array0 = arrays.first
arrays[1..-1].all? { |a| array0 == a }
# => true
arrays = [[1,3],[1,3],[1,4]]
array0 = arrays.first
arrays[1..-1].all? { |a| array0 == a }
# => false
I was curious about the performance of each of the solutions here. Please be welcome to edit this post with your own results, if you like.
In my tests, the difference between the approaches raised with the length of the list of arrays, so I preferably measured a long list of relatively short arrays. I always did a few runs to remove the possible influence of GC.
require 'benchmark'
n = 10
n_arrays = 1000000
arrays = [(1..n).to_a] * n_arrays
Benchmark.bm(14) do |bm|
bm.report("1st vs others:") do
array0 = arrays.first
arrays[1..-1].all? { |a| array0 == a }
end
bm.report("uniq:") { arrays.uniq.size == 1 }
bm.report("each_cons:") { arrays.each_cons(2).all?{|x, y| x == y} }
end
The results suggest that while the each_cons approach is about the same (only slightly slower) than the "1st vs others" approach, the one using uniq is much much slower.
user system total real
1st vs others: 0.080000 0.000000 0.080000 ( 0.080872)
uniq: 1.810000 0.000000 1.810000 ( 1.807646)
each_cons: 0.180000 0.000000 0.180000 ( 0.174251)
[[1,3],[1,3],[1,3]].uniq.size == 1
#=> true
[[1,3],[1,3],[1,4]].uniq.size == 1
#=> false
array.each_cons(2).all?{|x, y| x == y}

Add key value pair to Array of Hashes when unique Id's match

I have two arrays of hashes
sent_array = [{:sellersku=>"0421077128", :asin=>"B00ND80WKY"},
{:sellersku=>"0320248102", :asin=>"B00WTEF9FG"},
{:sellersku=>"0324823180", :asin=>"B00HXZLB4E"}]
active_array = [{:price=>39.99, :asin1=>"B00ND80WKY"},
{:price=>7.99, :asin1=>"B00YSN9QOG"},
{:price=>10, :asin1=>"B00HXZLB4E"}]
I want to loop through sent_array, and find where the value in :asin is equal to the value in :asin1 in active_array, then copy the key & value of :price to sent_array. Resulting in this:
final_array = [{:sellersku=>"0421077128", :asin=>"B00ND80WKY", :price=>39.99},
{:sellersku=>"0320248102", :asin=>"B00WTEF9FG"},
{:sellersku=>"0324823180", :asin=>"B00HXZLB4E", :price=>10}]
I tried this, but I get a TypeError - no implicit conversion of Symbol into Integer (TypeError)
sent_array.each do |x|
x.detect { |key, value|
if value == active_array[:asin1]
x[:price] << active_array[:price]
end
}
end
For reasons of both efficiency and readability, it makes sense to first construct a lookup hash on active_array:
h = active_array.each_with_object({}) { |g,h| h[g[:asin1]] = g[:price] }
#=> {"B00ND80WKY"=>39.99, "B00YSN9QOG"=>7.99, "B00HXZLB4E"=>10}
We now merely step through sent_array, updating the hashes:
sent_array.each { |g| g[:price] = h[g[:asin]] if h.key?(g[:asin]) }
#=> [{:sellersku=>"0421077128", :asin=>"B00ND80WKY", :price=>39.99},
# {:sellersku=>"0320248102", :asin=>"B00WTEF9FG"},
# {:sellersku=>"0324823180", :asin=>"B00HXZLB4E", :price=>10}]
Retrieving a key-value pair from a hash (h) is much faster, of course, than searching for a key-value pair in an array of hashes.
This does the trick. Iterate over your sent array and attempt to find a record in your active_array that has that :asin. If you find something, set the price and you are done.
Your code I believe used detect/find incorrectly. What you want out of that method is the hash that matches and then do something with that. You were trying to do everything inside of detect.
sent_array.each do |sent|
item = active_array.find{ |i| i.has_value? sent[:asin] }
sent[:price] = item[:price] if item
end
=> [{:sellersku=>"0421077128", :asin=>"B00ND80WKY", :price=>39.99}, {:sellersku=>"0320248102", :asin=>"B00WTEF9FG"}, {:sellersku=>"0324823180", :asin=>"B00HXZLB4E", :price=>10}]
I am assuming second element of both sent_array and active_array has B00WTEF9FG as asin and asin1 respectively. (seeing your final result)
Now:
a = active_array.group_by{|a| a[:asin1]}
b = sent_array.group_by{|a| a[:asin]}
a.map { |k,v|
v[0].merge(b[k][0])
}
# => [{:price=>39.99, :asin1=>"B00ND80WKY", :sellersku=>"0421077128", :asin=>"B00ND80WKY"}, {:price=>7.99, :asin1=>"B00WTEF9FG", :sellersku=>"0320248102", :asin=>"B00WTEF9FG"}, {:price=>10, :asin1=>"B00HXZLB4E", :sellersku=>"0324823180", :asin=>"B00HXZLB4E"}]
Why were you getting TypeError?
You are doing active_array[:asin1]. Remember active_array itself is an Array. Unless you iterate over it, you cannot look for keys.
Another issue with your approach is, you are using Hash#detect
find is implemented in terms of each. And each, when called on a
Hash, returns key-value pairs in form of arrays with 2 elements
each. That's why find returns an array.
source
Same is true for detect. So x.detect { |key, value| .. } is not going to work as you are expecting it to.
Solution without assumption
a.map { |k,v|
b[k] ? v[0].merge(b[k][0]) : v[0]
}.compact
# => [{:price=>39.99, :asin1=>"B00ND80WKY", :sellersku=>"0421077128", :asin=>"B00ND80WKY"}, {:price=>7.99, :asin1=>"B00YSN9QOG"}, {:price=>10, :asin1=>"B00HXZLB4E", :sellersku=>"0324823180", :asin=>"B00HXZLB4E"}]
Here since asin1 => "B00ND80WKY" has no match, it cannot get sellersku from other hash.

Resources