Difference between `+=` and `<<` inside a block for `each_with_object` - arrays

I had to update an array, and I used += and << in different runs of code inside a block passed to Enumerable#each_with_object:
Code 1
(1..5).each_with_object([]) do |i, a|
puts a.inspect
a += [i]
end
Output:
[]
[]
[]
[]
[]
Code 2
(1..5).each_with_object([]) do |i, a|
puts a.inspect
a << i
end
Output:
[]
[1]
[1, 2]
[1, 2, 3]
[1, 2, 3, 4]
The += operator does not update the original array. Why? What am I missing here?

In each_with_object, the so-called memo object is shared among the iterations. You need to modify that object in order to do something meaningful. a += [i] is syntactic sugar for a = a + [i]: it builds a new array and rebinds the block-local variable a, but it does not modify the receiver, so the memo object passed to the next iteration is unchanged. Methods like << or push modify the receiver, so they do have an effect.
On the other hand, in inject, the so-called memo object is the return value of the block, and you don't need to modify the object, but you need to return the value you want for the next iteration.
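The contrast can be sketched as follows. This is only a minimal illustration; the variable names (mutated, rebound, injected) are assumptions and do not come from the question.

```ruby
# each_with_object: the same memo object is passed to every iteration,
# so it has to be mutated in place.
mutated = (1..5).each_with_object([]) { |i, memo| memo << i }

# Rebinding the block variable leaves the original memo object untouched.
rebound = (1..5).each_with_object([]) { |i, memo| memo += [i] }

# inject: the block's return value becomes the memo for the next iteration,
# so the non-mutating + works here.
injected = (1..5).inject([]) { |memo, i| memo + [i] }

p mutated   # [1, 2, 3, 4, 5]
p rebound   # []
p injected  # [1, 2, 3, 4, 5]
```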

It is clear to me that += operator is not updating the original array. Why?
Because the documentation says so (emphasis mine):
ary + other_ary → new_ary
Concatenation — Returns a new array built by concatenating the two arrays together to produce a third array.
[ 1, 2, 3 ] + [ 4, 5 ] #=> [ 1, 2, 3, 4, 5 ]
a = [ "a", "b", "c" ]
c = a + [ "d", "e", "f" ]
c #=> [ "a", "b", "c", "d", "e", "f" ]
a #=> [ "a", "b", "c" ]
Note that
x += y
is the same as
x = x + y
This means that it produces a new array. As a consequence, repeated use of += on arrays can be quite inefficient.
See also #concat.
Compare to <<
ary << obj → ary
Append—Pushes the given object on to the end of this array. This expression returns the array itself, so several appends may be chained together.
[ 1, 2 ] << "c" << "d" << [ 3, 4 ]
#=> [ 1, 2, "c", "d", [ 3, 4 ] ]
The documentation of Array#+ clearly says that a new array is returned (no less than four times, actually). This is consistent with other uses of the + method in Ruby, e.g. Bignum#+, Fixnum#+, Complex#+, Rational#+, Float#+, Time#+, String#+, BigDecimal#+, Date#+, Matrix#+, Vector#+, Pathname#+, Set#+, and URI::Generic#+.
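A quick way to check this yourself is to compare object identity with equal?. The snippet below is a small sketch with made-up variable names; concat is shown as the in-place counterpart the documentation points to.

```ruby
original = [1, 2]

appended = original << 3      # mutates and returns the receiver
p appended.equal?(original)   # true
p original                    # [1, 2, 3]

summed = original + [4]       # builds a new array
p summed.equal?(original)     # false
p original                    # [1, 2, 3]

original.concat([4])          # in-place alternative to +=
p original                    # [1, 2, 3, 4]
```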

Related

Ruby, Delete elements (slices) from a multidimensional array

(Pretty new with Ruby)
I can remove a block of elements from a single-dimensional array
array1D = Array.new(6){|i| i*i}
array1D.slice!(2,2) #=> [4, 9] (array1D is now [0, 1, 16, 25])
len = array1D.length #=> 4
However,
Array(arrayd3d[0][0]).slice!(30000,8880)
on arrayd3d[1][1][38884],
I still get
len = array3D.length #=> 38884
1) What am I doing wrong?
2) How can I delete the same block of elements (30000,8880) from all
arrayd3d[1..nDim1][1..nDim2]?
slice! returns the deleted object:
a = [ "a", "b", "c" ]
a.slice!(1) #=> "b"
a #=> ["a", "c"]
In general, in Ruby we prefer not to alter the original object unless we're looking for some particular performance gain (which is very rare; e.g. you may want to reduce the memory consumption of a very large array before moving on).
That's the reason for the exclamation symbol (! aka bang) which usually indicates some destructive behaviour.
Please consider using the non-bang version instead.
array1D = Array.new(6){ |i| i*i }
y = array1D.slice(2,2)
or
def some_method(input_array)
input_array.slice(2,2)
end
x = Array.new(6){ |i| i*i }
y = some_method(x)
This way your code becomes more predictable as you're not altering the value of your arguments.
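For the second part of the question (removing the same block from every innermost array), one possible sketch follows. The small array3d below is a hypothetical stand-in for the asker's data, and start and length are assumed names. The first variant builds new rows and leaves the input alone; the second uses slice! on each row when in-place deletion is really wanted.

```ruby
# Hypothetical 2 x 2 x 10 array standing in for the real data.
array3d = Array.new(2) { Array.new(2) { (0...10).to_a } }
start, length = 3, 4

# Non-destructive: build new rows without the block [start, start + length).
trimmed = array3d.map do |plane|
  plane.map { |row| row[0...start] + (row[(start + length)..-1] || []) }
end

p trimmed[1][1]         # [0, 1, 2, 7, 8, 9]
p array3d[1][1].length  # 10 (the original is unchanged)

# Destructive: delete the block from every innermost array in place.
array3d.each do |plane|
  plane.each { |row| row.slice!(start, length) }
end

p array3d[1][1]         # [0, 1, 2, 7, 8, 9]
```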

Ruby converting Array of Arrays into Array of Hashes

Please I need a help with this.
In Ruby If I have this array of arrays
array = [["a: 1", "b:2"],["a: 3", "b:4"]]
How can I obtain this array of hashes in ruby
aoh = [{:a => "1", :b => "2"},{:a => "3", :b => "4"}]
Note that, as pointed out in the comments, this is most likely an XY-problem and instead of transforming the array the better option is to build the starting array in a better manner.
Nevertheless, you can do this in the following manner:
aoh = array.map { |pairs| pairs.to_h { |string| string.split(':').map(&:strip) } }
# => [{"a"=>"1", "b"=>"2"}, {"a"=>"3", "b"=>"4"}]
The above will give you string keys, which is the safer option. You can convert them to symbols, but symbols should only be used for trusted identifiers. When the data comes from a user or an external source, I would go for the above.
Converting to symbols can be done by adding the following line:
# note that this line will mutate the aoh contents
aoh.each { |hash| hash.transform_keys!(&:to_sym) }
#=> [{:a=>"1", :b=>"2"}, {:a=>"3", :b=>"4"}]
array = [["a: 1", "b:2"], ["a: 3", "b:4"]]
array.map do |a|
Hash[
*a.flat_map { |s| s.split(/: */) }.
map { |s| s.match?(/\A\d+\z/) ? s : s.to_sym }
]
end
#=> [{:a=>"1", :b=>"2"}, {:a=>"3", :b=>"4"}]
The regular expression /: */ reads, "match a colon followed by zero or more (*) spaces". /\A\d+\z/ reads, "match the beginning of the string (\A) followed by one or more (+) digits (\d), followed by the end of the string (\z)".
The steps are as follows. First, the element array[0] is passed to the block, the block variable a is assigned its value, and the block calculation is performed.
a = array[0]
#=> ["a: 1", "b:2"]
b = a.flat_map { |s| s.split(/: */) }
#=> ["a", "1", "b", "2"]
c = b.map { |s| s.match?(/\A\d+\z/) ? s : s.to_sym }
#=> [:a, "1", :b, "2"]
d = Hash[*c]
#=> {:a=>"1", :b=>"2"}
We see the array ["a: 1", "b:2"] is mapped to {:a=>"1", :b=>"2"}. Next the element array[1] is passed to the block, the block variable a is assigned its value and the block calculation is performed.
a = array[1]
#=> ["a: 3", "b:4"]
b = a.flat_map { |s| s.split(/: */) }
#=> ["a", "3", "b", "4"]
c = b.map { |s| s.match?(/\A\d+\z/) ? s : s.to_sym }
#=> [:a, "3", :b, "4"]
d = Hash[*c]
#=> {:a=>"3", :b=>"4"}
The splat operator (*) causes Hash[*c] to be evaluated as:
Hash[:a, "3", :b, "4"]
See Hash::[].
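As a side note, the same pairing can be written without the splat. This is only a sketch of an alternative, assuming Ruby 2.1 or later for to_h; the variable names are made up for the illustration.

```ruby
c = [:a, "3", :b, "4"]

with_splat  = Hash[*c]               # Hash[:a, "3", :b, "4"]
with_slices = c.each_slice(2).to_h   # pairs [:a, "3"] and [:b, "4"] become entries

p with_splat == with_slices          # true
```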
Loop through your items, loop through each of their items, and build a hash from the resulting pairs:
array.map do |items|
  items.map do |item|
    k, v = item.split(":", 2)
    [k.to_sym, v.strip]
  end.to_h
end
Note that we're using map rather than each, because map returns a new array.

Working with Transpose functions result in error

consider the following array
arr = [["Locator", "Test1", "string1","string2","string3","string4"],
["$LogicalName", "Create Individual Contact","value1","value2"]]
Desired result:
[Test1=>{"string1"=>"value1","string2"=>"value2","string3"=>"","string4"=>""}]
When I do transpose, it gives me the error by saying second element of the array is not the length of the first element in the array,
Uncaught exception: element size differs (2 should be 4)
So is there any way to add an empty string in each place where there is no element, then perform the transpose and create the hash as shown above? The array may consist of many elements with different lengths, but every other inner array has to be padded with empty strings to the size of the first element before I can do the transpose. Is there any way?
It sounds like you might want Enumerable#zip:
headers, *data_rows = arr
headers.zip(*data_rows)
# => [["Locator", "$LogicalName"], ["Test1", "Create Individual Contact"],
# ["string1", "value1"], ["string2", "value2"], ["string3", nil], ["string4", nil]]
If you wish to transpose an array of arrays, each element of the array must be the same size. Here you would need to do something like the following.
arr = [["Locator", "Test1", "string1","string2","string3","string4"],
["$LogicalName", "Create Individual Contact","value1","value2"]]
keys, vals = arr
#=> [["Locator", "Test1", "string1", "string2", "string3", "string4"],
# ["$LogicalName", "Create Individual Contact", "value1", "value2"]]
idx = keys.index("Test1") + 1
#=> 2
{ "Test1" => [keys[idx..-1],
vals[idx..-1].
concat(['']*(keys.size - vals.size))].
transpose.
to_h }
#=> {"Test1"=>{"string1"=>"value1", "string2"=>"value2", "string3"=>"", "string4"=>""}}
It is not strictly necessary to define the variables keys and vals, but that avoids the need to create those arrays multiple times. It reads better as well, in my opinion.
The steps are as follows. Note keys.size #=> 6 and vals.size #=> 4.
a = vals[idx..-1]
#=> vals[2..-1]
#=> ["value1", "value2"]
b = [""]*(keys.size - vals.size)
#=> [""]*(6 - 4)
#=> ["", ""]
c = a.concat(b)
#=> ["value1", "value2", "", ""]
d = keys[idx..-1]
#=> ["string1", "string2", "string3", "string4"]
e = [d, c].transpose
#=> [["string1", "value1"], ["string2", "value2"], ["string3", ""], ["string4", ""]]
f = e.to_h
#=> {"string1"=>"value1", "string2"=>"value2", "string3"=>"", "string4"=>""}
g = { "Test1" => f }
#=> {"Test1"=>{"string1"=>"value1", "string2"=>"value2", "string3"=>"", "string4"=>""}}
Find the longest element in your array and pad every other element to that length: loop over the elements and append (max_length - element.length) empty strings to each.
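That padding approach can be sketched as follows, using the array from the question. The names max_length, padded and columns are assumptions introduced for the illustration.

```ruby
arr = [["Locator", "Test1", "string1", "string2", "string3", "string4"],
       ["$LogicalName", "Create Individual Contact", "value1", "value2"]]

# Length of the longest inner array.
max_length = arr.map(&:length).max

# Pad every row with empty strings up to max_length (non-destructive).
padded = arr.map { |row| row + [""] * (max_length - row.length) }

# All rows now have the same size, so transpose no longer raises.
columns = padded.transpose
p columns.last   # ["string4", ""]
```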

Ruby array += vs push

I have an array of arrays and want to append elements to the sub-arrays. += does what I want, but I'd like to understand why push does not.
Behavior I expect (and works with +=):
b = Array.new(3,[])
b[0] += ["apple"]
b[1] += ["orange"]
b[2] += ["frog"]
b #=> [["apple"], ["orange"], ["frog"]]
With push I get the pushed element appended to EACH sub-array (why?):
a = Array.new(3,[])
a[0].push("apple")
a[1].push("orange")
a[2].push("frog")
a #=> [["apple", "orange", "frog"], ["apple", "orange", "frog"], ["apple", "orange", "frog"]]
Any help on this much appreciated.
The issue here is b = Array.new(3, []) uses the same object as the base value for all the array cells:
b = Array.new(3, [])
b[0].object_id #=> 28424380
b[1].object_id #=> 28424380
b[2].object_id #=> 28424380
So when you use b[0].push, it adds the item to "each" sub-array because they are all, in fact, the same array.
So why does b[0] += ["value"] work? Well, looking at the ruby docs:
ary + other_ary → new_ary
Concatenation — Returns a new array built by concatenating the two arrays together to produce a third array.
[ 1, 2, 3 ] + [ 4, 5 ] #=> [ 1, 2, 3, 4, 5 ]
a = [ "a", "b", "c" ]
c = a + [ "d", "e", "f" ]
c #=> [ "a", "b", "c", "d", "e", "f" ]
a #=> [ "a", "b", "c" ]
Note that
x += y
is the same as
x = x + y
This means that it produces a new array. As a consequence, repeated use of += on arrays can be quite inefficient.
So when you use +=, it replaces the array entirely, meaning the array in b[0] is no longer the same as b[1] or b[2].
As you can see:
b = Array.new(3, [])
b[0].push("test")
b #=> [["test"], ["test"], ["test"]]
b[0].object_id #=> 28424380
b[1].object_id #=> 28424380
b[2].object_id #=> 28424380
b[0] += ["foo"]
b #=> [["test", "foo"], ["test"], ["test"]]
b[0].object_id #=> 38275912
b[1].object_id #=> 28424380
b[2].object_id #=> 28424380
If you're wondering how to ensure each array is unique when initializing an array of arrays, you can do so like this:
b = Array.new(3) { [] }
This different syntax lets you pass a block of code which gets run for each cell to calculate its original value. Since the block is run for each cell, a separate array is created each time.
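The difference between the two forms can be checked directly. The snippet below is a small sketch; shared and distinct are names chosen for the illustration.

```ruby
shared   = Array.new(3, [])     # one array object, referenced three times
distinct = Array.new(3) { [] }  # the block runs once per cell

p shared.map(&:object_id).uniq.size    # 1
p distinct.map(&:object_id).uniq.size  # 3

distinct[0].push("apple")
distinct[1].push("orange")
distinct[2].push("frog")
p distinct  # [["apple"], ["orange"], ["frog"]]
```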
That is because in the second code section you are selecting a sub-array and pushing to it. If you want an array of arrays, you need to push each array onto the main array, starting from an empty one:
a = []
a.push(["apple"])
a.push(["orange"])
a.push(["frog"])
This gives the same result as the first example. Note that if you initialize a with Array.new(3,[]) instead, the three empty arrays remain in front of the pushed elements.

Most efficient way to count duplicated elements between two arrays

As part of a very basic program I am writing in Ruby, I am trying to find the total number of shared elements between two arrays of equal length, but
I need to include repeats.
My current example code for this situation is as follows:
array_a = ["B","A","A","A","B"]
array_b = ["A","B","A","B","B"]
counter = 0
array_a.each_index do |i|
if array_a.sort[i] == array_b.sort[i]
counter += 1
end
end
puts counter
I want the return value of this comparison in this instance to be 4, and not 2, as the two arrays share 2 duplicate characters ("A" twice, and "B" twice). This seems to work, but I am wondering if there are any more efficient solutions for this issue. Specifically whether there are any methods you would suggest looking into. I spoke with someone who suggested a different method, inject, but I really don't understand how that applies and would like to understand. I did quite a bit of reading on uses for it, and it still isn't clear to me how it is appropriate. Thank you.
Looking at my code, I have realized that it doesn't seem to work for the situation that I am describing.
Allow me to reiterate and explain what I think the OP's original intent was:
Given arrays of equal size
array_a = ["B","A","A","A","B"]
array_b = ["A","B","A","B","B"]
We need to show the total number of matching pairs of elements between the two arrays. In other words, each B in array_a will "use up" a B in array_b, and the same will be true for each A. As there are two B's in array_a and three in array_b, this leaves us with a count of 2 for B, and following the same logic, 2 for A, for a sum of 4.
(array_a & array_b).map { |e| [array_a.count(e), array_b.count(e)].min }.reduce(:+)
If we get the intersection of the arrays with &, the result is a list of values that exist in both arrays. We then iterate over each match and take the minimum number of times the element occurs in either array; this is the maximum number of times the element can be "used". All that is left is to total the number of paired elements with reduce(:+).
Changing array_a to ["B", "A", "A", "B", "B"] results in a total of 5, as there are now enough of B to exhaust the supply of B in array_b.
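One edge case worth noting: when the arrays share no elements, reduce(:+) on the empty intersection returns nil rather than 0. A hedged variant using Array#sum avoids that. It assumes Ruby 2.4 or later, and shared_count is a hypothetical helper name, not part of the original answer.

```ruby
# Sum, over each value present in both arrays, the smaller of its two counts.
def shared_count(a, b)
  (a & b).sum { |e| [a.count(e), b.count(e)].min }
end

p shared_count(%w[B A A A B], %w[A B A B B])  # 4
p shared_count(%w[B A A B B], %w[A B A B B])  # 5
p shared_count(%w[A], %w[B])                  # 0
```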
If I understand the question correctly, you could do the following.
Code
def count_shared(arr1, arr2)
arr1.group_by(&:itself).
merge(arr2.group_by(&:itself)) { |_,ov,nv| [ov.size, nv.size].min }.
values.
reduce(0) { |t,o| (o.is_a? Array) ? t : t + o }
end
Examples
arr1 = ["B","A","A","A","B"]
arr2 = ["A","B","A","B","B"]
count_shared(arr1, arr2)
#=> 4 (2 A's + 2 B's)
arr1 = ["B", "A", "C", "C", "A", "A", "B", "D", "E", "A"]
arr2 = ["C", "D", "F", "F", "A", "B", "A", "B", "B", "G"]
count_shared(arr1, arr2)
#=> 6 (2 A's + 2 B's + 1 C + 1 D + 0 E's + 0 F's + 0 G's)
Explanation
The steps are as follows for a slightly modified version of the first example.
arr1 = ["B","A","A","A","B","C","C"]
arr2 = ["A","B","A","B","B","D"]
First apply Enumerable#group_by to both arr1 and arr2:
h0 = arr1.group_by(&:itself)
#=> {"B"=>["B", "B"], "A"=>["A", "A", "A"], "C"=>["C", "C"]}
h1 = arr2.group_by(&:itself)
#=> {"A"=>["A", "A"], "B"=>["B", "B", "B"], "D"=>["D"]}
Prior to Ruby v.2.2, when Object#itself was introduced, you would have to write:
arr.group_by { |e| e }
Continuing,
h2 = h0.merge(h1) { |_,ov,nv| [ov.size, nv.size].min }
#=> {"B"=>2, "A"=>2, "C"=>["C", "C"], "D"=>["D"]}
I will return shortly to explain the above calculation.
a = h2.values
#=> [2, 2, ["C", "C"], ["D"]]
a.reduce(0) { |t,o| (o.is_a? Array) ? t : t + o }
#=> 4
Here Enumerable#reduce (aka inject) merely sums the values of a that are not arrays. The arrays correspond to elements of arr1 that do not appear in arr2 or vice versa.
As promised, I will now explain how h2 is computed. I've used the form of Hash#merge that employs a block (here { |k,ov,nv| [ov.size, nv.size].min }) to compute the values of keys that are present in both hashes being merged. For example, when the first key-value pair of h1 ("A"=>["A", "A"]) is being merged into h0, since h0 also has a key "A", the array
["A", ["A", "A", "A"], ["A", "A"]]
is passed to the block and the three block variables are assigned values (using "parallel assignment", which is sometimes called "multiple assignment"):
k, ov, nv = ["A", ["A", "A", "A"], ["A", "A"]]
so we have
k #=> "A"
ov #=> ["A", "A", "A"]
nv #=> ["A", "A"]
k is the key, ov ("old value") is the value of "A" in h0 and nv ("new value") is the value of "A" in h1. The block calculation is
[ov.size, nv.size].min
#=> [3,2].min = 2
so the value of "A" is now 2.
Notice that the key, k, is not used in the block calculation (which is very common when using this form of merge). For that reason I've changed the block variable from k to _ (a legitimate local variable), both to reduce the chance of introducing a bug and to signal to the reader that the key is not used in the block. The other elements of h2 that use this block are computed similarly.
Another way
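It would be quite simple if we had available an Array method I've proposed be added to the Ruby core. (Note that the proposed method removes one matching element at a time; it is not the same as the built-in Array#difference added in Ruby 2.6, which removes every occurrence, so on current Ruby the code below does not return 4.)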
array_a = ["B","A","A","A","B"]
array_b = ["A","B","A","B","B"]
array_a.size - (array_a.difference(array_b)).size
#=> 4
or
array_a.size - (array_b.difference(array_a)).size
#=> 4
I've cited other applications in my answer here.
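A rough sketch of the behaviour this approach relies on: each element of the argument cancels at most one equal element of the receiver. The helper name multiset_difference is hypothetical and is written as a plain method here, not as the author's proposed implementation.

```ruby
# Remove from a one occurrence for each matching element of b.
def multiset_difference(a, b)
  remaining = Hash.new(0)
  b.each { |e| remaining[e] += 1 }   # how many of each value b can cancel
  a.reject do |e|
    if remaining[e] > 0
      remaining[e] -= 1              # this element of a is cancelled
      true
    else
      false                          # nothing left in b to cancel it
    end
  end
end

array_a = ["B","A","A","A","B"]
array_b = ["A","B","A","B","B"]

p multiset_difference(array_a, array_b)                 # ["A"]
p array_a.size - multiset_difference(array_a, array_b).size  # 4
p array_a.size - multiset_difference(array_b, array_a).size  # 4
```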
This is a perfect job for Enumerable#zip and Enumerable#count:
array_a.zip(array_b).count do |a, b|
a == b
end
# => 2
The zip method pairs up elements, "zippering" them together, and the count method can take a block that decides whether each element should be counted.
The inject method is very powerful, but it's also the most low-level. Pretty much every other Enumerable method can be created with inject if you work at it, so it's quite flexible, but usually a more special-purpose method is better suited. It's still a useful tool if applied correctly.
In this case zip and count do a much better job and if you know what these methods do, this code is self explanatory.
Update:
If you need to count all overlapping letters regardless of order, you need to group them first. Enumerable#group_by is part of core Ruby (it is not limited to ActiveSupport), and chunk on a sorted array works as well.
Here's an approach that counts up all the unique letters, grouping them using chunk:
# Convert each array into a map like { "A" => 2, "B" => 3 }
# with a default count of 0.
counts = [ array_a, array_b ].collect do |a|
Hash.new(0).merge(
Hash[a.sort.chunk { |v| v }.collect { |k, group| [ k, group.length ] }]
)
end
# Iterate over one of the maps key by key and count the minimum
# overlap between the two.
counts[0].keys.inject(0) do |sum, key|
sum + [ counts[0][key], counts[1][key] ].min
end
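On newer Rubies the counting step can be shortened. The following is a sketch under the assumption that Ruby 2.7 or later is available (for Enumerable#tally); the variable names are illustrative.

```ruby
array_a = ["B","A","A","A","B"]
array_b = ["A","B","A","B","B"]

# tally builds a hash of occurrence counts, one entry per distinct letter.
counts_a = array_a.tally
counts_b = array_b.tally

# For each letter in array_a, add the smaller of the two counts
# (0 if the letter is missing from array_b).
overlap = counts_a.sum { |letter, n| [n, counts_b.fetch(letter, 0)].min }
p overlap  # 4
```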

Resources