Elixir-like pipes in Ruby to process collections


In Elixir there is a great pipeline operator working like this:
"hello, world!"
|> String.split(" ")
|> Enum.map(&String.capitalize/1)
|> Enum.join
In Ruby we can use similar syntax:
"hello, world!"
.split(" ")
.map(&:capitalize)
.join
It works only when all of these methods are defined on the object itself. If we need to call some other method, we have to use something like:
.map { |el| URI.parse(el) }
But what if we want to process the collection as a whole (not a single element), for example with GZIP compression:
chars = text
.downcase
.chars
compressed = GZipped.new(chars).bytes
But the chain is broken!
I've found some libraries, but none of them looks great:
pipe_envy - UGLY! no collections
chainable_methods - no collections
How to use chainable_methods
piperator - much better! But looks heavy
In my opinion it would be great to have something like:
text
.split
.pipe(URI.method(:parse))
.map(&:to_s)
.join
.pipe(GZIPped)
.pipe(Base64.method(:encode))
What is the best way to build such pipes in Ruby?
Update 1
Here is an example
class Dedup
def initialize(obj)
@obj = obj
end
def each
Enumerator.new do |y|
prev = nil
@obj.each do |el|
if el != prev
y << el
prev = el
end
end
end
end
end
expect(
"1 1 1 2 2 3"
.split
.then { |obj| Dedup.new(obj).each }
.to_a
).to eq %w[1 2 3]
This chaining looks ugly and unreadable.
Compare it to:
expect(
"1 1 1 2 2 3"
.split
.pipe(Dedup)
.to_a
).to eq %w[1 2 3]

There's already a method like this, at least starting from Ruby 2.5: yield_self, aliased as then in Ruby 2.6. You can use the & operator with any object that responds to to_proc to pass it in place of a block.
text
.split
.map(&URI.method(:parse)) # URI#parse expects a string, not an array
.map(&:to_s)
.join
.then(&GZIPped) # not sure what GZIPped is - I'll assume it has .to_proc method
.then(&Base64.method(:encode))
(I should probably mention that the code above will not actually work, and honestly I have no clue what it is supposed to do. Why would you split a string, convert the pieces into URLs and then back into strings? The only thing this would do is raise if one of the substrings were not a valid URI. But then you try to read the resulting string as a gzipped file... I'm assuming I misunderstood something in your code.)
More advanced stuff: one thing that I quite like in Elixir is the option to chain functions together with their remaining arguments. This can also be simulated in Ruby, but it requires a bit of work and some careful thought about whether it's worth it:
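For what it's worth, here is a variant of the same chain that does run, as a sketch. It swaps the hypothetical GZIPped class for Zlib::Deflate.deflate from the standard library and leaves out the Base64 step; the sample URLs are made up.

```ruby
require "uri"
require "zlib"

# Made-up input, only for illustration.
text = "http://a.example/x http://b.example/y"

compressed = text
  .split                                  # String -> Array of Strings
  .map(&URI.method(:parse))               # element-wise step: use map
  .map(&:to_s)
  .join(" ")
  .then(&Zlib::Deflate.method(:deflate))  # whole-value step: use then

# Round trip to check that nothing was lost.
restored = Zlib::Inflate.inflate(compressed)
```

The rule of thumb is map for a step that applies to each element and then for a step that takes the whole value.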
module MyMath
module_function
UNDEFINED = Object.new
def add(a, b = UNDEFINED)
if b == UNDEFINED
return ->(num) { add(a, num) }
end
a + b
end
end
MyMath.add(2,5) #=> 7
[1,2,5,9].map(&MyMath.add(5)) #=> [6,7,10,14]
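If you don't need the method to live in a module, Ruby's built-in Proc#curry gives a similar effect with less ceremony. A minimal sketch (the names add and add_five are only for illustration):

```ruby
# A two-argument lambda; curry lets us supply the arguments one at a time.
add = ->(a, b) { a + b }

add_five = add.curry[5]             # partially applied: waits for the second argument
sums = [1, 2, 5, 9].map(&add_five)  # each element becomes the second argument
full = add.curry[2, 5]              # supplying both arguments calls it right away
```

Method objects support this as well, e.g. MyMath.method(:add).curry(2), where the explicit arity is needed because of the optional parameter.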

Related

How to collapse a multi-dimensional array of hashes in Ruby?

Background:
Hey all, I am experimenting with external APIs and am trying to pull in all of the followers of a User from a site and apply some sorting.
I have refactored a lot of the code. HOWEVER, there is one part that is giving me a really tough time. I am convinced there is an easier way to implement this than what I have included, and I would be really grateful for any tips on doing this in a more elegant way.
My goal is simple. I want to collapse an array of arrays of hashes (I hope that is the correct way to explain it) into one array of hashes.
Problem Description:
I have an array named f_collections which has 5 elements. Each element is an array of size 200. Each sub-element of these arrays is a hash of about 10 key-value pairs. My best representation of this is as follows:
f_collections = [ collection1, collection2, ..., collection5 ]
collection1 = [ hash1, hash2, ..., hash200]
hash1 = { user_id: 1, user_name: "bob", ...}
I am trying to collapse this multi-dimensional array into one array of hashes. Since there are five collection arrays, this means the results array would have 1000 elements - all of which would be hashes.
followers = [hash1, hash2, ..., hash1000]
Code (i.e. my attempt which I do not want to keep):
I have gotten this to work with a very ugly piece of code (see below), with nested if statements, blocks, for loops, etc... This thing is a nightmare to read and I have tried my hardest to research ways to do this in a simpler way, I just cannot figure out how. I have tried flatten but it doesn't seem to work.
I am mostly just including this code to show I have tried very hard to solve this problem, and while yes I solved it, there must be a better way!
Note: I have simplified some variables to integers in the code below to make it more readable.
for n in 1..5 do
if n < 5
(0..199).each do |j|
if n == 1
nj = j
else
nj = (n - 1) * 200 + j
end
@followers[nj] = @f_collections[n-1].collection[j]
end
else
(0..199).each do |jj|
njj = (4) * 200 + jj
@followers[njj] = @f_collections[n-1].collection[jj]
end
end
end

Oh... so it is not an array of arrays, but an array of objects that each hold a collection of hashes. Kind of. Let's give it another try:
flat = f_collections.map do |col|
col.collection
end.flatten
which can be shortened (and is more performant) to:
flat = f_collections.flat_map do |col|
col.collection
end
This works because the items in the f_collections array are objects that have a collection attribute, which in turn is an array.
So it is an "array of things that have an array that contains hashes".
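To make that concrete, here is a small self-contained sketch. The Page struct is a made-up stand-in for whatever object the API client returns; the only thing assumed about it is that it responds to collection.

```ruby
# Hypothetical stand-in for the API's page object.
Page = Struct.new(:collection)

f_collections = [
  Page.new([{ user_id: 1 }, { user_id: 2 }]),
  Page.new([{ user_id: 3 }])
]

# flat_map calls collection on every page and concatenates the results.
followers = f_collections.flat_map(&:collection)
```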
Old Answer follows below. I leave it here for documentation purpose. It was based on the assumption that the data structure is an array of array of hashes.
Just use #flatten (or #flatten! if you want this to be "inline")
flat = f_collections.flatten
Example
sub1 = [{a: 1}, {a: 2}]
sub2 = [{a: 3}, {a: 4}]
collection = [sub1, sub2]
flat = collection.flatten # returns a new collection
p flat #> [{:a=>1}, {:a=>2}, {:a=>3}, {:a=>4}]
# or use the "inplace"/"destructive" version
collection.flatten! # modifies existing collection
p collection #> [{:a=>1}, {:a=>2}, {:a=>3}, {:a=>4}]
Some recommendations for your existing code:
Do not use for n in 1..5, use Ruby-Style enumeration:
["some", "values"].each do |value|
puts value
end
This way you do not need to hardcode the length (5) of the array (I did not realize you had removed the variables that specify these magic numbers). If you want to detect the last iteration you can use each_with_index:
a = ["some", "home", "rome"]
a.each_with_index do |value, index|
if index == a.length - 1
puts "Last value is #{value}"
else
puts "Values before last: #{value}"
end
end
While #flatten will solve your problem, you might want to see what a DIY solution could look like:
def flatten_recursive(collection, target = [])
collection.each do |item|
if item.is_a?(Array)
flatten_recursive(item, target)
else
target << item
end
end
target
end
Or an iterative solution (that is limited to two levels):
def flatten_iterative(collection)
target = []
collection.each do |sub|
sub.each do |item|
target << item
end
end
target
end
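Note that the built-in #flatten also accepts an optional depth, which gives you the "limited to N levels" behaviour without writing it by hand. A quick sketch with made-up data:

```ruby
nested = [[{ a: 1 }, [{ a: 2 }]], [{ a: 3 }]]

one_level  = nested.flatten(1)  # removes only the outermost level of nesting
all_levels = nested.flatten     # removes every level of nesting
```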

Delete item from array without returning it

I'm trying to write a method that will cause a rspec test like this to pass:
it "starts the thing and move on" do
class.method_1("Name One")
class.method_1("Name Two")
expect(class.method_2).to eq "Some string Name One"
expect(class.method_3).to eq ["Name Two"]
end
method_1 just adds a name to an array, and method_3 returns the array (defined in initialize method):
def method_1(name)
@array << name
end
def method_3
@array
end
I figured it would be pretty simple to interpolate @array[0] into the string and use @array.delete_at(0) to modify the array. Like so:
def method_2
p "Some string #{@array[0]}"
@array.delete_at(0)
end
But that method returns "Name One" instead of the string. If I comment out the delete code, the string returns properly but my array hasn't been modified. I've been in Ruby docs for a long time but #shift has the same issue about returning the removed item.
I'm almost certain I've over complicated this -- what am I missing?

You can collapse all this down to more conventional Ruby like this:
class MyTestClass
attr_reader :array
def initialize
@array = [ ]
end
def push(s)
@array << s
end
def special_shift
"Some string #{@array.shift}"
end
end
Then in terms of usage:
it "starts the thing and move on" do
my_thing.push("Name One")
my_thing.push("Name Two")
expect(my_thing.special_shift).to eq "Some string Name One"
expect(my_thing.array).to eq ["Name Two"]
end
Using names like push and shift, which are consistent with Ruby conventions, makes the purpose and action of a method a lot easier to understand.
When it comes to your implementation of method_2, you forget that you can inline whatever you want inside a #{...} block, even methods that modify things. The p method is meant for display; it does return its argument, but because the p call is not the last expression in your method, that value is discarded. To return something you need to have it either as the last thing evaluated (implicit) or by using return (explicit).
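The implicit-return rule can be seen side by side in a small sketch. NameQueue and its method names are hypothetical and only serve the illustration:

```ruby
# NameQueue and its method names are hypothetical, for illustration only.
class NameQueue
  attr_reader :names

  def initialize
    @names = []
  end

  def add(name)
    @names << name
  end

  # delete_at is the last expression, so the removed element is returned.
  def take_returning_element
    p "Some string #{@names[0]}"
    @names.delete_at(0)
  end

  # The string is stored first and is the last expression, so it is returned.
  def take_returning_message
    message = "Some string #{@names[0]}"
    @names.delete_at(0)
    message
  end
end

queue = NameQueue.new
queue.add("Name One")
queue.add("Name Two")
queue.add("Name Three")

element = queue.take_returning_element  # prints the string, returns the element
message = queue.take_returning_message  # returns the string
remaining = queue.names
```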

Change method_2 to the following to get the array back:
def method_2
p "Some string #{@array[0]}"
@array.delete_at(0)
@array
end
From Array#delete_at on ruby-doc.org:
Deletes the element at the specified index, returning that element, or nil if the index is out of range.
Alternatively, use Object#tap, which returns self:
@array = [1,2,3,4]
#=> [1, 2, 3, 4]
@array.tap {|arr| arr.delete_at(0)}
#=> [2, 3, 4]

Return unique values of an array without using `uniq`

For a challenge, I'm trying to return the unique values of an array without using uniq. This is what I have so far, which doesn't work:
def unique
unique_arr = []
input_arr.each do |word|
if word != unique_arr.last
unique_arr.push word
end
end
puts unique_arr
end
input = gets.chomp
input_arr = input.split.sort
input_arr.unique
My reasoning here was that if I sorted the array first before I iterated through it with each, I could push it to unique_arr without repetition being a possibility considering if it's a duplicate, the last value pushed would match it.
Am I tackling this the wrong way?

Yes, you are making at least two mistakes.
If you want to call it as input_arr.unique with input_arr being an array, then you have to define the method on Array. You have input_arr within your method body, which comes from nowhere.
puts in the last line of your code outputs to the terminal, but makes the method return nil, which makes it behave differently from uniq.
It can be fixed as:
class Array
def unique
unique_arr = []
each do |word|
unique_arr.push(word) unless unique_arr.last == word
end
unique_arr
end
end

A unique array? That sounds like a Set to me:
require 'set'
Set.new([1,2,3,2,3,4]).to_a
#=> [1,2,3,4]

Here's a concise way to do it that doesn't explicitly use functionality from another class but probably otherwise misses the point of the challenge:
class Array
def unique
group_by(&:itself).keys
end
end

I tried these three options, just for the challenge:
class Array
def unique
self.each_with_object({}) { |k, h| h[k] = k }.keys
end
def unique2
self.each_with_object([]) { |k, a| a << k unless a.include?(k) }
end
def unique3
arr = []
self.map { |k| arr << k unless arr.include?(k) }
arr
end
end

Here is one more way to do this:
uniques = a.each.with_object([]) {|el, arr| arr << el if not arr.include?(el)}

That's so easy if you see it this way:
a = [1,1,2,3,4]
h = Hash.new
a.each{|q| h[q] = q}
h.values
and this will return:
[1, 2, 3, 4]

Ruby modify array items and return full array

I have this code here
string.split(/(\w{1,}=)/).each_slice(1).map { |i| items << i }
items.map! do |i|
i = i << str if i.to_s =~ /\w{1,}=/
end
puts items*''
And I want to modify certain items in the array based on regex, then return the full array with the modified items in it. This only returns the modified items. How do I achieve what I'm looking for?
EDIT: Ok, so say I'm trying to split a link using this regex:
page.php?site=blah&id=1
The link is split and added to the array which now contains
page.php?
site=
blah&
id=
1
What I want to do is append some value to the end of the elements ending with a =. This way, when I return the modified array as a string it would output like this:
page.php?site=(newval)&id=(newval)

You have several undefined variables in your example, which is very sloppy.
each_slice(1) is equivalent to each(), so it's not clear why you are using each_slice(1). In any case, both each() and map() step through the items in an Array one by one, but each() returns the original Array unchanged. On the other hand, you use map() when you want to create a new Array that contains changes to the items.
In the regex /\w{1,}/, there is a shortcut for the quantifier {1, }, and it's: +, so most people would write the regex as /\w+/, where + means 1 or more.
I want to modify certain items in the array based on regex, then
return the full array with the modified items in it.
Here is an example:
results = [1, 2, 3].map do |num|
if num == 2
num + 4
else
num - 1
end
end
p results
--output:--
[0, 6, 2]
Your current attempt with map() doesn't return anything if the conditional fails. Note how the example above returns something both when the condition fails AND when the condition succeeds. map() replaces an item with whatever is returned for that item.
Now look at this example:
results = [1, 2, 3].map do |num|
if num == 2
num + 4
end
end
p results
--output:--
[nil, 6, nil]
If you don't return something for an item, then map() will use nil for that item. In the example, if the condition num == 2 is true then num+4 is returned--but if num == 2 is false, nothing is returned.
Edit:
words = %w[
page.php?
site=
blah&
id=
1
] #=> words = ["page.php?", "site=", "blah&", "id=", "1"]
suffix = 'hello'
results = words.map do |word|
if word.end_with?('=')
"#{word}#{suffix}"
else
word
end
end
p results
--output:--
["page.php?", "site=hello", "blah&", "id=hello", "1"]
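Putting the pieces together end to end, as a rough sketch: split with a capturing group so the key= parts are kept in the result, append to those parts with map, then join. The variable names are mine, not from the question.

```ruby
url = "page.php?site=blah&id=1"
suffix = "hello"

# The capturing group keeps the matched "key=" pieces in the result.
parts = url.split(/(\w+=)/)

# map always returns a value, so unmodified parts pass through unchanged.
rebuilt = parts.map { |part| part.end_with?("=") ? "#{part}#{suffix}" : part }.join
```

Note that this appends the suffix in front of the existing value (site=helloblah); if the old value should be replaced instead, the value parts would need to be dropped or rewritten in the same map.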

Instead of parsing a URL with a regex, have you considered using the addressable gem?
require 'addressable/uri'
uri = Addressable::URI.parse('page.php?site=blah&id=1&bar')
uri.query_values = uri.query_values.map do |k, v|
[k, v.is_a?(String) ? v << 'foo' : v]
end
puts uri.to_s # => page.php?site=blahfoo&id=1foo&bar
This won't handle very complex query parameters (it will just pass them through).
You can use respond_to? :sub! and v.sub! /$/, 'foo' instead of checking types if that makes you uneasy. (I wouldn't use :<< or :concat because those are valid methods for Arrays.)

Modify hashes in an array based on another array

I have two arrays like this:
a = [{'one'=>1, 'two'=>2},{'uno'=>1, 'dos'=>2}]
b = ['english', 'spanish']
I need to add a key-value pair to each hash in a to get this:
a = [{'one'=>1, 'two'=>2, 'language'=>'english'},{'uno'=>1, 'dos'=>2, 'language'=>'spanish'}]
I attempted this:
(0..a.length).each {|c| a[c]['language']=b[c]}
and it does not work. With this:
a[1]['language']=b[1]
(0..a.length).each {|c| puts c}
an error is shown:
NoMethodError (undefined method '[]=' for nil:NilClass)
How can I fix this?

a.zip(b){|h, v| h["language"] = v}
a # => [
# {"one"=>1, "two"=>2, "language"=>"english"},
# {"uno"=>1, "dos"=>2, "language"=>"spanish"}
# ]

When the each iterator over your Range reaches the last element (i.e. a.length), you will attempt to access a nonexisting element of a.
In your example, a.length is 2, so on the last iteration of your each, you will attempt to access a[2], which doesn't exist. (a only contains 2 elements, with indices 0 and 1.) a[2] evaluates to nil, so you will now attempt to call nil['language']=b[2], which is syntactic sugar for nil.[]=('language', b[2]), and since nil doesn't have a []= method, you get a NoMethodError.
The immediate fix is to not iterate off the end of a, by using an exclusive Range:
(0...a.length).each {|c| a[c]['language'] = b[c] }
By the way, the code you posted:
(0..a.length).each {|c| puts c }
should clearly have shown you that you iterate till 2 instead of 1.
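The difference between the two range forms is easy to check in isolation. A small sketch with shortened sample data:

```ruby
a = [{ "one" => 1 }, { "uno" => 1 }]

inclusive = (0..a.length).to_a   # two dots: includes a.length itself
exclusive = (0...a.length).to_a  # three dots: stops before a.length
past_end  = a[a.length]          # indexing one past the end gives nil
indices   = a.each_index.to_a    # lets the array supply its own indices
```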
That's only the immediate fix, however. The real fix is to simply never iterate over a datastructure manually. That's what iterators are for.
Something like this, where Ruby will keep track of the index for you:
a.each_with_index do |hsh, i| hsh['language'] = b[i] end
Or, without fiddling with indices at all:
a.zip(b.zip(['language'].cycle).map(&:reverse).map(&Array.method(:[])).map(&:to_h)).map {|x, y| x.merge(y) }
[Note: this last one doesn't mutate the original Arrays and Hashes unlike the other ones.]
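A shorter way to get new hashes without fiddling with indices, as a sketch: zip pairs each hash with its language, and Hash#merge (without the bang) returns a copy rather than changing the receiver.

```ruby
a = [{ "one" => 1, "two" => 2 }, { "uno" => 1, "dos" => 2 }]
b = ["english", "spanish"]

# merge returns a new hash each time, so the hashes in a are left as they were.
tagged = a.zip(b).map { |hash, lang| hash.merge("language" => lang) }
```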

The problem you're having is that your (0..a.length) is inclusive. a.length = 2 so you want to modify it to be 0...a.length which is exclusive.
On a side note, you could use Array#each_with_index like this so you don't have to worry about the length and so on.
a.each_with_index do |hash, index|
hash['language'] = b[index]
end

Here is another method you could use
b.each_with_index.with_object(a) do |(lang,i),obj|
obj[i]["language"] = lang
obj
end
#=>[
{"one"=>1, "two"=>2, "language"=>"english"},
{"uno"=>1, "dos"=>2, "language"=>"spanish"}
]
What this does is create an Enumerator for b that yields [element, index] pairs, and then call with_object on it using a as the object. It iterates over the Enumerator, passing in each language and its index along with the a object. It then uses the index from b to find the matching hash in a and adds a language key to that hash, set to the language.
Please know this is a destructive method where the objects in a will mutate during the process. You could make it non destructive using with_object(a.map(&:dup)) this will dup the hashes in a and the originals will remain untouched.
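Spelled out, the non-destructive variant might look like this. It is only a sketch, and note that dup is a shallow copy, which is enough here because the values are plain integers and strings.

```ruby
a = [{ "one" => 1, "two" => 2 }, { "uno" => 1, "dos" => 2 }]
b = ["english", "spanish"]

# with_object returns the object it was given, here the array of copies.
copies = b.each_with_index.with_object(a.map(&:dup)) do |(lang, i), obj|
  obj[i]["language"] = lang
end
```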
All that being said I think YAML would be better suited for a task like this but I am not sure what your constraints are. As an example:
yml = <<YML
-
  one: 1
  two: 2
  language: "english"
-
  uno: 1
  dos: 2
  language: "spanish"
YML
require 'yaml'
YAML.load(yml)
#=>[
{"one"=>1, "two"=>2, "language"=>"english"},
{"uno"=>1, "dos"=>2, "language"=>"spanish"}
]
Although, using YAML, I would change the structure for the numbers to be more like language => Array of number words by index, e.g. {"english" => ["zero","one","two"]}. That way you can access them like numbers["english"][0] #=> "zero"
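A minimal sketch of that alternative structure; the Spanish entries and the variable name numbers are my own additions for illustration:

```ruby
require "yaml"

# Each language maps to a list of number words, indexed by the number itself.
yml = <<~YML
  english:
    - zero
    - one
    - two
  spanish:
    - cero
    - uno
    - dos
YML

numbers = YAML.load(yml)
english_zero = numbers["english"][0]
spanish_two  = numbers["spanish"][2]
```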