For example I have an array:
[[nil, nil], [1, 2], [nil, nil], [nil, nil]]
What is the best way to clean it? Array must have only arrays which do not consist of nil. After cleaning it has to be:
[[1,2]]
Something like:
[[nil, nil], [1, 2], [nil, nil], [nil, nil]].each {|x| x - [nil]}
The method on arrays that removes nil elements is called compact. However, that alone is not quite enough for this situation, because you have an array of arrays. In addition you will want to select the non-nil sub-arrays, or reject the all-nil sub-arrays. You can easily combine the two in the following way:
[[nil, nil], [1, 2], [nil, nil], [nil, nil]].reject { |arr| arr.compact.empty? }
This will only work if your sub-arrays contain numbers OR nils, but not both. If a sub-array contains both, e.g. [1, nil, 2], this solution will keep the entire sub-array.
It is possible to mutate the sub-arrays while you iterate over them, although mutating while iterating is generally considered bad practice. Nevertheless, the way to do this would be to use the bang version of the compact method, which mutates the original object:
.reject { |arr| arr.compact!&.empty? }
This will take [[1, 2, nil, 3]] and return [[1, 2, 3]]. Note the safe navigation operator (&.): compact! returns nil when there was nothing to remove, so plain arr.compact!.empty? would raise NoMethodError on a sub-array that contains no nils.
As sagarpandya82 pointed out, you can also use the all? or any? methods to simply check whether everything is nil, or whether anything is nil, instead of removing the nils.
To recap:
original_array = [[nil, nil],[1, nil, 2], [1, 2, 3]]
original_array.reject { |arr| arr.all?(&:nil?) } # returns [[1, nil, 2], [1, 2, 3]]
original_array.reject { |arr| arr.compact.empty? } # returns [[1, nil, 2], [1, 2, 3]]
original_array.reject { |arr| arr.any?(&:nil?) } # returns [[1, 2, 3]]
original_array.reject { |arr| arr.compact!&.empty? } # returns [[1, 2], [1, 2, 3]] (and mutates the sub-arrays)
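Note that compact! returns the array only when it actually removed something; otherwise it returns nil. A minimal sketch of this behaviour:

```ruby
changed   = [1, nil, 2].compact!  # nils removed, so the (mutated) array is returned
unchanged = [1, 2].compact!       # nothing to remove, so nil is returned

p changed    #=> [1, 2]
p unchanged  #=> nil
```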
Assuming you're only interested in 2D-Arrays then:
Rid sub-arrays consisting of only nils:
arr.reject { |sub| sub.all?(&:nil?) }
Rid sub-arrays consisting of any nils:
arr.reject { |sub| sub.any?(&:nil?) }
compact will remove nil elements from an array.
map will run over each item in an array and return a fresh array by applying the given code to each item. Note that in your example the elements of the array are themselves arrays.
reject will return a new array without the elements for which your given code returns true.
select will return a new array with only the elements for which your given code returns true (the opposite of reject).
So if you just want to remove all nils from an array and its subarray (but not sub-subarrays), you could call
list = [[1,2], [nil], [1,nil,2]]
list.map(&:compact).reject(&:empty?) #=> [[1,2], [1,2]]
which is the same as
compacted_list = list.map do |element|
  element.compact
end

non_empty_list = compacted_list.reject do |element|
  element.empty?
end
If you want to remove all [nil, nil] entries from the list/array
list.reject{|element| element == [nil, nil]}
or if it is more about selecting the non-nil elements (this is really just about intent-revealing code)
list.select{|element| element != [nil, nil]}
Most of these methods have a ! counterpart (like reject!) which does the modification in place, which means you do not have to assign the return value (as in new_list = old_list.reject { ... }).
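One caveat with these bang methods: like compact!, reject! returns nil when it made no change, so don't chain off its return value. A small sketch:

```ruby
list = [[1, 2], [nil], [1, nil, 2]]

list.reject! { |element| element == [nil] }  # something was removed: returns the list
p list  #=> [[1, 2], [1, nil, 2]]

result = list.reject! { |element| element == [nil] }  # nothing to remove this time
p result  #=> nil
```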
There appear to be differing interpretations of the question.
If, as suggested by the question's example, all elements (arrays) that contain one nil contain only nils, and those elements are to be excluded, this would do that:
[[nil, nil], [1, 2], [nil, nil], [nil, nil]].select(&:first)
#=> [[1, 2]]
If all elements that contain at least one nil are to be excluded, this would do that:
[[3, nil], [1, 2], [3, 4, 5, nil]].reject { |a| a.any?(&:nil?) }
#=> [[1, 2]]
If all nils are to be removed from each element, this would do that:
[[3, nil], [1, 2], [nil], [nil, 3, 4]].map(&:compact)
#=> [[3], [1, 2], [], [3, 4]]
If all nils are to be removed from each element, and then all empty arrays are to be removed, this would do that:
[[3, nil], [1, 2], [nil], [nil, 3, 4]].map(&:compact).reject(&:empty?)
#=> [[3], [1, 2], [3, 4]]
I was recently looking at facets, a Ruby gem that provides a lot of core Ruby language extensions.
One of the examples they give is for the Array#recurse method, I'll show it below:
arr = ["a", ["b", "c", nil], nil]
arr.recurse{ |a| a.compact! }
#=> ["a", ["b", "c"]]
This gets about half the job done in your case: you also want to remove the resulting empty sub-arrays.
Facets works by patching core Ruby methods. That means as soon as you run require 'facets/array/recurse', any previously defined Array#recurse method will be overridden. Patching core methods is generally ill-advised because of the possibility of naming conflicts.
Still, it's a useful method, and it's easy to define it in such a way that it takes an array as an argument instead of operating on the value of self. You can then use it to define two methods which together fulfill your purpose:
module ArrayUtils
  def recurse(array, *types, &block)
    types = [array.class] if types.empty?
    a = array.reduce([]) do |arr, value|
      case value
      when *types
        arr << recurse(value, *types, &block)
      else
        arr << value
      end
      arr
    end
    yield a
  end

  def recursive_compact(array)
    recurse(array, &:compact)
  end

  def recursive_remove_nonempty(array)
    recurse(array) do |arr|
      arr.reject do |x|
        x.is_a?(Array) && x.empty?
      end
    end
  end
end
Testing it out:
include ArrayUtils
orig = [[nil, nil], [1, 2], [nil, nil], [nil, nil]]
compacted = recursive_compact(orig)
nonempties = recursive_remove_nonempty(compacted)
puts "original: #{orig.inspect}"
puts "compacted: #{compacted.inspect}"
puts "nonempties: #{nonempties.inspect}"
and running it prints:
original: [[nil, nil], [1, 2], [nil, nil], [nil, nil]]
compacted: [[], [1, 2], [], []]
nonempties: [[1, 2]]
I need to prepare some data for graphing and analysis and could use some advice on transforming that data.
I have an array of arrays. Each sub-array is a set of gathered data with the last element representing the most recent data for all. Earlier elements represent historical data. The sub-arrays have variable amounts of history. I would like to process the arrays so that the current data (last element in each sub-array) lines up with the right boundary.
For example:
[[2], [3, 5, 8, 9], [2, 10]]
should be transformed to, or processed as if it were:
[[nil, nil, nil, 2], [3, 5, 8, 9], [nil, nil, 2, 10]]
I would prefer not to mutate the original data if possible, but can deal with it if it helps (I would just full_dup the array first and work on the copy)
candidate_matrix = [[2], [3, 5, 8, 9], [2, 10]]
row_size = candidate_matrix.map(&:size).max
candidate_matrix.map { |numbers| [nil] * (row_size - numbers.size) + numbers }
# => [[nil, nil, nil, 2], [3, 5, 8, 9], [nil, nil, 2, 10]]
Array#fill
As discussed in this answer, you can use Array#fill:
m = arr.max_by(&:size).size
arr.map { |s| s.reverse.fill(nil, s.size, m - s.size).reverse }
#=> [[nil, nil, nil, 2], [3, 5, 8, 9], [nil, nil, 2, 10]]
Array#insert
Or, more semantically, you can use Array#insert:
arr.map { |s| s.dup.insert(0, [nil] * (m - s.size)).flatten }
#=> [[nil, nil, nil, 2], [3, 5, 8, 9], [nil, nil, 2, 10]]
Kernel#loop
arr.map { |sub|
  sub = sub.dup
  loop {
    break if sub.size >= m
    sub.insert 0, nil
  }
  sub
}
#=> [[nil, nil, nil, 2], [3, 5, 8, 9], [nil, nil, 2, 10]]
You can use while or until loops similarly, which I think read more nicely, but then you'd face the wrath of the idiom police. Also, using unshift or insert requires a dup, which isn't ideal.
The reason I've used insert 0, nil here instead of just unshift nil is that if you wanted to left-justify instead, all you would have to do is replace the 0 with a -1 in insert's first argument.
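For instance, a sketch of that left-justified variant (right-padding with nil), using the same loop shape:

```ruby
arr = [[2], [3, 5, 8, 9], [2, 10]]
m = arr.map(&:size).max

left_justified = arr.map { |sub|
  sub = sub.dup
  sub.insert(-1, nil) while sub.size < m  # -1 appends; 0 would prepend
  sub
}
p left_justified
#=> [[2, nil, nil, nil], [3, 5, 8, 9], [2, 10, nil, nil]]
```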
I would like to get all the possible partitions (disjoint subsets whose union is the original set) of a multiset (where some elements are equal and indistinguishable from each other).
A simpler case is yielding the partitions of a simple set, in which no element has multiplicity, in other words all elements are different. For that scenario I found this Ruby code on Stack Overflow, which is very efficient, as it does not store all the possible partitions but yields them to a block:
def partitions(set)
  yield [] if set.empty?
  (0 ... 2 ** set.size / 2).each do |i|
    parts = [[], []]
    set.each do |item|
      parts[i & 1] << item
      i >>= 1
    end
    partitions(parts[1]) do |b|
      result = [parts[0]] + b
      result = result.reject do |e|
        e.empty?
      end
      yield result
    end
  end
end
Example:
partitions([1,2,3]){|e| puts e.inspect}
outputs:
[[1, 2, 3]]
[[2, 3], [1]]
[[1, 3], [2]]
[[3], [1, 2]]
[[3], [2], [1]]
There are 5 different partitionings of the set [1,2,3] (the Bell number: https://en.wikipedia.org/wiki/Bell_number).
However, when the set is in fact a multiset, i.e. contains elements with multiplicity, the above code of course doesn't work:
partitions([1,1,2]){|e| puts e.inspect}
outputs:
[[1, 1, 2]]
[[1, 2], [1]] *
[[1, 2], [1]] *
[[2], [1, 1]]
[[2], [1], [1]]
One can see two identical partitions, denoted with *, which should be yielded only once.
My question is: how can I modify the partitions() method to work with multisets too, or how can I filter out the identical partitionings (duplications) in an efficient way? Do those identical partitionings always appear consecutively?
My goal is to organize images with different aspect ratios into a montage, and the picture rows of the montage would be those set partitions. I would like to minimize the difference in heights between the picture rows (or, equivalently, the standard deviation) among the possible partitionings, but often there are pictures with the same aspect ratio, which is why I am dealing with a multiset.
Yielding not partitions but power sets (all possible subsets) of a multiset, filtering out the duplicates by simple memoization:
Montage optimization by backtracking on YouTube
You could put it in an array and use uniq:
arr = []
partitions([1,1,2]) { |e| arr << e }
puts arr.to_s
#-> [[[1, 1, 2]], [[1, 2], [1]], [[1, 2], [1]], [[2], [1, 1]], [[2], [1], [1]]]
puts arr.uniq.to_s
#-> [[[1, 1, 2]], [[1, 2], [1]], [[2], [1, 1]], [[2], [1], [1]]]
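In the spirit of the memoization idea: instead of collecting everything and calling uniq at the end, you can skip duplicates as they are yielded, by remembering a canonical form of each partition (each part sorted, then the parts sorted) in a Set. A sketch wrapping the partitions method from the question (its definition is repeated here so the snippet is self-contained):

```ruby
require 'set'

# the partitions method from the question, unchanged
def partitions(set)
  yield [] if set.empty?
  (0 ... 2 ** set.size / 2).each do |i|
    parts = [[], []]
    set.each do |item|
      parts[i & 1] << item
      i >>= 1
    end
    partitions(parts[1]) do |b|
      result = ([parts[0]] + b).reject(&:empty?)
      yield result
    end
  end
end

# yield each partition of a multiset only once, by memoizing a canonical form
def uniq_partitions(set)
  seen = Set.new
  partitions(set) do |partition|
    key = partition.map(&:sort).sort   # canonical form: sorted parts, sorted
    yield partition if seen.add?(key)  # Set#add? returns nil if already present
  end
end

uniq_partitions([1, 1, 2]) { |e| puts e.inspect }
```

This keeps the yielding (non-storing) character of the original, at the memory cost of one Set entry per distinct partition.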
Say I have an array arr = [7,0,4,-7] and I'd like to get the pair of indexes [i, i2] where arr[i] + arr[i2] == 0. In the example, the answer would be [0, 3]. What would be the idiomatic and efficient way to do so?
Here's the best I've gotten so far. I'm sure it's not the best way to do it. Previously I was using two while loops but I feel like this isn't any better.
> nums = [7,0,4,-7]
> nums.each_index do |n1|
    (nums.length).times do |n2|
      return [n1, n2] if nums[n1] + nums[n2] == 0
    end
  end
> [0, 3]
The following code will find you all pairs of elements where the sum is zero.
arr = [7,0,4,-7, -4, 5]
zero_sum = arr.combination(2).select { |pair| pair.first + pair.last == 0 }
zero_sum #=> [[7, -7], [4, -4]]
You can then find the indexes of these elements this way:
zero_sum.map { |pair| [arr.index(pair.first), arr.index(pair.last)] } #=> [[0, 3], [2, 4]]
If you need just one pair, use find instead of select:
arr.combination(2)
.find { |first, last| first + last == 0 } #=> [7, -7]
.map { |num| arr.index(num) } # =>[0, 3]
The following method requires only a single pass through the array. It returns all pairs of indices of elements of the array that sum to zero.
Code
def zero_summing_pairs(arr)
  processed = {}
  arr.each_with_index.with_object([]) do |(n,i),pairs|
    processed[-n].each { |j| pairs << [j,i] } if processed.key?(-n)
    (processed[n] ||= []) << i
  end
end
Examples
zero_summing_pairs [7,0,4,-7]
#=> [[0, 3]]
zero_summing_pairs [7,4,0,7,4,0,-7,-4,-7]
#=> [[2, 5], [0, 6], [3, 6], [1, 7], [4, 7], [0, 8], [3, 8]]
The associated values are as follows.
arr = [7,0,4,-7]
zero_summing_pairs(arr).map { |i,j| [arr[i], arr[j]] }
#=> [[7, -7]]
arr = [7,4,0,7,4,0,-7,-4,-7]
zero_summing_pairs(arr).map { |i,j| [arr[i], arr[j]] }
#=> [[0, 0], [7, -7], [7, -7], [4, -4], [4, -4], [7, -7], [7, -7]]
Explanation
pairs is the array of pairs of indices of values of arr that sum to zero. pairs is the object that is returned by the method.
processed is a hash whose keys are the values of arr that have been processed by the block. The value of each key k is an array of the indices i of arr that have been processed by the block and for which arr[i] == k. I chose a hash structure for fast key lookup.
The line
(processed[n] ||= []) << i
requires explanation. Firstly, this is shorthand for
processed[n] = (processed[n] || []) << i
If processed has a key n (whose value is not nil), the value of that key on the right side of the above expression is a non-empty array containing indices i for which arr[i] == n, so the above expression reduces to
processed[n] = processed[n] << i
and the index i is added to the array. If processed does not have a key n, processed[n] equals nil, so the expression becomes
processed[n] = (processed[n] || []) << i
= (nil || []) << i
= [] << i
= [i]
In other words, here the value of key n is made an empty array and then i is appended to that array.
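The (hash[key] ||= []) << value idiom is a common way to build a hash of arrays in general; a minimal standalone sketch (the word list is made up for illustration):

```ruby
words = %w[apple avocado banana blueberry cherry]

by_letter = {}
words.each do |w|
  (by_letter[w[0]] ||= []) << w  # create the bucket on first encounter, then append
end

p by_letter
#=> {"a"=>["apple", "avocado"], "b"=>["banana", "blueberry"], "c"=>["cherry"]}
```

An alternative is Hash.new { |h, k| h[k] = [] }, which creates the bucket via a default block instead.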
Let's now step through the code for
arr = [7,0,4,-7]
processed = {}
enum0 = arr.each_with_index
#=> #<Enumerator: [7, 0, 4, -7]:each_with_index>
We can see the values that will be generated by this enumerator by converting it to an array.
enum0.to_a
#=> [[7, 0], [0, 1], [4, 2], [-7, 3]]
Continuing,
enum1 = enum0.with_object([])
#=> #<Enumerator: #<Enumerator: [7, 0, 4, -7]:each_with_index>:with_object([])>
enum1.to_a
#=> [[[7, 0], []], [[0, 1], []], [[4, 2], []], [[-7, 3], []]]
If you examine the return value for the definition of enum1, you will see that it can be thought of as a "compound" enumerator. The empty arrays (corresponding to the block variable pairs) will be filled in as the calculations are performed.
The first value of enum1 is generated and passed to the block, and the three block variables are assigned values using parallel assignment (aka multiple assignment) and disambiguation (aka decomposition).
(n,i), pairs = enum1.next
#=> [[7, 0], []]
n #=> 7
i #=> 0
pairs #=> []
As
processed.key?(-n)
#=> processed.key?(-7)
#=> false
the first line of the block is not executed. The second line of the block is
(processed[n] ||= []) << i
#=> (processed[7] ||= []) << 0
#=> [] << 0
#=> [0]
so now
processed
#=> {7=>[0]}
pairs
#=> []
The remaining three elements generated by enum1 are processed similarly.
(n,i), pairs = enum1.next
#=> [[0, 1], []]
processed.key?(-n)
#=> processed.key?(0)
#=> false
(processed[n] ||= []) << i
#=> (processed[0] ||= []) << 1
#=> [] << 1
#=> [1]
processed
#=> {7=>[0], 0=>[1]}
pairs
#=> []
(n,i), pairs = enum1.next
#=> [[4, 2], []]
processed.key?(-n)
#=> processed.key?(-4)
#=> false
(processed[n] ||= []) << i
#=> (processed[4] ||= []) << 2
#=> [] << 2
#=> [2]
processed
#=> {7=>[0], 0=>[1], 4=>[2]}
pairs
#=> []
(n,i), pairs = enum1.next
#=> [[-7, 3], []]
processed.key?(-n)
# processed.key?(7)
#=> true
processed[-n].each { |j| pairs << [j,i] }
# processed[7].each { |j| pairs << [j,3] }
#=> [0]
(processed[n] ||= []) << i
#=> (processed[-7] ||= []) << 3
#=> [] << 3
#=> [3]
processed
#=> {7=>[0], 0=>[1], 4=>[2], -7=>[3]}
pairs
#=> [[0, 3]]
Notice that the last value generated by enum1 is the first to have a match in processed, so is treated differently than the previous values in the block calculation. Lastly,
(n,i), pairs = enum1.next
#=> StopIteration: iteration reached an end (an exception)
causing pairs to be returned from the block and therefore from the method.
Here is one way to do this. It uses Array#combination, an approach similar to the other answer by @kallax, but works on combinations of indices instead of combinations of elements:
arr = [7,0,4,-7]
(0...arr.size).to_a.combination(2).select {|i| arr[i.first] + arr[i.last] == 0}
#=> [[0, 3]]
Is there any way to use sort_by and make nils appear at the front?
For example
[-1, 2, 3, nil, nil].sort_by &some_block
should give
#=> [nil, nil, -1, 2, 3]
It's similar to this question but the solution there does not work with negative values.
You can use Float::INFINITY if your other values are numeric:
[-1, 2, 3, nil, nil].sort_by { |n| n || -Float::INFINITY }
#=> [nil, nil, -1, 2, 3]
Another way to write this is:
sort_by { |n| n ? n : -Float::INFINITY }
or more explicitly regarding nil:
sort_by { |n| n.nil? ? -Float::INFINITY : n }
> [-1, 2, 3, nil, nil].sort_by { |x| [x.nil? ? 0 : 1, x] }
=> [nil, nil, -1, 2, 3]
This avoids comparing an Integer with nil by taking advantage of the short-circuiting behaviour of Array#<=>: the tuples differ in their first element whenever one value is nil and the other is not, so the nil itself is never compared.
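A nice property of the tuple key (compared with the -Float::INFINITY trick) is that it also works when the non-nil values aren't numeric, e.g. strings; a sketch (the names are made up):

```ruby
names = ["carol", nil, "alice", nil, "bob"]

# nil keys compare as [0, nil], non-nil keys as [1, value]; the flag decides first
sorted = names.sort_by { |x| [x.nil? ? 0 : 1, x] }
p sorted
#=> [nil, nil, "alice", "bob", "carol"]
```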
I suggest
def sort_by_with_val_first(arr, val=nil)
  ([val] * arr.count(val)).concat((arr - [val]).sort_by { |e| yield e })
end
arr = [-1, 2, 3, nil, nil, -4]
sort_by_with_val_first(arr) { |x| x.abs }
#=> [nil, nil, -1, 2, 3, -4]
I like how this reads: "Create an array comprised of the elements val in arr, then concatenate this array with arr with elements val removed, sorted as desired". Converting val in the sort_by to an artificial value that will make it work is, to me, aesthetically displeasing.
I have a CSV data file with rows that may have lots of columns 500+ and some with a lot less. I need to transpose it so that each row becomes a column in the output file. The problem is that the rows in the original file may not all have the same number of columns so when I try the transpose method of array I get:
`transpose': element size differs (12 should be 5) (IndexError)
Is there an alternative to transpose that works with uneven array length?
I would insert nils to fill the holes in your matrix, something like:
a = [[1, 2, 3], [3, 4]]
# This would throw the error you're talking about
# a.transpose
# Largest row
size = a.max { |r1, r2| r1.size <=> r2.size }.size
# Enlarge matrix inserting nils as needed
a.each { |r| r[size - 1] ||= nil }
# So now a == [[1, 2, 3], [3, 4, nil]]
aa = a.transpose
# aa == [[1, 3], [2, 4], [3, nil]]
# Intitial CSV table data
csv_data = [ [1,2,3,4,5], [10,20,30,40], [100,200] ]
# Finding max length of rows
row_length = csv_data.map(&:length).max
# Inserting nil to the end of each row
csv_data.each do |row|
  (row_length - row.length).times { row.insert(-1, nil) }
end
# Let's check
csv_data
# => [[1, 2, 3, 4, 5], [10, 20, 30, 40, nil], [100, 200, nil, nil, nil]]
# Transposing...
transposed_csv_data = csv_data.transpose
# Hooray!
# => [[1, 10, 100], [2, 20, 200], [3, 30, nil], [4, 40, nil], [5, nil, nil]]
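As an aside, if the longest row happens to come first, Array#zip can transpose directly, because zip pads its shorter argument arrays with nil automatically. A sketch; with rows in arbitrary order you would still need to pad as above, since zip truncates to the receiver's length:

```ruby
csv_data = [ [1,2,3,4,5], [10,20,30,40], [100,200] ]

# Use the first (longest) row as the zip receiver
first, *rest = csv_data
transposed = first.zip(*rest)
p transposed
#=> [[1, 10, 100], [2, 20, 200], [3, 30, nil], [4, 40, nil], [5, nil, nil]]
```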