Right justifying array contents - arrays

I need to prepare some data for graphing and analysis and could use some advice on transforming that data.
I have an array of arrays. Each sub-array is a set of gathered data with the last element representing the most recent data for all. Earlier elements represent historical data. The sub-arrays have variable amounts of history. I would like to process the arrays so that the current data (last element in each sub-array) lines up with the right boundary.
For example:
[[2], [3, 5, 8, 9], [2, 10]]
should be transformed to, or processed as if it were:
[[nil, nil, nil, 2], [3, 5, 8, 9], [nil, nil, 2, 10]]
I would prefer not to mutate the original data if possible, but can deal with it if it helps (I would just full_dup the array first and work on the copy)

candidate_matrix = [[2], [3, 5, 8, 9], [2, 10]]
row_size = candidate_matrix.map(&:size).max
candidate_matrix.map { |numbers| [nil] * (row_size - numbers.size) + numbers }
# => [[nil, nil, nil, 2], [3, 5, 8, 9], [nil, nil, 2, 10]]

Array#fill
As discussed in this answer, you can use Array#fill:
m = arr.max_by(&:size).size
arr.map { |s| s.reverse.fill(nil, m..m-1).reverse }
#=> [[nil, nil, nil, 2], [3, 5, 8, 9], [nil, nil, 2, 10]]
Array#insert
Or, for a more semantic answer, you can use Array#insert:
arr.map { |s| s.dup.insert(0, [nil] * (m - s.size)).flatten }
#=> [[nil, nil, nil, 2], [3, 5, 8, 9], [nil, nil, 2, 10]]
Kernel#loop
arr.map { |sub|
sub = sub.dup
loop {
break if sub.size >= m
sub.insert 0, nil
}
sub
}
#=> [[nil, nil, nil, 2], [3, 5, 8, 9], [nil, nil, 2, 10]]
You can use while or until loops similarly, which I think look nicer, but then you'd face the wrath of the idiom-police. Also, using unshift or insert requires a dup, which isn't ideal.
The reason I've used insert 0, nil here instead of just unshift nil is because say you wanted to left-justify instead, all you would have to do is replace the 0 with a -1 in insert's first argument.
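The padding idea from the answers above can be wrapped in a small non-mutating helper. This is only a sketch: the name justify and its side: and pad: keywords are assumptions made for illustration and do not come from any of the answers.

```ruby
# Hypothetical helper: pads every row to the width of the longest row.
# Rows are never mutated; each result row is a new array.
def justify(rows, side: :right, pad: nil)
  width = rows.map(&:size).max.to_i # 0 when rows is empty
  rows.map do |row|
    padding = Array.new(width - row.size, pad)
    side == :right ? padding + row : row + padding
  end
end

data  = [[2], [3, 5, 8, 9], [2, 10]]
right = justify(data)               # newest value ends up in the last column
left  = justify(data, side: :left)  # oldest value ends up in the first column
```

Because the helper builds new arrays with +, there is no need to dup the input first.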

Related

Splitting an array using each_slice but keeping specific value(s) in resultant arrays (one liner)

I'm hoping to get an array into a specific format and do so in a one-liner if possible. Using each_slice(2).to_a is helpful in splitting the array, however I would like to keep the second coordinate of each array as the first coordinate of the next array.
Initial Array
# format is [x1, y1, x2, y2, x3, y3, x4, y4]...
full_line_coords = [[1, 1, 2, 2, 3, 3, 4, 4], [5, 5, 6, 6, 7, 7, 8, 8]]
Desired Output
# desired format is [[[x1, y1], [x2, y2]], [[x2, y2], [x3, y3]], [[x3, y3], [x4, y4]]]...
desired = [[[1, 1], [2, 2]], [[2, 2], [3, 3]], [[3, 3], [4, 4]]]
Success without one-liner
# without one line:
desired = []
temp_array = []
full_line_coords.each do |x|
temp_array << x.each_slice(2).to_a
end
temp_array.each do |x|
i = 0
until i == x.length - 1
desired << [x[i], x[i+1]]
i += 1
end
end
p desired
# => [[[1, 1], [2, 2]], [[2, 2], [3, 3]], [[3, 3], [4, 4]], [[5, 5], [6, 6]], [[6, 6], [7, 7]], [[7, 7], [8, 8]]]
I'm unsure how to make this a one-liner. I found it simple enough to do the split, but not to keep the end/start coordinates in each array (as below).
One-liner attempt
attempt = full_line_coords.each { |x| p x.each_slice(2).to_a.each_slice(2).to_a } # p to show this is where i'd like the array to be in 'desired' format if possible.
# => [[[1, 1], [2, 2]], [[3, 3], [4, 4]]]
# [[[5, 5], [6, 6]], [[7, 7], [8, 8]]]
Background/Reasoning
The only reason I wish to keep it as a one-liner is because I want to return the "parent" object itself, not just the resulting attributes.
"Parent" objects being a list of links: #<Link:0x803a2e8>, with many attributes, including "segments".
links.each do |l|
puts l.segments
end
# Gives an array of XY coordinates, including all vertices. e.g. [1, 1, 2, 2, 3, 3, 4, 4]
I am then looking to use the "desired" output in some other defined methods but return the link object itself at the end #<Link:0x803a2e8>, not just products from the link's attributes.
Many thanks.
This is the first option I found:
full_line_coords.flat_map { |e| e.each_slice(2).each_cons(2).to_a }
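To connect this back to the "return the parent object" requirement, here is a small sketch. Link is a hypothetical Struct standing in for the asker's class, and segment_pairs is a name invented for this example; only the each_slice(2).each_cons(2) chain comes from the answer.

```ruby
# Hypothetical stand-in for the asker's Link class; only `segments` matters here.
Link = Struct.new(:segments)

# Group flat [x1, y1, x2, y2, ...] coordinates into points,
# then take overlapping windows of two consecutive points.
def segment_pairs(flat_coords)
  flat_coords.each_slice(2).each_cons(2).to_a
end

links = [Link.new([1, 1, 2, 2, 3, 3, 4, 4]), Link.new([5, 5, 6, 6, 7, 7, 8, 8])]

all_pairs = []
# Array#each returns its receiver, so the link objects themselves come back.
returned = links.each { |link| all_pairs.concat(segment_pairs(link.segments)) }
```

Since each returns the original collection, the pairs can be consumed inside the block while the links are still what the expression evaluates to.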
Find the methods in Enumerable class and Array class
Input
full_line_coords = [[1, 1, 2, 2, 3, 3, 4, 4], [5, 5, 6, 6, 7, 7, 8, 8]]
Code
p full_line_coords.map { |var| var.slice_when { |x, y| x != y }
.each_cons(2)
.to_enum
.map(&:itself) }
Output
[[[[1, 1], [2, 2]], [[2, 2], [3, 3]], [[3, 3], [4, 4]]], [[[5, 5], [6, 6]], [[6, 6], [7, 7]], [[7, 7], [8, 8]]]]

Prevent identical pairs when shuffling and slicing Ruby array

I'd like to prevent producing pairs with the same items when producing a random set of pairs in a Ruby array.
For example:
[1,1,2,2,3,4].shuffle.each_slice(2).to_a
might produce:
[[1, 1], [3, 4], [2, 2]]
I'd like to be able to ensure that it produces a result such as:
[[4, 1], [1, 2], [3, 2]]
Thanks in advance for the help!
arr = [1,1,2,2,3,4]
loop do
sliced = arr.shuffle.each_slice(2).to_a
break sliced if sliced.none? { |a| a.reduce(:==) }
end
Here are three ways to produce the desired result (not including the approach of sampling repeatedly until a valid sample is found). The following array will be used for illustration.
arr = [1,4,1,2,3,2,1]
Use Array#combination and Array#sample
If pairs sampled were permitted to have the same number twice, the sample space would be
arr.combination(2).to_a
#=> [[1, 4], [1, 1], [1, 2], [1, 3], [1, 2], [1, 1], [4, 1], [4, 2],
# [4, 3], [4, 2], [4, 1], [1, 2], [1, 3], [1, 2], [1, 1], [2, 3],
# [2, 2], [2, 1], [3, 2], [3, 1], [2, 1]]
The pairs containing the same value twice--here [1, 1] and [2, 2]--are not wanted so they are simply removed from the above array.
sample_space = arr.combination(2).reject { |x,y| x==y }
#=> [[1, 4], [1, 2], [1, 3], [1, 2], [4, 1], [4, 2], [4, 3],
# [4, 2], [4, 1], [1, 2], [1, 3], [1, 2], [2, 3], [2, 1],
# [3, 2], [3, 1], [2, 1]]
We evidently are to sample arr.size/2 elements from sample_space. Depending on whether this is to be done with or without replacement we would write
sample_space.sample(arr.size/2)
#=> [[4, 3], [1, 2], [1, 3]]
for sampling without replacement and
Array.new(arr.size/2) { sample_space.sample }
#=> [[1, 3], [4, 1], [2, 1]]
for sampling with replacement.
Sample elements of each pair sequentially, Method 1
This method, like the next, can only be used to sample with replacement.
Let's first consider sampling a single pair. We could do that by selecting the first element of the pair randomly from arr, remove all instances of that element in arr and then sample the second element from what's left of arr.
def sample_one_pair(arr)
first = arr.sample
[first, (arr - [first]).sample]
end
To draw a sample of arr.size/2 pairs we then execute the following.
Array.new(arr.size/2) { sample_one_pair(arr) }
#=> [[1, 2], [4, 3], [1, 2]]
Sample elements of each pair sequentially, Method 2
This method is a very fast way of sampling large numbers of pairs with replacement. Like the previous method, it cannot be used to sample without replacement.
First, compute the cdf (cumulative distribution function) for drawing an element of arr at random.
counts = arr.group_by(&:itself).transform_values { |v| v.size }
#=> {1=>3, 4=>1, 2=>2, 3=>1}
def cdf(sz, counts)
frac = 1.0/sz
counts.each_with_object([]) { |(k,v),a|
a << [k, frac * v + (a.empty? ? 0 : a.last.last)] }
end
cdf_first = cdf(arr.size, counts)
#=> [[1, 0.429], [4, 0.571], [2, 0.857], [3, 1.0]]
This means that there is a probability of 0.429 (rounded) of randomly drawing a 1, 0.571 of drawing a 1 or a 4, 0.857 of drawing a 1, 4 or 2 and 1.0 of drawing one of the four numbers. We can therefore randomly sample a number from arr by obtaining a (pseudo-)random number between zero and one (p = rand) and then finding the first element [n, q] of cdf_first for which p <= q:
def draw_random(cdf)
p = rand
cdf.find { |n,q| p <= q }.first
end
draw_random(cdf_first) #=> 1
draw_random(cdf_first) #=> 4
draw_random(cdf_first) #=> 1
draw_random(cdf_first) #=> 1
draw_random(cdf_first) #=> 2
draw_random(cdf_first) #=> 3
In simulation models, incidentally, this is the standard way of generating pseudo-random variates from discrete probability distributions.
Before drawing the second random number of the pair we need to modify cdf_first to reflect the fact that the first number cannot be drawn again. Assuming there will be many pairs to generate randomly, it is most efficient to construct a hash cdf_second whose keys are the first values drawn randomly for the pair and whose values are the corresponding cdf's.
cdf_second = counts.keys.each_with_object({}) { |n, h|
h[n] = cdf(arr.size - counts[n], counts.reject { |k,_| k==n }) }
#=> {1=>[[4, 0.25], [2, 0.75], [3, 1.0]],
# 4=>[[1, 0.5], [2, 0.833], [3, 1.0]],
# 2=>[[1, 0.6], [4, 0.8], [3, 1.0]],
# 3=>[[1, 0.5], [4, 0.667], [2, 1.0]]}
If, for example, a 2 is drawn for the first element of the pair, the probability is 0.6 of drawing a 1 for the second element, 0.8 of drawing a 1 or 4 and 1.0 of drawing a 1, 4, or 3.
We can then sample one pair as follows.
def sample_one_pair(cdf_first, cdf_second)
first = draw_random(cdf_first)
[first, draw_random(cdf_second[first])]
end
As before, to sample arr.size/2 pairs with replacement, we execute
Array.new(arr.size/2) { sample_one_pair(cdf_first, cdf_second) }
#=> [[2, 1], [3, 2], [1, 2]]
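For reference, the pieces of Method 2 can be assembled into one self-contained script. This is a sketch rather than the answer's exact code: the names build_cdf, draw and rng are assumptions, a seeded Random is used only to make runs repeatable, and the fallback to the last cdf entry is an added guard in case floating-point rounding leaves the final cumulative value slightly below 1.0.

```ruby
# Returns [[value, cumulative_probability], ...] for the given counts.
def build_cdf(total, counts)
  running = 0.0
  counts.map do |value, count|
    running += count.fdiv(total)
    [value, running]
  end
end

# Inverse-transform draw: the first entry whose cumulative probability covers u.
def draw(cdf, rng)
  u = rng.rand
  hit = cdf.find { |_value, q| u <= q } || cdf.last # guard against rounding
  hit.first
end

arr    = [1, 4, 1, 2, 3, 2, 1]
rng    = Random.new(42) # assumption: fixed seed for repeatability
counts = arr.group_by(&:itself).transform_values(&:size)

cdf_first  = build_cdf(arr.size, counts)
cdf_second = counts.keys.each_with_object({}) do |n, h|
  # Conditional cdf for the second draw once n has been excluded.
  h[n] = build_cdf(arr.size - counts[n], counts.reject { |k, _| k == n })
end

pairs = Array.new(arr.size / 2) do
  first = draw(cdf_first, rng)
  [first, draw(cdf_second[first], rng)]
end
```

Each pair is drawn independently, so this samples with replacement, but the second element can never equal the first.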
With replacement, you may get results like:
unique_pairs([1, 1, 2, 2, 3, 4]) # => [[4, 1], [1, 2], [1, 3]]
Note that 1 gets chosen three times, even though it's only in the original array twice. This is because the 1 is "replaced" each time it's chosen. In other words, it's put back into the collection to potentially be chosen again.
Here's a version of Cary's excellent sample_one_pair solution without replacement:
def unique_pairs(arr)
dup = arr.dup
Array.new(dup.size / 2) do
dup.shuffle!
first = dup.pop
second_index = dup.rindex { |e| e != first }
raise StopIteration unless second_index
second = dup.delete_at(second_index)
[first, second]
end
rescue StopIteration
retry
end
unique_pairs([1, 1, 2, 2, 3, 4]) # => [[4, 3], [1, 2], [2, 1]]
This works by creating a copy of the original array and deleting elements out of it as they're chosen (so they can't be chosen again). The rescue/retry is in there in case it becomes impossible to produce the correct number of pairs. For example, if [1, 3] is chosen first, and [1, 4] is chosen second, it becomes impossible to make three unique pairs because [2, 2] is all that's left; the sample space is exhausted.
This should be slower than Cary's solution (with replacement) but faster (on average) than the posted solutions (without replacement) that require looping and retrying. Welp, chalk up another point for "always benchmark!" I was wrong about almost all of my assumptions. Here are the results on my machine with an array of 16 numbers ([1, 1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7, 8, 9, 9, 10]):
cary_with_replacement
93.737k (± 2.9%) i/s - 470.690k in 5.025734s
mwp_without_replacement
187.739k (± 3.3%) i/s - 943.415k in 5.030774s
mudasobwa_without_replacement
129.490k (± 9.4%) i/s - 653.150k in 5.096761s
EDIT: I've updated the above solution to address Stefan's numerous concerns. In hindsight, the errors are obvious and embarrassing! On the plus side, the revised solution is now faster than mudasobwa's solution, and I've confirmed that the two solutions have the same biases.
You can check if there are any matches and shuffle again:
a = [1,1,2,2,3,4]
# first time shuffle
sliced = a.shuffle.each_slice(2).to_a
# reshuffle while any pair holds two identical items
while sliced.any? { |x, y| x == y } do
sliced = a.shuffle.each_slice(2).to_a
end
It is unlikely, but be aware of the possibility of an infinite loop.
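One way to rule out an endless loop in shuffle-and-retry approaches is to cap the number of attempts. The following is a minimal sketch; the method name shuffled_pairs and the max_attempts: and rng: keywords are assumptions for illustration.

```ruby
# Shuffle and slice repeatedly, giving up after max_attempts tries.
# Returns nil when no valid arrangement was found in time.
def shuffled_pairs(items, max_attempts: 1_000, rng: Random.new)
  max_attempts.times do
    pairs = items.shuffle(random: rng).each_slice(2).to_a
    return pairs if pairs.none? { |a, b| a == b }
  end
  nil
end

found  = shuffled_pairs([1, 1, 2, 2, 3, 4], rng: Random.new(1))
failed = shuffled_pairs([7, 7]) # no valid pairing exists, so this gives up
```

Returning nil leaves it to the caller to decide whether an impossible input is an error.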

Cleaning Arrays of Arrays which consist of nils

For example I have an array:
[[nil, nil], [1, 2], [nil, nil], [nil, nil]]
What is the best way to clean it? Array must have only arrays which do not consist of nil. After cleaning it has to be:
[[1,2]]
Something like:
[[nil, nil], [1, 2], [nil, nil], [nil, nil]].each {|x| x - [nil]}
The method on arrays that removes nil elements is called compact. However, that is not quite enough for this situation because you have an array of arrays. In addition you will want to select the non-nil arrays, or reject the arrays that are nil. You can easily combine the two in the following way:
[[nil, nil], [1, 2], [nil, nil], [nil, nil]].reject { |arr| arr.compact.empty? }
This will only work if you have sub arrays of numbers OR nils. If your sub arrays contain both e.g. [1, nil, 2], then this solution will keep the entire sub array.
It is possible to mutate the sub array to remove nil while you iterate over the sub arrays, but mutating while you iterate can be considered bad practice. Nevertheless, the way to do this would be to use the bang version of the compact method, which mutates the original object. Because compact! returns nil when it removes nothing, call empty? on the array itself rather than on the return value:
.reject { |arr| arr.compact!; arr.empty? }
This will take [[1, 2, nil, 3]] and return [[1, 2, 3]].
As sagarpandya82 pointed out, you can also use the all? or any? methods for simply checking if everything is nil, or if anything is nil, instead of removing the nils.
To recap:
original_array = [[nil, nil],[1, nil, 2], [1, 2, 3]]
original_array.reject { |arr| arr.all?(&:nil?) } # returns [[1, nil, 2], [1, 2, 3]]
original_array.reject { |arr| arr.compact.empty? } # returns [[1, nil, 2], [1, 2, 3]]
original_array.reject { |arr| arr.any?(&:nil?) } # returns [[1, 2, 3]]
original_array.reject { |arr| arr.compact!; arr.empty? } # returns [[1, 2], [1, 2, 3]] and mutates the sub arrays
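A detail worth checking when reaching for compact!: it returns nil when the array contained no nils, so chaining another call onto its return value can raise NoMethodError. The sketch below shows that behaviour and one way to get the "strip nils, drop empties" result without touching the caller's data; the variable names are only for illustration.

```ruby
# compact! returns nil when there was nothing to remove...
no_change = [1, 2, 3].compact!
# ...and returns the (mutated) receiver when it removed something.
changed = [1, nil, 2].compact!

rows = [[nil, nil], [1, nil, 2], [1, 2, 3]]

# Work on shallow copies so the caller's sub arrays stay intact.
cleaned = rows.map(&:dup).reject do |row|
  row.compact! # mutates the copy only
  row.empty?
end
```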
Assuming you're only interested in 2D-Arrays then:
Rid sub-arrays consisting of only nils:
arr.reject { |arr| arr.all?(&:nil?) }
Rid sub-arrays consisting of any nils:
arr.reject { |arr| arr.any?(&:nil?) }
compact will remove nil elements from an array.
map will run over each item in an array and return a fresh array by applying code on the item of the array. Note that in your example the elements of the array are themselves arrays.
reject will return a new array without the elements that your given code answers 'true' to.
select will return a new array with the elements that your given code 'likes' (kinda opposite of reject).
So if you just want to remove all nils from an array and its subarray (but not sub-subarrays), you could call
list = [[1,2], [nil], [1,nil,2]]
list.map(&:compact).reject(&:empty?) #=> [[1,2], [1,2]]
which is the same as
compacted_list = list.map do |element|
element.compact
end
non_empty_list = compacted_list.reject do |element|
element.empty?
end
If you want to remove all [nil, nil] entries from the list/array
list.reject{|element| element == [nil, nil]}
or if it is more about selecting the non-nil elements (this is really just about intent-revealing code)
list.select{|element| element != [nil, nil]}
Most of these functions have an ! counterpart (like reject!) which does the modification in place, which means you do not have to assign the return value (like in new_list = old_list.reject()).
There appear to be differing interpretations of the question.
If, as suggested by the question's example, all elements (arrays) that contain one nil contain only nils, and those elements are to be excluded, this would do that:
[[nil, nil], [1, 2], [nil, nil], [nil, nil]].select(&:first)
#=> [[1, 2]]
If all elements that contain at least one nil are to be excluded, this would do that:
[[3, nil], [1, 2], [3, 4, 5, nil]].reject { |a| a.any?(&:nil?) }
#=> [[1, 2]]
If all nils are to be removed from each element, this would do that:
[[3, nil], [1, 2], [nil], [nil, 3, 4]].map(&:compact)
#=> [[3], [1, 2], [], [3, 4]]
If all nils are to be removed from each element, and then all empty arrays are to be removed, this would do that:
[[3, nil], [1, 2], [nil], [nil, 3, 4]].map(&:compact).reject(&:empty?)
#=> [[3], [1, 2], [3, 4]]
I was recently looking at facets, a ruby gem that provides a lot of core ruby language extensions.
One of the examples they give is for the Array#recurse method, I'll show it below:
arr = ["a", ["b", "c", nil], nil]
arr.recurse{ |a| a.compact! }
#=> ["a", ["b", "c"]]
This gets about half the job done in your case - you also want to remove empty arrays.
Facets works by patching core Ruby methods. That means as soon as you run require 'facets/array/recurse', any previously defined Array#recurse method will be overridden. Patching core methods is generally ill-advised because of the possibility of naming conflicts.
Still, it's a useful method, and it's easy to define it in such a way that it takes an array as an argument instead of operating on the value of self. You can then use it to define two methods which together fulfill your purpose:
module ArrayUtils
def recurse(array, *types, &block)
types = [array.class] if types.empty?
a = array.reduce([]) do |arr, value|
case value
when *types
arr << recurse(value, *types, &block)
else
arr << value
end
arr
end
yield a
end
def recursive_compact(array)
recurse(array, &:compact)
end
def recursive_remove_nonempty(array)
recurse(array) do |arr|
arr.reject do |x|
x.is_a?(Array) && x.empty?
end
end
end
end
Testing it out:
include ArrayUtils
orig = [[nil, nil], [1, 2], [nil, nil], [nil, nil]]
compacted = recursive_compact(orig)
nonempties = recursive_remove_nonempty compacted
puts "original: #{orig.inspect}"
puts "compacted: #{compacted.inspect}"
puts "nonempties: #{nonempties.inspect}"
Running this prints:
original: [[nil, nil], [1, 2], [nil, nil], [nil, nil]]
compacted: [[], [1, 2], [], []]
nonempties: [[1, 2]]
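If the block-based recurse is more machinery than needed, the same two steps can be folded into one plain recursive method. This is a sketch under the assumption that only nested Arrays need handling; deep_clean is a name made up for this example and is not part of Facets.

```ruby
# Recursively drops nils, then drops any sub arrays left empty.
# Returns new arrays at every level; the input is not mutated.
def deep_clean(value)
  return value unless value.is_a?(Array)

  value
    .map { |element| deep_clean(element) }
    .compact
    .reject { |element| element.is_a?(Array) && element.empty? }
end

orig    = [[nil, nil], [1, 2], [nil, nil], [nil, nil]]
cleaned = deep_clean(orig)
nested  = deep_clean(["a", ["b", "c", nil], nil, [nil, [nil]]])
```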

How do I arrange an array as table rows?

I am trying to learn the Ruby way of array processing. What is a succinct way to write the following function?
def columnize(items, n_cols)
Items is a 1D array of arbitrary length. I want to return an array of rows, each having a length of n_cols, that includes all of the items column-wise, possibly with nils padding the last column. For example:
items = [1, 2, 3, 4, 5, 6, 7]
table = columnize items, 3
This should produce a table of:
[[1, 4, 7],
[2, 5, nil],
[3, 6, nil]]
Note that it's possible for the last column to be all nils as in:
columnize [1, 2, 3, 4, 5, 6, 7, 8, 9], 4
This is a real problem I need to solve for report generation. I have a Ruby newbie solution that is not very satisfying and can post it if desired.
You want to use Matrix class.
items = [1, 2, 3, 4, 5, 6, 7]
require 'matrix'
# ⇒ true
m = Matrix.build(3) { |row, col| items[row+col*3] }
# ⇒ Matrix[[1, 4, 7], [2, 5, nil], [3, 6, nil]]
Ruby's Array class has transpose which is designed to convert rows into columns. Using it in conjunction with fill and Enumerable's each_slice gives:
require 'pp'
def columnize(items, cols)
ary = items.dup.fill(nil, items.size, (-items.size) % cols) # pad up to the next multiple of cols
ary.each_slice(ary.size / cols).to_a.transpose
end
items = [1, 2, 3, 4, 5, 6, 7]
pp columnize(items, 3)
pp columnize [1, 2, 3, 4, 5, 6, 7, 8, 9], 4
Which outputs:
[[1, 4, 7], [2, 5, nil], [3, 6, nil]]
[[1, 4, 7, nil], [2, 5, 8, nil], [3, 6, 9, nil]]
Except for adding columns that would contain only nils, this will do:
first, *rest = items.each_slice(items.length.fdiv(n_cols).ceil).to_a
first.zip(*rest)
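Another way to look at the problem is to compute the row count first and then index into items directly, which also produces the all-nil trailing column from the question. The following is a sketch of that idea, not code from any of the answers:

```ruby
# Lays items out column-wise into rows of exactly n_cols entries,
# padding with nil where the items run out.
def columnize(items, n_cols)
  n_rows = items.size.fdiv(n_cols).ceil # float division, then round up
  Array.new(n_rows) do |row|
    # Out-of-range indexes return nil, which supplies the padding.
    Array.new(n_cols) { |col| items[col * n_rows + row] }
  end
end

table_a = columnize([1, 2, 3, 4, 5, 6, 7], 3)
table_b = columnize([1, 2, 3, 4, 5, 6, 7, 8, 9], 4)
```

Note the float division: items.size / n_cols on integers truncates before ceil can round up.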

Converting uneven rows to columns with FasterCSV

I have a CSV data file with rows that may have lots of columns 500+ and some with a lot less. I need to transpose it so that each row becomes a column in the output file. The problem is that the rows in the original file may not all have the same number of columns so when I try the transpose method of array I get:
`transpose': element size differs (12 should be 5) (IndexError)
Is there an alternative to transpose that works with uneven array length?
I would insert nulls to fill the holes in your matrix, something such as:
a = [[1, 2, 3], [3, 4]]
# This would throw the error you're talking about
# a.transpose
# Largest row
size = a.max { |r1, r2| r1.size <=> r2.size }.size
# Enlarge matrix inserting nils as needed
a.each { |r| r[size - 1] ||= nil }
# So now a == [[1, 2, 3], [3, 4, nil]]
aa = a.transpose
# aa == [[1, 3], [2, 4], [3, nil]]
# Initial CSV table data
csv_data = [ [1,2,3,4,5], [10,20,30,40], [100,200] ]
# Finding max length of rows
row_length = csv_data.map(&:length).max
# Inserting nil to the end of each row
csv_data.each do |row|
(row_length - row.length).times { row.insert(-1, nil) }
end
# Let's check
csv_data
# => [[1, 2, 3, 4, 5], [10, 20, 30, 40, nil], [100, 200, nil, nil, nil]]
# Transposing...
transposed_csv_data = csv_data.transpose
# Hooray!
# => [[1, 10, 100], [2, 20, 200], [3, 30, nil], [4, 40, nil], [5, nil, nil]]
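Both answers above pad the rows in place. If the parsed CSV rows should stay untouched, the padding can be done on copies instead. Here is a minimal sketch; transpose_ragged is a hypothetical name chosen for this example.

```ruby
# Pads each row with nils to the longest row's width, then transposes.
# The + operator builds new arrays, so the input rows are not modified.
def transpose_ragged(rows)
  width = rows.map(&:size).max.to_i # 0 when rows is empty
  rows.map { |row| row + Array.new(width - row.size) }.transpose
end

csv_rows   = [[1, 2, 3, 4, 5], [10, 20, 30, 40], [100, 200]]
transposed = transpose_ragged(csv_rows)
```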

Resources