Converting uneven rows to columns with FasterCSV - arrays

I have a CSV data file whose rows vary widely in width: some have 500+ columns and others far fewer. I need to transpose it so that each row becomes a column in the output file. The problem is that the rows in the original file may not all have the same number of columns, so when I try Array#transpose I get:
`transpose': element size differs (12 should be 5) (IndexError)
Is there an alternative to transpose that works with uneven array length?

I would insert nulls to fill the holes in your matrix, something such as:
a = [[1, 2, 3], [3, 4]]
# This would throw the error you're talking about
# a.transpose
# Largest row
size = a.max { |r1, r2| r1.size <=> r2.size }.size
# Enlarge matrix inserting nils as needed
a.each { |r| r[size - 1] ||= nil }
# So now a == [[1, 2, 3], [3, 4, nil]]
aa = a.transpose
# aa == [[1, 3], [2, 4], [3, nil]]

# Initial CSV table data
csv_data = [ [1,2,3,4,5], [10,20,30,40], [100,200] ]
# Finding max length of rows
row_length = csv_data.map(&:length).max
# Inserting nil to the end of each row
csv_data.each do |row|
  (row_length - row.length).times { row.push(nil) }
end
# Let's check
csv_data
# => [[1, 2, 3, 4, 5], [10, 20, 30, 40, nil], [100, 200, nil, nil, nil]]
# Transposing...
transposed_csv_data = csv_data.transpose
# Hooray!
# => [[1, 10, 100], [2, 20, 200], [3, 30, nil], [4, 40, nil], [5, nil, nil]]
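Both answers above pad the original rows in place. If you would rather leave the input untouched, a minimal sketch along the following lines builds the transposed table directly. The helper name transpose_ragged and its filler parameter are made up for this illustration and are not part of Ruby or FasterCSV.

```ruby
# Transpose rows of unequal length without mutating the input.
# Cells that a short row does not have are filled with `filler` (nil by default).
# NOTE: transpose_ragged is a hypothetical helper name, not a built-in method.
def transpose_ragged(rows, filler = nil)
  width = rows.map(&:size).max || 0 # widest row decides the number of output rows
  Array.new(width) do |col|
    rows.map { |row| col < row.size ? row[col] : filler }
  end
end

rows = [[1, 2, 3, 4, 5], [10, 20, 30, 40], [100, 200]]
p transpose_ragged(rows)
# => [[1, 10, 100], [2, 20, 200], [3, 30, nil], [4, 40, nil], [5, nil, nil]]
```

Because the rows are only read, never written, the original data can still be used afterwards with its uneven lengths intact.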

Related

Reverse Array Queries given

For a given array of integers, perform operations on the array. Return the resulting array after all operations have been applied in the order given. Each operation contains two indices. Reverse the subarray between those zero-based indices, inclusive.
Example 1: arr is [1, 2, 3] and operations is [[0, 2], [1, 2], [0, 2]]
Example 2: arr is [640, 26, 276, 224, 737, 677, 893, 87, 422, 30] and operations is [[0, 9], [2, 2], [5, 5], [1, 6], [5, 6], [5, 9], [0, 8], [6, 7], [1, 9], [3, 3]]
Can anyone help me solve this question?
Start like this:
[1, 2, 3] -> [3, 2, 1] # first: reverse array between index 0 and 2
-> [3, 1, 2] # then: reverse array between index 1 and 2
etc.
I hope you got the idea.
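The idea sketched above, applying each reversal in order to the running result, could be written roughly as follows. The method name perform_operations is an assumption, since the question does not say what the function should be called.

```ruby
# Apply each [from, to] operation in order, reversing the inclusive,
# zero-based subarray between the two indices.
# NOTE: perform_operations is a hypothetical name chosen for this sketch.
def perform_operations(arr, operations)
  result = arr.dup # work on a copy so the caller's array is untouched
  operations.each do |from, to|
    result[from..to] = result[from..to].reverse
  end
  result
end

p perform_operations([1, 2, 3], [[0, 2], [1, 2], [0, 2]])
# => [2, 1, 3]
```

Each operation copies and reverses only the slice it touches, so the total work is proportional to the combined length of the reversed ranges.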

Prevent identical pairs when shuffling and slicing Ruby array

I'd like to prevent producing pairs with the same items when producing a random set of pairs in a Ruby array.
For example:
[1,1,2,2,3,4].shuffle.each_slice(2).to_a
might produce:
[[1, 1], [3, 4], [2, 2]]
I'd like to be able to ensure that it produces a result such as:
[[4, 1], [1, 2], [3, 2]]
Thanks in advance for the help!
arr = [1,1,2,2,3,4]
loop do
sliced = arr.shuffle.each_slice(2).to_a
break sliced if sliced.none? { |a| a.reduce(:==) }
end
Here are three ways to produce the desired result (not including the approach of sampling repeatedly until a valid sample is found). The following array will be used for illustration.
arr = [1,4,1,2,3,2,1]
Use Array#combination and Array#sample
If pairs sampled were permitted to have the same number twice, the sample space would be
arr.combination(2).to_a
#=> [[1, 4], [1, 1], [1, 2], [1, 3], [1, 2], [1, 1], [4, 1], [4, 2],
# [4, 3], [4, 2], [4, 1], [1, 2], [1, 3], [1, 2], [1, 1], [2, 3],
# [2, 2], [2, 1], [3, 2], [3, 1], [2, 1]]
The pairs containing the same value twice--here [1, 1] and [2, 2]--are not wanted so they are simply removed from the above array.
sample_space = arr.combination(2).reject { |x,y| x==y }
#=> [[1, 4], [1, 2], [1, 3], [1, 2], [4, 1], [4, 2], [4, 3],
# [4, 2], [4, 1], [1, 2], [1, 3], [1, 2], [2, 3], [2, 1],
# [3, 2], [3, 1], [2, 1]]
We evidently are to sample arr.size/2 elements from sample_space. Depending on whether this is to be done with or without replacement we would write
sample_space.sample(arr.size/2)
#=> [[4, 3], [1, 2], [1, 3]]
for sampling without replacement and
Array.new(arr.size/2) { sample_space.sample }
#=> [[1, 3], [4, 1], [2, 1]]
for sampling with replacement.
Sample elements of each pair sequentially, Method 1
This method, like the next, can only be used to sample with replacement.
Let's first consider sampling a single pair. We could do that by selecting the first element of the pair randomly from arr, removing all instances of that element from arr and then sampling the second element from what's left of arr.
def sample_one_pair(arr)
first = arr.sample
[first, second = (arr-[first]).sample]
end
To draw a sample of arr.size/2 pairs we then execute the following.
Array.new(arr.size/2) { sample_one_pair(arr) }
#=> [[1, 2], [4, 3], [1, 2]]
Sample elements of each pair sequentially, Method 2
This method is a very fast way of sampling large numbers of pairs with replacement. Like the previous method, it cannot be used to sample without replacement.
First, compute the cdf (cumulative distribution function) for drawing an element of arr at random.
counts = arr.group_by(&:itself).transform_values { |v| v.size }
#=> {1=>3, 4=>1, 2=>2, 3=>1}
def cdf(sz, counts)
frac = 1.0/sz
counts.each_with_object([]) { |(k,v),a|
a << [k, frac * v + (a.empty? ? 0 : a.last.last)] }
end
cdf_first = cdf(arr.size, counts)
#=> [[1, 0.429], [4, 0.571], [2, 0.857], [3, 1.0]]
This means that there is a probability of 0.429 (rounded) of randomly drawing a 1, 0.571 of drawing a 1 or a 4, 0.857 of drawing a 1, 4 or 2 and 1.0 of drawing one of the four numbers. We can therefore randomly sample a number from arr by obtaining a (pseudo-)random number between zero and one (p = rand) and then finding the first element [n, q] of cdf_first for which p <= q:
def draw_random(cdf)
p = rand
cdf.find { |n,q| p <= q }.first
end
draw_random(cdf_first) #=> 1
draw_random(cdf_first) #=> 4
draw_random(cdf_first) #=> 1
draw_random(cdf_first) #=> 1
draw_random(cdf_first) #=> 2
draw_random(cdf_first) #=> 3
In simulation models, incidentally, this is the standard way of generating pseudo-random variates from discrete probability distributions.
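For a distribution with many distinct values, the linear scan in the lookup can be swapped for a binary search, because the cumulative probabilities are in ascending order. The following is only a sketch of that variation: draw_random_bsearch, its optional second parameter u, and cdf_example are names invented here, and the fallback to the last entry is an added guard against floating-point round-off leaving the final cumulative value slightly below 1.0.

```ruby
# cdf is an array of [value, cumulative_probability] pairs, sorted by
# ascending cumulative probability.
# NOTE: draw_random_bsearch and its parameter u are hypothetical names;
# u defaults to a fresh pseudo-random number in [0, 1).
def draw_random_bsearch(cdf, u = rand)
  # Find-minimum binary search: first pair whose cumulative probability is >= u.
  pair = cdf.bsearch { |_, q| q >= u }
  # Fall back to the last entry if round-off left every q below u.
  (pair || cdf.last).first
end

cdf_example = [[1, 3.0 / 7], [4, 4.0 / 7], [2, 6.0 / 7], [3, 1.0]]
p draw_random_bsearch(cdf_example, 0.1)  # => 1
p draw_random_bsearch(cdf_example, 0.5)  # => 4
p draw_random_bsearch(cdf_example, 0.99) # => 3
```

Passing u explicitly makes the lookup deterministic, which is convenient for checking the boundaries; in normal use the argument is simply omitted.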
Before drawing the second random number of the pair we need to modify cdf_first to reflect the fact that the first number cannot be drawn again. Assuming there will be many pairs to generate randomly, it is most efficient to construct a hash cdf_second whose keys are the first values drawn randomly for the pair and whose values are the corresponding cdf's.
cdf_second = counts.keys.each_with_object({}) { |n, h|
h[n] = cdf(arr.size - counts[n], counts.reject { |k,_| k==n }) }
#=> {1=>[[4, 0.25], [2, 0.75], [3, 1.0]],
# 4=>[[1, 0.5], [2, 0.833], [3, 1.0]],
# 2=>[[1, 0.6], [4, 0.8], [3, 1.0]],
# 3=>[[1, 0.5], [4, 0.667], [2, 1.0]]}
If, for example, a 2 is drawn for the first element of the pair, the probability is 0.6 of drawing a 1 for the second element, 0.8 of drawing a 1 or 4 and 1.0 of drawing a 1, 4, or 3.
We can then sample one pair as follows.
def sample_one_pair(cdf_first, cdf_second)
first = draw_random(cdf_first)
[first, draw_random(cdf_second[first])]
end
As before, to sample arr.size/2 pairs with replacement, we execute
Array.new(arr.size/2) { sample_one_pair(cdf_first, cdf_second) }
#=> [[2, 1], [3, 2], [1, 2]]
With replacement, you may get results like:
unique_pairs([1, 1, 2, 2, 3, 4]) # => [[4, 1], [1, 2], [1, 3]]
Note that 1 gets chosen three times, even though it's only in the original array twice. This is because the 1 is "replaced" each time it's chosen. In other words, it's put back into the collection to potentially be chosen again.
Here's a version of Cary's excellent sample_one_pair solution without replacement:
def unique_pairs(arr)
dup = arr.dup
Array.new(dup.size / 2) do
dup.shuffle!
first = dup.pop
second_index = dup.rindex { |e| e != first }
raise StopIteration unless second_index
second = dup.delete_at(second_index)
[first, second]
end
rescue StopIteration
retry
end
unique_pairs([1, 1, 2, 2, 3, 4]) # => [[4, 3], [1, 2], [2, 1]]
This works by creating a copy of the original array and deleting elements out of it as they're chosen (so they can't be chosen again). The rescue/retry is in there in case it becomes impossible to produce the correct number of pairs. For example, if [1, 3] is chosen first, and [1, 4] is chosen second, it becomes impossible to make three unique pairs because [2, 2] is all that's left; the sample space is exhausted.
This should be slower than Cary's solution (with replacement) but faster (on average) than the posted solutions (without replacement) that require looping and retrying.
Welp, chalk up another point for "always benchmark!" I was wrong about most of my assumptions. Here are the results on my machine with an array of 16 numbers ([1, 1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7, 8, 9, 9, 10]):
cary_with_replacement
93.737k (± 2.9%) i/s - 470.690k in 5.025734s
mwp_without_replacement
187.739k (± 3.3%) i/s - 943.415k in 5.030774s
mudasobwa_without_replacement
129.490k (± 9.4%) i/s - 653.150k in 5.096761s
EDIT: I've updated the above solution to address Stefan's numerous concerns. In hindsight, the errors are obvious and embarrassing! On the plus side, the revised solution is now faster than mudasobwa's solution, and I've confirmed that the two solutions have the same biases.
You can check whether there are any matches and shuffle again:
a = [1,1,2,2,3,4]
# first time shuffle
sliced = a.shuffle.each_slice(2).to_a
# check whether any pair holds two identical items and reshuffle if so
while sliced.any? { |x, y| x == y } do
sliced = a.shuffle.each_slice(2).to_a
end
Although it is unlikely, be aware of the possibility of an infinite loop.
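The retry-based answers in this thread never terminate when no valid pairing exists at all, for example when one value fills more than half of the array. A guard along these lines could be run first. It is only a sketch: pairable? is a made-up name, and it assumes that every element must end up in exactly one pair.

```ruby
# True when arr can be split into pairs whose two members differ.
# NOTE: pairable? is a hypothetical helper name, not a built-in method.
# Assumption: every element must be used, so an odd-length array fails.
def pairable?(arr)
  return false if arr.size.odd?
  return true if arr.empty?
  # A value occurring more than size/2 times must end up paired with itself.
  largest_group = arr.group_by(&:itself).values.map(&:size).max
  largest_group <= arr.size / 2
end

p pairable?([1, 1, 2, 2, 3, 4]) # => true
p pairable?([1, 1, 1, 2])       # => false
```

When the check passes, a valid arrangement is guaranteed to exist: sort the array and pair element i with element i + size/2, and no pair can hold two equal values. A shuffle-and-retry loop will therefore eventually find one.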

Right justifying array contents

I need to prepare some data for graphing and analysis and could use some advice on transforming that data.
I have an array of arrays. Each sub-array is a set of gathered data with the last element representing the most recent data for all. Earlier elements represent historical data. The sub-arrays have variable amounts of history. I would like to process the arrays so that the current data (last element in each sub-array) lines up with the right boundary.
For example:
[[2], [3, 5, 8, 9], [2, 10]]
should be transformed to, or processed as if it were:
[[nil, nil, nil, 2], [3, 5, 8, 9], [nil, nil, 2, 10]]
I would prefer not to mutate the original data if possible, but can deal with it if it helps (I would just full_dup the array first and work on the copy)
candidate_matrix = [[2], [3, 5, 8, 9], [2, 10]]
row_size = candidate_matrix.map(&:size).max
candidate_matrix.map { |numbers| [nil] * (row_size - numbers.size) + numbers }
# => [[nil, nil, nil, 2], [3, 5, 8, 9], [nil, nil, 2, 10]]
Array#fill
As discussed in this answer, you can use Array#fill:
m = arr.max_by(&:size).size
arr.map { |s| s.reverse.fill(nil, m..m-1).reverse }
#=> [[nil, nil, nil, 2], [3, 5, 8, 9], [nil, nil, 2, 10]]
Array#insert
A more semantic answer would use Array#insert:
arr.map { |s| s.dup.insert(0, [nil] * (m - s.size)).flatten }
#=> [[nil, nil, nil, 2], [3, 5, 8, 9], [nil, nil, 2, 10]]
Kernel#loop
arr.map { |sub|
sub = sub.dup
loop {
break if sub.size >= m
sub.insert 0, nil
}
sub
}
#=> [[nil, nil, nil, 2], [3, 5, 8, 9], [nil, nil, 2, 10]]
You can use while or until loops similarly, which I think look nicer, but then you'd face the wrath of the idiom-police. Also, using unshift or insert requires a dup, which isn't ideal.
The reason I've used insert 0, nil here instead of just unshift nil is because say you wanted to left-justify instead, all you would have to do is replace the 0 with a -1 in insert's first argument.
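The point about switching between right and left justification can also be expressed without mutating anything, by building the padding separately and concatenating. This is a sketch only: the method name justify and its side: and filler: keyword arguments are invented for the illustration.

```ruby
# Pad every row to the width of the longest one without touching the input.
# NOTE: justify, side: and filler: are hypothetical names for this sketch.
def justify(rows, side: :right, filler: nil)
  width = rows.map(&:size).max || 0
  rows.map do |row|
    padding = Array.new(width - row.size, filler)
    # Right-justified data gets its padding in front, left-justified behind.
    side == :right ? padding + row : row + padding
  end
end

data = [[2], [3, 5, 8, 9], [2, 10]]
p justify(data)
# => [[nil, nil, nil, 2], [3, 5, 8, 9], [nil, nil, 2, 10]]
p justify(data, side: :left)
# => [[2, nil, nil, nil], [3, 5, 8, 9], [2, 10, nil, nil]]
```

Array#+ always returns a new array, so no dup is needed and the original sub-arrays keep their lengths.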

How do I arrange an array as table rows?

I am trying to learn the Ruby way of array processing. What is a succinct way to write the following function?
def columnize(items, n_cols)
Items is a 1D array of arbitrary length. I want to return an array of rows, each having a length of n_cols, that includes all of the items column-wise, possibly with nils padding the last column. For example:
items = [1, 2, 3, 4, 5, 6, 7]
table = columnize items, 3
This should produce a table of:
[[1, 4, 7],
[2, 5, nil],
[3, 6, nil]]
Note that it's possible for the last column to be all nils as in:
columnize [1, 2, 3, 4, 5, 6, 7, 8, 9], 4
This is a real problem I need to solve for report generation. I have a Ruby newbie solution that is not very satisfying and can post it if desired.
You want to use the Matrix class.
items = [1, 2, 3, 4, 5, 6, 7]
require 'matrix'
# ⇒ true
m = Matrix.build(3) { |row, col| items[row+col*3] }
# ⇒ Matrix[[1, 4, 7], [2, 5, nil], [3, 6, nil]]
Ruby's Array class has transpose which is designed to convert rows into columns. Using it in conjunction with fill and Enumerable's each_slice gives:
require 'pp'
def columnize(items, cols)
# (-items.size) % cols is the number of nils needed to reach a multiple of cols (0 if already one)
ary = items.dup.fill(nil, items.size, (-items.size) % cols)
ary.each_slice(ary.size / cols).to_a.transpose
end
items = [1, 2, 3, 4, 5, 6, 7]
pp columnize(items, 3)
pp columnize [1, 2, 3, 4, 5, 6, 7, 8, 9], 4
Which outputs:
[[1, 4, 7], [2, 5, nil], [3, 6, nil]]
[[1, 4, 7, nil], [2, 5, 8, nil], [3, 6, 9, nil]]
Except for padding out columns that contain only nil elements, this will do (note the float division, so that ceil rounds the row count up):
first, *rest = items.each_slice(items.length.fdiv(n_cols).ceil).to_a
first.zip(*rest)
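Another way to get the column-wise layout, including a trailing all-nil column, is to compute the number of rows first and index into items directly. The sketch below relies only on the fact that indexing past the end of an array returns nil. It reuses the columnize signature from the question, but the body is one possible implementation and not the asker's.

```ruby
# Lay items out column by column into rows of exactly n_cols cells.
# Out-of-range indexes return nil, which supplies the padding for free.
def columnize(items, n_cols)
  return [] if items.empty?
  n_rows = items.size.fdiv(n_cols).ceil # float division so ceil rounds up
  Array.new(n_rows) do |row|
    Array.new(n_cols) { |col| items[row + col * n_rows] }
  end
end

p columnize([1, 2, 3, 4, 5, 6, 7], 3)
# => [[1, 4, 7], [2, 5, nil], [3, 6, nil]]
p columnize([1, 2, 3, 4, 5, 6, 7, 8, 9], 4)
# => [[1, 4, 7, nil], [2, 5, 8, nil], [3, 6, 9, nil]]
```

When the item count divides evenly, for example six items in three columns, no padding is added and the result is [[1, 3, 5], [2, 4, 6]].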

Getting all combinations of pairs from a list in Ruby

I have a list of elements (e.g. numbers) and I want to retrieve a list of all possible pairs. How can I do that using Ruby?
Example:
l1 = [1, 2, 3, 4, 5]
Result:
l2 #=> [[1,2], [1,3], [1,4], [1,5], [2,3], [2,4], [2,5], [3,4], [3,5], [4,5]]
In Ruby 1.8.6, you can use Facets:
require 'facets/array/combination'
i1 = [1,2,3,4,5]
i2 = i1.combination(2).to_a # => [[1, 2], [1, 3], [1, 4], [1, 5], [2, 3], [2, 4], [2, 5], [3, 4], [3, 5], [4, 5]]
In 1.8.7 and later, combination is built-in:
i1 = [1,2,3,4,5]
i2 = i1.combination(2).to_a
Or, if you really want a non-library answer:
i1 = [1,2,3,4,5]
i2 = (0...(i1.size-1)).inject([]) {|pairs,x| pairs += ((x+1)...i1.size).map {|y| [i1[x],i1[y]]}}

Resources