Ruby parsing CSV rows in a loop - arrays

I'm trying to write a CSV parser. Each line has multiple fields that I need to process. Each line represents patient data, so I need each line processed by itself. Once I'm finished processing a line, I need to go on to the next until the end of the file is reached.
I've successfully started writing the parser in Ruby. The data is getting imported and it's creating an array of arrays (each line is an array).
The problem I'm having is looping through the data properly, line by line. Right now I can successfully process the first line and parse each field. I start running into a problem when I add another line with new patient data: the second line gets processed and appended to the same array that was created for the first. For example, line 1 and line 2, once processed, get added to one big array instead of an array of arrays. The output needs to keep the same structure as the imported data.
Here is my code so far:
original_data = Array.new
converted_data = Array.new
Dir.chdir 'convert'
CSV.foreach('CAREPRODEMO.CSV') do |raw_file|
original_data << raw_file
end
# Needed at beginning of array for each patient
converted_data.insert(0, 'Acvite', 'ACT')
# Start processing fields
original_data.each do |o|
# BEGIN Check for nil in original data and replace with empty string
o.map! { |x| x ? x : ''}
converted_data << o.slice(0)
# Remove leading zeros from account number
converted_data[2].slice!(0)
if converted_data[2].slice(1) == '0'
converted_data[2].slice!(1)
end
# Setup patient name to be processed
patient_name = Array.new
patient_name << o.slice(3..4)
converted_data << patient_name.join(' ')
# Setup patient address to be processed
patient_address = Array.new
patient_address << o.slice(5)
converted_data << patient_address.join(' ')
# END Check for nil in converted data and replace with empty string
converted_data.map! { |x| x ? x : ''}
end
# For debugging
p converted_data
Output:
["Acvite", "ACT", "D65188596", "SILLS DALTON H", "16243 B L RD", "00D015188596", "BALLARD DAVE H", "243 H L RD", "", "", ""]
Wanted:
["Acvite", "ACT", "D65188596", "SILLS DALTON H", "16243 B L RD"]
["Acvite", "ACT", "D15188596", "BALLARD DAVE H", "243 H L RD"]

You need an array of arrays to store the results. You are using a single array, hence the output you mentioned.
Move the converted_data array inside the loop, and define a new array to collect the output of each iteration. A possible approach is shown below.
original_data = Array.new
# Changed the variable name from converted_data
final_data = Array.new
...
original_data.each do |o|
converted_data = Array.new
...
# END Check for nil in converted data and replace with empty string
converted_data.map! { |x| x ? x : ''}
final_data << converted_data
end
p final_data
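As a minimal, self-contained sketch of the same idea (build one fresh array per row and collect those arrays), the loop can also be written with map. The inline sample rows and the placeholder columns x and y are made up for illustration; only the field positions (0 for the account, 3..4 for the name, 5 for the address) come from the question's code, and the account-number clean-up is left out.

```ruby
require 'csv'

# Hypothetical two-row sample standing in for CAREPRODEMO.CSV.
raw = <<~ROWS
  D65188596,x,y,SILLS,DALTON H,16243 B L RD
  D15188596,x,y,BALLARD,DAVE H,243 H L RD
ROWS

# map returns one new array per input row, so the result is an array of arrays.
final_data = CSV.parse(raw).map do |row|
  row = row.map { |field| field.to_s } # nil fields become ''
  ['Acvite', 'ACT', row[0], row[3..4].join(' '), row[5]]
end

final_data.each { |record| p record }
```

Because each record is built inside the block, nothing carries over from one row to the next.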

Related

How to find a specific value in a nested array?

I'm trying to figure out how to place a value into one of three arrays and then shuffle those arrays and have the program output the index location of the value.
Here is what I have so far:
# The purpose of this program is to randomly place the name Zac
# in one of three arrays and return the array number and position of
# Zac
A1 = ["John","Steve","Frank","Charles"]
A2 = ["Sam","Clint","Stuart","James"]
A3 = ["Vic","Jim","Bill","David"]
n = [A1,A2,A3]
name = "Zac"
def placename(title, namelist)
mix = rand(2)
namelist[mix] << title
namelist.shuffle
return namelist
end
allnames = [] << placename(name, n)
def findname(allnames, key)
allnames.each do |i|
until allnames[i].include?(key) == true
i+=1
end
location = allnames[i].find_index(key)
puts "The location and value of #{key} is #{location}"
end
end
findname(allnames, name)
At the moment I'm getting an "undefined method for Nil Class" error (NoMethodError).
Can someone please clarify what I'm doing wrong with this or if there is a more effective way of going about this? Thanks in advance!!
Your approach assumes that in the block starting...
allnames.each do |i|
... that i will contain the index of the allnames element. This isn't true. i will contain the VALUE (contents) of the element.
What you could try as an alternative is...
allnames.each_with_index do |_value, i|
or, you can do...
allnames.each do |value|
and then replace all references to allnames[i] with value
another problem is that...
allnames = [] << placename(name, n)
puts the returned array of arrays inside ANOTHER array. I think what you want to do is..
allnames = placename(name, n)
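To make the each_with_index suggestion concrete, here is a small sketch that returns both the subarray number and the position inside it. The method name find_name and the sample lists are illustrative choices, not part of the original code.

```ruby
# Returns [outer_index, inner_index] for the first match, or nil if absent.
def find_name(lists, key)
  lists.each_with_index do |list, outer|
    inner = list.index(key) # nil when this subarray does not hold the key
    return [outer, inner] if inner
  end
  nil
end

names = [
  %w[John Steve Frank Charles],
  %w[Sam Clint Zac Stuart James],
  %w[Vic Jim Bill David]
]

p find_name(names, 'Zac')    # => [1, 2]
p find_name(names, 'Nobody') # => nil
```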
I modified the last few lines. I hope this is what you wanted:
allnames = placename(name, n)
def findname allnames, key
r = allnames.map.with_index{|x,i|x.include?(key) ? i : p}-[p]
puts "The location of value #{key} is array number #{r[0]} and item number #{allnames[r[0]].index(key)}"
end
findname(allnames, name)
Edit: Randomization
To get randomized array number and item number you have to do the following
def placename(title, namelist)
mix = rand(3) # Since the number of arrays (nested within) is 3 we can use 3 instead of 2
namelist[mix] << title
namelist.map!{|x|x.shuffle}.shuffle! # Shuffling each item and the whole array in place.
return namelist
end
Assuming you want to modify the array in place, I'd do it like this:
# insert name into random subarray
def insert_name name
subarray_idx = rand @name_arrays.size
subarray = @name_arrays[subarray_idx]
insertion_idx = rand subarray.size
@name_arrays[subarray_idx].insert insertion_idx, name
sprintf '"%s" inserted at @name_arrays[%d][%d]',
name, subarray_idx, insertion_idx
end
# define starting array, then print & return the
# message for further parsing if needed
@name_arrays = [
%w[John Steve Frank Charles],
%w[Sam Clint Stuart James],
%w[Vic Jim Bill David],
]
p(insert_name 'Zac')
This has a few benefits:
You can inspect @name_arrays to validate that things look the way you expect.
The message can be parsed with String#scan if desired.
You can modify #insert_name to return your indexes, rather than having to search for the name directly.
If you don't capture the insertion index as a return value, or don't want to parse it from your message String, you can search for it by leveraging Enumerable#each_with_index and Array#index. For example:
# for demonstration only, set this so you can get the same
# results since the insertion index was randomized
@name_arrays =
[["John", "Steve", "Frank", "Charles"],
["Sam", "Clint", "Stuart", "James"],
["Vic", "Jim", "Zac", "Bill", "David"]]
# return indices of nested match
def find_name_idx name
@name_arrays.each_with_index
.map { [_2, _1.index(name)] }
.reject { _1.any? nil }
.pop
end
# use Array#dig to retrieve item at nested index
@name_arrays.dig(*find_name_idx('Zac'))

Ruby sort order of array of hash using another array in an efficient way so processing time is constant

I have some data that I need to export as CSV. It is currently about 10,000 records and will keep growing, so I want an efficient way to iterate over it, especially since I run several each loops one after the other.
My question: is there a way to avoid the many each loops I describe below and, if not, is there something besides Ruby's each/map that I can use to keep processing time constant irrespective of data size?
For instance:
1. First I loop through the whole data set to flatten and rename the fields that hold array values, so that a field like issue becomes issue_1 and issue_2 if its array contains two items.
2. Next I do another loop to get all the unique keys in the array of hashes.
3. Using the unique keys from step 2, I do another loop to sort these keys using a different array that holds the order in which the keys should be arranged.
4. Finally, another loop generates the CSV.
So I have iterated over the data 4 times, using Ruby's each/map every time, and the time to complete these loops will increase with data size.
Original data is in the form below :
def data
[
{"file"=> ["getty_883231284_200013331818843182490_335833.jpg"], "id" => "60706a8e-882c-45d8-ad5d-ae898b98535f", "date_uploaded" => "2019-12-24", "date_modified" => "2019-12-24", "book_title_1"=>"", "title"=> ["haha"], "edition"=> [""], "issue" => ["nov"], "creator" => ["yes", "some"], "publisher"=> ["Library"], "place_of_publication" => ["London, UK"]},
{"file" => ["getty_883231284_200013331818843182490_335833.jpg"], "id" => "60706a8e-882c-45d8-ad5d-ae898b98535f", "date_uploaded" => "2019-12-24", "date_modified"=>"2019-12-24", "book_title"=> [""], "title" => ["try"], "edition"=> [""], "issue"=> ["dec", 'ten'], "creator"=> ["tako", "bell", 'big mac'], "publisher"=> ["Library"], "place_of_publication" => "NY, USA"}]
end
Remapped data, flattening the arrays and renaming the keys that held them:
def csv_data
@csv_data = [
{"file_1"=>"getty_883231284_200013331818843182490_335833.jpg", "id"=>"60706a8e-882c-45d8-ad5d-ae898b98535f", "date_uploaded"=>"2019-12-24", "date_modified"=>"2019-12-24", "book_title_1"=>"", "title_1"=>"haha", "edition_1"=>"", "issue_1"=>"nov", "creator_1"=>"yes", "creator_2"=>"some", "publisher_1"=>"Library", "place_of_publication_1"=>"London, UK"},
{"file_1"=>"getty_883231284_200013331818843182490_335833.jpg", "id"=>"60706a8e-882c-45d8-ad5d-ae898b98535f", "date_uploaded"=>"2019-12-24", "date_modified"=>"2019-12-24", "book_title_1"=>"", "title_1"=>"try", "edition_1"=>"", "issue_1"=>"dec", "issue_2" => 'ten', "creator_1"=>"tako", "creator_2"=>"bell", 'creator_3' => 'big mac', "publisher_1"=>"Library", "place_of_publication_1"=>"NY, USA"}]
end
Sorting the headers for the above data
def csv_header
csv_order = ["id", "edition_1", "date_uploaded", "creator_1", "creator_2", "creator_3", "book_title_1", "publisher_1", "file_1", "place_of_publication_1", "journal_title_1", "issue_1", "issue_2", "date_modified"]
headers_object = []
sorted_header = []
all_keys = csv_data.lazy.flat_map(&:keys).force.uniq.compact
#resort using ordering by suffix eg creator_isni_1 comes before creator_isni_2
all_keys = all_keys.sort_by{ |name| [name[/\d+/].to_i, name] }
csv_order.each {|k| all_keys.select {|e| sorted_header << e if e.start_with? k} }
sorted_header.uniq
end
Then generate the CSV, which involves yet another loop:
def to_csv
data = csv_data
sorted_header = csv_header
csv = CSV.generate(headers: true) do |csv|
csv << sorted_header
csv_data.lazy.each do |hash|
csv << hash.values_at(*sorted_header)
end
end
end
To be honest, I was more intrigued to see whether I could work out your desired logic without further description than by the programming part alone (though I enjoyed that as well; it has been ages since I did some Ruby, and this was a good refresher). Since the goal is not clearly stated, it has to be "distilled" from your description, input data and code.
I think you should keep everything in very basic, lightweight arrays and do the heavy lifting while reading the data, in one single big step.
I also made the assumption that if a key ends with a number, or if a value is an array, you want it to be returned as {key}_{n}, even if there's only one value present.
So far I have come up with this code (logic described in the comments):
class CustomData
# @keys array structure
# 0: Key
# 1: Maximum amount of values associated
# 2: Is an array (Found a {key}_n key in feed,
# or value in feed was an array)
#
# @data: is a simple array of arrays
attr_accessor :keys, :data
CSV_ORDER = %w[
id edition date_uploaded creator book_title publisher
file place_of_publication journal_title issue date_modified
]
def initialize(feed)
@keys = CSV_ORDER.map { |key| [key, 0, false]}
@data = []
feed.each do |row|
new_row = []
# Sort keys in order to maintain the right order for {key}_{n} values
row.sort_by { |key, _| key }.each do |key, value|
is_array = false
if key =~ /_\d+$/
# If key ends with a number, extract key
# and remember it is an array for the output
key, is_array = key[/^(.*)_\d+$/, 1], true
end
if value.is_a? Array
# If value is an array, even if the key did not end with a number,
# we remember that for the output
is_array = true
else
value = [value]
end
# Find position of key if exists or nil
key_index = @keys.index { |a| a.first == key }
if key_index
# If you could have a combination of _n keys and array values
# for a key in your feed, you need to change this portion here
# to account for all previous values, which would add some complexity
#
# If current amount of values is greater than the saved one, override
@keys[key_index][1] = value.length if @keys[key_index][1] < value.length
@keys[key_index][2] = true if is_array and not @keys[key_index][2]
else
# It is a new key in @keys array
key_index = @keys.length
@keys << [key, value.length, is_array]
end
# Add value array at known key index
# (will be padded with nil if idx is greater than array size)
new_row[key_index] = value
end
@data << new_row
end
end
def to_csv_data(headers=true)
result, header, body = [], [], []
if headers
@keys.each do |key|
if key[2]
# If the key should hold multiple values, build the header string
key[1].times { |i| header << "#{key[0]}_#{i+1}" }
else
# Otherwise it is a singular value and the header goes unmodified
header << key[0]
end
end
result << header
end
@data.each do |row|
new_row = []
row.each_with_index do |value, index|
# Use the value counter from @keys to pad with nil values,
# if a value is not present
@keys[index][1].times do |count|
new_row << value[count]
end
end
body << new_row
end
result << body
end
end
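The {key}_{n} naming assumed above can also be looked at in isolation. The following is only a sketch of that one step, not a replacement for the class: flatten_row is a hypothetical helper, the sample row is shortened from the question's data, and scalar values are left under their original key, which is a simplifying assumption.

```ruby
# Hypothetical helper: expands array values into numbered keys
# ("issue" => ["dec", "ten"] becomes "issue_1" and "issue_2")
# and copies non-array values through unchanged.
def flatten_row(row)
  row.each_with_object({}) do |(key, value), flat|
    if value.is_a?(Array)
      value.each_with_index { |item, i| flat["#{key}_#{i + 1}"] = item }
    else
      flat[key] = value
    end
  end
end

row = { 'id' => '60706a8e', 'issue' => %w[dec ten], 'title' => ['try'] }
p flatten_row(row)
```

Each row is still visited once, so the total work grows linearly with the number of records; merging the steps reduces the number of passes but cannot make the time independent of data size.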

Print array elements in reverse order

The first line contains an integer N, (the size of our array).
The second line contains N space-separated integers describing array's(A's) elements.
I looked at the solution page and tried the following, but I do not understand how this code works. Can someone please explain it to me? I am pretty new to coding.
import math
import os
import random
import re
import sys
if __name__ == '__main__':
    n = int(input())
    arr = [int(arr_one) for arr_one in input().strip().split(' ')]
    for i in range(len(arr)):
        print(str(arr[-i-1]), end = " ")
input 1 2 3 4
output 4 3 2 1
In Python3:
if __name__ == '__main__':
    n = int(input())
    arr = list(map(int, input().rstrip().split()))
    print(" ".join(str(x) for x in arr[::-1]))
Input:
1 4 3 2
Output:
2 3 4 1
You are creating a list of integers by stripping leading and trailing whitespace from the input and splitting it at ' '. After obtaining the list, you loop over it and, for each i, take the (i+1)-th element from the end of arr (a negative index counts from the right and starts at -1), convert it back to a string, and print it.
Example:
arr = [1,2,3,4]
print(arr[1]) #prints 2 on the console, i.e 2nd element from the left.
print(arr[-1]) #prints 4 on the console, i.e 1st element from the right.
Let's take this code snippet
n = int(input())
arr = [int(arr_one) for arr_one in input().strip().split(' ')]
for i in range(len(arr)):
    print(str(arr[-i-1]), end = " ")
The input() function reads a line of user input from the keyboard as a string. int(input()) converts that string to an int, e.g. "4" becomes 4. The value is stored in the variable n.
The array input will look like "1 2 3 4", so we need to split the string on the space delimiter.
The strip() method returns a copy of the string with leading and trailing whitespace removed.
The split() method returns a list of strings after breaking the given string at the specified separator. Here the separator is a space, hence split(' ').
input().strip().split(' ') takes "1 2 3 4" as input and returns ["1", "2", "3", "4"].
Now we need to take each element of that list, convert it to int, and store it in a list.
arr = [int(arr_one) for arr_one in input().strip().split(' ')]
arr_one is a variable that holds each element in turn after the split. Each element is converted to int and stored in the list arr.
In Python, list indexes start from 0. To access elements from the end of the list, the indexes are -1, -2, -3, and so on.
for i in range(len(arr)): The for loop iterates i from 0 to len(arr) - 1. In this example the length is 4, so i takes the values 0, 1, 2, 3.
arr[-i-1] therefore reads the elements at indexes -1, -2, -3, -4, and the end argument makes print finish with the given string instead of a newline; here it is " ". So the output will be 4 3 2 1.
The above code can be rewritten as below with more readability.
if __name__ == '__main__':
    n = int(input())
    inp = input("Enter the numbers separated by spaces:::")
    inp = inp.strip()  # To remove the leading and trailing spaces
    array = []
    for item in inp.split(' '):  # Splitting the input on spaces and iterating over each element
        array.append(int(item))  # converting the element into an integer and appending it to the list
    print(array[::-1])  # a step of -1 walks the list in reverse order. Look into list slicing for more details
For more details on list slicing, look in the python documentation.
Try this!
if __name__ == '__main__':
    n = int(input())  # input as int from stream
    arr = [int(arr_one) for arr_one in input().strip().split(' ')]
    """
    1. asking for input from user
    2. strip() removes leading and trailing whitespace.
    3. split(' ') splits your input on spaces into a list of strings
    4. arr_one holds each split string in turn as the comprehension iterates over the list
    5. int(arr_one) converts it into an integer, and the surrounding [] collects the results into a new list.
    6. Finally, you assign the new list to the arr variable
    """
    for i in reversed(arr):  # loop over list in reverse order with built-in function
        print(i, end = " ")  # printing whatever comes in i
It should work like this:
3 # your n
1 2 3 # your input
3 2 1 # output

Find a Duplicate in an array Ruby

I am trying to find the duplicate values in an array of strings between 1 and 1000000.
However, with the code I have, I get the output as all the entries that are doubled.
So for instance, if I have [1,2,3,4,3,4], it gives me the output of 3 4 3 4 instead of 3 4.
Here is my code:
array = [gets]
if array.uniq.length == array.length
puts "array does not contain duplicates"
else
puts "array does contain duplicates"
print array.select{ |x| array.count(x) > 1}
end
Also, every time I test my code, I have to define the array as array = [1,2,3,4,5,3,5]. The puts works but it does not print when I use array [gets].
Can someone help me how to fix these two problems?
How I wish we had a built-in method Array#difference:
class Array
def difference(other)
h = other.tally
h.default = 0 # elements missing from other count as zero
reject { |e| h[e] > 0 && h[e] -= 1 }
end
end
though @user123's answer is more straightforward. (Array#difference is probably the more efficient of the two, as it avoids the repeated invocations of count.) See my answer here for a description of the method and links to its use.
In a nutshell, it differs from Array#- as illustrated in the following example:
a = [1,2,3,4,3,2,4,2]
b = [2,3,4,4,4]
a - b #=> [1]
a.difference b #=> [1, 3, 2, 2]
For the present problem, if:
arr = [1,2,3,4,3,4]
the duplicate elements are given by:
arr.difference(arr.uniq).uniq
#=> [3, 4]
For your first problem, you need to call uniq on the result, like
array.select{ |x| array.count(x) > 1}.uniq
For your second problem, when you read a value using array = [gets], the entire line of input arrives as a single string, so everything is stored in array[0], like ["1,2,3,4\n"].
puts "Enter array"
array = gets.chomp.split(",").map(&:to_i)
if array.uniq.length == array.length
puts "array does not contain duplicates"
else
puts "array does contain duplicates"
print array.select{ |x| array.count(x) > 1}.uniq
end
Copy this code into a Ruby file and run it using
ruby file_name.rb
Coming to your 'gets' problem:
When you call gets, you are getting a string as input, not an array.
2.2.0 :001 > array = [gets]
1,2,1,4,1,2,3
=> ["1,2,1,4,1,2,3\n"]
See the above example, how the ruby interpreter took all your elements as a single string and put it in an array as a single array element. So you need to explicitly convert the input to an array with comma as a delimiter. The below will address both your questions.
array = gets.chomp
array = array.split(',').map(&:to_i)
if array.uniq.length == array.length
puts "array does not contain duplicates"
else
puts "array does contain duplicates"
print array.select{ |x| array.count(x) > 1}.uniq
end
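The answers above call count once per element, which rescans the whole array each time. As a sketch of a single-pass alternative, Enumerable#tally (Ruby 2.7 or later) can count every value once and the duplicates can then be read off the counts. The method name duplicates is an illustrative choice.

```ruby
# Returns each value that occurs more than once, listed a single time,
# in order of first appearance.
def duplicates(values)
  values.tally.select { |_value, count| count > 1 }.keys
end

p duplicates([1, 2, 3, 4, 3, 4]) # => [3, 4]

# The same helper applied to a comma-separated line, as gets would return it.
line = "1,2,1,4,1,2,3\n"
p duplicates(line.chomp.split(',').map(&:to_i)) # => [1, 2]
```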

Extract the contents from CSV into an array

I have a CSV file with contents:
John,1,2,4,67,100,41,234
Maria,45,23,67,68,300,250
I need to read this content and separate these data into two sections:
1.a Legend1 = John
1.b Legend2 = Maria
2.a Data_array1 = [1,2,4,67,100,41,234]
2.b Data_array2 = [45,23,67,a,67,300,250]
Here is my code; it reads the contents and separates the contents from ','.
testsample = CSV.read('samples/linechart.csv')
CSV.foreach('samples/linechart.csv') do |row|
puts row
end
Its output results in a class of array elements. I am stuck in pursuing it further.
I would recommend not using CSV.read for this; the data is too simple to need it. Instead, use File.readlines to read the lines and treat each one as a plain string.
eg:
# this turns the file into an array of lines (chomp: true drops the trailing newlines)
# eg you now have: ["John,1,2,4,67,100,41,234", "Maria,45,23,67,a,67,300,250"]
lines = File.readlines('samples/linechart.csv', chomp: true)
# if you want to do this for each line, just iterate over this array:
lines.each do |line|
# now split each line by the commas to turn it into an array of strings
# eg you have: ["John","1","2","4","67","100","41","234"]
values = line.split(',')
# now, grab the first one as your name and the rest of them as an array of strings
legend = values[0] # "John"
data_array = values[1..-1] # ["1","2","4","67","100","41","234"]
# now do what you need to do with the name/numbers eg
puts "#{legend}: [#{data_array.join(',')}]"
# if you want the second array to be actual numbers instead of strings, you can convert them to numbers using to_i (or to_f if you want floats instead of integers)
# the following says "take each value and call to_i on it and return the set of new values"
data_array = data_array.map(&:to_i)
end # end of iterating over the array
First get the data out of csv like:
require 'csv'
csv_text = File.read('/tmp/a.csv')
csv = CSV.parse(csv_text)
# => [["John", "1", "2", "4", "67", "100", "41", "234"], ["Maria", "45", "23", "67", "a", "67", "300", "250"]]
Now you can format output as per your requirements. Eg:
csv.each.with_index(1){ |a, i|
puts "Legend#{i.to_s} = #{a[0]}"
}
# Legend1 = John
# Legend2 = Maria
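Continuing from the parsed rows, one way to get both the legends and the numeric arrays is to destructure each row in the block parameters. This is a sketch using an inline copy of the two sample lines from the question instead of a file; the variable names legends and data_arrays are illustrative.

```ruby
require 'csv'

# Inline stand-in for the CSV file shown in the question.
text = "John,1,2,4,67,100,41,234\nMaria,45,23,67,68,300,250\n"

legends = []
data_arrays = []

# Each row is an array of strings; the first field is the legend,
# and the splat collects the remaining fields.
CSV.parse(text).each do |legend, *values|
  legends << legend
  data_arrays << values.map(&:to_i)
end

p legends     # => ["John", "Maria"]
p data_arrays # => [[1, 2, 4, 67, 100, 41, 234], [45, 23, 67, 68, 300, 250]]
```

Note that to_i turns a non-numeric field such as "a" into 0, so validate the fields first if that matters.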
You may be looking for this:
csv = CSV.new(body)
csv.to_a
You can have a look at http://technicalpickles.com/posts/parsing-csv-with-ruby/
Reference this, too, if needed.
Over-engineered version ;)
class Lines
class Line
attr_reader :legend, :array
def initialize(line)
@line = line
parse
end
private
def parse
@legend, *array = @line.strip.split(",")
@array = array.map(&:to_i)
end
end
def self.parse(file_name)
File.readlines(file_name).map do |line|
Line.new(line)
end
end
end
Lines.parse("file_name.csv").each do |o|
p o.legend
p o.array
puts
end
# Result:
#
# "John"
# [1, 2, 4, 67, 100, 41, 234]
#
# "Maria"
# [45, 23, 67, 68, 300, 250]
Notes:
Basically, Lines.parse("file_name.csv") will give you an array of objects that respond to the methods legend and array, which hold the name and the array of numbers respectively.
Jokes aside, I think OO will help maintainability.

Resources