Having trouble putting my header back on my CSV file - arrays

Here is my code:
require 'CSV'
contents = CSV.read('/Users/namename/Desktop/test.csv')
arr = []
first_row = contents[0]
contents.shift
contents.each do |row|
if row[12].to_s =~ /PO Box/i or row[12].to_s =~ /^[[:digit:]]/
#File.open('out.csv','a').puts('"'+row.join('","')+'"')
arr << row
else
row[12], row[13] = row[13], row[12]
#File.open('out.csv','a').puts('"'+row.join('","')+'"')
arr << row
end
end
arr.unshift(first_row)
arr.each do |row|
File.open('out.csv', 'a').puts('"' + row.join('","') + '"')
end
First I .shift so that my header fields don't catch the pattern (and ultimately swap) in the first conditional of the first .each loop. Then I conditionally swap cell values that match the pattern, and then store the correctly shifted values in an array. After this, I .unshift to attempt to put back my header fields I stored in first_row, but when I view the resulting out.csv file I get all my headers in the middle. Why?
Example data:
https://gist.github.com/anonymous/e1017d3ba81634d9e1227e7fe49536cb

The root of your problem is that you're not using the features provided by the CSV module.
First, CSV.read takes a :headers option that will catch the headers for you so you don't have to worry about them and, as a bonus, lets you access fields by header name instead of numeric index (handy if the CSV fields' order is changed). With the :headers option, CSV.read returns a CSV::Table object, which has another benefit I'll discuss in a moment.
Second, you're generating your own faux-CSV output instead of letting the CSV module do it. This, in particular, is needless and dangerous:
...puts('"' + row.join('","') + '"')
If any of your column values has quotation marks or newlines, which need to be escaped, this will fail, badly. You could use CSV.generate_line(row) instead, but you don't need to if you've used the headers: option above. Like I said, it returns a CSV::Table object, which has a to_csv method, and that method accepts a :force_quotes option. That will quote every field just like you want—and, more importantly, safely.
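To make the escaping problem concrete, here is a minimal sketch comparing the hand-rolled join with CSV.generate_line. The sample row and the helper names naive_line and safe_line are made up for illustration; only CSV.generate_line and its force_quotes option come from the CSV library.

```ruby
require "csv"

# Hypothetical row with an embedded quotation mark and an embedded newline.
ROW = ["Acme \"Best\" Co", "PO Box 12", "line one\nline two"].freeze

# Hand-rolled quoting, as in the question: embedded quotes are NOT escaped,
# so the output is not valid CSV.
def naive_line(row)
  '"' + row.join('","') + '"'
end

# CSV.generate_line doubles embedded quotes and appends the row separator.
def safe_line(row)
  CSV.generate_line(row, force_quotes: true)
end

puts naive_line(ROW)
puts safe_line(ROW)
```

Parsing the output of safe_line with CSV.parse gives back the original row, which is the property the hand-rolled version cannot guarantee.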
Armed with the above knowledge, the code becomes much saner:
require "csv"
contents = CSV.read('/Users/namename/Desktop/test.csv', headers: true)
contents.each do |row|
# Leave rows alone when line 1 already looks like an address; swap the rest.
next if row["DetailActiveAddressLine1"] =~ /PO Box|^[[:digit:]]/i
row["DetailActiveAddressLine1"], row["DetailActiveAddressLine2"] =
row["DetailActiveAddressLine2"], row["DetailActiveAddressLine1"]
end
File.open('out.csv', 'a') do |file|
file.write(contents.to_csv(force_quotes: true))
end
If you'd like, you can see a version of the code in action (without file access, of course) on Ideone: http://ideone.com/IkdCpb
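For readers who cannot open the link, the following is a rough, file-free sketch of the same approach. The sample data and the method name swap_address_lines are invented for illustration; the two header names are the ones used in the answer. It assumes, as in the question, that rows whose first address line mentions a PO Box or starts with a digit should be left untouched.

```ruby
require "csv"

# Hypothetical sample data; only the two address header names come from the answer.
SAMPLE = <<~TEXT
  Name,DetailActiveAddressLine1,DetailActiveAddressLine2
  Ann,Suite 4,12 Main St
  Bob,PO Box 9,Apt 2
TEXT

# Parses CSV text with headers, swaps the two address fields where the first
# one does not look like a street or PO Box line, and returns quoted CSV text.
def swap_address_lines(csv_text)
  table = CSV.parse(csv_text, headers: true)
  table.each do |row|
    next if row["DetailActiveAddressLine1"].to_s =~ /PO Box|^[[:digit:]]/i
    row["DetailActiveAddressLine1"], row["DetailActiveAddressLine2"] =
      row["DetailActiveAddressLine2"], row["DetailActiveAddressLine1"]
  end
  table.to_csv(force_quotes: true)
end

puts swap_address_lines(SAMPLE)
```

Because the header row is carried by the CSV::Table itself, it is written first by to_csv and never takes part in the pattern match.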

Related

How to continue iteration over a condition even after it has been met once or even multiple times in Ruby

I am writing a small package manager in Ruby, and while working on its package-searching functionality, I want to continue iterating over a list of matches even after it has found a package or string identical to the input string.
def makelist(jsn, searchterm)
len = jsn.length
n = 0
while n < len do
pkname = jsn[n]["Name"]
pkdesc = jsn[n]["Description"]
pkver = jsn[n]["Version"]
unless pkname != nil || pkdesc != nil
# skip
else
puts "#{fmt(fmt("aur/", 6),0)}#{fmt(pkname,0)} [#{fmt(pkver,8)}]\n #{pkdesc}"
n += 1
end
end
end
I have tried using an if statement, an unless statement, and a case statement in which I gave conditions for what it should do if it specifically finds a package that matches searchterm, but when I use this condition, it always skips all other conditions and ends the loop, printing only that result. This block of code works, but I want to use my fmt function to format the text of matches differently in the list. Does anyone have any ideas on how I could accomplish this?
EDIT:
Based on some back and forth the desired behavior is to print matched results differently from unmatched results.
So, in Ruby you have access to "functional" patterns like select, map, reduce, etc. that can be useful for what you're doing here.
It's also advantageous to separate functionality into different methods (e.g. one that searches, one that turns them into a string, one that prints the output). This is just an example of how to break it down, but splitting up the responsibilities makes it much easier to test and see which function isn't doing what you want by examining the intermediate structures.
Also, I'm not sure how you want to "match" the search term, so I used String#include? but you can replace that with whatever matching algorithm you like.
But I think you're looking for something like this:
def makelist(jsn, searchterm)
jsn.select do |package|
package["Name"] && package["Description"]
end.map do |package|
if matches?(package, searchterm)
matched_package_to_string(package)
else
package_to_string(package)
end
end.join("\n")
end
def matches?(package, searchterm)
package['Name'].include?(searchterm) || package['Description'].include?(searchterm)
end
def matched_package_to_string(package)
pkname = package["Name"]
pkdesc = package["Description"]
pkver = package["Version"]
"#{fmt(fmt("aur/", 6),0)}#{fmt(pkname,0)} [#{fmt(pkver,8)}]\n #{pkdesc}"
end
def package_to_string(package)
# Not sure how these should print
end
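Since fmt and the unmatched format are not shown in the question, here is a self-contained sketch of the same select/map/join pipeline with a stand-in format. The method name render_packages, the "* " marker for matches, and the sample packages are all assumptions for illustration, not part of the original code.

```ruby
# Renders every package that has a name and a description; packages matching
# the search term get a hypothetical "* " marker, the others two spaces.
def render_packages(packages, searchterm)
  packages
    .select { |pkg| pkg["Name"] && pkg["Description"] }
    .map do |pkg|
      matched = pkg["Name"].include?(searchterm) ||
                pkg["Description"].include?(searchterm)
      marker = matched ? "* " : "  "
      "#{marker}aur/#{pkg["Name"]} [#{pkg["Version"]}]\n    #{pkg["Description"]}"
    end
    .join("\n")
end

# Hypothetical search results.
packages = [
  { "Name" => "ruby-build", "Description" => "Compile and install Ruby", "Version" => "1.0" },
  { "Name" => "htop", "Description" => "Process viewer", "Version" => "3.2" },
  { "Name" => "broken", "Description" => nil, "Version" => "0.1" }
]

puts render_packages(packages, "ruby")
```

The loop never stops at the first match: every package passes through map, and the match only decides which format is used.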

Finding specific instance in a list when the list starts with a comma

I'm uploading a spreadsheet and mapping the spreadsheet column headings to those in my database. The email column is the only one that is required. In StringB below, the ,,, simply indicates that a column was skipped/ignored.
The meat of my question is this:
I have a string of text (StringA) that comes from a spreadsheet, and I need to find it in another string of text (StringB) that matches my database (these are not the real values; I simplified them to illustrate the problem, so hopefully this is clear).
StringA: YR,MNTH,ANNIVERSARIES,FIRSTNAME,LASTNAME,EMAIL,NOTES
StringB: ,YEAR,,MONTH,LastName,Email,Comments <-- this list is dynamic
MNTH and MONTH are intentionally different;
excelColumnList = 'YR,MNTH,ANNIV,FIRST NAME,LAST NAME,EMAIL,NOTES';
mappedColumnList= ',YEAR,,MONTH,,First Name,Last Name,Email,COMMENTS';
mappedColumn= 'Last Name';
local.index = ListFindNoCase(mappedColumnList, mappedColumn,',', true);
local.returnValue = "";
if ( local.index > 0 )
local.returnValue = ListGetAt(excelColumnList, local.index);
writedump(local.returnValue); // dumps "EMAIL" which is wrong
The problem is that when StringB starts with a , the index returned is wrong, which affects the mapping later. If StringB starts with a word, the process works perfectly. Is there a better way to get the index when StringB starts with a ,?
I also tried using listtoarray and then arraytolist to clean it up but the index is still off and I cannot reliably just add +1 to the index to identify the correct item in the list.
On the other hand, I was considering mappedColumnList = right(mappedColumnList,len(mappedColumnList)-1) to remove the leading , which still throws my index values off, BUT I could account for that by adding 1 to the index, and this appears to be reliable at first glance. I'm just concerned this is a sort of hack.
Any advice?
https://cfdocs.org/listfindnocase
Here is a cfgist: https://trycf.com/gist/4b087b40ae4cb4499c2b0ddf0727541b/lucee5?theme=monokai
UPDATED
I accepted the answer using EDIT #1. I also added a comment here: Finding specific instance in a list when the list starts with a comma
Identify and strip the "," off the list if it is the first character.
EDIT: Changed to a while loop to identify multiple leading ","s.
Try:
while(left(mappedColumnList,1) == ",") {
mappedColumnList = right( mappedColumnList,(len(mappedColumnList)-1) ) ;
}
https://trycf.com/gist/64287c72d5f54e1da294cc2c10b5ad86/acf2016?theme=monokai
EDIT 2: Or even better, if you don't mind dropping back into Java (and a little Regex), you can skip the loop completely. Super efficient.
mappedColumnList = mappedColumnList.replaceall("^(,*)","") ;
And then drop the while loop completely.
https://trycf.com/gist/346a005cdb72b844a83ca21eacb85035/acf2016?theme=monokai
<cfscript>
excelColumnList = 'YR,MNTH,ANNIV,FIRST NAME,LAST NAME,EMAIL,NOTES';
mappedColumnList= ',,,YEAR,MONTH,,First Name,Last Name,Email,COMMENTS';
mappedColumn= 'Last Name';
mappedColumnList = mappedColumnList.replaceall("^(,*)","") ;
local.index = ListFindNoCase(mappedColumnList, mappedColumn,',', true);
local.returnValue = ListGetAt(excelColumnList,local.index,",",true) ;
writeDump(local.returnValue);
</cfscript>
Explanation of the Regex ^(,*):
^ = Start at the beginning of the string.
() = Capture this group of characters
,* = A literal comma and all consecutive repeats.
So ^(,*) says, start at the beginning of the string and capture all consecutive commas until reaching the next non-matched character. Then the replaceall() just replaces that set of matched characters with an empty string.
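The same pattern can be tried outside ColdFusion. Below is a small sketch in Ruby rather than CFML, so String#sub stands in for replaceall; the method name strip_leading_commas is made up. The capture group is dropped because nothing refers to it, and \A anchors at the start of the whole string.

```ruby
# Removes every comma at the very start of the list and nothing else.
def strip_leading_commas(list)
  list.sub(/\A,*/, "")
end

puts strip_leading_commas(",,,YEAR,MONTH,,First Name,Last Name,Email,COMMENTS")
```

Note that the empty element between MONTH and First Name is untouched, which is why the index handling discussed next still matters.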
EDIT 3: I fixed a typo in my original answer. I was only using one list.
writeOutput(arraytoList(listtoArray(mappedColumnList))) will get rid of your leading commas, but only because it drops empty elements before the list becomes an array. This throws your indexing off because you have one empty element in the middle of your original mappedColumnList string. The later string functions will both read and index that empty element. So, to keep your indexes working the way you seem to want, you'll either need to make sure that your Excel and db columns are always in the same order, or you'll have to create some sort of mapping for each of the column names and then perform the ListGetAt() on the string you need to use.
By default, many CF list functions ignore empty elements. A flag was added to these functions so that you can disable this behavior. If you have the string ,,1,2,3 then by default listToArray would consider that 3 elements, but listToArray(listVar, ",", true) will return 5, with the first two as empty strings. ListGetAt has the same "includeEmptyValues" flag, so your code should work consistently when that is set to true.
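The effect of that flag on positions can be sketched in Ruby (not CFML). The two helpers below are hypothetical stand-ins that imitate the described behavior of listToArray and ListFindNoCase; they are not the ColdFusion functions themselves.

```ruby
# Splits a comma-delimited list, optionally keeping empty elements.
def list_to_array(list, include_empty: false)
  items = list.split(",", -1) # -1 keeps trailing empty elements as well
  include_empty ? items : items.reject(&:empty?)
end

# Returns the 1-based, case-insensitive position of item, or 0 if absent.
def list_find_no_case(list, item, include_empty: false)
  index = list_to_array(list, include_empty: include_empty)
          .index { |element| element.casecmp?(item) }
  index ? index + 1 : 0
end

puts list_to_array(",,1,2,3").length                         # empty elements dropped
puts list_to_array(",,1,2,3", include_empty: true).length    # empty elements kept
puts list_find_no_case(",,1,2,3", "2")
puts list_find_no_case(",,1,2,3", "2", include_empty: true)
```

The position of the same item differs by the number of empty elements before it, so the lookup and the later retrieval have to agree on whether empties count.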

How can I write strings to an h5 in matlab?

I've managed to answer my own question. This code will write cell arrays of any shape containing strings. The datasets can be modified/overwritten by simply calling again with a different input.
https://www.mathworks.com/matlabcentral/fileexchange/24091-hdf5-read-write-cellstr-example
%Okay, Matlab's h5write(filename, dataset, data) function doesn't work for
%strings. It hasn't worked with strings for years. The forum post that
%comes up first in Google about it is from 2009. Yeah. This is terrible,
%and evidently it's not getting fixed. So, low level functions. Fun fun.
%
%What I've done here is adapt examples, one from the hdf group's website
%https://support.hdfgroup.org/HDF5/examples/api18-m.html called
%"Read / Write String Datatype (Dataset)", the other by Jason Kaeding.
%
%I added functionality to check whether the file exists and either create
%it anew or open it accordingly. I wanted to be able to likewise check the
%existence of a dataset, but it looks like this functionality doesn't exist
%in the API, so I'm doing a try-catch to achieve the same end. Note that it
%appears you can't just create a dataset or group deep in a hierarchy: You
%have to create each level. Since I wanted to accept dataset names in the
%same format as h5read(), in the event the dataset doesn't exist, I loop
%over the parts of the dataset's path and try to create all levels. If they
%already exist, then this action throws errors too; hence a second
%try-catch.
%
%I've made it more advanced than h5create()/h5write() in that it all
%happens in one call and can accept data inputs of variable size. I take
%care of updating the dataset's extent to accommodate changing data array
%sizes. This is important for applications like adding a new timestamp
%every time the file is modified.
%
%#author Pavel Komarov pavel#gatech.edu 941-545-7573
function h5createwritestr(filename, dataset, str)
%"The class of input data must be cellstring instead of char when the
%HDF5 class is VARIABLE LENGTH H5T_STRING.", but also I don't want to
%force the user to put braces around single strings, so this.
if ischar(str)
str = {str};
end
%check whether the specified .h5 exists and either create or open
%accordingly
if ~exist(filename, 'file')
file = H5F.create(filename, 'H5F_ACC_TRUNC', 'H5P_DEFAULT', 'H5P_DEFAULT');
else
file = H5F.open(filename, 'H5F_ACC_RDWR', 'H5P_DEFAULT');
end
%set variable length string type
vlstr_type = H5T.copy('H5T_C_S1');
H5T.set_size(vlstr_type,'H5T_VARIABLE');
% There is no way to check whether a dataset exists, so just try to
% open it, and if that fails, create it.
try
dset = H5D.open(file, dataset);
H5D.set_extent(dset, fliplr(size(str)));
catch
%create the intermediate groups one at a time because evidently the
%API's functions aren't smart enough to be able to do this themselves.
slashes = strfind(dataset, '/');
for i = 2:length(slashes)
url = dataset(1:(slashes(i)-1));%pull out the url of the next level
try
H5G.create(file, url, 1024);%1024 "specifies the number of
catch %bytes to reserve for the names that will appear in the group"
end
end
%create a dataspace for cellstr
H5S_UNLIMITED = H5ML.get_constant_value('H5S_UNLIMITED');
spacerank = max(1, sum(size(str) > 1));
dspace = H5S.create_simple(spacerank, fliplr(size(str)), ones(1, spacerank)*H5S_UNLIMITED);
%create a dataset plist for chunking. (A dataset can't be unlimited
%unless the chunk size is defined.)
plist = H5P.create('H5P_DATASET_CREATE');
chunksize = ones(1, spacerank);
chunksize(1) = 2;
H5P.set_chunk(plist, chunksize);% 2 strings per chunk
dset = H5D.create(file, dataset, vlstr_type, dspace, plist);
%close things
H5P.close(plist);
H5S.close(dspace);
end
%write data
H5D.write(dset, vlstr_type, 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT', str);
%close file & resources
H5T.close(vlstr_type);
H5D.close(dset);
H5F.close(file);
end
I found a bug!
spacerank = length(size(str));
Now it works flawlessly as far as I can tell.

Syntax request: "While" syntax for parsing arrays of JSON strings (VBA)

I am working to get my code to loop through arrays of JSON strings (of same format) until it reaches the end of the arrays (i.e., no strings left). I need the code to recognize it has reached the end by identifying that a certain identifier (present in every set) does not have additional information in the next array. So I believe I am looking for "while" syntax that says "while this identifier has content" proceed parsing the JSON according to the below code. My existing code works for array of strings for which I know the length - unfortunately the lengths are variable therefore I need flexible syntax to adjust with the lengths (i.e., "For 0 to n" doesn't work every time).
The JSON code I am parsing is in this format:
{"id":1,"prices":[{"name":"expressTaxi","cost":{"base":"USD4.50","fareType":
"time_plus_distance","cancelFee":"USD10.00","minimumAmt":"USD8.00","perMinute":"USD1.50",
"perDistanceUnit":"USD3.00"}}]
''note that this could have multiple arrays embedded. That is, from the "name" key to
''the "perDistanceUnit" key would all repeat, with different value pairs.
''The number of "name" to "perDistanceUnit" arrays is unknown.
Here is the identifier structure I'd like to wrap in some type of while loop ("i" is the index, which depends on the current iteration of the yet-to-be-implemented "while" loop).
Json("prices")(i)("name")
So ideally looking for something like:
"While Json("prices")(i)("name") has information" then proceed on....
Please note again, everything works when I know the length -- just looking for a small syntax update, thank you! UPDATE: full code below:
Option Explicit
Sub getJSON()
sheetCount = 1
i = 1
urlArray = Array("URL1", "URL2", "URL3")
Dim MyRequest As Object: Set MyRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
Dim MyUrls: MyUrls = urlArray
Dim k As Long
Dim Json As Object
For k = LBound(MyUrls) To UBound(MyUrls)
With MyRequest
.Open "GET", MyUrls(k)
.Send
Set Json = JsonConverter.ParseJson(.ResponseText)
''[where I’d like some type of While statement checking into the below line for content]
Sheets("Sheet" & sheetCount).Cells(i, 1) = Json("prices")(i)("name")
Sheets("Sheet" & sheetCount).Cells(i, 2) = Json("prices")(i)("cost")("base")
i = i + 1
End With
sheetCount = sheetCount + 1
Next
End Sub
I'm not that familiar with the library you're using, but it seems like it converts objects (items enclosed in { }) to Dictionary objects, and arrays (things enclosed in [ ]) to Collections.
Depending on the structure of the parsed JSON, these objects may be nested: i.e., one element in a dictionary may be a Collection (array).
Luckily for you, both of these object types have a Count property you can use to iterate over them (and the dictionary type also has a "keys" collection).
So to loop over each "price":
Dim i As Long, p As Object
For i = 1 To Json("prices").Count
Set p = Json("prices")(i)
Debug.Print p("name"), p("cost")("base"), p("cost")("fareType")
Next i
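The underlying idea, iterating over however many entries the parsed array contains instead of probing for an empty one, looks like this in Ruby with the standard json library. This is a sketch in a different language from the VBA above; the method name price_rows is made up, and the second entry in the sample JSON is invented to show more than one price.

```ruby
require "json"

# Returns one [name, base, fareType] triple per entry in "prices",
# however many entries there are (including none).
def price_rows(json_text)
  JSON.parse(json_text).fetch("prices", []).map do |price|
    [price["name"], price.dig("cost", "base"), price.dig("cost", "fareType")]
  end
end

# Sample modeled on the question; the "sharedTaxi" entry is hypothetical.
sample = '{"id":1,"prices":[' \
         '{"name":"expressTaxi","cost":{"base":"USD4.50","fareType":"time_plus_distance"}},' \
         '{"name":"sharedTaxi","cost":{"base":"USD3.00","fareType":"flat"}}]}'

price_rows(sample).each { |row| puts row.join("\t") }
```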

Clearing up confusion with Maps/Collections (Groovy)

I define a collection that is supposed to map two parts of a line in a tab-separated text file:
def fileMatches = [:].withDefault{[]}
new File('C:\\BRUCE\\ForensicAll.txt').eachLine { line ->
def (source, matches) = line.split (/\t/)[0, 2]
fileMatches[source] << (matches as int)}
I end up with entries such as filename:[984, 984] and I want [filename : 984] . I don't understand how fileMatches[source] << (matches as int) works. How might I get the kind of collection I need?
I'm not quite sure I understand what you are trying to do. How, for example, would you handle a line where the two values are different instead of the same (which seems to be what is implied by your code)? A map requires unique keys, so you can't use filename as a key if it has multiple values.
That said, you could get the result you want with the data implied by your result using:
def fileMatches = [:]
new File('C:\\BRUCE\\ForensicAll.txt').eachLine { line ->
def (source, matches) = line.split(/\t/)[0,2]
fileMatches[source] = (matches as int)
}
But this will clobber the data (i.e., for each filename you will end up with only the value from the last line in which it appears). If that's not what you want, you may want to rethink your data structure here.
Alternatively, assuming you want unique values, you could do:
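// withDefault takes a closure that builds the default value for a missing key
def fileMatches = [:].withDefault { [] as Set }
new File('C:\\BRUCE\\ForensicAll.txt').eachLine { line ->
def (source, matches) = line.split(/\t/)[0,2]
// << adds the value to the key's Set, so duplicates are dropped
fileMatches[source] << (matches as int)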
}
This will result in something like [filename:[984]] for the example data and, e.g., [filename:[984, 987]] for files having those two values in the two columns you are checking.
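To show how a default-valued map plus << behaves, here is an analogous sketch in Ruby rather than Groovy. Hash.new with a block plays the role of withDefault, and the method name collect_matches and the sample lines are made up for illustration.

```ruby
require "set"

# Groups the third tab-separated field (as an integer) by the first field.
# Each missing key gets its own empty Set, so repeated values collapse.
def collect_matches(lines)
  matches = Hash.new { |hash, key| hash[key] = Set.new }
  lines.each do |line|
    fields = line.chomp.split("\t")
    matches[fields[0]] << Integer(fields[2])
  end
  matches
end

# Hypothetical lines: filename, an ignored column, match count.
lines = ["a.txt\tx\t984", "a.txt\ty\t984", "a.txt\tz\t987", "b.txt\tx\t12"]
collect_matches(lines).each { |name, values| puts "#{name}: #{values.to_a.inspect}" }
```

With an Array as the default instead of a Set, the first key would collect 984 twice, which is the behavior described in the question.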
Again, it really depends on what you are trying to capture. If you could provide more detail on what you are trying to accomplish, your question may become answerable...
