Advice on reading multiple text files into an array with Ruby - arrays

I'm currently writing out a program in Ruby, which I'm fairly new at, and it requires multiple text files to be pushed into an array line by line.
I am currently unable to actually test my code since I'm at work and this is for personal use, but I'm seeking advice to see if my code is correct. I know how to read a file and push it to the array. If possible, can someone check it over and advise if I have the correct idea? I'm self-taught regarding Ruby and have no one to check my work.
I understand if this isn't the right place for trying to get this sort of advice and it's deleted/locked. Apologies if so.
contentsArray = []
Dir.glob('filepath').each do |filename|
next if File.directory?(filename)
r = File.open("#{path}#{filename}")
r.each_line { |line| contentsArray.push line}
end
I'm hoping this snippet will take the lines from multiple files in the same directory and stick them in the array so I can later splice what's in there.

Thank you for the question.
First let's assume that 'filepath' is something like the target pattern you want to glob in Dir.glob('filepath') (I used Dir.glob('src/*.h').each do |filename| in my test).
Next, Dir.glob already returns each path with the directory portion of the pattern included, so File.open("#{path}#{filename}") prepends a second path to an already complete one (and path is never defined in the snippet).
And lastly, although this is probably not the problem, the code opens the file and never closes it. IO.readlines opens the file, reads every line, and closes it for you.
Here's some working code that you can adapt:
contentsArray = []
Dir.glob('filepath').each do |filename|
next if File.directory?(filename)
lines = IO.readlines(filename)
contentsArray.concat(lines)
end
puts "#{contentsArray.length} LINES"
Here are references to the Ruby docs for the IO::readlines and Array::concat methods used:
https://ruby-doc.org/core-2.5.5/IO.html#method-i-readlines
https://ruby-doc.org/core-2.5.5/Array.html#method-i-concat
As an alternative to skipping directories with next, the code could run only when the entry is a regular file, like this:
if File.file?(filename)
lines = IO.readlines(filename)
contentsArray.concat(lines)
end
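For comparison, the same idea can be sketched as a single method. This is only a sketch: read_all_lines is a made-up helper name, and the chomp: true option (which strips the trailing newline from each line) assumes Ruby 2.4 or later.

```ruby
# Collect every line from every regular file matching a glob pattern.
# `read_all_lines` is a hypothetical helper name, not part of the answer above.
def read_all_lines(pattern)
  Dir.glob(pattern)
     .select { |path| File.file?(path) }  # keep regular files, skip directories
     .sort                                # stable order regardless of Ruby version
     .flat_map { |path| File.readlines(path, chomp: true) }
end
```

Calling read_all_lines('src/*.h') would return one flat array of lines, in file-name order, with newlines removed.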

Related

error in looping over files, -fs- command

I'm trying to split some datasets in two parts, running a loop over files like this:
cd C:\Users\Macrina\Documents\exports
qui fs *
foreach f in `r(files)' {
use `r(files)'
keep id adv*
save adv_spa*.dta
clear
use `r(files)'
drop adv*
save fin_spa*.dta
}
I don't know whether what is inside the loop is correctly written but the point is that I get the error:
invalid '"e2.dta'
where e2.dta is the second file in the folder. Does this message refer to the loop or maybe what is inside the loop? Where is the mistake?
You want lines like
use "`f'"
not
use `r(files)'
given that fs (installed from SSC, as you should explain) returns r(files) as a list of all the files whereas you want to use each one in turn (not all at once).
The error message was informative: use is puzzled by the second filename it sees (as only one filename makes sense). The other filenames are ignored: use fails as soon as something is evidently wrong.
Incidentally, note that putting "" around filenames remains essential if any filename includes spaces.

Reading multiple files with the Ruby gem 'Yomu'

I'm trying to download documents and strip out the document metadata with the Yomu gem, but cannot find guidance for parsing multiple files. The semi working code is below, and should work if you put some pdf files in the same directory as the script.
require 'yomu'
dir = Dir.pwd
files = Dir["#{dir}/*.pdf"]
def allpdf(files)
filearray = []
files.each do |file|
filearray << file
end
filearray
end
def metadata(dir, allfiles)
array = []
allfiles.each do |file|
yomufile = Yomu.new file
array << yomufile.metadata["Author"]
puts array
end
end
allfiles = allpdf(files)
metadata(dir, allfiles)
So when I 'puts array' it spits out what I would expect. But if I call 'array' outside of the loop, I get a single entry repeated over and over, so I can only assume that the array/yomu hash is being overwritten perhaps. What is the best way to fix this so that I can return a full array for use elsewhere in the application?
Please note: I suspect this may be a more general Ruby error on my part related to my lack of array skills rather than a Yomu-specific issue. I'm not sure how else to address this question, however.
Jakub Pavlík was correct: the code was actually working as stated, it just wasn't displaying the output in the way I expected!
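One way to make the result usable elsewhere is to return the array from the method instead of printing it inside the loop (as written, metadata returns the value of each, which is the file list). The sketch below is an assumption-laden illustration: collect_authors is a made-up name, and a block stands in for the Yomu lookup so it runs without the gem.

```ruby
# Build and return one value per file instead of printing inside the loop.
# `collect_authors` is a hypothetical name; the block stands in for the
# Yomu lookup from the question so the sketch runs without the gem.
def collect_authors(files)
  files.map { |file| yield(file) }
end

# With Yomu installed, the call would presumably look like:
#   authors = collect_authors(Dir["#{Dir.pwd}/*.pdf"]) { |f| Yomu.new(f).metadata["Author"] }
fake_metadata = { 'a.pdf' => 'Ann', 'b.pdf' => 'Bob' }  # invented sample data
authors = collect_authors(fake_metadata.keys) { |f| fake_metadata[f] }
puts authors
```

Because map returns a new array, the caller gets the full list once, and printing happens outside the loop.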

How to find word from the end of file in Lua

Ok I use method from here: How to Read only the last line of a text file in Lua?
The problem is that sometimes line can be bigger.
The question is how can i find first word "foo" from the end of file and then use everything after it?
The problem is that sometimes line can be bigger.
Then you just need to seek further back from the end.
The question is how can i find first word "foo" from the end of file and then use everything after it?
Grab a big enough chunk of the file to be sure you've got the last foo, then use .*foo to skip everything up to and including the last "foo" (.* is greedy).
local f = io.open('filename', 'r')
f:seek('end', -1024)
local text = f:read('*a')
local after = string.match(text, ".*foo(.*)")
f:close()
If the file is not too big and you're ready to take the easy way out this might help:
fh=io.open('myfile.txt','rb')
str=fh:read'*a'
pat='foo'
afterFoo=str:match('.*'..pat..'(.*)$')
fh:close()
If you need a more complex but faster (in run time on large files) solution, my guess would be that you'd read in the file in chunks, reverse each of them, and look for your pattern in reverse. Don't forget to look for your pattern across the borders (in the general case, the chunks must overlap by at least the length of the pattern you're seeking).
For more explanation about the block reading, see my post here.

Create functions in matlab

How can I create a function with MATLAB so I can call it any where in my code?
I'm new to MATLAB so I will write a PHP example of the code I want to write in MATLAB!
Function newmatlab(n){
n=n+1;
return n;
}
array=array('1','2','3','4');
foreach($array as $x){
$result[]=newmatlab($x);
}
print_f($result);
So in a nutshell, I need to loop over an array and apply a function to each item in this array.
Can someone show me the above function written in MATLAB so I can understand better?
Note: I need this because I wrote code that analyzes a video file and then plots data on a graph. I then save this graph to Excel and jpg. My problem is that I have more than 200 videos to analyze, so I need to automate this code to loop inside folders and analyze each *.avi file inside, etc.
As others have said, the documentation covers this pretty thoroughly, but perhaps we can help you understand.
There are a handful of ways that you can define functions in Matlab, but probably the most useful for you to get started is to define one in an m-file. I'll use your example code. You can do this by creating a file called newmatlab.m in your project's directory that looks something like this
% newmatlab.m
function result = newmatlab(array)
result = array + 1
Note that the function has the same name as the file and that there is no explicit return statement - it figures that out by what you've named the output parameter(s) (result in this case).
Then, in the same directory, you can create a script (or another function) that calls your newmatlab function by that name:
% main.m (or whatever)
a = [1 2 3 4];
b = newmatlab(a)
That's it! This is a simplified explanation, but hopefully enough to get you started and then the documentation can help more.
PS: There's no "include" in Matlab; any functions that are defined in m-files in the current path are visible. You can find out what's in the path by using the path command. Roughly, it's going to consist of
Matlab's own directory
The MATLAB subdirectory of your Documents directory
The current working directory

C - Reading multiple files

Just had a general question about how to approach a certain problem I'm facing. I'm fairly new to C so bear with me here. Say I have a folder with 1000+ text files; the files are not named in any kind of numbered order, but they are alphabetical. For my problem I have files of stock data, each file named after the company's respective ticker. I want to write a program that will open each file, read the data, find the historical low, compare it to the current price, calculate the percent change, and then print it. Searching and calculating are not a problem; the problem is getting the program to go through and open each file. The only way I can see to attack this is to create a text file containing all of the ticker symbols, have the program read that into an array, and then run a loop that opens the first filename in the array, performs the calculations, prints the output, closes the file, then loops back around to the second element (the next ticker symbol) in the array. This would be fairly simple to set up (I think) but I'd really like to avoid typing out over a thousand file names into a text file. Is there a better way to approach this? Not really asking for code (unless there is some amazing function in C that will do this for me ;) ), just some advice from more experienced C programmers.
Thanks :)
Edit: This is on Linux, sorry I forgot to mention that!
Under Linux/Unix (BSD, OS X, POSIX, etc.) you can use opendir / readdir to go through the directory structure. No need to generate static files that need to be updated, when the file system has the information you want. If you only want a sub-set of stocks at a given time, then using glob would be quicker, there is also scandir.
I don't know what Win32 (Windows / Platform SDK) functions are called, if you are developing using Visual C++ as your C compiler. Searching MSDN Library should help you.
Assuming you're running on linux...
ls /path/to/text/files > names.txt
is exactly what you want.
opendir(); on linux.
http://linux.die.net/man/3/opendir
Example:
http://snippets.dzone.com/posts/show/5734
In pseudo code it would look like this, I cannot define the code as I'm not 100% sure if this is the correct approach...
for each directory entry
scan the filename
extract the ticker name from the filename
open the file
read the data
create a record consisting of the filename, data.....
close the file
add the record to a list/array...
sort the list/array into alphabetical order based on the ticker name in the filename...
You could vary it slightly if you wish, scan the filenames in the directory entries and sort them first by building a record with the filenames first, then go back to the start of the list/array and open each one individually reading the data and putting it into the record then....
Hope this helps,
best regards,
Tom.
There are no functions in standard C that have any notion of a "directory". You will need to use some kind of platform-specific function to do this. For some examples, take a look at this post from Cprogramming.com.
Personally, I prefer using the opendir()/readdir() approach as shown in the second example. It works natively under Linux and also on Windows if you are using Cygwin.
Approach 1) I would just have a specific directory in which I have ONLY these files containing the ticker data and nothing else. I would then use the C readdir API to list all files in the directory and iterate over each one performing the data processing that you require. Which ticker the file applies to is determined only by the filename.
Pros: Easy to code
Cons: It really depends where the files are stored and where they come from.
Approach 2) Change the file format so the ticker files start with a magic code identifying that this is a ticker file, and a string containing the name. As before use readdir to iterate through all files in the folder and open each file, ensure that the magic number is set and read the ticker name from the file, and process the data as before
Pros: More flexible than before. Filename needn't reflect name of ticker
Cons: Harder to code, file format may be fixed.
but I'd really like to avoid typing out over a thousand file names into a text file. Is there a better way to approach this?
I have solved the exact same problem a while back, albeit for personal uses :)
What I did was to use the OS shell commands to generate a list of those files and redirected the output to a text file and had my program run through them.
On UNIX, there's the handy glob function:
glob_t results;
memset(&results, 0, sizeof(results));
glob("*.txt", 0, NULL, &results);
for (size_t i = 0; i < results.gl_pathc; i++)
printf("%s\n", results.gl_pathv[i]);
globfree(&results);
On Linux or a related system, you could use the fts library. It's designed for traversing file hierarchies: man fts,
or even something as simple as readdir
If on Windows, you can use its Directory Management APIs: more specifically, the FindFirstFile function, used with wildcards, in conjunction with FindNextFile.

Resources