I am writing my first Perl program and it's a doozy. I'm happy to say that everything has been working for the most part, and searching this website has helped with most of my problems.
I am working with a large file composed of space-separated values. I filter the file down to only the lines with a certain value in one of the columns, and output the filtered data to a new file. I then attempt to push all of the lines of that file into an array to use for looping. Here's some code:
my @orig_file_lines = <ORIG_FILE>;
open MAKE_NEW_FILE, '>', 'newfile.dat' or die "Couldn't open newfile.dat!";
&make_new_file(\@orig_file_lines); ##Creates a new, filtered newfile.dat
open NEW, "newfile.dat" or die "Couldn't open newfile.dat!";
my @lines;
while(<NEW>){
push(@lines,$_);
}
printf("%s\n", $lines[$#lines]); ##Should print entirety of last line of newfile.dat
The problem is twofold: 1. $#lines = 24500 here when the newly created file (newfile.dat) actually has 24503 lines (so it should be 24502), 2. the printf statement returns a truncated line 24500, cutting off that line prematurely by about two columns.
Every other line, i.e. $lines[0] through $lines[24499], prints in full even when it is wider than $lines[24500], so the length of that particular line (they're all long) is not the problem. It is almost as if the array has gotten too large somehow, since it cut off part of one line and then dropped the next two lines. If so, how do I combat this?
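It looks like you forgot to close MAKE_NEW_FILE before opening the same file with NEW. Output to a filehandle is buffered, so until the handle is closed (or flushed) the last block of data may not have reached the disk, which would explain both the missing lines and the truncated final line.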
Some other points to look at:
&function syntax is mostly deprecated because it bypasses prototype checking.
I trust that you are using use warnings; and use strict;.
I notice that you have both a two-argument open and a three-argument open. Although both are legal, they reflect different mindsets, and mixing them makes the code confusing to read. I would stay with the three-argument open because I think it is easier to understand (unless you are playing code golf).
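The buffering effect is not specific to Perl. As a rough illustration only (sketched in Python rather than Perl, with a made-up temporary path), the following writes a few lines, reads the file back while the writer is still open, and reads it again after closing. The first read will typically see fewer lines than the second, because small writes are still sitting in the writer's buffer.

```python
import os
import tempfile

# Hypothetical path, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "newfile.dat")

writer = open(path, "w")
for i in range(5):
    writer.write(f"line {i}\n")      # small writes stay in the in-memory buffer

# Read while the writer is still open: buffered data may not be on disk yet.
with open(path) as reader:
    seen_before_close = reader.readlines()

writer.close()                       # close() flushes the buffer to disk

with open(path) as reader:
    seen_after_close = reader.readlines()

print(len(seen_before_close), len(seen_after_close))
```

With a large file, most of the data has already been flushed by the time the second handle reads it, so only the tail end goes missing, which matches the symptoms in the question.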
I'm trying to split some datasets in two parts, running a loop over files like this:
cd C:\Users\Macrina\Documents\exports
qui fs *
foreach f in `r(files)' {
use `r(files)'
keep id adv*
save adv_spa*.dta
clear
use `r(files)'
drop adv*
save fin_spa*.dta
}
I don't know whether what is inside the loop is correctly written but the point is that I get the error:
invalid '"e2.dta'
where e2.dta is the second file in the folder. Does this message refer to the loop or maybe what is inside the loop? Where is the mistake?
You want lines like
use "`f'"
not
use `r(files)'
given that fs (installed from SSC, as you should explain) returns r(files) as a list of all the files whereas you want to use each one in turn (not all at once).
The error message was informative: use is puzzled by the second filename it sees (as only one filename makes sense). The other filenames are ignored: use fails as soon as something is evidently wrong.
Incidentally, note that putting "" around filenames remains essential if any filename includes spaces.
I want to do the following:
open and read an ASCII file
locate a substring (geographical coordinates)
create its replacement (apply corrections to the original coordinates)
overwrite the original substring (write in the original file the corrected coordinates).
The format of the ASCII file is:
$GPGGA,091306.00,4548.17420,N,00905.47990,E,1,09,0.87,233.5,M,47.2,M,,*53
I will paste here only the part of the code that is responsible for this operation:
opnmea = fopen (argv[1], "r+");
if (fgets(row_nmea, ROW, opnmea)==NULL){
if (strstr(row_nmea,"$GPGGA")!=NULL) {
sscanf(row_nmea+17, "%10c", old_phi);
sscanf(row_nmea+30, "%11c", old_lam);
sscanf(row_nmea+54, "%5c", old_h);
fputs();
}
}
So far I extract the old coordinates into variables, and I was thinking of using fputs() to overwrite the old values with the new ones, but I could not get it to work. The part of the code that is not shown here computes the corrected coordinates. My idea is to correct the rows one by one, as fgets() reads each line.
I would very much appreciate any suggestion on how to use fputs() or another function to complete my work. I am looking for something simple, as I am a beginner with C.
Thank you in advance.
Patching a text file in place is not a good solution for this problem, for multiple reasons:
the modified version might have a different length, hence patching cannot be done in place.
the read-write operation of standard streams is not so easy to handle correctly and defeats the buffering mechanism.
if you encounter an error during the patching phase, a partially modified file can be considered corrupted as one cannot tell which coordinates have been modified and which have not.
other programs might be reading from the same file as you are writing it. They will read invalid or inconsistent data.
I strongly recommend writing a program that reads the original file and writes a modified version to a different output file.
For this you need to:
open the original file for reading: opnmea = fopen(argv[1], "r");
open the output file for writing: outfile = fopen(temporary_file_name, "w");
copy the lines that do not require modification: just call fputs(row_nmea, outfile).
parse relevant data in lines that require modification with whatever method you are comfortable with: sscanf, strtok, ...
compute the modified fields and write the modified line to outfile with fprintf.
Once the file has been completely and correctly handled, you can replace the original file with rename. The rename operation is usually atomic at the file-system level, so other programs will either finish reading from the previous version or open the new version.
Of course, if the file has only one line, you could simply rewind the stream and write back the line with fprintf, but this is a special case and it will fail if the new version is shorter than the original. Truncating the extra data is not easy. An alternative is to reopen the file in write mode ("w") before writing the modified line.
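The read, write to a temporary file, then rename sequence described above can be sketched as follows. This is only an outline under stated assumptions: it is written in Python rather than C to keep it short, and `rewrite_file` and `fix_line` are hypothetical names. `fix_line` stands for whatever function returns the corrected row, or the row unchanged when no correction is needed.

```python
import os
import tempfile

def rewrite_file(path, fix_line):
    """Copy path line by line through fix_line into a temporary file,
    then replace the original only once every line has been written."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final
    # rename stays on the same file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" keeps the original line endings untouched.
        with os.fdopen(fd, "w", newline="") as out, open(path, newline="") as src:
            for line in src:
                out.write(fix_line(line))
        os.replace(tmp_path, path)   # swap in the new version
    except BaseException:
        os.unlink(tmp_path)          # leave the original file as it was
        raise
```

If anything fails before the final replace, the original file is still intact and the temporary file is removed, which avoids the half-patched state mentioned above.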
I would recommend strtok(), followed by your revision, followed by strcat().
strtok() will let you separate the line using the comma as a delimiter, so you will get the field you want reliably. You can break up the line into separate strings, revise the coordinates you wish, and reassemble the line, including the commas, with strcat().
These pages include nice usage examples, too:
http://www.cplusplus.com/reference/cstring/strtok/
http://www.cplusplus.com/reference/cstring/strcat/?kw=strcat
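The split, revise, reassemble idea looks roughly like this. It is a sketch in Python rather than C, and `replace_coordinates` is a hypothetical helper. The field positions (2, 4 and 9) are read off the sample row in the question. One point to watch in the C version: strtok() treats consecutive delimiters as one, so the empty field in "M,,*53" would be skipped, whereas the split used here keeps it.

```python
def replace_coordinates(row, new_lat, new_lon, new_alt):
    """Return a $GPGGA row with latitude, longitude and altitude replaced."""
    body = row.rstrip("\r\n")
    ending = row[len(body):]         # keep the original line ending
    fields = body.split(",")         # split keeps empty fields such as ",,"
    fields[2] = new_lat              # latitude field
    fields[4] = new_lon              # longitude field
    fields[9] = new_alt              # altitude field
    # The checksum after '*' is copied unchanged, so it no longer matches
    # the edited row; recomputing it is left out of this sketch.
    return ",".join(fields) + ending

row = "$GPGGA,091306.00,4548.17420,N,00905.47990,E,1,09,0.87,233.5,M,47.2,M,,*53\n"
print(replace_coordinates(row, "4548.17500", "00905.48000", "234.0"), end="")
```

The replacement values in the last line are made up purely to show the call; the real ones would come from the correction code that is not shown in the question.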
I had read somewhere about the following method to read the whole file into a Perl array at once,
open my $file, '<', $filePath or die "Error: Unable to open file : $!";
my @fileData = <$file>;
close $file;
I suppose the size of the array is only limited by the available system memory. I wanted to know how exactly this works in the background, since there are no loops involved here to read the file line by line and feed them into the array.
Your wish is my command — a comment transferred to an answer, with a mild correction en route.
What is there to say? In list context, as provided by my @fileData, the <> operator reads lines into the array with an implicit loop. It works. Occasionally, it is useful.
Perl has a couple of mottos. One is TMTOWTDI — There's More Than One Way To Do It. Another is DWIM — Do What I Mean; at least, Perl does this more than many languages, provided you know what you're asking for. This is a piece of dwimmery.
readline is the Perl 5 built-in function that implements the <EXPR> operator. It behaves differently in scalar and list context: in scalar context it returns the next line, and in list context it returns all remaining lines.
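To make the "implicit loop" concrete, here is an analogy in Python (not Perl's actual implementation). `read_all_lines` is a hypothetical helper that spells out the loop, calling readline once per line the way scalar-context readline would be called, and the result is compared with a single readlines() call that hides the same loop.

```python
import io

def read_all_lines(handle):
    """Explicit version of a 'give me every line' read."""
    lines = []
    while True:
        line = handle.readline()     # one line per call
        if line == "":               # empty string signals end of file
            break
        lines.append(line)
    return lines

sample = "first\nsecond\nthird\n"
explicit = read_all_lines(io.StringIO(sample))
implicit = io.StringIO(sample).readlines()   # the loop is hidden inside the call
print(explicit == implicit)
```

Either way every line ends up in memory at once, which is why the available memory is the practical limit.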
I've got a service which runs all the time and keeps a log file. It basically adds new lines to the log file every few seconds. I've written a small program which reads these lines and parses them into various actions. The question I have is: how can I delete the lines which I have already parsed from the log file without disrupting the writing of the log file by the service?
Usually when I need to delete a line in a file, I open the original file and a temporary one and write all the lines to the temp file except the one I want to delete. Obviously this method will not work here.
So how do I go about deleting them ?
In most commonly used file systems you can't delete a line from the beginning of a file without rewriting the entire file. I'd suggest instead of one large file, use lots of small files and rotate them for example once per day. The old files are deleted when you no longer need them.
Can't be done, unfortunately, without rewriting the file, either in-place or as a separate file.
One thing you may want to look at is to maintain a pointer in another file, specifying the position of the first unprocessed line.
Then your process simply opens the file and seeks to that location, processes some lines, then updates the pointer.
You'll still need to roll over the files at some point lest they continue to grow forever.
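The pointer-file idea could look something like this. It is a minimal sketch, assuming the log is plain text with one record per line. `process_new_lines`, the side file holding the offset, and the `handle_line` callback are all hypothetical names.

```python
import os

def process_new_lines(log_path, offset_path, handle_line):
    """Process lines appended since the last call and remember where we stopped."""
    try:
        with open(offset_path) as f:
            offset = int(f.read().strip() or 0)
    except FileNotFoundError:
        offset = 0                   # first run: start at the beginning

    with open(log_path, "rb") as log:
        if offset > os.path.getsize(log_path):
            offset = 0               # log is shorter than before: assume it was rotated
        log.seek(offset)
        while True:
            line = log.readline()
            if not line.endswith(b"\n"):
                break                # end of file, or a line still being written
            handle_line(line.decode("utf-8", errors="replace"))
            offset = log.tell()      # position just after the last complete line

    with open(offset_path, "w") as f:
        f.write(str(offset))
    return offset
```

The log file is only ever opened for reading, so the service can keep appending to it undisturbed. A line that has been only partly written is left for the next call.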
I'm not sure, but this is how I think about it:
A newline is a character, so to delete a line you must delete the characters of that line plus the newline character.
"Moving" all the following characters back (to overwrite the old line) amounts to copying each character to a different position and removing it from its old position.
So no, I don't think you can just delete a line; you would have to rewrite the whole file.
You can't, that just isn't how files work.
It sounds like you need some sort of message logging service / library that your program could connect to in order to log messages, which could then hide the underlying details of file opening / closing etc.
If each log line has a unique identifier (or even just a line number), your log parser could simply store the identifier of the last line it has parsed. That way you don't have to change anything in the log file.
If the log file then starts to get too big, you could switch to a new one each day (for example).
(read) takes in a string from stdin, parses it as an s-expression, and returns that expression. How do I do the exact same thing, except taking input from a file?
Any of these:
(call-with-input-file "foo" read)
(with-input-from-file "foo" read)
The first will open the file and apply read on the open port to read a value and finally close it. The second is similar, except that it applies the function on no arguments in a dynamic context where the current input is read from the file. There are a bunch of other ways to do this, but you'll need to ask a more specific question...
(BTW, in the current repository version, which will be released as 4.2.3 soon, there is a new file->list function that will read all sexpressions from the file and return a list holding all of them.)