I have a file in which each line contains multiple sentences separated by white space. Sometimes a sentence extends onto the next line. I want to extract these sentences. My code successfully extracts sentences separated by white space on the same line, but because it reads line by line, it runs into trouble when a sentence continues onto the next line.
Store the part left unused at the end of each iteration in a temporary buffer, and include that buffer in the next iteration by prepending it to the newly read line.
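A minimal sketch of that idea in C, assuming sentences end with a period and the file is called input.txt (both are assumptions, not part of the question): whatever does not yet end in a period is kept in carry and prepended to the next line.

#include <stdio.h>
#include <string.h>

int main(void) {
    FILE *fp = fopen("input.txt", "r");
    if (!fp) return 1;

    char line[1024];
    char carry[4096] = "";                           /* leftover fragment from the previous line */

    while (fgets(line, sizeof line, fp)) {
        line[strcspn(line, "\r\n")] = '\0';          /* strip the line terminator */

        char work[8192];
        snprintf(work, sizeof work, "%s%s%s",
                 carry, carry[0] ? " " : "", line);  /* prepend the buffered part */

        char *start = work;
        char *dot;
        while ((dot = strchr(start, '.')) != NULL) { /* emit each complete sentence */
            *dot = '\0';
            printf("%s.\n", start);
            start = dot + 1;
            start += strspn(start, " ");             /* skip the separating spaces */
        }
        snprintf(carry, sizeof carry, "%s", start);  /* incomplete tail becomes the buffer */
    }
    if (carry[0])
        printf("%s\n", carry);                       /* flush any trailing fragment */

    fclose(fp);
    return 0;
}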
Originally I had File.foreach(fname, "\n\n") in my code. This worked fine with my own test files, but now, using real data, I'm running into files that may also use \r\n instead of \n\n.
I would like to split a file into chunks of data using the blank line as the delimiter.
Alternatively, I also tried File.readlines(fname); however, this only splits the file by line and I can't then further sub-split it, even though the blank lines become empty elements once I use .chomp.
Is there a way to split the file according to new lines as the delimiter that accounts for both \r\n and \n\n?
Thanks
You could write the following.
str = <<~_
Little Miss Muffet
sat on her

tuffet

eating her curds
and whey
_
str.split(/(?:^\r?\n)+/)
#=> ["Little Miss Muffet\nsat on her\n",
# "tuffet\n",
# "eating her curds\nand whey\n"]
The regular expression reads, "match one or more (+) contiguous empty lines having line terminators of \r\n or \n".
You can write your regex to account for either \r or \n characters:
string.split(/[\r\n]\n/)
The brackets [] indicate any character within them can match, so that would mean the regex matches either \r or \n for the first character.
Well, I have a text file containing some lines. When I have to "delete" a line (whose position from the start of the file is known), I read as many lines as needed until I reach the desired position; then, from that position, I start overwriting with the lines that come after the line to be "deleted", so the old line is gone.
But my idea is to move the EOF up to the old last line, which needs to be deleted as it is now a duplicate. How do I print or move the EOF (end of file)?
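For the "move the EOF" part, a minimal sketch using POSIX ftruncate() might look like the following; the filename, the fixed-size line buffer, and the delete_line helper are all assumptions made for illustration, not part of the question.

#include <stdio.h>
#include <unistd.h>          /* ftruncate, POSIX */

/* Delete the line with the given 0-based index from the file at path. */
int delete_line(const char *path, long target) {
    FILE *fp = fopen(path, "r+");
    if (!fp) return -1;

    char buf[1024];
    long line = 0;

    /* Read up to the line to delete, remembering where it starts. */
    while (line < target && fgets(buf, sizeof buf, fp)) line++;
    long write_pos = ftell(fp);
    if (!fgets(buf, sizeof buf, fp)) { fclose(fp); return -1; }   /* skip the doomed line */
    long read_pos = ftell(fp);

    /* Shift every following line back over the gap. */
    for (;;) {
        fseek(fp, read_pos, SEEK_SET);
        if (!fgets(buf, sizeof buf, fp)) break;
        read_pos = ftell(fp);
        fseek(fp, write_pos, SEEK_SET);
        fputs(buf, fp);
        write_pos = ftell(fp);
    }

    /* "Move the EOF": cut off the duplicated tail. */
    fflush(fp);
    ftruncate(fileno(fp), write_pos);
    fclose(fp);
    return 0;
}

int main(void) { return delete_line("input.txt", 2); }   /* assumed filename and line index */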
I'm trying to write a shell script that will read in a text file which looks something like this:
Line A needs to be removed
Line B also to be removed
Line C which has lots of things, including characters that need removing should be the first to be read into an array position [0]
Line D
.
.
Line "n"
What I need to do is read from line C up to line n-1 into an array, but also remove the first 4 characters and the last 2 characters of the useful lines (Line C to Line n-1).
I can't seem to do anything other than read in the entire list, or print/echo the partial list, but I can't get that into an array.
I'm happy to multi-step it rather than do it all in one line; whatever is cleanest.
Try this and see if it works (note that substr needs length($0)-6 to drop both the first 4 and the last 2 characters, and head -n -1 drops the final line):
head -n -1 txt | awk '/Line C/{i=1} i{print substr($0, 5, length($0)-6)}'
To get the result into an array, you can wrap the same pipeline in mapfile:
mapfile -t arr < <(head -n -1 txt | awk '/Line C/{i=1} i{print substr($0, 5, length($0)-6)}')
My goal is to print out every full line from a text file if that line contains a string that is equivalent to user input.
I understand how to find the occurrences of a specific string in a text file, but I am confused as to how to associate that with a specific line. How do I relate my string with the specific line that it is in?
My initial thought was to store each line in an array and then print out that line if the user string is somewhere in that line.
However, each line is a different size, so I was wondering: is it possible for me to initially divide my entire text file into x number of lines and then use a loop to go through each line and search for that string?
Save the file position of the start of the line in a temporary variable before you start comparing the next line.
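As a sketch of that idea, here is a simpler variant that reads a whole line at a time, so the line containing the match is already sitting in a buffer and no separate position needs to be saved; strstr ties the match to the current line, which can then be printed in full. The filename input.txt and the fixed buffer sizes are assumptions.

#include <stdio.h>
#include <string.h>

int main(void) {
    char needle[256];
    printf("Search for: ");
    if (!fgets(needle, sizeof needle, stdin)) return 1;
    needle[strcspn(needle, "\r\n")] = '\0';   /* drop the trailing newline from user input */

    FILE *fp = fopen("input.txt", "r");
    if (!fp) return 1;

    char line[1024];
    while (fgets(line, sizeof line, fp)) {    /* one line per iteration */
        if (strstr(line, needle))             /* the match belongs to this line ... */
            fputs(line, stdout);              /* ... so print the whole line */
    }
    fclose(fp);
    return 0;
}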
I am reading a file with fgetc, so each time it reads a character the cursor position changes.
Is it possible to know, after each read, the "coordinates" of the cursor on the file in terms of column and line number?
Thanks
You can use ftell.
It does not give you the position in terms of row and column, but it does give you the current position in the stream, measured from the start.
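A minimal sketch of what ftell reports while reading with fgetc (input.txt is an assumed filename):

#include <stdio.h>

int main(void) {
    FILE *fp = fopen("input.txt", "r");
    if (!fp) return 1;

    int c;
    while ((c = fgetc(fp)) != EOF) {
        long offset = ftell(fp);          /* byte offset from the start of the stream */
        printf("read byte %d, now at offset %ld\n", c, offset);
    }
    fclose(fp);
    return 0;
}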
There are no "coordinates" in a file, only a position. A text file is simply a stream of bytes, and lines are separated by line breaks. So, when reading a text file, you can calculate your "coordinates" only by scanning the whole file. This means, if you really need some "row" and "column" value:
Read the file line by line and count the newline characters; that gives you the "row" number. Be aware that different operating systems use different line-break characters -- Unix line endings differ from Windows.
Read the line in question character by character and count the characters up to the position in question. That gives you the "column" number. You obviously need to accept that the number of "columns" can vary between "rows", and it's perfectly possible to have "rows" with a "column count" of 0.
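A sketch of this counting approach, tracking the "row" and "column" yourself while reading with fgetc; Unix-style '\n' line endings and the filename are assumptions.

#include <stdio.h>

int main(void) {
    FILE *fp = fopen("input.txt", "r");
    if (!fp) return 1;

    long row = 0, col = 0;                  /* "coordinates" of the next character */
    int c;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') { row++; col = 0; }  /* a newline ends the row, the column resets */
        else col++;
    }
    printf("file ends at row %ld, column %ld\n", row, col);
    fclose(fp);
    return 0;
}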
A different approach would be to
Read the file line by line and build an array of the positions of the line breaks (using ftell).
Now, to figure out the coordinates of any character, take its position in the file and find the nearest preceding line break. Counting the line breaks up to that point gives you the "row"; the difference between that line break's position and the current position gives you the "column".
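A sketch of this second approach: record (via ftell) where each line starts, then translate any byte offset into a row/column pair. The fixed array size, the example offset, and the filename are assumptions.

#include <stdio.h>

#define MAX_LINES 10000

int main(void) {
    FILE *fp = fopen("input.txt", "r");
    if (!fp) return 1;

    long starts[MAX_LINES];
    long nlines = 0;
    starts[nlines++] = 0;                      /* line 0 starts at offset 0 */

    int c;
    while ((c = fgetc(fp)) != EOF && nlines < MAX_LINES)
        if (c == '\n')
            starts[nlines++] = ftell(fp);      /* the next line starts right after '\n' */

    long target = 42;                          /* example byte offset to translate */
    long row = 0;
    while (row + 1 < nlines && starts[row + 1] <= target)
        row++;                                 /* nearest preceding line start */
    long col = target - starts[row];           /* distance into that line */
    printf("offset %ld is at row %ld, column %ld\n", target, row, col);

    fclose(fp);
    return 0;
}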
But most important is to accept that there are no rows or columns in files -- there is a position in a file, but the file itself is simply a stream of bytes. This also means that you would need to handle files encoded with wide character sets differently, as a character no longer maps to a single byte.