Handling CR line endings in Lua - file

I'm trying to read a file with CR line endings using the file:read method which seems to be acting up for some reason. The file contents look like this:
ABCDEFGH
12345
##
6789
I want it to behave consistently with all types of line endings. Every time I try to read the file, it returns the last line of the file followed by any trailing characters from previous lines that extend past the end of the last line. Here's what I mean:
> file=io.open("test.lua", "rb")
> function re_read(openFile)
openFile:seek("set");
return openFile:read("*a");
end
> =re_read(file) -- With CR
67895FGH
> =re_read(file) -- With CRLF
ABCDEFGH
12345
##
6789
> =re_read(file) -- with LF
ABCDEFGH
12345
##
6789
>
As you can see, the returned string is the last line plus the 5 from the second line and FGH from the first line. Any lines shorter than the last line are skipped.
My goal is to use the file:lines() method to read the file line by line. My hope is that if a 'fix' for file:read is found, it can also be applied to file:lines().

In the case with CR only, re_read actually works as expected: it returns the lines separated by CR. But when the interpreter prints the string, the terminal treats each CR character as "go back to the beginning of the line" and overwrites what is already there. So here is how the result changes line by line:
ABCDEFGH
12345FGH
##345FGH
67895FGH
EDIT: here it is character by character, with a "virtual cursor" (|).
|
A|
AB|
ABC|
ABCD|
ABCDE|
ABCDEF|
ABCDEFG|
ABCDEFGH|
|ABCDEFGH
1|BCDEFGH
12|CDEFGH
123|DEFGH
1234|EFGH
12345|FGH
|12345FGH
#|2345FGH
##|345FGH
|##345FGH
6|#345FGH
67|345FGH
678|45FGH
6789|5FGH
Proof:
> s = "ABCDEFGH\r12345\r##\r6789"
> =s
67895FGH
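To see the overwriting effect without relying on a terminal, here is a minimal Python sketch that simulates it. The function name render_with_cr is made up for illustration, and it models only CR (no line feeds, tabs, or wrapping):

```python
def render_with_cr(text):
    """Simulate how a terminal shows text that contains carriage returns."""
    line = []    # characters currently visible on the line
    cursor = 0   # column where the next character will be written
    for ch in text:
        if ch == "\r":
            cursor = 0              # CR moves the cursor back; nothing is erased
        else:
            if cursor < len(line):
                line[cursor] = ch   # overwrite the character already there
            else:
                line.append(ch)     # extend the line
            cursor += 1
    return "".join(line)


print(render_with_cr("ABCDEFGH\r12345\r##\r6789"))  # 67895FGH
```

The data in the string is intact; only the display is misleading.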

You could normalize your line endings with gsub, then iterate over the result with gmatch.
local function cr_lines(s)
  -- Convert CRLF and lone CR to LF.
  s = s:gsub('\r\n?', '\n')
  -- Terminate the last line so gmatch does not drop it.
  if s ~= '' and s:sub(-1) ~= '\n' then s = s .. '\n' end
  return s:gmatch('(.-)\n')
end
local function cr_file_lines(filename)
  local f = assert(io.open(filename, 'rb'))
  local s = f:read('*a')
  f:close()
  return cr_lines(s)
end
for ln in cr_file_lines('test.txt') do
  print(ln)
end

Related

Read file txt with lua

A simple question: I have a file test.txt at userPath().."/log/test.txt" with 15 lines.
I want to read the first line, remove it, and end up with test.txt containing the remaining 14 lines.
local iFile = 'the\\path\\test.txt'
local contentRead = {}
local i = 1
local file = io.open(iFile, 'r')
for lines in file:lines() do
  if i ~= 1 then
    table.insert(contentRead, lines)
  else
    i = i + 1 -- this will prevent us from collecting the first line
    print(lines) -- just in case you want to display the first line before deleting it
  end
end
io.close(file)
file = io.open(iFile, 'w')
for _,v in ipairs(contentRead) do
  file:write(v.."\n")
end
io.close(file)
There are probably simpler ways to do this, but here is what the code does:
It opens the file in read mode and stores every line except the first in the table contentRead.
It then opens the file again in write mode, which erases its entire contents, and writes the lines stored in contentRead back to the file.
As a result, the first line of the file is "deleted" and only the other 14 lines remain.

Octave - Adding '\n' to String Array is Not Creating a New Line

I want to replace each ',' character with '\n' and save the result to a text file.
All files are in this format:
546,234,453,685,.....,234
I want to make it like:
546
234
453
685
...
234
My initial attempt at this problem looks like this:
fid=fopen(files{i});
strArr=fscanf(fid,'%s');
newstrArr=strrep(strArr,',','\n');
% Take each .txt input
for j=1:length(newstrArr)
Array=[Array newstrArr(j)];
endfor
Let me explain step by step:
1st I open the current text file
fid=fopen(files{i});
2nd I find the strings in text file
strArr=fscanf(fid,'%s');
Please Note that you can't replace %s with %d. (Correct me if I am wrong)
3rd I replace commas with newline character
newstrArr=strrep(strArr,',','\n');
4th I add each character to a new array with for loop
for j=1:length(newstrArr)
Array=[Array newstrArr(j)];
endfor
However, when I display the result using
disp(Array);
I have this output
How can I properly replace the commas with newlines?
Regards
The issue is that you are inserting a literal '\n' (the characters \ and n) and not a newline character. This is because in Octave, a single-quote enclosed string ignores escape sequences. If you want Octave to respect escape sequences you could use a double-quoted string which will convert \n into a newline.
strrep(strArr, ',', "\n");
Or if you want your code to be MATLAB-compatible, you'll want to instead use char(10) (an actual newline character). This is because MATLAB does not interpret escape sequences such as \n in quoted literals.
output = strrep(strArr, ',', char(10));
Another option would be to split your input at the , and use sprintf to add the newlines (it'll treat \n as a newline)
values = strsplit(strArr, ',');
output = sprintf('%s\n', values{:});
If you just want to save each entry to a new line in a file, you can use fprintf instead.
values = strsplit(strArr, ',');
fout = fopen('output.txt', 'w');
fprintf(fout, '%s\n', values{:});
fclose(fout);
If you really just want to replace "," with newline simply do
in = fileread ("yourfile");
out = strrep (in, ",", "\n")
out = 546
234
453
685
234
Btw, see the difference between "\n" (in GNU Octave a newline) and '\n' (literally \n)
Another option is to use regexprep(), which has the advantage of being MATLAB-compatible. Assuming that the newline convention you want is \n:
regexprep('123,456,789',',','\n')
ans = 123
456
789
When output to a file via fprintf() the result looks like
123
456
789
provided the text editor understands the newline convention.

How do you count number of characters from each lines then add them all up?

I have been given a question asking me to write a function "that returns a count of the number of characters in the file whose name is given as a parameter."
So if a file called "data.txt" contains "Hi there!" and I print the result of my code below, it returns 10 (which is correct).
"""Attempting Question 7.
Author: Ark
Date: 28/04/2015
"""
def file_size(filename):
    """extracts word from a line"""
    filename = open(filename, 'r')
    for line in filename:
        result = len(line) #count number of characters in a line.
        return result
However, let's say I have made another file called "data2.txt" that contains
EEEEE
DDDD
CCC
BB
A
If I print this out, it gives the value 6. So my challenge starts here: how can I change my code to read all the lines and add their lengths up?
print(file_size("data2.txt"))
expected 16 words (?)
You must sum the lengths of the lines; right now you return the length of the very first line.
Also, you must strip the trailing newline if it's there. This should work:
def character_count(filename):
    with open(filename) as f:
        return sum(len(line.rstrip("\n")) for line in f)
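As a quick check, here is a self-contained sketch that writes the data2.txt contents from the question to a temporary file and runs the function on it. The demo helper and the temporary file are only for illustration, and the sample assumes every line, including the last, ends with a newline:

```python
import os
import tempfile


def character_count(filename):
    # Sum the length of every line, ignoring the trailing newline.
    with open(filename) as f:
        return sum(len(line.rstrip("\n")) for line in f)


def demo():
    # Recreate the data2.txt contents from the question in a temp file.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("EEEEE\nDDDD\nCCC\nBB\nA\n")
        path = tmp.name
    try:
        return character_count(path)
    finally:
        os.remove(path)


print(demo())  # 15
```

The five lines hold 5 + 4 + 3 + 2 + 1 = 15 characters once the newlines are stripped.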

Awk - Separate one .txt file to files by condition

I have a problem: I would like to split one file into several files based on a condition.
INPUT: One text file
variable chrom=chr1
1000 10
1010 20
1020 10
variable chrom=chr2
1000 20
1100 30
1200 10
OUTPUT: two files for this example.
chr1.txt
variable chrom=chr1
1000 10
1010 20
1020 10
chr2.txt
variable chrom=chr2
1000 20
1100 30
1200 10
So the split condition is: when a row begins a new chrom=chr$i section (i={1..22}), write it and the following rows to a separate text file.
Thank you
Something along these lines:
awk 'BEGIN { filename="unknown.txt" } /^variable chrom=/ { close(filename); filename = substr($0, index($0, "=") + 1) ".txt"; } { print > filename }'
Where the awk code is
BEGIN { filename="unknown.txt" }   # default file name, used only if the
                                   # file doesn't start with a variable chrom=
                                   # line
/^variable chrom=/ {               # in such a line:
  close(filename)                  # close the previous file (if open)
                                   # and set the new filename
  filename = substr($0, index($0, "=") + 1) ".txt"
}
{ print > filename }               # print everything to the current file.
The basic algorithm is very straightforward: Read file linewise, change filename when you find a line that starts a new section, always print the current line to the current file, so the devil is in the detail of isolating the file name from the marker line. The
filename = substr($0, index($0, "=") + 1) ".txt"
approach is simplistic but serviceable for the example you showed: It takes everything after the = and attaches .txt to get the file name. If your marker lines are more complicated than variable chrom=filenamestub, this will have to be amended, but in that case I could only guess your requirements and would probably guess wrong.
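If it helps to check the logic outside awk, the same algorithm can be sketched in Python. This is only an illustration under the same assumptions as the awk version: the function name split_by_marker and the out_dir parameter are made up, and the marker line is assumed to look exactly like variable chrom=NAME.

```python
import os
import tempfile


def split_by_marker(lines, out_dir, marker="variable chrom="):
    """Write each section to its own file; return the paths in order."""
    paths = []
    out = None
    try:
        for line in lines:
            if line.startswith(marker):
                name = line[len(marker):].strip() + ".txt"
            elif out is None:
                name = "unknown.txt"   # input did not start with a marker line
            else:
                name = None            # keep writing to the current file
            if name is not None:
                if out is not None:
                    out.close()        # close the previous section's file
                path = os.path.join(out_dir, name)
                out = open(path, "w")
                paths.append(path)
            out.write(line)            # every line goes to the current file
    finally:
        if out is not None:
            out.close()
    return paths


sample = [
    "variable chrom=chr1\n", "1000 10\n", "1010 20\n",
    "variable chrom=chr2\n", "1000 20\n",
]
with tempfile.TemporaryDirectory() as tmp_dir:
    written = split_by_marker(sample, tmp_dir)
    print([os.path.basename(p) for p in written])  # ['chr1.txt', 'chr2.txt']
```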
If every section has the same number of lines, you could use
split -l 4 textfile.txt
This splits the text file into pieces of 4 lines each, producing files named xaa, xab, and so on.

Python: read one word per line of a text file

It's not proper code, but I want to know if there is a way to search for just one word without using .split(), since it builds a list and I don't want that, with this snippet:
f=(i for i in fin.xreadlines())
for i in f:
    try:
        match=re.search(r"([A-Z]+\b) | ([A-Z\'w]+\b) | (\b[A-Z]+\b) | (\b[A-Z\'w]+\b) | (.\w+\b)", i) # | r"[A-Z\'w]+\b" | r"\b[A-Z]+\b" | r"\b[A-Z\'w]+\b" | r".\w+\b"
Also, can I make a reusable class module like this
class LineReader: #Intended only to be used with for loop
    def __init__(self,filename):
        self.fin=open(filename,'r')
    def __getitem__(self,index):
        line=self.fin.xreadline()
        return line.split()
where, say, f=LineReader(filepath)
and for i in f.getitem(index=line number 25), the loop starts from there?
I don't know how to do that. Any tips?
To get the first word of a line:
line[:max(line.find(' '), 0) or None]
line.find(' ') searches for the first space and returns its index. If no space is found, it returns -1.
max(..., 0) makes sure the result is never negative, turning -1 into 0. This is useful because bool(-1) is True and bool(0) is False.
x or None evaluates to x if x != 0 else None.
And finally, line[:None] is equal to line[:], which returns a string identical to line.
First sample:
with open('file') as f:
    for line in f:
        word = line[:max(line.find(' '), 0) or None]
        if condition(word):
            do_something(word)
And the class (implemented as a generator here):
def words(stream):
    for line in stream:
        yield line[:max(line.find(' '), 0) or None]
Which you could use like this:
gen = words(f)
for word in gen:
    if condition(word):
        print(word)
Or:
gen = words(f)
while 1:
    try:
        word = next(gen)
        if condition(word):
            print(word)
    except StopIteration:
        break # we reached the end
But you also wanted to start reading from a certain line number. This can't be done very efficiently if you don't know the lengths of the lines. The only way is to read lines and discard them until you reach the right line number.
def words(stream, start=-1): # you could replace the -1 with 0 and remove the +1
    for i in range(start+1): # it depends on whether you start counting with 0 or 1
        try:
            next(stream)
        except StopIteration:
            break
    for line in stream:
        yield line[:max(line.find(' '), 0) or None]
Be aware that you could get strange results if a line starts with a space. To prevent that, you could insert line = line.lstrip() at the beginning of the loop.
Disclaimer: None of this code is tested
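For reference, here is a minimal Python 3 sketch of the same idea. It swaps in itertools.islice to skip the leading lines and str.strip() to deal with leading spaces and the trailing newline, neither of which the answer above uses. The names first_word and first_words are made up for illustration, and start is assumed to count lines from 0:

```python
import io
from itertools import islice


def first_word(line):
    """Return the text up to the first space, ignoring surrounding whitespace."""
    line = line.strip()
    end = line.find(" ")
    return line if end == -1 else line[:end]


def first_words(stream, start=0):
    """Yield the first word of each line, skipping the first `start` lines."""
    for line in islice(stream, start, None):
        yield first_word(line)


sample = io.StringIO("alpha beta\n  gamma delta\nepsilon\n")
print(list(first_words(sample, start=1)))  # ['gamma', 'epsilon']
```

Like the original, this still has to read and discard the skipped lines; islice only hides the loop.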

Resources