Read a txt file with Lua

A simple question. I have a file test.txt at userPath().."/log/test.txt" with 15 lines.
I want to read the first line and then remove it, so that test.txt ends up with 14 lines.

local iFile = 'the\\path\\test.txt'
local contentRead = {}
local i = 1
local file = io.open(iFile, 'r')
for lines in file:lines() do
    if i ~= 1 then
        table.insert(contentRead, lines)
    else
        i = i + 1 -- this will prevent us from collecting the first line
        print(lines) -- just in case you want to display the first line before deleting it
    end
end
io.close(file)
file = io.open(iFile, 'w')
for _,v in ipairs(contentRead) do
    file:write(v.."\n")
end
io.close(file)
There must be other ways to simplify this, but basically what I did in the code was:
I opened the file in read mode and stored all lines of text except the first line in the table contentRead.
Then I opened the file again, but this time in write mode, causing the entire contents of the file to be erased, and rewrote everything stored in the table contentRead back to the file.
Thus, the first line of the file was "deleted" and only the other 14 lines remained.
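A simpler variant of the same idea (a sketch, using the userPath() call from the question): read the first line with read('*l') and the rest of the file with read('*a'), then write only the rest back, so no table is needed.
local path = userPath() .. "/log/test.txt"  -- path from the question

local f = assert(io.open(path, "r"))
local first = f:read("*l")   -- the line being removed
local rest  = f:read("*a")   -- the remaining 14 lines, newlines included
f:close()

print(first)                 -- optional: show the removed line

f = assert(io.open(path, "w"))
f:write(rest)                -- rewrite the file without the first line
f:close()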

Related

Find specific byte in file and read until specific byte in Lua

Is it possible to first search inside the file for a specific byte, find its position, and then read in only the bytes up to that byte?
At the moment I can only read in some bytes or the whole file and search for that specific byte afterwards,
like this:
local function read_file(path)
    local file = open(path, "r") -- r read mode and b binary mode
    if not file then return nil end
    local content = file:read(64) -- reading 64 bytes
    file:close()
    return content
end
local fileContent = read_file("../test/l_0.dat");
print(fileContent)
function parse(line)
    if line then
        len = 1
        a = line:find("V", len +1) --find V in content
        return a
    else
        return false
    end
end
a = parse(fileContent) --position of V in content
print(a)
print(string.sub(fileContent, a)) -- content until first found V
In this example the first V is at position 21. So it would be good to read in only 21 bytes instead of 64 bytes or the whole file. But then I would need to find the position before reading anything in. Is this possible? (The 21 bytes are variable; it could also be 20 or 50 and so on.)
You can specify a file position using file:seek and read a certain number of characters (bytes) by providing an integer to file:read
local file = io.open(somePath)
if file then
    -- set cursor to -5 bytes from the file's end
    file:seek("end", -5)
    -- read 3 bytes
    print(file:read(3))
    file:close()
end
You cannot search in a file without reading it. If you don't want to read the entire file you can read it in chunks either by reading it linewise (if there are lines in your file) or by reading a specific number of bytes each time until you find something.
Of course you can also read it byte-wise.
You can argue if it makes more sense to read a 64 byte file as a whole or in chunks. I mean in most scenarios you won't notice any difference.
So you could file:read(1) in a loop that terminates once you find a V or reach the end of the file.
local file = io.open(somePath)
if file then
    local data = ""
    for i = 1, 64 do
        local b = file:read(1)
        if not b then print("no V in file") data = nil break end
        data = data .. b
        if b == "V" then print(data) break end
    end
    file:close()
end
vs
local file = io.open("d:/test.txt", "r")
if file then
local data = file:read("a")
local pos = data:find("V")
if pos then
print(data:sub(1, pos))
end
file:close()
end
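For completeness, the chunk-wise reading mentioned above could look like this (a sketch, assuming an arbitrary chunk size of 16 bytes; somePath is a placeholder as in the earlier snippets):
local file = io.open(somePath, "rb")
if file then
    local data = ""
    while true do
        local chunk = file:read(16)           -- next 16 bytes (fewer at end of file)
        if not chunk then
            print("no V in file")
            break
        end
        data = data .. chunk
        local pos = data:find("V", 1, true)   -- plain text search, no patterns
        if pos then
            print(data:sub(1, pos))           -- content up to and including V
            break
        end
    end
    file:close()
end
Re-searching the whole accumulated buffer keeps the sketch short; since the needle is a single byte, it would be enough to search only the newly read chunk.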
(Or) Correct your code to...
local function read_file(path)
    local file = io.open(path, "r") -- r read mode and b binary mode
    if not file then return nil end
    local content = file:read(64) -- reading 64 bytes
    file:close()
    return content
end
local fileContent = read_file("test/l_0.dat") -- '../' causing error
print(fileContent)
local function parse(line)
    if line then
        local len = 1
        local a = line:find("V", len +1) --find V in content
        return a
    else
        return false
    end
end
print(fileContent:sub(1, parse(fileContent))) -- content until first found V
That puts out...
0123456789VabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789V
If V is meant to be a (single) delimiter, you probably don't want to output it.
Meet the strength of string.sub(text, start, stop)...
print(fileContent:sub(1, parse(fileContent) - 1)) -- before V
-- 0123456789
print(fileContent:sub(parse(fileContent) + 1, -1)) -- after V
-- abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

Awk - Separate one .txt file into several files by condition

I have a problem: I would like to split one file into several files based on a condition.
INPUT: One text file
variable chrom=chr1
1000 10
1010 20
1020 10
variable chrom=chr2
1000 20
1100 30
1200 10
OUTPUT: two files for this example.
chr1.txt
variable chrom=chr1
1000 10
1010 20
1020 10
chr2.txt
variable chrom=chr2
1000 20
1100 30
1200 10
So the separator condition is: if a row starts with variable chrom=chr$i (i = {1..22}), the following rows are separated out into another text file.
Thank you
Something along these lines:
awk 'BEGIN { filename="unknown.txt" } /^variable chrom=/ { close(filename); filename = substr($0, index($0, "=") + 1) ".txt"; } { print > filename }'
Where the awk code is
BEGIN { filename = "unknown.txt" }   # default file name, used only if the
                                     # file doesn't start with a
                                     # "variable chrom=" line
/^variable chrom=/ {                 # in such a line:
    close(filename)                  # close the previous file (if open)
                                     # and set the new filename
    filename = substr($0, index($0, "=") + 1) ".txt"
}
{ print > filename }                 # print everything to the current file.
The basic algorithm is very straightforward: read the file linewise, change filename when you find a line that starts a new section, and always print the current line to the current file. The devil is in the detail of isolating the file name from the marker line. The
filename = substr($0, index($0, "=") + 1) ".txt"
approach is simplistic but serviceable for the example you showed: It takes everything after the = and attaches .txt to get the file name. If your marker lines are more complicated than variable chrom=filenamestub, this will have to be amended, but in that case I could only guess your requirements and would probably guess wrong.
If you know how many lines there are between markers, you could use
split -l 4 textfile.txt
This splits the text file every 4 lines, producing files named xaa, xab, and so on.
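Since the main question on this page is about Lua, the same marker-based splitting can also be sketched there (untested; input.txt is a placeholder name, and unlike the awk version, lines before the first marker are simply dropped instead of going to unknown.txt):
local out   -- current output file handle
for line in io.lines("input.txt") do
    local stub = line:match("^variable chrom=(.+)")
    if stub then                               -- marker line: switch output files
        if out then out:close() end
        out = assert(io.open(stub .. ".txt", "w"))
    end
    if out then out:write(line, "\n") end
end
if out then out:close() end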

How to create a truncated permanent database from a larger file in SAS [duplicate]

This question already has answers here:
Read specific columns of a delimited file in SAS
(3 answers)
Closed 8 years ago.
I'm trying to read a comma delimited .txt file (called 'file.txt' in the code below) into SAS in order to create a permanent database that includes only some of the variables and observations.
Here's a snippet of the .txt file for reference:
SUMLEV,REGION,DIVISION,STATE,NAME,POPESTIMATE2013,POPEST18PLUS2013,PCNT_POPEST18PLUS
10,0,0,0,United States,316128839,242542967,76.7
40,3,6,1,Alabama,4833722,3722241,77
40,4,9,2,Alaska,735132,547000,74.4
40,4,8,4,Arizona,6626624,5009810,75.6
40,3,7,5,Arkansas,2959373,2249507,76
My (abbreviated) code is as follows:
options nocenter nodate ls=72 ps=58;
filename foldr1 'C:\Users\redacted\Desktop\file.txt';
libname foldr2 'C:\Users\redacted\Desktop\Data';
libname foldr3 'C:\Users\redacted\Desktop\Formats';
options fmtsearch=(FMTfoldr.bf_fmts);
proc format library=foldr3.bf_fmts;
[redacted]
run;
data foldr2.file;
  infile foldr1 DLM=',' firstobs=2 obs=52;
  input STATE $ NAME $ REGION $ POPESTIMATE2013;
  PERCENT=POPESTIMATE2013/316128839;
  format REGION $regfmt.;
run;
proc print data=foldr2.file;
  sum POPESTIMATE2013 PERCENT;
  title 'Title';
run;
In my INPUT statement, I list the variables that I want to include in my new truncated database (STATE, NAME, REGION, etc.).
When I print my truncated database, I notice that my INPUT variables do not correspond to the same-named variables in the original file.
Instead my variables print out like this:
STATE (1st variable in INPUT) printed as SUMLEV (1st variable in the .txt file)
NAME (2nd variable in INPUT) printed as REGION (2nd variable in the .txt file)
REGION (3rd variable in INPUT) printed as DIVISION (3rd variable in the .txt file)
POPESTIMATE2013 (4th variable in INPUT) printed as STATE (4th variable in the .txt file)
It seems that SAS is matching my INPUT variables based on order, not on name. So, because I list STATE first in my INPUT statement, SAS prints out the first variable of the original .txt file (i.e., the SUMLEV variable).
Any idea what's wrong with my code? Thanks for your help!
Your current code is reading in the first 4 values from each line of the CSV file and assigning them to columns with the names you have listed.
The input statement lists all the columns you want to read in (and where to read them from), it does not search for named columns within the input file.
The code below should produce the output you want. The keep statement lists the columns that you want in the output.
data foldr2.file;
  infile foldr1 dlm = "," firstobs = 2 obs = 52;
  /* Prevent truncating the name variable */
  informat NAME $20.;
  /* Name each of the columns */
  input SUMLEV REGION DIVISION STATE NAME $ POPESTIMATE2013 POPEST18PLUS2013 PCNT_POPEST18PLUS;
  /* Keep only the columns you want */
  keep STATE NAME REGION POPESTIMATE2013 PERCENT;
  PERCENT = POPESTIMATE2013/316128839;
  format REGION $regfmt.;
run;
For a slightly more involved solution see Joe's excellent answer here. Applying this approach to your data will require setting the lengths of your columns in advance and converting character values to numeric.
data foldr2.file;
  infile foldr1 dlm = "," firstobs = 2 obs = 52;
  length STATE 8. NAME $13. REGION 8. POPESTIMATE2013 8.;
  input #;
  STATE = input(scan(_INFILE_, 4, ','), best.);
  NAME = scan(_INFILE_, 5, ',');
  REGION = input(scan(_INFILE_, 2, ','), best.);
  POPESTIMATE2013 = input(scan(_INFILE_, 6, ','), best.);
  PERCENT = POPESTIMATE2013/316128839;
  format REGION $regfmt.;
run;
If you are looking to become more familiar with SAS it would be worth your while to take a look at the SAS documentation for reading files.
Your current data step is telling SAS what to name the first four variables in the txt file. To do what you want, you need to list all of the variables in the txt file in your "input" statement. Then, in your data statement, use the keep= option to select the variables you want to be included in the output dataset.
data foldr2.file (keep=STATE NAME REGION POPESTIMATE2013 PERCENT);
  infile foldr1 DLM=',' firstobs=2 obs=52;
  input
    SUMLEV
    REGION $
    DIVISION
    STATE $
    NAME $
    POPESTIMATE2013
    POPEST18PLUS2013
    PCNT_POPEST18PLUS;
  PERCENT=POPESTIMATE2013/316128839;
  format REGION $regfmt.;
run;

Handling CR line endings in Lua

I'm trying to read a file with CR line endings using the file:read method which seems to be acting up for some reason. The file contents look like this:
ABCDEFGH
12345
##
6789
I want it to behave consistently with all types of line endings. Every time I try to read the file, it returns the last line of the file concatenated with any trailing characters from the previous lines whose positions lie beyond the last character of the last line. Here's what I mean:
> file=io.open("test.lua", "rb")
> function re_read(openFile)
openFile:seek("set");
return openFile:read("*a");
end
> =re_read(file) -- With CR
67895FGH
> =re_read(file) -- With CRLF
ABCDEFGH
12345
##
6789
> =re_read(file) -- with LF
ABCDEFGH
12345
##
6789
>
As you can see, the string being returned is the last line, plus the 5 left over from the previous line, plus FGH left over from the first line. Any lines shorter than the last line are completely overwritten.
My goal is to use the file:lines() method to read the file line by line. My hope is that if a 'fix' for file:read is found, it can be applied to file:lines().
In the case with CR only, re_read actually works as expected: it returns the lines separated by CR. But when the result is displayed, the terminal interprets each CR character as "go back to the beginning of the line". So here is how the displayed result changes line by line:
ABCDEFGH
12345FGH
##345FGH
67895FGH
EDIT: here it is character by character, with a "virtual cursor" (|).
|
A|
AB|
ABC|
ABCD|
ABCDE|
ABCDEF|
ABCDEFG|
ABCDEFGH|
|ABCDEFGH
1|BCDEFGH
12|CDEFGH
123|DEFGH
1234|EFGH
12345|FGH
|12345FGH
#|2345FGH
##|345FGH
|##345FGH
6|#345FGH
67|345FGH
678|45FGH
6789|5FGH
Proof:
> s = "ABCDEFGH\r12345\r##\r6789"
> =s
67895FGH
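One way to convince yourself of this is to make the CRs visible before printing, so the terminal cannot interpret them (a small sketch; the extra parentheses drop gsub's second return value):
> =(s:gsub("\r", "\\r"))
ABCDEFGH\r12345\r##\r6789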
You could normalize your line endings with gsub, then iterate over the result with gmatch.
local function cr_lines(s)
    s = s:gsub('\r\n?', '\n')                    -- normalize CR and CRLF to LF
    if s:sub(-1) ~= '\n' then s = s .. '\n' end  -- don't lose a final line that lacks a newline
    return s:gmatch('(.-)\n')
end

local function cr_file_lines(filename)
    local f = io.open(filename, 'rb')
    local s = f:read('*a')
    f:close()
    return cr_lines(s)
end

for ln in cr_file_lines('test.txt') do
    print(ln)
end
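Since cr_lines normalizes CR, LF, and CRLF alike, it also works on any in-memory string, even with mixed endings:
for ln in cr_lines("a\r\nb\rc\nd") do
    print(ln)  -- prints a, b, c, d on separate lines
end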

Reading a text file in MATLAB line by line

I have a CSV file. I want to read this file and do some pre-calculations on each row to see, for example, whether that row is useful for me or not, and if yes, save it to a new CSV file.
Can someone give me an example?
In more detail, this is what my data looks like: (string,float,float); the numbers are coordinates.
ABC,51.9358183333333,4.183255
ABC,51.9353866666667,4.1841
ABC,51.9351716666667,4.184565
ABC,51.9343083333333,4.186425
ABC,51.9343083333333,4.186425
ABC,51.9340916666667,4.18688333333333
Basically, I want to save the rows whose distance is 50 or more to a new file; the string field should also be copied.
Thanks
You could actually use xlsread to accomplish this. After first placing your sample data above in a file 'input_file.csv', here is an example of how you can get the numeric values, text values, and raw data in the file from the three outputs of xlsread:
>> [numData,textData,rawData] = xlsread('input_file.csv')
numData = % An array of the numeric values from the file
51.9358 4.1833
51.9354 4.1841
51.9352 4.1846
51.9343 4.1864
51.9343 4.1864
51.9341 4.1869
textData = % A cell array of strings for the text values from the file
'ABC'
'ABC'
'ABC'
'ABC'
'ABC'
'ABC'
rawData = % All the data from the file (numeric and text) in a cell array
'ABC' [51.9358] [4.1833]
'ABC' [51.9354] [4.1841]
'ABC' [51.9352] [4.1846]
'ABC' [51.9343] [4.1864]
'ABC' [51.9343] [4.1864]
'ABC' [51.9341] [4.1869]
You can then perform whatever processing you need to on the numeric data, then resave a subset of the rows of data to a new file using xlswrite. Here's an example:
index = sqrt(sum(numData.^2,2)) >= 50; % Find the rows where the point is
% at a distance of 50 or greater
% from the origin
xlswrite('output_file.csv',rawData(index,:)); % Write those rows to a new file
If you really want to process your file line by line, a solution might be to use fgetl:
Open the data file with fopen
Read the next line into a character array using fgetl
Retrieve the data you need using sscanf on the character array you just read
Perform any relevant test
Output what you want to another file
Back to point 2 if you haven't reached the end of your file.
Unlike the previous answer, this is not very much in the style of Matlab but it might be more efficient on very large files.
Hope this will help.
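For comparison with the Lua questions above, the same line-by-line filter can be sketched in Lua (hypothetical file names; the >= 50 test on the second field stands in for whatever condition you actually need):
local inp  = assert(io.open("test.csv", "r"))
local outp = assert(io.open("new.csv", "w"))
for line in inp:lines() do
    -- split "string,float,float"; keep the whole line if the test passes
    local name, second = line:match("^([^,]+),([^,]+),")
    if name and tonumber(second) and tonumber(second) >= 50 then
        outp:write(line, "\n")   -- the string field is copied along with the rest
    end
end
inp:close()
outp:close()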
You cannot read text strings with csvread.
Here is another solution:
fid1 = fopen('test.csv','r'); %# open csv file for reading
fid2 = fopen('new.csv','w'); %# open new csv file
while ~feof(fid1)
    line = fgets(fid1); %# read line by line
    A = sscanf(line,'%*[^,],%f,%f'); %# sscanf can read only numeric data :(
    if A(2)<4.185 %# test the values
        fprintf(fid2,'%s',line); %# write the line to the new file
    end
end
fclose(fid1);
fclose(fid2);
Just read it into MATLAB in one block:
fid = fopen('file.csv');
data=textscan(fid,'%s %f %f','delimiter',',');
fclose(fid);
You can then process it using logical addressing
ind50 = data{2}>=50 ;
ind50 is then an index of the rows where column 2 is greater than or equal to 50. So
data{1}(ind50)
will list all the strings for the rows of interest.
Then just use fprintf to write out your data to the new file
Here is the doc to read a CSV: http://www.mathworks.com/access/helpdesk/help/techdoc/ref/csvread.html
and to write: http://www.mathworks.com/access/helpdesk/help/techdoc/ref/csvwrite.html
EDIT
An example that works:
file.csv:
1,50,4.1
2,49,4.2
3,30,4.1
4,71,4.9
5,51,4.5
6,61,4.1
The code:
File = csvread('file.csv')
[m,n] = size(File)
index = 1
temp = 0
for i = 1:m
    if (File(i,2) >= 50)
        temp = temp + 1
    end
end
Matrix = zeros(temp, 3)
for j = 1:m
    if (File(j,2) >= 50)
        Matrix(index,1) = File(j,1)
        Matrix(index,2) = File(j,2)
        Matrix(index,3) = File(j,3)
        index = index + 1
    end
end
csvwrite('outputFile.csv',Matrix)
And the resulting output file:
1,50,4.1
4,71,4.9
5,51,4.5
6,61,4.1
This probably isn't the best solution, but it works! We read the CSV file, check the distance condition on each row, and save the matching rows to a new file.
Hope it will help!
