Reading a text file in MATLAB line by line

I have a CSV file. I want to read it and do some pre-calculations on each row, for example to check whether the row is useful to me, and if so, save it to a new CSV file.
Can someone give me an example?
In more detail, this is what my data looks like (string, float, float); the numbers are coordinates:
ABC,51.9358183333333,4.183255
ABC,51.9353866666667,4.1841
ABC,51.9351716666667,4.184565
ABC,51.9343083333333,4.186425
ABC,51.9343083333333,4.186425
ABC,51.9340916666667,4.18688333333333
Basically, I want to save the rows whose distance is 50 or more to a new file. The string field should also be copied.
Thanks

You could actually use xlsread to accomplish this. After placing your sample data above in a file called 'input_file.csv', here is an example of how to get the numeric values, the text values, and the raw data in the file from the three outputs of xlsread:
>> [numData,textData,rawData] = xlsread('input_file.csv')
numData = % An array of the numeric values from the file
51.9358 4.1833
51.9354 4.1841
51.9352 4.1846
51.9343 4.1864
51.9343 4.1864
51.9341 4.1869
textData = % A cell array of strings for the text values from the file
'ABC'
'ABC'
'ABC'
'ABC'
'ABC'
'ABC'
rawData = % All the data from the file (numeric and text) in a cell array
'ABC' [51.9358] [4.1833]
'ABC' [51.9354] [4.1841]
'ABC' [51.9352] [4.1846]
'ABC' [51.9343] [4.1864]
'ABC' [51.9343] [4.1864]
'ABC' [51.9341] [4.1869]
You can then perform whatever processing you need to on the numeric data, then resave a subset of the rows of data to a new file using xlswrite. Here's an example:
index = sqrt(sum(numData.^2,2)) >= 50; % Find the rows where the point is
% at a distance of 50 or greater
% from the origin
xlswrite('output_file.csv',rawData(index,:)); % Write those rows to a new file
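For comparison, here is a minimal Python sketch of the same filter (an illustration, not part of the answer; Python is used as a neutral language because this page mixes MATLAB, Lua and SAS). It assumes every row is label,x,y and that "distance" means the Euclidean norm of the two coordinates, as in the index line above; filter_by_distance and its threshold parameter are hypothetical names.

```python
import csv
import io
import math


def filter_by_distance(src, dst, threshold=50.0):
    """Copy rows whose (x, y) distance from the origin is >= threshold.

    Assumes every data row is: label, x, y. Returns the number of rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    kept = 0
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        x, y = float(row[1]), float(row[2])
        # Same test as sqrt(sum(numData.^2,2)) >= 50 in the MATLAB answer
        if math.hypot(x, y) >= threshold:
            writer.writerow(row)  # original text fields, label included
            kept += 1
    return kept


# Usage with in-memory "files"; for real files pass open(..., newline="") objects.
sample = "ABC,51.9358183333333,4.183255\nABC,1.0,2.0\n"
out = io.StringIO()
print(filter_by_distance(io.StringIO(sample), out))  # 1
print(out.getvalue(), end="")  # ABC,51.9358183333333,4.183255
```

Because the original text fields are written back unchanged, the string column is copied as the question asks.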

If you really want to process your file line by line, one solution is to use fgetl:
1. Open the data file with fopen.
2. Read the next line into a character array using fgetl.
3. Retrieve the data you need using sscanf on the character array you just read.
4. Perform any relevant test.
5. Output what you want to another file.
6. Go back to step 2 if you haven't reached the end of the file.
Unlike the previous answer, this is not really MATLAB style, but it might be more efficient on very large files, because only one line has to be held in memory at a time.
Hope this helps.
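The six steps above could look roughly like this in Python, as an illustrative sketch rather than the answerer's code. Iterating over a file object plays the role of fgetl, float() stands in for sscanf, and keep is a hypothetical predicate you supply.

```python
import io


def filter_lines(src, dst, keep):
    """Stream src one line at a time, copying lines for which keep(label, x, y) is true.

    Only the current line is held in memory; kept lines are written unchanged.
    """
    for line in src:                          # step 2: read the next line
        fields = line.rstrip("\r\n").split(",")
        if len(fields) != 3:
            continue                          # skip blank or malformed lines
        try:
            x, y = float(fields[1]), float(fields[2])  # step 3: extract the numbers
        except ValueError:
            continue
        if keep(fields[0], x, y):             # step 4: test
            # step 5: output, making sure the last line also ends with a newline
            dst.write(line if line.endswith("\n") else line + "\n")


src = io.StringIO("ABC,51.93,4.18\nABC,49.0,4.18\n")
dst = io.StringIO()
filter_lines(src, dst, lambda label, x, y: x >= 50)
print(dst.getvalue(), end="")  # ABC,51.93,4.18
```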

You cannot read text strings with csvread.
Here is another solution:
fid1 = fopen('test.csv','r'); %# open csv file for reading
fid2 = fopen('new.csv','w'); %# open new csv file
while ~feof(fid1)
    tline = fgets(fid1); %# read line by line (avoid shadowing the built-in "line")
    if ~ischar(tline), break; end %# fgets returns -1 at end of file
    A = sscanf(tline,'%*[^,],%f,%f'); %# skip the text field, read the two numbers
    if numel(A) == 2 && A(2) < 4.185 %# test the values (skips blank/malformed lines)
        fprintf(fid2,'%s',tline); %# write the line unchanged to the new file
    end
end
fclose(fid1);
fclose(fid2);

Just read it into MATLAB in one block:
fid = fopen('file.csv');
data=textscan(fid,'%s %f %f','delimiter',',');
fclose(fid);
You can then process it using logical indexing:
ind50 = data{2} >= 50;
ind50 is then a logical index of the rows where column 2 is greater than or equal to 50, so
data{1}(ind50)
will list all the strings for the rows of interest.
Then just use fprintf to write the selected rows out to the new file.
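The same read-everything-then-mask pattern, sketched in Python with plain lists standing in for textscan's cell array (an illustration assuming label,x,y rows; read_columns is a hypothetical helper):

```python
import io


def read_columns(src):
    """Read label,x,y rows into three parallel lists, like textscan's cell output."""
    labels, xs, ys = [], [], []
    for line in src:
        parts = line.strip().split(",")
        if len(parts) == 3:
            labels.append(parts[0])
            xs.append(float(parts[1]))
            ys.append(float(parts[2]))
    return labels, xs, ys


labels, xs, ys = read_columns(io.StringIO("ABC,51.9,4.18\nDEF,49.0,4.19\n"))
ind50 = [x >= 50 for x in xs]                          # logical mask, like data{2} >= 50
kept_labels = [s for s, m in zip(labels, ind50) if m]  # like data{1}(ind50)

# Write the kept rows back out as CSV text
out = io.StringIO()
for s, x, y, m in zip(labels, xs, ys, ind50):
    if m:
        out.write(f"{s},{x},{y}\n")
print(kept_labels)  # ['ABC']
```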

Here is the documentation for reading a CSV file: http://www.mathworks.com/access/helpdesk/help/techdoc/ref/csvread.html
and for writing one: http://www.mathworks.com/access/helpdesk/help/techdoc/ref/csvwrite.html
EDIT
An example that works:
file.csv:
1,50,4.1
2,49,4.2
3,30,4.1
4,71,4.9
5,51,4.5
6,61,4.1
The code:
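File = csvread('file.csv');   % numeric matrix, one row per line of the file
[m,n] = size(File);
index = 1;
temp = 0;
% First pass: count the rows whose second column is >= 50
for i = 1:m
    if (File(i,2) >= 50)
        temp = temp + 1;
    end
end
% Preallocate the output, then copy the matching rows into it
Matrix = zeros(temp, 3);
for j = 1:m
    if (File(j,2) >= 50)
        Matrix(index,1) = File(j,1);
        Matrix(index,2) = File(j,2);
        Matrix(index,3) = File(j,3);
        index = index + 1;
    end
end
csvwrite('outputFile.csv',Matrix);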
And the resulting output file:
1,50,4.1
4,71,4.9
5,51,4.5
6,61,4.1
This probably isn't the best solution, but it works! It reads the CSV file, checks the distance in each row, and saves the matching rows to a new file.
Hope it will help!

Related

Read a .txt file with Lua

A simple question: I have a file, test.txt, at userPath().."/log/test.txt, with 15 lines.
I want to read the first line, remove it, and end up with test.txt containing 14 lines.
local iFile = 'the\\path\\test.txt'
local contentRead = {}
local i = 1
local file = assert(io.open(iFile, 'r')) -- fail loudly if the file cannot be opened
for lines in file:lines() do
    if i ~= 1 then
        table.insert(contentRead, lines)
    else
        i = i + 1 -- this will prevent us from collecting the first line
        print(lines) -- just in case you want to display the first line before deleting it
    end
end
io.close(file)
file = assert(io.open(iFile, 'w')) -- write mode truncates the file
for _,v in ipairs(contentRead) do
    file:write(v.."\n")
end
io.close(file)
There are probably simpler ways to do this, but basically, this is what the code does:
1. Open the file in read mode and store every line except the first in the table contentRead.
2. Open the file again, this time in write mode, which erases its entire contents, and then write back all the lines stored in contentRead.
Thus the first line of the file is "deleted" and only the other 14 lines remain.
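A Python sketch of the same two-step approach (read all lines, then reopen for writing), assuming the file fits in memory; pop_first_line is a hypothetical name.

```python
import os
import tempfile


def pop_first_line(path, encoding="utf-8"):
    """Return the first line of a text file and rewrite the file without it.

    Reads all lines, then reopens in write mode (which truncates the file)
    and writes everything except line 1 back. Returns None for an empty file.
    """
    with open(path, "r", encoding=encoding) as f:
        lines = f.readlines()  # keeps each line's trailing newline
    if not lines:
        return None
    with open(path, "w", encoding=encoding) as f:
        f.writelines(lines[1:])
    return lines[0].rstrip("\n")


# Usage: a temporary 15-line file ends up with 14 lines.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w", encoding="utf-8") as f:
    f.writelines(f"line {i}\n" for i in range(1, 16))
print(pop_first_line(path))  # line 1
with open(path, encoding="utf-8") as f:
    print(len(f.readlines()))  # 14
os.remove(path)
```

As with the Lua version, a crash between the two open calls would lose the file contents; writing to a temporary file and then renaming it over the original (for example with os.replace) is a safer variant.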

Creating matrix with reading from a file

I want to write code to do this for me, but I have a problem defining the iterative part:
My requirement: I have 101 files whose names end with a number that can be used as a counter, like file_01 to file_101. I want a for loop that removes 3 columns from each file and adds 3 columns containing different arrays to it. The arrays should be read from a specific Excel file.
Could you please help me?
clear all
clc
close all
files = dir('*.txt');
for i=1:length(files)
eval(['load ' files(i).name ' -ascii']);
end
para = xlsread('parameters.xlsx');
and for the other part:
T = size(x1);
j = T(:,1);
A = zeros(j,1);
for i =0:length(N)
% Deleting the first column
x1(:,1)=[];
newcol = zeros(j,1);
x1 = [newcol x1];
end
Try this:
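data = xlsread('parameters.xlsx');
files = dir('*.txt');
for i = 1:length(files)
    A = load(files(i).name);      % load data from txt
    A(:,1:3) = [];                % remove first 3 columns (A(1:3,:) would remove rows)
    A = [A data(:,1:3)];          % append 3 columns from the Excel data
    dlmwrite(files(i).name,A);    % re-write current file
end
Remember that the Excel data and each .txt file must have the same number of rows for the concatenation to work. If each file needs different columns from the Excel sheet, change the data(:,1:3) indices inside the loop accordingly.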
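The column surgery itself, sketched in Python with lists of rows instead of MATLAB matrices (file reading and writing omitted; replace_first_columns is a hypothetical helper). It drops the first n columns of each row and appends n columns from the parameter table, enforcing the same-row-count requirement.

```python
def replace_first_columns(rows, params, n=3):
    """Drop the first n columns of each row and append the first n columns of params.

    rows and params are lists of lists and must have the same number of rows.
    """
    if len(rows) != len(params):
        raise ValueError("row counts differ")
    return [row[n:] + list(p[:n]) for row, p in zip(rows, params)]


data = [[1, 2, 3, 4], [5, 6, 7, 8]]
params = [[10, 20, 30], [40, 50, 60]]
print(replace_first_columns(data, params))  # [[4, 10, 20, 30], [8, 40, 50, 60]]
```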

Getting back empty arrays/variables when loading a .txt file into MATLAB

I am trying to load a .txt file with data into MATLAB to use for some calculations. However, when I run the code the variables/arrays come back empty or blank.
Below I have the code I am using.
%% importing the data
% Open file in the memory
fileID = fopen('rainfall.txt');
% Read the txt file with formats: Integer, Integer, Float
% Treat multiple delimiters, which is "space" in here, as one. Put the data
% in a variable called chunk.
chunk = textscan(fileID,'%d %d %f','Delimiter',' ',...
'MultipleDelimsAsOne',1);
% Close file from the memory.
fclose(fileID);
% date
dt = chunk{:,1};
% hour
hr = chunk{:,2};
% precip
r = chunk{:,3};
% remove extra variables from Matlab workspace
clear fileID ans
In the Workspace tab, MATLAB shows chunk as an empty 1x3 cell. As a result, dt, hr, and r have no values either; they are listed as []. So my best guess is that something goes wrong when loading the data into MATLAB.
Also, here is a small portion of the data I am working with. This is exactly how it is written in the .txt file as well.
STATION DATE HPCP
----------------- -------------- --------
COOP:132367 20040116 22:00 0.01
COOP:132367 20040116 23:00 0.01
COOP:132367 20040117 00:00 0.04
COOP:132367 20040117 01:00 0.02
COOP:132367 20040117 02:00 0.00
In the actual file I have a lot more data than what I have listed here, but this should give an idea of what the data looks like and how it's formatted.
From the textscan help page:
textscan attempts to match the data in the file to the conversion specifier in formatSpec. The textscan function reapplies formatSpec throughout the entire file and stops when it cannot match formatSpec to the data.
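So the first problem is the header lines: you should discard them, for example by reading the first 2 lines manually with fgetl.
Next, make sure that the format matches the data. You tried to read 2 integers and a float, but each line also starts with the station name, and the time is written as hh:mm.
Something like the following should work:
fileID = fopen('rainfall.txt');
fgetl(fileID); % skip the "STATION DATE HPCP" header line
fgetl(fileID); % skip the line of dashes
% station (text), date, hour, minute, precipitation; the ':' is matched literally
chunk = textscan(fileID,'%s %d %d:%d %f','Delimiter',' ',...
    'MultipleDelimsAsOne',1);
fclose(fileID);
With this format, chunk{1} holds the station names, chunk{2} the dates, chunk{3} and chunk{4} the hours and minutes, and chunk{5} the precipitation values.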
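To make the format issue concrete, here is a Python sketch of a parser for the sample records (an illustration, not MATLAB code): it skips the two header lines, treats runs of spaces as one delimiter, and splits the hh:mm time into hour and minute. parse_rainfall is a hypothetical name, and the layout is assumed to match the sample exactly.

```python
import io


def parse_rainfall(src, header_lines=2):
    """Parse 'STATION DATE HH:MM VALUE' records after skipping the header lines.

    Returns a list of (station, date, hour, minute, precip) tuples.
    """
    records = []
    for i, line in enumerate(src):
        if i < header_lines:
            continue                     # like the two fgetl calls
        parts = line.split()             # runs of spaces count as one delimiter
        if len(parts) != 4:
            continue                     # skip blank or malformed lines
        station, date, hhmm, value = parts
        hour, minute = hhmm.split(":")
        records.append((station, int(date), int(hour), int(minute), float(value)))
    return records


sample = """STATION DATE HPCP
----------------- -------------- --------
COOP:132367 20040116 22:00 0.01
COOP:132367 20040117 00:00 0.04
"""
for rec in parse_rainfall(io.StringIO(sample)):
    print(rec)
# ('COOP:132367', 20040116, 22, 0, 0.01)
# ('COOP:132367', 20040117, 0, 0, 0.04)
```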

MATLAB: Create an array from a folder, with columns containing the separated filename and full path

I have a folder that contains pictures of receipts that are named in a specific way: the date first, in reverse format (e.g. 21/11/2015 -> 15_11_21), followed by a space and then the value of the receipt (e.g. 18,45 -> 18_45).
Let's say the files are stored in location C:\pictures\receipts. In this folder I have 3 files:
15_11_21 18_45.jpg
15_11_22 115_28.jpg
15_12_02 3_00.jpg
I want to create an array that has 3 columns. The first column contains the date of the receipt in normal format, the second contains the value as a negative number, and the third contains the absolute path of the file. The array should look like this:
Receipts = [21/11/2015|-18,45 |C:\pictures\receipts\15_11_21 18_45.jpg
22/11/2015|-115,28|C:\pictures\receipts\15_11_22 115_28.jpg
02/12/2015| -3,00 |C:\pictures\receipts\15_12_02 3_00.jpg];
I tried modifying/combining various functions like getting the full path:
[status, list] = system( 'dir /B /S *.mp3' );
result = textscan( list, '%s', 'delimiter', '\n' );
fileList = result{1}
strsplit to separate the values of the filenames, and even this function, but I cannot get the desired result.
It looks like strsplit should do what you want. Try:
strsplit (filename, {' ', '.'})
Also, I would use dir rather than system, since dir does not depend on the operating system's shell commands.
A little bit "hacky":
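fullpath = 'C:\pictures\receipts\15_11_21 18_45.jpg';
[~, name] = fileparts(fullpath); % name = '15_11_21 18_45' (backslash separators are recognized on Windows)
d = textscan(name, '%d_%d_%d %d_%d'); % parse the name, not a hard-coded string
year = d{1}; % the file name stores the date as yy_mm_dd
month = d{2};
day = d{3};
a = -d{4}; % value before the decimal comma, negated
b = d{5}; % cents
receipt = sprintf('%02d/%02d/20%02d|%d,%02d|%s', day, month, year, a, b, fullpath)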
Have a look at formatting operators (e.g. type doc sprintf). You may want to add some flags for justification/spacings.
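A compact Python sketch of the same filename-to-row conversion, assuming every name follows the yy_mm_dd euros_cents.jpg pattern from the question and that all years are in the 2000s; format_receipt is a hypothetical helper. ntpath.basename is used so the Windows-style path splits correctly on any operating system.

```python
import ntpath


def format_receipt(full_path):
    """Format a receipt file path as 'dd/mm/yyyy|-euros,cents|full path'."""
    name = ntpath.basename(full_path)      # handles backslash-separated paths
    stem = name.rsplit(".", 1)[0]          # drop the extension
    date_part, cost_part = stem.split(" ")
    yy, mm, dd = date_part.split("_")
    euros, cents = cost_part.split("_")
    return f"{dd}/{mm}/20{yy}|-{euros},{cents}|{full_path}"


print(format_receipt(r"C:\pictures\receipts\15_11_21 18_45.jpg"))
# 21/11/2015|-18,45|C:\pictures\receipts\15_11_21 18_45.jpg
```

Keeping the digits as text preserves the zero padding ("3_00" becomes "-3,00") without any format flags.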
One option, utilizing regular expressions and a data structure as the final output:
% Get list of JPEGs in the current directory + subdirectories
[~, list] = system( 'dir /B /S *.jpg' );
result = textscan( list, '%s', 'delimiter', '\n' );
fileList = result{1};
% Split out file names, could use a regex but why bother. Using cellfun
% rather than an explicit loop
[~, filenames] = cellfun(@fileparts, fileList, 'UniformOutput', false);
% Used named tokens to pull out our data for analysis
Receipts = regexp(filenames, '(?<date>\d*_\d*_\d*)\s*(?<cost>\d*_\d*)', 'names');
Receipts = [Receipts{:}]; % Dump out our nested data
[Receipts(:).fullpath] = fileList{:}; % Add file path to our structure
% Reformat costs
% Replace underscore with decimal, convert to numeric array and negate
tmp = -str2double(strrep({Receipts(:).cost}, '_', '.'));
tmp = num2cell(tmp); % Necessary intermediate step, because MATLAB...
[Receipts(:).cost] = tmp{:}; % Replace field in our data structure
clear tmp
% Reformat dates
formatIn = 'yy_mm_dd';
formatOut = 'dd/mm/yyyy';
pivotYear = 2000; % Pivot year needed since we have 2-digit years
% datenum needed because we have a custom input date format
tmp = datestr(datenum({Receipts(:).date}, formatIn, pivotYear), formatOut);
tmp = cellstr(tmp); % Necessary intermediate step, because MATLAB...
[Receipts(:).date] = tmp{:};
clear tmp
This results in a structure array, Receipts. I went this route because it makes accessing the data more explicit later on. For example, if I wanted the cost of my 2nd receipt, I could do:
Employee2Cost = Receipts(2).cost;
Which returns:
Employee2Cost =
-115.2800
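The named-token idea carries over to Python's re module, whose (?P<name>...) groups play the role of MATLAB's (?<name>...) tokens. This is a sketch under the question's naming assumptions, with a list of dicts standing in for the struct array; parse_receipts and PATTERN are hypothetical names. Note that Python's %y directive maps 00 to 68 onto 2000 to 2068, which differs from MATLAB's explicit pivotYear.

```python
import re
from datetime import datetime

PATTERN = re.compile(r"(?P<date>\d+_\d+_\d+)\s*(?P<cost>\d+_\d+)")


def parse_receipts(paths):
    """Build a list of dicts (a stand-in for the MATLAB struct array) from file paths."""
    receipts = []
    for path in paths:
        # File name without folder or extension; accepts \ or / separators
        stem = path.replace("\\", "/").rsplit("/", 1)[-1].rsplit(".", 1)[0]
        m = PATTERN.search(stem)
        if m is None:
            continue  # skip files that do not follow the naming scheme
        date = datetime.strptime(m.group("date"), "%y_%m_%d").strftime("%d/%m/%Y")
        cost = -float(m.group("cost").replace("_", "."))
        receipts.append({"date": date, "cost": cost, "fullpath": path})
    return receipts


r = parse_receipts([
    r"C:\pictures\receipts\15_11_21 18_45.jpg",
    r"C:\pictures\receipts\15_11_22 115_28.jpg",
])
print(r[1]["cost"])  # -115.28
```

Unlike the MATLAB version, which assigns fileList{:} to the fullpath field and therefore expects one match per file, this sketch simply skips names that do not match the pattern.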

How to create a truncated permanent database from a larger file in SAS [duplicate]

This question already has answers here: Read specific columns of a delimited file in SAS
I'm trying to read a comma delimited .txt file (called 'file.txt' in the code below) into SAS in order to create a permanent database that includes only some of the variables and observations.
Here's a snippet of the .txt file for reference:
SUMLEV,REGION,DIVISION,STATE,NAME,POPESTIMATE2013,POPEST18PLUS2013,PCNT_POPEST18PLUS
10,0,0,0,United States,316128839,242542967,76.7
40,3,6,1,Alabama,4833722,3722241,77
40,4,9,2,Alaska,735132,547000,74.4
40,4,8,4,Arizona,6626624,5009810,75.6
40,3,7,5,Arkansas,2959373,2249507,76
My (abbreviated) code is as follows:
options nocenter nodate ls=72 ps=58;
filename foldr1 'C:\Users\redacted\Desktop\file.txt';
libname foldr2 'C:\Users\redacted\Desktop\Data';
libname foldr3 'C:\Users\redacted\Desktop\Formats';
options fmtsearch=(FMTfoldr.bf_fmts);
proc format library=foldr3.bf_fmts;
[redacted]
run;
data foldr2.file;
infile foldr1 DLM=',' firstobs=2 obs=52;
input STATE $ NAME $ REGION $ POPESTIMATE2013;
PERCENT=POPESTIMATE2013/316128839;
format REGION $regfmt.;
run;
proc print data=foldr2.file;
sum POPESTIMATE2013 PERCENT;
title 'Title';
run;
In my INPUT statement, I list the variables that I want to include in my new truncated database (STATE, NAME, REGION, etc.).
When I print my truncated database, I notice that none of my INPUT variables correspond to the same variables in the original file.
Instead, my variables print out like this:
STATE (1st variable listed in INPUT) printed as SUMLEV (1st variable in the .txt file)
NAME (2nd variable in INPUT) printed as REGION (2nd variable in the .txt file)
REGION (3rd variable in INPUT) printed as DIVISION (3rd variable in the .txt file)
POPESTIMATE2013 (4th variable in INPUT) printed as STATE (4th variable in the .txt file)
It seems that SAS is matching my INPUT variables based on order, not on name. So, because I list STATE first in my INPUT statement, SAS prints out the first variable of the original .txt file (i.e., the SUMLEV variable).
Any idea what's wrong with my code? Thanks for your help!
Your current code is reading in the first 4 values from each line of the CSV file and assigning them to columns with the names you have listed.
The input statement lists all the columns you want to read in (and where to read them from); it does not search for named columns within the input file.
The code below should produce the output you want. The keep statement lists the columns that you want in the output.
data foldr2.file;
infile foldr1 dlm = "," firstobs = 2 obs = 52;
/* Prevent truncating the name variable */
informat NAME $20.;
/* Name each of the columns */
input SUMLEV REGION DIVISION STATE NAME $ POPESTIMATE2013 POPEST18PLUS2013 PCNT_POPEST18PLUS;
/* Keep only the columns you want */
keep STATE NAME REGION POPESTIMATE2013 PERCENT;
PERCENT = POPESTIMATE2013/316128839;
format REGION $regfmt.;
run;
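The difference between positional and name-based reading can be sketched in Python with the sample rows from the question (an illustration only; SAS itself behaves as the answer describes). zip() assigns names in order, as the original INPUT statement does, while csv.DictReader uses the header row, and the keep list mimics KEEP=.

```python
import csv
import io

SAMPLE = """SUMLEV,REGION,DIVISION,STATE,NAME,POPESTIMATE2013,POPEST18PLUS2013,PCNT_POPEST18PLUS
10,0,0,0,United States,316128839,242542967,76.7
40,3,6,1,Alabama,4833722,3722241,77
"""

# Positional, like the original INPUT statement: names go to the first four fields in order.
row = SAMPLE.splitlines()[2].split(",")
positional = dict(zip(["STATE", "NAME", "REGION", "POPESTIMATE2013"], row))
print(positional)  # STATE gets the SUMLEV value '40', NAME gets REGION's '3', ...

# Name every column from the header, then keep only the ones you want (like KEEP=).
keep = ["STATE", "NAME", "REGION", "POPESTIMATE2013"]
rows = [{k: r[k] for k in keep} for r in csv.DictReader(io.StringIO(SAMPLE))]
print(rows[1])  # {'STATE': '1', 'NAME': 'Alabama', 'REGION': '3', 'POPESTIMATE2013': '4833722'}
```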
For a slightly more involved solution see Joe's excellent answer here. Applying this approach to your data will require setting the lengths of your columns in advance and converting character values to numeric.
data foldr2.file;
infile foldr1 dlm = "," firstobs = 2 obs = 52;
length STATE 8. NAME $13. REGION 8. POPESTIMATE2013 8.;
input @;
STATE = input(scan(_INFILE_, 4, ','), best.);
NAME = scan(_INFILE_, 5, ',');
REGION = input(scan(_INFILE_, 2, ','), best.);
POPESTIMATE2013 = input(scan(_INFILE_, 6, ','), best.);
PERCENT = POPESTIMATE2013/316128839;
format REGION $regfmt.;
run;
If you are looking to become more familiar with SAS it would be worth your while to take a look at the SAS documentation for reading files.
Your current data step is telling SAS what to name the first four variables in the txt file. To do what you want, you need to list all of the variables in the txt file in your "input" statement. Then, in your data statement, use the keep= option to select the variables you want to be included in the output dataset.
data foldr2.file (keep=STATE NAME REGION POPESTIMATE2013 PERCENT);
infile foldr1 DLM=',' firstobs=2 obs=52;
input
SUMLEV
REGION $
DIVISION
STATE $
NAME $
POPESTIMATE2013
POPEST18PLUS2013
PCNT_POPEST18PLUS;
PERCENT=POPESTIMATE2013/316128839;
format REGION $regfmt.;
run;
