Parse files to obtain array of arrays in MATLAB

I need to parse a .txt file in Matlab so that all lines of the file are a different element in an array. Each element of the array would also be an array of integers. So I need to make an array of arrays from a .txt file.
The problem I'm having is that I can't figure out which function to use to parse the file. If I use importdata(filename), it only parses the first line of the file. If I use textscan, it parses the file in columns, and the file is formatted like:
1 1 1 1 1
13 13 13 13 13
2 2 2 2 2
14 14 14 14 14
I need each of the rows to be an array that I can then use to compare my data against.
Is there an option for either one of those functions that would work for my purposes? I've tried looking on the MATLAB documentation, but can't make sense of it.

If each array needs to be a different size, you need to use a cell array to contain them. Something like this:
fid = fopen('test.txt'); %opens the file
tline = fgets(fid); % reads in the first line
data = {}; % creates an empty cell array
index = 1; % initializes index
while ischar(tline) % loops while the line that has just been read contains characters, that is, the end of the file has not been reached.
data{index} = str2num(tline); % converts the line that has just been read from a string to a numeric row vector and stores it in the current cell of data
tline = fgets(fid); % reads in the next line
index = index + 1; % increments index
end
fclose(fid);
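Once the rows are stored in a cell array, comparing them against other data could look like the sketch below. The sample rows are copied from the question; query, isMatch and matchIndex are hypothetical names used only for illustration.

```matlab
% Rows as they would come out of the reading loop above (sample data from the question)
data = {[1 1 1 1 1], [13 13 13 13 13], [2 2 2 2 2], [14 14 14 14 14]};

% Hypothetical row to look for
query = [2 2 2 2 2];

% isequal also handles rows of different lengths, which == would not
isMatch = cellfun(@(row) isequal(row, query), data);
matchIndex = find(isMatch);   % positions of the matching rows in data
```

Because every cell is compared with isequal, the same code works whether or not all rows have the same length.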

If you know your data will have the specific format of 5 numbers followed by a single number, you can use dlmread and then format the resulting matrix.
data = dlmread('data.txt',' ');
multipleValueRows = data(1:2:end,:);
singleValueRows = data(2:2:end,1);
The data matrix has the size (number of rows of your file) x 5 columns. In the rows where you only have a single number, the data matrix will contain zeros in columns 2-5.

Related

How to convert m x n matrix to (each row as) comma-separated text file in MATLAB?

I have a .mat file with m x n values. For simplicity, let's say we have 2 rows and 3 columns as:
2 4 6
2 1 4
I want to be able to export these values from .mat file to a text file, however, in such a manner that each line has the respective row from the .mat file with values in each row separated by comma. For the above example, in the txt file, it should look like:
2,4,6
2,1,4
This is what I did till now:
gt1 = load('Benchmark\AAmpiidata\groundtruth.mat');
r = gt1.gTruth.LabelData{1,1}{1,1};
allOneString = sprintf('%.0f,', r(1,:));
allOneString = allOneString(1:end-1);% strip final comma
fid=fopen('allOneString.txt','w');
fprintf(fid,'%s',allOneString);
fclose(fid);
I am able to extract the first row from .mat file as I require. I get this:
492,304,78,220
However, I don't know how to extract multiple rows from .mat file. Any help will be appreciated!
P.S. In the above code, in the .mat file gt1 directly doesn't have the values. The values I need (mxn) can be extracted using gt1.gTruth.LabelData{1,1}{1,1}
There are two answers depending on the version of MATLAB you use.
Answer 1: For MATLAB R2018b and earlier:
gt1 = load('Benchmark\AAmpiidata\groundtruth.mat');
r = gt1.gTruth.LabelData{1,1}{1,1};
dlmwrite('allOneString.txt',r)
Answer 2: For MATLAB R2019a and later (writematrix was introduced in R2019a, the current release as of writing this answer):
gt1 = load('Benchmark\AAmpiidata\groundtruth.mat');
r = gt1.gTruth.LabelData{1,1}{1,1};
writematrix(r,'allOneString.txt')
Here's another way, which uses fprintf for file writing and strjoin for building the format string:
r = [2 4 6;2 1 4];
fid = fopen('allOneString.txt','w');
fprintf(fid, [strjoin(repmat({'%.0f'}, 1, size(r,2)), ',') '\n'], r.');
fclose(fid);
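To see what the strjoin approach builds, the same format string can be passed to sprintf instead of fprintf, so the result can be inspected without writing a file. This is only an illustrative sketch; fmt and txt are names introduced here.

```matlab
r = [2 4 6; 2 1 4];

% One '%.0f' per column, joined by commas, plus a newline: '%.0f,%.0f,%.0f\n'
fmt = [strjoin(repmat({'%.0f'}, 1, size(r, 2)), ',') '\n'];

% The format is reused until all values are consumed. Values are taken in
% column-major order, so r is transposed to emit one matrix row per line.
txt = sprintf(fmt, r.');
```

Without the transpose, the values would be consumed column by column and the lines would no longer correspond to the rows of r.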

How can I import data with uneven row lengths?

I have a .txt file that I wish to import into MATLAB (I was thinking of using importdata), but I am having trouble specifying the format and controlling how much of the data is read in.
The file is generated from the program "TurbSim".
The format is:
12 rows of headers
1 line with 2 numerical values, delimited by a single space
35 lines, each with 35 numerical values, delimited by a single space
1 empty line
After the headers, this format repeats. The file is very large (~860 MB), and I have not been able to find a way to load it correctly in a script while controlling how large a portion I take out.
Example txt of my issue. fixed
https://drive.google.com/open?id=1FwmrCiz6TaWXYwXYX_v0BwQD-jjbdsE4
How about this?
clear;
NUM_HEADERLINES = 12;
DELIM_VALUES = ' ';
fid = fopen('TurbSim.txt');
% skip header
for n = 1:NUM_HEADERLINES, fgets(fid);end
while ~feof(fid)
% read line
[line,nl] = fgets(fid);
% remove newline char
line = line(1:end-length(nl));
% explode using delimiter
values = strsplit(line,DELIM_VALUES);
% in case of leading blanks: skip first empty one
if isempty(values{1}), values = values(2:end);end
% skip blank lines
if isempty(values), continue;end
% convert to double
values = str2double(values);
% now process/save/whatever...
...
fprintf('Read %d values\n',length(values)); % in your example: 2 or 40
% disp(values);
end
fclose(fid);
Btw: your example has 40 lines with 40 values each, not 35 with 35 each.
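If the goal is to read only part of a large file, one option is to let textscan read a fixed count of numbers per block and stop after a chosen number of blocks. The sketch below is an assumption-laden illustration, not tested against real TurbSim output: it writes a miniature stand-in file (2 header lines and 3x3 grids rather than 12 header lines and 35x35 or 40x40 grids), and numHeaderLines, gridSize and maxBlocks are hypothetical parameters.

```matlab
% Build a small stand-in file with the same block layout as in the question
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'header A\nheader B\n');
for b = 1:4
    fprintf(fid, '%d %d\n', b, 10*b);            % line with 2 values
    fprintf(fid, '%d %d %d\n', b*100 + (1:9));   % 3 lines with 3 values each
    fprintf(fid, '\n');                          % empty separator line
end
fclose(fid);

numHeaderLines = 2;   % 12 in the question
gridSize = 3;         % 35 (or 40) in the question
maxBlocks = 2;        % how many blocks to read before stopping

fid = fopen(fname, 'r');
for n = 1:numHeaderLines, fgetl(fid); end        % skip the header
pairs = zeros(maxBlocks, 2);
grids = zeros(gridSize, gridSize, maxBlocks);
numRead = 0;
while numRead < maxBlocks
    p = textscan(fid, '%f', 2);                  % the 2-value line
    if numel(p{1}) < 2, break; end               % end of file reached
    g = textscan(fid, '%f', gridSize^2);         % the grid; blank lines are skipped
    if numel(g{1}) < gridSize^2, break; end      % incomplete block
    numRead = numRead + 1;
    pairs(numRead, :) = p{1}.';
    % textscan returns values in file order; reshape fills columns, so transpose
    grids(:, :, numRead) = reshape(g{1}, gridSize, gridSize).';
end
fclose(fid);
delete(fname);
```

Since textscan continues from the current file position, only the requested blocks are read and the rest of the file is never loaded into memory.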

Best way to compare data from file to data in array in Matlab

I am having a bit of trouble with a specific file I/O task in MATLAB. I am fairly new to it, so some things are still a bit of a mystery to me. The input file is structured like this:
File Name: Processed_kplr003942670-2010174085026_llc.fits.txt
File contents- 6 Header Lines then:
1, 2, 3
1, 2, 3
basically a matrix of about [1443,3] with varying values
now here is the matrix that I'm comparing it to:
[(0123456, 1, 2, 3), (0123456, 2, 3, 4), (etc..)]
Now here is my problem: first, I need to know how to read the file in a way that lets me compare the ID number (0123456) in the filename with the ID value in the matrix, so that I can then compare the other columns of both. I do not know how to achieve this in MATLAB. Furthermore, I need to be able to loop over every point in the matrix that matches up to the specific file, for example:
If I have 15 files ranging from 'Processed_0123456_1' to 'Processed_0123456_15', then I want to be able to read in the values contained in 'Processed_0123456_1' and compare them to ANY row in the matrix that corresponds to that ID (0123456). I don't know if accumarray can be used for this, but as I said, I'm not sure.
So the code must:
-Read in file
-Compare file to any point in the matrix with corresponding ID
-Do operations
-Loop over until full list of files in the directory are read in and processed, and output a matrix with the results.
Thanks for any help.
EDIT: Exact File Sample--
Kepler I.D.-----Channel
[1161345]--------[84]
-TTYPE1--------TTYPE8------------TTYPE4
['TIME']---['PDCSAP_FLUX']---['SAP_FLUX']
['BJD - 2454833']--['e-/s']--------['e-/s']
CROWDSAP --- 0.9791
630.195880143,277165.0,268233.0
630.216312946,277214.0,268270.0
630.23674585,277239.0,268293.0
630.257178554,277296.0,268355.0
630.277611357,277294.0,268364.0
630.29804426,277365.0,268441.0
630.318476962,277337.0,268419.0
630.338909764,277403.0,268481.0
630.359342667,277389.0,268463.0
630.379775369,277441.0,268508.0
630.40020817,277545.0,268604.0
There are more entries than what was just posted but they go for about 1000 lines so it is impractical to post that all here.
To get the file ID, use regular expressions, e.g.:
filename = 'Processed_0123456_1';
file_id_str = regexprep(filename, 'Processed_(\d+)_\d+', '$1');
file_num_str = regexprep(filename, 'Processed_\d+_(\d+)', '$1')
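For the longer filename quoted in the question, a similar idea could be sketched with regexp and the 'tokens' option, which returns only the captured group. This assumes the ID is the run of digits between 'kplr' and the first hyphen; that pattern is a guess from the single filename shown, and keplerIdStr and keplerId are names introduced here.

```matlab
filename = 'Processed_kplr003942670-2010174085026_llc.fits.txt';

% Capture the digits that follow 'kplr' and precede the hyphen (assumed layout)
tok = regexp(filename, 'kplr(\d+)-', 'tokens', 'once');

keplerIdStr = tok{1};                  % ID as text, leading zeros preserved
keplerId = str2double(keplerIdStr);    % ID as a number, for comparing with a numeric matrix
```

Keeping both forms is useful because converting to a number drops the leading zeros, which matters if the ID is later used to build filenames.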
To read in the file contents, assuming that it's all comma-separated values without a header, use textscan, e.g.,
fid = fopen(filename)
C = textscan(fid, '%f,%f,%f') % Use as many %f specifiers as you have entries per line in the file
textscan also works on strings. So, for example, if your file contents was:
filestr = sprintf('1, 2, 3\n1, 3, 3')
Then running textscan on filestr works like this:
C = textscan(filestr, '%f,%f,%f')
C =
[2x1 double] [2x1 double] [2x1 double]
You can convert that to a matrix using cell2mat:
cell2mat(C)
ans =
1 2 3
1 3 3
You could then repeat this procedure for all files with the same ID and concatenate them into a single matrix, e.g.,
C_full = [];
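files = dir(['Processed_' file_id_str '_*']); % all files with the same ID; assumes names like Processed_<ID>_<n>
for k = 1:numel(files)
fid = fopen(files(k).name);
C = cell2mat(textscan(fid, '%f,%f,%f')); % read one file into a matrix, as above
fclose(fid);
C_full = [C_full; C]; % append this file's rows
end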
Then you can look for what you want in C_full.
Update based on updated OP Dec 12, 2013
Here's code to read the values from a single file. Wrap this all in the loop that I mentioned above to loop over all your files and read them all into a single matrix.
fid = fopen('/path/to/file');
% Skip over 12 header lines
for kk = 1:12
fgetl(fid);
end
% Read in values to a matrix
C = textscan(fid, '%f,%f,%f');
C = cell2mat(C);
fclose(fid);
I think your requirements are too complicated to write the whole script here. Nonetheless, I will try to give some pointers to help. Disclaimer: None of this is tested, just my best guess. Please expect syntax errors, etc. I hope you can figure them out :-)
1) You can use the textscan function with the delimiter option to get data from the lines of your file. Since your format varies as it does, we will probably want to use...
2) ... fgetl to read the first two lines into strings and process them separately using textscan. Such an operation might look like:
fid = fopen('file.txt','r'); % open for reading; 'w' would erase the file's contents
tline1 = fgetl(fid);
tline2 = fgetl(fid);
fclose(fid);
C1 = textscan(tline1,'%s %d %s','delimiter','_'); %C1{2} will be the integer we want
C2 = textscan(tline2,'%s %s','delimiter',':'); %C2{2}{1} will be the values we want, but they're still a string so...
mat = str2num(C2{2}{1});
3) Then, for the rest of the lines, we can use something like dlmread:
mat2 = dlmread('file.txt',',',2,0);
The 2,0 specifies the offset in 0-based rows,columns from the start of the file. You may need to look at something like vertcat to stitch mat and mat2 together.
4) The list of files in the directory can be found with the dir command. The filename is an attribute of the structure that's returned:
dirlist = dir;
for i = 1:length(dirlist)
filename = dirlist(i).name
%process your files
end
You can also pass matching strings to dir, like so:
dirlist = dir('*.txt');
which will find all of the files with extension .txt.
5) You can very easily loop through the comparison matrix:
sze = size(comparisonmatrix);
for i = 1:sze(1)
%compare comparisonmatrix(i,1) to C1{2}
%Perform whatever operations you need
end
Hope that helps!
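The row-by-row comparison in step 5 can also be written with logical indexing, which selects every row of the comparison matrix that carries a given ID in one step. The values below are made up for illustration, and fileId, isMatch and matchingRows are hypothetical names.

```matlab
% Made-up comparison matrix: first column is the ID, the rest are values
comparisonmatrix = [123456 1 2 3; 123456 2 3 4; 654321 5 6 7];

% ID taken from the filename, converted to a number (leading zeros are dropped)
fileId = 123456;

isMatch = comparisonmatrix(:, 1) == fileId;          % logical column, true where the ID matches
matchingRows = comparisonmatrix(isMatch, 2:end);     % value columns of the matching rows
```

matchingRows can then be compared against the data read from the file with that ID.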

cast an array into a string - MATLAB

I have an array that I want to basically capture as text so I can write it to one cell of an Excel file as a column header. It's a range of subjects, and I'll have some data underneath. So the range is:
range = 2:12;
which creates an array, but I want the Excel file header to just read 2:12. I've tried creating another variable to capture this text in one field, using num2str like this:
rangeChar = num2str(range);
and I get:
rangeChar = 2 3 4 5 6 7 8 9 10 11 12
but they are each separate fields, so when exported to Excel they each take up their own cell. The original range is not always sequential - for example I might have
range = cat(2, 2:4, 8, 9:12);
so I can't just do a
rangeChar = sprintf('%d:%d', range(1), range(end));
type of thing either. Any thoughts?
You can do it the other way around and keep the range in the string and extract the vector from that when you need it:
rangeChar = '2:12';
range = eval(rangeChar);
Couldn't you just write :
range = '2:12';
Use a cell array to hold "range" and use the following code :
range = {2:4, 8, 9:12};
range_str=repmat({''}, size(range));
for i=1:length(range)
if length(range{i})==1
range_str{i}=sprintf('%d', range{i});
else
range_str{i}=sprintf('%d:%d', range{i}(1), range{i}(end));
end
end
range_str
Output :
range_str =
'2:4' '8' '9:12'
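If the range is only available as a plain numeric vector, the runs of consecutive values can be detected with diff and joined into a single string. This is a sketch that assumes a non-empty vector of ascending integers; breaks, runStart, runEnd and parts are names introduced here. Note that it merges adjacent pieces, so 8 and 9:12 come out as one run.

```matlab
range = cat(2, 2:4, 8, 9:12);           % [2 3 4 8 9 10 11 12]

breaks = find(diff(range) ~= 1);        % positions where a run ends
runStart = range([1, breaks + 1]);      % first value of each run
runEnd = range([breaks, numel(range)]); % last value of each run

parts = cell(1, numel(runStart));
for k = 1:numel(runStart)
    if runStart(k) == runEnd(k)
        parts{k} = sprintf('%d', runStart(k));               % single value
    else
        parts{k} = sprintf('%d:%d', runStart(k), runEnd(k)); % start:end
    end
end
rangeChar = strjoin(parts, ',');        % one string for a single Excel cell
```

The resulting char vector can be written to one cell as the column header.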

Read in file to single array and then check if a number is in that array

I've got a .txt file set up in the following format:
7
8
9
10
What I'm trying to do, is read in the numbers from the file into an array and then check if a number I'm getting from a different function is contained within that array.
ismember(ruleFunc{x+1},memFunc)
I'm pretty sure that will check if the element from ruleFunc is in the array memFunc and return 1/0 if it is or isn't. But I can't get the ismember function to work properly because the method I'm using to populate the memFunc array is wrong.
Additionally, how am I able to add another number to the .txt file on a new line?
EDIT:
Here is how I am populating memFunc currently. It's also the same method that populates ruleFunc.
mem=fopen('WorkingMemory.txt');
tline = fgets(mem);
workMem = {};
index = 1;
while ischar(tline)
workMem{index} = str2num(tline);
tline = fgets(mem);
index = index + 1;
end
The function ismember(data,number) returns an array the same size as data that is 1 wherever an element equals number. (See the documentation for more information.) You might actually want a single value, 1 or 0, depending on whether or not your number is in the matrix at all. I've included both options below.
% read in file
filename = 'my_data.txt';
fid = fopen(filename);
data = textscan(fid, '%d');
data = data{1};
fclose(fid);
% determine if number is in the file
number = 33;
ismember(data,number) %this returns an array
length(find(data == number)) > 0 % this returns 1 or 0
%write a line to existing file
fid2 = fopen(filename,'a');
newnumber = 100;
fprintf(fid2, '%d\n', newnumber);
fclose(fid2);
Now I see your updated question. That code will read each line into a different cell of a cell array. You want all your data in a matrix. You could rearrange your cell array and put the data into a matrix or you could use textscan as described above.
In response to your comment, you can make an if statement like this:
if (length(find(data == number)) > 0)
'do something'
end
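A more compact way to get a single true/false value is to swap the arguments of ismember, or to use any on the comparison. The data below is the sample from the question, and foundNine and foundThirtyThree are hypothetical names.

```matlab
data = [7; 8; 9; 10];                  % numbers as read from the file

% With the scalar first, ismember returns one logical value
foundNine = ismember(9, data);

% any collapses the element-wise comparison to one logical value
foundThirtyThree = any(data == 33);
```

Either result can be used directly as the condition of an if statement.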
Maybe you are actually creating an array containing strings instead of numbers?
If it's not as simple as that, more information / code snippets would be useful. You mention the method populating memFunc may be wrong, maybe you could post that code?
