MATLAB: Write string and numeric variables to a file

I have the following data:
a=[3 1 6]';
b=[2 5 2]';
c={'ab' 'bc' 'cd'}';
I now want to write a file that looks like this (tab-delimited):
ab 3 2
bc 1 5
cd 6 2
My solution (with a loop) is:
a=[3 1 6]';
b=[2 5 2]';
c={'ab' 'bc' 'cd'}';
c=cell2mat(c);
fid=fopen('filename','w');
for i=1:numel(b)
fprintf(fid,'%s\t%u\t%u\n',c(i,:),a(i),b(i));
end
fclose(fid);
Is there a way to do this without a loop, and/or a way to write cell arrays directly to a file?
Thanks.

How about this:
%A cell array holding all data
% (Note transpose)
data = cat(2, c, num2cell(a), num2cell(b))';
% Write data to a file
fid = fopen('example.txt', 'w');
fprintf(fid, '%s\t%u\t%u\n', data{:});
fclose(fid);
This will be memory-wasteful if your datasets get large (it's probably better to leave them as separate variables and loop), but it seems to work.
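If you happen to be on a newer release, note that MATLAB R2019a added writecell, which can write a cell array straight to a delimited file. A minimal sketch, assuming R2019a or later:
a = [3 1 6]';
b = [2 5 2]';
c = {'ab' 'bc' 'cd'}';
data = [c num2cell(a) num2cell(b)];            % 3x3 cell array, one row per output line
writecell(data, 'filename.txt', 'Delimiter', 'tab');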

Related

How to convert m x n matrix to (each row as) comma-separated text file in MATLAB?

I have a .mat file with m x n values. For simplicity, let's say we have 2 rows and 3 columns as:
2 4 6
2 1 4
I want to export these values from the .mat file to a text file, in such a manner that each line holds the respective row of the matrix, with the values separated by commas. For the above example, the txt file should look like:
2,4,6
2,1,4
This is what I have done so far:
gt1 = load('Benchmark\AAmpiidata\groundtruth.mat');
r = gt1.gTruth.LabelData{1,1}{1,1};
allOneString = sprintf('%.0f,', r(1,:));
allOneString = allOneString(1:end-1);% strip final comma
fid=fopen('allOneString.txt','w');
fprintf(fid,'%s',allOneString);
fclose(fid);
I am able to extract the first row of the .mat file as I require. I get this:
492,304,78,220
However, I don't know how to extract multiple rows from the .mat file. Any help will be appreciated!
P.S. In the above code, gt1 does not contain the values directly; the m x n values I need are extracted with gt1.gTruth.LabelData{1,1}{1,1}.
There are two answers depending on the version of MATLAB you use.
Answer 1: For MATLAB R2018b and earlier:
gt1 = load('Benchmark\AAmpiidata\groundtruth.mat');
r = gt1.gTruth.LabelData{1,1}{1,1};
dlmwrite('allOneString.txt',r)
Answer 2: For MATLAB R2019a and later (R2019a as of writing this answer):
gt1 = load('Benchmark\AAmpiidata\groundtruth.mat');
r = gt1.gTruth.LabelData{1,1}{1,1};
writematrix(r,'allOneString.txt')
Here's another way, which uses fprintf for file writing and strjoin for building the format string:
r = [2 4 6;2 1 4];
fid = fopen('allOneString.txt','w');
fprintf(fid, [strjoin(repmat({'%.0f'}, 1, size(r,2)), ',') '\n'], r.');
fclose(fid);
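To see what that one-liner builds, here is the intermediate format string for a 3-column matrix (a small sketch for illustration):
fmt = strjoin(repmat({'%.0f'}, 1, 3), ',');   % gives '%.0f,%.0f,%.0f'
fmt = [fmt '\n'];                             % fprintf cycles this format over r.'
% Because MATLAB is column-major, transposing r makes fprintf consume the
% values row by row, so each row of r becomes one comma-separated line.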

Best way to compare data from file to data in array in Matlab

I am having a bit of trouble with a specific file I/O task in MATLAB. I am fairly new to it, so some things are still a bit of a mystery to me. The input file is structured as so:
File Name: Processed_kplr003942670-2010174085026_llc.fits.txt
File contents: 6 header lines, then:
1, 2, 3
1, 2, 3
basically a 1443-by-3 matrix with varying values
now here is the matrix that I'm comparing it to:
[(0123456, 1, 2, 3), (0123456, 2, 3, 4), (etc..)]
Now here is my problem: first, I need to know how to read the file in a way that lets me compare the ID number (0123456) in the filename with the ID value in the matrix, so that I can compare the other columns of both. I do not know how to achieve this in MATLAB. Furthermore, I need to be able to loop over every row in the matrix that matches up to the specific file. For example:
If I have 15 files ranging from 'Processed_0123456_1' to 'Processed_0123456_15', then I want to be able to read in the values contained in 'Processed_0123456_1' and compare them to ANY row in the matrix that corresponds to that ID (0123456). I don't know if maybe accumarray can be used for this, but as I said, I'm not sure.
So the code must:
-Read in file
-Compare file to any point in the matrix with corresponding ID
-Do operations
-Loop over until full list of files in the directory are read in and processed, and output a matrix with the results.
Thanks for any help.
EDIT: Exact File Sample--
Kepler I.D.-----Channel
[1161345]--------[84]
-TTYPE1--------TTYPE8------------TTYPE4
['TIME']---['PDCSAP_FLUX']---['SAP_FLUX']
['BJD - 2454833']--['e-/s']--------['e-/s']
CROWDSAP --- 0.9791
630.195880143,277165.0,268233.0
630.216312946,277214.0,268270.0
630.23674585,277239.0,268293.0
630.257178554,277296.0,268355.0
630.277611357,277294.0,268364.0
630.29804426,277365.0,268441.0
630.318476962,277337.0,268419.0
630.338909764,277403.0,268481.0
630.359342667,277389.0,268463.0
630.379775369,277441.0,268508.0
630.40020817,277545.0,268604.0
There are more entries than what was just posted, but they go on for about 1000 lines, so it is impractical to post them all here.
To get the file ID, use regular expressions, e.g.:
filename = 'Processed_0123456_1';
file_id_str = regexprep(filename, 'Processed_(\d+)_\d+', '$1');
file_num_str = regexprep(filename, 'Processed_\d+_(\d+)', '$1')
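Since the matrix presumably stores the IDs as numbers, you would convert the extracted string before comparing. A sketch (the variable comparison_matrix is illustrative, standing in for your [ID, 1, 2, 3] matrix):
file_id = str2double(file_id_str);                        % '0123456' -> 123456
% rows of the comparison matrix whose first column matches this file's ID
rows = comparison_matrix(comparison_matrix(:,1) == file_id, :);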
To read in the file contents, assuming that it's all comma-separated values without a header, use textscan, e.g.,
fid = fopen(filename)
C = textscan(fid, '%f,%f,%f') % Use as many %f specifiers as you have entries per line in the file
textscan also works on strings. So, for example, if your file contents were:
filestr = sprintf('1, 2, 3\n1, 3, 3')
Then running textscan on filestr works like this:
C = textscan(filestr, '%f,%f,%f')
C =
[2x1 double] [2x1 double] [2x1 double]
You can convert that to a matrix using cell2mat:
cell2mat(C)
ans =
1 2 3
1 3 3
You could then repeat this procedure for all files with the same ID and concatenate them into a single matrix, e.g.,
C_full = [];
files = dir('Processed_0123456_*');   % all files with the same ID
for k = 1:numel(files)
    C = do_all_the_above_stuff(files(k).name);   % placeholder for the steps above
    C_full = [C_full; C];
end
Then you can look for what you want in C_full.
Update based on updated OP Dec 12, 2013
Here's code to read the values from a single file. Wrap this all in the loop that I mentioned above to loop over all your files and read them all into a single matrix.
fid = fopen('/path/to/file');
% Skip over 12 header lines
for kk = 1:12
fgetl(fid);
end
% Read in values to a matrix
C = textscan(fid, '%f,%f,%f');
C = cell2mat(C);
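Assembled into that outer loop, it might look like the following sketch (the file pattern is illustrative; adjust it to your actual names):
files = dir('Processed_*');                 % all files to process
C_all = [];
for n = 1:numel(files)
    fid = fopen(files(n).name);
    for kk = 1:12                           % skip the header lines, as above
        fgetl(fid);
    end
    C = cell2mat(textscan(fid, '%f,%f,%f'));
    fclose(fid);
    C_all = [C_all; C];                     % concatenate everything into one matrix
end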
I think your requirements are too complicated to write the whole script here. Nonetheless, I will try to give some pointers to help. Disclaimer: None of this is tested, just my best guess. Please expect syntax errors, etc. I hope you can figure them out :-)
1) You can use the textscan function with the delimiter option to get data from the lines of your file. Since your format varies as it does, we will probably want to use...
2) ... fgetl to read the first two lines into strings and process them separately using textscan. Such an operation might look like:
fid = fopen('file.txt','r');
tline1 = fgetl(fid);
tline2 = fgetl(fid);
fclose(fid);
C1 = textscan(tline1,'%s %d %s','delimiter','_'); %C1{2} will be the integer we want
C2 = textscan(tline2,'%s %s','delimiter',':'); %C2{2} will be the values we want, but they're still a string so...
mat = str2num(C2{2}{1}); % textscan returns cells of strings, so index into the cell
3) Then, for the rest of the lines, we can use something like dlmread:
mat2 = dlmread('file.txt',',',2,0);
The 2,0 specifies the offset, in 0-based rows and columns, from the start of the file. You may need to look at something like vertcat to stitch mat and mat2 together, as in the sketch below.
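For instance, a sketch assuming mat (from the header parsing above) and mat2 end up with the same number of columns:
mat2 = dlmread('file.txt', ',', 2, 0);   % numeric block, skipping two header rows
combined = vertcat(mat, mat2);           % equivalent to [mat; mat2]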
4) The list of files in the directory can be found with the dir command. The filename is an attribute of the structure that's returned:
dirlist = dir;
for i = 1:length(dirlist)
filename = dirlist(i).name
%process your files
end
You can also pass matching strings to dir, like so:
dirlist = dir('*.txt');
which will find all of the files with extension .txt.
5) You can very easily loop through the comparison matrix:
sze = size(comparisonmatrix);
for i = 1:sze(1)
%compare comparisonmatrix(i,1) to C1{2}
%Perform whatever operations you need
end
Hope that helps!

Parse files to obtain array of arrays in matlab

I need to parse a .txt file in Matlab so that all lines of the file are a different element in an array. Each element of the array would also be an array of integers. So I need to make an array of arrays from a .txt file.
The problem I'm having is that I can't figure out which function to use to parse the file. If I use importdata(filename), it only parses the first line of the file. If I use textscan, it parses the file in columns, and the file is formatted like:
1 1 1 1 1
13 13 13 13 13
2 2 2 2 2
14 14 14 14 14
I need each of the rows to be an array that I can then use to compare my data against.
Is there an option for either one of those functions that would work for my purposes? I've tried looking at the MATLAB documentation, but can't make sense of it.
If each array needs to be a different size, you need to use a cell array to contain them. Something like this:
fid = fopen('test.txt'); %opens the file
tline = fgets(fid); % reads in the first line
data = {}; % creates an empty cell array
index = 1; % initializes index
while ischar(tline) % loops while the line that has just been read contains characters, that is, the end of the file has not been reached.
data{index} = str2num(tline); % converts the line just read (a string) to a numeric vector and stores it in the current cell of data
tline = fgets(fid); % reads in the next line
index = index + 1; % increments index
end
fclose(fid);
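Each cell of data then holds one line as a numeric row vector, which you can index and compare against your own rows, e.g.:
firstRow = data{1};        % e.g. [1 1 1 1 1] for the sample file above
nLines = numel(data);      % number of lines read from the file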
If you know your data will have the specific format of 5 numbers followed by a single number, you can use dlmread and then format the resulting matrix.
data = dlmread('data.txt',' ');
multipleValueRows = data(1:2:end,:);
singleValueRows = data(2:2:end,1);
The data matrix has the size (number of rows of your file) x 5 columns. In the rows where you only have a single number, the data matrix will contain zeros in columns 2-5.
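If you still want the "array of arrays" shape from the question, you can then split the matrix by rows into a cell array, e.g.:
rowCells = num2cell(multipleValueRows, 2);   % each cell holds one 1x5 row vector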

R read from file different-sized arrays

I need to apply the Mann-Kendall trend test in R to a large number (about 1 million) of different-sized time series. I've already created a script that takes each time series (practically a list of numbers) from all the files in a certain directory and then outputs the results to a .txt file.
The problem is that I have about 1 million time series, so creating 1 million files isn't exactly nice. So I thought that putting all the time series into a single .txt file (separated by some symbol, like "#") could be more manageable. So I have a file like this:
1
2
4
5
4
#
2
13
34
#
...
I'm wondering, is it possible to extract such series (between two "#") in R and then apply the analysis?
EDIT
Following @acesnap's hints, I'm using this code:
library(Kendall)
a=read.table("to_r.txt")
numData=1017135
for (i in 1:numData){
s1=subset(a,a$V1==i)
m=MannKendall(s1$V2)
cat(m[[1]]," ",m[[2]], " ", m[[3]]," ",m[[4]]," ", m[[5]], "\n" , file="monotonic_trend_checking.txt",append=TRUE)
}
This approach works, but the problem is that the computation takes ages. Can you suggest a faster approach?
If you were to number the datasets as they went into the larger file it would make things easier. If you were to do that you could use a for loop and subsetting.
setNum data
1 1
1 2
1 4
1 5
1 4
2 2
2 13
2 34
... ...
Then do something like:
answers1 <- c()
numOfDataSets <- 1000000
for(i in 1:numOfDataSets){
ss1 <- subset(bigData, bigData$setNum == i) ## creates subset of each data set
ans1 <- mannKendallTrendTest(ss1$data) ## gets answer from test
answers1 <- c(answers1, ans1) ## inserts answer into vector
print(paste(i, " | ", ans1, "",sep="" )) ## prints which data set is in use
flush.console() ## prints to console now instead of waiting
}
Here is a perhaps a more elegant solution:
# Read in your data
x=c('1','2','3','4','5','#','4','5','5','6','#','3','6','23','#')
# Build a list of indices where you want to split by:
ind=c(0,which(x=='#'))
# Use those indices to split the vector into a list
lapply(seq(length(ind)-1),function (y) as.numeric(x[(ind[y]+1):(ind[y+1]-1)]))
Note that for this code to work, you must have a '#' character at the very end of the file.

Reading and processing a large text file in Matlab

I'm trying to read a large text file (a few million lines) into MATLAB. Initially I was using importdata(file_name), which seemed like a concise solution. However, I need to use MATLAB 7 (yeah, I know it's old) and it seems importdata isn't supported. As such, I tried the following:
fid = fopen(file_name);
lno = 1;
fdata = {};
while ~feof(fid)
    fline = fgetl(fid);
    fdata{1,lno} = fline;
    lno = lno + 1;
end
fclose(fid);
But this is really slow. I'm guessing it's because it's resizing the array on each iteration. Is there a better way of doing this? Bear in mind that the first 20 lines of the input data are string-type data and the remainder of the data is 3 to 6 columns of hexadecimal values.
You will have to do some reshaping, but another option for you would be fread.
But as was mentioned, this essentially locks you into a rectangular import. So another option would be to use textscan. As I mention in another note, I'm not 100% sure when it was implemented; all I know is you don't have importdata().
fid = fopen('textfile.txt');
Out = textscan(fid,'%s','delimiter',sprintf('\n'));
fclose(fid);
With the use of textscan, you will be able to get a cell array containing the characters of each line, which you can then manipulate however you want. And as I say in my comments, it no longer matters whether the lines are the same length or not. NOW you can parse the cell array more quickly. But as gnovice mentions (and he also has a very elegant solution), you may have to concern yourself with memory requirements.
The one thing you never want to use in MATLAB, if you can avoid it, is looping structures. They are fast in C/C++, etc., but in MATLAB they are the slowest way of getting where you are going.
EDIT: Just looked it up, and it looks like textscan WAS implemented in version 7 (R14), so if that's what you have, you should be good to use it.
I see two options:
Rather than growing by 1 every single time, you could e.g. double the size of your array only when necessary. This massively reduces the number of reallocations required.
Do a two-pass approach. The first pass simply counts the number of lines, without storing them. The second pass actually fills in the array (which has been preallocated to the correct size).
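A minimal sketch of the first option, the doubling strategy (the two-pass variant would simply replace the growth step with an initial loop that only counts lines):
fid = fopen('textfile.txt');
capacity = 1024;                       % initial size guess
fdata = cell(capacity, 1);
lno = 0;
while ~feof(fid)
    lno = lno + 1;
    if lno > capacity                  % double the array only when it is full
        capacity = capacity * 2;
        fdata{capacity, 1} = [];       % grows the cell array in one step
    end
    fdata{lno} = fgetl(fid);
end
fclose(fid);
fdata = fdata(1:lno);                  % trim the unused tail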
One solution is to read the entire contents of the file as a string of characters with FSCANF, split the string into individual cells at the points where newline characters occur using MAT2CELL, remove extra white space on the ends with STRTRIM, then process the string data in each cell as needed. For example, using this sample text file 'junk.txt':
hi
hello
1 2 3
FF 00 FF
12 A6 22 20 20 20
FF FF FF
The following code will put each line in a cell of a cell array cellData:
>> fid = fopen('junk.txt','r');
>> strData = fscanf(fid,'%c');
>> fclose(fid);
>> nCharPerLine = diff([0 find(strData == char(10)) numel(strData)]);
>> cellData = strtrim(mat2cell(strData,1,nCharPerLine))
cellData =
'hi' 'hello' '1 2 3' 'FF 00 FF' '12 A6 22 20 20 20' 'FF FF FF'
Now if you want to convert all of the hexadecimal data (lines 3 through 6 in my sample data file) from strings to vectors of numbers, you can use CELLFUN and SSCANF like so:
>> cellData(3:end) = cellfun(@(s) {sscanf(s,'%x',[1 inf])},cellData(3:end));
>> cellData{3:end} % Display contents
ans =
1 2 3
ans =
255 0 255
ans =
18 166 34 32 32 32
ans =
255 255 255
NOTE: Since you are dealing with such large arrays, you will have to be mindful of the amount of memory being used by your variables. The above solution is vectorized, but may take up a lot of memory. You may have to overwrite or clear large variables like strData when you create cellData. Alternatively, you could loop over the elements in nCharPerLine and individually process each segment of the larger string strData into the vectors you need, which you can preallocate now that you know how many lines of data you have (i.e. nDataLines = numel(nCharPerLine)-nHeaderLines;).
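For reference, here is a sketch of that loop-based alternative, assuming nCharPerLine is defined as above and nHeaderLines is your header count (2 for this sample file):
nHeaderLines = 2;                                   % illustrative value
nDataLines = numel(nCharPerLine) - nHeaderLines;
vecData = cell(nDataLines, 1);                      % preallocated output
offset = sum(nCharPerLine(1:nHeaderLines));         % skip the header characters
for iLine = 1:nDataLines
    len = nCharPerLine(nHeaderLines + iLine);
    seg = strData(offset + (1:len));                % one line, including newline
    vecData{iLine} = sscanf(seg, '%x', [1 inf]);    % hex string -> numeric vector
    offset = offset + len;
end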
