Reading many (1000+) files with dlmread - Loop with varying filenames?

I'm very new to MATLAB, and to coding in general.
I'm running a simulation that outputs thousands of files. These files are .vtk and are read correctly by dlmread.
I tried reading one of them, storing it as a matrix and extracting column vectors from that matrix. This works fine. What I need now is to read not just one of them, but all of them. The filenames differ only by a number, for example cover1000.vtk, cover2000.vtk, ..., cover1200000.vtk.
I want all of them to be read with dlmread and stored as separate matrices. How do I do that? Here is what I have right now, working with one file at a time:
A_1000 = dlmread('cover1000.vtk') % matrix A_1000 containing the values from the vtk file in the load path
fx_1000 = A_1000(1:20,1) % extract a vector with specific values
fx_ave_1000 = sum(fx_1000)/length(fx_1000) % average of the values in the extracted vector
I'm thinking of a loop, but how do I create a loop with varying file names?
I've also read that a loop is not the best idea and that cell arrays would be better, but I have absolutely no idea how to implement any of this.
Thanks for your help!
cheers

You can use the function dir to list all the vtk files in your directory, then loop over those files.
filename = dir('*.vtk'); % list all the vtk files in your current directory
for ii = 1:length(filename)
    A = dlmread(filename(ii).name); % matrix A containing the values from the current vtk file
    fx{ii} = A(1:20,1); % extract the vector with the specific values
    fx_ave{ii} = sum(fx{ii})/length(fx{ii}); % average of the values in the extracted vector
end
The results are now stored in two cell arrays: fx and fx_ave.
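One thing to watch with dir is that it usually returns names in alphabetical order, so cover10000.vtk would come before cover2000.vtk. If the numeric order matters, a possible alternative is to build each filename from its number with sprintf. The sketch below assumes the numbers run from 1000 to 1200000 in steps of 1000 (the question only shows the first two and the last one) and that rows 1 to 20 of column 1 are wanted, as in the question. Files that are missing are skipped and leave a NaN in fx_ave.

```matlab
steps = 1000:1000:1200000;        % assumed numbering scheme, adjust as needed
fx = cell(1, numel(steps));       % one extracted vector per file
fx_ave = nan(1, numel(steps));    % one average per file, NaN if the file is missing

for k = 1:numel(steps)
    fname = sprintf('cover%d.vtk', steps(k));   % e.g. 'cover1000.vtk'
    if exist(fname, 'file') ~= 2
        continue                  % skip numbers for which no file exists
    end
    A = dlmread(fname);           % matrix with the values from this file
    fx{k} = A(1:20, 1);           % rows 1..20 of the first column
    fx_ave(k) = mean(fx{k});      % average of the extracted values
end
```

Because the averages are plain numbers, a numeric vector is used for fx_ave instead of a cell array, which makes it easy to plot against steps afterwards.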

Related

Loop over files of same length and add their column respectively

I have a directory that contains Nfiles text files called myfile_i.txt (with i an integer).
Each text file contains 2 columns (let's call them x and y) and Nsteps rows of floating numbers.
I would like to loop over all the files in the directory and do the following computation in C:
while [condition to go through all files called myfile_i.txt in the directory]
{
for (ii = istart; ii < Nsteps; ii++) // with istart>=0 and istart<Nsteps, otherwise error
{
dsq[ii]=(x[ii]-x[istart])*(x[ii]-x[istart])+(y[ii]-y[istart])*(y[ii]-y[istart]);
}
}
where istart is an input parameter (integer) that I choose as a starting point in terms of iteration (or if you will, time steps).
Then at the end I would need to divide this result (which I believe is a column vector) by the number of steps (basically, Nsteps-istart).
I used to do something like this in MATLAB, but I have lots of these large files, and loading them all and computing this is too slow for my computer. This is why I want to use (and learn) C.
What should my program look like in order to accomplish this computation?

Convert multiple 2-dimensional .mat files into a single 3-dimensional .mat in MATLAB

I have 79 .mat files, each containing a 264*264 array named "CM". I want to combine them all into a single 264*264*79 matrix but I don't know how.
files = dir('*.mat'); %// list all filenames in the directory ending in .mat
for ii = numel(files):-1:1 %// let the loop run backwards
    S = load(files(ii).name, 'CM'); %// load only CM, into a struct
    A(:,:,ii) = S.CM; %// assumes the array is called CM in every file
end
The dir command gets a list of all .mat files in the pwd (present working directory). The for loop runs backwards so that the first assignment allocates the storage variable A at its maximum size, which improves efficiency. Within the loop, each file is loaded and its array stored in A. At the end, A will be a [264 264 79] array.

Can I format the output of a MATLAB command so that I can use it to declare a new variable?

It's best explained with an easier example. Say some script in MATLAB gives me a cell array of strings:
temp = dir;
names = {temp.name}'
ans =
'folder1'
'folder2'
'file1'
I would like to use this output in another script, in another MATLAB session. Ideally, in the second script I would write
names = {'folder1', 'folder2', 'file1'}
but this means copy-pasting the output right under "ans = " and then manually adding the commas and curly brackets. In my case the cell array is quite large, so this is undesirable. It also feels clumsy, and there may be an easier way. Is there any way to make MATLAB print the output so that I do not have to do this?
It would be nice to know how to do exactly the same thing for matrices instead of cell arrays.
I am aware of saving the variable in a .mat file and loading it, but I was wondering if the above is also possible (it would be cleaner in my case).
Personally I would advise a cleaner way of handling this (such as .mat files).
But then again, sometimes the time spent setting these up is just not worth it for simple tasks that are unlikely to be repeated much...
For matrices there is a built-in function to do this; for cells, however, we need to produce a string with the required format...
Matrix
For 1d or 2d matrices mat2str provides this functionality
mat2str(eye(2))
ans =
[1 0;0 1]
Cell
However to my knowledge there is no such builtin function for cells.
For a 1d cell array of strings the following will give the output in a copyable format:
['{',sprintf('''%s'' ',names{:}),'}']
ans =
{'folder1' 'folder2' 'file1' }
Note: the strings in the cells cannot contain the ' character.
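The restriction on the ' character can be worked around by doubling every quote before wrapping each string, since '' is how MATLAB writes a literal quote inside a char vector. The following is a minimal sketch of that idea, not a built-in facility; the sample entry it's is made up to show the escaping, and strjoin requires R2013a or later.

```matlab
names = {'folder1', 'it''s', 'file1'};        % sample data, second entry contains a quote

escaped = strrep(names, '''', '''''');        % double every single quote in each string
quoted  = strcat('''', escaped, '''');        % wrap each string in single quotes
literal = ['{', strjoin(quoted, ', '), '}'];  % join with commas and add the braces

disp(literal)                                 % text that can be pasted into another script
```

Pasting the displayed text into another script recreates the cell array, including the entry that contains a quote.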
If I understand you correctly, you are getting the names output from one script and want to use it within another script. Since you cannot pass it as a function argument, you are currently copying it over. One could do that with eval and copy and paste:
names = {'folder1'
'folder2'
'file1'};
% create the command
n = length(names);
cmd = sprintf(['names = {',repmat('''%s'', ', 1, n-1) ,'''%s''}'], names{:}); % '%s, %s, ...., %s' format
% cmd contains the string: names = {'folder1', 'folder2', 'file1'}
% eval the cmd in script 2
eval(cmd) % evals the command names = {'folder1', 'folder2', 'file1'}
But this is generally very bad practice, as it becomes very hard to debug if something goes wrong somewhere. It also makes you copy and paste things around, which I find uncomfortable. How about storing the names in a txt file and loading them in the second script? That gets things done automatically.
names = {'folder1'
'folder2'
'file1'};
% write output to file, one name per line
fid = fopen('mynames.txt', 'w'); % open file for writing
fprintf(fid, '%s\n', names{:});
fclose(fid);
% here comes script 2
fid = fopen('mynames.txt', 'r'); % open file for reading
names_loaded = textscan(fid, '%s', 'Delimiter', '\n'); % one name per line
names_loaded = names_loaded{1}; % unwrap the cell returned by textscan
fclose(fid);
I think the key here is that you have a variable in one place and want to use it in a different place.
In that situation you don't want to copy the output MATLAB generates; you just want to save the value itself.
After finding the result just do this:
save names
Later you can load this variable with
load names
Check doc save and doc load for more extensive examples. Note that save names writes every variable in the workspace to names.mat, so you may for example want to save all relevant variables in a file with a more generic name.

How to name a Matlab output file using input from a text file

I am trying to take an input from a text file in this format:
Processed_kplr010074716-2009131105131_llc.fits.txt
Processed_kplr010074716-2009166043257_llc.fits.txt
Processed_kplr010074716-2009259160929_llc.fits.txt
etc.... (there are several hundred lines)
and use that input to name my output files for a MATLAB loop. Each time the loop ends, I would like it to process the results and save them to a file such as:
Matlab_Processed_kplr010074716-2009131105131_llc.fits.txt
This would make it easier to identify the object that has been processed, as I can then just look for the ID number and not have to sort through a list of random saved filenames. I also need it to save the plots that are generated in each loop in a similar fashion.
This is what I have so far:
fileNames = fopen('file_list_1.txt', 'rt');
inText = textscan(fileNames, '%s');
outText = [inText]';
fclose(fileNames)
for j:numel(Data)
%Do Stuff
save(strcat('Matlab_',outText(j),'.txt'))
print(Plot, '-djpeg', strcat(outText(j),'.txt'))
end
Any help is appreciated, thanks.
If you want to use the save command to write a text file, you need the -ascii option (optionally with -tabs); see the documentation for more details. You might also want to use dlmwrite instead, or fprintf, which cycles through a matrix column by column, so transpose the matrix if each row should end up on its own line.
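Beyond the choice of writing function, the loop in the question also needs the names unwrapped from the cell that textscan returns and passed on as char vectors (curly-brace indexing). A possible shape for the whole loop is sketched below under several assumptions: the list file file_list_demo.txt is created here only so the sketch runs on its own, results is placeholder data standing in for "Do Stuff", and the plot is saved under the same base name with a .jpg extension.

```matlab
% Demo list file (hypothetical name) with two entries taken from the question.
fid = fopen('file_list_demo.txt', 'wt');
fprintf(fid, '%s\n', 'Processed_kplr010074716-2009131105131_llc.fits.txt', ...
                     'Processed_kplr010074716-2009166043257_llc.fits.txt');
fclose(fid);

% Read the list: textscan returns a 1-by-1 cell holding the cell array of names.
fid = fopen('file_list_demo.txt', 'rt');
inText = textscan(fid, '%s');
fclose(fid);
outText = inText{1};                        % N-by-1 cell array of char vectors

for j = 1:numel(outText)
    results = rand(5, 3);                   % placeholder for the real processing
    fig = figure('Visible', 'off');         % placeholder plot, not shown on screen
    plot(results);

    outName = ['Matlab_', outText{j}];      % curly braces give a char vector
    dlmwrite(outName, results, 'delimiter', '\t');   % tab-separated text file

    [~, base] = fileparts(outName);         % strip the trailing .txt
    print(fig, [base, '.jpg'], '-djpeg');   % save the plot next to the data
    close(fig);
end
```

Each pass writes Matlab_<original name> for the data and a matching .jpg for the plot, so both can be found by the object's ID number.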

Concatenate a large number of HDF5 files

I have about 500 HDF5 files each of about 1.5 GB.
Each of the files has the same exact structure, which is 7 compound (int,double,double) datasets and variable number of samples.
Now I want to concatenate all these files by concatenating each of the datasets, so that at the end I have a single 750 GB file with my 7 datasets.
Currently I am running an h5py script which:
creates an HDF5 file with the right datasets of unlimited max size
opens all the files in sequence
checks the number of samples in each (as it is variable)
resizes the global file
appends the data
This obviously takes many hours. Would you have a suggestion for improving this?
I am working on a cluster, so I could use HDF5 in parallel, but I am not good enough in C programming to implement something myself, I would need a tool already written.
I found that most of the time was spent resizing the file, as I was resizing at each step, so I now first go through all my files and get their lengths (which vary).
Then I create the global h5file, setting the total length to the sum of the lengths of all the files.
Only after this phase do I fill the h5file with the data from all the small files.
Now it takes about 10 seconds for each file, so it should take less than 2 hours, while before it was taking much more.
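The two-pass idea described above (measure first, allocate once, then fill) can be sketched as follows. This is not the h5py script from the question: it uses MATLAB's high-level HDF5 functions (h5info, h5create, h5read, h5write) to stay in one language, and it assumes a single plain double dataset called /data stored as 1-by-N rather than seven compound datasets. The file names and the two small demo inputs are made up so that the sketch runs on its own.

```matlab
% Demo inputs (hypothetical names): two small files with a 1-by-N dataset.
inFiles = {'part1_demo.h5', 'part2_demo.h5'};
outFile = 'combined_demo.h5';
lens = [3 5];
allFiles = [inFiles, {outFile}];
for k = 1:numel(allFiles)
    if exist(allFiles{k}, 'file') == 2
        delete(allFiles{k});               % h5create fails if the dataset already exists
    end
end
for k = 1:numel(inFiles)
    h5create(inFiles{k}, '/data', [1 lens(k)]);
    h5write(inFiles{k}, '/data', k * (1:lens(k)));
end

% Pass 1: read only the metadata to find each dataset's length.
n = zeros(1, numel(inFiles));
for k = 1:numel(inFiles)
    info = h5info(inFiles{k}, '/data');
    n(k) = info.Dataspace.Size(2);
end

% Create the output dataset once, at its final size, so it is never resized.
h5create(outFile, '/data', [1 sum(n)]);

% Pass 2: copy each input into its own slice of the output.
offset = 1;                                % h5write start indices are 1-based
for k = 1:numel(inFiles)
    chunk = h5read(inFiles{k}, '/data');
    h5write(outFile, '/data', chunk, [1 offset], [1 n(k)]);
    offset = offset + n(k);
end
```

Only one input file is held in memory at a time, which matters when the combined file is far larger than RAM.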
I get that answering this earns me a necro badge - but things have improved for me in this area recently.
In Julia this takes a few seconds.
Create a txt file that lists all the hdf5 file paths (you can use bash to do this in one go if there are lots)
In a loop read each line of txt file and use label$i = h5read(original_filepath$i, "/label")
Concatenate all the labels: label = [label label$i]
Then just write: h5write(data_file_path, "/label", label)
The same can be done if you have groups or more complicated hdf5 files.
Ashley's answer worked well for me. Here is an implementation of her suggestion in Julia:
Make text file listing the files to concatenate in bash:
ls -rt $somedirectory/$somerootfilename-*.hdf5 >> listofHDF5files.txt
Write a julia script to concatenate multiple files into one file:
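# concatenate_HDF5.jl (Julia 1.x syntax)
using HDF5
inputfilepath = ARGS[1]
outputfilepath = ARGS[2]
firstit = true
data = []
for line in eachline(inputfilepath) # eachline opens and closes the list file
    r = String(strip(line)) # path of one HDF5 file, without surrounding whitespace
    println(r)
    datai = h5read(r, "/data")
    if firstit
        global data = datai # `global` is needed to assign script-level variables inside a loop
        global firstit = false
    else
        global data = cat(data, datai; dims=4) # in this case concatenating along the 4th dimension
    end
end
h5write(outputfilepath, "/data", data)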
Then execute the script file above using:
julia concatenate_HDF5.jl listofHDF5files.txt final_concatenated_HDF5.hdf5
