I need to load text from a text file (or a string) into a table whose entries are two-element tables. Like this:
Text File:
Gustavo 20
Danilo 20
Dimas 40
Table
Names = {{"Gustavo",20},{"Danilo",20},{"Dimas",40}}
Need help to do this.
You can use io.lines() for this.
vectorarray = {}
for line in io.lines(filename) do
local w, n = string.match(line, "^(%w+)"), string.match(line, "(%d+)$")
table.insert(vectorarray, {w, n})
end
This assumes, of course, that the name sits at the very start of the line and the number at the very end, and that each line contains only those two fields. If you use the file name in many other places, you could store it in a global variable and reuse it each time, such as:
arrayfile = "C:/arrayfile.txt"
Either way, make sure the path is correct and that the file name is enclosed in quotation marks.
A shorter variation of Josh's answer that directly puts the result into the table. This matches alphabetic names followed by at least one space and numbers but you can change the pattern as needed:
Names = {}
for line in io.lines(filename) do
Names[ #Names+1 ] = {line:match('(%a+)%s+(%d+)')}
end
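For comparison, the same parsing can be sketched in Python. This is only an illustration under the same assumption as above (one alphabetic name, whitespace, then an integer on each line). The names PAIR and parse_pairs are made up for this example, lines that do not match are skipped, and converting the number to an integer is a choice the Lua versions do not make.

```python
import re

# Hypothetical pattern: an alphabetic name, whitespace, then digits.
PAIR = re.compile(r"([A-Za-z]+)\s+(\d+)")

def parse_pairs(lines):
    """Return a list of [name, number] pairs; non-matching lines are skipped."""
    pairs = []
    for line in lines:
        m = PAIR.search(line)
        if m:
            pairs.append([m.group(1), int(m.group(2))])
    return pairs

names = parse_pairs(["Gustavo 20", "Danilo 20", "Dimas 40"])
print(names)
```

To read from a file instead, pass an open file object, since iterating over it yields one line at a time.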
I've managed to answer my own question. This code will write cell arrays of any shape containing strings. The datasets can be modified/overwritten by simply calling again with a different input.
https://www.mathworks.com/matlabcentral/fileexchange/24091-hdf5-read-write-cellstr-example
%Okay, Matlab's h5write(filename, dataset, data) function doesn't work for
%strings. It hasn't worked with strings for years. The forum post that
%comes up first in Google about it is from 2009. Yeah. This is terrible,
%and evidently it's not getting fixed. So, low level functions. Fun fun.
%
%What I've done here is adapt examples, one from the hdf group's website
%https://support.hdfgroup.org/HDF5/examples/api18-m.html called
%"Read / Write String Datatype (Dataset)", the other by Jason Kaeding.
%
%I added functionality to check whether the file exists and either create
%it anew or open it accordingly. I wanted to be able to likewise check the
%existence of a dataset, but it looks like this functionality doesn't exist
%in the API, so I'm doing a try-catch to achieve the same end. Note that it
%appears you can't just create a dataset or group deep in a hierarchy: You
%have to create each level. Since I wanted to accept dataset names in the
%same format as h5read(), in the event the dataset doesn't exist, I loop
%over the parts of the dataset's path and try to create all levels. If they
%already exist, then this action throws errors too; hence a second
%try-catch.
%
%I've made it more advanced than h5create()/h5write() in that it all
%happens in one call and can accept data inputs of variable size. I take
%care of updating the dataset's extent to accommodate changing data array
%sizes. This is important for applications like adding a new timestamp
%every time the file is modified.
%
%@author Pavel Komarov pavel@gatech.edu 941-545-7573
function h5createwritestr(filename, dataset, str)
%"The class of input data must be cellstring instead of char when the
%HDF5 class is VARIABLE LENGTH H5T_STRING.", but also I don't want to
%force the user to put braces around single strings, so this.
if ischar(str)
str = {str};
end
%check whether the specified .h5 exists and either create or open
%accordingly
if ~exist(filename, 'file')
file = H5F.create(filename, 'H5F_ACC_TRUNC', 'H5P_DEFAULT', 'H5P_DEFAULT');
else
file = H5F.open(filename, 'H5F_ACC_RDWR', 'H5P_DEFAULT');
end
%set variable length string type
vlstr_type = H5T.copy('H5T_C_S1');
H5T.set_size(vlstr_type,'H5T_VARIABLE');
% There is no way to check whether a dataset exists, so just try to
% open it, and if that fails, create it.
try
dset = H5D.open(file, dataset);
H5D.set_extent(dset, fliplr(size(str)));
catch
%create the intermediate groups one at a time because evidently the
%API's functions aren't smart enough to be able to do this themselves.
slashes = strfind(dataset, '/');
for i = 2:length(slashes)
url = dataset(1:(slashes(i)-1));%pull out the url of the next level
try
H5G.create(file, url, 1024);%1024 "specifies the number of
catch %bytes to reserve for the names that will appear in the group"
end
end
%create a dataspace for cellstr
H5S_UNLIMITED = H5ML.get_constant_value('H5S_UNLIMITED');
spacerank = max(1, sum(size(str) > 1));
dspace = H5S.create_simple(spacerank, fliplr(size(str)), ones(1, spacerank)*H5S_UNLIMITED);
%create a dataset plist for chunking. (A dataset can't be unlimited
%unless the chunk size is defined.)
plist = H5P.create('H5P_DATASET_CREATE');
chunksize = ones(1, spacerank);
chunksize(1) = 2;
H5P.set_chunk(plist, chunksize);% 2 strings per chunk
dset = H5D.create(file, dataset, vlstr_type, dspace, plist);
%close things
H5P.close(plist);
H5S.close(dspace);
end
%write data
H5D.write(dset, vlstr_type, 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT', str);
%close file & resources
H5T.close(vlstr_type);
H5D.close(dset);
H5F.close(file);
end
I found a bug! The spacerank line should instead be:
spacerank = length(size(str));
Now it works flawlessly as far as I can tell.
I am a Matlab beginner, as will soon be very obvious. I am trying to assemble a cell array that has a single column of filenames.
I have multiple sessions. Each session should have 56 filenames (but some could be short or long, so I'd honestly prefer a solution that wouldn't break on encountering a short session). I need to loop over sessions and append the names in each subsequent session to my cell array, so that after two sessions the dimensions are 112, 1.
In other words, I'd like an array that went:
P =
/data/session1/dvol1.img
/data/session1/dvol2.img
...
/data/session1/dvol56.img
/data/session2/dvol1.img
/data/session2/dvol2.img
...
/data/session2/dvol56.img
and so on if there are more than two sessions.
The function I have that finds the filenames in the session is spm_select. It returns a char array of all the files in a directory that match a regular expression, in this case, 56 files for each session directory.
(I recognize my question is very similar to the question here: Using loops to get multiple values into a cell but I couldn't figure out an answer to my question since that person is only trying to append a single value at a time.)
I have tried a lot of things that haven't worked.
This:
data_path = '/foo/bar/';
subjects = {'test1'};
sessions = {'session1' 'session2' };
for i=1:numel(subjects)
clear P
P=cell(56*numel(sessions),1);
for j=1:numel(sessions)
P{(j-1)*56+1} = spm_select('FPList', fullfile(data_path,subjects{i}, sessions{j}), '^d.*\.img$');
end
end
generated a cell array that was 112x1, but whose first element was a 56x57 char array (that is, the filenames of all files in my first session directory) and which contained none of the files from the second session.
I'm not sure how useful it would be to recapitulate every wrong-headed thing I've done, so I won't.
Thanks in advance for your help.
Editing to include sample output from spm_select by request:
>> output = spm_select('FPList', fullfile(data_path,subjects{i}, sessions{j}), '^d.*\.img$')
output =
/home/katie/Desktop/sample/test1/run_1L3/draghf000001.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000035.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000069.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000103.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000137.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000171.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000205.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000239.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000273.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000307.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000341.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000375.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000409.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000443.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000477.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000511.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000545.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000579.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000613.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000647.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000681.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000715.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000749.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000783.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000817.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000851.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000885.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000919.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000953.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000987.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001021.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001055.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001089.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001123.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001157.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001191.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001225.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001259.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001293.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001327.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001361.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001395.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001429.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001463.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001497.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001531.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001565.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001599.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001633.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001667.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001701.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001735.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001769.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001803.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001837.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001871.img
>> class(output)
ans =
char
>> size(output)
ans =
56 57
>>
Edit: Ok, problem solved. Here is the code I eventually used:
data_path = '/foo/bar';
subjects = {'test1'};
sessions = {'session1' 'session2' };
output={};
for i=1:numel(subjects)
for j=1:numel(sessions)
files=spm_select('FPList', fullfile(data_path,subjects{i},sessions{j}), '^d.*\.img$')
f_c=cellstr(files);
output=vertcat(output,f_c);
end
end
I think the answer to "how do you get a char array to append to a column cell array vertically" is: convert it to a cell array with cellstr and use vertcat.
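For comparison, the same pattern (turn each session's padded block of names into a list of strings, then append it) can be sketched in Python. The helper names and sample paths below are invented for illustration. Here rstrip plays the role of cellstr in dropping trailing padding, and sessions of different lengths need no special handling.

```python
def rows_to_names(char_rows):
    """Strip the trailing padding from each fixed-width row."""
    return [row.rstrip() for row in char_rows]

def collect(per_session_rows):
    """Append every session's names to one flat list, whatever its length."""
    names = []
    for rows in per_session_rows:
        names.extend(rows_to_names(rows))
    return names

# Invented sample data: session 2 is shorter than session 1.
session1 = ["/data/session1/dvol1.img ", "/data/session1/dvol10.img"]
session2 = ["/data/session2/dvol1.img"]
all_names = collect([session1, session2])
print(len(all_names))
```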
You can try this code:
function output=file_list(path)
output={};
subjects=dir(path);
for a=3:length(subjects)
sessions=dir(fullfile(path,subjects(a).name));
for b=3:length(sessions)
files=dir(fullfile(path,subjects(a).name,'/',sessions(b).name,'/*.img'));
f_c=struct2cell(files);
f=f_c(1,:)';
output=vertcat(output,fullfile(path,subjects(a).name,'/',sessions(b).name,'/',f));
end
end
One drawback of this code is that the size of output grows inside the loop. Here is an example:
path='/home/naveen/Desktop/example/'; % path is the main directory in which the data of
% subjects is stored in sub directories.
output=file_list(path)
The output is:
output =
'/home/naveen/Desktop/example/subject_1/session_1/lipo2.png'
'/home/naveen/Desktop/example/subject_1/session_1/lipo_6.png'
'/home/naveen/Desktop/example/subject_1/session_1/lps_4.png'
'/home/naveen/Desktop/example/subject_1/session_2/ltx_2.png'
'/home/naveen/Desktop/example/subject_1/session_2/ltx_2_1.png'
'/home/naveen/Desktop/example/subject_1/session_2/ltx_2_3.png'
'/home/naveen/Desktop/example/subject_1/session_2/ltx_4.png'
'/home/naveen/Desktop/example/subject_1/session_2/ltx_6.png'
'/home/naveen/Desktop/example/subject_2/session_1/lipo2.png'
'/home/naveen/Desktop/example/subject_2/session_1/lipo_6.png'
'/home/naveen/Desktop/example/subject_2/session_1/lps_4.png'
'/home/naveen/Desktop/example/subject_2/session_2/ltx_2.png'
'/home/naveen/Desktop/example/subject_2/session_2/ltx_2_1.png'
'/home/naveen/Desktop/example/subject_2/session_2/ltx_2_3.png'
'/home/naveen/Desktop/example/subject_2/session_2/ltx_4.png'
'/home/naveen/Desktop/example/subject_2/session_2/ltx_6.png'
Hope this works for you. Please note that you have to change the file extension in the innermost for loop to suit your own files.
I am having a bit of trouble with a specific piece of file I/O in MATLAB. I am still fairly new to it, so some things are still a bit of a mystery to me. The input file is structured like so:
File Name: Processed_kplr003942670-2010174085026_llc.fits.txt
File contents- 6 Header Lines then:
1, 2, 3
1, 2, 3
basically a matrix of about [1443,3] with varying values
now here is the matrix that I'm comparing it to:
[(0123456, 1, 2, 3), (0123456, 2, 3, 4), (etc..)]
Now here is my problem: first, I need to know how to read the file in a way that lets me compare the ID number (0123456) in the filename with the ID value in the matrix, so that I can then compare the other columns of both. I do not know how to achieve this in MATLAB. Furthermore, I need to be able to loop over every row in the matrix that matches the specific file, for example:
If I have 15 files ranging from 'Processed_0123456_1' to 'Processed_0123456_15', then I want to be able to read in the values contained in 'Processed_0123456_1' and compare them to ANY row in the matrix that corresponds to that ID (0123456). I don't know if accumarray can be used for this, but as I said, I'm not sure.
So the code must:
-Read in file
-Compare file to any point in the matrix with corresponding ID
-Do operations
-Loop over until full list of files in the directory are read in and processed, and output a matrix with the results.
Thanks for any help.
EDIT: Exact File Sample--
Kepler I.D.-----Channel
[1161345]--------[84]
-TTYPE1--------TTYPE8------------TTYPE4
['TIME']---['PDCSAP_FLUX']---['SAP_FLUX']
['BJD - 2454833']--['e-/s']--------['e-/s']
CROWDSAP --- 0.9791
630.195880143,277165.0,268233.0
630.216312946,277214.0,268270.0
630.23674585,277239.0,268293.0
630.257178554,277296.0,268355.0
630.277611357,277294.0,268364.0
630.29804426,277365.0,268441.0
630.318476962,277337.0,268419.0
630.338909764,277403.0,268481.0
630.359342667,277389.0,268463.0
630.379775369,277441.0,268508.0
630.40020817,277545.0,268604.0
There are more entries than what was just posted but they go for about 1000 lines so it is impractical to post that all here.
To get the file ID, use regular expressions, e.g.:
filename = 'Processed_0123456_1';
file_id_str = regexprep(filename, 'Processed_(\d+)_\d+', '$1');
file_num_str = regexprep(filename, 'Processed_\d+_(\d+)', '$1')
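The same extraction can be sketched in Python for comparison. The function names are invented for this example, and the second pattern assumes the real files follow the 'Processed_kplr<digits>-...' layout seen in the question's sample name. The IDs are kept as strings so that leading zeros survive.

```python
import re

def split_name(filename):
    """Return (id, sequence number) from a name like 'Processed_0123456_1'."""
    m = re.fullmatch(r"Processed_(\d+)_(\d+)", filename)
    return (m.group(1), m.group(2)) if m else None

def kepler_id(filename):
    """Assumed layout 'Processed_kplr<digits>-...'; returns the digits."""
    m = re.match(r"Processed_kplr(\d+)-", filename)
    return m.group(1) if m else None

print(split_name("Processed_0123456_1"))
print(kepler_id("Processed_kplr003942670-2010174085026_llc.fits.txt"))
```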
To read in the file contents, assuming that it's all comma-separated values without a header, use textscan, e.g.,
fid = fopen(filename)
C = textscan(fid, '%f,%f,%f') % Use as many %f specifiers as you have entries per line in the file
textscan also works on strings. So, for example, if your file contents was:
filestr = sprintf('1, 2, 3\n1, 3, 3')
Then running textscan on filestr works like this:
C = textscan(filestr, '%f,%f,%f')
C =
[2x1 double] [2x1 double] [2x1 double]
You can convert that to a matrix using cell2mat:
cell2mat(C)
ans =
1 2 3
1 3 3
You could then repeat this procedure for all files with the same ID and concatenate them into a single matrix, e.g.,
C_full = [];
for (all files with the same ID)
C = do_all_the_above_stuff;
C_full = [C_full; C];
end
Then you can look for what you want in C_full.
Update based on updated OP Dec 12, 2013
Here's code to read the values from a single file. Wrap all of this in the loop that I mentioned above to loop over all your files and read them into a single matrix.
fid = fopen('/path/to/file');
% Skip over 12 header lines
for kk = 1:12
fgetl(fid);
end
% Read in values to a matrix
C = textscan(fid, '%f,%f,%f');
C = cell2mat(C);
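As a cross-check of the logic, here is a minimal Python sketch of the same "skip the header, then parse comma-separated numbers" step. The function name read_values is invented, and the header length is a parameter because it depends on the actual file: the sample as posted in the question shows 6 header lines, which is what the example below uses.

```python
def read_values(text, header_lines):
    """Skip header_lines lines, then parse comma-separated floats row by row."""
    rows = []
    for line in text.splitlines()[header_lines:]:
        line = line.strip()
        if line:  # ignore blank lines between data rows
            rows.append([float(v) for v in line.split(",")])
    return rows

# First lines of the sample file from the question.
sample = "\n".join([
    "Kepler I.D.-----Channel",
    "[1161345]--------[84]",
    "-TTYPE1--------TTYPE8------------TTYPE4",
    "['TIME']---['PDCSAP_FLUX']---['SAP_FLUX']",
    "['BJD - 2454833']--['e-/s']--------['e-/s']",
    "CROWDSAP --- 0.9791",
    "630.195880143,277165.0,268233.0",
    "630.216312946,277214.0,268270.0",
])
rows = read_values(sample, header_lines=6)
print(rows[0])
```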
I think your requirements are too complicated to write the whole script here. Nonetheless, I will try to give some pointers to help. Disclaimer: None of this is tested, just my best guess. Please expect syntax errors, etc. I hope you can figure them out :-)
1) You can use the textscan function with the delimiter option to get data from the lines of your file. Since your format varies as it does, we will probably want to use...
2) ... fgetl to read the first two lines into strings and process them separately using textscan. Such an operation might look like:
fid = fopen('file.txt','r'); %open for reading; 'w' would truncate the file
tline1 = fgetl(fid);
tline2 = fgetl(fid);
fclose(fid);
C1 = textscan(tline1,'%s %d %s','delimiter','_'); %C1{2} will be the integer we want
C2 = textscan(tline2,'%s %s','delimiter',':'); %C2{2} will be the values we want, but they're still a string so...
mat = str2num(C2{2}{1}); %C2{2} is a cell array, so index into it first
3) Then, for the rest of the lines, we can use something like dlmread:
mat2 = dlmread('file.txt',',',2,0);
The 2,0 specifies the offset in 0-based rows,columns from the start of the file. You may need to look at something like vertcat to stitch mat and mat2 together.
4) The list of files in the directory can be found with the dir command. The filename is an attribute of the structure that's returned:
dirlist = dir;
for i = 1:length(dirlist)
filename = dirlist(i).name
%process your files
end
You can also pass matching strings to dir, like so:
dirlist = dir('*.txt');
which will find all of the files with extension .txt.
5) You can very easily loop through the comparison matrix:
sze = size(comparisonmatrix);
for i = 1:sze(1)
%compare comparisonmatrix(i,1) to C1{2}
%Perform whatever operations you need
end
Hope that helps!
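The comparison in step 5 amounts to selecting the rows whose first column equals the ID taken from the file name. A minimal Python sketch of that selection follows; the function name and the sample matrix are invented for illustration, and the IDs are assumed to be stored as plain integers.

```python
def rows_for_id(comparison, file_id):
    """Return every row whose first column equals file_id."""
    return [row for row in comparison if row[0] == file_id]

# Invented comparison matrix: (ID, value, value, value) per row.
comparison = [
    (123456, 1, 2, 3),
    (123456, 2, 3, 4),
    (654321, 5, 6, 7),
]
matches = rows_for_id(comparison, 123456)
print(matches)
```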
I'm still fairly new to Matlab but for some reason the documentation hasn't been all that helpful with this.
I've got a .dat file that I want to turn into a _ row by 6 column array (the number of rows changes depending on the program that's generating the .dat file). What I need to do is get the dimensions of the image this array will be used to make from the 1st row 2nd column (x dimension) and 1st row 4th column (y dimension). When using the Import Data tool in Matlab, this works properly:
However I need the program to do it automatically. If the first line wasn't there, I'm pretty sure I could just use fscanf to put the data in the array, but the image dimensions are necessary.
Any idea what I need to use instead?
You may use textscan. The first call to this function handles the first line (i.e. it gets the dimensions from your file) and the second call handles the remainder of the file. The second call uses repmat to build the format spec: %f, meaning double, repeated nb_col times. The option CollectOutput concatenates all the columns into a single array. Note that textscan can read the entire file without being told the number of rows.
The code would be
fileID = fopen('youfile.dat'); %declare a file id
C1 = textscan(fileID,'%s%f%s%f',1); %read the first line only (apply the format once)
nb_col = C1{4}; %get the number of columns (could be set by user too)
%read the remaining of the file
C2 = textscan(fileID, repmat('%f',1,nb_col), 'CollectOutput',1);
fclose(fileID); %close the connection
In the case where the number of columns is fixed, you can simply do
fileID = fopen('youfile.dat');
C1 = textscan(fileID,'%s%f%s%f',1); %read the first line only (apply the format once)
im_x = C1{2}; %get the x dimension
im_y = C1{4}; %get the y dimension
C2 = textscan(fileID,'%f%f%f%f%f%f%*[^\n]', 'CollectOutput',1);
fclose(fileID);
The format specification %*[^\n] skips the rest of the line.
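The two-step read (header line first, then the numeric rows) can also be sketched in Python. The file layout here is an assumption, since the question does not show it: a first line of the form "label number label number" followed by whitespace-separated rows. The function name read_dat and the sample text are invented. Only the first six values of each row are kept, which mirrors skipping the rest of the line.

```python
def read_dat(text, n_cols=6):
    """Return (im_x, im_y, rows) from a header line plus numeric rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    # 2nd and 4th fields of the first line hold the x and y dimensions.
    im_x, im_y = float(header[1]), float(header[3])
    rows = [[float(v) for v in ln.split()[:n_cols]] for ln in lines[1:]]
    return im_x, im_y, rows

# Invented sample; the 7th value on the first data row is ignored.
sample = "width 4 height 3\n1 2 3 4 5 6 99\n7 8 9 10 11 12\n"
im_x, im_y, rows = read_dat(sample)
print(im_x, im_y, len(rows))
```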
I have a file that contains a full set of values for some sentences which have been transcribed for a speech recognition program. I've been trying to write some MATLAB code to go through this file, extract the values for each sentence, and write them to a new individual file. So instead of having them all in one 'mlf' file, I want them in separate files, one per sentence.
For example, my 'mlf' file (which contains all values for all sentences) looks like this:
#!MLF!#
"/N001.lab"
AH
SEE
I
GOT
THEM
MONTHS
AGO
.
"/N002.lab"
WELL
WORK
FOR
LIVE
WIRE
BUT
ERM
.
"/N003.lab"
IM
GOING
TO
SEE
JAMES
VINCENT
MCMORROW
.
etc
So each sentence is delimited by the 'Nxxx.lab' line and the '.' line. I need to create a new file for every Nxxx.lab; for example, the file for N001 would just contain:
AH
SEE
I
GOT
THEM
MONTHS
AGO
I've been trying to use fgetl to detect the 'Nxxx.lab' and '.' boundaries, but it doesn't work, as I don't know how to write the content into a new file separate from the 'mlf'.
Any guidance on what sort of approach to use would be greatly appreciated!
Cheers!
Try this code (input file test.mlf has to be in the working directory):
%# read the file
filename = 'test.mlf';
fid = fopen(filename,'r');
lines = textscan(fid,'%s','Delimiter','\n','HeaderLines',1);
lines = lines{1};
fclose(fid);
%# find start and stop indices
istart = find(cellfun(@(x) strcmp(x(1),'"'), lines));
istop = find(strcmp(lines, '.'));
assert(numel(istart)==numel(istop) && all(istop>istart),'Check the input file format.')
%# write lines to new files
for k = 1:numel(istart)
filenew = lines{istart(k)}(2:end-1);
fout = fopen(filenew,'wt');
for l = (istart(k)+1):(istop(k)-1)
fprintf(fout,'%s\n',lines{l});
end
fclose(fout);
end
The code assumes that the file names are in double quotes, as in your example. If not, you can find the istart indices based on a pattern. Or just assume that the entries for the first file start on the 2nd line of the file and each later one follows a dot: istart = [1; istop(1:end-1)+1];
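The same splitting logic can be sketched in Python for comparison. The function names are invented, and stripping the leading slash from the label name is a choice made here so that the files would land in the working directory; it is not something the MATLAB code above does. split_mlf only builds the mapping, and write_files is defined but not called.

```python
def split_mlf(text):
    """Map each quoted label name to the list of words that follow it."""
    sentences = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
            current = line[1:-1].lstrip("/")  # '"/N001.lab"' -> 'N001.lab'
            sentences[current] = []
        elif line == ".":
            current = None  # a lone dot ends the sentence
        elif current is not None and line:
            sentences[current].append(line)
    return sentences

def write_files(sentences):
    """Write one file per label, one word per line."""
    for name, words in sentences.items():
        with open(name, "w") as fout:
            fout.write("\n".join(words) + "\n")

sample = '#!MLF!#\n"/N001.lab"\nAH\nSEE\nI\nGOT\nTHEM\nMONTHS\nAGO\n.\n"/N002.lab"\nWELL\nWORK\n.\n'
result = split_mlf(sample)
print(result["N001.lab"])
```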
You could use a growing cell array to gather the information.
Read one line at a time from the file.
Grab the file name and put it into the first column if it's the first read for the sentence.
If the line read is a period, add it to the string and move the index to the next row in the array. Write the new file with the content.
This bit of code should help you in building the cell array while appending a string within it. I assume reading line by line is not a problem. You can also retain the carriage returns/new lines within the string ('\n').
%% Declare A
A = {}
%% Fill row 1
A(1,1) = {'file1'}
A(1,2) = {'Sentence 1'}
A(1,2) = { strcat(A{1,2}, ', has been appended')}
%% Fill row 2
A(2,1) = {'file2'}
A(2,2) = {'Sentence 2'}
While I'm sure you can do this with MATLAB, I would suggest you use Perl to split the original file and then process the individual files using MATLAB.
The following Perl script reads the entire file ("xxx.txt") and writes out the individual files according to the "NAME.lab" lines:
open(my $fh, "<", "xxx.txt");
# read the entire file into $contents
# This may not be a good idea if the file is huge.
my $contents = do { local $/; <$fh> };
# iterate over the $contents string and extract the individual
# files
while($contents =~ /"(.*)"\n((.*\n)*?)\./mg) {
# We arrive here with $1 holding the filename
# and $2 the content up to the "." ending the section/sentence.
open(my $fout, ">", $1);
print $fout $2;
close($fout);
}
close($fh);
The multiline regular expression is a bit hard to read, but it does the job.
For this sort of text manipulation, Perl is much faster and more convenient. It is a good tool to learn if you process a lot of text.