How can I import data with uneven row lengths - arrays

I have a .txt file I wish to import into matlab (thinking using importdata), however I have some issues telling matlab the format, as well as how much of the data to take in.
The file is generated from the program "TurbSim".
The format is:
12 rows of headers
1 line with 2 numerical values, delimited by a single space.
35 lines, each with 35 numerical values, delimited by a single space.
1 empty line
After the headers, this format repeats. The file is very large (~860MB), and I have not been able to find a way to load it correctly in a script while controlling how large a portion I read in.
Example txt of my issue. fixed
https://drive.google.com/open?id=1FwmrCiz6TaWXYwXYX_v0BwQD-jjbdsE4

How about this?
clear;
NUM_HEADERLINES = 12;
DELIM_VALUES = ' ';
fid = fopen('TurbSim.txt');
% skip header
for n = 1:NUM_HEADERLINES, fgets(fid);end
while ~feof(fid)
% read line
[line,nl] = fgets(fid);
% remove newline char
line = line(1:end-length(nl));
% explode using delimiter
values = strsplit(line,DELIM_VALUES);
% in case of leading blanks: skip first empty one
if isempty(values{1}), values = values(2:end);end
% skip blank lines
if isempty(values), continue;end
% convert to double
values = str2double(values);
% now process/save/whatever...
...
fprintf('Read %d values\n',length(values)); % in your example: 2 or 40
% disp(values);
end
fclose(fid);
Btw: your example has 40 lines with 40 values each, not 35 with 35 each.
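If the block layout is regular, a block-wise read with fscanf gives direct control over how much of the file is loaded. The following is only a sketch under assumptions: the function name readTurbSimBlocks and its parameters (nHeader, gridSize, maxBlocks) are made up for illustration, and it assumes every block is one line with 2 values followed by gridSize lines of gridSize values.

```matlab
function blocks = readTurbSimBlocks(filename, nHeader, gridSize, maxBlocks)
% Hypothetical helper: read at most maxBlocks blocks after the header.
fid = fopen(filename, 'r');
if fid == -1
    error('Could not open %s', filename);
end
cleanup = onCleanup(@() fclose(fid));   % close the file even on error

% skip the header lines
for k = 1:nHeader
    fgetl(fid);
end

blocks = struct('hdr', {}, 'vals', {});
for b = 1:maxBlocks
    % the line with 2 values (fscanf skips whitespace and blank lines)
    hdr = fscanf(fid, '%f', 2);
    if numel(hdr) < 2, break; end
    % fscanf fills column by column, so transpose to get file rows as rows
    vals = fscanf(fid, '%f', [gridSize, gridSize]).';
    if numel(vals) < gridSize^2, break; end
    blocks(end+1) = struct('hdr', hdr.', 'vals', vals); %#ok<AGROW>
end
end
```

For the linked example this might be called as blocks = readTurbSimBlocks('TurbSim.txt', 12, 40, 100); to load only the first 100 blocks.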

Related

How to convert 'inputdlg' output into text file?

I am writing a script that displays a character array (I would use a string array but inputdlg requires char), allows the user to edit the array, and outputs the new array into a text file.
However, I am running into the issue of not being able to format the output (vals1) into a text file. I think part of the problem is that the inputdlg command outputs a 1x1 array, which is difficult to convert back into the line-by-line format that I started with (in this case arr).
The code below outputs a single line read by column rather than row: "A1ReB2otC3bcD4e E5re 6tt 7 c 8S 9m i t h ". I'm not sure how I would convert this since the charvals1 (the inputdlg output) returns the same string of characters.
Is there a way to either return a row-by-row output (rather than the 1x1 array string) after the user inputs a new array, or print a reformatted version of the inputdlg output (including line breaks)?
arr = char(["ABCDE";
"123456789";
"Robert Smith";
"etc etc"])
% User updates the array
prompt = {'Update content below if necessary'};
dlgtitle = "Section 2";
dims = [30 50];
definput = {arr};
charvals1 = inputdlg(prompt,dlgtitle,dims,definput);
vals1 = convertCharsToStrings(charvals1);
% Outputting the updated array to text file
prompt = {'Enter desired input file name'};
dlgtitle = "Input Name";
dims = [1 35];
definput = {'Input Name'};
fileName = inputdlg(prompt,dlgtitle,dims,definput);
selected_dir = uigetdir();
fileLocation = char(strcat(selected_dir, '\', string(fileName(1)),'.txt'));
txtfile = fopen(fileLocation,'wt');
fprintf(txtfile, '%s\n', vals1) ;
Don't use convertCharsToStrings, since it will operate along the first dimension of the character array (you could transpose the character array first, but then the 'linebreaks' are lost).
You can convert the character array that you obtain to a string and then trim the whitespace. This can be written to a textfile without any problem with the code that you have already.
charvals1 = inputdlg(prompt,dlgtitle,dims,definput);
vals1 = string(charvals1{1}); % note the {1} to access the contents of the cell array.
vals1 = strtrim(vals1);
And don't forget to close the txtfile:
txtfile = fopen(fileLocation,'wt');
fprintf(txtfile, '%s\n', vals1);
fclose(txtfile);
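The conversion can be checked without opening a dialog. This is a minimal sketch that feeds the original arr directly through string and strtrim; the output file name inputdlg_demo.txt and the use of tempdir are assumptions for illustration only.

```matlab
% Same char matrix as in the question (rows are padded with blanks)
arr = char(["ABCDE"; "123456789"; "Robert Smith"; "etc etc"]);

% string() turns each row of a char matrix into one string element;
% strtrim() removes the blank padding
vals = strtrim(string(arr));

% Write one element per line; fullfile avoids hard-coding '\'
outFile = fullfile(tempdir, 'inputdlg_demo.txt');   % hypothetical file name
fid = fopen(outFile, 'wt');
fprintf(fid, '%s\n', vals);
fclose(fid);
```

Using fullfile(selected_dir, fileName{1} + ".txt") style path building instead of strcat with '\' would also keep the script portable across operating systems.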

Best way to compare data from file to data in array in Matlab

I am having a bit of trouble with a specific file I/O task in MATLAB. I am fairly new to it, so some things are still a bit of a mystery to me. The input file is structured like so:
File Name: Processed_kplr003942670-2010174085026_llc.fits.txt
File contents- 6 Header Lines then:
1, 2, 3
1, 2, 3
basically a matrix of about [1443,3] with varying values
now here is the matrix that I'm comparing it to:
[(0123456, 1, 2, 3), (0123456, 2, 3, 4), (etc..)]
Now here is my problem: first, I need to know how to properly do the file input in a way that lets me compare the ID number (0123456) in the filename with the ID value in the matrix, so that I can compare the other columns of both. I do not know how to achieve this in MATLAB. Furthermore, I need to be able to loop over every point in the matrix that matches up to the specific file, for example:
If I have 15 files ranging from 'Processed_0123456_1' to 'Processed_0123456_15', then I want to be able to read in the values contained in 'Processed_0123456_1' and compare them to ANY row in the matrix that corresponds to that ID (0123456). I don't know if maybe accumarray can be used for this, but as I said, I'm not sure.
So the code must:
-Read in file
-Compare file to any point in the matrix with corresponding ID
-Do operations
-Loop over until full list of files in the directory are read in and processed, and output a matrix with the results.
Thanks for any help.
EDIT: Exact File Sample--
Kepler I.D.-----Channel
[1161345]--------[84]
-TTYPE1--------TTYPE8------------TTYPE4
['TIME']---['PDCSAP_FLUX']---['SAP_FLUX']
['BJD - 2454833']--['e-/s']--------['e-/s']
CROWDSAP --- 0.9791
630.195880143,277165.0,268233.0
630.216312946,277214.0,268270.0
630.23674585,277239.0,268293.0
630.257178554,277296.0,268355.0
630.277611357,277294.0,268364.0
630.29804426,277365.0,268441.0
630.318476962,277337.0,268419.0
630.338909764,277403.0,268481.0
630.359342667,277389.0,268463.0
630.379775369,277441.0,268508.0
630.40020817,277545.0,268604.0
There are more entries than what was just posted but they go for about 1000 lines so it is impractical to post that all here.
To get the file ID, use regular expressions, e.g.:
filename = 'Processed_0123456_1';
file_id_str = regexprep(filename, 'Processed_(\d+)_\d+', '$1');
file_num_str = regexprep(filename, 'Processed_\d+_(\d+)', '$1')
To read in the file contents, assuming that it's all comma-separated values without a header, use textscan, e.g.,
fid = fopen(filename)
C = textscan(fid, '%f,%f,%f') % Use as many %f specifiers as you have entries per line in the file
textscan also works on strings. So, for example, if your file contents was:
filestr = sprintf('1, 2, 3\n1, 3, 3')
Then running textscan on filestr works like this:
C = textscan(filestr, '%f,%f,%f')
C =
[2x1 double] [2x1 double] [2x1 double]
You can convert that to a matrix using cell2mat:
cell2mat(C)
ans =
1 2 3
1 3 3
You could then repeat this procedure for all files with the same ID and concatenate them into a single matrix, e.g.,
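C_full = [];
files = dir('Processed_0123456_*'); % all files with the same ID (example pattern)
for kk = 1:numel(files)
fid = fopen(files(kk).name);
C = cell2mat(textscan(fid, '%f,%f,%f')); % read one file as described above
fclose(fid);
C_full = [C_full; C]; % append the rows of this file
end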
Then you can look for what you want in C_full.
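For the file name given in the question, the ID can be pulled out as a number and used to select the matching rows of the comparison matrix with logical indexing. This is only a sketch: the contents of comparison are invented for illustration, and it assumes the ID is the run of digits between 'kplr' and the first '-'.

```matlab
filename = 'Processed_kplr003942670-2010174085026_llc.fits.txt';

% capture the digits between 'kplr' and '-'
tok = regexp(filename, 'kplr(\d+)-', 'tokens', 'once');
file_id = str2double(tok{1});   % leading zeros are dropped by the conversion

% hypothetical comparison matrix: first column is the ID
comparison = [3942670 1 2 3;
              1161345 2 3 4;
              3942670 5 6 7];

% keep only the rows whose ID matches the file
rows = comparison(comparison(:,1) == file_id, :);
```

Comparing numeric IDs rather than strings sidesteps the question of how many leading zeros the file name carries.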
Update based on updated OP Dec 12, 2013
Here's code to read the values from a single file. Wrap all of this in the loop that I mentioned above to loop over all your files and read them all into a single matrix.
fid = fopen('/path/to/file');
% Skip over 12 header lines (adjust the count to match your file)
for kk = 1:12
fgetl(fid);
end
% Read in values to a matrix
C = textscan(fid, '%f,%f,%f');
C = cell2mat(C);
fclose(fid);
I think your requirements are too complicated to write the whole script here. Nonetheless, I will try to give some pointers to help. Disclaimer: None of this is tested, just my best guess. Please expect syntax errors, etc. I hope you can figure them out :-)
1) You can use the textscan function with the delimiter option to get data from the lines of your file. Since your format varies as it does, we will probably want to use...
2) ... fgetl to read the first two lines into strings and process them separately using textscan. Such an operation might look like:
fid = fopen('file.txt','r'); % open for reading ('w' would truncate the file)
tline1 = fgetl(fid);
tline2 = fgetl(fid);
fclose(fid);
C1 = textscan(tline1,'%s %d %s','delimiter','_'); %C1{2} will be the integer we want
C2 = textscan(tline2,'%s %s','delimiter',':'); %C2{2} will be the values we want, but they're still a string so...
mat = str2num(C2{2}{1}); %C2{2} is a cell array, so take its first element
3) Then, for the rest of the lines, we can use something like dlmread:
mat2 = dlmread('file.txt',',',2,0);
The 2,0 specifies the offset in 0-based rows,columns from the start of the file. You may need to look at something like vertcat to stitch mat and mat2 together.
4) The list of files in the directory can be found with the dir command. The filename is an attribute of the structure that's returned:
dirlist = dir;
for i = 1:length(dirlist)
filename = dirlist(i).name
%process your files
end
You can also pass matching strings to dir, like so:
dirlist = dir('*.txt');
which will find all of the files with extension .txt.
5) You can very easily loop through the comparison matrix:
sze = size(comparisonmatrix);
for i = 1:sze(1)
%compare comparisonmatrix(i,1) to C1{2}
%Perform whatever operations you need
end
Hope that helps!

Matlab - Importing a .dat file into an array

I'm still fairly new to Matlab but for some reason the documentation hasn't been all that helpful with this.
I've got a .dat file that I want to turn into a _ row by 6 column array (the number of rows changes depending on the program that's generating the .dat file). What I need to do is get the dimensions of the image this array will be used to make from the 1st row 2nd column (x dimension) and 1st row 4th column (y dimension). When using the Import Data tool in Matlab, this works properly:
However I need the program to do it automatically. If the first line wasn't there, I'm pretty sure I could just use fscanf to put the data in the array, but the image dimensions are necessary.
Any idea what I need to use instead?
You may use textscan. The first call to this function will handle the first line (i.e. get the dimensions of your image) and the second call the remainder of your file. The second call uses repmat to build the format spec: %f, meaning double, repeated nb_col times. The option CollectOutput will concatenate all the columns into a single array. Note that textscan can read the entire file without specifying the number of rows.
The code would be
fileID = fopen('youfile.dat'); %declare a file id
C1 = textscan(fileID,'%s%f%s%f',1); %read the first line only (apply the format once)
nb_col = C1{4}; %get the number of columns (could be set by user too)
%read the remainder of the file
C2 = textscan(fileID, repmat('%f',1,nb_col), 'CollectOutput',1);
fclose(fileID); %close the connection
In the case where the number of columns is fixed, you can simply do
fileID = fopen('youfile.dat');
C1 = textscan(fileID,'%s%f%s%f',1); %read the first line only (apply the format once)
im_x = C1{2}; %get the x dimension
im_y = C1{4}; %get the y dimension
C2 = textscan(fileID,'%f%f%f%f%f%f%*[^\n]', 'CollectOutput',1);
fclose(fileID);
The format specification %*[^\n] skips the remainder of a line.
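The two-call pattern can be tried on a small throwaway file. This is a sketch only: the first-line layout 'x 3 y 2' is a guess at the header format (the question's screenshot is not available), and the temporary file is created purely for the demonstration.

```matlab
% Build a small test file: one header line, then rows of 6 numbers
tmp = [tempname '.dat'];
fid = fopen(tmp, 'w');
fprintf(fid, 'x 3 y 2\n');              % hypothetical header layout
fprintf(fid, '1 2 3 4 5 6\n7 8 9 10 11 12\n');
fclose(fid);

fid = fopen(tmp, 'r');
% first call: apply the header format exactly once
hdr = textscan(fid, '%s%f%s%f', 1);
% second call: read all remaining rows into one numeric matrix
body = textscan(fid, repmat('%f', 1, 6), 'CollectOutput', 1);
fclose(fid);

im_x = hdr{2};      % x dimension from row 1, column 2
im_y = hdr{4};      % y dimension from row 1, column 4
data = body{1};     % numeric rows below the header
delete(tmp);
```

Passing 1 as the third argument limits the first call to a single pass of the format, so the numeric rows are left for the second call.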

Parse files to obtain array of arrays in matlab

I need to parse a .txt file in Matlab so that all lines of the file are a different element in an array. Each element of the array would also be an array of integers. So I need to make an array of arrays from a .txt file.
The problem I'm having is that I can't figure out which function to use to parse the file. If I use importdata(filename), it only parses the first line of the file. If I use textscan, it parses the file in columns, and the file is formatted like:
1 1 1 1 1
13 13 13 13 13
2 2 2 2 2
14 14 14 14 14
I need each of the rows to be an array that I can then use to compare my data against.
Is there an option for either one of those functions that would work for my purposes? I've tried looking on the MATLAB documentation, but can't make sense of it.
If each array needs to be a different size, you need to use a cell array to contain them. Something like this:
fid = fopen('test.txt'); %opens the file
tline = fgets(fid); % reads in the first line
data = {}; % creates an empty cell array
index = 1; % initializes index
while ischar(tline) % loops while the line that has just been read contains characters, that is, the end of the file has not been reached.
data{index} = str2num(tline); % converts the line that has just been read into a numeric array and stores it in the current cell of data
tline = fgets(fid); % reads in the next line
index = index + 1; % increments index
end
fclose(fid);
If you know your data will have the specific format of 5 numbers followed by a single number, you can use dlmread and then format the resulting matrix.
data = dlmread('data.txt',' ');
multipleValueRows = data(1:2:end,:);
singleValueRows = data(2:2:end,1);
The data matrix has the size (number of rows of your file) x 5 columns. In the rows where you only have a single number, the data matrix will contain zeros in columns 2-5.

Splitting arrays in matlab

Greetings All
I'm trying to
1)split an array into multiple parts
2)export each part to separate wave files
3)re-import wav files and join them together to make sure
the array data that was split up wasn't altered.
I can do all of these steps. The problem is that when I test for
error, I expect something like 2.232e-15, which
is almost no error, but I get unexpectedly large numbers
for the error.
MAE = 0.046232
MXE = 0.14522
RMSE = 0.064035
How can I fix this so the error rate goes down?
I thought the array was being split in sections and the cell data was being copied
exactly but it's looking like that may not be the case,
how can I fix this?
Code below:
%split_file
%create sine wave signal
clear all, clc
tic
fs = 44100; % Sampling frequency
t=linspace(0,1,fs);
freq=340;
ya = sin(2*pi*freq*t); %+ 1*sin(2*pi*250*t);
[size_r,size_c]=size(ya');
jj=[];
kk=0;
wavefilesplit=[];
%need to delete diretory and recreate it to clean out files
fileprepathStr='/home/rat/Documents/octave/pre/'; %
rmdir(fileprepathStr,'s');
fprintf('\n-1- deleting %s directory %2.4f sec',fileprepathStr,toc);
mkdir(fileprepathStr);
fprintf('\n-1- creating %s directory %2.4f sec',fileprepathStr,toc);
jj=1;
for ii=1:fs/4:size_r, %build array of desired ranges or fs/2
jj(end+1,:)=ii-1; %minus 1 to get correct array index in cell
end;
[size_rjj,size_cjj]=size(jj); %used to get size of jj array
jj(end+1,:)=size_r-(size_rjj-2); %adds the end of the sound file to the end of the jj array minus the amount of files joined
jj(2,:)=[]; %deletes second cell with zero and shifts the cells up
for ii=1:1:size_rjj-1,kk=kk+1;
wavefilesplit=ya(jj(kk):jj(kk+1));
wavefn=strcat('wavefn_',num2str(kk,'%04d')); %build filename dynamiclly with 4 leading zeros
wavwrite([wavefilesplit],fs,16,strcat('/home/rat/Documents/octave/pre/',wavefn,'.wav'));
fprintf('\n-1- wavwrite split %s.wav %3.0f of %3.0f %6.3fsec %6.3fmins\n',wavefn,kk,size_rjj-1,toc,toc/60);
end;
fprintf('\n-2- Elapsed time in seconds after wavwrite split %6.3fsec %6.3fmins\n',toc,toc/60);
%rejoin to check if arrays are the same
y2=[]; %
yb2=[];
filepathprocStr='/home/rat/Documents/octave/pre/';
files2=strcat(filepathprocStr,'*.wav');
files2=dir(files2);
[rwsz_files2,clsz_files2]=size(files2); %used to get ro and col size
for i=1:numel(files2)
[yb2, fs2, nbits] = wavread(strcat(filepathprocStr,files2(i).name));
yb2=yb2';
y2=[y2;yb2]; %Append files2
fprintf('\n %4.0f of %4.0f joined %s',i,rwsz_files2,files2(i).name)
end;
wavwrite([y2],fs2,16,'/home/rat/Documents/octave/pre/All_joined.2wav')
fprintf(' \n Done!!!\n');
ya=ya';
dy = abs(ya-y2); % absolute error
MAE = mean(dy) % 7.2292e-015 mean-absolute-error
MXE = max(dy) % 3.4195e-014 maximum-absolute-error
RMSE = sqrt(mean(dy.^2)) % 9.5049e-015 root-mean-sqare-error
I decided to use the reshape command and pad with zeros because it cuts the code down a lot and it should be quicker.
clear all, clc
ya=1:64;
fs=9;
padlen = mod(-length(ya), fs); %creates number of zeros needed to get correct array reshaped
ya_reshaped = reshape([ya zeros(1,padlen)], fs, []); % used to pad zeros on 1 col x rows
[size_r,size_c]=size(ya_reshaped)
wavefilesplit=[];
wavefilesplit2=[];
for ii=1:size_c
wavefilesplit=ya_reshaped(:,ii) %this line can be used to export data/audio to file
%could use an if/else statement to strip zeros off the end if inserted here
wavefilesplit2=[wavefilesplit2; ya_reshaped(:,ii)] %will append to end all in one col to error check
end;
wavefilesplit2(end-(padlen-1):end)=[] %will erase zeros at the end of array; wavefilesplit2(end-(padlen-1):end,1)=2 will add 2's instead

Resources