files comparison - file

I'm trying to compare 2 txt files to check files are equals otherwise, get the output and give difference (say that there are a diff line x)
I'm trying as follows:
fid1 = fopen(file_1, 'r');
fid2 = fopen(file_2, 'r');
lines1 = textscan(fid1,'%s','delimiter','\n');
lines2 = textscan(fid2,'%s','delimiter','\n');
lines1 = lines1{1};
lines2 = lines2{1};
fclose(fid1);
fclose(fid2);
tf = isequal(lines1,lines2); % this gives 0 or 1
I would like when the value is 0 (files are different) to localize the diff and give line where files are different or print content of difference.

You essentially want to compare each element of the two cell arrays you have, not the whole cell arrays. You could do that with a loop in most languages, but of course MATLAB has many ways to avoid loops. Here, it is cellfun:
cellfun(#isequal,lines1,lines2)
(I left out the part where, if the two cell arrays are of unequal size, you have to shorten the longer one.) Then, find is useful for finding the first (or all) occurrence(s) of a certain value in any vector.

Related

How do I split results into separate variables in matlab?

I'm pretty new to matlab, so I'm guessing there is some shortcut way to do this but I cant seem to find it
results = eqs\soltns;
A = results(1);
B = results(2);
C = results(3);
D = results(4);
E = results(5);
F = results(6);
soltns is a 6x1 vector and eqs is a 6x6 matrix, and I want the results of the operation in their own separate variables. It didn't let me save it like
[A, B, C, D, E, F] = eqs\soltns;
Which I feel like would make sense, but it doesn't work.
Up to now, I have never come across a MATLAB function doing this directly (but maybe I'm missing something?). So, my solution would be to write a function distribute on my own.
E.g. as follows:
result = [ 1 2 3 4 5 6 ];
[A,B,C,D,E,F] = distribute( result );
function varargout = distribute( vals )
assert( nargout <= numel( vals ), 'To many output arguments' )
varargout = arrayfun( #(X) {X}, vals(:) );
end
Explanation:
nargout is special variable in MATLAB function calls. Its value is equal to the number of output parameters that distribute is called with. So, the check nargout <= numel( vals ) evaluates if enough elements are given in vals to distribute them to the output variables and raises an assertion otherwise.
arrayfun( #(X) {X}, vals(:) ) converts vals to a cell array. The conversion is necessary as varargout is also a special variable in MATLAB's function calls, which must be a cell array.
The special thing about varargout is that MATLAB assigns the individual cells of varargout to the individual output parameters, i.e. in the above call to [A,B,C,D,E,F] as desired.
Note:
In general, I think such expanding of variables is seldom useful. MATLAB is optimized for processing of arrays, separating them to individual variables often only complicates things.
Note 2:
If result is a cell array, i.e. result = {1,2,3,4,5,6}, MATLAB actually allows to split its cells by [A,B,C,D,E,F] = result{:};
One way as long as you know the size of results in advance:
results = num2cell(eqs\soltns);
[A,B,C,D,E,F] = results{:};
This has to be done in two steps because MATLAB does not allow for indexing directly the results of a function call.
But note that this method is hard to generalize for arbitrary sizes. If the size of results is unknown in advance, it would probably be best to leave results as a vector in your downstream code.

Array of sets in Matlab

Is there a way to create an array of sets in Matlab.
Eg: I have:
a = ones(10,1);
b = zeros(10,1);
I need c such that c = [(1,0); (1,0); ...], i.e. each set in c has first element from a and 2nd element from b with the corresponding index.
Also is there some way I can check if an unknown set (x,y) is in c.
Can you all please help me out? I am a Matlab noob. Thanks!
There are not sets in your understanding in MATLAB (I assume that you are thinking of tuples on Python...) But there are cells in MATLAB. That is a data type that can store pretty much anything (you may think of pointers if you are familiar with the concept). It is indicated by using { }.
Knowing this, you could come up with a cell of arrays and check them using cellfun
% create a cell of numeric arrays
C = {[1,0],[0,2],[99,-1]}
% check which input is equal to the array [1,0]
lg = cellfun(#(x)isequal(x,[1,0]),C)
Note that you access the address of a cell with () and the content of a cell with {}. [] always indicate arrays of something. We come to this in a moment.
OK, this was the part that you asked for; now there is a bonus:
That you use the term set makes me feel that they always have the same size. So why not create an array of arrays (or better an array of vectors, which is a matrix) and check this matrix column-wise?
% array of vectors (there is a way with less brackets but this way is clearer):
M = [[1;0],[0;2],[99;-1]]
% check at which column (direction ",1") all rows are equal to the proposed vector:
lg = all(M == [0;2],1)
This way is a bit clearer, better in terms of memory and faster.
Note that both variables lg are arrays of logicals. You can use them directly to index the original variable, i.e. M(:,lg) and C{lg} returns the set that you are looking for.
If you would like to get logical value regarding if p is in C, maybe you can try the code below
any(sum((C-p).^2,2)==0)
or
any(all(C-p==0,2))
Example
C = [1,2;
3,-1;
1,1;
-2,5];
p1 = [1,2];
p2 = [1,-2];
>> any(sum((C-p1).^2,2)==0) # indicating p1 is in C
ans = 1
>> any(sum((C-p2).^2,2)==0) # indicating p2 is not in C
ans = 0
>> any(all(C-p1==0,2))
ans = 1
>> any(all(C-p2==0,2))
ans = 0

matlab: logically comparing two cell arrays

I have an excel file from which I obtained two string arrays, Titles of dimension 6264x1 and another Names of dimension 45696x1. I want to create an output matrix of size 6264x45696 containing in the elements a 1 or 0, a 1 if Titles contains Names.
I think I want something along the lines of:
for (j in Names)
for (k in Titles)
if (Names[j] is in Titles[k])
write to excel
end
end
end
But I don't know what functions I should use to achieve what I have in the picture. Here is what I have come up with:
[~,Title] = xlsread('exp1.xlsx',1,'A3:A6266','basic');
[~,Name] = xlsread('exp1.xlsx',2,'B3:B45698','basic');
A = cellstr(Title);
GN = cellstr(Name);
BinaryMatrix = false(45696,6264);
for i=1:1:45696
for j=1:1:6264
if (~isempty(ismember(A,GN)))
BinaryMatrix(i,j)= true;
end
end
end
the problem with this code is that it never finishes running, although there are no suggestions within matlab.
You can use third output of unique to get numbers corresponding to each string element and use bsxfun to compare numbers.
GN = cellstr(Name);
A = cellstr(Title);
B = [ GN(:); A(:)];
[~,~,u]= unique(B);
BinaryaMatrix = bsxfun(#eq, u(1:numel(GN)),u(numel(GN)+1:end).');
ismember can handle cell arrays of character vectors. Its second output tells you the information you need, from which you can build the result using sparse (it could also be done by preallocating and using [sub2ind):
[~, m] = ismember(Titles, Names);
BinaryMatrix = full(sparse(nonzeros(m), find(m), true, numel(Names), numel(Titles)));

Comparing two arrays of pixel values, and store any matches

I want to compare the pixel values of two images, which I have stored in arrays.
Suppose the arrays are A and B. I want to compare the elements one by one, and if A[l] == B[k], then I want to store the match as a key value-pair in a third array, C, like so: C[l] = k.
Since the arrays are naturally quite large, the solution needs to finish within a reasonable amount of time (minutes) on a Core 2 Duo system.
This seems to work in under a second for 1024*720 matrices:
A = randi(255,737280,1);
B = randi(255,737280,1);
C = zeros(size(A));
[b_vals, b_inds] = unique(B,'first');
for l = 1:numel(b_vals)
C(A == b_vals(l)) = b_inds(l);
end
First we find the unique values of B and the indices of the first occurrences of these values.
[b_vals, b_inds] = unique(B,'first');
We know that there can be no more than 256 unique values in a uint8 array, so we've reduced our loop from 1024*720 iterations to just 256 iterations.
We also know that for each occurrence of a particular value, say 209, in A, those locations in C will all have the same value: the location of the first occurrence of 209 in B, so we can set all of them at once. First we get locations of all of the occurrences of b_vals(l) in A:
A == b_vals(l)
then use that mask as a logical index into C.
C(A == b_vals(l))
All of these values will be equal to the corresponding index in B:
C(A == b_vals(l)) = b_inds(l);
Here is the updated code to consider all of the indices of a value in B (or at least as many as are necessary). If there are more occurrences of a value in A than in B, the indices wrap.
A = randi(255,737280,1);
B = randi(255,737280,1);
C = zeros(size(A));
b_vals = unique(B);
for l = 1:numel(b_vals)
b_inds = find(B==b_vals(l)); %// find the indices of each unique value in B
a_inds = find(A==b_vals(l)); %// find the indices of each unique value in A
%// in case the length of a_inds is greater than the length of b_inds
%// duplicate b_inds until it is larger (or equal)
b_inds = repmat(b_inds,[ceil(numel(a_inds)/numel(b_inds)),1]);
%// truncate b_inds to be the same length as a_inds (if necessary) and
%// put b_inds into the proper places in C
C(a_inds) = b_inds(1:numel(a_inds));
end
I haven't fully tested this code, but from my small samples it seems to work properly and on the full-size case, it only takes about twice as long as the previous code, or less than 2 seconds on my machine.
So, if I understand your question correctly, you want for each value of l=1:length(A) the (first) index k into B so that A(l) == B(k). Then:
C = arrayfun(#(val) find(B==val, 1, 'first'), A)
could give you your solution, as long as you're sure that every element will have a match. The above solution would fail otherwise, complaning that the function returned a non-scalar (because find would return [] if no match is found). You have two options:
Using a cell array to store the result instead of a numeric array. You would need to call arrayfun with 'UniformOutput', false at the end. Then, the values of A without matches in B would be those for which isempty(C{i}) is true.
Providing a default value for an index into A with no matches in B (e.g. 0 or NaN). I'm not sure about this one, but I think that you would need to add 'ErrorHandler', #(~,~) NaN to the arrayfun call. The error handler is a function that gets called when the function passed to arrayfun fails, and may either rethrow the error or compute a substitute value. Thus the #(~,~) NaN. I am not sure that it would work, however, since in this case the error is in arrayfun and not in the passed function, but you can try it.
If you have the images in arrays A & B
idx = A == B;
C = zeros(size(A));
C(idx) = A(idx);

Best way to compare data from file to data in array in Matlab

I am having a bit of trouble with a specific file i/o in matlab, I am fairly new to it still so some things are still a bit of a mystery to me. The input file is structured as so:
File Name: Processed_kplr003942670-2010174085026_llc.fits.txt
File contents- 6 Header Lines then:
1, 2, 3
1, 2, 3
basically a matrix of about [1443,3] with varying values
now here is the matrix that I'm comparing it to:
[(0123456, 1, 2, 3), (0123456, 2, 3, 4), (etc..)]
Now here is my problem, first I need to know how to properly do the file input in a way which can let me compare the ID number (0123456) that is in the filename with the ID value that is in the matrix, so that I can compare the other columns of both. I do not know how to achieve this in matlab. Furthermore, I need to be able to loop over every point in the the matrix that matches up to the specific file, for example:
If I have 15 files ranging from 'Processed_0123456_1' to 'Processed_0123456_15' then I want to be able to read in the values contained in 'Processed_0123456_1'and compare them to ANY row in the matrix that corresponds to that ID (0123456). I don't know if maybe accumaray can be used for this, but as I said I'm not sure.
So the code must:
-Read in file
-Compare file to any point in the matrix with corresponding ID
-Do operations
-Loop over until full list of files in the directory are read in and processed, and output a matrix with the results.
Thanks for any help.
EDIT: Exact File Sample--
Kepler I.D.-----Channel
[1161345]--------[84]
-TTYPE1--------TTYPE8------------TTYPE4
['TIME']---['PDCSAP_FLUX']---['SAP_FLUX']
['BJD - 2454833']--['e-/s']--------['e-/s']
CROWDSAP --- 0.9791
630.195880143,277165.0,268233.0
630.216312946,277214.0,268270.0
630.23674585,277239.0,268293.0
630.257178554,277296.0,268355.0
630.277611357,277294.0,268364.0
630.29804426,277365.0,268441.0
630.318476962,277337.0,268419.0
630.338909764,277403.0,268481.0
630.359342667,277389.0,268463.0
630.379775369,277441.0,268508.0
630.40020817,277545.0,268604.0
There are more entries than what was just posted but they go for about 1000 lines so it is impractical to post that all here.
To get the file ID, use regular expressions, e.g.:
filename = 'Processed_0123456_1';
file_id_str = regexprep(filename, 'Processed_(\d+)_\d+', '$1');
file_num_str = regexprep(filename, 'Processed_\d+_(\d+)', '$1')
To read in the file contents, assuming that it's all comma-separated values without a header, use textscan, e.g.,
fid = fopen(filename)
C = textscan(fid, '%f,%f,%f') % Use as many %f specifiers as you have entries per line in the file
textscan also works on strings. So, for example, if your file contents was:
filestr = sprintf('1, 2, 3\n1, 3, 3')
Then running textscan on filestr works like this:
C = textscan(filestr, '%f,%f,%f')
C =
[2x1 int32] [2x1 int32] [2x1 int32]
You can convert that to a matrix using cell2mat:
cell2mat(C)
ans =
1 2 3
1 3 3
You could then repeat this procedure for all files with the same ID and concatenate them into a single matrix, e.g.,
C_full = [];
for (all files with the same ID)
C = do_all_the_above_stuff;
C_full = [C_full; C];
end
Then you can look for what you want in C_full.
Update based on updated OP Dec 12, 2013
Here's code to read the values from a single file. Wrap this all in the the loop that I mentioned above to loop over all your files and read them all into a single matrix.
fid = fopen('/path/to/file');
% Skip over 12 header lines
for kk = 1:12
fgetl(fid);
end
% Read in values to a matrix
C = textscan(fid, '%f,%f,%f');
C = cell2mat(C);
I think your requirements are too complicated to write the whole script here. Nonetheless, I will try to give some pointers to help. Disclaimer: None of this is tested, just my best guess. Please expect syntax errors, etc. I hope you can figure them out :-)
1) You can use the textscan function with the delimiter option to get data from the lines of your file. Since your format varies as it does, we will probably want to use...
2) ... fgetl to read the first two lines into strings and process them separately using texstscan. Such an operation might look like:
fid = fopen('file.txt','w');
tline1 = fgetl(fid);
tline2 = fgetl(fid);
fclose(fid);
C1 = textscan(tline1,'%s %d %s','delimiter','_'); %C1{2} will be the integer we want
C2 = textscan(tline2,'%s %s'),'delimiter,':'); %C2{2} will be the values we want, but they're still a string so...
mat = str2num(C2{2});
3) Then, for the rest of the lines, we can use something like dlmread:
mat2 = dlmread('file.txt',',',2,0);
The 2,0 specifies the offset in 0-based rows,columns from the start of the file. You may need to look at something like vertcat to stitch mat and mat2 together.
4) The list of files in the directory can be found with the dir command. The filename is an attribute of the structure that's returned:
dirlist = dir;
for i = 1:length(dirlist)
filename = dirlist(i).name
%process your files
end
You can also pass matching strings to dir, like so:
dirlist = dir('*.txt');
which will find all of the files with extension .txt.
5) You can very easily loop through the comparison matrix:
sze = size(comparisonmatrix);
for i = 1:sze(1)
%compare comparisonmatrix(i,1) to C1{2}
%Perform whatever operations you need
end
Hope that helps!

Resources