How to sort structure arrays in MATLAB? - arrays

I'm working with an image retrieval system using color histogram intersection in MATLAB. This method gives me the following data: a real number which represents the histogram intersection distance, and the image file name. Because they are different data types, I store them in a structure array with two fields, and then I save this structure in a .mat file. Now I need to sort this structure by histogram intersection distance in descending order, so that I can retrieve the image with the highest histogram intersection distance. I've tried many methods to sort this data, but without success. Can you help me solve this problem?

It's also possible to sort the entire structure.
To build off of gnovice's example...
% Create a structure array
s = struct('value',{1 7 4},'file',{'img1.jpg' 'img2.jpg' 'img3.jpg'});
% Sort the structure according to values in descending order
% We are only interested in the second output from the sort command
[blah, order] = sort([s(:).value],'descend');
% Save the sorted output
sortedStruct = s(order);

Here's one example of how you could do this, using the function MAX instead of having to sort:
%# First, create a sample structure array:
s = struct('value',{1 7 4},'file',{'img1.jpg' 'img2.jpg' 'img3.jpg'});
%# Next concatenate the "value" fields and find the index of the maximum value:
[maxValue,index] = max([s.value]);
%# Finally, get the file corresponding to the maximum value:
maxFile = s(index).file;
EDIT: If you would like to get the N highest values, and not just the maximum, you can use SORT instead of MAX (as Shaka suggested). For example (using the above structure):
>> N = 2; %# Get two highest values
>> [values,index] = sort([s.value],'descend'); %# Sort all values, largest first
>> topNFiles = {s(index(1:N)).file} %# Get N files with the largest values
topNFiles =
'img2.jpg' 'img3.jpg'


Displaying a matrix obtained from a vectorised structure

I am trying to extract the coordinates located in the variable x within a .mat structure. I would like to print them as a three column matrix. Let's say:
-5543837.67700032 -2054567.16633347 2387852.25825667
4641938.565315761 393003.28157792 4133325.70392322
-3957408.7414133 3310229.46968631 3737494.72491701
1492206.38965564 -4458130.51073730 4296015.51539152
4075539.69798060 931735.497964395 4801629.46009471
3451207.69353006 3060375.44622100 4391915.05780934
I know that I can get them with
stat = [file.scan.stat]';
x = [stat.x]';
But I get something like:
% :: and so on
I would like to print them as I showed at the beginning (x as a vector of 3 coordinates and one line per station) but I don't know how to treat them. I have tried with loops but I really don't know how to express them.
How can I display my coordinates as an n -by- 3 matrix?
[stat.x].' gives you a flattened vector, as you've seen. You can reshape() that vector to the desired format:
x = reshape(x,3,[]).';
This first reshapes the vector to 3 rows and n columns (n being your number of stations), then transposes it to give n rows of 3 columns.
For a short introduction on how reshape works, see this answer of mine.

How to structure multiple python arrays for sorting

A fourier analysis I'm doing outputs 5 data fields, each of which I've collected into 1-d numpy arrays: freq bin #, amplitude, wavelength, normalized amplitude, %power.
How best to structure the data so I can sort by descending amplitude?
When testing with just one data field, I was able to use a dict as follows:
fourier_tuples = zip(range(len(fourier)), fourier)
fourier_map = dict(fourier_tuples)
import operator
fourier_sorted = sorted(fourier_map.items(), key=operator.itemgetter(1))
fourier_sorted = np.argsort(-fourier)[:3]
My intent was to add the other arrays to line 1, but this doesn't work since dicts only accept 2 terms. (That's why this post doesn't solve my issue.)
Stepping back, is this a reasonable approach, or are there better ways to combine & sort separate arrays? Ultimately, I want to take the data values from the top 3 freqs and associated other data, and write them to an output data file.
Here's a snippet of my data:
fourier = np.array([1.77635684e-14, 4.49872050e+01, 1.05094837e+01, 8.24322470e+00, 2.36715913e+01])
freqs = np.array([0. , 0.00246951, 0.00493902, 0.00740854, 0.00987805])
wavelengths = np.array([np.inf, 404.93827165, 202.46913583, 134.97942388, 101.23456791])
amps = np.array([4.33257766e-16, 1.09724890e+00, 2.56328871e-01, 2.01054261e-01, 5.77355886e-01])
powers_pct = np.array([4.8508237956526163e-32, 0.31112370227749603, 0.016979224022185751, 0.010445983875848858, 0.086141014686372669])
The last 4 arrays are other fields corresponding to 'fourier'. (Actual array lengths are 42, but pared down to 5 for simplicity.)
You appear to be using numpy, so here is the numpy way of doing this. You have the right function np.argsort in your post, but you don't seem to use it correctly:
order = np.argsort(amplitudes)
This is similar to your dictionary trick only it computes the inverse shuffling compared to your procedure. Btw. why go through a dictionary and not simply a list of tuples?
The contents of order are now indices into amplitudes: the first cell of order contains the position of the smallest element of amplitudes, the second cell contains the position of the next smallest, and so on. The five largest amplitudes are therefore the last five entries of order, taken in reverse:
top5 = order[:-6:-1]
Provided your data are 1-D numpy arrays, you can use top5 to extract the elements corresponding to the top 5 amplitudes by using advanced indexing, e.g. amplitudes[top5].
If you want you can group them together in columns and apply top5 to the resulting n-by-5 array:
np.c_[freq_bin, amplitudes, wavelength, ...][top5, :]
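To make the meaning of order concrete without needing numpy, here is a minimal plain-Python sketch of the same index-based idea. The helper name argsort is an assumption for illustration (it only mimics what np.argsort returns for a 1-D input), and the data are two of the arrays from the question, written as lists.

```python
# Two of the parallel fields from the question, as plain lists.
fourier = [1.77635684e-14, 4.49872050e+01, 1.05094837e+01,
           8.24322470e+00, 2.36715913e+01]
freqs = [0.0, 0.00246951, 0.00493902, 0.00740854, 0.00987805]


def argsort(values):
    """Hypothetical helper: indices that would sort `values` ascending."""
    return sorted(range(len(values)), key=values.__getitem__)


order = argsort(fourier)   # position of the smallest element comes first
top3 = order[:-4:-1]       # last three indices, largest value first

# The same index list can be applied to every parallel field.
top3_fourier = [fourier[i] for i in top3]
top3_freqs = [freqs[i] for i in top3]
```

Because top3 holds positions rather than values, the other fields (wavelengths, normalized amplitudes, % power) can be picked out the same way without re-sorting anything.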
If I understand correctly, you have 5 separate lists of the same length and you are trying to sort all of them based on one of them. To do that you can either use numpy or do it with vanilla Python. Here are two examples off the top of my head (sorting is based on the 2nd list).
a = [11,13,10,14,15]
b = [2,4,1,0,3]
c = [22,20,23,25,24]
#numpy solution
import numpy as np
my_array = np.array([a,b,c])
my_sorted_array = my_array[:,my_array[1,:].argsort()]
#vanilla python solution
from operator import itemgetter
my_list = zip(a,b,c)
my_sorted_list = sorted(my_list,key=itemgetter(1))
You can then flip the array with my_sorted_array = np.fliplr(my_sorted_array) if you wish, or, if you are working with lists, reverse the list in place with my_sorted_list.reverse().
To get the first n values only, you simply slice the array, similarly to what @Paul is suggesting. The slice works like classic list slicing, by specifying start:stop:step arguments (you can omit the step). In your case, for the top 5 columns it would be [:,-5:]. So in the example above (before any flipping) you can take the top 2 columns from each row like this:
my_sliced_sorted_array = my_sorted_array[:,-2:]
result will be:
array([[15, 13],
       [ 3,  4],
       [24, 20]])
Hope it helps.
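As a rough sketch of how the vanilla approach could cover the whole task in the question (sort descending, keep the top 3, write them out), the snippet below zips three of the question's fields into rows. The column header names and the in-memory buffer are assumptions for illustration; for a real file you would pass something like open("out.csv", "w", newline="") to csv.writer instead, where out.csv is a hypothetical filename.

```python
import csv
import io
from operator import itemgetter

# Three of the parallel fields from the question, as plain lists.
fourier = [1.77635684e-14, 4.49872050e+01, 1.05094837e+01,
           8.24322470e+00, 2.36715913e+01]
freqs = [0.0, 0.00246951, 0.00493902, 0.00740854, 0.00987805]
amps = [4.33257766e-16, 1.09724890e+00, 2.56328871e-01,
        2.01054261e-01, 5.77355886e-01]

# One tuple per frequency bin: (bin number, fourier, freq, amp).
rows = list(zip(range(len(fourier)), fourier, freqs, amps))

# Sort by the fourier column (position 1), largest first, and keep 3 rows.
top3_rows = sorted(rows, key=itemgetter(1), reverse=True)[:3]

# Write header plus rows to an in-memory buffer standing in for a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["bin", "fourier", "freq", "amp"])  # assumed header names
writer.writerows(top3_rows)
csv_text = buffer.getvalue()
```

Keeping each bin's values together in one tuple means the fields cannot drift out of step during sorting, which is the main advantage over sorting each list separately.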

Split matrix into several depending on value in Matlab

I have a cell array that I need to split into several matrices so that I can take the sum of subsets of the data. This is a sample of what I have:
A = {'M00.300', '1644.07';...
'M00.300', '9745.42'; ...
'M00.300', '2232.88'; ...
'M00.600', '13180.82'; ...
'M00.600', '2755.19'; ...
'M00.600', '15800.38'; ...
'M00.900', '18088.11'; ...
'M00.900', '1666.61'};
I want the sum of the second columns for each of 'M00.300', 'M00.600', and 'M00.900'. For example, to correspond to 'M00.300' I would want 1644.07 + 9745.42 + 2232.88.
I don't want to just hard code it because each data set is different, so I need the code to work for different size cell arrays.
I'm not sure of the best way to do this. I was going to begin by looping through A, comparing the strings in the first column and creating matrices within that loop, but that sounded messy and inefficient.
Is there a simpler way to do this?
Classic use of accumarray. You would use the first column as an index and the second column as the values associated with each index. accumarray groups the values that share the same index and applies a function to each group. In your case, you'd use the default behaviour, which sums the values in each group.
However, you'll need to convert the first column into numeric labels. The third output of unique will help you do this. You'll also need to convert the second column into a numeric array and so str2double is a perfect way to do this.
Without further ado:
[val,~,id] = unique(A(:,1)); %// Get unique values and indices
out = accumarray(id, str2double(A(:,2))); %// Aggregate the groups and sum
format long g; %// For better display of precision
T = table(val, out) %// Display on a nice table
I get this:
>> T = table(val, out)
T =
       val         out
    _________    ________
    'M00.300'    13622.37
    'M00.600'    31736.39
    'M00.900'    19754.72
The above uses the table class that is available from R2013b and onwards. If you don't have this, you can perhaps use a for loop and print out each cell and value separately:
for idx = 1 : numel(out)
    fprintf('%s: %f\n', val{idx}, out(idx));
end
We get:
M00.300: 13622.370000
M00.600: 31736.390000
M00.900: 19754.720000

Sorting elements of a single array into different subarrays

I have a 1000-element array with values ranging from 1 to 120. I want to split this array into 6 different subarrays according to value range.
For example:
array 1 with values in the range 0-20,
array 2 with values in the range 20-40, and so on up to 100-120.
At the end I would like to plot a histogram with the range on the x-axis and each bar showing the number of elements in that range. I don't know of any other way to do this kind of plot.
In other words, you want to create a histogram. Matlab's hist() will do this for you.
If you only need the histogram, you can achieve the result using histc, like this:
edges = 0:20:120; % edges to compute histogram
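n = histc(array,edges);
n(end-1) = n(end-1) + n(end); % histc's last bin counts only values equal to edges(end) (here 120); fold them into the 100-120 bin
n = n(1:end-1); % then remove that last bin (see "histc" doc)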
bar(edges(1:end-1)+diff(edges)/2, n); % do the plot. For x axis use
% mean value of each bin

Creating sub-arrays from large single array based on marker values

I need to create a 1-D array of 2-D arrays, so that a program can read each 2-D array separately.
I have a large array with 5 columns, with the second column storing 'marker' data. Depending on the marker value, I need to take the corresponding data from the remaining 4 columns and put them into a new array on its own.
I was thinking of having two for loops running, one to take the target data and write it to a cell in the 1-D array, and one to read the initial array line-by-line, looking for the markers.
I feel like this is a fairly simple issue, I'm just having trouble figuring out how to essentially cut and paste certain parts of an array and write them to a new one.
Thanks in advance.
No for loops needed, use your marker with logical indexing. For example, if your large array is A :
B=A(A(:,2)==marker,[1 3:5])
will select all rows where the marker was present, without the 2nd col. Then you can use reshape or the (:) operator to make it 1D, for example
or, if you want a one-liner:
B=reshape(A(A(:,2)==marker,[1 3:5]),1,[]);
I am just answering my own question to show any potential future users the solution I came up with eventually.
MARKER_DATA=csvread('ESphnB2.csv'); % load data from csv file
A=MARKER_DATA(:,2); % create 1D array for markers
A=A'; % make column into row
for i=1:length(A) % for every marker
    if A(i) ~= 231 % if it is not 231 then
        A(i)=0; % set value to zero
    end
end
edgeArray = diff([0; (A(:) ~= 0); 0]); % set non-zero values to 1
ind = [find(edgeArray > 0) find(edgeArray < 0)-1]; % find indices of 1 and save to array with beginning and end
t=1; % initialize counter for trials
for j=1:size(ind,1) % for every marked index
    B{t}=MARKER_DATA(ind(j,1):ind(j,2),[3:6]); % create an array with the rows from the data according to indices
    t=t+1; % create a new trial
end
gazeVectors=B'; % reorient and rename array of trials for saccade analysis
save('Trial_Data_2.mat','gazeVectors'); % save array to mat file
