MATLAB cell array indexing and looping - arrays

I'm trying to create a script that reads data from a text file, and plots the data onto a scatter plot.
For example, say the file name is prices.txt and contains:
Pens 2 4
Pencils 1.5 3
Rulers 3 3.5
Sharpeners 1 3
Highlighters 3 4
Where columns 2 and 3 are prices of the items for two different stores.
What my script should do is read the prices, calculate (using another function) the future prices at the stores, and plot these prices on a scatter plot where x is one store and y is the other. This is a silly example, I know, but it fits the description.
Don't worry too much about the other function that does the calculation; just assume it does what it's supposed to.
Basically, I've come up with the following:
pricesfile = fopen('Prices.txt');
prices = textscan(pricesfile, '%s %d d');
fclose(pricesfile);
count = 1;
while count <= length(prices{1})
for item = constants{1}
name = constants{1}{count};
store_A = prices{2}{count};
store_B = prices{3}{count};
(...other function goes here...)
end
end
After doing this I'm completely stuck. My thought process behind this was to go through each item name, and create a vector that's assigned to this name with its two corresponding prices as items in the vector eg:
pens = [2 4]
pencils = [1.5 3]
etc. Then, I would somehow plot those items in the vector on a scatter plot and use the name of the vector as a label.
I'm not too sure how to carry out the rest of my code or even if what I've written will get me to the solution.
Please help and thanks in advance.

pricesfile = fopen('Prices.txt');
data = textscan(pricesfile, '%s %f %f'); % name, then two floating-point prices
fclose(pricesfile);
You were on the right track but after this (through a bit of hackery) you don't actually need a loop:
plot(repmat(data{2},1,2)', repmat(data{3},1,2)', '.')
legend(data{1})
What you DO NOT want to do is create variables named after strings. Rather, store the values in an array alongside an array of the names (which is basically what your textscan code gives you). Matlab is very good at handling matrices/arrays.
You could also split your price array up for example:
names = data{1};
prices = [data{2:3}];
Now you can perform calculations on prices quite easily, for example:
prices_cents = prices*100;
plot(prices_cents(:,[1,1])', prices_cents(:,[2,2])', '.')
legend(names)
Note that the [1,1] etc. above just uses indexing as shorthand to achieve what repmat does.
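For readers who prefer Python, the same idea (one list of names plus one numeric array, instead of one variable per item) can be sketched as below. This is only an illustration: the sample text is copied from the question, and `parse_prices` is a hypothetical helper name, not a library function.

```python
import numpy as np

# Sample contents of prices.txt, copied from the question.
SAMPLE = """Pens 2 4
Pencils 1.5 3
Rulers 3 3.5
Sharpeners 1 3
Highlighters 3 4"""

def parse_prices(text):
    """Split 'name priceA priceB' lines into a name list and an (n, 2) array."""
    names, rows = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, a, b = line.split()
        names.append(name)
        rows.append([float(a), float(b)])
    return names, np.array(rows)

names, prices = parse_prices(SAMPLE)
prices_cents = prices * 100  # one vectorised calculation over every price
```

Row i of `prices` belongs to `names[i]`, so the names can be used as labels without ever becoming variable names.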

Related

Pytables Value Error (rank of the appended object and "..."EArray differ)

I am trying to use pytables to store my images dataset. I am using Earray to append each image as it is read. The dimensions of my Earray and image are same(except for the first, along which appending is done). I am using the following code:
atom = Atom.from_dtype(np.dtype(np.uint32,(278,278,1)))
i=0
for <read each image from folder using os into img>:
    im = cv2.imread(img.path,0)
    im = np.expand_dims(im,2) #this is because keras needs 3d images and grayscale images are 2d
    if not i:
        X = data.create_earray(dataGroup,"X",atom,(0,)+im.shape,chunkshape=(20,20,20,1))
    X.append(np.expand_dims(im,0)) #as appending require same dim.
    i=1
But when I run the code, it still gives me a ValueError saying that my object's rank is 1 and X's rank is 4. How is that possible when I am setting X's size from im? I even tried printing the shape of im, and it gives (278,278,1). So what is the problem? I am using Pytables for the first time, so I don't know it in depth.
Adding a second answer with a more complicated write method plus an EArray.read example. Frankly, I prefer my simpler method (above) to create the EArray with obj= defined, and let Pytables handle the data structures. However, if you prefer to manage this yourself, see example 2 (below). Key items to note:
The Atom definition has 4 dimensions, with axis 0 set to zero (this defines the direction that will be extended).
im = np.expand_dims(im,0) is not done until AFTER im.shape is referenced in the definition of the EArray shape at creation.
[UPDATED CODE BELOW]
import tables as tb, numpy as np
data = tb.open_file("image_data1.h5", mode='w')
dataGroup = data.create_group(data.root, 'MyData')
MyAtom = tb.Atom.from_dtype(np.dtype(np.uint32,(0,278,278,1)))
im = np.arange(278*278).reshape((278,278))
im = np.expand_dims(im,2)
X = data.create_earray(dataGroup,"X", MyAtom, (0,)+im.shape)
im = np.expand_dims(im,0)
X.append( im )
print ('flavor =', X.flavor )
print ('dim=', X.ndim, ', rows = ', X.nrows)
im = np.arange(278*278,278*278+278*278).reshape((278,278))
im = np.expand_dims(im,2)
im = np.expand_dims(im,0)
X.append( im )
print ('dim=', X.ndim, ', rows = ', X.nrows)
data.close()
Here are the lines you need to read the data from EArray X (with a couple of print statements to verify values in the corners). This should work so long as the EArray flavor is Numpy (as it is in my example). You can also use the out= parameter to specify a NumPy array to receive the output data. There are other methods to access EArray data, including .iterrows() to iterate, and .__getitem__() to slice with fancy indexing. Read the Pytables documentation if you want to do any of these.
Y_1 = X.read( 0 )
print (Y_1[0,0,0])
print (Y_1[-1,-1,-1])
Y_2 = X.read( 1 )
print (Y_2[0,0,0])
print (Y_2[-1,-1,-1])
First, note that you don't have to create the EArray before you load the first image dataset. Pytables is smart enough to determine the atom and shape definition from the first object.
It was hard for me to exercise your code without a complete example and your data. So, I created a very simple example that uses np.arange() to create a couple of (278,278) image arrays, then extends them in the 2 and 0 directions. Hopefully this mimics the data you are trying to load to the EArray. The 2 Pytables functions (file.create_earray and earray.append) create 2 rows of data, 1 for each "image". After running, open image_data1.h5 with HDFView and inspect the data.
Maybe this will help you understand how to load your images to HDF5 Earrays:
import tables as tb, numpy as np
data = tb.open_file("image_data1.h5", mode='w')
dataGroup = data.create_group(data.root, 'MyData')
im = np.arange(278*278).reshape((278,278))
im = np.expand_dims(im,2)
im = np.expand_dims(im,0)
X = data.create_earray( dataGroup,"X",obj=im )
print ('dim=', X.ndim, ', rows = ', X.nrows)
im = np.arange(278*278, 278*278+278*278).reshape((278,278))
im = np.expand_dims(im,2)
im = np.expand_dims(im,0)
X.append( im )
print ('dim=', X.ndim, ', rows = ', X.nrows)
data.close()
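The shape bookkeeping behind both examples can be checked with NumPy alone. The sketch below does not call Pytables at all; it only traces how the array shape changes at each step, using the same 278x278 stand-in image as above.

```python
import numpy as np

im = np.arange(278 * 278).reshape((278, 278))  # stand-in for a grayscale image
im3 = np.expand_dims(im, 2)       # (278, 278, 1): add the channel axis
earray_shape = (0,) + im3.shape   # (0, 278, 278, 1): shape given at creation
row = np.expand_dims(im3, 0)      # (1, 278, 278, 1): one row to append

# The appended object needs the same rank as the EArray shape.
ranks_match = row.ndim == len(earray_shape)
```

If the leading axis is added before `im.shape` is read, the creation shape gains an extra dimension and the ranks no longer agree.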

Can You assign a value A to a value B given two sets of data in matlab?

I am very new to MatLab and have the following problem: I have 2 arrays with a lot of integers, and I would like to be able to assign any given data point from one array to another. Example:
array1 = [1, 2, 3]
array2 = [2, 4, 6]
So if I have a data point from array1, in this case I would be able to say array1*2 = array2. There are different ways to solve this, and I have two arrays with about 100k elements each. I need to divide the data into smaller segments and then compute an average for each, so that I can derive X such that array1*X ~ array2. I need a good estimate; the example above just doesn't do it justice. Thanks for the help.
From what I understand you want to estimate X in your formula:
array1*X = array2
If you divide, you get an estimate for each value; then take the mean to get an estimator of X:
mean(double(array2)./double(array1))
You worry about the 100k values, 200k values in total. This is not big for Matlab. Consider that each value takes up 8 bytes as a double: 1600 kB is not much for a computer.
You could also fit it with a simple linear regression model
sum(array1.*array2) / sum(array1.^2)
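The expression above is the least-squares slope of a line through the origin (no intercept term). A short derivation, assuming the model $y_i \approx b\,x_i$ with $x$ standing for array1 and $y$ for array2:

```latex
\begin{aligned}
S(b) &= \sum_i \left(y_i - b\,x_i\right)^2, \\
\frac{dS}{db} &= -2 \sum_i x_i \left(y_i - b\,x_i\right) = 0
\;\Longrightarrow\;
\hat{b} = \frac{\sum_i x_i y_i}{\sum_i x_i^2}.
\end{aligned}
```

Compared with the mean of the ratios, this estimate gives more weight to points with large $x_i$.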
Let's try again. So you have 2x 100k points that have a relation y = b*x.
You want to find b for different segments of the 100k points.
x = randi([1 2^15-1],1,1E5,'uint16'); %demo data
y = x.*2;
The two methods I gave before will yield 2:
x=double(x);y=double(y);
mean(y./x)
sum(x.*y) / sum(x.^2)
If you want to segment you can first cut the data in smaller parts
segment=1000; %nr datapoints / segment
N = length(x)/segment; %nr of segments
L = floor(rem(N,1)*segment); %nr of datapoints in last segment
N = floor(N);
xc=mat2cell(x,1,[repmat(segment,[1,N]),L]);
yc=mat2cell(y,1,[repmat(segment,[1,N]),L]);
if L==0,xc(end)=[];yc(end)=[];end
Now you can loop over xc and get the mean value
for ct = 1:length(xc)
x = xc{ct};y=yc{ct};
m(ct) = mean(y./x);
m2(ct) = sum(x.*y) / sum(x.^2);
end
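For comparison, the same segment-wise estimate can be sketched in Python with NumPy. The demo data (an exact factor of 2) and all variable names here are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(1, 2**15, size=100_000).astype(float)  # demo data, no zeros
y = 2.0 * x

segment = 1000  # data points per segment
# Cut at every multiple of `segment`; the last piece may be shorter.
cuts = np.arange(segment, len(x), segment)
xc = np.split(x, cuts)
yc = np.split(y, cuts)

# Per-segment estimates: mean of ratios, and least squares through the origin.
ratio_means = np.array([np.mean(ys / xs) for xs, ys in zip(xc, yc)])
ls_slopes = np.array([np.sum(xs * ys) / np.sum(xs**2) for xs, ys in zip(xc, yc)])
```

Because `np.split` accepts explicit cut positions, a shorter final segment needs no special handling.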

How to structure multiple python arrays for sorting

A fourier analysis I'm doing outputs 5 data fields, each of which I've collected into 1-d numpy arrays: freq bin #, amplitude, wavelength, normalized amplitude, %power.
How best to structure the data so I can sort by descending amplitude?
When testing with just one data field, I was able to use a dict as follows:
fourier_tuples = zip(range(len(fourier)), fourier)
fourier_map = dict(fourier_tuples)
import operator
fourier_sorted = sorted(fourier_map.items(), key=operator.itemgetter(1))
fourier_sorted = np.argsort(-fourier)[:3]
My intent was to add the other arrays to line 1, but this doesn't work since dicts only accept 2 terms. (That's why this post doesn't solve my issue.)
Stepping back, is this a reasonable approach, or are there better ways to combine & sort separate arrays? Ultimately, I want to take the data values from the top 3 freqs and associated other data, and write them to an output data file.
Here's a snippet of my data:
fourier = np.array([1.77635684e-14, 4.49872050e+01, 1.05094837e+01, 8.24322470e+00, 2.36715913e+01])
freqs = np.array([0. , 0.00246951, 0.00493902, 0.00740854, 0.00987805])
wavelengths = np.array([np.inf, 404.93827165, 202.46913583, 134.97942388, 101.23456791])
amps = np.array([4.33257766e-16, 1.09724890e+00, 2.56328871e-01, 2.01054261e-01, 5.77355886e-01])
powers_pct = np.array([4.8508237956526163e-32, 0.31112370227749603, 0.016979224022185751, 0.010445983875848858, 0.086141014686372669])
The last 4 arrays are other fields corresponding to 'fourier'. (Actual array lengths are 42, but pared down to 5 for simplicity.)
You appear to be using numpy, so here is the numpy way of doing this. You have the right function np.argsort in your post, but you don't seem to use it correctly:
order = np.argsort(amplitudes)
This is similar to your dictionary trick only it computes the inverse shuffling compared to your procedure. Btw. why go through a dictionary and not simply a list of tuples?
The contents of order are now indices into amplitudes: the first cell of order contains the position of the smallest element of amplitudes, the second cell contains the position of the next smallest, and so on. Therefore the top 5 are given by
top5 = order[:-6:-1]
Provided your data are 1d numpy arrays, you can use top5 to extract the elements corresponding to the top 5 amplitudes by using advanced indexing:
freq_bin[top5]
amplitudes[top5]
wavelength[top5]
If you want you can group them together in columns and apply top5 to the resulting n-by-5 array:
np.c_[freq_bin, amplitudes, wavelength, ...][top5, :]
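Applied to the sample data from the question, and taking the top 3 as the question asks, the approach might look like the sketch below. Only three of the five fields are included to keep it short, and `np.column_stack` is used in place of `np.c_` purely as a stylistic choice.

```python
import numpy as np

fourier = np.array([1.77635684e-14, 4.49872050e+01, 1.05094837e+01,
                    8.24322470e+00, 2.36715913e+01])
freqs = np.array([0., 0.00246951, 0.00493902, 0.00740854, 0.00987805])
wavelengths = np.array([np.inf, 404.93827165, 202.46913583,
                        134.97942388, 101.23456791])

order = np.argsort(fourier)  # indices from smallest to largest amplitude
top3 = order[:-4:-1]         # last three indices, largest amplitude first

# One row per selected bin: bin number, amplitude, frequency, wavelength.
table = np.column_stack([top3, fourier[top3], freqs[top3], wavelengths[top3]])
```

The resulting `table` can then be written out with `np.savetxt`, which covers the "write to an output data file" part of the question.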
If I understand correctly you have 5 separate lists of the same length and you are trying to sort all of them based on one of them. To do that you can either use numpy or do it with vanilla python. Here are two examples from top of my head (sorting is based on the 2nd list).
a = [11,13,10,14,15]
b = [2,4,1,0,3]
c = [22,20,23,25,24]
#numpy solution
import numpy as np
my_array = np.array([a,b,c])
my_sorted_array = my_array[:,my_array[1,:].argsort()]
#vanilla python solution
from operator import itemgetter
my_list = zip(a,b,c)
my_sorted_list = sorted(my_list,key=itemgetter(1))
You can then flip the array with my_sorted_array = np.fliplr(my_sorted_array) if you wish or if you are working with lists you can reverse it in place with my_sorted_list.reverse()
EDIT:
To get the first n values only, you simply slice the array, similar to what @Paul is suggesting. Slicing works as in classic list slicing, by specifying start:stop:step arguments (you can omit the step). In your case, for the 5 top columns it would be [:,-5:]. So in the example above you can take the top 2 columns from each row like this:
my_sliced_sorted_array = my_sorted_array[:,-2:]
result will be:
array([[15, 13],
       [ 3,  4],
       [24, 20]])
Hope it helps.

Split matrix into several depending on value in Matlab

I have a cell array that I need to split into several matrices so that I can take the sum of subsets of the data. This is a sample of what I have:
A = {'M00.300', '1644.07';...
'M00.300', '9745.42'; ...
'M00.300', '2232.88'; ...
'M00.600', '13180.82'; ...
'M00.600', '2755.19'; ...
'M00.600', '15800.38'; ...
'M00.900', '18088.11'; ...
'M00.900', '1666.61'};
I want the sum of the second columns for each of 'M00.300', 'M00.600', and 'M00.900'. For example, to correspond to 'M00.300' I would want 1644.07 + 9745.42 + 2232.88.
I don't want to just hard code it because each data set is different, so I need the code to work for different size cell arrays.
I'm not sure of the best way to do this, I was going to begin by looping through A and comparing the strings in the first column and creating matrices within that loop, but that sounded messy and not efficient.
Is there a simpler way to do this?
Classic use of accumarray. You would use the first column as an index and the second column as the values associated with each index. accumarray works where you group values that belong to the same index together and you apply a function to those values. In your case, you'd use the default behaviour and sum things together.
However, you'll need to convert the first column into numeric labels. The third output of unique will help you do this. You'll also need to convert the second column into a numeric array and so str2double is a perfect way to do this.
Without further ado:
[val,~,id] = unique(A(:,1)); %// Get unique values and indices
out = accumarray(id, str2double(A(:,2))); %// Aggregate the groups and sum
format long g; %// For better display of precision
T = table(val, out) %// Display on a nice table
I get this:
>> T = table(val, out)
T = 
       val         out   
    _________    ________
    'M00.300'    13622.37
    'M00.600'    31736.39
    'M00.900'    19754.72
The above uses the table class that is available from R2013b and onwards. If you don't have this, you can perhaps use a for loop and print out each cell and value separately:
for idx = 1 : numel(out)
fprintf('%s: %f\n', val{idx}, out(idx));
end
We get:
M00.300: 13622.370000
M00.600: 31736.390000
M00.900: 19754.720000
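The same label-then-accumulate pattern exists in NumPy, shown here only as a cross-language sketch: `np.unique(..., return_inverse=True)` plays the role of the third output of unique, and `np.bincount` with `weights` does the default summing that accumarray performs.

```python
import numpy as np

A = [("M00.300", "1644.07"), ("M00.300", "9745.42"), ("M00.300", "2232.88"),
     ("M00.600", "13180.82"), ("M00.600", "2755.19"), ("M00.600", "15800.38"),
     ("M00.900", "18088.11"), ("M00.900", "1666.61")]

labels = np.array([row[0] for row in A])
values = np.array([float(row[1]) for row in A])

# `ids` maps every row to the position of its label in `val`.
val, ids = np.unique(labels, return_inverse=True)
out = np.bincount(ids, weights=values)  # sum the values within each group
```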

Split array into smaller unequal-sized arrays dependend on array-column values

I'm quite new to MatLab and this problem really drives me insane:
I have a huge array of 2 column and about 31,000 rows. One of the two columns depicts a spatial coordinate on a grid the other one a dependent parameter. What I want to do is the following:
I. I need to split the array into smaller parts defined by the spatial column; let's say the spatial coordinate are ranging from 0 to 500 - I now want arrays that give me the two column values for spatial coordinate 0-10, then 10-20 and so on. This would result in 50 arrays of unequal size that cover a spatial range from 0 to 500.
II. Secondly, I would need to calculate the average values of the resulting columns of every single array so that I obtain per array one 2-dimensional point.
III. Thirdly, I could plot these points and I would be super happy.
Sadly, I'm super confused since I miserably fail at step I. - Maybe there is even an easier way than to split the giant array in so many small arrays - who knows..
I would be really really happy for any suggestion.
Thank you,
Arne
First of all, since you want a data structure holding arrays of different sizes, you will need to place them in a cell array. You could try something like this:
res = arrayfun(@(x)arr(arr(:,1)==x,:), unique(arr(:,1)), 'UniformOutput', 0);
The code above returns a cell array in which the array is split according to its first column. @(x)arr(arr(:,1)==x,:) defines an anonymous function of x, and arrayfun(function, ..., 'UniformOutput', 0) applies that function to each element of the arguments that follow (taking one value at a time to evaluate the function). Note that arr must be numeric; if it is not, map your values to numeric values or select them another way.
In the same way you could do
uo = 'UniformOutput';
res = arrayfun(@(x){arr(arr(:,1)==x,:), mean(arr(arr(:,1)==x,2))}, unique(arr(:,1)), uo, 0);
You will probably want to flatten the returned value; check the function cat. You could do:
res = cat(1,res{:})
How to plot your data depends on its format, so I can't help without knowing what the data looks like, but you could try plotting inside a loop over your 'res' variable or something similar.
Step I indeed comes with some difficulties. Once these are solved, I guess steps II and III can easily be solved. Let me make some suggestions for step I:
You first define the maximum value (maxValue = 500;) and the step size (stepSize = 10;). Now it is possible to iterate through all steps and create your new vectors.
for k=1:maxValue/stepSize
...
end
As every resulting array will have different dimensions, I suggest you save the vectors in a cell array:
Y = cell(maxValue/stepSize,1);
Use the find function to find the rows of the entries for each matrix. At each step k, the range of values of interest will be (k-1)*stepSize to k*stepSize.
row = find( (k-1)*stepSize <= X(:,1) & X(:,1) < k*stepSize );
You can now create the matrix for step k with
Y{k,1} = X(row,:);
Putting everything together you should be able to create the cell array Y containing your matrices and continue with the other tasks. You could also save the average of each value range in a second column of the cell array Y:
Y{k,2} = mean( Y{k,1}(:,2) );
I hope this helps you with your task. Note that these are only suggestions and there may be different (maybe more appropriate) ways to handle this.
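As an alternative that avoids building 50 separate arrays, steps I and II can be combined by computing a bin index per row and accumulating per bin. The sketch below is in Python with NumPy and uses randomly generated stand-in data; the array `X`, its value ranges, and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: column 0 is the coordinate in [0, 500), column 1 the parameter.
X = np.column_stack([rng.uniform(0, 500, 31_000), rng.normal(size=31_000)])

max_value, step_size = 500, 10
n_bins = max_value // step_size
bin_idx = (X[:, 0] // step_size).astype(int)  # bin k covers [k*step, (k+1)*step)

counts = np.bincount(bin_idx, minlength=n_bins)
sum_x = np.bincount(bin_idx, weights=X[:, 0], minlength=n_bins)
sum_p = np.bincount(bin_idx, weights=X[:, 1], minlength=n_bins)

# Mean coordinate and mean parameter per bin (an empty bin would give NaN).
means = np.column_stack([sum_x, sum_p]) / counts[:, None]
```

Each row of `means` is one 2-dimensional point, ready to be plotted for step III.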
