Add the loop index with variable, array & table names in matlab [duplicate] - arrays

I have a variable that is created by a loop. The variable is large enough and in a complicated enough form that I want to save the variable each time it comes out of the loop with a different name.
PM25 is my variable, but I want to save it as PM25_year, where the year changes based on str = fname(13:end):
PM25 = permute(reshape(E',[c,r/nlay,nlay]),[2,1,3]); % Reshape and permute to achieve the right shape. Each face of the 3D should be one day
str = fname(13:end); % The year
% Third dimension is organized so that the data for each site is on a face
save('PM25_str', 'PM25_Daily_US.mat', '-append')
The str would be a year, like 2008. So the variable saved would be PM25_2008, then PM25_2009, etc. as it is created.

Defining new variables based on data isn't considered best practice, but you can store your data more efficiently using a cell array. You can store even a large, complicated variable like your PM25 variable within a single cell. Here's how you could go about doing it:
Place your PM25 data for each year into the cell array C using your loop:
for i = 1:numberOfYears
C{i} = PM25;
end
Resulting in something like this:
C = { PM25_2005, PM25_2006, PM25_2007 };
Now let's say you want to obtain your variable for the year 2006. This is easy (assuming you aren't skipping years). The first year of your data will correspond to position 1, the second year to position 2, etc. So to find the index of the year you want:
minYear = 2005;
yearDesired = 2006;
index = yearDesired - minYear + 1;
PM25_2006 = C{index};
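If some years are missing, the offset arithmetic above no longer lines up. A minimal sketch of one way around that, assuming you keep a separate vector of the years alongside the cell array (the names years and C and the placeholder data are made up for illustration):

```matlab
% Hypothetical list of available years; 2007 and 2008 are missing.
years = [2005 2006 2009];

% Placeholder data standing in for the real PM25 arrays.
C = cell(1, numel(years));
for i = 1:numel(years)
    C{i} = i * ones(2, 2);
end

% Look the year up in the list instead of computing an offset.
yearDesired = 2009;
index = find(years == yearDesired, 1);
if isempty(index)
    error('No data stored for year %d', yearDesired);
end
PM25_2009 = C{index};
```

The lookup costs a scan over years, which is negligible for a handful of entries.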

You can do this using eval, but note that it's often not considered good practice. eval may be a security risk, as it allows user input to be executed as code. A better way to do this may be to use a cell array or an array of objects.
That said, I think this will do what you want:
for year = 2008:2014
eval(sprintf('PM25_%d = permute(reshape(E'',[c,r/nlay,nlay]),[2,1,3]);',year)); % the transpose quote must be doubled inside the string
save('PM25_Daily_US.mat',sprintf('PM25_%d',year),'-append');
end

I do not recommend creating variables like this: there is no way to track them, and it defeats the static error checking that MATLAB performs beforehand, because this kind of code is resolved entirely at runtime.
That said, if you have a really good reason for doing this, I recommend using the function assignin.
assignin('caller', ['myvar',num2str(1)], 63);
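If the goal is only to have variables named PM25_2008, PM25_2009, and so on inside the MAT-file, one possible route that avoids both eval and assignin is a struct with a dynamic field name, saved with the '-struct' option so that each field becomes its own variable in the file. This is a sketch under assumptions: the random array stands in for the real PM25 data, and the file is written to tempdir under a made-up name.

```matlab
% Sketch: store each year's array under its own variable name in a MAT-file.
outFile = fullfile(tempdir, 'PM25_Daily_US_demo.mat');  % hypothetical file name
if exist(outFile, 'file') == 2
    delete(outFile);                       % start clean for the demo
end

for yr = 2008:2010
    PM25 = rand(4, 3, 2);                  % placeholder for the reshaped data
    s = struct();
    s.(sprintf('PM25_%d', yr)) = PM25;     % dynamic field name, e.g. PM25_2008
    if exist(outFile, 'file') == 2
        save(outFile, '-struct', 's', '-append');  % add this year's variable
    else
        save(outFile, '-struct', 's');             % first year creates the file
    end
end

% List the variables now stored in the file.
vars = whos('-file', outFile);
disp({vars.name});
```

In the workspace there is only ever one struct s, so nothing is created behind MATLAB's back.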

Related

What is the difference between indexing an array and a dictionary?

From my understanding, an array is a simple table of values, such as local t = {"a","b","c"}, and a dictionary is a table of objects, such as local t = {a = 1, b = 2, c = 3} Of course, let me know if I'm wrong in either or both cases.
Anyways, my question lies in how we index the entries in either of these cases. For example, let's say I have the following code:
local t = {"TestEntry"}
print(t["TestEntry"])
Of course, this prints nil. However, when we use a dictionary the same way:
local t = {TestEntry = 1}
print(t["TestEntry"])
This, naturally, prints 1. My question is, why does it work this way for dictionaries, but not arrays?
Finally, I'd like to address the issue that led me to this question. Let's say, before I want to run a chunk of code, I need to see if a specific value is inside a table. It would be convenient if I could just check if it is in the table with table["GivenEntry"], but, as we have seen, this would only work if the entry in the table is actually an object. In my specific case, I am simply using an array, so it is not an object.
Thus, I had to resort to using a for loop to check the table:
local t = {"TestEntry1","TestEntry2"}
for i,v in pairs(t) do
if v == "TestEntry1" then
--do code
end
end
After doing this, it almost seemed as if it would be easier to create a silly dictionary, like:
local t = {TestEntry1 = "TestEntry1"}
because then, I could simply run t["TestEntry1"], and I wouldn't have to worry about having an empty table (because then the for loop would not run). Are there ramifications to creating a dictionary for such purposes? Is it less efficient in general?
Your input is appreciated,
Thank you.
In Lua, both arrays and dictionaries are the same type (the table). local t = {"TestEntry"} is essentially short for local t = {[1] = "TestEntry"} (Lua requires the brackets for a numeric key, and you would access the value with t[1]). That is why t["TestEntry"] is nil in the array case: "TestEntry" is stored as a value under key 1, not as a key.
So the options for checking whether "TestEntry1" is in the table are the ones you have written. A dictionary takes more memory and, depending on how many values you have, may take a while to create, but accessing a key should be constant time. Looping through the table takes longer the more items you have, so it is a tradeoff you have to decide on.
There are faster ways to search an array however (e.g. if it is sorted: https://en.wikipedia.org/wiki/Binary_search_algorithm)

Add an element to a 2D time series in Matlab

I have a very simple problem, but I am wondering if there is a simpler way to solve it (there must be).
I have a matrix which is 10 by 10 and contains doubles. I need to create a time series with those data points.
The way I am doing it is as follows. I create a 3D array with the third dimension being time, and every day I add the new data to the array by increasing the time dimension by one.
Here is the code:
TS_updated = zeros(size(TS_Current)+[0,0,1]);
TS_updated(:,:,1:end-1) = TS_Current;
TS_updated(:,:,end) = TS_New;
where TS_Current is the existing 3D array representing the time series and TS_New is the new data from today which I need to add to the time series.
Is there a quicker way to append the last element, such as with a 2D table:
TS_updated = [TS_Current;TS_New];
Or maybe even a smarter way to store the time series?
You can also use
TS_Current(:,:,end+1) = TS_New;
You might also want to preallocate if you intend to extend the series more often than once per day. You can start with any length and double the allocated space when that limit is reached.
There is no clearly better way of arranging data I could see. You might flatten it to 100xTime instead of 10x10xTime, but it depends whether it would help.
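The preallocate-and-double idea mentioned above can be sketched as follows. The starting capacity, the number of steps, and the names capacity and nUsed are assumptions made for illustration; TS_New is filled with placeholder values.

```matlab
% Sketch: grow a 10x10xT series by doubling the allocated third dimension.
capacity = 4;                        % initial number of time slots (arbitrary)
TS = zeros(10, 10, capacity);        % preallocated storage
nUsed = 0;                           % number of slots actually filled

for d = 1:9
    TS_New = d * ones(10, 10);       % placeholder for the new 10x10 data
    if nUsed == capacity
        % Out of room: append as many empty slots as we already have.
        TS = cat(3, TS, zeros(10, 10, capacity));
        capacity = 2 * capacity;
    end
    nUsed = nUsed + 1;
    TS(:, :, nUsed) = TS_New;
end

TS = TS(:, :, 1:nUsed);              % trim the unused slots when done
```

With doubling, the array is reallocated only a logarithmic number of times instead of once per appended slice, which matters mostly when slices arrive frequently.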
Use the cat function (documentation) in the third dimension:
TS_updated = cat(3, TS_Current, TS_New);
You could include error checking first by using
% Check dimensions 1 and 2 are consistent first
if size(TS_Current,1) == size(TS_New,1) && size(TS_Current,2) == size(TS_New,2)
% Now concatenate
TS_updated = cat(3, TS_Current, TS_New);
else
error('New time series has incorrect dimensions')
end
You want to concatenate in the 3rd dimension?
A=ones(3,3,2)
B=rand(3,3);
C=cat(3,A,B)

Concatenate string && integer as array variable of double type - MATLAB

I am currently looking for advice on the piece of code below, which loops through a dataset (of cell type) and extracts each column as a data vector.
[i,j]=size(fimat);
k=2;
while k<=j % looping through columns
[num2str(k-1),'yr']=cell2mat(fimat(:,k)); %extract each column as vector
k=k+1;
end
My problem clearly lies in the following statement:
[num2str(k-1),'yr']
that correctly concatenates numbers (reflected by variable k) and string name 'yr'. However the syntax fails in assigning for instance (during 1st iteration)
1yr=cell2mat(fimat(:,2))
The resulting error speaks for itself
Error: An array for multiple LHS assignment cannot contain LEX_TS_STRING.
but I'm still figuring out a way to do it. Thus any feedback would be appreciated.
Thanks
First of all, in MATLAB a variable name cannot start with a digit. It must start with a letter (a leading underscore is not allowed either), so you should modify your code accordingly.
For instance, ['yr' num2str(k-1)] would be a valid name.
Then, what you are trying to do is very strongly discouraged by everyone, including The MathWorks. It would be much better to use a cell array yr and refer to yr{k} rather than iteratively named variables:
yr = cell(j-1,1);
for k = 2:j
yr{k-1} = cell2mat(fimat(:,k));
end
Anyway, if you still want to do this, you can use eval:
k = 2;
while k<=j
eval(['yr' num2str(k-1) ' = cell2mat(fimat(:,k));']);
k=k+1;
end
Best,
You cannot dynamically create variable names like you did. The left side of the = must be an identifier, not a char array. The alternative I recommend is to use a cell array instead of individual variable names. For example:
yr{k-1}=cell2mat(fimat(:,k));
If you must use variable names with numbers, which I strongly recommend against, you have to use eval for that line. Alternatives which I strongly recommend checking before using eval are a struct with dynamic field names and containers.Map.
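The two alternatives named above can be sketched like this. The small fimat below is a made-up stand-in for the real dataset (first column labels, remaining columns numeric), and the names yrStruct and yrMap are hypothetical.

```matlab
% Hypothetical stand-in for the dataset: labels in column 1, numbers after.
fimat = {'a', 1, 10; 'b', 2, 20; 'c', 3, 30};
[~, j] = size(fimat);

% Alternative 1: struct with dynamic field names (must be valid identifiers).
yrStruct = struct();
for k = 2:j
    name = sprintf('yr%d', k-1);                 % 'yr1', 'yr2', ...
    yrStruct.(name) = cell2mat(fimat(:, k));
end

% Alternative 2: containers.Map, whose char keys may start with a digit.
yrMap = containers.Map('KeyType', 'char', 'ValueType', 'any');
for k = 2:j
    yrMap(sprintf('%dyr', k-1)) = cell2mat(fimat(:, k));   % '1yr', '2yr', ...
end

firstColumn = yrStruct.yr1;     % access by field name
secondColumn = yrMap('2yr');    % access by key
```

The Map is the only one of the two that allows the original 1yr, 2yr naming, because its keys are ordinary strings rather than identifiers.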
Here is my answer to the question, for sharing purposes. Hope it will help and Thanks to the contributors of this post.
[i,j]=size(fimat); %get dimension of dataset (of cell type)
numdata=cell2mat(fimat(1:i,2:j)); %extract only numeric from dataset
for k=1:j-1
eval(sprintf('yr%d = numdata(:,k);', k)); % trailing semicolon suppresses the printout
end

Saving parts of Matlab cell array

I am using Matlab for some data collection, and I want to save the data after each trial (just in case something goes wrong). The data is organized as a cell array of cell arrays, basically in the format
data{target}{trial} = zeros(1000,19)
But the actual data gets up to >150 MB by the end of the collection, so saving everything after each trial becomes prohibitively slow.
So now I am looking at opting for the matfile approach (http://www.mathworks.de/de/help/matlab/ref/matfile.html), which would allow me to only save parts of the data. The problem: this doesn't support cells of cell arrays, which means I couldn't change/update the data for a single trial; I would have to re-save the entire target's data (100 trials).
So, my question:
Is there another different method I can use to save parts of the cell array to speed up saving?
(OR)
Is there a better way to format my data that would work with this saving process?
A not very elegant but possibly effective solution is to use trial as part of the variable name. That is, use not a cell array of cell arrays (data{target}{trial}), but just different cell arrays such as data_1{target}, data_2{target}, where 1, 2 are the values of the trial counter.
You could do that with eval: for example
trial = 1; % change this value in a for loop
eval([ 'data_' num2str(trial) '{target} = zeros(1000,19);']); % fill data_1{target}
You can then save the data for each trial in a different file. For example, this
eval([ 'save temp_save_file_' num2str(trial) ' data_' num2str(trial)])
saves data_1 in file temp_save_file_1, etc.
Update:
Actually, it does appear to be possible to index into cell arrays, just not inside cell arrays. Hence, if you store your data slightly differently, it seems like you can use matfile to update only part of it. See this example:
x = cell(3,4);
save x;
matObj = matfile('x.mat','writable',true);
matObj.x(3,4) = {eye(10)};
Note that this gives me a version warning, but it seems to work.
Hope this does the trick. However, still look into the next part of my answer as it may help you even more.
For calculations it is usually not required to save to disk after every iteration. An easy way to get a speedup (at the cost of a little more risk) is to save only after every n trials.
Like this for example:
maxTrial = 99;
saveEvery = 10;
for trial = 1:maxTrial
myFun; %Do your calculations here
if trial == maxTrial || mod(trial, saveEvery) == 0
save %Put your save command here
end
end
If your data is always at (or within) a certain size, you can also choose to store your data in a matrix rather than a cell array, then you can use indexing to save only part of the file.
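Applied to the question's layout, the matfile idea could look roughly like the sketch below: one flat cell array indexed as (target, trial) instead of cells inside cells. The file name, the sizes, and the placeholder data are assumptions. The file is saved with '-v7.3' because, as far as I know, matfile only does genuine partial reads and writes on version 7.3 files, which is probably what the version warning above refers to.

```matlab
% Sketch only: the file name and sizes below are made up for the demo.
dataFile = fullfile(tempdir, 'trial_data_demo.mat');
nTargets = 2;
nTrials = 3;

data = cell(nTargets, nTrials);           % one cell per (target, trial) pair
save(dataFile, 'data', '-v7.3');          % v7.3 format for partial access

matObj = matfile(dataFile, 'Writable', true);
for target = 1:nTargets
    for trial = 1:nTrials
        trialData = (target + trial) * ones(5, 19);   % placeholder measurement
        matObj.data(target, trial) = {trialData};     % update this one cell
    end
end

% Read a single trial back; parentheses return a 1x1 cell.
c = matObj.data(2, 3);
restored = c{1};
```

Note the parentheses on matObj.data: curly-brace indexing into the file variable is not supported, so the value is wrapped in a cell on write and unwrapped after reading.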
In response to @Luis, I will post another way to deal with the situation.
It is indeed an option to save data in named variables or files, but saving a named variable in a named file seems like too much.
If you only change the name of the file, you can save everything without using eval.
Assuming you are dealing with trial t:
filename = ['temp_save_file_' num2str(t)];
If you really want, you can use sprintf to zero-pad the number, for example sprintf('%03d', t) to write it as 001.
Now you can simply use this:
save(filename, 'myData')
To read the data back, construct the filename again and do something like this:
totalData = {}; % Initialize your total data
And then read them as you wrote them (inside a loop):
load(filename)
totalData{t} = myData;
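Put together, the write-then-read round trip described above might look like the following sketch. The output folder (tempdir), the number of trials, and the placeholder contents of myData are assumptions made for the demo.

```matlab
% Sketch: one MAT-file per trial, then collect everything again.
outDir = tempdir;            % demo location; use your own data folder
nTrials = 3;

% Writing: each trial goes into its own zero-padded file.
for t = 1:nTrials
    myData = t * ones(2, 2);                                  % placeholder data
    filename = fullfile(outDir, sprintf('temp_save_file_%03d.mat', t));
    save(filename, 'myData');                                 % variable name as text
end

% Reading: rebuild the same filenames and gather the results.
totalData = cell(1, nTrials);
for t = 1:nTrials
    filename = fullfile(outDir, sprintf('temp_save_file_%03d.mat', t));
    S = load(filename, 'myData');      % load into a struct, not the workspace
    totalData{t} = S.myData;
end
```

Loading into the struct S is a small design choice: it keeps the loaded variable from silently overwriting anything in the workspace.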

When is it appropriate to use a cell array vs. a struct in Matlab?

If I want to store some strings or matrices of different sizes in a single variable, I can think of two options: I could make a struct array and have one of the fields hold the data,
structArray(structIndex).structField
or I could use a cell array,
cellArray{cellIndex}
but is there a general rule-of-thumb of when to use which data structure? I'd like to know if there are downsides to using one or the other in certain situations.
In my opinion it's more a matter of convenience and code clarity. Ask yourself whether you would prefer to refer to your elements by number or by name. Use a cell array in the former case and a struct array in the latter. Think of it as a table without and with headers.
By the way you can easily convert between structures and cells with CELL2STRUCT and STRUCT2CELL functions.
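As a small illustration of that conversion, here is a sketch with made-up contents and field names:

```matlab
% A cell array holding mixed data, referred to by position.
c = {'site A'; [1 2 3]; magic(3)};

% Give each position a name; the 1 says the names run along dimension 1.
fields = {'name'; 'ids'; 'grid'};
s = cell2struct(c, fields, 1);

% s.name, s.ids and s.grid now refer to the same data by name.
ids = s.ids;

% And back again: struct2cell returns the field values in field order.
c2 = struct2cell(s);
```

The field names are lost on the way back, so keep the fields list around if you need to convert in both directions.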
If you use it for computation within a function, I suggest you use cell arrays, since they're more convenient to handle, thanks e.g. to CELLFUN.
However, if you use it to store data (and return output), it's better to return structures, since the field names are (should be) self-documenting, so you don't need to remember what information you had in column 7 of your cell array. Also, you can easily include a field 'help' in your structure where you can put some additional explanation of the fields, if necessary.
Structures are also useful for data storage since you can, if you want to update your code at a later date, replace them with objects without needing to change your code (at least if you preallocated your structure). They have the same syntax, but objects will allow you to add more functionality, such as dependent properties (i.e. properties that are calculated on the fly based on other properties).
Finally, note that cells and structures add a few bytes of overhead to every field. Thus, if you want to use them to handle large amounts of data, you're much better off to use structures/cells containing arrays, rather than having large arrays of structures/cells where the fields/elements only contain scalars.
This code suggests that cell arrays may be roughly twice as fast as structs for assignment and retrieval. I did not separate the two operations. One could easily modify the code to do that.
Running "whos" afterwards suggests that they use very similar amounts of memory.
My goal was to make a "list of lists" in python terminology. Perhaps an "array of arrays".
I hope this is interesting/useful!
%%%%%%%%%%%%%% StructVsCell.m %%%%%%%%%%%%%%%
clear all
M = 100; % number of repetitions
N = 2^10; % size of cell array and struct
for m = 1:M
% Fill up a template cell array with
% lists of randomly sized matrices with
% random elements.
template{N} = 0;
for n = 1:N
r1 = round(24*rand());
r2 = round(24*rand());
r3 = rand(round(r2*rand),round(r1*rand()));
template{n} = r3;
end
% Make a cell array equivalent
% to the template.
cell_array = template;
% Create a struct with the
% same data.
structure = struct('data',0);
for n = 1:N
structure(n).data = template{n};
end
% Time cell array
tic;
for n = 1:N
data = cell_array{n};
cell_array{n} = data';
end
cell_time(m) = toc;
% Time struct
tic;
for n = 1:N
data = structure(n).data;
structure(n).data = data';
end
struct_time(m) = toc;
end
str = sprintf('cell array: %0.4f',mean(cell_time));
disp(str);
str = sprintf('struct: %0.4f',mean(struct_time));
disp(str);
str = sprintf('struct_time / cell_time: %0.4f',mean(struct_time)/mean(cell_time));
disp(str);
% Check memory use
whos
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
First and foremost, I second yuk's answer. Clarity is generally more important in the long run.
However, you may have two more options depending on how irregularly shaped your data is:
Option 3: structScalar.structField(fieldIndex)
Option 4: structScalar.structField{cellIndex}
Among the four, #3 has the least memory overhead for large numbers of elements (it minimizes the total number of matrices), and by large numbers I mean >100,000. If your code lends itself to vectorizing on structField, it is probably a performance win, too. If you can't collect each element of structField into a single matrix, option 4 has the notational benefits without the memory & performance advantages of option 3. Both of these options make it easier to use arrayfun or cellfun on the entire dataset, at the expense of requiring you to add or remove elements from each field individually. The choice depends on how you use your data, which brings us back to yuk's answer -- choose what makes for the clearest code.
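The memory difference between option 1 (a struct array with one scalar per element) and option 3 (a scalar struct holding one array) can be checked with whos. This is only a sketch with an arbitrary element count; the exact byte counts depend on the MATLAB version, so none are quoted here.

```matlab
n = 1000;                                        % arbitrary number of elements

% Option 1: 1-by-n struct array, each element holding one scalar.
structArray = struct('value', num2cell(1:n));

% Option 3: scalar struct holding a single 1-by-n array.
structScalar = struct('value', 1:n);

infoArray = whos('structArray');
infoScalar = whos('structScalar');
fprintf('struct array:  %d bytes\n', infoArray.bytes);
fprintf('scalar struct: %d bytes\n', infoScalar.bytes);
```

The struct array pays the per-element overhead n times, while the scalar struct pays it once.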
