I have a simple problem, but I am wondering if there is a simpler way to solve it (there must be).
I have a 10-by-10 matrix of doubles, and I need to build a time series from those data points.
The way I am doing it is as follows: I create a 3D array whose third dimension is time, and every day I append the new data by growing the time dimension by one.
Here is the code:
TS_updated = zeros(size(TS_Current)+[0,0,1]);
TS_updated(:,:,1:end-1) = TS_Current;
TS_updated(:,:,end) = TS_New;
where TS_Current is the existing 3D array representing the time series and TS_New is today's new data, which I need to append.
Is there a quicker way to append the last element, as there is with a 2D table:
TS_updated = [TS_Current;TS_New];
Or maybe even a smarter way to store the time series?
You can also use
TS_Current(:,:,end+1) = TS_New;
And you might also want to preallocate if you intend to extend the series more often than once per day: start with any length and double the allocated space whenever that limit is reached.
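A minimal sketch of that doubling strategy, with hypothetical bookkeeping (nDays counts the filled slices; initialize nDays = 0 and TS = zeros(10,10,initialLength) once):
if nDays == size(TS, 3)   % buffer is full: double the allocated space
    TS(:, :, 2*end) = 0;  % grows TS to twice its current length in time
end
nDays = nDays + 1;
TS(:, :, nDays) = TS_New; % write today's slice
% read the live part of the series as TS(:, :, 1:nDays)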
I don't see a clearly better way of arranging the data. You might flatten it to 100-by-time instead of 10-by-10-by-time, but whether that helps depends on how you use it.
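If you did want the flat layout, reshape gets you there (a sketch; each column then holds one day's 10x10 matrix in column-major order):
TS_flat = reshape(TS_Current, [], size(TS_Current, 3)); % 100 x Time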
Use the cat function in the third dimension:
TS_updated = cat(3, TS_Current, TS_New);
You could include error checking first by using
% Check dimensions 1 and 2 are consistent first
if size(TS_Current,1) == size(TS_New,1) && size(TS_Current,2) == size(TS_New,2)
% Now concatenate
TS_updated = cat(3, TS_Current, TS_New);
else
error('New time series has incorrect dimensions')
end
You want to concatenate in the 3rd dimension?
A=ones(3,3,2)
B=rand(3,3);
C=cat(3,A,B)
I want to add a constant to the second column of an array. (My current approach and example values were shown in screenshots.) What is the most efficient way of adding a constant to an array column?
With a question about efficiency you should supply numbers. For anything smaller than a 1000 x 1000 2D array I can't measure the difference. Usually it is best to simply test it.
My code for testing (same approach as crossrulz's answer) was attached as a snippet.
With a 10000 x 10000 array option 2 becomes about 10 times faster.
One comment: unless you are in a very demanding situation, readability is usually preferred over efficiency. In my opinion option 2 is more readable, since it has no for loop and the constant is presented as a constant instead of an array.
But you can get more efficient than that by using the In Place Element structure. The original answer showed two ways to add 5 to a column as images; the second avoids making a memory copy of the entire array. Indexing out a column with Index Array and then modifying it requires a shift of the underlying memory layout, even though the column is going to be put back with Replace Array Subset. The In Place Element structure gives LabVIEW enough context to recognize that the Add can be done without data copies.
Index Array to get the second column, add your constant, and then Replace Array Subset to replace the second column.
I have a variable that is created by a loop. The variable is large enough and in a complicated enough form that I want to save the variable each time it comes out of the loop with a different name.
PM25 is my variable, but I want to save it as PM25_year, where the year changes based on str = fname(13:end).
PM25 = permute(reshape(E',[c,r/nlay,nlay]),[2,1,3]); % Reshape and permute to achieve the right shape. Each face of the 3D should be one day
str = fname(13:end); % The year
% Third dimension is organized so that the data for each site is on a face
save('PM25_Daily_US.mat', 'PM25_str', '-append') % my attempt: this saves a variable literally named PM25_str, not PM25_2008
The str would be a year, like 2008. So the variable saved would be PM25_2008, then PM25_2009, etc. as it is created.
Defining new variables based on data isn't considered best practice, but you can store your data more efficiently using a cell array. You can store even a large, complicated variable like your PM25 variable within a single cell. Here's how you could go about doing it:
Place your PM25 data for each year into the cell array C inside your loop:
for i = 1:numberOfYears
    % ... compute PM25 for year i here, as in your existing loop ...
    C{i} = PM25;
end
Resulting in something like this:
C = { PM25_2005, PM25_2006, PM25_2007 };
Now let's say you want to obtain your variable for the year 2006. This is easy (assuming you aren't skipping years). The first year of your data will correspond to position 1, the second year to position 2, etc. So to find the index of the year you want:
minYear = 2005;
yearDesired = 2006;
index = yearDesired - minYear + 1;
PM25_2006 = C{index};
You can do this using eval, but note that it's often not considered good practice. eval may be a security risk, as it allows user input to be executed as code. A better way to do this may be to use a cell array or an array of objects.
That said, I think this will do what you want:
for year = 2008:2014
    % double the quote to escape the transpose operator inside the sprintf string
    eval(sprintf('PM25_%d = permute(reshape(E'',[c,r/nlay,nlay]),[2,1,3]);', year));
    save('PM25_Daily_US.mat', sprintf('PM25_%d', year), '-append'); % the file must already exist for -append
end
I do not recommend setting variables like this, since there is no way to track them and it completely prevents the error checking that MATLAB does beforehand; this kind of code is handled entirely at runtime.
Anyway, in case you have a really good reason for doing this, I recommend the function assignin:
assignin('caller', ['myvar',num2str(1)], 63);
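Applied to the PM25 case from the question, that would be something like this (a sketch, assuming str holds the year as text, e.g. '2008'):
assignin('caller', ['PM25_' str], PM25); % creates e.g. PM25_2008 in the calling workspace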
I have a 60x60x35 array and would like to run the Wilcoxon signed rank test to determine whether the median of each element across the third dimension (i.e. across the 35 values) differs from zero. Thus, I would like my results in two 60x60 arrays: one with values of 0 and 1 depending on the test decision, and a separate array with the corresponding p values.
The problem I am facing is specifying the command so that the output has the appropriate dimensions and is computed across the appropriate dimension of the array.
Thanks for your help and all the best!
One way to solve your problem is a nested for-loop. Let's say your data is stored in data:
data=rand(60,60,35);
size_data = size(data);
p = nan(size_data(1), size_data(2)); % preallocate; NaN marks entries that were never written
h = nan(size_data(1), size_data(2));
for k=1:size_data(1)
for l=1:size_data(2)
tmp_data=data(k,l,:);
tmp_data=reshape(tmp_data,1,numel(tmp_data));
[p(k,l), h(k,l)]=signrank(tmp_data);
end
end
What I am doing is preallocating p and h as 60x60 matrices filled with NaN, so you can easily see if something went wrong (0 would look like a valid result). Then I loop over all elements, pull the 35 values for the current element into a temporary variable, and reshape them into a vector, since signrank expects a vector.
You might be able to replace the explicit loops with arrayfun, though signrank still runs once per element.
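A sketch of that loop-free form, assuming the same data variable as above (don't expect a large speedup):
[K, L] = ndgrid(1:size(data, 1), 1:size(data, 2)); % index grids covering every element
[p, h] = arrayfun(@(k, l) signrank(squeeze(data(k, l, :))), K, L);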
I am using Matlab for some data collection, and I want to save the data after each trial (just in case something goes wrong). The data is organized as a cell array of cell arrays, basically in the format
data{target}{trial} = zeros(1000,19)
But the actual data gets up to >150 MB by the end of the collection, so saving everything after each trial becomes prohibitively slow.
So now I am looking at opting for the matfile approach (http://www.mathworks.de/de/help/matlab/ref/matfile.html), which would allow me to only save parts of the data. The problem: this doesn't support cells of cell arrays, which means I couldn't change/update the data for a single trial; I would have to re-save the entire target's data (100 trials).
So, my question:
Is there another different method I can use to save parts of the cell array to speed up saving?
(OR)
Is there a better way to format my data that would work with this saving process?
A not very elegant but possibly effective solution is to use trial as part of the variable name. That is, instead of a cell array of cell arrays (data{target}{trial}), use separate cell arrays such as data_1{target}, data_2{target}, where 1, 2 are the values of the trial counter.
You could do that with eval: for example
trial = 1; % change this value in a for loop
eval([ 'data_' num2str(trial) '{target} = zeros(1000,19);']); % fill data_1{target}
You can then save the data for each trial in a different file. For example, this
eval([ 'save temp_save_file_' num2str(trial) ' data_' num2str(trial)])
saves data_1 in file temp_save_file_1, etc.
Update:
Actually, it does appear to be possible to index into cell arrays with matfile, just not into cells nested inside cell arrays. Hence, if you store your data slightly differently, it seems you can use matfile to update only part of it. See this example:
x = cell(3,4);
save x;
matObj = matfile('x.mat','writable',true);
matObj.x(3,4) = {eye(10)};
Note that this gives me a version warning, but it seems to work.
Hope this does the trick. However, still look into the next part of my answer as it may help you even more.
For calculations it is usually not required to save to disk after every iteration. An easy way to get a speedup (at the cost of a little more risk) is to save only after every n trials.
Like this for example:
maxTrial = 99;
saveEvery = 10;
for trial = 1:maxTrial
myFun; %Do your calculations here
if trial == maxTrial || mod(trial, saveEvery) == 0
save %Put your save command here
end
end
If your data is always at (or within) a certain size, you can also choose to store your data in a matrix rather than a cell array, then you can use indexing to save only part of the file.
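For instance, with fixed-size trials you could write each trial straight into the MAT-file (a sketch; nTargets, nTrials, and newTrialData are hypothetical names):
m = matfile('data.mat', 'Writable', true);
m.data = zeros(1000, 19, nTargets, nTrials); % write the full-size array once
% ...after each trial, update only that trial's block:
m.data(:, :, target, trial) = newTrialData;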
In response to @Luis, I will post another way to deal with the situation.
It is indeed an option to save data in named variables or files, but to save a named variable in a named file seems too much.
If you only change the name of the file, you can save everything without using eval:
assuming you are dealing with trial 't':
filename = ['temp_save_file_' num2str(t)];
If you really want, you can use sprintf (e.g. sprintf('%03d', t)) to write it as 001, for example.
Now you can simply use this:
save(filename, 'myData')
To read it back, construct the filename again and do something like this:
totalData = {}; %Initialize your total data
And then read them as you wrote them (inside a loop):
load(filename)
totalData{t} = myData;
If I want to store some strings or matrices of different sizes in a single variable, I can think of two options: I could make a struct array and have one of the fields hold the data,
structArray(structIndex).structField
or I could use a cell array,
cellArray{cellIndex}
but is there a general rule-of-thumb of when to use which data structure? I'd like to know if there are downsides to using one or the other in certain situations.
In my opinion it's more a matter of convenience and code clarity. Ask yourself whether you would prefer to refer to your variable's elements by number or by name; use a cell array in the former case and a struct array in the latter. Think of it as a table with and without headers.
By the way you can easily convert between structures and cells with CELL2STRUCT and STRUCT2CELL functions.
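For example, a small round trip:
c = {1; 'two'};                    % 2x1 cell
s = cell2struct(c, {'a', 'b'}, 1); % scalar struct with fields a and b
c2 = struct2cell(s);               % back to a 2x1 cell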
If you use it for computation within a function, I suggest you use cell arrays, since they're more convenient to handle, thanks e.g. to CELLFUN.
However, if you use it to store data (and return output), it's better to return structures, since the field names are (should be) self-documenting, so you don't need to remember what information you had in column 7 of your cell array. Also, you can easily include a field 'help' in your structure where you can put some additional explanation of the fields, if necessary.
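For example, a returned structure might look like this (the field names and data are made up for illustration):
out.temperatures = rand(100, 3); % stand-in for your real data
out.units = 'degC';
out.help = 'temperatures: one row per sample, one column per sensor';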
Structures are also useful for data storage since you can, if you want to update your code at a later date, replace them with objects without needing to change your code (at least if you pre-assigned your structure fields). They have the same syntax, but objects will allow you to add more functionality, such as dependent properties (i.e. properties that are calculated on the fly from other properties).
Finally, note that cells and structures add a few bytes of overhead to every field. Thus, if you want to use them to handle large amounts of data, you're much better off to use structures/cells containing arrays, rather than having large arrays of structures/cells where the fields/elements only contain scalars.
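A quick way to see the difference (a sketch; whos compares the two layouts):
soa.x = rand(1e5, 1);                      % a structure containing one array
aos = struct('x', num2cell(rand(1e5, 1))); % a 100000-element array of structures
whos soa aos                               % aos carries per-element overhead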
This code suggests that cell arrays may be roughly twice as fast as structs for assignment and retrieval. I did not separate the two operations. One could easily modify the code to do that.
Running "whos" afterwards suggests that they use very similar amounts of memory.
My goal was to make a "list of lists" in python terminology. Perhaps an "array of arrays".
I hope this is interesting/useful!
%%%%%%%%%%%%%% StructVsCell.m %%%%%%%%%%%%%%%
clear all
M = 100; % number of repetitions
N = 2^10; % size of cell array and struct
for m = 1:M
% Fill up a template cell array with
% lists of randomly sized matrices with
% random elements.
template{N} = 0; % touching the last element preallocates the whole cell array
for n = 1:N
r1 = round(24*rand());
r2 = round(24*rand());
r3 = rand(round(r2*rand),round(r1*rand()));
template{n} = r3;
end
% Make a cell array equivalent
% to the template.
cell_array = template;
% Create a struct with the
% same data.
structure = struct('data',0);
for n = 1:N
structure(n).data = template{n};
end
% Time cell array
tic;
for n = 1:N
data = cell_array{n};
cell_array{n} = data';
end
cell_time(m) = toc;
% Time struct
tic;
for n = 1:N
data = structure(n).data;
structure(n).data = data';
end
struct_time(m) = toc;
end
str = sprintf('cell array: %0.4f',mean(cell_time));
disp(str);
str = sprintf('struct: %0.4f',mean(struct_time));
disp(str);
str = sprintf('struct_time / cell_time: %0.4f',mean(struct_time)/mean(cell_time));
disp(str);
% Check memory use
whos
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
First and foremost, I second yuk's answer. Clarity is generally more important in the long run.
However, you may have two more options depending on how irregularly shaped your data is:
Option 3: structScalar.structField(fieldIndex)
Option 4: structScalar.structField{cellIndex}
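For concreteness, the two extra layouts look like this (hypothetical data):
% Option 3: scalar struct, one matrix per field
s3.heights = [1.70; 1.82; 1.65];
s3.heights(2)   % access by index; easy to vectorize over
% Option 4: scalar struct, one cell array per field
s4.notes = {'tall', [1 2], 'short'};
s4.notes{2}     % elements may differ in type and size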
Among the four, #3 has the least memory overhead for large numbers of elements (it minimizes the total number of matrices), and by large numbers I mean >100,000. If your code lends itself to vectorizing on structField, it is probably a performance win, too. If you can't collect each element of structField into a single matrix, option 4 has the notational benefits without the memory & performance advantages of option 3. Both of these options make it easier to use arrayfun or cellfun on the entire dataset, at the expense of requiring you to add or remove elements from each field individually. The choice depends on how you use your data, which brings us back to yuk's answer -- choose what makes for the clearest code.