I would like to store multiple tables in one array. In my code below, I am creating two tables T1 and T2. I want to store these tables into one variable MyArray.
LastName = {'Sanchez';'Johnson';'Li';'Diaz';'Brown'};
Age = [38;43;38;40;49];
Smoker = logical([1;0;1;0;1]);
Height = [71;69;64;67;64];
Weight = [176;163;131;133;119];
BloodPressure = [124 93; 109 77; 125 83; 117 75; 122 80];
T1 = table(LastName,Age,Smoker);
T2 = table(Height,Weight,BloodPressure);
% The code below does not work
MyArray(1) = T1;
MyArray(2) = T2;
I know I can use a cell array but I would like to know if it is possible to create a table datatype array in MATLAB.
Because table already implements () indexing, it's not really clear to me how you would expect to index MyArray. Your example almost looks to me like MyArray = [T1, T2].
I'm not sure if it satisfies your needs, but you can have table objects with table variables, like this:
T = table(T1, T2);
You can then use indexing as normal, e.g.
T.T1.LastName{2}
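For comparison, the same nesting idea can be sketched in plain Python. This is only an analogy, not MATLAB behaviour: each "table" is assumed to be a dict of column lists, and the outer dict plays the role of table(T1, T2). The BloodPressure column is left out for brevity.

```python
# Hypothetical stand-ins for the MATLAB tables T1 and T2: each table is a
# dict mapping a column name to a list of values.
T1 = {
    "LastName": ["Sanchez", "Johnson", "Li", "Diaz", "Brown"],
    "Age": [38, 43, 38, 40, 49],
    "Smoker": [True, False, True, False, True],
}
T2 = {
    "Height": [71, 69, 64, 67, 64],
    "Weight": [176, 163, 131, 133, 119],
}

# The outer container plays the role of table(T1, T2).
T = {"T1": T1, "T2": T2}

# MATLAB's T.T1.LastName{2} is 1-based; Python indexing is 0-based.
second_last_name = T["T1"]["LastName"][1]
print(second_last_name)
```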
There was a time when
builtin('subsref',T1,substruct('()',{1}))
(for any custom class T1*) would skip calling the class-specific overloaded subsref and use the built-in method instead. This would be equivalent to T1(1), but ignoring whatever the class defined for that syntax. Similarly for subsasgn, which is the subscripted assignment operation T1(2)=T2. This allowed the creation and use of arrays of a class.
However, this seems to no longer work. Maybe it is related to the classdef-style classes, as the last time I used the trick above was before those were introduced.
I would suggest that you use cell arrays for this (even if the above still worked, I would not recommend it).
* Note that table is a custom class, you can edit table to see the source code.
I'm a little new to Julia and am trying to use the fill! method to improve code performance on Julia. Currently, I read a 2d array from a file say read_array and perform row-operations on it to get a processed_array as follows:
function preprocess(matrix)
# Initialise
processed_array= Array{Float64,2}(undef, size(matrix));
#first row of processed_array is the difference of first two row of matrix
processed_array[1,:] = (matrix[2,:] .- matrix[1,:]) ;
#last row of processed_array is difference of last two rows of matrix
processed_array[end,:] = (matrix[end,:] .- matrix[end-1,:]);
#all other rows of processed_array is the mean-difference of other two rows
processed_array[2:end-1,:] = (matrix[3:end,:] .- matrix[1:end-2,:]) .*0.5 ;
return processed_array
end
However, when I try using the fill! method I get a MethodError.
processed_array = copy(matrix)
fill!(processed_array [1,:],d[2,:]-d[1,:])
MethodError: Cannot convert an object of type Matrix{Float64} to an object of type Float64
I'll be glad if someone can tell me what I'm missing and also suggest a method to optimize the code. Thanks in advance!
fill!(A, x) is used to fill the array A with a single value x, so it's not what you want anyway.
What you could do for a small performance gain is to broadcast the assignments, that is, use .= instead of =. If you want, you can also use the @. macro to add the dots everywhere automatically (for maybe cleaner/easier-to-read code):
function preprocess(matrix)
out = Array{Float64,2}(undef, size(matrix))
@views @. out[1,:] = matrix[2,:] - matrix[1,:]
@views @. out[end,:] = matrix[end,:] - matrix[end-1,:]
@views @. out[2:end-1,:] = 0.5 * (matrix[3:end,:] - matrix[1:end-2,:])
return out
end
return out
end
For optimal performance, I think you probably want to write the loops explicitly and use multithreading with a package like LoopVectorization.jl for example.
PS: Note that in your code comments you wrote "cols" instead of "rows", and you wrote "mean" but take a difference. (Not sure it was intentional.)
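To make the row arithmetic explicit, here is a minimal plain-Python sketch of the same computation with the loop written out. It is an illustration only, not a performance recommendation for Julia; the input is assumed to be a list of equal-length rows, and the sample data is made up.

```python
def preprocess(matrix):
    """Row-wise differences: one-sided at the edges, halved central inside."""
    n_rows = len(matrix)
    if n_rows < 2:
        raise ValueError("need at least two rows")
    out = []
    for i in range(n_rows):
        if i == 0:
            # First row: difference of the first two rows.
            lo, hi, scale = 0, 1, 1.0
        elif i == n_rows - 1:
            # Last row: difference of the last two rows.
            lo, hi, scale = n_rows - 2, n_rows - 1, 1.0
        else:
            # Interior rows: half the difference of the two neighbours.
            lo, hi, scale = i - 1, i + 1, 0.5
        out.append([scale * (b - a) for a, b in zip(matrix[lo], matrix[hi])])
    return out


# Made-up sample input, 4 rows by 2 columns.
example = [[1.0, 10.0], [2.0, 20.0], [4.0, 40.0], [7.0, 70.0]]
result = preprocess(example)
print(result)
```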
I have an existing .py file that prints a classifier.predict for a SVC model. I would like to loop through each row in the X feature set to return a prediction.
I am currently trying to define the element from which to iterate over so as to allow for definition of the test statistic feature set X.
The test statistic feature set X is written in code as:
X_1 = xspace.iloc[testval-1:testval, 0:5]
testval is the element name used in the for loop in the above line:
for testval in X.T.iterrows():
    print(testval)
I am having trouble returning a basic set of index values for X (X is the pandas dataframe)
I have tested the following with no success.
for index in X.T.iterrows():
    print(index)
for index in X.T.iteritems():
    print(index)
I am looking for the set of index values, with base 1 if possible, like 1,2,3,4,5,6,7,8,9,10...n
seemingly simple stuff...i haven't located an existing question via stackoverflow or google.
ALSO, the individual dataframes I used as the basis for X were refined with the line:
df1.set_index('Date', inplace = True)
Because dates were used as the basis for the concatenation of the individual dataframes the loops as written above are returning date values rather than
location values as I would prefer hence:
X_1 = xspace.iloc[testval-1:testval, 0:5]
where iloc, location is noted
please ask for additional code if you'd like to see more
the loops i've done thus far are returning date values, I would like to return index values of the location of the rows to accommodate the line:
X_1 = xspace.iloc[testval-1:testval, 0:5]
The loop structure below seems to be working for my application.
i = 1
j = list(range(1, len(X), 1))
for i in j:
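A minimal sketch of the positional-loop idea in plain Python, with a hypothetical list of (date, features) pairs standing in for the date-indexed DataFrame X. enumerate(..., start=1) yields the 1-based positions 1, 2, ..., n asked for above, independent of the date labels; with pandas, the same counter could presumably drive the X_1 = xspace.iloc[testval-1:testval, 0:5] line.

```python
# Hypothetical date-labelled rows standing in for the DataFrame X.
rows = [
    ("2020-01-01", [0.1, 0.2, 0.3, 0.4, 0.5]),
    ("2020-01-02", [1.1, 1.2, 1.3, 1.4, 1.5]),
    ("2020-01-03", [2.1, 2.2, 2.3, 2.4, 2.5]),
]

positions = []
for testval, (date, features) in enumerate(rows, start=1):
    # testval is the 1-based row position; date is the label it replaces.
    # The slice below has the same shape as iloc[testval-1:testval]:
    # a one-element selection containing just the current row.
    current = rows[testval - 1:testval]
    positions.append(testval)
    print(testval, date, current[0][1])
```

Unlike range(1, len(X), 1), which stops at n-1, this loop also visits the last row.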
I have a cell array that I need to split into several matrices so that I can take the sum of subsets of the data. This is a sample of what I have:
A = {'M00.300', '1644.07';...
'M00.300', '9745.42'; ...
'M00.300', '2232.88'; ...
'M00.600', '13180.82'; ...
'M00.600', '2755.19'; ...
'M00.600', '15800.38'; ...
'M00.900', '18088.11'; ...
'M00.900', '1666.61'};
I want the sum of the second columns for each of 'M00.300', 'M00.600', and 'M00.900'. For example, to correspond to 'M00.300' I would want 1644.07 + 9745.42 + 2232.88.
I don't want to just hard code it because each data set is different, so I need the code to work for different size cell arrays.
I'm not sure of the best way to do this, I was going to begin by looping through A and comparing the strings in the first column and creating matrices within that loop, but that sounded messy and not efficient.
Is there a simpler way to do this?
Classic use of accumarray. You would use the first column as an index and the second column as the values associated with each index. accumarray works where you group values that belong to the same index together and you apply a function to those values. In your case, you'd use the default behaviour and sum things together.
However, you'll need to convert the first column into numeric labels. The third output of unique will help you do this. You'll also need to convert the second column into a numeric array and so str2double is a perfect way to do this.
Without further ado:
[val,~,id] = unique(A(:,1)); %// Get unique values and indices
out = accumarray(id, str2double(A(:,2))); %// Aggregate the groups and sum
format long g; %// For better display of precision
T = table(val, out) %// Display on a nice table
I get this:
>> T = table(val, out)
T =
val out
_________ ________
'M00.300' 13622.37
'M00.600' 31736.39
'M00.900' 19754.72
The above uses the table class that is available from R2013b and onwards. If you don't have this, you can perhaps use a for loop and print out each cell and value separately:
for idx = 1 : numel(out)
fprintf('%s: %f\n', val{idx}, out(idx));
end
We get:
M00.300: 13622.370000
M00.600: 31736.390000
M00.900: 19754.720000
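The same group-and-sum idea, sketched in plain Python for comparison. This is not accumarray itself, just the equivalent logic under the assumption that the data arrives as pairs of strings: a dictionary keyed by the label takes the role of the numeric IDs returned by unique, and float() takes the role of str2double.

```python
from collections import defaultdict

# The sample data from the question, as (label, value-as-string) pairs.
A = [
    ("M00.300", "1644.07"),
    ("M00.300", "9745.42"),
    ("M00.300", "2232.88"),
    ("M00.600", "13180.82"),
    ("M00.600", "2755.19"),
    ("M00.600", "15800.38"),
    ("M00.900", "18088.11"),
    ("M00.900", "1666.61"),
]

# Accumulate a running sum per label; missing labels start at 0.0.
totals = defaultdict(float)
for label, value in A:
    totals[label] += float(value)

# Print the groups in sorted label order, two decimals each.
for label in sorted(totals):
    print(f"{label}: {totals[label]:.2f}")
```

Because the grouping is driven by the labels actually present, this works for any number of groups and any number of rows.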
So I would like to optimize my code such that I can look through an array such as
{'one','two','three'}
and create corresponding variables defined as tables or arrays
such as
one = table()
two = table()
three = table()
I am aware of the eval function however I would like to use this function in a loop s.t I allocate values to the new variable right after i create it
If I am understanding your question properly, given a cell array consisting only of strings, you wish to create variables in your workspace, one per string, each named after the corresponding string in this cell array.
You could use eval, but I'm going to recommend something other than eval. eval should be avoided, and instead of reiterating the reasons here, I'll point you to Loren Shure's article on eval.
In any case, I would recommend you place these variables as dynamic fields inside a structure instead. Do something like this:
s = struct()
strs = {'one', 'two', 'three'};
for idx = 1 : numel(strs)
s.(strs{idx}) = table();
end
In this case, s would be a structure, and you can access each of the corresponding variables with the dot operator:
d = s.one; %// or
d2 = s.two; %// or
d3 = s.three;
If you want to place this into a function per se, you can place this into a function like so:
function [s] = createVariables(strs)
s = struct();
for idx = 1 : numel(strs)
s.(strs{idx}) = table();
end
This function will take in a cell array of strings, and outputs a structure that contains fields that correspond to the cell array of strings you put in. As such, you'd call it like so:
strs = {'one', 'two', 'three'};
s = createVariables(strs);
However, if you really really... I mean really... want to use eval, you can create your workspace variables like so:
strs = {'one', 'two', 'three'};
for idx = 1 : numel(strs)
eval([strs{idx} ' = table();']);
end
To place this into a function, do:
function [] = createVariables(strs)
for idx = 1 : numel(strs)
eval([strs{idx} ' = table();']);
end
However, be warned that if you run the function above, these variables will only be defined in the scope that the function was run in. You will not see these variables when the function exits. If you want to run a function so that the variables get defined in the workspace after you run the function, then eval is not the right solution for you. You should thus stick with the dynamic field approach that I talked about at the beginning of this post.
In any case, this will create one, two and three as workspace variables that are initialized to empty tables. However, I will argue with you that the first line of code is easier to read than the second piece of code, which is one of the main arguing points as to why eval should be avoided. If you stare at the second piece of code long enough, then you can certainly see what we're trying to achieve, but if you read the first piece of code, you can ascertain its purpose more clearly.
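The "named container instead of eval" pattern is not specific to MATLAB. As a rough analogy in Python, a dict keyed by the names avoids creating variables dynamically. The function name create_variables and the use of an empty list as a stand-in for an empty table are assumptions made for this sketch.

```python
def create_variables(names):
    """Return a dict with one empty "table" (here: a list of rows) per name."""
    return {name: [] for name in names}


strs = ["one", "two", "three"]
s = create_variables(strs)

# Values can be assigned right after creation, inside a loop if needed.
for idx, name in enumerate(strs, start=1):
    s[name].append({"row_id": idx})

d = s["one"]  # analogous to s.one in the MATLAB version
print(d)
```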
I have an array within an array and I am trying to name the variables using a for loop as there are a lot of variables. When I use the following simple code Time1 = dataCOMB{1,1}{1,1}(1:1024, 1); it opens the first cell in an array and proceeds to open the first cell in the following array and finally defines all the values in column 1 rows 1 to 1024 as Time1. However I have 38 of these different sets of data and when I apply the following code:
for t = 1:38
for aa = 1:38
Time(t) = dataCOMB{1,1}{1,aa}(1:1024, 1);
end
end
I get an error
In an assignment A(I) = B, the number of elements in B and I must be the same.
Error in Load_Files_working (line 39)
Time(t) = dataCOMB{1,1}{1,aa}(1:1024, 1);
Basically I am trying to get matlab to call the first column in each data set Time1, Time2, etc.
The problem:
1) You'd want to extract in a cell row...
2) ...the first 1024 numbers in the 1st column...
3) ...from each of the first 38 cells of a cell array.
The plan:
1) If one wants to get info from each element of a cell array (that is, an array accessed via {} indexing), one may use cellfun. Calling cellfun(some_function, a_cell_array) will aggregate the results of some_function(a_cell_array{k}) for all possible k subscripts. If the results are heterogeneous (i.e. not having the same type and size), one may use the cellfun(..., 'UniformOutput', false) option to put them in an output cell array (cell arrays are good at grouping together heterogeneous data).
2) To extract the first 1024 numbers from the first column of a numeric array x one may use this anonymous function: @(x) x(1:1024,1). The x argument will come from each element of a cell array, and our anonymous function will play the role of some_function in the step above.
3) Now we need to specify a_cell_array, i.e. the cell array that contains the first 38 cells of the target. That would be, simply dataCOMB{1,1}(1,1:38).
The solution:
This one-liner implements the plan:
Time = cellfun(@(x) x(1:1024,1), dataCOMB{1,1}(1,1:38), 'UniformOutput', false);
Then you can access your data as in this example:
this_time = Time{3};
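The map-a-function-over-each-cell plan above has a direct counterpart in a Python list comprehension. The sketch below is only an analogy: data_sets is a made-up stand-in for dataCOMB{1,1} (40 data sets of 1100 rows and 2 columns), and first_column is a hypothetical helper playing the role of the anonymous function.

```python
# Made-up stand-in for dataCOMB{1,1}: 40 data sets, each 1100 rows x 2 columns.
data_sets = [
    [[row + 1000 * k, -row] for row in range(1100)]
    for k in range(40)
]


def first_column(x, n_rows=1024):
    """Return the first n_rows entries of column 0 of a row-major 2-D list."""
    return [row[0] for row in x[:n_rows]]


# One result per data set, for the first 38 data sets only.
times = [first_column(x) for x in data_sets[:38]]

# MATLAB's Time{3} is 1-based; the third data set is index 2 here.
this_time = times[2]
print(len(times), len(this_time), this_time[:3])
```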
Your error is with Time(t). That's not how you create a new variable in MATLAB. To do exactly what you want (i.e., create variables named Time1, Time2, etc.), you'll need to use the eval function:
for aa = 1:38
eval(['Time' num2str(aa) '= dataCOMB{1,1}{1,aa}(1:1024,1);']);
end
Many people do not like recommending the eval function. Others wouldn't recommend moving all of your data out of a data structure and into their own independently-named variables. So, to address these two criticisms, a better alternative might be to pull your data out of your complicated data structure and to put it into a simpler array:
Time_Array = zeros(1024,38);
for aa = 1:38
Time_Array(:,aa) = dataCOMB{1,1}{1,aa}(1:1024,1);
end
Or, if you don't like that because you really like the names Time1, Time2, etc, you could create them as fields to a data structure:
Time_Data = [];
for aa = 1:38
fieldname = ['Time' num2str(aa)];
Time_Data.(fieldname) = dataCOMB{1,1}{1,aa}(1:1024,1);
end
And, in response to a comment below by the original poster, this method can be extended to further unpack the data:
Time_Data = [];
count = 0;
for z = 1:2;
for aa = 1:38
count = count+1;
fieldname = ['Time' num2str(count)];
Time_Data.(fieldname) = dataCOMB{1,z}{1,aa}(1:1024,1);
end
end
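The running-counter naming scheme in the last loop can be sketched in Python with a dict keyed by generated names. Everything about the data here is an assumption made for illustration: data_comb is a made-up nested list with 2 outer cells of 38 data sets each, and every row stores a (z, aa, r) tuple in its first column so the origin of each value is easy to check.

```python
N_OUTER, N_SETS, N_ROWS = 2, 38, 1024

# Made-up stand-in for dataCOMB: data_comb[z][aa] is a list of 2-column rows.
data_comb = [
    [[[(z, aa, r), None] for r in range(N_ROWS)] for aa in range(N_SETS)]
    for z in range(N_OUTER)
]

time_data = {}
count = 0
for z in range(N_OUTER):
    for aa in range(N_SETS):
        # The counter keeps increasing across outer cells, so names never clash.
        count += 1
        time_data[f"Time{count}"] = [row[0] for row in data_comb[z][aa][:N_ROWS]]

print(len(time_data), time_data["Time39"][0])
```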