An If Statement inside an apply - arrays

I'm trying to use apply() to go through an array by rows, look at a column of 1's and 0's and then populate another column in that same array by using a function if the first column is a one, and a different function if it's a 0.
So it would be something like...
apply(OutComes, 1, if(risk = 1) {OutComes[, "Age"] = Function_1} else{OutComes[, "Age"] = Function_2} )
where OutComes is the array in question and risk is the variable which determines which function we use.
The aim is that 2 functions determine life length and people fall into one of the two categories, each with its own function. Based on the risk group, I want to use a different function to calculate the age, but this doesn't seem to be working.

apply() needs a function as its third argument. Since no ready-made function does what you want, you need to define one here (an anonymous function is fine).
For example, apply(OutComes, 1, sum) returns the sum of each row.
The output vector has one element per row, so you can assign it to a variable and then either append it with cbind or replace the values of an existing column.
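new_age <- apply(OutComes, 1, function(x) {
  # x is the row currently being processed
  # use x[n] if n is the column number of "risk", or x["risk"] if the columns are named
  # note == (comparison) rather than = (assignment) inside if()
  if (x["risk"] == 1) {
    Function_1()
  } else {
    Function_2()
  }
})
OutComes <- cbind(OutComes, new_age)
# or, to overwrite an existing column ($ only works on data frames and lists, not matrices):
OutComes[, "Age"] <- new_age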
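For comparison, here is a minimal Python sketch of the same row-wise dispatch. The names function_1 and function_2, the base_age field, and the +10/+30 rules are hypothetical stand-ins for Function_1 and Function_2 (the question does not define them), and rows are modeled as dicts rather than an R matrix.

```python
# Hypothetical stand-ins for Function_1 / Function_2: each maps a row to an age.
def function_1(row):
    return row["base_age"] + 10   # assumed rule for the risk == 1 group

def function_2(row):
    return row["base_age"] + 30   # assumed rule for the risk == 0 group

def assign_age(rows):
    """Row-wise dispatch, analogous to apply(OutComes, 1, ...) in R."""
    # Comparison (==), not assignment (=), picks the branch for each row.
    ages = [function_1(row) if row["risk"] == 1 else function_2(row) for row in rows]
    # One result per row, like the vector returned by apply(..., 1, ...);
    # write it back into an "Age" field, like OutComes[, "Age"] <- new_age.
    for row, age in zip(rows, ages):
        row["Age"] = age
    return ages

outcomes = [
    {"risk": 1, "base_age": 40},
    {"risk": 0, "base_age": 40},
    {"risk": 1, "base_age": 50},
]
print(assign_age(outcomes))  # [50, 70, 60]
```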

Related

How to sum only the max value for each common prefix in an array in Scala

I have an array of string items in Scala; each item consists of a prefix, then ||, then a double value, like below:
var y = Array("Zara||6.0", "Nuha||4.0","Zara||2.0","Zara||0.1")
What I want to do:
I need to sum all the double values from the above array (y(i).split("\\|\\|")(1)), but if a prefix is duplicated in the array, I only want to sum its max value, like below:
for item Zara we have 3 values, so I want to take the max (in our sample, 6.0)
for item Nuha it is unique, so I will take its value (4.0)
the expected output is (6.0 + 4.0) = 10.0
Is there any way to do this in Scala without using two nested loops?
Prepare your array: extract the prefix and value of each item into a tuple. Then use foldLeft to aggregate the max element for each prefix into a Map, and sum the Map's values.
val res = y.map(_.split("\\|\\|")).map(arr => (arr(0), arr(1).toDouble))
.foldLeft(Map.empty[String, Double]) { (acc, elem) =>
val value = acc.get(elem._1).map(math.max(_, elem._2)).getOrElse(elem._2)
acc + (elem._1 -> value)
}.values.sum
println(res)
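The same fold can be sketched in Python with functools.reduce, accumulating a dict from prefix to the largest value seen so far. This is an illustrative translation, not part of the original answer; the function name is made up.

```python
from functools import reduce

def sum_of_max_per_prefix(items):
    """Sum the largest value seen for each prefix in "prefix||value" strings."""
    # str.split takes a literal separator, so "||" needs no regex escaping here.
    pairs = ((p, float(v)) for p, v in (item.split("||") for item in items))

    def step(acc, pair):
        prefix, value = pair
        # Return a new dict, mirroring the immutable Map in the Scala fold.
        return {**acc, prefix: max(acc.get(prefix, value), value)}

    best = reduce(step, pairs, {})
    return sum(best.values())

y = ["Zara||6.0", "Nuha||4.0", "Zara||2.0", "Zara||0.1"]
print(sum_of_max_per_prefix(y))  # 10.0
```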
You can do it in pretty much one step (technically three, but only groupMapReduce addresses your requirement specifically; everything else, the split and the sum, is a given either way). Note that groupMapReduce requires Scala 2.13 or later.
y
.view
.map(_.split("""\|\|"""))
.groupMapReduce(_.head)(_.last.toDouble)(_ max _)
.values
.sum
Also ... do not use vars, even if you are just putting together a quick sample. Vars are evil; just pretend they do not exist at all ... at least for a while, until you have enough command of the language to recognize the 1% of situations where you might actually need them. Actually, avoid using Arrays as much as possible too.
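Python has no direct groupMapReduce, so this sketch swaps in a different technique for the same group, reduce, and sum pipeline: sort by prefix, group with itertools.groupby, reduce each group with max, then sum. The function name is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def sum_of_max_per_prefix_grouped(items):
    """Group "prefix||value" pairs by prefix, reduce each group with max, then sum."""
    pairs = [(p, float(v)) for p, v in (s.split("||") for s in items)]
    # itertools.groupby only merges adjacent equal keys, so sort by prefix first.
    pairs.sort(key=itemgetter(0))
    return sum(max(v for _, v in group)
               for _, group in groupby(pairs, key=itemgetter(0)))

y = ["Zara||6.0", "Nuha||4.0", "Zara||2.0", "Zara||0.1"]
print(sum_of_max_per_prefix_grouped(y))  # 10.0
```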

Changing elements in an array using an apply function in R

So at the moment, I have an array with 8 columns, where each row refers to a person. I want to change the value of one column to 1 or 0, based on the value of another column for that person, using an apply function.
I already have this with a loop, which is
for(i in 1:nrow(OutComes)) {
if(OutComes[i,"Risk_Factor"] > 0.7) {
OutComes[i,"OnsetAge"] = 1
} else {
OutComes[i,"OnsetAge"] = 0
}
}
So the OutComes array has a column called "Risk_Factor", where each person is assigned a uniform random number using runif(). If this number is greater than 0.7, the element in the same row of the "OnsetAge" column is set to 1; otherwise it is set to 0.
How would this work with an apply function?
I have searched but can't find anything which helps.
Comparison and assignment are both vectorized in R, so you need neither a loop nor apply():
is_risky <- OutComes[,"Risk_Factor"] > 0.7
OutComes[, "OnsetAge"] <- as.integer(is_risky)
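Plain Python lists are not vectorized like R vectors, so the closest standard-library analog of as.integer(x > 0.7) is a comprehension. This is a minimal sketch; the function name and threshold parameter are illustrative.

```python
def onset_flags(risk_factors, threshold=0.7):
    """Elementwise 0/1 flags, analogous to as.integer(x > 0.7) in R."""
    # Strict '>' as in the question: a risk factor of exactly 0.7 maps to 0.
    return [int(rf > threshold) for rf in risk_factors]

print(onset_flags([0.95, 0.2, 0.7, 0.71]))  # [1, 0, 0, 1]
```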

Using percentage function with accumarray

I have two arrays:
OTPCORorder = [61,62,62,62,62,62,62,62,62,62,62,62,65,65,...]
AprefCOR = [1,3,1,1,1,1,1,1,1,1,2,3,3,2,...]
for each element in OTPCORorder there is a corresponding element in AprefCOR.
I want to know the percentage of 1s in AprefCOR for each unique value of OTPCORorder, as follows:
OTPCORorder1 = [61,62,65,...]
AprefCOR1 = [1,0.72,0,...]
I already have this:
[OTPCORorder1,~,idx] = unique(OTPCORorder,'stable');
% gives OTPCORorder1 = [61,62,65,...]
I have used accumarray before, but with the mean or sum function, like this:
AprefCOR1 = accumarray(idx,AprefCOR,[],@mean).';
I was just wondering if there is a way to use this with prctile, or any other function that gives me the percentage of a specific element (for example, 1 in this case).
Thank you very much.
This could be one approach:
%// logical mask: true where AprefCOR equals 1, false elsewhere
AprefCORmask = AprefCOR == 1;
%// you have done this
[OTPCORorder1,~,idx] = unique(OTPCORorder,'stable');
%// Find number of each unique values
counts = accumarray(idx,1);
%// Find number of ones for each unique value
sumVal = accumarray(idx,AprefCORmask);
%// find percentage of ones to get the results
perc = sumVal./counts
Results:
Inputs:
OTPCORorder = [61,62,62,62,62,62,62,62,62,62,62,62,65,65];
AprefCOR = [1,3,1,1,1,1,1,1,1,1,2,3,3,2];
Output:
perc =
1.0000
0.7273
0
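The count-and-divide idea above can be sketched in plain Python: assign each key a group index in first-appearance order (like unique(..., 'stable')), accumulate a total count and a count of matches per group, then divide. The function name is hypothetical.

```python
def percent_of_value(keys, values, value=1):
    """Fraction of entries equal to `value` within each group of `keys`.
    Groups are returned in first-appearance order, like unique(..., 'stable')."""
    order = []          # unique keys in stable order
    idx = {}            # key -> group index (the role of idx in MATLAB)
    counts, hits = [], []
    for k, v in zip(keys, values):
        if k not in idx:
            idx[k] = len(order)
            order.append(k)
            counts.append(0)
            hits.append(0)
        g = idx[k]
        counts[g] += 1                # like accumarray(idx, 1)
        hits[g] += int(v == value)    # like accumarray(idx, AprefCORmask)
    return order, [h / c for h, c in zip(hits, counts)]

OTPCORorder = [61,62,62,62,62,62,62,62,62,62,62,62,65,65]
AprefCOR = [1,3,1,1,1,1,1,1,1,1,2,3,3,2]
keys, perc = percent_of_value(OTPCORorder, AprefCOR)
print(keys)                          # [61, 62, 65]
print([round(p, 4) for p in perc])   # [1.0, 0.7273, 0.0]
```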
Here's another approach without using accumarray. I think it's more readable:
>> list = unique(OTPCORorder);
>> counts_master = histc(OTPCORorder, list);
>> counts = histc(OTPCORorder(AprefCOR == 1), list);
>> perc = counts ./ counts_master
perc =
1.0000 0.7273 0
How the above code works: we first find the unique elements of OTPCORorder. We then count how many elements belong to each unique value via histc, using this exact list of unique values as the bins. Next, we count, again per unique value, how many elements of OTPCORorder have AprefCOR == 1. Finally, to calculate the percentage, we divide each entry of this second list by the corresponding total from the first list. In newer versions of MATLAB, histcounts is recommended over histc, but it is not a drop-in replacement: histcounts treats its second argument as bin edges and returns one fewer bin, so you would pass [list, Inf] as the edges to get one count per unique value.
It'll give you the same results as accumarray but with less overhead.
Your approach works; you only need to define an appropriate anonymous function for accumarray to use. Let value = 1 be the value whose percentage you want to compute. Then
value = 1;
[~, ~, u] = unique(OTPCORorder); %// labels for unique values in OTPCORorder
result = accumarray(u(:), AprefCOR(:), [], @(x) mean(x==value)).';
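The pattern here, grouping values by label and applying an arbitrary function to each group, can be sketched in Python with a small helper. The name accum is hypothetical and only mimics this one use of accumarray; the lambda plays the role of @(x) mean(x==value).

```python
from collections import defaultdict

def accum(keys, values, func):
    """Hypothetical accumarray-like helper: group values by key and apply func.
    Groups come back in sorted key order, like unique() without 'stable'."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    return [func(groups[k]) for k in sorted(groups)]

OTPCORorder = [61,62,62,62,62,62,62,62,62,62,62,62,65,65]
AprefCOR = [1,3,1,1,1,1,1,1,1,1,2,3,3,2]
value = 1
# Fraction of entries in each group equal to value (mean of a boolean mask).
result = accum(OTPCORorder, AprefCOR, lambda x: sum(v == value for v in x) / len(x))
print([round(r, 4) for r in result])  # [1.0, 0.7273, 0.0]
```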
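As an alternative, you can use sparse as follows. Generate a two-row matrix such that each column corresponds to one of the unique values in OTPCORorder. The first row tallies how many times each value in OTPCORorder did not have the desired value in AprefCOR; the second row tallies how many times it did. The result is the second row divided by the column sums. (This assumes value occurs at least once in AprefCOR; otherwise s has only one row and s(2,:) fails.)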
[~, ~, u] = unique(OTPCORorder);
s = full(sparse((AprefCOR==value)+1, u, 1));
result = s(2,:)./sum(s,1);
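The two-row tally can be sketched in Python as a small contingency table: one column per unique key, row 0 for "not equal to value" and row 1 for "equal". Because the table is allocated with two rows up front, this sketch does not share the one-row edge case of the sparse version. The function name is hypothetical.

```python
def tally_table(keys, values, value=1):
    """Two-row count table per unique key (sorted), mirroring the sparse() trick:
    row 0 counts entries != value, row 1 counts entries == value."""
    cols = sorted(set(keys))
    col_of = {k: j for j, k in enumerate(cols)}
    s = [[0] * len(cols) for _ in range(2)]
    for k, v in zip(keys, values):
        # int(v == value) is 0 or 1, playing the role of (AprefCOR==value)+1 in MATLAB.
        s[int(v == value)][col_of[k]] += 1
    # Second row divided by column sums, like s(2,:)./sum(s,1).
    result = [s[1][j] / (s[0][j] + s[1][j]) for j in range(len(cols))]
    return s, result

s, result = tally_table([61,62,62,62,62,62,62,62,62,62,62,62,65,65],
                        [1,3,1,1,1,1,1,1,1,1,2,3,3,2])
print(s)  # [[0, 3, 2], [1, 8, 0]]
```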

How to name variables in a data array using a for loop

I have an array within an array (a cell array inside a cell array), and I am trying to name the variables using a for loop because there are a lot of them. The simple code Time1 = dataCOMB{1,1}{1,1}(1:1024, 1); takes the first cell of the outer array, then the first cell of the inner array, and assigns rows 1 to 1024 of column 1 to Time1. However, I have 38 of these data sets, and when I apply the following code:
for t = 1:38
for aa = 1:38
Time(t) = dataCOMB{1,1}{1,aa}(1:1024, 1);
end
end
I get an error
In an assignment A(I) = B, the number of elements in B and I must be the same.
Error in Load_Files_working (line 39)
Time(t) = dataCOMB{1,1}{1,aa}(1:1024, 1);
Basically I am trying to get matlab to call the first column in each data set Time1, Time2, etc.
The problem:
1) You'd want to extract, into a cell row...
2) ...the first 1024 numbers in the 1st column...
3) ...from each of the first 38 cells of a cell array.
The plan:
1) If you want to get information from each element of a cell array (that is, an array accessed via {} indexing), you can use cellfun. Calling cellfun(some_function, a_cell_array) aggregates the results of some_function(a_cell_array{k}) for all possible subscripts k. If the results are not all scalars (here each one is a 1024-element column), you must use the cellfun(..., 'UniformOutput', false) option to put them in an output cell array (cell arrays are good at grouping heterogeneous data together).
2) To extract the first 1024 numbers from the first column of a numeric array x, you can use this anonymous function: @(x) x(1:1024,1). The x argument will come from each element of a cell array, and our anonymous function will play the role of some_function in the step above.
3) Now we need to specify a_cell_array, i.e. the cell array that contains the first 38 cells of the target. That is simply dataCOMB{1,1}(1,1:38).
The solution:
This one-liner implements the plan:
Time = cellfun(@(x) x(1:1024,1), dataCOMB{1,1}(1,1:38), 'UniformOutput', false);
Then you can access your data as in this example:
this_time = Time{3};
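The same map-over-cells idea in Python is a list comprehension over a list of matrices. This sketch assumes matrices are lists of rows and uses small hypothetical data (38 matrices of 5 rows, taking 3 rows instead of 1024); note the shift from MATLAB's 1-based to Python's 0-based indexing.

```python
def first_column(matrix, n_rows):
    """Rows 0..n_rows-1 of column 0, i.e. MATLAB x(1:n_rows,1) with 0-based indexing."""
    return [row[0] for row in matrix[:n_rows]]

# Hypothetical stand-in for dataCOMB{1,1}: 38 small matrices (lists of rows).
cells = [[[t * 10 + r, -1] for r in range(5)] for t in range(38)]

# Like cellfun(@(x) x(1:n,1), ..., 'UniformOutput', false): one list per cell.
time_cols = [first_column(m, 3) for m in cells]
this_time = time_cols[2]   # MATLAB Time{3} is index 2 in Python
print(this_time)           # [20, 21, 22]
```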
Your error comes from Time(t): it refers to a single element, but the right-hand side has 1024 elements. Indexing like this is also not how you create new variables in MATLAB. To do exactly what you want (i.e. create variables named Time1, Time2, etc.), you'll need to use the eval function:
for aa = 1:38
eval(['Time' num2str(aa) '= dataCOMB{1,1}{1,aa}(1:1024,1);']);
end
Many people do not like recommending the eval function. Others wouldn't recommend moving all of your data out of a data structure and into their own independently-named variables. So, to address these two criticisms, a better alternative might be to pull your data out of your complicated data structure and to put it into a simpler array:
Time_Array = zeros(1024,38);
for aa = 1:38
Time_Array(:,aa) = dataCOMB{1,1}{1,aa}(1:1024,1);
end
Or, if you don't like that because you really like the names Time1, Time2, etc, you could create them as fields to a data structure:
Time_Data = [];
for aa = 1:38
fieldname = ['Time' num2str(aa)];
Time_Data.(fieldname) = dataCOMB{1,1}{1,aa}(1:1024,1);
end
And, in response to a comment below by the original poster, this method can be extended to unpack the data further:
Time_Data = [];
count = 0;
for z = 1:2;
for aa = 1:38
count = count+1;
fieldname = ['Time' num2str(count)];
Time_Data.(fieldname) = dataCOMB{1,z}{1,aa}(1:1024,1);
end
end
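In Python, the closest analog of dynamic struct field names (and the usual alternative to eval-created variables) is a dictionary keyed by the generated names. This sketch follows the running-counter version above; the data layout (2 outer cells of 38 small matrices) and the function name are hypothetical.

```python
def unpack(data_comb, n_rows):
    """Collect column 0 (first n_rows rows) of every inner matrix under keys
    Time1, Time2, ..., numbered with a running counter as in the MATLAB loop."""
    time_data = {}
    count = 0
    for outer in data_comb:          # MATLAB: for z = 1:2
        for matrix in outer:         # MATLAB: for aa = 1:38
            count += 1
            time_data[f"Time{count}"] = [row[0] for row in matrix[:n_rows]]
    return time_data

# Hypothetical data: 2 outer cells, each holding 38 matrices of 4 rows x 2 columns.
data_comb = [[[[z * 100 + aa + r, 0] for r in range(4)] for aa in range(38)]
             for z in range(2)]
time_data = unpack(data_comb, 3)
print(len(time_data), time_data["Time39"])   # 76 [100, 101, 102]
```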

Saving return values of function returning multiple variables in Matlab

I have never used matlab before so excuse this very basic question.
Basically I have a function that returns multiple variables, defined like so:
function [a, b, c]=somefunction(x, y, z)
I know I can get the return values as follows:
[a,b,c] = somefunction(1,2,3);
Now what I would like to do instead is save multiple runs of somefunction into an array and then retrieve them later. I tried:
results = [];
results = [results somefunction(1,2,3)];
results = [results somefunction(4,5,6)];
And then I tried accessing the individual runs as:
% access second run, i.e. somefunction(4,5,6) ?
a = results(2, 1);
b = results(2, 2);
c = results(2, 3);
but this tells me that the index is out of bound because size(results) = [1,99654] (99654 is the number of results I need to save). So it does not appear to be an array? Sorry for this basic question, again I have never used matlab.
When you combine arrays with [ ... ], you're concatenating them, creating one long flat array. For example, if call 1 returns 3 elements, call 2 returns 8 elements, and call 3 returns 4 elements, you'll end up with a 15-long array, and no way of knowing which elements came from which function call. Also, a call used inside an expression like [results somefunction(1,2,3)] returns only its first output (a), so b and c are discarded.
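The difference between concatenating and keeping results separate can be sketched in Python with list.extend versus one list entry per call. The run function is a hypothetical stand-in for a call returning an n-element array.

```python
def run(n):
    """Hypothetical call whose output is an n-element array."""
    return list(range(n))

flat = []
for n in (3, 8, 4):
    flat.extend(run(n))   # like results = [results run(n)]: concatenation
print(len(flat))          # 15, with no record of which call produced which element

separate = [run(n) for n in (3, 8, 4)]   # one entry per call, like a cell array
print([len(r) for r in separate])         # [3, 8, 4]
```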
If you want to keep the results from each run separate, you can stash them in a cell array. You still need a comma-separated list on the LHS to get all the multiple argouts. The {}-indexing syntax, as opposed to (), "pops" contents in and out of cell elements.
Let's store the results in a k-by-n cell array named x, where the function returns n outputs and we call it k times.
x = cell(2, 3); % cell(k, n)
% Make calls
[x{1,1}, x{1,2}, x{1,3}] = somefunction(1,2,3);
[x{2,1}, x{2,2}, x{2,3}] = somefunction(4,5,6);
% Now, the results of the ni-th argout of the ki-th call are in x{ki,ni}
% E.g. here is the 3rd argout from the second call
x{2,3}
You could also store the argouts in separate variables, which may be more readable. In this case, each will be a 1-by-k cell array:
[a,b,c] = deal(cell(1,2)); % cell(1,k)
[a{1}, b{1}, c{1}] = somefunction(1,2,3);
[a{2}, b{2}, c{2}] = somefunction(4,5,6);
And of course this generalizes to loops, if your somefunction inputs are amenable to that.
[a,b,c] = deal(cell(1,nIterations));
for k = 1:nIterations
[a{k}, b{k}, c{k}] = somefunction(1,2,3);
end
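Both layouts have direct Python counterparts: a function returns its multiple outputs as a tuple, a list of those tuples is the k-by-n table, and zip(*...) regroups it into one list per output. The body of somefunction is hypothetical, chosen only so the outputs are easy to check.

```python
def somefunction(x, y, z):
    """Hypothetical three-output function; Python returns multiple values as a tuple."""
    return x + y, y * z, x - z

calls = [(1, 2, 3), (4, 5, 6)]
# k-by-n layout, like the x cell array: results[ki][ni] is output ni of call ki (0-based).
results = [somefunction(*args) for args in calls]
print(results[1][2])   # third output of the second call: 4 - 6 = -2

# One list per output, each of length k, like the a, b, c cell arrays.
a, b, c = (list(col) for col in zip(*results))
print(a, b, c)         # [3, 9] [6, 30] [-2, -2]
```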
Details are in the documentation at http://www.mathworks.com/help/matlab/cell-arrays.html or via doc cell.
(Side note: that results(1, 2) in your post ought to succeed for an array of size [1,99654]. Sure you didn't do results(2, 1)?)

Resources