Changing elements in an array using an apply function in R - arrays

So at the moment, I have an array with 8 columns and rows refer to people. I want to change the value of one column to 1 or 0 based on the value of another column for that person using an apply function.
I already have this with a loop, which is
for (i in 1:nrow(OutComes)) {
  if (OutComes[i, "Risk_Factor"] > 0.7) {
    OutComes[i, "OnsetAge"] = 1
  } else {
    OutComes[i, "OnsetAge"] = 0
  }
}
So the OutComes array has a column called "Risk_Factor" where each person is assigned a uniform random number using runif(). If this number is greater than 0.7, the element in the same row of the column "OnsetAge" is set to 1; otherwise it is set to 0.
How would this work with an apply function?
I have searched but can't find anything which helps.

Comparison and assignment are both vectorized in R, so there's no need for a loop or an apply call.
is_risky <- OutComes[,"Risk_Factor"] > 0.7
OutComes[, "OnsetAge"] <- as.integer(is_risky)

Related

Matrix - Vector value matching

In my current project, I need to find values inside a matrix that match individual vector values. This is an example of the process; the main program uses lat and lon values. Here I create a 20x20 matrix and a 20x1 array of randomly placed values.
When I run the for loop, each iteration takes one element of the Leroy vector and computes its difference with every value in the matrix. The first min call should return the smallest value from each column and its corresponding index. The second min call should return the smallest overall value from the first min call, and the index at which it occurred.
My concern is that I'm not sure which element of the matrix produced the smallest value. Is there a way I can use the indexes or something similar to figure that out?
Matrix = magic(20);
Leroy = randi(20,20,1);
for i = 1:length(Leroy)
    [Jenkins, J] = min(min(Leroy(i) - Matrix));
end
As Cris Luego pointed out in his comment, your for loop is not required since
Leroy(i) - Matrix
translates to something like
5 - [1 2 3; 4 5 6; 7 8 9]
Your problem of getting the index of the minimum in -Matrix can be solved by using min(-Matrix(:)):
[minimum, minidx] = min(-Matrix(:));
However, this gives you the linear index. If you need the row and column indices, use ind2sub, which returns the row first and the column second:
[rowidx,colidx] = ind2sub(size(Matrix), minidx);
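To make the relationship between a linear index and its row and column concrete, here is a minimal sketch in plain Python. It assumes MATLAB's conventions (1-based indices, column-major storage); the helper name ind2sub_column_major and the 3x3 sample matrix are made up for illustration and are not part of the original answer.

```python
def ind2sub_column_major(shape, linear_index):
    """Convert a 1-based, column-major linear index to 1-based (row, col)."""
    n_rows, n_cols = shape
    if not 1 <= linear_index <= n_rows * n_cols:
        raise ValueError("linear index out of range")
    # Columns are stored one after another, so every n_rows steps start a new column.
    col, row = divmod(linear_index - 1, n_rows)
    return row + 1, col + 1


# Hypothetical 3x3 sample matrix.
matrix = [[8, 1, 6],
          [3, 5, 7],
          [4, 9, 2]]
n_rows, n_cols = 3, 3

# Flatten column by column, which is the order Matrix(:) uses.
flat = [matrix[r][c] for c in range(n_cols) for r in range(n_rows)]

# Minimum of the negated values and its 1-based linear index.
negated = [-v for v in flat]
min_value = min(negated)
min_idx = negated.index(min_value) + 1

row, col = ind2sub_column_major((n_rows, n_cols), min_idx)
print(min_value, min_idx, row, col)
```

Looking up matrix[row - 1][col - 1] afterwards recovers the element that produced the minimum, which is the information the question asks for.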
Matrix = magic(20);
Leroy = randi(20,20,1);
for i = 1:length(Leroy)
    [Jenkins, J] = min((Leroy(i) - Matrix).^2);
end
Squaring the difference makes the smallest value correspond to the closest match, so this helps find a match between values in two arrays or between array values and matrix values.

Take value from specific column for each row

I have array X with some values
[[0.3,0.4,0.5],
[0.1,0.7,0.9],
.
.
.
[0.3,0.6,0.9]]
and I have an array of indexes I = [0,2,1,2,0,..].
I would like to take one value from each row of X according to the index in I. For example, the first value in I is 0, so from the first row I take the value in column 0, which is 0.3, and so on.
Is it possible to do this without a loop?
My idea:
Y = X[:,I] does not do what I want.
You were almost there. What you need is some fancy indexing on top:
Y = X[np.arange(len(I)),I]
This kind of indexing tells numpy to select the entries (i, I[i]) in X, one per row.
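To see which entries that selects, here is a plain-Python sketch of the same pairing. It is only an illustration of the semantics, not a replacement for the vectorized numpy call, and the three-row X and I below are sample values chosen for the example.

```python
# Sample data (illustrative values only).
X = [[0.3, 0.4, 0.5],
     [0.1, 0.7, 0.9],
     [0.3, 0.6, 0.9]]
I = [0, 2, 1]

# Pair row i with column I[i], which is what X[np.arange(len(I)), I] selects.
Y = [row[j] for row, j in zip(X, I)]
print(Y)
```

Row 0 contributes its column 0, row 1 its column 2, and row 2 its column 1.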

An If Statement inside an apply

I'm trying to use apply() to go through an array by rows, look at a column of 1's and 0's and then populate another column in that same array by using a function if the first column is a one, and a different function if it's a 0.
So it would be something like...
apply(OutComes, 1, if(risk = 1) {OutComes[, "Age"] = Function_1} else{OutComes[, "Age"] = Function_2} )
where OutComes is the array in question and risk is the variable which determines which function we use.
The aim is that 2 functions determine life length and people fall into one of the two categories, each with its own function. Based on the risk group, I want to use a different function to calculate the age, but this doesn't seem to be working.
apply() needs a function as its third argument. No ready-made function does what you want,
so you need to define one yourself.
For example, apply(OutComes, 1, sum) returns the sum of each row.
The output vector has one value per row, so you can assign it to a variable and then either add it with cbind or use it to replace the values of an existing column.
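new_age <- apply(OutComes, 1, function(x) {
  # x : the row being processed at the time
  # n : column number for "risk"; alternatively use if (x["risk"] == 1)
  # note == instead of = in the if condition
  if (x[n] == 1) {
    Function_1()
  } else {
    Function_2()
  }
})
# add the result as a new column
OutComes <- cbind(OutComes, new_age)
# or overwrite an existing column (the $ operator only works on data frames, not arrays)
OutComes[, "Age"] <- new_age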

Reshape a 3D array and remove missing values

I have an NxMxT array where each element of the array is a grid of Earth. If the grid is over the ocean, then the value is 999. If the grid is over land, it contains an observed value. N is longitude, M is latitude, and T is months.
In particular, I have an array called tmp60 for the ten years 1960 through 1969, so 120 months for each grid.
To test what the global mean in January 1960 was, I write:
tmpJan60=tmp60(:,:,1);
tmpJan60(tmpJan60(:,:)>200)=NaN;
nanmean(nanmean(tmpJan60))
which gives me 5.855.
I am confused about the reshape function. I thought the following code should yield the same average, namely 5.855, but it does not:
load tmp60
N1=size(tmp60,1)
N2=size(tmp60,2)
N3=size(tmp60,3)
reshtmp60 = reshape(tmp60, N1*N2,N3);
reshtmp60( reshtmp60(:,1)>200,: )=[];
mean(reshtmp60(:,1))
This gives me -1.6265, which is not correct.
I have checked the result in Excel (!) and 5.855 is correct, so I assume I am making a mistake with the reshape function.
Ideally, I want a matrix that takes each grid cell, going first down the N dimension, giving 720 rows with 120 columns (each column is a month). These first 720 rows will represent one longitude band around Earth for the same latitude. Next, I want to increase the latitude by one step, giving another 720 rows with 120 columns. Ultimately I want to do this for all 360 latitudes.
If longitude and latitude were inputs, say column 1 and 2, then the matrix should look like this:
temp = [-179.75 -89.75 -1 2 ...
-179.25 -89.75 2 4 ...
...
179.75 -89.75 5 9 ...
-179.75 -89.25 2 5 ...
-179.25 -89.25 3 4 ...
...
-179.75 89.75 2 3 ...
...
179.75 89.75 6 9 ...]
So temp(:,3) should be all January 1960 observations.
One way to do this is:
grid1 = tmp60(1,1,:);
g1 = reshape(grid1, [1,120]);
grid2 = tmp60(2,1,:);
g2 = reshape(grid2,[1,120]);
g = [g1;g2];
But this is obviously very cumbersome.
I am not able to automate this procedure for the N*M elements, so comments are appreciated!
A link to the file tmp60.mat
The main problem in your code is how the NaNs are handled. Observe the following example:
a = randi(10,6);
a(a>7)=nan
m = [mean(a(:),'omitnan') mean(mean(a,'omitnan'),'omitnan')]
m =
3.8421 3.6806
Both elements of m are meant to be the mean of all elements of a, yet they differ. Taking the mean of all values together with mean(a(:),'omitnan') sums all non-NaN values and divides by the number of values that were summed:
sum(a(:),'omitnan')/sum(~isnan(a(:)))==mean(a(:),'omitnan') % this is true
but taking the mean along the first dimension gives 6 column means, each computed over that column's non-NaN values:
sum(a,'omitnan')./sum(~isnan(a))==mean(a,'omitnan') % this is also true
and when we average those column means, every column gets the same weight no matter how many non-NaN values it contains, so the result differs from the overall mean:
mean(sum(a,'omitnan')./sum(~isnan(a)))==mean(a(:),'omitnan') % this is false
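The same effect can be reproduced with a tiny hand-checkable case. The sketch below is plain Python with a hypothetical nanmean helper and made-up numbers; it only illustrates why a mean of column means differs from the overall mean once columns have different numbers of NaNs.

```python
import math

nan = math.nan


def nanmean(values):
    """Mean of the non-NaN entries; NaN if every entry is NaN."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else nan


# Made-up 3x2 array: the second column has only one valid value.
a = [[1.0, nan],
     [2.0, nan],
     [3.0, 10.0]]

# Mean over all valid values at once: (1 + 2 + 3 + 10) / 4.
overall = nanmean([v for row in a for v in row])

# Mean of each column, then mean of those column means.
column_means = [nanmean(col) for col in zip(*a)]
mean_of_means = nanmean(column_means)

print(overall, column_means, mean_of_means)
```

Here the single value 10 carries as much weight as the three values in the first column, so the mean of means is pulled toward it.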
Here is what I think you want in your code:
% this is exactly as your first test:
tmpJan60=tmp60(:,:,1);
tmpJan60(tmpJan60>200) = nan;
m1 = mean(mean(tmpJan60,'omitnan'),'omitnan')
% this creates the matrix as you want it:
result = reshape(permute(tmp60,[3 1 2]),120,[]).';
result(result>200) = nan;
r = reshape(result(:,1),720,360);
m2 = mean(mean(r,'omitnan'),'omitnan')
isequal(m1,m2)
To create the matrix you first permute the dimensions so the one you want to keep as is (time) will be the first. Then reshape the array to Tx(lon*lat), so you get 120 rows for all time steps and 259200 columns for all combinations of the coordinates. All that's left is to transpose it.
m1 is your first calculation, and m2 is what you were trying to do in the second one. They are equal here, but their value is not 5.855, even if I use your code.
However, I think the right solution will be to take the mean of all values together:
mean(result(:,1),'omitnan')

Marking and finding locations of elements of array in Matlab

I have a matrix with values in [0,1]. I want to find and mark the locations of the elements whose values are <0.1 or >0.9.
So I use the MATLAB function find, but it returns two separate vectors, one of rows and one of columns, which is difficult to analyse. Is there a way to see which elements meet the conditions without losing the original matrix structure?
I used the below line of code:
[r,c,v]= find(X<0.1 | X>0.9); % X is my 512*512 matrix of values
Thanks!
See if this works out for you -
%// cell array with each cell housing the matching indices for each row
out = cellfun(@find,mat2cell(X<0.1 | X>0.9,ones(1,size(X,1)),size(X,2)),'uni',0)
Browse through the values of out using - celldisp(out)
Just use the condition itself:
mask = (X < 0.1 | X > 0.9)
This returns a logical array of the same size as X, with 1 wherever the condition holds.
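As a rough illustration of what such a mask looks like and how it keeps the matrix structure, here is a plain-Python sketch on a made-up 2x3 matrix. The variable names and values are assumptions for the example, and the indices here are 0-based, unlike MATLAB's.

```python
# Made-up sample values in [0, 1].
X = [[0.05, 0.50, 0.95],
     [0.20, 0.91, 0.10]]

# Boolean mask with the same shape as X: True where the value is < 0.1 or > 0.9.
mask = [[(v < 0.1 or v > 0.9) for v in row] for row in X]

# If explicit positions are needed, they can be read off the mask afterwards.
locations = [(r, c)
             for r, row in enumerate(mask)
             for c, hit in enumerate(row)
             if hit]

for row in mask:
    print(row)
print(locations)
```

Note that 0.10 itself is not flagged, because the comparison is strict.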
