C variable assignment and R equivalent

Hi, I am trying to understand the following variable assignment in C and to rewrite it in R. I use R often but have only really glanced at C.
int age,int b_AF,int b_ra,int b_renal,int b_treatedhyp,int b_type2,double bmi,int ethrisk,int fh_cvd,double rati,double sbp,int smoke_cat,int surv,double town
)
{
double survivor[3] = {
0,
0.996994316577911,
0.993941843509674
};
a = /*pre assigned*/
double score = 100.0 * (1 - pow(survivor[surv], exp(a)) );
return(score);
}
How does survivor[surv] work in this context? An explanation would be helpful, and any input on how to do the assignment in R would be a bonus.
Thanks very much!

This is an aggregate initializer:
double survivor[3] = {
0,
0.996994316577911,
0.993941843509674
};
and is equivalent to:
double survivor[3];
survivor[0] = 0;
survivor[1] = 0.996994316577911;
survivor[2] = 0.993941843509674;
and survivor[surv] is the value stored at index surv of the survivor array. Array indexes run from 0 to N - 1, so if surv were 1 then survivor[surv] would have the value 0.996994316577911.
Note that the function as currently written does not check that surv is a valid index into survivor (i.e. surv > -1 and surv < 3), so it runs the risk of undefined behaviour.
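For illustration, a guard could be added before the lookup. This is only a sketch; returning a negative score to signal the error is an assumption, not part of the original code:
if (surv < 0 || surv > 2) {
    return -1.0;  /* invalid index: signal an error however the caller expects */
}
double score = 100.0 * (1 - pow(survivor[surv], exp(a)));
return score;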

Given the answer from @hmjd, the R equivalent would be
survivor <- c(0, 0.996994316577911, 0.993941843509674)
or if survivor already exists and you wish to assign into the first 3 elements:
survivor[1:3] <- c(0, 0.996994316577911, 0.993941843509674)
(Note R's indices are 1-based unlike C's 0-based ones.)
As for the extraction, the general idea is the same as with C, but the details matter:
R> survivor[0] ## 0 index returns an empty vector
numeric(0)
R> survivor[-1] ## negative index **drops** that element
[1] 0.9969943 0.9939418
R> survivor[10] ## positive outside length of vector returns NA
[1] NA
R> surv <- 2
R> survivor[surv] ## same holds for whatever surv contains
[1] 0.9969943
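Putting the pieces together, a minimal R sketch of the whole C fragment might look like this (risk_score is a made-up name, and a is assumed to be precomputed, as in the C code):
risk_score <- function(surv, a) {
  survivor <- c(0, 0.996994316577911, 0.993941843509674)
  # R is 1-based, so C's survivor[surv] corresponds to survivor[surv + 1]
  100 * (1 - survivor[surv + 1]^exp(a))
}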

Array subsetting in Julia

With the Julia Language, I defined a function to sample points uniformly inside the sphere of radius 3.14 using rejection sampling as follows:
function spherical_sample(N::Int64)
    # generate N points uniformly distributed inside sphere
    # using rejection sampling:
    points = pi*(2*rand(5*N,3).-1.0)
    ind = sum(points.^2,dims=2) .<= pi^2
    ## ideally I wouldn't have to do this:
    ind_ = dropdims(ind,dims=2)
    return points[ind_,:][1:N,:]
end
I found a hack for subsetting arrays:
ind = sum(points.^2,dims=2) .<= pi^2
## ideally I wouldn't have to do this:
ind_ = dropdims(ind,dims=2)
But, in principle array indexing should be a one-liner. How could I do this better in Julia?
The problem is that you are creating a 2-dimensional index array: sum with dims=2 returns a 5N-by-1 matrix, not a vector. You can avoid it by using eachrow:
ind = sum.(eachrow(points.^2)) .<= pi^2
So that your full answer would be:
function spherical_sample(N::Int64)
    points = pi*(2*rand(5*N,3).-1.0)
    ind = sum.(eachrow(points.^2)) .<= pi^2
    return points[ind,:][1:N,:]
end
Here is a one-liner:
points[(sum(points.^2,dims=2) .<= pi^2)[:],:][1:N, :]
Note that [:] is dropping a dimension so the BitArray can be used for indexing.
This does not answer your question directly (you already got two suggestions), but rather hints at how you could implement the whole procedure differently if you want it to be efficient.
The first point is to avoid generating a fixed 5*N rows of data, because that is not guaranteed to produce N valid samples. The probability of a valid sample in your model is about 52% (the ball of radius pi fills pi/6 of the cube [-pi, pi]^3), so it is possible, if unlikely, that there will not be enough points to choose from, and the [1:N, :] selection will then throw an error.
Below is the code I would use that avoids this problem:
using Random # provides rand! for in-place resampling
function spherical_sample(N::Integer) # no need to require Int64 only here
    points = 2 .* pi .* rand(N, 3) .- pi # broadcasted operations avoid excessive allocations
    while N > 0 # we will run the code until we have N valid rows
        v = @view points[N, :] # use a view to avoid allocating
        if sum(x -> x^2, v) <= pi^2 # sum accepts a transformation function as its first argument
            N -= 1 # row is valid - move to the previous one
        else
            rand!(v) # row is invalid - resample it in place
            @. v = 2 * pi * v - pi # again - do the computation in place via broadcasting
        end
    end
    return points
end
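A quick sanity check of the invariant, as an assumed usage example: every returned row should lie inside the ball of radius pi.
pts = spherical_sample(1_000)
all(sum(abs2, r) <= pi^2 for r in eachrow(pts))  # should print true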
This one is pretty fast, and uses StaticArrays. You can probably also implement something similar with ordinary tuples:
using StaticArrays
function sphsample(N)
    T = SVector{3, Float64}
    v = Vector{T}(undef, N)
    n = 1
    while n <= N
        p = rand(T) .- 0.5
        @inbounds v[n] = p .* 2π
        n += (sum(abs2, p) <= 0.25)
    end
    return v
end
On my laptop it is ~9x faster than the solution with views.
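If you want to reproduce the comparison, a sketch using BenchmarkTools (assuming both functions above are defined; exact timings will vary by machine):
using BenchmarkTools
@btime spherical_sample(10_000);
@btime sphsample(10_000);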

Split, group and mean: computation with arrays

A is a given N x R x T array. I must split it along the first dimension into N sub-arrays of size R x T, then group consecutive sub-arrays together into arrays K and take the mean of each group.
For example: A is the array rand(N,R,T) = rand(16, 3, 3). Now I am going to split it:
A = rand(16, 3, 3): A(1,:,:), A(2,:,:), A(3,:,:), A(4,:,:), ..., A(16,:,:).
I have 16 slices.
B_1=A(1,:,:); B_2=A(2,:,:); B_3=A(3,:,:); ... ; B_16=A(16,:,:);
The next step is grouping every 3 together (for example).
Now I am going to create K_i as:
K_1(1,:,:)=B_1;
K_1(2,:,:)=B_2;
K_1(3,:,:)=B_3;
...
K_8(1,:,:)=B_14;
K_8(2,:,:)=B_15;
K_8(3,:,:)=B_16;
The average array is found as:
C_1 = [B_1 + B_2 + B_3]/3
...
C_8 = [B_14 + B_15 + B_16]/3
I have implemented it as:
A_reshape = reshape(squeeze(A), size(A,2), size(A,3),2, []);
mean_of_all_slices = permute(mean(A_reshape , 3), [1 2 4 3]);
Question 1: I have checked by hand and it gives me a wrong result. How do I fix it? [SOLVED]
EDIT 2: I need to simulate the following computation: take the product of each slice of the array K_i with another array P_p, i.e.:
for `K_1` is given `P_1`): `B_1 * P_1` , `B_2 * P_1`, `B_3 * P_1`
...
for `K_8` is given `P_8`): `B_14 * P_8` , `B_15 * P_8`, `B_16 * P_8`
I have solved it!
Disclaimer: this answers a previous version of the question.
In cases such as this I would suggest relying on built-ins, which have a predictable behavior. In your case, this would be movmean (introduced in R2016a):
WIN_SZ = 2; % Window size for averaging
AVG_DIM = 1; % Dimension for averaging
tmp = movmean(A, WIN_SZ, AVG_DIM, 'Endpoints', 'discard');
C = tmp(1:WIN_SZ:end, :, :); % keeps only the non-overlapping windows: mean(A1,A2), mean(A3,A4), etc.
If your MATLAB is a bit older, this can also be done using convolution (convn, introduced before R2006):
WIN_SZ = 3;
tmp = convn(A, ones(WIN_SZ,1)./WIN_SZ, 'valid'); % Shorter than A in dim1 by (WIN_SZ-1)
C = tmp(1:WIN_SZ:end, :, :); % dim1 size is: ceil((size(A,1)-(WIN_SZ-1))/WIN_SZ)
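A quick way to convince yourself the two routes agree for WIN_SZ = 2 (a sketch with made-up sizes):
A = rand(16, 3, 3);
C1 = movmean(A, 2, 1, 'Endpoints', 'discard');
C1 = C1(1:2:end, :, :);
C2 = convn(A, ones(2,1)./2, 'valid');
C2 = C2(1:2:end, :, :);
max(abs(C1(:) - C2(:))) % expect a difference at round-off level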
BTW, the step where you create B from slices of A can be done using
B = num2cell(A,[2,3]); % yields a 16x1 cell array of 1x3x3 double arrays
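As an aside, for non-overlapping groups the averaging can also be done with a reshape-and-mean one-liner; a sketch assuming the group size G divides size(A,1) evenly:
G = 2; % group size
C = squeeze(mean(reshape(A, G, [], size(A,2), size(A,3)), 1));
% C(g,:,:) is the mean of slices A(G*(g-1)+1 : G*g, :, :)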

Nested for-loops and their formats

I am using Python 2.7. As my previous posts show, I am learning Python; I have moved on from arrays and am now working on loops. I am also trying to work with operations on arrays.
A1 = np.random.random_integers(35, size=(10.,5.))
A = np.array(A1)
B1 = np.random.random_integers(68, size=(10.,5.))
B = np.array(B1)
D = np.zeros(10,5) #array has 10 rows and 5 columns filled with zeros to give me the array size I want
for j in range (1,5):
    for k in range (1,5):
        D[j,k] = 0
        for el in range (1,10):
            D[j,k] = D[j,k] + A[j] * B[k]
The error I am getting is : setting an array element with a sequence
Is my formatting incorrect?
Because A, B and D are all 2D arrays, D[j,k] is a single element, while A[j] (the same as A[j,:]) is a 1D array which, in this case, has 5 elements. The same holds for B[k] = B[k,:], also a 5-element array. A[j] * B[k] is therefore also a five-element array, which cannot be stored in the place of a single element, and you therefore get the error: setting an array element with a sequence.
If you want to select single elements from A and B, then the last line should be
D[j,k] = D[j,k] + A[j,k] * B[j,k]
Some further comments on your code:
# A is already a numpy array, so 'A = np.array(A1)' is redundant and can be omitted
A = np.random.random_integers(35, size=(10, 5))  # size should be given as integers, not floats
# Same as above
B = np.random.random_integers(68, size=(10, 5))
D = np.zeros([10, 5])  # This is the correct syntax for creating a 2D array with the np.zeros() function
for j in range(1,5):
    for k in range(1,5):
        # D[j,k] = 0  -- D is already zero everywhere thanks to np.zeros, so there is no need to set it again
        for el in range(1,75):
            D[j,k] = D[j,k] + A[j,k] * B[j,k]  # index single elements, as discussed above
EDIT:
Well, I do not have enough reputation to comment on your post, @Caroline.py, so I will do it here instead:
First of all, remember that Python uses zero indexing, so 'range(1,5)' gives you '[1,2,3,4]', which means that you never reach the first index, i.e. index 0. You probably want 'range(0,5)', which is the same as just 'range(5)', instead.
I can see that you changed the el range from 75 to 10. Since you don't use el for anything, it just means that you perform the last line 10 times.
I don't know what you want to do, but if you want to store the element-wise product of A and B in D, then this should be right:
for j in range(10):
    for k in range(5):
        D[j,k] = A[j,k] * B[j,k]
or just
D = A * B
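For reference, np.random.random_integers is deprecated in modern NumPy; a minimal sketch of the same setup with np.random.randint (whose upper bound is exclusive, hence 36 and 69):
import numpy as np

A = np.random.randint(1, 36, size=(10, 5))  # integers 1..35, like random_integers(35)
B = np.random.randint(1, 69, size=(10, 5))  # integers 1..68
D = A * B  # element-wise product, no explicit loops needed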

Vector search Algorithm

I have the following problem. Say I have a vector:
v = [1,2,3,4,5,1,2,3,4,...]
I want to sequentially sample points from the vector that have an absolute magnitude difference higher than a threshold from the previously sampled point. So say my threshold is 2.
I start at index 1 and sample the first point, 1. Then my condition is met at v[3], and I sample 3 (since 3 - 1 >= 2). Then 3, the newly sampled point, becomes the reference that I check against. The next sampled point is 5, which is v[5] (5 - 3 >= 2). Then the next point is 1, which is v[6] (abs(1 - 5) >= 2).
Unfortunately my code in R is taking too long. Basically I am scanning the array repeatedly and looking for matches. I think that this approach is naive, though. I have a feeling that I can accomplish this task in a single pass through the array, but I don't know how. Any help appreciated. I guess the problem I am running into is that the location of the next sample point can be anywhere in the array, and I need to scan the array from the current point to the end to find it.
Thanks.
I don't see a way this can be done without a loop (each accepted point depends on the previously accepted one, so the scan is inherently sequential), so here is one:
my.sample <- function(x, thresh) {
  out <- x
  i <- 1
  for (j in seq_along(x)[-1]) {
    if (abs(x[i] - x[j]) >= thresh) {
      i <- j
    } else {
      out[j] <- NA
    }
  }
  out[!is.na(out)]
}
my.sample(x = c(1:5,1:4), thresh = 2)
# [1] 1 3 5 1 3
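If the loop is still too slow on a very long vector, byte-compiling the function may help on older R versions (recent R JIT-compiles by default):
my.sample.c <- compiler::cmpfun(my.sample)
my.sample.c(x = c(1:5, 1:4), thresh = 2)
# [1] 1 3 5 1 3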
You can do this without a loop using a bit of recursion:
vsearch = function(v, x, fun=NULL) {
  # v: input vector
  # x: threshold level
  if (!length(v) > 0) return(NULL)
  y = v - rep(v[1], times = length(v))
  if (!is.null(fun)) y = fun(y)
  i = which(y >= x)
  if (!length(i) > 0) return(NULL)
  i = i[1]
  return(c(v[i], vsearch(v[-(1:(i-1))], x, fun = fun)))
}
With your vector above (note that, unlike my.sample, this version does not return the initial reference point itself):
> vsearch(c(1,2,3,4,5,1,2,3,4), 2, abs)
[1] 3 5 1 3

Data.Map vs. Data.Array for symmetric matrices?

Sorry for the vague question, but I hope for an experienced Haskeller this is a no-brainer.
I have to represent and manipulate symmetric matrices, so there are basically three different choices for the data type:
Complete matrix storing both the (i,j) and (j,i) element, although m(i,j) = m(j,i)
Data.Array (Int, Int) Int
A map, storing only elements (i,j) with i <= j (upper triangular matrix)
Data.Map (Int, Int) Int
A vector indexed by k, storing the upper triangular matrix given some vector order f(i,j) = k
Data.Array Int Int
Many operations are going to be necessary on the matrices: updating a single element, querying for rows and columns, etc. However, they will mainly act as containers; no linear algebra operations (inversion, det, etc.) will be required.
Which one of the options would be the fastest one in general if the dimensionality of the matrices is going to be around 20x20? If I understand correctly, every update (with (//) in the case of array) requires a full copy, so going from 20x20 = 400 elements to 20*21/2 = 210 elements in cases 2 or 3 would make a lot of sense, but access is slower for case 2, and case 3 needs an index conversion at some point.
Are there any guidelines?
Btw: the 3rd option is not really a good one, as computing f^-1 requires square roots.
You could try using Data.Array with a specialized Ix instance that only generates the upper half of the matrix:
newtype Symmetric = Symmetric { pair :: (Int, Int) } deriving (Ord, Eq)

instance Ix Symmetric where
  range (Symmetric (x1,y1), Symmetric (x2,y2)) =
    map Symmetric [(x,y) | x <- range (x1,x2), y <- range (y1,y2), x >= y]
  inRange (lo,hi) i = x <= hix && x >= lox && y <= hiy && y >= loy && x >= y
    where
      (lox,loy) = pair lo
      (hix,hiy) = pair hi
      (x,y) = pair i
  index (lo,hi) i
    | inRange (lo,hi) i = (x - loy) + (sum $ take (y - loy) [hix - lox, hix - lox - 1 ..])
    | otherwise = error "Error in array index"
    where
      (lox,loy) = pair lo
      (hix,hiy) = pair hi
      (x,y) = pair i

sym x y
  | x < y = Symmetric (y,x)
  | otherwise = Symmetric (x,y)
*Main Data.Ix> let a = listArray (sym 0 0, sym 6 6) [0..]
*Main Data.Ix> a ! sym 3 2
14
*Main Data.Ix> a ! sym 2 3
14
*Main Data.Ix> a ! sym 2 2
13
*Main Data.Ix> length $ elems a
28
*Main Data.Ix> let b = listArray (sym 0 0, sym 19 19) [0..]
*Main Data.Ix> length $ elems b
210
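Relatedly, option 3's forward mapping f needs no square roots; only inverting it to recover (i,j) from k does. A minimal standalone sketch of such an f, assuming 0-based indices, i <= j, and row-major order over the upper triangle (triIndex is an illustrative name):
-- Flat index of element (i, j), i <= j, in an n x n upper-triangular matrix.
-- Row i is preceded by n + (n-1) + ... + (n-i+1) = i*n - i*(i-1)/2 elements.
triIndex :: Int -> (Int, Int) -> Int
triIndex n (i, j)
  | i <= j    = i * n - i * (i - 1) `div` 2 + (j - i)
  | otherwise = triIndex n (j, i)  -- exploit symmetry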
There is a fourth option: use an array of decreasingly-large arrays. I would go with either option 1 (using a full array and just storing every element twice) or this last one. If you intend to be updating a lot of elements, I strongly recommend using a mutable array; IOArray and STArray are popular choices.
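As a sketch of the mutable-array route, one might build the matrix in ST and then freeze it; buildSym and its fill values are made up for illustration, and both (i,j) and (j,i) are written to preserve symmetry:
import Control.Monad (forM_)
import Data.Array.ST (newArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray)

buildSym :: Int -> UArray (Int, Int) Int
buildSym n = runSTUArray $ do
  a <- newArray ((0, 0), (n - 1, n - 1)) 0
  forM_ [(i, j) | i <- [0 .. n - 1], j <- [i .. n - 1]] $ \(i, j) -> do
    let v = i + j          -- placeholder value
    writeArray a (i, j) v
    writeArray a (j, i) v  -- keep m(i,j) == m(j,i)
  return a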
Unless this is for homework or something, you should also take a peek at Hackage. A quick look suggests the problem of manipulating matrices has been solved several times already.
