Moving rows between multiple subarrays - arrays

My question follows from a previous question that I asked, but without including my own code (which I should have done initially).
Moving rows between subarrays
which solved my dilemma only partially. I have nonetheless adopted its method in the code below.
Here is the code relevant to my specific problem:
K <- 2 # number of equally-sized (sub)populations
N <- 5 # total number of sampled individuals
Hstar <- 5 # total number of haplotypes
probs <- rep(1/Hstar, Hstar) # haplotype frequencies
m = 0.1 # migration rate between subpopulations
perms <- 10000 # number of permutations
## Set up container(s) to hold the identity of each individual from each permutation ##
num.specs <- ceiling(N / K)
## Create an ID for each haplotype ##
haps <- 1:Hstar
## Assign individuals (N) to each subpopulation (K) ##
specs <- 1:num.specs
## Generate permutations, assume each permutation has N individuals, and sample those individuals' haplotypes from the probabilities ##
gen.perms <- function() {
sample(haps, size = num.specs, replace = TRUE, prob = probs)
}
pop <- array(dim = c(perms, num.specs, K))
for (i in 1:K) {
pop[,, i] <- replicate(perms, gen.perms())
}
## Allow individuals from permutations to migrate between subpopulations ##
for (i in 1:K) {
if (m != 0){
ind <- sample(perms, size = perms * m, replace = FALSE) # sample random row from random subpopulation
}
pop[ind,] ## should swap rows between subarrays, but instead throws an error.
}
'ind' identifies the rows that are to be swapped.
The goal is to swap rows from one subpopulation (= subarray) to the other as initially asked in the linked question. For example, switch row 1 of subarray 1 with row 100 of subarray 2. Most importantly, I need to preserve the array type. In the end, 'pop' must have dimensions = c(perms, num.specs, K). Can this be done?
Any assistance is greatly appreciated.

I think you forgot a comma. pop has three dimensions, so the indexing line inside the migration loop needs two commas:
pop[ind,,] ## should swap rows between subarrays, but instead throws an error.
If you want to keep working with the modified array, you need to store the result in a variable, for example:
pop_new <- pop[ind,,]
Alternatively, assign the result back to pop itself, which overwrites its old contents. If you use the same index vector (ind) on both sides, each selected row is simply replaced by itself; wrapping the right-hand index in sample() shuffles the selected rows at random:
pop[ind,,] <- pop[sample(ind),,]
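Note that pop[ind,,] <- pop[sample(ind),,] permutes whole rows across all K slices together, so a row never actually moves from one subpopulation to the other. A minimal sketch of a swap between two subarrays follows, assuming K = 2 and that the same number of rows migrates in each direction. The small dimensions, the seed, and the names ind1, ind2, n.mig, and before are illustrative assumptions, not part of the original code.

```r
set.seed(1)  # only to make the sketch reproducible
perms <- 10; num.specs <- 3; K <- 2; m <- 0.2

# Toy population with the same layout as in the question: perms x num.specs x K
pop <- array(sample(1:5, perms * num.specs * K, replace = TRUE),
             dim = c(perms, num.specs, K))
before <- pop  # copy kept only to check the result

n.mig <- round(perms * m)       # assumed: number of migrating rows per subpopulation
ind1 <- sample(perms, n.mig)    # rows leaving subpopulation 1
ind2 <- sample(perms, n.mig)    # rows leaving subpopulation 2

# Swap through a temporary copy so neither side is overwritten too early
tmp <- pop[ind1, , 1]
pop[ind1, , 1] <- pop[ind2, , 2]
pop[ind2, , 2] <- tmp

dim(pop)  # still perms x num.specs x K
```

Because only elements are assigned, pop stays an array with dimensions c(perms, num.specs, K). For K > 2 the same three-line swap could be applied to each pair of subpopulations.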

Related

Altering arrays to add/remove entries at each time-step in R

This question, probably has a simple solution but I cannot think of how to do it...
So I have a script as follows:
# ------------------ MODEL SETUP ----------------------------------------
# simulation length
t_max <- 50
# arena
arena_x <- 100
arena_y <- 100
# plant parameters
a <- 0.1
b <- 0.1
g <- 1
iterations <- 5
totalBiomass <- matrix(0, nrow = iterations, ncol = 1)
# starting loop
sep <- 10
# Original matrix (one grid coordinate per row/column, so the data length matches the matrix size)
plantLocsX <- matrix(rep(seq(0, arena_x, sep), 1 + arena_y/sep),
nrow = 1 + arena_x/sep,
ncol = 1 + arena_y/sep)
plantLocsY <- t(plantLocsX)
# list of plant locations and initial sizes (plantLocsX must exist before nplants is computed)
nplants <- dim(plantLocsX)[1] * dim(plantLocsX)[2]
plantSizes <- matrix(1, nrow = nplants, ncol = 1)
# Plot the plants
radius <- sqrt( plantSizes/ pi )
symbols(plantLocsX, plantLocsY, radius, xlim = c(0,100), ylim=c(0,100), inches=0.05, fg = "green",
xlab = "x domain (m)", ylab = "y domain (m)", main = "Random Plant Locations", col.main = 51)
# Calculate distances between EACH POSSIBLE PAIR of plants
distances <- matrix(0,nrow=nplants,ncol=nplants)
for (i in 1:nplants){
for (j in 1:nplants){
distances[i,j] <- sqrt( (plantLocsX[i]-plantLocsX[j])^2 + (plantLocsY[i]-plantLocsY[j])^2 )
}
}
# ------------------ MODEL RUNNING ---------------------------------------
I need to alter the arrays containing plant locations and plant sizes so that, at each time step, entries are removed and added (simulating mortality and reproduction, respectively). The "distances" matrix must then be updated to match the new plant locations and sizes after each iteration. I can only think of complex ways to do this, such as tearing down and rebuilding the matrices at each time step to fit the new number of elements, but there must be functions that make this simpler. Any advice?
Many thanks!!
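One possible approach, sketched under assumptions: keep one row per plant in a data frame, so that mortality is a logical row filter, reproduction is an rbind(), and the distance matrix is recomputed in one call with dist(). The survival probability, the number of seedlings, and the names plants, survive, n_new, and seedlings are made up for illustration.

```r
set.seed(42)  # only to make the sketch reproducible

# One row per plant: location and size
plants <- data.frame(x = runif(6, 0, 100), y = runif(6, 0, 100), size = 1)

# Mortality: keep each plant with probability 0.8 (assumed value)
survive <- runif(nrow(plants)) < 0.8
plants <- plants[survive, , drop = FALSE]

# Reproduction: append new plants at random locations (assumed rule)
n_new <- 2
seedlings <- data.frame(x = runif(n_new, 0, 100),
                        y = runif(n_new, 0, 100),
                        size = 1)
plants <- rbind(plants, seedlings)

# Recompute all pairwise Euclidean distances for the current set of plants
distances <- as.matrix(dist(plants[, c("x", "y")]))
```

Running the three steps inside the time loop would keep distances in sync with plants without any manual resizing of matrices.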

How do you dynamically create difference- or delta- columns in a data.frame?

My dataframe has column names of outstanding balance from Balance, Balance1, Balance2,...,Balance36.
I want to add a column for the delta between each month, i.e. Delta2 = Balance2 - Balance1
How can I simplify my method below?
dataset$delta1 = apply(dataset[, c("Balance1","Balance")], 1, function(x){x[2]-x[1]})
dataset$delta2 = apply(dataset[, c("Balance2","Balance1")], 1, function(x){x[2]-x[1]})
...
dataset$delta35 = apply(dataset[, c("Balance35","Balance34")], 1, function(x){x[2]-x[1]})
dataset$delta36 = apply(dataset[, c("Balance36","Balance35")], 1, function(x){x[2]-x[1]})
It boils down to a one-liner. First, name your dataset something short, df is the usual name. Then, use direct subtraction; there's zero need to call apply() to subtract one column from another:
df$delta1 <- df[,"Balance1"] - df[,"Balance"]
df$delta2 <- df[,"Balance2"] - df[,"Balance1"]
...
df$delta35 <- df[,"Balance35"] - df[,"Balance34"]
df$delta36 <- df[,"Balance36"] - df[,"Balance35"]
But since the whole computation has a regular structure, we're really only talking about generating an N x 36 array of differences, so use numeric column indices. Say your 37 "Balance*" columns sit at indices 50:86 and your delta_cols are 100:135, or whatever. Then the earlier month of each pair is balance_lhs <- 50:85 and the later month is 51:86, or just (50:85)+1 (remember that most operators like addition vectorize in R).
So your Nx36 array can be generated by just the one-liner:
df[,delta_cols] <- df[,(balance_lhs+1)] - df[,balance_lhs]
And you can compute delta_cols <- which(colnames(df) %in% paste0("delta", 1:36)) programmatically, to avoid magic-number column indices in your code.
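As a small runnable illustration of the same idea, here is a sketch that uses column names instead of numeric indices, so the delta columns do not have to exist beforehand. The three-column toy data frame and the helper vector bal are assumptions made for the example.

```r
# Toy data: Balance, Balance1, Balance2 (the real data would go up to Balance36)
df <- data.frame(Balance  = c(100, 200),
                 Balance1 = c(110, 190),
                 Balance2 = c(125, 185))

n <- 2                                         # number of deltas (36 in the question)
bal <- c("Balance", paste0("Balance", 1:n))    # column names in month order

# Later month minus earlier month, for all pairs at once
df[paste0("delta", 1:n)] <- df[bal[-1]] - df[bal[-length(bal)]]
df
```

Subtracting two data frames of the same shape works element by element, so delta1 is Balance1 - Balance and delta2 is Balance2 - Balance1.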
Use lapply to calculate delta for all 36 comparisons in one line.
# Sample data (37 columns, labelled Balance, Balance1, ...)
set.seed(2017);
df <- as.data.frame(matrix(runif(37 * 100), ncol = 37));
colnames(df) <- paste("Balance", c("", seq(1:36)), sep = "");
# List of difference vectors (36 difference vectors, labelled delta1, ...)
lst <- lapply(2:ncol(df), function(i) df[, i] - df[, i - 1]);
names(lst) <- paste("delta", seq(1:36), sep = "");
# Combine with original dataframe
df <- cbind.data.frame(
df,
as.data.frame(lst));

R routine always samples from last row of array instead of random rows

I have been debugging the following routine for some time.
A problem that came to my attention is that sampling is always done on the last row of my array every time I run the simulation. I want it to select rows at random each time the code is run.
Here's what I have:
N <- 10
Hstar <- 5
perms <- 10 ### How many permutations are we considering
specs <- 1:N
Set up a container to hold the identity of each individual from each permutation
pop <- array(dim = c(perms, N))
haps <- as.character(1:Hstar)
Assign probabilities
probs <- rep(1/Hstar, Hstar)
Generate permutations
for(i in 1:perms){
pop[i, ] <- sample(haps, size = N, replace = TRUE, prob = probs)
}
Make a matrix to hold the 1:N individuals from each permutation
HAC.mat <- array(dim = c(perms, N))
for(j in specs){
for(i in 1:perms){
ind.index <- sample(specs, size = j, replace = FALSE) ## which individuals will we sample
hap.plot <- pop[i, ind.index] ## pull those individuals from a permutation
HAC.mat[i, j] <- length(unique(hap.plot)) ## how many haplotypes did we get for a given sampling intensity (j) from each ### permutation (i)
}
}
When I look at ind.index and hap.plot, I notice that values from haps are always taken from the last row of the pop variable, and I cannot quite understand why this is occurring. I would like it to sample randomly from a given row in pop.
Any help is greatly appreciated.
I have found a workaround that looks like it works.
hap.plot <- pop[sample(nrow(pop), size = 1, replace = TRUE), ]
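A minimal sketch of how the workaround could sit inside the loop, assuming the goal is to draw a random permutation (row) first and then subsample j individuals from it. The seed and the variable name row are illustrative. It is also worth noting that ind.index and hap.plot inspected after the loop only hold the values from the final iteration (i equal to perms), which may be why the original code appeared to use only the last row.

```r
set.seed(7)  # only to make the sketch reproducible
N <- 10; Hstar <- 5; perms <- 10
haps <- as.character(1:Hstar)

# perms x N matrix of haplotypes, one permutation per row
pop <- matrix(sample(haps, perms * N, replace = TRUE), nrow = perms)

HAC.mat <- matrix(NA_integer_, nrow = perms, ncol = N)
for (j in 1:N) {
  for (i in 1:perms) {
    row <- sample(nrow(pop), size = 1)      # random permutation (row of pop)
    ind.index <- sample(N, size = j)        # which j individuals to sample
    hap.plot <- pop[row, ind.index]         # their haplotypes
    HAC.mat[i, j] <- length(unique(hap.plot))
  }
}
```

Compared with the one-line workaround, this keeps the subsampling of j individuals, so each column of HAC.mat still corresponds to a sampling intensity.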

Is there a way to reshape an array that does not maintain the original size (or a convenient work-around)?

As a simplified example, suppose I have a dataset composed of 40 sorted values. The values of this example are all integers, though this is not necessarily the case for the actual dataset.
import numpy as np
data = np.linspace(1,40,40)
I am trying to find the maximum value inside the dataset for certain window sizes. The formula to compute the window sizes yields a pattern that is best executed with arrays (in my opinion). For simplicity's sake, let's say the indices denoting the window sizes are a list [1,2,3,4,5]; this corresponds to window sizes of [2,4,8,16,32] (the pattern is 2**index).
## this code looks long because I've provided docstrings
## just in case the explanation was unclear
def shapeshifter(num_col, my_array=data):
    """
    This function reshapes an array to have 'num_col' columns, where
    'num_col' corresponds to index.
    """
    return my_array.reshape(-1, num_col)

def looper(num_col, my_array=data):
    """
    This function calls 'shapeshifter' and returns a list of the
    MAXimum values of each row in 'my_array' for 'num_col' columns.
    The length of each row (or the number of columns per row if you
    prefer) denotes the size of each window.
    EX:
        num_col = 2
        ==> window_size = 2
        ==> check max( data[1], data[2] ),
                  max( data[3], data[4] ),
                  max( data[5], data[6] ),
                  .
                  .
                  .
                  max( data[39], data[40] )
        for k rows, where k = len(my_array)//num_col
    """
    # pass the caller's array through instead of always using the global 'data'
    my_array = shapeshifter(num_col=num_col, my_array=my_array)
    rows = [my_array[index] for index in range(len(my_array))]
    res = []
    for index in range(len(rows)):
        res.append( max(rows[index]) )
    return res
So far, the code is fine. I checked it with the following:
check1 = looper(2)
check2 = looper(4)
print(check1)
>> [2.0, 4.0, ..., 38.0, 40.0]
print(len(check1))
>> 20
print(check2)
>> [4.0, 8.0, ..., 36.0, 40.0]
print(len(check2))
>> 10
So far so good. Now here is my problem.
def metalooper(col_ls, my_array=data):
    """
    This function calls 'looper' - which calls
    'shapeshifter' - for every 'col' in 'col_ls'.
    EX:
        j_list = [1,2,3,4,5]
        ==> col_ls = [2,4,8,16,32]
        ==> looper(2), looper(4),
            looper(8), ..., looper(32)
        ==> shapeshifter(2), shapeshifter(4),
            shapeshifter(8), ..., shapeshifter(32)
        such that looper(2^j) ==> shapeshifter(2^j)
        for j in j_list
    """
    res = []
    for col in col_ls:
        # forward 'my_array' so the argument is not silently ignored
        res.append(looper(num_col=col, my_array=my_array))
    return res
j_list = [2,4,8,16,32]
check3 = metalooper(j_list)
Running the code above provides this error:
ValueError: total size of new array must be unchanged
With 40 data points, the array can be reshaped into 2 columns of 20 rows, or 4 columns of 10 rows, or 8 columns of 5 rows, BUT at 16 columns, the array cannot be reshaped without clipping data since 40/16 ≠ integer. I believe this is the problem with my code, but I do not know how to fix it.
I am hoping there is a way to cut off the last values that do not fit into a full window. If this is not possible, I am hoping I can append zeroes to pad the array up to a size that can be reshaped, so that I can remove the zeroes afterwards. Or maybe even some complicated if - try - break block. What are some ways around this problem?
I think this will give you what you want in one step:
def windowFunc(a, window, f = np.max):
    return np.array([f(i) for i in np.split(a, range(window, a.size, window))])
With the default f, that will give you an array of maximums for your windows.
Generally, using np.split and range, this will let you split into a (possibly ragged) list of arrays:
def shapeshifter(num_col, my_array=data):
    return np.split(my_array, range(num_col, my_array.size, num_col))
You need a list of arrays because a 2D array can't be ragged (every row needs the same number of columns)
If you really want to pad with zeros, you can use np.pad:
def shapeshifter(num_col, my_array=data):
    # (-size) % num_col is the number of zeros needed to fill the last row (0 if it already fits)
    return np.pad(my_array, (0, (-my_array.size) % num_col), 'constant', constant_values = 0).reshape(-1, num_col)
Warning:
It is also technically possible to use, for example, a.resize(32,2), which will resize the ndarray in place and pad it with zeros (as you requested). But there are some big caveats:
You would need to calculate the second axis because -1 tricks don't work with resize.
If the original array a is referenced by anything else, a.resize will fail with the following error:
ValueError: cannot resize an array that references or is referenced
by another array in this way. Use the resize function
The resize function (i.e. np.resize(a, new_shape)) is not equivalent to a.resize: instead of padding with zeros, it fills the extra entries by repeating the array's values from the beginning.
Since you seem to want to reference a by a number of windows, a.resize isn't very useful. But it's a rabbit hole that's easy to fall into.
EDIT:
Looping through a list is slow. If your input is long and windows are small, the windowFunc above will bog down in the for loops. This should be more efficient:
def windowFunc2(a, window, f = np.max):
    tail = - (a.size % window)
    if tail == 0:
        return f(a.reshape(-1, window), axis = -1)
    else:
        body = a[:tail].reshape(-1, window)
        return np.r_[f(body, axis = -1), f(a[tail:])]
Here's a generalized way to reshape with truncation:
def reshape_and_truncate(arr, shape):
    desired_size_factor = np.prod([n for n in shape if n != -1])
    if -1 in shape:  # implicit array size
        desired_size = arr.size // desired_size_factor * desired_size_factor
    else:
        desired_size = desired_size_factor
    return arr.flat[:desired_size].reshape(shape)
Which your shapeshifter could use in place of reshape

How to create sub-arrays by indexing the i-th dimension of an array within for()?

In a for-loop, I run in i over an array which I would like to sub-index in dimension i. How can this be done? So a minimal example would be
(A <- array(1:24, dim = 2:4))
A[2,,] # i=1
A[,1,] # i=2
A[,,3] # i=3
where I index 'by foot'. I tried something along the lines of this but wasn't successful. Of course one could create "2,," as a string and then eval & parse the code, but that's ugly. Also, inside the for loop (over i), I could use aperm() to permute the array such that the new first dimension is the former ith, so that I can simply access the first component. But that's kind of ugly too and requires permuting the array back. Any ideas how to do it in a more R-like/elegant way?
The actual problem is for a multi-dimensional table() object, but I think the idea will remain the same.
Update
I accepted Rick's answer. I just present it with a for loop and simplified it further:
subindex <- c(2,1,3) # in the ith dimension, we would like to subindex by subindex[i]
for(i in seq_along(dim(A))) {
args <- list(1:2, 1:3, 1:4)
args[i] <- subindex[i]
print(do.call("[", c(list(A), args)))
}
#Build a multidimensional array
A <- array(1:24, dim = 2:4)
# Select a sub-array
indexNumber = 2
indexSelection = 1
# Build a parameter list indexing all the elements of A
parameters <- list(A, 1:2, 1:3, 1:4)
# Modify the appropriate list element to a single value
parameters[1 + indexNumber] <- indexSelection
# select the desired subarray
do.call("[", parameters)
# Now for something completely different!
#Build a multidimensional array
A <- array(1:24, dim = 2:4)
# Select a sub-array
indexNumber = 2
indexSelection = 1
reduced <- A[slice.index(A, indexNumber) == indexSelection]
dim(reduced) <- dim(A)[-indexNumber]
# Also works on the left-side
A[slice.index(A, 2)==2] <- -1:-8
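The do.call approach above hard-codes the index ranges 1:2, 1:3, 1:4. A minimal sketch of a wrapper that works for any number of dimensions follows; slice_dim is a hypothetical helper name, and it relies on the fact that a logical TRUE index selects everything along a dimension.

```r
# Hypothetical helper: take index k along dimension i of an array
slice_dim <- function(arr, i, k) {
  idx <- rep(list(TRUE), length(dim(arr)))  # TRUE selects the whole dimension
  idx[[i]] <- k                             # restrict only dimension i
  do.call(`[`, c(list(arr), idx))
}

A <- array(1:24, dim = 2:4)
subindex <- c(2, 1, 3)
for (i in seq_along(dim(A))) {
  print(slice_dim(A, i, subindex[i]))       # same as A[2,,], A[,1,], A[,,3]
}
```

Like plain indexing, this drops the selected dimension; appending drop = FALSE to the argument list would keep it if needed.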
