Altering arrays to add/remove entries at each time-step in R

This question probably has a simple solution, but I cannot think of how to do it.
I have a script as follows:
# ------------------ MODEL SETUP ----------------------------------------
# simulation length
t_max <- 50
# arena
arena_x <- 100
arena_y <- 100
# plant parameters
a <- 0.1
b <- 0.1
g <- 1
iterations <- 5
totalBiomass <- matrix(0, nrow = iterations, ncol = 1)
# starting loop
sep <- 10
# Original matrix (one seq() per column, so 1 + arena_y/sep repetitions)
plantLocsX <- matrix(rep(seq(0, arena_x, sep), 1 + arena_y/sep),
                     nrow = 1 + arena_x/sep,
                     ncol = 1 + arena_y/sep)
plantLocsY <- t(plantLocsX)
# list of plant locations and initial sizes
# (nplants can only be computed once plantLocsX exists)
nplants <- dim(plantLocsX)[1] * dim(plantLocsX)[2]
plantSizes <- matrix(1, nrow = nplants, ncol = 1)
# Plot the plants
radius <- sqrt( plantSizes/ pi )
symbols(plantLocsX, plantLocsY, radius, xlim = c(0,100), ylim=c(0,100), inches=0.05, fg = "green",
xlab = "x domain (m)", ylab = "y domain (m)", main = "Random Plant Locations", col.main = 51)
# Calculate distances between EACH POSSIBLE PAIR of plants
distances <- matrix(0,nrow=nplants,ncol=nplants)
for (i in 1:nplants){
  for (j in 1:nplants){
    distances[i,j] <- sqrt( (plantLocsX[i]-plantLocsX[j])^2 + (plantLocsY[i]-plantLocsY[j])^2 )
  }
}
# ------------------ MODEL RUNNING ---------------------------------------
I need to alter the arrays containing plant locations and plant sizes so that, at each time step, entries are removed and added (simulating mortality and reproduction, respectively). The "distances" matrix must then be updated with the new plant locations and sizes after each iteration. I can only think of complicated ways to do this, such as destroying and rebuilding the matrices at each time step to fit the new number of elements, but there must be functions that make this simpler. Any advice?
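One possible approach, sketched here under assumptions rather than taken from the post, is to keep one row per plant in a data frame, drop rows with logical indexing, append rows with rbind(), and recompute the distance matrix with dist() at each step. The mortality probability (p_death), the number of recruits (n_new) and the shortened t_max are hypothetical placeholders.

```r
set.seed(1)                        # reproducible illustration only
arena_x <- 100; arena_y <- 100; sep <- 10
t_max <- 5                         # shortened run for this sketch

# one row per plant: location and size (long format instead of grid matrices)
plants <- expand.grid(x = seq(0, arena_x, sep), y = seq(0, arena_y, sep))
plants$size <- 1

p_death <- 0.05                    # assumed per-step mortality probability
n_new   <- 3                       # assumed number of recruits per step

for (t in seq_len(t_max)) {
  # mortality: keep only the survivors (logical indexing drops rows)
  alive  <- runif(nrow(plants)) > p_death
  plants <- plants[alive, , drop = FALSE]

  # reproduction: append new plants at random locations
  recruits <- data.frame(x = runif(n_new, 0, arena_x),
                         y = runif(n_new, 0, arena_y),
                         size = 1)
  plants <- rbind(plants, recruits)
  rownames(plants) <- NULL         # reset row labels after removing/adding rows

  # pairwise distances for the current set of plants, recomputed each step
  distances <- as.matrix(dist(plants[, c("x", "y")]))
}

dim(distances)   # square matrix, one row/column per current plant
```

Because the plants live in rows rather than in fixed-size grid matrices, the number of plants can change freely between steps, and dist() replaces the double loop over all pairs.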
Many thanks!!

Related

R: Adding columns from one data frame to another, non-matching number of rows

I have a .txt file with millions of rows of data - DateTime (1-min intervals) and Precipitation.
I have a .csv file with thousands of rows of data - DateTime (daily intervals), MaxTemp, MinTemp, WindSpd, WindDir.
I import the .txt file as a data frame and do a few transformations. I then move this into a new data frame.
I import the .csv file as a data frame and do a few transformations. I then want to add the columns from this data frame into the new data frame (total of 7 columns). However, R throws an error: "Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 10382384, 32868, 1"
I know the number of rows is different, however, this is the format I need for the next step in processing. This could be easily done in Excel were it not for the crazy amount of rows.
Simulated code is below, which produces the same error:
a <- as.character(c(1,2,3,4,5,6,7,8,9,10))
b <- c(paste("Date", a))
c <- c(rnorm(10, mean = 5, sd = 2.1))
Frame1 <- data.frame(b,c)
d <- as.character(c(1,2,3))
e <- c(paste("Date", d))
f <- c(rnorm(3, mean = 1, sd = 0.7))
g <- c(rnorm(3, mean = 3, sd = 2))
h <- c(rnorm(3, mean = 8, sd = 1))
Frame2 <- data.frame(e,f,g,h)
NewFrame <- cbind(Frame1)
NewFrame <- cbind(NewFrame, Frame2)
I have tried a *_join, but it throws the error: "Error: `by` must be supplied when `x` and `y` have no common variables. ℹ use `by = character()` to perform a cross-join." To me this reads as if it wants to match things up, which I don't need. I really just need to plop these two datasets side by side for the next processing step. Help?
The data frames MUST have an equal number of rows. To compensate, I added rows to the smaller dataset so that it contains the same number of rows as the larger dataset (in my case, the smaller one will always be the .csv file) and filled them with "NA" values. The application I use for downstream processing knows how to handle the "NA" values, so this works well for me.
I've run the solution with a representative dataset and I am able to cbind the two data frames together.
Sample code with the simulated dataset:
#create data frame 1
a <- as.character(c(1:10))
b <- c(paste("Date", a))
c <- c(rnorm(10, mean = 5, sd = 2.1))
Frame1 <- data.frame(b,c)
#create data frame 2
d <- as.character(c(1,2,3))
e <- c(paste("Date", d))
f <- c(rnorm(3, mean = 1, sd = 0.7))
g <- c(rnorm(3, mean = 3, sd = 2))
h <- c(rnorm(3, mean = 8, sd = 1))
Frame2 <- data.frame(e,f,g,h)
#find the maximum number of rows
maxlen <- max(nrow(Frame1), nrow(Frame2))
#finds the minimum number of rows
rowrow <- min(nrow(Frame1), nrow(Frame2))
#adds enough rows to the smaller dataset to equal the number of rows
#in the larger dataset. Populates the rows with "NA" values
Frame2[rowrow+(maxlen-rowrow),] <- NA
#creates the new data frame from the two frames
NewFrame <- cbind(Frame1, Frame2)
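The same padding idea can be wrapped in a small function so that it does not matter which frame is shorter. This is only a sketch: pad_rows is a hypothetical helper name, and it reuses the trick from the sample code of assigning NA to the last required row, which fills the rows in between with NA as well.

```r
# Hypothetical helper: pad a data frame with NA rows until it has n rows
pad_rows <- function(df, n) {
  if (nrow(df) < n) df[n, ] <- NA   # assigning row n also creates the rows before it
  df
}

Frame1 <- data.frame(b = paste("Date", 1:10), c = rnorm(10, mean = 5, sd = 2.1))
Frame2 <- data.frame(e = paste("Date", 1:3),  f = rnorm(3, mean = 1, sd = 0.7))

# pad both frames to the larger row count, then bind them side by side
n <- max(nrow(Frame1), nrow(Frame2))
NewFrame <- cbind(pad_rows(Frame1, n), pad_rows(Frame2, n))
dim(NewFrame)   # 10 rows, 4 columns
```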

List comprehensions in NumPy arrays

In essence this is what I want to create
import numpy as np
N = 100 # POPULATION SIZE
D = 30 # DIMENSIONALITY
lowerB = [-5.12] * D # LOWER BOUND (IN ALL DIMENSIONS)
upperB = [5.12] * D # UPPER BOUND (IN ALL DIMENSIONS)
# INITIALISATION PHASE
X = np.empty([N, D]) # EMPTY FLIES ARRAY OF SIZE: (N,D)
# INITIALISE FLIES WITHIN BOUNDS
for i in range(N):
    for d in range(D):
        X[i, d] = np.random.uniform(lowerB[d], upperB[d])
but I want to do so without the for loops, to save time, and use list comprehensions instead.
I have tried things like
np.array([(x,y)for x in range(N)for y in range(D)])
but this doesn’t give me an array of shape (100, 30). Does anyone know a tutorial or the correct documentation I should be looking at so I can learn exactly how to do this?

Moving rows between multiple subarrays

My question follows from a previous question that I asked, but without including my own code (which I should have done initially).
Moving rows between subarrays
The answer there solves my dilemma only partially, but I have adopted its method in the code below.
Here is relevant code to my specific problem:
K <- 2 # number of equally-sized (sub)populations
N <- 5 # total number of sampled individuals
Hstar <- 5 # total number of haplotypes
probs <- rep(1/Hstar, Hstar) # haplotype frequencies
m = 0.1 # migration rate between subpopulations
perms <- 10000 # number of permutations
## Set up container(s) to hold the identity of each individual from each permutation ##
num.specs <- ceiling(N / K)
## Create an ID for each haplotype ##
haps <- 1:Hstar
## Assign individuals (N) to each subpopulation (K) ##
specs <- 1:num.specs
## Generate permutations, assume each permutation has N individuals, and sample those individuals' haplotypes from the probabilities ##
gen.perms <- function() {
  sample(haps, size = num.specs, replace = TRUE, prob = probs)
}
pop <- array(dim = c(perms, num.specs, K))
for (i in 1:K) {
  pop[,, i] <- replicate(perms, gen.perms())
}
## Allow individuals from permutations to migrate between subpopulations ##
for (i in 1:K) {
  if (m != 0){
    ind <- sample(perms, size = perms * m, replace = FALSE) # sample random row from random subpopulation
  }
  pop[ind,] ## should swap rows between subarrays, but instead throws an error.
}
'ind' identifies the rows that are to be swapped.
The goal is to swap rows from one subpopulation (= subarray) to the other as initially asked in the linked question. For example, switch row 1 of subarray 1 with row 100 of subarray 2. Most importantly, I need to preserve the array type. In the end, 'pop' must have dimensions = c(perms, num.specs, K). Can this be done?
Any assistance is greatly appreciated.
I think you forgot a comma. I guess you need to change line 38 to this:
pop[ind,,] ## should swap rows between subarrays, but instead throws an error.
If you want to keep working with the selected rows of the pop array, you need to store them in a variable, for example like this:
pop_new <- pop[ind,,]
Or you can store the result in the same variable, which means it will overwrite the old contents of pop. When you use the same vector (ind) to index both the target and the source, only the selected rows are replaced. With sample() the selected rows are shuffled randomly among themselves:
pop[ind,,] <- pop[sample(ind),,]
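If the aim is to exchange the selected rows between the two subarrays themselves, one possible sketch (an assumption about the intent, valid for K = 2, and using a small toy array instead of the simulated haplotypes) is to index the third dimension explicitly and go through a temporary copy:

```r
set.seed(42)
perms <- 10; num.specs <- 3; K <- 2; m <- 0.2

# toy population: subpopulation 1 holds only 1s, subpopulation 2 only 2s
pop <- array(rep(1:K, each = perms * num.specs), dim = c(perms, num.specs, K))

# rows (permutations) chosen to migrate
ind <- sample(perms, size = perms * m, replace = FALSE)

# swap the chosen rows between subarray 1 and subarray 2 via a temporary copy
tmp           <- pop[ind, , 1]
pop[ind, , 1] <- pop[ind, , 2]
pop[ind, , 2] <- tmp

dim(pop)   # still c(perms, num.specs, K)
```

The array type and its dimensions are preserved because only existing cells are overwritten; nothing is added to or removed from pop.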

R routine always samples from last row of array instead of random rows

I have been debugging the following routine for some time.
A problem that came to my attention is that sampling is always done on the last row of my array every time I run the simulation. I want it to select rows at random each time the code is run.
Here's what I have:
N <- 10
Hstar <- 5
perms <- 10 ### How many permutations are we considering
specs <- 1:N
Set up a container to hold the identity of each individual from each permutation
pop <- array(dim = c(perms, N))
haps <- as.character(1:Hstar)
Assign probabilities
probs <- rep(1/Hstar, Hstar)
Generate permutations
for(i in 1:perms){
  pop[i, ] <- sample(haps, size = N, replace = TRUE, prob = probs)
}
Make a matrix to hold the 1:N individuals from each permutation
HAC.mat <- array(dim = c(perms, N))
for(j in specs){
  for(i in 1:perms){
    ind.index <- sample(specs, size = j, replace = FALSE) ## which individuals will we sample
    hap.plot <- pop[i, ind.index] ## pull those individuals from a permutation
    HAC.mat[i, j] <- length(unique(hap.plot)) ## how many haplotypes did we get for a given sampling intensity (j) from each permutation (i)
  }
}
When I look at ind.index and hap.plot, I notice that values from haps are always taken from the last row in the pop variable, and I cannot quite understand why this is occurring. I would like it to randomly sample from a given row in pop.
Any help is greatly appreciated.
I have found a workaround that looks like it works.
hap.plot <- pop[sample(nrow(pop), size = 1, replace = TRUE), ]
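One possible explanation, which the post does not confirm, is that ind.index and hap.plot were inspected after the loops had finished: loop variables in R keep the values from their final iteration, which here is i = perms, the last row. The sketch below combines the workaround with the original column subsetting, so that both the row and the individuals are drawn at random; row.index is a hypothetical name introduced for illustration.

```r
set.seed(123)
N <- 10; Hstar <- 5; perms <- 10
specs <- 1:N
haps  <- as.character(1:Hstar)
probs <- rep(1/Hstar, Hstar)

# one simulated permutation per row
pop <- t(replicate(perms, sample(haps, size = N, replace = TRUE, prob = probs)))

HAC.mat <- array(dim = c(perms, N))
for (j in specs) {
  for (i in 1:perms) {
    row.index <- sample(nrow(pop), size = 1)               # random permutation (row)
    ind.index <- sample(specs, size = j, replace = FALSE)  # random individuals (columns)
    HAC.mat[i, j] <- length(unique(pop[row.index, ind.index]))
  }
}
range(HAC.mat)   # smallest and largest haplotype counts observed
```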

Read multidimensional NetCDF as data frame in R

I use a netCDF file which stores one variable and has the following dimensions: lon, lat, time.
Generally speaking, I wish to compare it against other data that I already have in R, stored as a data frame: the first two columns are coordinates in WGS84, and the remaining columns are values for specific times.
So I wrote the following code.
# since ncFile$dim$time$units says: [1] "days since 1900-1-1"
daysFromDate <- function(data1, data2="1900-01-01")
{
  round(as.numeric(difftime(data1,data2,units = "days")))
}
#study area:
lon <- c(40.25, 48)
lat <- c(16, 24.25)
myTime <- c(daysFromDate("2008-01-16"), daysFromDate("2011-12-31"))
varName <- "spei"
require(ncdf4)
require(RCurl)
x <- getBinaryURL("http://digital.csic.es/bitstream/10261/104742/3/SPEI_01.nc")
ncFile <- nc_open(x)
LonIdx <- which( ncFile$dim$lon$vals >= lon[1] | ncFile$dim$lon$vals <= lon[2])
LatIdx <- which( ncFile$dim$lat$vals >= lat[1] & ncFile$dim$lat$vals <= lat[2])
TimeIdx <- which( ncFile$dim$time$vals >= myTime[1] & ncFile$dim$time$vals <= myTime[2])
MyVariable <- ncvar_get( ncFile, varName)[ LonIdx, LatIdx, TimeIdx]
I thought that a data frame would be returned, so that I could easily manipulate the data (for example, check correlation or create a plot).
Unfortunately, a 3-dimensional array has been returned instead.
How can I reformat this into a data frame with the following columns: X, Y, Time1, Time2, ...?
So, example data would look as follows:
X Y 2014-01-01 2014-01-02 2014-01-03
50 17 0.5 0.4 0.3
where 0.5, 0.4 and 0.3 are example variable values
Or maybe there is a different solution?
OK, try the following code, but note that it assumes the ranges are densely filled. I also changed the lon test from "or" to "and".
require(ncdf4)
nc <- nc_open("SPEI_01.nc")
print(nc)
lon <- ncvar_get(nc, "lon")
lat <- ncvar_get(nc, "lat")
time <- ncvar_get(nc, "time")
lonIdx <- which( lon >= 40.25 & lon <= 48.00)
latIdx <- which( lat >= 16.00 & lat <= 24.25)
myTime <- c(daysFromDate("2008-01-16"), daysFromDate("2011-12-31"))
timeIdx <- which(time >= myTime[1] & time <= myTime[2])
data <- ncvar_get(nc, "spei")[lonIdx, latIdx, timeIdx]
indices <- expand.grid(lon[lonIdx], lat[latIdx], time[timeIdx])
print(length(indices))
class(indices)
summary(indices)
str(indices)
df <- data.frame(cbind(indices, as.vector(data)))
summary(df)
str(df)
UPDATE
OK, it looks like I understand what you want, but I have no direct solution. What I've got so far is this: split the data frame using either the split() function or the data.table package. After splitting by X and Y, you'll get lists of small data frames where X and Y are constant within a given frame. It is probably possible to transpose and recombine them, but I have no idea how. It might be a good idea to continue working with the data as columns. The lists are nested but could be flattened. Here is a link on splitting in R: http://www.uni-kiel.de/psychologie/rexrepos/posts/dfSplitMerge.html
Code, as continued from previous example
require(data.table)
colnames(df) <- c("X","Y","Time","spei")
df$Time <- as.Date(df$Time, origin="1900-01-01")
dt <- as.data.table(df)
summary(dt)
# Taken from https://github.com/Rdatatable/data.table/issues/1389
# x data.table
# f use `by` argument instead - unlike data.frame
# drop logical default FALSE will include `by` columns in resulting data.tables - unlike data.frame
# by character column names on which split into lists
# flatten logical default FALSE will result in recursive nested list having data.table as leafs
# ... ignored
split.data.table <- function(x, f, drop = FALSE, by, flatten = FALSE, ...){
  if(missing(by) && !missing(f)) by = f
  stopifnot(!missing(by), is.character(by), is.logical(drop), is.logical(flatten), !".ll" %in% names(x), by %in% names(x), !"nm" %in% by)
  if(!flatten){
    .by = by[1L]
    tmp = x[, list(.ll=list(.SD)), by = .by, .SDcols = if(drop) setdiff(names(x), .by) else names(x)]
    setattr(ll <- tmp$.ll, "names", tmp[[.by]])
    if(length(by) > 1L) return(lapply(ll, split.data.table, drop = drop, by = by[-1L])) else return(ll)
  } else {
    tmp = x[, list(.ll=list(.SD)), by=by, .SDcols = if(drop) setdiff(names(x), by) else names(x)]
    setattr(ll <- tmp$.ll, 'names', tmp[, .(nm = paste(.SD, collapse = ".")), by = by, .SDcols = by]$nm)
    return(ll)
  }
}
# here is data.table split
q <- split.data.table(dt, by = c("X","Y"), drop=FALSE)
str(q)
# here is data frame split
qq <- split(df, list(df$X, df$Y))
str(qq)
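For the X, Y, Time1, Time2, ... layout asked for in the question, one possible sketch (not part of the answer above) is to reshape the 3-D array directly instead of splitting and recombining. It uses a small synthetic array in place of the subset read from the NetCDF file, so the coordinate and time values below are assumptions chosen for illustration.

```r
# synthetic stand-ins for the subset read from the NetCDF file (assumed values)
lon  <- c(40.25, 40.75, 41.25)
lat  <- c(16.25, 16.75)
time <- as.Date(c("2008-01-16", "2008-02-16"))
data <- array(seq_len(3 * 2 * 2) / 10,
              dim = c(length(lon), length(lat), length(time)))

# collapse the lon and lat dimensions into rows; each time slice becomes a column
wide <- matrix(as.vector(data),
               nrow = length(lon) * length(lat),
               ncol = length(time))
colnames(wide) <- as.character(time)

# expand.grid varies its first argument fastest, matching the array's storage order
result <- cbind(expand.grid(X = lon, Y = lat), wide)
result
```

This relies on R storing arrays with the first index varying fastest, so row k of the reshaped matrix corresponds to row k of expand.grid(X = lon, Y = lat).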
