How to create a grid from 1D array using R? - arrays

I have a file which contains a 209091 element 1D binary array representing the global land area
which can be downloaded from here:
ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_flags_2002/
I want to create a full grid from the 1D data arrays using the provided ancillary row and column files, globland_r and globland_c, which can be downloaded from here:
ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_ancil/
There is MATLAB code written for this purpose, and I want to translate it to R, but I do not know MATLAB:
function [gridout, EASE_r, EASE_s] = mkgrid_global(x)
%MKGRID_GLOBAL(x) Creates a matrix for mapping
% gridout = mkgrid_global(x) uses the 209091 element array (x) and returns
% a 586x1383 matrix (gridout).
%Load ancillary EASE grid row and column data, where <MyDir> is the path to
%wherever the globland_r and globland_c files are located on your machine.
fid = fopen('C:\MyDir\globland_r','r');
EASE_r = fread(fid, 209091, 'int16');
fclose(fid);
fid = fopen('C:\MyDir\globland_c','r');
EASE_s = fread(fid, 209091, 'int16');
fclose(fid);
gridout = NaN.*zeros(586,1383);
%Loop through the element array
for i=1:1:209091
%Distribute each element to the appropriate location in the output
%matrix (but MATLAB is (1,1)-based, hence the +1 on row and column)
gridout(EASE_r(i)+1, EASE_s(i)+1) = x(i);
end
Edit following the solution of @mdsumner:
The files MLLATLSB and MLLONLSB (4-byte integers) contain latitude and longitude (multiply by 1e-5) for geo-locating the full global EASE grid matrix (586×1383)
MLLATLSB and MLLONLSB can be downloaded from here:
ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_ancil/

## the sparse dims, literally the xcol * yrow indexes
dims <- c(1383, 586)
cfile <- "ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_ancil/globland_c"
rfile <- "ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_ancil/globland_r"
## be nice, don't abuse this
col <- readBin(cfile, "integer", n = prod(dims), size = 2, signed = FALSE)
row <- readBin(rfile, "integer", n = prod(dims), size = 2, signed = FALSE)
## example data file
fdat <- "ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_flags_2002/flags_2002170A.bin"
dat <- readBin(fdat, "integer", n = prod(dims), size = 1, signed = FALSE)
## now get serious
m <- matrix(as.integer(NA), dims[2L], dims[1L])
m[cbind(row + 1L, col + 1L)] <- dat
image(t(m)[,dims[2]:1], col = rainbow(length(unique(m)), alpha = 0.5))
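The key step above is indexing a matrix with a two-column (row, column) matrix, after shifting the zero-based file indexes to R's one-based indexes. A minimal sketch on a toy grid, where every value is made up for illustration and nothing is read from the real files:

```r
## toy 3 x 4 grid with five "land" cells given as zero-based row/column indexes
nr <- 3L
nc <- 4L
row0 <- c(0L, 0L, 1L, 2L, 2L)       # stand-in for the globland_r values
col0 <- c(0L, 3L, 1L, 0L, 2L)       # stand-in for the globland_c values
vals <- c(11L, 12L, 13L, 14L, 15L)  # stand-in for one data file

grid <- matrix(NA_integer_, nr, nc)
## each row of the two-column index matrix addresses one (row, col) cell
grid[cbind(row0 + 1L, col0 + 1L)] <- vals
print(grid)
```

Cells that never appear in the index pair stay NA, which is how ocean cells end up empty in the full grid.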
Maybe we can reconstruct this map projection too.
flon <- "MLLONLSB"
flat <- "MLLATLSB"
## the key is that these are integers, floats scaled by 1e5
lon <- readBin(flon, "integer", n = prod(dims), size = 4) * 1e-5
lat <- readBin(flat, "integer", n = prod(dims), size = 4) * 1e-5
## this is all we really need from now on
range(lon)
range(lat)
library(raster)
library(rgdal) ## need for coordinate transformation
ex <- extent(projectExtent(raster(extent(range(lon), range(lat)), crs = "+proj=longlat"), "+proj=cea"))
grd <- raster(ncols = dims[1L], nrows = dims[2L], xmn = xmin(ex), xmx = xmax(ex), ymn = ymin(ex), ymx = ymax(ex), crs = "+proj=cea")
There is probably an "out by half pixel" error in there, left as an exercise.
Test
plot(setValues(grd, m), col = rainbow(max(m, na.rm = TRUE), alpha = 0.5))
Hohum
library(maptools)
data(wrld_simpl)
plot(spTransform(wrld_simpl, CRS(projection(grd))), add = TRUE)
We can now save the valid cellnumbers to match our "grd" template, then read any particular dat-file and just populate the template with those values based on cellnumbers. Also, it seems someone trod nearly this path earlier but not much was gained:
How to identify lat and long for a global matrix?
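The idea of saving the valid cell numbers once and reusing them for every data file could be sketched as follows. This assumes cells are numbered row by row from the top-left corner (the convention raster uses); the helper names cell_from_rowcol0 and fill_template are hypothetical, and the index and data values are stand-ins rather than the real file contents.

```r
dims <- c(1383L, 586L)  # columns, rows of the full EASE grid, as above

## hypothetical helper: zero-based row/column -> row-major cell number
cell_from_rowcol0 <- function(row0, col0, ncols = dims[1L]) {
  row0 * ncols + col0 + 1L
}

## stand-ins for the values read from globland_r / globland_c
row0 <- c(0L, 0L, 585L)
col0 <- c(0L, 1382L, 1382L)
cells <- cell_from_rowcol0(row0, col0)

## hypothetical helper: drop one file's values into a full-length vector
fill_template <- function(dat, cells, ncell = prod(dims)) {
  v <- rep(NA_integer_, ncell)
  v[cells] <- dat
  v
}
v <- fill_template(c(1L, 2L, 3L), cells)
```

Under that numbering assumption, a vector like v could be passed to setValues(grd, v) for each new data file without rebuilding the matrix.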

Related

R: Adding columns from one data frame to another, non-matching number of rows

I have a .txt file with millions of rows of data - DateTime (1-min intervals) and Precipitation.
I have a .csv file with thousands of rows of data - DateTime (daily intervals), MaxTemp, MinTemp, WindSpd, WindDir.
I import the .txt file as a data frame and do a few transformations. I then move this into a new data frame.
I import the .csv file as a data frame and do a few transformations. I then want to add the columns from this data frame to the new data frame (a total of 7 columns). However, R throws an error: "Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 10382384, 32868, 1"
I know the number of rows is different, however, this is the format I need for the next step in processing. This could be easily done in Excel were it not for the crazy amount of rows.
Simulated code is below, which produces the same error:
a <- as.character(c(1,2,3,4,5,6,7,8,9,10))
b <- c(paste("Date", a))
c <- c(rnorm(10, mean = 5, sd = 2.1))
Frame1 <- data.frame(b,c)
d <- as.character(c(1,2,3))
e <- c(paste("Date", d))
f <- c(rnorm(3, mean = 1, sd = 0.7))
g <- c(rnorm(3, mean = 3, sd = 2))
h <- c(rnorm(3, mean = 8, sd = 1))
Frame2 <- data.frame(e,f,g,h)
NewFrame <- cbind(Frame1)
NewFrame <- cbind(NewFrame, Frame2)
I have tried a *_join but it throws the error: "Error: `by` must be supplied when `x` and `y` have no common variables. Use `by = character()` to perform a cross-join." To me this reads like it wants to match things up, which I don't need. I really just need to plop these two datasets side-by-side for the next processing step. Help?
The data frames MUST have an equal number of rows for cbind to work. To compensate, I added rows filled with NA values to the smaller dataset (in my case, always the .csv file) until it has as many rows as the larger dataset. The application I use for downstream processing knows how to handle the NA values, so this works well for me.
I've run the solution with a representative dataset and I am able to cbind the two data frames together.
Sample code with the simulated dataset:
#create data frame 1
a <- as.character(c(1:10))
b <- c(paste("Date", a))
c <- c(rnorm(10, mean = 5, sd = 2.1))
Frame1 <- data.frame(b,c)
#create data frame 2
d <- as.character(c(1,2,3))
e <- c(paste("Date", d))
f <- c(rnorm(3, mean = 1, sd = 0.7))
g <- c(rnorm(3, mean = 3, sd = 2))
h <- c(rnorm(3, mean = 8, sd = 1))
Frame2 <- data.frame(e,f,g,h)
#find the maximum number of rows
maxlen <- max(nrow(Frame1), nrow(Frame2))
#finds the minimum number of rows
rowrow <- min(nrow(Frame1), nrow(Frame2))
#adds enough rows to the smaller dataset to equal the number of rows
#in the larger dataset. Populates the rows with "NA" values
Frame2[rowrow+(maxlen-rowrow),] <- NA
#creates the new data frame from the two frames
NewFrame <- cbind(Frame1, Frame2)
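The padding step above relies on Frame2 being the shorter frame. A slightly more general sketch pads whichever frame is shorter before binding; pad_rows is a hypothetical helper written for this illustration, not part of base R.

```r
## hypothetical helper: extend a data frame with NA rows up to n rows
pad_rows <- function(df, n) {
  if (nrow(df) < n) df[(nrow(df) + 1L):n, ] <- NA
  df
}

Frame1 <- data.frame(b = paste("Date", 1:10),
                     c = rnorm(10, mean = 5, sd = 2.1))
Frame2 <- data.frame(e = paste("Date", 1:3),
                     f = rnorm(3, mean = 1, sd = 0.7),
                     g = rnorm(3, mean = 3, sd = 2),
                     h = rnorm(3, mean = 8, sd = 1))

## pad both to the longer length; the longer frame is returned unchanged
n <- max(nrow(Frame1), nrow(Frame2))
NewFrame <- cbind(pad_rows(Frame1, n), pad_rows(Frame2, n))
dim(NewFrame)
```

Note that this only places the frames side by side; the rows are not matched on date, which is what the question asked for.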

Get Matrix 2D from Matrix 3D with a given choice of the third dimension corresponding to the first dimension

I have:
A 3D matrix A of size (m, n, k).
An array of choices for the third dimension, one for each index of the first dimension: idn of size (m, 1), where each value of idn is a random integer in [1,k].
I need to build the 2D matrix B of size (m,n) in which, for each row, the third-dimension index into A is taken from the corresponding choice. For example:
idn(1) = 1;
idn(2) = k;
idn(j) = k-1;
Then:
B(1,:) = A(1,:,idn(1)) = A(1,:,1);
B(2,:) = A(2,:,idn(2)) = A(2,:,k);
B(j,:) = A(j,:,idn(j)) = A(j,:,k-1);
Since idn is not constant, a simple squeeze could not help.
I have also tried the below code, but it does not work either.
B = A(:,:,idn(:));
It is very much appreciated if anyone could give me a solution.
This could be done with sub2ind and permute, but the simplest way I can think of is using linear indexing manually:
A = rand(3, 4, 5); % example data
idn = [5; 1; 2]; % example data
ind = (1:size(A,1)).' + size(A,1)*size(A,2)*(idn(:)-1); % 1st and 3rd dimensions
ind = ind + size(A,1)*(0:size(A,2)-1); % include 2nd dimension using implicit expansion
B = A(ind); % index into A to get result
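The linear-index arithmetic above is not specific to MATLAB, since R arrays are also stored column-major. A rough R analog, assuming the same toy sizes (3 x 4 x 5) and the same idn, builds one (i, j, idn[i]) triple per output cell and checks it against the hand-computed linear index:

```r
set.seed(1)
m <- 3
n <- 4
k <- 5
A   <- array(runif(m * n * k), dim = c(m, n, k))  # example data
idn <- c(5, 1, 2)                                 # example choices, one per row

## one (i, j, idn[i]) triple per cell of B, in column-major order of B
idx <- cbind(rep(seq_len(m), times = n),
             rep(seq_len(n), each  = m),
             rep(idn,        times = n))
B <- matrix(A[idx], nrow = m, ncol = n)

## the same cells via the linear index i + m*(j-1) + m*n*(idn[i]-1)
lin <- idx[, 1] + m * (idx[, 2] - 1) + m * n * (idx[, 3] - 1)
same <- identical(A[lin], A[idx])
```

Each row B[i, ] then equals A[i, , idn[i]], which is the selection the question describes.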

Read multidimensional NetCDF as data frame in R

I use a netCDF file which stores one variable and has following dimensions: lon, lat, time.
Generally speaking I wish to compare it against different data that I have already in R stored as dataframe - first two columns are coordinates in WGS84, and next are values for specific time.
So I wrote following code.
# since # ncFile$dim$time$units say: [1] "days since 1900-1-1"
daysFromDate <- function(data1, data2="1900-01-01")
{
round(as.numeric(difftime(data1,data2,units = "days")))
}
#study area:
lon <- c(40.25, 48)
lat <- c(16, 24.25)
myTime <- c(daysFromDate("2008-01-16"), daysFromDate("2011-12-31"))
varName <- "spei"
require(ncdf4)
require(RCurl)
x <- getBinaryURL("http://digital.csic.es/bitstream/10261/104742/3/SPEI_01.nc")
ncFile <- nc_open(x)
LonIdx <- which( ncFile$dim$lon$vals >= lon[1] | ncFile$dim$lon$vals <= lon[2])
LatIdx <- which( ncFile$dim$lat$vals >= lat[1] & ncFile$dim$lat$vals <= lat[2])
TimeIdx <- which( ncFile$dim$time$vals >= myTime[1] & ncFile$dim$time$vals <= myTime[2])
MyVariable <- ncvar_get( ncFile, varName)[ LonIdx, LatIdx, TimeIdx]
I thought that a data frame would be returned, so that I could easily manipulate the data (for example, check a correlation or create a plot).
Unfortunately, a 3-dimensional array was returned instead.
How can I reformat this into a data frame with the columns X-Y-Time1-Time2-...?
So, example data would look as follows:
X Y 2014-01-01 2014-01-02 2014-01-02
50 17 0.5 0.4 0.3
where 0.5, 0.4 and 0.3 are example variable values
Or maybe there is different solution?
OK, try the following code, but note that it assumes the ranges are densely filled. I also changed the lon test from "or" (|) to "and" (&).
require(ncdf4)
nc <- nc_open("SPEI_01.nc")
print(nc)
lon <- ncvar_get(nc, "lon")
lat <- ncvar_get(nc, "lat")
time <- ncvar_get(nc, "time")
lonIdx <- which( lon >= 40.25 & lon <= 48.00)
latIdx <- which( lat >= 16.00 & lat <= 24.25)
myTime <- c(daysFromDate("2008-01-16"), daysFromDate("2011-12-31"))
timeIdx <- which(time >= myTime[1] & time <= myTime[2])
data <- ncvar_get(nc, "spei")[lonIdx, latIdx, timeIdx]
indices <- expand.grid(lon[lonIdx], lat[latIdx], time[timeIdx])
print(length(indices))
class(indices)
summary(indices)
str(indices)
df <- data.frame(cbind(indices, as.vector(data)))
summary(df)
str(df)
UPDATE
OK, it looks like I understand what you want, but I have no direct solution. What I've got so far is this: split the data frame using either the split() function or the data.table package. After splitting by X and Y, you get a list of small data frames in which X and Y are constant within each frame. It is probably possible to transpose and recombine them, but I do not know how. It might be a good idea to keep working with the data as columns. The lists are nested and could be flattened. Here is a link on splitting in R: http://www.uni-kiel.de/psychologie/rexrepos/posts/dfSplitMerge.html
Code, as continued from previous example
require(data.table)
colnames(df) <- c("X","Y","Time","spei")
df$Time <- as.Date(df$Time, origin="1900-01-01")
dt <- as.data.table(df)
summary(dt)
# Taken from https://github.com/Rdatatable/data.table/issues/1389
# x data.table
# f use `by` argument instead - unlike data.frame
# drop logical default FALSE will include `by` columns in resulting data.tables - unlike data.frame
# by character column names on which split into lists
# flatten logical default FALSE will result in recursive nested list having data.table as leafs
# ... ignored
split.data.table <- function(x, f, drop = FALSE, by, flatten = FALSE, ...){
if(missing(by) && !missing(f)) by = f
stopifnot(!missing(by), is.character(by), is.logical(drop), is.logical(flatten), !".ll" %in% names(x), by %in% names(x), !"nm" %in% by)
if(!flatten){
.by = by[1L]
tmp = x[, list(.ll=list(.SD)), by = .by, .SDcols = if(drop) setdiff(names(x), .by) else names(x)]
setattr(ll <- tmp$.ll, "names", tmp[[.by]])
if(length(by) > 1L) return(lapply(ll, split.data.table, drop = drop, by = by[-1L])) else return(ll)
} else {
tmp = x[, list(.ll=list(.SD)), by=by, .SDcols = if(drop) setdiff(names(x), by) else names(x)]
setattr(ll <- tmp$.ll, 'names', tmp[, .(nm = paste(.SD, collapse = ".")), by = by, .SDcols = by]$nm)
return(ll)
}
}
# here is data.table split
q <- split.data.table(dt, by = c("X","Y"), drop=FALSE)
str(q)
# here is data frame split
qq <- split(df, list(df$X, df$Y))
str(qq)
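One possible route to the X-Y-Time1-Time2 layout the question asks for is to reshape the 3-D slab directly rather than splitting a long data frame. The sketch below uses toy stand-ins for lon[lonIdx], lat[latIdx], time[timeIdx] and the extracted array, and assumes the array dimensions are ordered [lon, lat, time] as in the code above.

```r
## toy stand-ins for the selected coordinates and the 3-D slab
lon_sel  <- c(40.25, 40.75)
lat_sel  <- c(16.25, 16.75, 17.25)
time_sel <- as.Date(c("2008-01-16", "2008-02-15"))
slab <- array(seq_len(2 * 3 * 2) / 10, dim = c(2, 3, 2))  # [lon, lat, time]

## lon varies fastest in expand.grid() and in the array's memory layout,
## so one row per (lon, lat) pair lines up with one matrix row per pair
xy   <- expand.grid(X = lon_sel, Y = lat_sel)
vals <- matrix(as.vector(slab), nrow = nrow(xy), ncol = length(time_sel))
colnames(vals) <- format(time_sel)

## check.names = FALSE keeps the dates as column names unchanged
wide <- data.frame(xy, vals, check.names = FALSE)
print(wide)
```

This gives one row per grid point and one column per time step, and like the long version it assumes the selected ranges are densely filled.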

Calculation data from one array to another

I have two arrays: the first is data_array(50,210), the second is dest_array(210,210). The goal is to use data from data_array to calculate the values of dest_array at specific indices, without using a for-loop.
I do it in such way:
function [ out ] = grid_point( row,col,cg_row,cg_col,data,kernel )
ker_len2 = floor(length(kernel)/2);
op1_vals = data((row - ker_len2:row + ker_len2),(col - ker_len2:col + ker_len2));
out(cg_row,cg_col) = sum(sum(op1_vals.*kernel)); %incorre
end
function [ out ] = sm(dg_X, dg_Y)
%dg_X, dg_Y - arrays of size 210x210, the values - coordinates of data in data_array,
%index of each element - position this data at 210x210 grid
data_array = randi(100,50,210); %data array
kernel = kernel_sinc2d(17,'hamming'); %sinc kernel for calculations
ker_len2 = floor(length(kernel)/2);
%adding the padding for array, to avoid
%the errors related to boundaries of data_array
data_array = vertcat(data_array(linspace(ker_len2+1,2,ker_len2),:),...
data_array,...
data_array(linspace(size(data_array,1)-1,size(data_array,1) - ker_len2,ker_len2),:));
data_array = horzcat(data_array(:,linspace(ker_len2+1,2,ker_len2)),...
data_array,...
data_array(:,linspace(size(data_array,2)-1,size(data_array,2) - ker_len2,ker_len2)));
%cg_X, cg_Y - arrays of indicies for X and Y directions
[cg_X,cg_Y] = meshgrid(linspace(1,210,210),linspace(1,210,210));
%for each point at grid(210x210) formed by cg_X and cg_Y,
%we should calculate the value, using the data from data_array(210,210).
%after padding, data_array will have size (50 + ker_len2*2, 210 + ker_len2*2)
dest_array = arrayfun(@(y,x,cy,cx) grid_point(y, x, cy, cx, data_array, kernel),...
dg_Y, dg_X, cg_Y, cg_X);
end
But it seems that arrayfun cannot solve my problem, because the arrays have different sizes. Does anybody have an idea how to do this?
I am not completely sure, but judging from the title, this may be what you want:
%Your data
data_array_small = rand(50,210)
data_array_large = zeros(210,210)
%Indicating the points of interest
idx = randperm(size(data_array_large,1));
idx = idx(1:size(data_array_small,1))
%Now actually use the information:
data_array_large(idx,:) = data_array_small

How to read multiple files into a multi-dimensional array

I want to make an array with 3 dimensions.
Here is what I tried:
z<-c(160,720,420)
first_data_set <-array(dim = length(file_1), dimnames = z)
The data that I am reading has one level (only x and y).
There are other data in the same format, and I need to put them in the same array as the first data set, so that once I finish reading, all of them are in the same array and nothing has been overwritten.
So I think the array has to have 3 dimensions; otherwise I cannot keep all the data that I read in the loop.
Say that you have two matrices of size 3x4:
m1 <- matrix(rnorm(12), nrow = 3, ncol = 4)
m2 <- matrix(rnorm(12), nrow = 3, ncol = 4)
If you want to place them in an array, first make an array of NA's:
A <- array(as.numeric(NA), dim = c(3,4,2))
Then populate the layers with data:
A[,,1] <- m1
A[,,2] <- m2
As suggested by @Justin, you could also just put the matrices together in a list:
A2 <- list()
A2[['m1']] <- m1
A2[['m2']] <- m2
To read matrices from files: using a list makes it easier to get these matrices from files in a directory, without having to specify the dimensions in advance. Assume you want all files with extension csv:
## the pattern is a regular expression, so escape the dot and anchor the end
myfiles <- dir(pattern = "\\.csv$")
for (i in seq_along(myfiles)){
A2[[myfiles[i]]] <- read.table(myfiles[i], sep = ',')
}
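If a true 3-dimensional array is still wanted after reading into a list, the list can be stacked afterwards. A minimal sketch, assuming every file holds a numeric matrix of the same shape; the temporary files a.csv and b.csv are stand-ins created only for this illustration.

```r
## write two small CSV files to a temporary directory as stand-in input
tmp <- tempfile("mats")
dir.create(tmp)
for (nm in c("a.csv", "b.csv")) {
  write.table(matrix(rnorm(12), 3, 4), file.path(tmp, nm),
              sep = ",", row.names = FALSE, col.names = FALSE)
}

## read every CSV file into a named list of matrices
myfiles <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
mats <- lapply(myfiles, function(f) as.matrix(read.table(f, sep = ",")))
names(mats) <- basename(myfiles)

## stack into rows x cols x files; only valid if all shapes are equal
A <- array(unlist(mats), dim = c(dim(mats[[1]]), length(mats)),
           dimnames = list(NULL, NULL, names(mats)))
dim(A)
```

Layer i of A then holds the contents of file i, so nothing is overwritten as more files are read.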
