I am using an RNN with LSTM nodes in Keras for time series prediction. I have two input features and one output feature, and I'm using a sliding window of size 4 with a step size of 1.
So I'm trying to prepare the arrays accordingly so the LSTM can handle the data, but the dimensions don't seem to be right. I've got to the point where the 3D array has the right shape for the network to accept it, but the way the data is arranged inside the array does not seem right to me.
So, looking only at the training data, this is the raw data from the file:
train_input = df[['a','b']].values   # shape (354, 2)
train_output = df[['c']].values      # shape (354, 1)
Next I scale the data, after which the shapes remain the same. Then I use a loop to bring the data into the sliding-window shape (window size 4, range 354):
train_input_window = []
train_output_window = []
for i in range(4, 354):
    train_input_window.append(train_input_scaled[i-4:i, 0])
    train_input_window.append(train_input_scaled[i-4:i, 1])
    train_output_window.append(train_output_scaled[i, 0])
train_input_window = np.array(train_input_window)
train_output_window = np.array(train_output_window)
Now train_input_window has shape (700, 4) (two appends per iteration over 350 iterations), and train_output_window has shape (350,).
So this is where the problem lies, I think, because I can reshape the data into a 3D array that the network will accept:
train_input3D = np.reshape(train_input_window, (350,4,2))
train_output3D = np.reshape(train_output_window, (350,1,1))
but I just don't think that the data is arranged correctly inside the arrays.
The training input looks something like this:
print(train_input3D)
[[[a a]
[a a]
[b b]
[b b]]
[[a a]
[a a]
[b b]
[b b]].....
Shouldn't it be:
[[[a b]
[a b]
[a b]
[a b]]
[[a b]
[a b]
[a b]
[a b]].....
I've tried so many different things, and by now I'm so confused that I just hope I'm not also confusing everyone here while trying to explain.
So, is the input shape that I think my array should have correct for what I'm trying to do? If so, how do I arrange the data that way?
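For what it's worth, here is a minimal sketch of what I think the loop should look like, if slicing both feature columns at once is the right fix (this is a guess on my part, assuming the scaled arrays from above):
train_input_window = []
train_output_window = []
for i in range(4, 354):
    # one (4, 2) slice per window keeps both features side by side per timestep
    train_input_window.append(train_input_scaled[i-4:i, :])
    train_output_window.append(train_output_scaled[i, 0])
train_input3D = np.array(train_input_window)    # shape (350, 4, 2)
train_output1D = np.array(train_output_window)  # shape (350,)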
Here is my complete code:
#Read Data
df = pd.ExcelFile('GPT.xlsx').parse('7Avg')
# Training Data
train_input = df[['Precip_7Sum','Temp_7Avg']].values
train_output = df[['GWL_7Avg']].values
# Testing Data
test_input = df[['Precip_7SumT','Temp_7AvgT']].values
test_output = df[['GWL_7AvgT']].values
# normalize / scale Data
input_scaler = MinMaxScaler(feature_range = (0, 1))
output_scaler = MinMaxScaler(feature_range = (0, 1))
train_input_scaled = input_scaler.fit_transform(train_input)
train_output_scaled = output_scaler.fit_transform(train_output)
test_input_scaled = input_scaler.transform(test_input)
test_output_scaled = output_scaler.transform(test_output)
# Convert Data into sliding window format
train_input_window = []
train_output_window = []
for i in range(4, 354):
    train_input_window.append(train_input_scaled[i-4:i, 0])
    train_input_window.append(train_input_scaled[i-4:i, 1])
    train_output_window.append(train_output_scaled[i, 0])
train_input_window = np.array(train_input_window)
train_output_window = np.array(train_output_window)
test_input_window = []
test_output_window = []
for i in range(4, 354):
    test_input_window.append(test_input_scaled[i-4:i, 0])
    test_input_window.append(test_input_scaled[i-4:i, 1])
    test_output_window.append(test_output_scaled[i, 0])
test_input_window = np.array(test_input_window)
test_output_window = np.array(test_output_window)
# Convert data into 3-D Formats
train_input3D = np.reshape(train_input_window, (350,train_input_window.shape[1],2)) # 3D tensor with shape (batch_size, timesteps, input_dim) // (nr. of samples, nr. of timesteps, nr. of features)
train_output3D = np.reshape(train_output_window, (350,1,1))
test_input3D = np.reshape(test_input_window, (350,test_input_window.shape[1],2))
# Instantiate model class
model = Sequential()
# Add LSTM layer
model.add(LSTM(units=1, return_sequences = True, input_shape = (4,2)))
# Add dropout layer to avoid over-fitting
model.add(Dropout(0.2))
# add three more LSTM and Dropouts
model.add(LSTM(units=1, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=1, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=1, return_sequences=True))
model.add(Dropout(0.2))
# Create dense layer at the end of the model to make the model more robust
# (Dense takes no output_shape argument; its output shape follows from the input)
model.add(Dense(units = 1))
# Compile model
model.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Training
model.fit(train_input3D, train_output_window, epochs = 100, batch_size = 4)
# Testing / predictions
train_predictions = model.predict(train_input3D)
test_predictions = model.predict(test_input3D)
# Reverse scaling of the output data (predictions are in the output feature's scale)
train_predictions = output_scaler.inverse_transform(train_predictions)
test_predictions = output_scaler.inverse_transform(test_predictions)
orig_data = np.concatenate((train_output, test_output))  # NumPy arrays have no .append method
Any help on this would be much appreciated. I hope I've explained my problem clearly enough and that someone who can help actually reads all of this :D
In essence, this is what I want to create:
import numpy as np
N = 100 # POPULATION SIZE
D = 30 # DIMENSIONALITY
lowerB = [-5.12] * D # LOWER BOUND (IN ALL DIMENSIONS)
upperB = [5.12] * D # UPPER BOUND (IN ALL DIMENSIONS)
# INITIALISATION PHASE
X = np.empty([N, D]) # EMPTY FLIES ARRAY OF SIZE (N, D)
# INITIALISE FLIES WITHIN BOUNDS
for i in range(N):
    for d in range(D):
        X[i, d] = np.random.uniform(lowerB[d], upperB[d])
but I want to do so without the for loops, to save time, using list comprehensions or vectorized calls. I have tried things like
np.array([(x, y) for x in range(N) for y in range(D)])
but this doesn't get me an array of shape (100, 30). Does anyone know a tutorial or the right documentation I should be looking at so I can learn exactly how to do this?
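A minimal sketch of a vectorized version (using the same N, D, and bounds as above): np.random.uniform broadcasts its low and high arguments against the requested output shape, so the whole population can be drawn in one call.
import numpy as np
N, D = 100, 30
lowerB = [-5.12] * D
upperB = [5.12] * D
# low/high of shape (D,) broadcast across the N rows, so each column d
# is sampled uniformly from [lowerB[d], upperB[d]]
X = np.random.uniform(low=lowerB, high=upperB, size=(N, D))
print(X.shape)  # (100, 30)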
I'm essentially trying to accomplish this and then this but with a 3D matrix, say (128,128,60,6). The 4th dimension is an array vector that represents the diffusion array at that voxel, e.g.:
d[30,30,30,:] = [dxx, dxy, dxz, dyy, dyz, dzz] = D_array
where dxx etc. are the diffusion values for particular directions. D_array can also be seen as a triangular matrix (since dxy == dyx etc.). So I can use those 2 other answers to get from D_array to D_square, e.g.
D_square = [[dxx, dxy, dxz], [dyx, dyy, dyz],[dzx, dzy, dzz]]
I can't seem to figure out the next step however - how to apply that unit transformation of a D_array into D_square to the whole 3D volume.
Here's the code snippet that works on a single tensor:
# This solves a linear eq. that provides us with diffusion arrays at each voxel in a 3D space
D = np.einsum('ijkt,tl->ijkl', X, bi_plus)
# Our issue at this point is that we have a vector representing a triangular matrix.
# First make a tri matrix from the vector, testing on a unit tensor first
D_tri = np.zeros((3,3))
D_array = D[30][30][30]
D_tri[np.triu_indices(3)] = D_array
# Then get the full square matrix
D_square = D_tri.T + D_tri
np.fill_diagonal(D_square, np.diag(D_tri))
So what would be the numpy-way of formulating that unit transformation of the Diffusion tensor to the whole 3D volume all at once?
Approach #1
Here's one using row, col indices from triu_indices for indexing along last two axes into an initialized output array -
def squareformnd_rowcol_integer(ar, n=3):
    out_shp = ar.shape[:-1] + (n,n)
    out = np.empty(out_shp, dtype=ar.dtype)
    row,col = np.triu_indices(n)

    # Get a "rolled-axis" view with which the last two axes come to the front
    # so that we could index into them just like for a 2D case
    out_rolledaxes_view = out.transpose(np.roll(range(out.ndim),2,0))

    # Assign permuted version of input array into rolled output version
    arT = np.moveaxis(ar,-1,0)
    out_rolledaxes_view[row,col] = arT
    out_rolledaxes_view[col,row] = arT
    return out
Approach #2
Another one with the last two axes merged into one and then indexing with linear indices -
def squareformnd_linear_integer(ar, n=3):
    out_shp = ar.shape[:-1] + (n,n)
    out = np.empty(out_shp, dtype=ar.dtype)
    row,col = np.triu_indices(n)
    idx0 = row*n+col
    idx1 = col*n+row
    ar2D = ar.reshape(-1,ar.shape[-1])
    out.reshape(-1,n**2)[:,idx0] = ar2D
    out.reshape(-1,n**2)[:,idx1] = ar2D
    return out
Approach #3
Finally altogether a new method using masking and should be better with performance as most masking based ones are when it comes to indexing -
def squareformnd_masking(ar, n=3):
    out = np.empty((n,n)+ar.shape[:-1], dtype=ar.dtype)
    r = np.arange(n)
    m = r[:,None]<=r
    arT = np.moveaxis(ar,-1,0)
    out[m] = arT
    out.swapaxes(0,1)[m] = arT
    new_axes = list(range(out.ndim))[2:] + [0,1]  # list() needed on Python 3
    return out.transpose(new_axes)
Timings on (128,128,60,6) shaped random array -
In [635]: ar = np.random.rand(128,128,60,6)
In [636]: %timeit squareformnd_linear_integer(ar, n=3)
...: %timeit squareformnd_rowcol_integer(ar, n=3)
...: %timeit squareformnd_masking(ar, n=3)
10 loops, best of 3: 103 ms per loop
10 loops, best of 3: 103 ms per loop
10 loops, best of 3: 53.6 ms per loop
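A quick usage check of the masking version (a sketch with random stand-in data; any of the three functions above can be swapped in):
import numpy as np
ar = np.random.rand(128, 128, 60, 6)
out = squareformnd_masking(ar, n=3)
print(out.shape)                               # (128, 128, 60, 3, 3)
print(np.allclose(out, out.swapaxes(-1, -2)))  # True: symmetric at every voxel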
A vectorized way to do it:
# Get the triangle matrix (np.zeros expects the shape as a tuple)
d_tensor = np.zeros((128, 128, 60, 3, 3))
triu_idx = np.triu_indices(3)
d_tensor[:, :, :, triu_idx[0], triu_idx[1]] = d
# Make it symmetric
diagonal = np.zeros((128, 128, 60, 3, 3))
idx = np.arange(3)
diagonal[:, :, :, idx, idx] = d_tensor[:, :, :, idx, idx]
d_tensor = np.transpose(d_tensor, (0, 1, 2, 4, 3)) + d_tensor - diagonal
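A sanity check against the single-voxel recipe from the question (a sketch; it assumes d is the (128, 128, 60, 6) volume and numpy is imported as np):
D_tri = np.zeros((3, 3))
D_tri[np.triu_indices(3)] = d[30, 30, 30]
D_square = D_tri.T + D_tri
np.fill_diagonal(D_square, np.diag(D_tri))
# the vectorized result should match the per-voxel construction exactly
assert np.allclose(d_tensor[30, 30, 30], D_square)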
I perform calculations using 5-fold cross-validation. I want to collect all the predictions in one array in order to avoid computing statistics per fold. I have tried doing this by extending the array of predictions, appending each fold's array to the existing one. For example:
for train_index, test_index in skf:
    fold += 1
    x_train, x_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    rf.fit(x_train, y_train)
    predicted = rf.predict_proba(x_test)
    round_predicted = rf.predict(x_test)
    if fold > 1:
        allFolds_pred = np.concatenate((predicted, allFolds_pred), axis=1)
        allFolds_rpred = np.concatenate((round_predicted, allFolds_rpred), axis=1)
        allFolds_y = np.concatenate((y_test, allFolds_y), axis=1)
    else:
        allFolds_pred = predicted
        allFolds_rpred = round_predicted
        allFolds_y = y_test
fpr, tpr, _ = roc_curve(allFolds_y, allFolds_pred[:,1])
roc_auc = auc(fpr, tpr)
cm = confusion_matrix(allFolds_y, allFolds_rpred, labels=[0, 1])
Then I calculate the statistics.
However, it is not working. What is the best way to proceed? Is there a better way to do the same thing?
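One pattern that I believe avoids the shape problems (a sketch, assuming skf, X, y, and rf as above): collect each fold's results in plain lists and concatenate once at the end. predict_proba returns (n_samples, n_classes), so folds stack along axis 0; predict and y_test are 1-D, so the default axis works.
import numpy as np
preds, rpreds, ys = [], [], []
for train_index, test_index in skf:
    x_train, x_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    rf.fit(x_train, y_train)
    preds.append(rf.predict_proba(x_test))  # (n_samples, n_classes) per fold
    rpreds.append(rf.predict(x_test))       # (n_samples,) per fold
    ys.append(y_test)
allFolds_pred = np.concatenate(preds, axis=0)  # stack folds row-wise
allFolds_rpred = np.concatenate(rpreds)
allFolds_y = np.concatenate(ys)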
I need help! My goal is to develop a MATLAB routine that, starting from a series of actions (each modeled by a label, a mean value, and a variance), generates an array of activity. I'll explain better with my code:
action_awake_in_bed = [1 5*60 1*60];
action_out_of_bed = [3 30 10];
action_out_bedroom = [2 2*60 15];
ACTIVITY_WAKE = {'action_awake_in_bed','action_out_of_bed','action_out_bedroom'};
The first element of each action array is a label (a posture label), the second element is the mean length of the action (in seconds), and the third element is the variance.
I need as output the array ACTIVITY_WAKE....
Thanks
Let's use a struct to store the meta-parameters:
action.awake_in_bed = [1 5*60 1*60];
action.out_of_bed = [3 30 10];
action.out_of_bedroom = [2 2*60 15];
ACTIVITY = {'awake_in_bed','out_of_bed','out_of_bedroom'};
After these pre-definitions, we can sample an activity vector:
ACTIVITY_WAKE = cell(1, numel(ACTIVITY));
for ii = 1:numel( ACTIVITY )                   %// for each activity
    cp = action.(ACTIVITY{ii});                %// get parameters of current activity
    n = round( cp(2) + sqrt(cp(3))*randn() );  %// get the number of samples
    ACTIVITY_WAKE{ii} = repmat( cp(1), 1, n ); %// repeat the label n times
end
ACTIVITY_WAKE = [ ACTIVITY_WAKE{:} ];          %// concatenate into a single vector
To get the number of samples, I use the usual recipe for sampling from a normal distribution: randn() draws with mean 0 and std 1, so multiplying by sqrt(cp(3)) sets the desired standard deviation (cp(3) is a variance) and adding cp(2) sets the desired mean.
I have a file which contains a 209091-element 1D binary array representing the global land area, which can be downloaded from here:
ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_flags_2002/
I want to create a full grid matrix from the 1D data arrays using the provided ancillary row and column files globland_r and globland_c, which can be downloaded from here:
ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_ancil/
There is code written in MATLAB for this purpose, and I want to translate this MATLAB code to R, but I do not know MATLAB:
function [gridout, EASE_r, EASE_s] = mkgrid_global(x)
%MKGRID_GLOBAL(x) Creates a matrix for mapping
% gridout = mkgrid_global(x) uses the 209091-element array (x) and returns
% a 586x1383 output grid.
%Load ancillary EASE grid row and column data, where <MyDir> is the path to
%wherever the globland_r and globland_c files are located on your machine.
fid = fopen('C:\MyDir\globland_r','r');
EASE_r = fread(fid, 209091, 'int16');
fclose(fid);
fid = fopen('C:\MyDir\globland_c','r');
EASE_s = fread(fid, 209091, 'int16');
fclose(fid);
gridout = NaN.*zeros(586,1383);
%Loop through the element array
for i=1:1:209091
    %Distribute each element to the appropriate location in the output
    %matrix (the ancillary indices are 0-based, but MATLAB indexes from (1,1))
    gridout(EASE_r(i)+1, EASE_s(i)+1) = x(i);
end
Edit, following the solution of @mdsumner:
The files MLLATLSB and MLLONLSB (4-byte integers) contain latitude and longitude (multiply by 1e-5) for geo-locating the full global EASE grid matrix (586×1383).
MLLATLSB and MLLONLSB can be downloaded from here:
ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_ancil/
## the sparse dims, literally the xcol * yrow indexes
dims <- c(1383, 586)
cfile <- "ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_ancil/globland_c"
rfile <- "ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_ancil/globland_r"
## be nice, don't abuse this
col <- readBin(cfile, "integer", n = prod(dims), size = 2, signed = FALSE)
row <- readBin(rfile, "integer", n = prod(dims), size = 2, signed = FALSE)
## example data file
fdat <- "ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_flags_2002/flags_2002170A.bin"
dat <- readBin(fdat, "integer", n = prod(dims), size = 1, signed = FALSE)
## now get serious
m <- matrix(as.integer(NA), dims[2L], dims[1L])
m[cbind(row + 1L, col + 1L)] <- dat
image(t(m)[,dims[2]:1], col = rainbow(length(unique(m)), alpha = 0.5))
Maybe we can reconstruct this map projection too.
flon <- "MLLONLSB"
flat <- "MLLATLSB"
## the key is that these are integers, floats scaled by 1e5
lon <- readBin(flon, "integer", n = prod(dims), size = 4) * 1e-5
lat <- readBin(flat, "integer", n = prod(dims), size = 4) * 1e-5
## this is all we really need from now on
range(lon)
range(lat)
library(raster)
library(rgdal) ## need for coordinate transformation
ex <- extent(projectExtent(raster(extent(range(lon), range(lat)), crs = "+proj=longlat"), "+proj=cea"))
grd <- raster(ncols = dims[1L], nrows = dims[2L], xmn = xmin(ex), xmx = xmax(ex), ymn = ymin(ex), ymx = ymax(ex), crs = "+proj=cea")
There is probably an "out by half pixel" error in there, left as an exercise.
Test
plot(setValues(grd, m), col = rainbow(max(m, na.rm = TRUE), alpha = 0.5))
Hohum
library(maptools)
data(wrld_simpl)
plot(spTransform(wrld_simpl, CRS(projection(grd))), add = TRUE)
We can now save the valid cellnumbers to match our "grd" template, then read any particular dat-file and just populate the template with those values based on cellnumbers. Also, it seems someone trod nearly this path earlier but not much was gained:
How to identify lat and long for a global matrix?