How do I merge the xarray dataset?

I have eight different xarray datasets, each with shape (7, 375, 121, 240). All of them have the same dims, coords and attrs.
I want to combine them into one xarray dataset with shape (7, 375*8, 121, 240).
But when I try mer = xr.merge([ds1, ds2]), it shows the error below:
MergeError: conflicting values for variable 'hgt' on objects to be combined. You can skip this check by specifying compat='override'.
So I tried mer = xr.merge([ds1, ds2], compat='override'), but the result is the same as ds1, i.e. (7, 375, 121, 240).
How can I combine the xarray datasets into (7, 375*8, 121, 240)?

It seems you need xr.concat, not xr.merge:
ds = xr.concat([ds1, ds2, ...], dim=dim_name)
where dim_name is the name of the second dimension in your arrays.
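For example, a minimal sketch assuming the eight datasets are named ds1 ... ds8 and the second dimension is called "time" (check ds1.dims for the real name; hgt is the variable from the error message):

import xarray as xr

datasets = [ds1, ds2, ds3, ds4, ds5, ds6, ds7, ds8]
combined = xr.concat(datasets, dim="time")   # "time" is an assumed name; use your actual second dim
print(combined["hgt"].shape)                 # expected: (7, 3000, 121, 240), i.e. 375*8 along that dim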

Related

the list of Numpy arrays you are passing to your model is not the size model expects. Expected to see 3 arrays but got the following list of 1 arrays:

I am trying to do transfer learning on Mask R-CNN. I get this error when I try to run model.fit_generator.
The input shapes of my dataset are:
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
(24, 150, 150, 3)
(24, 5)
(8, 150, 150, 3)
(8, 5)
model1.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size), epochs=1, validation_data=(x_test, y_test), verbose=1, steps_per_epoch=x_train.shape[0] // batch_size)
ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 3 array(s), but instead got the following list of 1 arrays: [array([[[[0.00784314, 0.00392157, 0.00784314],
[0.00784314, 0. , 0. ],
[0.00392157, 0.00784314, 0. ],
...,
[0.15686275, 0.00784314, 0.51764706...

Why won't Keras take my input?

Why does my model in Keras not accept my input/output data?
The input data is a list of numpy.ndarrays of shape (15,1,3), and the output is a list of numpy arrays with only one number in each entry.
Here is where I create my model and pass things in:
model = Sequential()
print "Data-train-in: " + str(data_train_input[0].shape)
print "Data-train-out: " + str(data_train_output[0].shape)
print "Data-test-in: " + str(data_test_input[0].shape)
#sys.exit()
print "Model Definition"
print "Row: " + str(row)
model.add(Convolution2D(64,3,3,input_shape=(3,row,1)))
print model.output_shape
model.add(Convolution2D(32,1,3))
print model.output_shape
model.add(MaxPooling2D((1,1)))
print model.output_shape
model.add(Flatten())
print model.output_shape
model.add(Dense(1,activation='relu'))
print model.output_shape
model.compile(loss='mean_squared_error', optimizer="sgd")
reduce_lr=ReduceLROnPlateau(monitor='val_loss', factor=0.01, patience=3, verbose=1, mode='auto', epsilon=0.0001, cooldown=0, min_lr=0.000000000000000001)
stop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=1, mode='auto')
log=csv_logger = CSVLogger('training_'+str(i)+'.csv')
print "Model Train"
hist_current = model.fit(data_train_input,
                         data_train_output,
                         shuffle=False,
                         validation_data=(data_test_input, data_test_output),
                         validation_split=0.1,
                         nb_epoch=150,
                         verbose=1,
                         callbacks=[reduce_lr, log, stop])
Which outputs:
Data-train-in: (15, 1, 3)
Data-train-out: ()
Data-test-in: (15, 1, 3)
Model Definition
Row: 15
(None, 1, 13, 64)
(None, 1, 11, 32)
(None, 1, 11, 32)
(None, 352)
(None, 1)
Model Train
Traceback (most recent call last):
  File "keras_convolutional_feature_extraction.py", line 502, in <module>
    model(0,train_input_data,output_data_train,test_input_data,output_data_test)
  File "keras_convolutional_feature_extraction.py", line 496, in model
    callbacks=[reduce_lr,log,stop])
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 652, in fit
    sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1038, in fit
    batch_size=batch_size)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 963, in _standardize_user_data
    exception_prefix='model input')
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 54, in standardize_input_data
    '...')
Exception: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 arrays but instead got the following list of 260182 arrays: [array([[[ 67, 255, 180]],
[[ 68, 255, 178]],
[[ 68, 255, 178]],
[[ 67, 255, 180]],
[[ 43, 254, 204]],
[[ 19, 253, 228]],
[[ 9, 205, 241]],
[[ ...
I am not sure how to interpret this error message. What is wrong here?
Your data doesn't match your input layer. In your model you used input_shape=(3,row,1), which is equivalent to input_shape=(3,15,1) in this context.
But your prints show that your training examples have a different shape, (15, 1, 3).
Try changing your input definition to input_shape=(row,1,3).
Another way to solve the problem is to reshape your data to the input layer's shape:
import numpy as np
data_train_input = np.array(data_train_input)
This seems to work.
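A minimal sketch of that second approach, assuming data_train_input is a plain Python list of (15, 1, 3) arrays and data_train_output a matching list of single numbers (the names come from the question):

import numpy as np

# Stack the per-sample arrays into single arrays; Keras then sees one input
# of shape (num_samples, 15, 1, 3) instead of a huge list of small arrays.
data_train_input = np.array(data_train_input)    # (num_samples, 15, 1, 3)
data_train_output = np.array(data_train_output)  # (num_samples,)
data_test_input = np.array(data_test_input)
data_test_output = np.array(data_test_output)

# The first layer must then match the per-sample shape, e.g.
# model.add(Convolution2D(64, 3, 3, input_shape=(15, 1, 3)))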

combine multiple numpy ndarrays as list

I have three equally dimensioned numpy arrays.
I would like to store the data from all three in an array of the same dimensions and size.
To do this, I would like to store three bytes of information per item in the array. I assume this would be a list.
e.g.
>>> red = np.array([[150,25],[37,214]])
>>> green = np.array([[190,27],[123,231]])
>>> blue = np.array([[10,112],[123,119]])
insert combination magic to make a combined array called RGB
>>> RGB
array([(150,190,10),(25,27,112)],[(37,123,123),(214,231,119)])
For a start, each array is 2x2. Combining them in a list with np.array, using the same construction as when making red, produces a 3x2x2 array.
In [344]: red = np.array([[150,25],[37,214]])
In [345]: green = np.array([[190,27],[123,231]])
In [346]: blue = np.array([[10,112],[123,119]])
In [347]: np.array([red,green,blue])
Out[347]:
array([[[150,  25],
        [ 37, 214]],

       [[190,  27],
        [123, 231]],

       [[ 10, 112],
        [123, 119]]])
In [348]: _.shape
Out[348]: (3, 2, 2)
That's not the order you want, but we can easily reshape, and if needed transpose.
The target, with an added set of []
In [350]: np.array([[(150,190,10),(25,27,112)],[(37,123,123),(214,231,119)]])
Out[350]:
array([[[150, 190,  10],
        [ 25,  27, 112]],

       [[ 37, 123, 123],
        [214, 231, 119]]])
In [351]: _.shape
Out[351]: (2, 2, 3)
So try moving the size-3 axis to the end with transpose:
In [352]: np.array([red,green,blue]).transpose(1,2,0)
Out[352]:
array([[[150, 190,  10],
        [ 25,  27, 112]],

       [[ 37, 123, 123],
        [214, 231, 119]]])
===========================
I should have suggested stack. This is a newish version of concatenate that lets us join arrays along different new dimensions. With axis=0 it behaves like np.array. But to join on the last axis, putting the rgb dimension last, use:
In [467]: np.stack((red,green,blue),axis=-1)
Out[467]:
array([[[150, 190,  10],
        [ 25,  27, 112]],

       [[ 37, 123, 123],
        [214, 231, 119]]])
In [468]: _.shape
Out[468]: (2, 2, 3)
Note that this expression does not assume anything about the shapes of red, etc., except that they are equal, so it will work with 3d arrays as well.
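To illustrate that last point, a small sketch with 3d inputs (the shapes here are made up):

import numpy as np

red = np.zeros((4, 2, 2), dtype=int)      # three equally shaped 3d arrays
green = np.ones((4, 2, 2), dtype=int)
blue = np.full((4, 2, 2), 2, dtype=int)

rgb = np.stack((red, green, blue), axis=-1)
print(rgb.shape)   # (4, 2, 2, 3) -- the colour axis is simply appended last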

Methods of creating a structured array

I have the following information and I can produce a numpy array of the desired structure. Note that the values x and y have to be determined separately since their ranges may differ, so I cannot use:
xy = np.random.random_integers(0,10,size=(N,2))
The extra list(...) conversion is necessary for this to work in Python 3.4; it is not necessary, but not harmful, when using Python 2.7.
The following works:
>>> # attempts to formulate [id,(x,y)] with specified dtype
>>> N = 10
>>> x = np.random.random_integers(0,10,size=N)
>>> y = np.random.random_integers(0,10,size=N)
>>> id = np.arange(N)
>>> dt = np.dtype([('ID','<i4'),('Shape',('<f8',(2,)))])
>>> arr = np.array(list(zip(id,np.hstack((x,y)))),dt)
>>> arr
array([(0, [7.0, 7.0]), (1, [7.0, 7.0]), (2, [5.0, 5.0]), (3, [0.0, 0.0]),
       (4, [6.0, 6.0]), (5, [6.0, 6.0]), (6, [7.0, 7.0]),
       (7, [10.0, 10.0]), (8, [3.0, 3.0]), (9, [7.0, 7.0])],
      dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])
I cleverly thought I could circumvent the above nasty bits by simply creating the array in the desired vertical structure and applying my dtype to it, hoping that it would work. The stacked array is correct in the vertical form:
>>> a = np.vstack((id,x,y)).T
>>> a
array([[ 0,  7,  6],
       [ 1,  7,  7],
       [ 2,  5,  9],
       [ 3,  0,  1],
       [ 4,  6,  1],
       [ 5,  6,  6],
       [ 6,  7,  6],
       [ 7, 10,  9],
       [ 8,  3,  2],
       [ 9,  7,  8]])
I tried several ways of reformulating the above array so that my dtype would work, and I just can't figure it out (this included vstacking a vstack, etc.). So my question is: how can I use the vstack version and get it into a format that meets my dtype requirements without having to go through the procedure I used above? I am hoping it is obvious, but I have sliced, stacked and ellipsed myself into an endless loop.
SUMMARY
Many thanks to hpaulj. I have included two incarnations based upon his suggestions for others to consider. The pure numpy solution is substantially faster and a lot cleaner.
"""
Script: pnts_StackExch
Author: Dan.Patterson#carleton.ca
Modified: 2015-08-24
Purpose:
To provide some timing options on point creation in preparation for
point-to-point distance calculations using einsum.
Reference:
http://stackoverflow.com/questions/32224220/
methods-of-creating-a-structured-array
Functions:
decorators: profile_func, timing, arg_deco
main: make_pnts, einsum_0
"""
import numpy as np
import random
import time
from functools import wraps
np.set_printoptions(edgeitems=5,linewidth=75,precision=2,suppress=True,threshold=5)
# .... wrapper funcs .............
def delta_time(func):
"""timing decorator function"""
import time
#wraps(func)
def wrapper(*args, **kwargs):
print("\nTiming function for... {}".format(func.__name__))
t0 = time.time() # start time
result = func(*args, **kwargs) # ... run the function ...
t1 = time.time() # end time
print("Results for... {}".format(func.__name__))
print(" time taken ...{:12.9f} sec.".format(t1-t0))
#print("\n print results inside wrapper or use <return> ... ")
return result # return the result of the function
return wrapper
def arg_deco(func):
"""This wrapper just prints some basic function information."""
#wraps(func)
def wrapper(*args,**kwargs):
print("Function... {}".format(func.__name__))
#print("File....... {}".format(func.__code__.co_filename))
print(" args.... {}\n kwargs. {}".format(args,kwargs))
#print(" docs.... {}\n".format(func.__doc__))
return func(*args, **kwargs)
return wrapper
# .... main funcs ................
#delta_time
#arg_deco
def pnts_IdShape(N=1000000,x_min=0,x_max=10,y_min=0,y_max=10):
"""Make N points based upon a random normal distribution,
with optional min/max values for Xs and Ys
"""
dt = np.dtype([('ID','<i4'),('Shape',('<f8',(2,)))])
IDs = np.arange(0,N)
Xs = np.random.random_integers(x_min,x_max,size=N) # note below
Ys = np.random.random_integers(y_min,y_max,size=N)
a = np.array([(i,j) for i,j in zip(IDs,np.column_stack((Xs,Ys)))],dt)
return IDs,Xs,Ys,a
#delta_time
#arg_deco
def alternate(N=1000000,x_min=0,x_max=10,y_min=0,y_max=10):
""" after hpaulj and his mods to the above and this. See docs
"""
dt = np.dtype([('ID','<i4'),('Shape',('<f8',(2,)))])
IDs = np.arange(0,N)
Xs = np.random.random_integers(0,10,size=N)
Ys = np.random.random_integers(0,10,size=N)
c_stack = np.column_stack((IDs,Xs,Ys))
a = np.ones(N, dtype=dt)
a['ID'] = c_stack[:,0]
a['Shape'] = c_stack[:,1:]
return IDs,Xs,Ys,a
if __name__=="__main__":
"""time testing for various methods
"""
id_1,xs_1,ys_1,a_1 = pnts_IdShape(N=1000000,x_min=0, x_max=10, y_min=0, y_max=10)
id_2,xs_2,ys_2,a_2 = alternate(N=1000000,x_min=0, x_max=10, y_min=0, y_max=10)
Timing results for 1,000,000 points are as follows:
Timing function for... pnts_IdShape
Function... **pnts_IdShape**
args.... ()
kwargs. {'N': 1000000, 'y_max': 10, 'x_min': 0, 'x_max': 10, 'y_min': 0}
Results for... pnts_IdShape
time taken ... **0.680652857 sec**.
Timing function for... **alternate**
Function... alternate
args.... ()
kwargs. {'N': 1000000, 'y_max': 10, 'x_min': 0, 'x_max': 10, 'y_min': 0}
Results for... alternate
time taken ... **0.060056925 sec**.
There are 2 ways of filling a structured array (http://docs.scipy.org/doc/numpy/user/basics.rec.html#filling-structured-arrays) - by row (or rows with list of tuples), and by field.
To do this by field, create the empty structured array and assign values by field name:
In [19]: a = np.column_stack((id,x,y))        # same as your vstack().T
In [20]: Y = np.zeros(a.shape[0], dtype=dt)   # empty, ones, etc.
In [21]: Y['ID'] = a[:,0]
In [22]: Y['Shape'] = a[:,1:]                 # the (2,) field takes a 2-column array
In [23]: Y
Out[23]:
array([(0, [8.0, 8.0]), (1, [8.0, 0.0]), (2, [6.0, 2.0]), (3, [8.0, 8.0]),
       (4, [3.0, 2.0]), (5, [6.0, 1.0]), (6, [5.0, 6.0]), (7, [7.0, 7.0]),
       (8, [6.0, 1.0]), (9, [6.0, 6.0])],
      dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])
On the surface
arr = np.array(list(zip(id,np.hstack((x,y)))),dt)
looks like an OK way of constructing the list of tuples needed to fill the array. But the result duplicates the values of x instead of using y: np.hstack((x,y)) strings x and y end to end in one long 1d array, so zip pairs each id with a single x value, and that scalar is then broadcast into both elements of the (2,) 'Shape' field.
You can take a view of an array like a if the dtype is compatible - the data buffer for 3 int columns is laid out the same way as one with 3 int fields:
a.view('i4,i4,i4')
But your dtype wants 'i4,f8,f8' - a mix of 4 and 8 byte fields, and a mix of int and float. The buffer of a would have to be transformed to achieve that, and view can't do that. (Don't even ask about .astype.)
Corrected list-of-tuples method:
In [35]: np.array([(i,j) for i,j in zip(id,np.column_stack((x,y)))],dt)
Out[35]:
array([(0, [8.0, 8.0]), (1, [8.0, 0.0]), (2, [6.0, 2.0]), (3, [8.0, 8.0]),
       (4, [3.0, 2.0]), (5, [6.0, 1.0]), (6, [5.0, 6.0]), (7, [7.0, 7.0]),
       (8, [6.0, 1.0]), (9, [6.0, 6.0])],
      dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])
The list comprehension produces a list like:
[(0, array([8, 8])),
(1, array([8, 0])),
(2, array([6, 2])),
....]
For each tuple in the list, element [0] goes into the first field of the dtype, and element [1] (a small array) goes into the 2nd.
The tuples could also be constructed with
[(i,[j,k]) for i,j,k in zip(id,x,y)]
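For completeness, plugging that variant into the same dtype should give the same structured array (a quick sketch reusing id, x, y and dt from above):

arr2 = np.array([(i, [j, k]) for i, j, k in zip(id, x, y)], dt)
# arr2 should match the zip/column_stack result in Out[35] above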
dt1 = np.dtype([('ID','<i4'),('Shape',('<i4',(2,)))])
is a view-compatible dtype (still 3 integers):
In [42]: a.view(dtype=dt1)
Out[42]:
array([[(0, [8, 8])],
       [(1, [8, 0])],
       [(2, [6, 2])],
       [(3, [8, 8])],
       [(4, [3, 2])],
       [(5, [6, 1])],
       [(6, [5, 6])],
       [(7, [7, 7])],
       [(8, [6, 1])],
       [(9, [6, 6])]],
      dtype=[('ID', '<i4'), ('Shape', '<i4', (2,))])

How to put a datetime value into a numpy array?

I am learning python so please bear with me. I have been trying to get a datetime
variable into a numpy array, but have not been able to figure out how. I need to calculate differences between times for each index later on, so I didn't know if I should put the datetime variable into the array, or convert it to another data type. I get the error:
'NoneType' object does not support item assignment
Is my dtype variable constructed correctly? This says nothing about datetime type.
import numpy as np
from liblas import file

f = file.File(project_file, mode='r')
num_points = int(f.__len__())
# dtype should be [float, float, float, int, int, datetime]
dt = [('x', 'f4'), ('y', 'f4'), ('z', 'f4'), ('i', 'u2'), ('c', 'u1'), ('time', 'datetime64')]
xyzict = np.empty(shape=(num_points, 6), dtype=dt)
# Load all points into numpy array
counter = 0
for p in f:
    newrow = [p.x, p.y, p.z, p.i, p.c, p.time]
    xyzict[counter] = newrow
    counter += 1
Thanks in advance
EDIT: I should note that I plan on sorting the array by date before proceeding.
p.time is in the following format:
>>>p.time
datetime.datetime(1971, 6, 26, 19, 37, 12, 713269)
>>>str(p.time)
'1971-06-26 19:37:12.713275'
I don't really understand how you are getting a datetime object out of your file, or what p is for that matter, but assuming you have a list of tuples (not lists, see my comment above), you can do the setting all in one step:
import datetime
import numpy as np

dat = [(.5, .5, .5, 0, 34, datetime.datetime(1971, 6, 26, 19, 37, 12, 713269)),
       (.3, .3, .6, 1, 23, datetime.datetime(1971, 6, 26, 19, 34, 23, 345293))]
dt = [('x', 'f4'), ('y', 'f4'), ('z', 'f4'), ('i', 'u2'), ('c', 'u1'), ('time', 'datetime64[us]')]
datarr = np.array(dat, dt)
Then you can access the fields by name:
>>> datarr['time']
array(['1971-06-26T15:37:12.713269-0400', '1971-06-26T15:34:23.345293-0400'], dtype='datetime64[us]')
Or sort by field:
>>> np.sort(datarr, order='time')
array([(0.3, 0.3, 0.6, 1, 23, datetime.datetime(1971, 6, 26, 19, 34, 23, 345293)),
       (0.5, 0.5, 0.5, 0, 34, datetime.datetime(1971, 6, 26, 19, 37, 12, 713269))],
      dtype=[('x', '<f4'), ('y', '<f4'), ('z', '<f4'), ('i', '<u2'), ('c', 'u1'), ('time', '<M8[us]')])
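Applied to the code in the question, the same one-step construction might look like this (a sketch; it assumes, as the question does, that project_file is defined and that each p from the liblas file exposes x, y, z, i, c and time attributes):

import numpy as np
from liblas import file

f = file.File(project_file, mode='r')
dt = [('x', 'f4'), ('y', 'f4'), ('z', 'f4'), ('i', 'u2'), ('c', 'u1'), ('time', 'datetime64[us]')]

# one tuple per point, then build the structured array in a single call
rows = [(p.x, p.y, p.z, p.i, p.c, p.time) for p in f]
xyzict = np.array(rows, dtype=dt)

xyzict = np.sort(xyzict, order='time')   # sort by date, as planned in the EDIT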
