Why won't Keras take my input? - arrays

Why does my Keras model not accept my input/output data?
The input data is a list of numpy.ndarrays, each of shape (15,1,3), and the output is a list of NumPy arrays with a single number in each entry.
Here is where I create my model and pass the data in:
model = Sequential()
print "Data-train-in: " + str(data_train_input[0].shape)
print "Data-train-out: " + str(data_train_output[0].shape)
print "Data-test-in: " + str(data_test_input[0].shape)
#sys.exit()
print "Model Definition"
print "Row: " + str(row)
model.add(Convolution2D(64,3,3,input_shape=(3,row,1)))
print model.output_shape
model.add(Convolution2D(32,1,3))
print model.output_shape
model.add(MaxPooling2D((1,1)))
print model.output_shape
model.add(Flatten())
print model.output_shape
model.add(Dense(1,activation='relu'))
print model.output_shape
model.compile(loss='mean_squared_error', optimizer="sgd")
reduce_lr=ReduceLROnPlateau(monitor='val_loss', factor=0.01, patience=3, verbose=1, mode='auto', epsilon=0.0001, cooldown=0, min_lr=0.000000000000000001)
stop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=1, mode='auto')
log=csv_logger = CSVLogger('training_'+str(i)+'.csv')
print "Model Train"
hist_current = model.fit(data_train_input,
data_train_output,
shuffle=False,
validation_data=(data_test_input,data_test_output),
validation_split=0.1,
nb_epoch=150,
verbose=1,
callbacks=[reduce_lr,log,stop])
Which outputs:
Data-train-in: (15, 1, 3)
Data-train-out: ()
Data-test-in: (15, 1, 3)
Model Definition
Row: 15
(None, 1, 13, 64)
(None, 1, 11, 32)
(None, 1, 11, 32)
(None, 352)
(None, 1)
Model Train
Traceback (most recent call last):
File "keras_convolutional_feature_extraction.py", line 502, in <module>
model(0,train_input_data,output_data_train,test_input_data,output_data_test)
File "keras_convolutional_feature_extraction.py", line 496, in model
callbacks=[reduce_lr,log,stop])
File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 652, in fit
sample_weight=sample_weight)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1038, in fit
batch_size=batch_size)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 963, in _standardize_user_data
exception_prefix='model input')
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 54, in standardize_input_data
'...')
Exception: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 arrays but instead got the following list of 260182 arrays: [array([[[ 67, 255, 180]],
[[ 68, 255, 178]],
[[ 68, 255, 178]],
[[ 67, 255, 180]],
[[ 43, 254, 204]],
[[ 19, 253, 228]],
[[ 9, 205, 241]],
[[ ...
I am not sure how to interpret the error message. What is wrong here?

Your data doesn't match your input layer. In your model you used input_shape=(3,row,1), which is input_shape=(3,15,1) in this context.
But your print output shows that your training examples have a different shape, (15, 1, 3).
Try changing your input definition to input_shape=(row,1,3).
Another way to solve the problem is reshaping your data to match the input layer shape.
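As a rough sketch of the second option, assuming the samples are already held in one array of shape (N, 15, 1, 3) and that the model really expects (N, 3, 15, 1): the axes can be reordered with transpose. The array name batch and the sample count of 4 are made up for illustration.

```python
import numpy as np

# Hypothetical batch standing in for the training data: 4 samples of (15, 1, 3).
batch = np.arange(4 * 15 * 1 * 3).reshape(4, 15, 1, 3)

# Move the size-3 axis in front of the 15 and 1 axes: (N, 15, 1, 3) -> (N, 3, 15, 1).
# transpose reorders axes; a plain reshape would scramble which value belongs where.
reordered = batch.transpose(0, 3, 1, 2)
print(reordered.shape)
```

Whether the size-3 axis should come first depends on how the network is meant to read the data, so treat the axis order here as an assumption to check against your own model.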

Converting the list of arrays into a single NumPy array seems to work:
import numpy as np
data_train_input = np.array(data_train_input)
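The error message says that one array was expected but a list of 260182 arrays was received, which suggests the list itself was read as many separate inputs. A minimal sketch of the conversion, using made-up stand-in data (samples and targets are hypothetical names, not the asker's variables):

```python
import numpy as np

# Hypothetical stand-in for the asker's data: a list of samples, each (15, 1, 3).
samples = [np.zeros((15, 1, 3), dtype=np.uint8) for _ in range(4)]
targets = [np.float64(i) for i in range(4)]  # one scalar per sample

# np.array stacks equally shaped arrays along a new leading (sample) axis.
x = np.array(samples)
y = np.array(targets)
print(x.shape, y.shape)
```

The same conversion would presumably be needed for the output list and for the validation data.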

Related

Converting image identified by PyTesseract to an array

I have an image with a list of numbers which I have scanned using PyTesseract to construct a string. Concretely, here is the code:
from PIL import Image
import pytesseract
from scipy import stats
import numpy as np
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
str1=pytesseract.image_to_string(Image.open('D:/Image.png'))
Here's the image I am scanning:
The problem is that PyTesseract is scanning the image as individual characters instead of integers.
I would like to understand why this is happening and what can I do to get the desired result.
In short, PyTesseract is not scanning integers in a list of numbers, instead scanning them as individual characters. How do I tell it to scan for integers and put them in an array?
If you only want to get a list, re.split and str.strip can solve it (the extra separators are needed because tesseract's result contains some errors).
You can try this:
import pytesseract
import re
data = pytesseract.image_to_string('OCR.png')
dataList = re.split(r',|\.| ',data) # split the string
resultList = [int(i.strip()) for i in dataList if i != ''] # remove the '' str and convert str to int.
print(resultList)
# result: [71, 194, 38, 1701, 89, 76, 11, 83, 1629, 48, 94, 63, 132, 16, 111, 95, 84, 341, 975, 14, 40, 64, .......
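An alternative sketch, not part of the answer above: instead of listing every separator, pull out each run of digits with re.findall. The sample string is invented to mimic OCR output with misread commas and stray line breaks.

```python
import re

# Hypothetical OCR output: commas misread as periods, stray newlines and spaces.
data = "71, 194. 38,\n1701 ,89\n76"

# Each maximal run of digits becomes one integer, whatever separates the runs.
numbers = [int(tok) for tok in re.findall(r"\d+", data)]
print(numbers)
```

This assumes the OCR never inserts a separator inside a number; if it does, that number will come out as two entries.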

dict type numpy.AxisError: axis -1 is out of bounds for array of dimension 0

I am not able to figure out how to fix this error when I run my Python code. This is the entire error:
Loading all_data
type of alldata <class 'dict'>
Sorting these keys dict_keys([0, 1, 2, 3, 4, 5])
Traceback (most recent call last):
File "test.py", line 48, in <module>
keys_sorted = np.sort(all_data.keys())
File "/home/MAHEUNIX/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 934, in sort
a.sort(axis=axis, kind=kind, order=order)
numpy.AxisError: axis -1 is out of bounds for array of dimension 0
MAHEUNIX#WGSHA-LAB-005:/
This is the corresponding code:
print("Loading all_data")
all_data = load_dataset()
print("type of alldata",type(all_data),"\n")
print ("Sorting these keys", all_data.keys(),"\n\n")
keys_sorted = np.sort(all_data.keys())
print("keys sorted successfully\n")
train_idx, valid_idx = train_test_split(all_data.keys(), train_size = 0.9)
print (train_idx)
What is happening?
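A likely explanation, sketched under the assumption that load_dataset() returns an ordinary dict: in Python 3, dict.keys() returns a view object rather than a list, and NumPy wraps that view as a single 0-dimensional object array, so there is no axis to sort along. Converting the keys to a list first avoids this. The dict below is a made-up stand-in.

```python
import numpy as np

# Hypothetical stand-in for the dict returned by load_dataset().
all_data = {3: "d", 0: "a", 5: "f", 1: "b", 4: "e", 2: "c"}

# A keys view is not a sequence, so NumPy treats it as one scalar object.
as_array = np.asarray(all_data.keys())
print(as_array.ndim)

# Materialise the keys as a list before handing them to NumPy.
keys_sorted = np.sort(list(all_data.keys()))
print(keys_sorted)
```

The later train_test_split call receives all_data.keys() as well, so passing list(all_data.keys()) there is probably the safer choice too.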

Fitting a linear regression with scipy.stats; error in array shapes

I have written some code to read a data file using pandas and process the data with numpy. This results in some NaNs in the numpy array. I mask those out so that I can apply a linear regression fit with scipy.stats:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
def makeArray(band):
    """
    Takes as argument a string as the name of a wavelength band.
    Converts the list of magnitudes in that band into a numpy array,
    replacing invalid values (where invalid == -999) with NaNs.
    Returns the array.
    """
    array_name = band + '_mag'
    array = np.array(df[array_name])
    array[array==-999]=np.nan
    return array
# Read data file
fields = ['no', 'NED', 'z', 'obj_type','S_21', 'power', 'SI_flag',
'U_mag', 'B_mag', 'V_mag', 'R_mag', 'K_mag', 'W1_mag',
'W2_mag', 'W3_mag', 'W4_mag', 'L_UV', 'Q', 'flag_uv']
magnitudes = ['U_mag', 'B_mag', 'V_mag', 'R_mag', 'K_mag', 'W1_mag',
'W2_mag', 'W3_mag', 'W4_mag']
df = pd.read_csv('todo.dat', sep = ' ',
names = fields, index_col = False)
# Define axes for processing
redshifts = np.array(df['z'])
y = np.log(makeArray('K'))
mask = np.isnan(y)
plt.scatter(redshifts, y, label = ('K'), s = 2, color = 'r')
slope, intercept, r_value, p_value, std_err = stats.linregress(redshifts, y[mask])
fit = slope*redshifts + intercept
plt.legend()
plt.show()
but the lines where I calculate the stats parameters and the fit line (third- and fourth-to-last lines) give me the following error:
Traceback (most recent call last):
File "<ipython-input-77-ec9f43cdfa9b>", line 1, in <module>
runfile('C:/Users/Jeremy/Dropbox/Notes/Postgrad/Masters Research/VUW/QSOs/read_csv.py', wdir='C:/Users/Jeremy/Dropbox/Notes/Postgrad/Masters Research/VUW/QSOs')
File "C:\Users\Jeremy\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
execfile(filename, namespace)
File "C:\Users\Jeremy\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Jeremy/Dropbox/Notes/Postgrad/Masters Research/VUW/QSOs/read_csv.py", line 35, in <module>
slope, intercept, r_value, p_value, std_err = stats.linregress(redshifts, y[mask])
File "C:\Users\Jeremy\Anaconda3\lib\site-packages\scipy\stats\_stats_mstats_common.py", line 92, in linregress
ssxm, ssxym, ssyxm, ssym = np.cov(x, y, bias=1).flat
File "C:\Users\Jeremy\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2865, in cov
X = np.vstack((X, y))
File "C:\Users\Jeremy\Anaconda3\lib\site-packages\numpy\core\shape_base.py", line 234, in vstack
return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
The variables are shaped like:
so I'm not sure what the error means, or how to fix it. Is there a way around this? Or perhaps another module I can use instead of scipy.stats that will allow me to fit a linear regression?
The problem is that y[mask] has a different length from redshifts.
Below is a simple example that shows the issue:
import numpy as np
na = np.array
y = na([np.nan, 4, 5, 6, 7, 8, np.nan, 9, 10, np.nan])
mask = np.isnan(y)
print(len(y), len(y[mask]))
You will have to substitute something for the NaN values in y, for example:
print('old y: ', y)
for idx, m in enumerate(mask):
    if m:
        y[idx] = 1000 # or whatever value you decide on
print('new y: ', y)
Full example code...
import numpy as np
na = np.array
y = na([np.nan, 4, 5, 6, 7, 8, np.nan, 9, 10, np.nan])
mask = np.isnan(y)
print(len(y), len(y[mask]))
print('old y: ', y)
for idx, m in enumerate(mask):
    if m:
        y[idx] = 1000 # or whatever value you decide on
print('new y: ', y)
print(len(y))
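Another option, offered as a sketch rather than as the answer's method: drop the NaN entries instead of replacing them, applying the same inverted mask to both arrays so their lengths stay equal. The data is invented, and np.polyfit with degree 1 is used as a NumPy-only stand-in for the straight-line fit; stats.linregress(x_valid, y_valid) should accept the masked arrays in the same way.

```python
import numpy as np

# Hypothetical data standing in for redshifts and the log K-band magnitudes.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([np.nan, 3.0, 5.0, np.nan, 9.0])

valid = ~np.isnan(y)                   # True where y holds a real number
x_valid, y_valid = x[valid], y[valid]  # same mask on both, so lengths match

# Degree-1 polynomial fit: returns the slope first, then the intercept.
slope, intercept = np.polyfit(x_valid, y_valid, 1)
fit = slope * x + intercept            # fit line evaluated at every x
print(len(x_valid), len(y_valid), slope, intercept)
```

Note that np.isnan(y) on its own selects the NaN entries, so the mask has to be inverted with ~ to keep the usable points.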

combine multiple numpy ndarrays as list

I have three equally dimensioned numpy arrays.
I would like to store the data from all three in an array of the same dimensions and size.
To do this, I would like to store three bytes of information per item in the array. I assume this would be a list.
e.g.
>>>red = np.array([[150,25],[37,214]])
>>>green = np.array([[190,27],[123,231]])
>>>blue = np.array([[10,112],[123,119]])
insert combination magic to make a combined array called RGB
>>>RGB
array([(150,190,10),(25,27,112)],[(37,123,123),(214,231,119)])
For a start, each array is 2x2. Combining them in a list and passing it to np.array, the same construction used to make red, produces a 3x2x2 array.
In [344]: red = np.array([[150,25],[37,214]])
In [345]: green = np.array([[190,27],[123,231]])
In [346]: blue = np.array([[10,112],[123,119]])
In [347]: np.array([red,green,blue])
Out[347]:
array([[[150, 25],
[ 37, 214]],
[[190, 27],
[123, 231]],
[[ 10, 112],
[123, 119]]])
In [348]: _.shape
Out[348]: (3, 2, 2)
That's not the order you want, but we can easily reshape, and if needed transpose.
The target, with an added set of []
In [350]: np.array([[(150,190,10),(25,27,112)],[(37,123,123),(214,231,119)]])
Out[350]:
array([[[150, 190, 10],
[ 25, 27, 112]],
[[ 37, 123, 123],
[214, 231, 119]]])
In [351]: _.shape
Out[351]: (2, 2, 3)
So try moving the size-3 axis to the end with transpose:
In [352]: np.array([red,green,blue]).transpose(1,2,0)
Out[352]:
array([[[150, 190, 10],
[ 25, 27, 112]],
[[ 37, 123, 123],
[214, 231, 119]]])
===========================
I should have suggested stack. This is a newish version of concatenate that lets us join arrays on a new dimension. With axis=0 it behaves like np.array. But to join on the last axis, putting the rgb dimension last, use:
In [467]: np.stack((red,green,blue),axis=-1)
Out[467]:
array([[[150, 190, 10],
[ 25, 27, 112]],
[[ 37, 123, 123],
[214, 231, 119]]])
In [468]: _.shape
Out[468]: (2, 2, 3)
Note that this expression does not assume anything about the shape of red, etc, except that they are equal. So it will work with 3d arrays as well.
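A small sketch to check both claims: that stack with axis=-1 matches the transpose result, and that it carries over to 3-D inputs. The frames list is a hypothetical example (three arrays of shape (4, 2, 2)), not data from the question.

```python
import numpy as np

red = np.array([[150, 25], [37, 214]])
green = np.array([[190, 27], [123, 231]])
blue = np.array([[10, 112], [123, 119]])

rgb = np.stack((red, green, blue), axis=-1)
same = np.array_equal(rgb, np.array([red, green, blue]).transpose(1, 2, 0))

# Hypothetical 3-D inputs (e.g. 4 frames of 2x2): the new axis still goes last.
frames = [np.zeros((4, 2, 2)) for _ in range(3)]
stacked = np.stack(frames, axis=-1)
print(same, rgb[0, 0], stacked.shape)
```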

String Input in Java

I am trying to read many strings on separate lines and want to store all of them for later use. For example, I want to take input as follows (the last line ends with a "."):
My name is ABCD
My name is BCDS
My name is fdada.
How can I implement this? I also want to use all these strings. In Java or any other language I would have made a string array and used that array to access all three strings.
But the moment I enter the 1st line it gives me false.
You can use a failure-driven loop, like
:- dynamic a_line/1.
read_lines :-
retractall(a_line(__)),
repeat,
read_line_to_codes(user_input, L),
assertz(a_line(L)),
( last(L, 0'.) ; fail ).
and then
?- read_lines.
|: My name is ABCD
|: My name is BCDS
|: My name is fdada.
true .
The result gets stored in a_line/1, so
?- a_line(L),atom_codes(A,L).
L = [77, 121, 32, 110, 97, 109, 101, 32, 105|...],
A = 'My name is ABCD'
L = [32, 77, 121, 32, 110, 97, 109, 101, 32|...],
A = ' My name is BCDS'.
...
