I have an image with a list of numbers which I have scanned using PyTesseract to construct a string. Concretely, here is the code:
from PIL import Image
import pytesseract
from scipy import stats
import numpy as np
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
str1 = pytesseract.image_to_string(Image.open('D:/Image.png'))
Here's the image I am scanning:
The problem is that PyTesseract is reading the numbers as individual characters instead of whole integers.
I would like to understand why this happens and what I can do to get the desired result.
In short, PyTesseract is not recognizing the integers in the list; instead it scans them as individual characters. How do I tell it to scan for integers and put them in an array?
If you only want to get a list, re.split plus strip can solve it (because Tesseract's result contains some errors).
You can try this:
import pytesseract
import re
data = pytesseract.image_to_string('OCR.png')
dataList = re.split(r',|\.| ', data)  # split the string on commas, periods and spaces
resultList = [int(i.strip()) for i in dataList if i != '']  # drop empty strings and convert to int
print(resultList)
# result: [71, 194, 38, 1701, 89, 76, 11, 83, 1629, 48, 94, 63, 132, 16, 111, 95, 84, 341, 975, 14, 40, 64, .......
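If Tesseract's raw output contains too many stray characters, you can also try restricting what it is allowed to recognize at the source. The following is only a sketch: --psm 6 treats the image as a single block of text, and tessedit_char_whitelist limits the character set, though some LSTM-based Tesseract 4.x builds ignore the whitelist.
import re
import pytesseract
from PIL import Image

# Restrict recognition to digits and commas (assumption: your Tesseract
# build honors tessedit_char_whitelist; some 4.x LSTM versions do not).
config = "--psm 6 -c tessedit_char_whitelist=0123456789,"
text = pytesseract.image_to_string(Image.open('OCR.png'), config=config)
numbers = [int(tok) for tok in re.split(r'[,\s]+', text) if tok]
print(numbers)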
I have an array like the one below:
Array = [21.2, 13.6, 86.2, 54.6, 76, 34, 78, 12, 90, 4];
Now I want to sum the Array values from the first through the fourth index, and from the seventh through the tenth.
I wrote this code but it did not work correctly.
s = 0
for I = 1:10
    if 1<=I<=4 | I>6
        s = s + Array(I);
    end
end
Please help me with this problem.
You can implement it without any kind of loop that may slow your code; to compute those sums, you just need sum. (Incidentally, the reason your loop fails is that MATLAB evaluates 1<=I<=4 as (1<=I)<=4, which is always true, so every element gets added.) For further help, see the documentation for sum. In your case, I'd do the following:
a = [21.2, 13.6, 86.2, 54.6, 76, 34, 78, 12, 90, 4];
b = sum(a(1:4)) + sum(a(7:end));
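If you prefer a single expression over the whole array, logical indexing gives the same result; a small sketch:
a = [21.2, 13.6, 86.2, 54.6, 76, 34, 78, 12, 90, 4];
idx = 1:numel(a);                % element positions 1..10
mask = (idx <= 4) | (idx >= 7);  % true for indices 1-4 and 7-10
b = sum(a(mask));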
I need to initialize a static array in a Fortran subroutine
double precision A(56136,8)
like so:
A(1,1)=0.999950528145
A(1,2)=0.99982470274
A(1,3)=0.999987006187
.
.
.
A(56136,7)=0.933468163013
A(56136,8)=0.0668926686049
These assignments are generated by another program.
Compilation with ifort 13.0 (ifort file.f -O0) takes very long (around 30 minutes).
Q1: What is the reason for this and how can I avoid it?
I have no control over the main program; the subroutine is linked against third-party files. The subroutine is called very often, so file access is not desirable.
Q2: Can I put the initialization outside the subroutine, without having a main program, so the array is not re-initialized every time the subroutine is called?
Edit
The array is constant. Would initializing it in the declaration statement look like this?
double precision :: A(56136,8) = reshape((/
& #, #, #, #, #, #, #, #,
& #, #, #, #, #, #, #, #,
:
& /), (/ 56136, 8 /))
This does not work because the statement would need too many continuation lines; compilers limit how many continuation lines a single statement may have.
I did a test with 10000 integers and it compiles in seconds when using a DATA statement. My array holds 50000 integers, but I wanted to see if I can assign them in blocks of 10000.
program Console1
implicit none
! Variables
integer A(50000)
data A(1:10000) / & ! 1000 lines of 10 numbers each to follow:
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, &
31, 37, 41, 43, 47, 53, 59, 61, 67, 71, &
73, 79, 83, 89, 97, 101, 103, 107, 109, 113, &
127, 131, 137, 139, 149, 151, 157, 163, 167, 173, &
179, 181, 191, 193, 197, 199, 211, 223, 227, 229, &
233, 239, 241, 251, 257, 263, 269, 271, 277, 281, &
...
104087,104089,104107,104113,104119,104123,104147,104149,104161,104173, &
104179,104183,104207,104231,104233,104239,104243,104281,104287,104297, &
104309,104311,104323,104327,104347,104369,104381,104383,104393,104399, &
104417,104459,104471,104473,104479,104491,104513,104527,104537,104543, &
104549,104551,104561,104579,104593,104597,104623,104639,104651,104659, &
104677,104681,104683,104693,104701,104707,104711,104717,104723,104729 /
! Fill the remaining 40000 elements with four copies of the first block
A(10001:50000) = [A(1:10000),A(1:10000),A(1:10000),A(1:10000)]
print *, A(9998:10002)
end program Console1
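Since the assignments are generated by another program anyway, another option is to have that program emit the DATA statements into a separate file and pull it in with an INCLUDE line. This is only a sketch; a_data.inc and A_Fill_Big are hypothetical names:
subroutine A_Fill_Big()
implicit none
double precision, save :: A(56136,8)
! a_data.inc is assumed to contain only generated DATA statements, e.g.
!   data A(1:4,1) / 0.999950528145d0, 0.99982470274d0, ... /
include 'a_data.inc'
! ... use A here ...
end subroutine
This keeps the generated table out of your hand-written source, and the DATA statements still compile quickly.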
Edit 1
Here is how to use a DATA statement with a COMMON block to set the values inside a subroutine. Note that COMMON blocks are considered obsolescent in modern Fortran.
subroutine A_Fill()
implicit none
integer :: A(10)
common /vals/ A
data A/ 1,2,3,4,5,6,7,8,9,10 /
end subroutine
program Console1
implicit none
! Variables
integer :: A(10)
common /vals/ A
call A_Fill()
print *, A
end program Console1
Edit 2
Another solution is a function that copies the saved array into a secondary working copy.
function A_Fill() result(B)
implicit none
integer :: B(10)
integer,save :: A(10)
data A/ 1,2,3,4,5,6,7,8,9,10 /
B = A
end function
program Console1
implicit none
interface
function A_Fill() result(B)
implicit none
integer :: B(10)
end function
end interface
! Variables
integer :: A(10)
A = A_Fill()
print *, A
end program Console1
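If your compiler supports Fortran 90 modules, a module variable avoids both the COMMON block and the copy; a minimal sketch:
module a_data
implicit none
integer, save :: A(10)
data A / 1,2,3,4,5,6,7,8,9,10 /
end module a_data

program Console1
use a_data
implicit none
print *, A
end program Console1
The DATA statement is processed once at compile/load time, so nothing is re-initialized no matter how often the array is used.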
Why does my model in Keras not accept my input/output data?
The input data is a list of numpy.ndarrays of shape (15, 1, 3), and the output is a list of numpy arrays with a single number in each entry.
Here is where I create my model and pass the data in:
model = Sequential()
print "Data-train-in: " + str(data_train_input[0].shape)
print "Data-train-out: " + str(data_train_output[0].shape)
print "Data-test-in: " + str(data_test_input[0].shape)
#sys.exit()
print "Model Definition"
print "Row: " + str(row)
model.add(Convolution2D(64,3,3,input_shape=(3,row,1)))
print model.output_shape
model.add(Convolution2D(32,1,3))
print model.output_shape
model.add(MaxPooling2D((1,1)))
print model.output_shape
model.add(Flatten())
print model.output_shape
model.add(Dense(1,activation='relu'))
print model.output_shape
model.compile(loss='mean_squared_error', optimizer="sgd")
reduce_lr=ReduceLROnPlateau(monitor='val_loss', factor=0.01, patience=3, verbose=1, mode='auto', epsilon=0.0001, cooldown=0, min_lr=0.000000000000000001)
stop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=1, mode='auto')
log = CSVLogger('training_'+str(i)+'.csv')
print "Model Train"
hist_current = model.fit(data_train_input,
data_train_output,
shuffle=False,
validation_data=(data_test_input,data_test_output),
validation_split=0.1,
nb_epoch=150,
verbose=1,
callbacks=[reduce_lr,log,stop])
Which outputs:
Data-train-in: (15, 1, 3)
Data-train-out: ()
Data-test-in: (15, 1, 3)
Model Definition
Row: 15
(None, 1, 13, 64)
(None, 1, 11, 32)
(None, 1, 11, 32)
(None, 352)
(None, 1)
Model Train
Traceback (most recent call last):
File "keras_convolutional_feature_extraction.py", line 502, in <module>
model(0,train_input_data,output_data_train,test_input_data,output_data_test)
File "keras_convolutional_feature_extraction.py", line 496, in model
callbacks=[reduce_lr,log,stop])
File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 652, in fit
sample_weight=sample_weight)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1038, in fit
batch_size=batch_size)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 963, in _standardize_user_data
exception_prefix='model input')
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 54, in standardize_input_data
'...')
Exception: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 arrays but instead got the following list of 260182 arrays: [array([[[ 67, 255, 180]],
[[ 68, 255, 178]],
[[ 68, 255, 178]],
[[ 67, 255, 180]],
[[ 43, 254, 204]],
[[ 19, 253, 228]],
[[ 9, 205, 241]],
[[ ...
I am not sure how to interpret the error message. What is wrong here?
Your data doesn't match your input layer. In your model you used input_shape=(3,row,1), which equals input_shape=(3,15,1) in this context.
But your prints show that your training examples have a different shape, (15, 1, 3).
Try changing your input definition to input_shape=(row,1,3).
Another way to solve the problem is to reshape your data to the input layer shape by converting the list into a single numpy array:
import numpy as np
data_train_input = np.array(data_train_input)
This seems to work.
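To spell that out, here is a minimal sketch (assuming data_train_input and the other three variables are plain Python lists with one small array per sample, as the error message suggests):
import numpy as np

# Keras expects one array of shape (num_samples, ...) per model input,
# not a Python list containing one array per sample.
data_train_input = np.array(data_train_input)    # (num_samples, 15, 1, 3)
data_train_output = np.array(data_train_output)  # (num_samples,)
data_test_input = np.array(data_test_input)
data_test_output = np.array(data_test_output)

print data_train_input.shape  # sanity check before calling model.fit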
For instance, how would I execute the equivalent of the following SQL (which inserts into a BINARY(16) field)
INSERT INTO Table1 (MD5) VALUES (X'6717f2823d3202449201145073ab871A'),(X'6717f2823d3202449301145073ab371A')
using dbWriteTable()? Doing
dbWriteTable(db, "Table1", data.frame(MD5 = "X'6717f2823d3202449201145073ab871A'", ...), append = T, row.names = F)
doesn't seem to work - it writes the values as text.
In the end, I'm going to have a big data.frame of hashes that I want to write, which makes it perfect for dbWriteTable. But I just can't figure out how to INSERT the data.frame into binary database fields.
So here are two possibilities that seem to work. The first uses dbSendQuery(...) in a loop (you've probably thought of this already...).
db.WriteTable = function(con, table, df) {   # no error checking whatsoever...
  require(DBI)
  field <- colnames(df)[1]
  for (i in 1:nrow(df)) {
    query <- sprintf("INSERT INTO %s (%s) VALUES (X'%s')", table, field, df[i,1])
    rs <- dbSendQuery(con, statement = query)
  }
  return(nrow(df))
}
library(DBI)
drv <- dbDriver("SQLite")
con <- dbConnect(drv)
rs <- dbSendQuery(con, statement="CREATE TABLE hash (MD5 BLOB)")
df <- data.frame(MD5=c("6717f2823d3202449201145073ab871A",
"6717f2823d3202449301145073ab371A"))
rs <- db.WriteTable(con,"hash",df)
result.1 <- dbReadTable(con,"hash")
result.1
# MD5
# 1 67, 17, f2, 82, 3d, 32, 02, 44, 92, 01, 14, 50, 73, ab, 87, 1a
# 2 67, 17, f2, 82, 3d, 32, 02, 44, 93, 01, 14, 50, 73, ab, 37, 1a
If your data frame of hashes is very large, then db.WriteFast(...) does the same thing as db.WriteTable(...), only it should be faster.
db.WriteFast = function(con, table, df) {
  require(DBI)
  field <- colnames(df)[1]
  lapply(unlist(df[,1]), function(x) {
    dbSendQuery(con,
                statement = sprintf("INSERT INTO %s (%s) VALUES (X'%s')",
                                    table, field, x))
  })
}
Note that result.1 is a data frame, and if we use it in a call to dbWriteTable(...) we can successfully write the hashes to a BLOB. So it is possible.
str(result.1)
# 'data.frame': 2 obs. of 1 variable:
# $ MD5:List of 2
# ..$ : raw 67 17 f2 82 ...
# ..$ : raw 67 17 f2 82 ...
The second approach takes advantage of R's raw data type to create a data frame structured like result.1, and passes that to dbWriteTable(...). You'd think this would be easy, but no.
h2r = function(x) {
bytes <- substring(x, seq(1, nchar(x)-1, 2), seq(2, nchar(x), 2))
return(list(as.raw(as.hexmode(bytes))))
}
hash2raw = Vectorize(h2r)
df.raw = data.frame(MD5 = list(1:nrow(df)))
colnames(df.raw) = "MD5"
df.raw$MD5 = unname(hash2raw(as.character(df$MD5)))
dbWriteTable(con, "newHash",df.raw)
result.2 <- dbReadTable(con,"newHash")
result.2
all.equal(result.1$MD5,result.2$MD5)
# [1] TRUE
In this approach, we create a data frame df.raw which has one column, MD5, wherein each element is a list of raw bytes. The utility function h2r(...) takes a character representation of the hash, breaks it into a vector of char(2) (the bytes), then interprets each of those as hex (as.hexmode(...)), converts the result to raw (as.raw(...)), and finally returns the result as a list. Vectorize(...) is a wrapper that allows hash2raw(...) to take a vector as its argument.
Personally, I think you're better off using the first approach: it takes advantage of SQLite's internal mechanism for writing hex to BLOBs, and it's much easier to understand.
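As a footnote: newer versions of DBI/RSQLite support parameterized queries, which would let you skip building the SQL strings by hand. This is only a sketch, assuming your RSQLite version maps lists of raw vectors to BLOBs when binding parameters:
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE hash (MD5 BLOB)")

hashes <- c("6717f2823d3202449201145073ab871A",
            "6717f2823d3202449301145073ab371A")
# hex string -> raw vector, same idea as h2r(...) above
to_raw <- function(x) {
  bytes <- substring(x, seq(1, nchar(x) - 1, 2), seq(2, nchar(x), 2))
  as.raw(as.hexmode(bytes))
}
# one INSERT per element of the bound list (assumption: lists of raw
# vectors are accepted as BLOB parameters by your RSQLite version)
dbExecute(con, "INSERT INTO hash (MD5) VALUES (?)",
          params = list(lapply(hashes, to_raw)))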