Loading a blob (Google App Engine) into PIL or NumPy - google-app-engine

I'd like to be able to load a blob (an image) into the Python Imaging Library (PIL) or into a NumPy array for analysis (such as mean, median, standard deviation) without using the serving URL.
Here is my image model;
the t_imageUrl property contains the serving URL for the blob:
from google.appengine.ext import db, blobstore

class ImageModel(db.Model):
    t_image = blobstore.BlobReferenceProperty(required=True)
    t_imageUrl = db.StringProperty(required=True)
Here is a segment of what I tried:
import numpy as np
import Image
import ImageOps

class ImageAnalysisHandler(BaseHandler):
    def get(self, imageModel_id):
        if self.user:
            i = ImageModel.get_by_id(int(imageModel_id))
            OpenedImage = Image.open(i.t_image)
            self.render('imageAnalysis.html', imageD = i)
        else:
            self.redirect('login')
This obviously didn't work, since the Image module (from the Python Imaging Library) doesn't know how to read blobs. I was wondering if anyone knew how to read a blob into PIL or a NumPy array accurately.

Take a look at the BlobReader class. It lets you read a file stored in the Blobstore with a file-like interface.
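For example, since PIL's Image.open accepts any file-like object, you can hand it a BlobReader directly. A minimal sketch against the ImageModel above (the statistics at the end are just illustrations of the analysis you mentioned):

from google.appengine.ext import blobstore
import numpy as np
import Image  # PIL

# imageModel_id comes from the handler in the question
i = ImageModel.get_by_id(int(imageModel_id))

# BlobReader is file-like, so PIL can open it without a serving URL
reader = blobstore.BlobReader(i.t_image.key())
opened_image = Image.open(reader)

# convert to a NumPy array for analysis
arr = np.asarray(opened_image)
print arr.mean(), np.median(arr), arr.std()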

Related

Predicting locally with a model trained on SageMaker

I have trained a model on AWS SageMaker using the built-in Semantic Segmentation algorithm. The trained model, named model.tar.gz, is stored on S3. I want to download this file from S3 and then use it to run inference on my local PC without using AWS SageMaker.
Here are the three files:
hyperparams.json: includes the parameters for network architecture, data inputs, and training. Refer to Semantic Segmentation Hyperparameters.
model_algo-1
model_best.params
My code:
import mxnet as mx
from mxnet import image
from gluoncv.data.transforms.presets.segmentation import test_transform
import gluoncv
img = image.imread('./bdd100k/validation/14df900d-c5c145cb.jpg')
img = test_transform(img, ctx)
img = img.astype('float32')
model = gluoncv.model_zoo.PSPNet(2)
# load the trained model
model.load_parameters('./model/model_best.params')
Error:
AssertionError: Parameter 'head.psp.conv1.0.weight' is missing in file './model/model_best.params', which contains parameters: 'layer3.2.bn3.beta', 'layer3.0.conv3.weight', 'conv1.1.running_var', ..., 'layer2.2.bn3.running_mean', 'layer3.4.bn2.running_mean', 'layer4.2.bn3.beta', 'layer3.4.bn3.beta'. Set allow_missing=True to ignore missing parameters.
The following should work after extracting model_algo-1 from the tar.gz file. It will run on a local ctx.
import mxnet as mx
import numpy as np
import matplotlib.pyplot as plt
from mxnet import image
import gluoncv
from gluoncv import model_zoo
from gluoncv.data.transforms.presets.segmentation import test_transform

ctx = mx.cpu()  # local CPU context

model = model_zoo.DeepLabV3(nclass=2, backbone='resnet50',
                            pretrained_base=False, height=800, width=1280, crop_size=240)
model.load_parameters("model_algo-1")

# preprocess the input exactly as in the question
img = image.imread('./bdd100k/validation/14df900d-c5c145cb.jpg')
img = test_transform(img, ctx)
img = img.astype('float32')

output = model.predict(img)
print(output.shape)

max_predict = mx.nd.squeeze(mx.nd.argmax(output, 1)).asnumpy()
print(max_predict.shape)

prob_mask = mx.nd.squeeze(output).asnumpy()

def NormalizeData(data):
    return (data - np.min(data)) / (np.max(data) - np.min(data))

target_cls_id = 1
prob_mat = prob_mask[target_cls_id, :, :]
norm_prob = NormalizeData(prob_mat)
plt.hist(norm_prob.flatten(), bins=50)
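For reference, extracting model_algo-1 from the downloaded archive can be scripted as well. A minimal sketch using Python's tarfile module (the local paths are assumptions, not from the question):

import tarfile

# hypothetical paths; point them at wherever you downloaded model.tar.gz from S3
with tarfile.open('./model/model.tar.gz', 'r:gz') as tar:
    tar.extractall(path='./model')
# ./model should now contain hyperparams.json, model_algo-1 and model_best.params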

Converting the response of a Python GET request (jpg content) into a NumPy array

The workflow of my function is the following:
retrieve a jpg through a Python GET request
save the image as png on disk (even though it is downloaded as jpg)
use imageio to read the image from disk and transform it into a numpy array
work with the array
This is what I do to save:
response = requests.get(urlstring, params=params)
if response.status_code == 200:
    with open('PATH%d.png' % imagenumber, 'wb') as output:
        output.write(response.content)
This is what I do to load the png and transform it into an np.array:
imagearray = im.imread('PATH%d.png' % imagenumber)
Since I don't need to store what I download permanently, I tried to modify my function to transform response.content into a NumPy array directly. Unfortunately, every imageio-like library seems to work the same way, reading a URI from disk and converting it to an np.array.
I tried this, but obviously it didn't work, since it needs a URI as input:
response = requests.get(urlstring, params=params)
imagearray = im.imread(response.content)
Is there any way to overcome this issue? How can I transform my response.content in a np.array?
imageio.imread is able to read from URLs:
import imageio
url = "https://example_url.com/image.jpg"
# image is going to be type <class 'imageio.core.util.Image'>
# that's just an extension of np.ndarray with a meta attribute
image = imageio.imread(url)
You can look for more information in the documentation, they also have examples: https://imageio.readthedocs.io/en/stable/examples.html
You can use BytesIO as a file to skip writing to an actual file.
from io import BytesIO
from PIL import Image
import numpy as np

# response.content is already the raw image bytes, so no base64 decoding is needed
bites = BytesIO(response.content)
Now you have it as BytesIO, so you can use it just like a file:
img = Image.open(bites)
img_np = np.array(img)
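If you'd rather keep the imageio-based pipeline unchanged, the same trick should work there, since imageio's imread also accepts file-like objects (a minimal sketch, assuming imageio 2.x behaviour):

import requests
import imageio
from io import BytesIO

response = requests.get(urlstring, params=params)
if response.status_code == 200:
    # no temporary png on disk: wrap the downloaded bytes in a file-like object
    imagearray = imageio.imread(BytesIO(response.content))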

How to set up uploading binary objects through a flask_restless endpoint?

I am working on a REST Python application and I have picked flask_restless to build endpoints connected to the database. One of the tables I would like to manage stores binary files as blobs (LargeBinary).
I have noticed, though, that flask_restless requires JSON data for POST requests. I tried to base64-encode the binary file contents and wrap them in JSON, but ultimately flask_restless passed the file contents to SQLAlchemy as a string, and the SQLite backend complained that it requires bytes input (quite rightly so).
I tried searching the interwebs for a solution, but either I am formulating my query incorrectly, or actually there is none.
So, is there a way to configure an endpoint managed by flask_restless to accept a binary file as an attachment? Or would the suggested solution be to set up the endpoint for that particular table directly with Flask (I did that before in another app), away from flask_restless?
It turns out that sending an attachment is not possible.
So I dug deeper into how to send base64-encoded data, which is then saved as a blob.
For that I used the pre- and postprocessor facilities of flask_restless:
import base64
import flask_restless

def pp_get_single_image(result=None, **kw):
    # encode the raw blob so it can travel inside the JSON response
    result['image'] = base64.b64encode(result['image']).decode('utf8')

def pp_get_many_images(result=None, search_params=None, **kw):
    result['objects'] = [pp_get_single_image(d) or d for d in result['objects']]

def pp_post_image_in(data=None, **kw):
    # decode the incoming base64 payload back to bytes before it reaches SQLAlchemy
    data['image'] = base64.b64decode(data['image'])

def pp_post_image_out(result=None, **kw):
    result['image'] = base64.b64encode(result['image']).decode('utf8')

postprocessors = dict(GET_SINGLE=[pp_get_single_image], GET_MANY=[pp_get_many_images], POST=[pp_post_image_out])
preprocessors = dict(POST=[pp_post_image_in])

# app, db and the Image model come from the Flask / SQLAlchemy setup (not shown)
manager = flask_restless.APIManager(app, flask_sqlalchemy_db=db)
manager.create_api(Image, methods=['GET', 'POST', 'DELETE'],
                   postprocessors=postprocessors,
                   preprocessors=preprocessors)
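A client then sends the file base64-encoded inside the JSON body. A minimal sketch using requests (the /api/image endpoint name and the photo.png path are assumptions, not from the question):

import base64
import requests

with open('photo.png', 'rb') as f:
    payload = {'image': base64.b64encode(f.read()).decode('utf8')}

# /api/image is the hypothetical collection endpoint created by manager.create_api(Image, ...)
resp = requests.post('http://localhost:5000/api/image', json=payload)
print(resp.status_code)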

Storing a BlobKey in the Datastore with App Engine

So I decided to rewrite my image gallery because of the new high-performance image serving feature. That meant using the Blobstore, which I have never used before. It seemed simple enough until I tried to store the BlobKey in my model.
How on earth do I store a reference to a BlobKey in a Model? Should I use a string, or should I use some special property that I don't know about? I have this model:
class Photo(db.Model):
    date = db.DateTimeProperty(auto_now_add=True)
    title = db.StringProperty()
    blobkey = db.StringProperty()
    photoalbum = db.ReferenceProperty(PhotoAlbum, collection_name='photos')
And I get this error: Property blobkey must be a str or unicode instance, not a BlobKey
Granted, I am an App Engine newbie, but this is the first major wall I have hit.
I have googled around extensively without any success.
The following works for me. Note that the import is blobstore.blobstore (the module inside the package), not just blobstore.
Model:
from google.appengine.ext.blobstore import blobstore

class Photo(db.Model):
    imageblob = blobstore.BlobReferenceProperty()
Set the property:
from google.appengine.api import images
from google.appengine.api import blobstore
from google.appengine.ext.webapp import blobstore_handlers

class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        upload_files = self.get_uploads('file')  # 'file' is the file upload field in the form
        blob_info = upload_files[0]
        entity = models.db.get(self.request.get('id'))
        entity.imageblob = blob_info.key()
        entity.put()  # persist the updated entity
Get the property:
image_url = images.get_serving_url(str(photo.imageblob.key()))
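If you do want to keep a plain db.StringProperty, another option is to store str(blob_info.key()) and rebuild the BlobKey when you need it. A minimal sketch, not from the answer above (adapt the model to your own schema):

from google.appengine.ext import blobstore, db

class Photo(db.Model):
    blobkey = db.StringProperty()

# storing the key as a string avoids the "must be a str or unicode instance" error
photo = Photo(blobkey=str(blob_info.key()))
photo.put()

# rebuild a BlobKey object later, e.g. to look up the BlobInfo
info = blobstore.BlobInfo.get(blobstore.BlobKey(photo.blobkey))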
Instead of a db.StringProperty() you need to use blobstore.BlobReferenceProperty (I think).
I'm still trying to figure this thing out as well, but thought I'd post some ideas.
Here are the reference pages from Google:
http://code.google.com/appengine/docs/python/datastore/typesandpropertyclasses.html
http://code.google.com/appengine/docs/python/datastore/typesandpropertyclasses.html#BlobReferenceProperty

<class 'google.appengine.runtime.DeadlineExceededError'>: how to get around it?

OK guys, I am having tons of problems getting my working dev server to a working production server :). I have a task that will go through, request URLs, and collect and update data. It takes 30 minutes to run.
I uploaded to the production server, and when going to the URL with its corresponding .py script (appname.appspot.com/tasks/rrs), after 30 seconds I get google.appengine.runtime.DeadlineExceededError. Is there any way to get around this? Is this a 30-second deadline per page? This script works fine on the development server: I go to the URL and the associated .py script runs until completion.
import time
import random
import string
import cPickle
from StringIO import StringIO
try:
    import json
except ImportError:
    import simplejson as json
import urllib
import pprint
import datetime
import sys
sys.path.append("C:\Program Files (x86)\Google\google_appengine")
sys.path.append("C:\Program Files (x86)\Google\google_appengine\lib\yaml\lib")
sys.path.append("C:\Program Files (x86)\Google\google_appengine\lib\webob")
from google.appengine.api import users
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
from google.appengine.ext import db

class SR(db.Model):
    name = db.StringProperty()
    title = db.StringProperty()
    url = db.StringProperty()

## request url and return JSON_data
def overview(page):
    u = urllib.urlopen(page)
    bytes = StringIO(u.read())
    ##print bytes
    u.close()
    try:
        JSON_data = json.load(bytes)
        return JSON_data
    except ValueError, e:
        print e, " Couldn't get .json for %s" % page
        return None

## specific code to parse particular JSON data and append new SR objects to the given url list
def parse_json(JSON_data, lists):
    sr = SR()
    sr.name = ##data gathered
    sr.title = ##data gathered
    sr.url = ##data gathered
    lists.append(sr)
    return lists

## I want to be able to request let's say 500 pages without timing out
page = 'someurlpage.com'  ## starting url
url_list = []
for z in range(0, 500):
    page = 'someurlpage.com/%s' % z
    JSON_data = overview(page)  ## get json data for a given url page
    url_list = parse_json(JSON_data, url_list)  ## parse the json data and append class objects to a given list
db.put(url_list)  ## finally add objects to gae database
Yes, App Engine imposes a 30-second deadline on requests. One way around it might be a try/except on DeadlineExceededError and putting the rest of the work in the task queue.
But you can't make your requests run for a longer period.
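A minimal sketch of the task queue approach, using the deferred library to chain small batches of pages so no single request comes near the deadline (the batch size and function name are illustrative, not from the question; overview, parse_json and db come from your code above):

from google.appengine.ext import deferred

BATCH = 25  # pages per task; each task stays well under the request deadline

def crawl_batch(start, stop, total=500):
    # fetch and store one slice of pages, then chain the next slice as a new task
    urls = []
    for z in range(start, stop):
        JSON_data = overview('someurlpage.com/%s' % z)
        urls = parse_json(JSON_data, urls)
    db.put(urls)
    if stop < total:
        deferred.defer(crawl_batch, stop, min(stop + BATCH, total))

# kick off the first batch from the request handler instead of doing all 500 pages inline
deferred.defer(crawl_batch, 0, BATCH)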
You can also try the bulkupdate library.
Example:
# requires the third-party bulkupdate library to be available on the import path
class Todo(db.Model):
    page = db.StringProperty()

class BulkPageParser(bulkupdate.BulkUpdater):
    def get_query(self):
        return Todo.all()

    def handle_entity(self, entity):
        JSON_data = overview(entity.page)
        db.put(parse_json(JSON_data, []))
        entity.delete()

# Put this in your view code:
for i in range(500):
    Todo(page='someurlpage.com/%s' % i).put()
job = BulkPageParser()
job.start()
OK, so if I am dynamically adding links as I am parsing the pages, I would add to the Todo queue like so, I believe:
def handle_entity(self, entity):
    JSON_data = overview(entity.page)
    data_gathered, new_links = parse_json(JSON_data, [])  ## as earlier, returns a list of SR objects, and now also a list of new links/pages to visit
    db.put(data_gathered)
    for link in new_links:
        Todo(page=link).put()
    entity.delete()
