I decided to rewrite my image gallery to use the new high-performance image serving API. That meant using the Blobstore, which I had never used before. It seemed simple enough until I tried to store the BlobKey in my model.
How do I store a reference to a BlobKey in a Model? Should I use a string, or is there a special property type I don't know about? I have this model:
class Photo(db.Model):
    date = db.DateTimeProperty(auto_now_add=True)
    title = db.StringProperty()
    blobkey = db.StringProperty()
    photoalbum = db.ReferenceProperty(PhotoAlbum, collection_name='photos')
And I get this error: Property blobkey must be a str or unicode instance, not a BlobKey
Granted, I am a newbie in App Engine, but this is the first major wall I have hit. I have googled around extensively without any success.
The following works for me. Note that the import pulls in the inner blobstore module from the google.appengine.ext.blobstore package, not just blobstore.
Model:
from google.appengine.ext.blobstore import blobstore

class Photo(db.Model):
    imageblob = blobstore.BlobReferenceProperty()
Set the property:
from google.appengine.api import images
from google.appengine.api import blobstore
from google.appengine.ext.webapp import blobstore_handlers

class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        upload_files = self.get_uploads('file')  # 'file' is the file upload field in the form
        blob_info = upload_files[0]
        entity = models.db.get(self.request.get('id'))
        entity.imageblob = blob_info.key()
        entity.put()  # don't forget to save the entity
Get the property:
image_url = images.get_serving_url(str(photo.imageblob.key()))
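Note that get_serving_url only works for image blobs, and the URL it returns supports size parameters appended to it (for example =s200 to get a version no larger than 200 pixels).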
Instead of a db.StringProperty(), you need to use blobstore.BlobReferenceProperty (I think).
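For what it's worth, here is a minimal sketch of what I mean, assuming the standard google.appengine.ext imports:

from google.appengine.ext import db
from google.appengine.ext import blobstore

class Photo(db.Model):
    date = db.DateTimeProperty(auto_now_add=True)
    title = db.StringProperty()
    # stores the BlobKey itself; dereferences to a BlobInfo on access
    imageblob = blobstore.BlobReferenceProperty()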
I'm still trying to figure this thing out as well, but thought I'd post some ideas.
Here are the reference pages from Google:
http://code.google.com/appengine/docs/python/datastore/typesandpropertyclasses.html
http://code.google.com/appengine/docs/python/datastore/typesandpropertyclasses.html#BlobReferenceProperty
I am trying to upload files to S3 using Boto3, make the uploaded file public, and return it as a URL.
class UtilResource(BaseZMPResource):
    class Meta(BaseZMPResource.Meta):
        queryset = Configuration.objects.none()
        resource_name = 'util_resource'
        allowed_methods = ['get']

    def post_list(self, request, **kwargs):
        fileToUpload = request.FILES
        # write code to upload to Amazon S3
        # see: https://boto3.readthedocs.org/en/latest/reference/services/s3.html
        self.session = Session(aws_access_key_id=settings.AWS_KEY_ID,
                               aws_secret_access_key=settings.AWS_ACCESS_KEY,
                               region_name=settings.AWS_REGION)
        client = self.session.client('s3')
        client.upload_file('zango-static', 'fileToUpload')
        url = "some/test/url"
        return self.create_response(request, {
            'url': url  # returns the public URL of the uploaded file
        })
I searched the whole documentation but couldn't find anything that describes how to do this. Can someone explain, or point me to a resource where I can find the solution?
I'm in the same situation. I was not able to find anything in the Boto3 docs beyond generate_presigned_url, which is not what I need in my case since my S3 objects are publicly readable.
The best I came up with is:
bucket_location = boto3.client('s3').get_bucket_location(Bucket=s3_bucket_name)
object_url = "https://s3-{0}.amazonaws.com/{1}/{2}".format(
    bucket_location['LocationConstraint'],
    s3_bucket_name,
    key_name)
You might try posting on the boto3 github issues list for a better solution.
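One caveat with this approach (my addition, not part of the original answer): for buckets in us-east-1, get_bucket_location returns a LocationConstraint of None, so you may want a fallback:

bucket_location = boto3.client('s3').get_bucket_location(Bucket=s3_bucket_name)
region = bucket_location['LocationConstraint'] or 'us-east-1'
object_url = "https://s3-{0}.amazonaws.com/{1}/{2}".format(
    region, s3_bucket_name, key_name)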
I had the same issue.
Assuming you know the bucket name where you want to store your data, you can then use the following:
import boto3
from boto3.s3.transfer import S3Transfer

credentials = {
    'aws_access_key_id': aws_access_key_id,
    'aws_secret_access_key': aws_secret_access_key
}
client = boto3.client('s3', 'us-west-2', **credentials)
transfer = S3Transfer(client)
transfer.upload_file('/tmp/myfile', bucket, key,
                     extra_args={'ACL': 'public-read'})
file_url = '%s/%s/%s' % (client.meta.endpoint_url, bucket, key)
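Note that this builds a path-style URL from the client's endpoint; it only serves as a public link because the object was uploaded with the 'public-read' ACL in extra_args.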
The best solution I found is still to use generate_presigned_url, just with the client's Config.signature_version set to botocore.UNSIGNED. The following returns the public link without the signing parameters.
import boto3
import botocore
from botocore.client import Config

config = Config(signature_version=botocore.UNSIGNED)
url = boto3.client('s3', config=config).generate_presigned_url(
    'get_object', ExpiresIn=0, Params={'Bucket': bucket, 'Key': key})
The relevant discussions on the boto3 repository are:
https://github.com/boto/boto3/issues/110
https://github.com/boto/boto3/issues/169
https://github.com/boto/boto3/issues/1415
For anybody who wants to build a direct URL to a publicly accessible object and avoid generate_presigned_url for some reason: build the URL with urllib.parse.quote_plus to handle whitespace and special characters.
My object key: 2018-11-26 16:34:48.351890+09:00.jpg (note the whitespace and the ':' characters).
The S3 public link shown in the AWS console: https://s3.my_region.amazonaws.com/my_bucket_name/2018-11-26+16%3A34%3A48.351890%2B09%3A00.jpg
The code below worked for me:

from urllib.parse import quote_plus
import boto3

s3_client = boto3.client('s3')
bucket_location = s3_client.get_bucket_location(Bucket='my_bucket_name')
url = "https://s3.{0}.amazonaws.com/{1}/{2}".format(
    bucket_location['LocationConstraint'],
    'my_bucket_name',
    quote_plus('2018-11-26 16:34:48.351890+09:00.jpg'))
print(url)
Going through the existing answers and their comments, I did the following, and it works well for tricky file names: ones with whitespace, special (ASCII) characters, and corner cases such as names of the form "key=value.txt":
import boto3
import botocore
config = botocore.client.Config(signature_version=botocore.UNSIGNED)
object_url = boto3.client('s3', config=config).generate_presigned_url('get_object', ExpiresIn=0, Params={'Bucket': s3_bucket_name, 'Key': key_name})
print(object_url)
For Django, if you use django-storages with boto3, the line below does exactly what you want:
default_storage.url(name=f.name)
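As a fuller sketch (assuming django-storages is installed and configured for S3; the settings below are my assumptions, not from the original answer):

# settings.py (assumed configuration)
# DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
# AWS_QUERYSTRING_AUTH = False  # emit plain public URLs instead of signed ones

from django.core.files.storage import default_storage

name = default_storage.save(f.name, f)  # upload the file object to S3
url = default_storage.url(name=name)    # URL for the stored file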
I used an f-string for the same:

import boto3

# s3_client = boto3.session.Session(profile_name='sssss').client('s3')
s3_client = boto3.client('s3')
s3_bucket_name = 'xxxxx'
region = s3_client.get_bucket_location(Bucket=s3_bucket_name)['LocationConstraint']
s3_website_URL = f"http://{s3_bucket_name}.s3-website.{region}.amazonaws.com"
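Note that this is the static website hosting endpoint, which only resolves if website hosting is enabled on the bucket; also, depending on the region, the website endpoint uses either a dot or a dash after s3-website.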
I get an error instead of the image when using the following URL:
http://127.0.0.1:8080/serve/CrObzPCoJfjG4ESUUb0hjw==
The image does exist in the blobstore; this can be checked in the admin console.
My route (based on the docs on redirect routes):
RedirectRoute('/serve/[a-zA-Z0-9-_]', handlers.ServeHandler, name='ServeHandler'),
My code:
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers
import logging
import urllib

class ServeHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        logging.info("SERVE " + str(resource))
        resource = str(urllib.unquote(resource))
        blob_info = blobstore.BlobInfo.get(resource)
        self.send_blob(blob_info)

class FetchHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        upload_files = self.get_uploads('file')  # 'file' is the file upload field in the form
        blob_info = upload_files[0]
        logging.info("FOUND blob info " + str(blob_info))
        self.redirect('/serve/%s' % blob_info.key())
class ImageHandler(BaseHandler):
    # user_required
    def get(self, **kwargs):
        user_session = self.user
        user_session_object = self.auth.store.get_session(self.request)
        upload_url = blobstore.create_upload_url('/fetch/')
        user_info = models.User.get_by_id(long(self.user_id))
        user_info_object = self.auth.store.user_model.get_by_auth_token(
            user_session['user_id'], user_session['token'])
        try:
            params = {
                "upload_url": upload_url,
                "user_session": user_session,
                "user_session_object": user_session_object,
                "user_info": user_info,
                "user_info_object": user_info_object,
                "userinfo_logout-url": self.auth_config['logout_url'],
            }
            return self.render_template('image.html', **params)
        except (AttributeError, KeyError), e:
            return "Secure zone error:" + " %s." % e
I think your problem might be on this line:
self.redirect('/serve/%s' % blob_info.key())
According to the following recent change (assuming you updated App Engine to the latest release): "The Blobstore service now returns the created filename instead of the blobKey when using Cloud Storage." [link][1]
Have a look at the recent release notes and the changes that came with them.
I think the URL you are providing to create_upload_url is the wrong one, since you are defining the handler under /upload/. Add the forward slash at the end and it should work:
upload_url = blobstore.create_upload_url('/upload/')
Comparing your code (from an earlier revision of your question) to some code I have working, I suspect that you might not want the trailing / on the /upload/ route (i.e., use /upload instead).
I'm not familiar with RedirectRoute, though.
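One more thing worth checking (my observation, not from the original answers): ServeHandler.get takes a resource argument, so the route template needs a capture group, and blob keys can end in '=' characters. With webapp2's <name:regex> template syntax that might look like:

# A hedged sketch; note the + quantifier and the '=' added to the character class.
RedirectRoute('/serve/<resource:[a-zA-Z0-9-_=]+>', handlers.ServeHandler,
              name='ServeHandler'),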
I have successfully uploaded a file to the blobstore using this code, but I am unable to download it. What I am doing is:
class PartnerFileDownloadHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, blob_key):
        resource = str(urllib.unquote(blob_key))
        logging.info('I am here.')  # This gets printed successfully.
        blob_info = blobstore.BlobInfo.get(blob_key)
        logging.info(blob_info)  # This gets logged too.
        self.send_blob(blob_info)
I have also tried:
blobstore.BlobReader(blob_key).read()
and I get the file data as a string, but I cannot write it to a file, since the local file system cannot be accessed from within a handler, I guess.
The way I am uploading the file is the only way available in my project, so I cannot use the usual approach from Google's official tutorial. Also, the file I am uploading to the blobstore is not present on my local file system; I fetch it from a URL. Perhaps this is why I am not able to download it.
Any suggestions?
Thanks
Perhaps you should use resource instead of blob_key from your code sample?
class PartnerFileDownloadHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, blob_key):
        resource = str(urllib.unquote(blob_key))
        self.send_blob(resource)
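(send_blob accepts either a BlobInfo object or a blob key string, so passing the unquoted key directly should work.)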
You can use a DownloadHandler like this:

from mimetypes import guess_type
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

def mime_type(filename):
    return guess_type(filename)[0]

class Thumbnailer(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, blob_key):
        if blob_key:
            blob_info = blobstore.BlobInfo.get(blob_key)
            if blob_info:
                save_as = blob_info.filename
                content_type = mime_type(blob_info.filename)  # don't shadow the helper
                self.send_blob(blob_info, content_type=content_type, save_as=save_as)
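As a side note, passing save_as sets a Content-Disposition header, so the browser downloads the blob under its original filename instead of displaying it inline.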
I'm having trouble setting a parameter when kicking off a mapreduce via start_map so that I can access it later in done_callback. Numerous things I've read imply that it's possible, but somehow I've not got the earth, moon, and stars properly aligned. Ultimately, what I'm trying to accomplish is to delete the temporary blob I created for the mapreduce job.
Here's how I kick it off:
mrID = control.start_map(
    "Find friends",
    "findfriendshandler.findFriendHandler",
    "mapreduce.input_readers.BlobstoreLineInputReader",
    {"blob_keys": blobKey},
    shard_count=7,
    mapreduce_parameters={'done_callback': '/fnfrdone', 'blobKey': blobKey})
In done_callback, the context object isn't available:
class FindFriendsDoneHandler(webapp.RequestHandler):
    def post(self):
        ctx = context.get()
        if ctx is not None:
            params = ctx.mapreduce_spec.mapper.params
            try:
                blobKey = params['blobKey']
                logging.info(['BLOBKEY ' + blobKey])
            except KeyError:
                logging.info('blobKey key not found in params')
        else:
            logging.info('context.get did not work')  # THIS IS WHAT GETS OUTPUT
Thanks!
EDIT: It seems like there may be more than one MR library, so I wanted to include my various imports:
from mapreduce import control
from mapreduce import operation as op
from mapreduce import context
from mapreduce import model
Below is the code I used in my done_callback handler to retrieve my blobKey user parameter:
class FindFriendsDoneHandler(webapp.RequestHandler):
    def post(self):
        mrID = self.request.headers['Mapreduce-Id']
        try:
            mapreduceState = MapreduceState.get_by_key_name(mrID)
            mrSpec = mapreduceState.mapreduce_spec
            jsonSpec = mrSpec.to_json()
            jsonParams = jsonSpec['params']
            blobKey = jsonParams['blobKey']
            blobInfo = BlobInfo.get(blobKey)
            blobInfo.delete()
            logging.info('Temp blob deleted successfully for mapreduce: ' + mrID)
        except:
            logging.warning('Unable to delete temp blob for mapreduce: ' + mrID)
This uses the mapreduce ID passed to the done callback via the header to retrieve the mapreduce state model object from the mapreduce state table. The model stores any user params sent via start_map in a mapreduce_spec property, which is in JSON format.
Note that MR itself actually stores the blob_key elsewhere in mapreduce_spec.
Thanks again to @Nick for pointing me to the model.py source file.
I'd love to hear if there's a simpler way to get at MR user params...
Context is only available to mappers/reducers; it's largely concerned with things that don't make sense outside of one. As you can see from the source, however, the "Mapreduce-Id" header is set, from which you can get the ID of the mapreduce job.
You shouldn't have to do your own cleanup, though: mapreduce has a handler that does it for you.
OK guys, I am having tons of problems getting my working dev server to a working production server :). I have a task that goes through and requests URLs, collecting and updating data. It takes 30 minutes to run.
I uploaded it to the production server, and when going to the URL with its corresponding .py script (appname.appspot.com/tasks/rrs), after 30 seconds I get a google.appengine.runtime.DeadlineExceededError. Is there any way to get around this? Is this a 30-second deadline per page? The script works fine on the development server: I go to the URL and the associated .py script runs until completion.
import time
import random
import string
import cPickle
from StringIO import StringIO
try:
    import json
except ImportError:
    import simplejson as json
import urllib
import pprint
import datetime
import sys
sys.path.append("C:\Program Files (x86)\Google\google_appengine")
sys.path.append("C:\Program Files (x86)\Google\google_appengine\lib\yaml\lib")
sys.path.append("C:\Program Files (x86)\Google\google_appengine\lib\webob")
from google.appengine.api import users
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
from google.appengine.ext import db

class SR(db.Model):
    name = db.StringProperty()
    title = db.StringProperty()
    url = db.StringProperty()

## request url and return JSON_data
def overview(page):
    u = urllib.urlopen(page)
    bytes = StringIO(u.read())
    ## print bytes
    u.close()
    try:
        JSON_data = json.load(bytes)
        return JSON_data
    except ValueError, e:
        print e, " Couldn't get .json for %s" % page
        return None

## specific code to parse particular JSON data and append new SR objects to the given list
def parse_json(JSON_data, lists):
    sr = SR()
    sr.name = ## data gathered
    sr.title = ## data gathered
    sr.url = ## data gathered
    lists.append(sr)
    return lists

## I want to be able to request, say, 500 pages without timing out
page = 'someurlpage.com'  ## starting url
url_list = []
for z in range(0, 500):
    page = 'someurlpage.com/%s' % z
    JSON_data = overview(page)  ## get json data for a given url page
    url_list = parse_json(JSON_data, url_list)  ## parse the json data and append SR objects to the list
db.put(url_list)  ## finally add the objects to the gae database
Yes, App Engine imposes a 30-second deadline on requests. One way around it might be a try/except on DeadlineExceededError, putting the rest of the work on the task queue.
But you can't make an individual request run for a longer period.
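A minimal sketch of that pattern (my illustration, not from the original answer; process_page and the /tasks/rrs URL are hypothetical):

from google.appengine.api import taskqueue
from google.appengine.ext import webapp
from google.appengine.runtime import DeadlineExceededError

class RrsTaskHandler(webapp.RequestHandler):
    def get(self):
        start = int(self.request.get('start', 0))
        try:
            for z in range(start, 500):
                process_page('someurlpage.com/%s' % z)  # hypothetical helper
                start = z + 1
        except DeadlineExceededError:
            # re-enqueue the remaining work before this request is killed
            taskqueue.add(url='/tasks/rrs', params={'start': start},
                          method='GET')

Note also that task queue tasks were later given a 10-minute deadline, much longer than interactive requests, which is usually the real fix for jobs like this.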
You can also try the third-party bulkupdate library.
Example:
class Todo(db.Model):
    page = db.StringProperty()

class BulkPageParser(bulkupdate.BulkUpdater):
    def get_query(self):
        return Todo.all()

    def handle_entity(self, entity):
        JSON_data = overview(entity.page)
        db.put(parse_json(JSON_data, []))
        entity.delete()

# Put this in your view code:
for i in range(500):
    Todo(page='someurlpage.com/%s' % i).put()
job = BulkPageParser()
job.start()
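The point of the Todo + BulkUpdater pattern is that each entity becomes its own small unit of work, processed across many task-queue requests, so no single request comes near the deadline.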
OK, so if I am dynamically adding links as I am parsing the pages, I would add them to the Todo queue like so, I believe:
def handle_entity(self, entity):
    JSON_data = overview(entity.page)
    # as earlier, but parse_json now returns the list of SR objects
    # plus a list of new links/pages to visit
    data_gathered, new_links = parse_json(JSON_data, [])
    db.put(data_gathered)
    for link in new_links:
        Todo(page=link).put()
    entity.delete()