Download App Engine ndb entities via bulk exporter / bulk uploader - google-app-engine

Context:
My model classes inherit from a base class:
class BaseModel(ndb.Model):
    # common fields and methods
class SpecificModel(BaseModel):
    # specific fields and methods
Problem:
I want to export the SpecificModel entities using the App Engine bulkloader service.
I have defined the config file (data_loader.py):
import sys
sys.path.append('.')  # this is to ensure that it finds the file 'models.py'
from google.appengine.ext import ndb
from google.appengine.tools import bulkloader
from models import *
class SpecificModelExporter(bulkloader.Exporter):
    def __init__(self):
        bulkloader.Exporter.__init__(self, 'SpecificModel',
                                     [('fieldOne', str, None),
                                      ('fieldTwo', str, None)
                                     ])
exporters = [SpecificModelExporter]
I use the following command to download data:
appcfg.py download_data --config_file=data_loader.py --filename=data.csv --kind=SpecificModel --url=http://url.appspot.com/_ah/remote_api
When I try to download the data I get the following error:
google.appengine.ext.db.KindError: No implementation for kind 'SpecificModel'
Any clues?

Have a look at the source code:
Your model will be looked up in GetImplementationClass via
implementation_class = db.class_for_kind(kind_or_class_key)
but the registry of db models will not include any ndb models you've defined. A similar registry is created in ndb.Model._kind_map and any db models you had defined would not be found there.
NOTE: As far as I can tell there is no corresponding issue/feature request asking for ndb support in the bulk loader or an equivalent ndb bulk loader. It may be worth filing one and starring it.
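To make the registry point concrete, here is a toy sketch in plain Python 3. It does not use the real App Engine libraries: DbModel, NdbModel and class_for_kind below are hypothetical stand-ins that only mimic the idea of two separate kind registries, so treat it as an illustration of the failure mode rather than the SDK's actual code.

```python
class KindError(Exception):
    """Raised when a kind has no registered model class."""


class DbModel(object):
    """Hypothetical stand-in for db.Model with its own kind registry."""
    _kind_map = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        DbModel._kind_map[cls.__name__] = cls


class NdbModel(object):
    """Hypothetical stand-in for ndb.Model with a separate kind registry."""
    _kind_map = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        NdbModel._kind_map[cls.__name__] = cls


def class_for_kind(kind):
    """Look a kind up in the db-style registry only."""
    try:
        return DbModel._kind_map[kind]
    except KeyError:
        raise KindError("No implementation for kind '%s'" % kind)


class BaseModel(NdbModel):
    pass


class SpecificModel(BaseModel):
    pass


# The model is registered, but only in the ndb-style registry.
print('SpecificModel' in NdbModel._kind_map)
try:
    class_for_kind('SpecificModel')
except KindError as error:
    print(error)
```

The lookup fails even though the class exists, because the exporter-style lookup never consults the second registry.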

Related

Flask and connecting to a database confusion

I am currently creating a web application using Flask. My main issue at the moment is understanding the concept of connecting to a database, as the many resources online confuse me about how to establish a solid connection to a database. SQL syntax is not a problem, as I already know it.
I am choosing SQLAlchemy with the SQLite dialect instead of MySQL, PostgreSQL, etc.
My first question is: is choosing a dialect necessary when using SQLAlchemy? Can we not use SQLAlchemy as it is?
Second Question: I have seen many examples and tutorials online using "phpMyAdmin" or something similar to get a visual and interactive way to deal with their database (relations) in their localhost browser. Is it necessary to set this up before creating any type of database connection for any type of project?
Second Question (extension): To set up phpMyAdmin, there are tutorials such as "https://www.youtube.com/watch?v=hVHFPzjp064&t=238s" that say to activate Apache, activate PHP, and download MySQL to use a workbench. As asked in the second question, are these steps mandatory? Many tutorials don't seem to show how to set this up.
Third Question: Due to my project slowly growing, I am using the 'separation of concerns' concept. My file tree is the following:
After researching, I believe I should put the database-related code in the __init__.py file, and of course update the config file with the necessary configuration. What I don't understand is the syntax used to connect to a database. The code in both files mentioned above is shown below:
__init__.py
# This class will ultimately bring our entire application together.
from flask import Flask
from config import Config
from flask_sqlalchemy import SQLAlchemy
from flask_migrate import Migrate
# Creating Flask app.
app = Flask(__name__)
# Creating a database object which represents the database.
# Created a migration object which represents the migration engine.
db = SQLAlchemy(app)
migrate = Migrate(app, db)
# TODO Explain reasons for using this method:
# Using method to determine Flask environment from the following link:
# https://www.youtube.com/watch?v=GW_2O9CrnSU&t=366s
if app.config["ENV"] == "production":
    app.config.from_object("config.ProductionConfig")
elif app.config["ENV"] == "testing":
    app.config.from_object("config.TestingConfig")
else:
    app.config.from_object("config.DevelopmentConfig")
# Importing views file to avoid circular import.
from app import views
from app import admin_views
from app import routes, models
config.py
# This file contains important information regarding the configuration of this application.
# It is good practice to keep the configuration of the application in a separate file. This enforces the
# practice of 'separation of concerns'.
# There is a main class "Config" which has subclasses as illustrated below. The configuration settings
# are defined as class variables within the 'Config' class. As the application grows, we can create subclasses.
import os
basedir = os.path.abspath(os.path.dirname(__file__))
# The SECRET_KEY is important as it...
class Config(object):
    DEBUG = False
    TESTING = False
    SECRET_KEY = '\xb6"\xc5\xce\xc2D\xd1*\x0c\x06\x83 \xbc\xdbM\x97\xe2\xf4OZ\xdc\x16Jv'
    # The SQLAlchemy extension is connecting the location of the database from the URI variable.
    # The fallback value if the value is not defined is given below as the URL.
    SQLALCHEMY_DATABASE_URI = os.environ.get('DATABASE_URL') or \
        'sqlite:///' + os.path.join(basedir, 'app.db')
    # The 'modifications' config option is set to false as it prevents a signal from appearing whenever
    # there is a change made within the database.
    SQLALCHEMY_TRACK_MODIFICATIONS = False
class ProductionConfig(Config):
    pass
class DevelopmentConfig(Config):
    DEBUG = True
class TestingConfig(Config):
    TESTING = True
I apologise if my questions seem all over the place. The more I research, the more confused I am becoming with being able to successfully connect to a database.
I would appreciate if someone could answer my concerns in an 'easy to understand' way.
You can avoid (or forestall) some amount of confusion by starting from
flask-sqlalchemy, which provides a convenience layer over SQLAlchemy.
It will arrange for SQLALCHEMY_DATABASE_URI to be turned into an SQLAlchemy "engine" for the database specified in the URI. SQLAlchemy does all of the heavy lifting. There's no need to do the create_engine() yourself when using flask-sqlalchemy.
In addition, following the organizational scheme from chapter 15 of the Flask Mega-Tutorial will carry you quite far.

Datastore entity access from ODK

I'm trying to access data that ODK has pushed into the datastore. The code below works fine when I query an entity that I created via Python, which was called "ProductSalesData". The entity name ODK has given its data is "opendatakit.test1". When I update the data model to class opendatakit.test1(db.Model) it obviously bombs due to a syntax error. How do I query that data?
#!/usr/bin/env python
import webapp2
from google.appengine.ext import db
class ProductSalesData(db.Model):
    product_id = db.IntegerProperty()
    date = db.DateTimeProperty()
    store = db.StringProperty()
q = ProductSalesData.all()
class simplequery(webapp2.RequestHandler):
    def get(self):
        for ProductSalesData in q:
            self.response.out.write('Result:%s<br />' % ProductSalesData.store)
app = webapp2.WSGIApplication(
    [('/', simplequery)],
    debug=True)
I know you tagged GAE, but do you have to access it straight through the datastore?
If not, I've had better success using the API that has already been built into aggregate: https://code.google.com/p/opendatakit/wiki/BriefcaseAggregateAPI
If you need GAE access I'd suggest the ODK developers group over on google groups - they're pretty active.

AppEngine datastore - backup programmatically

I would like to back up my app's datastore programmatically, on a regular basis.
It seems possible to create a cron that backs up the datastore, according to https://developers.google.com/appengine/articles/scheduled_backups
However, I require a more fine-grained solution: Create different backup files for dynamically changing namespaces.
Is it possible to simply call the /_ah/datastore_admin/backup.create url with GET/POST?
Yes; I'm doing exactly that in order to implement some logic that couldn't be done with cron.
Use the taskqueue API to add the URL request, like this:
from google.appengine.api import taskqueue
taskqueue.add(url='/_ah/datastore_admin/backup.create',
              method='GET',
              target='ah-builtin-python-bundle',
              params={'kind': ('MyKind1', 'MyKind2')})
If you want to use more parameters that would otherwise go into the cron url, like 'filesystem', put those in the params dict alongside 'kind'.
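To see what URL such a params dict corresponds to, here is a small standard-library sketch (Python 3). It assumes that the task queue flattens a tuple value into one key=value pair per item, which matches the repeated kind= parameters used in the cron URL format; backup_url is a hypothetical helper, not part of the taskqueue API.

```python
from urllib.parse import urlencode


def backup_url(params, path='/_ah/datastore_admin/backup.create'):
    """Build the query string a cron entry would need for the same params."""
    # doseq=True expands a tuple or list value into one key=value pair per item.
    return path + '?' + urlencode(params, doseq=True)


url = backup_url({'kind': ('MyKind1', 'MyKind2'), 'filesystem': 'gs'})
print(url)
```

This can be handy for checking that a programmatic call and a cron entry request the same backup.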
Programmatically backup datastore based on environment
This comes in addition to Jamie's answer. I needed to back up the datastore to Cloud Storage, based on the environment (staging/production). Unfortunately, this can no longer be achieved via a cron job, so I needed to do it programmatically and point a cron job at my script. I can confirm that the code below works, since I saw some people complaining that they get a 404. However, it only works in a live environment, not on the local development server.
from datetime import datetime
from flask.views import MethodView
from google.appengine.api import taskqueue
from google.appengine.api.app_identity import app_identity
class BackupDatastoreView(MethodView):
    BUCKETS = {
        'app-id-staging': 'datastore-backup-staging',
        'app-id-production': 'datastore-backup-production'
    }
    def get(self):
        environment = app_identity.get_application_id()
        task = taskqueue.add(
            url='/_ah/datastore_admin/backup.create',
            method='GET',
            target='ah-builtin-python-bundle',
            queue_name='backup',
            params={
                'filesystem': 'gs',
                'gs_bucket_name': self.get_bucket_name(environment),
                'kind': (
                    'Kind1',
                    'Kind2',
                    'Kind3'
                )
            }
        )
        if task:
            return 'Started backing up %s' % environment
    def get_bucket_name(self, environment):
        return "{bucket}/{date}".format(
            bucket=self.BUCKETS.get(environment, 'datastore-backup'),
            date=datetime.now().strftime("%d-%m-%Y %H:%M")
        )
You can now use the managed export and import feature, which can be accessed through gcloud or the Datastore Admin API:
Exporting and Importing Entities
Scheduling an Export
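For the per-namespace requirement in the question, a managed export request can be limited to specific namespaces. The sketch below only builds the URL and JSON body for the Datastore Admin API export method, as I understand its REST shape (outputUrlPrefix plus an entityFilter with kinds and namespaceIds); it does not send anything, and authentication is left out. The function name, project ID, bucket and namespace are placeholders I made up.

```python
import json


def build_export_request(project_id, bucket, namespace, kinds):
    """Return (url, json_body) for one namespace-scoped export request."""
    url = 'https://datastore.googleapis.com/v1/projects/%s:export' % project_id
    body = {
        # One output folder per namespace keeps the backups separate.
        'outputUrlPrefix': 'gs://%s/%s' % (bucket, namespace or 'default'),
        'entityFilter': {
            'kinds': list(kinds),
            # An empty string stands for the default namespace.
            'namespaceIds': [namespace],
        },
    }
    return url, json.dumps(body, sort_keys=True)


url, body = build_export_request('my-project', 'my-bucket', 'tenant-a', ['Kind1'])
print(url)
print(body)
```

Calling this once per namespace gives one export per tenant; the actual POST would need an OAuth access token with Datastore permissions.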

What's a namespace used for in the App Engine datastore?

In the development admin console, when I look at my data, it says "Select different namespace".
What are namespaces for and how should I use them?
Namespaces allow you to implement segregation of data for multi-tenant applications. The official documentation links to some sample projects to give you an idea how it might be used.
Namespaces are used in Google App Engine to create multitenant applications. In a multitenant application, a single instance of the application runs on a server and serves multiple client organizations (tenants). With this, an application can be designed to virtually partition its data and configuration (business logic), so that each client organization works with a customized virtual application instance. You can easily partition data across tenants simply by specifying a unique namespace string for each tenant.
Other Uses of namespace:
Compartmentalizing user information
Separating admin data from application data
Creating separate datastore instances for testing and production
Running multiple apps on a single app engine instance
For More information visit the below links:
http://www.javacodegeeks.com/2011/12/multitenancy-in-google-appengine-gae.html
https://developers.google.com/appengine/docs/java/multitenancy/
http://java.dzone.com/articles/multitenancy-google-appengine
http://www.sitepoint.com/multitenancy-and-google-app-engine-gae-java/
This question has not received a thorough answer so far, so here is an attempt at one.
When using namespaces, a good practice is to set the namespace explicitly around the datastore operations that belong to it, and to restore the previous namespace afterwards. The following example illustrates this by keeping the same counter both in the current namespace and in a '-global-' namespace.
from google.appengine.api import namespace_manager
from google.appengine.ext import db
from google.appengine.ext import webapp
class Counter(db.Model):
    """Model for containing a count."""
    count = db.IntegerProperty()
def update_counter(name):
    """Increment the named counter by 1."""
    def _update_counter(name):
        counter = Counter.get_by_key_name(name)
        if counter is None:
            counter = Counter(key_name=name)
            counter.count = 1
        else:
            counter.count = counter.count + 1
        counter.put()
    # Update counter in a transaction.
    db.run_in_transaction(_update_counter, name)
class SomeRequest(webapp.RequestHandler):
    """Perform synchronous requests to update counter."""
    def get(self):
        update_counter('SomeRequest')
        # try/finally pattern to temporarily set the namespace.
        # Save the current namespace.
        namespace = namespace_manager.get_namespace()
        try:
            namespace_manager.set_namespace('-global-')
            update_counter('SomeRequest')
        finally:
            # Restore the saved namespace.
            namespace_manager.set_namespace(namespace)
        self.response.out.write('<html><body><p>Updated counters')
        self.response.out.write('</p></body></html>')
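The save/set/restore pattern above can also be wrapped in a context manager so it is harder to forget the restore step. This is only a sketch: FakeNamespaceManager is a hypothetical stand-in so the snippet runs outside App Engine, and use_namespace is a helper name I made up. On App Engine you would use the real namespace_manager module, which offers the same get_namespace() and set_namespace() calls used in the example above.

```python
from contextlib import contextmanager


class FakeNamespaceManager(object):
    """Hypothetical stand-in for google.appengine.api.namespace_manager."""

    def __init__(self):
        self._namespace = ''

    def get_namespace(self):
        return self._namespace

    def set_namespace(self, namespace):
        self._namespace = namespace


namespace_manager = FakeNamespaceManager()


@contextmanager
def use_namespace(namespace):
    """Temporarily switch namespace and always restore the previous one."""
    previous = namespace_manager.get_namespace()
    namespace_manager.set_namespace(namespace)
    try:
        yield
    finally:
        # Runs even if the body raises, like the try/finally above.
        namespace_manager.set_namespace(previous)


with use_namespace('-global-'):
    inside = namespace_manager.get_namespace()
after = namespace_manager.get_namespace()
print(repr(inside), repr(after))
```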

How do I get a list of namespaces on google app engine?

I would like to make a backup of all user data in the datastore. My application is using the new namespace feature to provide multi tenanting on a per user basis (as per the example in the docs).
The bulk loader needs the namespace for each customer to download the data. I don't keep a list of users, so I can't generate the namespaces. Is there a method of detecting all the currently used namespaces?
Since SDK 1.4.0 you can use Metadata Queries:
from google.appengine.ext.db import metadata
for ns in metadata.get_namespaces():
    print("namespace: '%s'" % ns)
For NDB the import is slightly different:
from google.appengine.ext.ndb import metadata
There is also now a get_namespaces() function:
from google.appengine.ext.db import metadata
namespaces = metadata.get_namespaces()
get_namespaces() returns a list of namespace names. The docs also note that "metadata queries that fetch information on namespaces, kinds, and properties are generally slow to execute."
Using ndb
from google.appengine.ext.ndb import metadata
all_namespaces = [ns for ns in metadata.get_namespaces()]
Using datastore
Per Datastore Metadata:
from google.cloud import datastore
client = datastore.Client()
query = client.query(kind='__namespace__')
query.keys_only()
all_namespaces = [entity.key.id_or_name for entity in query.fetch()]
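One detail to watch with the __namespace__ query: as far as I know from the Datastore metadata documentation, the default (empty) namespace cannot have an empty key name, so it is keyed with the numeric ID 1 instead. A small sketch of normalizing the id_or_name values, where namespace_from_key_id is a hypothetical helper name and the input list is made-up sample data:

```python
def namespace_from_key_id(id_or_name):
    """Map a __namespace__ key's id_or_name to the namespace string."""
    # Assumption: a numeric ID (1) represents the default namespace ''.
    if isinstance(id_or_name, int):
        return ''
    return id_or_name


# Sample values shaped like entity.key.id_or_name results.
raw_ids = [1, 'tenant-a', 'tenant-b']
namespaces = [namespace_from_key_id(value) for value in raw_ids]
print(namespaces)
```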
There's no API to get a list of namespaces. You must keep a record of the ones you use. I use a model specifically for this.
