AppEngine datastore - backup programmatically - google-app-engine

I would like to back up my app's datastore programmatically, on a regular basis.
It seems possible to create a cron job that backs up the datastore, according to https://developers.google.com/appengine/articles/scheduled_backups
However, I require a more fine-grained solution: creating different backup files for dynamically changing namespaces.
Is it possible to simply call the /_ah/datastore_admin/backup.create URL with GET/POST?

Yes; I'm doing exactly that in order to implement some logic that couldn't be done with cron.
Use the taskqueue API to add the URL request, like this:
from google.appengine.api import taskqueue
taskqueue.add(url='/_ah/datastore_admin/backup.create',
              method='GET',
              target='ah-builtin-python-bundle',
              params={'kind': ('MyKind1', 'MyKind2')})
If you want to use more parameters that would otherwise go into the cron url, like 'filesystem', put those in the params dict alongside 'kind'.
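For the per-namespace case from the question, one option is to build one params dict per namespace and queue one task for each. The following is only a sketch: the helper name build_backup_params and the tenant names are hypothetical, and it assumes the backup handler accepts a 'namespace' parameter in the same way it accepts 'kind' and 'filesystem', which is worth verifying against your SDK version.

```python
def build_backup_params(kinds, namespace=None, **extra):
    """Build the params dict for one backup.create request."""
    params = {'kind': tuple(kinds)}
    if namespace is not None:
        # Assumption: 'namespace' is accepted like the other cron URL parameters.
        params['namespace'] = namespace
    params.update(extra)  # e.g. filesystem='gs', gs_bucket_name='my-bucket'
    return params


namespaces = ['tenant-a', 'tenant-b']  # hypothetical; fetch the real list at run time
requests_to_queue = [
    build_backup_params(['MyKind1', 'MyKind2'], namespace=ns, name='backup-' + ns)
    for ns in namespaces
]
for params in requests_to_queue:
    # On App Engine, each dict would be passed as taskqueue.add(..., params=params).
    print(params)
```

Keeping the dict construction separate from the taskqueue call makes it possible to check the parameters locally, where the backup URL itself is not available.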

Programmatically backup datastore based on environment
This comes in addition to Jamie's answer. I needed to back up the datastore to Cloud Storage, based on the environment (staging/production). Unfortunately, this can no longer be achieved via a cron job alone, so I needed to do it programmatically and point a cron job at my script. I can confirm that the code below works, since I saw some people complaining that they get a 404. However, it only works in a live environment, not on the local development server.
from datetime import datetime
from flask.views import MethodView
from google.appengine.api import taskqueue
from google.appengine.api.app_identity import app_identity
class BackupDatastoreView(MethodView):
    BUCKETS = {
        'app-id-staging': 'datastore-backup-staging',
        'app-id-production': 'datastore-backup-production'
    }
    def get(self):
        environment = app_identity.get_application_id()
        task = taskqueue.add(
            url='/_ah/datastore_admin/backup.create',
            method='GET',
            target='ah-builtin-python-bundle',
            queue_name='backup',
            params={
                'filesystem': 'gs',
                'gs_bucket_name': self.get_bucket_name(environment),
                'kind': (
                    'Kind1',
                    'Kind2',
                    'Kind3'
                )
            }
        )
        if task:
            return 'Started backing up %s' % environment
    def get_bucket_name(self, environment):
        return "{bucket}/{date}".format(
            bucket=self.BUCKETS.get(environment, 'datastore-backup'),
            date=datetime.now().strftime("%d-%m-%Y %H:%M")
        )
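One detail in get_bucket_name: the format "%d-%m-%Y %H:%M" puts a space and a colon into the Cloud Storage path, and day-first dates do not sort chronologically. Here is a minimal sketch of an alternative, assuming a year-first timestamp is acceptable for your backups. The function name backup_prefix and its now parameter are hypothetical and exist only to make the helper testable outside App Engine.

```python
from datetime import datetime

BUCKETS = {
    'app-id-staging': 'datastore-backup-staging',
    'app-id-production': 'datastore-backup-production',
}


def backup_prefix(environment, now=None):
    """Return '<bucket>/<timestamp>' for the given application ID."""
    now = now or datetime.now()
    bucket = BUCKETS.get(environment, 'datastore-backup')
    # Year-first format sorts chronologically and avoids spaces and colons.
    return "{bucket}/{date}".format(bucket=bucket, date=now.strftime("%Y-%m-%dT%H-%M"))


print(backup_prefix('app-id-staging', datetime(2015, 3, 7, 9, 5)))
# prints: datastore-backup-staging/2015-03-07T09-05
```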

You can now use the managed export and import feature, which can be accessed through gcloud or the Datastore Admin API:
Exporting and Importing Entities
Scheduling an Export
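As a rough illustration of the managed export route, the sketch below only builds the URL and JSON body for the Datastore Admin API export call, with one namespace per request. The project and bucket names are placeholders, the helper name build_export_request is hypothetical, and actually sending the request (an authenticated POST) is left out on purpose.

```python
import json


def build_export_request(project_id, bucket, kinds=None, namespace_ids=None):
    """Return (url, body) for a projects.export call; nothing is sent."""
    url = "https://datastore.googleapis.com/v1/projects/{}:export".format(project_id)
    entity_filter = {}
    if kinds:
        entity_filter["kinds"] = list(kinds)
    if namespace_ids:
        entity_filter["namespaceIds"] = list(namespace_ids)
    body = {"outputUrlPrefix": "gs://{}".format(bucket)}
    if entity_filter:
        body["entityFilter"] = entity_filter
    return url, body


# Placeholder project, bucket and namespace names.
url, body = build_export_request("my-project", "my-backups/tenant-a",
                                 kinds=["Kind1"], namespace_ids=["tenant-a"])
print(url)
print(json.dumps(body, sort_keys=True))
```

Calling the helper once per namespace, each with its own output prefix, gives separate export locations for dynamically changing namespaces.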

Related

Total cpu utilization of running app engine flexible instances

I need to make decisions in an external system based on the current CPU utilization of my App Engine Flexible service. I can see the exact values / metrics I need to use in the dashboard charting in my Google Cloud Console, but I don't see a direct, easy way to get this information from something like a gcloud command.
I also need to know the count of running instances, but I think I can use gcloud app instances list -s default to get a list of my running instances in the default service, and then I can use a count of lines approach to get this info easily. I intend to make a python function which returns a tuple like (instance_count, cpu_utilization).
I'd appreciate if anyone can direct me to an easy way to get this. I am currently exploring the StackDriver Monitoring service to get this same information, but as of now it is looking super-complicated to me.
You can use the gcloud app instances list -s default command to get the list of running instances, as you said. To retrieve CPU utilization, have a look at this Python Client for Stackdriver Monitoring. To list available metric types:
from google.cloud import monitoring
client = monitoring.Client()
for descriptor in client.list_metric_descriptors():
    print(descriptor.type)
Metric descriptors are described here. To display utilization across your GCE instances during the last five minutes:
metric = 'compute.googleapis.com/instance/cpu/utilization'
query = client.query(metric, minutes=5)
print(query.as_dataframe())
Do not forget to add google-cloud-monitoring==0.28.1 to “requirements.txt” before installing it.
Check this code that locally runs for me:
import logging
from flask import Flask
from google.cloud import monitoring as mon
app = Flask(__name__)
@app.route('/')
def list_metric_descriptors():
    """Return all metric descriptors"""
    # Instantiate client
    client = mon.Client()
    # Collect every descriptor type so the response lists all of them,
    # not only the last one seen in the loop.
    types = [descriptor.type for descriptor in client.list_metric_descriptors()]
    print('\n'.join(types))
    return '\n'.join(types)
@app.route('/CPU')
def cpuUtilization():
    """Return CPU utilization"""
    client = mon.Client()
    metric = 'compute.googleapis.com/instance/cpu/utilization'
    query = client.query(metric, minutes=5)
    print(type(query.as_dataframe()))
    print(query.as_dataframe())
    data = str(query.as_dataframe())
    return data
@app.errorhandler(500)
def server_error(e):
    logging.exception('An error occurred during a request.')
    return """
    An internal error occurred: <pre>{}</pre>
    See logs for full stacktrace.
    """.format(e), 500
if __name__ == '__main__':
    # This is used when running locally. Gunicorn is used to run the
    # application on Google App Engine. See entrypoint in app.yaml.
    app.run(host='127.0.0.1', port=8080, debug=True)
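The question asks for a function that returns a tuple like (instance_count, cpu_utilization). Here is a minimal sketch of that last step, assuming you already have the text printed by gcloud app instances list -s default and a list of utilization samples taken from the monitoring query. The helper names and the sample gcloud output, including its column layout, are hypothetical.

```python
def count_instances(gcloud_output):
    """Count data rows in the gcloud listing, excluding the header line."""
    lines = [line for line in gcloud_output.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def mean_utilization(samples):
    """Average a sequence of utilization samples; None if there are none."""
    samples = list(samples)
    if not samples:
        return None
    return sum(samples) / float(len(samples))


def summarize(gcloud_output, samples):
    """Return (instance_count, cpu_utilization)."""
    return count_instances(gcloud_output), mean_utilization(samples)


# Hypothetical listing; the real column layout may differ.
sample_output = """SERVICE  VERSION  ID  VM_STATUS
default  v1  aef-default-v1-abcd  RUNNING
default  v1  aef-default-v1-efgh  RUNNING
"""
print(summarize(sample_output, [0.25, 0.75]))
# prints: (2, 0.5)
```

In a real script the listing would come from something like subprocess.check_output, and the samples from the values in the dataframe returned by query.as_dataframe(). Note that the metric used above is the Compute Engine one; check the output of list_metric_descriptors() for the App Engine flexible equivalent.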

Query an existing Google Cloud Datastore from Google app engine

I have an entity kind with ~50k rows in Google Cloud Datastore (the standalone product, not GAE). I am starting development with GAE and would like to query this existing datastore without having to import it into GAE. I have been unable to find a way to connect to an existing datastore kind.
Below is basic code, altered from Hello World and other guides, that I'm trying to get working as a POC.
import webapp2
import json
import time
from google.appengine.ext import ndb
class Product(ndb.Model):
    type = ndb.StringProperty()
    @classmethod
    def query_product(cls):
        return ndb.gql("SELECT * FROM Product where name >= :a LIMIT 5 ")
class MainPage(webapp2.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        query = Product.query_product()
        self.response.write(query)
app = webapp2.WSGIApplication([
    ('/', MainPage),
], debug=True)
The returned error is:
TypeError: Model Product has no property named 'name'
It seems obvious that it's trying to use a GAE datastore with the kind Product instead of my existing Datastore with Product already defined, but I can't find how to make that connection.
There is only one Google Cloud Datastore. App Engine does not have a datastore of its own - it works with the same Google Cloud Datastore.
All entities in the Datastore are stored for a particular project. If you are trying to access data from a different project, you will not be able to see it without going through special authentication.
I'm not too certain what it is you're trying to accomplish when you say that you would like to query this existing datastore without having to import it to GAE. I'm guessing that you have project A with the datastore with 50k rows, and you're starting project B. And you want to access the project A datastore from project B. If this is the case, and if you're trying to access the datastore from a different project, then maybe this previous answer that mentions remote api can help you.
Below is working code. I was pretty close at the time I made the original post, but the reason I was getting no data back was that I was running my app locally. As soon as I actually deployed my code to App Engine, it pulled from Datastore with no problem.
import webapp2
import json
import time
from google.appengine.datastore.datastore_query import Cursor
from google.appengine.ext import ndb
class Product(ndb.Model):
    name = ndb.StringProperty()
class MainPage(webapp2.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        query = ndb.gql("SELECT * FROM Product where name >= 'a' LIMIT 5 ")
        output = query.fetch()
        #query = Product.query(Product.name == 'zubo - pre-owned - nintendo ds')
        #query = Product.query()
        #output = query.fetch(10)
        self.response.write(output)
app = webapp2.WSGIApplication([
    ('/', MainPage),
], debug=True)

How to automate download of weekly export service files

In Salesforce you can schedule up to weekly "backups"/dumps of your data here: Setup > Administration Setup > Data Management > Data Export
If you have a large Salesforce database, there can be a significant number of files to download by hand.
Does anyone have a best practice, tool, batch file, or trick to automate this process or make it a little less manual?
Last time I checked, there was no way to access the backup file status (or actual files) over the API. I suspect they have made this process difficult to automate by design.
I use the Salesforce scheduler to prepare the files on a weekly basis, then I have a scheduled task that runs on a local server which downloads the files. Assuming you have the ability to automate/script some web requests, here are some steps you can use to download the files:
1. Get an active Salesforce session ID/token
   enterprise API - login() SOAP method
2. Get your organization ID ("org ID")
   Setup > Company Profile > Company Information OR
   use the enterprise API getUserInfo() SOAP call to retrieve your org ID
3. Send an HTTP GET request to https://{your sf.com instance}.salesforce.com/ui/setup/export/DataExportPage/d?setupid=DataManagementExport
   Set the request cookie as follows:
   oid={your org ID}; sid={your session ID};
4. Parse the resulting HTML for instances of <a href="/servlet/servlet.OrgExport?fileName=
   (The filename begins after fileName=)
5. Plug the file names into this URL to download (and save):
   https://{your sf.com instance}.salesforce.com/servlet/servlet.OrgExport?fileName={filename}
   Use the same cookie as in step 3 when downloading the files
This is by no means a best practice, but it gets the job done. It should go without saying that if they change the layout of the page in question, this probably won't work any more. Hope this helps.
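The parsing and URL-building steps above can be sketched with only the Python standard library. This is an illustration under the same assumptions as the answer (the page layout and URL patterns it describes); the class and function names are hypothetical, the sample HTML is made up, and no request is actually sent.

```python
from html.parser import HTMLParser

EXPORT_PATH = "/servlet/servlet.OrgExport?fileName="


class ExportLinkParser(HTMLParser):
    """Collect the file names that follow fileName= in export links."""

    def __init__(self):
        super().__init__()
        self.filenames = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith(EXPORT_PATH):
            self.filenames.append(href[len(EXPORT_PATH):])


def extract_export_filenames(html):
    parser = ExportLinkParser()
    parser.feed(html)
    return parser.filenames


def build_cookie_header(org_id, session_id):
    """Cookie value used for both the export page and the downloads."""
    return "oid={}; sid={}".format(org_id, session_id)


def build_download_url(instance, filename):
    return "https://{}.salesforce.com/servlet/servlet.OrgExport?fileName={}".format(
        instance, filename)


# Made-up page fragment for illustration.
page = ('<a href="/servlet/servlet.OrgExport?fileName=WE_1.ZIP">download</a>'
        '<a href="/home">Home</a>')
for name in extract_export_filenames(page):
    print(build_download_url("na1", name))
```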
A script to download the Salesforce backup files is available at https://github.com/carojkov/salesforce-export-downloader/
It's written in Ruby and can be run on any platform. The supplied configuration file provides fields for your username, password and download location.
With a little configuration you can get your downloads going. The script sends email notifications on completion or failure.
The sequence of steps is simple enough to follow if you need to write your own program because the Ruby solution does not work for you.
I'm Naomi, CMO and co-founder of cloudHQ, so I feel like this is a question I should probably answer. :-)
cloudHQ is a SaaS service that syncs your cloud. In your case, you'd never need to upload your reports as a data export from Salesforce, but you'll just always have them backed up in a folder labeled "Salesforce Reports" in whichever service you synchronized Salesforce with like: Dropbox, Google Drive, Box, Egnyte, Sharepoint, etc.
The service is not free, but there's a free 15 day trial. To date, there's no other service that actually syncs your Salesforce reports with other cloud storage companies in real-time.
Here's where you can try it out: https://cloudhq.net/salesforce
I hope this helps you!
Cheers,
Naomi
Be careful that you know what you're getting in the backup file. The backup is a zip of 65 different CSV files. It's raw data that cannot be used very easily outside of the Salesforce UI.
Our company makes the free DataExportConsole command line tool to fully automate the process. You do the following:
Automate the weekly Data Export with the Salesforce scheduler
Use the Windows Task Scheduler to run the FuseIT.SFDC.DataExportConsole.exe file with the right parameters.
I recently wrote a small PHP utility that uses the Bulk API to download a copy of sObjects you define via a json config file.
It's pretty basic but can easily be expanded to suit your needs.
Force.com Replicator on github.
Adding a Python 3.6 solution. It should work (I haven't tested it, though). Make sure the packages (requests, BeautifulSoup and simple_salesforce) are installed.
import os
import requests
from datetime import datetime
from bs4 import BeautifulSoup as BS
from simple_salesforce import Salesforce
def login_to_salesforce():
    sf = Salesforce(
        username=os.environ.get('SALESFORCE_USERNAME'),
        password=os.environ.get('SALESFORCE_PASSWORD'),
        security_token=os.environ.get('SALESFORCE_SECURITY_TOKEN')
    )
    return sf
org_id = "SALESFORCE_ORG_ID"  # can be found in Salesforce -> Company Profile
export_page_url = "https://XXXX.my.salesforce.com/ui/setup/export/DataExportPage/d?setupid=DataManagementExport"
sf = login_to_salesforce()
cookie = {'oid': org_id, 'sid': sf.session_id}
export_page = requests.get(export_page_url, cookies=cookie)
export_page = export_page.content.decode()
links = []
parsed_page = BS(export_page, "html.parser")  # name the parser explicitly to avoid a bs4 warning
_path_to_exports = "/servlet/servlet.OrgExport?fileName="
for link in parsed_page.findAll('a'):
    href = link.get('href')
    if href is not None:
        if href.startswith(_path_to_exports):
            links.append(href)
print(links)
if len(links) == 0:
    print("No export files found")
    exit(0)
today = datetime.today().strftime("%Y_%m_%d")
download_location = os.path.join(".", "tmp", today)
os.makedirs(download_location, exist_ok=True)
baseurl = "https://XXXX.my.salesforce.com"  # same instance as export_page_url
for link in links:
    filename = baseurl + link
    downloadfile = requests.get(filename, cookies=cookie, stream=True)  # stream=True keeps RAM consumption low for large files
    with open(os.path.join(download_location, downloadfile.headers['Content-Disposition'].split("filename=")[1]), 'wb') as f:
        for chunk in downloadfile.iter_content(chunk_size=100*1024*1024):  # 100 MiB per chunk
            if chunk:
                f.write(chunk)
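The script above takes the file name from the Content-Disposition header with split("filename=")[1], which keeps any surrounding quotes and fails if the header has no filename. Here is a small sketch of a more forgiving helper that uses the standard library's header parsing; the function name and the default file name are hypothetical.

```python
import os
from email.message import Message


def filename_from_content_disposition(value, default="export.zip"):
    """Extract the filename parameter, stripping quotes and directory parts."""
    msg = Message()
    msg["Content-Disposition"] = value
    name = msg.get_filename()
    if not name:
        return default
    # Keep only the last path component so the file stays in the download folder.
    return os.path.basename(name) or default


print(filename_from_content_disposition('attachment; filename="WE_00D1_1.ZIP"'))
print(filename_from_content_disposition('attachment'))
```

In the download loop, the result would replace the split(...) expression inside os.path.join(download_location, ...).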
I have added a feature to my app that automatically backs up the weekly/monthly CSV files to an S3 bucket: https://app.salesforce-compare.com/
Create a connection provider (currently only AWS S3 is supported) and link it to an SF connection (which needs to be created as well).
On the main page you can monitor the progress of the scheduled job and access the files in the bucket.
More info: https://salesforce-compare.com/release-notes/

Accessing sqlite datastore from command line

I've been accessing the traditional datastore from the command line as follows:
import os
from google.appengine.api import apiproxy_stub_map
from google.appengine.api.datastore_file_stub import DatastoreFileStub
os.environ['APPLICATION_ID'] = "myapp"
apiproxy_stub_map.apiproxy = apiproxy_stub_map.APIProxyStubMap()
# Datastore is not defined in this snippet; it presumably holds the path to the datastore file.
stubname, stub = 'datastore_v3', DatastoreFileStub(os.environ["APPLICATION_ID"], Datastore, "/")
apiproxy_stub_map.apiproxy.RegisterStub(stubname, stub)
I've upgraded to the sqlite datastore and need to update the stub (and maybe stubname), presumably with DatastoreSqliteStub, but I can't seem to initialise it. Any suggestions?
Thanks!
Here is a little module I often reuse in my AppEngine projects: ae.py
It lets me just do:
import ae
ae.connect_local_datastore()
at the top of scripts. Or, with remote_api set up, you can also do:
ae.connect_remote_datastore()
A simple console.py script that makes use of this can be found here
Hope they help.

What's a namespace used for in the App Engine datastore?

In the development admin console, when I look at my data, it says "Select different namespace".
What are namespaces for and how should I use them?
Namespaces allow you to implement segregation of data for multi-tenant applications. The official documentation links to some sample projects to give you an idea how it might be used.
Namespaces are used in Google App Engine to create multitenant applications. In a multitenant application, a single instance of the application runs on a server and serves multiple client organizations (tenants). With this, an application can be designed to virtually partition its data and configuration (business logic), and each client organization works with a customized virtual application instance. You can easily partition data across tenants simply by specifying a unique namespace string for each tenant.
Other Uses of namespace:
Compartmentalizing user information
Separating admin data from application data
Creating separate datastore instances for testing and production
Running multiple apps on a single app engine instance
For more information, visit the links below:
http://www.javacodegeeks.com/2011/12/multitenancy-in-google-appengine-gae.html
https://developers.google.com/appengine/docs/java/multitenancy/
http://java.dzone.com/articles/multitenancy-google-appengine
http://www.sitepoint.com/multitenancy-and-google-app-engine-gae-java/
This question doesn't seem to have been answered very thoroughly, so here is my attempt.
When using namespaces, a good practice is to keep the keys and values for each purpose separate within their own namespace. The following example shows how to set the namespace temporarily and then restore it.
from google.appengine.api import namespace_manager
from google.appengine.ext import db
from google.appengine.ext import webapp
class Counter(db.Model):
    """Model for containing a count."""
    count = db.IntegerProperty()
def update_counter(name):
    """Increment the named counter by 1."""
    def _update_counter(name):
        counter = Counter.get_by_key_name(name)
        if counter is None:
            counter = Counter(key_name=name)
            counter.count = 1
        else:
            counter.count = counter.count + 1
        counter.put()
    # Update counter in a transaction.
    db.run_in_transaction(_update_counter, name)
class SomeRequest(webapp.RequestHandler):
    """Perform synchronous requests to update counter."""
    def get(self):
        update_counter('SomeRequest')
        # try/finally pattern to temporarily set the namespace.
        # Save the current namespace.
        namespace = namespace_manager.get_namespace()
        try:
            namespace_manager.set_namespace('-global-')
            update_counter('SomeRequest')
        finally:
            # Restore the saved namespace.
            namespace_manager.set_namespace(namespace)
        self.response.out.write('<html><body><p>Updated counters')
        self.response.out.write('</p></body></html>')
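The save/set/restore pattern in the handler can also be wrapped in a context manager so it is harder to forget the restore step. This is only a sketch: temporary_namespace is a hypothetical helper, not part of the App Engine SDK, and FakeNamespaceManager is a stand-in exposing the same get_namespace/set_namespace calls so the example runs outside App Engine.

```python
from contextlib import contextmanager


@contextmanager
def temporary_namespace(manager, name):
    """Switch to `name` for the duration of the with-block, then restore."""
    previous = manager.get_namespace()
    manager.set_namespace(name)
    try:
        yield
    finally:
        # Restore the saved namespace even if the block raises.
        manager.set_namespace(previous)


class FakeNamespaceManager(object):
    """Stand-in for namespace_manager, for local illustration only."""

    def __init__(self):
        self._namespace = ''

    def get_namespace(self):
        return self._namespace

    def set_namespace(self, name):
        self._namespace = name


manager = FakeNamespaceManager()
with temporary_namespace(manager, '-global-'):
    inside = manager.get_namespace()
print(inside)                           # prints: -global-
print(repr(manager.get_namespace()))    # prints: ''
```

On App Engine, the namespace_manager module itself would be passed as the manager argument, since it provides both functions used here.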
