GAE Datastore backup - google-app-engine

Is it necessary to do backups of GAE's Datastore?
Does anyone have any experience, suggestions, tricks for doing so?

Backups are always necessary to protect against human error. Since App Engine encourages you to build multiple revisions of your code that run against the same dataset, it's important to be able to go back.
A simple dump/restore tool is explained in the Bulkloader documentation.
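For reference, a full dump and restore with the bulkloader look roughly like this, assuming the remote_api endpoint is enabled for your app (the app URL and filename are placeholders; check the Bulkloader documentation for the options your SDK version supports):

appcfg.py download_data --url=http://your-app.appspot.com/_ah/remote_api --filename=backup.data
appcfg.py upload_data --url=http://your-app.appspot.com/_ah/remote_api --filename=backup.data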
Something else I've done in the past for major DB refactors is:
Change the entity name in your new code (e.g. User -> Customer or User2 if you have to)
When looking up an entity by key:
Try the key and return if possible
Try the key for the old db.Model class. If you find it, migrate the data, put(), and return the new entity (see the sketch below)
Use the entity as usual
(You may have to use a task queue to migrate all the data. If you always fetch the entities by key it's not necessary.)
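A minimal sketch of that lookup with the old google.appengine.ext.db API; the kind names, the property, and the get_customer helper are all hypothetical:

from google.appengine.ext import db

class User(db.Model):      # old kind
    name = db.StringProperty()

class Customer(db.Model):  # new kind
    name = db.StringProperty()

def get_customer(key_name):
    # Try the new kind first and return it if it already exists.
    customer = Customer.get_by_key_name(key_name)
    if customer is not None:
        return customer
    # Fall back to the old kind; migrate on the fly if found.
    old = User.get_by_key_name(key_name)
    if old is None:
        return None
    customer = Customer(key_name=key_name, name=old.name)
    customer.put()  # persist the migrated entity under the new kind
    return customer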
Deploy a new version of your code so that both coexist server-side. When you activate the new version, it is like a point-in-time snapshot of the old entities. In an emergency, you could reactivate the old version and use the old data.

You can now use the managed export and import feature, which can be accessed through gcloud or the Datastore Admin API (see the example commands below):
Exporting and Importing Entities
Scheduling an Export
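From the command line, the managed export/import looks roughly like this (the bucket name and export prefix are placeholders; the import path points at the .overall_export_metadata file an export produces):

gcloud datastore export gs://your-backup-bucket
gcloud datastore import gs://your-backup-bucket/[EXPORT_PREFIX]/[EXPORT_PREFIX].overall_export_metadata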

Related

wso2am deployment overrides database, API's are lost

I am using WSO2 API Manager 02.01.00 on a Linux system. The API Manager is deployed in folder A. The databases (H2) are deployed in folder B, which is not inside folder A. The datasources in /repository/conf/datasources/master-datasources.xml point correctly to the databases in folder B. I configured it like that because I want to preserve the databases across deployments. (A few developers are using the API Manager and they don't want to lose their data.) But it seems that WSO2AM_DB.h2.db is created anew on every API Manager deployment. I think this because I had a look at the DB size: I started with a size of 1750 KB for WSO2AM_DB.h2.db, published a few APIs in the Manager, and the size increased to 2774 KB. Then I did a deployment and the size returned to 1750 KB.
The effect is that the API Store/Publisher says "There are no APIs published yet".
But I can see the APIs under Application Subscriptions and in the Carbon resources at /_system/governance/apimgt/applicationdata/provider/admin.
I tried to force a new indexing with this, but it didn't change anything.
Can I configure somewhere that the database should not be created/manipulated at startup?
Meanwhile I'm really desperate about not solving this problem.
Maybe you could help me.
Thank you for your time.
WSO2 does not recommend running on the H2 database. You need to use a production database such as MySQL, Oracle, etc. H2 is only for tryouts.
Basically, WSO2 servers store data in databases as well as on the file system. For this kind of deployment, you need to do the following.
Point to an external database. If you are using this for demo purposes, you can still go with the current mode (H2 database).
Use dep-sync. The content under the WSO2_HOME/repository/deployment/server location needs to be preserved. You can use SVN-based dep-sync or rsync. The basic idea is that a new deployment needs the data of the previous deployment (see the rsync sketch after this list).
Preserve the Solr index. If you have hundreds or thousands of APIs in the system, indexing takes time. To avoid that, you can copy the content of WSO2_HOME/solr to the new deployment.
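A minimal sketch of carrying both locations over with rsync, where OLD_HOME and NEW_HOME are placeholders for the previous and new installation paths:

rsync -av "$OLD_HOME/repository/deployment/server/" "$NEW_HOME/repository/deployment/server/"
rsync -av "$OLD_HOME/solr/" "$NEW_HOME/solr/"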

How do I proxy a production database with a testing database?

I'm currently working on a Grails project which has a static production database with a lot of data in it. I would like to test my application using the production data, but instead of having to clone the production database I'd like to set up a proxy database in front of the production database.
Essentially, reads would go all the way through to the production database, while writes would stop at the proxy database (preferably an H2 database). If a row that came from the production database was updated, the updated row would be saved to the proxy database and returned on subsequent queries instead of production's row.
I'd like to do all of this as transparently to the application as possible. My current line of thinking is that I'd need to fork the Hibernate GORM implementation and make it support this use case. Has this been done before? Is there a better way?
Forking the Hibernate GORM implementation may not be a good idea. You will be stuck on your fork and will somehow have to keep it up to date with the original plugin (e.g. bug fixes, new implementations).
Maybe a custom TestMixin that lets you override all registered domain classes with new implementations of save(), get(), find(), etc. could be an option. You can work with the metaClass to override these static methods, and this will be triggered only in tests with the annotated mixin.
With this approach you can use multiple datasources in the test environment and decide which one will be used.

Wordpress distributed development and database management

I am looking for a way to handle distributed development for WordPress. For the moment I have set up a shared git repository in which all the code of the website is versioned. The problem I'm having is how to handle the database. Clearly I need our site running while we (me and the other developers) improve the website locally. This means that users of the website (which is not up yet) will be able to modify our database (user registration, etc.) while we are working on the development of the site locally, using a dump of the database.
What I am trying to understand is the best practice for handling shared development like this while the site is running and the database can therefore change.
Not sure what you are developing, theme or plugins, but with WordPress changes in the database should not affect your development, unless you set something up where the user can create new custom post types (I mean a new custom post type, not a new post based on one), which could potentially change the behavior of what you are developing.
If the user runs into something odd because of what they did, well, that's called bug fixing. The good news is that you can just export and import the database to fix whatever they run into.
Changes to the database data aren't your problem (a dump exchange, if needed, solves most of it; see the sketch below).
Changes to the database structure are another big question; for a brain-powered solution you can take a look at LiquiBase.
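A minimal sketch of that dump exchange with the stock MySQL tools (the database name and user are placeholders):

mysqldump -u wp_user -p wordpress_db > site-dump.sql   # export on the live site
mysql -u wp_user -p wordpress_db < site-dump.sql       # import into a local copy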

Prevent syncdb from updating database in Django?

I'd like to treat the database as read-only and never write to it. Is there a way to easily prevent syncdb from even trying to update the database?
With Django 1.2 and the ability to have multiple databases, I'd like to be able to query a database for information. I'd never need to actually write to that database.
However, I'd be scared if syncdb ran and attempted to update that database (because I may not have a technically read-only account for that database). Mainly, I'd just like to use/abuse the Django ORM as a way to query that database.
UPDATE: Sorry, I need to be able to sync one of the databases in settings.py, just not this specific one.
Heh, I guess I'll answer my own question (RTFM!)...
http://docs.djangoproject.com/en/dev/topics/db/multi-db/#an-example
def allow_syncdb(self, db, model):
...
That's a definite start...
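For completeness, a minimal sketch of a router built around that hook; the 'legacy' alias and the module path in DATABASE_ROUTERS are assumptions:

class ReadOnlyRouter(object):
    def db_for_read(self, model, **hints):
        return None  # no opinion; default routing applies

    def db_for_write(self, model, **hints):
        return None

    def allow_syncdb(self, db, model):
        if db == 'legacy':  # never let syncdb touch the read-only alias
            return False
        return None  # defer to other routers for everything else

# settings.py
DATABASE_ROUTERS = ['myproject.routers.ReadOnlyRouter']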
If you don't need syncdb, don't run it, simple as that. Updating the database is what it does, so if you don't need that, you shouldn't run it - it doesn't do anything else.
However, if you're actually asking how to prevent syncdb from running at all, one possibility would be to define a 'dummy' syncdb command inside one of your apps. Follow the custom management command instructions but just put pass inside the command's handle method. Django will always find your version of the command first, making it a no-op (see the sketch below).
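A minimal sketch of that no-op command; yourapp/management/commands/syncdb.py is the conventional path, and the app must be listed in INSTALLED_APPS:

# yourapp/management/commands/syncdb.py
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Disabled: syncdb is a no-op in this project."

    def handle(self, *args, **options):
        pass  # deliberately do nothing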
This issue came up for me when working with read-only mirrors of Microsoft SQL Server databases (ugh), since you can't selectively run syncdb on a single app or database, but you do have to run syncdb when you first create a new Django project or install a new app that requires it (like South). What I did was put my read-only database in its own Django app and then add an empty South migration to that app. That way syncdb thinks South is handling db setup for those apps, and South doesn't do anything to them!
manage.py schemamigration app_with_read_only_database --empty initial_empty_migration_that_does_nothing
That leaves you free to manage the schema of that db outside of Django.

Wiping the datastore?

I'm working on an App Engine project (Java). I'm using the JDO interface. I haven't pushed the application yet (just running at localhost). Is there a way I can totally wipe my datastore after I publish? In Eclipse, when working locally, I can just wipe the datastore by deleting the local file:
appengine-generated/local_db.bin
Is there any facility like that once published?
I'm using JDO right now, but I might switch to Objectify or Slim3, and would want a convenient way to wipe my datastore should I switch over or otherwise make heavy modifications to my classes.
Otherwise it seems like I have to set up methods to delete instances myself, right?
Thanks
You can delete entities from the admin console if there aren't many entities stored in your app. Go to http://appengine.google.com and do it manually. That is easy for fewer than 2000-5000 entities.
This question addressed the same topic. There is no one-command way to drop an entire datastore's worth of data. The only suggestion I have beyond those given in that previous question would be to try out the new Mapper functionality, which makes it easy to map over an entire set of entities, deleting them as you go (see the sketch below).
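This isn't the Mapper API itself, but a plain batch delete by key sketches the same idea in the Python runtime (the question is Java/JDO, so treat this as illustration only; the kind name is a placeholder):

from google.appengine.ext import db

def delete_all_of_kind(kind_name, batch_size=500):
    # Keys-only query: cheap to fetch, and enough for deleting.
    q = db.GqlQuery("SELECT __key__ FROM %s" % kind_name)
    while True:
        keys = q.fetch(batch_size)
        if not keys:
            break
        db.delete(keys)  # batch delete; loop until nothing is left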
