A user of our application has accidentally deleted data. They'd like this to be restored. We have no special logic or datastore entities in place that can do this.
However, we do daily backups of our entire datastore to blobstore using the datastore admin.
What are our options for selectively restoring part of this backup back into the datastore?
We'd prefer not to have a service interruption for other users. One final restriction: we cannot change our production app id (i.e. copy the data over to a new app and then restore the backup to our old app), because our clients reference our app id directly.
Thoughts?
UPDATE
I was thinking of running a mapreduce over all the blobs in our app, finding the ones that belong to our backup, parsing those backups, and restoring the entities as needed. The only issue is: what format are the blobs stored in, and how can I parse them?
Since 1.6.5, the Datastore Admin allows you to restore individual Kinds from an existing backup.
About the backup format: according to the Datastore Admin source code, you can use RecordsReader in a MapperPipeline to read the backup files, which are stored in leveldb log format.
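Outside of a pipeline, roughly the same parsing looks like this in plain Python. This is only a minimal sketch, assuming the backup output file has already been copied from Blobstore/Cloud Storage to local disk and the SDK is on the path; the file name is illustrative:

from google.appengine.api import datastore
from google.appengine.api.files import records
from google.appengine.datastore import entity_pb

# Each backup output file is a leveldb log; every record is a serialized EntityProto.
with open('datastore_backup_output-0', 'rb') as backup_file:  # illustrative file name
    for record in records.RecordsReader(backup_file):
        proto = entity_pb.EntityProto(contents=record)
        entity = datastore.Entity.FromPb(proto)
        # Inspect entity.kind() / entity.key() here, and datastore.Put() only
        # the entities you want to restore.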
The restore functionality in its current form is not very useful for my application. There should be an option to restore only a few entities or namespaces into the current app id or another app id.
Please star this issue: http://code.google.com/p/googleappengine/issues/detail?id=7311
Maybe a custom backup reader will help you:
final BlobstoreService blobstoreService = BlobstoreServiceFactory.getBlobstoreService();
final BlobKey blobKey = blobstoreService.createGsBlobKey("/gs/" + bucket + "/" + pathToOutputFile);
final RecordReadChannel rrc = BlobserviceHelper.openRecordReadChannel(blobKey, blobstoreService);
ByteBuffer bf;
while ((bf = rrc.readRecord()) != null) {
    final OnestoreEntity.EntityProto proto = new OnestoreEntity.EntityProto();
    proto.mergeFrom(bf.array());
    final Entity entity = EntityTranslator.createFromPb(proto);
    entity.removeProperty(""); // Remove the empty property
    // Now you can save the entity to the datastore or read its keys and properties
}
This used to be possible by downloading with the bulkloader and uploading to the local dev server. However, the bulkloader download has been non-functional for several months now, due to not supporting OAuth2.
A few places recommend downloading from a Cloud Storage backup and uploading to the local datastore, either through the bulkloader or by directly parsing the backup. However, neither of these appears functional anymore. The bulkloader method throws:
OperationalError: unable to open database file
And the RecordsReader class, which is used to read the backup files, reaches end of file when trying to read the first record, resulting in no records being read.
Does there exist a current, functional method for copying the live datastore to the local dev datastore?
RecordsReader works perfectly on Unix. I tried this gist https://gist.github.com/jehna/3b258f5287fcc181aacf a day ago and it worked great.
You should add your Kind implementations to the imports and run it in the datastore interactive shell.
For example:
from myproject.kinds_implementations import MyKind
I removed this block from the gist:
for pp in dir(a):
    try:
        ppp = getattr(a, "_" + pp)
        if isinstance(ppp, db.Key):
            ppp._Key__reference.set_app(appname)
            ppp
    except AttributeError:
        """ It's okay """
And it worked well. In my case the backup was downloaded into multiple directories, so I modified the directory traversal to something like this:
for directory in listdir(mypath):
    full_directory_path = join(mypath, directory)
    for sub_dir in listdir(full_directory_path):
        full_sub_dir_path = join(full_directory_path, sub_dir)
        onlyfiles = [f for f in listdir(full_sub_dir_path) if isfile(join(full_sub_dir_path, f))]
        for file in onlyfiles:
            # parse each backup file here, as in the gist
            pass
If you're working on Windows, you're welcome to follow my question about RecordsReader on Windows (Google datastore backup to local dev_appserver); hopefully someone will answer there.
Edit: it works great on Windows if you change the file open mode from 'r' to 'rb'.
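For example, a sketch of opening one backup file with that mode fix (the path below is illustrative, not from the gist):

from google.appengine.api.files import records

# 'rb' returns raw bytes, which the leveldb log reader expects;
# plain 'r' mangles the records on Windows.
raw_backup = open('backup_dir/output-0', 'rb')  # illustrative path
reader = records.RecordsReader(raw_backup)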
The bulkloader is still functional on Python with OAuth2, albeit with some caveats. When downloading from the live app, there is an issue with refreshing the OAuth2 token, so the total download time is limited to 3600 seconds, or 3600+3600 if you manually use a refresh token with --oauth2_refresh_token.
When uploading to the development server app, OAuth2 will fail with a 401, so it's necessary to edit google.appengine.ext.remote_api.handler and stub out 'CheckIsAdmin' to always return True as a workaround:
def CheckIsAdmin(self):
    return True
    user_is_authorized = False
    ...
However, I upvoted the above answer, as it looks like a more robust solution at this point.
On GAE, I am using the JPA entity manager (javax.persistence.EntityManagerFactory) to create an instance of the entity manager:
private static final EntityManagerFactory emfInstance = Persistence.createEntityManagerFactory("transactions-optional");
I retrieve from the datastore using the following code:
event = mgr.find(Event.class, id);
The problem I have is that the first time I retrieve the data, everything goes fine. However, if I edit the values manually through the "Datastore Viewer" in the GAE dashboard, the next time I fetch the data using the "find" method, the values returned are the old values. I have to manually upload the backend again in order to get the new values.
Any idea what is causing this? I would like mgr.find to always call the latest value. Thanks.
The entity is being cached. When you change it through the Datastore Viewer, the entity cached by your backend is not affected.
After you make a change in the Datastore viewer, click on the "Flush Memcache" button.
If this does not help, you may need to change your caching configuration:
Level2 Caching is enabled by default. To get the previous default behavior, set the persistence property datanucleus.cache.level2.type to none. (Alternatively include the datanucleus-cache plugin in the classpath, and set the persistence property datanucleus.cache.level2.type to javax.cache to use Memcache for L2 caching.)
Try flushing memcache and then run your query again. Most of the time, the last persisted entity data is what's retrieved until you do this.
I've got a database full of BlobKeys that were previously uploaded through the standard Google App Engine create_upload_url() process, and each of the uploads went to the same Google Cloud Storage bucket by setting the gs_bucket_name argument.
What I'd like to do is be able to decode the existing blobkeys so I can get their Google Cloud Storage filenames. I understand that I could have been using the gs_object_name property from the FileInfo class, except:
You must save the gs_object_name yourself in your upload handler or this data will be lost. (The other metadata for the object in GCS is stored in GCS automatically, so you don't need to save that in your upload handler.)
Meaning the gs_object_name property is only available in the upload handler, and if I haven't been saving it at that time then it's lost.
Also, create_gs_key() doesn't do the trick, because it works in the other direction: it takes a Google Storage filename and creates a blob key.
So, how can I take a blobkey that was previously uploaded to a Google Cloud Storage bucket through App Engine, and get its Google Cloud Storage filename? (python)
You can get the Cloud Storage filename only in the upload handler (FileInfo.gs_object_name), and you have to store it in your database there. After that it is lost; it does not seem to be preserved in BlobInfo or any other metadata structure.
Google says: Unlike BlobInfo metadata, FileInfo metadata is not persisted to datastore. (There is no blob key either, but you can create one later if needed by calling create_gs_key.) You must save the gs_object_name yourself in your upload handler or this data will be lost.
https://developers.google.com/appengine/docs/python/blobstore/fileinfoclass
Update: I was able to decode an SDK BlobKey in the Blobstore Viewer: "encoded_gs_file:base64-encoded-filename-here". However, the real (production) keys are not base64 encoded.
create_gs_key(filename, rpc=None) ... Google says: "Returns an encrypted blob key as a string." Does anyone have a guess why this is encrypted?
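For new uploads, the fix is to capture gs_object_name at upload time. A minimal sketch of that (the handler and model names here are illustrative, not from the question):

from google.appengine.ext import ndb
from google.appengine.ext.webapp import blobstore_handlers
import webapp2

class GcsUploadRecord(ndb.Model):
    # Illustrative model mapping the blob key string to its GCS object name.
    blob_key = ndb.StringProperty()
    gs_object_name = ndb.StringProperty()

class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        blob_info = self.get_uploads()[0]
        file_info = self.get_file_infos()[0]
        # gs_object_name looks like '/gs/<bucket>/<object>' and is only
        # available here, so persist it alongside the blob key now.
        GcsUploadRecord(blob_key=str(blob_info.key()),
                        gs_object_name=file_info.gs_object_name).put()

app = webapp2.WSGIApplication([('/upload', UploadHandler)])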
From the statement in the docs, it looks like the generated GCS filenames are lost. You'll have to use gsutil to manually browse your bucket.
https://developers.google.com/storage/docs/gsutil/commands/ls
If you have BlobKeys, you can use ImagesServiceFactory.makeImageFromBlob.
I'm trying to troubleshoot some issues I'm having with an export task I have created. I'm attempting to export CSV data using Google Cloud Storage, and I seem to be unable to export all my data. I'm assuming it has something to do with the (FAR TOO LOW) 30-second file limit when I attempt to restart the task.
I need to troubleshoot, but I can't seem to find where my local/development server is writing the files out. I see numerous entries in the GsFileInfo table, so I assume something is going on, but I can't seem to find the actual output file.
Can someone point me to the location of the Google Cloud Storage files in the local AppEngine development environment?
Thanks!
Looking at the dev_appserver code, it looks like you can specify a path, or it will calculate a default based on the OS you are using.
blobstore_path = options.blobstore_path or os.path.join(storage_path, 'blobs')
Then it passes this path to blobstore_stub (GCS storage is backed by the blobstore stub), which seems to shard files by their blobstore key.
def _FileForBlob(self, blob_key):
    """Calculate full filename to store blob contents in.

    This method does not check to see if the file actually exists.

    Args:
      blob_key: Blob key of blob to calculate file for.

    Returns:
      Complete path for file used for storing blob.
    """
    blob_key = self._BlobKey(blob_key)
    return os.path.join(self._DirectoryForBlob(blob_key), str(blob_key)[1:])
For example, I'm using Ubuntu and started with dev_appserver.py --storage_path=~/tmp; I was then able to find the files under ~/tmp/blobs and the datastore under ~/tmp/datastore.db. Alternatively, you can go to the local admin console; the blobstore viewer link will also display GCS files.
As tkaitchuck mentions above, you can use the included LocalRawGcsService to pull the data out of the local.db. This is the only way to get the files, as they are stored in the local DB using the blobstore. Here's the original answer:
which are the files uri on GAE java emulating cloud storage with GCS client library?
I want to upload data from Google Cloud Storage to BigQuery, but I can't find any Java sample code describing how to do this. Would someone please give me a hint as to how to do this?
What I actually want to do is transfer data from Google App Engine tables to BigQuery (and sync on a daily basis), so that I can do some analysis. I use the Google Cloud Storage service in Google App Engine to write (new) records to files in Google Cloud Storage, and the only missing part is appending the data to tables in BigQuery (or creating a new table for the first write). Admittedly I can manually upload/append the data using the BigQuery browser tool, but I would like it to be automatic; otherwise I'd need to do it manually every day.
I don't know of any Java samples for loading tables from Google Cloud Storage into BigQuery. That said, if you follow the instructions for running query jobs here, you can run a Load job instead with the following:
Job job = new Job();
JobConfiguration config = new JobConfiguration();
JobConfigurationLoad loadConfig = new JobConfigurationLoad();
config.setLoad(loadConfig);
job.setConfiguration(config);
// Set where you are importing from (i.e. the Google Cloud Storage paths).
List<String> sources = new ArrayList<String>();
sources.add("gs://bucket/csv_to_load.csv");
loadConfig.setSourceUris(sources);
// Describe the resulting table you are importing to:
TableReference tableRef = new TableReference();
tableRef.setDatasetId("myDataset");
tableRef.setTableId("myTable");
tableRef.setProjectId(projectId);
loadConfig.setDestinationTable(tableRef);
List<TableFieldSchema> fields = new ArrayList<TableFieldSchema>();
TableFieldSchema fieldFoo = new TableFieldSchema();
fieldFoo.setName("foo");
fieldFoo.setType("string");
TableFieldSchema fieldBar = new TableFieldSchema();
fieldBar.setName("bar");
fieldBar.setType("integer");
fields.add(fieldFoo);
fields.add(fieldBar);
TableSchema schema = new TableSchema();
schema.setFields(fields);
loadConfig.setSchema(schema);
// Also set custom delimiter or header rows to skip here....
// [not shown].
Insert insert = bigquery.jobs().insert(projectId, job);
insert.setProjectId(projectId);
JobReference jobRef = insert.execute().getJobReference();
// ... see rest of codelab for waiting for job to complete.
For more information on the load configuration object, see the javadoc here.