How to generate datastore-indexes.xml in a Google App Engine application (Java) - google-app-engine

We have a Java application deployed on Google App Engine. We created around 150 indexes in the Datastore, and they are running fine in production.
However, we somehow lost the index information in datastore-indexes-auto.xml, and there is no file named datastore-indexes.xml.
Now we want a datastore-indexes.xml / datastore-indexes-auto.xml containing all the existing indexes that are serving in production.
How can we do this? I checked the appcfg/gcloud commands, and there is no command to import/download the indexes file from an App Engine application.
Thanks

You could download your deployed app code (How do I download a specific service's source code off of AppEngine?) or check it in StackDriver (similar to Google Cloud DataStore automatic indexing, but looking for the Java-specific file(s) instead of index.yaml) and copy/paste the index configs from there.
Place those configs into your app's version-controlled datastore-indexes.xml file (create it if needed). These will be the manually maintained indexes. The development server will continue to maintain the missing ones automatically in datastore-indexes-auto.xml. The datastore will combine the info from the two files at deployment time.
Note that Datastore indexes are cumulative: they are not automatically deleted if fewer of them are present in newer versions of the XML files; they have to be manually vacuumed/deleted. So check that the index configs recovered with the above method(s) are indeed all of those displayed in the Datastore Indexes page. Any missing ones would have to be reconstructed manually from the info on that page.
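If many indexes have to be reconstructed by hand from the Datastore Indexes page, a small script can write the XML for you. The following is only a sketch in Python (chosen for brevity; the app itself stays Java). The INDEXES list, the kind and property names, and the function name are hypothetical placeholders; the element and attribute names follow the datastore-indexes.xml format.

```python
import xml.etree.ElementTree as ET

# Hypothetical index definitions, transcribed by hand from the
# Datastore Indexes page. Replace with your own kinds and properties.
INDEXES = [
    {
        "kind": "Employee",
        "ancestor": False,
        "properties": [("lastName", "asc"), ("hireDate", "desc")],
    },
]


def build_datastore_indexes_xml(indexes, auto_generate=True):
    """Return the text of a datastore-indexes.xml file for the given indexes."""
    root = ET.Element(
        "datastore-indexes",
        autoGenerate="true" if auto_generate else "false",
    )
    for index in indexes:
        node = ET.SubElement(
            root,
            "datastore-index",
            kind=index["kind"],
            ancestor="true" if index["ancestor"] else "false",
        )
        # Property order matters for composite indexes, so keep the list order.
        for name, direction in index["properties"]:
            ET.SubElement(node, "property", name=name, direction=direction)
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="utf-8"?>\n' + body


if __name__ == "__main__":
    print(build_datastore_indexes_xml(INDEXES))
```

Redirect the output into WEB-INF/datastore-indexes.xml and compare the result against the Datastore Indexes page before deploying.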

As of gcloud 211.0.0 you can list your composite indexes with gcloud beta datastore indexes list

Related

How do I dynamically generate a sitemap with Google App Engine

My website changes every day - I run a news website with new stories every day. I want Google to index my site as often as possible and want/need to autogenerate the sitemap.
I use Google App Engine (with Node.js) to run my site. With GAE - I do not have write-access to the root directory. To post the site map - I need to re-deploy my whole site after generating the map. That is an unnecessarily complex step.
I have searched far and wide and cannot see how to save my sitemap. So - I considered using a static one with a dynamically generated child that I store in another location where I have write access. Google says it wants all linked sitemaps in the same directory. So that appears to be a dead-end.
Can I use "App Deploy" in such a way that only the sitemap is uploaded? Any other possibilities? Appreciate any and all suggestions. It seems unlikely that Google didn't provide some way to solve this.
For a site where new URLs are being created regularly (like a news, blog site, etc), don't 'store' your sitemap. It should be generated on demand i.e. your App should include code to generate the content when the link <your_website>/sitemap.xml is loaded.
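To illustrate the on-demand approach, here is a minimal sketch in Python using only the standard library; the same idea applies to a Node.js route handler. The load_story_entries function and the example URL are hypothetical stand-ins for a query against wherever your stories are stored.

```python
from datetime import date
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from an iterable of (url, last_modified_date) pairs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="%s">' % SITEMAP_NS,
    ]
    for url, last_modified in entries:
        lines.append("  <url>")
        # escape() turns &, < and > into XML entities.
        lines.append("    <loc>%s</loc>" % escape(url))
        lines.append("    <lastmod>%s</lastmod>" % last_modified.isoformat())
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)


def load_story_entries():
    # Hypothetical: in a real app this would query your database.
    return [("https://example.com/stories/1", date(2020, 1, 2))]


def app(environ, start_response):
    """Minimal WSGI app that generates /sitemap.xml on every request."""
    if environ.get("PATH_INFO") == "/sitemap.xml":
        body = build_sitemap(load_story_entries()).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "application/xml"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]
```

Because the sitemap is built from current data on each request, nothing has to be written to disk or redeployed when new stories are published.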
Separately, you should note that gcloud app deploy doesn't always deploy all your files. It usually deploys only files that have changed. You can easily confirm this by running the deploy command, changing a single file and then running the deploy command again. The logs will say something like "Uploading 1 files to Google Cloud Storage" and the deploy will be faster. You can change X files, deploy again, and the message will be updated to indicate it is only deploying X files.
However, I'm not sure what it uses to compute the diff. Maybe it compares against the files currently in your staging bucket, and if the files in the staging bucket have been deleted (they have a default life span of 15 days), it will deploy all the files again (but as I said, I'm not sure of this).

Bucket of Staging files after deploying an app engine

After deploying a Google App Engine app, at least 4 buckets are created in Google Cloud Storage:
[project-id].appspot.com
staging.[project-id].appspot.com
artifacts.[project-id].appspot.com
vm-containers.[project-id].appspot.com
What are they, and will they incur storage cost? Can they be safely deleted?
I believe the "artifacts" bucket is what they're referring to here. A key point is the following:
Once deployment is complete, App Engine no longer needs the container images. Note that they are not automatically deleted, so to avoid reaching your storage quota, you can safely delete any images you don't need.
I discovered this after (to my great surprise) Google started charging me money every month. I saw that the "artifacts" bucket had a directory named "images". (I naively thought that it had something to do with graphics or photographs, which was quite mysterious as my app doesn't do anything with graphics.)
Staging buckets are described in the App Engine's documentation when Setting Up Google Cloud Storage.
I am quoting relevant information here for future viewers:
Note: When you create a default bucket, you also get a staging bucket
with the same name except that staging. is prepended to it. You can
use this staging bucket for temporary files used for staging and test
purposes; it also has a 5 GB limit, but it is automatically emptied on
a weekly basis.
So in essence, when you create an App Engine app (Standard or Flexible), you get these two buckets. You can delete the buckets (I deleted the staging one), and I was able to recover it by running gcloud beta app repair.
They are not mandatory for a GAE app - one has to explicitly enable GCS for a GAE app for some of these to be created.
At least a while back, only the first two were created by default (for a standard environment Python app) when GCS was enabled, and they are empty by default.
It is possible that the others are created by default as well these days, I'm not sure. But they could also be created by and used for something specific you're doing in/for your app - only you can tell that.
You can check what's in them via the Storage menu in the developer console. That might give a hint as for their usage. For my apps which have such buckets created - they're empty.
From Default Google Cloud Storage bucket:
Applications can use a Default Google Cloud Storage bucket, which has
free quota and doesn't require billing to be enabled for the app. You
create this free default bucket in the Google Cloud Platform Console
App Engine settings page for your project.
The free quota is 5 GB, so as long as you don't reach that you're OK.
Now there is a matter of one bucket mentioned in the docs vs the multiple ones actually seen - debatable, I'm not sure what to suggest.
In short - I'd check the content of these buckets. If they're not empty, I'd check the estimated costs for any indication that the free 5 GB quota might not apply to them. If that's the case, I'd investigate the actual usage and decide whether to delete something or not.
Otherwise I'd just leave them be.
An update on what staging is for (at least in Python GAE Standard):
https://cloud.google.com/appengine/docs/standard/python3/using-cloud-storage
App Engine also creates a bucket that it uses for temporary storage when it deploys new versions of your app. This bucket, named staging.project-id.appspot.com, is for use by App Engine only. Apps can't interact with this bucket.
Still can't figure out what artifacts is for.
artifacts.[project-id].appspot.com These files in the bucket are created by the google container registry.
WARNING: Deleting them will cause you to lose access to your container registry.

How can I export data from Google App Engine High Replication datastore?

I am looking into using Google App Engine for a project and would like to make sure I have a way to export all my data if I ever decide to leave GAE (or GAE shuts down).
Everything I search about exporting data from GAE points to https://developers.google.com/appengine/docs/python/tools/uploadingdata. However, that page contains this note:
Note: This document applies to apps that use the master/slave
datastore. If your app uses the High Replication datastore, it is
possible to copy data from the app, but Google does not currently
support this use case. If you attempt to copy from a High Replication
datastore, you'll see a high_replication_warning error in the Admin
Console, and the downloaded data might not include recently saved
entities.
The problem is that the master/slave datastore was recently deprecated in favor of the High Replication datastore. I understand that the master/slave datastore is still supported for a little while, but I don't feel comfortable using something that has officially been deprecated and is on its way out. So that leaves me with the High Replication datastore, and the only way to export the data seems to be the method above, which is not officially supported (and thus gives me no guarantee that I can get my data out).
Is there any other (officially supported) way of exporting data from the High Replication datastore? I don't feel comfortable using Google App Engine if it means my data could be locked in there forever.
It took me quite a long time to set up the download of data from GAE, as the documentation is not as clear as it should be.
If you are extracting data from a Unix server, you may be able to reuse the script below.
Also, if you do not provide the "config_file" parameter, it will extract all your data for this kind, but in a proprietary format which can only be used for restoring data afterwards.
#!/bin/sh
#------------------------------------------------------------------
#-- Param 1 : Namespace
#-- Param 2 : Kind (table id)
#-- Param 3 : Directory in which the csv file should be stored
#-- Param 4 : output file name
#------------------------------------------------------------------
# The password is fed to --passin on stdin via the here-document below.
appcfg.py download_data --secure --email="$BACKUP_USERID" --config_file=configClientExtract.yml --filename="$3/$4.csv" --kind="$2" --url="$BACKUP_WEBSITE/remote_api" --namespace="$1" --passin <<-EOF
$BACKUP_PASSWORD
EOF
The App Engine datastore currently supports another option as well. The data backup feature can be used to copy selected data into the Blobstore or Google Cloud Storage. This function is available under the Datastore Admin area in the App Engine console. If required, the backed-up data can then be downloaded from the blob viewer or Cloud Storage. When backing up a High Replication datastore, it is recommended to disable datastore writes before taking the backup.
You need to configure a builtin called remote_api. This article has all the information and guide you need to be able to download all your data today and in the future.

Datastore Location with Google App Engine / Java

How can I customize the location of the datastore file while working with GAE/J?
The option --datastore_path doesn't seem to work with GAE/J.
And if it is possible, what option do I use in the maven-gae-plugin?
I assume you mean while running the dev app server locally. Try the --generated_dir option with <sdk>/bin/dev_appserver.sh:
--generated_dir=dir Set the directory where generated files are created.
The generated files include the local datastore file.

Clean datastore for GoogleAppEngine

How do I clear the datastore in Google App Engine?
I want to clear my development data to run a test again, but I can't find a way to do that.
If you are running from the commandline, use the --clear_datastore flag, e.g.,
dev_appserver.py --clear_datastore=yes app
Otherwise, if you're running it off the included GAE launcher, go into the settings of your app (double click it), and there should be a little checkbox that says "clear datastore on launch" under Launch Settings.
reference:
https://cloud.google.com/appengine/docs/python/tools/devserver#Python_Using_the_Datastore
dev_appserver.py --clear_datastore myapp
assuming by "development data", you mean the data in the dev server.
Simply use Administering Your Datastore (Experimental)
Some Other ways
App Engine: How to "reset" the datastore?
Delete all data for a kind in Google App Engine
Interactive console is also a great way.
For Java, the following information can be seen in Using the Datastore - Clearing the Datastore at the end of the page (as of 2013/05/10):
The development web server uses a local version of the Datastore for testing your application, using local files. The data persists as long as the temporary files exist, and the web server does not reset these files unless you ask it to do so.
The file is named local_db.bin, and it is created in your application's WAR directory, in the WEB-INF/appengine-generated/ directory. To clear the Datastore, delete this file.
So, stop your server, delete the file, and start it up again.
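If you reset the local Java datastore often, the delete step can be scripted. This is a small sketch in Python; the function name and the war_dir argument are hypothetical, and only the WEB-INF/appengine-generated/local_db.bin path comes from the documentation quoted above. Run it only while the development server is stopped.

```python
import os


def clear_local_datastore(war_dir):
    """Delete the dev server's local datastore file under the given WAR directory.

    Returns True if the file existed and was removed, False otherwise.
    """
    path = os.path.join(war_dir, "WEB-INF", "appengine-generated", "local_db.bin")
    if os.path.exists(path):
        os.remove(path)
        return True
    return False
```

For example, clear_local_datastore("war") would target war/WEB-INF/appengine-generated/local_db.bin, assuming your WAR directory is named war.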
On your local machine you can go to : http://localhost:8080/_ah/admin/datastore
