Can't backup GAE application datastore to GS bucket - google-app-engine

I'm trying to back up the GAE datastore to a GS bucket as described here: https://developers.google.com/appengine/docs/adminconsole/datastoreadmin#Backup_And_Restore. I've tried supplying the bucket name in these forms:
bucket
/gs/bucket
/gs/bucket/path
but none of them work.
Every time I get a message:
There was a problem kicking some off the jobs/tasks:
Invalid bucket name: 'bucket'
What am I doing wrong? Is it possible at all to backup all data (including blob files) to GS without writing custom code for this?

I got it to work by adding the service account e-mail as a privileged user with write permission.
Here's what I did:
Create the bucket via the web interface (Storage > Cloud Storage > Storage Browser > New Bucket)
Add APPID@appspot.gserviceaccount.com as a privileged user with edit permission (Permissions > Add Member)
Even though it was part of the same project, for some reason I still had to add the project e-mail as a privileged user.
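For reference, the same grant can be made programmatically. Here is a minimal sketch using the google-cloud-storage Python client library (the bucket name and app ID are placeholders, not values from the original post):
# Minimal sketch: grant the App Engine default service account
# write access to a bucket. Names below are placeholders.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("YOUR_BUCKET")
bucket.acl.reload()  # load the current ACL before modifying it
bucket.acl.user("APPID@appspot.gserviceaccount.com").grant_write()
bucket.acl.save()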

I suspect the bucket does not exist or else app engine does not have permission to write to the bucket.
Make sure the following are true:
You have created BUCKET. Use something like gsutil to create the bucket if necessary.
gsutil mb gs://BUCKET
Make sure your app engine service account has WRITE access to BUCKET.
The service account is of the form APP_NAME@appspot.gserviceaccount.com.
Add the service account to your project team with "can edit" access.
Alternatively, you can change the bucket ACL and add the service account there. This option is more complicated.
Now start the backup using the form /gs/BUCKET
If you get a Bucket "/gs/BUCKET" is not accessible message, then your bucket does not exist, or APP_NAME@appspot.gserviceaccount.com does not have access to your bucket.
NOTE: the form is /gs/BUCKET. The following are wrong: BUCKET, gs://BUCKET, gs/BUCKET etc.
Check that the bucket exists with the right permissions using the following command (newer gsutil releases spell getacl/setacl as gsutil acl get and gsutil acl set):
gsutil getacl gs://BUCKET # Note the URI form here instead of a path.
Look for an entry like the following:
<Entry>
<Scope type="UserByEmail">
<EmailAddress>APP_NAME@appspot.gserviceaccount.com</EmailAddress>
</Scope>
<Permission>WRITE</Permission>
</Entry>
If you don't see one, you can add it in the following manner:
gsutil getacl gs://BUCKET > acl.xml
vim acl.xml # Or your favorite editor
# Add the xml above
gsutil setacl acl.xml gs://BUCKET
Now the steps above will work.

Make sure to follow closely the instructions here:
https://cloud.google.com/appengine/docs/standard/python/console/datastore-backing-up-restoring#restoring_data_to_another_app
Things to make sure:
add ACL permission to your target application
if the backup was created before the permission was added to the bucket, find the backup objects and add the permission to them as well (see the sketch below)
add [PROJECT_ID]@appspot.gserviceaccount.com as a member of your source application with the Editor role
the path to import in your source application is /gs/bucket
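For the second point, here is a rough sketch of adding that permission to backup objects that already exist, using the google-cloud-storage Python client (the bucket name, object prefix, and project ID are assumptions for illustration):
# Sketch: grant the service account read access to backup objects
# created before the bucket permission was added. Names are placeholders.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("BUCKET")
for blob in bucket.list_blobs(prefix="datastore_backup"):  # assumed prefix
    blob.acl.reload()
    blob.acl.user("PROJECT_ID@appspot.gserviceaccount.com").grant_read()
    blob.acl.save()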

I just spent a while wrestling with this myself. Thank you @fejta for your assistance.
I could not figure this out. I had added my user to the project, verified that I could write, manually updated the ACL (which should not have been required), ...
In the end, creating a bucket from the command line via:
gsutil mb gs://BUCKET
instead of the web user interface worked for me. Multiple buckets created either before or after adding the user to the team all resulted in 'Invalid bucket name'.
I addressed it with:
/gs/BUCKET

Related

How do you resolve an "Access Denied" error when invoking `image_uris.retrieve()` in AWS Sagemaker JumpStart?

I am working in a SageMaker environment that is locked down. For example, my user account is prevented from creating S3 buckets. But, I can successfully run vanilla ML training jobs by passing in role=get_execution_role to an instance of the Estimator class when using an out-of-the-box algorithm such as XGBoost.
Now, I'm trying to use an algorithm (LightGBM) that is only available via the JumpStart feature in SageMaker, but I can't get it to work. When I try to retrieve an image URI via image_uris.retrieve(), it returns the following error:
ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied.
This makes some sense to me if my user permissions are being used when creating an object. But what I want to do is specify another role - like the one returned from get_execution_role - to perform these tasks.
Is that possible? Is there another work-around available? How can I see which role is being used?
Thanks,
When I encountered this issue, it was a permissions issue with a bucket that had changed.
In the SageMaker Python SDK source code, there is a cache located in an AWS-owned bucket, jumpstart-cache-prod-{region}, along with a manifest.json that resolves the ECR path of the image for you.
If you look at the stack trace, it could be erroring out at the code that is looking for the manifest.
One place to look is whether new restrictions have been placed in IAM; make sure your credentials carry the minimum policy needed to access JumpStart (pretrained) models.
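To answer the "which role is being used" part of the question: image_uris.retrieve() runs under the caller's own credentials, not under the role you pass to the Estimator. A small sketch to check this, assuming the SageMaker Python SDK and boto3 (the region and model ID are example values, not from the original post):
# Sketch: print the identity making the calls, then fetch the
# JumpStart image URI. region/model_id are example values.
import boto3
import sagemaker
from sagemaker import image_uris

print(boto3.client("sts").get_caller_identity()["Arn"])  # caller identity
print(sagemaker.get_execution_role())  # role an Estimator would assume

uri = image_uris.retrieve(
    framework=None,  # JumpStart models are addressed by model_id
    region="us-east-1",
    image_scope="training",
    model_id="lightgbm-classification-model",  # example model id
    model_version="*",
    instance_type="ml.m5.xlarge",
)
print(uri)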

Set all files in Google Cloud Storage Bucket to public by default

I am trying to set the files that are uploaded to a bucket to public by default.
When editing the bucket permissions, I get the popup below, which I don't understand, and I can't find any documentation about it. How do I set availability to the public?
The 'entity' select boxes have the options: domain, group, user, project.
The settings currently don't seem to set the files to public, because when I try to access a file through the URL obtained with CloudStorageTools::getPublicUrl($fileName, false) I get:
<Error>
<Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
</Error>
You'll want to set a default ACL for that:
gsutil defacl set public-read gs://bucket
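If you would rather do this from code, here is a rough equivalent using the google-cloud-storage Python client (bucket and object names are placeholders). Note that the default ACL only affects objects uploaded after the change:
# Sketch: make new uploads public by default, then fix one existing object.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("BUCKET")

# Default object ACL: applies only to objects uploaded from now on.
bucket.default_object_acl.reload()
bucket.default_object_acl.all().grant_read()  # allUsers: READER
bucket.default_object_acl.save()

# Existing objects must be updated individually.
blob = bucket.blob("path/to/file")
blob.make_public()
print(blob.public_url)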
Making Data Public
Activate Google Cloud Shell
gsutil acl ch -u AllUsers:R gs://[BUCKET_NAME]/[OBJECT_NAME]
If successful, the response looks like the following example:
Updated ACL on gs://[BUCKET_NAME]/[OBJECT_NAME]
In the command, the path after the bucket is case-sensitive, so the exact capitalization of each name must be respected.
I recommend applying the command to each file separately to get the short URL; if you instead assign the permission to the directory, the files will be accessible, but only through a longer URL.
Short URL:
https://storage.googleapis.com/[BUCKET_NAME]/[FILE_PATH]/[FILE]

Datastore backup import: Failed to read bucket "XYZ" is not accessible

I'm working with two separate projects. One is for production, the other dev.
I have backed up the production datastore into a bucket. Now I want to import that into the dev datastore. But when I try, I get the message:
Failed to read bucket: Bucket "the.bucket.name" is not accessible
I thought it might be permissions, so I added the dev project's owners-devid and editors-devid groups, plus my e-mail, as owners of the bucket, but I still got the same error.
gsutil ls works for me, so I don't think the issue is how I'm specifying the bucket.
The issue I had was that I was adding the dev project into permissions as
Project editors-############## Editor
vs
User [project name]@appspot.gserviceaccount.com Editor
The datastore import happens under that user account (the app's service account), not the project-level group.
I have the same problem. Setting the permissions as:
User [project name]@appspot.gserviceaccount.com Writer
allows performing the backup into another project's bucket, but it doesn't allow importing from that bucket. I also tried setting owner permission, but the result was the same.
The error reported is:
Requested path https://storage.googleapis.com/[bucket_name]/[id_backup_info].info is not accessible/access denied
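To rule out object-level permissions, here is a small diagnostic sketch using the google-cloud-storage Python client (the bucket, object, and service account names are placeholders) that prints who can read the backup's .info object and grants the service account read access:
# Sketch: inspect and fix object-level access on a backup .info file.
from google.cloud import storage

client = storage.Client()
blob = client.bucket("bucket_name").get_blob("id_backup_info.info")
blob.acl.reload()
print(list(blob.acl))  # current object-level grants
blob.acl.user("project-name@appspot.gserviceaccount.com").grant_read()
blob.acl.save()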

TransformationError on blob via get_serving_url (app engine)

TransformationError
This error keeps coming up for a specific image.
There are no problems with other images and I'm wondering what the reason for this exception could be.
From Google:
"Error while attempting to transform the image."
Update:
On the development server it works fine; it only fails live.
Thanks
Without more information I'd say either the image is corrupted, or it's in a format that cannot be used with get_serving_url (an animated GIF, for example).
I fought this error forever, and in case anyone else gets the dreaded TransformationError, please note that you need to make sure your app has owner permissions on the files you want to generate a URL for.
It'll look something like this in your IAM tab:
App Engine app default service account
your-project-name-here@appspot.gserviceaccount.com
In IAM, on that member, scroll down to Storage and grant "Storage Object Admin" to that user. That is, as long as your storage bucket is under the same project... if not, I'm not sure how...
This TransformationError exception seems to show up for permissions errors so it is a bit misleading.
I was getting this error because I had used the Bucket Policy Only permissions on a bucket in a different project.
However, after changing this back to Object Level permissions and giving my App Engine app access (from a different project), I was able to perform the App Engine Standard Images operation (google.appengine.api.images.get_serving_url) that I was trying to implement.
Make sure that you set your permissions correctly either in the Console UI or via gsutil like so:
gsutil acl ch -u my-project-a@appspot.gserviceaccount.com:OWNER gs://my-project-b
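For context, the call involved looks roughly like this once permissions are in place (a sketch; the /gs/... path is a placeholder), since the Images API addresses Cloud Storage objects via the filename parameter:
# Sketch: generate a serving URL for an image stored in Cloud Storage.
from google.appengine.api import images

url = images.get_serving_url(
    None,  # no BlobKey; the object is addressed by filename instead
    filename="/gs/my-project-b/path/to/image.jpg",  # placeholder path
)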

302 status when copying data to another app in AppEngine

I'm trying to use the "Copy to another app" feature of AppEngine and keep getting an error:
Fetch to http://datastore-admin.moo.appspot.com/_ah/remote_api failed with status 302
This is for a Java app but I followed the instructions on setting up a default Python runtime.
I'm 95% sure it's an authentication issue and the call to remote_api is redirecting to the Google login page. Both apps use Google Apps as the authentication mechanism. I've also tried copying to and from a third app we have which uses Google Accounts for authentication.
Notes:
The user account I log in with is an Owner on all three apps. It's a Google Apps account (if that wasn't obvious).
I have a Gmail account that is an Owner on all three apps as well. When I log in to the admin console with it, I don't see the datastore admin console at all when I click it.
I'm able to use the remote_api just fine from the command line after I enter my details.
Tried with both the Python remote_api built-in and the Java one.
I've found similar questions/blog posts about this, one of which required logging in from a browser, then manually submitting the ACSID cookie you get after that's done. Can't do that here, obviously.
OK, I think I got this working.
I'll refer to the two appIDs as "source" and "dest".
To enable datastore admin (as you know) you need to upload a Python project with the app.yaml and appengine_config.py files as described in the docs.
Either I misread the docs or there is an error. The "appID" in the .yaml should be the app ID you are uploading to in order to enable DS admin.
The other appID, in the appengine_config.py file, specifically this line:
remoteapi_CUSTOM_ENVIRONMENT_AUTHENTICATION = (
    'HTTP_X_APPENGINE_INBOUND_APPID', ['appID'])
should be the appID of the "source", i.e. the app ID of where the data is coming from in the DS copy operation.
I think this line is what allows the source appID to be authenticated as having permissions to write to the "dest" app ID.
So I changed that .py and uploaded it again to my "dest" app ID. To be sure, I set this dummy Python app as the default version and left it that way.
Then on the source app ID I tried the DS copy again, and all the copy jobs were kicked off OK - so it seems to have fixed it.
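To make the "which ID goes where" point concrete, here is a commented sketch of the appengine_config.py deployed to the destination app (the app IDs are placeholders):
# appengine_config.py, uploaded as part of the dummy Python app that is
# deployed to the DESTINATION app ID (the app receiving the data).
# The ID listed here is the SOURCE app, i.e. the one allowed to call
# this app's remote_api during the copy.
remoteapi_CUSTOM_ENVIRONMENT_AUTHENTICATION = (
    'HTTP_X_APPENGINE_INBOUND_APPID', ['source-app-id'])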
