Google Cloud Function to schedule exports for datastore entities - google-app-engine

We have an app hosted in GCP App Engine. This app saves images and other data in Datastore. The images are separate entity kinds, say Kind1 and Kind2. We only need to export these two entity kinds and store the export in a storage bucket called datastore-exports. We have already exported these two entity kinds manually via the console. We would like to create a Cloud Function that exports the two aforementioned Datastore entity kinds on a daily basis, every 24 hours. I need assistance with the files and code logic to make this happen.
Below are two examples that I came across that are close to what we want to accomplish.
I see they are doing that with Firestore HERE.
I also had a look at this doc HERE, but we need to use Python 3.
Any assistance on either node.js or python3 methods will be highly appreciated.
Thanks,

I tried to reproduce your use case:
Run the command:
gcloud app describe
# .......
#locationId: europe-west2
Make sure that your export bucket and your cloud function are deployed in the same location.
Your cloud function will use the App Engine default service account:
PROJECT_ID@appspot.gserviceaccount.com
Assign the Datastore Import Export Admin role to this service account.
(I would recommend creating a new service account for your cloud function, rather than using the App Engine default service account.)
Create the cloud function:
a. main.py
import json

import google.auth
import google.auth.transport.requests
import requests


def export_datastore(request):
    # Get an access token for the function's service account
    creds, project_id = google.auth.default()
    auth_req = google.auth.transport.requests.Request()
    creds.refresh(auth_req)
    token = creds.token

    output_url_prefix = 'gs://your-bucket'  # e.g. gs://datastore-exports
    url = 'https://datastore.googleapis.com/v1/projects/{}:export'.format(project_id)

    # We export all kinds and all namespaces.
    # To export only specific kinds, list them, e.g. 'kinds': ['Kind1', 'Kind2']
    entity_filter = {
        'kinds': [],
        'namespace_ids': []
    }
    body = {
        'project_id': project_id,
        'output_url_prefix': output_url_prefix,
        'entity_filter': entity_filter
    }
    headers = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer ' + token,
        'Accept': 'application/json'
    }

    # Call the API to start the Datastore export
    r = requests.post(url, data=json.dumps(body), headers=headers)
    print(r.json())
    return 'Export command sent'
b. requirements.txt
# Function dependencies, for example:
google-auth
requests
Use Google Cloud Scheduler to call the cloud function every 24 hours.
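For the scheduling part, here is a minimal sketch using the google-cloud-scheduler Python client (the region, function URL, job name and service account below are placeholders to adapt; you could equally create the job from the console or with gcloud):
from google.cloud import scheduler_v1

project = 'your-project'
location = 'europe-west2'  # same region as the function and bucket
function_url = 'https://europe-west2-your-project.cloudfunctions.net/export_datastore'

client = scheduler_v1.CloudSchedulerClient()
parent = f'projects/{project}/locations/{location}'

job = {
    'name': f'{parent}/jobs/daily-datastore-export',
    'schedule': '0 0 * * *',  # every 24 hours, at midnight
    'time_zone': 'Etc/UTC',
    'http_target': {
        'uri': function_url,
        'oidc_token': {
            # a service account allowed to invoke the function
            'service_account_email': 'your-project@appspot.gserviceaccount.com',
        },
    },
}

response = client.create_job(parent=parent, job=job)
print('Created job {}'.format(response.name))
Cloud Scheduler then hits the function on that cron schedule, and the function performs the export.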

Related

How can I call a Google Cloud Function from Google App Engine?

I have an App Engine project.
I also have a Google Cloud Function.
And I want to call that Google Cloud Function from the App Engine project. I just can't seem to get that to work.
Yes, if I make the function full public (i.e. set the Cloud Function to 'allow all traffic' and create a rule for 'allUsers' to allow calling the function) it works. But if I limit either of the two settings, it stops working immediately and I get 403's.
The App and Function are in the same project, so I would at least assume that setting the Function to 'allow internal traffic only' should work just fine, provided that I have a rule for 'allUsers' to allow calling the function.
How does that work? How does one generally call a (non-public) Google Cloud Function from Google App Engine?
You need an auth header on the request to the function URL. It should look like:
headers = {
    ....
    'Authorization': 'Bearer some-long-hash-token'
}
Here is how to get the token:
import requests

token_response = requests.get(
    'http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=' +
    'https://[your zone]-[your app name].cloudfunctions.net/[your function name]',
    headers={'Metadata-Flavor': 'Google'})
token = token_response.content.decode("utf-8")
'Allow internal traffic only' does not work as expected. My App Engine app is in the same project as the Functions, and it does not work. I had to turn on 'Allow all traffic', and use the header method.
Example:
import requests


def get_access_token():
    token_response = requests.get(
        'http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=' +
        'https://us-central1-my_app.cloudfunctions.net/my_function',
        headers={'Metadata-Flavor': 'Google'})
    return token_response.content.decode("utf-8")


def test():
    url_string = "https://us-central1-my_app.cloudfunctions.net/my_function?message=it%20worked"
    access_token = get_access_token()
    print(
        requests.get(url_string, headers={'Authorization': f"Bearer {access_token}"})
    )
As mentioned in the docs, the Allow internal traffic only setting means the following:
Only requests from VPC networks in the same project or VPC Service Controls perimeter are allowed. All other requests are rejected.
Please note that since App Engine Standard is a serverless product, it is not part of the VPC, so requests made from it are not considered "internal" calls; the calls are actually made from the public IPs of the instances, and for this reason you get an HTTP 403 error.
Also, using a Serverless VPC Connector won't work, since it is more of a bridge to reach resources inside the VPC (like VMs or Memorystore instances); it won't reach a Cloud Function, because that is also a serverless product and does not have an IP in the VPC.
I think there are three options:
Using App Engine Flex:
Since App Engine Flex uses VM instances, these instances will be part of the VPC and you'll reach the Function even when setting the "Allow internal traffic only" option.
Use a VM as a proxy:
You can create a Serverless VPC Connector and assign it to the app in App Engine. Then you can create a VM and reach the function using the VM as a proxy. This is not the best option because of the cost, but in the end it is an option.
The last option assumes that the function can use the "Allow all traffic" option:
You can restrict the Cloud Function to only allow a particular Service Account, and you can use this sample code to authenticate.
EDITED:
A good sample of the code for this option was shared by @GAEfan in the other answer.
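For reference, here is a minimal Python sketch of that approach using the google-auth library's ID token helper (the function URL is a placeholder; this assumes the caller runs as, or has credentials for, a service account with the Cloud Function Invoker role):
import requests

import google.auth.transport.requests
from google.oauth2 import id_token

FUNCTION_URL = 'https://us-central1-my_app.cloudfunctions.net/my_function'  # placeholder


def call_function():
    # Fetch a Google-signed ID token whose audience is the function URL
    auth_req = google.auth.transport.requests.Request()
    token = id_token.fetch_id_token(auth_req, FUNCTION_URL)
    response = requests.get(
        FUNCTION_URL,
        headers={'Authorization': f'Bearer {token}'})
    return response.text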
@GAEfan is correct.
As an addition: I used the official Google Auth library to give me the necessary headers.
const {GoogleAuth} = require('google-auth-library');

// Instead of specifying the type of client you'd like to use (JWT, OAuth2, etc),
// this library will automatically choose the right client based on the environment.
const googleCloudFunctionURL = 'https://europe-west1-project.cloudfunctions.net/function';

(async function() {
  const auth = new GoogleAuth();
  let googleCloudFunctionClient = await auth.getIdTokenClient(googleCloudFunctionURL);
  console.log(await googleCloudFunctionClient.getRequestHeaders(googleCloudFunctionURL));
})();

Google Cloud Tasks cannot authenticate to Cloud Run

I am trying to invoke a Cloud Run service using Cloud Tasks as described in the docs here.
I have a running Cloud Run service. If I make the service publicly accessible, it behaves as expected.
I have created a cloud queue and I schedule the cloud task with a local script. This one is using my own account. The script looks like this
from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()

project = 'my-project'
queue = 'my-queue'
location = 'europe-west1'
url = 'https://url_to_my_service'

parent = client.queue_path(project, location, queue)

task = {
    'http_request': {
        'http_method': 'GET',
        'url': url,
        'oidc_token': {
            'service_account_email': 'my-service-account@my-project.iam.gserviceaccount.com'
        }
    }
}

response = client.create_task(parent, task)
print('Created task {}'.format(response.name))
I see the task appear in the queue, but it fails and retries immediately. The reason for this (by checking the logs) is that the Cloud Run service returns a 401 response.
My own user has the roles "Service Account Token Creator" and "Service Account User". It doesn't have the "Cloud Tasks Enqueuer" explicitly, but since I am able to create the task in the queue, I guess I have inherited the required permissions.
The service account "my-service-account#my-project.iam.gserviceaccount.com" (which I use in the task to get the OIDC token) has - amongst others - the following roles:
Cloud Tasks Enqueuer (Although I don't think it needs this one as I'm creating the task with my own account)
Cloud Tasks Task Runner
Cloud Tasks Viewer
Service Account Token Creator (I'm not sure whether this should be added to my own account - the one who schedules the task - or to the service account that should perform the call to Cloud Run)
Service Account User (same here)
Cloud Run Invoker
So I did a dirty trick: I created a key file for the service account, downloaded it locally and impersonated locally by adding an account to my gcloud config with the key file. Next, I run
curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" https://url_to_my_service
That works! (By the way, it also works when I switch back to my own account)
Final tests: if I remove the oidc_token from the task when creating the task, I get a 403 response from Cloud Run! Not a 401...
If I remove the "Cloud Run Invoker" role from the service account and try again locally with curl, I also get a 403 instead of a 401.
If I finally make the Cloud Run service publicly accessible, everything works.
So, it seems that the Cloud Task fails to generate a token for the service account to authenticate properly at the Cloud Run service.
What am I missing?
I had the same issue; here was my fix:
Diagnosis: Generating OIDC tokens currently does not support custom domains in the audience parameter. I was using a custom domain for my cloud run service (https://my-service.my-domain.com) instead of the cloud run generated url (found in the cloud run service dashboard) that looks like this: https://XXXXXX.run.app
Masking behavior: In the task being enqueued to Cloud Tasks, if the audience field for the oidc_token is not explicitly set, then the target url from the task is used to set the audience in the request for the OIDC token.
In my case this meant that when enqueueing a task to be sent to the target https://my-service.my-domain.com/resource, the audience for generating the OIDC token was set to my custom domain https://my-service.my-domain.com/resource. Since custom domains are not supported when generating OIDC tokens, I was receiving 401 not authorized responses from the target service.
My fix: Explicitly populate the audience with the Cloud Run generated URL, so that a valid token is issued. In my client I was able to globally set the audience for all tasks targeting a given service with the base url: 'audience' : 'https://XXXXXX.run.app'. This generated a valid token. I did not need to change the url of the target resource itself. The resource stayed the same: 'url' : 'https://my-service.my-domain.com/resource'
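In the Python client from the question, that boils down to adding the audience next to the service account email in the task definition (the URLs below are the placeholders from above):
task = {
    'http_request': {
        'http_method': 'GET',
        # the request itself still targets the custom domain
        'url': 'https://my-service.my-domain.com/resource',
        'oidc_token': {
            'service_account_email': 'my-service-account@my-project.iam.gserviceaccount.com',
            # the token audience must be the Cloud Run generated URL
            'audience': 'https://XXXXXX.run.app'
        }
    }
}
response = client.create_task(parent, task)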
More Reading:
I've run into this problem before when setting up service-to-service authentication: Google Cloud Run Authentication Service-to-Service
1. I created a private Cloud Run service using this code:
import os

from flask import Flask
from flask import request

app = Flask(__name__)


@app.route('/index', methods=['GET', 'POST'])
def hello_world():
    target = os.environ.get('TARGET', 'World')
    print(target)
    return str(request.data)


if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))
2. I created a service account with --role=roles/run.invoker that I will associate with the cloud task:
gcloud iam service-accounts create SERVICE-ACCOUNT_NAME \
    --display-name "DISPLAYED-SERVICE-ACCOUNT_NAME"
gcloud iam service-accounts list
gcloud run services add-iam-policy-binding SERVICE \
    --member=serviceAccount:SERVICE-ACCOUNT_NAME@PROJECT-ID.iam.gserviceaccount.com \
    --role=roles/run.invoker
3. I created a queue:
gcloud tasks queues create my-queue
4. I created a test.py:
from google.cloud import tasks_v2
from google.protobuf import timestamp_pb2
import datetime

# Create a client.
client = tasks_v2.CloudTasksClient()

# TODO(developer): Uncomment these lines and replace with your values.
project = 'your-project'
queue = 'your-queue'
location = 'europe-west2'  # app engine location
url = 'https://helloworld/index'
payload = 'Hello from the Cloud Task'

# Construct the fully qualified queue name.
parent = client.queue_path(project, location, queue)

# Construct the request body.
task = {
    'http_request': {  # Specify the type of request.
        'http_method': 'POST',
        'url': url,  # The full url path that the task will be sent to.
        'oidc_token': {
            'service_account_email': "your-service-account"
        },
        'headers': {
            'Content-Type': 'application/json',
        }
    }
}

# Convert "seconds from now" into an rfc3339 datetime string.
d = datetime.datetime.utcnow() + datetime.timedelta(seconds=60)

# Create Timestamp protobuf.
timestamp = timestamp_pb2.Timestamp()
timestamp.FromDatetime(d)

# Add the timestamp to the tasks.
task['schedule_time'] = timestamp
task['name'] = 'projects/your-project/locations/app-engine-location/queues/your-queue/tasks/your-task'

converted_payload = payload.encode()

# Add the payload to the request.
task['http_request']['body'] = converted_payload

# Use the client to build and send the task.
response = client.create_task(parent, task)
print('Created task {}'.format(response.name))
# return response
5. I ran the code in Google Cloud Shell with my user account, which has the Owner role.
6. The response received has the form:
Created task projects/your-project/locations/app-engine-location/queues/your-queue/tasks/your-task
7. Checked the logs: success.
The next day I am no longer able to reproduce this issue. I can reproduce the 403 responses by removing the Cloud Run Invoker role, but I no longer get 401 responses with exactly the same code as yesterday.
I guess this was a temporary issue on Google's side?
Also, I noticed that it takes some time before updated policies are actually in place (1 to 2 minutes).
For those like me struggling through documentation and Stack Overflow while getting continuous UNAUTHORIZED responses on Cloud Tasks HTTP requests:
As was written in this thread, you should explicitly provide the audience for the oidcToken you send to Cloud Tasks, and make sure it matches your resource exactly.
For instance, if you have a Cloud Function named my-awesome-cloud-function and your task request url is https://REGION-PROJECT-ID.cloudfunctions.net/my-awesome-cloud-function/api/v1/hello, you need to ensure that you set the audience to the function url itself:
{
  "serviceAccountEmail": "SERVICE-ACCOUNT_NAME@PROJECT-ID.iam.gserviceaccount.com",
  "audience": "https://REGION-PROJECT-ID.cloudfunctions.net/my-awesome-cloud-function"
}
Otherwise the full url seems to be used, which leads to the error.

Google Cloud Tasks & Google App Engine Python 3

I am trying to work with the Google Cloud Tasks API
In Python 2.7 App Engine standard you had this amazing library (deferred) that allowed you to easily assign workers to multiple tasks that could be completed asynchronously.
So in a webapp2 handler I could do this:
import webapp2
from google.appengine.ext import deferred


def create_csv_file(data):
    # do a bunch of work ...
    return


class MyHandler(webapp2.RequestHandler):
    def get(self):
        data = myDB.query()
        deferred.defer(create_csv_file, data)
Now I am working on the new Google App Engine Python 3 runtime and the deferred library is not available for GAE Py3.
Is the google cloud tasks the correct solution/replacement?
This is where I am at now... I've scoured the internet looking for an answer, but my Google powers have failed me. I've found some examples, but they are not very good, and they appear as though you should be creating/adding tasks from the gcloud console or locally; there are no examples of adding tasks from a front end api endpoint.
class ExportCSVFileHandler(Resource):
    def get(self):
        create_task()
        return


class CSVTaskHandler(Resource):
    def post(self):
        # do a lot of work creating a csv file
        return


def create_task():
    client = tasks.CloudTasksClient(credentials='mycreds')
    project = 'my-project_id'
    location = 'us-east4'
    queue_name = 'csv-worker'
    parent = client.location_path(project, location)
    the_queue = {
        'name': client.queue_path(project, location, queue_name),
        'rate_limits': {
            'max_dispatches_per_second': 1
        },
        'app_engine_routing_override': {
            'version': 'v2',
            'service': 'task-module'
        }
    }
    queues = [the_queue]
    task = {
        'app_engine_http_request': {
            'http_method': 'GET',
            'relative_uri': '/create-csv',
            'app_engine_routing': {
                'service': 'worker'
            },
            'body': str(20).encode()
        }
    }
    # Use the client to build and send the task.
    response = client.create_task(parent, task)
    print('Created task {}'.format(response.name))
    # [END taskqueues_using_yaml]
    return response
Yes, Cloud Tasks is the replacement for App Engine Task Queues. The API can be called from anywhere, i.e. locally, from App Engine, from external services, and even from gcloud. The samples show you how to do this locally, but you can easily replace your old taskqueue code with the new Cloud Tasks library.
Unfortunately, there is no deferred library for Cloud Tasks. There are multiple ways around this: create separate endpoints for task handlers and use App Engine routing to send the task to the right endpoint, or add metadata to the task body so that your handler can appropriately process the task request. A rough sketch of the first approach follows below.
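As a sketch only (not official sample code; the project, region, queue, service and handler path below are placeholders, and the client is used the same way as in the question's snippet), the old deferred.defer call can be replaced by enqueueing a task that App Engine routes to a dedicated worker handler:
import json

from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()
parent = client.queue_path('my-project-id', 'us-east4', 'csv-worker')


def defer_create_csv(data):
    # Instead of pickling a callable, pass the work description in the task body
    task = {
        'app_engine_http_request': {
            'http_method': 'POST',
            'relative_uri': '/create-csv',  # handled by the CSV worker endpoint
            'app_engine_routing': {'service': 'worker'},
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps(data).encode()
        }
    }
    return client.create_task(parent, task)
The /create-csv handler then reads the JSON body and does the work that used to live in the deferred function.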

What service account permissions should I set for secure flexible app engine -> cloud function communication

I am having problems identifying which service account I need to give certain roles to.
I have a NodeJS app running on my flexible app engine environment.
I have a single hello-world python3.7 HTTP cloud function.
I want to do a GET request from my app engine to my cloud function.
When the allUser member is given the Cloud Function Invoker role on the hello-world cloud function everything works fine.
But now I want to secure my cloud function endpoint so that only my flexible app engine can reach it.
I remove the allUser member and as expected I get a 403 when the app engine tries to call.
Now I add the @appspot.gserviceaccount.com and @gae-api-prod.google.com.iam.gserviceaccount.com members to the hello-world cloud function and give them Cloud Function Invoker roles.
I would expect the flexible app engine to now be able to call the hello-world cloud function seeing as I gave it the Cloud Function Invoker role.
But I keep getting a 403 error.
What service account is app engine flexible using to do these calls to the cloud function API?
There are some settings to be made in order to connect Cloud Functions with a service account:
Enable required APIs
Enable Service account
Act as User Service Account
By default a cloud function is created with the default service account, which sometimes doesn't have all the required privileges.
You can find more info Here:
https://cloud.google.com/functions/docs/securing/
John Hanley was correct,
When using GCP libraries to perform actions (like google-cloud-firestore for example) the executing function will use the underlying service account permissions to do those actions.
When doing manual HTTP requests to cloud function URLs, you will have to fetch a token from the metadata server to properly authenticate your request.
import requests

TARGET_URL = 'https://REGION-PROJECT_ID.cloudfunctions.net/hello-world'  # your function's URL


def generate_token() -> str:
    """Generate a Google-signed OAuth ID token."""
    token_request_url: str = f'http://metadata/computeMetadata/v1/instance/service-accounts/default/identity?audience={TARGET_URL}'
    token_request_headers: dict = {'Metadata-Flavor': 'Google'}
    token_response = requests.get(token_request_url, headers=token_request_headers)
    return token_response.content.decode("utf-8")


def do_request(data: dict):
    token: str = generate_token()
    headers: dict = {
        'Content-type': 'application/json',
        'Accept': 'application/json',
        'Authorization': f'Bearer {token}'
    }
    requests.post(url=TARGET_URL, json=data, headers=headers)

api for deleting image from google cloud storage buckets

Hello all. I am using Google Cloud Platform and I am storing my coupon images in GCS buckets. Does Google provide any API to delete an existing image from a GCS bucket? I have searched a lot in the Google docs and read many blogs, but everyone only covers deleting data from a database; no one explains how to delete an image from a bucket. If anyone has done this, please help; it would be really appreciated.
Thanks
Sure.
From the console you can use the gsutil command in this way; you just need to have gsutil installed.
Via the REST API you can use this service. You can try this API here.
There are also client libraries for Python, Java and other languages; see the Python sketch below.
From @MikeSchwartz's suggestion: using the Cloud Console, you can manage your objects manually. Link.
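For example, here is a minimal Python sketch with the google-cloud-storage library (the bucket name, object path and key file location are placeholders; credentials can also come from the environment instead of a key file):
from google.cloud import storage

# Authenticate with a service account key file (or rely on GOOGLE_APPLICATION_CREDENTIALS)
client = storage.Client.from_service_account_json('/path/to/keyfile.json')

bucket = client.bucket('your-bucket')
blob = bucket.blob('parentfolder/childfolder/filename.jpg')  # object name inside the bucket
blob.delete()  # removes the object from the bucket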
Update 2: example on NodeJS
We can choose between three options: using the request module, the Google Cloud NodeJS client, or the Google API NodeJS client. But first of all, you should authorise your server to make requests to Google Cloud Storage (GCS). To do that:
Open the Console Credentials page.
If it's not already selected, select the project that you're creating credentials for.
Click Create credentials and choose Service Account Key.
In the dropdown select Compute Engine default service account, then click Create. A JSON file will be downloaded.
In the left panel click Overview and type cloud storage in the finder.
Click Google Cloud Storage and make sure that this API is enabled.
Rename the downloaded JSON to keyfile.json and put it in an accessible path for your NodeJS code.
Google Cloud NodeJS client. Here is the official repository with a lot of samples.
var fs = require('fs');
var gcloud = require('gcloud');

var gcs = gcloud.storage({
  projectId: 'your-project',
  keyFilename: '/path/to/keyfile.json'
});

var bucket = gcs.bucket('your-bucket');
var file = bucket.file('your-file');

file.delete(function(err, apiResponse) {});
Using request module.
npm install request
Then in your code:
var request = require('request');

request({
  url: 'https://www.googleapis.com/storage/v1/b/your-bucket/o/your-file',
  qs: {key: 'your-private-key'}, // you can find your private-key in your keyfile.json
  method: 'DELETE'
}, function(error, response, body){});
Using the Google API NodeJS client: I don't know how to use this, but there are a lot of examples here.
Assuming you have your image file's public URL, you can do it like this:
import {Storage} from "@google-cloud/storage";

const storage = new Storage({
  projectId: GCLOUD_PROJECT,
  keyFilename: 'keyfile.json'
});
const bucket = storage.bucket(GCLOUD_BUCKET);

// var image_file = "https://storage.googleapis.com/{bucketname}/parentfolder/childfolder/filename"
var image_file = "https://storage.googleapis.com/1533406597315/5be45c0b8c4ccd001b3567e9/1542186701528/depositphotos_173658708-stock-photo-hotel-room.jpg";

new Promise((resolve, reject) => {
  var imageurl = image_file.split("/");
  imageurl = imageurl.slice(4, imageurl.length + 1).join("/");
  // imageurl = parentfolder/childfolder/filename
  storage
    .bucket(GCLOUD_BUCKET)
    .file(imageurl)
    .delete()
    .then((image) => {
      resolve(image);
    })
    .catch((e) => {
      reject(e);
    });
});
Check Google's official documentation under code samples at this link: https://cloud.google.com/storage/docs/deleting-objects, or on GitHub: https://github.com/googleapis/nodejs-storage/blob/master/samples/files.js
