I got a DeadlineExceededError from my App Engine instance at 15:43 (UTC+03):
DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded.
The request handler looks like this:
def post(self):
    data = json.loads(request.data)
    user = User(**data['user'])
    db.session.add(user)
    db.session.commit()
    return '', 201, {'Location': url_for(request.endpoint)}
Google Cloud SQL had an incident, but it was closed before I got this exception. https://status.cloud.google.com/incident/cloud-sql/17010
Related
I have 2 services. One is hosted on Google App Engine and the other on Cloud Run.
I use urlfetch (Python 2) imported from google.appengine.api in GAE to call APIs provided by the Cloud Run.
Occasionally a few (fewer than 10 per week) DeadlineExceededError errors show up like this:
Deadline exceeded while waiting for HTTP response from URL
But over the last few days the error has suddenly become frequent (~40 per day). I'm not sure whether this is due to the Christmas peak or something else.
I checked the Load Balancer logs of Cloud Run, and it turned out the requests never reached the Load Balancer.
Has anyone encountered a similar issue before? Is anything wrong with GAE urlfetch?
I found a similar conversation, but the only suggestion there was to handle the error...
I wonder what I can do to mitigate the issue. Many thanks.
Update 1
I checked again and found that some requests from App Engine did show up in the Cloud Run Load Balancer logs, but the timing is weird:
e.g.
Logs from GAE project
10:36:24.706 send request
10:36:29.648 deadline exceeded
Logs from Cloud Run project
10:36:35.742 reached load balancer
10:36:49.289 finished processing
Not sure why it took so long for the request to reach the Load Balancer...
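To make the gap concrete, here is a small sketch that only does the arithmetic on the four timestamps quoted above. `seconds_between` is a hypothetical helper written for this post, and it assumes all timestamps fall on the same day.

```python
from datetime import datetime


def seconds_between(start, end, fmt="%H:%M:%S.%f"):
    """Return end - start in seconds for two same-day wall-clock timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Timestamps copied from the logs above.
sent = "10:36:24.706"          # GAE: send request
deadline_hit = "10:36:29.648"  # GAE: deadline exceeded
reached_lb = "10:36:35.742"    # Cloud Run: reached load balancer
finished = "10:36:49.289"      # Cloud Run: finished processing

print("deadline fired after    %.3f s" % seconds_between(sent, deadline_hit))
print("reached the LB after    %.3f s" % seconds_between(sent, reached_lb))
print("processing in Cloud Run %.3f s" % seconds_between(reached_lb, finished))
```

So the request only showed up at the Load Balancer about 11 seconds after it was sent, roughly 6 seconds after urlfetch had already given up.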
Update 2
I am using GAE Standard located in US with the following settings:
runtime: python27
api_version: 1
threadsafe: true
automatic_scaling:
  max_pending_latency: 5s
inbound_services:
- warmup
- channel_presence
builtins:
- appstats: on
- remote_api: on
- deferred: on
...
The Cloud Run hosted API gateway I was trying to call is located in Asia. In front of it there is a Google Load Balancer whose type is HTTP(S) (classic).
Update 3
I wrote a simple script that periodically calls the Cloud Run endpoint directly using axios (with its timeout set to 5s). After a while some requests timed out. I checked the logs in my Cloud Run project and found 2 different phenomena:
For request A, much like what I described in Update 1, logs were found for both the Load Balancer and the Cloud Run revision.
(Time of CR revision log) - (time of LB log) > 5s, so I think this is an acceptable timeout.
But for request B, no logs were found at all.
So I guess the problem is with neither urlfetch nor GAE?
Deadline exceeded while waiting for HTTP response from URL is actually a DeadlineExceededError. The URL was not fetched because the deadline was exceeded. This can occur with either the client-supplied deadline (which you would need to change), or the system default if the client does not supply a deadline parameter.
When you make an HTTP request from App Engine, the request is mapped to URLFetch. URLFetch has its own deadline, which is configurable. See the URLFetch documentation.
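If raising the deadline is not enough and you also want to handle the error, a retry wrapper is one option. The following is only a sketch: `fetch_with_retry` is a hypothetical helper, not part of URLFetch, and the fetch callable and exception type are passed in so the snippet runs without the App Engine SDK. On App Engine you would presumably pass `urlfetch.fetch` (which accepts a `deadline` keyword) and `urlfetch_errors.DeadlineExceededError`.

```python
import time


def fetch_with_retry(fetch, url, deadline=10, attempts=3, backoff=0.5,
                     retry_on=(Exception,), sleep=time.sleep):
    """Call fetch(url, deadline=...) and retry with exponential backoff.

    fetch    -- callable taking (url, deadline=...); assumed, e.g. urlfetch.fetch
    retry_on -- tuple of exception types that should trigger a retry
    sleep    -- injectable so tests do not actually wait
    """
    for attempt in range(attempts):
        try:
            return fetch(url, deadline=deadline)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(backoff * 2 ** attempt)  # 0.5s, 1s, 2s, ...


# Hypothetical usage on App Engine (Python 2 runtime), not executed here:
# from google.appengine.api import urlfetch, urlfetch_errors
# result = fetch_with_retry(urlfetch.fetch, 'https://example.com/api',
#                           retry_on=(urlfetch_errors.DeadlineExceededError,))
```

Retrying is only safe for idempotent requests; as the logs in the question show, a request that timed out on the client may still reach the backend later.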
You can set a deadline for each URLFetch request. By default, the deadline for a fetch is 5 seconds. You can change this default by:
Including the following appengine.api.urlfetch.defaultDeadline setting in your appengine-web.xml configuration file (this file is used by the Java runtimes). Specify the timeout in seconds:
<system-properties>
  <property name="appengine.api.urlfetch.defaultDeadline" value="10"/>
</system-properties>
You can also adjust the default deadline by using the urlfetch.set_default_fetch_deadline() function. This function stores the new default deadline on a thread-local variable, so it must be set for each request, for example, in a custom middleware.
from google.appengine.api import urlfetch
urlfetch.set_default_fetch_deadline(45)
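Because the default is stored per thread, one way to apply it on every request is a small WSGI middleware. This is a sketch under assumptions: `FetchDeadlineMiddleware` is a hypothetical class, and the setter is passed in as a callable so the snippet does not depend on the App Engine SDK. On App Engine you would pass `urlfetch.set_default_fetch_deadline`.

```python
class FetchDeadlineMiddleware(object):
    """WSGI middleware that re-applies a default fetch deadline per request."""

    def __init__(self, app, set_deadline, deadline_seconds=45):
        self.app = app                        # the wrapped WSGI application
        self.set_deadline = set_deadline      # e.g. urlfetch.set_default_fetch_deadline
        self.deadline_seconds = deadline_seconds

    def __call__(self, environ, start_response):
        # Runs on the thread that serves this request, so the
        # thread-local default is in place before any fetch happens.
        self.set_deadline(self.deadline_seconds)
        return self.app(environ, start_response)


# Hypothetical usage on App Engine (Python 2 runtime), not executed here:
# from google.appengine.api import urlfetch
# app = FetchDeadlineMiddleware(app, urlfetch.set_default_fetch_deadline, 45)
```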
If your Cloud Run service is processing long requests, you can increase the request timeout. If your service doesn't return a response within the time specified, the request ends and the service returns an HTTP 504 error.
Update the timeoutSeconds attribute in the service YAML file as follows:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
spec:
  template:
    spec:
      containers:
      - image: IMAGE
      timeoutSeconds: VALUE
OR
You can update the request timeout for a given revision at any time by using the following command:
gcloud run services update [SERVICE] --timeout=[TIMEOUT]
If requests are terminating earlier with error code 503, you might need to update the request timeout setting for your language framework:
Node.js developers might need to update the server.timeout property via server.setTimeout (use server.setTimeout(0) to achieve an unlimited timeout), depending on the version you are using.
Python developers need to update Gunicorn's default timeout.
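For the Gunicorn case, a sketch of a `gunicorn.conf.py` is shown below. The worker and thread counts and the `main:app` module name are illustrative assumptions, not values from this answer. Gunicorn's `timeout` setting defaults to 30 seconds, and setting it to 0 disables the worker timeout so that the Cloud Run request timeout is the one that applies.

```python
# gunicorn.conf.py -- a Gunicorn config file is plain Python.
import os

# Cloud Run provides the port to listen on in the PORT environment variable.
bind = "0.0.0.0:{}".format(os.environ.get("PORT", "8080"))

# Illustrative sizing only (assumption); tune for your container.
workers = 1
threads = 8

# 0 disables Gunicorn's worker timeout (the default is 30 seconds).
timeout = 0

# Hypothetical start command: gunicorn -c gunicorn.conf.py main:app
```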
I am trying to invoke a Cloud Run service using Cloud Tasks as described in the docs here.
I have a running Cloud Run service. If I make the service publicly accessible, it behaves as expected.
I have created a Cloud Tasks queue, and I schedule the task with a local script that runs under my own account. The script looks like this:
from google.cloud import tasks_v2
client = tasks_v2.CloudTasksClient()
project = 'my-project'
queue = 'my-queue'
location = 'europe-west1'
url = 'https://url_to_my_service'
parent = client.queue_path(project, location, queue)
task = {
    'http_request': {
        'http_method': 'GET',
        'url': url,
        'oidc_token': {
            'service_account_email': 'my-service-account@my-project.iam.gserviceaccount.com'
        }
    }
}
response = client.create_task(parent=parent, task=task)
print('Created task {}'.format(response.name))
I see the task appear in the queue, but it fails and retries immediately. The reason for this (by checking the logs) is that the Cloud Run service returns a 401 response.
My own user has the roles "Service Account Token Creator" and "Service Account User". It doesn't have the "Cloud Tasks Enqueuer" role explicitly, but since I am able to create the task in the queue, I guess I have inherited the required permissions.
The service account "my-service-account@my-project.iam.gserviceaccount.com" (which I use in the task to get the OIDC token) has, amongst others, the following roles:
Cloud Tasks Enqueuer (Although I don't think it needs this one as I'm creating the task with my own account)
Cloud Tasks Task Runner
Cloud Tasks Viewer
Service Account Token Creator (I'm not sure whether this should be added to my own account - the one who schedules the task - or to the service account that should perform the call to Cloud Run)
Service Account User (same here)
Cloud Run Invoker
So I did a dirty trick: I created a key file for the service account, downloaded it, and impersonated the service account locally by adding an account to my gcloud config with the key file. Next, I ran
curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" https://url_to_my_service
That works! (By the way, it also works when I switch back to my own account)
Final tests: if I remove the oidc_token from the task when creating the task, I get a 403 response from Cloud Run! Not a 401...
If I remove the "Cloud Run Invoker" role from the service account and try again locally with curl, I also get a 403 instead of a 401.
If I finally make the Cloud Run service publicly accessible, everything works.
So, it seems that the Cloud Task fails to generate a token for the service account to authenticate properly at the Cloud Run service.
What am I missing?
I had the same issue; here was my fix:
Diagnosis: Generating OIDC tokens currently does not support custom domains in the audience parameter. I was using a custom domain for my Cloud Run service (https://my-service.my-domain.com) instead of the Cloud Run generated URL (found in the Cloud Run service dashboard), which looks like this: https://XXXXXX.run.app
Masking behavior: In the task being enqueued to Cloud Tasks, if the audience field of the oidc_token is not explicitly set, the target URL from the task is used as the audience in the request for the OIDC token.
In my case this meant that when enqueueing a task for the target https://my-service.my-domain.com/resource, the audience for generating the OIDC token was set to my custom domain URL, https://my-service.my-domain.com/resource. Since custom domains are not supported when generating OIDC tokens, I was receiving 401 (not authorized) responses from the target service.
My fix: Explicitly populate the audience with the Cloud Run generated URL, so that a valid token is issued. In my client I was able to globally set the audience for all tasks targeting a given service with the base URL: 'audience' : 'https://XXXXXX.run.app'. This generated a valid token. I did not need to change the URL of the target resource itself. The resource stayed the same: 'url' : 'https://my-service.my-domain.com/resource'
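As a rough sketch of what that looks like in the task payload: `build_http_task` below is a hypothetical helper (not part of the Cloud Tasks client) that only builds the dict, and the service account email in the usage example is a placeholder. The `service_account_email` and `audience` keys are the fields of the task's `oidc_token`.

```python
def build_http_task(url, service_account_email, audience=None, http_method='POST'):
    """Build a Cloud Tasks HTTP task dict with an optional explicit OIDC audience."""
    oidc_token = {'service_account_email': service_account_email}
    if audience is not None:
        # Without this key, the task's target URL is used as the audience.
        oidc_token['audience'] = audience
    return {
        'http_request': {
            'http_method': http_method,
            'url': url,
            'oidc_token': oidc_token,
        }
    }


task = build_http_task(
    url='https://my-service.my-domain.com/resource',   # custom domain stays the target
    service_account_email='my-service-account@my-project.iam.gserviceaccount.com',
    audience='https://XXXXXX.run.app',                  # Cloud Run generated URL
)
print(task['http_request']['oidc_token'])
```

The resulting dict would then be handed to `create_task` in the same way as in the question's script.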
More Reading:
I've run into this problem before when setting up service-to-service authentication: Google Cloud Run Authentication Service-to-Service
1. I created a private Cloud Run service using this code:
import os
from flask import Flask
from flask import request
app = Flask(__name__)
@app.route('/index', methods=['GET', 'POST'])
def hello_world():
    target = os.environ.get('TARGET', 'World')
    print(target)
    return str(request.data)
if __name__ == "__main__":
    app.run(debug=True,host='0.0.0.0',port=int(os.environ.get('PORT', 8080)))
2. I created a service account and granted it --role=roles/run.invoker; I will associate it with the Cloud Task:
gcloud iam service-accounts create SERVICE-ACCOUNT_NAME \
    --display-name "DISPLAYED-SERVICE-ACCOUNT_NAME"
gcloud iam service-accounts list
gcloud run services add-iam-policy-binding SERVICE \
    --member=serviceAccount:SERVICE-ACCOUNT_NAME@PROJECT-ID.iam.gserviceaccount.com \
    --role=roles/run.invoker
3. I created a queue:
gcloud tasks queues create my-queue
4. I created a test.py:
from google.cloud import tasks_v2
from google.protobuf import timestamp_pb2
import datetime
# Create a client.
client = tasks_v2.CloudTasksClient()
# TODO(developer): Replace these with your values.
project = 'your-project'
queue = 'your-queue'
location = 'europe-west2'  # app engine locations
url = 'https://helloworld/index'
payload = 'Hello from the Cloud Task'
# Construct the fully qualified queue name.
parent = client.queue_path(project, location, queue)
# Construct the request body.
task = {
    'http_request': {  # Specify the type of request.
        'http_method': 'POST',
        'url': url,  # The full url path that the task will be sent to.
        'oidc_token': {
            'service_account_email': "your-service-account"
        },
        'headers': {
            'Content-Type': 'application/json',
        }
    }
}
# Compute the time at which the task should run (60 seconds from now).
d = datetime.datetime.utcnow() + datetime.timedelta(seconds=60)
# Create Timestamp protobuf.
timestamp = timestamp_pb2.Timestamp()
timestamp.FromDatetime(d)
# Add the timestamp to the task.
task['schedule_time'] = timestamp
task['name'] = 'projects/your-project/locations/app-engine-loacation/queues/your-queue/tasks/your-task'
converted_payload = payload.encode()
# Add the payload to the request.
task['http_request']['body'] = converted_payload
# Use the client to build and send the task.
response = client.create_task(parent=parent, task=task)
print('Created task {}'.format(response.name))
5. I ran the code in Google Cloud Shell with my user account, which has the Owner role.
6. The response received has the form:
Created task projects/your-project/locations/app-engine-loacation/queues/your-queue/tasks/your-task
7. I checked the logs: success.
The next day I was no longer able to reproduce this issue. I can reproduce the 403 responses by removing the Cloud Run Invoker role, but I no longer get 401 responses with exactly the same code as the day before.
I guess this was a temporary issue on Google's side?
Also, I noticed that it takes some time before updated policies are actually in place (1 to 2 minutes).
For those like me, struggling through documentation and Stack Overflow while getting continuous UNAUTHORIZED responses on Cloud Tasks HTTP requests:
As was written in this thread, you should provide an audience for the oidcToken you send to Cloud Tasks, and make sure it exactly equals the URL of your resource.
For instance, if you have a Cloud Function named my-awesome-cloud-function and your task request URL is https://REGION-PROJECT-ID.cloudfunctions.net/my-awesome-cloud-function/api/v1/hello, you need to make sure that you set the audience to the function URL itself:
{
  serviceAccountEmail: 'SERVICE-ACCOUNT_NAME@PROJECT-ID.iam.gserviceaccount.com',
  audience: 'https://REGION-PROJECT-ID.cloudfunctions.net/my-awesome-cloud-function'
}
Otherwise it seems the full URL is used as the audience, which leads to an error.
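If your task URLs vary, the audience can be derived from the request URL instead of being hard-coded. A minimal sketch, assuming the URL layout shown above (host, then the function name as the first path segment); `function_audience` is a hypothetical helper, and `us-central1-my-project` is a made-up host used only for the example.

```python
from urllib.parse import urlsplit


def function_audience(task_url):
    """Return scheme://host/<first path segment> for a Cloud Function task URL."""
    parts = urlsplit(task_url)
    segments = [s for s in parts.path.split('/') if s]
    if not segments:
        raise ValueError('URL has no function name in its path: %r' % task_url)
    # Keep only the function name; drop any sub-path such as /api/v1/hello.
    return '{}://{}/{}'.format(parts.scheme, parts.netloc, segments[0])


url = ('https://us-central1-my-project.cloudfunctions.net'
       '/my-awesome-cloud-function/api/v1/hello')
print(function_audience(url))
```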
I have a simple service that publishes messages to a PubSub topic and occasionally get a "Deadline Exceeded" error message:
GaxError(RPC failed, caused by <_Rendezvous of RPC that terminated
with (StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded)>)
Python code:
from google.cloud import pubsub
pubsub_client = pubsub.Client()
topic = pubsub_client.topic("pubsub-topic")
data = data.encode('utf-8')
message_id = topic.publish(data)
It posts a few messages a second, from a Flask web app, and maybe one in a few hundred fail with that error.
Turns out I was creating too many PubSub clients!
I moved this part outside the function / route so that the topic and client are global variables and aren't initialized with each call:
pubsub_client = pubsub.Client()
topic = pubsub_client.topic("pubsub-topic")
(Right after instantiating Flask):
app = Flask(__name__)
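If you would rather not create the client at import time, a lazily initialized holder gives the same "create once, reuse" behavior. This is only a sketch: `LazySingleton` is a hypothetical helper, and the factory is passed in so the snippet makes no assumptions about the Pub/Sub library. In the app above you would pass something like `pubsub.Client`.

```python
import threading


class LazySingleton(object):
    """Create an object on first use and hand back the same instance afterwards."""

    def __init__(self, factory):
        self._factory = factory      # e.g. pubsub.Client (assumption)
        self._instance = None
        self._lock = threading.Lock()

    def get(self):
        if self._instance is None:
            with self._lock:
                # Re-check inside the lock so two threads cannot both create one.
                if self._instance is None:
                    self._instance = self._factory()
        return self._instance


# Stand-in factory so the example runs without any Google libraries.
created = []

def make_client():
    created.append(1)
    return object()

holder = LazySingleton(make_client)
first = holder.get()
second = holder.get()
print(first is second, len(created))
```

Each route would then call `holder.get()` instead of constructing a new client per request.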
I created a topic in my project (Project 1), and I have an app on Google App Engine which posts a message to this topic every minute.
I have a Google Compute Engine machine in a second project (Project 2) which is subscribed to this topic and receives the messages.
I did not give any access rights to the machine in Project 2, but even without them it managed to receive the messages. More precisely, I did not set any specific permissions on the topic I created.
My questions are:
1- Is this normal? Shouldn't the machine in Project 2 get a "forbidden access" error?
2- How can I restrict access to a certain topic?
Here is the code of my subscription part:
import httplib2
import base64
import pandas
import json
from apiclient import discovery
from oauth2client import client as oauth2client
from oauth2client.client import SignedJwtAssertionCredentials
from oauth2client.client import GoogleCredentials
def create_pubsub_client(http=None):
    credentials = GoogleCredentials.get_application_default()
    if not http:
        http = httplib2.Http()
    credentials.authorize(http)
    return discovery.build('pubsub', 'v1', http=http)
client = create_pubsub_client()
# You can fetch multiple messages with a single API call.
batch_size = 1
subscription_str = 'projects/<myproject1>/subscriptions/testo'
# Create a POST body for the Pub/Sub request
body = {
    # Setting ReturnImmediately to false instructs the API to wait
    # to collect the message up to the size of MaxEvents, or until
    # the timeout.
    'returnImmediately': False,
    'maxMessages': batch_size,
}
while True:
    resp = client.projects().subscriptions().pull(
        subscription=subscription_str, body=body).execute()
    received_messages = resp.get('receivedMessages')
    if received_messages is not None:
        ack_ids = []
        for received_message in received_messages:
            pubsub_message = received_message.get('message')
            if pubsub_message:
                # Process messages
                msg = base64.b64decode(str(pubsub_message.get('data')))
                treatment(msg)
                # Get the message's ack ID
                ack_ids.append(received_message.get('ackId'))
        # Create a POST body for the acknowledge request
        ack_body = {'ackIds': ack_ids}
        # Acknowledge the message.
        client.projects().subscriptions().acknowledge(
            subscription=subscription_str, body=ack_body).execute()
The ability of the machine in Project 2 to access the topic/subscription in Project 1 depends entirely on how the machine is authenticated. If it is authenticated with something that has permissions on both projects, e.g., your developer account, then it will be able to access the subscription on the topic in Project 1. That is normal.
If you want to restrict the access, create a service account in Project 1 and set the permissions on your topic and/or subscription to allow only that service account. You would do so in the Pub/Sub section of the Google Developers Console. Then, only machines authenticated via that service account will be able to access them.
We're having trouble publishing messages to a Google Cloud PubSub topic from Google AppEngine. Using the Application Default credentials works perfectly locally. But once it's deployed on Google AppEngine it gives the following error:
<HttpError 403 when requesting https://pubsub.googleapis.com/v1/projects/our-project-id/topics/our-topic:publish?alt=json returned "The request cannot be identified with a project. Please pass a valid API key with the request.">
I would assume that it will use the App Engine service account to access the PubSub API. Here is the code we used to create the credentials.
credentials = GoogleCredentials.get_application_default()
if credentials.create_scoped_required():
    credentials = credentials.create_scoped(['https://www.googleapis.com/auth/pubsub'])
http = httplib2.Http()
credentials.authorize(http)
pubsub_service = build('pubsub', 'v1', http=http)
The error is thrown when publishing the actual message to PubSub.
pubsub_service.projects().topics().publish(
    topic="projects/our-project-id/topics/our-topic",
    body={'messages': [{'data': base64.b64encode(request.get_data())}]}
).execute()
Note that the same flow works for API calls to BigQuery, so it's not a general Google API problem. It seems to be specific to PubSub...
It's a rare case of the service account without project id embedded in it. We fixed your service account and you should be good to go now. Sorry for the trouble.