Google Cloud PubSub Error: Deadline Exceeded - google-cloud-pubsub

I have a simple service that publishes messages to a PubSub topic and occasionally get a "Deadline Exceeded" error message:
GaxError(RPC failed, caused by <_Rendezvous of RPC that terminated
with (StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded)>)
Python code:
from google.cloud import pubsub
pubsub_client = pubsub.Client()
topic = pubsub_client.topic("pubsub-topic")
data = data.encode('utf-8')  # "data" is the message string to publish
message_id = topic.publish(data)
It posts a few messages a second from a Flask web app, and maybe one in a few hundred fails with that error.

Turns out I was creating too many PubSub clients!
I moved this part outside the function/route so that the topic and client are global variables and aren't re-initialized on each call:
pubsub_client = pubsub.Client()
topic = pubsub_client.topic("pubsub-topic")
(Right after instantiating Flask):
app = Flask(__name__)
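For reference, a minimal sketch of the resulting layout, using the same legacy google-cloud-pubsub API as the question (the /publish route and payload handling are illustrative, not from the original app):
from flask import Flask, request
from google.cloud import pubsub

app = Flask(__name__)

# Created once at import time and reused by every request.
pubsub_client = pubsub.Client()
topic = pubsub_client.topic("pubsub-topic")

@app.route('/publish', methods=['POST'])
def publish():
    # Reuse the module-level client; no per-request construction.
    message_id = topic.publish(request.get_data())
    return message_id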

Related

Google Cloud Tasks & Google App Engine Python 3

I am trying to work with the Google Cloud Tasks API.
In Python 2.7 App Engine standard you had this amazing library (deferred) that allowed you to easily assign workers to multiple tasks that could be completed asynchronously.
So in a webapp2 handler I could do this:
def create_csv_file(data):
    # do a bunch of work ...
    return

class MyHandler(webapp2.RequestHandler):
    def get(self):
        data = myDB.query()
        deferred.defer(create_csv_file, data)
Now I am working on the new Google App Engine Python 3 runtime and the deferred library is not available for GAE Py3.
Is Google Cloud Tasks the correct solution/replacement?
This is where I am at now... I've scoured the internet looking for an answer, but my Google powers have failed me. I've found some examples, but they are not very good, and they seem to assume you are creating/adding tasks from the gcloud console or locally; there are no examples of adding tasks from a front-end API endpoint.
class ExportCSVFileHandler(Resource):
    def get(self):
        create_task()
        return

class CSVTaskHandler(Resource):
    def post(self):
        # do a lot of work creating a csv file
        return

def create_task():
    client = tasks.CloudTasksClient(credentials='mycreds')
    project = 'my-project_id'
    location = 'us-east4'
    queue_name = 'csv-worker'
    parent = client.location_path(project, location)
    the_queue = {
        'name': client.queue_path(project, location, queue_name),
        'rate_limits': {
            'max_dispatches_per_second': 1
        },
        'app_engine_routing_override': {
            'version': 'v2',
            'service': 'task-module'
        }
    }
    queues = [the_queue]
    task = {
        'app_engine_http_request': {
            'http_method': 'GET',
            'relative_uri': '/create-csv',
            'app_engine_routing': {
                'service': 'worker'
            },
            'body': str(20).encode()
        }
    }
    # Use the client to build and send the task.
    response = client.create_task(parent, task)
    print('Created task {}'.format(response.name))
    return response
Yes, Cloud Tasks is the replacement for App Engine Task Queues. The API can be called from anywhere: locally, from App Engine, from external services, and even from gcloud. The samples show you how to do this locally, but you can easily replace your old taskqueue code with the new Cloud Tasks library.
Unfortunately, there is no deferred library for Cloud Tasks. There are ways around this: create separate endpoints for your task handlers and use App Engine routing to send each task to the right endpoint, or add metadata to the task body so that your handler can process the task request appropriately, as in the sketch below.
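For example, the metadata approach might look like the following. This is a sketch only, assuming the same flask-restful Resource style and google-cloud-tasks client era as the question; the defer helper and the /run-task URI are illustrative names. Note that create_task expects the queue path, not the location path, as its parent:
import json
from flask import request
from google.cloud import tasks

client = tasks.CloudTasksClient()
# create_task takes the queue path as parent (not the location path).
parent = client.queue_path('my-project-id', 'us-east4', 'csv-worker')

def defer(func_name, **kwargs):
    # Emulate deferred.defer by naming the work in the task body;
    # kwargs must be JSON-serializable (deferred used pickle instead).
    task = {
        'app_engine_http_request': {
            'http_method': 'POST',
            'relative_uri': '/run-task',
            'body': json.dumps({'func': func_name, 'kwargs': kwargs}).encode(),
        }
    }
    return client.create_task(parent, task)

class RunTaskHandler(Resource):
    def post(self):
        # Dispatch on the metadata in the task body.
        payload = json.loads(request.get_data())
        if payload['func'] == 'create_csv_file':
            create_csv_file(**payload['kwargs'])
        return '', 200
ExportCSVFileHandler's get() could then call defer('create_csv_file', data=...) in place of the old deferred.defer(create_csv_file, data).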

Google Pub/Sub access rights

I created a topic in my project (Project 1), and I have an app on Google App Engine which posts a message to this topic every minute.
I have a Google Cloud Compute Engine machine in a second project (Project 2) which is subscribed to this topic and receives the messages.
I did not give the machine in Project 2 any access rights, but even without them it managed to receive the messages. More precisely, I did not set any specific permissions on the topic I created.
My questions are:
1- Is this normal? Shouldn't the machine on Project 2 get a "forbidden access" error?
2- How can I restrict access to a certain topic?
Here is the code of my subscription part:
import httplib2
import base64
import pandas
import json
from apiclient import discovery
from oauth2client import client as oauth2client
from oauth2client.client import SignedJwtAssertionCredentials
from oauth2client.client import GoogleCredentials

def create_pubsub_client(http=None):
    credentials = GoogleCredentials.get_application_default()
    if not http:
        http = httplib2.Http()
    credentials.authorize(http)
    return discovery.build('pubsub', 'v1', http=http)

client = create_pubsub_client()
# You can fetch multiple messages with a single API call.
batch_size = 1
subscription_str = 'projects/<myproject1>/subscriptions/testo'
# Create a POST body for the Pub/Sub request
body = {
    # Setting returnImmediately to false instructs the API to wait
    # to collect up to maxMessages messages, or until the timeout.
    'returnImmediately': False,
    'maxMessages': batch_size,
}
while True:
    resp = client.projects().subscriptions().pull(
        subscription=subscription_str, body=body).execute()
    received_messages = resp.get('receivedMessages')
    if received_messages is not None:
        ack_ids = []
        for received_message in received_messages:
            pubsub_message = received_message.get('message')
            if pubsub_message:
                # Process the message
                msg = base64.b64decode(str(pubsub_message.get('data')))
                treatment(msg)
                # Get the message's ack ID
                ack_ids.append(received_message.get('ackId'))
        # Create a POST body for the acknowledge request
        ack_body = {'ackIds': ack_ids}
        # Acknowledge the messages.
        client.projects().subscriptions().acknowledge(
            subscription=subscription_str, body=ack_body).execute()
The ability of the machine in Project 2 to access the topic/subscription in Project 1 depends entirely on how the machine is authenticated. If it is authenticated with something that has permissions on both projects, e.g., your developer account, then you would be able to access the subscription on the topic in Project 1. That is normal.
If you want to restrict the access, create a service account in Project 1 and set the permissions on your topic and/or subscription to allow only that service account. You would do so in the Pub/Sub section of the Google Developers Console. Then, only machines authenticated via that service account will be able to access them.
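For instance, using the same discovery-based client as in the question, restricting the subscription to one service account could look like this (a sketch; the service account email is a placeholder, and roles/pubsub.subscriber grants only the ability to consume messages):
# Replace the subscription's IAM policy so that only the given
# service account from Project 2 may pull from it.
policy_body = {
    'policy': {
        'bindings': [{
            'role': 'roles/pubsub.subscriber',
            'members': ['serviceAccount:worker@project-1.iam.gserviceaccount.com'],
        }]
    }
}
client.projects().subscriptions().setIamPolicy(
    resource='projects/<myproject1>/subscriptions/testo',
    body=policy_body).execute()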

Google Cloud Pubsub authentication error from App Engine

We're having trouble publishing messages to a Google Cloud Pub/Sub topic from Google App Engine. Using the Application Default Credentials works perfectly locally. But once it's deployed on Google App Engine, it gives the following error:
<HttpError 403 when requesting https://pubsub.googleapis.com/v1/projects/our-project-id/topics/our-topic:publish?alt=json returned "The request cannot be identified with a project. Please pass a valid API key with the request.">
I would assume that it will use the App Engine service account to access the Pub/Sub API. Here is the code we used to create the credentials:
credentials = GoogleCredentials.get_application_default()
if credentials.create_scoped_required():
    credentials = credentials.create_scoped(['https://www.googleapis.com/auth/pubsub'])
http = httplib2.Http()
credentials.authorize(http)
pubsub_service = build('pubsub', 'v1', http=http)
The error is thrown when publishing the actual message to PubSub.
pubsub_service.projects().topics().publish(
    topic="projects/our-project-id/topics/our-topic",
    body={'messages': [{'data': base64.b64encode(request.get_data())}]}).execute()
Note that the same flow works when making API calls to BigQuery, so it's not a general Google API problem. It seems to be specific to Pub/Sub...
It's a rare case of a service account without the project ID embedded in it. We fixed your service account and you should be good to go now. Sorry for the trouble.

Google App Engine connection request timeout error

I am working on a GAE web app which shows movie-related data. To get the movie data I am using the OMDB API (http://www.omdbapi.com/). Below is the code snippet I use to connect to the API.
When I run it locally it works perfectly fine, but it doesn't work when deployed on GAE. It throws a connection timeout exception; I tried increasing the connection timeout period but that didn't work.
String URLstr = "http://www.omdbapi.com/?t=" + URLEncoder.encode(Request, "utf-8");
URL url = null;
URLConnection uc = null;
BufferedReader bf = null;
try {
    url = new URL(URLstr);
    uc = url.openConnection();
    uc.setConnectTimeout(15 * 1000);
    bf = new BufferedReader(new InputStreamReader(uc.getInputStream()));
} catch (IOException e) {
    throw new IllegalArgumentException(e.getMessage());
}
Is my code incorrect? Are there some restrictions with GAE that I missed?
Your code looks correct. I had the exact same issue with the OMDB API and Google App Engine a few weeks ago. I reached out to Brian, who runs the OMDB API, and I think it had to do with the App Engine IP range being blocked because of abuse.
I created the following webapp to figure out what external IP address URL fetches from my app were showing up as to the OMDB servers, and deployed it to GAE to get the public IP.
import webapp2
import logging
from google.appengine.api import urlfetch

class ifconfig(webapp2.RequestHandler):
    def get(self):
        url = "http://ipecho.net/plain"
        urlfetch.set_default_fetch_deadline(60)
        result = urlfetch.fetch(url)
        logging.debug("I think my external IP is %s " % result.content)
        self.response.write(result.content)

app = webapp2.WSGIApplication([
    ('/ifconfig', ifconfig)
])
In Google App Engine, I went to the instances tab, shut down the instance, and checked what external IP the new instance had. I did this several times, and in my case the external IPs all seemed to come from 107.178.195.0/24, so I provided this information to the OMDB API.
I guess this was in the banned IP block, and Brian was able to unblock that range. This fixed my issue and requests to the API started working again.
This possibly might have resolved the issue for you as well, but if it didn't, you might want to figure out what your public IP is and reach out to Brian to see if it's in an IP range that's being blocked.

GAE App gets socket errors when communicating with BigQuery

Our GAE Python application communicates with BigQuery using the Google API Client for Python (currently we use version 1.3.1) with the GAE-specific authentication helpers. Very often we get a socket error while communicating with BigQuery.
More specifically, we build a Python Google API client as follows:
1. bq_scope = 'https://www.googleapis.com/auth/bigquery'
2. credentials = AppAssertionCredentials(scope=bq_scope)
3. http = credentials.authorize(httplib2.Http())
4. bq_service = build('bigquery', 'v2', http=http)
We then interact with the BQ service and get the following error
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/gae_override/httplib.py", line 536, in getresponse
'An error occured while connecting to the server: %s' % e)
error: An error occured while connecting to the server: Unable to fetch URL: [api url...]
The error raised is of type google.appengine.api.remote_socket._remote_socket_error.error, not an exception that wraps the error.
Initially we thought that it might be timeout-related, so we also tried setting a timeout by altering line 3 in the above snippet to
3. http = credentials.authorize(httplib2.Http(timeout=60))
However, according to the log output of the client library, the API call takes less than 1 second to crash, and explicitly setting the timeout did not change the system behavior.
Note that the error occurs in various API calls, not just a single one, and usually this happens on very light operations, for example we often see the error while polling BQ for a job status and rarely on data fetching. When we re-run the operation, the system works.
Any idea why this might happen, and perhaps a best practice for handling it?
On App Engine, all HTTP(S) requests are routed through the urlfetch service.
Beneath that, the Google API Client for Python uses httplib2 to make HTTP(S) requests, and under the covers that library uses socket.
Since the error is coming from socket, you might try setting the timeout there:
import socket
timeout = 30
socket.setdefaulttimeout(timeout)
If we continue up the stack, httplib2 will pick up the socket-level default timeout.
http://httplib2.readthedocs.io/en/latest/libhttplib2.html
Moving further up the stack, you can set the timeout and retries for BigQuery:
timeout = 30000
num_retries = 5
query_request = bigquery_service.jobs()
query_data = {
    'query': query_var,
    'timeoutMs': timeout,
}
And finally you can set the timeout for urlfetch.
from google.appengine.api import urlfetch
urlfetch.set_default_fetch_deadline(30)
If you believe it's timeout-related, you might want to test each library/level to make sure the timeout is being passed correctly. You can also use a basic timer to see the results:
start_query = time.time()
query_response = query_request.query(
    projectId='<project_name>',
    body=query_data).execute(num_retries=num_retries)
end_query = time.time()
logging.info(end_query - start_query)
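If the raw socket error keeps surfacing, a small wrapper that retries idempotent calls itself may also help, since in older client versions num_retries only retries failures at the HTTP layer. This is a sketch, assuming GAE's shimmed socket module surfaces the remote-socket error as socket.error:
import socket
import time

def execute_with_retries(req, attempts=5):
    # Retry idempotent calls (e.g. polling a job status) on the raw
    # socket error, backing off exponentially between attempts.
    for attempt in range(attempts):
        try:
            return req.execute()
        except socket.error:
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)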
There are dozens of questions about timeout and deadline exceeded for GAE and BigQuery on this site so I wouldn't be surprised if you're hitting something weird.
Good luck!
