What is the 'capacity' parameter in Flink async I/O? - apache-flink

When using Flink's AsyncDataStream#unorderedWait, there's a parameter called 'capacity'. Quoting the official Flink documentation:
Capacity: This parameter defines how many asynchronous requests may be in progress at the same time. Even though the async I/O approach leads typically to much better throughput, the operator can still be the bottleneck in the streaming application. Limiting the number of concurrent requests ensures that the operator will not accumulate an ever-growing backlog of pending requests, but that it will trigger backpressure once the capacity is exhausted.
I don't quite get it: is the capacity for the whole job, or for each subtask?
Let's say my toy Flink app consumes from a Kafka topic, makes an HTTP request for each Kafka message, and, when it receives the HTTP response, sinks it to another Kafka topic.
In this example, the parallelism of the Kafka source is 50. If I set the 'capacity' to 10, what does that mean? Does it mean that the whole app will make at most 10 HTTP requests at the same time? Or 10 HTTP requests for each subtask (which results in at most 500 HTTP requests at the same time)?
Another question: what is the best practice for setting the 'capacity' in this scenario?
Many thanks!

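The capacity is per parallel instance (subtask) of the async I/O operator. So in your example, assuming the async operator also runs with a parallelism of 50, a capacity of 10 allows at most 50 × 10 = 500 concurrent HTTP requests.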
You may have to do some benchmarking experiments to see where it makes sense to balance the tradeoffs for your use case. If the capacity is too small then under load you're likely to create backpressure prematurely; if capacity is too large, then under load you're likely to overwhelm the external service, leading to timeouts or other errors.
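To make the per-subtask behaviour concrete, here is a minimal simulation in plain Python. It does not use Flink at all: each "subtask" is just an asyncio coroutine with its own semaphore, and run_subtask, simulate and peak_requests are made-up names for illustration. It only shows that a limit applied per instance adds up across instances.

```python
import asyncio


async def run_subtask(capacity, num_records, stats):
    """One parallel operator instance: at most `capacity` requests in flight."""
    slots = asyncio.Semaphore(capacity)

    async def fake_http_request():
        async with slots:  # waiting here is the analogue of backpressure
            stats["in_flight"] += 1
            stats["peak"] = max(stats["peak"], stats["in_flight"])
            await asyncio.sleep(0.01)  # stand-in for the HTTP round trip
            stats["in_flight"] -= 1

    await asyncio.gather(*(fake_http_request() for _ in range(num_records)))


async def simulate(parallelism, capacity, records_per_subtask):
    """Run `parallelism` independent subtasks and track total in-flight requests."""
    stats = {"in_flight": 0, "peak": 0}
    await asyncio.gather(*(
        run_subtask(capacity, records_per_subtask, stats)
        for _ in range(parallelism)
    ))
    return stats["peak"]


def peak_requests(parallelism=50, capacity=10, records_per_subtask=40):
    """Return the highest number of simultaneous requests across all subtasks."""
    return asyncio.run(simulate(parallelism, capacity, records_per_subtask))
```

With the numbers from the question, peak_requests() should come out at 500, not 10. The same arithmetic is a reasonable starting point for benchmarking: pick the total number of concurrent requests the HTTP service can take, then divide by the operator's parallelism.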

Related

App Engine latency while in the middle of a request processing

Why would there be any latency in App Engine in the middle of processing a request? This only happens at times and randomly occurs at different places in the request handling with a latency of around 3 or more seconds after starting to process a request.
The usual suspect is your handler reaching out for some resources, either from GAE APIs (datastore, memcache, etc), other GCP API/infra (cloud storage, machine learning, big query, etc) or an external/3rd party service/URL.
Most, if not all such interactions can occasionally encounter peak response times way longer than average for various possible reasons (or combinations of reasons), for example:
temporary outages of the service being accessed or of the networking layer ensuring connectivity to it
retries at networking or application layers due to communication errors/packet loss
service VMs/instances needed to be launched from scratch during (re)starts or even during scaling up
normal operation conditions which require more time, like datastore transaction retries due to collisions
If the occurrence rate becomes unacceptable, an investigation would need to be done to identify which of these external accesses is responsible and what conditions cause the delays, and maybe to find some way to prevent or reduce the impact of the occurrences.
Of course, there may be other reasons as well.
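One way to start such an investigation is to time each external access and log the slow ones. This is only a sketch using the standard library: timed, slow_calls and the 1-second threshold are made up for illustration and are not part of any App Engine API.

```python
import time
from contextlib import contextmanager

SLOW_CALL_SECONDS = 1.0  # assumed threshold; tune to your latency budget


@contextmanager
def timed(label, timings, clock=time.monotonic):
    """Record how long the wrapped block took under `label`."""
    start = clock()
    try:
        yield
    finally:
        # Runs even if the wrapped call raises, so failures are timed too.
        timings.append((label, clock() - start))


def slow_calls(timings, threshold=SLOW_CALL_SECONDS):
    """Return the labels whose recorded duration exceeded the threshold."""
    return [label for label, seconds in timings if seconds > threshold]
```

In a handler you would wrap each datastore, memcache or URL fetch call in `with timed("datastore.get", timings):` and log slow_calls(timings) at the end of the request, which narrows down where the extra seconds go.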

Does App Engine Flexible for Python support concurrent requests?

From the documentation on how GAE Flexible handles requests, it says that "An instance can handle multiple requests concurrently" but I don't know what this exactly means.
Let's say my application can process a single request every 60 seconds.
After starting to process the initial request, will another request (or 3) that occur say 30 seconds after (so halfway done with the first request), be handled by the same instance, or will it trigger autoscaling and spin up more instances to handle those new requests? This situation assumes that CPU utilization for the first request is still below the scaling CPU-utilization threshold.
I'm worried that because it takes my instance 60 seconds to process a single request and I will be receiving multiple requests at a time, that I'll be inefficiently triggering autoscaling even if there is enough processing power to handle additional requests on the same instance. Is this how it works? I would ideally like to be able to multi-thread my processing and accept additional requests on the same instance while still under the CPU utilization threshold.
The documentation for concurrent requests is scarce for the Flexible environment unlike the Standard environment so I want to be sure.
Perhaps 'number of workers' is the config setting you're looking for:
https://cloud.google.com/appengine/docs/flexible/python/runtime#recommended_gunicorn_configuration
Gunicorn uses workers to handle requests. By default, Gunicorn uses sync workers. This worker class is compatible with all web applications, but each worker can only handle one request at a time. By default, gunicorn only uses one of these workers. This can often cause your instances to be underutilized and increase latency in applications under high load.
And it sounds like you've already seen that you can specify the cpu utilization threshold:
https://cloud.google.com/appengine/docs/flexible/python/reference/app-yaml#automatic_scaling
You can also use something other than gunicorn if you prefer. Here's one of their examples where they use Honcho instead:
https://github.com/GoogleCloudPlatform/getting-started-python/blob/master/6-pubsub/app.yaml
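For reference, a Gunicorn config file is plain Python, so the worker settings could look something like the sketch below. The setting names (workers, worker_class, threads, timeout) are real Gunicorn settings, but every value here is an assumption to benchmark, not a recommendation from the App Engine docs.

```python
# gunicorn.conf.py (hypothetical values; benchmark before relying on them)
import multiprocessing

# Several worker processes, so one slow request does not block the instance.
# (2 x cores) + 1 is the rule of thumb from the Gunicorn documentation.
workers = multiprocessing.cpu_count() * 2 + 1

# Threaded workers let each process serve several requests concurrently,
# unlike the default sync worker, which handles one request at a time.
worker_class = "gthread"
threads = 4

# Gunicorn's default timeout is 30 seconds, shorter than the 60-second
# requests described above; it matters most for sync workers.
timeout = 120
```

With these settings one instance can work on up to workers * threads requests at once, so new requests arriving mid-request are served on the same instance until the CPU-utilization threshold triggers scaling.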

What is the difference between appengine datastore timeout errors 5 and 11?

I'm trying to speed up a Google App Engine request handler that has a big datastore PutMulti call (500 entities) by splitting it into batches of entities and running concurrent goroutines to send smaller PutMulti calls (100 entities each).
Before this, I had often been getting the datastore error Call error 11: Deadline exceeded (timeout) from my PutMulti calls going over the deadline when I tested the handler on many concurrent requests. After the parallelization, the handler did speed up, but I still occasionally got that error and also another type of error, API error 5 (datastore_v3: TIMEOUT): The datastore operation timed out, or the data was temporarily unavailable.
Is this error 5 due to contention in the datastore, and what is the difference between errors 5 and 11?
These errors come from two different places. The first, the call error, is a local error caused by a timeout in the RPC client. It indicates that there was a timeout waiting for completion of an RPC. The default RPC timeout in google.golang.org/appengine is 60 seconds.
The second error comes from the service side. This error indicates that a timeout occurred performing operations within datastore. Some of these operations have timeouts much shorter than 60s, and typically this may indicate contention.
A possibly simpler way to understand the differences is that you will find that if you make a single multi operation with a very large number of changes, you can trigger the first timeout with ease. If you create a significant number of concurrent operations against a single key or small set of keys, you will more readily trigger the latter. As timeouts are general indicators of saturation of shared resources, there are of course many ways and combinations to generate them. In general, one will want to retry operations as appropriate, and also size operations appropriately, as well as aggregating operations on hot keys as best as possible to reduce the chance of contention related issues. As others have suggested, the python and java docs have some examples of this already.
You may wish to make use of https://godoc.org/google.golang.org/appengine#IsTimeoutError and if you need to increase the timeout for the first error class, you may be able to adjust the context deadline, see the methods here: https://godoc.org/golang.org/x/net/context#WithDeadline Note: you will not be able to extend the deadline beyond that of a request deadline, however, if you are running in tasks or VMs you can extend to long deadlines.
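The "size operations appropriately and retry" advice can be sketched as follows. This is Python rather than Go, and it is not the datastore API: TransientTimeout, put_in_batches and the put_multi callable are hypothetical stand-ins for whatever timeout error and batch-write call your client library exposes.

```python
import random
import time


class TransientTimeout(Exception):
    """Hypothetical stand-in for a datastore timeout (error 5 or 11)."""


def chunked(items, size):
    """Split `items` into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def call_with_retry(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Run `operation`, retrying on TransientTimeout with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientTimeout:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            # Double the delay each attempt and add jitter to spread retries out.
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))


def put_in_batches(entities, put_multi, batch_size=100, sleep=time.sleep):
    """Write `entities` in batches, retrying each batch independently."""
    for batch in chunked(entities, batch_size):
        call_with_retry(lambda b=batch: put_multi(b), sleep=sleep)
```

Retrying per batch means a single timeout only repeats 100 writes instead of all 500. Backoff helps most with the contention-style error 5, since immediate retries against the same hot keys tend to collide again.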
The first error you see may be just the timeout in normal operation, the 2nd is likely because of write contention. More on this: Handling Datastore Errors https://cloud.google.com/appengine/articles/handling_datastore_errors

What is the benefit / usage of a AppEngine remote procedure call

I'm trying to grasp the concept of RPCs on App Engine. When or why would I need to use one, and what are the benefits?
Do they help with staying within your quota?
Are they more efficient?
When you use the datastore, memcache, URL Fetch, or many of the other services, you are implicitly creating and using an RPC.
Some methods take an optional RPC argument. You can create an RPC with custom settings, such as a deadline, to give you more control over the call. An example of when setting a deadline on datastore operations can be useful is deferring a write to the task queue on a timeout type failure. Setting a lower deadline will ensure you have enough time to try again or insert a task.
An RPC on App Engine is useful when you want to do a URL fetch and you want to do other things while you're waiting for the response to be completed.
Let's say your URL fetch will take 1 second to complete and you have 'other' processing to do for 1 second, which you can do while you're waiting. You can launch an RPC call, do the 'other' processing, and when the RPC fetch is finished you can continue the request. The request will take a total of 1 second (plus overhead) with the RPC, as opposed to the conventional approach, which would take 2 seconds.
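The overlap described above can be illustrated with a small standard-library sketch. It uses a thread pool rather than App Engine's RPC objects, and slow_fetch and other_processing are made-up stand-ins that just sleep, so treat it as an analogy for the pattern, not as the urlfetch API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_fetch(seconds):
    """Stand-in for a URL fetch that takes `seconds` to respond."""
    time.sleep(seconds)
    return "response"


def other_processing(seconds):
    """Stand-in for work that does not depend on the fetch result."""
    time.sleep(seconds)
    return "processed"


def handle_request(seconds=1.0):
    """Start the fetch, do other work meanwhile, then collect the result."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(slow_fetch, seconds)  # start the call, do not wait
        local = other_processing(seconds)           # overlaps with the fetch
        response = pending.result()                 # block only when needed
    return response, local, time.monotonic() - start
```

The elapsed time returned by handle_request() should be close to 1 second rather than 2, because the fetch and the other processing run at the same time.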

Async urlfetch on App Engine

My app needs to do many datastore operations on each request. I'd like to run them in parallel to get better response times.
For datastore updates I'm doing batch puts so they all happen asynchronously which saves many milliseconds. App Engine allows up to 500 entities to be updated in parallel.
But I haven't found a built-in function that allows datastore fetches of different kinds to execute in parallel.
Since App Engine does allow urlfetch calls to run asynchronously, I created a getter URL for each kind which returns the query results as JSON-formatted text. Now my app can do async urlfetch calls to these URLs which could parallelize the datastore fetches.
This technique works well with small numbers of parallel requests, but App Engine throws errors when attempting to run more than 5 or 10 of these urlfetch calls at the same time.
I'm only testing now, so each urlfetch is the identical query; since they work fine in small volumes but start failing with more than a handful of simultaneous requests, I'm thinking it must have something to do with the async urlfetch calls.
My questions are:
Is there a limit to the number of urlfetch.create_rpc() calls that can run asynchronously?
The synchronous urlfetch.fetch() function has a 'deadline' parameter that will allow the function to wait up to 10 seconds for a response before failing. Is there any way to tell urlfetch.create_rpc() how long to wait for a response?
What do the errors shown below mean?
Is there a better server-side technique to run datastore fetches of different kinds in parallel?
  File "/base/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 501, in get_result
    return self.__get_result_hook(self)
  File "/base/python_lib/versions/1/google/appengine/api/urlfetch.py", line 331, in _get_fetch_result
    raise DownloadError(str(err))
InterruptedError: ('The Wait() request was interrupted by an exception from another callback:', DownloadError('ApplicationError: 5 ',))
Since App Engine allows async urlfetch calls but does not allow async datastore gets, I was trying to use urlfetch RPCs to retrieve from the datastore in parallel.
The lack of async datastore gets is an acknowledged issue:
http://code.google.com/p/googleappengine/issues/detail?id=1889
And there's now a third-party tool that allows async queries:
http://code.google.com/p/asynctools/
"asynctools is a library allowing you to execute Google App Engine API calls in parallel. API calls can be mixed together and queued up and then all are kicked off in parallel."
This is exactly what I was looking for.
While I am afraid that I can't directly answer any of the questions that you pose, I think that I ought to tell you that all of your research along these lines may not lead you to a working solution for your problem.
The problem is that datastore writes take much longer than reads, so if you find a way to max out the number of reads that can happen, your code will very likely run out of time long before it is able to make the corresponding writes to all of the entities that you have read.
I would seriously consider rethinking the design of your datastore classes to reduce the number of reads and writes that needs to happen, as this will quickly become a bottleneck for your application.
Have you considered using TaskQueues to do the work of queuing the requests to be executed later?
If the task returns a status code outside the 200-299 range (a 4xx, for example), it will be considered failed and will be retried later, so you could pass the error back up and have the task queue handle retrying the requests until they succeed. Also, with some experimentation with bucket sizes and rates, you can probably have the Task Queue slow down the requests enough that you don't max out the database.
There's also a nice wrapper (deferred.defer) which makes things even simpler - you can make a deferred call to (almost) any function in your app.
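To get a feel for how bucket size and rate interact before experimenting with real queue settings, a token bucket can be simulated in a few lines. This is a generic sketch, not the Task Queue implementation: the TokenBucket class and its try_take method are made up for illustration.

```python
class TokenBucket:
    """Allow bursts of up to `bucket_size` tasks, refilled at `rate` per second."""

    def __init__(self, rate, bucket_size):
        self.rate = rate
        self.bucket_size = bucket_size
        self.tokens = float(bucket_size)  # start full, so an initial burst is allowed
        self.last = 0.0                   # time of the previous check, in seconds

    def try_take(self, now):
        """Return True and consume a token if a task may run at time `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill for the time that passed, never exceeding the bucket size.
        self.tokens = min(self.bucket_size, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A small bucket size with a low rate keeps the number of datastore operations per second nearly flat, while a larger bucket lets short bursts through before the rate limit takes over.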
