Use Google Cloud Functions to speed up GAE app - google-app-engine

I have a GAE standard Python app that does some fairly compute-intensive processing. I need to complete the processing within the 60 second request time limit, and ideally I'd like to do it faster for a better user experience.
Splitting the work across multiple threads doesn't seem like a good solution, because the threads would likely run on the same CPU and thus wouldn't give a speedup.
I was wondering if Google Cloud Functions (GCF) could be used in a similar manner as threads. For example, if I create a GCF to do the processing, split my work into 10 chunks, and make 10 GCF calls in parallel, can I expect to get results 10x faster? (aside from latency and GCF startup costs)

Each function invocation runs in its own server instance, and a function will scale up to 1000 instances to handle concurrent requests in parallel. So yes, you can do this, if you are willing to potentially pay the cold start cost of each server instance as it's allocated for its first request.
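For illustration, here's a minimal sketch of that fan-out from a GAE standard Python app, using the async urlfetch API. The function URL, chunk format, and chunk count are hypothetical placeholders:

```python
# Sketch: fan out one request per chunk to an HTTP-triggered Cloud Function.
# GCF_URL and the JSON chunk format are hypothetical -- adjust to your deployment.
import json
from google.appengine.api import urlfetch

GCF_URL = 'https://REGION-PROJECT.cloudfunctions.net/process_chunk'  # placeholder

def process_in_parallel(work_items, num_chunks=10):
    chunk_size = (len(work_items) + num_chunks - 1) // num_chunks
    chunks = [work_items[i:i + chunk_size]
              for i in range(0, len(work_items), chunk_size)]

    # Issue all fetches asynchronously so the chunks run concurrently...
    rpcs = []
    for chunk in chunks:
        rpc = urlfetch.create_rpc(deadline=60)
        urlfetch.make_fetch_call(rpc, GCF_URL, payload=json.dumps(chunk),
                                 method=urlfetch.POST,
                                 headers={'Content-Type': 'application/json'})
        rpcs.append(rpc)

    # ...then block for the results; total wall time is roughly the slowest
    # chunk plus any cold starts.
    return [json.loads(rpc.get_result().content) for rpc in rpcs]
```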

If you're able to split the workload into smaller chunks that you'd launch in parallel via separate (external) requests, I suspect you'd get better performance (and cost) by using GAE itself (maybe in a separate service) instead of CFs (see the sample config after this list):
GAE standard environment instances can have higher CPU speeds - a B8 instance runs at 4.8 GHz, while the maximum CF CPU speed is 2.4 GHz
you have better control over the GAE scaling configuration and instance-start penalties
I suspect networking delays would be at least the same, if not better, on GAE, since you're not crossing into another product's infrastructure (unsure though)
GAE costs would likely be smaller, since you pay per instance hour (regardless of how many requests the instance handles), not per request/invocation
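For example, a hypothetical app.yaml for such a separate service (names and limits are placeholders; note that B-class instances require basic or manual scaling):

```yaml
# Hypothetical "worker" service on a faster B8 instance class.
service: worker
runtime: python27
api_version: 1
threadsafe: true

instance_class: B8
basic_scaling:
  max_instances: 10
  idle_timeout: 5m

handlers:
- url: /.*
  script: worker.app   # placeholder WSGI app
```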

Related

What is a Google App Engine instance?

I am trying to estimate the monthly costs of using GAE for an in-app store, and I don't really understand what an instance is and what I can do within one instance.
Can I just have one instance with multiple threads to deal with multiple clients? And since I have 28 hours of free instance time per app per day (http://cloud.google.com/pricing/), does that mean I would not pay for my server app running all the time?
An instance is an instance of a virtual server, running your code, that is able to serve requests from clients. This is usually done in parallel (goroutines, Java threads, Python threads with 2.7) for the most efficient use of available resources.
Response time depends on what you're doing in your code, and it's usually IO-bound. A waterfall of serial database lookups takes longer than a single multiget and perhaps an async write.
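As a rough illustration of that difference, here's a sketch using the Python ndb API (the keys and entity are hypothetical):

```python
# Sketch: serial "waterfall" lookups vs. one batched multiget (ndb API).
from google.appengine.ext import ndb

def get_serial(keys):
    # Slow: one datastore round trip per key.
    return [key.get() for key in keys]

def get_batched(keys):
    # Fast: a single batched RPC for all keys.
    return ndb.get_multi(keys)

def save_async(entity):
    # Fire-and-continue write; call .get_result() on the future later if needed.
    return entity.put_async()
```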
Part of the deal with GAE is that Google handles the elasticity for you. If there are a lot of connections waiting, new instances will start as needed (until your quota is exhausted). That means it can be difficult to estimate cost upfront, because you don't know exactly how efficient your code is and how much resources you'll need. I recommend a scheme where more usage means more income, and income per request is higher than cost per request. :)
You can tweak settings, saying you want requests to wait in queue, or always have a couple of spare instances ready to serve new requests, which will affect cost for you and response times for users.
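For example, a sketch of those knobs in app.yaml form (the values are illustrative, not recommendations):

```yaml
# Hypothetical automatic_scaling tuning: keep spare instances warm and
# let requests queue briefly before a new instance is started.
automatic_scaling:
  min_idle_instances: 2        # spare instances kept ready (adds cost)
  min_pending_latency: 500ms   # let requests wait a bit before scaling up
  max_pending_latency: 2s
```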
In an IaaS scenario you could say that you will use five instances and that's the cost, but in reality you might need only one at night local time and 25 the rest of the day; with a fixed instance count, your users would most likely see dropped connections or otherwise have a negative user experience during the peaks.
A free instance is normally able to handle test traffic during development without exhausting the quota.
Well, App Engine may decide you need more than one instance running to handle the requests, and so it will start another one. You won't be able to limit it to a single running instance. In fact, it's sometimes unclear why App Engine starts another instance when requests seem low, but it will do so if it decides it needs another warm instance ready to handle requests because the serving instance(s) are too near their limit.

Google AppEngine sending all requests to same instance

Lately, I have seen GAE taking much, much longer to process requests than it did just a week ago. Nothing changed in my code, but GAE now takes 4000-12000ms to respond to requests. What makes it worse is that I have plenty of instances available with 0 requests on them.
Has anyone else seen this happen?
What can I do to fix it? I have gone as far as to spin up 15 extra instances (and paid through the nose for them), but nothing seems to send requests to the other idle instances reliably.
My bill has gone from 70-90c/day to $5-8/day without any code change or increase in traffic. In fact, I am losing traffic because of the huge latency.
QPS*    Latency*    Requests  Errors  Age       Memory       Availability
0.000   0.0 ms      1378      0       10:10:09  57.9 MBytes  Dynamic
0.000   0.0 ms      1681      0       15:39:57  57.2 MBytes  Dynamic
0.017   9687.0 ms   886       0       10:19:10  56.7 MBytes  Dynamic
I recommend installing AppStats to get a picture of what's taking so long in each request. I'd guess that you're having some contention issues or large numbers of reads/writes caused by some new data configuration.
The idle instances won't help decrease latency - it looks like every request takes a long time, and with less than one request per minute (in this sample, anyway), 10-second requests could run serially on the same instance.
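For reference, Appstats can be enabled in a Python 2.7 app with a small appengine_config.py hook; the recorded RPC timelines are then browsable under /_ah/stats:

```python
# appengine_config.py -- wrap every WSGI app in the Appstats recorder.
# (Serving the /_ah/stats UI also needs "builtins: - appstats: on" in app.yaml.)
from google.appengine.ext.appstats import recording

def webapp_add_wsgi_middleware(app):
    return recording.appstats_wsgi_middleware(app)
```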
We have a similar problem in our app. In our case, we are under the impression that GAE's scheduler did a poor job of balancing requests across existing instances.
In some cases, the scheduler decided to spin up new instances instead of reusing existing ones. Since spinning up a new instance took anywhere from 5 to over 45 seconds, I suspect this might be what happened to you.
Try to investigate the following and see if it helps you:
Make sure your app has threadsafe enabled so that it can process concurrent requests. You configure this in your app.yaml if you are using Python, or in your appengine-web.xml if you use Java (see the sketch after these suggestions). Of course, you also need to make sure the code in your app actually is thread-safe.
In your application settings, if it is still set to automatic, change the minimum pending latency to a non-automatic setting. I'd suggest around 10 seconds for now, but you can experiment later to find the setting that suits you best. This forces the scheduler to wait a certain time for an existing instance to become available before spinning up a new one.
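A sketch of both suggestions in app.yaml form for a Python 2.7 app (the 10-second value is just the starting point suggested above):

```yaml
runtime: python27
threadsafe: true            # allow concurrent requests on one instance

automatic_scaling:
  min_pending_latency: 10s  # wait for an existing instance before starting a new one
```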
Now, to answer your original question regarding sending all requests to same instance, as far as I know there is no way to address a specific front-end instance in order to direct the requests to that particular instance.
What you could do is migrate your app to backend instances instead of regular frontend instances. Backends provide a way to directly target any particular instance within them. You could deploy your app on a single backend to have more control over the number of instances you spawn. And since using a backend bypasses the scheduler, you would not encounter latencies caused by new instances spinning up.
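For illustration, a hypothetical backends.yaml; with the legacy Backends API, instance N of a backend named worker was individually addressable at http://N.worker.YOUR_APP_ID.appspot.com:

```yaml
# Hypothetical backend definition: five resident B4 instances.
backends:
- name: worker
  class: B4
  instances: 5
```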
The major drawback of this approach is that you lose the auto-scaling benefit of frontend instances. But judging from your low daily bill, I think scalability is not yet a major concern at your app's current scale.

How many parallel requests can one Google App Engine Python instance handle?

How many threads/requests can one Google App Engine Python instance handle in parallel? I'm using python27 runtime and threadsafe option is enabled (true).
Are there any restrictions or conditions which could limit parallelism?
For clarification: this isn't about the Java or Python GAE SDKs.
The amount of parallelism you get is highly dependent on your application's workload. If your requests are CPU-bound, you'll only serve one request at a time. On the other hand, if your requests are RPC-bound, you could potentially serve tens of concurrent requests. However, there are two relevant limits:
1. Instance size. The default 600MHz F1 instance can only serve so many concurrent requests before hitting the CPU limit, overloading your instance and causing a significant increase in latency.
2. There is a hard limit on concurrent requests. It's implementation dependent and subject to change, but at this moment on python27, it's 8.
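If you're using a modules-style configuration, the related knob looks like this (a sketch; the runtime's hard limit described above still caps the effective value):

```yaml
automatic_scaling:
  max_concurrent_requests: 8   # ask the scheduler to send up to 8 requests at once
```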
Although I get millions of hits per day, my QPS is around 2 and my requests complete in under a second.
So don't expect too much parallelism: it's 2-3 concurrent requests at most in my case.
(It's impossible to determine a QPS value for your use case; this is just my use case.)

How fast is Google App Engine MapReduce?

How much of a compute-intensive gain can one expect on GAE MapReduce? The scenario of interest to me is compute intensive, so for example: multiplying a trillion random floats in a single threaded single core application. Then imagine 1000 MapReduce workers multiplying a billion random numbers each and announcing "finished" when all workers have finished. Assume billing is enabled if that matters. (It might not).
Edit: A commenter asked for clarification, so the title has been revised. If the task takes 50,000 seconds single-threaded, and an alternative implementation with 1000 MapReduce workers finishes after 500 seconds, then the performance gain is 100 times. 1000 workers for a 100x gain is only slightly disappointing, but so be it for this example. How can I finish sooner? Can I ask for 10,000 workers? This question may have to do with limits and quotas; assume an adequate budget. Does MapReduce's compute-intensive performance gain approach an asymptote, and if so, what is the gain at that asymptote?

There was also information in the comment about MapReduce being suitable for large amounts of data generated by a user-facing URL; however, my question is not about a Datastore-intensive application's performance versus the same application rewritten for MapReduce. Datastore activity will be minimal in this compute-intensive scenario. I realize there will always be some Datastore activity in any MapReduce application, but since this is a compute-intensive scenario, the Datastore activity and the size of the Datastore entities will not have a big influence on the performance gain; the task will use the Datastore for less than 1% of the elapsed time. Nor does the scenario involve a large amount of communication bandwidth (other than the minimum necessary to hit the task-queued URLs that MapReduce uses).

The question is about comparing a compute-intensive single-threaded non-MapReduce task's elapsed time to the same task's elapsed time on MapReduce, which is inherently multi-threaded given there are multiple workers. I use the word "task" generically; in other words, task means work. The gain might (but not necessarily) be a function of the number of workers, hence the 1000 workers in the example.
It's not clear exactly what you're asking here. Are you asking how efficient it is? How cheap it is? How fast it is?
In general, App Engine is designed for serving user-facing sites, and the App Engine mapreduce API exists to assist with that - processing large amounts of data generated by the user-facing site. If you have a large amount of data that's hosted outside App Engine, and you want to do some sort of large-scale data processing on it, App Engine is probably not the tool for you.
Regarding performance, you can expect each worker to execute tasks as fast as they would be if you were executing them serially, so your items-per-second is roughly the number of workers multiplied by the regular rate - there's relatively little overhead. There can be some delay at the end, though, when different workers finish at different times, and how much this is depends on how good a job mapreduce does of sharding your data. With datastore input, this used to be fairly poor, but it's a lot better now.
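As a back-of-envelope model of that shape (all numbers are illustrative, not measured):

```python
# Rough model: near-linear scaling, inflated by uneven shard finish times,
# plus a fixed ramp-up/cold-start tail. The constants are made up.
def estimated_elapsed(serial_seconds, workers,
                      startup_overhead=30.0, straggler_factor=1.2):
    return (serial_seconds / float(workers)) * straggler_factor + startup_overhead

print(estimated_elapsed(50000.0, 1000))  # ~90s rather than the ideal 50s
```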
As to how many mappers you can have, that depends on a number of things: Whether or not your app has billing enabled, how much other traffic your app gets, and how long your mapper tasks take per element. The only real way to determine this is to experiment a bit.

is Google App Engine-MapReduce my best bet for a massively parallel solution in a cloud?

Is Google App Engine-MapReduce my best bet for a massively parallel solution in a cloud? My problem takes hours multi-threaded on a 4 core PC. I'd say 600 minutes might do. I would prefer 1000 servers get it done in 36 seconds. Switching from 4 core threading to 1000 server processing is eminently doable in my app. In fact, I can already send 1000 small jobs to 4 cores but it's not going to get done sooner than 4 big jobs to 4 cores given that I still have only 4 cores. (My dataset is small so Map-Reduce, which was designed for large datasets, might have a different sweet-spot than my type of compute-bound problem.)
I think I can get this done if I have 1000 simultaneous URL fetches, but as you may know, Google limits you to 10 simultaneous requests. It seems Google is actively discouraging outsiders from putting massively parallel solutions on their infrastructure.
I started looking into Google App Engine because upon deployment there will be very few users, and App Engine appeared to have fine-grained costs - a feature I really like. My impression was that Amazon EC2 would be more work, and also that its costs were more likely to be chunky. Given that I'm a home-based business, I don't want to pay anything more than a nominal amount in the early months, when I don't expect a lot of visitors to my website. Maybe they will never visit.
In general, where do people turn to for massively parallel (compute-bound) problems that ought to be served by a cloud?
For compute bound tasks, EC2 is often better than App Engine. App Engine is focused on serving web requests, not pure number crunching. It is not designed to go from 0 requests this minute to 1000 requests the next minute and back to 0 requests the minute after that. In fact, one of its features is that you generally don't need explicit control over how many instances are running at once. Also, long running jobs are not possible, though for many tasks you can use Task Queues to chain jobs together. I think the current limit on background tasks is 10 minutes.
EC2 does have a super low tier of service that you can get for free. EC2 lets you explicitly bring servers up and down, but I think the smallest increment you can pay for is 1 hour.
Of course, if you want to literally run your job on 1000 servers, neither App Engine nor EC2 will likely let you do that for free. Both are very elastic/adaptive, but bringing 1000 servers up for 30 seconds of work is not very economical for them. On App Engine you would likely run up against an hourly or daily quota before you had 1000 simultaneous instances running. On EC2, you generally pay by the server instance, so you would be paying for 1000 hours of instance time. Of course, one of Amazon's High-CPU instances might be much more powerful than your PC, so maybe you'd only need 100 or so. Or maybe you could compromise and have only 20 instances running at a time, meaning it takes a few minutes to finish your computation, but you don't go broke.
Have you checked Amazon's Elastic MapReduce? http://aws.amazon.com/elasticmapreduce/
With App Engine you should also investigate the task queues. If you already know how to split the big problem into many small ones, you could create one task that takes in the big problem and then creates 1,000 (or 10,000) subtasks to tackle the smaller problems, and after that collect the results in one task, if needed (see the sketch below).
Individual tasks can run up to 10 minutes before they are terminated, which makes them a little bit easier to use for computing tasks than regular requests.
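A minimal sketch of that fan-out pattern with the Task Queue API (the handler URLs, queue name, and split() helper are hypothetical):

```python
# Sketch: one coordinator task fans out N subtasks, then schedules a collector.
import json
from google.appengine.api import taskqueue

def fan_out(big_problem, num_subtasks=1000):
    for i, sub in enumerate(split(big_problem, num_subtasks)):  # split() is yours
        taskqueue.add(url='/work/solve',
                      payload=json.dumps({'index': i, 'subproblem': sub}),
                      queue_name='compute')
    # A collector task can poll for finished subtasks and aggregate the results.
    taskqueue.add(url='/work/collect', countdown=60)
```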
