This may be the wrong place for this question, so please re-direct me if necessary.
I have deployed a couple of simple functions using Google Cloud Functions that do the following:
1. Read files from AWS and write to Cloud SQL
2. Aggregate Cloud SQL data and write a CSV file to a Cloud Storage bucket
3. Run a simple OLS prediction model on the aggregated data
I have these as separate functions because (1) often takes longer than the Cloud Functions maximum timeout. Because of this, I am considering moving the whole thing to App Engine as a service. My questions about App Engine Standard are:
What do the request timeouts mean? If I were to run this service, do I still have a short time-limit after which it will no longer run?
Is App Engine the best thing to use for this task?
Thanks for all your help
According to the Google documentation, GAE Standard has a maximum timeout of 1 minute for HTTP requests and 10 minutes for cron/tasks in the older environments. Newer environments allow 10 minutes for both HTTP requests and tasks. If your functions take longer than this, GAE Standard won't work for you. In that case, take a look at GAE Flex (see the Google documentation that compares Flex to Standard).
Secondly, it seems to me that what you have are endpoints that are only hit occasionally or at specific scheduled times. If that is the case, I would also recommend taking a look at Cloud Run. We have a blog article about it and we have this
....Another thing to note about Cloud Run is that it only runs when it receives an HTTP request. It plays dead and comes alive to execute your code when an HTTP request comes in. When it is done executing the request, it goes 'dead' again till the next request comes in. This means you're not paying for time spent idling i.e. when it is not doing anything....
You can keep your Cloud Functions and the strong cohesion implemented by each of your 3 functions, and then use Cloud Workflows, a serverless solution, to orchestrate the 3 CF calls. The drawback: you pay for 3 CF invocations and 3 Workflows steps. But does it matter? 2 million CF invocations and 5,000 Workflows steps are free.
As proposed by @NoCommandLine, Cloud Run is indeed an alternative, with its timeout of 3600s (1h). The drawback: you need to wrap your code in an HTTP request handler and provide a web server such as express or gunicorn.
A hack is to build a Docker container for your code, with no need for a web server, and run it using Cloud Build, which has a timeout of 24 hours.
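To make the "wrap your code in an HTTP request" point concrete, here is a minimal sketch using only Python's standard library `http.server` rather than gunicorn or express. The three step functions are hypothetical placeholders for the question's three Cloud Functions, and `serve()` is an assumed entry point; the only platform detail relied on is that Cloud Run passes the listening port in the `PORT` environment variable.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def load_from_aws():
    """Placeholder for step 1 (read files from AWS, write to Cloud SQL)."""
    return {"rows_loaded": 0}


def aggregate_to_csv():
    """Placeholder for step 2 (aggregate Cloud SQL data, write CSV to a bucket)."""
    return {"csv_written": True}


def fit_ols():
    """Placeholder for step 3 (OLS prediction on the aggregated data)."""
    return {"model_fitted": True}


def run_pipeline():
    """Run the three steps in order and collect their results by step name."""
    results = {}
    for step in (load_from_aws, aggregate_to_csv, fit_ols):
        results[step.__name__] = step()
    return results


class PipelineHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # One POST runs the whole pipeline and returns a JSON summary.
        body = json.dumps(run_pipeline()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve():
    # Cloud Run passes the listening port in the PORT environment variable.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), PipelineHandler).serve_forever()
```

Calling `serve()` as the container's start command would expose the pipeline on a single endpoint; a production service would more likely sit behind gunicorn, as mentioned above.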
Related
I'm working with a Django web app deployed on Google App Engine flexible environment.
I'm streaming my data while processing requests in my views using bigquery.Client(), but I don't think this is the best way to do it. Do I need to delegate this process outside of the view (using Pub/Sub, tasks, Cloud Functions, etc.)? If so, please suggest a suitable architecture: which GCP product I should use, how to connect it, and what to read.
Based on your comment, I would recommend Cloud Run.
Cloud Run is a serverless container based product. You write a webserver (that handle your POST request), wrap it in a container and deploy it on Cloud Run.
With a brand-new feature named "always on" CPU, the CPU is no longer throttled after the response is sent (throttling is the normal behavior). With always on, you keep the full CPU until the Cloud Run instance is shut down (usually after 15 minutes, but it can be sooner).
The benefit of the feature is that you can return the response to the client immediately and then continue to process your data asynchronously and store it in BigQuery (in streaming mode).
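The "respond first, process afterwards" idea can be sketched with a background worker thread. This is only an illustration under assumptions: `handle_request`, `insert_rows`, and the `stored` list are hypothetical names, and `insert_rows` stands in for whatever BigQuery streaming call the real view makes.

```python
import queue
import threading

pending = queue.Queue()
stored = []  # stand-in for rows written to BigQuery


def insert_rows(rows):
    """Hypothetical sink; a real app would call its BigQuery client here."""
    stored.extend(rows)


def worker():
    # Runs for the life of the instance, draining queued batches one by one.
    while True:
        rows = pending.get()
        try:
            insert_rows(rows)
        finally:
            pending.task_done()


threading.Thread(target=worker, daemon=True).start()


def handle_request(payload):
    """Queue the rows and return at once; the worker stores them afterwards."""
    pending.put(payload["rows"])
    return {"status": "accepted"}
```

This only works if the instance keeps its CPU after the response is sent, which is what the always-on setting described above is for; rows still queued when an instance shuts down would be lost, so a durable queue may be the safer design for important data.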
I am interested in triggering notifications into my Salesforce system when reviews are posted on my Google Business profile.
This article describes calling the Salesforce API by executing Python code on Google's App Engine, which leaves very little for me to code myself, but I'm not familiar with how Google billing works.
Essentially, when a topic/subscription is triggered, Google will execute a small Python script to call the Salesforce API. I want to know how much it will cost to do this, or how I can calculate how this will be billed.
You must be referring to Cloud Functions because that's the product used in the article (also, Cloud Functions has Pub/Sub event trigger). App Engine and Cloud Functions are both serverless products but they have different use cases. If you wish to know the difference between them, here's a good SO answer.
You are billed depending on how long the function takes to finish, how many times you invoke it, and the resources configured for the function (the more CPU and memory, the higher the cost). There are additional charges as well if your function makes outbound data transfers when it is invoked.
You can estimate your monthly cost with the GCP pricing calculator. Also, there are several products involved in the article (such as Secret Manager and Pub/Sub), so remember to add these products to your estimate. For additional pricing details, check out the docs: https://cloud.google.com/functions/pricing
I have one more piece of advice, though it is not about the cost of invoking the function but about the additional cost of deploying it. From the GCP docs:
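As a rough illustration of how the three billing factors combine, here is a small sketch. The function name and all rates are hypothetical placeholders, not real prices; the actual unit prices and free tiers are on the pricing page linked above, and outbound data transfer is ignored here.

```python
def estimate_monthly_cost(invocations, avg_seconds, memory_gb, cpu_ghz,
                          price_per_invocation, price_per_gb_second,
                          price_per_ghz_second):
    """Sum the usage-based components; free tiers and egress are ignored."""
    invocation_cost = invocations * price_per_invocation
    memory_cost = invocations * avg_seconds * memory_gb * price_per_gb_second
    cpu_cost = invocations * avg_seconds * cpu_ghz * price_per_ghz_second
    return invocation_cost + memory_cost + cpu_cost


# Placeholder rates chosen only to make the arithmetic easy to follow.
example = estimate_monthly_cost(
    invocations=1000, avg_seconds=2.0, memory_gb=0.25, cpu_ghz=0.4,
    price_per_invocation=0.001, price_per_gb_second=0.002,
    price_per_ghz_second=0.003,
)
```

Plugging in the published rates and your expected number of reviews per month gives a first-order figure to compare against the pricing calculator.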
When you deploy your function's source code to Cloud Functions, that source is stored in a Cloud Storage bucket. Cloud Build then automatically builds your code into a container image and pushes that image to Container Registry. Cloud Functions accesses this image when it needs to run the container to execute your function.
This applies to newer runtimes. It may be noticeable on your bill if you frequently deploy or update a function, because Cloud Build has to rebuild those images and the images have to be stored in a Cloud Storage bucket. Functions take time to deploy, so it's best practice to test your functions locally first. One solution is to use the Functions Framework.
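Even before reaching for the Functions Framework, the body of a Pub/Sub-triggered function can be exercised as plain Python by calling it with a hand-built event. A minimal sketch, assuming the `(event, context)` background-function signature with a base64-encoded `data` field; the function name and the `reviewer`/`rating` fields are hypothetical.

```python
import base64
import json


def notify_salesforce(event, context):
    """Pub/Sub-triggered entry point: decode the message and build the payload."""
    message = base64.b64decode(event["data"]).decode("utf-8")
    review = json.loads(message)
    # A deployed function would send this payload to the Salesforce API here.
    return {"reviewer": review["reviewer"], "rating": review["rating"]}


# Local call with a hand-built event; no deployment or Cloud Build run needed.
fake_event = {
    "data": base64.b64encode(
        json.dumps({"reviewer": "A. Customer", "rating": 5}).encode("utf-8")
    ).decode("ascii")
}
result = notify_salesforce(fake_event, context=None)
```

Iterating this way keeps deployments, and the image builds they trigger, for changes that have already passed a local check.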
I was handed an assignment but I don't know where to start.
The aim is to have two pieces of code running. One will run in an OpenStack private cloud and index two sets of text; the other will run in EC2 and match the two indexed texts.
I want to access them via google app engine.
Ideally, I would like to click a button or perform an action on Google app engine, which then sends a request to Openstack to run the code and retrieve the output of a txt file.
The output text files will then be forwarded to EC2, where the matching will occur, and the results sent back to Google App Engine.
My question is, how can I send the files between the systems using REST requests?
FrankN --
EC2, GAE and OpenStack are disparate cloud computing platforms. To integrate them might include, say, using one platform while saving backups to another.
CloudU.Rackspace.com is a vendor-neutral education site about cloud computing (note: It'll take six or so hours to finish it all). This might help.
Disclaimer: I work for Rackspace.
Firstly, I'm not really sure what your requirements are, what your code does, or why you are trying to mix cloud providers in that way.
That said, I would suggest taking the upload from GAE and pushing it to AWS S3, where you can then retrieve it and use it as you please from EC2.
I'm not sure what functionality you are trying to get out of OpenStack that is not present in AWS; however, I would suggest building whatever you are building in EC2 first, then duplicating it on OpenStack services to avoid future vendor lock-in.
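On the original question of sending files between the systems with REST requests, a plain HTTP upload is one option. The sketch below uses only `urllib.request`; the endpoint URL and the helper names are hypothetical, and it assumes the receiving service accepts a PUT with the file as the request body.

```python
import urllib.request


def build_upload_request(url, content):
    """Build (but do not send) a PUT request carrying a text file."""
    return urllib.request.Request(
        url,
        data=content,
        method="PUT",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


def send(request, timeout=30):
    """Send the request and return the response body as bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


req = build_upload_request(
    "https://matcher.example.com/files/index-a.txt",  # hypothetical endpoint
    b"alpha 1\nbeta 2\n",
)
```

Calling `send(req)` would perform the transfer; the same pattern works in each direction (OpenStack to EC2, EC2 back to GAE) as long as each side exposes an endpoint that accepts the upload.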
I am currently using google app engine as my mobile application back end. I have a few tasks that can not be performed in the gae environment (mainly image recognition using opencv). My intention is to retain gae and use AWS to perform these specific tasks.
Is there a simple way to pass specific tasks from gae to AWS? E.g. A task queue?
You could either push tasks from GAE towards AWS, or have your AWS instances pull tasks from GAE.
If you push tasks from GAE towards AWS, you could use URLFetch to push your data towards your AWS instances.
If you prefer to have your AWS instances pull tasks from GAE, you could have your GAE instances put their tasks in the GAE Pull Queue, and then have your AWS instances use the Task Queue REST API to lease tasks from the queue.
In either case, the AWS instance could report back the processing result through a simple POST request to your GAE servlets, or by inserting tasks via the abovementioned REST API, which would later be leased by your GAE instances. The latter could be useful if you want to control the rate at which your GAE app processes the results.
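The lease-based pull pattern described above can be modelled in a few lines. This is an in-memory illustration of the semantics only, not the Task Queue REST API; the class and method names are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
import itertools


class PullQueue:
    """In-memory model of lease semantics; not the Task Queue REST API."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._tasks = {}  # task id -> [payload, lease_expires_at]

    def add(self, payload):
        task_id = next(self._ids)
        self._tasks[task_id] = [payload, 0.0]
        return task_id

    def lease(self, max_tasks, lease_seconds, now):
        """Hand out up to max_tasks that are not currently under lease."""
        leased = []
        for task_id, entry in self._tasks.items():
            if len(leased) == max_tasks:
                break
            if entry[1] <= now:
                entry[1] = now + lease_seconds
                leased.append((task_id, entry[0]))
        return leased

    def delete(self, task_id):
        """Called by the worker once the task has been processed."""
        self._tasks.pop(task_id, None)
```

A worker that crashes never calls `delete`, so its task becomes leasable again once the lease expires; that is what makes the pull approach tolerant of failing AWS instances.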
Disclaimer: I'm a lead developer on the AppScale project.
One way that you could go is with AppScale - it's an open source implementation of the App Engine APIs that runs over Amazon EC2 (as well as other clouds). Since it's open source, you could alter the AppServer that we ship with it to enable OpenCV to be used. This would require you to run your App Engine app in AWS, but you could get creative and have a copy of your app running with Google, and have it send Task Queue requests to the version of your app running in AWS only when you need to use the OpenCV libraries.
Have you considered using Amazon Simple Queue Service? http://aws.amazon.com/sqs/
You should be able to add items to the queue from GAE using a standard HTTP client.
Sure. AppEngine has a Task Queue, where you can put in your tasks by simply implementing DeferredTask. In that task you can make requests to AWS.
Your intention to retain the application in GAE and use AWS for the few tasks that cannot be performed in GAE seems like a reasonable approach to me.
I'd like to share a few ideas along with some resources to answer the main part of your question:
Is there a simple way to pass specific tasks from gae to AWS? E.g. A task queue?
If you need GAE and AWS to perform the task all the time (24/7), then your application will definitely depend on a batch schedule or a task queue. Both are available in GAE.
However, if you can arrange for the tasks to be collected in GAE and performed by AWS at intervals (say twice a day, for less than an hour each time), you may not need them, as long as GAE can put the data on Google Cloud Storage (GCS) as a public object.
For this scenario, you need to set up an AWS EC2 instance on an on/off schedule and let the instance run a boot script using cloud-init to collect the data through your domain that points to GCS (c.storage.googleapis.com), like so:
wget -q --read-timeout=0.0 --waitretry=5 --tries=400 \
--background http://your.domain.com/yourfile?q=XXX...
Once it has the data from GCS, AWS can perform these specific tasks. Let it then call GAE to clean the data and put the result back on GCS, ready to be used by your mobile application back end.
Following are some options to consider:
Note that not all EC2 types are suitable for an on/off schedule. I recommend EC2-VPC/EBS if you want to set up an AWS EC2 instance on an on/off schedule.
You may not need EC2 at all if AWS Lambda can perform the task. The cost is lower: a task running twice a day, for typically less than 3 seconds with memory consumption up to 128MB, typically costs less than $0.0004 USD/month.
Rearranging your application in GAE and having AWS perform some of the tasks might end up raising your bill, so try to optimize the instance class in GAE.
I have been writing a Google Chrome extension for Stack Exchange. It's a simple extension that allows you to keep track of your reputation and get notified of comments on Stack Exchange sites.
I've now run into some issues that I can't solve on my own.
My extension uses Google App Engine as its back end to make external requests to the Stack Exchange API. A single client request from the extension for new comments on a single site can cause many requests to the API endpoint to prepare the response, even for a non-skeetish user. The average user has accounts on at least 3 sites in the Stack Exchange network, and some have more than 10!
Stack Exchange API has request limits:
A single IP address can only make a certain number of API requests per day (10,000).
The API will cut my requests off if I make more than 30 requests over 5 seconds from single IP address.
It's clear that all requests should be throttled to 30 per 5 seconds and currently I've implemented request throttle logic based on a distributed lock with memcached. I'm using memcached as a simple lock manager to coordinate the activity of GAE instances and throttle UrlFetch requests.
But I think it's a big failure to limit such a powerful infrastructure to no more than 30 requests per 5 seconds. That API request rate does not allow me to continue developing new, interesting, and useful features, and one day the extension will stop working properly altogether.
My app now has 90 users and is growing, and I need to come up with a way to maximize the request rate.
As is known, App Engine makes external UrlFetch requests through a shared pool of IP addresses.
My goal is to write request throttle functionality to ensure compliance with the api terms of usage and to utilize GAE distributed capabilities.
So my question is how to provide the maximum practical API throughput while complying with the API terms of usage and utilizing GAE's distributed capabilities.
Advice to use another platform/host/proxy is not useful, in my view.
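For reference, the 30-requests-per-5-seconds rule from the question can be expressed as a sliding-window throttle. This is a single-process sketch with hypothetical names, not the memcached-based distributed lock described above; coordinating several GAE instances would still need shared state. The clock is injectable so the behaviour can be checked without sleeping.

```python
import collections
import time


class SlidingWindowThrottle:
    def __init__(self, max_requests=30, window_seconds=5.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent = collections.deque()  # timestamps of recent requests

    def try_acquire(self):
        """Return True if a request may be sent now, recording it if so."""
        now = self.clock()
        # Forget requests that have left the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False
```

A caller that gets `False` should queue or delay the UrlFetch call rather than drop it, so bursts are smoothed out instead of lost.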
If you are searching for a way to programmatically manage Google App Engine's shared pool of IPs, I firmly believe that you are out of luck.
Anyway, quoting this advice that is part of the faq, I think you have more than a chance to keep on running your awesome app:
What should I do if I need more requests per day?
Certain types of applications - services and websites to name two - can legitimately have much higher per-day request requirements than typical applications. If you can demonstrate a need for a higher request quota, contact us.
EDIT:
I was wrong, actually you don't have any chance.
Google App Engine [app]s are doomed.
First off: I'm using your extension and it rocks!
Have you considered using memcached to cache the results?
Instead of taking the results from the API directly, first try to find them in the cache. If they are there, use them; if they are not, retrieve them, cache them, and let them expire after X minutes.
Second, try to batch up users' requests: instead of asking for the reputation of a single user, ask for the reputation of several users together.
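The cache-then-expire flow just described looks roughly like this. It is an in-process sketch with hypothetical names (`TTLCache`, `get_or_fetch`); on App Engine the dictionary would be replaced by memcached with an expiry time, and `fetch` stands in for the actual API call.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        """Return a fresh cached value, or call fetch(key) and cache the result."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = fetch(key)
        self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

With a TTL of a few minutes, repeated polls from the same user within that window cost no API requests at all.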
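Batching could look something like the sketch below, which groups user ids into semicolon-separated paths so that one request covers many users. The helper name is hypothetical, and both the `/users/{ids}` path shape and the default batch size of 100 are assumptions to check against the Stack Exchange API documentation.

```python
def batch_user_paths(user_ids, batch_size=100):
    """Yield one '/users/{ids}' path per batch of semicolon-joined ids."""
    ids = [str(u) for u in user_ids]
    for start in range(0, len(ids), batch_size):
        yield "/users/" + ";".join(ids[start:start + batch_size])
```

Each yielded path then costs a single request against the quota, however many users it contains.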