I have a Node.js script that starts a stream with a third party and stores the incoming messages in Firestore.
There is no need to handle incoming requests. But after I deployed the script to App Engine, it only starts once I call the cloud endpoint. After that, it keeps running (and that is what it should do).
There is probably a way to start processes by default and also build in something like an auto-restart if it crashes, but I couldn't find it, or I am using the wrong search terms :-)
App Engine is a web-microservice platform, meaning that every (micro)service deployed has to be triggered by an HTTP request.
By the way, you can't run an infinite batch process that streams data.
However, you can set up a Cloud Task which calls an App Engine endpoint; the maximum duration is 24 hours. Link this to Cloud Scheduler to launch your 24-hour task every day. (In detail, Cloud Scheduler has to trigger an endpoint such as a Cloud Function or App Engine; that endpoint then creates the task in Cloud Tasks. Cloud Scheduler can't create a task in Cloud Tasks directly.)
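To make the Cloud Tasks part concrete, here is a minimal sketch using a recent google-cloud-tasks Python client (a Node.js client exists as well); the project, region, queue name and the /start-stream handler path are placeholders, not values from the question:

# Sketch: enqueue a Cloud Task that targets an App Engine endpoint.
# Project, location, queue name and handler path are assumptions.
from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()
parent = client.queue_path("my-project", "europe-west1", "stream-queue")

task = {
    "app_engine_http_request": {
        "http_method": tasks_v2.HttpMethod.POST,
        "relative_uri": "/start-stream",  # App Engine handler that runs the long job
    }
}

response = client.create_task(request={"parent": parent, "task": task})
print("Created task:", response.name)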
As Guillaume mentioned, GAE isn't really intended for implementing services like the one you describe.
However, it's possible to do something similar simply by configuring a minimum of 1 idle instance:
GAE will start an idle instance for the service automatically, without waiting for a triggering request
when the idle instance dies accidentally, or is terminated because it reaches the end of its allowed lifespan, GAE will again start a new idle instance
when the 1st request comes in, GAE will dispatch it to the idle instance, that instance thus becoming active (serving subsequent requests), and GAE will immediately start a new idle instance to have one on standby
when the only active instance dies, GAE won't start a new instance immediately; it'll wait until a new request comes in, which will be handled like the 1st request
when traffic is high enough, GAE will start dispatching it to the idle instance on standby, activating it, and will again start a new idle instance on standby.
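A minimal sketch of the corresponding app.yaml fragment for the standard environment (the flexible environment uses min_num_instances under automatic_scaling instead); the value is only illustrative:

# app.yaml fragment keeping one idle instance warm at all times
automatic_scaling:
  min_idle_instances: 1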
Related
It appears that App Engine standard has a warmup feature to warm up an app after a deployment, but I don't see the same feature available for Flex. The readiness & liveness probes also don't work for this, since pointing their path setting at a custom path inside the application doesn't seem to make the probes actually hit that internal endpoint.
Is there some solution I'm missing, other than doing things like manually hitting the endpoints myself after the deployment, which won't be very reliable since the calls don't necessarily round-robin to each instance?
In App Engine Standard, warmup requests essentially load your app's code into a new instance before any live requests reach that instance. This can happen in the following situations:
When you redeploy a version of your app.
When new instances are created due to the load from requests exceeding the capacity of the current set of running instances.
When maintenance and repairs of the underlying infrastructure or physical hardware occur.
In App Engine Flexible, you can achieve a similar result by using the initial_delay_sec setting for liveness checks in your app.yaml file. If you set its value to give your code enough time to initialize, the first request coming to that instance will be processed quickly by your already-initialized code.
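A minimal sketch of the corresponding Flexible environment app.yaml fragment; the 300-second delay is only an illustrative value:

# app.yaml fragment (Flex): grace period before the first liveness probe
liveness_check:
  path: "/liveness_check"
  initial_delay_sec: 300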
In my app (Google App Engine Standard, Python 2.7) I have some flags in global variables that are initialized (their values read from memcache/Datastore) when the instance starts (at the first request). Those values don't change often, only once a month or in case of emergencies (i.e. when the Google App Engine Task Queue or Memcache service is not working well, which happened no more than twice a year as reported in GC Status, but seriously affected my app and my customers: https://status.cloud.google.com/incident/appengine/15024 https://status.cloud.google.com/incident/appengine/17003).
For efficiency and cost reasons, I don't want to look these flags up in memcache or Datastore on every request.
I'm looking for a way to send a message to all instances (see my previous post GAE send requests to all active instances ):
As stated in https://cloud.google.com/appengine/docs/standard/python/how-requests-are-routed
Note: Targeting an instance is not supported in services that are configured for auto scaling or basic scaling. The instance ID must be an integer in the range from 0, up to the total number of instances running. Regardless of your scaling type or instance class, it is not possible to send a request to a specific instance without targeting a service or version within that instance.
but another solution could be:
1) Send a shutdown message/command to all instances of my app or a service
2) Send a restart message/command to all instances of my app or service
I use only automatic scaling, so I can't send a request targeted to a specific instance (I can get the list of active instances using the GAE Admin API).
Is there any way to do this programmatically in Python GAE? Manually in the GCP console it's easy when having a few instances, but for 50+ instances it's a pain...
One possible solution (actually more of a workaround), inspired by your comment on the related post, is to obtain a restart of all instances by re-deployment of the same version of the app code.
Automated deployments are also possible using the Google App Engine Admin API, see Deploying Your Apps with the Admin API:
To deploy a version of your app with the Admin API:
Upload your app's resources to Google Cloud Storage.
Create a configuration file that defines your deployment.
Create and send the HTTP request for deploying your app.
It should be noted that (re)deploying an app version which handles 100% of the traffic can cause errors and traffic loss due to:
overwriting the app files actually being in use (see note in Deploying an app)
not giving GAE enough time to spin up sufficient instances fast enough to handle high incoming traffic rates (more details here)
Using different app versions for the deployments and gradually migrating traffic to the newly deployed apps can completely eliminate such loss. This might not be relevant in your particular case, since the old app version is already impaired.
Automating traffic migration is also possible, see Migrating and Splitting Traffic with the Admin API.
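To give an idea of what such automation could look like, here is a hedged sketch of the Admin API call that moves all traffic of a service to a new version; the project ID, service name and version ID are placeholders, and the exact request shape should be checked against the Migrating and Splitting Traffic guide:

# Sketch: shift 100% of a service's traffic to a new version via the Admin API.
# "my-project", "default" and "v2" are placeholder values.
import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"])
session = AuthorizedSession(credentials)

url = "https://appengine.googleapis.com/v1/apps/my-project/services/default"
body = {"split": {"allocations": {"v2": 1.0}}}  # send all traffic to version "v2"

resp = session.patch(url, params={"updateMask": "split"}, json=body)
resp.raise_for_status()
print(resp.json())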
It's possible to use the Google Cloud API to stop all the instances. They would then be automatically scaled back up to the required level. My first attempt at this would be a process where:
The config item was changed
The current list of instances was enumerated from the API
The instances were shut down over a time period that allows new instances to be spun up to replace them, balanced against how time-sensitive the config change is. Perhaps close one instance per 60s.
In terms of using the API you can use the gcloud tool (https://cloud.google.com/sdk/gcloud/reference/app/instances/):
gcloud app instances list
Then delete the instances with:
gcloud app instances delete instanceid --service=s1 --version=v1
There is also a REST API (https://cloud.google.com/appengine/docs/admin-api/reference/rest/v1/apps.services.versions.instances/list):
GET https://appengine.googleapis.com/v1/{parent=apps/*/services/*/versions/*}/instances
DELETE https://appengine.googleapis.com/v1/{name=apps/*/services/*/versions/*/instances/*}
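Tying those pieces together, a rough sketch of the rolling restart using the discovery-based Python client for the Admin API could look like this; the project, service and version IDs are placeholders, and the 60-second pacing is just the suggestion from above:

# Sketch: roll through all instances of a version, deleting one per minute
# so autoscaling can replace them. IDs below are placeholders.
import time
from googleapiclient import discovery

appengine = discovery.build("appengine", "v1")
apps_id, service_id, version_id = "my-project", "default", "v1"

instances = appengine.apps().services().versions().instances().list(
    appsId=apps_id, servicesId=service_id, versionsId=version_id
).execute().get("instances", [])

for instance in instances:
    appengine.apps().services().versions().instances().delete(
        appsId=apps_id, servicesId=service_id, versionsId=version_id,
        instancesId=instance["id"]
    ).execute()
    time.sleep(60)  # give autoscaling time to spin up a replacement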
I am currently using Google App Engine as my mobile application back end. I have a few tasks that cannot be performed in the GAE environment (mainly image recognition using OpenCV). My intention is to retain GAE and use AWS to perform these specific tasks.
Is there a simple way to pass specific tasks from gae to AWS? E.g. A task queue?
You could either push tasks from GAE towards AWS, or have your AWS instances pull tasks from GAE.
If you push tasks from GAE towards AWS, you could use URLFetch to push your data towards your AWS instances.
If you prefer to have your AWS instances pull tasks from GAE, you could have your GAE instances put their tasks in the GAE Pull Queue, and then have your AWS instances use the Task Queue REST API to lease tasks from the queue.
In either case, the AWS instance could report back the processing result through a simple POST request to your GAE servlets, or by inserting tasks via the above-mentioned REST API, which would later be leased by your GAE instances. The latter could be useful if you want to control the rate at which your GAE app processes the results.
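As a hedged illustration of the pull-queue direction, the GAE side could enqueue work roughly like this (the queue name and payload shape are assumptions); your AWS workers would then lease these tasks through the Task Queue REST API:

# Sketch (GAE Standard Python 2.7): put an image-recognition job on a pull queue.
# Queue name and payload structure are only illustrative.
import json
from google.appengine.api import taskqueue

def enqueue_opencv_job(image_gcs_path):
    queue = taskqueue.Queue("opencv-pull-queue")
    queue.add(taskqueue.Task(
        payload=json.dumps({"image": image_gcs_path}),
        method="PULL"))  # pull task: not dispatched, waits to be leased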
Disclaimer: I'm a lead developer on the AppScale project.
One way that you could go is with AppScale - it's an open source implementation of the App Engine APIs that runs over Amazon EC2 (as well as other clouds). Since it's open source, you could alter the AppServer that we ship with it to enable OpenCV to be used. This would require you to run your App Engine app in AWS, but you could get creative and have a copy of your app running with Google, and have it send Task Queue requests to the version of your app running in AWS only when you need to use the OpenCV libraries.
Have you considered using Amazon Simple Queue Service (SQS)? http://aws.amazon.com/sqs/
You should be able to add items to the queue from GAE using a standard HTTP client.
Sure. App Engine has a Task Queue, where you can add your tasks by simply implementing DeferredTask. In that task you can make requests to AWS.
Your intention to retain the application in GAE and use AWS to perform the few tasks that cannot be performed in GAE seems to me like the right scenario.
I'd like to share a few ideas along with some resources to answer the main part of your question:
Is there a simple way to pass specific tasks from gae to AWS? E.g. A task queue?
If you need GAE and AWS to perform the task all the time (24/7), then your application will definitely depend on a batch schedule or task queue. Both are available in GAE.
However, if you can arrange for GAE to prepare the task and have AWS perform it on an interval basis (say, twice a day for less than an hour each), you may not need them, as long as you can have GAE put the data on Google Cloud Storage (GCS) as a public object.
For this scenario, you need to set up an AWS EC2 instance on an on/off schedule and let the instance run a boot script, using cloud-init, that collects the data through your domain pointed at GCS (c.storage.googleapis.com), like so:
wget -q --read-timeout=0.0 --waitretry=5 --tries=400 \
--background http://your.domain.com/yourfile?q=XXX...
Once AWS has the data from GCS, it can perform these specific tasks. When done, it can trigger GAE to clean up the data and put the result back on GCS, ready to be used by your mobile application back end.
Following are some options to consider:
You should note that not all of the EC2 types are suitable for an on/off schedule. I recommend using EC2-VPC/EBS if you want to set up an AWS EC2 instance on an on/off schedule.
You may not need to set up EC2 at all if you can have AWS Lambda perform the task without EC2. The cost is also lower: a task running twice a day for typically less than 3 seconds, with memory consumption up to 128 MB, typically costs less than $0.0004 USD/month (see the rough estimate after this list).
Rearranging your application in GAE and setting AWS to perform some of the tasks might ultimately raise your billing rates, so try to optimize the instance class in GAE.
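To show roughly where that $0.0004/month figure comes from, here is a back-of-the-envelope calculation assuming Lambda's classic pay-per-use pricing of about $0.0000166667 per GB-second plus $0.20 per million requests (actual pricing varies by region and over time):

# Rough Lambda cost estimate for the scenario above (prices are assumptions).
invocations_per_month = 2 * 30                                  # twice a day
duration_s = 3                                                  # ~3 seconds per run
memory_gb = 128 / 1024.0                                        # 128 MB
gb_seconds = invocations_per_month * duration_s * memory_gb    # 22.5 GB-s
compute_cost = gb_seconds * 0.0000166667                       # ~$0.000375
request_cost = invocations_per_month * 0.20 / 1000000.0        # ~$0.000012
print(compute_cost + request_cost)  # ~$0.0004 USD/month (before free tier)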
In my App Engine application, the servlet takes a long time to initialize. This is not a big problem with a usual Tomcat deployment, as the initialization happens only once. However, in App Engine, I noticed that many service requests cause App Engine to launch a new process and do the initialization, and thus these service requests take a long time.
Is it possible to disconnect the initialization from the service request? Somehow tell AppEngine to do the initialization in the background, so that when a user asks for a page, he won't have to wait so long?
You can set up Idle Instances or Warmup Requests in the Application Settings section of the App Engine console for your app. That should avoid those slow initialization times, as your app would be preloaded and ready to go.
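For the Python standard runtime, warmup is enabled roughly as sketched below in app.yaml, with the heavy initialization moved into a handler mapped to /_ah/warmup (the Java runtime offers the equivalent through appengine-web.xml); the warmup.app module name is a hypothetical placeholder:

# app.yaml fragment: enable warmup requests and route them to a handler
inbound_services:
- warmup
handlers:
- url: /_ah/warmup
  script: warmup.app  # hypothetical module whose handler performs the slow initialization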
The Twitter streaming API says that we should open an HTTP request and parse updates as they come in. I was under the impression that Google's urlfetch cannot keep the HTTP request open past 10 seconds.
I considered having a cron job that polled my Twitter account every few seconds, but I think Google App Engine only allows cron jobs once a minute. However, my application needs near-realtime access to my Twitter #replies (preferably only a 10 second or less lag).
Is there any method for receiving real-time updates from Twitter?
Thanks!
Unfortunately, you can't use the urlfetch API for 'hanging gets'. All the data will be returned when the request terminates, so even if you could hold it open arbitrarily long, it wouldn't do you much good.
Have you considered using Gnip? They provide a push-based 'web hooks' notification system for many public feeds, including Twitter's public timeline.
I'm curious.
Wouldn't you want this to be polling Twitter on the client side? Are you polling your public feed? If so, I would decentralize the work to the clients rather than the server...
It may be possible to use Google Compute Engine https://developers.google.com/compute/ to maintain unrestricted hanging GET connections, and then call a webhook in your App Engine app to deliver the data from your Compute Engine VM to where it needs to be in App Engine.
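A minimal sketch of what the Compute Engine side could look like, assuming a generic streaming endpoint and a hypothetical /twitter-webhook handler in the App Engine app (both URLs are placeholders, not part of the original answer):

# Sketch: run on a Compute Engine VM; hold a streaming connection open and
# forward each received line to an App Engine webhook. URLs are placeholders.
import requests

STREAM_URL = "https://stream.example.com/updates"           # hypothetical streaming endpoint
WEBHOOK_URL = "https://my-app.appspot.com/twitter-webhook"  # hypothetical GAE handler

with requests.get(STREAM_URL, stream=True, timeout=None) as stream:
    for line in stream.iter_lines():
        if line:  # skip keep-alive newlines
            requests.post(WEBHOOK_URL, data=line, timeout=10)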