Gracefully switch domain from App Engine service to Cloud Run - google-app-engine

I have a service running in App Engine, which has custom domain mapped to it.
We are in a process of migrating this service from App Engine to Cloud Run.
Thus, we would like to switch domain mapping from App Engine to Cloud Run.
We've tried doing that in our staging environment and discovered that such switching causes some downtime of our service (about 10-15 minutes).
Is it possible to avoid such a downtime and migrate traffic/switch domain gracefully without downtime?
Just for information, in our App Engine domain is mapped through deploying dispatch.yaml file.
In Cloud Run we enable domain for our service through following link: https://console.cloud.google.com/run/domains?project=<project_name>&supportedpurview=project

As #John Hanley mentioned and #Sandeep Vokkareni shared,
It potentially will have downtime. The issue is that the domain mapping is managed by the GFE (Google Frontend). You must delete and reissue custom domain mappings. One way to decrease the downtime is to make the DNS resource record TTL value small, such as 60 seconds. Once the remapping succeeds, change the TTL back to a normal value such as 86400 (one day).

Related

Google App Engine streaming data into Bigquery: GCP architecture

I'm working with a Django web app deployed on Google App Engine flexible environment.
I'm streaming my data while processing requests in my views using bigquery.Client(). But I think it is not the best way to do it. Do I need to delegate this process outside of the view (using pub/sub, tasks, cloud functions etc.? If so, give me a suitable architecture: which GCP product should I use, how to connect, and what to read.
Based on your comment, I could recommend you Cloud Run;
Cloud Run is a serverless container based product. You write a webserver (that handle your POST request), wrap it in a container and deploy it on Cloud Run.
With a brand new feature, named always on the CPU is not throttled after the response sent (the normal behavior). With always on, you keep the full CPU up to the Cloud Run instances off load (usually after 15 minutes, but can be quicker).
The benefit of the feature is the capacity to return immediately the response to the client, and then to continue to process, asynchronously, your data to store in BigQuery (in streaming mode).

Failover to different Google AppEngine Standard project

I got a Google AppEngine Standard app running in region1 and want to deploy that same app as a backup region2 in case region1 is down. I'm looking for a way to make that failover happens seemlessly for my users (both human users on browsers and other third party services which call back to my app's service).
Currently I have my custom domain name (both naked name and www name) mapped to the app on region #1 (done in [Google Cloud Console][App Engine][Settings][Customer Domains]).
In the event region1 is down, I would like to go in that setting area of app1(region1), remove those maps and then go that setting area of app2(region2) add those maps, so that after that point, requests to myappdomainname.com and www.myappdomainname.com will go to app2 on region2
Question: is that plan feasible? In particular, if region1 is down, can I still be able to access app1's setting area to remove those maps, so that I can add the maps to app2?
Down time while switching these for about an hour is okay for my app, as long as my users can continue to use the same URL they been using when region1 was still running.
Google App Engine is a regional service, meaning that it cannot span to more than a region. However, it's replicated through all zones of the region to reduce any potential downtime.
The kind of implementation you want for GAE is opposite of the actual purpose of it. One of GAE's principal features is that you don't have to configure and manage the instances running in the background yourself.
The preferred way of getting this to work on Google Cloud Platform would be using Compute Engine. GCE gives you the option to create the instances in any region you want and configure a Load Balancer to serve the traffic and scale your instances as you want. Here is some documentation about serving applications using GCE:
Running a GO app on Compute Engine (part of a GCP quickstart)
Building Scalable and Resilient Web Applications on Google Cloud Platform
Designing Robust Systems
Also, here's a Google Groups post about this issue that goes a little bit more in detail.

GAE shutdown or restart all the active instances of a service/app

In my app (Google App Engine Standard Python 2.7) I have some flags in global variables that are initialized (read values from memcache/Datastore) when the instance start (at the first request). That variables values doesn't change often, only once a month or in case of emergencies (i.e. when google app engine Taskqueue or Memcache service are not working well, that happened not more than twice a year as reported in GC Status but affected seriously my app and my customers: https://status.cloud.google.com/incident/appengine/15024 https://status.cloud.google.com/incident/appengine/17003).
I don't want to store these flags in memcache nor Datastore for efficiency and costs.
I'm looking for a way to send a message to all instances (see my previous post GAE send requests to all active instances ):
As stated in https://cloud.google.com/appengine/docs/standard/python/how-requests-are-routed
Note: Targeting an instance is not supported in services that are configured for auto scaling or basic scaling. The instance ID must be an integer in the range from 0, up to the total number of instances running. Regardless of your scaling type or instance class, it is not possible to send a request to a specific instance without targeting a service or version within that instance.
but another solution could be:
1) Send a shutdown message/command to all instances of my app or a service
2) Send a restart message/command to all instances of my app or service
I use only automatic scaling, so I'cant send a request targeted to a specific instance (I can get the list of active instances using GAE admin API).
it's there any way to do this programmatically in Python GAE? Manually in the GCP console it's easy when having a few instances, but for 50+ instances it's a pain...
One possible solution (actually more of a workaround), inspired by your comment on the related post, is to obtain a restart of all instances by re-deployment of the same version of the app code.
Automated deployments are also possible using the Google App Engine Admin API, see Deploying Your Apps with the Admin API:
To deploy a version of your app with the Admin API:
Upload your app's resources to Google Cloud Storage.
Create a configuration file that defines your deployment.
Create and send the HTTP request for deploying your app.
It should be noted that (re)deploying an app version which handles 100% of the traffic can cause errors and traffic loss due to:
overwriting the app files actually being in use (see note in Deploying an app)
not giving GAE enough time to spin up sufficient instances fast enough to handle high income traffic rates (more details here)
Using different app versions for the deployments and gradually migrating traffic to the newly deployed apps can completely eliminate such loss. This might not be relevant in your particular case, since the old app version is already impaired.
Automating traffic migration is also possible, see Migrating and Splitting Traffic with the Admin API.
It's possible to use the Google Cloud API to stop all the instances. They would then be automatically scaled back up to the required level. My first attempt at this would be a process where:
The config item was changed
The current list of instances was enumerated from the API
The instances were shutdown over a time period that allows new instances to be spun up and replace them, and how time sensitive the config change is. Perhaps close on instance per 60s.
In terms of using the API you can use the gcloud tool (https://cloud.google.com/sdk/gcloud/reference/app/instances/):
gcloud app instances list
Then delete the instances with:
gcloud app instances delete instanceid --service=s1 --version=v1
There is also a REST API (https://cloud.google.com/appengine/docs/admin-api/reference/rest/v1/apps.services.versions.instances/list):
GET https://appengine.googleapis.com/v1/{parent=apps/*/services/*/versions/*}/instances
DELETE https://appengine.googleapis.com/v1/{name=apps/*/services/*/versions/*/instances/*}

Azure Web App - Request Timeout Issue

I have and MVC5 web-app running on Azure. Primarily this is a website but I also have a CRON job running (triggered from an external source) which calls a URL (a GET Request) to carry out some house keeping.
This task is asynchronous can take up to and sometimes over the default timeout on Azure of 230 seconds. I'm regularly getting 500 errors due to this.
Now, I've done a bit of reading and this looks like it's something to do with the Azure Load Balancer settings. I've also found some threads relating to being able to alter that timeout in various contexts but I'm wondering if anyone has experience in altering the Azure Load Balancer timeout in the context of a Web App? Is it a straightforward process?
but I'm wondering if anyone has experience in altering the Azure Load Balancer timeout in the context of a Web App?
You could configure the Azure Load Balancer settings in virtual machines and cloud services.
So, if you want to do it, I suggest you could deploy web app to Virtual Machine or migrate web app to cloud services.
For more detail, you could refer to the link.
If possible, you could try to optimize your query or execution code.
This is a little bit of an old question but I ran into it while trying to figure out what was going on with my app so I figured I would post an answer here in case anyone else does the same.
An Azure Load Balancer (created via ANY means) which mostly comes along with an external IP Address — at least when created via Kubernetes / AKS / Helm — has the 4 min, 240 second idle connection timeout that you referred to.
That being said there are two different types of Public IP Addresses — Basic (which is almost always the default that you probably created) and Standard. More information on the docs: https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-standard-overview
You can modify the idle timeout see the following doc: https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-tcp-idle-timeout
That being said changing the timeout may or may not solve your problem. In our case we had a Rails app that was connection to a database outside of azure. As soon as you add a Load Balancer with a Public IP all traffic will EXIT that public IP and be bound by the 4 minute idle timeout.
That's fine except for that Rails anticipates that a connection to a Database is not going to be cut frequently — which in this case happens ALL the time.
We ended up implementing a connection pooling service that sat between our Rails app and our real database (called PGbouncer — specific to Postgres DBs). That service monitored a connection and re-connected when the timer was nearing the Azure LB timeout.
It took a little while to implement but in our case it works flawlessly. You can see some more details over here: What Azure Kubernetes (AKS) 'Time-out' happens to disconnect connections in/out of a Pod in my Cluster?
The longest timeout you can set for a Public IP / Load Balancer is 30 minutes. If you have a connection that you would like to utilize that runs idle longer than that — then you may be out of luck. As of now 30 mins is the max.

Using amazon web services as google app engine back end

I am currently using google app engine as my mobile application back end. I have a few tasks that can not be performed in the gae environment (mainly image recognition using opencv). My intention is to retain gae and use AWS to perform these specific tasks.
Is there a simple way to pass specific tasks from gae to AWS? E.g. A task queue?
You could either push tasks from GAE towards AWS, or have your AWS instances pull tasks from GAE.
If you push tasks from GAE towards AWS, you could use URLFetch to push your data towards your AWS instances.
If you prefer to have your AWS instances pull tasks from GAE, you could have your GAE instances put their tasks in the GAE Pull Queue, and then have your AWS instances use the Task Queue REST API to lease tasks from the queue.
In either case, the AWS instance could report back the processing result through a simple POST request to your GAE servlets, or through inserting tasks via the abovementioned REST API which would later be leased by your GAE instances. The latter could be useful if you want to control the rate of which your GAE app process the results.
Disclaimer: I'm a lead developer on the AppScale project.
One way that you could go is with AppScale - it's an open source implementation of the App Engine APIs that runs over Amazon EC2 (as well as other clouds). Since it's open source, you could alter the AppServer that we ship with it to enable OpenCV to be used. This would require you to run your App Engine app in AWS, but you could get creative and have a copy of your app running with Google, and have it send Task Queue requests to the version of your app running in AWS only when you need to use the OpenCV libraries.
Have you considered using amazon simple queue service ? http://aws.amazon.com/sqs/
You should be able to add items to the queue from gae using a standard http clint.
Sure. AppEngine has a Task Queue, where you can put in your tasks by simply implementing DeferredTask. In that task you can make requests to AWS.
Your intention to retain the application in GAE and use AWS to perform a few tasks, that can not be performed in the GAE, seems for me a right scenario.
I'd like to share a few ideas along with some resources to answer the main part of your question:
Is there a simple way to pass specific tasks from gae to AWS? E.g. A task queue?
If you need GAE and AWS to perform the task all the time (24/7) then your application will definitely depend on batch schedule or task queue. They are available by GAE.
However if you could arrange to pull the task in GAE and perform by AWG on interval basis (say twice a day of less than an hour each), you may no need to use them as long you can manage the GAE to put the data on Google Cloud Storage (GCS) as public.
For this scenario, you need to setup AWS EC2 Instance for On/Off Schedule and let the instance to run a boot script using cloud-init to collect the data through your domain that pointed to GCS (c.storage.googleapis.com) like so:
wget -q --read-timeout=0.0 --waitretry=5 --tries=400 \\
--background http://your.domain.com/yourfile?q=XXX...
By having the data from GCS, then AWS can perform these specific tasks. Let it fire up GAE to clean the data and put the result back to GCS to be ready to be used as your mobile application back end.
Following are some options to consider:
You should note that not all of the EC2 types are suitable for On/Off Schedule. I recommend to use EC2-VPC/EBS if you want to setup AWS EC2 Instance for On/Off Schedule
You may no need to setup EC2 if you can set AWS Lambda to perform the task without EC2. The cost is cheaper, a task running twice a day for typically less than 3 seconds with memory consumption up to 128MB typically costs less than $0.0004 USD/month
As outcome of rearranging you your application in GAE and set AWG to perform some of the tasks, it might finally rise your billing rates, try to to optimize the instance class in GAE.

Resources