Serverless inference over multi-model endpoint - Amazon SageMaker

I created a model on SageMaker using the following two options. I also specified the URI for the custom container in ECR as well as the root path for the model archives.
I was able to successfully create a provisioned endpoint configuration; however, in the case of serverless, the following message showed up. Does this mean that it is absolutely not possible on SageMaker to have a serverless multi-model endpoint?

"Does this mean that it is absolutely not possible on SageMaker to have a serverless multi-model endpoint?" To answer your question: technically, you can't deploy multiple models on a single serverless endpoint the way you do with multi-model endpoints. With serverless, you can instead deploy each model as a separate endpoint, and this is cost-effective because you pay only for usage.
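As a rough sketch of the one-endpoint-per-model approach described above, the snippet below builds one serverless endpoint configuration request per model. The model names, the `-serverless-config` naming suffix, and the memory and concurrency values are assumptions for illustration only. The dictionaries follow the request shape that boto3's `create_endpoint_config` accepts, but the actual call is left as a comment because it needs AWS credentials and existing SageMaker models.

```python
from typing import Dict, List


def serverless_endpoint_configs(
    model_names: List[str],
    memory_mb: int = 2048,       # assumed value, tune per model
    max_concurrency: int = 5,    # assumed value, tune per model
) -> List[Dict]:
    """Build one serverless endpoint config request per model."""
    configs = []
    for name in model_names:
        configs.append({
            # Hypothetical naming scheme: one config per model.
            "EndpointConfigName": f"{name}-serverless-config",
            "ProductionVariants": [{
                "VariantName": "AllTraffic",
                "ModelName": name,
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }],
        })
    return configs


if __name__ == "__main__":
    for cfg in serverless_endpoint_configs(["model-a", "model-b"]):
        print(cfg["EndpointConfigName"])
        # With credentials configured, each request could be sent with:
        #   boto3.client("sagemaker").create_endpoint_config(**cfg)
```

Each configuration would then back its own endpoint, so every model scales and is billed independently.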


Can traffic from App Engine to Google APIs travel through a Serverless VPC Access connector without being routed through Cloud NAT?

We have set up a VPC Serverless access connector, and configured app engine to use this in app.yaml. We have egress_setting: all-traffic set, as we want to access a 3rd party API from a specific IP address. We used the documentation from
Part of our testing is hitting a large set of URLs on App Engine and checking the HTTP status. In this testing we noticed a dramatic reduction in the rate of serving requests when using the connector. Since all egress traffic is routed via the connector, my first inclination is to think our application's usage of Google APIs (Datastore, Cloud Storage, Cloud SQL) is being impacted.
The connector still has only the minimum number of instances active, indicating that we have not reached the limit of its performance and that it is not the bottleneck. However, retesting with the vpc_access_connector removed from app.yaml returns performance to what we previously had.
I've tried enabling Private Google Access on the subnet the connector is linked to, but this has not improved the situation.
I think we may need to add some routing rules that allow us to send the traffic for Google APIs directly to Google's services, and not through Cloud NAT, but I'm unsure as to what rules would be applicable. I see no reason why this is not possible, but I haven't found the right documentation to guide me here.
Is this possible? Is this documented somewhere?

Is there a way to deploy internal facing applications in Google App Engine?

Is there a way to deploy "internal facing" applications in Google App Engine? AWS offers this capability as explained here and so does Azure as explained here.
What is the GCP equivalent for this? It appears App Engine Flexible Environment could be the answer, but I could not find clear documentation on whether Flexible Environment is indeed the way to host intranet-facing applications. Is there someone from GCP who can advise?
I tested the solution recommended by Dan recently. Listed below are my observations:
App Engine Flex allows deploying to a VPC, and this allows VPN scenarios. These VPN scenarios, however, are for connections originating from App Engine to GCP VPCs or to other networks outside GCP, which can be on-prem or in another cloud.
Access destined for the app itself, from a GCP network or another network, is always routed via the internet-facing public IPs. There is no option to access the app at a private IP at the moment.
If there's another update, I will update it here.
Update, 28 Oct 2021
Google has now launched serverless Network Endpoint Groups (NEGs). With these, users can connect App Engine, Cloud Run, and Cloud Functions endpoints to a load balancer. However, at the moment you can only use serverless NEGs with an external HTTP(S) load balancer. You cannot use serverless NEGs with regional external HTTP(S) load balancers or with any other load balancer types. Google documentation for serverless NEGs is available here.
I'm not sure this meets your requirements, but it's possible to set up an App Engine Standard application (not certain about Flexible) such that it is only accessible to users logged into your G-Suite domain. This is the approach I've used for internal-facing applications in the past, but it only applies if your case involves an entity using G-Suite.
You can set this up under the App Engine application Settings, under Identity Aware Proxy.
In this scenario the application is still operating at a publicly accessible location, but only users logged into your G-Suite domain can access it.
It should be possible with the GAE flexible environment. From Advanced network configuration:
You can segment your Compute Engine network into subnetworks. This allows you to enable VPN scenarios, such as accessing databases within your corporate network.
To enable subnetworks for your App Engine application:
Create a custom subnet network.
Add the network name and subnetwork name to your app.yaml file, as specified above.
To establish a VPN, create a gateway and a tunnel for a custom subnet network.
The GAE standard environment doesn't offer the access to the networking layer needed to achieve this.

Using Docker compose within Google App Engine

I am currently experimenting with the Google App Engine flexible environment, especially the feature allowing you to build custom runtimes by providing a Dockerfile.
Docker provides a really nice feature called docker-compose for defining and running multi-container Docker applications.
Now the question is, is there any way one can use the power of docker-compose within GAE? If the answer is no, what would be the best approach for deploying a multi-container application (for instance Nginx + PHP-FPM + RabbitMQ + Elasticsearch + Redis + MongoDB, ...) within GAE flexible environment using Docker?
It is not possible at this time to use docker-compose to have multiple application containers within a single App Engine instance. This does, however, seem to be by design.
Scaling application components independently
If you would like to have multiple application containers, you would need to deploy them as separate App Engine services. There would still only be a single application container per service instance but there could be multiple instances of each service. This would grant you the flexibility you seek of scaling each application component independently. In addition, if the application in a container were to hang, it could not affect other services as they would reside in different VMs.
An added benefit of deploying each component as a separate service is that one need not use the flexible environment for every service. For some very small tasks such as API backends or serving relatively slow-changing web content, the standard environment may suffice and may be less expensive at low resource levels.
Communication between components
Since one of your comments mentions getting instance IPs, I thought you might find inter-service communication useful. I'm not certain why you wish to use VM instance IPs, but a typical use case might be to communicate between instances or services. To do this without instance IPs, your best bet is to issue an HTTP request from one service to another simply using the appropriate URL. If you have a service called web and one called api, the web service can issue a request to where your application is hosted and the api service will receive a request with the X-Appengine-Inbound-Appid header set to your project ID. This can serve as a way of identifying the request as coming from your own application.
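The header check described above could look something like the following minimal sketch. The function name `is_from_own_app` and the plain-mapping representation of the request headers are assumptions for illustration; a real handler would pass in whatever header object its web framework provides.

```python
from typing import Mapping


def is_from_own_app(headers: Mapping[str, str], project_id: str) -> bool:
    """Return True if the inbound app ID header matches our own project ID."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    return normalized.get("x-appengine-inbound-appid") == project_id


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    request_headers = {"X-Appengine-Inbound-Appid": "my-project"}
    print(is_from_own_app(request_headers, "my-project"))
```

The api service would call this at the start of each request and reject anything that does not carry the expected project ID.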
Multicontainer application using Docker
You mention many examples of applications, including Nginx, PHP-FPM, RabbitMQ, etc. With App Engine custom runtimes, you can deploy any container to handle traffic as long as it responds to requests on port 8080. Keep in mind that the primary purpose of the application is to serve responses. The instances should be designed to start up and shut down quickly so that they are horizontally scalable. They should not be used to store any application data. That should remain outside of App Engine, using tools like Cloud SQL, Cloud Datastore, BigQuery or your own Redis instance running on Compute Engine.
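To make the port 8080 requirement concrete, here is a minimal sketch of a stateless process that a custom-runtime container could run. It uses only the Python standard library; the handler name, the `make_server` helper, and the response body are made up for illustration and are not taken from the App Engine documentation.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Stateless handler: every GET gets the same small response."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Suppress per-request logging to stderr in this sketch.
        pass


def make_server(port: int = 8080) -> HTTPServer:
    """Create a server bound to all interfaces on the given port."""
    return HTTPServer(("0.0.0.0", port), HelloHandler)


def main() -> None:
    # Blocks forever; intended as the container's entrypoint.
    make_server(8080).serve_forever()
```

The container's entrypoint would call `main()`. Because the process keeps no data of its own, instances can be started and stopped freely as the service scales.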
I hope this clarifies a few things and answers your questions.
You can follow these steps to deploy a container built from a docker-compose file to Google App Engine.
1) Build your custom image using the docker-compose file:
docker-compose build
2) Create a tag for the local build.
3) Push the image to the Google registry.
4) Deploy the container:
gcloud app deploy --image-url=[HOSTNAME]/[PROJECT-ID]/[IMAGE]
Note that Docker must be authenticated against the registry before the docker commands will run.
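The steps above can be sketched as a small helper that assembles the command sequence. This is only an illustration under assumptions: the local image name depends on your compose file, the example hostname, project, and image values are placeholders, and `gcloud auth configure-docker` is one way to provide the Docker authentication mentioned above. Note that this deploys a single image, so it does not run several containers in one instance.

```python
from typing import List


def deploy_commands(
    local_image: str,   # image name produced by your compose build (assumed)
    hostname: str,      # registry host, the [HOSTNAME] placeholder above
    project_id: str,    # the [PROJECT-ID] placeholder above
    image: str,         # the [IMAGE] placeholder above
) -> List[List[str]]:
    """Return the build, tag, push, and deploy commands in order."""
    remote = f"{hostname}/{project_id}/{image}"
    return [
        ["gcloud", "auth", "configure-docker"],   # let docker push to the registry
        ["docker-compose", "build"],              # build the image locally
        ["docker", "tag", local_image, remote],   # tag the local build
        ["docker", "push", remote],               # push to the registry
        ["gcloud", "app", "deploy", f"--image-url={remote}"],
    ]


if __name__ == "__main__":
    # Placeholder values; print the commands rather than running them.
    for cmd in deploy_commands("myapp_web", "gcr.io", "my-project", "web"):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to review them before running each one by hand or passing them to `subprocess.run`.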

How to access GAE datastore with Objectify and service account credentials?

Is it possible for one GAE application to access the datastore of another GAE application (both applications are hosted under the same Google account) using Objectify? If so, how can I pass service account credentials to Objectify (which API calls)?
It is not possible. Objectify is a very simple and convenient lightweight ORM that sits on top of a GAE Datastore, thus shielding the developer from most of the complexities of using JDO/JPA.
Nowhere in the documentation have I seen the scenario you describe mentioned because that is not the problem it is trying to solve.
I suspect what you will probably need to do is create a Web Service that exposes your GAE application (whose data you want) through an API. Then have your other GAE application call those service methods to obtain the data it needs.
Alternatively, you can use something called remote_api. It allows you to access and manipulate a GAE Datastore remotely.
Below are some links I just found to similar questions after posting my answer:
Can I access Datastore entities of my other Google App Engine Applications
Can one application access other applications data querying the key in Google App Engine?
A solution is to have only one "GAE application" but to make different Modules in your application. The Datastore will be shared between the modules.
Another solution is to use the Remote API, but you won't be able to use Objectify, I think...

Hosting/transferring a web site on Google App Engine

I have my website currently hosted on a paid server, but I want to transfer it to GAE.
How can I do it? Can anyone please help me with this? I'd appreciate your help.
1) First you will have to adapt your website to the GAE framework (Python with Django or the new Java environment). You can test your work by downloading the GAE SDK, which provides a local server.
2) Then create an account, upload your application, and test it.
3) If you have a domain name, create a Google Apps account on this domain, and finally bind this domain to your GAE website. Here is the Google doc.
If it is just a static website which does not need server side scripts or a database, then you might want to look into Google Sites instead of Appengine. You can find out more about Sites here:
If you do have some server-side logic going on, you will need to convert it to either Python or Java and convert your relational database to Google's Data API, which does not support the SQL your current database uses. You can read more about the APIs and what is supported with the Data API and tutorials at:
In response to sanorita's comment "Actually, it's generated html and not plain html. and google appengine is for static data... right?":
AppEngine can host static data, but that is far from its intent.
The purpose of AppEngine is to allow developers to easily deploy their dynamic applications on Google's infrastructure. In the end, assuming you have programmed your app in effective ways to handle scaling (basically just noting that writes to the database are expensive, and contention is the root of all evil) you can handle nearly any amount of traffic.