Anyone up to creating a Tomcat-based alternative for GAE? - google-app-engine

If we could run a GAE app without any code changes on our own servlet engine, that would be great because:
if Google changes their billing policy, or their current policy doesn't fit our app's needs, we can just move to our own server
we can do things that are not allowed on GAE, settling for a single JVM and a single DB
We don't actually need a distributed system but more of a realtime system with synchronized blocks, true locking mechanisms, other servers/software installed on the same machine, a socket interface, etc...
Such a package should include at least:
Tomcat (or equivalent)
DataNucleus Access Platform
(Task Queue service)
Any idea if it's easy to put such a thing together, or whether it already exists somewhere?
Thanks

Good question - GAE is excellent, but it has considerable limitations, so I think it is a good idea to keep your options open. With that in mind here are some options.
http://appscale.cs.ucsb.edu/
"AppScale is a platform that allows users to deploy and host their own Google App Engine applications. It executes automatically over Amazon EC2 and Eucalyptus as well as Xen and KVM. It has been developed and is maintained by the RACELab at UC Santa Barbara.
There is also TyphoonAE but it is Python specific so probably not useful for you.
Also take note of the Siena project...
http://www.sienaproject.com/index.html
This is supposed to provide GAE/J users with a persistence API that is better suited to the GAE Datastore than JDO/JPA, but is still portable to other platforms.

Related

Caching after GAE standard migration to Go 1.11/1.12

I've almost completed migrating based on Google's instructions.
It's very nice to not have to call into the App Engine libraries whatsoever.
However, now I must replace my calls to App Engine standard memcache.
Here's what the guide says: "To use a memcache service on App Engine, use Redis Labs Memcached Cloud instead of App Engine Memcache."
So is this my only option, a third party? They don't even list pricing on their page if GCE is selected.
I also see in the standard environment how-to guides there is a guide on Connecting to internal resources in a VPC network.
From that link it mentions Cloud Memorystore. I can't find any examples if this is advisable or possible to do on GAE standard. Of course it wasn't previously possible but now that GAE standard has become much more "standard", I think it should be possible?
Thanks for any advice on the best way forward.
Memorystore appears to be Google's replacement:
https://cloud.google.com/memorystore/
You connect to it using this guide:
https://cloud.google.com/appengine/docs/standard/go/using-memorystore
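For reference, a minimal sketch of what the connection could look like from Go on App Engine standard. It assumes REDISHOST/REDISPORT environment variables (e.g. set in app.yaml) pointing at the Memorystore instance, and uses the redigo client; check the guide above for the exact, current setup.

```go
// Sketch: App Engine standard (Go 1.11+) handler backed by Memorystore (Redis).
package main

import (
	"fmt"
	"net/http"
	"os"

	"github.com/gomodule/redigo/redis"
)

var pool *redis.Pool

func main() {
	// REDISHOST/REDISPORT are assumed to be set as env variables,
	// pointing at the Memorystore instance (reached via Serverless VPC Access).
	addr := os.Getenv("REDISHOST") + ":" + os.Getenv("REDISPORT")
	pool = &redis.Pool{
		MaxIdle: 10,
		Dial:    func() (redis.Conn, error) { return redis.Dial("tcp", addr) },
	}

	http.HandleFunc("/", handle)
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}
	http.ListenAndServe(":"+port, nil)
}

func handle(w http.ResponseWriter, r *http.Request) {
	conn := pool.Get()
	defer conn.Close()

	// A simple get-or-set, standing in for the old memcache lookup.
	val, err := redis.String(conn.Do("GET", "greeting"))
	if err == redis.ErrNil {
		val = "hello from Memorystore"
		conn.Do("SET", "greeting", val)
	} else if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	fmt.Fprintln(w, val)
}
```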
Alas it costs about $1.20/GB per day with no free quota.
Thus, if your data doesn't change and requires less than 100 MB of cache at a time, the first answer might be better (free). Also, your data won't blow up the instance's memory, since you can control the max size of the cache.
However, if your data changes or you need more cache, MemoryStore is a more direct replacement to MemCache - just costs money.
I've been thinking about this. 2nd gen instances have twice the RAM, so if a global cache isn't required (as in, items don't change once created, so you can name them using their sha256), you can run your own local thread-safe memcache (such as https://github.com/dgraph-io/ristretto) and allocate some of the extra RAM to it. It'll be faster than Memcache was, so requests can be serviced even faster, keeping the number of instances low.
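A rough sketch of that local-cache idea with ristretto (its v0.x, non-generic API); the sizes, the sha256-derived keys, and the []byte values are just illustrative assumptions.

```go
// A local, in-process cache for a 2nd gen App Engine instance.
package localcache

import (
	"crypto/sha256"
	"encoding/hex"

	"github.com/dgraph-io/ristretto"
)

// New allocates a cache capped at roughly maxBytes, so it only uses
// the extra RAM set aside for it on the instance.
func New(maxBytes int64) (*ristretto.Cache, error) {
	return ristretto.NewCache(&ristretto.Config{
		NumCounters: 1e6,      // number of keys to track access frequency for
		MaxCost:     maxBytes, // total "cost" the cache may hold (here: bytes)
		BufferItems: 64,       // recommended default
	})
}

// KeyFor derives an immutable key from the content itself, which works
// because items never change once created.
func KeyFor(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// Example usage: cache a rendered blob and read it back without any RPC.
func example() {
	c, _ := New(64 << 20) // e.g. give the cache 64 MB of the extra RAM
	data := []byte("rendered page")
	c.Set(KeyFor(data), data, int64(len(data))) // cost = size in bytes
	c.Wait()                                    // Set is buffered; Wait flushes it
	if v, ok := c.Get(KeyFor(data)); ok {
		_ = v.([]byte) // cache hit served from local RAM
	}
}
```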
You could make it global for data that does change, by using pub/sub between instances, but I think that's significantly more work.
To ease the migration to 1.12, I have been thinking of using this solution:
create a dedicated app using the 1.11 runtime.
set up twirp endpoints to act as a proxy for all the deprecated App Engine services (memcache, mail, search...); a rough sketch of one such endpoint follows.
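A sketch of what the proxy side could look like on the 1.11 app. The MemcacheProxy service, its GetRequest/GetResponse messages, and the NewMemcacheProxyServer constructor are hypothetical names that would come from your own twirp .proto, not from an existing library; only google.golang.org/appengine/memcache is the real legacy API.

```go
// Sketch of the proxy app on the Go 1.11 runtime. All pb.* names are
// hypothetical and would come from your own twirp-generated package.
package main

import (
	"context"
	"net/http"

	"google.golang.org/appengine"
	"google.golang.org/appengine/memcache"

	pb "example.com/memcacheproxy" // hypothetical twirp-generated package
)

// proxy implements the hypothetical MemcacheProxy twirp interface.
type proxy struct{}

func (p *proxy) Get(ctx context.Context, req *pb.GetRequest) (*pb.GetResponse, error) {
	item, err := memcache.Get(ctx, req.Key) // legacy App Engine memcache call
	if err == memcache.ErrCacheMiss {
		return &pb.GetResponse{Hit: false}, nil
	}
	if err != nil {
		return nil, err
	}
	return &pb.GetResponse{Hit: true, Value: item.Value}, nil
}

func main() {
	h := pb.NewMemcacheProxyServer(&proxy{}) // generated constructor (hypothetical name)
	// Wrap the handler so the twirp method receives an App Engine context,
	// which the legacy memcache package requires.
	http.Handle(h.PathPrefix(), http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h.ServeHTTP(w, r.WithContext(appengine.NewContext(r)))
	}))
	appengine.Main()
}
```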

When to use Google App Engine Flex vs Google Cloud Run

I want to deploy containerized code using one of Google's serverless options. From what I understand Google has two options for this:
Google App Engine Flexible Environment
Google Cloud Run (in beta)
I've watched the 2019 Google Next talk Where Should I Run My Code? Choosing From 5+ Compute Options. And I read Jerry101's answer to the general question "What is the difference between Google App Engine and Google Cloud Run?".
To me it basically sounds like Cloud Run is the answer to the limitations of using Google App Engine Flexible Environment.
The reasons I can think of to choose App Engine Flexible Environment over Cloud Run are:
Legacy - if your code currently relies on App Engine Flex you might not want to deal with moving it
Track record - App Engine Flex has been around for a while in general availability and in that sense has a track record, whereas Cloud Run is just in Beta
But those are both operational considerations. Neither is a concern for me. Is there a technical advantage to choosing App Engine Flex over Cloud Run?
Thanks
Note: The beta Serverless VPC Access for App Engine is only available for the standard environment as of this question posting April 2019, not for Flex, so that's not a consideration in the question of App Engine Flex vs Cloud Run
Pricing/Autoscaling: The pricing models of GAE Flexible Environment and Cloud Run are a bit different.
In GAE Flexible, you are always running at least 1 instance at any time. So even if your app is not getting any requests, you’re paying for that instance. Billing granularity is 1 minute.
In Cloud Run, you are only paying when you are processing requests, and the billing granularity is 0.1 second. See here for an explanation of the Cloud Run billing model.
Underlying infrastructure: Since GAE Flexible is running on VMs, it is a bit slower than Cloud Run to deploy a new revision of your app, and scale up. Cloud Run deployments are faster.
Portability: Cloud Run uses the open source Knative API and its container contract. This gives you flexibility and freedom to a greater extent. If you wanted to run the same workload on an infra you manage (such as GKE), you could do it with "Cloud Run on GKE".
I'd actually suggest that you seriously consider Cloud Run over App Engine.
Over time, I've seen a few comments of a "new" App Engine in the works, and it really seems that Cloud Run is that answer. It is in beta, and that can be an issue. I've seen some companies use beta services in production and others wait. However, if I am going to start a new app today - it's going to be Cloud Run over App Engine Flex for sure.
Google is very deep into Kubernetes as a business function. As Cloud Run is sitting on GKE - this means that it is indirectly receiving development via other teams (The general GKE infrastructure).
Conversely, App Engine is on some older tech. Although it's not bad - it is "yesterday's" technology. Google, to me, seems to be a company that gets really excited about what is new and what is being highly adopted in the industry.
All this said - when you wrap your services in a container, you should be able to run them anywhere? Well, anywhere there is a container platform. You can front your services with something like Cloud Endpoints and you can deploy on both App Engine and Cloud Run - swap back and forth. Then, at that point, the limitations might be the services that are offered. For instance, Cloud Run currently doesn't support some items, like Cloud Load Balancing or Cloud Memorystore. That might be a blocker today.
Short story: Appengine is something real, relatively stable. Cloud Run is pretty much just a draft/idea, very unstable.
Long story:
Being in alpha/beta, Google Cloud Run may go through many changes. If you are old enough, you might remember how dramatically App Engine pricing has changed. It promised CPU/RAM-based pricing, then decided that's not "possible", or at least not very profitable, and moved to VM-based pricing; then they shipped a decent App Engine release (App Engine Flex, or whatever name it had at that time) but also increased the price again by adding a minimum-instance model. Not to mention the countless API/breaking changes or the changes to limits.
Cloud Run is based on gVisor, which has some limitations, so depending on the language/library you use and what you do, it may break (or just Google's implementation may break) at some point, and there is nothing you can do (i.e. patch the system); it will ruin your productivity and potentially your business. You may want to have a look at its current issues.
Free advice: even if you choose App Engine or Cloud Run, avoid proprietary APIs/services such as Google Datastore. They may ruin your business. Pricing, APIs and limits will change, and there is no real open source or paid alternative, so your code is not portable. Your code is pretty much worthless outside of Google Cloud.
Disclaimer: I've been burned by App Engine changes and Datastore lock-in, so my opinion may be biased.
I have an ML model with a REST API interface as a microservice. When I tried to run it with Cloud Run, it deploys but just does not work. I had to switch back to the App Engine flexible environment.
Cloud Run (fully managed) currently (Jul 2020) has a RAM limit of 2 GB. For better hardware I would have to go for Anthos with GKE infrastructure, but that needs a minimum of at least 4 instances to work properly.
Mine being a tiny application, I settled for the App Engine flexible environment. Though the autoscale settings require a minimum of 2 instances, in the early days it can be managed with manual scaling and 1 instance as the limit.
EDIT:
As of Aug 22, 2020, the RAM limit is 4 GB and the number of cores is 2 for fully managed Cloud Run.
The main difference is background tasks.
In Cloud Run, everything is kicked off by a request, and once that request completes, the instance won't stay up any longer.
App Engine also gave you some built-in freebies like memory caching, but I don't think that's true of App Engine Flex.
For a straightforward HTTP API, the differences are negligible, and you can get some of the things that App Engine gives you with other GCP products (Cloud Scheduler, Cloud Tasks).
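For example, a hedged sketch of enqueueing background work from a Cloud Run service with Cloud Tasks, standing in for App Engine's built-in task queue; the project, location, queue, and worker URL below are placeholders.

```go
// Sketch: enqueue an HTTP task from a Cloud Run service using Cloud Tasks.
package main

import (
	"context"
	"log"

	cloudtasks "cloud.google.com/go/cloudtasks/apiv2"
	taskspb "google.golang.org/genproto/googleapis/cloud/tasks/v2"
)

func enqueue(ctx context.Context, payload []byte) error {
	client, err := cloudtasks.NewClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close()

	req := &taskspb.CreateTaskRequest{
		// Placeholder queue path: projects/PROJECT/locations/LOCATION/queues/QUEUE
		Parent: "projects/my-project/locations/us-central1/queues/my-queue",
		Task: &taskspb.Task{
			MessageType: &taskspb.Task_HttpRequest{
				HttpRequest: &taskspb.HttpRequest{
					HttpMethod: taskspb.HttpMethod_POST,
					Url:        "https://my-worker-abcdefghij-uc.a.run.app/work", // placeholder
					Body:       payload,
				},
			},
		},
	}
	// The worker endpoint receives the POST later, outside the request that
	// enqueued it, which is how you get "background" work on Cloud Run.
	_, err = client.CreateTask(ctx, req)
	return err
}

func main() {
	if err := enqueue(context.Background(), []byte(`{"job":"example"}`)); err != nil {
		log.Fatal(err)
	}
}
```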
You can check this video out for a comparison and demo on cloud run:
https://www.youtube.com/watch?v=rVWopvGE74c
App Engine Flexible focuses on "code first" and is developer-focused; an App Engine app is made up of multiple services, and you really don't have to do any kind of naming when deploying your applications.
Characteristics of the GAE flexible environment:
It is not possible to downscale to ZERO
Source code that is written in a version of any of the supported programming languages: Python, Java, Node.js, Go, Ruby, PHP, or .NET
Runs in a Docker container that includes a custom runtime or source code written in other programming languages.
Uses or depends on frameworks that include native code.
Accesses the resources or services of your Google Cloud project that reside in the Compute Engine network.
Maximum request timeout: 60 minutes
Cloud Run is a managed compute platform that enables you to run containers that are invocable via requests or events. Everything is a service, whether it's an actual service or an application with a web interface, so consider its use as the deployment of a service rather than an application (a minimal example follows the list below).
Characteristics of Cloud Run :
It is serverless: it abstracts away all infrastructure management
It requires your application to be stateless.
GCP will spin up multiple instances of your app to scale it dynamically
Downscale to ZERO
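To make the container contract concrete, a minimal stateless service only has to listen on the port Cloud Run injects and keep no state between requests; a sketch, not tied to any particular app:

```go
// Minimal stateless HTTP service matching the Cloud Run container contract.
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// No state is kept on the instance: it may be scaled to zero
		// (and a fresh one started) between any two requests.
		fmt.Fprintln(w, "hello from a Cloud Run revision")
	})

	port := os.Getenv("PORT") // Cloud Run injects the port to listen on
	if port == "" {
		port = "8080"
	}
	log.Fatal(http.ListenAndServe(":"+port, nil))
}
```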
You can use the link below to see the differences between Cloud Run and App Engine:
Hosting Options
Sometimes a reason to use App Engine over Cloud Run is that Cloud Run doesn't support background processes. Its request timeout is also only 15 minutes.

Does Google App Engine support multiprocessing via Python, and does the DB support multiple writes on localhost?

Regarding a production environment, I would like to know whether the Python standard environment (2.7) on Google App Engine supports code with multiprocessing and pooling, using Google's Datastore, or whether MapReduce should be used instead.
And regarding the development environment on localhost, I would also like to know how to avoid a database lock when writing to the same database from processes started in different shell terminals.
Thanks
You can have a look at this post on Google Groups, where it is confirmed that multiprocessing is not available in Google App Engine (GAE) Standard environment, but you can implement it in GAE Flexible. You might also be interested in this post about parallel execution in GAE, and Tasklets in particular with a Cloud Datastore example.
Regarding database lock:
Updates are actually done within a datastore transaction, and NDB by default will retry the operation three times before failing altogether. It is recommended you only update an entity group once per second at the most. If you are seeing database locks, then you're probably doing something wrong. We implemented a version of the "fork-join queue" described by Brett Slatkin in his 2010 data pipelines talk, which is a method of "joining" many updates to the same entity such that they can all be applied at once at a controlled rate: https://www.youtube.com/watch?v=zSDC_TU7rtc&feature=youtu.be&t=33m37s
also, see the discussion going on here:
How to deal with eventual consistency in fork-join-queue

Moving app to GAE

Context:
We have a web application written in Java EE on Glassfish + MySQL with JDBC for storage + XMPP connection for GCM CCS upstream messaging + Quartz scheduler library. Those are the main components of the app.
We're wrapping up our app development stage and we're trying to figure out the best option for deploying it, considering future scalability, both in number of users and their geographical location (ex, multiple VMS on multiple continents, if needed). Currently we're using a single VM instance from DigitalOcean for both the application server and the MySQL server, which we can then scale up with some effort (not much, but probably more than GAE).
Questions:
1) From what I understood, Cloud SQL is replicated in multiple datacenters across the world, so storage-wise the geographical scalability is assured. So this is a bonus. However, we would need to update the DB interaction code in the app to make use of the Cloud SQL structure, thus being locked in to their platform. Has this been a problem for you? (I haven't looked at their DB interaction code yet, so I might be off on this one)
2) Do GAE instances work pretty much like a normal VM would? Are there any differences regarding this aspect? I've understood the auto-scaling capability, but how are they geographically scalable? Do you have to select a datacenter location for each individual instance? Or how can you achieve multiple worldwide instance locations as Cloud SQL does right out of the box?
3) How would the XMPP connection fare with multiple instances? Considering each instance is a separate VM, that means each instance will have a unique XMPP connection to the GCM CCS server. That would cause multiple problems, e.g. if more than 10 instances are opened, the limit of 10 simultaneous XMPP connections will be hit.
4) How would the Quartz scheduler fare with the instances? Right now we're saving the scheduled jobs in the MySQL database and scheduling them at each server start. If there are multiple instances, the jobs will be scheduled on each instance, and therefore multiple times. We wouldn't want that.
So, if indeed problems 3 & 4 are like I described, what would be the solution to those 2 problems ? Having a single instance for the XMPP connection as well as a single instance for the Quartz scheduler ?
1) Although Cloud SQL is a managed replicated RDBMS, you still need to choose a region when creating an instance. That said, you cannot expect the latency to be great seamlessly across the globe. You would still need to design a proper architecture to achieve that.
2) GAE isn't a VM in any sense. GAE is a PaaS and, therefore, a runtime environment. You should expect several differences: you are restricted to Java, PHP, Go and Python, to the mechanisms GAE provides out of the box, and to compatible third-party libraries. You cannot install middleware there, for example. On the other hand, you don't have to deal with anything from the infrastructure standpoint, and you get transparent high availability and scalability.
3) XMPP is not my strong suit, but GAE offers an XMPP service; you should take a look at it. More info: https://developers.google.com/appengine/docs/java/xmpp/?hl=pt-br
4) GAE offers a cron service that works pretty well. You wouldn't need to use Quartz.
My last advice is: if you want to migrate an existing application, your best choice would probably be GCE + CloudSQL, not GAE.
Good luck!
Cheers!

Will using a Cloud PaaS automatically solve scalability issues?

I'm currently looking for a cloud PaaS that will allow me to scale an application to handle anything between 1 user and 10 million+ users ... I've never worked on anything this big, and the big question that I can't seem to get a clear answer to is: if you develop, let's say, a standard application with a relational database and SOAP web services, will this application scale automatically when deployed on a PaaS solution, or do you still need to build the application with failover, redundancy and all those things in mind?
Let's say I deploy a Spring Hibernate application to Amazon EC2 and I create a single instance of Ubuntu Server with Tomcat installed, will this application just scale indefinitely or do I need more Ubuntu instances? If more than one Ubuntu instance is needed, does Amazon take care of running the application over both instances or is this the developer's responsibility? What about database storage, can I install a database on EC2 that will scale as the database grows or do I need to use one of their APIs instead if I want it to scale indefinitely?
CloudFoundry allows you to build locally and just deploy straight to their PaaS, but since it's in beta, there's a limit on the amount of resources you can use and databases are limited to 128 MB if I remember correctly, so this is a no-go for now. Some have suggested installing CloudFoundry on Amazon EC2; how does it scale, and how is the database layer handled then?
GAE (Google App Engine): will this allow me to just deploy an app and not have to worry about how it scales and implements redundancy? There appear to be some limitations on what you can and can't run on GAE, and their recent price increase upset quite a large number of developers. Is it really that expensive compared to other providers?
So basically, will it scale and what needs to be done to make it scale?
That's a lot of questions for one post. Anyway:
Amazon EC2 does not scale automatically with load. EC2 is basically just a virtual machine. You can achieve scaling of EC2 instances with Auto Scaling and Elastic Load Balancing.
SQL databases scale poorly. That's why people started using NoSQL databases in the first place. It's best to see which database your cloud provider offers as a managed service: Datastore on GAE and DynamoDB on Amazon.
Installing your own database on EC2 instances is very impractical, as EC2 has ephemeral storage (it loses all data on "disk" when it reboots).
GAE Datastore is actually one big database for all applications running on it, so it's pretty scalable: your millions of users should not be a problem for it.
http://highscalability.com/blog/2011/1/11/google-megastore-3-billion-writes-and-20-billion-read-transa.html
Yes App Engine scales automatically, both frontend instances and database. There is nothing special you need to do to make it scale, just use their API.
There are limitations on what you can do with App Engine:
A. No local storage (filesystem) - you need to use Datastore or Blobstore.
B. Comet is only supported via their proprietary Channel API
C. Datastore is a NoSQL database: no JOINs, limited queries, limited transactions.
The cost of GAE is not bad. We do 1M requests a day for about 5 dollars a day. The biggest saving comes from the fact that you do not need a system admin on GAE (but you do need one for EC2). Compared to the cost of manpower, GAE is incredibly cheap.
Some hints to save money on (and speed up) GAE:
A. Use get instead of query in Datastore (requires carefully crafting natural keys); a sketch follows this list.
B. Use memcache to cache data you got from the datastore. This can be done automatically with objectify and its @Cached annotation.
C. Denormalize data, meaning you write data redundantly in various places in order to get to it in as few operations as possible.
D. If you have a lot of REST requests from devices where you do not use cookies, then switch off session support (or roll your own as we did). Sessions use the datastore under the hood, and for every request they do a get and a put.
E. Read about adjusting app settings. Try different settings (depending on how tolerant your app is to request delay and on your traffic patterns/spikes). We were able to cut down frontend instances by 70%.
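To make hint A concrete, here is a sketch using the Cloud Datastore Go client rather than objectify, purely for illustration; the User kind, the Email field, and the use of the email address as the natural key are made-up examples.

```go
// Sketch: key-based Get vs. query, using the Cloud Datastore Go client.
package hints

import (
	"context"

	"cloud.google.com/go/datastore"
)

type User struct {
	Email string
	Name  string
}

// byQuery is the flexible but more expensive path: an index scan plus a fetch.
func byQuery(ctx context.Context, c *datastore.Client, email string) (*User, error) {
	var users []User
	q := datastore.NewQuery("User").Filter("Email =", email).Limit(1)
	if _, err := c.GetAll(ctx, q, &users); err != nil {
		return nil, err
	}
	if len(users) == 0 {
		return nil, nil // not found
	}
	return &users[0], nil
}

// byKey is the cheap path: it works only if the entity was originally
// written under a natural key (here, the email address itself).
func byKey(ctx context.Context, c *datastore.Client, email string) (*User, error) {
	var u User
	if err := c.Get(ctx, datastore.NameKey("User", email, nil), &u); err != nil {
		return nil, err
	}
	return &u, nil
}
```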
