Moving app to GAE - google-app-engine

Context:
We have a web application written in Java EE on GlassFish, with MySQL accessed over JDBC for storage, an XMPP connection for GCM CCS upstream messaging, and the Quartz scheduler library. Those are the main components of the app.
We're wrapping up our development stage and trying to figure out the best deployment option, considering future scalability both in number of users and in their geographical location (e.g., multiple VMs on multiple continents, if needed). Currently we're using a single VM instance from DigitalOcean for both the application server and the MySQL server, which we can scale up with some effort (not much, but probably more than GAE would require).
Questions:
1) From what I understood, Cloud SQL is replicated in multiple datacenters across the world, so storage-wise the geographical scalability is assured. That's a bonus. However, we would need to update the DB interaction code in the app to make use of the Cloud SQL structure, thus being locked in to their platform. Has this been a problem for you? (I haven't looked at their DB interaction code yet, so I might be off on this one.)
2) Do GAE instances work pretty much like a normal VM would? Are there any differences in this respect? I understand the auto-scaling capability, but how are they geographically scalable? Do you have to select a datacenter location for each individual instance, or how do you achieve multiple worldwide instance locations the way Cloud SQL does right out of the box?
3) How would the XMPP connection fare with multiple instances? Considering each instance is a separate VM, each instance would hold its own XMPP connection to the GCM CCS server. That would cause problems, e.g. if more than 10 instances are opened, the GCM CCS limit of 10 simultaneous XMPP connections would be exceeded.
4) How would the Quartz scheduler fare with multiple instances? Right now we're saving the scheduled jobs in the MySQL database and scheduling them at each server start. With multiple instances, the jobs would be scheduled on every instance and therefore run multiple times. We wouldn't want that.
So, if problems 3 and 4 are indeed as I described, what would the solution be? A single dedicated instance for the XMPP connection and another for the Quartz scheduler?

1) Although Cloud SQL is a managed, replicated RDBMS, you still need to choose a region when creating an instance, so you cannot expect great latency seamlessly across the globe. You would still need to design a proper architecture to achieve that.
2) GAE isn't a VM in any sense. GAE is a PaaS and, therefore, a runtime environment. You should expect several differences: you are restricted to Java, PHP, Go and Python, to the services GAE provides out of the box, and to third-party libraries compatible with its sandbox. You cannot install middleware there, for example. On the other hand, you don't have to deal with anything from the infrastructure standpoint, and you get high availability and scalability transparently.
3) XMPP is not my strong suit, but GAE offers an XMPP service; you should take a look at it (a short sketch follows after point 4). More info: https://developers.google.com/appengine/docs/java/xmpp/?hl=pt-br
4) GAE offers a cron service that works pretty well. You wouldn't need to use Quartz.
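For point 3, sending a message through App Engine's built-in XMPP service from Java looks roughly like the sketch below. The recipient JID and payload are placeholders, and whether this service can stand in for a direct GCM CCS connection (rather than plain XMPP chat) is something you'd have to verify against the CCS documentation.

    import com.google.appengine.api.xmpp.JID;
    import com.google.appengine.api.xmpp.Message;
    import com.google.appengine.api.xmpp.MessageBuilder;
    import com.google.appengine.api.xmpp.SendResponse;
    import com.google.appengine.api.xmpp.XMPPService;
    import com.google.appengine.api.xmpp.XMPPServiceFactory;

    public class XmppSendSketch {
        // Sends a single message through the App Engine XMPP service.
        public static boolean send(String toJid, String body) {
            JID recipient = new JID(toJid);                 // placeholder address
            Message msg = new MessageBuilder()
                    .withRecipientJids(recipient)
                    .withBody(body)                         // placeholder payload
                    .build();
            XMPPService xmpp = XMPPServiceFactory.getXMPPService();
            SendResponse response = xmpp.sendMessage(msg);
            return response.getStatusMap().get(recipient) == SendResponse.Status.SUCCESS;
        }
    }

The nice property is that incoming messages are delivered to your app as HTTP POSTs (to /_ah/xmpp/message/chat/), so any instance can handle them and you don't manage a persistent connection per instance at all.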
My last piece of advice: if you want to migrate an existing application, your best choice would probably be GCE + Cloud SQL, not GAE.
Good luck!
Cheers!

Related

Distributed database with hundreds of read-only replicas which can synchronise asynchronously through HTTP

I have a service running as a sidecar next to a variety of applications.
This service needs to be extremely fast and must not make remote calls.
It has to have an in-memory database. The contents of this database have to be populated and kept up to date (although a lag is acceptable) with a central component.
The service does not accept writes.
Of course this could be done through a mechanism such as long polling, but that brings the complexity of managing such a solution and some intrinsic inefficiencies.
Is there a lightweight, ephemeral, in-process and preferably in-memory database that can synchronise asynchronously with a central replica, preferably over regular HTTP so that no ports need to be opened?
Maybe Couchbase Lite/Mobile is what you are after. At least Mobile syncs over a WebSocket; I'm not sure which protocol Lite uses (or whether there is actually a difference between the products).
It seems Couchbase Lite replaced TouchDB, which was a mobile version of CouchDB, IIRC.
Another variant might be running PouchDB and using CouchDB as the master backend. You don't say which platform the application will run on, which matters if you want an in-process solution.
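If you do go the CouchDB route, the sync itself is plain HTTP: the replica pulls changes from the central database, so only outbound connections are needed. A rough Java sketch of kicking off a continuous pull replication follows; the host names, database names and the assumption of an unauthenticated local CouchDB are made up for illustration, and PouchDB / Couchbase Lite expose the same replication idea through their own APIs.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CouchReplicationSketch {
        public static void main(String[] args) throws Exception {
            // Ask the local CouchDB to continuously pull "catalog" from a central server.
            String body = "{\"source\":\"https://central.example.com:6984/catalog\","
                        + "\"target\":\"catalog_local\",\"continuous\":true,\"create_target\":true}";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://127.0.0.1:5984/_replicate"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }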

Which M.. tier do I need for my app (MongoDB)?

I’m new to MongoDB,
I have an Ionic app for a local restaurant with some products that you can order. The app also has a registration flow to create users. There is also an Angular web app where you can add products, look up users, etc.
Both apps are connected to MongoDB. Unfortunately I don't have any clue which cluster tier / data plan is necessary for deploying these two apps.
Is it maybe better to switch to Firebase?
Can anybody help me please?
Best regards
Basti
Selecting a tier in MongoDB Atlas depends on various factors like data size, IOPS, and price. Since you say this is for a local restaurant, I would assume fairly low traffic to the app, in which case you can go with M10, because that is where MongoDB Atlas starts to provide the valuable features you want for a production database. For a development environment you can try an M5 cluster. Some features you get with M10 or above are:
Dedicated cluster: each mongod process is deployed to its own instance, whereas M0, M2 & M5 run in a shared environment where Atlas automatically upgrades the cluster to the latest version as it becomes available. That is not ideal for production apps, since an upgrade could break functionality or a package you depend on.
Queryable backups: you can query a specific continuous-backup snapshot, which is really helpful for restoring part of the data instead of an entire dataset that was backed up a day ago.
Network peering: since most projects nowadays deploy to cloud platforms, it helps that clusters >= M10 support network (VPC) peering.
Metrics & Performance Advisor: one of the most valuable benefits of clusters >= M10. Alerts tell you which kinds of queries are taking too long and how many connections are open at a given time, you can monitor CPU thresholds and get alerted, and MongoDB can additionally suggest indexes to create for collections where queries fail to use an existing index.
At the end of the day, most other features are roughly the same. In my experience you usually estimate and pre-pay a certain amount on the MongoDB Atlas account for around 3 years, and you don't get anything back if you haven't used all of it. You can also upgrade and downgrade clusters at any time manually, or have them scale up and down automatically based on incoming traffic.
Ref : cluster-tier

Tomcat via Apache Server going down after too many connections

I have an Apache (2.4) Server that serves content through the AJP connector on a Tomcat 7 Server.
One of my clients manages to kill the Tomcat instance by running too many concurrent connections to a JSP JSON API service. (Apache still works, but Tomcat falls over; restarting Tomcat brings it back up.) There are no errors in Tomcat's logs.
I would like to protect the site from falling over like that, but I am not sure what configurations to change.
I do not want to limit the number of concurrent connections, as there are legitimate use cases for that.
My Tomcat memory settings are :
Initial Memory pool : 1280MB
Maximum memory pool : 2560MB
which I assumed was plenty.
It might be worth mentioning that the API service relies on multiple, possibly heavy MySQL connections.
Any advice would be most appreciated.
Why don't you slowly move your most used/important application features to a microservices architecture and dockerize your Tomcat servers, so that you can run multiple instances of your application? This should help the application handle many connections without hurting the overall performance of the servers doing the work.
If you are talking about scaling, you need to do horizontal scaling here with multiple Tomcat servers.
If you cannot limit user connections and still want the app to run smoothly, then you need to scale. An architectural change to microservices is an option, but it isn't always feasible for a production system.
The best thing to consider is running multiple Tomcats sharing the load. There are various ways to do this; with your tech stack, I feel an Apache 2 load-balancing module (e.g. mod_jk or mod_proxy_balancer over your existing AJP connector) in combination with Tomcat will work best.
Have an example here.
Now, with respect to server capacity, DB connection capacity, etc., you might also need to think about vertical scaling.
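One concrete thing to check on the DB side: since the API relies on multiple, possibly heavy MySQL connections, a burst of concurrent requests that each open their own connection can exhaust memory or the MySQL connection limit and take Tomcat down quietly. A bounded connection pool turns that failure mode into requests queuing for a connection instead. A rough sketch with Tomcat's own JDBC pool, where the URL, credentials and limits are placeholders you'd tune for your load:

    import org.apache.tomcat.jdbc.pool.DataSource;
    import org.apache.tomcat.jdbc.pool.PoolProperties;

    public class PooledDataSourceSketch {
        public static DataSource create() {
            PoolProperties p = new PoolProperties();
            p.setUrl("jdbc:mysql://db.example.com:3306/appdb"); // placeholder URL
            p.setDriverClassName("com.mysql.jdbc.Driver");
            p.setUsername("app");                               // placeholder credentials
            p.setPassword("secret");
            p.setMaxActive(20);       // hard cap on concurrent DB connections
            p.setMaxIdle(10);
            p.setMaxWait(10000);      // ms to wait for a free connection before failing fast
            p.setRemoveAbandoned(true);
            p.setRemoveAbandonedTimeout(60); // reclaim connections leaked by slow requests

            DataSource ds = new DataSource();
            ds.setPoolProperties(p);
            return ds;
        }
    }

In practice you would usually declare the same settings as a JNDI resource in context.xml rather than in code, but the knobs are the same.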

Google App Engine Cloud SQL too many connections

I have an app that runs on Google App Engine and uses the Task Queue API to do some of the heavier lifting in the background. Some of those tasks need to connect to Cloud SQL to do their work. At scale I get too many tasks attempting to connect to Cloud SQL at once. What I need is some sort of data-service layer or shared client so that the tasks themselves aren't each making individual connections to Cloud SQL. If anyone has any ideas I'd love to hear them.
Yes, you can actually do this, but it will take a little bit of planning, coding, and configuration on your side.
One idea is to use a pull queue (instead of push queues). With a pull queue you enqueue your tasks and execute them in a separate module of your application. The hardware of that module can be configured separately from your main module, so you avoid having too many instances serving requests, which in turn allows you to make better use of connection pooling.
Of course, depending on the traffic you are getting, you'll want to decide how many concurrent backend instances you run (and therefore how many connect to your DB) to avoid or minimize contention.
Easier said than done, but here are two resources that will help you out, plus a rough sketch of the lease loop below them:
App Engine Modules - https://developers.google.com/appengine/docs/java/modules/
Pull Queues - https://developers.google.com/appengine/docs/python/taskqueue/overview-pull
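To make the pull-queue idea concrete, a minimal sketch of the two sides is below. The queue name, lease duration and batch size are made up, and the queue would have to be declared with mode "pull" in queue.xml; the point is that only the worker module, whose instance count you control, ever opens Cloud SQL connections.

    import com.google.appengine.api.taskqueue.Queue;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskHandle;
    import com.google.appengine.api.taskqueue.TaskOptions;
    import java.util.List;
    import java.util.concurrent.TimeUnit;

    public class PullQueueSketch {

        // Producer side (e.g. a frontend handler): enqueue the work instead of
        // touching Cloud SQL directly. "cloudsql-work" is a hypothetical queue name.
        public static void enqueue(String payload) {
            Queue queue = QueueFactory.getQueue("cloudsql-work");
            queue.add(TaskOptions.Builder
                    .withMethod(TaskOptions.Method.PULL)
                    .payload(payload));
        }

        // Worker side (a backend module with a bounded number of instances):
        // lease a batch, do the DB work over a shared/pooled connection, delete the tasks.
        public static void drainOnce(Queue queue) {
            List<TaskHandle> tasks = queue.leaseTasks(60, TimeUnit.SECONDS, 10);
            for (TaskHandle task : tasks) {
                // ... perform the Cloud SQL work encoded in task.getPayload() ...
                queue.deleteTask(task);
            }
        }
    }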

Will using a Cloud PaaS automatically solve scalability issues?

I'm currently looking for a cloud PaaS that will allow me to scale an application to handle anything between 1 user and 10 million+ users. I've never worked on anything this big, and the big question I can't seem to get a clear answer to is this: if you develop, let's say, a standard application with a relational database and SOAP web services, will it scale automatically when deployed on a PaaS, or do you still need to build the application with failover, redundancy and all of those things in mind?
Let's say I deploy a Spring/Hibernate application to Amazon EC2 and create a single Ubuntu Server instance with Tomcat installed. Will this application just scale indefinitely, or do I need more Ubuntu instances? If more than one instance is needed, does Amazon take care of running the application across them, or is that the developer's responsibility? What about database storage: can I install a database on EC2 that will scale as the data grows, or do I need to use one of their APIs instead if I want it to scale indefinitely?
Cloud Foundry allows you to build locally and deploy straight to their PaaS, but since it's in beta there's a limit on the amount of resources you can use, and databases are limited to 128 MB if I remember correctly, so it's a no-go for now. Some have suggested installing Cloud Foundry on Amazon EC2; how does it scale, and how is the database layer handled then?
GAE (Google App Engine): will this allow me to just deploy an app and not have to worry about how it scales and implements redundancy? There appear to be some limitations on what you can and can't run on GAE, and their recent price increase upset quite a large number of developers. Is it really that expensive compared to other providers?
So basically, will it scale and what needs to be done to make it scale?
That's a lot of questions for one post. Anyway:
Amazon EC2 does not scale automatically with load. EC2 is basically just a virtual machine. You can achieve scaling of EC2 instances with Auto Scaling and Elastic Load Balancing.
SQL databases are hard to scale horizontally; that's why people started using NoSQL databases in the first place. It's best to see which database your cloud provider offers as a managed service: Datastore on GAE and DynamoDB on Amazon.
Installing your own database on EC2 instances is very impractical: instance-store ("ephemeral") volumes lose their data when the instance is stopped or terminated, so you would need EBS volumes plus your own replication and backup strategy.
GAE's Datastore is actually one big database shared by all applications running on App Engine, so it's very scalable; your millions of users should not be a problem for it.
http://highscalability.com/blog/2011/1/11/google-megastore-3-billion-writes-and-20-billion-read-transa.html
Yes, App Engine scales automatically, both the frontend instances and the database. There is nothing special you need to do to make it scale; just use their APIs.
There are limitations on what you can do with App Engine:
A. No local storage (filesystem) - you need to use Datastore or Blobstore.
B. Comet is only supported via their proprietary Channel API.
C. Datastore is a NoSQL database: no JOINs, limited queries, limited transactions.
The cost of GAE is not bad. We do 1M requests a day for about 5 dollars a day. The biggest saving comes from the fact that you do not need a system admin on GAE (but you do need one for EC2). Compared to the cost of manpower, GAE is incredibly cheap.
Some hints to save money on (and speed up) GAE:
A. Use get by key instead of query in the Datastore (this requires carefully crafted natural keys); see the sketch after this list.
B. Use memcache to cache data you got from the Datastore. This can be done automatically with Objectify and its @Cached annotation.
C. Denormalize data, meaning you write data redundantly in several places in order to read it back in as few operations as possible.
D. If you have a lot of REST requests from devices and you do not use cookies, switch off session support (or roll your own, as we did). Sessions use the Datastore under the hood and do a get and a put for every request.
E. Read about adjusting the performance settings (idle instances, pending latency). Try different values depending on how tolerant your app is to request delay and on your traffic patterns/spikes; we were able to cut our frontend instances by 70%.
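To illustrate point A, here is a minimal sketch using the low-level Datastore API; the "User" kind and the e-mail-as-key-name scheme are made up for illustration. With Objectify you would get point B's memcache layer on top of the key lookup simply by annotating the entity class with @Cached.

    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.EntityNotFoundException;
    import com.google.appengine.api.datastore.Key;
    import com.google.appengine.api.datastore.KeyFactory;
    import com.google.appengine.api.datastore.Query;

    public class DatastoreGetVsQuerySketch {
        private static final DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

        // Cheap: a get by key, possible because the e-mail address is used as the
        // entity's natural key name.
        public static Entity userByGet(String email) throws EntityNotFoundException {
            Key key = KeyFactory.createKey("User", email);
            return ds.get(key);
        }

        // More expensive: the same lookup expressed as a query over an indexed property.
        public static Entity userByQuery(String email) {
            Query q = new Query("User")
                    .setFilter(new Query.FilterPredicate("email", Query.FilterOperator.EQUAL, email));
            return ds.prepare(q).asSingleEntity();
        }
    }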
