Tomcat via Apache Server going down after too many connections - apache2

I have an Apache (2.4) Server that serves content through the AJP connector on a Tomcat 7 Server.
One of my clients manages to kill the tomcat instance after running too many concurrent connections to a JSP JSON Api service. (Apache still works, but tomcat falls over. Restarting Tomcat brings it back up) there are no errors in tomcats logs.
I would like to protect the site from falling over like that, but I am not sure what configurations to change.
I do not want to limit the number of concurrent connections as there are legitimate use cases for that,
My Tomcat memory settings are :
Initial Memory pool : 1280MB
Maximum memory pool : 2560MB
which I assumed was plenty.
It might be worth mentioning that the API service relies on multiple, possibly heavy MySQL connections.
Any advice would be most appreciated.

Why don't you'd slowly switch your most used/ important application features to microservices architecture and dockerize your tomcat servers to be able to manage multiple instances of your application. This will hopefully help your application to manage multiple connections without impacting the overall performance of the servers at the job.

If you are talking about scaling, you need to do the horizontal scaling her with multiple tomcat servers.
If you cannot limit user connections & still want the app to run smooth, then you need to scale. Architectural change to microservices is an option but may not be possible always for a production solution.
The best to think about is running multiple tomcats sharing the load. There are various ways to do this. With your tech stack, I feel the Apache 2 load balancer plugin in combination with Tomcat will do best.
Have an example here.
Now, with respect to server capacity, db connection capacity etc, you might also need to think about vertical scaling.

Related

Database connection pool strategy for micro services

We are trying to convert our monolithic application to a micro services based architecture. We use Postgresql as one of our database in the monolithic application with BoneCP for connection pooling.
When this monolith is split to a number of independent micro-services with each of them running in a different JVM, I can think about two options for connection pooling
BoneCP or any decent connection pool for each microservice - My initial research shows that this is the primary choice. It is possible to have a fine grained control of connection requirements for each service.But, down side is that as the number of services increase, number of connection pool also increases and eventually there will be too many idle connections assuming that minimum connections in each pool is greater than 0.
Rely on database specific extensions like PGBouncer - This approach has the advantage that connection pool is managed by a central source rather than a pool for each micro service and hence number of idle connections can be brought down. It is also language/technology agnostic. Down side is that these extensions are database specific and some of the functionalities in JDBC may not work. For eg: Prepared statments may not work with PGBouncer in Transaction_Pooling mode.
In our case most of the micro-services(at least 50) will be connecting to the same Postgres server even though the database can be different. So, if we go with option 1, there is a higher chance of creating too many idle connections.The traffic to most of our services are very moderate and the rationale behind moving to micro-service is for easier deployment, scaling etc.
Has anyone faced a similar problem while adopting micro-services architecture? Is there a better way of solving this problem in micro-service world?
I don't see how pgbouncer will solve any of the problems you would have with the first approach. There are many reasons to use pgbouncer but I don't think they are really applicable here.
Also, in my experience, while idle connections can be an issue, they probably will not be on the scale you are talking about. I mean we are not talking hundreds of idle connections right?
More critically, one key thing that a microservices approach would give you is an ability to move dbs off to other servers. If you do this, then having your connection pool centrally managed makes this harder to do.
Per-service pool is generally more flexible and it makes your infrastructure quite a bit more flexible too.
I have responded a similar question here: Microservices - Connection Pooling when connecting to a single legacy database
"I am facing a similar dilemma at my work and I can share the conclusions we have reached so far.
There is no silver bullet at the moment, so:
1 - Calculate the number of connections dividing the total desired number of connections for the instances of microservices will work well if you have a situation where your microservices don't need to drastically elastic scale.
2 - Not having a pool at all and let the connections be opened on demand. This is what is being used in functional programming (like Amazon lambdas). It will reduce the total number of open connections but the downside is that you lose performance as per opening connections on the fly is expensive.
You could implement some sort of topic that let your service know that the number of instances changed in a listener and update the total connection number, but it is a complex solution and goes against the microservice principle that you should not change the configurations of the service after it started running.
Conclusion: I would calculate the number if the microservice tend to not grow in scale and without a pool if it does need to grow elastically and exponentially, in this last case make sure that a retry is in place in case it does not get a connection in the first attempt.
There is an interesting grey area here awaiting for a better way of controlling pools of connections in microservices.
In time, and to make the problem even more interesting, I recommend reading the
article About Pool Sizing from HikariCP: https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing The ideal concurrent connections in a database are actually smaller than most people think."
Maybe group some smaller number of microservices into modulith and use karaf, or other osgi container as a runtime for them. Then you can create bundle that will represent a connection-pool for your database so other bundles — microservices can use it. But I'm not sure if it will solve your architecture problem.
Let's say you have the limiting requirement - only 10 connections to the database.
You can run 10 instances of the microservice with the connection pool limited to 1 connection max. Or you can run 3 instances with pool max=3.
The centralized connection pool, which would serve multiple services in the cloud, sounds bad (the typical single point of failure).

How do I determine the ideal value for my database.yml pool?

Longtime listener, first-time caller here... my Postgres database says it allows the following:
Connections:
6/120
What should my corresponding "pool" setting be in this scenario? 6? 120? Something else entirely? Thanks in advance for any help here.
If it makes a difference I'm using Puma & Sidekiq to run a Rails 4 application on Heroku.
How many connections does your app use under typical load? Set the idle pool to that, and set the max pool to somewhere under the max allowed by the server.
But, that server connections setting should also be tuned to your application and hardware. It's typically some function of your core count, RAM and work_mem setting, and the kind of disks you have, but will also depend on what kind of queries your app typically runs.
(see here for some tips: https://wiki.postgresql.org/wiki/Number_Of_Database_Connections)
Postgres is actually pretty forgiving: opening connections is cheap (an undersized pool), compared to many other databases; idle open connections (oversized pool) are also cheap (a few K of shared buffers, if memory serves).
It's really having more active connections than your resources allow that will cause problems, which is why the server-side configuration is more important.

Moving app to GAE

Context:
We have a web application written in Java EE on Glassfish + MySQL with JDBC for storage + XMPP connection for GCM CCS upstream messaging + Quartz scheduler library. Those are the main components of the app.
We're wrapping up our app development stage and we're trying to figure out the best option for deploying it, considering future scalability, both in number of users and their geographical location (ex, multiple VMS on multiple continents, if needed). Currently we're using a single VM instance from DigitalOcean for both the application server and the MySQL server, which we can then scale up with some effort (not much, but probably more than GAE).
Questions:
1) From what I understood, Cloud SQL is replicated in multiple datacenters across the world, thus storage-wise, the geographical scalability is assured. So this is a bonus. However, we would need to update the DB interaction code in the app to make use of Cloud SQL structure, thus being locked-in to their platform. Has this been a problem for you ? (I haven't looked at their DB interaction code yet, so I might be off on this one)
2) Do GAE instances work pretty much like a normal VM would ? Are there any differences regarding this aspect ? I've understood the auto-scaling capability, but how are they geographically scalable ? Do you have to select a datacenter location for each individual instance ? Or how can you achieve multiple worldwide instance locations as Cloud SQL does right out of the box ?
3) How would the XMPP connection fare with multiple instances ? Considering each instance is a separate VM, that means each instance will have a unique XMPP connection to the GCM CCS server. That would cause multiple problems, ex if more than 10 instances are being opened, then the 10 simultaneous XMPP connections limit cap will be hit.
4) How would the Quartz scheduler fare with the instances ? Right now we're saving the scheduled jobs in the MySQL database and scheduling them at each server start. If there are multiple instances, that means the jobs will be scheduled on each instance; so if there are multiple instances, the jobs will be scheduled multiple times. We wouldn't want that.
So, if indeed problems 3 & 4 are like I described, what would be the solution to those 2 problems ? Having a single instance for the XMPP connection as well as a single instance for the Quartz scheduler ?
1) Although CloudSQL is a managed replicated RDBMS, you still need to chose a region when creating an instance. That said, you cannot expect the latency to be great seamlessly globally. You would still need to design a proper architecture to achieve that.
2) GAE isn't a VM in any sense. GAP is a PaaS and, therefore, a runtime environment. You should expect several differences: you are restricted to Java, PHP, GO and Python, the mechanisms GAE provide do you out-of-the-box and compatible third-parties libraries. You cannot install a middleware there, for example. On the other hand, you don't have to deal with anything from the infrastructure standpoint, having transparent high-availability and scalability.
3) XMPP is not my strong suit, but GAE offers a XMPP service, you should take a look at it. More info: https://developers.google.com/appengine/docs/java/xmpp/?hl=pt-br
4) GAE offers a cron service that works pretty well. You wouldn't need to use Quartz.
My last advice is: if you want to migrate an existing application, your best choice would probably be GCE + CloudSQL, not GAE.
Good luck!
Cheers!

Google App Engine Cloud SQL too many connections

I have an app that is running on Google App Engine and uses the Task Queue Api to do some of the heavier lifting in the background. Some of those tasks need to connect to Cloud SQL to do their work. At scale I get too many tasks attempting to connect to Cloud SQL at once. What I need is some sort of data service layer of a shared client so that the tasks themselves aren't making individual connections to Cloud SQL. If anyone has any ideas I'd love to hear them.
Yes, you can actually do this, but it will take a little bit of planning, coding, and configuration in your side.
One idea is to use a Pull Queue (instead of Push Queues). With a pull queue you can schedule your tasks and execute them in a separate module of your application. The hardware of that module can be configured separately from your main module, thus you can avoid too many instances serving requests which in turns will allow you to better use connection pooling.
Of course, depending on the traffic you are getting you might want to decide on how many concurrent backend instances you want running (and connecting to your DB) to avoid/minimize contention.
Easier said than done, but here are two resources that will help you out:
App Engine Modules - https://developers.google.com/appengine/docs/java/modules/
Pull Queues - https://developers.google.com/appengine/docs/python/taskqueue/overview-pull

Fastest Open Source Content Management System for Cloud/Cluster deployment

Currently clouds are mushrooming like crazy and people start to deploy everything to the cloud including CMS systems, but so far I have not seen people that have succeeded in deploying popular CMS systems to a load balanced cluster in the cloud. Some performance hurdles seem to prevent standard open-source CMS systems to be deployed to the cloud like this.
CLOUD: A cloud, better load-balanced cluster, has at least one frontend-server, one network-connected(!) database-server and one cloud-storage server. This fits well to Amazon Beanstalk and Google Appengine. (This specifically excludes CMS on a single computer or Linux server with MySQL on the same "CPU".)
To deploy a standard CMS in such a load balanced cluster needs a cloud-ready CMS with the following characteristics:
The CMS must deal with the latency of queries to still be responsive and render pages in less than a second to be cached (or use a precaching strategy)
The filesystem probably must be connected to a remote storage (Amazon S3, Google cloudstorage, etc.)
Currently I know of python/django and Wordpress having middleware modules or plugins that can connect to cloud storages instead of a filesystem, but there might be other cloud-ready CMS implementations (Java, PHP, ?) and systems.
I myself have failed to deploy django-CMS to the cloud, finally due to query latency of the remote DB. So here is my question:
Did you deploy an open-source CMS that still performs well in rendering pages and backend admin? Please post your average page rendering access stats in microseconds for uncached pages.
IMPORTANT: Please describe your configuration, the problems you have encountered, which modules had to be optimized in the CMS to make it work, don't post simple "this works", contribute your experience and knowledge.
Such a CMS probably has to make fewer than 10 queries per page, if more, the queries must be made in parallel, and deal with filesystem access times of 100ms for a stat and query delays of 40ms.
Related:
Slow MySQL Remote Connection
Have you tried Umbraco?
It relies on database, but it keeps layers of cache so you arent doing selects on every request.
http://umbraco.com/azure
It works great on azure too!
I have found an excellent performance test of Wordpress on Appengine. It appears that Google has spent some time to optimize this system for load-balanced cluster and remote DB deployment:
http://www.syseleven.de/blog/4118/google-app-engine-php/
Scaling test from the report.
parallel
hits GAE 1&1 Sys11
1 1,5 2,6 8,5
10 9,8 8,5 69,4
100 14,9 - 146,1
Conclusion from the report the system is slower than on traditional hosting but scales much better.
http://developers.google.com/appengine/articles/wordpress
We have managed to deploy python django-CMS (www.django-cms.org) on GoogleAppEngine with CloudSQL as DB and CloudStore as Filesystem. Cloud store was attached by forking and fixing a django.storage module by Christos Kopanos http://github.com/locandy/django-google-cloud-storage
After that, the second set of problems came up as we discovered we had access times of up to 17s for a single page access. We have investigated this and found that easy-thumbnails 1.4 accessed the normal file system for mod_time requests while writing results to the store (rendering all thumb images on every request). We switched to the development version where that was already fixed.
Then we worked with SmileyChris to fix unnecessary access of mod_times (stat the file) on every request for every image by tracing and posting issues to http://github.com/SmileyChris/easy-thumbnails
This reduced access times from 12-17s to 4-6s per public page on the CMS basically eliminating all storage/"file"-system access. Once that was fixed, easy-thumbnails replaced (per design) file-system accesses with queries to the DB to check on every request if a thumbnail's source image has changed.
One thing for the web-designer: if she uses a image.width statement in the template this forces a ugly slow read on the "filesystem", because image widths are not cached.
Further investigation led to the conclusion that DB accesses are very costly, too and take about 40ms per roundtrip.
Up to now the deployment is unsuccessful mostly due to DB access times in the cloud leading to 4-5s delays on rendering a page before caching it.

Resources