Database connection pool strategy for microservices

We are trying to convert our monolithic application to a microservices-based architecture. We use PostgreSQL as one of the databases in the monolithic application, with BoneCP for connection pooling.
When this monolith is split into a number of independent microservices, each running in a different JVM, I can think of two options for connection pooling:
1. BoneCP or any decent connection pool for each microservice - my initial research shows that this is the primary choice. It allows fine-grained control of the connection requirements for each service. The downside is that as the number of services increases, the number of connection pools also increases, and eventually there will be too many idle connections, assuming the minimum connection count in each pool is greater than 0.
2. Rely on database-specific poolers like PgBouncer - this approach has the advantage that the connection pool is managed by a central source rather than one pool per microservice, so the number of idle connections can be brought down. It is also language/technology agnostic. The downside is that these poolers are database specific, and some JDBC functionality may not work. For example, prepared statements may not work with PgBouncer in transaction pooling mode.
In our case, most of the microservices (at least 50) will be connecting to the same Postgres server, even though the databases may differ. So if we go with option 1, there is a higher chance of creating too many idle connections. The traffic to most of our services is very moderate, and the rationale behind moving to microservices is easier deployment, scaling, and so on.
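For reference, a minimal sketch of option 1 using HikariCP as the pool (the JDBC URL and sizes are illustrative assumptions, not a recommendation). Setting the minimum idle count to 0 lets each service's pool drain when quiet, which softens the idle-connection problem described above:

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    public class ServicePool {
        public static HikariDataSource create() {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl("jdbc:postgresql://db-host:5432/orders"); // hypothetical per-service database
            config.setMaximumPoolSize(5);  // small per-service cap
            config.setMinimumIdle(0);      // let the pool shrink to zero when the service is quiet
            config.setIdleTimeout(60_000); // close connections idle for more than a minute
            return new HikariDataSource(config);
        }
    }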
Has anyone faced a similar problem while adopting a microservices architecture? Is there a better way of solving this in the microservices world?

I don't see how PgBouncer will solve any of the problems you would have with the first approach. There are many reasons to use PgBouncer, but I don't think they really apply here.
Also, in my experience, while idle connections can be an issue, they probably will not be at the scale you are talking about. I mean, we are not talking hundreds of idle connections, right?
More critically, one key thing a microservices approach gives you is the ability to move databases off to other servers. If you do that, a centrally managed connection pool makes the move harder.
A per-service pool is generally the more adaptable choice, and it keeps your infrastructure flexible too.

I have answered a similar question here: Microservices - Connection Pooling when connecting to a single legacy database
"I am facing a similar dilemma at my work and I can share the conclusions we have reached so far.
There is no silver bullet at the moment, so:
1 - Calculating the number of connections by dividing the total desired number of connections across the instances of the microservice works well if your microservices don't need to scale elastically in a drastic way.
2 - Having no pool at all and letting connections be opened on demand. This is what is used on serverless platforms (like AWS Lambda). It reduces the total number of open connections, but the downside is that you lose performance, because opening connections on the fly is expensive.
You could implement some sort of topic that lets each service know, via a listener, that the number of instances has changed, and then update the total connection count, but it is a complex solution and goes against the microservice principle that you should not change a service's configuration after it has started running.
Conclusion: I would calculate the number if the microservice tends not to grow in scale, and go without a pool if it needs to grow elastically and exponentially; in the latter case, make sure a retry is in place in case a connection is not obtained on the first attempt.
There is an interesting grey area here, awaiting a better way of controlling pools of connections in microservices.
In the meantime, and to make the problem even more interesting, I recommend reading the About Pool Sizing article from the HikariCP wiki: https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing The ideal number of concurrent connections to a database is actually smaller than most people think."
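A rough sketch of the retry suggested in the quoted conclusion, assuming plain JDBC with no pool (the attempt count and backoff values are illustrative):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class RetryConnect {
        // Open a connection on demand, retrying with exponential backoff.
        public static Connection openWithRetry(String jdbcUrl, int maxAttempts)
                throws SQLException, InterruptedException {
            SQLException last = null;
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                try {
                    return DriverManager.getConnection(jdbcUrl);
                } catch (SQLException e) {
                    last = e;
                    Thread.sleep(100L << attempt); // 100 ms, 200 ms, 400 ms, ...
                }
            }
            throw last;
        }
    }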

Maybe group some smaller number of microservices into a modulith and use Karaf or another OSGi container as their runtime. Then you can create a bundle that represents a connection pool for your database, so that other bundles (the microservices) can use it. But I'm not sure this will solve your architecture problem.

Let's say you have a limiting requirement: only 10 connections to the database.
You can run 10 instances of the microservice with each connection pool limited to 1 connection max, or you can run 3 instances with a pool max of 3.
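In other words, the arithmetic is just (a hypothetical sizing helper):

    public class PoolSizing {
        // Keep instances * perInstanceMax within the server-side cap.
        public static int perInstanceMax(int dbConnectionCap, int instances) {
            return dbConnectionCap / instances; // e.g. 10 / 3 = 3 connections per pool, 9 total
        }
    }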
A centralized connection pool serving multiple services in the cloud sounds bad (a typical single point of failure).

Related

Tomcat via Apache Server going down after too many connections

I have an Apache (2.4) server that serves content from a Tomcat 7 server through the AJP connector.
One of my clients manages to kill the Tomcat instance by running too many concurrent connections against a JSP JSON API service. (Apache still works, but Tomcat falls over; restarting Tomcat brings it back up.) There are no errors in Tomcat's logs.
I would like to protect the site from falling over like that, but I am not sure what configurations to change.
I do not want to limit the number of concurrent connections, as there are legitimate use cases for that.
My Tomcat memory settings are:
Initial memory pool: 1280 MB
Maximum memory pool: 2560 MB
which I assumed was plenty.
It might be worth mentioning that the API service relies on multiple, possibly heavy MySQL connections.
Any advice would be most appreciated.
Why don't you slowly switch your most used/important application features to a microservices architecture and dockerize your Tomcat servers so you can manage multiple instances of your application? This should help your application handle more connections without hurting the overall performance of the servers.
If you are talking about scaling, you need horizontal scaling here, with multiple Tomcat servers.
If you cannot limit user connections and still want the app to run smoothly, then you need to scale. An architectural change to microservices is an option, but it may not always be feasible for a production solution.
The best thing to consider is running multiple Tomcats sharing the load. There are various ways to do this. With your tech stack, I feel the Apache 2 load balancer plugin in combination with Tomcat will do best.
Have an example here.
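For illustration, a minimal mod_proxy_balancer configuration along those lines (the ports, routes, and paths are assumptions, not a drop-in config):

    # Requires mod_proxy, mod_proxy_ajp, mod_proxy_balancer and an lbmethod module.
    <Proxy "balancer://tomcats">
        BalancerMember "ajp://127.0.0.1:8009" route=tc1
        BalancerMember "ajp://127.0.0.1:8010" route=tc2
    </Proxy>
    ProxyPass        "/app" "balancer://tomcats/app" stickysession=JSESSIONID
    ProxyPassReverse "/app" "balancer://tomcats/app"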
Now, with respect to server capacity, DB connection capacity, etc., you might also need to think about vertical scaling.

Equal connection distribution is not happening across Aurora autoscaled instances

We are running a REST API-based Spring Boot application using AWS Aurora as the database. Our application connects to read-only Aurora MySQL RDS instances.
We are load testing it. Initially we have one database instance, and we have autoscaling in place, triggered on high CPU.
Now we expect that if we get some throughput X with one DB instance, then we should get approximately 1.8X when autoscaling happens, with connections distributed equally among the newly created database instances.
But that is not happening; instead, DB connections go up and down erratically on both database instances. Because of this, our load is not distributed equally and we are not getting the desired throughput. Sometimes one database runs at 100% CPU while the other is still at 20%, and after a few minutes it reverses.
Below is the database connection configuration:
Driver: com.mysql.jdbc.Driver
Maximum active connections: 100
Max age: 300000
Initial pool size: 10
The Tomcat JDBC pool is used for connection pooling.
NOTE:
1) We have also disabled JVM network DNS caching (see the snippet after this list).
2) We also tried refreshing the database connections every 5 minutes, even the active ones.
3) We have tried everything suggested by AWS, but nothing is working.
4) We have even written Lambda code to update Route 53 when a new DB instance comes up, to avoid cluster endpoint caching, but we still see the same issue.
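For reference, disabling JVM DNS caching (note 1) usually amounts to something like the following hedged sketch; it must run before any lookups happen:

    import java.security.Security;

    public class DnsCacheConfig {
        public static void disableDnsCaching() {
            // 0 = do not cache successful lookups; re-resolve on every request
            Security.setProperty("networkaddress.cache.ttl", "0");
            Security.setProperty("networkaddress.cache.negative.ttl", "0");
        }
    }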
Can anyone please help with the best practice here? Currently we cannot take this into production.
This is not a great answer, but since you haven't gotten any replies yet, here are some thoughts.
1) The behavior you are seeing replicates the bad routing logic of load balancers.
This is no surprise to you, but it used to be much more common with small web server deployments, especially with long-running queries. With connection pooling, you mirror this situation.
2) Taking this assumption forward, we need to guess how Amazon chose to balance traffic to read-only replicas.
Even in their white paper, they don't mention how they are doing routing: https://www.allthingsdistributed.com/files/p1041-verbitski.pdf
Likely options are Route 53 or an NLB.
My best guess would be that they are using an NLB. NLBs became available only in Q3 2017, and Aurora predates them by two years, but it is still a reasonable guess.
An NLB would let them balance based on least connections (far better than round robin).
3) Validating assumptions
If Route 53 is being used, then we should be able to use DNS to find out.
I did a dig against the Route 53 endpoint and found that it gave me an answer:
dig +nocmd +noall +answer zzz-databasecluster-xxx.cluster-ro-yyy.us-east-1.rds.amazonaws.com
zzz-databasecluster-xxx.cluster-ro-yyy.us-east-1.rds.amazonaws.com. 1 IN CNAME zzz-0.yyy.us-east-1.rds.amazonaws.com.
zzz-0.yyy.us-east-1.rds.amazonaws.com. 5 IN A 10.32.8.33
I did it again and got a different answer.
dig +nocmd +noall +answer zzz-databasecluster-xxx.cluster-ro-yyy.us-east-1.rds.amazonaws.com
zzz-databasecluster-xxx.cluster-ro-yyy.us-east-1.rds.amazonaws.com. 1 IN CNAME zzz-2.yyy.us-east-1.rds.amazonaws.com.
zzz-2.yyy.us-east-1.rds.amazonaws.com. 5 IN A 10.32.7.97
What you can see is that the read-only endpoint is giving me a CNAME result pointing to one of the replicas.
zzz is the name of my cluster, xxx came from my CloudFormation stack, and yyy comes from Amazon.
Note: zzz-0 and zzz-2 are the two read-only replicas.
What we can see here is that Route 53 is doing the load balancing.
4) Route 53 load balancing
They are likely setting up Route 53 with round robin across all healthy read-only replicas.
The TTL is likely 5s.
Unhealthy nodes will get removed from the answer, but there is no balancing based on load or connection count.
5) Ramifications
A) Using the read-only endpoint can only balance traffic away from unhealthy instances.
B) DB pools keep connections for a long time, which means that new read replicas won't be touched.
If we have a small number of servers, we will be unbalanced, and there is not much we can do about that directly.
6) Thoughts on what you can do
A) Verify with dig that you are getting correct DNS resolution that keeps rotating between replicas every 5s.
If you don't, that is something you need to fix.
B) Periodically recycle DB clients.
New replicas will get used, and while you will still be unbalanced, this helps by keeping things changing.
What is critical, though, is that you MUST NOT have all your clients recycle at the same time; otherwise you run the risk of them all landing on the same replica. I would suggest using a random TTL per client (within a min/max range); a sketch follows below.
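A minimal sketch of this suggestion, assuming HikariCP in place of the Tomcat JDBC pool from the question (whose equivalent knob is maxAge); the URL and sizes are illustrative. maxLifetime retires connections so replacements re-resolve DNS and can land on newly added replicas, and HikariCP applies a small per-connection variance to maxLifetime, which helps avoid all connections recycling at once:

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    public class RecyclingPool {
        public static HikariDataSource create() {
            HikariConfig config = new HikariConfig();
            // hypothetical read-only cluster endpoint
            config.setJdbcUrl("jdbc:mysql://zzz.cluster-ro-yyy.us-east-1.rds.amazonaws.com:3306/mydb");
            config.setMaximumPoolSize(100);
            config.setMaxLifetime(300_000); // retire connections after ~5 minutes, like maxAge above
            return new HikariDataSource(config);
        }
    }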
C) Manage it yourself.
Summary: when you connect, connect directly to the read replica with the least connections / lowest CPU.
How you do this is not entirely simple. I would suggest a Lambda function that keeps this preferred connection string in a queryable location and updates it at some frequency; I would say the frequency of updating the preferred DB should be about 1/10 of the frequency at which you recycle the DB connections. You could add logic so that, when the DBs are running at similar load, you hand out the read-only endpoint, and only hand out an explicit instance endpoint when there is significant inequity. A rough sketch of picking the least-loaded replica follows below.
I would also caution that when a new instance comes up, you want to be careful not to flood it.
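A rough sketch of that idea, assuming the client already knows the instance endpoints and using MySQL's Threads_connected status variable as the "least connections" signal (all names here are illustrative):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.List;

    public class ReplicaPicker {
        // Return the JDBC URL of the replica with the fewest connected threads.
        public static String pickLeastLoaded(List<String> jdbcUrls, String user, String pass)
                throws SQLException {
            String best = jdbcUrls.get(0);
            long bestThreads = Long.MAX_VALUE;
            for (String url : jdbcUrls) {
                try (Connection c = DriverManager.getConnection(url, user, pass);
                     Statement s = c.createStatement();
                     ResultSet rs = s.executeQuery("SHOW GLOBAL STATUS LIKE 'Threads_connected'")) {
                    if (rs.next()) {
                        long threads = Long.parseLong(rs.getString("Value"));
                        if (threads < bestThreads) {
                            bestThreads = threads;
                            best = url;
                        }
                    }
                }
            }
            return best;
        }
    }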
D) Increase the number of clients or the number of read-only copies.
Both of these decrease the chance that two boxes end up with significantly different load.

How do I determine the ideal value for my database.yml pool?

Longtime listener, first-time caller here... my Postgres database says it allows the following:
Connections:
6/120
What should my corresponding "pool" setting be in this scenario? 6? 120? Something else entirely? Thanks in advance for any help here.
If it makes a difference, I'm using Puma & Sidekiq to run a Rails 4 application on Heroku.
How many connections does your app use under typical load? Set the idle pool to that, and set the max pool to somewhere under the max allowed by the server.
But that server-side connection limit should itself be tuned to your application and hardware. It's typically some function of your core count, RAM, and work_mem setting, and the kind of disks you have, but it will also depend on the kinds of queries your app typically runs.
(see here for some tips: https://wiki.postgresql.org/wiki/Number_Of_Database_Connections)
Postgres is actually pretty forgiving: opening connections is cheap (an undersized pool) compared to many other databases, and idle open connections (an oversized pool) are also cheap (a few KB of shared buffers, if memory serves).
It's really having more active connections than your resources allow that causes problems, which is why the server-side configuration is more important.
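As a concrete illustration (the values are assumptions, not a recommendation): with Puma and Sidekiq, each process needs a pool at least as large as its thread count, and the total across all processes should stay under the server's 120-connection cap. The common Rails convention looks something like:

    # config/database.yml (hypothetical)
    production:
      adapter: postgresql
      # size the pool to the threads in this process; each Puma web dyno and
      # Sidekiq worker gets its own pool
      pool: <%= ENV.fetch("RAILS_MAX_THREADS", 5) %>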

Storing database connections in session, in a small scale webapp

I have a j2ee webapp that's being used internally by ~20-30 people.
There is no chance of significant growth in the number of users.
From what I understood, there's a trade-off between opening a new DB connection for each request made to the webapp (expensive, but it doesn't block other users while the DB is in use) and using the singleton pattern (it doesn't open new connections but only allows one user at a time).
I thought that since I know only about 30 users will ever use my webapp at the same time, maybe the simplest and best solution would be to store the connection as a session attribute, reducing the number of connection openings to a minimum while still allocating one connection per user.
What do you think?
From what I understood there's a trade-off between opening a new DB connection for each request made to the webapp
That is what connection pools are for. If you use a connection pool in your application, the pool, once initialized, is in charge of providing connections to the application as and when needed. In a properly tuned connection pool, enough connections are created and kept in reserve to hand to the application, mitigating the need to create and open a connection only when the application asks for one.
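A minimal sketch of that pattern against any DataSource-backed pool (the query and names are invented for the example):

    import javax.sql.DataSource;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class UserDao {
        private final DataSource pool; // injected, e.g. a container-managed pool

        public UserDao(DataSource pool) { this.pool = pool; }

        public String findName(int id) throws SQLException {
            // borrow a connection per request; close() returns it to the pool
            try (Connection c = pool.getConnection();
                 PreparedStatement ps = c.prepareStatement("SELECT name FROM users WHERE id = ?")) {
                ps.setInt(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString(1) : null;
                }
            }
        }
    }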
I thought that since I know that only 30 users will ever use my webapp at the same time, maybe the simplest and best solution would be to store the connection as a session attribute
Per-user connections are not a good idea, especially where a web application is concerned. In a web application, it is perfectly possible for a user to initiate multiple requests to the server (think multi-tabbed browsing). In that case, the use of a single connection per user will result in weird application behavior unless you synchronize access to the connection.
One must also consider the side effects of putting non-serializable attributes into the session: Connection objects are not serializable and hence must be marked transient. If the session is deserialized at some point, one has to account for the fact that the Connection object will not be available and must be re-initialized.
I think you're getting into premature optimization, especially given the scale of the application. Opening a new connection is not that expensive, and, as Makach says, most modern RDBMSs handle connection pooling and will hold connections open for subsequent requests. You'd be trying to write better code than the compiler, so to speak.
No, don't do that. It's perfectly OK to reconnect to the database every time you need to; any database management system will do its own connection pool caching, I think.
If you try to keep connections open yourself, you'll make it incredibly hard to manage them in a secure, bug-free, safe way.

Decision making in distributed applications

With a distributed application, where you have lots of clients and one main server, should you:
Make the clients dumb and the server smart: clients are fast and non-invasive. Business rules are needed in only 1 place
Make the clients smart and the server dumb: take as much load as possible off of the server
Additional info:
Clients collect tons of data about the computer they are on. The server must analyze all of this info to determine the health of these computers
The owners of the client computers are temperamental and will shut down the clients if the client starts to consume too many resources (thus negating the purpose of the distributed app in helping diagnose problems)
You should do as much client-side processing as possible. This will enable your application to scale better than doing the processing server-side. To address your temperamental-user problem, you could look into making your client processes run at a very low priority so there's no noticeable decrease in performance for the user.
In a client-server setting, if you care about security, you should always program on the assumption that the client may have been compromised. Even if it hasn't been, there is always the risk of somebody using an old version of the client, using a competing or modified version of the client, or just of the network connection being a bit flaky.
So while you do as much work on the client as possible, processing and marshalling information into the right form, the server then needs to do a thorough sanity check on anything the client gives it.
So the answer I guess is "both".
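As a tiny illustration of that server-side sanity check (the field names and limits are invented for the example):

    public class HealthReportValidator {
        public static void validate(int cpuPercent, long freeDiskBytes) {
            // never trust marshalled client data; enforce invariants server-side
            if (cpuPercent < 0 || cpuPercent > 100) {
                throw new IllegalArgumentException("cpuPercent out of range: " + cpuPercent);
            }
            if (freeDiskBytes < 0) {
                throw new IllegalArgumentException("freeDiskBytes negative: " + freeDiskBytes);
            }
        }
    }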
The server must analyze all of this info to determine the health of these computers
That is probably the biggest clue so far explaining what your application is about. Could you provide a more elaborate briefing on what this application seeks to achieve in this distributed environment? We do not even know whether the client-side processing is disk-I/O- or processor-intensive. How you design the solution depends on what needs to be done to help the users and the business accomplish their jobs and objectives.
