Bitnami Postgres-HA load balancing, replication and too many connections - database

We have a .NET Core based application deployed in Kubernetes, and Postgres HA deployed on the same cluster along with Pgpool-II for connection pooling. We are seeing a lot of errors on the application side, and based on our stress tests we believe database connections are getting exhausted.
The postgresql-ha Helm chart is deployed to a Kubernetes cluster with 6 or so nodes. We initially thought that increasing pgpool replicas would increase the number of connections available to us, and we also considered that we might need to scale the PostgreSQL replicas in conjunction. Our app connects through a LoadBalancer service (GKE provisioned) that is enabled through values.yaml upon deployment.
However, when we run tests we see that the load balancer selects an available pgpool pod based on something other than available connections (CPU?), so it could be any pgpool replica/pod. Pgpool then talks to repmgr and PostgreSQL, and we ultimately hit the error "too many connections" no matter how many pgpool/PostgreSQL replicas we create.
If we start with 1 pgpool and 1 PostgreSQL replica, we can handle 30-40 connections before producing the error. If we increase to 2 pgpool and 1 PostgreSQL, we can handle roughly double that before producing errors. However, we hit a wall where we cannot get past 105 connections. We assume 5 are reserved connections and 100 is the max_connections setting on PostgreSQL.
We are confused by this behavior: how the load balancer selects the pgpool pod, and why all PostgreSQL replicas seem to be limited by the same max_connections. We are also confused that we see 5 more connections than max_connections. Do all PostgreSQL replicas share the same max_connections, meaning that with 2 replicas we would still only get 100 connections? In that case, what is the point of pgpool replication, or are we thinking about it incorrectly?
We see mostly idle connections when we query pg_stat_activity, but are confused about how to scale postgresql-ha without running out of connections in this case.
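For what it's worth, a minimal SQL sketch for checking where the ceiling actually sits on a given PostgreSQL node (run it directly against each replica rather than through pgpool):

```sql
-- Per-node limit and the slots PostgreSQL keeps back for superusers
SHOW max_connections;
SHOW superuser_reserved_connections;

-- Who is actually holding connections right now, grouped by state
SELECT usename, state, count(*)
FROM pg_stat_activity
GROUP BY usename, state
ORDER BY count(*) DESC;
```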

Related

Connections pool (database or other service) across cluster of containers

I expected this to be a common problem but couldn't find a definitive answer.
The scenario is the following:
You have a micro-service application deployed in containers. Each micro-service is deployed in its own container and can be vertically/horizontally scaled independently from the others.
One of these micro-services needs to connect to a service like a database. You use your preferred client library to connect to that specific database, using a connection pool inside the micro-service application.
Your application is elastic, meaning it should scale in and out based on some workload metric, deploying or removing containers as required.
Now here is the problem. Your database can accept only a limited number of connections, let's say 100. Say also that the micro-service requiring database connections has a connection pool with a max limit of 10. This means that effectively your micro-service can't horizontally scale out beyond 10 containers, otherwise you go above the max number of connections supported by the database.
Ideally you would like to scale out the service independently from the database connections limit, having some sort of stateful pool service across the cluster of containers that is aware of the total number of connections currently active.
What are possible solutions to the above scenario?
I don't know which database you use, but in PostgreSQL, for example, there is a "pool service" as you say, called PgBouncer (https://www.pgbouncer.org/), which acts as a global connection pool for all services or instances that require a connection to the database.
You deploy it as a separate service and configure it to connect to your PostgreSQL instance, along with the number of connections available, whether they can be reused among services, etc. Then you connect your microservices to this PgBouncer, and this way you can be sure that you won't overload the database no matter how many instances of the microservice there are.
If you are not using PostgreSQL, I am pretty sure other databases have similar solutions.
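To make that concrete, a minimal pgbouncer.ini sketch (host, database name and sizes are illustrative, not taken from the question):

```ini
[databases]
; every client asking for "appdb" shares one pool against the real server
appdb = host=postgres.internal port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; hand server connections back after each transaction so they can be reused
pool_mode = transaction
; clients PgBouncer will accept vs. actual connections it opens to PostgreSQL
max_client_conn = 1000
default_pool_size = 20
```

The point is that max_client_conn can be far larger than the database's max_connections, because default_pool_size is what actually reaches PostgreSQL.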

How a database connection pool works with multiple instances of the same microservice

I have one microservice with HikariCP + PostgreSQL and it's working fine with max connections 20.
We planned to do a load test with 500 concurrent requests against 4 instances of the same microservice.
My question is: how are connection pool connections shared across the 4 instances, and how many max connections should I keep in HikariCP?
What changes have to be made on the database side to sustain that load?
If you want a shared common pool of connections among all the microservice instances, to enforce a maximum number of concurrent connections to the database, you must use an external connection pooler like PgBouncer and have all microservices send their requests through it. This way PgBouncer manages the connections and shares them across all instances.
The max connection pool size is per service instance. In this case you have a pool of 20 per instance and 4 instances, so you can run up to 20 × 4 = 80 concurrent transactions. If you have more transactions than that, the remaining ones will wait, and those extra requests may time out.
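On the database-side question, the main knob in PostgreSQL is max_connections, which has to cover every instance's pool plus some headroom; a minimal sketch using the numbers above:

```sql
-- 4 instances x 20 pooled connections = 80, plus headroom for superusers,
-- monitoring and migrations. Changing this requires a server restart.
ALTER SYSTEM SET max_connections = 120;

-- Verify after the restart
SHOW max_connections;
```

Each PostgreSQL connection is a separate backend process, so raising max_connections far beyond what the hardware can serve usually works out worse than pooling harder in front of the database.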

Go database client killing more connections to SQL Server than expected

As part of debugging some other issues on our server I noticed some really odd behavior with respect to connections that I'm hoping to understand.
I have 2 go servers, one of which talks to a SQL Server RDS instance, and another that talks to a managed SQL Server instance in Azure.
I believe there is a slight difference in the way the 2 backends work - RDS has a single port (1433) on which the client authenticates and subsequently establishes the connection. Azure SQL seems to authenticate on port 1433 and then redirect the client to another service that actually handles the connections.
In both cases I've got substantial load running against the servers. At least 500 requests/s, with peaks of about 2k req/s. Each of these requests results in a Select query which returns a single row with a primary key lookup - so really short lived connections to SQL. The average time per query is 50-80ms on both, with p95 in the 100-150ms range
Behavior I'm trying to understand:
I'm using the go database/sql driver with an MS-SQL implementation (Specifically go-mssqldb).
I've set Max Idle connections and Max Open connections to 64.
What I would expect: 64 long running Established connections that are occasionally idle but quickly reused.
What I'm seeing: Generally 64 Established connections, with the number often dropping down to somewhere between 50 and 64. This also results in 200-400 connections in the TIME_WAIT state at any given time.
What could be causing this behavior? Is it just that the Go driver lazily closes connections? If so, why would the number drop below 64?
I'm happy to provide any more details!
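For reference, a minimal sketch of the pool settings being described, using database/sql with go-mssqldb (the driver import path and function name are illustrative):

```go
package pool

import (
	"database/sql"

	// SQL Server driver; registers the "sqlserver" driver name
	_ "github.com/denisenkom/go-mssqldb"
)

// openPool returns a *sql.DB configured the way the question describes.
func openPool(dsn string) (*sql.DB, error) {
	db, err := sql.Open("sqlserver", dsn)
	if err != nil {
		return nil, err
	}
	// The settings described above: cap the pool at 64 open connections and
	// allow up to 64 of them to sit idle so they are reused, not reopened.
	db.SetMaxOpenConns(64)
	db.SetMaxIdleConns(64)
	// Worth ruling out: a non-zero max lifetime (or an idle timeout enforced
	// by the server or a proxy in between) makes the pool close and replace
	// connections, which shows up as TIME_WAIT churn even though the caps
	// stay at 64. Zero means connections are reused indefinitely.
	db.SetConnMaxLifetime(0)
	return db, nil
}
```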

Is it possible to limit number of connections used by Entity Framework?

I've noticed on a NopCommerce site we host (which uses Entity Framework) that if I run a crawler on the site (to check for broken links), it knocks the entire webserver offline for a few minutes and no other hosted sites respond. This seems to be because Entity Framework opens 30-odd database connections and runs hundreds of queries per second (about 20-40 per page view).
I cannot change how EF is used by NopCommerce (it would take weeks) or change the version of EF being used, so can I mitigate the effects it has on SQL Server by limiting how many concurrent connections it uses, to give other sites hosted on the same server a fairer chance at database access?
What I'm ideally looking to do, is limit the number of concurrent DB connections to about 10, for a particular application.
I think the best you can do is use the Max Pool Size setting in the connection string. This limits the maximum number of connections in the connection pool, and I think that means it's the maximum number of connections the application will ever use. What I'm not sure of, though, is whether it will cause an exception when it can't get a connection from the pool. I've never tried limiting the connections in this manner.
Here's a little reading on the settings you can put in an ADO.NET connection string:
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlconnection.connectionstring%28v=vs.100%29.aspx
And here's a little more reading on "Using Connection Pooling":
http://msdn.microsoft.com/en-us/library/8xx3tyca%28v=vs.100%29.aspx
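As a concrete illustration, a sketch of a SQL Server connection string with the pool capped at 10 (server, database and credentials are placeholders):

```
Server=dbserver;Database=NopCommerce;User Id=appUser;Password=...;Max Pool Size=10;Connect Timeout=15;
```

On the exception question: when all pooled connections are in use, ADO.NET queues the open request and only throws an InvalidOperationException ("Timeout expired... obtaining a connection from the pool") once the connect timeout elapses, so a small pool trades server load for possible timeouts under bursts of traffic.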

Cloud Architecture

I'm researching cloud services to host an e-commerce site. And I'm trying to understand some basics on how they are able to scale things.
From what I can gather from AWS, Rackspace, etc documentation:
Setup 1:
You can get an instance of a webserver (AWS - EC2, Rackspace - Cloud Server) up. Then you can grow that instance to have more resources or make replicas of that instance to handle more traffic. And it seems like you can install a database local to these instances.
Setup 2:
You can have instance(s) of a webserver (AWS - EC2, Rackspace - Cloud Server) up. You can also have instance(s) of a database (AWS - RDS, Rackspace - Cloud Database) up. So the webserver instances can communicate with the database instances through a single access point.
When I use the term instances, I'm just thinking of replicas that can be accessed through a single access point, with data synchronized across each replica in the background. This could be the wrong mental image, but it's the best I have right now.
I can understand how setup 2 can be scalable. Webserver instances don't change at all since they only hold the source code, so all the HTTP requests are distributed across the different webserver instances and load balanced. The data queries have a single access point, are distributed across the different database instances and load balanced, and all the data writes are synced between the database instances transparently to the application/webserver instance(s).
But for setup 1, where a database is set up locally within each webserver instance, how can the data be synchronized across the databases local to the other webserver instances? Since the webserver instances can't talk to each other, how can you spin up multiple instances to scale the app? Is this setup mainly for sites with static content, where the data inside the database is not getting changed? So for an e-commerce site where orders are written to the database, would this architecture just not be feasible? Or is there some way to get each webserver instance to update its local database against some master copy?
Sorry for such a simple question. I'm guessing the documentation doesn't say it plainly because it's so simple or I just wasn't able to find the correct document/page.
Thank you for your time!
Update:
Moved question to here:
https://webmasters.stackexchange.com/questions/32273/cloud-architecture
We have one server set up as the application server, and our database installed across a cluster of separate machines on AWS in the same availability zone (initially three, but scalable). The way we set it up is with "k-safe" replication. This is scalable because the data is distributed across the machines and duplicated such that one machine could disappear entirely and the site would continue to function. This also allows queries to be distributed.
(Another configuration option was to duplicate all the data on each of the database machines)
Relating to setup #1, you're right: if you duplicate the entire database on each machine behind the load balancer, you need to worry about replicating the data between the nodes. This is complex and takes a toll on performance, or you have to sacrifice consistency, or synchronize everything to a single big database and then you lose the benefit of clustering. Also keep in mind that when throughput increases, adding an additional server is a manual operation that can take hours, so you can't respond to throughput changes on demand.
Relating to setup #2, scaling the application is easy and the cloud providers do that for you automatically, but the database becomes the bottleneck, as you are aware. If the cloud provider scales up your application and all those application instances talk to the same database, you'll get more throughput for the application, but the database will quickly run out of capacity. It has been suggested to solve this by setting up a MySQL cluster on the cloud, which is a valid option, but keep in mind that if throughput suddenly increases you will need to reconfigure the MySQL cluster, which is complex; you won't have auto-scaling for your data.
Another way to do this is a cloud database as a service; there are several options on both the Amazon and Rackspace clouds. You mentioned RDS, but it has the same issue because in the end it's limited to one database instance with no auto-scaling. Another MySQL database service is Xeround, which spreads the load over several database nodes; a load balancer manages the connections between those nodes and synchronizes the data between the partitions automatically. There is a single access point and a round-robin DNS that sends requests to up to thousands of database nodes. So this might answer your need for a single access point and database scalability, without needing to set up a cluster or change it every time there is a scale operation.
