I've been researching database pooling all over the web, but I still don't understand a few things, which I hope more experienced developers here can answer.
My understanding of database pooling is that when there are multiple clients connecting to the database, it makes sense to keep the connection in a "cache" and use that to connect to the database faster.
What I fail to understand is how that would help if let's say I have a server that connects to the database and keeps the connection open. When a client requests data from an endpoint, the server will use the same connection to retrieve the data. How would pooling help in that case?
I'm sure I'm missing something in my understanding of pooling. It would make sense if multiple instances of the same database existed and the pool decided which database to connect to using the cached credentials. Is that what happens?
Also could you give me a scenario where database pooling should be used and when not?
Thanks for clarifying any doubt of mine.
Connection pooling is handled differently in different application scenarios and platforms/languages.
The main consideration is that a database connection is a finite resource, and it costs time to establish it.
Connections are finite because most database servers impose a maximum limit on the number of concurrent connections (this may be part of your licensing terms). If your application uses more connections than the database allows, the database may start rejecting connections (leading to error handling requirements in the client app) or making connections wait (leading to poor response times). By configuring a connection pool, the client application can manage these scenarios in a central place.
Secondly, managing connections is a bit of a pain - there are lots of different error handling scenarios, configuration settings etc.; it's a good idea to centralize this. You can, of course, do that without a connection pool.
Thirdly, connections are "expensive" resources - they take time to establish. Most web pages require several database queries; if each query spends just one tenth of a second creating a database connection, you're very quickly spending noticeable time waiting for database connections. By using a connection pool, you avoid this overhead.
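To make the third point concrete, here is a minimal sketch in Python. It is not tied to any real driver: FakeConnection, CONNECT_COST and both run_* functions are hypothetical names, and the connection cost is simulated with a short sleep. It only illustrates that reusing a couple of open connections pays the setup cost twice instead of once per query.

```python
import queue
import time

CONNECT_COST = 0.01  # hypothetical: seconds needed to establish one connection


class FakeConnection:
    """Stand-in for a real database connection (hypothetical)."""

    opened = 0  # how many connections have been established so far

    def __init__(self):
        time.sleep(CONNECT_COST)  # simulate socket setup, handshake, login
        FakeConnection.opened += 1

    def query(self, sql):
        return f"result of {sql}"

    def close(self):
        pass


def run_without_pool(n_queries):
    """Open a fresh connection for every query, then throw it away."""
    for i in range(n_queries):
        conn = FakeConnection()
        conn.query(f"SELECT {i}")
        conn.close()


def run_with_pool(n_queries, pool_size=2):
    """Open pool_size connections once, then borrow and return them."""
    pool = queue.Queue()
    for _ in range(pool_size):
        pool.put(FakeConnection())
    for i in range(n_queries):
        conn = pool.get()   # borrow a connection that is already open
        conn.query(f"SELECT {i}")
        pool.put(conn)      # hand it back instead of closing it


FakeConnection.opened = 0
start = time.perf_counter()
run_without_pool(20)
unpooled_opened = FakeConnection.opened
unpooled_time = time.perf_counter() - start

FakeConnection.opened = 0
start = time.perf_counter()
run_with_pool(20)
pooled_opened = FakeConnection.opened
pooled_time = time.perf_counter() - start

print(f"no pool: {unpooled_opened} connects, pool: {pooled_opened} connects")
```

With a real database the absolute numbers will differ, but the shape is the same: the unpooled loop pays the connection cost on every query, the pooled loop only when the pool is filled.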
In .NET, connection pooling is handled automatically, not directly by your application.
All you need to do is open, use and close your connections normally and it will take care of the rest.
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql-server-connection-pooling
If you're talking about a different platform, the mechanics are different, although the purpose is the same.
In all cases, it's time-consuming to open and close connections to the DB server, so something between your application and the DB server (typically the database driver or some sort of middleware) will maintain a pool of open connections and create, kill and recycle them as necessary.
Pooling keeps the connections open and cuts down on the overhead of opening one for each request.
Also could you give me a scenario where database pooling should be used and when not?
Connection pooling is useful in any application that uses the same database connection multiple times within the lifetime of the connection pool.
There would actually be a very slight decrease in performance if you had an application that used a single connection once, then didn't use it again until the connection pool had timed out and recycled. This is extremely uncommon in production applications.
What I fail to understand is how that would help if let's say I have a server that connects to the database and keeps the connection open.
If you have an application that opens a connection and leaves it open, then theoretically pooling would not help. Practically speaking, it's good to periodically kill and recreate various resources like connections, handles, sockets, etc. because software isn't perfect and some code contains resource leaks.
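The "periodically kill and recreate" idea can be sketched as a pool that replaces any connection older than a maximum age. This is an illustration only: RecyclingPool, max_age and the injectable clock are hypothetical names, and an in-memory sqlite3 database stands in for a real server.

```python
import queue
import sqlite3
import time


class RecyclingPool:
    """Hypothetical pool that retires connections older than max_age seconds."""

    def __init__(self, connect, size, max_age, clock=time.monotonic):
        self._connect = connect
        self._max_age = max_age
        self._clock = clock
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put((connect(), clock()))  # remember the creation time

    def acquire(self):
        conn, born = self._idle.get()
        if self._clock() - born > self._max_age:
            conn.close()                          # too old: drop it...
            conn, born = self._connect(), self._clock()  # ...and open a new one
        return conn, born

    def release(self, conn, born):
        self._idle.put((conn, born))


# A fake clock makes the demo deterministic (an assumption for illustration).
now = [0.0]
pool = RecyclingPool(lambda: sqlite3.connect(":memory:"),
                     size=1, max_age=60, clock=lambda: now[0])

first, born = pool.acquire()
pool.release(first, born)

now[0] = 30.0
second, born = pool.acquire()   # 30 s old: still young, same connection
pool.release(second, born)

now[0] = 120.0
third, born = pool.acquire()    # 120 s old: recycled into a new connection
```

Real pools usually expose this as a setting (a maximum lifetime or idle timeout) rather than something you write yourself.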
I'm just guessing, but I suspect that your concern is premature optimization. Unless you have done actual testing and determined that connection pooling is a problem, I wouldn't be too concerned with it. Connection pools are generally self-maintaining and almost always improve performance.
I have an Odoo front end on an AWS EC2 instance, connected to PostgreSQL hosted on ElephantSQL with a limit of 15 concurrent connections.
I want to make sure that this connection limit will pose no problem, so I want to use Kafka to perform the database writes instead of Odoo doing them directly, but I found no resources online to help me out.
Is your issue about Connection Pooling? PostgreSQL includes two implementations of DataSource for JDBC 2 and two for JDBC 3, as shown here.
dataSourceName (String): Every pooling DataSource must have a unique name.
initialConnections (int): The number of database connections to be created when the pool is initialized.
maxConnections (int): The maximum number of open database connections to allow. When more connections are requested, the caller will hang until a connection is returned to the pool.
The pooling implementations do not actually close connections when the client calls the close method, but instead return the connections to a pool of available connections for other clients to use. This avoids any overhead of repeatedly opening and closing connections, and allows a large number of clients to share a small number of database connections.
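The behaviour described above can be sketched in a few lines of Python. This is not the PostgreSQL JDBC implementation: BoundedPool and PooledConnection are hypothetical names, initial_connections and max_connections merely mirror the properties listed above, and an in-memory sqlite3 database stands in for PostgreSQL.

```python
import queue
import sqlite3
import threading


class PooledConnection:
    """Wrapper whose close() returns the real connection to the pool."""

    def __init__(self, raw, pool):
        self.raw = raw
        self._pool = pool

    def execute(self, sql, params=()):
        return self.raw.execute(sql, params)

    def close(self):
        # Not a real close: the underlying connection stays open for reuse.
        if self.raw is not None:
            self._pool.give_back(self.raw)
            self.raw = None


class BoundedPool:
    def __init__(self, initial_connections, max_connections):
        self._idle = queue.Queue()
        self._max = max_connections
        self._created = 0
        self._lock = threading.Lock()
        for _ in range(initial_connections):
            self._created += 1
            self._idle.put(self._open())

    @staticmethod
    def _open():
        return sqlite3.connect(":memory:", check_same_thread=False)

    def give_back(self, raw):
        self._idle.put(raw)

    def get_connection(self, timeout=None):
        try:
            raw = self._idle.get_nowait()
        except queue.Empty:
            with self._lock:
                may_grow = self._created < self._max
                if may_grow:
                    self._created += 1
            if may_grow:
                raw = self._open()
            else:
                # Pool exhausted: wait until some caller invokes close().
                raw = self._idle.get(timeout=timeout)
        return PooledConnection(raw, self)


pool = BoundedPool(initial_connections=1, max_connections=2)
a = pool.get_connection()
b = pool.get_connection()          # pool grows to its maximum of 2
try:
    pool.get_connection(timeout=0.1)
    exhausted = False
except queue.Empty:                # nobody returned a connection in time
    exhausted = True

raw_a = a.raw
a.close()                          # returns the connection to the pool
c = pool.get_connection()
reused = c.raw is raw_a            # same physical connection, new borrower
```

The timeout is only there so the demo terminates; without it the third caller would hang until a connection is returned, as described above.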
Additionally, you might want to investigate PgBouncer, a stable, in-production connection pooler for PostgreSQL. It is transparent to both sides: the application connects to PgBouncer as if it were PostgreSQL, and PostgreSQL sees PgBouncer as an ordinary client. PgBouncer can do load handling, connection scheduling/routing and load balancing. Read more from this blog that shows how to integrate this with Odoo. There are a lot of references from this page.
Finally, I would second OneCricketeer's comment, above, on using Amazon RDS, unless you are getting a far better deal with ElephantSQL.
On using Kafka, you have to realise that Odoo is a frontend application that is synchronous to user actions, so you cannot architecturally have a functional system if you put Kafka between Odoo and the database. You would input data and see it in about 2-10 minutes. I exaggerate, but if that is what you really want to do, then by all means invest the time and effort.
Read more from Confluent, the team behind Kafka that came out of LinkedIn, on how they use a solution called BottledWater to do some cool streams over PostgreSQL; that should be more like what you want to do.
Do let us know which option you selected and what worked! Keep the community informed.
An answer from here
Typically, opening a database connection is an expensive operation, so
pooling keeps the connections active so that, when a connection is
later requested, one of the active ones is used in preference to
opening another one.
I understand the concept of a connection pool in DB management. That is the answer to a "what is ~" question. All the developer blog posts, answers, tutorials and DB docs out there always answer the question "what is", as if they constantly copy/paste text from one another. Nobody tries to explain "why is it so" and "how". The answer above is an example of that.
I can't understand why and how it is possible that keeping, say, 30 open connections in a pool is less costly for a system than opening a new connection when it is required.
Suppose I have a web server located in Australia and a DB in AWS located in the USA, somewhere in Oregon. Or in GB. Or anywhere, but very far from AUS. So they all say that keeping a pool of 20 or more open connections would be less costly for memory and system performance than opening a new connection every time in such a case? How can that be? Why?
In your scenario, I think the biggest problem is network latency. You can't expect communication between servers located in two different continents to be particularly fast. So if you initiate a new connection every time you need one, you'd experience this latency every single time.
From here:
Connecting to a database server typically consists of several time-consuming steps. A physical channel such as a socket or a named pipe must be established, the initial handshake with the server must occur, the connection string information must be parsed, the connection must be authenticated by the server, checks must be run for enlisting in the current transaction, and so on.
Also, if the connection between the webserver and the database uses SSL/TLS, there's a handshake that must be performed on every new connection before the actual communication can occur (in addition to the normal handshake that occurs in normal connections). This handshake is expensive in terms of time.
From here:
Before the client and the server can begin exchanging application data over TLS, the encrypted tunnel must be negotiated: the client and the server must agree on the version of the TLS protocol, choose the ciphersuite, and verify certificates if necessary. Unfortunately, each of these steps requires new packet roundtrips between the client and the server, which adds startup latency to all TLS connections. (...) As the above exchange illustrates, new TLS connections require two roundtrips for a "full handshake"—that’s the bad news
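A rough back-of-the-envelope calculation shows how these round trips add up over a long-distance link. All numbers are assumptions for illustration: a 200 ms round trip between the two continents, one round trip for the TCP handshake, the two TLS round trips from the quote above, and at least one round trip for database authentication. Real figures depend on the network, the TLS version and the database protocol.

```python
RTT_MS = 200             # hypothetical round-trip time, Australia <-> Oregon

TCP_HANDSHAKE_RTTS = 1   # SYN / SYN-ACK before any data can flow
TLS_HANDSHAKE_RTTS = 2   # "full handshake", as stated in the quote above
AUTH_RTTS = 1            # assumption: at least one round trip to log in
QUERY_RTTS = 1           # send the query, receive the result


def request_latency_ms(queries, new_connection_per_query):
    """Network latency spent on round trips for one page request."""
    setup = (TCP_HANDSHAKE_RTTS + TLS_HANDSHAKE_RTTS + AUTH_RTTS) * RTT_MS
    per_query = QUERY_RTTS * RTT_MS
    if new_connection_per_query:
        return queries * (setup + per_query)
    # Pooled: the connection is assumed to be open already, so no setup cost.
    return queries * per_query


unpooled_ms = request_latency_ms(5, new_connection_per_query=True)
pooled_ms = request_latency_ms(5, new_connection_per_query=False)
print(unpooled_ms, pooled_ms)  # 5000 1000
```

Under these assumptions a page with five queries spends about five seconds on the network without a pool and about one second with one, before the database has done any actual work.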
When you use connection pooling, this overhead is avoided because connections that are already open get reused. The pool keeps them alive, typically by sending something like a 'ping' message to the SQL server from time to time so that they don't time out due to inactivity. Sure, this might consume more memory in your server, but nowadays that's a far less expensive resource. Latency in networks is unavoidable.
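The 'ping' idea can be sketched as follows. Note that this variant checks the connection when it is borrowed rather than on a timer, which is a common alternative but not necessarily what any particular pool does. ValidatingPool and is_alive are hypothetical names, and sqlite3 stands in for a remote SQL server.

```python
import queue
import sqlite3


def is_alive(conn):
    """Ping the server with a trivial query; False if the connection is dead."""
    try:
        conn.execute("SELECT 1").fetchone()
        return True
    except sqlite3.Error:
        return False


class ValidatingPool:
    """Hypothetical pool that pings a connection before handing it out."""

    def __init__(self, connect, size):
        self._connect = connect
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    def acquire(self):
        conn = self._idle.get()
        if not is_alive(conn):
            conn = self._connect()  # replace a connection that timed out
        return conn

    def release(self, conn):
        self._idle.put(conn)


pool = ValidatingPool(lambda: sqlite3.connect(":memory:"), size=1)

conn = pool.acquire()
conn.close()            # simulate the server dropping an idle connection
pool.release(conn)

fresh = pool.acquire()  # the dead connection is detected and replaced
```

The caller never sees the dead connection; it only pays the reconnect cost in the rare case where a ping fails.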
Hypothetical scenario:
I have a database server that has significantly more RAM/CPU than could possibly be used in its current system. Connecting an application server to it, would I get better performance using pooling to use multiple connections that each have smaller executions, or a single connection with a larger execution?
More importantly, why? I'm having trouble finding any reference material to pull me one way or the other.
I always vote for connection pooling for a couple of reasons.
The pool layer will deal with failures and with grabbing a working connection when you need it.
You can service multiple requests concurrently by using different connections at the same time. A single connection will often block and queue up requests to the DB.
Establishing a connection to a DB is expensive - pools can do this up front and in the background as needed.
There's also a handy discussion in this answer.
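The concurrency point can be illustrated with a small simulation. Everything here is hypothetical: FakeConnection pretends each query takes 50 ms, and one connection can serve only one query at a time. Eight concurrent requests are served once through a single shared connection and once through a pool of four.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor

QUERY_TIME = 0.05  # hypothetical: seconds the database needs per query


class FakeConnection:
    """Stand-in for a connection that handles one query at a time."""

    def query(self, sql):
        time.sleep(QUERY_TIME)  # simulate the database doing the work
        return sql


def run(n_queries, n_connections):
    pool = queue.Queue()
    for _ in range(n_connections):
        pool.put(FakeConnection())

    def handle(i):
        conn = pool.get()       # blocks while every connection is busy
        try:
            return conn.query(f"SELECT {i}")
        finally:
            pool.put(conn)      # always give the connection back

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_queries) as workers:
        results = list(workers.map(handle, range(n_queries)))
    return time.perf_counter() - start, results


single_time, single_results = run(n_queries=8, n_connections=1)
pooled_time, pooled_results = run(n_queries=8, n_connections=4)
print(f"1 connection: {single_time:.2f}s, 4 connections: {pooled_time:.2f}s")
```

With one connection the requests queue up behind each other; with four, up to four queries are in flight at once. Whether a real database benefits the same way depends on whether it has spare capacity, which is the premise of the question.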
I have a legacy WinForms application that connects directly to a SQL Server 2005 database.
There are many client applications open at the same time (several hundreds), so I want to minimize the number of connections to the database.
I can release connections early and often, and keep the timeout value low.
Are there other things I need to consider?
Try to use the same connection string when you create a new connection, so .NET will use one connection pool.
Dispose your connection as soon as possible.
You can set max pool size in the connection string itself to determine the max number of active connections.
You should consider introducing a connection pool.
In the Java world you usually get this "for free" with an application server. However this would be oversized anyway if everything you care for is the database connection pooling.
The general idea is to have one process (on the server) open a limited number of parallel connections to the database. You would do this in some sort of "proxy" application (a mini application server of sorts) and re-use the expensive-to-create database connections for incoming connections to your app, which are cheaper to create and throw away.
Of course this requires some changes on the client side as well, so maybe it is not the ideal solution if you cannot accept that as a precondition.
Can someone explain what Connection and Statement Pooling is and what the benefit is over unpooled DataSources? I am trying to understand when it is a good idea to use a technology like c3p0 or proxool in a project. I first need to understand what they do and when it is interesting to use them. Thank you very much.
The Happy Connection
It's so easy to create a new connection every time. One line: that's all it takes. Nothing much to think about. Great life.
Hold on. Do you eat on a plate?
Do you throw away your plate after each use?
No, you wash it and put it on the dish rack, so you can use it again for your next meal. Buying new plates every time is out of the question. If you did that, you would have wasted enough money to buy a new iPad in one year.
Think about connection pools again.
But this time, the connections are your plates, the connection pool is your dish rack. Your wallet and your energy represent the system resources (memory and bandwidth).
Wash or Spend?
What would you rather do:
a. wash the dishes
b. or run to the mall every meal and buy new plates?
While there are tasks involved in connection pooling, it's less taxing in the long run compared to creating new connections every time. The key is in knowing how many plates (connections) your family (application) will need in any given day.
Pools can be used for database connections, threads, entity beans and other factory-derived objects.
Creating a network connection to a database server is (relatively) expensive.
Likewise asking the server to prepare a SQL statement is (relatively) expensive.
Using a connection/statement pool, you can reuse existing connections/prepared statements, avoiding the cost of initiating a connection, parsing SQL etc.
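The statement side can be sketched as a small cache keyed by the SQL text, so that each distinct statement is prepared only once per connection. This is a conceptual illustration, not how c3p0 or any specific driver implements it: StatementCache and fake_prepare are hypothetical, and the "prepared statement" is just a tuple.

```python
from collections import OrderedDict


class StatementCache:
    """Hypothetical per-connection cache of prepared statements (LRU)."""

    def __init__(self, prepare, capacity=2):
        self._prepare = prepare        # the expensive "prepare" call
        self._capacity = capacity
        self._cache = OrderedDict()

    def get(self, sql):
        if sql in self._cache:
            self._cache.move_to_end(sql)     # mark as recently used
            return self._cache[sql]
        stmt = self._prepare(sql)            # cache miss: pay the cost once
        self._cache[sql] = stmt
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used
        return stmt


prepared = []  # records every time the server is asked to prepare a statement


def fake_prepare(sql):
    prepared.append(sql)
    return ("PREPARED", sql)


cache = StatementCache(fake_prepare)
for _ in range(3):
    cache.get("SELECT * FROM users WHERE id = ?")
cache.get("SELECT 1")

print(len(prepared))  # 2
```

Four lookups result in only two prepare calls, because the repeated statement is served from the cache.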
I am not familiar with c3p0, but the benefits of pooling connections and statements include:
Performance. Connecting to the database is expensive and slow. Pooled connections can be left physically connected to the database, and shared amongst the various components that need database access. That way the connection cost is paid for once and amortized across all the consuming components.
Diagnostics. If you have one sub-system responsible for connecting to the database, it becomes easier to diagnose and analyze database connection usage.
Maintainability. Again, if you have one sub-system responsible for handing out database connections, your code will be easier to maintain than if each component connected to the database itself.
Connecting and disconnecting from a database is an expensive operation. By using pooling you can write your code to open and close connections but the pool decides when to actually do it, leaving a certain number of connections open for a certain time.
Statement pooling? Are you talking about statement caching?
Quoting the book Java Persistence with Hibernate
There are three reasons for using a pool:
Acquiring a new connection is expensive. Some database management systems even start a completely new server process for each connection.
Maintaining many idle connections is expensive for a database management system, and the pool can optimize the usage of idle connections (or disconnect if there are no requests).
Creating prepared statements is also expensive for some drivers, and the connection pool can cache statements for a connection across requests.