I'm a bit confused about the relationship between a database open session and connection pooling.
To elaborate, I'm using JDBC with Oracle 9i DB and I'm also using a Connection Pool to pool my connections.
What I would like to know is this: when my connections are lying idle in the pool, are they associated with any open session on the database? If I have 5 connections sitting idle in the pool, does that mean there will be 5 corresponding active sessions open on my database?
OK, I got an answer from another forum:
That depends entirely on the pool implementation. It seems likely they are associated with an open session for a while, and then the sessions are closed if the connections are not used for some time, and reestablished when they're needed again.
Not keeping them open for some amount of time would mean wasting the overhead of establishing connections when requests are coming in rapid-fire. Keeping them open forever would hog limited resources for no good reason. Both of these go against my understanding of the very point of having a connection pool in the first place.
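To make the "keep it open for a while, then close it" behaviour concrete, here is a minimal sketch of a pool with an idle timeout. It is not the code of any particular JDBC pool; `FakeConnection`, `IdleTimeoutPool` and the 30-second `max_idle` value are all hypothetical, and a fake clock is used so the timeout can be shown without waiting.

```python
import time
from collections import deque


class FakeConnection:
    """Hypothetical stand-in for a real driver connection / database session."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class IdleTimeoutPool:
    """Keeps returned connections open, but only for max_idle seconds."""

    def __init__(self, max_idle, clock=time.monotonic):
        self.max_idle = max_idle
        self.clock = clock
        self.idle = deque()   # (connection, time it was returned), oldest first
        self.opened = 0       # how many sessions were actually established

    def _evict_stale(self):
        now = self.clock()
        while self.idle and now - self.idle[0][1] > self.max_idle:
            conn, _ = self.idle.popleft()
            conn.close()      # the database session ends here

    def acquire(self):
        self._evict_stale()
        if self.idle:
            conn, _ = self.idle.pop()   # reuse the most recently returned one
            return conn
        self.opened += 1
        return FakeConnection()         # nothing idle: establish a new session

    def release(self, conn):
        self.idle.append((conn, self.clock()))


# Demo with a fake clock (assumed values) instead of real waiting.
now = [0.0]
pool = IdleTimeoutPool(max_idle=30, clock=lambda: now[0])

first = pool.acquire()
pool.release(first)
now[0] = 10                       # 10 s idle: the session is still open
second = pool.acquire()
pool.release(second)
now[0] = 100                      # 90 s idle: the session has been closed
third = pool.acquire()
print(second is first, third is first, first.closed, pool.opened)
# prints: True False True 2
```

So in this sketch an idle connection does correspond to an open session, but only until the idle timeout expires. Whether your pool behaves this way depends on how it is implemented and configured.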
Related
I've been researching database pooling all over the web, but I still don't understand a few things, which I hope more experienced developers can answer here.
My understanding of database pooling is that when there are multiple clients connecting to the database, it makes sense to keep the connection in a "cache" and use that to connect to the database faster.
What I fail to understand is how that would help if let's say I have a server that connects to the database and keeps the connection open. When a client requests data from an endpoint, the server will use the same connection to retrieve the data. How would pooling help in that case?
I'm sure I'm missing something in my understanding of pooling. It would make sense if multiple instances of the same database existed and the pool decided which one to connect to using the cached credentials. Is that what happens?
Also could you give me a scenario where database pooling should be used and when not?
Thanks for clarifying any doubt of mine.
Connection pooling is handled differently in different application scenarios and platforms/languages.
The main consideration is that a database connection is a finite resource, and it costs time to establish it.
Connections are finite because most database servers impose a maximum limit on the number of concurrent connections (this may be part of your licensing terms). If your application uses more connections than the database allows, the server may start rejecting connections (which the client app must then handle as errors) or making connections wait (leading to poor response times). By configuring a connection pool, the client application can manage these scenarios in a central place.
Secondly, managing connections is a bit of a pain - there are lots of different error handling scenarios, configuration settings etc.; it's a good idea to centralize this. You can, of course, do that without a connection pool.
Thirdly, connections are "expensive" resources: they take time to establish. Most web pages require several database queries; if each query spends just a tenth of a second creating a database connection, you very quickly spend noticeable time waiting for database connections. By using a connection pool, you avoid this overhead.
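As a rough back-of-the-envelope version of that third point, the sketch below adds up the connection setup time per page. The figure of 8 queries per page is an assumption chosen for illustration; the 0.1 s connect time is the "tenth of a second" from above, and the pooled case is simplified to zero setup cost.

```python
def connection_overhead(queries_per_page, connect_seconds, pooled):
    """Seconds per page spent only on establishing connections."""
    # With a pool the connection is already open, so setup cost is treated as zero.
    per_query = 0.0 if pooled else connect_seconds
    return queries_per_page * per_query


# Assumed workload: 8 queries per page, 0.1 s to connect.
without_pool = connection_overhead(8, 0.1, pooled=False)
with_pool = connection_overhead(8, 0.1, pooled=True)
print(f"{without_pool:.1f} s vs {with_pool:.1f} s")  # prints: 0.8 s vs 0.0 s
```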
In .Net, connection pooling is handled automatically, not directly by your application.
All you need to do is open, use and close your connections normally and it will take care of the rest.
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql-server-connection-pooling
If you're talking about a different platform, the mechanics are different, although the purpose is the same.
In all cases, it's time-consuming to open and close connections to the DB server, so something between your application and the DB server (typically the database driver or some sort of middleware) will maintain a pool of open connections and create, kill and recycle them as necessary.
Pooling keeps the connections open and cuts down on the overhead of opening one for each request.
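The "open, use and close normally" pattern can be sketched roughly like this. This is not how ADO.NET or any specific driver is implemented; `FakeConnection` and `SimplePool` are hypothetical names, and the point is only that "closing" a pooled connection hands it back instead of tearing it down.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a driver connection; counts real opens."""

    opens = 0

    def __init__(self):
        FakeConnection.opens += 1   # this is where the expensive work would happen

    def execute(self, sql):
        return f"ran: {sql}"


class SimplePool:
    def __init__(self, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(FakeConnection())   # opened once, up front

    @contextmanager
    def connection(self):
        conn = self._idle.get()     # blocks if every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)    # "close" just returns it to the pool


pool = SimplePool(size=2)
for _ in range(100):
    with pool.connection() as conn:
        conn.execute("SELECT 1")

print(FakeConnection.opens)  # prints: 2
```

A hundred requests are served, but only two connections are ever established.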
Also could you give me a scenario where database pooling should be used and when not?
Connection pooling is useful in any application that uses the same database connection multiple times within the lifetime of the connection pool.
There would actually be a very slight decrease in performance if you had an application that used a single connection once, then didn't use it again until the connection pool had timed out and recycled. This is extremely uncommon in production applications.
What I fail to understand is how that would help if let's say I have a server that connects to the database and keeps the connection open.
If you have an application that opens a connection and leaves it open, then theoretically pooling would not help. Practically speaking, it's good to periodically kill and recreate various resources like connections, handles, sockets, etc. because software isn't perfect and some code contains resource leaks.
I'm just guessing, but I suspect that your concern is premature optimization. Unless you have done actual testing and determined that connection pooling is a problem, I wouldn't be too concerned with it. Connection pools are generally self-maintaining and almost always improve performance.
An answer from here
Typically, opening a database connection is an expensive operation, so
pooling keeps the connections active so that, when a connection is
later requested, one of the active ones is used in preference to
opening another one.
I understand the concept of a connection pool in DB management. That is the answer to a "what is it?" question. All the developer blog posts, answers, tutorials and DB docs out there only ever answer "what is it?", as if they constantly copy and paste text from one another. Nobody tries to explain why it is so, or how. The answer above is an example of this.
I can't understand why and how keeping, say, 30 open connections in a pool can be less costly for a system than opening a new connection when it is required.
Suppose I have a web server located in Australia and a DB in AWS located somewhere in Oregon in the USA, or in GB, or anywhere else very far from Australia. Are they all saying that keeping a pool of 20 or more open connections would cost less memory and system performance than opening a new connection every time in such a case? How can that be? Why?
In your scenario, I think the biggest problem is network latency. You can't expect communication between servers located in two different continents to be particularly fast. So if you initiate a new connection every time you need one, you'd experience this latency every single time.
From here:
Connecting to a database server typically consists of several time-consuming steps. A physical channel such as a socket or a named pipe must be established, the initial handshake with the server must occur, the connection string information must be parsed, the connection must be authenticated by the server, checks must be run for enlisting in the current transaction, and so on.
Also, if the connection between the web server and the database uses SSL/TLS, there's a handshake that must be performed on every new connection before the actual communication can occur (in addition to the normal handshake that occurs in unencrypted connections). This handshake is expensive in terms of time.
From here:
Before the client and the server can begin exchanging application data over TLS, the encrypted tunnel must be negotiated: the client and the server must agree on the version of the TLS protocol, choose the ciphersuite, and verify certificates if necessary. Unfortunately, each of these steps requires new packet roundtrips between the client and the server, which adds startup latency to all TLS connections. (...) As the above exchange illustrates, new TLS connections require two roundtrips for a "full handshake"—that’s the bad news
When you use connection pooling, this overhead is avoided because the connections stay open and are reused. To keep them from timing out due to inactivity, the pool typically sends something like a 'ping' message to the SQL server from time to time. Sure, this might consume more memory on your server, but nowadays that's a far less expensive resource. Latency in networks is unavoidable.
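To put rough numbers on this, here is a small sketch that counts round trips. The 0.2 s round-trip time is only an assumed figure for a link between distant continents, not a measurement. One round trip for the TCP handshake and two for a full TLS handshake follow the quote above; the single round trip for database authentication is an assumption, and real protocols may need more.

```python
RTT = 0.2  # assumed round-trip time between web server and database, in seconds

# Round trips needed before the first query can be sent on a NEW connection.
SETUP_ROUND_TRIPS = {
    "tcp_handshake": 1,
    "tls_full_handshake": 2,   # per the quote above
    "db_authentication": 1,    # assumption; depends on the database protocol
}


def query_latency(rtt, reuse_connection):
    """Network time for one query, ignoring the query's execution time."""
    setup = 0 if reuse_connection else sum(SETUP_ROUND_TRIPS.values())
    return (setup + 1) * rtt   # +1 for the query and its response


fresh = query_latency(RTT, reuse_connection=False)
pooled = query_latency(RTT, reuse_connection=True)
print(f"{fresh:.1f} s vs {pooled:.1f} s")  # prints: 1.0 s vs 0.2 s
```

Under these assumptions every query on a fresh connection pays about five round trips instead of one, which is exactly the cost the idle connections in the pool are there to avoid.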
I have a web application that uses Node.js & a connection to an Oracle Database.
Currently my architecture between Node and the DB uses one connection, that stays open. The issue is that some queries take a long time to return, thus blocking subsequent queries until the first returns.
If I open a new connection on each request this does not happen, and the subsequent queries will return before the first (long) one.
The question is: what is best practice? Does each request merit a new connection to the database, to be closed on callback? Should I give queries that I know take significant time their own connection? Or is a single connection correct?
Many thanks in advance for your thoughts.
You could use the generic-pool module, which is a generic resource pool for reusing expensive resources such as database connections.
The general idea is that you create a connection pool with a certain number of connections (10 by default).
Connections are reused and kept for a certain maximum idle time (30 seconds by default).
I use this module with an Oracle database in production and have found no issues so far.
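To show why a pool fixes the blocking described in the question, here is a small simulation. It does not use the generic-pool API (that is a Node.js module); `FakeConnection`, `run_queries` and the sleep durations are hypothetical, and threads stand in for concurrent requests.

```python
import queue
import threading
import time


class FakeConnection:
    """Hypothetical connection that can run only one query at a time."""

    def query(self, name, seconds):
        time.sleep(seconds)   # simulate the database doing the work
        return name


def run_queries(pool_size, jobs):
    """Run jobs concurrently over pool_size connections; return finish order."""
    pool = queue.Queue()
    for _ in range(pool_size):
        pool.put(FakeConnection())
    finished = []
    lock = threading.Lock()

    def worker(name, seconds):
        conn = pool.get()             # wait for a free connection
        try:
            conn.query(name, seconds)
            with lock:
                finished.append(name)
        finally:
            pool.put(conn)            # hand it back for the next request

    threads = []
    for name, seconds in jobs:
        t = threading.Thread(target=worker, args=(name, seconds))
        t.start()
        threads.append(t)
        time.sleep(0.02)              # requests arrive one after another
    for t in threads:
        t.join()
    return finished


jobs = [("slow", 0.3), ("fast", 0.01)]
single = run_queries(1, jobs)   # one shared connection: fast waits behind slow
pooled = run_queries(2, jobs)   # two connections: fast overtakes slow
print(single, pooled)  # prints: ['slow', 'fast'] ['fast', 'slow']
```

With a single connection the fast query is stuck behind the slow one. With a pool it gets its own connection, and you avoid the cost of opening a new connection on every request.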
I have a legacy WinForms application that connects directly to a SQL Server 2005 database.
There are many client applications open at the same time (several hundreds), so I want to minimize the number of connections to the database.
I can release connections early and often, and keep the timeout value low.
Are there other things I need to consider?
Try to use the same connection string when you create a new connection, so .Net will use one connection pool.
Dispose your connection as soon as possible.
You can set max pool size in the connection string itself to determine the max number of active connections.
You should consider introducing a connection pool.
In the Java world you usually get this "for free" with an application server. However, that would be overkill if all you care about is database connection pooling.
The general idea is to have one process (on the server) open a limited number of parallel connections to the database. You would do this in some sort of "proxy" application (a mini application server of sorts) and reuse the expensive-to-create database connections to serve incoming connections to your app, which are cheaper to create and throw away.
Of course, this requires some changes on the client side as well, so maybe it is not the ideal solution if you cannot accept that as a precondition.
When I profile my application using SQL Server Profiler, I am seeing lots of Audit Login and Audit Logout messages for connections to the same database. I am wondering, does this indicate that something is wrong with my connection pooling? The reason I ask, is because I found this in the MSDN documentation in regards to connection pooling:
Login and logout events will not be
raised on the server when a connection
is fetched from or returned to the
connection pool. This is because the
connection is not actually closed when
it is returned to the connection pool.
For more information, see Audit Login
Event Class and Audit Logout Event
Class in SQL Server Books Online.
http://msdn.microsoft.com/en-us/library/8xx3tyca.aspx
Also, does anyone have any tips for determining how effective the connection pooling is for a given SQL server? I have lots of databases on a single server and I know this can have a huge impact, but I am wondering if there is an easy way to obtain metrics on the effectiveness of my connection pooling. Thanks in advance!
While the MSDN article says that the event will only be raised for non-reused connections, the SQL Server documentation contradicts this statement:
"The Audit Login event class indicates that a user has successfully logged in to Microsoft SQL Server. Events in this class are fired by new connections or by connections that are reused from a connection pool."
The best way to measure the effectiveness of pooling is to measure the time spent connecting, with and without pooling. With pooling, you should see that the first connection is slow and the subsequent ones are extremely fast. Without pooling, every connection will take a lot of time.
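A measurement along these lines could be sketched as follows. The connection is simulated: `open_connection`, `time_connects` and the 0.05 s `CONNECT_COST` are hypothetical stand-ins, so in a real test you would replace the simulated open with your actual driver call and toggle pooling through its configuration.

```python
import time

CONNECT_COST = 0.05  # assumed cost of a real login, in seconds


def open_connection():
    time.sleep(CONNECT_COST)   # simulate socket setup, handshake, authentication
    return object()


def time_connects(n, pooled):
    """Return how long each of n 'connect' calls takes, in seconds."""
    cached = None
    timings = []
    for _ in range(n):
        start = time.perf_counter()
        if pooled and cached is not None:
            conn = cached                # reused from the pool: no login
        else:
            conn = open_connection()     # real connection
            cached = conn
        timings.append(time.perf_counter() - start)
    return timings


with_pool = time_connects(5, pooled=True)
without_pool = time_connects(5, pooled=False)
print("pooled:  ", [f"{t:.3f}" for t in with_pool])
print("unpooled:", [f"{t:.3f}" for t in without_pool])
```

In the pooled run only the first timing should be close to `CONNECT_COST`. In the unpooled run all of them should be.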
If you want to track the Audit Login event, you can use the EventSubClass data column to see whether the login used a new connection or a reused one. The value will be 1 for a real (non-pooled) connection and 2 for a connection reused from the pool.
Remember that connections are pooled per connection string. If you have many databases and connect using many connection strings, your app will create a new connection when none exists with the correct connection string. Then it will pool that connection and, if the pool is full, bump an existing connection. The default Max Pool Size is 100 connections, so if you're routinely bouncing through more than 100 databases, you'll close and open connections all the time.
It's not ideal, but you can solve the problem by always connecting to a single database (one connection string) and then switching the database context with 'USE [DBName]'. There are drawbacks:
You lose the ability to specify a user/pass per connection string (your app user needs permission to all databases).
Your SQL becomes more complex (especially if you're using an out-of-the-box ORM or stored procs).
You could experiment with increasing the Max Pool Size if your database count isn't huge. Otherwise, if some databases are used frequently while others aren't, you could turn pooling off for the infrequently used databases. Both items are configured via the connection string.
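The "pooled per connection string" behaviour described above can be pictured with a sketch like this. It is not the actual .NET implementation; `ConnectionStringPools` and the two connection strings are hypothetical, and the only point is that idle connections are looked up by the exact connection string text.

```python
from collections import defaultdict


class ConnectionStringPools:
    """Keeps a separate list of idle connections per connection string."""

    def __init__(self):
        self.pools = defaultdict(list)   # connection string -> idle connections
        self.opened = 0                  # real connections established

    def open(self, conn_str):
        idle = self.pools[conn_str]
        if idle:
            return idle.pop()            # same string seen before: reuse
        self.opened += 1
        return {"conn_str": conn_str}    # stand-in for a real connection

    def close(self, conn):
        self.pools[conn["conn_str"]].append(conn)   # back to its own pool


manager = ConnectionStringPools()
sales = "Server=db1;Database=Sales;"   # hypothetical connection strings
hr = "Server=db1;Database=HR;"

for _ in range(3):
    conn = manager.open(sales)
    manager.close(conn)
conn = manager.open(hr)                 # different string, so a new connection
manager.close(conn)

print(manager.opened, len(manager.pools))  # prints: 2 2
```

Three uses of the same string cost one real connection, while a string that differs only in the database name gets a pool of its own. That is why connecting with one string and switching context with 'USE [DBName]' keeps everything in a single pool.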
As far as metrics go, monitoring the login and logout events on SQL Server is a good start. If your app is pooling nicely, you shouldn't see a lot of them.