Connection pooling for a rich client accessing a database directly

I have a legacy WinForms application that connects directly to a SQL Server 2005 database.
There are many client applications open at the same time (several hundred), so I want to minimize the number of connections to the database.
I can release connections early and often, and keep the timeout value low.
Are there other things I need to consider?

Use the same connection string every time you create a new connection, so .NET will serve all of them from a single connection pool.
Dispose of your connections as soon as possible.
You can set Max Pool Size in the connection string itself to cap the number of active connections.
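A minimal sketch of those three points, assuming the classic System.Data.SqlClient provider (the server, database and pool size of 50 are placeholder values):

    using System.Data.SqlClient;

    // Build every connection from the same string so they all share one
    // pool; "Max Pool Size" caps the number of pooled connections.
    const string connStr =
        "Data Source=myServer;Initial Catalog=myDb;" +
        "Integrated Security=True;Max Pool Size=50";

    using (var conn = new SqlConnection(connStr))
    {
        conn.Open();
        // ... run queries ...
    } // Dispose() closes the connection, which returns it to the pool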

You should consider introducing a connection pool.
In the Java world you usually get this "for free" with an application server, though a full application server would be oversized if database connection pooling is all you care about.
The general idea is to have one process (on the server) hold a limited number of parallel connections to the database. You would do this in some sort of "proxy" application (a mini application server of sorts) and re-use the expensive-to-create database connections to serve the incoming client connections to your app, which are much cheaper to create and throw away.
Of course this requires some changes on the client side as well, so it may not be the ideal solution if you cannot accept that as a precondition.
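A rough sketch of the core idea, assuming the "proxy" is written in .NET; all names here are hypothetical and this is a concept illustration, not a production design:

    using System.Collections.Concurrent;
    using System.Data.SqlClient;
    using System.Threading;

    // Hypothetical middle-tier pool: at most N database connections,
    // shared among many cheap incoming client requests.
    class MiddleTierPool
    {
        private readonly ConcurrentBag<SqlConnection> _idle = new();
        private readonly SemaphoreSlim _gate;
        private readonly string _connStr;

        public MiddleTierPool(string connStr, int maxConnections)
        {
            _connStr = connStr;
            _gate = new SemaphoreSlim(maxConnections, maxConnections);
        }

        public SqlConnection Acquire()
        {
            _gate.Wait();                       // block when all N are in use
            if (_idle.TryTake(out var conn))
                return conn;                    // reuse an already-open connection
            conn = new SqlConnection(_connStr); // otherwise pay the expensive open
            conn.Open();
            return conn;
        }

        public void Release(SqlConnection conn)
        {
            _idle.Add(conn);  // keep it open for the next request
            _gate.Release();
        }
    }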


What is the difference between pooling at Entity Framework vs SQL Server level?

I have a .NET 6 Web API that is hosted on server A. SQL Server is on server B. Both servers are in the same network.
Each endpoint of the Web API makes use of the Entity Framework to query data from the database.
I wanted to enable pooling at the Entity Framework level so that connections are reused. But I'm reading that SQL Server has its own pool of connections anyway. Link: https://learn.microsoft.com/en-us/ef/core/performance/advanced-performance-topics?tabs=with-di%2Cwith-constant#dbcontext-pooling
Note that DbContext pooling is orthogonal to database connection pooling, which is managed at a lower level in the database driver.
So I want to ask - What is the difference between pooling at Entity Framework vs SQL Server level?
I wanted to enable pooling at the Entity Framework level so that connections are reused
Entity Framework doesn't get involved at the "connections are reused" level. Pooling in that regard is a process of ADO.NET forging e.g. a TCP connection to a database (a relatively long and resource-intensive operation) and keeping it open. When your old-school code like using var conn = new SqlConnection("connstr here") calls conn.Open(), one of these already-connected connections is leased from the pool and handed to you; you do some queries and then Close (or dispose, which closes) the connection, but that doesn't actually disconnect from the database; it just returns the connection to the pool.
As noted, EF doesn't get involved in this; it has been a thing since long before EF was invented, and it is active by default unless you've turned it off specifically. EF uses ADO.NET connections like any other app, so it already benefits from connection pooling passively.
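You can observe the reuse with plain ADO.NET: opening, closing and reopening typically hands back the same physical connection, identifiable by its server process id. A small sketch, assuming a reachable SQL Server (the connection string is a placeholder):

    using System;
    using System.Data.SqlClient;

    // With pooling on (the default), open/close/open hands back the
    // same physical connection - observable via its server process id.
    static int GetSpid(string connStr)
    {
        using var conn = new SqlConnection(connStr);
        conn.Open();
        using var cmd = new SqlCommand("SELECT @@SPID", conn);
        return Convert.ToInt32(cmd.ExecuteScalar());
    } // Close/Dispose returns the connection to the pool, not the server

    var cs = "Data Source=myServer;Initial Catalog=myDb;Integrated Security=True";
    // Typically prints the same id twice: the second Open() leased the
    // connection back from the pool instead of reconnecting.
    Console.WriteLine($"{GetSpid(cs)} {GetSpid(cs)}");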
The article you're reading is about a different thing being pooled. Typically a DbContext is a lightweight, short-lived object that represents a unit of work for forming queries against a database; you're supposed to make one, use it for a few queries and then throw it away.
It's designed for fast creation, but that doesn't mean there aren't minor improvements to be had. If you've identified a situation where you need to wring every last drop of performance out of the system, EF offers a way to dispense with the typical "throw it away and make another" route to a fresh DbContext: it provides a facility for DbContexts to be pooled, and rather than being made anew they are reset and reused.
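Enabling it is a one-line change at registration time. A sketch for a minimal ASP.NET Core app, where MyDbContext and the "Default" connection string name are placeholders:

    using Microsoft.EntityFrameworkCore;

    var builder = WebApplication.CreateBuilder(args);

    // Pool DbContext instances: a returned context is reset and reused
    // rather than constructed from scratch for every request.
    // MyDbContext is a placeholder for your own DbContext type.
    builder.Services.AddDbContextPool<MyDbContext>(options =>
        options.UseSqlServer(builder.Configuration.GetConnectionString("Default")));

    var app = builder.Build();
    app.Run();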
It's probably unlikely you're in a place where pooling your contexts would make a significant difference. You're asking about enabling connection pooling, but in all likelihood it is already enabled; you would know if you had put "Pooling=false" in a connection string.
For more info on connection pooling, see https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql-server-connection-pooling

Why would my app keep opened too many sleeping connections?

I have a .NET Core app which is deployed as a WebJob on Azure. It listens to a topic and, depending on what it reads, performs CRUD operations on a SQL Server database (the app uses EF Core for that).
The thing is that, as the application runs, the number of open connections increases, and most of them then go unused for a long time (even for days).
Is there a way to make the app not create so many sleeping connections?
I have tried running my app locally against a local SQL Server (Express) database. There it only kept about 3 connections open (with the same number of messages handled as when it is deployed as a WebJob).
Is there a way to make the app not create so many sleeping connections?
Yes. Likely the current behavior is fine, and it isn't creating too many sleeping connections. But if you want to change the connection pooling behavior, you can:
The ConnectionString property of the SqlConnection object supports connection string key/value pairs that can be used to adjust the behavior of the connection pooling logic. For more information, see ConnectionString.
SQL Server Connection Pooling
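For illustration, the pool-related keywords can be set directly in the connection string; the server name and values below are illustrative, not recommendations:

    // Pool-related keywords understood by SqlConnection:
    //   Min Pool Size       - connections kept open even when idle
    //   Max Pool Size       - hard cap on pooled connections (default 100)
    //   Connection Lifetime - seconds after which a returned connection
    //                         is destroyed rather than re-pooled
    //   Pooling=false       - disables pooling entirely
    var connStr =
        "Data Source=myServer;Initial Catalog=myDb;Integrated Security=True;" +
        "Min Pool Size=0;Max Pool Size=20;Connection Lifetime=300";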

Database pooling: when to use it and when not

I've been researching all over the web about database pooling, but I still don't understand a few things, which I hope to find answers to here from more experienced developers.
My understanding of database pooling is that when there are multiple clients connecting to the database, it makes sense to keep the connection in a "cache" and use that to connect to the database faster.
What I fail to understand is how that would help if, let's say, I have a server that connects to the database and keeps the connection open. When a client requests data from an endpoint, the server will use the same connection to retrieve the data. How would pooling help in that case?
I'm sure I'm missing something in the understanding of pooling. It would make sense if multiple instances of the same database exist and therefore it's decided in the pool which database to connect to using the cached credentials. Is it what happens there?
Also could you give me a scenario where database pooling should be used and when not?
Thanks for clarifying any doubt of mine.
Connection pooling is handled differently in different application scenarios and platforms/languages.
The main consideration is that a database connection is a finite resource, and it costs time to establish it.
Connections are finite because most database servers impose a maximum limit on the number of concurrent connections (this may be part of your licensing terms). If your application uses more connections than the database allows, it may start rejecting connections (leading to error handling requirements in the client app), or making connections wait (leading to poor response times). By configuring a connection pool, the client application can manage these scenarios in a central place.
Secondly, managing connections is a bit of a pain - there are lots of different error handling scenarios, configuration settings etc.; it's a good idea to centralize this. You can, of course, do that without a connection pool.
Thirdly, connections are "expensive" resources - they take time to establish. Most web pages require several database queries; if each query spent just a tenth of a second creating a database connection, you would very quickly spend noticeable time waiting for database connections. By using a connection pool, you avoid this overhead.
In .Net, connection pooling is handled automatically, not directly by your application.
All you need to do is open, use and close your connections normally and it will take care of the rest.
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql-server-connection-pooling
If you're talking about a different platform, the mechanics are different, although the purpose is the same.
In all cases, it's time consuming to open and close connections to the DB server, so something between your application and the DB server (typically the database driver or some sort of middle-ware) will maintain a pool of open connections and create, kill and recycle them as necessary.
Pooling keeps the connections open and cuts down on the overhead of opening one for each request.
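You can measure that overhead yourself. A small sketch, assuming a reachable SQL Server and the classic System.Data.SqlClient provider (the connection string is a placeholder):

    using System;
    using System.Data.SqlClient;
    using System.Diagnostics;

    static double OpenMs(string connStr)
    {
        var sw = Stopwatch.StartNew();
        using (var conn = new SqlConnection(connStr))
            conn.Open();
        return sw.Elapsed.TotalMilliseconds;
    }

    var cs = "Data Source=myServer;Initial Catalog=myDb;Integrated Security=True";
    // The first open pays for the physical connection; the second one
    // typically just leases it back from the pool in well under 1 ms.
    Console.WriteLine($"cold:   {OpenMs(cs):F1} ms");
    Console.WriteLine($"pooled: {OpenMs(cs):F1} ms");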
Also could you give me a scenario where database pooling should be used and when not?
Connection pooling is useful in any application that uses the same database connection multiple times within the lifetime of the connection pool.
There would actually be a very slight decrease in performance if you had an application that used a single connection once, then didn't use it again until the connection pool had timed out and recycled it. This is extremely uncommon in production applications.
What I fail to understand is how that would help if let's say I have a server that connects to the database and keeps the connection open.
If you have an application that opens a connection and leaves it open, then theoretically pooling would not help. Practically speaking, it's good to periodically kill and recreate various resources like connections, handles, sockets, etc. because software isn't perfect and some code contains resource leaks.
I'm just guessing, but I suspect that your concern is premature optimization. Unless you have done actual testing and determined that connection pooling is a problem, I wouldn't be too concerned with it. Connection pools are generally self-maintaining and almost always improve performance.

Why is a connection pool of many open connections less costly for a system than opening a new connection every time?

An answer from here
Typically, opening a database connection is an expensive operation, so pooling keeps the connections active so that, when a connection is later requested, one of the active ones is used in preference to opening another one.
I understand the concept of a connection pool in DB management. That quote answers the "what is it" question. Developer blog posts, answers, tutorials and DB docs out there always answer the "what is" question, as if they constantly copy/paste text from one another; nobody tries to explain the "why" and the "how". The answer above is an example of it.
I can't understand why and how it is possible that keeping, say, 30 open connections in a pool is less costly for a system than opening a new connection when it is required.
Suppose I have a web server located in Australia, and a DB in AWS located in the USA, somewhere in Oregon. Or in GB. Or anywhere, but very far from AUS. So they all say that keeping a pool of 20-... open connections would be less costly for memory and system performance than opening a new connection every time in such a case? How can that be? Why?
In your scenario, I think the biggest problem is network latency. You can't expect communication between servers located in two different continents to be particularly fast. So if you initiate a new connection every time you need one, you'd experience this latency every single time.
From here:
Connecting to a database server typically consists of several time-consuming steps. A physical channel such as a socket or a named pipe must be established, the initial handshake with the server must occur, the connection string information must be parsed, the connection must be authenticated by the server, checks must be run for enlisting in the current transaction, and so on.
Also, if the connection between the web server and the database uses SSL/TLS, there's a handshake that must be performed on every new connection before the actual communication can occur (in addition to the handshake that establishing any connection requires). This handshake is expensive in terms of time.
From here:
Before the client and the server can begin exchanging application data over TLS, the encrypted tunnel must be negotiated: the client and the server must agree on the version of the TLS protocol, choose the ciphersuite, and verify certificates if necessary. Unfortunately, each of these steps requires new packet roundtrips between the client and the server, which adds startup latency to all TLS connections. (...) As the above exchange illustrates, new TLS connections require two roundtrips for a "full handshake"—that’s the bad news
When you use connection pooling, this overhead is avoided: the pool regularly sends something like a 'ping' message to the SQL server to keep idle connections from timing out due to inactivity. Sure, this might consume more memory on your server, but nowadays that's a far less expensive resource. Latency in networks is unavoidable.
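A back-of-the-envelope illustration of the Australia-to-Oregon scenario; the 200 ms round-trip time and the round-trip counts are assumptions, not measurements:

    using System;

    // Rough cost of one brand-new connection across the Pacific,
    // counting network round trips only (all numbers are assumptions).
    const double rttMs = 200;      // assumed round-trip time AUS <-> Oregon
    const int tcpRoundTrips = 1;   // TCP three-way handshake
    const int tlsRoundTrips = 2;   // full TLS handshake
    const int loginRoundTrips = 1; // database authentication, at minimum

    double newConnectionMs = rttMs * (tcpRoundTrips + tlsRoundTrips + loginRoundTrips);
    Console.WriteLine($"~{newConnectionMs} ms before the first query is even sent");
    // ~800 ms per request without pooling; paid once with pooling.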

When to use connection pooling or single connection?

Hypothetical scenario:
I have a database server that has significantly more RAM/CPU than could possibly be used in its current system. If I connect an application server to it, would I get better performance using pooling, with multiple connections that each carry smaller executions, or using a single connection with one larger execution?
More importantly, why? I'm having trouble finding any reference material to pull me one way or the other.
I always vote for connection pooling for a couple of reasons.
The pool layer will deal with failures and grab a working connection when you need it.
You can service multiple requests concurrently by using different connections at the same time; a single connection will often block and queue up requests to the DB (see the sketch below).
Establishing a connection to a DB is expensive - pools can do this up front and in the background as needed.
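To picture the concurrency point: with pooling, parallel requests each lease their own connection instead of queuing behind one. A sketch with a placeholder connection string and a trivial query:

    using System.Data.SqlClient;
    using System.Linq;
    using System.Threading.Tasks;

    var cs = "Data Source=myServer;Initial Catalog=myDb;Integrated Security=True";

    // Five concurrent queries: each task leases its own pooled
    // connection, so none of them queue behind a single shared one.
    var tasks = Enumerable.Range(0, 5).Select(async _ =>
    {
        using var conn = new SqlConnection(cs);
        await conn.OpenAsync();
        using var cmd = new SqlCommand("SELECT 1", conn);
        await cmd.ExecuteScalarAsync();
    });
    await Task.WhenAll(tasks);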
There's also a handy discussion in this answer.
