Consider a classic ASP site running on IIS6 with a dedicated SQL Server 2008 backend...
Scenario 1:
Open Connection
Do 15 queries, updates etc all through the ASP-page
Close Connection
Scenario 2:
For each query, update etc, open and close the connection
With connection pooling, my money would be on scenario 2 being the most effective and scalable.
Would I be correct in that assumption?
Edit: More information
This is database operations spread over a lot of asp-code in separate functions, doing separate things etc. It is not 15 queries done in rapid succession. Think a big site with many functions, includes etc.
Fundamentally, ASP pages are synchronous. So why not open a connection once per page load, and close it once per page load? All other opens/closes seem to be unnecessary.
If I understand you correctly you are considering sharing a connection object across complex code held in various functions in various includes.
In such a scenario this would be a bad idea. It becomes difficult to guarantee the correct state and settings on the connection if other code may have seen the need to modify them. Also you may at times have code that fetches a firehose recordset and hasn't finished processing when another piece of code is invoked that also needs a connection. In such a case you could not share a connection.
Having each atomic chunk of code acquire its own connection would be better. The connection would be in a clean known state. Multiple connections when necessary can operate in parrallel. As others have pointed out the cost of connection creation is almost entirely mitigated by the underlying connection pooling.
in your Scenario 2, there is a round-trip between your application and SQLServer for executing each query which consumes your server's resources and time of total executions will raise.
but in Scenario 1, there is only one round-trip and also SQLServer will run all of the queries in just one time. so it is faster and less resource-consuming
EDIT: well, I thought you mean multiple queries in one time..
so, with connection pooling enabled, there is exactly no problem in closing connection after each transaction. so go with Scenario 2
Best practice is to open the connection once, read all your data and close the connection as soon as possible. AFTER you've closed the connection, you can do what you like with the data you retrieved. In this scenario, you don't open too many connections and you don't open the connection for too long.
Even though your code has database calls in several places, the overhead of creating the connection will make things worse than waiting - unless you're saying your page takes many seconds to create on the server side? Usually, even without controlled data access and with many functions, your page should be well under a second to generate on the server.
I believe the default connection pool is about 20 connections but SQLServer can handle alot more. Getting a connection from the server takes the longest time (assuming you are not doing anything daft with your commands) so I see nothing wrong with getting a connection per page and killing it if used afterwards.
For scalability you could run into problems where your connection pool gets too busy and time outs while your script waits for a connection to be come available while your DB is sat there with a 100 spare connections but no one using them.
Create and kill on the same page gets my vote.
From a performance point of view there is no notable difference. ADODB connection pooling manages the actual connections with the db. Adodb.connection .open and .close are just a façade to the connection pool. Instantiating either 1 or 15 adodb.connection objects doesn't really matter performance wise. Before we where using transactions we used the connection string in combination with adodb.command (.activeConnection) and never opened or closed connections explicitly.
Reasons to explicitly keep reference to a adodb.connection are transactions or connection-based functions like mysql last_inserted_id(). In these cases you must be absolutely certain that you are getting the same connection for every query.
Related
I've been researching the whole web about database pooling, but I still don't understand few things which I hope to find answer here from more experienced developers.
My understanding of database pooling is that when there are multiple clients connecting to the database, it makes sense to keep the connection in a "cache" and use that to connect to the database faster.
What I fail to understand is how that would help if let's say I have a server that connects to the database and keeps the connection open. When a client requests data from an endpoint, the server will use the same connection to retrieve the data. How would pooling help in that case?
I'm sure I'm missing something in the understanding of pooling. It would make sense if multiple instances of the same database exist and therefore it's decided in the pool which database to connect to using the cached credentials. Is it what happens there?
Also could you give me a scenario where database pooling should be used and when not?
Thanks for clarifying any doubt of mine.
Connection pooling is handled differently in different application scenarios and platforms/languages.
The main consideration is that a database connection is a finite resource, and it costs time to establish it.
Connections are finite because most database servers impose a maximum limit on the number of concurrent connections (this may be part of your licensing terms). If your application uses more connections than the database allows, it may start rejecting connections (leading to error handling requirements in the client app), or making connections wait (leading to poor response times). By configuring a connection pool, the client application can manage these scenarios in a central place.
Secondly, managing connections is a bit of a pain - there are lots of different error handling scenarios, configuration settings etc.; it's a good idea to centralize this. You can, of course, do that without a connection pool.
Thirdly, connections are "expensive" resources - they take time to establish. Most web pages require several database queries; if each query spends just 1 tenth of a second creating a database connection, you're very quickly spending noticable time waiting for database connections. By using a connection pool, you avoid this overhead.
In .Net, connection pooling is handled automatically, not directly by your application.
All you need to do is open, use and close your connections normally and it will take care of the rest.
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql-server-connection-pooling
If you're talking about a different platform, the mechanics are different, although the purpose is the same.
In all cases, it's time consuming to open and close connections to the DB server, so something between your application and the DB server (typically the database driver or some sort of middle-ware) will maintain a pool of open connections and create, kill and recycle them as necessary.
Pooling keeps the connections open and cuts down on the overhead of opening one for each request.
Also could you give me a scenario where database pooling should be used and when not?
Connection pooling useful in any application that uses the same database connection multiple times within the lifetime of the connection pool.
There would actually be a very slight decrease in performance if you had an application that used a single connection once, then didn't use it again until the connection pool had timed out and recycled. This is extremely uncommon in production applications.
What I fail to understand is how that would help if let's say I have a server that connects to the database and keeps the connection open.
If you have an application that opens a connection and leaves it open, then theoretically pooling would not help. Practically speaking, it's good to periodically kill and recreate various resources like connections, handles, sockets, etc. because software isn't perfect and some code contains resource leaks.
I'm just guessing, but suspect that you're concern is premature optimization. Unless you have done actual testing and determined that connection pooling is a problem, I wouldn't be too concerned with it. Connection pools are generally self-maintaining and almost always improve performance.
Hypothetical scenario:
I have a database server that has significantly more RAM/CPU than could possibly be used in its current system. Connecting an application server to it, would I get better preformance using pooling to use multiple connections that each have smaller executions, or a single connection with a larger execution?
More importantly, why? I'm having trouble finding any reference material to pull me one way or the other.
I always vote for connection pooling for a couple of reasons.
the pool layer will deal with failures and grabbing a working connection when you need it
you can service multiple requests concurrently by using different connections at the same time. a single connection will often block and queue up requests to the db
establishing a connection to a db is expensive - pools can do this up front and in the background as needed
There's also a handy discussion in this answer.
I have a web application that uses Node.js & a connection to an Oracle Database.
Currently my architecture between Node and the DB uses one connection, that stays open. The issue is that some queries take a long time to return, thus blocking subsequent queries until the first returns.
If I open a new connection on each request this does not happen, and the subsequent queries will return before the first (long) one.
The question is what is best practice? Does each request merit a new connection to the database to be closed on callback, should I prioritize queries that I know to take significant time with their own connection, Or is a single connection correct?
Many thanks in advance for your thoughts.
You could use generic-pool module, which is generic resource pool to reuse expensive resources such as database connections
General idea is that you create a connection pool with certain amount of connections (10 by default).
Connections are reused, they will be kept for a certain max idle time (30 seconds by default).
I use this module for Oracle database in production, and found no issues so far.
I have come across an issue that seems to be somehow connected to a web server configuration, and resulting in queries randomly taking a long time to execute. The application is created using old plain Classic ASP and ADODB Connection is used.
The scenario goes as follows:
there is a single connection opened in a script at the beginning of processing each HTTP request
this connection is used to execute a query against a SQL Server, that resides on a separate box. conn.Execute is used. Connection is NOT closed afterwards
there are usually a few to a few dozens of conn.Execute in a single ASP page
All has been working well until recently, when some of the conn.Execute started to take much longer to execute, totally on random.
the difference is e.g. 15ms normal execution time vs. 2000ms long execution time
on the SQL Server side, Profiler does not show longer query execution times, so there must be something blocking the conn.Execute request
When a proper practice of closing a connection after each conn.Execute has been implemented, the issue goes away. However, as I have stated before, all has been working flawlessly until recently. This web app is a fairly large one and rewriting it to close and reopen connections properly will take some time. And I need a short-term solution.
My guess is that it could have something to do with the connection pool size, however this is not ADO.NET, therefore I am not sure, whether a connection pool issue should be taken into the consideration at all. On the SQL Server side, there is no limit on the number of concurrent connections to the server.
I need some hints. Brainstorming possible ideas.
Could be related to delays resolving the hostname in the connection string via DNS - have you tried putting an IP address in the connection string instead of the hostname?
What is it?
How do I implement connection pooling with MS SQL?
What are the performance ramifications when
Executing many queries one-after-the other (i.e. using a loop with 30K+ iterations calling a stored procedure)?
Executing a few queries that take a long time (10+ min)?
Are there any best practices?
Connection pooling is a mechanism to re-use connections, as establishing a new connection is slow.
If you use an MSSQL connection string and System.Data.SqlClient then you're already using it - in .Net this stuff is under the hood most of the time.
A loop of 30k iterations might be better as a server side cursor (look up T-SQL cursor statements), depending on what you're doing with each step outside of the sproc.
Long queries are fine - but be careful calling them from web pages as Asp.Net isn't really optimised for long waits and some connections will cut out.
A little more info on the connection pooling thing... you're using it already with SqlClient, but only if your connection string is identical for each new connection you open. My understanding is that the framework will pool connections automatically when it can, but if the connection string varies even slightly from one connection to the next, then the new connection won't come from the pool - it gets created anew (so it's more expensive).
You can use the Performance Monitor app with XP/Vista to watch SQL connections and you'll see pretty quickly whether or not pooling is being used. Look under the ".NET CLR Data" category" in Performance Monitor.
I second Keith; if you're calling a stored procedure 30,000 times, you have far bigger problems than connection pooling.
Your question was also partially answered by this thread. A search would have revealed this.. The definition of Connection Pooling, of which a Google would have answered with the first hit being this..
Which would leave just the best practices, which I think would have been a good question :)
+1 to Keith's Answer. He has hit the nail right on the head.
Just a polite reminder from the FAQ:
You've searched the internet before
asking your question, and you come to
us armed with research and information
about your question ... right?