What is the optimal database connection strategy? - sql-server

I have an ASP.NET MVC website which runs a number of queries for each page. Should I open a single connection, or open and close a connection for each query?

It really doesn't matter. When you use ADO.NET (which includes LINQ to SQL, NHibernate and any of the other ORMs), the library employs connection pooling. You can "close" and "reopen" a logical connection a dozen times but the same physical connection will remain open the whole time. So don't concern yourself too much with whether or not the connection is open or closed.
Instead, you should be trying to limit the number of queries you have to run per page, because every round-trip incurs a significant overhead. If you're displaying the same data on every page, cache the results, and set up a cache dependency or expiration if it changes infrequently. Also try to re-use query data by using appropriate joins and/or eager loading (if you're using an ORM that lazy-loads).
Even if the data will always be completely different on every page load, you'll get better performance by using a single stored procedure that returns multiple result sets, than you would by running each query separately.
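Roughly, such a call looks like the sketch below in ADO.NET; the stored procedure name GetPageData and the columns read are made up for illustration, and SqlDataReader.NextResult() moves between the result sets returned by the single round-trip:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static (List<string> Headlines, List<string> Categories) LoadPage(string connectionString)
{
    var headlines = new List<string>();
    var categories = new List<string>();

    using (var cn = new SqlConnection(connectionString))
    using (var cm = new SqlCommand("GetPageData", cn) { CommandType = CommandType.StoredProcedure })
    {
        cn.Open();
        using (var rd = cm.ExecuteReader())
        {
            while (rd.Read())                  // first result set
                headlines.Add(rd.GetString(0));

            if (rd.NextResult())               // advance to the second result set
                while (rd.Read())
                    categories.Add(rd.GetString(0));
        }
    }
    return (headlines, categories);
}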
Bottom line: Forget about the connection strategy and start worrying about the query strategy. Any more than 3-5 queries per page and you're liable to run into serious scale issues.

If you are running multiple queries on a page in regular ADO.NET, then they are run in sequence and connection pooling is going to mean it doesn't matter. Best practice is to open connections on demand and close them immediately - even for multiple queries in the same page. Connection pooling makes this fairly efficient.
When you are using multiple queries, your performance could improve significantly by opening multiple connections simultaneously and using asynchronous ADO.NET, ensuring that all the requests run at the same time on multiple threads. In this case, you need a connection for each query, but the overall elapsed time will be reduced.
There is also the potential to use MARS (Multiple Active Result Sets) on a single connection, but I'm not a big proponent of that, and it's a lot more limited in functionality.
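As a minimal sketch of that idea using the Task-based async API in System.Data.SqlClient (the connection string and the two COUNT queries are placeholders), each query runs on its own pooled connection and both are in flight at the same time:

using System.Data.SqlClient;
using System.Threading.Tasks;

public static class ParallelQueries
{
    public static async Task<(int Orders, int Customers)> LoadCountsAsync(string connectionString)
    {
        async Task<int> CountAsync(string sql)
        {
            using (var cn = new SqlConnection(connectionString))
            using (var cm = new SqlCommand(sql, cn))
            {
                await cn.OpenAsync();
                return (int)await cm.ExecuteScalarAsync();
            }
        }

        // Both queries start before either finishes; each grabs its own connection from the pool.
        Task<int> orders = CountAsync("SELECT COUNT(*) FROM Orders");
        Task<int> customers = CountAsync("SELECT COUNT(*) FROM Customers");
        await Task.WhenAll(orders, customers);
        return (orders.Result, customers.Result);
    }
}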

If you are fairly sure that the transactions will finish quickly then use a single connection.
Be sure to check all return results and wrap everything in exception handling where possible.

To avoid unnecessary overhead it's better to use a single connection. But be sure to run the queries in a "try" block and close the connections in a "finally" block to be sure not to leave connections hanging.
try-finally
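A minimal ADO.NET sketch of that pattern (the connection string and query are placeholders); in practice a using block compiles down to the same cleanup and is usually preferred:

using System.Data.SqlClient;

public static int CountOrders(string connectionString)
{
    SqlConnection cn = new SqlConnection(connectionString);
    try
    {
        cn.Open();
        using (var cm = new SqlCommand("SELECT COUNT(*) FROM Orders", cn))
        {
            return (int)cm.ExecuteScalar();
        }
    }
    finally
    {
        cn.Close();   // always returns the physical connection to the pool, even if the query throws
    }
}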

Unit of Work? This is a great strategy to employ. NHibernate and many others use this pattern.
Give it a Google for specific details relevant to your needs.
Jim

Related

Codeigniter multiple database connection slows my pages. Do I have to close the connections? Where?

I'm using CodeIgniter 3.0 for a web app.
I have 2 databases on the same server (I guess), but with 2 different hostnames.
I don't use the second database, except for one kind of user.
When this kind of user connects to the web app, the pages take a very long time to display, but for my other users there is no problem, so I guess it's a multiple-database connection issue.
In my database.php file, I have written the 2 arrays with the database information.
In my model files using the second database, I just write something like this:
$db1 = $this->load->database('db1', TRUE);
...
// I do my query as usual
$db1->...
...
return $db1->get();
I do not close the connection.
Questions:
1) In each page, I use several functions using the second database. Is this issue due to these multiple connections to my second database?
2) Do I have to close the connection in my model's functions, just before the return? Or is it better to connect and disconnect in the controller?
3) I read about the CI reconnect function, but how do I use it well? To reconnect, I have to connect first, but where do I connect first?
4) Or do you think the issue is due to something else, like some bad SQL queries?
Let's go through your questions one at a time and I'll comment.
1) In each page, I use several functions using the second database. Is
this issue due to these multiple connections to my second database?
I say no because I have used the same multiple DB approach many times and have never seen a performance hit. Besides, if a performance hit was a common problem there would be lots of online complaints and people looking for solutions. I've seen none. (And I spend way too much time helping people with CodeIgniter.)
2) Do I have to close the connection in my function's model, just
before the return? Or is it better to connect and disconnect in the
controller?
If closing the connection did help, then the answer to when to do it depends on the overall structure of the logic. For instance, if a controller is using several methods from the same model to create a page, then close the connection in the controller. On the other hand, if only one model method is used to create a given page, then close the connection in the model.
What you don't want to do is repeatedly open and close a DB connection while building a page.
3) I saw about the CI reconnect function, but how to use it well? To
reconnect, I have to connect first, but where to connect first?
reconnect() is only useful when the database server drops the connection due to it being idle for too long. You'll know you need to use reconnect() when you start getting "no database connection" or "cannot connect to database" errors.
4) Or do you think the issue is due to something else, like some bad SQL queries?
Because the other approaches you ask about won't help, this is the strongest possibility. Again, my reasoning is that I've never had this problem using multiple database connections.
I suggest you do some performance profiling on the queries to the second database. Check out the following bits of documentation for help with that.
Profiling Your Application
Benchmarking Class
There are lots of reasons for slow page loads and the use of the second DB might just be a coincidence.
About Closing Connections
The question is, "If I do not close the DB connection by myself, CI will do it for me, but when?".
The answer is found in the PHP manual, "Open non-persistent MySQL connections and result sets are automatically destroyed when a PHP script finishes its execution." That quote is from the mysqli documentation, but, to the best of my knowledge, it is true for all of PHP's database extensions, i.e. Oracle, Mssql, PDO, etc.
In short, DB connection closing is baked into PHP and happens when the script is done. In CI, the script is done very shortly after the Controller returns. (Examine the end of /system/core/Codeigniter.php if you want to see what happens when the controller returns.) In effect, a Controller returning is, more or less, another way of saying "after the page is loaded".
Unless you happen to be using persistent connections (usually a bad idea) you seldom need to explicitly close DB connections. One reason to close them yourself is when a lot (really a lot) of time is required to process the query results. Manually closing connections will help ensure the DB server won't reach its connection limit when the web server is under heavy usage.
To determine what "really a lot" means you have to consider multiple factors, i.e. how many connections the database server allows, how the time-to-process compares to the DB idle connection dropout duration, and the amount of traffic the site needs to handle.
There are likely other considerations too. I'm not a database performance tuning expert.

Should I keep this "GlobalConnection" or create connection for every query?

I have inherited a legacy Delphi application that uses ADO to connect to SQL Server.
The application has a notion of a "Global Connection" -- that is a single connection that it opens at the start, and then keeps open all throughout the running of the application (which can be days, weeks, or longer....)
So my question is this: Should I keep this way of doing things or should I switch to a "connect-query-disconnect" mode of doing things? Does it matter?
Switching would be a non-trivial task, but I'll do it if it means better performance, data management, etc.
Well, it depends on what you're expecting to get out of it, and what kind of application it is.
There's nothing in particular wrong with using a single long-running connection, as long as the application can gracefully handle disconnections and recover or log/notify when it can't reconnect.
The problem with a connect-query-disconnect setup is that you're adding the overhead of connecting and disconnecting on every query. That's going to slow things down, and in an interactive GUI application users may notice the additional overhead. You also have to make sure that authorization is transparently handled if it isn't already.
At the same time, there may be interactive performance gains to be had if you can push all the queries off onto background threads and asynchronously update the GUI. If contention appears because the queries are serialized, you can migrate to a connection-pool system fairly readily as well and improve things even more. This has a fairly high complexity cost to it though, so now you're looking to balancing what the gains are compared to the work involved.
Right now, my ultimate response is "if it ain't broke, don't fix it." Changes along the lines you propose are a lot of work -- how much do the users of this application stand to gain? Are there other problems to solve that might benefit them more?
Edit: Okay, so it's broke. Well, slow at least, which is all the same to me. If you've ruled out problems with the SQL Server itself, and the queries are performing as fast as they can (i.e. DB schema is sane, the right indexes are available, queries aren't completely braindead, server has enough RAM and fast enough I/O, network isn't flaky, etc.), then yes, it's time to find ways to improve the performance of the app itself.
Simply moving to a connect-query-disconnect is going to make things worse, and the more queries you're issuing the bigger the drop off is going to be. It sounds like you're going to need to rearchitect the app so that you can run fewer queries, run them in the background, cache more aggressively on the client, or some combination of all 3.
Don't forget that making the clients perform better means server-side performance gets more important, since the server will probably be handling a higher load if clients start making multiple connections and issuing multiple queries in parallel.
As Mr. Frazier said before, the one global connection is not bad per se.
If you intend to change it, first determine WHAT the problem is. Let's look at some scenarios:
1
Some screens (IOW: a set of 1..n forms operating on a business entity) are slow. Possible causes:
Insufficient filtering, resulting in a plethora of records being pulled from the database unnecessarily.
The number of records is OK, but it takes too long to render them. Solution: faster controls or intelligent rendering (e.g. virtual list views).
Too many queries each time you open a screen. Possible solutions: use TClientDataSets (or any in-memory dataset) to hold infrequently modified lookup tables. A more sophisticated cache for larger tables, or opening those datasets in other threads, can improve response times.
Scrolling on datasets with bound controls can be slow (just a reminder, because those little details are easily forgotten).
2
The whole app simply slows down. Checklist:
Are the network cards OK? A few malfunctioning network cards can wreak havoc even on well-structured networks, as they create unnecessary noise on the line.
[MSSQL DBA HAT ON] The next in the line of attack is SQL Server. Ask the DBA to trace blocks and deadlocks. Log slow queries and work on speeding them up. This relates directly to #1.1 and #1.3.
Detect whether some naive developer has done SELECTs inside transactions. In read committed isolation it's just overhead, as it creates more network traffic. Open the query, retrieve the data, and close the dataset.
Review the database schema, if you can.
Are any data-bound operations on a bulk of records (say, re-marking the price of some/most/all products) being done in the app? Make a stored procedure or refactor the operation into a query; it'll be much faster and will reduce the load on the entire server.
Extensive operations on a group of records? Learn how to do those operations at once on the server instead of record by record. See an examination of the most-used alternatives in MSSQL MVP Erland Sommarskog's article on arrays and lists in MSSQL.
Beware of queries with a WHERE clause like: WHERE SomeFunction(table1.blabla) = #SomeParam. Most of the time, those will not use an index, forcing the entire table to be read to find the desired data. If it is a big table.... An index on a persisted computed column can work miracles... [MSSQL HAT OFF]
That's what I can think of without a little more detail... ;-)
Database connections are time-consuming resources to create, and the rule of thumb should be to create as few as possible and reuse as much as possible. That's why some other technologies have database connection pools, which are typically established at application/service startup and then kept as long as possible and shared among threads.
From your comment, the application has performance issues, but it's difficult without more details to make any recommendation.
You should try to nail down what is slow: are all queries slow, or just some specific ones?
If just some specific ones, is there some correlation?
My 2 cents.

Why use Singleton to manage db connection?

I know this has been asked before here, there and everywhere, but I can't get a clear explanation, so I'm going to pitch it again. What is all of the fuss about using a singleton to control the DB connection in your web app? Some like it, some hate it; I don't understand it. From what I've read, "it's to ensure that there is always only one active connection to your DB". I mean, why is that a good thing? One active DB connection on a data-driven web app processing multiple requests per second spells trouble, doesn't it? For whatever reason nobody can properly explain this. I've been all over the web. I know I'm thick.
Assuming Java here, but is relevant to most other technologies as well.
I'm not sure whether you've confused the use of a plain singleton with a service locator. Both of them are design patterns. The service locator pattern is used by applications to ensure that there is a single class entrusted with the responsibility of obtaining and providing access to databases, files, JMS queues, etc.
Most service locators are implemented as singletons, since there is no need for multiple service locators to do the same job. Besides, it is useful to cache information obtained from the first lookup that can be later used by other clients of the service locator.
By the way, the argument about
"it's to ensure that there is always
only one active connection to your DB"
is false and misleading. It is quite possible that the connection can be closed/reclaimed if left inactive for quite a long period of time, so caching a connection to the database is frowned upon. There is one deviation from this argument; "re-using" the connection obtained from the connection pool is encouraged as long as you do so within the same context, i.e. within the same HTTP request or user request (whichever is applicable). This is done, obviously, from the point of view of performance, since establishing new connections can prove to be an expensive operation.
High-performance (or even medium-performance) web apps use database connection pooling, so one DB connection can be shared among many web requests. The singleton is usually the object which manages this pool. I think the motivation for using a singleton is to idiot-proof against maintenance programmers that might otherwise instantiate many of these objects needlessly.
"it's to ensure that there is always only one active connection to your DB." I think that would be better stated as to ensure each CLIENT has only one active connection to your DB. The reason why this is incredibly important is because you want to prevent deadlocks. If I have TWO open database connections (as a client) I might be updating on one connection, then I might try to update the same row in another connection. This will a deadlock which the database cannot detect. So, the idea of the singleton is basically to make sure that there is ONE object who is charge of handing out database connections to each client. Basically. You don't HAVE to have a singleton for this, but most people will tell you it just makes sense that the system only has one.
You're right--usually this isn't what you want.
However, there are plenty of cases where you need to throttle yourself down to a single connection. By serializing your access to the database through a singleton, you can address other issues or constraints like load, bandwidth, etc.
I've done something similar in the past for a bulk processing app. Instead, though, I used a semaphore to synchronize access to the database so I could allow n concurrent db operations.
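A rough C# equivalent of that semaphore idea (the limit of 4, the class name, and the query text are illustrative, not from the original app): a SemaphoreSlim caps concurrent database operations at n instead of forcing everything through a single connection.

using System.Data.SqlClient;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledDb
{
    // Allow at most 4 simultaneous database operations.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(4);

    public static async Task<object> ExecuteScalarAsync(string connectionString, string sql)
    {
        await Gate.WaitAsync();
        try
        {
            using (var cn = new SqlConnection(connectionString))
            using (var cm = new SqlCommand(sql, cn))
            {
                await cn.OpenAsync();
                return await cm.ExecuteScalarAsync();
            }
        }
        finally
        {
            Gate.Release();   // let the next queued operation proceed
        }
    }
}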
One might want to use a singleton due to database server constraints, for example, a server might limit the number of connections.
My main conscious reason is that you know which connections can be managed/closed, etc.; it just makes things a bit more organised when you don't have unnecessary, redundant connections.
I don't think it's a simple answer. For instance on ASP.NET, the platform implements connection pooling by default, so it will automatically adjust a "pool" of connections and re-use them so you're not constantly creating and destroying expensive objects.
However, let's say you were writing a data collection application that monitored 200 separate input sources. Every time one of those inputs changed, you fire off a thread that records the event to the database. I would say that could be a bad design if there's a chance that even a fraction of those could fire off at the same time. Suddenly having 20 or 40 active database connections is inefficient. It might be better to queue the updates, and as long as there are updates left in the queue, a singleton connection picks them off the queue and executes them on the server. It's more efficient because you only have to negotiate the connection and authentication once. Once there's no activity for a while you could choose to close down the connection. This kind of behavior would be hard to implement without a central resource manager like a singleton.
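A sketch of that queued-update design in C# (the class name, connection string, and the simplification of queueing raw SQL text are all made up for illustration): a singleton owns the queue and a single long-lived worker drains it over one connection.

using System;
using System.Collections.Concurrent;
using System.Data.SqlClient;
using System.Threading.Tasks;

public sealed class DbWriteQueue
{
    private static readonly Lazy<DbWriteQueue> LazyInstance =
        new Lazy<DbWriteQueue>(() => new DbWriteQueue("Server=.;Database=Events;Integrated Security=true"));

    public static DbWriteQueue Instance => LazyInstance.Value;

    private readonly BlockingCollection<string> pending = new BlockingCollection<string>();

    private DbWriteQueue(string connectionString)
    {
        // One long-lived worker drains the queue over a single connection,
        // so the connect/authenticate cost is paid once rather than per event.
        Task.Factory.StartNew(() =>
        {
            using (var cn = new SqlConnection(connectionString))
            {
                cn.Open();
                foreach (string sql in pending.GetConsumingEnumerable())
                {
                    using (var cm = new SqlCommand(sql, cn))
                        cm.ExecuteNonQuery();
                }
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Callers enqueue work instead of opening their own connections.
    public void Enqueue(string sql) => pending.Add(sql);
}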
"only one active connection" is a very narrow statement for illustration. It could just as well be a singleton managing a pool of connection. The point of a singleton for database connections is that you don't want every consumer making it's own connection or set of connections.
I think you might want to be more specific about, "using a singleton to control the db connection in your web app." Ideally, a java.sql.Connection object will not be thread safe, but your javax.sql.DataSource may want to pool connections, so you should go to a single instance of it to share the pooling.
You are more looking for one connection per request, not one connection for the entire application. You can still control access to it through a singleton, though (storing the connection in the HttpContext.Items collection).
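For example, a minimal sketch of stashing a per-request connection in HttpContext.Items (classic ASP.NET; the item key, class name, and connection string are made up for illustration):

using System.Data.SqlClient;
using System.Web;

public static class RequestConnection
{
    private const string Key = "__dbConnection";

    // One connection per HTTP request, created lazily and reused by every query in that request.
    public static SqlConnection Current
    {
        get
        {
            var items = HttpContext.Current.Items;
            if (items[Key] == null)
            {
                var cn = new SqlConnection("Server=.;Database=App;Integrated Security=true");
                cn.Open();
                items[Key] = cn;
            }
            return (SqlConnection)items[Key];
        }
    }

    // Call from Application_EndRequest so the connection goes back to the pool.
    public static void DisposeCurrent()
    {
        (HttpContext.Current.Items[Key] as SqlConnection)?.Dispose();
    }
}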
It guarantees that each client using your site only gets one connection to the db.
You really do not want a new connection being made every time a user does an action that will create a DB query. Not only for performance reasons with the connection handshaking involved, but also to decrease the load on the DB server.
DB connections are a precious commodity, and this technique helps minimize the amount used at any given time.

Opening the database connection once or on every database action?

I'm currently creating a web portal with ASP.NET which relies heavily on database usage. Basically, every (well, almost every :P) GET request from any user will result in a query to the database from the web server.
Now, I'm really new at this, and I'm very concerned about performance. Due to my lack of experience in this area, I don't really know what to expect.
My question is: using ADO.NET, would it be a smarter choice to just leave a static connection open from the web server to the database, and then check the integrity of this connection server-side before each query to the database? Or would I be better off opening the connection before each query and then closing it afterwards?
In my head the first option would be the better one, as you save the handshaking time before each query, and you save memory on both the database and the server side since you only have one connection. But are there any downsides to this approach? Could two queries sent at the same time potentially destroy each other's integrity or mix up the returned datasets?
I've tried searching everywhere in here and on the web to find some best practices about this, but with no luck. The closest I got was this: is it safe to keep database connections open for long time, but that seems to be more fitting for distributed systems where you have more than one user of the database, whereas I've only got my web server.
You're way early to be worrying about performance.
Anyhow, connections are pooled by the framework. You should be opening them, using them, and disposing of them ASAP.
Something like...
public object Load()
{
    using (SqlConnection cn = new SqlConnection(connectionString))
    using (SqlCommand cm = new SqlCommand(commandString, cn))
    {
        cn.Open();
        return cm.ExecuteScalar();
    }
}
It's better to let ADO.NET handle the connection pooling. It'll persist the connection if it thinks it needs to, but don't use a static connection object. That just smells. It would be better to pass the connection object around to methods that need it, and create the connection in a using block.
You should always close your connection after finishing your DB interaction. ADO.NET has connection pooling which will take care of efficient connection reuse. Whenever you open 2nd, 3rd and subsequent connections - they'll be taken from a pool with almost no overhead.
Hope this helps.
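If you do want to tune the pool, it is controlled from the connection string; the keywords below are standard System.Data.SqlClient ones, but the server name and values are only an example.

using System.Data.SqlClient;

// Pooling is on by default; these keywords just make the behaviour explicit.
var connectionString = "Server=.;Database=Portal;Integrated Security=true;" +
                       "Pooling=true;Min Pool Size=0;Max Pool Size=100";

using (var cn = new SqlConnection(connectionString))
{
    cn.Open();   // the first open pays the handshake; later opens reuse a pooled connection
    // ... run commands as usual ...
}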
I'd be thinking more about caching than advanced connection pooling. Every get requires a database hit?
If it's a portal, you've got common content and user-specific content. Using the Cache, you can store the common items, and with a mangled key (including the user's id) you can store user-specific items.
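As a rough illustration of that (the keys, durations, and the LoadXxxFromDatabase helpers are hypothetical stand-ins, not real data-access code):

using System;
using System.Web;
using System.Web.Caching;

public static class PortalCache
{
    public static object GetMenu()
    {
        // Common content shared by every user goes under a fixed key.
        var menu = HttpRuntime.Cache["menu"];
        if (menu == null)
        {
            menu = LoadMenuFromDatabase();   // hypothetical data-access helper
            HttpRuntime.Cache.Insert("menu", menu, null,
                DateTime.UtcNow.AddMinutes(30), Cache.NoSlidingExpiration);
        }
        return menu;
    }

    public static object GetDashboard(int userId)
    {
        // User-specific content goes under a key "mangled" with the user's id.
        string key = "dashboard:" + userId;
        var dashboard = HttpRuntime.Cache[key];
        if (dashboard == null)
        {
            dashboard = LoadDashboardFromDatabase(userId);   // hypothetical data-access helper
            HttpRuntime.Cache.Insert(key, dashboard, null,
                DateTime.UtcNow.AddMinutes(5), Cache.NoSlidingExpiration);
        }
        return dashboard;
    }

    private static object LoadMenuFromDatabase() => new object();                // stand-in
    private static object LoadDashboardFromDatabase(int userId) => new object(); // stand-in
}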
ADO.NET does connection pooling. When you call close on the connection object it will keep the connection in the pool making the next connection much faster.
Your initial hunch is correct. What you need is database connection pooling.
You definitely don't want to open a connection for every database call; that will result in extremely poor performance very quickly. Establishing a database connection is very expensive.
Instead what you should be using is a connection pool. The pool will manage your connections, and try to re-use existing connections when possible.
I don't know your platform, but look into Connection Pooling - there must be a library or utility available (either in the base system or as an add-on, or supplied with the database drivers) that will provide the means to pool several active connections to the database, which are ready and raring to be used when you obtain one from the pool.
To be honest, I would expect the pooling to occur by default in any database abstraction library (with an available option to disable it). It appears that ADO.NET does this.
Really the first question to ask is why are you very concerned about performance? What is your expected workload? Have you tried it yet?
But in general, yes, it's smarter to have an open connection that you keep around for a while than to re-open a database connection each time; depending on the kind of connection, network issues, and phase of the moon, it can take a good part of a second or more to make an initial connection; if your workload is such that you expect more than a GET every five seconds or so, you'll be happier with a standing connection.

Why is it bad practice to make multiple database connections in one request?

A discussion about Singletons in PHP has me thinking about this issue more and more. Most people instruct that you shouldn't make a bunch of DB connections in one request, and I'm just curious as to what your reasoning is. My first thought is the expense to your script of making that many requests to the DB, but then I counter myself with the question: wouldn't multiple connections make concurrent querying more efficient?
How about some answers (with evidence, folks) from some people in the know?
Database connections are a limited resource. Some DBs have a very low connection limit, and wasting connections is a major problem. By consuming many connections, you may be blocking others from using the database.
Additionally, throwing a ton of extra connections at the DB doesn't help anything unless there are resources on the DB server sitting idle. If you've got 8 cores and only one is being used to satisfy a query, then sure, making another connection might help. More likely, though, you are already using all the available cores. You're also likely hitting the same hard drive for every DB request, and adding additional lock contention.
If your DB has anything resembling high utilization, adding extra connections won't help. That'd be like spawning extra threads in an application with the blind hope that the extra concurrency will make processing faster. It might in some certain circumstances, but in other cases it'll just slow you down as you thrash the hard drive, waste time task-switching, and introduce synchronization overhead.
It is the cost of setting up the connection, transferring the data and then tearing it down. It will eat up your performance.
Evidence is harder to come by but consider the following...
Let's say it takes x microseconds to make a connection.
Now you want to make several requests and get data back and forth. Let's say that the difference in transport time is negligible between one connection and many (just for the sake of argument).
Now let's say it takes y microseconds to close the connection.
Opening one connection will take x+y microseconds of overhead. Opening many will take n * (x+y). That will delay your execution.
Setting up a DB connection is usually quite heavy. A lot of things are going on backstage (DNS resolution/TCP connection/Handshake/Authentication/Actual Query).
I once had an issue with some weird DNS configuration that made every TCP connection take a few seconds to come up. My login procedure (because of a complex architecture) took 3 different DB connections to complete. With that issue, it was taking forever to log in. We then refactored the code to make it go through one connection only.
We access Informix from .NET and use multiple connections. Unless we're starting a transaction on each connection, it is often handled in the connection pool. I know that's very brand-specific, but most(?) database systems' client access will pool connections to the best of its ability.
As an aside, we did have a problem with connection count because of cross-database connections. Informix supports synonyms, so we synonymed the common offenders and the multiple connections were handled server-side, saving a lot in transfer time, connection creation overhead, and (the real crux of our situation) license fees.
I would assume that it is because your requests are not being sent asynchronously, since your requests are done iteratively on the server, blocking each time, you have to pay for the overhead of creating a connection each time, when you only have to do it once...
In Flex, all web service calls are automatically made asynchronously, so it is common to see multiple connections, or queued-up requests on the same connection.
Asynchronous requests mitigate the connection cost through faster request/response time... because you cannot easily achieve this in PHP without some threading, the performance hit is greater than simply reusing the same connection.
that's my 2 cents...
