Why is it bad practice to make multiple database connections in one request? - database

A discussion about Singletons in PHP has me thinking about this issue more and more. Most people instruct that you shouldn't make a bunch of DB connections in one request, and I'm just curious as to what your reasoning is. My first thought is the expense to your script of making that many requests to the DB, but then I counter myself with the question: wouldn't multiple connections make concurrent querying more efficient?
How about some answers (with evidence, folks) from some people in the know?

Database connections are a limited resource. Some DBs have a very low connection limit, and wasting connections is a major problem. By consuming many connections, you may be blocking others for using the database.
Additionally, throwing a ton of extra connections at the DB doesn't help anything unless there are resources on the DB server sitting idle. If you've got 8 cores and only one is being used to satisfy a query, then sure, making another connection might help. More likely, though, you are already using all the available cores. You're also likely hitting the same harddrive for every DB request, and adding additional lock contention.
If your DB has anything resembling high utilization, adding extra connections won't help. That'd be like spawning extra threads in an application with the blind hope that the extra concurrency will make processing faster. It might in some certain circumstances, but in other cases it'll just slow you down as you thrash the hard drive, waste time task-switching, and introduce synchronization overhead.

It is the cost of setting up the connection, transferring the data and then tearing it down. It will eat up your performance.
Evidence is harder to come by but consider the following...
Let's say it takes x microseconds to make a connection.
Now you want to make several requests and get data back and forth. Let's say that the difference in transport time is negligable between one connection and many (just ofr the sake of argument).
Now let's say it takes y microseconds to close the connection.
Opening one connection will take x+y microseconds of overhead. Opening many will take n * (x+y). That will delay your execution.

Setting up a DB connection is usually quite heavy. A lot of things are going on backstage (DNS resolution/TCP connection/Handshake/Authentication/Actual Query).
I've had an issue once with some weird DNS configuration that made every TCP connection took a few seconds before going up. My login procedure (because of a complex architecture) took 3 different DB connections to complete. With that issue, it was taking forever to log-in. We then refactored the code to make it go through one connection only.

We access Informix from .NET and use multiple connections. Unless we're starting a transaction on each connection, it often is handled in the connection pool. I know that's very brand-specific, but most(?) database systems' cilent access will pool connections to the best of its ability.
As an aside, we did have a problem with connection count because of cross-database connections. Informix supports synonyms, so we synonymed the common offenders and the multiple connections were handled server-side, saving a lot in transfer time, connection creation overhead, and (the real crux of our situtation) license fees.

I would assume that it is because your requests are not being sent asynchronously, since your requests are done iteratively on the server, blocking each time, you have to pay for the overhead of creating a connection each time, when you only have to do it once...
In Flex, all web service calls are automatically called asynchronously, so you it is common to see multiple connections, or queued up requests on the same connection.
Asynchronous requests mitigate the connection cost through faster request / response time...because you cannot easily achieve this in PHP without out some threading, then the performance hit is greater then simply reusing the same connection.
that's my 2 cents...

Related

What happens when bandwidth is exceeded with a Microsoft SQL Server connection?

The application used by a group of 100+ users was made with VB6 and RDO. A replacement is coming, but the old one is still maintained. Users moved to a different building across the street and problems began. My opinion regarding the problem has been bandwidth, but I've had to argue with others who say it's database. Users regularly experience network slowness using the application, but also workstation tasks in general. The application moves large audio files and indexes them on occasion as well as others. Occasionally the database becomes hung. We have many top end, robust SQL Servers, so it is not a server problem. What I figured out is, a transaction is begun on a connection, but fails to complete properly because of a communication error. Updates from other connections become blocked, they continue stacking up, and users are down half a day. What I've begun doing the moment I'm told of a problem, after verifying the database is hung, is set the database to single user then back to multiuser to clear connections. They must all restart their applications. Today I found out there is a bandwidth limit at their new location which they regularly max out. I think in the old location there was a big pipe servicing many people, but now they are on a small pipe servicing a small number of people, which is also less tolerant of momentary high bandwidth demands.
What I want to know is exactly what happens to packets, both coming and going, when a bandwidth limit is reached. Also I want to know what happens in SQL Server communication. Do some packets get dropped? Do they start arriving more out of sequence? Do timing problems occur?
I plan to start controlling such things as file moves through the application. But I also want to know what configurations are usually present on network nodes regarding transient high demand.
This is a very broad question. Networking is very key (especially in Availability Groups or any sort of mirroring set up) to good performance. When transactions complete on the SQL server, they are then placed in the output buffer. The app then needs to 'pick up' that data, clear it's output buffer and continue on. I think (without knowing your configuration) that your apps aren't able to complete the round trip because the network pipe is inundated with requests, so the apps can't get what they need to successfully finish and close out. This causes havoc as the network can't keep up with what the apps and SQL server are trying to do. Then you have a 200 car pileup on a 1 lane highway.
Hindsight being what it is, there should have been extensive testing on the network capacity before everyone moved across the street. Clearly, that didn't happen so you are kind of left to do what you can with what you have. If the company can't get a stable networking connection, the situation may be out of your control. If you're the DBA, I highly recommend you speak to your higher ups and explain to them the consequences of the reduced network capacity. Often times, showing the consequences of inaction can lead to action.
Out of curiosity, is there any way you can analyze what waits are happening when the pileup happens? I'm thinking it will be something along the lines of ASYNC_NETWORK_IO which is usually indicative that SQL is waiting on the app to come back and pick up it's data.

Slow XA transactions in JBoss

We are running jboss 4.2.2 with SQL server 2005 (sqljdbc driver 1.2).
We have recently installed new relic and can see a large bottleneck with our transactions.
Generally for any one web request the bottleneck of sits on one of these:
master..xp_sqljdbc_xa_start
master..xp_sqljdbc_xa_commit
org.jboss.resource.adapter.jdbc.WrapperDataSource.getConnection()
master..xp_sqljdbc_xa_end
Several hundred ms are spent on one of these items (in some cases several seconds). Cumulatively most of the response time is spent on these items.
I'm trying to indentify whether its any of the following:
Will moving away from XA transactions help?
Is there a larger problem at my database that I dont have visibility over?
Can I upgrade my SQL driver to help with this?
Or is this an indication that there are just a lot of queries, and we should start by looking at our code, and trying to lower the number of queries overall?
XA transactions are necessary if you are performing work against more than one resource in a single transaction, if you need consistency then you need XA. However you are talking in terms of "queries" which might imply that you are mostly doing read-only activities, and so XA may be overkill. Furthermore you don't speak about using multiple databases or other transactional resources so do you really need XA at all?
So first step: understand the requirements, do you need transaction scopes that span several database interactions? If you are just doing a single query then XA is not needed. If you have mixture of activities needing XA and simple queries not needing XA then use two different connections, one with XA and one without - this clarifies your intention. However I would expect XA drivers to use single resource optimisations so that if XA is not needed you don't pay the overhead so I suspect something more is going on here. (disclaimer I don't use JBoss so my intuition is suspect).
So look to see whether your transaction scopes are appropriate, isolation levels are sensible and so on. Are you getting contention because transactions are unreasonably long, for example are transactions held over user think time?
Next those multi-second waits: that implies contention (or some bizarre network issue) The only reason I can think of for an xa_start being slow is that writing a transaction log is taking unreasonably long - are your logs perhaps on some slow network device? Waits for getConnection() might simply imply that your connection pool is too small (or you're holding connections for too long) If xa_commit and xa_end are taking a long time I'd want to know what the resource managers are doing, can you get any info from the database server.
My overall position: If you truly need XA then you will pay some logging and network message overheads, but these should not be costing you hundreds or thousands of millseconds. Most business systems need XA in a small subset of their overall resource accesses typically when updating two otherwise independent systems, and almost never in read-only scenarios - absolute consistency across distributed systems is pretty much meaningless so using XA for queries is almost certainly overkill.

what is the optimal database connection strategy

i have a asp.net mvc website which runs a number of queries for each page. Should i open up a single connection or open and close a connection on each query?
It really doesn't matter. When you use ADO.NET (which includes Linq to SQL, NHibernate and any of the other ORMs), the library employs connection pooling. You can "close" and "reopen" a logical connection a dozen times but the same physical connection will remain open the whole time. So don't concern yourself too much with whether or not the connection is open or closed.
Instead, you should be trying to limit the number of queries you have to run per page, because every round-trip incurs a significant overhead. If you're displaying the same data on every page, cache the results, and set up a cache dependency or expiration if it changes infrequently. Also try to re-use query data by using appropriate joins and/or eager loading (if you're using an ORM that lazy-loads).
Even if the data will always be completely different on every page load, you'll get better performance by using a single stored procedure that returns multiple result sets, than you would by running each query separately.
Bottom line: Forget about the connection strategy and start worrying about the query strategy. Any more than 3-5 queries per page and you're liable to run into serious scale issues.
If you are running multiple queries on a page in regular ADO.NET, then they are run in sequence and connection pooling is going to mean it doesn't matter. Best practice is to open connections on demand and close them immediately - even for multiple queries in the same page. Connection pooling makes this fairly efficient.
When you are using multiple queries, your performance could improve significantly by opening multiple connections simultaneously and use asynchronous ADO, to ensure that all the requests are running at the same time in multiple threads. In this case, you need a connection for each query. But the overall connection time will be reduced.
There is also the potential to use MARS on a single connection, but I'm not a big proponent of that, and it's a lot more limited in functionality.
If you are fairly sure that the transactions will finish quickly then use a single connection.
Be sure to check all return results and wrap everything in exception handlingwhere possible.
To avoid unnecessary overhead it's better to use a single connection. But be sure to run the queries in a "try" block and close the connections in a "finally" block to be sure not to leave connections hanging.
try-finally
unitofwork?? this is a great strategy to employ. nhibernate and many others use this pattern.
give it a google for specific details relevant to your needs..
jim

Why use Singleton to manage db connection?

I know this has been asked before here there and everywhere but i can't get a clear explanation so i'm going to pitch it again. So what is all of the fuss about using a singleton to control the db connection in your web app? Some like it some hate it i don't understand it. From what I've read, "it's to ensure that there is always only one active connection to your DB". I mean why is that a good thing? 1 active DB connection on a data driven web app processing multiple requests per second spells trouble doesn't it? For whatever reason nobody can properly explain this. I've been all over the web. I know i'm thick.
Assuming Java here, but is relevant to most other technologies as well.
I'm not sure whether you've confused the use of a plain singleton with a service locator. Both of them are design patterns. The service locator pattern is used by applications to ensure that there is a single class entrusted with the responsibility of obtaining and providing access to databases, files, JMS queues, etc.
Most service locators are implemented as singletons, since there is no need for multiple service locators to do the same job. Besides, it is useful to cache information obtained from the first lookup that can be later used by other clients of the service locator.
By the way, the argument about
"it's to ensure that there is always
only one active connection to your DB"
is false and misleading. It is quite possible that the connection can be closed/reclaimed if left inactive for quite a long period of time. So caching a connection to the database is frowned upon. There is one deviation from this argument; "re-using" the connection obtained from the connection pool is encouraged as long as you do so with the same context, i.e. within the same HTTP request, or user request (whichever is applicable). This done obviously, from the point of view of performance, since establishing new connections can prove to be an expensive operation.
High-performance (or even medium-performance) web apps use database connection pooling, so one DB connection can be shared among many web requests. The singleton is usually the object which manages this pool. I think the motivation for using a singleton is to idiot-proof against maintenance programmers that might otherwise instantiate many of these objects needlessly.
"it's to ensure that there is always only one active connection to your DB." I think that would be better stated as to ensure each CLIENT has only one active connection to your DB. The reason why this is incredibly important is because you want to prevent deadlocks. If I have TWO open database connections (as a client) I might be updating on one connection, then I might try to update the same row in another connection. This will a deadlock which the database cannot detect. So, the idea of the singleton is basically to make sure that there is ONE object who is charge of handing out database connections to each client. Basically. You don't HAVE to have a singleton for this, but most people will tell you it just makes sense that the system only has one.
You're right--usually this isn't what you want.
However, there are plenty of cases where you need to throttle yourself down to a single connection. By serializing your access to the database through a singleton, you can address other issues or constraints like load, bandwidth, etc.
I've done something similar in the past for a bulk processing app. Instead, though, I used a semaphore to synchronize access to the database so I could allow n concurrent db operations.
One might want to use a singleton due to database server constraints, for example, a server might limit the number of connections.
My main conscious reason is that you know what connections can be managed/closed etc., just makes things a bit more organised when you don't have unnecessary, redundant connections.
I don't think it's a simple answer. For instance on ASP.NET, the platform implements connection pooling by default, so it will automatically adjust a "pool" of connections and re-use them so you're not constantly creating and destroying expensive objects.
However, let's say you were writing a data collection application that monitored 200 separate input sources. Every time one of those inputs changed, you fire off a thread that records the event to the database. I would say that could be a bad design if there's a chance that even a fraction of those could fire off at the same time. Suddenly having 20 or 40 active database connections is inefficient. It might be better to queue the updates, and as long as there are updates left in the queue, a singleton connection picks them off the queue and executes them on the server. It's more efficient because you only have to negotiate the connection and authentication once. Once there's no activity for a while you could choose to close down the connection. This kind of behavior would be hard to implement without a central resource manager like a singleton.
"only one active connection" is a very narrow statement for illustration. It could just as well be a singleton managing a pool of connection. The point of a singleton for database connections is that you don't want every consumer making it's own connection or set of connections.
I think you might want to be more specific about, "using a singleton to control the db connection in your web app." Ideally, a java.sql.Connection object will not be thread safe, but your javax.sql.DataSource may want to pool connections, so you should go to a single instance of it to share the pooling.
you are more looking for one connection per request, not one connection for the entire application. you can still control access to it through a singleton though (storing the connection in the HttpContext.Items collection).
It guarantees that each client using your site only gets one connection to the db.
You really do not want a new connection being made everytime a user does an action that will create a db query. Not only for performance reasons with the connection handshaking involved, but to decrease load on the db server.
DB connections are a precious commodity, and this technique helps minimize the amount used at any given time.

What is the Speed Difference Between Database and Web Service Calls?

All things being equal, and in the most simple form, which is faster?
1.) A call to a web service method
2.) A call to a database
For example, assume that you have a simple web service that just returns an integer that is calculated in X time. You also have a database that, when queried in th right way, also takes X time to calculate the answer. (So the compute time is the same in both cases) In both cases, assume the amount of data both directions is the same, say, a single 32-bit integer, for simplicity.
Thus far, the calculation times of both the web service and the database are exactly the same.
The environment is 1 application server, where the app resides, and 1 other server that is holding both the web service and the database. There is nothing else going on in the environment other than the application calling either the web service or database repeatedly. This all within one single LAN, so any network latency is equal.
From an application, which will be faster, the call to the database, or the call to the web service?
What I am trying to isolate, I guess, is which is more heavy-weight. Does the set up, open, close, tear down of a database connection end up slower than that for a web service, or is it the same? Additionally, if there are other things, such as parsing the result from a web service, how do they affect the speed?
O(1) doesn't refer to any length of time. A single operation could take .001 ms on a webservice and 100 seconds in a database and they both could be using O(1) functions:
http://en.wikipedia.org/wiki/Big_O_notation
It's hard to know quite what you're asking. If you're asking whether accessing a local database is generally faster than accessing a similar service over the internet, then I expect that, generally, the answer is that the local database will be faster. The call over the internet to the web service has a lot of overhead and communication over internet is relatively slow. Evan on a slow computer a databases can perform many thousands of simple queries per second. Contrast that with access over the internet, where you'd be lucky to get 50 round trip requests per second, not even accounting for time it takes to perform the requested operation on the server.
If you're asking whether a server on the web can serve data faster by avoiding a database and calculating results directly, then the answer is it depends. The call to the database in this case adds unnecessary overhead if the data in it can be easily calculated in a stand-alone function. The answer to this question doesn't really have anything to do with a "web service". Is it faster to calculate an answer in a function or to access the answer using a query on a database? As I said, the answer would depend on the complexity of the particular function you had to use, and weighing its computation time against the overhead of accessing the answer (or part of the answer) directly from a database.
In short, the answer to your question depends on what exactly you're asking. It would also probably help to know why you're asking the question. I have a suspicion that the real answer is that this probably isn't something you need to worry about, not really a practical concern unless you have a particular situation requiring optimization.
If you're concerned about comparison of speed when webservice and database are both on a lan, I'm pretty sure the overhead of the db is a less than the webservice. The application typically maintains a stateful connection(s) to the db, while requests to a webservice are via http, which is stateless, relatively higher overhead, and slower. Could be wrong, though. Best answer would be to whip up a simple webservice, query, and (1) measure time it takes to retrieve results using both methods, and compare, and/or (2) create an app that opens a lot of threads and do some load testing.
A caveat: If your app doesn't maintain an open connection or have access to a pool of connections with the db, then the db alternative may well be slower. Initial creation of a db connection can be relatively slow. But that shouldn't figure into things, since you should write your app so that an open connection is always maintained.
Based on practical experience, I would say that the database call is significantly faster.
It all depends on the network topology and languages you're using. If you're talking C#...my money would be on the database call being faster almost every time.
Your calls to the database server are going to be made over the native protocol. Everything is going to be optimized.
If you're calling a web service, you're going to need some mechanism to send the request to the web server, wait for the web server to respond, and then something to parse the result of the web service call back into your code.
One could say that generally, latency of the network in a web service (which will typically be over the internet) is going to be slower than the call to a database (which is typically on a LAN or something, which is faster than one's connection to the internet).
Of course, this makes a LOT of assumptions about setups/software/etc, etc which effectively reduces it to an apples and oranges comparison, which there is never a good answer for.
O(1) doesn't specify the speed, it specifies the 'growth' in time required as the underlying data gets larger. The constants are dropped from the equation. What this means is that O(N^2) can be less than O(N) for some really small N.
A web service is a way to connect to some functionality. Besides the network latency, the real time is bound by what the service is actually doing. There could be a database underneath for example. If it is something that just returns an Integer, the computational time is mostly trivial, the request is bounded by the network.
A database needs to parse the query, build a query tree, optimized it, then apply some search algorithms against a series of caches and files. If you just plopped an Integer into a trivial table, or a tableless SQL call, then fetching the data is probably trivial, its the whole transactional packaging that will eat CPU.
Can you get a packet back and forth to a server before you can parse trivial SQL and punch back a tabled result? Mostly, these days I say it was a toss up. Some networks are faster than others, while some databases and servers are pretty good. Nothing is certain.
In general, is a web service faster than a database? Yes, if and only if the service is trivial (if it's hiding a database, then it's obviously just additional time). Databases are big bulky engines, and while they've gotten much faster over the years, their base level of transactional integrity specifies an awful lot of minimum CPU usage. They're slower because they are doing so much more work. Contrast that with some explicit minimal computation hidden behind network access. A fibber or gigabit network can rapidly move data. It's just so much less work to get accomplished.
Of course the reason we don't replace databases with custom written web services is time. It takes too long to write it, and then keep it up to date. Way more effort than just slamming it into a database and accepting it's performance.
Paul.
IMHO I would say the database call would be faster hands down. I say this because there is much less overhead. With the verbosity of the HTTP protocol and SOAP markup incurred you have a lot more bloat in your data. This bloat data has extra cost for packaging and un-packaging. with a stored procedure call you could use an output parameter to return a single int instead of a result set to make it even lighter.
Algorithmic complexity is just one variable that impacts the overall performance of a system. Other factors might include network latency or network bandwidth, especially when the size of the returned data is different.
If you run the same O(1) algorithm on a local machine, you will get the results faster than if you run the algorithm on a machine on another continent and need the same results sent over the network.
Other factors might include raw CPU speed if the calls are done on physically different machines.
That's why premature optimisation is the root of all evil.
EDIT:
I'd say it depends even more now on the details of the system, i.e., what database software you are using, or whether or not your web service is reading data from a static web page or dynamically generating the data.
But I am beginning to lose sight of why you are asking the question. You seem to say that both methods take the same amount of time. So if they take the same amount of time, how can you ask which is faster? Clearly they are equally fast. You need to tell us more about how and when they stop taking the same amount of time.
If we are assuming that you are communicating to a different server for both the web and database calls, wouldn't they be pretty much the same, since both requests are transferred through TCP/IP? The only thing then that could be compared is how big the actual results are that are sent back in terms of bits across the wire.

Resources