I know this has been asked before, here, there and everywhere, but I can't get a clear explanation, so I'm going to pitch it again. So what is all the fuss about using a singleton to control the DB connection in your web app? Some like it, some hate it; I don't understand it. From what I've read, "it's to ensure that there is always only one active connection to your DB". But why is that a good thing? One active DB connection on a data-driven web app processing multiple requests per second spells trouble, doesn't it? For whatever reason, nobody can properly explain this. I've been all over the web. I know I'm thick.
I'm assuming Java here, but this is relevant to most other technologies as well.
I'm not sure whether you've confused the use of a plain singleton with a service locator. Both of them are design patterns. The service locator pattern is used by applications to ensure that there is a single class entrusted with the responsibility of obtaining and providing access to databases, files, JMS queues, etc.
Most service locators are implemented as singletons, since there is no need for multiple service locators to do the same job. Besides, it is useful to cache information obtained from the first lookup that can be later used by other clients of the service locator.
By the way, the argument that "it's to ensure that there is always only one active connection to your DB" is false and misleading. A connection can be closed or reclaimed if it is left inactive for a long time, so caching a connection to the database is frowned upon. There is one exception: reusing a connection obtained from the connection pool is encouraged as long as you do so within the same context, i.e. within the same HTTP request or user request (whichever is applicable). This is done for performance reasons, since establishing a new connection can be an expensive operation.
High-performance (or even medium-performance) web apps use database connection pooling, so one DB connection can be reused across many web requests. The singleton is usually the object which manages this pool. I think the motivation for using a singleton is to idiot-proof against maintenance programmers who might otherwise instantiate many of these objects needlessly.
"it's to ensure that there is always only one active connection to your DB." I think that would be better stated as: to ensure each CLIENT has only one active connection to your DB. This is important because you want to prevent deadlocks. If I have TWO open database connections (as a client), I might update a row on one connection and then try to update the same row on the other. The second update waits for a lock held by the first connection's open transaction, while the client itself is waiting for the second update to return. This will cause a deadlock which the database cannot detect. So the idea of the singleton is basically to make sure that there is ONE object that is in charge of handing out database connections to each client. You don't HAVE to have a singleton for this, but most people will tell you it just makes sense for the system to have only one.
You're right--usually this isn't what you want.
However, there are plenty of cases where you need to throttle yourself down to a single connection. By serializing your access to the database through a singleton, you can address other issues or constraints like load, bandwidth, etc.
I've done something similar in the past for a bulk processing app. Instead, though, I used a semaphore to synchronize access to the database so I could allow n concurrent db operations.
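The semaphore idea can be sketched roughly like this in Java. This is only an illustration under assumptions: DbThrottle, run, demo and fakeQuery are hypothetical names, and the "database operation" is a short sleep standing in for a real query. Only java.util.concurrent.Semaphore and the executor classes are real library types.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: allows at most maxConcurrent "database operations" at once.
final class DbThrottle {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    DbThrottle(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Blocks until a permit is free, runs the operation, and always releases the permit.
    void run(Runnable dbOperation) {
        permits.acquireUninterruptibly();
        try {
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            dbOperation.run();
        } finally {
            inFlight.decrementAndGet();
            permits.release();
        }
    }

    int peak() {
        return peak.get();
    }

    // Runs `tasks` fake operations on `threads` workers and returns the peak concurrency seen.
    static int demo(int maxConcurrent, int threads, int tasks) {
        DbThrottle throttle = new DbThrottle(maxConcurrent);
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            workers.execute(() -> throttle.run(DbThrottle::fakeQuery));
        }
        workers.shutdown();
        try {
            workers.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return throttle.peak();
    }

    // Stand-in for a real query (assumption: about 5 ms of work).
    private static void fakeQuery() {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent operations: " + demo(3, 8, 20));
    }
}
```

With a permit count of 1 this degenerates into the fully serialized, single-connection case; a larger n trades some of that protection for throughput.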
One might want to use a singleton due to database server constraints, for example, a server might limit the number of connections.
My main reason is that you know which connections exist and can be managed or closed; it keeps things a bit more organised when you don't have unnecessary, redundant connections.
I don't think it's a simple answer. For instance on ASP.NET, the platform implements connection pooling by default, so it will automatically adjust a "pool" of connections and re-use them so you're not constantly creating and destroying expensive objects.
However, let's say you were writing a data collection application that monitored 200 separate input sources. Every time one of those inputs changed, you fire off a thread that records the event to the database. I would say that could be a bad design if there's a chance that even a fraction of those could fire off at the same time. Suddenly having 20 or 40 active database connections is inefficient. It might be better to queue the updates, and as long as there are updates left in the queue, a singleton connection picks them off the queue and executes them on the server. It's more efficient because you only have to negotiate the connection and authentication once. Once there's no activity for a while you could choose to close down the connection. This kind of behavior would be hard to implement without a central resource manager like a singleton.
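A rough Java sketch of that queue-plus-single-worker design, under stated assumptions: QueuedEventWriter and its methods are hypothetical names, and the "connection" is simulated with counters rather than a real driver. A single-thread executor provides both the queue and the one worker that drains it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical event writer: many producers enqueue, one worker thread writes.
final class QueuedEventWriter {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final AtomicInteger connectionsOpened = new AtomicInteger();
    private final AtomicInteger eventsWritten = new AtomicInteger();
    private boolean connected = false; // read and written only on the worker thread

    // Called from any input-source thread; queues the event and returns immediately.
    void record(String event) {
        worker.execute(() -> {
            if (!connected) {                 // negotiate and authenticate only once
                connected = true;
                connectionsOpened.incrementAndGet();
            }
            eventsWritten.incrementAndGet();  // stand-in for an INSERT of `event`
        });
    }

    // Drains the queue, then "disconnects". record() must not be called afterwards.
    void close() {
        worker.execute(() -> connected = false);
        worker.shutdown();
        try {
            worker.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int connectionsOpened() { return connectionsOpened.get(); }
    int eventsWritten() { return eventsWritten.get(); }

    // Simulates `sources` threads that each fire `eventsEach` events.
    static QueuedEventWriter demo(int sources, int eventsEach) {
        QueuedEventWriter writer = new QueuedEventWriter();
        Thread[] producers = new Thread[sources];
        for (int s = 0; s < sources; s++) {
            final int id = s;
            producers[s] = new Thread(() -> {
                for (int e = 0; e < eventsEach; e++) {
                    writer.record("source-" + id + " event-" + e);
                }
            });
            producers[s].start();
        }
        for (Thread t : producers) {
            try {
                t.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
        writer.close();
        return writer;
    }

    public static void main(String[] args) {
        QueuedEventWriter w = demo(20, 10);
        System.out.println(w.eventsWritten() + " events, " + w.connectionsOpened() + " connection(s)");
    }
}
```

The sketch leaves out the idle timeout mentioned above; closing the connection after a quiet period would need a timer or a timed poll on the queue.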
"only one active connection" is a very narrow statement, used for illustration. It could just as well be a singleton managing a pool of connections. The point of a singleton for database connections is that you don't want every consumer making its own connection or set of connections.
I think you might want to be more specific about "using a singleton to control the db connection in your web app." In general, a java.sql.Connection object should not be treated as thread safe, but your javax.sql.DataSource may pool connections, so you should go through a single instance of it to share the pooling.
You are looking more for one connection per request, not one connection for the entire application. You can still control access to it through a singleton, though (storing the connection in the HttpContext.Items collection).
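One common way to get "a single instance to share the pooling" in Java is the initialization-on-demand holder idiom. The sketch below is an assumption-laden illustration: FakeDataSource is a hypothetical stand-in for a pooling javax.sql.DataSource, DataSources is a made-up accessor class, and the JDBC URL is invented.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a pooling javax.sql.DataSource.
final class FakeDataSource {
    static final AtomicInteger instancesCreated = new AtomicInteger();
    final String url;

    FakeDataSource(String url) {
        this.url = url;
        instancesCreated.incrementAndGet();
    }
}

// The singleton is the pool-owning object, not an individual connection.
final class DataSources {
    private DataSources() {}

    // The JVM initializes Holder lazily and only once, even under concurrent access.
    private static final class Holder {
        static final FakeDataSource INSTANCE =
                new FakeDataSource("jdbc:example://localhost/app"); // made-up URL
    }

    static FakeDataSource shared() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        System.out.println("same instance: " + (shared() == shared()));
    }
}
```

Each request would still borrow its own connection from the shared object and give it back when done; only the pool owner is shared application-wide.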
It guarantees that each client using your site only gets one connection to the db.
You really do not want a new connection being made every time a user does an action that creates a db query, not only for performance reasons, given the connection handshaking involved, but also to decrease load on the db server.
DB connections are a precious commodity, and this technique helps minimize the amount used at any given time.
Related
I'm using CodeIgniter 3.0 for a web app.
I have 2 databases on the same server (I guess), but with 2 different hostnames.
I don't use the second database, except for one kind of user.
When this kind of user connects to the web app, the pages take a very long time to display, but for my other users, no problem, so I guess it's a multiple databases connection issue.
In my database.php file, I define the 2 arrays containing the database settings.
In the model files that use the second database, I just write something like this:
$db1 = $this->load->database('db1', TRUE);
...
// I do my query as usual
$db1->...
...
return $db1->get();
I do not close the connection.
Questions:
1) In each page, I use several functions using the second database. Is this issue due to these multiple connections to my second database?
2) Do I have to close the connection in my model's functions, just before the return? Or is it better to connect and disconnect in the controller?
3) I saw about the CI reconnect function, but how to use it well? To reconnect, I have to connect first, but where to connect first?
4) Or do you think the issue is due to something else, like some bad SQL queries?
Let's go through your questions one at a time and I'll comment.
1) In each page, I use several functions using the second database. Is this issue due to these multiple connections to my second database?
I say no because I have used the same multiple DB approach many times and have never seen a performance hit. Besides, if a performance hit was a common problem there would be lots of online complaints and people looking for solutions. I've seen none. (And I spend way too much time helping people with CodeIgniter.)
2) Do I have to close the connection in my function's model, just
before the return? Or is it better to connect and disconnect in the
controller?
If closing the connection did help, then the answer to when to do it depends on the overall structure of the logic. For instance, if a controller is using several methods from the same model to create a page, then close the connection in the controller. On the other hand, if only one model method is used to create a given page, then close the connection in the model.
What you don't want to do is repeatedly open and close a DB connection while building a page.
3) I saw about the CI reconnect function, but how to use it well? To
reconnect, I have to connect first, but where to connect first?
reconnect() is only useful when the database server drops the connection due to it being idle for too long. You'll know you need to use reconnect() when you start getting "no database connection" or "cannot connect to database" errors.
4) Or do you think the issue is due to something else, like some bad SQL queries?
Because the other approaches you ask about won't help, this is the strongest possibility. Again, my reasoning is that I've never had this problem using multiple database connections.
I suggest you do some performance profiling on the queries to the second database. Check out the following bits of documentation for help with that.
Profiling Your Application
Benchmarking Class
There are lots of reasons for slow page loads and the use of the second DB might just be a coincidence.
About Closing Connections
The question is, "If I do not close the DB connection by myself, CI will do it for me, but when?".
The answer is found in the PHP manual, "Open non-persistent MySQL connections and result sets are automatically destroyed when a PHP script finishes its execution." That quote is from the mysqli documentation, but, to the best of my knowledge, it is true for all of PHP's database extensions, i.e. Oracle, Mssql, PDO, etc.
In short, DB connection closing is baked into PHP and happens when the script is done. In CI, the script is done very shortly after the Controller returns. (Examine the end of /system/core/CodeIgniter.php if you want to see what happens when the controller returns.) In effect, a Controller returning is, more or less, another way of saying "after the page is loaded".
Unless you happen to be using persistent connections (usually a bad idea), you seldom need to explicitly close DB connections. One reason to close them yourself is when a lot (really a lot) of time is required to process the query results. Manually closing connections will help ensure the DB server won't reach its connection limit when the web server is under heavy usage.
To determine what "really a lot" means you have to consider multiple factors, i.e. how many connections the database server allows, how the time-to-process compares to the DB idle connection dropout duration, and the amount of traffic the site needs to handle.
There are likely other considerations too. I'm not a database performance tuning expert.
My question seems very simple, but there are some subquestions which require deeper inspection.
My Question:
What's the best practice/architecture for handling the database connection?
The options I found:
1. For each RESTful service with database (DB) requests, create a new connection to the DB and close it again after the queries.
2. Create a connection outside the REST service and use it for each query.
Option 1.:
One negative point of this is the cost of establishing and closing the connection for each request.
Option 2.:
I don't know whether it works. I've researched the web service lifecycle to check how this could work, but I don't know whether the instances stay alive after the web service finishes. I also don't know whether it's good practice, because there could be events which destroy the connection. A last issue is that I think the requests could block each other (which defeats the concept of threads).
Hope you could help me a little bit with this architecture.
Greets,
Nik
If you create one per query / transaction, it is much easier to manage "closing" the connections.
I can see why common sense dictates that you should open one and use it throughout, but you will run into problems with dropped connections and multithreading. So your next step will be to open a pool, say of 50, connections and keep them all open, doling them out to different processes.
If you open a connection when you need it and dispose of it when you've finished, that will not actually close the connection, it'll just return it to the connection pool to be used again.
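To make the "dispose returns it to the pool" behaviour concrete, here is a toy Java sketch. It is not how any particular driver or framework implements pooling; ToyPool, Lease, borrow and the counters are hypothetical names, and connections are represented by plain integer ids.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical toy pool; real code would use the pool shipped with the driver or framework.
final class ToyPool {
    private final BlockingQueue<Integer> idle;   // ids of idle "physical connections"
    private final AtomicInteger physicalOpens = new AtomicInteger();

    ToyPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int id = 0; id < size; id++) {
            idle.add(id);                        // pretend to open a physical connection
            physicalOpens.incrementAndGet();
        }
    }

    // Blocks until a connection is idle, then lends it out.
    Lease borrow() {
        try {
            return new Lease(idle.take());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
    }

    int idleCount() { return idle.size(); }
    int physicalOpens() { return physicalOpens.get(); }

    // What the caller holds; close() hands the connection back instead of closing it.
    final class Lease implements AutoCloseable {
        final int connectionId;
        private boolean returned = false;

        private Lease(int connectionId) {
            this.connectionId = connectionId;
        }

        @Override
        public void close() {
            if (!returned) {                     // closing twice must not return it twice
                returned = true;
                idle.add(connectionId);
            }
        }
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool(2);
        for (int i = 0; i < 5; i++) {
            try (Lease lease = pool.borrow()) {
                System.out.println("query " + i + " ran on connection " + lease.connectionId);
            }
        }
        System.out.println("physical connections opened: " + pool.physicalOpens());
    }
}
```

However many times the loop borrows and closes, the number of physical opens stays at the pool size, which is why open-late, close-early is cheap when a pool sits underneath.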
My colleague maintains that opening a single database connection for an application is much better and faster than opening and closing it using a pool.
He has an ApplicationStart method where he initializes Application('db') and keeps this connection alive across the app. This app mostly contains read-only data.
How can I persuade him?
That depends a lot on what the "application" here is. If this is a client application that works on a single thread and does things sequentially, then frankly there won't be any noticeable difference either way. In that scenario, if you use the pool it will basically be a pool of 1 item, and opening a connection from the pool will be virtually instantaneous (and certainly not noticeable compared to network IO). In that scenario I would still say use the inbuilt pooling, as it will avoid assumptions when you change scenario.
However, if your application uses more than one thread, or via any other mechanism does more than one thing at a time (async etc.), using a single connection would be very bad; either it will outright fail, or you will need to synchronize around the connection, which would limit you severely. Note that any server-side application (any kind of web application, WCF service, SOAP service, or socket service) would react very badly to his idea.
Perhaps the main way to convince him is simply: ask him to prove it. Ask for a repeatable test / demonstration that shows this difference.
I have a j2ee webapp that's being used internally by ~20-30 people.
There is no chance of significant growth in the number of users.
From what I understood there's a trade-off between opening a new DB connection for each request made to the webapp (expensive, but doesn't block other users when the DB is in use), to using the singleton pattern (doesn't open new connections but only allows one user at a time).
I thought that since I know that only 30 users will ever use my webapp at the same time, maybe the simplest and best solution would be to store the connection as a session attribute, thus reducing to a minimum the amount of openings made, while still allocating one connection per user.
What do you think?
"From what I understood there's a trade-off between opening a new DB connection for each request made to the webapp"
That is what connection pools are for. If you use a connection pool in your application, the pool, once initialized, is in charge of providing connections to the application as and when needed. A properly tuned connection pool keeps enough connections in reserve that the application rarely has to wait for a new one to be created and opened when it asks for a connection.
"I thought that since I know that only 30 users will ever use my webapp at the same time, maybe the simplest and best solution would be to store the connection as a session attribute"
Per-user connections are not a good idea, primarily when a web application is concerned. In a web application, it is perfectly possible for users to initiate multiple requests to the server (think multi-tabbed browsing). In such a case, the use of a single connection per user will result in weird application behavior, unless you synchronize access to the connection.
One must also consider the side-effect of putting transient attributes into the session - Connection objects are not serializable and hence must be marked transient. If the session is deserialized at some point, one has to account for the fact that the Connection object will not be available, and must be re-initialized.
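The transient side effect is easy to reproduce with plain Java serialization. In this sketch, UserState, roundTrip and reconnectIfNeeded are hypothetical names, and a bare Object stands in for the java.sql.Connection; the in-memory round trip imitates what a container may do when it passivates or replicates a session.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical session attribute that holds a non-serializable handle.
final class UserState implements Serializable {
    private static final long serialVersionUID = 1L;

    final String userName;
    private transient Object connection;   // stand-in for java.sql.Connection; skipped by serialization

    UserState(String userName) {
        this.userName = userName;
        this.connection = new Object();
    }

    boolean hasConnection() {
        return connection != null;
    }

    // Has to run after deserialization before the handle is used again.
    void reconnectIfNeeded() {
        if (connection == null) {
            connection = new Object();     // real code would obtain a fresh connection here
        }
    }

    // Serializes and deserializes the object in memory.
    static UserState roundTrip(UserState state) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(state);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (UserState) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        UserState restored = roundTrip(new UserState("alice"));
        System.out.println("connection survived: " + restored.hasConnection());
    }
}
```

The ordinary field comes back intact while the transient one is null, so every code path that touches the connection after a restore needs the re-initialization check.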
I think you're getting into premature optimization, especially given the scale of the application. Opening a new connection is not that expensive and, like Makach says, most modern RDBMSs handle connection pooling and will hold connections open for subsequent requests. You'd be trying to write better code than the compiler, so to speak.
No. Don't do that. It's perfectly OK to reconnect to the database every time you need to. Any database management system will do its own connection pool caching, I think.
If you try to keep connections open, you'll make it incredibly hard for yourself to manage this in a secure, bug-free, safe way.
I'm currently creating a web portal with ASP.NET which relies heavily on database usage. Basically, every (well almost every :P ) GET query from any user will result in a query to the database from the webserver.
Now, I'm really new at this, and I'm very concerned about performance. Due to my lack of experience in this area, I don't really know what to expect.
My question is: using ADO.NET, would it be smarter to just leave a static connection open from the webserver to the database, and then check the integrity of this connection server-side before each query to the database? Or would I be better off opening the connection before each query and closing it afterwards?
In my head the first option would be better, as you save time on handshaking etc. before each query and you save memory on both the database and the server side since you only have one connection, but are there any downsides to this approach? Could 2 queries sent at the same time potentially destroy each other's integrity or mix up the returned datasets?
I've tried searching everywhere in here and on the web to find some best practices about this, but with no luck. The closest I got was this: is it safe to keep database connections open for long time, but that seems to be more fitting for distributed systems where you have more than one user of the database, whereas I've only got my webserver.
You're way early to be worrying about performance.
Anyhow, connections are pooled by the framework. You should be opening them, using them, and disposing of them ASAP.
Something like...
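// Requires: using System.Data.SqlClient;
// connectionString and commandString are assumed to be fields defined elsewhere.
public object Load()
{
    // Both objects are disposed when the using block ends,
    // which hands the underlying connection back to the pool.
    using (SqlConnection cn = new SqlConnection(connectionString))
    using (SqlCommand cm = new SqlCommand(commandString, cn))
    {
        cn.Open();
        return cm.ExecuteScalar();
    }
}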
It's better to let ADO.NET handle the connection pooling. It'll persist the connection if it thinks it needs to, but don't use a static connection object. That just smells. It would be better to pass the connection object around to methods that need it, and create the connection in a using block.
You should always close your connection after finishing your DB interaction. ADO.NET has connection pooling which will take care of efficient connection reuse. Whenever you open 2nd, 3rd and subsequent connections - they'll be taken from a pool with almost no overhead.
Hope this helps.
I'd be thinking more about caching than advanced connection pooling. Every get requires a database hit?
If it's a portal, you've got common content and user-specific content. Using the Cache, you can store common items, and with a mangled key (including the user's id) you can store user-specific items.
ADO.NET does connection pooling. When you call close on the connection object it will keep the connection in the pool making the next connection much faster.
Your initial hunch is correct. What you need is database connection pooling.
You definitely don't want to open a connection for every database call; that will result in extremely poor performance very quickly. Establishing a database connection is very expensive.
Instead what you should be using is a connection pool. The pool will manage your connections, and try to re-use existing connections when possible.
I don't know your platform, but look into Connection Pooling - there must be a library or utility available (either in the base system or as an add-on, or supplied with the database drivers) that will provide the means to pool several active connections to the database, which are ready and raring to be used when you obtain one from the pool.
To be honest, I would expect the pooling to occur by default in any database abstraction library (with an available option to disable it). It appears that ADO.NET does this.
Really the first question to ask is why are you very concerned about performance? What is your expected workload? Have you tried it yet?
But in general, yes, it's smarter to have an open connection that you keep around for a while than to re-open a database connection each time; depending on the kind of connection, network issues, and phase of the moon, it can take a good part of a second or more to make an initial connection; if your workload is such that you expect more than a GET every five seconds or so, you'll be happier with a standing connection.