I feel very skeptical about this part, as I have not found an answer on the web that satisfies me.
The questions are:
Should I keep the DB connection open forever and keep processing my requests over it? This way is easier, as I only have to open the connection once and can forget about closing it unless it goes down for some reason. I understand there are overheads to keeping a connection open forever.
Should I open a fresh connection each time I receive a request that needs some processing in the database, and then close it after the processing is done?
As per my research, the second option is better, but I need to understand the consequences in both cases.
It will mean a lot if someone can explain the design overheads here with some examples.
My question seems very simple, but there are some subquestions which require deeper inspection.
My Question:
What's the best practice/architecture for handling the database connection?
The options I've found:
1. For each RESTful service call with database (DB) requests, create a new connection to the DB and close it again after the queries.
2. Create a connection outside the REST service and use it for each query.
Option 1:
One negative point of this is the cost of establishing and closing the connection for each request.
Option 2:
I don't know whether this works. I've researched the web service lifecycle to check how this could work, but I don't know whether the instances will stay alive after the web service finishes. I also don't know whether it's good practice, because there could be events which destroy the connection. A last issue: I think the requests could block each other (which would defeat the concept of threads).
Hope you can help me a little bit with this architecture.
Greets,
Nik
If you create one per query / transaction, it is much easier to manage "closing" the connections.
I can see why common sense dictates that you should open one connection and use it throughout, but you will run into problems with dropped connections and multithreading. So your next step will be to open a pool of, say, 50 connections and keep them all open, doling them out to different processes.
If you open a connection when you need it and dispose of it when you've finished, that will not actually close the connection, it'll just return it to the connection pool to be used again.
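As an illustration of that acquire-and-release pattern, here is a minimal sketch using a pooling library (HikariCP here; the JDBC URL, pool size, and query are assumptions made up for the example):

```scala
import com.zaxxer.hikari.{HikariConfig, HikariDataSource}
import scala.util.Using

object Db {
  // Configure the pool once at application startup.
  private val config = new HikariConfig()
  config.setJdbcUrl("jdbc:postgresql://localhost:5432/appdb") // assumed URL for the example
  config.setMaximumPoolSize(50)                               // the "pool of 50" from above
  val pool = new HikariDataSource(config)

  // Per request: borrow a connection, use it, and "close" it. Closing a pooled
  // connection returns it to the pool rather than tearing it down.
  def countUsers(): Long =
    Using.resource(pool.getConnection) { conn =>
      Using.resource(conn.prepareStatement("SELECT COUNT(*) FROM users")) { stmt =>
        Using.resource(stmt.executeQuery()) { rs =>
          rs.next()
          rs.getLong(1)
        }
      }
    }
}
```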
Scenario
The DB for an application has gone down. This results in any actor responsible for committing important data to the DB failing to get a connection.
Preferred Behaviour
The important data is written to the DB when it comes back up sometime in the future.
Current Implementation
The actor catches the DBException, wraps the data in a DBWriteFailed case class, and sends the message to its supervisor. The supervisor then schedules another write for sometime in the future (e.g. 1 minute) using system.scheduler.scheduleOnce(...) so that we don't spin in circles too much while waiting for the DB to come back up.
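For concreteness, here is a rough sketch of that current implementation as I understand it (the Database trait, DBException, message types, and wiring are stand-ins for illustration, not the actual code):

```scala
import akka.actor.{Actor, Props}
import scala.concurrent.duration._

// Stand-in types for the sketch; the real ones live in the application.
case class ImportantData(payload: String)
class DBException(msg: String) extends RuntimeException(msg)
trait Database { def write(data: ImportantData): Unit }

case class WriteData(data: ImportantData)
case class DBWriteFailed(data: ImportantData) // the case class mentioned above

// The committing actor: on failure, wrap the data and escalate to the supervisor.
class CommittingActor(db: Database) extends Actor {
  def receive: Receive = {
    case WriteData(data) =>
      try db.write(data)
      catch { case _: DBException => context.parent ! DBWriteFailed(data) }
  }
}

// The supervisor: schedule a retry instead of spinning while the DB is down.
class WriteSupervisor(db: Database) extends Actor {
  import context.dispatcher
  private val committer = context.actorOf(Props(new CommittingActor(db)))
  def receive: Receive = {
    case msg: WriteData      => committer ! msg
    case DBWriteFailed(data) =>
      context.system.scheduler.scheduleOnce(1.minute, committer, WriteData(data))
  }
}
```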
This implementation certainly works but I feel there might be a better way.
The protocol gets a bit messier when the committing actor has to respond to the original sender after a successful commit.
The regular flow of messages to the committing actor is not throttled in any way and the actor will happily process the new messages, likely failing to connect to the DB for each and every one of them.
If messages get caught in this retry loop for too long, the mailboxes of the committing actors will start to balloon. It is important that this data be committed, but none of it matters if the application crawls to a halt or crashes due to excessive memory usage.
I am an akka novice and I am largely inexperienced when it comes to supervisor strategies, but I feel as though I may be able to leverage one of those to handle some of this retry logic.
Is there a common approach in akka for solving a problem like this? Am I on the right track or should I be heading in a different direction?
Any help is appreciated.
You can use the Akka Circuit Breaker to reduce connection attempts. Instead of using the scheduler as a retry queue, I would use a buffer (with a max size limit) inside the actor and retry those writes when the circuit breaker becomes closed again (the onClose callback should send a message to the actor itself). An alternative could be to combine the circuit breaker with a stashing mailbox.
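A minimal sketch of that circuit-breaker-plus-buffer idea (the buffer cap, failure thresholds, timeouts, and the db function are assumed values for illustration):

```scala
import akka.actor.Actor
import akka.pattern.CircuitBreaker
import scala.concurrent.duration._

case class Write(data: String)
case object RetryBuffered

class BufferedWriter(db: String => Unit) extends Actor {
  import context.dispatcher

  private val maxBufferSize = 10000          // assumed cap to keep memory bounded
  private var buffer = Vector.empty[String]

  private val breaker = new CircuitBreaker(
    context.system.scheduler,
    maxFailures  = 5,
    callTimeout  = 10.seconds,
    resetTimeout = 1.minute
  ).onClose(self ! RetryBuffered)            // drain the buffer once the DB looks healthy

  def receive: Receive = {
    case Write(data) =>
      try breaker.withSyncCircuitBreaker(db(data))
      catch {
        case _: Exception if buffer.size < maxBufferSize =>
          buffer :+= data                    // park the write instead of ballooning the mailbox
      }
    case RetryBuffered =>
      val pending = buffer
      buffer = Vector.empty
      pending.foreach(d => self ! Write(d))
  }
}
```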
If you're planning to implement full failover in your app
Don't.
Do not bubble database failover responsibility up into the app layer. As far as your app is concerned, the database should just be up and ready to accept reads and writes.
If your database goes down often, spend time making your database more robust (there's a multitude of resources on the web already for this: search the web for terms like 'replication', 'high availability', 'load-balancing' and 'clustering', and learn from the war stories of others at highscalability.com). It all really depends on what the cause of your DB outages is (e.g. I once maxed out the NIC on the DB master, and "fixed" the problem in the interim by enabling GZIP on the wire).
You'll be glad you adhered to a separation of concerns if you go down this route.
If you're planning to implement the odd sprinkling of retry logic and handling DB brown-outs
If you're not expecting your app to become a replacement database, then Patrik's answer is the best way to go.
A common problem I see in lots of web languages is that database connections need to be closed, otherwise the total number of connections gradually increases and then everything grinds to a halt in one form or another.
HTTP is stateless; when the request has finished processing, why can't these languages just drop any connections that the request opened? Are there any legitimate reasons why you might keep one open?
Because opening, authenticating, and authorising access to a database is quite expensive. That is why normally everybody uses a database connection pool. Connections stay open while request handlers pick up an available, already-opened connection from the pool. When one closes a connection, what is really happening is that the connection is being freed for others to use.
To answer "why can't these languages just drop any connections that request opened? Are there any legitimate reasons for why you might keep it open?":
Connections might stay open after the request is complete and be used for other purposes, for instance asynchronous updates of data. But I am with you: in 90% of cases, when the request is finished, the connections it opened should be returned to the pool. Depending on the web framework you use (Spring, Django, ...), this kind of behaviour can be configured or at least implemented with minimal effort.
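To make the "closing really means freeing" point concrete, here is a toy sketch of the mechanism a pool implements (real pools such as HikariCP or c3p0 also handle validation, timeouts, and broken connections; the structure here is illustrative only):

```scala
import java.sql.{Connection, DriverManager}
import java.util.concurrent.ArrayBlockingQueue

// A toy connection pool: "closing" just puts the connection back in the queue.
class ToyPool(url: String, size: Int) {
  private val idle = new ArrayBlockingQueue[Connection](size)
  (1 to size).foreach(_ => idle.put(DriverManager.getConnection(url))) // pay the open cost once

  def borrow(): Connection = idle.take()                // blocks if every connection is in use
  def release(conn: Connection): Unit = idle.put(conn)  // freed for others, not torn down
}
```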
Checking for an open connection while closing an HTTP connection adds overhead, so I guess that's why some languages don't close it by default.
And if you don't close it explicitly, it will have to be done by the garbage collector, which can take a while.
In an environment with a SQL Server failover cluster or mirror, how do you prefer to handle errors? It seems like there are two options:
1. Fail the entire current client request, and let the user retry
2. Catch the error in your DAL, and retry there
Each approach has its pros and cons. Most shops I've worked with do #1, but many of them also don't follow strict transactional boundaries, and seem to me to be leaving themselves open for trouble in the event of failure. Even so, I'm having trouble talking them into #2, which should also result in a better user experience (one catch is the potentially long delay while the failover happens).
Any arguments one way or the other would be appreciated. If you use the second approach, do you have a standard wrapper that helps simplify implementation? Either way, how do you structure your code to avoid issues such as those related to the lack of idempotency in the command that failed?
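As a sketch of what such a wrapper could look like (in Scala for illustration; the attempt count, backoff values, and the choice of SQLTransientException as the "retryable" signal are assumptions, and the wrapped command must be idempotent for option 2 to be safe):

```scala
// A generic DAL retry wrapper. Only safe for idempotent commands, since a
// statement may already have taken effect before the connection dropped.
def withRetry[A](attempts: Int, delayMs: Long = 500)(op: () => A): A =
  try op()
  catch {
    case _: java.sql.SQLTransientException if attempts > 1 =>
      Thread.sleep(delayMs)                      // give the failover time to complete
      withRetry(attempts - 1, delayMs * 2)(op)   // simple exponential backoff
  }

// Hypothetical usage: retry a read up to 3 times across a failover.
// val count = withRetry(attempts = 3)(() => dal.countOrders())
```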
Number 2 could be an infinite loop. What if it's network-related, or the local PC needs a reboot, or whatever?
Number 1 is annoying to users, of course.
If you only allow access via a web site, then you'll never see the error anyway unless the failover happens mid-call. For us, this is unlikely and we have failed over without end users realising.
In real life you may not have nice clean DAL on a web server. You may have an Excel sheet connecting (most financials) or WinForms where the connection is kept open, so you only have the one option.
Failover should only take a few seconds anyway. If the DB recovery takes more than that, you have bigger issues. And if it happens often enough that you have to think about handling it, well...
In summary, it will happen so rarely that you'll want to know about it, and number 1 would be better. IMHO.
A discussion about Singletons in PHP has me thinking about this issue more and more. Most people advise that you shouldn't make a bunch of DB connections in one request, and I'm just curious as to what your reasoning is. My first thought is the expense to your script of making that many connections to the DB, but then I counter myself with the question: wouldn't multiple connections make concurrent querying more efficient?
How about some answers (with evidence, folks) from some people in the know?
Database connections are a limited resource. Some DBs have a very low connection limit, and wasting connections is a major problem. By consuming many connections, you may be blocking others from using the database.
Additionally, throwing a ton of extra connections at the DB doesn't help anything unless there are resources on the DB server sitting idle. If you've got 8 cores and only one is being used to satisfy a query, then sure, making another connection might help. More likely, though, you are already using all the available cores. You're also likely hitting the same hard drive for every DB request, and adding additional lock contention.
If your DB has anything resembling high utilization, adding extra connections won't help. That'd be like spawning extra threads in an application with the blind hope that the extra concurrency will make processing faster. It might in certain circumstances, but in other cases it'll just slow you down as you thrash the hard drive, waste time task-switching, and introduce synchronization overhead.
It is the cost of setting up the connection, transferring the data and then tearing it down. It will eat up your performance.
Evidence is harder to come by but consider the following...
Let's say it takes x microseconds to make a connection.
Now you want to make several requests and get data back and forth. Let's say that the difference in transport time is negligible between one connection and many (just for the sake of argument).
Now let's say it takes y microseconds to close the connection.
Opening one connection will take x + y microseconds of overhead. Opening n connections will take n * (x + y). For example, with x = 1000, y = 200, and n = 10, that's 12,000 microseconds of connection overhead instead of 1,200. That will delay your execution.
Setting up a DB connection is usually quite heavy. A lot of things are going on behind the scenes (DNS resolution, TCP connection, handshake, authentication, the actual query).
I once had an issue with some weird DNS configuration that made every TCP connection take a few seconds to come up. My login procedure (because of a complex architecture) took 3 different DB connections to complete. With that issue, it was taking forever to log in. We then refactored the code to make it go through one connection only.
We access Informix from .NET and use multiple connections. Unless we're starting a transaction on each connection, it is often handled in the connection pool. I know that's very brand-specific, but most(?) database systems' client access will pool connections to the best of its ability.
As an aside, we did have a problem with connection count because of cross-database connections. Informix supports synonyms, so we synonymed the common offenders and the multiple connections were handled server-side, saving a lot in transfer time, connection creation overhead, and (the real crux of our situation) license fees.
I would assume that it is because your requests are not being sent asynchronously. Since your requests are done iteratively on the server, blocking each time, you have to pay the overhead of creating a connection each time, when you only have to do it once...
In Flex, all web service calls are automatically made asynchronously, so it is common to see multiple connections, or queued-up requests on the same connection.
Asynchronous requests mitigate the connection cost through faster request/response times. Because you cannot easily achieve this in PHP without some threading, the performance hit is greater than simply reusing the same connection.
that's my 2 cents...