I have a web application that uses Node.js & a connection to an Oracle Database.
Currently the architecture between Node and the DB uses a single connection that stays open. The issue is that some queries take a long time to return, blocking subsequent queries until the first one returns.
If I open a new connection on each request this does not happen, and the subsequent queries will return before the first (long) one.
The question is: what is best practice? Does each request merit a new connection to the database, to be closed on callback? Should I give queries that I know take significant time their own connection? Or is a single connection correct?
Many thanks in advance for your thoughts.
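The behaviour described in this question can be reproduced without a database. The sketch below is only a simulation: FakeConnection is a hypothetical stand-in that, like the single shared connection described above, runs one query at a time, and the sleep durations are made-up values.

```python
import threading
import time


class FakeConnection:
    """Hypothetical stand-in for a driver connection that runs one query at a time."""

    def __init__(self):
        self._busy = threading.Lock()

    def query(self, name, seconds, started=None):
        with self._busy:              # a second caller on the same connection waits here
            if started is not None:
                started.set()         # signal that this query now holds the connection
            time.sleep(seconds)       # simulated query time
            return name


def completion_order(slow_conn, fast_conn):
    """Start a slow query, then a fast one, and return the order they finish in."""
    finished = []
    slow_started = threading.Event()

    def worker(conn, name, seconds, started=None):
        finished.append(conn.query(name, seconds, started))

    slow = threading.Thread(target=worker, args=(slow_conn, "slow", 0.2, slow_started))
    fast = threading.Thread(target=worker, args=(fast_conn, "fast", 0.01))
    slow.start()
    slow_started.wait()               # make sure the slow query is already running
    fast.start()
    slow.join()
    fast.join()
    return finished


shared = FakeConnection()
shared_order = completion_order(shared, shared)                         # one connection
separate_order = completion_order(FakeConnection(), FakeConnection())   # two connections
```

On the shared connection the fast query has to wait behind the slow one; with separate connections it finishes first, which matches what the question observes.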
You could use the generic-pool module, which is a generic resource pool for reusing expensive resources such as database connections.
The general idea is that you create a connection pool with a certain number of connections (10 by default).
Connections are reused, and they are kept for a certain maximum idle time (30 seconds by default).
I use this module with an Oracle database in production and have found no issues so far.
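To make the general idea concrete, here is a minimal sketch of a bounded pool in Python. It is not the generic-pool API: SimplePool, acquire, release and connection are hypothetical names, and the factory returns a plain dict instead of a real database connection.

```python
import itertools
import queue
import threading
from contextlib import contextmanager


class SimplePool:
    """Hypothetical bounded pool: creates connections lazily, up to max_size."""

    def __init__(self, factory, max_size=10):
        self._factory = factory
        self._max_size = max_size
        self._idle = queue.Queue()
        self._created = 0
        self._lock = threading.Lock()

    def acquire(self, timeout=None):
        try:
            return self._idle.get_nowait()        # reuse an idle connection if any
        except queue.Empty:
            pass
        with self._lock:
            if self._created < self._max_size:    # room left: open a new one
                conn = self._factory()
                self._created += 1
                return conn
        # Pool exhausted: wait until another caller releases a connection.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)

    @contextmanager
    def connection(self, timeout=None):
        conn = self.acquire(timeout)
        try:
            yield conn
        finally:
            self.release(conn)                    # always hand the connection back


ids = itertools.count(1)
pool = SimplePool(factory=lambda: {"id": next(ids)}, max_size=2)
with pool.connection() as first:
    first_id = first["id"]
with pool.connection() as again:
    reused_id = again["id"]                       # same connection, not a new one
```

Each request borrows a connection for the duration of its query and returns it, so a long query only ties up one of the pooled connections.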
I have an Odoo front end on an AWS EC2 instance, connected to PostgreSQL on ElephantSQL with a limit of 15 concurrent connections.
I want to make sure that this connection limit will pose no problem, so I want to use Kafka to perform the database writes instead of having Odoo do them directly, but I have found no resources online to help me out.
Is your issue about Connection Pooling? PostgreSQL includes two implementations of DataSource for JDBC 2 and two for JDBC 3, as shown here.
dataSourceName (String): Every pooling DataSource must have a unique name.
initialConnections (int): The number of database connections to be created when the pool is initialized.
maxConnections (int): The maximum number of open database connections to allow. When more connections are requested, the caller will hang until a connection is returned to the pool.
The pooling implementations do not actually close connections when the client calls the close method, but instead return the connections to a pool of available connections for other clients to use. This avoids any overhead of repeatedly opening and closing connections, and allows a large number of clients to share a small number of database connections.
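The "close does not really close" behaviour can be sketched as follows. This is Python rather than JDBC and is not the PostgreSQL driver's implementation: PoolingDataSource and PooledConnection are hypothetical classes, the parameter names merely echo the properties listed above, and an in-memory SQLite database stands in for the server. The sketch is single-threaded and does no locking around its counter.

```python
import queue
import sqlite3


class PooledConnection:
    """Wrapper handed to callers; close() gives the real connection back."""

    def __init__(self, raw, idle):
        self._raw = raw
        self._idle = idle

    def execute(self, sql, params=()):
        return self._raw.execute(sql, params)

    def close(self):
        if self._raw is not None:
            self._idle.put(self._raw)    # return to the pool, do not really close
            self._raw = None


class PoolingDataSource:
    def __init__(self, connect, initial_connections=1, max_connections=2):
        self._connect = connect
        self._max = max_connections
        self._open = 0
        self._idle = queue.Queue()
        for _ in range(initial_connections):
            self._idle.put(self._open_raw())

    def _open_raw(self):
        self._open += 1
        return self._connect()

    def get_connection(self, timeout=None):
        if self._idle.empty() and self._open < self._max:
            raw = self._open_raw()
        else:
            # At the limit: block until some caller "closes" a connection.
            raw = self._idle.get(timeout=timeout)
        return PooledConnection(raw, self._idle)


source = PoolingDataSource(lambda: sqlite3.connect(":memory:"),
                           initial_connections=1, max_connections=1)
first = source.get_connection()
first.execute("CREATE TABLE t (x INTEGER)")
first.close()                            # goes back to the pool
second = source.get_connection()         # same underlying connection
row_count = second.execute("SELECT count(*) FROM t").fetchone()[0]
second.close()
```

Because every in-memory SQLite connection has its own private database, the second query can only see table t if it runs on the same underlying connection that the first caller "closed".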
Additionally, you might want to investigate PgBouncer. PgBouncer is a stable, in-production connection pooler for PostgreSQL. It is transparent to the application: clients connect to PgBouncer as if it were the PostgreSQL server. PgBouncer can do load handling, connection scheduling/routing and load balancing. Read more from this blog that shows how to integrate this with Odoo. There are a lot of references from this page.
Finally, I would second OneCricketeer's comment, above, on using Amazon RDS, unless you are getting a far better deal with ElephantSQL.
On using Kafka, you have to realise that Odoo is a frontend application that is synchronous to user actions, so architecturally you cannot have a functional system if you put Kafka in between Odoo and the database. You would input data and see it about 2-10 minutes later. I exaggerate, but if that is what you really want to do then by all means invest the time and effort.
Read more from Confluent (the team behind Kafka, which came out of LinkedIn) on how they use a solution called Bottled Water to do some cool streams over PostgreSQL; that should be more like what you want to do.
Do let us know which option you selected and what worked! Keep the community informed.
I've been searching the web about database pooling, but there are still a few things I don't understand, which I hope more experienced developers here can answer.
My understanding of database pooling is that when there are multiple clients connecting to the database, it makes sense to keep the connection in a "cache" and use that to connect to the database faster.
What I fail to understand is how that would help if let's say I have a server that connects to the database and keeps the connection open. When a client requests data from an endpoint, the server will use the same connection to retrieve the data. How would pooling help in that case?
I'm sure I'm missing something in the understanding of pooling. It would make sense if multiple instances of the same database exist and therefore it's decided in the pool which database to connect to using the cached credentials. Is it what happens there?
Also could you give me a scenario where database pooling should be used and when not?
Thanks for clarifying any doubt of mine.
Connection pooling is handled differently in different application scenarios and platforms/languages.
The main consideration is that a database connection is a finite resource, and it costs time to establish it.
Connections are finite because most database servers impose a maximum limit on the number of concurrent connections (this may be part of your licensing terms). If your application uses more connections than the database allows, it may start rejecting connections (leading to error handling requirements in the client app), or making connections wait (leading to poor response times). By configuring a connection pool, the client application can manage these scenarios in a central place.
Secondly, managing connections is a bit of a pain - there are lots of different error handling scenarios, configuration settings etc.; it's a good idea to centralize this. You can, of course, do that without a connection pool.
Thirdly, connections are "expensive" resources - they take time to establish. Most web pages require several database queries; if each query spends just one tenth of a second creating a database connection, you're very quickly spending noticeable time waiting for database connections. By using a connection pool, you avoid this overhead.
In .Net, connection pooling is handled automatically, not directly by your application.
All you need to do is open, use and close your connections normally and it will take care of the rest.
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql-server-connection-pooling
If you're talking about a different platform, the mechanics are different, although the purpose is the same.
In all cases, it's time consuming to open and close connections to the DB server, so something between your application and the DB server (typically the database driver or some sort of middleware) will maintain a pool of open connections and create, kill and recycle them as necessary.
Pooling keeps the connections open and cuts down on the overhead of opening one for each request.
Also could you give me a scenario where database pooling should be used and when not?
Connection pooling is useful in any application that uses the same database connection multiple times within the lifetime of the connection pool.
There would actually be a very slight decrease in performance if you had an application that used a single connection once, then didn't use it again until the connection pool had timed out and recycled. This is extremely uncommon in production applications.
What I fail to understand is how that would help if let's say I have a server that connects to the database and keeps the connection open.
If you have an application that opens a connection and leaves it open, then theoretically pooling would not help. Practically speaking, it's good to periodically kill and recreate various resources like connections, handles, sockets, etc. because software isn't perfect and some code contains resource leaks.
I'm just guessing, but I suspect that your concern is premature optimization. Unless you have done actual testing and determined that connection pooling is a problem, I wouldn't be too concerned with it. Connection pools are generally self-maintaining and almost always improve performance.
An answer from here
Typically, opening a database connection is an expensive operation, so
pooling keeps the connections active so that, when a connection is
later requested, one of the active ones is used in preference to
opening another one.
I understand the concept of a connection pool in DB management. That is the answer to a "what is" question. Developer blog posts, answers, tutorials and DB docs all answer the "what is" question, as if they constantly copy and paste text from one another. Nobody tries to explain the "why" and the "how". The answer above is an example of this.
I can't understand why and how it is possible that keeping, say, 30 open connections in a pool is less costly for a system than opening a new connection when it is required.
Suppose I have a web server located in Australia and a DB in AWS located in the USA, somewhere in Oregon. Or in GB. Or anywhere very far from Australia. Do they all really say that keeping a pool of 20 or more open connections would be less costly in memory and system performance than opening a new connection every time in such a case? How can that be? Why?
In your scenario, I think the biggest problem is network latency. You can't expect communication between servers located in two different continents to be particularly fast. So if you initiate a new connection every time you need one, you'd experience this latency every single time.
From here:
Connecting to a database server typically consists of several time-consuming steps. A physical channel such as a socket or a named pipe must be established, the initial handshake with the server must occur, the connection string information must be parsed, the connection must be authenticated by the server, checks must be run for enlisting in the current transaction, and so on.
Also, if the connection between the webserver and the database uses SSL/TLS, there's a handshake that must be performed on every new connection before the actual communication can occur (in addition to the normal handshake that occurs in normal connections). This handshake is expensive in terms of time.
From here:
Before the client and the server can begin exchanging application data over TLS, the encrypted tunnel must be negotiated: the client and the server must agree on the version of the TLS protocol, choose the ciphersuite, and verify certificates if necessary. Unfortunately, each of these steps requires new packet roundtrips between the client and the server, which adds startup latency to all TLS connections. (...) As the above exchange illustrates, new TLS connections require two roundtrips for a "full handshake"—that’s the bad news
When you use connection pooling, this overhead is paid once per connection rather than once per request, because the connections are kept open and reused. To keep them open, the pool may send something like a 'ping' message to the SQL server from time to time so that idle connections do not time out due to inactivity. Sure, this might consume more memory on your server, but nowadays that's a far less expensive resource, whereas network latency is unavoidable.
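A rough back-of-the-envelope calculation shows the size of the effect. Everything numeric here is an assumption made for illustration: a 150 ms round trip between the two continents, one round trip for the TCP handshake, two for a full TLS handshake (as in the quote above), one for authentication, and one per query. Real drivers and protocols will differ.

```python
def connection_setup_ms(rtt_ms, tcp_round_trips=1, tls_round_trips=2, auth_round_trips=1):
    """Latency paid before the first query can be sent on a new connection."""
    return rtt_ms * (tcp_round_trips + tls_round_trips + auth_round_trips)


def total_latency_ms(queries, rtt_ms, pooled):
    """Network wait for `queries` sequential queries, one round trip each."""
    setup = connection_setup_ms(rtt_ms)
    if pooled:
        return setup + queries * rtt_ms      # handshakes paid once, then reused
    return queries * (setup + rtt_ms)        # handshakes paid on every query


RTT_MS = 150  # assumed round-trip time, not a measured value
unpooled = total_latency_ms(100, RTT_MS, pooled=False)
pooled = total_latency_ms(100, RTT_MS, pooled=True)
```

With these assumed numbers, 100 queries spend 75 seconds waiting on the network when each one opens its own connection, and about 15.6 seconds when one open connection is reused.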
I'm a bit confused about the relationship between a database open session and connection pooling.
To elaborate, I'm using JDBC with Oracle 9i DB and I'm also using a Connection Pool to pool my connections.
What I would like to know is: when my connections are lying idle in the pool, are they associated with any open session on the database? So if I have 5 connections sitting idle in the pool, does that mean there will be 5 corresponding active sessions open on my database?
OK, I got some answers from other forums:
That depends entirely on the pool implementation. It seems likely they are associated with an open session for a while, and then the sessions are closed if the connections are not used for some time, and reestablished when they're needed again.
Not keeping them open for some amount of time would mean wasting the overhead of establishing connections when requests are coming in rapid-fire. Keeping them open forever would hog limited resources for no good reason. Both of these go against my understanding of the very point of having a connection pool in the first place.
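The idle-timeout behaviour described here can be sketched as below. This is a hypothetical illustration, not any particular JDBC pool: IdleEvictingPool and FakeConnection are made-up names, and the clock is injectable so the example can advance time without actually waiting.

```python
import time


class FakeConnection:
    """Hypothetical stand-in for a connection that holds a database session."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True               # the session on the database ends here


class IdleEvictingPool:
    def __init__(self, factory, max_idle_seconds=30.0, clock=time.monotonic):
        self._factory = factory
        self._max_idle = max_idle_seconds
        self._clock = clock
        self._idle = []                  # (connection, time it was returned)

    def acquire(self):
        self.evict_stale()
        if self._idle:
            conn, _ = self._idle.pop()   # still-open session, reused as is
            return conn
        return self._factory()           # nothing idle: open a new session

    def release(self, conn):
        self._idle.append((conn, self._clock()))

    def evict_stale(self):
        now = self._clock()
        fresh = []
        for conn, returned_at in self._idle:
            if now - returned_at > self._max_idle:
                conn.close()             # idle too long: give the session up
            else:
                fresh.append((conn, returned_at))
        self._idle = fresh

    def idle_count(self):
        return len(self._idle)


now = [0.0]                              # fake clock, in seconds
pool = IdleEvictingPool(FakeConnection, max_idle_seconds=30.0, clock=lambda: now[0])
conn = pool.acquire()
pool.release(conn)
now[0] = 10.0
reused = pool.acquire() is conn          # idle for 10 s: session still open
pool.release(conn)
now[0] = 100.0
replacement = pool.acquire()             # idle for 90 s: closed and replaced
```

In this sketch, each idle connection in the pool corresponds to one open session until it exceeds the idle limit, at which point the pool closes it and opens a fresh one on demand.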
Consider a classic ASP site running on IIS6 with a dedicated SQL Server 2008 backend...
Scenario 1:
Open Connection
Do 15 queries, updates etc all through the ASP-page
Close Connection
Scenario 2:
For each query, update etc, open and close the connection
With connection pooling, my money would be on scenario 2 being the most effective and scalable.
Would I be correct in that assumption?
Edit: More information
This is database operations spread over a lot of asp-code in separate functions, doing separate things etc. It is not 15 queries done in rapid succession. Think a big site with many functions, includes etc.
Fundamentally, ASP pages are synchronous. So why not open a connection once per page load, and close it once per page load? All other opens/closes seem to be unnecessary.
If I understand you correctly you are considering sharing a connection object across complex code held in various functions in various includes.
In such a scenario this would be a bad idea. It becomes difficult to guarantee the correct state and settings on the connection if other code may have seen the need to modify them. Also you may at times have code that fetches a firehose recordset and hasn't finished processing when another piece of code is invoked that also needs a connection. In such a case you could not share a connection.
Having each atomic chunk of code acquire its own connection would be better. The connection would be in a clean, known state. Multiple connections can operate in parallel when necessary. As others have pointed out, the cost of connection creation is almost entirely mitigated by the underlying connection pooling.
In your Scenario 2, there is a round trip between your application and SQL Server for each query, which consumes server resources and increases the total execution time.
In Scenario 1, by contrast, there is only one round trip and SQL Server runs all of the queries in one go, so it is faster and less resource-consuming.
EDIT: Well, I thought you meant multiple queries sent at one time.
So, with connection pooling enabled, there is no problem at all in closing the connection after each transaction. Go with Scenario 2.
Best practice is to open the connection once, read all your data and close the connection as soon as possible. AFTER you've closed the connection, you can do what you like with the data you retrieved. In this scenario, you don't open too many connections and you don't open the connection for too long.
Even though your code has database calls in several places, the overhead of creating the connection will make things worse than waiting - unless you're saying your page takes many seconds to create on the server side? Usually, even without controlled data access and with many functions, your page should be well under a second to generate on the server.
I believe the default connection pool is about 20 connections, but SQL Server can handle a lot more. Getting a connection from the server takes the longest time (assuming you are not doing anything daft with your commands), so I see nothing wrong with getting a connection per page and killing it once it has been used.
For scalability, you could run into problems where your connection pool gets too busy and your script times out waiting for a connection to become available, while your DB sits there with 100 spare connections that no one is using.
Create and kill on the same page gets my vote.
From a performance point of view there is no notable difference. ADODB connection pooling manages the actual connections with the db. Adodb.connection .open and .close are just a façade to the connection pool. Instantiating either 1 or 15 adodb.connection objects doesn't really matter performance-wise. Before we were using transactions, we used the connection string in combination with adodb.command (.activeConnection) and never opened or closed connections explicitly.
Reasons to explicitly keep a reference to an adodb.connection are transactions or connection-based functions like MySQL's LAST_INSERT_ID(). In these cases you must be absolutely certain that you are getting the same connection for every query.
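The point about connection-based functions can be demonstrated with SQLite, whose last_insert_rowid() is tracked per connection in much the same way. This is a sketch in Python with SQLite rather than ADODB with MySQL; the table and file names are made up for the example.

```python
import os
import sqlite3
import tempfile


def last_insert_ids():
    """Insert on one connection, then ask two connections for the last insert id."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.db")   # hypothetical database file
        a = sqlite3.connect(path)
        b = sqlite3.connect(path)                # second connection, same database
        try:
            a.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
            a.execute("INSERT INTO items (name) VALUES ('widget')")
            a.commit()
            same = a.execute("SELECT last_insert_rowid()").fetchone()[0]
            other = b.execute("SELECT last_insert_rowid()").fetchone()[0]
            return same, other
        finally:
            a.close()
            b.close()


same_connection_id, other_connection_id = last_insert_ids()
```

The connection that performed the insert reports the new row's id (1), while the other connection reports 0, so a follow-up query that happens to run on a different pooled connection would read the wrong value.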