I have an Odoo front end on an AWS EC2 instance, connected to a PostgreSQL database hosted on ElephantSQL with a limit of 15 concurrent connections.
I want to make sure this connection limit will not cause problems, so I would like to use Kafka to perform the database writes instead of having Odoo do them directly, but I found no resources online to help me out.
Is your issue about connection pooling? The PostgreSQL JDBC driver includes two implementations of DataSource for JDBC 2 and two for JDBC 3, as shown here. The pooling implementation is configured with properties such as these:
dataSourceName (String): Every pooling DataSource must have a unique name.
initialConnections (int): The number of database connections to be created when the pool is initialized.
maxConnections (int): The maximum number of open database connections to allow. When more connections are requested, the caller will hang until a connection is returned to the pool.
The pooling implementations do not actually close connections when the client calls the close method, but instead return the connections to a pool of available connections for other clients to use. This avoids any overhead of repeatedly opening and closing connections, and allows a large number of clients to share a small number of database connections.
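The behaviour described above (a fixed cap, callers waiting at the cap, and "close" returning the connection to the pool) can be sketched in a few lines of Python. This is only an illustration of the idea, not the JDBC driver's implementation: FakeConnection and SimplePool are hypothetical names, and the parameters merely mirror initialConnections and maxConnections.

```python
import queue
import threading
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""


class SimplePool:
    """Bounded pool: 'close' becomes release(), and callers wait at the cap."""

    def __init__(self, initial_connections, max_connections):
        self._max = max_connections
        self._created = 0
        self._lock = threading.Lock()
        self._idle = queue.Queue()
        # Open the initial connections up front, like initialConnections.
        for _ in range(initial_connections):
            self._idle.put(self._new())

    def _new(self):
        self._created += 1
        return FakeConnection()

    def acquire(self, timeout=None):
        # Prefer a connection that is already open and idle.
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            pass
        # None idle: open another one only while under the cap.
        with self._lock:
            if self._created < self._max:
                return self._new()
        # At the cap: block until another caller releases a connection.
        # Raises queue.Empty if the timeout expires first.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        # "Closing" just hands the open connection back for reuse.
        self._idle.put(conn)

    @contextmanager
    def connection(self, timeout=None):
        conn = self.acquire(timeout)
        try:
            yield conn
        finally:
            self.release(conn)
```

With something like SimplePool(initial_connections=2, max_connections=15), any number of request handlers can share at most 15 connections by wrapping their work in "with pool.connection() as conn:".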
Additionally, you might want to investigate PgBouncer. PgBouncer is a stable, in-production connection pooler for PostgreSQL. PostgreSQL doesn’t realise the presence of PgBouncer. PgBouncer can do load handling, connection scheduling/routing and load balancing. Read more from this blog, which shows how to integrate it with Odoo. There are a lot of references on this page.
Finally, I would second OneCricketeer's comment, above, on using Amazon RDS, unless you are getting a far better deal with ElephantSQL.
On using Kafka, you have to realise that Odoo is a front-end application that is synchronous to user actions, so you cannot architecturally have a functional system if you put Kafka between Odoo and the database. You would input data and see it in about 2-10 minutes. I exaggerate, but if that is what you really want to do, then by all means invest the time and effort.
Read more from Confluent, the team behind Kafka that came out of LinkedIn, on how they use a solution called Bottled Water to do some cool streams over PostgreSQL; that should be more like what you want to do.
Do let us know which option you selected and what worked! Keep the community informed.
According to Datomic's Connection documentation:
Datomic connections do not adhere to an acquire/use/release pattern.
They are thread-safe, cached, and long lived. Many processes (e.g.
application servers) will never call release.
I'm interested to know how this is achieved in practice, specifically for SQL connections. From the client/user perspective this is great, as you don't need to worry about the thread pool at all, which significantly simplifies client code and what you need to reason about. It's something I'd love to replicate in other applications with SQL connections.
Breaking down the question into smaller parts:
What challenges needed to be considered when treating Datomic connections as long lived?
Is the approach suitable in general when dealing with JDBC connections, or is it only suitable for a sub class of problems (including Datomic's)?
I can see that Tomcat's JDBC connection pool is used under the hood, how is this pooling used to achieve long lived connections from the Datomic connection perspective?
In practice when do you use separate JDBC connections behind the scenes e.g. do you use separate connections for reads vs writes?
"It depends" :)
For example, if you have memcached set up, the Datomic peer potentially doesn't have to talk to the sql database at all. Datomic uses sql to fetch chunks of encoded data, not to do structured queries, so if a block is present in memcached, no sql is needed. Also, the peer will get new chunks sent to it by the transactor so if you're particularly lucky, everything is already available in your peer before you run your first query.
If a chunk is not already in the peer, and is not already in memcached, the peer needs to connect to the sql database to fetch chunks. But this all happens under the hood, and is managed by the tomcat connection pool as you mentioned. Generally, the idea is that for a query to run successfully, the index has to pull any missing chunks from storage (memcached, sql, ...), and this happens in a lazy fashion. But the datomic connection itself "lives forever", i.e. this is all managed for you, and you don't have to create N connections depending on the amount of traffic your peer has etc etc.
As for writes, those go through the transactor and do not connect directly to storage. Writes are represented as the EDN datastructure we're all familiar with (the list of lists with db/add etc etc), and are shipped off to the transactor, put on a queue and processed in sequence. The transactor then connects directly to storage when it needs to, but that's obviously a separate concern that does not affect the peer in any way.
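The "queue, then process in sequence" part can be illustrated with a small Python sketch. It is not how Datomic is implemented; apply_in_sequence and the in-memory applied list are hypothetical, and the list merely stands in for storage. The point is that many threads may submit writes, but only one writer applies them, one at a time.

```python
import queue
import threading


def apply_in_sequence(batches):
    """Submit each batch from its own thread; apply everything on one writer."""
    tx_queue = queue.Queue()
    applied = []  # stands in for storage; only the writer thread appends to it

    def writer():
        while True:
            tx = tx_queue.get()
            if tx is None:  # sentinel: all producers have finished
                return
            applied.append(tx)  # one transaction at a time, in queue order

    def producer(batch):
        for tx in batch:
            tx_queue.put(tx)

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    producers = [threading.Thread(target=producer, args=(b,)) for b in batches]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    tx_queue.put(None)  # tell the writer there is nothing more to apply
    writer_thread.join()
    return applied
```

Writes from different submitters may interleave, but each submitter's own writes are applied in the order it sent them, and the submitters never touch storage themselves.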
I hope this was clarifying :)
I've been researching the whole web about database pooling, but I still don't understand a few things, which I hope more experienced developers can answer here.
My understanding of database pooling is that when there are multiple clients connecting to the database, it makes sense to keep the connection in a "cache" and use that to connect to the database faster.
What I fail to understand is how that would help if let's say I have a server that connects to the database and keeps the connection open. When a client requests data from an endpoint, the server will use the same connection to retrieve the data. How would pooling help in that case?
I'm sure I'm missing something in the understanding of pooling. It would make sense if multiple instances of the same database exist and therefore it's decided in the pool which database to connect to using the cached credentials. Is it what happens there?
Also could you give me a scenario where database pooling should be used and when not?
Thanks for clarifying any doubt of mine.
Connection pooling is handled differently in different application scenarios and platforms/languages.
The main consideration is that a database connection is a finite resource, and it costs time to establish it.
Connections are finite because most database servers impose a maximum limit on the number of concurrent connections (this may be part of your licensing terms). If your application uses more connections than the database allows, the database may start rejecting connections (leading to error handling requirements in the client app), or making connections wait (leading to poor response times). By configuring a connection pool, the client application can manage these scenarios in a central place.
Secondly, managing connections is a bit of a pain - there are lots of different error handling scenarios, configuration settings etc.; it's a good idea to centralize this. You can, of course, do that without a connection pool.
Thirdly, connections are "expensive" resources - they take time to establish. Most web pages require several database queries; if each query spends just one tenth of a second creating a database connection, you're very quickly spending noticeable time waiting for database connections. By using a connection pool, you avoid this overhead.
In .Net, connection pooling is handled automatically, not directly by your application.
All you need to do is open, use and close your connections normally and it will take care of the rest.
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql-server-connection-pooling
If you're talking about a different platform, the mechanics are different, although the purpose is the same.
In all cases, it's time consuming to open and close connections to the DB server, so something between your application and the DB server (typically the database driver or some sort of middle-ware) will maintain a pool of open connections and create, kill and recycle them as necessary.
Pooling keeps the connections open and cuts down on the overhead of opening one for each request.
Also could you give me a scenario where database pooling should be used and when not?
Connection pooling is useful in any application that uses the same database connection multiple times within the lifetime of the connection pool.
There would actually be a very slight decrease in performance if you had an application that used a single connection once, then didn't use it again until the connection pool had timed out and recycled. This is extremely uncommon in production applications.
What I fail to understand is how that would help if let's say I have a server that connects to the database and keeps the connection open.
If you have an application that opens a connection and leaves it open, then theoretically pooling would not help. Practically speaking, it's good to periodically kill and recreate various resources like connections, handles, sockets, etc. because software isn't perfect and some code contains resource leaks.
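One common way pools "kill and recreate" connections is an idle timeout: a connection that has sat unused for too long is closed rather than reused. The following is a minimal Python sketch of that policy only. RecyclingPool, factory and max_idle_seconds are hypothetical names, the connection object is assumed to have a close() method, and the sketch is deliberately not thread-safe.

```python
import time


class RecyclingPool:
    """Reuses idle connections, but discards any that sat idle for too long."""

    def __init__(self, factory, max_idle_seconds, clock=time.monotonic):
        self._factory = factory            # callable that opens a connection
        self._max_idle = max_idle_seconds
        self._clock = clock                # injectable so it can be tested
        self._idle = []                    # (connection, time it was returned)

    def acquire(self):
        now = self._clock()
        while self._idle:
            conn, returned_at = self._idle.pop()
            if now - returned_at <= self._max_idle:
                return conn                # still fresh: reuse it
            conn.close()                   # stale: kill it and keep looking
        return self._factory()             # nothing usable: open a new one

    def release(self, conn):
        self._idle.append((conn, self._clock()))
```

An application that uses a connection once and then waits longer than the idle timeout pays the open cost every time, which is the rare case described above where pooling gives no benefit.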
I'm just guessing, but I suspect that your concern is premature optimization. Unless you have done actual testing and determined that connection pooling is a problem, I wouldn't be too concerned with it. Connection pools are generally self-maintaining and almost always improve performance.
An answer from here
Typically, opening a database connection is an expensive operation, so
pooling keeps the connections active so that, when a connection is
later requested, one of the active ones is used in preference to
opening another one.
I understand the concept of a connection pool in DB management. That is the answer to a "what is ~" question. All the developer blog posts, answers, tutorials and DB docs out there always answer the question "what is", as if they constantly copy and paste text from one another. Nobody tries to explain "why is it so" and "how". The answer above is an example of this.
I can't understand why and how it is possible that keeping, say, 30 open connections in a pool is less costly for a system than opening a new connection when it is required.
Suppose I have a web server located in Australia and a DB in AWS located somewhere in Oregon, USA. Or in GB. Or anywhere, but very far from Australia. So everyone says that keeping a pool of 20 or more open connections would be less costly for memory and system performance than opening a new connection every time in such a case? How can that be? Why?
In your scenario, I think the biggest problem is network latency. You can't expect communication between servers located in two different continents to be particularly fast. So if you initiate a new connection every time you need one, you'd experience this latency every single time.
From here:
Connecting to a database server typically consists of several time-consuming steps. A physical channel such as a socket or a named pipe must be established, the initial handshake with the server must occur, the connection string information must be parsed, the connection must be authenticated by the server, checks must be run for enlisting in the current transaction, and so on.
Also, if the connection between the webserver and the database uses SSL/TLS, there's a handshake that must be performed on every new connection before the actual communication can occur (in addition to the normal handshake that occurs in normal connections). This handshake is expensive in terms of time.
From here:
Before the client and the server can begin exchanging application data over TLS, the encrypted tunnel must be negotiated: the client and the server must agree on the version of the TLS protocol, choose the ciphersuite, and verify certificates if necessary. Unfortunately, each of these steps requires new packet roundtrips between the client and the server, which adds startup latency to all TLS connections. (...) As the above exchange illustrates, new TLS connections require two roundtrips for a "full handshake"—that’s the bad news
When you use connection pooling, this overhead is avoided because connections stay open and are reused. The pool typically sends something like a 'ping' message to the SQL server from time to time, to keep a connection from timing out due to inactivity. Sure, this might consume more memory on your server, but nowadays that's a far less expensive resource. Latency in networks is unavoidable.
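To put rough numbers on this, here is a back-of-the-envelope calculation in Python. The figures are assumptions, not measurements: a 150 ms round trip between Australia and the US, one round trip for the TCP handshake, the two round trips for a full TLS handshake mentioned in the quote above, and a single round trip for database authentication. Server processing time is ignored.

```python
def new_connection_cost_ms(rtt_ms, use_tls=True, auth_round_trips=1):
    """Latency spent on round trips before the first query can be sent."""
    tcp_round_trips = 1                    # TCP three-way handshake
    tls_round_trips = 2 if use_tls else 0  # full TLS handshake
    return (tcp_round_trips + tls_round_trips + auth_round_trips) * rtt_ms


def query_cost_ms(rtt_ms, pooled):
    """Latency of one query: one round trip, plus setup if not pooled."""
    setup = 0 if pooled else new_connection_cost_ms(rtt_ms)
    return setup + rtt_ms


if __name__ == "__main__":
    rtt = 150  # assumed round-trip time in milliseconds
    print(query_cost_ms(rtt, pooled=False))  # 750
    print(query_cost_ms(rtt, pooled=True))   # 150
```

Under these assumptions every unpooled query pays 600 ms of setup before it does any work, while an idle pooled connection costs only some memory on both ends.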
I have a web application that uses Node.js & a connection to an Oracle Database.
Currently my architecture between Node and the DB uses one connection, that stays open. The issue is that some queries take a long time to return, thus blocking subsequent queries until the first returns.
If I open a new connection on each request this does not happen, and the subsequent queries will return before the first (long) one.
The question is what is best practice? Does each request merit a new connection to the database to be closed on callback, should I prioritize queries that I know to take significant time with their own connection, Or is a single connection correct?
Many thanks in advance for your thoughts.
You could use the generic-pool module, which is a generic resource pool for reusing expensive resources such as database connections.
The general idea is that you create a connection pool with a certain number of connections (10 by default).
Connections are reused, they will be kept for a certain max idle time (30 seconds by default).
I use this module for Oracle database in production, and found no issues so far.
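Why a pool fixes the blocking described in the question can be shown with a tiny simulation, written in Python rather than Node for brevity. It assumes all queries arrive at time 0 and that each connection runs one query at a time; completion_times and the durations are hypothetical, and no real database or pool library is involved.

```python
import heapq


def completion_times(durations, connections):
    """Return when each query finishes, given a fixed number of connections."""
    free_at = [0] * connections  # min-heap: when each connection is next free
    heapq.heapify(free_at)
    finished = []
    for duration in durations:
        start = heapq.heappop(free_at)  # take the connection that frees first
        end = start + duration
        finished.append(end)
        heapq.heappush(free_at, end)
    return finished


if __name__ == "__main__":
    queries = [10, 1, 1]  # one slow query followed by two fast ones
    print(completion_times(queries, connections=1))  # [10, 11, 12]
    print(completion_times(queries, connections=2))  # [10, 1, 2]
```

With a single connection the fast queries wait behind the slow one; with even two pooled connections they finish first, without paying for a brand new connection on every request.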
I've noticed that on a NopCommerce site we host (which uses Entity Framework) that if I run a crawler on the site (to check for broken links) it knocks the entire webserver offline for a few minutes and no other hosted sites respond. This seems to be because Entity Framework is opening 30-odd database connections and runs hundreds of queries per second (about 20-40 per page view).
I cannot change how EF is used by NopCommerce (it would take weeks) or change the version of EF being used, so can I mitigate the effects it has on SQL Server by limiting how many concurrent connections it uses, to give other sites hosted on the same server a fairer chance at database access?
What I'm ideally looking to do, is limit the number of concurrent DB connections to about 10, for a particular application.
I think the best you can do is use the Max Pool Size setting in the connection string. This limits the maximum number of connections in the connection pool, and I think this means that's the maximum number of connections the application will ever use. What I'm not sure of though, is if it can't get a connection from the pool, will it cause an exception. I've never tried limiting the connections in this manner.
Here's a little reading on the settings you can put in an ADO.NET connection string:
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlconnection.connectionstring%28v=vs.100%29.aspx
And here's a little more reading on "Using Connection Pooling":
http://msdn.microsoft.com/en-us/library/8xx3tyca%28v=vs.100%29.aspx