I'm working on a web application which it's sections are in separated projects, as a module, and these modules contain WCF services which call each other in some cases.
The question is that if each module has own connection to the database does it affect on performance or I should find a way to share the connection per each web request.
Imaging that in some scenarios four or five modules are involved, it means that each request would create more than five connection that immediately would be closed after execution.
I think the best practice is using connection pool.
think about what will happen if two modules going to participate in a transaction.
if you use separate connections in one transaction you will end up with distributed transaction that of course will suffer the performance.
Related
Currently I am realising that I come to the limits of relational databases. Having n>>1000 clients periodically (each 10 seconds) requesting a simple microservice endpoint (signals its activity), I just have to make a decision if a client is active or not (3 periods of requests missing).
The naive approach would be to make a database update for each clients request (which has bad performance impacts and hence would not be a suitable solution).
Now, having more instances of those services (for future service scaling), any bulk processing (e.g. by using a periodically processed queue structure) would automatically introduce database locking issues due to subsequent requests of the same clients by different instances (this client id would be part of multiple database bulk updates by different services). Hence, those updates would partially fail even for clients not beeing processed in parallel.
Is there any way to make this processing reliable and high-performant using other (e.g. in-memory) database solutions like Redis or somehing else? In Redis I don't get the trick to manage keys and values in a way to retrieve all keys that were not changed during the last x periods.
Do you have any ideas about well-established technologies to cope with that issue?
We are trying to convert our monolithic application to a micro services based architecture. We use Postgresql as one of our database in the monolithic application with BoneCP for connection pooling.
When this monolith is split to a number of independent micro-services with each of them running in a different JVM, I can think about two options for connection pooling
BoneCP or any decent connection pool for each microservice - My initial research shows that this is the primary choice. It is possible to have a fine grained control of connection requirements for each service.But, down side is that as the number of services increase, number of connection pool also increases and eventually there will be too many idle connections assuming that minimum connections in each pool is greater than 0.
Rely on database specific extensions like PGBouncer - This approach has the advantage that connection pool is managed by a central source rather than a pool for each micro service and hence number of idle connections can be brought down. It is also language/technology agnostic. Down side is that these extensions are database specific and some of the functionalities in JDBC may not work. For eg: Prepared statments may not work with PGBouncer in Transaction_Pooling mode.
In our case most of the micro-services(at least 50) will be connecting to the same Postgres server even though the database can be different. So, if we go with option 1, there is a higher chance of creating too many idle connections.The traffic to most of our services are very moderate and the rationale behind moving to micro-service is for easier deployment, scaling etc.
Has anyone faced a similar problem while adopting micro-services architecture? Is there a better way of solving this problem in micro-service world?
I don't see how pgbouncer will solve any of the problems you would have with the first approach. There are many reasons to use pgbouncer but I don't think they are really applicable here.
Also, in my experience, while idle connections can be an issue, they probably will not be on the scale you are talking about. I mean we are not talking hundreds of idle connections right?
More critically, one key thing that a microservices approach would give you is an ability to move dbs off to other servers. If you do this, then having your connection pool centrally managed makes this harder to do.
Per-service pool is generally more flexible and it makes your infrastructure quite a bit more flexible too.
I have responded a similar question here: Microservices - Connection Pooling when connecting to a single legacy database
"I am facing a similar dilemma at my work and I can share the conclusions we have reached so far.
There is no silver bullet at the moment, so:
1 - Calculate the number of connections dividing the total desired number of connections for the instances of microservices will work well if you have a situation where your microservices don't need to drastically elastic scale.
2 - Not having a pool at all and let the connections be opened on demand. This is what is being used in functional programming (like Amazon lambdas). It will reduce the total number of open connections but the downside is that you lose performance as per opening connections on the fly is expensive.
You could implement some sort of topic that let your service know that the number of instances changed in a listener and update the total connection number, but it is a complex solution and goes against the microservice principle that you should not change the configurations of the service after it started running.
Conclusion: I would calculate the number if the microservice tend to not grow in scale and without a pool if it does need to grow elastically and exponentially, in this last case make sure that a retry is in place in case it does not get a connection in the first attempt.
There is an interesting grey area here awaiting for a better way of controlling pools of connections in microservices.
In time, and to make the problem even more interesting, I recommend reading the
article About Pool Sizing from HikariCP: https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing The ideal concurrent connections in a database are actually smaller than most people think."
Maybe group some smaller number of microservices into modulith and use karaf, or other osgi container as a runtime for them. Then you can create bundle that will represent a connection-pool for your database so other bundles — microservices can use it. But I'm not sure if it will solve your architecture problem.
Let's say you have the limiting requirement - only 10 connections to the database.
You can run 10 instances of the microservice with the connection pool limited to 1 connection max. Or you can run 3 instances with pool max=3.
The centralized connection pool, which would serve multiple services in the cloud, sounds bad (the typical single point of failure).
We have an affiliate system which counts millions of banner Impressions/Clicks per day.
Currently it writes to SQL every Impression/Click that occurs in real time on each request.
Web application serves these requests.
We are facing two problems:
If we have a lot of concurrent requests per second, the SQL is
starting to work very hard to insert the Impressons/Clicks data and
as a result lead to problem #2.
If SQL is slow at the moment, the requests are being accumulated and
are waiting in the queue on web server. As a result we have a
slowness on a web application and requests are not being processed.
Design we thought of in high level:
We are now considering changing the design by taking out the writing to SQL logic out of web application (write it to some local storage instead) and making a stand alone service which will read from local storage and eventually write the aggregated Impressions/Clicks data (not in real time) to SQL in background.
Our constraints:
10 web servers (load balanced)
1 SQL server
What do you think of suggested design?
Would you use NoSQL as local storage for each web server?
Suggest your alternative.
Your problem seems to be that your front-end code is synchronusly blocking while waiting for the back-end code to update the database.
Decouple front-end and back-end, e.g. by putting a queue inbetween where the front-end can write to the queue with low latency and high throughput. The back-end then can take its time to process the queued data into their destinations.
It may or may not be necessary to make the queue restartable (i.e. not losing data after a crash). Depending on this, you have various options:
In-memory queue, speedy but not crash-proof.
Database queue, makes sense if writing the raw request data to a simple data structure is faster than writing the final data into its target data structures.
Renundant queues, to cover for crashes.
I'm with Bernd, but I'm not sure about using a queue specifically.
All you need is something asynchronous that you can call; that way the act of logging the impression is pretty much redundant.
I have a j2ee webapp that's being used internally by ~20-30 people.
There is no chance of significant growth in the number of users.
From what I understood there's a trade-off between opening a new DB connection for each request made to the webapp (expensive, but doesn't block other users when the DB is in use), to using the singleton pattern (doesn't open new connections but only allows one user at a time).
I thought that since I know that only 30 users will ever use my webapp at the same time, maybe the simplest and best solution would be to store the connection as a session attribute, thus reducing to a minimum the amount of openings made, while still allocating one connection per user.
What do you think?
From what I understood there's a
trade-off between opening a new DB
connection for each request made to
the webapp
That is what connection pools are for. If you use a connection pool in your application, the pool once initialized, is in charge of providing connections for use in the application as and when needed. In a properly tuned connection pool, there are going to be enough connections created on reserve that can be provided to the application, mitigating the need to create and open a connection only when the application requests for it.
I thought that since I know that only
30 users will ever use my webapp at
the same time, maybe the simplest and
best solution would be to store the
connection as a session attribute
Per-user connections are not a good idea, primarily when a web application is concerned. In a web application, it is perfectly possible for users to initiate multiple requests to the server (think multi-tabbed browsing). In such a case, the use of a single connection per user will result in weird application behavior, unless you synchronize access to the connection.
One must also consider the side-effect of putting transient attributes into the session - Connection objects are not serializable and hence must be marked transient. If the session is deserialized at some point, one has to account for the fact that the Connection object will not be available, and must be re-initialized.
I think you're getting into premature optimization especially given the scale of the application. Opening a new connection is not that expensive and like Makach says, most modern RDBMSs handle connection pooling and will hold connections open for subsequent requests. You'd be trying to write better code than the compiler, so to speak.
No. Don't do that. It's perfectly ok to reconnect to the database every time you need to. Any database management system will do their own connection pool caching I think.
If you want to try to keep open connections you'll make it incredible hard for yourself to manage this in a secure, bug-free, safe etc way.
I know this has been asked before here there and everywhere but i can't get a clear explanation so i'm going to pitch it again. So what is all of the fuss about using a singleton to control the db connection in your web app? Some like it some hate it i don't understand it. From what I've read, "it's to ensure that there is always only one active connection to your DB". I mean why is that a good thing? 1 active DB connection on a data driven web app processing multiple requests per second spells trouble doesn't it? For whatever reason nobody can properly explain this. I've been all over the web. I know i'm thick.
Assuming Java here, but is relevant to most other technologies as well.
I'm not sure whether you've confused the use of a plain singleton with a service locator. Both of them are design patterns. The service locator pattern is used by applications to ensure that there is a single class entrusted with the responsibility of obtaining and providing access to databases, files, JMS queues, etc.
Most service locators are implemented as singletons, since there is no need for multiple service locators to do the same job. Besides, it is useful to cache information obtained from the first lookup that can be later used by other clients of the service locator.
By the way, the argument about
"it's to ensure that there is always
only one active connection to your DB"
is false and misleading. It is quite possible that the connection can be closed/reclaimed if left inactive for quite a long period of time. So caching a connection to the database is frowned upon. There is one deviation from this argument; "re-using" the connection obtained from the connection pool is encouraged as long as you do so with the same context, i.e. within the same HTTP request, or user request (whichever is applicable). This done obviously, from the point of view of performance, since establishing new connections can prove to be an expensive operation.
High-performance (or even medium-performance) web apps use database connection pooling, so one DB connection can be shared among many web requests. The singleton is usually the object which manages this pool. I think the motivation for using a singleton is to idiot-proof against maintenance programmers that might otherwise instantiate many of these objects needlessly.
"it's to ensure that there is always only one active connection to your DB." I think that would be better stated as to ensure each CLIENT has only one active connection to your DB. The reason why this is incredibly important is because you want to prevent deadlocks. If I have TWO open database connections (as a client) I might be updating on one connection, then I might try to update the same row in another connection. This will a deadlock which the database cannot detect. So, the idea of the singleton is basically to make sure that there is ONE object who is charge of handing out database connections to each client. Basically. You don't HAVE to have a singleton for this, but most people will tell you it just makes sense that the system only has one.
You're right--usually this isn't what you want.
However, there are plenty of cases where you need to throttle yourself down to a single connection. By serializing your access to the database through a singleton, you can address other issues or constraints like load, bandwidth, etc.
I've done something similar in the past for a bulk processing app. Instead, though, I used a semaphore to synchronize access to the database so I could allow n concurrent db operations.
One might want to use a singleton due to database server constraints, for example, a server might limit the number of connections.
My main conscious reason is that you know what connections can be managed/closed etc., just makes things a bit more organised when you don't have unnecessary, redundant connections.
I don't think it's a simple answer. For instance on ASP.NET, the platform implements connection pooling by default, so it will automatically adjust a "pool" of connections and re-use them so you're not constantly creating and destroying expensive objects.
However, let's say you were writing a data collection application that monitored 200 separate input sources. Every time one of those inputs changed, you fire off a thread that records the event to the database. I would say that could be a bad design if there's a chance that even a fraction of those could fire off at the same time. Suddenly having 20 or 40 active database connections is inefficient. It might be better to queue the updates, and as long as there are updates left in the queue, a singleton connection picks them off the queue and executes them on the server. It's more efficient because you only have to negotiate the connection and authentication once. Once there's no activity for a while you could choose to close down the connection. This kind of behavior would be hard to implement without a central resource manager like a singleton.
"only one active connection" is a very narrow statement for illustration. It could just as well be a singleton managing a pool of connection. The point of a singleton for database connections is that you don't want every consumer making it's own connection or set of connections.
I think you might want to be more specific about, "using a singleton to control the db connection in your web app." Ideally, a java.sql.Connection object will not be thread safe, but your javax.sql.DataSource may want to pool connections, so you should go to a single instance of it to share the pooling.
you are more looking for one connection per request, not one connection for the entire application. you can still control access to it through a singleton though (storing the connection in the HttpContext.Items collection).
It guarantees that each client using your site only gets one connection to the db.
You really do not want a new connection being made everytime a user does an action that will create a db query. Not only for performance reasons with the connection handshaking involved, but to decrease load on the db server.
DB connections are a precious commodity, and this technique helps minimize the amount used at any given time.