I have a Spring application which supports a single customer.
I would like to extend this application to support multiple customers, where each customer's data is stored in a separate database. The schema is the same for each customer, and the same DAOs and business logic should remain unchanged.
How would I accomplish this with Spring/JPA? Would I need to have multiple persistence contexts and wire in an appropriate entity manager factory based upon the currently logged in user? Are there any examples of implementing something similar to this?
I would advise against running separate databases under a single application. If a redesign of the data model to incorporate multiple customers is not an option, why don't you run multiple instances of your application server/web container, one for each customer? Otherwise you'll have to deal with the drawbacks of having a shared platform and isolated databases.
With multiple customer databases and a single application, your code will become more complex, you can't guarantee that customer data is fully isolated (e.g. due to a bug in the application a customer could be shown the wrong data, so there isn't much benefit in isolating each customer), and you'll have the nightmare of maintaining each customer database. Also, by having different databases you can virtually guarantee that someone pointy-haired will ask for bespoke functionality for customer A while leaving customer B's functionality untouched, because "... it will be easy, as we've got different databases ...", forgetting that the application is shared.
If you really, really want to have separate databases for particular customers, this would be the way to go — define separate persistence units with the same entity definitions, but different entity manager factory configurations.
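A minimal sketch of that approach with Spring, assuming one DataSource bean per customer is defined elsewhere and using pre-Jakarta (javax) imports; CurrentCustomer and EntityManagerFactoryRouter are hypothetical helpers, not Spring APIs:

```java
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean;
import org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter;

@Configuration
public class MultiCustomerJpaConfig {

    // One entity manager factory per customer database; both scan the same
    // entity classes, so the DAOs and business logic stay unchanged.
    @Bean
    public LocalContainerEntityManagerFactoryBean customerAEntityManagerFactory(DataSource customerADataSource) {
        return entityManagerFactory(customerADataSource, "customerA");
    }

    @Bean
    public LocalContainerEntityManagerFactoryBean customerBEntityManagerFactory(DataSource customerBDataSource) {
        return entityManagerFactory(customerBDataSource, "customerB");
    }

    private LocalContainerEntityManagerFactoryBean entityManagerFactory(DataSource dataSource, String unitName) {
        LocalContainerEntityManagerFactoryBean emf = new LocalContainerEntityManagerFactoryBean();
        emf.setDataSource(dataSource);
        emf.setPersistenceUnitName(unitName);
        emf.setPackagesToScan("com.example.domain"); // identical entity definitions for every customer
        emf.setJpaVendorAdapter(new HibernateJpaVendorAdapter());
        return emf;
    }
}

// Picks the factory for the currently logged-in user; CurrentCustomer is a
// hypothetical ThreadLocal holder populated by your security layer.
class EntityManagerFactoryRouter {

    private final Map<String, EntityManagerFactory> factoriesByCustomer;

    EntityManagerFactoryRouter(Map<String, EntityManagerFactory> factoriesByCustomer) {
        this.factoriesByCustomer = factoriesByCustomer;
    }

    EntityManagerFactory forCurrentUser() {
        return factoriesByCustomer.get(CurrentCustomer.id());
    }
}

class CurrentCustomer {
    private static final ThreadLocal<String> ID = new ThreadLocal<>();
    static void set(String customerId) { ID.set(customerId); }
    static String id() { return ID.get(); }
}
```

If all customers could share a single entity manager factory configuration, an alternative is Spring's AbstractRoutingDataSource, which swaps the underlying DataSource per request instead of the whole persistence unit.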
To me it sounds more like a need to redesign the database structure. I'm guessing that the application was written with only one client in mind and it turned out that more appeared on the horizon, so, hey, let's do something about it, and fast! Aren't you trying to copy-paste, but on a bigger scale? You're going to have a lot of redundancy with JPA if you want to have a few databases with the same structure: for example, everything that's defined inside the mapping file (queries, entity relationship mappings, etc.) is defined per persistence unit, so you'll have to repeat these definitions and keep them all synchronized.
I'll stop here, as it is merely guesswork, for the lack of broader description.
Related
The business domain has five high-level bounded contexts
Customers
Applications
Documents
Decisions
Preforms
Further, these bounded contexts have sub-contexts, like ordering and delivery of the documents. Despite the project consisting of tens of thousands of classes and dozens of EJBs, most of the business logic resides in relational database views and triggers for a reason: a lot of joins, unions and constraints are involved in all business transactions. In other words, there is a complex web of dependencies and constraints between the bounded contexts, which restricts the state transfers. In layman's terms: the business rules are very complicated.
Now, if I were to split this monolith into a database-per-service microservices architecture, with the bounded contexts as the suggested service boundaries, I would have to implement all the business logic with explicit API calls. I would end up with hundreds of APIs implementing all these stupid little business rules. As performance is the main factor (we put a lot of effort into optimizing the SQL as it is now), this is out of the question. Secondly, segregated APIs would probably be a nightmare to maintain in this web of ever-evolving business rules, whereas database triggers actually support high cohesion and the DRY mentality, enforcing the business rules transparently.
I came to the conclusion that a microservices architecture is unsuitable for this type of document management system. Am I correct, or am I approaching the idea from the wrong angle?
First of all, you don't have to have a Microservices architecture. I really mean it! If you were ordered by management/architect to do it, and it doesn't solve any real problems you are having, you are probably right for pushing back.
That being said, and with the disclaimer that I don't know the exact requirements of your application, having "things" as bounded context is a smell. So having "Customers", "Applications", "Documents", etc. as services is very likely the wrong approach.
Bounded contexts should not be CRUD operations on a specific entity. They should be completely independent (or as independent as possible) "vertical" parts of the whole application. Preferably with their own Database and GUI. They should also operate independently of each other, not requiring input from other services for own decisions.
It is the complete opposite of data-centric design, where tables/fields and relations are the core concepts. Here, functionality is the core concept. You would have to split your application along functionality to arrive at a good separation.
I could imagine a document management system having these independent bounded contexts / services: Search, Workflow, Editing, etc.
Here is how you would think about it: Search does not require any (synchronous) input from any other service. It may receive regular, even near-real-time, updates with new documents, but that does not impact its main feature: searching already indexed documents. The GUI is also independent, something like a single Google-like page with a search box, maybe. It can deliver results independently, and would link back to the Workflow or Editing apps when you click on a result.
The others would be similarly independent. Again, the point is to split the services in a way that makes them work independently. If you don't have that, you will only make things worse with Microservices.
First of all, the answer above is correct in suggesting that you need to break up your microservices in a better way.
Now, if scalability is your concern (lots of API calls between microservices):
I strongly suggest you validate how many of the constraints really need to be checked up front, and how many of them you could check asynchronously. What I mean is that in a distributed environment we do not actually need to validate everything at the same time.
Sometimes these things are not directly visible. For example, say there are two services, an order service and a customer service; the order service exposes an API to place an order for a customer id, and the business says you cannot place an order for an unknown customer.
One implementation is to call the customer service synchronously from the order service. In this case the customer service being down will take your service down with it, so let's question whether we really need the check at all.
After all, a scenario could happen where a customer has just placed an order and somebody then deletes that customer from the customer service; now we have an order which doesn't belong to any customer, so consistency cannot be guaranteed anyway.
In the new solution we allow the order service to place the order without checking the customer id and do one of the following (see the sketch below):
Using a process manager, check the customer's validity and mark the order as invalid if the customer is unknown; likewise, when a customer gets deleted, have the process manager mark that customer's orders as invalid or perform the relevant business logic.
Do not check at all, because placing an order doesn't count for much on its own; when the order reaches the dispatch process, that service will check the customer status anyway.
In this way your API hits are reduced and better, more independent services are produced.
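A minimal sketch of the first option, assuming a hypothetical event channel that delivers CustomerDeleted events to the order service; Order, OrderRepository and the handler are all illustrative names, not a real framework API:

```java
import java.util.List;

// Hypothetical pieces of the order service.
interface OrderRepository {
    List<Order> findByCustomerId(String customerId);
    void save(Order order);
}

class Order {
    private String status = "PLACED";
    private String note;

    void markInvalid(String reason) {
        this.status = "INVALID";
        this.note = reason;
    }
}

// Reacts asynchronously to CustomerDeleted events published by the customer
// service, instead of calling the customer service synchronously at order time.
public class CustomerDeletedHandler {

    private final OrderRepository orders;

    public CustomerDeletedHandler(OrderRepository orders) {
        this.orders = orders;
    }

    public void onCustomerDeleted(String customerId) {
        for (Order order : orders.findByCustomerId(customerId)) {
            // Orders that now reference an unknown customer are marked invalid
            // after the fact; downstream steps (e.g. dispatch) skip invalid orders.
            order.markInvalid("customer " + customerId + " no longer exists");
            orders.save(order);
        }
    }
}
```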
Cloud-native applications and the microservices architecture require a decentralized data model (each microservice has its own database), whereas a universal data model is a centralized data model.
So, how can we have a microservices architecture with universal data model patterns?
Is there any reference or implementation of a universal data model with microservices?
In general the two concepts are not compatible. Using a universal data model for all of your services would clash with a couple of key ideas behind using Microservices, e.g. Polyglot Persistence, separate development & deployment of each service. Also, let's not forget that the "Data Model Resource Book" was last updated in 2009.
However, if you must combine the two approaches, e.g. because management insists on it, you can encapsulate all access to the universal data model by a dedicated service and make your other services dependent on it.
Some good thoughts on the subject can be found here: http://plainoldobjects.com/2015/09/02/does-each-microservice-really-need-its-own-database-2/
Yes to #Fritz's point -- universal data modeling and microservices are really two different concepts and are very difficult if not impossible to be used together. I would like to add that the reasoning for polyglot persistence is also because of how the data should be modeled. Microservices allow the use of different data stores that can best model the data according to their domain.
To elaborate more, I don't think it would do justice to mention microservices and data modeling without mentioning domain-driven design. From my experience, domain-driven design really helps in thinking about services, their responsibilities, and their right to exist. For instance, I have often found there to be a collection of services that together carry out a particular domain functionality. An example could be an e-commerce application that has payments, shopping carts, etc. These could be separated into different "bounded contexts", to use domain-driven design terminology.
With the different bounded contexts, each microservice no longer sees the same concept in the system the same way, so in effect there is no real universal data model. The easiest example that I can think of to show this is when you also want reporting on the metrics in the system. If the example were an e-commerce application, the notion of a transaction in the orders microservice is going to be different from a transaction in a reporting service. The reporting service, for instance, may want to know about transactions at a summary level, such as the profit or revenue generated for a particular order, instead of the particular line items in the order. However, from the perspective of the orders service, the order details, such as the line items and the address of the individual that made the purchase, are probably important and should be known. This calls for two different data models.
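As an illustrative sketch (type names are made up, and Java records are used only for brevity), the same purchase could be modelled quite differently by the two services:

```java
import java.math.BigDecimal;
import java.util.List;

// Orders service view: line items and delivery details matter here.
record OrderLineItem(String sku, int quantity, BigDecimal unitPrice) {}
record Order(String orderId, String shippingAddress, List<OrderLineItem> lineItems) {}

// Reporting service view: only aggregate figures per order matter here.
record OrderRevenue(String orderId, BigDecimal revenue, BigDecimal profit) {}
```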
With respect to domain modeling, I may be a bit extreme, but I would go as far as saying that if there are multiple services sharing the same data source, they should really be the same service; there should be only one service for a single data source. My argument is that the domain hasn't been properly modeled, and that the coupling makes it difficult to evolve any one service when multiple services rely on a single data source. It could be that one service requires the schema of the data source to change while the other one does not, but still has to accommodate the schema change. Hope this helps!
For people that are splitting up monolithic applications into microservices, how are you handling the conundrum of breaking apart the database? Typical applications that I've worked on do a lot of database integration for performance and simplicity reasons.
If you have two tables that are logically distinct (bounded contexts, if you will) but you often do aggregate processing on large volumes of that data, then in the monolith you're more than likely to eschew object orientation and instead use your database's standard JOIN feature to process the data on the database prior to returning the aggregated view back to your app tier.
How do you justify splitting up such data into microservices, where presumably you will be required to 'join' the data through an API rather than in the database?
I've read Sam Newman's Microservices book and in the chapter on splitting the Monolith he gives an example of "Breaking Foreign Key Relationships" where he acknowledges that doing a join across an API is going to be slower - but he goes on to say if your application is fast enough anyway, does it matter that it is slower than before?
This seems a bit glib? What are people's experiences? What techniques did you use to make the API joins perform acceptably?
When performance or latency doesn't matter too much (yes, we don't always need them) it's perfectly fine to just use simple RESTful APIs for querying additional data you need. If you need to do multiple calls to different microservices and return one result, you can use the API Gateway pattern.
It's perfectly fine to have redundancy in polyglot persistence environments. For example, you can use a messaging queue for your microservices and send "update" events every time you change something. Other microservices will listen for the events they need and save the data locally. So instead of querying, you keep all the required data in the appropriate storage for each specific microservice.
Also, don't forget about caching :) You can use tools like Redis or Memcached to avoid querying other databases too often.
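For instance, here is a minimal sketch using Spring's cache abstraction; CustomerClient, CustomerDto and the URL are made-up names, and caching has to be switched on elsewhere (e.g. @EnableCaching plus a Redis- or Memcached-backed CacheManager):

```java
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

// Hypothetical client the order service uses to look up customer data.
@Service
public class CustomerClient {

    private final RestTemplate rest = new RestTemplate();

    // The first call for a given id hits the customer service; subsequent calls
    // are served from the "customers" cache until the entry is evicted.
    @Cacheable("customers")
    public CustomerDto findCustomer(String customerId) {
        return rest.getForObject("http://customer-service/customers/{id}",
                CustomerDto.class, customerId);
    }
}

// Minimal DTO for the sketch.
class CustomerDto {
    public String id;
    public String name;
}
```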
It's OK for services to have read-only replicated copies of certain reference data from other services.
Given that, when trying to refactor a monolithic database into microservices (as opposed to a rewrite) I would:
create a db schema for the service
create versioned* views** in that schema to expose data from that schema to other services
do joins against these readonly views
This will let you independently modify table data/structure without breaking other applications.
Rather than use views, I might also consider using triggers to replicate data from one schema to another.
This would be incremental progress in the right direction, establishing the seams of your components, and a move to REST can be done later.
*the views can be extended. If a breaking change is required, create a v2 of the same view and remove the old version when it is no longer required.
**or Table-Valued-Functions, or Sprocs.
CQRS (Command Query Responsibility Segregation) is the answer to this, as per Chris Richardson.
Let each microservice update its own data model and generate events which update a materialized view holding the pre-joined data from the other microservices. This materialized view could be any query-optimized store: a NoSQL DB, Redis or Elasticsearch. This technique leads to eventual consistency, which is definitely not bad, and avoids real-time application-side joins.
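A minimal sketch of the idea, with a hypothetical in-memory map standing in for the NoSQL/Redis/Elasticsearch read store (all names are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Denormalized read model: one entry per order, pre-joined with the customer
// name so queries never need a cross-service join.
class OrderView {
    String orderId;
    String customerId;
    String customerName;
    String status;
}

class OrderViewUpdater {

    private final Map<String, OrderView> viewsByOrderId = new ConcurrentHashMap<>();

    // Called when the order service publishes an OrderPlaced event.
    void onOrderPlaced(String orderId, String customerId, String status) {
        OrderView view = new OrderView();
        view.orderId = orderId;
        view.customerId = customerId;
        view.status = status;
        viewsByOrderId.put(orderId, view);
    }

    // Called when the customer service publishes a CustomerRenamed event;
    // the view is eventually consistent with both source services.
    void onCustomerRenamed(String customerId, String newName) {
        viewsByOrderId.values().stream()
                .filter(v -> customerId.equals(v.customerId))
                .forEach(v -> v.customerName = newName);
    }
}
```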
Hope this answers the question.
I would separate the solutions by area of use, into, let's say, operational and reporting.
For the microservices that provide data for single forms and need data from other microservices (this is the operational case), I think using API joins is the way to go. You will not be going after big amounts of data, and you can do the data integration in the service.
The other case is when you need to run big queries on large amounts of data to do aggregations etc. (the reporting case). For this need I would think about maintaining a shared database, similar to your original schema, and updating it with events from your microservice databases. On this shared database you could continue to use your stored procedures, which would save your effort and preserve the database optimizations.
In microservices you create different read models. For example, if you have two different bounded contexts and somebody wants to search across the data of both, then something needs to listen to events from both bounded contexts and create a view specific to that application.
In this case more storage space will be needed, but no joins are required.
I have been enforcing business rules in both my application tier (models) and my database tier (stored procedures that raise errors).
I've been duplicating my validations in both places for a few reasons:
If the conditions change between when they are checked in the application code and when they are checked in the database, the business rule checks in the database will save the day. The database also allows me to lock various records in a simpler manner than my application code does, so it seems natural to do it there.
If we have to do some batch data insertions/updates directly against the database, routing all of these operations through my stored procedures/functions that perform the business rule validations means there is no chance of me putting in bad data, even though I lack the protections I would get doing single-record input through the application.
While enforcing these things ONLY in the database would have the same effect on the actual data, it seems improper to just throw data at the database before first making a good effort to validate that it conforms to constraints and business rules.
What's the right balance?
You need to enforce rules at the data tier to ensure data integrity. That's your last line of defense, and that's the DB's job: to help enforce its world view of the data.
That said, throwing junk data at the DB for validation is a coarse technique. Typically the errors are designed to be human-readable rather than machine-readable, so it's inefficient for the program to process the error from the DB and make heads or tails of it.
Stored Procedures are a different matter. Back in the day, Stored Procedures were The Way to handle business rules on the data tiers, etc.
But today, with modern application server environments, the app servers have in general become a better place to put this logic. They offer multiple ways to access and expose the data (the web, web services, remote protocols, APIs, etc.). Also, if your rules are CPU-heavy (arguably most aren't), it's easier to scale app servers than DB servers.
The large array of features within the app servers give them a flexibility beyond what the DB servers can do, and thus much of what was once pushed back in to the DBs is being pulled out with the DB servers being relegated to "dumb persistence".
That said, there are certainly performance advantages using Stored Procs and such, but now that's a tuning thing where the question becomes "is it worth losing the app server capability for the gain we get by putting it in to the DB server".
And by app server, I'm not simply talking Java, but .NET and even PHP etc.
If the rule must be enforced at all times, no matter where the data came from or how it was updated, the database is where it needs to be. Remember that databases are affected by direct querying to make changes that affect many records or to do something the application would not normally do. These are things like fixing a group of records when a customer is bought out by another customer and they want to change all the historical data, applying new tax rates to orders not yet processed, or fixing some bad data inputs. Databases are also sometimes affected by other applications which do not use your data layer, and by imports run through ETL programs which likewise cannot use your data layer. So if the rule must be followed in all cases, it must be in the database.
If the rule is only for special cases concerning how this particular input page works, then it needs to be in the application. So if a sales manager has only specific things he can do from his user interface, these things can be specified in the application.
Some things it is helpful to do in both places (see the sketch below). For instance, it is silly to allow a user to put a non-date into an input box that will relate to a date field. The datatype in the database should still be a datetime datatype, but it is best to check some of this stuff before you send it.
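A minimal, made-up illustration of that "check in both places" idea: the application rejects an obviously bad value early, while the database column's date/datetime type (or a constraint) remains the final safety net. OrderForm and the field name are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

public class OrderForm {

    // Application-side check: fail fast with a friendly message instead of
    // letting the database reject a non-date with a cryptic error.
    public static LocalDate parseDeliveryDate(String input) {
        try {
            return LocalDate.parse(input); // expects ISO format, e.g. 2024-05-01
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("Please enter a valid delivery date (yyyy-MM-dd).", e);
        }
    }
    // The delivery_date column in the database is still declared as a DATE type,
    // so data coming from batch imports or other applications is also protected.
}
```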
Your business logic can sit in either location, but should not be in both. The logic should NOT be duplicated because it's easy to make a mistake trying to keep both in sync. If you put it in the model you'll want all data access to go through your models, including batch updates.
There will be trade-offs to putting it in the database vs the application models (here are a few off the top of my head):
Databases can be harder to maintain and update than applications
It's easier to distribute load if it's in the application tier
Multiple, disparate dbs may require splitting business rules (which may not be possible)
Here's my problem. I built a web app, and naturally kept the data in a database which describes that app's domain. Afterwards, I built another web app for the same organization, and used a separate database to describe that app's domain and store its data... and naturally a couple more projects came up, and for each app I've isolated its data in its own database. Development-wise, I think it's OK, as I can maintain changes to the data structure and data in that app's database.
Considering these apps belong to the same organization, there tends to be plenty of data replicated between them, like department names, job titles, shop names, etc. Most of these tables hold the same data, but are not exactly the same in each database, and are not always used by all of the apps. Changes to this data, though, need to be made in all the apps (sometimes in different ways), creating a growing management hassle.
So I've been thinking of a way to get some synchronization between the data. I want easier management: update in one app (or a central app) and have all the databases updated as needed by each app. I also want a better way to share data between apps (like maybe mashing up data from different apps in a new app to allow specific analysis). Most of the data I'm referring to is used as constraints more than being a core domain concept, describing the organization rather than a particular domain.
I'm looking for opinions on some ways to get this done.
My first idea was to grab common data structures, like the department names table I mentioned, and stick them in a core database. Any updates to the data would be done in this database, through a dedicated web app, and I'd apply some sort of Observer or Publisher/Subscriber pattern for these changes: on a change, the app would notify observing apps (through their dedicated web services) that the change occurred and allow each app to grab the new data and use it as it needs (see the sketch after this paragraph). GUIDs could be used as a reference to identify the same data throughout the apps. Also, I could build web services for read and search operations that don't need to be in a specific app's database, but could be useful to it.
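A minimal sketch of that first idea, with made-up names: the core app pushes a change notification to each registered observer endpoint, and each app then pulls the data it cares about.

```java
import java.util.List;
import java.util.UUID;

// Hypothetical publisher inside the central "master data" web app.
class MasterDataPublisher {

    // Each observing app registers the endpoint of its notification web service.
    private final List<ObserverEndpoint> observers;

    MasterDataPublisher(List<ObserverEndpoint> observers) {
        this.observers = observers;
    }

    // Called after a department name is changed in the core database; the GUID
    // identifies the same department record across all the apps' databases.
    void departmentRenamed(UUID departmentId, String newName) {
        for (ObserverEndpoint observer : observers) {
            observer.notifyChanged("department", departmentId);
        }
    }
}

// Each app implements this, typically as a thin web service; on notification it
// fetches the new values from the core app and updates its own copy as needed.
interface ObserverEndpoint {
    void notifyChanged(String entityType, UUID entityId);
}
```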
A second idea would be for each app to manage its own data, and the apps could observe one another. A change in one could notify the others that share the same data structure that the change occurred. I could still use GUIDs and even build services on any of the apps. I think this would also mean less duplication of data, but it might be harder to manage, as each app would eventually be coupled to other apps, and I would somehow have to distribute responsibilities as to which app controls which information.
I'm really curious as to whether something of this genre of data distribution and syncing would work and even be recommended. Opinions and other ideas are more than welcome!
What you describe here is a typical case for a "Master Data Management" (MDM) system. EAI vendors (Oracle, TIBCO, IBM) offer such products. They resemble your first solution: a centralised database with synchronization processes that detect changes in external data sources, grab the changes and synchronize the data out to other external databases. They also provide a user interface to change master data directly.
MDM software is expensive, but you can implement a custom solution which will be, at least initially, cheaper than purchasing one. Both of your solutions make technical sense, but there is a difference in their manageability.
The first one is better, if you can dedicate a responsible person/organization to take care of it and the business owners of your services can agree on making changes via this new centralised system.
The second solution shares the responsibility between the service owners. The hard task here is to identify the owner of each type of information (business object).
I cannot advise a solution without a deeper knowledge of your systems and organizations, but I hope I could give some ideas.