Microservices unsuitable for business domain? - database

The business domain has five high-level bounded contexts
Customers
Applications
Documents
Decisions
Preforms
Further, these bounded contexts has sub-contexts like ordering and delivery of the documents. Despite the project of consisting ten of thousands of classes and dozens of EJB's, most of the business logic resides in relational database views and triggers for a reason: A lot of joins, unions and constraints involved in all business transactions. In other words, there is complex web of dependencies and constraints between the bounded contexts, which restricts the state transfers. In layman terms: the business rules are very complicated.
Now, if I were to split this monolith to database per service microservices architecture, bounded contexts being the suggested service boundaries, I will have to implement all the business logic with explicit API calls. I would end up with hundreds of API's implementing all these stupid little business rules. As the performance is main factor (we use a lot of effort to optimize the SQL as it is now), this is out of the question. Secondly, segregated API's would probably be nightmare to maintain in this web of ever evolving business rules, where as database triggers actually support the high cohesion and DRY mentality, enforcing the business rules transparently.
I came up with a conclusion microservice architecture being unsuitable for this type of document management system. Am I correct, or approaching the idea from wrong angle?

First of all, you don't have to have a Microservices architecture. I really mean it! If you were ordered by management/architect to do it, and it doesn't solve any real problems you are having, you are probably right for pushing back.
That being said, and with the disclaimer that I don't know the exact requirements of your application, having "things" as bounded context is a smell. So having "Customers", "Applications", "Documents", etc. as services is very likely the wrong approach.
Bounded contexts should not be CRUD operations on a specific entity. They should be completely independent (or as independent as possible) "vertical" parts of the whole application. Preferably with their own Database and GUI. They should also operate independently of each other, not requiring input from other services for own decisions.
It is the complete opposite of data-centric design, where tables/fields and relations are the core concepts. Here, functionality is the core concept. You would have to split your application along functionality to arrive at a good separation.
I could imagine a document management system having these idependent bounded contexts / services: Search, Workflow, Editing, etc.
Here is how you would think about it: Search does not require any (synchronous) input from any other service. It may receive regular, even near-time updates with new documents, but that does not impact it's main feature: searching already indexed documents. The GUI is also independent, something like one google-like page with a search box maybe. It can deliver results independently, and would link back to the Workflow or Editing apps when you click on a result.
The others would be similarly independent. Again, the point is to split the services in a way that makes them work independently. If you don't have that, you will only make things worse with Microservices.

First of all the above answer is correct in suggesting that you need to breaup your microservice in a better way.
Now If scalability is your concern(lots of api calls between microservice).
I strongly suggest you to validate that how many of the constraints are really required at the first level, and how many of them you could do in async way. With that what i mean is in distributed enviornment we actually do not need to validate all the things at the same time.
Sometimes these things are not directly visible , for eg: lets say there are two services order service and customer service and order service expose a api which say place a order for customer id. and business say you cannot place a order for a unknown customer
one implementation is from the order service you call the customer service in sync ---- in this case customer service down will impact your service, now lets question do we really need this.
Because a scenario could happen where customer just placed an order and somebody deleted that customer from customer service, now we have a order which dosen't belong to customer.Consistency cannot be guaranteed.
In the new sol. we are saying allow the order service to place the order without checking the customer id and do one of the following:
Using ProcessManager check the customer validity and update the status of the order as invalid and when customer get deleted using ProcessManager update the order status as invalid or perform business logic
Do not check at all , because placing a order dosen't count a thing, when this order will be in the process of dispatch that service will anyway check the customer status
In this way your API hits are reduced and better independent services are produced

Related

Domain Driven Design, should I use multiple databases or a single source of truth

I'm about to propose some fundamental changes to my employers and would like the opinion of the community (opinion because I know that getting a solid answer to something like this is a bit far-fetched).
Context:
The platform I'm working on was built by a tech consultancy before I joined. While I was being onboarded they explained that they used DDD to build it, they have 2 domains, the client side and the admin side, each has its own database, its own GraphQl server, and its own back-end and front-end frameworks. The data between the tables is being synchronized through an http service that's triggered by the GraphQl server on row insertions, updates, and deletes.
Problem:
All of the data present on the client domain is found in the admin domain, there's no domain specific data there. Synchronization is a mess and is buggy. The team isn't large enough to manage all the resources and keep track of the different schemas.
Proposal:
Remove the client database and GraphQl servers, have a single source of truth database for all the current and potentially future applications. Rethink the schema, split the tables that need to be split, consolidate the ones that should be joined, and create new tables according to the actual current business flow.
Am I justified in my proposal, or was the tech consultancy doing the right thing and I'm sending us backwards?
Normally you have a database, or schema, for each separated boundary context. That means, that the initial idea of the consultancy company was correct.
What's not correct is the way that the consistency between the two is managed. You don't do it on tables changes but with services inside one (or both) the domains listening to the events and taking the update actions. It's a lot of work, anyway, because you have to update the event handlers on every change (in the events or tables structure).
This code is what's called anti corruption layer, that's exactly what it does: it avoids any corruption between the copies of the domain in another domain.
Said this, as you pointed out, your team is small and it could be that maintaining such a layer (and hence code) could cost a lot of energies. But, you've also to remember that once you've done, you have just to update it when needed.
Anyway, back to the proposal, you could also take this route. What you should (must, I would say) is that in each domain the external tables should be accessed only by some services, or queries, and this code should never ever modify the content that it access. Never. But I suppose that you already know this.
Nothing is written in the stone, the rules should always be adapted when put in a real context. Two separated databases means more work, but also a much better separation of the domains. It could never happen that someone accidentally modifies the content of the tables of the other domain. On the other side, one database means less work, but also much more care about what the code does.

Universal data model and microservices integration

Since the native-cloud applications or microservices architecture requires decentralized data model (each microservices has its own database), and universal data model is centralized data model
So, how we have microservices architecture with universal data model patterns?
Is there any reference or implementation of universal data model and microservices?
In general the two concepts are not compatible. Using a universal data model for all of your services would clash with a couple of key ideas behind using Microservices, e.g. Polyglot Persistence, separate development & deployment of each service. Also, let's not forget that the "Data Model Resource Book" was last updated in 2009.
However, if you must combine the two approaches, e.g. because management insists on it, you can encapsulate all access to the universal data model by a dedicated service and make your other services dependent on it.
Some good thoughts on the subject can be found here: http://plainoldobjects.com/2015/09/02/does-each-microservice-really-need-its-own-database-2/
Yes to #Fritz's point -- universal data modeling and microservices are really two different concepts and are very difficult if not impossible to be used together. I would like to add that the reasoning for polyglot persistence is also because of how the data should be modeled. Microservices allow the use of different data stores that can best model the data according to their domain.
To elaborate more, I don't think it would do justice to mention microservices and data modeling but not domain driven design. From my experience, domain driven design really helps in thinking about services, their responsibilities, and their right to exist. For instance, I found it often to be the case that there are usually a collection of services that carries out a particular domain functionality. An example could be an e-commerce application that have payments, shopping carts, etc. These could be separated into different "bounded contexts" based on domain driven design terminology.
With the different bounded contexts, each microservice no longer sees the same concept in the system the same, so in effect, there is no real universal data model. The easiest example that I can think of to show this, is when you also want reporting on the metrics in the system. If the example was an ecommerce application, the notion of a transaction in the orders microservice are going to be different than transactions in a reporting service. The reporting service for instance may want to know about transactions at a sub-level such as the profit or revenue generated for a particular order instead of the particular line items in an order. However, in the perspective of the orders service, the order details such as the line items and the address of the individual that made the purchase are probably important and should be known. This should then require two different data models.
With respect to domain modeling, I may be a bit extreme but I would go as far as saying that if there are multiple services sharing the same data source, they should really be the same service; there should be only one service for a single data source. My arguments for that would be that the domain hasn't been properly modeled and that the coupling makes it different to evolve any one service if there are multiple services that relies on a single data source. The case could be that one service requires the schema of the data source to change while the other one does not but still is required to accommodate the schema change. Hope this helps!

Using multiple databases in a Spring JPA application

I have a Spring application which supports a single customer.
I would like to extend this application to support multiple customers where each customers database is stored in a separate database. The schema for the database is the same for each customer, and the same DAOs and business logic should remain the same.
How would I accomplish this with Spring/JPA? Would I need to have multiple persistence contexts and wire in an appropriate entity manager factory based upon the currently logged in user? Are there any examples of implementing something similar to this?
I would advice against running separate database under a single application. If a redesign of the data model to incorporate multiple customers is not an option, why don't you run multiple instances of your application server/web container, one for each customer? As otherwise you'll have to deal with the drawbacks of having a shared platform and isolated databases.
With multiple customer databases and a single application your code will become more complex, you can't guarantee that customer data is fully isolated (e.g. due to a bug in the application a customer is shown the wrong data, so there's not much benefit in isolating each customer) and you'll have the nightmare of maintaining each customer database. Also, by having different databases you can virtually guarantee that someone pointy-haired is going to ask for some bespoke functionality for customer A while leaving customer B's functionality untouched, because "... it will be easy, as we've got different databases...", forgetting that the application is shared.
If you really, really want to have separate databases for particular customers, this would be the way to go — define separate persistence units with the same entity definitions, but different entity manager factory configurations.
To me it sounds more like a need to redesign the database structure. I'm guessing that the application has been written for only one client in mind and it turned out that more appeared on the horizon, so, hey, let's do something about it, and fast! Aren't you trying to copy-paste, but in a bigger scale? You'll going to have a lot of redundancy with JPA if you want to have a few databases with the same structure: for example, everything what's defined inside the mapping-file (queries, entity relationship mappings, etc.) is defined per persistence unit — you'll have to repeat these definitions and keep them all synchronized.
I'll stop here, as it is merely guesswork, for the lack of broader description.

Should business rules be enforced in both the application tier and the database tier, or just one of the two?

I have been enforcing business rules in both my application tier (models) and my database tier (stored procedures who raise errors).
I've been duplicating my validations in both places for a few reasons:
If the conditions change between
when they are checked in the
application code and when they are
checked in the database, the
business rule checks in the database
will save the day. The database
also allows me to lock various
records in a simpler manner than in
my application code, so it seems
natural to do so here.
If we have
to do some batch data
insertions/updates to the database directly, if I route
all these operations through my
stored procedures/functions which
are doing the business rule
validations, there's no chance of me
putting in bad data even though I lack the protections that I would get if I was doing single-input through the application.
While
enforcing these things ONLY in the
database would have the same effect
on the actual data, it seems
improper to just throw data at the
database before first making a good
effort to validate that it conforms
to constraints and business rules.
What's the right balance?
You need to enforce at the data tier to ensure data integrity. That's your last line of defense, and that's the DBs job, to help enforce its world view of the data.
That said, throwing junk data against the DB for validation is a coarse technique. Typically the errors are designed to be human readable rather than machine readable, so its inefficient for the program to process the error from the DB and make heads or tails out of it.
Stored Procedures are a different matter. Back in the day, Stored Procedures were The Way to handle business rules on the data tiers, etc.
But today, with the modern application server environments, they have become a, in general, better place to put this logic. They offer multiple ways to access and expose the data (the web, web services, remote protocols, APIs, etc). Also, if your rules are CPU heavy (arguably most aren't) it's easier to scale app servers than DB servers.
The large array of features within the app servers give them a flexibility beyond what the DB servers can do, and thus much of what was once pushed back in to the DBs is being pulled out with the DB servers being relegated to "dumb persistence".
That said, there are certainly performance advantages using Stored Procs and such, but now that's a tuning thing where the question becomes "is it worth losing the app server capability for the gain we get by putting it in to the DB server".
And by app server, I'm not simply talking Java, but .NET and even PHP etc.
If the rule must be enforced at all times no matter where the data came from or how it was updated, the database is where it needs to be. Remember databases are affected by direct querying to make changes that affect many records or to do something the application would not normally do. These are things like fixing a group of records when a customer is bought out by another customer and they want to change all the historical data, the application of new tax rates to orders not yet processed, the fixing of a some bad data inputs. They are also affected sometimes by other applications which do not use your data layer. They may also be affected by imports run through ETL programs which also cannot use your data layer. So if the rule must in all cases be followed, it must be in the database.
If the rule is only for special cases concerning how this particular input page works, then it needs to be in the application. So if a sales manager has only specific things he can do from his user interface, these things can be specified in the application.
Somethings it is helpful to do in both places. For instance, it is silly to allow a user to put a non-date in an input box that will relate to a date field. The datatype in the database should still be a datetime datatype, but it is best to check some of this stuff before you send.
Your business logic can sit in either location, but should not be in both. The logic should NOT be duplicated because it's easy to make a mistake trying to keep both in sync. If you put it in the model you'll want all data access to go through your models, including batch updates.
There will be trade-offs to putting it in the database vs the application models (here's a few of the top of my head):
Databases can be harder to maintain and update than applications
It's easier to distribute load if it's in the application tier
Multiple, disparate dbs may require splitting business rules (which may not be possible)

Should data validation be done at the database level?

I am writing some stored procedures to create tables and add data. One of the fields is a column that indicates percentage. The value there should be 0-100. I started thinking, "where should the data validation for this be done? Where should data validation be done in general? Is it a case by case situation?"
It occurs to me that although today I've decided that 0-100 is a valid value for percentage, tomorrow, I might decide that any positive value is valid. So this could be a business rule, couldn't it? Should a business rule be implemented at the database level?
Just looking for guidance, we don't have a dba here anymore.
Generally, I would do validations in multiple places:
Client side using validators on the aspx page
Server side validations in the code behind
I use database validations as a last resort because database trips are generally more expensive than the two validations discussed above.
I'm definitely not saying "don't put validations in the database", but I would say, don't let that be the only place you put validations.
If your data is consumed by multiple applications, then the most appropriate place would be the middle tier that is (should be) consumed by the multiple apps.
What you are asking in terms of business rules, takes on a completely different dimension when you start thinking of your entire application in terms of business rules. If the question of validations is small enough, do it in individual places rather than build a centralized business rules system. If it is a rather large system, them you can look into a business rules engine for this.
If you have a good data access tier, it almost doesn't matter which approach you take.
That said, a database constraint is a lot harder to bypass (intentionally or accidentally) than an application-layer constraint.
In my work, I keep the business logic and constraints as close to the database as I can, ensuring that there are fewer potential points of failure. Different constraints are enforced at different layers, depending on the nature of the constraint, but everything that can be in the database, is in the database.
In general, I would think that the closer the validation is to the data, the better.
This way, if you ever need to rewrite a top level application or you have a second application doing data access, you don't have two copies of the (potentially different) code operating on the same data.
In a perfect world the only thing talking (updating, deleting, inserting) to your database would be your business api. In the perfect world databae level constraints are a waste of time, your data would already have been validated and cross checked in your business api.
In the real world we get cowboys taking shortcuts and other people writing directly to the database. In this case some constraints on the database are well worth the effort. However if you have people not using your api to read/write you have to consider where you went wrong in your api design.
It would depend on how you are interacting with the database, IMO. For example, if the only way to the database is through your application, then just do the validation there.
If you are going to allow other applications to update the database, then you may want to put the validation in the database, so that no matter how the data gets in there it gets validated at the lowest level.
But, validation should go on at various levels, to give the user the quickest opportunity possible to know that there is a problem.
You didn't mention which version of SQL Server, but you can look at user defined datatypes and see if that would help you out, as you can just centralize the validation.
I worked for a government agency, and we had a -ton- of business rules. I was one of the DBA's, and we implemented a large number of the business rules in the database; however, we had to keep them pretty simple to avoid Oracle's dreaded 'mutating table' error. Things get complicated very quickly if you want to use triggers to implement business rules which span several tables.
Our decision was to implement business rules in the database where we could because data was coming in through the application -and- through data migration scripts. Keeping the business rules only in the application wouldn't do much good when data needed to be migrated in to the new database.
I'd suggest implementing business rules in the application for the most part, unless you have data being modified elsewhere than in the application. It can be easier to maintain and modify your business rules that way.
One can make a case for:
In the database implement enough to ensure overall data integrity (e.g. in SO this could be every question/answer has at least one revision).
In the boundary between presentation and business logic layer ensure the data makes sense for the business logic (e.g. in SO ensuring markup doesn't contain dangerous tags)
But one can easily make a case for different places in the application layers for every case. Overall philosophy of what the database is there for can affect this (e.g. is the database part of the application as a whole, or is it a shared data repository for many clients).
The only thing I try to avoid is using Triggers in the database, while they can solve legacy problems (if you cannot change the clients...) they are a case of the Action at a Distance anti-pattern.
I think basic data validation like you described makes sure that the data entered is correct. The applications should be validating data, but it doesn't hurt to have the data validated again on the database. Especially if there is more than one way to access the database.
You can reasonable restrict the database so that the data always makes sense. A database will support multiple applications using the same data so some restrictions make sense.
I think the only real cost in doing so would be time. I think such restrictions aren't a big deal unless you are doing something crazy. And, you can change the rules later if needed (although some changes are obviously harder than others)
First ideal: have a "gatekeeper" so that your data's consistency does not depend upon each developer applying the same rules. Simple validation such as range validation may reasonably be implemented in the DB. If it changes at least you have somewhere to put.
Trouble is the "business rules" tend to get much more complex. It can be useful to offload processing to the application tier where OO languages can be better for managing complex logic.
The trick then is to structure the app tier so that the gatekeeper is clear and unduplicated.
In a small organisation (no DBA ergo, small?) I would tend to put the business rules where you have strong development expertise.
This does not exclude doing initial validation in higher levels, for example you might validate all the way up in the UI to help the user get it right, but you don't depend upon that initial validation - you still have the gatekeeper.
If you percentage is always 'part divided by whole' (and you don't save part and whole values elsewhere), then checking its value against [0-100] is appropriate at db level. Additional constraints should be applied at other levels.
If your percentage means some kind of growth, then it may have any kind of values and should not be checked at db level.
It is case by case situation. Usually you should check at db level only constraints, which can never change or have natural limits (like first example).
Richard is right: the question is subjective the way it has been asked here.
Another take is: what are the schools of thought on this? Do they vary by sector or technology?
I've been doing Ruby on Rails for a bit now, and there, even relationships between records (one-to-many etc.) are NOT respected on the DB level, not to mention cascade deleting and all that stuff. Neither are any kind of limits aside from basic data types, which allow the DB to do its work. Your percentage thing is not handled on the DB level but rather at the Data Model level.
So I think that one of the trends that we're seeing lately is to give more power to the app level. You MUST check the data coming in to your server (so somewhere in the presentation level) and you MIGHT check it on the client and you MIGHT check again in the business layer of your app. Why would you want to check it again at the database level?
However: the darndest things do happen and sometimes the DB gets values that are "impossible" reading the business-layer's code. So if you're managing, say, financial data, I'd say to put in every single constraint possible at every level. What do people from different sectors do?

Resources