Universal data model and microservices integration - data-modeling

Since the native-cloud applications or microservices architecture requires decentralized data model (each microservices has its own database), and universal data model is centralized data model
So, how we have microservices architecture with universal data model patterns?
Is there any reference or implementation of universal data model and microservices?

In general the two concepts are not compatible. Using a universal data model for all of your services would clash with a couple of key ideas behind using Microservices, e.g. Polyglot Persistence, separate development & deployment of each service. Also, let's not forget that the "Data Model Resource Book" was last updated in 2009.
However, if you must combine the two approaches, e.g. because management insists on it, you can encapsulate all access to the universal data model by a dedicated service and make your other services dependent on it.
Some good thoughts on the subject can be found here: http://plainoldobjects.com/2015/09/02/does-each-microservice-really-need-its-own-database-2/

Yes to #Fritz's point -- universal data modeling and microservices are really two different concepts and are very difficult if not impossible to be used together. I would like to add that the reasoning for polyglot persistence is also because of how the data should be modeled. Microservices allow the use of different data stores that can best model the data according to their domain.
To elaborate more, I don't think it would do justice to mention microservices and data modeling but not domain driven design. From my experience, domain driven design really helps in thinking about services, their responsibilities, and their right to exist. For instance, I found it often to be the case that there are usually a collection of services that carries out a particular domain functionality. An example could be an e-commerce application that have payments, shopping carts, etc. These could be separated into different "bounded contexts" based on domain driven design terminology.
With the different bounded contexts, each microservice no longer sees the same concept in the system the same, so in effect, there is no real universal data model. The easiest example that I can think of to show this, is when you also want reporting on the metrics in the system. If the example was an ecommerce application, the notion of a transaction in the orders microservice are going to be different than transactions in a reporting service. The reporting service for instance may want to know about transactions at a sub-level such as the profit or revenue generated for a particular order instead of the particular line items in an order. However, in the perspective of the orders service, the order details such as the line items and the address of the individual that made the purchase are probably important and should be known. This should then require two different data models.
With respect to domain modeling, I may be a bit extreme but I would go as far as saying that if there are multiple services sharing the same data source, they should really be the same service; there should be only one service for a single data source. My arguments for that would be that the domain hasn't been properly modeled and that the coupling makes it different to evolve any one service if there are multiple services that relies on a single data source. The case could be that one service requires the schema of the data source to change while the other one does not but still is required to accommodate the schema change. Hope this helps!

Related

How to maintain master data across multiple parties

The scenario is simple: a number of companies need to share some reference data (let's say, a list of products and their attributes).
The problem is that currently each entity collects/cleans the data internally, then shares the data with other companies which leads to a length process of exchanging files (spreadsheets).
What are some of the modern approaches for solving this? Surely this scenario is very common in modern corporate life, so I am looking for some guidance on standard processes / technologies to look into.
Thanks!
This is indeed a common scenario, and there are several ways of solving the problem, ranging from dedicated commercial offerings (like Tibco MDM or similar offerings from IBM, Dell/Boomi, etc) that tend to be large, monolithic master data solutions to lightweight microservices that each own a particular subset of the data and publish changes in the relevant data to the other microservices that need it. If the database technology and schemas are largely similar between companies, sometimes you can use things like replication or database log shipping to synchronize data between companies. If your data does not need to be synchronized automatically, an ETL procedure could be used to extract changes to master data from one company and load it in the others.
A near-realtime approach that utilizes Domain-Driven Design would identify a Bounded Context for each set of Master Data elements that need to be managed, such as Products. Then, each company would define their Domain Model for Products, defining a Product Aggregate that exposes the public surface of the Product and abstracts the company-specific details. The Domain Model would also define the Events that each company can publish related to their Product aggregate. If different companies own different parts of the product, they may publish changes to those fields as Domain Events to which the other companies subscribe.
The underlying repositories and data structures that define the Product in each company would be updated by a command processor that receives the changes and applies them to the company-specific domain. This can be orchestrated using an ESB, microservices, or a message broker, depending on the complexity of the processing required between companies, such as routning, transformation and translation.

Stateless Micro services and database

We have a requirement of building stateless micro services which rely on a database cluster to persist data.
What is the approach that is recommended for redundant stateless micro services(for high availability and scalability) using the database cluster. For example: Running multiple copies of version 1.0 Payment service.
Should all the redundant micro services use a common shared DB schema or they should have their own schema? In case of independent DB schema inconsistency among the redundant services may exist.
Also how can the schema upgrade handled in case of common DB schema?
This is a super broad topic, and rather hard to answer in general terms.
However...
A key requirement for a micro service architecture is that each service should be independent from the others. You should be able to deploy, modify, improve, scale your micro service independently from the others.
This means you do not want to share anything other than API definitions. You certainly don't want to share a schema; each service should be able to define its own schema, release new versions, change data types etc. without having to check with the other services. That's almost impossible with a shared schema.
You may not want to share a physical server. Sharing a server means you cannot make independent promises on scalability and up-time; a big part of the micro service approach means that the team that builds it is also responsible for running it. You really want to avoid the "well, it worked in dev, so if it doesn't scale on production, it's the operations team's problem" attitude. Databases - especially clustered, redundant databases - can be expensive, so you might compromise on this if you really need this.
As most microservice solutions use containerization and cloud hosting, it's quite unlikely that you'd have the "one database server to rule them all" sitting around. You may find it much better to have each micro service run its own persistence service, rather than sharing.
The common approach to dealing with inconsistencies is to accept them - but to use CQRS to distribute data between microservices, and make sure the micro services deal with their internal consistency requirements.
This also deals with the "should I upgrade my database when I release a new version?" question. If your observers understand the version for each message, they can make decisions on how to store them. For instance, if version 1.0 uses a different set of attributes to version 1.1, the listener can do the mapping.
In the comments, you ask about consistency. This is a super complex topic - especially in micro service architectures.
If you have, for instance, a "customers" service and an "orders" service, you must make sure that all orders have a valid customer. In a monolithic application, with a single database, and exclusively synchronous interactions, that's easy to enforce at the database level.
In a micro service architecture, where you might have lots of data stores, with no dependencies on each other, and a combination of synchronous and asynchronous calls, it's really hard. This is an inevitable side effect of reducing dependencies between micro services.
The most common approach is "eventual consistency". This typically requires a slightly different application design. For instance, on the "orders" screen, you would invoke first the client microservice (to get client data), and then the "orders" service (to get order details), rather than have a single (large) service call to retrieve everything.

How to classify data to different categories with their purpose in ERP Applications?

all
I am recently pondering that how to classify data into different categories in Erp solutions, basing on that, I can decide which data should I strip out and put it into a shared database for multiple tenants instances.
As a industry practice, the Erp product is separated into 2 layers. The technology platform layer provide a lot of reusable components and modeling tools, make business applications follow the consistent architecture, the business application layer witch based on it provide the business functions.
So, basically the data can be categorized into 2 main types.one is platform data, the other is business data.Further,the platform data can be categorized into sub categories:
platform
1)environment
2)engine related(Form engine, workflow engine, data access engine...which make the business function work)
3)metadata(for example:Form Description,Business Object Description, Data Model, Workflow definition)
4)configurations(organization or user related configurations)
5)management related(data structures for manage the models)
business
1)model instances(actual orders data)
2)business configurations
3)derived data(from model instance data, and form query or analysis)
After analysis, I found that the environment data, configurations, management related data, business data are in a high degree of coupling. The only category can be separated from the instance database is the metadata.
1.Does my analysis reasonable?
2.Are there any patterns for reference?
Thanks.
I would like to suggest the following pattern for splitting the data in the database
MetaData[3] that you use to identify, authenticate and authorize the users, the Settings or customization Congifurations [4] per users or tenants along with the basic and the other components [1 throught 4] that are platform specific, (because you have not specified what environment does etc..) and common for all tenants can be in a separate database and the rest of the business specific data to reside in another database[s].
This will help the tenants that come from a geographically different location to store their own data in their separate database, even as part of the national laws and regulations and data safety.
Post your understanding in this regard

Using multiple databases in a Spring JPA application

I have a Spring application which supports a single customer.
I would like to extend this application to support multiple customers where each customers database is stored in a separate database. The schema for the database is the same for each customer, and the same DAOs and business logic should remain the same.
How would I accomplish this with Spring/JPA? Would I need to have multiple persistence contexts and wire in an appropriate entity manager factory based upon the currently logged in user? Are there any examples of implementing something similar to this?
I would advice against running separate database under a single application. If a redesign of the data model to incorporate multiple customers is not an option, why don't you run multiple instances of your application server/web container, one for each customer? As otherwise you'll have to deal with the drawbacks of having a shared platform and isolated databases.
With multiple customer databases and a single application your code will become more complex, you can't guarantee that customer data is fully isolated (e.g. due to a bug in the application a customer is shown the wrong data, so there's not much benefit in isolating each customer) and you'll have the nightmare of maintaining each customer database. Also, by having different databases you can virtually guarantee that someone pointy-haired is going to ask for some bespoke functionality for customer A while leaving customer B's functionality untouched, because "... it will be easy, as we've got different databases...", forgetting that the application is shared.
If you really, really want to have separate databases for particular customers, this would be the way to go — define separate persistence units with the same entity definitions, but different entity manager factory configurations.
To me it sounds more like a need to redesign the database structure. I'm guessing that the application has been written for only one client in mind and it turned out that more appeared on the horizon, so, hey, let's do something about it, and fast! Aren't you trying to copy-paste, but in a bigger scale? You'll going to have a lot of redundancy with JPA if you want to have a few databases with the same structure: for example, everything what's defined inside the mapping-file (queries, entity relationship mappings, etc.) is defined per persistence unit — you'll have to repeat these definitions and keep them all synchronized.
I'll stop here, as it is merely guesswork, for the lack of broader description.

Any recomendations for an efective way to sync data from one database, to other app's databases?

Here's my problem. I built a web app, and naturally kept the data in a database which describes that app's domain. Afterwords, I built another web app for the same organization, and used a seperate database to describe that app's domain and store data... and naturally a couple more projects came up and for each app I've isolated it's data to a single database. Deveolpment wise, I think it's ok, as I can maintain changes to the data structure and data at the app's database.
Considering these apps belong to the same organization, there tends to be plenty of data replicated between them, like department names, job titles, shop names, etc. Most of these tables hold the same data, but are not exactly the same in each database, and are not always used by all of the apps. Changes to this data, though, needs to be changed at all the apps (sometimes in a diferent ways) creating a growing management "hassle".
So I've been think of a way to get some syncronization between the data. I want an easier management - update at one app (or a central app) and update all the databases as needed by each app - and also a better way to share data between apps (like maybe mash up data from differnt apps in a new app to alow specific analysis). Most of the data I'm refering to is used as contraints more than being core domain concept, describing the organization rather than describing a particular domain.
I'm looking for opinions on some ways to get this done.
My first idea was to grab comun data structures, like the department names' table i mentioned, and stick'em in a core database. Any updates to the data would be done at this database, through a dedicated web app, and I'd apply some sort of Observer or Publisher / Suscriber Pattern for these changes - on changes the app would notify observing apps (through there dedicated webservice) that the changes occured and allow for the app to grab the new data and use it as it needs. GUIDs could be user as a reference to identify the same data throughout the apps. Also, I could build web services for read and search operations that don't need to be in a specific app's database, but could be useful to it.
A second idea would be that each app manage it's own data, and the apps could observe one another. A change in one could notify others that share the same data structure that the change occurred. I could still use some GUIDs and even build services on any of the apps. I think this would also be less excessive in terms of duplication of data, but might be harder to manage as each app would eventually be coupled to other apps, and I would some how have to distribute responsabilities as to which app controls what information.
I'm really curious as to something of this genre of data distibuition and syncing would work and even be recomended. Opions and other ideas are more than welcome!
What you describe here is a typical case for a "Master Data Management" system. EAI vendors (Oracle, TIBCO, IBM) offer such products. They resemble your first solution, being centralised databases with synchronization processes, detecting changes in external data sources, grabbing the changes and synchronizing data out to other external databases. They also provide a user interface to change master data directly.
MDM software are expensive, but you can implement a custom solution which will be - at least initially - cheaper than purchasing one. Both of your solutions make technical sense but there is a difference in their manageability.
The first one is better, if you can dedicate a responsible person/organization to take care of it and the business owners of your services can agree on making changes via this new centralised system.
The second solution shares the responsibility between the service owners. The hard task here is to identify the owner of each type of information (business object).
I cannot advise a solution without a deeper knowledge of your systems and organizations, but I hope I could give some ideas.

Resources