Splitting a monolith into microservices: database issues

I am splitting a monolithic application into microservices, and so far I have split it into three. For easier explanation, suppose these are:
Users (CRUD)
Messages (CRUD)
Other things (CRUD)
All of these are distinct bounded contexts, and I'm using one database table per microservice. So in the DB I have:
USERS table
id
surname
lastname
...
OTHER_THINGS table
id
col1
col2
...
MESSAGES table
id
title
created_time
USER_ID
OTHER_THING_ID
...
Now my web page needs to search and filter messages by any of the listed columns across all of these tables. For example:
The web page user can enter:
surname of USER,
col2 of OTHER_THINGS,
title of MESSAGES
And I should return only the matching rows.
With the monolith I used simple database JOINs, but in this situation I can't find the best option. Can you suggest possible options, and say which ones are better?

"suppose I have Orders and Customers tables, where ORDER has FK to CUSTOMER. For me these seems to be in different microservices. "
Still nope to the foreign key. The Orders microservice has a data store with its own Customers table. The Customer Update microservice has a data store with its own Customers table. The Customer Orders search would be a feature of the Orders microservice and so will search its data store not the Customer Update data store.
The whole point of microservices is the absence of dependencies. They are entire, discrete systems in their own right. This makes them easy to build and easy to deploy. The snag is the issue you are butting up against: data management. Most enterprises aspire to a single source of truth regarding their data, which usually means a central database. That imposes constraints on applications, because everything has to share the same data model and changes to common entities such as Customer cause major upheaval.
Microservices appear to offer a solution to this by spinning out subsets of functionality which own their own data model. This inevitably means data integrity across the enterprise is looser, because it is handled asynchronously. There is no longer a single source of truth.
So the Customer Update microservice will publish updates about Customers as messages which the Orders microservice will consume and apply. Likewise, if the Orders microservice can create new Customers then it will publish a similar stream of messages which the Customer Update microservice will consume and apply. What happens if the two microservices create records for the same new Customer in the same window between refreshes? Well, yes, a good question.
The upshot is that microservices will work in some scenarios and be absolutely disastrous in others. Certainly most enterprise applications will remain largely monolithic, not just through inertia but because the benefits of centrally shared data outweigh the agility of microservices in many instances.
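The publish-and-apply flow described in this answer can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the `CustomerChanged` event, its `version` field, and the `OrdersCustomerReplica` class are hypothetical names, and an in-memory map stands in for the Orders service's own Customers table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical event published by the Customer Update service.
// The version field is an assumption used to discard stale or duplicate messages.
record CustomerChanged(long customerId, String name, long version) {}

// Local, eventually consistent copy of Customers kept inside the Orders service.
class OrdersCustomerReplica {
    private final Map<Long, CustomerChanged> customers = new HashMap<>();

    // Apply an incoming message; returns true if the local copy changed.
    boolean apply(CustomerChanged event) {
        CustomerChanged current = customers.get(event.customerId());
        if (current != null && current.version() >= event.version()) {
            return false; // stale or duplicate delivery, ignore it
        }
        customers.put(event.customerId(), event);
        return true;
    }

    // Orders-side search reads only the local copy, never the other service's store.
    Optional<String> customerName(long customerId) {
        return Optional.ofNullable(customers.get(customerId)).map(CustomerChanged::name);
    }
}
```

Note that a version check like this only handles late or repeated messages about one customer. It does not resolve the case raised above, where both services create a record for the same new Customer in the same window.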

Related

How to persist data in microservices?

I am getting started in microservices architectures and I have a couple of questions about the data persistence and databases.
So my understanding is that each microservice has its own database (not necessarily, but usually). Given that, consider a typical social media platform with users, posts and comments. There will be two microservices, a users microservice and a posts microservice. The users database has a users table, and the posts database has posts and comments tables.
My question is on the posts microservice, because each post and comment has an author, so usually we would create the foreign key pointing to the user's table, however this is in a different database. What to do then? From my perspective there are 2 options:
Add the authorId column to the table but not the foreign key constraint. If so, what would happen in the application when we retrieve that user's data from the users microservice using the authorId and the user's data is gone?
Create an author's table in the posts' database. If so, what data should that table contain other than the user's id?
It just doesn't feel right to duplicate the data that is already in the user's database but it also doesn't feel right to use the user's id without the FK constraint.
One thing to note, data growth is quite different
Users -> relatively static data.
Posts & Comments -> Dynamic and could be exponentially high compared to users data.
The two-microservice design looks good. I would prefer option 1 from your design.
Duplication is not bad. In normal database design it is common to denormalize for better read performance. It also helps decouple the posts service from the users table, and may let you choose a different database if required. As for your question about what happens if the user's data is missing while the posts are still available: this can be handled with business logic and API design.
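As one possible reading of "handled with business logic", the posts service could substitute a placeholder when the author can no longer be found. The sketch below assumes a hypothetical `UserClient` wrapper around the users microservice; the placeholder text and all type names are illustrative and not taken from the question.

```java
import java.util.Optional;

// Hypothetical client for the users microservice; empty means the user no longer exists.
interface UserClient {
    Optional<String> findDisplayName(long userId);
}

// A post stores only the author's id, with no foreign key to the users database.
record Post(long id, long authorId, String body) {}

// What the posts API returns to callers.
record PostView(long id, String authorName, String body) {}

class PostViewAssembler {
    static final String DELETED_AUTHOR = "[deleted user]"; // placeholder chosen for this sketch

    private final UserClient users;

    PostViewAssembler(UserClient users) {
        this.users = users;
    }

    // Business rule: keep the post visible even when its author is gone.
    PostView toView(Post post) {
        String name = users.findDisplayName(post.authorId()).orElse(DELETED_AUTHOR);
        return new PostView(post.id(), name, post.body());
    }
}
```

Whether a post by a deleted user should stay visible at all is a product decision. The point is only that the missing-row case is decided in application code rather than by a database constraint.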

How do big tech companies share databases across multiple teams?

How do multiple teams (which own different system components/microservices) in a big tech company share their databases?
I can think of multiple use cases where this would be required. For example, in an e-commerce firm the same product is shared among multiple teams: it is first part of the product onboarding service, then maybe the catalog service (which stores all products and categories), then the search service, cart service, order placing service, recommendation service, cancellation & return service and so on.
If they don't share any DB, then:
Do they all keep a redundant copy of the products with the same product ID?
Wouldn't it be a challenge to achieve consistency among multiple teams?
I have multiple related doubts in both cases, whether they share a DB or not.
I have been through multiple tech blogs and videos on software design and still haven't found a satisfying answer. Please share some resources that give a complete workflow of how things work end-to-end in a big tech firm.
Thank you
In a microservice architecture, each microservice exposes endpoints through which other microservices can access the information it shares. Each service therefore stores as little as possible of a record that is managed by another microservice.
For example, if a user service needs to fetch the orders for a particular user in an e-commerce case, the order service would expose an endpoint that, given a user id, returns all orders related to that id. So essentially the only user-related field the order service needs to store is the user id; the rest of the user's details are irrelevant to it.
To further improve cohesion and understanding between teams, data discovery APIs and documentation are also built to share database metadata with other teams, explaining what each table and field means so that a microservice can be planned efficiently. You can read more about how such companies build data discovery tools here.
If I understand you correctly, you are unsure how different departments in a company receive data?
The idea is that you create reusable and effective APIs to solve this problem.
Let's generically say the company we're looking at is Walmart. Walmart has millions of items in a database (or databases). Each item has a unique ID, and so on.
If Walmart is selling items online via walmart.com, they need a way to get those items, so they create APIs and use them to fetch items based on certain query conditions.
Now, let's say Walmart has decided to build an app... well, they need those exact same items! Good thing we already created those APIs; we will use the exact same ones to fetch the data.
Now, how does Walmart manage which items are available at which store, and at what price? They would usually link this metadata through additional database tables, tying them all together with primary and foreign keys.
^^ This essentially allows Walmart to fetch from their CORE database ONLY the details intrinsic to the item (e.g. name, size, color, SKU, details, etc.), and link it to another database, say that of YOUR local Walmart, which contains information about that item relevant only to your Walmart location (e.g. price, stock, aisle number, etc.).
So yes, in a sense, using multiple databases.
Perhaps this may drive you down some more roads: https://learnsql.com/blog/why-use-primary-key-foreign-key/
https://towardsdatascience.com/designing-a-relational-database-and-creating-an-entity-relationship-diagram-89c1c19320b2
There's a substantial diversity of approaches used between and even within big tech companies, driven by different company/org cultures and different requirements around consistency and availability.
Any time you have an explicit "query another service/another DB" dependency, you have a coupling which tends to turn a problem in one service into a problem in both services. This isn't necessarily a one-way thing: it's quite possible for the querying service to encounter a problem which cascades into a problem in the queried service. This is especially possible when a cache becomes load-bearing, which has led to major outages at at least one FANMAG in the not-that-distant past.
This has led some companies that could be fairly called big tech to eschew that approach in their service design, typically by having services publish events describing what has changed to a durable log (append-only storage). Other services subscribe to that log and use the events to construct their own eventually consistent view of the data owned by the other service (i.e. there's some level of data duplication, with services storing exactly the data they need to function).
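A minimal sketch of that idea follows, using an in-memory list as a stand-in for the durable log. `EventLog`, `ProductPriceChanged`, and `CartPriceView` are hypothetical names; a real system would use a persistent log or broker and would store the subscriber's offset durably.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical change event published by the service that owns product data.
record ProductPriceChanged(String productId, long priceCents) {}

// In-memory stand-in for a durable, append-only log (a real one would persist events).
class EventLog {
    private final List<ProductPriceChanged> events = new ArrayList<>();

    void append(ProductPriceChanged event) {
        events.add(event);
    }

    // Events at positions >= offset, in publication order.
    List<ProductPriceChanged> readFrom(int offset) {
        return new ArrayList<>(events.subList(offset, events.size()));
    }
}

// A subscribing service keeps only the fields it needs, plus its position in the log.
class CartPriceView {
    private final Map<String, Long> priceByProduct = new HashMap<>();
    private int nextOffset = 0;

    // Catch up with the log; safe to call repeatedly.
    void poll(EventLog log) {
        for (ProductPriceChanged event : log.readFrom(nextOffset)) {
            priceByProduct.put(event.productId(), event.priceCents());
            nextOffset++;
        }
    }

    Long priceOf(String productId) {
        return priceByProduct.get(productId); // null until the event has been consumed
    }
}
```

Between an `append` and the next `poll`, the subscriber's view lags behind the owner's data, which is the eventual consistency the answer refers to.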

How to split many to many tables to achieve loosely coupled Microservice

I have three tables: Invoice(InvoiceId, Invoicetotoalprice), complaint(complaintId), and complaintType(complaintTypeID). I also have two microservices, Invoice Management and Complaint Management. The complaintType table is referenced by both the complaint table and the invoice table, so Invoice Management and Complaint Management share it; the included picture shows the relation. Is there a pattern to deal with this problem? Microservice communication would be via an API gateway.
There are two approaches to solve this kind of problem.
The first, and the easier to implement, is to keep the table in one of the microservices (in your case I suppose it makes more sense to maintain it in the complaint service) and keep only the id in the invoice service. Whenever the invoice service needs information about the complaint, it makes a request to the complaint service, passing the id.
The second option, and the better one because it minimizes communication between the services and makes them less coupled, is to replicate the table, so there is a separate copy in each microservice, but only one microservice is the owner; in this case I suppose that would be the complaint microservice. Whenever that table is modified, the complaint microservice publishes a message using a message broker such as Kafka, and the invoice service consumes it and updates its copy, so the two models do not drift apart. This way you decouple the two services: if, for example, the complaint service goes down, the invoice service is not affected.

Extending data models within a MicroService architecture

As part of a personal research project, I'm attempting to learn more about Microservice architecture and how to incorporate it within the industry I work with.
I've been reading a lot of books and articles about Microservice architecture, and have been researching and working with several software components that support it, such as RabbitMQ. However, I have come unstuck at the design of the data models.
To put the requirement in its simplest form, let's say I need two Microservices for the following high-level requirement (please note, I am excluding the WebAPI / Bridge microservice and UI microservices from this; I am focusing only on the backend Microservices that house core data):
Requirement:
Provide the ability for a customer to log into a portal and register for a "scheme", which allows them to add money or credit to their record. This will be a multi-tenanted solution, where data can be placed in a single database or over multiple databases (already covered), and each Microservice will be responsible for its own table(s).
Some "tenants" may or may not have the "Credit Microservice" enabled as defined below and may only have the "Customer Microservice" and potentially other Microservices such as "Marketing Microservice" (not defined below, but referenced as an example)
Customer Microservice:
Responsible for managing customer information within the system, such as First Name, Last name, email address, etc. This Microservice will expose various functions such as Creating a new Customer, Updating an Existing customer, Deleting a customer and finally retrieving customers (Basic CRUD operations).
This Microservice will be fed data directly from our internal CRM (Customer Relationship Management) system via integration (this part is covered)
Data Scheme:
CustomerId
FirstName
LastName
EmailAddress
Once a customer is created, an event is posted on the message queue informing other microservices that a Customer has been created.
Credit Microservice:
Responsible for managing a customer's balance and the scheme they are enrolled in. By default, a customer will NOT be enrolled in a scheme, and will therefore NOT be able to deposit credit to their account.
This Microservice will expose functions to "Enroll" a member to a scheme and "AddCredit" to their account as well as retrieving the information for a particular customer (their scheme and balance). In real life, this would also let them change scheme, and have various other options, but I'm keeping that out of this for this simple example.
Within the system, this Microservice would likely be responsible for storing the "transactions" to credit as well, in a separate table within its database (or unique schema within a single database instance). This is not defined below as it is irrelevant to the issue at hand.
Data Scheme:
CustomerId
SchemeName
Balance
This Microservice will be responsible for listening to events from the Customer Microservice and creating new customers within its own unique SQL table, allowing that customer to Enrol or Add Credit to their account.
Issue:
Within the UI element of the application, I need to show admin users a list of customers, including if they are enrolled in the application or not, as well as how much credit the customer has, but only if they have enabled the "Credit" functionality (the identification of this is already covered, each tenant can enable certain Microservices when setup)
The issue comes in the fact that the data is stored over two tables, and mastered by two different Microservices...
Do I create a new Microservice (EG: CustomerCredit) that joins the two tables together (readonly) to display the results? Does the API call the Customer Microservice "retrieve" first then call the Credit Microservice "retrieve" after with the relevant IDs and then join them together?
The first example above does not work in practice, as I might have MULTIPLE microservices extending the schema of the customer model, for example, Marketing where I might store "Last Email Date" against a customer id.
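For comparison, the second option (call each Microservice's "retrieve" and join the results by CustomerId in the API layer) might look roughly like the sketch below. The record and class names are illustrative, `balance` is assumed to be a whole number of minor currency units, and the two input lists stand in for the responses of the two services.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Shapes loosely based on the two data schemes in the question; names are illustrative.
record Customer(long customerId, String firstName, String lastName) {}
record CreditAccount(long customerId, String schemeName, long balance) {}

// One row of the admin list. schemeName and balance stay null when the customer
// is not enrolled or the tenant has not enabled the Credit microservice.
record CustomerRow(long customerId, String fullName, String schemeName, Long balance) {}

class CustomerListComposer {
    // Joins the results of the two "retrieve" calls in memory, keyed by CustomerId.
    static List<CustomerRow> compose(List<Customer> customers, List<CreditAccount> accounts) {
        Map<Long, CreditAccount> byCustomer = new HashMap<>();
        for (CreditAccount account : accounts) {
            byCustomer.put(account.customerId(), account);
        }
        List<CustomerRow> rows = new ArrayList<>();
        for (Customer c : customers) {
            CreditAccount account = byCustomer.get(c.customerId()); // may be null (left join)
            rows.add(new CustomerRow(
                    c.customerId(),
                    c.firstName() + " " + c.lastName(),
                    account == null ? null : account.schemeName(),
                    account == null ? null : account.balance()));
        }
        return rows;
    }
}
```

For a tenant without the Credit Microservice, the caller would pass an empty accounts list and every row would simply carry null credit fields. Each additional extending service, such as Marketing, would add another lookup map of the same kind.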
EDIT:
To confirm, the structure would look similar to the below:

How does data denormalization work with the Microservice Pattern?

I just read an article on Microservices and PaaS Architecture. In that article, about a third of the way down, the author states (under Denormalize like Crazy):
Refactor database schemas, and de-normalize everything, to allow complete separation and partitioning of data. That is, do not use underlying tables that serve multiple microservices. There should be no sharing of underlying tables that span multiple microservices, and no sharing of data. Instead, if several services need access to the same data, it should be shared via a service API (such as a published REST or a message service interface).
While this sounds great in theory, in practicality it has some serious hurdles to overcome. The biggest of which is that, often, databases are tightly coupled and every table has some foreign key relationship with at least one other table. Because of this it could be impossible to partition a database into n sub-databases controlled by n microservices.
So I ask: Given a database that consists entirely of related tables, how does one denormalize this into smaller fragments (groups of tables) so that the fragments can be controlled by separate microservices?
For instance, given the following (rather small, but exemplar) database:
[users] table
=============
user_id
user_first_name
user_last_name
user_email
[products] table
================
product_id
product_name
product_description
product_unit_price
[orders] table
==============
order_id
order_datetime
user_id
[products_x_orders] table (for line items in the order)
=======================================================
products_x_orders_id
product_id
order_id
quantity_ordered
Don't spend too much time critiquing my design, I did this on the fly. The point is that, to me, it makes logical sense to split this database into 3 microservices:
UserService - for CRUDding users in the system; should ultimately manage the [users] table; and
ProductService - for CRUDding products in the system; should ultimately manage the [products] table; and
OrderService - for CRUDding orders in the system; should ultimately manage the [orders] and [products_x_orders] tables
However all of these tables have foreign key relationships with each other. If we denormalize them and treat them as monoliths, they lose all their semantic meaning:
[users] table
=============
user_id
user_first_name
user_last_name
user_email
[products] table
================
product_id
product_name
product_description
product_unit_price
[orders] table
==============
order_id
order_datetime
[products_x_orders] table (for line items in the order)
=======================================================
products_x_orders_id
quantity_ordered
Now there's no way to know who ordered what, in which quantity, or when.
So is this article typical academic hullabaloo, or is there a real world practicality to this denormalization approach, and if so, what does it look like (bonus points for using my example in the answer)?
This is subjective but the following solution worked for me, my team, and our DB team.
At the application layer, Microservices are decomposed to semantic function.
e.g. a Contact service might CRUD contacts (metadata about contacts: names, phone numbers, contact info, etc.)
e.g. a User service might CRUD users with login credentials, authorization roles, etc.
e.g. a Payment service might CRUD payments and work under the hood with a 3rd party PCI compliant service like Stripe, etc.
At the DB layer, the tables can be organized however the devs/DBs/devops people want the tables organized
The problem is with cascading and service boundaries: Payments might need a User to know who is making a payment. Instead of modeling your services like this:
interface PaymentService {
    PaymentInfo makePayment(User user, Payment payment);
}
Model it like so:
interface PaymentService {
    PaymentInfo makePayment(Long userId, Payment payment);
}
This way, entities that belong to other microservices only are referenced inside a particular service by ID, not by object reference. This allows DB tables to have foreign keys all over the place, but at the app layer "foreign" entities (that is, entities living in other services) are available via ID. This stops object cascading from growing out of control and cleanly delineates service boundaries.
The problem it does incur is that it requires more network calls. For instance, if I gave each Payment entity a User reference, I could get the user for a particular payment with a single call:
User user = paymentService.getUserForPayment(payment);
But using what I'm suggesting here, you'll need two calls:
Long userId = paymentService.getPayment(payment).getUserId();
User user = userService.getUserById(userId);
This may be a deal breaker. But if you're smart and implement caching, and implement well engineered microservices that respond in 50 - 100 ms each call, I have no doubt that these extra network calls can be crafted to not incur latency to the application.
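One way the caching mentioned above could be approached is a decorator around the user lookup. This is a sketch under assumptions: the `User` fields and the `CachingUserService` class are invented for illustration, and only the `getUserById` call shape is taken from the snippet above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal stand-ins for the types used in the answer; the fields are assumptions.
record User(Long id, String name) {}

interface UserService {
    User getUserById(Long userId);
}

// Decorator that remembers users already fetched, so repeated lookups skip the network.
// A production cache would also need expiry and invalidation, which this sketch omits.
class CachingUserService implements UserService {
    private final UserService delegate;
    private final Map<Long, User> cache = new ConcurrentHashMap<>();

    CachingUserService(UserService delegate) {
        this.delegate = delegate;
    }

    @Override
    public User getUserById(Long userId) {
        // computeIfAbsent calls the remote service only on a cache miss
        return cache.computeIfAbsent(userId, delegate::getUserById);
    }
}
```

The trade-off is staleness: a cached user can lag behind the user service until the entry is refreshed or evicted.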
This is indeed one of the key problems in microservices, and it is quite conveniently omitted in most articles. Fortunately there are solutions. As a basis for discussion, let's take the tables you provided in the question.
The image above shows how the tables look in a monolith: just a few tables with joins.
To refactor this into microservices we can use a few strategies:
API Join
In this strategy, foreign keys between microservices are broken, and a microservice exposes an endpoint which mimics the key. For example, the Product microservice exposes a findProductById endpoint, and the Order microservice uses this endpoint instead of a join.
It has an obvious downside: it is slower.
Read only views
In the second solution you create a copy of the table in the second database. The copy is read only. Each microservice can use mutable operations on its own read/write tables, but on read-only tables copied from other databases it can (obviously) only perform reads.
High performance read
It is possible to achieve high-performance reads by introducing a store such as redis/memcached on top of the read-only view solution. Both sides of the join should be copied to a flat structure optimized for reading. You can introduce a completely new stateless microservice which reads from this storage. While it seems like a lot of hassle, it is worth noting that it will have higher performance than a monolithic solution on top of a relational database.
There are a few possible solutions. The ones that are simplest to implement have the lowest performance. High-performance solutions will take a few weeks to implement.
I realise this is possibly not a good answer but what the heck. Your question was:
Given a database that consists entirely of related tables, how does one denormalize this into smaller fragments (groups of tables)
WRT the database design I'd say "you can't without removing foreign keys".
That is, people pushing Microservices with the strict no shared DB rule are asking database designers to give up foreign keys (and they are doing that implicitly or explicitly). When they don't explicitly state the loss of FK's it makes you wonder if they actually know and recognise the value of foreign keys (because it is frequently not mentioned at all).
I have seen big systems broken into groups of tables. In these cases there can be either A) no FK's allowed between the groups or B) one special group that holds "core" tables that can be referenced by FK's to tables in other groups.
... but in these systems "groups of tables" is often 50+ tables so not small enough for strict compliance with microservices.
To me, the other related issue to consider with the microservice approach to splitting the DB is the impact this has on reporting: the question of how all the data is brought together for reporting and/or loading into a data warehouse.
Somewhat related is the tendency to ignore built-in DB replication features in favor of messaging, and how DB-based replication of the core tables (the DDD shared kernel) impacts the design.
EDIT: (the cost of JOIN via REST calls)
When we split up the DB as suggested by microservices and remove FK's we not only lose the enforced declarative business rule (of the FK) but we also lose the ability for the DB to perform the join(s) across those boundaries.
In OLTP FK values are generally not "UX Friendly" and we often want to join on them.
In the example if we fetch the last 100 orders we probably don't want to show the customer id values in the UX. Instead we need to make a second call to customer to get their name. However, if we also wanted the order lines we also need to make another call to the products service to show product name, sku etc rather than product id.
In general we can find that when we break up the DB design in this way we need to do a lot of "JOIN via REST" calls. So what is the relative cost of doing this?
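To make the shape of such a call concrete, here is a rough sketch of resolving customer names for a page of orders. It assumes a hypothetical batch endpoint, modelled by the `CustomerNames` interface, so that a single request covers all distinct ids; without such an endpoint the same join degrades into one request per row.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

record Order(long orderId, long customerId) {}

// Hypothetical batch endpoint on the customer service: many ids in, id -> name out.
interface CustomerNames {
    Map<Long, String> namesFor(Set<Long> customerIds);
}

class OrderListing {
    // "JOIN via REST": one batch call for all distinct customer ids,
    // then an in-memory lookup per order instead of one request per row.
    static List<String> describe(List<Order> orders, CustomerNames customers) {
        Set<Long> ids = new LinkedHashSet<>();
        for (Order o : orders) {
            ids.add(o.customerId());
        }
        Map<Long, String> names = customers.namesFor(ids);
        List<String> lines = new ArrayList<>();
        for (Order o : orders) {
            String name = names.getOrDefault(o.customerId(), "unknown customer");
            lines.add("order " + o.orderId() + " by " + name);
        }
        return lines;
    }
}
```

Even in this batched form, the join order, the lookup structure, and the handling of missing rows are all hand-written, which is work a database's query planner would otherwise do.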
Actual Story: Example costs for 'JOIN via REST' vs DB Joins
There are 4 microservices and they involve a lot of "JOIN via REST". A benchmark load for these 4 services comes to ~15 minutes. Those 4 microservices converted into 1 service with 4 modules against a shared DB (that allows joins) executes the same load in ~20 seconds.
This unfortunately is not a direct apples to apples comparison for DB joins vs "JOIN via REST" as in this case we also changed from a NoSQL DB to Postgres.
Is it a surprise that "JOIN via REST" performs relatively poorly when compared to a DB that has a cost-based optimiser etc.?
To some extent, when we break up the DB like this we are also walking away from the cost-based optimiser and all that it does with query execution planning for us, in favor of writing our own join logic (we are, in effect, writing our own relatively unsophisticated query execution plan).
I would see each microservice as an object. Just as with any ORM, where you use objects to pull the data and then create joins within your code and query collections, microservices should be handled in a similar manner. The only difference is that each microservice represents one object at a time rather than a complete object tree. An API layer should consume these services and model the data in the way it has to be presented or stored.
Making several calls back to services for each transaction will not have an impact, as each service runs in a separate container and all these calls can be executed in parallel.
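A minimal sketch of issuing two such calls in parallel, using the standard `CompletableFuture` API, is shown below. The two suppliers stand in for HTTP calls to hypothetical customer and product services, and `OrderSummary` is an illustrative result type.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Result shape for this sketch; the two fields come from different hypothetical services.
record OrderSummary(String customerName, String productName) {}

class ParallelLookup {
    // Starts both remote calls at once and combines the results when both finish.
    // The two suppliers stand in for HTTP calls to the customer and product services.
    static OrderSummary fetch(Supplier<String> customerCall, Supplier<String> productCall) {
        CompletableFuture<String> customer = CompletableFuture.supplyAsync(customerCall);
        CompletableFuture<String> product = CompletableFuture.supplyAsync(productCall);
        return customer.thenCombine(product, OrderSummary::new).join();
    }
}
```

Run this way, the total wait is roughly that of the slower call rather than the sum of both, so the extra calls still cost something; parallelism reduces the added latency without removing it.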
@ccit-spence, I liked the approach of intersection services, but how can they be designed and consumed by other services? I believe it would create a kind of dependency for other services.
Any comments please?

Resources