I have a conceptual/design problem which I am not able to solve. I will try to explain it with a sports example, but of course the same problem applies to an e-shop or any other system.
I want to build two "standalone" microservices with APIs to perform the needed actions. Service A will handle player management and Service B will handle team management.
In Service A I can, for example, perform player training, buy player items, set player appearance, etc.
In Service B I can, for example, set team tactics, hire team staff, set team appearance (logo, color), etc.
I have read many articles and topics about microservices, and as one article put it: "... the hardest part about microservices = your data ..."
And here comes my problem: how and where should I store the relation between player and team (imagine a table TeamPlayer as a simple relation table with columns player_id, team_id)? These are my concerns:
Should the table TeamPlayer be part of microservice A or B? Or should both microservices share the same database?
If the databases are standalone and I decide that microservice B will store this relation, how can I validate that a player exists? What if somebody sends me a wrong player identifier? Or do I even need to care? Do I need to validate that the player exists, or is that not microservice B's concern? Of course microservice B knows teams, so I can return an error when a wrong team identifier is provided.
I do not want to call microservice A from microservice B, because then I will couple them tightly and create a dependency from microservice B to microservice A.
If the databases are standalone, how will I read the data? Imagine that in the UI I want a list of the player names inside a team. Microservice B knows the team and just the relations, and microservice A knows the names. Do I need to make at least two calls to collect the data?
If the database is shared, can microservice B read some data about a player directly from the database? I can imagine situations where this should not be allowed because of access rights etc., but all that business logic is built into the microservice A API, which normally reads and returns data about a player.
How do I solve this in the best way? Is an API gateway the answer, or how should it be done?
I hope it is clear what my concerns are :)
In general, the proper boundaries for microservices aren't going to line up that well with the normalized tables that you'd use in a monolithic application backed by a relational DB with foreign key constraints. Instead, I advise letting each service maintain its own view of the data it needs to perform its tasks (having services publish durable logs of events for changes to the state for which they control the writes is the mechanism I recommend). There are probably multiple "bounded contexts" in your conception of services A and B: it will probably make sense to have a microservice for each bounded context.
So with that in mind, what is a team? From the perspective of the "assign player to team" operation, a team is probably just a set of players associated with some sort of unique ID. Tactics and logo/color are irrelevant for that operation. All it needs is:
what players exist (derivable from the events published by a service which manages creating new players)
what teams have been created (this service might own that, or it could be derived from the events published by a service which manages creating teams)
which players are on which teams (this service would definitely own that data)
Note that this approach does bring in eventual consistency: it won't be until some (likely short, think less than a second) time has passed since a player was created that assigning that player to a team will succeed.
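Here is a minimal sketch of what that could look like, assuming a simple in-memory event feed (the event names and service shape are illustrative, not from any particular framework):

```python
class TeamMembershipService:
    """Owns player<->team assignments; derives 'known players' from events."""

    def __init__(self):
        self.known_players = set()   # view derived from player-service events
        self.known_teams = set()     # view derived from team-service events
        self.memberships = {}        # team_id -> set of player_ids (owned here)

    def on_event(self, event):
        """Apply an event published by another service to the local view."""
        if event["type"] == "PlayerCreated":
            self.known_players.add(event["player_id"])
        elif event["type"] == "TeamCreated":
            self.known_teams.add(event["team_id"])
            self.memberships.setdefault(event["team_id"], set())

    def assign_player(self, player_id, team_id):
        """Validate against the local (eventually consistent) view only."""
        if player_id not in self.known_players:
            raise ValueError("unknown player (event may not have arrived yet)")
        if team_id not in self.known_teams:
            raise ValueError("unknown team")
        self.memberships[team_id].add(player_id)

svc = TeamMembershipService()
svc.on_event({"type": "PlayerCreated", "player_id": 7})
svc.on_event({"type": "TeamCreated", "team_id": 1})
svc.assign_player(player_id=7, team_id=1)   # succeeds once both events arrived
```

Note that the service never calls the player service synchronously; it only trusts the view it has built up from the events so far, which is exactly where the eventual consistency window comes from.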
The short answer is: the TeamPlayer table should be part of both microservice A and microservice B.
For the explanation I borrow a phrase from Levi's answer: "I advise letting each service maintain its own view of the data". This is the correct way to design a microservice.
You may raise questions about data duplication and data consistency. Yes, there will be data duplication, but this is the architectural trade-off of designing, developing, and maintaining microservices. On data consistency, eventual consistency should be the default mode for maintaining database state unless there is a specific use case (banking, for example) that requires strong consistency.
I hope this answer gives you a clear and concise view of your design problem.
How do multiple teams (which own different system components/microservices) in a big tech company share their databases?
I can think of multiple use cases where this would be required. For example, in an e-commerce firm, the same product will be shared among multiple teams: the product will at first be part of a product onboarding service, then maybe a catalog service (which stores all products and categories), then a search service, cart service, order placing service, recommendation service, cancellation & return service, and so on.
If they don't share any DB, then:
Do they all have a redundant copy of the products with the same product ID?
Wouldn't there be a challenge to achieve consistency among the multiple teams?
I have multiple related doubts for both cases, whether they share a DB or not.
I have been through multiple tech blogs and videos on software design and still didn't get a satisfying answer. Do share some resources that give a complete workflow of how things work end-to-end in a big tech firm.
Thank you
In a microservice architecture, each microservice exposes endpoints where other microservices can access the information shared between the services. So one service stores only the minimal information about a record that is managed by another microservice.
For example, if a user service would like to fetch the orders for a particular user in an e-commerce system, the order service would expose an endpoint that, given a user ID, returns all orders related to the user ID supplied, and so on. So essentially the only user-related field that the order service needs to store is the user ID; the rest of the user's details are irrelevant to it.
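As a minimal sketch of this idea (the Order shape and the orders_by_user function are made up for illustration):

```python
from dataclasses import dataclass

@dataclass
class Order:
    order_id: int
    user_id: int          # the only user-related field the order service stores
    total_cents: int

# The order service's own data store (illustrative in-memory stand-in).
ORDERS = [
    Order(order_id=1, user_id=42, total_cents=1999),
    Order(order_id=2, user_id=7,  total_cents=500),
    Order(order_id=3, user_id=42, total_cents=1250),
]

def orders_by_user(user_id: int) -> list[Order]:
    """What an endpoint like GET /users/{user_id}/orders would return."""
    return [o for o in ORDERS if o.user_id == user_id]

# The user service calls this endpoint with a user ID, then enriches the
# result with its own user details (name, address, etc.).
print(orders_by_user(42))
```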
To further improve cohesion and understanding between teams, data discovery APIs/documentation are also built to share database metadata with other teams, explaining what each table/field means so that one can efficiently plan out a microservice. You can read more about how such companies build data discovery tools.
If I understand you correctly, you are unsure how different departments receive data in a company?
The idea is that you create reusable and effective APIs to solve this problem.
Let's generically say the company we're looking at is Walmart. Walmart has millions of items in a database (or several). Each item has a unique ID, etc.
If Walmart is selling items online via walmart.com, they have to have a way to get those items, so they create APIs and use them to grab items based on certain query conditions.
Now, let's say Walmart has decided to build an app... well, they need those exact same items! Good thing we already created those APIs; we will use the exact same ones to grab the data.
Now, how does Walmart manage which items are available at which store, and at what price? They would usually link this metadata through additional database schema tables, tying them all together with primary and foreign keys.
^^ This essentially allows Walmart to grab ONLY the item out of their CORE database, which holds just the details intrinsic to the item (e.g. name, size, color, SKU, details, etc.), and link it to another database, say YOUR local Walmart's, that contains information relevant only to your Walmart location for that item (e.g. price, stock, aisle number, etc.).
So yes, in a sense, multiple databases are used.
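If it helps, here is a rough sketch of that core-vs-local split, using SQLite for brevity (all table and column names are made up for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")

# Core catalog: only details intrinsic to the item itself.
db.execute("""CREATE TABLE item (
    item_id INTEGER PRIMARY KEY,
    name    TEXT, size TEXT, color TEXT, sku TEXT)""")

# Per-store data, keyed back to the core item by foreign key.
db.execute("""CREATE TABLE store_item (
    store_id    INTEGER,
    item_id     INTEGER REFERENCES item(item_id),
    price_cents INTEGER, stock INTEGER, aisle TEXT,
    PRIMARY KEY (store_id, item_id))""")

db.execute("INSERT INTO item VALUES (1, 'Desk Lamp', 'M', 'black', 'SKU-1')")
db.execute("INSERT INTO store_item VALUES (1001, 1, 1299, 42, 'A7')")

# Join the core item details with your local store's price/stock/aisle.
row = db.execute("""SELECT i.name, s.price_cents, s.stock, s.aisle
                    FROM item i JOIN store_item s USING (item_id)
                    WHERE s.store_id = 1001""").fetchone()
print(row)  # ('Desk Lamp', 1299, 42, 'A7')
```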
Perhaps this may drive you down some more roads: https://learnsql.com/blog/why-use-primary-key-foreign-key/
https://towardsdatascience.com/designing-a-relational-database-and-creating-an-entity-relationship-diagram-89c1c19320b2
There's a substantial diversity of approaches used between and even within big tech companies, driven by different company/org cultures and different requirements around consistency and availability.
Any time you have an explicit "query another service/another DB" dependency, you have a coupling which tends to turn a problem in one service into a problem in both services. And this isn't necessarily a one-way thing: it's quite possible for the querying service to encounter a problem which cascades into a problem in the queried service (this is especially likely when a cache becomes load-bearing, which has led to major outages at at least one FANMAG in the not-that-distant past).
This has led some companies that could be fairly called big tech to eschew that approach in their service design, typically by having services publish events describing what has changed to a durable log (append-only storage). Other services subscribe to that log and use the events to construct their own eventually consistent view of the data owned by the other service (i.e. there's some level of data duplication, with services storing exactly the data they need to function).
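As a minimal sketch of that mechanism, assuming an in-memory stand-in for what would in practice be something like Kafka (the event names and view shape are made up):

```python
class EventLog:
    """Durable, append-only log; consumers poll from their own offset."""

    def __init__(self):
        self._events = []              # append-only; entries never mutate

    def publish(self, event: dict) -> int:
        self._events.append(event)
        return len(self._events) - 1   # offset of the appended event

    def read_from(self, offset: int) -> list[dict]:
        return self._events[offset:]   # each consumer tracks its own offset

log = EventLog()
log.publish({"type": "PlayerCreated", "player_id": 7})
log.publish({"type": "PlayerRenamed", "player_id": 7, "name": "Ann"})

# A subscribing service catches up from its last processed offset and
# folds the events into its own local, eventually consistent view.
consumer_offset = 0
local_view = {}
for event in log.read_from(consumer_offset):
    if event["type"] == "PlayerCreated":
        local_view[event["player_id"]] = {"name": None}
    elif event["type"] == "PlayerRenamed":
        local_view[event["player_id"]]["name"] = event["name"]
    consumer_offset += 1

print(local_view)   # {7: {'name': 'Ann'}}
```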
The business domain has five high-level bounded contexts:
Customers
Applications
Documents
Decisions
Preforms
Further, these bounded contexts have sub-contexts, like ordering and delivery of the documents. Despite the project consisting of tens of thousands of classes and dozens of EJBs, most of the business logic resides in relational database views and triggers, for a reason: a lot of joins, unions, and constraints are involved in all business transactions. In other words, there is a complex web of dependencies and constraints between the bounded contexts, which restricts the state transfers. In layman's terms: the business rules are very complicated.
Now, if I were to split this monolith into a database-per-service microservices architecture, with the bounded contexts as the suggested service boundaries, I would have to implement all the business logic with explicit API calls. I would end up with hundreds of APIs implementing all these stupid little business rules. As performance is the main factor (we put a lot of effort into optimizing the SQL as it is now), this is out of the question. Secondly, segregated APIs would probably be a nightmare to maintain in this web of ever-evolving business rules, whereas database triggers actually support high cohesion and the DRY mentality, enforcing the business rules transparently.
I have come to the conclusion that the microservice architecture is unsuitable for this type of document management system. Am I correct, or am I approaching the idea from the wrong angle?
First of all, you don't have to have a microservices architecture. I really mean it! If you were ordered by management/an architect to do it, and it doesn't solve any real problems you are having, you are probably right to push back.
That being said, and with the disclaimer that I don't know the exact requirements of your application, having "things" as bounded contexts is a smell. So having "Customers", "Applications", "Documents", etc. as services is very likely the wrong approach.
Bounded contexts should not be CRUD operations on a specific entity. They should be completely independent (or as independent as possible) "vertical" parts of the whole application, preferably with their own database and GUI. They should also operate independently of each other, not requiring input from other services for their own decisions.
It is the complete opposite of data-centric design, where tables/fields and relations are the core concepts. Here, functionality is the core concept. You would have to split your application along functionality to arrive at a good separation.
I could imagine a document management system having these independent bounded contexts/services: Search, Workflow, Editing, etc.
Here is how you would think about it: Search does not require any (synchronous) input from any other service. It may receive regular, even near-real-time, updates with new documents, but that does not impact its main feature: searching already-indexed documents. The GUI is also independent, maybe something like a single Google-like page with a search box. It can deliver results independently, and would link back to the Workflow or Editing apps when you click on a result.
The others would be similarly independent. Again, the point is to split the services in a way that makes them work independently. If you don't have that, you will only make things worse with Microservices.
First of all, the answer above is correct in suggesting that you need to break up your microservices in a better way.
Now, if scalability is your concern (lots of API calls between microservices):
I strongly suggest you examine how many of the constraints are really required up front, and how many of them you could enforce asynchronously. What I mean is that in a distributed environment we do not actually need to validate everything at the same time.
Sometimes these things are not directly visible. For example: let's say there are two services, an order service and a customer service, and the order service exposes an API to place an order for a given customer ID. The business says you cannot place an order for an unknown customer.
One implementation is to call the customer service synchronously from the order service. In this case the customer service being down will impact your service. Now let's question whether we really need this.
After all, a scenario could happen where a customer just placed an order and somebody then deleted that customer from the customer service; now we have an order which doesn't belong to any customer, so consistency cannot be guaranteed anyway.
In the new solution, we allow the order service to place the order without checking the customer ID and do one of the following:
Using a process manager, check the customer's validity and update the status of the order to invalid if the check fails; likewise, when a customer gets deleted, use the process manager to update the order status to invalid or perform whatever business logic applies.
Do not check at all, because placing an order doesn't commit to anything yet; when the order is in the process of dispatch, that service will check the customer status anyway.
In this way your API hits are reduced and more independent services are produced.
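A rough sketch of the process-manager option, under the assumption of a simple reconciliation pass (all names here are illustrative):

```python
orders = {}             # order_id -> {"customer_id": ..., "status": ...}
known_customers = {42}  # the process manager's view of valid customers

def place_order(order_id: int, customer_id: int) -> None:
    """Accept the order immediately; no synchronous customer check."""
    orders[order_id] = {"customer_id": customer_id, "status": "PENDING"}

def process_manager_pass() -> None:
    """Later, reconcile: mark orders for unknown customers as invalid."""
    for order in orders.values():
        if order["status"] == "PENDING":
            ok = order["customer_id"] in known_customers
            order["status"] = "VALIDATED" if ok else "INVALID"

place_order(1, customer_id=42)   # known customer
place_order(2, customer_id=99)   # unknown customer, still accepted
process_manager_pass()
print(orders)   # order 1 -> VALIDATED, order 2 -> INVALID
```

The key point is that the order service stays available even when the customer service is down; the validity check simply happens after the fact.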
I'm planning to decompose an application, which I started building as a monolith with a graph database, into microservices. The dilemma I'm facing is finding a proper way to split it into different services without losing the benefits provided by the graph database.
The idea I considered initially is to split each different entity into its own microservice, using a document store to persist the data in each service, and then define a higher-level service to manage the relationships.
For example, a relationship (A)-->(B) would produce 3 microservices: a service for entities of type A, another for entities of type B, and a third, higher-level one with a graph database storing nodes of type A and B, containing only the IDs and the relationships between them.
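To make the idea concrete, here is a rough sketch of that higher-level relationship service; it stores only IDs and edges, never the entity payloads (all names here are made up):

```python
class RelationshipService:
    """Holds only node IDs and edges; entity bodies live in services A and B."""

    def __init__(self):
        self.edges = set()   # (from_type, from_id, rel, to_type, to_id)

    def link(self, a_id: str, b_id: str) -> None:
        self.edges.add(("A", a_id, "RELATES_TO", "B", b_id))

    def neighbors_of_a(self, a_id: str) -> list[str]:
        """Returns only B IDs; callers fetch the full B entities from service B."""
        return [e[4] for e in self.edges if e[1] == a_id]

rels = RelationshipService()
rels.link("a1", "b1")
rels.link("a1", "b2")
print(rels.neighbors_of_a("a1"))   # ['b1', 'b2'] -> resolve via service B
```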
Question 1: Is there anything wrong with this approach in terms of coupling, fault tolerance, or anything else that I can't think of right now?
Question 2: When you toss a third entity into the game, for example (A)-->(B), (A)-->(C) and (C)-->(B), which one would be the best approach in this scenario?
Do I stick to the strategy of just one higher level service to maintain all the relationships?
Do I generate several higher level services to maintain each type of relationship?
Question 3: In the case of relationships between entities of the same type, for example (Person)--isFriendOf-->(Person), keeping in mind the concept of separation of concerns, is it appropriate to separate the management of the relationships into a different service?
Any input, feedback and ideas are very welcome.
I've been doing some research on the subject, and for the sake of clarity, I'll propose a more concrete scenario so it will be easier to discuss.
The graph model would be something like this: (User)--listened-->(Song), (Song)--hasGenre-->(Genre), (Song)--by-->(Artist), and (User)--follows-->(User).
The goal here would be to implement a song playlist recommendation service, trying to find songs that a given user hasn't listened to yet, based on the genres and artists of the songs the user has already listened to, and also on other songs listened to by the users followed by the current user.
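To make the traversal concrete, here is a rough sketch of the "followed users" part of the recommendation over an in-memory graph (all data is made up, and the genre/artist signals are left out for brevity):

```python
listened = {              # user -> songs they have listened to
    "me":    {"s1", "s2"},
    "alice": {"s2", "s3", "s4"},
    "bob":   {"s5"},
}
follows = {"me": {"alice"}}   # "me" follows alice, but not bob

def recommend(user: str) -> set[str]:
    """Songs heard by the people the user follows, minus songs already heard."""
    candidates = set()
    for other in follows.get(user, set()):
        candidates |= listened.get(other, set())
    return candidates - listened.get(user, set())

print(recommend("me"))   # {'s3', 's4'} -- s2 excluded, bob not followed
```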
According to the answer to the question How do you handle shared concepts in a microservice architecture? (on Programmers Stack Exchange), it seems that the initially proposed approach is in fact a good one. If separation of concerns is done properly when strangling the different parts into different services, then there shouldn't be coupling issues.
As for fault tolerance, it is hard to generalize, so it might be better to study the specifics on a case-by-case basis and determine, in each situation, how to gracefully degrade the service when other services aren't available.
Regarding questions 2 and 3, I tried to take a generalized, abstract approach at first, but after considering a specific case I concluded that it is also hard to generalize. So for this specific graph model, I came up with this possible solution:
Basically, the answer to question 3 is yes, because this way the relationship of users following other users can be used by some other service, and that service won't be forced to depend on the recommendation system.
As for question 2, it depends on the domain model. Since it made sense to split the user service apart from the friendship service in the first place, that relationship doesn't need to be replicated in the recommendation service. All the other relationships, however, are closely related, so it makes sense to keep them together, at least while there is no need to split them again so they can be reused by other services.
Cloud-native applications and the microservices architecture require a decentralized data model (each microservice has its own database), while a universal data model is a centralized data model.
So, how can we have a microservices architecture that uses universal data model patterns?
Is there any reference to, or implementation of, a universal data model with microservices?
In general the two concepts are not compatible. Using a universal data model for all of your services would clash with a couple of the key ideas behind using microservices, e.g. polyglot persistence and the separate development & deployment of each service. Also, let's not forget that the "Data Model Resource Book" was last updated in 2009.
However, if you must combine the two approaches, e.g. because management insists on it, you can encapsulate all access to the universal data model by a dedicated service and make your other services dependent on it.
Some good thoughts on the subject can be found here: http://plainoldobjects.com/2015/09/02/does-each-microservice-really-need-its-own-database-2/
Yes, to Fritz's point: universal data modeling and microservices are really two different concepts and are very difficult, if not impossible, to use together. I would like to add that the reasoning for polyglot persistence also comes from how the data should be modeled. Microservices allow the use of different data stores that can best model the data according to their domain.
To elaborate, I don't think it would do justice to mention microservices and data modeling without also mentioning domain-driven design. From my experience, domain-driven design really helps in thinking about services, their responsibilities, and their right to exist. For instance, I have often found that a collection of services carries out a particular domain function. An example could be an e-commerce application that has payments, shopping carts, etc. These could be separated into different "bounded contexts", in domain-driven design terminology.
With different bounded contexts, each microservice no longer sees the same concept in the system the same way, so in effect there is no real universal data model. The easiest example I can think of to show this is when you also want reporting on the metrics in the system. If the example is an e-commerce application, the notion of a transaction in the orders microservice is going to be different from transactions in a reporting service. The reporting service may want to know about transactions at a summary level, such as the profit or revenue generated for a particular order, rather than the particular line items in the order. However, from the perspective of the orders service, the order details, such as the line items and the address of the individual who made the purchase, are important and should be known. This calls for two different data models.
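As an illustrative sketch of that split (all field names are made up), the same "order" concept might be modeled like this in the two services:

```python
from dataclasses import dataclass, field

@dataclass
class OrderForFulfillment:          # orders service: line items, address
    order_id: int
    shipping_address: str
    line_items: list[tuple[str, int]] = field(default_factory=list)

@dataclass
class OrderForReporting:            # reporting service: money, not items
    order_id: int
    revenue_cents: int
    profit_cents: int

# The same business event populates both views independently.
placed = OrderForFulfillment(1, "1 Main St", [("SKU-1", 2)])
reported = OrderForReporting(1, revenue_cents=2598, profit_cents=700)
```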
With respect to domain modeling, I may be a bit extreme, but I would go as far as saying that if there are multiple services sharing the same data source, they should really be the same service; there should be only one service per data source. My argument is that otherwise the domain hasn't been properly modeled, and the coupling makes it difficult to evolve any one service when multiple services rely on a single data source. It could be the case that one service requires the schema of the data source to change while the other one does not, yet is still forced to accommodate the schema change. Hope this helps!
This is more of a conceptual question but answers specific to opensource products like (JBoss, etc) are also welcome.
If my enterprise app needs to scale and I want to choose the scale-out model (instead of the scale-up model), how would multiple app server instances preserve the singleton semantics of a piece of code/data?
Example: Let's say, I have an ID-generation class whose logic demands that it be instantiated as a singleton. This class may or may not talk to the underlying database. Now, how would I ensure that the singleton semantics of this class are maintained when scaling out?
Secondly, is there a book or an online resource that both lists such conceptual issues and suggests solutions?
EDIT: In general, how would one handle generic application state in the app server layer to allow the application to scale out? What design patterns and software components/products should I be exploring further?
The further you scale out, the less able you will be to manage global state atomically. In other words, if you have 100 servers that need to share state (knowing which ID is next in an ID-generating singleton class), then there is no technology I know of that will quickly and atomically get that ID for you.
Data has to travel from machine to machine in regards to the ID generation.
There are a few options I can think of for the scenario you mentioned:
Wait for all machines to catch up/sync before accepting a new ID. You could generate the ID locally and then check that it's good across other machines - or - run a job to get the next ID across all machines (think map/reduce).
Think sharding. With sharding you can generate IDs "locally" and be guaranteed uniqueness. So if you had 100 machines, machines 1-10 are for users in California, machines 11-20 are for users in New York, etc. Picking a sharding key can be tough (see the sketch after this list).
Start looking to messaging systems. You would create/modify your object locally on a machine and then send the result to a service bus/messaging system and the other machines subscribe to a topic/queue and can get the object and process it.
Pick a horizontally scalable database to manage objects. They've already solved the issues of syncing and replication.
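For the sharding option, here is a minimal sketch of shard-aware ID generation; the bit layout (16-bit shard ID, 48-bit counter) is an arbitrary choice for illustration:

```python
class ShardedIdGenerator:
    """Each node hands out IDs locally; the embedded shard ID keeps them
    globally unique with no cross-machine coordination."""

    def __init__(self, shard_id: int):
        assert 0 <= shard_id < 2**16
        self.shard_id = shard_id
        self.counter = 0            # local only, never contended across nodes

    def next_id(self) -> int:
        self.counter += 1
        return (self.shard_id << 48) | self.counter

gen_ca = ShardedIdGenerator(shard_id=1)    # e.g. California machines
gen_ny = ShardedIdGenerator(shard_id=11)   # e.g. New York machines
print(gen_ca.next_id(), gen_ny.next_id())  # unique without any sync
```

A single-process counter is shown for brevity; in practice each machine would persist or recover its counter on restart, but the point stands: uniqueness comes from the shard ID, not from coordination.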