I'm planning to decompose an application, which I started building as a monolith backed by a graph database, into microservices. But the dilemma I'm facing is finding a proper way to split the different services without losing the benefits provided by the graph database.
The idea I initially considered is to split each entity type into its own microservice, using a document store to persist the data in each service, and then to define a higher-level service to manage the relationships.
For example, a relationship (A)-->(B) would produce three microservices: a service for entities of type A, another for entities of type B, and a third, higher-level service with a graph database that stores nodes of type A and B containing only the IDs, plus the relationships between them.
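To make the proposal concrete, here is a minimal in-memory sketch of the three-service split. All class and method names are hypothetical. Plain dicts and sets stand in for the document stores and the graph database, and real services would communicate over the network rather than through direct method calls.

```python
from __future__ import annotations


class EntityService:
    """Owns the full documents for one entity type (stand-in for a document store)."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}

    def put(self, entity_id: str, doc: dict) -> None:
        self._docs[entity_id] = doc

    def get(self, entity_id: str) -> dict | None:
        return self._docs.get(entity_id)


class RelationshipService:
    """Stores only IDs and typed edges (stand-in for the graph database)."""

    def __init__(self) -> None:
        self._edges: set[tuple[str, str, str]] = set()

    def relate(self, src_id: str, rel: str, dst_id: str) -> None:
        self._edges.add((src_id, rel, dst_id))

    def neighbours(self, src_id: str, rel: str) -> list[str]:
        return sorted(d for s, r, d in self._edges if s == src_id and r == rel)


def related_docs(a_id: str, rel: str, graph: RelationshipService,
                 b_service: EntityService) -> list[dict]:
    """Compose a read: ask the graph for IDs, then fetch the documents from B."""
    docs = []
    for b_id in graph.neighbours(a_id, rel):
        doc = b_service.get(b_id)
        if doc is not None:  # B may not have this ID; tolerate dangling edges
            docs.append(doc)
    return docs


a_svc, b_svc, graph = EntityService(), EntityService(), RelationshipService()
a_svc.put("a1", {"name": "first A"})
b_svc.put("b1", {"name": "first B"})
graph.relate("a1", "LINKS_TO", "b1")
graph.relate("a1", "LINKS_TO", "b2")  # b2 was never created in service B
print(related_docs("a1", "LINKS_TO", graph, b_svc))  # [{'name': 'first B'}]
```

Note the dangling `b2`: because the relationship service and the entity services are updated independently, the graph can refer to IDs that an entity service no longer (or does not yet) hold, which is exactly the kind of coupling and consistency issue Question 1 asks about.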
Question 1: Is there anything wrong with this approach in terms of coupling, fault tolerance, or anything else that I can't think of right now?
Question 2: When you toss a third entity into the game, for example (A)-->(B), (A)-->(C) and (C)-->(B), which one would be the best approach in this scenario?
Do I stick to the strategy of just one higher level service to maintain all the relationships?
Do I generate several higher level services to maintain each type of relationship?
Question 3: In the case of relationships between entities of the same type, for example (Person)--isFriendOf-->(Person), and bearing in mind the concept of separation of concerns, is it appropriate to move the management of the relationships into a different service?
Any input, feedback and ideas are very welcome.
I've been doing some research on the subject, and for the sake of clarity I'll propose a more concrete scenario, so it will be easier to discuss.
The graph model would be something like this:
The goal here would be to implement a song playlist recommendation service that finds songs a given user hasn't listened to yet, based on the genres and artists of the songs the user has already listened to, and also on songs listened to by other users whom the current user follows.
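Since the graph model itself is not reproduced here, the following sketch assumes a shape for it: users LISTENED to songs, users FOLLOW other users, and each song has one artist and one genre. In-memory dicts stand in for the graph database, and the equal-weight scoring (one point each for a known artist, a known genre, and a song heard by a followed user) is a hypothetical choice made only to show the shape of the query.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    song_id: str
    artist: str
    genre: str


def recommend(user: str,
              listened: dict[str, set[str]],
              follows: dict[str, set[str]],
              songs: dict[str, Song]) -> list[str]:
    """Rank songs `user` has not heard yet (hypothetical equal-weight scoring)."""
    heard = listened.get(user, set())
    liked_artists = {songs[s].artist for s in heard}
    liked_genres = {songs[s].genre for s in heard}

    # Songs heard by the users that `user` follows.
    heard_by_followed: set[str] = set()
    for other in follows.get(user, set()):
        heard_by_followed |= listened.get(other, set())

    scored = []
    for song in songs.values():
        if song.song_id in heard:
            continue  # only recommend songs the user hasn't listened to
        score = (int(song.artist in liked_artists)
                 + int(song.genre in liked_genres)
                 + int(song.song_id in heard_by_followed))
        if score > 0:
            scored.append((score, song.song_id))
    # Highest score first; ties broken by ID so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [song_id for _, song_id in scored]


songs = {s.song_id: s for s in [
    Song("s1", "X", "rock"), Song("s2", "X", "jazz"), Song("s3", "Y", "rock"),
    Song("s4", "Z", "pop"), Song("s5", "Z", "jazz"), Song("s6", "X", "rock"),
]}
listened = {"alice": {"s1"}, "bob": {"s4"}}
follows = {"alice": {"bob"}}
print(recommend("alice", listened, follows, songs))  # ['s6', 's2', 's3', 's4']
```

In a graph database this would typically be a single traversal query; the point of the sketch is only that all three relationship types (listened, follows, artist/genre) are needed together, which is relevant to the later question of whether to keep them in one service.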
According to the answer to this question on Programmers Stack Exchange, "How do you handle shared concepts in a microservice architecture?", it seems that the initially proposed approach is in fact a good one. If separation of concerns is done properly when strangling the different parts out into different services, there shouldn't be coupling issues.
As for fault tolerance, it is hard to generalize, so it might be better to study the specifics on a case-by-case basis and determine, in each situation, how to gracefully degrade the service when other services aren't available.
Regarding questions 2 and 3, I tried to take a generalized, abstract approach at first, but after considering a specific case, I came to the conclusion that it is also hard to generalize. So for this specific graph model, I came up with this possible solution:
So basically, the answer to question 3 is yes: this way, the relationship of users following other users can be used by other services without forcing them to depend on the recommendation system.
And for question 2, it depends on the domain model. Since it made sense to split the user service from the friendship service in the first place, that relationship doesn't need to be replicated in the recommendation service. All the other relationships, however, are closely related, so it makes sense to keep them together, at least until they need to be split again so that they can be reused by other services.
I have a conceptual/design problem which I am not able to solve. I will try to explain it using a sports example, but of course the same problem applies to an e-shop or any other system.
I want to build two "standalone" microservices with APIs to perform the needed actions. Service A will handle player management and Service B will handle team management.
In Service A I can for example perform player training, buy player items, set player appearance etc.
In Service B I can for example set team tactic, hire team staff, set team appearance (logo, color) etc.
I have read many articles and topics about microservices, and as one article puts it: "... hardest part about microservice = your data ..."
And here comes my problem. How and where should I store the relation between player and team (imagine a TeamPlayer table as a simple relation table with columns player_id, team_id)? These are my concerns:
Should the TeamPlayer table be part of microservice A or B? Or should both microservices share the same database?
If there are standalone databases and I decide that microservice B will store this relation, how can I validate that the player exists? What if somebody sends me a wrong player identifier? Or do I need to care? Do I need to validate that the player exists, or is this not microservice B's concern? Of course microservice B knows the teams, so I can return an error when a wrong team identifier is provided.
I do not want to call microservice A from microservice B, because then I would tie them together and create a dependency from microservice B to microservice A.
If there are standalone databases, how will I read the data? Imagine that in the UI I want a list of player names for a team. Microservice B knows the team and just the relations, and microservice A knows the names. Do I need to make at least two calls to collect the data?
If it is a shared database, can microservice B read some player data directly from the database? I can imagine situations where this should not be allowed because of access rights etc., but all that business logic is built into microservice A's API, which normally reads and returns player data.
How do I solve this in the best way? Is an API gateway the answer?
I hope my concerns are clear :)
In general, the proper boundaries for microservices aren't going to line up that well with the normalized tables that you'd use in a monolithic application backed by a relational DB with foreign key constraints. Instead, I advise letting each service maintain its own view of the data it needs to perform its tasks (having services publish durable logs of events for changes to the state for which they control the writes is the mechanism I recommend). There are probably multiple "bounded contexts" in your conception of services A and B: it will probably make sense to have a microservice for each bounded context.
So with that in mind, what is a team? From the perspective of the "assign player to team" operation, a team is probably just a set of players associated with some sort of unique ID. Tactics and logo/color are irrelevant for that operation. All it needs is:
what players exist (derivable from the events published by a service which manages creating new players)
what teams have been created (this service might own that, or it could derive from the events published by a service which manages creating teams)
which players are on which teams (this service would definitely own that data)
Note that this approach does bring in eventual consistency: an assignment of a player to a team won't succeed until some time (likely short, probably less than a second) has passed since that player was created.
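A minimal sketch of such an "assign player to team" service, assuming hypothetical event shapes such as `{"type": "PlayerCreated", "id": "p1"}` published by the services that create players and teams. It also assumes a player belongs to at most one team. The service keeps its own view of which players and teams exist and owns only the assignments, so it never calls the player service synchronously.

```python
from __future__ import annotations


class TeamAssignmentService:
    """Hypothetical service owning only player-to-team assignments."""

    def __init__(self) -> None:
        self.known_players: set[str] = set()  # derived from published events
        self.known_teams: set[str] = set()    # derived from published events
        self.roster: dict[str, str] = {}      # player_id -> team_id (owned here)

    def on_event(self, event: dict) -> None:
        # Event shapes are assumptions for this sketch.
        if event["type"] == "PlayerCreated":
            self.known_players.add(event["id"])
        elif event["type"] == "TeamCreated":
            self.known_teams.add(event["id"])

    def assign(self, player_id: str, team_id: str) -> bool:
        # Validation uses the local, possibly slightly stale, view.
        if player_id not in self.known_players or team_id not in self.known_teams:
            return False  # the caller may retry once the creation event arrives
        self.roster[player_id] = team_id
        return True


svc = TeamAssignmentService()
svc.on_event({"type": "TeamCreated", "id": "t1"})
print(svc.assign("p1", "t1"))  # False: PlayerCreated not consumed yet
svc.on_event({"type": "PlayerCreated", "id": "p1"})
print(svc.assign("p1", "t1"))  # True
```

The first, rejected assignment is the eventual-consistency window described above: the player exists in the player service but has not yet shown up in this service's view.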
The short answer is that the TeamPlayer table should be part of both microservice A and microservice B.
To explain, I'll borrow the phrase from Levi's answer: "I advise letting each service maintain its own view of the data". This is the correct way to design microservices.
You may raise questions about data duplication and data consistency. Yes, there will be data duplication, but that is an architectural trade-off of designing, developing and maintaining microservices. As for data consistency, eventual consistency should be the default mode for maintaining database state unless there is a specific use case (for example, banking) that requires strong consistency.
Hope this answer gives you clear and concise view of your design problem.
I was wondering whether it is possible to store graph-related data (for example, a guitar node with various other nodes of data connected to it, like specs and description) inside another kind of database?
Imagine a graph inside a block in a tabular database, or a guitar key with various connected nodes stored as key/value pairs. If possible, what are the performance limitations, drawbacks, etc.?
You can take most business scenarios and model them in multiple ways, so yes, in theory, you could define one solution that uses a graph DB and another solution that uses a relational DB.
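As one illustration of the relational option, the guitar example can be stored as a node table plus an edge table, with a recursive common table expression (SQL `WITH RECURSIVE`) walking the relationships. This sketch uses Python's built-in `sqlite3`; the table names, columns and sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, label TEXT, value TEXT);
CREATE TABLE edge (src INTEGER REFERENCES node(id),
                   dst INTEGER REFERENCES node(id),
                   rel TEXT);
CREATE INDEX edge_src ON edge(src);  -- speeds up following outgoing edges
""")
conn.executemany("INSERT INTO node VALUES (?, ?, ?)", [
    (1, "Guitar", "Example guitar"),
    (2, "Specs", "6 strings"),
    (3, "Description", "Solid body"),
    (4, "Spec", "22 frets"),
])
conn.executemany("INSERT INTO edge VALUES (?, ?, ?)", [
    (1, 2, "HAS_SPECS"), (1, 3, "HAS_DESCRIPTION"), (2, 4, "INCLUDES"),
])

# Everything reachable from the guitar node. The depth limit stops the
# recursion from running forever if the graph contains cycles.
rows = conn.execute("""
WITH RECURSIVE reachable(id, depth) AS (
    SELECT 1, 0
    UNION
    SELECT e.dst, r.depth + 1
    FROM edge e JOIN reachable r ON e.src = r.id
    WHERE r.depth < 10
)
SELECT n.label, n.value, r.depth
FROM reachable r JOIN node n ON n.id = r.id
ORDER BY r.depth, n.id
""").fetchall()
print(rows)
```

Each extra hop is another join against the edge table, which is where the performance trade-offs relative to a native graph store tend to show up; how much that matters depends on the actual queries and data volumes.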
Whether you can actually model the business scenario for specific DB types would be down to the details of the requirements, as would the pros and cons.
I'm afraid your question is too general to provide an answer that gives any specifics on performance limitations, drawbacks, etc.
The business domain has five high-level bounded contexts:
Customers
Applications
Documents
Decisions
Preforms
Further, these bounded contexts have sub-contexts, like ordering and delivery of the documents. Although the project consists of tens of thousands of classes and dozens of EJBs, most of the business logic resides in relational database views and triggers for a reason: a lot of joins, unions and constraints are involved in all business transactions. In other words, there is a complex web of dependencies and constraints between the bounded contexts, which restricts the state transitions. In layman's terms: the business rules are very complicated.
Now, if I were to split this monolith into a database-per-service microservices architecture, with the bounded contexts as the suggested service boundaries, I would have to implement all the business logic with explicit API calls. I would end up with hundreds of APIs implementing all these stupid little business rules. As performance is the main factor (we put a lot of effort into optimizing the SQL as it is now), this is out of the question. Secondly, segregated APIs would probably be a nightmare to maintain in this web of ever-evolving business rules, whereas database triggers actually support high cohesion and the DRY mentality, enforcing the business rules transparently.
I have come to the conclusion that a microservice architecture is unsuitable for this type of document management system. Am I correct, or am I approaching the idea from the wrong angle?
First of all, you don't have to have a Microservices architecture. I really mean it! If you were ordered by management/architect to do it, and it doesn't solve any real problems you are having, you are probably right for pushing back.
That being said, and with the disclaimer that I don't know the exact requirements of your application, having "things" as bounded context is a smell. So having "Customers", "Applications", "Documents", etc. as services is very likely the wrong approach.
Bounded contexts should not be CRUD operations on a specific entity. They should be completely independent (or as independent as possible) "vertical" parts of the whole application, preferably with their own database and GUI. They should also operate independently of each other, not requiring input from other services for their own decisions.
It is the complete opposite of data-centric design, where tables/fields and relations are the core concepts. Here, functionality is the core concept. You would have to split your application along functionality to arrive at a good separation.
I could imagine a document management system having these independent bounded contexts / services: Search, Workflow, Editing, etc.
Here is how you would think about it: Search does not require any (synchronous) input from any other service. It may receive regular, even near-real-time updates with new documents, but that does not impact its main feature: searching already indexed documents. The GUI is also independent, something like a single Google-like page with a search box. It can deliver results independently, and would link back to the Workflow or Editing apps when you click on a result.
The others would be similarly independent. Again, the point is to split the services in a way that makes them work independently. If you don't have that, you will only make things worse with Microservices.
First of all, the answer above is correct in suggesting that you need to break up your services in a better way.
Now, if scalability is your concern (lots of API calls between microservices):
I strongly suggest you check how many of the constraints are really required up front, and how many of them you could handle asynchronously. What I mean is that in a distributed environment we do not actually need to validate everything at the same time.
Sometimes these things are not directly visible. For example, let's say there are two services, an order service and a customer service. The order service exposes an API that says "place an order for customer ID", and the business says you cannot place an order for an unknown customer.
One implementation is for the order service to call the customer service synchronously. In this case, the customer service being down will impact your service, so let's question whether we really need this.
Even with that check, a customer could place an order and then somebody could delete that customer from the customer service, leaving an order that doesn't belong to any customer. Consistency cannot be guaranteed anyway.
In the new solution, we allow the order service to place the order without checking the customer ID, and do one of the following:
Use a ProcessManager to check the customer's validity and update the order status to invalid; when a customer gets deleted, use the ProcessManager to update the order status to invalid or perform other business logic.
Do not check at all, because placing an order doesn't commit anything yet; when the order is being dispatched, that service will check the customer status anyway.
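The first option can be sketched as follows. The class names, status strings and event-handler methods are hypothetical, and the "events" are plain method calls here rather than messages on a broker. The order service accepts orders without a synchronous customer check, and a process manager reacting to customer and order events marks orders valid or invalid afterwards.

```python
from __future__ import annotations


class OrderService:
    def __init__(self) -> None:
        self.orders: dict[str, dict] = {}

    def place_order(self, order_id: str, customer_id: str) -> None:
        # No synchronous call to the customer service: accept and mark pending.
        self.orders[order_id] = {"customer_id": customer_id, "status": "PENDING"}

    def set_status(self, order_id: str, status: str) -> None:
        self.orders[order_id]["status"] = status


class CustomerValidationProcessManager:
    """Hypothetical process manager reacting to customer and order events."""

    def __init__(self, order_service: OrderService) -> None:
        self.order_service = order_service
        self.active_customers: set[str] = set()

    def on_customer_created(self, customer_id: str) -> None:
        self.active_customers.add(customer_id)

    def on_customer_deleted(self, customer_id: str) -> None:
        self.active_customers.discard(customer_id)
        # Invalidate (or apply other business logic to) that customer's orders.
        for order_id, order in self.order_service.orders.items():
            if order["customer_id"] == customer_id:
                self.order_service.set_status(order_id, "INVALID")

    def on_order_placed(self, order_id: str) -> None:
        customer_id = self.order_service.orders[order_id]["customer_id"]
        valid = customer_id in self.active_customers
        self.order_service.set_status(order_id, "VALID" if valid else "INVALID")


orders = OrderService()
pm = CustomerValidationProcessManager(orders)
pm.on_customer_created("c1")
orders.place_order("o1", "c1")
pm.on_order_placed("o1")
orders.place_order("o2", "c999")
pm.on_order_placed("o2")
pm.on_customer_deleted("c1")
print({oid: o["status"] for oid, o in orders.orders.items()})
```

A real process manager would probably also have to cope with a CustomerCreated event arriving after the order (for example by retrying before declaring the order invalid); the sketch ignores that ordering problem.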
In this way, the number of API calls is reduced and the services become more independent.
I am building a system that allows front-end users to define their own business objects. Defining a business object involves creating data fields for that business object and then relating it to other business objects in the system: fairly straightforward stuff. My question is, what is the most efficient storage strategy?
The requirements are:
Must support business objects with potentially 100+ fields (of all common data types)
The system will eventually support hundreds of thousands of business object instances
Business objects sometimes display data and aggregates from their relationships with other business objects
Users must be able to search for business objects by their data fields (and fields from related business objects)
The two possible solutions I can envisage are:
Have a dynamic schema such that when a new business object type is created a new table is created for storing instances of that object. The object's fields become columns in the storage table.
Have a fixed schema where instance data fields are stored as rows in basically a big long table.
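The two options can be sketched side by side with Python's built-in `sqlite3`. The helper `create_type_table`, the `field_value` table and the sample data are hypothetical. Because table and column names cannot be bound as query parameters, the dynamic option has to validate identifiers before splicing them into DDL; the fixed option is essentially an entity-attribute-value layout.

```python
import sqlite3

ALLOWED_TYPES = {"TEXT", "INTEGER", "REAL"}


def create_type_table(conn, type_name, fields):
    """Option 1 (dynamic schema): one table per user-defined type."""
    for name in (type_name, *fields):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    if not set(fields.values()) <= ALLOWED_TYPES:
        raise ValueError("unsupported column type")
    cols = ", ".join(f'"{n}" {t}' for n, t in fields.items())
    conn.execute(f'CREATE TABLE "{type_name}" (id INTEGER PRIMARY KEY, {cols})')


conn = sqlite3.connect(":memory:")

# Option 1: typed columns, so the search is a plain, indexable comparison.
create_type_table(conn, "Customer", {"name": "TEXT", "age": "INTEGER"})
conn.execute('INSERT INTO "Customer" (name, age) VALUES (?, ?)', ("Ann", 41))
dynamic_hits = conn.execute(
    'SELECT id FROM "Customer" WHERE age > ?', (40,)).fetchall()

# Option 2 (fixed schema): one row per (instance, field) pair, values as TEXT.
conn.execute("""CREATE TABLE field_value (
    instance_id INTEGER, type_name TEXT, field TEXT, value TEXT,
    PRIMARY KEY (instance_id, field))""")
conn.executemany("INSERT INTO field_value VALUES (?, ?, ?, ?)",
                 [(1, "Customer", "name", "Ann"), (1, "Customer", "age", "41")])
# Numeric search needs a cast, which an index on `value` cannot serve directly.
eav_hits = conn.execute(
    "SELECT instance_id FROM field_value "
    "WHERE type_name = 'Customer' AND field = 'age' AND CAST(value AS INTEGER) > ?",
    (40,)).fetchall()

print(dynamic_hits, eav_hits)  # [(1,)] [(1,)]
```

Both queries find the same instance, but the fixed-schema one has to filter rows by type and field and cast every candidate value, and searching on several fields means joining `field_value` to itself once per field. This is one concrete reason to expect the static schema to slow down as the number of objects grows.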
I can see pros and cons to both approaches:
the dynamic schema allows me to index search columns
the dynamic tables are potentially limited in width by the maximum number of columns per table
dynamic schemas rule out / cause issues with replication
the static schema means less or even no dynamic sql generation
my guess is the static schema may perform like a dog when it comes to searching across 100,000+ objects
So what is the best solution? Is there another approach I haven't thought of?
Edit: The requirement I have been given is to build a generic system capable of supporting front-end user defined business objects. There will of course be restrictions on how these objects can be constructed and related, but the requirement itself is not up for negotiation.
My client is a service provider and requires a degree of flexibility in servicing their own clients, hence the need to create business objects.
I think your problem matches very well to a graph database like Neo4j, as it's built for the requested kind of flexibility from the beginning. It stores data as nodes and relationships/edges, and both nodes and relationships can hold arbitrary properties (in a key/value fashion). One important difference from an RDBMS is that a graph database won't need to look up the relationships in a big long table (like in your fixed schema solution), so there should be a significant performance gain there. You can find out about language bindings for Neo4j in the wiki and read what others say about it in this Stack Overflow thread. Disclaimer: I'm part of the Neo4j team.
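The structural idea behind that performance claim can be shown with a toy in-memory model. This is not Neo4j's API or its storage engine; the `Node` class is hypothetical. Each node keeps direct references to its own relationships (an idea often called index-free adjacency), so following them costs time proportional to that node's relationships rather than to the size of one big shared table.

```python
from __future__ import annotations


class Node:
    """Toy property-graph node; not the Neo4j API."""

    def __init__(self, **properties) -> None:
        self.properties = dict(properties)  # arbitrary key/value pairs
        self.out: list[tuple[str, Node, dict]] = []  # (type, target, properties)

    def relate(self, rel_type: str, target: Node, **properties) -> None:
        # Relationships can carry their own key/value properties too.
        self.out.append((rel_type, target, dict(properties)))

    def neighbours(self, rel_type: str) -> list[Node]:
        # Scans only this node's own relationships, not every relationship stored.
        return [t for r, t, _ in self.out if r == rel_type]


acme = Node(kind="Customer", name="Acme")
order = Node(kind="Order", total=99.5)
acme.relate("PLACED", order, channel="web")
print([n.properties["total"] for n in acme.neighbours("PLACED")])  # [99.5]
```

A relational EAV layout answers the same question by searching a shared table (ideally through an index); whether the difference is significant in practice depends on the data and queries.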
Without much understanding of your situation...
Instead of writing a general purpose one-size-fits-all business objects system (which is the holy grail for Oracle, Microsoft, SAS, etc.), why not do it the typical way, where the requirements are gathered, and a developer designs and implements the users' business objects in an effective manner?
If your users are typical, they will create a monster, which will end up running slow, and they will hate it. Most users will view the data as an Excel sheet and not understand relationships like parent/child. As a result there will be some crazy objects built, and impossible-to-solve reports. You'll be forced to create scripts to manually convert many old objects to better and properly defined ones, etc...
Your requirements sound a little bit like an associative database with a front end to compose and edit entities.
I agree with KM above: unless you have a very compelling reason not to, you would be better off using a traditional approach. There are a lot of development tools and practices that allow you to build a robust and scalable system. Otherwise you will have to implement much of this yourself.
I don't know the best way to do this, because it sounds like something that has already been implemented by others. If I were asked to implement this feature, I would recommend buying a wheel instead of reinventing it.
Perhaps there are reasons you have to invent your own? If so, then you should add those reasons to the requirements you listed.
If you absolutely must be this generic, I still recommend buying a system that has been architected for this requirement. Not just the storage requirements, which are the least of the problems your customer will have; but also: how do you keep the customer from screwing up totally when given this much freedom. Some of the commercial systems already meet this challenge without going out of business because of customers messing up.
If you still need to do this on your own, then I suggest that your requirements (or perhaps those of another vendor?) must include: allow the customer to get it right, and help keep the customer from getting it wrong. You'll need some sort of UI to allow the customer to define these business objects, and the UI should validate the model that the customer builds.
I recommend a UI that works at a conceptual level. As an example, see NORMA, a Visual Studio add-in for Object-Role Modeling (the "other" ORM). Consider it only as an example if your end users cannot afford a Visual Studio Standard license. Otherwise, you'll find that it is extensible, already produces many types of artifact (from SQL in various dialects to code), and will validate the model to see that it makes sense. End users would also be able to enter sample data that they believe should be valid, and the system will validate the data against the model.
If your customers are producing sensible (if dynamic) business objects, then the question of storage will be much simpler.
Have you thought about an XML based solution? The requirements suggested to me "Build a system that allows users to dynamically generate an XML Schema and work with XML documents based on that schema." I don't know enough about storing and querying XML documents to comment on your original question.
Another possibility might be to leverage NHibernate's ability to generate database schemas. If you can dynamically generate business objects, then you can generate XML mappings or Fluent mappings and use that to generate a normalized database schema.
Every user that I have ever talked to has always wanted "everything" in their project. Part of the job of gathering requirements is to guide the user, not just write down everything they say.
Your only hope is to build several template objects that they can add properties to. You could code your application to handle each type of these objects, but still allow the user to modify each one slightly as necessary.
You need to inform the user upfront of the major flaws this type of design has. This will help you in the end, when it runs slow, or if they screw up and need help fixing something. I'd put this in writing.
How many possible objects would they really need? Perhaps you could set these up using your system first. I have developed several very customizable systems over the years and when the user is sitting at an empty screen, it is like a deer in the headlights.
In any event, good luck.
I've been trying to see if I can accomplish some requirements with a document based database, in this case CouchDB. Two generic requirements:
CRUD of entities with some fields that have a unique index on them
ecommerce web app like eBay (better description here).
And I'm beginning to think that a document-based database isn't the best choice to address these requirements. Furthermore, I can't imagine a use for a document-based database (maybe my imagination is too limited).
Can you explain to me whether I am asking for pears from an elm (that is, expecting the impossible) when I try to use a document-oriented database for these requirements?
You need to think of how you approach the application in a document oriented way. If you simply try to replicate how you would model the problem in an RDBMS then you will fail. There are also different trade-offs that you might want to make. ([ed: not sure how this ties into the argument but:] Remember that CouchDB's design assumes you will have an active cluster of many nodes that could fail at any time. How is your app going to handle one of the database nodes disappearing from under it?)
One way to think about it is to imagine you didn't have any computers, just paper documents. How would you create an efficient business process using bits of paper being passed around? How can you avoid bottlenecks? What if something goes wrong?
Another angle you should think about is eventual consistency, where you will get into a consistent state eventually, but you may be inconsistent for some period of time. This is anathema in RDBMS land, but extremely common in the real world. The canonical transaction example is transferring money between bank accounts. How does this actually happen in the real world: through a single atomic transaction, or through different banks issuing credit and debit notices to each other? What happens when you write a cheque?
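The "credit and debit notices" version of a transfer can be sketched as two independent local steps connected by a message. The `Bank` class and its inbox are hypothetical and run in a single process, with no network or failure handling. Each bank changes only its own balances; between the two steps the money is "in flight", which is the temporary inconsistency described above.

```python
from __future__ import annotations

from collections import deque


class Bank:
    def __init__(self, name: str, balances: dict[str, int]) -> None:
        self.name = name
        self.balances = dict(balances)
        self.inbox: deque = deque()  # notices received from other banks

    def send_transfer(self, other: Bank, src: str, dst: str, amount: int) -> None:
        # Local step at the sending bank: debit, then issue a credit notice.
        if self.balances[src] < amount:
            raise ValueError("insufficient funds")
        self.balances[src] -= amount
        other.inbox.append(("credit", dst, amount))

    def process_inbox(self) -> None:
        # Local step at the receiving bank, possibly much later.
        while self.inbox:
            kind, account, amount = self.inbox.popleft()
            if kind == "credit":
                self.balances[account] = self.balances.get(account, 0) + amount


a = Bank("A", {"alice": 100})
b = Bank("B", {"bob": 0})
a.send_transfer(b, "alice", "bob", 30)
print(a.balances["alice"] + b.balances["bob"])  # 70: 30 is still in flight
b.process_inbox()
print(a.balances["alice"] + b.balances["bob"])  # 100: consistent again
```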
So let's look at your examples:
CRUD of entities with some fields that have a unique index on them.
If I understand this correctly in CouchDB terms, you want to have a collection of documents where some named value is guaranteed to be unique across all those documents? That case isn't generally supportable because documents may be created on different replicas.
So we need to look at the real world problem and see if we can model that. Do you really need them to be unique? Can your application handle multiple docs with the same value? Do you need to assign a unique identifier? Can you do that deterministically? A common scenario where this is required is where you need a unique sequential identifier. This is tough to solve in a replicated environment. In fact if the unique id is required to be strictly sequential with respect to time created it's impossible if you need the id straight away. You need to relax at least one of those constraints.
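Two common ways of relaxing those constraints are sketched below; the names are hypothetical. The first mints IDs that are unique without coordination by including a replica identifier, giving up global ordering by creation time. The second derives the document ID deterministically from the value that must be unique, so two replicas creating "the same" record produce the same ID. In CouchDB, documents with the same `_id` written on different replicas are flagged as conflicts on replication rather than silently existing side by side, so the application still has to resolve them.

```python
from __future__ import annotations

import itertools


class Replica:
    """Hypothetical replica that mints IDs without talking to other replicas."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self._counter = itertools.count(1)

    def new_id(self) -> str:
        # Unique across replicas because the replica ID is part of the ID and
        # sequential per replica, but NOT globally ordered by creation time.
        return f"{self.replica_id}-{next(self._counter):08d}"


def id_for_email(email: str) -> str:
    # Deterministic ID from the natural key: same input, same document ID.
    return "user:" + email.strip().lower()


r1, r2 = Replica("r1"), Replica("r2")
print([r1.new_id(), r2.new_id(), r1.new_id()])
# ['r1-00000001', 'r2-00000001', 'r1-00000002']
print(id_for_email(" Ann@Example.com "))  # user:ann@example.com
```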
ecommerce web app like eBay
I'm not sure what to add here as the last comment you made on that post was to say "very useful! thanks". Was there something missing from the approach outlined there that is still causing you a problem? I thought MrKurt's answer was pretty full and I added a little enhancement that would reduce contention.
Is there a need to normalize the data?
Yes: Use relational.
No: Use document.
I am in the same boat. I am loving CouchDB at the moment, and I think the whole functional style is great. But when exactly do we start to use them in earnest for applications? I mean, yes, we can all start to develop applications extremely quickly, cruft-free, with all those nasty hang-ups about normal form left by the wayside and no schemas. But, to coin a phrase, "we are standing on the shoulders of giants". There is a good reason to use an RDBMS, to normalise and to use schemas. My old Oracle head is reeling at the thought of data without form.
My main wow factor in CouchDB is the replication and the versioning system working in tandem.
I have been racking my brain for the last month trying to grok the storage mechanisms of CouchDB. Apparently it uses B-trees but doesn't store data based on normal form. Does this mean that it is really, really smart and realises that bits of data are replicated, so it just makes a pointer to this B-tree entry?
So far I am thinking of xml documents, config files, resource files streamed to base64 strings.
But would I use CouchDB for structured data? I don't know; any help on this is greatly appreciated.
Might be useful in storing RDF data or even free form text.
A possibility is to have a main relational database that stores definitions of items that can be retrieved by their IDs, and a document database for the descriptions and/or specifications of those items. For example, you could have a relational database with a Products table with the following fields:
ProductID
Description
UnitPrice
LotSize
Specifications
And that Specifications field would actually contain a reference to a document with the technical specifications of the product. This way, you have the best of both worlds.
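A minimal sketch of that split, assuming SQLite (through Python's `sqlite3`) for the relational side and a plain dict of JSON strings standing in for the document database. The `spec-<id>` document-ID scheme and the helper functions are hypothetical.

```python
from __future__ import annotations

import json
import sqlite3

spec_store: dict[str, str] = {}  # stand-in for the document database

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Products (
    ProductID INTEGER PRIMARY KEY,
    Description TEXT NOT NULL,
    UnitPrice REAL NOT NULL,
    LotSize INTEGER NOT NULL,
    Specifications TEXT  -- ID of a document in the document store
)""")


def add_product(product_id, description, unit_price, lot_size, specs):
    doc_id = f"spec-{product_id}"
    spec_store[doc_id] = json.dumps(specs)  # free-form, schema-less part
    conn.execute("INSERT INTO Products VALUES (?, ?, ?, ?, ?)",
                 (product_id, description, unit_price, lot_size, doc_id))


def product_with_specs(product_id):
    row = conn.execute(
        "SELECT Description, UnitPrice, LotSize, Specifications "
        "FROM Products WHERE ProductID = ?", (product_id,)).fetchone()
    if row is None:
        return None
    description, unit_price, lot_size, doc_id = row
    return {"description": description, "unit_price": unit_price,
            "lot_size": lot_size,
            "specifications": json.loads(spec_store[doc_id])}


add_product(1, "Widget", 2.5, 100, {"material": "steel", "weight_g": 120})
print(product_with_specs(1)["specifications"])
# {'material': 'steel', 'weight_g': 120}
```

One design consequence to keep in mind: the two writes in `add_product` go to different stores and are not atomic together, so a failure between them can leave a product pointing at a missing document (or an orphaned document).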
Document-based DBs are best suited for storing, well, documents. Lotus Notes is a common implementation, and Notes email is an example. For what you are describing (e-commerce, CRUD, etc.), relational DBs are better designed for storage and retrieval of indexed data items/elements (as opposed to documents).
Re CRUD: the whole REST paradigm maps directly to CRUD (or vice versa). So if you know that you can model your requirements with resources (identifiable via URIs) and a basic set of operations (namely CRUD), you may be very near to a REST-based system, which quite a few document-oriented systems provide out of the box.
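The mapping can be sketched with a tiny in-memory resource store; the `ResourceStore` class and its URIs are hypothetical. Here PUT creates or fully replaces the resource at a URI (the create-or-update behaviour CouchDB also uses when you PUT a document by ID), GET reads it and DELETE removes it. POST to a collection, with a server-assigned ID, is the other common way to create and is left out for brevity.

```python
from __future__ import annotations


class ResourceStore:
    """In-memory resources addressed by URIs such as /products/1 (hypothetical)."""

    def __init__(self) -> None:
        self.resources: dict[str, dict] = {}

    def handle(self, method: str, uri: str, body: dict | None = None):
        if method not in ("GET", "PUT", "DELETE"):
            return 405, None                     # method not supported here
        if method == "PUT":                      # Create or Update
            created = uri not in self.resources
            self.resources[uri] = dict(body or {})
            return 201 if created else 200, self.resources[uri]
        if uri not in self.resources:
            return 404, None
        if method == "GET":                      # Read
            return 200, self.resources[uri]
        del self.resources[uri]                  # Delete
        return 204, None


store = ResourceStore()
print(store.handle("PUT", "/products/1", {"name": "Widget"}))  # (201, {'name': 'Widget'})
print(store.handle("GET", "/products/1"))                     # (200, {'name': 'Widget'})
print(store.handle("DELETE", "/products/1"))                  # (204, None)
print(store.handle("GET", "/products/1"))                     # (404, None)
```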