I am working on a Spring Boot microservices project and I want to divide one microservice that has been done huge into 4 smaller ones. The main issue is that they are some tables (e.g. Customers table) that I want to be shared across all the new microservices in order to keep correlations-FKs. So, each microservice will have its own schema/user and all of them will have a shared one, as well, that will contain these basic tables (in order to avoid writing their Entities many times - one for each service -, I want to import them as a separate dependency through maven). First of all, I would like to ask you if that makes sense architecturally, and if yes, if there is a way to use one Datasource for two (or more) schemas/users - because currently I have managed to connect two Datasources, but I cannot correlate their Entities (the compiler does not permit that). Thank you!
I would advise against this. If you want to keep referential integrity, and have dependencies between microservices you probably don't need a microservice architecture, but just some modularization in your code, which can help you cope with complexity as well. You don't necessarily need a HTTP boundary between code parts just to make stuff work at a better scale.
For the technical parts, I really think that using multiple datasources might result in a 2PC transaction, which you need to be very careful with.
"Correlating entities" could work by specifying the schema/catalog in the #Table annotation to refer to the respective namespace where the shared tables reside in.
Related
The business domain has five high-level bounded contexts
Customers
Applications
Documents
Decisions
Preforms
Further, these bounded contexts has sub-contexts like ordering and delivery of the documents. Despite the project of consisting ten of thousands of classes and dozens of EJB's, most of the business logic resides in relational database views and triggers for a reason: A lot of joins, unions and constraints involved in all business transactions. In other words, there is complex web of dependencies and constraints between the bounded contexts, which restricts the state transfers. In layman terms: the business rules are very complicated.
Now, if I were to split this monolith to database per service microservices architecture, bounded contexts being the suggested service boundaries, I will have to implement all the business logic with explicit API calls. I would end up with hundreds of API's implementing all these stupid little business rules. As the performance is main factor (we use a lot of effort to optimize the SQL as it is now), this is out of the question. Secondly, segregated API's would probably be nightmare to maintain in this web of ever evolving business rules, where as database triggers actually support the high cohesion and DRY mentality, enforcing the business rules transparently.
I came up with a conclusion microservice architecture being unsuitable for this type of document management system. Am I correct, or approaching the idea from wrong angle?
First of all, you don't have to have a Microservices architecture. I really mean it! If you were ordered by management/architect to do it, and it doesn't solve any real problems you are having, you are probably right for pushing back.
That being said, and with the disclaimer that I don't know the exact requirements of your application, having "things" as bounded context is a smell. So having "Customers", "Applications", "Documents", etc. as services is very likely the wrong approach.
Bounded contexts should not be CRUD operations on a specific entity. They should be completely independent (or as independent as possible) "vertical" parts of the whole application. Preferably with their own Database and GUI. They should also operate independently of each other, not requiring input from other services for own decisions.
It is the complete opposite of data-centric design, where tables/fields and relations are the core concepts. Here, functionality is the core concept. You would have to split your application along functionality to arrive at a good separation.
I could imagine a document management system having these idependent bounded contexts / services: Search, Workflow, Editing, etc.
Here is how you would think about it: Search does not require any (synchronous) input from any other service. It may receive regular, even near-time updates with new documents, but that does not impact it's main feature: searching already indexed documents. The GUI is also independent, something like one google-like page with a search box maybe. It can deliver results independently, and would link back to the Workflow or Editing apps when you click on a result.
The others would be similarly independent. Again, the point is to split the services in a way that makes them work independently. If you don't have that, you will only make things worse with Microservices.
First of all the above answer is correct in suggesting that you need to breaup your microservice in a better way.
Now If scalability is your concern(lots of api calls between microservice).
I strongly suggest you to validate that how many of the constraints are really required at the first level, and how many of them you could do in async way. With that what i mean is in distributed enviornment we actually do not need to validate all the things at the same time.
Sometimes these things are not directly visible , for eg: lets say there are two services order service and customer service and order service expose a api which say place a order for customer id. and business say you cannot place a order for a unknown customer
one implementation is from the order service you call the customer service in sync ---- in this case customer service down will impact your service, now lets question do we really need this.
Because a scenario could happen where customer just placed an order and somebody deleted that customer from customer service, now we have a order which dosen't belong to customer.Consistency cannot be guaranteed.
In the new sol. we are saying allow the order service to place the order without checking the customer id and do one of the following:
Using ProcessManager check the customer validity and update the status of the order as invalid and when customer get deleted using ProcessManager update the order status as invalid or perform business logic
Do not check at all , because placing a order dosen't count a thing, when this order will be in the process of dispatch that service will anyway check the customer status
In this way your API hits are reduced and better independent services are produced
I have a Spring application which supports a single customer.
I would like to extend this application to support multiple customers where each customers database is stored in a separate database. The schema for the database is the same for each customer, and the same DAOs and business logic should remain the same.
How would I accomplish this with Spring/JPA? Would I need to have multiple persistence contexts and wire in an appropriate entity manager factory based upon the currently logged in user? Are there any examples of implementing something similar to this?
I would advice against running separate database under a single application. If a redesign of the data model to incorporate multiple customers is not an option, why don't you run multiple instances of your application server/web container, one for each customer? As otherwise you'll have to deal with the drawbacks of having a shared platform and isolated databases.
With multiple customer databases and a single application your code will become more complex, you can't guarantee that customer data is fully isolated (e.g. due to a bug in the application a customer is shown the wrong data, so there's not much benefit in isolating each customer) and you'll have the nightmare of maintaining each customer database. Also, by having different databases you can virtually guarantee that someone pointy-haired is going to ask for some bespoke functionality for customer A while leaving customer B's functionality untouched, because "... it will be easy, as we've got different databases...", forgetting that the application is shared.
If you really, really want to have separate databases for particular customers, this would be the way to go — define separate persistence units with the same entity definitions, but different entity manager factory configurations.
To me it sounds more like a need to redesign the database structure. I'm guessing that the application has been written for only one client in mind and it turned out that more appeared on the horizon, so, hey, let's do something about it, and fast! Aren't you trying to copy-paste, but in a bigger scale? You'll going to have a lot of redundancy with JPA if you want to have a few databases with the same structure: for example, everything what's defined inside the mapping-file (queries, entity relationship mappings, etc.) is defined per persistence unit — you'll have to repeat these definitions and keep them all synchronized.
I'll stop here, as it is merely guesswork, for the lack of broader description.
I maintain an application which has many domain entities that draw data from more than one database. The way this normally works is that the entities are loaded from Database A (in which most of their fields are stored). when a property corresponding to data in Database B is called, the entity fires off SQL to Database B to get all the relevant data.
I'm currently using a 'roll-your-own' ORM, which is ugly, but effective (and easy to understand). I've recently started using NHibernate for entities drawn solely from Database A, but I'm wondering how I might use NHibernate for entities drawn from both Databases A and B.
The best way I can think of do this is as follows. I continue to use a NHibernate-based class library for entities in Database A. Those entities which also need data from Database B expose all their data from Database B in a single class accessed via a property. When this property is called, it invokes the appropriate repository, and the object is returned. The class library for accessing Database B would therefore need to be referenced from the class library for accessing Database A.
Does this make any sense, and is there a more established pattern for this situation (which must be fairly common).
Thanks
David
I don't know how well it maps to your situation, or how mature the NHibernate porting for it is at this point, but you might want to look into Shards.
If it doesn't work for you as-is, it might at least supply some interesting patterns to consider.
EDIT (based on comments):
This indeed doesn't seem to map to your situation, as Shards is about horizontal splitting of data.
If you need to split vertically, you'll probably need to define multiple persistence units. Queries and transactions involving both databases will probably get interesting. I'm afraid I can't really help much with this. This question is definitely related though.
What is the best way or recommended best practice in the flow of database driven asp.net web application? I mean the database first or coding first or side by side?
Your data access code won't compile without an existing database - unless you stub (or Mock) it. So probably the database comes first.
But it is a bad idea to do whole chunks of the application in isolation. Ideally you should design and build slivers of the system - database and application - hand-in-hand. These slivers should be cohesive sub-sets of functionality, probably smaller than sub-systems. Inevitably, the act of coding screens and business rules will throw up problems in the data model. So it is good to have a data modeller or DBA who is happy to work incrementally alongside the developers.
edit
Stephanie makes an extremely pertinent point:
"the core tables which are persisting
your app's data really can't be
piecemealed. Most of the data is known
at project start. It has a form, you
need to find it."
I agree that the core entities are knowable at project start, and the physical data model can be derived from that logical data model. But I don't think it is ever possible to nail down completely the structure of any table, even a core table, at the start. This is because at the start of the design/build phase all we have to go on are the Requirements, and if there's one thing history tells us about the Requirements it is that they will change.
So, new tables will be needed and some existing tables will become obsolete. There will be columns which need to be added, columns which need to be modified, columns which need to be dropped. This is why Nature gave us the ALTER TABLE statement.
I am not suggesting that we don't design our tables, or assemble them piecemeal. I am merely suggesting that when we start designing the HR sub-system we need to worry about the EMPLOYEES table and the SALARIES table. We don't need to concern ourselves with INVENTORY or ORDERS until we commence work on Sales.
We personally start with the Domain and do things side-by-side. The important part is that we implement vertical slices of the application (fully working end-to-end features), not horizontal slices (e.g. first the whole database layer, then the data access, then the services, then the presentation): we build the application incrementally and demonstrate progress with working code after each iteration.
Applications are all about features.
You don't build apps to store data,
but to provide functionality. If we
can't agree on that, the discussion is
moot of course. Software should be
developed to satisfy the needs of its
users and not of its developers.
Well I have really no understanding of the second sentence. If you think my company pays me a good salary to write code that satisfies me and not my users you're crazy. So that argument is a strawman. Back to the first.
This is a common view point of application centric people (they), vs. database centric people (We). They see the entire point of the exercise to "provide features". Those are things the clients know they want and ask for them. To them, the database is just persistence required for these features. And when they are done, that's it, features delivered, database is sufficient for those features. Could be an entire Rube Goldberg inside the database with redundant data, severe violations of normal forms, constraints enforced by the application, what have you.
think overall usability alone outweighs database design
If the design of your database is affecting your usability than the design was bad. I have no doubt that one who strives for features will leave the database in such a state that it severely hampers usability.
Data Centric people, don't look at a system as a place to provide only what's been asked for, but a repository of Intellectual Capital that can be exploited by more than whatever the Application-du-jour is. I can't begin to describe the number of cases where one team has used the database of some other team's app to enhance their apps value. Just look at all the medical research that is nothing more that the meta-analysis of existing studies. None of that is possible if you believe that only the features of your app matter and subsequent uses of your apps data do not.
A good data model isn't inviolate. Sure you'll add to it, change it when requirements change. But if you don't completely understand your data, I don't know how anyone can begin to write code.
I guess you need first to define datamodel and only then going coding. You should plan everything carefully before actually writting the code.
First is a feature list.
Then, detailed spec.
Then test plan and design of all, including databases.
Then, it wouldn't matter which to implement first.
You'll probably end up doing it "side by side".
You need some data to be able to test the application, but you need the application to be able to verify that you're storing the correct data.
Do some modelling first and then build the minimum you can for one or two features. Then when these are working add the next feature and so on.
You'll need to write some database update procedures (both the code and the rules about what and when to update) as you will have to extend your tables, but you'll need those for the final system anyway as it will have to change as new requirements come along.
Having done it quite a few times, I find myself invariably doing it like so:
Define the problem I'm trying to solve.
Write out some use-cases.
Have my significant other or a friend tell me if this is even a problem.
Sketch out a few sample screens.
Write flow diagrams for the use cases.
Ask my Rubber-duck questions.
Use questions to refine 1-6.
Write out the 'nouns'. Those become my data Model.
Write out the actions. Those become application logic.
Code data Model.
Code Application Logic.
Realize I've gotten it a little wrong.
Repeat 10-12 as many times as needed.
Ask, "Have I solved the problem"?
If not, rinse, lather and repeat 1-15.
This is a trick question. IMO, they both come in parallel during your planning and design phase. They are so closely related that it make sense to do them together. Just keep in mind that your database design will be almost fully developed while your code is still in its infancy (though your application logic should be almost fully mapped out in you head or on paper)
The idea is that you're designing your solution in the context of the problem. When you're planning out your solution you will be (or should be) defining your application as a set of things and actions (nouns and verbs).
For example, a very basic helpdesk program has people and tickets. People need to create tickets, update tickets, and close tickets. The nouns that require persistent storage will comprise your database, and the nouns + actions will be contained in your application.
Sometimes your table mappings and the relationship between tables will be obvious (IE people create tickets, ticket.creatorID = people.personID) and other times the relationship doesn't really click in your head until you start working through use cases or until you start writing your code (IE different ppl have different access levels defining what they can do. At a glance this would seem like a simple field in a table, but in practice it is better as a separate table).
I have noticed a lot of companies use a prefix for their database tables. E.g. tables would be named MS_Order, MS_User, etc. Is there a good reason for doing this?
The only reason I can think of is to avoid name collision. But does it really happen? Do people run multiple apps in one database? Is there any other reason?
Personally, I don't see any value in it. In fact, it's a bummer for intellisense-like features because everything begins with MS_. :) The Master agrees with me too.
Huge schemas often have many tables with similar, but distinct, purposes. Thus, various "segmented" naming conventions.
Darn, didn't get first post :-)
In SQL Server 2005 and above the schema feature eliminates the need for any kind of prefix. A good example of their usage can be found by reading about the Schemas in AdventureWorks.
In some older versions of SQL server, having a prefix to create a pseudo namespace might of been useful with DBs with lots of tables.
Other than that I can't really see the point.
Even when the database only contains one application, the prefixes can be useful in grouping like parts of the application together. So tables that containtain cutomer information might be prefixed with cust_, those that contain warehouse information might be prefixed with inv_ (for inventory), those that contain finacial information might be prefixed with fin_, etc.
I've worked on systems where there is an existing database for an application which was created and maintained by a different company and we've needed to add another app that uses large amounts of the same data with just a few extra tables of our own, so in that case having an app specific prefix can help with the separation.
Slightly tangentially to the original question, I've seen databases use prefixes to indicate the type of data that a table is holding. There'd be a prefix for lookup tables which are obviously pretty static in both size and content and a different prefix for the tables that contain variable data. This in turn may be broken into having one prefix for tables that are added to but not really changed like logging, processed orders, customer transactions etc, and and another for more variable data like customer balance or whatever. Link tables could also have their own prefix to separate them out too.
I have never seen a naming collision, as it usually doesn't make sense to put tables from different applications into the same database namespace. If you had some sort of reusable library that could be integrated into different applications, perhaps that might be a reason, but I haven't seen anything like that.
Though, now that I think about it, there are some cheap web hosting providers that only allow users to create a very small number of databases, so it would be possible to run a number of different applications using a single database, so long as the names didn't collide (and such a prefixing convention would certainly help).
Multiple applications using a particular table, correct. Prefixes prevent name collision. Also, it makes it rather simple to backup tables and keep them in the same database, just change the prefix and your backup will be fully functional, etc. Aside from that, it's just good practice.
Prefixes are a good way to sort out which sql objects are associated with which app when multiple apps dip into the same database.
I have also prefixed sql objects differently within the same app to facilitate easier management of security. i.e. all the objects with admin_ need this security applied and the rest need something else.
Prefixes can be handy for humans, search tools and scripts. If the situation is a simple one, however, there is probably no use for them at all.
It's most often use if several applications are sharing one database. For example, if you install Wordpress, it prefixes all tables with "wp_". This is good if you want your applications to share data very easily(sessions across all applications in your company, for example.)
There are better ways to accomplish this however, and I never prefix my table names, as each application has it's own self-contained database.