SaaS application with microservices and database per tenant

I am designing a web application for a multi-tenant SaaS model where, due to some regulation, we decided to have one database per tenant, and we are considering microservices for our middleware. But I am a bit confused, because microservice architecture says that 'every microservice has its own database'. So here are my questions:
If I use a shared database for the microservices, it violates the concept/purpose of the microservices design, where they should all be isolated and any change should be limited to that microservice.
I can think of a layer which mimics my database tables as DTOs, and every microservice communicates with the database through this layer; correct me if I am wrong here.
If every microservice has its own database, then it's almost impossible to have one database per tenant; otherwise each microservice ends up with n databases.
So what would be the right approach to design our middleware with one database per tenant?
Or if anyone has a better approach, feel free to share.
Below is the high-level design we started with.

You should distinguish 2 things here:
Database sharding
Sharding is a database architecture pattern related to horizontal partitioning, in which you split your database based on some logical key. In your case the logical key is the tenant (tenantId or tenantCode). Sharding allows you to split your data from one database into multiple physical databases; ideally you can have one database per logical shard key, which in your case means, in the best case, one database per tenant. Keep in mind that you don't have to split it that far. If your data is not big enough to be worth putting every tenant's data into a separate database, start with 2 databases and put half of your tenants in the first physical database and half in the second. You can then coordinate this at the application layer by recording, in some configuration or in another database table, which tenant lives in which database. As your data grows you can migrate and/or create additional physical databases or physical shards.
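The application-layer coordination described above can be sketched as a small lookup: a tenant-to-shard mapping kept in configuration (the tenant codes and connection strings below are hypothetical, and a real system would keep the mapping in a config store or table rather than a dict):

```python
# Minimal sketch of application-level shard routing: a configuration
# mapping decides which physical database holds each tenant's data.
SHARD_CONFIG = {
    # tenantCode -> connection string (hypothetical values)
    "acme": "postgresql://db1.internal/saas",
    "globex": "postgresql://db1.internal/saas",
    "initech": "postgresql://db2.internal/saas",
}

DEFAULT_SHARD = "postgresql://db1.internal/saas"

def connection_string_for(tenant_code: str) -> str:
    """Resolve the physical database holding this tenant's data."""
    return SHARD_CONFIG.get(tenant_code, DEFAULT_SHARD)

def move_tenant(tenant_code: str, new_shard: str) -> None:
    """After migrating the tenant's rows, repoint the mapping."""
    SHARD_CONFIG[tenant_code] = new_shard
```

As the data grows, `move_tenant` is the only routing change needed once the tenant's rows have been copied to the new physical database.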
Database per micro-service
It is one of the basic rules in micro-services architecture that each micro-service has its own database. There are multiple reasons for it, some of which are:
Scaling
Isolation
Using different database technologies for different micro-services(if needed)
Development team separation
You can read more about it here. Sure, it has some drawbacks, but keep in mind that it is one of the key rules in micro-services architecture.
Your questions
If I use shared database for microservices, it violate concept/purpose
of microservices design where they all should be isolated and any
change should limited to that microservice.
If you can, try to avoid a shared database for multiple micro-services. If you end up doing it, you should maybe reconsider your architectural design. Sometimes being forced into one database is an indicator that some micro-services should be merged into one, because the coupling between them is too big and the overhead of working with them separately becomes very complex.
If I use every microservice has their own database then its almost
impossible to have one database per tenant, otherwise each
microservice end-up with n number of database.
I don't really agree that it's impossible. Yes, it is hard to manage, but if you decided to use micro-services and you need database sharding, you need to deal with the extra complexity. Sure, having one database per micro-service and then n databases for each micro-service can be very challenging.
As a solution I would suggest the following:
Include the tenant (tenantId or tenantCode) as a column in every table in your database. This way you will be able to migrate easily later if you decide that you need to shard a table, a set of tables in a schema, or a whole database belonging to some micro-service. As already said in the part about database sharding above, you can start with one physical shard (one physical database) but already define your logical shard (in this case using the tenant info in each table).
Separate the data physically into different shards only in the micro-services where you need it. Let's say you have 2 micro-services: product-inventory-micro-service and customers-micro-service, and let's say you have 300 million products in the product-inventory-micro-service db and only 500,000 customers. You don't need a database per tenant in customers-micro-service, but in product-inventory-micro-service, with 300 million records, that would be very helpful performance-wise.
As I said above, start small with 1 or 2 physical databases, then grow and migrate over time as your data increases and you have the need for it. This way you will save yourself some overhead in the development and maintenance of your servers, at least for the time that you don't need it.
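The first suggestion above - carrying the tenant as a column in every table while everything still lives in one physical database - might look like this minimal sketch (SQLite stands in for the real database; the table and column names are made up):

```python
import sqlite3

# One physical shard for now; the tenant_id column defines the logical shard.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        tenant_id TEXT NOT NULL,   -- logical shard key, present in every table
        name TEXT NOT NULL
    )
""")
db.executemany(
    "INSERT INTO products (tenant_id, name) VALUES (?, ?)",
    [("acme", "widget"), ("acme", "gadget"), ("globex", "sprocket")],
)

def products_for(tenant_id: str):
    """Every query is scoped by the logical shard key."""
    rows = db.execute(
        "SELECT name FROM products WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

Because every row already carries `tenant_id`, moving a tenant to its own physical database later is a plain `SELECT ... WHERE tenant_id = ?` copy.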

Our application is SaaS and multi-tenant with a microservices architecture. We do indeed use 1 database per service, though in some cases it's actually just a separate schema per tenant, not a separate instance.
The key concept in "database per service" is all to do with avoiding sharing of tables/documents between services and less to do with exactly how you implement that. If two services are both accessing the same table, that becomes a point of tight coupling between the services as well as a common point of failure - two things microservices are designed to avoid.
Database-per-service means: don't share persistence between multiple services; how you implement that is up to you.
Multi-tenancy is another challenge and there are multiple ways to approach it. In our case (where regulations do not dictate our architecture) we have designed our Tenant-aware table-structures with TenantId as part of the Primary Key. Each Tenant-aware service implements this separation and our Authorization service helps keep users within the boundaries of their assigned Tenant(s).
If you are required to have more separation than key/value separation, I would look to leverage schemas as a great tool to segregate both data and security:
You could have a database instance (say a SQL Server instance) per microservice, and within each instance have a schema per tenant.
That may be overkill at the start, and I think you'd be safe to do a schema per service/tenant combination until that number grew to be a problem.
In either case, you'd probably want to write some tooling in your DB of choice to help manage your growing collection of schemas, but unless you are going to end up with thousands of tenants, that should get you pretty far down the road.
The last point to make is that you're going to lean heavily on an Integration bus of some sort to keep your multiple, independent data stores in sync. I strongly encourage you to spend as much time on designing that as you do the rest of your persistence as those events become the lifeblood of the system. For example, in our multi-tenant setup, creating a new tenant touches 80% of our other services (by broadcasting a message) so that those services are ready to interact with the new tenant. There are some integration events that need to happen quickly, others that can take their time, but managing all those moving parts is a challenge.
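The tenant-creation broadcast described above can be sketched with an in-memory bus standing in for the real integration bus (the topic name, handler wiring, and the two subscriber services are invented for illustration):

```python
from collections import defaultdict

# Naive in-memory integration bus: topic -> list of subscriber callbacks.
subscribers = defaultdict(list)

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, payload):
    # Broadcast: every subscribed service reacts independently.
    for handler in subscribers[topic]:
        handler(payload)

# Each tenant-aware service prepares its own store when a tenant appears.
billing_tenants, inventory_tenants = set(), set()
subscribe("tenant.created", lambda e: billing_tenants.add(e["tenant_id"]))
subscribe("tenant.created", lambda e: inventory_tenants.add(e["tenant_id"]))

publish("tenant.created", {"tenant_id": "acme"})
```

In a real system the bus is durable (e.g. a message broker) and handlers run asynchronously, which is where the fast-vs-slow integration-event distinction above comes in.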

Related

Combination Single-tenant and multi-tenant infrastructure

I am evaluating the best approach for migrating our current on-premises Java Web app to a SAAS platform. Application multi-tenancy seems straight-forward, but less so with the database. We're probably all aware of the database-per-tenant pros at this point: isolation, performance, reduced backup/restore complexity, and much lower retrofit complexity. Naturally the row-per-tenant approach has its benefits as well, reduced infrastructure costs being a major one.
Is it unheard of to combine the two approaches? That way the database-per-tenant approach gives faster time-to-market while the development changes to support a multi-tenant database are made gradually. Once both approaches are operational, customers with particularly heavy workloads or security constraints could have their own isolated database, while the default would be a shared common database (for cost/efficiency reasons). Does anyone have any experience using/seeing this combination of approaches in the real world?
Whether requests are routed to a datasource by tenant ID, or the tenant ID is passed as an argument to the SQL queries, the major differences should be contained within the persistence layer/database, somewhat limiting the added complexity of combining the two approaches.
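That containment can be sketched as a single resolver inside the persistence layer: isolated tenants get their own datasource, everyone else falls back to the shared one with the tenant ID scoping the SQL (all names and connection strings here are hypothetical):

```python
# Hybrid multi-tenancy sketch: a few tenants get isolated databases,
# the rest share a common one. Only this resolver knows which is which.
ISOLATED = {
    "bigcorp": "db://isolated/bigcorp",  # heavy workload / strict security
}
SHARED = "db://shared/main"

def datasource_for(tenant_id: str) -> str:
    return ISOLATED.get(tenant_id, SHARED)

def scope_query(tenant_id: str, sql: str):
    """On the shared datasource the tenant id must also scope the SQL."""
    if datasource_for(tenant_id) == SHARED:
        return sql + " WHERE tenant_id = ?", (tenant_id,)
    return sql, ()
```

Everything above this layer issues the same queries regardless of which tenancy model a given customer is on.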
There are complexities when we scale out a tenant, i.e. when moving a tenant's data from the shared database to an isolated database.
Automating this process requires effort and testing, due to the identification of the entity tables and mapping tables and the ordering of the steps needed to process the migration successfully. The data-access strategy used, such as an ORM or ADO.NET, also needs to be considered for this process.
Compared to a row-wise tenantId, if we can use a schema per tenant within the same database, it will be easier to perform this kind of migration.
We did try this out initially, but since there was framework data as well as application/business data, it was a little difficult to make the migration happen automatically in the short time-frame we had; however, with the right time and plan, this can be achieved.

Database Bottleneck In Distributed Application

I hear about SOA and distributed applications everywhere now. I would like to know about some best practices for keeping a single data source responsive, or, in case you have a copy of the data on every server, how best to synchronise those databases to keep them updated?
There are many answers to this question and in order to choose the most appropriate solution, you need to carefully consider what kind of data you are storing and what you want to do with it.
Replication
This is the traditional mechanism for many RDBMS, and it normally relies on features provided by the RDBMS. Replication has latency, which means that although servers can handle load independently, they may not necessarily be reading the latest data. This may or may not be a problem for a particular system. When replication is bidirectional, simultaneous changes on two databases can lead to conflicts that need resolving somehow. Depending on your data, the choice might be easy (e.g. audit log => append both) or difficult (e.g. hotel room booking - cancel one? select an alternative hotel?). You also have to consider what to do if the replication network link goes down (i.e. do you deny updates on both databases, on one database, or allow the databases to diverge and sort out the conflicts later?). This is all dependent on the exact type of data you have. One possible compromise, for read-heavy systems, is to use unidirectional replication to many databases for reading, and send all write operations to the source database. This is always a trade-off between Availability and Consistency (see the CAP theorem). The advantage of RDBMS and replication is that you can easily query your entire dataset in complex ways and have a greater opportunity to remove duplication by using relational links between data items.
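The read-heavy compromise just described - all writes to the source, reads spread over replicas - can be sketched as a tiny statement router (the connection names are stand-ins, and real routing would inspect the statement more carefully than this prefix check):

```python
import itertools

# Writes always go to the source database; reads rotate over replicas,
# accepting that a replica may lag behind the source (replication latency).
PRIMARY = "primary"
REPLICAS = itertools.cycle(["replica-1", "replica-2"])

def route(statement: str) -> str:
    """Pick a connection for the statement: replicas for reads only."""
    is_read = statement.lstrip().upper().startswith("SELECT")
    return next(REPLICAS) if is_read else PRIMARY
```
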
Sharding
If your data can be cleanly partitioned into disjoint subsets (e.g. different customers), such that all possible relational links between data items are contained within each subset (e.g. customers -> orders), then you can put each subset in a separate database. This is the principle behind NoSQL databases, or, as Martin Fowler calls them, 'Aggregate-Oriented Databases'. The downside of this approach is that it requires more work to run queries over your entire dataset, as you have to query all your databases and then combine the results (e.g. map-reduce). Another disadvantage is that in separating your data you may need to duplicate some of it (e.g. sharding by customers -> orders might mean product data is duplicated). It is also hard to manage the data schema as it lies independently in multiple databases, which is why most NoSQL databases are schema-less.
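Querying the entire sharded dataset then becomes a scatter-gather step, roughly like this sketch (in-memory dicts stand in for the per-customer databases, and the order amounts are invented):

```python
# Each disjoint subset (here: one customer's orders) lives in its own
# "database"; a whole-dataset query must visit every shard and combine.
shards = [
    {"customer": "acme", "orders": [120, 80]},
    {"customer": "globex", "orders": [300]},
]

def total_revenue() -> int:
    """Query every shard, then combine the partial results (map-reduce)."""
    partials = [sum(shard["orders"]) for shard in shards]  # map
    return sum(partials)                                   # reduce
```
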
Database-per-service
In the microservice approach, it is advised that each microservice should have its own dedicated database, which is not allowed to be accessed by any other microservice (of a different type). Hence, a microservice that manages customer contact information stores the data in a separate database from the microservice that manages customer orders. Links can be made between the databases using globally unique ids, or URIs (especially if the microservices are RESTful), etc. The downside, again, is that it is even harder to perform complex queries on the entire dataset (especially since all access should go via the microservice API, not directly to the databases).
Polyglot storage
So many of my projects in the past have involved a single RDBMS in which all data was placed. Some of this data was well suited to the relational model, much of it was not. For example, hierarchical data might be better stored in a graph database, stock ticks in a column-oriented database, html templates in a NoSQL database. The trend with micro-services is to move towards a model where different parts of your dataset are placed in storage providers that are chosen according to the need.
If you are thinking of keeping a different copy of the database for each microservice and you want to achieve eventual consistency, then you can use Kafka Connect. Briefly, Kafka Connect will watch your DBs, and whenever there are any changes it will read the log file and add the logged change events as messages to a queue; the other databases that subscribe to this queue can then execute the same statement on their side as well.
Kafka Connect isn't the only framework; you can search for other frameworks or applications for the same implementation.
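The flow just described - log entries turned into queued change events that subscribers replay - can be sketched without Kafka as follows (the event shape and the upsert-only change log are simplifying assumptions):

```python
# Eventual consistency via a replayed change log: the source DB appends
# change events; each subscriber applies them to its own local copy.
change_log = []          # stands in for the queue / Kafka topic
local_copy = {}          # a subscriber's replica, keyed by row id
applied = 0              # the subscriber's position in the log

def record_change(row_id, value):
    change_log.append({"op": "upsert", "id": row_id, "value": value})

def apply_pending():
    """Replay every event the subscriber has not yet seen, in order."""
    global applied
    for event in change_log[applied:]:
        if event["op"] == "upsert":
            local_copy[event["id"]] = event["value"]
    applied = len(change_log)

record_change(1, "draft")
record_change(1, "published")  # a later change supersedes the first
apply_pending()
```

Until `apply_pending` runs, the replica lags the source - that window is exactly the "eventual" in eventual consistency.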

Architecture: one or multiple databases for sub customers (web)APP

I've built a WinForms application that I'm currently rebuilding into an ASP.NET MVC application using Web API etc. Maybe an app will be added later on.
Assume that I will provide these applications to a few customers.
My applications are made for customer accounting.
So all of my customers will manage their own customers within the applications I provide.
That brings me to my question. Should I work with one big database for all my customers, or should I use a separate database for each of my customers? I'd like to ask the same for web app instances, APIs etc.
Technically I think both options are possible. If it's just a matter of preference, all input is appreciated.
Some pros and cons I could think of:
One database:
Easy to setup/maintain
Install one update for all of my customers
No possibility to restore db for one customer
Not flexible in terms of resource spreading
Performance, this db can get really large
Multiple databases:
Performance, databases are smaller and can be spread across multiple servers
Easy to restore data if customer made a 'huge mistake'
The ability to provide customer specific needs (not needed atm)
Harder to setup/maintain, every instance needs to be updated separately.
A kind of gateway/routing thing is needed to route users to the right database/app
I would like to know how the 'big companies' approach this.
You seem to be talking about database multi-tenancy, and you are right about the pros and cons.
The answer to this depends a lot on the kind of application you are building and the kind of customers it will have.
I would go with multi-tenant (single DB multiple tenants) database if
Your application is a multi-tenant application.
Your users do not need to store their own data backups.
Your DB schema will not change for each customer (this is implied in multi-tenant applications anyway).
Your tenants/customers will not have a huge amount of individual data.
Your customers don't have government imposed data isolation laws they need to comply with (EU data in EU, US data in US etc.).
And for individual databases pretty much the inverse of all those points.

Which Multi-tenant approach is recommended

As per the Multi-Tenant Data Architecture post, there are 3 ways to implement multi-tenancy:
Separate Databases
Shared Database, Separate Schemas
Shared Database, Shared Schema
I have following details:
User should be able to backup and restore their data.
No of tenants : 3 (approx)
Each tenant might belong to a different domain (URL).
There are some common tables for all the tenants.
No of tables in each tenant: 10 (initial)
I would like to know which approach is more suitable for me?
I think option 2 is the best one, but you still have an issue with requirement 1: backup and restore are not available per schema. You will need to handle this using data import or some custom tool. Common tables will have a separate schema.
In option 1, you need to handle requirement 4: common tables will be replicated between all databases.
The most important condition among all 5 conditions is condition 4, which says that some tables are common to all tenants. If some tables are common, then Separate Databases (i.e. option 1) is ruled out.
You could have gone with option 2, shared database and separate schemas, but the number of tenants is quite small (3 in your case). In such a scenario, the overhead of maintaining separate schemas should be avoided, so we can skip option 2 as well once we evaluate option 3.
Option 3 - shared database with shared schema - will be the most efficient option for you. It avoids the overhead of maintaining separate schemas and allows common tables among the tenants. In shared schemas, a tenant identifier is generally used across the tables. Hibernate already has support for such tenant identifiers (in case you will use Java/J2EE for the implementation).
The only problem may be performance, as putting the data of all three tenants together in the same table(s) will lead to lower database access/search performance, which you will have to counter with de-normalization and indexing.
I would recommend going forward with option 3.
For us, we have developed an HR ERP which is used by about forty clients and thousands of users; the approach we used for multi-tenancy is the third one.
In addition, one of the techniques used to separate the tables was inheritance.
I had the same issue in my system. We had an ad-network system that became pretty large over time, so I was considering a migration to a multi-tenant architecture per publisher.
Option 3 wasn't relevant, as some publishers had special requirements (additional columns/procedures, and we do not support partitioning). Option 2 wasn't relevant, since we had concurrency and load issues between customers. In order to reach high performance, with plans to scale out to up to 1000 publishers with 20 widgets each and high concurrency, our only solution was option 1 - Separate Databases.
We have already migrated to this mode, and we have 10 databases running, with a shared database for configuration and ads. Performance-wise it is great. For high availability it is also very good, as one publisher's traffic doesn't affect all the others. Adding a new publisher is an easy step for us, as we have a template environment. The only issue we have is maintenance.
Lately I've been reading about PartitionDB. It looks very simple to manage, as you can have a gate database to perform all maintenance work, including upgrades and top-level queries. It supports a common shared database (the same as we already have), and I am now trying to understand how to use the standalone databases as well.

Microservices and database joins

For people that are splitting up monolithic applications into microservices, how are you handling the conundrum of breaking apart the database? Typical applications that I've worked on do a lot of database integration for performance and simplicity reasons.
If you have two tables that are logically distinct (bounded contexts, if you will) but you often do aggregate processing on large volumes of that data, then in the monolith you're more than likely to eschew object orientation and instead use your database's standard JOIN feature to process the data on the database prior to returning the aggregated view back to your app tier.
How do you justify splitting up such data into microservices, where presumably you will be required to 'join' the data through an API rather than at the database?
I've read Sam Newman's Microservices book, and in the chapter on splitting the monolith he gives an example of "Breaking Foreign Key Relationships", where he acknowledges that doing a join across an API is going to be slower - but he goes on to say that if your application is fast enough anyway, does it matter that it is slower than before?
This seems a bit glib. What are people's experiences? What techniques did you use to make the API joins perform acceptably?
When performance or latency doesn't matter too much (yes, we don't always need them) it's perfectly fine to just use simple RESTful APIs for querying additional data you need. If you need to do multiple calls to different microservices and return one result, you can use the API Gateway pattern.
It's perfectly fine to have redundancy in polyglot persistence environments. For example, you can use a message queue for your microservices and send "update" events every time you change something. Other microservices will listen for the events they require and save the data locally. So instead of querying, you keep all the required data in the appropriate storage for each specific microservice.
Also, don't forget about caching :) You can use tools like Redis or Memcached to avoid querying other databases too often.
It's OK for services to have read-only replicated copies of certain reference data from other services.
Given that, when trying to refactor a monolithic database into microservices (as opposed to rewrite) I would
create a db schema for the service
create versioned* views** in that schema to expose data from that schema to other services
do joins against these readonly views
This will let you independently modify table data/structure without breaking other applications.
Rather than use views, I might also consider using triggers to replicate data from one schema to another.
This would be incremental progress in the right direction, establishing the seams of your components, and a move to REST can be done later.
*the views can be extended. If a breaking change is required, create a v2 of the same view and remove the old version when it is no longer required.
**or Table-Valued-Functions, or Sprocs.
CQRS (Command Query Responsibility Segregation) is the answer to this, as per Chris Richardson.
Let each microservice update its own data model and generate events which update the materialized view holding the required joined data from the other microservices. This MV could be any NoSQL DB, Redis, or Elasticsearch, whichever is query-optimized. This technique leads to eventual consistency, which is definitely not bad, and avoids real-time application-side joins.
Hope this answers the question.
I would separate the solutions by area of use, let's say operational and reporting.
For the microservices that provide data for single forms which need data from other microservices (this is the operational case), I think using API joins is the way to go. You will not be fetching big amounts of data, and you can do the data integration in the service.
The other case is when you need to run big queries on large amounts of data for aggregations etc. (the reporting case). For this need I would think about maintaining a shared database, similar to your original scheme, and updating it with events from your microservice databases. On this shared database you could continue to use your stored procedures, which would save you effort and retain the database optimizations.
In microservices you create different read models. So, for example, if you have two different bounded contexts and somebody wants to search across both sets of data, then somebody needs to listen to events from both bounded contexts and create a view specific to the application.
In this case more space will be needed, but no joins will be required.
