Which Multi-tenant approach is recommended - sql-server

As per the Multi-Tenant Data Architecture post, there are three ways to implement multi-tenancy:
Separate Databases
Shared Database, Separate Schemas
Shared Database, Shared Schema
I have the following details:
1. Users should be able to back up and restore their data.
2. Number of tenants: 3 (approx.)
3. Each tenant might belong to a different domain (URL).
4. There are some tables common to all tenants.
5. Number of tables per tenant: 10 (initially)
Which approach is most suitable for me?

I think Option 2 is the best one, but you still have an issue with requirement 1: backup and restore are not available per schema, so you will need to handle this with an import/export process or a custom tool. The common tables would get their own separate schema.
With Option 1, you would need to handle requirement 4 yourself; the common tables would have to be replicated across all the databases.

The most important of the five conditions is condition 4, which says that some tables are common to all tenants. If some tables are common, then Separate Databases (i.e. Option 1) is ruled out.
You could have gone with Option 2, shared database and separate schemas, but the number of tenants is quite small (3 in your case). In such a scenario the overhead of maintaining separate schemas should be avoided, so we can set Option 2 aside as well while we evaluate Option 3.
Option 3, shared database with shared schema, will be the most efficient option for you. It avoids the overhead of maintaining separate schemas and allows common tables among tenants. In a shared schema a tenant identifier is generally used across tables; Hibernate already has support for such tenant identifiers (in case you use Java/J2EE for the implementation).
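For illustration, a minimal sketch of that Hibernate hook (assuming Hibernate 5.x; the TenantContext holder, the resolver name, and the "default" fallback are all invented for this example):

```java
import org.hibernate.context.spi.CurrentTenantIdentifierResolver;

// Hypothetical thread-local holder, populated per request (e.g. by a servlet
// filter that maps the incoming domain/URL to a tenant code).
class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();
    static void set(String tenant) { CURRENT.set(tenant); }
    static String get() { return CURRENT.get(); }
}

// Typically registered with Hibernate via the hibernate.tenant_identifier_resolver setting.
public class UrlBasedTenantResolver implements CurrentTenantIdentifierResolver {

    @Override
    public String resolveCurrentTenantIdentifier() {
        String tenant = TenantContext.get();
        return tenant != null ? tenant : "default";
    }

    @Override
    public boolean validateExistingCurrentSessions() {
        // Ask Hibernate to verify that already-open sessions match the current tenant.
        return true;
    }
}
```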
The only problem may be performance: putting the data of all three tenants together in the same table(s) will lead to lower database access/search performance, which you will have to counter with denormalization and indexing.
I would recommend going forward with Option 3.

For us, we have developed an HR ERP which is used by about forty clients and thousands of users; the approach used for multi-tenancy is the third one.
In addition, one of the techniques used to separate the tables was inheritance.

I had the same issue in my system. We had an ad-network system that became pretty large over time, so I was considering migrating to a multi-tenant architecture, one tenant per publisher.
Option 3 wasn't relevant, as some publishers had special requirements (additional columns/procedures, and we do not support partitioning). Option 2 wasn't relevant since we had concurrency and load issues between customers. To reach high performance, with plans to scale out to 1000 publishers, 20 widgets, and high concurrency, our only solution was Option 1 - Separate Databases.
We have already migrated to this mode and have 10 databases running, with a shared database for configuration and ads. Performance-wise it is great. For high availability it is also very good, as one publisher's traffic doesn't affect the others. Adding a new publisher is an easy step for us, as we have a template environment. The only issue we have is maintenance.
Lately I've been reading about PartitionDB. It looks very simple to manage, as you can have a gate database to perform all maintenance work, including upgrades and top-level queries. It supports a common shared database (the same as we already have), and I am now trying to understand how to use the standalone database as well.

Related

SaaS application with microservices and database per tenant

I am designing a web application for a multi-tenant SaaS model where, due to some regulations, we decided to have one database per tenant, and we are considering microservices for our middleware. But I am a bit confused, since microservice architecture says that 'every microservice has its own database'. So here are my questions:
If I use a shared database for the microservices, it violates the concept/purpose of microservice design, where they should all be isolated and any change should be limited to that microservice.
I can think of a layer which mimics my database tables as DTOs, with every microservice communicating with the database through this layer; correct me if I am wrong here.
If every microservice has its own database, then it's almost impossible to have one database per tenant; otherwise each microservice ends up with n databases.
So what is the right approach to designing our middleware with one database per tenant?
Or if anyone has a better approach, feel free to share.
Below is the high-level design we started with.
You should distinguish 2 things here:
Database sharding
Sharding is a database architecture pattern related to horizontal partitioning, in which you split your database based on some logical key. In your case the logical key is your tenant (tenantId or tenantCode). Sharding allows you to split your data from one database into multiple physical databases. Ideally you can have a database per logical shard key, which in your case means that, in the best case, you can have a database per tenant. Keep in mind that you don't have to split it that far. If your data is not big enough to be worth putting every tenant's data into a separate database, start with 2 databases and put half of your tenants in one physical database and half in a second physical database. You can then coordinate this at the application layer by saving, in some configuration or in another database table, which tenant is in which database. As your data grows you can migrate and/or create additional physical databases or physical shards.
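A rough sketch of that application-layer coordination (all class, tenant, and connection-string names here are made up; in practice the mapping would be loaded from configuration or a lookup table):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal tenant -> physical database directory.
public class ShardDirectory {

    // In a real system this map would be loaded from configuration or a
    // dedicated lookup table, so tenants can be moved between shards later.
    private final Map<String, String> tenantToJdbcUrl = new ConcurrentHashMap<>(Map.of(
            "tenant-a", "jdbc:sqlserver://shard1;databaseName=app",
            "tenant-b", "jdbc:sqlserver://shard1;databaseName=app",
            "tenant-c", "jdbc:sqlserver://shard2;databaseName=app"));

    public String jdbcUrlFor(String tenantCode) {
        String url = tenantToJdbcUrl.get(tenantCode);
        if (url == null) {
            throw new IllegalArgumentException("Unknown tenant: " + tenantCode);
        }
        return url;
    }

    // Called when a tenant is migrated to a new physical shard.
    public void reassign(String tenantCode, String jdbcUrl) {
        tenantToJdbcUrl.put(tenantCode, jdbcUrl);
    }
}
```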
Database per micro-service
It is one of the basic rules in micro-services architecture that each micro-service has its own database. There are multiple reasons for it; some of them are:
Scaling
Isolation
Using different database technologies for different micro-services(if needed)
Development team separation
You can read more about it here. Sure, it has some drawbacks, but keep in mind that it is one of the key rules in micro-services architecture.
Your questions
If I use a shared database for the microservices, it violates the
concept/purpose of microservice design, where they should all be
isolated and any change should be limited to that microservice.
If you can, try to avoid a shared database for multiple micro-services. If you end up doing it, you should perhaps reconsider your architectural design. Sometimes being forced into one database is an indicator that some micro-services should be merged into one, because the coupling between them is too big and the overhead of working with them separately becomes very complex.
If every microservice has its own database, then it's almost
impossible to have one database per tenant; otherwise each
microservice ends up with n databases.
I don't really agree that it's impossible. Yes, it is hard to manage, but if you decided to use micro-services and you need database sharding, you have to deal with the extra complexity. Sure, having one database per micro-service and then n databases for each micro-service could be very challenging.
As a solution I would suggest the following:
Include the tenant (tenantId or tenantCode) as a column in every table in your database (a minimal sketch of such a column follows after these suggestions). This way you will be able to migrate easily later if you decide that you need to shard a table, a set of tables in a schema, or the whole database belonging to some micro-service. As already said in the part above about database sharding, you can start with one physical shard (one physical database) but already define your logical shard (in this case using the tenant info in each table).
Physically separate the data into different shards only in the micro-services where you need it. Let's say you have 2 micro-services: product-inventory-micro-service and customers-micro-service, and let's say you have 300 million products in your product-inventory-micro-service db and only 500,000 customers. You don't need a database per tenant in the customers-micro-service, but in the product-inventory-micro-service, with 300 million records, that would be very helpful performance-wise.
As I said above, start small with 1 or 2 physical databases and increase and migrate over time as your data grows and the need arises. This way you will save yourself some overhead in development and server maintenance, at least for as long as you don't need it.
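As a minimal sketch of the tenant column suggested above (assuming a Jakarta Persistence/Hibernate stack; with older stacks the imports would be javax.persistence, and the class/column names are illustrative):

```java
import jakarta.persistence.Column;
import jakarta.persistence.MappedSuperclass;

// Every tenant-aware table inherits this column, so data can later be
// re-partitioned to separate physical databases without schema changes.
@MappedSuperclass
public abstract class TenantAwareEntity {

    @Column(name = "tenant_code", nullable = false, updatable = false)
    protected String tenantCode;

    public String getTenantCode() { return tenantCode; }

    public void setTenantCode(String tenantCode) { this.tenantCode = tenantCode; }
}
```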
Our application is SaaS and multi-tenant with a microservices architecture. We do indeed use 1 database per service, though in some cases it's actually just a separate schema per tenant, not a separate instance.
The key concept in "database per service" is all to do with avoiding sharing of tables/documents between services and less to do with exactly how you implement that. If two services are both accessing the same table, that becomes a point of tight coupling between the services as well as a common point of failure - two things microservices are designed to avoid.
Database-per-service means don't share persistence between multiple services; how you implement that is up to you.
Multi-tenancy is another challenge and there are multiple ways to approach it. In our case (where regulations do not dictate our architecture) we have designed our Tenant-aware table-structures with TenantId as part of the Primary Key. Each Tenant-aware service implements this separation and our Authorization service helps keep users within the boundaries of their assigned Tenant(s).
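For example, a sketch of a tenant-aware table whose primary key includes the TenantId (the entity, table, and field names are hypothetical; assuming a Jakarta Persistence mapping):

```java
import java.io.Serializable;
import java.util.Objects;
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.IdClass;
import jakarta.persistence.Table;

// Composite key class: the tenant is part of every tenant-aware primary key.
class WorkItemId implements Serializable {
    String tenantId;
    long itemId;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof WorkItemId)) return false;
        WorkItemId other = (WorkItemId) o;
        return itemId == other.itemId && Objects.equals(tenantId, other.tenantId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(tenantId, itemId);
    }
}

@Entity
@Table(name = "work_items")
@IdClass(WorkItemId.class)
public class WorkItem {
    @Id private String tenantId;   // every query and key includes the tenant
    @Id private long itemId;

    private String description;
    // getters/setters omitted for brevity
}
```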
If you are required to have more separation than key/value separation, I would look to leverage schemas as a great tool to segregate both data and security:
You could have a database instance (say a SQL Server instance) per microservice, and within each instance have a schema per tenant.
That may be overkill at the start, and I think you'd be safe to do a schema per service/tenant combination until that number grew to be a problem.
In either case, you'd probably want to write some tooling in your DB of choice to help manage your growing collection of schemas, but unless you are going to end up with thousands of tenants, that should get you pretty far down the road.
The last point to make is that you're going to lean heavily on an Integration bus of some sort to keep your multiple, independent data stores in sync. I strongly encourage you to spend as much time on designing that as you do the rest of your persistence as those events become the lifeblood of the system. For example, in our multi-tenant setup, creating a new tenant touches 80% of our other services (by broadcasting a message) so that those services are ready to interact with the new tenant. There are some integration events that need to happen quickly, others that can take their time, but managing all those moving parts is a challenge.
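As a sketch of what such a "new tenant" broadcast might look like (assuming a Kafka-based bus purely for illustration; the topic name and event payload are invented):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TenantCreatedPublisher {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Broadcast once; every tenant-aware service subscribes to this topic
        // and provisions whatever it needs for the new tenant.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String tenantCode = "acme";
            String event = "{\"type\":\"TenantCreated\",\"tenantCode\":\"" + tenantCode + "\"}";
            producer.send(new ProducerRecord<>("tenant-lifecycle", tenantCode, event));
        }
    }
}
```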

Database Bottleneck In Distributed Application

I hear about SOA and distributed applications everywhere now. I would like to know about some best practices for keeping a single data source responsive, or, if you have a copy of the data on every server, how best to synchronise those databases to keep them up to date.
There are many answers to this question and in order to choose the most appropriate solution, you need to carefully consider what kind of data you are storing and what you want to do with it.
Replication
This is the traditional mechanism for many RDBMSs, and normally relies on features provided by the RDBMS. Replication has a latency, which means that although servers can handle load independently, they may not necessarily be reading the latest data. This may or may not be a problem for a particular system. When replication is bidirectional, simultaneous changes on two databases can lead to conflicts that need resolving somehow. Depending on your data, the choice might be easy (e.g. audit log => append both) or difficult (e.g. hotel room booking - cancel one? select an alternative hotel?). You also have to consider what to do in the event that the replication network link is down (i.e. do you deny updates on both databases, on one database, or allow the databases to diverge and sort out the conflicts later). This is all dependent on the exact type of data you have. One possible compromise, for read-heavy systems, is to use unidirectional replication to many databases for reading, and send all write operations to the source database. This is always a trade-off between Availability and Consistency (see the CAP Theorem). The advantage of an RDBMS with replication is that you can easily query your entire dataset in complex ways and have a greater opportunity to remove duplication by using relational links between data items.
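A minimal sketch of that read/write split at the application layer (plain JDBC DataSources; the class and method names are invented, and reads must tolerate replication latency):

```java
import javax.sql.DataSource;

// Route all writes to the source database and reads to a replica that is
// kept up to date by unidirectional replication (reads may lag slightly).
public class ReadWriteRouter {

    private final DataSource primary;
    private final DataSource readReplica;

    public ReadWriteRouter(DataSource primary, DataSource readReplica) {
        this.primary = primary;
        this.readReplica = readReplica;
    }

    public DataSource forWrites() {
        return primary;
    }

    public DataSource forReads() {
        // Acceptable only for queries that can tolerate replication latency.
        return readReplica;
    }
}
```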
Sharding
If your data can be cleanly partitioned into disjoint subsets (e.g. different customers), such that all possible relational links between data items are contained within each subset (e.g. customers -> orders), then you can put each subset in a separate database. This is the principle behind NoSQL databases, or, as Martin Fowler calls them, 'Aggregate-Oriented Databases'. The downside of this approach is that it requires more work to run queries over your entire dataset, as you have to query all your databases and then combine the results (e.g. map-reduce). Another disadvantage is that in separating your data you may need to duplicate some of it (e.g. sharding by customers -> orders might mean product data is duplicated). It is also hard to manage the data schema, as it lives independently in multiple databases, which is why most NoSQL databases are schema-less.
Database-per-service
In the microservice approach, it is advised that each microservice should have its own dedicated database that is not allowed to be accessed by any other microservice (of a different type). Hence, a microservice that manages customer contact information stores the data in a separate database from the microservice that manages customer orders. Links can be made between the databases using globally unique ids, or URIs (especially if the microservices are RESTful), etc. The downside, again, is that it is even harder to perform complex queries on the entire dataset (especially since all access should go via the microservice API, not directly to the databases).
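For example (a sketch with hypothetical names), the contact-information service would store only a globally unique reference to a customer owned by another service, never a database-level foreign key:

```java
import java.util.UUID;

// Stored by the contact-information microservice. The customerId is a global
// identifier issued by the customer-management service; there is no foreign
// key into that service's database.
public record CustomerContact(UUID customerId, String email, String phone) { }
```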
Polyglot storage
So many of my projects in the past have involved a single RDBMS in which all data was placed. Some of this data was well suited to the relational model, much of it was not. For example, hierarchical data might be better stored in a graph database, stock ticks in a column-oriented database, html templates in a NoSQL database. The trend with micro-services is to move towards a model where different parts of your dataset are placed in storage providers that are chosen according to the need.
If you are thinking of keeping a different copy of the database for each microservice and you want to achieve eventual consistency, then you can use Kafka Connect. Briefly, Kafka Connect will watch your databases and, whenever there are changes, read the transaction log and add the logged events as messages to a queue; other databases that subscribe to this queue can then apply the same changes on their side.
Kafka Connect isn't the only framework; you can search for and find other frameworks or applications for the same purpose.
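A rough sketch of the subscriber side (assuming the change events land on a Kafka topic as JSON; the topic name and payload handling are illustrative, not Kafka Connect's actual output format):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ChangeEventApplier {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "orders-read-model");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("db-change-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Apply the change to this service's own copy of the data,
                    // e.g. translate the event into an UPSERT on the local table.
                    System.out.printf("apply %s -> %s%n", record.key(), record.value());
                }
            }
        }
    }
}
```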

Multiple Raven Databases with different Replication strategies

RavenDB: creating multiple databases to support different replication strategies.
Recently I was tasked with creating an additional Raven database to store information pertaining to users. So the solution I am working on would have some information in one Raven database and user information in another. The reason for the request is so we can support different replication strategies for the two databases. As I understand it, Raven only supports a single replication strategy per database.
First I would like to know if anyone has created an application with two raven databases?
Second I would like to know what problems you might have encountered, and a general sense of what issues I can plan for or mitigate early on?
Thank you ahead of time,
Having multiple Raven databases is possible, but only advisable in certain situations.
If each database could potentially be on a different server (as one would assume, since you're talking about replicating them differently), then each must have its own DocumentStore, which is fairly expensive to set up; but this should only happen once, at application startup, anyway, and you're talking about 2, not 50.
As Matt mentions in the comments, if you have two databases on the same server, then you can use the same DocumentStore and specify the database name when you open the session.
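A hedged sketch of those two options (assuming the RavenDB Java client, net.ravendb; the database names are examples and the .NET client exposes analogous calls):

```java
import net.ravendb.client.documents.DocumentStore;
import net.ravendb.client.documents.session.IDocumentSession;

public class TwoDatabasesSketch {

    public static void main(String[] args) {
        // One DocumentStore per server, created once at application startup.
        DocumentStore store = new DocumentStore("http://localhost:8080", "Orders");
        store.initialize();

        // Session against the store's default database ("Orders").
        IDocumentSession orders = store.openSession();
        // ... load/store order documents
        orders.close();

        // Same server, different database: pass the database name explicitly.
        IDocumentSession users = store.openSession("Users");
        // ... load/store user documents
        users.close();

        store.close();
    }
}
```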
Each database should be for logically very different things. You won't easily be able to commingle data between the two databases. If a document in one database contained a reference to a document id in the other database, you wouldn't be able to use the Include features to get both of those documents in one round trip - there would essentially be a wall between the databases. Indexes could not span between the databases, for example.
Accessing both databases would require spinning up an IDocumentSession for each, both of which would need to be managed separately. If you're managing your document sessions at an infrastructure level (i.e. one session per HTTP request) then having two complicates things quite a bit.
However, if you have a segmented type of application, this can work quite well. For example, if you had a Users database that provided single sign-on across multiple websites (or areas of a website) then this could be a good fit. On most pages the user info would be essentially read-only (like to display the black bar at the top of Stack Overflow), except for the user management pages.
This could also be common if you're going for an Udi Dahan style SOA application structure where you define service boundaries and each service has its own independent database.

Database tables - how many databases?

How many databases are needed for a social website? I have my tech team working on developing a social site, but all their tables are in one database. I wanted to create separate table sets for user data, temporary tables, etc., and was thinking of maybe having a separate database only for critical data, but I am not a tech person and am not sure how this works. The site is going to be a local reviews website.
This is what happens when management tries to make tech decisions...
The simple answer, as always, is as few as possible.
The slightly more complicated answer is that once you begin to push the limits of your server and start thinking about multiple servers with master/slave replication, you may want your frequently written tables separated from your seldom-written tables, which will lower the master-slave update traffic.
If you start using separate databases, you can also run into issues with your backup/restore strategy. If you have 5 databases and back up all five, what happens when you need to restore one of them? Do you then need to restore all five?
I would opt for the fewest number of databases.
The reason you would want multiple databases is to scale out to multiple machines, in the context of a "social application" where large volume / high availability is a concern. If you anticipate the need to scale out to multiple machines to handle high volumes, then the breakout of tables should keep together those that logically need to stay together.
So, for example, maybe you want to keep tables related to a specific subject area (maybe status updates) together in one database and other tables that are related to a different subject area (let's say user's picture libraries) together in a different database.
There are logical and performance reasons to keep tables in separate physical or logical databases.
What is the reason that you want it in different databases?
You could just put all tables in one database without a problem, even with for example multiple installations of an open source package. In that case you can use table prefixes.
Unless you are developing a really BIG website, one database is the way to proceed (by the way, did you consider the possible issues that may arise when working with multiple databases?).
If you are worried about performance, you can always configure different tablespaces on several storage devices in order to improve timings.
If you are worried about security, just increase it (better passwords, no direct root login, no port forwarding, avoid tunneling, etc.)
I am not a tech person, only doing the functional analysis, but I own the project so I need to oversee the tech team. My reasons for wanting multiple databases are security and performance.
Since this is going to be a new startup, there is no money to invest in strong security or getting the database designed flawlessly. Plus, there are currently no backup policies in place, so:
1) I want to separate critical data like user passwords/basic profile info, then separate out user media (photos they upload to their profile) and then the user content, and then separate out the system content. The current design has two layers of tables: master tables for the entire system and module tables for each individual module.
2) Performance: there are a lot of modules being designed, and this is a data-intensive social site with lots of reporting/analytics built in, so lots of reads/writes. Maybe it is better to distribute load across databases based on purpose?
Since there isn't much funding, I want to get it right the first time with my investment so the database can scale and work well until revenue comes in to actually invest in getting it right. Of course, that could be maybe 6 months away, and say a million users away too.
Oh, and there is a plan to add staging/production modes as well, so separate or the same database?
You'll be fine sticking with one database for now. Your developers can isolate/separate application data by making use of database schemas. Working with multiple databases can quickly become a journey through a world of pain and is to be avoided unless it's absolutely crucial.

SQL Server architecture guidance

We are designing a new version of our existing product on a new schema.
It's an internal web application with possibly 100 concurrent users (max). This will run on a SQL Server 2008 database.
One of the recent discussion items is whether we should have a single database or split the database across 2 separate databases for performance reasons.
The database could grow anywhere from 50-100GB over 5 years.
We are developers, not DBAs, so it would be nice to get some general guidance.
[I know the answer is not simple as it depends on the schema, archiving policy, amount of data etc. ]
Option 1 Single Main Database
[This is my preferred option].
The plan would be to have all the tables in a single database, and possibly to use filegroups and partitioning to separate the data across multiple disks if required. [Use schemas if appropriate.] This should deal with the performance concerns.
One of the comments regarding this was that a single server instance would still be processing all of this data, so there would still be a processing bottleneck.
For reporting we could have a separate reporting DB but this is still being discussed.
Option 2 Split the database into 2 separate databases
DB1 - Customers, Accounts, Customer resources etc
DB2 - This would contain the bulk of the data [i.e. Vehicle tracking data, financial transaction tables etc].
These tables would typically contain a lot of data. [It could reside on a separate server if required]
This plan would involve keeping the main data in a smaller database [DB1] and retaining the [mainly] read only transaction type data in a separate DB [DB2]. The UI would mainly read from DB1 and thus be more responsive.
[I'm aware that this option makes it harder for Referential Integrity to be enforced.]
Points for consideration
As we are at the design stage, we can at least make proper use of indexes to deal with performance issues, so that's why Option 1 is attractive to me, and it's more of a standard approach.
For both options we are considering implementing an archiving database.
Apologies for the long Question. In summary the question is 1 DB or 2?
Thanks in advance,
Liam
Option 1 in my opinion is the way to go.
CPU is very unlikely to be your bottleneck with 100 concurrent users generating your workload. You could acquire a single multi-socket server with additional CPU capacity available via hot-swap technology to offer room to grow, should you wish. Depending on your availability requirements, you could also consider a clustering solution that lets you swap in more CPU resource by forcing a failover to another node.
The performance of your disk subsystem is going to be your biggest concern. Your design decisions will be influenced by the storage solution you use, which I assume will be SAN technology.
As a minimum you will want to place your LOG (RAID 1) and DATA files (RAID 10 or 5, dependent on workload) on separate LUNs.
Depending on how your tables are accessed, you may wish to consider placing different filegroups on separate LUNs. Partitioning your table data could prove advantageous, but only for large tables.
50 to 100GB and 100 users is a pretty small database by most standards today. Don't over engineer your solution by trying to solve problems that you haven't even seen yet. Splitting it into two databases, especially on two different servers will create a mountain of headaches that you're better off without. Concentrate your efforts on creating a useful product instead.
I agree with the other comments stating that between 50 and 100GB is small these days. I'd also agree that you shouldn't over-engineer.
But if there is an obvious (or not so obvious) logical separation between the entities you store (like you say, one part being read-write and the other parts mainly read-only), I'd still split it into different DBs. At the very least I would design it in a way that lets me easily factor one piece out later. Security would be one reason, management/backup/restore another, easier serviceability (because inherently the design will be better factored and the parts better isolated from each other), and, in SQL Server, the ability to scale out (or the lack thereof with a single database). Separating the login and content databases, for example, often makes sense for bigger web applications.
And if you really want a sound design, separating your entities within a single DB using different schemas and putting proper permissions on the objects ends up being almost the same effort, in my eyes.
Microsoft products like SharePoint, TFS and BizTalk all use several different databases (Though I do not pretend to be aware of the reasons / probably just the outcome of the way they organize their teams).
Especially given that you cannot scale out a single database instance on SQL Server (clustering needs multiple instances), I'd be tempted to split it.
@John: I would never use RAID 5. It serves no purpose other than to hurt performance. I agree with the RAID 10 approach.
Putting data in another database is not going to make the slightest difference to performance. Performance is a factor of other things entirely.
A reason to create a new database is for maintenance and administration reasons. For example if one set of data needs a different backup and recovery policy or has higher availability requirements.
