creating database for each new company [duplicate] - database

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I am building a SAAS application and we are discussing about one database per client vs shared databases. I have read a lot, incluisve some topics here at SO but I have still many doubts.
Our plataform should be highly customizable by each client. (they should be able to have custom tables and add custom fields to existing tables).
The multiple database aproach seems great in this case.
The problem is. should my "users" table be in the master database or in each client database?.
A user might have one or more organizations, so it would be present in multiple databases.
Also, what about generic tables like countries table, etc?
It makes sense to be in the master database. But I have many tables with a created_by field which have a foreign key to the user. Also have some permission related tables by client.
I would loose the power of foreign keys if multiple databases, which means more queries to the database. I know I can use cross-join between databases if they are in the same server but then i loose scalability. (I might need to have multiple database servers in future).
I have tought about federated tables. Not sure about performance.
The technologies I am using are php and symfony 2 framework and mysql for the database.
Also, I am afraid about the maintenance of such a system. We could create some scripts to automate the schema changes in all databases, but if we have 10k clients that would mean 10k databases.
What is your opiniion about this?
The main caracteristic of my app should be flexibility so if a client needs something more specific than the base plataform doesnt have, it should be possible to do it for him.

Some classic problems here. Have you ever been to http://highscalability.com/? Some good case studies there.
From personal experience if you try to share clients on one server, you will find that a very successful/active user will take up all the resources of the machine over time. We had one client in a SAAS that destroyed a shared server and we had to move him somewhere else.
I would rip out global enumerations into a service. You can make one central database for things like list of countries, list of states, etc. and put it behind a web service layer. Also in that database you can have user management/managing what server belongs to what user etc. You can make a management portal that reads/writes to this database for managing your user base.
If I was doing a SAAS again, I would start small and wait for the pain to hit. What you really want are good tools to address the scaling issues when they happen. Have some scripts ready to do rolling schema changes across servers (no way to avoid this once you have more than one server). Have scripts to take down machines while you are modifying the schema. Have scripts to migrate a user from a shared server to a dedicated one.
Consider setting up replication from a central database. This would pump down global information that each user partition/database would need without you having to write a lot of code.
But the biggest piece of advice I've seen - and experienced first hand - don't try too hard to build the next Facebook for scale. Start simple and see what actually happens before worrying about major scalability issues. You might be surprised as the user base grows what scales well and what does not.

Related

How to structure/coordinate multiple databases?

Imagine a large corp with dozens of companies, each with their own website and each website will have their own unique functional requirements
Most data on each website will be specific to that website
Each website can edit its own data
Some data will be shared across all websites
There will be a central CMS that is allowed to edit this data, but other websites can read and use that data
e.g. say you're planning the infrastructure for a company that owns multiple sub-companies that make different kinds of products, some in the same category (cereal, food), others in completely different categories (books, instruments). Some are marketing websites, some are for CRM, some are online stores
there are a list of regulatory requirements that affect all products
each company should manage the status of compliance of its own products to each requirement
when a new requirement surfaces, details regarding that requirement should only be entered once
How would the multiple databases be coordinated?
edit: added more info per Bob's suggestions
Thanks for the incredibly insightful questions!
compliance data is not shared, silo'd within each site
shared data is only on the one enterprise-wide database, they will mostly be "types of [thing]"
no conclusive list of instances where they'll be used but currently it'd be to populate CMS dropdowns for individual sites.
changes to shared data would occur a few times a year.
Ideally changes would be reflected within a few minutes, but an hour or so should be acceptable
very low volume in shared data.
All DBs will be new, decision on which DB is pending current investigation.
Sub-systems will expose REST api
Here are some ways I have seen this handled, you need to think about the implications of each structure based on the details of your particular business domain. All can work, but all have to be carefully set up if they are going to work.
One database for shared information and one for each client for client-specific information. Set up the overall application so that the first thing you put in the application on log in is the client and it connects to the correct client. People might have to also have a way to change the client if users will handled multiples.
Separate servers for each client if they completely need to be siloed. Database changes are by script (and in source control) and are applied to each server as need be. So the changes to the central database might have a job that runs to push any data changes to the other servers
All the data in one database, but making sure each table has a client_id so that the data is always filtered correctly by client. You can set up separate views by client, so that the users can only see the clients they are supposed to see. This only works if the data for each client is substantially in the same form.
And since you are in a regulatory environment, I strongly urge that you create an audit database that is updated by database triggers (never audit from the application, you will lose changes to the data) for each database.
I agree with Chris that, even after both the sets of questions, there is still a big set of possible solutions. For instance, if the databases were the same technology, and the shared data were stored in the same way in each one, you could do db-level replication from the central db to the others. Is it OK to have 2 separate dbs per application (one with shared stuff and one with not-shared?) - this would influence the kind of replication.
Or you could have a purely code solution, where clicking publish in a GUI that updates the central db calls a set of APIs that also update the other dbs. Or micro-services - updating the central db also creates a message on a shared queue, that is picked up by services that each look after a different db and apply the updates in whatever form makes sense for that db.
It depends on (among the things already mentioned) what your organisation's technology strategy is, what technology and skills you already have in-house, and so on.
So this is as much an architecture question as it is a db question.
I don't think this question is sufficiently clear to get a single answer. However there are a few possibilities.
In many cases, where you have shared data you want to have a single point of ownership of that information. It could be in a database, in an excel file (which can then be turned into csv and periodically loaded on all dbs), or some other form. The specifics depend on what is shared exactly.
Now in this case it sounds like you are going to have some sort of legal department in charge of some shared information and they will manage that data, which will then be shared to the other sites. This might be done with an application they manage which aggregates information from the other companies or it could be data which is pushed to their systems.
A final point:
Software is at its best when it facilitates human solutions to human problems, not when it tries to solve those problems directly. In these cases, you probably want a good human solution in place and then to look at what software can do to support that. A lot of the issues (who owns the information?) will already have been solved and you will be simply automating what is already done.

multi-tenant database architecture [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I am building a SAAS application and we are discussing about one database per client vs shared databases. I have read a lot, incluisve some topics here at SO but I have still many doubts.
Our plataform should be highly customizable by each client. (they should be able to have custom tables and add custom fields to existing tables).
The multiple database aproach seems great in this case.
The problem is. should my "users" table be in the master database or in each client database?.
A user might have one or more organizations, so it would be present in multiple databases.
Also, what about generic tables like countries table, etc?
It makes sense to be in the master database. But I have many tables with a created_by field which have a foreign key to the user. Also have some permission related tables by client.
I would loose the power of foreign keys if multiple databases, which means more queries to the database. I know I can use cross-join between databases if they are in the same server but then i loose scalability. (I might need to have multiple database servers in future).
I have tought about federated tables. Not sure about performance.
The technologies I am using are php and symfony 2 framework and mysql for the database.
Also, I am afraid about the maintenance of such a system. We could create some scripts to automate the schema changes in all databases, but if we have 10k clients that would mean 10k databases.
What is your opiniion about this?
The main caracteristic of my app should be flexibility so if a client needs something more specific than the base plataform doesnt have, it should be possible to do it for him.
Some classic problems here. Have you ever been to http://highscalability.com/? Some good case studies there.
From personal experience if you try to share clients on one server, you will find that a very successful/active user will take up all the resources of the machine over time. We had one client in a SAAS that destroyed a shared server and we had to move him somewhere else.
I would rip out global enumerations into a service. You can make one central database for things like list of countries, list of states, etc. and put it behind a web service layer. Also in that database you can have user management/managing what server belongs to what user etc. You can make a management portal that reads/writes to this database for managing your user base.
If I was doing a SAAS again, I would start small and wait for the pain to hit. What you really want are good tools to address the scaling issues when they happen. Have some scripts ready to do rolling schema changes across servers (no way to avoid this once you have more than one server). Have scripts to take down machines while you are modifying the schema. Have scripts to migrate a user from a shared server to a dedicated one.
Consider setting up replication from a central database. This would pump down global information that each user partition/database would need without you having to write a lot of code.
But the biggest piece of advice I've seen - and experienced first hand - don't try too hard to build the next Facebook for scale. Start simple and see what actually happens before worrying about major scalability issues. You might be surprised as the user base grows what scales well and what does not.

Database per application VS One big database for all applications [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I'm designing a few applications that will share 2 or 3 database tables and all of the other tables will be independent of each app. The shared databases contain mostly user information, and there might occur the case where other tables need to be shared, but that's my instinct speaking.
I'm leaning over the one database for all applications solution because I want to have referential integrity, and I won't have to keep the same information up to date in each of the databases, but I'm probably going to end with a database of 100+ tables where only groups of ten tables will have related information.
The database per application approach helps me keep everything more organized, but I don't know a way to keep the related tables in all databases up to date.
So, the basic question is: which of both approaches do you recommend?
Thanks,
Jorge Vargas.
Edit 1:
When I talk about not being able to have referential integrity, it's because there's no way to have foreign keys in tables when those tables are in different databases, and at least one of the tables per application will need a foreign key to one of the shared tables.
Edit 2:
Links to related questions:
SQL design around lack of cross-database foreign key references
Keeping referential integrity across multiple databases
How to salvage referential integrity with mutiple databases
Only the second one has an accepted answer. Still haven't decided what to do.
Answer:
I've decided to go with a database per application with cross-database references to a shared database, adding views to each database mimicking the tables in the shared database, and using NHibernate as my ORM. As the membership system I'll be using the asp.net one.
I'll also use triggers and logical deletes to try and keep to a minimum the number of ID's I'll have flying around livin' la vida loca without a parent. The development effort needed to keep databases synced is too much and the payoff is too little (as you all have pointed out). So, I'd rather fight my way through orphaned records.
Since using an ORM and Views was first suggested by svinto, he gets the correct answer.
Thanks to all for helping me out with this tough decision.
Neither way looks ideal
I think you should consider not making references in database layer for cross-application relations, and make them in application layer. That would allow you to split it to one database per app.
I'm working on one app with 100+ tables. I have them in one database, and are separated by prefixes - each table has prefix for module it belongs to. Then i have built a layer on top of database functions to use this custom groups. I'm also building data administrator, which takes advantage of this table groups and makes editing data very easy.
It depends and your options are a bit different depending on the database and frameworks you're using. I'd recommend using some sort of ORM and that way you don't need to bother that much. Anyways you could probably put each app in it's own schema in the database and then either reference the shared tables by schemaname.tablename or create views in each application schema that's just a SELECT * FROM schemaname.tablename and then code against that view.
There are no hard and fast rules to choose one over the other.
Multiple databases provide modularity. As far as sync-ing across multiple databases are concerned, one can use the concept of linked servers and views thereof and can gain the advantages of integrated database (unified access) as well.
Also, keeping multiple databases can help better management of security, data, backup & restore, replication, scaling out etc!
My 2cents.
THat does not sound like "a lot of applications" at all, but like "one application system with different executables". Naturally they can share one database. Make smart usage of Schemata to isolate the different funcational areas within one database.
One database for all application in my opinion .Data would be stored once no repitation.
With the other approach you would end up replicating and in my opinion when you start replicating it will bring its own headache and data would go out of sync too
The most appropriate approach from scalability and maintenence point of view would be to make the "shared/common" tables subset self-sufficient and put it to "commons" database, for all others have 1 db per application of per logical scope (business logic determined) and maintain this structure always
This will ease the planning and execution commissioning/decommissioning/relocation/maintenence procedures of your software (you will know exactly which two affected DBs (commons+app_specific) are involved if you know which app you are going to touch and vice versa.
At our business, we went with a separate database per application, with cross database references for the small amount of shared information and an occasional linked server. This has worked pretty well with a development, staging, build and production environments.
For users, our entire user base is on windows. We use Active Directory to manage the users with application references to groups, so that the apps don't have to manage users, which is nice. We did not centralize the group management, that is each application has tables for groups and security which is not so nice but works.
I would recommend, that if your applications are really different, to have a database per application. Looking back, the central shared database for users sounds workable as well.
You can use triggers for cross database referential integrity:
Create a linked server to the server that holds the database that you want to reference. Then use 4-part naming to reference the table in the remote database that holds the reference data. Then put this in the insert and update triggers on the table.
EXAMPLE(assumes single row inserts and updates):
DECLARE #ref (datatype appropriate to your field)
SELECT #ref = refField FROM inserted
IF NOT EXISTS (SELECT *
FROM referenceserver.refDB.dbo.refTable
WHERE refField = #ref)
BEGIN
RAISERROR(...)
ROLLBACK TRAN
END
To do multi row inserts and updates you can join the tables on the reference field but it can be very slow.
I think the answer to this question depends entirely on your non functional requirements. If you are designing a application that will one day need to be deployed across 100's of nodes then you need to design your database so that if need be it could be horizontally scaled. If on the other hand this application is to be used by a hand full of users and may have a short shelf life then you approach will be different. I have recently listened to a pod cast of how EBAY's architecture is set-up, http://www.se-radio.net/podcast/2008-09/episode-109-ebay039s-architecture-principles-randy-shoup, and they have a database per application stream and they use sharding to split tables across physical nodes. Now their non-functional requirements are that the system is available 24/7, is fast, can support thousands of users and that is does not lose any important data. EBAY make millions of pounds and so can support the effort that this takes to develop and maintain.
Anyway this does not answer your question:) my personnel opinion would be to make sure your non-functional requirements have been documented and signed off by someone. That way you can decide on the best Architecture. I would be tempted to have each application using its own database and a central database for shared data. And I would try to minimise the dependencies between them, which I'm sure is not easy or you would have done it:), but I would also try to steer clear of having to produce some sort of middle ware software to keep tables in sync as this could create a headaches for you.
At the end of the day you need to get your system up and running and the guys with the pointy hair wont give a monkeys chuff about how cool your design is.
We went for splitting the database down, and having one common database for all the shared tables. Due to them all being on the save SQL Server instance it didn't affect the cost of running queries across multiple database.
The key in replication for us was that whole server was on a Virtual Machine (VM), so for replication to create Dev/Test environments, IT Support would just create a copy of that image and restore additional copies when required.

How to Design a SaaS Database [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
Improve this question
I have a web app that I built for a trucking company that I would like to offer as SaaS. What is the best way to design the database?
Should I create a new database for each company? Or should I use one database with tables that have a prefix of the company name? Or should I Use one database with one of each table and just add a company id field to the tables? Or is there some other way to do it?
faced with a similar situation about 10 years ago, we opted for a database per client. we have hundreds (not thousands) of clients. looking back it was one of the best decisions we made. backups are easy. copying a single client to our office for analysis is easy (just take the last backup). scaling is easy (moving a single big client to a different server can free up resources on a stressed sql server). joel & jeff had a discussion about this on a stack overflow podcast (not a recent one) and joel did the same thing i do ... each client gets their own database. database purists will often argue for lumping everyone into one db, but i would never do that.
-don
Should I create a new database for each company?
Yes - Don Dickinson was on the money. However, see a refinement below.
Or should I use one database with tables that have a prefix of the
company name?
Lord no! Changing your database queries for different for client would make you go insane! Also, you'd almost certainly run dynamic SQL (where the table name is changed in code before running the query), which would harm performance as most servers like to cache query plans and interim results - this doesn't work if the table names keep changing.
Or should I Use one database with one
of each table and just add a company
id field to the tables?
You might want to do this if you want to have some kind of scalable model for your customers. Whilst provisioning a new database for each customer gives you lots of flexibility, it also involves costs and complexity. You have to create a new backup schedule, have a lifecycle model for dealing with expired customers etc.
So, you might say that "free trial" and "bronze" customers are all lumped into a single database, using the company id to separate them out; "silver" users get their own database (but you still keep the customer_id field in the schema, so you don't have to change queries between two levels of customer), and "gold" customers get their own database server.
I did something similar a few years ago at a SaaS company - and customers are typically happy to have an upgrade path on infrastructure (read: performance and resilience) as well as features.
We have some databases here with shared clients and some where each client has it's own server and own database. The ones where the client is on it's own server are the easiest to manage and the least likely to cause a problem when some developer forgot to add the clientid and sent client a's data to client b by accident (an example NOT chosen at random).
Keeping each on it's own server or server instance allows us to keep the database structure the same with the same names and makes it easier to propagate changes to all the servers because we don't have to change the database name.
If you do use separate instances for each client, make sure you design and implement a good system for propagating all changes to all clients. If these databases get out of sync, they can become horrible to maintain. You'll find that if you let them get out of sync, each client will ask for changes and you will have 27 ways to do the same thing. You have to generalize when they are on the same database, when they are separate you have to use self discipline to ensure new functionality is the same for each client.
It depends, here, i work in a company that has many "Internal Business units" treated like other companies.
So, some reports must include all companies, Client accounts must also be shared across companies. Here we have a CompanyId Field in the tables that requires it.
The Prefix solution is surely one to be avoided.

Is anyone using the Service Broker in SQL Server? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
When I attended a presentation of SQL Server 2008 at Microsoft, they did a quick gallup to see what features we were using. It turned out that in the entire lecture hall, my company was the only one using the Service Broker. This surprised me a lot, as I thought that more people would be using it.
My experience with SB is that it does it's job well, but is pretty tough to administer and it's hard to get an overview.
So, have you considered using the Service Broker? If not, why not? Did you go for MSMQ instead? Is there anything in SQL Server 2008 that would make you consider using the Service Broker.
I've been using SQL Service Broker since a couple of months after SQL 2005 was released. We use it non-stop here sending hundreds of thousands of messages through it per day.
We use it to load data from staging tables to production tables so that the service that loads the staging table doesn't have to wait for the data to actually process, it can go back and get more data to load.
We use it to queue the deletion of files from the file system. (When the row is deleted the file needs to be deleted as well.)
At prior companies I've used it to print loan documents and the checks that were sent out to the customers.
I even used Service Broker to do ETL from an OLTP database to an OLAP database for real time reporting.
Most people (especially DBAs) don't like Service Broker because there isn't any UI for it. If you want to use service broker or see what its doing you have to actually write and run some T/SQL.
I have been using SB in 2005 for about two years now with one implementation handling several hundred thousand messages a day. I would say the biggest challenge has been not so much in the architecture but understanding all the nuances involved. The documentation from Microsoft is poor with very few practical examples. Remus Rusanu's blogs have really been helpful in doing things like dialog reuse and activation stored procedure tuning. I have found it's REALLY important to reuse dialogs as much as possible (and working through all the associated locking involved with that) as well as handling multiple received messages as a set rather than one at a time.
Monitoring SB can be a pain. You basically depend on a bunch of system views to tell you what's going on. Orphaned messages are a pain. There's just a lot of little gotchas that can, well, getcha.
Aside from the problems, and there aren't THAT many, I think it has really worked out better than I expected it to. Since SB is integrated into the database, there's no separate message queues to back up outside the database. It's all transactionally consistent. Performance is good. It's a great solution.
I would use it again and will continue to use it.
At my current company, our usage of SB is somewhat different to that of the other posters. We use SB in SQL2005 mainly as a management tool. For example, we use it to manage updates to a small set of mutable tables that are present in a large number of otherwise immutable databases. All the messages are between services running on the same instance and the message volume is very low.
My experience with SB has been that it can be somewhat 'fiddly' to setup correctly and, as you mentioned in your question, it is hard to get an overview of the state of SB because there is not a single monitoring tool.
Nevertheless, we have found it hugely valuable as a way to automate a lot of database management tasks in a traceable and reliable way.
I have recently considered using Service Broker for a project, but yes, decided to go for MSMQ instead.
Our architecture consisted of a number of (clustered) servers, each needing to write information into a single instance of SQL reliably.
As I understand it, SB only works for SQL to SQL communication, so we would have needed an instance of SQL on each clustered box. We felt this was a bit unnecessary, hence using MSMQ
To be honest, i'm can't think of a scenario where I would use SB - I'm interested in knowing a bit more about your scenario, to see if I'm missing something vital.
Service Broker can be used in various cases where automation is required to be done in the distributed architecture.
Such applications receiving events from various devices and need processing to be done reliably. Where events from devices (detection) or sensors are used for processing the logic of automation. To do exchange of data between multiple database or applications.
I hope the implementation can be more secured and reliable with SB

Resources