multi-tenant database architecture [closed] - database

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I am building a SAAS application and we are discussing about one database per client vs shared databases. I have read a lot, incluisve some topics here at SO but I have still many doubts.
Our plataform should be highly customizable by each client. (they should be able to have custom tables and add custom fields to existing tables).
The multiple database aproach seems great in this case.
The problem is. should my "users" table be in the master database or in each client database?.
A user might have one or more organizations, so it would be present in multiple databases.
Also, what about generic tables like countries table, etc?
It makes sense to be in the master database. But I have many tables with a created_by field which have a foreign key to the user. Also have some permission related tables by client.
I would loose the power of foreign keys if multiple databases, which means more queries to the database. I know I can use cross-join between databases if they are in the same server but then i loose scalability. (I might need to have multiple database servers in future).
I have tought about federated tables. Not sure about performance.
The technologies I am using are php and symfony 2 framework and mysql for the database.
Also, I am afraid about the maintenance of such a system. We could create some scripts to automate the schema changes in all databases, but if we have 10k clients that would mean 10k databases.
What is your opiniion about this?
The main caracteristic of my app should be flexibility so if a client needs something more specific than the base plataform doesnt have, it should be possible to do it for him.

Some classic problems here. Have you ever been to http://highscalability.com/? Some good case studies there.
From personal experience if you try to share clients on one server, you will find that a very successful/active user will take up all the resources of the machine over time. We had one client in a SAAS that destroyed a shared server and we had to move him somewhere else.
I would rip out global enumerations into a service. You can make one central database for things like list of countries, list of states, etc. and put it behind a web service layer. Also in that database you can have user management/managing what server belongs to what user etc. You can make a management portal that reads/writes to this database for managing your user base.
If I was doing a SAAS again, I would start small and wait for the pain to hit. What you really want are good tools to address the scaling issues when they happen. Have some scripts ready to do rolling schema changes across servers (no way to avoid this once you have more than one server). Have scripts to take down machines while you are modifying the schema. Have scripts to migrate a user from a shared server to a dedicated one.
Consider setting up replication from a central database. This would pump down global information that each user partition/database would need without you having to write a lot of code.
But the biggest piece of advice I've seen - and experienced first hand - don't try too hard to build the next Facebook for scale. Start simple and see what actually happens before worrying about major scalability issues. You might be surprised as the user base grows what scales well and what does not.

Related

Strategies to building a database of 30m images [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
Summary
I am facing the task of building a searchable database of about 30 million images (of different sizes) associated with their metadata. I have no real experience with databases so far.
Requirements
There will be only a few users, the database will be almost read-only, (if things get written then by a controlled automatic process), downtime for maintenance should be no big issue. We will probably perform more or less complex queries on the metadata.
My Thoughts
My current idea is to save the images in a folder structure and build a relational database on the side that contains the metadata as well as links to the images themselves. I have read about document based databases. I am sure they are reliable, but probably the images would only be accessible through a database query, is that true? In that case I am worried that future users of the data might be faced with the problem of learning how to query the database before actually getting things done.
Question
What database could/should I use?
Storing big fields that are not used in queries outside the "lookup table" is recommended for certain database systems, so it does not seem unusual to store the 30m images in the file system.
As to "which database", that depends on the frameworks you intend to work with, how complicated your queries usually are, and what resources you have available.
I had some complicated queries run for minutes on MySQL that were done in seconds on PostgreSQL and vice versa. Didn't do the tests with SQL Server, which is the third RDBMS that I have readily available.
One thing I can tell you: Whatever you can do in the DB, do it in the DB. You won't even nearly get the same performance if you pull all the data from the database and then do the matching in the framework code.
A second thing I can tell you: Indexes, indexes, indexes!
It doesn't sound like the data is very relational so a non-relational DBMS like MongoDB might be the way to go. With any DBMS you will have to use queries to get information from it. However, if your worried about future users, you could put a software layer between the user and DB that makes querying easier.
Storing images in the filesystem and metadata in the DB is a much better idea than storing large Blobs in the DB (IMHO). I would also note that the filesystem performance will be better if you have many folders and subfolders rather than 30M images in one big folder (citation needed)

creating database for each new company [duplicate]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I am building a SAAS application and we are discussing about one database per client vs shared databases. I have read a lot, incluisve some topics here at SO but I have still many doubts.
Our plataform should be highly customizable by each client. (they should be able to have custom tables and add custom fields to existing tables).
The multiple database aproach seems great in this case.
The problem is. should my "users" table be in the master database or in each client database?.
A user might have one or more organizations, so it would be present in multiple databases.
Also, what about generic tables like countries table, etc?
It makes sense to be in the master database. But I have many tables with a created_by field which have a foreign key to the user. Also have some permission related tables by client.
I would loose the power of foreign keys if multiple databases, which means more queries to the database. I know I can use cross-join between databases if they are in the same server but then i loose scalability. (I might need to have multiple database servers in future).
I have tought about federated tables. Not sure about performance.
The technologies I am using are php and symfony 2 framework and mysql for the database.
Also, I am afraid about the maintenance of such a system. We could create some scripts to automate the schema changes in all databases, but if we have 10k clients that would mean 10k databases.
What is your opiniion about this?
The main caracteristic of my app should be flexibility so if a client needs something more specific than the base plataform doesnt have, it should be possible to do it for him.
Some classic problems here. Have you ever been to http://highscalability.com/? Some good case studies there.
From personal experience if you try to share clients on one server, you will find that a very successful/active user will take up all the resources of the machine over time. We had one client in a SAAS that destroyed a shared server and we had to move him somewhere else.
I would rip out global enumerations into a service. You can make one central database for things like list of countries, list of states, etc. and put it behind a web service layer. Also in that database you can have user management/managing what server belongs to what user etc. You can make a management portal that reads/writes to this database for managing your user base.
If I was doing a SAAS again, I would start small and wait for the pain to hit. What you really want are good tools to address the scaling issues when they happen. Have some scripts ready to do rolling schema changes across servers (no way to avoid this once you have more than one server). Have scripts to take down machines while you are modifying the schema. Have scripts to migrate a user from a shared server to a dedicated one.
Consider setting up replication from a central database. This would pump down global information that each user partition/database would need without you having to write a lot of code.
But the biggest piece of advice I've seen - and experienced first hand - don't try too hard to build the next Facebook for scale. Start simple and see what actually happens before worrying about major scalability issues. You might be surprised as the user base grows what scales well and what does not.

Database per application VS One big database for all applications [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I'm designing a few applications that will share 2 or 3 database tables and all of the other tables will be independent of each app. The shared databases contain mostly user information, and there might occur the case where other tables need to be shared, but that's my instinct speaking.
I'm leaning over the one database for all applications solution because I want to have referential integrity, and I won't have to keep the same information up to date in each of the databases, but I'm probably going to end with a database of 100+ tables where only groups of ten tables will have related information.
The database per application approach helps me keep everything more organized, but I don't know a way to keep the related tables in all databases up to date.
So, the basic question is: which of both approaches do you recommend?
Thanks,
Jorge Vargas.
Edit 1:
When I talk about not being able to have referential integrity, it's because there's no way to have foreign keys in tables when those tables are in different databases, and at least one of the tables per application will need a foreign key to one of the shared tables.
Edit 2:
Links to related questions:
SQL design around lack of cross-database foreign key references
Keeping referential integrity across multiple databases
How to salvage referential integrity with mutiple databases
Only the second one has an accepted answer. Still haven't decided what to do.
Answer:
I've decided to go with a database per application with cross-database references to a shared database, adding views to each database mimicking the tables in the shared database, and using NHibernate as my ORM. As the membership system I'll be using the asp.net one.
I'll also use triggers and logical deletes to try and keep to a minimum the number of ID's I'll have flying around livin' la vida loca without a parent. The development effort needed to keep databases synced is too much and the payoff is too little (as you all have pointed out). So, I'd rather fight my way through orphaned records.
Since using an ORM and Views was first suggested by svinto, he gets the correct answer.
Thanks to all for helping me out with this tough decision.
Neither way looks ideal
I think you should consider not making references in database layer for cross-application relations, and make them in application layer. That would allow you to split it to one database per app.
I'm working on one app with 100+ tables. I have them in one database, and are separated by prefixes - each table has prefix for module it belongs to. Then i have built a layer on top of database functions to use this custom groups. I'm also building data administrator, which takes advantage of this table groups and makes editing data very easy.
It depends and your options are a bit different depending on the database and frameworks you're using. I'd recommend using some sort of ORM and that way you don't need to bother that much. Anyways you could probably put each app in it's own schema in the database and then either reference the shared tables by schemaname.tablename or create views in each application schema that's just a SELECT * FROM schemaname.tablename and then code against that view.
There are no hard and fast rules to choose one over the other.
Multiple databases provide modularity. As far as sync-ing across multiple databases are concerned, one can use the concept of linked servers and views thereof and can gain the advantages of integrated database (unified access) as well.
Also, keeping multiple databases can help better management of security, data, backup & restore, replication, scaling out etc!
My 2cents.
THat does not sound like "a lot of applications" at all, but like "one application system with different executables". Naturally they can share one database. Make smart usage of Schemata to isolate the different funcational areas within one database.
One database for all application in my opinion .Data would be stored once no repitation.
With the other approach you would end up replicating and in my opinion when you start replicating it will bring its own headache and data would go out of sync too
The most appropriate approach from scalability and maintenence point of view would be to make the "shared/common" tables subset self-sufficient and put it to "commons" database, for all others have 1 db per application of per logical scope (business logic determined) and maintain this structure always
This will ease the planning and execution commissioning/decommissioning/relocation/maintenence procedures of your software (you will know exactly which two affected DBs (commons+app_specific) are involved if you know which app you are going to touch and vice versa.
At our business, we went with a separate database per application, with cross database references for the small amount of shared information and an occasional linked server. This has worked pretty well with a development, staging, build and production environments.
For users, our entire user base is on windows. We use Active Directory to manage the users with application references to groups, so that the apps don't have to manage users, which is nice. We did not centralize the group management, that is each application has tables for groups and security which is not so nice but works.
I would recommend, that if your applications are really different, to have a database per application. Looking back, the central shared database for users sounds workable as well.
You can use triggers for cross database referential integrity:
Create a linked server to the server that holds the database that you want to reference. Then use 4-part naming to reference the table in the remote database that holds the reference data. Then put this in the insert and update triggers on the table.
EXAMPLE(assumes single row inserts and updates):
DECLARE #ref (datatype appropriate to your field)
SELECT #ref = refField FROM inserted
IF NOT EXISTS (SELECT *
FROM referenceserver.refDB.dbo.refTable
WHERE refField = #ref)
BEGIN
RAISERROR(...)
ROLLBACK TRAN
END
To do multi row inserts and updates you can join the tables on the reference field but it can be very slow.
I think the answer to this question depends entirely on your non functional requirements. If you are designing a application that will one day need to be deployed across 100's of nodes then you need to design your database so that if need be it could be horizontally scaled. If on the other hand this application is to be used by a hand full of users and may have a short shelf life then you approach will be different. I have recently listened to a pod cast of how EBAY's architecture is set-up, http://www.se-radio.net/podcast/2008-09/episode-109-ebay039s-architecture-principles-randy-shoup, and they have a database per application stream and they use sharding to split tables across physical nodes. Now their non-functional requirements are that the system is available 24/7, is fast, can support thousands of users and that is does not lose any important data. EBAY make millions of pounds and so can support the effort that this takes to develop and maintain.
Anyway this does not answer your question:) my personnel opinion would be to make sure your non-functional requirements have been documented and signed off by someone. That way you can decide on the best Architecture. I would be tempted to have each application using its own database and a central database for shared data. And I would try to minimise the dependencies between them, which I'm sure is not easy or you would have done it:), but I would also try to steer clear of having to produce some sort of middle ware software to keep tables in sync as this could create a headaches for you.
At the end of the day you need to get your system up and running and the guys with the pointy hair wont give a monkeys chuff about how cool your design is.
We went for splitting the database down, and having one common database for all the shared tables. Due to them all being on the save SQL Server instance it didn't affect the cost of running queries across multiple database.
The key in replication for us was that whole server was on a Virtual Machine (VM), so for replication to create Dev/Test environments, IT Support would just create a copy of that image and restore additional copies when required.

How to Design a SaaS Database [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
Improve this question
I have a web app that I built for a trucking company that I would like to offer as SaaS. What is the best way to design the database?
Should I create a new database for each company? Or should I use one database with tables that have a prefix of the company name? Or should I Use one database with one of each table and just add a company id field to the tables? Or is there some other way to do it?
faced with a similar situation about 10 years ago, we opted for a database per client. we have hundreds (not thousands) of clients. looking back it was one of the best decisions we made. backups are easy. copying a single client to our office for analysis is easy (just take the last backup). scaling is easy (moving a single big client to a different server can free up resources on a stressed sql server). joel & jeff had a discussion about this on a stack overflow podcast (not a recent one) and joel did the same thing i do ... each client gets their own database. database purists will often argue for lumping everyone into one db, but i would never do that.
-don
Should I create a new database for each company?
Yes - Don Dickinson was on the money. However, see a refinement below.
Or should I use one database with tables that have a prefix of the
company name?
Lord no! Changing your database queries for different for client would make you go insane! Also, you'd almost certainly run dynamic SQL (where the table name is changed in code before running the query), which would harm performance as most servers like to cache query plans and interim results - this doesn't work if the table names keep changing.
Or should I Use one database with one
of each table and just add a company
id field to the tables?
You might want to do this if you want to have some kind of scalable model for your customers. Whilst provisioning a new database for each customer gives you lots of flexibility, it also involves costs and complexity. You have to create a new backup schedule, have a lifecycle model for dealing with expired customers etc.
So, you might say that "free trial" and "bronze" customers are all lumped into a single database, using the company id to separate them out; "silver" users get their own database (but you still keep the customer_id field in the schema, so you don't have to change queries between two levels of customer), and "gold" customers get their own database server.
I did something similar a few years ago at a SaaS company - and customers are typically happy to have an upgrade path on infrastructure (read: performance and resilience) as well as features.
We have some databases here with shared clients and some where each client has it's own server and own database. The ones where the client is on it's own server are the easiest to manage and the least likely to cause a problem when some developer forgot to add the clientid and sent client a's data to client b by accident (an example NOT chosen at random).
Keeping each on it's own server or server instance allows us to keep the database structure the same with the same names and makes it easier to propagate changes to all the servers because we don't have to change the database name.
If you do use separate instances for each client, make sure you design and implement a good system for propagating all changes to all clients. If these databases get out of sync, they can become horrible to maintain. You'll find that if you let them get out of sync, each client will ask for changes and you will have 27 ways to do the same thing. You have to generalize when they are on the same database, when they are separate you have to use self discipline to ensure new functionality is the same for each client.
It depends, here, i work in a company that has many "Internal Business units" treated like other companies.
So, some reports must include all companies, Client accounts must also be shared across companies. Here we have a CompanyId Field in the tables that requires it.
The Prefix solution is surely one to be avoided.

Is anyone using the Service Broker in SQL Server? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
When I attended a presentation of SQL Server 2008 at Microsoft, they did a quick gallup to see what features we were using. It turned out that in the entire lecture hall, my company was the only one using the Service Broker. This surprised me a lot, as I thought that more people would be using it.
My experience with SB is that it does it's job well, but is pretty tough to administer and it's hard to get an overview.
So, have you considered using the Service Broker? If not, why not? Did you go for MSMQ instead? Is there anything in SQL Server 2008 that would make you consider using the Service Broker.
I've been using SQL Service Broker since a couple of months after SQL 2005 was released. We use it non-stop here sending hundreds of thousands of messages through it per day.
We use it to load data from staging tables to production tables so that the service that loads the staging table doesn't have to wait for the data to actually process, it can go back and get more data to load.
We use it to queue the deletion of files from the file system. (When the row is deleted the file needs to be deleted as well.)
At prior companies I've used it to print loan documents and the checks that were sent out to the customers.
I even used Service Broker to do ETL from an OLTP database to an OLAP database for real time reporting.
Most people (especially DBAs) don't like Service Broker because there isn't any UI for it. If you want to use service broker or see what its doing you have to actually write and run some T/SQL.
I have been using SB in 2005 for about two years now with one implementation handling several hundred thousand messages a day. I would say the biggest challenge has been not so much in the architecture but understanding all the nuances involved. The documentation from Microsoft is poor with very few practical examples. Remus Rusanu's blogs have really been helpful in doing things like dialog reuse and activation stored procedure tuning. I have found it's REALLY important to reuse dialogs as much as possible (and working through all the associated locking involved with that) as well as handling multiple received messages as a set rather than one at a time.
Monitoring SB can be a pain. You basically depend on a bunch of system views to tell you what's going on. Orphaned messages are a pain. There's just a lot of little gotchas that can, well, getcha.
Aside from the problems, and there aren't THAT many, I think it has really worked out better than I expected it to. Since SB is integrated into the database, there's no separate message queues to back up outside the database. It's all transactionally consistent. Performance is good. It's a great solution.
I would use it again and will continue to use it.
At my current company, our usage of SB is somewhat different to that of the other posters. We use SB in SQL2005 mainly as a management tool. For example, we use it to manage updates to a small set of mutable tables that are present in a large number of otherwise immutable databases. All the messages are between services running on the same instance and the message volume is very low.
My experience with SB has been that it can be somewhat 'fiddly' to setup correctly and, as you mentioned in your question, it is hard to get an overview of the state of SB because there is not a single monitoring tool.
Nevertheless, we have found it hugely valuable as a way to automate a lot of database management tasks in a traceable and reliable way.
I have recently considered using Service Broker for a project, but yes, decided to go for MSMQ instead.
Our architecture consisted of a number of (clustered) servers, each needing to write information into a single instance of SQL reliably.
As I understand it, SB only works for SQL to SQL communication, so we would have needed an instance of SQL on each clustered box. We felt this was a bit unnecessary, hence using MSMQ
To be honest, i'm can't think of a scenario where I would use SB - I'm interested in knowing a bit more about your scenario, to see if I'm missing something vital.
Service Broker can be used in various cases where automation is required to be done in the distributed architecture.
Such applications receiving events from various devices and need processing to be done reliably. Where events from devices (detection) or sensors are used for processing the logic of automation. To do exchange of data between multiple database or applications.
I hope the implementation can be more secured and reliable with SB

Resources