Designing databases and applications for hosted / cloud solutions - sql-server

Are there any resources available that can guide someone on how to 'think' about the various components of a hosted / cloud solution before going ahead and starting to make a hosted application? If that made no sense, what I mean to ask is are there any guidance books/websites on what things need to be considered when making a cloud application?
I am attempting to make a hosted CRM-style software application that will serve many hundreds of customers. The application is powered by a SQL server database with many tables and a ColdFusion, HTML5, CSS, Javascript front-end. If I was installing this application and its components at each client site, then each installation is unique to that customer. But somehow I have to replicate this uniqueness in the cloud which is baffling me.
Only two things have come to mind so far:
The need for a unique database per customer in SQL server
The need to change DB connection strings per customer in the web application
My thought process has come to a block when I am trying to envisage how to design the application to serve so many different customers. Even though the application that all customers use will is the same (same DB tables, same front-end), the data that they store and retrieve will be specific to them. So I was thinking that surely each customer needs a separate database creating for them? Is it feasible to create a replica database for each customer? If I need to update some tables or add a new table, how would I do this for hundreds of different databases?
From the front-end I guess each unique customer log-in would change DB connection strings so that they can only access their database. Other than this I can't think of anything else that needs to change per customer basis.
When a new customer wants to sign up, it needs to be clear to me what I need to create for them to have access to the application. I guess this is ultimately what I need to think of but I'm stuck.
If anyone can suggest some things to think of or if there is a book or website on this kind of thing that someone could point me to I'd really be very thankful.
EDIT:
I was looking at an article about Salesforce.com and it says
"In order to ensure privacy of data for each user and give an effect of each having their own database, the data from different users are securely isolated from one another."
Anyone know how this is achieved or how it may be done?

Found some great information here. It is called multi-tenant database design and seems to be a common topic. Once I get the database designed then the application can sit nicely on top.
https://dba.stackexchange.com/questions/1043/what-problems-will-i-get-creating-a-database-per-customer

Related

Database sharding on Heroku

At some point in the next few months our app will be at the size where we need to shard our DB. We are using Heroku for hosting, Node.js/PostgreSQL stack.
Conceptually, it makes sense for our app to have each logical shard represent one user and all data associated with that user (each user of our app generates a lot of data, and there are no interactions between users). We need to retain the ability for the user to do complex ad-hoc querying on their data. I have read many articles such as this one which talk about sharding: http://www.craigkerstiens.com/2012/11/30/sharding-your-database/
Conceptually, I understand how Sharding works. However in practice I have no idea how to go about implementing this on Heroku, in terms of what code I need to write and what parts of my application I need to modify. A link to a tutorial or some pointers would be much appreciated.
Here are some resources I have already looked at:
http://www.craigkerstiens.com/2012/11/30/sharding-your-database/
MySQL sharding approaches?
Heroku takes care of multiple database servers?
http://petrohi.me/post/30848036722/scaling-out-postgres-partitioning
http://adam.heroku.com/past/2009/7/6/sql_databases_dont_scale/
https://devcenter.heroku.com/articles/heroku-postgres-follower-databases
Why do people use Heroku when AWS is present? What distinguishes Heroku from AWS?
As the author of the first article happy to chime in further. When it comes to sharding one of the very key components is what key are you sharding on. The complexity of sharding really comes into play when you have data that is intermingled across different physical nodes. If you're something like a multi-tenant app then modeling all your data around this idea of a tenant or customer can fit very cleanly in this setup. In that case you're going to want to break up all tables that are related to customer and shard them the same way as other tenant related tables.
As for doing this on Heroku, there are two options. You can roll your own with Heroku Postgres and application logic, or using something like Citus (which is an add-on that helps manage more of this for you.
For rolling your own, you'll first create the various application logic to handle creating all your shards and knowing where to route the appropriate queries to. For Rails there are some gems to help wtih this like activerecord-multi-tenant or apartment. When it comes to actually moving to sharding and that migration, what you'll want to do is create a Heroku follower to start. During the migration you'll have it start un-following. Then you'll remove half of the data from the original primary and the other half from the follower you separated accordingly.
I am not sure I would call this "sharding."
In LedgerSMB here is how we do things. Each company (business entity) is a separate database with fully separate data. Data cannot be shared between companies. One postgreSQL cluster can run any number of company databases. We have an administrative interface that creates the database and loads the schema. The administrative interface can also create new users, which can be shared between companies (optionally). I don't know quite how well it would work to share users between dbs on Heroku but I am including that detail in terms of how we work with PostgreSQL.
So this is a viable approach.
What you really need is something to spin up databases and manage users in an automated way. From there you can require that the user specifies a company name that you can map to a database however you'd like (this mapping could be stored in another database for example).
I know this is fairly high level. It should get you started however.

Application database/instance decomposintion

I'm designing a service that will serve some business entites. Logically it will be divided into two parts:
Frontend - bells and whistels like Wiki, Pricing, Landing Page, maybe account information (billing, account status, and so on).
Service itself, where business entity's empoyers will do theirs work.
It is play 2.x framework, planning to host in heroku.
It is not clear for now how to decompose intstances and DB stuff.
Should I decompose DB for clients: business entity - one database? Or should I store all data in one database, but add for all tables id of business entity that ownes some row? What issues (performance, administrative, scaling) may come up with this decision?
If I will choose to divide databases, how can I do this? For that I need to launch app instance with DB for client that instance belongs to. Thus we have non-uniform instances that can be obstacle for scaling. And as I know, heroku doesn't support non-uniform (web)instances.
Please help, i'm totally stucked here.
Expected stack:
Scala
Play 2.0
Anorm
JDBC
PostgresSQL
Heroku
All (except Scala, and may be Play 2.0) of this are interchangeable.
This is a pretty classic problem. You have many clients and you wonder if you should create separate databases for each client - or if they should share a database.
I would recommend starting with one shared database and then use that until you out grow it. Think of some of the disadvantages to having each client with their own database instance:
Like you mention the schema management can be tough. You'd need to write tools to maintain all databases across all servers.
If you tell clients you have structured your system this way, some of them might push you to fork the database. In other words they might argue, "I have my own database! I want a new table just for me."
It's a bit harder to run queries across servers/databases. If you wanted to count how many items all clients have, you'd have to think about that a bit.
But if you want to start by sharding based on client (http://en.wikipedia.org/wiki/Shard_(database_architecture)), you might consider:
As mentioned previously, you'll need some tools/scripts to launch a new database instance for a client. Often those tools will need to "seed" the database with configuration information - like populating a states table for addresses.
You'll want to have a tool to monitor/maintain the databases. Start one, stop another, see if one has high CPU usage etc.
You'll need some kind of system to aggregate statistics across all clients.
You'll need a tool to roll out schema changes and a plan on how you can gracefully upgrade the database while their web application is running.
Overall I would advise to start small and simple and only start worrying about scale when you get there.

Building a web application with multiple database instances or just a single instance

I am currently designing a web application where I will have customers signing up as companies. Each company will have its own set of users. As I am designing this I am wondering which approach would work best. I see sites like fogbugz or basecamp which use subdomains. In cases with subdomains do you have a database instance per sub domain? I'm wondering if it is recommended to have a database instance per company or if I should have some kind of company table and manage the company and user data/credentials all from one database.
Which approach is best? Is there literature on this subject (i.e. any web or book)?
thanks in advance!
You have to weigh up your options, as some of this will be a matter of opinion and might not be feasible for your implementation.
That being said, I'd consider the single database approach, for these reasons:
Maintenance: when running a database per registered 'client', you will very easily reach a situation where any changes or upgrades you make to your app's schema have to be applied to every single database instance. This will get ridiculous, fast.
Convenience: You might want analytics and usage stats, or some way to administrate all these databases. Querying a single database is comparatively trivial to trying to aggregate the same query for all your databases. This isn't going to scale.
Scalability *: As mentioned in 2, you're going to require a special sort of aggregation to query things about your clients, and your app as a whole. The bigger your app gets, the more complex your querying. The other issue is, if one client uses the app a lot more than another, what will you be encouraged to optimise? Your app, the bigger client's database, or the smaller client's? Not forgetting anything you do change has to be copied to all databases.
Backups: You can backup one database easily, just by creating a dump and stashing it somewhere. Get a thousand clients and now you have to run 1000 database dumps, and name them well enough to be able to identify them if one single database corrupts. How will you even know if this happens? Database errors will be localised to that specific one, as opposed to your entire app.
UI: A user signs up or is invited to use your app, and belongs to one particular client. Are you going to save that user account to the client's database? If so, see scalability for the issue of working with that data when the user wants to change their password, or you want to email them. So, do you tell the user to let you know which database they're in so you can find them?
Simplification: You have a database per client and want to just use a single one. How do you merge them all together without significantly breaking things? There'll be primary key conflicts if you use auto incremented IDs; bookmarked URLs will break if you decide to just regenerate the keys; foreign keys across tables will no longer point to the right records. Your data integrity will go down the pan.
You mention 'white label' services that offer their product through custom subdomains. I'm not privy to how these work, but the subdomain is only a basic CNAME or A record in their DNS zonefile. The process of adding these can be automated, and the design of the application and a bit of server configuration can deal with linking these subdomains to the correct accounts and data. They're just URLs, so maybe on the backend, the app doesn't differentiate between:
http://client.example.com
http://example.com/client
Overall though, you may decide that all these problems are things you can and would prefer to deal with. Be warned, however, that by doing so you may be shooting yourself in the foot, and you can gain a lot more from crafting a well-designed single database schema and a well-abstracted front-end.
*#xQbert mentions the very real benefit of scalability with multiple databases. I've amended this answer to clarify that I was more concerned with other aspects.

Subscription website architecture questions + SQL Server & .NET

I have a few questions about the architecture of a subscription service I am about to embark on and I am looking for some feedback on how best to set it up.
I won’t have a large amount of customers as Basecamp, maybe a few hundred and was wondering what would be a solid architecture for setting up the customer sites. I’m running SQL Server and .NET on a dedicated machine. Should create a new database for each customer as to have control and isolation of data or keep them all in one database?
I am also thinking of creating a sub-domain for each customer as well so modifications can be made to each site as needed. The customer URLs would look like this:
https://customer1.foobar.com
https://customer2.foobar.com
I am going to have the ability to ‘plug-in’ reports that will be uploaded to the site so each customer can customize as needed. Off the top of my head this necessitates having each sub domain on its own code-base for the uploading of these reports.
So on the main site the customer would sign up for their new subscription and I would programmatically create a new directory for the customer from the main code base and then create a sub domain pointing to the new directory for the customer and then finally their database.
Does this sound about right? Am I on the right track? How do other such sites accomplish the same thing?
Thanks for letting me bend your ear for a bit on this.
From a maintenance perspective, having a virtual directory for each customer scares me. Having done something similar, I would create separate domain pointers as you are intimating. Then you can check the referral headers to see what should be displayed. I would probably create one main site template and dynamically brand it for each customer. You can still create separate folders for customer specific reports or if you really need custom pages unique to that customer. I just wouldn't make each their own site.
The advantage of separate sites (including databases) is that the fate on one client isn't bound to all others. It'd be easier to upgrade (trial) to a sub-set before deploying to everyone else. The big issue here, as Scot points out, is time. You'd want to have things as automated as possible (and well tested), etc. It's also easy when a client leaves. You can always just back-up their database and send it to them (for example).
Auto-provisioning new sites and databases isn't easy, and the account that does that will need plenty of privileges - so your security testing will need to be better than usual.
A multi-tenancy approach is good for minimizing your time but you do have to be careful, you don't want customers data getting mixed up.
One approach that will work, within the one app (and database), is to make use of HttpHandlers (MVC framework, perhaps) so that some sort of client identifier is in part of the URL - but the folder doesn't have to physically exist (or virtually in the IIS sense). That way you don't have to worry about getting folder permission correct; but you do have to be careful about correctly identifying clients, their ids, and making sure clients can't make calls that use an id that isn't theirs.
https://www.foobar.com/[clientid]/subscriptions
The advantage of this is it's relatively straight forward: everything is in the application, and you don't have to worry about adding new DNS records, setting directory and/or database permissions, etc.

Software as a service - Database

If I am building a CRM web application to sell as a membership service, what is the best method to design and deploy the database?
Do I have 1 database that houses 100s of records per table or deploy multiple databases for different clients?
Is it really an issue to use a single database since I believe sites like Flickr use them?
Multiple clients is called "multi-tenant". See for example this article "Multi-Tenant Data Architecture" from Microsoft.
In a situation like a CRM system, you will probably need to have separate instances of your database for each customer.
I say this because if you'd like larger clients, most companies have security policies in place regarding customer data. If you store their customer data in the same database as another customer, you're running the risk of exposing one companies confidential data to another company (a competitor, etc.).
Sites like Flickr don't have to worry about this as much since the majority of us out on the Interwebs don't have such strict policies regarding our personal data.
Long term it is easiest to maintain one database with multiple clients' data in it. Think about deployment, backup, etc. However, this doesn't keep you from having several instances of this database, each containing a subset of the full client dataset. I'd recommend to grow the number of databases after you have established the usefulness/desirability of your product. Having complex infrastructure is not necessary if you have no traffic....
So, I'd just put a client id in the relevant tables and smile when client 4 comes in and the extent of your new deployment is one insert statement.

Resources