SQL Server - robust protection of client data (multi-tenancy)


We are considering using a single SQL Server database to store data for multiple clients. We feel having all the data in one database could make things more manageable than a "separate db per client" setup.
The biggest concern we have is accidental access to the wrong client's data. It would be very, very bad if we were ever to show one client's data to another client by accident. We run lots of queries, and we are afraid of a scenario where someone says "write me a query of this and this to go show the client for the meeting in 15 minutes." If someone is careless and omits the WHERE clause that filters for the correct client, we would be in serious trouble. Is there a robust setup or design pattern for SQL Server that makes it impossible (or at least very difficult) to accidentally pull the wrong client's data from a single "global" database?
To be clear, this is NOT a database that the clients use directly or via apps (yet). We are talking about a database accessed by several of our programmers and we are afraid of screwing up ourselves.

At the very minimum, you should put the client data in separate schemas. In SQL Server, schemas are the unit of authorization. Only people authorized for a given client should be able to see that client's data. In addition to other protections, you should be using the built-in authorization capabilities of the database.
Right now, it sounds like you are in a situation where a very small group of people are the ones accessing all the data for everyone. Well, if you are successful, then you will probably need more people in the future. In fact, you might be giving some clients direct access to the data. If it is their data, they will want apps running on it.
My best advice, if you are planning on growing, is to place each client's data in a separate database. I would architect the system so this database can be on a remote server. If it needs to synchronize with common data, then develop a replication strategy for moving that data around.
You may think it is bad to have one client see another client's data. From the business perspective, this is deadly -- like "company goes out of business, no job" deadly. Your clients are probably more concerned about such confidentiality than you are. And, an architecture that ensures protection will make them more comfortable.

Multi-Tenant Data Architecture
http://msdn.microsoft.com/en-us/library/aa479086.aspx
Here's what we do (in MySQL, unfortunately):
A "tenant" column in each table.
Tables are in one schema [1].
Views are in another schema (for easier security and naming). A view must not include the tenant column; it does a WHERE on the tenant based on the current user.
The tenant value is set by a trigger on insert, based on the user.
Assuming that all your DDL is in .sql files under source control (which it should be), then having many databases or schemas is not so tough.
[1] a schema in mysql is called a 'database'
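
A minimal sketch of the view-plus-trigger pattern described above, written in Python against an in-memory SQLite database so it can be run as-is. It is not the poster's MySQL setup: the names orders_base, orders, active_tenant and the helper functions are hypothetical, and the one-row active_tenant table is only a stand-in for "the current user", which SQLite does not have.

```python
import sqlite3

# Autocommit mode keeps the example free of explicit transaction handling.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    -- Base table carries the tenant column; nobody should query it directly.
    CREATE TABLE orders_base (
        id     INTEGER PRIMARY KEY,
        tenant TEXT NOT NULL,
        item   TEXT NOT NULL
    );

    -- Stand-in for "current user": a one-row table naming the active tenant.
    CREATE TABLE active_tenant (tenant TEXT NOT NULL);

    -- The view hides the tenant column and filters on the active tenant.
    CREATE VIEW orders AS
        SELECT id, item FROM orders_base
        WHERE tenant = (SELECT tenant FROM active_tenant);

    -- Inserts through the view get the tenant stamped by a trigger.
    CREATE TRIGGER orders_insert INSTEAD OF INSERT ON orders
    BEGIN
        INSERT INTO orders_base (tenant, item)
        VALUES ((SELECT tenant FROM active_tenant), NEW.item);
    END;
""")


def switch_tenant(conn, tenant):
    """Simulate logging in as a user that belongs to the given tenant."""
    conn.execute("DELETE FROM active_tenant")
    conn.execute("INSERT INTO active_tenant (tenant) VALUES (?)", (tenant,))


def visible_items(conn):
    """Return the items the active tenant can see through the view."""
    return [row[0] for row in conn.execute("SELECT item FROM orders ORDER BY id")]


switch_tenant(conn, "acme")
conn.execute("INSERT INTO orders (item) VALUES ('anvil')")
switch_tenant(conn, "globex")
conn.execute("INSERT INTO orders (item) VALUES ('laser')")
```

A query against the orders view has no WHERE clause to forget, which is the point of the pattern. In a real server the base tables would additionally be locked down with permissions, which this sketch does not show.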

You could set up one inline table-valued function (TVF) for each table that takes a required parameter @customerID and filters that particular table down to the data of this customer. If the entire app were to use only these TVFs, the app would be safe by construction.
There might be some performance implications. The exact numbers depend on the schema and queries. They can be zero, however, as inline TVFs are inlined and optimized together with the rest of the query.
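
To illustrate the idea of an accessor that cannot be called without a customer ID, here is a rough Python analogue using SQLite. It is not a SQL Server inline TVF; the orders table and the orders_for_customer function are made-up names for the sketch.

```python
import sqlite3


def orders_for_customer(conn, customer_id):
    """Return only the given customer's orders; customer_id is mandatory."""
    if customer_id is None:
        raise ValueError("customer_id is required")
    cur = conn.execute(
        "SELECT id, item FROM orders WHERE customer_id = ? ORDER BY id",
        (customer_id,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, item TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(1, "anvil"), (2, "laser"), (1, "rope")],
)

rows_for_customer_1 = orders_for_customer(conn, 1)
```

The filter lives in exactly one place, so a hurried ad hoc query that goes through the function cannot omit it.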

You can limit access to the data so that it goes only through stored procedures with a mandatory customer ID parameter.
If you allow your IT staff to build views, sooner or later someone will forget the WHERE clause, as you said.
But a schema per client with already prefiltered views would enable self-service and bring extra value, I guess.

Related

Is it possible to sanitize a Datomic database, given that it is a 'value' and does not 'update-in-place'?

I am exploring building on top of Datomic. I am sold on the principle of the database 'as a value'. The thing is, we need to be able to provide sanitized copies of the database to our developers to run locally. Any sensitive data which we are required to keep on the correct side of the firewall must not leak out.
With a standard SQL database this is easy: we just have a service inside the firewall which takes a snapshot of the DB and runs some script against it to update-in-place the sensitive value such that my.secret.email@address.com > email00123@address.com etc. Then the sanitised DB is made available to the developer to lift out of the compliance zone.
However, my understanding of Datomic (and its very strength) is that nothing is ever updated in place. So how would it be possible to sanitise a Datomic DB? Thanks.

This is one use case for filtering databases in Datomic. Filtering for security reasons is also discussed in this talk by Nubank.
The operational model is a bit different from the SQL world because user access and authorization and authentication, etc. are not baked into the database to the same degree. Any peer participates in the database fully and can submit transactions, etc., request an unfiltered database, etc. as API calls. You need an additional application layer (i.e. to create a client for your peer-as-a-server and only expose endpoints for queries against the filtered database) if you want stronger security guarantees.

Store database in sql server client wise

I have developed an application in which I have created a different login for every client. Our application has many clients, like a job portal or Facebook, and every client has a huge amount of data. If I use a single database, each table ends up holding a huge amount of data for all clients.
One solution I found is to create a separate database for every client, but with so many clients we would need to create a great many databases, so that does not seem like the correct solution.
Can you please tell me the right way to implement this using SQL Server 2008 R2?
Thanks

You could try having one schema per client, where that client's logon has that schema as its default and it is the only schema the logon has access to. However, you'll have a lot of schemas, so it may not be much help! (Also, if you're using something like EF to access the db, it won't work.)
Single database good:
Easy management
Single database bad:
Possible performance problems (although not until you get into billions of rows; one DB I designed had a table with more than 21B rows after 3 months; lucky I made the IDENTITY column a BigInt!)
Security issues/complexity: how do you stop one client accessing another's data?
Single point of failure for all clients
Multiple database good:
Security is easier
Single point of failure per client (assuming multiple DB Servers to spread that load also)
More flexibility in applying updates: some clients are OK with Wednesday, some with Thursday
Multiple db bad:
More management required
Given a DB has overhead, your overhead resource usage goes up
I'm sure that there are other issues as well. Really it's up to your requirements and how they can best be met.

why are multiple DBs actually needed?

I was looking at godaddy.com, which says they offer up to 10 MySQL DBs, but I don't know why you would ever need more than 1, since a DB can have multiple tables. Can't multiple DBs be integrated into a single DB? Is there an example case where it's better to have multiple ones, or infeasible not to? And how do you differentiate between them when you want to call them: by their directory or by a name?
Best,

I guess separation of concerns would be the most obvious answer. In the same way that you can have all of your functionality in one humongous class in object-oriented programming, you can put everything in one database, but it's a good idea to keep unrelated information separate. It's easier to wrap your head around smaller chunks of data, and otherwise future developers might start to think tables are related and aggregate data in a way it was never meant to be.

Imagine that you're doing two different projects with two different teams. Maybe you don't want one team to access the other team's tables.
There can also be a space limit on each database, and each one can be configured with specific parameters to optimize performance.
On the other hand, two end users can each be assigned to back up one entire database, and you don't want one user to back up the other DB, because he could restore that database somewhere else and access its data.

I'm sure there are some pretty good DBAs on the forum who can answer this in detail.
Storing tables in different databases makes sense because you are able to back them up individually. Furthermore, you will be able to control access to each database under different NT groups (e.g. Admin vs. users). Although this can be done at the individual table level, sometimes it makes sense to grant or deny access to an entire database to a particular group.
When you need to call them in SQL Server you need to append the database name to the query like this SELECT * FROM [MyDatabase].[dbo].[MyTable].
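
As a runnable illustration of qualifying a table name with its database, here is a small Python sketch using SQLite, where ATTACH plays the role of a second database on the same server. SQLite uses two-part names (database.table) rather than SQL Server's three-part [database].[schema].[table]; the reporting database and both tables are invented for the example.

```python
import sqlite3

# Autocommit mode, because ATTACH cannot run inside an open transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)

# "main" is the default database; attach a second one under the name "reporting".
conn.execute("ATTACH DATABASE ':memory:' AS reporting")

conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE reporting.sales (customer_id INTEGER, amount INTEGER)")

conn.execute("INSERT INTO main.customers VALUES (1, 'Acme')")
conn.executemany("INSERT INTO reporting.sales VALUES (?, ?)", [(1, 40), (1, 2)])

# The database-name prefix tells the engine which database each table lives in.
rows = conn.execute(
    """
    SELECT c.name, SUM(s.amount)
    FROM main.customers AS c
    JOIN reporting.sales AS s ON s.customer_id = c.id
    GROUP BY c.name
    """
).fetchall()
```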

One other reason to use separate databases relates to whether you need full transactional recovery or not. For instance, if I have a bunch of tables that are populated on a schedule through import processes and never by the users, putting them in a separate database allows me to set the recovery model to simple, which reduces the logging (a good thing when you are loading millions of records at once). I can also skip the transaction log backup every fifteen minutes that I do for the database with the user-inserted data. It could also make recovery a faster process when needed, as the databases would be smaller and thus individually take less time to recover. It won't help much when the whole server crashes, but it could help a lot if only one database gets corrupted for some reason. If the data relates to different applications, having the data in separate databases simplifies the security as well. And of course sometimes we have commercial databases that we can't add tables to, and so we may need a separate database to handle some things we want to add to that data (we do this, for instance, with our Project Management software: we have a separate database where we extract and summarize data from the PM system for reporting, and then write all our custom reports off that).

User Table in Separate DB

Note: I have no intention of implementing this, it's more of a thought experiment.
Suppose I had multiple services available through a web interface. At least two of which required user registration and some data in a database. A single registration would grant access to all services. Like Google (GMail, Google Docs, etc.).
Would all of these services, which are related to registered users, be located within a single database, perhaps with table-prefixes for what service they were for?
Or would each service have its own database? The only plus I can see to doing this is that it would make table names cleaner. Any time any user interaction was needed, at least two different databases would have to be involved, which would needlessly complicate SQL queries.
Would this suggest that the 'big boys' use only a single database, and load it with tons of different (and perhaps completely unrelated) tables?

If you use the right DBMS, you can have the best of both strategies. In PostgreSQL, within a 'database' you can have separate schemas. The authentication service would access a single schema and provide the other services a key which is used as a reference for data in the other schemas. You can still deal with the entire database as a single entity i.e:
query across schemas without using dblink
store personally identifiable information separately (schemas can have separate per-user permissions to further protect data)
DBMS-managed foreign key constraints (I believe)
consistent (re the data) backup and restore
You get these advantages at the cost of a more complex DAL (may not be supported by your favorite DAL framework) and less portability between DBMS's.

I do not think it is a good idea to make multiple services dependent on a single database. If you need to restore some service from a backup, you'll have to restore all.
You would probably be overloading the database server too.
I would do that only if it is likely they will share much data at some future point.
Also you might consider smaller database with only the shared user data.

I would consider having 1 user / role repository with a separate database for services.

I've never done this, but I think it would depend on performance. If there's almost no overhead to do separate databases, that might be the answer. Doing separate DBs may also make it easy to split DBs across machines.
Complexity is also an issue. Hopefully your schema would be defined in such a way that you wouldn't need to dip into several different databases for different queries.

There's always a problem with potentially overloading databases and access thereof; replication is one potential good solution.

There are several strategies.
When you move to multiple databases (or multiple servers), things get more complex. Your core user information could be in a single database. The individual services could be in other databases. The problem with that is that the database is the outer unit of referential integrity, so you cannot design in foreign keys across databases. One way around this is to distribute changes to the core master tables (additions and updates only, obviously, since deletions would be forbidden due to a foreign-key constraint) to separate databases on a regular basis, and then enforce RI against these copies of the core master database tables within the service databases. This also means that the service databases and their services can run while the other databases are down for maintenance. Obviously this is an increased architectural complexity for an improvement to your service windows and reduced coupling.
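
A minimal sketch of that "distribute the master tables, enforce RI against the copy" idea, using two independent in-memory SQLite databases in Python. Everything here is hypothetical (the users, users_copy and documents tables, and the sync_users function), and a production version would sync only changed rows rather than scanning the whole table.

```python
import sqlite3

# Core database: owns the master user table.
core = sqlite3.connect(":memory:")
core.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Service database: holds a copy of the master table and enforces RI against it.
service = sqlite3.connect(":memory:")
service.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
service.executescript("""
    CREATE TABLE users_copy (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE documents (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users_copy (id),
        title   TEXT NOT NULL
    );
""")


def sync_users(core, service):
    """Push additions and updates (never deletions) to the service copy."""
    for user_id, name in core.execute("SELECT id, name FROM users"):
        cur = service.execute(
            "UPDATE users_copy SET name = ? WHERE id = ?", (name, user_id)
        )
        if cur.rowcount == 0:  # no existing row, so this is an addition
            service.execute(
                "INSERT INTO users_copy (id, name) VALUES (?, ?)", (user_id, name)
            )
    service.commit()


core.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
core.commit()
sync_users(core, service)

# The service can now reference user 1 without touching the core database.
service.execute("INSERT INTO documents (user_id, title) VALUES (1, 'notes')")
service.commit()
```

Because the foreign key points at the local copy, the service database keeps working while the core database is offline, at the cost of the copy lagging behind until the next sync.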
I would recommend starting with a single database. If your RDBMS supports it, I would organize components according to SCHEMAs which would allow you to at least maintain a logical separation by design. You can more easily refactor later.
Many databases have tables which can be considered unrelated. Sometimes in a system you have multiple entity networks that hardly connect (sometimes not at all). You can use SCHEMAs in these cases too.

Copying data from a local database to a remote one

I'm writing a system at the moment that needs to copy data from a client's locally hosted SQL database to a hosted server database. Most of the data in the local database is copied to the live one, though optimisations are made to reduce the amount of actual data required to be sent.
What is the best way of sending this data from one database to the other? At the moment I can see a few possible options, none of which yet stands out as the prime candidate.
Replication, though this is not ideal, and we cannot expect it to be supported in the version of SQL we use on the hosted environment.
Linked server, copying data direct - a slow and somewhat insecure method
Webservices to transmit the data
Exporting the data we require as XML and transferring to the server to be imported in bulk.
The data copied goes into copies of the tables, without identity fields, so data can be inserted/updated without any violations in that respect. This data transfer does not have to be done at the database level, it can be done from .net or other facilities.
More information
The frequency of the updates will vary completely on how often records are updated. But the basic idea is that if a record is changed then the user can publish it to the live database. Alternatively we'll record the changes and send them across in a batch on a configurable frequency.
The number of records we're talking about is around 4000 rows per table for the core tables (product catalog) at the moment, but this is completely variable depending on the client we deploy this to, as each would have their own product catalog, ranging from 100's to 1000's of products. To clarify, each client is on a separate local/hosted database combination; they are not combined into one system.
As well as the individual publishing of items, we would also require a complete re-sync of data to be done on demand.
Another aspect of the system is that some of the data being copied from the local server is stored in a secondary database, so we're effectively merging the data from two databases into the one live database.

Well, I'm biased. I have to admit. I'd like to hypnotize you into shelling out for SQL Compare to do this. I've been faced with exactly this sort of problem in all its open-ended frightfulness. I got a copy of SQL Compare and never looked back. SQL Compare is actually a silly name for a piece of software that synchronizes databases. It will also do it from the command line once you have got a working project together with all the right knobs and buttons. Of course, you can only do this for reasonably small databases, but it really is a tool I wouldn't want to be seen in public without.
My only concern with your requirements is where you are collecting product catalogs from a number of clients. If they are all in separate tables, then all is fine, whereas if they are all in the same table, then this would make things more complicated.

How much data are you talking about? how many 'client' dbs are there? and how often does it need to happen? The answers to those questions will make a big difference on the path you should take.

There is an almost infinite number of solutions for this problem. In order to narrow it down, you'd have to tell us a bit about your requirements and priorities.
Bulk operations would probably cover a wide range of scenarios, and you should add that to the top of your list.

I would recommend using Data Transformation Services (DTS) for this. You could create a DTS package for appending and one for re-creating the data.
It is possible to invoke DTS package operations from your code so you may want to create a wrapper to control the packages that you can call from your application.

In the end I opted for a set of triggers to capture data modifications to a change log table. There is then an application that polls this table and generates XML files for submission to a webservice running at the remote location.
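
For readers who want to see the shape of that approach, here is a rough Python sketch using SQLite triggers and the standard library XML module. It is not the poster's actual implementation: the products and change_log tables, the trigger names, the XML layout and the drain_changes function are all assumptions, and the web service call is left out.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Every modification is recorded here by the triggers below.
    CREATE TABLE change_log (
        seq        INTEGER PRIMARY KEY AUTOINCREMENT,
        action     TEXT NOT NULL,
        product_id INTEGER NOT NULL,
        name       TEXT
    );

    CREATE TRIGGER products_ai AFTER INSERT ON products BEGIN
        INSERT INTO change_log (action, product_id, name)
        VALUES ('insert', NEW.id, NEW.name);
    END;
    CREATE TRIGGER products_au AFTER UPDATE ON products BEGIN
        INSERT INTO change_log (action, product_id, name)
        VALUES ('update', NEW.id, NEW.name);
    END;
    CREATE TRIGGER products_ad AFTER DELETE ON products BEGIN
        INSERT INTO change_log (action, product_id, name)
        VALUES ('delete', OLD.id, NULL);
    END;
""")


def drain_changes(conn):
    """Read pending log rows, return them as an XML string, then clear them."""
    rows = conn.execute(
        "SELECT seq, action, product_id, name FROM change_log ORDER BY seq"
    ).fetchall()
    root = ET.Element("changes")
    for seq, action, product_id, name in rows:
        item = ET.SubElement(root, "change", action=action, id=str(product_id))
        if name is not None:
            item.text = name
    if rows:
        # Delete only what was read, so rows logged meanwhile are kept.
        conn.execute("DELETE FROM change_log WHERE seq <= ?", (rows[-1][0],))
    return ET.tostring(root, encoding="unicode")


conn.execute("INSERT INTO products (id, name) VALUES (1, 'Widget')")
conn.execute("UPDATE products SET name = 'Widget v2' WHERE id = 1")
xml_payload = drain_changes(conn)  # what a poller would submit to the remote side
```

A polling application would call something like drain_changes on a timer and post the result; a sturdier version would delete log rows only after the remote side acknowledges receipt.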
