PostgreSQL - one database for everyone, or one database per customer

I'm working on a web-based business application where each customer will need to have their own data (think the basecamphq.com type of model). For scalability and ease of upgrades, I'd prefer to have a single database where each customer gets a filtered version of the data. The problem is how to guarantee that they stay sandboxed to their own data. Trying to enforce it in code seems like a disaster waiting to happen. I know Oracle has a way to append a WHERE clause to every query based on a login ID, but does PostgreSQL have anything similar?
If not, is there a different design pattern I could use, such as creating a filtered view of each table for each customer (see the sketch below)?
Worst-case scenario, what is the performance/memory overhead of having 1,000 100 MB databases vs. a single 1 TB database? I will need to provide backup/restore functionality on a per-customer basis, which is dead simple when each customer has their own database but quite a bit trickier if they are sharing one database with other customers.
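For illustration, a minimal sketch of the per-customer filtered view pattern mentioned above, assuming a hypothetical orders table and a customer role named acme_user (all names are made up):

-- Shared table holding every customer's rows
CREATE TABLE orders (
    id          serial PRIMARY KEY,
    customer_id integer NOT NULL,
    total       numeric
);

-- One view per customer, hard-wired to that customer's ID
CREATE VIEW acme_orders AS
    SELECT id, total FROM orders WHERE customer_id = 42;

-- The customer role may query only its view, never the base table
GRANT SELECT ON acme_orders TO acme_user;

The weakness is the one the question worries about: every new table needs a matching view per customer, and a single missed GRANT or REVOKE breaks the sandbox.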

You might want to look into adding Veil to your PostgreSQL installation.

Schemas plus inherited tables might work for this: create your master table, then inherit tables into per-customer schemas that provide a company ID or name field default.
Set the permissions per schema for each customer, and set the schema search path per user. Use the same table names in each schema so that the queries remain the same. A sketch of this layout follows.
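A hedged sketch of that pattern in PostgreSQL, assuming a hypothetical invoices table and a customer called acme with company ID 42:

-- Master table visible to the application as a whole
CREATE TABLE public.invoices (
    id         serial,
    company_id integer NOT NULL,
    amount     numeric
);

-- Per-customer schema with an inherited child table that pins the company ID
CREATE SCHEMA acme;
CREATE TABLE acme.invoices (
    company_id integer NOT NULL DEFAULT 42 CHECK (company_id = 42)
) INHERITS (public.invoices);

-- Sandbox the customer's role to its own schema
CREATE ROLE acme_user LOGIN;
GRANT USAGE ON SCHEMA acme TO acme_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON acme.invoices TO acme_user;
ALTER ROLE acme_user SET search_path = acme;

With search_path set to acme, the customer's unqualified queries (SELECT * FROM invoices) resolve to their own child table, so the application SQL stays identical across customers.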

Related

What is the difference between a schema & a database in Snowflake?

Is there a good reason to start a new project in a fresh Snowflake schema vs. a fresh Snowflake database?
I know this sounds like an opinion-based question, but I'm trying to get at the technical limitations of one vs. the other.
As far as I can tell, databases & schemas are just like folders and sub-folders. They seem to have no bearing on cost or capability.
I can do:
SELECT *
FROM database1.schemaA.tableX x
JOIN database2.schemaB.tableY y ON y.row_id = x.row_id
So is it all purely syntax and table organization? Or am I missing something?
For simple use cases, you can treat databases and schemas as folders and subfolders. How you set them up is determined by how you want to organise your data and how you want to manage access control.
Access control: the more granular you want to make your access control, the more complicated it is to implement and maintain. It's relatively simple to give users access to everything in a database; it's more complicated to give users access to specific schemas within a database; and it can get very complicated to give users access to a subset of tables within a schema. Therefore, if you have sets of tables that should be accessible to different sets of users, it is easier to keep them in different schemas (or databases), as the sketch below illustrates.
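A hedged sketch of that granularity difference in Snowflake SQL (the database, schema, and role names are hypothetical):

-- Coarse: let a role read everything in one database
GRANT USAGE ON DATABASE sales TO ROLE analyst;
GRANT USAGE ON ALL SCHEMAS IN DATABASE sales TO ROLE analyst;
GRANT SELECT ON ALL TABLES IN DATABASE sales TO ROLE analyst;

-- Finer: limit the role to a single schema instead
GRANT USAGE ON DATABASE sales TO ROLE analyst;
GRANT USAGE ON SCHEMA sales.reporting TO ROLE analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA sales.reporting TO ROLE analyst;

Table-by-table grants within a schema are possible too, but that is where the maintenance overhead grows quickly.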
Replication: if you are going to need to replicate data to another Snowflake account (presumably in another region; otherwise you would probably use Sharing, not Replication), then bear in mind that replication happens at the database level, i.e. you can't replicate specific schemas (or tables or views); the whole database gets replicated. This may influence how you segregate your data between databases.

When to Create DB and When to Create Schema

This seems like a design question, but I wanted to know if there is a pattern or design consideration that would lead us to create a new database rather than a new schema.
Why not create one big database with separate schemas? Under what circumstances should we create a new database?
They are just logical divisions, so for the most part it's a matter of preference. There is one place where it's not a matter of preference: replication.
As of September 2022, the unit of replication is the database. It's possible to specify which databases you want to replicate, but not which schemas within a database to replicate.
If you plan to replicate, you'll want to keep the schemas/tables that are important to replicate in one or more databases that get replicated, and keep other data in databases that do not get replicated; the sketch below shows the database-level commands.
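For reference, a hedged sketch of database-level replication in Snowflake (the organization, account, and database names are hypothetical):

-- On the source account: allow this database to replicate to another account
ALTER DATABASE sales_db ENABLE REPLICATION TO ACCOUNTS myorg.eu_account;

-- On the target account: create the secondary database and pull changes
CREATE DATABASE sales_db AS REPLICA OF myorg.us_account.sales_db;
ALTER DATABASE sales_db REFRESH;

Note that there is no schema-level equivalent of these commands; the database is the unit.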
Another thought: in a large enterprise DWH solution, there can be a variety of flavours of tables which you can map to different databases - a Sales DB, a Master DB, a Finance DB, for example. Then inside each DB, you may want to have schemas for tables, views, procedures, and other objects.

Multi-tenant databases with a single shared database

We are using .NET MVC and a SQL Server DB.
EDIT
We are also using NHibernate for data access. I mention this because we will not be writing our own SQL or doing stored procs. Triggers in the DB might work, but I don't know if you can do that between databases.
END EDIT
We want to have a multi-tenant setup so each client has their own instance of the DB. However, we need to have each tenant connect to another database which holds a great deal of user information; there will be some small amount of shared data between them. Basically, the tenants will be referencing the data of the users in the shared database.
The idea is that some people will use just the shared database (independent clients); they may then be hired by one of the tenant clients. The tenant will then want access to the new employee's data in the shared database. Further, the employee may leave one tenant and join another, or leave one and remain independent and want access to their data. We could of course have the shared database schema in each tenant and just do a big export/import each time someone left or joined, but this seems like a lot of trouble too.
I am asking for any advice on how to manage the fact that the tenants will have references to the shared database but no referential integrity, or for an alternative approach, or whatever.
Thank you,
Raif
Across databases you have to give up declarative referential integrity (foreign keys). However, you can still enforce it (if you think you need to) using AFTER or INSTEAD OF triggers, or, if you control all data manipulation via stored procedures, you can do it there: on insert or update, for example, you can check first, or as part of the modification join to or use EXISTS against the table(s) in the other database to be sure that a valid value is being used. A trigger-based sketch follows.
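A hedged sketch of the trigger approach in T-SQL, assuming hypothetical dbo.Orders and SharedDb.dbo.Users tables linked by UserId:

-- Reject inserts/updates whose UserId has no match in the shared database
CREATE TRIGGER trg_Orders_CheckUser
ON dbo.Orders
AFTER INSERT, UPDATE
AS
BEGIN
    IF EXISTS (
        SELECT 1
        FROM inserted AS i
        WHERE NOT EXISTS (
            SELECT 1
            FROM SharedDb.dbo.Users AS u
            WHERE u.UserId = i.UserId
        )
    )
    BEGIN
        RAISERROR ('UserId not found in SharedDb.dbo.Users.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;

Since the question mentions NHibernate rather than hand-written SQL, a trigger like this is attractive precisely because it fires regardless of how the DML is generated.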
I've worked with multi-tenant models and there can be huge benefits that are worth the costs (e.g. giving up DRI in some cases). For things that are mostly reference data and that aren't free-text entry, there shouldn't be a whole lot of extra effort required.

Separating weakly linked database schemas

I've been tasked with revisiting a database schema we designed and use internally for various ticketing and reporting systems. Currently there exist about 40 tables in one Oracle database schema supporting perhaps six webapps.
However, there's one unifying relationship amongst them all: a rooms table describing the rooms. Room name, purpose, and other data are thrown into a shared table for each app. My initial idea was to pull each of these applications into a separate database and perform joins between a given database and the rooms database. But I've discovered this solution prevents foreign key constraints in SQL Server 2005. It seems silly to duplicate one table for each app and keep those multiple copies synchronized.
Should I just leave everything in one large DB, or is there something else I can do to separate the tables without losing FK constraints?
The only way to achieve built-in referential integrity is to have the table inside the database in which it is referenced. You might be able to achieve the equivalent of referential integrity using triggers, but it would likely be deathly slow.
You might be able to use SQL Server replication, in its "Transactional replication" mode/form: http://msdn.microsoft.com/en-us/library/ms151176.aspx
If all the apps truly use and depend on the rooms table, then keep them all in one DB.
You can still set privileges on the tables properly and manage the data sets in the non-overlapping areas normally.
Is there any task you imagine you will not be able to perform when things are kept together?
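One middle ground, sketched here under assumed names, is to separate the apps into schemas within the single database; the FK constraints to the shared table still work because everything lives in one DB:

-- Shared table in dbo; one schema per app
CREATE SCHEMA ticketing;
GO
CREATE TABLE dbo.Rooms (
    RoomId  int IDENTITY PRIMARY KEY,
    Name    nvarchar(100) NOT NULL,
    Purpose nvarchar(200) NULL
);

CREATE TABLE ticketing.Tickets (
    TicketId int IDENTITY PRIMARY KEY,
    RoomId   int NOT NULL REFERENCES dbo.Rooms (RoomId)
);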

Grouping ETL Staging Tables With User Schemas?

I was thinking of putting staging tables and the stored procedures that update those tables into their own schema, such that when importing data from SomeTable to the data warehouse, I would run an Initial.StageSomeTable procedure which would insert the data into the Initial.SomeTable table. This way all the procs and tables dealing with the Initial stage are grouped together. Then I'd have a Validation schema for that stage of the ETL, etc.
This seems cleaner than trying to uniquely name all these very similar tables, since each table will have multiple instances of itself throughout the staging process. A sketch of the layout is below.
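A minimal sketch of that layout in T-SQL, using the names from the question; the procedure body and SourceDb are hypothetical:

CREATE SCHEMA Initial;
GO
CREATE SCHEMA Validation;
GO

-- Staging copy of the table, grouped with its loader proc by schema
CREATE TABLE Initial.SomeTable (
    Id     int NOT NULL,
    Loaded datetime NOT NULL DEFAULT GETDATE()
);
GO

CREATE PROCEDURE Initial.StageSomeTable
AS
BEGIN
    -- Hypothetical body: land raw source rows in the staging table
    INSERT INTO Initial.SomeTable (Id)
    SELECT Id FROM SourceDb.dbo.SomeTable;
END;
GO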
Question: Is using a user schema to group tables/procs/views together an appropriate use of user schemas in MS SQL Server? Or are user schemas supposed to be used for security, such as grouping permissions together for objects?
This is actually a recommended practice. Take a look at the Microsoft Business Intelligence ETL Design Practices from Project REAL. You will find (download the doc from the first link) that they use quite a few schemata to group and identify objects in the warehouse.
In addition to dbo and etl, they also use admin, audit, part, olap and a few more.
I think it's appropriate enough; it doesn't really matter. You could use another database if you liked, which is actually what we do.
I'm not sure why you would want a Validation schema, though; what are you going to do there?
Both the reasons you list (purpose/intent, security) are valid reasons to use schemas. Once you start using them, you should always specify the schema when referencing an object (although I'm lazy and never specify dbo).
One trick we use is to have the same-named table in each of several schemas, combined with table partitioning (available in SQL 2005 and up). Load the data in the first schema, then when it's validated, "swap" the partition into dbo (after first swapping the dbo partition into a "dumpster" schema copy of the table). Net production downtime is measured in seconds, and it's all carefully wrapped in a declared transaction. The sketch below shows the swap.
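A hedged sketch of that swap using ALTER TABLE ... SWITCH, assuming identically structured etl.SomeTable, dbo.SomeTable, and dumpster.SomeTable on the same filegroup (all names hypothetical):

BEGIN TRANSACTION;

-- Move current production data out of the way; dumpster.SomeTable must be empty
ALTER TABLE dbo.SomeTable SWITCH TO dumpster.SomeTable;

-- Swap the freshly loaded, validated data into production
ALTER TABLE etl.SomeTable SWITCH TO dbo.SomeTable;

COMMIT TRANSACTION;

-- The displaced data can now be inspected or discarded at leisure
TRUNCATE TABLE dumpster.SomeTable;

SWITCH is a metadata-only operation, which is why the production downtime is measured in seconds.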
