Grouping ETL Staging Tables With User Schemas?

I was thinking of putting staging tables, and the stored procedures that update those tables, into their own schema. For example, when importing data from SomeTable into the data warehouse, I would run an Initial.StageSomeTable procedure, which would insert the data into the Initial.SomeTable table. This way all the procs and tables dealing with the Initial staging are grouped together. Then I'd have a Validation schema for that stage of the ETL, etc.
This seems cleaner than trying to uniquely name all these very similar tables, since each table will have multiple instances of itself throughout the staging process.
Question: Is using a user schema to group tables/procs/views together an appropriate use of user schemas in MS SQL Server? Or are user schemas supposed to be used for security, such as grouping permissions together for objects?
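To make the layout concrete, here is a minimal T-SQL sketch of the Initial stage. The column list and the dbo.SomeTable source are assumptions for illustration only; the schema, table and procedure names come from the description above.

```sql
-- CREATE SCHEMA must be the only statement in its batch, hence the GO separators.
CREATE SCHEMA Initial;
GO

-- Hypothetical columns; the real staging table would mirror the source.
CREATE TABLE Initial.SomeTable (
    Id   INT           NOT NULL PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL
);
GO

CREATE PROCEDURE Initial.StageSomeTable
AS
BEGIN
    SET NOCOUNT ON;

    -- Start each load from an empty staging table.
    TRUNCATE TABLE Initial.SomeTable;

    -- dbo.SomeTable stands in for wherever the source data actually lives.
    INSERT INTO Initial.SomeTable (Id, Name)
    SELECT Id, Name
    FROM dbo.SomeTable;
END;
GO
```

A Validation schema would repeat the same pattern (Validation.SomeTable, plus whatever procedure moves rows into it), so the table name stays the same and only the schema prefix changes.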

This is actually a recommended practice. Take a look at the Microsoft Business Intelligence ETL Design Practices from Project REAL. You will find (download the doc from the first link) that they use quite a few schemata to group and identify objects in the warehouse.
In addition to dbo and etl, they also use admin, audit, part, olap and a few more.

I think it's appropriate enough; it doesn't really matter. You could use another database if you liked, which is actually what we do.
I'm not sure why you would want a validation schema, though. What are you going to do there?

Both the reasons you list (purpose/intent, security) are valid reasons to use schemas. Once you start using them, you should always specify schema when referencing an object (although I'm lazy and never specify dbo).
One trick we use is to have the same-named table in each of several schemas, combined with table partitioning (available in SQL Server 2005 and up). Load the data into the first schema; then, once it's validated, "swap" the partition into dbo, after first swapping the dbo partition into a "dumpster" schema copy of the table. Net production downtime is measured in seconds, and it's all carefully wrapped in an explicit transaction.
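A rough sketch of that swap, using ALTER TABLE ... SWITCH. It assumes unpartitioned tables for brevity (the statement also accepts PARTITION n clauses on either side), and the schema names Staging and Dumpster are hypothetical. Each SomeTable must have an identical structure on the same filegroup, and the target of each switch must be empty.

```sql
SET XACT_ABORT ON;  -- any run-time error rolls back the whole transaction

BEGIN TRANSACTION;

    -- Move the current production rows out of the way.
    -- Dumpster.SomeTable must be empty at this point.
    ALTER TABLE dbo.SomeTable SWITCH TO Dumpster.SomeTable;

    -- Move the validated rows into production.
    -- dbo.SomeTable is empty now, so it is a valid switch target.
    ALTER TABLE Staging.SomeTable SWITCH TO dbo.SomeTable;

COMMIT TRANSACTION;
```

Because a switch only reassigns metadata rather than copying rows, the transaction is short regardless of table size, which is where the "seconds of downtime" comes from.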

Related

When to Create DB and When to Create Schema

This seems like a design question, but I wanted to know whether there is a pattern or design consideration that tells us when to create a new database rather than a new schema.
Why not create one big database with separate schemas? Under what circumstances should we create a new database?

They are just logical divisions, so for the most part it's a matter of preference. There is one place where it's not a matter of preference: replication.
As of September, 2022, the unit of replication is the database. It's possible to specify which databases you want to replicate, but not which schemas within a database to replicate.
If you plan to replicate, you'll want to think about keeping only the schemas/tables that are important to replicate in one or more databases that get replicated and keep other data in databases that do not get replicated.

Another thought: in a large enterprise DWH solution, there can be a variety of flavours of tables, which you can map to different databases: Sales DB, Master DB, Finance DB, for example. Then, inside each DB, you may want to have schemas for tables, views, procedures and other objects.

What are the implications of creating tables in a database with different schemas?

I am creating a database with about 40 different tables.
I have heard about people grouping tables into database 'schemas' - what are the implications of using different schemas in a database? Can tables from one schema still relate to another schema? What are the functional differences between different schemas?

Where are schemas located in SSMS? They are rightfully placed under the security tab.
Let's use the AdventureWorks database.
If you assign security at the schema level, purchasing users will only have access to the purchasing tables and salespeople will only have access to the sales tables.
In fact, they will not even see the other tables if you set it up correctly.
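As an illustration, schema-level permissions can be granted to roles along these lines. Purchasing and Sales are schemas in AdventureWorks; the role names are hypothetical.

```sql
-- Hypothetical roles, one per group of users.
CREATE ROLE PurchasingUsers;
CREATE ROLE SalesUsers;

-- One grant per schema covers every current and future object in it.
GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA::Purchasing TO PurchasingUsers;
GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA::Sales      TO SalesUsers;
```

A user who belongs only to PurchasingUsers has no permission on the Sales objects, so those objects do not show up in that user's catalog views either.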
If you combine schemas with creating tables/indexes on filegroups, you can place all the sales tables on a sales filegroup and all the purchasing tables on a purchasing filegroup,
i.e. spreading the I/O load.
In short, I think schemas are an unknown and little used feature.
Check out my blog article on this fact.
http://craftydba.com/?p=4326

I assume that you are talking about SQL Server. You can join and reference between tables in different schemas. I see it mostly used for visual organization and/or for managing objects' permission (you can assign permissions at the schema-level).
If you are worried about any negative effects of doing dbo.table vs custom.table - there aren't any that I imagine you would encounter.

Schemas are just collections of database objects. They are useful for maintaining separation of sets of objects.
There is always at least one schema. In SQL Server the default one is named dbo.
One implication of having multiple schemas is that you will have to manage permissions for the various schemas. This is usually done via a role that's associated with the schema.
Objects in one schema are available to objects from another, and there is no performance penalty in writing queries that use objects from multiple schemas.
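A small sketch of tables in two schemas relating to each other. The schemas, tables and columns here are all hypothetical; the point is only that a foreign key and a join can cross a schema boundary.

```sql
CREATE SCHEMA Sales;
GO
CREATE SCHEMA Billing;
GO

CREATE TABLE Sales.Customer (
    CustomerId INT           NOT NULL PRIMARY KEY,
    Name       NVARCHAR(100) NOT NULL
);

-- The foreign key references a table in a different schema.
CREATE TABLE Billing.Invoice (
    InvoiceId  INT            NOT NULL PRIMARY KEY,
    CustomerId INT            NOT NULL
        CONSTRAINT FK_Invoice_Customer REFERENCES Sales.Customer (CustomerId),
    Amount     DECIMAL(10, 2) NOT NULL
);
GO

-- Joins only need the two-part names.
SELECT c.Name, i.InvoiceId, i.Amount
FROM Billing.Invoice AS i
JOIN Sales.Customer  AS c ON c.CustomerId = i.CustomerId;
```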

Postgresql - one database for everyone, or one-database per customer

I'm working on a web-based business application where each customer will need to have their own data (think basecamphq.com type model). For scalability and ease of upgrades, I'd prefer to have a single database where each customer gets a filtered version of the data. The problem is how to guarantee that they stay sandboxed to their own data. Trying to enforce it in code seems like a disaster waiting to happen. I know Oracle has a way to append a where clause to every query based on a login id, but does PostgreSQL have anything similar?
If not, is there a different design pattern I could use (like creating a view of each table for each customer that filters)?
Worst-case scenario, what is the performance/memory overhead of having 1000 100 MB databases vs. having a single 1 TB database? I will need to provide backup/restore functionality on a per-customer basis, which is dead simple with one database per customer but quite a bit trickier if customers are sharing a database.

You might want to look into adding Veil to your PostgreSQL installation.

Schemas plus inherited tables might work for this: create your master table, then inherit tables into per-customer schemas, each of which provides a default for a company ID or name field.
Set the permissions per schema for each customer and set the schema search path per user. Use the same table names in each schema so that the queries remain the same.
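One way this could look in PostgreSQL, as a sketch only. The table, schema and role names (tickets, customer_a, customer_a_user) and the company ID value are made up for illustration.

```sql
-- Master table; the application owner sees every customer's rows through it.
CREATE TABLE public.tickets (
    id         integer NOT NULL,
    company_id integer NOT NULL,
    title      text    NOT NULL
);

CREATE SCHEMA customer_a;

-- Redeclaring company_id merges it with the inherited column
-- and gives it a per-customer default.
CREATE TABLE customer_a.tickets (
    company_id integer NOT NULL DEFAULT 1
) INHERITS (public.tickets);

CREATE ROLE customer_a_user LOGIN;
GRANT USAGE ON SCHEMA customer_a TO customer_a_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON customer_a.tickets TO customer_a_user;

-- Unqualified "tickets" now resolves to customer_a.tickets for this role.
ALTER ROLE customer_a_user SET search_path = customer_a;
```

The customer role is deliberately given no privileges on public.tickets, because a query against the parent table returns the rows of every child table.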

Separating weakly linked database schemas

I've been tasked with revisiting a database schema we designed and use internally for various ticketing and reporting systems. Currently there are about 40 tables in one Oracle database schema supporting perhaps six webapps.
However, there's one unifying relationship amongst them all: a rooms table describing the room. Room name, purpose and other data are thrown into a shared table for each app. My initial idea was to pull each of these applications into a separate database, and perform joins between a given database and the room database. But I've discovered this solution prevents foreign key constraints in SQL Server 2005. It seems silly to duplicate one table for each app and keep those multiple copies synchronized.
Should I just leave everything in one large DB, or is there something else I can do to separate the tables without losing FK constraints?

The only way to achieve built-in referential integrity is to have the table inside the database in which it is referenced. You might be able to achieve the equivalent of referential integrity using triggers but it would likely be deathly slow.
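For reference, a trigger-based check might look roughly like this in T-SQL. The database, table and column names (RoomDb, dbo.Room, dbo.Ticket, RoomId) are hypothetical, and this covers only the child side of the relationship.

```sql
-- Created in the application database; RoomDb holds the shared rooms table.
CREATE TRIGGER dbo.trg_Ticket_CheckRoom
ON dbo.Ticket
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Reject the statement if any new or changed row points at a missing room.
    IF EXISTS (
        SELECT 1
        FROM inserted AS i
        WHERE NOT EXISTS (
            SELECT 1
            FROM RoomDb.dbo.Room AS r
            WHERE r.RoomId = i.RoomId
        )
    )
    BEGIN
        RAISERROR('RoomId does not exist in RoomDb.dbo.Room.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;
```

A matching trigger on RoomDb.dbo.Room would also be needed to stop rooms that are still referenced from being deleted or renumbered, and every application database would need its own pair. That extra work on each write is the cost the answer above warns about.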

You might be able to use SQL Server replication, in its "Transactional replication" mode/form. http://msdn.microsoft.com/en-us/library/ms151176.aspx

If all the apps truly use and depend on the rooms, then keep them all in one DB.
You can still set privileges on the tables properly, and manage the data sets in the non-overlapping areas normally.
Is there any task you imagine you will not be able to perform when things are together?

schema in sql server 2008

What is the difference between creating ordinary tables using 'dbo' and creating tables using schemas? How do schemas work and support the tables?

A schema is just a container for DB objects - tables, views etc. It allows you to structure a very large database solution you might have. As a sample, have a look at the newer AdventureWorks sample databases - they have a number of schemata included, like "HumanResources" and so forth.
A schema can be a security boundary, e.g. you can give or deny certain users access to a schema as a whole. A schema can also be used to keep tables with the same name apart, e.g. you could create a "user schema" for each user of your application, and have a "Settings" table in each of them, holding that user's settings, e.g. "Bob.Settings", "Mary.Settings" etc.
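The per-user "Settings" idea could be sketched like this in T-SQL. Bob is the example user from above; the columns are assumptions. Giving the user a default schema is what lets an unqualified table name resolve to that user's own copy.

```sql
-- A database user whose default schema is his own.
CREATE USER Bob WITHOUT LOGIN WITH DEFAULT_SCHEMA = Bob;
GO

-- Bob owns the schema, and therefore the objects created in it.
CREATE SCHEMA Bob AUTHORIZATION Bob;
GO

CREATE TABLE Bob.Settings (
    SettingName  NVARCHAR(100) NOT NULL PRIMARY KEY,
    SettingValue NVARCHAR(400) NULL
);
GO

-- Impersonate Bob to show name resolution.
EXECUTE AS USER = 'Bob';
SELECT SettingName, SettingValue
FROM Settings;   -- resolves to Bob.Settings via the default schema
REVERT;
```

Repeating the same three steps for Mary gives Mary.Settings, and the application can issue the same unqualified query for both users.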
In my experience, schemata are not used very often in SQL Server. It's a way to organize your database objects into containers, but unless you have a huge amount of database objects, it's probably something you won't really use much.

dbo is a schema.

See if this helps.
Schema seems to be a way of categorizing objects (tables/stored procs/views etc).
Think of it as a bucket to organize related objects based on functionality.
I am not sure how a logged-in SQL user is tied to a specific schema, though.
