Data modeling in Snowflake - snowflake-cloud-data-platform

We perform ELT in our company. We load the data into the landing zone (a database in Snowflake), with one schema per source the data is retrieved from, such as:
LZ (database) -- FACEBOOK, LINKEDIN (schemas)
(Here nothing needs to be changed)
Once all the data is loaded, analysts create views/tasks to do the transformations as per the information needed.
We are moving towards domain-driven design in Snowflake at a later stage. Each analyst belongs to a domain, such as sales or vendor.
We have identified all the domains; the next step is implementation. There are two ways:
domains as databases
domains as schemas inside a single database
We can have a sales database, a vendor database.
Or we can have a single database, such as analysts, inside which sales and vendor are schemas.
Which one should I go for, and why? In most cases I have seen, domains are modeled as schemas only, but I am looking for what works best, why, and what the implications are.
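To make the two layouts concrete, here is a minimal sketch that only generates the Snowflake DDL for each option. It does not connect to Snowflake. The helper names (`domains_as_databases`, `domains_as_schemas`) and the `ANALYSTS` database name are assumptions taken from the description above, not an established convention.

```python
import re


def _ident(name: str) -> str:
    """Accept only plain unquoted identifiers so the generated DDL stays safe."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name.upper()


def domains_as_databases(domains):
    """Option 1: one database per domain."""
    return [f"CREATE DATABASE IF NOT EXISTS {_ident(d)};" for d in domains]


def domains_as_schemas(database, domains):
    """Option 2: one shared database with one schema per domain."""
    db = _ident(database)
    stmts = [f"CREATE DATABASE IF NOT EXISTS {db};"]
    stmts += [f"CREATE SCHEMA IF NOT EXISTS {db}.{_ident(d)};" for d in domains]
    return stmts


# Print both layouts for the two example domains from the question.
for stmt in domains_as_databases(["sales", "vendor"]):
    print(stmt)
for stmt in domains_as_schemas("analysts", ["sales", "vendor"]):
    print(stmt)
```

Either way, objects are addressed as database.schema.object, so the choice mainly changes which level of the name carries the domain.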

Related

Best options to manage large sets of data in SQL Server

I am currently working on a project which involves the following:
The application I am working on is connected to a SQL Server database.
SAP loads information into multiple tables (on a daily and also an hourly basis) in a MASTER database.
There are 5 other databases (hosted on the same server) that access this information via synonyms and stored procedure calls to the MASTER database.
The MASTER database is used purely for storing the data and routing it to the other databases.
Master Database -
Tables:
MASTER_TABLE1 <------- SAP inserts data into this table. Triggers are used to process the valid data and insert it into secondary staging tables, say MASTER_TABLE1_SEC
MASTER_TABLE1_SEC -- Holds processed data coming into MASTER_TABLE1
Five other databases (one for each manufacturing facility) are present on the same server. My application is connected to the facility databases (not the MASTER database).
FACILITY1
FACILITY2
....
FACILITY5
Synonyms of MASTER_TABLE1_SEC are created in each of these 5 facility databases.
Stored procedures are then called from the facility databases to load data from MASTER_TABLE1_SEC into the respective tables (within each facility), based on the business logic.
Is there a better architecture to handle this kind of a project? I am a beginner when it comes to advanced data management. Can anyone suggest a better architecture or tools to handle this?
There are a lot of patterns that would meet the needs described here. It seems that you are working with a type of Data Warehouse. I use Data Vault for my Enterprise Data Warehouses. It is an Ensemble Modeling technique designed for integration and master data preparation. You can think of it as a way to house all data from all time. You would then generate Data Marts (Kimball Method) for each of the facilities, containing only their data or whatever is required for their needs.

How do I back up or restore the data per company ID?

I have a small accounting application. In this software I can store the data of multiple companies. The data is stored in a SQL Server 2008 R2 database.
In the current database I have a User table, which stores all user names, and a Company Master table, which stores company details like name, address, session etc., with the user ID as an FK to the User table. Next is the tran table, which links to Company Master and stores voucher details, and other tables that link to the tran table, like bill, payment etc.
The app is built for small companies and professionals who keep and maintain their clients' data. In that scenario all data is separate and mutually independent. A small company maintains the account-related data of all its subsidiary companies in a single app. Sometimes they receive the data of a single subsidiary, or send it to that company, a government body, or an audit firm. It is like mobile phone contacts: I can send all contacts or any selected contact.
Users first select their company from Company Master and then add/edit reference data or view reports on the basis of the selected company ID.
Now my problem is that the data volume has become very high at some client sites, because the data of 50 to 60 companies is stored in a single database. How do I back up or restore the data per company ID? Can SQL Server filegroups help with this? I have no knowledge of filegroups.
Please help me.
Do not split your SQL database into multiple SQL databases (and do not create more filegroups either) just because you need to get data filtered by the CompanyId. Every time your client needed to create a new company, your application would have to create a new database for it. This would also complicate things like app updates quite a bit.
Unless you face grave performance problems (for example, you are using SQL Server Express and your client database is 9 GB, while the max. database size for Express is 10 GB), keep 1 database per client.
Be sure all your related tables are well indexed by the CompanyID column. Then you can provide means to export data by CompanyID from your application: custom reports, exports to CSV files, Excel etc.
A database backup file is usually not used for passing data to other applications. Its goal is to ensure disaster recovery: when the disk fails etc., your client will be able to recover easily. On the contrary, with 50 database files in place of just 1, he would have a hard time restoring all those databases properly.
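As a rough illustration of exporting by CompanyID, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for SQL Server. The `tran` table layout, the index name and the `export_company_rows` helper are assumptions made up for this example, not the actual schema of the application.

```python
import csv
import io
import sqlite3

# Table names cannot be bound as query parameters, so allow only known tables.
EXPORTABLE_TABLES = {"tran", "bill", "payment"}


def export_company_rows(conn, table, company_id):
    """Return all rows of `table` that belong to one company as CSV text."""
    if table not in EXPORTABLE_TABLES:
        raise ValueError(f"table not exportable: {table!r}")
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE CompanyID = ? ORDER BY 1", (company_id,)
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur)
    return out.getvalue()


# Demo data: a hypothetical tran table indexed by CompanyID.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tran (TranID INTEGER PRIMARY KEY, CompanyID INTEGER, Amount INTEGER)"
)
conn.execute("CREATE INDEX ix_tran_company ON tran (CompanyID)")
conn.executemany(
    "INSERT INTO tran VALUES (?, ?, ?)", [(1, 10, 250), (2, 20, 75), (3, 10, 40)]
)
print(export_company_rows(conn, "tran", 10))
```

The same filter-and-write loop would be repeated for each table that carries or links to the CompanyID.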

What are the implications of creating tables in a database with different schemas?

I am creating a database with about 40 different tables.
I have heard about people grouping tables into database 'schemas' - what are the implications of using different schemas in a database? Can tables from one schema still relate to another schema? What are the functional differences between different schemas?
Where are schemas located in SSMS? They are rightfully placed under the security tab.
Let's use the AdventureWorks databases.
If you assign security at the schema level, purchasing users will only have access to the purchasing table and sales people will have only access to the sales tables.
In fact, they will not even see the other tables if you set it up correctly.
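A minimal sketch of what schema-level security could look like: it only generates the T-SQL statements and does not run them. The role names (`purchasing_users`, `sales_users`) and the `schema_grants` helper are hypothetical, chosen to mirror the Purchasing and Sales schemas mentioned above.

```python
import re

# Hypothetical mapping of database role -> schema that role may read.
SCHEMA_READERS = {"purchasing_users": "Purchasing", "sales_users": "Sales"}


def schema_grants(role_to_schema):
    """Build CREATE ROLE and schema-level GRANT statements for each role."""
    stmts = []
    for role, schema in role_to_schema.items():
        for name in (role, schema):
            # Accept only plain identifiers so the generated T-SQL stays safe.
            if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
                raise ValueError(f"unsupported identifier: {name!r}")
        stmts.append(f"CREATE ROLE {role};")
        stmts.append(f"GRANT SELECT ON SCHEMA::{schema} TO {role};")
    return stmts


for stmt in schema_grants(SCHEMA_READERS):
    print(stmt)
```

Users would then be added to the matching role rather than being granted access table by table.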
If you combine schemas with creating tables/indexes on filegroups, you can place all the sales objects on a sales filegroup and the purchasing objects on a purchasing filegroup,
i.e. spreading the I/O load.
In short, I think schemas are an unknown and little-used feature.
Check out my blog article on this fact.
http://craftydba.com/?p=4326
I assume that you are talking about SQL Server. You can join and reference between tables in different schemas. I see schemas mostly used for visual organization and/or for managing object permissions (you can assign permissions at the schema level).
If you are worried about any negative effects of doing dbo.table vs custom.table - there aren't any that I imagine you would encounter.
Schemas are just collections of database objects. They are useful for maintaining separation of sets of objects.
There is always at least one schema. For SQL Server it is named dbo.
One implication of having multiple schemas is that you will have to manage permissions for the various schemas. This is usually done via a role that's associated with the schema.
Objects in one schema are available to objects from another, and there is no performance penalty in writing queries that use objects from multiple schemas.
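To show that a query can span two namespaces, here is a small runnable sketch. It uses SQLite, where each attached in-memory database plays the role of a schema. That substitution is an assumption of the example: SQL Server schemas live inside one database, and the table names here are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each attached in-memory database acts as a separate namespace,
# standing in for a SQL Server schema in this sketch.
conn.execute("ATTACH DATABASE ':memory:' AS sales")
conn.execute("ATTACH DATABASE ':memory:' AS purchasing")

conn.execute(
    "CREATE TABLE purchasing.vendor (vendor_id INTEGER PRIMARY KEY, name TEXT)"
)
conn.execute(
    "CREATE TABLE sales.order_line "
    "(order_id INTEGER PRIMARY KEY, vendor_id INTEGER, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO purchasing.vendor VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)
conn.executemany(
    "INSERT INTO sales.order_line VALUES (?, ?, ?)", [(1, 1, 5), (2, 2, 3)]
)

# A join across the two namespaces uses two-part names, just like
# Sales.SomeTable joined to Purchasing.SomeTable would in SQL Server.
rows = conn.execute(
    """
    SELECT o.order_id, v.name, o.qty
    FROM sales.order_line AS o
    JOIN purchasing.vendor AS v ON v.vendor_id = o.vendor_id
    ORDER BY o.order_id
    """
).fetchall()
print(rows)
```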

SQL Server: conventions for naming a schema

Consider a database server whose job today is to house one database. Likely the database will be moved in the future to another database instance which houses multiple databases & schemas.
Let's pretend the app/project is called Invoicer 2.0. The database is called AcmeInvoice. The database holds all the invoice, customer, and product information. Here's a diagram of the actors and their roles and behaviour.
The schema(s) will largely be used to easily assign permissions to roles. The added benefit here is that the objects aren't under dbo, and that the objects & permissions can be ported to another machine in the future.
Question
What conventions do you use when naming the schema?
Is it good form to name the schema the same as the database?
I would think that if your schema name ends up being the same as your database name, then you are just adding redundancy to your database. Find objects in your database that have a common scope or purpose and create a schema to reflect that scope. So, for example, if you have an entity for Invoices, and you have some supporting lookup tables for invoice states, etc., then put them all in an invoice schema.
As a general rule of thumb, I would try to avoid using a name that reflects the application name, database name or other concrete/physical things, because they can change, and instead find a name that conceptually represents the scope of the objects that will go into the schema.
Your comment states that "the schemas will largely be used to easily assign permissions to roles". Your diagram shows specific user types having access to some/all tables or some/all stored procs. I think organizing objects into schemas conceptually and organizing them into schemas from a security standpoint are conflicting goals. I am in favour of creating roles in SQL Server to reflect the types of users and granting those roles access to the specific objects that each user type needs, as opposed to granting the role or user access to the schema to build your security framework.
Why would you name the schema the same as the database? This means all database objects fall under the same schema. If this is the case, why have a schema at all?
Typically schemas are used to group objects within a common scope of activity or function. For example, given what you've described, you might have an Invoice schema, a Customer schema and a Product schema. All Invoice-related objects would go into the Invoice schema, all Customer-related objects would go into the Customer schema, and the same for Products.
We often will use a Common schema as well which includes objects that might be common to our entire application.
I would call the database AcmeInvoice (or another suitable name) and the schema Invoicer2.
My reasons are as follows: AcmeInvoice means I am grouping all of that application's objects/data together. It can therefore be moved as one unit to other machines (via backup/restore or detach/attach).
The schema would be Invoicer2. Applications change: maybe in the future you will have Invoicer21 (you would create a schema for it), or perhaps a reporting module or system (a Reports schema).
I find that the use of schemas allows me to separate data/procedures in one database into different groups, which makes it easier to administer permissions.

Postgresql - one database for everyone, or one-database per customer

I'm working on a web-based business application where each customer will need to have their own data (think basecamphq.com type model). For scalability and ease of upgrades, I'd prefer to have a single database where each customer gets a filtered version of the data. The problem is how to guarantee that they stay sandboxed to their own data. Trying to enforce it in code seems like a disaster waiting to happen. I know Oracle has a way to append a where clause to every query based on a login id, but does PostgreSQL have anything similar?
If not, is there a different design pattern I could use (like creating a view of each table for each customer that filters)?
Worst-case scenario, what is the performance/memory overhead of having 1000 databases of 100 MB each vs. having a single 1 TB database? I will need to provide backup/restore functionality on a per-customer basis, which is dead simple with one database per customer but quite a bit trickier if customers share a database.
You might want to look into adding Veil to your PostgreSQL installation.
Schemas plus inherited tables might work for this: create your master table, then inherit tables into per-customer schemas, each of which provides a default for the company ID or name field.
Set the permissions per schema for each customer and set the schema search path per user. Use the same table names in each schema so that the queries remain the same.
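A minimal sketch of that idea follows. It only generates the PostgreSQL statements for one customer and does not connect to a server. It assumes the master tables live in `public` and have a `company_id` column, and that a login role named after the customer already exists. The `customer_schema_ddl` helper and all table names are hypothetical.

```python
import re


def _ident(name: str) -> str:
    """Accept only plain lowercase identifiers so the generated SQL stays safe."""
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def customer_schema_ddl(customer, company_id, tables):
    """Build the per-customer schema, inherited tables, grants and search_path."""
    schema = _ident(customer)  # the schema and the role share the customer's name
    stmts = [f"CREATE SCHEMA {schema};"]
    for table in tables:
        t = _ident(table)
        # Child table inherits every column from the master table in public.
        stmts.append(f"CREATE TABLE {schema}.{t} () INHERITS (public.{t});")
        # Rows inserted through the child default to this customer's ID.
        stmts.append(
            f"ALTER TABLE {schema}.{t} "
            f"ALTER COLUMN company_id SET DEFAULT {int(company_id)};"
        )
    stmts.append(f"GRANT USAGE ON SCHEMA {schema} TO {schema};")
    stmts.append(
        f"GRANT SELECT, INSERT, UPDATE, DELETE "
        f"ON ALL TABLES IN SCHEMA {schema} TO {schema};"
    )
    # Unqualified table names now resolve to the customer's own schema.
    stmts.append(f"ALTER ROLE {schema} SET search_path = {schema};")
    return stmts


for stmt in customer_schema_ddl("acme", 42, ["invoices", "projects"]):
    print(stmt)
```

Because the table names are identical in every schema, the application's queries stay the same and only the connecting role decides whose rows they touch.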
