Can a Star Schema Contain Outriggers? - data-modeling

My understanding from Star Schema modelling is that dimensions must directly link to a fact table. If this is true, then a star schema can't contain Outrigger dimension table. So, is this statement true that if a schema contains Outrigger dimension, it is snowflake?
Thank you

Neither of the terms “star schema” or “snowflake” have precise definitions that are generally accepted. If you want to describe your model as a star schema, a star schema with outriggers or a snowflake then that is entirely up to you.

Related

Star Schema from multiple source tables

I am struggling in figuring out how to create a star schema from multiple source tables. I work at a trading firm so the data is related to user trading activity. The issue I am having is that our datasets do not have primary ids for every field that could be a dimension. Instead, we usually relate our data together using the combination of date and account number. Here is an example of 3 source tables...
I would like to turn this into a star schema, something that looks like ...
Is my only option to denormalize my source tables into one wide table (joining trades to position on account number and date, and joining the users table on account number), create keys for each dimension, then re normalizing it into the star schema? Are star schema's ever built from multiple source tables?
Star schemas are almost always created from multiple source tables.
The normal process is:
Populate your dimension tables
Create a temporary/virtual fact record using your source data
Using this fact record, look up the relevant dimension keys
Write the actual fact record to your target fact table
Data-warehousing is about query speed. The data-warehouse should not be concerned with data integrity. IT SHOULD NOT CLEAN OR CORRECT BAD DATA. It only needs to gather all the data together into a single record to present to the model for analysis. Denormalizing the data is how this is done.
In a star schema, dimensions do not know about each other and have no relationships with other dimensions. In a snowflake, dimensions are related to other dimensions. That is the primary difference between star and snowflake.
All the metadata options for events are rolled up into dimensions and used for slicing/filtering. All the measurable/calculation data for an event are in the event fact, along with a reference to the dimension(s) containing the relevant metadata. The Metadata/Dimension is reused across multiple fact records.
Based on the limited example you've provided, I'd suggest you research degenerate dimensions and junk dimensions. Your Trade and Position data may need to be turned into a fact and a dimension (degenerate), and some of your flag attributes may be best placed into a junk dimension.
You should also make sure your dimension keys are clear. You should not have multiple paths to a dimension (accountnumber: trade -> position -> user & trade -> user ) as that will cause inconsistent results when querying depending on which relationship you traverse.

Is it possible to normalize a one-to-many table schema further than entity level?

Note: this section is a little long-winded, so feel free to just read the bolded summary below.
For example, take the schema below. Could the 2 "Statistic" tables at the bottom be normalized further?
My initial reaction is to feel that it could be normalized further (having a Statistics table for each and every sport could result in a lot of tables), but if I were to do that, then what mechanism would enforce the requirements for each respective sport's statistics?
For example, a Baseball statistic MUST include values for TotalHits and TotalHomeRuns. If I keep the BaseballStatistic table the way it is, then I can set these columns to Not Null.
Thoughts? Maybe there is even a better way than this current schema.
My basic problem is that I need to associate different types (BaseballStatistic and FootballStatistic) to the same parent type (Game). Is there any way to do this other than what I have done here?
If it matters, I am using SQL Server.

Can a table in different schema have totally different rows?

Please help me to clarify with the following question about Oracle database:
Say a database has a table called Employee. Both SchemaA and SchemaB use employee table. SchemaA.Employee and SchemaB.Employee are totally different?
In terms of content (rows)
In terms of structures (Columns names and data types)
Please let me know. I use Oracle a lot under fixed Schema.
If any of the above is true, then under the same database, we can have developer schema, test schema, production Schema and training schema. Is that true?

ER diagram, is this allowed?

I have to create an ER diagram based on a relational schema.
There is a table of players, and a table of zones. A player can 'live' in many zones, and each zone is owned by one or more players.
I've come up with this simple ER diagram but I'm not sure having relationships going each way is allowed?
Cheers
Yes, that is a perfectly good Entity Relation Diagram. (I am not responding as to whether it makes sense or not: you still need to resolve the Relations and Cardinality.)
Using the correct terms helps people understand exactly what you are discussing, and which level you are discussing. Loose talk results in much more volume in the discussion, and time wasted in clarifying what you meant by which term. Not good for productive technical endeavours.
At this early stage, it is normal to model Entities and Relations (not Attributes), that's why it is called an ER diagram; we are nowhere near modelling the data. The Relations are relevant, and that's why you are detailing and evaluating their nature in the diamonds and Cardinality. The goal is to clarify the true Entities, and their Relations to each other. Many-to-many relations remain as relations. The ERD is purely Logical, there is no Physical.
Once you have some confidence with that, that you have gotten the Entities and Relations right, you move onto a Data Model (which includes Attributes). Still at a Logical level, the n::n relations remain as relations.
As you progress, you may show further detail, such as Domain for each Attribute. That's the DataType, but at the Logical level, just as the terms are Entity = Table and Attribute = Column, Domain = DataType.
.
When you get to the Physical level, the Data Model has Tables; Columns; DataTypes.
And n::n Relations are manifested as the Associative Tables.
.
The idea is, as long as you are working through the prescribed steps, at (1), the content in the diamonds will determine (expose) if they need to be stored, and the diamond is thus promoted to an Entity; otherwise it remains a Relation.
There is a junction table called lives-in in the relational schema I've been given. However, I thought when mapping a relational schema [back] to an ER diagram a junction table becomes a relationship?
The Relational term is Associative table.
Yes. If it is a pure n::n Table (containing nothing but the two FKs to the PKs of the parent Tables), at the ERD level, which is Logical only, it is a Relation.
If it has Columns other than the two FKs, it is an Entity.
Since there's a many-to-many relationship between [Players] and [Zones] you have to add a junction table (called for ex. [PlayersZones]). The notation itself is correct (Chen notation), though I prefer the Crow's Foot Notation.
I am not able to see your images (blocked!) so I'll just try to describe the "correct" design. If a player living in a zone doesn't necessarily mean they own it, you should have four tables:
PLAYER (playerid, <other fields>)
ZONE (zoneid, <other fields>
PLAYER_ZONE(playerid, lives_in_zoneid)
ZONE_OWNER (zoneid, owner_playerid)
Otherwise three tables would suffice.

What is the difference between a schema and a table and a database?

This is probably a n00blike (or worse) question. But I've always viewed a schema as a table definition in a database. This is wrong or not entirely correct. I don't remember much from my database courses.
schema -> floor plan
database -> house
table -> room
A relation schema is the logical definition of a table - it defines what the name of the table is, and what the name and type of each column is. It's like a plan or a blueprint. A database schema is the collection of relation schemas for a whole database.
A table is a structure with a bunch of rows (aka "tuples"), each of which has the attributes defined by the schema. Tables might also have indexes on them to aid in looking up values on certain columns.
A database is, formally, any collection of data. In this context, the database would be a collection of tables. A DBMS (Database Management System) is the software (like MySQL, SQL Server, Oracle, etc) that manages and runs a database.
In a nutshell, a schema is the definition for the entire database, so it includes tables, views, stored procedures, indexes, primary and foreign keys, etc.
This particular posting has been shown to relate to Oracle only and the definition of Schema changes when in the context of another DB.
Probably the kinda thing to just google up but FYI terms do seem to vary in their definitions which is the most annoying thing :)
In Oracle a database is a database. In your head think of this as the data files and the redo logs and the actual physical presence on the disk of the database itself (i.e. not the instance)
A Schema is effectively a user. More specifically it's a set of tables/procs/indexes etc owned by a user. Another user has a different schema (tables he/she owns) however user can also see any schemas they have select priviliedges on. So a database can consist of hundreds of schemas, and each schema hundreds of tables. You can have tables with the same name in different schemas, which are in the same database.
A Table is a table, a set of rows and columns containing data and is contained in schemas.
Definitions may be different in SQL Server for instance. I'm not aware of this.
Schema behaves seem like a parent object as seen in OOP world. so it's not a database itself. maybe this link is useful.
But, In MySQL, the two are equivalent. The keyword DATABASE or DATABASES
can be replaced with SCHEMA or SCHEMAS wherever it appears. Examples:
CREATE DATABASE <=> CREATE SCHEMA
SHOW DATABASES <=> SHOW SCHEMAS
Documentation of MySQL
SCHEMA & DATABASE terms are something DBMS dependent.
A Table is a set of data elements (values) that is organized using a model of vertical columns (which are identified by their name) and horizontal rows. A database contains one or more(usually) Tables . And you store your data in these tables. The tables may be related with one another(See here).
As per https://www.informit.com/articles/article.aspx?p=30669
The names of all objects must be unique within some scope. Every
database must have a unique name; the name of a schema must be unique
within the scope of a single database, the name of a table must be
unique within the scope of a single schema, and column names must be
unique within a table. The name of an index must be unique within a
database.
From the PostgreSQL documentation:
A database contains one or more named schemas, which in turn contain tables. Schemas also contain other kinds of named objects, including data types, functions, and operators. The same object name can be used in different schemas without conflict; for example, both schema1 and myschema can contain tables named mytable. Unlike databases, schemas are not rigidly separated: a user can access objects in any of the schemas in the database he is connected to, if he has privileges to do so.
There are several reasons why one might want to use schemas:
To allow many users to use one database without interfering with each other.
To organize database objects into logical groups to make them more manageable.
Third-party applications can be put into separate schemas so they do not collide with the names of other objects.
Schemas are analogous to directories at the operating system level, except that schemas cannot be nested.
Contrary to some of the above answers, here is my understanding based on experience with each of them:
MySQL: database/schema :: table
SQL Server: database :: (schema/namespace ::) table
Oracle: database/schema/user :: (tablespace ::) table
Please correct me on whether tablespace is optional or not with Oracle, it's been a long time since I remember using them.
As MusiGenesis put so nicely, in most databases:
schema : database : table :: floor plan : house : room
But, in Oracle it may be easier to think of:
schema : database : table :: owner : house : room
More on schemas:
In SQL 2005 a schema is a way to group objects. It is a container you can put objects into. People can own this object. You can grant rights on the schema.
In 2000 a schema was equivalent to a user. Now it has broken free and is quite useful. You could throw all your user procs in a certain schema and your admin procs in another. Grant EXECUTE to the appropriate user/role and you're through with granting EXECUTE on specific procedures. Nice.
The dot notation would go like this:
Server.Database.Schema.Object
or
myserver01.Adventureworks.Accounting.Beans
A Schema is a collection of database objects which includes logical structures too.
It has the name of the user who owns it.
A database can have any number of Schema's.
One table from a database can appear in two different schemas of same name.
A user can view any schema for which they have been assigned select privilege.
I try answering based on my understanding of the following analogy:
A database is like the house
In the house there are several types of rooms. Assuming that you're living in a really big house. You really don't want your living rooms, bedrooms, bathrooms, mezzanines, treehouses, etc. to look the same. They each need a blueprint to tell how to build/use them. In other words, they each need a schema to tell how to build/use a bathroom, for example.
Of course, you may have several bedrooms, each looks slightly different. You and your wife/husband's bedroom is slightly different from your kids' bedroom. Each bedroom is analogous to a table in your database.
A DBMS is like a butler in the house. He manages literally everything.
In oracle Schema is one user under one database,For example scott is one schema in database orcl.
In one database we may have many schema's like scott
Schemas contains Databases.
Databases are part of a Schema.
So, schemas > databases.
Schemas contains views, stored procedure(s), database(s), trigger(s) etc.
A schema is not a plan for the entire database. It is a plan/container for a subset of objects (ex.tables) inside a a database. This goes to say that you can have multiple objects(ex. tables) inside one database which don't neccessarily fall under the same functional category. So you can group them under various schemas and give them different user access permissions. That said, I am unsure whether you can have one table under multiple schemas. The Management Studio UI gives a dropdown to assign a schema to a table, and hence making it possible to choose only one schema. I guess if you do it with TSQL, it might create 2 (or multiple) different objects with different object Ids.
A database schema is a way to logically group objects such as tables, views, stored procedures etc. Think of a schema as a container of objects.
And tables are collections of rows and columns.
combination of all tables makes a db.

Resources