Star Schema or snowflake for derived fields. - data-modeling

We have a fact table that links to a customer dimension using the customer id field. It's working fine and the product owner is happy with it too. As part of the customer attributes we have the [SIC Code 2007](https://onsdigital.github.io/dp-classification-tools/standard-industrial-classification/ONS_SIC_hierarchy_view.html) and the ISO country code.
Now we have a new user story in the backlog to classify the customers based on a combination of those 2 fields. It's a straightforward table maintained by the user that says, for example, if SIC is 01.12 and Country is FR then classify the customer as "Farming EU".
In a 3NF world we would join the customer dimension with the classification table and be done. In a dimensional model we have studied 3 options:
A new table, which means moving to a snowflake schema.
Add the classification field to the customer dimension table. This means that if the user changes the classification (Farming EU to Farming France, for example) we have to reprocess the dimension table.
Create the classification as a dimension and move the SIC and country code to the fact table (or a surrogate key combining them). This gives the user great flexibility to modify those combinations.
Which do you think is the best approach, considering that the product is now in a releasable state but not yet in production?
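For illustration only, here is a minimal sketch of the first option (a separate, user-maintained classification table joined to the customer dimension on SIC code and country). It uses Python's built-in sqlite3 as a stand-in for the real warehouse; the table names, column names and sample rows are all hypothetical. It is not a recommendation of one option over the others, it just shows the mechanics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical customer dimension carrying the two source attributes.
    CREATE TABLE dim_customer (
        customer_id  INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        sic_code     TEXT NOT NULL,
        country_code TEXT NOT NULL
    );
    -- User-maintained lookup: one row per (SIC, country) combination.
    CREATE TABLE customer_classification (
        sic_code       TEXT NOT NULL,
        country_code   TEXT NOT NULL,
        classification TEXT NOT NULL,
        PRIMARY KEY (sic_code, country_code)
    );
    INSERT INTO dim_customer VALUES (1, 'Sample Farm', '01.12', 'FR');
    INSERT INTO dim_customer VALUES (2, 'Sample Software', '62.01', 'GB');
    INSERT INTO customer_classification VALUES ('01.12', 'FR', 'Farming EU');
""")

# LEFT JOIN so customers without a matching rule still show up.
QUERY = """
    SELECT c.customer_id, COALESCE(k.classification, 'Unclassified')
    FROM dim_customer AS c
    LEFT JOIN customer_classification AS k
      ON k.sic_code = c.sic_code AND k.country_code = c.country_code
    ORDER BY c.customer_id
"""

before = conn.execute(QUERY).fetchall()

# The user renames the class; no customer row is touched.
conn.execute(
    "UPDATE customer_classification SET classification = ? "
    "WHERE sic_code = ? AND country_code = ?",
    ("Farming France", "01.12", "FR"),
)
after = conn.execute(QUERY).fetchall()

print(before)
print(after)
```

With this layout a change to the classification is a one-row update in the lookup table, which is the trade-off against the second option, where the same change forces a reprocess of the dimension.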

Related

Creating a database schema from The Movie Database

I'm trying to create a database schema using information pulled from the themoviedb api.
I thought I was doing ok until I went to add in the television series, then I got really confused.
The TMDb API seems to treat television series and movies as completely separate things. It further divides television listings into series, seasons, and episodes.
For example there is a separate cast listing for television seasons (season regulars) and individual episodes (guest cast). I have no idea how to reflect all this in the database.
I've tried my best to model everything below, but I think there's something wrong somewhere. Please ignore the datatypes.
Role can be either writer, director, or actor.
http://imgur.com/a/1WKQB
Hi user2146821,
Your database design looks good, with the exception of how to display the relations between regular cast and guest cast members, as you've expressed.
Currently, you are approaching the scenario with a single join table between Movie, TV Seasons, TV Episodes and Person. This creates a table that can have neither a single-column primary key nor a correct composite primary key, because every record will contain nulls in the columns that do not apply to it.
In the linked image above, you can see another way of handling this relationship: you create three join tables, each with Person on one side and a corresponding table on the other (either Movie, TV Season or TV Episode). This eliminates nulls from the join tables, allows composite primary keys to be formed in the join tables and structures the database in a more meaningful way.
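As a rough sketch of the three-join-table layout described above, using Python's sqlite3 module: all table and column names here are made up for illustration (the original diagram is only available as an image), and including the role in the composite key is an assumption so that one person can be, say, both writer and actor on the same title.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person     (person_id  INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE movie      (movie_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tv_season  (season_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tv_episode (episode_id INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- One join table per parent, so no column is ever NULL
    -- and each table gets a proper composite primary key.
    CREATE TABLE movie_person (
        movie_id  INTEGER NOT NULL REFERENCES movie(movie_id),
        person_id INTEGER NOT NULL REFERENCES person(person_id),
        role      TEXT    NOT NULL CHECK (role IN ('writer', 'director', 'actor')),
        PRIMARY KEY (movie_id, person_id, role)
    );
    CREATE TABLE tv_season_person (      -- season regulars
        season_id INTEGER NOT NULL REFERENCES tv_season(season_id),
        person_id INTEGER NOT NULL REFERENCES person(person_id),
        role      TEXT    NOT NULL CHECK (role IN ('writer', 'director', 'actor')),
        PRIMARY KEY (season_id, person_id, role)
    );
    CREATE TABLE tv_episode_person (     -- guest cast
        episode_id INTEGER NOT NULL REFERENCES tv_episode(episode_id),
        person_id  INTEGER NOT NULL REFERENCES person(person_id),
        role       TEXT    NOT NULL CHECK (role IN ('writer', 'director', 'actor')),
        PRIMARY KEY (episode_id, person_id, role)
    );
""")

conn.execute("INSERT INTO person VALUES (1, 'Example Person')")
conn.execute("INSERT INTO tv_episode VALUES (10, 'Example Episode')")
conn.execute("INSERT INTO tv_episode_person VALUES (10, 1, 'actor')")

# The composite key rejects the same credit being entered twice.
try:
    conn.execute("INSERT INTO tv_episode_person VALUES (10, 1, 'actor')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```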

Where should I store repetitive data in Access?

I'm creating this little Access DB, for the HR department to store all data related to all the training sessions that the company organizes for all the employees.
So, I have a Training Session table with information like date, subject, place, observations, trainer, etc, and the unique ID number.
Then there's the Personnel table, with employee ID (which is also the unique table number), names and working department.
So, after that I need another table that keeps a record of all the attendees of each training session. And here's the question: should I use a table for that in the first place? Does it have to be one table for each training session to store the attendees?
I've used excel for quite some time now, but I'm very new to Access and databases (even small ones like this). Any information will be highly appreciated.
Thanks in advance!
It should be one table for persons, one table for trainings, and one for participation/attendance, to minimize (or best: avoid) repetition. Your tables should use primary and foreign keys, so that there are one-to-many relationships between trainings and attendances as well as people and attendances (the attendances table would then have a column referring to the person who attended, and another column referring to the training session).
Google "database normalization" for more detail and variations of that principle (https://en.wikipedia.org/wiki/Database_normalization).
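To make the three-table idea concrete, here is a small sketch. It uses Python's sqlite3 rather than Access, purely so it can be run as-is; the table names, column names and sample rows are invented for the example. The same structure can be built in Access with the table designer and the Relationships window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

conn.executescript("""
    CREATE TABLE personnel (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        department  TEXT
    );
    CREATE TABLE training_session (
        session_id   INTEGER PRIMARY KEY,
        session_date TEXT NOT NULL,
        subject      TEXT NOT NULL
    );
    -- One row per (session, attendee); a single table covers every session.
    CREATE TABLE attendance (
        session_id  INTEGER NOT NULL REFERENCES training_session(session_id),
        employee_id INTEGER NOT NULL REFERENCES personnel(employee_id),
        PRIMARY KEY (session_id, employee_id)
    );
    INSERT INTO personnel VALUES (1, 'Ana', 'Sales'), (2, 'Ben', 'IT');
    INSERT INTO training_session VALUES (1, '2024-01-15', 'Fire safety');
    INSERT INTO attendance VALUES (1, 1), (1, 2);
""")

# Who attended session 1?
attendees = conn.execute("""
    SELECT p.name
    FROM attendance AS a
    JOIN personnel AS p ON p.employee_id = a.employee_id
    WHERE a.session_id = ?
    ORDER BY p.name
""", (1,)).fetchall()

# The foreign key stops attendance being recorded for an unknown employee.
try:
    conn.execute("INSERT INTO attendance VALUES (1, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(attendees, orphan_rejected)
```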

How do I create a table in SQL Server that stores multiple values for one cell?

Suppose I have a table for purchase orders. One customer might buy many products. I need to store all these products and their relevant prices in a single record, such as an invoice format.
If you can change the db design, prefer to create another table called PO_products that has the PO_Id as the foreign key from the PurchaseOrder table. This would be more flexible and the right design for your requirement.
If for some reason you are hard pressed to store everything in a single cell (which I reiterate is not a good design), you can make use of the xml data type and store all of the product information as XML.
Note: Besides being bad design, there is a significant performance cost of storing the data as XML.
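A minimal sketch of the separate-table design, assuming hypothetical columns for PO_products (product name, quantity, and a unit price stored in cents) and using Python's sqlite3 in place of SQL Server so it is runnable anywhere:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PurchaseOrder (
        PO_Id    INTEGER PRIMARY KEY,
        customer TEXT NOT NULL
    );
    -- One row per product on the order instead of many values in one cell.
    CREATE TABLE PO_products (
        PO_Id            INTEGER NOT NULL REFERENCES PurchaseOrder(PO_Id),
        product          TEXT    NOT NULL,
        quantity         INTEGER NOT NULL,
        unit_price_cents INTEGER NOT NULL,
        PRIMARY KEY (PO_Id, product)
    );
    INSERT INTO PurchaseOrder VALUES (1, 'Example Customer');
    INSERT INTO PO_products VALUES (1, 'Widget', 2, 1500);
    INSERT INTO PO_products VALUES (1, 'Gadget', 1, 4000);
""")

# The "invoice format" is just a query over the line items.
invoice_lines = conn.execute(
    "SELECT product, quantity, unit_price_cents "
    "FROM PO_products WHERE PO_Id = ? ORDER BY product",
    (1,),
).fetchall()

(invoice_total_cents,) = conn.execute(
    "SELECT SUM(quantity * unit_price_cents) FROM PO_products WHERE PO_Id = ?",
    (1,),
).fetchone()

print(invoice_lines, invoice_total_cents)
```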
This is a typical example of an n-n relationship between customer and products.
Let's say 1 customer can have from 0 to N products and 1 product can be bought by 0 to N customers. You want to use a junction table to store every purchase order.
This junction table may contain the id of the purchase, the id of the customer and the id of the product.
https://en.wikipedia.org/wiki/Many-to-many_(data_model)

Structuring a cube with paths from a fact table to a dimension via two alternative intermediate dimensions

I'm unsure how to configure a cube in SSAS for a complex case that I can simplify as follows:
A fact table stores data about a Sale.
A dimension called Promotion records details of the marketing activity that generated the Sale
A dimension called Customer records details of the person who we sold to
We also have a table holding data about an Organisation
In some, but not all cases, a Promotion is targeted at an Organisation. There is an optional one-to-one relationship from Promotion to Organisation.
In some, but not all cases, a Customer is associated with an Organisation. There is an optional many-to-one relationship from Customer to Organisation.
We want to be able to analyse Sales by Organisation. For instance, if I report number of Sales by Organisation, the count for each Organisation should include both the sales through Promotions targeted at that Organisation and sales to Customers associated with that Organisation.
Note that with this data structure, each Sale may be associated with 0, 1 or 2 Organisations depending on the Promotion and the Customer. So if I report on number of sales by organisation, the grand total will not necessarily equal the total number of sales.
How would you structure the cube? I don't think it can work by simply setting up a referenced relationship from Sales->Promotion->Organisation and another from Sales->Customer->Organisation because SSAS won't know which path to use (and certainly won't know that it should aggregate across both paths together). Do I create two Organisation dimensions? Do I disconnect Organisation from the other dimensions and define some direct linkage between Organisation and Sales? Do I scrap the Organisation dimension and include organisation details as attributes in both Promotion and Customer?
You are correct in that SSAS won't be able to handle referencing the Organisation dimension through Promotion on one path, and through Customer on another path. This will give you an error when you try to build the cube.
Since each sale can be associated with 0, 1 or 2 organisations, I would recommend modelling this with a bridge-table (many-to-many) between the Organisation-dimension and the Sale-fact. This assumes that you have a unique ID on each Sale-transaction, so that you can create a fact-dimension on the Sale-fact (which need not be visible in the cube).
You construct the bridge-table in your ETL-flow. It should simply contain 2 columns, which relate the Organisation IDs to the Sale IDs. No Sale ID should have more than 2 Organisation IDs. Your final model should look something like this:
DimCustomer  <---.
                 |
              FactSale <---- BridgeSaleOrganisation ----> DimOrganisation
                 |
DimPromotion <---'
In the dimension-usage of SSAS, you set up a many-to-many relation between FactSale and DimOrganisation using BridgeSaleOrganisation as the intermediary table. Once this is in place, filtering sales by the Organisation-dimension will give you all sales belonging to that organisation via the bridge table, no matter whether they are through Promotion or Customer.
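The ETL step that fills the bridge table is not spelled out above, so here is one possible sketch of it in Python with sqlite3. The key column names (SaleID, OrganisationID and so on) and the sample rows are assumptions; the idea is simply to take the union of the Promotion path and the Customer path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimOrganisation (OrganisationID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE DimPromotion    (PromotionID INTEGER PRIMARY KEY, OrganisationID INTEGER);
    CREATE TABLE DimCustomer     (CustomerID  INTEGER PRIMARY KEY, OrganisationID INTEGER);
    CREATE TABLE FactSale        (SaleID INTEGER PRIMARY KEY, PromotionID INTEGER, CustomerID INTEGER);

    INSERT INTO DimOrganisation VALUES (1, 'Org A'), (2, 'Org B');
    INSERT INTO DimPromotion    VALUES (10, 1), (11, NULL);
    INSERT INTO DimCustomer     VALUES (20, 2), (21, 1), (22, NULL);
    INSERT INTO FactSale VALUES
        (100, 10, 20),  -- promotion reaches Org A, customer reaches Org B
        (101, 10, 21),  -- both paths reach Org A
        (102, 11, 22);  -- neither path reaches an organisation

    -- UNION (not UNION ALL) keeps one row when both paths hit the same organisation.
    CREATE TABLE BridgeSaleOrganisation AS
        SELECT s.SaleID, p.OrganisationID
        FROM FactSale AS s
        JOIN DimPromotion AS p ON p.PromotionID = s.PromotionID
        WHERE p.OrganisationID IS NOT NULL
        UNION
        SELECT s.SaleID, c.OrganisationID
        FROM FactSale AS s
        JOIN DimCustomer AS c ON c.CustomerID = s.CustomerID
        WHERE c.OrganisationID IS NOT NULL;
""")

bridge_rows = conn.execute(
    "SELECT SaleID, OrganisationID FROM BridgeSaleOrganisation "
    "ORDER BY SaleID, OrganisationID"
).fetchall()

# Roughly what the many-to-many relationship computes for a sales count.
sales_by_org = conn.execute("""
    SELECT o.Name, COUNT(DISTINCT b.SaleID)
    FROM DimOrganisation AS o
    JOIN BridgeSaleOrganisation AS b ON b.OrganisationID = o.OrganisationID
    GROUP BY o.Name
    ORDER BY o.Name
""").fetchall()

print(bridge_rows)
print(sales_by_org)
```

Sale 100 is counted under both organisations and sale 102 under neither, which matches the 0, 1 or 2 organisations per sale described in the question.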
For more examples of many-to-many modelling, check out this excellent paper by "the gurus", Marco Russo and Alberto Ferrari.

Help with many-to-many relation

I have a problem with a many-to-many relation in my tables, which is between an employee and instructor who work in a training centre. I cannot find the link between them, and I don't know how to get it. The employee fields are:
employee no.
employee name
company name
department job title
business area
mobile number
ext
ranking
The Instructors fields are
instructor name
institute
mobile number
email address
fees
In a many-to-many relationship, the relationships are stored in a 3rd table, something like
table EmployeeInstructor
EmployeeID
InstructorID
To find all the employees for a specific instructor, you'd use a join against all three tables.
Or more likely there will be classes involved --
Employee takes Class
Instructor teaches Class
so you'll have an EmployeeClass table,
an InstructorClass table,
and join through them. And Class needs to be unique, or else you'll need
Class is taught in Quarter on ClassSchedule
and end up joining EmployeeClassSchedule to InstructorClassSchedule.
This ends up being one of your more interesting relational designs pretty quickly. If you google for "Terry Halpin" and "Object Role Modeling", this is used as an illustrative situation in the tutorial.
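Here is a small sketch of the class-based variant (EmployeeClass and InstructorClass joined through the shared class), written with Python's sqlite3. The ID columns, the Class table layout and the sample rows are assumptions made for the example; the ClassSchedule refinement is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee   (EmployeeID   INTEGER PRIMARY KEY, EmployeeName   TEXT NOT NULL);
    CREATE TABLE Instructor (InstructorID INTEGER PRIMARY KEY, InstructorName TEXT NOT NULL);
    CREATE TABLE Class      (ClassID      INTEGER PRIMARY KEY, Title          TEXT NOT NULL);

    -- Employee takes Class
    CREATE TABLE EmployeeClass (
        EmployeeID INTEGER NOT NULL REFERENCES Employee(EmployeeID),
        ClassID    INTEGER NOT NULL REFERENCES Class(ClassID),
        PRIMARY KEY (EmployeeID, ClassID)
    );
    -- Instructor teaches Class
    CREATE TABLE InstructorClass (
        InstructorID INTEGER NOT NULL REFERENCES Instructor(InstructorID),
        ClassID      INTEGER NOT NULL REFERENCES Class(ClassID),
        PRIMARY KEY (InstructorID, ClassID)
    );

    INSERT INTO Employee   VALUES (1, 'Employee One'), (2, 'Employee Two');
    INSERT INTO Instructor VALUES (1, 'Instructor A');
    INSERT INTO Class      VALUES (1, 'Safety'), (2, 'Spreadsheets');
    INSERT INTO EmployeeClass   VALUES (1, 1), (1, 2), (2, 2);
    INSERT INTO InstructorClass VALUES (1, 1);
""")

# Employees linked to instructor 1 through any class that instructor teaches.
employees_of_instructor = conn.execute("""
    SELECT DISTINCT e.EmployeeName
    FROM InstructorClass AS ic
    JOIN EmployeeClass   AS ec ON ec.ClassID = ic.ClassID
    JOIN Employee        AS e  ON e.EmployeeID = ec.EmployeeID
    WHERE ic.InstructorID = ?
    ORDER BY e.EmployeeName
""", (1,)).fetchall()

print(employees_of_instructor)
```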
First of all, you will need a unique key in both tables. The employee number may work for the employee table, but you will need another for the instructor table. Personally, I tend to use auto incrementing identity fields called ID in my tables. This is the primary key.
Second, create a new table, InstructorEmployee. This table has two columns, InstructorID and EmployeeID. Both fields should be indexed. Now you can create an association between any Employee and any Instructor by creating a record which contains the two IDs.

Resources