How to design a database model for a large data warehouse? - data-modeling

Lets say I design a database model for an online seller such as Amazon:
Next, to create the database model for the larger data warehouse, I flatten the Order and OrderDetails tables, and flatten the Product and Vendor tables:
I do this to apply the concept of designing models for parent-child applications, as described here http://bit.ly/1bOuOXQ
The data in the data warehouse tables becomes repetitive:
Several values such as $76.30 for OrderTotal repeat on each row, is this correct? Is the model correct?

Each row represents a match for each product, the corresponding order subtotal and total will be identical for each row with a specific orderid.
The result is as expected for your model definition.

Yes.
OrderTotal is the same for all records, so it is duplicated when you put everything in a single spreadsheet.
(Of course, in a relational database you are keep data in several tables, so it isn't duplicated.)

Related

DW Design (PO and Invoices)

I have to build a DW to store PO and Invoice data:
An Invoice has a header and a list of items
A PO has a header and a list of items.
An invoice can be related to zero or more POs
A PO can be related to one or more invoices.
How is the recommended way to design this in star schema?
Designing a DW involves understanding multiple aspects before having a model.
What is the frequency of data refresh.
What is the volume of data.
Which columns need to be indexed. Also, which index will help you better.
The queries written on the tables. Are the queries aggregates? or are they straight select statements.
What is your history preservation strategy.
The data types of every column you need. You need to think about cross platform query executions...
So on and so forth..
You will need to deep dive into it. Just creating tables with FK will help now, but over the time when data volume increases it will be a bottleneck.
You have a problem in that you are modelling data, not process.
Star schemas are based on a business process, not an entity relationship.
What are you trying to model? What is the grain of the model?
I'll go out on a limb, and say that you're probably modelling sales. Have one fact: Sale. If you need order-specific information, consider whether it is part of an Order dimension, or if it should be carried as degenerate dimensions and/or measures in the Sale fact.
Create a Invoice_Header_Fact and a Invoice_LineItem_Fact. (This can be denormalized and merged in one table too)
Use Order_Key from Header Fact in LineItem Fact to associate it to lineitems
Create a PO_Header_Fact and a PO_LineItem_Fact.
Use PO_Key from Header Fact in LineItem Fact to associate it to lineitems
Create a bridge/xref table to maintain many to many relationship between PO and Invoices.
Hope this helps!

Laravel DB design: how to model different types of similar entity

First of all I have to mention that I am modernising our ERP system that is build in-house. It handles everything from purchasing of parts, sale orders, quotes and inventory to invoicing and statistical data. The system is web based and heavily dependent on ORM. EloquentORM will be used in the redesign.
My main question is about the data model of certain entities that are very similar. Currently three of most widely interconnected entities in the app are: Orders, Products and Invoices.
1. Orders
In current DB design I have one big orders table in which there is a order_type attribute to distinct between different order types: Purchase orders, Sale orders, Quotes and Service orders. About 80% of fields are common to each order type and there are some specific fields for each order types. Currently at ~15k records.
2. Products
Similarly I have one big products table with an attribute product_type to distinct between different product types: Finished products, Services, Assemblies and Parts. Again there is a fair % of fields that are common throughout all product types and some that are specific to different product type Currently at ~7k records.
3. Invoices
Again one table invoices with invoice_type attribute to distinct between 4 invoice types: Issued invoices (for things we sell), Received invoices (for things we buy), Credit notes and Avans Invoices. More or less all invoice types use the same fields. Currently at ~15k records.
I am now trying to decide which is the optimal way for this kind of DB model. I see three options:
1. Single Table Inheritance
Leave as is, everything in the same table. It feels kind of awkward to always filter records like where order_type = 'Sale order' to display right orders in the right place in GUI... Also when doing sale and purchase analytics I need to include the same where condition to fetch right orders. Seems wrong!
2. Class Table Inheritance
Have a master tables orders, products and invoices with common set of fields between each of entity types and then one-to-one child relation tables for every different type of each entity: sales_orders, purchase_orders, quote_orders, finished_products, reseller_products, part_products, assembly_products, received_invoices and issued_invoices with FK in each of the child tables to master table... This seems like a good idea but handling that with ORM brings in a little more complexity...
In this method I have a questions which FK should be used around. For example each invoice can belong to one order. Received invoice will go with Purchase order and issued invoice will go with Sale order. Should the master orders table's PK be used as a FK in the master invoices table to relate these entities, or should the child sale_orders PK be used in the child issued_invoices?
3. Concrete Table Inheritance
Having completely separated tables for every type of each entity. This would avoid me having parent->child relationship between master table but would result in a lot of similar attributes in each table...
What would be the best approach? I am aiming at ease of use in EloquentORM and also speed and scalability for the future.

Differentiation between 'Entity' and 'Table'

Can someone tell me the easy way to explain the differentiation between an entity and a table in database?
Entity is a logical concept of relational database model. And table is used to express it, but there is a slight difference. Table expresses not only entities, but also relations.
For example, assume that you want to make a database of projects and employees of a company. Entity is a unit of information that has meanings by itself. In this case, there will be two entities - "Project" and "Employee". Each entity has its own attributes.
In relational DB model, there is another idea, 'relation'. If employees participate in several projects, then we can say that there is a relation with a name 'works_on'.
Sometimes, relation can have its own attribute. In this case, 'works_on' relation can have attribute 'start_date' and so on. And if this relation is M:N relation(Like this case: In project 1, there are 5 employees. Employee A works on two projects.), then you have to make an extra table to express this relation. (If you don't make an extra table when the relation is M:N, then you have to insert too many duplicated rows into both 'Project' and 'Employee' table.)
CREATE TABLE works_on(
employee,
project_id,
start_date
)
An entity resides in a table, it is a single set of information, i.e: if you have a database of employees, then an employee is an entity. A table is a group of fields with certain parameters.
Basically everything is stored in a table, entities goes into tables.
In a relational database the concept is the same. An entity is a table.
In OOP (Oriented Object Programming) there is a nice article in Oracle docs:
In general terms, entity objects encapsulate the business policy and
data for
the logical structure of the business, such as product lines,
departments, sales, and regions
business documents, such as invoices, change orders, and service
requests
physical items, such as warehouses, employees, and equipment
Another way of looking at it is that an entity object stores the
business logic and column information for a database table (or view,
synonym, or snapshot). An entity object caches data from a database
and provides an object-oriented representation of it.
Depending on how you want to work, you can create entity objects from
existing database tables (reverse generation) or define entity objects
and use them to create database tables (forward generation).
There is little difference between an entity and a table, but first we should define the concepts of “tuple” and “attribute”. I want to express those concepts with a table.
Here is an example of a table. As you can see it has some columns. Each column represents an attribute and each line represents a tuple(row). This is the employee table. In this example, the table is being used for representing an entity which is employee. However, if we create a new table named superior to specify the superiority relationship between those employees, it would be a table that represents a relation. In summary, we can use tables for representing both the entities and relations so entities and tables are not the same.

How can I create a hierarchy in SSAS?

I have the table order with following fields:
ID
Serial
Visitor
Branch
Company
Assume there are relations between Visitor, Branch and Company in the database. But every visitor can be in more Branch. How can I create a hierarchy between these three fields for my order table.
How can I do that?
You would need to create a denormalised dimension table, with the distinct result of the denormalisation process of the table order. In this case, you would have many rows for the same visitor. One for each branch.
In your fact table, the activity record which would have BranchKey in the primary key, would reference this dimension. This obviously would be together with the VisitorKey...
Then in SSAS you would need to build the hierarchy, and set the relationships between the keys... When displaying this data in a client, such as excel, you would drag the hierarchy in the rows, and when expanding, data from your fact would fit in according to the visitors branch...
With regards to dimensions, it's important to set relationships between the attributes, as this will give you a massive performance gain when processing the dimension, and the cube. Take a look at this article for help regarding that matter http://www.bidn.com/blogs/DevinKnight/ssis/1099/ssas-defining-attribute-relationships-in-2005-and-2008. In this case it's the same approach also for '12.

Basic questions regarding Data Warehousing

I'm wanting to use OLAP cubes and have to first design a data warehouse. I am going for the star-schema. I'm a little confused about how to convert from a normal database to a data warehouse, especially with regards to foreign keys between dimension tables. I know a fact table has foreign keys to dimensions, but do dimensions have foreign keys between them? For example, what do I need to do with the following 2 examples:
TABLE: Airports
COLUMNS: Id, Name, Code, CityId
When I make the Airports dimension, do I remove CityId and put the City Name instead? Or what?
TABLE: Regions
COLUMNS: Id, Name, RegionType, ParentId
The question for this one is mostly the same, but a bit more complex, because here ParentId refers to the same table (Regions).. example: a City can refer to a parent Country record. How do I translate these over to a data warehouse star schema?
Lastly, regarding measures, those go on the fact table, right? I think I will likely need multiple fact tables. Is that normal? Does one fact table translate to one OLAP cube? Or what?
You want to include city within your airport dimension. You are intentionally flattening out your normalised schema to aid the speed of the dimensional model which can seem counter intuitive if you are coming from transactional development.
With regards to the perennial child relationship, you want the parented to be translated into the surrogate of the region record. Ssas will provide the functionality to relate parent child records when you are designing your cube.
Multiple facts are not unusual, but unless the fact data is completely unrelated, there is no need to separate them into different cubes. The requirement for multiple facts will be driven by having data at a different grain. Keep all of you metrics (I.e. Flights) together, but you would separate out flight metrics from food sale metrics
you not converting to data warehouse, you are creating new data warehouse with few dimension and 1 (at least) Fact table. dimension tables are loaded first and you DO NOT want to change id with name.
you need additional key for each dimension table. once you load dimensions, I usually use ssis package to load fact table.(either incremental load or you can truncate fact table each time before you load with new data( depends what you need) ...

Resources