How can I create a hierarchy in SSAS? - sql-server

I have the table order with following fields:
ID
Serial
Visitor
Branch
Company
Assume there are relations between Visitor, Branch and Company in the database, but a visitor can belong to more than one Branch. How can I create a hierarchy between these three fields for my order table?

You would need to create a denormalised dimension table holding the distinct Visitor, Branch and Company combinations from the order table. In this case you would have many rows for the same visitor, one for each branch.
In your fact table, the activity record, which would have BranchKey in its primary key, would reference this dimension. This would obviously be together with the VisitorKey...
Then in SSAS you would build the hierarchy and set the relationships between the keys. When displaying this data in a client such as Excel, you would drag the hierarchy onto the rows, and when expanding it, data from your fact table would fall under the visitor's branch.
With regard to dimensions, it's important to set relationships between the attributes, as this will give you a massive performance gain when processing the dimension and the cube. Take a look at this article for help on that matter: http://www.bidn.com/blogs/DevinKnight/ssis/1099/ssas-defining-attribute-relationships-in-2005-and-2008. The same approach also applies to SSAS 2012.
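As a rough illustration of the denormalised dimension described above, the sketch below builds one dimension row per distinct Visitor/Branch/Company combination and swaps those columns in the order rows for a surrogate key. The sample data and the names `build_dimension`, `build_fact`, `dim_visitor_branch` and `fact_order` are hypothetical, and in practice this step would normally be done in SQL or an ETL tool rather than in Python.

```python
# Hypothetical order rows: (order_id, serial, visitor, branch, company).
orders = [
    (1, "S-001", "Alice", "North", "Acme"),
    (2, "S-002", "Alice", "South", "Acme"),
    (3, "S-003", "Bob", "North", "Acme"),
    (4, "S-004", "Alice", "North", "Acme"),
]

def build_dimension(rows):
    """Map each distinct (visitor, branch, company) combination to a surrogate key."""
    dimension = {}
    for _, _, visitor, branch, company in rows:
        combo = (visitor, branch, company)
        if combo not in dimension:
            dimension[combo] = len(dimension) + 1  # next surrogate key
    return dimension

def build_fact(rows, dimension):
    """Replace the three descriptive columns with the dimension's surrogate key."""
    return [
        (order_id, serial, dimension[(visitor, branch, company)])
        for order_id, serial, visitor, branch, company in rows
    ]

dim_visitor_branch = build_dimension(orders)
fact_order = build_fact(orders, dim_visitor_branch)
```

Alice appears twice in the dimension, once per branch, which is the "many rows for the same visitor" effect the answer mentions.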

Related

Star Schema from multiple source tables

I am struggling in figuring out how to create a star schema from multiple source tables. I work at a trading firm so the data is related to user trading activity. The issue I am having is that our datasets do not have primary ids for every field that could be a dimension. Instead, we usually relate our data together using the combination of date and account number. Here is an example of 3 source tables...
I would like to turn this into a star schema, something that looks like ...
Is my only option to denormalize my source tables into one wide table (joining trades to position on account number and date, and joining the users table on account number), create keys for each dimension, and then renormalize it into the star schema? Are star schemas ever built from multiple source tables?
Star schemas are almost always created from multiple source tables.
The normal process is:
Populate your dimension tables
Create a temporary/virtual fact record using your source data
Using this fact record, look up the relevant dimension keys
Write the actual fact record to your target fact table
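The four steps above can be sketched roughly as follows. The source tables in the question are not shown, so the `users` and `trades` rows, the column names, and the helpers `load_dimension` and `load_facts` are all assumptions made for illustration; a real load would run in an ETL tool or SQL against the warehouse.

```python
from datetime import date

# Hypothetical source rows; the real source tables are not shown in the question.
users = [{"account_number": "A1", "name": "Ann"}, {"account_number": "A2", "name": "Raj"}]
trades = [
    {"account_number": "A1", "trade_date": date(2020, 1, 2), "quantity": 100},
    {"account_number": "A2", "trade_date": date(2020, 1, 2), "quantity": 50},
    {"account_number": "A1", "trade_date": date(2020, 1, 3), "quantity": 25},
]

def load_dimension(natural_keys):
    """Step 1: assign a surrogate key to each distinct natural key."""
    return {nk: sk for sk, nk in enumerate(sorted(set(natural_keys)), start=1)}

dim_account = load_dimension(u["account_number"] for u in users)
dim_date = load_dimension(t["trade_date"] for t in trades)

def load_facts(source_rows):
    """Steps 2-4: take each source row, look up its dimension keys, emit a fact row."""
    facts = []
    for row in source_rows:  # step 2: the temporary fact record
        facts.append({
            "account_key": dim_account[row["account_number"]],  # step 3: key lookups
            "date_key": dim_date[row["trade_date"]],
            "quantity": row["quantity"],  # the measure
        })  # step 4: this dict stands in for the row written to the fact table
    return facts

fact_trade = load_facts(trades)
```

Note that the combination of date and account number never needs its own ID in the source: each part is resolved to a dimension key separately.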
Data-warehousing is about query speed. The data-warehouse should not be concerned with data integrity. IT SHOULD NOT CLEAN OR CORRECT BAD DATA. It only needs to gather all the data together into a single record to present to the model for analysis. Denormalizing the data is how this is done.
In a star schema, dimensions do not know about each other and have no relationships with other dimensions. In a snowflake, dimensions are related to other dimensions. That is the primary difference between star and snowflake.
All the metadata options for events are rolled up into dimensions and used for slicing/filtering. All the measurable/calculation data for an event are in the event fact, along with a reference to the dimension(s) containing the relevant metadata. The Metadata/Dimension is reused across multiple fact records.
Based on the limited example you've provided, I'd suggest you research degenerate dimensions and junk dimensions. Your Trade and Position data may need to be turned into a fact and a dimension (degenerate), and some of your flag attributes may be best placed into a junk dimension.
You should also make sure your dimension keys are clear. You should not have multiple paths to a dimension (accountnumber: trade -> position -> user & trade -> user ) as that will cause inconsistent results when querying depending on which relationship you traverse.

DW Design (PO and Invoices)

I have to build a DW to store PO and Invoice data:
An Invoice has a header and a list of items
A PO has a header and a list of items.
An invoice can be related to zero or more POs
A PO can be related to one or more invoices.
What is the recommended way to design this as a star schema?
Designing a DW involves understanding multiple aspects before having a model.
What is the frequency of data refresh?
What is the volume of data?
Which columns need to be indexed, and which kind of index will help you most?
What queries will be written against the tables? Are they aggregates, or straight select statements?
What is your history preservation strategy?
What are the data types of every column you need? You need to think about cross-platform query execution...
And so on.
You will need to take a deep dive into it. Just creating tables with FKs will help now, but over time, as data volume increases, it will become a bottleneck.
You have a problem in that you are modelling data, not process.
Star schemas are based on a business process, not an entity relationship.
What are you trying to model? What is the grain of the model?
I'll go out on a limb, and say that you're probably modelling sales. Have one fact: Sale. If you need order-specific information, consider whether it is part of an Order dimension, or if it should be carried as degenerate dimensions and/or measures in the Sale fact.
Create an Invoice_Header_Fact and an Invoice_LineItem_Fact. (These can also be denormalized and merged into one table.)
Use Order_Key from Header Fact in LineItem Fact to associate it to lineitems
Create a PO_Header_Fact and a PO_LineItem_Fact.
Use PO_Key from Header Fact in LineItem Fact to associate it to lineitems
Create a bridge/xref table to maintain many to many relationship between PO and Invoices.
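A minimal sketch of the bridge table idea, using an in-memory SQLite database so that it runs anywhere. The table and column names (`po_header_fact`, `invoice_header_fact`, `po_invoice_bridge`, `po_key`, `invoice_key`) and the sample rows are assumptions for illustration, and the line-item facts are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE po_header_fact      (po_key INTEGER PRIMARY KEY, po_number TEXT);
CREATE TABLE invoice_header_fact (invoice_key INTEGER PRIMARY KEY, invoice_number TEXT);
-- Bridge: one row per (PO, invoice) pair, which is what allows many-to-many.
CREATE TABLE po_invoice_bridge (
    po_key      INTEGER REFERENCES po_header_fact(po_key),
    invoice_key INTEGER REFERENCES invoice_header_fact(invoice_key),
    PRIMARY KEY (po_key, invoice_key)
);
INSERT INTO po_header_fact VALUES (1, 'PO-1'), (2, 'PO-2');
INSERT INTO invoice_header_fact VALUES (10, 'INV-10'), (11, 'INV-11'), (12, 'INV-12');
INSERT INTO po_invoice_bridge VALUES (1, 10), (1, 11), (2, 11);
""")

def invoices_for_po(po_number):
    """Return the invoice numbers linked to a PO through the bridge table."""
    rows = conn.execute("""
        SELECT i.invoice_number
        FROM po_header_fact p
        JOIN po_invoice_bridge b ON b.po_key = p.po_key
        JOIN invoice_header_fact i ON i.invoice_key = b.invoice_key
        WHERE p.po_number = ?
        ORDER BY i.invoice_number
    """, (po_number,)).fetchall()
    return [r[0] for r in rows]
```

Here INV-11 is shared by both POs and INV-12 has no bridge rows at all, which covers the "zero or more POs" case from the question.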
Hope this helps!

Designing a data model that incorporates logical operators

I am new to data modeling and I'm having trouble coming up with a data model that can store logic.
The data model would be used to store location and marketing attributes.
When a customer visits one of the company's websites, they would enter in their zip code, and based on their location the attributes would be used to arrange the online catalog of items.
The catalog of items would be separate from the database, so the data model would only produce the output of attributes used to arrange the items. Each item in the catalog has attributes such as ItemNumber, Price, Condition, Manufacture, and marketing segments (Age:Adult, Education: College, Income:High, etc.).
**For example:**
**Input zip code**: 90210
**Output Attributes**: (ItemNumber:123456, Segment:HighIncome, Condition:New)
This example is saying for zip 90210, first show item #123456, followed by all of the items with the HighIncome segment, and then display all of the non-refurbished items.
So far I have 2 tables with a many to many relationship and I would like to add an additional table(s) so I can incorporate logic (AND & OR).
The first table would have location and other information about which of the company's sites the user is on.
Table Location(
Location_Unique_Identifier number
ZipCode varchar2
State varchar2
Site varchar2
..
)
The second table would have the attributes types (Manufacture, Price, Condition, etc.) and the attribute values (IBM, 10.00, Refurbished, etc.).
Table Attributes(
Attribute_Unique_Identifier number
Attribute_Type varchar2
Attribute_Value varchar2
..
..
)
In-between these two tables to break up the many to many relationship I would add the logic table. This table should allow me to output
item#123456 AND (item#768900 OR Condition:New)
The problem I am having with the logic table is trying to make it flexible enough to handle an unknown amount of AND/ORs and to handle the grouping.
This is a typical scenario of joining two (or more) tables together to apply AND/OR/XOR or some other logical operation.
The best choice is to build a materialized view that denormalizes the attributes from multiple tables into one table (this table is called a view).
In your case, the view may be:
table location_join_attributes{
number,
zipcode,
state,
site,
Manufacture,
Price,
Condition,
......
}
Then you can run your logical statement against this table/view, for example (modified from your example):
item#123456 OR (item#768900 AND Condition:New) AND (more condition)
Without this view, the operation would first fetch all the records that have item#768900, and then filter against the second table to find which of them have Condition:New. That takes a long time, and if the condition is complex, the performance is terrible.
For quick queries, you should build secondary indexes on the columns you filter on.
On the scalability side, if your business logic changes, you can build a new view and discard the old one. The original tables do not change, which is one of the advantages of a materialized view.
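To make the idea of running a grouped AND/OR statement over denormalized rows concrete, here is a small sketch. It is not part of the answer's design: the nested-tuple rule format, the `matches` helper and the sample rows are all assumptions, chosen only to show that grouping and an arbitrary number of AND/ORs can be evaluated once the attributes sit side by side in one row.

```python
# Hypothetical denormalized rows, standing in for rows of the joined view.
rows = [
    {"ItemNumber": "123456", "Condition": "Refurbished"},
    {"ItemNumber": "768900", "Condition": "New"},
    {"ItemNumber": "768900", "Condition": "Refurbished"},
]

def matches(row, expr):
    """Evaluate a nested rule: ("EQ", column, value) or ("AND"/"OR", sub-rules...)."""
    op = expr[0]
    if op == "EQ":
        _, column, value = expr
        return row.get(column) == value
    if op == "AND":
        return all(matches(row, sub) for sub in expr[1:])
    if op == "OR":
        return any(matches(row, sub) for sub in expr[1:])
    raise ValueError(f"unknown operator: {op}")

# item#123456 OR (item#768900 AND Condition:New)
rule = (
    "OR",
    ("EQ", "ItemNumber", "123456"),
    ("AND", ("EQ", "ItemNumber", "768900"), ("EQ", "Condition", "New")),
)

selected = [r for r in rows if matches(r, rule)]
```

In a database the same rule would become a WHERE clause on the view; the nesting of the tuples plays the role of the parentheses.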

How can I store an indefinite amount of stuff in a field of my database table?

Here's a simple version of the website I'm designing: Users can belong to one or more groups. As many groups as they want. When they log in they are presented with the groups they belong to. Ideally, in my Users table I'd like an array or something that is unbounded to which I can keep on adding the IDs of the groups that user joins.
Additionally, although I realize this isn't necessary, I might want a column in my Group table which has an indefinite amount of user IDs which belong in that group. (side question: would that be more efficient than getting all the users of the group by querying the user table for users belonging to a certain group ID?)
Does my question make sense? Mainly I want to be able to fill a column up with an indefinite list of IDs... The only way I can think of is making it like some super long varchar and having the list JSON encoded in there or something, but ewww
Please and thanks
Oh and it's a MySQL database (my website is in PHP), but after 2 years of PHP development I've recently decided PHP sucks and I hate it and ASP.NET web applications are the only way for me, so I guess I'll be implementing this on whatever kind of database I'll need for that.
Your intuition is correct; you don't want to have one column of unbounded length just to hold the user's groups. Instead, create a table such as user_group_membership with the columns:
user_id
group_id
A single user_id could have multiple rows, each with the same user_id but a different group_id. You would represent membership in multiple groups by adding multiple rows to this table.
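A minimal sketch of the membership table, using an in-memory SQLite database so it can be run as-is. The `user_group_membership` table and its two columns come from the answer; the `users` and `site_groups` tables, the sample rows, and the two helper functions are assumptions added for illustration (the question's real database is MySQL, where the same DDL idea applies).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users       (user_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE site_groups (group_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE user_group_membership (
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    group_id INTEGER NOT NULL REFERENCES site_groups(group_id),
    PRIMARY KEY (user_id, group_id)   -- one row per membership, no duplicates
);
INSERT INTO users VALUES (1, 'ann'), (2, 'raj');
INSERT INTO site_groups VALUES (10, 'Chess'), (20, 'Hiking');
INSERT INTO user_group_membership VALUES (1, 10), (1, 20), (2, 20);
""")

def groups_for_user(user_id):
    """Names of every group the user belongs to."""
    rows = conn.execute(
        "SELECT g.name FROM user_group_membership m "
        "JOIN site_groups g ON g.group_id = m.group_id "
        "WHERE m.user_id = ? ORDER BY g.name", (user_id,)).fetchall()
    return [name for (name,) in rows]

def users_in_group(group_id):
    """Names of every user in the group, read from the same membership table."""
    rows = conn.execute(
        "SELECT u.name FROM user_group_membership m "
        "JOIN users u ON u.user_id = m.user_id "
        "WHERE m.group_id = ? ORDER BY u.name", (group_id,)).fetchall()
    return [name for (name,) in rows]
```

Both directions of the question (a user's groups, a group's users) are answered from the one table, so there is no need for a second list column on the group side.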
What you have here is a many-to-many relationship. A "many-to-many" relationship is represented by a third, joining table that contains both primary keys of the related entities. You might also hear this called a bridge table, a junction table, or an associative entity.
You have the following relationships:
A User belongs to many Groups
A Group can have many Users
In database design, this might be represented with three tables: User, Group, and a joining table UserGroup that holds a user ID and a group ID on each row.
This way, a UserGroup represents any combination of a User and a Group without the problem of having "infinite columns."
If you store an indefinite amount of data in one field, your design does not conform to First Normal Form (1NF). 1NF is the first step in a design process called data normalization. Data normalization is a major aspect of database design. Normalized design is usually good design, although there are some situations where a different design pattern might be better adapted.
If your data is not in 1NF, you will end up doing sequential scans for some queries where a normalized database would be accessed via a quick lookup. For a table with a billion rows, this could mean a delay of an hour rather than a few seconds. With the data in 1NF, each item of data can be reached through a direct, indexable lookup path.
As other responders have indicated, such a design will involve more than one table, to be joined at retrieval time. Joining takes some time, but it's tiny compared to the time wasted in sequential scans, if the data volume is large.

Basic questions regarding Data Warehousing

I'm wanting to use OLAP cubes and have to first design a data warehouse. I am going for the star-schema. I'm a little confused about how to convert from a normal database to a data warehouse, especially with regards to foreign keys between dimension tables. I know a fact table has foreign keys to dimensions, but do dimensions have foreign keys between them? For example, what do I need to do with the following 2 examples:
TABLE: Airports
COLUMNS: Id, Name, Code, CityId
When I make the Airports dimension, do I remove CityId and put the City Name instead? Or what?
TABLE: Regions
COLUMNS: Id, Name, RegionType, ParentId
The question for this one is mostly the same, but a bit more complex, because here ParentId refers to the same table (Regions).. example: a City can refer to a parent Country record. How do I translate these over to a data warehouse star schema?
Lastly, regarding measures, those go on the fact table, right? I think I will likely need multiple fact tables. Is that normal? Does one fact table translate to one OLAP cube? Or what?
You want to include the city within your airport dimension. You are intentionally flattening out your normalised schema to aid the speed of the dimensional model, which can seem counterintuitive if you are coming from transactional development.
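A small sketch of that flattening, following this answer's suggestion of carrying the city on the airport dimension. The sample rows, the `cities` lookup, the `AirportKey` surrogate column and the `build_airport_dimension` helper are assumptions for illustration; the source `Id` is kept as a business key so rows can still be traced back.

```python
# Hypothetical source data for the Airports table and the cities it points to.
cities = {1: "London", 2: "Paris"}
airports = [
    {"Id": 10, "Name": "Heathrow", "Code": "LHR", "CityId": 1},
    {"Id": 11, "Name": "Gatwick", "Code": "LGW", "CityId": 1},
    {"Id": 12, "Name": "Charles de Gaulle", "Code": "CDG", "CityId": 2},
]

def build_airport_dimension(airports, cities):
    """Flatten each airport and its city into one dimension row with a surrogate key."""
    dimension = []
    for surrogate_key, airport in enumerate(airports, start=1):
        dimension.append({
            "AirportKey": surrogate_key,              # surrogate key used by fact tables
            "AirportId": airport["Id"],               # original source ID (business key)
            "Name": airport["Name"],
            "Code": airport["Code"],
            "CityName": cities[airport["CityId"]],    # city flattened into the dimension
        })
    return dimension

dim_airport = build_airport_dimension(airports, cities)
```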
With regard to the parent-child relationship, you want ParentId to be translated into the surrogate key of the parent region record. SSAS provides the functionality to relate parent-child records when you are designing your cube.
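The ParentId translation can be sketched like this. The sample regions and the names `RegionKey`, `ParentRegionKey` and `build_region_dimension` are hypothetical; the point is only that the self-reference is kept, but expressed in surrogate keys rather than source IDs.

```python
# Hypothetical rows from the source Regions table.
regions = [
    {"Id": 100, "Name": "France", "RegionType": "Country", "ParentId": None},
    {"Id": 200, "Name": "Paris", "RegionType": "City", "ParentId": 100},
    {"Id": 300, "Name": "Lyon", "RegionType": "City", "ParentId": 100},
]

def build_region_dimension(regions):
    """Assign surrogate keys, then rewrite ParentId as the parent's surrogate key."""
    key_by_id = {r["Id"]: sk for sk, r in enumerate(regions, start=1)}
    return [
        {
            "RegionKey": key_by_id[r["Id"]],
            "Name": r["Name"],
            "RegionType": r["RegionType"],
            # Top-level regions have no parent, so the lookup yields None.
            "ParentRegionKey": key_by_id.get(r["ParentId"]),
        }
        for r in regions
    ]

dim_region = build_region_dimension(regions)
```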
Multiple facts are not unusual, but unless the fact data is completely unrelated, there is no need to separate them into different cubes. The requirement for multiple facts will be driven by having data at a different grain. Keep all of your metrics for one process (e.g. flights) together, but separate flight metrics from food sale metrics.
You are not converting to a data warehouse; you are creating a new data warehouse with a few dimensions and at least one fact table. Dimension tables are loaded first, and you DO NOT want to replace the ID with the name.
You need an additional key for each dimension table. Once the dimensions are loaded, I usually use an SSIS package to load the fact table (either an incremental load, or truncating the fact table each time before loading new data, depending on what you need)...
