In the multidimensional cube, I have two facts at different grains, named FactTestScore and FactSubjectScore. These two facts share two dimensions, DimStudent and DimSubject, and FactTestScore has an additional dimension, DimTest. I've deployed the cube without any errors.
In Power BI, when I build a matrix report with Subject, Test, Student and their respective scores, all tests get cross joined with all subjects. Can you please point out where I am making a mistake?
In Power BI, filtering across relationships can only propagate in the direction specified.
In your diagram, there is no path from one dimension table to another without going the "wrong way". In fact, it doesn't look like they can even filter the fact tables unless the notation is the reverse of what I'm accustomed to.
I have to design and build a star / snowflake schema database that will keep data about employees in a company, especially the rates that are paid to the employees. This is the first time I am experimenting with this schema type and I'm not sure which parts of the fact tables should be separate dimension tables.
I don't exactly understand the practical upsides of having this schema, is it actually that much easier to perform queries on this type of database? Or is it only about the performance?
Below I am attaching a draft of my database schema. I would like to know what I should modify to make it the best possible version for this database. I also have two specific questions:
Should the rate column be just a value in the fact table? Or should it be a foreign key to a dim_rate table?
What about date dimensions? Should they just be values in specific tables? Or should they always be foreign keys? If they should be foreign keys, should there be one dim_date table or a table for each type of date?
As an example for question 2, let's take the dim_employee table and the employment_date and end_of_employment columns. I have these dates as values in the dim_employee table, but I can think of 2 other ways to handle this data: either foreign keys to a dim_date table or separate fact tables for fact_start_of_employment and fact_end_of_deployment. I know I will need different kinds of reports, for example reports showing how many people started work and left the company in different date intervals (e.g. in December of 2020). Honestly, at this point I have no idea which option would be best and easiest to work with in the future.
Also as I said - I would love any constructive criticism of this schema, even if it means completely redesigning it.
I would merge both fact tables because I think there is a strong relation between rate and position. But that's how I look at this data without knowing all the details.
I would also create a date dimension and a form_of_employment dimension.
That would result in 4 dimensions:
dim_employee
dim_date
dim_position
dim_form_of_employment
And a single fact table with these columns:
fact_assignment
employee_id
date_id
position_id
form_of_employment_id
rate
student
This setup results in a proper star and very simple SQL for your reports.
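To give an idea of what "simple SQL" could look like with this layout, here is a minimal sketch using Python's built-in sqlite3 module. The attribute columns (name, year, month, title, form) and the sample rows are made up for illustration, and the `student` column is left out because its meaning isn't clear from the post.

```python
import sqlite3

# In-memory database; all attribute columns and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_employee (employee_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_position (position_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE dim_form_of_employment (form_of_employment_id INTEGER PRIMARY KEY, form TEXT);
CREATE TABLE fact_assignment (
    employee_id INTEGER REFERENCES dim_employee,
    date_id INTEGER REFERENCES dim_date,
    position_id INTEGER REFERENCES dim_position,
    form_of_employment_id INTEGER REFERENCES dim_form_of_employment,
    rate REAL
);
INSERT INTO dim_employee VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO dim_date VALUES (202011, 2020, 11), (202012, 2020, 12);
INSERT INTO dim_position VALUES (1, 'Analyst'), (2, 'Developer');
INSERT INTO dim_form_of_employment VALUES (1, 'full-time'), (2, 'contract');
INSERT INTO fact_assignment VALUES
    (1, 202011, 1, 1, 30.0),
    (1, 202012, 2, 1, 40.0),
    (2, 202012, 2, 2, 50.0);
""")

# Typical star query: join the fact to the dimensions you slice by, then aggregate.
avg_rate_by_position = conn.execute("""
    SELECT p.title, d.year, d.month, AVG(f.rate)
    FROM fact_assignment AS f
    JOIN dim_position AS p ON p.position_id = f.position_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.title, d.year, d.month
    ORDER BY d.year, d.month, p.title
""").fetchall()
print(avg_rate_by_position)
```

Every report follows the same shape: one fact table, a join per dimension you need, and a GROUP BY on the dimension attributes.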
For every BI or reporting system, you have a process of designing your tables and building them based on that design. This process is called dimensional modeling. Some others call it data warehouse design, which is the same thing. Dimensional modeling is the process of thinking about and designing the data model, including tables and their relationships. As you see, there is no technology involved in the process of dimensional modeling. It all happens in your head and ends up with sketching diagrams on paper. Dimensional modeling is not the diagram in which tables are connected to each other; it is the process of producing it.
Star Schema is the best way of designing a data model for reporting. You will get the best performance and also flexibility using such a model.
In this case the Employee Dimension will be a Historical Dimension or Slowly Changing Dimension :
You can use a bridge table.
In a classic dimensional schema, each dimension attached to a fact table has a single value consistent with the fact table’s grain. But there are a number of situations in which a dimension is legitimately multivalued.
Like in your example, an employee can have many positions :
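A bridge table for the employee/position case could be sketched roughly like this. All table and column names (bridge_employee_position, weight, fact_pay, amount) and the data are assumptions for illustration; the weighting factor is one common way to stop a measure from being counted once per position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_employee (employee_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_position (position_id INTEGER PRIMARY KEY, title TEXT);
-- Hypothetical bridge: one row per (employee, position); weights sum to 1 per employee.
CREATE TABLE bridge_employee_position (employee_id INTEGER, position_id INTEGER, weight REAL);
-- Hypothetical fact at employee grain.
CREATE TABLE fact_pay (employee_id INTEGER, amount REAL);
INSERT INTO dim_employee VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO dim_position VALUES (1, 'Analyst'), (2, 'Developer');
INSERT INTO bridge_employee_position VALUES (1, 1, 0.5), (1, 2, 0.5), (2, 2, 1.0);
INSERT INTO fact_pay VALUES (1, 100.0), (2, 80.0);
""")

base_join = """
    FROM fact_pay AS f
    JOIN bridge_employee_position AS b ON b.employee_id = f.employee_id
    JOIN dim_position AS p ON p.position_id = b.position_id
    GROUP BY p.title ORDER BY p.title
"""
# Unweighted: Ann's 100 shows up under both of her positions (totals add up to 280).
naive_by_position = conn.execute("SELECT p.title, SUM(f.amount)" + base_join).fetchall()
# Weighted: each amount is split across positions, so totals still add up to 180.
weighted_by_position = conn.execute("SELECT p.title, SUM(f.amount * b.weight)" + base_join).fetchall()
print(naive_by_position, weighted_by_position)
```

Whether you want the weighted or the unweighted number depends on the report; the bridge just makes the many-to-many relationship explicit.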
I want to calculate the "stock turnover rate" KPI, which depends on two measures in two different fact tables (Amount from the sales fact and physical quantity from the inventory fact). So, my question is the following:
Do I have to combine the two facts in the same OLAP cube, or is there another way to do this? I ask because everyone recommends having one fact table per cube.
Are there any dimensions that the Fact tables share?
I don't think the recommendation to have one Fact table per Cube is correct. There is no reason why you can't have multiple Fact tables or Measure groups.
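Outside of a cube, the same idea can be sketched in plain SQL: aggregate each fact to the grain of a shared dimension, then combine the results. This is only a sketch with Python's sqlite3; the table names, the columns, and the formula used here (sales amount divided by average physical quantity) are assumptions, since the question doesn't give the exact KPI definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical shared dimension and two facts that both reference it.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (product_id INTEGER, amount REAL);
CREATE TABLE fact_inventory (product_id INTEGER, quantity REAL);
INSERT INTO dim_product VALUES (1, 'Bolt'), (2, 'Nut');
INSERT INTO fact_sales VALUES (1, 100.0), (1, 200.0), (2, 50.0);
INSERT INTO fact_inventory VALUES (1, 10.0), (1, 20.0), (2, 5.0);
""")

# Aggregate each fact separately to product grain, then join through the dimension.
turnover = conn.execute("""
    WITH sales AS (
        SELECT product_id, SUM(amount) AS amount
        FROM fact_sales GROUP BY product_id
    ),
    stock AS (
        SELECT product_id, AVG(quantity) AS avg_qty
        FROM fact_inventory GROUP BY product_id
    )
    SELECT p.name, sales.amount / stock.avg_qty
    FROM dim_product AS p
    JOIN sales ON sales.product_id = p.product_id
    JOIN stock ON stock.product_id = p.product_id
    ORDER BY p.name
""").fetchall()
print(turnover)
```

In a cube, the equivalent would be two measure groups sharing the dimension plus a calculated measure over both.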
Let's say that in my database model I have three fact tables. These fact tables have same dimension tables (so called conformed dimensions). I know that I shouldn't connect directly fact tables (since direct connection can cause double-counting of some facts), but only through the dimension tables. What I am interested in is can I connect every fact with every dimension table without problems? I looked for an answer a lot and the opinions are divided. Some say there is no problem, the others say that because of this fact tables can associate with each other and circular references can occur; and that in these cases so called link table should be used. Is this link table really necessary or can this work without it?
If a dimension can describe an aspect of the fact event, you should connect it so it can be used in analytics.
However, you shouldn't force a relationship to connect a fact to a dimension that it does not need. That will make your model confusing and bloated.
You are correct that you should not connect facts directly. The model does not function that way. You'll want to read up on the purpose of facts and dimension to understand why.
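To make the double-counting risk concrete, here is a small sketch with Python's sqlite3. The two fact tables and their data are invented for illustration; the point is only that joining fact to fact on a shared key multiplies rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two hypothetical facts that share a customer key.
CREATE TABLE fact_sales (customer_id INTEGER, amount REAL);
CREATE TABLE fact_returns (customer_id INTEGER, amount REAL);
INSERT INTO fact_sales VALUES (1, 10.0), (1, 20.0);
INSERT INTO fact_returns VALUES (1, 1.0), (1, 2.0);
""")

# Fact-to-fact join: every sales row pairs with every returns row for the customer,
# so each sale is counted twice.
direct_join_total = conn.execute("""
    SELECT SUM(s.amount)
    FROM fact_sales AS s
    JOIN fact_returns AS r ON r.customer_id = s.customer_id
""").fetchone()[0]

# The correct total comes from the sales fact alone.
true_total = conn.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]
print(direct_join_total, true_total)
```

Going through the shared dimension and aggregating each fact on its own avoids this fan-out.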
You should be able to navigate between related events through the common dimensions, but that is not a circular reference. A circular reference prevents a value from being returned because there is not a bottom to the relationship.
If entities have a many to many relationship, you can use link/bridge tables to expand the relationship into multiple one to many/one to one relationships. That is complicated to model and too much to explain as part of this question.
If you want more, please post some of your model so we can focus on the specific needs of your question.
I implemented the model (in MS SQL), and I'm sharing here my experience in case anyone is interested in this in future.
In the end I created five fact tables (model turned out to be more complex), they are all connected to all existing dimension tables (six of them) directly. I didn't use the link table.
This model has been in use for almost five months now and so far no problems have appeared.
I'm a newbie to data warehousing and I've been reading articles and watching videos on the principles but I'm a bit confused as to how I would take the design below and convert it into a star schema.
In all the examples I've seen the fact table references the dim tables, so I'm assuming the questionId and responseId would be part of the fact table? Any advice would be much appreciated.
I can't see the image at the moment (blocked by my firewall @ the office), but I'll try to give you some ideas.
The general idea is to organize your measurable 'facts' into what are called fact tables. There are 3 main types of facts, but that is a topic for a different day (I'd be happy to go into this if needed). Each of these facts is what you'd see in the center of a typical 'star schema'. The other attributes within the fact tables are typically FK references to the dimension tables.
Regarding dimensions, these are groups of attributes that share commonality (the most notable being a calendar dimension). This is important because when you're doing analysis across multiple facts the dimensions are what you use to connect them.
If you consider this simple example: A product is ordered and then shipped. We could have 2 transaction facts (one that contains the qty ordered - measure, type of product ordered - dimension, and transaction date - dimension). We'd also have a transaction fact for the product shipping ( qty shipped - measure, product type - dimension, and ship date - dimension). This simple schema could be used to answer questions like 'how many products by product type last quarter were ordered but not shipped'.
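The ordered/shipped example could be sketched like this with Python's sqlite3. The table and column names are assumptions, and a plain `quarter` text column stands in for a real date dimension to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical product dimension shared by both facts.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_type TEXT);
CREATE TABLE fact_order (product_id INTEGER, quarter TEXT, qty_ordered INTEGER);
CREATE TABLE fact_shipment (product_id INTEGER, quarter TEXT, qty_shipped INTEGER);
INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO fact_order VALUES (1, '2020Q4', 10), (1, '2020Q4', 5), (2, '2020Q4', 7), (2, '2020Q3', 100);
INSERT INTO fact_shipment VALUES (1, '2020Q4', 12);
""")

quarter = "2020Q4"
# Summarize each fact by product type on its own, then compare the two summaries.
backlog = conn.execute("""
    WITH ordered AS (
        SELECT p.product_type, SUM(o.qty_ordered) AS qty
        FROM fact_order AS o
        JOIN dim_product AS p ON p.product_id = o.product_id
        WHERE o.quarter = ?
        GROUP BY p.product_type
    ),
    shipped AS (
        SELECT p.product_type, SUM(s.qty_shipped) AS qty
        FROM fact_shipment AS s
        JOIN dim_product AS p ON p.product_id = s.product_id
        WHERE s.quarter = ?
        GROUP BY p.product_type
    )
    SELECT ordered.product_type, ordered.qty - COALESCE(shipped.qty, 0)
    FROM ordered
    LEFT JOIN shipped ON shipped.product_type = ordered.product_type
    ORDER BY ordered.product_type
""", (quarter, quarter)).fetchall()
print(backlog)
```

The LEFT JOIN keeps product types that were ordered but have no shipments at all in the quarter.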
Hopefully this helps you get started.
Usually a fact table is used to aggregate measures - which are always numeric. Examples would be: sales dollars, distances, weights, number of items sold.
The type of data you drew here doesn't have any cut and dried "measure", so you need to decide what you want to measure. Is it the number of answers per question? Is it how many responses per sample?
This is often called an Event Fact table (if you want to search for other examples). And you need some sort of reporting requirements before you can turn it into a star schema. So it isn't an easy answer...
It's easy :) Responses is the fact; everything else is a dimension. Your schema is already a star, because the fact connects directly to every dimension. If you later redesign the structure so that, for example, addresses are stored in a separate table related to sample, you must add the address table's id to the responses table to keep it a star schema.
Preface
SQL Server 2008 R2 Standard Edition, Multidimensional Cube
In my data warehouse I have the following tables:
Dimensions
DimPartnership - Groupings of Partners
DimPartner - Groupings of Investors (can be in multiple partnerships)
DimInvestor - Individual investors that can make up multiple partners
Facts
FactInvestments - Records related to investment activity. Contains a foreign key "InvestorKey" that relates to the DimInvestor table.
Bridges
BrInvestorPartner - Bridge table to resolve Investors to Partners
BrPartnerPartnership - Bridge table to resolve Partners to Partnerships
Problem:
I need to create a Many-To-Many-To-Many relationship in SSAS. The first many-to-many dimension is working, the second one is not.
Current Solution:
I have made two bridge tables that link the Investor dimension to the partner dimension and then the partner dimension to the partnership dimension. The cube processes and, as expected, the partner many-to-many dimension works correctly. I am able to slice measures in the fact table by partner members. However, when I apply partnership as a part of the query, it has no effect on the Investments measure group. My Investments measures group is ignoring this dimension, it seems.
Question
Can anyone point out what I'm doing incorrectly? Is this even supported by Microsoft? I can't find anything in their documentation about this, but I would assume this would be supported. I appreciate any guidance toward figuring out what's wrong. Can this be solved with scoping or doing some sort of intersection on Partner Partnership Count?
Pictures
Some pics that might help you:
Faulty results
Values and names edited to protect client privacy - same value returned for all partnerships (the total of all investments)
DSV
Cube Structure
Dimension Usage
And of course once I posted my question, I figured out the problem.
My dimension usage for the Partnership Dimension should use both v Br Investor Partner bridge and the v Br Partner Partnership bridge with Many-To-Many relationships. Everything is now working as expected.
Compare this to the Dimension Usage Screenshot in my OP:
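For anyone who wants to sanity-check the cube's numbers, the two-bridge chain can be reproduced in plain SQL. This is a relational analogue, not SSAS configuration, sketched with Python's sqlite3. The table names and InvestorKey follow the post; PartnerKey, PartnershipKey, Amount and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactInvestments (InvestorKey INTEGER, Amount REAL);
CREATE TABLE BrInvestorPartner (InvestorKey INTEGER, PartnerKey INTEGER);
CREATE TABLE BrPartnerPartnership (PartnerKey INTEGER, PartnershipKey INTEGER);
INSERT INTO FactInvestments VALUES (1, 100.0), (2, 200.0), (3, 400.0);
-- Investor 2 belongs to both partners.
INSERT INTO BrInvestorPartner VALUES (1, 10), (2, 10), (2, 20), (3, 20);
-- Partner 20 belongs to both partnerships.
INSERT INTO BrPartnerPartnership VALUES (10, 100), (20, 100), (20, 200);
""")

# Walk both bridges, keep each (partnership, investor) pair once,
# then sum the fact so an investor is not counted once per path.
by_partnership = conn.execute("""
    SELECT link.PartnershipKey, SUM(f.Amount)
    FROM (
        SELECT DISTINCT pp.PartnershipKey, ip.InvestorKey
        FROM BrPartnerPartnership AS pp
        JOIN BrInvestorPartner AS ip ON ip.PartnerKey = pp.PartnerKey
    ) AS link
    JOIN FactInvestments AS f ON f.InvestorKey = link.InvestorKey
    GROUP BY link.PartnershipKey
    ORDER BY link.PartnershipKey
""").fetchall()
print(by_partnership)
```

If the partnership slice in the cube returns the grand total for every member instead of figures like these, the second bridge is probably not being used in the relationship.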