SSAS Tabular Model - Multiple Fact tables - sql-server

I have a model with three fact tables and three dimensions. Each fact table can individually relate to each dimension, this works fine. But the three dimensions are in this schema not related and therefore cannot be used with each other by the client.
I have tried many solutions, one of them was merging the DimPerson, DimDepartment and DimDistrict by Crossjoin to get "all possible combinations". But given the number of rows in each of these dimensions, the task takes too long.
Any ideas? Or am I going about this the wrong way?
Here is the schema:

We have lots of models like this. The most common shared dimension is Date. We filter Date on every dashboard using the dimension and all the Fact tables return the filtered values. Be sure to use the Dimension columns for filtering, not the Fact table columns. Best practice is to hide the Fact columns (PersonID, DeptID, DistrictID) so these columns are only available via the Dimensions.

Related

Star Schema from multiple source tables

I am struggling in figuring out how to create a star schema from multiple source tables. I work at a trading firm so the data is related to user trading activity. The issue I am having is that our datasets do not have primary ids for every field that could be a dimension. Instead, we usually relate our data together using the combination of date and account number. Here is an example of 3 source tables...
I would like to turn this into a star schema, something that looks like ...
Is my only option to denormalize my source tables into one wide table (joining trades to position on account number and date, and joining the users table on account number), create keys for each dimension, then re normalizing it into the star schema? Are star schema's ever built from multiple source tables?
Star schemas are almost always created from multiple source tables.
The normal process is:
Populate your dimension tables
Create a temporary/virtual fact record using your source data
Using this fact record, look up the relevant dimension keys
Write the actual fact record to your target fact table
Data-warehousing is about query speed. The data-warehouse should not be concerned with data integrity. IT SHOULD NOT CLEAN OR CORRECT BAD DATA. It only needs to gather all the data together into a single record to present to the model for analysis. Denormalizing the data is how this is done.
In a star schema, dimensions do not know about each other and have no relationships with other dimensions. In a snowflake, dimensions are related to other dimensions. That is the primary difference between star and snowflake.
All the metadata options for events are rolled up into dimensions and used for slicing/filtering. All the measurable/calculation data for an event are in the event fact, along with a reference to the dimension(s) containing the relevant metadata. The Metadata/Dimension is reused across multiple fact records.
Based on the limited example you've provided, I'd suggest you research degenerate dimensions and junk dimensions. Your Trade and Position data may need to be turned into a fact and a dimension (degenerate), and some of your flag attributes may be best placed into a junk dimension.
You should also make sure your dimension keys are clear. You should not have multiple paths to a dimension (accountnumber: trade -> position -> user & trade -> user ) as that will cause inconsistent results when querying depending on which relationship you traverse.

Dimension Creation - Multiple Uses

We received some generic training related to TM1 and dimension creation and we were informed we'd need separate dimensions for the same values.
Let me describe, we transport goods and we'd have an origin and destination province and in typical database design I'd expect we'd have one "province" reference table, but we were informed we'd need an "origin" dimension and a "destination" dimension. This seems to be cumbersome and seems like we'd encounter the same issue with customers, services, etc.
Can someone clarify how this could work for us?
Again, I'd expect to see a "lookup" table in the database which contains all possible provinces (assumption is values in both columns would be the same), then you'd have an ID value in any column that used the "province" and join to the "lookup" table based on ID.
in typical database design I'd expect we'd have one "province" reference table, but we were informed we'd need an "origin" dimension and a "destination" dimension
Following the regular DB design it makes sense to keep two data entities separate: one defines source, other defines target. I think on this we'd both agree. If you could give more details it would be better.
Imagine a drop down list: two lists populated by one single "source", but represent two different values in DB.
assumption is values in both columns would be the same
if the destination=origin, you don't need two dimensions then? :) This point needs clarification.
Besides your solution (combination of all source and destination in a table with an unique ID, which could be a way of solving this), it seems it's resolvable by cube or dimension structure changes.
If at some dimension you'd use e.g. ProvinceOrigin and ProvinceDestination as string type elements, and populate them from one single dimension (dynamic attribute) then whenever you save the cube you'll have these two fields populated from one single dimension.
Obviously the best solution for you depends on your system architecture.

Manage numeric Intervals on SQL

I want to manage some datas by intervals on my database like that :
It is possible to do that on an unique table or I need 3 tables, one for each color (with FK) ?
Real example :
Actually, on my app I use this on a dataGridView and on my database :
It is possible to set / modify or everything on three databases. I manually add the equivalency (green) but for some number with a little different is it the same equivalency, so it's - for me - interesting to use numeric intervals
I'm not an expert on modeling databases but this is how I solve your scenario.
I'd create two Range Tables, one for storing column values, and other one for row values, each table will have same structure but since you need to represent the final values in a matrix way i decide to consider two tables(instead of merging them in one, its possible but then you'll need more effort to showing data from "Values"). As you can see i've considered a IdEquivalency columns, this will be useful for showing the data ad needed.
Finally the table Values(for green values) has two FK(one for each range value), and the value stored.
This is still a basic idea, but I'm sure you get the point.
Considerations:
Change Table Names according what its value represent.

PostgreSql correct number of field

I would like your opinion. I have a table with 120 VARCHAR fields where I will have to hire about 1,000 records per month for at least 10 years, with a total number of 240,000 records.
I could divide the fields into multiple tables but I'd rather keep it that way. Do you think I will have problems in the future?
Thank you
Well, if the data of the columns is following a certain logic, keep it flat. Which means that I would let it that way. Otherwise separate it into multiple tables. I depents on your data.
I worked once worked with medical data where one table contained over 100 columns, but all these columns where needed to get a diagnostic result. I don't remember, what exactly it was, because I worked with that data set some years ago. But in that case it would make it more complicated, if the columns would be separated into multiple columns. Logically the data of each column served a certain purpose so it was easier to have them all in the same place (the table).
If you put the columns all together just to be lazy, so that you have to call the table once, I would recommend to separate the columns into different tables to make it more comfortable to work with, and to make the database schema more understandable.

Database subtyping/supertyping

I have tables "Crop", "Corn", "Soybean", and "Grain". One entry in Crop corresponds to a single entry in one of the other tables. The problem here is that Crop should be one-to-one with only one of the other tables, but not more than one. The Crop table is needed because it combines a lot of the common data from the other tables and makes querying the information much easier code side. From working on this I have a couple strategies with drawbacks...
A. Put three columns into Crop for the IDs of the other tables then populate the column "Corn" if it's a corn crop ect...
Drawbacks: Wasted columns, have to check all three columns whenever I want to see what crop it is
B. Combine Corn, Soybean, and Grain tables and add a single column for what type of crop it is.
Drawbacks: Each table has different columns, wasted and unnecessary columns in each row
Is it safe to say I'm stuck here? Or is there a strategy to handle cases like this? Thanks.
This is the "subtype" situation and is covered extensively in Stephane Faroult's the Art of SQL
The recommended solution involves using the same unique key (in this case, say CropID) across all tables, Crop, Corn, Soybean and Grain. The set of primary keys of the Crop table then becomes the union of primary keys of Corn, SoyBean and Grain. In addition, you define an attribute, say CropType, on the Crop table indicating the type of each Crop record.
This way, common attributes stay on the Crop table and type-specific attributes go to type-specific tables with no redundancy.
Why not a PivotTable for all tables like :
PivotTable -> PivotID, PivotDate
Crop->CropID, PivotID, other fields
Soybean->SoybeanID, PivotID, other fields
Gran->GrainID, PivotID, other fields
So, you could select all tables with only one PivotID

Resources