We received some generic training related to TM1 and dimension creation and we were informed we'd need separate dimensions for the same values.
To describe: we transport goods, so we'd have an origin and destination province. In typical database design I'd expect we'd have one "province" reference table, but we were informed we'd need an "origin" dimension and a "destination" dimension. This seems cumbersome, and it seems like we'd encounter the same issue with customers, services, etc.
Can someone clarify how this could work for us?
Again, I'd expect to see a "lookup" table in the database which contains all possible provinces (assumption is values in both columns would be the same), then you'd have an ID value in any column that used the "province" and join to the "lookup" table based on ID.
in typical database design I'd expect we'd have one "province" reference table, but we were informed we'd need an "origin" dimension and a "destination" dimension
Following regular DB design, it makes sense to keep the two data entities separate: one defines the source, the other defines the target. I think we'd both agree on this. It would help if you could give more details.
Imagine two drop-down lists: both are populated from one single "source", but they represent two different values in the DB.
assumption is values in both columns would be the same
If destination = origin, then you don't need two dimensions? :) This point needs clarification.
Besides your solution (a combination of all sources and destinations in a table with a unique ID, which could be a way of solving this), it seems this can be resolved by changing the cube or dimension structure.
If at some dimension you'd use e.g. ProvinceOrigin and ProvinceDestination as string type elements, and populate them from one single dimension (dynamic attribute) then whenever you save the cube you'll have these two fields populated from one single dimension.
Obviously the best solution for you depends on your system architecture.
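For comparison, the single-lookup-table design described in the question can be sketched in plain SQL. This is only a relational illustration run through Python's sqlite3 module, not TM1; the table names, column names, and sample rows are all hypothetical. The point it shows is that one "province" table can play both roles when it is joined twice under different aliases.

```python
import sqlite3

# All table names, column names, and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE province (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE shipment (
    id INTEGER PRIMARY KEY,
    origin_id INTEGER NOT NULL REFERENCES province(id),
    destination_id INTEGER NOT NULL REFERENCES province(id),
    weight_kg REAL NOT NULL
);
INSERT INTO province (id, name) VALUES (1, 'Ontario'), (2, 'Quebec');
INSERT INTO shipment (origin_id, destination_id, weight_kg)
VALUES (1, 2, 120.0), (2, 2, 75.5);
""")

# Join the same lookup table twice: once as origin, once as destination.
shipment_rows = conn.execute("""
SELECT o.name AS origin, d.name AS destination, s.weight_kg
FROM shipment AS s
JOIN province AS o ON o.id = s.origin_id
JOIN province AS d ON d.id = s.destination_id
ORDER BY s.id
""").fetchall()
print(shipment_rows)
```

Each alias behaves like a separate "origin" or "destination" view over the same list of provinces, which is roughly what two dimensions fed from one source would give you.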
I am struggling in figuring out how to create a star schema from multiple source tables. I work at a trading firm so the data is related to user trading activity. The issue I am having is that our datasets do not have primary ids for every field that could be a dimension. Instead, we usually relate our data together using the combination of date and account number. Here is an example of 3 source tables...
I would like to turn this into a star schema, something that looks like ...
Is my only option to denormalize my source tables into one wide table (joining trades to position on account number and date, and joining the users table on account number), create keys for each dimension, then renormalize it into the star schema? Are star schemas ever built from multiple source tables?
Star schemas are almost always created from multiple source tables.
The normal process is:
1. Populate your dimension tables
2. Create a temporary/virtual fact record using your source data
3. Using this fact record, look up the relevant dimension keys
4. Write the actual fact record to your target fact table
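The process above can be sketched roughly as follows, using Python's sqlite3 module. The table names, column names, and sample trades are hypothetical and only stand in for the source tables in the question; the source rows are identified by date and account number, as described there.

```python
import sqlite3

# Hypothetical source rows, identified only by (trade_date, account_number).
source_trades = [
    ("2024-01-02", "ACC-1", 100.0),
    ("2024-01-02", "ACC-2", 250.0),
    ("2024-01-03", "ACC-1", 50.0),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, trade_date TEXT UNIQUE);
CREATE TABLE dim_account (account_key INTEGER PRIMARY KEY, account_number TEXT UNIQUE);
CREATE TABLE fact_trade (
    date_key INTEGER REFERENCES dim_date(date_key),
    account_key INTEGER REFERENCES dim_account(account_key),
    amount REAL
);
""")

# Populate the dimension tables; duplicates of a natural key are ignored.
conn.executemany("INSERT OR IGNORE INTO dim_date (trade_date) VALUES (?)",
                 [(d,) for d, _, _ in source_trades])
conn.executemany("INSERT OR IGNORE INTO dim_account (account_number) VALUES (?)",
                 [(a,) for _, a, _ in source_trades])

# For each source record, look up the surrogate keys and write the fact row.
for trade_date, account_number, amount in source_trades:
    date_key = conn.execute(
        "SELECT date_key FROM dim_date WHERE trade_date = ?", (trade_date,)
    ).fetchone()[0]
    account_key = conn.execute(
        "SELECT account_key FROM dim_account WHERE account_number = ?", (account_number,)
    ).fetchone()[0]
    conn.execute("INSERT INTO fact_trade VALUES (?, ?, ?)",
                 (date_key, account_key, amount))

# Joining the fact back to its dimensions recovers the original records.
round_trip = conn.execute("""
SELECT d.trade_date, a.account_number, f.amount
FROM fact_trade AS f
JOIN dim_date AS d ON d.date_key = f.date_key
JOIN dim_account AS a ON a.account_key = f.account_key
ORDER BY f.rowid
""").fetchall()
```

No wide denormalized table is needed here: the surrogate keys are generated when the dimensions are loaded, and the fact load only looks them up.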
Data-warehousing is about query speed. The data-warehouse should not be concerned with data integrity. IT SHOULD NOT CLEAN OR CORRECT BAD DATA. It only needs to gather all the data together into a single record to present to the model for analysis. Denormalizing the data is how this is done.
In a star schema, dimensions do not know about each other and have no relationships with other dimensions. In a snowflake, dimensions are related to other dimensions. That is the primary difference between star and snowflake.
All the metadata options for events are rolled up into dimensions and used for slicing/filtering. All the measurable/calculation data for an event are in the event fact, along with a reference to the dimension(s) containing the relevant metadata. The Metadata/Dimension is reused across multiple fact records.
Based on the limited example you've provided, I'd suggest you research degenerate dimensions and junk dimensions. Your Trade and Position data may need to be turned into a fact and a dimension (degenerate), and some of your flag attributes may be best placed into a junk dimension.
You should also make sure your dimension keys are clear. You should not have multiple paths to a dimension (accountnumber: trade -> position -> user & trade -> user ) as that will cause inconsistent results when querying depending on which relationship you traverse.
I want to manage some data by intervals in my database, like this:
Is it possible to do that in a single table, or do I need three tables, one for each color (with FKs)?
Real example :
Currently, in my app, I use this in a dataGridView and in my database:
It is possible to set or modify everything across the three tables. I add the equivalency (green) manually, but numbers that differ only slightly share the same equivalency, so using numeric intervals is interesting for me.
I'm not an expert on modeling databases, but this is how I would solve your scenario.
I'd create two range tables, one for storing the column values and the other for the row values. Both tables have the same structure, but since you need to present the final values as a matrix, I decided to use two tables instead of merging them into one (merging is possible, but then you'd need more effort to show the data from "Values"). As you can see, I've included an IdEquivalency column; this will be useful for showing the data as needed.
Finally, the table Values (for the green values) has two FKs (one for each range table) and the stored value.
This is still a basic idea, but I'm sure you get the point.
Considerations:
Rename the tables according to what their values represent.
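A minimal sketch of the two-range-tables idea, using Python's sqlite3 module. The original diagram is not reproduced here, so every table name, column name, and interval below is a hypothetical stand-in (the IdEquivalency column is left out), and the intervals are assumed to be half-open, [min, max).

```python
import sqlite3

# Hypothetical schema: two range tables plus a table of stored values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE row_range (id INTEGER PRIMARY KEY, min_value REAL, max_value REAL);
CREATE TABLE column_range (id INTEGER PRIMARY KEY, min_value REAL, max_value REAL);
CREATE TABLE cell_value (
    row_range_id INTEGER REFERENCES row_range(id),
    column_range_id INTEGER REFERENCES column_range(id),
    value TEXT,
    PRIMARY KEY (row_range_id, column_range_id)
);
INSERT INTO row_range VALUES (1, 0, 10), (2, 10, 20);
INSERT INTO column_range VALUES (1, 0, 5), (2, 5, 15);
INSERT INTO cell_value VALUES (1, 1, 'A'), (1, 2, 'B'), (2, 1, 'C'), (2, 2, 'D');
""")

def lookup(row_number, column_number):
    """Return the value whose two ranges contain the given numbers, or None."""
    hit = conn.execute("""
        SELECT v.value
        FROM cell_value AS v
        JOIN row_range AS r ON r.id = v.row_range_id
        JOIN column_range AS c ON c.id = v.column_range_id
        WHERE ? >= r.min_value AND ? < r.max_value
          AND ? >= c.min_value AND ? < c.max_value
    """, (row_number, row_number, column_number, column_number)).fetchone()
    return hit[0] if hit else None
```

With this layout, nearby numbers such as 12 and 13 fall into the same row range and therefore resolve to the same stored value, without one row per exact number.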
I'm trying to create a multidimensional database from a preexisting database using SQL Server Analysis Services. My problem is that the original database stores all information in a varchar field called "value". What's in that field depends on another field that holds the type of statistic. So I can have, for example, a fact with statistic_type "number of products sold" with value 1000 and another with type "cost of material bought" with value 5000. The values can have completely different meanings: some are numeric values, others are percentages, and others are strings.
How do I turn those into measures? Should the statistic_type be a dimension of the cube and have the value as a measure? Does a measure always need to have a numeric value? Should I separate the fact table among several tables, one for each type of statistic? Or is there some sensible way to create a cube using just the one table?
It's the first time I'm working with multidimensional databases and SSAS, so I'm a little lost.
A measure always needs to have a numeric value. In fact, you will probably have to cast the value column as a numeric datatype in your Data Source View in order for it to even be a candidate for a measure in your cube.
You should make statistic_type a dimension and "value" a measure. It's ok to just use the one table, although it might be easier to work with if you make a lookup table of the distinct statistic_types.
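As a rough illustration of that shape, here is a sketch using Python's sqlite3 module rather than SSAS and its Data Source View. The table and view names (raw_stats, dim_statistic_type, fact_stats) and the sample rows are hypothetical; only statistic_type and value come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The original single table: everything stored as text.
CREATE TABLE raw_stats (statistic_type TEXT, value TEXT);
INSERT INTO raw_stats VALUES
    ('number of products sold', '1000'),
    ('number of products sold', '250'),
    ('cost of material bought', '5000');

-- Lookup table of the distinct statistic types (the dimension).
CREATE TABLE dim_statistic_type (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
INSERT INTO dim_statistic_type (name)
    SELECT DISTINCT statistic_type FROM raw_stats;

-- Fact view: the text value cast to a numeric type so it can act as a measure.
-- Note: SQLite casts non-numeric text to 0, so string-valued statistics
-- would have to be filtered out before this step.
CREATE VIEW fact_stats AS
    SELECT d.id AS statistic_type_id, CAST(r.value AS REAL) AS value
    FROM raw_stats AS r
    JOIN dim_statistic_type AS d ON d.name = r.statistic_type;
""")

# Aggregate the measure sliced by the statistic_type dimension.
totals = conn.execute("""
SELECT d.name, SUM(f.value)
FROM fact_stats AS f
JOIN dim_statistic_type AS d ON d.id = f.statistic_type_id
GROUP BY d.name
ORDER BY d.name
""").fetchall()
```

The totals are only meaningful within one statistic type, so in practice the measure would always be sliced by that dimension.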
I am working with a structure that results in a lot of single-attribute dimensions that require no hierarchy. Examples:
Status(Status Name)
Type(Type Name)
I get the following warning when compiling the project:
"Avoid having multiple dimensions containing a single attribute. Consider unifying them if possible."
A large number of single attribute dimensions is workable for our users, but it causes a lot of clutter in the Excel pivot table. Dimensions are listed along with the single attribute which is redundant.
I would like to unify them as the warning suggests, so that I have a single dimension called 'Attributes' which contains status/type/etc., but I am unsure of the best way to do so. It doesn't make conceptual sense to me as a parent/child dimension.
Any suggestions?
I agree this is a worthwhile change. I would construct a view that brings together the required attributes. Often they are all available on the fact/measure group table/view, so you can just use the same source object (in your DSV) to construct the dimension.
The tricky part may be the dimension key. The most flexible key is a Fact Surrogate Key, e.g. a unique value per Fact row: in the future you can add any other fact-based attributes without affecting the key. However, this will not scale indefinitely; you are probably OK up to at least 1 million rows.
Beyond that scale, I would concatenate the attributes to form the dimension key and deliver them to a new dimension table. I would normally do this back in the ETL layer. The identical concatenation logic must be used for both the dimension and fact.
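A small sketch of the concatenated-key approach in plain Python. The sample rows, the "|" separator, and the attribute_key helper are assumptions made for illustration; in a real ETL layer the same logic would live wherever the dimension and fact tables are built.

```python
# Hypothetical fact rows, each carrying two single-attribute values.
fact_rows = [
    {"id": 1, "status": "Open", "type": "Retail"},
    {"id": 2, "status": "Closed", "type": "Retail"},
    {"id": 3, "status": "Open", "type": "Retail"},
]

def attribute_key(row):
    """Concatenate the attributes into one dimension key (separator is an assumption)."""
    return "|".join([row["status"], row["type"]])

# Dimension table: one row per distinct combination of attributes.
dimension = {}
for row in fact_rows:
    dimension.setdefault(
        attribute_key(row), {"status": row["status"], "type": row["type"]}
    )

# Fact table: the identical function produces the foreign key.
facts = [{"id": row["id"], "attribute_key": attribute_key(row)} for row in fact_rows]
```

Using one shared function for both sides is what guarantees the keys match; the separator should be a character that cannot appear inside the attribute values.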
I have a table that holds information about a particular object, say Item, and has these columns:
ItemID, ItemName, price, ItemListingType.....LastOrderDate
One of the bits of information, ItemListingType could be one of 10 different types
such as:
private, gov, non-gov, business... etc (strings) and could be extended to more types in future.
Should I be using a column inside the Item table, or should I use a separate table with two columns and put a foreign key in the Item table to reference it (a one-to-many relationship)? Like:
ListingTypeID int
ListingTypeName varchar(MAX)
EDIT: At how many values for a column would you consider using another table? 2, 4, or what?
Thanks
Use a separate table to store this kind of reference data. This is a tenet of normalization and will also enable easier caching, because you are separating read-only and read-write data. My two cents...
Separate table.
What if you have a listing type not yet used?
Or delete the last item with type x?
Or need to change a value?
These are insert, update and delete anomalies, which are one reason for normalisation.
I would definitely go for a "lookup" style column; that way you are not stumped when there are future additions to the list of permissible listing types. You are also reducing redundancy and making it easier to change the designation of a particular type of listing (if "gov" changes to "government agencies", then you only have to change it in one place).
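A quick sketch of the "change it in one place" point, using Python's sqlite3 module. The column names follow the question; the ListingType table name and the sample rows are hypothetical, and TEXT stands in for varchar(MAX) because this runs on SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ListingType (
    ListingTypeID INTEGER PRIMARY KEY,
    ListingTypeName TEXT NOT NULL UNIQUE
);
CREATE TABLE Item (
    ItemID INTEGER PRIMARY KEY,
    ItemName TEXT NOT NULL,
    ListingTypeID INTEGER NOT NULL REFERENCES ListingType(ListingTypeID)
);
-- 'business' exists as a type even though no item uses it yet.
INSERT INTO ListingType VALUES (1, 'private'), (2, 'gov'), (3, 'business');
INSERT INTO Item VALUES (1, 'Desk', 2), (2, 'Chair', 2), (3, 'Lamp', 1);
""")

# Rename the type once; every item that references it picks up the new name.
conn.execute(
    "UPDATE ListingType SET ListingTypeName = ? WHERE ListingTypeName = ?",
    ("government agencies", "gov"),
)

item_types = conn.execute("""
SELECT i.ItemName, t.ListingTypeName
FROM Item AS i
JOIN ListingType AS t ON t.ListingTypeID = i.ListingTypeID
ORDER BY i.ItemID
""").fetchall()
```

The Item rows themselves are never touched by the rename, which is the redundancy saving described above.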
You should do it with the second table that holds the ListingTypes and link to the id of that table from the one with the Objects...
Take a look at Relational Database and Relational model.
In situations like this I ask myself:
Can the Item have an undetermined number of Listing types? If yes, different table.
Do the specs say that there will never be more than 3 types? Depends. Sometimes I'll still go with a separate table, sometimes not. You get a feel for this after a while.
Will the Item ALWAYS have a single listing type? If yes, same table, single column.
Now to take matters one step further.
If an Item has zero or more listing types AND those listing types are actually shared (in other words, two items could have the same listing type), then we have 3 tables: Items, ListingTypes, and a cross-reference table to support a many-to-many relationship.
Classically, you should use an extra table, because you won't have any duplication that way. It will also allow you to change the value for this listing in a single place. However, if you are very, very sure that no types will be added, keep the column.
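The three-table layout might look roughly like this, again sketched with Python's sqlite3 module. Items and ListingTypes are the names used above; the cross-reference table name ItemListingTypes, the listing_types_for helper, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items (ItemID INTEGER PRIMARY KEY, ItemName TEXT NOT NULL);
CREATE TABLE ListingTypes (
    ListingTypeID INTEGER PRIMARY KEY,
    ListingTypeName TEXT NOT NULL UNIQUE
);
-- Cross-reference table: one row per (item, listing type) pair.
CREATE TABLE ItemListingTypes (
    ItemID INTEGER NOT NULL REFERENCES Items(ItemID),
    ListingTypeID INTEGER NOT NULL REFERENCES ListingTypes(ListingTypeID),
    PRIMARY KEY (ItemID, ListingTypeID)
);
INSERT INTO Items VALUES (1, 'Desk'), (2, 'Chair'), (3, 'Lamp');
INSERT INTO ListingTypes VALUES (1, 'private'), (2, 'gov');
INSERT INTO ItemListingTypes VALUES (1, 1), (1, 2), (2, 2);
""")

def listing_types_for(item_name):
    """Return the listing type names attached to an item, sorted by name."""
    rows = conn.execute("""
        SELECT t.ListingTypeName
        FROM Items AS i
        JOIN ItemListingTypes AS x ON x.ItemID = i.ItemID
        JOIN ListingTypes AS t ON t.ListingTypeID = x.ListingTypeID
        WHERE i.ItemName = ?
        ORDER BY t.ListingTypeName
    """, (item_name,)).fetchall()
    return [name for (name,) in rows]
```

Here 'Desk' has two types, 'Chair' shares one of them, and 'Lamp' has none, which covers the "zero or more, shared" case.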