Data warehouse design - periodic snapshot with frequently changing dimension keys

Imagine a fact table with a summation of measures over a time period, say 1 hour.
Start Date | Measure 1 | Measure 2
-------------------------------------------
2018-09-08 00:00:00 | 5 | 10
2018-09-08 01:00:00 | 12 | 20
Ideally we want to maintain the grain such that each row covers exactly 1 hour. However, each row references dimensions which might ‘break’ the grain. For instance:
Start Date | Measure 1 | Measure 2 | Dim 1
---------------------------------------------------
2018-09-08 00:00:00 | 5 | 10 | key 1
2018-09-08 01:00:00 | 12 | 20 | key 2
The dimension value may change 30 minutes into the hour, in which case the above would be inaccurate and should be represented like this:
Start Date | Measure 1 | Measure 2 | Dim 1
---------------------------------------------------
2018-09-08 00:00:00 | 5 | 10 | val 1
2018-09-08 00:30:00 | 5 | 10 | val 2
2018-09-08 01:00:00 | 12 | 20 | val 2
In our scenario, the data needs to be sliced by at least 5 dimension keys with queries like:
sum(measure1) where dim1 = x and dim2 = y..
Is there a design pattern for this requirement? I have considered ‘periodic snapshots’ but I have not read anywhere about this kind of row splitting on dimension changes.
I can see only two options:
1. Store the dimension values that were most present on each row (e.g. if a dimension value was true for the majority of the time in the hour, use this value). This would lead to some loss of accuracy.
2. Split each row on every dimension change. This is complex in the ETL, creates more data and breaks the granularity rule in the fact table.
Option 2 is the current solution and serves the purpose but is harder to maintain. Is there a better way to do this, or other options?
By way of a real example, this system records production data in a manufacturing environment so the data is something like:
Line | Date | Crew | Product | Running Time (mins)
-----------------------------------------------------------------------
Line 1 | 2018-09-08 00:00:00 | Crew A | Product A | 60
As noted, the crew, product or any of the other dimensions may change multiple times within the hour.

You shouldn't need to split the time portion of your fact table, since you clearly want to report hourly data, but you should have one record per hour for each dimension value. If this is an aggregate of a transactional fact table, the process that loads the hourly table should group the records by each dimension key. So in your example above, you should have two records for the second hour, like so:
Start Date | Measure 1 | Measure 2 | Dim 1
---------------------------------------------------
2018-09-08 00:00:00 | 5 | 10 | val 1
2018-09-08 01:00:00 | 5 | 10 | val 1
2018-09-08 01:00:00 | 12 | 10 | val 2
You will need to take into account the other measures as well and make sure they all go into the correct bucket (val 1 or val 2). I split them evenly in the example.
Now, if you slice by hour 1 and Dim 1 = val 2, you will see only 12 for Measure 1; if you slice by hour 1 and Dim 1 = val 1, you will see only 5; and if you slice on hour 1 alone, you will see 17.
Remember, your grain is defined by the level of each dimension, not just the time dimension. HTH.
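
The grouping described in this answer can be sketched in Python. This is only an illustration under stated assumptions: the transactional rows, the `load_hourly` and `sum_measure1` helpers and the event times are hypothetical, chosen so that the hourly totals match the example table (5 and 12 for Measure 1 in hour 1).

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transactional rows: (event time, Dim 1 value, Measure 1, Measure 2).
transactions = [
    (datetime(2018, 9, 8, 0, 10), "val 1", 5, 10),
    (datetime(2018, 9, 8, 1, 5), "val 1", 2, 4),
    (datetime(2018, 9, 8, 1, 20), "val 1", 3, 6),
    (datetime(2018, 9, 8, 1, 40), "val 2", 12, 10),
]


def load_hourly(rows):
    """Aggregate rows to the grain (hour, Dim 1): one fact row per combination."""
    fact = defaultdict(lambda: [0, 0])
    for ts, dim1, m1, m2 in rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        fact[(hour, dim1)][0] += m1
        fact[(hour, dim1)][1] += m2
    return dict(fact)


def sum_measure1(fact, hour, dim1=None):
    """Sum Measure 1 for one hour, optionally restricted to one Dim 1 value."""
    return sum(
        measures[0]
        for (h, d), measures in fact.items()
        if h == hour and (dim1 is None or d == dim1)
    )


hourly = load_hourly(transactions)
hour_1 = datetime(2018, 9, 8, 1)
print(sum_measure1(hourly, hour_1, "val 2"))
print(sum_measure1(hourly, hour_1, "val 1"))
print(sum_measure1(hourly, hour_1))
```

Each hour keeps its timestamp, and a dimension change inside the hour simply produces an extra row for that hour rather than a row with a different start time.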

Related

Totals for Rows in Cognos 11 Crosstab Report

I am working in Cognos version 11.2.4 and working to attain row totals on a Crosstab report. The table I'm working with has several Locations listed as rows, Hour as columns (24 'Hours'), and count and average time measures. Using the default options to Summarize the columnar data is not an issue - I am able to get a Total for the count column and an overall average for each Average Time column. When attempting to retrieve row summarizations however, I receive an error message:
Error Message
I don't understand the verbiage stating the object selected represents a single value as it should be performing the aggregation on each column for the given row.
My expected results are outlined below:
           | Hour 1               | Hour 2               |       |         |
           | Count | Average Time | Count | Average Time | Total | Average |
           | ----- | ------------ | ----- | ------------ | ----- | ------- |
Location A | 20    | .5           | 15    | .75          | 35    | .625    |
Location B | 15    | .25          | 25    | .5           | 40    | .375    |
My question is: are the default summary options within Cognos Crosstabs not suitable for row level aggregations?

Is this a good practice for a database for products with multiple units, cost prices & expiry dates

I am working on a database that is going to have products with multiple expiration dates and multiple cost prices, and therefore there will be multiple stock entries for the same product. I have made an initial database design for this and I wanted to ask you guys if this is a good practice or not. If not, please advise me on how to do it the right way.
This is what I have thought about so far.
Creating 3 tables (1. Product_info - 2.Product_Stock - 3.units)
and below is the detailed structure:
Units Table
--------------------------
id | Name
---|------
1  | Piece
2  | Pack
3  | Kilos
Here I will list all the units that I will use as the base product unit.
Product Information Table
-----------------------------------------------------------------------------------------------------------------------
id | Name  | AvgCostPrice | AvgPrice | AvgPackCostPrice | AvgPackPrice | totalQuantity | BaseUnitID | multiplier | PackBarcode | Barcode
---|-------|--------------|----------|------------------|--------------|---------------|------------|------------|-------------|--------
1  | Soda  |              |          |                  |              | 108           | 1          | 12         | 111111      | 111222
2  | Water |              |          |                  |              | 50            | 1          | 6          | 222222      | 222111
In the above table, the average cost price and selling price for packs and pieces will be calculated from the different stocks I have for the product.
The multiplier column holds how many pieces a product pack contains.
The totalQuantity column holds the sum of the stock quantities in the Product Stock table, summed only for the base unit of the product.
For example: if the base unit of Soda is Pack, it sums the PackQTY column in the Product Stock table; otherwise it sums the Quantity column.
Product Stock Table
---------------------------------------------------------------------------------------------------------------------------
id | ProdID | UnitID | CustomBarcode | Quantity | PackQTY | CostPrice | Price | PackCostPrice | PackPrice | expDate     | Enabled
---|--------|--------|---------------|----------|---------|-----------|-------|---------------|-----------|-------------|--------
1  | 1      | 1      |               | 84       | 7       | 2.0       | 2.4   | 24.0          | 29        | 20/may/2019 | 1
2  | 1      | 1      |               | 24       | 2       | 1.5       | 1.9   | 18.0          | 23        | 10/aug/2019 | 1
2  | 2      | 3      |               | 50       | 0       | 3.0       | 5.0   | 0.0           | 0         | 10/Feb/2019 | 1
1. The Enabled column works as a Boolean that determines whether this stock is used when selling.
For example: if I want to sell a soda can and I have two stocks for it, and the quantity of stock number one is 0, then its Enabled column will be false. The quantity sold is therefore subtracted only from stock number two, and that stock's price and cost price are used in the SalesDetails table.
2. The CustomBarcode column will be used to separate stocks when there is a discount on almost-expired stock.
I also thought of separating the different units for each product stock in the Stock table.
So, when I want to sell 24 pieces of soda and 3 packs of soda, it will choose the oldest stock whose Enabled value is true and subtract that quantity from it; if the quantity reaches zero, the Enabled value changes to false.
After that it does the same again for the packs, but this time it changes the value of PackQTY from 7 to 4, and the Quantity column is calculated as [ Product_Stock.Quantity = Product_Stock.Quantity - (QtySold * Product_info.multiplier) ], which gives 84 - (3 * 12) = 48.
And the sales details structure output will be like this:
Sale Details Table
----------------------------------------------------------
id | ProdID | UnitID | Quantity | CostPrice | Price | total | CostTotal
---|--------|--------|----------|-----------|-------|-------|----------
1  | 1      | 1      | 24       | 2.0       | 2.4   | 57.6  | 48.0
2  | 1      | 2      | 3        | 18.0      | 23.0  | 69.0  | 54.0
Product Stock Table (After Selling 24 pieces of Soda and 3 packs of Soda)
---------------------------------------------------------------------------------------------------------------------------
id | ProdID | UnitID | CustomBarcode | Quantity | PackQTY | CostPrice | Price | PackCostPrice | PackPrice | expDate     | Enabled
---|--------|--------|---------------|----------|---------|-----------|-------|---------------|-----------|-------------|--------
1  | 1      | 1      |               | 48       | 4       | 2.0       | 2.4   | 24.0          | 29        | 20/may/2019 | 1
2  | 1      | 1      |               | 0        | 0       | 1.5       | 1.9   | 18.0          | 23        | 10/aug/2019 | 0
Sorry if I didn't explain it very well.
Thank you very much in advance.

Firstly, you need to be careful about how you use nouns.
For example: "Price" does not mean the same as "Cost", and "CostPrice" sounds like an oxymoron. I suggest that you restrict yourself to using either Cost or Price.
Van Ng asks if you have done an Entity Relationship diagram. Well, at the stage that you seem to be at, it is probably unwise to start with an ER diagram because an ER diagram is helpful as a summary of a model that you have already defined - and you are not yet at that stage.
Averages: If you design your database schema correctly then you can calculate data such as averages. You don't need averages as base tables.
I recommend that you consider using the fact-based modeling method called "object-role modeling"(ORM) because you can start with "the facts" before thinking about drawing ER diagrams.
Example:
I used the NORMA ORM tool to create the following example:
First, I read your text, extracted facts and then used the facts to design an object-role model.
Then I used the NORMA tool to generate a "logical view" of the object-role model. (happens in milliseconds)
I did not add everything that you mention but I hope that this will be enough to help you to make progress.
The example contains two artefacts:
1: The logical model that was generated by the NORMA tool.
2: The facts from which the logical model was generated.
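
To illustrate the point about averages being derivable rather than stored, here is a minimal Python sketch using the two Soda stock rows from the question. The quantity-weighted definition of "average" and the helper names `total_quantity` and `average_cost` are assumptions made for illustration; the question does not say how its averages should be weighted.

```python
# Soda stock rows copied from the question: (quantity in pieces, cost per piece).
soda_stock = [(84, 2.0), (24, 1.5)]


def total_quantity(stock):
    """Total pieces on hand, derived from the stock rows instead of stored."""
    return sum(qty for qty, _ in stock)


def average_cost(stock):
    """Quantity-weighted average cost per piece (an assumed definition).

    Returns None when there is no stock, to avoid dividing by zero.
    """
    qty = total_quantity(stock)
    if qty == 0:
        return None
    return sum(q * cost for q, cost in stock) / qty


print(total_quantity(soda_stock))
print(round(average_cost(soda_stock), 4))
```

The derived total agrees with the totalQuantity of 108 shown for Soda, which suggests the stored column is redundant as long as the stock rows are kept.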

SSRS 'where clause'

I've got a table that contains sales information for several companies. Each sales transaction the company makes is stored in the table, and the week of the year (1-52) that the sale took place within is stored also. Here's a small example of the database table that I'm querying to produce the SSRS report.
|---------------------|------------------|------------------|
| Company | Week |Sales_Transaction |
|---------------------|------------------|------------------|
| Alpha | 20 | 1.00 |
|---------------------|------------------|------------------|
| Alpha | 20 | 2.00 |
|---------------------|------------------|------------------|
| Beta | 20 | 9.00 |
|---------------------|------------------|------------------|
| Alpha | 21 | 5.00 |
|---------------------|------------------|------------------|
| Coolbeans | 21 | 5.50 |
|---------------------|------------------|------------------|
| Alpha | 22 | 2.00 |
|---------------------|------------------|------------------|
| Alpha | 22 | 2.00 |
|---------------------|------------------|------------------|
| Coolbeans | 22 | 3.00 |
|---------------------|------------------|------------------|
I have a matrix with a row group which produces a line in the matrix for each company. The matrix has 52 additional columns for each week of the year. Here's a condensed version of the matrix and data I want to see.
|--------------|---------------|----------------|----------------|
| Company | # Sales Wk 20 | # Sales Wk 21 | # Sales Wk 22 |
|--------------|---------------|----------------|----------------|
| Alpha | 2 | 1 | 2 |
|--------------|---------------|----------------|----------------|
| Beta | 1 | 0 | 0 |
|--------------|---------------|----------------|----------------|
| Coolbeans | 0 | 1 | 1 |
|--------------|---------------|----------------|----------------|
To count the number of sales transactions for each week for each company, I'm using an expression like this for each column:
=Count(IIF(Fields!Sales_Week_Number.Value = "20", Fields!Sales.Value, 0))
Using the example expression above which I'm placing in the # Sales Wk 20 matrix column, the problem is that instead of counting ONLY the transactions that occurred in week 20, it counts transactions for all weeks for the company. The result is that in column # Sales Wk 20, it shows a 5 for Alpha, a 1 for Beta, and a 2 for Coolbeans.
What do I need to do to make it only count the sales transaction from the specific week?
Side Note: Regarding the 52 columns for each week of the year, I intentionally did not use a column group for this b/c I need to do some other calculations/comparisons with another matrix which doesn't play nice when column groups are used. I did, however, use a row group for the companies.

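Your expression should use Sum instead of Count. Count counts every non-null value, so the 0 returned by IIF for the other weeks is still counted. Return 1 for a matching row and 0 otherwise, then sum:
=Sum(IIF(Fields!Sales_Week_Number.Value = "20", 1, 0))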
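
As a rough illustration of why the Count(IIF(..., 0)) expression in the question returns the total for all weeks, the following Python sketch mimics the two aggregates on the question's sample rows. It is not SSRS code; `count_non_null`, `count_with_zero` and `sum_of_flags` are hypothetical helpers, and `None` stands in for a null value.

```python
# Rows from the question's sample table: (company, week, sales transaction amount).
rows = [
    ("Alpha", 20, 1.00), ("Alpha", 20, 2.00), ("Beta", 20, 9.00),
    ("Alpha", 21, 5.00), ("Coolbeans", 21, 5.50),
    ("Alpha", 22, 2.00), ("Alpha", 22, 2.00), ("Coolbeans", 22, 3.00),
]


def count_non_null(values):
    """Mimics an aggregate that counts every value that is not null."""
    return sum(1 for v in values if v is not None)


def count_with_zero(company, week):
    """Counting 'amount or 0': the 0 is not null, so every row is counted."""
    values = [amount if w == week else 0 for c, w, amount in rows if c == company]
    return count_non_null(values)


def sum_of_flags(company, week):
    """Summing 1 for matching rows and 0 otherwise counts only the matches."""
    return sum(1 if w == week else 0 for c, w, _ in rows if c == company)


for company in ("Alpha", "Beta", "Coolbeans"):
    print(company, count_with_zero(company, 20), sum_of_flags(company, 20))
```

The first number per company reproduces the 5, 1 and 2 reported in the question; the second gives the intended week 20 counts.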

I think you may be going down the wrong path here. Since you're using a matrix in SSRS, the easiest way is to let SSRS handle the separation for you rather than building a WHERE clause.
Try adding =CountRows() as part of your formula, and SSRS handles the grouping for you. I'll check the format of the command when I'm properly online rather than on my phone.

Use this expression in your matrix's value column -
=IIf((Fields!Sales_Transaction.Value)>0,Count(Fields!Sales_Transaction.Value),0);

Use an array formula to calculate datedif on criteria

Let's say I have a record of logged flights in a range, as in the example below:
[A] | [B] | [C][D][...] | [G]
1 Date | Mode | More Data.... | Days Since
2 1 May | Day | .... | Formula here
3 4 May | Night | .... | Formula here
4 6 May | Day | .... | Formula here
5 8 May | Night | .... | Formula here
I can use a formula to get the datedif between each row in column G, similar to
=DATEDIF(A2,A3,"d")
and copy it all the way down the column, but I'm guessing I need an array formula to go back, find the first row above the current row that matches in column B, and get the days between those two dates. What would be the best way to go about that? I need the result to be the days between rows 5 and 3 (Night) and rows 4 and 2 (Day), copied down about 300 rows.
I was looking at another array formula for sorting rows and eliminating blanks, but I'm not sure how to adapt it to this scenario.

To get the difference in days you only have to subtract one date from the other.
The LOOKUP function can be used to find the previous match, so try this formula in G2, copied down:
=IFERROR(A2-LOOKUP(2,1/(B$1:B1=B2),A$1:A1),"")
Format the result cell as a number with no decimal places.
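
The same "previous matching row" logic can be sketched outside Excel. This Python version is only an illustration: the year 2023 is an assumption (the question gives day and month only), `days_since_previous` is a hypothetical helper, and the log is assumed to be sorted by date as in the question.

```python
from datetime import date

# Flight log from the question; the year is an assumption for illustration.
flights = [
    (date(2023, 5, 1), "Day"),
    (date(2023, 5, 4), "Night"),
    (date(2023, 5, 6), "Day"),
    (date(2023, 5, 8), "Night"),
]


def days_since_previous(log):
    """For each row, days since the closest earlier row with the same mode.

    Rows with no earlier match get None (the formula returns "" there).
    """
    last_seen = {}
    result = []
    for flown, mode in log:
        previous = last_seen.get(mode)
        result.append((flown - previous).days if previous is not None else None)
        last_seen[mode] = flown
    return result


print(days_since_previous(flights))
```

Remembering the last date per mode plays the role of the LOOKUP over the rows above the current one.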

Conditional SUM using multiple tables in EXCEL

I have a table that I'm trying to populate based on the values of two reference tables.
I have various different projects 'Type 1', 'Type 2' etc. that each run for 4 months and cost different amounts depending on when in their life cycle they are. These costings are shown in Ref Table 1.
Ref Table 1
Month | a | b | c | d
---------------------------------
Type 1 | 1 | 2 | 3 | 4
Type 2 | 10 | 20 | 30 | 40
Type 3 | 100 | 200 | 300 | 400
Ref Table 2 shows my schedule of projects for the next 3 months. With 2 new ones starting in Jan, one being a Type 1 and the other being a Type 2. In Feb, I'll have 4 projects, the first two entering their second month and two new ones start, but this time a Type 1 and a Type 3.
Ref table 2
Date | Jan | Feb | Mar
--------------------------
Type 1 | a | b | c
Type 1 | | a | b
Type 2 | a | b | c
Type 2 | | | a
Type 3 | | a | b
I'd like to create a table which calculates the total costs spent per project type each month. Example results are shown below in Results table.
Results
Date | Jan | Feb | Mar
-------------------------------
Type 1 | 1 | 3 | 5
Type 2 | 10 | 20 | 40
Type 3 | 0 | 100 | 200
I tried doing it with an array formula:
Res!b2 = {sum(if((Res!A2 = Ref2!A2:A6) * (Res!A2 = Ref1!A2:A4) * (Ref2!B2:D6 = Ref1!B1:D1), Ref!B2:E4))}
However it doesn't work and I believe that it's because of the third condition trying to compare a vector with another vector rather than a single value.
Does anyone have any idea how I can do this? Happy to use arrays, index, match, vector, lookups but NOT VBA.
Thanks

Assuming that the months in the results table headers are in the same order as in Ref table 2 (as per your example), try this formula in Res!B2:
=SUM(SUMIF(Ref1!$B$1:$E$1,IF(Ref2!$A$2:$A$6=Res!$A2,Ref2!B$2:B$6),INDEX(Ref1!$B$2:$E$4,MATCH(Res!$A2,Ref1!$A$2:$A$4,0),0)))
Confirm with CTRL+SHIFT+ENTER and copy down and across.
That gives me the same results as in your results table.
If the months might be in a different order, you can add something to check that too. I assumed that the types in the results table row labels might be in a different order from Ref table 1; if they are always in the same order (as per your example), the INDEX/MATCH part at the end can be simplified to a single range.
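
For readers who want to check the expected totals independently of Excel, here is a small Python sketch of the same calculation. The data is copied from Ref Table 1 and Ref table 2 in the question; the `monthly_totals` helper and the use of `None` for an empty schedule cell are assumptions made for illustration.

```python
# Ref Table 1: cost per project type for each life-cycle month.
costs = {
    "Type 1": {"a": 1, "b": 2, "c": 3, "d": 4},
    "Type 2": {"a": 10, "b": 20, "c": 30, "d": 40},
    "Type 3": {"a": 100, "b": 200, "c": 300, "d": 400},
}

# Ref table 2: one row per project; None marks a month with no activity.
months = ["Jan", "Feb", "Mar"]
schedule = [
    ("Type 1", ["a", "b", "c"]),
    ("Type 1", [None, "a", "b"]),
    ("Type 2", ["a", "b", "c"]),
    ("Type 2", [None, None, "a"]),
    ("Type 3", [None, "a", "b"]),
]


def monthly_totals(costs, schedule, months):
    """Sum the cost of every scheduled life-cycle stage per type and month."""
    totals = {ptype: [0] * len(months) for ptype in costs}
    for ptype, stages in schedule:
        for i, stage in enumerate(stages):
            if stage is not None:
                totals[ptype][i] += costs[ptype][stage]
    return totals


for ptype, row in monthly_totals(costs, schedule, months).items():
    print(ptype, dict(zip(months, row)))
```

The output reproduces the Results table from the question, which gives a quick way to validate the array formula against other schedules.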
