Semi-historical questions in DataWarehouse/Cube - sql-server

I'm about to design a data warehouse and a cube with a product dimension and a sales fact
table. The product dimension is an SCD (type 2).
Let's say the data in the product dimension looks like this the first year:
Name | Product Group | Year
A | A | 2011
B | A | 2011
C | B | 2011
and like this the second year:
Name | Product Group | Year
A | A | 2012
B | B | 2012
C | C | 2012
As you can see, some of the products have changed product group between 2011 and 2012.
In the cube I want to ask two types of questions:
The easy one: How much did we sell for in each product group in 2011?
The hard one: How much would we have sold for in each product group in 2011 if the products had belonged to the product groups they have in 2012?
How would you design the warehouse and the cube to accomplish this?
Thanks!
Ps. I'm using SQL Server 2012

This can be treated as a type 3 SCD. Adding a column [Product Group 2011] allows answering these types of what-if questions.
Name | Product Group | Product Group 2011 | Year
A | A | A | 2012
B | B | A | 2012
C | C | B | 2012
An alternative is to add a durable key to the product dimension and the sales fact.
id | dur_id | Name | Product Group | Year
1 | 1 | A | A | 2011
2 | 2 | B | A | 2011
3 | 3 | C | B | 2011
4 | 1 | A | A | 2012
5 | 2 | B | B | 2012
6 | 3 | C | C | 2012
Then you can join from the SALES fact to the PRODUCT dimension on dur_id; just remember to restrict on the dimension's Year.
For SSAS, you could load "duplicate" fact rows (i.e. the 2011 and 2012 fact rows associated with 2011, and again associated with 2012). You would then need to make 2012 the default member for the [Year] attribute hierarchy and prevent roll-ups that don't specify the year.
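The durable-key join can be sketched in T-SQL as follows. This is a minimal illustration rather than a full warehouse design: the table variables, the fact columns (product_id, SalesYear, Amount) and the sales amounts are made up for the example, and the 2011 dimension rows use the product groups given in the question.

```sql
-- Product dimension (SCD type 2); dur_id is shared by every version of a product.
DECLARE @Product TABLE (id int, dur_id int, Name char(1), ProductGroup char(1), [Year] int);
INSERT INTO @Product VALUES
    (1, 1, 'A', 'A', 2011), (2, 2, 'B', 'A', 2011), (3, 3, 'C', 'B', 2011),
    (4, 1, 'A', 'A', 2012), (5, 2, 'B', 'B', 2012), (6, 3, 'C', 'C', 2012);

-- Sales fact carrying both the surrogate key and the durable key (amounts are hypothetical).
DECLARE @Sales TABLE (product_id int, dur_id int, SalesYear int, Amount decimal(10, 2));
INSERT INTO @Sales VALUES (1, 1, 2011, 100), (2, 2, 2011, 200), (3, 3, 2011, 300);

-- Easy question: 2011 sales by the product group that was valid in 2011.
SELECT p.ProductGroup, SUM(s.Amount) AS Sales
FROM @Sales AS s
JOIN @Product AS p ON p.id = s.product_id
WHERE s.SalesYear = 2011
GROUP BY p.ProductGroup;   -- A: 300.00, B: 300.00

-- Hard question: 2011 sales by the product group each product has in 2012.
SELECT p.ProductGroup, SUM(s.Amount) AS Sales
FROM @Sales AS s
JOIN @Product AS p ON p.dur_id = s.dur_id
                  AND p.[Year] = 2012   -- restrict to one dimension version
WHERE s.SalesYear = 2011
GROUP BY p.ProductGroup;   -- A: 100.00, B: 200.00, C: 300.00
```

Without the restriction on the dimension's Year, each fact row would match every version of its product and the totals would be double counted.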

I got two great answers over on the MSDN forums.
http://social.msdn.microsoft.com/Forums/en-US/sqlanalysisservices/thread/96070b49-954a-456f-9687-3c8afaf74a39


FOR XML command works in SQL Server 2008 R2 but not in SQL Server 2017

==Edited to include outputs from each server==
==Edited to include additional table definition information==
I am attempting to integrate an application running on a SQL Server 2008 R2 database with a new application that runs on a SQL Server 2017 database.
This is done by SQL scripts, run as stored procedures on the 2017 database, that copy information across from the 2008 database.
The SQL script below works perfectly on the 2008 R2 database (in Management Studio 2014) and uses FOR XML to produce a string of 1s and 0s, one per week, showing whether an activity occurs in that week (1 = occurs, 0 = does not occur). This script is part of a larger SQL script.
When I run this script in SQL Server Management Studio 17 on the 2017 server, with the 2008 R2 database set up as a linked server, the script runs but FOR XML just returns a string of 0s.
I've looked into FOR XML and I am not aware of it behaving differently on different versions of SQL Server.
I also have another 10-15 integration scripts (none of which use FOR XML) that work perfectly well between the 2008 and 2017 databases with the 2008 database as a linked server.
I can return the information from each table individually via the linked server, but when I run the full query the activity ID is returned correctly and the code string is not.
I have to use FOR XML because the old database records each occurrence of an activity as an individual row, while the new system keeps one record per activity plus a string of 0s and 1s that acts as a week pattern showing whether the activity occurs.
I don't know whether the problem is FOR XML itself or the fact that it's being run via a linked server.
In the script below I have removed the references to the linked server and the database name for security reasons, but as mentioned the script works perfectly in my 2008 R2 environment.
When run in 2008 I receive the output below:
+------------+-------------------------------------------------+
| activityid | code |
+------------+-------------------------------------------------+
| 59936 | 11111110111111100000000000000000000000000000000 |
+------------+-------------------------------------------------+
When run in 2017 I receive the following output:
+------------+-------------------------------------------------+
| activityid | code |
+------------+-------------------------------------------------+
| 59936 | 00000000000000000000000000000000000000000000000 |
+------------+-------------------------------------------------+
vw_AcademicWeeks is a view that returns the following columns:
+----------------+-------------+
| Field | Type |
+----------------+-------------+
| ay_code | varchar(4) |
| week_number | int |
| ay_start | date |
| ay_end | date |
+----------------+-------------+
For each week within an academic year it returns the start and end date of the week (example shown below):
+---------+---------+------------+------------+
| ay_code | week_no | ay_start | ay_end |
+---------+---------+------------+------------+
| 1718 | 1 | 01/08/2017 | 06/08/2017 |
| 1718 | 2 | 07/08/2017 | 13/08/2017 |
| 1718 | 3 | 14/08/2017 | 20/08/2017 |
| 1718 | 4 | 21/08/2017 | 27/08/2017 |
+---------+---------+------------+------------+
The TT_ActivityOccurrence table is set up as below:
+----------------------+-----------+
| Column Name | Data Type |
+----------------------+-----------+
| ActivityOccurrenceID | int |
| ActivityID | int |
| StartTime | datetime |
| EndTime | datetime |
+----------------------+-----------+
This table contains multiple rows for an activity, with different start and end times; e.g. if an activity occurs every day at 9am, there would be five entries for a week:
+----------------------+------------+---------------------+---------------------+
| ActivityOccurrenceID | ActivityID | StartTime | EndTime |
+----------------------+------------+---------------------+---------------------+
| 2214753 | 65577 | 12/07/2019 13:30:00 | 12/07/2019 14:30:00 |
| 2214752 | 65577 | 05/07/2019 13:30:00 | 05/07/2019 14:30:00 |
| 2214906 | 65583 | 02/07/2019 14:30:00 | 02/07/2019 16:00:00 |
| 2215967 | 65613 | 02/07/2019 14:30:00 | 02/07/2019 16:00:00 |
| 2226569 | 65949 | 02/07/2019 14:30:00 | 02/07/2019 16:00:00 |
| 2226754 | 65963 | 02/07/2019 14:30:00 | 02/07/2019 16:00:00 |
+----------------------+------------+---------------------+---------------------+
The TT_Activity table contains the basic information for an activity, with a single record for each activity:
+-------------+--------------+
| Column Name | Data Type |
+-------------+--------------+
| ActivityID | int |
| Code | varchar(40) |
| Description | varchar(255) |
| PeriodID | int |
+-------------+--------------+
It contains the following information:
+------------+---------+-------------+----------+
| ActivityID | Code | Description | PeriodID |
+------------+---------+-------------+----------+
| 20668 | Maths | Maths | 2017 |
| 20669 | English | English | 2017 |
| 20670 | Science | Science | 2017 |
+------------+---------+-------------+----------+
==SQL Query Below==
select
    tta2.activityid,
    (
        select
            case when ttao.endtime is null then '0' else '1' end
        from
            vw_AcademicWeeks aw
        left join
            TT_ActivityOccurrence ttao
        on
            (dateadd(dd,datediff(dd,0,DATEADD(dd, -(DATEPART(dw, ttao.StartTime)-1), ttao.StartTime)),0)) = aw.ay_start
            and ay_code='1718'
            and ttao.ActivityID=tta2.ActivityID
        where
            aw.week_no>=6
        group by
            ttao.ActivityID,
            aw.week_no,
            case when ttao.endtime is null then '0' else '1' end
        having
            count(aw.week_no)<>9
        order by
            week_no asc
        FOR XML PATH('')) as code
from
    TT_Activity tta2
where tta2.PeriodID='2017'
Having looked at the code again and pulled it apart, I've found the cause of the issue.
The language of the 2008 R2 server was set to British, while the language of the 2017 server was set to US English.
This caused the vw_AcademicWeeks view to create week start and end dates that were wrong, and as a result the date returned by the join expression below no longer matched aw.ay_start:
TT_ActivityOccurrence TTAO ON (dateadd(dd, datediff(dd, 0, DATEADD(dd, - (DATEPART(dw, ttao.StartTime) - 1), ttao.StartTime)), 0)) = aw.ay_start
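One likely mechanism, offered here as an assumption since the answer does not name the exact setting, is that DATEPART(dw, ...) depends on DATEFIRST, and DATEFIRST defaults to 1 (Monday) for British English but 7 (Sunday) for us_english. The sketch below runs the question's week-start expression under both settings, then shows a variant that does not depend on DATEFIRST. The variable @d and the sample date are illustrative.

```sql
DECLARE @d datetime = '20190705 13:30:00';  -- a Friday

SET DATEFIRST 1;  -- Monday is day 1 (default for British English)
SELECT DATEADD(dd, DATEDIFF(dd, 0, DATEADD(dd, -(DATEPART(dw, @d) - 1), @d)), 0) AS week_start;
-- 2019-07-01 (Monday)

SET DATEFIRST 7;  -- Sunday is day 1 (default for us_english)
SELECT DATEADD(dd, DATEDIFF(dd, 0, DATEADD(dd, -(DATEPART(dw, @d) - 1), @d)), 0) AS week_start;
-- 2019-06-30 (Sunday), which no longer equals a Monday-based ay_start

-- Independent of DATEFIRST: day 0 (1900-01-01) was a Monday, so counting
-- whole weeks from day 0 always lands on the Monday of @d's week.
SELECT DATEADD(dd, (DATEDIFF(dd, 0, @d) / 7) * 7, 0) AS week_start_monday;
-- 2019-07-01 under either setting
```

If the view builds its dates from strings such as '01/08/2017', the language setting also changes how those are parsed (dmy versus mdy), so unseparated yyyymmdd literals are the safer choice there.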

Limit RANGE with condition in Window function

As an example, I have the following transaction table, with the transaction values of each department for each trimester.
TransactionID | Department | Trimester | Year | Value | Moving Avg
1 | Dep1 | T1 | 2014 | 13 |
2 | Dep1 | T1 | 2014 | 43 |
3 | Dep1 | T2 | 2014 | 36 |
300 | Dep1 | T1 | 2017 | 28 |
301 | Dep2 | T1 | 2014 | 24 |
I would like to calculate a moving average for each transaction within the same department, with a window running from 6 trimesters to 2 trimesters before the current row's trimester. For example, for transaction 300 in T1 2017, I'd like the average of the transaction values for Dep1 from T1 2015 to T2 2016.
How can I achieve this with a sliding window function in SQL Server 2014? My thought is that I should use something like
SELECT
AVG(Value) OVER
(PARTITION BY Department ORDER BY Year, Trimester
RANGE [Take the range from previous 6 to 2 trimesters])
How would we define the RANGE clause? I suppose I cannot use ROWS because the number of rows in the window is unknown.
The same question applies to the median: how would we rewrite this to calculate the median instead of the mean?
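In SQL Server, RANGE accepts only UNBOUNDED PRECEDING/FOLLOWING and CURRENT ROW as bounds, so a numeric offset such as "6 PRECEDING" is not available with RANGE. One possible workaround, sketched below under assumptions, is to turn each (Year, Trimester) pair into a running trimester index and look up the window with OUTER APPLY. The sketch assumes three trimesters per year (which matches the T1 2015 to T2 2016 example), and the rows with TransactionID 100 and 101 are hypothetical, added only so the window for transaction 300 is not empty. TriIdx, MovingAvg and MovingMedian are names invented for the example.

```sql
DECLARE @Transactions TABLE (TransactionID int, Department varchar(10),
                             Trimester char(2), [Year] int, Value decimal(10, 2));
INSERT INTO @Transactions VALUES
    (1,   'Dep1', 'T1', 2014, 13),
    (2,   'Dep1', 'T1', 2014, 43),
    (3,   'Dep1', 'T2', 2014, 36),
    (100, 'Dep1', 'T2', 2015, 10),   -- hypothetical row
    (101, 'Dep1', 'T1', 2016, 20),   -- hypothetical row
    (300, 'Dep1', 'T1', 2017, 28),
    (301, 'Dep2', 'T1', 2014, 24);

WITH t AS (
    -- Consecutive trimesters get consecutive index values (assumes T1..T3).
    SELECT *, [Year] * 3 + CAST(RIGHT(Trimester, 1) AS int) AS TriIdx
    FROM @Transactions
)
SELECT c.TransactionID, c.Department, c.Trimester, c.[Year], c.Value,
       a.MovingAvg, m.MovingMedian
FROM t AS c
OUTER APPLY (
    -- Mean over the 6th to the 2nd trimester before the current row's trimester.
    SELECT AVG(p.Value) AS MovingAvg
    FROM t AS p
    WHERE p.Department = c.Department
      AND p.TriIdx BETWEEN c.TriIdx - 6 AND c.TriIdx - 2
) AS a
OUTER APPLY (
    -- Median over the same window; PERCENTILE_CONT is a window function,
    -- so TOP (1) collapses its per-row result to a single value.
    SELECT TOP (1)
           PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY p.Value) OVER () AS MovingMedian
    FROM t AS p
    WHERE p.Department = c.Department
      AND p.TriIdx BETWEEN c.TriIdx - 6 AND c.TriIdx - 2
) AS m
ORDER BY c.TransactionID;
-- For transaction 300 the window holds the values 10 and 20, so both the mean and the median are 15.
```

Rows whose window is empty get NULL for both columns, because OUTER APPLY keeps the outer row.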

How do I use SQL to count two separate rows, then display the separate counts

I've checked through here on how to use group by to count rows, but I think I am implementing it wrong, or missing something.
What I have:
Two columns with data: Machine_GroupID (I extract the client name from this) and fieldValue, which contains the ship date of the machine (I extract only the year from this).
SQL Server 2012
Example:
+------------------------------+-----------------------+
| Machine_GroupID(Client Name) | fieldValue(Ship Date) |
+------------------------------+-----------------------+
| Site1.clientA | 2015-05-07 |
| Site2.clientA | 2014-01-06 |
| Department.Site1.clientA | 2015-02-05 |
| Site1.clientB | 2014-03-04 |
| Department.Site1.ClientC | 2015-10-01 |
+------------------------------+-----------------------+
What I am trying to do:
I am trying to generate a report to show all of the workstations a client purchased in a certain year. This will end up being a report in reportviewer, or something useful to display the data to our Executive team.
Desired Report example:
Machines purchased in 2015
+---------+------------------------+
| ClientA | 2(Count of fieldValue) |
+---------+------------------------+
| ClientC | 1 |
+---------+------------------------+
Machines purchased in 2014
+---------+---+
| ClientA | 1 |
+---------+---+
| ClientB | 1 |
+---------+---+
My Code so far:
select count(*), reverse(left(reverse(Machine_GroupID) ,
charindex('.',reverse(Machine_GroupID))-1)) as Client ,
LEFT(fieldValue,4) AS "Ship Year"
from dbo.vSystemInfoManual
where fieldName = 'Ship Date'
group by fieldValue, Machine_GroupID
This code generates a table that looks like the following:
+---+----------+-----------+
| | Client | Ship Year |
+---+----------+-----------+
| 1 | ClientA | 2015 |
| 1 | ClientA | 2015 |
| 1 | ClientA | 2014 |
| 1 | ClientB | 2014 |
| 1 | ClientC | 2015 |
+---+----------+-----------+
Is there a change I can make to my code to make this possible? Am I trying to do too much with this query? I am still learning SQL, so any help is definitely appreciated! Thank you.
Your sample data doesn't really match the results you posted. Anyway, the counts stay at 1 because you group by the raw fieldValue and Machine_GroupID columns; group by the extracted client name and year instead. I would also change the way you get the Client value by using the PARSENAME function:
SELECT PARSENAME(Machine_GroupID, 1) AS Client,
       LEFT(fieldValue, 4) AS [Ship Year],
       COUNT(*) AS N
FROM dbo.vSystemInfoManual
WHERE fieldName = 'Ship Date'
GROUP BY PARSENAME(Machine_GroupID, 1),
         LEFT(fieldValue, 4);
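To try the query without access to the real view, here is a small self-contained sketch. The table variable stands in for dbo.vSystemInfoManual, and its column types are assumptions; the rows are the sample data from the question.

```sql
-- Stand-in for dbo.vSystemInfoManual (column types are assumed).
DECLARE @vSystemInfoManual TABLE (Machine_GroupID varchar(100),
                                  fieldName varchar(50),
                                  fieldValue varchar(50));
INSERT INTO @vSystemInfoManual VALUES
    ('Site1.clientA',            'Ship Date', '2015-05-07'),
    ('Site2.clientA',            'Ship Date', '2014-01-06'),
    ('Department.Site1.clientA', 'Ship Date', '2015-02-05'),
    ('Site1.clientB',            'Ship Date', '2014-03-04'),
    ('Department.Site1.ClientC', 'Ship Date', '2015-10-01');

-- Group by the derived client and year, not by the raw columns.
SELECT PARSENAME(Machine_GroupID, 1) AS Client,
       LEFT(fieldValue, 4) AS [Ship Year],
       COUNT(*) AS N
FROM @vSystemInfoManual
WHERE fieldName = 'Ship Date'
GROUP BY PARSENAME(Machine_GroupID, 1),
         LEFT(fieldValue, 4)
ORDER BY [Ship Year] DESC, Client;
-- 2015: clientA 2, ClientC 1; 2014: clientA 1, clientB 1
```

PARSENAME splits on dots and handles at most four parts, returning NULL beyond that, so group IDs with more than four dot-separated parts would need the REVERSE/CHARINDEX approach from the question instead.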

How to query min and max when a column consists of different several strings

I am still learning SQL. I have a table that looks like this:
The Financial_Year_Month_Code consists of the financial year and the period (P01 stands for Period 1). The Calendar_Key is just the year, month and day.
|---------------------------|--------------|--------------------
| Financial_Year_Month_Code | Calendar_Key | Column 3...
|---------------------------|--------------|--------------------
| 1988 P01 | 19870901 |
| ......
| 2013 P01 | 20110901 |
| 2013 P01 | 20110902 |
| 2013 P01 | 20110903 |
| 2013 P01 | ..... |
| 2013 P02 | 20111002 |
| 2013 P02 | 20111003 |
| 2013 P02 | 20111004 |
| MORE...
The result I want to query should look like this:
|----------------------|------------------|--------------------|-----------------|
| Financial_Code_Start | Calendar_Key_Min | Financial_Code_End |Calendar_key_max |
|----------------------|------------------|--------------------|-----------------|
| 2013 P02 | 20110901 | 2014 P01 | 20120930 |
| 2013 P03 | 20111002 | 2014 P02 | 20121029 |
| ....
The original table has a huge list of financial year codes from 1988 to 2014. Management would like me to produce a table like the one above, listing the financial year codes between 2012 and 2014, so they can see the start date and end date of each rolling 12-month period. Unfortunately our developers are all on holiday at the moment, so I need help with this one. Thank you very much.
You can group the query by Financial_Year_Month_Code and then calculate aggregates (like MIN and MAX) for each group:
SELECT Financial_Year_Month_Code,
       MIN(Calendar_Key) AS Calendar_Key_Min,
       MAX(Calendar_Key) AS Calendar_Key_Max
FROM YOUR_TABLE
GROUP BY Financial_Year_Month_Code

how do I design return travel itinerary vs one way itinerary

I'm not quite sure how to approach this problem:
The price for a one-way trip is different from the price for a round-trip itinerary.
In the backend, I have a table for storing the itinerary (which yields an id). I have another pricing table, which defines the price of this id from startDate to endDate.
My itinerary table can only represent information for one-way travel. How do I model round-trip itineraries?
One way to deal with this would be:
have another column in the table: returnId
if returnId = -1 -> one way trip
else
returnId = id of its complementary itinerary
For example,
A -> B is a roundtrip itinerary & C -> D is a one way trip;
It would look something like this:
Id | Departure | Arrival | ReturnId
1 | A | B | 3
2 | C | D | -1
3 | B | A | 1
In this case pricing table
Id | StartDate | EndDate | Price
1 | Jan 1, 2012 | Dec 10,2012| 150.00
3 | Jan 1, 2012 | Dec 10,2012| 150.00
2 | Jan 1, 2012 | Dec 10,2012| 100.00
I'd like to hear thoughts/suggestions on this design.
EDIT:
I added a related question, and I think the answer to this problem will have to cater to both requirements.
One thing I'd like to mention: the price for a round trip is specified as a unit, not as individual components for A->B and back, B->A.
Similarly, if there are multiple segments in a trip, the price is defined for the complete trip and not for individual segments.
Rather than adding a self join like that, I would have a Trip table which contains the one-to-many mapping of Trip to Itinerary (where one trip consists of multiple itineraries). This way, a trip can have more than two legs.
Something like:
Trip_Itineraries
TripId | ItineraryId
1 | 1
1 | 2
2 | 3
Itinerary
ItineraryId | Departure | Arrival
1 | A | B
2 | B | A
3 | C | D
Pricing
ItineraryId | StartDate | EndDate | Price
1 | Jan 1, 2012 | Jul 10,2012 | 100.00
2 | Jul 1, 2012 | Dec 10,2012 | 100.00
3 | Jul 1, 2012 | Dec 10,2012 | 150.00
Then you can do:
SELECT T.TripId, sum(P.price)
FROM Trip_Itineraries T INNER JOIN Pricing P ON T.ItineraryId = P.ItineraryId
GROUP BY T.TripId
to get the total price for the trip.
Blended the two answers and came up with this:
Journey:
journeyId
baggage policy
misc
Segment:
segmentId
journeyId (FK)
segment info
Price:
journeyId
startDate
endDate
price
Journey
Jid | baggage | misc
1 | "baggage policy1" | "round trip A->B"
2 | "baggage policy2" | "one-way C->D"
3 | "baggage policy3" | "one-way E->H with a hop in F, followed by G to H"
Segment
Id | Jid | Dep | Arrival
1 | 1 | A | B
2 | 1 | B | A
3 | 2 | C | D
4 | 3 | E | F
5 | 3 | F | G
6 | 3 | G | H
Price
JourneyId | StartDate | EndDate | Price
1 | Jan 1, 2012 | Dec 10,2012| 150.00
3 | Jan 1, 2012 | Dec 10,2012| 150.00
2 | Jan 1, 2012 | Dec 10,2012| 100.00
Thoughts ?
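For reference, the blended design can be sketched as T-SQL tables loaded with the sample rows above. Temporary tables are used so the script cleans up after itself, and the column types, the lookup date and the SegmentCount column are assumptions added for illustration. In a permanent schema, Segment.JourneyId and Price.JourneyId would be foreign keys to Journey.

```sql
CREATE TABLE #Journey (JourneyId int PRIMARY KEY, BaggagePolicy varchar(100), Misc varchar(200));
CREATE TABLE #Segment (SegmentId int PRIMARY KEY, JourneyId int NOT NULL,
                       Departure char(1), Arrival char(1));
CREATE TABLE #Price   (JourneyId int NOT NULL, StartDate date, EndDate date, Price decimal(10, 2));

INSERT INTO #Journey VALUES
    (1, 'baggage policy1', 'round trip A->B'),
    (2, 'baggage policy2', 'one-way C->D'),
    (3, 'baggage policy3', 'one-way E->H with a hop in F, followed by G to H');
INSERT INTO #Segment VALUES
    (1, 1, 'A', 'B'), (2, 1, 'B', 'A'), (3, 2, 'C', 'D'),
    (4, 3, 'E', 'F'), (5, 3, 'F', 'G'), (6, 3, 'G', 'H');
INSERT INTO #Price VALUES
    (1, '20120101', '20121210', 150.00),
    (3, '20120101', '20121210', 150.00),
    (2, '20120101', '20121210', 100.00);

-- The price is stored once per journey, so no summing over segments is needed.
SELECT j.JourneyId, j.Misc, p.Price,
       (SELECT COUNT(*) FROM #Segment AS s WHERE s.JourneyId = j.JourneyId) AS SegmentCount
FROM #Journey AS j
JOIN #Price AS p ON p.JourneyId = j.JourneyId
WHERE '20120601' BETWEEN p.StartDate AND p.EndDate   -- hypothetical travel date
ORDER BY j.JourneyId;
-- 1: 150.00, 2 segments; 2: 100.00, 1 segment; 3: 150.00, 3 segments

DROP TABLE #Price, #Segment, #Journey;
```

Keeping the price at journey level matches the requirement that a round trip or multi-segment trip is priced as a unit.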
