MS Analysis Cube - one-to-many joins - sql-server

I am building an OLAP cube in MS SQL Server BI Studio. I have two main tables that contain my measures and dimensions.
The first table contains:
Date | Keywords | Measure1
where date-keyword is the composite key.
The second table looks like:
Date | Keyword | Product | Measure2 | Measure3
where date-keyword-product is the composite key.
My problem is that there can be a one-to-many relationship between date-keyword pairs in the first table and date-keyword pairs in the second table, because the second table breaks the data down by product.
I want to be able to make queries that look something like this when filtered for a given Keyword:
                            Measure1  Measure2  Measure3
========================================================
Tuesday, January 01 2013          23        19        18
========================================================
Bike                              23
Car                               23        16        13
Motorcycle                        23
Caravan                           23         2         4
Van                               23         1         1
I've created dimensions for Date and ProductType, but I'm having problems creating the dimension for Keywords. I can create a Keyword dimension that affects the measures from the second table, but not those from the first.
Can anyone point me to any good tutorials for doing this sort of thing?
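The following is a minimal sketch that shows why the one-to-many relationship matters. It is written in Python, with an in-memory SQLite database standing in for SQL Server. The table names (keyword_facts, product_facts) and the keyword value 'kw' are hypothetical, and the numbers come from the example above.

Joining the two tables directly repeats Measure1 once per product row. Aggregating each table at the shared date-keyword grain first gives the expected totals. This illustrates only the relational intuition for keeping the two tables as separate measure groups rather than one joined fact table. It is not SSAS code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Grain: one row per (Date, Keyword).
CREATE TABLE keyword_facts (Date TEXT, Keyword TEXT, Measure1 INTEGER,
                            PRIMARY KEY (Date, Keyword));
-- Grain: one row per (Date, Keyword, Product).
CREATE TABLE product_facts (Date TEXT, Keyword TEXT, Product TEXT,
                            Measure2 INTEGER, Measure3 INTEGER,
                            PRIMARY KEY (Date, Keyword, Product));
INSERT INTO keyword_facts VALUES ('2013-01-01', 'kw', 23);
INSERT INTO product_facts VALUES
    ('2013-01-01', 'kw', 'Car',     16, 13),
    ('2013-01-01', 'kw', 'Caravan',  2,  4),
    ('2013-01-01', 'kw', 'Van',      1,  1);
""")

# Naive join: Measure1 is repeated once per product row, so SUM over-counts it.
naive = conn.execute("""
SELECT SUM(k.Measure1), SUM(p.Measure2), SUM(p.Measure3)
FROM keyword_facts AS k
JOIN product_facts AS p ON p.Date = k.Date AND p.Keyword = k.Keyword
""").fetchone()

# Aggregate the product-level table at the shared (Date, Keyword) grain first, then join.
correct = conn.execute("""
SELECT k.Measure1, p.M2, p.M3
FROM keyword_facts AS k
JOIN (SELECT Date, Keyword, SUM(Measure2) AS M2, SUM(Measure3) AS M3
      FROM product_facts GROUP BY Date, Keyword) AS p
  ON p.Date = k.Date AND p.Keyword = k.Keyword
""").fetchone()

print(naive)    # (69, 19, 18)
print(correct)  # (23, 19, 18)
```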

It turns out that the first table had one row with all NULL values (a weird side effect of uploading an Excel file straight into the SQL Server database). Because the value the cube was trying to apply the dimension to was NULL in that one row, the whole cube build and deploy failed without any useful error message! Grr.
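The following is a minimal sketch of catching such rows before processing the cube. Python's sqlite3 again stands in for SQL Server, and the table and column names are hypothetical. The WHERE clause is plain SQL (IS NULL checks on the key columns), so the same condition should work in T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE keyword_facts (Date TEXT, Keyword TEXT, Measure1 INTEGER);
INSERT INTO keyword_facts VALUES ('2013-01-01', 'kw', 23);
-- A stray all-NULL row, like the one left behind by the Excel import.
INSERT INTO keyword_facts VALUES (NULL, NULL, NULL);
""")

# A NULL in a dimension key column is what tripped up the cube in this case.
NULL_KEYS = "Date IS NULL OR Keyword IS NULL"

bad_before = conn.execute(f"SELECT COUNT(*) FROM keyword_facts WHERE {NULL_KEYS}").fetchone()[0]
conn.execute(f"DELETE FROM keyword_facts WHERE {NULL_KEYS}")  # remove the stray rows
conn.commit()
bad_after = conn.execute(f"SELECT COUNT(*) FROM keyword_facts WHERE {NULL_KEYS}").fetchone()[0]
remaining = conn.execute("SELECT COUNT(*) FROM keyword_facts").fetchone()[0]

print(bad_before, bad_after, remaining)  # 1 0 1
```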

Related

How to use pivoted column values from a Matrix in another Tablix and write expressions on top of them

I have one dataset with the fields StudentId, Name and Address, which is used in one Tablix.
I also have another dataset with StudentID, Subject and Marks, which I pivot with a Matrix in the report.
Currently the report looks like this:
StudentID  Name   Address  Maths  Physics  Chemistry  Median
1          Mike   NJ       85     70       90         2
2          David  CA       81     85       90         3
I calculate Median by counting the number of subject marks greater than 80.
How can I use the Median value in the Tablix instead of in the Matrix?
This is the expected output format:
StudentID  Median  Name   Address  Maths  Physics  Chemistry
1          2       Mike   NJ       85     70       90
2          3       David  CA       81     85       90
Note: I am using a Matrix to pivot the Subject column in the SSRS report. I pivot in SSRS instead of in the stored procedure because pivoting in the SP produces 40 columns, which I would then have to map physically. In this example I have shown only 3 columns (Maths, Physics and Chemistry).
Please also let me know whether the expected output format is possible at all.
Is there any way to pivot the Subject columns inside the Tablix itself instead of using a separate Matrix?
Thank you.
There are typically two ways to approach an aggregation like this. If you stick with the two existing datasets, you'll have to use the Lookup or LookupSet functions to get data from the other dataset. For example, if your table/matrix uses the second dataset as its source, you would use Lookup to get the Name of each student. Keep in mind that this is not efficient for large reports.
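Here is a rough mental model of the two functions. It is not how SSRS implements them. Lookup returns the first matching value from the other dataset, and LookupSet returns all matching values.

The plain-Python sketch below mimics that behavior on the question's sample rows. The function names lookup and lookup_set are hypothetical stand-ins for the SSRS functions. In this naive model, each call scans the whole other dataset.

```python
def lookup(source_value, dest_rows, dest_key, result_field):
    """Model of Lookup: value from the first row whose key matches, else None."""
    for row in dest_rows:
        if row[dest_key] == source_value:
            return row[result_field]
    return None


def lookup_set(source_value, dest_rows, dest_key, result_field):
    """Model of LookupSet: every matching value, in dataset order."""
    return [row[result_field] for row in dest_rows if row[dest_key] == source_value]


# First dataset: student details (sample rows from the question).
students = [
    {"StudentID": 1, "Name": "Mike", "Address": "NJ"},
    {"StudentID": 2, "Name": "David", "Address": "CA"},
]
# Second dataset: one row per student and subject.
marks = [
    {"StudentID": 1, "Subject": "Maths", "Marks": 85},
    {"StudentID": 1, "Subject": "Physics", "Marks": 70},
    {"StudentID": 1, "Subject": "Chemistry", "Marks": 90},
    {"StudentID": 2, "Subject": "Maths", "Marks": 81},
    {"StudentID": 2, "Subject": "Physics", "Marks": 85},
    {"StudentID": 2, "Subject": "Chemistry", "Marks": 90},
]

# A table bound to `marks` looks up each row's student Name in `students`.
names = [lookup(m["StudentID"], students, "StudentID", "Name") for m in marks]
# All of Mike's marks, as LookupSet would return them.
mike_marks = lookup_set(1, marks, "StudentID", "Marks")

print(names)       # ['Mike', 'Mike', 'Mike', 'David', 'David', 'David']
print(mike_marks)  # [85, 70, 90]
```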
The other approach, which I would recommend, is to join these two datasets in SQL and use that as the data source for the report. This is more efficient and makes the report simpler to maintain.
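The following is a minimal sketch of the recommended approach, using Python's sqlite3 as a stand-in for SQL Server. The table names Students and Marks are hypothetical.

The query joins the two datasets and computes the question's "Median" (the count of marks above 80) per student. Every row therefore carries StudentID, Median, Name and Address. In the report, a single Matrix could then use those fields as row groups and Subject as a column group, which keeps the pivoting in SSRS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT, Address TEXT);
CREATE TABLE Marks (StudentID INTEGER, Subject TEXT, Marks INTEGER);
INSERT INTO Students VALUES (1, 'Mike', 'NJ'), (2, 'David', 'CA');
INSERT INTO Marks VALUES
    (1, 'Maths', 85), (1, 'Physics', 70), (1, 'Chemistry', 90),
    (2, 'Maths', 81), (2, 'Physics', 85), (2, 'Chemistry', 90);
""")

# One row per student and subject, with the student's details and the
# per-student count of marks above 80 repeated on every row.
rows = conn.execute("""
SELECT s.StudentID, c.Median, s.Name, s.Address, m.Subject, m.Marks
FROM Students AS s
JOIN Marks AS m ON m.StudentID = s.StudentID
JOIN (SELECT StudentID,
             SUM(CASE WHEN Marks > 80 THEN 1 ELSE 0 END) AS Median
      FROM Marks GROUP BY StudentID) AS c
  ON c.StudentID = s.StudentID
ORDER BY s.StudentID, m.Subject
""").fetchall()

# Collapse to one Median value per student to check the calculation.
medians = {student_id: median for student_id, median, *_ in rows}
print(medians)  # {1: 2, 2: 3}
```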
It's good that you are letting the report do the pivoting for you; it works much better that way.

SQL Server database design for evaluations

I'm designing an employee evaluation web page and am wondering whether my current database design is correct or could be improved.
This is my current design:
Table Agenda:
+--------------+----------+----------+-----------+------+-------+-------+
| idEvaluation | Location | Employee | #Employee | Date | Date1 | Date2 |
+--------------+----------+----------+-----------+------+-------+-------+
Date is the date scheduled for the evaluation to be performed.
Date1 and Date2 define a period of time used to retrieve some metrics from another database.
Table Evaluations:
+--------------+---------+------------+------+----------+
| idEvaluation | Manager | Department | Date | Comments |
+--------------+---------+------------+------+----------+
Table Scores:
+--------------+----------+-------+
| idEvaluation | idFactor | Score |
+--------------+----------+-------+
idFactor references another table that contains each factor and its description. As I said, is this a correct design?
My concern is this: there are currently 60 employees, 11 managers and 12 factors, and each employee is evaluated twice a year by every manager. The Agenda table is not much trouble, since it holds only one record per evaluation (60 employees = 60 records). The Evaluations table, however, holds 11 records for every evaluation, so it grows to 660 records (60 employees * 11 managers = 660). The Scores table grows even bigger, since there are 12 factors for every evaluation: 660 evaluations * 12 factors each = 7920 records.
Is this normal? Am I doing it wrong? Any input is appreciated.
EDIT
Location, Employee, #Employee, Manager and Department are loaded automatically by the VB.NET page. They are "imported" from Active Directory and checked before insertion, so duplicate names, misspelled names and the like are not an issue.
The main idea is that you don't want to repeat string literals.
So if you have:
id Department
1 Sales
2 IT
3 Admin
Instead of repeating "Sales" many times, you store only 1, which is smaller, so things also get faster.
Second, if you have users:
id user
1 Jhon Alexander
2 Maria Jhonson
If Jhon decides to change his name, you will have to check every table and change the name in each one. There is also a problem if two people have the same name: you won't know which one you are evaluating.
So use separate tables and reference them by ID.
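The following is a minimal sketch of that layout, with Python's sqlite3 standing in for SQL Server. The Users table, its columns and the rename are hypothetical. Because evaluations store only IDs, changing a name touches exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
CREATE TABLE Department (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                    department_id INTEGER NOT NULL REFERENCES Department(id));
-- Evaluations store only IDs, never the name strings themselves.
CREATE TABLE Evaluations (idEvaluation INTEGER PRIMARY KEY,
                          employee_id INTEGER NOT NULL REFERENCES Users(id),
                          manager_id  INTEGER NOT NULL REFERENCES Users(id));
INSERT INTO Department VALUES (1, 'Sales'), (2, 'IT'), (3, 'Admin');
INSERT INTO Users VALUES (1, 'Jhon Alexander', 1), (2, 'Maria Jhonson', 2);
INSERT INTO Evaluations VALUES (100, 1, 2);
""")

# Renaming a user updates a single row; every evaluation sees it through the ID.
conn.execute("UPDATE Users SET name = 'John Alexander' WHERE id = 1")

employee_name = conn.execute("""
SELECT u.name
FROM Evaluations AS e JOIN Users AS u ON u.id = e.employee_id
WHERE e.idEvaluation = 100
""").fetchone()[0]
print(employee_name)  # John Alexander
```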

Optimal View Design To Find Mismatches Between Two Sets of Data

A bit of background: my company uses a piece of software that stores information about a mortgage loan in independent fields. These fields are spread across many tables in the loan database.
My current dilemma is designing one or more views that will let me find mismatched data on a subset of loans between the underwriting side and the lock side of our software.
Here is a quick example of the data returned from the two views that already exist:
UW View
transID | DTIField | LTVField | MIField
50000 | 37.5 | 85.0 | 1
Lock View
transID | DTIField | LTVField | MIField
50000 | 42.0 | 85.0 | 0
In the above situation, the view should return the fields that do not match (in this case DTIField and MIField). I have already built a comparison view that uses a series of CASE statements to return 0 for not matched or 1 for matched:
transID | DTIField | LTVField | MIField
50000 | 0 | 1 | 0
This works in itself, but it creates an issue downstream on the reporting side. We want to build a report that displays only the transIDs with mismatched data and shows which columns do not match. The reporting tool in question is Crystal Reports.
Some specifics about the data sets: we compare 27 items of the loan (54 fields in total). There are over 4,000 loans in the system, and the number is growing. The transID fields are already indexed.
How would you structure the view to return all the data needed for the report? We can do a good amount of work in Crystal Reports but ideally much of the logic would be handled in MSSQL.
Thanks for any assistance.
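The sketch below shows the CASE-based comparison view described above, plus a filter that keeps only the transIDs the report needs. It makes several simplifying assumptions:

- Python's sqlite3 stands in for SQL Server.
- Plain tables replace the existing UW and Lock views.
- Only 3 of the 27 compared items are included.
- The second loan (50001) is made-up data.

An `=` comparison is not true when either side is NULL, so a NULL on either side counts as a mismatch here. Whether that is the desired behavior is an assumption to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uw_view   (transID INTEGER PRIMARY KEY, DTIField REAL, LTVField REAL, MIField INTEGER);
CREATE TABLE lock_view (transID INTEGER PRIMARY KEY, DTIField REAL, LTVField REAL, MIField INTEGER);
INSERT INTO uw_view   VALUES (50000, 37.5, 85.0, 1), (50001, 30.0, 80.0, 0);
INSERT INTO lock_view VALUES (50000, 42.0, 85.0, 0), (50001, 30.0, 80.0, 0);

-- 1 = matched, 0 = not matched (a NULL on either side also yields 0).
CREATE VIEW comparison AS
SELECT u.transID,
       CASE WHEN u.DTIField = l.DTIField THEN 1 ELSE 0 END AS DTIField,
       CASE WHEN u.LTVField = l.LTVField THEN 1 ELSE 0 END AS LTVField,
       CASE WHEN u.MIField  = l.MIField  THEN 1 ELSE 0 END AS MIField
FROM uw_view AS u
JOIN lock_view AS l ON l.transID = u.transID;
""")

# Keep only transIDs with at least one mismatched column.
rows = conn.execute("""
SELECT transID, DTIField, LTVField, MIField
FROM comparison
WHERE DTIField = 0 OR LTVField = 0 OR MIField = 0
""").fetchall()
print(rows)  # [(50000, 0, 1, 0)]
```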
I don't think comparing the 27 columns for a given row should be a problem. Since you read each row just once and compare the columns of that row across the two tables, it shouldn't pose any real performance issue. You can also use a hash function such as HASHBYTES to compute a hash over the combination of these 27 fields in each table. You can then compare the hashes to decide which rows the view should return. This should result in some performance improvement; testing will reveal more.
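The following is a minimal sketch of the hashing idea in plain Python, with hashlib's SHA-256 standing in for T-SQL's HASHBYTES. The field list and rows are hypothetical (3 of the 27 items).

Values are joined with a delimiter and NULLs are encoded explicitly, so that, for example, ("1", "23") and ("12", "3") do not produce the same input string. Rows with equal hashes are skipped. Only rows whose hashes differ get a per-column comparison that lists the mismatched fields.

```python
import hashlib

# Hypothetical compared fields (3 of the 27) and rows keyed by transID.
FIELDS = ("DTIField", "LTVField", "MIField")

uw_view = {
    50000: {"DTIField": 37.5, "LTVField": 85.0, "MIField": 1},
    50001: {"DTIField": 30.0, "LTVField": 80.0, "MIField": 0},
}
lock_view = {
    50000: {"DTIField": 42.0, "LTVField": 85.0, "MIField": 0},
    50001: {"DTIField": 30.0, "LTVField": 80.0, "MIField": 0},
}


def row_hash(row):
    """SHA-256 over the compared fields in a fixed order, '|'-delimited, NULL encoded explicitly."""
    parts = ["<NULL>" if row[f] is None else repr(row[f]) for f in FIELDS]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def mismatches(uw, lock):
    """Return {transID: [mismatched field names]} for transIDs present in both views."""
    result = {}
    for trans_id in sorted(uw.keys() & lock.keys()):
        a, b = uw[trans_id], lock[trans_id]
        if row_hash(a) == row_hash(b):
            continue  # identical hashes: treat the row as fully matched
        result[trans_id] = [f for f in FIELDS if a[f] != b[f]]
    return result


result = mismatches(uw_view, lock_view)
print(result)  # {50000: ['DTIField', 'MIField']}
```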

How to design a Db table for attendance

I am currently working on a school management system but can't seem to figure out the best way to design my student attendance table.
INFO
The term lasts 14 weeks and classes are held 5 times a week. There can be up to 2,000 students per term, which means up to 14 x 5 x 2,000 = 140,000 attendance records per term.
I am developing the application for a desktop using VB.Net and MS Access.
PROGRESS SO FAR
So far I have designed something that I am skeptical about:
table name: attendance
+----+--------+----------+-----------+--------+
| id | std_id | att_week | att_date  | status |
+----+--------+----------+-----------+--------+
| 1  | 0001   | 1        | 29/9/2015 | yes    |
| 2  | 0002   | 1        | 29/9/2015 | yes    |
+----+--------+----------+-----------+--------+
I quickly found that this design can easily yield 140,000 rows per term.
I also thought of using the weekdays as column names, but that would result in 14 x 5 = 70 columns.
What is the best way to design this table?
I think you should construct your table like this, so that it records only the absentees:
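+----+------------+-------+------------+
| id | student_id | class | date       |
+----+------------+-------+------------+
| 1  | 11         | 7a    | 11/11/2020 |
| 2  | 21         | 6b    | 10/12/2020 |
+----+------------+-------+------------+
and so on...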
You could easily retrieve details like:
1] total absentees per class
2] total absences of a student in a date range
3] a per-day attendance report for each student, which can easily be prepared from this data
This would also be extremely fast, because there are far fewer records, especially if you index on class and partition the table by date range.
Thank You!
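The following is a minimal sketch of the absentee-only design, with Python's sqlite3 standing in for MS Access. Dates are stored as ISO 'YYYY-MM-DD' strings so that BETWEEN compares them chronologically.

The sample dates assume the day/month order used in the question (29/9/2015), so 10/12/2020 becomes 2020-12-10. The index names and the was_present helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per absence only; a student with no row for a day was present.
CREATE TABLE absentees (id INTEGER PRIMARY KEY, student_id INTEGER NOT NULL,
                        class TEXT NOT NULL, date TEXT NOT NULL);  -- ISO 'YYYY-MM-DD'
CREATE INDEX ix_absentees_class_date ON absentees (class, date);
CREATE INDEX ix_absentees_student_date ON absentees (student_id, date);
INSERT INTO absentees (student_id, class, date) VALUES
    (11, '7a', '2020-11-11'),
    (21, '6b', '2020-12-10'),
    (11, '7a', '2020-11-12');
""")

# 1] Total absentees per class.
per_class = conn.execute(
    "SELECT class, COUNT(*) FROM absentees GROUP BY class ORDER BY class").fetchall()

# 2] Total absences of one student in a date range.
student_absences = conn.execute(
    "SELECT COUNT(*) FROM absentees WHERE student_id = ? AND date BETWEEN ? AND ?",
    (11, "2020-11-01", "2020-11-30")).fetchone()[0]


# 3] Per-day report: present unless an absence row exists for that student and day.
def was_present(student_id, day):
    row = conn.execute(
        "SELECT 1 FROM absentees WHERE student_id = ? AND date = ?",
        (student_id, day)).fetchone()
    return row is None


print(per_class)                      # [('6b', 1), ('7a', 2)]
print(student_absences)               # 2
print(was_present(11, "2020-11-11"))  # False
print(was_present(21, "2020-11-11"))  # True
```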

SSAS -> AdventureWorks Example -> Using the browser to slice a measure by week shows two records for the same week?

I have been working on a cube and noticed that when I browse measures in my cube by week, I get an unexpected result. First, let me describe my current scenario: I am looking at counts of a fact load by week, and I get results like these:
Weeks | Fact Internet Sales Count
2001-07-01 00:00:00.000 | 28
2001-07-08 00:00:00.000 | 29
...and so on, as you would expect.
Further down, I noticed this:
2001-09-30 00:00:00.000 | 10
2001-09-30 00:00:00.000 | 24
As you can see, it shows the week twice with different counts. Added together, these counts give the correct total for this week (34).
I am confused about why it shows the week twice. When I look at the data in SQL, the only difference between the two rows is the month in which the dates fall (10 in the earlier month and 24 in the later month in this example).
I first saw this in a cube I had created myself, so I pulled up the trusty AdventureWorks practice cube and found the same behavior there.
This is because, within this date hierarchy, the lowest attribute was Date, not Week, so weeks were always split by date. This can be alleviated by creating a date hierarchy with Week as its lowest level.
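The symptom itself is easy to reproduce outside SSAS. This plain-Python sketch uses made-up fact dates that match the question's counts: 10 on Sunday 2001-09-30 and 24 on 1-6 October, with Sunday-start weeks.

When the month is part of the grouping key, the week that straddles the month boundary appears twice. Grouping by week alone gives a single row of 34. The sketch illustrates the effect only and does not model how SSAS builds its hierarchies.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(d):
    """Sunday on or before d (Sunday-start weeks, like 2001-07-01, 2001-07-08, ...)."""
    return d - timedelta(days=(d.weekday() + 1) % 7)  # weekday(): Monday=0 ... Sunday=6


# Hypothetical fact rows: 10 on Sunday 2001-09-30 and 24 spread over 1-6 October.
facts = [date(2001, 9, 30)] * 10 + [date(2001, 10, 1 + i % 6) for i in range(24)]

# Month in the grouping key: the boundary-straddling week is split in two.
by_month_week = Counter(((d.year, d.month), week_start(d)) for d in facts)
# Week alone: one row with the full count.
by_week = Counter(week_start(d) for d in facts)

for (month, week), count in sorted(by_month_week.items()):
    print(month, week, count)
print(dict(by_week))  # {datetime.date(2001, 9, 30): 34}
```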
