How to add a break-out aggregate row group in SSRS - sql-server

I know virtually nothing about SSRS, so forgive me if I'm using the wrong vocab.
The group I'm working for has a list of volunteer opportunities. Each opportunity has a specified number of volunteers needed. The database keeps track of people who have volunteered, for which opportunity they volunteered, and the status of their volunteering: whether they've just signed up and need to be contacted, whether they're somewhere in the process of becoming a volunteer, whether they are volunteering, or whether they've quit volunteering.
Certain volunteer opportunities (i.e. those requiring contact with children and therefore requiring a background check), have more volunteer states than the rest of the opportunities. For these opportunities there are a total of 14 states, compared to 3 or 4 for the rest.
I need to create a report that displays the counts of people in each state for each opportunity. It's unreasonable for there to be 15 columns (14 + the volunteers needed) for states when most have only three. For children-related opportunities, I want to specify an 'Other' column, and have an expansion [+] to the left of the volunteer opportunity name which will expand out all the children-specific states with their associated counts.
The report as it is now looks like this:
My background is in database/query design, so naturally I wrote a query with a joined sub-query for each of the columns. Rather than making an additional new subquery for each of the 'Other' states, I assume there's a way that I can have a single subquery join grouping on the volunteer status, and let SSRS do the rest of the work. (I may be wrong about SSRS's capabilities here.)
My proposed query looks something like this:
SELECT vo.name, vo.volunteers_needed, vm.status, vm.status_count
FROM tbl_volunteer_opportunity vo
JOIN (SELECT volunteer_opp_id, status, COUNT(*) AS status_count
      FROM tbl_volunteer_opportunity_member
      GROUP BY volunteer_opp_id, status) vm
  ON vo.volunteer_opp_id = vm.volunteer_opp_id
I now need to make a tablix and/or data source with columns for the Connected, In Process, and No Contact statuses plus an aggregate column for the Other values, and then do something else to get an expansion for each of the Other statuses.
I'm not sure how to do either of those things.

Assuming you have a query that returns data in the following format:
+-------------+--------------------+-------------+-------+
| ProfileName | Status             | StatusGroup | Count |
+-------------+--------------------+-------------+-------+
| A           | Needed             | Needed      | 5     |
| A           | Connected          | Connected   | 3     |
| A           | In Process         | In Process  | 5     |
| A           | No Contact         | No Contact  | 2     |
| A           | Other status       | Other       | 3     |
| A           | Another status     | Other       | 6     |
| A           | Yet another status | Other       | 2     |
+-------------+--------------------+-------------+-------+
You then create a tablix that uses the ProfileName as a row grouping, and the StatusGroup as a column group. The tablix will look like this in the designer:
                  +-----------------+
                  | [StatusGroup]   |
+-----------------+-----------------+
| [ProfileName]   | [Sum(Count)]    |
+-----------------+-----------------+
You can then add a total column on the right, and add an additional level to the column group with interactive expand/collapse functionality (to expand the "Other" StatusGroup into the individual statuses). Using SSRS expressions, you should be able to hide the expand/collapse button on the column headers of the StatusGroups that are not "Other".
Hope this is enough to get you started.
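One way to produce rows in the format above is to derive StatusGroup with a CASE expression in the dataset query. The following is only a sketch: it runs the SQL through Python's built-in sqlite3 module so it can be tried without a SQL Server instance, the sample rows and the status names 'Background Check' and 'Interview' are made up, and the "Needed" row from the example table is left out for brevity.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question's query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_volunteer_opportunity (
    volunteer_opp_id INTEGER PRIMARY KEY, name TEXT, volunteers_needed INTEGER);
CREATE TABLE tbl_volunteer_opportunity_member (
    volunteer_opp_id INTEGER, status TEXT);
INSERT INTO tbl_volunteer_opportunity VALUES (1, 'A', 5);
INSERT INTO tbl_volunteer_opportunity_member VALUES
    (1, 'Connected'), (1, 'Connected'), (1, 'In Process'),
    (1, 'No Contact'), (1, 'Background Check'), (1, 'Interview');
""")

# Every status that is not one of the three common ones is grouped as 'Other'.
STATUS_QUERY = """
SELECT vo.name   AS ProfileName,
       vm.status AS Status,
       CASE WHEN vm.status IN ('Connected', 'In Process', 'No Contact')
            THEN vm.status ELSE 'Other' END AS StatusGroup,
       COUNT(*)  AS "Count"
FROM tbl_volunteer_opportunity vo
JOIN tbl_volunteer_opportunity_member vm
  ON vo.volunteer_opp_id = vm.volunteer_opp_id
GROUP BY vo.name, vm.status
ORDER BY vo.name, vm.status
"""
rows = conn.execute(STATUS_QUERY).fetchall()
for row in rows:
    print(row)
```

The tablix then only needs StatusGroup as the outer column group and Status as the inner one; the CASE expression itself should carry over to SQL Server unchanged, but that has not been checked here.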

Related

How to write a SQL for a calculation based on incremental window of batch table

My requirement is to calculate based on an incremental size window for a batch table.
For example, the first window has 1 row, the second window has 2 rows (the row from the 1st window plus a new row), the 3rd window has 3 rows (the 2 rows from the 2nd window plus a new row), and so on.
For example:
Source table:
datetime | productId | price |
3-1 | p1 | 10 |
3-2 | p1 | 20 |
3-3 | p1 | 30 |
3-4 | p1 | 40 |
Result table:
datetime | productId | average|
3-1 | p1 | 10/1 |
3-2 | p1 | (10+20)/2 |
3-3 | p1 | (10+20+30)/3 |
3-4 | p1 | (10+20+30+40)/4 |
I am trying to find a way to implement this requirement in SQL. It seems to me that the OVER clause could do this, but it is not yet implemented in Flink, so I need an alternative.
BTW:
I tried using a TUMBLE window of 1 day and storing the previous value in the user-defined aggregation object, but that failed because the aggregation object is reused across all products rather than there being a separate object for each product.
The OVER clause on a batch table is not supported by Flink's SQL yet. You can track the status of this effort here.
However, have you considered implementing this behavior on a streaming table instead? Streaming tables can also read from static files such as CSV files, and many operations are supported there as well. Whether this works depends on the other operations you want to use in your query, though.
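Independently of Flink, a growing-window average can also be written in plain SQL without OVER, by self-joining each row to all earlier rows of the same product and aggregating. The sketch below runs that query through Python's built-in sqlite3 module on the sample data from the question; the table name prices is made up, and whether Flink's batch SQL accepts this join condition has not been verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prices (datetime TEXT, productId TEXT, price REAL);
INSERT INTO prices VALUES
    ('3-1', 'p1', 10), ('3-2', 'p1', 20), ('3-3', 'p1', 30), ('3-4', 'p1', 40);
""")

# For every row 'cur', join all rows of the same product up to and including
# cur.datetime, then average them. This emulates a growing (cumulative) window.
RUNNING_AVG = """
SELECT cur.datetime, cur.productId, AVG(prev.price) AS average
FROM prices cur
JOIN prices prev
  ON prev.productId = cur.productId
 AND prev.datetime <= cur.datetime
GROUP BY cur.datetime, cur.productId
ORDER BY cur.productId, cur.datetime
"""
running = conn.execute(RUNNING_AVG).fetchall()
for row in running:
    print(row)
```

Note that the self-join grows quadratically with the number of rows per product, so this is only reasonable for modest batch sizes.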

SSAS - MDX calculated member

I've a fact table that details individual line amounts for orders placed by my organisation. In this fact, at line level, I've included the total order amount to be used, as it's possible we might need that level of detail at some point.
Here's an example of what I've got:-
+------------+------------+---------------+------------+---------------------+
| BookingKey | Booking_ID | Category_FKey | Line_Value | Total_Booking_Value |
+------------+------------+---------------+------------+---------------------+
| 1          | 12         | 8             | 150        | 700                 |
| 2          | 12         | 4             | 150        | 700                 |
| 3          | 12         | 5             | 300        | 700                 |
| 4          | 12         | 4             | 100        | 700                 |
+------------+------------+---------------+------------+---------------------+
As you can see, the Total_Booking_Value here is the sum of the Line_Value for the booking in the example (Booking_ID = 12).
The Category_FKey looks up to a Categories dimension.
Using this structure I've created a simple cube and this works fine, mainly.
The issue I have is that I'd like to be able to view the Total Line_Value amount, and somehow include the Total_Booking_Value alongside it.
So, for example I might add the Categories dimension as a filter and want to filter by say Category_FKey = 4.
If this was the case I'd want the aggregates to tell me that the total Line_Value was 250 (for BookingKeys 2 and 4), and the Total_Booking_Value should be 700. Using normal aggregation (ie SUM) I'm getting the Total_Booking_Value as 1400 (obviously - because it's adding 700 * 2 for the two rows the cube would return).
So, the way I see it I'd like to create an MDX calculation that somehow takes the Total_Booking_Value and gives just the value for the Booking in question.
Should this be done using some kind of average, or division by the Distinct number of items? I can't figure this out. I tried something like this:-
create member currentcube.measures.[Calculated Booking Value]
as
[Measures].[Total_Booking_Value] / count(Measures.Booking_ID);
But this isn't working.
Hopefully this makes sense and you can point me in the right direction.
I find it strange that booking_ID is a measure - intuitively it strikes me as something that would be an attribute and therefore a hierarchy - in which case you'd be able to do the count like this:
[Measures].[Total_Booking_Value]
/
COUNT(EXISTING [Booking].[Booking_ID].[Booking_ID].members)
A straightforward solution would be to have two fact tables: one at booking-key granularity and one at booking-id granularity. The first would contain all columns except the total booking value, and the second would contain only the booking id and total booking value columns.
Then both measures could simply be summed.
The relationship between the second fact table and the category dimension could be configured as many-to-many via the first fact table. That way, you would see the full values of the bookings involved for each selected category, with double counting eliminated automatically.
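To illustrate the two-fact-table idea in relational terms, here is a small sketch using Python's built-in sqlite3 module and the sample rows from the question. The table names fact_booking_line and fact_booking are made up, and the IN subquery only mimics what the many-to-many relationship would do inside the cube; it is not SSAS configuration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Line-level fact: one row per booking line, without the booking total.
CREATE TABLE fact_booking_line (
    BookingKey INTEGER PRIMARY KEY, Booking_ID INTEGER,
    Category_FKey INTEGER, Line_Value INTEGER);
-- Booking-level fact: one row per booking, holding the total once.
CREATE TABLE fact_booking (
    Booking_ID INTEGER PRIMARY KEY, Total_Booking_Value INTEGER);
INSERT INTO fact_booking_line VALUES
    (1, 12, 8, 150), (2, 12, 4, 150), (3, 12, 5, 300), (4, 12, 4, 100);
INSERT INTO fact_booking VALUES (12, 700);
""")

def totals_for_category(category_fkey):
    """Return (sum of line values, sum of booking totals) for one category."""
    line_total = conn.execute(
        "SELECT SUM(Line_Value) FROM fact_booking_line WHERE Category_FKey = ?",
        (category_fkey,)).fetchone()[0]
    # Each booking is counted once, however many of its lines match.
    booking_total = conn.execute(
        """SELECT SUM(Total_Booking_Value) FROM fact_booking
           WHERE Booking_ID IN (SELECT Booking_ID FROM fact_booking_line
                                WHERE Category_FKey = ?)""",
        (category_fkey,)).fetchone()[0]
    return line_total, booking_total

category_4 = totals_for_category(4)
print(category_4)
```

For Category_FKey = 4 this gives the 250 and 700 the question asks for, rather than 1400.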

Two ways to store the same data

At work we're creating a form to allow property agents to submit their new developments. A simplified version of our form is the following:
Bedrooms: [Enter a number]
Quantity: [Enter a number]
Add Another | Save
We allow agents to add multiple rows. However at the moment we have absolutely zero validation for duplicates, which in my opinion allows our database to store identical data in two ways:
| development_id | bedrooms | quantity |
|----------------|----------|----------|
| 1 | 3 | 1 |
| 1 | 3 | 1 |
| 1 | 3 | 3 |
Clearly a row could represent both one unit or a group of units.
I'm arguing that we should store the developments either one way or the other, but certainly not both. Unfortunately the back-end developers (I'm mostly front-end) are arguing that it's not a big deal, and to me that seems absurd.
For a simple example, with the data stored as above, counting how many 3-bedroom units are for sale cannot be done with a plain SELECT COUNT(*); the query also has to take the quantity field into account.
As a front-end developer it seems largely to be presentation logic, because transforming between rendering them as a list of single units, or grouping them together should be a front-end/API task, and the business logic should be one way or the other. Ultimately our table seems to be not normalised at all.
In my humble opinion there should be a unique index on development_id, bedrooms.
Am I right in my argument? Or horribly wrong?
Edit:
In clarification all of these are currently possible, all of which represent the same fact, and my argument is there should be only one way:
| development_id | bedrooms | quantity |
|----------------|----------|----------|
| 1 | 3 | 1 |
| 1 | 3 | 1 |
| 1 | 3 | 1 |
Same as:
| development_id | bedrooms | quantity |
|----------------|----------|----------|
| 1 | 3 | 1 |
| 1 | 3 | 2 |
Same as:
| development_id | bedrooms | quantity |
|----------------|----------|----------|
| 1 | 3 | 3 |
You're right, there should be only one way to record each fact in a database and duplicate rows should not be allowed. If each row represents the quantity of units that have a certain number of bedrooms in a particular development, then a unique key on development_id, bedrooms makes sense, and will prevent multiple entries for the same kind of unit in each development.
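As a rough sketch of what that could look like, the snippet below uses Python's built-in sqlite3 module: it collapses existing duplicates by summing quantity and then adds the unique index. The table names development_unit and development_unit_clean and the index name are made up for illustration, and the exact DDL will differ on your actual database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE development_unit (
    development_id INTEGER, bedrooms INTEGER, quantity INTEGER);
INSERT INTO development_unit VALUES (1, 3, 1), (1, 3, 1), (1, 3, 1);

-- Collapse existing duplicates into one row per (development_id, bedrooms).
CREATE TABLE development_unit_clean AS
SELECT development_id, bedrooms, SUM(quantity) AS quantity
FROM development_unit
GROUP BY development_id, bedrooms;

-- From now on, a second row for the same kind of unit is rejected.
CREATE UNIQUE INDEX ux_development_bedrooms
    ON development_unit_clean (development_id, bedrooms);
""")

clean_rows = conn.execute("SELECT * FROM development_unit_clean").fetchall()

try:
    conn.execute("INSERT INTO development_unit_clean VALUES (1, 3, 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(clean_rows, duplicate_rejected)
```

With the index in place, the form's "Add Another" would have to update the existing row's quantity instead of inserting a new one.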
Funnily enough, you and your back-end colleagues are both right.
In the circumstances shown, it really isn't a big deal.
It does violate database normalization, though (again, in the circumstances shown).
From what you describe, there is no need to split the data into multiple rows.
But imagine the row later gains another attribute that distinguishes one three-bed unit from another, say an apartment plan, or a timestamp for whatever reason.
Separate rows immediately start to make sense then.
Another thing: reads are generally non-blocking, while writes are blocking.
That means that on a mature RDBMS with row-level locks, inserts (and the reads for COUNT) won't compete with each other, while updates to a counter would.
That said, I very much doubt your agents combined would ever reach even single-digit TPS with their additions, so at this scale you can consider the issue non-existent. :-)

Which is a better database schema for a tracking tool?

I have to generate a view that shows tracking across each month. The ultimate view will be something like this:
| Person | Task | Jan | Feb | Mar| Apr | May | June . . .
| Joe | Roof Work | 100% | 50% | 50% | 25% |
| Joe | Basement Work | 0% | 50% | 50% | 75% |
| Tom | Basement Work | 100% | 100% | 100% | 100% |
I already have the following tables:
Person
Task
I am now creating a new table with foreign keys into the above 2 tables, and I am trying to figure out the pros and cons of creating 1 or 2 tables.
Option 1:
Create a new table with the following Columns:
Id
PersonId
TaskId
Jan2012
Feb2012
Mar2012
Apr2013
or
Option 2:
Have 2 separate tables
One table for just
Id
PersonId
TaskId
and another table for just the following columns
Id
PersonTaskId (the id from table above)
MonthYearKey
MonthYearValue
So an example record would be
| 1 | 13 | Jan2011 | 100% |
where 13 would represent a specific unique Person and Task combination. This second way would avoid having to create new columns as time goes on (which seems right), but I also want to avoid overkill.
Which would be the more scalable way to design this schema? Any other suggestions or more elegant ways of doing this would be great as well.
You can have an m2m (many-to-many) table with data columns. I don't see a reason why you can't just put MonthYearKey and MonthYearValue on the same table as PersonId and TaskId:
Id
TaskId
PersonId
MonthYearKey
MonthYearValue
It's possible too that you would want to move the MonthYearKey out into their own table, it really just comes down to common queries and what this data is used for.
I would note, you never want to design a schema where you are adding columns due to time. The first option would require maintenance all the time, and would become very difficult to query also.
Option 2 is definitely more scalable and is not overkill.
Option 1 would require you to add a new column every month and simple date based queries of your data would not be possible, e.g. Show me all people who worked at least 90% in any month last year.
The ultimate view would be generated from a particular query or view of your data.
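As a sketch of how the row-per-month layout can feed the month-by-month view, the example below uses Python's built-in sqlite3 module. The table name person_task_month, the 'YYYY-MM' format for MonthYearKey, and the sample values are all assumptions for illustration; the pivot is done with conditional aggregation, one CASE per month column shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_task_month (
    Id INTEGER PRIMARY KEY, PersonId INTEGER, TaskId INTEGER,
    MonthYearKey TEXT,        -- assumed 'YYYY-MM' so that keys sort by date
    MonthYearValue INTEGER,   -- percentage, 0 to 100
    UNIQUE (PersonId, TaskId, MonthYearKey));
INSERT INTO person_task_month (PersonId, TaskId, MonthYearKey, MonthYearValue)
VALUES (1, 1, '2012-01', 100), (1, 1, '2012-02', 50),
       (1, 2, '2012-01', 0),   (1, 2, '2012-02', 50);
""")

# Pivot rows into month columns for the report.
PIVOT = """
SELECT PersonId, TaskId,
       MAX(CASE WHEN MonthYearKey = '2012-01' THEN MonthYearValue END) AS Jan,
       MAX(CASE WHEN MonthYearKey = '2012-02' THEN MonthYearValue END) AS Feb
FROM person_task_month
GROUP BY PersonId, TaskId
ORDER BY PersonId, TaskId
"""
pivoted = conn.execute(PIVOT).fetchall()

# A date-based question that the column-per-month layout makes awkward:
# who worked at least 90% on some task in any month of 2012?
busy_people = conn.execute("""
SELECT DISTINCT PersonId FROM person_task_month
WHERE MonthYearKey BETWEEN '2012-01' AND '2012-12' AND MonthYearValue >= 90
""").fetchall()

print(pivoted, busy_people)
```

New months become new rows, so the schema never changes; only the reporting query (or the report tool's column grouping) decides which months to show.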

Summarizing across multiple columns in sql or crystal

I was wondering if there was a way to get a distinct count on a certain column based on the value of a second column while still getting a total count of the first column. This is an example of the issue I'm facing. I have a query that returns an i-Vent type, ID, Status, and linked medication orders for a pharmacy intervention system. The interventions are grouped by i-Vent type. The Status can be one of five values or NULL. I need to be able to count how many i-Vents were recorded as each of the six possible values for Status.
An example set may look similar to this:
________________________________________________________
Type | ID | Status | Linked Meds
________________________________________________________
IV2PO | 1234 | Accepted | pantoprazole IV
IV2PO | 1234 | Accepted | pantoprazole PO
IV2PO | 1235 | NULL | NULL
IV2PO | 1236 | Pending | metoclopramide IV
IV2PO | 1236 | Pending | metoclopramide PO
IV2PO | 1236 | Pending | Pharmacy Consult - IV2PO
Consult | 1237 | Rejected | NULL
________________________________________________________
The group summary should list IV2PO having a total count of 3 with a count of 1 for "Accepted", 1 for "NULL", and 1 for "Pending"; and Consult having a total count of 1 with a count of 1 for "Rejected".
Please note the duplicate rows caused by having more than one medication/order linked to an i-Vent.
Ultimately I'm building the final report in Crystal Reports so if there is a way to get the correct counts there that would be fine as well. I have a version of this which uses a subreport to get the linked medications/orders, but I'd like to find a better alternative to take less time to run and use fewer resources.
Does anyone know of a way to do this?
Thanks!
In Crystal Reports you can use the Distinct Count summary option.
When creating a "Summary", using the Count function may not be desirable. It is often the case that a report must only return the number of unique contact records, as other tables (i.e. History) may contain multiple rows for each customer.
Select Insert | Summary.
Select the fieldname you wish to summarize.
Make sure to select Distinct Count as the Summary Operation.
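If you would rather get the counts on the SQL side, the equivalent is COUNT(DISTINCT ...). The sketch below runs it through Python's built-in sqlite3 module on the sample rows from the question; the table name ivent and the column name LinkedMed are made up, and the real query would go against whatever tables your intervention system exposes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ivent (Type TEXT, ID INTEGER, Status TEXT, LinkedMed TEXT);
INSERT INTO ivent VALUES
    ('IV2PO',   1234, 'Accepted', 'pantoprazole IV'),
    ('IV2PO',   1234, 'Accepted', 'pantoprazole PO'),
    ('IV2PO',   1235, NULL,       NULL),
    ('IV2PO',   1236, 'Pending',  'metoclopramide IV'),
    ('IV2PO',   1236, 'Pending',  'metoclopramide PO'),
    ('IV2PO',   1236, 'Pending',  'Pharmacy Consult - IV2PO'),
    ('Consult', 1237, 'Rejected', NULL);
""")

# Distinct i-Vents per type and status; a NULL status gets the label 'NULL'.
per_status = conn.execute("""
SELECT Type, COALESCE(Status, 'NULL') AS StatusLabel,
       COUNT(DISTINCT ID) AS IVentCount
FROM ivent
GROUP BY Type, Status
ORDER BY Type, StatusLabel
""").fetchall()

# Distinct i-Vents per type, ignoring the duplicates from linked medications.
per_type = conn.execute("""
SELECT Type, COUNT(DISTINCT ID) AS IVentCount
FROM ivent
GROUP BY Type
ORDER BY Type
""").fetchall()

print(per_status, per_type)
```

This gives IV2PO a total of 3 (one each for Accepted, NULL, and Pending) and Consult a total of 1, matching the expected group summary.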
