Summarizing across multiple columns in sql or crystal - sql-server

I was wondering if there was a way to get a distinct count on a certain column based on the value of a second column while still getting a total count of the first column. This is an example of the issue I'm facing. I have a query that returns an i-Vent type, ID, Status, and linked medication orders for a pharmacy intervention system. The interventions are grouped by i-Vent type. The Status can be one of five values or NULL. I need to be able to count how many i-Vents were recorded as each of the six possible values for Status.
An example set may look similar to this:
________________________________________________________
Type | ID | Status | Linked Meds
________________________________________________________
IV2PO | 1234 | Accepted | pantoprazole IV
IV2PO | 1234 | Accepted | pantoprazole PO
IV2PO | 1235 | NULL | NULL
IV2PO | 1236 | Pending | metoclopramide IV
IV2PO | 1236 | Pending | metoclopramide PO
IV2PO | 1236 | Pending | Pharmacy Consult - IV2PO
Consult | 1237 | Rejected | NULL
________________________________________________________
The group summary should list IV2PO having a total count of 3 with a count of 1 for "Accepted", 1 for "NULL", and 1 for "Pending"; and Consult having a total count of 1 with a count of 1 for "Rejected".
Please take notice of the duplicate values caused by having more than one medication/order liked to an i-Vent.
Ultimately I'm building the final report in Crystal Reports so if there is a way to get the correct counts there that would be fine as well. I have a version of this which uses a subreport to get the linked medications/orders, but I'd like to find a better alternative to take less time to run and use fewer resources.
Does anyone know of a way to do this?
Thanks!

In Crystal Reports you can use Count distinct summary option
When creating a "Summary", using the Count function may not be desirable. It is often the case that a report must only return the number of unique contact records, as other tables (i.e. History) may contain multiple rows for each customer.
Select Insert | Summary.
Select the fieldname you wish to summarize.
Make sure to select Distinct Count as the Summary Operation.

Related

Natural key when combining two source tables that use similar promary keys

I have two source tables, one is basically an invoice, the other is a migrated invoice. The same object should probably have been used for both, but I have this instead. They contain most of the same data.
I had thought to combine both into a dimension table, however both will use the same natural keys. How should I approach this?
One potential solution I thought of was using negative numbers for the migrated table, but then the natural keys won't align exactly with the source.
Do I just combine them in the fact table? Then I can't link back to the dimension table for either due to NULLs.
Or do I add an additional column or information to indicate which type of invoice it is?
EDIT
Simple models of the current tables below.
The dimension currently only contains the non migrated data, it has a primary key, however
if i merge the migrated invoice table in to this, it will appear as if the changes are being
made to the original invoices and not a second set of invoices
Dimension
surrogate_key| source_pk | Total | scd_from | scd_to
| | | |
1 | 1 | 100 | 01/01/2019 | 31/01/2019
2 | 1 | 150 | 01/02/2019 | 31/12/2019
3 | 2 | 50 | 01/01/2019 | 31/12/9999
source invoice table
pk | Total
___________________
1 | 150
2 | 50
source migrated invoice table
pk | total
___________________
1 | 200
2 | 300
If invoice and migrated invoice have same natural key but some of the fields have different values (your example shows Total amount different between them), then you have one row based on the natural key in the Dim but 2 different columns to represent the 2 sources. Based on your example, you need invoice_Total and migrated_invoice_Total columns in your DIM.

SSAS - MDX calculated member

I've a fact table that details individual line amounts for orders placed by my organisation. In this fact, at line level, I've included the total order amount to be used, as it's possible we might need that level of detail at some point.
Here's an example of what I've got:-
+------------+------------+---------------+------------+---------------------+
| BookingKey | Booking_ID | Category_FKey | Line_Value | Total_Booking_Value |
+------------+------------+---------------+------------+---------------------+
| 1 | 12 | 8 | 150 | 700 |
| 2 | 12 | 4 | 150 | 700 |
| 3 | 12 | 5 | 300 | 700 |
| 4 | 12 | 4 | 100 | 700 |
+------------+------------+---------------+------------+---------------------+
As you can see, the Total_Booking_Value here is the sum of the Line_Value for the booking in the example (Booking_ID = 12).
The Category_FKey looks up to a Categories dimension.
Using this structure I've created a simple cube and this works fine, mainly.
The issue I have is that I'd like to be able to view the Total Line_Value amount, and somehow include the Total_Booking_Value alongside it.
So, for example I might add the Categories dimension as a filter and want to filter by say Category_FKey = 4.
If this was the case I'd want the aggregates to tell me that the total Line_Value was 250 (for BookingKeys 2 and 4), and the Total_Booking_Value should be 700. Using normal aggregation (ie SUM) I'm getting the Total_Booking_Value as 1400 (obviously - because it's adding 700 * 2 for the two rows the cube would return).
So, the way I see it I'd like to create an MDX calculation that somehow takes the Total_Booking_Value and gives just the value for the Booking in question.
Should this be done using some kind of average, or division by the Distinct number of items? I can't figure this out. I tried something like this:-
create member currentcube.measures.[Calculated Booking Value]
as
[Measures].[Total_Booking_Value] / count(Measures.Booking_ID);
But this isn't working.
Hopefully this makes sense and you can point me in the right direction.
I find it strange that booking_ID is a measure - intuitively it strikes me as something that would be an attribute and therefore a hierarchy - in which case you'd be able to do the count like this:
[Measures].[Total_Booking_Value]
/
COUNT(EXISTING [Booking].[Booking_ID].[Booking_ID].members)
A straightforward solution would be to have two fact tables: one with granularity booking key and one with granularity booking id. The first would contain all columns except total booking value, and the second would contain columns booking id and total booking value.
Then each of both measures would easily be summable.
The reference type between the second fact table and the category dimension could be configures as many-to-many via the first fact table. Thus, you would see the full values of the involved bookings for each selected category, automatically eliminating double counting.

SQL Server 2005 & Up - Concatenated Value based on Value in Joined Table

Prelude: The design of this database is truly horrible - this isn't the first "crooked" question I've asked, and it won't be the last. The question is what it is, and I'm only asking because A) I only have a couple of years of experience with SQL Server, and B) I've already been pounding my face on my keyboard for a couple of days trying to find a viable solution.
Having said all of that...
We have a database with two tables relevant to this issue. The schema is ridiculous, so I'm going to paraphrase so that it can be understood:
T_Customer & T_Task
T_Task holds data about various work that has been performed on behalf of a customer in T_Customer.
In T_Customer, there is a field called "Sort_Type" (again, paraphrasing...). In this field, there is a concatenated string of various fields from T_Task which, in the order specified, determine how the customer's report is produced in the client program. There are a total of 73 possible fields in T_Task that can be chosen as Sort_Type in T_Customer, and the user can choose up to 5 of them in any given order. For example:
T_Customer
Customer_ID | Sort_Type
------------|-------------------------------
1 | 'Task_Date,Task_Type,Task_ID'
2 | 'Task_Type,Destination'
3 | 'Route,Task_Type,Task_ID'
T_Task
Task_ID | Customer_ID | Task_Type | Task_Date | Route | Destination
--------|-------------|-----------|-----------|---------|-------------
12345 | 1 | 1 | 01/01/2017| '1 to 2'| '2'
12346 | 1 | 1 | 01/02/2017| '3 to 4'| '4'
12347 | 2 | 2 | 12/31/2016| '6 to 2'| '2'
12348 | 3 | 3 | 01/01/2017| '4 to 1'| '1'
In this example, Customer #1's report would be sorted/totaled by the Task_Date, then by Task_Type, then by Task_ID; but not simply by doing an ORDER BY. This function requires one single value which can be ordered as a whole, single unit. As such...
Up until today, a field existed in T_Task called (paraphrasing....) 'MySort'. This field contained a concatenated string of fixed-width values filled in with zeroes and created according to the order and content of the values in T_Customer.Sort_Type. In this case:
Task_ID | Customer_ID | Task_Type | Task_Date | Route | Destination | MySort
--------|-------------|-----------|-----------|-------|-------------|-------
12345 | 1 | 1 | 01/01/2017| 1 to 2| 2 |'002017010100000000010000012345'
12346 | 1 | 1 | 01/02/2017| 3 to 4| 4 |'002017010200000000010000012346'
12347 | 2 | 2 | 12/31/2016| 6 to 2| 2 |'000000000000000000020000000002'
12348 | 3 | 3 | 01/01/2017| 4 to 1| 1 |'000040to0100000000030000012348'
During the printing phase of every single report, the program would search for the customer, find the values in T_Customer.Sort_Type, split them by commas, and then run an update on all of the tasks of that customer to update the value of MySort accordingly...
Can you guess what the problem is with this? Performance (not to mention chronic insanity)
I have been tasked with finding a more efficient way of performing this same task server-side, within SQL Server 2005 if possible, using whatever means will eventually allow me to return a result set including all of the details of the tasks requested, together with a concatenated string similar to the one used in the past (which the client program relies upon in order to sort and subtotal the report).
I've tried Views, UDFs in computed columns, and parameterized queries, but I know my limitations. I'm too inexperienced to know all of my options.
Question: Aside from quitting (not an option) or going berserk (considering it...), what methods might you use to solve this problem?
EDIT: Having received two questions about the MySort column already,
I'll explain a bit better.
T_Task.MySort =
REPLICATE('0',10 - LEN(T_Customer.Sort_Type[Value1])
+ CAST(T_Customer.Sort_Type[Value1] AS VARCHAR(10))
+
REPLICATE('0',10 - LEN(T_Customer.Sort_Type[Value2])
+ CAST(T_Customer.Sort_Type[Value2] AS VARCHAR(10))
+
REPLICATE('0',10 - LEN(T_Customer.Sort_Type[Value3])
+ CAST(T_Customer.Sort_Type[Value3] AS VARCHAR(10))
WHERE T_Customer.Customer_ID = T_Task.Customer_ID
...Up to T_Customer.Sort_Type[Value5].
Reminder: Those values are not constants at all, so the value of the
field MySort had to constantly be updated before printing a report.
The idea is to somehow remove the need to constantly update the field,
and instead return the string as part of the result set.
The resulting string should always be 50 characters in length. I
didn't do that here simply to save a bit of space and time - I chose
only 3 for the example. The real string would simply have another
twenty zeroes leading the value:
'00000000000000000000002017010100000000010000012345'

How to add a break-out aggregate row group in SSRS

I know virtually nothing about SSRS, so forgive me if I'm using the wrong vocab.
The group I'm working for has a list of volunteer opportunities. Each opportunity has a specified number of volunteers needed. The database keeps track of people who have volunteered, for which opportunity they volunteered, and the status of their volunteering: whether they've just signed up and need to be contacted, whether they're somewhere in the process of becoming a volunteer, whether they are volunteering, or whether they've quit volunteering.
Certain volunteer opportunities (i.e. those requiring contact with children and therefore requiring a background check), have more volunteer states than the rest of the opportunities. For these opportunities there are a total of 14 states, compared to 3 or 4 for the rest.
I need to create a report that displays the counts of people in each state for each opportunity. It's unreasonable for there to be 15 columns (14 + the volunteers needed) for states when most have only three. For children-related opportunities, I want to specify an 'other' column, and have a expansion [+] to the left of the volunteer opportunity name which will expand out all the children-specific states with their associated counts.
The report as it is now looks like this:
My background is in database/query design, so naturally I wrote a query with a joined sub-query for each of the columns. Rather than making an additional new subquery for each of the 'Other' states, I assume there's a way that I can have a single subquery join grouping on the volunteer status, and let SSRS do the rest of the work. (I may be wrong about SSRS's capabilities here.)
My proposed query looks something like this:
SELECT vo.name, vo.volunteers_needed, vm.status, vm.status_count
FROM tbl_volunteer_opportunity vo
JOIN (SELECT volunteer_opp_id, status, COUNT(*) "status_count"
FROM tbl_volunteer_opportunity_member
GROUP BY volunteer_opp_id, status) vm ON vo.volunteer_opp_id = vm.volunteer_op_id
I now need to make a tablix and/or datasource to make columns for the Connected, In Process, No Contact, and an aggregate for Other values, and then do something else for a expansion for each of the Other statuses.
I'm not sure how to do either of those things.
Assuming you have a query that returns data in the following format:
+-------------+--------------------+-------------+-------+
| ProfileName | Status | StatusGroup | Count |
+-------------+--------------------+-------------+-------+
| A | Needed | Needed | 5 |
| A | Connected | Connected | 3 |
| A | In Process | In Process | 5 |
| A | No Contact | No Contact | 2 |
| A | Other status | Other | 3 |
| A | Another status | Other | 6 |
| A | Yet another status | Other | 2 |
+-------------+--------------------+-------------+-------+
You then create a tablix that uses the ProfileName as a row grouping, and the StatusGroup as a column group. The tablix will look like this in the designer:
+-----------------+
| [StatusGroup] |
+-----------------+-----------------+
| [ProfileName] | [Sum(Count)] |
+-----------------+-----------------+
You can then add a totel column on the right, and add an additional level to the column group with interactive expand/collapse functionality (to expand the "Other" StatusGroup into the individual statuses). Using SSRS expressions, you should be able to hide the expand/collapse button on the column headers of the StatusGroups that are not "Other".
Hope this is enough to get you started.

Checking for overlapping car reservations

I'm writing a simple booking program for a car rental (a school assignment). Me and my buddy are trying to make the system a little more advanced than the assignment dictates, but we're having some problems we hoped you could help us with.
The idea is that you can reserve a certain car type, and when you get the car it will be one of that type (you don't reserve a specific car, as our assignment dictates, but only a type). Only one customer can have the car on a specific date. As the reservations tick in we have to make sure, that we don't hire out more cars of each type than we've got. The reservations are basically stored with a start date, an end date, and a car type.
If we ignore the car type for now (lets say we only have one type) then the reservations could graphically look something like this:
1/12 2/12 3/12 4/12 5/12 6/12 7/12
|-------------------|
|-----------------|
|-----|
|-------|
|-----------|
|-------------|
If the rental only has three cars it would be possible to rent a car from 3/12 to 5/12 since all days only have 2 car reservations. But how do we know this? Do we have to check each date and count() the number of reservations that spans over that date?
And what if somebody had reserved a car on 4/12, then 3/12 and 5/12 would still only have 2 reservations, but 4/12 would have 3.
Would it be possible to do with a query some how, or do we have to step through each date in the program to check the number of reservations didn't exceed the number of cars?
(This is easy enough with only full dates, but consider the scenario where you could rent the cars on an hourly basis (not only on a daily as here). Then it could be a though one to step through each our if we have a lot of reservations and cars and the timespan is long...)
Hope you have some nice ideas that will help us along. Thanks for taking the time to read the question :)
Mikkel, Denmark
Assume, You have such reservation situation in real life:
1/12 2/12 3/12 4/12 5/12 6/12 7/12
Car1: |-------------------|
Car2: |-----------------|
Car3: |-------| |-----------| |-----|
Car4: |-------------|
Table car
| id | type | registration |
| 1 | 1 | HH1111 |
| 2 | 1 | HH3333 |
| 3 | 2 | HH77 |
| 4 | 3 | DD999 |
Table reservation
| car_id | date_from | date_to |
| 1 | 2013-12-01 | 2013-12-04 |
| 2 | 2013-12-04 | 2013-12-07 |
| 3 | 2013-12-01 | 2013-12-02 |
| 3 | 2013-12-03 | 2013-12-05 |
| 3 | 2013-12-06 | 2013-12-07 |
| 4 | 2013-12-01 | 2013-12-03 |
Now, You must by really simple logic, select all available cars for period
from 2013-12-05 to 2013-12-06
"Select ALL cars, which does not have any reservation with dates, which blocks it for usage"
with brillian mysql select:
select * from car where not exists ( select * from reservation
where car.id = reservation.car_id AND
date_from < '2013-12-06' AND
date_to > '2013-12-05' )
"Would it be possible to do with a query some how, or do we have to step through each date in the program to check the number of reservations didn't exceed the number of cars? (This is easy enough with only full dates,"
The nature of your problem is that a violation of the constraint could appear on any individual date. So logically speaking, it is indeed necessary to do the check for each individual date comprised in a new reservations. The only optimisation possible would be to do the check at the level of "smallest intervals". To do that, you must first compute all the intervals that already appear in the database, and which overlap with your new reservation.
For example, a new reservation for 4/12-6/12 would have to be split into 4/12-5/12 (second line) and 5/12-6/12 (third line). Those individual intervals might be longer than one single day, and you can do the checks on the level of those individual intervals. (They are the same as individual days in this particular example, but a reservation 7/12-19/12 would not have to be split at all.
However, computing this might prove difficult, and there's another caveat: when you're looking al multi-row inserts, you should also be splitting over the other rows to be inserted (and that requires you to record all the inserted rows in a temporary table, otherwise you won't be able to access them).

Resources