Aggregate query - column for querying affects aggregation - sql-server

I have a table "Scores" with fields as follows:
UserId
LessonId
ExerciseId
Score
Timestamp
I want to set up a view, "vw_AggregateScoreForUser", that will aggregate data from that table, as follows:
SELECT UserId,
LessonId,
COUNT(ExerciseId) AS TotalExercises,
SUM(Score) AS TotalScore,
COUNT(DISTINCT CONVERT(date, Timestamp)) AS StudyDays
FROM Scores
GROUP BY UserId, LessonId
The tricky bit is StudyDays, where I'm counting the distinct dates on which the user has at least one entry. That gives me the days that they "studied", i.e. completed at least one exercise.
Now, say that I want to execute this view for lessons 1 to 5.
SELECT * FROM vw_AggregateScoreForUser WHERE UserId = 1 AND LessonId BETWEEN 1 AND 5;
What I want, is one record returned that aggregates the data for those 5 lessons. But with the above setup, the data is grouped by LessonId, so I will get 5 records back.
The issue is that StudyDays may now be incorrect as it's computed per lesson. E.g. with the following data:
UserId  LessonId  ExerciseId  ...  Timestamp
1       1         1           ...  2019-11-21 09:00
1       1         2           ...  2019-11-22 10:00
1       2         1           ...  2019-11-22 11:00
I would get the result
UserId  LessonId  TotalExercises  ...  StudyDays
1       1         2               ...  2
1       2         1               ...  1
I can't simply add StudyDays to get the number of days studied. That would give me 3, but the distinct count for StudyDays overall should be 2.
The issue is that I need LessonId in the view in order to be able to use it in the WHERE clause, but having it in the view will group my data by lesson causing the aggregate to be incorrect.
How do you include a field in a view so that you can filter on it, without having it affect the aggregation that occurs in that view?

Some aggregates can't be stacked across multiple levels of grouping, because the result differs from aggregating the original rows once. A count-distinct of per-group count-distincts isn't the same as a count-distinct over the original set. The same happens with averages, which depend on the number of rows in each group.
The problem in your case is the GROUP BY LessonID combined with a COUNT DISTINCT inside the view. The view has already computed its values per LessonID, whereas you want (later on) the rows for multiple LessonID values to be aggregated together as one set.
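To make the difference concrete, here is a minimal sketch that loads the three sample rows from the question and compares the two ways of counting. It uses Python's built-in sqlite3 module purely as a stand-in for SQL Server (an assumption made so the example runs anywhere), with SQLite's date() taking the place of CONVERT(date, Timestamp). The Score column is left out because the question elides its values.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Scores (UserId INT, LessonId INT, ExerciseId INT, Timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "2019-11-21 09:00"),
        (1, 1, 2, "2019-11-22 10:00"),
        (1, 2, 1, "2019-11-22 11:00"),
    ],
)

# Two-level aggregation: distinct days per lesson, then summed across lessons.
stacked = conn.execute(
    """
    SELECT SUM(StudyDays)
    FROM (
        SELECT LessonId, COUNT(DISTINCT date(Timestamp)) AS StudyDays
        FROM Scores
        WHERE UserId = 1 AND LessonId BETWEEN 1 AND 5
        GROUP BY UserId, LessonId
    )
    """
).fetchone()[0]

# Single-level aggregation over the same filtered rows, grouped by user only.
direct = conn.execute(
    """
    SELECT COUNT(DISTINCT date(Timestamp))
    FROM Scores
    WHERE UserId = 1 AND LessonId BETWEEN 1 AND 5
    GROUP BY UserId
    """
).fetchone()[0]

print(stacked, direct)  # 3 2
```

The second query returns one row per user with the expected StudyDays of 2, which suggests that the lesson filter has to be applied to the base rows before an aggregation that does not group by LessonId.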
As long as you keep that GROUP BY inside the view, you will have this problem. A solution would be to replace the view with a table-valued function, which allows a range of lessons to be supplied:
CREATE FUNCTION dbo.ufnUserLessonSummary (
@UserID INT,
@LessonIDFrom INT,
@LessonIDTo INT)
RETURNS TABLE
AS RETURN
SELECT
UserId,
LessonId,
COUNT(ExerciseId) AS TotalExercises,
SUM(Score) AS TotalScore,
COUNT(DISTINCT CONVERT(date, Timestamp)) AS StudyDays
FROM
Scores AS S
WHERE
S.UserID = @UserID AND
S.LessonID BETWEEN @LessonIDFrom AND @LessonIDTo
GROUP BY
UserId,
LessonId
You can query it like the following:
SELECT
S.*
FROM
dbo.ufnUserLessonSummary(1, 1, 5) AS S
However, this is limited to a contiguous range of lessons. What happens if you want only lessons 1, 3 and 5? Another option, more complex but more versatile, is to use a stored procedure with a pre-loaded input table:
CREATE PROCEDURE dbo.uspUserLessonSummary
AS
BEGIN
SELECT
S.UserId,
S.LessonId,
COUNT(S.ExerciseId) AS TotalExercises,
SUM(S.Score) AS TotalScore,
COUNT(DISTINCT CONVERT(date, S.Timestamp)) AS StudyDays
FROM
Scores AS S
INNER JOIN #UserLesson AS U ON
S.UserID = U.UserID AND
S.LessonID = U.LessonID
GROUP BY
S.UserId,
S.LessonId
END
You can supply which records you want by loading the temporary table before executing:
IF OBJECT_ID('tempdb..#UserLesson') IS NOT NULL
DROP TABLE #UserLesson
CREATE TABLE #UserLesson (
UserID INT,
LessonID INT)
INSERT INTO #UserLesson (
UserID,
LessonID)
VALUES
(1, 1),
(1, 2),
(1, 3),
(1, 4),
(1, 5)
EXEC dbo.uspUserLessonSummary
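You can also use a table variable with this approach, provided it is passed to the procedure as a READONLY table-valued parameter; a table variable declared by the caller is not otherwise visible inside the procedure.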

Related

min(count(*)) over... behavior?

I'm trying to understand the behavior of
select ..... ,MIN(count(*)) over (partition by hotelid)
VS
select ..... ,count(*) over (partition by hotelid)
I have a list of hotels (1, 2, 3).
Each hotel has departments.
Each department has workers.
My data looks like this:
select * from data
Ok. Looking at this query :
select hotelid,departmentid , cnt= count(*) over (partition by hotelid)
from data
group by hotelid, departmentid
ORDER BY hotelid
I can perfectly understand what's going on here. On that result set, partitioning by hotelId , we are counting visible rows.
But look what happens with this query :
select hotelid,departmentid , min_cnt = min(count(*)) over (partition by hotelid)
from data
group by hotelid, departmentid
ORDER BY hotelid
Question:
Where did those numbers come from? I don't understand how adding min produced that result. Min of what?
Can someone please explain how the calculation is made?
The two statements are very different. The first query counts the rows left after the grouping and then applies the PARTITION. So, for example, for hotel 1 there is 1 row returned (as all rows for hotel 1 have the same department, A), and so COUNT(*) OVER (PARTITION BY hotelid) returns 1. Hotel 2, however, has 2 departments, 'B' and 'C', and so returns 2.
For your second query, you first have the COUNT(*), which is not within the OVER clause. That means it counts all the rows within each group specified by your query's GROUP BY hotelid, departmentid. For hotel 1, there are 4 rows for department A, hence 4. Then you take the minimum of 4, which is unsurprisingly 4. Each of the other hotels has at least one department with only 1 row, so the minimum count within the hotel is 1.
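The evaluation order described above can be spelled out by doing the GROUP BY in a derived table and applying the window functions to its output. The following is only a sketch: it runs on Python's sqlite3 module (which needs SQLite 3.25 or later for window functions) rather than SQL Server, and the rows in the data table are hypothetical, chosen to match the description in this answer because the original data is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (hotelid INT, departmentid TEXT, worker TEXT)")

# Hypothetical rows: hotel 1 has 4 workers in department A, hotel 2 has
# departments B (2 workers) and C (1 worker), hotel 3 has one worker in D.
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        (1, "A", "w1"), (1, "A", "w2"), (1, "A", "w3"), (1, "A", "w4"),
        (2, "B", "w5"), (2, "B", "w6"), (2, "C", "w7"),
        (3, "D", "w8"),
    ],
)

# Step 1 (inner query): GROUP BY produces one row per hotel/department with cnt.
# Step 2 (outer query): the window functions see only those grouped rows.
result = conn.execute(
    """
    SELECT hotelid,
           departmentid,
           cnt,
           COUNT(*) OVER (PARTITION BY hotelid) AS groups_in_hotel,
           MIN(cnt) OVER (PARTITION BY hotelid) AS min_cnt
    FROM (
        SELECT hotelid, departmentid, COUNT(*) AS cnt
        FROM data
        GROUP BY hotelid, departmentid
    ) AS g
    ORDER BY hotelid, departmentid
    """
).fetchall()

for row in result:
    print(row)
```

Here groups_in_hotel corresponds to count(*) over (partition by hotelid) in the first query, and min_cnt corresponds to min(count(*)) over (partition by hotelid) in the second. With this sample data, hotel 1 gets 1 and 4, while hotel 2 gets 2 and 1 on both of its rows.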

SQL to identify initial dates for a product that changes from active to cancelled status

I have a table that records the following items:
product_id
product_status
date
Products can exist in the following product statuses: pending, active, or canceled. Only one status can exist per date per product code. A status and product code is inserted for each and every day a product exists.
Utilizing SQL I'd like to be able to identify the initial cancellation dates for a product that cancels more than once in a given time frame.
For example, a product is active for 3 days, then cancels for 3 days, then is active again for 3 days, and then cancels again for another 3 days.
I'd like to be able to identify day 1 of each of the 2 cancellation periods.
Thought I'd get the crystal ball out for this one. This sounds like a Gaps and Islands question. There are plenty of answers on how to do this on the internet; however, this might be what you're after:
CREATE TABLE #Sample (product_id int,
product_status varchar(10),
[date] date); --blargh
INSERT INTO #Sample
VALUES (1,'active', '20170101'),
(1,'active', '20170102'),
(1,'active', '20170103'),
(1,'cancelled', '20170104'),
(1,'cancelled', '20170105'),
(1,'cancelled', '20170106'),
(1,'active', '20170107'),
(1,'pending', '20170108'),
(1,'active', '20170109'),
(1,'cancelled', '20170110'),
(2,'pending', '20170101'),
(2,'active', '20170102'),
(2,'cancelled', '20170103'),
(2,'cancelled', '20170104');
GO
SELECT *
FROM #Sample;
WITH Groups AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY product_id
ORDER BY [date]) -
ROW_NUMBER() OVER (PARTITION BY product_id, product_status
ORDER BY [date]) AS Grp
FROM #Sample)
SELECT product_id, MIN([date]) AS cancellation_start
FROM Groups
WHERE product_status = 'cancelled'
GROUP BY Grp, product_id
ORDER BY product_id, cancellation_start;
GO
DROP TABLE #Sample;
If not, then see Patrick Artnet's comment.
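The reason the query above works is that, within one unbroken run of the same status, both row numbers increase by 1 per day, so their difference stays constant; when the status changes and later comes back, the difference changes. The sketch below reproduces the answer's sample rows and exposes that Grp value next to each cancellation start. It runs on Python's sqlite3 module (SQLite 3.25 or later) instead of SQL Server, and the table and column names sample and status_date are stand-ins chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (product_id INT, product_status TEXT, status_date TEXT)"
)

# Same sequence of statuses as the sample data above, one row per day.
statuses_1 = ["active"] * 3 + ["cancelled"] * 3 + ["active", "pending", "active", "cancelled"]
statuses_2 = ["pending", "active", "cancelled", "cancelled"]
rows = [(1, s, f"2017-01-{d:02d}") for d, s in enumerate(statuses_1, start=1)]
rows += [(2, s, f"2017-01-{d:02d}") for d, s in enumerate(statuses_2, start=1)]
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)

# grp = (row number within product) - (row number within product and status).
# It is constant inside each consecutive run of one status.
islands = conn.execute(
    """
    WITH numbered AS (
        SELECT product_id,
               product_status,
               status_date,
               ROW_NUMBER() OVER (PARTITION BY product_id
                                  ORDER BY status_date)
             - ROW_NUMBER() OVER (PARTITION BY product_id, product_status
                                  ORDER BY status_date) AS grp
        FROM sample
    )
    SELECT product_id, grp, MIN(status_date) AS cancellation_start, COUNT(*) AS days
    FROM numbered
    WHERE product_status = 'cancelled'
    GROUP BY product_id, grp
    ORDER BY product_id, cancellation_start
    """
).fetchall()

for row in islands:
    print(row)
# (1, 3, '2017-01-04', 3)
# (1, 6, '2017-01-10', 1)
# (2, 2, '2017-01-03', 2)
```

Product 1 has two cancellation periods (Grp 3 and Grp 6), so it yields two start dates, while product 2 has one.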

Sum field(s) from one table in another table, summing from 3 different tables

I have an Access database with customer IDs. Each customer can have multiple orders and each order can be of a different type. I have three separate tables (Online, In-store, Payment Plan), one for each order type, with various amounts from each order; all are related to a customer ID. In one of the tables, there are two order types whose amounts must be maintained separately within the same table. I want to sum each order type in another table called Totals. I can successfully create a query to get the sums for each type based on the customer ID, but I am not sure how to pull those values into my Totals table. The scenario below is repeated for multiple customers and each type is its own table; the payment plans are in a table together. I have historical data, so I am limited in how I can manipulate it as far as merging fields and so on.
Customer ID#: 1
Order Type: Online
Online Amount: $20.00
Order Type: Online
Online Amount: $40.00
Sum of Online Amount: $60.00
Order Type: In-store
In-Store Amount: $35.00
Order Type: In-store
In-Store Amount: $60.00
Sum of In-Store Amount: $95.00
Order Type: Payment Plan
Payment Plan 1 Amount: $30.00
Payment Plan 1 Amount: $23.00
Sum of Payment Plan 1 Amount: $53.00
Order Type: Payment Plan 2
Payment Plan 2 Amount: $35.00
Payment Plan 2 Amount: $30.00
Sum of Payment Plan 2 Amount: $65.00
In my Totals table I have a field for each type that sums the amount spent by each customer ID and then a field where all of their order types are summed into one overall total field.
I am learning as I go so any help/example is appreciated. Thank you.
Having separate tables for your different order types doesn't help. For a database it would be better to have a single table for all sales with a sale_type field.
You don't describe exactly what your tables look like, so I've had to make a couple of assumptions. If your tables contain an OrderType field then you can create a Union query to join all your sales together:
SELECT CustomerID
, OrderType
, Amount
FROM Online
UNION ALL SELECT CustomerID
, OrderType
, Amount
FROM [In-Store]
UNION ALL SELECT CustomerID
, OrderType
, Amount
FROM [Payment Plan]
If you don't have an OrderType you can hard-code the values into the query:
SELECT CustomerID
, "Online" AS OrderType
, Amount
FROM Online
UNION ALL SELECT CustomerID
, "In-Store"
, Amount
FROM [In-Store]
UNION ALL SELECT CustomerID
, "Payment Plan"
, Amount
FROM [Payment Plan]
Note - The field name is declared for the OrderType in the first Select block. You could do it in each block, but Access only looks at the first.
Like all queries, the results come in table form and can be treated as such. So now we need to list the CustomerName (I'm assuming you have a Customers table), the OrderType and the sum of the amount for that Customer & OrderType.
SELECT CustomerName
, OrderType
, SUM(Amount)
FROM Customers INNER JOIN
(
SELECT CustomerID
, OrderType
, Amount
FROM Online
UNION ALL SELECT CustomerID
, OrderType
, Amount
FROM [In-Store]
UNION ALL SELECT CustomerID
, OrderType
, Amount
FROM [Payment Plan]
) T1 ON Customers.CustomerID = T1.CustomerID
GROUP BY CustomerName
, OrderType
All sales in your three tables will have a customer within the customers table so we can use an INNER JOIN to return only records where the value appears in both tables (Customers table & result of query table).
The UNION query is wrapped in brackets, given the name T1, and joined to the Customers table on the CustomerID field.
We group all fields that aren't part of an aggregate function, so group on CustomerName and OrderType and sum the Amount field.
This is all you really need to do: let the query run each time you want the totals, so that you get the most up-to-date values. There shouldn't be a need to push the results to a Totals table, as that will be out of date as soon as you make a new sale (or someone returns something).
If you really want to INSERT these figures into a Total table just add a first line to the SQL:
INSERT INTO Total (CustomerName, OrderType, Amount)
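As a rough check of the UNION ALL plus GROUP BY approach, the sketch below loads the Online, In-store and Payment Plan 1 amounts from the question for customer 1 and runs the hard-coded OrderType variant of the query. It uses Python's sqlite3 module rather than Access, so string literals use single quotes. The table layouts, the table names InStore and PaymentPlan, and the customer name 'Customer 1' are assumptions, since the question does not show the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed minimal layouts: each order table holds a CustomerID and an Amount.
conn.executescript(
    """
    CREATE TABLE Customers (CustomerID INT, CustomerName TEXT);
    CREATE TABLE Online (CustomerID INT, Amount INT);
    CREATE TABLE InStore (CustomerID INT, Amount INT);
    CREATE TABLE PaymentPlan (CustomerID INT, Amount INT);

    INSERT INTO Customers VALUES (1, 'Customer 1');
    INSERT INTO Online VALUES (1, 20), (1, 40);
    INSERT INTO InStore VALUES (1, 35), (1, 60);
    INSERT INTO PaymentPlan VALUES (1, 30), (1, 23);
    """
)

# Stack the three tables with a hard-coded OrderType, then sum per
# customer and order type.
totals = conn.execute(
    """
    SELECT c.CustomerName, t.OrderType, SUM(t.Amount) AS Total
    FROM Customers AS c
    INNER JOIN (
        SELECT CustomerID, 'Online' AS OrderType, Amount FROM Online
        UNION ALL
        SELECT CustomerID, 'In-Store', Amount FROM InStore
        UNION ALL
        SELECT CustomerID, 'Payment Plan', Amount FROM PaymentPlan
    ) AS t ON c.CustomerID = t.CustomerID
    GROUP BY c.CustomerName, t.OrderType
    ORDER BY t.OrderType
    """
).fetchall()

for row in totals:
    print(row)
# ('Customer 1', 'In-Store', 95)
# ('Customer 1', 'Online', 60)
# ('Customer 1', 'Payment Plan', 53)
```

The totals match the sums listed in the question (60, 95 and 53) and are recomputed every time the query runs.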
Here is a dirty workaround, though I think there might be a more direct solution to it.
You could create an output table (I broke it down to ID, Online, InStore and Total) and use DSum functions within an UPDATE query.
UPDATE tbl_Totals SET
Total_InStore = DSum("Amount", "tbl_InStore", "Customer_ID = " & Customer_ID),
Total_Online = DSum("Amount", "tbl_Online", "Customer_ID = " & Customer_ID),
Total = DSum("Amount", "tbl_InStore", "Customer_ID = " & Customer_ID) + DSum("Amount", "tbl_Online", "Customer_ID = " & Customer_ID)

Average data in its own row

I have data that returns the same value multiple times in one column, I only want to include the first value or even average the group, since they are all the same value. The group itself might have 3 rows of payments, but the payments are the same. I just want the three rows to show, but only the one payment in its own column.
In the data below I would like to add another column that averages Rich and Bob's value and inputs the amount in the top row for Rich and Bob.
Sample Data:
1 Rich 300
2 Rich 300
3 Rich 300
4 Bob 250
5 Bob 250
You probably want something like this:
Just paste this into an empty query window and execute. Adapt to your needs...
DECLARE @tbl TABLE(ID INT, PersonName VARCHAR(100),Amount DECIMAL(6,2))
INSERT INTO @tbl VALUES
(1,'Rich',300)
,(2,'Rich',300)
,(3,'Rich',300)
,(4,'Bob',250)
,(5,'Bob',250);
WITH NumberedPerson AS
(
SELECT tbl.*
,ROW_NUMBER() OVER(PARTITION BY PersonName ORDER BY ID) PersonID
,AVG(Amount) OVER(PARTITION BY PersonName) PersonAvg
FROM @tbl AS tbl
)
SELECT *
,CASE WHEN PersonID=1 THEN PersonAvg ELSE NULL END AS AverageInFirstRow
FROM NumberedPerson
ORDER BY ID
But - to be honest - that is absolutely not the way how this should be done...

How Can i Create a View or table out of two tables so that their rows are not merged?

I have two views that I want to merge into one view, without their records being merged into one record. I mean, suppose I have these tables:
Table one (suppose this is a sales table where our customer sold something):
Date       Description    Fee  Number  Money
12/2/2012  something      10$  20      200$
10/3/2012  somethingelse  20$  30      600$
Table two (suppose this is the table where our customer received money):
Date       Description  Money
02/8/2012  someinfo     5000$
12/1/2012  stuff        3100$
And the resulting table or view would be (in descending order by date):
Date       Description    Fee  Number  Money
02/8/2012  someinfo       0    0       5000$
10/3/2012  somethingelse  20$  30      600$
12/2/2012  something      10$  20      200$
12/1/2012  stuff          0    0       3100$
How can I achieve this form? These two tables are separate, but each has a unique personal ID which represents the salesman's account. (So basically this means that this information belongs to one person only, and our customer wants a report that gives him this specific view only.)
I tried using UNION on these two tables, but the rows were merged!
If I use joins, there would only be a single row where the rows of the two tables are merged together. So I am stuck here and don't know what to do now.
I think you need UNION ALL not just UNION.
select Date, Description, Fee, Number, Money
from table1
UNION ALL
select Date, Description, 0 Fee, 0 Number, Money
from table2
order by Date
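A small sketch of this query against the question's sample rows follows. It runs on Python's sqlite3 module instead of the original database, and the table and column names (sales, receipts, txn_date, quantity) are stand-ins. The dates are stored as ISO text and assume the question's dates are in day/month/year order, which is what its expected output implies. ORDER BY is given DESC here because the question asks for descending dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in tables for "table one" (sales) and "table two" (money received).
conn.executescript(
    """
    CREATE TABLE sales (txn_date TEXT, description TEXT,
                        fee INT, quantity INT, money INT);
    CREATE TABLE receipts (txn_date TEXT, description TEXT, money INT);

    INSERT INTO sales VALUES
        ('2012-02-12', 'something', 10, 20, 200),
        ('2012-03-10', 'somethingelse', 20, 30, 600);
    INSERT INTO receipts VALUES
        ('2012-08-02', 'someinfo', 5000),
        ('2012-01-12', 'stuff', 3100);
    """
)

# UNION ALL keeps every row from both inputs; the missing columns of the
# second table are padded with 0 so both SELECT lists line up.
report = conn.execute(
    """
    SELECT txn_date, description, fee, quantity, money FROM sales
    UNION ALL
    SELECT txn_date, description, 0, 0, money FROM receipts
    ORDER BY txn_date DESC
    """
).fetchall()

for row in report:
    print(row)
# ('2012-08-02', 'someinfo', 0, 0, 5000)
# ('2012-03-10', 'somethingelse', 20, 30, 600)
# ('2012-02-12', 'something', 10, 20, 200)
# ('2012-01-12', 'stuff', 0, 0, 3100)
```

All four rows survive as separate records, in the order the question asks for.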
Try something like
CREATE VIEW vMyView
AS
SELECT [Date], [Description], [Fee], [Number], [Money]
FROM v1
UNION ALL
SELECT [Date], [Description], 0 AS [Fee], 0 AS [Number], [Money]
FROM v2
I think this should do it.
CREATE VIEW new_view AS
SELECT Date, Description, Fee, Number, Money FROM table_one
UNION ALL
SELECT Date, Description, 0 AS Fee, 0 AS Number, Money FROM table_two;
