Aggregating column in view - sql-server

I have a table that stores hours completed ("HoursToday"), recorded daily.
CREATE TABLE MyTable
(
Id INT NOT NULL IDENTITY(1,1),
PersonId INT NOT NULL,
DateValue DATE NOT NULL,
HoursToday DECIMAL(3,2) NOT NULL
)
What I need to do is create a view that shows those fields, as well as a TotalHoursForPerson.
So, I can Select from View, where PersonId = x, and it returns:
Id|PersonId|DateValue|HoursToday|Total
1|1|'01-JAN'|5|5
2|1|'02-JAN'|8|13
3|1|'03-JAN'|2|15
etc.
But I am unsure if I can get that 'total' column.

You are trying to compute a cumulative sum (running total). Here is one concise and efficient way to do it:
CREATE VIEW YourView
AS
SELECT Id, PersonId, DateValue, HoursToday,
SUM(HoursToday) OVER (PARTITION BY PersonId ORDER BY Id /* or date */) AS Total
FROM MyTable
Then, to query the view:
SELECT * FROM YourView WHERE PersonId = 42
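One detail worth knowing: when ORDER BY is given without an explicit frame, the window defaults to RANGE UNBOUNDED PRECEDING, which treats rows that tie on the ORDER BY value as peers and gives them the same total. If you order by DateValue and a person can have several rows for one date, spelling out a ROWS frame with a tie-breaker avoids that. A minimal sketch, assuming SQL Server 2012 or later. The table variable, the sample dates, and the DECIMAL(4,2) type are made up for illustration (DECIMAL(3,2) in the question tops out at 9.99):

```sql
-- Hypothetical sample data mirroring the question's table.
DECLARE @MyTable TABLE
(
    Id         INT IDENTITY(1,1) NOT NULL,
    PersonId   INT NOT NULL,
    DateValue  DATE NOT NULL,
    HoursToday DECIMAL(4,2) NOT NULL  -- assumption: widened from DECIMAL(3,2)
);

INSERT INTO @MyTable (PersonId, DateValue, HoursToday)
VALUES (1, '2024-01-01', 5),
       (1, '2024-01-02', 8),
       (1, '2024-01-03', 2);

-- Running total per person; the ROWS frame counts physical rows,
-- and Id breaks ties between rows that share a date.
SELECT Id, PersonId, DateValue, HoursToday,
       SUM(HoursToday) OVER (PARTITION BY PersonId
                             ORDER BY DateValue, Id
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Total
FROM @MyTable
ORDER BY PersonId, DateValue, Id;
```

With this sample data the Total column should read 5.00, 13.00, 15.00, matching the output the question asks for.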

Related

Converting subquery into window function

I had an interview where they asked me: "Find the most popular class among students using their first enrollment date", where the assumption was that a student would pick their favorite class first. For simplicity, no two EnrollmentDT could be exactly the same and there are no data issues (e.g. a student can't be enrolled in the same class twice).
They expected me to use a window function, and I'm curious how to do that for this problem.
I quickly set up some seed data as follows (I'm aware the seed portion isn't a perfect representation, but I needed something close enough quickly):
IF OBJECT_ID('StudentClass') IS NOT NULL DROP TABLE StudentClass;
IF OBJECT_ID('Class') IS NOT NULL DROP TABLE Class;
IF OBJECT_ID('Student') IS NOT NULL DROP TABLE Student;
CREATE TABLE Student (
StudentID INT IDENTITY(1,1) PRIMARY KEY,
[Name] UNIQUEIDENTIFIER DEFAULT NEWID(),
);
CREATE TABLE Class (
ClassID INT IDENTITY(1,1) PRIMARY KEY,
[Name] UNIQUEIDENTIFIER DEFAULT NEWID(),
ClassLevel INT DEFAULT CAST(CEILING(RAND() * 3) AS INT)
);
CREATE TABLE StudentClass (
StudentClassID INT IDENTITY(1,1),
StudentID INT FOREIGN KEY REFERENCES Student (StudentID),
ClassID INT FOREIGN KEY REFERENCES Class (ClassID),
EnrollmentDT DATETIME2
);
GO
INSERT INTO Student DEFAULT VALUES
GO 50
INSERT INTO Class DEFAULT VALUES
GO 5
DECLARE @StudentIndex INT = 1;
DECLARE @Cycle INT = 1;
WHILE @Cycle <= 5
BEGIN
IF RAND() > 0.5
BEGIN
INSERT INTO StudentClass (StudentID, ClassID, EnrollmentDT)
VALUES
(@StudentIndex, @Cycle, DATEADD(SECOND, CAST(CEILING(RAND() * 10000) AS INT), SYSDATETIME()))
END
SET @StudentIndex = @StudentIndex + 1;
IF @StudentIndex > 50 -- past the last student: move on to the next class
BEGIN
SET @Cycle = @Cycle + 1;
SET @StudentIndex = 1;
END
END
But the only thing I could come up with was:
SELECT
sc.ClassID,
COUNT(*) AS IsFavoriteClassCount
FROM
StudentClass sc
INNER JOIN (
SELECT
StudentID,
MIN(EnrollmentDT) AS MinEnrollmentDT
FROM
StudentClass
GROUP BY
StudentID
) sq
ON sc.StudentID = sq.StudentID
AND sc.EnrollmentDT = sq.MinEnrollmentDT
GROUP BY
sc.ClassID
ORDER BY
IsFavoriteClassCount DESC;
Any guidance on their thinking would be greatly appreciated! If I made any errors in my constructions / query, take that as a proper error and not something intentional.
SELECT
ClassID,
COUNT(*) AS IsFavoriteClassCount
FROM
(
SELECT
sc.ClassID,
sc.StudentID,
ROW_NUMBER() OVER (PARTITION BY sc.StudentID
ORDER BY
sc.EnrollmentDT) AS rn
FROM
StudentClass sc
)
t
WHERE
rn = 1
GROUP BY
ClassID
ORDER BY
IsFavoriteClassCount DESC;
The query uses ROW_NUMBER() as a window function to number each student's enrollments in order of enrollment date, so the earliest enrollment gets row number 1. The inner query selects the ClassID and the StudentID for each enrollment and adds the row number, partitioned by StudentID and ordered by EnrollmentDT. The outer query keeps only the rows where the row number is 1, which are the first enrollments for each student, and groups them by ClassID to count how many times each class was a student's first choice.
You can use the FIRST_VALUE() window function to get the first pick of each student:
WITH cte AS (
SELECT DISTINCT StudentID,
FIRST_VALUE(ClassID) OVER (PARTITION BY StudentID ORDER BY EnrollmentDT) AS ClassID
FROM StudentClass
)
SELECT TOP 1 WITH TIES
ClassID,
COUNT(*) AS IsFavoriteClassCount
FROM cte
GROUP BY ClassID
ORDER BY COUNT(*) DESC;
Remove TOP 1 WITH TIES to get results for all classes.

Prevent Grouping rows by NULL value

According to this article:
When grouping with a column in a GROUP BY statement that contains NULLs, they will be put into one group in your result set.
However, what I want is to prevent grouping rows by NULL value.
The following code gives me one row:
IF(OBJECT_ID('tempdb..#TestTable') IS NOT NULL)
DROP TABLE #TestTable
GO
CREATE TABLE #TestTable ( ID INT, Value INT )
INSERT INTO #TestTable(ID, Value) VALUES
(NULL, 70),
(NULL, 70)
SELECT
ID
, Value
FROM #TestTable
GROUP BY ID, Value
The output is:
ID Value
NULL 70
However, I would like to have two rows. My desired result looks like this:
NULL 70
NULL 70
Is it possible to have two rows with GROUP BY?
UPDATE:
What I need is to count those rows:
SELECT
COUNT(1) AS rows
FROM (SELECT 1 AS foo
FROM #TestTable
GROUP BY ID, Value
)q
OUTPUT: 1
But, actually, there are two rows. I need output to have 2.
What you need is a way to make NULL values in Id unique. Using the following code will make the values unique, but continue to group the non-NULL value by virtue of the default value for a case expression being NULL:
group by Id, case when Id is NULL then NewId() end, Value
Assuming you want this behavior because you do want to group by the values of the nullable column (Id in your example), you can add a row_number when the id column is null using a common table expression to create an artificial difference between duplicate groups - like this:
-- Sample data with more rows (assumes #TestTable starts empty; otherwise the results include extra NULL rows)
INSERT INTO #TestTable(ID, Value) VALUES
(NULL, 70),
(NULL, 70),
(1, 70),
(1, 70),
(2, 70);
The query, with the cte:
WITH CTE AS
(
SELECT Id, Value, IIF(Id IS NULL, ROW_NUMBER() OVER(ORDER BY Id), NULL) As Surrogate
FROM #TestTable
)
SELECT
ID
, Value
FROM CTE
GROUP BY ID, Surrogate, Value
Results:
ID Value
NULL 70
NULL 70
1 70
2 70
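To get the row count asked for in the question's update, the same grouping can be wrapped in a derived table. This is a sketch only: it uses a table variable named @TestTable (a stand-in for the question's temp table) so that it runs on its own, and it assumes SQL Server 2012 or later for IIF.

```sql
-- Stand-in for #TestTable, filled with the sample rows used above.
DECLARE @TestTable TABLE (ID INT NULL, Value INT NOT NULL);

INSERT INTO @TestTable (ID, Value)
VALUES (NULL, 70), (NULL, 70), (1, 70), (1, 70), (2, 70);

WITH CTE AS
(
    -- Rows with a NULL ID each get a distinct surrogate number;
    -- rows with a real ID get NULL and therefore still group together.
    SELECT ID, Value,
           IIF(ID IS NULL, ROW_NUMBER() OVER (ORDER BY ID), NULL) AS Surrogate
    FROM @TestTable
)
SELECT COUNT(*) AS GroupCount
FROM (
    SELECT ID, Value
    FROM CTE
    GROUP BY ID, Surrogate, Value
) q;
```

For this data the count should be 4: two separate NULL rows, one group for ID 1, and one group for ID 2.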

SQL Server table design to define WHERE condition

I have an existing stored procedure with a lot of hard-coded IF conditions. The procedure checks the values of the following input fields and displays the relevant message:
• BrandId
• ProductId
• SchemeId
• RegionId
The existing Message table:
MsgId MsgText
1 AAAA
2 BBBB
3 CCCC
4 MMMM
Existing stored proc. pseudo code:
IF (@BrandId IN (5, 10))
BEGIN
IF (@ProductId IN (5))
SELECT 'BBBB' AS MsgText;
END
IF (@SchemeId IN (1, 5, 10))
SELECT 'AAAA' AS MsgText;
IF (@SchemeId = 2 AND @RegionId = 4)
SELECT 'BBBB' AS MsgText;
IF (@RegionId = 6)
SELECT 'MMMM' AS MsgText;
In order to remove the hard-coding and rewrite the procedure cleanly from scratch, I want to design new tables that store "MsgId"s against a BrandId/ProdId/PlanId/SchemeId value or against a combination of these fields (e.g. SchemeId = 2 AND RegionId = 4). With this kind of design I can directly fetch the relevant MsgId for a specific field or combination of fields.
Could anybody suggest table designs to meet the requirement?
Based on your responses to the comments, this might work out.
create table dbo.[Messages] (
MessageId int not null
, MessageText nvarchar(1024) not null
, constraint pk_Messages primary key clustered (MessageId)
);
insert into dbo.[Messages] (MessageId,MessageText) values
(1,'AAAA')
, (2,'BBBB')
, (13,'MMMM');
create table dbo.Messages_BrandProduct (
BrandId int not null
, ProductId int not null
, MessageId int not null
, constraint pk_Messages_BrandProduct primary key clustered
(BrandId, ProductId, MessageId)
);
insert into dbo.Messages_BrandProduct (BrandId, ProductId, MessageId) values
(5,5,2)
,(10,5,2);
create table dbo.Messages_SchemeRegion (
SchemeId int not null
, RegionId int not null
, MessageId int not null
, constraint pk_Messages_SchemeRegion primary key clustered
(SchemeId, RegionId, MessageId)
);
insert into dbo.Messages_SchemeRegion (SchemeId, RegionId, MessageId)
select SchemeId = 1, RegionId , MessageId = 1 from dbo.Regions
union all
select SchemeId = 5, RegionId , MessageId = 1 from dbo.Regions
union all
select SchemeId = 10, RegionId , MessageId = 1 from dbo.Regions
union all
select SchemeId = 2, RegionId = 4, MessageId = 2
union all
select SchemeId , RegionId = 6, MessageId = 13 from dbo.Schemes;
In your procedure you could pull the messages like this:
select m.MessageId, m.MessageText
from dbo.Messages_BrandProduct mbp
inner join dbo.[Messages] m on mbp.MessageId=m.MessageId
where mbp.BrandId = @BrandId and mbp.ProductId = @ProductId
union -- union all if you don't need to deduplicate messages
select m.MessageId, m.MessageText
from dbo.Messages_SchemeRegion msr
inner join dbo.[Messages] m on msr.MessageId=m.MessageId
where msr.SchemeId = @SchemeId and msr.RegionId = @RegionId;
This should do it.
CREATE TABLE [dbo].[IDs](
[BrandID] [int] NOT NULL,
[ProductID] [int] NOT NULL,
[SchemeID] [int] NOT NULL,
[RegionID] [int] NOT NULL,
[MsgID] [int] NOT NULL
)
You can adjust the table and column names as needed. Cheers.

Trouble Writing Update Query

I want to write an update script for the following table.
Id int,
Title nvarchar(100),
ProgramId int,
EventId int,
SortOrder int
I want to set the SortOrder column to 1 through N, as sorted by the Id column. However, I want the number to restart when either ProgramId or EventId changes. That is, I'd like the numbering sequence 1...N for each row with the same ProgramId and EventId values, and then restart the numbering for the next ProgramId and EventId values.
I know I could use ROW_NUMBER to get a row number based on the current sorting, but I don't see how I could restart the number when one of those other two columns changes. Is this even possible?
Like this:
;WITH cte As
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ProgramId, EventId ORDER BY Id) As RN
FROM YourTable
)
UPDATE cte
SET SortOrder = RN
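For anyone who wants to try this before touching real data, here is a small self-contained sketch. The temp table #YourTable and its five rows are invented for illustration; the UPDATE itself is the statement from the answer.

```sql
-- Hypothetical sample table with the columns from the question.
CREATE TABLE #YourTable
(
    Id        INT NOT NULL PRIMARY KEY,
    Title     NVARCHAR(100) NULL,
    ProgramId INT NOT NULL,
    EventId   INT NOT NULL,
    SortOrder INT NULL
);

INSERT INTO #YourTable (Id, Title, ProgramId, EventId)
VALUES (1, N'A', 1, 1),
       (2, N'B', 1, 1),
       (3, N'C', 1, 2),
       (4, N'D', 2, 1),
       (5, N'E', 1, 1);

-- Number the rows 1..N within each (ProgramId, EventId) pair, ordered by Id,
-- and write that number back through the CTE.
;WITH cte AS
(
    SELECT SortOrder,
           ROW_NUMBER() OVER (PARTITION BY ProgramId, EventId ORDER BY Id) AS RN
    FROM #YourTable
)
UPDATE cte
SET SortOrder = RN;

SELECT Id, ProgramId, EventId, SortOrder
FROM #YourTable
ORDER BY Id;

DROP TABLE #YourTable;
```

Ids 1, 2 and 5 share ProgramId 1 / EventId 1, so they should come back with SortOrder 1, 2 and 3, while Ids 3 and 4 each start a new group at 1.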

Problem with unique SQL query

I want to select all records, but have the query only return a single record per Product Name. My table looks similar to:
SellId ProductName Comment
1 Cake dasd
2 Cake dasdasd
3 Bread dasdasdd
where the Product Name is not unique. I want the query to return a single record per ProductName with results like:
SellId ProductName Comment
1 Cake dasd
3 Bread dasdasdd
I have tried this query,
Select distinct ProductName,Comment ,SellId from TBL#Sells
but it is returning multiple records with the same ProductName. My table is not really as simple as this; this is just a sample. What is the solution? Is it clear?
Select ProductName,
min(Comment) , min(SellId) from TBL#Sells
group by ProductName
If you only want one record per ProductName, you of course have to choose which value you want for the other fields.
If you aggregate (using GROUP BY) you can choose an aggregate function,
that is, a function that takes a list of values and returns only one. Here I have chosen MIN, which is the smallest value for each field.
NOTE: Comment and SellId can come from different records, since MIN is taken for each column independently.
Other aggregates you might find useful:
FIRST : first record encountered
LAST : last record encountered
AVG : average
COUNT : number of records
FIRST/LAST have the advantage that all fields come from the same record (they exist in some databases such as MS Access, but not as aggregate functions in SQL Server).
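The note about MIN mixing values from different records is easy to miss, so here is a tiny illustration. The table variable @Sells and its two rows are made up, and the comparison assumes an ordinary collation in which 'aaa' sorts before 'zzz'.

```sql
-- Hypothetical data: the lowest SellId and the lowest Comment sit on different rows.
DECLARE @Sells TABLE
(
    SellId      INT NOT NULL,
    ProductName VARCHAR(100) NOT NULL,
    Comment     VARCHAR(100) NOT NULL
);

INSERT INTO @Sells (SellId, ProductName, Comment)
VALUES (1, 'Cake', 'zzz'),
       (2, 'Cake', 'aaa');

-- Each MIN is evaluated per column, independently of the other.
SELECT ProductName,
       MIN(Comment) AS Comment,
       MIN(SellId)  AS SellId
FROM @Sells
GROUP BY ProductName;
```

The single result row pairs Comment 'aaa' (from SellId 2) with SellId 1, a combination that does not exist in the table.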
SELECT S.ProductName, S.Comment, S.SellId
FROM
Sells S
JOIN (SELECT MAX(SellId) AS SellId
FROM Sells
GROUP BY ProductName) AS TopSell ON TopSell.SellId = S.SellId
This will get the latest comment as your selected comment assuming that SellId is an auto-incremented identity that goes up.
I know you've got an answer already, but I'd like to offer the approach that performed fastest for me in a similar situation. I'm assuming that SellId is the primary key and an identity column. You'd want an index on ProductName for best performance.
select
Sells.*
from
(
select
distinct ProductName
from
Sells
) x
join
Sells
on
Sells.ProductName = x.ProductName
and Sells.SellId =
(
select
top 1 s2.SellId
from
Sells s2
where
x.ProductName = s2.ProductName
Order By SellId
)
A slower method (but still better than GROUP BY and MIN on a long char column) is this:
select
*
from
(
select
*,ROW_NUMBER() over (PARTITION BY ProductName order by SellId) OccurenceId
from sells
) x
where
OccurenceId = 1
An advantage of this one is that it's much easier to read.
create table Sale
(
SaleId int not null
constraint PK_Sale primary key,
ProductName varchar(100) not null,
Comment varchar(100) not null
)
insert Sale
values
(1, 'Cake', 'dasd'),
(2, 'Cake', 'dasdasd'),
(3, 'Bread', 'dasdasdd')
-- Option #1 with over()
select *
from Sale
where SaleId in
(
select SaleId
from
(
select SaleId, row_number() over(partition by ProductName order by SaleId) RowNumber
from Sale
) tt
where RowNumber = 1
)
order by SaleId
-- Option #2
select *
from Sale
where SaleId in
(
select min(SaleId)
from Sale
group by ProductName
)
order by SaleId
drop table Sale
