T-SQL group by from multiple years - sql-server

I barely know how to ask this question aside from the specific example, so here goes:
We have an event registration table, and I want to match registrants that have registered for one of 4 events in each of the preceding 5 years.
The only way I can think of doing this is with verbose sub-queries, but performance-wise it's an absolute dog:
SELECT FirstName, LastName, EmailAddress
FROM RegTable
WHERE EventId IN (1,2,3,4)
AND EventYear = 2011
AND FirstName + LastName + DOB IN (SELECT FirstName + LastName + DOB FROM RegTable WHERE EventId IN (1,2,3,4) AND EventYear = 2012)
And so on for each year. Like I said, not very elegant or efficient.
Is there a simpler way?

You can use GROUP BY with HAVING for the five preceding years and then INTERSECT the result with the current year's events:
SELECT FirstName, LastName, DOB
FROM RegTable
WHERE EventId IN (1,2,3,4)
AND EventYear IN (2011,2010,2009,2008,2007)
GROUP BY FirstName, LastName, DOB
HAVING COUNT(Distinct EventYear) = 5
INTERSECT
SELECT DISTINCT FirstName ,LastName ,DOB
FROM RegTable
WHERE EventId IN (1,2,3,4)
AND EventYear = 2012
The above query in action with sample data. SQL Fiddle
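If the current year is treated as just one more required year, the same check can be sketched as a single GROUP BY without the INTERSECT. The table variable @RegTable, its column types, and the sample rows are assumptions made up for illustration; only the column names come from the question.

```sql
-- Hypothetical sample data; column types are assumed, not taken from the question.
DECLARE @RegTable TABLE (
    FirstName    nvarchar(50),
    LastName     nvarchar(50),
    DOB          date,
    EmailAddress nvarchar(100),
    EventId      int,
    EventYear    int
);

INSERT INTO @RegTable (FirstName, LastName, DOB, EmailAddress, EventId, EventYear)
VALUES ('Ann', 'Lee', '1980-01-01', 'ann@example.com', 1, 2007),
       ('Ann', 'Lee', '1980-01-01', 'ann@example.com', 2, 2008),
       ('Ann', 'Lee', '1980-01-01', 'ann@example.com', 1, 2009),
       ('Ann', 'Lee', '1980-01-01', 'ann@example.com', 3, 2010),
       ('Ann', 'Lee', '1980-01-01', 'ann@example.com', 4, 2011),
       ('Ann', 'Lee', '1980-01-01', 'ann@example.com', 1, 2012),
       ('Bob', 'Ray', '1975-06-15', 'bob@example.com', 1, 2007),
       ('Bob', 'Ray', '1975-06-15', 'bob@example.com', 1, 2008),
       ('Bob', 'Ray', '1975-06-15', 'bob@example.com', 9, 2009), -- not one of the four events
       ('Bob', 'Ray', '1975-06-15', 'bob@example.com', 1, 2010),
       ('Bob', 'Ray', '1975-06-15', 'bob@example.com', 1, 2011),
       ('Bob', 'Ray', '1975-06-15', 'bob@example.com', 1, 2012);

-- A registrant qualifies only if all six years (2007 through 2012) are present
-- among their registrations for events 1 to 4.
SELECT FirstName, LastName, DOB, MAX(EmailAddress) AS EmailAddress
FROM @RegTable
WHERE EventId IN (1, 2, 3, 4)
  AND EventYear BETWEEN 2007 AND 2012
GROUP BY FirstName, LastName, DOB
HAVING COUNT(DISTINCT EventYear) = 6;
-- With the rows above, only Ann Lee has all six years; Bob's 2009 row is filtered out.
```

MAX(EmailAddress) is only there to return one address per person; if a registrant used several addresses over the years, a different rule for picking one may be more appropriate.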

I honestly didn't understand the question, so I'm just rewriting your query (assuming that it does what you need):
SELECT RT2011.FirstName, RT2011.LastName, RT2011.EmailAddress
FROM RegTable RT2011
INNER JOIN RegTable RT2012
ON RT2011.FirstName = RT2012.FirstName
AND RT2011.LastName = RT2012.LastName
AND RT2011.DOB = RT2012.DOB
WHERE RT2011.EventId IN (1,2,3,4)
AND RT2012.EventId IN (1,2,3,4)
AND RT2011.EventYear = 2011
AND RT2012.EventYear = 2012

Related

Combining rows with overlapping dates in T-SQL

I have some data similar to the below:
Base data
Student Start Date End Date Course
John 01-Jan-20 30-Sep-20 Business
John 01-Jan-20 30-Dec-20 Psychology
John 01-Oct-20 NULL Music
Jack 01-Feb-20 30-Sep-20 Business
Jack 01-Apr-20 30-Nov-20 Music
I want to transform the data so I have a row for each student, for each time period, with a concatenated list of courses, i.e.
Target output
Student Start Date End Date Course
John 01-Jan-20 30-Sep-20 Business, Psychology
John 01-Oct-20 30-Dec-20 Psychology, Music
John 01-Jan-21 NULL Music
Jack 01-Feb-20 31-Mar-20 Business
Jack 01-Apr-20 30-Sep-20 Business, Music
Jack 01-Oct-20 30-Nov-20 Music
I have a script that works if the dates are identical, using STUFF on the course field and grouping on student/dates (code below). But I can't work out how to handle the overlapping dates?
Select Student,
Courses =
STUFF((select ',' + course
from Table1 b
where a.student = b.student
for XML PATH('')
),1,1,''
)
from table1 a
Group by student
This is a little long-winded, as you need to get the groups for the dates. As the dates don't overlap, you then need to eliminate some of the groupings too, so it takes a couple of sweeps.
I use CTEs to get the groups I need, and then use a subquery to aggregate the strings (on SQL Server 2017 or later you can use STRING_AGG and avoid the second scan of the table). This ends up with this:
WITH YourTable AS(
SELECT *
FROM (VALUES('John',CONVERT(date,'01-Jan-20'),CONVERT(date,'30-Sep-20'),'Business'),
('John',CONVERT(date,'01-Jan-20'),CONVERT(date,'30-Dec-20'),'Psychology'),
('John',CONVERT(date,'01-Oct-20'),CONVERT(date,NULL),'Music'),
('Jack',CONVERT(date,'01-Feb-20'),CONVERT(date,'30-Sep-20'),'Business'),
('Jack',CONVERT(date,'01-Apr-20'),CONVERT(date,'30-Nov-20'),'Music'))V(Student,StartDate,EndDate,Course)),
Dates AS(
SELECT DISTINCT V.Student, V.[Date]
FROM YourTable YT
CROSS APPLY (VALUES(YT.Student,YT.StartDate),
(YT.Student,YT.EndDate)) V(Student,[Date])),
Islands AS(
SELECT *,
LEAD(ISNULL([Date],'99991231')) OVER (PARTITION BY Student ORDER BY ISNULL([Date],'99991231')) AS NextDate
FROM Dates
WHERE [Date] IS NOT NULL),
Groups AS(
SELECT I.Student,
I.Date AS StartDate,
CASE DATEPART(DAY,I.NextDate) WHEN 1 THEN DATEADD(DAY, -1, I.NextDate) ELSE I.NextDate END AS EndDate,
STUFF((SELECT ',' + YT.Course
FROM YourTable YT
WHERE YT.Student = I.Student
AND YT.StartDate <= I.[Date]
AND (YT.EndDate >= I.NextDate OR YT.EndDate IS NULL)
ORDER BY YT.Course
FOR XML PATH(''),TYPE).value('(./text())[1]','nvarchar(MAX)'),1,1,'') AS Courses
FROM Islands I)
SELECT Student,
StartDate,
EndDate,
Courses
FROM Groups
WHERE ([StartDate] != EndDate OR EndDate IS NULL)
AND Courses IS NOT NULL
ORDER BY Student DESC,
StartDate ASC;
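On SQL Server 2017 or later, the string-aggregation step mentioned in this answer could be sketched with STRING_AGG instead of STUFF and FOR XML PATH. The inline rows below reuse the sample students and courses but ignore the dates, so this shows only the aggregation, not the splitting into periods.

```sql
-- Requires SQL Server 2017 or later (STRING_AGG).
-- Illustrative rows only: the dates from the sample data are left out on purpose.
SELECT V.Student,
       STRING_AGG(V.Course, ', ') WITHIN GROUP (ORDER BY V.Course) AS Courses
FROM (VALUES ('John', 'Business'),
             ('John', 'Psychology'),
             ('John', 'Music'),
             ('Jack', 'Business'),
             ('Jack', 'Music')) AS V(Student, Course)
GROUP BY V.Student;
```

WITHIN GROUP (ORDER BY ...) fixes the order of the courses inside each list, which the FOR XML PATH version gets from the ORDER BY in its subquery.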

SQL Server Group rows with multiple occurences of Group BY columns

I am trying to summarize a dataset and get the minimum and maximum date for each group. However, a group can exist multiple times if there is a gap. Here is sample data:
CREATE TABLE temp (
id int,
FIRSTNAME nvarchar(50),
LASTNAME nvarchar(50),
STARTDATE datetime2(7),
ENDDATE datetime2(7)
)
INSERT into temp values(1,'JOHN','SMITH','2013-04-02','2013-05-31')
INSERT into temp values(2,'JOHN','SMITH','2013-05-31','2013-10-31')
INSERT into temp values(3,'JANE','DOE','2013-10-31','2016-07-19')
INSERT into temp values(4,'JANE','DOE','2016-07-19','2016-08-11')
INSERT into temp values(5,'JOHN','SMITH','2016-08-11','2017-02-01')
INSERT into temp values(6,'JOHN','SMITH','2017-02-01','9999-12-31')
I am looking to summarize the data as follows:
JOHN SMITH 2013-04-02 2013-10-31
JANE DOE 2013-10-31 2016-08-11
JOHN SMITH 2016-08-11 9999-12-31
A "group by" will combine the two John Smith records together with the incorrect min and max dates.
Any help is appreciated.
Thanks.
As JNevill pointed out, this is a classic Gaps and Islands problem. Below is one solution using Row_Number().
Select FirstName
,LastName
,StartDate=min(StartDate)
,EndDate =max(EndDate)
From (
Select *
,Grp = Row_Number() over (Order by ID) - Row_Number() over (Partition By FirstName,LastName Order by EndDate)
From Temp
) A
Group By FirstName,LastName,Grp
Order By min(StartDate)
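To see why the difference of the two Row_Number() values identifies each island, it can help to look at the intermediate numbers. The sketch below loads the question's sample rows into a table variable (the name @temp is an assumption) and shows both row numbers next to their difference.

```sql
-- Sample rows from the question, loaded into a table variable for illustration.
DECLARE @temp TABLE (
    id        int,
    FIRSTNAME nvarchar(50),
    LASTNAME  nvarchar(50),
    STARTDATE datetime2(7),
    ENDDATE   datetime2(7)
);

INSERT INTO @temp (id, FIRSTNAME, LASTNAME, STARTDATE, ENDDATE)
VALUES (1, 'JOHN', 'SMITH', '2013-04-02', '2013-05-31'),
       (2, 'JOHN', 'SMITH', '2013-05-31', '2013-10-31'),
       (3, 'JANE', 'DOE',   '2013-10-31', '2016-07-19'),
       (4, 'JANE', 'DOE',   '2016-07-19', '2016-08-11'),
       (5, 'JOHN', 'SMITH', '2016-08-11', '2017-02-01'),
       (6, 'JOHN', 'SMITH', '2017-02-01', '9999-12-31');

SELECT id,
       FIRSTNAME,
       LASTNAME,
       ROW_NUMBER() OVER (ORDER BY id) AS OverallRn,
       ROW_NUMBER() OVER (PARTITION BY FIRSTNAME, LASTNAME ORDER BY ENDDATE) AS NameRn,
       ROW_NUMBER() OVER (ORDER BY id)
         - ROW_NUMBER() OVER (PARTITION BY FIRSTNAME, LASTNAME ORDER BY ENDDATE) AS Grp
FROM @temp
ORDER BY id;
-- id 1, 2 (JOHN SMITH): OverallRn 1, 2; NameRn 1, 2; Grp 0
-- id 3, 4 (JANE DOE):   OverallRn 3, 4; NameRn 1, 2; Grp 2
-- id 5, 6 (JOHN SMITH): OverallRn 5, 6; NameRn 3, 4; Grp 2
```

Within one contiguous run both counters increase together, so their difference stays constant; a gap advances only the overall counter, which shifts Grp. Jane Doe and the second John Smith run both get Grp 2, which is why the outer query groups by the name columns as well as Grp.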
Please try the following...
SELECT firstName,
lastName,
MIN( startDate ) AS earliestStartDate,
MAX( endDate ) AS latestEndDate
FROM temp
GROUP BY firstName,
lastName;
This statement uses the GROUP BY clause to group the records by firstName and lastName combination. It then returns the firstName and lastName for each group, along with the earliest startDate for that group (via MIN()) and the latest endDate for that group (via MAX()). Note that it produces one row per name, so non-contiguous runs for the same person, such as the two JOHN SMITH periods in the sample data, are merged into a single row.
If you have any questions or comments, then please feel free to post a Comment accordingly.

Convert rows to columns in MS SQL

I'm looking for an efficient way to convert rows to columns in MS SQL server.
Example DB Table:
**ID PersonID Person201Code Person201Value**
1 1 CurrentIdNo 0556
2 1 FirstName Queency
3 1 LastName Sablan
The query result should be like this:
**CurrentIdNo FirstName LastName**
0556 Queency Sablan
I tried using PIVOT, but it only returns NULL for the row values:
SELECT CurrentIdNo, FirstName, LastName
FROM
(
SELECT ID, PersonId, Person201Code, Person201Value
FROM HRPerson201
) src
PIVOT
(
MAX (ID)
FOR Person201Code in (CurrentIdNo, Firstname, LastName))
pvt;
How can I successfully convert rows to columns in MS SQL server?
Thanks!
Remove ID from the pivot source query and use Person201Value as the pivot aggregate instead of ID:
SELECT CurrentIdNo,
FirstName,
LastName
FROM (SELECT PersonId,
Person201Code,
Person201Value
FROM HRPerson201) src
PIVOT ( Max (Person201Value)
FOR Person201Code IN (CurrentIdNo,
Firstname,
LastName)) pvt;
SQLFIDDLE DEMO
SELECT *
FROM
(SELECT personid,Person201Code,Person201Value
FROM #pivot) Sales
PIVOT(max(Person201Value)
FOR Person201Code in (CurrentIdNo, Firstname, LastName))
AS PivotSales;
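An alternative to PIVOT, not used in the answers above, is conditional aggregation with MAX(CASE ...). The sketch below assumes the question's three rows in a table variable named @HRPerson201 with assumed varchar columns.

```sql
-- Hypothetical stand-in for the HRPerson201 table; column types are assumed.
DECLARE @HRPerson201 TABLE (
    ID             int,
    PersonID       int,
    Person201Code  varchar(50),
    Person201Value varchar(50)
);

INSERT INTO @HRPerson201 (ID, PersonID, Person201Code, Person201Value)
VALUES (1, 1, 'CurrentIdNo', '0556'),
       (2, 1, 'FirstName',   'Queency'),
       (3, 1, 'LastName',    'Sablan');

-- One output row per person; each CASE picks out the value for one code,
-- and MAX collapses the NULLs from the non-matching rows.
SELECT PersonID,
       MAX(CASE WHEN Person201Code = 'CurrentIdNo' THEN Person201Value END) AS CurrentIdNo,
       MAX(CASE WHEN Person201Code = 'FirstName'   THEN Person201Value END) AS FirstName,
       MAX(CASE WHEN Person201Code = 'LastName'    THEN Person201Value END) AS LastName
FROM @HRPerson201
GROUP BY PersonID;
```

As with the corrected PIVOT, ID is left out of the grouping so that the three source rows collapse into one row per PersonID.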

SQL query to count non unique field

I'm trying to figure out how to add a count of the email field to the query at the end. The problem is that some of the required data is unique (e.g. ID, DateTime), but the email, which is what I want a count of, is not. I just can't figure out how to do it in one SQL statement.
e.g. Return:-
101, bla, prd, test@test.com, alfred, comp, test, 2015-10-10 10:10:10, 2 <-- count
100, bla, prd, test@test.com, alfred, comp, test, 2015-09-10 10:11:10, 2
099, bla, prd, anoter@email.com, simpson, comp, test, 2014-10-10 10:10:10, 1
098, bla, prd, bla@email.com, henry, comp, test, 2014-05-10 10:10:10, 1
Query
select TOP 200
ID,
FromPage,
Product,
Email,
Name,
Company,
Industry,
DateTime,
(count code here as EmailTotal)
from InstallEmails
WHERE product like 'prd%'
ORDER BY ID DESC
If you are using SQL Server 2005 or later, you can use a window function
SELECT TOP 200
ID,
FromPage,
Product,
Email,
Name,
Company,
Industry,
DateTime,
EmailTotal = COUNT(*) OVER(PARTITION BY Email)
FROM InstallEmails
WHERE product like 'prd%'
ORDER BY ID DESC;
For earlier versions, you will need to use a subquery:
SELECT TOP 200
ID,
FromPage,
Product,
Email,
Name,
Company,
Industry,
DateTime,
EmailTotal = ( SELECT COUNT(*)
FROM ( SELECT TOP 200 Email
FROM InstallEmails
WHERE product like 'prd%'
ORDER BY id DESC
) AS ie2
WHERE ie2.Email = ie.Email
)
FROM InstallEmails AS ie
WHERE product like 'prd%'
ORDER BY ID DESC;
worked it out.. appears to be ok.
select TOP 200 *, (select COUNT(email) from InstallEmails where email = t.email) as EmailTotal
from InstallEmails as t
where product like 'prd%'
ORDER BY ID DESC
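The queries in this thread do not all count over the same set of rows. A small sketch with made-up data (the table variable @InstallEmails and its rows are assumptions, trimmed to three columns) shows the window-function version: the count is computed over every row that passes the WHERE clause, before TOP trims the result.

```sql
-- Hypothetical, trimmed-down version of InstallEmails for illustration.
DECLARE @InstallEmails TABLE (
    ID      int,
    Product varchar(20),
    Email   varchar(100)
);

INSERT INTO @InstallEmails (ID, Product, Email)
VALUES (1, 'prd1',  'a@example.com'),
       (2, 'prd1',  'a@example.com'),
       (3, 'prd2',  'a@example.com'),
       (4, 'other', 'a@example.com');

-- Window aggregates are evaluated after WHERE but before TOP,
-- so both returned rows (ID 3 and ID 2) report EmailTotal = 3.
SELECT TOP 2
       ID,
       Email,
       EmailTotal = COUNT(*) OVER (PARTITION BY Email)
FROM @InstallEmails
WHERE Product LIKE 'prd%'
ORDER BY ID DESC;
```

On the same data, a correlated subquery restricted to the TOP 2 rows would give 2, and a correlated subquery with no product filter would give 4, so it is worth deciding which population the count should describe.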

MSSQL: Joining multiple tables with dynamic values

I'm struggling to get following 3 tables into one query:
tPerson
ID FirstName
1 'Jack'
2 'Liz'
tAttribute
ID AttributeName
101 'LastName'
102 'Gender'
tData
PersonID AttributeID AttributeValue
1 101 'Nicholson'
1 102 'Male'
2 101 'Taylor'
2 102 'Female'
Important: The attributes in tAttribute are dynamic. There could be more, e.g.:
ID AttributeName
103 'Income'
104 'MostPopularMovie'
Question: How can I write my query (or queries if necessary), so that I get the following output:
PersonID FirstName LastName Gender [otherFields]
1 'Jack' 'Nicholson' 'Male' [otherValues]
2 'Liz' 'Taylor' 'Female' [otherValues]
I often read "What have you tried so far?", but posting all my failed attempts using subqueries and joins wouldn't make much sense. I'm just not that secure with SQL.
Many thanks in advance.
Thanks to @Tab Alleman, I googled "SQL PIVOT" and came up with the following result:
SELECT PersonID,
FirstName,
[LastName],
[Gender]
FROM (
SELECT tPerson.ID AS PersonID,
tPerson.FirstName,
tAttribute.AttributeName,
tData.AttributeValue
FROM tAttribute
INNER JOIN tData ON (
tAttribute.ID = tData.AttributeID
)
INNER JOIN tPerson ON (
tData.PersonID = tPerson.ID
)
) AS unPivotResult
PIVOT (
MAX(AttributeValue)
FOR AttributeName IN ([LastName],[Gender])
) AS pivotResult
Addition: I didn't know how to get LastName and Gender dynamically via SQL, so I did that with ColdFusion, which I use for programming. It will look like this:
<!--- "local.attributes" gets generated by making another query,--->
<!--- I just wrote it statically here for this example --->
<cfset local.attributes = "[LastName],[Gender]" />
<cfquery name="local.persons">
SELECT PersonID,
FirstName,
#local.attributes#
FROM (
...
) AS unPivotResult
PIVOT (
MAX(AttributeValue)
FOR AttributeName IN (#local.attributes#)
) AS pivotResult
</cfquery>
It'd be cool, if I could replace the ColdFusion part with something like
SELECT AttributeName FROM tAttribute and then use that to get the brackets-definition.
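One way this might be done entirely in T-SQL is to build the bracketed column list from tAttribute and run the PIVOT as dynamic SQL. The following is a minimal sketch under assumptions: the temp tables #tPerson, #tAttribute and #tData stand in for the real tables, their column types are guessed, and the variables @cols and @sql are hypothetical names.

```sql
-- Temp-table stand-ins for tPerson, tAttribute and tData; column types are assumed.
CREATE TABLE #tPerson    (ID int, FirstName nvarchar(50));
CREATE TABLE #tAttribute (ID int, AttributeName nvarchar(50));
CREATE TABLE #tData      (PersonID int, AttributeID int, AttributeValue nvarchar(100));

INSERT INTO #tPerson    VALUES (1, 'Jack'), (2, 'Liz');
INSERT INTO #tAttribute VALUES (101, 'LastName'), (102, 'Gender');
INSERT INTO #tData      VALUES (1, 101, 'Nicholson'), (1, 102, 'Male'),
                               (2, 101, 'Taylor'),    (2, 102, 'Female');

DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- Build "[LastName],[Gender]" from the attribute table.
-- QUOTENAME adds the brackets and escapes any ] inside a name.
SELECT @cols = STUFF((SELECT ',' + QUOTENAME(AttributeName)
                      FROM #tAttribute
                      ORDER BY ID
                      FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 1, '');

-- Splice the column list into the PIVOT query from the answer above.
SET @sql = N'
SELECT PersonID, FirstName, ' + @cols + N'
FROM (
    SELECT p.ID AS PersonID, p.FirstName, a.AttributeName, d.AttributeValue
    FROM #tAttribute a
    INNER JOIN #tData d ON a.ID = d.AttributeID
    INNER JOIN #tPerson p ON d.PersonID = p.ID
) AS unPivotResult
PIVOT (
    MAX(AttributeValue)
    FOR AttributeName IN (' + @cols + N')
) AS pivotResult
ORDER BY PersonID;';

EXEC sys.sp_executesql @sql;

DROP TABLE #tData, #tAttribute, #tPerson;
```

Temp tables are used here rather than table variables because dynamic SQL run through sp_executesql cannot see table variables declared in the calling batch. Adding a row such as (103, 'Income') to the attribute table would add an Income column to the result without touching the query text.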
