Understanding GROUP BY - sql-server

I have a table in SQL Server where personnel entries and exits are recorded.
There can be several entry or exit times on the same day. What I need is to retrieve the first entry and the last exit of each day.
Date        hour   Clock
------------------------
01/01/2017  09:00  1
01/01/2017  11:30  2
01/01/2017  17:00  2
02/01/2017  7:59   1
02/01/2017  16:00  1
I have this SQL query that works correctly.
SELECT
d.Date,
MIN(d.hour) as Entry,
MAX(dt.hour) as [Exit] -- EXIT is a reserved word in T-SQL, so it must be bracketed
FROM
#temp1 AS d
LEFT JOIN
#temp1 AS dt ON d.Date = dt.Date
GROUP BY
d.Date
ORDER BY
Date DESC
But if I add two more columns to the query:
SELECT
d.Date,
d.clock as ClockEntry, -- additional (non-aggregated) column to display
MIN(d.hour) as Entry,
dt.clock as ClockExit, -- additional (non-aggregated) column to display
MAX(dt.hour) as [Exit]
FROM
#temp1 AS d
LEFT JOIN
#temp1 AS dt ON d.Date = dt.Date
GROUP BY
d.Date
ORDER BY
Date DESC
I get this error:
Column '#temp1.clock' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I only need to group by the "Date" field; I do not want to add more columns to the GROUP BY. How can I solve this?
I want this result:
Date        ClockEntry  Entry  ClockExit  Exit
----------------------------------------------
01/01/2017  1           09:00  2          17:00
02/01/2017  1           7:59   1          16:00

There is an easy way to do this: use two ranking functions:
Partitioned by date, ordered by hour ascending
Partitioned by date, ordered by hour descending
The rows can then be joined where both rankings equal 1 and the dates match.
I tend to use a CTE for this:
with temp2 (MinId, MaxId, Date, Hour, Clock)
AS
(
select ROW_NUMBER() Over (partition by date order by hour),
ROW_NUMBER() Over (partition by date order by hour desc),
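* -- assumes #temp1 columns are in the order Date, hour, Clock
from #temp1
)
select distinct
d1.Date,
d1.Clock as ClockEntry,
d1.Hour as Entry,
d2.Clock as ClockExit,
d2.Hour as [Exit]
FROM temp2 d1
LEFT JOIN temp2 d2
ON d1.Date = d2.Date -- dates match
AND d1.MinId = d2.MaxId -- MinId = 1 is the earliest record, MaxId = 1 is the latest
WHERE d1.MinId = 1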
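A self-contained sketch of the same ranking idea without the self-join: both ROW_NUMBER values are computed once, and conditional aggregation keeps only the rank-1 values for each date. It assumes the sample dates are dd/mm/yyyy and that the columns are date, time and int; the DROP/CREATE of #temp1 is only there so the script runs on its own.

```sql
-- Sample data from the question (dd/mm/yyyy dates and column types are assumptions)
IF OBJECT_ID('tempdb..#temp1') IS NOT NULL DROP TABLE #temp1;
CREATE TABLE #temp1 ([Date] date, [hour] time(0), Clock int);
INSERT INTO #temp1 ([Date], [hour], Clock) VALUES
    ('2017-01-01', '09:00', 1),
    ('2017-01-01', '11:30', 2),
    ('2017-01-01', '17:00', 2),
    ('2017-01-02', '07:59', 1),
    ('2017-01-02', '16:00', 1);

WITH ranked AS (
    -- Number each day's rows by time, once in each direction
    SELECT [Date], [hour], Clock,
           ROW_NUMBER() OVER (PARTITION BY [Date] ORDER BY [hour] ASC)  AS FirstRn,
           ROW_NUMBER() OVER (PARTITION BY [Date] ORDER BY [hour] DESC) AS LastRn
    FROM #temp1
)
-- One row per day: keep the values from the first row and from the last row
SELECT [Date],
       MAX(CASE WHEN FirstRn = 1 THEN Clock  END) AS ClockEntry,
       MAX(CASE WHEN FirstRn = 1 THEN [hour] END) AS Entry,
       MAX(CASE WHEN LastRn  = 1 THEN Clock  END) AS ClockExit,
       MAX(CASE WHEN LastRn  = 1 THEN [hour] END) AS [Exit]
FROM ranked
GROUP BY [Date]
ORDER BY [Date] DESC;
```

Because every day has exactly one row with FirstRn = 1 and one with LastRn = 1, MAX here just picks that single non-NULL value; it is not comparing clocks.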

GROUP BY Method
If you know that only a single clock value can come out for each date,
just aggregate it with MAX or MIN, for example like this:
SELECT
d.Date,
MIN(d.clock) as ClockEntry,
MIN(d.hour) as Entry,
MAX(dt.clock) as ClockExit,
MAX(dt.hour) as [Exit]
FROM #temp1 AS d
LEFT JOIN #temp1 AS dt
ON d.Date= dt.Date
GROUP BY d.Date
order by Date desc
Or, if you have multiple clock values and want to see them all,
add them to the GROUP BY clause:
SELECT
d.Date,
d.clock as ClockEntry,
MIN(d.hour) as Entry,
dt.clock as ClockExit,
MAX(dt.hour) as [Exit]
FROM #temp1 AS d
LEFT JOIN #temp1 AS dt
ON d.Date= dt.Date
GROUP BY d.Date, d.clock, dt.clock
order by Date desc
ORDER BY Method
Use a cursor or a Common Table Expression over each date to get both the first entry and the last exit.
First entry for a given date
SELECT TOP 1 d.date, d.clock as ClockEntry, d.hour as Entry
FROM #temp1 AS d
WHERE d.date = @myDate
ORDER BY d.hour ASC
Last exit for a given date
SELECT TOP 1 d.date, d.clock as ClockExit, d.hour as [Exit]
FROM #temp1 AS d
WHERE d.date = @myDate
ORDER BY d.hour DESC
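Instead of a cursor, the same two TOP 1 lookups can be run for every date in one set-based statement with CROSS APPLY. This is a sketch that assumes the question's #temp1 layout (date, time and int columns, dd/mm/yyyy sample dates); the setup lines only make it runnable on its own.

```sql
-- Sample data from the question (dd/mm/yyyy dates and column types are assumptions)
IF OBJECT_ID('tempdb..#temp1') IS NOT NULL DROP TABLE #temp1;
CREATE TABLE #temp1 ([Date] date, [hour] time(0), Clock int);
INSERT INTO #temp1 ([Date], [hour], Clock) VALUES
    ('2017-01-01', '09:00', 1), ('2017-01-01', '11:30', 2), ('2017-01-01', '17:00', 2),
    ('2017-01-02', '07:59', 1), ('2017-01-02', '16:00', 1);

-- For each distinct date, CROSS APPLY runs the two TOP (1) lookups as correlated subqueries
SELECT d.[Date],
       fe.Clock  AS ClockEntry,
       fe.[hour] AS Entry,
       le.Clock  AS ClockExit,
       le.[hour] AS [Exit]
FROM (SELECT DISTINCT [Date] FROM #temp1) AS d
CROSS APPLY (SELECT TOP (1) t.Clock, t.[hour]
             FROM #temp1 AS t
             WHERE t.[Date] = d.[Date]
             ORDER BY t.[hour] ASC) AS fe   -- first entry of the day
CROSS APPLY (SELECT TOP (1) t.Clock, t.[hour]
             FROM #temp1 AS t
             WHERE t.[Date] = d.[Date]
             ORDER BY t.[hour] DESC) AS le  -- last exit of the day
ORDER BY d.[Date] DESC;
```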
References:
GROUP BY Documentation
AGGREGATE FUNCTIONS

Related

Eliminate count that comes from the left join and not the sales table?

I need the number of sales per month. If I group @sales by month, the first query below returns one sale in February and 2 sales in April. This is correct.
Since this will be used as a running total later on, I need a count for every month, even if it is 0. So I do a LEFT JOIN with the @months table so that every month is returned regardless of the count.
declare @sales table
(
salesdate date
)
insert into @sales
select '2/20/2021' union
select '4/15/2021' union
select '4/20/2021'
/* This result is correct: it returns two rows, one sale in February and 2 sales in April */
select month(salesdate) as monthnumber, count(*) as salesTotal from
@sales
group by month(salesdate)
order by month(salesdate)
declare @months table
(
monthnumber int,
monthname varchar(10)
)
insert into @months
select 1, 'January' union
select 2, 'February' union
select 3, 'March' union
select 4, 'April' union
select 5, 'May'
select m.monthname, m.monthnumber, count(*) as TotalSales
from @months m left join @sales s on
m.monthnumber = month(salesdate)
group by m.monthname, m.monthnumber
The last query returns the following result:
monthname  monthnumber  TotalSales
January    1            1
February   2            1   // This is correct
March      3            1
April      4            2   // This is correct
May        5            1
The problem with this result is that the LEFT JOIN always produces at least one row per month, so count(*) is never 0, even when the month has no sales.
How can I exclude the row count if there aren't any sales for the month?
Instead of COUNT(*), you need an aggregation that distinguishes between non-NULL and NULL values from @sales: when the LEFT JOIN finds no match because there are no sales in a given month, the @sales columns are NULL. Try something like this:
select m.monthname, m.monthnumber, sum(case when s.salesdate is not null then 1 else 0 end) as TotalSales
from @months m
left join @sales s
on m.monthnumber = month(salesdate)
group by m.monthname, m.monthnumber
The SUM() combined with the CASE expression only tallies sales that match, as opposed to always counting the single NULL-extended row that the LEFT JOIN produces when there is no match.
A much simpler solution than @pwang's (which is very useful in some situations) is to just use COUNT on a column.
COUNT on a column only counts non-NULL values, so we can specify a column from the sales table, which will be NULL if there is no join result:
select
m.monthname,
m.monthnumber,
count(s.salesdate) as TotalSales
from @months m
left join @sales s
on m.monthnumber = month(salesdate)
group by m.monthname, m.monthnumber
The counted column must be a NOT NULL column in the sales table; otherwise genuine NULLs in that column would also be skipped and the results could be incorrect.
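A minimal script that puts COUNT(*) and COUNT(s.salesdate) side by side on the question's data, so the difference is visible. It uses table variables and ISO date literals (assumptions for portability), and declares salesdate NOT NULL as the note above requires.

```sql
-- Sample data from the question
DECLARE @sales TABLE (salesdate date NOT NULL);
INSERT INTO @sales VALUES ('2021-02-20'), ('2021-04-15'), ('2021-04-20');

DECLARE @months TABLE (monthnumber int, monthname varchar(10));
INSERT INTO @months VALUES
    (1, 'January'), (2, 'February'), (3, 'March'), (4, 'April'), (5, 'May');

SELECT m.monthname,
       m.monthnumber,
       COUNT(*)           AS CountStar,   -- also counts the NULL-extended row, so never 0
       COUNT(s.salesdate) AS TotalSales   -- skips NULLs, so months without sales give 0
FROM @months AS m
LEFT JOIN @sales AS s
       ON m.monthnumber = MONTH(s.salesdate)
GROUP BY m.monthname, m.monthnumber
ORDER BY m.monthnumber;
```

For January, March and May, CountStar is 1 while TotalSales is 0; April shows 2 in both columns.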

Select Earliest Date Within a Range of Dates Before a Break Occurs

I have been trying to find a solution for getting the most recent start date from a series of date ranges. I have found similar topics on StackOverflow as well as other websites, but none of them worked for my specific scenario.
Here are two examples of the data in my database:
Example 1
Start Date | End Date
-----------|-----------
8/26/2006 | 5/31/2016
6/1/2016 | 12/31/2017
1/1/2018 | NULL
For this example, I'm expecting the result of the query to be: 8/26/2006. This is because the start and end dates are continuous all the way back to the original start date.
Example 2
Start Date | End Date
-----------|-----------
7/6/2014 | 11/30/2014
1/1/2019 | NULL
For this example, I'm expecting the result of the query to be: 1/1/2019. This is because there is a break between 11/30/2014 and 1/1/2019.
I don't need a list of all of the dates or even the end dates returned. I just need the earliest start date before a break in the date ranges.
I'm guessing what I need is a recursive CTE to loop through the records, such as this:
WITH CTE AS
(
SELECT
T1.EmployeeID
,T1.StartDate
,T1.EndDate
FROM
ExampleTable AS T1
LEFT JOIN
ExampleTable AS T2
ON
T1.EmployeeID = T2.EmployeeID
AND T1.StartDate - 1 = T2.EndDate
WHERE
T1.EmployeeID = @EmployeeID
UNION ALL
SELECT
C.EmployeeID
,C.StartDate
,T2.EndDate
FROM
CTE AS C
JOIN
ExampleTable AS T2
ON
C.EmployeeID = T2.EmployeeID
AND T2.StartDate - 1 = C.EndDate
)
SELECT
StartDate
,NULLIF(MAX(ISNULL(EndDate, '32121231')), '32121231') AS EndDate
FROM
CTE
GROUP BY
StartDate;
But no luck. It always returns all of the date ranges I listed in examples 1 or 2. Can anyone help please?
This seems the simplest method to get the result:
SELECT TOP 1 StartDate
FROM YourTable
ORDER BY CASE WHEN LAG(EndDate) OVER (ORDER BY StartDate) = DATEADD(DAY,-1,StartDate) THEN 1 ELSE 0 END,
StartDate DESC;
So, for your data:
WITH VTE AS(
SELECT CONVERT(date, StartDate,101) AS StartDate,
CONVERT(date, EndDate,101) AS EndDate
FROM (VALUES('7/6/2014','11/30/2014'),
('1/1/2019',NULL)) V(StartDate, EndDate))
SELECT TOP 1 StartDate
FROM VTE
ORDER BY CASE WHEN LAG(EndDate) OVER (ORDER BY StartDate) = DATEADD(DAY,-1,StartDate) THEN 1 ELSE 0 END,
StartDate DESC;
WITH VTE AS(
SELECT CONVERT(date, StartDate,101) AS StartDate,
CONVERT(date, EndDate,101) AS EndDate
FROM (VALUES('8/26/2006','5/31/2016'),
('6/1/2016','12/31/2017'),
('1/1/2018',NULL)) V(StartDate, EndDate))
SELECT TOP 1 StartDate
FROM VTE
ORDER BY CASE WHEN LAG(EndDate) OVER (ORDER BY StartDate) = DATEADD(DAY,-1,StartDate) THEN 1 ELSE 0 END,
StartDate DESC;
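The query above handles one employee at a time. If ExampleTable holds several employees, a sketch of the same LAG comparison partitioned by EmployeeID could look like this. The @Ranges table variable is hypothetical and simply packs the two examples in as employees 1 and 2; like the answer, it needs SQL Server 2012 or later for LAG.

```sql
-- Hypothetical table: example 1 as employee 1, example 2 as employee 2
DECLARE @Ranges TABLE (EmployeeID int, StartDate date, EndDate date NULL);
INSERT INTO @Ranges VALUES
    (1, '2006-08-26', '2016-05-31'),
    (1, '2016-06-01', '2017-12-31'),
    (1, '2018-01-01', NULL),
    (2, '2014-07-06', '2014-11-30'),
    (2, '2019-01-01', NULL);

WITH flagged AS (
    SELECT EmployeeID,
           StartDate,
           -- 1 when the row does not continue the previous range
           -- (the first row per employee gets 1 too, because LAG returns NULL there)
           CASE WHEN LAG(EndDate) OVER (PARTITION BY EmployeeID ORDER BY StartDate)
                     = DATEADD(DAY, -1, StartDate)
                THEN 0 ELSE 1 END AS StartsNewRun
    FROM @Ranges
)
-- The latest row that starts a new run is the start of the current unbroken run
SELECT EmployeeID, MAX(StartDate) AS CurrentRunStart
FROM flagged
WHERE StartsNewRun = 1
GROUP BY EmployeeID;
```

With this data it returns 2006-08-26 for employee 1 and 2019-01-01 for employee 2, matching the expected results in the question.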

How to get row with last day of month in Sql Server query

Given a table with a single row for each day of the month, how can I query it to get the row for the last day of each month?
Try adapting the following query. The SELECT statement within the IN clause chooses the dates for the outer query to return.
SELECT *
FROM myTable
WHERE DateColumn IN
(
SELECT MAX(DateColumn)
FROM myTable
GROUP BY YEAR(Datecolumn), MONTH(DateColumn)
)
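A variant of the same idea with ROW_NUMBER, which returns the whole row for the latest date in each year and month without a second lookup by date. The @myTable table variable and its sample rows (the dates from the next answer) are assumptions standing in for myTable.

```sql
-- Hypothetical stand-in for myTable
DECLARE @myTable TABLE (ID int, DateColumn date);
INSERT INTO @myTable VALUES
    (1, '2017-02-01'), (2, '2017-02-03'), (3, '2017-02-04'),
    (4, '2017-03-03'), (5, '2017-04-03'), (6, '2017-04-04');

WITH numbered AS (
    SELECT ID, DateColumn,
           -- rn = 1 marks the latest date within each year and month
           ROW_NUMBER() OVER (PARTITION BY YEAR(DateColumn), MONTH(DateColumn)
                              ORDER BY DateColumn DESC) AS rn
    FROM @myTable
)
SELECT ID, DateColumn
FROM numbered
WHERE rn = 1
ORDER BY DateColumn;
```

With this sample it returns the rows with IDs 3, 4 and 6.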
Try the query below:
DECLARE @Dates Table (ID INT, dt DATE)
INSERT INTO @Dates VALUES
(1,'2017-02-01'),
(2,'2017-02-03'),
(3,'2017-02-04'),
(4,'2017-03-03'),
(5,'2017-04-03'),
(6,'2017-04-04')
SELECT MAX(dt) AS LastDay FROM @Dates GROUP BY DATEPART(YEAR,dt), DATEPART(MONTH,dt) ORDER BY LastDay
OUTPUT
LastDay
2017-02-04
2017-03-03
2017-04-04
OR
SELECT DATEPART(MONTH,dt) AS [MONTH], MAX(DATEPART(DAY,dt)) AS LastDay FROM @Dates GROUP BY DATEPART(YEAR,dt), DATEPART(MONTH,dt)
MONTH LastDay
2 4
3 3
4 4
You need to select the rows from your table where YourDateColumn equals the last date of the month that YourDateColumn belongs to:
SELECT *
FROM YourTable
WHERE YourDateColumn = CAST(DATEADD(s,-1,DATEADD(mm, DATEDIFF(m,0, YourDateColumn )+1,0)) AS DATE)
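On SQL Server 2012 and later, EOMONTH returns the last day of the month directly, so the same filter can be written more simply. The sample rows below are hypothetical. Note that this filter only returns a row when the table actually contains the calendar last day of the month, unlike the MAX-based queries above, which return the latest date present.

```sql
-- Hypothetical sample: January is complete up to the 31st, February stops at the 27th
DECLARE @YourTable TABLE (YourDateColumn date);
INSERT INTO @YourTable VALUES ('2017-01-30'), ('2017-01-31'), ('2017-02-01'), ('2017-02-27');

-- EOMONTH returns the last calendar day of the month containing the given date
SELECT YourDateColumn
FROM @YourTable
WHERE YourDateColumn = EOMONTH(YourDateColumn);
```

Here only 2017-01-31 is returned, because 2017-02-28 is not in the table.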

SQL SMS 2008 -Count column ids and count duplicate ids if createddate is greater than 3 months between ids

*Edit (Hopefully to be more clear)
Using the table below, I would like to count the ids, and count duplicate ids only where the createddate has a gap of 3 months or more for that ID.
Query I have so far...
if object_id('tempdb..#temp') is not null
begin drop table #temp end
select
top 100
a.id, a.CreatedDate
into #temp
from tbl a
where 1=1
--and year(CreatedDate) = '2015'
if object_id('tempdb..#temp2') is not null
begin drop table #temp2 end
select t.id, count(t.id) as Total_Cnt
into #temp2
from #temp t
group by id
select distinct #temp2.Total_Cnt, #temp2.id, #temp.CreatedDate, DENSE_RANK() over (partition by #temp.id order by createddate) RK
from #temp2
inner join #temp on #temp2.id = #temp.id
where 1=1
order by Total_Cnt desc
Results:
Total_cnt id createddate rk
3 1 01-01-2015 1
3 1 03-02-2015 2
3 1 01-02-2015 3
2 2 05-01-2015 1
2 2 05-02-2015 2
1 3 06-01-2015 1
1 4 07-01-2015 1
Count ids, and count duplicate ids only when the createddate for that id is more than 3 months after the previous one.
Something like this...
Total_cnt id Countwith3monthgap
3 1 2
2 2 1
1 3 1
1 4 1
You can use a CTE and ROW_NUMBER to get the order, and then self-join the CTE on that order:
WITH cte AS
( SELECT
*,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedDate) Rn
FROM
Test
)
SELECT
c1.ID,
COUNT(CASE WHEN c2.CreatedDate IS NULL THEN 1
WHEN c1.CreatedDate >= DATEADD(month,3,c2.CreatedDate) THEN 1
END)
FROM
cte c1
LEFT JOIN cte c2 ON c1.ID = c2.ID
AND c1.RN = c2.RN + 1
GROUP BY
c1.ID
The query uses a conditional count: a row is counted when the previous CreatedDate is NULL (the first row for that ID) or when the current CreatedDate is >= the previous CreatedDate + 3 months.
If you happen to be using SQL Server 2012+, you can also use LAG here to get the same result:
SELECT
ID,
COUNT(*)
FROM
(SELECT
ID,
CreatedDate CurrentDate,
LAG(CreatedDate) OVER (PARTITION BY ID ORDER BY CreatedDate) PreviousDate
FROM
Test
) T
WHERE
PreviousDate IS NULL
OR CurrentDate >= DATEADD(month, 3, PreviousDate)
GROUP BY
ID
You can use LAG to get the previous date; it returns NULL for the first row of each ID:
SELECT
id,
lag(CreatedDate,1) OVER (PARTITION BY Id ORDER BY CreatedDate) AS PreviousCreateDate,
CreatedDate
FROM #t
You can use that as a subquery and get the difference in months using DATEDIFF
SELECT sub.id, DATEDIFF(month, sub.PreviousCreateDate, sub.CreatedDate) AS MonthGap
FROM (SELECT
id,
lag(CreatedDate,1) OVER (PARTITION BY Id ORDER BY CreatedDate) AS PreviousCreateDate,
CreatedDate
FROM #t) sub
WHERE DATEDIFF(month, sub.PreviousCreateDate, sub.CreatedDate) >= 3
OR sub.PreviousCreateDate IS NULL
You can then take your totals
SELECT sub.id,COUNT(sub.id) as cnt
FROM (SELECT
id,
lag(CreatedDate,1) OVER (PARTITION BY Id ORDER BY CreatedDate) AS PreviousCreateDate,
CreatedDate
FROM #t) sub
WHERE DATEDIFF(month, sub.PreviousCreateDate ,sub.CreatedDate) >=3
OR sub.PreviousCreateDate IS NULL
GROUP BY sub.id
Note that with DATEDIFF, the last day of January counts as three months before the first day of April, because DATEDIFF counts month boundaries crossed rather than elapsed time. That appears to be the logic you were after.
You might want to define your three month gap criteria as
WHERE sub.PreviousCreateDate <= DATEADD(month, -3, sub.CreatedDate)
OR sub.PreviousCreateDate IS NULL
or
WHERE sub.CreatedDate >= DATEADD(month, +3, sub.PreviousCreateDate )
OR sub.PreviousCreateDate IS NULL
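A quick check of how the two definitions differ, using hypothetical dates: DATEDIFF counts month boundaries crossed, while DATEADD shifts the actual date (clamping 31 January + 3 months to 30 April). So 31 January to 1 April counts as three months for DATEDIFF but not for the DATEADD comparison.

```sql
DECLARE @prev date = '2015-01-31',
        @curr date = '2015-04-01';

SELECT DATEDIFF(month, @prev, @curr) AS MonthBoundaries,  -- 3: Jan->Feb, Feb->Mar, Mar->Apr
       DATEADD(month, 3, @prev)       AS PrevPlus3Months,  -- 2015-04-30 (there is no 31 April)
       CASE WHEN @curr >= DATEADD(month, 3, @prev)
            THEN 1 ELSE 0 END         AS AtLeast3Months;   -- 0: 1 April is before 30 April
```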
I'm guessing that your desired definition of three-month gap doesn't coincide with datediff()'s. Most of the logic here is to look back at the previous date and decide if the gap is big enough to qualify.
When datediff() counts a three-month difference, we still need to make sure the day of the month is not earlier than that of the previous date (per the example and ID 5). If the difference is more than three months, then we're good automatically.
But I'm also assuming that you would want to treat the distance from November 30th to February 28th (or 29th in a leap year) as a full three months, because the end date falls on the final day of the month. Adding one day to the end date handles this case easily: it bumps the date into the following month and increases the month difference by one as well. If that's not what you want, just remove the dateadd(day, 1, ...) portion and use only the raw CreatedDate value.
Your sample data is limited, so I'm also assuming that the gaps are measured between consecutive dates. If you want to find runs that don't span more than three months across the set, that's a different problem and you should clarify with more information.
Since you've indicated that you're probably on SQL Server 2008 you'll have to do without the lag() function. Although the first query could be adjusted for that it's likely easier to go with the second approach at the end.
with diffs as (
select
ID,
row_number() over (partition by ID order by CreatedDate) as RN,
case when
datediff(
month,
lag(CreatedDate, 1) over (partition by ID order by CreatedDate),
CreatedDate
) = 3
and
datepart(
day,
lag(CreatedDate, 1) over (partition by ID order by CreatedDate)
) <= datepart(day, CreatedDate)
or
datediff(
month,
lag(CreatedDate, 1) over (partition by ID order by CreatedDate),
/* adding one day to handle gaps like Nov30 - Feb28/29 and Jan31 - Apr30 */
dateadd(day, 1, CreatedDate)
) >= 4
then 1
else 0
end as GapFlag
from <T> /* <--- your table name here */
), gaps as (
select
ID, RN,
sum(1 + GapFlag) over (partition by ID order by RN) as Counter
from diffs
)
select ID, count(distinct Counter - RN) as "Count"
from gaps
group by ID
The rest of the logic is a typical gaps-and-islands scenario: it looks for holes in the sum(1 + GapFlag) sequence, with the offset of 1 acting much like row_number().
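A tiny sketch of just the counting step, with hand-written GapFlag values (hypothetical, not derived from the question's data). Since Counter = RN + running SUM(GapFlag), the value Counter - RN changes only at a flagged gap, so the number of distinct values is the number of islands. The running SUM with ORDER BY needs SQL Server 2012 or later.

```sql
-- Hypothetical GapFlag values: ID 1 has one qualifying gap (at RN 2), ID 2 has none
DECLARE @diffs TABLE (ID int, RN int, GapFlag int);
INSERT INTO @diffs VALUES
    (1, 1, 0), (1, 2, 1), (1, 3, 0),
    (2, 1, 0), (2, 2, 0);

WITH gaps AS (
    SELECT ID, RN,
           -- Counter - RN equals the running total of GapFlag
           SUM(1 + GapFlag) OVER (PARTITION BY ID ORDER BY RN) AS Counter
    FROM @diffs
)
SELECT ID, COUNT(DISTINCT Counter - RN) AS [Count]
FROM gaps
GROUP BY ID;
```

Here ID 1 gives 2 (Counter - RN is 0, 1, 1) and ID 2 gives 1 (0, 0).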
http://sqlfiddle.com/#!6/61b12/3
JamieD77's approach is also valid. I was originally thinking your problem involved more than looking at the rows in sequence. Here's how I would tweak it for the gap definition I've been running with:
with data as (
select ID, CreatedDate, row_number() over (partition by ID order by CreatedDate) as RN
from T
)
select d1.ID, count(*) as "Count"
from data d1 left outer join data d0
on d0.ID = d1.ID and d0.RN = d1.RN - 1 /* connect to the one before */
where
datediff(month, d0.CreatedDate, d1.CreatedDate) = 3
and datepart(day, d0.CreatedDate) <= datepart(day, d1.CreatedDate)
or datediff(month, d0.CreatedDate, dateadd(day, 1, d1.CreatedDate)) >= 4
or d0.ID is null
group by d1.ID
Edit: You have changed the question since yesterday.
Change this line in the first query to include the total count:
...
select count(*) as TotalCnt, ID, count(distinct Counter - RN) as GapCount
...
Second would look like:
with data as (
select ID, CreatedDate, row_number() over (partition by ID order by CreatedDate) as RN
from T
)
select
count(*) as TotalCnt, d1.ID,
count(case when
datediff(month, d0.CreatedDate, d1.CreatedDate) = 3
and datepart(day, d0.CreatedDate) <= datepart(day, d1.CreatedDate)
or datediff(month, d0.CreatedDate, dateadd(day, 1, d1.CreatedDate)) >= 4
or d0.ID is null then 1 end
) as GapCount
from data d1 left outer join data d0
on d0.ID = d1.ID and d0.RN = d1.RN - 1 /* connect to the one before */
group by d1.ID

How to show default value in group by?

I have this query
SELECT DATE,COUNT(DATE) FROM TABLE WHERE DATE BETWEEN DATE1 AND DATE2 GROUP BY DATE
The result looks like this (for dates 1/2/2012 to 3/2/2012, i.e. 3 dates):
1/2/2012 5
2/2/2012 6
3/2/2012 9
If a date has no rows, it does not appear in the result at all, e.g.:
1/2/2012 5
(2/2/2012 is missing)
3/2/2012 9
I want to list all dates, like this:
1/2/2012 5
2/2/2012 0
3/2/2012 9
How to do that?
You can use a CTE to generate a list of dates, and left join on that:
; with Dates as
(
select cast('2012-01-01' as date) as dt
union all
select dateadd(day, 1, dt)
from Dates
where dateadd(day, 1, dt) < '2012-01-06'
)
select d.dt
, count(yt.id)
from Dates d
left join
YourTable yt
on yt.Date = d.Dt
group by
d.dt;
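A parameterized sketch of the same calendar CTE for the range in the question, assuming the dates are dd/mm/yyyy (1 to 3 February 2012). @Events is a hypothetical stand-in for YourTable. Recursive CTEs stop at 100 levels by default, so OPTION (MAXRECURSION 0) is added to allow ranges longer than about 100 days.

```sql
DECLARE @FromDate date = '2012-02-01',
        @ToDate   date = '2012-02-03';

-- Hypothetical stand-in for YourTable: no rows on 2 February
DECLARE @Events TABLE (id int, [Date] date);
INSERT INTO @Events VALUES (1, '2012-02-01'), (2, '2012-02-01'), (3, '2012-02-03');

WITH Dates AS (
    SELECT @FromDate AS dt             -- anchor: first day of the range
    UNION ALL
    SELECT DATEADD(day, 1, dt)         -- recursive step: the next day
    FROM Dates
    WHERE dt < @ToDate
)
SELECT d.dt, COUNT(e.id) AS Cnt        -- COUNT(column) gives 0 for days without rows
FROM Dates AS d
LEFT JOIN @Events AS e ON e.[Date] = d.dt
GROUP BY d.dt
ORDER BY d.dt
OPTION (MAXRECURSION 0);               -- lift the default 100-level recursion limit
```

With this data the counts are 2, 0 and 1 for 1, 2 and 3 February.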
Live example at SQL Fiddle.
