COUNT and COUNT DISTINCT for different groups

For a SQL Server based report, I have this table:
CID Date ID Service Days
1 3/7/2016 1 Individual 3
2 4/5/2016 2 Individual 4
3 5/24/2016 1 Individual 3
4 4/4/2016 4 Group 2
5 4/4/2016 4 Group 2
6 2/18/2016 4 Group 2
7 5/5/2016 5 Group 1
8 5/5/2016 5 Group 1
I used this code:
SELECT
ID,
Service,
COUNT(CASE WHEN Days = 4 THEN 1 END) AS '4Days',
COUNT(CASE WHEN Days = 3 THEN 1 END) AS '3Days',
COUNT(CASE WHEN Days = 2 THEN 1 END) AS '2Days',
COUNT(CASE WHEN Days = 1 THEN 1 END) AS '1Day'
FROM Table T1
GROUP BY
ID,
Service
which gives me this Output:
ID Service 4Days 3Days 2Days 1Day
1 Individual 0 2 0 0
2 Individual 1 0 0 0
4 Group 0 0 3 0
5 Group 0 0 0 2
What I want is to count a Group service once per group, not once for each individual in it. A COUNT(DISTINCT ...) on Date or ID could help me do that, but I don't know how to combine it with the Individual services, which I want to count row by row without DISTINCT. So the desired output is:
ID Service 4Days 3Days 2Days 1Day
1 Individual 0 2 0 0
2 Individual 1 0 0 0
4 Group 0 0 2 0
5 Group 0 0 0 1
I'll edit the post in case I oversimplified the problem since this is dummy data.

Looks like you could use distinct this way if you wanted:
count(distinct
case when Days = 1 then case when Service = 'Group' then 1 else "Date" end end
) as [1Day]
Depending on your indexing it's possible that introducing another column in the query would change the query plan. I suspect that probably isn't the case though.
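As a rough check of the COUNT(DISTINCT CASE ...) idea, here is a minimal sketch that runs a variant of it against the sample data. It uses Python's sqlite3 module in place of SQL Server, and the table name services and the helper column dedup_key are made up for the illustration. It assumes Group rows should be de-duplicated by Date while Individual rows are counted once per CID.

```python
import sqlite3

# Sample rows from the question: (CID, Date, ID, Service, Days).
ROWS = [
    (1, "2016-03-07", 1, "Individual", 3),
    (2, "2016-04-05", 2, "Individual", 4),
    (3, "2016-05-24", 1, "Individual", 3),
    (4, "2016-04-04", 4, "Group", 2),
    (5, "2016-04-04", 4, "Group", 2),
    (6, "2016-02-18", 4, "Group", 2),
    (7, "2016-05-05", 5, "Group", 1),
    (8, "2016-05-05", 5, "Group", 1),
]

# dedup_key is a hypothetical helper column: Group rows share one key per
# date, Individual rows get one key per CID, so DISTINCT only collapses
# the Group rows.
QUERY = """
SELECT ID, Service,
       COUNT(DISTINCT CASE WHEN Days = 4 THEN dedup_key END) AS "4Days",
       COUNT(DISTINCT CASE WHEN Days = 3 THEN dedup_key END) AS "3Days",
       COUNT(DISTINCT CASE WHEN Days = 2 THEN dedup_key END) AS "2Days",
       COUNT(DISTINCT CASE WHEN Days = 1 THEN dedup_key END) AS "1Day"
FROM (
    SELECT ID, Service, Days,
           CASE WHEN Service = 'Group' THEN 'G' || Date
                ELSE 'I' || CID END AS dedup_key
    FROM services
) AS s
GROUP BY ID, Service
ORDER BY ID
"""


def count_services(rows):
    """Load the rows into an in-memory table and run the pivot query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE services "
        "(CID INTEGER, Date TEXT, ID INTEGER, Service TEXT, Days INTEGER)"
    )
    conn.executemany("INSERT INTO services VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

Calling count_services(ROWS) on the sample data should reproduce the desired output from the question. In SQL Server the two CASE branches would need a common data type, so the string concatenation here is only a convenience of the sketch.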

If I am not wrong, the '2Days' count for the 'Group' service type should be 2 when the grouping is based on the 'Date' column. If so, try this:
SELECT
ID,
Service,
CASE WHEN MAX(t.days) = 4 THEN MAX(t.date) ELSE 0 END AS '4Days',
CASE WHEN MAX(t.days) = 3 THEN MAX(t.date) ELSE 0 END AS '3Days',
CASE WHEN MAX(t.days) = 2 THEN MAX(t.date) ELSE 0 END AS '2Days',
CASE WHEN MAX(t.days) = 1 THEN MAX(t.date) ELSE 0 END AS '1Day'
FROM table T1
OUTER APPLY (SELECT days,
COUNT(DISTINCT(date)) date
FROM Table WHERE days = t1.days GROUP BY days) t
GROUP BY id, service
ORDER BY ID

Based on your last edit, this is the most straightforward way I could think of to handle the query:
with cte as (
select id, service, days
from table t1
where service = 'Individual'
union all
select id, service, days
from table t1
where service = 'Group'
group by id, service, days, date
)
select id,
service,
count(case when days = 4 then 'X' end) as [4Days],
count(case when days = 3 then 'X' end) as [3Days],
count(case when days = 2 then 'X' end) as [2Days],
count(case when days = 1 then 'X' end) as [1Day]
from cte
group by id, service
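To check the CTE approach against the sample data, the sketch below runs essentially the same query through Python's sqlite3 module (standing in for SQL Server). The table name services is an assumption made for the example.

```python
import sqlite3

# Sample rows from the question: (CID, Date, ID, Service, Days).
ROWS = [
    (1, "2016-03-07", 1, "Individual", 3),
    (2, "2016-04-05", 2, "Individual", 4),
    (3, "2016-05-24", 1, "Individual", 3),
    (4, "2016-04-04", 4, "Group", 2),
    (5, "2016-04-04", 4, "Group", 2),
    (6, "2016-02-18", 4, "Group", 2),
    (7, "2016-05-05", 5, "Group", 1),
    (8, "2016-05-05", 5, "Group", 1),
]

# Individual rows pass through unchanged; Group rows are collapsed to one
# row per (ID, Service, Days, Date) before the conditional counts run.
QUERY = """
WITH cte AS (
    SELECT ID, Service, Days
    FROM services
    WHERE Service = 'Individual'
    UNION ALL
    SELECT ID, Service, Days
    FROM services
    WHERE Service = 'Group'
    GROUP BY ID, Service, Days, Date
)
SELECT ID, Service,
       COUNT(CASE WHEN Days = 4 THEN 'X' END) AS "4Days",
       COUNT(CASE WHEN Days = 3 THEN 'X' END) AS "3Days",
       COUNT(CASE WHEN Days = 2 THEN 'X' END) AS "2Days",
       COUNT(CASE WHEN Days = 1 THEN 'X' END) AS "1Day"
FROM cte
GROUP BY ID, Service
ORDER BY ID
"""


def count_with_cte(rows):
    """Load the rows into an in-memory table and run the CTE query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE services "
        "(CID INTEGER, Date TEXT, ID INTEGER, Service TEXT, Days INTEGER)"
    )
    conn.executemany("INSERT INTO services VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

On the sample rows, count_with_cte(ROWS) should give 2 for ID 4 under 2Days and 1 for ID 5 under 1Day, matching the desired output.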

Related

SQL Server: add column for rows since value changed

I have a table that contains 3 columns: personID, weeknumber, and event. Event is 0 if there was no event for that person in that week and 1 if there was.
I need to create a new column weekssincelastevent which will be 0 for the week where event=1 and then 1,2,3,4 etc for the weeks afterwards. If there is a later event then it starts from 0 again. E.g.
personID weeknumber event weekssincelastevent
1 1 0 NULL
1 2 0 NULL
1 3 1 0
1 4 0 1
1 5 0 2
1 6 0 3
2 1 0 NULL
2 2 1 0
2 3 0 1
2 4 1 0
2 5 0 1
The column should be NULL before the first event, and all values should be NULL where a personID never has an event.
I can't think how to write this in SQL.
The table has ~600m rows (60m personIDs with 100 weeknumbers each, although some personIDs don't have all the weeknumbers).
Many thanks for any insight.

This is a gaps-and-islands problem. The first part, in the CTE, puts the data into "groups": each time there is an event, a new group starts. It also calculates the number of weeks that passed since the prior row (which is set to 0 for rows hosting an event). Then in the outer query we SUM the weeks passed within each group, giving the number of weeks since the last event:
WITH Groups AS(
SELECT PersonID,
WeekNumber,
Event,
COUNT(CASE Event WHEN 1 THEN 1 END) OVER (PARTITION BY PersonID ORDER BY WeekNumber ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Events,
CASE Event WHEN 0 THEN WeekNumber - LAG(WeekNumber) OVER (PARTITION BY PersonID ORDER BY WeekNumber ASC) ELSE 0 END AS WeeksPassed
FROM dbo.YourTable)
SELECT PersonID,
WeekNumber,
Event,
CASE WHEN Events = 0 THEN NULL
ELSE SUM(WeeksPassed) OVER (PARTITION BY PersonID, Events ORDER BY WeekNumber ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
END AS WeekSinceLastEvent
FROM Groups;
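To make the intermediate columns visible, here is a small sketch that runs the same logic on the question's sample data and returns Events and WeeksPassed alongside the final value. It uses Python's sqlite3 module instead of SQL Server (window functions need SQLite 3.25 or later), and the table name weeks and CTE name grouped are assumptions for the example.

```python
import sqlite3

# Sample rows from the question: (PersonID, WeekNumber, Event).
ROWS = [
    (1, 1, 0), (1, 2, 0), (1, 3, 1), (1, 4, 0), (1, 5, 0), (1, 6, 0),
    (2, 1, 0), (2, 2, 1), (2, 3, 0), (2, 4, 1), (2, 5, 0),
]

QUERY = """
WITH grouped AS (
    SELECT PersonID, WeekNumber, Event,
           -- running count of events: each event starts a new group
           COUNT(CASE Event WHEN 1 THEN 1 END) OVER (
               PARTITION BY PersonID ORDER BY WeekNumber
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Events,
           -- weeks since the previous row, 0 on the event row itself
           CASE Event
               WHEN 0 THEN WeekNumber - LAG(WeekNumber) OVER (
                   PARTITION BY PersonID ORDER BY WeekNumber)
               ELSE 0
           END AS WeeksPassed
    FROM weeks
)
SELECT PersonID, WeekNumber, Event, Events, WeeksPassed,
       CASE WHEN Events = 0 THEN NULL
            ELSE SUM(WeeksPassed) OVER (
                PARTITION BY PersonID, Events ORDER BY WeekNumber
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
       END AS WeeksSinceLastEvent
FROM grouped
ORDER BY PersonID, WeekNumber
"""


def weeks_since_event(rows):
    """Return one tuple per row, including the intermediate CTE columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE weeks (PersonID INTEGER, WeekNumber INTEGER, Event INTEGER)"
    )
    conn.executemany("INSERT INTO weeks VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

Because WeeksPassed is a difference of week numbers rather than a row count, a person with missing weeks still gets the elapsed number of weeks, not the number of rows.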

You can do this with a conditional aggregate within a windowed function:
SELECT t.PersonID,
t.WeekNumber,
t.Event,
WeeksSinceLastEvent = t.WeekNumber - MAX(CASE WHEN t.Event = 1 THEN t.WeekNumber END)
OVER(PARTITION BY t.PersonID ORDER BY t.WeekNumber)
FROM dbo.T AS t;
The key parts are:
CASE WHEN t.Event = 1 THEN t.WeekNumber END - Only consider the week number where there is an event. Since MAX ignores NULLs, this will only consider relevant rows.
OVER (PARTITION BY t.PersonID ORDER BY t.WeekNumber) - Only consider rows for the current person, where the week number is less than or equal to that of the current row.
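The two key parts above can be tried on the question's sample data with the sketch below. It runs the query through Python's sqlite3 module rather than SQL Server (SQLite 3.25 or later is needed for window functions); the table name weeks is an assumption, and the T-SQL alias = expression form is rewritten as expression AS alias for SQLite.

```python
import sqlite3

# Sample rows from the question: (PersonID, WeekNumber, Event).
ROWS = [
    (1, 1, 0), (1, 2, 0), (1, 3, 1), (1, 4, 0), (1, 5, 0), (1, 6, 0),
    (2, 1, 0), (2, 2, 1), (2, 3, 0), (2, 4, 1), (2, 5, 0),
]

# The windowed MAX returns the latest event week seen so far for the
# person, or NULL if there has been none, so the subtraction is NULL
# before the first event.
QUERY = """
SELECT PersonID, WeekNumber, Event,
       WeekNumber - MAX(CASE WHEN Event = 1 THEN WeekNumber END) OVER (
           PARTITION BY PersonID ORDER BY WeekNumber) AS WeeksSinceLastEvent
FROM weeks
ORDER BY PersonID, WeekNumber
"""


def weeks_since_last_event(rows):
    """Return (PersonID, WeekNumber, Event, WeeksSinceLastEvent) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE weeks (PersonID INTEGER, WeekNumber INTEGER, Event INTEGER)"
    )
    conn.executemany("INSERT INTO weeks VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

A person with no events at all gets NULL on every row, since the MAX never sees a non-NULL week number.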

SQL Server Group By Excluding Some Values

I have some records like below:
ID Val Amount
1 0 3
2 0 3
3 0 4
4 1 2
5 1 3
6 2 3
7 2 4
I want to group this data by the column Val and get the sum(amount), but do not group the ones with Val = 0.
The result set I need is like below:
Val Amount
0 3
0 3
0 4
1 5
2 7
I did it two ways, but neither seems to be the best way:
The first one uses unions: first select the rows with Val = 0, then group the ones with Val <> 0, and union the two result sets.
The second one is a little bit better. Let's say the data is in the table #Table:
WITH g AS
(
SELECT Val, Amount, CASE WHEN Val = '0' then Val + ID
else Val END A FROM #table
)
SELECT CASE WHEN A LIKE '0%' THEN 0 ELSE A END AS A, SUM(Amount)
FROM g
GROUP BY A
This also works, but having to concatenate with the ID column (or a row number) and then using a LEFT function to remove it is not a best practice.
So I'm looking for a better approach, both looking better and performing better as well.
I work on SQL Server 2008, but I'm open to any solutions which require newer versions.

The shortest way of doing it is the following:
SELECT Val, SUM(Amount)
FROM mytable
GROUP BY Val, CASE WHEN Val = 0 THEN ID ELSE 0 END
You can also do it using window functions:
;WITH CTE AS (
SELECT ID, Val, Amount,
DENSE_RANK() OVER (PARTITION BY Val
ORDER BY CASE
WHEN Val = 0 THEN ID
ELSE 0
END) AS rank
FROM mytable
)
SELECT Val, SUM(Amount) AS total_amount
FROM CTE
GROUP BY Val, rank
The result set returned by the CTE is:
ID Val Amount rank
--------------------
1 0 3 1
2 0 3 2
3 0 4 3
4 1 2 1
5 1 3 1
6 2 3 1
7 2 4 1
So using rank you can differentiate between 0 and the rest of Val values.
You can use both methods and see how they compare to each other in terms of performance.
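As a quick correctness check (not a performance comparison), the sketch below runs both queries on the sample data and confirms they return the same rows. It uses Python's sqlite3 module in place of SQL Server (SQLite 3.25 or later for DENSE_RANK); the ORDER BY clauses and the names ranked and rnk are additions made so the output is deterministic in the example.

```python
import sqlite3

# Sample rows from the question: (ID, Val, Amount).
ROWS = [(1, 0, 3), (2, 0, 3), (3, 0, 4), (4, 1, 2), (5, 1, 3), (6, 2, 3), (7, 2, 4)]

# Method 1: the extra grouping expression is the ID for Val = 0 rows
# (so each stays in its own group) and a constant for everything else.
GROUP_BY_CASE = """
SELECT Val, SUM(Amount) AS total_amount
FROM mytable
GROUP BY Val, CASE WHEN Val = 0 THEN ID ELSE 0 END
ORDER BY Val, MIN(ID)
"""

# Method 2: DENSE_RANK gives each Val = 0 row its own rank and every
# other row rank 1 within its Val partition.
DENSE_RANK_CTE = """
WITH ranked AS (
    SELECT ID, Val, Amount,
           DENSE_RANK() OVER (
               PARTITION BY Val
               ORDER BY CASE WHEN Val = 0 THEN ID ELSE 0 END) AS rnk
    FROM mytable
)
SELECT Val, SUM(Amount) AS total_amount
FROM ranked
GROUP BY Val, rnk
ORDER BY Val, rnk
"""


def run(query, rows):
    """Load the rows into an in-memory table and run one of the queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (ID INTEGER, Val INTEGER, Amount INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

Both run(GROUP_BY_CASE, ROWS) and run(DENSE_RANK_CTE, ROWS) should return the three ungrouped Val = 0 rows followed by the sums for Val = 1 and Val = 2.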

Use a union here. The top of the below union finds aggregate amounts of values which are not zero, and the bottom brings in the zero value records, not aggregated.
SELECT Val, SUM(Amount) AS Amount
FROM g
WHERE Val <> 0
GROUP BY Val
UNION ALL
SELECT Val, Amount
FROM g
WHERE Val = 0
ORDER BY Val;

Grouping ID while counting specific attribute values

I want to count how many occurrences there are of the value 1 in the attribute Months for each ID in a table.
Here is what I am working with:
ID Months
1000 1
1000 1
1000 2
1001 2
1002 3
1003 1
This is what I would like to have
ID Count(Months=1)
1000 2
1003 1

If you want to count rows for just one month, you can filter with a WHERE clause:
select id,
count(*) as cnt
from your_table
where month = 1
group by id;
If you want to get counts for multiple months in one row (it's called pivoting), you can use conditional aggregation in most of the databases:
select id,
count(case when month = 1 then 1 end) as cnt_month_1,
count(case when month = 2 then 1 end) as cnt_month_2,
count(case when month = 3 then 1 end) as cnt_month_3,
. . .
from your_table
group by id;
Some databases offer a PIVOT operator for this task. For that, you'll need to specify which database you are using.
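Both variants can be tried on the question's data with the sketch below, which runs them through Python's sqlite3 module as a stand-in database. The table name your_table and column name month follow the queries above; the ORDER BY clauses are added only to make the output order predictable.

```python
import sqlite3

# Sample rows from the question: (id, month).
ROWS = [(1000, 1), (1000, 1), (1000, 2), (1001, 2), (1002, 3), (1003, 1)]

# Variant 1: filter first, then count. IDs without month = 1 disappear.
FILTERED = """
SELECT id, COUNT(*) AS cnt
FROM your_table
WHERE month = 1
GROUP BY id
ORDER BY id
"""

# Variant 2: conditional aggregation. COUNT skips the NULLs produced by
# a CASE with no ELSE, so each column counts only its own month.
PIVOTED = """
SELECT id,
       COUNT(CASE WHEN month = 1 THEN 1 END) AS cnt_month_1,
       COUNT(CASE WHEN month = 2 THEN 1 END) AS cnt_month_2,
       COUNT(CASE WHEN month = 3 THEN 1 END) AS cnt_month_3
FROM your_table
GROUP BY id
ORDER BY id
"""


def run(query, rows):
    """Load the rows into an in-memory table and run one of the queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (id INTEGER, month INTEGER)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

Note the difference in shape: the filtered query returns only IDs 1000 and 1003, while the pivoted query returns every ID with zeros where a month is absent.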

How to query records based on row_num and one of the column value?

Rownum Status
1 2
2 1
3 3
4 2
5 3
6 1
The condition is to query the records that appear before the first record with status = 3. In the above scenario the expected output is rownum 1 and 2.
If there is no status = 3, then show everything.
I'm not sure where to start, so I have nothing to show yet.

If you are using SQL Server 2012+, then you can use the window version of SUM with an ORDER BY clause:
SELECT Rownum, Status
FROM (
SELECT Rownum, Status,
SUM(CASE WHEN Status = 3 THEN 1 ELSE 0 END)
OVER
(ORDER BY Rownum) AS s
FROM mytable) t
WHERE t.s = 0
Calculated field s is a running total of Status = 3 occurrences. The query returns all records before the first occurrence of a 3 value.
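The running total can be checked on the sample data with the sketch below, which runs the same query through Python's sqlite3 module instead of SQL Server (SQLite 3.25 or later for window functions). The final ORDER BY is an addition to keep the output order fixed.

```python
import sqlite3

# Sample rows from the question: (Rownum, Status).
ROWS = [(1, 2), (2, 1), (3, 3), (4, 2), (5, 3), (6, 1)]

# s stays 0 until the first Status = 3 row is reached, then is >= 1 for
# that row and every later one, so WHERE s = 0 keeps only earlier rows.
QUERY = """
SELECT Rownum, Status
FROM (
    SELECT Rownum, Status,
           SUM(CASE WHEN Status = 3 THEN 1 ELSE 0 END)
               OVER (ORDER BY Rownum) AS s
    FROM mytable
) AS t
WHERE t.s = 0
ORDER BY Rownum
"""


def rows_before_first_three(rows):
    """Return the rows that precede the first Status = 3 row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (Rownum INTEGER, Status INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

When no row has Status = 3, s is 0 everywhere, so the same query returns every row, which covers the "show everything" case.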

Passing values into CASE statement

Thank you all in advance for your help.
I'm trying to take the results from two separate queries and include them in a third query that has a CASE statement. I've had some success but I'm not able to present the results of the third query in the proper order. The purpose of this is to show the employee count for each department under the different managers. So far I can only load separately the manager names and their departments and employee department count totals by department. What I can't figure out is how to get the manager names in and the employee department count in for each manager row. Below are the two source queries I've used so far and the query with the CASE statement. I've also looked at UNPIVOT function with no success yet.
a) This simple query lists each primary manager name. There are also sub managers that will be returned using a hierarchy query later.
select name from employees "Boss" where employeeid in
('1','5','25','84','85');
b) This query returns the department id count for each main manager ('1','5','25','84','85') as well as all sub-managers.
select departmentid, count(departmentid) COUNT from employees
where departmentid = departmentid and level <= 3
connect by prior employeeid = bossid
start with employeeid = 5
group by departmentid
order by departmentid;
c) Here’s a CASE statement that outputs exactly as desired. The problem here is the select statement currently outputs only the manager names and the manager departments into the columns. What I need to do is output both the manager names and the manager's employee department counts into the individual manager row columns. I've tried to do a separate select of the manager names to get the ‘Boss’ column and another select to include the department counts. But that got messy. Also passing the counts in a second statement would create an additional unwanted column.
select e.name "Boss",
COUNT(CASE WHEN d.departmentid = '1' THEN 1 END) AS "Finance",
COUNT(CASE WHEN d.departmentid = '2' THEN 1 END) AS "HR",
COUNT(CASE WHEN d.departmentid = '3' THEN 1 END) AS "IT",
COUNT(CASE WHEN d.departmentid = '4' THEN 1 END) AS "Marketing",
COUNT(CASE WHEN d.departmentid = '5' THEN 1 END) AS "Sales"
from employees e, departments d
where e.employeeid in (select distinct e.bossid from employees e)
and e.departmentid = d.departmentid (+)
group by e.name
order by e.name;
Boss Finance HR IT Marketing Sales
-------------------- ---------- ---------- ---------- ---------- ----------
Baxter Carney 0 0 0 0 1
Blythe Pierce 0 0 0 0 1
Here's an altered CASE query that loads the employee department counts but unfortunately it loads by department and not by individual manager. That is the problem I'm stuck on right now. How to pass the counts to the right manager and into the right column.
select departmentid "DEPTNO",
COUNT(CASE WHEN departmentid = '1' THEN 1 END) AS "Finance",
COUNT(CASE WHEN departmentid = '2' THEN 1 END) AS "HR",
COUNT(CASE WHEN departmentid = '3' THEN 1 END) AS "IT",
COUNT(CASE WHEN departmentid = '4' THEN 1 END) AS "Marketing",
COUNT(CASE WHEN departmentid = '5' THEN 1 END) AS "Sales"
from employees
where departmentid = departmentid and level <= 3
connect by prior employeeid = bossid
start with employeeid = 5
group by departmentid
order by departmentid
/
DEPTNO Finance HR IT Marketing Sales
3 0 0 1 0 0
5 0 0 0 0 21
And here's for all managers. You can see that it just keeps increasing the individual department count.
DEPTNO Finance HR IT Marketing Sales
1 4 0 0 0 0
2 0 23 0 0 0
3 0 0 20 0 0
4 0 0 0 1 0
5 0 0 0 0 28
