Grouping ID while counting specific attribute values - sql-server

I want to count how many times the value 1 occurs in the attribute Months for each ID in a table.
Here is what I am working with
ID. Months
1000 1
1000 1
1000 2
1001 2
1002 3
1003 1
This is what I would like to have
ID. Count(Months=1)
1000 2
1003 1

If you want to count rows for just one month, you can filter with a WHERE clause:
select id,
       count(*) as cnt
from your_table
where months = 1
group by id;
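As a quick check of the filter-then-group approach, here is a minimal sketch that runs the same query against the sample rows. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server (an assumption made only so the example is runnable); the table name your_table and the column names id and months are placeholders.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
# your_table, id and months are placeholder names, not from a real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, months INTEGER)")
conn.executemany(
    "INSERT INTO your_table (id, months) VALUES (?, ?)",
    [(1000, 1), (1000, 1), (1000, 2), (1001, 2), (1002, 3), (1003, 1)],
)

# Filter first, then count the surviving rows per id.
rows_month_1 = conn.execute(
    """
    SELECT id, COUNT(*) AS cnt
    FROM your_table
    WHERE months = 1
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(rows_month_1)  # [(1000, 2), (1003, 1)]
```

IDs with no matching rows (1001 and 1002 here) do not appear at all, which matches the desired output in the question.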
If you want counts for multiple months in one row (this is called pivoting), you can use conditional aggregation, which works in most databases:
select id,
       count(case when months = 1 then 1 end) as cnt_month_1,
       count(case when months = 2 then 1 end) as cnt_month_2,
       count(case when months = 3 then 1 end) as cnt_month_3,
       . . .
from your_table
group by id;
Some databases offer a PIVOT operator for this task. For that, you'll need to specify which database you are using.
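The conditional aggregation above can be sketched the same way. This assumes SQLite (through Python's sqlite3) as a runnable stand-in and stops at three months; your_table, id and months are placeholder names. COUNT only counts non-NULL values, so a CASE with no ELSE contributes nothing for rows that do not match.

```python
import sqlite3

# Placeholder table with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, months INTEGER)")
conn.executemany(
    "INSERT INTO your_table (id, months) VALUES (?, ?)",
    [(1000, 1), (1000, 1), (1000, 2), (1001, 2), (1002, 3), (1003, 1)],
)

# One output column per month; CASE yields NULL for non-matching rows,
# and COUNT skips NULLs.
pivot_rows = conn.execute(
    """
    SELECT id,
           COUNT(CASE WHEN months = 1 THEN 1 END) AS cnt_month_1,
           COUNT(CASE WHEN months = 2 THEN 1 END) AS cnt_month_2,
           COUNT(CASE WHEN months = 3 THEN 1 END) AS cnt_month_3
    FROM your_table
    GROUP BY id
    ORDER BY id
    """
).fetchall()

for row in pivot_rows:
    print(row)
```

Unlike the WHERE version, every id gets a row here, with zeros where a month never occurs.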

Related

SQL Server: add column for rows since value changed

I have a table that contains 3 columns: personID, weeknumber, and event. Event is 0 if there was no event for that person in that week and 1 if there was.
I need to create a new column weekssincelastevent which will be 0 for the week where event=1 and then 1,2,3,4 etc for the weeks afterwards. If there is a later event then it starts from 0 again. E.g.
personID weeknumber event weekssincelastevent
1 1 0 NULL
1 2 0 NULL
1 3 1 0
1 4 0 1
1 5 0 2
1 6 0 3
2 1 0 NULL
2 2 1 0
2 3 0 1
2 4 1 0
2 5 0 1
The column should be NULL before a person's first event, and NULL on every row for a personID that never has an event.
I can't think how to write this in SQL.
The table has ~600m rows (60m personIDs with 100 weeknumbers each, although some personIDs don't have all the weeknumbers).
Many thanks for any insight.
This is a gaps-and-islands problem. The first part, in the CTE, puts the data into "groups": each time there is an event, a new group starts. It also calculates the number of weeks that have passed since the prior row (set to 0 for rows that have an event). Then, in the outer query, we SUM the weeks passed within each group, giving the number of weeks since the last event:
WITH Groups AS(
    SELECT PersonID,
           WeekNumber,
           Event,
           COUNT(CASE Event WHEN 1 THEN 1 END) OVER (PARTITION BY PersonID ORDER BY WeekNumber ASC
                                                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Events,
           CASE Event WHEN 0 THEN WeekNumber - LAG(WeekNumber) OVER (PARTITION BY PersonID ORDER BY WeekNumber ASC) ELSE 0 END AS WeeksPassed
    FROM dbo.YourTable)
SELECT PersonID,
       WeekNumber,
       Event,
       CASE WHEN Events = 0 THEN NULL
            ELSE SUM(WeeksPassed) OVER (PARTITION BY PersonID, Events ORDER BY WeekNumber ASC
                                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
       END AS WeekSinceLastEvent
FROM Groups;
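To see the grouping idea at work, the sketch below runs an equivalent query on the sample rows from the question. It assumes SQLite 3.25 or later (needed for window functions) via Python's sqlite3 as a stand-in for SQL Server; the names weekly_events, grouped and events_so_far are hypothetical and chosen only for this illustration.

```python
import sqlite3

# Sample rows from the question: (person_id, week_number, event).
sample = [
    (1, 1, 0), (1, 2, 0), (1, 3, 1), (1, 4, 0), (1, 5, 0), (1, 6, 0),
    (2, 1, 0), (2, 2, 1), (2, 3, 0), (2, 4, 1), (2, 5, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weekly_events "
    "(person_id INTEGER, week_number INTEGER, event INTEGER)"
)
conn.executemany("INSERT INTO weekly_events VALUES (?, ?, ?)", sample)

# events_so_far is the running count of events and acts as the group key.
# weeks_passed is the distance to the previous row, or 0 on an event row.
island_rows = conn.execute(
    """
    WITH grouped AS (
        SELECT person_id, week_number, event,
               COUNT(CASE event WHEN 1 THEN 1 END) OVER (
                   PARTITION BY person_id ORDER BY week_number
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS events_so_far,
               CASE event
                   WHEN 0 THEN week_number - LAG(week_number) OVER (
                       PARTITION BY person_id ORDER BY week_number)
                   ELSE 0
               END AS weeks_passed
        FROM weekly_events
    )
    SELECT person_id, week_number, event,
           CASE WHEN events_so_far = 0 THEN NULL
                ELSE SUM(weeks_passed) OVER (
                    PARTITION BY person_id, events_so_far
                    ORDER BY week_number
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           END AS weeks_since_last_event
    FROM grouped
    ORDER BY person_id, week_number
    """
).fetchall()

# NULL comes back as None in Python.
weeks_since = [row[3] for row in island_rows]
print(weeks_since)
```

The last column reproduces the weekssincelastevent values requested in the question, with None standing in for NULL before each person's first event.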
You can do this with a conditional aggregate within a windowed function:
SELECT t.PersonID,
       t.WeekNumber,
       t.Event,
       WeeksSinceLastEvent = t.WeekNumber - MAX(CASE WHEN t.Event = 1 THEN t.WeekNumber END)
                                            OVER(PARTITION BY t.PersonID ORDER BY t.WeekNumber)
FROM dbo.T AS t;
The key parts are:
CASE WHEN t.Event = 1 THEN t.WeekNumber END - Only consider the week number where there is an event. Since MAX ignores NULLs, only the relevant rows are considered.
OVER (PARTITION BY t.PersonID ORDER BY t.WeekNumber) - Only consider rows for the current person where the week number is lower than or equal to that of the current row.
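A minimal sketch of the windowed MAX approach, again assuming SQLite 3.25+ through Python's sqlite3 rather than SQL Server. The table name weekly_events is hypothetical, and person 3 is an invented extra row set (not in the question's data) that skips weeks 3 and 4 to show how missing week numbers behave.

```python
import sqlite3

# Rows are (person_id, week_number, event). Persons 1 and 2 come from the
# question; person 3 is a hypothetical case with weeks 3 and 4 missing.
rows_in = [
    (1, 1, 0), (1, 2, 0), (1, 3, 1), (1, 4, 0), (1, 5, 0), (1, 6, 0),
    (2, 1, 0), (2, 2, 1), (2, 3, 0), (2, 4, 1), (2, 5, 0),
    (3, 1, 1), (3, 2, 0), (3, 5, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weekly_events "
    "(person_id INTEGER, week_number INTEGER, event INTEGER)"
)
conn.executemany("INSERT INTO weekly_events VALUES (?, ?, ?)", rows_in)

# The running MAX holds the week number of the latest event so far, or NULL
# if there has been none; subtracting it gives the weeks since that event.
max_rows = conn.execute(
    """
    SELECT person_id, week_number, event,
           week_number - MAX(CASE WHEN event = 1 THEN week_number END) OVER (
               PARTITION BY person_id ORDER BY week_number
           ) AS weeks_since_last_event
    FROM weekly_events
    ORDER BY person_id, week_number
    """
).fetchall()

since_by_person = {}
for person_id, _week, _event, since in max_rows:
    since_by_person.setdefault(person_id, []).append(since)
print(since_by_person)
```

Because the result is a difference of week numbers rather than a row count, the hypothetical person 3 gets 4 for week 5 even though only two rows follow the event.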

COUNT and COUNT DISTINCT for different groups

For a SQL Server based report,
Table:
CID Date ID Service Days
1 3/7/2016 1 Individual 3
2 4/5/2016 2 Individual 4
3 5/24/2016 1 Individual 3
4 4/4/2016 4 Group 2
5 4/4/2016 4 Group 2
6 2/18/2016 4 Group 2
7 5/5/2016 5 Group 1
8 5/5/2016 5 Group 1
I used this code:
SELECT
    ID,
    Service,
    COUNT(CASE WHEN Days = 4 THEN 1 END) AS '4Days',
    COUNT(CASE WHEN Days = 3 THEN 1 END) AS '3Days',
    COUNT(CASE WHEN Days = 2 THEN 1 END) AS '2Days',
    COUNT(CASE WHEN Days = 1 THEN 1 END) AS '1Day'
FROM Table T1
GROUP BY
    ID,
    Service
which gives me this Output:
ID Service 4Days 3Days 2Days 1Day
1 Individual 0 2 0 0
2 Individual 1 0 0 0
4 Group 0 0 3 0
5 Group 0 0 0 2
What I want to do is not count the Group services as separate services for separate individuals, but just as one service per group. A Count Distinct used with the Date or ID could help me do that but I don't know how to make that play with the Individual services where I just wanna count them individually and not using DISTINCT. So the desired output is:
ID Service 4Days 3Days 2Days 1Day
1 Individual 0 2 0 0
2 Individual 1 0 0 0
4 Group 0 0 2 0
5 Group 0 0 0 1
I'll edit the post in case I oversimplified the problem since this is dummy data.
Looks like you could use distinct this way if you wanted:
count(distinct
case when Days = 1 then case when Service = 'Group' then 1 else "Date" end end
) as [1Day]
Depending on your indexing it's possible that introducing another column in the query would change the query plan. I suspect that probably isn't the case though.
If I am not mistaken, the '2Days' count for the 'Group' service should be 2 when grouping on the 'Date' column. If so, try this:
SELECT
ID,
Service,
CASE WHEN MAX(t.days) = 4 THEN MAX(t.date) ELSE 0 END AS '4Days',
CASE WHEN MAX(t.days) = 3 THEN MAX(t.date) ELSE 0 END AS '3Days',
CASE WHEN MAX(t.days) = 2 THEN MAX(t.date) ELSE 0 END AS '2Days',
CASE WHEN MAX(t.days) = 1 THEN MAX(t.date) ELSE 0 END AS '1Day'
FROM table T1
OUTER APPLY (SELECT days,
COUNT(DISTINCT(date)) date
FROM Table WHERE days = t1.days GROUP BY days) t
GROUP BY id, service
ORDER BY ID
Based on your last edit, this is the most straightforward way I could think of to handle the query:
with cte as (
select id, service, days
from table t1
where service = 'Individual'
union all
select id, service, days
from table t1
where service = 'Group'
group by id, service, days, date
)
select id,
service,
count(case when days = 4 then 'X' end) as [4Days],
count(case when days = 3 then 'X' end) as [3Days],
count(case when days = 2 then 'X' end) as [2Days],
count(case when days = 1 then 'X' end) as [1Day]
from cte
group by id, service
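The CTE approach can be checked against the sample table with a short sketch. It assumes SQLite via Python's sqlite3 as a runnable stand-in for SQL Server; the table name services and the column names svc_date, days_4 and so on are hypothetical renamings chosen to avoid reserved words and identifiers that start with a digit.

```python
import sqlite3

# Sample rows from the question: (cid, svc_date, id, service, days).
data = [
    (1, "2016-03-07", 1, "Individual", 3),
    (2, "2016-04-05", 2, "Individual", 4),
    (3, "2016-05-24", 1, "Individual", 3),
    (4, "2016-04-04", 4, "Group", 2),
    (5, "2016-04-04", 4, "Group", 2),
    (6, "2016-02-18", 4, "Group", 2),
    (7, "2016-05-05", 5, "Group", 1),
    (8, "2016-05-05", 5, "Group", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE services "
    "(cid INTEGER, svc_date TEXT, id INTEGER, service TEXT, days INTEGER)"
)
conn.executemany("INSERT INTO services VALUES (?, ?, ?, ?, ?)", data)

# Individual rows pass through unchanged; Group rows are collapsed to one
# row per (id, service, days, svc_date) before the conditional counts run.
report = conn.execute(
    """
    WITH cte AS (
        SELECT id, service, days
        FROM services
        WHERE service = 'Individual'
        UNION ALL
        SELECT id, service, days
        FROM services
        WHERE service = 'Group'
        GROUP BY id, service, days, svc_date
    )
    SELECT id, service,
           COUNT(CASE WHEN days = 4 THEN 'X' END) AS days_4,
           COUNT(CASE WHEN days = 3 THEN 'X' END) AS days_3,
           COUNT(CASE WHEN days = 2 THEN 'X' END) AS days_2,
           COUNT(CASE WHEN days = 1 THEN 'X' END) AS days_1
    FROM cte
    GROUP BY id, service
    ORDER BY id
    """
).fetchall()

for row in report:
    print(row)
```

On this data the Group rows collapse to 2 and 1, matching the desired output in the question, while the Individual counts are untouched.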

How to query records based on row_num and one of the column value?

Rownum Status
1 2
2 1
3 3
4 2
5 3
6 1
The condition is to return the records that appear before the first record with status=3; in the scenario above, the expected output is rownum 1 and 2.
If there is no record with status=3, show everything.
I'm not sure where to start, so I have no attempt to show yet.
If you are using SQL Server 2012+, you can use the windowed version of SUM with an ORDER BY clause:
SELECT Rownum, Status
FROM (
    SELECT Rownum, Status,
           SUM(CASE WHEN Status = 3 THEN 1 ELSE 0 END)
               OVER (ORDER BY Rownum) AS s
    FROM mytable) t
WHERE t.s = 0
Calculated field s is a running total of Status = 3 occurrences. The query returns all records before the first occurrence of a 3 value.
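The running-total filter can be sketched as a small helper that covers both cases from the question: data with a status 3 and data without one. This assumes SQLite 3.25+ via Python's sqlite3 in place of SQL Server; rows_before_first and its stop_status parameter are hypothetical names introduced for the example.

```python
import sqlite3


def rows_before_first(status_rows, stop_status=3):
    """Return the (rownum, status) rows that precede the first stop_status."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (rownum INTEGER, status INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", status_rows)
    # s is a running count of stop_status rows; it stays 0 until the first
    # such row, so "s = 0" keeps exactly the rows before it.
    result = conn.execute(
        """
        SELECT rownum, status
        FROM (SELECT rownum, status,
                     SUM(CASE WHEN status = ? THEN 1 ELSE 0 END)
                         OVER (ORDER BY rownum) AS s
              FROM mytable) AS t
        WHERE t.s = 0
        ORDER BY rownum
        """,
        (stop_status,),
    ).fetchall()
    conn.close()
    return result


sample = [(1, 2), (2, 1), (3, 3), (4, 2), (5, 3), (6, 1)]
with_three = rows_before_first(sample)

no_three = [(1, 2), (2, 1), (3, 2)]
without_three = rows_before_first(no_three)

print(with_three)     # [(1, 2), (2, 1)]
print(without_three)  # [(1, 2), (2, 1), (3, 2)]
```

When no row has the stop status, the running total never leaves 0, so every row is returned, which is the "show everything" behaviour the question asks for.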

Query for sum of all and particular rows

How can I arrive at a query for the below scenario?
Data:
Date Product Result Total
15/01/2015 ABC Pass 5
15/01/2015 XYZ Pass 8
15/01/2015 MNO Fail 2
23/01/2015 ABC Pass 10
23/01/2015 XYZ Fail 3
I need the result in the below format:
Date Total Pass Fail
15/01/2015 15 13 2
23/01/2015 13 10 3
Use conditional aggregation:
select Date,
       sum(Total) Total,
       SUM(case when Result ='Pass' then Total else 0 end) Pass,
       SUM(case when Result ='Fail' then Total else 0 end) Fail
From yourtable
Group by Date
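Here is a minimal sketch that runs the conditional SUM on the sample data, assuming SQLite via Python's sqlite3 as a stand-in for SQL Server. The table name results and the column name test_date are hypothetical, and dates are stored as plain text exactly as written in the question.

```python
import sqlite3

# Sample rows from the question: (test_date, product, result, total).
data = [
    ("15/01/2015", "ABC", "Pass", 5),
    ("15/01/2015", "XYZ", "Pass", 8),
    ("15/01/2015", "MNO", "Fail", 2),
    ("23/01/2015", "ABC", "Pass", 10),
    ("23/01/2015", "XYZ", "Fail", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results "
    "(test_date TEXT, product TEXT, result TEXT, total INTEGER)"
)
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", data)

# ELSE 0 keeps the sums at 0 (rather than NULL) for a date with no
# Pass or no Fail rows.
summary = conn.execute(
    """
    SELECT test_date,
           SUM(total) AS total,
           SUM(CASE WHEN result = 'Pass' THEN total ELSE 0 END) AS pass,
           SUM(CASE WHEN result = 'Fail' THEN total ELSE 0 END) AS fail
    FROM results
    GROUP BY test_date
    ORDER BY test_date
    """
).fetchall()

print(summary)  # [('15/01/2015', 15, 13, 2), ('23/01/2015', 13, 10, 3)]
```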
Try this using PIVOT:
SELECT Date,
sum(pass) + sum(fail) AS Total,
sum(pass) AS Pass,
sum(fail) AS Fail
FROM TableName
PIVOT (SUM(Total) FOR Result in (pass, fail)) AS P
GROUP BY Date

ssis merge join more than 2 data sets

I'm working on an ssis package to fix some data from a table. The table looks something like this:
CustID  FieldID  INT_VAL   DEC_VAL  VARCHAR_VAL  DATE_VAL
1       1        23
1       2                  500.0
1       3                           David
1       4                                        4/1/05
1       5        52369871
2       1        25
2       2                  896.23
2       3                           Allan
2       4                                        9/20/03
2       5        52369872
I want to transform it into this:
CustID FirstName AccountNumber Age JoinDate Balance
1 David 52369871 23 4/1/05 500.0
2 Allan 52369872 25 9/20/03 896.23
Currently, my SSIS package pulls the data from the source table, does a conditional split on the field ID, then generates a derived column on each split. The part I'm stuck on is joining the data back together on CustID.
However, the Merge Join transformation only lets you join 2 data sets, and in the end I will need to join about 30. Is there a good way to do that without chaining a bunch of merge joins?
That seems a bit awkward, why not just do it in a query?
select
CustID,
max(case when FieldID = 3 then VARCHAR_VAL else null end) as 'FirstName',
max(case when FieldID = 5 then INT_VAL else null end) as 'AccountNumber',
max(case when FieldID = 1 then INT_VAL else null end) as 'Age',
max(case when FieldID = 4 then DATE_VAL else null end) as 'JoinDate',
max(case when FieldID = 2 then DEC_VAL else null end) as 'Balance'
from
dbo.StagingTable
group by
CustID
If your source system is MSSQL, then you can use that query from SSIS or even create a view in the source database (if you're allowed to). If not, then copy the data directly to a staging table in MSSQL and query it from there.
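The query-based pivot can be tried out on the sample rows with the sketch below. It assumes SQLite via Python's sqlite3 in place of SQL Server, treats the blank cells in the source table as NULL (an assumption), and uses hypothetical lower-case names such as staging_table and cust_id.

```python
import sqlite3

# Rows are (cust_id, field_id, int_val, dec_val, varchar_val, date_val);
# each row fills exactly one value column, and None stands in for NULL.
data = [
    (1, 1, 23, None, None, None),
    (1, 2, None, 500.0, None, None),
    (1, 3, None, None, "David", None),
    (1, 4, None, None, None, "4/1/05"),
    (1, 5, 52369871, None, None, None),
    (2, 1, 25, None, None, None),
    (2, 2, None, 896.23, None, None),
    (2, 3, None, None, "Allan", None),
    (2, 4, None, None, None, "9/20/03"),
    (2, 5, 52369872, None, None, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging_table (cust_id INTEGER, field_id INTEGER, "
    "int_val INTEGER, dec_val REAL, varchar_val TEXT, date_val TEXT)"
)
conn.executemany("INSERT INTO staging_table VALUES (?, ?, ?, ?, ?, ?)", data)

# MAX ignores NULLs, so each CASE picks out the single value stored
# under the matching field_id for that customer.
customers = conn.execute(
    """
    SELECT cust_id,
           MAX(CASE WHEN field_id = 3 THEN varchar_val END) AS first_name,
           MAX(CASE WHEN field_id = 5 THEN int_val END) AS account_number,
           MAX(CASE WHEN field_id = 1 THEN int_val END) AS age,
           MAX(CASE WHEN field_id = 4 THEN date_val END) AS join_date,
           MAX(CASE WHEN field_id = 2 THEN dec_val END) AS balance
    FROM staging_table
    GROUP BY cust_id
    ORDER BY cust_id
    """
).fetchall()

for row in customers:
    print(row)
```

Adding one more output column means adding one more MAX(CASE ...) line, so the roughly 30 fields mentioned in the question stay in a single pass over the table instead of a chain of joins.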
