SQL Group By - Generate multiple aggregate columns from single column - sql-server

I would like to group by Company & Date and generate count columns for 2 separate values (Flag=Y and Flag=N).
Input table looks like this:
Company Date Flag
------- ------- -----
001 201201 Y
001 201201 N
001 201202 N
001 201202 N
001 201202 Y
The output should look like this:
Company Date Count_Y Count_N
------- ------ ------- -------
001 201201 1 1
001 201202 1 2
How can I write the SQL query?
Any kind of help is appreciated! Thanks!

You can do it using correlated subqueries like this:
SELECT
Company,
Date,
(SELECT COUNT(*) FROM MyTable AS T1
WHERE T1.Flag='Y' AND T1.Company=T2.Company AND T1.Date=T2.Date) AS Count_Y,
(SELECT COUNT(*) FROM MyTable AS T1
WHERE T1.Flag='N' AND T1.Company=T2.Company AND T1.Date=T2.Date) AS Count_N
FROM MyTable AS T2
GROUP BY Company, Date
You can also do it more concisely, though arguably with slightly less readability, using the SUM trick:
SELECT
Company,
Date,
SUM(CASE WHEN Flag='Y' THEN 1 ELSE 0 END) AS Count_Y,
SUM(CASE WHEN Flag='N' THEN 1 ELSE 0 END) AS Count_N
FROM MyTable
GROUP BY Company, Date
In Oracle/PLSQL, the DECODE function can replace the CASE expression, making the query even more concise:
SELECT
Company,
Date,
SUM(DECODE(Flag,'Y',1,0)) AS Count_Y,
SUM(DECODE(Flag,'N',1,0)) AS Count_N
FROM MyTable
GROUP BY Company, Date
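The SUM/CASE approach above can be checked against the question's sample rows. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine (an assumption made for illustration; the question targets SQL Server, but the query itself is portable). The table name MyTable comes from the answer.

```python
import sqlite3

# In-memory database holding the question's five sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Company TEXT, Date TEXT, Flag TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        ("001", "201201", "Y"),
        ("001", "201201", "N"),
        ("001", "201202", "N"),
        ("001", "201202", "N"),
        ("001", "201202", "Y"),
    ],
)

# Conditional aggregation: each CASE yields 1 for a matching flag and 0
# otherwise, so the SUM per group is the count of that flag.
rows = conn.execute(
    """
    SELECT Company,
           Date,
           SUM(CASE WHEN Flag = 'Y' THEN 1 ELSE 0 END) AS Count_Y,
           SUM(CASE WHEN Flag = 'N' THEN 1 ELSE 0 END) AS Count_N
    FROM MyTable
    GROUP BY Company, Date
    ORDER BY Company, Date
    """
).fetchall()

for row in rows:
    print(row)
```

The table is scanned once, whereas the correlated-subquery version runs two extra lookups per output row.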

If you have an identifier/key for this table, then you can pivot it like this:
SELECT
[Company],
[Date],
[Y] Count_Y,
[N] Count_N
FROM Company
PIVOT
(COUNT([ID]) FOR FLAG IN ([Y],[N])) pvt
Where ID is your identifier for the table Company.
If you do not have an identifier/key for the table and Company, Date and Flag are the only columns you have, then you can do a PIVOT on the count of the Flag itself, as @ConradFrix has suggested in the comments:
SELECT
[Company],
[Date],
[Y] Count_Y,
[N] Count_N
FROM Company
PIVOT
(COUNT(FLAG) FOR FLAG IN ([Y],[N])) pvt

Related

How to pick multiple values from the table and add them dynamically for picking another column

SELECT
AgentID,Seat1,SeatUpdated_1,Seat2,SeatUpdated_2,
Seat3,SeatUpdated_3,nTimesSeatChanged,
DATEDIFF(MS,(F.SeatUpdated_1),(F.SeatUpdated_3)) AvgTime
FROM ##final F
Now I have to choose the column passed in place of SeatUpdated_3 in the DATEDIFF call based on the column nTimesSeatChanged.
If it has the value 2 for an agent, the selected column should be SeatUpdated_2.
From the limited information, I would first look at how you store the data. It looks like you are storing related values across columns. It would be simpler to work with if you stored them as rows.
Current:
AgentID Seat1 SeatUpdated_1 Seat2 SeatUpdated_2 Seat3 SeatUpdated_3 nTimesSeatChanged
------- ----- ------------- ----- ------------- ----- ------------- -----------------
1 11 21 01/02/2015 31 01/03/2015 2
TO :
TableKey AgentID Seat SeatUpdated
-------- ------- ---- -----------
1 1 11 01/01/2015
2 1 12 01/02/2015
3 1 13 01/03/2015
4 2 22 02/02/2015
5 2 23 02/03/2015
Then work in simple queries to get the end result. I'm not an expert by any means but this is how I would approach it.
;
--Some sample data
WITH CTE_DATA as
(
SELECT '1' as TableKey, '1' as 'AgentID','11' as'Seat','01/01/2015' as 'SeatUpdated'
UNION
SELECT '2','1','12','01/02/2015'
UNION
SELECT '3','1','13','01/03/2015'
UNION
SELECT '4','2','22','02/02/2015'
UNION
SELECT '5','2','23','02/03/2015'
)
,
--Get Min Seat
CTE_Min AS (
SELECT AgentID
,MIN(Seat) AS min_seat
FROM CTE_DATA
GROUP BY AgentID
)
--Get max seat
,CTE_Max AS (
SELECT AgentID
,MAX(Seat) AS max_seat
FROM CTE_DATA
GROUP BY AgentID
)
--Stick them together
,CTE_Join AS (
SELECT Min.AgentID
,Min.min_seat
,max.max_seat
FROM CTE_Min min
JOIN CTE_Max max ON min.AgentID = max.AgentID
)
--Get the date
,CTE_JoinDate AS (
SELECT j.*
,d1.SeatUpdated AS min_date
,d2.SeatUpdated AS max_date
FROM CTE_Join j
LEFT JOIN CTE_DATA d1 ON j.AgentID = d1.AgentID
AND j.min_seat = d1.Seat
LEFT JOIN CTE_DATA d2 ON j.AgentID = d2.AgentID
AND j.max_seat = d2.Seat
)
--Work out nSeats
,CTE_nSeats AS (
SELECT AgentID
,COUNT(1) AS nSeats
FROM CTE_DATA
GROUP BY AgentID
)
--Final result set
SELECT j.*
,DATEDIFF(DAY, min_date, max_date) AS DIFF_Days
,n.nSeats
FROM CTE_JoinDate j
LEFT JOIN CTE_nSeats n ON j.AgentID = n.AgentID
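With one row per seat change, the first update, last update, and number of seats per agent fall out of a single GROUP BY, so there is no need to pick a SeatUpdated_N column by number. The following is a minimal sketch using Python's built-in sqlite3 module. The table name SeatHistory, the ISO date strings (the sample dates read as month/day/year), and the julianday-based day difference are assumptions made for illustration; in SQL Server, DATEDIFF would take the place of the julianday arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical normalized table: one row per seat change.
conn.execute(
    "CREATE TABLE SeatHistory (TableKey INTEGER, AgentID INTEGER, "
    "Seat INTEGER, SeatUpdated TEXT)"
)
conn.executemany(
    "INSERT INTO SeatHistory VALUES (?, ?, ?, ?)",
    [
        (1, 1, 11, "2015-01-01"),
        (2, 1, 12, "2015-01-02"),
        (3, 1, 13, "2015-01-03"),
        (4, 2, 22, "2015-02-02"),
        (5, 2, 23, "2015-02-03"),
    ],
)

# MIN/MAX on ISO date strings gives the first and last update per agent;
# COUNT(*) gives the number of seats recorded for that agent.
rows = conn.execute(
    """
    SELECT AgentID,
           COUNT(*)         AS nSeats,
           MIN(SeatUpdated) AS first_update,
           MAX(SeatUpdated) AS last_update,
           CAST(julianday(MAX(SeatUpdated)) - julianday(MIN(SeatUpdated))
                AS INTEGER) AS diff_days
    FROM SeatHistory
    GROUP BY AgentID
    ORDER BY AgentID
    """
).fetchall()

for row in rows:
    print(row)
```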

Combine two results in one row

EmpID Name Date Earn
1 A 7/1/2014 2
1 A 7/1/2014 4
1 A 7/2/2014 1
1 A 7/2/2014 2
2 B 7/1/2014 5
2 B 7/2/2014 5
I would like to combine two results in one row, as shown below. Here is my statement, but I want to find a way to get Total_Earn. Thanks.
"SELECT EmpID, Name, Date, Sum(earn) FROM employee WHERE Date between DateFrom and DateTo
GROUP BY EmpID, Name, Date"
EmpID Name Date Earn Total_Earn
1 A 7/2/2014 3 9
2 B 7/2/2014 5 10
It looks like you want the Max date and the Sum of Earn for each employee. Assuming you want one record for each ID/Name, you would do this:
select EmpID, Name, Max(Date), Sum(Earn)
from YourTableName
group by EmpID, Name
Try this. Substitute the date for whatever value you want.
SELECT table1.EmpID, table1.Name, table1.Date, table1.Earn, table2.Total_Earn
FROM
(SELECT EmpID, Name, Date, SUM(Earn) AS Earn
FROM yourtablename
WHERE Date = '2014-07-02'
GROUP BY EmpID, Name, Date) table1
LEFT JOIN
(SELECT EmpID, SUM(Earn) AS Total_Earn
FROM yourtablename
WHERE Date <= '2014-07-02'
GROUP BY EmpID) table2
ON table1.EmpID = table2.EmpID
This will perform two SELECTs and join their results. The first select (defined as table1) will select the employee ID and earnings for the specified date.
The second statement (defined as table2) will select the total earnings for an employee up to and including that date.
The two statements are then joined together according to the employee ID.
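A minimal sketch of the two-subquery join described above, run against the question's sample rows with Python's built-in sqlite3 module. The ISO date strings and the aliases day_earn and running are assumptions made for illustration; the table name employee comes from the question. Each subquery gives its SUM an alias so the outer query can refer to it by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (EmpID INTEGER, Name TEXT, Date TEXT, Earn INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "A", "2014-07-01", 2),
        (1, "A", "2014-07-01", 4),
        (1, "A", "2014-07-02", 1),
        (1, "A", "2014-07-02", 2),
        (2, "B", "2014-07-01", 5),
        (2, "B", "2014-07-02", 5),
    ],
)

target = "2014-07-02"  # the date to report on

# day_earn: earnings on the target date; running: total up to and including it.
rows = conn.execute(
    """
    SELECT day_earn.EmpID, day_earn.Name, day_earn.Date,
           day_earn.Earn, running.Total_Earn
    FROM (SELECT EmpID, Name, Date, SUM(Earn) AS Earn
          FROM employee
          WHERE Date = ?
          GROUP BY EmpID, Name, Date) AS day_earn
    LEFT JOIN (SELECT EmpID, SUM(Earn) AS Total_Earn
               FROM employee
               WHERE Date <= ?
               GROUP BY EmpID) AS running
           ON day_earn.EmpID = running.EmpID
    ORDER BY day_earn.EmpID
    """,
    (target, target),
).fetchall()

for row in rows:
    print(row)
```

On the sample data this reproduces the Earn and Total_Earn values from the question's desired output (3 and 9 for employee 1, 5 and 10 for employee 2).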

How do I perform the following multi-layered pivot with TSQL in Access 2010?

I have looked at the following relevant posts:
How to create a PivotTable in Transact/SQL?
SQL Server query - Selecting COUNT(*) with DISTINCT
SQL query to get field value distribution
Desire: To have the data change from State #1 to State #2.
Data: This data is a collection of the year(s) in which a person (identified by their PersonID) has been recorded performing a certain activity, at a certain place.
My data currently looks as follows:
State #1
Row | Year | PlaceID | ActivityID | PersonID
001 2011 Park Read 201a
002 2011 Library Read 202b
003 2012 Library Read 202b
004 2013 Library Read 202b
005 2013 Museum Read 202b
006 2011 Park Read 203c
006 2010 Library Read 203c
007 2012 Library Read 204d
008 2014 Library Read 204d
Edit (4/2/2014): I decided that I want State #2 to just be distinct counts.
What I want my data to look like:
State #2
Row | PlaceID | Column1 | Column2 | Column3
001 Park 2
002 Library 1 1 1
003 Museum 1
Where:
Column1: The count of the number of people that attended the PlaceID to read on only one year.
Column2: The count of the number of people that attended the PlaceID to read on two different years.
Column3: The count of the number of people that attended the PlaceID to read on three different years.
In the State #2 schema, a person cannot be counted in more than one column for each row (place). If a person reads at a particular place for 2010, 2011, 2012, they appear in Row 001, Column3 only. However, that person can appear in other rows, but once again, in only one column of that row.
My methodology (please correct me if I am doing this wrong):
I believe that the first step is to extract distinct counts of the number of years each person attended the place to perform the activity of interest (please correct me on this methodology if incorrect).
As such, this is where I am with the T-SQL:
SELECT
PlaceID
,PersonID
,[ActivityID]
,COUNT(DISTINCT [Year]) AS UNIQUE_YEAR_COUNT
FROM (
SELECT
Year
,PlaceID
,ActivityID
,PersonID
FROM [my].[linkeddatabasetable]
WHERE ActivityID = 'Read') t1
GROUP BY
PlaceID
,PersonID
,[ActivityID]
ORDER BY 1,2
Unfortunately, I do not know where to take it from here.
I think you have two options.
A traditional pivot
select placeID
, Column1 = [1]
, Column2 = [2]
, Column3 = [3]
from
(
SELECT
PlaceID
,COUNT(DISTINCT [Yearvalue]) AS UNIQUE_YEAR_COUNT
FROM (
SELECT
yearValue
,PlaceID
,ActivityID
,PersonID
FROM #SO
WHERE ActivityID = 'Read') t1
GROUP BY
PlaceID
,PersonID
,[ActivityID]) up
pivot (count(UNIQUE_YEAR_COUNT) for UNIQUE_YEAR_COUNT in ([1],[2],[3]) ) as pvt
or as a case/when style pivot.
select I.PlaceID
, Column1 = count(case when UNIQUE_YEAR_COUNT = 1 then PersonID else null end)
, Column2 = count(case when UNIQUE_YEAR_COUNT = 2 then PersonID else null end)
, Column3 = count(case when UNIQUE_YEAR_COUNT = 3 then PersonID else null end)
from (
SELECT
PlaceID
, PersonID
,COUNT(DISTINCT [Yearvalue]) AS UNIQUE_YEAR_COUNT
FROM (
SELECT
yearValue
,PlaceID
,ActivityID
,PersonID
FROM #SO
WHERE ActivityID = 'Read') t1
GROUP BY
PlaceID
,PersonID
,[ActivityID]) I
group by I.PlaceID
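The case/when style pivot can be tried on the question's State #1 rows. The following is a minimal sketch using Python's built-in sqlite3 module; the table name readings and the alias per_person are assumptions made for illustration, and the column is named Year as in the question rather than yearValue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (Year INTEGER, PlaceID TEXT, ActivityID TEXT, PersonID TEXT)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        (2011, "Park", "Read", "201a"),
        (2011, "Library", "Read", "202b"),
        (2012, "Library", "Read", "202b"),
        (2013, "Library", "Read", "202b"),
        (2013, "Museum", "Read", "202b"),
        (2011, "Park", "Read", "203c"),
        (2010, "Library", "Read", "203c"),
        (2012, "Library", "Read", "204d"),
        (2014, "Library", "Read", "204d"),
    ],
)

# Inner query: distinct years per (place, person).
# Outer query: count the people falling into each year-count bucket per place.
rows = conn.execute(
    """
    SELECT PlaceID,
           COUNT(CASE WHEN n_years = 1 THEN PersonID END) AS Column1,
           COUNT(CASE WHEN n_years = 2 THEN PersonID END) AS Column2,
           COUNT(CASE WHEN n_years = 3 THEN PersonID END) AS Column3
    FROM (SELECT PlaceID, PersonID, COUNT(DISTINCT Year) AS n_years
          FROM readings
          WHERE ActivityID = 'Read'
          GROUP BY PlaceID, PersonID) AS per_person
    GROUP BY PlaceID
    ORDER BY PlaceID
    """
).fetchall()

for row in rows:
    print(row)
```

Because each person contributes exactly one row per place to the inner query, a person is counted in only one column of a given place's row, as State #2 requires.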
Since you are in Access, I would think using an aggregate function would do the work.
Try with DCOUNT() to begin with http://office.microsoft.com/en-us/access-help/dcount-function-HA001228817.aspx.
Replace your count() with dcount("year", "linkeddatabasetable", "placeid=" & [placeid])

Query trick - kind of unpivot

I have the following table
SnapShotDay OperationalUnitNumber IsOpen StatusDate
1-01-2014 001 1 1-01-2014
2-01-2014 NULL NULL NULL
3-01-2014 001 0 3-01-2014
4-01-2014 NULL NULL NULL
5-01-2014 001 1 5-01-2014
I obtain this with a SELECT construct, but what I need to do now is fill in the "NULL"ed rows by taking values from the first Non nulled row before. The latter would give:
SnapShotDay OperationalUnitNumber IsOpen StatusDate
1-01-2014 001 1 1-01-2014
2-01-2014 001 1 1-01-2014
3-01-2014 001 0 3-01-2014
4-01-2014 001 0 3-01-2014
5-01-2014 001 1 5-01-2014
In functional terms: I have event records that give me an event on a date for an operational unit; the event is either IsOpen or IsClosed. Chaining those events together by date gives a sort of range. What I need is to generate daily records for those ranges (the target is a fact table).
I am trying to achieve this in plain SQL query (no stored procedure).
Can you think of a trick ?
Declare @t table(
SnapShotDay date,
OperationalUnitNumber int,
IsOpen bit,
StatusDate date
)
insert into @t
select '1-01-2014', 001 , 1 , '1-01-2014' union all
select '2-01-2014', NULL, NULL, NULL union all
select '3-01-2014', 001 , 0 ,'3-01-2014' union all
select '4-01-2014', NULL,NULL,NULL union all
select '5-01-2014', 001 ,1,'5-01-2014'
;
with CTE as
(
select *,row_number()over( order by (select 0))rn from @t
)
select *,
case when a.isopen is null then (
select IsOpen from cte where rn=a.rn-1
) else a.isopen end
from cte a
OK, I got it: add one more CTE, cte1, which holds the last non-null row:
,cte1 as
(
select top 1 rn ,IsOpen from cte where IsOpen is not null order by rn desc
)
--select * from Statuses
select *,
case
when a.rn<=(select b.rn from cte1 b) and a.IsOpen is null then
(
select
a1.IsOpen
from
cte a1
where
a1.rn=a.rn-1
)
when a.rn>=(select b.rn from cte1 b) and a.IsOpen is null then
(select IsOpen from cte1)
else
a.isopen
end
from
cte a
Try this. In the main query we're looking for the previous date with not null values. Then just JOIN this table with this LastDate.
WITH T1 AS
(
SELECT *, (SELECT MAX(SnapShotDay)
FROM T
WHERE SnapShotDay<=TMain.SnapShotDay
AND OPERATIONALUNITNUMBER IS NOT NULL)
as LastDate
FROM T as TMain
)
SELECT T1.SnapShotDay,
T.OperationalUnitNumber,
T.IsOpen,
T.StatusDate
FROM T1
JOIN T ON T1.LastDate=T.SnapShotDay
SELECT
t1.SnapShotDay,
CASE WHEN t1.OperationalUnitNumber IS NOT NUll
THEN t1.OperationalUnitNumber
ELSE (SELECT TOP 1 t2.OperationalUnitNumber FROM YourTable t2 WHERE t2.SnapShotDay < t1.SnapShotDay AND t2.OperationalUnitNumber IS NOT NULL ORDER BY SnapShotDay DESC)
END AS OperationalUnitNumber,
CASE WHEN t1.IsOpen IS NOT NUll
THEN t1.IsOpen
ELSE (SELECT TOP 1 t2.IsOpen FROM YourTable t2 WHERE t2.SnapShotDay < t1.SnapShotDay AND t2.IsOpen IS NOT NULL ORDER BY SnapShotDay DESC)
END AS IsOpen,
CASE WHEN t1.StatusDate IS NOT NUll
THEN t1.StatusDate
ELSE (SELECT TOP 1 t2.StatusDate FROM YourTable t2 WHERE t2.SnapShotDay < t1.SnapShotDay AND t2.StatusDate IS NOT NULL ORDER BY SnapShotDay DESC)
END AS StatusDate
FROM YourTable t1
You asked for 'plain sql', here is a tested attempt using SQL, with comments, that gives the required answer.
I have tested the code using 'sqlite' and 'mysql' on windows xp. It is pure SQL and should work everywhere.
SQL is about 'sets' and combining them and ordering the results.
This problem seems to be about two separate sets:
1) The 'snap shot day' that have readings.
2) the 'snap shot day' that don't have readings.
I have added extra columns so that we can easily see where values came from.
let us deal with the easy set first:
This is the set of 'supplied' readings.
SELECT dss.SnapShotDay theDay,
'supplied' readingExists,
dss.OperationalUnitNumber,
dss.IsOpen,
dss.StatusDate
FROM dailysnapshot dss
WHERE dss.OperationalUnitNumber IS NOT NULL
results:
theDay readingExists OperationalUnitNumber IsOpen StatusDate
2014-01-01 supplied 001 1 2014-01-01
2014-01-03 supplied 001 0 2014-01-03
2014-01-05 supplied 001 1 2014-01-05
Now let us deal with the set of 'days that have missing readings'. We need to get the 'most recent day that has readings that is closest to the day with the missing readings' and assume the same values from the 'most recent day' that is before the 'current' missing day.
It sounds complex but it isn't. It asks:
foreach day without a reading - get me the closest, earlier, date that has readings and i will use those readings.
Here is the query:
SELECT emptyDSS.SnapShotDay,
'missing' readingExists,
maxPrevDSS.OperationalUnitNumber,
maxPrevDSS.IsOpen,
maxPrevDSS.StatusDate
FROM dailysnapshot emptyDSS
INNER JOIN dailysnapshot maxPrevDSS ON maxPrevDSS.SnapShotDay =
(SELECT MAX(dss.SnapShotDay)
FROM dailysnapshot dss
WHERE dss.SnapShotDay < emptyDSS.SnapShotDay
AND dss.OperationalUnitNumber IS NOT NULL)
WHERE emptyDSS.OperationalUnitNumber IS NULL
results:
SnapShotDay readingExists OperationalUnitNumber IsOpen StatusDate
2014-01-02 missing 001 1 2014-01-01
2014-01-04 missing 001 0 2014-01-03
This is not about efficiency! It is about getting the correct 'result set' with the easiest to understand SQL code. I assume the database engine will optimize the query. The query can be 'tweaked' later if required.
We now need to combine the two queries and order the results in the manner we require.
The standard way of combining results from SQL queries is with set operators (union, intersection, minus).
we use 'union' and an 'order by' on the result set.
this gives the final query of:
SELECT dss.SnapShotDay theDay,
'supplied' readingExists,
dss.OperationalUnitNumber,
dss.IsOpen,
dss.StatusDate
FROM dailysnapshot dss
WHERE dss.OperationalUnitNumber IS NOT NULL
UNION
SELECT emptyDSS.SnapShotDay theDay,
'missing' readingExists,
maxPrevDSS.OperationalUnitNumber,
maxPrevDSS.IsOpen,
maxPrevDSS.StatusDate
FROM dailysnapshot emptyDSS
INNER JOIN dailysnapshot maxPrevDSS ON maxPrevDSS.SnapShotDay =
(SELECT MAX(dss.SnapShotDay)
FROM dailysnapshot dss
WHERE dss.SnapShotDay < emptyDSS.SnapShotDay
AND dss.OperationalUnitNumber IS NOT NULL)
WHERE emptyDSS.OperationalUnitNumber IS NULL
ORDER BY theDay ASC
result:
theDay readingExists dss.OperationalUnitNumber dss.IsOpen dss.StatusDate
2014-01-01 supplied 001 1 2014-01-01
2014-01-02 missing 001 1 2014-01-01
2014-01-03 supplied 001 0 2014-01-03
2014-01-04 missing 001 0 2014-01-03
2014-01-05 supplied 001 1 2014-01-05
I enjoyed doing this.
It should work with most SQL engines.
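The "closest earlier day with readings" idea used in the answers above can also be written as a single self-join, without the UNION, by letting a day match itself when it already has readings. The following is a minimal sketch using Python's built-in sqlite3 module; the ISO date strings and the aliases d, p and s are assumptions made for illustration, and the table name dailysnapshot follows the previous answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dailysnapshot (SnapShotDay TEXT, OperationalUnitNumber TEXT, "
    "IsOpen INTEGER, StatusDate TEXT)"
)
conn.executemany(
    "INSERT INTO dailysnapshot VALUES (?, ?, ?, ?)",
    [
        ("2014-01-01", "001", 1, "2014-01-01"),
        ("2014-01-02", None, None, None),
        ("2014-01-03", "001", 0, "2014-01-03"),
        ("2014-01-04", None, None, None),
        ("2014-01-05", "001", 1, "2014-01-05"),
    ],
)

# For every day d, find the latest day on or before d that has a reading,
# then take the values from that row p.
rows = conn.execute(
    """
    SELECT d.SnapShotDay, p.OperationalUnitNumber, p.IsOpen, p.StatusDate
    FROM dailysnapshot AS d
    JOIN dailysnapshot AS p
      ON p.SnapShotDay = (SELECT MAX(s.SnapShotDay)
                          FROM dailysnapshot AS s
                          WHERE s.SnapShotDay <= d.SnapShotDay
                            AND s.OperationalUnitNumber IS NOT NULL)
    ORDER BY d.SnapShotDay
    """
).fetchall()

for row in rows:
    print(row)
```

Days that precede the first reading would drop out of this inner join; a LEFT JOIN would keep them with NULL values.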

Selecting rows with the nearest date using SQL

I have a SQL statement.
SELECT
ID, LOCATION, CODE,MAX(DATE),FLAG
FROM
TABLE1
WHERE
DATE <= CONVERT(DATETIME,'11-11-2012')
AND EXISTS (SELECT * FROM #TEMP_CODE WHERE TABLE1.CODE = #TEMP_CODE.CODE)
AND ID IN (14, 279)
GROUP BY
ID, LOCATION, CODE
I need the rows with the date nearest to 11-11-2012, but the query returns all the values. What am I doing wrong? Thanks
ID LOCATION CODE DATE FLAG
-------------------------------------------------------------------
14 CAR STREET,UDUPI 234 2012-08-08 00:00:00.000 0
14 CAR STREET,UDUPI 234 2012-08-10 00:00:00.000 1
14 CAR STREET,UDUPI 234 2012-08-14 00:00:00.000 0
279 MADHUGIRI 234 2012-08-08 00:00:00.000 1
279 MADHUGIRI 234 2012-08-11 00:00:00.000 0
I want to show only the rows with dates less than or equal to the given date. The required result is
ID LOCATION CODE DATE FLAG
-------------------------------------------------------------------
14 CAR STREET,UDUPI 234 2012-08-10 00:00:00.000 1
279 MADHUGIRI 234 2012-08-11 00:00:00.000 0
;WITH x AS
(
SELECT ID, Location, Code, Date, Flag,
rn = ROW_NUMBER() OVER
(PARTITION BY ID, Location, Code ORDER BY [Date] DESC)
FROM dbo.TABLE1 AS t1
WHERE [Date] <= '20121111'
AND ID IN (14, 279) -- sorry, missed this
AND EXISTS (SELECT 1 FROM #TEMP_CODE WHERE CODE = t1.CODE)
)
SELECT ID, Location, Code, Date, Flag
FROM x WHERE rn = 1;
This yields:
ID LOCATION CODE [Date] FLAG
--- ---------------- ---- ---------- ----
14 CAR STREET,UDUPI 234 2012-08-14 0
279 MADHUGIRI 234 2012-08-11 0
This disagrees with your required results, but I think those are wrong and I think you should check them.
Use a subquery to get the max date for each ID, and then join that to your table:
SELECT
ID, LOCATION, CODE, DATE, FLAG
FROM
TABLE1
JOIN (
SELECT ID AS SubID, MAX(DATE) AS SubDATE
FROM TABLE1
WHERE DATE < '11/11/2012'
AND EXISTS (SELECT * FROM #TEMP_CODE WHERE TABLE1.CODE = #TEMP_CODE.CODE)
AND ID IN (14, 279)
GROUP BY ID
) AS SUB ON ID = SubID AND DATE = SubDATE
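The max-date-per-ID join can be tried on the question's sample rows. The following is a minimal sketch using Python's built-in sqlite3 module; the ISO date strings, the plain table TEMP_CODE standing in for the #TEMP_CODE temp table, and the aliases t, m and c are assumptions made for illustration. It uses <= for the cutoff, as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TABLE1 (ID INTEGER, LOCATION TEXT, CODE INTEGER, DATE TEXT, FLAG INTEGER)"
)
conn.execute("CREATE TABLE TEMP_CODE (CODE INTEGER)")
conn.execute("INSERT INTO TEMP_CODE VALUES (234)")
conn.executemany(
    "INSERT INTO TABLE1 VALUES (?, ?, ?, ?, ?)",
    [
        (14, "CAR STREET,UDUPI", 234, "2012-08-08", 0),
        (14, "CAR STREET,UDUPI", 234, "2012-08-10", 1),
        (14, "CAR STREET,UDUPI", 234, "2012-08-14", 0),
        (279, "MADHUGIRI", 234, "2012-08-08", 1),
        (279, "MADHUGIRI", 234, "2012-08-11", 0),
    ],
)

cutoff = "2012-11-11"

# Subquery m finds the latest qualifying date per ID; joining back to TABLE1
# recovers the full row (including FLAG) for that date.
rows = conn.execute(
    """
    SELECT t.ID, t.LOCATION, t.CODE, t.DATE, t.FLAG
    FROM TABLE1 AS t
    JOIN (SELECT ID, MAX(DATE) AS MaxDate
          FROM TABLE1
          WHERE DATE <= ?
            AND ID IN (14, 279)
            AND EXISTS (SELECT 1 FROM TEMP_CODE AS c WHERE c.CODE = TABLE1.CODE)
          GROUP BY ID) AS m
      ON t.ID = m.ID AND t.DATE = m.MaxDate
    ORDER BY t.ID
    """,
    (cutoff,),
).fetchall()

for row in rows:
    print(row)
```

On the sample data this returns the 2012-08-14 row for ID 14, which matches the ROW_NUMBER answer rather than the question's stated required result.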
Add an ORDER BY DATE and limit the result to two rows (LIMIT 0,2 in MySQL; the code below uses SET ROWCOUNT 2).
The ORDER BY sorts the rows by date relative to the condition in the WHERE clause, and the limit returns only the top 2 rows.
SET ROWCOUNT 2
SELECT
ID, LOCATION, CODE,MAX(DATE),FLAG
FROM
TABLE1
WHERE
DATE <= CONVERT(DATETIME,'11-11-2012')
AND EXISTS (SELECT * FROM #TEMP_CODE WHERE TABLE1.CODE = #TEMP_CODE.CODE)
AND ID IN (14, 279)
GROUP BY
ID, LOCATION, CODE
ORDER BY DATE

Resources