How would I select distinct persons per year, but only count each person once.
An example of my data is:
ID Date
1 20NOV2018
2 06JUN2017
2 29JUL2011
3 05MAY2014
4 04APR2002
4 25APR2009
I want my output to look like:
2002 1
2009 0
2011 1
2014 1
2017 0
2018 1
Use sub-selects to left join the distinct years with the first year an id occurs and count the ids from that.
data have;
input
id date date9.; format date date9.; datalines;
1 20NOV2018
2 06JUN2017
2 29JUL2011
3 05MAY2014
4 04APR2002
4 25APR2009
run;
proc sql;
create table want as
select each.year, count(first.id) as appeared_count
from
( select distinct year(date) as year
from have
) as each
left join
( select id, min(year(date)) as year
from have group by id
) as first
on each.year = first.year
group by each.year
order by each.year
;
Hope this Code Works Fine for Your case:
SELECT YEAR(M.DATE) AS DATE,COUNT(S.ID)
FROM #TAB M
LEFT JOIN (SELECT MIN(YEAR(DATE)) AS DATE ,ID
FROM #TAB GROUP BY ID) S ON YEAR(M.DATE)=S.DATE GROUP BY YEAR(M.DATE) ORDER BY YEAR(M.DATE)
I have 2 tables
OrderDetails:
Id Name type Quantity
------------------------------------------
2009 john a 10
2009 john a 20
2010 sam b 25
2011 sam c 50
2012 sam d 30
ValueDetails:
Id Value
-------------------
2009 300
2010 500
2011 200
2012 100
I need to get an output which displays the data as such :
Id Name type Quantity Price
-------------------------------------------------
2009 john a 10
2009 john a 20 9000
2010 sam b 25
2011 sam c 50
2012 sam d 30 25500
The price is calculated by Value x Quantity and the sum of the values is displayed in the last repeating row of the given Name.
I tired to use sum and group by but I get only two rows. I need to display all 5 rows. How can I write this query?
You can use Row_Number with max of Row_Number to get this formatted sum
;with cte as (
select od.*, sm= sum( od.Quantity*vd.value ) over (partition by Name),
RowN = row_number() over(partition by Name order by od.id)
from #yourOrderDetails od
inner join #yourValueDetails vd
on od.Id = vd.Id
)
select Id, Name, Type, Quantity,
case when max(RowN) over(partition by Name) = row_number() over(partition by Name order by Id)
then sm else null end as ActualSum
from cte
Your input tables:
create table #yourOrderDetails (Id int, Name varchar(20), type varchar(2), Quantity int)
insert into #yourOrderDetails (Id, Name, type, Quantity) values
(2009 ,'john','a', 10 )
,(2009 ,'john','a', 20 ) ,(2010 ,'sam ','b', 25 )
,(2011 ,'sam ','c', 50 ) ,(2012 ,'sam ','d', 30 )
create table #yourValueDetails(Id int, Value Int)
insert into #yourValueDetails(Id, value) values
( 2009 , 300 ) ,( 2010 , 500 )
,( 2011 , 200 ) ,( 2012 , 100 )
SELECT a.ID,
a.Name,
a.Type,
a.quantity,
price = (a.quantity * b.price)
FROM OrderDetails a LEFT JOIN
ValueDetails b on a.id = b.id
This will put the price on every row. If you want to do a SUM by Id,Name and Type it's not going to show the individual records like you show them above. If you want to put a SUM on one of the lines that share the same Id, Name and Type then you'd need a rule to figure out which one and then you could probably use a CASE statement to decide on which line you want to show the SUM total.
I have this query for retrieving rows from a SQL Server table:
SELECT
aid,
research_area_category_id,
CAST(research_area as VARCHAR(100)) [research_area],
COUNT(*) [Paper_Count]
FROM
sub_aminer_paper
GROUP BY
aid,
research_area_category_id,
CAST(research_area as VARCHAR(100))
HAVING
aid IN (SELECT
aid
FROM
sub_aminer_paper
GROUP BY
aid
HAVING
MIN(p_year) = 1990 AND MAX(p_year) = 2014 AND COUNT(pid) BETWEEN 10 AND 40
)
ORDER BY aid ASC, Paper_Count DESC
which returns this output:
aid research_area_category_id research_area Paper_Count
2937 33 markov chain 3
2937 33 markov decision process 1
2937 1 optimization problem 1
2937 27 real time application 1
2937 32 software product lines 1
11120 29 aspect oriented programming 4
11120 1 graph cut 2
11120 1 optimization problem 2
11120 32 uml class diagrams 1
11120 25 chinese word segmentation 1
11120 29 dynamic programming 1
11120 19 face recognition 1
11120 1 approximation algorithm 1
12403 2 differential equation 7
12403 1 data structure 2
12403 34 design analysis 1
12403 9 object detection 1
12403 27 operating system 1
12403 1 problem solving 1
12403 21 archiving system 1
12403 2 calculus 1
Now this is returning the output including all of rows concerned with respective aid's whereas I need only first 3 rows for each aid ORDER BY Paper_Count DESC i.e. rows containing value of Paper_Count 3, 1, 1 for aid 2937, 4,2,2 for 11120 and 7,2,2 for 12403.
Please help! Thanks.
one way is to apply row_number() over(partition by aid order by Paper_Count desc) as rn on your resultset and then select all records with rn<=3
with cte
as
(
SELECT
aid,
research_area_category_id,
CAST(research_area as VARCHAR(100)) [research_area],
COUNT(*) [Paper_Count]
FROM
sub_aminer_paper
GROUP BY
aid,
research_area_category_id,
CAST(research_area as VARCHAR(100))
HAVING
aid IN (SELECT
aid
FROM
sub_aminer_paper
GROUP BY
aid
HAVING
MIN(p_year) = 1990 AND MAX(p_year) = 2014 AND COUNT(pid) BETWEEN 10 AND 40
)
ORDER BY aid ASC, Paper_Count DESC
)
,
cte1
AS
(
SELECT * ,
ROW_NUMBER() OVER (PARTITION BY aid ORDER BY Paper_Count DESC) AS rn
FROM cte
)
SELECT * FROM cte1 WHERE rn<=3
I have a table with some forenames in:
SELECT * FROM d;
Forename
--------------------------------
Robert
Susan
Frances
Kate
May
Alex
Anna
I want to pull a cumulative total of name lengths alphabetically. So far I have:
WITH Names ( RowNum, Forename, ForenameLength )
AS ( SELECT ROW_NUMBER() OVER ( ORDER BY forename ) AS RowNum ,
Forename ,
LEN(forename) AS ForenameLength
FROM d
)
SELECT RowNum ,
Forename ,
ForenameLength ,
ISNULL(ForenameLength + ( SELECT ISNULL(SUM(ForenameLength),0)
FROM Names
WHERE RowNum < n.RowNum
), 0) AS CumLen
FROM NAMES n;
RowNum Forename ForenameLength CumLen
-------------------- -------------------------------- -------------- -----------
1 Alex 4 4
2 Anna 4 8
3 Frances 7 15
4 Kate 4 19
5 May 3 22
6 Robert 6 28
7 Susan 5 33
But I understand that it should be possible to do this (recursively) within the CTE. Anyone know how this could be achieved?
N.B. whilst we are developing on 2012, the current live system is 2008 so any solution would need to be backwards compatible at least in the short term.
You are on SQL Server 2012 and should use sum() over() instead.
select row_number() over(order by d.Forename) as RowNum,
d.Forename,
len(d.Forename) as ForenameLength,
sum(len(d.Forename)) over(order by d.Forename rows unbounded preceding) as CumLen
from d
order by d.Forename;
Result:
RowNum Forename ForenameLength CumLen
-------- ------------ -------------- -----------
1 Alex 4 4
2 Anna 4 8
3 Frances 7 15
4 Kate 4 19
5 May 3 22
6 Robert 6 28
7 Susan 5 33
Update:
If you for some reason absolutely want a recursive version it could look something like this:
with C as
(
select top(1)
1 as RowNum,
d.Forename,
len(d.Forename) as ForenameLength,
len(d.Forename) as CumLen
from d
order by d.Forename
union all
select d.RowNum,
d.Forename,
d.ForenameLength,
d.CumLen
from (
select C.RowNum + 1 as RowNum,
d.Forename,
len(d.Forename) as ForenameLength,
C.CumLen + len(d.Forename) as CumLen,
row_number() over(order by d.ForeName) as rn
from d
inner join C
on C.Forename < d.Forename
) as d
where d.rn = 1
)
select C.RowNum,
C.Forename,
C.ForenameLength,
C.CumLen
from C;
Adapted from Performance Tuning the Whole Query Plan by Paul White.
I have a query that pulls out month/year totals for customers, and add the ntile ranking. If I were to be able to pull out the max subtotal for ntile 1, 2, 3, 4, and 5, I would ALMOST get what I'm after, but I do not know how to proceed.
For example, the result I want would look something like:
Month Year CustomerCode SubTotal ntile
1 2012 CCC 131.45 1
1 2012 CCC 342.95 2
1 2012 ELITE 643.92 3
1 2012 CCC 1454.05 4
1 2012 CCC 12971.78 5
2 2012 CCC 135.99 1
2 2012 CCI 370.47 2
2 2012 NOC 766.84 3
2 2012 ELITE 1428.26 4
2 2012 VBC 5073.20 5
3 2012 CCC 119.02 1
3 2012 CCC 323.78 2
3 2012 HUCC 759.66 3
3 2012 ELITE 1402.95 4
3 2012 CCC 7964.20 5
EXCEPT - I would expect ranking to be different customers like for month 2, but my base query isn't giving me that result - and I obviously don't know how to get it in T-SQL on SQL SERVER 2005 - in fact I'm not sure what I'm getting.
My next option is to pull a DataTable in C# and do some gymnastics to get there, but there has to be an easier way :)
My base query is
SELECT
i.DateOrdered
,LTRIM(STR(DATEPART(MONTH,i.DateOrdered))) AS [Month]
,LTRIM(STR(YEAR(i.Dateordered))) AS [Year]
,c.CustomerCode
,SUM(i.Jobprice) AS Subtotal
,NTILE(5) OVER(ORDER BY SUM(i.JobPrice)) AS [ntile]
FROM Invoices i
JOIN
Customers c
ON i.CustomerID = c.ID
WHERE i.DateOrdered >= '1/1/2012'
AND i.DateOrdered <= '9/30/2012'
GROUP BY YEAR(i.DateOrdered), MONTH(i.DateOrdered), i.DateOrdered, c.CustomerCode
ORDER BY LTRIM(STR(DATEPART(MONTH,i.DateOrdered))),
TRIM(STR(YEAR(i.Dateordered))),
SUM(i.JobPrice), c.CustomerCode ASC
I'd really appreciate help getting this right.
Thanks in advance
Cliff
If I read you correctly, what you are after is
For each month in the range,
Show 5 customers who have the greatest SUMs in that month
And against each customer, show the corresponding SUM.
In that case, this SQL Fiddle creates a sample table and runs the query that gives you the output described above. If you wanted to see what's in the created tables, just do simple SELECTs on the right panel.
The query is:
; WITH G as -- grouped by month and customer
(
SELECT DATEADD(D,1-DAY(i.DateOrdered),i.DateOrdered) [Month],
c.CustomerCode,
SUM(i.Jobprice) Subtotal
FROM Invoices i
JOIN Customers c ON i.CustomerID = c.ID
WHERE i.DateOrdered >= '1/1/2012' AND i.DateOrdered <= '9/30/2012'
GROUP BY DATEADD(D,1-DAY(i.DateOrdered),i.DateOrdered), c.CustomerCode
)
SELECT MONTH([Month]) [Month],
YEAR([Month]) [Year],
CustomerCode,
SubTotal,
Rnk [Rank]
FROM
(
SELECT *, RANK() OVER (partition by [Month] order by Subtotal desc) Rnk
FROM G
) X
WHERE Rnk <= 5
ORDER BY Month, Rnk
To explain, the first part (WITH block) is just a fancy way of writing a subquery, that GROUPs the data by month and Customer. The expression DATEADD(D,1-DAY(i.DateOrdered),i.DateOrdered) turns every date into the FIRST day of that month, so that the data can be easily grouped by month. The next subquery written in traditional form adds a RANK column within each month by the subtotal, which is finally SELECTed to give the top 5*.
Note that RANK allows for equal rankings, which may end up showing 6 customers for a month, if 3 of them are ranked equally at position 4. If that is not what you want, then you can change the word RANK to ROW_NUMBER which will randomly tie-break between equal Subtotals.
The query needs to be modified to only get the month and year dateparts. The issue you are having with the same customer showing multiple times in the same month is due to the inclusion of i.DateOrdered in the select and group by clauses.
The following query should give you what you need. Also, I suspect it is a typo on the next to last line of the query, but tsql doesn't have a TRIM() function only LTRIM and RTRIM.
SELECT
LTRIM(STR(DATEPART(MONTH,i.DateOrdered))) AS [Month]
,LTRIM(STR(YEAR(i.Dateordered))) AS [Year]
,c.CustomerCode
,SUM(i.Jobprice) AS Subtotal
,NTILE(5) OVER(ORDER BY SUM(i.JobPrice)) AS [ntile]
FROM Invoices i
JOIN
Customers c
ON i.CustomerID = c.ID
WHERE i.DateOrdered >= '1/1/2012'
AND i.DateOrdered <= '9/30/2012'
GROUP BY YEAR(i.DateOrdered), MONTH(i.DateOrdered), c.CustomerCode
ORDER BY LTRIM(STR(DATEPART(MONTH,i.DateOrdered))),
LTRIM(STR(YEAR(i.Dateordered))),
SUM(i.JobPrice), c.CustomerCode ASC
This gives these results
Month Year CustomerCode Subtotal ntile
1 2012 ELITE 643.92 2
1 2012 CCC 14900.23 5
2 2012 CCC 135.99 1
2 2012 CCI 370.47 1
2 2012 NOC 766.84 3
2 2012 ELITE 1428.26 4
2 2012 VBC 5073.20 4
3 2012 HUCC 759.66 2
3 2012 ELITE 1402.95 3
3 2012 CCC 8407.00 5
Try this:
declare #tab table
(
[month] int,
[year] int,
CustomerCode varchar(20),
SubTotal float
)
insert into #tab
select
1,2012,'ccc',131.45 union all
select
1,2012,'ccc',343.45 union all
select
1,2012,'ELITE',643.92 union all
select
2,2012,'ccc',131.45 union all
select
2,2012,'ccc',343.45 union all
select
2,2012,'ELITE',643.92 union all
select
3,2012,'ccc',131.45 union all
select
3,2012,'ccc',343.45 union all
select
3,2012,'ELITE',643.92
;with cte as
(
select NTILE(3) OVER(partition by [month] ORDER BY [month]) AS [ntile],* from #tab
)
select * from cte
Even in your base query you need to add partition by, so that you will get correct output.
I can't see how to solve this problem without double ranking:
You need to get the largest sums per customer & month.
You then need, for every month, to retrieve the top five of the found sums.
Here's how I would approach this:
;
WITH MaxSubtotals AS (
SELECT DISTINCT
CustomerID,
MonthDate = DATEADD(MONTH, DATEDIFF(MONTH, 0, DateOrdered), 0),
Subtotal = MAX(SUM(JobPrice)) OVER (
PARTITION BY Customer, DATEADD(MONTH, DATEDIFF(MONTH, 0, DateOrdered), 0)
ORDER BY SUM(JobPrice)
)
FROM Invoices
GROUP BY
CustomerID,
DateOrdered
),
TotalsRanked AS (
SELECT
CustomerID,
MonthDate,
Subtotal,
Ranking = ROW_NUMBER() OVER (PARTITION BY MonthDate ORDER BY Subtotal DESC)
FROM MaxDailyTotals
)
SELECT
Month = MONTH(i.MonthDate),
Year = YEAR(i.MonthDate),
c.CustomerCode,
i.Subtotal,
i.Ranking
FROM TotalsRanked i
INNER JOIN Customers ON i.CustomerID = c.ID
WHERE i.Ranking <= 5
;
The first CTE, MaxSubtotals, determines the maximum subtotals per customer & month. Involving DISTINCT and a window aggregating function, it is essentially a "shortcut" for the following two-step query:
SELECT
CustomerID,
MonthDate,
Subtotal = MAX(Subtotal)
FROM (
SELECT
CustomerID,
MonthDate = DATEADD(MONTH, DATEDIFF(MONTH, 0, DateOrdered), 0),
Subtotal = SUM(JobPrice)
FROM Invoices
GROUP BY
CustomerID,
DateOrdered
) s
GROUP BY
CustomerID,
MonthDate
The other CTE, TotalsRanked, simply adds ranking numbers for the found susbtotals, partitioning by customer and month. As a final step, you only need to limit the rows to those that have rankings not greater than 5 (or whatever you might choose another time).
Note that using ROW_NUMBER() to rank the rows in this case guarantees that you'll get no more than 5 rows with the Ranking <= 5 filter. If there were two or more rows with the same subtotal, the would get distinct rankings, and in the end you might end up with an output like this:
Month Year CustomerCode Subtotal Ranking
----- ---- ------------ -------- -------
1 2012 CCC 1500.00 1
1 2012 ELITE 1400.00 2
1 2012 NOC 900.00 3
1 2012 VBC 700.00 4
1 2012 HUCC 700.00 5
-- 1 2012 ABC 690.00 6 -- not returned
-- 1 2012 ... ... ...
Even though there might be other customers with Subtotals of 700.00 for the same month, they wouldn't be returned, because they would be assigned rankings after 5.
You could use RANK() instead of ROW_NUMBER() to account for that. But note that you might end up with more than 5 rows per month then, with an output like this:
Month Year CustomerCode Subtotal Ranking
----- ---- ------------ -------- -------
1 2012 CCC 1500.00 1
1 2012 ELITE 1400.00 2
1 2012 NOC 900.00 3
1 2012 VBC 700.00 4
1 2012 HUCC 700.00 4
1 2012 ABC 700.00 4
-- 1 2012 DEF 690.00 7 -- not returned
-- 1 2012 ... ... ...
Customers with subtotals less than 700.00 wouldn't make it to the output because they would have rankings starting with 7, which would correspond to the ranking of the first under-700.00 sum if ranked by ROW_NUMBER().
And there's another option, DENSE_RANK(). You might want to use it if you want up to 5 distinct sums per month in your output. With DENSE_RANK() your output might contain even more rows per month than it would have with RANK(), but the number of distinct subtotals would be exactly 5 (or fewer if the original dataset can't provide you with 5). That is, your output might then look like this:
Month Year CustomerCode Subtotal Ranking
----- ---- ------------ -------- -------
1 2012 CCC 1500.00 1
1 2012 ELITE 1400.00 2
1 2012 NOC 900.00 3
1 2012 VBC 700.00 4
1 2012 HUCC 700.00 4
1 2012 ABC 700.00 4
1 2012 DEF 650.00 5
1 2012 GHI 650.00 5
1 2012 JKL 650.00 5
-- 1 2012 MNO 600.00 5 -- not returned
-- 1 2012 ... ... ...
Like RANK(), the DENSE_RANK() function assigns same rankings to identical values, but, unlike RANK(), it doesn't produce gaps in the ranking sequence.
References:
OVER Clause (Transact-SQL)
Ranking Functions (Transact-SQL)