SQL Server Pivot Table and Group - sql-server

I'm attempting to pivot some of my rows. Currently my query is this:
SELECT count(distinct users.userName) as TotalUsers, products.productNameCommon, employees.Division, products.productType
FROM FlexLM_users users
INNER JOIN Org.Employees employees ON employees.Username=users.userName
INNER JOIN FlexLM_history history ON users.userID=history.userID
INNER JOIN FlexLM_products products ON products.productID=history.productID
where products.productType = 'Base'
GROUP BY products.productNameCommon, employees.Division, products.productType
ORDER BY TotalUsers DESC
And it outputs this:
TotalUsers| productNameCommon | Division | productType
------------------------------------------------------------------------
16 | Standard | Disease Control | base
12 | Basic | Epidemiology | base
10 | Standard | Prevention | base
8 | Advanced | Epidemiology | base
6 | Basic | Disease Control | base
2 | Advanced | Prevention | base
What I am looking to do is this:
Division | Basic | Standard | Advanced | TotalUsers
----------------------------------------------------------
Disease Control| 6 | 16 | 0 | 22
Epidemiology | 12 | 0 | 8 | 20
Prevention | 0 | 10 | 2 | 12

SELECT Division
, ISNULL([Basic] , 0) AS [Basic]
, ISNULL([Standard], 0) AS [Standard]
, ISNULL([Advanced], 0) AS [Advanced]
, ISNULL([Basic] , 0)
+ ISNULL([Standard], 0)
+ ISNULL([Advanced], 0) AS TotalUsers
FROM (
SELECT TotalUsers , productNameCommon , Division
FROM (
-- Your Existing Query here
)a
) t
PIVOT (
SUM (TotalUsers)
FOR productNameCommon
IN ([Basic], [Standard] , [Advanced])
) p
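To check the expected shape without a SQL Server instance, here is a minimal sketch that loads the six aggregated rows from the question into an in-memory SQLite database through Python's sqlite3 module. SQLite has no PIVOT operator, so SUM(CASE ...) stands in for it; the table name summary is made up for this illustration.

```python
import sqlite3

# The aggregated rows from the question (output of the existing query).
rows = [
    (16, "Standard", "Disease Control"),
    (12, "Basic", "Epidemiology"),
    (10, "Standard", "Prevention"),
    (8, "Advanced", "Epidemiology"),
    (6, "Basic", "Disease Control"),
    (2, "Advanced", "Prevention"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE summary (TotalUsers INTEGER, productNameCommon TEXT, Division TEXT)"
)
conn.executemany("INSERT INTO summary VALUES (?, ?, ?)", rows)

# SUM(CASE ...) plays the role of PIVOT: one column per product name.
pivot_sql = """
SELECT Division,
       SUM(CASE WHEN productNameCommon = 'Basic'    THEN TotalUsers ELSE 0 END) AS Basic,
       SUM(CASE WHEN productNameCommon = 'Standard' THEN TotalUsers ELSE 0 END) AS Standard,
       SUM(CASE WHEN productNameCommon = 'Advanced' THEN TotalUsers ELSE 0 END) AS Advanced,
       SUM(TotalUsers) AS TotalUsers
FROM summary
GROUP BY Division
ORDER BY Division
"""
pivoted = conn.execute(pivot_sql).fetchall()
for row in pivoted:
    print(row)
```

The ELSE 0 branch does the job that ISNULL(..., 0) does in the PIVOT version: a division with no rows for a product gets 0 instead of NULL.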

Related

How to retrieve stack from Database

I have this table called 'Stack'.
+---------------+-------+-------------+
| Stack_Counter | value | Stack_Depth |
+---------------+-------+-------------+
| 1 | 3 | 1 |
| 2 | 0 | 2 |
| 3 | 0 | 1 |
| 4 | | 0 |
| 5 | 3 | 1 |
| 6 | 3 | 2 |
| 7 | 1 | 3 |
| 8 | 2 | 2 |
| 9 | 4 | 1 |
| 10 | 2 | 2 |
| 11 | 0 | 3 |
| 12 | 0 | 2 |
| 13 | 0 | 1 |
| 14 | 2 | 2 |
| 15 | 2 | 3 |
| 16 | 1 | 4 |
| 17 | 1 | 3 |
| 18 | 2 | 2 |
| 19 | 1 | 3 |
| 20 | 0 | 4 |
+---------------+-------+-------------+
I want to find the contents of the stack as of Stack_Counter 20.
So the correct answer should be
+---------------+-------+-------------+
| Stack_Counter | value | Stack_Depth |
+---------------+-------+-------------+
| 13 | 0 | 1 |
| 18 | 2 | 2 |
| 19 | 1 | 3 |
| 20 | 0 | 4 |
+---------------+-------+-------------+
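Basically, this is to find the consecutive rows leading up to the selected Stack_Depth.
Is there any way to achieve it?
... and here's a generic all-SQL solution (it returns the stack as of the last row in the table, which here is Stack_Counter 20):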
SELECT Stack_Counter, value, Stack_Depth
FROM
(SELECT *, RANK() OVER (
PARTITION BY Stack_Depth
ORDER BY Stack_Counter DESC) rank
FROM stack)
WHERE rank=1 AND Stack_Depth > 0;
An SQL-only solution may be more trouble than it's worth (either because of its complexity or its tediousness), but here is one that works with SQLite:
SELECT * FROM (SELECT * FROM stack WHERE Stack_Depth=1 ORDER BY Stack_Counter DESC LIMIT 1)
UNION ALL
SELECT * FROM (SELECT * FROM stack WHERE Stack_Depth=2 ORDER BY Stack_Counter DESC LIMIT 1)
UNION ALL
SELECT * FROM (SELECT * FROM stack WHERE Stack_Depth=3 ORDER BY Stack_Counter DESC LIMIT 1)
UNION ALL
SELECT * FROM (SELECT * FROM stack WHERE Stack_Depth=4 ORDER BY Stack_Counter DESC LIMIT 1);
The window functions I mentioned in the comments proved to be more difficult than I thought... for myself at the time. Peak's answer is an elegant solution using the rank() window function, just what I had originally intended. In the meantime, SQLite also supports recursive CTEs (Common Table Expressions; the WITH statement):
WITH RECURSIVE
latest (id, level) AS (
VALUES (20, (SELECT Stack_Depth FROM stack WHERE Stack_Counter = 20))
UNION ALL
SELECT (SELECT max(Stack_Counter)
FROM stack
WHERE Stack_Depth = level - 1
AND Stack_Counter <= 20),
level - 1
FROM latest
WHERE level - 1 > 0
)
SELECT stack.*
FROM stack INNER JOIN latest
ON stack.Stack_Counter = latest.id
ORDER BY stack.Stack_Counter
There are three places where I had to insert the desired Stack_Counter value (20), but these could all be replaced with a named SQL parameter if you're calling this from a prepared statement in the host language.
And if you don't want to pick a particular Stack_Counter value but just want the stack as of the last row in the table, replace the VALUES clause with a SELECT (and drop the Stack_Counter <= 20 filter), like this:
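As a sketch of the named-parameter idea, assuming Python's sqlite3 module as the host language: the literal 20 becomes :target in all three places, and the sample rows from the question are loaded into an in-memory database. The helper name stack_at is made up for this example.

```python
import sqlite3

# (Stack_Counter, value, Stack_Depth) rows from the question; row 4 has no value.
rows = [
    (1, 3, 1), (2, 0, 2), (3, 0, 1), (4, None, 0), (5, 3, 1),
    (6, 3, 2), (7, 1, 3), (8, 2, 2), (9, 4, 1), (10, 2, 2),
    (11, 0, 3), (12, 0, 2), (13, 0, 1), (14, 2, 2), (15, 2, 3),
    (16, 1, 4), (17, 1, 3), (18, 2, 2), (19, 1, 3), (20, 0, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stack (Stack_Counter INTEGER, value INTEGER, Stack_Depth INTEGER)")
conn.executemany("INSERT INTO stack VALUES (?, ?, ?)", rows)

# Same recursive CTE as above, with :target replacing the literal 20.
STACK_AT = """
WITH RECURSIVE latest (id, level) AS (
    VALUES (:target, (SELECT Stack_Depth FROM stack WHERE Stack_Counter = :target))
    UNION ALL
    SELECT (SELECT max(Stack_Counter)
            FROM stack
            WHERE Stack_Depth = level - 1
              AND Stack_Counter <= :target),
           level - 1
    FROM latest
    WHERE level - 1 > 0
)
SELECT stack.*
FROM stack INNER JOIN latest ON stack.Stack_Counter = latest.id
ORDER BY stack.Stack_Counter
"""


def stack_at(target):
    """Return the rows that make up the stack as of the given Stack_Counter."""
    return conn.execute(STACK_AT, {"target": target}).fetchall()


print(stack_at(20))
```

For target 20 this should return rows 13, 18, 19 and 20, matching the expected answer in the question; other Stack_Counter values can be passed the same way.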
WITH RECURSIVE
latest (id, level) AS (
SELECT * FROM (SELECT Stack_Counter, Stack_Depth FROM stack ORDER BY Stack_Counter DESC LIMIT 1)
UNION ALL
SELECT (SELECT max(Stack_Counter)
FROM stack
WHERE Stack_Depth = level - 1),
level - 1
FROM latest
WHERE level - 1 > 0
)
SELECT stack.*
FROM stack INNER JOIN latest
ON stack.Stack_Counter = latest.id
ORDER BY stack.Stack_Counter

SQL Server - identify combinations of values and assign combination identifier

I am trying to assign what amounts to a 'combinationid' to rows of my table, based on the values in the two columns below. Each product has a number of customers linked to it. For every combination of customers, I need to create a combination ID.
For example, the combination of customers for product 'a' is the same combination of customers for product 'c' (they both have customers 1, 2 and 3), so products a and c should have the same combination identifier ('customergroup'). However, products should not share the same customergroup if they only share some of the same customers - e.g. product b only has customers 1 and 2 (not 3), so should have a different customergroup to products 'a' and 'c'.
Input:
| productid | customerid |
|-----------|------------|
| a | 1 |
| a | 2 |
| a | 3 |
| b | 1 |
| b | 2 |
| c | 3 |
| c | 2 |
| c | 1 |
| d | 1 |
| d | 3 |
| e | 1 |
| e | 2 |
| f | 1 |
| g | 2 |
| h | 3 |
Desired output:
| productid | customerid | customergroup |
|-----------|------------|---------------|
| a | 1 | 1 |
| a | 2 | 1 |
| a | 3 | 1 |
| b | 1 | 2 |
| b | 2 | 2 |
| c | 3 | 1 |
| c | 2 | 1 |
| c | 1 | 1 |
| d | 1 | 3 |
| d | 3 | 3 |
| e | 1 | 2 |
| e | 2 | 2 |
| f | 1 | 4 |
| g | 2 | 5 |
| h | 3 | 6 |
or just
| productid | customergroupid |
|-----------|-----------------|
| a | 1 |
| b | 2 |
| c | 1 |
| d | 3 |
| e | 2 |
| f | 4 |
| g | 5 |
| h | 6 |
Edit: first version of this did include a description of my attempts. I currently have nested queries that basically give me a column for customer 1, 2, 3 etc and then uses dense rank to get the grouping. The problem is that is not dynamic for different numbers of customers and I did not know where to start for getting a dynamic result as above. Thanks for the replies.
Considering you haven't shown your efforts, or confirmed the version you're using, I've assumed you have the latest ("and greatest") version of SQL Server, which means you have access to STRING_AGG.
This doesn't give the groupings in the same order, but I'm going to also assume that doesn't matter, and that the group numbering is arbitrary. This gives you the following:
WITH VTE AS(
SELECT *
FROM (VALUES('a',1),
('a',2),
('a',3),
('b',1),
('b',2),
('c',3),
('c',2),
('c',1),
('d',1),
('d',3),
('e',1),
('e',2),
('f',1),
('g',2),
('h',3)) V(productid,customerid)),
Groups AS(
SELECT productid,
STRING_AGG(customerid,',') WITHIN GROUP (ORDER BY customerid) AS CustomerIDs
FROM VTE
GROUP BY productid),
Rankings AS(
SELECT productid,
CustomerIDs,
DENSE_RANK() OVER (ORDER BY CustomerIDs ASC) AS Grouping
FROM Groups)
SELECT V.productid,
V.customerid,
R.Grouping AS customergroupid
FROM VTE V
JOIN Rankings R ON V.productid = R.productid
ORDER BY V.productid,
V.customerid;
db<>fiddle.
If you aren't using SQL Server 2017 or later, I suggest looking up the FOR XML PATH method for string aggregation.
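The idea behind the query above (reduce each product to one canonical, order-independent key, then number the distinct keys) can be sketched outside the database as well. This is only an illustration in Python with the question's sample data; the variable names are made up. Unlike DENSE_RANK over the aggregated strings, it numbers groups in order of first appearance, which happens to reproduce the numbering in the desired output.

```python
from collections import defaultdict

# (productid, customerid) pairs from the question.
pairs = [
    ("a", 1), ("a", 2), ("a", 3),
    ("b", 1), ("b", 2),
    ("c", 3), ("c", 2), ("c", 1),
    ("d", 1), ("d", 3),
    ("e", 1), ("e", 2),
    ("f", 1), ("g", 2), ("h", 3),
]

# Step 1: collect the set of customers for each product.
customers = defaultdict(set)
for product, customer in pairs:
    customers[product].add(customer)

# Step 2: turn each set into a canonical key by sorting it
# (the role STRING_AGG ... WITHIN GROUP (ORDER BY ...) plays in the query).
# Step 3: give each distinct key the next free group number.
group_of_key = {}
customergroup = {}
for product in sorted(customers):
    key = tuple(sorted(customers[product]))
    if key not in group_of_key:
        group_of_key[key] = len(group_of_key) + 1
    customergroup[product] = group_of_key[key]

print(customergroup)
```

Sorting before building the key is what makes product c (entered as 3, 2, 1) land in the same group as product a.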
Using Larnu's answer this is how I got the result for 2008:
WITH VTE AS(
SELECT *
FROM (VALUES('a','1'),
('a','2'),
('a','3'),
('b','1'),
('b','2'),
('c','3'),
('c','2'),
('c','1'),
('d','1'),
('d','3'),
('e','1'),
('e','2'),
('f','1'),
('g','2'),
('h','3')) V(productid,customerid)),
Groups AS(
SELECT productid, CustomerIDs = STUFF((SELECT N', ' + customerid
FROM VTE AS p2
WHERE p2.productid = p.productid
ORDER BY customerid
FOR XML PATH(N'')), 1, 2, N'')
FROM VTE AS p
GROUP BY productid),
Rankings AS(
SELECT productid,
CustomerIDs,
DENSE_RANK() OVER (ORDER BY CustomerIDs ASC) AS Grouping
FROM Groups)
SELECT V.productid,
V.customerid,
R.Grouping AS customergroupid
FROM VTE V
JOIN Rankings R ON V.productid = R.productid
ORDER BY V.productid,
V.customerid;
Thanks again for your assistance.

Getting duplicates with additional information

I've inherited a database and I'm having trouble constructing a working SQL query.
Suppose this is the data:
[Products]
| Id | DisplayId | Version | Company | Description |
|---- |----------- |---------- |-----------| ----------- |
| 1 | 12345 | 0 | 16 | Random |
| 2 | 12345 | 0 | 2 | Random 2 |
| 3 | AB123 | 0 | 1 | Random 3 |
| 4 | 12345 | 1 | 16 | Random 4 |
| 5 | 12345 | 1 | 2 | Random 5 |
| 6 | AB123 | 0 | 5 | Random 6 |
| 7 | 12345 | 2 | 16 | Random 7 |
| 8 | XX45 | 0 | 5 | Random 8 |
| 9 | XX45 | 0 | 7 | Random 9 |
| 10 | XX45 | 1 | 5 | Random 10 |
| 11 | XX45 | 1 | 7 | Random 11 |
[Companies]
| Id | Code |
|---- |-----------|
| 1 | 'ABC' |
| 2 | '456' |
| 5 | 'XYZ' |
| 7 | 'XYZ' |
| 16 | '456' |
The Version column is a version number. Higher numbers indicate more recent versions.
The Company column is a foreign key referencing the Companies table on the Id column.
There's another table called ProductData with a ProductId column referencing Products.Id.
Now I need to find duplicates based on the DisplayId and the corresponding Companies.Code. The ProductData table should be joined to show a title (ProductData.Title), and only the most recent ones should be included in the results. So the expected results are:
| Id | DisplayId | Version | Company | Description | ProductData.Title |
|---- |----------- |---------- |-----------|------------- |------------------ |
| 5 | 12345 | 1 | 2 | Random 5 | Title 5 |
| 7 | 12345 | 2 | 16 | Random 7 | Title 7 |
| 10 | XX45 | 1 | 5 | Random 10 | Title 10 |
| 11 | XX45 | 1 | 7 | Random 11 | Title 11 |
because XX45 has 2 "entries": one with Company 5 and one with Company 7, but both companies share the same code.
because 12345 has 2 "entries": one with Company 2 and one with Company 16, but both companies share the same code. Note that the most recent version of both differs (version 2 for company 16's entry and version 1 for company 2's entry)
AB123 should not be included, as its 2 entries have different company codes.
I'm eager to learn your insights...
Based on your sample data, you just need to JOIN the tables:
SELECT
p.Id, p.DisplayId, p.Version, p.Company, d.Title
FROM Products AS p
INNER JOIN Companies AS c ON p.Company = c.Id
INNER JOIN ProductData AS d ON d.ProductId = p.Id;
But if you want the latest one, you can use the ROW_NUMBER():
WITH CTE
AS
(
SELECT
p.Id, p.DisplayId, p.Version, p.Company, d.Title,
ROW_NUMBER() OVER(PARTITION BY p.DisplayId,p.Company ORDER BY p.Id DESC) AS RN
FROM Products AS p
INNER JOIN Companies AS c ON p.Company = c.Id
INNER JOIN ProductData AS d ON d.ProductId = p.Id
)
SELECT *
FROM CTE
WHERE RN = 1;
sample fiddle
| Id | DisplayId | Version | Company | Title |
|----|-----------|---------|---------|----------|
| 5 | 12345 | 1 | 2 | Title 5 |
| 7 | 12345 | 2 | 16 | Title 7 |
| 10 | XX45 | 1 | 5 | Title 10 |
| 11 | XX45 | 1 | 7 | Title 11 |
If I understood you correctly, you can use a CTE to find all the duplicated rows in your table; then you can just SELECT from the CTE and add further filtering as needed.
WITH CTE AS(
SELECT p.Id, p.DisplayId, p.Version, p.Company, p.Description, d.Title,
RN = ROW_NUMBER() OVER(PARTITION BY p.DisplayId, p.Company ORDER BY p.Id DESC)
FROM dbo.Products AS p
INNER JOIN dbo.ProductData AS d ON d.ProductId = p.Id
)
SELECT *
FROM CTE
Try this:
SELECT b.ID,displayid,version,company,productdata.title
FROM
(select A.ID,a.displayid,version,a.company,rn,a.code, COUNT(displayid) over (partition by displayid,code) cnt from
(select Prod.ID,displayid,version,company,Companies.code, Row_number() over (partition by displayid,company order by version desc) rn
from Prod inner join Companies on Prod.Company = Companies.id) a
where a.rn=1) b inner join productdata on b.id = productdata.ProductId where cnt > 1
You have to first get the current version, and then see how many times each DisplayID + Code combination shows up. Based on that, you can select only the ones that have a count greater than one. You can then INNER JOIN ProductData on the final query to get the Title.
WITH
MaxVersion AS --Get the current versions
(
SELECT
MAX(Version) AS Version,
DisplayID,
Company
FROM
#TmpProducts
GROUP BY
DisplayID,
Company
)
,CTE AS
(
SELECT
p.DisplayID,
c.Code,
COUNT(*) AS RowCounter
FROM
#TmpProducts p
INNER JOIN
#TmpCompanies c
ON
c.ID = p.Company
INNER JOIN
MaxVersion mv
ON
mv.DisplayID = p.DisplayID
AND mv.Version = p.Version
AND mv.Company = p.Company
GROUP BY
p.DisplayID,
c.Code
)
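SELECT
p.*
FROM
#TmpProducts p
INNER JOIN
#TmpCompanies co
ON
co.ID = p.Company
INNER JOIN
CTE c
ON
c.DisplayID = p.DisplayID
AND c.Code = co.Code -- match the company code too, not just the DisplayID
INNER JOIN
MaxVersion mv
ON
mv.DisplayID = p.DisplayID
AND mv.Company = p.Company
AND mv.Version = p.Version
WHERE
c.RowCounter > 1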
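A minimal sketch of the same three steps (latest version per DisplayId and Company, count per DisplayId and Code, keep counts above one), run against the question's sample data in an in-memory SQLite database from Python. The CTE names Latest and Dupes are made up for this example, and ProductData is left out because its contents are not shown in the question. The final join matches on both DisplayId and Code, so a DisplayId whose entries carry different codes is not pulled in by accident.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (Id INTEGER, DisplayId TEXT, Version INTEGER, Company INTEGER, Description TEXT);
CREATE TABLE Companies (Id INTEGER, Code TEXT);
INSERT INTO Products VALUES
    (1, '12345', 0, 16, 'Random'),   (2, '12345', 0, 2, 'Random 2'),
    (3, 'AB123', 0, 1, 'Random 3'),  (4, '12345', 1, 16, 'Random 4'),
    (5, '12345', 1, 2, 'Random 5'),  (6, 'AB123', 0, 5, 'Random 6'),
    (7, '12345', 2, 16, 'Random 7'), (8, 'XX45', 0, 5, 'Random 8'),
    (9, 'XX45', 0, 7, 'Random 9'),   (10, 'XX45', 1, 5, 'Random 10'),
    (11, 'XX45', 1, 7, 'Random 11');
INSERT INTO Companies VALUES (1, 'ABC'), (2, '456'), (5, 'XYZ'), (7, 'XYZ'), (16, '456');
""")

query = """
WITH MaxVersion AS (      -- latest version per DisplayId + Company
    SELECT DisplayId, Company, MAX(Version) AS Version
    FROM Products
    GROUP BY DisplayId, Company
),
Latest AS (               -- the latest rows, with their company code
    SELECT p.*, c.Code
    FROM Products p
    JOIN MaxVersion mv ON mv.DisplayId = p.DisplayId
                      AND mv.Company = p.Company
                      AND mv.Version = p.Version
    JOIN Companies c ON c.Id = p.Company
),
Dupes AS (                -- DisplayId + Code pairs that occur more than once
    SELECT DisplayId, Code
    FROM Latest
    GROUP BY DisplayId, Code
    HAVING COUNT(*) > 1
)
SELECT l.Id, l.DisplayId, l.Version, l.Company, l.Description
FROM Latest l
JOIN Dupes d ON d.DisplayId = l.DisplayId AND d.Code = l.Code
ORDER BY l.Id
"""
duplicates = conn.execute(query).fetchall()
for row in duplicates:
    print(row)
```

With the sample data this should list Ids 5, 7, 10 and 11, the same rows as the expected result in the question, minus the Title column.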

How to pivot on two levels

I am using SQL Server 2012 and am trying to construct a pivot table from TSQL based on the table below which has been generated by joining multiple tables.
INCIDENT ID | Department | Priority | Impact
--------------------------------------------
1 | IT | Urgent | High
2 | IT | Retrospective | Medium
3 | Marketing | Normal | Low
4 | Marketing | Normal | High
5 | Marketing | Normal | Med
6 | Finance | Normal | Med
From this table, I want it to be displayed in the following format:
Priority | Normal | Urgent | Retrospective |
| Department | Low | Medium | High | Low | Medium | High | Low | Medium | High |
--------------------------------------------------------------------------------
| IT | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 |
| Finance | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
| Marketing | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 |
I have the following code which successfully Pivots on the "Priority" level.
SELECT *
FROM (
SELECT
COUNT(incident.incident_id) OVER(PARTITION BY serv_dept.serv_dept_n) Total,
serv_dept.serv_dept_n Department,
ImpactName.item_n Impact,
PriorityName.item_n Priority
FROM -- omitted for brevity
WHERE -- omitted for brevity
) AS T
PIVOT (
COUNT(Priority)
FOR Priority IN ("Normal", "Urgent", "Retrospective")
) PIV
ORDER BY Department ASC
How can I get this query to pivot on two levels like the second table I pasted?
Any help would be appreciated.
The easiest way may be conditional aggregation:
select department,
sum(case when priority = 'Normal' and impact = 'Low' then 1 else 0 end) as Normal_low,
sum(case when priority = 'Normal' and impact = 'Med' then 1 else 0 end) as Normal_med,
sum(case when priority = 'Normal' and impact = 'High' then 1 else 0 end) as Normal_high,
. . .
from t
group by department;
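Writing out all nine columns by hand is tedious, so here is a hedged sketch that generates the conditional-aggregation columns from two lists and runs the result against the question's sample rows in an in-memory SQLite database (Python's sqlite3 module). It uses the Impact column from the question; the table name t follows the answer, and treating 'Med' and 'Medium' as the same level is an assumption based on the mixed spellings in the sample data.

```python
import sqlite3

# Sample rows from the question: (incident_id, department, priority, impact).
incidents = [
    (1, "IT", "Urgent", "High"),
    (2, "IT", "Retrospective", "Medium"),
    (3, "Marketing", "Normal", "Low"),
    (4, "Marketing", "Normal", "High"),
    (5, "Marketing", "Normal", "Med"),
    (6, "Finance", "Normal", "Med"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (incident_id INTEGER, department TEXT, priority TEXT, impact TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", incidents)

priorities = ["Normal", "Urgent", "Retrospective"]
# Assumption: the sample data spells the middle level both 'Med' and 'Medium'; treat them as one.
impacts = {"Low": ("Low",), "Medium": ("Med", "Medium"), "High": ("High",)}

# Build one SUM(CASE ...) column per priority/impact pair.
# The labels are fixed constants defined above, never user input, so inlining them is safe here.
columns = []
for priority in priorities:
    for label, spellings in impacts.items():
        in_list = ", ".join(f"'{s}'" for s in spellings)
        columns.append(
            f"SUM(CASE WHEN priority = '{priority}' AND impact IN ({in_list}) "
            f"THEN 1 ELSE 0 END) AS {priority}_{label}"
        )

sql = "SELECT department, " + ", ".join(columns) + " FROM t GROUP BY department ORDER BY department"
cursor = conn.execute(sql)
header = [d[0] for d in cursor.description]
result = {row[0]: dict(zip(header[1:], row[1:])) for row in cursor}
print(result["Marketing"])
```

The two header rows in the desired layout are a presentation concern; the query itself can only return one flat row of nine counts per department, named Priority_Impact here.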
I'll take a stab at it:
WITH PivotData AS
(
SELECT
Department
, Priority + '_' + Impact AS PriorityImpact
, Incident_ID
FROM
<table>
)
SELECT
Department
, Normal_Low
, Normal_Medium
,...
FROM
PivotData
PIVOT (COUNT(Incident_ID) FOR PriorityImpact IN (<Listing all the PriorityImpact values>) ) as P;

Using Common Table Expressions for multi-time windows

I have run this query in SQL Server as:
WITH CTE AS
(
SELECT AIP.aid [Author_ID],
MIN(CAST(P.abstract_research_area as VARCHAR(100))) [Research_Area],
CAST(RC.research_category as VARCHAR(100)) [Research_Category],
P.abstract_research_area_category_id [Category_ID],
COUNT(*) [Paper_Count],
P.p_year [Paper_Year]
FROM author_individual_papers AIP
JOIN sub_aminer_paper P ON AIP.pid = P.pid
JOIN research_categories RC ON P.abstract_research_area_category_id = RC.category_id
WHERE P.abstract_research_area_category_id IS NOT NULL AND
AIP.aid IN (SELECT Author_ID FROM Authors) AND AIP.p_year BETWEEN 2005 AND 2014
GROUP BY AIP.aid,
CAST(RC.research_category as VARCHAR(100)),
P.abstract_research_area_category_id,
P.p_year
),
CTE_1 AS
(
SELECT *, ROW_NUMBER()
OVER(
PARTITION BY Author_ID, Paper_Year
ORDER BY Paper_Count DESC, Research_Area ASC
) AS Rank
FROM CTE
)
SELECT *
FROM CTE_1
WHERE Rank <= 3
which returns this output:
+-----------+------------------------+-------------------+-------------+-------------+------------+------+
| Author_ID | Research_Area | Research_Category | Category_ID | Paper_Count | Paper_Year | Rank |
+-----------+------------------------+-------------------+-------------+-------------+------------+------+
| 677 | feature extraction | Data Mining | 8 | 1 | 2005 | 1 |
| 677 | image annotation | Image Processing | 11 | 1 | 2005 | 2 |
| 677 | retrieval model | Info retrieval | 12 | 1 | 2005 | 3 |
| 677 | semantic | Prog Languages | 19 | 1 | 2007 | 1 |
| 677 | feature extraction | Data Mining | 8 | 1 | 2009 | 1 |
| 677 | image annotation | Image Processing | 11 | 1 | 2011 | 1 |
| 677 | semantic | Prog Languages | 19 | 1 | 2012 | 1 |
| 677 | video sequence | Computation Math | 5 | 2 | 2013 | 1 |
| 1359 | adversary model | Analysis of Algo | 1 | 2 | 2005 | 1 |
| 1359 | ensemble method | Machine Learning | 14 | 2 | 2005 | 2 |
| 1359 | image representation | Image Processing | 11 | 2 | 2005 | 3 |
| 1359 | adversary model | Analysis of Algo | 1 | 7 | 2006 | 1 |
| 1359 | concurrency control | Signal Processing | 17 | 5 | 2006 | 2 |
| 1359 | information system | Info retrieval | 12 | 2 | 2006 | 3 |
| 1359 | algorithm analysis | Analysis of Algo | 1 | 3 | 2007 | 1 |
| 1359 | markov model | Prob & Statistics | 18 | 2 | 2007 | 2 |
| 1359 | real time systems | Signal Processing | 17 | 2 | 2007 | 3 |
| 1359 | point based model | Computation Math | 5 | 3 | 2008 | 1 |
| 1359 | discriminant analysis | Analysis of Algo | 1 | 2 | 2008 | 2 |
| 1359 | fuzzy logic systems | Artif Intelligence| 2 | 2 | 2008 | 3 |
| ... | ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... |
+-----------+------------------------+-------------------+-------------+-------------+------------+------+
This shows the top 3 rows for each Author_ID in every Paper_Year between 2005 and 2014, ordered by Paper_Count descending. So an Author_ID with papers in each of the 10 years yields up to 30 rows.
I want to display the top 3 rows for each Author_ID not for every Paper_Year individually but for each two-year Paper_Interval, i.e. 2005-06, 2007-08, 2009-10, 2011-12, 2013-14.
The desired/expected* OUTPUT is:
* Even if an author has no paper in one of the years, that year should still appear in the interval label, e.g. Author_ID = 677 has no paper in 2006, but the interval should still be displayed as 2005-2006.
+-----------+---------------------+-------------+-------------+----------------+------+
| Author_ID | Research_Category | Category_ID | Paper_Count | Paper_Interval | Rank |
+-----------+---------------------+-------------+-------------+----------------+------+
| 677 | Data Mining | 8 | 1 | 2005-06 | 1 |
| 677 | Image Processing | 11 | 1 | 2005-06 | 2 |
| 677 | Info retrieval | 12 | 1 | 2005-06 | 3 |
| 677 | Prog Languages | 19 | 1 | 2007-08 | 1 |
| 677 | Data Mining | 8 | 1 | 2009-10 | 1 |
| 677 | Image Processing | 11 | 1 | 2011-12 | 1 |
| 677 | Prog Languages | 19 | 1 | 2011-12 | 2 |
| 677 | Computation Math | 5 | 2 | 2013-14 | 1 |
| 1359 | Analysis of Algo | 1 | 9 | 2005-06 | 1 |
| 1359 | Signal Processing | 17 | 5 | 2005-06 | 2 |
| 1359 | Machine Learning | 14 | 2 | 2005-06 | 3 |
| 1359 | Analysis of Algo | 1 | 5 | 2007-08 | 1 |
| 1359 | Computation Math | 5 | 3 | 2007-08 | 2 |
| 1359 | Artif Intelligence | 2 | 2 | 2007-08 | 3 |
| ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... |
+-----------+---------------------+-------------+-------------+----------------+------+
With two-year intervals, each author will have at most 15 rows (5 intervals, if there are papers in each Paper_Interval).
Moreover, the Research_Category with the highest total Paper_Count in a Paper_Interval gets Rank = 1, the next highest gets Rank = 2, and so on. If there is a tie in Paper_Count, as in this case:
For Author_ID = 1359 and Paper_Interval = 2005-06
In terms of highest total Paper_Count,
First Highest Total Paper_Count = 9 for Category_ID = 1 will be at Rank = 1
Second Highest Total Paper_Count = 5 for Category_ID = 17 will be at Rank = 2
Third Highest Total Paper_Count = 2 is a tie between three categories, i.e.
Research_Area | Category_ID | Paper_Count | Paper_Interval
-----------------------------------------------------------------
ensemble method | 14 | 2 | 2005-06
image representation | 11 | 2 | 2005-06
information system | 12 | 2 | 2005-06
In this case we choose alphabetically by Research_Area for Rank = 3, which gives Category_ID = 14.
The question is: how can we modify this query to get output in desired form for 5 intervals (i.e. Paper_Interval) for each Author_ID?
ADDENDUM
I have added 3 tables (used in the query) with sample data in .csv format in under-mentioned links as:
CREATE TABLE author_individual_papers
CREATE TABLE [dbo].[author_individual_papers](
[id] [int] IDENTITY(1,1) NOT NULL,
[aid] [int] NULL,
[pid] [int] NULL,
[p_year] [int] NULL,
[p_venue_vid] [int] NULL
)
Table link with sample data (only for Author_ID 677 & 1359)
author_individual_papers
CREATE TABLE sub_aminer_paper
CREATE TABLE [dbo].[sub_aminer_paper](
[pid] [int] NULL,
[p_year] [int] NULL,
[abstract_research_area] [varchar](max) NULL,
[abstract_research_area_category_id] [int] NULL
)
Table link with sample data (only for Author_ID 677 & 1359)
sub_aminer_paper
CREATE TABLE research_categories
CREATE TABLE [dbo].[research_categories](
[category_id] [int] NOT NULL,
[research_category] [nvarchar](max) NULL
)
Table link with data
research_categories
The desired/expected result is already mentioned above in the question.
Try this; hopefully I got your requirement right.
DECLARE @year_start INT
DECLARE @year_end INT
SET @year_start = 2005
SET @year_end = 2014
; WITH
CTE AS
(
SELECT
AIP.aid [Author_ID],
MIN(CAST(P.abstract_research_area as VARCHAR(100))) [Research_Area],
CAST(RC.research_category as VARCHAR(100)) [Research_Category],
P.abstract_research_area_category_id [Category_ID],
COUNT(*) [Paper_Count],
--P.p_year [Paper_Year], -- removed
(p.p_year - @year_start + 2) / 2 [Interval_No], -- added
CAST(MIN(P.p_year) as VARCHAR(4)) + '-' + CAST(MAX(P.p_year) as VARCHAR(4)) [Interval] -- added
FROM
author_individual_papers AIP
JOIN
sub_aminer_paper P
ON AIP.pid = P.pid
JOIN research_categories RC
ON P.abstract_research_area_category_id = RC.category_id
WHERE
P.abstract_research_area_category_id IS NOT NULL
AND
AIP.aid IN (SELECT Author_ID FROM Authors)
AND
AIP.p_year BETWEEN @year_start AND @year_end
GROUP BY
AIP.aid,
CAST(RC.research_category as VARCHAR(100)),
P.abstract_research_area_category_id,
--P.p_year, -- removed
(p.p_year - @year_start + 2) / 2 -- added
),
CTE_1 AS
(
SELECT *,
ROW_NUMBER()
OVER(
PARTITION BY Author_ID, [Interval_No] -- changed
ORDER BY Paper_Count DESC, Research_Area ASC
) AS Rank
FROM CTE
)
SELECT *
FROM CTE_1
WHERE Rank <= 3
EDIT : Updated Query
DECLARE @year_start INT
DECLARE @year_end INT
SET @year_start = 2005
SET @year_end = 2014
; WITH
p_year AS -- added
(
SELECT p_year = @year_start
UNION ALL
SELECT p_year = p_year + 1
FROM p_year
WHERE p_year < @year_end
),
Interval AS -- added
(
SELECT p_year, Interval_No,
Interval = CAST(MIN(p_year) OVER (PARTITION BY Interval_No) AS VARCHAR(4)) + '-' + CAST(MAX(p_year) OVER (PARTITION BY Interval_No) AS VARCHAR(4))
FROM
(
SELECT p_year, (p_year - @year_start + 2) / 2 AS Interval_No
FROM p_year
) AS D
),
CTE AS
(
SELECT
AIP.aid [Author_ID],
MIN(CAST(P.abstract_research_area as VARCHAR(100))) [Research_Area],
CAST(RC.research_category as VARCHAR(100)) [Research_Category],
P.abstract_research_area_category_id [Category_ID],
COUNT(*) [Paper_Count],
--P.p_year [Paper_Year], -- removed
I.Interval_No, -- added, changed
I.Interval -- added, changed
FROM
author_individual_papers AIP
JOIN
sub_aminer_paper P
ON AIP.pid = P.pid
JOIN research_categories RC
ON P.abstract_research_area_category_id = RC.category_id
JOIN Interval I -- added
ON P.p_year = I.p_year
WHERE
P.abstract_research_area_category_id IS NOT NULL
AND
AIP.aid IN (SELECT Author_ID FROM Authors)
AND
AIP.p_year BETWEEN @year_start AND @year_end
GROUP BY
AIP.aid,
CAST(RC.research_category as VARCHAR(100)),
P.abstract_research_area_category_id,
--P.p_year, -- removed
I.Interval_No, I.Interval -- added, changed
),
CTE_1 AS
(
SELECT *,
ROW_NUMBER()
OVER
(
PARTITION BY Author_ID, [Interval_No] -- changed
ORDER BY Paper_Count DESC, Research_Area ASC
) AS Rank
FROM CTE
)
SELECT *
FROM CTE_1
WHERE Rank <= 3
ORDER BY Author_ID, Interval, Rank
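The bucketing expression (p_year - year_start + 2) / 2 used in both queries relies on integer division. A small Python sketch of the same arithmetic, assuming years at or after the start year; the function names are made up for this illustration:

```python
def interval_no(year, year_start=2005):
    """Two-year bucket number, mirroring (p_year - year_start + 2) / 2 in T-SQL."""
    # T-SQL divides INTs with integer division, hence // here.
    return (year - year_start + 2) // 2


def interval_label(year, year_start=2005):
    """Label covering both years of the bucket, whether or not both have papers."""
    first = year_start + 2 * (interval_no(year, year_start) - 1)
    return f"{first}-{first + 1}"


for year in range(2005, 2015):
    print(year, interval_no(year), interval_label(year))
```

Deriving the label from the bucket number rather than from MIN and MAX of the years actually present is what keeps an interval shown as 2005-2006 even when an author has no paper in 2006, which is the point of the calendar CTE in the updated query.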
