I have a sports statistical database. In one of the tables, I have the game-by-game stats for each player.
PK, PlayerID, OpponentID, GameID, Points, Rebounds, etc...
I would like to know how to return queries like, most consecutive games with at least 20 points or consecutive games with 10 rebounds, etc... (I have many other tables where this applies as well, just using this as an example.)
GameID is in chronological order, so that would be the way to determine consecutive games.
I assume this involves CTEs but I am not well-versed in that subject.
You are looking for queries that implement solutions for gaps-and-island problems.
Your question is quite generic, so let me give you an example of a such query, say: find the player(s) with the most consecutive 20+ point games; also find out the first and last game of the series, and the top/low points.
Here is a query for that purpose:
select top 1 with ties
PlayerID,
min(GameID) first_game,
max(GameID) last_game,
min(Points) min_points,
max(Points) max_points,
count(*) consecutive_games
from (
select
s.*,
row_number() over(partition by PlayerID order by GameID) rn,
sum(case when Points >= 20 then 1 else 0 end) over(partition by PlayerID order by GameID) sm
from PlayerStats s
) x
where Points >= 20
group by PlayerID, rn - sm
order by consecutive_games desc;
This works by doing a conditional sum ordered by game, that increments for each game above 20 points, and comparing it to the game sequence. When the difference between the sum and the game sequence changes, a new group of games starts; the rest is just aggregation. You can run the subquery idenpendently to see what it returns; you can also remove the top 1 clause to see the whole list of +20 points games series).
With this sample data:
PlayerID | GameID | Points
-------: | -----: | -----:
1 | 1 | 10
1 | 2 | 25
1 | 3 | 24
1 | 4 | 32
1 | 5 | 2
1 | 6 | 27
1 | 7 | 42
1 | 8 | 32
1 | 9 | 21
1 | 10 | 20
The query returns:
PlayerID | first_game | last_game | min_points | max_points | consecutive_games
-------: | ---------: | --------: | ---------: | ---------: | ----------------:
1 | 6 | 10 | 20 | 42 | 5
You should be able to apply the same logic for other stats.
Demo on DB Fiddle
I'm using SQL Server 2008, and trying to gather individual customer data appearing over multiple rows in my table, an example of my database is as follows:
custID | status | type | value
-------------------------
1 | 1 | A | 150
1 | 0 | B | 100
1 | 0 | A | 153
1 | 0 | A | 126
2 | 0 | A | 152
2 | 0 | B | 101
2 | 0 | B | 103
For each custID, my task is to find a flag if status=1 for any row, if type=B for any row, and the average of value in all cases where type=B. So my solution should look like:
custID | statusFlag | typeFlag | valueAv
-------------------------------------------
1 | 1 | 1 | 100
2 | 0 | 1 | 102
I can get answers for this using lots of row_number() over (partition by .. ), to create ids, and creating subtables for each column selecting the desired id. My issue is this method is awkward and time consuming, as I have many more columns than shown above to do this over, and many tables to repeat it for. My ideal solution would be to define my own aggregate() function so I could just do:
select custID, ag1(statusFlag), ag2(typeFlag)
group by custID
but as far as I can tell custom aggregates can't be defined in SQL server. Is there a nicer general approach to this problem, which doesn't require defining lots of id's ?
use CASE WHEN to evaluate the value and apply the aggregate function accordingly
select custID,
statusFlag = max(status),
typeFlag = max(case when type = 'B' then 1 else 0 end),
valueAv = avg(case when type = 'B' then value end)
from samples
group by custID
I'm trying to get a "lineage" or similar, and also information about the first and last links (at least; all would be good), out of a table that has self-referential links between rows that have been "replaced" and rows that have replaced them. The table has a structure along these lines:
CREATE TABLE Thing (
Id INT PRIMARY KEY,
TStamp DATETIME,
Replaces INT NULL,
ReplacedBy INT NULL
);
I'm stuck with this structure. :-) It's sort of doubly-linked (yes, it's a bit silly): Each row has a unique Id, and then a row that has been "replaced" by another will have a non-NULL ReplacedBy giving the Id of the replacement row, and the replacement row will also have a link back to what it replaces in Replaces. So we can use either Replaces or ReplacedBy (or both) if we like.
Here's some sample data:
INSERT INTO Thing
(Id, TStamp, Replaces, ReplacedBy)
VALUES
(1, '2017-01-01', NULL, 11),
(2, '2017-01-02', NULL, 12),
(3, '2017-01-03', NULL, NULL),
(4, '2017-01-04', NULL, NULL),
(11, '2017-01-11', 1, NULL),
(12, '2017-01-12', 2, 22),
(22, '2017-01-22', 12, NULL);
So 1 was replaced by 11, 2 was replaced by 12, and 12 was replaced by 22.
I'd like to get the following information for each chain of links from this table in a reasonable way:
Details of the row that started the chain
Details of the final row in the chain
Details of the links in-between or at least how many links (total) there are in the chain
...filtered by a date range applied to the last row in the chain.
In an ideal universe, I'd get back something like this:
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−−−−−−+
| FirstId | LastId | Id | Links | TStamp |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−−−−−−+
| 1 | 11 | 1 | 2 | 2017−01−01 |
| 1 | 11 | 11 | 2 | 2017−01−11 |
| 2 | 22 | 2 | 3 | 2017−01−02 |
| 2 | 22 | 12 | 3 | 2017−01−12 |
| 2 | 22 | 22 | 3 | 2017−01−22 |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−−−−−−+
So far I have this query, which I could post-process to get the above:
WITH Data AS (
SELECT Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
UNION ALL
SELECT Thing.Id, Thing.TStamp, Thing.Replaces, Thing.ReplacedBy, Depth + 1
FROM Data
JOIN Thing
ON Thing.Replaces = Data.Id
)
SELECT *
FROM Data
WHERE ReplacedBy IS NOT NULL OR Depth > 0
ORDER BY
Id, Depth;
That gives me:
+−−−−+−−−−−−−−−−−−+−−−−−−−−−−+−−−−−−−−−−−−+−−−−−−−+
| Id | TStamp | Replaces | ReplacedBy | Depth |
+−−−−+−−−−−−−−−−−−+−−−−−−−−−−+−−−−−−−−−−−−+−−−−−−−+
| 1 | 2017−01−01 | NULL | 11 | 0 |
| 2 | 2017−01−02 | NULL | 12 | 0 |
| 11 | 2017−01−11 | 1 | NULL | 1 |
| 12 | 2017−01−12 | 2 | 12 | 0 |
| 12 | 2017−01−12 | 2 | 12 | 1 |
| 22 | 2017−01−13 | 12 | NULL | 1 |
| 22 | 2017−01−13 | 12 | NULL | 2 |
+−−−−+−−−−−−−−−−−−+−−−−−−−−−−+−−−−−−−−−−−−+−−−−−−−+
And I could use something like this to figure out (for instance) the final row of each chain:
WITH Data AS (
SELECT Id, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
UNION ALL
SELECT Thing.Id, Thing.Replaces, Thing.ReplacedBy, Depth + 1
FROM Data
JOIN Thing
ON Thing.Replaces = Data.Id
),
MaxData AS (
SELECT Data.Id, Data.Depth
FROM Data
JOIN (
SELECT Id, MAX(Depth) AS MaxDepth
FROM Data
GROUP BY Id
) j ON data.Id = j.Id AND Data.Depth = j.MaxDepth
WHERE Depth > 0
)
SELECT *
FROM MaxData
ORDER BY
Id;
...which gives me:
+−−−−+−−−−−−−+
| Id | Depth |
+−−−−+−−−−−−−+
| 11 | 1 |
| 12 | 1 |
| 22 | 2 |
+−−−−+−−−−−−−+
...but I've lost the starting point and the points along the way.
I have the strong feeling I'm missing something really straight-forward — but clever — that would let me get this largely with the query rather than post-processing, some kind of join with a "min" and "max" query (but not like my one above). What would it be?
The table doesn't have any indexes on Replaces or ReplacedBy, but we could add any needed. The table is only lightly used (roughly 300k rows and probably only a couple of hundred updates/inserts a day).
I'm limited to SQL Server 2008 features.
Inspired by Gordon Linoff's answer and HABO's comment which highlighted something Gordon was doing that was critical, I:
Removed the SQL Server 2012+ FIRST_VALUE function, replacing it with a CROSS JOIN on an "overview" query of the data
Included the Links count in the overview query
Removed the reliance on t in Gordon's WHERE NOT EXISTS (SELECT 1 FROM Thing t2 WHERE t2.ReplacedBy = t.id), which (at last on SQL Server 2008) wasn't bound to anything
Filtered out rows that weren't replaced
Below, I also add the date filtering mentioned in the question
...filtered by a date range applied to the last row in the chain.
...which Gordon didn't cover at all, and changes our approach, but only in terms of the arrow of time.
So, first, without the date criteria, sticking fairly close to Gordon's answer:
WITH Data AS (
SELECT Id AS FirstId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
WHERE Replaces IS NULL AND ReplacedBy IS NOT NULL
UNION ALL
SELECT d.FirstId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth + 1
FROM Data d
JOIN Thing t ON t.Replaces = d.Id
),
Overview AS (
SELECT FirstId, MAX(Id) AS LastId, COUNT(*) AS Links
FROM Data
GROUP BY
FirstId
)
SELECT d.FirstId, o.LastId, d.Id, o.Links, d.Depth, d.TStamp
FROM Data d
CROSS APPLY (
SELECT LastId, Links
FROM Overview
WHERE FirstId = d.FirstId
) o
ORDER BY
d.FirstId, d.Depth
;
The critical parts of that are grabbing the seed Id as FirstId here:
SELECT Id AS FirstId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
WHERE Replaces IS NULL AND ReplacedBy IS NOT NULL
and then propagating it through the results of the recursive join:
SELECT d.FirstId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth + 1
FROM Data d
JOIN Thing t ON t.Replaces = d.Id
Just adding that to my original query gives us most of what I wanted. Then we add a second query to get the LastId for each FirstId (Gordon did it as a FIRST_VALUE over a partition, but I can't do that in SQL Server 2008) and using an overview query also lets me grab the number of links. We cross-apply that on the basis of the FirstId value to get the overall results I wanted.
The query above returns the following for the sample data:
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| FirstId | LastId | Id | Links | Depth | TStamp |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| 1 | 11 | 1 | 2 | 0 | 2017-01-01 |
| 1 | 11 | 11 | 2 | 1 | 2017-01-11 |
| 2 | 22 | 2 | 3 | 0 | 2017-01-02 |
| 2 | 22 | 12 | 3 | 1 | 2017-01-12 |
| 2 | 22 | 22 | 3 | 2 | 2017-01-13 |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
...e.g., exactly what I wanted, plus Depth if I want (so I know what order the intermediary links were in).
If we wanted to include rows that were never replaced, we'd just change
WHERE Replaces IS NULL AND ReplacedBy IS NOT NULL
to
WHERE Replaces IS NULL
Giving us:
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| FirstId | LastId | Id | Links | Depth | TStamp |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| 1 | 11 | 1 | 2 | 0 | 2017-01-01 |
| 1 | 11 | 11 | 2 | 1 | 2017-01-11 |
| 2 | 22 | 2 | 3 | 0 | 2017-01-02 |
| 2 | 22 | 12 | 3 | 1 | 2017-01-12 |
| 2 | 22 | 22 | 3 | 2 | 2017-01-13 |
| 3 | 3 | 3 | 1 | 0 | 2017-01-03 |
| 4 | 4 | 4 | 1 | 0 | 2017-01-04 |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
But we've ignored the date criteria required by the question:
...filtered by a date range applied to the last row in the chain.
To do that without building a massive temporary result set, we have to work backward: Instead of selecting the starting point (the first entry in a chain, Replaces IS NULL), we need to select the ending point (the last entry in a chain, ReplacedBy IS NULL), and then invert our logic working back through the chain. It's largely a matter of:
Swapping FirstId with LastId
Swapping Replaces with ReplacedBy (convenient the table had both!)
Using MIN to get the first ID in the chain rather than MAX to get the last
Using d.Depth - 1 rather than d.Depth + 1
Then fixing-up Depth based on Links once we know it in our final select, to get those nice values where 0 = first link rather than some varying negative number: o.Links + d.Depth - 1 AS Depth
All of which gives us:
WITH Data AS (
SELECT Id AS LastId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
WHERE ReplacedBy IS NULL AND Replaces IS NOT NULL
-- Filtering by date of last entry would go here
UNION ALL
SELECT d.LastId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth - 1
FROM Data d
JOIN Thing t ON t.ReplacedBy = d.Id
),
Overview AS (
SELECT LastId, MIN(Id) AS FirstId, COUNT(*) AS Links
FROM Data
GROUP BY
LastId
)
SELECT o.FirstId, d.LastId, d.Id, o.Links, o.Links + d.Depth - 1 AS Depth, d.TStamp
FROM Data d
CROSS APPLY (
SELECT FirstId, Links
FROM Overview
WHERE LastId = d.LastId
) o
ORDER BY
o.FirstId, d.Depth
;
So for instance, if we used
AND TStamp BETWEEN '2017-01-12' AND '2017-02-01'
where I have
-- Filtering by date of last entry would go here
above, with our sample data we'd get this result:
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| FirstId | LastId | Id | Links | Depth | TStamp |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| 2 | 22 | 2 | 3 | 0 | 2017−01−02 |
| 2 | 22 | 12 | 3 | 1 | 2017−01−12 |
| 2 | 22 | 22 | 3 | 2 | 2017−01−13 |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
...because the last link the Id = 1 chain is outside the date range, so we don't include it.
This is a little tricky. Arrange the CTE to start at the beginning of each list. That makes the subsequent processing easier:
WITH Data AS (
SELECT Id as FirstId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing t
WHERE NOT EXISTS (SELECT 1 FROM Thing t2 WHERE t2.ReplacedBy = t.id)
UNION ALL
SELECT d.FirstId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth + 1
FROM Data d JOIN
Thing t
ON t.Replaces = d.Id
)
SELECT d.*,
FIRST_VALUE(id) OVER (PARTITION BY FirstId ORDER BY Depth DESC) as LastId
FROM Data d;
Then, you can use FIRST_VALUE() with a reverse sort to get the last value in the chain.
This returns chains that have no links. You can add a filter to remove these.
I have 2 tables, one with the ID and its count and the other with the names belonging to the respective IDs. They need to be joined so that the end result is a table with count of 1 in each row and the respective names next to them. Note that the number of names in table 2 is less than the count in table 1 for the same ID in some cases.
Table 1
ID | Count
-----------
100 | 3
101 | 2
102 | 4
Table 2
ID | Name
----------
100 | abc
100 | def
101 | ghi
101 | jkl
102 | mno
102 | pqr
102 | stu
Result
ID | Count | Name
------------------
100 | 1 | abc
100 | 1 | def
100 | 1 |
101 | 1 | ghi
101 | 1 | jkl
102 | 1 | mno
102 | 1 | pqr
102 | 1 | stu
102 | 1 |
I'm using TSQL for this and my current query converts table 1 into multiple rows in the result table; then it inserts individual names from table 2 into the result table through a loop. I'm hoping there must be a simpler or more efficient way to do this as the current method takes considerable amount of time. If there is, please let me know.
The first thing that comes to mind for me involves using a Number table, which you could create (as a one-time task) like this:
CREATE TABLE numbers (
ID INT
)
DECLARE #CurrentNumber INT, #MaxNumber INT
SET #MaxNumber = 100 -- Choose a value here which you feel will always be greater than MAX(table1.Count)
SET #CurrentNumber = 1
WHILE #CurrentNumber <= #MaxNumber
BEGIN
INSERT INTO numbers VALUES (#CurrentNumber)
SET #CurrentNumber = #CurrentNumber + 1
END
Once you have a numbers table, you can solve this problem like this:
SELECT one.ID,
1 AS [Count],
ISNULL(two.Name,'') AS Name
FROM table1 one
JOIN numbers n ON n.ID <= CASE WHEN one.[Count] >= (SELECT COUNT(1) FROM table2 two WHERE one.ID = two.ID)
THEN one.[Count]
ELSE (SELECT COUNT(1) FROM table2 two WHERE one.ID = two.ID)
END
LEFT JOIN (SELECT ID,
Name,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS RecordNo
FROM table2) two ON one.ID = two.ID
AND two.RecordNo = n.ID
I have following Product table and ProductTag tables -
ID | Product
--------------
1 | Product_A
2 | Product_B
3 | Product_C
TagID | ProductID
----------------------
1 | 2
1 | 3
2 | 1
2 | 2
2 | 3
3 | 1
3 | 2
Now I need a SQL query that return all products list which are having both Tag 1 and 2. Result should be as given below -
ProductID | Product
------------------------
2 | Product_B
3 | Product_C
Please suggest how can i write a MS SQL query for this.
SELECT p.ID, p.Product
FROM Product p
INNER JOIN ProductTag pt
ON p.ID = pt.ProductID
WHERE pt.TagID IN (1, 2) -- <== Tags you want to find
GROUP BY p.ID, o.Product
HAVING COUNT(*) = 2 -- <== tag count on WHERE clause
however, if TagID is not unique on every Product, you need to count only the distinct product.
HAVING COUNT(DISTINCT pt.TagID) = 2
More on: SQL of Relational Division