I have a bunch of value pairs (Before, After) by users in a table. In ideal scenarios these values should form an unbroken chain. e.g.
| UserId | Before | After |
|--------|--------|-------|
| 1 | 0 | 10 |
| 1 | 10 | 20 |
| 1 | 20 | 30 |
| 1 | 30 | 40 |
| 1 | 40 | 30 |
| 1 | 30 | 52 |
| 1 | 52 | 0 |
Unfortunately, these records originate in multiple different tables and are imported into my investigation table. The other values in the table do not lend themselves to ordering (e.g. CreatedDate) due to some quirks in the system saving them out of order.
I need to produce a list of users with gaps in their data. e.g.
| UserId | Before | After |
|--------|--------|-------|
| 1 | 0 | 10 |
| 1 | 10 | 20 |
| 1 | 20 | 30 |
// Row Deleted (30->40)
| 1 | 40 | 30 |
| 1 | 30 | 52 |
| 1 | 52 | 0 |
I've looked at the other Daisy Chaining questions on SO (and online in general), but they all appear to be on a given problem space, where one value in the pair is always lower than the other in a predictable fashion. In my case, there can be increases or decreases.
Is there a way to quickly calculate the longest chain that can be created? I do have a CreatedAt column that would provide some (very rough) relative ordering - When the date is more than about 10 seconds apart, we could consider them orderable)
Are you not therefore simply after this to get the first row where the "chain" is broken?
SELECT UserID, Before, After
FROM dbo.YourTable YT
WHERE NOT EXISTS (SELECT 1
FROM dbo.YourTable NE
WHERE NE.After = YT.Before)
AND YT.Before != 0;
If you want to last row where the row where the "chain" is broken, just swap the aliases on the columns in the WHERE in the NOT EXISTS.
the following performs hierarchical recursion on your example data and calculates a "chain" count column called 'h_level'.
;with recur_cte([UserId], [Before], [After], h_level) as (
select [UserId], [Before], [After], 0
from dbo.test_table
where [Before] is null
union all
select tt.[UserId], tt.[Before], tt.[After], rc.h_level+1
from dbo.test_table tt join recur_cte rc on tt.UserId=rc.UserId
and tt.[Before]=rc.[After]
where tt.[Before]<tt.[after])
select * from recur_cte;
Results:
UserId Before After h_level
1 NULL 10 0
1 10 20 1
1 20 30 2
1 30 40 3
1 30 52 3
Is this helpful? Could you further define which rows to exclude?
If you want users that have more than one chain:
select t.UserID
from <T> as t left outer join <T> as t2
on t2.UserID = t.UserID and t2.Before = t.After
where t2.UserID is null
group by t.UserID
having count(*) > 1;
I'm trying to get a "lineage" or similar, and also information about the first and last links (at least; all would be good), out of a table that has self-referential links between rows that have been "replaced" and rows that have replaced them. The table has a structure along these lines:
CREATE TABLE Thing (
Id INT PRIMARY KEY,
TStamp DATETIME,
Replaces INT NULL,
ReplacedBy INT NULL
);
I'm stuck with this structure. :-) It's sort of doubly-linked (yes, it's a bit silly): Each row has a unique Id, and then a row that has been "replaced" by another will have a non-NULL ReplacedBy giving the Id of the replacement row, and the replacement row will also have a link back to what it replaces in Replaces. So we can use either Replaces or ReplacedBy (or both) if we like.
Here's some sample data:
INSERT INTO Thing
(Id, TStamp, Replaces, ReplacedBy)
VALUES
(1, '2017-01-01', NULL, 11),
(2, '2017-01-02', NULL, 12),
(3, '2017-01-03', NULL, NULL),
(4, '2017-01-04', NULL, NULL),
(11, '2017-01-11', 1, NULL),
(12, '2017-01-12', 2, 22),
(22, '2017-01-22', 12, NULL);
So 1 was replaced by 11, 2 was replaced by 12, and 12 was replaced by 22.
I'd like to get the following information for each chain of links from this table in a reasonable way:
Details of the row that started the chain
Details of the final row in the chain
Details of the links in-between or at least how many links (total) there are in the chain
...filtered by a date range applied to the last row in the chain.
In an ideal universe, I'd get back something like this:
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−−−−−−+
| FirstId | LastId | Id | Links | TStamp |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−−−−−−+
| 1 | 11 | 1 | 2 | 2017−01−01 |
| 1 | 11 | 11 | 2 | 2017−01−11 |
| 2 | 22 | 2 | 3 | 2017−01−02 |
| 2 | 22 | 12 | 3 | 2017−01−12 |
| 2 | 22 | 22 | 3 | 2017−01−22 |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−−−−−−+
So far I have this query, which I could post-process to get the above:
WITH Data AS (
SELECT Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
UNION ALL
SELECT Thing.Id, Thing.TStamp, Thing.Replaces, Thing.ReplacedBy, Depth + 1
FROM Data
JOIN Thing
ON Thing.Replaces = Data.Id
)
SELECT *
FROM Data
WHERE ReplacedBy IS NOT NULL OR Depth > 0
ORDER BY
Id, Depth;
That gives me:
+−−−−+−−−−−−−−−−−−+−−−−−−−−−−+−−−−−−−−−−−−+−−−−−−−+
| Id | TStamp | Replaces | ReplacedBy | Depth |
+−−−−+−−−−−−−−−−−−+−−−−−−−−−−+−−−−−−−−−−−−+−−−−−−−+
| 1 | 2017−01−01 | NULL | 11 | 0 |
| 2 | 2017−01−02 | NULL | 12 | 0 |
| 11 | 2017−01−11 | 1 | NULL | 1 |
| 12 | 2017−01−12 | 2 | 12 | 0 |
| 12 | 2017−01−12 | 2 | 12 | 1 |
| 22 | 2017−01−13 | 12 | NULL | 1 |
| 22 | 2017−01−13 | 12 | NULL | 2 |
+−−−−+−−−−−−−−−−−−+−−−−−−−−−−+−−−−−−−−−−−−+−−−−−−−+
And I could use something like this to figure out (for instance) the final row of each chain:
WITH Data AS (
SELECT Id, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
UNION ALL
SELECT Thing.Id, Thing.Replaces, Thing.ReplacedBy, Depth + 1
FROM Data
JOIN Thing
ON Thing.Replaces = Data.Id
),
MaxData AS (
SELECT Data.Id, Data.Depth
FROM Data
JOIN (
SELECT Id, MAX(Depth) AS MaxDepth
FROM Data
GROUP BY Id
) j ON data.Id = j.Id AND Data.Depth = j.MaxDepth
WHERE Depth > 0
)
SELECT *
FROM MaxData
ORDER BY
Id;
...which gives me:
+−−−−+−−−−−−−+
| Id | Depth |
+−−−−+−−−−−−−+
| 11 | 1 |
| 12 | 1 |
| 22 | 2 |
+−−−−+−−−−−−−+
...but I've lost the starting point and the points along the way.
I have the strong feeling I'm missing something really straight-forward — but clever — that would let me get this largely with the query rather than post-processing, some kind of join with a "min" and "max" query (but not like my one above). What would it be?
The table doesn't have any indexes on Replaces or ReplacedBy, but we could add any needed. The table is only lightly used (roughly 300k rows and probably only a couple of hundred updates/inserts a day).
I'm limited to SQL Server 2008 features.
Inspired by Gordon Linoff's answer and HABO's comment which highlighted something Gordon was doing that was critical, I:
Removed the SQL Server 2012+ FIRST_VALUE function, replacing it with a CROSS JOIN on an "overview" query of the data
Included the Links count in the overview query
Removed the reliance on t in Gordon's WHERE NOT EXISTS (SELECT 1 FROM Thing t2 WHERE t2.ReplacedBy = t.id), which (at last on SQL Server 2008) wasn't bound to anything
Filtered out rows that weren't replaced
Below, I also add the date filtering mentioned in the question
...filtered by a date range applied to the last row in the chain.
...which Gordon didn't cover at all, and changes our approach, but only in terms of the arrow of time.
So, first, without the date criteria, sticking fairly close to Gordon's answer:
WITH Data AS (
SELECT Id AS FirstId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
WHERE Replaces IS NULL AND ReplacedBy IS NOT NULL
UNION ALL
SELECT d.FirstId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth + 1
FROM Data d
JOIN Thing t ON t.Replaces = d.Id
),
Overview AS (
SELECT FirstId, MAX(Id) AS LastId, COUNT(*) AS Links
FROM Data
GROUP BY
FirstId
)
SELECT d.FirstId, o.LastId, d.Id, o.Links, d.Depth, d.TStamp
FROM Data d
CROSS APPLY (
SELECT LastId, Links
FROM Overview
WHERE FirstId = d.FirstId
) o
ORDER BY
d.FirstId, d.Depth
;
The critical parts of that are grabbing the seed Id as FirstId here:
SELECT Id AS FirstId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
WHERE Replaces IS NULL AND ReplacedBy IS NOT NULL
and then propagating it through the results of the recursive join:
SELECT d.FirstId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth + 1
FROM Data d
JOIN Thing t ON t.Replaces = d.Id
Just adding that to my original query gives us most of what I wanted. Then we add a second query to get the LastId for each FirstId (Gordon did it as a FIRST_VALUE over a partition, but I can't do that in SQL Server 2008) and using an overview query also lets me grab the number of links. We cross-apply that on the basis of the FirstId value to get the overall results I wanted.
The query above returns the following for the sample data:
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| FirstId | LastId | Id | Links | Depth | TStamp |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| 1 | 11 | 1 | 2 | 0 | 2017-01-01 |
| 1 | 11 | 11 | 2 | 1 | 2017-01-11 |
| 2 | 22 | 2 | 3 | 0 | 2017-01-02 |
| 2 | 22 | 12 | 3 | 1 | 2017-01-12 |
| 2 | 22 | 22 | 3 | 2 | 2017-01-13 |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
...e.g., exactly what I wanted, plus Depth if I want (so I know what order the intermediary links were in).
If we wanted to include rows that were never replaced, we'd just change
WHERE Replaces IS NULL AND ReplacedBy IS NOT NULL
to
WHERE Replaces IS NULL
Giving us:
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| FirstId | LastId | Id | Links | Depth | TStamp |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| 1 | 11 | 1 | 2 | 0 | 2017-01-01 |
| 1 | 11 | 11 | 2 | 1 | 2017-01-11 |
| 2 | 22 | 2 | 3 | 0 | 2017-01-02 |
| 2 | 22 | 12 | 3 | 1 | 2017-01-12 |
| 2 | 22 | 22 | 3 | 2 | 2017-01-13 |
| 3 | 3 | 3 | 1 | 0 | 2017-01-03 |
| 4 | 4 | 4 | 1 | 0 | 2017-01-04 |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
But we've ignored the date criteria required by the question:
...filtered by a date range applied to the last row in the chain.
To do that without building a massive temporary result set, we have to work backward: Instead of selecting the starting point (the first entry in a chain, Replaces IS NULL), we need to select the ending point (the last entry in a chain, ReplacedBy IS NULL), and then invert our logic working back through the chain. It's largely a matter of:
Swapping FirstId with LastId
Swapping Replaces with ReplacedBy (convenient the table had both!)
Using MIN to get the first ID in the chain rather than MAX to get the last
Using d.Depth - 1 rather than d.Depth + 1
Then fixing-up Depth based on Links once we know it in our final select, to get those nice values where 0 = first link rather than some varying negative number: o.Links + d.Depth - 1 AS Depth
All of which gives us:
WITH Data AS (
SELECT Id AS LastId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
WHERE ReplacedBy IS NULL AND Replaces IS NOT NULL
-- Filtering by date of last entry would go here
UNION ALL
SELECT d.LastId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth - 1
FROM Data d
JOIN Thing t ON t.ReplacedBy = d.Id
),
Overview AS (
SELECT LastId, MIN(Id) AS FirstId, COUNT(*) AS Links
FROM Data
GROUP BY
LastId
)
SELECT o.FirstId, d.LastId, d.Id, o.Links, o.Links + d.Depth - 1 AS Depth, d.TStamp
FROM Data d
CROSS APPLY (
SELECT FirstId, Links
FROM Overview
WHERE LastId = d.LastId
) o
ORDER BY
o.FirstId, d.Depth
;
So for instance, if we used
AND TStamp BETWEEN '2017-01-12' AND '2017-02-01'
where I have
-- Filtering by date of last entry would go here
above, with our sample data we'd get this result:
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| FirstId | LastId | Id | Links | Depth | TStamp |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| 2 | 22 | 2 | 3 | 0 | 2017−01−02 |
| 2 | 22 | 12 | 3 | 1 | 2017−01−12 |
| 2 | 22 | 22 | 3 | 2 | 2017−01−13 |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
...because the last link the Id = 1 chain is outside the date range, so we don't include it.
This is a little tricky. Arrange the CTE to start at the beginning of each list. That makes the subsequent processing easier:
WITH Data AS (
SELECT Id as FirstId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing t
WHERE NOT EXISTS (SELECT 1 FROM Thing t2 WHERE t2.ReplacedBy = t.id)
UNION ALL
SELECT d.FirstId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth + 1
FROM Data d JOIN
Thing t
ON t.Replaces = d.Id
)
SELECT d.*,
FIRST_VALUE(id) OVER (PARTITION BY FirstId ORDER BY Depth DESC) as LastId
FROM Data d;
Then, you can use FIRST_VALUE() with a reverse sort to get the last value in the chain.
This returns chains that have no links. You can add a filter to remove these.
I have imported two Excel sheets as tables in Microsoft's SQL Server Management Server 2007, and they are both identical, except for the fact that they are from 2 different dates.
I'm looking to do 2 things that I'm struggling to do:
Calculate the monthly difference between values for the 2 tables which I can do with cast, and
inner join, but I'm not successful in using those values to be able to sum those values with a nested case like so:
SELECT SUM(
CASE WHEN ID <>'MISSING' THEN
CASE WHEN SUM(VALUE)>=0 THEN
SUM(VALUE)
ELSE
0
END
END)
I've tried many different ways, but one of the main errors I get is:
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
The data would like so:
dbo.Table1
date(dd/mm/yy) | name | id | value
---------------+------+---------+-------
1/1/14 | A | MISSING | 56
1/1/14 | A | MISSING | -1
1/1/14 | B | YES | 56
1/1/14 | B | YES | -1
dbo.Table2
date(dd/mm/yy) | name | id | value
---------------+------+---------+-------
1/2/14 | A | MISSING | 24
1/2/14 | A | MISSING | -11
1/2/14 | B | YES | 24
1/2/14 | B | YES | -11
Don't use SUM inside another SUM, just keep only the outer SUM. Also your outer CASE has no ELSE part, but you'd probably want to have 0 in that scenario as well, so why not use a single CASE condition?
SELECT SUM(CASE WHEN ID <> 'MISSING' AND VALUE >= 0 THEN VALUE ELSE 0 END)
I know there are several unpivot / cross apply discussions here but I was not able to find any discussion that covers my problem. What I've got so far is the following:
SELECT Perc, Salary
FROM (
SELECT jobid, Salary_10 AS Perc10, Salary_25 AS Perc25, [Salary_Median] AS Median
FROM vCalculatedView
WHERE JobID = '1'
GROUP BY JobID, SourceID, Salary_10, Salary_25, [Salary_Median]
) a
UNPIVOT (
Salary FOR Perc IN (Perc10, Perc25, Median)
) AS calc1
Now, what I would like is to add several other columns, eg. one named Bonus which I also want to put in Perc10, Perc25 and Median Rows.
As an alternative, I also made a query with cross apply, but here, it seems as if you can not "force" sort the rows like you can with unpivot. In other words, I can not have a custom sort, but only a sort that is according to a number within the table, if I am correct? At least, here I do get the result like I wish to have, but the rows are in a wrong order and I do not have the rows names like Perc10 etc. which would be nice.
SELECT crossapplied.Salary,
crossapplied.Bonus
FROM vCalculatedView v
CROSS APPLY (
VALUES
(Salary_10, Bonus_10)
, (Salary_25, Bonus_25)
, (Salary_Median, Bonus_Median)
) crossapplied (Salary, Bonus)
WHERE JobID = '1'
GROUP BY crossapplied.Salary,
crossapplied.Bonus
Perc stands for Percentile here.
Output is intended to be something like this:
+--------------+---------+-------+
| Calculation | Salary | Bonus |
+--------------+---------+-------+
| Perc10 | 25 | 5 |
| Perc25 | 35 | 10 |
| Median | 27 | 8 |
+--------------+---------+-------+
Do I miss something or did I something wrong? I'm using MSSQL 2014, output is going into SSRS. Thanks a lot for any hint in advance!
Edit for clarification: The Unpivot-Method gives the following output:
+--------------+---------+
| Calculation | Salary |
+--------------+---------+
| Perc10 | 25 |
| Perc25 | 35 |
| Median | 27 |
+--------------+---------+
so it lacks the column "Bonus" here.
The Cross-Apply-Method gives the following output:
+---------+-------+
| Salary | Bonus |
+---------+-------+
| 35 | 10 |
| 25 | 5 |
| 27 | 8 |
+---------+-------+
So if you compare it to the intended output, you'll notice that the column "Calculation" is missing and the row sorting is wrong (note that the line 25 | 5 is in the second row instead of the first).
Edit 2: View's definition and sample data:
The view basically just adds computed columns of the table. In the table, I've got Columns like Salary and Bonus for each JobID. The View then just computes the percentiles like this:
Select
Percentile_Cont(0.1)
within group (order by Salary)
over (partition by jobID) as Salary_10,
Percentile_Cont(0.25)
within group (order by Salary)
over (partition by jobID) as Salary_25
from Tabelle
So the output is like:
+----+-------+---------+-----------+-----------+
| ID | JobID | Salary | Salary_10 | Salary_25 |
+----+-------+---------+-----------+-----------+
| 1 | 1 | 100 | 60 | 70 |
| 2 | 1 | 100 | 60 | 70 |
| 3 | 2 | 150 | 88 | 130 |
| 4 | 3 | 70 | 40 | 55 |
+----+-------+---------+-----------+-----------+
In the end, the view will be parameterized in a stored procedure.
Might this be your approach?
After your edits I understand, that your solution with CROSS APPLY would comes back with the right data, but not in the correct output. You can add constant values to your VALUES and do the sorting in a wrapper SELECT:
SELECT wrapped.Calculation,
wrapped.Salary,
wrapped.Bonus
FROM
(
SELECT crossapplied.*
FROM vCalculatedView v
CROSS APPLY (
VALUES
(1,'Perc10',Salary_10, Bonus_10)
, (2,'Perc25',Salary_25, Bonus_25)
, (3,'Median',Salary_Median, Bonus_Median)
) crossapplied (SortOrder,Calculation,Salary, Bonus)
WHERE JobID = '1'
GROUP BY crossapplied.SortOrder,
crossapplied.Calculation,
crossapplied.Salary,
crossapplied.Bonus
) AS wrapped
ORDER BY wrapped.SortOrder
Thank you for taking your time to read my question.
What I want to do is to count all rows from a colum (State), and get 2 different count columns on the result (One for each possible value 1 or 0).
For example I have this table (Called “product”) with these rows:
ProductID |Subcategory| State |
101 | 201 | 1 |
102 | 201 | 1 |
103 | 201 | 1 |
104 | 202 | 0 |
105 | 202 | 0 |
106 | 203 | 1 |
107 | 203 | 0 |
108 | 203 | 0 |
State: 1=Active, 0=Inactive
So I want to get this:
|Subcategory| Active|Inactive|
| 201 | 3 | 0 |
| 202 | 0 | 2 |
| 202 | 1 | 2 |
Do you know how can I do this?
I was trying with this query and inserting the results in a temptable:
SELECT product.Subcategory,
COUNT(product.Subcategory) AS Products
FROM product,subcategory,category
WHERE product.Subcategory = subcategory.SubcategoryId
AND product.State = 1 or 0
AND subcategory.Category = category.CategoryId
GROUP BY product.Subcategory
But I get just the products with State = 1 or 0
I tried LEFT JOIN but I couldn’t get what I need it.
This is what I’m trying… (But it is not working)
SELECT P.Subcategory,
COUNT(P.State)
FROM product AS P
LEFT JOIN(
SELECT product.Subcategory AS S,
COUNT(product.State) AS C
FROM product
WHERE product.State = 0
GROUP BY product.Subcategory) AS t
ON t.S = p.Subcategory
I swear I checked on other user questions like:
Display zero by using count(*) if no result returned for a particular case
SQL - Returning all rows even if count is zero for item
But I couldn’t find what I need it.
Could you please help me with this? =)
Thank you!!!
I would do this using conditional aggregation:
select p.subcategory,
sum(case when state = 1 then 1 else 0 end) as active,
sum(case when state = 0 then 1 else 0 end) as inactive
from product p
group by p.subcategory
You can do that in two ways.
One is by using Conditional Aggregate use Count Instead of Sum to avoid Else part in case statement
SELECT subcategory,
Count(CASE WHEN state = 1 THEN 1 END) AS Active,
Count(CASE WHEN state = 0 THEN 1 END) AS InActive
FROM product
GROUP BY subcategory
Another way is by using Pivot
SELECT subcategory,
[1] AS Active,
[0] AS InActive
FROM (SELECT subcategory,
State,
ProductID
FROM product) a
PIVOT(Count(ProductID)
FOR State IN([1],[0])) piv
Try This
SELECT subcategory,
Count(State) AS TotalC,
ABS(Sum(State)) AS Active,
TotalC-Active AS InActive
FROM product GROUP BY ProductID;