SQL Server Hierarchical Sum of column - sql-server

My database design is shown in the diagram.
The Category table is self-referencing (a parent-child relationship).
The Budget table holds every category and the amount defined for each one.
The Expense table has entries for the categories on which money has been spent (use the Total column from this table).
I want to write a SELECT statement that returns a dataset with the columns given below:
ID
CategoryID
CategoryName
TotalAmount (Sum of Amount Column of all children hierarchy From BudgetTable )
SumOfExpense (Sum of Total Column of Expense all children hierarchy from expense table)
I tried to use a CTE but was unable to produce anything useful. Thanks for your help in advance. :)
Update
Just to combine and simplify the data, I have created a view with the query below.
SELECT
dbo.Budget.Id, dbo.Budget.ProjectId, dbo.Budget.CategoryId,
dbo.Budget.Amount,
dbo.Category.ParentID, dbo.Category.Name,
ISNULL(dbo.Expense.Total, 0) AS CostToDate
FROM
dbo.Budget
INNER JOIN
dbo.Category ON dbo.Budget.CategoryId = dbo.Category.Id
LEFT OUTER JOIN
dbo.Expense ON dbo.Category.Id = dbo.Expense.CategoryId
Basically that should produce results like this.

This is an interesting problem. And I'm going to solve it with a hierarchyid. First, the setup:
USE tempdb;
IF OBJECT_ID('dbo.Hierarchy') IS NOT NULL
DROP TABLE dbo.[Hierarchy];
CREATE TABLE dbo.Hierarchy
(
ID INT NOT NULL PRIMARY KEY,
ParentID INT NULL,
CONSTRAINT [FK_parent] FOREIGN KEY ([ParentID]) REFERENCES dbo.Hierarchy([ID]),
hid HIERARCHYID,
Amount INT NOT null
);
INSERT INTO [dbo].[Hierarchy]
( [ID], [ParentID], [Amount] )
VALUES
(1, NULL, 100 ),
(2, 1, 50),
(3, 1, 50),
(4, 2, 58),
(5, 2, 7),
(6, 3, 10),
(7, 3, 20)
SELECT * FROM dbo.[Hierarchy] AS [h];
Next, I need to update the hid column with a proper hierarchyid value. I'll use a bog-standard recursive CTE for that:
WITH cte AS (
SELECT [h].[ID] ,
[h].[ParentID] ,
CAST('/' + CAST(h.[ID] AS VARCHAR(10)) + '/' AS VARCHAR(MAX)) AS [h],
[h].[hid]
FROM [dbo].[Hierarchy] AS [h]
WHERE [h].[ParentID] IS NULL
UNION ALL
SELECT [h].[ID] ,
[h].[ParentID] ,
CAST([c].[h] + CAST(h.[ID] AS VARCHAR(10)) + '/' AS VARCHAR(MAX)) AS [h],
[h].[hid]
FROM [dbo].[Hierarchy] AS [h]
JOIN [cte] AS [c]
ON [h].[ParentID] = [c].[ID]
)
UPDATE [h]
SET hid = [cte].[h]
FROM cte
JOIN dbo.[Hierarchy] AS [h]
ON [h].[ID] = [cte].[ID];
Now that the heavy lifting is done, the results you want are almost trivially obtained:
SELECT p.id, SUM([c].[Amount])
FROM dbo.[Hierarchy] AS [p]
JOIN [dbo].[Hierarchy] AS [c]
ON c.[hid].IsDescendantOf(p.[hid]) = 1
GROUP BY [p].[ID];
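For comparison, the same subtree totals can be sketched without a hierarchyid column by building an ancestor/descendant closure with a recursive CTE. This is only a sketch under assumptions: it runs on SQLite through Python's sqlite3 module so it can be tried without a SQL Server instance, and the `closure` CTE and its column names are made up for the illustration. In T-SQL the `RECURSIVE` keyword is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Hierarchy (ID INTEGER PRIMARY KEY, ParentID INTEGER, Amount INTEGER NOT NULL);
INSERT INTO Hierarchy (ID, ParentID, Amount) VALUES
  (1, NULL, 100), (2, 1, 50), (3, 1, 50), (4, 2, 58), (5, 2, 7), (6, 3, 10), (7, 3, 20);
""")

# closure(AncestorID, DescendantID) pairs every node with itself and with
# every node below it; the names are hypothetical, not from the answer above.
ROLLUP_SQL = """
WITH RECURSIVE closure(AncestorID, DescendantID) AS (
    SELECT ID, ID FROM Hierarchy            -- each node is its own descendant
    UNION ALL
    SELECT c.AncestorID, h.ID               -- walk one level further down
    FROM closure AS c
    JOIN Hierarchy AS h ON h.ParentID = c.DescendantID
)
SELECT c.AncestorID, SUM(h.Amount) AS SubtreeTotal
FROM closure AS c
JOIN Hierarchy AS h ON h.ID = c.DescendantID
GROUP BY c.AncestorID
ORDER BY c.AncestorID
"""

subtree_totals = conn.execute(ROLLUP_SQL).fetchall()
for ancestor_id, total in subtree_totals:
    print(ancestor_id, total)
```

Like IsDescendantOf, the closure counts a node as its own descendant, so each total includes the node's own Amount.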

After much research and using test data, I was able to get the running totals starting from bottom of hierarchy.
The solution is made up of two steps.
Create a scalar-valued function that will decide whether a categoryId is a direct or indirect child of another categoryId. This is given in first code-snippet. Note that a recursive query is used for this since that is the best approach when dealing with hierarchy in SQL Server.
Write the running total query that will give totals according to your requirements for all categories. You can filter by category if you wanted to on this query. The second code snippet provides this query.
Scalar-valued function that tells if a child category is a direct or indirect child of another category
CREATE FUNCTION dbo.IsADirectOrIndirectChild(
@childId int, @parentId int)
RETURNS int
AS
BEGIN
DECLARE @isAChild int;
WITH h(ParentId, ChildId)
-- CTE name and columns
AS (
SELECT TOP 1 @parentId, @parentId
FROM dbo.Category AS b
UNION ALL
SELECT b.ParentId, b.Id AS ChildId
FROM h AS cte
INNER JOIN
Category AS b
ON b.ParentId = cte.ChildId AND
cte.ChildId IS NOT NULL)
SELECT @isAChild = ISNULL(ChildId, 0)
FROM h
WHERE ChildId = @childId AND
ParentId <> ChildId
OPTION(MAXRECURSION 32000);
IF @isAChild > 0
BEGIN
SET @isAChild = 1;
END;
ELSE
BEGIN
SET @isAChild = 0;
END;
RETURN @isAChild;
END;
GO
Query for running total starting from bottom of hierarchy
SELECT c.Id AS CategoryId, c.Name AS CategoryName,
(
SELECT SUM(ISNULL(b.amount, 0))
FROM dbo.Budget AS b
WHERE dbo.IsADirectOrIndirectChild( b.CategoryId, c.Id ) = 1 OR
b.CategoryId = c.Id
) AS totalAmount,
(
SELECT SUM(ISNULL(e.total, 0))
FROM dbo.Expense AS e
WHERE dbo.IsADirectOrIndirectChild( e.CategoryId, c.Id ) = 1 OR
e.CategoryId = c.Id
) AS totalCost
FROM dbo.Category AS c;

Related

More efficient way to write this query - several M:M relationships

I am using SQL Server 2014 SP3.
I have the following hypothetical database structure.
There are accounts, which can belong to multiple customers, represented by the following tables:
Account <- Account_Customer -> Customer
The customers, in turn, can own multiple cars:
Customer <- Customer_Car -> Car
In addition, the customers can own many pets:
Customer <- Customer_Pet -> Pet
Now I am trying to come up with the most efficient query to answer the following question:
Get a list of accounts where none of the account owners have a "Cat" and none of the account owners drive a "Dodge".
The script below sets up the tables and some sample data. Please note that in real life, these tables will have tens of millions of records, so I am trying to come up with the most efficient way to answer this question. So far I was only able to do it by accessing the same tables multiple times.
Setup script:
USE tempdb;
-- Create tables
IF OBJECT_ID('Account') IS NOT NULL DROP TABLE Account;
CREATE TABLE Account (AccountId INT, AccountName VARCHAR(24))
IF OBJECT_ID('Customer') IS NOT NULL DROP TABLE Customer;
CREATE TABLE Customer (CustomerId INT, CustomerName VARCHAR(24))
IF OBJECT_ID('Pet') IS NOT NULL DROP TABLE Pet;
CREATE TABLE Pet (PetId INT, PetName VARCHAR(24))
IF OBJECT_ID('Car') IS NOT NULL DROP TABLE Car;
CREATE TABLE Car (CarId INT, CarName VARCHAR(24))
IF OBJECT_ID('Account_Customer') IS NOT NULL DROP TABLE Account_Customer;
CREATE TABLE Account_Customer (AccountId INT, CustomerId INT)
IF OBJECT_ID('Customer_Pet') IS NOT NULL DROP TABLE Customer_Pet;
CREATE TABLE Customer_Pet (CustomerId INT, PetId INT)
IF OBJECT_ID('Customer_Car') IS NOT NULL DROP TABLE Customer_Car;
CREATE TABLE Customer_Car (CustomerId INT, CarId INT)
-- Populate data
INSERT [dbo].[Account]([AccountId], [AccountName])
VALUES (1, 'Account1'), (2, 'Account2')
INSERT [dbo].[Customer]([CustomerId], [CustomerName])
VALUES (1, 'Customer1'), (2, 'Customer2'), (3, 'Customer3'), (4, 'Customer4')
INSERT [dbo].[Pet]([PetId], [PetName])
VALUES (1, 'Cat1'), (2, 'Cat2'), (3, 'Dog3'), (4, 'Dog4')
INSERT [dbo].[Car]([CarId], [CarName])
VALUES (1, 'Ford1'), (2, 'Ford2'), (3, 'Kia3'), (4, 'Dodge4')
INSERT [dbo].[Account_Customer] ([AccountId], [CustomerId])
VALUES (1,1), (1,2), (2, 2), (2,3), (2,4)
INSERT [dbo].[Customer_Pet] ([CustomerId], [PetId])
VALUES (2,3), (3,1), (3, 2), (4,3), (4,4)
INSERT [dbo].[Customer_Car] ([CustomerId], [CarId])
VALUES (1,2), (2,2), (3,1), (3, 2), (3, 4)
--SELECT * FROM [dbo].[Account] AS [A]
--SELECT * FROM [dbo].[Customer] AS [C]
--SELECT * FROM [dbo].[Pet] AS [P]
--SELECT * FROM [dbo].[Car] AS [C]
--SELECT * FROM [dbo].[Account_Customer] AS [AC]
--SELECT * FROM [dbo].[Customer_Pet] AS [CP]
--SELECT * FROM [dbo].[Customer_Car] AS [CC]
-- Bring all the data together to see what we have (denormalized)
SELECT [A].[AccountId], [A].[AccountName],
[C].[CustomerId], [C].[CustomerName],
[CP].[PetId], [P].[PetName],
[C2].[CarId], [C2].[CarName]
FROM [dbo].[Customer] AS [C]
JOIN [dbo].[Account_Customer] AS [AC] ON [AC].[CustomerId] = [C].[CustomerId]
JOIN [dbo].[Account] AS [A] ON [A].[AccountId] = [AC].[AccountId]
LEFT JOIN [dbo].[Customer_Pet] AS [CP] ON [CP].[CustomerId] = [C].[CustomerId]
LEFT JOIN [dbo].[Pet] AS [P] ON [P].[PetId] = [CP].[PetId]
LEFT JOIN [dbo].[Customer_Car] AS [CC] ON [CC].[CustomerId] = [C].[CustomerId]
LEFT JOIN [dbo].[Car] AS [C2] ON [C2].[CarId] = [CC].[CarId]
ORDER BY [A].[AccountId], [AC].[CustomerId]
And here is the query, which answers my question, but I suspect it's inefficient on a large number of records. Is there a better way?
-- This should only return Account1
SELECT DISTINCT
[A].[AccountId],
[A].[AccountName]
FROM [dbo].[Customer] AS [C]
JOIN [dbo].[Account_Customer] AS [AC] ON [AC].[CustomerId] = [C].[CustomerId]
JOIN [dbo].[Account] AS [A] ON [A].[AccountId] = [AC].[AccountId]
EXCEPT
SELECT -- get Accounts where owner has a "Cat" or drives a "Dodge"
[A].[AccountId],
[A].[AccountName]
FROM [dbo].[Customer] AS [C]
JOIN [dbo].[Account_Customer] AS [AC] ON [AC].[CustomerId] = [C].[CustomerId]
JOIN [dbo].[Account] AS [A] ON [A].[AccountId] = [AC].[AccountId]
WHERE
(
EXISTS (SELECT TOP (1) 1
FROM [dbo].[Customer] AS [C2]
JOIN [dbo].[Customer_Pet] AS [CP2] ON [CP2].[CustomerId] = [C2].[CustomerId]
JOIN [dbo].[Pet] AS [P2] ON [P2].[PetId] = [CP2].[PetId]
WHERE [C2].[CustomerId] = [C].[CustomerId] -- correlation
AND [P2].[PetName] LIKE 'Cat%'
)
OR
EXISTS (SELECT TOP (1) 1
FROM [dbo].[Customer] AS [C2]
JOIN [dbo].[Customer_Car] AS [CP2] ON [CP2].[CustomerId] = [C2].[CustomerId]
JOIN [dbo].[Car] AS [P2] ON [P2].[CarId] = [CP2].[CarId]
WHERE [C2].[CustomerId] = [C].[CustomerId] -- correlation
AND [P2].[CarName] LIKE 'Dodge%'
)
)
Sorry if this is obvious, but please observe that the query below will not work, because it answers a slightly different question (return accounts where AT LEAST ONE OWNER does not have a "Cat" and does not drive a "Dodge"):
-- Does not work:
SELECT DISTINCT
[A].[AccountId],
[A].[AccountName]
FROM [dbo].[Customer] AS [C]
JOIN [dbo].[Account_Customer] AS [AC] ON [AC].[CustomerId] = [C].[CustomerId]
JOIN [dbo].[Account] AS [A] ON [A].[AccountId] = [AC].[AccountId]
WHERE
(
NOT EXISTS (SELECT TOP (1) 1
FROM [dbo].[Customer] AS [C2]
JOIN [dbo].[Customer_Pet] AS [CP2] ON [CP2].[CustomerId] = [C2].[CustomerId]
JOIN [dbo].[Pet] AS [P2] ON [P2].[PetId] = [CP2].[PetId]
WHERE [C2].[CustomerId] = [C].[CustomerId] -- correlation
AND [P2].[PetName] LIKE 'Cat%'
)
AND
NOT EXISTS (SELECT TOP (1) 1
FROM [dbo].[Customer] AS [C2]
JOIN [dbo].[Customer_Car] AS [CP2] ON [CP2].[CustomerId] = [C2].[CustomerId]
JOIN [dbo].[Car] AS [P2] ON [P2].[CarId] = [CP2].[CarId]
WHERE [C2].[CustomerId] = [C].[CustomerId] -- correlation
AND [P2].[CarName] LIKE 'Dodge%'
)
)
I must say, in a real database I would be very suspicious of all these many-to-many relationships. Can an Account be owned by multiple Customers, each of whom can own multiple Accounts? Equally, can a Car or a Pet have multiple owners?
Be that as it may: you can express your query like this:
You want all Accounts...
for which there do not exist Account_Customers...
Where those Customers are in the set of Customers who own a Cat...
... or a Dodge
SELECT *
FROM Account a
WHERE NOT EXISTS (
SELECT ac.CustomerId
FROM Account_Customer ac
WHERE ac.AccountId = a.AccountId
INTERSECT
(
SELECT cp.CustomerId
FROM Customer_Pet cp
JOIN Pet p ON p.PetId = cp.PetId
WHERE p.PetName LIKE 'Cat%'
UNION ALL
SELECT cc.CustomerId
FROM Customer_Car cc
JOIN Car c ON c.CarId = cc.CarId
WHERE c.CarName LIKE 'Dodge%'
)
)
db<>fiddle
It's too late for a more in-depth answer, so here's a quick and dirty one with a temp table.
Mind you, it's not as bad as it looks; many times I've had simple queries on temp tables massively outperform large, interesting (from a mathematical point of view) queries.
Also, a question about performance is never simple to answer. Of special interest is the fact that you mention millions of rows and a need for performance while your query uses a LIKE operator on a text column. At least the % is at the end, so it's still SARGable. Will this column have an index? That will probably make a difference.
Here (done blind, hopefully no errors):
create table #forbidden
(
CustomerId int primary key
)
insert #forbidden(CustomerId)
select C.CustomerId from Customer C
where
-- PetName and CarName live in Pet and Car, so join through the link tables
exists(select 1 from Customer_Pet CP join Pet P on P.PetId=CP.PetId where CP.CustomerId=C.CustomerId and P.[PetName] LIKE 'Cat%')
or exists(select 1 from Customer_Car CC join Car CR on CR.CarId=CC.CarId where CC.CustomerId=C.CustomerId and CR.[CarName] LIKE 'Dodge%')
select * from Account A
where not exists
(
select 1
from Account_Customer AC
where
AC.AccountId=A.AccountId -- correlate on the account, not the customer
and AC.CustomerId in (select CustomerId from #forbidden)
)
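To check the "none of the owners" logic against the sample data from the question, here is a small sketch that loads the same rows into SQLite through Python's sqlite3 module (an assumption made so it runs without SQL Server) and expresses the requirement as two NOT EXISTS tests, one for cats and one for Dodges. The Customer table is left out because the link tables already carry CustomerId.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (AccountId INTEGER, AccountName TEXT);
CREATE TABLE Pet (PetId INTEGER, PetName TEXT);
CREATE TABLE Car (CarId INTEGER, CarName TEXT);
CREATE TABLE Account_Customer (AccountId INTEGER, CustomerId INTEGER);
CREATE TABLE Customer_Pet (CustomerId INTEGER, PetId INTEGER);
CREATE TABLE Customer_Car (CustomerId INTEGER, CarId INTEGER);
INSERT INTO Account VALUES (1, 'Account1'), (2, 'Account2');
INSERT INTO Pet VALUES (1, 'Cat1'), (2, 'Cat2'), (3, 'Dog3'), (4, 'Dog4');
INSERT INTO Car VALUES (1, 'Ford1'), (2, 'Ford2'), (3, 'Kia3'), (4, 'Dodge4');
INSERT INTO Account_Customer VALUES (1, 1), (1, 2), (2, 2), (2, 3), (2, 4);
INSERT INTO Customer_Pet VALUES (2, 3), (3, 1), (3, 2), (4, 3), (4, 4);
INSERT INTO Customer_Car VALUES (1, 2), (2, 2), (3, 1), (3, 2), (3, 4);
""")

# An account qualifies only if no owner has a cat AND no owner drives a Dodge.
QUERY = """
SELECT a.AccountId, a.AccountName
FROM Account AS a
WHERE NOT EXISTS (
        SELECT 1
        FROM Account_Customer AS ac
        JOIN Customer_Pet AS cp ON cp.CustomerId = ac.CustomerId
        JOIN Pet AS p ON p.PetId = cp.PetId
        WHERE ac.AccountId = a.AccountId AND p.PetName LIKE 'Cat%')
  AND NOT EXISTS (
        SELECT 1
        FROM Account_Customer AS ac
        JOIN Customer_Car AS cc ON cc.CustomerId = ac.CustomerId
        JOIN Car AS c ON c.CarId = cc.CarId
        WHERE ac.AccountId = a.AccountId AND c.CarName LIKE 'Dodge%')
ORDER BY a.AccountId
"""

clean_accounts = conn.execute(QUERY).fetchall()
print(clean_accounts)
```

Only Account1 comes back, because Customer3 (an owner of Account2) has both a cat and a Dodge. Whether this shape beats the temp table on tens of millions of rows depends on indexes and would have to be measured.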

SQL Server query to find multiple rows that exists in another table

I'm having trouble figuring out how to write a query that checks if two tables have any overlapping rows, given a certain criteria. The main issue I have is that the number of rows that need to match can vary, sometimes it may be one row, sometimes it's several.
As an example, let's say I have two tables. I want to find which Parents are in both tables. For a parent to be in both tables, the tables need to have the same number of children, and the children should have the same names and ages. Each parent is identified by either a name or a number.
CREATE TABLE ParentsToSearch
(
ParentID INT NOT NULL,
ChildName NVARCHAR(800) NOT NULL,
ChildAge INT
);
CREATE TABLE ExistingParentsAndChildren
(
ParentName NVARCHAR(800) NOT NULL,
ChildName NVARCHAR(800) NOT NULL,
ChildAge INT
);
If I had the following sample data, I would want the query to return that ParentID 7 exists in the table ExistingParentsAndChildren (as ParentID 7's children and their ages are a perfect match with John's children).
ParentsToSearch
ParentID ChildName ChildAge
-------- --------- --------
7        Katie     17
7        Jacob     8
12       Robert    10
ExistingParentsAndChildren
ParentName ChildName ChildAge
---------- --------- --------
John       Katie     17
John       Jacob     8
Sue        Robert    5
Sue        Carter    14
Sue        Ralph     10
Alex       Rocky     12
I assume I need to use something like ALL or PIVOT? But I'm kind of lost, as I'm new to DB queries.
A simple join on the ChildName and ChildAge columns gets you close.
But then there is the possibility that the children of parent Y are a subset of the children with parent X (i.e. parent X has children x1 (5) and x2 (3) and parent Y also has child x1 (5), then the children from parent Y are a subset of the children from parent X).
If you would extend the available parent data with the count of his children in each table and matched on that number as well, then you would have a full match.
Adding the required counts can be done with a cross apply (a subquery that is executed for each row) or common table expressions (returns a table you can join with).
Sample data
CREATE TABLE ParentsToSearch
(
ParentID INT NOT NULL,
ChildName NVARCHAR(800) NOT NULL,
ChildAge INT
);
insert into ParentsToSearch (ParentId, ChildName, ChildAge) values
( 7, 'Katie', 17),
( 7, 'Jacob', 8),
(12, 'Robert', 10);
CREATE TABLE ExistingParentsAndChildren
(
ParentName NVARCHAR(800) NOT NULL,
ChildName NVARCHAR(800) NOT NULL,
ChildAge INT
);
insert into ExistingParentsAndChildren (ParentName, ChildName, ChildAge) values
('John', 'Katie', 17),
('John', 'Jacob', 8),
('Sue', 'Robert', 5),
('Sue', 'Carter', 14),
('Sue', 'Ralph', 10),
('Alex', 'Rocky', 12);
Solution
With cross apply.
select p.ParentId,
ep.ParentName
from ParentsToSearch p
cross apply ( select count(1) as ChildCount
from ParentsToSearch p2
where p2.ParentId = p.ParentId ) pc
join ExistingParentsAndChildren ep
on ep.ChildName = p.ChildName
and ep.ChildAge = p.ChildAge
cross apply ( select count(1) as ChildCount
from ExistingParentsAndChildren ep2
where ep2.ParentName = ep.ParentName ) epc
where pc.ChildCount = epc.ChildCount -- match the child counts
group by p.ParentId,
ep.ParentName;
With common table expressions.
with cte_pc as
(
select p2.ParentId,
count(1) as ChildCount
from ParentsToSearch p2
group by p2.ParentId
),
cte_epc as
(
select ep2.ParentName,
count(1) as ChildCount
from ExistingParentsAndChildren ep2
group by ep2.ParentName
)
select p.ParentId,
ep.ParentName
from ParentsToSearch p
join cte_pc p3
on p3.ParentId = p.ParentId
join ExistingParentsAndChildren ep
on ep.ChildName = p.ChildName
and ep.ChildAge = p.ChildAge
join cte_epc ep3
on ep3.ParentName = ep.ParentName
where p3.ChildCount = ep3.ChildCount -- match the child counts
group by p.ParentId,
ep.ParentName;
Result
ParentId ParentName
-------- ----------
7 John
Fiddle to see things in action (also includes some alternative queries to list the children etc.).
If you are using SQL Server 2017 or later (where STRING_AGG is available), I would just write it the way you would think about it, without thinking about coding first. Breaking the dataset down with "with" statements allows the solution to flow naturally.
Solution:
Group all children under their parents for both tables using the STRING_AGG function and with statements
with ParentNameFamilies as
(
SELECT ParentName, STRING_AGG(CONVERT(NVARCHAR(max), ChildName), ',') AS ChildNames
, STRING_AGG(CONVERT(NVARCHAR(max), ChildAge), ',') AS ChildAges
FROM dbo.ExistingParentsAndChildren
group by ParentName
), ParentIDFamilies as
(
SELECT ParentID, STRING_AGG(CONVERT(NVARCHAR(max), ChildName), ',') AS ChildNames
, STRING_AGG(CONVERT(NVARCHAR(max), ChildAge), ',') AS ChildAges
FROM dbo.ParentsToSearch
group by ParentID
)
Then compare the two types of families on both child names and child ages values
select a.ParentName, b.ParentID
from ParentNameFamilies a
inner join ParentIDFamilies b on a.ChildNames=b.ChildNames and a.ChildAges=b.ChildAges
and so all together in one view or query:
with ParentNameFamilies as
(
SELECT ParentName, STRING_AGG(CONVERT(NVARCHAR(max), ChildName), ',') AS ChildNames
, STRING_AGG(CONVERT(NVARCHAR(max), ChildAge), ',') AS ChildAges
FROM dbo.ExistingParentsAndChildren
group by ParentName
), ParentIDFamilies as
(
SELECT ParentID, STRING_AGG(CONVERT(NVARCHAR(max), ChildName), ',') AS ChildNames
, STRING_AGG(CONVERT(NVARCHAR(max), ChildAge), ',') AS ChildAges
FROM dbo.ParentsToSearch
group by ParentID
)
select a.ParentName, b.ParentID
from ParentNameFamilies a
inner join ParentIDFamilies b on a.ChildNames=b.ChildNames and a.ChildAges=b.ChildAges
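A more compact way to state the "same children, same count" rule is to join on name and age, group by the parent pair, and require the number of matched rows to equal each parent's own child count. The sketch below runs that idea on the sample data in SQLite via Python's sqlite3 module (an assumption for easy testing). It also assumes that no parent lists the same (ChildName, ChildAge) pair twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ParentsToSearch (ParentID INTEGER NOT NULL, ChildName TEXT NOT NULL, ChildAge INTEGER);
CREATE TABLE ExistingParentsAndChildren (ParentName TEXT NOT NULL, ChildName TEXT NOT NULL, ChildAge INTEGER);
INSERT INTO ParentsToSearch VALUES (7, 'Katie', 17), (7, 'Jacob', 8), (12, 'Robert', 10);
INSERT INTO ExistingParentsAndChildren VALUES
  ('John', 'Katie', 17), ('John', 'Jacob', 8),
  ('Sue', 'Robert', 5), ('Sue', 'Carter', 14), ('Sue', 'Ralph', 10),
  ('Alex', 'Rocky', 12);
""")

# A pair is a full match when the matched-row count equals the child count
# on BOTH sides, which rules out subsets in either direction.
MATCH_SQL = """
SELECT p.ParentID, e.ParentName
FROM ParentsToSearch AS p
JOIN ExistingParentsAndChildren AS e
  ON e.ChildName = p.ChildName AND e.ChildAge = p.ChildAge
GROUP BY p.ParentID, e.ParentName
HAVING COUNT(*) = (SELECT COUNT(*) FROM ParentsToSearch AS p2
                   WHERE p2.ParentID = p.ParentID)
   AND COUNT(*) = (SELECT COUNT(*) FROM ExistingParentsAndChildren AS e2
                   WHERE e2.ParentName = e.ParentName)
ORDER BY p.ParentID
"""

matches = conn.execute(MATCH_SQL).fetchall()
print(matches)
```

Unlike comparing concatenated strings, this does not depend on the order in which the children are aggregated.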

SQL While Loop insert

*Updated - Please see below(Past the picture)
I am really stuck with this particular problem, I have two tables, Projects and Project Allocations, they are joined by the Project ID.
My goal is to populate a modified projects table's columns using the rows of the project allocations table. I've included an image below to illustrate what I'm trying to achieve.
A project can have up to 6 Project Allocations. Each Project Allocation has an auto-increment ID (Allocation ID), but I can't use this ID in sub-selects because it isn't in the range 1-6, so I can't distinguish which allocation is PA2 and which is PA3.
Example:
(SELECT pa1.name FROM table where project.projectid = project_allocations.projectid and JVID = '1') as [PA1 Name],
(SELECT pa2.name FROM table where project.projectid = project_allocations.projectid and JVID = '1') as [PA2 Name],
The modified Projects table has columns for PA1, PA2, PA3. I need to populate these columns based on the project allocations table. So the first record in the database FOR EACH project will be PA1.
I've put together a SQL Agent job that drops and re-creates this table with the added columns, so this is more about writing the project allocation rows into the modified projects table by row_num.
Any advice?
--Update
What I need to do now is to get the row_number added as a column for EACH project in order of DESC.
So the first row for each project ID will be 1 and for each row after that will be 2,3,4,5,6.
I've found the following code on this website:
use db_name
with cte as
(
select *
, new_row_id=ROW_NUMBER() OVER (ORDER BY eraprojectid desc)
from era_project_allocations_m
where era_project_allocations_m.eraprojectid = era_project_allocations_m.eraprojectid
)
update cte
set row_id = new_row_id
I added row_id as a column in the previous SQL Agent step. This code runs, but it doesn't produce a row number FOR EACH projectid.
As you can see from the above image; I need to have 1-2 FOR Each project ID - effectively giving me thousands of 1s, 2s, 3s, 4s.
That way I can sort them into columns :)
From what I can tell a query using row number is what you are after. (Also, it might be a pivot table..)
Example:
create table Something (
someId int,
someValue varchar(255)
);
insert into Something values (1, 'one'), (1, 'two'), (1, 'three'), (1, 'four'), (2, 'ein'), (2, 'swei'), (3, 'un');
with cte as (
select someId,
someValue,
row_number() over(partition by someId order by someId) as rn
from Something
)
select distinct someId,
(select someValue from cte where ct.someId = someId and rn = 1) as value1,
(select someValue from cte where ct.someId = someId and rn = 2) as value2,
(select someValue from cte where ct.someId = someId and rn = 3) as value3,
(select someValue from cte where ct.someId = someId and rn = 4) as value4
into somethingElse
from cte ct;
select * from somethingElse;
Result:
someId value1 value2 value3 value4
1 one two three four
2 ein swei NULL NULL
3 un NULL NULL NULL
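The same pivot can also be sketched with conditional aggregation, which reads the numbered rows once instead of running one subquery per output column. The example below uses SQLite via Python's sqlite3 module (it needs SQLite 3.25 or later for window functions). It adds a hypothetical `seq` column, not part of the original table, so that the row order inside each someId is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- seq is a hypothetical insertion-order column added for this sketch
CREATE TABLE Something (seq INTEGER PRIMARY KEY, someId INTEGER, someValue TEXT);
INSERT INTO Something (someId, someValue) VALUES
  (1, 'one'), (1, 'two'), (1, 'three'), (1, 'four'), (2, 'ein'), (2, 'swei'), (3, 'un');
""")

# Number the rows per someId, then fold each row number into its own column.
PIVOT_SQL = """
WITH numbered AS (
    SELECT someId, someValue,
           ROW_NUMBER() OVER (PARTITION BY someId ORDER BY seq) AS rn
    FROM Something
)
SELECT someId,
       MAX(CASE WHEN rn = 1 THEN someValue END) AS value1,
       MAX(CASE WHEN rn = 2 THEN someValue END) AS value2,
       MAX(CASE WHEN rn = 3 THEN someValue END) AS value3,
       MAX(CASE WHEN rn = 4 THEN someValue END) AS value4
FROM numbered
GROUP BY someId
ORDER BY someId
"""

pivoted = conn.execute(PIVOT_SQL).fetchall()
for row in pivoted:
    print(row)
```

Ordering the window by a column that differs within the partition matters: ordering by someId inside a partition on someId leaves the numbering up to the engine.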

How can I increment an integer field in a SQL Server database table only when another field changes value?

I have an integer field (essentially a counter) called PID in a sql server table called Projects that I want to increment only when the value of another field changes (Task). Initially PID=1 for all rows. The following query (which I got from one of your answers elsewhere) does exactly what I want but I need to update my table with the result and that I cannot figure out.
SELECT Task,
dense_rank() over(order by Task) PID
FROM dbo.Projects;
If I do something like
Update Projects
SET Projects.PID =(SELECT Task,
dense_rank() over(order by Task) PID
FROM dbo.Projects);
I get "The select list for the INSERT statement contains more items than the insert list. The number of SELECT values must match the number of INSERT columns." How can I update my table with a query that gives me what I want?
This is the table design:
CREATE TABLE [dbo].[Projects]
(
[PID] [int] NULL
, [TID] [int] IDENTITY(1,1) NOT NULL
, [Project] [nvarchar](127) NOT NULL
, [Task] [nvarchar](127) NOT NULL
, [Dollars] [decimal](18, 0) NOT NULL
, [TaskLead] [nvarchar](127) NULL
) ON [PRIMARY];
I populate the table with
INSERT INTO dbo.Projects(Project, Task, Dollars, TaskLead)
SELECT Project + ' ' + ProjectDescription, Task + ' ' + TaskDescription, Dollars, TaskLead
FROM TM1_1
ORDER BY Project ASC, Task ASC;
E.g. data:
PID TID Project Task
1 1 Prj1 Tsk11
1 2 Prj1 Tsk12
2 1 Prj2 Tsk21
I want to update the table such that all projects that are the same have the same PID. I am now trying:
use mbt_tm1;
;WITH cteRank AS (
SELECT PID, DENSE_RANK() OVER ( PARTITION BY Project ORDER BY Project ASC) AS Calculated_Rank
FROM Projects )
UPDATE cteRank
SET PID = Calculated_Rank
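Without knowing more about your exact question, perhaps this would work? A window function is not allowed directly in a SET clause, so the rank is computed in a CTE and the CTE is updated:
WITH r AS (SELECT PID, dense_rank() over(order by Task) AS NewPID FROM Projects)
UPDATE r SET PID = NewPID;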
To help with your question, I've put together a little sample that I think shows what you are asking. If this does not reflect your environment, please add details to your question.
CREATE TABLE Projects
(
ProjectID INT
, Task INT
, PID INT
);
GO
INSERT INTO Projects (ProjectID, Task, PID) VALUES (1, 1, 1);
INSERT INTO Projects (ProjectID, Task, PID) VALUES (2, 1, 2);
INSERT INTO Projects (ProjectID, Task, PID) VALUES (3, 1, 3);
INSERT INTO Projects (ProjectID, Task, PID) VALUES (4, 2, 1);
UPDATE Projects
SET Projects.PID = (
SELECT MAX(PID) + 1
FROM Projects p
WHERE P.Task = Projects.Task
);
SELECT *
FROM Projects;
[EDIT # 2]
Unless I am misunderstanding your requirements, this seems a simple way to update the PID for all associated projects:
UPDATE Projects
SET PID = (SELECT MAX(PID) FROM Projects P WHERE P.Project = Projects.Project);
You have to return only one value from the inner query, and you're looking for a rank. Match each row on the unique TID column and rank over the whole table (partitioning by Project would give every row rank 1):
UPDATE P
SET P.PID = X.Calculated_Rank
FROM Projects P
INNER JOIN
(SELECT P1.TID, DENSE_RANK() OVER (ORDER BY P1.Project ASC) AS Calculated_Rank
FROM Projects P1) X
ON X.TID = P.TID
This is what worked:
;WITH cte AS (
SELECT PID,
DENSE_RANK() OVER (
ORDER BY Project ASC) AS Calculated_Rank
FROM Projects
)
UPDATE cte SET PID = Calculated_Rank
It transformed a query that showed the result, namely:
SELECT Project,
dense_rank() over(order by Project) PID
FROM dbo.Projects;
into one that actually updated the table. Thank you so much for your support!
Sorry to say, but your question is very unclear. Still, the original SELECT query works for you and you want to use it in an UPDATE; the UPDATE as written is wrong.
The correct UPDATE syntax would be something like this.
It updates PID in the table with the new rank wherever Task matches.
UPDATE t1 SET t1.PID = t2.pid
FROM Projects t1
JOIN
(
SELECT Task ,DENSE_RANK() OVER ( ORDER BY Task ) PID
FROM Projects
)t2
ON t1.Task=t2.task
EDIT:-1
This query gets the output as you explained and showed in the example.
There is no need to use a CTE.
set nocount on
declare @prj table
(
pid int null
,tid int null
,prj sysname
,tsk sysname
)
insert into @prj (prj,tsk)
select 'prj1','tsk11'
union all select 'prj1','tsk12'
union all select 'prj2','tsk21'
union all select 'prj2','tsk22'
update t1 set t1.pID=t2.pID, t1.tID=t2.tID
from @prj t1
join
(
select dense_RANK() over (order by prj) as pID
,ROW_NUMBER() over (partition by prj order by prj,tsk) as tID
,prj,tsk
from @prj
)t2
on t1.prj=t2.prj and t1.tsk=t2.tsk
select * from @prj
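As a cross-check on what DENSE_RANK() OVER (ORDER BY Project) produces, the rank can be spelled out as "the number of distinct Project values that sort at or before this row's Project". The sketch below applies that as a plain correlated UPDATE in SQLite through Python's sqlite3 module. The table layout is a trimmed, assumed version of the asker's Projects table, and the same statement should also be valid T-SQL, though that is not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed stand-in for dbo.Projects; every row starts with PID = 1.
CREATE TABLE Projects (TID INTEGER PRIMARY KEY, PID INTEGER, Project TEXT, Task TEXT);
INSERT INTO Projects (PID, Project, Task) VALUES
  (1, 'Prj1', 'Tsk11'), (1, 'Prj1', 'Tsk12'), (1, 'Prj2', 'Tsk21'), (1, 'Prj2', 'Tsk22');
""")

# Dense rank by Project = count of distinct Project values <= this row's Project.
conn.execute("""
UPDATE Projects
SET PID = (SELECT COUNT(DISTINCT p2.Project)
           FROM Projects AS p2
           WHERE p2.Project <= Projects.Project)
""")

rows = conn.execute(
    "SELECT PID, TID, Project, Task FROM Projects ORDER BY TID"
).fetchall()
for row in rows:
    print(row)
```

The subquery never reads PID, so updating PID row by row cannot change the ranks computed for later rows.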

How to develop a recursive CTE in T-SQL?

I am new to recursive CTEs. I am trying to develop a CTE which will return all of the employees under each manager name. So I have two tables: people_rv and staff_rv
People_rv table contains all of the people, both managers and employees. Staff_rv only contains manager information. Uniqueidentifier staff values are stored in Staff_rv. Uniqueidentifier employee values are stored in people_rv. People_rv contains varchar first and last name values for both managers and employees.
But when I run the following CTE I get an error:
WITH
cteStaff (ClientID, FirstName, LastName, SupervisorID, EmpLevel)
AS
(
SELECT p.people_id, p.first_name, p.last_name, s.supervisor_id,1
FROM people_rv p JOIN staff_rv s on s.people_id = p.people_id
WHERE s.supervisor_id = '95E16819-8C3A-4098-9430-08F0E3B764E1'
UNION ALL
SELECT p2.people_id, p2.first_name, p2.last_name, s2.supervisor_id, r.EmpLevel + 1
FROM people_rv p2 JOIN staff_rv s2 on s2.people_id = p2.people_id
INNER JOIN cteStaff r on s2.staff_id = r.ClientID
)
SELECT
FirstName + ' ' + LastName AS FullName,
EmpLevel,
(SELECT first_name + ' ' + last_name FROM people_rv p join staff_rv s on s.people_id = p.people_id
WHERE s.staff_id = cteStaff.SupervisorID) AS Manager
FROM cteStaff
OPTION (MAXRECURSION 0);
My output is:
Barbara G 1 Melanie K
Dawn P 1 Melanie K
Garrett M 1 Melanie K
Stephanie P 1 Melanie K
Amanda F 1 Melanie K
Amanda T 1 Melanie K
Stephanie G 1 Melanie K
Carlos H 1 Melanie K
So it is not iterating any more than the first level. Why not?
Melanie is the top most supervisor, but each of the persons in the leftmost column are also supervisors. So this query should also return level 2.
You may be in an infinite loop with your join. I would check how many levels you expect the table to actually go down. Generally, you join the recursive member on a condition like
ID = ParentID
where both columns come from either a table or an expression. Keep in mind that you can also define an ordinary CTE before the recursive CTE if you have to build the relationship first.
Here is an example that will self execute, it may help.
Declare @table table ( PersonId int identity, PersonName varchar(512), Account int, ParentId int, Orders int);
insert into @Table values ('Brett', 1, NULL, 1000),('John', 1, 1, 100),('James', 1, 1, 200),('Beth', 1, 2, 300),('John2', 2, 4, 400);
select
PersonID
, PersonName
, Account
, ParentID
from @Table
; with recursion as
(
select
t1.PersonID
, t1.PersonName
, t1.Account
--, t1.ParentID
, cast(isnull(t2.PersonName, '')
+ Case when t2.PersonName is not null then '\' + t1.PersonName else t1.PersonName end
as varchar(255)) as fullheirarchy
, 1 as pos
, cast(t1.orders +
isnull(t2.orders,0) -- if the parent has no orders then zero
as int) as Orders
from @Table t1
left join @Table t2 on t1.ParentId = t2.PersonId
union all
select
t.PersonID
, t.PersonName
, t.Account
--, t.ParentID
, cast(r.fullheirarchy + '\' + t.PersonName as varchar(255))
, pos + 1 -- increases
, r.orders + t.orders
from @Table t
join recursion r on t.ParentId = r.PersonId
)
, b as
(
select *, max(pos) over(partition by PersonID) as maxrec -- I find the maximum occurrence of position by person
from recursion
)
select *
from b
where pos = maxrec -- finds the furthest down tree
-- and Account = 2 -- I could find just someone from a different department
Your problem, as far as I can tell, is that you have no join connecting managers to their employees.
This join
INNER JOIN cteStaff r on r.StaffID = s2.staff_id
Just joins the same initial level 1 staffer back to himself.
UPDATE:
Still not quite right! You have a supervisor_id, but again you're still not actually using that to join back to the CTE.
So for each recursion of this CTE you need to (excluding the name join):
select {Level 1 Boss}, NULL (no supervisor)
union
select {new employee}, {that employee's boss}
So the join must connect the CTE's ClientID (the level 1 boss) to the second UNION query's supervisor field, which looks to be supervisor_id , not staff_id.
The JOIN to accomplish this second task is (from what I can tell of your staff_rv table schema):
SELECT p2.people_id, p2.first_name, p2.last_name, s2.supervisor_id, r.EmpLevel + 1
FROM people_rv p2 JOIN staff_rv s2 on s2.people_id = p2.people_id
INNER JOIN cteStaff r on s2.supervisor_id = r.ClientID
Note the bottom join joins the r.ClientID (the level 1 boss) to the staffer's supervisor_id field.
(NB: I think your staff_id and supervisor_id's mimic your people_id values from the people_rv table, so this join should work fine. But if they are different (i.e. a staffer's supervisor_id isn't that supervisor's people_id) then you'll need to write the join such that the staffer's supervisor_id can be joined to their people_id you're storing as ClientID in the CTE.)
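To see the corrected join direction in isolation, here is a small sketch on SQLite via Python's sqlite3 module. It collapses people_rv and staff_rv into one hypothetical `staff` table with integer keys (the real tables use uniqueidentifiers), and "Evan R" and "Fiona S" are invented rows added so that levels 2 and 3 exist. The point is only that the recursive member matches a staffer's supervisor_id to the people_id already in the CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical single-table stand-in for people_rv + staff_rv
CREATE TABLE staff (people_id INTEGER PRIMARY KEY, full_name TEXT, supervisor_id INTEGER);
INSERT INTO staff VALUES
  (1, 'Melanie K', NULL),
  (2, 'Barbara G', 1),
  (3, 'Dawn P', 1),
  (4, 'Evan R', 2),
  (5, 'Fiona S', 4);
""")

REPORTS_SQL = """
WITH RECURSIVE cteStaff(people_id, full_name, supervisor_id, EmpLevel) AS (
    -- anchor: direct reports of the top supervisor (people_id 1)
    SELECT people_id, full_name, supervisor_id, 1
    FROM staff
    WHERE supervisor_id = 1
    UNION ALL
    -- recursive step: people whose supervisor is already in the CTE
    SELECT s.people_id, s.full_name, s.supervisor_id, r.EmpLevel + 1
    FROM staff AS s
    JOIN cteStaff AS r ON s.supervisor_id = r.people_id
)
SELECT c.full_name, c.EmpLevel, m.full_name AS Manager
FROM cteStaff AS c
JOIN staff AS m ON m.people_id = c.supervisor_id
ORDER BY c.EmpLevel, c.full_name
"""

reports = conn.execute(REPORTS_SQL).fetchall()
for row in reports:
    print(row)
```

If the recursive join instead matched each row's own id to the CTE id, every pass would find only the rows already present, which is the kind of self-join described above. In T-SQL the `RECURSIVE` keyword is dropped.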
Here's a good simple Recursive CTE to review (it may not be the answer, but someone else searching on how to make a recursive CTE may need it):
-- Recursive CTE
;
WITH Years ( myYear )
AS (
-- Base case
SELECT DATEPART(year, GETDATE())
UNION ALL
-- Recursive
SELECT Years.myYear - 1
FROM Years
WHERE Years.myYear >= 2002
)
SELECT *
FROM Years
Note that this probably won't solve your problem, but is a means to hopefully seeing where you're going wrong in the original query.
The default is 100 levels of recursion - you can set it to unlimited by using the MAXRECURSION query hint where you're selecting from your CTE:
...
FROM cteStaff
OPTION (MAXRECURSION 0);
From MSDN:
MAXRECURSION number
Specifies the maximum number of recursions allowed for this query. number is a nonnegative integer between 0 and 32767. When 0 is specified, no limit is applied. If this option is not specified, the default limit for the server is 100.
When the specified or default number for MAXRECURSION limit is reached during query execution, the query is ended and an error is returned.
Because of this error, all effects of the statement are rolled back. If the statement is a SELECT statement, partial results or no results may be returned. Any partial results returned may not include all rows on recursion levels beyond the specified maximum recursion level.

Resources