More efficient way to write this query - several M:M relationships - sql-server

I am using SQL Server 2014 SP3.
I have the following hypothetical database structure.
There are accounts, which can belong to multiple customers, represented by the following tables:
Account <- Account_Customer -> Customer
The customers, in turn, can own multiple cars:
Customer <- Customer_Car -> Car
In addition, the customers can own many pets:
Customer <- Customer_Pet -> Pet
Now I am trying to come up with the most efficient query to answer the following question:
Get a list of accounts where none of the account owners have a "Cat" and none of the account owners drive a "Dodge".
The script below sets up the tables and some sample data. Please note that in real life these tables will have tens of millions of records, so I am trying to come up with the most efficient way to answer this question. So far I have only been able to do it by accessing the same tables multiple times.
Setup script:
USE tempdb;
-- Create tables
IF OBJECT_ID('Account') IS NOT NULL DROP TABLE Account;
CREATE TABLE Account (AccountId INT, AccountName VARCHAR(24))
IF OBJECT_ID('Customer') IS NOT NULL DROP TABLE Customer;
CREATE TABLE Customer (CustomerId INT, CustomerName VARCHAR(24))
IF OBJECT_ID('Pet') IS NOT NULL DROP TABLE Pet;
CREATE TABLE Pet (PetId INT, PetName VARCHAR(24))
IF OBJECT_ID('Car') IS NOT NULL DROP TABLE Car;
CREATE TABLE Car (CarId INT, CarName VARCHAR(24))
IF OBJECT_ID('Account_Customer') IS NOT NULL DROP TABLE Account_Customer;
CREATE TABLE Account_Customer (AccountId INT, CustomerId INT)
IF OBJECT_ID('Customer_Pet') IS NOT NULL DROP TABLE Customer_Pet;
CREATE TABLE Customer_Pet (CustomerId INT, PetId INT)
IF OBJECT_ID('Customer_Car') IS NOT NULL DROP TABLE Customer_Car;
CREATE TABLE Customer_Car (CustomerId INT, CarId INT)
-- Populate data
INSERT [dbo].[Account]([AccountId], [AccountName])
VALUES (1, 'Account1'), (2, 'Account2')
INSERT [dbo].[Customer]([CustomerId], [CustomerName])
VALUES (1, 'Customer1'), (2, 'Customer2'), (3, 'Customer3'), (4, 'Customer4')
INSERT [dbo].[Pet]([PetId], [PetName])
VALUES (1, 'Cat1'), (2, 'Cat2'), (3, 'Dog3'), (4, 'Dog4')
INSERT [dbo].[Car]([CarId], [CarName])
VALUES (1, 'Ford1'), (2, 'Ford2'), (3, 'Kia3'), (4, 'Dodge4')
INSERT [dbo].[Account_Customer] ([AccountId], [CustomerId])
VALUES (1,1), (1,2), (2, 2), (2,3), (2,4)
INSERT [dbo].[Customer_Pet] ([CustomerId], [PetId])
VALUES (2,3), (3,1), (3, 2), (4,3), (4,4)
INSERT [dbo].[Customer_Car] ([CustomerId], [CarId])
VALUES (1,2), (2,2), (3,1), (3, 2), (3, 4)
--SELECT * FROM [dbo].[Account] AS [A]
--SELECT * FROM [dbo].[Customer] AS [C]
--SELECT * FROM [dbo].[Pet] AS [P]
--SELECT * FROM [dbo].[Car] AS [C]
--SELECT * FROM [dbo].[Account_Customer] AS [AC]
--SELECT * FROM [dbo].[Customer_Pet] AS [CP]
--SELECT * FROM [dbo].[Customer_Car] AS [CC]
-- Bring all the data together to see what we have (denormalized)
SELECT [A].[AccountId], [A].[AccountName],
[C].[CustomerId], [C].[CustomerName],
[CP].[PetId], [P].[PetName],
[C2].[CarId], [C2].[CarName]
FROM [dbo].[Customer] AS [C]
JOIN [dbo].[Account_Customer] AS [AC] ON [AC].[CustomerId] = [C].[CustomerId]
JOIN [dbo].[Account] AS [A] ON [A].[AccountId] = [AC].[AccountId]
LEFT JOIN [dbo].[Customer_Pet] AS [CP] ON [CP].[CustomerId] = [C].[CustomerId]
LEFT JOIN [dbo].[Pet] AS [P] ON [P].[PetId] = [CP].[PetId]
LEFT JOIN [dbo].[Customer_Car] AS [CC] ON [CC].[CustomerId] = [C].[CustomerId]
LEFT JOIN [dbo].[Car] AS [C2] ON [C2].[CarId] = [CC].[CarId]
ORDER BY [A].[AccountId], [AC].[CustomerId]
And here is the query, which answers my question, but I suspect it's inefficient on a large number of records. Is there a better way?
-- This should only return Account1
SELECT DISTINCT
[A].[AccountId],
[A].[AccountName]
FROM [dbo].[Customer] AS [C]
JOIN [dbo].[Account_Customer] AS [AC] ON [AC].[CustomerId] = [C].[CustomerId]
JOIN [dbo].[Account] AS [A] ON [A].[AccountId] = [AC].[AccountId]
EXCEPT
SELECT -- get Accounts where owner has a "Cat" or drives a "Dodge"
[A].[AccountId],
[A].[AccountName]
FROM [dbo].[Customer] AS [C]
JOIN [dbo].[Account_Customer] AS [AC] ON [AC].[CustomerId] = [C].[CustomerId]
JOIN [dbo].[Account] AS [A] ON [A].[AccountId] = [AC].[AccountId]
WHERE
(
EXISTS (SELECT TOP (1) 1
FROM [dbo].[Customer] AS [C2]
JOIN [dbo].[Customer_Pet] AS [CP2] ON [CP2].[CustomerId] = [C2].[CustomerId]
JOIN [dbo].[Pet] AS [P2] ON [P2].[PetId] = [CP2].[PetId]
WHERE [C2].[CustomerId] = [C].[CustomerId] -- correlation
AND [P2].[PetName] LIKE 'Cat%'
)
OR
EXISTS (SELECT TOP (1) 1
FROM [dbo].[Customer] AS [C2]
JOIN [dbo].[Customer_Car] AS [CP2] ON [CP2].[CustomerId] = [C2].[CustomerId]
JOIN [dbo].[Car] AS [P2] ON [P2].[CarId] = [CP2].[CarId]
WHERE [C2].[CustomerId] = [C].[CustomerId] -- correlation
AND [P2].[CarName] LIKE 'Dodge%'
)
)
Sorry if this is obvious, but please note that the query below will not work, because it answers a slightly different question: return accounts where AT LEAST ONE OWNER does not have a "Cat" and does not drive a "Dodge".
-- Does not work:
SELECT DISTINCT
[A].[AccountId],
[A].[AccountName]
FROM [dbo].[Customer] AS [C]
JOIN [dbo].[Account_Customer] AS [AC] ON [AC].[CustomerId] = [C].[CustomerId]
JOIN [dbo].[Account] AS [A] ON [A].[AccountId] = [AC].[AccountId]
WHERE
(
NOT EXISTS (SELECT TOP (1) 1
FROM [dbo].[Customer] AS [C2]
JOIN [dbo].[Customer_Pet] AS [CP2] ON [CP2].[CustomerId] = [C2].[CustomerId]
JOIN [dbo].[Pet] AS [P2] ON [P2].[PetId] = [CP2].[PetId]
WHERE [C2].[CustomerId] = [C].[CustomerId] -- correlation
AND [P2].[PetName] LIKE 'Cat%'
)
AND
NOT EXISTS (SELECT TOP (1) 1
FROM [dbo].[Customer] AS [C2]
JOIN [dbo].[Customer_Car] AS [CP2] ON [CP2].[CustomerId] = [C2].[CustomerId]
JOIN [dbo].[Car] AS [P2] ON [P2].[CarId] = [CP2].[CarId]
WHERE [C2].[CustomerId] = [C].[CustomerId] -- correlation
AND [P2].[CarName] LIKE 'Dodge%'
)
)

I must say, in a real database I would be very suspicious of all these Many:Many relationships. Can an Account be owned by multiple Customers, each of whom can own multiple Accounts? Equally, can a Car or a Pet have multiple owners?
Be that as it may: you can express your query like this:
You want all Accounts...
for which there do not exist Account_Customers...
Where those Customers are in the set of Customers who own a Cat...
... or a Dodge
SELECT *
FROM Account a
WHERE NOT EXISTS (
SELECT ac.CustomerId
FROM Account_Customer ac
WHERE ac.AccountId = a.AccountId
INTERSECT
(
SELECT cp.CustomerId
FROM Customer_Pet cp
JOIN Pet p ON p.PetId = cp.PetId
WHERE p.PetName LIKE 'Cat%'
UNION ALL
SELECT cc.CustomerId
FROM Customer_Car cc
JOIN Car c ON c.CarId = cc.CarId
WHERE c.CarName LIKE 'Dodge%'
)
)

It's too late for a more in-depth answer, so here's a quick and dirty one with a temp table.
Mind you, it's not as bad as it looks; many times I've had simple queries on temp tables massively outperform large, interesting (from a mathematical point of view) queries.
Also, a question about performance is never simple to answer. Of special interest is the fact that you mention millions of rows and a need for performance while your query uses a LIKE operator on a text column. At least the % is at the end, so it's still SARGable. Will this column have an index? That will probably make a difference.
Here (done blind, hopefully no errors):
create table #forbidden
(
CustomerId int primary key
)
insert #forbidden(CustomerId)
select CustomerId from Customer C
where
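-- PetName and CarName live on Pet and Car, so join through the link tables
exists(select 1 from Customer_Pet CP join Pet P on P.PetId=CP.PetId where CP.CustomerId=C.CustomerId and P.[PetName] LIKE 'Cat%')
or exists(select 1 from Customer_Car CC join Car CR on CR.CarId=CC.CarId where CC.CustomerId=C.CustomerId and CR.[CarName] LIKE 'Dodge%')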
select * from Account A
where not exists
(
select 1
from Account_Customer AC
where
AC.AccountId=A.AccountId -- correlate on the account; Account has no CustomerId column
and AC.CustomerId in (select CustomerId from #forbidden)
)
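On the index question raised above, here is a minimal sketch of supporting indexes for the tables in the setup script. The index names are hypothetical, and whether each index pays off depends on the real data distribution, so treat this as a starting point to test rather than a recommendation.

```sql
-- Hypothetical index names; the columns come from the question's setup script.
-- Let the LIKE 'Cat%' / LIKE 'Dodge%' prefix searches seek instead of scan
CREATE INDEX IX_Pet_PetName ON dbo.Pet (PetName) INCLUDE (PetId);
CREATE INDEX IX_Car_CarName ON dbo.Car (CarName) INCLUDE (CarId);

-- Go from a matching pet or car back to its owners
CREATE INDEX IX_Customer_Pet_PetId ON dbo.Customer_Pet (PetId, CustomerId);
CREATE INDEX IX_Customer_Car_CarId ON dbo.Customer_Car (CarId, CustomerId);

-- Support the per-account owner lookup in the NOT EXISTS check
CREATE INDEX IX_Account_Customer_AccountId ON dbo.Account_Customer (AccountId, CustomerId);
```

Comparing the actual execution plans before and after is the only reliable way to tell whether the optimizer uses them.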


I need to find multiple rows with iteration without using loop

Let's say I have 2 tables.
Users Table
and one more table, which defines the hierarchy of users.
hierarchy Table
So as you can see:
C is a supervisor of D
B is a supervisor of C
A is a supervisor of B
So when I pass User D, then it should return all the supervisor like A,B,C
same when I pass User C, then it should return all the supervisor like A,B
What I tried.
Create table Users
(
Id int primary key identity (1,1),
Name varchar(1),
)
Insert into Users values ('A')
Insert into Users values ('B')
Insert into Users values ('C')
Insert into Users values ('D')
Create table Hierarchy
(
Id int primary key identity (1,1),
EmployeeId int FOREIGN KEY REFERENCES Users(Id),
SupervisorId int FOREIGN KEY REFERENCES Users(Id)
)
Insert into Hierarchy values (4,3)
Insert into Hierarchy values (3,2)
Insert into Hierarchy values (2,1)
select * from Users
select * from Hierarchy
with HierarchyData as
(
select mbh.* from Hierarchy mbh where mbh.EmployeeId = 4
union all
select mbh.* from Hierarchy mbh
join Hierarchy on mbh.SupervisorId = Hierarchy.EmployeeId
where mbh.EmployeeId <> 4
)
select e.Name as EmpName, s.Name as SupervisorName from HierarchyData h
join Users e on h.EmployeeId = e.Id
join Users s on h.SupervisorId = s.Id
But I am getting only one level data.
Any kind of help would be appreciated.
@Vishal, here is a query based on my understanding of the question; can you please check whether it works for you?
I used LEFT JOIN here, but you can also use INNER JOIN.
With LEFT JOIN, A is returned with empty supervisor columns, because in your example A has no supervisor.
With INNER JOIN, only the B, C and D records are returned.
Please check the test query below.
DECLARE @User TABLE
(
UserID INT,
UserName NVARCHAR(50)
)
DECLARE @EmployeeTable TABLE
(
ID INT,
EmployeeID INT,
supervisorID INT
)
INSERT INTO @User VALUES(1,'A'),
(2,'B'),
(3,'C'),
(4,'D')
INSERT INTO @EmployeeTable VALUES
(1,4,3),
(2,3,2),
(3,2,1)
SELECT [U].[UserName] [EmployeeName],
[ET].[EmployeeID],
[ET].[SupervisorID],[ST].[SupervisorName]
FROM @User [U]
LEFT JOIN @EmployeeTable [ET]
ON [U].[UserID] = [ET].[EmployeeID]
LEFT JOIN
(
SELECT [U].UserName [SupervisorName] ,[ST].* FROM @User [U]
INNER JOIN @EmployeeTable [ST]
ON [ST].[supervisorID] = [U].[UserID]
) [ST]
ON [ST].[supervisorID] = [ET].[supervisorID]
Left join query result
Inner join query result
let me know if I can help more :).
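For the multi-level case the question asks about, a recursive CTE that walks upward one supervisor at a time is one possible approach. The sketch below assumes the Users and Hierarchy tables from the question; the variable @EmployeeId and the CTE name Chain are names introduced here for illustration.

```sql
-- Assumes the Users and Hierarchy tables from the question.
DECLARE @EmployeeId int = 4; -- user D

WITH Chain AS
(
    -- Anchor: the direct supervisor of the starting employee
    SELECT h.EmployeeId, h.SupervisorId, 1 AS Lvl
    FROM Hierarchy AS h
    WHERE h.EmployeeId = @EmployeeId
    UNION ALL
    -- Recursive step: look up the supervisor of the supervisor found so far
    SELECT h.EmployeeId, h.SupervisorId, c.Lvl + 1
    FROM Hierarchy AS h
    JOIN Chain AS c ON h.EmployeeId = c.SupervisorId
)
SELECT s.Name AS SupervisorName, c.Lvl
FROM Chain AS c
JOIN Users AS s ON s.Id = c.SupervisorId
ORDER BY c.Lvl;
```

With the sample data this returns C, B and A (levels 1 to 3) for user D. The important difference from the attempt in the question is that the recursive member joins back to the CTE itself rather than to the Hierarchy table a second time.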

Show all and only rows in table 1 not in table 2 (using multiple columns)

I have one table (Table1) that has several columns used in combination: Name, TestName, DevName, Dept. When each of these 4 columns have values, the record is inserted into Table2. I need to confirm that all of the records with existing values in each of these fields within Table1 were correctly copied into Table 2.
I have created a query for it:
SELECT DISTINCT wr.Name,wr.TestName, wr.DEVName ,wr.Dept
FROM table2 wr
where NOT EXISTS (
SELECT NULL
FROM TABLE1 ym
WHERE ym.Name = wr.Name
AND ym.TestName = wr.TestName
AND ym.DEVName = wr.DEVName
AND ym.Dept = wr.Dept
)
My counts are not adding up, so I believe that this is incorrect. Can you advise me on the best way to write this query for my needs?
You can use the EXCEPT set operator for this one if the table definitions are identical.
SELECT DISTINCT ym.Name, ym.TestName, ym.DEVName, ym.Dept
FROM table1 ym
EXCEPT
SELECT DISTINCT wr.Name, wr.TestName, wr.DEVName, wr.Dept
FROM table2 wr
This returns distinct rows from the first table where there is not a match in the second table. Read more about EXCEPT and INTERSECT here: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-except-and-intersect-transact-sql?view=sql-server-2017
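One difference that can explain mismatched counts: EXCEPT treats two NULLs as equal when comparing rows, whereas a NOT EXISTS built on "=" never matches NULL to NULL. A small sketch with made-up table variables (@t1 and @t2 are illustrative only, not the real tables):

```sql
-- Illustrative table variables, not the asker's real tables
DECLARE @t1 TABLE (Name varchar(10), Dept varchar(10));
DECLARE @t2 TABLE (Name varchar(10), Dept varchar(10));
INSERT INTO @t1 VALUES ('a', 'x'), ('b', NULL);
INSERT INTO @t2 VALUES ('a', 'x'), ('b', NULL);

-- EXCEPT treats the two NULL Dept values as equal: returns no rows
SELECT Name, Dept FROM @t1
EXCEPT
SELECT Name, Dept FROM @t2;

-- NOT EXISTS with "=" cannot match NULL to NULL: returns ('b', NULL)
SELECT t1.Name, t1.Dept
FROM @t1 AS t1
WHERE NOT EXISTS (SELECT 1
                  FROM @t2 AS t2
                  WHERE t2.Name = t1.Name
                    AND t2.Dept = t1.Dept);
```

If rows with missing values should be ignored altogether, filtering them out with IS NOT NULL on each column before comparing keeps both forms in agreement.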
Your approach should do the job, but the outer table needs to be Table1. This checks for anything that is in Table1 but not in Table2:
SELECT ym.Name, ym.TestName, ym.DEVName, ym.Dept
FROM Table1 ym
WHERE NOT EXISTS (
SELECT 1
FROM table2
WHERE ym.Name = Name AND ym.TestName = TestName AND ym.DEVName = DEVName AND ym.Dept = Dept
)
If the structure of both tables is the same, EXCEPT is probably simpler.
IF OBJECT_ID(N'tempdb..#table1') IS NOT NULL drop table #table1
IF OBJECT_ID(N'tempdb..#table2') IS NOT NULL drop table #table2
create table #table1 (id int, value varchar(10))
create table #table2 (id int)
insert into #table1(id, value) VALUES (1,'value1'), (2,'value2'), (3,'value3')
--test here. Comment next line
insert into #table2(id) VALUES (1) --Comment/Uncomment
select * from #table1
select * from #table2
select #table1.*
from #table1
left JOIN #table2 on
#table1.id = #table2.id
where (#table2.id is not null or not exists (select * from #table2))

SQL Server CTE left outer join

I have 2 tables in SQL Server 2008: customertest, with columns for the customer id (cid) and its boss id (upid), and conftest, with cid, confname and confvalue.
customertest schema and data:
conftest schema and data:
I want to know how to design a CTE so that, if a cid in conftest doesn't have a confvalue for that confname, it keeps searching up through upid until it finds an ancestor that has both the confname and a confvalue.
For example, I want to get a value of 100 if I search for cid=4 (this is the normal case), and a value of 200 if I search for cid=7 or 8.
And if cid 7 and cid 8 have child nodes, searching for them with this CTE should also return 200 (from cid 5).
I don't have a clue how to do this. I think it may need a CTE and some left outer joins; could you please give me an example? Thanks a lot.
What if it's unknown how many levels there are in the hierarchy?
Then such a challenge is often solved with a recursive CTE.
Example Snippet:
--
-- Using table variables for testing reasons
--
declare @customertest table (cid int primary key, upid int);
declare @conftest table (cid int, confname varchar(6) default 'budget', confvalue int);
--
-- Sample data
--
insert into @customertest (cid, upid) values
(1,0), (2,1), (3,1), (4,2), (5,2), (6,3),
(7,5), (8,5), (9,8), (10,9);
insert into @conftest (cid, confvalue) values
(1,1000), (2,700), (3,300), (4,100), (5,200), (6,300);
-- The customer that has his own budget, or not.
declare @customerID int = 10;
;with RCTE AS
(
--
-- the recursive CTE starts from here. The seed records, as one could call it.
--
select cup.cid as orig_cid, 0 as lvl, cup.cid, cup.upid, budget.confvalue
from @customertest as cup
left join @conftest budget on (budget.cid = cup.cid and budget.confname = 'budget')
where cup.cid = @customerID -- This is where we limit on the customer
union all
--
-- This is where the Recursive CTE loops till it finds nothing new
--
select RCTE.orig_cid, RCTE.lvl+1, cup.cid, cup.upid, budget.confvalue
from RCTE
join @customertest as cup on (cup.cid = RCTE.upid)
outer apply (select b.confvalue from @conftest b where b.cid = cup.cid and b.confname = 'budget') as budget
where RCTE.confvalue is null -- Loop till a budget is found
)
select
orig_cid as cid,
confvalue
from RCTE
where confvalue is not null;
Result :
cid confvalue
--- ---------
10 200
Btw, the recursive member uses OUTER APPLY because SQL Server doesn't allow an outer join in the recursive part of a CTE.
And what if it's certain that the budget is at most 1 level up, on the upid?
Then simple left joins and a coalesce would do.
For example:
select cup.cid, coalesce(cBudget.confvalue, upBudget.confvalue) as confvalue
from @customertest as cup
left join @conftest cBudget on (cBudget.cid = cup.cid and cBudget.confname = 'budget')
left join @conftest upBudget on (upBudget.cid = cup.upid and upBudget.confname = 'budget')
where cup.cid = 8;
I don't think you are looking for a CTE to do that, from what I understand:
CREATE TABLE CustomerTest(
CID INT,
UPID INT
);
CREATE TABLE ConfTest(
CID INT,
ConfName VARCHAR(45),
ConfValue INT
);
INSERT INTO CustomerTest VALUES
(1, 0),
(2, 1),
(3, 1),
(4, 2),
(5, 2),
(6, 3),
(7, 5),
(8, 5);
INSERT INTO ConfTest VALUES
(1, 'Budget', 1000),
(2, 'Budget', 700),
(3, 'Budget', 300),
(4, 'Budget', 100),
(5, 'Budget', 200),
(6, 'Budget', 300);
SELECT MAX(CNT.CID) AS CID,
CNT.ConfName,
MIN(CNT.ConfValue) AS ConfValue
FROM ConfTest CNT INNER JOIN CustomerTest CMT ON CMT.CID = CNT.CID
OR CMT.UPID = CNT.CID
WHERE CMT.CID = 7 -- You can test for values (8, 4) or any value you want :)
GROUP BY
CNT.ConfName;

SQL Function to return sequential id's

Consider this simple INSERT
INSERT INTO Assignment (CustomerId,UserId)
SELECT CustomerId,123 FROM Customers
That will obviously assign UserId=123 to all customers.
What I need to do is assign them to 3 userId's sequentially, so 3 users get one third of the accounts equally.
INSERT INTO Assignment (CustomerId,UserId)
SELECT CustomerId,fnGetNextId() FROM Customers
Could I create a function to return sequentially from a list of 3 ID's?, i.e. each time the function is called it returns the next one in the list?
Thanks
Could I create a function to return sequentially from a list of 3 ID's?,
If you create a SEQUENCE, then you can assign incremental numbers with the NEXT VALUE FOR (Transact-SQL) expression.
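A minimal sketch of the SEQUENCE idea, under the assumption that the three user IDs are simply 1, 2 and 3. The sequence name dbo.UserRotation is hypothetical. A cycling sequence restarts at its minimum value once it passes its maximum, so each evaluation hands out the next ID in the rotation.

```sql
-- Hypothetical sequence name; assumes the three UserIds are 1, 2 and 3.
CREATE SEQUENCE dbo.UserRotation
    AS int
    START WITH 1
    INCREMENT BY 1
    MINVALUE 1
    MAXVALUE 3
    CYCLE
    NO CACHE;

-- Each inserted row gets the next value in the rotation: 1, 2, 3, 1, 2, 3, ...
INSERT INTO Assignment (CustomerId, UserId)
SELECT CustomerId, NEXT VALUE FOR dbo.UserRotation
FROM Customers;
```

If the real user IDs are not consecutive, the sequence value would need to be mapped to an actual UserId, for example by joining on a row number over the users. The sequence also keeps its position between statements, so a later insert continues the rotation where the previous one stopped.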
This is a strange requirement, but the modulus operator (%) should help you out without the need for functions, sequences, or altering your database structure. This assumes that the IDs are integers. If they're not, you can use ROW_NUMBER or a number of other tactics to get a distinct number value for each customer.
Obviously, you would replace the SELECT statement with an INSERT once you're satisfied with the code, but it's good practice to always select when developing before inserting.
SETUP WITH SAMPLE DATA:
DECLARE @Users TABLE (ID int, [Name] varchar(50))
DECLARE @Customers TABLE (ID int, [Name] varchar(50))
DECLARE @Assignment TABLE (CustomerID int, UserID int)
INSERT INTO @Customers
VALUES
(1, 'Joe'),
(2, 'Jane'),
(3, 'Jon'),
(4, 'Jake'),
(5, 'Jerry'),
(6, 'Jesus')
INSERT INTO @Users
VALUES
(1, 'Ted'),
(2, 'Ned'),
(3, 'Fred')
QUERY:
SELECT C.Name AS [CustomerName], U.Name AS [UserName]
FROM @Customers C
JOIN @Users U
ON
CASE WHEN C.ID % 3 = 0 THEN 1
WHEN C.ID % 3 = 1 THEN 2
WHEN C.ID % 3 = 2 THEN 3
END = U.ID
You would change the THEN 1 to whatever your first UserID is, THEN 2 to the second UserID, and THEN 3 to the third UserID. If you end up with another user and want to split the customers 4 ways, you would replace the CASE statement with the following:
CASE WHEN C.ID % 4 = 0 THEN 1
WHEN C.ID % 4 = 1 THEN 2
WHEN C.ID % 4 = 2 THEN 3
WHEN C.ID % 4 = 3 THEN 4
END = U.ID
OUTPUT:
CustomerName UserName
-------------------------------------------------- --------------------------------------------------
Joe Ned
Jane Fred
Jon Ted
Jake Ned
Jerry Fred
Jesus Ted
(6 row(s) affected)
Lastly, you will want to select the IDs for your actual insert, but I selected the names so the results are easier to understand. Please let me know if this needs clarification.
Here's one way to produce Assignment as an automatically rebalancing view:
CREATE VIEW dbo.Assignment WITH SCHEMABINDING AS
WITH SeqUsers AS (
SELECT UserID, ROW_NUMBER() OVER (ORDER BY UserID) - 1 AS _ord
FROM dbo.Users
), SeqCustomers AS (
SELECT CustomerID, ROW_NUMBER() OVER (ORDER BY CustomerID) - 1 AS _ord
FROM dbo.Customers
)
-- INSERT Assignment(CustomerID, UserID)
SELECT SeqCustomers.CustomerID, SeqUsers.UserID
FROM SeqUsers
JOIN SeqCustomers ON SeqUsers._ord = SeqCustomers._ord % (SELECT COUNT(*) FROM SeqUsers)
;
This shifts assignments around if you insert a new user, which could be quite undesirable, and it's also not efficient if you had to JOIN on it. You can easily repurpose the query it contains for one-time inserts (the commented-out INSERT). The key technique there is joining on ROW_NUMBER()s.

SQL Server Hierarchical Sum of column

I have my database design as per the diagram.
Category table is self referencing parent child relationship
Budget will have all the categories and amount define for each category
Expense table will have entries for categories for which the amount has been spend (consider Total column from this table).
I want to write select statement that will retrieve dataset with columns given below :
ID
CategoryID
CategoryName
TotalAmount (Sum of Amount Column of all children hierarchy From BudgetTable )
SumOfExpense (Sum of Total Column of Expense all children hierarchy from expense table)
I tried to use a CTE but was unable to produce anything useful. Thanks for your help in advance. :)
Update
Just to combine and simplify the data, I have created a view with the query below.
SELECT
dbo.Budget.Id, dbo.Budget.ProjectId, dbo.Budget.CategoryId,
dbo.Budget.Amount,
dbo.Category.ParentID, dbo.Category.Name,
ISNULL(dbo.Expense.Total, 0) AS CostToDate
FROM
dbo.Budget
INNER JOIN
dbo.Category ON dbo.Budget.CategoryId = dbo.Category.Id
LEFT OUTER JOIN
dbo.Expense ON dbo.Category.Id = dbo.Expense.CategoryId
Basically that should produce results like this.
This is an interesting problem. And I'm going to solve it with a hierarchyid. First, the setup:
USE tempdb;
IF OBJECT_ID('dbo.Hierarchy') IS NOT NULL
DROP TABLE dbo.[Hierarchy];
CREATE TABLE dbo.Hierarchy
(
ID INT NOT NULL PRIMARY KEY,
ParentID INT NULL,
CONSTRAINT [FK_parent] FOREIGN KEY ([ParentID]) REFERENCES dbo.Hierarchy([ID]),
hid HIERARCHYID,
Amount INT NOT null
);
INSERT INTO [dbo].[Hierarchy]
( [ID], [ParentID], [Amount] )
VALUES
(1, NULL, 100 ),
(2, 1, 50),
(3, 1, 50),
(4, 2, 58),
(5, 2, 7),
(6, 3, 10),
(7, 3, 20)
SELECT * FROM dbo.[Hierarchy] AS [h];
Next, to update the hid column with a proper value for the hierarchyid, I'll use a bog-standard recursive CTE:
WITH cte AS (
SELECT [h].[ID] ,
[h].[ParentID] ,
CAST('/' + CAST(h.[ID] AS VARCHAR(10)) + '/' AS VARCHAR(MAX)) AS [h],
[h].[hid]
FROM [dbo].[Hierarchy] AS [h]
WHERE [h].[ParentID] IS NULL
UNION ALL
SELECT [h].[ID] ,
[h].[ParentID] ,
CAST([c].[h] + CAST(h.[ID] AS VARCHAR(10)) + '/' AS VARCHAR(MAX)) AS [h],
[h].[hid]
FROM [dbo].[Hierarchy] AS [h]
JOIN [cte] AS [c]
ON [h].[ParentID] = [c].[ID]
)
UPDATE [h]
SET hid = [cte].[h]
FROM cte
JOIN dbo.[Hierarchy] AS [h]
ON [h].[ID] = [cte].[ID];
Now that the heavy lifting is done, the results you want are almost trivially obtained:
SELECT p.id, SUM([c].[Amount])
FROM dbo.[Hierarchy] AS [p]
JOIN [dbo].[Hierarchy] AS [c]
ON c.[hid].IsDescendantOf(p.[hid]) = 1
GROUP BY [p].[ID];
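For comparison, the same subtree totals can also be sketched without hierarchyid, using a recursive CTE that pairs every node with each of its descendants. This is an alternative technique rather than part of the answer above; the CTE name Tree and its column names are introduced here, and it assumes the dbo.Hierarchy sample table from the setup script.

```sql
-- Assumes the dbo.Hierarchy sample table created above.
WITH Tree AS
(
    -- Anchor: every node is the root of its own subtree
    SELECT h.ID AS RootID, h.ID AS NodeID, h.Amount
    FROM dbo.Hierarchy AS h
    UNION ALL
    -- Recursive step: add the children of each node already in the subtree
    SELECT t.RootID, c.ID, c.Amount
    FROM Tree AS t
    JOIN dbo.Hierarchy AS c ON c.ParentID = t.NodeID
)
SELECT RootID AS ID, SUM(Amount) AS TotalAmount
FROM Tree
GROUP BY RootID
ORDER BY RootID;
```

With the sample rows this gives 295 for ID 1, 115 for ID 2 and 80 for ID 3, and each leaf keeps its own amount. On deep or wide trees the persisted hierarchyid column is likely to scale better, since this version rebuilds every ancestor and descendant pair on each run.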
After much research and using test data, I was able to get the running totals starting from bottom of hierarchy.
The solution is made up of two steps.
Create a scalar-valued function that will decide whether a categoryId is a direct or indirect child of another categoryId. This is given in first code-snippet. Note that a recursive query is used for this since that is the best approach when dealing with hierarchy in SQL Server.
Write the running total query that will give totals according to your requirements for all categories. You can filter by category if you wanted to on this query. The second code snippet provides this query.
Scalar-valued function that tells if a child category is a direct or indirect child of another category
CREATE FUNCTION dbo.IsADirectOrIndirectChild(
@childId int, @parentId int)
RETURNS int
AS
BEGIN
DECLARE @isAChild int;
WITH h(ParentId, ChildId)
-- CTE name and columns
AS (
SELECT TOP 1 @parentId, @parentId
FROM dbo.Category AS b
UNION ALL
SELECT b.ParentId, b.Id AS ChildId
FROM h AS cte
INNER JOIN
Category AS b
ON b.ParentId = cte.ChildId AND
cte.ChildId IS NOT NULL)
SELECT @isAChild = ISNULL(ChildId, 0)
FROM h
WHERE ChildId = @childId AND
ParentId <> ChildId
OPTION(MAXRECURSION 32000);
IF @isAChild > 0
BEGIN
SET @isAChild = 1;
END;
ELSE
BEGIN
SET @isAChild = 0;
END;
RETURN @isAChild;
END;
GO
Query for running total starting from bottom of hierarchy
SELECT c.Id AS CategoryId, c.Name AS CategoryName,
(
SELECT SUM(ISNULL(b.amount, 0))
FROM dbo.Budget AS b
WHERE dbo.IsADirectOrIndirectChild( b.CategoryId, c.Id ) = 1 OR
b.CategoryId = c.Id
) AS totalAmount,
(
SELECT SUM(ISNULL(e.total, 0))
FROM dbo.Expense AS e
WHERE dbo.IsADirectOrIndirectChild( e.CategoryId, c.Id ) = 1 OR
e.CategoryId = c.Id
) AS totalCost
FROM dbo.Category AS c;
