Recursive SQL query with multiple columns - sql-server

I have a table with the following columns
idRelationshipType int,
idPerson1 int,
idPerson2 int
This table allows me to indicate records in a database that should be linked together.
I need a query that returns all the unique ids where a person's id exists in either the idPerson1 or idPerson2 column. Additionally, the query must be recursive: if a match is found in idPerson1, the value of idPerson2 is included in the result set and used to repeat the query, until no more matches are found.
Example data:
CREATE TABLE [dbo].[tbRelationships]
(
[idRelationshipType] [int],
[idPerson1] [int] ,
[idPerson2] [int]
)
INSERT INTO tbRelationships (idRelationshipType, idPerson1, idPerson2)
VALUES (1, 1, 2)
INSERT INTO tbRelationships (idRelationshipType, idPerson1, idPerson2)
VALUES (1, 2, 3)
INSERT INTO tbRelationships (idRelationshipType, idPerson1, idPerson2)
VALUES (1, 3, 4)
INSERT INTO tbRelationships (idRelationshipType, idPerson1, idPerson2)
VALUES (1, 5, 1)
Four 'Relationships' are defined here. For this query, I will only know one of the ids to begin with. I need a query that in concept works like
SELECT idPerson
FROM [some query]
WHERE [the id i have to start with] = @idPerson
AND idRelationshipType = @idRelationshipType
The returned result should be 5 rows with one column 'idPerson', with 1, 2, 3, 4, and 5 as the row values.
I have tried various combinations of UNPIVOT and recursive CTEs but I am not making much progress.
Any help would be greatly appreciated.
Thanks,
Daniel

I think this is what you want:
DECLARE @RelationshipType int
DECLARE @PersonId int
SELECT @RelationshipType = 1, @PersonId = 1
;WITH Hierachy (idPerson1, IdPerson2)
AS
(
--root
SELECT R.idPerson1, R.idPerson2
FROM tbRelationships R
WHERE R.idRelationshipType = @RelationshipType
AND (R.idPerson1 = @PersonId OR R.idPerson2 = @PersonId)
--recurse
UNION ALL
SELECT R.idPerson1, R.idPerson2
FROM Hierachy H
JOIN tbRelationships R
ON (R.idPerson1 = H.idPerson2
OR R.idPerson2 = H.idPerson1)
AND R.idRelationshipType = @RelationshipType
)
SELECT DISTINCT idPerson
FROM
(
SELECT idPerson1 AS idPerson FROM Hierachy
UNION
SELECT idPerson2 AS idPerson FROM Hierachy
) H
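Essentially, get the first rows where the required id is in either column, and then recurse, picking up every relationship row that links to a row already found (matching idPerson1 against the previous idPerson2, or idPerson2 against the previous idPerson1).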
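To sanity-check the expected result on the sample data, here is a minimal sketch that runs the same idea in SQLite through Python's sqlite3 module. That is an assumption made for the sake of a self-contained example, not SQL Server. It walks person ids rather than relationship rows, and it relies on UNION (which SQLite allows in a recursive CTE) to discard ids it has already seen. That matters here because links are followed in both directions, so a UNION ALL walk can keep revisiting the same pair of rows. The CTE name linked and the parameter names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbRelationships (idRelationshipType INT, idPerson1 INT, idPerson2 INT);
INSERT INTO tbRelationships VALUES (1, 1, 2), (1, 2, 3), (1, 3, 4), (1, 5, 1);
""")

query = """
WITH RECURSIVE linked(idPerson) AS (
    -- seed: the one id we know
    SELECT :person
    UNION  -- UNION (not UNION ALL) drops ids already found, so the walk terminates
    -- step: take the person on the other side of every matching relationship
    SELECT CASE WHEN R.idPerson1 = L.idPerson THEN R.idPerson2 ELSE R.idPerson1 END
    FROM linked L
    JOIN tbRelationships R
      ON R.idRelationshipType = :rel_type
     AND (R.idPerson1 = L.idPerson OR R.idPerson2 = L.idPerson)
)
SELECT idPerson FROM linked ORDER BY idPerson
"""

ids = [row[0] for row in conn.execute(query, {"person": 1, "rel_type": 1})]
print(ids)
```

Starting from person 1 this yields 1 through 5, matching the result the question asks for. Note that the starting id is always part of the output, even if it has no relationships.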

Related

SQL Select from table where joined values from a second table are a subset of values from a third table

I have the following tables in MS SQL Server: Tasks, Users, Tags, TaskTags (maps a task to a tag), and UserTags (maps a user to a tag).
Given a User U, I want to find all tasks T where every tag of T is also a tag of U (e.g. a task should be returned if its tags are a subset of the user's tags).
Here is a table script with some sample data (it can be run at http://sqlfiddle.com/ with MS SQL Server 17):
CREATE TABLE [dbo].[Tasks](
[TaskId] [int] NOT NULL PRIMARY KEY,
[TaskName] [nvarchar](MAX) NOT NULL
)
CREATE TABLE [dbo].[Users](
[UserId] [int] NOT NULL PRIMARY KEY,
[UserName] [nvarchar](MAX) NOT NULL
)
CREATE TABLE [dbo].[Tags](
[TagId] [int] NOT NULL PRIMARY KEY,
[TagName] [nvarchar](MAX) NOT NULL
)
CREATE TABLE [dbo].[TaskTags](
[TaskId] [int] NOT NULL,
[TagId] [int] NOT NULL
)
CREATE TABLE [dbo].[UserTags](
[UserId] [int] NOT NULL,
[TagId] [int] NOT NULL
)
INSERT INTO Tasks VALUES (1,'Task for all SWEs');
INSERT INTO Tasks VALUES (2,'Task for USA SWEs');
INSERT INTO Tasks VALUES (3,'Task for all PMs');
INSERT INTO Tasks VALUES (4,'Task for Europe PMs');
INSERT INTO Users VALUES (1,'Europe SWE');
INSERT INTO Users VALUES (2,'USA SWE');
INSERT INTO Users VALUES (3,'Europe PM');
INSERT INTO Users VALUES (4,'USA PM');
INSERT INTO Tags VALUES (1,'swe');
INSERT INTO Tags VALUES (2,'pm');
INSERT INTO Tags VALUES (3,'usa');
INSERT INTO Tags VALUES (4,'europe');
INSERT INTO TaskTags VALUES (1,1);
INSERT INTO TaskTags VALUES (2,1);
INSERT INTO TaskTags VALUES (2,3);
INSERT INTO TaskTags VALUES (3,2);
INSERT INTO TaskTags VALUES (4,2);
INSERT INTO TaskTags VALUES (4,4);
INSERT INTO UserTags VALUES (1,1);
INSERT INTO UserTags VALUES (1,4);
INSERT INTO UserTags VALUES (2,1);
INSERT INTO UserTags VALUES (2,3);
INSERT INTO UserTags VALUES (3,2);
INSERT INTO UserTags VALUES (3,4);
INSERT INTO UserTags VALUES (4,2);
INSERT INTO UserTags VALUES (4,3);
I was able to figure out the inverse of this problem, when the Task T is given. E.g. given Task T, return all Users U where the tags of T are a subset of U. Here is that query:
WITH thisTaskTags AS (
SELECT DISTINCT TaskTags.TagId
FROM TaskTags
WHERE TaskTags.TaskId = @taskId
)
SELECT UserTags.UserId
FROM UserTags JOIN thisTaskTags
ON UserTags.TagId = thisTaskTags.TagId CROSS JOIN
(SELECT COUNT(*) AS keycnt FROM thisTaskTags) k
GROUP BY UserTags.UserId
HAVING COUNT(thisTaskTags.TagId) = MAX(k.keycnt)
When @taskId = 1, UserIds 1 and 2 are returned, and when @taskId = 2, only UserId 2 is returned (correct behavior).
However when I tried to convert this to returning all tasks a given user should have, I ran into trouble. I tried this query:
WITH thisUserTags AS (
SELECT DISTINCT UserTags.TagId
FROM UserTags
WHERE UserTags.UserId = @userId
)
SELECT TaskTags.TaskId
FROM TaskTags JOIN thisUserTags
ON thisUserTags.TagId = TaskTags.TagId CROSS JOIN
(SELECT COUNT(*) AS keycnt FROM thisUserTags) k
GROUP BY TaskTags.TaskId
HAVING COUNT(thisUserTags.TagId) = MAX(k.keycnt);
However, this only returns tasks where all the task tags match all the user tags, e.g. if U had tags: [a,b,c] it would only get tasks with tags: [a,b,c] instead of [a], [b], [b,c], etc.
With concrete examples, if you set @userId = 1, no task IDs are returned, when the correct output would be 1 row, Task ID = 1. And when @userId = 2, only taskID 2 is returned, when both taskIDs 1 and 2 should be returned (i.e. if a task only has the "swe" tag, all "swe" users should get it, but if a task has both "swe" and "usa", only users who have both of those tags should get it).
I also tried this query:
SELECT DISTINCT Tasks.TaskId FROM Tasks
INNER JOIN TaskTags ON TaskTags.TaskId = Tasks.TaskId
WHERE TaskTags.TagId IN (SELECT TagId from UserTags where UserId = @userId)
GROUP BY Tasks.TaskId
But the issue with this is it returns any task that has any tag in common, so U with tags: [a,b,c] would get T with tags: [b,d] even though U doesn't have tag d.
Again with concrete examples, if @userId = 1, taskIDs 1, 2, and 4 are returned, when only taskID 1 should be returned (task ID 2 needs the "usa" tag, which this user lacks, and task ID 4 should only be assigned to users with both tags "europe" and "pm"; here it is erroneously being assigned to a user with tags "europe" and "swe" due to the common "europe" tag).
Could someone shed some light here?
This is a classic Relational Division With Remainder question.
You just need to frame it right:
You want all Tasks...
... whose TaskTags divide the set of all UserTags for a given User
There can be a remainder of UserTags but not a remainder of TaskTags so the former is the dividend, the latter is the divisor.
A typical solution (there are many) is to left join the dividend to the divisor, group it up, then ensure that the number of matched dividends is the same as the number of divisors. In other words, all divisors have a match.
Since you only seem to want the Tasks but not their TaskTags, you can do all this in an EXISTS subquery:
DECLARE @userId int = 1;
SELECT *
FROM Tasks t
WHERE EXISTS (SELECT 1
FROM TaskTags tt
LEFT JOIN UserTags ut ON ut.TagId = tt.TagId
AND ut.UserId = @userId
WHERE tt.TaskId = t.TaskId
HAVING COUNT(*) = COUNT(ut.UserId)
);
You're probably looking for something like the following...
declare @userId int = ...;
select Tasks.TaskId
from Tasks
where 0 = (
select count(1)
from (
select TagId from TaskTags where TaskTags.TaskId=Tasks.TaskId
except
select TagId from UserTags where UserTags.UserId=@userId
) TaskSpecificTags
);
It's not clear if you also want to return Tasks with 0 Tags so you may need to test that condition as well.
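If you want to check the expected output for every user in the sample data, here is a small sketch that runs in SQLite through Python's sqlite3 module (an assumption made to keep the example self-contained; it is not SQL Server). It uses a third common phrasing of relational division, a double NOT EXISTS: keep a task when there is no tag on it that the user lacks. The helper name tasks_for is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TaskTags (TaskId INT NOT NULL, TagId INT NOT NULL);
CREATE TABLE UserTags (UserId INT NOT NULL, TagId INT NOT NULL);
INSERT INTO TaskTags VALUES (1,1), (2,1), (2,3), (3,2), (4,2), (4,4);
INSERT INTO UserTags VALUES (1,1), (1,4), (2,1), (2,3), (3,2), (3,4), (4,2), (4,3);
""")

query = """
SELECT DISTINCT t.TaskId
FROM TaskTags t
WHERE NOT EXISTS (            -- there is no tag on this task...
    SELECT 1
    FROM TaskTags tt
    WHERE tt.TaskId = t.TaskId
      AND NOT EXISTS (        -- ...that the user does not also have
          SELECT 1
          FROM UserTags ut
          WHERE ut.UserId = :user_id
            AND ut.TagId = tt.TagId))
ORDER BY t.TaskId
"""

def tasks_for(user_id):
    """Return the task ids whose tags are all held by the given user."""
    return [row[0] for row in conn.execute(query, {"user_id": user_id})]

results = {user_id: tasks_for(user_id) for user_id in (1, 2, 3, 4)}
print(results)
```

On the sample data this gives task 1 for user 1, tasks 1 and 2 for user 2, tasks 3 and 4 for user 3, and task 3 for user 4. Because this sketch starts from TaskTags, a task with no tags at all never shows up; select from Tasks in the outer query instead if such tasks should count as matching everyone.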

join many rows into one xml column

I have two tables like this:
and
The first generated table lists all the actors/directors/other and the second table lists all the movies. What I need to do is fit all the Person_ID and Name into one column in the second table, ideally through a join and XML conversion. I know how to convert a whole table to an XML, but I am not sure how to do it to a dozen rows on the fly, is iteration the way? Any ideas?
If you use FOR XML PATH('') it eliminates the root tags. If you do not specify a column alias, it will just output the value. So you can create a string of values in a sub-query as one of the column values. Something like this should get you started:
create table #Cast (FilmID int, Person_ID int, PersonName varchar(20))
create table #Films (FilmID int , FilmName varchar(20))
insert into #Cast values (1, 1, 'bob')
insert into #Cast values (1, 2, 'tom')
insert into #Cast values (2, 3, 'sam')
insert into #Cast values (2, 4, 'ray')
insert into #Films values (1, 'Flowers for Charlie')
insert into #Films values (2, 'Batman')
SELECT #Films.*,
SUBSTRING(
(SELECT ( ',' + convert(varchar(10),Person_ID) + ',' + PersonName)
FROM #Cast
WHERE #Films.FilmID = #Cast.FilmID
ORDER BY #Films.FilmID, #Cast.FilmID
FOR XML PATH('')
), 2, 8000) CastList
FROM #Films

MS SQL select statement with "check table"

I'm struggling to find out a solution for my select statement. I have two tables:
CREATE TABLE Things
(
Id int,
type int
)
CREATE TABLE Relations
(
Id int,
IdParent int,
IdChild int
)
What I need to select, based on given Thing Id:
All records from Things that have Type = -1
All records from Things whose Type matches IdChild, where IdParent is the Type of the row matching the given Id.
If the Type of the row matching the given Id is -1 or doesn't exist in table Relations (as IdParent), I need to select all records from Things
I am having problem with the last scenario, I tried to do that by joining Relations table, but I can't come up with a condition that would satisfy all scenarios, any suggestions?
UPDATE
This is how I do it now. I need to solve the scenario where the given id does not exist in table Relations - then I need to select all records.
CREATE TABLE Things (id int, type int)
CREATE TABLE Relations (id int, parent int, child int)
INSERT INTO Things VALUES (1, 1)
INSERT INTO Things VALUES (2, -1)
INSERT INTO Things VALUES (3, 3)
INSERT INTO Things VALUES (4, 3)
INSERT INTO Things VALUES (5, 2)
INSERT INTO Things VALUES (6, -1)
INSERT INTO Relations VALUES (1, 1, 2)
DECLARE @id int = 1
SELECT * FROM Things
JOIN Relations R ON R.parent = @id AND Type = R.child OR Type = -1
So I must solve the situation where @id = 2 for example - I need to retrieve all rows (same for @id = 5, unless it appears in Relations in parent column somewhere).
UPDATE 2
I came up with something like that:
DECLARE @id int = 2
DECLARE @type int
SELECT @type = type FROM Things WHERE Id = @id
IF @type > -1
BEGIN
SELECT T.* FROM Things T
JOIN Relations R ON (R.parent = @id AND T.Type = R.child) OR T.Type = -1
END
ELSE
BEGIN
SELECT * FROM Things
END
I'm quite sure that this can be done differently, without conditional IF, optimized.
from Things
left outer join Relations
on Relations.IdParent = Things.Type
where Things.Type = -1 or Relations.IdParent is null
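For the "without conditional IF" part, here is one possible single-query sketch, run in SQLite through Python's sqlite3 module (an assumption made so the example is self-contained; it is not SQL Server). It follows the written description, so Relations.parent is compared with the Type of the given row, not with the id itself as in the asker's own attempt. The CTE name children, the parameter name and the helper things_for are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Things (id INT, type INT);
CREATE TABLE Relations (id INT, parent INT, child INT);
INSERT INTO Things VALUES (1, 1), (2, -1), (3, 3), (4, 3), (5, 2), (6, -1);
INSERT INTO Relations VALUES (1, 1, 2);
""")

query = """
WITH children AS (
    -- child types allowed for the type of the given Thing
    SELECT R.child
    FROM Things G
    JOIN Relations R ON R.parent = G.type
    WHERE G.id = :thing_id
)
SELECT T.id
FROM Things T
WHERE T.type = -1                                               -- always included
   OR (SELECT G.type FROM Things G WHERE G.id = :thing_id) = -1 -- given type is -1: everything
   OR NOT EXISTS (SELECT 1 FROM children)                       -- given type is not a parent: everything
   OR T.type IN (SELECT child FROM children)                    -- otherwise only matching children
ORDER BY T.id
"""

def things_for(thing_id):
    """Return the ids of the Things selected for the given Thing id."""
    return [row[0] for row in conn.execute(query, {"thing_id": thing_id})]

for thing_id in (1, 2, 5):
    print(thing_id, things_for(thing_id))
```

With the sample rows, id 1 (type 1, which is a parent of type 2) returns Things 2, 5 and 6, while ids 2 and 5 return all six rows.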

Avoiding duplicate recursion with Common Table Expressions

Let's say I have a table with 2 columns ID and ParentID. My data looks like this:
ID ParentID
1 Null
2 1
3 1
4 2
4 2
So to find all relationships based on a given ID my query simplified looks like this:
WITH links ([ID], [ParentID], Depth)
AS
(
--Get the starting link
SELECT
[ID],
[ParentID],
[Depth] = 1
FROM
[MyTable]
WHERE
[ID] = @StartID
UNION ALL
--Recursively get links that are parented to links already in the CTE
SELECT
mt.[ID],
mt.[ParentID],
[Depth] = l.[Depth] + 1
FROM
[MyTable] mt
JOIN
links l ON mt.ParentID = l.ID
WHERE
Depth < 99
)
SELECT
[Depth],
[ID],
[ParentID]
FROM
[links]
Now let's say the data in my table creates a cyclical relationship (4 is parented to 2 and 2 is parented to 4). Forgetting for a moment that there should likely be constraints on the database to prevent this, the above recursive CTE query produces duplicate records (99 of them) because it will recursively evaluate that cyclical relationship between 2 and 4.
ID ParentID
1 Null
2 1
3 1
4 2
2 4
2 4
How can I alter my query to prevent that, assuming that I have no control over preventing the actual data from representing that cyclical relationship? Normally I would put a distinct on the final select but I want the Depth value, which makes every record distinct. I'm also hoping to account for it within the CTE, as a distinct operates on the final select, and is probably not as efficient.
You could create a tree path variable in the CTE which shows your entire path from the top of the recursive query, then check to see if the number in question is in the tree path, if it is then abort at that point.
USE Master;
GO
CREATE DATABASE [QueryTraining];
GO
USE [QueryTraining];
GO
CREATE TABLE [MyTable] (
ID int, --would normally be an INT IDENTITY
ParentID int
);
INSERT INTO [MyTable] (ID, ParentID)
VALUES (1, NULL),
(2, 1),
(3, 1),
(4, 2),
(2, 4),
(2, 4);
DECLARE @StartID AS INTEGER;
SET @StartID = 1;
;WITH links (ID, ParentID, Depth, treePath)
AS
(
--Get the starting link
--The path is delimited on both sides (':1:') so the ':ID:' search below also matches the root
SELECT [ID],
[ParentID],
[Depth] = 1,
CAST(':' + CAST([ID] AS VARCHAR(MAX)) + ':' AS VARCHAR(MAX)) AS treePath
FROM [MyTable]
WHERE [ID] = @StartID
UNION ALL
--Recursively get links that are parented to links already in the CTE
SELECT mt.[ID],
mt.[ParentID],
[Depth] = l.[Depth] + 1,
CAST(l.treePath + CAST(mt.[ID] AS VARCHAR(MAX)) + ':' AS VARCHAR(MAX)) AS treePath
FROM [MyTable] mt
INNER JOIN links l ON mt.ParentID = l.ID
AND CHARINDEX(':' + CAST(mt.[ID] AS VARCHAR(MAX)) + ':', l.[treePath]) = 0
WHERE Depth < 10
)
SELECT
[Depth],
[ID],
[ParentID],
[treePath]
FROM
[links];
The line on the INNER JOIN that says
AND CHARINDEX(':' + CAST(mt.[ID] AS VARCHAR(MAX)) + ':', l.[treePath]) = 0
Is where the previous numbers in the path get filtered out.
Just copy and paste the example and give it a try.
One note, the way that I am using CHARINDEX on the CTE may not scale well, but it does accomplish what I think you are looking for.
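If you don't have a SQL Server instance handy, the same path-tracking idea can be tried as a sketch in SQLite through Python's sqlite3 module (an assumption made for the sake of a runnable example; SQLite's instr and || stand in for CHARINDEX and +). Every id in the path is wrapped in ':' on both sides, including the root, so that searching for ':2:' cannot accidentally match part of a longer id such as 12.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (ID INT, ParentID INT);
INSERT INTO MyTable VALUES (1, NULL), (2, 1), (3, 1), (4, 2), (2, 4);
""")

query = """
WITH RECURSIVE links(ID, ParentID, Depth, treePath) AS (
    -- starting link, path delimited on both sides: ':1:'
    SELECT ID, ParentID, 1, ':' || ID || ':'
    FROM MyTable
    WHERE ID = :start
    UNION ALL
    -- children of links already found, skipping any ID already on the path
    SELECT mt.ID, mt.ParentID, l.Depth + 1, l.treePath || mt.ID || ':'
    FROM MyTable mt
    JOIN links l ON mt.ParentID = l.ID
    WHERE instr(l.treePath, ':' || mt.ID || ':') = 0
)
SELECT Depth, ID, ParentID, treePath
FROM links
ORDER BY Depth, ID
"""

rows = conn.execute(query, {"start": 1}).fetchall()
for row in rows:
    print(row)
```

With the cyclical sample data this returns four rows (IDs 1, 2, 3 and 4, each exactly once, with 4 at depth 3 and path ':1:2:4:'); the row that parents 2 to 4 is skipped because ':2:' is already on the path, so no depth cap is needed to stop the recursion.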

SQL Server Merge statement

I am using a MERGE statement in my stored procedure, and I need to count the rows affected by the updates and by the inserts. If I use a single variable to capture the affected row count (for both update and insert), how can I tell which part of the count came from the update and which part came from the insert? Is there a better way?
You can create a table variable to hold the action type then OUTPUT the pseudo $action column to it.
Example
/*Table to use as Merge Target*/
DECLARE @A TABLE (
[id] [int] NOT NULL PRIMARY KEY CLUSTERED,
[C] [varchar](200) NOT NULL)
/*Insert some initial data to be updated*/
INSERT INTO @A
SELECT 1, 'A' UNION ALL SELECT 2, 'B'
/*Table to hold actions*/
DECLARE @Actions TABLE(act CHAR(6))
/*Do the Merge*/
MERGE @A AS target
USING (VALUES (1, '#a'),( 2, '#b'),(3, 'C'),(4, 'D'),(5, 'E')) AS source (id, C)
ON (target.id = source.id)
WHEN MATCHED THEN
UPDATE SET C = source.C
WHEN NOT MATCHED THEN
INSERT (id, C)
VALUES (source.id, source.C)
OUTPUT $action INTO @Actions;
/*Check the result*/
SELECT act, COUNT(*) AS Cnt
FROM @Actions
GROUP BY act
Returns
act Cnt
------ -----------
INSERT 3
UPDATE 2

Resources