I need to model groups of persons, and I can't find a way to design tables that does it efficiently.
Groups can be thought of as sets: unordered collections of one or more persons. Each group should be uniquely identified by its members.
Edit: a person can also be part of more than one group.
My first attempt looks like this.
A table that contains all "persons" managed by the system:
table Persons(
id int,
name varchar,
(other data...)
)
A table that contains groups and all group properties:
table Groups(
group_id int,
group_name varchar,
(other data...)
)
And a table with the association between persons and groups:
table gropus_persons (
person_id int,
group_id int
)
This design doesn't fit these requirements well, because it is hard to write a query that retrieves the group id from a list of members.
The only query I could come up with to find the group composed of persons (1, 2, 3) looks like this:
select *
from groups g
where
g.group_id in (select group_id from gropus_persons where person_id = 1)
and g.group_id in (select group_id from gropus_persons where person_id = 2)
and g.group_id in (select group_id from gropus_persons where person_id = 3)
and not exists (select 1 from gropus_persons where group_id = g.group_id and person_id not in (1,2,3))
The problem is that the number of members is variable, so I can only use a dynamically generated query that adds one subquery per member every time I need to find a group.
Is there a better solution?
Thanks in advance for the help!
You need to group by the "group" and count how many hits you receive. For this, you only need the intersection table:
select GroupID, count(*) as MemberCount
from GroupsPersons
where PersonID in( 1, 2, 3 )
group by GroupID
having count(*) = 3;
The problem comes with making this query work for a varying list of person id values. As you seem to already realize, this will require dynamic SQL; the pseudo-code will look something like this:
-- CSVList holds the ids, e.g. '1, 2, 3'; ItemCount holds the number of ids in that list (here 3)
stmt := 'select GroupID, count(*) as MemberCount '
|| 'from GroupsPersons '
|| 'where PersonID in( ' || CSVList || ' ) '
|| 'group by GroupID '
|| 'having count(*) = ' || ItemCount;
The one potential bug you have to be wary of is if the same id repeats in the list. For example: CSVList := '1, 2, 3, 2';
This will generate a correct count(*) value of 3, but the having clause will be looking for 4.
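A minimal sketch of the same GROUP BY / HAVING idea, run on SQLite through Python's sqlite3 module as a stand-in for your database. It assumes the GroupsPersons table from above with one row per (GroupID, PersonID); find_group is a hypothetical helper name. It removes repeated ids before counting, binds the ids as parameters instead of concatenating a string, and (a variation on the query above) also checks each group's total size, so that a group containing persons 1, 2, 3 and 4 is not returned when you ask for exactly 1, 2, 3.

```python
import sqlite3


def find_group(conn, person_ids):
    """Return ids of groups whose members are exactly person_ids (hypothetical helper)."""
    ids = sorted(set(person_ids))  # drop repeated ids so the HAVING counts line up
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)  # one bound parameter per id
    sql = f"""
        select GroupID
        from GroupsPersons
        group by GroupID
        -- every requested person is a member of the group ...
        having sum(case when PersonID in ({placeholders}) then 1 else 0 end) = ?
        -- ... and the group has no other members
           and count(*) = ?
    """
    rows = conn.execute(sql, [*ids, len(ids), len(ids)]).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("create table GroupsPersons (GroupID int, PersonID int, primary key (GroupID, PersonID))")
conn.executemany(
    "insert into GroupsPersons values (?, ?)",
    [(10, 1), (10, 2), (10, 3),            # group 10 = {1, 2, 3}
     (20, 1), (20, 2), (20, 3), (20, 4),   # group 20 = {1, 2, 3, 4}
     (30, 2), (30, 3)],                    # group 30 = {2, 3}
)
print(find_group(conn, [1, 2, 3, 2]))  # [10]; the repeated 2 is harmless
```

Because the ids are bound as parameters, the generated SQL only varies in the number of placeholders, which also avoids SQL injection through the id list.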
Another solution to consider is to concatenate the set of person IDs, always in the same sorted order (for example with a pivot or FOR XML PATH), store that string in your groups table, and compare it with your target.
For your example, you'd use: select group_id from groups where personIDs = '1,2,3,'
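A sketch of how such a key could be built before storing or searching, assuming a hypothetical personIDs text column on the groups table; member_key is a hypothetical helper name. The ids are de-duplicated and sorted numerically, then joined with a trailing comma to match the '1,2,3,' format above. Any order works as long as it is always the same one; note that a plain string sort would put 10 before 2.

```python
def member_key(person_ids):
    """Build a canonical key for a set of person ids, e.g. [3, 1, 2] -> '1,2,3,'."""
    unique_sorted = sorted({int(p) for p in person_ids})  # same set -> same order
    return "".join(f"{p}," for p in unique_sorted)


# Different orderings (and a repeated id) map to the same key.
print(member_key([3, 1, 2]))     # 1,2,3,
print(member_key([2, 1, 3, 2]))  # 1,2,3,
```

You would compute this key whenever a group is created or its membership changes; a unique index on the column would then also stop two groups from having the same members.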
How about this? I think the schema is the same as yours, but I'm not sure:
create table Groups(
group_id int primary key,
group_name varchar(100)
);
create table Persons(
person_id int primary key,
name varchar(100)
);
create table Membership(
group_id int REFERENCES Groups (group_id),
person_id int REFERENCES Persons (person_id),
PRIMARY KEY (group_id, person_id)
);
INSERT INTO Persons
VALUES (1, 'p1'),
(2, 'p2'),
(3, 'p3'),
(4, 'p4');
INSERT INTO Groups
VALUES (1, 'group1'),
(2, 'group2');
INSERT INTO Membership
VALUES (1, 1),
(1, 2),
(2, 2),
(1, 3);
Then select:
select p.name, g.group_name
from Persons as p
join Membership as m on p.person_id = m.person_id
join Groups as g on g.group_id = m.group_id
where m.group_id in (1, 2);
Obviously data would need to be adjusted to suit yours.
My table's columns are: Name, Location_Name, Location_ID.
I want to check Name and Location_ID, and if two or more rows have the same values, I want to delete the extra rows.
For example: if John Fox at location ID 4 shows up two or more times, I want to keep just one.
After that, I want to count how many people there are per location:
Location_Name1: 45
Location_Name2: 66
Etc...
The location name and Location Id are related.
Deleting duplicates is a common pattern. You apply a sequence number to all of the duplicates, then delete any that aren't first. In this case I order arbitrarily, but you can choose to keep the one with the lowest PK value, the one modified last, or whatever you like; just update the ORDER BY so that the row you want to keep sorts first.
;WITH cte AS
(
SELECT *, rn = ROW_NUMBER() OVER
(PARTITION BY Name, Location_ID ORDER BY @@SPID)
FROM dbo.TableName
)
DELETE cte WHERE rn > 1;
Then to count, assuming there can't be two different Location_IDs for a given Location_Name (this is why schema + sample data is so helpful):
SELECT Location_Name, People = COUNT(Name)
FROM dbo.TableName
GROUP BY Location_Name;
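To try both steps outside SQL Server, here is a sketch against SQLite via Python's sqlite3, using the sample data from the next answer. SQLite cannot DELETE through a CTE, so the sketch deletes by rowid instead, still keeping the row with ROW_NUMBER() = 1 in each (Name, Location_ID) partition. It assumes SQLite 3.25 or later for window-function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table People_Location (Name text, Location_Name text, Location_ID int)")
conn.executemany(
    "insert into People_Location values (?, ?, ?)",
    [("John Fox", "Moon", 4), ("John Bear", "Moon", 4), ("Peter", "Saturn", 5),
     ("John Fox", "Moon", 4), ("Micheal", "Sun", 1), ("Jackie", "Sun", 1),
     ("Tito", "Sun", 1), ("Peter", "Saturn", 5)],
)

# Number the rows inside each (Name, Location_ID) partition and delete all but the first.
conn.execute("""
    delete from People_Location
    where rowid in (
        select rid from (
            select rowid as rid,
                   row_number() over (partition by Name, Location_ID order by rowid) as rn
            from People_Location
        )
        where rn > 1
    )
""")

# Count the remaining people per location.
counts = conn.execute("""
    select Location_Name, count(Name) as People
    from People_Location
    group by Location_Name
    order by Location_Name
""").fetchall()
print(counts)  # [('Moon', 2), ('Saturn', 1), ('Sun', 3)]
```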
If Location_Name and Location_ID are not tightly coupled (e.g. there could be Location_ID = 4, Location_Name = Place 1 and Location_ID = 4, Location_Name = Place 2), then you're going to have to define which place to display if you group by Location_ID, or admit that perhaps one of those columns is meaningless.
If Location_Name and Location_ID are tightly coupled, they shouldn't both be stored in this table. You should have a lookup/dimension table that stores both of those columns (once!) and you use the smallest data type as the key you record in the fact table (where it is repeated over and over again). This has several benefits:
Scanning the bigger table is faster, because it's not as wide
Storage is reduced because you're not repeating long strings over and over and over again
Aggregation is clearer and you can join to get names after aggregation, which will be faster
If you need to change a location's name, you only need to change it in exactly one place
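A small sketch of that layout, again using SQLite through Python's sqlite3 for brevity; the Locations and People table names are hypothetical. Location names are stored once, the fact table repeats only the integer Location_ID, and the names are joined in after the aggregation. The composite primary key on People is an extra suggestion that stops duplicate (Name, Location_ID) rows from being inserted at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Locations (              -- lookup/dimension table: one row per location
        Location_ID   integer primary key,
        Location_Name text not null unique
    );
    create table People (                 -- fact table: only the small integer key repeats
        Name        text not null,
        Location_ID integer not null references Locations (Location_ID),
        primary key (Name, Location_ID)   -- rejects duplicate rows up front
    );
    insert into Locations values (1, 'Sun'), (4, 'Moon'), (5, 'Saturn');
    insert into People values ('John Fox', 4), ('John Bear', 4), ('Peter', 5),
                              ('Micheal', 1), ('Jackie', 1), ('Tito', 1);
""")

# Aggregate on the narrow key first, then join to pick up the names.
rows = conn.execute("""
    select l.Location_Name, c.PeopleCount
    from (select Location_ID, count(*) as PeopleCount
          from People
          group by Location_ID) as c
    join Locations as l on l.Location_ID = c.Location_ID
    order by l.Location_Name
""").fetchall()
print(rows)  # [('Moon', 2), ('Saturn', 1), ('Sun', 3)]
```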
Sample code
CREATE TABLE People_Location
(
Name VARCHAR(30) NOT NULL,
Location_Name VARCHAR(30) NOT NULL,
Location_ID INT NOT NULL
);
INSERT INTO People_Location
VALUES
('John Fox', 'Moon', 4),
('John Bear', 'Moon', 4),
('Peter', 'Saturn', 5),
('John Fox', 'Moon', 4),
('Micheal', 'Sun', 1),
('Jackie', 'Sun', 1),
('Tito', 'Sun', 1),
('Peter', 'Saturn', 5);
Get location and count
select Location_Name, count(1)
from
(select Name, Location_Name,
rn = ROW_NUMBER() OVER (PARTITION BY Name, Location_ID ORDER BY Name)
from People_Location
) t
where rn = 1
group by Location_Name
Result
Moon 2
Saturn 1
Sun 3
I have a SQL Server database with a large products table and a category table.
In the product table there is a category column which contains the ID of the corresponding category.
What I have to do is perform a LIKE query on both the product name column and the corresponding category name, and return all matching rows.
What kind of approach should I use considering I want to minimize the load on the server?
EDIT to improve this question:
Expected results (searching for "apple"):
product1 name->red apple catId->3 (cat_id = 3, "best apples")
product2 name->green apple catId->5 (cat_id = 5, "good fruits")
product3 name->green banana catId->8 (cat_id = 8, "apples & bananas")
If I understand correctly, this will help you:
create table products(
ID INT PRIMARY KEY,
[Name] NVARCHAR(MAX),
[CatId] INT
)
CREATE TABLE CATEGORIES(
ID INT PRIMARY KEY,
[Name] NVARCHAR(MAX)
)
insert into CATEGORIES values(1, 'Laptop')
insert into CATEGORIES values(2, 'Fruit')
insert into products values (2, 'Washington Apple', 2)
insert into products values (1, 'Apple mac book pro', 1)
select * from products
SELECT p.[Name] AS ProductName, c.[Name] AS CategoryName
FROM products p
LEFT JOIN CATEGORIES c ON p.CatId = c.ID
WHERE p.[Name] LIKE '%apple%'
OR c.[Name] LIKE '%apple%'
This code joins the two tables and returns the products whose name or category name matches the keyword, together with their category names. You can modify the definitions of products and categories to include any other columns you want. You may also want to add a foreign key constraint on CatId.
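A sketch of the two-column search run on SQLite through Python's sqlite3, using the three products and categories from the question's expected results plus one non-matching row added for contrast ('hammer' / 'tools' is made up). Two hedges: SQLite's LIKE is case-insensitive for ASCII by default, whereas SQL Server follows the column collation; and a pattern with a leading % generally cannot use an ordinary index seek, so on a large table this is a scan either way. If word-based matching is acceptable, SQL Server's full-text search may reduce the load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table CATEGORIES (ID integer primary key, Name text);
    create table products (
        ID    integer primary key,
        Name  text,
        CatId integer references CATEGORIES (ID)
    );
    insert into CATEGORIES values (3, 'best apples'), (5, 'good fruits'),
                                  (8, 'apples & bananas'), (9, 'tools');
    insert into products values (1, 'red apple', 3), (2, 'green apple', 5),
                                (3, 'green banana', 8), (4, 'hammer', 9);
""")

keyword = "apple"
rows = conn.execute(
    """
    select p.Name as ProductName, c.Name as CategoryName
    from products p
    left join CATEGORIES c on p.CatId = c.ID
    where p.Name like '%' || ? || '%'   -- match on the product name ...
       or c.Name like '%' || ? || '%'   -- ... or on the category name
    order by p.ID
    """,
    (keyword, keyword),
).fetchall()
for product, category in rows:
    print(product, "|", category)
```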
I have this structure of tables:
CREATE TABLE Users
([UserId] int,
[IdDepartment] int);
INSERT INTO Users
([UserId], [IdDepartment])
VALUES
(1, 5),
(2, 0),
(3, -1),
(4, 0),
(5, -1),
(6, 0);
CREATE TABLE Department
([IdDepartment] int, [Name] varchar(23), [IdUser] int);
INSERT INTO Department
([IdDepartment], [Name], [IdUser])
VALUES
(1, 'Sales', 3),
(2, 'Finance', null ),
(3, 'Accounting' , 5),
(4, 'IT' ,3),
(5, 'Secretary',null),
(6, 'Sport',3);
I want to write a query that returns the departments each user can see, according to these rules:
In the Users table, if IdDepartment is 0 it means that the user is an admin, so he can see all the departments. If the user has -1 in IdDepartment, it means that the user can access a limited set of departments, so in this case I do an inner join to the Department table to get the list of those departments. The last case is when the user has an IdDepartment value in the Users table different from 0 and -1; it means that the user can access only that department.
I tried to do something like that, but it is not well structured:
select
case idDepartment
when 0 then (select Name from Department)
when -1 then (select Name from Department where IdUser = 3)
else (select Name from Department
inner join Users on Department.idDepartment = Users.Department
where Users.UserId = 3)
end
from
Department
where
IdUser = 3
How can I do this? Thanks.
Here is an example of what I want to get:
-For the user that has the userid (1) -->
Department Name
---------------
Secretary
-For the user that has the userid (2) -->
Department Name
---------------
Sales
Finance
Accounting
IT
Secretary
Sport
-For the user that has the userid (3) -->
Department Name
---------------
Sales
IT
Sport
You can't do something like that in a CASE expression (a CASE returns a single value, not a set of rows); the best option is to introduce some procedural logic:
DECLARE @IdUser INT = 3
DECLARE @userDepartment INT
SELECT @userDepartment = IdDepartment
FROM Users
WHERE UserId = @IdUser
IF @userDepartment = 0
BEGIN
SELECT Name FROM Department
END
ELSE IF @userDepartment = -1
BEGIN
SELECT Name FROM Department WHERE IdUser = @IdUser
END
ELSE
BEGIN
SELECT Name FROM Department
INNER JOIN Users
ON Department.idDepartment = Users.IdDepartment
WHERE Users.UserId = @IdUser
END
By the way, you've hit upon why your structure is not ideal. If you had a junction table between Users and Departments, you could model any combination of what you already have with a much simpler query (at the cost of many more rows in the junction table).
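A sketch of what that junction table could look like, using SQLite through Python's sqlite3; UserDepartments and departments_for are hypothetical names. Each row grants one user access to one department, so "all", "a list" and "a single department" all become the same join. The rows below encode the access rules from the question for users 1 to 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Department (IdDepartment integer primary key, Name text);
    insert into Department values (1, 'Sales'), (2, 'Finance'), (3, 'Accounting'),
                                  (4, 'IT'), (5, 'Secretary'), (6, 'Sport');

    -- Hypothetical junction table: one row per (user, department) grant.
    create table UserDepartments (
        UserId       integer not null,
        IdDepartment integer not null references Department (IdDepartment),
        primary key (UserId, IdDepartment)
    );
    insert into UserDepartments values (1, 5);                           -- single department
    insert into UserDepartments select 2, IdDepartment from Department;  -- admin: all of them
    insert into UserDepartments values (3, 1), (3, 4), (3, 6);           -- limited list
""")


def departments_for(user_id):
    """The same query serves every kind of user."""
    return [name for (name,) in conn.execute(
        """
        select d.Name
        from UserDepartments ud
        join Department d on d.IdDepartment = ud.IdDepartment
        where ud.UserId = ?
        order by d.IdDepartment
        """,
        (user_id,),
    )]


print(departments_for(3))  # ['Sales', 'IT', 'Sport']
```

The trade-off is the one mentioned above: an admin needs one row per department, and granting a new department to all admins means inserting rows.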
Your sample code is a bit confusing, but it looks like you are after something like this:
declare @id_user int = 3
select d.IdDepartment, d.Name
from Department d
where exists
(
select 1
from Users u
where u.[UserId] = @id_user
and u.IdDepartment in (0, d.IdDepartment)
)
or d.[IdUser] = @id_user
which implements:
if IdUser in the Department table equals the given @id_user, the user has access to that department for sure;
otherwise, the user has access to a department if their IdDepartment value is 0 or equal to that department's ID.
That said, your permissions/security model smells bad and doesn't scale at all. You'd be better off introducing another entity (table) to store the permitted (IdUser, IdDepartment) pairs. The SELECT statements would be much clearer in that case.
declare @IdUser int = 3;
SELECT u.[UserId], d.Name
from Users u
join Department d
on u.[IdDepartment] = 0
or ( u.[IdDepartment] = -1 and d.[IdUser] = u.[UserId] )
or ( u.[IdDepartment] > 0 and d.[IdDepartment] = u.[IdDepartment] )
where u.[UserId] = @IdUser
order by u.[UserId], d.Name
User ID 3 should include Sport.
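To check the single-join version against the expected output in the question, here is the same join condition run on SQLite through Python's sqlite3 with the question's sample data. The T-SQL variable @IdUser becomes a bound parameter, and the sketch orders by IdDepartment instead of Name so that the rows come out in the order the question lists them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Users (UserId int, IdDepartment int);
    insert into Users values (1, 5), (2, 0), (3, -1), (4, 0), (5, -1), (6, 0);
    create table Department (IdDepartment int, Name text, IdUser int);
    insert into Department values (1, 'Sales', 3), (2, 'Finance', null),
        (3, 'Accounting', 5), (4, 'IT', 3), (5, 'Secretary', null), (6, 'Sport', 3);
""")

QUERY = """
    select d.Name
    from Users u
    join Department d
      on u.IdDepartment = 0                                         -- admin: every department
      or (u.IdDepartment = -1 and d.IdUser = u.UserId)              -- limited list
      or (u.IdDepartment > 0 and d.IdDepartment = u.IdDepartment)   -- single department
    where u.UserId = ?
    order by d.IdDepartment
"""

for user_id in (1, 2, 3):
    print(user_id, [name for (name,) in conn.execute(QUERY, (user_id,))])
```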
If I understand from your pseudo-query what you're trying to do, you can do it with a carefully constructed WHERE clause:
case idDepartment
when 0 then (select Name from Department)
when -1 then (select Name from Department where IdUser = 3)
else (select Name from Department inner join Users on Department.idDepartment = Users.Department where Users.UserId = 3)
end
This could be written as a sub-select:
(SELECT Name
FROM Department
LEFT OUTER JOIN Users
on Department.idDepartment = Users.IdDepartment
WHERE idDepartment=0
OR (idDepartment = -1 AND idUser = 3)
OR (Users.UserId = 3)
)
You would have to correlate the sub-select to your outer query, of course.
I have my database design as per the diagram.
The Category table has a self-referencing parent-child relationship.
The Budget table holds all the categories and the amount defined for each category.
The Expense table has entries for the categories on which money has been spent (use the Total column from this table).
I want to write a SELECT statement that retrieves a dataset with the columns below:
ID
CategoryID
CategoryName
TotalAmount (sum of the Amount column from the Budget table over the category's entire child hierarchy)
SumOfExpense (sum of the Total column from the Expense table over the category's entire child hierarchy)
I tried to use a CTE but was unable to produce anything useful. Thanks for your help in advance. :)
Update
To combine and simplify the data, I have created a view with the query below.
SELECT
dbo.Budget.Id, dbo.Budget.ProjectId, dbo.Budget.CategoryId,
dbo.Budget.Amount,
dbo.Category.ParentID, dbo.Category.Name,
ISNULL(dbo.Expense.Total, 0) AS CostToDate
FROM
dbo.Budget
INNER JOIN
dbo.Category ON dbo.Budget.CategoryId = dbo.Category.Id
LEFT OUTER JOIN
dbo.Expense ON dbo.Category.Id = dbo.Expense.CategoryId
Basically that should produce results like this.
This is an interesting problem. And I'm going to solve it with a hierarchyid. First, the setup:
USE tempdb;
IF OBJECT_ID('dbo.Hierarchy') IS NOT NULL
DROP TABLE dbo.[Hierarchy];
CREATE TABLE dbo.Hierarchy
(
ID INT NOT NULL PRIMARY KEY,
ParentID INT NULL,
CONSTRAINT [FK_parent] FOREIGN KEY ([ParentID]) REFERENCES dbo.Hierarchy([ID]),
hid HIERARCHYID,
Amount INT NOT null
);
INSERT INTO [dbo].[Hierarchy]
( [ID], [ParentID], [Amount] )
VALUES
(1, NULL, 100 ),
(2, 1, 50),
(3, 1, 50),
(4, 2, 58),
(5, 2, 7),
(6, 3, 10),
(7, 3, 20)
SELECT * FROM dbo.[Hierarchy] AS [h];
Next, update the hid column with a proper hierarchyid value. I'll use a bog-standard recursive CTE for that:
WITH cte AS (
SELECT [h].[ID] ,
[h].[ParentID] ,
CAST('/' + CAST(h.[ID] AS VARCHAR(10)) + '/' AS VARCHAR(MAX)) AS [h],
[h].[hid]
FROM [dbo].[Hierarchy] AS [h]
WHERE [h].[ParentID] IS NULL
UNION ALL
SELECT [h].[ID] ,
[h].[ParentID] ,
CAST([c].[h] + CAST(h.[ID] AS VARCHAR(10)) + '/' AS VARCHAR(MAX)) AS [h],
[h].[hid]
FROM [dbo].[Hierarchy] AS [h]
JOIN [cte] AS [c]
ON [h].[ParentID] = [c].[ID]
)
UPDATE [h]
SET hid = [cte].[h]
FROM cte
JOIN dbo.[Hierarchy] AS [h]
ON [h].[ID] = [cte].[ID];
Now that the heavy lifting is done, the results you want are almost trivially obtained:
SELECT p.id, SUM([c].[Amount])
FROM dbo.[Hierarchy] AS [p]
JOIN [dbo].[Hierarchy] AS [c]
ON c.[hid].IsDescendantOf(p.[hid]) = 1
GROUP BY [p].[ID];
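hierarchyid is SQL Server-specific, but the idea behind it is a materialized path: each node stores its path from the root, and a node is a descendant of p (or p itself, which IsDescendantOf also counts) when p's path is a prefix of its own. A sketch of that idea on SQLite through Python's sqlite3, with a plain text path column (an assumed stand-in for hid) and a prefix comparison standing in for IsDescendantOf. The trailing '/' in every path keeps '/1/' from matching '/10/'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Hierarchy (
        ID       integer primary key,
        ParentID integer references Hierarchy (ID),
        path     text,                 -- stand-in for the hid (hierarchyid) column
        Amount   integer not null
    );
    insert into Hierarchy (ID, ParentID, Amount) values
        (1, null, 100), (2, 1, 50), (3, 1, 50),
        (4, 2, 58), (5, 2, 7), (6, 3, 10), (7, 3, 20);
""")

# Same recursive CTE idea as above: build '/1/2/4/'-style paths from the root down.
paths = conn.execute("""
    with recursive cte (ID, path) as (
        select ID, '/' || ID || '/' from Hierarchy where ParentID is null
        union all
        select h.ID, c.path || h.ID || '/'
        from Hierarchy h join cte c on h.ParentID = c.ID
    )
    select ID, path from cte
""").fetchall()
conn.executemany("update Hierarchy set path = ? where ID = ?",
                 [(path, node_id) for node_id, path in paths])

# c is in p's subtree (p itself included) when p's path is a prefix of c's path.
totals = conn.execute("""
    select p.ID, sum(c.Amount) as Total
    from Hierarchy p
    join Hierarchy c on substr(c.path, 1, length(p.path)) = p.path
    group by p.ID
    order by p.ID
""").fetchall()
print(totals)  # [(1, 295), (2, 115), (3, 80), (4, 58), (5, 7), (6, 10), (7, 20)]
```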
After much research and testing with sample data, I was able to get the running totals starting from the bottom of the hierarchy.
The solution is made up of two steps.
Create a scalar-valued function that decides whether a categoryId is a direct or indirect child of another categoryId. This is given in the first code snippet. Note that it uses a recursive query, since that is the usual approach for hierarchies in SQL Server.
Write the running-total query that gives totals according to your requirements for all categories. You can filter this query by category if you want to. The second code snippet provides this query.
Scalar-valued function that tells if a child category is a direct or indirect child of another category
CREATE FUNCTION dbo.IsADirectOrIndirectChild(
@childId int, @parentId int)
RETURNS int
AS
BEGIN
DECLARE @isAChild int;
WITH h(ParentId, ChildId)
-- CTE name and columns
AS (
SELECT TOP 1 @parentId, @parentId
FROM dbo.Category AS b
UNION ALL
SELECT b.ParentId, b.Id AS ChildId
FROM h AS cte
INNER JOIN
Category AS b
ON b.ParentId = cte.ChildId AND
cte.ChildId IS NOT NULL)
SELECT @isAChild = ISNULL(ChildId, 0)
FROM h
WHERE ChildId = @childId AND
ParentId <> ChildId
OPTION(MAXRECURSION 32000);
IF @isAChild > 0
BEGIN
SET @isAChild = 1;
END
ELSE
BEGIN
SET @isAChild = 0;
END;
RETURN @isAChild;
END;
GO
Query for running total starting from bottom of hierarchy
SELECT c.Id AS CategoryId, c.Name AS CategoryName,
(
SELECT SUM(ISNULL(b.amount, 0))
FROM dbo.Budget AS b
WHERE dbo.IsADirectOrIndirectChild( b.CategoryId, c.Id ) = 1 OR
b.CategoryId = c.Id
) AS totalAmount,
(
SELECT SUM(ISNULL(e.total, 0))
FROM dbo.Expense AS e
WHERE dbo.IsADirectOrIndirectChild( e.CategoryId, c.Id ) = 1 OR
e.CategoryId = c.Id
) AS totalCost
FROM dbo.Category AS c;
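Calling a scalar function that runs its own recursive CTE for every (row, category) pair can get slow on larger tables. A set-based alternative (a different technique from the function above, not the author's code) is to expand the category tree once into ancestor/descendant pairs with a single recursive CTE and join the Budget and Expense rows to it. The sketch below runs on SQLite through Python's sqlite3; the categories, amounts and totals are made up for illustration. Budget and Expense are summed in separate subqueries so that joining both to the tree at once cannot double-count rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Category (Id integer primary key, ParentId integer, Name text);
    create table Budget  (CategoryId integer, Amount integer);
    create table Expense (CategoryId integer, Total integer);
    -- Made-up sample data: Food > (Groceries, Restaurants), plus Travel.
    insert into Category values (1, null, 'Food'), (2, 1, 'Groceries'),
                                (3, 1, 'Restaurants'), (4, null, 'Travel');
    insert into Budget  values (1, 100), (2, 300), (3, 200), (4, 500);
    insert into Expense values (2, 120), (2, 30), (3, 80), (4, 400);
""")

ROLLUP = """
    with recursive tree (AncestorId, CategoryId) as (
        select Id, Id from Category                -- every category covers itself ...
        union all
        select t.AncestorId, c.Id                  -- ... and all of its descendants
        from tree t join Category c on c.ParentId = t.CategoryId
    )
    select a.Id, a.Name,
           (select coalesce(sum(b.Amount), 0)
              from Budget b join tree t on t.CategoryId = b.CategoryId
             where t.AncestorId = a.Id) as TotalAmount,
           (select coalesce(sum(e.Total), 0)
              from Expense e join tree t on t.CategoryId = e.CategoryId
             where t.AncestorId = a.Id) as SumOfExpense
    from Category a
    order by a.Id
"""
rollup = conn.execute(ROLLUP).fetchall()
for row in rollup:
    print(row)
```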
I have a table in which departments are tagged to users, like this:
User | Department
user1 | IT,HR,House Keeping
user2 | HR,House Keeping
user3 | IT,Finance,HR,Maintainance
user4 | Finance,HR,House Keeping
user5 | IT,HR,Finance
I have created a stored procedure that takes a varchar(max) parameter as the filter (I build it dynamically in C# code).
In the SP I create a temp table for the selected filters. For example, if the user selects IT, Finance and HR,
I merge the string as IT##Finance##HR (in C#) and call the SP with this parameter.
In the SP I turn it into a temp table:
FilterValue
IT
Finance
HR
Now the issue: how can I get the records that contain all of the departments in the filter (users that are associated with all the values in the temp table), to get
User | Department
user3 | IT,Finance,HR,Maintainance
user5 | IT,HR,Finance
as output?
Please suggest an optimised way to achieve this filtering.
This design is beyond horrible -- you should really change this to use a truly relational design with a dependent table.
That said, if you are not in a position to change the design, you can limp around the problem with XML, and it might give you OK performance.
Try something like this (replace @test with your table name as needed). You won't even need to create your temp table: this jimmies your comma-delimited string into XML, which you can then query directly with XQuery:
DECLARE @test TABLE (usr int, department varchar(1000))
insert into @test (usr, department)
values (1, 'IT,HR,House Keeping')
insert into @test (usr, department)
values (2, 'HR,House Keeping')
insert into @test (usr, department)
values (3, 'IT,Finance,HR,Maintainance')
insert into @test (usr, department)
values (4, 'Finance,HR,House Keeping')
insert into @test (usr, department)
values (5, 'IT,HR,Finance')
;WITH departments (usr, department, depts)
AS
(
SELECT usr, department, CAST(NULLIF('<department><dept>' + REPLACE(department, ',', '</dept><dept>') + '</dept></department>', '<department><dept></dept></department>') AS xml)
FROM @test
)
SELECT departments.usr, departments.department
FROM departments
WHERE departments.depts.exist('/department/dept[text()[1] eq "IT"]') = 1
AND departments.depts.exist('/department/dept[text()[1] eq "HR"]') = 1
AND departments.depts.exist('/department/dept[text()[1] eq "Finance"]') = 1
I agree with others that your design is, um, not ideal, and given the fact that it may change, as per your comment, one is not too motivated to find a really fascinating solution for the present situation.
Still, you can have one that at least works correctly (I think) and meets the situation. Here's what I've come up with:
;
WITH
UserDepartment ([User], Department) AS (
SELECT 'user1', 'IT,HR,House Keeping' UNION ALL
SELECT 'user2', 'HR,House Keeping' UNION ALL
SELECT 'user3', 'IT,Finance,HR,Maintainance' UNION ALL
SELECT 'user4', 'Finance,HR,House Keeping' UNION ALL
SELECT 'user5', 'IT,HR,Finance'
),
Filter (FilterValue) AS (
SELECT 'IT' UNION ALL
SELECT 'Finance' UNION ALL
SELECT 'HR'
),
CSVSplit AS (
SELECT
ud.*,
--x.node.value('.', 'varchar(max)')
x.Value AS aDepartment
FROM UserDepartment ud
CROSS APPLY (SELECT * FROM dbo.Split(',', ud.Department)) x
)
SELECT
c.[User],
c.Department
FROM CSVSplit c
INNER JOIN Filter f ON c.aDepartment = f.FilterValue
GROUP BY
c.[User],
c.Department
HAVING
COUNT(*) = (SELECT COUNT(*) FROM Filter)
The first two CTEs are just sample tables, the rest of the query is the solution proper.
The CSVSplit CTE uses a user-defined Split function (not shown) that splits a comma-separated list into a set of items and returns them as a table. The entire CTE turns a row set of the form
----- ---------------------------
user1 department1,department2,...
... ...
into this:
----- -----------
user1 department1
user1 department2
... ...
The main SELECT joins the normalised row set with the filter table and keeps the users for whom the number of matches exactly equals the number of items in the filter table. (Note: this assumes no department name is repeated within a single UserDepartment.Department value.)
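A sketch of the same normalise-then-count approach, run on SQLite through Python's sqlite3. Python's str.split stands in for the Split function here (on SQL Server 2016 and later, the built-in STRING_SPLIT function can play that role); the UserDepartmentItem table name is hypothetical. Counting DISTINCT departments makes the HAVING test robust even if a department is repeated within one user's list.

```python
import sqlite3

user_department = {
    "user1": "IT,HR,House Keeping",
    "user2": "HR,House Keeping",
    "user3": "IT,Finance,HR,Maintainance",
    "user4": "Finance,HR,House Keeping",
    "user5": "IT,HR,Finance",
}
filter_values = ["IT", "Finance", "HR"]

conn = sqlite3.connect(":memory:")
conn.execute("create table UserDepartmentItem (UserName text, Department text)")
conn.execute("create table Filter (FilterValue text primary key)")
# Normalise: one row per (user, department), like the CSVSplit CTE.
conn.executemany(
    "insert into UserDepartmentItem values (?, ?)",
    [(user, dept.strip()) for user, csv in user_department.items() for dept in csv.split(",")],
)
conn.executemany("insert into Filter values (?)", [(v,) for v in filter_values])

# Keep users whose matching departments cover every filter value.
matches = [user for (user,) in conn.execute("""
    select i.UserName
    from UserDepartmentItem i
    join Filter f on f.FilterValue = i.Department
    group by i.UserName
    having count(distinct i.Department) = (select count(*) from Filter)
    order by i.UserName
""")]
for user in matches:
    print(user, "|", user_department[user])
```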