How to implement an inner join on a self-referencing table - sql-server

Categories
1 | Pen | 3
2 | Book | 3
3 | Education | null
4 | Shirt | null
Product
1 | 10.00 | Parker-Pen | the description | 1000 | 1
2 | 35.00 | Dairy | the description | 500 | 2
3 | 9.00 | Dux-Pen | the description | 1000 | 1
4 | 350.00 | GeographyMap | the description | 30 | 3
4 | 250.00 | PoloShirt | the description | 100 | 4
These are my tables. I am trying to retrieve the products that belong to category id 3 (or to a category whose parent is 3).
Here is the query I used to retrieve the data:
select p.name, c.name
from product p
inner join Categories c on p.Categories_id=c.id
inner join Categories c2 on c2.id=3 or c2.parent=3
It does retrieve the data, but each row comes back multiple times. It also returns PoloShirt, which is not in that category.
Can you explain what the problem is, and what a better way of categorizing the products would be?

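If only one level of hierarchy is allowed, the query below should give you the answer. The problem in your query is that the second join (c2) has no condition linking it to p or c, so every product is paired with each of the three category rows that match c2.id=3 or c2.parent=3. That is why the rows repeat and why PoloShirt appears. Filter on c instead: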
select p.name, c.name
from product p
inner join Categories c on p.Categories_id=c.id
where c.id=3 or c.parent=3
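To check the difference between the two queries without a SQL Server instance, here is a minimal sketch in Python using the standard-library sqlite3 module. SQLite as a stand-in for SQL Server is an assumption of this example, and only the columns the queries actually use (name, Categories_id, id, parent) are created.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (id INTEGER, name TEXT, parent INTEGER);
CREATE TABLE product (name TEXT, Categories_id INTEGER);
INSERT INTO Categories VALUES
  (1, 'Pen', 3), (2, 'Book', 3), (3, 'Education', NULL), (4, 'Shirt', NULL);
INSERT INTO product VALUES
  ('Parker-Pen', 1), ('Dairy', 2), ('Dux-Pen', 1), ('GeographyMap', 3), ('PoloShirt', 4);
""")

# Original query: c2 is not linked to p or c, so each product is repeated once
# for every c2 row that satisfies "id = 3 OR parent = 3" (three rows here).
original = conn.execute("""
    SELECT p.name, c.name
    FROM product p
    INNER JOIN Categories c ON p.Categories_id = c.id
    INNER JOIN Categories c2 ON c2.id = 3 OR c2.parent = 3
""").fetchall()

# Corrected query: filter on the category that is actually joined to the product.
fixed = conn.execute("""
    SELECT p.name, c.name
    FROM product p
    INNER JOIN Categories c ON p.Categories_id = c.id
    WHERE c.id = 3 OR c.parent = 3
    ORDER BY p.name
""").fetchall()

print(len(original))  # 15 rows: 5 products x 3 matching c2 rows
print(fixed)          # 4 rows, PoloShirt is gone
```

The original query returns every product three times, PoloShirt included, while the corrected one returns only the four products under category 3.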

Related

SQL Server - identify combinations of values and assign combination identifier

I am trying to assign what amounts to a 'combinationid' to rows of my table, based on the values in the two columns below. Each product has a number of customers linked to it. For every combination of customers, I need to create a combination ID.
For example, the combination of customers for product 'a' is the same combination of customers for product 'c' (they both have customers 1, 2 and 3), so products a and c should have the same combination identifier ('customergroup'). However, products should not share the same customergroup if they only share some of the same customers - e.g. product b only has customers 1 and 2 (not 3), so should have a different customergroup to products 'a' and 'c'.
Input:
| productid | customerid |
|-----------|------------|
| a | 1 |
| a | 2 |
| a | 3 |
| b | 1 |
| b | 2 |
| c | 3 |
| c | 2 |
| c | 1 |
| d | 1 |
| d | 3 |
| e | 1 |
| e | 2 |
| f | 1 |
| g | 2 |
| h | 3 |
Desired output:
| productid | customerid | customergroup |
|-----------|------------|---------------|
| a | 1 | 1 |
| a | 2 | 1 |
| a | 3 | 1 |
| b | 1 | 2 |
| b | 2 | 2 |
| c | 3 | 1 |
| c | 2 | 1 |
| c | 1 | 1 |
| d | 1 | 3 |
| d | 3 | 3 |
| e | 1 | 2 |
| e | 2 | 2 |
| f | 1 | 4 |
| g | 2 | 5 |
| h | 3 | 6 |
or just
| productid | customergroupid |
|-----------|-----------------|
| a | 1 |
| b | 2 |
| c | 1 |
| d | 3 |
| e | 2 |
| f | 4 |
| g | 5 |
| h | 6 |
Edit: first version of this did include a description of my attempts. I currently have nested queries that basically give me a column for customer 1, 2, 3 etc and then uses dense rank to get the grouping. The problem is that is not dynamic for different numbers of customers and I did not know where to start for getting a dynamic result as above. Thanks for the replies.
Considering you haven't shown your efforts, or confirmed the version you're using, I've assumed you have the latest ("and greatest") version of SQL Server, which means you have access to STRING_AGG.
This doesn't give the groupings in the same order, but I'm going to also assume that doesn't matter, and that the grouping number is just arbitrary. This gives you the following:
WITH VTE AS(
SELECT *
FROM (VALUES('a',1),
('a',2),
('a',3),
('b',1),
('b',2),
('c',3),
('c',2),
('c',1),
('d',1),
('d',3),
('e',1),
('e',2),
('f',1),
('g',2),
('h',3)) V(productid,customerid)),
Groups AS(
SELECT productid,
STRING_AGG(customerid,',') WITHIN GROUP (ORDER BY customerid) AS CustomerIDs
FROM VTE
GROUP BY productid),
Rankings AS(
SELECT productid,
CustomerIDs,
DENSE_RANK() OVER (ORDER BY CustomerIDs ASC) AS Grouping
FROM Groups)
SELECT V.productid,
V.customerid,
R.Grouping AS customergroupid
FROM VTE V
JOIN Rankings R ON V.productid = R.productid
ORDER BY V.productid,
V.customerid;
If you aren't using SQL Server 2017 or later, I suggest looking up the FOR XML PATH method for string aggregation.
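The idea behind the query is independent of STRING_AGG: build one sorted "signature" per product, then dense-rank the distinct signatures. A rough sketch of that logic in plain Python (a swapped-in illustration, not the T-SQL itself; the variable names are made up for this example):

```python
from collections import defaultdict

rows = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2),
        ("c", 3), ("c", 2), ("c", 1), ("d", 1), ("d", 3),
        ("e", 1), ("e", 2), ("f", 1), ("g", 2), ("h", 3)]

# Step 1 (the Groups CTE): one sorted, comma-separated customer list per product.
customers = defaultdict(list)
for productid, customerid in rows:
    customers[productid].append(customerid)
signature = {p: ",".join(str(c) for c in sorted(cs)) for p, cs in customers.items()}

# Step 2 (the Rankings CTE): dense rank over the distinct signatures.
rank = {sig: i for i, sig in enumerate(sorted(set(signature.values())), start=1)}
customergroup = {p: rank[sig] for p, sig in signature.items()}

print(customergroup)
# {'a': 3, 'b': 2, 'c': 3, 'd': 4, 'e': 2, 'f': 1, 'g': 5, 'h': 6}
```

Sorting inside each signature is what makes product c (entered as 3, 2, 1) land in the same group as product a. The group numbers differ from the desired output, but the grouping itself is the same.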
Using Larnu's answer this is how I got the result for 2008:
WITH VTE AS(
SELECT *
FROM (VALUES('a','1'),
('a','2'),
('a','3'),
('b','1'),
('b','2'),
('c','3'),
('c','2'),
('c','1'),
('d','1'),
('d','3'),
('e','1'),
('e','2'),
('f','1'),
('g','2'),
('h','3')) V(productid,customerid)),
Groups AS(
SELECT productid, CustomerIDs = STUFF((SELECT N', ' + customerid
FROM VTE AS p2
WHERE p2.productid = p.productid
ORDER BY customerid
FOR XML PATH(N'')), 1, 2, N'')
FROM VTE AS p
GROUP BY productid),
Rankings AS(
SELECT productid,
CustomerIDs,
DENSE_RANK() OVER (ORDER BY CustomerIDs ASC) AS Grouping
FROM Groups)
SELECT V.productid,
V.customerid,
R.Grouping AS customergroupid
FROM VTE V
JOIN Rankings R ON V.productid = R.productid
ORDER BY V.productid,
V.customerid;
Thanks again for your assistance.

Getting duplicates with additional information

I've inherited a database and I'm having trouble constructing a working SQL query.
Suppose this is the data:
[Products]
| Id | DisplayId | Version | Company | Description |
|---- |----------- |---------- |-----------| ----------- |
| 1 | 12345 | 0 | 16 | Random |
| 2 | 12345 | 0 | 2 | Random 2 |
| 3 | AB123 | 0 | 1 | Random 3 |
| 4 | 12345 | 1 | 16 | Random 4 |
| 5 | 12345 | 1 | 2 | Random 5 |
| 6 | AB123 | 0 | 5 | Random 6 |
| 7 | 12345 | 2 | 16 | Random 7 |
| 8 | XX45 | 0 | 5 | Random 8 |
| 9 | XX45 | 0 | 7 | Random 9 |
| 10 | XX45 | 1 | 5 | Random 10 |
| 11 | XX45 | 1 | 7 | Random 11 |
[Companies]
| Id | Code |
|---- |-----------|
| 1 | 'ABC' |
| 2 | '456' |
| 5 | 'XYZ' |
| 7 | 'XYZ' |
| 16 | '456' |
The Version column is a version number. Higher numbers indicate more recent versions.
The Company column is a foreign key referencing the Companies table on the Id column.
There's another table called ProductData with a ProductId column referencing Products.Id.
Now I need to find duplicates based on the DisplayId and the corresponding Companies.Code. The ProductData table should be joined to show a title (ProductData.Title), and only the most recent ones should be included in the results. So the expected results are:
| Id | DisplayId | Version | Company | Description | ProductData.Title |
|---- |----------- |---------- |-----------|------------- |------------------ |
| 5 | 12345 | 1 | 2 | Random 5 | Title 5 |
| 7 | 12345 | 2 | 16 | Random 7 | Title 7 |
| 10 | XX45 | 1 | 5 | Random 10 | Title 10 |
| 11 | XX45 | 1 | 7 | Random 11 | Title 11 |
because XX45 has 2 "entries": one with Company 5 and one with Company 7, but both companies share the same code.
because 12345 has 2 "entries": one with Company 2 and one with Company 16, but both companies share the same code. Note that the most recent version of both differs (version 2 for company 16's entry and version 1 for company 2's entry)
AB123 should not be included as its 2 entries have different company codes.
I'm eager to learn your insights...
Based on your sample data, you just need to JOIN the tables:
SELECT
p.Id, p.DisplayId, p.Version, p.Company, d.Title
FROM Products AS p
INNER JOIN Companies AS c ON p.Company = c.Id
INNER JOIN ProductData AS d ON d.ProductId = p.Id;
But if you want the latest one, you can use the ROW_NUMBER():
WITH CTE
AS
(
SELECT
p.Id, p.DisplayId, p.Version, p.Company, d.Title,
ROW_NUMBER() OVER(PARTITION BY p.DisplayId,p.Company ORDER BY p.Id DESC) AS RN
FROM Products AS p
INNER JOIN Companies AS c ON p.Company = c.Id
INNER JOIN ProductData AS d ON d.ProductId = p.Id
)
SELECT *
FROM CTE
WHERE RN = 1;
Result from the sample fiddle:
| Id | DisplayId | Version | Company | Title |
|----|-----------|---------|---------|----------|
| 5 | 12345 | 1 | 2 | Title 5 |
| 7 | 12345 | 2 | 16 | Title 7 |
| 10 | XX45 | 1 | 5 | Title 10 |
| 11 | XX45 | 1 | 7 | Title 11 |
If I understood you correctly, you can use a CTE to number the rows for each DisplayId and Company, then SELECT from the CTE and add further filtering there.
WITH CTE AS(
SELECT p.Id, p.DisplayId, p.Version, p.Company, p.Description, d.Title,
RN = ROW_NUMBER() OVER(PARTITION BY p.DisplayId, p.Company ORDER BY p.Id DESC)
FROM dbo.Products AS p
INNER JOIN dbo.ProductData AS d ON d.ProductId = p.Id
)
SELECT *
FROM CTE
Try this:
SELECT b.ID, b.displayid, b.version, b.company, productdata.title
FROM (SELECT a.ID, a.displayid, a.version, a.company, a.rn, a.code,
             COUNT(a.displayid) OVER (PARTITION BY a.displayid, a.code) AS cnt
      FROM (SELECT Prod.ID, displayid, version, company, Companies.code,
                   ROW_NUMBER() OVER (PARTITION BY displayid, company ORDER BY version DESC) AS rn
            FROM Prod
            INNER JOIN Companies ON Prod.Company = Companies.id) a
      WHERE a.rn = 1) b
INNER JOIN productdata ON b.id = productdata.ProductId
WHERE cnt > 1
You first have to get the current version of each product, and then count how many times each DisplayID + Code combination shows up. Based on that, you can select only the ones with a count greater than one. You can then INNER JOIN ProductData in the final query to get the Title.
WITH
MaxVersion AS --Get the current versions
(
SELECT
MAX(Version) AS Version,
DisplayID,
Company
FROM
#TmpProducts
GROUP BY
DisplayID,
Company
)
,CTE AS
(
SELECT
p.DisplayID,
c.Code,
COUNT(*) AS RowCounter
FROM
#TmpProducts p
INNER JOIN
#TmpCompanies c
ON
c.ID = p.Company
INNER JOIN
MaxVersion mv
ON
mv.DisplayID = p.DisplayID
AND mv.Version = p.Version
AND mv.Company = p.Company
GROUP BY
p.DisplayID,
c.Code
)
SELECT
p.*
FROM
#TmpProducts p
INNER JOIN
CTE c
ON
c.DisplayID = p.DisplayID
INNER JOIN
MaxVersion mv
ON
mv.DisplayID = p.DisplayID
AND mv.Company = p.Company
AND mv.Version = p.Version
WHERE
c.RowCounter > 1
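The two steps the answers describe (keep the latest version per DisplayId and Company, then keep only DisplayId + Code combinations that occur more than once) can be checked against the sample data with a small sketch. It uses Python's standard-library sqlite3 module as a stand-in for SQL Server, which is an assumption of this example; it needs SQLite 3.25 or later for window functions, and it leaves out ProductData because the question does not show its rows.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (Id INTEGER, DisplayId TEXT, Version INTEGER, Company INTEGER);
CREATE TABLE Companies (Id INTEGER, Code TEXT);
INSERT INTO Products VALUES
  (1, '12345', 0, 16), (2, '12345', 0, 2), (3, 'AB123', 0, 1), (4, '12345', 1, 16),
  (5, '12345', 1, 2), (6, 'AB123', 0, 5), (7, '12345', 2, 16), (8, 'XX45', 0, 5),
  (9, 'XX45', 0, 7), (10, 'XX45', 1, 5), (11, 'XX45', 1, 7);
INSERT INTO Companies VALUES (1, 'ABC'), (2, '456'), (5, 'XYZ'), (7, 'XYZ'), (16, '456');
""")

duplicates = conn.execute("""
    WITH Latest AS (
        -- Step 1: number the versions per DisplayId and Company, newest first.
        SELECT p.Id, p.DisplayId, p.Version, p.Company, c.Code,
               ROW_NUMBER() OVER (PARTITION BY p.DisplayId, p.Company
                                  ORDER BY p.Version DESC) AS rn
        FROM Products p
        INNER JOIN Companies c ON c.Id = p.Company
    ),
    Counted AS (
        -- Step 2: among the latest rows only, count rows per DisplayId and Code.
        SELECT Id, DisplayId, Version, Company,
               COUNT(*) OVER (PARTITION BY DisplayId, Code) AS cnt
        FROM Latest
        WHERE rn = 1
    )
    SELECT Id, DisplayId, Version, Company
    FROM Counted
    WHERE cnt > 1
    ORDER BY Id
""").fetchall()

print(duplicates)
# [(5, '12345', 1, 2), (7, '12345', 2, 16), (10, 'XX45', 1, 5), (11, 'XX45', 1, 7)]
```

AB123 drops out because its two latest rows have different company codes (ABC and XYZ), which matches the expected result in the question.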

Get all categories with number of associated records with where clause

So I have two tables:
Categories
-------------------
| Id | Name |
-------------------
| 1 | Category1 |
-------------------
| 2 | Category2 |
-------------------
| 3 | Category3 |
-------------------
Products
--------------------------------------------
| Id | CategoryId | Name | CreatedDate |
--------------------------------------------
| 1 | 1 | Product1 | 2017-05-05 |
--------------------------------------------
| 1 | 1 | Product2 | 2017-05-06 |
--------------------------------------------
| 2 | 2 | Product3 | 2017-12-21 |
--------------------------------------------
I need a query to select all categories along with the number of products for each for a specific time range in which those products were created (CreatedDate).
What I currently have is this:
SELECT c.[Name], COUNT(p.[Id]) AS ProductCount
FROM Categories AS c
LEFT JOIN Products AS p ON p.[CategoryId] = c.[Id]
WHERE p.[CreatedDate] BETWEEN '2017-05-01' AND '2017-06-01'
GROUP BY c.[Name]
My issue is that I'm not seeing Category2 and Category3 in the results set because they don't pass the WHERE clause. I want to see all categories in the results set.
Put the WHERE condition in the ON clause of the LEFT JOIN:
SELECT c.[Name], COUNT(p.[Id]) AS ProductCount
FROM Categories AS c
LEFT JOIN Products AS p ON p.[CategoryId] = c.[Id]
AND p.[CreatedDate] BETWEEN '2017-05-01' AND '2017-06-01'
GROUP BY c.[Name]
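This way the date filter is applied only while joining, not to the complete result set. In the WHERE clause it removes the rows for categories without a matching product, because their p.[CreatedDate] is NULL and NULL never satisfies BETWEEN; that effectively turns the LEFT JOIN into an INNER JOIN.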
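The effect of moving the filter is easy to see side by side. Below is a small sketch using Python's standard-library sqlite3 module with the sample rows from the question; SQLite standing in for SQL Server is an assumption of this example, and the dates are stored as ISO-formatted text.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (Id INTEGER, Name TEXT);
CREATE TABLE Products (Id INTEGER, CategoryId INTEGER, Name TEXT, CreatedDate TEXT);
INSERT INTO Categories VALUES (1, 'Category1'), (2, 'Category2'), (3, 'Category3');
INSERT INTO Products VALUES
  (1, 1, 'Product1', '2017-05-05'),
  (1, 1, 'Product2', '2017-05-06'),
  (2, 2, 'Product3', '2017-12-21');
""")

# Same query twice; only the keyword in front of the date filter changes.
query = """
    SELECT c.Name, COUNT(p.Id) AS ProductCount
    FROM Categories AS c
    LEFT JOIN Products AS p ON p.CategoryId = c.Id
    {placement} p.CreatedDate BETWEEN '2017-05-01' AND '2017-06-01'
    GROUP BY c.Name
    ORDER BY c.Name
"""
filter_in_where = conn.execute(query.format(placement="WHERE")).fetchall()
filter_in_on = conn.execute(query.format(placement="AND")).fetchall()

print(filter_in_where)  # [('Category1', 2)]
print(filter_in_on)     # [('Category1', 2), ('Category2', 0), ('Category3', 0)]
```

COUNT(p.Id) counts only non-NULL values, which is why the unmatched categories show 0 rather than 1.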

Return one row when multiples are returned

Firstly, apologies for this: a number of similar questions have been posted, but I just can't seem to return what I would like.
My data returns
desc | date | taken | result | text | notes | page | group | q | answer | value | state | time |
------------------------------------------------------------------------------
Asess1 | 20170101 | John | 5 | Injury | xxx | Page1 | Assess11 | 1 | 1234567 | 1 | 1 | 0 |
Asess1 | 20170101 | John | 5 | Injury | xxx | Page1 | Assess11 | 1 | 1234567 | 1 | 1 | 0 |
Asess1 | 20170101 | John | 5 | Injury | xxx | Page1 | Assess11 | 1 | 1234567 | 1 | 1 | 0 |
Asess1 | 20170101 | John | 5 | Injury | xxx | Page1 | Assess11 | 1 | 1234567 | 1 | 1 | 0 |
Asess1 | 20170101 | John | 5 | Injury | xxx | Page1 | Assess11 | 1 | 1234567 | 1 | 1 | 0 |
Code as follows
select t.desc,a.date,a.taken,a.result,a.text,a.notes,d.page,d.group,d.q,d.answer,d.value,d.state,d.timeSpanSeconds
from cc_clientAssessments a
inner join cs_assessmentData d on a.assessmentId=d.assessment
inner join cs_clients c on c.person=a.residentId
inner join cs_facilities f on f.guid=a.facilityId
inner join cs_assessmentTypes t on t.assessmentTypeId=a.assessmentTypeId
where c.surname='smith'
and f.name='home'
and t.description ='injury'
and a.dateTaken='2017-05-28 00:00:00.000'
and d.questionName='1'
and d.answer='1234567'
order by t.desc, a.date desc,d.page,d.group,d.q
any help would be great.
One (or more) of your joins is causing this duplication, because you have not been specific enough in your join criteria.
As other commenters have said, remove all your joins and then add them back in one by one until you start to see duplicates. Using select * you can see what additional data is being pulled back and therefore what additional filters you need to include in that join. Once you have no duplication, add in the next join and repeat the whole process.
This is the sensible way to resolve this as nine times out of ten you can stop this duplication with more specific join criteria, which will also ensure your query is processing less data and will therefore be more efficient.
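One way to find the offending join is to count rows per join key in each joined table; any key that occurs more than once will multiply the result. The sketch below uses Python's standard-library sqlite3 module and two hypothetical tables (assessments and clients, with made-up columns and rows), not the poster's actual schema.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assessments (assessmentId INTEGER, residentId INTEGER);
CREATE TABLE clients (person INTEGER, surname TEXT, facility TEXT);
INSERT INTO assessments VALUES (100, 1);
INSERT INTO clients VALUES (1, 'smith', 'home'), (1, 'smith', 'annex'), (2, 'jones', 'home');
""")

# The join key is not unique in clients, so the single assessment row is duplicated.
joined = conn.execute("""
    SELECT a.assessmentId, c.surname
    FROM assessments a
    INNER JOIN clients c ON c.person = a.residentId
""").fetchall()

# Diagnostic: which join keys occur more than once in the joined table?
non_unique = conn.execute("""
    SELECT person, COUNT(*) AS n
    FROM clients
    GROUP BY person
    HAVING COUNT(*) > 1
""").fetchall()

# A more specific join condition removes the duplication.
narrowed = conn.execute("""
    SELECT a.assessmentId, c.surname
    FROM assessments a
    INNER JOIN clients c ON c.person = a.residentId AND c.facility = 'home'
""").fetchall()

print(joined)      # [(100, 'smith'), (100, 'smith')]
print(non_unique)  # [(1, 2)]
print(narrowed)    # [(100, 'smith')]
```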
Although the elegant solution is to fix your joins so they don't result in as many rows, a very quick fix would be to just use distinct to eliminate duplicates and convert your text field to a string, so it can be compared.
select distinct t.desc,a.date,a.taken,a.result,substring(a.text,1,512) as text,a.notes,d.page,d.group,d.q,d.answer,d.value,d.state,d.timeSpanSeconds
from cc_clientAssessments a
inner join cs_assessmentData d on a.assessmentId=d.assessment
inner join cs_clients c on c.person=a.residentId
inner join cs_facilities f on f.guid=a.facilityId
inner join cs_assessmentTypes t on t.assessmentTypeId=a.assessmentTypeId
where c.surname='smith'
and f.name='home'
and t.description ='injury'
and a.dateTaken='2017-05-28 00:00:00.000'
and d.questionName='1'
and d.answer='1234567'
order by t.desc, a.date desc,d.page,d.group,d.q

Where to use Outer Apply

MASTER TABLE
x------x--------------------x
| Id | Name |
x------x--------------------x
| 1 | A |
| 2 | B |
| 3 | C |
x------x--------------------x
DETAILS TABLE
x------x--------------------x-------x
| Id | PERIOD | QTY |
x------x--------------------x-------x
| 1 | 2014-01-13 | 10 |
| 1 | 2014-01-11 | 15 |
| 1 | 2014-01-12 | 20 |
| 2 | 2014-01-06 | 30 |
| 2 | 2014-01-08 | 40 |
x------x--------------------x-------x
I am getting the same results when LEFT JOIN and OUTER APPLY are used.
LEFT JOIN
SELECT T1.ID,T1.NAME,T2.PERIOD,T2.QTY
FROM MASTER T1
LEFT JOIN DETAILS T2 ON T1.ID=T2.ID
OUTER APPLY
SELECT T1.ID,T1.NAME,TAB.PERIOD,TAB.QTY
FROM MASTER T1
OUTER APPLY
(
SELECT ID,PERIOD,QTY
FROM DETAILS T2
WHERE T1.ID=T2.ID
)TAB
Where should I use LEFT JOIN and where should I use OUTER APPLY?
A LEFT JOIN should be replaced with OUTER APPLY in the following situations.
1. If we want to join two tables based on TOP n results
Suppose we need to select Id and Name from Master and the last two dates for each Id from the Details table.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
LEFT JOIN
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
ORDER BY CAST(PERIOD AS DATE)DESC
)D
ON M.ID=D.ID
which forms the following result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | NULL | NULL |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
This brings back the wrong result: the derived table contains only the latest two dates from the whole Details table, irrespective of Id, even though we join on Id. The proper solution is to use OUTER APPLY.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
OUTER APPLY
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
WHERE M.ID=D.ID
ORDER BY CAST(PERIOD AS DATE)DESC
)D
Here is how it works: with LEFT JOIN, the TOP 2 rows are selected inside the derived table D first, and only then joined to MASTER. With OUTER APPLY, the correlation WHERE M.ID=D.ID sits inside the applied query, so the TOP 2 dates are evaluated for each ID in Master, which gives the following result.
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-08 | 40 |
| 2 | B | 2014-01-06 | 30 |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
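The expected rows can be reproduced outside SQL Server with a small sketch in Python's standard-library sqlite3 module. SQLite has no APPLY operator, so this example swaps in a different technique, a derived table numbered with ROW_NUMBER() per Id and joined with LEFT JOIN; it illustrates the "top 2 per Id" result rather than OUTER APPLY itself, and it needs SQLite 3.25 or later.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MASTER (ID INTEGER, NAME TEXT);
CREATE TABLE DETAILS (ID INTEGER, PERIOD TEXT, QTY INTEGER);
INSERT INTO MASTER VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO DETAILS VALUES
  (1, '2014-01-13', 10), (1, '2014-01-11', 15), (1, '2014-01-12', 20),
  (2, '2014-01-06', 30), (2, '2014-01-08', 40);
""")

# Number the detail rows per ID (newest first), then keep the first two per ID.
top2_per_id = conn.execute("""
    SELECT M.ID, M.NAME, D.PERIOD, D.QTY
    FROM MASTER M
    LEFT JOIN (
        SELECT ID, PERIOD, QTY,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY PERIOD DESC) AS RN
        FROM DETAILS
    ) D ON D.ID = M.ID AND D.RN <= 2
    ORDER BY M.ID, D.PERIOD DESC
""").fetchall()

for row in top2_per_id:
    print(row)
# (1, 'A', '2014-01-13', 10)
# (1, 'A', '2014-01-12', 20)
# (2, 'B', '2014-01-08', 40)
# (2, 'B', '2014-01-06', 30)
# (3, 'C', None, None)
```

The key point is the same as in the OUTER APPLY version: the "top 2" is evaluated per Id rather than once for the whole Details table.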
2. When we need LEFT JOIN functionality with functions
OUTER APPLY can be used as a replacement for LEFT JOIN when we need to combine rows from the Master table with the result of a table-valued function.
SELECT M.ID,M.NAME,C.PERIOD,C.QTY
FROM MASTER M
OUTER APPLY dbo.FnGetQty(M.ID) C
And the function goes here.
CREATE FUNCTION FnGetQty
(
@Id INT
)
RETURNS TABLE
AS
RETURN
(
SELECT ID,PERIOD,QTY
FROM DETAILS
WHERE ID=@Id
)
which generated the following result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-11 | 15 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-06 | 30 |
| 2 | B | 2014-01-08 | 40 |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
3. Retain NULL values when unpivoting
Consider you have the below table
x------x-------------x--------------x
| Id | FROMDATE | TODATE |
x------x-------------x--------------x
| 1 | 2014-01-11 | 2014-01-13 |
| 1 | 2014-02-23 | 2014-02-27 |
| 2 | 2014-05-06 | 2014-05-30 |
| 3 | NULL | NULL |
x------x-------------x--------------x
When you use UNPIVOT to bring FROMDATE AND TODATE to one column, it will eliminate NULL values by default.
SELECT ID,DATES
FROM MYTABLE
UNPIVOT (DATES FOR COLS IN (FROMDATE,TODATE)) P
which generates the below result. Note that we have missed the record of Id number 3
x------x-------------x
| Id | DATES |
x------x-------------x
| 1 | 2014-01-11 |
| 1 | 2014-01-13 |
| 1 | 2014-02-23 |
| 1 | 2014-02-27 |
| 2 | 2014-05-06 |
| 2 | 2014-05-30 |
x------x-------------x
In such cases an APPLY can be used (either CROSS APPLY or OUTER APPLY; the two are interchangeable here because the VALUES constructor always returns rows).
SELECT DISTINCT ID,DATES
FROM MYTABLE
OUTER APPLY(VALUES (FROMDATE),(TODATE))
COLUMNNAMES(DATES)
which forms the following result and retains Id where its value is 3
x------x-------------x
| Id | DATES |
x------x-------------x
| 1 | 2014-01-11 |
| 1 | 2014-01-13 |
| 1 | 2014-02-23 |
| 1 | 2014-02-27 |
| 2 | 2014-05-06 |
| 2 | 2014-05-30 |
| 3 | NULL |
x------x-------------x
In your example queries the results are indeed the same.
But OUTER APPLY can do more: For each outer row you can produce an arbitrary inner result set. For example you can join the TOP 1 ORDER BY ... row. A LEFT JOIN can't do that.
The computation of the inner result set can reference outer columns (like your example did).
OUTER APPLY is strictly more powerful than LEFT JOIN. This is easy to see because each LEFT JOIN can be rewritten as an OUTER APPLY just like you did. Its syntax is more verbose, though.
