Duplicates on Self Left Join - sql-server

I'm trying to pivot out a table of data stored in a vertical model into a more horizontal, SQL Server table-like model. Unfortunately due to the nature of the data, I cannot use the real data here so I worked up a generic example that follows the same model.
There are three columns to the table, an ID, column ID and value, where the ID and column ID form the Primary Key. Additionally none of the data is required (i.e. an ID can be missing column ID = 3 without breaking anything)
PetID | ColumnID | Value
---------------------------
1 | 1 | Gilda
1 | 2 | Cat
2 | 1 | Sonny
2 | 2 | Cat
2 | 3 | Black
Due to the fact that the Primary Key is a composite of two columns I cannot use the built in PIVOT functionality, so I tried doing a self LEFT JOIN:
SELECT T1.PetID
,T2.Value AS [Name]
,T3.Value AS [Type]
,T4.Value AS [Color]
FROM #Temp AS T1
LEFT JOIN #Temp AS T2 ON T1.PetID = T2.PetID
AND T2.ColumnID = 1
LEFT JOIN #Temp AS T3 ON T1.PetID = T3.PetID
AND T3.ColumnID = 2
LEFT JOIN #Temp AS T4 ON T1.PetID = T4.PetID
AND T4.ColumnID = 3;
The idea being that I want to take the ID from T1 and then do a self LEFT JOIN to get each of the values by ColumnID. However I'm getting duplicates in the data:
PetID | Name | Type | Color
------------------------------
1 | Gilda | Cat | NULL
1 | Gilda | Cat | NULL
2 | Sonny | Cat | Black
2 | Sonny | Cat | Black
2 | Sonny | Cat | Black
I am able to get rid of these duplicates using a DISTINCT, but the dataset is rather large, so the required sort action is slowing down the query tremendously. Is there a better way to accomplish this or am I just stuck with a slow query?

You can use a CASE expression inside an aggregate and avoid the joins altogether.
SELECT
PetID,
MAX(CASE WHEN ColumnID = 1 THEN Value ELSE NULL END) AS Name,
MAX(CASE WHEN ColumnID = 2 THEN Value ELSE NULL END) AS Type,
MAX(CASE WHEN ColumnID = 3 THEN Value ELSE NULL END) AS Color
FROM #Temp
GROUP BY PetId
For this to work correctly, (PetID, ColumnID) must be your primary key (or at least unique). Otherwise, when the same ColumnID appears more than once for the same PetID, MAX silently returns only one of the values.

You can use PIVOT if you'd like to:
SELECT *
FROM (SELECT PetID,
(CASE ColumnID
WHEN 1 THEN 'Name'
WHEN 2 THEN 'Type'
WHEN 3 THEN 'Color'
END) ValueType,
VALUE
FROM #Temp
) t
PIVOT
( MAX(Value)
FOR ValueType IN ([Name],[Type],[Color])
) p
Another way, without the subquery, would be:
SELECT PetID,
[1] [Name],
[2] [Type],
[3] [Color]
FROM #Temp
PIVOT
( MAX(Value)
FOR ColumnID IN ([1],[2],[3])
) p

I don't understand your concern about sorting. You have a primary key so you also have an index. This is the correct way to do it:
select
PetID,
min(case when ColumnID = 1 then Value end) as Name,
min(case when ColumnID = 2 then Value end) as Type,
min(case when ColumnID = 3 then Value end) as Color
from #Temp
group by PetID
A fix for your duplication is simple though and will probably improve performance as well:
FROM (select distinct PetID from #Temp) AS T1

SELECT T1.PetID
,T1.Value AS [Name]
,T2.Value AS [Type]
,T3.Value AS [Color]
--select *
FROM #Temp AS T1
LEFT JOIN #Temp AS T2 ON T1.PetID = T2.PetID
AND T2.ColumnID = 2
LEFT JOIN #Temp AS T3 ON T1.PetID = T3.PetID
AND T3.ColumnID = 3
where t1.ColumnID = 1
Your problem was that you were joining to the main table that had multiple rows.
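To see where the extra rows come from, the question's sample data can be loaded into a scratch table. The following is a minimal sketch, assuming `int` and `varchar(50)` column types (the question does not state them) and a session where #Temp does not already exist. The first query counts how many rows an unfiltered T1 contributes per pet; the second drives the joins from one row per pet, as suggested above.

```sql
-- Scratch copy of the question's sample data; the column types are assumptions.
CREATE TABLE #Temp
(
    PetID    int         NOT NULL,
    ColumnID int         NOT NULL,
    Value    varchar(50) NULL,
    PRIMARY KEY (PetID, ColumnID)
);

INSERT INTO #Temp (PetID, ColumnID, Value)
VALUES (1, 1, 'Gilda'), (1, 2, 'Cat'),
       (2, 1, 'Sonny'), (2, 2, 'Cat'), (2, 3, 'Black');

-- An unfiltered T1 supplies one row per (PetID, ColumnID) pair,
-- so PetID 1 comes back twice and PetID 2 three times.
SELECT PetID, COUNT(*) AS RowsFromT1
FROM #Temp
GROUP BY PetID;

-- Driving the joins from one row per pet avoids the duplicates.
SELECT T1.PetID,
       T2.Value AS [Name],
       T3.Value AS [Type],
       T4.Value AS [Color]
FROM (SELECT DISTINCT PetID FROM #Temp) AS T1
LEFT JOIN #Temp AS T2 ON T1.PetID = T2.PetID AND T2.ColumnID = 1
LEFT JOIN #Temp AS T3 ON T1.PetID = T3.PetID AND T3.ColumnID = 2
LEFT JOIN #Temp AS T4 ON T1.PetID = T4.PetID AND T4.ColumnID = 3;

DROP TABLE #Temp;
```

The multi-row VALUES list needs SQL Server 2008 or later; on older versions, separate INSERT statements would do the same job.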

Related

Update Column Data of Table Aliases According to other Column's Data

I have a stored procedure as follows:
BEGIN
;with Data as(
select
E.Id,
E.FirstName as [Employee],
E.IsDeleted AS [status],
Co.Name as [Company],
S.SalaryType,
S.Date as [Date],
E.Desc as Notes,
CASE @SortField
WHEN 'Id' THEN ROW_NUMBER() OVER (ORDER BY E.Id)
WHEN 'date' THEN ROW_NUMBER() OVER (ORDER BY S.Date)
END rn
From Employee E
Inner Join Company Co on E.CompId = Co.Id
Inner Join Salary S on E.Salarytype = S.Id
Where
E.Name Like '%'+@EmployeeName+'%'
-- Other AND Conditions --
)
select *,(Select Count(1) From data) FilteredCount -- can i Update it here ??--
FROM data
ORDER BY CASE WHEN @SortDir = 'ASC' THEN rn else -rn END
OFFSET @StartIndex ROWS
FETCH NEXT @PageSize ROWS ONLY
END
It of course returns more columns in the result, but to keep the question short I kept only two columns.
+------+-------+
| Name | Status|
+------+-------+
| John | 1 |
| Mark | 1 |
| Sami | 0 |
+------+-------+
Now I want to change the aliased result so that Name is set to 'Deleted' wherever Status is 0:
+--------+-------+
| Name | Status|
+--------+-------+
| John | 1 |
| Mark | 1 |
| Deleted| 0 |
+--------+-------+
Is it possible to update aliased columns like this and still perform ORDER BY and LIKE operations on them?
You don't need a CTE to UPDATE the table, or even to get the result set you require; you can OUTPUT the columns of a table at the same time you UPDATE it. Normally you would UPDATE only the rows that need changing rather than all of them, though nothing stops you from doing it this way:
UPDATE dbo.Employee
SET [Name] = CASE Status WHEN 1 THEN [Name] ELSE 'Deleted' END
OUTPUT inserted.[Name],
inserted.Status;
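If the goal is only to change what the query returns, and not the stored data, a CASE expression in the SELECT list is a non-destructive alternative. This is a minimal sketch rather than the method above: the table variable @Result and its column types are hypothetical stand-ins for the procedure's result set.

```sql
-- Hypothetical stand-in for the procedure's result set; types are assumptions.
DECLARE @Result TABLE ([Name] varchar(50) NOT NULL, [Status] bit NOT NULL);

INSERT INTO @Result ([Name], [Status])
VALUES ('John', 1), ('Mark', 1), ('Sami', 0);

-- Show 'Deleted' in place of the name when Status is 0; the stored rows are not modified.
SELECT CASE WHEN [Status] = 0 THEN 'Deleted' ELSE [Name] END AS [Name],
       [Status]
FROM @Result;
```

The same CASE expression could replace the plain name column inside the CTE, which would leave the existing ORDER BY and paging logic untouched.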

SQL SERVER update or insert after left join

I have a Table Animals
Id | Name | Count | -- (other columns not relevant)
1 | horse | 11
2 | giraffe | 20
I want to insert or update values from a CSV string.
Is it possible to do something like the following in one query?
;with results as
(
select * from
(
values ('horse'), ('giraffe'), ('lion')
)
animal_csv(aName)
left join animals on
animals.[Name] = animal_csv.aName
)
update results
set
[Count] = 1 + animals.[Count]
-- various other columns are set here
where Id is not null
--else
--insert into results ([Name], [Count]) values (results.aName, 1)
-- (essentially Where id is null)
It looks like what you're looking for is a table variable or temporary table rather than a common table expression.
If I understand your problem correctly, you are building a result set based on data you're getting from a CSV, merging it by incrementing values, and then returning that result set.
As I read your code, it looks as if your results would look like this:
aName | Id | Name | Count
horse | 1 | horse | 12
giraffe | 2 | giraffe | 21
lion | | |
I think what you're looking for in your final result set is this:
Name | Count
horse | 12
giraffe | 21
lion | 1
First, you can get from your csv and table to a resultset in a single CTE statement:
;WITH animal_csv AS (SELECT * FROM (VALUES('horse'),('giraffe'), ('lion')) a(aName))
SELECT ISNULL(Name, aName) Name
, CASE WHEN [Count] IS NULL THEN 1 ELSE 1 + [Count] END [Count]
FROM animal_csv
LEFT JOIN animals
ON Name = animal_csv.aName
Or, if you want to build your resultset using a table variable:
DECLARE @Results TABLE
(
Name VARCHAR(30)
, Count INT
)
;WITH animal_csv AS (SELECT * FROM (VALUES('horse'),('giraffe'), ('lion')) a(aName))
INSERT @Results
SELECT ISNULL(Name, aName) Name
, CASE WHEN [Count] IS NULL THEN 1 ELSE 1 + [Count] END [Count]
FROM animal_csv
LEFT JOIN animals
ON Name = animal_csv.aName
SELECT * FROM @Results
Or, if you just want to use a temporary table, you can build it like this (temp tables are deleted when the connection is released/closed or when they're explicitly dropped):
;WITH animal_csv AS (SELECT * FROM (VALUES('horse'),('giraffe'), ('lion')) a(aName))
SELECT ISNULL(Name, aName) Name
, CASE WHEN [Count] IS NULL THEN 1 ELSE 1 + [Count] END [Count]
INTO #results
FROM animal_csv
LEFT JOIN animals
ON Name = animal_csv.aName
SELECT * FROM #results
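If the base table itself should be changed in one statement, as the question's commented-out "else insert" suggests, MERGE (SQL Server 2008 and later) is one option. This is a hedged sketch of a different technique from the queries above: #Animals is a hypothetical scratch copy of the table, and treating Id as an IDENTITY column is an assumption.

```sql
-- Hypothetical scratch copy of the Animals table; Id as IDENTITY is an assumption.
CREATE TABLE #Animals
(
    Id      int IDENTITY(1, 1) PRIMARY KEY,
    [Name]  varchar(30) NOT NULL,
    [Count] int         NOT NULL
);

INSERT INTO #Animals ([Name], [Count])
VALUES ('horse', 11), ('giraffe', 20);

-- Increment the count for names that already exist; insert the rest with a count of 1.
MERGE #Animals AS tgt
USING (VALUES ('horse'), ('giraffe'), ('lion')) AS src (aName)
    ON tgt.[Name] = src.aName
WHEN MATCHED THEN
    UPDATE SET [Count] = tgt.[Count] + 1
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([Name], [Count]) VALUES (src.aName, 1);

-- horse is now 12, giraffe 21, and lion has been added with a count of 1.
SELECT [Name], [Count] FROM #Animals ORDER BY Id;

DROP TABLE #Animals;
```

The join condition assumes each name occurs at most once in the source list; MERGE raises an error if several source rows match the same target row.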

How To Avoid TempTable in Union All when queries contain DIFFERENT order by and inner join?

What I am trying to do is always send products with 0 quantity to the end of an already sorted temp table without losing the current sorting (as I described in the following question: How to send Zero Qty Products to the end of a PagedList<Products>?).
I have one sorted temp table, which is filled according to what the user has selected (alphabetical, by price, or by newest product); the sort order is captured by the identity Id:
CREATE TABLE #DisplayOrderTmp
(
[Id] int IDENTITY (1, 1) NOT NULL,
[ProductId] int NOT NULL
)
sorted #DisplayOrderTmp :
+------------+---------------+
| id | ProductId |
+------------+---------------+
| 1 | 66873 | // Qty is 0
| 2 | 70735 | // Qty is not 0
| 3 | 17121 | // Qty is not 0
| 4 | 48512 | // Qty is not 0
| 5 | 51213 | // Qty is 0
+------------+---------------+
I want to pass this data to the web page, but before that I need to send the products with zero quantity to the end of this list without losing the current sort order.
My returned data should look like this (the sorting hasn't changed; the 0-quantity products have just moved to the end of the list, in their original order):
CREATE TABLE #DisplayOrderTmp4
(
[Id] int IDENTITY (1, 1) NOT NULL,
[ProductId] int NOT NULL
)
+------------+---------------+
| id | ProductId |
+------------+---------------+
| 1 | 70735 |
| 2 | 17121 |
| 3 | 48512 |
| 4 | 66873 |
| 5 | 51213 |
+------------+---------------+
P.S.: This is my Product table, which I have to inner join with the temp table to find the quantity of each product.
The Product table looks like this:
+------------+---------------+------------------+
| id | stockqty | DisableBuyButton |
+------------+---------------+------------------+
| 17121 | 1 | 0 |
| 48512 | 27 | 0 |
| 51213 | 0 | 1 |
| 66873 | 0 | 1 |
| 70735 | 11 | 0 |
+------------+---------------+------------------+
What I have tried so far is this (it works, but slowly; it has a performance issue, as I have almost 30k products):
INSERT INTO #DisplayOrderTmp2 ([ProductId])
SELECT p2.ProductId
FROM #DisplayOrderTmp p2 with (NOLOCK) -- it's already sorted table
INNER JOIN Product prd with (NOLOCK)
ON p2.ProductId=prd.Id
and prd.DisableBuyButton=0 -- to find products with qty more than 0
group by p2.ProductId order by min(p2.Id) -- to save current ordering
INSERT INTO #DisplayOrderTmp3 ([ProductId])
SELECT p2.ProductId
FROM #DisplayOrderTmp p2 with (NOLOCK) -- it's already sorted table
INNER JOIN Product prd with (NOLOCK)
ON p2.ProductId=prd.Id
and prd.DisableBuyButton=1 -- to find products with qty equal to 0
group by p2.ProductId order by min(p2.Id) -- to save current ordering
INSERT INTO #DisplayOrderTmp4 ([ProductId]) -- finally, UNION ALL these two sets
SELECT p2.ProductId FROM
#DisplayOrderTmp2 p2 with (NOLOCK) -- more than 0 qty products with saved ordering
UNION ALL
SELECT p2.ProductId FROM
#DisplayOrderTmp3 p2 with (NOLOCK) -- 0 qty products with saved ordering
Is there any way to avoid creating temp tables in this query, i.e. to send the 0-quantity
products of the first temp table to the end of the list without
creating three other temp tables and without losing the current ordering based on the identity Id?
My query has a performance problem.
To repeat: the temp table has an identity Id column, and it is sorted according to the sort type the user passed to the stored procedure.
Thank You All :)
Make sure the temp table has an index or primary key with Id as the leading column. This will help avoid sort operators in the plan for the ordering:
CREATE TABLE #DisplayOrderTmp
(
[Id] int NOT NULL,
[ProductId] int NOT NULL
,PRIMARY KEY CLUSTERED(Id)
);
With that index, you should be able to get the result without additional temp tables with reasonable efficiency using a UNION ALL query, assuming ProductID is the Product table primary key:
WITH products AS (
SELECT p2.Id, p2.ProductId, prd.stockqty, 1 AS seq
FROM #DisplayOrderTmp p2
JOIN Product prd
ON p2.ProductId=prd.Id
WHERE prd.stockqty > 0
UNION ALL
SELECT p2.Id, p2.ProductId, prd.stockqty, 2 AS seq
FROM #DisplayOrderTmp p2
JOIN Product prd
ON p2.ProductId=prd.Id
WHERE prd.stockqty = 0
)
SELECT ProductId
FROM products
ORDER BY seq, Id;
You mentioned in comments that you ultimately want a paginated result. This can be done in T-SQL by adding OFFSET and FETCH to the ORDER BY clause as below. However, be aware that pagination over a large result set will become progressively slower the further into the result one queries.
WITH products AS (
SELECT p2.Id, p2.ProductId, prd.stockqty, 1 AS seq
FROM #DisplayOrderTmp p2
JOIN Product prd
ON p2.ProductId=prd.Id
WHERE prd.stockqty > 0
UNION ALL
SELECT p2.Id, p2.ProductId, prd.stockqty, 2 AS seq
FROM #DisplayOrderTmp p2
JOIN Product prd
ON p2.ProductId=prd.Id
WHERE prd.stockqty = 0
)
SELECT ProductId
FROM products
ORDER BY seq, Id
OFFSET @PageSize * (@PageNumber - 1) ROWS
FETCH NEXT @PageSize ROWS ONLY;
You could use ORDER BY without using UNION ALL:
SELECT p2.ProductId
FROM #DisplayOrderTmp p2
JOIN Product prd
ON p2.ProductId=prd.Id
ORDER BY prd.DisableBuyButton, p2.id;
Here DisableBuyButton = 0 means the quantity is greater than 0,
and DisableBuyButton = 1 means the quantity is 0.
It seems the query only needs one extra term in the ORDER BY.
An IIF or CASE can be used to give the sort a priority.
SELECT tmp.ProductId
FROM #DisplayOrderTmp tmp
JOIN Product prd
ON prd.Id = tmp.ProductId
AND prd.DisableBuyButton IN (0,1)
ORDER BY IIF(prd.DisableBuyButton=0,1,2), tmp.id;

Converting multiple rows into one in SQL Server

I have 2 tables:
Product:
ProductId | Name | Description
----------+-------+-------------------------------------
1 | shirt | this is description for shirt
2 | pent | this is description for pent
ProductOverride:
ProductOverrideId | ColumnId | Value | ProductId
------------------+-----------+------------------------+-----------
1 | 1 | overridden name | 1
2 | 2 | overridden description | 1
where ColumnId is column_id from sys.columns.
I want to select all the products with the following requirement:
if product name or product description is overridden in ProductOverride table, get the overridden value of name/description, otherwise get the name/description value from the product table.
Sample output:
ProductId | Name | Description
----------+-----------------+---------------------------
1 | overridden name | overridden description
2 | pent | this is description for pent
I have the following query which returns the exact result.
DECLARE @productNameColumnId INT = 1;
DECLARE @productDescriptionColumnId INT = 2;
WITH OverriddenProductNameCTE ([Value], [ProductId]) AS
(
SELECT
temp.[Value], temp.ProductId
FROM
ProductOverride temp
WHERE
temp.ColumnId = @productNameColumnId
), OverriddenProductDescriptionCTE ([Value], [ProductId]) AS
(
SELECT
temp.[Value], temp.ProductId
FROM
ProductOverride temp
WHERE
temp.ColumnId = @productDescriptionColumnId
)
)
SELECT
p.ProductId,
CASE
WHEN EXISTS(SELECT [Value]
FROM OverriddenProductNameCTE opnc
WHERE opnc.ProductId = p.ProductId)
THEN (SELECT [Value]
FROM OverriddenProductNameCTE opnc
WHERE opnc.ProductId = p.ProductId)
ELSE p.[Name]
END AS [Name],
CASE
WHEN EXISTS(SELECT [Value]
FROM OverriddenProductDescriptionCTE opdc
WHERE opdc.ProductId = p.ProductId)
THEN (SELECT [Value]
FROM OverriddenProductDescriptionCTE opdc
WHERE opdc.ProductId = p.ProductId)
ELSE p.[Description]
END AS [Description]
FROM
product p
but in the CASE statements, I have the following repetitive code:
SELECT [Value]
FROM OverriddenProductNameCTE opnc
WHERE opnc.ProductId = p.ProductId
which means that if the CASE expression's first condition is true, the DBMS will execute the same query again in the THEN part.
I want to improve this query, both by simplifying it and by reducing the processing it does.
Also, is there any advantage to using CTEs in this situation?
If it's only 2 columns I think the simplest thing you can do is left join twice with coalesce:
SELECT p.ProductId
,COALESCE(poN.Value, p.Name) As Name
,COALESCE(poD.Value, p.Description) As Description
FROM Product p
LEFT JOIN ProductOverride poN ON p.ProductId = poN.ProductId AND poN.ColumnId = 1
LEFT JOIN ProductOverride poD ON p.ProductId = poD.ProductId AND poD.ColumnId = 2
If it's more columns I would suggest pivoting the ProductOverride table and left join to that - Like this (a complete example):
Create and populate sample tables (Please save us this step in your future questions)
CREATE TABLE Product
(
ProductId int,
Name varchar(100),
Description varchar(100),
price int null
);
INSERT INTO Product VALUES
(1, 'shirt', 'Description for shirts', 1),
(2, 'Pants', 'Description for pants', 4),
(3, 'Socks', 'Description for socks', 5);
CREATE TABLE ProductOverride
(
ProductOverrideId int,
ColumnId int,
Value varchar(100),
ProductId int
);
INSERT INTO ProductOverride VALUES
(1,1,'product 1 name',1),
(2,2,'product 1 desc',1),
(3,3,'7',1),
(4,1,'pants name',2),
--Note: no pants description in the override table
(6,3,'8',2);
-- Note: no socks at all in override table
The query:
SELECT p.ProductId
,COALESCE(override.[1], p.Name) As Name
,COALESCE(override.[2], p.Description) As Description
,COALESCE(CAST(override.[3] as int), p.Price) As Price
FROM Product p
LEFT JOIN
(
SELECT *
FROM
(
SELECT ProductId, Value, ColumnId -- Columns To use for pivot
FROM ProductOverride
) ColumnsToPivot
PIVOT (
max (Value)
for ColumnId in ([1], [2], [3]) -- Values in ColumnId column to make the column names
) as pivotedData
) as override ON p.ProductId = override.ProductId
Results:
ProductId Name Description Price
1 product 1 name product 1 desc 7
2 pants name Description for pants 8
3 Socks Description for socks 5
You can see a live demo on rextester.

SQL Server - Transpose rows into columns

I've searched high and low for an answer to this so apologies if it's already answered!
I have the following result from a query in SQL 2005:
ID
1234
1235
1236
1267
1278
What I want is
column1|column2|column3|column4|column5
---------------------------------------
1234 |1235 |1236 |1267 |1278
I can't quite get my head around the pivot operator but this looks like it's going to be involved. I can work with there being only 5 rows for now but a bonus would be for it to be dynamic, i.e. can scale to x rows.
EDIT:
What I'm ultimately after is assigning the values of each resulting column to variables, e.g.
DECLARE @id1 int, @id2 int, @id3 int, @id4 int, @id5 int
SELECT @id1 = column1, @id2 = column2, @id3 = column3, @id4 = column4,
@id5 = column5 FROM [transposed_table]
You also need a value field in your query for each id to aggregate on. Then you can do something like this
select [1234], [1235]
from
(
-- replace code below with your query, e.g. select id, value from table
select
id = 1234,
value = 1
union
select
id = 1235,
value = 2
) a
pivot
(
avg(value) for id in ([1234], [1235])
) as pvt
I think you'll find the answer in this answer to a slightly different question: Generate "scatter plot" result of members against sets from SQL query
The answer uses Dynamic SQL. Check out the last link in mellamokb's answer: http://www.sqlfiddle.com/#!3/c136d/14 where he creates column names from row data.
In case you have a grouped flat data structure that you want to transpose per group, like this:
GRP | ID
---------------
1 | 1234
1 | 1235
1 | 1236
1 | 1267
1 | 1278
2 | 1234
2 | 1235
2 | 1267
2 | 1289
And you want its group transposition to appear like:
GRP | Column 1 | Column 2 | Column 3 | Column 4 | Column 5
-------------------------------------------------------------
1 | 1234 | 1235 | 1236 | 1267 | 1278
2 | 1234 | 1235 | NULL | 1267 | NULL
You can accomplish it with a query like this:
SELECT
Column1.GRP AS GRP,
Column1.ID As column1,
Column2.ID AS column2,
Column3.ID AS column3,
Column4.ID AS column4,
Column5.ID AS column5
FROM
(SELECT GRP, ID FROM FlatTable WHERE ID = 1234) AS Column1
LEFT OUTER JOIN
(SELECT GRP, ID FROM FlatTable WHERE ID = 1235) AS Column2
ON Column1.GRP = Column2.GRP
LEFT OUTER JOIN
(SELECT GRP, ID FROM FlatTable WHERE ID = 1236) AS Column3
ON Column1.GRP = Column3.GRP
LEFT OUTER JOIN
(SELECT GRP, ID FROM FlatTable WHERE ID = 1267) AS Column4
ON Column1.GRP = Column4.GRP
LEFT OUTER JOIN
(SELECT GRP, ID FROM FlatTable WHERE ID = 1278) AS Column5
ON Column1.GRP = Column5.GRP
(1) This assumes you know ahead of time which columns you will want — notice that I intentionally left out ID = 1289 from this example
(2) This basically uses a bunch of left outer joins to append 1 column at a time, thus creating the transposition. The left outer joins (rather than inner joins) allow for some columns to be null if they don't have corresponding values from the flat table, without affecting any subsequent columns.
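The same grouped transposition can also be sketched with conditional aggregation, the MAX(CASE ...) pattern, instead of one join per column. This is a minimal sketch under assumptions: #FlatTable is a hypothetical scratch copy of the example data with `int` columns, and the INSERT uses UNION ALL so that it also runs on SQL Server 2005.

```sql
-- Hypothetical scratch copy of the example data; the types are assumptions.
CREATE TABLE #FlatTable (GRP int NOT NULL, ID int NOT NULL);

INSERT INTO #FlatTable (GRP, ID)
SELECT 1, 1234 UNION ALL
SELECT 1, 1235 UNION ALL
SELECT 1, 1236 UNION ALL
SELECT 1, 1267 UNION ALL
SELECT 1, 1278 UNION ALL
SELECT 2, 1234 UNION ALL
SELECT 2, 1235 UNION ALL
SELECT 2, 1267 UNION ALL
SELECT 2, 1289;

-- Each CASE keeps an ID only in its own column (NULL otherwise),
-- and MAX collapses all rows of a group into a single row.
SELECT GRP,
       MAX(CASE WHEN ID = 1234 THEN ID END) AS column1,
       MAX(CASE WHEN ID = 1235 THEN ID END) AS column2,
       MAX(CASE WHEN ID = 1236 THEN ID END) AS column3,
       MAX(CASE WHEN ID = 1267 THEN ID END) AS column4,
       MAX(CASE WHEN ID = 1278 THEN ID END) AS column5
FROM #FlatTable
GROUP BY GRP
ORDER BY GRP;

DROP TABLE #FlatTable;
```

Unlike the join version, this reads the table once and still returns a row for a group that has no ID = 1234. As before, ID = 1289 is left out because the column list is fixed in advance.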

Resources