SQL: doing several joins to evaluate different columns - sql-server

I'm quite new to SQL but now use it a lot in my work (Microsoft SQL Server).
So the issue is this: I collect data that is atypical for a certain column.
Let's say I have different burgers, and each should have a standardized calories value. So I ran a query that gives me this:
-------------------------------------------
| Burger  | calories | numBurgers | Rank |
-------------------------------------------
| Chicken | 600      | 20         | 1    |
| Chicken | 400      | 3          | 2    |
| Beef    | 700      | 35         | 1    |
| Beef    | 850      | 4          | 2    |
-------------------------------------------
To get a list of all the "wrong" burgers I use a temporary table (a CTE) and filter out the rows with Rank = 1:
USE database;
GO
WITH GapRanking AS
(
SELECT Burger, calories, COUNT(calories) AS numBurgers,
ROW_NUMBER() OVER(PARTITION BY Burger ORDER BY COUNT(calories) DESC) AS [Rank]
FROM BaseTable -- the source table, as named in the join further down
GROUP BY Burger, calories
)
SELECT * FROM GapRanking
WHERE [Rank] <> 1
...
I get all combinations of Burgers and calories that are not "standard"
Then I do an INNER JOIN between the original table (with all its columns) and the result above.
SELECT * FROM BaseTable AS base
INNER JOIN
(SELECT * FROM GapRanking
WHERE [Rank] <> 1) AS err
ON (base.Burger = err.Burger
AND base.calories = err.calories)
This way I get a table with complete information about the "not-standard" burgers. So far so good.
Now I want to add other rows where there is a deviation in another criterion, price for example, not just calories, and add them to the list if they are not already there.
So I thought of UNION or JOIN.
So what is the best approach? UNION the above query with the same query on a different column (price instead of calories)?
Or do a JOIN with the same query on a different column (price instead of calories)?
The code gets quite "ugly" and I'm not sure I'm taking the right approach here.
Also, because I'm using the temporary table via WITH, a UNION does not seem so easy.
I'm really glad for any ideas here. Cheers

Use sub-queries and a join. The below is just pseudo-code, not an actual query (colname and required_column are placeholders), but you can follow this pattern:
select t1.*, t2.required_column
from
(SELECT Burger, calories, COUNT(calories) AS numBurgers,
ROW_NUMBER() OVER(PARTITION BY Burger ORDER BY COUNT(calories) DESC) AS [Rank]
FROM BaseTable
GROUP BY Burger, calories
) as t1
join
-- second ranking; swap in the other column (e.g. price) here
(SELECT Burger, calories, COUNT(calories) AS numBurgers,
ROW_NUMBER() OVER(PARTITION BY Burger ORDER BY COUNT(calories) DESC) AS [Rank]
FROM BaseTable
GROUP BY Burger, calories
) as t2
on t1.colname = t2.colname
where t1.[Rank] != 1 and t2.[Rank] != 1
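To make the UNION-vs-JOIN question concrete, here is a minimal sketch of a third option: two CTEs in one WITH clause, combined with EXISTS so that a base row is listed once even if it deviates on both columns. It assumes the table is called BaseTable and has a price column (the question mentions price but does not show it); CalorieRank, PriceRank and rn are names made up for this example.

```sql
-- Sketch only: BaseTable and the price column are assumptions based on the question.
WITH CalorieRank AS (
    -- rank each (Burger, calories) pair by how often it occurs
    SELECT Burger, calories,
           ROW_NUMBER() OVER (PARTITION BY Burger ORDER BY COUNT(*) DESC) AS rn
    FROM BaseTable
    GROUP BY Burger, calories
),
PriceRank AS (
    -- same idea for (Burger, price)
    SELECT Burger, price,
           ROW_NUMBER() OVER (PARTITION BY Burger ORDER BY COUNT(*) DESC) AS rn
    FROM BaseTable
    GROUP BY Burger, price
)
SELECT base.*
FROM BaseTable AS base
WHERE EXISTS (SELECT 1 FROM CalorieRank AS c
              WHERE c.Burger = base.Burger
                AND c.calories = base.calories
                AND c.rn <> 1)
   OR EXISTS (SELECT 1 FROM PriceRank AS p
              WHERE p.Burger = base.Burger
                AND p.price = base.price
                AND p.rn <> 1);
```

Because the filter uses EXISTS rather than a join, no base row is duplicated and no UNION is needed. Two caveats: ROW_NUMBER breaks ties arbitrarily when two values are equally common, and rows with NULL calories or price never match the equality tests.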

Related

sql selection of one value from several identical

I have the result of executing a query that collects data from several tables. It looks like this:
|Name|date |number|Id
|alex|01-01-2021 |1111 | 1
|mike|01-01-2021 |2222 | 2
|alex|02-01-2021 |1111 | 3
|alex|03-01-2021 |1111 | 4
|john|04-01-2021 |3333 | 5
I need to get the following result:
|Name|date |number| Id
|mike|01-01-2021|2222 | 2
|alex|any value |1111 | Any value
|john|04-01-2021|3333 | 5
I need to select one of the repeated values and show it. I have a large query with many columns; here I gave only a short version to explain the essence of the problem.
select Name,max(date) as date,number
from atable
group by Name, number
You can use this CTE and control which date (first or last) you get:
WITH data AS (
SELECT
Name,
date,
number,
row_number() OVER (PARTITION BY Name ORDER BY date) AS row_num
FROM test01
)
SELECT
Name,
date,
number
FROM data
WHERE row_num = 1
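As a variation on the CTE above, this sketch keeps the latest row per person instead of the earliest and also returns Id, which the question asks for. It assumes test01 has the Id column shown in the sample data and that date is stored as a real date type (if it is a string such as '01-01-2021', the ordering would be textual, not chronological).

```sql
-- Sketch only: assumes an Id column and a date-typed "date" column in test01.
WITH data AS (
    SELECT
        Name,
        date,
        number,
        Id,
        -- DESC picks the newest row; Id DESC is a tie-breaker for equal dates
        ROW_NUMBER() OVER (PARTITION BY Name, number
                           ORDER BY date DESC, Id DESC) AS row_num
    FROM test01
)
SELECT Name, date, number, Id
FROM data
WHERE row_num = 1;
```

Switching `DESC` back to `ASC` returns the first row per group instead.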

TSQL Conditional Where or Group By?

I have a table like the following:
id | type | duedate
-------------------------
1 | original | 01/01/2017
1 | revised | 02/01/2017
2 | original | 03/01/2017
3 | original | 10/01/2017
3 | revised | 09/01/2017
Where there may be either one or two rows for each id. If there are two rows with same id, there would be one with type='original' and one with type='revised'. If there is one row for the id, type will always be 'original'.
What I want as a result are all the rows where type='revised', but if there is only one row for a particular id (thus type='original') then I want to include that row too. So desired output for the above would be:
id | type | duedate
1 | revised | 02/01/2017
2 | original | 03/01/2017
3 | revised | 09/01/2017
I do not know how to construct a WHERE clause that conditionally checks whether there are 1 or 2 rows for a given id, nor am I sure how to use GROUP BY, because the revised date could be greater than or less than the original date, so the aggregate functions MAX or MIN don't work. I thought about using CASE somehow, but I also do not know how to construct a conditional that chooses between two different rows of data (if there are two rows) and displays one of them rather than the other.
Any suggested approaches would be appreciated.
Thanks!
You can use ROW_NUMBER for this.
WITH T AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Type DESC) AS RN
FROM YourTable
)
SELECT *
FROM T
WHERE RN = 1
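The query above works because 'revised' sorts after 'original', so ORDER BY Type DESC puts the revised row first. If you would rather not depend on alphabetical order, a possible variant (a sketch, using the same placeholder table name YourTable) states the priority explicitly with a CASE expression:

```sql
-- Sketch only: same idea as above, with the row priority spelled out.
WITH T AS
(
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY id
               -- 0 sorts first, so a 'revised' row wins whenever one exists
               ORDER BY CASE WHEN [type] = 'revised' THEN 0 ELSE 1 END
           ) AS RN
    FROM YourTable
)
SELECT id, [type], duedate
FROM T
WHERE RN = 1;
```

This keeps working if more type values are added later; only the CASE needs another branch.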
Is something like this sufficient?
SELECT *
FROM mytable m1
WHERE type='revised'
or 1=(SELECT COUNT(*) FROM mytable m2 WHERE m2.id=m1.id)
You could use a subquery to take the MAX([type]). This works for [type] because we want revised in preference to original, and "revised" sorts after "original" alphabetically ("r" comes after "o"), so MAX picks it whenever it exists. We can then INNER JOIN back on the same table with the matching conditions.
SELECT T2.*
FROM (
SELECT id, MAX([type]) AS [MAXtype]
FROM myTABLE
GROUP BY id
) AS dT INNER JOIN myTable T2 ON dT.id = T2.id AND dT.[MAXtype] = T2.[type]
ORDER BY T2.[id]
Gives output:
id type duedate
1 revised 2017-02-01
2 original 2017-03-01
3 revised 2017-09-01
Here is the sqlfiddle: http://sqlfiddle.com/#!6/14121f/6/0

Multiple Column Duplicate Criteria

I am using SQL Server. This is my sample data set:
IDNO| Consigment | SO_Number | Acc Number | OfficeNumber|PL9 |Remarks
--- | -----------| ----------| -----------| ------------|-------|-------
1 | AA12345MY | 1024450191| 8800400431 |B213 |W449401|Stay
2 | AA12345MY | 1024450192| 8800400431 |B213 |W449401|Remove
3 | BA12345MY | 1024460121| 8800400726 |K678 |W229790|Stay
4 | BA12345MY | 1024460124| 8800400726 |K678 |W229790|Remove
I want to put a remark on rows 2 and 4, as they are duplicates.
Duplicate criteria must match these 4 columns:
Consigment
Acc Number
OfficeNumber
PL9
I am removing the youngest SO number (the latest one).
I haven't got a clue how to start, as I never found a good reference.
Regards,
Fadlisham Fadzil
One approach here is to create a CTE which labels duplicate records and then delete from that CTE:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Consigment, [Acc Number], OfficeNumber, PL9
ORDER BY SO_Number) rn
FROM yourTable
)
DELETE FROM cte
WHERE rn > 1;
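If the goal is to fill in the Remarks column rather than delete rows, the same CTE can be the target of an UPDATE. This is a sketch that assumes Remarks is a writable text column in yourTable; the 'Stay' and 'Remove' values are taken from the sample data.

```sql
-- Sketch only: labels rows instead of deleting them.
WITH cte AS (
    SELECT Remarks,
           ROW_NUMBER() OVER (PARTITION BY Consigment, [Acc Number], OfficeNumber, PL9
                              ORDER BY SO_Number) AS rn
    FROM yourTable
)
UPDATE cte
-- rn = 1 is the lowest (oldest) SO_Number in each duplicate group
SET Remarks = CASE WHEN rn = 1 THEN 'Stay' ELSE 'Remove' END;
```

Ordering by SO_Number ascending keeps the earliest SO number as 'Stay', which matches rows 1 and 3 in the sample.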

TSQL, Remove rows based upon Row-Index

My Table is like this.....
**AttName** **Title** **Count_Index**
Red Boys 1
Red Girls 2
Green Boys 1
Blue Boys 1
I only want to return...
Red Boys 1
Red Girls 2
That's because Red has two entries. I want to skip/remove all rows whose count is only 1. In other words, I am only interested in rows whose count goes above 1.
Try
SELECT *
FROM table1
WHERE AttName IN (SELECT AttName FROM table1 GROUP BY AttName HAVING COUNT(*) > 1)
SQLFiddle
Output
| ATTNAME | TITLE | COUNT_INDEX |
---------------------------------
| Red | Boys | 1 |
| Red | Girls | 2 |
OK, this is tested. I like using window functions when looking for things like duplicates, particularly because this avoids a subselect in the WHERE clause and reading from the same table twice. Instead, all the needed columns are already pulled in the subselect. Window functions can be expensive sometimes, though.
SELECT *,
-- Probably better to order by whatever the primary key is for consistent results,
-- especially if you plan to use this in a DELETE statement.
ROW_NUMBER() OVER (PARTITION BY AttName ORDER BY AttName) AS rn
FROM (
SELECT AttName, Title, COUNT(AttName) OVER (PARTITION BY AttName) AS cnt
FROM yourtable
) AS counted
WHERE counted.cnt > 1

SQL Server problems when using ORDER BY and DISTINCT together

I have two problems. The first is with the two COUNTs that I start with. GroupID is a string that keeps products together (Name_Year): same product but different size.
If I have three reviews in tblReview and they all have the same GroupID, I want to return 3. My problem is that if I have three products with different ProductID but the same GroupID and I add three reviews to that GroupID, I get 9 back (3*3). If I only have one product with that GroupID and three reviews, it works (1*3 = 3).
The second problem is that if I have the ORDER BY CASE on Price, I have to add Price to the GROUP BY as well, and then I don't get the DISTINCT effect that I want, which is to show only products that have a unique GroupID.
Here's the query; I hope somebody can help me with this.
ALTER PROCEDURE GetFilterdProducts
@CategoryID INT, @ColumnName varchar(100)
AS
SELECT COUNT(tblReview.GroupID) AS ReviewCount,
COUNT(tblComment.GroupID) AS CommentCount,
Product.ProductID,
Product.Name,
Product.Year,
Product.Price,
Product.BrandID,
Product.GroupID,
AVG(tblReview.Grade) AS Grade
FROM Product LEFT JOIN
tblComment ON Product.GroupID = tblComment.GroupID LEFT JOIN
tblReview ON Product.GroupID = tblReview.GroupID
WHERE (Product.CategoryID = @CategoryID)
GROUP BY Product.ProductID, Product.BrandID, Product.GroupID, Product.Name, Product.Year, Product.Price
HAVING COUNT(distinct Product.GroupID) = 1
ORDER BY
CASE
WHEN @ColumnName='Name' THEN Name
WHEN @ColumnName='Year' THEN Year
WHEN @ColumnName='Price' THEN Price
END
My tables:
Product:
ProductID, Name, Year, Price, BrandID, GroupID
tblReview:
ReviewID, Description, Grade, ProductID, GroupID
tblComment:
CommentID, Description, ProductID, GroupID
I think my problem is this: if I have three rows in Product with the same GroupID, e.g. Nike_2010, and three reviews in tblReview with that GroupID, the query takes the first Product row with Nike_2010 and counts the reviews in tblReview with the same GroupID, then takes the second Product row with Nike_2010 and does the same count again, and so on, which results in 9. How do I avoid that?
For starters, because you're joining several tables, each product row is combined with every matching comment and every matching review, so per product you end up with the cross product of its comments and reviews. Your counts then return the number of joined rows that have data in that column. Consider the following example:
- PRODUCTS - -- COMMENTS -- --- REVIEWS ---
Key | Name Key | Comment Key | Review
1 | A 1 | Foo 1 | Great
2 | B 1 | Bar 1 | Wonderful
The query
SELECT PRODUCTS.[Key], PRODUCTS.Name, COMMENTS.Comment, REVIEWS.Review
FROM PRODUCTS
LEFT OUTER JOIN COMMENTS ON PRODUCTS.[Key] = COMMENTS.[Key]
LEFT OUTER JOIN REVIEWS ON PRODUCTS.[Key] = REVIEWS.[Key]
will result in the following data:
Key | Name | Comment | Review
1 | A | Foo | Great
1 | A | Foo | Wonderful
1 | A | Bar | Great
1 | A | Bar | Wonderful
2 | B | NULL | NULL
Thus, counting in this format
SELECT PRODUCTS.[Key], PRODUCTS.Name,
COUNT(COMMENTS.Comment) AS Count1, COUNT(REVIEWS.Review) AS Count2
FROM PRODUCTS
LEFT OUTER JOIN COMMENTS ON PRODUCTS.[Key] = COMMENTS.[Key]
LEFT OUTER JOIN REVIEWS ON PRODUCTS.[Key] = REVIEWS.[Key]
GROUP BY PRODUCTS.[Key], PRODUCTS.Name
will give you
Key | Name | Count1 | Count2
1 | A | 4 | 4
2 | B | 0 | 0
because it's counting each row in the table produced by the join!
Instead, you want to count each table separately in a subquery before joining it back like the following:
SELECT PRODUCTS.[Key], PRODUCTS.Name,
ISNULL(CommentCount.NumComments, 0) AS NumComments,
ISNULL(ReviewCount.NumReviews, 0) AS NumReviews
FROM PRODUCTS
LEFT OUTER JOIN (SELECT [Key], COUNT(*) AS NumComments
FROM COMMENTS
GROUP BY [Key]) CommentCount ON PRODUCTS.[Key] = CommentCount.[Key]
LEFT OUTER JOIN (SELECT [Key], COUNT(*) AS NumReviews
FROM REVIEWS
GROUP BY [Key]) ReviewCount ON PRODUCTS.[Key] = ReviewCount.[Key]
which will produce the following
Key | Name | NumComments | NumReviews
1 | A | 2 | 2
2 | B | 0 | 0
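Applied to the tables from the question, the same pre-aggregation idea might look like the sketch below. It assumes the column names listed in the question (plus Product.CategoryID, which the original query filters on); the aliases r and c and the sample value for @CategoryID are made up for this example.

```sql
-- Sketch only: counts are computed per GroupID before joining to Product.
DECLARE @CategoryID INT = 1;   -- hypothetical sample value

SELECT p.ProductID, p.Name, p.[Year], p.Price, p.BrandID, p.GroupID,
       ISNULL(r.ReviewCount, 0)  AS ReviewCount,
       ISNULL(c.CommentCount, 0) AS CommentCount,
       r.Grade
FROM Product AS p
LEFT JOIN (SELECT GroupID, COUNT(*) AS ReviewCount, AVG(Grade) AS Grade
           FROM tblReview
           GROUP BY GroupID) AS r ON r.GroupID = p.GroupID
LEFT JOIN (SELECT GroupID, COUNT(*) AS CommentCount
           FROM tblComment
           GROUP BY GroupID) AS c ON c.GroupID = p.GroupID
WHERE p.CategoryID = @CategoryID;
```

Each derived table returns at most one row per GroupID, so joining it to Product cannot multiply the counts, and the outer query no longer needs a GROUP BY at all.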
As for the "DISTINCT effect" you refer to, I'm not exactly sure I follow. Could you elaborate a bit?
About the second problem: can't you group by the same CASE expression? You shouldn't have the Price field in the result list then, though.
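If the "DISTINCT effect" means showing one product per GroupID, one possible reading is sketched below: pick a single row per GroupID with ROW_NUMBER, then sort with one CASE expression per sortable column. Splitting the CASE matters because a single CASE expression has one result type, so mixing a string column such as Name with numeric columns can raise a conversion error. The choice of "lowest ProductID per group", the CTE name OnePerGroup and the sample parameter values are assumptions for this example.

```sql
-- Sketch only: one product per GroupID, sortable by a parameter.
DECLARE @CategoryID INT = 1;                   -- hypothetical sample value
DECLARE @ColumnName VARCHAR(100) = 'Price';    -- hypothetical sample value

WITH OnePerGroup AS (
    SELECT ProductID, Name, [Year], Price, BrandID, GroupID,
           -- rn = 1 marks the product with the lowest ProductID in each group
           ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY ProductID) AS rn
    FROM Product
    WHERE CategoryID = @CategoryID
)
SELECT ProductID, Name, [Year], Price, BrandID, GroupID
FROM OnePerGroup
WHERE rn = 1
ORDER BY
    -- only the CASE matching @ColumnName is non-NULL, so only it affects the order
    CASE WHEN @ColumnName = 'Name'  THEN Name   END,
    CASE WHEN @ColumnName = 'Year'  THEN [Year] END,
    CASE WHEN @ColumnName = 'Price' THEN Price  END;
```

Since there is no GROUP BY in this query, sorting by Price does not force Price into a grouping list.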
