Problem with unique SQL query - sql-server

I want to select all records, but have the query only return a single record per Product Name. My table looks similar to:
SellId ProductName Comment
1 Cake dasd
2 Cake dasdasd
3 Bread dasdasdd
where the Product Name is not unique. I want the query to return a single record per ProductName with results like:
SellId ProductName Comment
1 Cake dasd
3 Bread dasdasdd
I have tried this query,
Select distict ProductName,Comment ,SellId from TBL#Sells
but it is returning multiple records with the same ProductName. My table is not realy as simple as this, this is just a sample. What is the solution? Is it clear?

Select ProductName,
min(Comment) , min(SellId) from TBL#Sells
group by ProductName
If y ou only want one record per productname, you ofcourse have to choose what value you want for the other fields.
If you aggregate (using group by) you can choose an aggregate function,
htat's a function that takes a list of values and return only one : here I have chosen MIN : that is the smallest walue for each field.
NOTE : comment and sellid can come from different records, since MIN is taken...
Othter aggregates you might find useful :
FIRST : first record encountered
LAST : last record encoutered
AVG : average
COUNT : number of records
first/last have the advantage that all fields are from the same record.

SELECT S.ProductName, S.Comment, S.SellId
FROM
Sells S
JOIN (SELECT MAX(SellId)
FROM Sells
GROUP BY ProductName) AS TopSell ON TopSell.SellId = S.SellId
This will get the latest comment as your selected comment assuming that SellId is an auto-incremented identity that goes up.

I know, you've got an answer already, I'd like to offer a way that was fastest in terms of performance for me, in a similar situation. I'm assuming that SellId is Primary Key and identity. You'd want an index on ProductName for best performance.
select
Sells.*
from
(
select
distinct ProductName
from
Sells
) x
join
Sells
on
Sells.ProductName = x.ProductName
and Sells.SellId =
(
select
top 1 s2.SellId
from
Sells s2
where
x.ProductName = s2.ProductName
Order By SellId
)
A slower method, (but still better than Group By and MIN on a long char column) is this:
select
*
from
(
select
*,ROW_NUMBER() over (PARTITION BY ProductName order by SellId) OccurenceId
from sells
) x
where
OccurenceId = 1
An advantage of this one is that it's much easier to read.

create table Sale
(
SaleId int not null
constraint PK_Sale primary key,
ProductName varchar(100) not null,
Comment varchar(100) not null
)
insert Sale
values
(1, 'Cake', 'dasd'),
(2, 'Cake', 'dasdasd'),
(3, 'Bread', 'dasdasdd')
-- Option #1 with over()
select *
from Sale
where SaleId in
(
select SaleId
from
(
select SaleId, row_number() over(partition by ProductName order by SaleId) RowNumber
from Sale
) tt
where RowNumber = 1
)
order by SaleId
-- Option #2
select *
from Sale
where SaleId in
(
select min(SaleId)
from Sale
group by ProductName
)
order by SaleId
drop table Sale

Related

How can I delete duplicates from a table based on a location id and name

columns are: Name, Location_Name, Location_ID
I want to check Names and Location_ID, and if there are two that are the same I want to delete/remove that row.
For example: If Name John Fox at location id 4 shows up two or more times I want to just keep one.
After that, I want to count how many people per location.
Location_Name1: 45
Location_Name2: 66
Etc...
The location name and Location Id are related.
Sample data
Code I tried
Deleting duplicates is a common pattern. You apply a sequence number to all of the duplicates, then delete any that aren't first. In this case I order arbitrarily but you can choose to keep the one with the lowest PK value or that was modified last or whatever - just update the ORDER BY to sort the one you want to keep first.
;WITH cte AS
(
SELECT *, rn = ROW_NUMBER() OVER
(PARTITION BY Name, Location_ID ORDER BY ##SPID)
FROM dbo.TableName
)
DELETE cte WHERE rn > 1;
Then to count, assuming there can't be two different Location_IDs for a given Location_Name (this is why schema + sample data is so helpful):
SELECT Location_Name, People = COUNT(Name)
FROM dbo.TableName
GROUP BY Location_Name;
Example db<>fiddle
If Location_Name and Location_ID are not tightly coupled (e.g. there could be Location_ID = 4, Location_Name = Place 1 and Location_ID = 4, Location_Name = Place 2 then you're going to have to define how to determine which place to display if you group by Location_ID, or admit that perhaps one of those columns is meaningless.
If Location_Name and Location_ID are tightly coupled, they shouldn't both be stored in this table. You should have a lookup/dimension table that stores both of those columns (once!) and you use the smallest data type as the key you record in the fact table (where it is repeated over and over again). This has several benefits:
Scanning the bigger table is faster, because it's not as wide
Storage is reduced because you're not repeating long strings over and over and over again
Aggregation is clearer and you can join to get names after aggregation, which will be faster
If you need to change a location's name, you only need to change it in exactly one place
Sample code
CREATE TABLE People_Location
(
Name VARCHAR(30) NOT NULL,
Location_Name VARCHAR(30) NOT NULL,
Location_ID INT NOT NULL,
)
INSERT INTO People_Location
VALUES
('John Fox', 'Moon', 4),
('John Bear', 'Moon', 4),
('Peter', 'Saturn', 5),
('John Fox', 'Moon', 4),
('Micheal', 'Sun', 1),
('Jackie', 'Sun', 1),
('Tito', 'Sun', 1),
('Peter', 'Saturn', 5)
Get location and count
select Location_Name, count(1)
from
(select Name, Location_Name,
rn = ROW_NUMBER() OVER (PARTITION BY Name, Location_ID ORDER BY Name)
from People_Location
) t
where rn = 1
group by Location_Name
Result
Moon 2
Saturn 1
Sun 3

How to get the most recent row in a temporal tables history

I have the following query:
SELECT *
FROM dbo.mytable FOR system_time BETWEEN #rowEffFrom AND #rowEffTo t
WHERE t.anothertableid = 1
ORDER BY t.rowefffrom
My table has the following columns relevant to this question:
Id, AnotherTableId (FK), Description, RowEffFrom, RowEffTo
What I have done is get an entry from tblAnotherTable's history matching certain criteria. I then set #rowEffFrom and #rowEffTo as the RowEffFrom and RowEffTo for the retrieved row (top 1).
myTable has a many to one relationship with tblAnotherTable.
I want to retrieve the latest version of all entries between the #rowEffFrom and #rowEffTo, but one per Id.
e.g.
if my query above returns 2 rows, both with Id of 11, then I just want the most recent. If it returns 6 rows with ids 11,12,13 I want 3 rows with the latest of each version.
Nevermind, the answer is:
SELECT *
FROM ( SELECT Row_number() OVER (partition BY id ORDER BY rowefffrom DESC) rownumber,
*
FROM dbo.mytable FOR system_time BETWEEN #rowEffFrom AND #rowEffTo t
WHERE t.anothertableid = 1 ) mt
WHERE mt.rownumber = 1

TSQL - De-duplication report - grouping

So I'm trying to create a report that ranks a duplicate record, the idea behind this is that the customer wants to merge a whole lot of duplicate records that came about from a migration.
I need the ranking so that my report can show which record should be the "main" record, i.e. the record that will have missing data pulled into it.
The duplicate definition is pretty simple:
If the email addresses are the same then it is always a duplicate, if
the emails do not match, then the first name, surname, and mobile must
match.
The ranking will be based on a whole bunch of columns in the table, so:
email address isn't NULL = 50
phone number isn't NULL = 20
etc.. whichever gets the highest number in the duplicate group becomes the main record. This is where I am having issues, I can't seem to find a way to get an incremental number for each duplicate set. This is some of the code I have so far:
( I took out some of the rank columns in the temp table and CTE expression to shorten it )
DECLARE #tmp_Duplicates TABLE (
tmp_personID INT
, tmp_Firstname NVARCHAR(100)
, tmp_Surname NVARCHAR(100)
, tmp_HomeEmail NVARCHAR(300)
, tmp_MobileNumber NVARCHAR(100)
--- Ratings
, tmp_HomeEmail_Rating INT
--- Groupings
, tmp_GroupNumber INT
)
;WITH cteDupes AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY personHomeEmail ORDER BY personID DESC) AS RND,
ROW_NUMBER() OVER(PARTITION BY personHomeEmail ORDER BY personId) AS RNA,
p.personID, p.PersonFirstName, p.PersonSurname, p.PersonHomeEMail
, personMobileTelephone
FROM tblCandidate c INNER JOIN tblPerson p ON c.candidateID = p.personID
)
INSERT INTO #tmp_Duplicates
SELECT PersonID, PersonFirstName, PersonSurname, PersonHomeEMail, personMobileTelephone
, 10, RND
FROM cteDupes
WHERE RNA + RND > 2
ORDER BY personID, PersonFirstName, PersonSurname
SELECT * FROM #tmp_Duplicates
This gives me the results I want, but the group number isn't showing how I need it:
What I need is for each group to be an incremental value:

Select Different Column Value for Row with Max Value

I'm hoping for a cleaner way to do something that I know how to do one way. I want to retrieve the UserId for the MAX ID value as well as that MAX ID value. Let's say I have a table with data like this:
ID UserId Value
1 10 'Foo'
2 15 'Blah'
3 10 'Blech'
4 20 'Qwerty'
I want to retrieve:
ID UserId
4 20
I know I could do this like so:
SELECT
t.ID,
t.UserID
FROM
(
SELECT MAX(ID) as [MaxID]
FROM table
) as m
JOIN table as t ON m.MaxID = t.ID
I'm only vaguely familiar with the ROW_NUMBER(), RANK() and other similar methods and I can't help believing that this scenario could benefit from some such method to get rid of joining back to the table.
You can definitely use ROW_NUMBER for something like this:
with t1Rank as
(
select *
, t1Rank = row_number() over (order by ID desc)
from t1
)
select ID, UserID
from t1Rank
where t1Rank = 1
SQL Fiddle with demo.
The advantage with this approach is you can bring Value (or other fields as required) into the result set, too. Plus you can tweak the ordering/grouping as required.
You could also just do it with a sub-query like this:
SELECT ID ,
UserID
FROM table
WHERE ID = ( SELECT MAX(ID)
FROM table
);
SELECT TOP 1 ID, UserID FROM <table> ORDER BY ID DESC

Check for applicable Group query for shopping cart

I have problem for one of discount check condition. I have tables structure as below:
Cart table (id, customerid, productid)
Group table (groupid, groupname, discountamount)
Group Products table (groupproductid, groupid, productid)
While placing an order, there will be multiple items in cart, I want to check those items with top most group if that group consists of all product shopping cart have?
Example:
If group 1 consists 2 products and those two products exists in cart table then group 1 discount should be returned.
please help
It's tricky, without having real table definitions nor sample data. So I've made some up:
create table Carts(
id int,
customerid int,
productid int
)
create table Groups(
groupid int,
groupname int,
discountamount int
)
create table GroupProducts(
groupproductid int,
groupid int,
productid int
)
insert into Carts (id,customerid,productid) values
(1,1,1),
(2,1,2),
(3,1,4),
(4,2,2),
(5,2,3)
insert into Groups (groupid,groupname,discountamount) values
(1,1,10),
(2,2,15),
(3,3,20)
insert into GroupProducts (groupproductid,groupid,productid) values
(1,1,1),
(2,1,5),
(3,2,2),
(4,2,4),
(5,3,2),
(6,3,3)
;With MatchedProducts as (
select
c.customerid,gp.groupid,COUNT(*) as Cnt
from
Carts c
inner join
GroupProducts gp
on
c.productid = gp.productid
group by
c.customerid,gp.groupid
), GroupSizes as (
select groupid,COUNT(*) as Cnt from GroupProducts group by groupid
), MatchingGroups as (
select
mp.*
from
MatchedProducts mp
inner join
GroupSizes gs
on
mp.groupid = gs.groupid and
mp.Cnt = gs.Cnt
)
select * from MatchingGroups
Which produces this result:
customerid groupid Cnt
----------- ----------- -----------
1 2 2
2 3 2
What we're doing here is called "relational division" - if you want to search elsewhere for that term. In my current results, each customer only matches one group - if there are multiple matches, we need some tie-breaking conditions to determine which group to report. I prompted with two suggestions in comments (lowest groupid or highest discountamount). Your response of "added earlier" doesn't help - we don't have a column which contains the addition dates of groups. Rows have no inherent ordering in SQL.
We would do the tie-breaking in the definition of MatchingGroups and the final select:
MatchingGroups as (
select
mp.*,
ROW_NUMBER() OVER (PARTITION BY mp.customerid ORDER BY /*Tie break criteria here */) as rn
from
MatchedProducts mp
inner join
GroupSizes gs
on
mp.groupid = gs.groupid and
mp.Cnt = gs.Cnt
)
select * from MatchingGroups where rn = 1

Resources