Removing Duplicates of two columns in a query - sql-server

I have a select * query which returns many rows and many columns. My problem is duplicate values in one column A for the same value of another column B; I would like to keep only one of them.
Basically I have a column that tells me the "name" of an object and another that tells me its "number". Sometimes an object "name" has more than one entry for a given "number". I only want distinct "numbers" within a "name", but I want the query to return the entire row in that case and not just these two columns.
Name Number ColumnC ColumnD
Bob 1 93 12
Bob 2 432 546
Bob 3 443 76
This example above is fine
Name Number ColumnC ColumnD
Bob 1 93 12
Bob 2 432 546
Bill 1 443 76
Bill 2 54 1856
This example above is fine
Name Number ColumnC ColumnD
Bob 1 93 12
Bob 2 432 546
Bob 2 209 17
This example above is not fine, I only want one of the Bob 2's.

Try this if you are using SQL Server 2005 or later:
With ranked_records AS
(
select *,
ROW_NUMBER() OVER(Partition By name, number Order By name) [ranked]
from MyTable
)
select * from ranked_records
where ranked = 1
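
As a hedged illustration of the query above, here is a minimal sketch that runs the same ROW_NUMBER pattern against the question's third example. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is runnable; window functions need SQLite 3.25 or later). Ordering by ColumnC inside the partition is also an assumption: it is a tiebreaker that makes the choice of which "Bob 2" row survives deterministic, whereas ordering by name alone leaves that choice to the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (Name TEXT, Number INTEGER, ColumnC INTEGER, ColumnD INTEGER)"
)
# Sample rows from the question's "not fine" example (two Bob 2 rows).
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [("Bob", 1, 93, 12), ("Bob", 2, 432, 546), ("Bob", 2, 209, 17)],
)

rows = conn.execute(
    """
    WITH ranked_records AS (
        SELECT *,
               -- ColumnC is an assumed tiebreaker: lowest ColumnC wins
               ROW_NUMBER() OVER (PARTITION BY Name, Number ORDER BY ColumnC) AS ranked
        FROM MyTable
    )
    SELECT Name, Number, ColumnC, ColumnD
    FROM ranked_records
    WHERE ranked = 1
    ORDER BY Name, Number
    """
).fetchall()

for row in rows:
    print(row)
```

Each (Name, Number) pair appears once in the result, and the full row is kept rather than just the two key columns.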

If you just want the Name and number, then
SELECT DISTINCT Name, Number FROM Table1
If you want to know how many of each there are, then
SELECT Name, Number, COUNT(*) FROM Table1 GROUP BY Name, Number

Use a Common Table Expression (CTE) and the ROW_NUMBER() OVER (PARTITION BY ...) syntax as follows:
WITH
CTE AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Name, Number ORDER BY Name, Number) AS R
FROM
dbo.ATable
)
SELECT
*
FROM
CTE
WHERE
R = 1

WITH
CTE AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Plant, BatchNumber ORDER BY Plant, BatchNumber) AS R
FROM dbo.StatisticalReports
WHERE FermBatchStartTime >= DATEADD(d, -90, GETDATE())
)
SELECT
*
FROM
CTE
WHERE
R = 1
-- The outer query reads from CTE, so sort by the CTE's column names
ORDER BY Plant, FermBatchStartTime

Related

Displaying all columns in SQL and also sum of columns with same ID in the last Repeating row

I have 2 tables
OrderDetails:
Id Name type Quantity
------------------------------------------
2009 john a 10
2009 john a 20
2010 sam b 25
2011 sam c 50
2012 sam d 30
ValueDetails:
Id Value
-------------------
2009 300
2010 500
2011 200
2012 100
I need to get an output which displays the data as such :
Id Name type Quantity Price
-------------------------------------------------
2009 john a 10
2009 john a 20 9000
2010 sam b 25
2011 sam c 50
2012 sam d 30 25500
The price is calculated by Value x Quantity and the sum of the values is displayed in the last repeating row of the given Name.
I tried to use SUM and GROUP BY, but I get only two rows. I need to display all 5 rows. How can I write this query?
You can use Row_Number with max of Row_Number to get this formatted sum
;with cte as (
select od.*, sm= sum( od.Quantity*vd.value ) over (partition by Name),
RowN = row_number() over(partition by Name order by od.id)
from #yourOrderDetails od
inner join #yourValueDetails vd
on od.Id = vd.Id
)
select Id, Name, Type, Quantity,
case when max(RowN) over(partition by Name) = RowN
then sm else null end as ActualSum
from cte
Your input tables:
create table #yourOrderDetails (Id int, Name varchar(20), type varchar(2), Quantity int)
insert into #yourOrderDetails (Id, Name, type, Quantity) values
(2009 ,'john','a', 10 )
,(2009 ,'john','a', 20 ) ,(2010 ,'sam ','b', 25 )
,(2011 ,'sam ','c', 50 ) ,(2012 ,'sam ','d', 30 )
create table #yourValueDetails(Id int, Value Int)
insert into #yourValueDetails(Id, value) values
( 2009 , 300 ) ,( 2010 , 500 )
,( 2011 , 200 ) ,( 2012 , 100 )
SELECT a.ID,
a.Name,
a.Type,
a.quantity,
price = (a.quantity * b.Value)
FROM OrderDetails a LEFT JOIN
ValueDetails b on a.id = b.id
This will put the price on every row. If you want to do a SUM by Id,Name and Type it's not going to show the individual records like you show them above. If you want to put a SUM on one of the lines that share the same Id, Name and Type then you'd need a rule to figure out which one and then you could probably use a CASE statement to decide on which line you want to show the SUM total.
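
One possible rule, sketched here under stated assumptions, is "show the total on the last row of each Name". The example uses Python's sqlite3 module in place of SQL Server (window functions need SQLite 3.25 or later) and the sample data from the question. SQLite's implicit rowid, used to break the tie between the two 2009 rows, is an assumption standing in for whatever ordering column the real table has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE OrderDetails (Id INTEGER, Name TEXT, type TEXT, Quantity INTEGER);
    CREATE TABLE ValueDetails (Id INTEGER, Value INTEGER);
    INSERT INTO OrderDetails VALUES
        (2009, 'john', 'a', 10), (2009, 'john', 'a', 20),
        (2010, 'sam', 'b', 25), (2011, 'sam', 'c', 50), (2012, 'sam', 'd', 30);
    INSERT INTO ValueDetails VALUES (2009, 300), (2010, 500), (2011, 200), (2012, 100);
    """
)

priced = conn.execute(
    """
    WITH cte AS (
        SELECT od.Id, od.Name, od.type, od.Quantity,
               -- total Value x Quantity for the whole Name
               SUM(od.Quantity * vd.Value) OVER (PARTITION BY od.Name) AS sm,
               -- position of the row within its Name (rowid breaks Id ties)
               ROW_NUMBER() OVER (PARTITION BY od.Name ORDER BY od.Id, od.rowid) AS RowN,
               COUNT(*) OVER (PARTITION BY od.Name) AS cnt
        FROM OrderDetails od
        JOIN ValueDetails vd ON od.Id = vd.Id
    )
    SELECT Id, Name, type, Quantity,
           CASE WHEN RowN = cnt THEN sm END AS Price  -- only on the last row
    FROM cte
    ORDER BY Name, RowN
    """
).fetchall()

for row in priced:
    print(row)
```

All five rows are returned, with Price left NULL (None in Python) except on the last row of each Name.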

T-SQL getting all unique groups with their usage count

How do I find the unique groups that are present in my table, and display how often that type of group is used?
For example (SQL Server 2008R2)
So, I would like to find out how many times the combination of
PMI 100
RT 100
VT 100
is present in my table and for how many itemid's it is used;
These three form a group because together they are assigned to a single itemid. The same combination is assigned to id 2527 and 2529, so therefore this group is used at least twice. (usagecount = 2)
(and I want to know that for all types of groups that are appearing)
The entire dataset is quite large, about 5.000.000 records, so I'd like to avoid using a cursor.
The number of code/pct combinations per itemid varies between 1 and 6.
The values in the "code" field are not known up front, there are more than a dozen values on average
I tried using pivot, but I got stuck eventually and I also tried various combinations of GROUP-BY and counts.
Any bright ideas?
Example output:
code pct groupid usagecount
PMI 100 1 234
RT 100 1 234
VT 100 1 234
CD 5 2 567
PMI 100 2 567
VT 100 2 567
PMI 100 3 123
PT 100 3 123
VT 100 3 123
RT 100 4 39
VT 100 4 39
etc
Just using a simple group:
SELECT
code
, pct
, COUNT(*)
FROM myTable
GROUP BY
code
, pct
Not too sure if that's more like what you're looking for:
select
uniqueGrp
, count(*)
from (
select distinct
itemid
from myTable
) as I
cross apply (
select
cast(code as varchar(max)) + cast(pct as varchar(max)) + '_'
from myTable
where myTable.itemid = I.itemid
order by code, pct
for xml path('')
) as x(uniqueGrp)
group by uniqueGrp
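
The idea behind the query above is to build one canonical signature per itemid from its sorted code/pct pairs and then count identical signatures. The following is a minimal sketch of that logic in plain Python, not a replacement for the SQL. Item ids 2527 and 2529 come from the question; item 2530 and its rows are hypothetical sample data added to show a second group.

```python
from collections import Counter, defaultdict

# (itemid, code, pct) rows; 2530 is a hypothetical extra item
rows = [
    (2527, "PMI", 100), (2527, "RT", 100), (2527, "VT", 100),
    (2529, "VT", 100), (2529, "PMI", 100), (2529, "RT", 100),
    (2530, "RT", 100), (2530, "VT", 100),
]

# Collect the code/pct pairs assigned to each itemid.
per_item = defaultdict(list)
for itemid, code, pct in rows:
    per_item[itemid].append((code, pct))

# Sorting makes the signature independent of row order, which is the
# role of "order by code, pct" in the FOR XML PATH subquery.
usage = Counter(tuple(sorted(pairs)) for pairs in per_item.values())

for group, usagecount in usage.items():
    print(group, usagecount)
```

Sorting before building the signature matters: without it, the same set of pairs stored in a different row order would be counted as a different group.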
Either of these should return each combination of code and percentage, with a group id for the code and the total number of rows for that code. You can also extend them with the number of instances of each specific code/pct combination, for example to work out its % contribution.
select
distinct
t.code, t.pct, v.groupcol, v.vol
from
[tablename] t
inner join (select code, rank() over(order by count(*)) as groupcol,
count(*) as vol from [tablename] s
group by code) v on v.code=t.code
or
select
t.code, t.pct, v.groupcol, v.vol
from
(select code, pct from [tablename] group by code, pct) t
inner join (select code, rank() over(order by count(*)) as groupcol,
count(*) as vol from [tablename] s
group by code) v on v.code=t.code
Grouping by code and pct should be enough, I think. See the following:
select code,pct,count(*)
from [table] as p
group by code,pct

T-SQL select rows by oldest date and unique category

I'm using Microsoft SQL. I have a table that contains information stored by two different categories and a date. For example:
ID Cat1 Cat2 Date/Time Data
1 1 A 11:00 456
2 1 B 11:01 789
3 1 A 11:01 123
4 2 A 11:05 987
5 2 B 11:06 654
6 1 A 11:06 321
I want to extract one line for each unique combination of Cat1 and Cat2 and I need the line with the oldest date. In the above I want ID = 1, 2, 4, and 5.
Thanks
Have a look at row_number() on MSDN.
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY date_time, id) rn
FROM mytable
) q
WHERE rn = 1
(run the code on SQL Fiddle)
Quassnoi's answer is fine, but I'm a bit uncomfortable with how it handles dups. It seems to return based on insertion order, but I'm not sure if even that can be guaranteed? (see these two fiddles for an example where the result changes based on insertion order: dup at the end, dup at the beginning)
Plus, I kinda like staying with old-school SQL when I can, so I would do it this way (see this fiddle for how it handles dups):
select t1.*
from my_table t1
left join my_table t2
on t1.cat1 = t2.cat1
and t1.cat2 = t2.cat2
and t1.datetime > t2.datetime
where t2.datetime is null
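
As a hedged check of the self-join approach, the sketch below runs it on the question's sample data using Python's sqlite3 module in place of SQL Server. The column name dt and the storage of times as HH:MM text are assumptions made for the example. A row is kept only when no other row in the same Cat1/Cat2 group has an earlier time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER, cat1 INTEGER, cat2 TEXT, dt TEXT, data INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "A", "11:00", 456),
        (2, 1, "B", "11:01", 789),
        (3, 1, "A", "11:01", 123),
        (4, 2, "A", "11:05", 987),
        (5, 2, "B", "11:06", 654),
        (6, 1, "A", "11:06", 321),
    ],
)

oldest_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT t1.id
        FROM my_table t1
        LEFT JOIN my_table t2
               ON t1.cat1 = t2.cat1
              AND t1.cat2 = t2.cat2
              AND t1.dt > t2.dt      -- t2 is an older row in the same group
        WHERE t2.dt IS NULL          -- no older row exists, so t1 is the oldest
        ORDER BY t1.id
        """
    )
]
print(oldest_ids)
```

Note that if two rows in a group share the earliest time, this form returns both of them, whereas ROW_NUMBER with a tiebreaker returns exactly one.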

SQL Server 2008 how to select top [column value] and random record?

I'm using SQL Server 2008. I want to select random rows, where the number of rows to return depends on a column value in another table. How can I do this?
My SQL statement is something like this, but it is wrong:
select top b.number a.name, a.link_id
from A a
left join B b on b.link_id = a.link_id
order by newid()
Here are my tables and the expected result.
Table A:
name link_id
james 100
albert 100
susan 100
simon 101
tom 101
fion 101
Table B:
link_id number
100 2
101 1
Expected result:
when run 1st time, result may be:
name link_id
james 100
susan 100
fion 101
2nd time result may be:
albert 100
susan 100
simon 101
3rd time could be:
james 100
albert 100
fion 101
Explanation
Refer to table B: link_id 100 has number 2,
meaning that 2 random records should be selected from Table A for link_id = 100,
and 1 random record should be selected for link_id = 101.
You can use the ROW_NUMBER() function:
SELECT A.name, A.link_id
FROM(
SELECT name,link_id, ROW_NUMBER()OVER(PARTITION BY link_id ORDER BY NEWID()) rn
FROM dbo.tblA
) AS A
JOIN dbo.tblB AS B
ON A.link_id = B.link_id
WHERE A.rn <= B.number;
Here is a SqlFiddle to show this in action: http://sqlfiddle.com/#!3/92eac/2
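
For readers without access to the fiddle, here is a minimal sketch of the same pattern using Python's sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later). RANDOM() takes the place of NEWID(), which is an assumption of this port. The rows returned differ from run to run, so the example only counts how many were picked per link_id.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tblA (name TEXT, link_id INTEGER);
    CREATE TABLE tblB (link_id INTEGER, number INTEGER);
    INSERT INTO tblA VALUES
        ('james', 100), ('albert', 100), ('susan', 100),
        ('simon', 101), ('tom', 101), ('fion', 101);
    INSERT INTO tblB VALUES (100, 2), (101, 1);
    """
)

picked = conn.execute(
    """
    SELECT a.name, a.link_id
    FROM (
        SELECT name, link_id,
               -- shuffle rows within each link_id, then number them
               ROW_NUMBER() OVER (PARTITION BY link_id ORDER BY RANDOM()) AS rn
        FROM tblA
    ) AS a
    JOIN tblB AS b ON a.link_id = b.link_id
    WHERE a.rn <= b.number   -- keep only as many rows as tblB asks for
    """
).fetchall()

per_link = Counter(link_id for _, link_id in picked)
print(picked)
print(dict(per_link))
```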
Try this:
SELECT a.*
FROM b
CROSS APPLY
(
SELECT TOP (b.number) a.*
FROM a
WHERE a.link_id = b.link_id
ORDER BY
NEWID()
) a
Also see: SQLFiddle

sql select from multiple records only the most recent

I have a table named customer_age that looks like this:
ID 1 2 3 4 5 6 7 8 9
NAME JIM JIM JIM NICK NICK NICK Paul Paul Paul
VALUE 20 13 12 10 20 8 4 24 14
and I want to display only the first record for each name. Something like this:
ID 1 4 7
NAME JIM NICK Paul
VALUE 20 10 4
So far I have not been able to work it out.
I use SQL Server 2005.
Any help would be appreciated...
Try using a subselect to find the lowest ID for each name, and use that set of IDs to pull the records from the main table:
SELECT ID, Name, Value
FROM customer_age
WHERE ID IN
(
SELECT MIN(ID) AS ID
FROM customer_age
GROUP BY Name
)
Just select the first record for each name using cross apply:
SELECT
ca.ID, ca.NAME, ca.VALUE
FROM (SELECT DISTINCT NAME FROM customer_age) c
CROSS APPLY (SELECT TOP 1 ID, NAME, VALUE
FROM customer_age ca
WHERE ca.NAME = c.NAME ORDER BY ID) ca
ORDER BY ca.ID
How about using window functions??
SELECT Id, Name, Value
FROM (
SELECT Id, Name, Value, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Id ASC) AS rowNum
FROM customer_age
) AS sub
WHERE rowNum = 1
Assuming "first record" means the highest ID, you could instead order by ID descending and use TOP n.
