Query to SELECT non-repeating values from table - sql-server

I have a table structured as below:
ID Name RunDate
10001 Item 1 12/09/2013 02:11:47
10002 Item 2 12/09/2013 01:13:25
10001 Item 1 12/09/2013 01:11:37
10007 Item 7 12/08/2013 11:02:04
10001 Item 1 12/08/2013 10:25:00
My problem is that this table is sent to a distribution-group email, and with hundreds of rows it makes the email very large. What I want is to show only one record per distinct ID: the one with the most recent RunDate.
ID Name RunDate
10001 Item 1 12/09/2013 02:11:47
10002 Item 2 12/09/2013 01:13:25
10007 Item 7 12/08/2013 11:02:04
Any idea how I can do this? I'm not very good with aggregates, and whenever I've used DISTINCT it has messed up my query.
Thanks!

Group by the values that should be distinct and use MAX() to get the most recent date:
select id, name, max(rundate) as rundate
from your_table
group by id, name

This is more flexible because it doesn't require grouping by all columns:
;WITH x AS
(
SELECT ID, Name, RunDate, /* other columns, */
rn = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY RunDate DESC)
FROM dbo.TableName
)
SELECT ID, Name, RunDate /* , other columns */
FROM x
WHERE rn = 1
ORDER BY ID;
(Name doesn't really need to be grouped, and in fact shouldn't even be in this table. Besides, the next follow-up question to the GROUP BY solution is almost always, "How do I add <column x> and <column y> to the output, if they have different values and can't be added to the GROUP BY?")
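To check the ROW_NUMBER() approach against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for SQL Server here (an assumption, chosen so the snippet runs without a server), the table name runs is hypothetical, and the dates are rewritten as ISO strings so that text ordering matches chronological ordering. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question; dates rewritten in ISO format (assumption)
# so that ordering the text column also orders the rows chronologically.
rows = [
    (10001, "Item 1", "2013-12-09 02:11:47"),
    (10002, "Item 2", "2013-12-09 01:13:25"),
    (10001, "Item 1", "2013-12-09 01:11:37"),
    (10007, "Item 7", "2013-12-08 11:02:04"),
    (10001, "Item 1", "2013-12-08 10:25:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (ID INTEGER, Name TEXT, RunDate TEXT)")  # hypothetical table name
conn.executemany("INSERT INTO runs VALUES (?, ?, ?)", rows)

# Number the rows of each ID from newest to oldest and keep only the newest.
latest_per_id = conn.execute(
    """
    WITH x AS (
        SELECT ID, Name, RunDate,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY RunDate DESC) AS rn
        FROM runs
    )
    SELECT ID, Name, RunDate
    FROM x
    WHERE rn = 1
    ORDER BY ID
    """
).fetchall()

for row in latest_per_id:
    print(row)
```

On this data the query returns one row per ID (10001, 10002, 10007), each with its most recent RunDate, which matches the desired output in the question.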

Related

Finding a difference then the largest value over time

How do you get the row that gained most value over a period of time out of the large group set?
I've seen some overly-complicated variations on this question, and none with a good answer. I've tried to put together the simplest possible example:
Given a table like the one below, with row#, ID, year, and value columns, how would you find an ID that gained the most value and display the difference as a new column in the output?
Column A |ID |Year |Value
row 1 |322 |2012 |150,000
row 2 |322 |2013 |165,000
row 3 |344 |2012 |220,000
row 4 |344 |2013 |290,000
Desired output:
ID |Value |Value_Gained
344 |290,000 |70,000
SELECT id, year, value
FROM table
WHERE value = (SELECT MAX(value) FROM table);
The FIRST_VALUE window function lets you get the values for the last and the first year of each ID. Then it's enough to sort in descending order and keep one row using TOP(1).
SELECT TOP(1)
ID,
FIRST_VALUE([Value]) OVER(PARTITION BY [ID] ORDER BY [Year] DESC) AS [Value],
FIRST_VALUE([Value]) OVER(PARTITION BY [ID] ORDER BY [Year] DESC)
- FIRST_VALUE([Value]) OVER(PARTITION BY [ID] ORDER BY [Year]) AS [ValueGained]
FROM tab
ORDER BY [ValueGained] DESC
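A rough way to try the FIRST_VALUE idea on the four sample rows is sketched below with Python's sqlite3 module. SQLite is an assumption standing in for SQL Server, so LIMIT 1 replaces TOP(1); the aliases latest_value and value_gained are illustrative names, and the sort is on the gain because the question asks for the ID that gained the most.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (ID INTEGER, Year INTEGER, Value INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [
        (322, 2012, 150000),
        (322, 2013, 165000),
        (344, 2012, 220000),
        (344, 2013, 290000),
    ],
)

# For every row, look up the latest and earliest Value of its ID,
# subtract them, then keep the single row with the largest gain.
top_gainer = conn.execute(
    """
    SELECT ID,
           FIRST_VALUE(Value) OVER (PARTITION BY ID ORDER BY Year DESC) AS latest_value,
           FIRST_VALUE(Value) OVER (PARTITION BY ID ORDER BY Year DESC)
         - FIRST_VALUE(Value) OVER (PARTITION BY ID ORDER BY Year) AS value_gained
    FROM tab
    ORDER BY value_gained DESC
    LIMIT 1
    """
).fetchone()

print(top_gainer)
```

For this data the result is ID 344 with a latest value of 290,000 and a gain of 70,000, as in the desired output.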

Altering a QUALIFY with an additional criterion

In Snowflake I have this original query which, for a given consumer_ID, produces a list of unique store IDs.
SELECT
t.consumer_id
, t.business_id
, t.store_id
, t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.store_id ORDER BY t.campaign_id) = 1
The original purpose was to provide a list that does not duplicate store_id for a given consumer_id. Suppose now I also need to ensure this list does not duplicate business_id as well for a given consumer_ID. Is there an easy way to modify the above?
SELECT
t.consumer_id
, t.business_id
, t.store_id
, t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER
(PARTITION BY t.consumer_id
,t.store_id
,t.business_id
ORDER BY t.campaign_id) = 1
The partition by clause forms windows by the combination of all the expressions in the clause.
This will deduplicate by the combination of consumer_id, store_id, and business_id. If this is not what you need, please update with sample input and output to clarify.
So if I make up some data:
WITH campaigns_mini(consumer_id, business_id, store_id, campaign_id) as (
select * from values
(1,10,100,1000),
(1,10,100,1001),
(1,10,101,1002),
(2,20,200,2000)
)
and use your existing SQL:
SELECT
t.consumer_id
,t.business_id
,t.store_id
,t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.store_id ORDER BY t.campaign_id) = 1
I get
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |101 |1002
1 |10 |100 |1000
2 |20 |200 |2000
The store is not repeated for the consumer, but as you note, you don't want the business repeated either.
If we partition by business_id instead of store_id, we get fewer rows:
SELECT
t.consumer_id
,t.business_id
,t.store_id
,t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id ORDER BY t.campaign_id) = 1
ORDER BY 1;
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
2 |20 |200 |2000
So if we want "no repeated business_id AND no repeated store_id", the QUALIFY that Greg proposed will not help, because it keeps the first row for each distinct combination of consumer, business, and store:
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id, t.store_id ORDER BY t.campaign_id) = 1
which gives:
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
1 |10 |101 |1002
2 |20 |200 |2000
So the next thought is: why not keep only the rows that come first in both partitions?
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.store_id ORDER BY t.campaign_id) = 1
AND ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id ORDER BY t.campaign_id) = 1
which for this data works!
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
2 |20 |200 |2000
but then for this data:
WITH campaigns_mini(consumer_id, business_id, store_id, campaign_id) as (
select * from values
(1,10,100,1000),
(1,10,101,1001),
(1,20,101,1002)
)
there is only one row for business 20, and it is for store 101. But the first row for store 101 is campaign 1001, which in turn is not the first row for business 10, so both store-101 rows are discarded:
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
If we instead prune in two layers, first by business and then by store, then for this data:
select * from (
SELECT
t.consumer_id
,t.business_id
,t.store_id
,t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id ORDER BY t.campaign_id) = 1
)
QUALIFY ROW_NUMBER() OVER (PARTITION BY consumer_id, store_id ORDER BY campaign_id) = 1
it works:
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
1 |20 |101 |1002
But if you flip the order of the two QUALIFY clauses, you are back to just one row.
So, as a general problem, it cannot be solved safely for all data with this pattern.
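The order dependence of the two-layer prune can be reproduced with the sketch below. It uses Python's sqlite3 module as a stand-in for Snowflake (an assumption); SQLite has no QUALIFY, so each layer is emulated with a ROW_NUMBER() subquery and a WHERE filter. The helper name prune and its parameters are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaigns_mini "
    "(consumer_id INT, business_id INT, store_id INT, campaign_id INT)"
)
conn.executemany(
    "INSERT INTO campaigns_mini VALUES (?, ?, ?, ?)",
    [(1, 10, 100, 1000), (1, 10, 101, 1001), (1, 20, 101, 1002)],
)


def prune(first_key, second_key):
    """Keep the first campaign per (consumer, first_key), then per (consumer, second_key).

    Hypothetical helper: each QUALIFY layer is emulated with a subquery,
    and the key arguments are fixed column names, not user input.
    """
    sql = f"""
    SELECT consumer_id, business_id, store_id, campaign_id
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY consumer_id, {second_key}
                                  ORDER BY campaign_id) AS rn2
        FROM (
            SELECT *
            FROM (
                SELECT *,
                       ROW_NUMBER() OVER (PARTITION BY consumer_id, {first_key}
                                          ORDER BY campaign_id) AS rn1
                FROM campaigns_mini
            )
            WHERE rn1 = 1
        )
    )
    WHERE rn2 = 1
    ORDER BY campaign_id
    """
    return conn.execute(sql).fetchall()


business_then_store = prune("business_id", "store_id")
store_then_business = prune("store_id", "business_id")
print(business_then_store)
print(store_then_business)
```

Pruning by business first keeps two rows (campaigns 1000 and 1002), while pruning by store first keeps only campaign 1000, so the result depends on the layer order.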

Average using SQL Group by needs to omit duplicates and group by more than one column

I'm using SQL Server 2016 and I'm having an issue grouping by more than one column and finding an average while omitting duplicate rows. I have a transaction table defined as:
CREATE TABLE [dbo].[CUST_TRANSACTION](
[EXTRACT_DATE] [date] NULL,
[CUSTOMER_ID] [bigint] NULL,
[TRANS_NUMBER] [bigint] NULL,
[CATEGORY] [smallint] NULL,
[RANKING] [smallint] NULL )
Here is some data:
EXTRACT_DATE CUSTOMER_ID TRANS_NUMBER CATEGORY RANKING
10/31/2017 10001 1000101 4 100
10/31/2017 10001 1000102 4 100
10/31/2017 10002 1000201 4 200
10/31/2017 10001 1000103 5 100
10/31/2017 10003 1000301 5 300
10/31/2017 10003 1000302 5 300
10/31/2017 10004 1000401 7 500
10/31/2017 10001 1000104 8 100
The Customer_Id AND TRANS_NUMBER combo needs to be unique, but a customer_id can have 1 to Many Trans_Numbers and a Customer_Id can exist in 1 to many Categories. From the data I reviewed, the Ranking for a Customer_ID seems to be the same for a given EXTRACT_DATE. I found no NULLS in the Ranking, but I did find zeroes, so I need to exclude any zeroes from the Average.
The request is to generate a report broken down by each Category ( 1 - 15) and find the Average Ranking within that Category, but to only count a customer_id once and also find the Max Ranking with that Category. This is for a given EXTRACT_Date.
So I ran the following:
Select CATEGORY, MAX(RANKING) "Max Ranking", AVG(RANKING) "Average Ranking"
from CUST_TRANSACTION
where EXTRACT_DATE = Convert(datetime, '2017-10-31' )
and RANKING > 1
group by CATEGORY
order by CATEGORY
Generated the following output:
CATEGORY Max Ranking Average Ranking
4 200 133
5 300 233
7 500 500
8 100 100
But Category 4 should have an Average of 150 since customer_Id = 10001 has two entries and Category 5 should be = 200 since Customer_id 10003 has two entries.
When I tried to group by both Category and Customer_Id, the output included each combination of Category and Customer_Id, which is what GROUP BY does. Do I need a sub-select, or is there another approach?
Thanks
It looks like you don't care about the trans_number mappings, so you could drop that column and select the distinct remaining values in a derived table:
Select CATEGORY, MAX(RANKING) "Max Ranking", AVG(RANKING) "Average Ranking"
from ( select distinct [EXTRACT_DATE] ,
[CUSTOMER_ID] ,
[CATEGORY] ,
[RANKING] from CUST_TRANSACTION )CUST_TRANSACTION
where EXTRACT_DATE = Convert(datetime, '2017-10-31' )
and RANKING > 1
group by CATEGORY
order by CATEGORY
You can use a common table expression (CTE) to filter out duplicate customer IDs within a category. Something like this:
;with cte as (
select CATEGORY, RANKING, EXTRACT_DATE,
ROW_NUMBER() over(partition by category, customer_id order by customer_id) rn
from CUST_TRANSACTION
)
Select CATEGORY, MAX(RANKING) "Max Ranking", AVG(RANKING) "Average Ranking"
from cte --CUST_TRANSACTION
where EXTRACT_DATE = Convert(datetime, '2017-10-31' )
and RANKING > 1
and rn = 1
group by CATEGORY
order by CATEGORY
Because the average and the maximum have different requirements, you can't get both from a single column. A sub-select can deliver one column for the average and another for the maximum.
DECLARE @QUERY_DATE DATE = '2017-10-31';
Select
CATEGORY
, MAX(RANKING_detail_max) "Max Ranking"
, AVG(RANKING_detail_avg) "Average Ranking"
from (
select CATEGORY
, CUSTOMER_ID
, AVG(RANKING) RANKING_detail_avg -- one value per customer, so repeated transactions count once
, MAX(RANKING) RANKING_detail_max
from CUST_TRANSACTION
where EXTRACT_DATE = @QUERY_DATE
and RANKING > 0
group by CATEGORY, CUSTOMER_ID
) rollup
group by CATEGORY
order by CATEGORY
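To confirm that deduplicating customers gives the averages the question expects (150 for category 4 and 200 for category 5), here is a small sketch of the derived-table DISTINCT approach using Python's sqlite3 module. SQLite is an assumption standing in for SQL Server, the dates are stored as ISO text, and the filter uses RANKING > 0 to match the stated requirement of excluding zeroes. Note that SQLite's AVG returns a floating-point value, whereas SQL Server's AVG over an integer column returns an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUST_TRANSACTION ("
    "EXTRACT_DATE TEXT, CUSTOMER_ID INTEGER, TRANS_NUMBER INTEGER, "
    "CATEGORY INTEGER, RANKING INTEGER)"
)
conn.executemany(
    "INSERT INTO CUST_TRANSACTION VALUES (?, ?, ?, ?, ?)",
    [
        ("2017-10-31", 10001, 1000101, 4, 100),
        ("2017-10-31", 10001, 1000102, 4, 100),
        ("2017-10-31", 10002, 1000201, 4, 200),
        ("2017-10-31", 10001, 1000103, 5, 100),
        ("2017-10-31", 10003, 1000301, 5, 300),
        ("2017-10-31", 10003, 1000302, 5, 300),
        ("2017-10-31", 10004, 1000401, 7, 500),
        ("2017-10-31", 10001, 1000104, 8, 100),
    ],
)

# Dropping TRANS_NUMBER and taking DISTINCT rows leaves one row per
# customer and category, so each customer is counted once in the average.
report = conn.execute(
    """
    SELECT CATEGORY, MAX(RANKING), AVG(RANKING)
    FROM (
        SELECT DISTINCT EXTRACT_DATE, CUSTOMER_ID, CATEGORY, RANKING
        FROM CUST_TRANSACTION
    ) AS d
    WHERE EXTRACT_DATE = ? AND RANKING > 0
    GROUP BY CATEGORY
    ORDER BY CATEGORY
    """,
    ("2017-10-31",),
).fetchall()

for category, max_ranking, avg_ranking in report:
    print(category, max_ranking, avg_ranking)
```

The DISTINCT step relies on the observation in the question that a customer's ranking is the same across their transactions for a given extract date; if that did not hold, a customer could still contribute more than one row.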

DISTINCT and GROUP BY with SQL Server

I have the following table (SQL Server) and I'm looking for a query to select the last two rows with all fields:
order by created_at
group by / distinct type_id
id type_id some_value created_at
1 B mk2 2016-10-01 00:00:00.000
2 A mbs 2016-10-01 10:02:39.077
3 B sa 2016-10-02 10:03:08.123
4 A xc 2016-10-02 10:03:28.777
5 B q1 2016-10-03 10:04:20.920
6 A tr 2016-10-03 10:04:48.533
7 A 1a 2016-09-30 10:36:26.287
In MySQL it's an easy task, but in SQL Server every field has to be contained in either an aggregate function or the GROUP BY clause. That results in field combinations that do not exist.
Is there a way to handle this?
Thanks in advance!
Solution
Based on the comment from Andrew Deighton, I did this:
SELECT *
FROM (
SELECT
id,
type_id,
some_value,
created_at,
ROW_NUMBER()
OVER (PARTITION BY type_id
ORDER BY created_at DESC) AS row
FROM test_sql
) AS ts
WHERE row = 1
ORDER BY row
Conclusion: No need for GROUP BY and DISTINCT.

Combine two results in one row

EmpID Name Date Earn
1 A 7/1/2014 2
1 A 7/1/2014 4
1 A 7/2/2014 1
1 A 7/2/2014 2
2 B 7/1/2014 5
2 B 7/2/2014 5
I would like to combine two results in one row, as shown below. Here is my statement, but I can't work out how to get Total_Earn. Thanks.
"SELECT EmpID, Name, Date, Sum(earn) FROM employee WHERE Date between DateFrom and DateTo
GROUP BY EmpID, Name, Date"
EmpID Name Date Earn Total_Earn
1 A 7/2/2014 3 9
2 B 7/2/2014 5 10
It looks like you want the Max date and the Sum of Earn for each employee. Assuming you want one record for each ID/Name, you would do this:
select EmpID, Name, Max(Date), Sum(Earn)
from YourTableName
group by EmpID, Name
Try this. Substitute the date for whatever value you want.
SELECT table1.EmpID, table1.Name, table1.Date, table1.Earn, table2.Total_Earn
FROM
(SELECT EmpID, Name, Date, SUM(Earn) AS Earn -- earnings on the chosen date
FROM yourtablename
WHERE Date = '2014-07-02'
GROUP BY EmpID, Name, Date) table1
LEFT JOIN
(SELECT EmpID, SUM(Earn) AS Total_Earn -- earnings up to and including that date
FROM yourtablename
WHERE Date <= '2014-07-02'
GROUP BY EmpID) table2
ON table1.EmpID = table2.EmpID
This will perform two SELECTs and join their results. The first select (defined as table1) will select the employee ID and earnings for the specified date.
The second statement (defined as table2) will select the total earnings for an employee up to and including that date.
The two statements are then joined together according to the employee ID.
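The two-subquery join described above can be tried on the sample rows with the following sketch, which uses Python's sqlite3 module in place of the original database (an assumption). The table name employee comes from the question; the aliases d and t and the ISO date strings are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (EmpID INTEGER, Name TEXT, Date TEXT, Earn INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "A", "2014-07-01", 2),
        (1, "A", "2014-07-01", 4),
        (1, "A", "2014-07-02", 1),
        (1, "A", "2014-07-02", 2),
        (2, "B", "2014-07-01", 5),
        (2, "B", "2014-07-02", 5),
    ],
)

report_date = "2014-07-02"

# d: earnings per employee on the chosen date.
# t: running total per employee up to and including that date.
combined = conn.execute(
    """
    SELECT d.EmpID, d.Name, d.Date, d.Earn, t.Total_Earn
    FROM (
        SELECT EmpID, Name, Date, SUM(Earn) AS Earn
        FROM employee
        WHERE Date = ?
        GROUP BY EmpID, Name, Date
    ) AS d
    LEFT JOIN (
        SELECT EmpID, SUM(Earn) AS Total_Earn
        FROM employee
        WHERE Date <= ?
        GROUP BY EmpID
    ) AS t ON d.EmpID = t.EmpID
    ORDER BY d.EmpID
    """,
    (report_date, report_date),
).fetchall()

for row in combined:
    print(row)
```

For 7/2/2014 this yields Earn 3 and Total_Earn 9 for employee 1, and Earn 5 and Total_Earn 10 for employee 2, which matches the output requested in the question.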
