How do you get the row that gained the most value over a period of time out of a large group set?
I've seen some overly-complicated variations on this question, and none with a good answer. I've tried to put together the simplest possible example:
Given a table like the one below, with row#, ID, year, and value columns, how would you find an ID that gained the most value and display the difference as a new column in the output?
Column A  ID   Year  Value
row 1     322  2012  150,000
row 2     322  2013  165,000
row 3     344  2012  220,000
row 4     344  2013  290,000
Desired output:
ID   Value    Value_Gained
344  290,000  70,000
SELECT id, year, value
FROM table
WHERE value = (SELECT MAX(value) FROM table);
The FIRST_VALUE window function gives you, for each ID, the value in the last year and the value in the first year, so their difference is the gain. Then it's sufficient to order by that gain in descending order and keep one row using TOP(N).
SELECT TOP(1)
    ID,
    FIRST_VALUE([Value]) OVER(PARTITION BY [ID] ORDER BY [Year] DESC) AS [Value],
    FIRST_VALUE([Value]) OVER(PARTITION BY [ID] ORDER BY [Year] DESC)
    - FIRST_VALUE([Value]) OVER(PARTITION BY [ID] ORDER BY [Year]) AS [ValueGained]
FROM tab
ORDER BY [ValueGained] DESC -- order by the gain, not by the latest value
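To try the idea against the sample rows from the question, here is a minimal sketch. It assumes SQL Server 2012 or later (for FIRST_VALUE); the temp table #tab and the CTE name per_id are made up for illustration. DISTINCT collapses the per-row window results to one row per ID before TOP picks the largest gain.

```sql
-- Hypothetical temp table holding the four sample rows from the question.
CREATE TABLE #tab (ID int, [Year] int, [Value] int);
INSERT INTO #tab (ID, [Year], [Value]) VALUES
    (322, 2012, 150000),
    (322, 2013, 165000),
    (344, 2012, 220000),
    (344, 2013, 290000);

WITH per_id AS (
    -- One row per ID: latest value and the gain since the first year.
    SELECT DISTINCT
           ID,
           FIRST_VALUE([Value]) OVER (PARTITION BY ID ORDER BY [Year] DESC) AS [Value],
           FIRST_VALUE([Value]) OVER (PARTITION BY ID ORDER BY [Year] DESC)
         - FIRST_VALUE([Value]) OVER (PARTITION BY ID ORDER BY [Year])      AS Value_Gained
    FROM #tab
)
SELECT TOP (1) ID, [Value], Value_Gained
FROM per_id
ORDER BY Value_Gained DESC;
-- For the sample rows: 344, 290000, 70000
```

If several IDs can tie for the largest gain, TOP (1) WITH TIES would return all of them.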
Related
I have a table that contains customer transactions.
I need to find customers that had at least 2 transactions with Amount >= 20000 within any three consecutive days in a given month.
For example, if today is 2022/03/12, I should gather the transactions from 2022/02/13 to 2022/03/12, then check whether a customer had at least 2 transactions with Amount >= 20000 within three consecutive days.
For example, consider the table below:
Id  CustomerId  Transactiondate  Amount
1   1           2022-01-01       50000
2   2           2022-02-01       20000
3   3           2022-03-05       30000
4   3           2022-03-07       40000
5   2           2022-03-07       20000
6   4           2022-03-07       30000
7   4           2022-03-07       30000
The output should be: CustomerId = 3 and CustomerId = 4.
I wrote a query that finds these customers for a specific day, but I don't know how to find them over a whole month without using a loop.
The query for a specific day is:
WITH cte AS (
    SELECT CustomerId, Amount, TransactionDate,
           DATEADD(day, -2, TransactionDate) AS PrevDate
    FROM [Transaction] -- bracketed because TRANSACTION is a reserved word
    WHERE TransactionDate = '2022-03-12'
)
SELECT CustomerId, COUNT(*)
FROM cte
WHERE TransactionDate >= PrevDate AND TransactionDate <= TransactionDate
  AND Amount >= 20000
GROUP BY CustomerId
HAVING COUNT(*) >= 2
Hi, there are many ways to achieve this.
I think the easiest (though maybe not the best for performance) is to use the LAG function:
WITH lagged_days AS (
SELECT
ISNULL(LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
,*
FROM [Transaction] -- bracketed because TRANSACTION is a reserved word
), valid_cust_base as (
SELECT
*
FROM lagged_days
WHERE DATEPART(MONTH, lagged_dt) = DATEPART(MONTH, Transactiondate)
AND ABS(DATEDIFF(day, Transactiondate, lagged_dt)) <= 3 -- lagged_dt may be earlier or later than Transactiondate
AND Amount >= 20000
)
SELECT
CustomerID
FROM valid_cust_base
GROUP BY CustomerID
HAVING COUNT(*) >= 2
First I create a lagged TransactionDate per customer (I assume that id is incremental). Then I keep only transactions where both dates fall in the same month, the amount is >= 20000, and the date difference between the transactions is less than 4 days. Finally I select the customers who have more than 1 such transaction.
With LAG, the first value per customer is always missing, but you still need to be able to say that the 1st and 2nd transactions are within 3 days of each other. That's why I replace the first NULL value with LEAD. It doesn't matter whether you use:
ISNULL(LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
OR
ISNULL(LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
The main goal is to have, for each transaction, the closest TransactionDate.
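To also restrict the check to the one-month window from the question, and to compare each qualifying transaction only with the previous qualifying one, something along these lines might work. This is a sketch under assumptions: SQL Server 2012 or later, a made-up temp table #Transactions holding the sample rows, a made-up variable @today, and "three consecutive days" read as a date difference of at most 2.

```sql
-- Hypothetical temp table holding the sample rows from the question.
CREATE TABLE #Transactions (Id int, CustomerId int, Transactiondate date, Amount int);
INSERT INTO #Transactions (Id, CustomerId, Transactiondate, Amount) VALUES
    (1, 1, '2022-01-01', 50000),
    (2, 2, '2022-02-01', 20000),
    (3, 3, '2022-03-05', 30000),
    (4, 3, '2022-03-07', 40000),
    (5, 2, '2022-03-07', 20000),
    (6, 4, '2022-03-07', 30000),
    (7, 4, '2022-03-07', 30000);

DECLARE @today date = '2022-03-12';  -- assumed reporting date

WITH big AS (
    -- Keep only large transactions inside the window, then look at the
    -- previous large transaction of the same customer.
    SELECT CustomerId,
           Transactiondate,
           LAG(Transactiondate) OVER (PARTITION BY CustomerId
                                      ORDER BY Transactiondate, Id) AS prev_date
    FROM #Transactions
    WHERE Amount >= 20000
      AND Transactiondate >  DATEADD(MONTH, -1, @today)  -- from 2022-02-13
      AND Transactiondate <= @today
)
SELECT DISTINCT CustomerId
FROM big
WHERE DATEDIFF(DAY, prev_date, Transactiondate) <= 2;    -- within a 3-day span
-- For the sample rows this returns CustomerId 3 and 4.
```

Because the WHERE clause in the CTE runs before LAG is evaluated, small transactions and rows outside the window never become the "previous" row. Rows with no previous transaction have a NULL prev_date and drop out of the final filter.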
Let's say I have this query:
SELECT id, date, amount, cancelled
FROM transactions
Which gives me the following results:
id date amount cancelled
1 01/2019 25.10 0
1 02/2019 19.55 1
1 06/2019 20.33 0
2 10/2019 11.00 0
If there are duplicate IDs, how can I get the one with the latest date? So it would look like this:
id date amount cancelled
1 06/2019 20.33 0
2 10/2019 11.00 0
One method is with ROW_NUMBER and a common table expression like this example. In a multi-statement batch, be mindful to terminate the preceding statement with a semi-colon to avoid parsing errors.
WITH data_with_date_sequence AS (
SELECT
id
, date
, amount
, cancelled
, ROW_NUMBER() OVER(PARTITION BY id ORDER BY date DESC) AS seq
FROM dbo.SomeTable
)
SELECT
id
, date
, amount
, cancelled
FROM data_with_date_sequence
WHERE seq = 1;
Another option is the ROW_NUMBER function, which numbers the rows within each id, ordered by date descending, so the latest row per id gets position 1.
;WITH max_dates AS (
SELECT id
, date
, amount
, cancelled
, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS Position
FROM transactions
)
SELECT * FROM max_dates WHERE Position = 1
I'm trying to sum totals in a way that goes beyond a basic "group by" or "case" statement.
Here's an example dataset:
Amt Cust_id Ranking PlanType
10 1 1 Term
6 1 2 Variable
8 1 3 Variable
7 1 4 Variable
12 1 5 Term
6 1 6 Variable
10 1 7 Variable
The objective is to return the max sum where the plan type is 'Variable' and
the Ranking numbers are adjacent to each other.
So the answer to the example would be the sum of rows 2-4 which returns 21.
The answer is not the sum of all variable plan types, because row 5 is a 'Term' which breaks it apart.
So I'd like to end with a dataset like below to handle multiple groups of customers:
Amt Cust_ID
21 1
30 2
45 3
Here's where I'm stuck; this returns the wrong answer:
Create Table #tb (Amt INT, Cust_id TINYINT, Ranking INT, PlanType VARCHAR(10))
INSERT INTO #tb
VALUES (10,1,1,'Term'),
(6,1,2,'Variable'),
(8,1,3,'Variable'),
(7,1,4,'Variable'),
(12,1,5,'Term'),
(6,1,6,'Variable'),
(10,1,7,'Variable'),
(10,2,1,'Term'),
(6,2,2,'Variable'),
(7,2,4,'Variable'),
(12,2,5,'Term'),
(6,2,6,'Variable'),
(50,2,7,'Variable')
select
( SELECT SUM(Amt) FROM #tb as t2
WHERE t2.Cust_ID=t1.Cust_ID AND t2.Ranking<=t1.Ranking AND
t2.PlanType='Variable') RollingAmt
,Cust_ID, Ranking, Amt, PlanType
from #tb as t1
order by Cust_ID, Ranking
The query runs a rolling sum ordered by "Ranking" where PlanType = 'Variable'. Unfortunately, it sums all the 'Variable' rows together, which is not what I need.
When it runs into a PlanType of 'Term', it needs to start its sum over within each group.
In order to do this you need to use a gaps-and-islands technique to generate a "group id" based on consecutive runs of the same PlanType, then you can sum and sort based on that new group id.
Try this:
CREATE TABLE #data (Amt INT, Cust_id TINYINT, Ranking INT, PlanType VARCHAR(10))
INSERT INTO #data
VALUES (10,1,1,'Term'),
(6,1,2,'Variable'),
(8,1,3,'Variable'),
(7,1,4,'Variable'),
(12,1,5,'Term'),
(6,1,6,'Variable'),
(10,1,7,'Variable'),
(10,2,1,'Term'),
(6,2,2,'Variable'),
(7,2,4,'Variable'),
(12,2,5,'Term'),
(6,2,6,'Variable'),
(50,2,7,'Variable')
;WITH X AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY Cust_id,PlanType ORDER BY Ranking)
- ROW_NUMBER() OVER(PARTITION BY Cust_id ORDER BY Ranking) groupID /* Assign a groupID to consecutive runs of PlanTypes by Cust_id */
FROM #data
), Y AS
(
SELECT *, SUM(Amt) OVER(PARTITION BY Cust_id,groupID) AS AmtSum /* Sum Amt by Cust/groupID */
FROM X
WHERE PlanType='Variable'
), Z AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY Cust_id ORDER BY AmtSum DESC) AS RN /* Assign a row number (1) to highest AmtSum by Cust */
FROM Y
)
SELECT AmtSum, Cust_id
FROM Z
WHERE RN=1 /* Only select RN=1 to get highest value by cust_id/groupId */
If you are curious about how this all works, you can comment the last SELECT and do SELECT * FROM X then SELECT * FROM Y etc, to see what each step does along the way; but only one SELECT can follow the entire CTE structure.
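As an illustration of the first step, here is roughly what the group id calculation produces for customer 1. This is a sketch only: the temp table #demo and the helper columns rn_type and rn_all are made up, so that both row numbers are visible next to their difference.

```sql
-- Hypothetical temp table with customer 1's rows from the question.
CREATE TABLE #demo (Amt int, Cust_id tinyint, Ranking int, PlanType varchar(10));
INSERT INTO #demo VALUES
    (10,1,1,'Term'), (6,1,2,'Variable'), (8,1,3,'Variable'), (7,1,4,'Variable'),
    (12,1,5,'Term'), (6,1,6,'Variable'), (10,1,7,'Variable');

SELECT Ranking, PlanType, Amt,
       ROW_NUMBER() OVER (PARTITION BY Cust_id, PlanType ORDER BY Ranking) AS rn_type,
       ROW_NUMBER() OVER (PARTITION BY Cust_id ORDER BY Ranking)           AS rn_all,
       ROW_NUMBER() OVER (PARTITION BY Cust_id, PlanType ORDER BY Ranking)
     - ROW_NUMBER() OVER (PARTITION BY Cust_id ORDER BY Ranking)           AS groupID
FROM #demo
ORDER BY Ranking;

-- Ranking PlanType  Amt rn_type rn_all groupID
-- 1       Term      10  1       1       0
-- 2       Variable   6  1       2      -1
-- 3       Variable   8  2       3      -1
-- 4       Variable   7  3       4      -1
-- 5       Term      12  2       5      -3
-- 6       Variable   6  4       6      -2
-- 7       Variable  10  5       7      -2
```

Within a consecutive run of the same PlanType both row numbers increase together, so their difference stays constant. The two 'Variable' runs get group ids -1 (sum 21) and -2 (sum 16), and the larger sum is the answer for this customer.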
I have a table structured as below:
ID Name RunDate
10001 Item 1 12/09/2013 02:11:47
10002 Item 2 12/09/2013 01:13:25
10001 Item 1 12/09/2013 01:11:37
10007 Item 7 12/08/2013 11:02:04
10001 Item 1 12/08/2013 10:25:00
My problem is that this table is emailed to a distribution group, and the email gets very large because the table has hundreds of rows. What I want to achieve is to show only one record per DISTINCT ID, the one with the most recent RunDate.
ID Name RunDate
10001 Item 1 12/09/2013 02:11:47
10002 Item 2 12/09/2013 01:13:25
10007 Item 7 12/08/2013 11:02:04
Any idea how I can do this? I'm not very good with aggregate stuff, and I've tried DISTINCT but it always messes up my query.
Thanks!
Group by the values that should be distinct and use max() to get the most current date
select id, name, max(rundate) as rundate
from your_table
group by id, name
This is more flexible because it doesn't require grouping by all columns:
;WITH x AS
(
SELECT ID, Name, RunDate, /* other columns, */
rn = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY RunDate DESC)
FROM dbo.TableName
)
SELECT ID, Name, RunDate /* , other columns */
FROM x
WHERE rn = 1
ORDER BY ID;
(Since Name doesn't really need to be grouped, and in fact shouldn't even be in this table, and the next follow-up question to the GROUP BY solution is almost always, "How do I add <column x> and <column y> to the output, if they have different values and can't be added to the GROUP BY?")
I'm having a hard time getting my head around a query I'm trying to build with SQL Server 2005.
I have a table, let's call it sales:
SaleId (int) (pk) EmployeeId (int) SaleDate(datetime)
I want to produce a report listing the total number of sales by each employee for each day in a given date range.
So, for example, I want to see all sales from December 1st 2009 to December 31st 2009, with output like:
EmployeeId Dec1 Dec2 Dec3 Dec4
1 10 10 1 20
2 25 10 2 2
...etc. However, the dates need to be flexible.
I've messed around with PIVOT but can't quite seem to get it; any ideas welcome!
Here's a complete example. You can change the date range to fit your needs.
use sandbox;
create table sales (SaleId int primary key, EmployeeId int, SaleAmt float, SaleDate date);
insert into sales values (1,1,10,'2009-12-1');
insert into sales values (2,1,10,'2009-12-2');
insert into sales values (3,1,1,'2009-12-3');
insert into sales values (4,1,20,'2009-12-4');
insert into sales values (5,2,25,'2009-12-1');
insert into sales values (6,2,10,'2009-12-2');
insert into sales values (7,2,2,'2009-12-3');
insert into sales values (8,2,2,'2009-12-4');
SELECT * FROM
(SELECT EmployeeID, DATEPART(d, SaleDate) SaleDay, SaleAmt
FROM sales
WHERE SaleDate between '20091201' and '20091204'
) src
PIVOT (SUM(SaleAmt) FOR SaleDay
IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31])) AS pvt;
Results (actually 31 columns (for all possible month days) will be listed, but I'm just showing first 4):
EmployeeID 1 2 3 4
1 10 10 1 20
2 25 10 2 2
I tinkered a bit, and I think this is how you can do it with PIVOT:
select employeeid
, [2009/12/01] as Dec1
, [2009/12/02] as Dec2
, [2009/12/03] as Dec3
, [2009/12/04] as Dec4
from sales pivot (
count(saleid)
for saledate
in ([2009/12/01],[2009/12/02],[2009/12/03],[2009/12/04])
) as pvt
(this is my table:
CREATE TABLE [dbo].[sales](
[saleid] [int] NULL,
[employeeid] [int] NULL,
[saledate] [date] NULL
)
data is: 10 rows for '2009/12/01' for emp1, 25 rows for '2009/12/01' for emp2, 10 rows for '2009/12/02' for emp1, etc.)
Now, I must say, this is the first time I've used PIVOT and perhaps I'm not grasping it, but this seems pretty useless to me. I mean, what good is a crosstab if you cannot specify the columns dynamically?
EDIT: ok- dcp's answer does it. The trick is, you don't have to explicitly name the columns in the SELECT list, * will actually correctly expand to a column for the first 'unpivoted' column, and a dynamically generated column for each value that appears in the FOR..IN clause in the PIVOT construct.
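For a date range that is not known in advance, one common workaround is to build the IN list with dynamic SQL. The following is a rough sketch, not a tested production query: the temp table #sales and the variables @from, @to, @cols and @sql are made up for illustration, and it sticks to features that should be available in SQL Server 2005 (FOR XML PATH for string concatenation, QUOTENAME, sp_executesql).

```sql
-- Hypothetical temp table standing in for the sales table.
CREATE TABLE #sales (SaleId int PRIMARY KEY, EmployeeId int, SaleDate datetime);
INSERT INTO #sales VALUES (1, 1, '20091201');
INSERT INTO #sales VALUES (2, 1, '20091201');
INSERT INTO #sales VALUES (3, 1, '20091202');
INSERT INTO #sales VALUES (4, 2, '20091202');

DECLARE @from datetime, @to datetime, @cols nvarchar(max), @sql nvarchar(max);
SET @from = '20091201';
SET @to   = '20091231';

-- Build the IN list, e.g. [20091201],[20091202], from the days that have sales.
SELECT @cols = STUFF((
    SELECT ',' + QUOTENAME(d.SaleDay)
    FROM (SELECT DISTINCT CONVERT(char(8), SaleDate, 112) AS SaleDay
          FROM #sales
          WHERE SaleDate >= @from AND SaleDate < DATEADD(DAY, 1, @to)) AS d
    ORDER BY d.SaleDay
    FOR XML PATH('')
), 1, 1, '');

-- The column list cannot be a parameter, so it is concatenated into the text;
-- the date range is still passed as real parameters.
SET @sql = N'
SELECT *
FROM (SELECT EmployeeId, SaleId,
             CONVERT(char(8), SaleDate, 112) AS SaleDay
      FROM #sales
      WHERE SaleDate >= @from AND SaleDate < DATEADD(DAY, 1, @to)) AS src
PIVOT (COUNT(SaleId) FOR SaleDay IN (' + @cols + N')) AS pvt
ORDER BY EmployeeId;';

-- @cols is NULL when no sales fall in the range, so guard the call.
IF @cols IS NOT NULL
    EXEC sp_executesql @sql, N'@from datetime, @to datetime', @from = @from, @to = @to;
```

Only days that actually have sales become columns here. If every day in the range should appear, the column list would have to come from a calendar or numbers table instead. QUOTENAME is there so that the generated column names are safely bracketed.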