When pulling data for analysis, I often group the number of orders a customer has placed into ranges, such as:
1-2
3-5
6-9
10-12
13-15
I do this with a CASE expression. However, in the query results the order ranges are sorted as text, so they are listed like:
1-2
10-12
13-15
3-5
6-9
This is easy to correct in Excel when you have one query and a few order range groups. However, when you're running many queries, it's a pain to correct this over and over.
What is the best way to pull a range and have it ordered correctly?
Here's an example of the query I would write:
SELECT
OrderRange = CASE
WHEN COUNT(OrderID) BETWEEN 1 AND 5 THEN '1-5'
WHEN COUNT(OrderID) BETWEEN 6 AND 10 THEN '6-10'
WHEN COUNT(OrderID) > 10 THEN '10+'
ELSE 'Error'
END
FROM Orders
GROUP BY CASE
WHEN COUNT(OrderID) BETWEEN 1 AND 5 THEN '1-5'
WHEN COUNT(OrderID) BETWEEN 6 AND 10 THEN '6-10'
WHEN COUNT(OrderID) > 10 THEN '10+'
ELSE 'Error'
END
ORDER BY... ?
I'd keep a table of ranges, e.g. (indexes omitted):
CREATE TABLE Ranges (RangeSet int, MinVal int, MaxVal int, Name varchar(50));
and then e.g.
INSERT INTO ranges VALUES
(1,1,5,'1-5'),(1,6,10,'6-10'),(1,11,-1,'11+'),
(2,1,10,'1-10'),(2,11,20,'11-20'),(2,21,30,'21-30'),(2,31,-1,'31+');
You get the idea. Now you write something like this (table and field names are made up):
SELECT
CustomerID,
count(OrderID) AS OrderCount
FROM Orders
WHERE <whatever, e.g order_date BETWEEN ... AND ...>
GROUP BY CustomerID
HAVING COUNT(OrderID)>0
as you would normally expect, but wrap it in an outer query that joins to the Ranges table:
SELECT
BaseView.CustomerID as CustomerID,
Ranges.Name as OrderRange
FROM (
SELECT
CustomerID,
count(OrderID) AS OrderCount
FROM Orders
WHERE <whatever, e.g order_date BETWEEN ... AND ...>
GROUP BY CustomerID
HAVING COUNT(OrderID)>0
) AS BaseView
INNER JOIN Ranges ON
Ranges.RangeSet=<id-of-required-rangeset>
AND BaseView.OrderCount>=Ranges.MinVal
AND (BaseView.OrderCount<=Ranges.MaxVal OR Ranges.MaxVal=-1)
ORDER BY Ranges.MinVal DESC
;
Now you just have to supply the RangeSet you want to apply, maybe creating a new one on occasion.
Disclaimer: This is a performance-killer
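As a rough check of the ranges-table idea, here is a minimal sketch that runs it in SQLite through Python's sqlite3 module as a stand-in for SQL Server. The customers A to D and their order counts are invented sample data, and the extra GROUP BY that counts customers per range is an assumption about what the report needs. The point it shows is that sorting by the numeric MinVal, rather than by the text label, gives the natural range order.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT);
CREATE TABLE Ranges (RangeSet INT, MinVal INT, MaxVal INT, Name TEXT);
INSERT INTO Ranges VALUES (1,1,5,'1-5'),(1,6,10,'6-10'),(1,11,-1,'11+');
""")

# Hypothetical sample data: customer -> number of orders placed.
order_counts = {"A": 2, "B": 7, "C": 12, "D": 3}
for customer, n in order_counts.items():
    conn.executemany("INSERT INTO Orders (CustomerID) VALUES (?)", [(customer,)] * n)

# Count customers per range and sort by the numeric lower bound, not the label.
range_rows = conn.execute("""
    SELECT r.Name, COUNT(*) AS Customers
    FROM (SELECT CustomerID, COUNT(OrderID) AS OrderCount
          FROM Orders
          GROUP BY CustomerID) AS b
    JOIN Ranges AS r
      ON r.RangeSet = ?
     AND b.OrderCount >= r.MinVal
     AND (b.OrderCount <= r.MaxVal OR r.MaxVal = -1)
    GROUP BY r.Name, r.MinVal
    ORDER BY r.MinVal
""", (1,)).fetchall()

print(range_rows)  # [('1-5', 2), ('6-10', 1), ('11+', 1)]
```

Sorting by the label alone would put '11+' before '6-10', which is the problem described in the question.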
If I'm understanding you correctly, you want the list of customers and order ranges ordered from lowest to highest. You should be able to do that by just ordering by COUNT(OrderID):
SELECT CustomerID,
OrderRange = CASE
WHEN COUNT(OrderID) BETWEEN 1 AND 5 THEN '1-5'
WHEN COUNT(OrderID) BETWEEN 6 AND 10 THEN '6-10'
WHEN COUNT(OrderID) > 10 THEN '10+'
ELSE 'Error'
END
FROM Orders
GROUP BY CustomerID
order by count(orderid)
Results:
CustomerId OrderRange
CENTC 1-5
GROSR 1-5
LAZYK 1-5
...
ROMEY 1-5
VINET 1-5
ALFKI 6-10
CACTU 6-10
...
VICTE 6-10
WANDK 6-10
BLONP 10+
GREAL 10+
RICAR 10+
...
QUICK 10+
ERNSH 10+
SAVEA 10+
I have a table that contains customer transactions.
I need to find customers that had at least 2 transactions with amount >= 20000 within three consecutive days in each month.
For example, if today is 2022/03/12, I should gather the transactions from 2022/02/13 to 2022/03/12, then check whether a customer had at least 2 transactions with amount >= 20000 within three consecutive days.
For example, consider the table below:
Id CustomerId Transactiondate Amount
1 1 2022-01-01 50000
2 2 2022-02-01 20000
3 3 2022-03-05 30000
4 3 2022-03-07 40000
5 2 2022-03-07 20000
6 4 2022-03-07 30000
7 4 2022-03-07 30000
The output should be: CustomerId = 3 and CustomerId = 4
I wrote a query that finds such customers for a specific day, but I don't know how to find them across a whole month without using a loop.
The query for a specific day is:
With cte as (select customerid, amount, TransactionDate,Dateadd(day,-2,TransactionDate) as PrevDate
From [Transaction]
Where TransactionDate='2022-03-12')
Select CustomerId,Count(*)
From Cte
Where
TransactionDate>=Prevdate and TransactionDate<=TransactionDate
And Amount>=20000
Group By CustomerId
Having count(*)>=2
Hi, there are many ways to achieve this.
I think the easiest (though maybe not the fastest) is to use the LAG function:
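WITH lagged_days AS (
SELECT
-- previous transaction date per customer; the first row falls back to the next one
ISNULL(LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
,*
FROM [Transaction] -- TRANSACTION is a reserved word, so it is bracketed
), valid_cust_base as (
SELECT
*
FROM lagged_days
WHERE DATEPART(MONTH, lagged_dt) = DATEPART(MONTH, Transactiondate)
-- ABS because lagged_dt can be earlier or later than Transactiondate
AND ABS(datediff(day, Transactiondate, lagged_dt)) <= 3
AND Amount >= 20000
)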
SELECT
CustomerID
FROM valid_cust_base
GROUP BY CustomerID
HAVING COUNT(*) >= 2
First I create the lagged TransactionDate per customer (I assume that id is incremental). Then I select only transactions within the same month, with amount >= 20000, and where the date difference between transactions is less than 4 days. Finally I select the customers who have more than 1 such transaction.
With LAG, the first value per customer is always missing, but you still need to be able to say that the 1st and 2nd transactions are within 3 days. That's why I replace the first NULL value with LEAD. It doesn't matter whether you use:
ISNULL(LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
OR
ISNULL(LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
The main goal is to have, for each transaction, the closest neighbouring TransactionDate.
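To sanity-check the idea against the sample table from the question, here is a minimal sketch in SQLite through Python's sqlite3 module, used as a stand-in for SQL Server. It is a variant rather than the exact query above: it filters on the amount and the date range first, so a plain LAG is enough and no LEAD fallback is needed. It assumes "three consecutive days" means the two dates are at most 2 days apart, the table name Transactions is made up, and julianday() plays the role of DATEDIFF.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Transactions (
    Id INTEGER, CustomerId INTEGER, TransactionDate TEXT, Amount INTEGER)""")
conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?, ?)", [
    (1, 1, "2022-01-01", 50000),
    (2, 2, "2022-02-01", 20000),
    (3, 3, "2022-03-05", 30000),
    (4, 3, "2022-03-07", 40000),
    (5, 2, "2022-03-07", 20000),
    (6, 4, "2022-03-07", 30000),
    (7, 4, "2022-03-07", 30000),
])

# Keep only large transactions inside the reporting window, then compare each
# one with the previous large transaction of the same customer.
qualifying = conn.execute("""
    WITH big AS (
        SELECT CustomerId,
               TransactionDate,
               LAG(TransactionDate) OVER (
                   PARTITION BY CustomerId
                   ORDER BY TransactionDate, Id) AS PrevDate
        FROM Transactions
        WHERE Amount >= 20000
          AND TransactionDate BETWEEN ? AND ?
    )
    SELECT DISTINCT CustomerId
    FROM big
    WHERE PrevDate IS NOT NULL
      AND julianday(TransactionDate) - julianday(PrevDate) <= 2
    ORDER BY CustomerId
""", ("2022-02-13", "2022-03-12")).fetchall()

print([row[0] for row in qualifying])  # [3, 4]
```

Filtering before the window function means every PrevDate already belongs to a qualifying transaction, so one close pair is enough to report the customer.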
I want to know who has the most friends in the app I own (based on transactions), where a friend is any user he paid or was paid by.
I can't get the query to show only those who have the maximum number of friends (it can be one user or many, and it can change, so I can't use LIMIT).
;with relationships as
(
select
paid as 'auser',
Member_No as 'afriend'
from Payments$
union all
select
member_no as 'auser',
paid as 'afriend'
from Payments$
),
DistinctRelationships AS (
SELECT DISTINCT *
FROM relationships
)
select
afriend,
count(*) cnt
from DistinctRelationShips
GROUP BY
afriend
order by
count(*) desc
I just can't figure it out. I've tried count, max(count), where = max, and nothing worked.
It's a two-column table, "Member_No" and "Paid": the member pays the money, and the paid is the one who received the money.
Member_No Paid
14 18
17 1
12 20
12 11
20 8
6 3
2 4
9 20
8 10
5 20
14 16
5 2
12 1
14 10
It's from Excel, but I loaded it into sql-server.
It's just a sample, there are 1000 more rows
It seems like you are massively over-complicating this. There is no need for self-joining.
Just unpivot each row so you have both sides of the relationship, then group by one side and count the distinct values of the other side:
SELECT
-- for just the first then SELECT TOP (1)
-- for all that tie for the top place use SELECT TOP (1) WITH TIES
v.Id,
Relationships = COUNT(DISTINCT v.Other),
TotalTransactions = COUNT(*)
FROM Payments$ p
CROSS APPLY (VALUES
(p.Member_No, p.Paid),
(p.Paid, p.Member_No)
) v(Id, Other)
GROUP BY
v.Id
ORDER BY
COUNT(DISTINCT v.Other) DESC;
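For a quick check on the sample rows from the question, here is a minimal sketch in SQLite through Python's sqlite3 module as a stand-in for SQL Server. SQLite has neither CROSS APPLY nor TOP ... WITH TIES, so this sketch swaps in UNION ALL for the unpivot and a comparison against MAX() to keep every user tied for first place. The table name Payments (without the $) is an assumption.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payments (Member_No INTEGER, Paid INTEGER)")
conn.executemany("INSERT INTO Payments VALUES (?, ?)", [
    (14, 18), (17, 1), (12, 20), (12, 11), (20, 8), (6, 3), (2, 4),
    (9, 20), (8, 10), (5, 20), (14, 16), (5, 2), (12, 1), (14, 10),
])

# Unpivot both directions, count distinct counterparts per user,
# then keep every user whose count equals the maximum.
top_users = conn.execute("""
    WITH pairs AS (
        SELECT Member_No AS Id, Paid AS Other FROM Payments
        UNION ALL
        SELECT Paid, Member_No FROM Payments
    ),
    counts AS (
        SELECT Id, COUNT(DISTINCT Other) AS Relationships
        FROM pairs
        GROUP BY Id
    )
    SELECT Id, Relationships
    FROM counts
    WHERE Relationships = (SELECT MAX(Relationships) FROM counts)
    ORDER BY Id
""").fetchall()

print(top_users)  # [(20, 4)]
```

On the 14 sample rows, user 20 has four distinct counterparts (12, 8, 9 and 5), and no other user reaches that count.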
I'm using SQL Server 2016 and I'm having an issue grouping by more than one col and finding an average while omitting duplicate rows. I have a transaction table defined as:
CREATE TABLE [dbo].[CUST_TRANSACTION](
[EXTRACT_DATE] [date] NULL,
[CUSTOMER_ID] [bigint] NULL,
[TRANS_NUMBER] [bigint] NULL,
[CATEGORY] [smallint] NULL,
[RANKING] [smallint] NULL )
Here is some data:
EXTRACT_DATE CUSTOMER_ID TRANS_NUMBER CATEGORY RANKING
10/31/2017 10001 1000101 4 100
10/31/2017 10001 1000102 4 100
10/31/2017 10002 1000201 4 200
10/31/2017 10001 1000103 5 100
10/31/2017 10003 1000301 5 300
10/31/2017 10003 1000302 5 300
10/31/2017 10004 1000401 7 500
10/31/2017 10001 1000104 8 100
The Customer_Id AND TRANS_NUMBER combo needs to be unique, but a customer_id can have 1 to Many Trans_Numbers and a Customer_Id can exist in 1 to many Categories. From the data I reviewed, the Ranking for a Customer_ID seems to be the same for a given EXTRACT_DATE. I found no NULLS in the Ranking, but I did find zeroes, so I need to exclude any zeroes from the Average.
The request is to generate a report broken down by each Category ( 1 - 15) and find the Average Ranking within that Category, but to only count a customer_id once and also find the Max Ranking with that Category. This is for a given EXTRACT_Date.
So I ran the following:
Select CATEGORY, MAX(RANKING) "Max Ranking", AVG(RANKING) "Average Ranking"
from CUST_TRANSACTION
where EXTRACT_DATE = Convert(datetime, '2017-10-31' )
and RANKING > 1
group by CATEGORY
order by CATEGORY
Generated the following output:
CATEGORY Max Ranking Average Ranking
4 200 133
5 300 233
7 500 500
8 100 100
But Category 4 should have an Average of 150 since customer_Id = 10001 has two entries and Category 5 should be = 200 since Customer_id 10003 has two entries.
When I tried to Group by both Category, Customer_Id, the output includes each combination of Category and Customer_Id, which is what Group by does. So I'm not sure if I need a sub-select or any other ideas?
Thanks
It looks like you don't care about the trans_number mappings, so you could drop that column and select the distinct remaining values in a derived table:
Select CATEGORY, MAX(RANKING) "Max Ranking", AVG(RANKING) "Average Ranking"
from ( select distinct [EXTRACT_DATE] ,
[CUSTOMER_ID] ,
[CATEGORY] ,
[RANKING] from CUST_TRANSACTION )CUST_TRANSACTION
where EXTRACT_DATE = Convert(datetime, '2017-10-31' )
and RANKING > 1
group by CATEGORY
order by CATEGORY
You can use Common Table Expression (CTE) to filter out duplicate customerID in a category. Something like this.
;with cte as (
select CATEGORY, RANKING, EXTRACT_DATE,
-- partition by EXTRACT_DATE too, so rn = 1 exists for every date
ROW_NUMBER() over(partition by EXTRACT_DATE, category, customer_id order by customer_id) rn
from CUST_TRANSACTION
)
Select CATEGORY, MAX(RANKING) "Max Ranking", AVG(RANKING) "Average Ranking"
from cte --CUST_TRANSACTION
where EXTRACT_DATE = Convert(datetime, '2017-10-31' )
and RANKING > 1
and rn = 1
group by CATEGORY
order by CATEGORY
Because the overall average and the maximum have different requirements, you can't use a single column to get both. A sub-select delivers one column for averaging and another for the maximum.
DECLARE @QUERY_DATE DATE = '2017-10-31';
Select
CATEGORY
, MAX(RANKING_detail_max) "Max Ranking"
, AVG(RANKING_detail_avg) "Average Ranking"
from (
-- one row per customer and category, so each customer counts once
select CATEGORY
, CUSTOMER_ID
, AVG(RANKING) RANKING_detail_avg
, MAX(RANKING) RANKING_detail_max
from CUST_TRANSACTION
where EXTRACT_DATE = @QUERY_DATE
and RANKING > 0
group by CATEGORY, CUSTOMER_ID
) rollup
group by CATEGORY
order by CATEGORY
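To compare the expected numbers from the question with an actual run, here is a minimal sketch of the derived-table idea in SQLite through Python's sqlite3 module, as a stand-in for SQL Server. Dates are stored as ISO text, which is an assumption made for the sketch, and zero rankings are excluded as the question requests.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE CUST_TRANSACTION (
    EXTRACT_DATE TEXT, CUSTOMER_ID INTEGER, TRANS_NUMBER INTEGER,
    CATEGORY INTEGER, RANKING INTEGER)""")
conn.executemany("INSERT INTO CUST_TRANSACTION VALUES (?, ?, ?, ?, ?)", [
    ("2017-10-31", 10001, 1000101, 4, 100),
    ("2017-10-31", 10001, 1000102, 4, 100),
    ("2017-10-31", 10002, 1000201, 4, 200),
    ("2017-10-31", 10001, 1000103, 5, 100),
    ("2017-10-31", 10003, 1000301, 5, 300),
    ("2017-10-31", 10003, 1000302, 5, 300),
    ("2017-10-31", 10004, 1000401, 7, 500),
    ("2017-10-31", 10001, 1000104, 8, 100),
])

# Collapse to one row per (date, customer, category, ranking) before
# aggregating, so a customer with several transactions counts only once.
category_stats = conn.execute("""
    SELECT CATEGORY, MAX(RANKING) AS MaxRanking, AVG(RANKING) AS AvgRanking
    FROM (SELECT DISTINCT EXTRACT_DATE, CUSTOMER_ID, CATEGORY, RANKING
          FROM CUST_TRANSACTION
          WHERE EXTRACT_DATE = ? AND RANKING > 0) AS d
    GROUP BY CATEGORY
    ORDER BY CATEGORY
""", ("2017-10-31",)).fetchall()

print(category_stats)
# [(4, 200, 150.0), (5, 300, 200.0), (7, 500, 500.0), (8, 100, 100.0)]
```

Categories 4 and 5 now average 150 and 200, matching what the question expects. Note that SQLite's AVG returns a float, whereas SQL Server's AVG over a smallint column returns an integer.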
This is the input table:
Customer_ID Date Amount
1 4/11/2014 20
1 4/13/2014 10
1 4/14/2014 30
1 4/18/2014 25
2 5/15/2014 15
2 6/21/2014 25
2 6/22/2014 35
2 6/23/2014 10
There is information pertaining to multiple customers and I want to get a rolling sum across a 3 day window for each customer.
The solution should be as below:
Customer_ID Date Amount Rolling_3_Day_Sum
1 4/11/2014 20 20
1 4/13/2014 10 30
1 4/14/2014 30 40
1 4/18/2014 25 25
2 5/15/2014 15 15
2 6/21/2014 25 25
2 6/22/2014 35 60
2 6/23/2014 10 70
The biggest issue is that I don't have transactions for every day, so the row-number based partitioning doesn't work.
The closest example I found on SO was:
SQL Query for 7 Day Rolling Average in SQL Server
but even in that case there were transactions every day, which accommodated the ROW_NUMBER()-based solutions.
The rownumber query is as follows:
select customer_id, Date, Amount,
Rolling_3_day_sum = CASE WHEN ROW_NUMBER() OVER (partition by customer_id ORDER BY Date) > 2
THEN SUM(Amount) OVER (partition by customer_id ORDER BY Date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
END
from #tmp_taml9
order by customer_id
I was wondering if there is way to replace "BETWEEN 2 PRECEDING AND CURRENT ROW" by "BETWEEN [DATE - 2] and [DATE]"
One option would be to use a calendar table (or something similar) to get the complete range of dates and left join your table with that and use the row_number based solution.
Another option that might work (not sure about performance) would be to use an apply query like this:
select customer_id, Date, Amount, coalesce(Rolling_3_day_sum, Amount) Rolling_3_day_sum
from #tmp_taml9 t1
cross apply (
select sum(amount) Rolling_3_day_sum
from #tmp_taml9
where Customer_ID = t1.Customer_ID
and datediff(day, date, t1.date) <= 2
and t1.Date >= date
) o
order by customer_id;
I suspect performance might not be great though.
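To check the date-based window against the expected output in the question, here is a minimal sketch in SQLite through Python's sqlite3 module, as a stand-in for SQL Server. A correlated subquery plays the role of the apply query, julianday() replaces DATEDIFF, the table name Tx is made up, and the dates are rewritten in ISO format so they compare correctly as text. The window is assumed to cover the current day and the two days before it, which is what the expected sums imply.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tx (Customer_ID INTEGER, Date TEXT, Amount INTEGER)")
conn.executemany("INSERT INTO Tx VALUES (?, ?, ?)", [
    (1, "2014-04-11", 20), (1, "2014-04-13", 10),
    (1, "2014-04-14", 30), (1, "2014-04-18", 25),
    (2, "2014-05-15", 15), (2, "2014-06-21", 25),
    (2, "2014-06-22", 35), (2, "2014-06-23", 10),
])

# For each row, sum the same customer's amounts dated from two days
# earlier up to and including the row's own date.
rolling = conn.execute("""
    SELECT t1.Customer_ID, t1.Date, t1.Amount,
           (SELECT SUM(t2.Amount)
            FROM Tx AS t2
            WHERE t2.Customer_ID = t1.Customer_ID
              AND t2.Date <= t1.Date
              AND julianday(t1.Date) - julianday(t2.Date) <= 2) AS Rolling_3_Day_Sum
    FROM Tx AS t1
    ORDER BY t1.Customer_ID, t1.Date
""").fetchall()

print([row[3] for row in rolling])  # [20, 30, 40, 25, 15, 25, 60, 70]
```

The sums match the Rolling_3_Day_Sum column in the question, including 40 for 4/14/2014, where the 4/11 transaction falls outside the window.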
I have say the following rows
Country Population
IE 30
IE 20
UK 15
DE 20
DE 10
UK 20
BE 5
So basically I want to net the values together only for IE and DE... the rest I just want the values
So this would sum them all ..
Select Country, Sum(Population) From CountryPopulation group by Country
and I can add a where clause to exclude all other countries except IE and DE... but I also want these in the result set but just not summed.
So the table above would look like this when summed
Country Population
IE 50 -- Summed
UK 15 -- Original Value
DE 30 -- Summed
UK 20 -- Original Value
BE 5 -- Original Value
The problem is that I can't get a SUM with IF or CASE to work, because the query has to be aggregated by GROUP BY. The only other ways I can think of are to
sum all the IE and DE rows and union the result with the rest of the data,
or
maybe use a CTE.
Is there a nice, slick way of doing this?
Select case when Country in ('IE','DE')
then 'IE_DE'
else Country
end as Country, Sum(Population) as Population
From CountryPopulation
group by case when Country in ('IE','DE')
then 'IE_DE'
else Country
end
declare @t table (Country char(2), Population int)
insert into @t (Country, Population) values
('IE',30),
('IE',20),
('UK',15),
('DE',20),
('DE',10),
('UK',20),
('BE',5 )
; With Ordered as (
select Country,Population,CASE
WHEN Country in ('IE','DE') THEN 1
ELSE ROW_NUMBER() OVER (ORDER BY Country)
END as rn
from @t
)
)
select Country,rn,SUM(Population)
from Ordered
group by Country,rn
Produces:
Country rn
------- -------------------- -----------
BE 1 5
DE 1 30
IE 1 50
UK 6 15
UK 7 20
The trick is to just introduce a unique value for each row, except for the IE and DE rows that all get a 1. If the source rows all, actually, already have such a unique value then the CTE can be simplified (or avoided, at the expense of having to place the CASE expression in the GROUP BY as well as the SELECT)
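The simplified form mentioned above, where the source rows already carry a unique value, might look like the following minimal sketch. It runs in SQLite through Python's sqlite3 module as a stand-in for SQL Server. The Id column is hypothetical (it is not in the question's table), and the sketch assumes real Id values are never 0, since 0 is used as the shared grouping key for IE and DE.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
# Id is a hypothetical unique column added for this sketch.
conn.execute("""CREATE TABLE CountryPopulation (
    Id INTEGER PRIMARY KEY, Country TEXT, Population INTEGER)""")
conn.executemany(
    "INSERT INTO CountryPopulation (Country, Population) VALUES (?, ?)",
    [("IE", 30), ("IE", 20), ("UK", 15), ("DE", 20),
     ("DE", 10), ("UK", 20), ("BE", 5)],
)

# IE and DE rows share the key 0 and collapse into one group per country;
# every other row keeps its own Id and therefore stays a separate group.
netted = conn.execute("""
    SELECT Country, SUM(Population) AS Population
    FROM CountryPopulation
    GROUP BY Country,
             CASE WHEN Country IN ('IE', 'DE') THEN 0 ELSE Id END
    ORDER BY MIN(Id)
""").fetchall()

print(netted)  # [('IE', 50), ('UK', 15), ('DE', 30), ('UK', 20), ('BE', 5)]
```

Ordering by MIN(Id) keeps each group at the position of its first source row, which reproduces the row order shown in the question.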
You could also use UNION ALL and divide this query into two:
SELECT P.country,
P.population
FROM (SELECT country,
Population = Sum(population)
FROM dbo.countrypopulation cp
WHERE country IN ( 'IE', 'DE' )
GROUP BY country
UNION ALL
SELECT country, population
FROM dbo.countrypopulation cp
WHERE country NOT IN ( 'IE', 'DE' )
) P
ORDER BY P.population DESC
Even if this is not so concise it is readable and efficient.