Use percentile_cont with a "group by" statment in T-SQL - sql-server

I'd like to use the percentile_cont function to get median values in T-SQL. However, I also need to get mean values as well. I'd like to do something like the following:
SELECT CustomerID ,
AVG(Expenditure) AS MeanSpend , percentile_cont
( .5) WITHIN GROUP(ORDER BY Expenditure) OVER( ) AS MedianSpend
FROM Customers
GROUP BY CustomerID
Can this be accomplished? I know I can use the OVER clause to group the percentile_cont results...
but then I'm stuck using two queries, am I not?

Just figured it out... gotta drop the group by and give both aggregation functions a over statement.
SELECT CustomerID,
AVG(Expenditure) OVER(PARTITION BY CustomerID) AS MeanSpend,
percentile_cont(.5) WITHIN GROUP(ORDER BY Expenditure) OVER(PARTITION BY CustomerID) AS MedianSpend
FROM Customers

You can't use "group by" with window functions. These functions return the aggregated values for every row. One way is to use "select distinct" to get rid of the duplicate rows. Just make sure you partition each window function by the non-aggregated columns (groupId in this example).
--Generate test data
SELECT TOP(10)
value.number%3 AS groupId
, value.number AS number
INTO #data
FROM master.dbo.spt_values AS value
WHERE value."type" = 'P'
ORDER BY NEWID()
;
--View test data
SELECT * FROM #data ORDER BY groupId,number;
--CALCULATE MEDIAN
SELECT DISTINCT
groupId
, AVG(number) OVER(PARTITION BY groupId) AS mean
, percentile_cont(.5) WITHIN GROUP(ORDER BY number) OVER(PARTITION BY groupId) AS median
FROM #data
;
--Clean up
DROP TABLE #data;

Related

SQL Server window function implementation issue

I have a table structure like below:
I have the following query to get the unique result from the table:
WITH Dupes AS
(
SELECT
ID, Template_ID, Address, Job_Number, Other_Info,
Assigned_By, Assignees, Active, seen,
ROW_NUMBER() OVER (PARTITION BY Template_ID,Job_Number ORDER BY ID) AS RowNum
FROM
Schedule
WHERE
Assignees IN ('9', '16', '22')
)
SELECT
ID, Template_ID, Job_Number, Address, Other_Info,
Assigned_By, Assignees, Active, seen
FROM
Dupes
WHERE
RowNum = 1
Output of the above query is:
If the Job_Number and Template_ID are same, only return one row(first row using ID). That is why I did use ROW_NUMBER() OVER(PARTITION BY Template_ID,Job_Number ORDER BY ID) AS RowNum. I am not sure how to fix this as I rarely used this function.
I need to get the output like below:
Updated Code
Tried the code below:
seems your trying to group by Job_Number, remove Template_ID on your partition by clause
WITH Dupes AS
(
SELECT ID,Template_ID,Address,Job_Number,Other_Info,Assigned_By,Assignees,Active,seen,
ROW_NUMBER() OVER(PARTITION BY rtrim(ltrim(Job_Number)) ORDER BY ID) AS RowNum
FROM Schedule
WHERE Assignees IN('9','16','22')
)
SELECT ID,Template_ID,Job_Number,Address,Other_Info,Assigned_By,Assignees, Active,seen FROM Dupes WHERE RowNum=1

Column 'ACCOUNT.ACCOUNT_ID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

I am trying to get available balance on last(max) date. I am trying to write below query but it is showing error.
select ACCOUNT_ID,AVAIL_BALANCE,OPEN_DATE,MAX(LAST_ACTIVITY_DATE)
from ACCOUNT
group by CUST_ID;
Column 'ACCOUNT.ACCOUNT_ID' is invalid in the select list because it
is not contained in either an aggregate function or the GROUP BY
clause.
I am new to sql. Can anyone let me know where I am wrong in this query?
Any column not having a calculation/function on it must be in the GROUP BY clause.
select ACCOUNT_ID,AVAIL_BALANCE,OPEN_DATE,MAX(LAST_ACTIVITY_DATE)
from ACCOUNT
group by ACCOUNT_ID,AVAIL_BALANCE,OPEN_DATE;
If you're wanting the most recent row for each customer, think ROW_NUMBER(), not GROUP BY:
;With Numbered as (
select *,ROW_NUMBER() OVER (
PARTITION BY CUST_ID
ORDER BY LAST_ACTIVITY_DATE desc) rn
from Account
)
select ACCOUNT_ID,AVAIL_BALANCE,OPEN_DATE,LAST_ACTIVITY_DATE
from Numbered
where rn=1
I think you want to select one records having max(LAST_ACTIVITY_DATE) for each CUST_ID.
For this you can use TOP 1 WITH TIES like following.
SELECT TOP 1 WITH TIES account_id,
avail_balance,
open_date,
last_activity_date
FROM account
ORDER BY Row_number()
OVER (
partition BY cust_id
ORDER BY last_activity_date DESC)
Issue with your query is, you can't select non aggregated column in select if you don't specify those columns in group by
If you want to get the max activity date for a customer then your query should be as below
select CUST_ID, MAX(LAST_ACTIVITY_DATE)
from ACCOUNT
group by CUST_ID;
You can't select any other column which is not in the group by clause. The error message also giving the same message.
with query(CUST_ID, LAST_ACTIVITY_DATE) as
(
select
CUST_ID,
MAX(LAST_ACTIVITY_DATE) as LAST_ACTIVITY_DATE
from ACCOUNT
group by CUST_ID
)
select
a.ACCOUNT_ID,
a.AVAIL_BALANCE,
a.OPEN_DATE,
a.LAST_ACTIVITY_DATE
from ACCOUNT as a
inner join query as q
on a.CUST_ID = q.CUST_ID
and a.LAST_ACTIVITY_DATE = q.LAST_ACTIVITY_DATE

Get the id of the row with the max value with two grouping

We have a data structure with four columns:
ContractoreName, ProjectCode, InvoiceID, OrderID
We want to group the data by both ContractoreName and ProjectCode columns, and then get the InvoiceID of the row for each group with MAX(OrderID).
You could use ROW_NUMBER:
SELECT ContractorName, ProjectName, OrderId, InvoiceId
FROM (SELECT *, ROW_NUMBER() OVER(PARTITION BY ContractorName, ProjectName
ORDER BY OrderId DESC) AS rn
FROM tab
) AS sub
WHERE rn = 1;
ROW_NUMBER() is what I would call the canonical solution. In many cases, an old-fashioned solution has better performance:
select t.*
from t
where t.orderid = (select max(t2.orderid)
from t t2
where t2.contractorname = t.contractorname and
t2.projectname = t.projectname
);
This is especially true if there is an index on (contractorname, projectname, orderid).
Why is this faster? Basically, SQL Server can scan the table doing a lookup in an index. The lookup is really fast because the index is designed for it, so the scan is just a little faster than a full table scan.
When using row_number(), SQL Server has to scan the table to calculate the row number (and that can use the index, so it might be fast). But then it has to go back to the table to fetch the columns and apply the where clause. So, even if it uses an index, it is doing more work.
EDIT:
I should also point out that this can be done without a subquery:
select distinct contractorname, projectname,
max(orderid) over (partition by contractorname, projectname) as lastest_order,
first_value(invoiceid) partition by (order by contractorname, projectname order by orderid desc) as lastest_invoice
from t;
Unfortunately, SQL Server doesn't offer first_value() as an aggregation function, but you can use select distinct and get the same effect.

Select a random database row based on another query

For internal control we would like to select a single random invoice for each of multiple invoice types and regions.
Here's the SQL to get a set of distinct Invoice Types and Regions
select InvoiceType,RegionID
from Invoices
group by InvoiceType, RegionID
For each row this returns I need to fetch a random row with that InvoiceType and RegionID. This is how I'm fetching random rows:
SELECT top 1
CustomerID
,InvoiceNum
,Name
FROM Invoices
JOIN Customers on Customers.CustomerID=Invoices.CustomerID
where InvoiceType=X and RegionID=Y
ORDER BY NEWID
But I don't know how to run this select statement foreach() row the first statement returns. I could do it programmatically but I would prefer an option using only a stored procedure as this query isn't supposed to need a program.
WITH cteInvoices AS (
SELECT CustomerID, InvoiceNum, Name,
ROW_NUMBER() OVER(PARTITION BY InvoiceType, RegionID ORDER BY NEWID()) AS RowNum
FROM Invoices
)
SELECT c.CustomerID, c.InvoiceNum, c.Name
FROM cteInvoices c
WHERE c.RowNum = 1;

Generate Row Serial Numbers in SQL Query

I have a customer transaction table. I need to create a query that includes a serial number pseudo column. The serial number should be automatically reset and start over from 1 upon change in customer ID.
Now, I am familiar with the row_number() function in SQL. This doesnt exactly solve my problem because to the best of my knowledge the serial number will not be reset in case the order of the rows change.
I want to do this in a single query (SQL Server) and without having to go through any temporary table usage etc. How can this be done?
Sometime we might don't want to apply ordering on our result set to add serial number. But if we are going to use ROW_NUMBER() then we have to have a ORDER BY clause. So, for that we can simply apply a tricks to avoid any ordering on the result set.
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS ItemNo, ItemName FROM ItemMastetr
For that we don't need to apply order by on our result set. We'll just add ItemNo on our given result set.
select
ROW_NUMBER() Over (Order by CustomerID) As [S.N.],
CustomerID ,
CustomerName,
Address,
City,
State,
ZipCode
from Customers;
I'm not certain, based on your question if you want numbered rows that will remember their numbers even if the underlying data changes (and gives a different ordering), but if you just want numbered rows - that reset on a change in customer ID, then try using the Partition by clause of row_number()
row_number() over(partition by CustomerID order by CustomerID)
Implementing Serial Numbers Without Ordering Any of the Columns
Demo SQL Script-
IF OBJECT_ID('Tempdb..#TestTable') IS NOT NULL
DROP TABLE #TestTable;
CREATE TABLE #TestTable (Names VARCHAR(75), Random_No INT);
INSERT INTO #TestTable (Names,Random_No) VALUES
('Animal', 363)
,('Bat', 847)
,('Cat', 655)
,('Duet', 356)
,('Eagle', 136)
,('Frog', 784)
,('Ginger', 690);
SELECT * FROM #TestTable;
There are ‘N’ methods for implementing Serial Numbers in SQL Server. Hereby, We have mentioned the Simple Row_Number Function to generate Serial Numbers.
ROW_NUMBER() Function is one of the Window Functions that numbers all rows sequentially (for example 1, 2, 3, …) It is a temporary value that will be calculated when the query is run. It must have an OVER Clause with ORDER BY. So, we cannot able to omit Order By Clause Simply. But we can use like below-
SQL Script
IF OBJECT_ID('Tempdb..#TestTable') IS NOT NULL
DROP TABLE #TestTable;
CREATE TABLE #TestTable (Names VARCHAR(75), Random_No INT);
INSERT INTO #TestTable (Names,Random_No) VALUES
('Animal', 363)
,('Bat', 847)
,('Cat', 655)
,('Duet', 356)
,('Eagle', 136)
,('Frog', 784)
,('Ginger', 690);
SELECT Names,Random_No,ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS SERIAL_NO FROM #TestTable;
In the Above Query, We can Also Use SELECT 1, SELECT ‘ABC’, SELECT ” Instead of SELECT NULL. The result would be Same.
SELECT ROW_NUMBER() OVER (ORDER BY ColumnName1) As SrNo, ColumnName1, ColumnName2 FROM TableName
select ROW_NUMBER() over (order by pk_field ) as srno
from TableName
Using Common Table Expression (CTE)
WITH CTE AS(
SELECT ROW_NUMBER() OVER(ORDER BY CustomerId) AS RowNumber,
Customers.*
FROM Customers
)
SELECT * FROM CTE
I found one solution for MYSQL its easy to add new column for SrNo or kind of tepropery auto increment column by following this query:
SELECT #ab:=#ab+1 as SrNo, tablename.* FROM tablename, (SELECT #ab:= 0)
AS ab
ALTER function dbo.FN_ReturnNumberRows(#Start int, #End int) returns #Numbers table (Number int) as
begin
insert into #Numbers
select n = ROW_NUMBER() OVER (ORDER BY n)+#Start-1 from (
select top (#End-#Start+1) 1 as n from information_schema.columns as A
cross join information_schema.columns as B
cross join information_schema.columns as C
cross join information_schema.columns as D
cross join information_schema.columns as E) X
return
end
GO
select * from dbo.FN_ReturnNumberRows(10,9999)

Resources