Determining Percentile Difference from lag, lead and current row using windows function? - sql-server

Is there a better way to get the moving percentage difference using window functions, without a CTE or derived table? I wanted to fit it all in one query using window functions, but I'm having a hell of a time converting the result to a percentage. The only solution I could think of was to compute the numeric values first and then do the math against that result. It would be cool if there were a more streamlined way to do this.
WITH numberdata AS
(
SELECT
custid
,orderid
,LAG(VAL) OVER(PARTITION BY CUSTID ORDER BY ORDERID DESC) as lagval
,LEAD(VAL) OVER(PARTITION BY CUSTID ORDER BY ORDERID DESC) as leadval
,val - LAG(VAL) OVER(PARTITION BY CUSTID ORDER BY ORDERID DESC) as lagvaldiff
,val - LEAD(VAL) OVER(PARTITION BY CUSTID ORDER BY ORDERID DESC) as leadvaldiff
,val
FROM sales.ordervalues
)
select
CAST((lagval)/val AS NUMERIC(10,2)) as lagpctdiff
,CAST((leadval)/val AS NUMERIC(10,2)) as leadpctdiff
,CAST((lagvaldiff)/leadvaldiff AS NUMERIC(10,2)) as pctdiff
,val
,lagval
from numberdata
order by custid desc
This is just me studying to learn more about the code in preparation for a test. The data comes from the sales.ordervalues table located in the training db TSQL_2012.
How can I convert the leadvaldiff and lagvaldiff columns to a percentage without placing it within a CTE?

Am I missing something, or is this what you are looking for?
SELECT LAG(VAL) OVER(PARTITION BY CUSTID ORDER BY ORDERID DESC)/val as lagval
,LEAD(VAL) OVER(PARTITION BY CUSTID ORDER BY ORDERID DESC)/val as leadval
,(val - LAG(VAL) OVER(PARTITION BY CUSTID ORDER BY ORDERID DESC))/
(val - LEAD(VAL) OVER(PARTITION BY CUSTID ORDER BY ORDERID DESC)) as pctdiff
FROM sales.OrderValues;
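To see the inline approach run end to end, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The three-row table is hypothetical sample data, not the real Sales.OrderValues. The formula (val - neighbour) / neighbour * 100 is one common definition of percentage change and is an assumption about what is wanted; the query above divides the neighbour by val instead. The NULLIF guard against division by zero is also an addition. The named WINDOW clause needs SQLite 3.25+ (SQL Server only supports it from 2022 on; on older versions repeat the OVER clause as in the query above).

```python
import sqlite3

# Hypothetical stand-in for Sales.OrderValues with only the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ordervalues (custid INTEGER, orderid INTEGER, val REAL)")
conn.executemany(
    "INSERT INTO ordervalues VALUES (?, ?, ?)",
    [(1, 101, 100.0), (1, 102, 150.0), (1, 103, 120.0)],
)

# Window functions are used directly inside the arithmetic, so no CTE is needed.
# NULLIF turns a zero denominator into NULL instead of raising a divide error.
query = """
SELECT orderid,
       val,
       ROUND(100.0 * (val - LAG(val)  OVER w) / NULLIF(LAG(val)  OVER w, 0), 2) AS pct_vs_lag,
       ROUND(100.0 * (val - LEAD(val) OVER w) / NULLIF(LEAD(val) OVER w, 0), 2) AS pct_vs_lead
FROM ordervalues
WINDOW w AS (PARTITION BY custid ORDER BY orderid DESC)
ORDER BY orderid DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first and last rows of each partition have no LAG or LEAD neighbour, so the corresponding percentage comes back as NULL (None in Python).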

Related

How to group adjacent row and sum the data in SQL

I would like to sum the Value and group the adjacent row in SQL as shown below. May I know how to do that?
My code now:
Select ID, Value from Table_1
Further question
how about this?
This is a typical gaps-and-islands problem.
As a starter: keep in mind that SQL tables represent unordered sets of rows. So for your question to be solved, you need a column that defines the ordering of rows across the table. I assumed ordering_id.
Here is an approach that uses the difference between two row_number() calls to build the groups of adjacent rows having the same id:
select
id,
sum(value) value
from (
select
t.*,
row_number() over(order by ordering_id) rn1,
row_number() over(partition by id order by ordering_id) rn2
from mytable t
) t
group by id, rn1 - rn2
If you want this on a per user basis:
select
user,
id,
sum(value) value
from (
select
t.*,
row_number() over(partition by user order by ordering_id) rn1,
row_number() over(partition by user, id order by ordering_id) rn2
from mytable t
) t
group by user, id, rn1 - rn2
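To check how the row_number() difference forms the groups, here is a small sketch of the first (non per-user) query, run through Python's sqlite3 module as a stand-in for the real database. The table contents are made-up sample data, and the MIN(ordering_id) column is an addition so the islands can be listed in their original order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ordering_id INTEGER, id TEXT, value INTEGER)")
# Hypothetical data: A, A, B, A, A -> three islands (A, B, A).
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "A", 10), (2, "A", 20), (3, "B", 5), (4, "A", 7), (5, "A", 3)],
)

# rn1 counts rows overall, rn2 counts rows per id. Within a run of adjacent
# rows sharing an id both counters advance together, so rn1 - rn2 is constant.
query = """
SELECT id, SUM(value) AS value, MIN(ordering_id) AS first_row
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (ORDER BY ordering_id) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY ordering_id) AS rn2
    FROM mytable t
) t
GROUP BY id, rn1 - rn2
ORDER BY first_row
"""
islands = conn.execute(query).fetchall()
print(islands)
```

The two runs of "A" are kept apart because their rn1 - rn2 differences are 0 and 1 respectively, even though the id is the same.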

SQL Server window function implementation issue

I have a table structure like below:
I have the following query to get the unique result from the table:
WITH Dupes AS
(
SELECT
ID, Template_ID, Address, Job_Number, Other_Info,
Assigned_By, Assignees, Active, seen,
ROW_NUMBER() OVER (PARTITION BY Template_ID,Job_Number ORDER BY ID) AS RowNum
FROM
Schedule
WHERE
Assignees IN ('9', '16', '22')
)
SELECT
ID, Template_ID, Job_Number, Address, Other_Info,
Assigned_By, Assignees, Active, seen
FROM
Dupes
WHERE
RowNum = 1
Output of the above query is:
If Job_Number and Template_ID are the same, only one row should be returned (the first row by ID). That is why I used ROW_NUMBER() OVER (PARTITION BY Template_ID, Job_Number ORDER BY ID) AS RowNum. I am not sure how to fix this, as I have rarely used this function.
I need to get the output like below:
Updated Code
Tried the code below:
It seems you're trying to group by Job_Number alone, so remove Template_ID from your PARTITION BY clause:
WITH Dupes AS
(
SELECT ID,Template_ID,Address,Job_Number,Other_Info,Assigned_By,Assignees,Active,seen,
ROW_NUMBER() OVER(PARTITION BY rtrim(ltrim(Job_Number)) ORDER BY ID) AS RowNum
FROM Schedule
WHERE Assignees IN('9','16','22')
)
SELECT ID,Template_ID,Job_Number,Address,Other_Info,Assigned_By,Assignees, Active,seen FROM Dupes WHERE RowNum=1
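As a quick check of the effect of partitioning on the trimmed Job_Number only, here is a sketch using Python's sqlite3 module as a stand-in for SQL Server. The Schedule table is cut down to four columns and the rows are invented for illustration. SQLite's TRIM() plays the role of rtrim(ltrim(...)) from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Schedule (ID INTEGER, Template_ID INTEGER, Job_Number TEXT, Assignees TEXT)"
)
conn.executemany(
    "INSERT INTO Schedule VALUES (?, ?, ?, ?)",
    [
        (1, 10, "J-100", "9"),
        (2, 11, " J-100 ", "16"),  # same job, different template, stray spaces
        (3, 10, "J-200", "22"),
        (4, 12, "J-300", "5"),     # excluded by the Assignees filter
    ],
)

# Partitioning by the trimmed job number alone puts rows 1 and 2 in the same
# group, so only the lowest ID of each job survives the RowNum = 1 filter.
query = """
WITH Dupes AS (
    SELECT ID, Template_ID, Job_Number, Assignees,
           ROW_NUMBER() OVER (PARTITION BY TRIM(Job_Number) ORDER BY ID) AS RowNum
    FROM Schedule
    WHERE Assignees IN ('9', '16', '22')
)
SELECT ID, TRIM(Job_Number) AS Job_Number
FROM Dupes
WHERE RowNum = 1
ORDER BY ID
"""
kept = conn.execute(query).fetchall()
print(kept)
```

With Template_ID still in the PARTITION BY, rows 1 and 2 would land in different partitions and both would be returned.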

SQL ORDER BY clause ERROR in the ROW_NUMBER

I want to use the ROW_NUMBER() function to get the first and latest values.
I wrote the query below, but I got an error:
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.
Please help me solve the issue. The SQL query is below:
SELECT *
FROM(
SELECT OPP_ID,PRJ_ID,
ROW_NUMBER() OVER (PARTITION BY OPP_ID ORDER BY MAX(CREATION_DATE) DESC) AS RN
FROM OPPOR
GROUP BY OPP_ID,PRJ_ID
ORDER BY MAX(CREATION_DATE) DESC) OP
WHERE OP.RN = 1
The row_number function does its own partitioning and ordering through its OVER clause, so there is no need for group by or order by in your subquery (order by is not allowed in a subquery without TOP, OFFSET or FOR XML, as the error says). It is a little unclear whether you want to partition by opp_id alone or by opp_id and prj_id, but this should be what you're looking for:
SELECT *
FROM(
SELECT OPP_ID,PRJ_ID,
ROW_NUMBER() OVER (PARTITION BY OPP_ID ORDER BY CREATION_DATE DESC) AS RN
FROM OPPOR
) OP
WHERE OP.RN = 1

Get the id of the row with the max value with two grouping

We have a data structure with four columns:
ContractoreName, ProjectCode, InvoiceID, OrderID
We want to group the data by both ContractoreName and ProjectCode columns, and then get the InvoiceID of the row for each group with MAX(OrderID).
You could use ROW_NUMBER:
SELECT ContractorName, ProjectName, OrderId, InvoiceId
FROM (SELECT *, ROW_NUMBER() OVER(PARTITION BY ContractorName, ProjectName
ORDER BY OrderId DESC) AS rn
FROM tab
) AS sub
WHERE rn = 1;
ROW_NUMBER() is what I would call the canonical solution. In many cases, an old-fashioned solution has better performance:
select t.*
from t
where t.orderid = (select max(t2.orderid)
from t t2
where t2.contractorname = t.contractorname and
t2.projectname = t.projectname
);
This is especially true if there is an index on (contractorname, projectname, orderid).
Why is this faster? Basically, SQL Server can scan the table doing a lookup in an index. The lookup is really fast because the index is designed for it, so the scan is just a little faster than a full table scan.
When using row_number(), SQL Server has to scan the table to calculate the row number (and that can use the index, so it might be fast). But then it has to go back to the table to fetch the columns and apply the where clause. So, even if it uses an index, it is doing more work.
EDIT:
I should also point out that this can be done without a subquery:
select distinct contractorname, projectname,
max(orderid) over (partition by contractorname, projectname) as latest_order,
first_value(invoiceid) over (partition by contractorname, projectname order by orderid desc) as latest_invoice
from t;
Unfortunately, SQL Server doesn't offer first_value() as an aggregation function, but you can use select distinct and get the same effect.
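To confirm that the three formulations above return the same rows, here is a sketch that runs them side by side with Python's sqlite3 module (SQLite 3.25+ assumed, standing in for SQL Server). The sample rows are invented, and the outer table in the correlated version is aliased as o purely to keep the reference unambiguous. It says nothing about relative performance, which depends on the engine and indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (contractorname TEXT, projectname TEXT, invoiceid INTEGER, orderid INTEGER)"
)
# Hypothetical sample data: three (contractor, project) groups.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("Acme", "P1", 501, 1),
        ("Acme", "P1", 502, 3),
        ("Acme", "P2", 503, 2),
        ("Birch", "P1", 504, 4),
        ("Birch", "P1", 505, 5),
    ],
)

# 1) ROW_NUMBER() in a derived table, keeping the top row per group.
row_number_sql = """
SELECT contractorname, projectname, orderid, invoiceid
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY contractorname, projectname
                                   ORDER BY orderid DESC) AS rn
      FROM t) AS sub
WHERE rn = 1
ORDER BY contractorname, projectname
"""

# 2) Correlated subquery on MAX(orderid).
correlated_sql = """
SELECT o.contractorname, o.projectname, o.orderid, o.invoiceid
FROM t AS o
WHERE o.orderid = (SELECT MAX(t2.orderid)
                   FROM t AS t2
                   WHERE t2.contractorname = o.contractorname
                     AND t2.projectname = o.projectname)
ORDER BY o.contractorname, o.projectname
"""

# 3) SELECT DISTINCT over window functions, no subquery at all.
first_value_sql = """
SELECT DISTINCT contractorname, projectname,
       MAX(orderid) OVER (PARTITION BY contractorname, projectname) AS latest_order,
       FIRST_VALUE(invoiceid) OVER (PARTITION BY contractorname, projectname
                                    ORDER BY orderid DESC) AS latest_invoice
FROM t
ORDER BY contractorname, projectname
"""

results = [
    conn.execute(sql).fetchall()
    for sql in (row_number_sql, correlated_sql, first_value_sql)
]
for rows in results:
    print(rows)
```

One caveat: if two rows in a group tie on the maximum orderid, the correlated subquery returns both, while the ROW_NUMBER() version picks one of them arbitrarily.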

How to extract the latest created row for each group?

I have the table below, and I would like to group the data by Opportunity_Id and pick the row with the latest CreatedDate for each Opportunity_Id. I currently use a local variable and a table type to load the rows case by case. Is there a more efficient way, such as a single set-based query, to achieve the same result? Thanks heaps.
SELECT *
FROM (
SELECT *,
row_number() over (partition by opportunity_id
order by Created_Date desc) rn
FROM yourTable
) T
WHERE T.rn = 1
