Calculate percentage based on row_number in sql - sql-server

I'm trying to write a SQL query that, for each returned row, also returns a percentage (0-100) based on its row position. To complicate matters, the query is also grouped. Here's some example data to play with:
create table #test(StockId int)
insert into #test values (101), (101), (202), (202), (303), (404), (505)
select
StockId,
count(*) as BinMovements
from #test
group by StockId
order by BinMovements desc
This query currently returns:
StockId, BinMovements
101, 2
202, 2
303, 1
404, 1
505, 1
(Though obviously, because BinMovements is either 2 or 1 it would be equally correct to return the StockIds in a different order such as):
202, 2
101, 2
404, 1
303, 1
505, 1
I'd like to add a 'percentage' column, but this is really just based on row position, so I'd like to see values like:
101, 2, 100
202, 2, 80
303, 1, 60
404, 1, 40
505, 1, 20
I imagine the solution may involve ROW_NUMBER, and I started down this path thinking I could just get ROW_NUMBER / total rows * 100, but this isn't working for some reason. Possibly because of the group by clause?
select
StockId,
count(*) as BinMovements,
ROW_NUMBER() OVER(order by (count(*))) as RowNumber,
count(*) over () as TotalCount,
(ROW_NUMBER() OVER(order by (count(*)))) / (count(*) over ()) * 100 as Percentage
from #test
group by StockId
order by RowNumber desc
returns:
StockId BinMovements RowNumber TotalCount Percentage
202 2 5 5 100
101 2 4 5 0
505 1 3 5 0
404 1 2 5 0
303 1 1 5 0
I'd prefer to do this in a single select if possible, though if not wrapping it in an outer select may be a solution. Thanks

ROW_NUMBER returns BIGINT and COUNT returns INT, so the division is an integer division.
The fractional part is discarded, which is why every row except the last one yields 0 before the multiplication by 100.
This works (multiply before you divide):
(ROW_NUMBER() OVER(order by (count(*)))) * 100 / (count(*) over ()) as Percentage
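The effect of the operand order can be checked with a small sketch. It is written in Python with the standard-library sqlite3 module instead of SQL Server (an assumption made so that it runs anywhere; window functions need SQLite 3.25 or later). SQLite also truncates integer division, so the two expressions behave the same way here. The table name `test` and the `grouped` CTE are illustrative; the CTE stands in for the single grouped SELECT from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (StockId INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?)",
    [(101,), (101,), (202,), (202,), (303,), (404,), (505,)],
)

rows = conn.execute("""
    WITH grouped AS (
        SELECT StockId, COUNT(*) AS BinMovements
        FROM test
        GROUP BY StockId
    )
    SELECT StockId,
           BinMovements,
           -- integer division happens first, so the fraction is lost
           ROW_NUMBER() OVER (ORDER BY BinMovements, StockId DESC)
               / COUNT(*) OVER () * 100 AS DivideFirst,
           -- multiplying first keeps the precision
           ROW_NUMBER() OVER (ORDER BY BinMovements, StockId DESC)
               * 100 / COUNT(*) OVER () AS MultiplyFirst
    FROM grouped
    ORDER BY BinMovements DESC, StockId
""").fetchall()

for row in rows:
    print(row)
```

With the sample data, the divide-first column is 100 for the first row and 0 for the rest, while the multiply-first column steps down from 100 to 20.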

This uses count() over() and row_number() as in your question, just with the equation reordered.
Note that the order by for the row_number() is the inverse of the order by for the statement, because we want to start the percentage at 100. Otherwise you can end up with 80 as the first row, 100 as the second, etc.
select
StockId
, BinMovements = count(*)
, Percentage = 100/count(*) over ()
* row_number() over (order by count(*) asc, stockid desc)
from #test
group by StockId
order by BinMovements desc, StockId asc
rextester demo: http://rextester.com/TCPE10348
+---------+--------------+------------+
| StockId | BinMovements | Percentage |
+---------+--------------+------------+
| 101 | 2 | 100 |
| 202 | 2 | 80 |
| 303 | 1 | 60 |
| 404 | 1 | 40 |
| 505 | 1 | 20 |
+---------+--------------+------------+
If you want a decimal percentage, change 100 to 100.0 in the equation.
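One caveat is worth sketching: 100/count(*) over () is itself an integer division, so the steps are exact only when the number of groups divides 100 evenly. The Python below models the arithmetic only; the helper names are hypothetical, and Python's `//` matches T-SQL integer division for the non-negative values used here.

```python
def pct_divide_first(row_number: int, total: int) -> int:
    # Mirrors: 100 / count(*) over () * row_number()
    return 100 // total * row_number


def pct_multiply_first(row_number: int, total: int) -> int:
    # Mirrors: row_number() * 100 / count(*) over ()
    return row_number * 100 // total


# Five groups: 100 is divisible by 5, so both forms agree.
five_groups = [pct_divide_first(rn, 5) for rn in range(5, 0, -1)]

# Three groups: dividing first tops out at 99 instead of 100.
three_divide_first = [pct_divide_first(rn, 3) for rn in range(3, 0, -1)]
three_multiply_first = [pct_multiply_first(rn, 3) for rn in range(3, 0, -1)]

print(five_groups, three_divide_first, three_multiply_first)
```

Using 100.0, as suggested above, avoids the truncation in both forms.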

Related

Difficulty using SELECT SUM in a query USING SQL Server

I have the following query:
WITH rows AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY TimeStamp) AS rn
FROM [AVL_Ignition]
)
SELECT mc.[DeviceIMEI], (mp.TimeStamp - mc.TimeStamp) as millisecond, mc.Value, mc.Tag
FROM rows mc
JOIN rows mp
ON mc.rn = mp.rn - 1
This query works correctly and returns the following values:
DeviceIMEI| millisecond|value
123 | 184 |1
123 | 184 |0
123 | 184 |1
123 | 184 |0
123 | 184 |1
123 | 184 |0
I want to sum the values in the millisecond field where value = 1.
I'm trying to use SELECT SUM as follows, but I'm not getting a result:
SELECT mc.[DeviceIMEI], SUM (WITH rows AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY TimeStamp) AS rn
FROM [AVL_Ignition]
)
SELECT (mp.TimeStamp - mc.TimeStamp) as millisecond
FROM rows mc
JOIN rows mp
ON mc.rn = mp.rn - 1) where mc.Value = 1;
I know that SELECT SUM is not complicated to use, but I'm having trouble applying it to this query.
I'd do it without using SUM in this case. If you are on SQL Server 2012+ you can use the IIF function like this (just leave your WITH section where it is):
SELECT
mc.DeviceIMEI
, (mp.TimeStamp - mc.TimeStamp) + IIF(mc.Value = 1, 1, 0) AS millisecond
, mc.Value
, mc.Tag
Otherwise you have to use a CASE:
SELECT
mc.DeviceIMEI
, (mp.TimeStamp - mc.TimeStamp) + CASE
WHEN mc.Value = 1 THEN mc.Value
ELSE 0
END AS millisecond
, mc.Value
, mc.Tag
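If the goal is a single total per device of the intervals that start at a row with Value = 1, conditional aggregation is another option. This is a sketch under that reading of the question, not the method of the answer above. It uses Python's sqlite3 module with made-up sample rows; TimeStamp is assumed to be an integer number of milliseconds, and the CTE is named `numbered` here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AVL_Ignition (DeviceIMEI TEXT, TimeStamp INTEGER, Value INTEGER)"
)
# Hypothetical sample: seven readings 184 ms apart, Value alternating 1/0.
conn.executemany(
    "INSERT INTO AVL_Ignition VALUES (?, ?, ?)",
    [("123", 1000 + 184 * i, 1 if i % 2 == 0 else 0) for i in range(7)],
)

totals = conn.execute("""
    WITH numbered AS (
        SELECT *, ROW_NUMBER() OVER (ORDER BY TimeStamp) AS rn
        FROM AVL_Ignition
    )
    SELECT mc.DeviceIMEI,
           -- only intervals whose starting row has Value = 1 are counted
           SUM(CASE WHEN mc.Value = 1
                    THEN mp.TimeStamp - mc.TimeStamp
                    ELSE 0 END) AS millisecond
    FROM numbered mc
    JOIN numbered mp ON mc.rn = mp.rn - 1
    GROUP BY mc.DeviceIMEI
""").fetchall()

print(totals)
```

Six consecutive pairs are formed, three of which start at Value = 1, so the total for the sample device is 3 * 184.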

SQL Server - Convert record per day into date range (with gaps)

I have found a lot of questions and answers asking how to convert a date range to records per day, but I need the opposite and can't find anything yet.
So let's say I have this dataset:
User | Available
1 | 01-01-2019
1 | 02-01-2019
1 | 03-01-2019
1 | 04-01-2019
2 | 05-01-2019
2 | 06-01-2019
2 | 07-01-2019
2 | 10-01-2019
2 | 11-01-2019
2 | 12-01-2019
So we have user 1 who is available from 01/01/2019 to 04/01/2019. Then we have user 2 who is available from 05/01/2019 to 07/01/2019 and 10/01/2019 to 12/01/2019.
The result I am looking for should look like this:
User | Start | End
1 | 01-01-2019 | 04-01-2019
2 | 05-01-2019 | 07-01-2019
2 | 10-01-2019 | 12-01-2019
User 1 was fairly easy to calculate using min/max dates, but with the gaps of user 2, I am completely lost. Any suggestions?
I had to do this before too, and this is the solution I used. Basically, use a row number partitioned by your grouping columns and ordered by date, and additionally calculate the number of days elapsed since a fixed anchor date (any hard-coded day will work).
The key here is that while the row number increases 1 by 1, the anchor difference only increases 1 by 1 if the days are consecutive. Thus, the difference between the anchor difference and the row number stays the same only across consecutive dates, which lets you group by it and calculate min/max.
IF OBJECT_ID('tempdb..#Availabilities') IS NOT NULL
DROP TABLE #Availabilities
CREATE TABLE #Availabilities (
[User] INT,
Available DATE)
INSERT INTO #Availabilities
VALUES
(1, '2019-01-01'),
(1, '2019-01-02'),
(1, '2019-01-03'),
(1, '2019-01-04'),
(2, '2019-01-05'),
(2, '2019-01-06'),
(2, '2019-01-07'),
(2, '2019-01-10'),
(2, '2019-01-11'),
(2, '2019-01-12')
;WITH WindowFunctions AS
(
SELECT
A.[User],
A.Available,
AnchorDayDifference = DATEDIFF(DAY, '2018-01-01', A.Available),
RowNumber = ROW_NUMBER() OVER (PARTITION BY A.[User] ORDER BY A.Available)
FROM
#Availabilities AS A
)
SELECT
T.[User],
Start = MIN(T.Available),
[End] = MAX(T.Available)
FROM
WindowFunctions AS T
GROUP BY
T.[User],
T.AnchorDayDifference - T.RowNumber
Result:
User Start End
1 2019-01-01 2019-01-04
2 2019-01-05 2019-01-07
2 2019-01-10 2019-01-12
The WindowFunctions values are (with the resulting difference added as GroupingRestResult):
User Available AnchorDayDifference RowNumber GroupingRestResult
1 2019-01-01 365 1 364
1 2019-01-02 366 2 364
1 2019-01-03 367 3 364
1 2019-01-04 368 4 364
2 2019-01-05 369 1 368
2 2019-01-06 370 2 368
2 2019-01-07 371 3 368
2 2019-01-10 374 4 370
2 2019-01-11 375 5 370
2 2019-01-12 376 6 370
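The same grouping idea can be tried outside SQL Server with a short sketch in Python's sqlite3 module. Two things are assumptions here: `julianday()` replaces `DATEDIFF` as the day counter (SQLite has no `DATEDIFF`), and the column names `UserId`, `RangeStart` and `RangeEnd` are chosen to avoid keyword clashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Availabilities (UserId INTEGER, Available TEXT)")
conn.executemany(
    "INSERT INTO Availabilities VALUES (?, ?)",
    [
        (1, "2019-01-01"), (1, "2019-01-02"), (1, "2019-01-03"), (1, "2019-01-04"),
        (2, "2019-01-05"), (2, "2019-01-06"), (2, "2019-01-07"),
        (2, "2019-01-10"), (2, "2019-01-11"), (2, "2019-01-12"),
    ],
)

islands = conn.execute("""
    WITH numbered AS (
        SELECT UserId,
               Available,
               -- whole-day counter; plays the role of AnchorDayDifference
               CAST(julianday(Available) AS INTEGER) AS DayNumber,
               ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Available) AS RowNumber
        FROM Availabilities
    )
    SELECT UserId,
           MIN(Available) AS RangeStart,
           MAX(Available) AS RangeEnd
    FROM numbered
    -- DayNumber - RowNumber is constant within a run of consecutive days
    GROUP BY UserId, DayNumber - RowNumber
    ORDER BY UserId, RangeStart
""").fetchall()

for island in islands:
    print(island)
```

The gap between 2019-01-07 and 2019-01-10 shifts the difference for user 2, which is what splits that user's rows into two ranges.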
This is a "common" Gaps and Islands question. Provided you're on SQL Server 2012+ (and if you're not, it's time to upgrade), this gets you the result you're after:
USE Sandbox;
GO
WITH VTE AS(
SELECT V.[User],
CONVERT(date,Available,105) AS Available
FROM (VALUES(1,'01-01-2019'),
(1,'02-01-2019'),
(1,'03-01-2019'),
(1,'04-01-2019'),
(2,'05-01-2019'),
(2,'06-01-2019'),
(2,'07-01-2019'),
(2,'10-01-2019'),
(2,'11-01-2019'),
(2,'12-01-2019')) V([User],Available)),
Diffs AS(
SELECT V.[User],
V.Available,
DATEDIFF(DAY, LAG(V.Available,1,DATEADD(DAY, -1, V.Available)) OVER (PARTITION BY V.[User] ORDER BY V.Available), V.Available) AS Diff
FROM VTE V),
Groups AS(
SELECT D.[User],
D.Available,
COUNT(CASE WHEN D.Diff > 1 THEN 1 END) OVER (PARTITION BY D.[User] ORDER BY D.Available
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM Diffs D)
SELECT G.[User],
MIN(G.Available) AS [Start],
MAX(G.Available) AS [End]
FROM Groups G
GROUP BY G.[User],
G.Grp
ORDER BY G.[User],
[Start];
Leaving aside VTE ("Value Table Expression"), which only supplies the sample data, the first CTE, Diffs, gets the difference in days between each row and the previous one. The second CTE, Groups, then puts the dates into groups (surprise, that), based on whether the difference was more than 1. Then we can use those groups to get a MIN and MAX for each group in the final SELECT.
I'm reading the dates as MONTHS, not DAYS
Example
Select [User]
,[Start] = min([Available])
,[End] = max([Available])
From (
Select *
,Grp = DateDiff(MONTH,'1900-01-01',[Available]) - Row_Number() over (Partition By [User] Order by [Available])
From YourTable
) A
Group By [User],[Grp]
Returns
User Start End
1 2019-01-01 2019-04-01
2 2019-05-01 2019-07-01
2 2019-10-01 2019-12-01

SQL Server - assign value to a field based on a running total

For a customer, I send sales orders to another system as an XML file, summing the quantities for each item across all sales order lines (e.g. if "ItemA" appears in 10 sales orders with a different quantity in each one, I sum the quantities and send the total).
In return, I get a response saying whether the requested quantities can be delivered to the customers. If not, I still get the total quantity that can be delivered. However, there can be situations where I request 100 pieces of "ItemA" and only 98 can be delivered. In cases like this, I need to distribute those 98 pieces FIFO (by UPDATEing a custom field), according to the requested quantity in each sales order and based on the registration date of each sales order.
I tried to use a WHILE loop but couldn't achieve the desired result. Here's my code:
DECLARE @PickedQty int
DECLARE @PickedERPQty int
DECLARE @OrderedERPQty int=2
SET @PickedQty =
WHILE (@PickedQty>0)
BEGIN
    SET @PickedERPQty=(SELECT CASE WHEN @PickedQty>@OrderedERPQty THEN @OrderedERPQty ELSE @PickedQty END)
    SET @PickedQty=@PickedQty-@PickedERPQty
    PRINT @PickedQty
    IF @PickedQty>=0
    BEGIN
        UPDATE OrderLines
        SET UDFValue2=@PickedERPQty
        WHERE fDocID='82DADC71-6706-44C7-9B78-7FCB55D94A69'
    END
    IF @PickedQty <= 0
        BREAK;
END
GO
Example of response
I requested 35 pieces but only 30 pieces are available to be delivered. I need to distribute those 30 pieces for each sales order, based on requested quantity and also FIFO, based on the date of the order. So, in this example, I will update the RealQty column with the requested quantity (because I have stock) and in the last one, I assign the remaining 5 pieces.
ord_Code CustOrderCode Date ItemCode ReqQty AvailQty RealQty
----------------------------------------------------------------------------
141389 CV/2539 2018-11-25 PX085 10 30 10
141389 CV/2550 2018-11-26 PX085 5 30 5
141389 CV/2563 2018-11-27 PX085 10 30 10
141389 CV/2564 2018-11-28 PX085 10 30 5
Could anyone give me a hint? Thanks
This might be more verbose than it needs to be, but I'll leave it to you to skinny it down if that's possible.
Set up the data:
DECLARE @OrderLines TABLE(
ord_Code INTEGER NOT NULL
,CustOrderCode VARCHAR(7) NOT NULL
,[Date] DATE NOT NULL
,ItemCode VARCHAR(5) NOT NULL
,ReqQty INTEGER NOT NULL
,AvailQty INTEGER NOT NULL
,RealQty INTEGER NOT NULL
);
INSERT INTO @OrderLines(ord_Code,CustOrderCode,[Date],ItemCode,ReqQty,AvailQty,RealQty) VALUES (141389,'CV/2539','2018-11-25','PX085',10,0,0);
INSERT INTO @OrderLines(ord_Code,CustOrderCode,[Date],ItemCode,ReqQty,AvailQty,RealQty) VALUES (141389,'CV/2550','2018-11-26','PX085', 5,0,0);
INSERT INTO @OrderLines(ord_Code,CustOrderCode,[Date],ItemCode,ReqQty,AvailQty,RealQty) VALUES (141389,'CV/2563','2018-11-27','PX085',10,0,0);
INSERT INTO @OrderLines(ord_Code,CustOrderCode,[Date],ItemCode,ReqQty,AvailQty,RealQty) VALUES (141389,'CV/2564','2018-11-28','PX085',10,0,0);
DECLARE @AvailQty INTEGER = 30;
For running totals, on SQL Server 2012 and up anyway, SUM() OVER is the preferred technique, so I started off with some variants on that. This query brought in some useful numbers:
SELECT
ol.ord_Code,
ol.CustOrderCode,
ol.Date,
ol.ItemCode,
ol.ReqQty,
@AvailQty AS AvailQty,
SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date]) AS TotalOrderedQty,
@AvailQty-SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date]) AS RemainingQty
FROM
@OrderLines AS ol;
Then I used the RemainingQty to do a little math. The CASE expression is hairy, but the first step checks to see if the RemainingQty after processing this row will be positive, and if it is, we fulfill the order. If not, we fulfill what we can. The nested CASE is there to stop negative numbers from coming into the result set.
SELECT
ol.ord_Code,
ol.CustOrderCode,
ol.Date,
ol.ItemCode,
ol.ReqQty,
@AvailQty AS AvailQty,
SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date]) AS TotalOrderedQty,
@AvailQty-SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date]) AS RemainingQty,
CASE
WHEN (@AvailQty-SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date])) > 0
THEN ol.ReqQty
ELSE
CASE
WHEN ol.ReqQty + (@AvailQty-SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date])) > 0
THEN ol.ReqQty + (@AvailQty-SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date]))
ELSE 0
END
END AS RealQty
FROM
@OrderLines AS ol
Windowing functions (like SUM() OVER) can only be in SELECT and ORDER BY clauses, so I had to do a derived table with a JOIN. A CTE would work here, too, if you prefer. But I used that derived table to UPDATE the base table.
UPDATE Lines
SET
Lines.AvailQty = d.AvailQty
,Lines.RealQty = d.RealQty
FROM
@OrderLines AS Lines
JOIN
(
SELECT
ol.ord_Code,
ol.CustOrderCode,
ol.Date,
ol.ItemCode,
@AvailQty AS AvailQty,
CASE
WHEN (@AvailQty-SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date])) > 0
THEN ol.ReqQty
ELSE
CASE
WHEN ol.ReqQty + (@AvailQty-SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date])) > 0
THEN ol.ReqQty + (@AvailQty-SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date]))
ELSE 0
END
END AS RealQty
FROM
@OrderLines AS ol
) AS d
ON d.CustOrderCode = Lines.CustOrderCode
AND d.ord_Code = Lines.ord_Code
AND d.ItemCode = Lines.ItemCode
AND d.Date = Lines.Date;
SELECT * FROM @OrderLines;
Results:
+----------+---------------+---------------------+----------+--------+----------+---------+
| ord_Code | CustOrderCode | Date | ItemCode | ReqQty | AvailQty | RealQty |
+----------+---------------+---------------------+----------+--------+----------+---------+
| 141389 | CV/2539 | 25.11.2018 00:00:00 | PX085 | 10 | 30 | 10 |
| 141389 | CV/2550 | 26.11.2018 00:00:00 | PX085 | 5 | 30 | 5 |
| 141389 | CV/2563 | 27.11.2018 00:00:00 | PX085 | 10 | 30 | 10 |
| 141389 | CV/2564 | 28.11.2018 00:00:00 | PX085 | 10 | 30 | 5 |
+----------+---------------+---------------------+----------+--------+----------+---------+
Play with different available qty values here: https://rextester.com/MMFAR17436
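The arithmetic behind the nested CASE can be traced with a small Python model. This is only a sketch of the allocation logic, not a replacement for the UPDATE; the function name `allocate_fifo` is hypothetical, and the orders are assumed to be already sorted by date.

```python
def allocate_fifo(requested, available):
    """Distribute `available` units over `requested` quantities, oldest first."""
    allocated = []
    running_total = 0
    for qty in requested:
        running_total += qty                    # SUM(ReqQty) OVER (... ORDER BY [Date])
        remaining = available - running_total   # RemainingQty after this row
        # Full quantity while stock lasts, the leftover on the row where it
        # runs out, and never a negative number afterwards.
        allocated.append(max(0, min(qty, qty + remaining)))
    return allocated


full_example = allocate_fifo([10, 5, 10, 10], 30)
short_stock = allocate_fifo([10, 5, 10, 10], 12)
print(full_example, short_stock)
```

With 30 available, the last order receives the remaining 5 pieces, as in the results table above; with only 12 available, the second order gets 2 and the later ones get 0.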

How can I group / window date ordered events delineated by an arbitrary expression?

I would like to group some data together based on dates and some (potentially arbitrary) indicator:
Date | Ind
================
2016-01-02 | 1
2016-01-03 | 5
2016-03-02 | 10
2016-03-05 | 15
2016-05-10 | 6
2016-05-11 | 2
I would like to group together subsequent (date-ordered) rows but breaking the group after Indicator >= 10:
Date | Ind | Group
========================
2016-01-02 | 1 | 1
2016-01-03 | 5 | 1
2016-03-02 | 10 | 1
2016-03-05 | 15 | 2
2016-05-10 | 6 | 3
2016-05-11 | 2 | 3
I did find a promising technique at the end of a blog post: "Use this Neat Window Function Trick to Calculate Time Differences in a Time Series" (the final subsection, "Extra Bonus"), but the important part of the query uses a keyword (FILTER) that doesn't seem to be supported in SQL Server (and a quick Google later and I'm not sure where it is supported!).
I'm still hopeful a technique using a window function might be the answer. I just need a counter that I can add to every row, (like RANK or ROW_NUMBER does) but that only increments when some arbitrary condition evaluates as true. Is there a way to do this in SQL Server?
Here is the solution:
DECLARE @t TABLE ([Date] DATETIME, Ind INT)
INSERT INTO @t
VALUES
('2016-01-02', 1),
('2016-01-03', 5),
('2016-03-02', 10),
('2016-03-05', 15),
('2016-05-10', 6),
('2016-05-11', 2)
SELECT [Date],
Ind,
1 + SUM([Group]) OVER(ORDER BY [Date]) AS [Group]
FROM
(
SELECT *,
CASE WHEN LAG(ind) OVER(ORDER BY [Date]) >= 10
THEN 1
ELSE 0
END AS [Group]
FROM @t
) t
Just mark a row as 1 when the previous row's Ind is greater than or equal to 10, else 0. Then a running sum of those marks gives you the desired result.
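To see the intermediate flag next to the final group number, the query can be run in a sketch using Python's sqlite3 module (an assumption for portability; SQLite 3.25+ provides LAG and windowed SUM). The table name `events` and the columns `EventDate`, `Flag` and `Grp` are illustrative names chosen to avoid the keywords DATE and GROUP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (EventDate TEXT, Ind INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2016-01-02", 1), ("2016-01-03", 5), ("2016-03-02", 10),
        ("2016-03-05", 15), ("2016-05-10", 6), ("2016-05-11", 2),
    ],
)

rows = conn.execute("""
    SELECT EventDate,
           Ind,
           Flag,
           -- the running sum of the flags is the counter that only
           -- increments when the condition was true on the previous row
           1 + SUM(Flag) OVER (ORDER BY EventDate) AS Grp
    FROM (
        SELECT EventDate,
               Ind,
               CASE WHEN LAG(Ind) OVER (ORDER BY EventDate) >= 10
                    THEN 1 ELSE 0 END AS Flag
        FROM events
    ) AS flagged
    ORDER BY EventDate
""").fetchall()

for row in rows:
    print(row)
```

The first row has no predecessor, so LAG returns NULL, the comparison is not true, and the flag falls through to 0.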
Giving full credit to Giorgi for the idea, but I've modified his answer (both for my benefit and for future readers).
Just change the CASE expression to check whether 30 or more days have elapsed since the last record:
DECLARE @t TABLE ([Date] DATETIME)
INSERT INTO @t
VALUES
('2016-01-02'),
('2016-01-03'),
('2016-03-02'),
('2016-03-05'),
('2016-05-10'),
('2016-05-11')
SELECT [Date],
1 + SUM([Group]) OVER(ORDER BY [Date]) AS [Group]
FROM
(
SELECT [Date],
CASE WHEN DATEADD(d, -30, [Date]) >= LAG([Date]) OVER(ORDER BY [Date])
THEN 1
ELSE 0
END AS [Group]
FROM @t
) t

Get top X percentage based on cumulative sum

My table looks like this:
ID | ItemID | ItemQualityID | Amount | UnitPrice
My goal is to find the top x% rows for each ItemID + ItemQualityID pair based on Amount cumulative sum and ordered by UnitPrice.
For example:
ID | ItemID | ItemQualityID | Amount | UnitPrice
1 1 1 18 2
2 1 1 1 1
3 1 1 1 1
4 2 1 18 2
5 2 1 1 1
6 2 1 1 1
7 1 1 1 3
and I want the top 10%, then the resulting table should contain rows #2, 3, 5 and 6: the total amounts for ItemID 1 and 2 are 21 and 20 respectively, so 10% is 2 items each. If I want the top 20%, the resulting table should still be the same, because including rows 1 and 4 would make it 100%. Row #7 has a higher unit price than row #1, so if row #1 is not included, row #7 shouldn't be included either.
Ideally I want the table with all the filtered rows for some other calculations but I will be happy even if I can only get the sum of Amount * UnitPrice of the filtered table. Something like
ItemID | ItemQualityID | Sum
1 1 2
2 1 2
for the above example.
You can use SUM() OVER:
DECLARE @percent DECIMAL(5, 2) = .1
;WITH CteSum AS(
SELECT *,
TotalSum = SUM(Amount) OVER(PARTITION BY ItemID, ItemQualityID),
CumSum = SUM(Amount) OVER(PARTITION BY ItemID, ItemQualityID ORDER BY UnitPrice, ID)
FROM tbl
)
SELECT
ItemID,
ItemQualityID,
[Sum] = SUM(Amount * UnitPrice)
FROM CteSum
WHERE CumSum <= @percent * TotalSum
GROUP BY ItemID, ItemQualityID
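The query can be checked against the question's sample rows with a sketch in Python's sqlite3 module (window functions need SQLite 3.25+). One deliberate change, labelled here as an assumption: the percentage is passed as a whole number and compared as `CumSum * 100 <= pct * TotalSum`, which keeps the comparison in integer arithmetic. The helper name `top_percent` and the output column `Total` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (ID INTEGER, ItemID INTEGER, ItemQualityID INTEGER, "
    "Amount INTEGER, UnitPrice INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 18, 2), (2, 1, 1, 1, 1), (3, 1, 1, 1, 1),
        (4, 2, 1, 18, 2), (5, 2, 1, 1, 1), (6, 2, 1, 1, 1),
        (7, 1, 1, 1, 3),
    ],
)


def top_percent(pct):
    """Sum Amount * UnitPrice over the cheapest rows within pct percent."""
    return conn.execute("""
        WITH CteSum AS (
            SELECT *,
                   SUM(Amount) OVER (PARTITION BY ItemID, ItemQualityID) AS TotalSum,
                   SUM(Amount) OVER (PARTITION BY ItemID, ItemQualityID
                                     ORDER BY UnitPrice, ID) AS CumSum
            FROM tbl
        )
        SELECT ItemID, ItemQualityID, SUM(Amount * UnitPrice) AS Total
        FROM CteSum
        WHERE CumSum * 100 <= ? * TotalSum
        GROUP BY ItemID, ItemQualityID
        ORDER BY ItemID, ItemQualityID
    """, (pct,)).fetchall()


print(top_percent(10))
print(top_percent(20))
```

Both calls keep only rows 2, 3, 5 and 6, so the 10% and 20% results are identical, as the question expects.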

Resources