How to group ranged values using SQL Server - sql-server

I have a table of values like this
978412, 400
978813, 20
978834, 50
981001, 20
As you can see the second number when added to the first is 1 number before the next in the sequence. The last number is not in the range (doesnt follow a direct sequence, as in the next value). What I need is a CTE (yes, ideally) that will output this
978412, 472
981001, 20
The first row contains the start number of the range then the sum of the nodes within. The next row is the next range which in this example is the same as the original data.

From the article that Josh posted, here's my take (tested and working):
SELECT
MAX(t1.gapID) as gapID,
t2.gapID-MAX(t1.gapID)+t2.gapSize as gapSize
-- max(t1) is the specific lower bound of t2 because of the group by.
FROM
( -- t1 is the lower boundary of an island.
SELECT gapID
FROM gaps tbl1
WHERE
NOT EXISTS(
SELECT *
FROM gaps tbl2
WHERE tbl1.gapID = tbl2.gapID + tbl2.gapSize + 1
)
) t1
INNER JOIN ( -- t2 is the upper boundary of an island.
SELECT gapID, gapSize
FROM gaps tbl1
WHERE
NOT EXISTS(
SELECT * FROM gaps tbl2
WHERE tbl2.gapID = tbl1.gapID + tbl1.gapSize + 1
)
) t2 ON t1.gapID <= t2.gapID -- For all t1, we get all bigger t2 and opposite.
GROUP BY t2.gapID, t2.gapSize

Check out this MSDN Article. It gives you a solution to your problem, if it will work for you depends on the ammount of data you have and your performance requirements for the query.
Edit:
Well using the example in the query, and going with his last solution the second way to get islands (first way resulted in an error on SQL 2005).
SELECT MIN(start) AS startGroup, endGroup, (endgroup-min(start) +1) as NumNodes
FROM (SELECT g1.gapID AS start,
(SELECT min(g2.gapID) FROM #gaps g2
WHERE g2.gapID >= g1.gapID and NOT EXISTS
(SELECT * FROM #gaps g3
WHERE g3.gapID - g2.gapID = 1)) as endGroup
FROM #gaps g1) T1 GROUP BY endGroup
The thing I added is (endgroup-min(start) +1) as NumNodes. This will give you the counts.

Related

SQL - Attain Previous Transaction Informaiton [duplicate]

I need to calculate the difference of a column between two lines of a table. Is there any way I can do this directly in SQL? I'm using Microsoft SQL Server 2008.
I'm looking for something like this:
SELECT value - (previous.value) FROM table
Imagining that the "previous" variable reference the latest selected row. Of course with a select like that I will end up with n-1 rows selected in a table with n rows, that's not a probably, actually is exactly what I need.
Is that possible in some way?
Use the lag function:
SELECT value - lag(value) OVER (ORDER BY Id) FROM table
Sequences used for Ids can skip values, so Id-1 does not always work.
SQL has no built in notion of order, so you need to order by some column for this to be meaningful. Something like this:
select t1.value - t2.value from table t1, table t2
where t1.primaryKey = t2.primaryKey - 1
If you know how to order things but not how to get the previous value given the current one (EG, you want to order alphabetically) then I don't know of a way to do that in standard SQL, but most SQL implementations will have extensions to do it.
Here is a way for SQL server that works if you can order rows such that each one is distinct:
select rank() OVER (ORDER BY id) as 'Rank', value into temp1 from t
select t1.value - t2.value from temp1 t1, temp1 t2
where t1.Rank = t2.Rank - 1
drop table temp1
If you need to break ties, you can add as many columns as necessary to the ORDER BY.
WITH CTE AS (
SELECT
rownum = ROW_NUMBER() OVER (ORDER BY columns_to_order_by),
value
FROM table
)
SELECT
curr.value - prev.value
FROM CTE cur
INNER JOIN CTE prev on prev.rownum = cur.rownum - 1
Oracle, PostgreSQL, SQL Server and many more RDBMS engines have analytic functions called LAG and LEAD that do this very thing.
In SQL Server prior to 2012 you'd need to do the following:
SELECT value - (
SELECT TOP 1 value
FROM mytable m2
WHERE m2.col1 < m1.col1 OR (m2.col1 = m1.col1 AND m2.pk < m1.pk)
ORDER BY
col1, pk
)
FROM mytable m1
ORDER BY
col1, pk
, where COL1 is the column you are ordering by.
Having an index on (COL1, PK) will greatly improve this query.
LEFT JOIN the table to itself, with the join condition worked out so the row matched in the joined version of the table is one row previous, for your particular definition of "previous".
Update: At first I was thinking you would want to keep all rows, with NULLs for the condition where there was no previous row. Reading it again you just want that rows culled, so you should an inner join rather than a left join.
Update:
Newer versions of Sql Server also have the LAG and LEAD Windowing functions that can be used for this, too.
select t2.col from (
select col,MAX(ID) id from
(
select ROW_NUMBER() over(PARTITION by col order by col) id ,col from testtab t1) as t1
group by col) as t2
The selected answer will only work if there are no gaps in the sequence. However if you are using an autogenerated id, there are likely to be gaps in the sequence due to inserts that were rolled back.
This method should work if you have gaps
declare #temp (value int, primaryKey int, tempid int identity)
insert value, primarykey from mytable order by primarykey
select t1.value - t2.value from #temp t1
join #temp t2
on t1.tempid = t2.tempid - 1
Another way to refer to the previous row in an SQL query is to use a recursive common table expression (CTE):
CREATE TABLE t (counter INTEGER);
INSERT INTO t VALUES (1),(2),(3),(4),(5);
WITH cte(counter, previous, difference) AS (
-- Anchor query
SELECT MIN(counter), 0, MIN(counter)
FROM t
UNION ALL
-- Recursive query
SELECT t.counter, cte.counter, t.counter - cte.counter
FROM t JOIN cte ON cte.counter = t.counter - 1
)
SELECT counter, previous, difference
FROM cte
ORDER BY counter;
Result:
counter
previous
difference
1
0
1
2
1
1
3
2
1
4
3
1
5
4
1
The anchor query generates the first row of the common table expression cte where it sets cte.counter to column t.counter in the first row of table t, cte.previous to 0, and cte.difference to the first row of t.counter.
The recursive query joins each row of common table expression cte to the previous row of table t. In the recursive query, cte.counter refers to t.counter in each row of table t, cte.previous refers to cte.counter in the previous row of cte, and t.counter - cte.counter refers to the difference between these two columns.
Note that a recursive CTE is more flexible than the LAG and LEAD functions because a row can refer to any arbitrary result of a previous row. (A recursive function or process is one where the input of the process is the output of the previous iteration of that process, except the first input which is a constant.)
I tested this query at SQLite Online.
You can use the following funtion to get current row value and previous row value:
SELECT value,
min(value) over (order by id rows between 1 preceding and 1
preceding) as value_prev
FROM table
Then you can just select value - value_prev from that select and get your answer

Return all records with a balance below a threshold value

I'm trying to setup a query to return all order line items with an outstanding balance below a certain threshold value (5%, for example). I managed this query without any concerns, but there is a complication. I only want to return these line items in cases where there aren't any line items outside of this threshold.
For example, if line item 1 has an Ordered Qty of 100, and 98 have been received, this line item would be returned unless there is a line item 2 with an Order qty of 100 and 50 received (since this is above the 5% threshold).
This might be more easily demonstrated than explained, so I set up a simplified SQL Fiddle to show what I have thus far. I'm using a CTE to add a remaining balance field and then querying against that within my threshold. I appreciate any advice
In the fiddle example, OrderNum 987654 should NOT be returned since that order has a second line item with 50% remaining.
SQL Fiddle
;WITH cte as (
SELECT
h.OrderNum
,d.ItemNumber
,d.OrderedQty
,d.ReceivedQty
,100.0 * (1 - (CAST(d.ReceivedQty as Numeric(10, 2)) / d.OrderedQty)) as RemainingBal
FROM OrderHeader h
INNER JOIN OrderDetail d
ON h.OrderNum = d.OrderNum
)
SELECT * FROM Cte
WHERE RemainingBal >0 and RemainingBal <= 5.0
I got this to work...
;WITH cte as (
SELECT
h.OrderNum
,d.ItemNumber
,d.OrderedQty
,d.ReceivedQty
,100.0 * (1 - (CAST(d.ReceivedQty as Numeric(10, 2)) / d.OrderedQty)) as
RemainingBal
FROM OrderHeader h
INNER JOIN OrderDetail d
ON h.OrderNum = d.OrderNum
)
SELECT * FROM Cte WHERE OrderNum IN(
SELECT OrderNum
FROM Cte
GROUP BY OrderNum
HAVING CAST((SUM(OrderedQty)) - (SUM(ReceivedQty)) AS
DECIMAL(10,2))/CAST(SUM(OrderedQty) AS DECIMAL(10,2)) <= .05
)

Show records in three tables with defiate number of rows in each table in SSRS report

I have a table in report like
I want to show the records in three tables on every page, each table contains only 20 records.
Page1:
Page2:
How can I achieve this type of pattern?
I can think of 2 ways to do this, as a MATRIX style report where the column group is your columns, and as a normal table where you JOIN the data to produce 3 copies of name, ID, and any other fields you want. The MATRIX style is definitely more elegant and flexible, but the normal table might be easier for customers to modify if you're turning the report over to power users.
Both solutions start with tagging the data with PAGE, ROW, and COLUMN information. Note that I'm sorting on NAME, but you could sort on any field. Also note that this solution does not depend on your ID being sequential and in the order you want, it generates it's own sequence numbers based on NAME or whatever else you choose.
In this demo I'm setting RowsPerPage and NumberofColumns as hard coded constants, but they could easily be user selected parameters if you use the MATRIX format.
DECLARE #RowsPerPage INT = 20
DECLARE #Cols INT = 3
;with
--Fake data generation BEGIN
cteSampleSize as (SELECT TOP 70 ROW_NUMBER () OVER (ORDER BY O.name) as ID
FROM sys.objects as O
), cteFakeData as (
SELECT N.ID, CONCAT(CHAR(65 + N.ID / 26), CHAR(65 + ((N.ID -1) % 26))
--, CHAR(65 + ((N.ID ) % 26))
) as Name
FROM cteSampleSize as N
),
--Fake data generation END, real processing begins below
cteNumbered as ( -- We can't count on ID being sequential and in the order we want!
SELECT D.*, ROW_NUMBER () OVER (ORDER BY D.Name) as SeqNum
--Replace ORDER BY D.Name with ORDER BY D.{Whatever field}
FROM cteFakeData as D --Replace cteFakeData with your real data source
), ctePaged as (
SELECT D.*
, 1+ FLOOR((D.SeqNum -1) / (#RowsPerPage*#Cols)) as PageNum
, 1+ ((D.SeqNum -1) % #RowsPerPage) as RowNum
, 1+ FLOOR(((D.SeqNum-1) % (#RowsPerPage*#Cols) ) / #RowsPerPage) as ColNum
FROM cteNumbered as D
)
--FINAL - use this for MATRIX reports (best)
SELECT * FROM ctePaged ORDER BY SeqNum
If you want to use the JOIN method to allow this in a normal table, replace the --FINAL query above with this one. Note that it's pretty finicky, so test it with several degrees of fullness in the final report. I tested with 70 and 90 rows of sample data so I had a partial first column and a full first and partial second.
--FINAL - use this for TABLE reports (simpler)
SELECT C1.PageNum , C1.RowNum , C1.ID as C1_ID, C1.Name as C1_Name
, C2.ID as C2_ID, C2.Name as C2_Name
, C3.ID as C3_ID, C3.Name as C3_Name
FROM ctePaged as C1 LEFT OUTER JOIN ctePaged as C2
ON C1.PageNum = C2.PageNum AND C1.RowNum = C2.RowNum
AND C1.ColNum = 1 AND (C2.ColNum = 2 OR C2.ColNum IS NULL)
LEFT OUTER JOIN ctePaged as C3 ON C1.PageNum = C3.PageNum
AND C1.RowNum = C3.RowNum AND (C3.ColNum = 3 OR C3.ColNum IS NULL)
WHERE C1.ColNum = 1
1) Add the dataset with the below query to get Page number and Table number. You can change the number 20 and 60 as per requirement. In my case, I need 20 records per section and having 3 sections, so total records per page are 60.
Select *,(ROW_NUMBER ( ) OVER ( partition by PageNumber order by Id )-1)/20 AS TableNumber from (
Select (ROW_NUMBER ( ) OVER ( order by Id )-1)/60 AS PageNumber
,* from Numbers
)Src
2)Add the table of one column and select the prepared dataset.
3)Add PageNumber in Group expression for Details group.
4)Add the Column parent group by right-clicking on detail row. Select Group by TableNumber.
5) Delete the first two rows. Select Delete rows only.
6) Add one more table and select the ID and Name.
7) Drag this newly created table into the cell of the previously created table. And increase the size of the table.
Result:
Each table section contains 20 records. and it will continue in next pages also.

Optimise Running Multiplication Calculation in SQL server

I have a large database table with about 2400 records and when I run the function below:
SELECT (SELECT EXP(SUM(LOG((cast(t1.NAT as float) + ISNULL(cast(t1.Dist as float),0))/cast(t1.NAT as float)))) FROM Test t1 where t1.CODE = t2.CODE AND t1.DATE <= t2.DATE) as Distro FROM Test t2
The code above causes performance issues as it goes through every row. Is there a way to optimise it? Are there any mistakes I am making?
The table I use this function on doesn't have its data sorted by DATE and I cannot sort it.
Try the below JOIN version of your query
SELECT
Distro=EXP(SUM(LOG( 1 + ISNULL(cast(t1.Dist as float)/cast(t1.NAT as float),0))))
FROM
Test t1
JOIN
Test t2
On t1.CODE = t2.CODE AND t1.DATE <= t2.DATE
Also if your problems are functions you can also get same results using
DECLARE #result float=1
SELECT
#result=#result*( 1 + ISNULL(cast(t1.Dist as float)/cast(t1.NAT as float),0))
FROM
Test t1
JOIN
Test t2
On t1.CODE = t2.CODE AND t1.DATE <= t2.DATE
SELECT Distro=#result
Is there a way to optimise it? Are there any mistakes I am making?
Yes.
This is a triangular join.
You say that the table has 2,400 rows. Assuming that there is only one CODE for simplicity then even if there is an index on CODE, DATE the subquery would need to process on average half the table (the outer row with the lowest date would only cause one row to be summed but by the time the highest date is encountered it needs to sum the whole 2,400 rows.
So in total the number of rows being summed would be 2,881,200 (2400 * 2401 / 2).
Your situation likely isn't as bad as that - dependent on how many CODE you do in fact have and how well distributed they are but still using window functions will be more efficient as they can do it with one pass through the data.
On the assumption that CODE, DATE is unique you can use
SELECT CASE
WHEN Min(Abs(input))
OVER (
PARTITION BY Code ORDER BY DATE) = 0 THEN 0
ELSE CASE
WHEN Sum(Sign(CASE
WHEN input < 0 THEN 1
ELSE 0
END))
OVER (
PARTITION BY Code ORDER BY DATE) % 2 = 1 THEN -1
ELSE 1
END * Exp(Sum(Log(Abs(NULLIF(input, 0))))
OVER (
PARTITION BY Code ORDER BY DATE))
END
FROM Test t1
CROSS APPLY (VALUES (( CAST(t1.NAT AS FLOAT) + ISNULL(CAST(t1.Dist AS FLOAT), 0) ) / CAST(t1.NAT AS FLOAT))) V(input)

Generate a row range based on "FromRange" "ToRange" field of each row

I have a table with following fields:
DailyWork(ID, WorkerID, FromHour, ToHour) assume that, all of the fields are of type INT.
This table needs to be expanded in a T_SQL statement to be part of a JOIN.
By expand a row, I mean, generate a hour for each number in range of FromHour and ToHour. and then join it with the rest of the statement.
Example:
Assume, I have another table like this: Worker(ID, Name). and a simple SELECT statement would be like this:
SELECT * FROM
Worker JOIN DailyWork ON Worker.ID = DailyWork.WorkerID
The result has columns similar to this: WorkerID, Name, DailyWorkID, WorkerID, FromHour, ToHour
But, what i need, has columns like this: WorkerID, Name, Hour.
In fact the range of FromHour and ToHour is expanded. and each individual hour placed in separate row, in Hour column.
Although i read a similar question to generate a range of number , but it didn't really help.
I you start with a list of numbers, then this is pretty easy. Often, the table master.spt_values is used for this purpose:
with nums as (
select row_number() over (order by (select null)) - 1 as n
from master.spt_values
)
select dw.*, (dw.fromhour + nums.n) as specifichour
from dailywork dw join
nums
on dw.tohour >= dw.fromhour + nums.n;
The table master.spt_values generally has a few thousand rows at least.
Another solution would be...
WITH [DayHours] AS (
SELECT 1 AS [DayHour]
UNION ALL
SELECT [DayHour] + 1 FROM [DayHours] WHERE [DayHour] + 1 <= 24
)
SELECT [Worker]
JOIN [DayHours] ON [Worker].[FromHour] <= [DayHours].[DayHour]
AND [Worker].[ToHour] >= [DayHours].[DayHour]

Resources