Optimise Running Multiplication Calculation in SQL Server

I have a large database table with about 2,400 records, and I run the query below:
SELECT (SELECT EXP(SUM(LOG((CAST(t1.NAT AS float) + ISNULL(CAST(t1.Dist AS float), 0)) / CAST(t1.NAT AS float))))
        FROM Test t1
        WHERE t1.CODE = t2.CODE AND t1.DATE <= t2.DATE) AS Distro
FROM Test t2
The code above causes performance issues as it goes through every row. Is there a way to optimise it? Are there any mistakes I am making?
The table I use this function on doesn't have its data sorted by DATE and I cannot sort it.

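Try this JOIN version of your query:
SELECT
    t2.CODE,
    t2.DATE,
    -- (NAT + Dist) / NAT is rewritten as 1 + Dist / NAT
    Distro = EXP(SUM(LOG(1 + ISNULL(CAST(t1.Dist AS float) / CAST(t1.NAT AS float), 0))))
FROM
    Test t1
JOIN
    Test t2
    ON t1.CODE = t2.CODE AND t1.DATE <= t2.DATE
-- one result per outer (t2) row; assumes CODE, DATE is unique
GROUP BY t2.CODE, t2.DATE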
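If the EXP and LOG functions themselves are the problem, you can also accumulate the product in a variable. Note that this returns a single product over all the joined rows, not one value per row:
DECLARE @result float = 1;
SELECT
    @result = @result * (1 + ISNULL(CAST(t1.Dist AS float) / CAST(t1.NAT AS float), 0))
FROM
    Test t1
JOIN
    Test t2
    ON t1.CODE = t2.CODE AND t1.DATE <= t2.DATE;
SELECT Distro = @result;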

Is there a way to optimise it? Are there any mistakes I am making?
Yes.
This is a triangular join.
You say that the table has 2,400 rows. Assuming for simplicity that there is only one CODE, then even with an index on CODE, DATE the subquery needs to process half the table on average: the outer row with the lowest date causes only one row to be summed, but by the time the highest date is reached it needs to sum all 2,400 rows.
So in total the number of rows being summed would be 2,881,200 (2400 * 2401 / 2).
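The arithmetic above can be reproduced with a short Python sketch. It assumes, as the answer does, one row per (CODE, DATE), so the k-th row in date order sums k rows; the split into ten codes of 240 rows is a made-up illustration of why more CODE values help, not data from the question.

```python
def triangular_rows(n: int) -> int:
    """Rows summed by the correlated subquery for one CODE with n distinct dates:
    the k-th row in date order sums k rows (1 + 2 + ... + n)."""
    return sum(range(1, n + 1))


def single_pass_rows(n: int) -> int:
    """A windowed running aggregate reads each row once."""
    return n


def rows_for_codes(sizes):
    """Total rows summed when the table is split across several CODE values."""
    return sum(triangular_rows(s) for s in sizes)


n = 2400
print(triangular_rows(n), n * (n + 1) // 2, single_pass_rows(n))  # 2881200 2881200 2400
print(rows_for_codes([240] * 10))                                   # 289200
```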
Your situation is likely less extreme than that, depending on how many distinct CODE values you actually have and how evenly the rows are distributed among them, but window functions will still be more efficient because they can do the work in a single pass through the data.
On the assumption that CODE, DATE is unique, you can use the following (aggregate window functions with ORDER BY require SQL Server 2012 or later):
SELECT CASE
           -- any zero factor so far makes the running product zero
           WHEN MIN(ABS(input)) OVER (PARTITION BY Code ORDER BY DATE) = 0 THEN 0
           ELSE CASE
                    -- an odd number of negative factors makes the product negative
                    WHEN SUM(SIGN(CASE WHEN input < 0 THEN 1 ELSE 0 END))
                             OVER (PARTITION BY Code ORDER BY DATE) % 2 = 1 THEN -1
                    ELSE 1
                END
                -- magnitude: EXP of the running sum of LOG(|factor|), zeros excluded
                * EXP(SUM(LOG(ABS(NULLIF(input, 0)))) OVER (PARTITION BY Code ORDER BY DATE))
       END AS Distro
FROM Test t1
CROSS APPLY (VALUES ((CAST(t1.NAT AS FLOAT) + ISNULL(CAST(t1.Dist AS FLOAT), 0)) / CAST(t1.NAT AS FLOAT))) V(input)
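The windowed query works because a product can be rebuilt from a sum of logarithms, |x1 * x2 * ... * xk| = exp(sum of log|xi|), with zeros and the sign tracked separately since LOG is undefined for them. Below is a Python sketch of that single-pass logic, checked against the brute-force triangular product. The sample rows and the helper names (factor, running_products, brute_force) are hypothetical, and the sketch assumes distinct dates per CODE and a non-zero NAT.

```python
import math
from itertools import groupby


def factor(nat, dist):
    """Mirrors the CROSS APPLY: (NAT + ISNULL(Dist, 0)) / NAT. Assumes NAT != 0."""
    return (nat + (dist if dist is not None else 0)) / nat


def running_products(rows):
    """One pass per CODE in DATE order, tracking what the window functions track:
    whether a zero was seen (MIN(ABS(input)) = 0), how many negative factors
    there were (the parity gives the sign) and the running SUM(LOG(ABS(input)))."""
    result = {}
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # physical table order is irrelevant
    for code, group in groupby(ordered, key=lambda r: r[0]):
        log_sum, negatives, seen_zero = 0.0, 0, False
        for _, date, x in group:
            if x == 0:
                seen_zero = True  # NULLIF(input, 0) keeps it out of the LOG sum
            else:
                log_sum += math.log(abs(x))
                if x < 0:
                    negatives += 1
            if seen_zero:
                result[(code, date)] = 0.0
            else:
                sign = -1.0 if negatives % 2 == 1 else 1.0
                result[(code, date)] = sign * math.exp(log_sum)
    return result


def brute_force(rows):
    """The triangular version: multiply every row of the same CODE with DATE <= this DATE."""
    out = {}
    for code, date, _ in rows:
        product = 1.0
        for code2, date2, x in rows:
            if code2 == code and date2 <= date:
                product *= x
        out[(code, date)] = product
    return out


# Hypothetical rows (CODE, DATE, factor), deliberately not sorted by DATE.
rows = [
    ("A", 3, factor(20, -30)),   # -0.5
    ("A", 1, factor(100, 10)),   # 1.1
    ("B", 1, factor(40, 20)),    # 1.5
    ("A", 4, factor(10, -10)),   # 0.0
    ("A", 2, factor(50, None)),  # 1.0 (Dist is NULL)
]
windowed, triangular = running_products(rows), brute_force(rows)
for key in sorted(windowed):
    print(key, round(windowed[key], 6), round(triangular[key], 6))
```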

Related

Return all records with a balance below a threshold value

I'm trying to set up a query to return all order line items with an outstanding balance below a certain threshold value (5%, for example). I managed that query without any problems, but there is a complication: I only want to return these line items when the order has no line items outside the threshold.
For example, if line item 1 has an OrderedQty of 100 and 98 have been received, this line item would be returned, unless there is a line item 2 with an OrderedQty of 100 and only 50 received (its 50% remaining balance is above the 5% threshold).
This might be more easily demonstrated than explained, so I set up a simplified SQL Fiddle to show what I have so far. I'm using a CTE to add a remaining balance field and then querying against that within my threshold. I'd appreciate any advice.
In the fiddle example, OrderNum 987654 should NOT be returned since that order has a second line item with 50% remaining.
SQL Fiddle
;WITH cte as (
SELECT
h.OrderNum
,d.ItemNumber
,d.OrderedQty
,d.ReceivedQty
,100.0 * (1 - (CAST(d.ReceivedQty as Numeric(10, 2)) / d.OrderedQty)) as RemainingBal
FROM OrderHeader h
INNER JOIN OrderDetail d
ON h.OrderNum = d.OrderNum
)
SELECT * FROM Cte
WHERE RemainingBal >0 and RemainingBal <= 5.0
I got this to work...
;WITH cte as (
SELECT
h.OrderNum
,d.ItemNumber
,d.OrderedQty
,d.ReceivedQty
,100.0 * (1 - (CAST(d.ReceivedQty as Numeric(10, 2)) / d.OrderedQty)) as RemainingBal
FROM OrderHeader h
INNER JOIN OrderDetail d
ON h.OrderNum = d.OrderNum
)
SELECT * FROM Cte WHERE OrderNum IN(
SELECT OrderNum
FROM Cte
GROUP BY OrderNum
HAVING CAST((SUM(OrderedQty)) - (SUM(ReceivedQty)) AS DECIMAL(10,2)) / CAST(SUM(OrderedQty) AS DECIMAL(10,2)) <= .05
)
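Note that the HAVING clause above tests the order's combined remaining quantity, which is not quite the same as requiring every line to be within the threshold: an order with one large, fully received line and one small line that is 50% short can pass the combined test. The Python sketch below compares the two checks; orders 111111 and 222222 are hypothetical data, and the helper names are made up for illustration. In SQL, the per-line check would correspond to grouping the CTE by OrderNum with HAVING MAX(RemainingBal) <= 5.0.

```python
from decimal import Decimal

# (OrderNum, OrderedQty, ReceivedQty)
lines = [
    (987654, 100, 98), (987654, 100, 50),   # the example from the question
    (111111, 100, 100), (111111, 10, 5),    # hypothetical: small line 50% short
    (222222, 100, 98), (222222, 100, 97),   # hypothetical: every line within 5%
]


def remaining_pct(ordered, received):
    return Decimal(100) * (1 - Decimal(received) / Decimal(ordered))


def orders_by_total(lines, threshold=Decimal(5)):
    """Like the answer's HAVING: (SUM(OrderedQty) - SUM(ReceivedQty)) / SUM(OrderedQty)."""
    totals = {}
    for order, ordered, received in lines:
        o, r = totals.get(order, (0, 0))
        totals[order] = (o + ordered, r + received)
    return {order for order, (o, r) in totals.items() if remaining_pct(o, r) <= threshold}


def orders_by_every_line(lines, threshold=Decimal(5)):
    """Every line of the order must have RemainingBal <= threshold (i.e. the MAX does)."""
    worst = {}
    for order, ordered, received in lines:
        pct = remaining_pct(ordered, received)
        worst[order] = max(worst.get(order, pct), pct)
    return {order for order, pct in worst.items() if pct <= threshold}


print(sorted(orders_by_total(lines)), sorted(orders_by_every_line(lines)))
```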

MSSQL: can't understand what "having count(*) < <some field of another table>" does

I'm trying to understand part of an exercise I'm doing and just can't get it.
There's a part where T is selected, grouped by a, and then filtered with "having count(*) < T3.a",
and I don't know how to approach it.
I've tried googling this sort of thing to find similar examples, but all the other examples compared against constants, e.g. "having count(*) < 5", rather than against columns.
The exercise is this:
MSSQL exercise
create table T(a int, b int);
insert into T values(1,2);
insert into T values(1,1);
insert into T values(2,3);
insert into T values(2,4);
insert into T values(3,4);
insert into T values(4,5);
select T3.b, (select count(T5.a)
              from T T5
              where T5.a = T3.b)
from (select T1.a as a, T2.b as b
      from T T1, T T2
      where T1.b < T2.a) as T3
where not exists (select T4.a
                  from T T4
                  group by T4.a
                  having count(*) < T3.a);
I thought that having count(*) compared each grouped value with the value of T3.a in each row, and that if all rows met the criterion then the value was selected, but I somehow get different results.
Can someone please explain to me what is really going on behind this "having count(*) < T3.a" operation?
Thank you in advance.
To repeat myself from the comments, HAVING is like WHERE for aggregate functions. You cannot use an aggregate function in the WHERE clause, for example WHERE SUM(SomeColumn) > 5, so you need to put it in the HAVING clause: HAVING SUM(SomeColumn) > 5. This returns only the groups in which the SUM of SomeColumn is greater than 5.
For your expression, HAVING COUNT(*) < T3.a keeps only the groups whose COUNT(*) is less than the value of T3.a.
Let's break this down into its separate parts.
First the FROM
from (select T1.a as a, T2.b as b
from T T1, T T2
where T1.b < T2.a) as T3
This uses the old-style comma-join syntax (a cross join filtered in the WHERE clause). It can be rewritten as a normal join:
from (select T1.a as a, T2.b as b
from T T1
join T T2 on T1.b < T2.a
) as T3
If we analyze what it does, we realize that it is actually what is known as a triangular join: every row is joined to every row whose a value is greater than its own b value. This was commonly done when window aggregates were not available.
WHERE
where not exists (select T4.a
from T T4
group by T4.a
having count(*) < T3.a);
This is a correlated subquery: T3.a is a reference to the outer query.
What this predicate says is: for this particular row, there must be no rows in the subquery.
The subquery itself says: take all rows in T, group them by a and count them, then keep only the groups whose count is less than the outer reference T3.a.
Note that because it is an EXISTS, the actual selected value is not used. I suspect this may not have been the intention.
SELECT
select T3.b, (select count(T5.a)
from T T5
where T5.a = T3.b)
We then take b from the first join, and the count from a subquery of all matching T rows. Again, this was common when window aggregates were not available.
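So the whole thing can be rewritten as follows:
select T2.b, (select count(T5.a)
              from T T5
              where T5.a = T2.b)
from (
    -- cnt = size of this row's group; min_cnt = size of the smallest group in T
    select *, min(cnt) over () as min_cnt
    from (select *, count(*) over (partition by a) as cnt
          from T) as counted
) T1
join T T2 on T1.b < T2.a
-- NOT EXISTS (... having count(*) < T3.a) holds only when no group is smaller than T3.a
where T1.a <= T1.min_cnt;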
There is something not quite right about the logic in your query, but without knowing what the original intention was, and without seeing the table and column names, I cannot say. The triangular join in particular looks very suspect.
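To see what the predicate really does, the exercise can be run as-is in SQLite from Python (SQLite is used only because it ships with Python; the assumption is that it evaluates this plain SQL the same way SQL Server does). Grouping T by a gives groups of sizes 2, 2, 1 and 1, so the NOT EXISTS is true only when T3.a is at most 1, the size of the smallest group. The second query is a hypothetical restatement of that reading, used only to check it against the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table T(a int, b int)")
conn.executemany("insert into T values (?, ?)",
                 [(1, 2), (1, 1), (2, 3), (2, 4), (3, 4), (4, 5)])

original = """
select T3.b, (select count(T5.a) from T T5 where T5.a = T3.b)
from (select T1.a as a, T2.b as b
      from T T1, T T2
      where T1.b < T2.a) as T3
where not exists (select T4.a
                  from T T4
                  group by T4.a
                  having count(*) < T3.a)
"""

# Hypothetical restatement: keep a row only if its a is no larger than
# the size of the smallest group of T.
restated = """
select T2.b, (select count(T5.a) from T T5 where T5.a = T2.b)
from T T1
join T T2 on T1.b < T2.a
where T1.a <= (select min(cnt) from (select count(*) as cnt from T group by a) as g)
"""

rows_original = sorted(conn.execute(original).fetchall())
rows_restated = sorted(conn.execute(restated).fetchall())
print(rows_original)
print(rows_restated)
```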

Calculate a Recursive Rolling Average in SQL Server

We are attempting to calculate a rolling average and have tried to convert numerous SO answers to solve the problem. To this point we are still unsuccessful.
What we've tried:
Here are some of the SO answers we have considered.
SQL Server: How to get a rolling sum over 3 days for different customers within same table
SQL Query for 7 Day Rolling Average in SQL Server
T-SQL calculate moving average
Our latest attempt has been to modify one of the solutions (#4) found here.
https://www.red-gate.com/simple-talk/sql/t-sql-programming/calculating-values-within-a-rolling-window-in-transact-sql/
Example:
Here is an example in SQL Fiddle: http://sqlfiddle.com/#!6/4570a/17
In the fiddle, we are still trying to get the SUM to work right but ultimately we are trying to get the average.
The end goal
Using the Fiddle example, we need to find the difference between Value1 and ComparisonValue1 and present it as Diff1. When a row has no Value1 available, we need to estimate it by taking the average of the last two Diff1 values and then add it to the ComparisonValue1 for that row.
With the correct query, the result would look like this:
GroupID  Number  ComparisonValue1  Diff1  Value1
5        10      54.78             2.41   57.19
5        11      55.91             2.62   58.53
5        12      55.93             2.78   58.71
5        13      56.54             2.7    59.24
5        14      56.14             2.74   58.88
5        15      55.57             2.72   58.29
5        16      55.26             2.73   57.99
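The recurrence can be sketched in Python to check the expected output. This assumes rows 13 to 16 are the ones without a Value1 (their Diff1 values are exactly the averages of the two preceding Diff1 values). fill_estimates is a hypothetical helper, and rounding to three decimals with ROUND_HALF_UP is only meant to resemble a DECIMAL(10,3) conversion.

```python
from decimal import Decimal, ROUND_HALF_UP


def fill_estimates(rows):
    """rows: (Number, ComparisonValue1, Value1 or None), sorted by Number.
    A missing Value1 gets Diff1 = average of the previous two Diff1 values
    (which may themselves be estimates), and Value1 = ComparisonValue1 + Diff1."""
    out, diffs = [], []
    for number, comparison, value in rows:
        if value is not None:
            diff = value - comparison
        else:
            if len(diffs) < 2:
                raise ValueError(f"row {number}: need two earlier Diff1 values")
            diff = ((diffs[-1] + diffs[-2]) / 2).quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
            value = comparison + diff
        diffs.append(diff)  # estimates feed into the following rows
        out.append((number, comparison, diff, value))
    return out


D = Decimal
sample = [
    (10, D("54.78"), D("57.19")),
    (11, D("55.91"), D("58.53")),
    (12, D("55.93"), D("58.71")),
    (13, D("56.54"), None),
    (14, D("56.14"), None),
    (15, D("55.57"), None),
    (16, D("55.26"), None),
]
for number, comparison, diff, value in fill_estimates(sample):
    print(number, comparison, diff, value)
```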
Question: is it possible to calculate this average when it could potentially factor into the average of the following rows?
Update:
Added a VIEW to the Fiddle schema to simplify the final query.
Updated the query to include the new rolling average for Diff1 (column Diff1Last2Avg). This rolling average works great until we run into nulls in the Value1 column. This is where we need to insert the estimate.
Updated the query to include the estimate that should be used when there is no Value1 (column Value1Estimate). This is working great and would be perfect if we could use the estimate in place of NULL in the Value1 column. Since the Diff1 column reflects the difference between Value1 (or its estimate) and ComparisonValue1, including the Estimate would fill in all the NULL values in Diff1. This in turn would continue to allow the Estimates of future rows to be calculated. It gets confusing at this point, but still hacking away at it. Any ideas?
Credit for the idea goes to this answer: https://stackoverflow.com/a/35152131/6305294 from @JesúsLópez
I have included comments in the code to explain it.
UPDATE
I have corrected the query based on comments.
I have swapped the minuend and subtrahend so that the difference is a positive number.
Removed Diff2Ago column.
Results of the query now exactly match your sample output.
;WITH cte AS
(
    -- This is similar to your ItemWithComparison view
    SELECT i.Number, i.Value1, i2.Value1 AS ComparisonValue1,
        -- Calculated differences; NULL is returned when i.Value1 is NULL
        CONVERT( DECIMAL( 10, 3 ), i.Value1 - i2.Value1 ) AS Diff
    FROM Item AS i
    LEFT JOIN [Group] AS g ON g.ID = i.GroupID
    LEFT JOIN Item AS i2 ON i2.GroupID = g.ComparisonGroupID AND i2.Number = i.Number
    WHERE NOT i2.Id IS NULL
),
cte2 AS (
    /*
    Start with the first number.
    Note: if you do not have at least 2 consecutive numbers (in cte) with a non-NULL Diff value,
    Diff1Ago will be NULL and nothing else will work;
    you may need to add additional logic to handle these cases. */
    SELECT * FROM (
        -- TOP 1 ... ORDER BY is wrapped in a derived table because ORDER BY
        -- cannot appear in a member that precedes UNION ALL
        SELECT TOP 1 -- start with the 1st number that has a Value1 (see ORDER BY)
            a.Number, a.Value1, a.ComparisonValue1, a.Diff, b.Diff AS Diff1Ago
        FROM cte AS a
        -- "1 number ago"
        LEFT JOIN cte AS b ON a.Number - 1 = b.Number
        WHERE NOT a.Value1 IS NULL
        ORDER BY a.Number
    ) AS firstNumber
    UNION ALL
    SELECT b.Number, b.Value1, b.ComparisonValue1,
        ( CASE
            WHEN NOT b.Value1 IS NULL THEN b.Diff
            -- no Value1: estimate Diff as the average of the previous two Diffs
            ELSE CONVERT( DECIMAL( 10, 3 ), ( a.Diff + a.Diff1Ago ) / 2.0 )
        END ) AS Diff,
        a.Diff AS Diff1Ago
    FROM cte2 AS a
    INNER JOIN cte AS b ON a.Number + 1 = b.Number
)
SELECT *, ( CASE WHEN Value1 IS NULL THEN ComparisonValue1 + Diff ELSE Value1 END ) AS NewValue1
FROM cte2
OPTION( MAXRECURSION 0 );
Limitations:
This solution works well only when you need to consider a small number of preceding values.

Rewrite SQL Query: I need to replace NOT IN with JOIN

I have a query in my production environment which is taking a long time to execute. I did not write this query, but I must find a way to make it quicker since it is causing a big performance issue at the moment. I need to replace NOT IN with LEFT JOIN but am not sure how to rewrite it. It currently looks like this:
SELECT TOP 1 IT.ITEMID
FROM (SELECT CAST(ITEMID AS NUMERIC) + 1 ITEMID
FROM Items
WHERE ISNUMERIC(ITEMID) = 1
AND CAST(ITEMID AS NUMERIC) >= 50000) IT
WHERE IT.ITEMID NOT IN (SELECT CAST(ITEMID AS NUMERIC) ITEMID
FROM Items
WHERE ISNUMERIC(ITEMID) = 1)
ORDER BY IT.ITEMID
Could you suggest how I should rewrite it using LEFT JOIN for better performance? Any help/guidance is greatly appreciated.
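Stripped of the casting, the query above looks for the smallest numeric ITEMID of at least 50000 whose successor (ITEMID + 1) is not itself an ITEMID, and returns that successor. A Python sketch of the same logic follows; next_free_item_id is a hypothetical name, and int() is only a rough stand-in for ISNUMERIC plus CAST(... AS NUMERIC) (it accepts integers only). The set lookup plays the role of the NOT IN / NOT EXISTS / LEFT JOIN check.

```python
def next_free_item_id(item_ids, floor=50000):
    """Smallest numeric ITEMID >= floor whose successor is not an ITEMID,
    returned as that successor (ITEMID + 1), or None if there is no such row."""
    numeric = set()
    for raw in item_ids:
        try:
            numeric.add(int(raw))  # rough stand-in for ISNUMERIC + CAST; integers only
        except ValueError:
            continue               # non-numeric IDs are skipped, like ISNUMERIC(...) = 0
    candidates = [i + 1 for i in numeric if i >= floor and i + 1 not in numeric]
    return min(candidates) if candidates else None  # TOP 1 ... ORDER BY ITEMID


print(next_free_item_id(["49998", "50000", "50001", "50002", "50005", "ABC-1", "60000"]))
# 50003
```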
Try this one -
;WITH cte AS
(
SELECT DISTINCT ITEMID =
    CASE WHEN ISNUMERIC(ITEMID) = 1
        THEN CAST(ITEMID AS NUMERIC)
    END
FROM Items
)
SELECT TOP 1 ITEMID = ITEMID + 1
FROM cte t
WHERE ITEMID >= 50000
AND NOT EXISTS(
SELECT 1
FROM cte t2
WHERE t.ITEMID + 1 = t2.ITEMID
)
ORDER BY t.ITEMID
As mentioned in the comments, the NOT EXISTS version of the query is usually faster in SQL Server than the LEFT JOIN version; for completeness, here are both versions:
Left join variant of existing query:
with cte as
(SELECT CAST(ITEMID AS NUMERIC) ITEMID
FROM Items
WHERE ISNUMERIC(ITEMID) = 1)
select top 1 i.ITEMID + 1 ITEMID
FROM cte i
LEFT JOIN cte ni ON i.ITEMID + 1 = ni.ITEMID
WHERE i.ITEMID >= 50000 AND ni.ITEMID IS NULL
ORDER BY i.ITEMID
Not exists variant of existing query:
with cte as
(SELECT CAST(ITEMID AS NUMERIC) ITEMID
FROM Items
WHERE ISNUMERIC(ITEMID) = 1)
select top 1 i.ITEMID + 1 ITEMID
FROM cte i
WHERE i.ITEMID >= 50000 AND NOT EXISTS
(SELECT NULL
FROM cte ni
WHERE i.ITEMID + 1 = ni.ITEMID)
ORDER BY i.ITEMID
As @gbn pointed out in the comments, the CAST and the other functions in the predicates prevent index use anyway, so there is no point in converting this from NOT IN to LEFT JOIN / IS NULL or to NOT EXISTS. And NOT EXISTS usually performs better than LEFT JOIN / IS NULL in SQL Server.
NOT IN is not advised because of the problems (wrong, unexpected results) when there are NULLs (in the compared columns or produced by the expressions), and because of the inefficient plans caused by the nullability of the columns/expressions.
And ISNUMERIC() does not always do what you think it does (as @Damien_The_Unbeliever noted in another comment). There are cases where ISNUMERIC returns 1 but the cast fails; for example, ISNUMERIC('$') returns 1, but CAST('$' AS NUMERIC) raises a conversion error.
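The NULL problem with NOT IN is easy to reproduce. The Python sketch below uses SQLite and two hypothetical tables, candidates and used. Because used contains a NULL, `id NOT IN (...)` evaluates to UNKNOWN rather than TRUE for every unmatched candidate, so it returns no rows, while NOT EXISTS and LEFT JOIN / IS NULL both return 2 and 3. SQL Server follows the same rule with ANSI_NULLS ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table candidates(id int)")
conn.execute("create table used(id int)")
conn.executemany("insert into candidates values (?)", [(1,), (2,), (3,)])
conn.executemany("insert into used values (?)", [(1,), (None,)])  # note the NULL

# 2 NOT IN (1, NULL) is (2 <> 1) AND (2 <> NULL) = TRUE AND UNKNOWN = UNKNOWN
not_in = conn.execute(
    "select id from candidates where id not in (select id from used) order by id"
).fetchall()

not_exists = conn.execute(
    "select c.id from candidates c "
    "where not exists (select 1 from used u where u.id = c.id) order by c.id"
).fetchall()

left_join = conn.execute(
    "select c.id from candidates c left join used u on u.id = c.id "
    "where u.id is null order by c.id"
).fetchall()

print(not_in, not_exists, left_join)
```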
So, the sane thing to do would be - in my opinion - to add another column in the table and convert (the values that can be converted) to numeric and store them in that column. Then you could write the query without casting and an index on that column could be used.
If you cannot alter the tables in any way (by adding a new column or a materialized view), then you can try and test the various rewritings the other answers offer.
I agree with @ypercube that the sane thing to do is to fix your schema.
If for some reason this is not an option maybe materialising the whole thing into an indexed temporary table at runtime would make the best of a bad job.
CREATE TABLE #T
(
ITEMID NUMERIC(18,0) PRIMARY KEY
WITH ( IGNORE_DUP_KEY = ON)
)
INSERT INTO #T
SELECT CASE WHEN ISNUMERIC(ITEMID) = 1 THEN ITEMID END
FROM Items
WHERE CASE WHEN ISNUMERIC(ITEMID) = 1 THEN ITEMID END >= 50000
SELECT TOP 1 ITEMID+1
FROM #T T1
WHERE NOT EXISTS (SELECT * FROM #T T2 WHERE T2.ITEMID = T1.ITEMID +1)
ORDER BY ITEMID

How to group ranged values using SQL Server

I have a table of values like this:
978412, 400
978813, 20
978834, 50
981001, 20
As you can see, adding the second number to the first gives the number immediately before the next value in the sequence. The last row is not in the range (it doesn't directly follow the previous value in the sequence). What I need is a CTE (yes, ideally) that will output this:
978412, 472
981001, 20
The first row contains the start number of the range then the sum of the nodes within. The next row is the next range which in this example is the same as the original data.
From the article that Josh posted, here's my take (tested and working):
SELECT
    MAX(t1.gapID) AS gapID,
    t2.gapID - MAX(t1.gapID) + t2.gapSize AS gapSize
    -- max(t1) is the specific lower bound of t2 because of the group by.
FROM
    ( -- t1 is the lower boundary of an island.
        SELECT gapID
        FROM gaps tbl1
        WHERE NOT EXISTS (
            SELECT *
            FROM gaps tbl2
            WHERE tbl1.gapID = tbl2.gapID + tbl2.gapSize + 1
        )
    ) t1
    INNER JOIN ( -- t2 is the upper boundary of an island.
        SELECT gapID, gapSize
        FROM gaps tbl1
        WHERE NOT EXISTS (
            SELECT * FROM gaps tbl2
            WHERE tbl2.gapID = tbl1.gapID + tbl1.gapSize + 1
        )
    ) t2 ON t1.gapID <= t2.gapID -- For all t1, we get all bigger t2 and opposite.
GROUP BY t2.gapID, t2.gapSize
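The boundary logic in the query above can be sketched in Python, assuming start values are unique and ranges do not overlap: sort by start, extend the current island while start == previous start + previous size + 1, and report last start - island start + last size, which is the same figure as the query's gapSize expression. merge_ranges is a hypothetical helper name.

```python
def merge_ranges(rows):
    """rows: (start, size) pairs in any order. A row continues the current island
    when start == prev_start + prev_size + 1. Returns one (island_start, span)
    pair per island, where span = last_start - island_start + last_size."""
    islands = []
    first_start = prev_start = prev_size = None
    for start, size in sorted(rows):
        if prev_start is None or start != prev_start + prev_size + 1:
            if prev_start is not None:
                # close the previous island (like MAX(t1.gapID) for that t2)
                islands.append((first_start, prev_start - first_start + prev_size))
            first_start = start  # a lower boundary, like t1
        prev_start, prev_size = start, size
    if prev_start is not None:
        islands.append((first_start, prev_start - first_start + prev_size))
    return islands


print(merge_ranges([(978412, 400), (978813, 20), (978834, 50), (981001, 20)]))
# [(978412, 472), (981001, 20)]
```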
Check out this MSDN article. It gives a solution to your problem; whether it will work for you depends on the amount of data you have and your performance requirements for the query.
Edit:
Using the example from the article and going with its last solution, here is the second way to get islands (the first way resulted in an error on SQL Server 2005):
SELECT MIN(start) AS startGroup, endGroup, (endgroup-min(start) +1) as NumNodes
FROM (SELECT g1.gapID AS start,
             (SELECT min(g2.gapID) FROM #gaps g2
              WHERE g2.gapID >= g1.gapID and NOT EXISTS
                    (SELECT * FROM #gaps g3
                     WHERE g3.gapID - g2.gapID = 1)) as endGroup
      FROM #gaps g1) T1
GROUP BY endGroup
The thing I added is (endgroup-min(start) +1) as NumNodes. This will give you the counts.
