I wish to put this condition in a more sargable way - sql-server

In my SQL Server 2017 Standard CU20 there is a frequently executed query that contains this awful condition:
AND F.progressive_invoice % @numberofservicesInstalled = @idService
Is there a mathematical way to rewrite it in a form that is more convenient for SQL Server?
The query takes between 240 and 500 ms.
Can you help me do better? Please.

What do you think is particularly awful here?
Is this query performing badly? Are you sure that this condition is responsible?
This % is the modulo operator.
SELECT 13 % 5 => the remainder is 3
--This is roughly the same as what your code is doing:
DECLARE @Divisor INT=5; --Switch this value
DECLARE @CompareRemainder INT=3;
SELECT CASE WHEN 13 % @Divisor = @CompareRemainder THEN 'Remainder matches variable' ELSE 'no match' END;
Your line of code tells the engine to compute an integer division of F.progressive_invoice by the variable @numberofservicesInstalled and take the remainder. The result of this computation is then compared to the variable @idService.
As this computation must be done for each value, an index will not help here...
I do not think this can be made more sargable.
UPDATE
In a comment you suggest that it might help to change the code on the other side of the equals operator. No, this will not help.
I tried to think of a meaningful interpretation of this... Is the number of a service (or - as the variable suggests - its id) somehow hidden in the invoice number?
Execution plan and row estimation:
The engine will see that this must be computed for all rows. It would help to enforce any other filter before you have to do this, but you do not show enough; the line of code we see is just one part of a condition...
Indexes and statistics will surely play their roles too...
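For illustration, here is a sketch of that "apply the other filters first" idea; the table name, the date column, and the variable values are assumptions, not taken from the question:
DECLARE @dateFrom date = '20240101';            --hypothetical
DECLARE @numberofservicesInstalled int = 4;     --hypothetical values for the question's variables
DECLARE @idService int = 1;

SELECT F.progressive_invoice
FROM dbo.Invoices AS F
WHERE F.invoice_date >= @dateFrom                                          --sargable filter, can seek an index
  AND F.progressive_invoice % @numberofservicesInstalled = @idService;     --residual predicate, evaluated only on the surviving rows
The cheaper the row set that reaches the modulo test, the less its per-row cost matters.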

The short answer to your direct question is no. You cannot rearrange this expression to make it a sargable predicate. At least, not with a construct which is finite and agnostic of the possible values of @numberofservicesinstalled.
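If @numberofservicesInstalled only ever takes one value (or a small, known set) in practice, one workaround that is deliberately not agnostic of that value is a persisted computed column, which can then be indexed. A sketch with an assumed table name and a divisor fixed at 4:
ALTER TABLE dbo.Invoices
    ADD progressive_invoice_mod4 AS (progressive_invoice % 4) PERSISTED;

CREATE INDEX IX_Invoices_progressive_invoice_mod4
    ON dbo.Invoices (progressive_invoice_mod4);

--The condition then becomes an ordinary, sargable equality test:
--AND F.progressive_invoice_mod4 = @idService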

Optimizing array overlap query using && operator

I have a relatively simple query that attempts to calculate the count of rows I'll have to deal with in a later operation. It looks like:
SELECT COUNT(*)
FROM my_table AS t1
WHERE t1.array_of_ids && ARRAY[cast('1' as bigint)];
The tricky piece is that the ARRAY[] portion is determined by the code that invokes the query, so instead of having one element as in this example it could have hundreds or thousands. This makes the query take a decent amount of time to run if a user is actively waiting for the calculation to complete.
Is there anything obvious I'm doing wrong or any obvious improvement that could be made?
Thanks!
Edit:
There are not any indexes on the table. I tried to create one with
CREATE INDEX my_index on my_table(array_of_ids);
and it came back with
ERROR: index row requires 8416 bytes, maximum size is 8191
I'm not very experienced here unfortunately. Maybe there are simply too many rows for an index to be useful?
I ran an explain on the query and the output essentially looks like:
QUERY PLAN | Filter: ((array_of_ids && '{1, 2, 3, 4, 5, 6, 7 ... n}'::bigint[])
so I guess it is automatically doing the ::bigint[]. I tried this as well, and the query takes the same time to execute, which I guess makes sense.
I realize I'm only pasting a portion of the response to the explain (analyze, buffers, format text), but I'm doing this in psql and my system often runs out of memory. There are tons of --- in the output, and I am not sure if there is a way to stop psql from doing that.
The plan looks pretty simple, so is this basically saying there is no way to optimize this? I have two huge arrays and it just takes time to determine an overlap? I'm not sure if there is a JOIN solution here; I tried to unnest and do a JOIN on equivalent entries, but the query never returned, so I'm not sure if I got it wrong or if it is just a far slower approach.
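One thing that may be worth trying (a sketch, assuming PostgreSQL and the table shown above): the plain CREATE INDEX above builds a btree index, which both hits the row-size limit and cannot serve the && operator anyway; a GIN index on the array column can.
CREATE INDEX my_table_array_of_ids_gin
    ON my_table USING GIN (array_of_ids);

--The query itself stays the same and can now use the index:
--SELECT COUNT(*)
--FROM my_table AS t1
--WHERE t1.array_of_ids && ARRAY[CAST('1' AS bigint)];
Whether the planner actually uses the index for input arrays with thousands of elements is something EXPLAIN (ANALYZE, BUFFERS) would have to confirm.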

Table join with multiple conditions

I'm having trouble writing the join conditions for these tables. The highlighted parts are the three conditions that I need to solve. Basically, there are some securities where, for their effective term, a value between 0 and 2 gets score 1, a value between 2 and 10 gets score 2, and a value bigger than 10 gets score 4.
For the first two conditions, I handle them in the query's WHERE part like this;
however, for the third condition, where the Descriptsec is empty, I'm not quite sure what I can do. Can anyone help?
Can you change the lookup table ([Risk].[dbo].[FILiquidityBuckets]) you are using?
If yes, do this:
Add bounds so that the table looks like this:
Metric         | DescriptLowerBound | DescriptUpperBound                  | LiquidityScore
Effective term | 0                  | 2                                   | 1
Effective term | 2                  | 10                                  | 2
Effective term | 10                 | 9999999 (some absurdly high number) | 4
Then your join condition can be this:
ON FB3.Metric='Effective term'
AND CAST(sa.effectiveTerm AS INT) BETWEEN CAST(FB3.DescriptLowerBound AS INT)
AND CAST(FB3.DescriptUpperBound AS INT)
Please note that BETWEEN is inclusive so in the edge cases (where the value is exactly 2 or 10), the lower score will be captured.
I can see some problems: the effective term in the table with the sa alias is a float, so you should consider rounding up or down.
Overall, a lot of things could be changed or improved, but I tried to offer an immediate solution.
Hope this helps.
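If double matches at the boundaries are a concern, a half-open comparison is an alternative; this is just a sketch using the column names above, and which bucket should own the boundary values is a business decision:
ON FB3.Metric = 'Effective term'
AND CAST(sa.effectiveTerm AS INT) >= CAST(FB3.DescriptLowerBound AS INT)
AND CAST(sa.effectiveTerm AS INT) <  CAST(FB3.DescriptUpperBound AS INT)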

Is this confusing TSQL syntax more efficient? Is it really that confusing?

I would have searched but I don't even know how to search for something like this, so here goes.
I work in a company where I'm looking at lots of older stored procs (we're currently on SQL Server 2012; I don't know when they were written, but likely 3-5+ years ago). I continually come across these little nuggets in SELECT statements, and they are extremely confusing to me. Ultimately they are all math-based, so maybe there's a performance reason for these things? Maybe they are way more common than I thought and I just haven't been exposed to them?
Here is an example; imagine it is buried in a SELECT clause, selecting from a table that has a numerical "balance" column (and please forgive me if my parens don't match up exactly ... you will get the idea).
SELECT
...
HasBalance = LEFT('Y', 1 * ABS(SIGN(SUM(balance)))) + LEFT('N', 1 * (1 - ABS(SIGN(SUM(balance)))))
...
I mean, it wasn't that hard to figure out that essentially what it's doing is coming up with 'Y' or 'N' by concatenating two strings ('Y' + '') or ('' + 'N'), each of which is generated by using three functions to reduce a value (balance) to 1 or 0, which is then multiplied by 1 to determine how many characters to take in each LEFT call (ultimately, 1 or 0).
But come on.
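For what it's worth, the same Y/N logic can be written as a plain CASE expression (a sketch of the column expression only; if SUM(balance) can be NULL the two forms differ, since the LEFT version then yields NULL rather than 'N'):
HasBalance = CASE WHEN SUM(balance) <> 0 THEN 'Y' ELSE 'N' END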
Here is another example:
sum(x.amount * charindex('someCode', x.code) * charindex(x.code, 'someCode'))
This one essentially says: if x.code equals 'someCode', return x.amount; otherwise return 0. If either CHARINDEX fails it throws a *0 into the mix, zeroing out the math.
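The second example has an equally direct CASE form (again a sketch of just the expression, with the usual caveats about NULL or padded values in x.code):
SUM(CASE WHEN x.code = 'someCode' THEN x.amount ELSE 0 END)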
There are lots of others. They're not impossible to figure out ... but I still have to spend time working out what they're doing.
Is this as obscure as I think it is? If so, is it worth the tradeoff (assuming there is a tradeoff)?
Thanks!

SQL Server Modulo function appears to return values outside of expected range [duplicate]

This question already has answers here:
How does this CASE expression reach the ELSE clause?
(3 answers)
Closed 7 years ago.
While running a modulo expression inside a CASE statement, there are many times when a value outside the expected range is returned.
SELECT CASE WHEN ABS(CheckSUM(NewId())) % 5 IN (0,1,2,3,4) then NULL
ELSE 'What Happened?' END
If you run this script a few times, you will see that there are times when the result appears to be outside the range 0,1,2,3,4. My thinking is that this is somehow returning non-integer values during the CASE statement, making modulo an ineffective way to branch in a CASE.
Could someone explain what is happening in these cases so that I can combat this in the future?
NOTE: If I run the modulo expression by itself (outside of the CASE statement) and return the results, all values are in the range 0,1,2,3,4 as expected.
Change your statement to
SELECT top(1)
CASE WHEN ABS(CheckSUM(NewId())) % 5 IN (0,1,2,3,4) then NULL
ELSE 'What Happened?' END
And have a look at the actual execution plan.
The IN part in the CASE is expanded to:
CASE WHEN abs(checksum(newid()))%(5)=(4) OR
abs(checksum(newid()))%(5)=(3) OR
abs(checksum(newid()))%(5)=(2) OR
abs(checksum(newid()))%(5)=(1) OR
abs(checksum(newid()))%(5)=(0)
THEN NULL
ELSE 'What Happened?'
END
abs(checksum(newid()))%(5) is executed once for each value in the IN clause, so every comparison sees a different random value. Each comparison matches its constant with a probability of about 1/5, so the chance that all five of them miss is (4/5)^5, roughly a third, which is why the ELSE branch fires as often as it does.
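A quick empirical check of that explanation (a sketch; sys.all_objects is only used as a convenient row generator):
SELECT COUNT(*) AS else_branch_hits
FROM (
    SELECT TOP (10000)
           CASE WHEN ABS(CHECKSUM(NEWID())) % 5 IN (0,1,2,3,4)
                THEN NULL ELSE 'What Happened?' END AS result
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b
) AS t
WHERE t.result IS NOT NULL;
--If each comparison really draws its own NEWID(), roughly (4/5)^5, i.e. about a third,
--of the 10000 rows should land in the ELSE branch.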
I know this isn't an answer, but I'm putting it here to format the code properly.
When I ran the OP's code, I got a "What Happened?" the third time. I wanted to see what modulo value WAS being returned when that happened, to see what insight it might provide, but to do that I needed a non-moving NEWID(), so I transposed his query to this:
DECLARE @NewID UNIQUEIDENTIFIER = NEWID();
SELECT CASE WHEN ABS(CheckSUM(@NewID)) % 5 IN (0,1,2,3,4) then NULL
ELSE 'What Happened?' END
And I didn't get any occurrences of "What Happened?".
My best guess is that the SQL engine tries to get cute with the order of operations somehow and sometimes ends up doing the modulo on a non-numeric numerator.
But without being able to reproduce the behavior on a transposed query, it's impossible to "catch it in the act".
In the interest of making this an actual answer, I guess to combat this in the future, you should populate a variable with NEWID() rather than putting it inline in your query, if at all possible.
And other forms of "un-nesting" these operations might work as well.
Here is the execution plan I see when executing Mikael's query:

How can I find odd house numbers in MS SQL?

The problem is that I need to ignore the stray letters in the numbers, e.g. 19A or B417.
Take a look here:
Extracting Numbers with SQL Server
There are several hidden "gotcha's" that are explained pretty well in the article.
It depends on how much data you're dealing with, but doing that in SQL is probably going to be slow. Not everyone will agree with me here, but I think all data processing should be done in application code.
I would just take the rows you want, and filter it in the application you're dealing with.
The easiest thing to do here would be to create a CLR function which takes the address. In the CLR function, you would take the first part of the address (assuming it is the house number), which should be delimited by whitespace.
Then, replace any non-numeric characters with an empty string.
You should have a string representing an integer at that point which you can pass to the Parse method on the Int32 class to produce an integer, which you can then check to see if it is odd.
I recommend a CLR function (assuming you are using SQL Server 2005 and above, and can set the compatibility level of the database) here because it's easier to perform string manipulations in .NET than it is in T-SQL.
Assuming [Address] is the column with the address in it...
SELECT CASE CAST(SUBSTRING(REVERSE(Address), PATINDEX('%[0-9]%', REVERSE(Address)), 1) AS INTEGER) % 2
         WHEN 0 THEN 'Even'
         WHEN 1 THEN 'Odd'
       END
FROM Table
I've been through this drill before. The best alternative is to add a column to the table or to a subsidiary joinable table that stores the inferred numerical value for the purpose. Then use iterative queries to set the column repeatedly until you get sufficient accuracy and coverage. You'll end up encountering stuff like "First, Third," "451a", "1200 South 19th Blvd East", and worse.
Then filter new and edited records as they occur.
As usual, UDFs should be avoided as being slow and (comparatively) less debuggable.
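A sketch of that added-column approach; the table name, column names, and the single-script form are assumptions, and it simply grabs the first run of digits in the address:
ALTER TABLE dbo.Addresses ADD InferredNumber INT NULL;
GO

--Fill the column from the first run of digits; rows with no digits are left NULL.
--Assumes the digit run fits in an INT.
UPDATE a
SET InferredNumber = CAST(
        SUBSTRING(a.Address,
                  PATINDEX('%[0-9]%', a.Address),
                  PATINDEX('%[0-9][^0-9]%', a.Address + 'X') - PATINDEX('%[0-9]%', a.Address) + 1)
        AS INT)
FROM dbo.Addresses AS a
WHERE a.InferredNumber IS NULL
  AND a.Address LIKE '%[0-9]%';

--Odd house numbers are then a trivial filter:
--SELECT * FROM dbo.Addresses WHERE InferredNumber % 2 = 1;
Rerunning the UPDATE after cleaning up the stragglers ("First, Third" and the like) matches the iterative approach described above.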
