How do you use Snowflake RATIO_TO_REPORT to be precise?

I'm having a problem with the Snowflake function RATIO_TO_REPORT and its rounding. There seems to be default rounding behavior, which causes the ratios to sum to something other than 1.
How would you address this issue?
(Screenshot: RATIO_TO_REPORT issue)
Cheers,
Joe

Nothing going on here is technically "wrong" - as you've identified, this is a rounding issue. Each of the RATIO_TO_REPORT results is correct, and the sum of the values is also correct.
The best way to get around this is to force RATIO_TO_REPORT to output a more precise number by casting the input to NUMBER rather than INTEGER. In my testing, the below worked well:
-- Create the table for testing
CREATE TABLE R2R_TEST (activity_count integer);
-- Insert the values from the example screenshot.
INSERT INTO R2R_TEST VALUES (210),(11754),(3660),(66);
-- Create the test RATIO_TO_REPORT query with the cast to NUMBER
SELECT RATIO_TO_REPORT(activity_count::number(32,8)) OVER (ORDER BY activity_count) r2r FROM R2R_TEST;
-- Check our work.
with test as (SELECT RATIO_TO_REPORT(activity_count::number(32,8)) OVER (ORDER BY activity_count) r2r FROM R2R_TEST)
SELECT SUM(r2r) FROM test; -- 1.0000000000[...]
You can see that all I've done is cast activity_count to a NUMBER, which gives a much more precise result.

Related

SSRS: SUM values if contending with NULLS

In SSMS, this is easy with an ISNULL function. So in SSRS, how would I go about getting a SUM value from two columns where one of them has NULLS?
Here's an example of the data:
So if a pro_rate value exists, I want to use it in the aggregation, but if it is NULL, then I need to pull from the cost column. Here's my most recent attempt at the expression:
=iif(isnothing(sum(Fields!pro_rate.Value), sum(Fields!cost.Value), sum(Fields!pro_rate.Value)))
In SQL, I would simply write sum(isnull(pro_rate, cost)) (to also make it clear what my result should be), but I keep getting the dreaded #Error in my report.
UPDATE 1: I altered my query a bit to the following and got the #Error.
=sum(iif(isnothing(Fields!pro_rate.Value), Fields!cost.Value, Fields!pro_rate.Value))
After looking at your data and the expressions being used, I came to the conclusion that you are seeing that error message due to a conversion error. It looks like your cost field is an INT datatype as the values all seem to be rounded off to the nearest integer and the pro_rate field appears to be a DECIMAL datatype. This appears to cause a problem when you are attempting to SUM an integer into a decimal without the proper level of precision.
To test this, I created the following simple dataset:
CREATE TABLE #temp (A VARCHAR(2), B INT, C DECIMAL(4,2))
INSERT INTO #temp (A,B,C)
VALUES ('A', 5, 1.7), ('B', 10, 2.6), ('C', 9, NULL)
SELECT * FROM #temp
With this data, I was able to test and confirm that my theory was correct and you should be able to fix your #Error by putting a CDEC function around the cost field in the expression provided by Derrick Moeller.
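For reference, combining Derrick Moeller's expression below with that CDEC cast might look something like this (a sketch using the field names from the question, not a verified report expression):
=Sum(IIf(IsNothing(Fields!pro_rate.Value), CDec(Fields!cost.Value), Fields!pro_rate.Value))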
It looks to me like you want to evaluate each row individually and you have your order of operations incorrect.
=sum(iif(isnothing(Fields!pro_rate.Value), Fields!cost.Value, Fields!pro_rate.Value))

Using Sum(Cast(Replace syntax

I am trying to write a simple query in order to change some of our stage data. I have a varchar dollar-amount column (unfortunately) that needs to be summed. My issue is that, because of commas, I cannot change the datatype.
So, I can use REPLACE(AMT,',','') to remove the commas, but it still won't let me cast it as a decimal and I get
Error converting data type varchar to numeric.
I am trying the following with no luck. Any ideas? Can this be done, or am I using the wrong syntax here?
Select SUM(Cast(REPLACE(Amt,',','') as Decimal (18,2)) )
I was able to resolve this with @HABO's suggestion. I used Cast(Ltrim(Rtrim(table.AMT)) as Money) for all instances of the varchar amount. The trim removes the white space, and the conversion to MONEY handles the embedded commas.
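As a quick illustration of why that works (the sample values here are made up), a conversion to MONEY tolerates the embedded commas that break a direct DECIMAL cast:
SELECT SUM(CAST(LTRIM(RTRIM(Amt)) AS money)) AS TotalAmt -- 3234.50
FROM (VALUES (' 1,234.50 '), ('2,000')) AS v(Amt);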
This should work... including an example.
Edit: if you are on SQL Server 2012+, you may be able to shorten your task by using Try_Convert (a sketch follows after the example).
DECLARE @SomeTable AS TABLE (Amt Varchar(100));
INSERT INTO @SomeTable (Amt) VALUES ('abc123,456.01'),(' 123,456.78 '),(Null),('asdad'),('');
With NumericsOnly AS
(
SELECT
-- Pull out the first run of numeric-looking characters, then strip the commas.
REPLACE(Left(SubString(Amt, PatIndex('%[0-9.,-]%', Amt), 8000), PatIndex('%[^0-9.,-]%', SubString(Amt, PatIndex('%[0-9.,-]%', Amt), 8000) + 'X')-1),',','') AS CleanAmt
FROM
@SomeTable
)
SELECT
SUM(CONVERT(DECIMAL(18,2), CleanAmt)) AS TotalAmt
FROM
NumericsOnly
WHERE
IsNumeric(CleanAmt)=1
General methodology is taken from here
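A minimal sketch of the Try_Convert variant mentioned in the edit above (note the behavioral difference: TRY_CONVERT returns NULL for strings that still contain non-numeric characters after the comma removal, so a value like 'abc123,456.01' is skipped rather than having its numeric part extracted):
SELECT SUM(TRY_CONVERT(DECIMAL(18,2), REPLACE(LTRIM(RTRIM(Amt)), ',', ''))) AS TotalAmt
FROM @SomeTable; -- non-convertible rows yield NULL and are ignored by SUM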
I wouldn't use money as a data type as it is notorious for rounding error.
The error is due to SQL's order of operations within your SUM(CAST(REPLACE... expression. The issue can be worked around by summing the column AFTER it has been cleaned and cast in a subquery:
SELECT SUM(Field), ...
FROM (
    SELECT Cast(REPLACE(Amt,',','') as NUMERIC) as Field
    , ...
) [Q]
If the table you're summing is administered by a BI team, get them to stage the data there. Happy data, happy life.

Precision equal to scale in MS SQL-Server

I recently came across a column declaration where the precision is equal to the scale:
...
[MYCOLUMN] [decimal](5, 5) NULL,
...
According to this documentation this seems correct.
But I don't understand: does it mean that a decimal intended to fit in this column can only have digits to the right of the decimal point?
Yes, it means that only fractional values (nothing to the left of the decimal point) are allowed when inserting into the table. Anything else will fail, most likely with a truncation or arithmetic overflow error.
So the possible values for that column are in the following range:
-0.99999 to 0.99999
Have a look at this SQLFiddle. If you uncomment the insert of 1 into the table, you will get an arithmetic overflow error.
Also, in case SQLFiddle ever goes down, here is the code:
create table test (col1 decimal(5,5));
insert into test values (0.12345);
insert into test values (0);
-- insert into test values (1);
select *
from test
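To make the behavior concrete, a couple of extra statements (my own additions, not part of the original fiddle):
insert into test values (0.123456); -- accepted, but rounded to 0.12346 (only 5 digits of scale are kept)
-- insert into test values (1.0);   -- fails: arithmetic overflow, since nothing can sit left of the decimal point
select * from test;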

Is Sql Server's ISNULL() function lazy/short-circuited?

Is ISNULL() a lazy function?
That is, if I code something like the following:
SELECT ISNULL(MYFIELD, getMyFunction()) FROM MYTABLE
will it always evaluate getMyFunction() or will it only evaluate it in the case where MYFIELD is actually null?
This works fine
declare @X int
set @X = 1
select isnull(@X, 1/0)
But introducing an aggregate will make it fail, proving that the second argument can sometimes be evaluated before the first.
declare @X int
set @X = 1
select isnull(@X, min(1/0))
It does whichever the optimizer thinks will work best.
Now it's functionally lazy, which is the important thing. E.g. if col1 is a varchar which will always contain a number when col2 is null, then
isnull(col2, cast(col1 as int))
will work.
However, it's not specified whether it will try the cast before or simultaneously with the null-check and eat the error if col2 isn't null, or if it will only try the cast at all if col2 is null.
At the very least, we would expect it to obtain col1 in any case because a single scan of a table obtaining 2 values is going to be faster than two scans obtaining one each.
The same SQL commands can be executed in very different ways, because the instructions we give are turned into lower-level operations based on knowledge of the indices and statistics about the tables.
For that reason, in terms of performance, the answer is "when it seems like it would be a good idea it is, otherwise it isn't".
In terms of observed behaviour, it is lazy.
Edit: Mikael Eriksson's answer shows that there are cases that may indeed error due to not being lazy. I'll stick by my answer here in terms of the performance impact, but his is vital in terms of correctness impact across at least some cases.
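A small sketch of that col1/col2 scenario (the table and values are my own assumptions, illustrating the claim rather than proving it holds for every plan):
declare @t table (col1 varchar(10), col2 int);
insert into @t values ('123', NULL), ('not a number', 42);
select isnull(col2, cast(col1 as int)) from @t; -- in practice returns 123 and 42; the cast only matters on the NULL row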
Judging from the different behavior of
SELECT ISNULL(1, 1/0)
SELECT ISNULL(NULL, 1/0)
the first SELECT returns 1; the second raises Msg 8134, Level 16, State 1: "Divide by zero error encountered."
This "lazy" feature you are referring to is in fact called "short-circuiting"
And it does NOT always work especially if you have a udf in the ISNULL expression.
Check this article where tests were run to prove it:
Short-circuiting (mainly in VB.Net and SQL Server)
T-SQL is a declarative language, so you cannot control the algorithm used to get the results; you only declare what results you need. It is up to the query engine/optimizer to figure out the cost-effective plan. And in SQL Server, the optimizer uses "contradiction detection", which will never guarantee a left-to-right evaluation as you would assume in procedural languages.
For your example, I did a quick test:
Created the scalar-valued UDF to invoke the Divide by zero error:
CREATE FUNCTION getMyFunction
( @MyValue INT )
RETURNS INT
AS
BEGIN
RETURN (1/0)
END
GO
Running the query below did not give me a "Divide by zero error encountered" error.
DECLARE @test INT
SET @test = 1
SET @test = ISNULL(@test, (dbo.getMyFunction(1)))
SELECT @test
Changing the SET to the statement below (introducing a SELECT inside the ISNULL) did give me the "Divide by zero error encountered" error.
SET @test = ISNULL(@test, (SELECT dbo.getMyFunction(1)))
But with values instead of variables, it never gave me the error.
SELECT ISNULL(1, (dbo.getMyFunction(1)))
SELECT ISNULL(1, (SELECT dbo.getMyFunction(1)))
So unless you really figure out how the optimizer is evaluating these expressions for all permutations, it would be safe to not rely on the short-circuit capabilities of T-SQL.

SQL Cast Mystery

I have a real mystery with the T-SQL below. As written, it works with either the DATAP.Private=1 filter or the cast to int on Right(CRS,1), but not with both. That is, if I uncomment the DATAP.Private=1, I get the error Conversion failed when converting the varchar value 'M' to data type int, and if I then remove that cast, the query works again. With the cast in place, the query only works without the Private=1.
I cannot for the life of me see how the Private=1 can add anything to the result set that will cause the error, unless Private is ever 'M', but Private is a bit field!
SELECT
cast(Right(CRS,1) as int) AS Company
, cast(PerNr as int) AS PN
, Round(Sum(Cost),2) AS Total_Cost
FROM
DATAP
LEFT JOIN BU_Summary ON DATAP.BU=BU_Summary.BU
WHERE
DATAP.Extension Is Not Null
--And DATAP.Private=1
And Left(CRS,2)='SB'
And DATAP.PerNr Between '1' And '9A'
and Right(CRS,1) <> 'm'
GROUP BY
cast(Right(CRS,1) as int)
, cast(PerNr as int)
ORDER BY
cast(PerNr as int)
I've seen something like this in the past. It's possible the DATAP.Private = 1 clause is generating a query plan that is performing the CRS cast before the Right(CRS,1) <> 'm' filter is applied.
It sure shouldn't do that, but I've had similar problems in T-SQL I've written, particularly when views are involved.
You might be able to reorder the elements to get the query to work, or select uncast data values into a temporary table or table variable and then select and cast from there in a separate statement as a stopgap.
If you check the execution plan it might shed some light about what is being calculated where. This might give you more ideas as to what you might change.
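A rough sketch of that stopgap, using the columns from the question (untested against the real schema):
-- Stage the filtered, uncast values first...
SELECT CRS, PerNr, Cost
INTO #datap_stage
FROM DATAP
LEFT JOIN BU_Summary ON DATAP.BU = BU_Summary.BU
WHERE DATAP.Extension IS NOT NULL
AND DATAP.Private = 1
AND LEFT(CRS, 2) = 'SB'
AND DATAP.PerNr BETWEEN '1' AND '9A'
AND RIGHT(CRS, 1) <> 'm';
-- ...then cast and aggregate in a separate statement, so the casts only ever see the filtered rows.
SELECT CAST(RIGHT(CRS, 1) AS int) AS Company, CAST(PerNr AS int) AS PN, ROUND(SUM(Cost), 2) AS Total_Cost
FROM #datap_stage
GROUP BY CAST(RIGHT(CRS, 1) AS int), CAST(PerNr AS int)
ORDER BY CAST(PerNr AS int);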
Just a guess, but it may be that when Private = 1, PerNr cannot be anything but a castable number in your data (whereas, as it is, PerNr can equal '9A' [or whatever else], breaking the cast in the GROUP BY and ORDER BY clauses).
CAST('9A' AS int) fails when I tested it. It looks like you're doing unnecessary CASTs for grouping and sorting. Particularly in the GROUP BY, that will at least kill any chance of optimizing.
