I'm trying to use the Divide function in SSAS:
https://msdn.microsoft.com/en-us/library/jj873944(v=sql.110).aspx
But I have to support SQL Server 2008, and it looks like this is not available there. The initial problem I am having is that when I add a CASE statement to a measure calculation, the performance of the query is VERY poor. It's been suggested that I use this Divide function.
The statement to create the member is:
Create Member CurrentCube.[Measures].[AvgSentiment] As
CASE
WHEN ([Measures].[SentimentCount]) > 0 THEN [Measures].[SentimentSum] / [Measures].[SentimentCount]
WHEN([Measures].[SentimentCount]) = 0 THEN 0
END
, VISIBLE =1
, ASSOCIATED_MEASURE_GROUP = 'vw_CUBE_FACT' ;
Which I tried replacing with:
Create Member CurrentCube.[Measures].[AvgSentiment] As
Divide ([Measures].[SentimentSum], [Measures].[SentimentCount], 0)
, VISIBLE =1
, ASSOCIATED_MEASURE_GROUP = 'vw_CUBE_FACT'
I also tried:
Create Member CurrentCube.[Measures].[AvgSentiment] As
IIF(
[Measures].[SentimentCount] = 0
, 0
, [Measures].[SentimentSum]/[Measures].[SentimentCount]
)
, VISIBLE =1
, ASSOCIATED_MEASURE_GROUP = 'vw_SEAMS_CUBE_FACT'
This also ate up a tonne of CPU / Memory.
Divide won't make a huge difference.
Your third attempt is almost what you want. What you aim to do with cube measures is make them "sparse": in my interpretation, you want them to exist only in the parts of the cube where they should exist, and everywhere else the cube space should be empty. This is achieved by using null in your measure rather than 0:
CREATE MEMBER CurrentCube.[Measures].[AvgSentiment] As
IIF(
[Measures].[SentimentCount] = 0
, NULL //<<got rid of the 0
, [Measures].[SentimentSum]
/[Measures].[SentimentCount]
)
, VISIBLE =1
, ASSOCIATED_MEASURE_GROUP = 'vw_SEAMS_CUBE_FACT'
IIF generally performs better than CASE, so this is better than adapting your attempt that uses CASE.
If the above still performs badly then I suspect you need to investigate the performance of [Measures].[SentimentCount] - how is this calculated?
Related
When I use CASE .. WHEN .. END I get an index scan, which is less efficient than an index seek.
My business rules are complex and I need to use CASE; is there any workaround?
Query A:
select * from [dbo].[Mobile]
where ((
CASE
WHEN [MobileNumber] = (LTRIM(RTRIM('987654321'))) THEN 1
END
) = 1)
This query gets an index scan and 199 logical reads.
Query B:
select * from [dbo].[Mobile]
where ([MobileNumber] = (LTRIM(RTRIM('987654321'))))
This query gets an index seek and 122 logical reads.
For the table
CREATE TABLE #T(X CHAR(1) PRIMARY KEY);
And the query
SELECT *
FROM #T
WHERE CASE WHEN X = 'A' THEN 1 ELSE 0 END = 1;
It is apparent without that much thought that the only circumstances in which the CASE expression evaluates to 1 are when X = 'A' and that the query has the same semantics as
SELECT *
FROM #T
WHERE X = 'A';
However the first query will get a scan and the second one a seek.
The SQL Server optimiser will try all sorts of relational transformations on queries but will not even attempt to rearrange expressions such as CASE WHEN X = 'A' THEN 1 ELSE 0 END = 1 to express it as an X = expression so it can perform an index seek on it.
It is up to the query writer to write their queries in such a way that they are sargable.
There is no workaround to get an index seek on column MobileNumber with your existing CASE predicate. You just need to express the condition differently (as in your example B).
Potentially you could create a computed column with the CASE expression and index that - and you could then see an index seek on the new column. However this is unlikely to be useful to you as I assume in reality the mobile number 987654321 is dynamic and not something to be hardcoded into a column used by an index.
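For what it's worth, a minimal sketch of that computed-column idea (the column name IsTargetNumber is made up here, and the hardcoded value is exactly the limitation described above):

-- Hypothetical: persist the CASE result as a computed column and index it.
ALTER TABLE dbo.Mobile
    ADD IsTargetNumber AS CASE WHEN MobileNumber = '987654321' THEN 1 ELSE 0 END;

CREATE INDEX IX_Mobile_IsTargetNumber ON dbo.Mobile (IsTargetNumber);

-- This predicate can now use a seek on the new index:
SELECT * FROM dbo.Mobile WHERE IsTargetNumber = 1;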
After cleaning up your code, you have a WHERE which is a boolean expression built around a CASE.
As mentioned by @MartinSmith, there is simply no way SQL Server will re-arrange this. It does not do the kind of dynamic slicing that would allow it to re-arrange the first query into the second version.
select *
from [dbo].[Mobile]
where
CASE
WHEN [MobileNumber] = LTRIM(RTRIM('987654321'))
THEN 1
END
= 1
You may ask: the second version also has an expression in it, why does this not also get a scan?
select *
from [dbo].[Mobile]
where [MobileNumber] = LTRIM(RTRIM('987654321'))
The reason is that SQL Server can recognize that LTRIM(RTRIM('987654321')) is a deterministic constant expression: it does not change depending on runtime settings, nor on the result of in-row calculations.
Therefore, it can optimize by calculating it at compile time. The query therefore becomes this under the hood, which can be used against an index on MobileNumber.
select *
from [dbo].[Mobile]
where [MobileNumber] = '987654321'
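By contrast, when the functions wrap the column rather than the constant, there is nothing to fold: the expression must be evaluated per row, so the predicate is no longer sargable. A quick sketch against the same table:

select *
from [dbo].[Mobile]
where LTRIM(RTRIM([MobileNumber])) = '987654321' -- column inside the functions: scan, not seek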
I'm tasked with pulling relevant data out of a field which is essentially free text. I have been able to get the information I need 98% of the time by looking for keywords and using CASE statements to break the field down into 5 different fields.
My issue is I can't get around the last 2% because the errors don't follow any logical order - they are mostly misspellings.
I could bypass the field with a TRY CATCH, but I don't like giving up 4 good pieces of data when the routine is choking on one.
Is there any way to handle blanket errors within a CASE statement, or is there another option?
Current code; the 'b' with the commented-out section is where it's choking right now:
CASE WHEN @Location = 0 THEN
    CASE WHEN @Duration = 0 THEN
        CASE WHEN @Timing = 0 THEN
            SUBSTRING(@Comment, @Begin, @Context - @Begin)
        ELSE
            SUBSTRING(@Comment, @Begin, @Timing - @Begin)
        END
    ELSE SUBSTRING(@Comment, @Begin, @Duration - @Begin)
    END
ELSE SUBSTRING(@Comment, @Begin, @Location - @Begin)
END AS Complaint
,CASE WHEN @Location = 0 THEN ''
ELSE
    CASE WHEN @Duration = 0 THEN
        CASE WHEN @Timing = 0 THEN SUBSTRING(@Comment, @Location + 10, (@CntBegin - 11))
        ELSE SUBSTRING(@Comment, @Location + 10, @Timing - (@Location + 10))
        END
    ELSE SUBSTRING(@Comment, @Location + 10, @Duration - (@Location + 10))
    END
END AS Location
,CASE WHEN @Timing = 0 THEN ''
ELSE
    CASE WHEN @CntBegin = 0 THEN
        SUBSTRING(@Comment, @Timing + @TimingEnd, (@Location + @Context) - (@Timing + @TimingEnd))
    ELSE
        'b' --SUBSTRING(@Comment, @Timing + @TimingEnd, (@Location + @CntBegin - 1) - (@Timing + @TimingEnd))
    END
END AS Timing
It chokes on this statement, which has a comma in an odd spot. I usually have to reference the comma for @CntBegin, but in this case it makes my (@Location + @CntBegin - 1) shorter than (@Timing + @TimingEnd):
'Pt also presents with/for mild check MGP/MGD located in OU, since 12/2015 ? Stability.'
Please take into account that I'm not necessarily trying to fix this particular error; I'm looking for a way to handle any error that comes up, as who knows what someone is going to type. I'd like to just display 'ERR' in that field when the code runs into something it can't handle. I just don't want the routine to die.
I'm assuming your error is due to the length parameter of SUBSTRING being less than 0. I always alias my parameters using CROSS APPLY and then validate the input before calling SUBSTRING(). Something like this should work:
SELECT
    CASE WHEN CA.StringLen > 0 /* ensure valid length */
        THEN SUBSTRING(@Comment, @Timing + @TimingEnd, CA.StringLen)
        ELSE 'Error'
    END
FROM YourTable
CROSS APPLY (SELECT StringLen = (@Location + @CntBegin - 1) - (@Timing + @TimingEnd)) AS CA
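To see the guard in isolation, here is a minimal, self-contained sketch; the variable values are invented to reproduce a negative computed length like the failing sentence above:

DECLARE @Comment varchar(100) = 'Pt also presents with/for mild check MGP/MGD located in OU, since 12/2015 ? Stability.';
DECLARE @Timing int = 30, @TimingEnd int = 5, @Location int = 10, @CntBegin int = 3;

SELECT CASE WHEN CA.StringLen > 0 /* SUBSTRING errors on negative lengths */
            THEN SUBSTRING(@Comment, @Timing + @TimingEnd, CA.StringLen)
            ELSE 'ERR' /* the marker the question asked for */
       END AS Timing
FROM (VALUES (1)) AS v(x)
CROSS APPLY (SELECT StringLen = (@Location + @CntBegin - 1) - (@Timing + @TimingEnd)) AS CA;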
Let's say I have a table lines
 b |    a
---+------
17 | 7000
17 |    0
18 | 6000
18 |    0
19 | 5000
19 | 2500
I want to get the positive values of the function (a1 - a2) / (b2 - b1) for all elements of the cartesian product of lines with different b's. (If you are interested, these are the x-coordinates of the intersections of the lines y1 = b1*x + a1 and y2 = b2*x + a2.)
I wrote query1 for that purpose:
SELECT temp.point FROM
(SELECT DISTINCT ((l1.a - l2.a) / (l2.b - l1.b)) AS point
FROM lines AS l1
CROSS JOIN lines AS l2
WHERE l1.b != l2.b
) AS temp
WHERE temp.point > 0
It throws a "division by zero" error. I tried the same query without the outer WHERE clause (query2) and it works just fine:
SELECT temp.point FROM
(SELECT DISTINCT ((l1.a - l2.a) / (l2.b - l1.b)) AS point
FROM lines AS l1
CROSS JOIN lines AS l2
WHERE l1.b != l2.b
) AS temp
as does the variation using a user-defined SQL function (query3):
CREATE FUNCTION get_point(@a1 DECIMAL(18, 4), @a2 DECIMAL(18, 4), @b1 INT, @b2 INT)
RETURNS DECIMAL(18, 4)
WITH EXECUTE AS CALLER
AS
BEGIN
    RETURN (SELECT (@a1 - @a2) / (@b2 - @b1))
END
GO
SELECT temp.point FROM
(SELECT DISTINCT dbo.get_point(l1.a, l2.a, l1.b, l2.b) AS point
FROM lines AS l1
CROSS JOIN lines AS l2
WHERE l1.b != l2.b
) AS temp
WHERE temp.point > 0
I have an intuitive assumption that the outer SELECT shouldn't affect the way the nested SELECT is executed (or at least shouldn't break it). Even if that is not true, it wouldn't explain why query3 works when query1 doesn't.
Could someone explain the principle behind this? That would be much appreciated.
If you want to guarantee that the query will always work, you'd need to wrap your calculation in something like a CASE expression:
case when l2.b - l1.b = 0
then null
else (l1.a - l2.a) / (l2.b - l1.b)
end
Technically, the optimizer is perfectly free to evaluate conditions in whatever order it expects will be more efficient. The optimizer is free to evaluate the division before the where clause that filters out rows where the divisor would be 0. It is also free to evaluate the where clause first. Your different queries have different query plans which result in different behavior.
Realistically, though, even though a particular query might have a "good" query plan today, there is no guarantee that the optimizer won't decide in a day, a month, or a year to change the query plan to something that would throw a division by 0 error. I suppose you could decide to use a bunch of hints or plan guides to force a particular plan with a particular behavior. But that tends to be the sort of thing that bites you in the hind quarters later. Wrapping the calculation in a CASE (or otherwise preventing the division by 0 error) will be much safer and easier to explain to the next developer.
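A common alternative to the CASE wrapper, not mentioned above, is NULLIF: it turns a zero divisor into NULL, so the division yields NULL instead of raising an error, and those rows then simply fail the temp.point > 0 filter:

SELECT temp.point FROM
(SELECT DISTINCT ((l1.a - l2.a) / NULLIF(l2.b - l1.b, 0)) AS point
FROM lines AS l1
CROSS JOIN lines AS l2
WHERE l1.b != l2.b
) AS temp
WHERE temp.point > 0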
I have a table with a column of bit values. I want to write a function that returns true if all records of an associated item are true.
One way I found of doing it is:
SELECT @Ret = CAST(MIN(CAST(IsCapped AS tinyint)) AS bit)
FROM ContractCover cc
INNER JOIN ContractRiskVersion crv ON cc.ContractRiskId = crv.ContractRiskId
WHERE crv.ContractVersionId = @ContractVersionId
    AND cc.IsActive = 1

RETURN @Ret
But is the casting to int to get the minimum expensive? Should I instead query something like COUNT(Id) WHERE IsCapped = 0, returning false when the count is greater than 0, rather than doing the multiple casts?
In the execution plan it doesn't seem like calling this function is heavy (but I'm not too familiar with analysing query plans; it just seems to have about the same % cost as another section of the stored proc, around 2%).
Edit: when I execute the stored proc which calls the function and look at the execution plan, the part where it calls the function has a query cost (relative to the batch) of 1%, which is comparable to other sections of the stored proc. Unless I'm looking at the wrong thing :)
Thanks!!
I would do this with an EXISTS check, as it will jump out of the query the moment it finds one record where IsCapped = 0, whereas your query will always read all the data.
CREATE FUNCTION dbo.fn_are_contracts_capped(@ContractVersionId int)
RETURNS bit
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @return_value bit

    IF EXISTS(
        SELECT 1
        FROM dbo.ContractCover cc
        JOIN dbo.ContractRiskVersion crv
            ON cc.ContractRiskId = crv.ContractRiskId
        WHERE crv.ContractVersionId = @ContractVersionId
            AND cc.IsActive = 1
            AND IsCapped = 0)
    BEGIN
        SET @return_value = 0
    END
    ELSE
    BEGIN
        SET @return_value = 1
    END

    RETURN @return_value
END
Compared to the IO required to read the data, the cast will not add a lot of overhead.
Edit: wrapped code in a scalar function.
Casting in the SELECT would be CPU and memory bound. Not sure how much in this case--under normal circumstances we usually try to optimize for IO first, and then worry about CPU and memory second. So I don't have a definite answer for you there.
That said, the problem with this particular solution to your problem is that it won't short-circuit. SQL Server will read out all rows where ContractVersionId = @ContractVersionId and IsActive = 1, convert IsCapped to an INT, and take the min, where really, you can quit as soon as you find a single row where IsCapped = 0. It won't matter much if ContractVersionId is highly selective and only returns a very small fraction of the table, or if most rows are capped. But if ContractVersionId is not highly selective, or if a high percentage of the rows are uncapped, then you are asking SQL Server to do too much work.
The second consideration is that scalar-valued functions are a notorious performance drag in SQL Server. It is better to create an inline table-valued function if possible, e.g.:
create function AreAllCapped(@ContractVersionId int)
returns table as return (
    select
        ContractVersionId = @ContractVersionId
        , AreAllCapped = case when exists (
            select *
            from ContractCover cc
            join ContractRiskVersion crv on cc.ContractRiskId = crv.ContractRiskId
            where crv.ContractVersionId = @ContractVersionId
                and cc.IsActive = 1
                and IsCapped = 0
        )
        then 0 else 1 end
)
You can then call it using CROSS APPLY in the FROM clause (assuming SQL 2005 or later).
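For example, something like this sketch (assuming you drive it from ContractRiskVersion, which carries the ContractVersionId):

SELECT DISTINCT crv.ContractVersionId, f.AreAllCapped
FROM ContractRiskVersion AS crv
CROSS APPLY dbo.AreAllCapped(crv.ContractVersionId) AS f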
Final note: taking the count where IsCapped = 0 has similar problems. It's like the difference between Any() and Count() in LINQ, if you are familiar. Any() will short-circuit, Count() has to actually count all the elements. SELECT COUNT(*) ... WHERE IsCapped = 0 still has to count all the rows, even though a single row is all you need to move on.
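The difference in miniature (a sketch, reusing the question's tables):

-- EXISTS can stop at the first row it finds:
IF EXISTS (SELECT * FROM ContractCover WHERE IsCapped = 0)
    PRINT 'at least one uncapped';

-- COUNT(*) has to count every matching row before the comparison runs:
IF (SELECT COUNT(*) FROM ContractCover WHERE IsCapped = 0) > 0
    PRINT 'at least one uncapped';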
Of course, it is a known fact that a bit column can't be passed as an argument to an aggregate function (and thus, if it needs to be aggregated, it has to be cast to an integer first), but bit columns can be sorted on. Your query, therefore, could be rewritten like this:
SELECT TOP 1 @Ret = IsCapped
FROM ContractCover cc
INNER JOIN ContractRiskVersion crv ON cc.ContractRiskId = crv.ContractRiskId
WHERE crv.ContractVersionId = @ContractVersionId
    AND cc.IsActive = 1
ORDER BY IsCapped;
Note that in this particular query it is assumed that IsCapped can't be NULL. If it can, you'll need to add an additional filter to the WHERE clause:
AND IsCapped IS NOT NULL
Unless, of course, you would actually prefer to return NULL instead of 0 when any values are NULL.
As for the cost of casting, I don't really have anything to add to what has already been said by Filip and Peter. I do find it a nuisance that bit data require casting before aggregating, but that's never a primary concern.
Without using custom functions, is it possible in SQLite to do the following? I have two tables, which are linked via common id numbers. In the second table there are two variables (and there may be more than two). What I would like is a list of results consisting of the row id, plus NULL if all instances of those variables are NULL, 1 if they are all 0, and 2 if one or more is 1.
What I have right now is as follows:
SELECT
a.aid,
(SELECT count(*) from W3S19 b WHERE a.aid=b.aid) as num,
(SELECT count(*) FROM W3S19 c WHERE a.aid=c.aid AND H110 IS NULL AND H112 IS NULL) as num_null,
(SELECT count(*) FROM W3S19 d WHERE a.aid=d.aid AND (H110=1 or H112=1)) AS num_yes
FROM W3 a
So what this requires is to step through each result as follows (rough Python pseudocode):
if row['num_yes'] > 0:
out[aid] = 2
elif row['num_null'] == row['num']:
out[aid] = 'NULL'
else:
out[aid] = 1
Is there an easier way? Thanks!
Use CASE...WHEN, e.g.
CASE x WHEN w1 THEN r1 WHEN w2 THEN r2 ELSE r3 END
Read more in the SQLite syntax documentation (see the section "The CASE expression").
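Applied to your query, it could look like the following sketch; it uses the searched form of CASE (CASE WHEN condition THEN result ... END) and folds your Python logic straight into the SELECT (table and column names are taken from your question):

SELECT a.aid,
  CASE
    WHEN (SELECT count(*) FROM W3S19 d
          WHERE a.aid = d.aid AND (H110 = 1 OR H112 = 1)) > 0
      THEN 2
    WHEN (SELECT count(*) FROM W3S19 c
          WHERE a.aid = c.aid AND H110 IS NULL AND H112 IS NULL)
       = (SELECT count(*) FROM W3S19 b WHERE a.aid = b.aid)
      THEN NULL
    ELSE 1
  END AS result
FROM W3 a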
There's another way, for numeric values, which might be easier in certain specific cases.
It's based on the fact that boolean values are 1 or 0, and a comparison yields a boolean result
(this will only work for an "or"-style condition, depending on the usage):
SELECT (w1=TRUE)*r1 + (w2=TRUE)*r2 + ...
Of course, @evan's answer is the general-purpose, correct one.