How many times is function called - sql-server

Consider the following query:
SELECT * FROM {tablename} WHERE ColumnId = dbo.GetId()
where dbo.GetId() is a non-deterministic user-defined function. The question is whether dbo.GetId() is called only once for the entire query, with its result then applied to every row, or whether it is called for each row. I think it is called for every row, but I don't know of any way to prove it.
Also, would the following query be more efficient?
DECLARE @Id int
SET @Id = dbo.GetId()
SELECT * FROM {tablename} WHERE ColumnId = @Id

I doubt this is guaranteed anywhere. Use a variable if you want to ensure it.
I amended @Prdp's example
CREATE VIEW vw_rand
AS
SELECT Rand() ran
GO
/*Return 0 or 1 with 50% probability*/
CREATE FUNCTION dbo.Udf_non_deterministic ()
RETURNS INT
AS
BEGIN
RETURN
(SELECT CAST(10000 * ran AS INT) % 2
FROM vw_rand)
END
go
SELECT *
FROM master..spt_values
WHERE dbo.Udf_non_deterministic() = 1
In this case it is only evaluated once. Either all rows are returned or zero.
The reason for this is that the plan has a filter with a startup predicate.
The startup expression predicate is [tempdb].[dbo].[Udf_non_deterministic]()=(1).
This is only evaluated once when the filter is opened to see whether to get rows from the subtree at all - not for each row passing through it.
But conversely the below returns a different number of rows each time indicating that it is evaluated per row. The comparison to the column prevents it being evaluated up front in the filter as with the previous example.
SELECT *
FROM master..spt_values
WHERE dbo.Udf_non_deterministic() = (number - number)
And this rewrite goes back to evaluating once (for me) but CROSS APPLY still gave multiple evaluations.
SELECT *
FROM master..spt_values
OUTER APPLY(SELECT dbo.Udf_non_deterministic() ) AS C(X)
WHERE X = (number - number)
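The "use a variable" advice can be sketched with the same function. This is only an illustration, assuming dbo.Udf_non_deterministic from above already exists; the variable name @Flag is made up for the example. Because the function is evaluated in the DECLARE and not inside the query, each run returns either every row whose predicate matches the single stored value or none at all.

```sql
-- Evaluate the non-deterministic function exactly once, outside the query.
-- @Flag is a hypothetical name used only for this sketch.
DECLARE @Flag int = dbo.Udf_non_deterministic();

-- The predicate now compares a fixed value against each row,
-- so a single run cannot mix "matching" and "non-matching" evaluations.
SELECT *
FROM master..spt_values
WHERE @Flag = (number - number);
```

Since number - number is always 0 (or NULL where number is NULL), a run returns all rows with a non-NULL number when @Flag is 0 and no rows when it is 1.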

Here is one way to prove it
View
A view is created so that a nondeterministic built-in function can be used inside a user-defined function
CREATE VIEW vw_rand
AS
SELECT Rand() ran
Nondeterministic function
Now create a nondeterministic user-defined function using the above view
CREATE FUNCTION Udf_non_deterministic ()
RETURNS FLOAT
AS
BEGIN
RETURN
(SELECT ran
FROM vw_rand)
END
Sample table
CREATE TABLE #test
(
id INT,
name VARCHAR(50)
)
INSERT #test
VALUES (1,'a'),
(2,'b'),
(3,'c'),
(4,'d')
SELECT *, dbo.Udf_non_deterministic() AS non_deterministic_val
FROM #test
Result:
id name non_deterministic_val
1 a 0.203123494465542
2 b 0.888439497446073
3 c 0.633749721616085
4 d 0.104620204364744
As you can see, the function is called for every row.

Yes, it is called once per row.
See the following thread for debugging functions:
SQL Functions - Logging
And yes, the query below is more efficient, as the function is called only once.
DECLARE @Id int
SET @Id = dbo.GetId()
SELECT * FROM {tablename} WHERE ColumnId = @Id

Related

Why does the REPLICATE function reevaluate the value of NEWID when passed into a function that returns a table?

I wrote this function, and then went to test it. The results kind of surprised me. Any idea why this happens?
CREATE FUNCTION dbo.chunk(@input VARCHAR(MAX), @chunkSize INT = 36)
RETURNS TABLE AS
RETURN (
WITH CTE AS (
SELECT SUBSTRING(@input,1,@chunkSize) AS [chunk], 1 AS [row]
UNION ALL
SELECT SUBSTRING(@input,1+([row]*@chunkSize),@chunkSize)
, [row] + 1
FROM cte
WHERE LEN(@input) > ([row]*@chunkSize)
)
SELECT [chunk]
FROM cte
)
GO
/* This does what I would expect this to do. */
DECLARE @input varchar(MAX) = REPLICATE(NEWID(),2);
SELECT * FROM dbo.chunk(@input,36)
/* But this, this is odd. If I call replicate here it calls newid() twice... */
SELECT * FROM dbo.chunk(REPLICATE(NEWID(),2),36);
chunk
E1810B3D-3DD4-4F55-B650-ED2DB28BCF70
E1810B3D-3DD4-4F55-B650-ED2DB28BCF70
chunk
89A26C8B-D5C7-47A8-BBBC-FE859B24E267
F9636F76-1ED6-4D19-A309-BA35EAC9F782
This is expected.
The function you are using is an inline table valued function so the call is inlined into the general execution plan.
The function has an @input parameter of type VARCHAR(MAX) and you are passing REPLICATE(NEWID(),2) to it, so conceptually just
replace all instances of @input in your function with CONVERT(VARCHAR(MAX),REPLICATE(NEWID(),2)).
Similarly just replace all instances of @chunkSize with 36.
Once you do that you end up with the below which gives an identical execution plan and behaviour.
WITH CTE AS (
SELECT SUBSTRING(CONVERT(VARCHAR(MAX),REPLICATE(NEWID(),2)),1,36) AS [chunk], 1 AS [row]
UNION ALL
SELECT SUBSTRING(CONVERT(VARCHAR(MAX),REPLICATE(NEWID(),2)),1+([row]*36),36)
, [row] + 1
FROM cte
WHERE LEN(CONVERT(VARCHAR(MAX),REPLICATE(NEWID(),2))) > ([row]*36)
)
SELECT [chunk]
FROM cte
NEWID() is referenced in the plan three times. Once in the anchor branch of the recursive CTE and twice in the recursive branch. In your case you get one invocation of the anchor and one invocation of the recursive branch so it is evaluated three times. For the expression involving the LEN the re-evaluation doesn't make any difference to the result as the length will be the same even if the actual guid has changed.
Some non deterministic functions are treated as runtime constants but NEWID() is not one of them. That doesn't really help you in this recursive CTE case anyway.
If you try the following...
CREATE OR ALTER FUNCTION dbo.RandRows(@input float, @Rows INT = 10)
RETURNS TABLE AS
RETURN (
WITH CTE AS (
SELECT 1 AS Level, @input as [rand]
WHERE @Rows >= 1
UNION ALL
SELECT Level + 1, @input as [rand]
FROM cte
WHERE @Rows > Level + 1
)
SELECT [rand]
FROM cte
)
GO
SELECT *
FROM dbo.RandRows(RAND(), 10)
It ends up the same as
WITH CTE AS (
SELECT 1 AS Level, rand() as [rand]
WHERE 10 >= 1
UNION ALL
SELECT Level + 1, rand() as [rand]
FROM cte
WHERE 10 > Level + 1
)
SELECT [rand]
FROM cte
The invocation in the anchor branch and recursive branches are treated as two different invocations and given different runtime constant labels so you get one value in the first row and a different value in the subsequent rows.
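The difference between a runtime constant and a per-row evaluation can also be seen without any function at all. The following is a small sketch, with the derived table v(n) invented purely to supply three rows: the RAND() column repeats the same value on every row, while the NEWID() column differs from row to row.

```sql
-- v(n) is a throwaway three-row source used only for this illustration.
SELECT n,
       RAND()  AS r,  -- runtime constant: one value for this reference in the query
       NEWID() AS g   -- not a runtime constant: evaluated again for each row
FROM (VALUES (1), (2), (3)) AS v(n);
```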

SQL using a function in a trigger

I am creating a trigger in SQL that will insert into another table after an insert on it. However, I need to fetch a value from the table and increment it for use in the insert.
I have a AirVisionSiteLog table. On insert on the table I would like for it to insert into another SiteLog table. However in order to do this I need to fetch the last Entry Number of the Site from the SiteLog table. Then on its insert take that result and increase by one for the new Entry Number. I am new to Triggers and Functions so I am not sure how to use them correctly. I believe I have a function to retrieve and increment the Entry Number however I am not sure how to use it in the Trigger.
My Function -
CREATE FUNCTION AQB_RMS.F_GetLogEntryNumber
(@LocationID int)
RETURNS INTEGER
AS
BEGIN
DECLARE
@MaxEntry Integer,
@EntryNumber Integer
Set @MaxEntry = (Select Max(SL.EntryNumber) FROM AQB_MON.AQB_RMS.SiteLog SL
WHERE SL.LocationID = @LocationID)
SET @EntryNumber = @MaxEntry + 1
RETURN @EntryNumber
END
My Trigger and attempt to use the Function -
CREATE TRIGGER [AQB_RMS].[SiteLogCreate] on [AQB_MON].[AQB_RMS].[AirVisionSiteLog]
AFTER INSERT
AS
BEGIN
declare @entrynumber int
declare @corrected int
set @corrected = 0
INSERT INTO [AQB_MON].[AQB_RMS].[SiteLog]
([SiteLogTypeID],[LocationID],[EntryNumber],[SiteLogEntry]
,[EntryDate],[Corrected],[DATE_CREATED],[CREATED_BY])
SELECT st.SiteLogTypeID, l.LocationID,
(select AQB_RMS.F_GetLogEntryNumber from [AQB_MON].[AQB_RMS].[SiteLog] sl
where sl.LocationID = l.LocationID)
, i.SiteLogEntry, i.EntryDate, @corrected, i.DATE_CREATED, i.CREATED_BY
from inserted i
left join AQB_MON.[AQB_RMS].[SiteLogType] st on st.SiteLogType = i.SiteLogType
left join AQB_MON.AQB_RMS.Location l on l.SourceSiteID = i.SourceSiteID
END
GO
I believe that you are close.
At this part of the query in the trigger (I set the columns vertically so that the difference is more noticeable):
SELECT st.SiteLogTypeID,
l.LocationID,
(select AQB_RMS.F_GetLogEntryNumber from [AQB_MON].[AQB_RMS].[SiteLog] sl where sl.LocationID = l.LocationID),
i.SiteLogEntry,
i.EntryDate,
@corrected,
i.DATE_CREATED,
i.CREATED_BY
...should be:
SELECT st.SiteLogTypeID,
l.LocationID,
AQB_RMS.F_GetLogEntryNumber(l.LocationID),
i.SiteLogEntry,
i.EntryDate,
@corrected,
i.DATE_CREATED,
i.CREATED_BY
So basically, you call the function by name and pass the value as its parameter. Because l.LocationID is already available from the join, it can be passed directly. A scalar subquery would also work as the argument, but it must be wrapped in its own parentheses and return at most one row.
Note that in my modified example I pass l.LocationID to the function, so I'm not sure if this is exactly what you need; change that to match your needs. Because I'm not sure of the exact column that you need, add a comment should there be other issues.
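One thing to keep in mind is that the inserted pseudo-table can hold several rows, and a scalar function that reads MAX(EntryNumber) sees the same SiteLog contents for all of them, so two rows for the same location inserted by one statement would receive the same entry number. A possible set-based variant of the trigger's INSERT is sketched below. It assumes that ordering new rows by EntryDate within a location is acceptable, hardcodes 0 for Corrected as the original variable did, and does nothing about concurrent inserts from other sessions.

```sql
-- Sketch only: numbers every inserted row per location in one statement.
INSERT INTO [AQB_MON].[AQB_RMS].[SiteLog]
    ([SiteLogTypeID],[LocationID],[EntryNumber],[SiteLogEntry]
    ,[EntryDate],[Corrected],[DATE_CREATED],[CREATED_BY])
SELECT st.SiteLogTypeID,
       l.LocationID,
       -- current maximum for the location (0 if none) plus a running number
       ISNULL(m.MaxEntry, 0)
           + ROW_NUMBER() OVER (PARTITION BY l.LocationID ORDER BY i.EntryDate),
       i.SiteLogEntry, i.EntryDate, 0, i.DATE_CREATED, i.CREATED_BY
FROM inserted i
LEFT JOIN AQB_MON.[AQB_RMS].[SiteLogType] st ON st.SiteLogType = i.SiteLogType
LEFT JOIN AQB_MON.AQB_RMS.Location l ON l.SourceSiteID = i.SourceSiteID
OUTER APPLY (SELECT MAX(sl.EntryNumber) AS MaxEntry
             FROM AQB_MON.AQB_RMS.SiteLog sl
             WHERE sl.LocationID = l.LocationID) AS m;
```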

Retrieve a random row in a user defined function?

I'm trying to define this function:
CREATE FUNCTION getRandomName ()
RETURNS VARCHAR(48)
AS BEGIN
-- concatenate two random strings from two columns in a table and return as a new string
DECLARE @finalStr VARCHAR(48);
SET @finalStr = (SELECT TOP 1 st1 FROM randomStrings ORDER BY RAND()) +
' ' +
(SELECT TOP 1 st2 FROM randomStrings ORDER BY RAND());
RETURN @finalStr;
END
I can't do this because:
Msg 443, Level 16, State 1, Procedure getRandomName, Line 6
Invalid use of a side-effecting operator 'rand' within a function.
The postings I have found online related to this problem suggest passing in a random value as a parameter when calling the function, or using a view and querying that view in the function to get a single random number into a variable. I can't use those methods because I am trying to use the randomization in the ORDER BY clause.
Is there a way to accomplish this?
(SQL Server 2014)
EDIT:
So you could use a view to get a result as stated below, but now I find myself needing to pass a parameter to the function:
CREATE FUNCTION getRandomName (
@maxPieceSize int
)
RETURNS VARCHAR(48)
AS BEGIN
-- concatenate two random strings from two columns in a table and return as a new string
DECLARE @finalStr VARCHAR(48);
SET @finalStr = (SELECT TOP 1 st1 FROM randomStrings WHERE LEN(st1) <= @maxPieceSize ORDER BY RAND()) +
' ' +
(SELECT TOP 1 st2 FROM randomStrings WHERE LEN(st1) <= @maxPieceSize ORDER BY RAND());
RETURN @finalStr;
END
So I can't create a view for this scenario because you can't pass parameters to views.
So here's my dilemma:
Function: I can't use this because I cannot use any nondeterministic function within a function.
View: I can't use this because I need to pass a parameter to the "function".
Procedure: The only way I can see to do this is to use an output variable, which means declaring a variable, etc. I would not be able to simply do something like EXECUTE getRandomName(6) or SELECT getRandomName(6).
Am I stuck using a procedure and doing it "the hard way" (using an output variable, and having to declare that variable every time I want to use the method)?
EDIT AGAIN:
I tried to write the actual method as a stored procedure, then call that stored procedure from a function which declares the variable, assigns it and then returns it. It made sense. Except....
Msg 557, Level 16, State 2, Line 1
Only functions and some extended stored procedures can be executed from within a function.
I'm guessing SQL Server really doesn't want me to have a function that can return a random value. (Funny, because isn't RAND() a function in its own right?)
Why stick with a function? Use a view as a function:
CREATE view getRandomName
AS
SELECT (SELECT TOP 1 st1 FROM randomStrings ORDER BY Newid()) +
' ' +
(SELECT TOP 1 st1 FROM randomStrings ORDER BY Newid())
as RandomName
GO
SELECT (SELECT RandomName FROM getRandomName) + ' - This is random name'
GO
There is also an old and crazy way to get a random row within a stored procedure:
CREATE PROCEDURE usp_Random_Message
@i INT
AS
SELECT TOP 1 * FROM (
SELECT TOP (@i) * FROM sys.Messages
ORDER BY message_id
) AS a ORDER BY message_id DESC
GO
DECLARE @i INT = CAST(RAND() * 100 as INT);
EXEC usp_Random_Message @i;
First of all,
SELECT TOP 1 st1 FROM randomStrings ORDER BY RAND()
would not return what you expect, because RAND is a run-time constant, which means that the server generates a random number once and uses it for the duration of the query.
You want to arrange all rows in a random order and then pick the top row. The following query would do it:
SELECT TOP 1 st1 FROM randomStrings ORDER BY NEWID()
or
SELECT TOP 1 st1 FROM randomStrings ORDER BY CRYPT_GEN_RANDOM(4)
If you look at the execution plan you'll see that the randomStrings table is scanned in full, then sorted and one top row is picked.
I'm guessing that you want to use your function like this:
SELECT
SomeTable.SomeColumn
,dbo.GetRandomName() AS RandomName
FROM SomeTable
For each row in SomeTable you want to get some random string.
Even if you make your original approach work through some tricks, you would have the randomStrings table scanned in full and sorted (twice) for each row of SomeTable. That is unlikely to be efficient.
One way to make it efficient and avoid tricks is to make sure that the randomStrings table has an int column ID with values from 1 to the number of rows in the table. Make it the primary key as well.
Then your function would accept two parameters - two random numbers in the range 1..N and the function would build the random string using the given IDs.
The function may look like this:
CREATE FUNCTION dbo.GetRandomName
(
@ParamID1 int
,@ParamID2 int
)
RETURNS VARCHAR(48)
AS
BEGIN
DECLARE @FinalStr VARCHAR(48);
SET @FinalStr =
(SELECT st1 FROM randomStrings WHERE ID = @ParamID1)
+ ' ' +
(SELECT st1 FROM randomStrings WHERE ID = @ParamID2)
;
RETURN @FinalStr;
END
If randomStrings table has 100 rows with IDs from 1 to 100, then usage of this function may look like this:
SELECT
SomeTable.SomeColumn
,dbo.GetRandomName(
(CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) * 100 + 1
,(CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) * 100 + 1
) AS RandomName
FROM SomeTable
CRYPT_GEN_RANDOM(4) generates 4 random bytes; they are cast to int and converted to a number between 0 and 1, which is multiplied by the number of rows in the randomStrings table (100). It is just one of the methods to generate a random number in the range 1...N.
CRYPT_GEN_RANDOM generates a different random number each time it is called and it is called twice per row, so you should get expected results.

selecting random data from a predefined list

I have a list of employee IDs, let's say:
Q1654
F2597
Y9405
B6735
D8732
C4893
I9732
L1060
H6720
These values are not in any one of my tables, but I want to create a function that will take in no parameters and return a random value from this list. How can I do this?
Without getting into random number theory, here's one method:
http://sqlfiddle.com/#!6/192f2/1/0
It basically uses the NEWID function to generate a random value, then sorts by it and returns the top 1 record.
Given that it needs to be in a function, and functions can't use NEWID, putting a view in the middle eliminates the problem.
SELECT * INTO myRand FROM (
SELECT 'Q1654' AS val UNION
SELECT 'F2597' UNION
SELECT 'Y9405' UNION
SELECT 'B6735' UNION
SELECT 'D8732' UNION
SELECT 'C4893' UNION
SELECT 'I9732' UNION
SELECT 'L1060' UNION
SELECT 'H6720') b;
GO
CREATE VIEW vMyRand AS
SELECT TOP 1 val FROM myRand ORDER BY NEWID();
GO
CREATE FUNCTION GetMyRand ()
RETURNS varchar(5)
--WITH EXECUTE AS CALLER
AS
BEGIN
DECLARE @RetValue varchar(5)
--@configVar =
SELECT @RetValue = val FROM vMyRand
RETURN(@RetValue)
END;

comparing data via a function

I have two sets of data (locations) in separate tables and I need to compare if they match or not. I have a UDF which performs a calculation based upon 5 values from each table.
How do I perform a select with a join using this udf?
my udf is basically defined by....
ALTER FUNCTION [dbo].[MatchRanking]
(
@Latitude FLOAT
, @Longitude FLOAT
, @Postcode VARCHAR(16)
, @CompanyName VARCHAR(256)
, @TelephoneNumber VARCHAR(32)
, @Latitude2 FLOAT
, @Longitude2 FLOAT
, @Postcode2 VARCHAR(16)
, @CompanyName2 VARCHAR(256)
, @TelephoneNumber2 VARCHAR(32)
)
RETURNS INT
WITH EXECUTE AS CALLER
AS
BEGIN
DECLARE @RetVal INT
DECLARE @PostcodeVal INT
SET @RetVal = 0
SET @PostcodeVal = 0
SET @RetVal = @RetVal + dbo.FuzzyLogicStringMatch(@CompanyName, @CompanyName2)
IF @RetVal = 1 AND dbo.TelephoneNoStringMatch(@TelephoneNumber, @TelephoneNumber2) = 1
RETURN 5
ELSE
IF @RetVal = 1 AND dbo.FuzzyLogicStringMatch(@Postcode, @Postcode2) = 1
RETURN 5
ELSE
IF @RetVal = 1 AND ROUND(@Latitude,4) = ROUND(@Latitude2,4) AND ROUND(@Longitude,4) = ROUND(@Longitude2,4)
RETURN 5
ELSE
IF (@RetVal = 1 AND ROUND(@Latitude,4) = ROUND(@Latitude2,4)) OR (@RetVal = 1 AND ROUND(@Longitude,4) = ROUND(@Longitude2,4))
SET @RetVal = 2
ELSE
BEGIN
IF ROUND(@Latitude,4) = ROUND(@Latitude2,4)
SET @RetVal = @RetVal + 1
IF ROUND(@Longitude,4) = ROUND(@Longitude2,4)
SET @RetVal = @RetVal + 1
SET @RetVal = @RetVal + dbo.TelephoneNoStringMatch(@TelephoneNumber, @TelephoneNumber2)
SET @RetVal = @RetVal + dbo.FuzzyLogicStringMatch(@Postcode, @Postcode2)
END
RETURN @RetVal
END
This is the previous code that I am trying to fix:
SELECT li.LImportId, l.LocationId, dbo.MatchRanking(li.Latitude, li.Longitude, li.[Name], li.Postcode, li.TelephoneNumber,
l.Latitude, l.Longitude, l.CompanyName, l.Postcode, l.TelephoneNumber
) AS [MatchRanking]
FROM #LocImport li
LEFT JOIN [Location] l
ON lI.[Latitude] = l.[Latitude]
OR lI.[Longitude] = l.[Longitude]
OR lI.[Postcode] = l.[Postcode]
OR lI.[Name] = l.[CompanyName]
OR lI.[TelephoneNumber] = l.[TelephoneNumber]
What was wrong with your original JOIN? That should perform much faster than this function.
This should do it, but I think it will be really slow:
SELECT
...
FROM Table1 t1
CROSS JOIN Table2 t2
WHERE dbo.MatchRanking(t1.Latitude ,..,..,t2.Latitude ,..)=1 --"1" or whatever return value is a match
One thing we did was set up a separate table to store the results of the cross join so they only have to be calculated once. Then have a job that runs nightly to pick up any new records and populate them against all the old records. After all, the lat/longs are not going to change (except for the occasional typo, which the nightly job can find and fix), and it doesn't make sense to do this calculation every time you run a query to find the distances when the calculation will always be the same for most of the numbers. Once this data is in a table, you can query it very quickly. Populating the table the first time might take a while.
Personally, I would do this kind of fuzzy data matching in several passes.
-create an xref table which contain the keys for the records that match
-get all that match exactly and insert the keys in the xref table
-"fuzzify" your criteria and search again but only in those records that do not already have a match in the xref.
-rinse and repeat by expanding and/or fuzzifying your criteria until the matches you get are garbage.
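The first two passes of that approach might look like the sketch below. The table #xref, the int key types, and the particular columns chosen for the exact and fuzzy passes are assumptions for illustration; #LocImport, Location and dbo.FuzzyLogicStringMatch are taken from the question.

```sql
-- Hypothetical xref table holding the keys of matched record pairs.
CREATE TABLE #xref (LImportId int NOT NULL, LocationId int NOT NULL);

-- Pass 1: exact matches on several columns (cheap, can use indexes).
INSERT INTO #xref (LImportId, LocationId)
SELECT li.LImportId, l.LocationId
FROM #LocImport li
JOIN [Location] l
  ON  li.[Postcode] = l.[Postcode]
  AND li.[Name] = l.[CompanyName]
  AND li.[TelephoneNumber] = l.[TelephoneNumber];

-- Pass 2: looser criteria, but only for import rows still unmatched.
INSERT INTO #xref (LImportId, LocationId)
SELECT li.LImportId, l.LocationId
FROM #LocImport li
JOIN [Location] l
  ON li.[Postcode] = l.[Postcode]
WHERE NOT EXISTS (SELECT 1 FROM #xref x WHERE x.LImportId = li.LImportId)
  AND dbo.FuzzyLogicStringMatch(li.[Name], l.[CompanyName]) = 1;
```

Each later pass repeats the second statement with wider join conditions, so the expensive scalar function only ever runs over the shrinking set of unmatched rows.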
You will have to cross join the two tables (m and n yields m x n compares) first and compare to find matches - there is really no other simple way.
However, with meta-understanding of the data and its interpretation and your goals, if you can somehow filter your set to eliminate items from the cross join fairly easily, that would help - especially if there's anything that has to be an exact match, or any way to partition the data so that items in different partitions would never be compared (i.e. comparing a US and Europe location would always have match rank 0)
I would say that you could use the Lat and Long to eliminate completely if they differ by a certain amount, but it looks like they are used to improve the match ranking, not to negatively eliminate items from being match ranked.
And scalar functions (your FuzzyMatches) called repeatedly (like in a multi-million row cross join) are tremendously expensive.
It looks to me like you could extract the first match and the inner else to take place in your cross join (and inline if possible, not as a UDF) so that they can be somewhat optimized by the query optimizer in conjunction with the cross join instead of in a black box called m x n times.
Another possibility is to pre-extract only the distinct Phone number pairs, post code pairs etc.
SELECT Postcode1, Postcode2, dbo.FuzzyLogicStringMatch(Postcode1, Postcode2) AS MatchRank
FROM (
SELECT DISTINCT Postcode AS Postcode1
FROM Table1
) AS Postcodes1
CROSS JOIN
(
SELECT DISTINCT Postcode AS Postcode2
FROM Table2
) AS Postcodes2
If your function is symmetric, you can further reduce this space over which to call the UDF with some extra work (it's easier if the table is a self-join, you just use an upper right triangle).
Now you have the minimal set of compares for your scalar UDF to be called over. Put this result into a table indexed on the two columns.
You can do similar for all your UDF parameter sets. If the function definition isn't changing, you only need to add new combinations to your table over time, turning an expensive scalar function call into a table-lookup on relatively slower growing data. You can even use a CASE statement to fall back on the UDF call inline if the lookup table doesn't have an entry - so you can decide whether to keep the lookup tables comprehensive or not.
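A rough sketch of the lookup-table idea for the postcode pairs follows. The table name dbo.PostcodeMatch and its column sizes are assumptions (the sizes mirror the VARCHAR(16) postcode parameters in the question); Table1, Table2 and dbo.FuzzyLogicStringMatch are the names already used above.

```sql
-- Hypothetical lookup table, indexed on the two columns via its primary key.
CREATE TABLE dbo.PostcodeMatch (
    Postcode1 varchar(16) NOT NULL,
    Postcode2 varchar(16) NOT NULL,
    MatchRank int NOT NULL,
    CONSTRAINT PK_PostcodeMatch PRIMARY KEY (Postcode1, Postcode2)
);
GO
-- Call the scalar UDF once per distinct pair and store the result.
INSERT INTO dbo.PostcodeMatch (Postcode1, Postcode2, MatchRank)
SELECT p1.Postcode, p2.Postcode,
       dbo.FuzzyLogicStringMatch(p1.Postcode, p2.Postcode)
FROM (SELECT DISTINCT Postcode FROM Table1 WHERE Postcode IS NOT NULL) AS p1
CROSS JOIN
     (SELECT DISTINCT Postcode FROM Table2 WHERE Postcode IS NOT NULL) AS p2;
GO
-- Use the stored rank when present, falling back to the UDF otherwise.
SELECT t1.Postcode AS Postcode1,
       t2.Postcode AS Postcode2,
       CASE WHEN pm.MatchRank IS NOT NULL THEN pm.MatchRank
            ELSE dbo.FuzzyLogicStringMatch(t1.Postcode, t2.Postcode)
       END AS PostcodeRank
FROM Table1 t1
CROSS JOIN Table2 t2
LEFT JOIN dbo.PostcodeMatch pm
  ON pm.Postcode1 = t1.Postcode AND pm.Postcode2 = t2.Postcode;
```

If the lookup table is kept comprehensive, the CASE fallback can be dropped and the final query becomes a plain join.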

Resources