Retrieve a random row in a user defined function?

I'm trying to define this function:
CREATE FUNCTION getRandomName ()
RETURNS VARCHAR(48)
AS BEGIN
-- concatenate two random strings from two columns in a table and return as a new string
DECLARE @finalStr VARCHAR(48);
SET @finalStr = (SELECT TOP 1 st1 FROM randomStrings ORDER BY RAND()) +
' ' +
(SELECT TOP 1 st2 FROM randomStrings ORDER BY RAND());
RETURN @finalStr;
END
I can't do this because:
Msg 443, Level 16, State 1, Procedure getRandomName, Line 6
Invalid use of a side-effecting operator 'rand' within a function.
The postings I have found online related to this problem suggest passing in a random value as a parameter when calling the function, or using a view and querying that view in the function to get a single random number into a variable. I can't use those methods because I am trying to use the randomization in the ORDER BY clause.
Is there a way to accomplish this?
(SQL Server 2014)
EDIT:
So you could use a view to get a result as stated below, but now I find myself needing to pass a parameter to the function:
CREATE FUNCTION getRandomName (
@maxPieceSize int
)
RETURNS VARCHAR(48)
AS BEGIN
-- concatenate two random strings from two columns in a table and return as a new string
DECLARE @finalStr VARCHAR(48);
SET @finalStr = (SELECT TOP 1 st1 FROM randomStrings WHERE LEN(st1) <= @maxPieceSize ORDER BY RAND()) +
' ' +
(SELECT TOP 1 st2 FROM randomStrings WHERE LEN(st1) <= @maxPieceSize ORDER BY RAND());
RETURN @finalStr;
END
So I can't create a view for this scenario because you can't pass parameters to views.
So here's my dilemma:
Function: I can't use this because I cannot use any nondeterministic function within a function.
View: I can't use this because I need to pass a parameter to the "function".
Procedure: The only way I can see to do this is to use an output variable, which means declaring a variable, etc. I would not be able to simply do something like EXECUTE getRandomName(6) or SELECT getRandomName(6).
Am I stuck using a procedure and doing it "the hard way" (using an output variable, and having to declare that variable every time I want to use the method)?
EDIT AGAIN:
I tried to write the actual method as a stored procedure, then call that stored procedure from a function which declares the variable, assigns it and then returns it. It made sense. Except....
Msg 557, Level 16, State 2, Line 1
Only functions and some extended stored procedures can be executed from within a function.
I'm guessing SQL Server really doesn't want me to have a function that can return a random value. (Funny, because isn't RAND() a function in its own right?)

Why are you stuck on using a function? Use a view in place of the function:
CREATE view getRandomName
AS
SELECT (SELECT TOP 1 st1 FROM randomStrings ORDER BY Newid()) +
' ' +
(SELECT TOP 1 st1 FROM randomStrings ORDER BY Newid())
as RandomName
GO
SELECT (SELECT RandomName FROM getRandomName) + ' - This is random name'
GO
There is also an old, hacky way to get a random row from a stored procedure:
CREATE PROCEDURE usp_Random_Message
@i INT
AS
SELECT TOP 1 * FROM (
SELECT TOP (@i) * FROM sys.messages
ORDER BY message_id
) AS a ORDER BY message_id DESC
GO
-- Add 1 so that @i is never 0, which would return no row.
DECLARE @i INT = CAST(RAND() * 100 AS INT) + 1;
EXEC usp_Random_Message @i;

First of all,
SELECT TOP 1 st1 FROM randomStrings ORDER BY RAND()
would not return what you expect, because RAND() is a run-time constant: the server generates one random number and reuses it for every row for the duration of the query.
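A quick way to see the difference, as a minimal sketch: the query below selects RAND() and NEWID() side by side. It uses sys.objects only as a convenient source of a few rows (an assumption for illustration; any table with several rows would do).

```sql
-- RAND() is evaluated once per query, so SameForEveryRow repeats on all rows.
-- NEWID() is evaluated per row, so DiffersPerRow changes on each row.
SELECT TOP (3)
    RAND()  AS SameForEveryRow,
    NEWID() AS DiffersPerRow
FROM sys.objects;
```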
You want to arrange all rows in a random order and then pick the top row. The following query would do it:
SELECT TOP 1 st1 FROM randomStrings ORDER BY NEWID()
or
SELECT TOP 1 st1 FROM randomStrings ORDER BY CRYPT_GEN_RANDOM(4)
If you look at the execution plan you'll see that the randomStrings table is scanned in full, then sorted and one top row is picked.
I'm guessing that you want to use your function like this:
SELECT
SomeTable.SomeColumn
,dbo.GetRandomName() AS RandomName
FROM SomeTable
For each row in SomeTable you want to get some random string.
Even if you make your original approach work through some tricks, the randomStrings table would be scanned in full and sorted (twice) for each row of SomeTable. That is unlikely to be efficient.
One way to make it efficient and avoid tricks is to make sure that the randomStrings table has an int column ID with contiguous values from 1 to the number of rows in the table. Make it the primary key as well.
The function would then accept two parameters, two random numbers in the range 1..N, and build the random string from the rows with those IDs.
The function may look like this:
CREATE FUNCTION dbo.GetRandomName
(
@ParamID1 int
,@ParamID2 int
)
RETURNS VARCHAR(48)
AS
BEGIN
DECLARE @FinalStr VARCHAR(48);
SET @FinalStr =
(SELECT st1 FROM randomStrings WHERE ID = @ParamID1)
+ ' ' +
(SELECT st1 FROM randomStrings WHERE ID = @ParamID2)
;
RETURN @FinalStr;
END
If randomStrings table has 100 rows with IDs from 1 to 100, then usage of this function may look like this:
SELECT
SomeTable.SomeColumn
,dbo.GetRandomName(
(CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) * 100 + 1
,(CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) * 100 + 1
) AS RandomName
FROM SomeTable
CRYPT_GEN_RANDOM(4) generates 4 random bytes. They are cast to int and scaled to a number between 0 and 1, which is multiplied by the number of rows in the randomStrings table (100) and shifted by 1; the implicit conversion to the int parameter then truncates the fraction. This is just one of the ways to generate a random number in the range 1..N.
CRYPT_GEN_RANDOM generates a different random number each time it is called, and it is called twice per row, so you should get the expected results.
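As a sketch of another common way to produce an integer in 1..N (not the method used above): CHECKSUM(NEWID()) yields a pseudo-random int per row, and the modulo folds it into range. The variable @N and the use of sys.objects as a row source are assumptions for illustration.

```sql
DECLARE @N int = 100;  -- assumed number of rows in randomStrings

-- CHECKSUM(NEWID()) % @N lies in -(@N-1)..(@N-1); ABS maps it to 0..(@N-1);
-- adding 1 shifts it to 1..@N. Taking the modulo before ABS avoids an
-- overflow when CHECKSUM returns the smallest int value.
SELECT TOP (5)
    ABS(CHECKSUM(NEWID()) % @N) + 1 AS RandomID
FROM sys.objects;
```

The resulting value could be passed as either argument of dbo.GetRandomName in place of the CRYPT_GEN_RANDOM expression.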

Related

Why does the REPLICATE function reevaluate the value of NEWID when passed into a function that returns a table?

I wrote this function, and then went to test it. The results kind of surprised me. Any idea why this happens?
CREATE FUNCTION dbo.chunk(@input VARCHAR(MAX), @chunkSize INT = 36)
RETURNS TABLE AS
RETURN (
WITH CTE AS (
SELECT SUBSTRING(@input,1,@chunkSize) AS [chunk], 1 AS [row]
UNION ALL
SELECT SUBSTRING(@input,1+([row]*@chunkSize),@chunkSize)
, [row] + 1
FROM cte
WHERE LEN(@input) > ([row]*@chunkSize)
)
SELECT [chunk]
FROM cte
)
GO
/* This does what I would expect this to do. */
DECLARE @input varchar(MAX) = REPLICATE(NEWID(),2);
SELECT * FROM dbo.chunk(@input,36)
/* But this, this is odd. If I call replicate here it calls newid() twice... */
SELECT * FROM dbo.chunk(REPLICATE(NEWID(),2),36);
chunk
E1810B3D-3DD4-4F55-B650-ED2DB28BCF70
E1810B3D-3DD4-4F55-B650-ED2DB28BCF70
chunk
89A26C8B-D5C7-47A8-BBBC-FE859B24E267
F9636F76-1ED6-4D19-A309-BA35EAC9F782

This is expected.
The function you are using is an inline table-valued function, so the call is inlined into the overall execution plan.
The function has an @input parameter of type VARCHAR(MAX) and you are passing REPLICATE(NEWID(),2) to it, so conceptually just replace all instances of @input in your function with CONVERT(VARCHAR(MAX),REPLICATE(NEWID(),2)).
Similarly, replace all instances of @chunkSize with 36.
Once you do that you end up with the below which gives an identical execution plan and behaviour.
WITH CTE AS (
SELECT SUBSTRING(CONVERT(VARCHAR(MAX),REPLICATE(NEWID(),2)),1,36) AS [chunk], 1 AS [row]
UNION ALL
SELECT SUBSTRING(CONVERT(VARCHAR(MAX),REPLICATE(NEWID(),2)),1+([row]*36),36)
, [row] + 1
FROM cte
WHERE LEN(CONVERT(VARCHAR(MAX),REPLICATE(NEWID(),2))) > ([row]*36)
)
SELECT [chunk]
FROM cte
NEWID() is referenced in the plan three times. Once in the anchor branch of the recursive CTE and twice in the recursive branch. In your case you get one invocation of the anchor and one invocation of the recursive branch so it is evaluated three times. For the expression involving the LEN the re-evaluation doesn't make any difference to the result as the length will be the same even if the actual guid has changed.
Some non deterministic functions are treated as runtime constants but NEWID() is not one of them. That doesn't really help you in this recursive CTE case anyway.
If you try the following...
CREATE OR ALTER FUNCTION dbo.RandRows(@input float, @Rows INT = 10)
RETURNS TABLE AS
RETURN (
WITH CTE AS (
SELECT 1 AS Level, @input as [rand]
WHERE @Rows >= 1
UNION ALL
SELECT Level + 1, @input as [rand]
FROM cte
WHERE @Rows > Level + 1
)
SELECT [rand]
FROM cte
)
GO
SELECT *
FROM dbo.RandRows(RAND(), 10)
It ends up the same as
WITH CTE AS (
SELECT 1 AS Level, rand() as [rand]
WHERE 10 >= 1
UNION ALL
SELECT Level + 1, rand() as [rand]
FROM cte
WHERE 10 > Level + 1
)
SELECT [rand]
FROM cte
The invocation in the anchor branch and recursive branches are treated as two different invocations and given different runtime constant labels so you get one value in the first row and a different value in the subsequent rows.

Return specific number of rows in result set from Stored Procedure

When we call a stored procedure, we pass an input parameter specifying how many rows we want in the result. We also want specific columns returned, which come from joining tables.
Can we return the result as a table, and if so, how do we limit the result to the row count passed as the input parameter?
I also searched and found FETCH NEXT ROWS ONLY, but can it be used without the OFFSET logic?
Can somebody suggest a better approach than the ones mentioned above?

Here is an example of how you could use TOP.
create or alter procedure TopTest
(
@RowCount int
) as
select top (@RowCount) *
from sys.columns c
order by c.name
And here is how you could do this using OFFSET/FETCH
create or alter procedure TopTestOffset
(
@RowCount int
) as
select *
from sys.columns c
order by c.name
offset 0 rows
fetch first (@RowCount) rows only

TOP and OFFSET are easier to use if you only need the first n rows. If you need a range of rows (e.g. for paging), you can use a CTE:
with vw as (
SELECT ROW_NUMBER() OVER (ORDER BY column1) AS RowNumber,
columnlist
from YourTable
) select * from vw
where RowNumber between 1 and @NumberOfRows
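For paging specifically, OFFSET/FETCH (SQL Server 2012 and later) can express the same range without a CTE. A minimal sketch, where @PageNumber and @PageSize are hypothetical parameters and sys.columns stands in for the real table:

```sql
DECLARE @PageNumber int = 2;   -- hypothetical: 1-based page to return
DECLARE @PageSize   int = 10;  -- hypothetical: rows per page

SELECT c.object_id, c.column_id, c.name
FROM sys.columns AS c
ORDER BY c.object_id, c.column_id   -- a unique ordering keeps pages stable
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;
```

Inside a stored procedure the two variables would simply become input parameters.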

How to extract every 7 characters of an nvarchar into another table?

I have an nvarchar(200) called ColumnA in Table1 that contains, for example, the value:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
I want to extract every 7 characters into Table2, ColumnB and end up with all of these values below.
ABCDEFG
BCDEFGH
CDEFGHI
DEFGHIJ
EFGHIJK
FGHIJKL
GHIJKLM
HIJKLMN
IJKLMNO
JKLMNOP
KLMNOPQ
LMNOPQR
MNOPQRS
NOPQRST
OPQRSTU
PQRSTUV
QRSTUVW
RSTUVWX
STUVWXY
TUVWXYZ
[Not the real table and column names.]
The data is being loaded to Table1 and Table2 in an SSIS Package, and I'm puzzling whether it is better to do the string handling in TSQL in a SQL Task or parse out the string in a VB Script Component.
[Yes, I think we're the last four on the planet using VB in Script Components. I cannot persuade the other three that this C# thing is here to stay. Although, maybe it is a perfect time to go rogue.]

You can use a recursive CTE calculating the offsets step by step and substring().
WITH
cte
AS
(
SELECT 1 n
UNION ALL
SELECT n + 1 n
FROM cte
WHERE n + 1 <= len('ABCDEFGHIJKLMNOPQRSTUVWXYZ') - 7 + 1
)
SELECT substring('ABCDEFGHIJKLMNOPQRSTUVWXYZ', n, 7)
FROM cte;

If you have a physical numbers table, this is easy. If not, you can create a tally-on-the-fly:
DECLARE @string VARCHAR(100)='ABCDEFGHIJKLMNOPQRSTUVWXYZ';
--We create the tally using ROW_NUMBER against any table with enough rows.
WITH Tally(Nmbr) AS
(SELECT TOP(LEN(@string)-6) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
SELECT Nmbr
,SUBSTRING(@string,Nmbr,7) AS FragmentOf7
FROM Tally
ORDER BY Nmbr;
The idea in short:
The tally returns a list of numbers from 1 to n (n=LEN(@string)-6). This number is used in SUBSTRING to define the starting position.
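The same tally idea can be applied to every row of a table rather than a single variable. The following is a sketch under assumptions: the table variables @Table1 and @Table2 are stand-ins for the question's Table1 and Table2, and the CASE guard is added so that strings shorter than 7 characters produce no rows instead of a negative TOP value.

```sql
-- Stand-ins for Table1(ColumnA) and Table2(ColumnB) from the question.
DECLARE @Table1 TABLE (ColumnA nvarchar(200));
DECLARE @Table2 TABLE (ColumnB nvarchar(7));

INSERT INTO @Table1 (ColumnA)
VALUES (N'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), (N'SHORT');

-- For each source row, generate start positions 1..LEN-6 and cut 7 characters.
INSERT INTO @Table2 (ColumnB)
SELECT SUBSTRING(t.ColumnA, n.Nmbr, 7)
FROM @Table1 AS t
CROSS APPLY (
    SELECT TOP (CASE WHEN LEN(t.ColumnA) >= 7 THEN LEN(t.ColumnA) - 6 ELSE 0 END)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Nmbr
    FROM master..spt_values
) AS n;

SELECT ColumnB FROM @Table2;
```

This keeps the work set-based, which would fit in a single Execute SQL Task in the SSIS package.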

You can do it with T-SQL like this:
DECLARE C CURSOR LOCAL FOR SELECT [ColumnA] FROM [Table1]
OPEN C
DECLARE @Val nvarchar(200);
FETCH NEXT FROM C into @Val
WHILE @@FETCH_STATUS = 0 BEGIN
DECLARE @I INTEGER;
SELECT @I = 1;
WHILE @I <= LEN(@Val)-6 BEGIN
PRINT SUBSTRING(@Val, @I, 7)
SELECT @I = @I + 1
END
FETCH NEXT FROM C into @Val
END
CLOSE C
DEALLOCATE C

Script Component solution
Assuming that the input Column name is Column1
Add a script component
Open the script component configuration form
Go to Inputs and Outputs Tab
Click on the Output icon and set the Synchronous Input property to None
Add an Output column (example outColumn1)
In the Script editor, use code similar to the following in the row processing function:
Dim idx As Integer = 0
' Use >= so that the final 7-character window is included.
While Row.Column1.Length >= idx + 7
Output0Buffer.AddRow()
Output0Buffer.outColumn1 = Row.Column1.Substring(idx, 7)
idx += 1
End While

How many times is function called

Consider the following query:
SELECT * FROM {tablename} WHERE ColumnId = dbo.GetId()
where dbo.GetId() is a non-deterministic user-defined function. Is dbo.GetId() called only once for the entire query, with its result then applied to every row, or is it called for each row? I think it is called for every row, but I don't know of any way to prove it.
Also, would the following query be more efficient?
DECLARE @Id int
SET @Id = dbo.GetId()
SELECT * FROM {tablename} WHERE ColumnId = @Id

I doubt this is guaranteed anywhere. Use a variable if you want to ensure it.
I amended @Prdp's example
CREATE VIEW vw_rand
AS
SELECT Rand() ran
GO
/*Return 0 or 1 with 50% probability*/
CREATE FUNCTION dbo.Udf_non_deterministic ()
RETURNS INT
AS
BEGIN
RETURN
(SELECT CAST(10000 * ran AS INT) % 2
FROM vw_rand)
END
go
SELECT *
FROM master..spt_values
WHERE dbo.Udf_non_deterministic() = 1
In this case it is only evaluated once. Either all rows are returned or zero.
The reason for this is that the plan has a filter with a startup predicate.
The startup expression predicate is [tempdb].[dbo].[Udf_non_deterministic]()=(1).
This is only evaluated once when the filter is opened to see whether to get rows from the subtree at all - not for each row passing through it.
But conversely the below returns a different number of rows each time indicating that it is evaluated per row. The comparison to the column prevents it being evaluated up front in the filter as with the previous example.
SELECT *
FROM master..spt_values
WHERE dbo.Udf_non_deterministic() = (number - number)
And this rewrite goes back to evaluating once (for me) but CROSS APPLY still gave multiple evaluations.
SELECT *
FROM master..spt_values
OUTER APPLY(SELECT dbo.Udf_non_deterministic() ) AS C(X)
WHERE X = (number - number)

Here is one way to prove it.
View
A view is created so that a nondeterministic built-in function can be used inside a user-defined function:
CREATE VIEW vw_rand
AS
SELECT Rand() ran
Nondeterministic function
Now create a nondeterministic user-defined function using the above view:
CREATE FUNCTION Udf_non_deterministic ()
RETURNS FLOAT
AS
BEGIN
RETURN
(SELECT ran
FROM vw_rand)
END
Sample table
CREATE TABLE #test
(
id INT,
name VARCHAR(50)
)
INSERT #test
VALUES (1,'a'),
(2,'b'),
(3,'c'),
(4,'d')
SELECT dbo.Udf_non_deterministic (), *
FROM #test
Result:
id name non_deterministic_val
1 a 0.203123494465542
2 b 0.888439497446073
3 c 0.633749721616085
4 d 0.104620204364744
As you can see, the function is called for every row.

Yes, it gets called once per row.
See the following thread for debugging functions:
SQL Functions - Logging
And yes, the query below is more efficient, as the function is called only once.
DECLARE @Id int
SET @Id = dbo.GetId()
SELECT * FROM {tablename} WHERE ColumnId = @Id

selecting random data from a predefined list

I have a list of employee ids, let's say:
Q1654
F2597
Y9405
B6735
D8732
C4893
I9732
L1060
H6720
These values are not in any one of my tables, but I want to create a function that will take in no parameters and return a random value from this list. How can I do this?

Without getting into random number theory, here's one method:
http://sqlfiddle.com/#!6/192f2/1/0
It basically uses the NEWID function to generate a random value, then sorts by it and returns the top 1 record.
Given that it needs to be in a function, and functions can't use NEWID, putting a view in the middle works around the problem.
SELECT * INTO myRand FROM (
SELECT 'Q1654' AS val UNION
SELECT 'F2597' UNION
SELECT 'Y9405' UNION
SELECT 'B6735' UNION
SELECT 'D8732' UNION
SELECT 'C4893' UNION
SELECT 'I9732' UNION
SELECT 'L1060' UNION
SELECT 'H6720') b;
GO
CREATE VIEW vMyRand AS
SELECT TOP 1 val FROM myRand ORDER BY NEWID();
GO
CREATE FUNCTION GetMyRand ()
RETURNS varchar(5)
--WITH EXECUTE AS CALLER
AS
BEGIN
DECLARE @RetValue varchar(5)
--@configVar =
SELECT @RetValue = val FROM vMyRand
RETURN(@RetValue)
END;
