Selecting random data from a predefined list - sql-server

I have a list of employee ids, let's say:
Q1654
F2597
Y9405
B6735
D8732
C4893
I9732
L1060
H6720
These values are not in any one of my tables, but I want to create a function that will take in no parameters and return a random value from this list. How can I do this?

Without getting into random number theory, here's one method:
http://sqlfiddle.com/#!6/192f2/1/0
It basically uses the NEWID() function to generate a random value per row, sorts by it, and returns the top 1 record.
Since this needs to be in a function and functions can't call NEWID() directly, putting a view in between works around the restriction.
-- Load the predefined list into a table.
Select * into myRand FROM (
    SELECT 'Q1654' as val UNION
    SELECT 'F2597' UNION
    SELECT 'Y9405' UNION
    SELECT 'B6735' UNION
    SELECT 'D8732' UNION
    SELECT 'C4893' UNION
    SELECT 'I9732' UNION
    SELECT 'L1060' UNION
    SELECT 'H6720') b;
GO
-- NEWID() is allowed in a view, so the random ordering happens here.
Create View vMyRand as
Select top 1 val from myRand order by NewID();
GO
CREATE FUNCTION GetMyRand ()
RETURNS varchar(5)
--WITH EXECUTE AS CALLER
AS
BEGIN
    Declare @RetValue varchar(5)
    --@configVar =
    Select @RetValue = val from vMyRand
    RETURN(@RetValue)
END;
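A quick way to try it out, assuming the function ended up in the default dbo schema (a scalar function has to be called with at least a two-part name). The VALUES list in the second query is only a stand-in for a real table:

```sql
-- A single call returns one value from the list.
SELECT dbo.GetMyRand() AS RandomEmployeeId;

-- Here the function is typically evaluated per row,
-- so the three rows will usually not all match.
SELECT t.n, dbo.GetMyRand() AS RandomEmployeeId
FROM (VALUES (1), (2), (3)) AS t(n);
```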

Related

Searching for multiple patterns in a string in T-SQL

In t-sql my dilemma is that I have to parse a potentially long string (up to 500 characters) for any of over 230 possible values and remove them from the string for reporting purposes. These values are a column in another table and they're all upper case and 4 characters long with the exception of two that are 5 characters long.
Examples of these values are:
USFRI
PROME
AZCH
TXJS
NYDS
XVIV. . . . .
Example of string before:
"Offered to XVIV and USFRI as back ups. No response as of yet."
Example of string after:
"Offered to and as back ups. No response as of yet."
Pretty sure it will have to be a UDF but I'm unable to come up with anything other than stripping ALL the upper case characters out of the string with PATINDEX which is not the objective.
This is unavoidably kludgy, but one way is to split your string into rows. Once you have a set of words the rest is easy: simply re-aggregate while ignoring the matching values*:
with t as (
select 'Offered to XVIV and USFRI as back ups. No response as of yet.' s
union select 'Another row AZCH and TXJS words.'
), v as (
select * from (values('USFRI'),('PROME'),('AZCH'),('TXJS'),('NYDS'),('XVIV'))v(v)
)
select t.s OriginalString, s.Removed
from t
cross apply (
select String_Agg(j.[value], ' ') within group(order by Convert(tinyint,j.[key])) Removed
from OpenJson(Concat('["',replace(s, ' ', '","'),'"]')) j
where not exists (select * from v where v.v = j.[value])
)s;
* Requires SQL Server 2017 or later: STRING_AGG was introduced in 2017, and OPENJSON needs database compatibility level 130 or higher.
Build a function that cleans one sentence, then call that function from your query, something like this: SELECT Col1, dbo.fn_ReplaceValue(Col1) AS cleanValue, * FROM MySentencesTable. Your fn_ReplaceValue would be something like the code below. You could also create the table variable outside the function and pass it in as a parameter to speed up the process, but this way it is all self-contained.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION fn_ReplaceValue(@sentence VARCHAR(500))
RETURNS VARCHAR(500)
AS
BEGIN
    DECLARE @ResultVar VARCHAR(500)
    DECLARE @allValues TABLE (rowID int, sValue VARCHAR(15))
    DECLARE @id INT = 0
    DECLARE @ReplaceVal VARCHAR(15)
    DECLARE @numberOfValues INT = (SELECT COUNT(*) FROM MyValuesTable)
    --Populate table variable with all values
    INSERT @allValues
    SELECT ROW_NUMBER() OVER(ORDER BY MyValuesCol) AS rowID, MyValuesCol
    FROM MyValuesTable
    SET @ResultVar = @sentence
    -- Stop at the last rowID: a NULL search value would make REPLACE return NULL
    WHILE (@id < @numberOfValues)
    BEGIN
        SET @id = @id + 1
        SET @ReplaceVal = (SELECT sValue FROM @allValues WHERE rowID = @id)
        SET @ResultVar = REPLACE(@ResultVar, @ReplaceVal, SPACE(0))
    END
    RETURN @ResultVar
END
GO
I suggest creating a table (either temporary or permanent), and loading these 230 string values into this table. Then use it in the following delete:
DELETE
FROM yourTable
WHERE col IN (SELECT col FROM tempTable);
If you just want to view your data sans these values, then use:
SELECT *
FROM yourTable
WHERE col NOT IN (SELECT col FROM tempTable);

SQL using a function in a trigger

I am creating a trigger in SQL that will insert into another table after an insert on its own table. However, I need to fetch a value from the target table and increment it for use in the insert.
I have a AirVisionSiteLog table. On insert on the table I would like for it to insert into another SiteLog table. However in order to do this I need to fetch the last Entry Number of the Site from the SiteLog table. Then on its insert take that result and increase by one for the new Entry Number. I am new to Triggers and Functions so I am not sure how to use them correctly. I believe I have a function to retrieve and increment the Entry Number however I am not sure how to use it in the Trigger.
My Function -
CREATE FUNCTION AQB_RMS.F_GetLogEntryNumber
(@LocationID int)
RETURNS INTEGER
AS
BEGIN
DECLARE
@MaxEntry Integer,
@EntryNumber Integer
Set @MaxEntry = (Select Max(SL.EntryNumber) FROM AQB_MON.AQB_RMS.SiteLog SL
WHERE SL.LocationID = @LocationID)
SET @EntryNumber = @MaxEntry + 1
RETURN @EntryNumber
END
My Trigger and attempt to use the Function -
CREATE TRIGGER [AQB_RMS].[SiteLogCreate] on [AQB_MON].[AQB_RMS].[AirVisionSiteLog]
AFTER INSERT
AS
BEGIN
declare @entrynumber int
declare @corrected int
set @corrected = 0
INSERT INTO [AQB_MON].[AQB_RMS].[SiteLog]
([SiteLogTypeID],[LocationID],[EntryNumber],[SiteLogEntry]
,[EntryDate],[Corrected],[DATE_CREATED],[CREATED_BY])
SELECT st.SiteLogTypeID, l.LocationID,
(select AQB_RMS.F_GetLogEntryNumber from [AQB_MON].[AQB_RMS].[SiteLog] sl
where sl.LocationID = l.LocationID)
, i.SiteLogEntry, i.EntryDate, @corrected, i.DATE_CREATED, i.CREATED_BY
from inserted i
left join AQB_MON.[AQB_RMS].[SiteLogType] st on st.SiteLogType = i.SiteLogType
left join AQB_MON.AQB_RMS.Location l on l.SourceSiteID = i.SourceSiteID
END
GO
I believe that you are close.
At this part of the query in the trigger (I set the columns vertically so that the difference is more noticeable):
SELECT st.SiteLogTypeID,
l.LocationID,
(select AQB_RMS.F_GetLogEntryNumber from [AQB_MON].[AQB_RMS].[SiteLog] sl where sl.LocationID = l.LocationID),
i.SiteLogEntry,
i.EntryDate,
@corrected,
i.DATE_CREATED,
i.CREATED_BY
...should be:
SELECT st.SiteLogTypeID,
l.LocationID,
AQB_RMS.F_GetLogEntryNumber(l.LocationID),
i.SiteLogEntry,
i.EntryDate,
@corrected,
i.DATE_CREATED,
i.CREATED_BY
So basically, you call the function by name and pass it the parameter in parentheses; the argument must evaluate to a single value. A subquery would also work as the argument, but it needs its own set of parentheses and must return only one row with one value.
Note that in my modified example I pass l.LocationID to the function. I'm not sure this is exactly what you need, so change it to match your needs, and add a comment should there be other issues.

Retrieve a random row in a user defined function?

I'm trying to define this function:
CREATE FUNCTION getRandomName ()
RETURNS VARCHAR(48)
AS BEGIN
-- concatenate two random strings from two columns in a table and return as a new string
DECLARE @finalStr VARCHAR(48);
SET @finalStr = (SELECT TOP 1 st1 FROM randomStrings ORDER BY RAND()) +
' ' +
(SELECT TOP 1 st2 FROM randomStrings ORDER BY RAND());
RETURN @finalStr;
END
I can't do this because:
Msg 443, Level 16, State 1, Procedure getRandomName, Line 6
Invalid use of a side-effecting operator 'rand' within a function.
The postings I have found online related to this problem suggest passing in a random value as a parameter when calling the function, or using a view and querying that view in the function to get a single random number into a variable. I can't use those methods because I am trying to use the randomization in the ORDER BY clause.
Is there a way to accomplish this?
(SQL Server 2014)
EDIT:
So you could use a view to get a result as stated below, but now I find myself needing to pass a parameter to the function:
CREATE FUNCTION getRandomName (
@maxPieceSize int
)
RETURNS VARCHAR(48)
AS BEGIN
-- concatenate two random strings from two columns in a table and return as a new string
DECLARE @finalStr VARCHAR(48);
SET @finalStr = (SELECT TOP 1 st1 FROM randomStrings WHERE LEN(st1) <= @maxPieceSize ORDER BY RAND()) +
' ' +
(SELECT TOP 1 st2 FROM randomStrings WHERE LEN(st1) <= @maxPieceSize ORDER BY RAND());
RETURN @finalStr;
END
So I can't create a view for this scenario because you can't pass parameters to views.
So here's my dilemma:
Function: I can't use this because I cannot use any nondeterministic function within a function.
View: I can't use this because I need to pass a parameter to the "function".
Procedure: The only way I can see to do this is to use an output variable, which means declaring a variable, etc. I would not be able to simply do something like EXECUTE getRandomName(6) or SELECT getRandomName(6).
Am I stuck using a procedure and doing it "the hard way" (using an output variable, and having to declare that variable every time I want to use the method)?
EDIT AGAIN:
I tried to write the actual method as a stored procedure, then call that stored procedure from a function which declares the variable, assigns it and then returns it. It made sense. Except....
Msg 557, Level 16, State 2, Line 1
Only functions and some extended stored procedures can be executed from within a function.
I'm guessing SQL Server really doesn't want me to have a function that can return a random value. (Funny, because isn't RAND() a function in its own right?)
Why stick with a function? Use a view in place of the function:
CREATE view getRandomName
AS
SELECT (SELECT TOP 1 st1 FROM randomStrings ORDER BY Newid()) +
' ' +
(SELECT TOP 1 st1 FROM randomStrings ORDER BY Newid())
as RandomName
GO
SELECT (SELECT RandomName FROM getRandomName) + ' - This is random name'
GO
There is also an old and crazy way to get a random row within a stored procedure:
CREATE PROCEDURE usp_Random_Message
@i INT
AS
SELECT TOP 1 * FROM (
SELECT TOP (@i) * FROM sys.Messages
ORDER BY message_id
) AS a ORDER BY message_id DESC
GO
DECLARE @i INT = CAST(RAND() * 100 as INT);
EXEC usp_Random_Message @i;
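The parameter problem from the edited question can probably still be handled with a view, because the filter can live in the function while only the randomness lives in the view. A sketch, assuming the randomStrings table from the question; the view name vRandomStrings and the rnd column are names made up for illustration:

```sql
-- The view only exposes a random sort key per row; NEWID() is allowed in a view.
CREATE VIEW dbo.vRandomStrings
AS
SELECT st1, st2, NEWID() AS rnd
FROM randomStrings;
GO

CREATE FUNCTION dbo.getRandomName (@maxPieceSize int)
RETURNS VARCHAR(48)
AS
BEGIN
    -- The parameter filters inside the function;
    -- the ordering uses the key supplied by the view.
    RETURN
        (SELECT TOP 1 st1 FROM dbo.vRandomStrings
         WHERE LEN(st1) <= @maxPieceSize ORDER BY rnd)
        + ' ' +
        (SELECT TOP 1 st2 FROM dbo.vRandomStrings
         WHERE LEN(st2) <= @maxPieceSize ORDER BY rnd);
END
GO

SELECT dbo.getRandomName(6) AS RandomName;
```

If no row passes the length filter, the subquery yields NULL and so does the whole concatenation. Each call still scans and sorts the table twice, so this is a convenience, not a performance fix.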
First of all,
SELECT TOP 1 st1 FROM randomStrings ORDER BY RAND()
would not return what you expect, because RAND() is a run-time constant: the server generates one random number and reuses it for the duration of the query, so every row gets the same sort key.
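A quick way to see the difference for yourself (the VALUES list is just a stand-in for any table):

```sql
-- RAND() without a seed is evaluated once per query; NEWID() is evaluated per row.
SELECT t.n, RAND() AS rand_value, NEWID() AS newid_value
FROM (VALUES (1), (2), (3)) AS t(n);
-- rand_value is identical on all three rows, newid_value differs on each.
```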
You want to arrange all rows in a random order and then pick the top row. The following query would do it:
SELECT TOP 1 st1 FROM randomStrings ORDER BY NEWID()
or
SELECT TOP 1 st1 FROM randomStrings ORDER BY CRYPT_GEN_RANDOM(4)
If you look at the execution plan you'll see that the randomStrings table is scanned in full, then sorted and one top row is picked.
I'm guessing that you want to use your function like this:
SELECT
SomeTable.SomeColumn
,dbo.GetRandomName() AS RandomName
FROM SomeTable
For each row in SomeTable you want to get some random string.
Even if you make your original approach work through some tricks, the randomStrings table would be scanned in full and sorted (twice) for each row of SomeTable. That is unlikely to be efficient.
One way to make it efficient and avoid tricks is to make sure that the randomStrings table has an int column ID with values from 1 to the number of rows in the table. Make it the primary key as well.
Then your function would accept two parameters, two random numbers in the range 1..N, and build the random string using the given IDs.
The function may look like this:
CREATE FUNCTION dbo.GetRandomName
(
@ParamID1 int
,@ParamID2 int
)
RETURNS VARCHAR(48)
AS
BEGIN
DECLARE @FinalStr VARCHAR(48);
SET @FinalStr =
(SELECT st1 FROM randomStrings WHERE ID = @ParamID1)
+ ' ' +
(SELECT st1 FROM randomStrings WHERE ID = @ParamID2)
;
RETURN @FinalStr;
END
If randomStrings table has 100 rows with IDs from 1 to 100, then usage of this function may look like this:
SELECT
SomeTable.SomeColumn
,dbo.GetRandomName(
(CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) * 100 + 1
,(CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) * 100 + 1
) AS RandomName
FROM SomeTable
CRYPT_GEN_RANDOM(4) generates 4 random bytes, which are cast to int and converted to a number between 0 and 1; that number is then multiplied by the number of rows in the randomStrings table (100). It is just one of several ways to generate a random number in the range 1..N.
CRYPT_GEN_RANDOM generates a different random number each time it is called and it is called twice per row, so you should get expected results.
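If the row count is not fixed at 100, one option is to read N from the table first. This is only a sketch under the same assumption of contiguous IDs 1..N; the bigint cast with modulo is a substitution for the fractional arithmetic above (it introduces a very slight bias), and @N is a variable name made up for illustration:

```sql
-- N must be greater than zero, otherwise the modulo below fails.
DECLARE @N int = (SELECT COUNT(*) FROM randomStrings);

SELECT
    SomeTable.SomeColumn
    -- 4 random bytes cast to bigint give a non-negative number; % @N + 1 maps it to 1..N
    ,dbo.GetRandomName(
         CAST(CRYPT_GEN_RANDOM(4) AS bigint) % @N + 1
        ,CAST(CRYPT_GEN_RANDOM(4) AS bigint) % @N + 1
     ) AS RandomName
FROM SomeTable;
```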

How many times is function called

Consider the following query:
SELECT * FROM {tablename} WHERE ColumnId = dbo.GetId()
where dbo.GetId() is a non-deterministic user-defined function. The question is whether dbo.GetId() is called only once for the entire query, with its result then applied to every row, or whether it is called for each row. I think it is called for every row, but I don't know of any way to prove it.
Also, would the following query be more efficient?
DECLARE @Id int
SET @Id = dbo.GetId()
SELECT * FROM {tablename} WHERE ColumnId = @Id
I doubt this is guaranteed anywhere. Use a variable if you want to ensure it.
I amended @Prdp's example
CREATE VIEW vw_rand
AS
SELECT Rand() ran
GO
/*Return 0 or 1 with 50% probability*/
CREATE FUNCTION dbo.Udf_non_deterministic ()
RETURNS INT
AS
BEGIN
RETURN
(SELECT CAST(10000 * ran AS INT) % 2
FROM vw_rand)
END
go
SELECT *
FROM master..spt_values
WHERE dbo.Udf_non_deterministic() = 1
In this case it is only evaluated once. Either all rows are returned or zero.
The reason for this is that the plan has a filter with a startup predicate.
The startup expression predicate is [tempdb].[dbo].[Udf_non_deterministic]()=(1).
This is only evaluated once when the filter is opened to see whether to get rows from the subtree at all - not for each row passing through it.
But conversely the below returns a different number of rows each time indicating that it is evaluated per row. The comparison to the column prevents it being evaluated up front in the filter as with the previous example.
SELECT *
FROM master..spt_values
WHERE dbo.Udf_non_deterministic() = (number - number)
And this rewrite goes back to evaluating once (for me) but CROSS APPLY still gave multiple evaluations.
SELECT *
FROM master..spt_values
OUTER APPLY(SELECT dbo.Udf_non_deterministic() ) AS C(X)
WHERE X = (number - number)
Here is one way to prove it.
View
The view is created so that a nondeterministic built-in function can be used inside a user-defined function:
CREATE VIEW vw_rand
AS
SELECT Rand() ran
Nondeterministic Functions
Now create a nondeterministic user-defined function using the above view:
CREATE FUNCTION Udf_non_deterministic ()
RETURNS FLOAT
AS
BEGIN
RETURN
(SELECT ran
FROM vw_rand)
END
Sample table
CREATE TABLE #test
(
id INT,
name VARCHAR(50)
)
INSERT #test
VALUES (1,'a'),
(2,'b'),
(3,'c'),
(4,'d')
SELECT dbo.Udf_non_deterministic (), *
FROM #test
Result:
id name non_deterministic_val
1 a 0.203123494465542
2 b 0.888439497446073
3 c 0.633749721616085
4 d 0.104620204364744
As you can see, the function is called for every row.
Yes, it does get called once per row.
See the following thread for debugging functions:
SQL Functions - Logging
And yes, the query below is more efficient, as the function is called only once.
DECLARE @Id int
SET @Id = dbo.GetId()
SELECT * FROM {tablename} WHERE ColumnId = @Id

How to select data rows from sql server with the maximum value?

I have a sql server table named Student like below:
I wish to select the students with the highest score from each class, which shall produce the output like this:
Due to some constraint, I can't be sure how many unique class names exist in the table. My stored procedure is:
create procedure selectBestStudent
as
begin
select Name, max(TestScore)
from [TestDB1].[dbo].[StudentTest]
group by Name
end
But the result is wrong. Any idea?
You can use ROW_NUMBER with a PARTITION BY:
SELECT Name, Class, TestScore
FROM (
SELECT Name, Class, TestScore,
ROW_NUMBER() OVER (PARTITION BY Class
ORDER BY TestScore DESC) AS rn
FROM StudentTest) AS t
WHERE t.rn = 1
ROW_NUMBER enumerates records within each Class partition: the ORDER BY clause guarantees that the record having the greatest TestScore value is assigned a value equal to 1.
Note: To handle ties you can use RANK in place of ROW_NUMBER. This way you can get all students that share the same maximum TestScore for the same Class.
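The RANK variant mentioned in the note could look like this, assuming the same StudentTest table and columns as the query above (rk is just an alias picked for illustration):

```sql
SELECT Name, Class, TestScore
FROM (
    SELECT Name, Class, TestScore,
           -- Tied top scores in a class all receive rank 1.
           RANK() OVER (PARTITION BY Class
                        ORDER BY TestScore DESC) AS rk
    FROM StudentTest) AS t
WHERE t.rk = 1;
```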
You can also achieve this goal with NOT EXISTS()
SELECT * FROM Student s
WHERE NOT EXISTS(select 1 FROM Student t
where t.class = s.class
and t.testScore > s.testScore)
This selects only those rows for which no other row in the same class has a higher testScore.
I think you will have a problem with the Group By and the MAX() when there are multiple people with the same score in a class.
I solved it with a cursor and FETCH. If you don't know what that is yet, you can look here. It's easier than it looks at first!
I know that might be a horrible way to do it, but it's easy to understand and it worked! :D
USE [TestDB]
GO
DECLARE @class char(10), @testscore int;
DECLARE @result Table
(
Name char(10),
Class char(10),
TestScore int
);
-- Get Classes and their Maxima
DECLARE TestScore_cursor CURSOR FOR SELECT [class], MAX([testscore]) FROM [student] GROUP BY [class];
OPEN TestScore_cursor;
-- Perform the first fetch.
FETCH NEXT FROM TestScore_cursor INTO @class, @testscore;
-- Check @@FETCH_STATUS to see if there are any more rows to fetch.
WHILE @@FETCH_STATUS = 0
BEGIN
-- Search Students by Class and Score and add them to the table variable @result
INSERT INTO @result SELECT [name], [class], [testscore] From [student] where [testScore] = @testscore AND [class] = @class;
FETCH NEXT FROM TestScore_cursor INTO @class, @testscore;
END
-- Show the Result
SELECT * FROM @result;
CLOSE TestScore_cursor;
DEALLOCATE TestScore_cursor;
GO

Resources