A SQL Server function to generate sequential numbers - sql-server

I would like to have a SQL Server function dbo.GetNextNumber() that generates sequential numbers on each call. As far as I understand, this is impossible with a native T-SQL function, as SQL Server insists that functions be deterministic. But if you can show me a native T-SQL function that does this, it would really make my day.
I thought it might be possible to write this using a CLR function. Since CLR functions are static, the sequence numbers would need to be stored in the calling context of the set operation; storing them in a static variable would result in several connections sharing the same sequence, producing not-so-sequential numbers. I do not know enough about SQL CLR to tell whether a set operation's (SELECT, UPDATE, DELETE, INSERT) calling context is reachable from the CLR side.
At the end of the day, the following query
select dbo.GetNextNumber() from sysobjects
must return the result
1
2
3
4
5
It is OK if another call is necessary to reset the context, like
exec dbo.ResetSequenceNumbers()
To prevent misunderstandings and reduce the chances of wasting your time answering the wrong question, please note that I am not looking for an ID generation function for a table, and I am aware of some hacks (albeit using a proc, not a function) that involve temp tables with identity columns. The ROW_NUMBER() function is close, but it also does not cut it.
Thanks a lot for any responses
Kemal
P.S. It is amazing that SQL Server does not have a built-in function for this. Such a function (even if it could not be used in joins and WHERE clauses) would be really easy to build and extremely useful, but for some reason it is not included.

Since you have implemented the CLR sequence based on my article about calculating running totals, you can achieve the same result using the ROW_NUMBER() function.
The ROW_NUMBER() function requires an ORDER BY in the OVER clause, but there is a nice workaround to avoid the sort the ORDER BY would normally cause. You cannot put a constant expression directly in the ORDER BY, but you can put a subquery like (SELECT aConstant) there. So you can easily generate numbers using the statement below.
SELECT
    ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RowNumber,
    *
FROM aTable
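
Applied to the question's own example, the same trick yields the sequential output without a real sort:

SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RowNumber
FROM sysobjects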

In the next version of SQL Server you can use a SEQUENCE to do this. In earlier versions it's easy enough to do by inserting into a separate "sequence table" (a table with only an IDENTITY column) and then retrieving the value with the SCOPE_IDENTITY() function. You won't be able to do that in a function, but you can use a stored procedure.
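
A minimal sketch of that sequence-table idea (the table and procedure names here are made up for illustration):

-- Hypothetical sequence table: the IDENTITY column is the only column.
CREATE TABLE dbo.SequenceTable (Id INT IDENTITY(1,1) NOT NULL);
GO
CREATE PROCEDURE dbo.usp_GetNextNumber
    @NextNumber INT OUTPUT
AS
BEGIN
    -- Each insert consumes one identity value; SCOPE_IDENTITY() reads it back.
    INSERT INTO dbo.SequenceTable DEFAULT VALUES;
    SET @NextNumber = SCOPE_IDENTITY();
END
GO
-- Usage:
DECLARE @n INT;
EXEC dbo.usp_GetNextNumber @NextNumber = @n OUTPUT;
SELECT @n;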

Can you give more information about why ROW_NUMBER() doesn't cut it? In the specific example you give, ROW_NUMBER() (with an appropriate OVER(ORDER BY ...) clause) certainly seems to match your criteria.
But there are certainly cases where ROW_NUMBER() might not be useful, and in those cases, there are usually other techniques.
Perhaps having this general function seems useful to you, but in most cases I find that a solution tailored to the problem at hand is a better idea than a general-purpose function that ends up causing more difficulties, like a leaky abstraction you are constantly working around. You specifically mention the need for a reset function. No such thing is needed with ROW_NUMBER(), since its OVER(PARTITION BY ... ORDER BY ...) clause lets you specify the grouping, as the sketch below shows.
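
For instance, reusing the earlier statement (GroupId is a hypothetical grouping column), numbering restarts at 1 within each partition, so there is nothing to reset:

SELECT
    ROW_NUMBER() OVER(PARTITION BY GroupId ORDER BY (SELECT NULL)) AS RowNumber,
    *
FROM aTable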

SQL Server Denali has a new SEQUENCE structure for developers.
It is easy to manage and maintain sequence numbers in SQL Server after Denali.
You can find details in Sequence Numbers in SQL Server.
For versions other than Denali, you can use sequence tables.
Here is a sample of a sequence table in SQL Server.
To read a sequence number from a sequence table, you insert a dummy record into the identity-enabled table and read back the generated value.
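
For reference, a minimal sketch of the Denali SEQUENCE syntax (the sequence name is illustrative):

CREATE SEQUENCE dbo.MySequence
    START WITH 1
    INCREMENT BY 1;

-- Each call returns the next number in the sequence: 1, then 2, 3, ...
SELECT NEXT VALUE FOR dbo.MySequence;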

As of SQL Server 2022 there is a new operator to address this:
GENERATE_SERIES ( start, stop [, step ] )
For full details, see:
https://learn.microsoft.com/en-us/sql/t-sql/functions/generate-series-transact-sql?view=sql-server-ver16
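
Applied to the question, something like this should return the numbers 1 through 5 (the output column is named value):

SELECT value FROM GENERATE_SERIES(1, 5);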

Related

SQLSTATE[IMSSP]: Tried to bind parameter number 2101. SQL Server supports a maximum of 2100 parameters

I am trying to run this query
$claims = Claim::whereIn('policy_id', $userPolicyIds)
    ->where('claim_settlement_status', 'Accepted')
    ->whereBetween('intimation_date', [$startDate, $endDate])
    ->get();
Here, $userPolicyIds can contain thousands of policy ids. Is there any way I can increase the maximum number of parameters in SQL Server? If not, could anyone help me find a way to solve this issue?
The whereIn method creates an SQL fragment of the form WHERE policy_id IN (userPolicyIds[0], userPolicyIds[1], userPolicyIds[2]..., userPolicyIds[MAX]). In other words, the entire collection is unwrapped into the SQL statement. The result is a HUGE SQL statement that SQL Server refuses to execute.
This is a well-known limitation of Microsoft SQL Server, and it is a hard limit: there appears to be no option for changing it. But SQL Server can hardly be blamed for having this limit, because trying to execute a query with as many as 2000 parameters is an unhealthy situation that you should not have put yourself into in the first place.
So, even if there was a way to change the limit, it would still be advisable to leave the limit as it is, and restructure your code instead, so that this unhealthy situation does not arise.
You have at least a couple of options:
Break your query down into batches of, say, 2000 items each.
Put your ids into a temporary table and make your query join that table (see the sketch below).
Personally, I would go with the second option, since it will perform much better than anything else, and it is arbitrarily scalable.
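
A T-SQL sketch of that second option, assuming the claims table and columns from the question (the temp-table name and the date values are made up):

-- Load the policy ids once; inserts are not subject to the 2100-parameter limit
-- when done in batches.
CREATE TABLE #UserPolicyIds (policy_id INT PRIMARY KEY);

-- ...bulk-insert the thousands of policy ids into #UserPolicyIds here...

DECLARE @startDate DATE = '2020-01-01', @endDate DATE = '2020-12-31'; -- example values

SELECT c.*
FROM claims AS c
INNER JOIN #UserPolicyIds AS p ON p.policy_id = c.policy_id
WHERE c.claim_settlement_status = 'Accepted'
  AND c.intimation_date BETWEEN @startDate AND @endDate;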
I solved this problem by running this raw query
SELECT claims.*, policies.* FROM claims INNER JOIN policies ON claims.policy_id = policies.id
WHERE policy_id IN $userPolicyIds AND claim_settlement_status = 'Accepted' AND intimation_date BETWEEN '$startDate' AND '$endDate';
Here, $userPolicyIds is a string like this: ('123456','654321','456789'). This query is a bit slow, I'll admit that. But the number of policy ids is always going to be a very big number, and I wanted a quick fix.
Just use the prepare driver options (PDO::prepare) with
PDO::ATTR_EMULATE_PREPARES => true
https://learn.microsoft.com/en-us/sql/connect/php/pdo-prepare
and split the WHERE IN into pieces (WHERE (column IN (...)) OR column IN (...)).

Using an IN clause with table variable causes my query to run MUCH slower

I am working with an SSRS report in which I need to pass multiple parameters to some SQL code.
Based on this blog post, the best way to handle multiple parameters is to use a split function, so that is the road I am following.
However, I am having some bad performance after following this.
For example, the following WHERE clause will return the data in 4 seconds:
AND DimBusinessDivision.Id IN (
22
)
This will also correctly return in 4 seconds:
DECLARE @BusinessDivisionId INT = 22
AND DimBusinessDivision.Id IN (
@BusinessDivisionId
)
However, using the split function as below, it takes 2 minutes (which is the same time it takes without a WHERE clause):
AND DimBusinessDivision.Id IN (
SELECT Item FROM dbo.FuncSplit(@BusinessDivisionId, ',')
)
I've also tried creating a temp table and a table variable with the results before the SQL statement, but there's no difference. I have a feeling this has to do with the fact that the values are not literal values, so SQL Server doesn't know what query plan to follow, or something similar. Does anyone know of any ways to increase the performance of this?
It simply doesn't like using a table to get the values in, even if the table has the same number of rows.
UPDATE: I have used the table function as an inner join, which has fixed the issue. Any ideas why this made all the difference?
INNER JOIN
dbo.FuncSplit(@BusinessDivisionIds, ',') AS FilteredBusinessDivisions ON
FilteredBusinessDivisions.Item = DimBusinessDivision.Id
A few things to play with:
Try the non-performant query and add OPTION (RECOMPILE); at the end of the query (see the sketch after this list). If it magically runs much faster, then yes, the issue was a bad cached query plan. For more information on this specific problem, you can Google "parameter sniffing" for a more thorough explanation.
You may also want to look at the function definition and toss a RECOMPILE in there too, and see what difference that makes.
Look at the estimated query plan and try to determine the difference.
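
A hypothetical sketch of that first suggestion, reusing the question's dbo.FuncSplit and table names:

DECLARE @BusinessDivisionIds VARCHAR(100) = '22,23'; -- example value

SELECT DimBusinessDivision.*
FROM DimBusinessDivision
WHERE DimBusinessDivision.Id IN (
    SELECT Item FROM dbo.FuncSplit(@BusinessDivisionIds, ',')
)
OPTION (RECOMPILE); -- compiles a fresh plan for the actual parameter values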
But the root of the problem, I think, is that you are reinventing the wheel with this "split" function. You can have multi-valued parameters in SSRS and use WHERE col IN (@param): https://technet.microsoft.com/en-us/library/aa337396(v=sql.105).aspx
Unless there's a very specific reason you must split a comma separated list and cannot use normal parameters, just use a regular parameter that accepts multiple values.
Edit: I looked at the article you linked to. It's quite easy to have a SELECT ALL option in any reporting tool (not just SSRS), though it's not obvious. Using the "magic value" as written in the article you linked to works just fine. Can I ask what limitation is prompting you to do this string splitting?

Paging for dynamic complex queries for SQL Server

I can't find an easy way to implement paging for complex queries in SQL Server. I need to write a function that takes a SQL query as an argument (the query can include subqueries, ORDER BY statements, grouping, etc.) and retrieves a particular page of results. In Oracle it's easy, by wrapping such a query in another SELECT statement, but for SQL Server I can't find any similar way. What I would like to avoid is parsing the input SQL statement. I'm using SQL Server 2005.
Paging in SQL Server 2005 and upwards is best done via ranking functions. However, given that an arbitrary SQL query is unsorted, you need to somehow specify what the sort shall be for this to work, which isn't really "compatible" with a generic solution like the one you're trying to build (*).
The suggested way to do it is like this (assuming the variable @PageSize holds the number of items per page, and @Page is the 1-based index of the page you want to retrieve):
WITH NumberedQuery AS (
SELECT ROW_NUMBER() OVER (ORDER BY q.SomeColumn) ix, q.*
FROM QueryToPage q
)
SELECT nq.*
FROM NumberedQuery nq
WHERE (nq.ix > (@Page-1)*@PageSize) AND (nq.ix <= @Page*@PageSize);
(*): Your approach of concatenating SQL code has several issues: it prevents the use of parameterized queries, it adds the risk of SQL injection, it hurts performance, and it cannot solve the issue at hand if the sort order is unspecified.

Inner join vs scalar function

Which of the following queries is better? This is just an example; there are numerous situations where I want the user name to be displayed instead of the UserID:
Select EmailDate, B.EmployeeName as [UserName], EmailSubject
from Trn_Misc_Email as A
inner join
Mst_Users as B on A.CreatedUserID = B.EmployeeLoginName
or
Select EmailDate, dbo.GetUserName(CreatedUserID) as [UserName], EmailSubject
from Trn_Misc_Email
If there is no performance benefit in using the first, I would prefer using the second... I would have around 2000 records in the user table and 100k records in the email table...
Thanks
A good question and great to be thinking about SQL performance, etc.
From a pure SQL point of view, the first is better. The first statement is able to do everything in a single batch command with a join. In the second, for each row in Trn_Misc_Email it has to run a separate SELECT to get the user name. This could cause a performance issue now, or in the future.
It is also easier to read for anyone else coming onto the project, as they can see what is happening. If you had the second one, you would have to go and look in the function (I'm guessing that's what it is) to find out what it is doing.
So in reality, two reasons to use the first.
The inline SQL JOIN will usually be better than the scalar UDF as it can be optimised better.
When testing it though be sure to use SQL Profiler to view the cost of both versions. SET STATISTICS IO ON doesn't report the cost for scalar UDFs in its figures which would make the scalar UDF version appear better than it actually is.
Scalar UDFs are very slow, but inline ones are much faster, typically as fast as joins and subqueries.
BTW, your query with function calls is equivalent to an outer join, not to an inner one.
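That is, the scalar-UDF version corresponds to this left join sketch: emails whose CreatedUserID has no matching user still come back, just with a NULL user name.
Select EmailDate, B.EmployeeName as [UserName], EmailSubject
from Trn_Misc_Email as A
left join
Mst_Users as B on A.CreatedUserID = B.EmployeeLoginName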
To help you more, just a tip: in SQL Server Management Studio you can evaluate performance via Display Estimated Execution Plan. It shows how the indexes and joins work, and you can choose the best way to use them.
You can also use the DTA (Database Engine Tuning Advisor) for more info and optimization.

Cost of Inline Table Valued function in SQL Server

Is there an inherent cost to using inline table-valued functions in SQL Server 2008 that is not incurred if the SQL is inlined directly? Our application makes very heavy use of inline table-valued functions to reuse common queries, but recently we've found that queries run much faster if we don't use them.
Consider this:
CREATE FUNCTION dbo.fn_InnerQuery (@asOfDate DATETIME)
RETURNS TABLE
AS
RETURN
(
SELECT ... -- common, complicated query here
)
Now, when I do this:
SELECT TOP 10 Amount FROM dbo.fn_InnerQuery(dbo.Date(2009,1,1)) ORDER BY Amount DESC
The query returns with results in about 15 seconds.
However, when I do this:
SELECT TOP 10 Amount FROM
(
SELECT ... -- inline the common, complicated query here
) inline
ORDER BY Amount DESC
The query returns in less than 1 second.
I'm a little baffled by the overhead of using the table valued function in this case. I did not expect that. We have a ton of table valued functions in our application, so I'm wondering if there is something I'm missing here.
In this case, the UDF should be unnested/expanded like a view and it should be transparent.
Obviously, it's not...
In this case, my guess is that the column is smalldatetime and is cast to datetime because of the UDF parameter, but the constant is correctly evaluated (to match the column datatype) when inlined.
datetime has a higher precedence than smalldatetime, so the column would be cast.
What do the query plans say? The UDF would most likely show a scan, the inline version a seek (not 100%, just based on what I've seen before).
Edit: Blog post by Adam Machanic
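A self-contained sketch of that datatype-precedence guess (the table and column are hypothetical):
CREATE TABLE dbo.Orders (OrderDate SMALLDATETIME NOT NULL);

-- datetime outranks smalldatetime, so the column side gets the implicit
-- cast here, which can block an index seek on OrderDate:
DECLARE @dt DATETIME = '20090101';
SELECT * FROM dbo.Orders WHERE OrderDate = @dt;

-- With a matching parameter type, the column is compared as-is:
DECLARE @sdt SMALLDATETIME = '20090101';
SELECT * FROM dbo.Orders WHERE OrderDate = @sdt;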
One thing that can slow functions down is omitting dbo. from table references inside the function. That causes SQL Server to do a security check for every call, which can be expensive.
Try running the table-valued function independently to see how fast or slow it executes.
Also, I am not sure how to clear the execution cache which SQL Server might retain from the execution of the UDF. I mean, if you run the UDF first, SQL Server could cache its plan or results, so if you then run the complicated query separately, it could be running from cache.
In your second example the table-valued function has to return the entire data set before the query can apply the filter. Hopping across the TVF boundary is not something that the optimiser can always do.
In the third example the query optimiser can work out that the user only wants the top few amounts. If this isn't an aggregate value, the optimiser can push that processing right to the start of the query and not bother with any other data. If it is an aggregate amount, the slowdown is for a different reason.
If you compare the query plans of the two queries you should see that they are different.
