When I write SQL I try to make it as readable as possible. Among other things I often declare "constants" instead of using "magic numbers".
i.e. instead of
WHERE [Order].OrderType = 3
I do
DECLARE @OrderType_Cash AS int = 3;
...
WHERE [Order].OrderType = @OrderType_Cash
This works fine and I have not noticed any performance issues for the size of queries and data I normally work with.
Recently I read an article about parameter sniffing and workarounds (https://blogs.msdn.microsoft.com/turgays/2013/09/10/parameter-sniffing-problem-and-possible-workarounds/). In the article, one of the workarounds presented is "use local variables".
Workaround: Use local variable – This workaround is very similar to the previous one (OPTION (OPTIMIZE FOR (@VARIABLE UNKNOWN))) – when you assign the parameters to local ones, SQL Server uses statistics densities instead of statistics histograms – so it estimates the same number of records for all parameters – The disadvantage is that some queries will use suboptimal plans, because densities are not as precise as the statistics histogram.
This makes me a bit worried, since my interpretation is that I might get a suboptimal plan in my stored procedures just because I use a local variable instead of a "magic number".
I was also under the impression that SQL Server automatically converts "magic numbers" into variables in order to reuse plans.
Can someone clear this up for me?
Is there a difference between using a "magic number" and a local variable?
If yes, is it only in stored procedures or does it also apply to ad-hoc queries and dynamic SQL?
Is it a bad habit to use local variables like I do?
As documented in the article Statistics Used by the Query Optimizer in Microsoft SQL Server 2005
If you use a local variable in a query predicate instead of a
parameter or literal, the optimizer resorts to a reduced-quality
estimate, or a guess for selectivity of the predicate. Use parameters
or literals in the query instead of local variables
Regarding your questions...
I was also under the impression that SQL Server automatically converts "magic numbers" into variables in order to reuse plans.
No, never. It can auto-parameterise ad-hoc queries, but parameters behave differently from variables and can be sniffed. By default it will only do this in very limited circumstances where it is "safe" and unlikely to introduce parameter sniffing issues.
Is there a difference between using a "magic number" and a local
variable?
Yes, the statement is generally compiled before the variable value is even assigned. And even if the statement were subject to deferred compilation (or happened to be recompiled after the assignment), the values of variables are still never sniffed, except if you use OPTION (RECOMPILE). If you use the literal inline, SQL Server can look up that literal value in the histogram and potentially get much more accurate estimates, rather than resorting to guesses. Accurate row estimates are important in getting the correct overall plan shape (e.g. join type or access method selection) as well as getting an appropriate memory grant for your query.
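To make the difference concrete, here is a minimal sketch (the table and column names are hypothetical) showing the three cases: a variable (density-based guess), an inline literal (histogram lookup), and a variable with OPTION (RECOMPILE) (runtime value sniffed).

DECLARE @OrderType_Cash int = 3;

-- estimate comes from the density vector: a generic guess
SELECT * FROM dbo.Orders WHERE OrderType = @OrderType_Cash;

-- estimate comes from the histogram entry for the literal 3
SELECT * FROM dbo.Orders WHERE OrderType = 3;

-- OPTION (RECOMPILE) lets the optimizer sniff the variable's runtime value
SELECT * FROM dbo.Orders WHERE OrderType = @OrderType_Cash OPTION (RECOMPILE);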
The book "SQL Server 2005 practical troubleshooting" has this to say on the issue.
In SQL Server 2005, statement level compilation allows for compilation
of an individual statement in a stored procedure to be deferred until
just before the first execution of the query. By then the local
variable's value would be known. Theoretically SQL Server could take
advantage of this to sniff local variable values in the same way that
it sniffs parameters. However because it was common to use local
variables to defeat parameter sniffing in SQL Server 7.0 and SQL
Server 2000, sniffing of local variables was not enabled in SQL
Server 2005. It may be enabled in a future SQL Server release though
(NB: this has not in fact been enabled in any version to date)
If yes, is it only in stored procedures or does it also
apply to ad-hoc queries and dynamic SQL?
This applies to every use of variables. Parameters can be sniffed, though, so if you were to pass a variable from the outer scope as a parameter into an inner scope, the variable's value would be sniffed there.
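For example, here is a sketch (names are hypothetical) of passing a variable from the outer scope as a parameter to an inner scope with sp_executesql, so the value can be sniffed when the inner statement is compiled.

DECLARE @OrderType_Cash int = 3;

EXEC sys.sp_executesql
    N'SELECT * FROM dbo.Orders WHERE OrderType = @OrderType;',
    N'@OrderType int',
    @OrderType = @OrderType_Cash;  -- sniffed as a parameter in the inner scope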
Is it a bad habit to use local variables like I do?
If the plan is going to be sensitive to the exact variable value then yes. There are certain places where it will be entirely innocuous, however.
The disadvantage of option (recompile) as a fix is that it recompiles the statement every time. This is unnecessary when the only reason for doing so is to get it to sniff a variable whose value is constant. The disadvantage of option (optimize for) with a specific literal value is that if the value changes you need to update all those references too.
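For reference, the OPTIMIZE FOR variant looks something like this (a sketch against a hypothetical table; the literal 3 has to be kept in sync by hand, which is exactly the disadvantage mentioned above).

DECLARE @OrderType int = 3;

SELECT *
FROM   dbo.Orders
WHERE  OrderType = @OrderType
OPTION (OPTIMIZE FOR (@OrderType = 3));  -- plan compiled as if the value were 3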
Another approach would be to create a view of Constants.
CREATE VIEW MyConstants
AS
SELECT 3 AS OrderTypeCash, 4 AS OrderTypeCard
Then, instead of using a variable at all for those, reference that instead.
WHERE [Order].OrderType = (SELECT OrderTypeCash FROM MyConstants)
This will allow the value to be resolved at compile time and only need to be updated in one place.
Alternatively, if you use SSDT and database projects, you could use a SQLCMD variable that is defined once and assigned to, and then replace all your T-SQL variable references with that. The code deployed to the server will still have "magic numbers", but in your source code it is a single SQLCMD variable. (NB: for this pattern you might need to create a stub procedure in the project and use the post-deployment script to actually alter it with the desired definition, performing the SQLCMD substitutions.)
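A minimal sketch of what that could look like in a .sql file run in SQLCMD mode (the variable and table names are illustrative only):

:setvar OrderTypeCash 3

SELECT *
FROM   dbo.[Order]
WHERE  OrderType = $(OrderTypeCash);  -- substituted to the literal 3 before the batch reaches the server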
Related
I am new to PostgreSQL and I am trying to figure out some details about stored procedures (which I think are actually called functions in PostgreSQL) when used in a multiple-schema environment.
The application I have in mind involves a multi-tenant DB design where one schema is used for each tenant, and all schemata, which have the same table structure and names, are part of the same database. As far as I know from DBs in general, stored procedures/functions are pre-compiled and therefore faster, so I would like to use them to perform operations on each schema's tables by sending the required parameters from the application server instead of sending a list of SQL commands. In addition, I would like to have a SINGLE set of functions that implement all the SELECT (including JOIN type), INSERT, UPDATE, etc. operations on the tables of each schema. This will make it easy to apply changes in each function and avoid SQL code replication and redundancy. As I found out, it is possible to create a set of functions in a schema s0 and then create s1, s2, ... schemata (having all the same tables) that use these functions.
For example, I can create a template schema named s0 (identical to all others) and create an SQL or PL/pgSQL function that belongs to this schema and contains operations on the schema's tables. In this function, the table names are written without the schema prefix, i.e.
first_table and not s0.first_table
An example function could be:
CREATE FUNCTION sel() RETURNS BIGINT
AS 'SELECT count(a) from first_table;'
LANGUAGE SQL;
As I have tested, this function works well. If I move to schema s1 by entering:
set search_path to s1;
and then call the function again, the function acts upon s1 schema's identically named table first_table.
The function could also take a path parameter so that it can be called with a schema name, together with a command that changes the search_path, similar to this:
CREATE OR REPLACE FUNCTION doboth(path TEXT, is_local BOOLEAN DEFAULT false) RETURNS BIGINT AS $$
SELECT set_config('search_path', regexp_replace(path, '[^\w ,]', '', 'g'), is_local);
SELECT count(a) from first_table;
$$ LANGUAGE sql;
as shown in the proposed solution in PostgreSQL: how do I set the search_path from inside a function?
However, when I tried this and I called the function for a schema, I noticed that the second SELECT of the function was executed before the first, which led to executing the second SELECT on the wrong schema! This was really unexpected. Does anybody know the explanation to this behavior?
In order to bypass this issue, I created a PL/pgSQL function that does the same thing, and it worked without any execution order issues:
CREATE OR REPLACE FUNCTION doboth(path TEXT, is_local BOOLEAN DEFAULT false) RETURNS BIGINT AS $$
DECLARE result BIGINT;
BEGIN
PERFORM set_config('search_path', regexp_replace(path, '[^\w ,]', '', 'g'), is_local);
SELECT count(a) from first_table INTO result;
RETURN result;
END
$$ LANGUAGE plpgsql;
So, now some questions about performance this time:
1) Apart from a) having the selection of the schema to operate on and the specified operation on that schema in one transaction, which is necessary for my multi-tenant implementation, and b) grouping SQL commands together and avoiding some extra data exchange between the application server and the DB server, which is beneficial, do PostgreSQL functions have any performance benefits over executing the same code as separate SQL commands?
2) In the described multi-tenant scenario with many schemata and one DB, does a function that is defined once and called against any schema identical to the one it was defined in lose any of its performance benefits (if any)?
3) Is there any difference in performance between an SQL function and a PL/pgSQL function that contains the same operations?
Before I answer your questions, a remark about your SQL function.
It does not fail because the statements are executed in the wrong order, but because both queries are parsed before the first one is executed. The error message you get looks something like
ERROR: relation "first_table" does not exist
[...]
CONTEXT: SQL function "doboth" during startup
Note the “during startup”.
Answers
1) You may experience a slight performance boost, particularly if the SQL statements are complicated, because the plans of SQL statements in a PL/pgSQL function are cached for the duration of a database session or until they are invalidated. If the plan for the query is cached by the PL/pgSQL function, but the SQL statement calling the function has to be planned every time, you might actually be worse off from a performance angle because of the overhead of executing the function.
2) Whenever you call the function with a different schema name, the query plan will be invalidated and has to be created anew. So if you change the schema name for every invocation, you won't gain anything.
3) SQL functions don't cache query plans, so they don't perform better than the plain SQL query.
Note, however, that the gains from caching simple SQL statements in functions are not tremendous.
Use functions that just act as containers for SQL statements only if it makes life simpler for you; otherwise use plain SQL.
Do not focus only on performance during design, but on a good architecture and a simple design.
If the same statements keep repeating over and over, you might gain more performance using prepared statements than using functions.
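A sketch of what that looks like, assuming the first_table from the question is on the current search_path:

PREPARE count_first AS
    SELECT count(a) FROM first_table;

EXECUTE count_first;   -- the prepared statement can be re-executed cheaply within this session

DEALLOCATE count_first;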
Firstly, I do not really believe there can be any issues with line execution order in functions. If you have any issues, it's your code that's not working, not Postgres.
Secondly, multi-tenant behavior is well implemented with set search_path to s1, s0;. There is usually no need for switching anything inside procedures.
Thirdly, there are no performance benefits in using stored procedures except for minimizing data flows between DB and the application. If you consider a query like SELECT count(*) FROM mytable WHERE somecolumn = $1 there is absolutely nothing you can optimize before you know the value of $1.
And finally, no, there is no significant difference between functions in SQL and PL/pgSQL. The most time is still consumed by reading through tables, so focus on perfecting that.
Hope that clarifies the situation. Also, you may want to consider the security benefits of stored procedures. Just a hint.
If I do the following:
IF @THING = 1 BEGIN
DECLARE @BIGTHING AS BIGINT
END
Will SQL Server allocate the resources for the @BIGTHING?
Or maybe better asked: does SQL Server parse the stored procedure and allocate all the declared variables before execution?
It must, as this
IF 1=2 BEGIN
DECLARE @BIGTHING AS BIGINT = 0
END
select @BIGTHING
runs without problem for me with Microsoft SQL Server 2008 R2. Interestingly though, if the variable is defaulted to a value as shown above, it only sets the value if the IF condition is true.
A variable is available for use in a T-SQL batch from the point at which its declaration is parsed until the end of the batch. This means that a) variable declarations don't take part in any form of control flow, and b) variable names must be unique within a batch, even if they appear to be in mutually exclusive parts of the batch. SpectralGhost's answer shows the consequence of this.
Allocation isn't exactly the right word for it. All local variables are just given slots on the stack as part of the function declaration. Some time is spent zeroing out the values, but that's probably done in one pass for the whole stack frame.
An interesting thing is when the code is recompiled. This can happen in the middle of a stored procedure for a number of reasons. The one I was told specifically to look out for is altering a temp table. If you make any changes to the temp table, a new execution plan is generated for the rest of the stored proc.
I have a simple SELECT statement with a couple of columns referenced in the WHERE clause. Normally I do these simple ones in the VB code (set up a Command object, set its CommandType to Text, set its CommandText to the SELECT statement). However I'm seeing timeout problems. We've optimized just about everything we can with our tables, etc.
I'm wondering if there'd be a big performance hit just because I'm doing the query this way, versus creating a simple stored procedure with a couple params. I'm thinking maybe the inline code forces SQL to do extra work compiling, creating query plan, etc. which wouldn't occur if I used a stored procedure.
An example of the actual SQL being run:
SELECT TOP 1 * FROM MyTable WHERE Field1 = @Field1 ORDER BY ID DESC
A well formed "inline" or "ad-hoc" SQL query - if properly used with parameters - is just as good as a stored procedure.
But this is absolutely crucial: you must use properly parametrized queries! If you don't - if you concatenate together your SQL for each request - then you don't benefit from these points...
Just like with a stored procedure, upon first executing, a query execution plan must be found - and then that execution plan is cached in the plan cache - just like with a stored procedure.
That query plan is reused over and over again, if you call your inline parametrized SQL statement multiple times - and the "inline" SQL query plan is subject to the same cache eviction policies as the execution plan of a stored procedure.
Just from that point of view - if you really use properly parametrized queries - there's no performance benefit for a stored procedure.
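For example, the query from the question could be sent as a properly parameterized ad-hoc statement like this (a sketch; it assumes Field1 is an int), and its plan is cached and reused just like a stored procedure's:

EXEC sys.sp_executesql
    N'SELECT TOP (1) * FROM dbo.MyTable WHERE Field1 = @Field1 ORDER BY ID DESC;',
    N'@Field1 int',
    @Field1 = 42;  -- only the value changes between calls; the plan is reused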
Stored procedures have other benefits (like being a "security boundary" etc.), but just raw performance isn't one of their major plus points.
It is true that the db has to do the extra work you mention, but that should not result in a big performance hit (unless you are running the query very, very frequently..)
Use SQL Profiler to see what is actually getting sent to the server. Use Activity Monitor to see if there are other queries blocking yours.
Your query couldn't be simpler. Is Field1 indexed? As others have said, there is no performance hit associated with "ad-hoc" queries.
For where to put your queries, this is one of the oldest debates in tech. I would argue that your requests "belong" to your application. They will be versioned with your app, tested with your app, and should disappear when your app disappears. Putting them anywhere other than in your app is walking into a world of pain. But for goodness' sake, use .sql files, compiled as embedded resources.
Inline query:
A SELECT statement that is part of the FROM clause of another statement is called an inline query.
Cannot take parameters.
Not a database object.
Procedure:
Can take parameters.
Is a database object.
Can be reused globally if the same action needs to be performed.
I have seen people suggest copying the parameters to a local variable to avoid parameter sniffing in a stored proc. Say you have
CREATE PROCEDURE List_orders_3 @fromdate datetime AS
DECLARE @fromdate_copy datetime
SELECT @fromdate_copy = @fromdate
SELECT * FROM Orders WHERE OrderDate > @fromdate_copy
(I got this from http://www.sommarskog.se/query-plan-mysteries.html but I need more details to understand it fully).
But what does this actually do to the query plan cache and query plan optimizer? If it is true that the optimizer makes no assumptions about @fromdate_copy, then why won't it cache a plan that is most likely going to be a full table scan (since it makes no assumptions, how could it generate anything else)?
Is this technique basically like a "no inputs will run well, but no input will run terribly either" ?
Actually, you need to assign a default value to the @fromdate_copy variable that you declare, so that when the query engine looks at the query itself, it bases a plan on the value that is 'hard-coded'; but when the query actually gets executed, it runs with the value that is passed in and copied across.
Ken Henderson (the Guru himself) explained this in great detail: http://blogs.msdn.com/b/khen1234/archive/2005/06/02/424228.aspx
If you can, read his books - they offer a plethora of information about SQL Server internals: http://www.amazon.com/Gurus-Guide-Server-Architecture-Internals/dp/0201700476/ref=pd_bxgy_b_text_c
I'm not sure if he has anything written for the newer versions, but some of the fundamentals haven't changed that much...
A while ago I had a query that I ran quite a lot for one of my users. It was still being evolved and tweaked, but eventually it stabilised and ran quite quickly, so we created a stored procedure from it.
So far, so normal.
The stored procedure, though, was dog slow. No material difference between the query and the proc, but the speed change was massive.
[Background, we're running SQL Server 2005.]
A friendly local DBA (who no longer works here) took one look at the stored procedure and said "parameter spoofing!" (Edit: although it seems that it is possibly also known as 'parameter sniffing', which might explain the paucity of Google hits when I tried to search it out.)
We abstracted some of the stored procedure to a second one, wrapped the call to this new inner proc into the pre-existing outer one, called the outer one and, hey presto, it was as quick as the original query.
So, what gives? Can someone explain parameter spoofing?
Bonus credit for
highlighting how to avoid it
suggesting how to recognise possible causes
discussing alternative strategies, e.g. stats, indices, keys, for mitigating the situation
FYI - you need to be aware of something else when you're working with SQL 2005 and stored procs with parameters.
SQL Server will compile the stored proc's execution plan with the first parameter that's used. So if you run this:
usp_QueryMyDataByState 'Rhode Island'
The execution plan will work best with a small state's data. But if someone turns around and runs:
usp_QueryMyDataByState 'Texas'
The execution plan designed for Rhode-Island-sized data may not be as efficient with Texas-sized data. This can produce surprising results when the server is restarted, because the newly generated execution plan will be targeted at whatever parameter is used first - not necessarily the best one. The plan won't be recompiled until there's a big reason to do it, like if statistics are rebuilt.
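If you suspect the cached plan was built for an unrepresentative value, you can force a fresh compilation yourself; for example (using the hypothetical procedure name from above):

-- mark the procedure for recompilation on its next execution
EXEC sp_recompile N'usp_QueryMyDataByState';

-- or recompile just this one call while investigating
EXEC usp_QueryMyDataByState 'Texas' WITH RECOMPILE;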
This is where query plans come in, and SQL Server 2008 offers a lot of new features that help DBAs pin a particular query plan in place long-term no matter what parameters get called first.
My concern is that when you rebuilt your stored proc, you forced the execution plan to recompile. You called it with your favorite parameter, and then of course it was fast - but the problem may not have been the stored proc. It might have been that the stored proc was recompiled at some point with an unusual set of parameters and thus, an inefficient query plan. You might not have fixed anything, and you might face the same problem the next time the server restarts or the query plan gets recompiled.
Yes, I think you mean parameter sniffing, which is a technique the SQL Server optimizer uses to try to figure out parameter values/ranges so it can choose the best execution plan for your query. In some instances SQL Server does a poor job at parameter sniffing & doesn't pick the best execution plan for the query.
I believe this blog article http://blogs.msdn.com/queryoptteam/archive/2006/03/31/565991.aspx has a good explanation.
It seems that the DBA in your example chose option #4 to move the query to another sproc to a separate procedural context.
You could have also used WITH RECOMPILE on the original sproc, or used the OPTIMIZE FOR option on the parameter.
A simple way to speed that up is to reassign the input parameters to local variables at the very beginning of the sproc, e.g.
CREATE PROCEDURE uspParameterSniffingAvoidance
    @SniffedFormalParameter int
AS
BEGIN
    DECLARE @SniffAvoidingLocalParameter int
    SET @SniffAvoidingLocalParameter = @SniffedFormalParameter
    -- Work with @SniffAvoidingLocalParameter in the sproc body
    -- ...
END
In my experience, the best solution for parameter sniffing is dynamic SQL. Two important things to note: 1. you should use parameters in your dynamic SQL query; 2. you should use sp_executesql (and not sp_execute), which caches the execution plan for the parameterised statement so it can be reused across parameter values.
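A sketch of that pattern (the table and column names are made up for illustration):

DECLARE @fromdate datetime = '20240101';

DECLARE @sql nvarchar(max) =
    N'SELECT * FROM dbo.Orders WHERE OrderDate > @fromdate;';

EXEC sys.sp_executesql
    @sql,
    N'@fromdate datetime',
    @fromdate = @fromdate;  -- passed as a parameter, so it can be sniffed and the plan reused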
Parameter sniffing is a technique SQL Server uses to optimize the query execution plan for a stored procedure. When you first call the stored procedure, SQL Server looks at the given parameter values of your call and decides which indices to use based on the parameter values.
So when the first call contains not very typical parameters, SQL Server might select and store a sub-optimal execution plan in regard to the following calls of the stored procedure.
You can work around this by either
using WITH RECOMPILE
copying the parameter values to local variables inside the stored procedure and using the locals in your queries.
I even heard that it's better to not use stored procedures at all but to send your queries directly to the server.
I recently came across the same problem where I have no real solution yet.
For some queries the copy to local vars helps getting back to the right execution plan, for some queries performance degrades with local vars.
I still have to do more research on how SQL Server caches and reuses (sub-optimal) execution plans.
I had a similar problem. My stored procedure's execution plan took 30-40 seconds. I tried running the SP's statements in a query window and they took a few ms to execute.
Then I worked it out by declaring local variables within the stored procedure and transferring the values of the parameters to the local variables. This made the SP execution very fast, and now the same SP executes within a few milliseconds instead of 30-40 seconds.
Very simply and briefly: the query optimizer reuses an old query plan for frequently running queries, but the size of the data is also increasing, so at some point a newly optimized plan is required, yet the query optimizer is still using the old plan for the query. This is called parameter sniffing.
I have also written a detailed post on this. Please visit this URL:
http://www.dbrnd.com/2015/05/sql-server-parameter-sniffing/
Changing your stored procedure to execute as a batch should increase the speed.
Batch file select, i.e.:
exec ('select * from [order] where orderid = ''' + @ordersID + '''')
Instead of the normal stored procedure select:
select * from [order] where orderid = @ordersID
Just pass in the parameter as nvarchar and you should get quicker results.