I'm using one SP to perform all CRUD operations.
So basically I execute the same SP with a different mode, depending on which action is necessary,
e.g.:
-- for select
exec USP_ORDER @MODE='S', @ORDER_DATE='2009/01/01'
-- for update
exec USP_ORDER @MODE='U', @other_params
-- for insert
exec USP_ORDER @MODE='I', @other_params
-- for delete
exec USP_ORDER @MODE='D', @ID=100
Thanks to that, I have only one SP per business object, which keeps my DB organized.
But recently I experienced performance issues.
In light of that, my question:
Is this approach correct? Can it have an influence on performance or on getting a proper execution plan?
It can have performance implications, because the 'wrong' query plan may get cached and reused. Check out the topics 'parameter sniffing' and query plan caching.
EDIT: In response to John's comment, you could also have your top level SP decide which CRUD SP to call, then each would get its own cached query plan.
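A minimal sketch of that dispatcher idea, assuming a simplified order table. The table dbo.DemoOrder and the child procedures USP_ORDER_SELECT and USP_ORDER_DELETE are hypothetical names made up for illustration; only two of the four modes are shown. GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Hypothetical table and procedure names, for illustration only.
CREATE TABLE dbo.DemoOrder (
    ID         int IDENTITY(1,1) PRIMARY KEY,
    ORDER_DATE date NOT NULL
);
GO
CREATE PROCEDURE dbo.USP_ORDER_SELECT @ORDER_DATE date
AS
    SELECT ID, ORDER_DATE FROM dbo.DemoOrder WHERE ORDER_DATE = @ORDER_DATE;
GO
CREATE PROCEDURE dbo.USP_ORDER_DELETE @ID int
AS
    DELETE FROM dbo.DemoOrder WHERE ID = @ID;
GO
-- Top-level procedure: only branching, no data access of its own.
-- Each child procedure is compiled and cached separately.
CREATE PROCEDURE dbo.USP_ORDER
    @MODE       char(1),
    @ID         int  = NULL,
    @ORDER_DATE date = NULL
AS
BEGIN
    SET NOCOUNT ON;

    IF @MODE = 'S'
        EXEC dbo.USP_ORDER_SELECT @ORDER_DATE = @ORDER_DATE;
    ELSE IF @MODE = 'D'
        EXEC dbo.USP_ORDER_DELETE @ID = @ID;
    ELSE
        RAISERROR('Unsupported @MODE value: %s', 16, 1, @MODE);
END;
GO
EXEC dbo.USP_ORDER @MODE = 'S', @ORDER_DATE = '20090101';
```

Callers keep the single entry point, while each operation can be tuned and recompiled on its own.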
I think this is more a coding/design preference question.
Personally, I am a big fan of keeping things simple, and for this reason I would suggest you break out your operations into separate stored procedures.
This will be more transparent and will also aid any performance tuning you may have to do in the future, i.e. if your update procedure/logic is performing slowly, you can immediately isolate it as the cause, whereas if the logic is part of a much larger procedure with varying CRUD operations, the root cause of the issue will not be quite so obvious.
I'm also a fan of simplification (if possible).
But here is the reason I decided on this approach: I currently have ~80 SPs. If I split them all by the function they serve (e.g. USP_Sample_Insert, USP_Sample_Select1, USP_Sample_Select2, USP_Sample_Delete), I will have ~400 SPs!
Managing, navigating, updating and syncing parameters across such a huge number of SPs would be a nightmare for me.
For me, the only reasonable case for doing it is performance.
I've got a procedure exhibiting performance issues due to parameter sniffing. The procedure is costly but generally executes in an acceptable amount of time amid typical load. However, it will periodically perform poorly. When this happens, we see sessions executing the procedure go parallel. We'll see the number of sessions executing the routine start to pile up. We resolve the issue by pulling the procedure's plan from the cache, killing the active sessions, and monitoring for an immediate re-occurrence (like when a bad plan ends up back in the cache).
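For reference, the "pull the plan from the cache" step described above might look roughly like the sketch below. The procedure name dbo.USP_PROBLEM_PROC is a hypothetical placeholder, and the sketch assumes the caller has the server-level permissions needed to read the DMV and to run DBCC FREEPROCCACHE.

```sql
-- Evict only the cached plan of one procedure instead of flushing the whole cache.
-- dbo.USP_PROBLEM_PROC is a hypothetical name; substitute the real procedure.
DECLARE @plan_handle varbinary(64);

SELECT TOP (1) @plan_handle = ps.plan_handle
FROM sys.dm_exec_procedure_stats AS ps
WHERE ps.database_id = DB_ID()
  AND ps.object_id   = OBJECT_ID(N'dbo.USP_PROBLEM_PROC');

IF @plan_handle IS NOT NULL
    DBCC FREEPROCCACHE (@plan_handle);
ELSE
    PRINT 'No cached plan found for the procedure in this database.';
```

A lighter-weight alternative with a similar effect is EXEC sp_recompile N'dbo.USP_PROBLEM_PROC', which marks the procedure so that a new plan is compiled on its next execution.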
The procedure is in a typical OLTP database (lots of small DML queries). When the subject query performs poorly, it causes a CPU spike and degrades performance of the associated service. During business hours, the procedure is executed more than 5 times a minute (I don't have an exact count). The plan for the procedure is here:
https://www.brentozar.com/pastetheplan/?id=S1aEVQJBN
All columns in the query (join and projected columns), are covered by indexes.
Here are the options I'm weighing:
We can apply Option Recompile to the problematic select statement inside the routine.
We could do a little research and find a parameter value that generates a plan that is generally good for "everything". We'd then add the following OPTIMIZE FOR to the function. This will incur a little technical debt, as we may need to adjust this value over time.
We could introduce branching logic in the procedure. If the parameter value were X or in range-X, we'd execute sproc, and if not, we'd run sproc. Again, incurs a little technical debt.
We could change the problematic statement to dynamic SQL. This will cause a plan to be generated for each unique SQL statement. If we get tons of unique calls of this procedure, it could bloat the plan cache.
Slap a MAXDOP of "1" on the problematic select statement.
Options I've factored out:
We can optimize for unknown. This will base the row estimates on the density vector rather than on the sniffed parameter value, and optimize operations for that.
Have I missed any sensible options? What options make the most sense?
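As a rough illustration of how the hint-based options above are written, here is a sketch against a made-up table. dbo.SniffDemo, USP_SNIFF_DEMO, @CustomerId and the value 42 are all assumptions for the example; in practice only one of these hints would be applied to the problematic statement.

```sql
-- Hypothetical table, procedure and parameter names throughout.
CREATE TABLE dbo.SniffDemo (
    ID         int IDENTITY(1,1) PRIMARY KEY,
    CustomerId int           NOT NULL,
    Amount     decimal(10,2) NOT NULL
);
GO
CREATE PROCEDURE dbo.USP_SNIFF_DEMO @CustomerId int
AS
BEGIN
    SET NOCOUNT ON;

    -- Compile a fresh plan for this statement on every execution.
    SELECT ID, Amount FROM dbo.SniffDemo
    WHERE CustomerId = @CustomerId
    OPTION (RECOMPILE);

    -- Always optimize as if the parameter had one chosen, representative value.
    SELECT ID, Amount FROM dbo.SniffDemo
    WHERE CustomerId = @CustomerId
    OPTION (OPTIMIZE FOR (@CustomerId = 42));

    -- Optimize from average density instead of the sniffed value.
    SELECT ID, Amount FROM dbo.SniffDemo
    WHERE CustomerId = @CustomerId
    OPTION (OPTIMIZE FOR (@CustomerId UNKNOWN));

    -- Keep this statement from going parallel.
    SELECT ID, Amount FROM dbo.SniffDemo
    WHERE CustomerId = @CustomerId
    OPTION (MAXDOP 1);
END;
GO
EXEC dbo.USP_SNIFF_DEMO @CustomerId = 42;
```

Hints can also be combined in one clause, e.g. OPTION (RECOMPILE, MAXDOP 1).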
I have just analyzed a call to a particular SQL Server Stored Procedure. It is very slow so I decided to analyze it.
The result is confusing:
When executing the procedure it takes 1 min 24s.
When executing two identical calls simultaneously it takes 2 min 50 s. Obviously some bad blocking and context switching is happening.
When switching on Actual Execution Plan and running one call it takes 4 min 48s. The client statistics now tell me the total execution time is 3000ms.
What is happening?
Does the Actual Execution Plan actually interfere that much?
That means there should be a warning somewhere that the execution time is MUCH longer and the statistics are WAY off.
The procedure is not huge in size but with a nasty complexity: cursors, nested selects in five levels, temporary tables, sub procedures and function calls.
There are tons of articles on the web discussing why using cursors is a bad practice and should be avoided when possible. Depending on database settings a cursor is registered in the global namespace making concurrency close to impossible - it is absolutely normal that 2 instances of your proc take exactly twice as long. Performance of cursors and locking are other points to consider. So having said that, even a while loop on, say, a temp table with an identity column to loop on might be a better solution than cursors.
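A minimal sketch of the while-loop-over-a-temp-table pattern mentioned above. The #work table and its contents are made up for the example (it simply lists user table names), and the PRINT stands in for whatever per-row work the real procedure does. Whether this actually beats a well-declared cursor is something to measure, not assume.

```sql
-- Hypothetical work table keyed by an identity column.
CREATE TABLE #work (
    row_id int IDENTITY(1,1) PRIMARY KEY,
    name   sysname NOT NULL
);

INSERT INTO #work (name)
SELECT name FROM sys.objects WHERE type = 'U' ORDER BY name;

DECLARE @i int = 1, @max int, @name sysname;
SELECT @max = MAX(row_id) FROM #work;

-- If #work is empty, @max is NULL and the loop body never runs.
WHILE @i <= @max
BEGIN
    SELECT @name = name FROM #work WHERE row_id = @i;
    PRINT @name;      -- per-row work goes here
    SET @i += 1;
END;

DROP TABLE #work;
```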
My experience is that turning on the actual execution plan always adds some time. I have noticed that the more complex the plan is, the bigger the effect (so I guess it is largely related to SSMS handling it). But in your case the effect looks larger than I would expect, so there might be something else going on.
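One way to take the plan-collection overhead out of the measurement is to time the call with the Actual Execution Plan switched off. The sketch below is only an illustration: the SELECT is a stand-in statement and should be replaced with the real procedure call.

```sql
-- Run with "Include Actual Execution Plan" turned off.
SET STATISTICS TIME ON;   -- CPU and elapsed time per statement, shown in the Messages tab

DECLARE @start datetime2 = SYSDATETIME();

SELECT COUNT(*) AS object_count FROM sys.objects;   -- stand-in; replace with the real EXEC

SELECT DATEDIFF(millisecond, @start, SYSDATETIME()) AS elapsed_ms;

SET STATISTICS TIME OFF;
```

Comparing this number with and without the actual plan enabled shows how much of the 4 min 48 s is measurement overhead.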
I asked a question here Using cursor in OLTP databases (SQL server)
where people responded saying cursors should never be used.
I feel cursors are very powerful tools that are meant to be used (I don't think Microsoft supports cursors for bad developers). Suppose you have a table where the value of a column in a row is dependent on the value of the same column in the previous row. If it is a one-time back-end process, don't you think using a cursor would be an acceptable choice?
Off the top of my head I can think of a couple of scenarios where I feel there should be no shame in using cursors. Please let me know if you guys feel otherwise.
A one time back end process to clean bad data which completes execution within a few minutes.
Batch processes that run once in a long period of time (something like once a year).
If in the above scenarios, there is no visible strain on the other processes, wouldn't it be unreasonable to spend extra hours writing code to avoid cursors? In other words in certain cases the developer's time is more important than the performance of a process that has almost no impact on anything else.
In my opinion these would be some scenarios where you should seriously try to avoid using a cursor.
A stored proc called from a website that can get called very often.
A SQL job that would run multiple times a day and consume a lot of resources.
I think it's very superficial to make a general statement like "cursors should never be used" without analyzing the task at hand and actually weighing it against the alternatives.
Please let me know of your thoughts.
There are several scenarios where cursors actually perform better than set-based equivalents. Running totals is the one that always comes to mind - look for Itzik's words on that (and ignore any that involve SQL Server 2012, which adds new windowing functions that give cursors a run for their money in this situation).
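For comparison, the SQL Server 2012 windowing approach to running totals referred to above looks roughly like this. The table variable and its three rows are made-up sample data.

```sql
-- Requires SQL Server 2012 or later (window frame support).
DECLARE @t TABLE (id int PRIMARY KEY, amount decimal(10,2) NOT NULL);

INSERT INTO @t (id, amount)
VALUES (1, 10.00), (2, 5.50), (3, 4.50);

SELECT id,
       amount,
       SUM(amount) OVER (ORDER BY id ROWS UNBOUNDED PRECEDING) AS running_total
FROM @t
ORDER BY id;
-- running_total per row: 10.00, 15.50, 20.00
```

On earlier versions this frame syntax is not available, which is where the cursor-based running total tends to hold its own.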
One of the big problems people have with cursors is that they perform slowly, they use temporary storage, etc. This is partially because the default syntax is a global cursor with all kinds of inefficient default options. The next time you're doing something with a cursor that doesn't need to do things like UPDATE...WHERE CURRENT OF (which I've been able to avoid my entire career), give it a fair shake by comparing these two syntax options:
DECLARE c CURSOR
FOR <SELECT QUERY>;
DECLARE c CURSOR
LOCAL STATIC READ_ONLY FORWARD_ONLY
FOR <SELECT QUERY>;
In fact the first version represents a bug in the undocumented stored procedure sp_MSforeachdb which makes it skip databases if the status of any database changes during execution. I subsequently wrote my own version of the stored procedure (see here) which both fixed the bug (simply by using the latter version of the syntax above) and added several parameters to control which databases would be chosen.
A lot of people think that a methodology is not a cursor because it doesn't say DECLARE CURSOR. I've seen people argue that a while loop is faster than a cursor (which I hope I've dispelled here) or that using FOR XML PATH to perform group concatenation is not performing a hidden cursor operation. Looking at the plan in a lot of cases will show the truth.
In a lot of cases cursors are used where set-based is more appropriate. But there are plenty of valid use cases where a set-based equivalent is much more complicated to write, for the optimizer to generate a plan for, both, or not possible (e.g. maintenance tasks where you're looping through tables to update statistics, calling a stored procedure for each value in a result, etc.). The same is true for a lot of big multi-table queries where the plan gets too monstrous for the optimizer to handle. In these cases it can be better to dump some of the intermediate results into a temporary structure first. The same goes for some set-based equivalents to cursors (like running totals). I've also written about the other way, where people almost always think instinctively to use a while loop / cursor and there are clever set-based alternatives that are much better.
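A sketch of one of the maintenance-style loops mentioned above, using the lighter cursor options (written here in the option order given in the DECLARE CURSOR documentation). It only prints the UPDATE STATISTICS commands; the EXEC is left commented out so nothing is changed until the output has been reviewed.

```sql
DECLARE @schema sysname, @table sysname, @sql nvarchar(max);

DECLARE c CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY FOR
    SELECT s.name, t.name
    FROM sys.tables  AS t
    JOIN sys.schemas AS s ON s.schema_id = t.schema_id;

OPEN c;
FETCH NEXT FROM c INTO @schema, @table;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'UPDATE STATISTICS ' + QUOTENAME(@schema) + N'.' + QUOTENAME(@table) + N';';
    PRINT @sql;                        -- inspect the generated command first
    -- EXEC sys.sp_executesql @sql;    -- uncomment to actually run it
    FETCH NEXT FROM c INTO @schema, @table;
END;

CLOSE c;
DEALLOCATE c;
```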
UPDATE 2013-07-25
Just wanted to add some additional blog posts I've written about cursors, which options you should be using if you do have to use them, and using set-based queries instead of loops to generate sets:
Best Approaches for Running Totals - Updated for SQL Server 2012
What impact can different cursor options have?
Generate a Set or Sequence Without Loops: [Part 1] [Part 2] [Part 3]
The issue with cursors in SQL Server is that the engine is set-based internally, unlike other DBMS's like Oracle which are cursor-based internally. This means that when you create a cursor in SQL Server, temporary storage needs to be created and the set-based resultset needs to be copied over to the temporary cursor storage. You can see why this would be expensive right off the bat, not to mention any row-by-row processing that you might be doing on top of the cursor itself. The bottom line is that set-based processing is more efficient, and often times your cursor-based operation can be done better using a CTE or temp table.
That being said, there are cases where a cursor is probably acceptable, as you said for one-off operations. The most common use I can think of is in a maintenance plan where you may be iterating through all the databases on a server executing various maintenance tasks. As long as you limit your usage and don't design whole applications around RBAR (row-by-agonizing-row) processing, you should be fine.
In general cursors are a bad thing. However in some cases it is more practical to use a cursor and in some it is even faster to use one. A good example is a cursor through a contact table sending emails based on some criteria. (Not to open up the question if sending an email from your DBMS is a good idea - let's just assume it is for the problem at hand.) There is no way to write that set-based. You could use some trickery to come up with a set-based solution to generate dynamic SQL, but a real set-based solution does not exist.
However, a calculation involving the previous row can be done using a self join. That is usually still faster than a cursor.
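A small sketch of the self-join idea, assuming (for simplicity) a key with no gaps so that the previous row is always id - 1. The table variable and values are made up; with gaps in the key, a ROW_NUMBER()-based join or LAG() on SQL Server 2012+ would be needed instead.

```sql
DECLARE @readings TABLE (id int PRIMARY KEY, value int NOT NULL);

INSERT INTO @readings (id, value)
VALUES (1, 100), (2, 130), (3, 125);

-- Each row is joined to its predecessor; the first row has none, so delta is NULL.
SELECT cur.id,
       cur.value,
       cur.value - prev.value AS delta
FROM @readings AS cur
LEFT JOIN @readings AS prev
       ON prev.id = cur.id - 1
ORDER BY cur.id;
-- delta per row: NULL, 30, -5
```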
In all cases you need to balance the effort involved in developing a faster solution. If nobody cares whether your process runs in one minute or in one hour, use what gets the job done quickest. If you are looping through a dataset that grows over time, like an [orders] table, try to stay away from a cursor if possible. If you are not sure, do a performance test comparing a cursor-based with a set-based solution on several significantly different data sizes.
I had always disliked cursors because of their slow performance. However, I found I didn't fully understand the different types of cursors and that in certain instances, cursors are a viable solution.
When you have a business problem that can only be solved by processing one row at a time, then a cursor is appropriate.
So to improve performance with the cursor, change the type of cursor you are using. Something I didn't know was, if you don't specify which type of cursor you are declaring, you get the Dynamic Optimistic type by default, which is the one that is the slowest for performance because it's doing lots of work under the hood. However, by declaring your cursor as a different type, say a static cursor, it has very good performance.
See these articles for a fuller explanation:
The Truth About Cursors: Part I
The Truth About Cursors: Part II
The Truth About Cursors: Part III
I think the biggest con against cursors is performance, however, not laying out a task in a set based approach would probably rank second. Third would be readability and layout of the tasks as they usually don't have a lot of helpful comments.
SQL Server is optimized to run the set based approach. You write the query to return a result set of data, like a join on tables for example, but the SQL Server execution engine determines which join to use: Merge Join, Nested Loop Join, or Hash Join. SQL Server determines the best possible joining algorithm based upon the participating columns, data volume, indexing structure, and the set of values in the participating columns. So using a set based approach is generally the best approach in performance over the procedural cursor approach.
They are necessary for things like dynamic SQL pivoting, but you should try and avoid using them whenever possible.
I'm running on SQL Server 2008 R2 and am trying to fine-tune performance. I did everything I could from:
Code review of SQL code
Create or remove indexes as I think appropriate
Auto create stats ON
Auto update stats ON
Auto update stats async ON
I have a 24/7 system that constantly stores data. Sometimes we do reads and that's where the issue is. Sometimes the reads take a couple of seconds or less (which would be expected and acceptable to us). Other times, the reads take several seconds that could amount to a minute before the stored procedure completes and we render data on the UI.
If we do the read again, it would be faster. The SQL profiler would trace the particular stored procedure or query that took several seconds. We would zoom into that stored procedure, and do everything we can do to optimize it if we can.
I also traced the auto stats event and the recompile event. It's hard to tell if a stat is being updated causing the read to take a long time, or if a recompile caused it. Sometimes, I see that the profiler traced a recompile of the read query that took several unacceptable minutes, other times it doesn't trace a recompile.
I tried to prevent the query optimizer from blocking the read until it recompiles or updates stats by using option use plan XML, etc. But I ran into compile errors complaining that the query plan XML isn't valid; that could be true because the query is quite involved: select + joins that involve a local table var. I sort of hacked the XML and maybe that's why it deemed it invalid. So I gave up on using the plan hint.
We tried periodic (every 15 minutes) manual running update stats in order to keep stats up-to-date as much as we can, but that hurt performance. updatestats blocks writes, and I'm sure even reads; updatestats seemed to maintain a bunch of statistics and on average it was taking around 80-90 seconds. A read that waits that long is unacceptable.
So the idea is to let the reads happen and prevent a situation when a recompile/update stat blocks it, correct? Does it make sense to disable auto statistics altogether? Or perhaps disable auto create statistics after deleting all the auto created stats?
This goes against Microsoft recommendations perhaps, since they enable auto create statistics and auto update statistics by default, and performance may suffer, but any ideas/hints you can give would be appreciated.
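Before changing any of these settings, it may help to confirm what is actually configured and how stale the statistics are. The following read-only sketch uses catalog views and functions available in SQL Server 2008 R2 and runs against the current database.

```sql
-- Current auto-statistics settings for this database.
SELECT name,
       is_auto_create_stats_on,
       is_auto_update_stats_on,
       is_auto_update_stats_async_on
FROM sys.databases
WHERE name = DB_NAME();

-- When each statistics object on user tables was last updated, oldest first.
SELECT OBJECT_NAME(s.object_id)            AS table_name,
       s.name                              AS stats_name,
       s.auto_created,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY last_updated;
```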
From what you are explaining, it looks like the below (all or some) might be happening.
You are doing physical reads. The quick way you avoid this is by increasing the amount of RAM you throw at the box. You haven't mentioned the hardware specs of your server. Please add details.
If you trace the SQL calls then you can easily figure out why the RECOMPILE happened. Look at the EventSubClass to figure out the reason and work towards resolving that.
ref: http://msdn.microsoft.com/en-us/library/ms187105.aspx
You mentioned table variables. These are notorious for causing performance issues when NOT used in the right place. If you use table variables in a JOIN, a parallel plan is out of the question and there are no stats either. I am NOT sure how and where you are using them, but try replacing them with temp tables. And starting from SQL Server 2005, you will get only a statement-level recompilation at best and NOT the complete SP recompile as it happened in 2000.
You mentioned Update Stats ASYNC option and this won't block the query.
What are the TOP WAIT STATS on this server? Have you identified the expensive procedures based on CPU, Logical reads & execution count?
Have you looked at the Page Life Expectancy and the amount of IO using the virtual file stats DMV?
Updating Stats every 15 minutes is NOT a good plan. How often is data inserted into the system? What is the sample rate you are using? What is your index maintenance strategy?
Have you looked at the missing indexes DMV?
There are a bunch of good queries for identifying problems in a more granular fashion at the link below.
ref: http://dl.dropbox.com/u/13748067/SQL%20Server%202008%20Diagnostic%20Information%20Queries%20%28April%202011%29.sql
There are so many other things to look at but the above is a good starting point.
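A few of the checks listed above can be started with queries along these lines. This is only a rough sketch: the wait-stats query is unfiltered, so benign background waits will appear near the top, and all numbers are cumulative since the last restart (or since the cache entry was created).

```sql
-- Top waits since the instance started.
SELECT TOP (10) wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;

-- Page Life Expectancy, in seconds.
SELECT [object_name], cntr_value AS page_life_expectancy_sec
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Page life expectancy'
  AND [object_name] LIKE N'%Buffer Manager%';

-- Most expensive cached procedures by CPU, with logical reads and execution count.
SELECT TOP (10)
       OBJECT_NAME(ps.object_id, ps.database_id) AS proc_name,
       ps.execution_count,
       ps.total_worker_time,
       ps.total_logical_reads
FROM sys.dm_exec_procedure_stats AS ps
ORDER BY ps.total_worker_time DESC;
```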
OK, here is my IMHO catch on this:
DBCC INDEXDEFRAG is worth trying and is an ONLINE function hence can be used on a live system
You could be reaching the maximum capacity of your architectural design. You can scale up which can always help but more likely you have to change the architecture to achieve better scalability sacrificing simplicity
A common trick is partitioning. You are writing to a table whose index distribution looks nothing like it was a few hours ago - hence degrading performance. This is a massive write, such a table could be divided to daily write and the rest of the data with nightly batches of moving stuff across.
More and more, people are converting to CQRS. You might be the next. This solves the problem by separating reads from writes (a very simplistic explanation).
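On the defragmentation suggestion above: DBCC INDEXDEFRAG is documented as deprecated, and ALTER INDEX ... REORGANIZE is its online replacement. A sketch of checking fragmentation first and then reorganizing follows; the 10 percent threshold and the index and table names in the commented-out statement are assumptions for illustration.

```sql
-- Fragmentation of indexes in the current database (LIMITED mode is the cheapest scan).
SELECT OBJECT_NAME(ips.object_id)       AS table_name,
       i.name                           AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id  = ips.index_id
WHERE ips.index_id > 0                          -- skip heaps
  AND ips.avg_fragmentation_in_percent > 10     -- assumed threshold
ORDER BY ips.avg_fragmentation_in_percent DESC;

-- Reorganize one index online (hypothetical names; substitute from the result above).
-- ALTER INDEX IX_Demo ON dbo.DemoTable REORGANIZE;
```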
I have an ETL process that involves a stored procedure that makes heavy use of SELECT INTO statements (minimally logged and therefore faster, as they generate less log traffic). Of the batch of work that takes place in one particular stored procedure, several of the most expensive operations are eager spools that appear to just buffer the query results and then copy them into the table just being made.
The MSDN documentation on eager spools is quite sparse. Does anyone have a deeper insight into whether these are really necessary (and under what circumstances)? I have a few theories that may or may not make sense, but no success in eliminating these from the queries.
The .sqlplan files are quite large (160kb) so I guess it's probably not reasonable to post them directly to a forum.
So, here are some theories that may be amenable to specific answers:
The query uses some UDFs for data transformation, such as parsing formatted dates. Does this data transformation necessitate the use of eager spools to allocate sensible types (e.g. varchar lengths) to the table before it constructs it?
As an extension of the question above, does anyone have a deeper view of what does or does not drive this operation in a query?
My understanding of spooling is that it's a bit of a red herring on your execution plan. Yes, it accounts for a lot of your query cost, but it's actually an optimization that SQL Server undertakes automatically so that it can avoid costly rescanning. If you were to avoid spooling, the cost of the execution tree it sits on will go up and almost certainly the cost of the whole query would increase. I don't have any particular insight into what in particular might cause the database's query optimizer to parse the execution that way, especially without seeing the SQL code, but you're probably better off trusting its behavior.
However, that doesn't mean your execution plan can't be optimized, depending on exactly what you're up to and how volatile your source data is. When you're doing a SELECT INTO, you'll often see spooling items on your execution plan, and it can be related to read isolation. If it's appropriate for your particular situation, you might try just lowering the transaction isolation level to something less costly, and/or using the NOLOCK hint. I've found in complicated performance-critical queries that NOLOCK, if safe and appropriate for your data, can vastly increase the speed of query execution even when there doesn't seem to be any reason it should.
In this situation, if you try READ UNCOMMITTED or the NOLOCK hint, you may be able to eliminate some of the Spools. (Obviously you don't want to do this if it's likely to land you in an inconsistent state, but everyone's data isolation requirements are different). The TOP operator and the OR operator can occasionally cause spooling, but I doubt you're doing any of those in an ETL process...
You're right in saying that your UDFs could also be the culprit. If you're only using each UDF once, it would be an interesting experiment to try putting them inline to see if you get a large performance benefit. (And if you can't figure out a way to write them inline with the query, that's probably why they might be causing spooling).
One last thing I would look at is that, if you're doing any joins that can be re-ordered, try using a hint to force the join order to happen in what you know to be the most selective order. That's a bit of a reach but it doesn't hurt to try it if you're already stuck optimizing.
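To make the experiments suggested above concrete, here is a sketch that combines them on made-up temp tables: an inline CONVERT in place of a date-parsing UDF, NOLOCK table hints, and a forced join order. All table and column names are hypothetical, and whether any of this removes the spools in the real plan can only be confirmed by comparing the actual plans.

```sql
-- Hypothetical source data standing in for the ETL inputs.
CREATE TABLE #src    (id int PRIMARY KEY, raw_date char(8)    NOT NULL);
CREATE TABLE #lookup (id int PRIMARY KEY, label   varchar(20) NOT NULL);

INSERT INTO #src    (id, raw_date) VALUES (1, '20090101'), (2, '20090215');
INSERT INTO #lookup (id, label)    VALUES (1, 'first'),    (2, 'second');

-- SELECT INTO with the date parsing written inline (style 112 = yyyymmdd),
-- dirty reads allowed on both inputs, and the join order fixed as written.
SELECT s.id,
       CONVERT(date, s.raw_date, 112) AS parsed_date,
       l.label
INTO #target
FROM #lookup AS l WITH (NOLOCK)
JOIN #src    AS s WITH (NOLOCK)
  ON s.id = l.id
OPTION (FORCE ORDER);

SELECT id, parsed_date, label FROM #target ORDER BY id;

DROP TABLE #target, #lookup, #src;
```

With FORCE ORDER the tables are joined in the order they appear in the FROM clause, so the most selective input should be listed first.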