SQL Server performance and fully qualified table names - sql-server

It seems to be fairly accepted that including the schema owner in the query increases db performance, e.g.:
SELECT x FROM [dbo].Foo vs SELECT x FROM Foo.
This is supposed to save a lookup, because SQL Server will otherwise look for a Foo table belonging to the user in the connection context.
Today I was told that always including the database name improves the performance the same way, even if you are querying the database you selected in your connection string:
SELECT x FROM MyDatabase.[dbo].Foo
Is there any truth to this? Does this make sense as a coding standard? Does any of this (even the first example) translate to measurable benefits?
Are we talking about a few cycles for an extra dictionary lookup on the database server vs more bloated SQL and extra concatenation on the web server (or other client)?

One thing to keep in mind is that this is a compilation binding, not an execution one. So if you execute the same query 1 million times, only the first execution will 'hit' the look up time, the rest will reuse the same plan and plans are pre-bound (names are already resolved to object ids).

In this case I would personally prefer readability over the tiny performance increase that this could possibly cause, if any.
SELECT * FROM Foo
Seems a lot easier to scan than:
SELECT * FROM MyDatabase.[dbo].Foo

Try it out? Just loop through a million queries of both and see which one finishes first.
I'm guessing it's a load of bunk though. The developers of MS SQL spend millions of hours researching efficiency for search algorithms and storage methods only to be thwarted by users not specifying fully qualified table names? Laughable.

SQL server will not make an extra look up if the default schema is the same. It should be included if it's not and it's a query that is used a lot.
The database name will not benefit the query performance. I think this could be seen with the Estimated Execution Plan in the Management Studio.

As Spencer said - try it, of course make sure you clear the cache each time as this will interfere with your results.
http://www.devx.com/tips/Tip/14401
I also would be suprised if it made any apprecible difference.

Related

ORDER BY is taking too much time when executed for VIEW

I have a relatively complicated set up. I'm using SQL Server 2012 with 3 linked servers which are IBM DB2 servers. I have several queries which join tables from all three linked servers to fetch data. Due to some specifics of the version that I'm using I can't use some OLAP functions directly so since an upgrade is not an option the workaround was to create views and execute those functions on the views. One problem that I'm facing right now is that using ORDER BY from the view almost triples up the time needed for the view to be executed.
When I execute the only with SELECT it takes 24 seconds (Yeah, I know we talk about ridiculous times here, but still, I want to fix the problem with the order by since I'm not allowed to change the queries to the DB2 servers but the order by is on my side), when I use ORDER BY it goes from 68 to 80 seconds depending on which column I'm ordering on. I can't create a schemabound view because it's now allowed with OpenQuery, I've read the anyways it's not allowed to use ORDER BY when creating a view, I haven't tried that but since I need the order by to be available on multiple columns it's not a n option unless I create as much views as columns I have which sounds kinda ridiculous but... dunno.
Since I have trivial knowledge about SQL at general I'm not sure what is the best choice here. Even if the execution times are fixed I don't want my Order by clause to be so much time consuming compared to the time needed for the whole query. If I can make it as fast as it is when I execute it directly in the query - when I don't use view and I add the ORDER BY to the initial query the original time is 24 seconds and then it goes up to 36 which in percents is still much better than the performance when the same ORDER BY function is executed from the view.
So my questions are - what causes the ORDER BY to be executed so slow from the view and how can I make it as fast as if it was part from the original query, also, if this is just not possible, how can I reduce the huge time it takes?
Views use different execution plans than the queries that make them. This is, in my opinion, a bit of a shortcoming of views. ORDER BY is a particularly expensive command, so it makes the difference in execution plans very noticeable.
The alternative to this issue I've found was going the Table Valued Function route as it does appear to use the same execution plan as just running the query.
Here's a decent write-up of Table Valued Functions:
http://technet.microsoft.com/en-us/library/ms191165(v=sql.105).aspx

Pros and cons of using a cursor (in SQL server)

I asked a question here Using cursor in OLTP databases (SQL server)
where people responded saying cursors should never be used.
I feel cursors are very powerful tools that are meant to be used (I don't think Microsoft supports cursors for bad developers).Suppose you have a table where the value of a column in a row is dependent on the value of the same column in the previous row. If it is a one time back end process, don't you think using a cursor would be an acceptable choice?
Off the top of my head I can think of a couple of scenarios where I feel there should be no shame in using cursors. Please let me know if you guys feel otherwise.
A one time back end process to clean bad data which completes execution within a few minutes.
Batch processes that run once in a long period of time (something like once a year).
If in the above scenarios, there is no visible strain on the other processes, wouldn't it be unreasonable to spend extra hours writing code to avoid cursors? In other words in certain cases the developer's time is more important than the performance of a process that has almost no impact on anything else.
In my opinion these would be some scenarios where you should seriously try to avoid using a cursor.
A stored proc called from a website that can get called very often.
A SQL job that would run multiple times a day and consume a lot of resources.
I think its very superficial to make a general statement like "cursors should never be used" without analyzing the task at hand and actually weighing it against the alternatives.
Please let me know of your thoughts.
There are several scenarios where cursors actually perform better than set-based equivalents. Running totals is the one that always comes to mind - look for Itzik's words on that (and ignore any that involve SQL Server 2012, which adds new windowing functions that give cursors a run for their money in this situation).
One of the big problems people have with cursors is that they perform slowly, they use temporary storage, etc. This is partially because the default syntax is a global cursor with all kinds of inefficient default options. The next time you're doing something with a cursor that doesn't need to do things like UPDATE...WHERE CURRENT OF (which I've been able to avoid my entire career), give it a fair shake by comparing these two syntax options:
DECLARE c CURSOR
FOR <SELECT QUERY>;
DECLARE c CURSOR
LOCAL STATIC READ_ONLY FORWARD_ONLY
FOR <SELECT QUERY>;
In fact the first version represents a bug in the undocumented stored procedure sp_MSforeachdb which makes it skip databases if the status of any database changes during execution. I subsequently wrote my own version of the stored procedure (see here) which both fixed the bug (simply by using the latter version of the syntax above) and added several parameters to control which databases would be chosen.
A lot of people think that a methodology is not a cursor because it doesn't say DECLARE CURSOR. I've seen people argue that a while loop is faster than a cursor (which I hope I've dispelled here) or that using FOR XML PATH to perform group concatenation is not performing a hidden cursor operation. Looking at the plan in a lot of cases will show the truth.
In a lot of cases cursors are used where set-based is more appropriate. But there are plenty of valid use cases where a set-based equivalent is much more complicated to write, for the optimizer to generate a plan for, both, or not possible (e.g. maintenance tasks where you're looping through tables to update statistics, calling a stored procedure for each value in a result, etc.). The same is true for a lot of big multi-table queries where the plan gets too monstrous for the optimizer to handle. In these cases it can be better to dump some of the intermediate results into a temporary structure first. The same goes for some set-based equivalents to cursors (like running totals). I've also written about the other way, where people almost always think instinctively to use a while loop / cursor and there are clever set-based alternatives that are much better.
UPDATE 2013-07-25
Just wanted to add some additional blog posts I've written about cursors, which options you should be using if you do have to use them, and using set-based queries instead of loops to generate sets:
Best Approaches for Running Totals - Updated for SQL Server 2012
What impact can different cursor options have?
Generate a Set or Sequence Without Loops: [Part 1] [Part 2] [Part 3]
The issue with cursors in SQL Server is that the engine is set-based internally, unlike other DBMS's like Oracle which are cursor-based internally. This means that when you create a cursor in SQL Server, temporary storage needs to be created and the set-based resultset needs to be copied over to the temporary cursor storage. You can see why this would be expensive right off the bat, not to mention any row-by-row processing that you might be doing on top of the cursor itself. The bottom line is that set-based processing is more efficient, and often times your cursor-based operation can be done better using a CTE or temp table.
That being said, there are cases where a cursor is probably acceptable, as you said for one-off operations. The most common use I can think of is in a maintenance plan where you may be iterating through all the databases on a server executing various maintenance tasks. As long as you limit your usage and don't design whole applications around RBAR (row-by-agonizing-row) processing, you should be fine.
In general cursors are a bad thing. However in some cases it is more practical to use a cursor and in some it is even faster to use one. A good example is a cursor through a contact table sending emails based on some criteria. (Not to open up the question if sending an email from your DBMS is a good idea - let's just assume it is for the problem at hand.) There is no way to write that set-based. You could use some trickery to come up with a set-based solution to generate dynamic SQL, but a real set-based solution does not exist.
However, a calculation involving the previous row can be done using a self join. That is usually still faster than a cursor.
In all cases you need to balance the effort involved in developing a faster solution. If nobody cares, if you process runs in 1 minute or in one hour, use what gets the job done quickest. If you are looping through a dataset that grows over time like an [orders] table, try to stay away from a cursor if possible. If you are not sure, do a performance test comparing a cursor base with a set-based solution on several significantly different data sizes.
I had always disliked cursors because of their slow performance. However, I found I didn't fully understand the different types of cursors and that in certain instances, cursors are a viable solution.
When you have a business problem that can only be solved by processing one row at a time, then a cursor is appropriate.
So to improve performance with the cursor, change the type of cursor you are using. Something I didn't know was, if you don't specify which type of cursor you are declaring, you get the Dynamic Optimistic type by default, which is the one that is the slowest for performance because it's doing lots of work under the hood. However, by declaring your cursor as a different type, say a static cursor, it has very good performance.
See these articles for a fuller explanation:
The Truth About Cursors: Part I
The Truth About Cursors: Part II
The Truth About Cursors: Part III
I think the biggest con against cursors is performance, however, not laying out a task in a set based approach would probably rank second. Third would be readability and layout of the tasks as they usually don't have a lot of helpful comments.
SQL Server is optimized to run the set based approach. You write the query to return a result set of data, like a join on tables for example, but the SQL Server execution engine determines which join to use: Merge Join, Nested Loop Join, or Hash Join. SQL Server determines the best possible joining algorithm based upon the participating columns, data volume, indexing structure, and the set of values in the participating columns. So using a set based approach is generally the best approach in performance over the procedural cursor approach.
They are necessary for things like dynamic SQL pivoting, but you should try and avoid using them whenever possible.

Any suggestions for identifying what indexes need to be created?

I'm in a situation where I have to improve the performance of about 75 stored procedures (created by someone else) used for reporting. The first part of my solution was creating about 6 denormalized tables that will be used for the bulk of the reporting. Now that I've created the tables I have the somewhat daunting task of determining what Indexes I should create to best improve the performance of these stored procs.
I'm curious to see if anyone has any suggestions for finding what columns would make sense to include in the indexes? I've contemplated using Profiler/DTA, or possibly fasioning some sort of query like the one below to figure out the popular columns.
SELECT name, Count(so.name) as hits, so.xtype
from syscomments as sc
INNER JOIN sysobjects so ON sc.id=so.id
WHERE sc.text like '%ColumnNamme%'
AND xtype = 'P'
Group by name,so.xtype
ORDER BY hits desc
Let me know if you have any ideas that would help me not have to dig through these 75 procs by hand.
Also, inserts are only performed on this DB once per day so insert performance is not a huge concern for me.
Any suggestions for identifying what indexes need to be created?
Yes! Ask Sql Server to tell you.
Sql Server automatically keeps statistics for what indexes it can use to improve performance. This is already going on in the background for you. See this link:
http://msdn.microsoft.com/en-us/library/ms345417.aspx
Try running a query like this (taken right from msdn):
SELECT mig.*, statement AS table_name,
column_id, column_name, column_usage
FROM sys.dm_db_missing_index_details AS mid
CROSS APPLY sys.dm_db_missing_index_columns (mid.index_handle)
INNER JOIN sys.dm_db_missing_index_groups AS mig ON mig.index_handle = mid.index_handle
ORDER BY mig.index_group_handle, mig.index_handle, column_id;
Just be careful. I've seen people take the missing index views as Gospel, and use them to push out a bunch of indexes they don't really need. Indexes have costs, in terms of upkeep at insert, update, and delete time, as well as disk space and memory use. To make real, accurate use of this information you want to profile actual execution times of your key procedures both before and after any changes, to make sure the benefits of an index (singly or cumulative) aren't outweighed by the costs.
If you know all of the activity is coming from the 75 stored procedures then I would use profiler to track which stored procedures take the longest and are called the most. Once you know which ones are then look at those procs and see what columns are being used most often in the Where clause and JOIN ON sections. Most likely, those are the columns you will want to put non-clustered indexes on. If a set of columns are often times used together then there is a good chance you will want to make 1 non-clustered index for the group. You can have many non-clustered indexes on a table (250) but you probably don't want to put more than a handful on it. I think you will find the data is being searched and joined on the same columns over and over. Remember the 80/20 rule. You will probably get 80% of your speed increases in the first 20% of the work you do. There will be a point where you get very little speed increase for the added indexes, that is when you want to stop.
I concur with bechbd - use a good sample of your database traffic (by running a server trace on a production system during real office hours, to get the best snapshot), and let the Database Tuning Advisor analyze that sampling.
I agree with you - don't blindly rely on everything the Database Tuning Advisor tells you to do - it's just a recommendation, but the DTA can't take everything into account. Sure - by adding indices you can speed up querying - but you'll slow down inserts and updates at the same time.
Also - to really find out if something helps, you need to implement it, measure again, and compare - that's really the only reliable way. There are just too many variables and unknowns involved.
And of course, you can use the DTA to fine-tune a single query to perform outrageously well - but that might neglect the fact that this query is only ever called one per week, or that by tuning this one query and adding an index, you hurt other queries.
Index tuning is always a balance, a tradeoff, and a trial-and-error kind of game - it's not an exact science with a formula and a recipe book to strictly determine what you need.
You can use SQL Server profiler in SSMS to see what and how your tables are being called then using the Database Tuning Tool in profiler to at least start you down the correct path. I know most DBA's will probably scream at me for recommending this but for us non-DBA types such as myself it at least gives us a starting point.
If this is strictly a reporting database and you need performance, consider moving to a data warehouse design. A star or snowflake schema will outperform even a denormalized relational design when it comes to reporting.

SQL server fields with most usage

Is there an SQL query that can give me the fields that are used in most stored procedures or updated, selected most in a given table. I am asking this because I want to figure out which fields to put indexes on.
Take a look at the missing indexes article on SQLServerPedia http://sqlserverpedia.com/wiki/Find_Missing_Indexes
I think you are looking at the problem the wrong way around.
What you first need to identify are the most expensive (cumulative: so both single-run high cost, and many-runs lower cost) queries in your normal workload.
Once you have identified those queries, you can analyse their query plans and create appropriate indexes.
This SO Answer might be of use: How Can I Log and Find the Most Expensive Queries? (title and tags say SQL Server 2008, but my accepted answer applies to any version).
Most used fields are by no means index candidates. Good index candidates are those that correctly balance the extra storage requirements with SARGability and query projection coverage, as described in Index Design Basics. You should follow the advice the very engine is giving you, using the Missing Indexes feature:
sys.dm_db_missing_index_group_stats
sys.dm_db_missing_index_groups
sys.dm_db_missing_index_details
sys.dm_db_missing_index_columns
A good action plan is to start from the most expensive queries by IO obtained from sys.dm_exec_query_stats and then open the plan of the query with sys.dm_exec_query_plan in Management Studio and at the top of the query plan view will be a proposed index, with the CREATE INDEX just ready to copy and paste into execution. In fact you don't even have to run the queries to find the most expensive query in the plan cache, there are already SSMS reports that can find it for you.
Wish it was that easy.... You need to do a search for SQL Query Optimization. There are lots of tools to use for what you need.
And knowing which fields are used often does not tell you much about what you need to do to optimize access.
You might have some luck by searching your code for all the queries and then running a SELECT EXPLAIN on them. This should give you some glaring numbers when a query is poorly indexed. Unfortunately it won't give you an idea of which statements are called most frequently.
Also - I've noticed some issues with type mismatching in queries. When you pass a string containing a number and it's querying an index based on integers then it will scan the entire table.

Alternatives to MS SQL 2005 FullText Catalog

I can't seem to get acceptable performance from FullText Catalogs. We have situations where we must run 100k+ queries as quickly as possible. Some of the queries use FREETEXT some don't. Here's an example of a query
IF EXISTS(select 1 from user_data d where d.userid=#userid and FREETEXT(*, #activities) SET #match=1
This can take between 3-15 seconds. I need it to be much faster < 1s if possible.
I like the "flexibility" of the fulltext query in that it can search across multiple columns and the syntax is pretty intuitive. I'd rather not use a Like statement because we want to be able to match words like "Writer" and "Writing".
I've tried some of the suggestions listed here http://msdn.microsoft.com/en-us/library/ms142560(SQL.90).aspx
We've got as much memory and cpu as we can afford, unfortunately we can't put the catalogs on their own disk controllers.
I'm stumped and ready to explore other alternatives to FullText Queries. Is there anything else out there that gives that kind of "Writer"/"Writing" similar matches? Perhaps even something that uses the CLR?
Check out these alternatives, although I doubt they'll improve performance without isolating them onto separate hardware:
Which search technology to use with ASP.NET?
Lucene.Net and SQL Server
Due to the nature of FREETEXT, the performance is less than when you'd use CONTAINS, simply because it has to take into account less precise alternatives for the keywords given. CONTAINS can find Writing when you specify Write btw, I'm not sure if you've checked whether CONTAINS will do the trick or not.
Also be sure to avoid IF statements in SQL, as they often lead to a complete recompilation of the execution plan for every query execution, which will likely contribute to the poor performance you're seeing. I'm not sure how the IF statement is used, as it's likely inside a bigger piece of SQL. Try to merge the EXISTS query with that bigger piece of sql, as you can set the #match parameter from within the SELECT statement inside the EXISTS, or get rid of the variable altogether and use the EXISTS clause as a predicate in the bigger query.
SQL is a set-oriented language and interpreted. Therefore, it's often faster to get rid of imperative programming constructs and use the native set-operators of sql instead.
perhaps https://github.com/MahyTim/LuceneNetSqlDirectory can help you, it allows to store a LuceneNET index in SQLServer.

Resources