SQL Server Query Optimization Using Join Hints - sql-server

We have a query in our system that has been a problem because of the number of logical reads it performs. The query is run often enough (a few times a day), but it is reporting in nature (i.e. it gathers data; it is not transactional).
After having a couple of people look at it we are mulling over a few different options.
Using OPTION (FORCE ORDER) and a few MERGE JOIN hints to get the optimizer to process the data more efficiently (at least on the data that has been tested).
Using temp tables to break up the query so the optimizer isn't dealing with one very large query, which allows it to process each piece more efficiently.
We do not really have the option of doing a major schema change or anything, tuning the query is kind of the rallying point for this issue.
The query hints option is performing a little better than the other option, but both options are acceptable in terms of performance at this point.
So the question is: which would you prefer? The query hints are viewed as slightly dangerous because we are overriding the optimizer, etc. The temp table solution has to write out to tempdb, etc.
In the past we have been able to see large performance gains using temp tables on our larger reporting queries but that has generally been for queries that are run less frequently than this query.
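For concreteness, the hint-based option would look something like the following sketch. The table and column names here are invented for illustration, not taken from the actual query:

```sql
-- Hypothetical reporting query: FORCE ORDER pins the join order to the
-- written order, and MERGE JOIN forces merge joins throughout.
SELECT o.OrderID, o.OrderDate, c.CustomerName, SUM(d.LineTotal) AS Total
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
JOIN dbo.OrderDetails AS d ON d.OrderID = o.OrderID
GROUP BY o.OrderID, o.OrderDate, c.CustomerName
OPTION (FORCE ORDER, MERGE JOIN);
```

Note that `OPTION (MERGE JOIN)` applies to every join in the statement; individual joins can instead be hinted inline (e.g. `INNER MERGE JOIN`) if only one join is the problem.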

If you have exhausted optimizing via indexes and removed non-SARGable SQL, then I recommend going for the temp tables option:
Temp tables provide repeatable performance, provided they do not put excessive pressure on tempdb in terms of size and performance - you will need to monitor those.
SQL hints may stop being effective because of other table/index changes in the future.
Remember to clean up temp tables when you are finished.
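A minimal sketch of the temp-table option (table and column names are hypothetical): materialize an intermediate result so the optimizer deals with two smaller, simpler queries, and drop the temp table when done.

```sql
-- Step 1: materialize the filtered driving set.
SELECT o.OrderID, o.CustomerID, o.OrderDate
INTO #RecentOrders
FROM dbo.Orders AS o
WHERE o.OrderDate >= DATEADD(MONTH, -1, GETDATE());

-- An index on the temp table can help the second step.
CREATE CLUSTERED INDEX IX_RecentOrders ON #RecentOrders (OrderID);

-- Step 2: join the small intermediate set to the detail table.
SELECT r.OrderID, r.OrderDate, SUM(d.LineTotal) AS Total
FROM #RecentOrders AS r
JOIN dbo.OrderDetails AS d ON d.OrderID = r.OrderID
GROUP BY r.OrderID, r.OrderDate;

DROP TABLE #RecentOrders;  -- clean up when finished
```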

Related

Best options to deal with tempdb and a lot of data

I need to load data from multiple data sources, a large amount of data, using SQL Server 2014.
My ETL scripts are in T-SQL and they take a long time to execute because my tempdb fills up.
In your opinion, what is the best way to deal with this:
Using Commit Transactions?
Clean TempDB?
etc.
The only way to answer this question is with a very high-level, general response.
You have a few options:
Simply allocate more space to TempDB.
Optimize your ETL queries and tune your indexes.
Option 2 is often the better approach. Excessive use of tempdb indicates that inefficient sorts or joins are occurring. To resolve this, you need to analyze the actual execution plans of your ETL code. Look for the following:
Exclamation marks in your query plan. These often indicate that a join or a sort operation had to spill over to tempdb because the optimizer underestimated the amount of memory required. You might have statistics that need to be updated.
Look for large differences between the estimated number of rows and the actual number of rows. This can also indicate statistics that are out of date, or parameter sniffing issues.
Look for sort operations. It is often possible to remove these by adding indexes to your tables.
Look for inefficient access methods. These can often be resolved by adding covering indexes, e.g. a table scan when you only need a small number of rows from a large table. Just note that table scans are often the best approach when loading data warehouses.
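The statistics check above can be sketched as follows (the table name is a placeholder for one of the tables in your ETL):

```sql
-- Refresh statistics on a suspect table, then re-run the statement with
-- the actual execution plan captured to compare estimated vs. actual rows.
UPDATE STATISTICS dbo.StagingOrders WITH FULLSCAN;

SET STATISTICS XML ON;
-- run the ETL statement under investigation here
SET STATISTICS XML OFF;
```

In the captured plan, spill warnings appear on the sort and hash operators, and each operator shows estimated versus actual row counts.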
Hope this was helpful.
Marius

Optimize the query "Select * from tableA"

I was asked in an interview to describe the ways one could optimize the query Select * from TableA if it is taking a lot of time to execute, where TableA can be any table with a large amount of data. The interviewer didn't leave me any options such as selecting a few columns or adding a WHERE clause; he wanted a solution for exactly that query.
It's really hard to know what the interviewer was looking for.
They might be relatively inexperienced and expected answers like:
"list all the columns instead of * since that's way faster!"; or,
"add an ORDER BY because that will always speed it up!"
The kinds of things an experienced person might be looking for are:
inspect the query plan, are there computed columns or other similar things taking additional resources?
revisit the requirements - do the users really need the whole table in arbitrary order?
is there a clustered index on the table; if not, is the heap full of forwarding pointers?
is there excessive fragmentation on the underlying table (and/or the index being used to satisfy the query)?
is the query being blocked?
what is the query waiting on?
is the query waiting on an external resource (e.g. crappy I/O subsystem, a memory grant, a tempdb autogrow)?
is the query parallel and suffering CXPACKET waits because the stats are out of date?
There are a lot of underlying things that may be making that query slow or that may make that query a bad choice.
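The blocking and wait questions in the list above can be answered from the DMVs. One way (assuming `VIEW SERVER STATE` permission) to see what a running query is waiting on and whether it is blocked:

```sql
-- What is every other active request doing right now?
SELECT r.session_id,
       r.status,
       r.wait_type,            -- e.g. PAGEIOLATCH_SH, CXPACKET, LCK_M_S
       r.wait_time,            -- milliseconds spent in the current wait
       r.blocking_session_id,  -- non-zero means the query is blocked
       t.text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID;
```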
Actually, some databases have optimize commands that rebuild the tables to reduce fragmentation - and in this way actually improve the performance of such queries.
PostgreSQL and SQLite have the command
VACUUM;
MySQL has the command
OPTIMIZE TABLE table;
(Oracle has no direct equivalent; ALTER TABLE ... MOVE or ALTER TABLE ... SHRINK SPACE serves a similar purpose.)
It is expensive, as it moves around a lot of data. But in doing so, it makes the pages more balanced and usually shrinks the total database size (rebuilding the accompanying indexes can temporarily require extra space, so the files may also grow).
Since the data is stored in pages, reducing the number of pages by rebuilding the database can improve the performance even for a SELECT * FROM table; statement.
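In SQL Server the counterpart is an index rebuild or reorganize. A sketch, with placeholder table names, that first checks how fragmented the table actually is:

```sql
-- How fragmented is the table and its indexes?
SELECT index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.TableA'),
                                    NULL, NULL, 'LIMITED');

ALTER INDEX ALL ON dbo.TableA REBUILD;       -- heavy: rebuilds from scratch
-- ALTER INDEX ALL ON dbo.TableA REORGANIZE; -- lighter, online alternative
```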

What do you do to make sure a new index does not slow down queries?

When we add or remove an index to speed something up, we may end up slowing something else down.
To protect against such cases, after creating a new index I am doing the following steps:
start the Profiler,
run a SQL script which contains lots of queries I do not want to slow down
load the trace from a file into a table,
analyze CPU, reads, and writes from the trace against the results from the previous runs, before I added (or removed) an index.
This is kind of automated and kind of does what I want. However, I am not sure if there is a better way to do it. Is there some tool that does what I want?
Edit 1 The person who voted to close my question, could you explain your reasons?
Edit 2 I googled but did not find anything that explains how adding an index can slow down SELECTs. However, this is a well-known fact, so there should be something somewhere. If nothing comes up, I can write up a few examples later on.
Edit 3 One such example is this: two columns are highly correlated, like height and weight. We have an index on height, which is not selective enough for our query. We add an index on weight, and run a query with two conditions: a range on height and a range on weight. Because the optimizer is not aware of the correlation, it grossly underestimates the cardinality of our query.
Another example: adding an index on an increasing column, such as OrderDate, can seriously slow down a query with a condition like OrderDate > SomeDateAfterCreatingTheIndex.
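For the height/weight correlation example above, a multi-column statistics object is one way to give the optimizer some correlation information (the table name is invented; the column names come from the example):

```sql
-- Multi-column statistics: the density vector covers the (height, weight)
-- combination, which helps equality estimates. Note the histogram is only
-- built on the leading column, so range-predicate correlation remains hard.
CREATE STATISTICS st_height_weight
ON dbo.People (height, weight)
WITH FULLSCAN;
```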
Ultimately what you're asking can be rephrased as 'How can I ensure that the queries that already use an optimal, fast, plan do not get 'optimized' into a worse execution plan?'.
Whether the plan changes due to parameter sniffing, a statistics update or metadata changes (like adding a new index), the best answer I know of to keep a plan stable is plan guides. Deploying plan guides for critical queries that already have good execution plans is probably the best way to force the optimizer to keep using the good, validated plan. See Applying a Fixed Query Plan to a Plan Guide:
You can apply a fixed query plan to a plan guide of type OBJECT or
SQL. Plan guides that apply a fixed query plan are useful when you
know about an existing execution plan that performs better than the
one selected by the optimizer for a particular query.
The usual warnings apply as to any possible abuse of a feature that prevents the optimizer from using a plan which may be actually better than the plan guide.
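A rough sketch of pinning a known-good plan with sp_create_plan_guide. The query text, filter, and guide name are all placeholders, and @stmt must match the text the application actually submits, character for character:

```sql
DECLARE @plan nvarchar(max);

-- Capture the known-good showplan XML from the cache.
SELECT TOP (1) @plan = qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_text_query_plan(qs.plan_handle,
        qs.statement_start_offset, qs.statement_end_offset) AS qp
WHERE st.text LIKE N'%FROM dbo.Orders%';   -- locate the good plan

EXEC sp_create_plan_guide
    @name            = N'PG_Orders_Report',
    @stmt            = N'SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = @cust;',
    @type            = N'SQL',
    @module_or_batch = NULL,
    @params          = N'@cust int',
    @hints           = @plan;  -- passing showplan XML fixes the plan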
How about the following approach:
Save the execution plans of all typical queries.
After applying new indexes, check which execution plans have changed.
Test the performance of the queries with modified plans.
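The steps above can be sketched as a before/after snapshot of the plan cache (the snapshot table name is a placeholder):

```sql
-- Snapshot cached plans and their costs before the index change, so they
-- can be diffed against a second snapshot taken afterwards.
SELECT st.text        AS query_text,
       qp.query_plan  AS plan_xml,
       qs.execution_count,
       qs.total_logical_reads
INTO dbo.PlanSnapshot_BeforeIndex
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp;
```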
From the page "Query Performance Tuning"
Improve Indexes
This page has many helpful step-by-step hints on how to tune your indexes for best performance, and what to watch for (profiling).
As with most performance optimization techniques, there are tradeoffs. For example, with more indexes, SELECT queries will potentially run faster. However, DML (INSERT, UPDATE, and DELETE) operations will slow down significantly because more indexes must be maintained with each operation. Therefore, if your queries are mostly SELECT statements, more indexes can be helpful. If your application performs many DML operations, you should be conservative with the number of indexes you create.
Other resources:
http://databases.about.com/od/sqlserver/a/indextuning.htm
However, it’s important to keep in mind that non-clustered indexes slow down the data modification and insertion process, so indexes should be kept to a minimum
http://searchsqlserver.techtarget.com/tip/Stored-procedure-to-find-fragmented-indexes-in-SQL-Server
Fragmented indexes and tables in SQL Server can slow down application performance. Here's a stored procedure that finds fragmented indexes in SQL servers and databases.
OK. First off, indexes slow down (at least) two things:
-> insert/update/delete: index maintenance
-> query planning: "shall I use that index or not?"
Someone mentioned that the query planner might take a less efficient route - this is not supposed to happen.
If the optimizer is even half-decent, and your statistics / parameters are correct, there is no way it will pick the wrong plan.
Either way, in your case (MSSQL), you can hardly trust the optimizer and will still have to check every time.
What you're currently doing looks quite sound, you should just make sure the data you're looking at is relevant, i.e. real use case queries in the right proportion (this can make a world of difference).
In order to do that, I always advise writing a benchmarking script based on real use - through logging of production-environment queries, a bit like I said here:
Complete db schema transformation - how to test rewritten queries?

SQL Server file size effect on performance

I have two tables in my database, A and B.
A: a summary view of my data
B: a detail view (text fields holding the detail story)
I have 1 million records in my database; 70% of the size is for Table B and 30% for Table A.
I want to know whether the size of the database affects query performance and response time.
Would it be beneficial to remove Table B and store its contents on disk, to reduce the file size of the database and optimize its performance?
Absolutely, size can be a factor in DB performance! However, it can be mitigated through the proper use of indexes, relationships and integrity. There are a number of other things that can cause performance loss, like triggers that execute in unwanted situations.
I would say removing the table is an option, but I don't think it should be your first choice. Make sure your database uses the things I mention above properly.
There are many databases that are orders of magnitude larger than yours that perform very well. Also, use explain plans on your queries to make sure you are using sound syntax, like the proper use of joins.
I would personally start with using explain plans. This will tell you if you are missing indexes, joins, etc. Then make the changes one at a time until you are happy with the performance.
A million records is a tiny table in database terms. Any modern database should be able to handle that without breaking a sweat; even Access can easily handle it. If you are currently experiencing performance problems, you likely have database design problems, poorly written queries, and/or inadequate hardware.

SQL Server 2005 - Rowsize effect on query performance?

I'm trying to squeeze some extra performance out of searching through a table with many rows.
My current reasoning is that if I can throw away some of the seldom-used members of the searched table, thereby reducing the row size, the number of page splits and hence the IO should drop, giving a benefit once data starts to spill out of memory.
Any good resource detailing such effects?
Any experiences?
Thanks.
Tuning the size of a row is only a major issue if the RDBMS is performing a full table scan; if your query can select the rows using only indexes, then the row size is less important (unless you are returning a very large number of rows, where the IO of returning the actual result is significant).
If you are doing a full table scan, or partial scans of large numbers of rows because you have predicates that are not using indexes, then row size can be a major factor. One example I remember: on a table of the order of 100,000,000 rows, splitting the largish 'data' columns into a different table from the columns used for querying resulted in an order-of-magnitude performance improvement on some queries.
I would only expect this to be a major factor in a relatively small number of situations.
I don't know what else you have tried to increase performance; this seems like grasping at straws to me. That doesn't mean it isn't a valid approach. From my experience the benefit can be significant; it's just usually dwarfed by other kinds of optimization.
However, what you are looking for are IO statistics. There are several methods to gather them; a quite good introduction can be found here.
The SQL Server query plan optimizer is a very complex algorithm, and the decision of which index to use or which type of scan depends on many factors, like the query's output columns, the indexes available, the statistics available, the statistical distribution of your data values in the columns, the row count, and the row size.
So the only valid answer to your question is: It depends :)
Give some more information, like what kind of optimization you have already done, what the query plan looks like, etc.
Of course, when SQL Server decides to do a table scan (a clustered index scan if one is available), you can reduce IO by downsizing the row size. But in that case you would increase performance far more dramatically by creating an adequate index (which is de facto a separate table with a smaller row size).
If the application is transactional then look at the indexes in use on the table. Table partitioning is unlikely to be much help in this situation.
If you have something like a data warehouse and are doing aggregate queries over a lot of data then you might get some mileage from partitioning.
If you are doing a join between two large tables that are not in a 1:M relationship the query optimiser may have to resolve the predicates on each table separately and then combine relatively large intermediate result sets or run a slow operator like nested loops matching one side of the join. In this case you may get a benefit from a trigger-maintained denormalised table to do the searches. I've seen good results obtained from denormalised search tables for complex screens on a couple of large applications.
If you're interested in minimizing IO when reading data, you need to check whether your indexes cover the query or not. To minimize IO you should select only columns that are included in an index, or use indexes that cover all the columns used in the query; this way the optimizer will read data from the indexes and never touch the actual table rows.
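A hypothetical covering index for a query of this shape (all names invented): the key holds the filtered column, and INCLUDE carries the other columns the query returns, so the base table is never touched.

```sql
-- Covers: SELECT OrderDate, TotalDue FROM dbo.Orders WHERE CustomerID = 42;
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, TotalDue);
```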
If you're looking into this kind of detail, maybe you should also consider upgrading the hardware - changing controllers or adding more disks to have more spindles available for the query processor, allowing SQL Server to read more data at the same time.
SQL Server disk I/O is frequently the cause of bottlenecks in most systems. The I/O subsystem includes disks, disk controller cards, and the system bus. If disk I/O is consistently high, consider:
Move some database files to an additional disk or server.
Use a faster disk drive or a redundant array of inexpensive disks (RAID) device.
Add additional disks to a RAID array, if one already is being used.
Tune your application or database to reduce disk access operations.
Consider index coverage, better indexes, and/or normalization.
Microsoft SQL Server uses Microsoft Windows I/O calls to perform disk reads and writes. SQL Server manages when and how disk I/O is performed, but the Windows operating system performs the underlying I/O operations. Applications and systems that are I/O-bound may keep the disk constantly active.
Different disk controllers and drivers use different amounts of CPU time to perform disk I/O. Efficient controllers and drivers use less time, leaving more processing time available for user applications and increasing overall throughput.
The first thing I would do is ensure that your indexes have been rebuilt; if you are dealing with a huge amount of data and an index rebuild is not possible (from SQL Server 2005 onwards you can perform online rebuilds without locking everyone out), then ensure that your statistics are up to date (more on this later).
If your database contains representative data, then you can perform a simple measurement of the number of reads (logical and physical) that your query is using by doing the following:
SET STATISTICS IO ON
GO
-- Execute your query here
SET STATISTICS IO OFF
GO
On a well set-up database server, there should be few or no physical reads (high physical reads often indicate that your server needs more RAM). How many logical reads are you doing? If this number is high, then you will need to look at creating indexes. The next step is to run the query with the estimated execution plan turned on, then rerun it (clearing the cache first) displaying the actual execution plan. If the two differ, your statistics are out of date.
I think you're going to be farther ahead using standard optimization techniques first -- check your execution plan, profiler trace, etc. and see whether you need to adjust your indexes, create statistics etc. -- before looking at the physical structure of your table.