How to optimize a SQL query/stored procedure using the execution plan in SQL Server?

I have a stored procedure that queries 12 tables, a few of which are very large (approximately 15 GB). I filter on datetime fields to get one month of data, returning about 15 columns.
Please suggest a step-by-step approach to writing an optimized query in SQL Server using indexes.
I cannot share the execution plan, but I can tell you that the only issue is a Hash Match (Inner Join), which is taking most of the execution time.
Thanks in advance.

Look for "table scan" and "clustered index scan" both indicate the need for an index on that table or a tweak to an existing index. I'd give you more but this is a massive subject.
If you have an expensive KEY lookup on a clustered index then this can usually be fixed with an appropriate Covering Index.
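As an illustration only (the table and column names are invented, since the real schema isn't shown), a covering index for a one-month datetime range might look like this:

-- Invented names: seek on the datetime predicate, and INCLUDE the selected
-- columns so the plan needs no Key Lookup back into the clustered index.
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
    ON dbo.Orders (OrderDate)
    INCLUDE (CustomerId, TotalAmount);

-- A query shaped like this can then be answered by an Index Seek alone:
SELECT CustomerId, TotalAmount
FROM dbo.Orders
WHERE OrderDate >= '20130101' AND OrderDate < '20130201';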

Related

SQL Server execution plan Index seek

I was trying to improve two nearly identical queries with indexing. I saw a Table Scan in the first query and created an index to turn it into an Index Seek. For the second query, SQL Server recommended creating an index equal to the one I had just created, with only the column order changed, yet the execution plan showed the engine was already doing an Index Seek on that table.
My question is:
If the execution plan already shows an Index Seek, should I create another index for this query, should I delete the index I created and replace it with the recommended one, or should I ignore the advice SQL Server gives?
One cannot answer without specific details. This is not a guessing game. Please post the exact table structure, table sizes, the indexes you added and the execution plans you have.
The fact that you added an index does not mean you added the best index. Nor does the fact that the execution plan uses an index seek imply the plan is optimal. The wrong index column order with only a partial predicate match will still show up as a 'seek' on the leading column(s), it will be suboptimal, and SQL Server will continue recommending a better index (i.e. exactly the symptoms you describe).
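A minimal sketch of that symptom, with invented names:

-- The query filters on Status and CreatedDate, but the existing index
-- leads on Status and then Priority.
CREATE INDEX IX_Tickets_Status_Priority ON dbo.Tickets (Status, Priority);

-- The plan still shows an Index Seek -- but only on the leading column
-- (Status); CreatedDate is applied as a residual predicate, so the seek
-- reads far more rows than necessary and the missing-index recommendation
-- keeps suggesting something like:
CREATE INDEX IX_Tickets_Status_CreatedDate
    ON dbo.Tickets (Status, CreatedDate)
    INCLUDE (AssignedTo);

SELECT AssignedTo
FROM dbo.Tickets
WHERE Status = 'Open'
  AND CreatedDate >= '20130101';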
Please read Understanding how SQL Server executes a query and How to analyse SQL Server performance.
I saw a Table Scan in the first query and created an index to make that an Index Seek
Not all seeks are good, and not all scans are bad.
Imagine you have a customers table with 10 customers, each having 1,000 orders, so the orders table contains 10,000 rows.
To get the top 1 order for each customer, a plan that scans the orders table may be bad, since the seek approach will only cost you 10 seeks (see the sketch below).
You have to understand the data, see why the optimizer chose this plan, and work out how to get the optimizer to choose the plan you need. Itzik Ben-Gan gives amazing examples in this tutorial, and there is a video on SQLBits.
Further, Craig Freedman's posts on seeks and scans go into detail on why the optimizer may choose a scan over a seek because of random reads and data density.
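A minimal sketch of the top-1-per-customer pattern (dbo.Customers and dbo.Orders are made-up tables, and an index on Orders(CustomerId, OrderDate) is assumed):

-- CROSS APPLY ... TOP (1) turns the problem into one cheap seek per
-- customer (10 seeks in total) instead of a scan-and-aggregate over
-- all 10,000 order rows.
SELECT c.CustomerId, o.OrderId, o.OrderDate
FROM dbo.Customers AS c
CROSS APPLY
(
    SELECT TOP (1) OrderId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = c.CustomerId
    ORDER BY OrderDate DESC
) AS o;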

Why is my SQL Server query plan using scans instead of seeks when joining data?

Coming from a MySQL background, I'm having difficulty understanding what's wrong with the following setup.
I have two tables, variable and dimension_instance.
Both have a primary key; variable furthermore has a foreign key to dimension_instance named dimension_instance_1_uid, on which an index was created.
When I execute a query like this
SELECT
this_.name, dimensioni4_.name
FROM dbo.variable this_
INNER JOIN dbo.dimension_instance dimensioni4_
-- even with index hint nothing changes...
-- WITH (INDEX(PK_dimension_instance))
ON this_.dimension_instance_1_uid = dimensioni4_.UID
it seems as if the index isn't used for a seek and a scan is executed instead, according to the execution plan. It shows two index scans instead of one index scan and one index seek.
I would expect an index seek because, in my case, only 10 of the 15k records in dimension_instance match entries in the variable table.
Can anybody shed some light on my misunderstanding of how SQL Server indexes work?
The query optimizer estimates which operation is cheaper given the data in the database and other variables. In your case it probably decided the query would be less costly with an index scan instead of a seek, which can be caused by low row counts.
it seems as if the index isn't used at all when I look at the execution plan.
Am I blind, are you blind, or did you post the wrong execution plan?
The plan has two source tables and both use a Clustered Index Scan. That is 100% usage of an index for source table access.
Now, why a scan and not a seek? Because you don't have any limitations (no WHERE clause), and that may be the fastest way. If the engine assumes both tables must be fully read anyway, why do a seek instead of a scan?
Can anybody shed some light on my misunderstanding of how SQL Server indexes work?
It's not the indexes you misunderstand, but the hash join. A hash join simply has no use for indexes on the join predicates (unlike a nested loops join).
http://use-the-index-luke.com/sql/join/hash-join-partial-objects
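As an experiment only (not a general recommendation), you can force a nested loops plan on the query above with a hint and watch the scans turn into seeks; whether it is actually faster depends on the row counts:

-- Forcing nested loops: the inner side can now do an Index Seek on
-- dimension_instance for each row coming from variable. With only a
-- handful of matching rows this may win; otherwise the hash join the
-- optimizer picked is likely the cheaper plan.
SELECT this_.name, dimensioni4_.name
FROM dbo.variable this_
INNER JOIN dbo.dimension_instance dimensioni4_
    ON this_.dimension_instance_1_uid = dimensioni4_.UID
OPTION (LOOP JOIN);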

Multiple Joins in SQL query

I'm trying to run a query over multiple tables, and it is taking over 10 minutes to return just 3 records. The query is as follows:
select TOP 100 pm_entity_type_name, year(event_date),
pm_event_type_name, pm_event_name, pm_entity_name,
pm_entity_code, event_priority, event_cost
from pm_event_priority, pm_entity, pm_entity_type, pm_event_type, pm_event
where pm_event.pm_event_id = pm_event_priority.pm_event_id
And pm_entity.pm_entity_id = pm_event_priority.pm_entity_id
And pm_entity_type.pm_entity_type_id = pm_entity.pm_entity_type_id
And pm_event_type.pm_event_type_id = pm_event_priority.pm_event_type_id
And ( pm_entity.pm_entity_type_id = '002LEITUU0005T8EX40001XFTEW000000OZX' OR
pm_entity_type.parent_id= '002LEITUU0005T8EX40001XFTEW000000OZX' )
ORDER BY 1,2,3
Is there any way I can modify this query to make it a little faster?
Query performance can tank when you have to join many large tables together, particularly when the join columns are not properly indexed. In your case, I suspect your tables are quite large (many rows) and the _id columns are not indexed.
If you are using SQL Server Management Studio, you can click on "Display Estimated Execution Plan" to see how the query optimizer is interpreting your query. If you see a bunch of Table Scans rather than Index Scans/Seeks, this means SQL Server has to read through each and every row in your tables; a performance nightmare! Try putting some indexes on the _id columns of each table (perhaps a clustered index), and/or using Database Engine Tuning Advisor to automatically recommend the best index structure to apply to your tables to improve this query's performance.
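For example (column types and existing constraints are unknown, so treat this as a sketch), nonclustered indexes on the join and filter columns used above would look like:

-- Sketch: indexes on the foreign-key columns used in the joins and on the
-- filtered parent_id column. Skip any that already exist (primary keys are
-- typically already backed by a clustered index).
CREATE NONCLUSTERED INDEX IX_pm_event_priority_event_id
    ON pm_event_priority (pm_event_id);
CREATE NONCLUSTERED INDEX IX_pm_event_priority_entity_id
    ON pm_event_priority (pm_entity_id);
CREATE NONCLUSTERED INDEX IX_pm_event_priority_event_type_id
    ON pm_event_priority (pm_event_type_id);
CREATE NONCLUSTERED INDEX IX_pm_entity_entity_type_id
    ON pm_entity (pm_entity_type_id);
CREATE NONCLUSTERED INDEX IX_pm_entity_type_parent_id
    ON pm_entity_type (parent_id);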
You need to look at the query plan. See this question on how to obtain it.
Once you have a query plan, see if you can tell from the list what is so slow. Chances are there are table scans because of missing indexes.
What happens if you take out the TOP 100 and do a SELECT * using the same criteria? If you get a ridiculous amount of data back, there may be missing join criteria.

Oracle 11g: Index not used in "select distinct"-query

My question concerns Oracle 11g and the use of indexes in SQL queries.
In my database, there is a table that is structured as followed:
Table tab (
rowid NUMBER(11),
unique_id_string VARCHAR2(2000),
year NUMBER(4),
dynamic_col_1 NUMBER(11),
dynamic_col_1_text NVARCHAR2(2000)
) TABLESPACE tabspace_data;
I have created two indexes:
CREATE INDEX Index_dyn_col1 ON tab (dynamic_col_1, dynamic_col_1_text) TABLESPACE tabspace_index;
CREATE INDEX Index_unique_id_year ON tab (unique_id_string, year) TABLESPACE tabspace_index;
The table contains around 1 to 2 million records. I extract the data from it by executing the following SQL command:
SELECT distinct
"sub_select"."dynamic_col_1" "AS_dynamic_col_1","sub_select"."dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM
(
SELECT "tab".* FROM "tab"
where "tab".year = 2011
) "sub_select"
Unfortunately, the query takes around 1 hour to execute, although I created both indexes described above.
The explain plan shows that Oracle does a full table scan ("TABLE ACCESS FULL"). Why is the index not used?
As an experiment, I tested the following SQL command:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
Even in this case, the index is not used and a full table scan is performed.
In my real database, the table contains more indexed columns like "dynamic_col_1" and "dynamic_col_1_text".
The whole index file has a size of about 50 GB.
A few more details:
The database is Oracle 11g installed on my local computer.
I use Windows 7 Enterprise 64bit.
The whole index is split over 3 dbf files with about 50GB size.
I would really be glad, if someone could tell me how to make Oracle use the index in the first query.
Because the first query is used by another program to extract the data from the database, it can hardly be changed. So it would be good to tweak the table instead.
Thanks in advance.
[01.10.2011: UPDATE]
I think I've found the solution to the problem. Both columns dynamic_col_1 and dynamic_col_1_text were nullable. After altering the table to prohibit NULL values in both columns and adding a new index solely on the column year, Oracle performs a Fast Index Scan.
The query now takes about 5 seconds to execute instead of 1 hour.
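For reference, a minimal sketch of that change in Oracle syntax (assuming the existing data contains no NULLs in those columns; the index definition mirrors the one suggested in the answers below):

-- Disallow NULLs in the two selected columns (fails if NULLs already exist).
ALTER TABLE tab MODIFY (dynamic_col_1 NOT NULL, dynamic_col_1_text NOT NULL);

-- Single-column index on the filter column.
CREATE INDEX Index_year ON tab (year) TABLESPACE tabspace_index;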
Are you sure that an index access would be faster than a full table scan? As a very rough estimate, full table scans read data about 20 times faster than index reads. If tab has more than 5% of its data in 2011, it's not surprising that Oracle would use a full table scan. And as @Dan and @Ollie mentioned, with year as the second column the index will be even slower to use.
If the index really is faster, than the issue is probably bad statistics. There are hundreds of ways the statistics could be bad. Very briefly, here's what I'd look at first:
Run an explain plan with and without an index hint (a sketch follows below). Are the cardinalities off by 10x or more? Are the times off by 10x or more?
If the cardinality is off, make sure there are up to date stats on the table and index and you're using a reasonable ESTIMATE_PERCENT (DBMS_STATS.AUTO_SAMPLE_SIZE is almost always the best for 11g).
If the time is off, check your workload statistics.
Are you using parallelism? Oracle always assumes a near linear improvement for parallelism, but on a desktop with one hard drive you probably won't see any improvement at all.
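As a concrete sketch of the explain-plan comparison above (the index name comes from the question; the DBMS_STATS call uses standard defaults):

-- Refresh statistics on the table and its indexes first
-- (AUTO_SAMPLE_SIZE is usually the best choice on 11g).
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,
    tabname          => 'TAB',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);
END;
/

-- Plan without a hint...
EXPLAIN PLAN FOR
SELECT DISTINCT dynamic_col_1, dynamic_col_1_text FROM tab WHERE year = 2011;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- ...and with an index hint (Index_unique_id_year leads on unique_id_string,
-- so expect a skip scan here); compare estimated cardinalities and costs.
EXPLAIN PLAN FOR
SELECT /*+ INDEX(tab Index_unique_id_year) */ DISTINCT dynamic_col_1, dynamic_col_1_text
FROM tab WHERE year = 2011;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);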
Also, this isn't really relevant to your problem, but you may want to avoid using quoted identifiers. Once you use them you have to use them everywhere, and it generally makes your tables and queries painful to work with.
Your index should be:
CREATE INDEX Index_year
ON tab (year)
TABLESPACE tabspace_index;
Also, your query could just be:
SELECT DISTINCT
dynamic_col_1 "AS_dynamic_col_1",
dynamic_col_1_text "AS_dynamic_col_1_text"
FROM tab
WHERE year = 2011;
If your index was created solely for this query, though, you could create it including the two fetched columns as well; the optimiser would then not have to visit the table at all, since it could retrieve the data directly from the index, making your query more efficient again.
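For example, something along these lines (the index name is arbitrary):

-- Index leading on the filter column and carrying the two selected columns,
-- so the DISTINCT query can be answered from the index alone.
CREATE INDEX Index_year_dyn_col1
    ON tab (year, dynamic_col_1, dynamic_col_1_text)
    TABLESPACE tabspace_index;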
Hope it helps...
I don't have an Oracle instance on hand so this is somewhat guesswork, but my inclination is to say it's because you have the compound index in the wrong order. If you had year as the first column in the index it might use it.
Your second test query:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
would not use the index because you have no WHERE clause, so you're asking Oracle to read every row in the table. In that situation the full table scan is the faster access method.
Also, as other posters have mentioned, your index on YEAR has it in the second column. Oracle can use this index by performing a skip scan, but there is a performance hit for doing so, and depending on the size of your table Oracle may just decide to use the FTS again.
I don't know if it's relevant, but I tested the following query:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
WHERE "dynamic_col_1" = 123 AND "dynamic_col_1_text" = 'abc'
The explain plan for that query shows that Oracle uses an index scan in this scenario.
The columns dynamic_col_1 and dynamic_col_1_text are nullable. Does this have an effect on the usage of the index?
Try this:
1) Create an index on the year field (see Ollie's answer).
2) Then use this query:
SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab
WHERE ID IN (SELECT ID FROM tab WHERE year = 2011)
or
SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab
WHERE ID IN (SELECT ID FROM tab WHERE year = 2011)
GROUP BY dynamic_col_1, dynamic_col_1_text
Maybe it will help you.

Inserts taking longer on a SQL Server 2005 table

I have a table with about 45 columns, and the more data goes in, the longer the inserts take. I have increased the size of the data and log files and reduced the fill factor on all the indexes on that table, and still the insert times get slower and slower. Any ideas would be GREATLY appreciated.
For inserts, you want to DECREASE the fillfactor on the indexes on the table in order to reduce page splitting.
It is somewhat expected that it will take longer to insert as more data goes in, because your indexes just plain get bigger.
Try putting in data in batches instead of row-by-row. SQL Server is more efficient that way.
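A minimal sketch of batching (table and column names invented), where one statement inside one explicit transaction replaces many auto-committed single-row inserts; note that SQL Server 2005 has no multi-row VALUES, so UNION ALL is used instead:

-- One INSERT carrying several rows, one commit, instead of one
-- auto-commit per row.
BEGIN TRANSACTION;

INSERT INTO dbo.WideTable (Col1, Col2, Col3)
SELECT 1, 'a', GETDATE()
UNION ALL
SELECT 2, 'b', GETDATE()
UNION ALL
SELECT 3, 'c', GETDATE();

COMMIT TRANSACTION;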
Make sure you don't have too many indexes on your tables.
Consider using SQL Server 2005's INCLUDE clause on your indexes if you are adding columns to an index only because you want them covered in your queries.
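For example (names invented), keeping the key narrow and carrying the extra columns in INCLUDE:

-- Only the searched column is in the key; the other columns live at the
-- leaf level via INCLUDE, so the query is covered without bloating the
-- key that must be maintained on every insert.
CREATE NONCLUSTERED INDEX IX_WideTable_Status
    ON dbo.WideTable (Status)
    INCLUDE (CustomerId, CreatedDate, TotalAmount);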
How big is the table?
What is the context? Is this a batch of many new records?
Can you post the schema including index definition?
Can you SET STATISTICS IO ON, SET STATISTICS TIME ON, and post the display for one iteration?
Is there anything pathological about the data, or the context? Is this on a server or a laptop (testing)?
Why don't you drop the indexes before inserting and recreate them afterwards? Then you don't need to update statistics.
You could also ensure that the indexes on that table are defragmented.
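For example (index and table names are placeholders), on SQL Server 2005:

-- Check fragmentation first, then REORGANIZE (online, lightweight) or
-- REBUILD (heavier, fully defragments and reapplies the fill factor).
SELECT index_id, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.WideTable'), NULL, NULL, 'LIMITED');

ALTER INDEX IX_WideTable_Status ON dbo.WideTable REORGANIZE;
-- or:
ALTER INDEX ALL ON dbo.WideTable REBUILD WITH (FILLFACTOR = 80);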
