Interpreting SQL Server table statistics - sql-server

I write queries and procedures; I have no experience as a DB admin and am not in a position to become one. I work with hundreds of tables, and certain older tables are difficult to work with. I suspect that statistics are a problem, but the DBA states that isn't the case.
I don't know how to interpret statistics or even which ones I should look at. As an example, I am currently JOINing 2 tables with a simple JOIN that uses an index.
It returns just under 500 rows across 4 columns. It runs very quickly in isolation, but not in production with thousands of runs a day. My estimated and actual rows on this JOIN are off by 462%.
I have distilled this stored procedure down to a series of very basic temp tables to locate the problem areas, and it appears to come down to 2 tables; this example is one of them.
What I want to know is which commands to run and which statistics to look at so I can take them to the DBA and discuss the specific problem at hand. I do not want to be confrontational, just informational. I have a very good professional relationship with this DBA, but he is very black and white with his policies, so I may not get anywhere with it in the end; if I get stonewalled I can also take this to my lead.
I ran a DBCC SHOW_STATISTICS on the table's index. I am not sure if this is the data I need or what I am really looking at. I would really like to know where to start with this. I have googled but all the pages I read are very geared towards DBAs and assume prior knowledge in areas I don't have.
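For reference, the command I ran was along these lines (the index name here is a placeholder for the real one):
DBCC SHOW_STATISTICS ('Billing.dbo.Charges', IX_Charges_ForeignID_ChargeCode);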
Below is an obfuscated sample of my JOIN. The JOIN is to a temp table; the first 2 conditions are needed for the index, and removing the date conditions actually makes the JOIN much worse, with 10x the reads:
SELECT
x.UniqueID,
x.ChargeCode,
x.dtDate,
x.uniqueForeignID
INTO
#AnotherTempTable
FROM
Billing.dbo.Charges x
JOIN
#temptable y ON x.uniqueForeignID = y.uniqueID
AND x.ChargeCode = y.ChargeCode
AND @PostMonthStart <= x.dtDate
AND x.dtDate < @PostMonthEnd
The JOIN above is part of a new plan in which I have been dissecting all the data needed into temp tables to determine the root cause of the high CPU and reads in production. Below is a list of all the statements that are executing, sorted by the number of reads. The second row is this example query, but there are others with similar issues.
Below is the execution plan operations for the plan prior to my updates.
While the new plan has a better run time and closer estimates, I worry that I am still going to run into issues if the statistics are off. If I am completely off-base, please tell me and point me in the right direction; I will gladly bark up a different tree if I am making incorrect assumptions.

The first table returned shows some general information. You can see the statistics on this index were last updated 12/25/2019 at 10:19 PM. As of the writing of this answer, that is yesterday evening, so stats were updated recently. That is likely to be some kind of evening maintenance, but it could also be a threshold of data modifications that triggered an automatic statistics update.
There were 222,596,063 rows in the table at the time the statistics were sampled. The statistics update sampled 626,452 of these rows, so the sample rate is roughly 0.3%. This sample size was likely the default sample rate used by a simple update statistics MyTable command.
A sample rate of roughly 0.3% is fast to calculate but can lead to very bad estimates, especially if an index is being used to support a foreign key. For example, a parent/child relationship may have a ParentKey column on the child table. A low statistics sample rate will result in very high estimates per parent row, which can lead to strange decisions in query plans.
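If you want to check the current sample rates yourself, a rough sketch like this against sys.dm_db_stats_properties will list them for every statistics object on the table (MyTable is a placeholder):
SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       CAST(100.0 * sp.rows_sampled / sp.rows AS decimal(6, 2)) AS sample_pct,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.MyTable');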
Look at the third table (the histogram). The RANGE_HI_KEY corresponds to a specific key value of the first column in this index. The EQ_ROWS column is the histogram's estimate of the number of rows that correspond to this key. If you get a count of the rows in this table by one of these keys in the RANGE_HI_KEY column, does the number in the EQ_ROWS column look like an accurate estimate? If not, a higher sample rate may yield better query plans.
For example, take the value 1475616. Is the count of rows for this key close to the EQ_ROWS value of 3893?
select count(*) from MyTable where FirstIndexColumn = 1475616
If the estimate is very bad, the DBA may need to increase the sample size on this table:
update statistics MyTable with sample 5 percent
If the DBA uses Ola Hallengren's plan (an excellent choice, in my opinion), this can be done by passing the @StatisticsSample parameter to the IndexOptimize procedure.
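A sketch of what a statistics-only run with a larger sample could look like (the database name is a placeholder; check the exact parameters against the IndexOptimize documentation):
EXECUTE dbo.IndexOptimize
    @Databases = 'Billing',          -- placeholder database name
    @FragmentationLow = NULL,        -- skip index maintenance, statistics only
    @FragmentationMedium = NULL,
    @FragmentationHigh = NULL,
    @UpdateStatistics = 'ALL',
    @StatisticsSample = 5;           -- 5 percent sample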

Related

T-SQL Clustered Index Seek Performance

Usual blather... query takes too long to run... blah blah. Long question. blah.
Obviously, I am looking at different ways of rewriting the query; but that is not what this post is about.
To resolve a "spill to tempdb" warning in a query, I have already
rebuilt all of the indexes in the database
updated all of the statistics on the tables and indexes
This fixed the "spill to tempdb" warning and improved the query performance.
Since rebuilding indexes and statistics resulted in a huge performance gain for one query (without having to rewrite it), this got me thinking about how to improve the performance of other queries without rewriting them.
I have a nice big query that joins about 20 tables, does lots of fancy stuff I am not posting here, but takes about 6900ms to run.
Looking at the actual execution plan, I see 4 steps that have a total cost of 79%; so "a-hah" that is where the performance problem is. 3 steps are "clustered index seek" on PK_Job and the 4th step is an "Index lazy spool".
execution plan slow query
So, I break out those elements into a standalone query to investigate further... I get the "same" 4 steps in the execution plan, with a cost of 97%, only the query time is blazing fast 34ms. ... WTF? where did the performance problem disappear to?
execution plan fast query
I expected the additional tables to increase the query time, but I was not expecting the execution time of the seek against this one Job table to go from 30ms to 4500ms.
-- this takes 34ms
select *
from equip e
left join job jf on (jf.jobid = e.jobidf)
left join job jd on (jd.jobid = e.jobidd)
left join job jr on (jr.jobid = e.jobidd)
-- this takes 6900ms
select *
from equip e
left join job jf on (jf.jobid = e.jobidf)
left join job jd on (jd.jobid = e.jobidd)
left join job jr on (jr.jobid = e.jobidd)
-- add another 20 tables in here..
Question 1: what should I look at in the two execution plans to identify why the execution time (of the clustered index seek) on this table goes from 30ms to 4500ms?
So, thinking this might have something to do with the statistics, I reviewed the index statistics on PK_Job = JobID (which is an int column). The histogram ranges look useless: all the "current" records are lumped together in one range (row 21 in the image). This is the standard problem with an incrementing PK: new data is always in the last range, so 99.999% of the JobID values that are referenced fall in that one histogram range. I tried adding a filtered statistic, but that had no impact on the actual execution plan.
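The filtered statistic I tried was roughly this (the threshold value is a placeholder for where the "current" JobIDs start):
CREATE STATISTICS st_job_recent
ON dbo.job (jobid)
WHERE jobid > 1000000   -- placeholder threshold for the "current" rows
WITH FULLSCAN;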
output from DBCC SHOW_STAT for PK_Job
Question 2: are the above PK_Job statistics a contributing factor to the complicated query being slow? That is, would "fixing" the statistics help with the complicated query? if so, what could that fix look like?
Again: I know, rewrite the query. Post more of the code (all 1500 lines of it that no one will find of any use). blah, blah.
What I would like are tips on what to look at in order to answer Q1 and Q2.
Thanks in advance!
Question 3: why would a simple IIF add 100ms to a query? The "compute scalar" nodes all show a cost of 0%, but the IIF doubles the execution time of the query.
Adding this to the select doubles execution time from 90ms to 180ms; CASE expressions are just as bad:
IIF(X.Okay = 1, '', 'N') As OkayDesc
Next observation: the actual execution plan shows a query cost relative to the batch of 98%, but STATISTICS TIME shows a CPU time of 141 ms for the statement while the batch CPU time is 3640 ms.
Question 4: why doesn't the query cost % (relative to batch) match up with statement cpu time?
The SQL engine is pretty smart about optimizing badly written queries in most cases. But when a query is too complex, sometimes it cannot apply these optimizations and can even perform badly.
So, you are asking:
I break out those elements into a standalone query to investigate further... I get the "same" 4 steps in the execution plan, with a cost of 97%, only the query time is blazing fast 34ms. Where did the performance problem disappear to?
The answer is pretty simple. Breaking up the query and materializing the data in a #temp table or @table variable helps the engine better understand the amount of data it is working with and build a better plan.
Brent Ozar wrote about this yesterday, giving an example of how bad a big query can be.
If you want more details about how to optimize your query via rewriting, you need to provide more details, but in my practice, in most cases simplifying the query and materializing the data in #temp tables (as parallel operations can be used with them) gives good results.
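A minimal sketch of the approach, using the equip/job tables from the question (the non-key column names and the filter are invented for illustration): materialize the selective part first, then join the remaining tables to the small intermediate result.
-- Step 1: materialize the selective rows into a temp table
SELECT e.equipid, e.jobidf, e.jobidd
INTO #equip_subset
FROM dbo.equip AS e
WHERE e.is_active = 1;    -- hypothetical filter
-- Step 2: the big query now joins to a small, known number of rows
SELECT s.equipid, jf.jobid
FROM #equip_subset AS s
LEFT JOIN dbo.job AS jf ON jf.jobid = s.jobidf;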

Netezza: How does the WHERE clause influence the estimated rows number in query verbose plan?

I execute a simple query:
SELECT * FROM TABLE1
WHERE ID > 9 AND ID < 11
and the query verbose plan is:
[SPU Sequential Scan table "TABLE1" {(TABLE1."ID")}]
-- Estimated Rows = 1, ...
But after changing the where clause to
WHERE ID = 10
the query verbose plan changes:
[SPU Sequential Scan table "TABLE1" {(TABLE1."ID")}]
-- Estimated Rows = 1000, ...
(where 1000 is the total number of rows in TABLE1).
Why is it so? How does the estimation work?
The optimizer of any cost-based database is always full of surprises, and this one is not unusual across the platforms I'm familiar with.
A couple of questions:
- Have you created statistics on the table? (Otherwise you are flying blind; see the sketch after this list.)
- What is the datatype of that column? (I hope it is an integer of some sort, not a NUMBER(x,y), even if y=0.)
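If statistics have never been created, a minimal sketch (assuming standard Netezza syntax) would be:
GENERATE STATISTICS ON TABLE1;
-- or, limited to the column in question:
GENERATE STATISTICS ON TABLE1 (ID);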
Furthermore:
The statistics for a column in Netezza contain no distribution statistics (it won't know whether there are more "solved" than "unsolved" cases in a support-system table with 5 years' worth of data). Instead it relies on two things:
1) For all tables: simple statistics, if you create them (number of distinct values, max and min values, number of nulls).
2) For largish tables (I think the configurable minimum is close to 100 million rows) it creates JIT (Just In Time) statistics by scanning a few random data pages on the dataslices that satisfy the zone-mappable WHERE clauses, and builds statistics for this one query.
The last feature is actually quite powerful, even though it adds runtime to the planning phase of the query. It significantly increases the likelihood that if there is SOME correlation between two WHERE clauses on a table, this will be taken into account.
An example: a WHERE clause on (AGE>60 and Retired=true) in a list of all citizens in a major city. It is most likely more or less irrelevant to add the AGE restriction, and Netezza will know that.
In general you should not worry about the estimated number of rows being a bit off (as in this case) with Netezza; it will most often get it "right enough" and throw hardware at the problem to compensate for any minor mistakes.
Until recently I worked with SQL Server, which is notorious (it may be better in newer versions) for being overly optimistic about the selectivity of WHERE clauses, ending up with access plans of 5 levels of nested-loop joins with millions of rows in each when joining 6 tables. Changing WHERE clauses, much like you did in the question, will cause SQL Server to put LESS emphasis on a specific restriction, and that can cause the 5 joins to switch to a more efficient HASH or other algorithm, resulting in better performance. In my experience that is MUCH too frequent an occurrence on databases that rely TOO heavily on these estimates, perhaps because the optimizer was not created/tuned for a warehouse-like workload.

order by slows query down massively

Using SQL Server 2014 (SP1-CU3) (KB3094221) Oct 10 2015 x64.
I have the following query
SELECT * FROM dbo.table1 t1
LEFT JOIN dbo.table2 t2 ON t2.trade_id = t1.tradeNo
LEFT JOIN dbo.table3 t3 ON t3.TradeReportID = t1.tradeNo
order by t1.tradeNo
There are ~70k, 35k and 73k rows in t1, t2 and t3 respectively.
When I omit the ORDER BY, this query executes in 3 seconds and returns 73k rows.
As written, the query had taken 8.5 minutes and returned ~50k rows by the time I stopped it.
Switching the order of the LEFT JOINs makes a difference:
SELECT * FROM dbo.table1 t1
LEFT JOIN dbo.table3 t3 ON t3.TradeReportID = t1.tradeNo
LEFT JOIN dbo.table2 t2 ON t2.trade_id = t1.tradeNo
order by t1.tradeNo
This now runs in 3 seconds.
I don't have any indexes on the tables. Adding indexes on t1.tradeNo, t2.trade_id and t3.TradeReportID has no effect.
Running the query with only one left join (both scenarios) in combination with the order by is fast.
It's fine for me to swap the order of the LEFT JOINs, but this doesn't go far toward explaining why this happens and under what scenarios it may happen again.
The estimated execution plan is: (slow)
(exclamation mark details)
VS
Switching the order of the left joins (fast):
which I note are markedly different but I cannot interpret these to explain the performance difference
UPDATE
It appears the addition of the ORDER BY clause results in the execution plan using a Table Spool (Lazy Spool), whereas the fast query does not use one.
If I turn off the table spool via DBCC RULEOFF ('BuildSpool'); this 'fixes' the speed, but according to this post that isn't recommended and cannot be done per query anyway.
UPDATE 2
One of the columns returned (table3.Text, which has type varchar(max)): if this is changed to nvarchar(512) then the original (slow) query is now fast, i.e. the execution plan now decides not to use the Table Spool. Also note that even though the type is varchar(max), the field value is NULL for every one of the rows. This is now fixable, but I am none the wiser.
UPDATE 3
Warnings in the execution plan stated
Type conversion in expression (CONVERT_IMPLICIT(nvarchar(50),[t2].[trade_id],0)) may affect "CardinalityEstimate" in query plan choice, ...
t1.tradeNo is nvarchar(21); the other two are varchar(50). After altering the latter two to the same type as the first, the problem disappears! (leaving the varchar(max) column from UPDATE 2 unchanged)
Given that this problem goes away when either UPDATE 2 or UPDATE 3 is rectified, I would guess it is a combination of the query optimizer using a worktable (table spool) for a column that has an unbounded size; very interesting, given the varchar(max) column contains no data.
Your likely best fix is to make sure both sides of your joins have the same datatype. There's no need for one to be varchar and the other nvarchar.
This is a class of problems that comes up quite frequently in DBs. The database is assuming the wrong thing about the composition of the data it's about to deal with. The costs shown in your estimated execution plan are likely a long way from what you'd get in your actual plan. We all make mistakes and really it would be good if SQL Server learned from its own but currently it doesn't. It will estimate a 2 second return time despite being immediately proven wrong again and again. To be fair, I don't know of any DBMS which has machine-learning to do better.
Where your query is fast
The hardest part is done up front by sorting table3. That means it can do an efficient merge join which in turn means it has no reason to be lazy about spooling.
Where it's slow
Having an ID that refers to the same thing stored as two different types and data lengths shouldn't ever be necessary and will never be a good idea for an ID. In this case it is nvarchar in one place and varchar in another. When that causes it to fail to get a cardinality estimate, that's the key flaw, and here's why:
The optimizer is hoping to require only a few unique rows from table3. Just a handful of options like "Male", "Female", "Other". That would be what is known as "low cardinality". So imagine tradeNo actually contained IDs for genders for some weird reason. Remember, it's you with your human skills of contextualisation, who knows that's very unlikely. The DB is blind to that. So here is what it expects to happen: As it executes the query the first time it encounters the ID for "Male" it will lazily fetch the data associated (like the word "Male") and put it in the spool. Next, because it's sorted it expects just a lot more males and to simply re-use what it has already put in the spool.
Basically, it plans to fetch the data from tables 1 and 2 in a few big chunks stopping once or twice to fetch new details from table 3. In practice the stopping isn't occasional. In fact, it may even be stopping for every single row because there are lots of different IDs here. The lazy spool is like going upstairs to get one small thing at a time. Good if you think you just need your wallet. Not so good if you're moving house, in which case you'll want a big box (the eager spool).
The likely reason that shrinking the size of the field in table3 helped is that it meant it estimated less of a comparative benefit in doing the lazy spool over a full sort up front. With varchar it doesn't know how much data there is, just how much there could potentially be. The bigger the potential chunks of data that need shuffling, the more physical work needs doing.
What you can do to avoid in future
Make your table schema and indexes reflect the real shape of the data.
If an ID can be varchar in one table then it's very unlikely to need the extra characters available in nvarchar for another table. Avoid the need for conversions on IDs and also use integers instead of characters where possible.
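For example, a minimal sketch of aligning the key columns to one type (sizes taken from the question; test on a copy first, and drop/recreate any indexes on these columns beforehand if they exist):
ALTER TABLE dbo.table2 ALTER COLUMN trade_id nvarchar(21);        -- re-specify NOT NULL if the column currently is
ALTER TABLE dbo.table3 ALTER COLUMN TradeReportID nvarchar(21);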
Ask yourself if any of these tables need tradeNo to be filled in for all rows. If so, make it not nullable on that table. Next, ask if the ID should be unique for any of these tables and set it up as such in the appropriate index. Unique is the definition of maximum cardinality, so it won't make that mistake again.
Nudge in the right direction with join order.
The order you have your joins in the SQL is a signal to the database about how powerful/difficult you expect each join to be. (Sometimes as a human you know more. e.g. if querying for 50 year old astronauts you know that filtering for astronauts would be the first filter to apply but maybe begin with the age when searching for 50 year office workers.) The heavy stuff should come first. It will ignore you if it thinks it has the information to know better but in this case it's relying on your knowledge.
If all else fails
A possible fix would be to INCLUDE all the fields you'll need from table3 in the index on TradeReportId. The reason the existing indexes couldn't help much is that they make it easy to identify how to re-sort, but the sort still hasn't been physically done. That is the work it was hoping to optimize away with a lazy spool; if the data were included in the index it would already be available, so there would be no work left to optimize.
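A sketch of that covering index (the INCLUDE list should contain whichever table3 columns the query actually returns; Text is the one mentioned in the question):
CREATE NONCLUSTERED INDEX IX_table3_TradeReportID
ON dbo.table3 (TradeReportID)
INCLUDE ([Text]);   -- add the other table3 columns the query selects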
Having indexes on a table is key to speeding up retrieval of data. Start with this and then retry your query with the ORDER BY to see if the speed is improved.

Your first gut feeling on this SqlServer design question

We have 2 tables. One holds measurements, the other one holds timestamps (one for every minute)
every measurement holds a FK to a timestamp.
We have 8M (million) measurements, and 2M timestamps.
We are creating a report database via replication, and my first solution was this: when a new measurement was received via the replication process, lookup the right timestamp and add it to the measurement table.
Yes, it's duplication of data, but it is for reporting, and since we have measurements every 5 minutes and users can query for yearly data (105,000 measurements) we have to optimize for speed.
But a co-developer said: you don't have to do that, we'll just query with a join (on the two tables), SqlServer is so fast, you don't see the difference.
My first reaction was: a join on two tables with 8M and 2M records can't make 'no difference'.
What is your first feeling on this?
EDIT:
new measurements: 400 records per 5 minutes
EDIT 2:
maybe the question is not so clear:
the first solution is to get the data from the timestamp table and copy it to the measurement table when the measurement record is inserted.
In that case we have an action when the record is inserted AND an extra (duplicated) timestamp value. In this case we only query ONE table because it holds all the data.
The second solution is to join the two tables in a query.
With the proper index the join will make no difference*. My initial thought is that if the report is querying over the entire dataset, the join might actually be faster because there are literally 6 million fewer timestamp rows to read from disk.
*This is just a guess based on my experience with tables with millions of records. Your results will vary based on your queries.
I'd create an Indexed View (similar to a Materialized view in Oracle) which joins the tables using appropriate indexes.
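A rough sketch of such an indexed view, with invented table and column names (indexed views require SCHEMABINDING, two-part table names, and a unique clustered index):
CREATE VIEW dbo.vMeasurementWithTimestamp
WITH SCHEMABINDING
AS
SELECT m.MeasurementId,
       m.Value,
       t.TimestampId,
       t.MeasuredAt
FROM dbo.Measurement AS m
JOIN dbo.MinuteTimestamp AS t ON t.TimestampId = m.TimestampId;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vMeasurementWithTimestamp
ON dbo.vMeasurementWithTimestamp (MeasurementId);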
If the query just retrieves the data for the given date ranges, there will be a merge join, that is, a range scan for each of the two tables. Since the timestamp table presumably contains only timestamps, this shouldn't be expensive.
On the other hand, if you have only one table and an index on the date column, the index itself becomes larger and more expensive to scan.
So, with properly constructed indexes and queries I wouldn't expect a significant difference in performance.
I'd suggest you keep the properly normalized design until you start having performance problems that force you to change it. And then you need to carefully analyze query plans and measure performance with different options; there are lots of things that could matter in your particular case.
Frankly, in this case your best bet is to try both solutions and see which one is better. Performance tuning is an art when you start talking about large data sets, and it is highly dependent not only on the database design you have but also on the hardware and whether you are using partitioning, etc. Be sure to test both getting the data out and putting the data in. Since you have so many inserts, insert speed is critical, and the index you would need on the datetime field is critical to select performance, so you really need to test this thoroughly. Don't forget about dumping the cache when you test. And test multiple times and, if possible, test under a typical query load.
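For the cache-dumping part of the testing, the usual commands on a test server (never in production) are:
CHECKPOINT;              -- flush dirty pages so DROPCLEANBUFFERS clears everything
DBCC DROPCLEANBUFFERS;   -- empty the data cache
DBCC FREEPROCCACHE;      -- empty the plan cache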

How do I troubleshoot performance problems with an Oracle SQL statement

I have two insert statements, almost exactly the same, which run in two different schemas on the same Oracle instance. What the insert statement looks like doesn't matter - I'm looking for a troubleshooting strategy here.
Both schemas have 99% the same structure. A few columns have slightly different names, other than that they're the same. The insert statements are almost exactly the same. The explain plan on one gives a cost of 6, the explain plan on the other gives a cost of 7. The tables involved in both sets of insert statements have exactly the same indexes. Statistics have been gathered for both schemas.
One insert statement inserts 12,000 records in 5 seconds.
The other insert statement inserts 25,000 records in 4 minutes 19 seconds.
The number of records being insert is correct. It's the vast disparity in execution times that confuses me. Given that nothing stands out in the explain plan, how would you go about determining what's causing this disparity in runtimes?
(I am using Oracle 10.2.0.4 on a Windows box).
Edit: The problem ended up being an inefficient query plan, involving a Cartesian merge join which didn't need to be done. Judicious use of index hints and a hash join hint solved the problem. It now takes 10 seconds. SQL Trace / TKPROF gave me the direction, as it showed me how many seconds each step in the plan took and how many rows were being generated. Thus TKPROF showed me:
Rows Row Source Operation
------- ---------------------------------------------------
23690 NESTED LOOPS OUTER (cr=3310466 pr=17 pw=0 time=174881374 us)
23690 NESTED LOOPS (cr=3310464 pr=17 pw=0 time=174478629 us)
2160900 MERGE JOIN CARTESIAN (cr=102 pr=0 pw=0 time=6491451 us)
1470 TABLE ACCESS BY INDEX ROWID TBL1 (cr=57 pr=0 pw=0 time=23978 us)
8820 INDEX RANGE SCAN XIF5TBL1 (cr=16 pr=0 pw=0 time=8859 us)(object id 272041)
2160900 BUFFER SORT (cr=45 pr=0 pw=0 time=4334777 us)
1470 TABLE ACCESS BY INDEX ROWID TBL1 (cr=45 pr=0 pw=0 time=2956 us)
8820 INDEX RANGE SCAN XIF5TBL1 (cr=10 pr=0 pw=0 time=8830 us)(object id 272041)
23690 MAT_VIEW ACCESS BY INDEX ROWID TBL2 (cr=3310362 pr=17 pw=0 time=235116546 us)
96565 INDEX RANGE SCAN XPK_TBL2 (cr=3219374 pr=3 pw=0 time=217869652 us)(object id 272084)
0 TABLE ACCESS BY INDEX ROWID TBL3 (cr=2 pr=0 pw=0 time=293390 us)
0 INDEX RANGE SCAN XIF1TBL3 (cr=2 pr=0 pw=0 time=180345 us)(object id 271983)
Notice the rows where the operations are MERGE JOIN CARTESIAN and BUFFER SORT. Things that keyed me into looking at this were the number of rows generated (over 2 million!), and the amount of time spent on each operation (compare to other operations).
Use the SQL Trace facility and TKPROF.
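A minimal sketch of turning tracing on for one session and formatting the output (exact privileges and trace file locations vary by environment):
ALTER SESSION SET tracefile_identifier = 'slow_insert';
ALTER SESSION SET sql_trace = TRUE;
-- run the slow INSERT here
ALTER SESSION SET sql_trace = FALSE;
-- then, on the database server:
--   tkprof <tracefile>.trc slow_insert_report.txt sort=exeela,fchela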
The main culprits in insert slow-downs are indexes, constraints, and on-insert triggers. Do a test without as many of these as you can remove and see if it's fast. Then introduce them back in and see which one is causing the problem.
I have seen systems where they drop indexes before bulk inserts and rebuild at the end -- and it's faster.
The first thing to realize is that, as the documentation says, the cost you see displayed is relative to one of the query plans. The costs for 2 different explains are not comparable. Secondly the costs are based on an internal estimate. As hard as Oracle tries, those estimates are not accurate. Particularly not when the optimizer misbehaves. Your situation suggests that there are two query plans which, according to Oracle, are very close in performance. But which, in fact, perform very differently.
The actual information that you want to look at is the actual explain plan itself. That tells you exactly how Oracle executes that query. It has a lot of technical gobbledygook, but what you really care about is knowing that it works from the most indented part out, and at each step it merges according to one of a small number of rules. That will tell you what Oracle is doing differently in your two instances.
What next? Well there are a variety of strategies to tune bad statements. The first option that I would suggest, if you're in Oracle 10g, is to try their SQL tuning advisor to see if a more detailed analysis will tell Oracle the error of its ways. It can then store that plan, and you will use the more efficient plan.
If you can't do that, or if that doesn't work, then you need to get into things like providing query hints, manual stored query outlines, and the like. That is a complex topic. This is where it helps to have a real DBA. If you don't, then you'll want to start reading the documentation, but be aware that there is a lot to learn. (Oracle also has a SQL tuning class that is, or at least used to be, very good. It isn't cheap though.)
I've put up my general list of things to check to improve performance as an answer to another question:
Favourite performance tuning tricks
... It might be helpful as a checklist, even though it's not Oracle-specific.
I agree with a previous poster that SQL Trace and tkprof are a good place to start. I also highly recommend the book Optimizing Oracle Performance, which discusses similar tools for tracing execution and analyzing the output.
SQL Trace and tkprof are only good if you have access to these tools. Most of the large companies that I do work for do not allow developers to access anything under the Oracle unix IDs.
I believe you should be able to determine the problem by first understanding the question that is being asked and by reading the explain plans for each of the queries. Many times I find that the big difference is that there are some tables and indexes that have not been analyzed.
Another good reference that presents a general technique for query tuning is the book SQL Tuning by Dan Tow.
When the performance of a sql statement isn't as expected / desired, one of the first things I do is to check the execution plan.
The trick is to check for things that aren't as expected. For example you might find table scans where you think an index scan should be faster or vice versa.
A point where the Oracle optimizer sometimes takes a wrong turn is the estimate of how many rows a step will return. If the execution plan expects 2 rows, but you know it will be more like 2000 rows, the execution plan is bound to be less than optimal.
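In Oracle 10g you can put the optimizer's estimates next to the actual row counts for the last statement run in your session; for example (run the statement of interest with the GATHER_PLAN_STATISTICS hint, or with STATISTICS_LEVEL set to ALL, first):
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
-- compare the E-Rows (estimated) and A-Rows (actual) columns in the output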
With two statements to compare you can obviously compare the two execution plans to see where they differ.
From this analysis, I come up with an execution plan that I think should be better suited. This is not an exact execution plan, just some crucial changes to the one I found, like: it should use index X, or a hash join instead of a nested loop.
Next is to figure out a way to make Oracle use that execution plan, often by using hints or creating additional indexes, sometimes by changing the SQL statement. Then of course test that the changed statement
a) still does what it is supposed to do
b) is actually faster
With b) it is very important to make sure you are testing the correct use case. A typical pitfall is the difference between returning the first row versus returning the last row. Most tools show you the first results as soon as they are available, with no direct indication that there is more work to be done. But if your actual program has to process all rows before it continues to the next processing step, it is almost irrelevant when the first row appears; it only matters when the last row is available.
If you find a better execution plan, the final step is to make your database actually use it in the actual program. If you added an index, this will often work out of the box. Hints are an option, but can be problematic if a library creates your SQL statements; those often don't support hints. As a last resort you can save and fix execution plans for specific SQL statements. I'd avoid this approach, because it's easy for it to be forgotten, and in a year or so some poor developer will scratch her head over why the statement performs in a way that might have been appropriate with the data of one year ago, but not with the current data.
