My query had an ORDERED hint. It was giving the below cost and cardinality.
When I removed the ORDERED hint, it started giving the below cost and cardinality instead.
In terms of performance, which plan is better? I can put up more details, including the query, if required. I am not asking somebody to do my work for me, but even the smallest suggestion would be really helpful.
It's impossible to say which is faster based on cost alone. Cost is only the amount of work the optimizer estimates it will take to execute a query a certain way. This depends on your statistics and your query (and the optimizer's math). If your statistics don't represent the data, or your query has filters the optimizer can't estimate, you're going to get a misleading cost calculation. What you need to remember is Garbage In, Garbage Out - i.e. bad stats will give you a bad plan.
If you’re putting hints in, generally that means the execution plan that the optimizer came up with wasn’t deemed good enough. In those cases, you’re essentially saying that Oracle’s cost calculation was wrong - so we definitely shouldn’t use it to see which query is faster.
Luckily, you have everything you need to determine which query is faster - you have your database and the queries, you just need to execute them and see.
I suspect neither is particularly fast, but if you want to improve them you’re going to need to look at where the work is really going in executing them. The final cost in those queries are very high so maybe it has correctly identified an unavoidable (based on how the query is written and what structures exist) high cost operation. Reading over the execution plan yourself and considering how much effort each step would be is always a good idea.
The easy way to begin tuning would be to get the Row Source Execution Statistics for a complete execution and target the parts of the plan that are responsible for the most actual time. See parts 3 and 4 of https://ctandrewsayer.wordpress.com/2017/03/21/4-easy-lessons-to-enhance-your-performance-diagnostics/ for how to do that - at the very least it will give you something concrete to share so that specific advice can be given (if you do share it, don't forget to include the full query).
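If it helps, a minimal sketch of gathering those statistics for one execution (the table, column and bind names below are just placeholders):

SELECT /*+ GATHER_PLAN_STATISTICS */ t.col1, t.col2   -- hypothetical query
FROM   some_table t
WHERE  t.some_filter = :b1;

-- then, in the same session, show that execution with actual rows and time per step
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));

The A-Rows and A-Time columns are what tell you where the time really goes, as opposed to the estimates.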
Normally a cost comparison is enough to say whether using a hint makes sense. Usually hints make things worse when statistics are gathered properly.
So, the one with the lower query cost is better.
I always look at CPU usage, logical reads (reads from RAM), and physical reads (reads from disk). The better option uses fewer resources.
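For example, after running each variant you can compare them in V$SQL (a sketch - the text filter is just an illustrative way to find the two statements):

SELECT sql_id,
       executions,
       ROUND(cpu_time    / NULLIF(executions, 0)) AS cpu_per_exec,
       ROUND(buffer_gets / NULLIF(executions, 0)) AS logical_reads_per_exec,
       ROUND(disk_reads  / NULLIF(executions, 0)) AS physical_reads_per_exec
FROM   v$sql
WHERE  sql_text LIKE '%my_test_query%';   -- hypothetical marker comment in the query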
I have the following problem with InfluxDB: it overwrites values with the same tag set and timestamp (a terrible design choice in my view).
Now to sidestep this in a cost effective manner one idea is to make a tag (e.g. value_id) which is unique and constantly increasing.
I know this will bloat cardinality to a point where query time will be super slow.
My question is: if I don't use this random tag (value_id) in my query, but have it in the db, will this still affect the speed of my queries?
If it does not it sounds like a “solution” to my problem.
P.S. I am aware that adding a nanosecond to the timestamp or adding an arbitrary unique tag are the two "solutions" suggested by InfluxDB, but neither sounds good and neither works reliably without a large cost.
Can you explain your use-case and why you need to write different values with the same time and tagset?
To answer your question: yes, this will damage your write and query times.
InfluxDB has a Seriesfile that stores a mapping of your series key to a unique identifier. This lookup (and potential write to the Seriesfile) happens on every write and read. The bigger this file gets, the slower these operations get.
It's not actually cardinality from a query point of view that is bad - TSI enables billions of series; however, the Seriesfile hasn't been optimised for these workloads yet.
There is a rather complex SQL Server query I have been attempting to optimize for some months now which takes a very long time to execute despite multiple index additions (adding covering, non-clustered indexes) and query refactoring/changes. Without getting into the full details, the execution plan is below. Is there anything here which jumps out to anyone as particularly inefficient or bad? I got rid of all key lookups and there appears to be heavy use of index seeks which is why I am confused that it still takes a huge amount of time somehow. When the query runs, the bottleneck is clearly CPU (not disk I/O). Thanks much for any thoughts.
OK, so I made a change based on Martin's comments which has seemingly greatly helped the query speed. I'm not 100% positive this is the solution because I've been running this a lot and it's possible that so much underlying data has been put into memory that it is now fast. But I think there is actually a true difference.
Specifically, the 3 scans inside of the nested loops were being caused by sub-queries on very small tables that contain a small set of records to be completely excluded from the result set. Conceptually, the query was something like:
SELECT fields
FROM (COMPLEX JOIN)
WHERE id_field NOT IN (SELECT bad_ID_field FROM BAD_IDs)
the idea being that if a record appears in BAD_IDs it should never be included in the results.
I tinkered with this and changed it to something like:
SELECT fields
FROM (COMPLEX JOIN)
LEFT JOIN BAD_IDs ON id_field = bad_ID_field
WHERE BAD_IDs.bad_ID_field IS NULL
This is logically the same thing - it excludes results for any ID in BAD_IDs - but it uses a join instead of a subquery. Even the execution plan is almost identical; a TOP operation gets changed to a FILTER elsewhere in the tree, but the clustered index scan is still there.
But, it seems to run massively faster! Is this to be expected? I have always assumed that a subquery used in the fashion I did was OK and that the server would know how to create the fastest (and presumably identical, which it almost is) execution plan. Is this not correct?
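(For completeness, the third form I could test - assuming bad_ID_field is NOT NULL, since NOT IN and NOT EXISTS behave differently when NULLs are involved - would be something like the following.)
SELECT fields
FROM (COMPLEX JOIN)
WHERE NOT EXISTS (SELECT 1 FROM BAD_IDs WHERE BAD_IDs.bad_ID_field = id_field)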
Thx!
When we add or remove a new index to speed up something, we may end up slowing down something else.
To protect against such cases, after creating a new index I am doing the following steps:
start the Profiler,
run a SQL script which contains lots of queries I do not want to slow down
load the trace from a file into a table,
analyze CPU, reads, and writes from the trace against the results from the previous runs, before I added (or removed) an index.
This is kind of automated and kind of does what I want. However, I am not sure if there is a better way to do it. Is there some tool that does what I want?
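For reference, the load-and-compare part looks roughly like this (a sketch - the trace path and table names are placeholders):

-- load the trace file into a table
SELECT *
INTO   dbo.TraceRun_After
FROM   sys.fn_trace_gettable(N'C:\traces\after_index.trc', DEFAULT);

-- compare CPU, reads and writes per statement against the previous run
SELECT a.TextData, a.CPU, a.Reads, a.Writes,
       b.CPU AS CPU_before, b.Reads AS Reads_before, b.Writes AS Writes_before
FROM   dbo.TraceRun_After  AS a
JOIN   dbo.TraceRun_Before AS b
  ON   CAST(a.TextData AS nvarchar(max)) = CAST(b.TextData AS nvarchar(max));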
Edit 1 The person who voted to close my question, could you explain your reasons?
Edit 2 I googled but did not find anything that explains how adding an index can slow down selects. However, this is a well-known fact, so there should be something somewhere. If nothing comes up, I can write up a few examples later on.
Edit 3 One such example is this: two columns are highly correlated, like height and weight. We have an index on height, which is not selective enough for our query. We add an index on weight and run a query with two conditions: a range on height and a range on weight. Because the optimizer is not aware of the correlation, it grossly underestimates the cardinality of our query.
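Sketched out with a hypothetical table, the problem query looks like this:

SELECT *
FROM   dbo.People                        -- hypothetical table with an index on each column
WHERE  HeightCm BETWEEN 180 AND 200
  AND  WeightKg BETWEEN 80  AND 120;
-- each predicate looks selective on its own, so the optimizer may favour the new
-- index on weight while badly underestimating how many rows satisfy both ranges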
Another example: adding an index on an ever-increasing column, such as OrderDate, can seriously slow down a query with a condition like OrderDate > SomeDateAfterCreatingTheIndex.
Ultimately what you're asking can be rephrased as 'How can I ensure that the queries that already use an optimal, fast, plan do not get 'optimized' into a worse execution plan?'.
Whether the plan changes due to parameter sniffing, a statistics update or metadata changes (like adding a new index), the best answer I know of to keep the plan stable is plan guides. Deploying plan guides for critical queries that already have good execution plans is probably the best way to force the optimizer into keeping the good, validated plan. See Applying a Fixed Query Plan to a Plan Guide:
You can apply a fixed query plan to a plan guide of type OBJECT or SQL. Plan guides that apply a fixed query plan are useful when you know about an existing execution plan that performs better than the one selected by the optimizer for a particular query.
The usual warnings apply as to any possible abuse of a feature that prevents the optimizer from using a plan which may be actually better than the plan guide.
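As a sketch (assuming SQL Server 2008 or later and that the good plan is still in cache; the guide name and text filter are made up), freezing such a plan can be as simple as:

DECLARE @plan_handle varbinary(64), @offset int;

SELECT @plan_handle = qs.plan_handle,
       @offset      = qs.statement_start_offset
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE  st.text LIKE N'%MyCriticalQuery%';        -- hypothetical way to find the statement

EXEC sp_create_plan_guide_from_handle
     @name                   = N'PG_MyCriticalQuery',
     @plan_handle            = @plan_handle,
     @statement_start_offset = @offset;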
How about the following approach:
Save the execution plans of all typical queries.
After applying new indexes, check which execution plans have changed.
Test the performance of the queries with modified plans.
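A sketch of that idea using plan hashes (assuming SQL Server 2008+ and that the workload is still in the plan cache; the snapshot table name is made up):

-- before the index change: snapshot each cached query's plan hash
SELECT query_hash, query_plan_hash
INTO   dbo.PlanSnapshot_Before
FROM   sys.dm_exec_query_stats;

-- after the index change: list the queries whose plan has changed
SELECT DISTINCT b.query_hash, b.query_plan_hash AS old_plan, q.query_plan_hash AS new_plan
FROM   dbo.PlanSnapshot_Before AS b
JOIN   sys.dm_exec_query_stats AS q
  ON   q.query_hash = b.query_hash
WHERE  q.query_plan_hash <> b.query_plan_hash;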
From the page "Query Performance Tuning"
Improve Indexes
This page has many helpful step-by-step hints on how to tune your indexes for best performance, and what to watch for (profiling).
As with most performance optimization techniques, there are tradeoffs. For example, with more indexes, SELECT queries will potentially run faster. However, DML (INSERT, UPDATE, and DELETE) operations will slow down significantly because more indexes must be maintained with each operation. Therefore, if your queries are mostly SELECT statements, more indexes can be helpful. If your application performs many DML operations, you should be conservative with the number of indexes you create.
Other resources:
http://databases.about.com/od/sqlserver/a/indextuning.htm
However, it’s important to keep in mind that non-clustered indexes slow down the data modification and insertion process, so indexes should be kept to a minimum
http://searchsqlserver.techtarget.com/tip/Stored-procedure-to-find-fragmented-indexes-in-SQL-Server
Fragmented indexes and tables in SQL Server can slow down application performance. Here's a stored procedure that finds fragmented indexes in SQL servers and databases.
OK. First off, indexes slow down (at least) two things:
-> insert/update/delete: the index has to be maintained
-> query planning: "shall I use that index or not?"
Someone mentioned the query planner might take a less efficient route - this is not supposed to happen.
If your optimizer is even half-decent, and your statistics / parameters correct, there is no way it's going to pick the wrong plan.
Either way, in your case (mssql), you can hardly trust the optimizer and will still have to check every time.
What you're currently doing looks quite sound, you should just make sure the data you're looking at is relevant, i.e. real use case queries in the right proportion (this can make a world of difference).
In order to do that, I always advise writing a benchmarking script based on real use - through logging of production-environment queries, a bit like I said here:
Complete db schema transformation - how to test rewritten queries?
What does it indicate to see a query that has a low cost in the explain plan but a high consistent gets count in autotrace? In this case the cost was in the 100's and the CR's were in the millions.
The cost can represent two different things depending on version and whether you are running in cpu-based costing mode or not.
Briefly, the cost represents the amount of time that the optimizer expects the query to execute for, but it is expressed in units of the time that a single block read takes. For example, if Oracle expects a single block read to take 1 ms and the query to take 20 ms, then the cost equals 20.
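For reference, the CPU costing formula documented in the Oracle Performance Tuning Guide is roughly:

Cost = (#SRds * sreadtim + #MRds * mreadtim + CPUCycles / cpuspeed) / sreadtim

where #SRds and #MRds are the estimated single and multiblock reads, sreadtim and mreadtim their expected durations, and cpuspeed the CPU cycles per second - i.e. everything is normalised into single block read equivalents.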
Consistent gets do not match exactly with this for a number of reasons: the cost includes non-consistent (current) gets (eg reading and writing temp data), the cost includes CPU time, and a consistent get can be a multiblock read instead of a single block read and hence have a different duration. Oracle can also get the estimate of the cost completely wrong and it could end up requiring a lot more or less consistent gets than the estimate suggested.
A useful method that can help explain disconnects between the predicted execution plan and actual performance is "cardinality feedback". See this presentation: http://www.centrexcc.com/Tuning%20by%20Cardinality%20Feedback.ppt.pdf
At best, the cost is the optimizer's estimate of the number of I/O's that a query would perform. So, at best, the cost is only likely to be accurate if the optimizer has found a very good plan-- if the optimizer's estimate of the cost is correct and the plan is ideal, that generally means that you're never going to bother looking at the plan because that query is going to perform reasonably well.
Consistent gets, however, is an actual measure of the number of gets that a query actually performed. So that is a much more accurate benchmark to use.
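If you want both numbers side by side without fetching the rows, SQL*Plus autotrace (which you are already using) can be run in statistics-only mode - a sketch with a placeholder query:

SET AUTOTRACE TRACEONLY STATISTICS
SELECT COUNT(*) FROM some_table;   -- hypothetical query; SQL*Plus then prints consistent gets, physical reads, etc.
SET AUTOTRACE OFF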
Although there are many, many things that can influence the cost, and a few things that can influence the number of consistent gets, it is probably reasonable to expect that if you have a very low cost and a very high number of consistent gets that the optimizer is probably working with poor estimates of the cardinality of the various steps (the ROWS column in the PLAN_TABLE tells you the expected number of rows returned in each step). That may indicate that you have missing or outdated statistics, that you are missing some histograms, that your initialization parameters or system statistics are wrong in some way, or that the CBO has problems for some other reason estimating the cardinality of your results.
What version of Oracle are you using?
I have an ETL process that involves a stored procedure that makes heavy use of SELECT INTO statements (minimally logged and therefore faster, as they generate less log traffic). Of the batch of work that takes place in one particular stored procedure, several of the most expensive operations are eager spools that appear to just buffer the query results and then copy them into the table being created.
The MSDN documentation on eager spools is quite sparse. Does anyone have a deeper insight into whether these are really necessary (and under what circumstances)? I have a few theories that may or may not make sense, but no success in eliminating these from the queries.
The .sqlplan files are quite large (160kb) so I guess it's probably not reasonable to post them directly to a forum.
So, here are some theories that may be amenable to specific answers:
The query uses some UDFs for data transformation, such as parsing formatted dates. Does this data transformation necessitate the use of eager spools to allocate sensible types (e.g. varchar lengths) to the table before it constructs it?
As an extension of the question above, does anyone have a deeper view of what does or does not drive this operation in a query?
My understanding of spooling is that it's a bit of a red herring on your execution plan. Yes, it accounts for a lot of your query cost, but it's actually an optimization that SQL Server undertakes automatically so that it can avoid costly rescanning. If you were to avoid spooling, the cost of the execution tree it sits on will go up and almost certainly the cost of the whole query would increase. I don't have any particular insight into what in particular might cause the database's query optimizer to parse the execution that way, especially without seeing the SQL code, but you're probably better off trusting its behavior.
However, that doesn't mean your execution plan can't be optimized, depending on exactly what you're up to and how volatile your source data is. When you're doing a SELECT INTO, you'll often see spooling items on your execution plan, and it can be related to read isolation. If it's appropriate for your particular situation, you might try just lowering the transaction isolation level to something less costly, and/or using the NOLOCK hint. I've found in complicated performance-critical queries that NOLOCK, if safe and appropriate for your data, can vastly increase the speed of query execution even when there doesn't seem to be any reason it should.
In this situation, if you try READ UNCOMMITTED or the NOLOCK hint, you may be able to eliminate some of the Spools. (Obviously you don't want to do this if it's likely to land you in an inconsistent state, but everyone's data isolation requirements are different). The TOP operator and the OR operator can occasionally cause spooling, but I doubt you're doing any of those in an ETL process...
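A sketch of that experiment (table and column names are made up, and this is only appropriate if dirty reads are acceptable for this ETL step):

-- either lower the isolation level for the batch...
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- ...or hint the individual source tables (the two approaches are equivalent here)
SELECT s.OrderId, s.RawOrderDate
INTO   dbo.Stage_Orders
FROM   dbo.SourceOrders AS s WITH (NOLOCK)
WHERE  s.LoadBatchId = 42;          -- hypothetical filter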
You're right in saying that your UDFs could also be the culprit. If you're only using each UDF once, it would be an interesting experiment to try putting them inline to see if you get a large performance benefit. (And if you can't figure out a way to write them inline with the query, that's probably why they might be causing spooling).
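For example (hypothetical function and table names), a scalar UDF call like the first form below tends to run row by row, while the inline equivalent lets the optimizer fold the conversion into the rest of the plan:

-- scalar UDF doing the date parsing
SELECT dbo.fn_ParseYyyyMmDd(s.RawOrderDate) AS OrderDate
FROM   dbo.SourceOrders AS s;

-- the same transformation written inline (style 112 = yyyymmdd)
SELECT CONVERT(datetime, s.RawOrderDate, 112) AS OrderDate
FROM   dbo.SourceOrders AS s;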
One last thing I would look at is that, if you're doing any joins that can be re-ordered, try using a hint to force the join order to happen in what you know to be the most selective order. That's a bit of a reach but it doesn't hurt to try it if you're already stuck optimizing.