I read that when there are many records in the Recycle Bin, you can add the condition "WHERE isDeleted = false" to exclude deleted records from queries. But in my batch, monitoring the times, the query is much slower than the one without the explicit condition - at least on the first run; after that it looks faster.
However, the results obtained in the Developer Console were always good.
Can anyone tell me why and help me, please!
Where did you read that? It looks very suspicious to me. isDeleted = false should have no impact on any normal queries (ones that don't have ALL ROWS at the end) because that's what they do out of the box. If anything, it might even slow down execution, because the query optimizer would need to consider this field (it's not indexed - it'd be useless to index something that has the same value 99% of the time).
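For illustration (the Account object and the Name filter here are just placeholders), a normal query such as

SELECT Id, Name FROM Account WHERE Name LIKE 'Acme%'

already excludes anything sitting in the Recycle Bin, so adding isDeleted = false to it changes nothing. The only kind of query where the filter actually matters is one ending in ALL ROWS (or the API's queryAll() call), because that is the one that sees deleted rows:

SELECT Id, Name FROM Account WHERE IsDeleted = true AND Name LIKE 'Acme%' ALL ROWS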
You can experiment with the query optimizer in the Developer Console, and remember that index statistics are typically recalculated overnight, so if you've loaded lots of test data, "today's" queries might still run off old statistics.
You might be overcomplicating this, relying on one-off results that could simply reflect, for example, low server load at the time you ran your experiment. Or maybe whatever this was about is simply undocumented behaviour that changed in one of the recent releases. Just select or create a meaningful index and you'll be better off.
More reading material:
https://developer.salesforce.com/docs/atlas.en-us.salesforce_large_data_volumes_bp.meta/salesforce_large_data_volumes_bp/ldv_deployments_infrastructure_indexes.htm
https://developer.salesforce.com/docs/atlas.en-us.salesforce_large_data_volumes_bp.meta/salesforce_large_data_volumes_bp/ldv_deployments_techniques_deleting_data.htm
https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/langCon_apex_SOQL_VLSQ.htm
My query had an ORDERED hint; it was giving the cost and cardinality below.
When I removed the ORDERED hint, it started giving the cost and cardinality below.
In terms of performance, which plan is better? I can add more details, including the query, if required. I'm not asking anyone to do my work for me, but even the smallest suggestion would be really helpful.
Impossible to say which is faster based on cost alone. Cost is only the amount of work the optimizer estimates it will take to execute a query a certain way. This will depend on your statistics and your query (and the optimizer's math). If your statistics don't represent the data, or your query has filters that it can't estimate, you're going to get a misleading cost calculation. What you need to remember is garbage in, garbage out, i.e. bad stats will give you a bad plan.
If you’re putting hints in, generally that means the execution plan that the optimizer came up with wasn’t deemed good enough. In those cases, you’re essentially saying that Oracle’s cost calculation was wrong - so we definitely shouldn’t use it to see which query is faster.
Luckily, you have everything you need to determine which query is faster - you have your database and the queries, you just need to execute them and see.
I suspect neither is particularly fast, but if you want to improve them you're going to need to look at where the work really goes in executing them. The final costs in those queries are very high, so maybe the optimizer has correctly identified an unavoidable (based on how the query is written and what structures exist) high-cost operation. Reading over the execution plan yourself and considering how much effort each step would take is always a good idea.
The easy way to begin tuning would be to get the Row Source Execution Statistics for a complete execution and target the parts of the plan that are responsible for the most actual time. See parts 3 and 4 of https://ctandrewsayer.wordpress.com/2017/03/21/4-easy-lessons-to-enhance-your-performance-diagnostics/ for how to do that - if anything, it will give you something you can share that concrete advice can be given on (if you do share it, don't forget to include the full query).
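For example (the table and filter below are placeholders for the real query), one common way to capture those row source statistics is to run the statement once with the GATHER_PLAN_STATISTICS hint and then pull the last plan, with actual rows and time per step, from the cursor cache:

-- Run the statement once with row source statistics enabled
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*)
FROM   your_table t                 -- placeholder for the real query
WHERE  t.some_col = 'some value';

-- In the same session, show that plan with actual rows, buffers and time
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));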
Normally a cost comparison is enough to say whether using a hint makes sense. Usually hints make things worse when statistics are gathered properly.
So, the plan with the lower query cost is better.
I always look at CPU usage, logical reads (reads from RAM), and physical reads (reads from disk). The better option uses fewer resources.
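For instance (the sql_id values are placeholders), in Oracle those numbers can be pulled from V$SQL after running each variant:

-- Compare resource usage of the two variants after executing them
SELECT sql_id,
       executions,
       cpu_time,       -- CPU time in microseconds
       buffer_gets,    -- logical reads (buffer cache)
       disk_reads,     -- physical reads (disk)
       elapsed_time
FROM   v$sql
WHERE  sql_id IN ('sql_id_with_hint', 'sql_id_without_hint');   -- placeholder sql_ids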
There is a rather complex SQL Server query I have been attempting to optimize for some months now which takes a very long time to execute despite multiple index additions (adding covering, non-clustered indexes) and query refactoring/changes. Without getting into the full details, the execution plan is below. Is there anything here which jumps out to anyone as particularly inefficient or bad? I got rid of all key lookups and there appears to be heavy use of index seeks which is why I am confused that it still takes a huge amount of time somehow. When the query runs, the bottleneck is clearly CPU (not disk I/O). Thanks much for any thoughts.
OK, so I made a change based on Martin's comments which has seemingly greatly helped the query speed. I'm not 100% positive this is the solution, because I've been running this a lot and it's possible that so much of the underlying data has been put into memory that it is now fast. But I think there is actually a true difference.
Specifically, the 3 scans inside of the nested loops were being caused by sub-queries on very small tables that contain a small set of records to be completely excluded from the result set. Conceptually, the query was something like:
SELECT fields
FROM (COMPLEX JOIN)
WHERE id_field NOT IN (SELECT bad_ID_field FROM BAD_IDs)
the idea being that if a record appears in BAD_IDs it should never be included in the results.
I tinkered with this and changed it to something like:
SELECT fields
FROM (COMPLEX JOIN)
LEFT JOIN BAD_IDs ON id_field = bad_ID_field
WHERE BAD_IDs.bad_ID_field IS NULL
This is logically the same thing - it excludes results for any ID in BAD_IDs - but it uses a join instead of a subquery. Even the execution plan is almost identical; a TOP operation gets changed to a FILTER elsewhere in the tree, but the clustered index scan is still there.
But, it seems to run massively faster! Is this to be expected? I have always assumed that a subquery used in the fashion I did was OK and that the server would know how to create the fastest (and presumably identical, which it almost is) execution plan. Is this not correct?
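As an aside, a NOT EXISTS version (sketched here with the same placeholder names as above) usually gets the same anti-join plan as the LEFT JOIN form, and unlike NOT IN it is not thrown off if bad_ID_field ever contains NULLs:

SELECT fields
FROM (COMPLEX JOIN) AS src
WHERE NOT EXISTS (SELECT 1
                  FROM BAD_IDs AS b
                  WHERE b.bad_ID_field = src.id_field)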
Thx!
When we add or remove a new index to speed up something, we may end up slowing down something else.
To protect against such cases, after creating a new index I am doing the following steps:
start the Profiler,
run a SQL script which contains lots of queries I do not want to slow down
load the trace from a file into a table,
analyze CPU, reads, and writes from the trace against the results from the previous runs, before I added (or removed) an index.
This is kind of automated and kind of does what I want. However, I am not sure if there is a better way to do it. Is there some tool that does what I want?
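As a rough sketch of steps 3-4 (the file path and table name are placeholders), the trace can be loaded and aggregated like this:

-- Step 3: load the Profiler trace file into a table
SELECT *
INTO   dbo.TraceAfterIndex
FROM   sys.fn_trace_gettable(N'C:\traces\after_index.trc', DEFAULT);

-- Step 4: total CPU, reads and writes per statement, to compare against the
-- same aggregation over the "before" trace table
SELECT CONVERT(nvarchar(4000), TextData) AS StatementText,
       COUNT(*)    AS Executions,
       SUM(CPU)    AS TotalCpu,
       SUM(Reads)  AS TotalReads,
       SUM(Writes) AS TotalWrites
FROM   dbo.TraceAfterIndex
WHERE  EventClass IN (10, 12)       -- RPC:Completed, SQL:BatchCompleted
GROUP BY CONVERT(nvarchar(4000), TextData)
ORDER BY TotalCpu DESC;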
Edit 1 To the person who voted to close my question: could you explain your reasons?
Edit 2 I googled around but did not find anything that explains how adding an index can slow down selects. However, this is a well-known fact, so there should be something somewhere. If nothing comes up, I can write up a few examples later on.
Edit 3 One such example is this: two columns are highly correlated, like height and weight. We have an index on height, which is not selective enough for our query. We add an index on weight and run a query with two conditions: a range on height and a range on weight. Because the optimizer is not aware of the correlation, it grossly underestimates the cardinality of our query.
Another example: adding an index on an ever-increasing column, such as OrderDate, can seriously slow down a query with a condition like OrderDate > SomeDateAfterCreatingTheIndex, because the statistics on the new index do not yet cover the newer date values, so the optimizer badly underestimates how many rows will match.
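A rough illustration of that second case (table, index and date values are made up):

-- New index on an ever-increasing column
CREATE INDEX IX_Orders_OrderDate ON dbo.Orders (OrderDate);

-- Rows inserted after the statistics were built fall beyond the histogram's
-- highest value, so the optimizer may estimate almost no rows here and pick
-- a plan that is terrible for the rows that actually match.
SELECT *
FROM   dbo.Orders
WHERE  OrderDate > '20130601';      -- a date after the index/statistics were created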
Ultimately what you're asking can be rephrased as 'How can I ensure that the queries that already use an optimal, fast, plan do not get 'optimized' into a worse execution plan?'.
Whether the plan changes due to parameter sniffing, a statistics update, or metadata changes (like adding a new index), the best answer I know of to keep the plan stable is plan guides. Deploying plan guides for critical queries that already have good execution plans is probably the best way to force the optimizer into keeping the good, validated plan. See Applying a Fixed Query Plan to a Plan Guide:
You can apply a fixed query plan to a plan guide of type OBJECT or SQL. Plan guides that apply a fixed query plan are useful when you know about an existing execution plan that performs better than the one selected by the optimizer for a particular query.
The usual warnings apply as to any possible abuse of a feature that prevents the optimizer from using a plan which may be actually better than the plan guide.
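As a rough sketch (the query-text filter and the guide name are placeholders), an already-good cached plan can be frozen into a plan guide directly from its plan handle:

-- Find the cached plan of the critical query (placeholder text filter)
DECLARE @plan_handle varbinary(64), @offset int;

SELECT TOP (1)
       @plan_handle = qs.plan_handle,
       @offset      = qs.statement_start_offset
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE  st.text LIKE N'%FROM dbo.Orders WHERE CustomerID%';    -- placeholder

-- Create a plan guide that pins that exact plan
EXEC sp_create_plan_guide_from_handle
    @name                   = N'PG_CriticalQuery',            -- placeholder name
    @plan_handle            = @plan_handle,
    @statement_start_offset = @offset;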
How about the following approach:
Save the execution plans of all typical queries.
After applying new indexes, check which execution plans have changed (a rough way to capture and compare them is sketched below).
Test the performance of the queries with modified plans.
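A minimal sketch of that idea using the plan cache (table names are placeholders; run the snapshot once before and once after the index change):

-- Snapshot the plans currently in cache
SELECT qs.query_hash,
       qs.query_plan_hash,
       st.text AS sample_text
INTO   dbo.PlanSnapshot_Before        -- use ..._After for the second snapshot
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st;

-- Queries whose execution plan changed after the new index
SELECT DISTINCT b.sample_text
FROM   dbo.PlanSnapshot_Before AS b
JOIN   dbo.PlanSnapshot_After  AS a
       ON a.query_hash = b.query_hash
WHERE  a.query_plan_hash <> b.query_plan_hash;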
From the page "Query Performance Tuning"
Improve Indexes
This page has many helpful step-by-step hints on how to tune your indexes for best performance, and what to watch for (profiling).
As with most performance optimization techniques, there are tradeoffs. For example, with more indexes, SELECT queries will potentially run faster. However, DML (INSERT, UPDATE, and DELETE) operations will slow down significantly because more indexes must be maintained with each operation. Therefore, if your queries are mostly SELECT statements, more indexes can be helpful. If your application performs many DML operations, you should be conservative with the number of indexes you create.
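A small illustration of that tradeoff (table, column and index names are made up):

-- Helps SELECTs that filter on OrderDate and only need Status...
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
    ON dbo.Orders (OrderDate) INCLUDE (Status);

-- ...but every one of these now also has to maintain that index:
INSERT INTO dbo.Orders (OrderDate, Status) VALUES (GETDATE(), 'NEW');
UPDATE dbo.Orders SET Status = 'SHIPPED' WHERE OrderID = 42;
DELETE FROM dbo.Orders WHERE OrderID = 42;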
Other resources:
http://databases.about.com/od/sqlserver/a/indextuning.htm
However, it’s important to keep in mind that non-clustered indexes slow down the data modification and insertion process, so indexes should be kept to a minimum
http://searchsqlserver.techtarget.com/tip/Stored-procedure-to-find-fragmented-indexes-in-SQL-Server
Fragmented indexes and tables in SQL Server can slow down application performance. Here's a stored procedure that finds fragmented indexes in SQL servers and databases.
OK. First off, indexes slow down (at least) two things:
-> insert/update/delete: index maintenance
-> query planning: "shall I use that index or not?"
Someone mentioned the query planner might take a less efficient route - this is not supposed to happen.
If your optimizer is even half-decent, and your statistics / parameters correct, there is no way it's going to pick the wrong plan.
Either way, in your case (mssql), you can hardly trust the optimizer and will still have to check every time.
What you're currently doing looks quite sound; just make sure the data you're looking at is relevant, i.e. real use-case queries in the right proportion (this can make a world of difference).
In order to do that, I always advise writing a benchmarking script based on real use - through logging of production-environment queries - a bit like I said here:
Complete db schema transformation - how to test rewritten queries?
We have a SQL table that is populated with events from our website (mostly error logging and the like.) The table has several text fields that contain all of the information about the type of event, and a date/time field that shows when the event was logged. The table is fairly large and grows by around 10-100 records per day.
Obviously, when going through this log we are often looking for the most recent items, so I figured an obvious way to improve our search times would be to add an index to the date field. I figured that while either ASC or DESC would work, DESC would be better since that's the way we're searching most of the time. Our DB guy said "no way" - it would be really bad, because the index would rapidly become fragmented.
I could see why you wouldn't want a clustered index on date DESC, because you'd constantly be trying to insert at the beginning... but I thought that with a non-clustered index it would be okay, since the table records wouldn't need to be moved around. But what he's saying also makes sense - the index entries would still have to be moved around.
But how much? And how big of a hit would it be? And even if it isn't much of a hit, maybe it's still not worth it because the performance on occasional selects just couldn't improve that much? Thoughts?
I don't think it's a bad idea - quite the contrary!
Not knowing your database system, I can't really be sure why your DB guy would think this would be a bad idea. And even so - even an ascending index on the date will be quite beneficial already (at least in the case of SQL Server).
In this case, if you do frequently query by date and usually will retrieve the most recent ones, this seems like a perfect index to me! Maybe you could make it even better by adding the second most likely selection criteria (log application? log type?) to it, so that if you specify both the date and that second criteria, the search scope would be even more limited within the index.
If I were you, I would try a few sample queries against the table without this index, and then add the non-clustered index on your log date - first with ASC - and test how your queries perform (check out their execution plans!), then try the index with DESC, and possibly also try the index with LogDate plus an additional criteria field. See how the performance looks in each case.
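For example, the three variants to compare might look like this (the table and column names are guesses based on the question):

-- 1) ascending index on the log date
CREATE NONCLUSTERED INDEX IX_EventLog_LogDate_Asc
    ON dbo.EventLog (LogDate ASC);

-- 2) descending index, matching the usual "most recent first" access pattern
CREATE NONCLUSTERED INDEX IX_EventLog_LogDate_Desc
    ON dbo.EventLog (LogDate DESC);

-- 3) date plus the second most likely selection criterion (e.g. log type)
CREATE NONCLUSTERED INDEX IX_EventLog_LogDate_LogType
    ON dbo.EventLog (LogDate DESC, LogType);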
Marc
Indexes speed up some queries but slow down all loads. Whether or not an index gives an overall performance improvement depends on how much it speeds up your actual query workload and how much it slows down your actual loading workload (as well as deletes and updates that modify the indexed column).
In many (probably most) applications that involve storing event data, there is a huge amount of loading going on and relatively little querying, which is primarily summary-type queries that don't benefit from indexes. In these sorts of applications, indexes often do more harm than good.
In many such applications it is possible to do loads during off hours, so even if the index gives an overall slowdown, it might be worth it to increase query speed, because someone is waiting for the query output but no one waits for the load to complete. However, the index can get so large that it overruns the file cache and each insert has to read and write a different leaf page from disk. At this point, loads start to require a linear number of random-access disk reads and writes, which can cause a load to take all day.
A co-worker recently ran into a situation where a query to look up security permissions was taking ~15 seconds to run using an = comparison on UserID (which is a UNIQUEIDENTIFIER). Needless to say, the users were less than impressed.
Out of frustration, my co-worker changed the = comparison to use a LIKE and the query sped up to under 1 second.
Without knowing anything about the data schema (I don't have access to the database or execution plans), what could potentially cause this change in performance?
(Broad and vague question, I know)
It may have just been a poor execution plan that had been cached; changing to the LIKE statement then simply caused a new execution plan to be generated. The same speedup may have been noticed if the person had run sp_recompile on the table in question and then re-run the = query.
The other possibility is that this is a complex query and a type conversion is taking place across the = operator for every row. LIKE changes the semantics somewhat so that the type conversion does not have to weigh as heavily in execution planning. I would suggest that your coworker take a look at the execution plan with the = in place and see if there is something like
CONVERT(varchar, variable) = othervariable
in the execution step. In the wrong circumstances, a single typecast can slow a query by two orders of magnitude.
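One quick way to check for that (the text filter is a placeholder for identifying the query) is to search the cached plan for implicit conversions:

-- Look for CONVERT_IMPLICIT in the cached plan of the slow query
SELECT st.text, qp.query_plan
FROM   sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE  st.text LIKE N'%UserID%'                               -- placeholder filter
  AND  CAST(qp.query_plan AS nvarchar(max)) LIKE N'%CONVERT_IMPLICIT%';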
In some cases, LIKE can be faster than an equivalent function like SUBSTRING when an index can be utilized.
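For instance (table, column and index are assumed), the first predicate below can seek on an index on LastName while the second cannot, because wrapping the column in a function hides it from the index:

-- Sargable: leading-wildcard-free LIKE can use an index seek
SELECT * FROM dbo.Users WHERE LastName LIKE 'Smi%';

-- Not sargable: the function on the column forces a scan
SELECT * FROM dbo.Users WHERE SUBSTRING(LastName, 1, 3) = 'Smi';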
Can you give the exact SQL?
Sometimes functions can stop the optimizer from being able to use an index.
Compare the execution plans.
Well, if he ran the two queries one after the other, then it is quite likely that the data had to be read from disk for the first query but was still in the RDBMS data cache for the second one...
If this is what happened, then if he had run them in the opposite order he would have seen the opposite results... If he used LIKE with an exact value (no wildcards), then the query plan should have been identical.
Have you tried updating the statistics on this table/database? Might be worth a try.