I'm having a performance issue with a select statement I'm executing.
Here it is:
SELECT Material.*
FROM Material
INNER JOIN LineInfo ON Material.LineInfoCtr = LineInfo.ctr
INNER JOIN Order_Header ON LineInfo.Order_HeaderCtr = Order_Header.ctr
WHERE (Order_Header.jobNum = 'ttest')
AND (Order_Header.revision_number = 0)
AND (LineInfo.lineNum = 46)
The statement is taking 5-10 seconds to execute depending on server load.
Some table stats:
- Material has 2,030,xxx records.
- Lineinfo has 190,xxx records
- Order_Header has 2,5xx records.
My statement is returning a total of 18 rows containing about 20-25 fields of data. Returning a single field or all of them makes no difference. Is this performance typical? Is there something I could do to improve it?
I've tried using a sub select to retrieve the foreign key, the IN clause and I found one post where a fella said using a left outer join helped him. For me, they all yield the same 5 to 10 seconds of execution time.
This is MS SQL server 2005 accessed through MS SQL management studio. Times are the elapsed time in query analyzer.
Any ideas?
The first thing you should do is analyze the query plan, to see what indexes (if any) SQL Server is using.
You can probably benefit from some covering indexes in this query, since you only use columns in Lineinfo and Order_Header for the join and the query restriction (the WHERE clause).
I do not see anything special in your query so, if indexes are correct, it should perform much more faster than that,, the number of rows is not very high.
Do you have indexes on the table involved in the query and have you tried to use the "display execution plan" option of the Query Analyzer. Basically you need to run the query, loop at the execution plan and add indexes so that you do not see any full table scan operation.
If you run from SQL Management studio then you have the option to tune automatically the query adding indexes but I would suggest trying optimize on your own to better understand what you're doing.
Regards
Massimo
It won't affect performance, but don't write a query such as "SELECT * FROM X". Eschew the star notation and spell out the individual columns. The code that calls this will still work that way, even if the schema is changed by adding a column.
Indexes are key here, as others have already said.
The order of the WHERE clauses can help. Execute the one that eliminates the greatest number of rows from consideration first.
Taking all suggestions and rolling them together I was able to setup some indexes and now it's taking less than a second to execute. Honestly, it's almost immediate.
My problem was that by clicking on the table properties I saw that the primary key was indexed and I mistakenly thought that's what everyone had been talking about. I looked at the execution plan and ran the tuning assistant and putting the two together, I realized that you could index the foreign keys too. That is now done and things are exceptionally snappy.
Thanks for the help, and sorry for such a newb question.
Related
From this SO answer a view should provide the same performance as using the same query directly.
Is querying over a view slower than executing SQL directly?
I have a view where this is not true.
This query targeting a view
SELECT
*
FROM
[Front].[vw_Details] k
WHERE
k.Id = 970435
Takes 10 seconds to complete. Copying the query from the view and adding WHERE k.Id = 970435 to it completes in less than 1 second. The view is nothing special, 4 LEFT JOINs, and a few CASE directives to clean up data.
How can I figure out what the problem is, or what do I need to complete this question with in order for this to be answerable?
Update 1:
SQL Server Version: 12.0.4436.0
Query plan for view: https://pastebin.com/RY40Ab0k
Query plan for select: https://pastebin.com/gwahhgpu
Your query plan is no longer visible, but if you look in the plan, you will most likely see a triangle complaining about the cardinality estimation and/or implicite declaration. What it means is that you are joining tables in a way where your keys are hard to guess for the SQL engine.
It is instant when you run from a query directly, probably because it doesn't need to guess the size of your key is
For example:
k.Id = 970435
SQLSERVER already knows that it is looking for 970435 a 6 digit number.
It can eliminate all the key that doesn't start by 9 and doesn't have 6 digits. :)
However, in a view, it has to build the plan in a way to account for unknown. Because it doesn't know what kind of key it may hold.
See the microsoft for various example and scenario that may help you.
https://learn.microsoft.com/en-us/sql/relational-databases/query-processing-architecture-guide?view=sql-server-ver15
If you are always looking for an int, one work around is to force the type with a cast or convert clause. It's may cause performance penalty depending on your data, but it is a trick in the toolbox to tell sql to not attempt the query a plan as varchar(max) or something along that line.
SELECT *
FROM [Front].[vw_Details] k
WHERE TRY_CONVERT(INT,k.Id) = 970435
use a stored procedure to return results. stored procedures use indexes, whereas, views often don't
or
use a table function and query the table function
I have two tables that I want to join, they both have index on the column I am trying to join.
QUERY 1
SELECT * FROM [A] INNER JOIN [B] ON [A].F = [B].F;
QUERY 2
SELECT * FROM (SELECT * FROM [A]) [A1] INNER JOIN (SELECT * FROM B) [B1] ON [A1].F=[B1].F
the first query clearly will utilize the index, what about the second one?
after the two select statements in the brackets are executed, then join would occur, but my guess is the index wouldn't help to speed up the query because it is pretty much a new table..
The query isn't executed quite so literally as you suggest, where the inner queries are executed first and then their results are combined with the outer query. The optimizer will take your query and will look at many possible ways to get your data through various join orders, index usages, etc. etc. and come up with a plan that it feels is optimal enough.
If you execute both queries and look at their respective execution plans, I think you will find that they use the exact same one.
Here's a simple example of the same concept. I created my schema as so:
CREATE TABLE A (id int, value int)
CREATE TABLE B (id int, value int)
INSERT INTO A (id, value)
VALUES (1,900),(2,800),(3,700),(4,600)
INSERT INTO B (id, value)
VALUES (2,800),(3,700),(4,600),(5,500)
CREATE CLUSTERED INDEX IX_A ON A (id)
CREATE CLUSTERED INDEX IX_B ON B (id)
And ran queries like the ones you provided.
SELECT * FROM A INNER JOIN B ON A.id = B.id
SELECT * FROM (SELECT * FROM A) A1 INNER JOIN (SELECT * FROM B) B1 ON A1.id = B1.id
The plans that were generated looked like this:
Which, as you can see, both utilize the index.
Chances are high that the SQL Server Query Optimizer will be able to detect that Query 2 is in fact the same as Query 1 and use the same indexed approach.
Whether this happens depends on a lot of factors: your table design, your table statistics, the complexity of your query, etc. If you want to know for certain, let SQL Server Query Analyzer show you the execution plan. Here are some links to help you get started:
Displaying Graphical Execution Plans
Examining Query Execution Plans
SQL Server uses predicate pushing (a.k.a. predicate pushdown) to move query conditions as far toward the source tables as possible. It doesn't slavishly do things in the order you parenthesize them. The optimizer uses complex rules--what is essentially a kind of geometry--to determine the meaning of your query, and restructure its access to the data as it pleases in order to gain the most performance while still returning the same final set of data that your query logic demands.
When queries become more and more complicated, there is a point where the optimizer cannot exhaustively search all possible execution plans and may end up with something that is suboptimal. However, you can pretty much assume that a simple case like you have presented is going to always be "seen through" and optimized away.
So the answer is that you should get just as good performance as if the two queries were combined. Now, if the values you are joining on are composite, that is they are the result of a computation or concatenation, then you are almost certainly not going to get the predicate push you want that will make the index useful, because the server won't or can't do a seek based on a partial string or after performing reverse arithmetic or something.
May I suggest that in the future, before asking questions like this here, you simply examine the execution plan for yourself to validate that it is using the index? You could have answered your own question with a little experimentation. If you still have questions, then come post, but in the meantime try to do some of your own research as a sign of respect for the people who are helping you.
To see execution plans, in SQL Server Management Studio (2005 and up) or SQL Query Analyzer (SQL 2000) you can just click the "Show Execution Plan" button on the menu bar, run your query, and switch to the tab at the bottom that displays a graphical version of the execution plan. Some little poking around and hovering your mouse over various pieces will quickly show you which indexes are being used on which tables.
However, if things aren't as you expect, don't automatically think that the server is making a mistake. It may decide that scanning your main table without using the index costs less--and it will almost always be right. There are many reasons that scanning can be less expensive, one of which is a very small table, another of which is that the number of rows the server statistically guesses it will have to return exceeds a significant portion of the table.
These both queries are same. The second query will be transformed just same as first one during transformation.
However, if you have specific requirement I would suggest that you put the whole code.Then It would be much easier to answer your question.
Few days ago I wrote one query and it gets executes quickly but now a days it takes 1 hrs.
This query run on my SQL7 server and it takes about 10 seconds.
This query exists on another SQL7 server and until last week it took about
10 seconds.
The configuration of both servers are same. Only the hardware is different.
Now, on the second server this query takes about 30 minutes to extract the s
ame details, but anybody has changed any details.
If I execute this query without Where, it'll show me the details in 7
seconds.
This query still takes about same time if Where is problem
Without seeing the query and probably the data I can't do a lot other than offer tips.
Can you put more constraints on the query. If you can reduce the amount of data involved then this will speed up the query.
Look at the columns used in your joins, where and having clauses and order by. Check that the tables that the columns belong to contain indices for these columns.
Do you need to use the user defined function or can it be done another way?
Are you using subquerys? If so can these be pulled out into separate views?
Hope this helps.
Without knowing how much data is going into your tables, and not knowing your schema, it's hard to give a definitive answer but things to look at:
Try running UPDATE STATS or DBCC REINDEX.
Do you have any indexes on the tables? If not, try adding indexes to columns used in WHERE clauses and JOIN predicates.
Avoid cross table OR clauses (i.e, where you do WHERE table1.col1 = #somevalue OR table2.col2 = #someothervalue). SQL can't use indexes effectively with this construct and you may get better performance by splitting the query into two and UNION'ing the results.
What do your functions (UDFs) do and how are you using them? It's worth noting that dropping them in the columns part of a query gets expensive as the function is executed per row returned: thus if a function does a select against the database, then you end up running n + 1 queries against the database (where n = number of rows returned in the main select). Try and engineer the function out if possible.
Make sure your JOINs are correct -- where you're using a LEFT JOIN, revisit the logic and see if it needs to be a LEFT or whether it can be turned into an INNER JOIN. Sometimes people use LEFT JOINs, but when you examine the logic in the rest of the query, it can sometimes be apparent that the LEFT JOIN gives you nothing (because, for example, someone may had added a WHERE col IS NOT NULL predicate against the joined table). INNER JOINs can be faster, so it's worth reviewing all of these.
It would be a lot easier to suggest things if we could see the query.
I have a query with about 6-7 joined tables and a FREETEXT() predicate on 6 columns of the base table in the where.
Now, this query worked fine (in under 2 seconds) for the last year and practically remained unchanged (i tried old versions and the problem persists)
So today, all of a sudden, the same query takes around 1-1.5 minutes.
After checking the Execution Plan in SQL Server 2005, rebuilding the FULLTEXT Index of that table, reorganising the FULLTEXT index, creating the index from scratch, restarting the SQL Server Service, restarting the whole server I don't know what else to try.
I temporarily switched the query to use LIKE instead until i figure this out (which takes about 6 seconds now).
When I look at the query in the query performance analyser, when I compare the ´FREETEXT´query with the ´LIKE´ query, the former has 350 times as many reads (4921261 vs. 13943) and 20 times (38937 vs. 1938) the CPU usage of the latter.
So it really is the ´FREETEXT´predicate that causes it to be so slow.
Has anyone got any ideas on what the reason might be? Or further tests I could do?
[Edit]
Well, I just ran the query again to get the execution plan and now it takes 2-5 seconds again, without any changes made to it, though the problem still existed yesterday. And it wasn't due to any external factors, as I'd stopped all applications accessing the database when I first tested the issue last thursday, so it wasn't due to any other loads.
Well, I'll still include the execution plan, though it might not help a lot now that everything is working again... And beware, it's a huge query to a legacy database that I can't change (i.e. normalize data or get rid of some unneccessary intermediate tables)
Query plan
ok here's the full query
I might have to explain what exactly it does. basically it gets search results for job ads, where there's two types of ads, premium ones and normal ones. the results are paginated to 25 results per page, 10 premium ones up top and 15 normal ones after that, if there are enough.
so there's the two inner queries that select as many premium/normal ones as needed (e.g. on page 10 it fetches the top 100 premium ones and top 150 normal ones), then those two queries are interleaved with a row_number() command and some math. then the combination is ordered by rownumber and the query is returned. well it's used at another place to just get the 25 ads needed for the current page.
Oh and this whole query is constructed in a HUGE legacy Coldfusion file and as it's been working fine, I haven't dared thouching/changing large portions so far... never touch a running system and so on ;) Just small stuff like changing bits of the central where clause.
The file also generates other queries which do basically the same, but without the premium/non premium distinction and a lot of other variations of this query, so I'm never quite sure how a change to one of them might change the others...
Ok as the problem hasn't surfaced again, I gave Martin the bounty as he's been the most helpful so far and I didn't want the bounty to expire needlessly. Thanks to everyone else for their efforts, I'll try your suggestions if it happens again :)
This issue might arise due to a poor cardinality estimate of the number of results that will be returned by the full text query leading to a poor strategy for the JOIN operations.
How do you find performance if you break it into 2 steps?
One new step that populates a temporary table or table variable with the results of the Full Text query and the second one changing your existing query to refer to the temp table instead.
(NB: You might want to try this JOIN with and without OPTION(RECOMPILE) whilst looking at query plans for (A) a free text search term that returns many results (B) One that returns only a handful of results.)
Edit It's difficult to clarify exactly in the absence of the offending query but what I mean is instead of doing
SELECT <col-list>
FROM --Some 6 table Join
WHERE FREETEXT(...);
How does this perform?
DECLARE #Table TABLE
(
<pk-col-list>
)
INSERT INTO #Table
SELECT PK
FROM YourTable
WHERE FREETEXT(...)
SELECT <col-list>
FROM --Some 6 table Join including onto #Table
OPTION(RECOMPILE)
Usually when we have this issue, it is because of table fragmentation and stale statistics on the indexes in question.
Next time, try to EXEC sp_updatestats after a rebuild/reindex.
See Using Statistics to Improve Query Performance for more info.
I have a query that has been running every day for a little over 2 years now and has typically taken less than 30 seconds to complete. All of a sudden, yesterday, the query started taking 3+ hours to complete and was using 100% CPU the entire time.
The SQL is:
SELECT
#id,
alpha.A, alpha.B, alpha.C,
beta.X, beta.Y, beta.Z,
alpha.P, alpha.Q
FROM
[DifferentDatabase].dbo.fnGetStuff(#id) beta
INNER JOIN vwSomeData alpha ON beta.id = alpha.id
alpha.id is a BIGINT type and beta.id is an INT type. dbo.fnGetStuff() is a simple SELECT statement with 2 INNER JOINs on tables in the same DB, using a WHERE id = #id. The function returns approximately 11000 results.
The view vwSomeData is a simple SELECT statement with two INNER JOINs that returns about 590000 results.
Both the view and the function will complete in less than 10 seconds when executed by themselves. Selecting the results of the function into a temporary table first and then joining on that makes the query finish in < 10 seconds.
How do I troubleshoot what's going on? I don't see any locks in the activity manager.
Look at the query plan. My guess is that there is a table scan or more in the execution plan. This will cause huge amounts of I/O for the few record you get in the result.
You could use the SQL Server Profiler tool to monitor what queries are running on SQL Server. It doesn't show the locks, but it can for instance also give you hints on how to improve your query by suggesting indexes.
If you've got a reasonably recent version of SQL Server Management Studio, it has a Database Tuning Adviser as well, under Tools. It takes a trace from the Profiler and makes some, sometimes highly useful, suggestions. Makes sure there's not too many queries - it takes a long time to build advice.
I'm not an expert on it, but have had some luck with it in the past.
Do you need to use a function? Can you re-write the entire thing into a stored procedure in which you pass in the #ID as a parameter.
Even if your table has indexes because you pass the #ID as a variable to the WHERE clause potentially greatly increasing the amount of time for the query to run.
The reason the indexes may not be used is because the Query Analyzer does not know the value of the variables when it selects an access method to perform the query. Because this is a batch file, only one pass is made of the Transact-SQL code, preventing the Query Optimizer from knowing what it needs to know in order to select an access method that uses the indexes.
You might want to consider an INDEX query hint if you cannot re-write the SQL.
it might also be possible, since this just started happening, that the INDEXes have become fragmented and might need to be rebuilt.
I've had similar problems with joining functions that return large datasets. I had to do what you've already suggested. Put the results in a temp table and join on that.
Look at the estimated plan, this will probably shed some light. Typically when query cost gets orders of magnitude more expensive it is because a loop or merge join is being used where a hash join is more appropriate. If you see a loop or merge join in the estimated plan, look at the number of rows it expects to process - is it far smaller than the number of rows you know will actually be in play? You can also specify a hint to use a hash join and see if it performs much better. If so, try updating statistics and see if it goes back to a hash join without a hint.
SELECT
#id,
alpha.A, alpha.B, alpha.C,
beta.X, beta.Y, beta.Z,
alpha.P, alpha.Q
FROM
[DifferentDatabase].dbo.fnGetStuff(#id) beta
INNER HASH JOIN vwSomeData alpha ON beta.id = alpha.id
-- having no idea what type of schema is in place and just trying to throw out ideas:
Like others have said... use Profiler and find the source of pain... but I'm thinking it is the function on the other database. Since that function might be a source of pain, have you thought about a little denormalization or anything on [DifferentDatabase]. I think you'll find a bit more scalability in joining to a more flattened table with indexes than a costly function.
Run this command:
SET SHOWPLAN_ALL ON
Then run your query. It will display the execution plan, look for a "SCAN" on an index or a table. That is most likely what is happening to your query now. If that is the case, try to figure out why it is not using indexes now (refresh statistics, etc)