Full Text Query takes minutes instead of sub seconds after upgrade

Full Text Query takes minutes instead of sub seconds after upgrade - sql-server

We just upgraded our SQL Server 2005 to SQL server 2008 R2 and noticed some performance problems.
The query below was already slow but now in 2008 it just times out. We rebuild the catalog to make sure its freshly made on 2008
DECLARE #FREETEXT varchar(255) = 'TEN-T'
select Distinct ...
from
DOSSIER_VERSION
inner join
DOSSIER_VERSION_LOCALISED ...
where
CONTAINS(DOSSIER_VERSION.*,#FREETEXT)
or
CONTAINS(DOSSIER_VERSION_LOCALISED.*,#FREETEXT)
The query takes minutes if you have both conditions enabled.
If you just put the following in the where
CONTAINS(DOSSIER_VERSION.*,#FREETEXT)
Its super fast. Same goes for the case if its just
CONTAINS(DOSSIER_VERSION_LOCALISED.*,#FREETEXT)
Since we are or'ing the results I would expect the time for this query to run to be less than the sum but as stated above it takes minutes/times out.
Can anyone tell me what is going on here? If I use a union (which is conceptually the same as the or) the performance problem is gone but I would like to know what issue I am running into here since I want to avoid rewriting queries.
Regards, Tom

See my answers to these very similar questions:
Adding more OR searches with
CONTAINS Brings Query to Crawl
SQL Server full text query across
multiple tables - why so slow?
The basic idea is that using LEFT JOINs to CONTAINSTABLE (or FREETEXTTABLE) performs significantly better than having multiple CONTAINS (or FREETEXT) ORed together in the WHERE clause.

Related

Stable SQL Server stored procedure unusual drop in performance

I'm experiencing a very strange issue in SQL Server 2005.
Yesterday users reported slowness in a specific part of our database app. I am not sure how pervasive the slowness is - it's definitely not everywhere, as this is the only part of the system reported - but I isolated the relevant stored procedure which used to run in 2-3 seconds and is now consistently running in 50-60 seconds.
It's a complex query -- multiple layers of subqueries. It returns only 42 rows in 16 columns.
The query looks like this:
select col1,2,3,4,5,...
from
( select .... ) t
ORDER BY col1
I started picking apart the query to find out what was slow and found that removing the final ORDER BY clause brought the performance back in line.
This is highly mysterious. I could not replicate the problem on our DEV server. It's only 42 rows so the order by clause should be inconsequential. Execution plans are identical w/ and without the order by, and on the two servers.
Any brainstorming about what could have changed on our production server would be much appreciated!

ALTER INDEX ALL ON BiggestTableInQuery
REBUILD;
Did the trick! It was horribly fragmented, 55-98% on 5 indexes, clustered 55%. So now I'm going to review other large tables as well.
Thanks simon at rcl and Tab Alleman!

EXECUTE AS vs SELECT FROM linked server

I've got an Oracle server integrated with MS SQL as a linked server. Currently I'm working on the query optimization. I've found out that queries that written as following:
SELECT colName1, colName2, ..
FROM ORACLE.TBL_TBLENAME
WHERE something = #something
work very slowly. On the other hand, the same query written as:
EXECUTE ('SELECT colName1, colName2, ..
FROM TBL_TBLENAME
WHERE something :something',#something) at ORACLE
work much faster.
What I'm concerned about is the execution plan. For the first query Estimated Subtree Cost is 0.16, for the second it is 3.36. The second query performs a 'Remote scan'. I don't know whether this is good or not.
The query is supposed to run quite often (around 20 queries in 1 minute).

given you execution plan (and i'm an oracle guy not a sql server guy), it appears that the first one is doing a full table scan and filtering at the sql server end (compute scalar?), whereas the 2nd one is sumitting the filter to oracle and so much quicker.
are the stats up-to-date on the oracle table (perhaps it thinks there's only a few rows in the table so sql server is deciding its better to just fetch the whole table over and do the procesing locally?) and are there any histograms involved on "something"?
if the 2nd one is performing good for you though, is there really a problem?

SQL Server 2005 FREETEXT() Perfomance Issue

I have a query with about 6-7 joined tables and a FREETEXT() predicate on 6 columns of the base table in the where.
Now, this query worked fine (in under 2 seconds) for the last year and practically remained unchanged (i tried old versions and the problem persists)
So today, all of a sudden, the same query takes around 1-1.5 minutes.
After checking the Execution Plan in SQL Server 2005, rebuilding the FULLTEXT Index of that table, reorganising the FULLTEXT index, creating the index from scratch, restarting the SQL Server Service, restarting the whole server I don't know what else to try.
I temporarily switched the query to use LIKE instead until i figure this out (which takes about 6 seconds now).
When I look at the query in the query performance analyser, when I compare the ´FREETEXT´query with the ´LIKE´ query, the former has 350 times as many reads (4921261 vs. 13943) and 20 times (38937 vs. 1938) the CPU usage of the latter.
So it really is the ´FREETEXT´predicate that causes it to be so slow.
Has anyone got any ideas on what the reason might be? Or further tests I could do?
[Edit]
Well, I just ran the query again to get the execution plan and now it takes 2-5 seconds again, without any changes made to it, though the problem still existed yesterday. And it wasn't due to any external factors, as I'd stopped all applications accessing the database when I first tested the issue last thursday, so it wasn't due to any other loads.
Well, I'll still include the execution plan, though it might not help a lot now that everything is working again... And beware, it's a huge query to a legacy database that I can't change (i.e. normalize data or get rid of some unneccessary intermediate tables)
Query plan
ok here's the full query
I might have to explain what exactly it does. basically it gets search results for job ads, where there's two types of ads, premium ones and normal ones. the results are paginated to 25 results per page, 10 premium ones up top and 15 normal ones after that, if there are enough.
so there's the two inner queries that select as many premium/normal ones as needed (e.g. on page 10 it fetches the top 100 premium ones and top 150 normal ones), then those two queries are interleaved with a row_number() command and some math. then the combination is ordered by rownumber and the query is returned. well it's used at another place to just get the 25 ads needed for the current page.
Oh and this whole query is constructed in a HUGE legacy Coldfusion file and as it's been working fine, I haven't dared thouching/changing large portions so far... never touch a running system and so on ;) Just small stuff like changing bits of the central where clause.
The file also generates other queries which do basically the same, but without the premium/non premium distinction and a lot of other variations of this query, so I'm never quite sure how a change to one of them might change the others...
Ok as the problem hasn't surfaced again, I gave Martin the bounty as he's been the most helpful so far and I didn't want the bounty to expire needlessly. Thanks to everyone else for their efforts, I'll try your suggestions if it happens again :)

This issue might arise due to a poor cardinality estimate of the number of results that will be returned by the full text query leading to a poor strategy for the JOIN operations.
How do you find performance if you break it into 2 steps?
One new step that populates a temporary table or table variable with the results of the Full Text query and the second one changing your existing query to refer to the temp table instead.
(NB: You might want to try this JOIN with and without OPTION(RECOMPILE) whilst looking at query plans for (A) a free text search term that returns many results (B) One that returns only a handful of results.)
Edit It's difficult to clarify exactly in the absence of the offending query but what I mean is instead of doing
SELECT <col-list>
FROM --Some 6 table Join
WHERE FREETEXT(...);
How does this perform?
DECLARE #Table TABLE
(
<pk-col-list>
)
INSERT INTO #Table
SELECT PK
FROM YourTable
WHERE FREETEXT(...)
SELECT <col-list>
FROM --Some 6 table Join including onto #Table
OPTION(RECOMPILE)

Usually when we have this issue, it is because of table fragmentation and stale statistics on the indexes in question.
Next time, try to EXEC sp_updatestats after a rebuild/reindex.
See Using Statistics to Improve Query Performance for more info.

SQL Server query taking up 100% CPU and runs for hours

I have a query that has been running every day for a little over 2 years now and has typically taken less than 30 seconds to complete. All of a sudden, yesterday, the query started taking 3+ hours to complete and was using 100% CPU the entire time.
The SQL is:
SELECT
#id,
alpha.A, alpha.B, alpha.C,
beta.X, beta.Y, beta.Z,
alpha.P, alpha.Q
FROM
[DifferentDatabase].dbo.fnGetStuff(#id) beta
INNER JOIN vwSomeData alpha ON beta.id = alpha.id
alpha.id is a BIGINT type and beta.id is an INT type. dbo.fnGetStuff() is a simple SELECT statement with 2 INNER JOINs on tables in the same DB, using a WHERE id = #id. The function returns approximately 11000 results.
The view vwSomeData is a simple SELECT statement with two INNER JOINs that returns about 590000 results.
Both the view and the function will complete in less than 10 seconds when executed by themselves. Selecting the results of the function into a temporary table first and then joining on that makes the query finish in < 10 seconds.
How do I troubleshoot what's going on? I don't see any locks in the activity manager.

Look at the query plan. My guess is that there is a table scan or more in the execution plan. This will cause huge amounts of I/O for the few record you get in the result.

You could use the SQL Server Profiler tool to monitor what queries are running on SQL Server. It doesn't show the locks, but it can for instance also give you hints on how to improve your query by suggesting indexes.

If you've got a reasonably recent version of SQL Server Management Studio, it has a Database Tuning Adviser as well, under Tools. It takes a trace from the Profiler and makes some, sometimes highly useful, suggestions. Makes sure there's not too many queries - it takes a long time to build advice.
I'm not an expert on it, but have had some luck with it in the past.

Do you need to use a function? Can you re-write the entire thing into a stored procedure in which you pass in the #ID as a parameter.
Even if your table has indexes because you pass the #ID as a variable to the WHERE clause potentially greatly increasing the amount of time for the query to run.
The reason the indexes may not be used is because the Query Analyzer does not know the value of the variables when it selects an access method to perform the query. Because this is a batch file, only one pass is made of the Transact-SQL code, preventing the Query Optimizer from knowing what it needs to know in order to select an access method that uses the indexes.
You might want to consider an INDEX query hint if you cannot re-write the SQL.
it might also be possible, since this just started happening, that the INDEXes have become fragmented and might need to be rebuilt.

I've had similar problems with joining functions that return large datasets. I had to do what you've already suggested. Put the results in a temp table and join on that.

Look at the estimated plan, this will probably shed some light. Typically when query cost gets orders of magnitude more expensive it is because a loop or merge join is being used where a hash join is more appropriate. If you see a loop or merge join in the estimated plan, look at the number of rows it expects to process - is it far smaller than the number of rows you know will actually be in play? You can also specify a hint to use a hash join and see if it performs much better. If so, try updating statistics and see if it goes back to a hash join without a hint.
SELECT
#id,
alpha.A, alpha.B, alpha.C,
beta.X, beta.Y, beta.Z,
alpha.P, alpha.Q
FROM
[DifferentDatabase].dbo.fnGetStuff(#id) beta
INNER HASH JOIN vwSomeData alpha ON beta.id = alpha.id

-- having no idea what type of schema is in place and just trying to throw out ideas:
Like others have said... use Profiler and find the source of pain... but I'm thinking it is the function on the other database. Since that function might be a source of pain, have you thought about a little denormalization or anything on [DifferentDatabase]. I think you'll find a bit more scalability in joining to a more flattened table with indexes than a costly function.

Run this command:
SET SHOWPLAN_ALL ON
Then run your query. It will display the execution plan, look for a "SCAN" on an index or a table. That is most likely what is happening to your query now. If that is the case, try to figure out why it is not using indexes now (refresh statistics, etc)

Query hangs with INNER JOIN on datetime field

We've got a weird problem with joining tables from SQL Server 2005 and MS Access 2003.
There's a big table on the server and a rather small table locally in Access. The tables are joined via 3 fields, one of them a datetime field (containing a day; idea is to fetch additional data (daily) from the big server table to add data to the local table).
Up until the weekend this ran fine every day. Since yesterday we experienced strange non-time-outs in Access with this query. Non-time-out means that the query runs forever with rather high network transfer, but no timeout occurs. Access doesn't even show the progress bar. Server trace tells us that the same query is exectuted over and over on the SQL server without error but without result either. We've narrowed it down to the problem seemingly being accessing server table with a big table and either JOIN or WHERE containing a date, but we're not really able to narrow it down. We rebuilt indices already and are currently restoring backup data, but maybe someone here has any pointers of things we could try.
Thanks, Mike.

If you join a local table in Access to a linked table in SQL Server, and the query isn't really trivial according to specific limitations of joins to linked data, it's very likely that Access will pull the whole table from SQL Server and perform the join locally against the entire set. It's a known problem.
This doesn't directly address the question you ask, but how far are you from having all the data in one place (SQL Server)? IMHO you can expect the same type of performance problems to haunt you as long as you have some data in each system.
If it were all in SQL Server a pass-through query would optimize and use available indexes, etc.

Thanks for your quick answer!
The actual query is really huge; you won't be happy with it :)
However, we've narrowed it down to a simple:
SELECT * FROM server_table INNER JOIN access_table ON server_table.date = local_table.date;
If the server_table is a big table (hard to say, we've got 1.5 million rows in it; test tables with 10 rows or so have worked) and the local_table is a table with a single cell containing a date. This runs forever. It's not only slow, It just does nothing besides - it seems - causing network traffic and no time out (this is what I find so strange; normally you get a timeout, but this just keeps on running).
We've just found KB article 828169; seems to be our problem, we'll look into that. Thanks for your help!

Use the DATEDIFF function to compare the two dates as follows:
' DATEDIFF returns 0 if dates are identical based on datepart parameter, in this case d
WHERE DATEDIFF(d,Column,OtherColumn) = 0
DATEDIFF is optimized for use with dates. Comparing the result of the CONVERT function on both sides of the equal (=) sign might result in a table scan if either of the dates is NULL.
Hope this helps,
Bill

Try another syntax ? Something like:
SELECT * FROM BigServerTable b WHERE b.DateFld in (SELECT DISTINCT s.DateFld FROM SmallLocalTable s)
The strange thing in your problem description is "Up until the weekend this ran fine every day".
That would mean the problem is really somewhere else.
Did you try creating a new blank Access db and importing everything from the old one ?
Or just refreshing all your links ?

Please post the query that is doing this, just because you have indexes doesn't mean that they will be used. If your WHERE or JOIN clause is not sargable then the index will not be used
take this for example
WHERE CONVERT(varchar(49),Column,113) = CONVERT(varchar(49),OtherColumn,113)
that will not use an index
or this
WHERE YEAR(Column) = 2008
Functions on the left side of the operator (meaning on the column itself) will make the optimizer do an index scan instead of a seek because it doesn't know the outcome of that function
We rebuilt indices already and are currently restoring backup data, but maybe someone here has any pointers of things we could try.
Access can kill many good things....have you looked into blocking at all
run
exec sp_who2
look at the BlkBy column and see who is blocking what

Just an idea, but in SQL Server you can attach your Access database and use the table there. You could then create a view on the server to do the join all in SQL Server. The solution proposed in the Knowledge Base article seems problematic to me, as it's a kludge (if LIKE works, then = ought to, also).
If my suggestion works, I'd say that it's a more robust solution in terms of maintainability.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight