I have a database restored on two different machines (developer machine, and tester machine), and whilst not identical they have similar performance.
Using the following query (obfuscated):
CREATE TABLE #TmpTable (MapID int, Primary key(MapID))
UPDATE MapTable
SET Tstamp = GetDate()
FROM MapTable
JOIN Territory as Territory ON Territory.ID = MapTable.TerritoryID
LEFT JOIN #TmpTable ON #TmpTable.MapID = MapTable.MapID
WHERE MapTable.TStamp > DateAdd(year, -100, GETDATE())
AND Territory.Name IS NOT null
AND Territory.Name NOT LIKE '!%'
AND #TmpTable.MapID IS NULL
For 400k records, the developer machine updates in about 4 seconds, but on the tester machine it updates in about 25 seconds; the same DB was restored on both machines.
The problem is that when running this through a tool we use, the timeout for queries is set to 30 seconds and it timeouts 90+ % of the time on tester machine.
But the execution plan on both is the same...
Can anyone suggest why this is, and possible optimisation(s)?
One thing that I can see that 'might' have an impact on performance is the use of the GETDATE() function that might be called twice for each record, but definitely once, so that's 400K calls to the function!
I would put the result of GETDATE() into a variable and use that, I always do that unless there is a very good reason no to, for example the changes in the date throughout the query is required like in batch processing within a CURSOR.
However, I doubt this will be the main issue with performance. With such a large difference in execution time between the different machines where the execution plan is the same I would be looking at factors external to SQL such as CPU usage, HDD usage, speed and fragmentation etc.
Related
I'm running into an interesting issue in production, which I cannot replicate in our QA/Staging environments.
I have a query that is doing dirty reads on a fairly large table (around 6 million rows, but we only keep the last 90 days of data in it, older records are warehoused in a different database). This table has lots of writes to it, as it logs page views, but only occasionally is data read from the table.
Recently I noticed that when one specific query is running, SQL Server 2019 starts generating a ton of WRITELOG waits and appears to hold up any other requests that are trying to write to the database.
Now the query itself has nolock hints on all the tables, because it's okay if dirty data is returned. We use the nolock hints because the writes to the table are extremely frequently and queries to this table can be slow because there are a lot of page scans required.
The query itself looks something like this:
select
clt.ViewDate, clt.UserId, clt.RemoteAddress, clt.LibraryId, clt.Parameters
, u.Fullname
, cl.Id as VideoId, cl.Title
-- we need a compound key for each row, so we can count the unique rows
, case
when clt.ViewDate is null then null
else row_number() over (order by clt.ViewDate, clt.UserId, clt.LibraryId, clt.Parameters)
end as compoundKey
from
ContentLibrary as cl (nolock)
left join
(
ContentLibraryTracking as clt (nolock)
inner join
[User] as u (nolock)
on
clt.UserId = u.UserId
)
on
clt.ViewDate between #startDate and #endDate
and
clt.Parameters like #filter
where
1 = 1
and
cl.ContentType = #contentType
order by
clt.ViewDate
The problem table appears to be the ContentLibraryTracking. This is the table that has millions of rows and has lots of inserts and we warehouse rows nightly, so there can be a lot of page fragmentation. We do defrag the indices and stats weekly on the table.
When this query is running, sp_BlitzWho will report the query has entered into a CXCONSUMER. I will then see SQL Server 2019 starting to queue processes with a WRITELOG wait. This processed remain in this state until the query has finished running.
Since our application has some kind of write transaction with every page view, this means this query is holding up execution for entire application, which is obviously bad.
While I know have page scans is bad for a query plan, the query requires searching patterns in a varchar column, which is why the page scans happen. Since the reads are very infrequently, the table is optimized for writes since those are extremely frequent. And while the query could perform better, considering the work it's doing even when it's slow it runs within 15 seconds or so.
One thing I do see from the sp_BlitzWho results is the query is using parallelism and it also states the Transaction Isolation Level is Read Committed (which I would unexpected Read Uncommitted since all the tables have a nolock hint).
What would cause a query with dirty reads to be forcing the database to queue up WRITELOG events?
I could see this happening if the query was altering data and causing it's own transaction log entries, but that should not be happening with the query. That's the whole reason we are using the nolock hint on the tables.
Also, our database, log files and tempdb are all on their own logical storage devices, so reads from the database should not be causing a IO problems writing to the transaction log files.
A couple of notes on the environment:
We are running Microsoft SQL Server 2019 (RTM-CU8-GDR) (KB4583459) - 15.0.4083.2 (X64))
The database is running in a VM
We backup transaction logs every 5 minutes (could this be the issue?)
Memory and CPU usage appear fine with the query runs
SQL Monitor 11 only really shows spikes in the log flushes and waits (which would match the behavior). Page splits, buffer cache & page are all normal. I do see the "disk read bytes/sec" go up on the logic drive that has the database on it, but the writes on all drives (including the transaction logs) look okay.
Any thoughts would be greatly appreciated as I'm really scratching my head over this issue.
Right after I posted my question I started looking at the sp_BlitzWho results in more detail. I noticed the parallelism was using all the CPUs. So I changed the MAXDOP to half the CPU/cores and this appears to have resolved the issue. I'm going to keep monitoring the situation, but looks like an instance where the MAXDOP was not set correctly.
It make sense that if a query is eating up all the available cores, that other threads would be waiting. I was just thrown off by the WRITELOG waits.
I have a Azure SQL production database that runs at around 10-20% DTU usage on average, however, I get DTU spikes that take it upwards of 100% at times. Here is a sample from the past 1 hour:
I realize this could be a rouge query, so I switched over to the Query Performance Insight tab, and I find the following from the past 24 hours:
This chart makes sense with regards to the CPU usage line. Query 3780 takes the majority of at CPU, as expected with my application. The Overall DTU (red) line seems to follow this correctly (minus the spikes).
However, in the DTU Components charts I can see large Data IO spikes occurring that coincide with the Overall DTU spikes. Switching over to the TOP 5 queries by Data IO, I see the following:
This seems to indicate that there are no queries that are using high amounts of Data IO.
How do I find out where this Data IO usage is coming from?
Finally, I see that there is this one, "odd ball" query (7966) listed under the TOP 5 queries by Data IO with only 5 executions. Selecting it shows the following:
SELECT StatMan([SC0], [SC1], [SC2], [SB0000])
FROM (SELECT TOP 100 PERCENT [SC0], [SC1], [SC2], step_direction([SC0]) over (order by NULL) AS [SB0000]
FROM (SELECT [UserId] AS [SC0], [Type] AS [SC1], [Id] AS [SC2] FROM [dbo].[Cipher] TABLESAMPLE SYSTEM (1.828756e+000 PERCENT)
WITH (READUNCOMMITTED) ) AS _MS_UPDSTATS_TBL_HELPER
ORDER BY [SC0], [SC1], [SC2], [SB0000] ) AS _MS_UPDSTATS_TBL
OPTION (MAXDOP 16)
What is this query?
This does not look like any query that my application has created/uses. The timestamps on the details chart seem to line up with the approximate times of the overall Data IO spikes (just prior to 6am) which leads me to think this query has something to do with all of this.
Are there any other tools can I use to help isolate this issue?
The query is updating statistics..this occurs when this setting AUTO UPDATE STATISTICS is on..This should be kept on and you can't turn it off..this is a best practice..
You should update stats manually only when when you see a query not performing well and stats are off for that query..
Also below are some rules when SQL will update stats automatically for you
When a table with no rows gets a row
When 500 rows are changed to a table that is less than 500 rows
When 20% + 500 are changed in a table greater than 500 rows
By ‘change’ we mean if a row is inserted, updated or deleted. So, yes, even the automatically-created statistics get updated and maintained as the data changes.There were some changes to these rules in recent versions and sql can update stats more often
References:
https://www.sqlskills.com/blogs/erin/understanding-when-statistics-will-automatically-update/
It seems that query is part of the automatic update of statistics process. To mitigate the impact of this process on production you can regularly update statistics and indexes using runbooks as explained here. Run sp_updatestats to immediately try to mitigate the impact of this process.
I'm using this SELECT Statment
SELECT ID, Code, ParentID,...
FROM myTable WITH (NOLOCK)
WHERE ParentID = 0x0
This Statment is repeated each 15min (Through Windows Service)
The problem is the database become slow to other users when this query is runnig.
What is the best way to avoid slow performance while query is running?
Generate an execution plan for your query and inspect it.
Is the ParentId field indexed?
Are there other ways you might optimize the query?
Is it possible to increase the performance of the server that is hosting SQL Server?
Does it need more disk or RAM?
Do you have separate drives (spindles) for operating system, data, transaction logs, temporary databases?
Something else to consider - must you always retrieve the very latest values from this table for your application, or might it be possible to cache prior results and use those for some length of time?
Seems your table got huge number of records. You can think of implementing page-wise retrieval of data. You can first request for say TOP 100 rows and then having multiple calls to fetch rest of data.
I still don't understand need to run such query every 15 mins. You may think of implementing a stored procedure which can perform majority of processing and return you a small subset of data. This will be good improvement if it suits your requirement.
I see a similar question here from 2013, but it was not answered so I am posting my version.
We are using SQL Server 2008 (SP4) - 10.0.6000.29 (X64) and have a database that is about 70GB in size with about 350 tables. On a daily basis, there are only a small number of updates occurring, though a couple times a year we dump a fair amount of data into it. There are several Windows Services that constantly query the database, but rarely updated it. There are also several websites that use it, and desktop applications (again, minimal daily updates).
The problem we have is that every once in a while a query that hits certain records will take much longer than normal. The following is a bogus example:
This query against 2 tables with less than 600 total records might take 30+ seconds:
select *
from our_program_access bpa
join our_user u on u.user_id = bpa.user_id
where u.user_id = 50 and program_name = 'SomeApp'
But when you change the user_id value to another user record, it takes less than one second:
select *
from our_program_access bpa
join our_user u on u.user_id = bpa.user_id
where u.user_id = 51 and program_name = 'SomeApp'
The real queries that are being used are a little more complex, but the idea is the same: search ID 50 takes 30+ seconds, search ID 51 takes < 1 second, but both return only 1 record out of about 600 total.
We have found that the issue seems related to the statistics. When this problem occurs, we run sp_updatestats, and all the queries are equal and fast in time. So, we started to run sp_updatestats in a maintainenance plan every night. But the problem still pops up. We also tried setting AUTO_UPDATE_STATISTICS_ASYNC on, but the problem eventually popped up.
While the database is large, it doesn't really undergo tremendous changes, though it does face constant queries from different services.
There are several other databases on the same server such as a mail log, SharePoint, and web filtering. Overall, performance is very good until we run into this problem.
Does it make sense that on a database that undergoes relatively small changes daily would need to run sp_updatstats so frequently? What else can we do to resolve this issue?
I have a query with about 6-7 joined tables and a FREETEXT() predicate on 6 columns of the base table in the where.
Now, this query worked fine (in under 2 seconds) for the last year and practically remained unchanged (i tried old versions and the problem persists)
So today, all of a sudden, the same query takes around 1-1.5 minutes.
After checking the Execution Plan in SQL Server 2005, rebuilding the FULLTEXT Index of that table, reorganising the FULLTEXT index, creating the index from scratch, restarting the SQL Server Service, restarting the whole server I don't know what else to try.
I temporarily switched the query to use LIKE instead until i figure this out (which takes about 6 seconds now).
When I look at the query in the query performance analyser, when I compare the ´FREETEXT´query with the ´LIKE´ query, the former has 350 times as many reads (4921261 vs. 13943) and 20 times (38937 vs. 1938) the CPU usage of the latter.
So it really is the ´FREETEXT´predicate that causes it to be so slow.
Has anyone got any ideas on what the reason might be? Or further tests I could do?
[Edit]
Well, I just ran the query again to get the execution plan and now it takes 2-5 seconds again, without any changes made to it, though the problem still existed yesterday. And it wasn't due to any external factors, as I'd stopped all applications accessing the database when I first tested the issue last thursday, so it wasn't due to any other loads.
Well, I'll still include the execution plan, though it might not help a lot now that everything is working again... And beware, it's a huge query to a legacy database that I can't change (i.e. normalize data or get rid of some unneccessary intermediate tables)
Query plan
ok here's the full query
I might have to explain what exactly it does. basically it gets search results for job ads, where there's two types of ads, premium ones and normal ones. the results are paginated to 25 results per page, 10 premium ones up top and 15 normal ones after that, if there are enough.
so there's the two inner queries that select as many premium/normal ones as needed (e.g. on page 10 it fetches the top 100 premium ones and top 150 normal ones), then those two queries are interleaved with a row_number() command and some math. then the combination is ordered by rownumber and the query is returned. well it's used at another place to just get the 25 ads needed for the current page.
Oh and this whole query is constructed in a HUGE legacy Coldfusion file and as it's been working fine, I haven't dared thouching/changing large portions so far... never touch a running system and so on ;) Just small stuff like changing bits of the central where clause.
The file also generates other queries which do basically the same, but without the premium/non premium distinction and a lot of other variations of this query, so I'm never quite sure how a change to one of them might change the others...
Ok as the problem hasn't surfaced again, I gave Martin the bounty as he's been the most helpful so far and I didn't want the bounty to expire needlessly. Thanks to everyone else for their efforts, I'll try your suggestions if it happens again :)
This issue might arise due to a poor cardinality estimate of the number of results that will be returned by the full text query leading to a poor strategy for the JOIN operations.
How do you find performance if you break it into 2 steps?
One new step that populates a temporary table or table variable with the results of the Full Text query and the second one changing your existing query to refer to the temp table instead.
(NB: You might want to try this JOIN with and without OPTION(RECOMPILE) whilst looking at query plans for (A) a free text search term that returns many results (B) One that returns only a handful of results.)
Edit It's difficult to clarify exactly in the absence of the offending query but what I mean is instead of doing
SELECT <col-list>
FROM --Some 6 table Join
WHERE FREETEXT(...);
How does this perform?
DECLARE #Table TABLE
(
<pk-col-list>
)
INSERT INTO #Table
SELECT PK
FROM YourTable
WHERE FREETEXT(...)
SELECT <col-list>
FROM --Some 6 table Join including onto #Table
OPTION(RECOMPILE)
Usually when we have this issue, it is because of table fragmentation and stale statistics on the indexes in question.
Next time, try to EXEC sp_updatestats after a rebuild/reindex.
See Using Statistics to Improve Query Performance for more info.