I have a table with 102 columns and 43200 rows. Id column is an identity column and 2 columns have an unique index.
When I just execute
Select *
from MyTable
it takes almost 8 minutes+ over the network.
This table has a Status column which contains 1 or 0. If I select with where Status = 1, then I'm getting 31565 rows and the select is taking 6 minutes+. For your information status 1 completed and will not change ever anymore. But 0 status is working in progress and the rows are changing different columns value by different user stage.
When I select with Status = 0, it takes 1.43 minutes and returns 11568 rows.
How can I increase performance for completed and WIP status query separately or cumulatively? Can I somehow use caching?
The SQL server takes care of caching. At least as long as there is enough free RAM. When it take so long to get the data at first you need to find the bottleneck.
RAM: Is there enough to hold the full table? And is the SQL server configured to use it?
Is there an upper limit to RAM usage? If not SQL server assumes unlimited RAM and this will often end caching in page file, which causes massive slow downs
You said "8+ minutes through network". How long does it take on local execution? Maybe the network is slow
Hard drive: When the table is too big to be held in RAM it gets read from hard drive. HDDs are somewhat slow. Maybe defragmenting the indices could help here (at least somewhat)
If none helps, the SQL profiler might help to show you where the bottleneck actually is to find
This is an interesting question, but it's a little open-ended, more info is needed. I totally agree with allmhuran's comment that maybe you shouldn't be using "select * ..." for a large table. (It could in fact be posted as an answer, it deserves upvotes).
I suspect there may be design issues - Are you using BLOB's? Is the data at least partially normalized? ref https://en.wikipedia.org/wiki/Database_normalization
I Suggest create a non clustered index on "Status" Column. It improves your queries with Where Clause that uses this column.
Related
I have 1.2 million rows in Azure data table. The following command:
DELETE FROM _PPL_DETAIL WHERE RunId <> 229
is painfully slow.
There is an index on RunId.
I am deleting most of the data.
229 is a small number of records.
It has been running for an hour now
Should it take this long?
I am pretty sure it will finish.
Is there anything I can do to make operations like this faster?
The database does have a PK, although it is a dummy PK (not used). I already saw that as an optimization need to help this problem, but it still takes way too long (SQL Server treats a table without a PK differently -- much less efficient). It is still taking 1+ hour.
How about trying something like below
BEGIN TRAN
SELECT * INTO #T FROM _PPL_DETAIL WHERE RunId = 229
TRUNCATE TABLE _PPL_DETAIL
INSERT INTO _PPL_DETAIL
SELECT * FROM #T
COMMIT TRAN
Without knowing what database tier is using the database where that statment runs it is not easy to help you. However, let us tell you how the system works so that you can make this determination with a bit more investigation by yourself.
Currently the log commit rate is limited by the tier the database has. Deletes are fundamentally limited on the ability to write out log records (and replicate them to multiple machines in case your main machine dies). When you select records, you don't have to go over the network to N machines and you may not even need to go to the local disk if the records are preserved in memory, so selects are generally expected to be faster than inserts/updates/deletes because of the need to harden log for you. You can read about the specific limits for different reservation sizes are here: DTU Limits and vCore Limits.
One common problem is to do individual operations in a loop (like a cursor or driven from the client). This implies that each statement has a single row updated and thus has to harden each log record serially because the app has to wait for the statement to return before submitting the next statement. You are not hitting that since you are running a big delete as a single statement. That could be slow for other reasons such as:
Locking - if you have other users doing operations on the table, it could block the progress of the delete statement. You can potentially see this by looking at sys.dm_exec_requests to see if your statement is blocking on other locks.
Query Plan choice. If you have to scan a lot of rows to delete a small fraction, you could be blocked on the IO to find them. Looking at the query plan shape will help here, as will set statistics time on (We suggest you change the query to do TOP 100 or similar to get a sense of whether you are doing lots of logical read IOs vs. actual logical writes). This could imply that your on-disk layout is suboptimal for this problem. The general solutions would be to either pick a better indexing strategy or to use partitioning to help you quickly drop groups of rows instead of having to delete all the rows explicitly.
An additional strategy to have better performance with deletes is to perform batching.
As I know SQL Server had a change and the default DOP is 1 on their servers, so if you run the query with OPTION(MAXDOP 0) could help.
Try this:
DELETE FROM _PPL_DETAIL
WHERE RunId <> 229
OPTION (MAXDOP 0);
I'm using this SELECT Statment
SELECT ID, Code, ParentID,...
FROM myTable WITH (NOLOCK)
WHERE ParentID = 0x0
This Statment is repeated each 15min (Through Windows Service)
The problem is the database become slow to other users when this query is runnig.
What is the best way to avoid slow performance while query is running?
Generate an execution plan for your query and inspect it.
Is the ParentId field indexed?
Are there other ways you might optimize the query?
Is it possible to increase the performance of the server that is hosting SQL Server?
Does it need more disk or RAM?
Do you have separate drives (spindles) for operating system, data, transaction logs, temporary databases?
Something else to consider - must you always retrieve the very latest values from this table for your application, or might it be possible to cache prior results and use those for some length of time?
Seems your table got huge number of records. You can think of implementing page-wise retrieval of data. You can first request for say TOP 100 rows and then having multiple calls to fetch rest of data.
I still don't understand need to run such query every 15 mins. You may think of implementing a stored procedure which can perform majority of processing and return you a small subset of data. This will be good improvement if it suits your requirement.
I'm trying to copy a table with size ~50 million rows into another database on a link server. It does not have any indexes (although i wouldn't think that should make a difference). I've used the following query:
select * into [db2].[schema].[table_name]
from
openquery([linked_server_name],
'select * from [db1].[schema].[table_name]')
This took approximately 7 minutes.
This seems suspiciously long for what I intended to be a simple copy and paste. Am I missing something?
I need to run this on a regular basis and would ideally like to keep it as automated as possible (no manual copying tables across servers using SISS would be ideal)
Any ideas would be highly appreciated!
Thanks a bunch
There could be lots of different reasons at play here each having a cumulative effect on the 'slowness'.
Looking at the wait states will be the first port of call.
Indexing ( at least on the select side ) isnt an issue here , there is no predicate used ( at to a lesser extent using all the columns ) therefore how would you expect an index to be helpful ??
I would say that number of rows is not a helpful metric... How big in terms of MB / GB is the source data set ? Use the "Include Client Statistics" in SSMS to get an accurate nummber. Now, if that is 'big' how log does it take to drag a .zip file of the same size over the network ?
Without the indexing of the table, the SELECT query is bound to be slow. Indexes help in improving the performance of the select query. The indexes slow down the INSERT queries as they have to add the data as well as index it simultaneously. On the contrary, the index creation boosts up the SELECT query.
Also, is your data copied transactional in nature or you can do away with dirty read? If dirty read is fine then you can use WITH(NOLOCK) in the SELECT query to avoid and transaction issues.
I have a query with about 6-7 joined tables and a FREETEXT() predicate on 6 columns of the base table in the where.
Now, this query worked fine (in under 2 seconds) for the last year and practically remained unchanged (i tried old versions and the problem persists)
So today, all of a sudden, the same query takes around 1-1.5 minutes.
After checking the Execution Plan in SQL Server 2005, rebuilding the FULLTEXT Index of that table, reorganising the FULLTEXT index, creating the index from scratch, restarting the SQL Server Service, restarting the whole server I don't know what else to try.
I temporarily switched the query to use LIKE instead until i figure this out (which takes about 6 seconds now).
When I look at the query in the query performance analyser, when I compare the ´FREETEXT´query with the ´LIKE´ query, the former has 350 times as many reads (4921261 vs. 13943) and 20 times (38937 vs. 1938) the CPU usage of the latter.
So it really is the ´FREETEXT´predicate that causes it to be so slow.
Has anyone got any ideas on what the reason might be? Or further tests I could do?
[Edit]
Well, I just ran the query again to get the execution plan and now it takes 2-5 seconds again, without any changes made to it, though the problem still existed yesterday. And it wasn't due to any external factors, as I'd stopped all applications accessing the database when I first tested the issue last thursday, so it wasn't due to any other loads.
Well, I'll still include the execution plan, though it might not help a lot now that everything is working again... And beware, it's a huge query to a legacy database that I can't change (i.e. normalize data or get rid of some unneccessary intermediate tables)
Query plan
ok here's the full query
I might have to explain what exactly it does. basically it gets search results for job ads, where there's two types of ads, premium ones and normal ones. the results are paginated to 25 results per page, 10 premium ones up top and 15 normal ones after that, if there are enough.
so there's the two inner queries that select as many premium/normal ones as needed (e.g. on page 10 it fetches the top 100 premium ones and top 150 normal ones), then those two queries are interleaved with a row_number() command and some math. then the combination is ordered by rownumber and the query is returned. well it's used at another place to just get the 25 ads needed for the current page.
Oh and this whole query is constructed in a HUGE legacy Coldfusion file and as it's been working fine, I haven't dared thouching/changing large portions so far... never touch a running system and so on ;) Just small stuff like changing bits of the central where clause.
The file also generates other queries which do basically the same, but without the premium/non premium distinction and a lot of other variations of this query, so I'm never quite sure how a change to one of them might change the others...
Ok as the problem hasn't surfaced again, I gave Martin the bounty as he's been the most helpful so far and I didn't want the bounty to expire needlessly. Thanks to everyone else for their efforts, I'll try your suggestions if it happens again :)
This issue might arise due to a poor cardinality estimate of the number of results that will be returned by the full text query leading to a poor strategy for the JOIN operations.
How do you find performance if you break it into 2 steps?
One new step that populates a temporary table or table variable with the results of the Full Text query and the second one changing your existing query to refer to the temp table instead.
(NB: You might want to try this JOIN with and without OPTION(RECOMPILE) whilst looking at query plans for (A) a free text search term that returns many results (B) One that returns only a handful of results.)
Edit It's difficult to clarify exactly in the absence of the offending query but what I mean is instead of doing
SELECT <col-list>
FROM --Some 6 table Join
WHERE FREETEXT(...);
How does this perform?
DECLARE #Table TABLE
(
<pk-col-list>
)
INSERT INTO #Table
SELECT PK
FROM YourTable
WHERE FREETEXT(...)
SELECT <col-list>
FROM --Some 6 table Join including onto #Table
OPTION(RECOMPILE)
Usually when we have this issue, it is because of table fragmentation and stale statistics on the indexes in question.
Next time, try to EXEC sp_updatestats after a rebuild/reindex.
See Using Statistics to Improve Query Performance for more info.
I'm having a very particular performance problem at work!
In the system we're using there's a table that holds information about the current workflow process. One of the fields holds a spreadsheet that contains metadata about the process (don't ask me why!! and NO I CAN'T CHANGE IT!!)
The problem is that this spreadsheet is stored in an IMAGE field in an SQL Server 2005 (within a database set with SQL 2000 compatibility).
This table currently has 22K+ lines and even a simple query like this:
SELECT TOP 100 *
FROM OFFENDING_TABLE
Takes 30 seconds to retrieve the data in Query Analyser.
I'm thinking about updating the compatibility to SQL 2005 (once that I was informed that the app can handle it).
The second thing I'm thinking is to change the data-type of the column to varbinary(max) but I don't know if doing this will affect the application.
Another thing that I'm considering is to use sp_tableoption to set the large value types out of row to 1 as it's currently 0, but I have no information if doing this will improve performance.
Does anyone know how to improve performance in such scenario?
Edited to clarify
My problem is that I have no control on what the application asks to the SQL Server, and I did some Reflection on it (the app is a .NET 1.1 website) and it uses the offending field for some internal stuff that I have no idea what it is.
I need to improve the overall performance of this table.
I'd recommend you look into the offending table layout health:
select * from sys.dm_db_index_physical_stats(
db_id(), object_id('offending_table'), null, null, detailed);
Things too look for are avg_fragmentation_in_percent, page_count, avg_page_space_used_in_percent, record_count and ghost_record_count. Cues like high fragmentation, or a high number of ghost records, or a low page used percent indicate problems and things can be improved quite a bit just by rebuilding the index (ie. the table) from scratch:
ALTER INDEX ALL ON offending_table REBUILD;
I'm saying this considering that you cannot change the table nor the app. If you'd be able to change the table and the app, the advice you already got is good advice (don't use '*', dont' select w/o a condition, use the newer varbinary(max) type etc etc).
I'd also look into the average page lifetime in performance counters to understand if the system is memory starved. From your description of the symptomps the system looks IO bound which leads me to think there is little page caching going on, and more RAM could help, as well as a faster IO subsytem. On a SQL 2008 system I would also suggest turning page compression on, but on 2005 you can't.
And, just to be sure, make sure the queries are not blocked by contention from the app itself, ie. the query doesn't spend 90% of that 30 seconds waiting for a row lock. Look at sys.dm_exec_requests while the query is running, see the wait_time, wait_type and wait_resource. Is it PAGEIOLATCH_XX? Or is it a lock? Also, how is the sys.dm_os_wait_stats in your server, what are the top wait reasons?
First of all - don't ever do a SELECT * in production code - reporting or not.
You have three basic choices:
move that blob field out into a separate table if it's not always needed; probably not practical since you mention you cannot change the schema
be more careful with your SELECT statements to select only those fields that you really need - and omit the blob field
see if you can limit your query to include a WHERE clause and find a way to optimize the query plan by e.g. adding a suitable index to the table (if you can)
There's no magic "make this faster" switch - but you can optimize your query or optimize your table layout. Both help. If you can't change anything - neither the table layout, nor add an index, nor change the queries, you'll have a hard time optimizing anything, I'm afraid....
Just changing the field to VARBINARY(MAX) won't change anything at all - no performance improvement to be expected just from changing the data type.
A short answer is to only do SELECTs against multiple rows when the fields returned do not include the offending image field, ie no SELECT *. If you want the value of the image field, retrieve it on a case-by-case basis.
Setting the large value types out of row option should definitely help performance. The row size will be significantly smaller, SQL Server can do a lot fewer physical reads to get throught the table.