I'm using this SELECT statement:
SELECT ID, Code, ParentID,...
FROM myTable WITH (NOLOCK)
WHERE ParentID = 0x0
This statement is repeated every 15 minutes (through a Windows service).
The problem is that the database becomes slow for other users while this query is running.
What is the best way to avoid slow performance while the query is running?
Generate an execution plan for your query and inspect it.
Is the ParentID field indexed? (If not, see the sketch after these questions.)
Are there other ways you might optimize the query?
Is it possible to increase the performance of the server that is hosting SQL Server?
Does it need more disk or RAM?
Do you have separate drives (spindles) for operating system, data, transaction logs, temporary databases?
Something else to consider - must you always retrieve the very latest values from this table for your application, or might it be possible to cache prior results and use those for some length of time?
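On the indexing question: if ParentID is not indexed, something along these lines might help. This is only a sketch; the index name and the INCLUDE list are assumptions based on the columns shown in the question.
CREATE NONCLUSTERED INDEX IX_myTable_ParentID
    ON myTable (ParentID)
    INCLUDE (Code);   -- add the other selected columns so the index fully covers the query
With a covering index the statement becomes a seek on ParentID instead of a scan of the whole table, which also shortens how long locks are held.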
It seems your table has a huge number of records. You could implement page-wise retrieval of the data: first request, say, the TOP 100 rows, then make further calls to fetch the rest (see the sketch after this answer).
I still don't understand the need to run such a query every 15 minutes. You might consider a stored procedure that performs the majority of the processing and returns only a small subset of the data. That would be a good improvement if it suits your requirements.
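A minimal keyset-paging sketch, assuming ID is a unique, indexed integer key (table and column names taken from the question):
DECLARE @lastId INT = 0;   -- highest ID already processed; the Windows service keeps this between calls

SELECT TOP (100) ID, Code, ParentID
FROM myTable
WHERE ParentID = 0x0
  AND ID > @lastId
ORDER BY ID;
-- Repeat with @lastId set to the largest ID returned, until no more rows come back.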
Related
I have 1.2 million rows in an Azure data table. The following command:
DELETE FROM _PPL_DETAIL WHERE RunId <> 229
is painfully slow.
There is an index on RunId.
I am deleting most of the data.
229 is a small number of records.
It has been running for an hour now
Should it take this long?
I am pretty sure it will finish.
Is there anything I can do to make operations like this faster?
The table does have a PK, although it is a dummy PK (not used). I had already seen that suggested as an optimization for this kind of problem (SQL Server treats a table without a PK differently -- much less efficiently), but the delete still takes way too long: over an hour.
How about trying something like the following:
BEGIN TRAN
-- Copy the rows to keep into a temp table, empty the table with a fast,
-- minimally logged TRUNCATE, then put the kept rows back.
SELECT * INTO #T FROM _PPL_DETAIL WHERE RunId = 229
TRUNCATE TABLE _PPL_DETAIL
INSERT INTO _PPL_DETAIL
SELECT * FROM #T
COMMIT TRAN
Without knowing which service tier the database running that statement is on, it is not easy to help you. However, here is how the system works, so that you can make this determination with a bit more investigation of your own.
Currently the log commit rate is limited by the tier of the database. Deletes are fundamentally limited by the ability to write out log records (and replicate them to multiple machines in case your main machine dies). When you select records, you don't have to go over the network to N machines, and you may not even need to go to the local disk if the records are already in memory, so selects are generally expected to be faster than inserts/updates/deletes, which have to harden the log. You can read about the specific limits for the different reservation sizes here: DTU Limits and vCore Limits.
One common problem is to do individual operations in a loop (like a cursor or driven from the client). This implies that each statement has a single row updated and thus has to harden each log record serially because the app has to wait for the statement to return before submitting the next statement. You are not hitting that since you are running a big delete as a single statement. That could be slow for other reasons such as:
Locking - if you have other users doing operations on the table, it could block the progress of the delete statement. You can potentially see this by looking at sys.dm_exec_requests to see if your statement is blocking on other locks.
Query plan choice. If you have to scan a lot of rows to delete a small fraction, you could be blocked on the IO to find them. Looking at the query plan shape will help here, as will SET STATISTICS TIME ON (we suggest changing the query to a TOP 100 or similar to get a sense of whether you are doing lots of logical read IOs vs. actual logical writes). This could imply that your on-disk layout is suboptimal for this problem. The general solutions would be to either pick a better indexing strategy or to use partitioning to help you quickly drop groups of rows instead of having to delete all the rows explicitly.
An additional strategy to have better performance with deletes is to perform batching.
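A common batching sketch (the batch size of 4000 is arbitrary; tune it for your tier):
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    -- Each iteration is its own small transaction, so the amount of log that
    -- must be hardened and replicated per commit stays bounded, and other
    -- sessions get a chance to acquire locks between batches.
    DELETE TOP (4000) FROM _PPL_DETAIL
    WHERE RunId <> 229;

    SET @rows = @@ROWCOUNT;
END;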
As far as I know, SQL Server made a change and the default DOP is 1 on these servers, so running the query with OPTION (MAXDOP 0) could help.
Try this:
DELETE FROM _PPL_DETAIL
WHERE RunId <> 229
OPTION (MAXDOP 0);
Imagine I have a big table with 20 columns and a billion rows of data. Then I run a simple query like:
select [First Name], [Last Name]
from Audience;
After that I read the result set sequentially. Will SQL Server physically create all records (i.e. a billion records) in the result set on the server side before I can start reading it? Is there any query plan that will build the result set dynamically while feeding it to the client?
I understand that concurrency reasons may prevent this. Can I give any hint that multi-user access is not possible? Maybe I should use cursors?
It depends on the query plan. If the query does not require any temporary internal structures, then yes, you get an immediate response even before the full recordset has been constructed. If the query does require temporary internal storage (e.g. you are sorting it in a manner that doesn't match any index, or an index is available but a different one is used because it requires less I/O), then you will have to wait until the full recordset is constructed.
The only way to tell is to look at the query plan and examine each and every step. You will need to know how to interpret them... for example, a DISTINCT will require a temporary structure whereas a FLOW DISTINCT will not. If the query plan shows an EAGER SPOOL you will definitely have to wait, although there are a few things you can do to avoid them.
Note: You can't rely on this-- query plans can change depending not just on schema or indexes but on database statistics (e.g. selectivity), which are always changing.
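As a rough illustration with the table from the question (assuming no index exists on [Last Name]):
-- Streams: a simple scan can hand rows to the client as soon as they are read.
SELECT [First Name], [Last Name] FROM Audience;

-- Blocks: without a supporting index, the plan needs a Sort operator, which must
-- consume every row before the first one can be returned to the client.
SELECT [First Name], [Last Name] FROM Audience ORDER BY [Last Name];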
I'm trying to copy a table with ~50 million rows into another database on a linked server. It does not have any indexes (although I wouldn't think that should make a difference). I've used the following query:
select * into [db2].[schema].[table_name]
from
openquery([linked_server_name],
'select * from [db1].[schema].[table_name]')
This took approximately 7 minutes.
This seems suspiciously long for what I intended to be a simple copy and paste. Am I missing something?
I need to run this on a regular basis and would ideally like to keep it as automated as possible (no manual copying of tables across servers using SSIS would be ideal).
Any ideas would be highly appreciated!
Thanks a bunch
There could be lots of different reasons at play here each having a cumulative effect on the 'slowness'.
Looking at the wait stats would be the first port of call.
Indexing (at least on the SELECT side) isn't an issue here: there is no predicate used (and, to a lesser extent, you are selecting all the columns), so how would you expect an index to be helpful?
I would say that the number of rows is not a helpful metric. How big, in terms of MB/GB, is the source data set? Use "Include Client Statistics" in SSMS to get an accurate number. Now, if that is 'big', how long does it take to drag a .zip file of the same size over the network?
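For a quick size check on the source server (keeping the placeholder name from the question):
-- Reports row count plus reserved, data, and index size for the table.
EXEC sp_spaceused N'schema.table_name';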
Without an index on the table, a SELECT query is bound to be slow. Indexes improve the performance of SELECT queries; they slow down INSERT queries, which have to write the data and maintain the index at the same time.
Also, does the copied data need to be transactionally consistent, or can you live with dirty reads? If dirty reads are fine, you can use WITH (NOLOCK) in the SELECT query to avoid locking issues.
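A sketch of that; note the hint has to go inside the pass-through text so it is applied on the remote server (names as in the question):
select * into [db2].[schema].[table_name]
from
    openquery([linked_server_name],
        'select * from [db1].[schema].[table_name] with (nolock)')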
I'm using PostgreSQL in a clustered database (Stado) on two nodes. I managed to configure the Stado coordinator and node agents successfully, but when I run a heavy query the first time, it takes too long to show results; after that it is fast.
When I restart the server it becomes slow again. It's as if Stado does some caching or something. I thought the problem was Stado initialization, and so configured the agents, but the problem still exists. Any ideas?
EDIT
Query:
SELECT id,position,timestamp
FROM table t1
WHERE id <> 0
AND ST_Intersects(ST_Buffer_Meters(ST_SetSRID(
ST_MakePoint(61.4019, 15.218205), 4326), 1160006), position)
AND timestamp BETWEEN '2013-10-01' AND '2014-01-01';
Explain:
Step 0
_______
Target: CREATE UNLOGGED TABLE "TMPTT7_1" ( "XCOL1" INT) WITHOUT OIDS
SELECT: SELECT count(*) AS "XCOL1" FROM "t1" WHERE "t1"."timestamp" BETWEEN '2013-10-01' AND '2014-01-01' AND ("t1"."id"<>0) AND ST_Intersects(ST_Buffer_Meters(ST_SetSRID(
ST_MakePoint(61.4019, 15.218205), 4326), 1160006), "t1"."position")
Step: 1
_______
Select: SELECT SUM("XCOL1") AS "EXPRESSION6" FROM "TMPTT7_1"
Drop:
TMPTT7_1
Two reasons.
Caching, obviously. When a query is executed for the first time with a cold cache, the cache gets populated. That goes for the system cache as well as the database cache; both work together, at least in standard Postgres. It can make a huge difference.
Query plan caching, possibly. To a much lesser degree. If you run the same query in a single session repeatedly, plans for PL/pgSQL functions for instance are cached.
Depending on your type of connection to the database, there may also be network latency, which may be higher for the first call.
Caching in memory is the reason, that is correct. A good tip for this type of situation is to "warm up" the database each time you restart it, with a script that runs the query (or a similar query that accesses the same data). In some cases I have seen several "warm-up" queries run after any type of restart, and users still have a good experience. You will still have to wait for the warm-up query to finish after a restart, but at least it will not be a user waiting for it.
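A minimal warm-up sketch, assuming the underlying PostgreSQL nodes can be reached directly and are at least version 9.4, where the pg_prewarm extension ships; on older versions the warm-up script can simply re-run the heavy query itself:
-- Pull the table's pages into the buffer cache right after a restart.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('t1');
-- If an index on position or timestamp exists, prewarm it as well, e.g.
-- SELECT pg_prewarm('t1_position_idx');   -- hypothetical index name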
The other possibility is that you are running a non-indexed query; you should check for that. If the query is indexed and accesses a reasonable amount of data by a key, then it should be fast (even without the warm-up, for most queries). This is a very common problem and easy to miss. Use the Postgres EXPLAIN command; it will show you how the query is being executed against the database (i.e., with an index or without).
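For example, with the query and table name from the question (run directly against one of the PostgreSQL nodes, since the Stado coordinator's own explain output shown above is not the per-node plan; ANALYZE and BUFFERS actually execute the statement, so expect it to take as long as a cold run):
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, position, timestamp
FROM t1
WHERE id <> 0
  AND ST_Intersects(ST_Buffer_Meters(ST_SetSRID(
      ST_MakePoint(61.4019, 15.218205), 4326), 1160006), position)
  AND timestamp BETWEEN '2013-10-01' AND '2014-01-01';
-- A "Seq Scan" on t1 means no index is used; a GiST index on position and a
-- b-tree index on timestamp are the usual candidates for this kind of predicate.
-- "shared read" counts are blocks coming from disk, "shared hit" are from cache.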
I have a query that has been running every day for a little over 2 years now and has typically taken less than 30 seconds to complete. All of a sudden, yesterday, the query started taking 3+ hours to complete and was using 100% CPU the entire time.
The SQL is:
SELECT
@id,
alpha.A, alpha.B, alpha.C,
beta.X, beta.Y, beta.Z,
alpha.P, alpha.Q
FROM
[DifferentDatabase].dbo.fnGetStuff(@id) beta
INNER JOIN vwSomeData alpha ON beta.id = alpha.id
alpha.id is a BIGINT type and beta.id is an INT type. dbo.fnGetStuff() is a simple SELECT statement with 2 INNER JOINs on tables in the same DB, using a WHERE id = @id. The function returns approximately 11,000 results.
The view vwSomeData is a simple SELECT statement with two INNER JOINs that returns about 590000 results.
Both the view and the function will complete in less than 10 seconds when executed by themselves. Selecting the results of the function into a temporary table first and then joining on that makes the query finish in < 10 seconds.
How do I troubleshoot what's going on? I don't see any locks in the activity manager.
Look at the query plan. My guess is that there is a table scan (or more than one) in the execution plan. This will cause huge amounts of I/O for the few records you get in the result.
You could use the SQL Server Profiler tool to monitor what queries are running on SQL Server. It doesn't show the locks, but it can for instance also give you hints on how to improve your query by suggesting indexes.
If you've got a reasonably recent version of SQL Server Management Studio, it has a Database Tuning Advisor as well, under Tools. It takes a trace from the Profiler and makes some, sometimes highly useful, suggestions. Make sure there are not too many queries - it takes a long time to build the advice.
I'm not an expert on it, but have had some luck with it in the past.
Do you need to use a function? Could you re-write the entire thing as a stored procedure in which you pass in @id as a parameter?
Even if your table has indexes, passing @id as a variable in the WHERE clause can prevent them from being used, potentially greatly increasing the amount of time the query takes to run.
The reason the indexes may not be used is that the Query Analyzer does not know the value of the variables when it selects an access method to perform the query. Because this is a batch, only one pass is made over the Transact-SQL code, preventing the Query Optimizer from knowing what it needs to know in order to select an access method that uses the indexes.
You might want to consider an INDEX query hint if you cannot re-write the SQL.
It might also be possible, since this just started happening, that the indexes have become fragmented and might need to be rebuilt.
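Rough sketches of both suggestions; the table, column, and index names here are hypothetical:
-- Force a specific index if the optimizer will not pick it on its own:
DECLARE @Id INT = 42;
SELECT *
FROM dbo.SomeTable WITH (INDEX (IX_SomeTable_Id))
WHERE Id = @Id;

-- Rebuild fragmented indexes (REORGANIZE is the lighter-weight, online alternative):
ALTER INDEX ALL ON dbo.SomeTable REBUILD;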
I've had similar problems with joining functions that return large datasets. I had to do what you've already suggested. Put the results in a temp table and join on that.
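A sketch of that workaround, using the objects from the question (@id as in the original query; the temp table name is arbitrary):
-- Materialize the function's ~11,000 rows first; the optimizer then has real
-- row counts to work with instead of a fixed guess for the table-valued function.
SELECT *
INTO #stuff
FROM [DifferentDatabase].dbo.fnGetStuff(@id);

SELECT @id,
       alpha.A, alpha.B, alpha.C,
       beta.X, beta.Y, beta.Z,
       alpha.P, alpha.Q
FROM #stuff beta
INNER JOIN vwSomeData alpha ON beta.id = alpha.id;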
Look at the estimated plan, this will probably shed some light. Typically when query cost gets orders of magnitude more expensive it is because a loop or merge join is being used where a hash join is more appropriate. If you see a loop or merge join in the estimated plan, look at the number of rows it expects to process - is it far smaller than the number of rows you know will actually be in play? You can also specify a hint to use a hash join and see if it performs much better. If so, try updating statistics and see if it goes back to a hash join without a hint.
SELECT
@id,
alpha.A, alpha.B, alpha.C,
beta.X, beta.Y, beta.Z,
alpha.P, alpha.Q
FROM
[DifferentDatabase].dbo.fnGetStuff(@id) beta
INNER HASH JOIN vwSomeData alpha ON beta.id = alpha.id
-- having no idea what type of schema is in place and just trying to throw out ideas:
Like others have said, use Profiler and find the source of pain, but I'm thinking it is the function on the other database. Since that function might be a source of pain, have you thought about a little denormalization or anything else on [DifferentDatabase]? I think you'll find a bit more scalability in joining to a more flattened table with indexes than to a costly function.
Run this command:
SET SHOWPLAN_ALL ON
Then run your query. It will display the execution plan; look for a "SCAN" on an index or a table. That is most likely what is happening to your query now. If that is the case, try to figure out why it is not using indexes now (refresh statistics, etc.).
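For example, once you have read the plan (table name here is hypothetical):
SET SHOWPLAN_ALL OFF;   -- plan-only mode must be switched off before anything will execute again
GO
EXEC sp_updatestats;    -- refresh statistics database-wide
-- or, for a single table: UPDATE STATISTICS dbo.SomeTable;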