Does SQL Server always physically build the full result set?

Imagine I have a big table with 20 columns and a billion rows of data. Then I run a simple query like:
select [First Name], [Last Name]
from Audience;
After that I read the result set sequentially. Will SQL Server physically create all records (i.e. a billion records) in the result set on the server side before I start reading it? Is there any query plan that will build the result set dynamically while feeding it to the client?
I understand that concurrency concerns may prevent this. Can I give any hint that multi-user access is not a possibility? Maybe I should use cursors?

Depends on the query plan. If the query does not require any temporary internal structures, then yes, you get an immediate response even before the full recordset has been constructed. If the query does require temporary internal storage (e.g. you are sorting in a manner that doesn't match any index, or an index is available but a different one is used because it requires less I/O), then you will have to wait until the full recordset is constructed.
The only way to tell is to look at the query plan and examine each and every step. You will need to know how to interpret them; for example, a DISTINCT requires a temporary structure, whereas a FLOW DISTINCT does not. If the query plan shows an EAGER SPOOL, you will definitely have to wait, although there are a few things you can do to avoid eager spools.
Note: you can't rely on this, because query plans can change depending not just on schema or indexes but on database statistics (e.g. selectivity), which are always changing.
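To make the distinction concrete, here is a sketch against the question's Audience table, assuming no index exists on [Last Name]; the first query can stream, the second must materialize, and OPTION (FAST n) is a hint that asks the optimizer to favor plans returning the first n rows quickly:
-- Can stream: rows are returned as soon as the scan produces them.
select [First Name], [Last Name]
from Audience;
-- Blocks: with no index on [Last Name], the Sort operator must consume
-- all rows before the first one is sent to the client.
select [First Name], [Last Name]
from Audience
order by [Last Name];
-- Bias the optimizer toward returning the first rows quickly:
select [First Name], [Last Name]
from Audience
option (fast 100);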

Select statement for over 500k records

I'm using this SELECT statement:
SELECT ID, Code, ParentID,...
FROM myTable WITH (NOLOCK)
WHERE ParentID = 0x0
This statement is repeated every 15 minutes (through a Windows service).
The problem is that the database becomes slow for other users when this query is running.
What is the best way to avoid slow performance while the query is running?
Generate an execution plan for your query and inspect it.
Is the ParentId field indexed?
Are there other ways you might optimize the query?
Is it possible to increase the performance of the server that is hosting SQL Server?
Does it need more disk or RAM?
Do you have separate drives (spindles) for operating system, data, transaction logs, temporary databases?
Something else to consider - must you always retrieve the very latest values from this table for your application, or might it be possible to cache prior results and use those for some length of time?
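If ParentID turns out not to be indexed, a covering index along these lines would let the query seek rather than scan the whole table. This is only a sketch; the INCLUDE list is guessed from the abbreviated column list in the question:
CREATE NONCLUSTERED INDEX IX_myTable_ParentID
ON myTable (ParentID)
INCLUDE (ID, Code);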
It seems your table has a huge number of records. You could consider page-wise retrieval of the data: first request, say, the TOP 100 rows, then issue additional calls to fetch the rest.
I still don't understand the need to run such a query every 15 minutes. You might consider a stored procedure that performs the majority of the processing and returns only a small subset of the data. That would be a good improvement if it suits your requirements.
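A minimal sketch of that paging idea, assuming ID is an integer key you can seek on (the table and column names come from the question; the page size of 100 is arbitrary):
DECLARE @LastID int = 0;  -- largest ID seen on the previous page
SELECT TOP (100) ID, Code, ParentID
FROM myTable WITH (NOLOCK)
WHERE ParentID = 0x0
  AND ID > @LastID
ORDER BY ID;
-- Repeat, setting @LastID to the last ID returned, until no rows come back.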

How to compare sql server query performance

I have two ways to get the same result from SQL Server.
First way:
select * from my_view where key='key'
Second way:
select * from my_function('key')
The two ways do the same thing.
The database is currently small, so both execution times are very low and I can't tell which one takes less time, but I'm not sure they will stay equivalent as the database grows.
So my question is: is there any way, or any free tool, to compare the exact execution time of the two queries?
key='key' will always be as fast or faster. Never use a function on the column name (if needed, apply the function to the value in the condition instead, e.g. key = function('key')). If you wrap the column in a function, SQL Server is not able to use any optimizations on the query (e.g. indexes will be ignored).
To ensure the query keeps its performance level as your table grows, you will have to create an index on the column.
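For example (a sketch only: the question never names the base table behind my_view, so myTable here is an assumption, and [key] is bracketed because KEY is a reserved word in T-SQL):
CREATE NONCLUSTERED INDEX IX_myTable_key
ON myTable ([key]);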
You can deduce performance from the execution plan in SQL Server Management Studio (execute the query with Include Actual Execution Plan enabled, Ctrl+M). You can also enable Include Client Statistics (Ctrl+Alt+S) to see some performance info.
If you run multiple queries in a batch (one query window) with Include Actual Execution Plan enabled, the plan pane gives each query a cost relative to the batch.
So if you have two queries in the batch and one is under 50%, it's the faster of the two.
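Another option that gives exact numbers: enable time and I/O statistics and run both queries in one batch (object names are the ones from the question; [key] is bracketed because KEY is a reserved word). The Messages tab then reports CPU time, elapsed time, and logical reads per statement:
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
SELECT * FROM my_view WHERE [key] = 'key';
SELECT * FROM my_function('key');
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;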

no more spool space in Database

I am using Teradata and I am getting the error 'no more spool space in Database'. My database utilization is 85%.
Is there any relationship between this error and the database utilization factor?
Any material on this would help me resolve it.
Please share your ideas on how to avoid this error.
Spool space problems occur either when you have an inefficient query or when statistics have not been properly collected on the tables you are using. It can also happen with tables where the primary index was poorly chosen (high skew). Spool is an attribute of the user account you are using to connect to the Teradata environment; it is not really an attribute of the database itself.
The only way to know for certain is to look at the EXPLAIN plan for your query.
If your query is inefficient, rewrite it. If statistics need to be collected or if the index needs to be altered, contact the DBA responsible for the tables you are using.
If there is a particular query that is giving you an "out of spool" error, update this question with the complete text of the query.
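As a starting point, the two standard checks look like this (object names here are placeholders, since the question shows no query):
EXPLAIN
SELECT *
FROM my_table
WHERE my_col = 'some value';
-- The EXPLAIN output shows the optimizer's plan, including estimated spool usage.
COLLECT STATISTICS ON my_table COLUMN (my_col);
-- Refreshes the statistics the optimizer relies on for those estimates.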
I was not able to resolve my "out of spool" error by the methods above. I resolved it by moving a rank function into its own small table, without any joins or extraneous columns.
Spool space errors can also occur when you work with tables holding large amounts of data. If you are using multiple tables, check that you are using alias names instead of repeating the complete table reference; using alias names actually narrows down the data produced by the joins. Also check whether functions like OREPLACE, which consume a lot of spool, are being used; in that case, try regular expressions instead.
Finally, the spool space allocated to your user may simply be set too low.
You need to specify a new value for SPOOL in a MODIFY PROFILE or MODIFY USER statement, depending on where the user spool is defined. The syntax is:
MODIFY [PROFILE profile_name | USER user_name ] AS SPOOL = 100000000 bytes ;

Slow query with cfqueryparam searching on indexed column containing hashes

I have the following query that runs in 16ms - 30ms.
<cfquery name="local.test1" datasource="imagecdn">
SELECT hash FROM jobs WHERE hash in(
'EBDA95630915EB80709C69089315399B',
'3617B8E6CF0C62ECBD3C48DDF8585466',
'D519A38F09FDA868A2FEF1C55C9FEE76',
'135F94C3774F7719CFF8FF3A275D2D05',
'D58FAE69C559273D8427673A08193789',
'2BD7276F209768F2FCA6635659D7922A',
'B1E3CFBFCCFF6F5B48A849A050E6D424',
'2288F5B8A797F5302E8CA24323617236',
'8951883E36B5D38A4643DFAA0396BF13',
'839210BD564E30BE1355D1A6D4EF7081',
'ED4A2CB0C28B608C29576819CF7BE19B',
'CB26925A4874945B810707D5FF0B91F2',
'33B2FC229F0CC797A02AD163CDBA0875',
'624986E7547DBAC0F47B3005CFDE0A16',
'6F692C289BD805CEE41EF59F83F16F4D',
'8551F0033C617BD9EADAAD6CEC4B3E9E',
'94C3C0A74C2DE085FF9F1BBF928821A4',
'28DC1A9D2A69C2EDF5E6C0E6368A0B3C'
)
</cfquery>
If I execute the same query but use cfqueryparam it runs in 500ms - 2000ms.
<cfset local.hashes = "[list of the same ids as above]">
<cfquery name="local.test2" datasource="imagecdn">
SELECT hash FROM jobs WHERE hash in(
<cfqueryparam cfsqltype="cf_sql_varchar" value="#local.hashes#" list="yes">
)
</cfquery>
The table has roughly 60,000 rows. The "hash" column is varchar(50) and has a unique non-clustered index, but is not the primary key. DB server is MSSQL 2008. The web server is running the latest version of CF9.
Any idea why the cfqueryparam causes the performance to bomb out? It behaves this way every single time, no matter how many times I refresh the page. If I pare the list down to only 2 or 3 hashes, it still performs poorly, at around 150-200ms. When I eliminate the cfqueryparam the performance is as expected. In this situation there is the possibility of SQL injection, so using cfqueryparam would certainly be preferable, but it shouldn't take 100ms to find 2 records in an indexed column.
Edits:
We are using hashes generated by hash(), not UUIDs or GUIDs. The hash is generated by hash(SerializeJSON({ struct })), where the struct contains the plan for a set of operations to execute on an image. The purpose of this is that it lets us know, before insert and before query, the exact unique id for that structure. These hashes act as an "index" of which structures have already been stored in the DB. In addition, with hashes the same structure always hashes to the same result, which is not true for UUIDs and GUIDs.
The query is being executed on 5 different CF9 servers and all of them exhibit the same behavior. To me this rules out the idea that CF9 is caching something. All servers connect to the exact same DB, so if caching were occurring it would have to be at the DB level.
Your issue may be related to VARCHAR vs NVARCHAR. These two links may help:
Querying MS SQL Server G/UUIDs from ColdFusion and
nvarchar vs. varchar in SQL Server, BEWARE
What might be happening: there is a setting in ColdFusion Administrator that controls whether cfqueryparam sends varchars as Unicode or not. If that setting does not match the column type (in your case, if the setting is enabled), MS SQL will not use the index.
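You can see the effect directly by comparing the plans of these two queries (a sketch against the question's jobs table; exactly how badly the first degrades depends on the SQL Server version):
-- Value sent as NVARCHAR: the varchar hash column is implicitly
-- converted, which typically prevents a plain index seek:
SELECT hash FROM jobs WHERE hash = N'EBDA95630915EB80709C69089315399B';
-- Value sent as VARCHAR: a straightforward seek on the index:
SELECT hash FROM jobs WHERE hash = 'EBDA95630915EB80709C69089315399B';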
As Mark points out, it has probably got a bad execution plan in the cache. One of the advantages of cfqueryparam is that when you pass in different values SQL Server can reuse the cached plan it has for that statement. This is why you see no improvement when you try it with a smaller list. When you do not use cfqueryparam, SQL Server has to work out the execution plan each time. That is normally a bad thing, unless there is a sub-optimal plan in the cache. Try clearing the cache as explained here: http://www.devx.com/tips/Tip/14401 . Hopefully the next time you run your statement with cfqueryparam it will cache the better plan.
Make sense?
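For reference, the blunt way to clear the plan cache on MS SQL is the statement below; it flushes cached plans for the whole instance, so try it on a test server before production:
DBCC FREEPROCCACHE;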
I don't think cfqueryparam is causing the issue. Given the big jump in execution time you mention, the index may not be used when the query runs with cfqueryparam. I created the same scenario on my development computer, but I got the same execution time with and without cfqueryparam. There may be some overhead with the list, since the first query passes it directly as text while ColdFusion has to build query parameters from the provided list, but that should not cost this much. I suggest starting SQL Server Profiler and monitoring the queries executed on the server; that will give you a better picture of what is costing the extra 500 ms.

How to improve performance in SQL Server table with image fields?

I'm having a very particular performance problem at work!
In the system we're using there's a table that holds information about the current workflow process. One of the fields holds a spreadsheet that contains metadata about the process (don't ask me why!! and NO I CAN'T CHANGE IT!!)
The problem is that this spreadsheet is stored in an IMAGE field in an SQL Server 2005 (within a database set with SQL 2000 compatibility).
This table currently has 22K+ rows, and even a simple query like this:
SELECT TOP 100 *
FROM OFFENDING_TABLE
takes 30 seconds to retrieve the data in Query Analyzer.
I'm thinking about updating the compatibility level to SQL 2005 (since I was informed that the app can handle it).
The second thing I'm considering is to change the data type of the column to varbinary(max), but I don't know if doing this will affect the application.
Another thing I'm considering is to use sp_tableoption to set the large value types out of row option to 1 (it's currently 0), but I have no information on whether doing this will improve performance.
Does anyone know how to improve performance in such scenario?
Edited to clarify
My problem is that I have no control over what the application asks of the SQL Server. I did some reflection on it (the app is a .NET 1.1 website), and it uses the offending field for some internal stuff that I can't identify.
I need to improve the overall performance of this table.
I'd recommend you look into the offending table layout health:
select * from sys.dm_db_index_physical_stats(
    db_id(), object_id('offending_table'), null, null, 'detailed');
Things to look for are avg_fragmentation_in_percent, page_count, avg_page_space_used_in_percent, record_count and ghost_record_count. Cues like high fragmentation, a high number of ghost records, or a low page-used percentage indicate problems, and things can be improved quite a bit just by rebuilding the index (i.e. the table) from scratch:
ALTER INDEX ALL ON offending_table REBUILD;
I'm saying this considering that you cannot change the table nor the app. If you were able to change the table and the app, the advice you already got is good advice (don't use '*', don't select without a condition, use the newer varbinary(max) type, etc.).
I'd also look at the Page Life Expectancy performance counter to understand whether the system is memory starved. From your description of the symptoms the system looks I/O bound, which leads me to think there is little page caching going on; more RAM could help, as well as a faster I/O subsystem. On a SQL 2008 system I would also suggest turning page compression on, but on 2005 you can't.
And, just to be sure, make sure the queries are not blocked by contention from the app itself, i.e. that the query doesn't spend 90% of those 30 seconds waiting for a row lock. Look at sys.dm_exec_requests while the query is running and check the wait_time, wait_type and wait_resource. Is it PAGEIOLATCH_XX? Or is it a lock? Also, what does sys.dm_os_wait_stats look like on your server; what are the top wait reasons?
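A quick sketch of those two checks (the session_id filter is just to skip system sessions):
-- What is each active request waiting on right now?
select session_id, status, command, wait_type, wait_time, wait_resource
from sys.dm_exec_requests
where session_id > 50;
-- Server-wide: which wait types dominate?
select top (10) wait_type, wait_time_ms, waiting_tasks_count
from sys.dm_os_wait_stats
order by wait_time_ms desc;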
First of all - don't ever do a SELECT * in production code - reporting or not.
You have three basic choices:
move that blob field out into a separate table if it's not always needed; probably not practical since you mention you cannot change the schema
be more careful with your SELECT statements to select only those fields that you really need - and omit the blob field
see if you can limit your query to include a WHERE clause and find a way to optimize the query plan by e.g. adding a suitable index to the table (if you can)
There's no magic "make this faster" switch - but you can optimize your query or optimize your table layout. Both help. If you can't change anything - neither the table layout, nor add an index, nor change the queries, you'll have a hard time optimizing anything, I'm afraid....
Just changing the field to VARBINARY(MAX) won't change anything at all - no performance improvement to be expected just from changing the data type.
A short answer is to only do SELECTs against multiple rows when the fields returned do not include the offending image field, i.e. no SELECT *. If you want the value of the image field, retrieve it on a case-by-case basis.
Setting the large value types out of row option should definitely help performance: the row size will be significantly smaller, so SQL Server can do far fewer physical reads to get through the table.
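Putting both suggestions into code (a sketch: the non-image column names are invented, since the question never lists them, and existing rows only move their blob data off-row as they are subsequently updated):
-- List rows without dragging the blob column along:
SELECT TOP (100) ID, WorkflowName, StatusDate
FROM OFFENDING_TABLE;
-- The option discussed in the question (it applies to the (max) types,
-- so it pairs with the proposed varbinary(max) conversion):
EXEC sp_tableoption 'OFFENDING_TABLE', 'large value types out of row', 1;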
