Slow "Select" Query with varchar(max)

I have a small table with 500 rows.
This table has 10 columns including one varchar(max) column.
When I perform this query:
SELECT TOP 36 *
FROM MyTable
WHERE (Column1 = Value1)
It retrieves around 36 rows in 3 minutes.
The varchar(max) column contains 3000 characters in each row.
If I try to retrieve only one row less:
SELECT TOP 35 *
FROM MyTable
WHERE (Column1 = Value1)
Then the query retrieves 35 rows in 0 seconds.
In the client statistics, under Bytes received from server, I have:
95 292 for the query retrieving data in 0 sec
over 200 000 000 for the query retrieving data in 3 min
Do you know where this comes from?
EDIT --- Here is my real code:
select top 36 *
from Snapshots
where ExamId = 212
select top 35 *
from Snapshots
where ExamId = 212
EDIT --- More info on the client statistics
The two statistics having a huge variation are:
Bytes received from server : 66 038 Vs More than 2 000 000
TDS packets received from server 30 Vs 11000

A varchar(max) column can't be part of an index key. Its other major drawback is that it can't be stored internally as a contiguous memory area, because values can grow up to 2 GB. So to improve performance you need to avoid it.

Create an index on ExamId, and use select field1, field2, etc. instead of select *.

I am not sure but try this:
select * from Snapshots where ExamId in (select top 36 ExamId from Snapshots where ExamId = 212)

Your execution time should be very low, while fetch is much longer.
Remove the varchar(max) from the SELECT TOP statement and only retrieve those values as you specifically need them.
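A minimal sketch of that two-step pattern, assuming the table has a key column and naming the large column explicitly. SnapshotId and LargeColumn are hypothetical names (the real schema was not posted), and the ORDER BY is added only so that TOP returns a repeatable set of rows.

```sql
-- Step 1: list the rows without the varchar(max) column.
-- SnapshotId is a hypothetical key column; substitute the real one.
SELECT TOP 36 SnapshotId, ExamId
FROM Snapshots
WHERE ExamId = 212
ORDER BY SnapshotId;

-- Step 2: fetch the large value for a single row, only when it is needed.
-- LargeColumn is a hypothetical name for the varchar(max) column.
DECLARE @SnapshotId int = 1;  -- illustrative key value taken from step 1
SELECT LargeColumn
FROM Snapshots
WHERE SnapshotId = @SnapshotId;
```

If the first query returns quickly while the second is slow for particular rows, that points at the large values themselves rather than at the row lookup.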

Include SET STATISTICS IO ON before running the SELECT query and provide the output. Also, can you post the query plans from the two queries? That will go a long way toward explaining the differences. You can use https://www.brentozar.com/pastetheplan/ to upload them and provide the links.
Your TOP also does not have a matching ORDER BY so you cannot guarantee the ordering of the first 35 or 36 rows returned. This means that the 35 rows may not all be included in the 36 and you may be returning hugely different volumes of data.
Finally, also try enabling Client Statistics in SSMS when running the query. This will show whether the delay is on the server side or entirely in the latency of returning the result set to you.
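The diagnostics suggested above could be collected roughly like this. It is a sketch, not a definitive procedure: LargeColumn is a hypothetical name for the varchar(max) column, and because the queries have no ORDER BY, the rows measured by the last statement are not guaranteed to be the same rows the TOP queries returned.

```sql
-- Report reads (including LOB reads) and elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT TOP 35 * FROM Snapshots WHERE ExamId = 212;
SELECT TOP 36 * FROM Snapshots WHERE ExamId = 212;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

-- Size in bytes of the large column for each row, to see whether one
-- row carries far more data than the others.
-- LargeColumn is a hypothetical column name.
SELECT TOP 36 DATALENGTH(LargeColumn) AS large_column_bytes
FROM Snapshots
WHERE ExamId = 212;
```

The output of the two TOP queries appears on the Messages tab in SSMS; a large jump in "lob logical reads" or in one row's byte count would support the idea that the extra row returns much more data.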

Without the complete table description as a DDL statement (CREATE TABLE...) and indexes, it is very difficult to answer.
One important question is: did you use the TEXTIMAGE_ON option when creating your table? It stores LOB data separately from the relational data to avoid row-overflow storage...
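For reference, a sketch of where TEXTIMAGE_ON goes in a CREATE TABLE statement. The table and column names are hypothetical, and [PRIMARY] is used for both clauses only so the statement runs on a default database; a dedicated LOB filegroup would have to be created first. As far as I know, TEXTIMAGE_ON only chooses the filegroup for LOB data; whether varchar(max) values are kept in the row is controlled by the 'large value types out of row' table option.

```sql
-- Hypothetical demo table; names are illustrative, not the asker's schema.
CREATE TABLE dbo.SnapshotsDemo
(
    SnapshotId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ExamId     int NOT NULL,
    Payload    varchar(max) NULL
)
ON [PRIMARY]             -- filegroup for the in-row data
TEXTIMAGE_ON [PRIMARY];  -- filegroup for LOB data; replace with a dedicated one

-- Optionally force varchar(max) values out of the data row.
EXEC sp_tableoption N'dbo.SnapshotsDemo', 'large value types out of row', 1;
```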

As other people are saying, you should post the schema (data types and existing indexes) of the Snapshots table.
I believe ExamId in the Snapshots table has a non-clustered index that is not unique.
One ExamId has many records. The Snapshots table must have a PK column.
A TOP clause should always be used with an ORDER BY clause. TOP without ORDER BY is non-deterministic.
On what basis will it select the top N?
So, once the schema of Snapshots is known, the correct index can be decided.
Using an ORDER BY clause can also be non-deterministic, but that is another discussion.
You can try this,
create table #temp(PKID int)
insert into #temp(pkid)
select top 36 pkid
from dbo.Snapshots
where ExamId = 212
Then you can do this,
select col1,col2,col3,col4
from dbo.Snapshots S
where exists(select 1 from #temp t where t.pkid=s.pkid)
Now to your main question and problem:
why are 35 rows retrieved in 0 seconds and 36 rows in 3 minutes?
I will write about that here soon. Meanwhile, I am waiting for the complete structure of the Snapshots table.

Related

SQL Server variable Table or index causing performance issues

I am trying to build a stored procedure that retrieves information from a few tables in my database. I often use a table variable to hold data, since I have to return it in a result set and also reuse it in the following queries instead of querying the table multiple times.
Is this a good and common way to do that?
I started having performance issues when testing the stored procedure. By the way, is there an efficient way to test it without having to change the parameters each time? If I don't change the parameter values, the query takes only a few milliseconds to run; I assume it uses some sort of cache.
The performance issues started even though everything had been working well the day before, so I reworked my queries, checked that all indexes were being used correctly, etc. Then I tried switching the table variable for a temp table, just for testing purposes, and bingo: the next two or three tests ran like a charm, and then the performance issues started to appear again. So I am a bit clueless about what is happening here and why.
I am running my tests on the production db since nothing is updated or inserted. Here is a piece of code to give you an idea of my test case:
--Stuff going on to get values in a temp table for the next query
DECLARE @ApplicationIDs TABLE(ID INT)
-- This table has over 110 000 000 rows and this query uses one of its indexes. The query inserts between 1 and 10-20k rows
INSERT INTO @ApplicationIDs(ID)
SELECT ApplicationID
FROM Schema.Application
WHERE Columna = value
AND Columnb = value
AND Columnc = value
-- I query the table again, joined with other tables, to get my final result set; no performance issues here. ApplicationID is the clustered primary key
SELECT Columns
FROM Schema.Application
INNER JOIN SomeTable ON Columna = Columnb
WHERE ApplicationID IN (SELECT ID FROM @ApplicationIDs)
--This is where it starts happening. This table has around 200 000 000 rows and about 50 columns, and yes, the ApplicationID column is indexed (nonclustered). I use this index the same way in a few other contexts and it works well, just not in this one
SELECT Columns
FROM Schema.SubApplication
WHERE ApplicationID IN (SELECT ID FROM @ApplicationIDs)
The server is a VM with 64 GB of RAM, and SQL Server has 56 GB allocated.
Let me know if you need further details.

Why is this stored procedure using TOP 100 PERCENT?

I have the following stored procedure inside a third-party application on SQL Server 2008 R2:
ALTER PROCEDURE [dbo].[GetContacts]
AS
BEGIN
---------
SELECT TOP (100) PERCENT .....
INTO [#temp200]
FROM dbo.Contact
ORDER BY dbo.Contact.Name
--SELECT * from #temp200
SELECT top 38 *
FROM #temp200
ORDER BY Fullname
delete top (38) FROM #temp200
SELECT top 38 *
FROM #temp200
ORDER BY Fullname
delete top (38) FROM #temp200
SELECT top 38 *
FROM #temp200
ORDER BY Fullname
delete top (38) FROM #temp200
SELECT *
FROM #temp200
ORDER BY Fullname
When I run this in SQL Server Management Studio, I get the following result tabs:
the first one contains 38 records.
the second one contains 38 records.
the third one contains 38 records.
the fourth one contains 30 records.
So in this case I get 144 records in total, and I am not sure what the purpose of
SELECT TOP (100) is, since I get all 144 records anyway. As a test I changed SELECT TOP (100) to SELECT TOP (35), and in that case I got two results: the first with 38 records and the second with 17 records. Can anyone explain how the above SP works?

That guy did not understand that tables do not have order. He tried to insert in an ordered way into the temp tables. This is not possible. The TOP 100 PERCENT trick shuts up the warning about that but does nothing to ensure order.
In earlier SQL Server versions this code might well have worked by coincidence. Since then more optimizations have been added and this code is extremely brittle. Rewrite this if you get the chance. It's a latent time bomb.
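One possible shape for such a rewrite, as a sketch only: number the rows with ROW_NUMBER() (available on SQL Server 2008 R2) and select one page by row number, so the order comes from an explicit ORDER BY rather than from insertion order into a temp table. ContactId is a hypothetical unique key used as a tie-breaker, and the variables are illustrative; the original procedure's column list is elided, so only Name is taken from it.

```sql
DECLARE @PageNumber int = 2;   -- 1-based page to return (illustrative)
DECLARE @PageSize   int = 38;  -- page size used by the original procedure

SELECT ContactId, Name
FROM (
    SELECT ContactId, Name,
           -- ContactId (hypothetical) breaks ties so the numbering is stable
           ROW_NUMBER() OVER (ORDER BY Name, ContactId) AS RowNum
    FROM dbo.Contact
) AS Numbered
WHERE RowNum BETWEEN (@PageNumber - 1) * @PageSize + 1
                 AND @PageNumber * @PageSize
ORDER BY RowNum;
```

Unlike the original, this returns one page per call instead of four result sets, and it never deletes rows to advance to the next page.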

How to SELECT LIMIT in ASE 12.5? LIMIT 10, 10 gives syntax error?

How can I LIMIT the result returned by a query in Adaptive Server IQ/12.5.0/0306?
The following gives me a generic error near LIMIT:
SELECT * FROM mytable LIMIT 10, 10;
Any idea why? This is my first time with this DBMS.

Sybase IQ uses row_count to limit the number of rows returned.
You will have to set the row count at the beginning of your statement and decide whether the statement should be TEMPORARY or not.
ROW_COUNT
SET options

The LIMIT clause is not supported in Sybase IQ 12. I don't think there is a simple or clean solution, much as in old SQL Server versions. But there are some approaches that work for SQL Server 2000 and should also work for Sybase IQ 12. I don't promise that the queries below will work on copy and paste.
Subquery
SELECT TOP 10 *
FROM mytable
WHERE Id NOT IN (
SELECT TOP 10 Id FROM mytable ORDER BY OrderingColumn
)
ORDER BY OrderingColumn
Basically, it fetches 10 rows but skips the first 10 rows. For this to work, rows must be unique and ordering is important. An Id cannot appear more than once in the results; otherwise valid rows can be filtered out.
Asc-Desc
Another workaround depends on ordering. It uses ordering to fetch the 10 rows of the second page, and you have to take care of the last page (it does not work properly with the simple formula page * rows per page).
SELECT *
FROM
(
SELECT TOP 10 *
FROM
(
SELECT TOP 20 * -- (page * rows per page)
FROM mytable
ORDER BY Id
) AS t1
ORDER BY Id DESC
) AS t2
ORDER BY Id ASC
I've found some reports that subqueries in the FROM clause do not work in ASE 12, so this approach may not be possible.
Basic iteration
In this scenario you just iterate through the rows. Let's assume the Id of the tenth row is 15. The query then selects the next 10 rows after the tenth row. This breaks down if you order by a column other than Id; that is not possible with this approach.
SELECT TOP 10 *
FROM mytable
WHERE Id > 15
ORDER BY Id
Here is an article about other workarounds in SQL Server 2000. Some should also work in similar ways in Sybase IQ 12.
http://www.codeproject.com/Articles/6936/Paging-of-Large-Resultsets-in-ASP-NET
All of these are workarounds. If you can, try to migrate to a newer version.

Query slow for certain criteria on clustered index

I have a table called readings that has > 76 million rows in it that I'm running this query on:
declare @tunnel_id int = 13
SELECT TOP 1 local_time, recorded_time
FROM readings
WHERE tunnel_id = @tunnel_id
ORDER BY id DESC
The id column is a bigint, set as the primary key, and has a clustered index, and there is also an index on the tunnel_id field.
This works great and returns in less than a second for about 16 of the 20 different tunnel_id values I'm trying. However, for the last 4 or so the query takes 40 seconds and uses hundreds of thousands of reads.
I tried modifying the query into this:
SELECT TOP (1) local_time, recorded_time
FROM readings
where id = (
SELECT TOP 1 id
FROM readings
WHERE tunnel_id = 13
ORDER BY id DESC
)
Which once again is only slow for a few tunnel_id's. What perplexes me more is that the inner select runs quickly for the slow id's and if I hardcode the maximum id instead of the subquery it also runs quickly.
What am I missing here that's making this query perform poorly?
Edit for comments:
Tunnel_id is not unique, each tunnel has multiple millions of rows. This is running on Sql Server 2012.
I included the actual execution plans from both the fast and slow runs and they are identical.
Fast: [execution plan image]
Slow: [execution plan image]
But as you can see, the first executes in less than a second while the second takes 51 seconds.

The plan basically scans the entire clustered index from start to end and looks for the first row with tunnel_id = @tunnel_id.
My educated guess is that the 'slow' tunnels don't have any rows at the beginning of the clustered index, so it has to scan more of it.
This non-clustered index should speed things up:
CREATE NONCLUSTERED INDEX [IX_FOO] ON [readings]
(
tunnel_id,
ID
)
INCLUDE
(
local_time,
recorded_time
)
This could replace the existing index on tunnel_id.
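The guess about where each tunnel's rows sit could be checked with a query along these lines. It is a sketch: the variable and the ids_behind_newest alias are illustrative, and since the original query orders by id DESC, the scan is assumed to start from the newest ids. On a table this size the check itself may take a while.

```sql
-- Newest id in the whole table (the id column is a bigint).
DECLARE @newest_id bigint = (SELECT MAX(id) FROM readings);

-- For each tunnel, how far its most recent row lies behind the newest row.
-- Tunnels with a large gap force a longer scan before the first match.
SELECT tunnel_id,
       MAX(id)              AS latest_id,
       @newest_id - MAX(id) AS ids_behind_newest
FROM readings
GROUP BY tunnel_id
ORDER BY ids_behind_newest DESC;
```

If the four slow tunnel_id values show up at the top of this list, that supports the explanation above.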

The interesting part here is that SQL Server isn't using the index on tunnel_id at all and is just scanning the whole table, which is slow when it is as big as 76 million rows.
I think the real reason it isn't using it is the ordering by id, as it would have to perform a lookup and then an additional sort. I doubt that parameter sniffing is the main problem here.
I would try changing the index instead and making it covering. If possible, include local_time, recorded_time and id in the index (I'm not 100% sure id is needed, as it's the clustering key anyway).
CREATE NONCLUSTERED INDEX IX_tunnel_id ON dbo.readings (tunnel_id) INCLUDE (id, local_time, recorded_time)
Note that, while this can improve this particular query, it will make inserts and updates a little slower, and require additional storage space.

Just found that you can hint to use the tunnel_id index:
declare @tunnel_id int = 13
SELECT TOP 1 local_time, recorded_time
FROM readings
WITH (INDEX(idx_tunnel_id))
WHERE tunnel_id = @tunnel_id
ORDER BY id DESC
which works as expected and returns in less than 1 second.

MS Access row number, specify an index

Is there a way in MS Access to return the part of a dataset between specific indexes?
So let's say my dataset is:
rank | first_name | age
1    | Max        | 23
2    | Bob        | 40
3    | Sid        | 25
4    | Billy      | 18
5    | Sally      | 19
But I only want to return the records between 'rank' 2 and 4, so that my result set is Bob, Sid and Billy. However, rank is not part of the table; it should be generated when the query is run. Why don't I use an autogenerated number? Because if a record is deleted it will be inconsistent, and what if I wanted the results in reverse?
This is obviously very simple. The reason I ask is that I am working on a product catalogue and am looking for a more efficient way of paging through the returned dataset: returning only one page's worth of data from the database is obviously going to be quicker than returning a complete set of 3000 records and then having to subselect from that set.
Thanks R.

Original suggestion:
SELECT * from table where rank BETWEEN 2 and 4;
Modified after a comment that rank does not exist in the structure:
Select top 100 * from table;
And if you want to choose subsequent results, you can take the ID of the last record from the first query, say it was ID 100, and use a WHERE clause to get the next 100:
Select top 100 * from table where ID > 100;
But these won't give you what you're looking for either, I bet.

How are you calculating rank? I assume you are basing it on some data in another dataset somewhere. If so, create a function, do a table join, or do something that can calculate rank based on values in other table(s), then you can do queries based on the rank() function.
For example:
select *
from table
where rank() between 2 and 4
If you are not calculating rank based on some data somewhere, there really isn't a way to write this query, and you might as well be returning three random rows from the table.

I think you need to use a correlated subquery to calculate the rank on the fly e.g. I'm guessing the rank is based on name:
SELECT T1.first_name, T1.age,
(
SELECT COUNT(*) + 1
FROM MyTable AS T2
WHERE T1.first_name > T2.first_name
) AS rank
FROM MyTable AS T1;
The bad news is the Access data engine is poorly optimized for this kind of query; in my experience, performance will start to noticeably degrade beyond a few hundred rows.
If it is not possible to maintain the rank on the db side of the house (e.g. high insertion environment) consider doing the paging on the client side. For example, an ADO classic recordset object has properties to support paging (PageCount, PageSize, AbsolutePage, etc), something for which DAO recordsets (being of an older vintage) have no support.
As always, you'll have to perform your own timings but I suspect that when there are, say, 10K rows you will find it faster to take on the overhead of fetching all the rows to an ADO recordset then finding the page (then perhaps fabricate smaller ADO recordset consisting of just that page's worth of rows) than it is to perform a correlated subquery to only fetch the number of rows for the page.
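To return only ranks 2 to 4 with the correlated subquery shown earlier in this answer, the ranked query can be wrapped in a derived table and filtered on the computed column. This is a sketch in generic SQL, assuming the engine accepts a subquery in the FROM clause; the alias rnk is illustrative, and the comment lines may need to be removed before pasting into the Access query editor.

```sql
SELECT R.first_name, R.age, R.rnk
FROM (
    -- Same ranking as above: 1 + number of names that sort before this one.
    SELECT T1.first_name, T1.age,
           (SELECT COUNT(*) + 1
            FROM MyTable AS T2
            WHERE T1.first_name > T2.first_name) AS rnk
    FROM MyTable AS T1
) AS R
WHERE R.rnk BETWEEN 2 AND 4
ORDER BY R.rnk;
```

Because the rank here is based on first_name, the page contains the second to fourth names in alphabetical order, and duplicate names would share a rank.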

Unfortunately the LIMIT keyword isn't available in MS Access -- that's what is used in MySQL for a multi-page presentation. If you can write an order key into the results table, then you can use it something like this:
SELECT TOP 25 MyOrder, Etc FROM Table1 WHERE MyOrder in
(SELECT TOP 55 MyOrder FROM Table1 ORDER BY MyOrder DESC)
ORDER BY MyOrder ASC

If I understand you correctly, there are only first_name and age columns in your table. If this is the case, then there is no way to return Bob, Sid, and Billy with a single query, unless you do something like
SELECT * FROM Table
WHERE first_name = 'Bob'
OR first_name = 'Sid'
OR first_name = 'Billy'
But I think that this is not what you are looking for.
This is because SQL databases make no guarantee as to the order that the data will come out of the database unless you specify an ORDER BY clause. It will usually come out in the same order it was added, but there are no guarantees, and once you get a lot of rows in your table, there's a reasonably high probability that they won't come out in the order you put them in.
As a side note, you should probably add a "rank" column (this column is usually called id) to your table, and make it an auto-incrementing integer (see the Access documentation), so that you can do the query mentioned by Sev. It's also important to have a primary key so that you can be certain which rows are being updated when you run an update query, or which rows are being deleted when you run a delete query. For example, if you had 2 people named Max, and they were both 23, how would you delete 1 row without deleting the other? If you had an auto-incrementing unique column in there, you could specify the unique ID in your query to delete only one.
[ADDITION]
Upon reading your comment: if you add an autoincrement field and want to read 3 rows, and you know the ID of the first row you want to read, then you can use TOP to read 3 rows.
Assuming your data looks like this
ID | first_name | age
1  | Max        | 23
2  | Bob        | 40
6  | Sid        | 25
8  | Billy      | 18
15 | Sally      | 19
You can query Bob, Sid and Billy with the following query:
SELECT TOP 3 first_name, age
FROM Table
WHERE ID >= 2
ORDER BY ID