Why is the RID (bookmark) lookup not shown in the execution plan? - sql-server

I tried to examine the RID (formerly bookmark) lookup by creating a heap table:
CREATE TABLE [dbo].[CustomerAddress]
(
[CustomerID] [int],
[AddressID] [int],
[ModifiedDate] [datetime]
);
GO
CREATE NONCLUSTERED INDEX x
ON dbo.CustomerAddress(CustomerID, AddressID);
Then I tried the following query to investigate the execution plan:
SELECT CustomerID, AddressID, ModifiedDate
FROM dbo.CustomerAddress
WHERE CustomerID = 29485;
But in SSMS I cannot see a RID lookup in the execution plan.
I'm using SQL Server 2008R2 (version 10.50.4000.0) service pack 2.
PS: This question is based on Aaron Bertrand's article.

A table scan means SQL Server does not use your index. It reads from the "heap". A "heap" is the data storage for tables without a clustered index.
Since it does not touch the index at all, SQL Server does not need a RID lookup to go from the index to the heap.
The reason is probably that SQL Server estimates that there may be more than roughly 100 rows for one customer. The optimizer tries to avoid a large number of lookups.
You could try again with an index on just (CustomerID), or by adding AddressID to your WHERE clause.
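To see the RID lookup operator regardless of the optimizer's cost estimate, one option is to force the nonclustered index with a table hint. The sketch below assumes the table and index x from the question; the hint is meant for experimentation only, and the AddressID value is hypothetical.

```sql
-- Force the nonclustered index x so the plan must fetch ModifiedDate from the heap.
-- On a heap, that fetch appears as a RID Lookup operator next to the Index Seek.
SELECT CustomerID, AddressID, ModifiedDate
FROM dbo.CustomerAddress WITH (INDEX(x))
WHERE CustomerID = 29485;

-- Narrowing the predicate lowers the estimated row count, which may let the
-- optimizer choose the seek plus lookup on its own.
SELECT CustomerID, AddressID, ModifiedDate
FROM dbo.CustomerAddress
WHERE CustomerID = 29485
  AND AddressID = 1;  -- hypothetical value
```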

Related

How to generate a single GUID for all rows in a batch insert within the query?

I am writing a quick-and-dirty application to load sales plan data into SQL Server (2008 FWIW, though I don't think the specific version matters).
The data set is the corporate sales plan: a few thousand rows of Units, Dollars and Price for each combination of customer, part number and month. This data is updated every few weeks, and it's important to track who changed it and what the changes were.
-- Metadata columns are suffixed with ' ##', to enable an automated
-- tool I wrote to handle repetitive tasks such as de-duplication of
-- records whose values didn't change in successive versions of the
-- forecast.
CREATE TABLE [SlsPlan].[PlanDetail]
(
[CustID] [char](15) NOT NULL,
[InvtID] [char](30) NOT NULL,
[FiscalYear] [int] NOT NULL,
[FiscalMonth] [int] NOT NULL,
[Version Number ##] [int] IDENTITY(1,1) NOT NULL,
[Units] [decimal](18, 6) NULL,
[Unit Price] [decimal](18, 6) NULL,
[Dollars] [decimal](18, 6) NULL,
[Batch GUID ##] [uniqueidentifier] NOT NULL,
[Record GUID ##] [uniqueidentifier] NOT NULL DEFAULT (NEWSEQUENTIALID()),
[Time Created ##] [datetime] NOT NULL,
[User ID ##] [varchar](64) NULL DEFAULT (ORIGINAL_LOGIN()),
CONSTRAINT [PlanByProduct_PK] PRIMARY KEY CLUSTERED
([CustID], [InvtID], [FiscalYear], [FiscalMonth], [Version Number ##])
)
To track changes, I'm using an IDENTITY column as part of the primary key to enable multiple versions with the same primary key. To track who made the change, and also to enable backing out an entire bad update if someone does something completely stupid, I am inserting the Active Directory logon of the creator of that version of the record, a time stamp, and two GUIDs.
The "Batch GUID" column should be the same for all records in a batch; the "Record GUID" column is obviously unique to that particular record and is used for de-duplication only, not for any sort of query.
I would strongly prefer to generate the batch GUID inside a query rather than by writing a stored procedure that does the obvious:
DECLARE @BatchGUID UNIQUEIDENTIFIER = NEWID()
INSERT INTO MyTable
SELECT I.*, @BatchGUID
FROM InputTable I
I figured the easy way to do this is to construct a single-row result with the timestamp, the user ID and a call to NEWID() to create the batch GUID. Then, do a CROSS JOIN to append that single row to each of the rows being inserted. I tried doing this a couple different ways, and it appears that the query execution engine is essentially executing the GETDATE() once, because a single time stamp appears in all rows (even for a 5-million row test case). However, I get a different GUID for each row in the result set.
The below examples just focus on the query, and omit the insert logic around them.
WITH MySingleRow AS
(
Select NewID() as [Batch GUID ##],
ORIGINAL_LOGIN() as [User ID ##],
getdate() as [Time Created ##]
)
SELECT N.*, R1.*
FROM util.zzIntegers N
CROSS JOIN MySingleRow R1
WHERE N.Sequence < 10000000
In the above query, "util.zzIntegers" is just a table of integers from 0 to 10 million. The query takes about 10 seconds to run on my server with a cold cache, so if SQL Server were executing the GETDATE() function with each row of the main table, it would certainly have a different value at least in the milliseconds column, but all 10 million rows have the same timestamp. But I get a different GUID for each row. As I said before, the goal is to have the same GUID in each row.
I also decided to try a version with an explicit table value constructor in hopes that I would be able to fool the optimizer into doing the right thing. I also ran it against a real table rather than a relatively "synthetic" test like a single-column list of integers. The following produced the same result.
WITH AnotherSingleRow AS
(
SELECT SingleRow.*
FROM (
VALUES (NewID(), Original_Login(), getdate())
)
AS SingleRow(GUID, UserID, TimeStamp)
)
SELECT R1.*, S.*
FROM SalesOrderLineItems S
CROSS JOIN AnotherSingleRow R1
The SalesOrderLineItems is a table with 6 million rows and 135 columns, to make doubly sure that runtime was sufficiently long that the GETDATE() would increment if SQL Server were completely optimizing away the table value constructor and just calling the function each time the query runs.
I've been lurking here for a while, and this is my first question, so I definitely wanted to do good research and avoid criticism for just throwing a question out there. The following questions on this site deal with GUIDs but aren't directly relevant. I also spent half an hour searching Google with various combinations of phrases, which didn't turn up anything.
Azure actually does what I want, as evidenced in the following question I
turned up in my research:
Guid.NewGuid() always return same Guid for all rows.
However, I'm not on Azure and not going to go there anytime soon.
Someone tried to do the same thing in SSIS
(How to insert the same guid in SSIS import)
but the answer to that query came back that you generate the GUID in
SSIS as a variable and insert it into each row. I could certainly do
the equivalent in a stored procedure but for the sake of elegance and
maintainability (my colleagues have less experience with SQL Server queries
than I do), I would prefer to keep the creation of the batch GUID in
a query, and to simplify any stored procedures as much as possible.
BTW, my experience level is 1-2 years with SQL Server as a data analyst/SQL developer as part of 10+ years spent writing code, but for the last 20 years I've been mostly a numbers guy rather than an IT guy. Early in my career, I worked for a pioneering database vendor as one of the developers of the query optimizer, so I have a pretty good idea what a query optimizer does, but haven't had time to really dig into how SQL Server does it. So I could be completely missing something that's obvious to others.
Thank you in advance for your help.
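For reference, the variable-based approach described in the question can be sketched as follows. GETDATE() is evaluated once per query while NEWID() is evaluated for each row, which is consistent with the behavior observed above; capturing NEWID() in a variable first sidesteps that. The staging table dbo.PlanDetailStaging is a hypothetical name used only for illustration.

```sql
-- Generate the batch identifier and time stamp once, before the insert.
DECLARE @BatchGUID uniqueidentifier = NEWID();
DECLARE @Now datetime = GETDATE();

-- [Version Number ##], [Record GUID ##] and [User ID ##] are filled in by the
-- IDENTITY property and the column defaults defined on the table.
INSERT INTO SlsPlan.PlanDetail
    (CustID, InvtID, FiscalYear, FiscalMonth,
     Units, [Unit Price], Dollars,
     [Batch GUID ##], [Time Created ##])
SELECT S.CustID, S.InvtID, S.FiscalYear, S.FiscalMonth,
       S.Units, S.[Unit Price], S.Dollars,
       @BatchGUID, @Now
FROM dbo.PlanDetailStaging AS S;  -- hypothetical staging table
```

Every row inserted by this statement carries the same [Batch GUID ##], so a bad batch could be backed out with a single DELETE filtered on that value.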

Index Seek Query Will Sometimes Take Minutes To Complete

I have a query that (~95% of the time) executes nearly instantly on the production Azure SQL Database. Running the query in SSMS (in production) shows that my non-clustered index is being utilized with an index seek (cost 100%).
However, randomly the database all of a sudden gets into a state where this same query will fail to execute. It always times out from the calling application. Logging into SSMS when this episode is occurring I can manually execute the query and it will eventually complete after minutes of execution (since there are no time out limits in SSMS vs that of the calling application).
After I allow the query to fully execute without timeouts, I can subsequently execute the query again with instant results. The calling application can also call it with instant results again. It appears that allowing it to fully execute without a timeout clears up whatever issue was occurring and returns execution to normal.
Monitoring the server metrics shows no real issues or spikes in CPU utilization that would suggest the server is just in a stressed state during this time. All other queries within the application still execute quickly as normal. Even queries that utilize this same table and non-clustered index.
Table
CREATE TABLE [dbo].[Item] (
[Id] UNIQUEIDENTIFIER NOT NULL,
[UserId] UNIQUEIDENTIFIER NULL,
[Type] TINYINT NOT NULL,
[Data] NVARCHAR (MAX) NULL,
[CreationDate] DATETIME2 (7) NOT NULL,
CONSTRAINT [PK_Item] PRIMARY KEY CLUSTERED ([Id] ASC),
CONSTRAINT [FK_Item_User] FOREIGN KEY ([UserId]) REFERENCES [dbo].[User] ([Id])
);
This table has millions of rows in it.
Index
CREATE NONCLUSTERED INDEX [IX_Item_UserId_Type_IncludeAll]
ON [dbo].[Item]([UserId] ASC, [Type] ASC)
INCLUDE ([Data], [CreationDate]);
Issue Query
SELECT
*
FROM
[dbo].[Item]
WHERE
[UserId] = @UserId
AND [Data] IS NOT NULL
While I was catching it in the act today in SSMS, I also modified the query to remove the AND [Data] IS NOT NULL from the WHERE clause. Ex:
SELECT
*
FROM
[dbo].[Item]
WHERE
[UserId] = @UserId
This query executed instantly and execution plans show that it is utilizing the index properly. Adding back AND [Data] IS NOT NULL causes the query to be slow again. This Data column can hold large amounts of JSON data, so I am not sure if that somehow has anything to do with it.
Running sp_WhoIsActive while the episode is occurring and my query is long-running shows that reads, physical_reads, cpu, and used_memory are ever-increasing as the query continues to execute. Interestingly, the query_plan column is NULL while it is running so I am not able to see what plan it is actually utilizing. Though I can always see that the index seek is utilized while running it manually thereafter.
Why would this query get into a state where it takes a really long time to execute, while the majority of the time it executes with near instant results? We can see that it is properly utilizing its non-clustered index as a seek operation.
Why does allowing the query to fully execute in SSMS (vs timing out as the calling application does) seem to clear up the problem going forward?
How can I avoid these types of episodes?
A few things I would check:
1. Your query doesn't have a good index. Since you are doing a SELECT * and also filtering on Data IS NOT NULL, a better index would be something like the one below. Note that an NVARCHAR(MAX) column such as Data cannot be an index key column, so for this table it would have to go in the INCLUDE list instead.
create index nci on table(userid,data)
include(rest of columns in select)
2. Try updating statistics and rebuilding indexes for this table. This will help if there is index fragmentation or stale statistics.
3. Try the OPTION (RECOMPILE) hint to see if parameter sniffing is the problem.
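Items 2 and 3 can be tried directly in SSMS while an episode is occurring. The following is a minimal sketch against the dbo.Item table from the question; the GUID literal is a placeholder, not a real user ID.

```sql
-- Item 2: refresh statistics on the table.
UPDATE STATISTICS dbo.Item;

-- Item 3: compile a fresh plan for this specific parameter value
-- instead of reusing whatever plan is currently cached.
DECLARE @UserId uniqueidentifier = '00000000-0000-0000-0000-000000000000';  -- placeholder value

SELECT *
FROM dbo.Item
WHERE UserId = @UserId
  AND Data IS NOT NULL
OPTION (RECOMPILE);
```

If the query is fast with OPTION (RECOMPILE) but slow without it during an episode, that would point toward a cached plan that does not suit the parameter value.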

SQL Server doesn't use a suggested index

First, I am using: Microsoft SQL Server 2012 (SP1) - 11.0.3000.0 (X64)
I have created a table that looks like this:
create table dbo.pos_key
( keyid int identity(1,1) not null
, systemid int not null
, partyid int not null
, portfolioid int null
, instrumentid int not null
, security_no decimal(10,0) null
, entry_date datetime not null
)
keyid is a clustered primary key. My table has about 144,000 rows. Currently systemId doesn't have much fluctuation, it is the same in every row except 1.
Now I perform the following query:
select *
from pos_key
where systemid = 33000
and portfolioid = 150444
and instrumentid = 639
Which returns 1 row after a clustered index scan. [pos_key].[PK_pos_key]
Execution plan said that expected row count was 1.082
SQL Server quickly suggests that I add an index.
CREATE NONCLUSTERED INDEX IDX_SYS_PORT_INST
ON [dbo].[pos_key] ([systemid],[portfolioid],[instrumentid])
So I do and run the query again.
Surprisingly, SQL Server doesn't use the new index. Instead, it again goes for the same clustered index scan, but now it claims to expect 4087 rows! However, it doesn't suggest any new index this time.
To get it to use the new index I have done the following:
Updated table statistics (update statistics)
Updated index statistics (update statistics)
Dropped cached execution plans related to this queries (DBCC FREEPROCCACHE)
No luck, SQL server always goes for the clustered scan and expects 4087 rows.
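The steps listed above correspond roughly to the following commands. This is a sketch assuming the table and index names from the question; the WITH FULLSCAN option is an assumption, not something the question states was used.

```sql
-- Refresh all statistics on the table, then the index statistics specifically.
UPDATE STATISTICS dbo.pos_key WITH FULLSCAN;
UPDATE STATISTICS dbo.pos_key IDX_SYS_PORT_INST WITH FULLSCAN;

-- Clear the plan cache. This affects the whole instance, so avoid it on a busy server.
DBCC FREEPROCCACHE;

-- Inspect the header, density vector and histogram the optimizer works from.
DBCC SHOW_STATISTICS ('dbo.pos_key', IDX_SYS_PORT_INST);
```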
Index statistics look like this:
All Density     Average Length   Columns
----------------------------------------------------------------------------
0.5             4                systemid
6.095331E-05    7.446431         systemid, portfolioid
1.862301E-05    11.44643         systemid, portfolioid, instrumentid
6.9314E-06      15.44643         systemid, portfolioid, instrumentid, keyid
Curiously I left this overnight and in the morning ran the query again and BAMM now it hits the index. I dropped the index, ran the select and then created the index again. Now SQL server is back to 4087 expected rows and clustered index scan.
So what am I missing? The index obviously works, but SQL Server doesn't want to use it right away.
Is the lack of fluctuation in systemId somehow causing trouble?
Is DBCC FREEPROCCACHE not enough to get rid of cached execution plans?
Are the ways of SQL-Server just mysterious?
With a composite index and all columns used in equality predicates, specify the most selective column first (portfolioid here). SQL Server maintains a histogram only for the first column.
With the less selective column first, SQL Server probably overestimated the row count and chose the clustered index scan instead, thinking it was more efficient since you are selecting all columns.
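A sketch of that suggestion, assuming the table from the question. The index name IDX_PORT_INST_SYS is hypothetical, and whether the optimizer then chooses a seek still depends on the data.

```sql
-- Put the most selective column first so the histogram is built on portfolioid.
CREATE NONCLUSTERED INDEX IDX_PORT_INST_SYS
ON dbo.pos_key (portfolioid, instrumentid, systemid);

-- Check the histogram on the leading column of the new index.
DBCC SHOW_STATISTICS ('dbo.pos_key', IDX_PORT_INST_SYS) WITH HISTOGRAM;

-- Re-run the query and compare the estimated row count in the plan.
SELECT *
FROM dbo.pos_key
WHERE systemid = 33000
  AND portfolioid = 150444
  AND instrumentid = 639;
```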

SQL Server 2014 Index Optimization: Any benefit with including primary key in indexes?

After running a query, the SQL Server 2014 Actual Query Plan shows a missing index like below:
CREATE NONCLUSTERED INDEX IX_1 ON Table1 (Column1) INCLUDE
(PK_Column,SomeOtherColumn)
The missing index suggestion includes the primary key column in the index. The table has a clustered index on PK_Column.
I am confused, and it seems that I don’t get the concept of a clustered primary key right.
My assumption was: when a table has a clustered PK, all of the non-clustered indexes point to the PK value. Am I correct? If I am, why does the missing index suggestion ask me to include the PK column in the index?
Summary:
The advised index is not strictly valid, but it doesn't make any difference. See the tests below for details.
After researching for some time, I found an answer here, and the statement below explains the missing index feature convincingly:
They only look at a single query, or a single operation within a single query. They don't take into account what already exists or your other query patterns.
You still need a thinking human being to analyze the overall indexing strategy and make sure that your index structure is efficient and cohesive.
So, coming to your question: the advised index may be valid, but it should not be taken for granted. It is useful to SQL Server for the particular query executed, to reduce its cost.
This is the index that was advised..
CREATE NONCLUSTERED INDEX IX_1 ON Table1 (Column1)
INCLUDE (PK_Column, SomeOtherColumn)
Assume you have a query like below..
select pk_column, someothercolumn
from table
where column1 = 'somevalue'
SQL Server tries to scan a narrow index if one is available, so in this case an index as advised will be helpful.
Further, you didn't share the schema of the table. If you have an index like the one below
create index nci_test on table(column1)
then a query of the form below will again be advised the same index as stated in the question:
select pk_column, someothercolumn
from table
where column1 = 'somevalue'
Update:
I have an orders table with the schema below:
[orderid] [int] NOT NULL Primary key,
[custid] [char](11) NOT NULL,
[empid] [int] NOT NULL,
[shipperid] [varchar](5) NOT NULL,
[orderdate] [date] NOT NULL,
[filler] [char](160) NOT NULL
Now I created one more index with the structure below:
create index onlyempid on orderstest(empid)
Now when I run a query of the form below
select empid,orderid,orderdate --6.3 units
from orderstest
where empid=5
the index advisor suggests the missing index below:
CREATE NONCLUSTERED INDEX empidalongwithorderiddate
ON [dbo].[orderstest] ([empid])
INCLUDE ([orderid],[orderdate])--you can drop orderid too ,it doesnt make any difference
As you can see, orderid is also included in the suggestion above.
Now let's create it and compare both structures.
[Screenshots not preserved: root-level and leaf-level pages for index onlyempid and for index empidalongwithorderiddate]
As you can see, creating the index as suggested makes no difference, even though the suggestion is invalid.
I assume the suggestion was made by the index advisor based on the query that was run; it is specific to that query and has no idea of the other indexes involved.
I don't know your schema, nor your queries. Just guessing.
Please correct me if this theory is incorrect.
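A minimal way to check this, assuming the orderstest table above: create one index with orderid in the INCLUDE list and one without, then compare their sizes. Because orderid is the clustered key, it is stored in every nonclustered index anyway, so the two would be expected to use the same number of pages. The index names below are hypothetical.

```sql
-- Same key column; the only difference is whether the clustered key is listed explicitly.
CREATE NONCLUSTERED INDEX ix_empid_with_pk
ON dbo.orderstest (empid) INCLUDE (orderid, orderdate);

CREATE NONCLUSTERED INDEX ix_empid_without_pk
ON dbo.orderstest (empid) INCLUDE (orderdate);

-- Compare the pages used by each index.
SELECT i.name AS index_name,
       SUM(ps.used_page_count) AS used_pages
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
  ON ps.object_id = i.object_id
 AND ps.index_id = i.index_id
WHERE i.object_id = OBJECT_ID('dbo.orderstest')
  AND i.name IN ('ix_empid_with_pk', 'ix_empid_without_pk')
GROUP BY i.name;
```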
You are right that non-clustered indexes point to the PK value. Imagine you have a large database (for example, gigabytes of files) stored on an ordinary platter hard drive. Let's suppose that the disk is fragmented and the PK index is stored physically far from your Table1 index.
Imagine that your query needs to evaluate Column1 and PK_Column as well. The query execution reads a Column1 value, then a PK value, then a Column1 value, then a PK value...
The drive head has to move from one physical place to another, and this can take time.
Having all you need in one index is more efficient, because it means reading one file sequentially.

Will using an indexed view improve performance of SELECT COUNT queries?

I have a table that will grow to several million rows over some years. As part of my web application, I have to query the count on a subset of this table whenever a user accesses a particular page. Someone with an architecty hat has said that they have a performance concern with that. Assuming they are correct, will adding an indexed view address this issue?
Sql that I want to be fast:
SELECT COUNT(*) FROM [dbo].[Txxx] WHERE SomeName = 'ZZZZ'
OR
SELECT COUNT_BIG(*) FROM [dbo].[Txxx] WHERE SomeName = 'ZZZZ'
Table:
CREATE TABLE [dbo].[Txxx](
[Id] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[SomeName] [nvarchar](50) NOT NULL,
[SomeGuid] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_Txxx] PRIMARY KEY CLUSTERED
(
[Id] ASC
)
)
View:
CREATE view dbo.Vxxx
WITH SCHEMABINDING
AS
SELECT SomeName, COUNT_BIG(*) AS UsedCount
FROM dbo.Txxx
GROUP BY SomeName
Index:
CREATE UNIQUE CLUSTERED INDEX [IV_COUNT] ON [dbo].[Vxxx]
(
[SomeName] ASC
)
Yes, but only Enterprise Edition will consider the indexed view during query compilation. To leverage the index on non-EE you need to select directly from the view and use the NOEXPAND hint:
NOEXPAND applies only to indexed views. An indexed view is a view with
a unique clustered index created on it. If a query contains references
to columns that are present both in an indexed view and base tables,
and the query optimizer determines that using the indexed view
provides the best method for executing the query, the query optimizer
uses the index on the view. This function is called indexed view
matching. Automatic use of indexed view by query optimizer is
supported only in specific editions of SQL Server.
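A minimal sketch of such a query, assuming the view and index defined in the question. The view holds no row for a SomeName value that has no rows in the base table, so the ISNULL wrapper is added here to return 0 in that case.

```sql
-- Read the pre-aggregated count straight from the indexed view.
-- NOEXPAND tells the optimizer to use the view's index rather than expand the view definition.
SELECT ISNULL(
    (SELECT UsedCount
     FROM dbo.Vxxx WITH (NOEXPAND)
     WHERE SomeName = N'ZZZZ'),
    0) AS UsedCount;
```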
Be warned that an indexed view like this will create write contention, because any update will lock an entire SomeName scope: only one transaction at a time will be able to insert, delete or update any row with SomeName = 'ZZZZ'.
Yes, that indexed view will definitely improve the performance of that particular query (assuming Enterprise Edition - Remus explains how to utilize it if you're not on Enterprise).
However, it isn't "free" - the index will need to be maintained for all DML operations to dbo.Txxx, will occupy space (though considerably less than the base table, in comparison), and will be subject to issues that also affect normal tables - such as fragmentation and (likely to a lesser extent in this case) page splits.

Resources