Cluster index on varchar on small table - sql-server

Hy guys,
I inherited a database with the following table with only 200 rows:
CREATE TABLE [MyTable](
[Id] [uniqueidentifier] NOT NULL,
[Name] [varchar](255) NULL,
[Value] [varchar](8000) NULL,
[EffectiveStartDate] [datetime] NULL,
[EffectiveEndDate] [datetime] NULL,
[Description] [varchar](2000) NOT NULL DEFAULT (''),
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
As you can see there is a Clustered PK on a UniqueIdentifier column. I was doing some performance checks and the most expensive query so far (CPU and IO) is the following:
SELECT #Result = Value
FROM MyTable
WHERE #EffectiveDate BETWEEN EffectiveStartDate AND EffectiveEndDate
AND Name=#VariableName
The query above is encapsulated in a UDF and usually the udf is not called in a select list or where clause, instead it's return is usually assigned to a variable.
The execution plan shows a Clustered Index Scan
Our system is based in a large number of aggregations and math processing in real time. Every time our Web Application refreshes the main page, it calls a bunch of Stored Procedures and UDFs and the query above is run around 500 times per refresh per user.
My question is: Should I change the PK to nonclustered and create a clustered index on the Name, EffectiveStartDate, EffectiveEndDate in a such small table?

No you should not. You can just add another index which will be covering index:
CREATE INDEX [IDX_Covering] ON dbo.MyTable(Name, EffectiveStartDate, EffectiveEndDate)
INCLUDE(Value)
If #VariableName and #EffectiveDate are variables with correct types you should now see index seek.
I am not sure this will help, but you need to try, because index scan of 200 rows is just nothing, but calling it 500 times may be a problem. By the way if those 200 rows are in one page I suspect this will not help. The problem may be somewhere else, like opening a connection 500 times or something like that...

Related

How can a query with ORDER BY run faster than the same query without ordering?

I have a SQL Server database with EventJournal table with the following columns:
Ordering (bigint, primary key)
PersistenceID (nvarchar(255))
SequenceNr (bigint)
Payload (varbinary(max))
Other columns are omitted for clarity. In addition to the primary key on Ordering there is a unique constraint on PersistenceID+SequenceNr.
If I run a query
select top 100 * from EventJournal where PersistenceID like 'msc:%'
... it takes very long time to execute (the table contains more than 100M rows)
But if I add ordering to results:
select top 100 * from EventJournal where PersistenceID like 'msc:%' order by Ordering
... then it returns the result immediately.
The execution plan for both queries are the same and in essence is the clustered index scan on PK. Then why does the first query take long time to execute?
Here's the table definition:
CREATE TABLE [dbo].[EventJournal](
[PersistenceID] [nvarchar](255) NOT NULL,
[SequenceNr] [bigint] NOT NULL,
[IsDeleted] [bit] NOT NULL,
[Manifest] [nvarchar](500) NOT NULL,
[Payload] [varbinary](max) NOT NULL,
[Timestamp] [bigint] NOT NULL,
[Tags] [nvarchar](100) NULL,
[Ordering] [bigint] IDENTITY(1,1) NOT NULL,
[SerializerId] [int] NULL,
CONSTRAINT [PK_EventJournal] PRIMARY KEY CLUSTERED
(
[Ordering] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [QU_EventJournal] UNIQUE NONCLUSTERED
(
[PersistenceID] ASC,
[SequenceNr] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
This it 1st plan:
https://www.brentozar.com/pastetheplan/?id=SJ3kCo-Fv
And here's the 2nd one:
https://www.brentozar.com/pastetheplan/?id=Hy1B0ibtP
As I mentioned in my comments, the plans are different, the difference is in access method:
The first plan uses unordered scan:
And the second plan uses ordered scan:
By the way, the other answer suggests useless index.
SQL Server will NOT use this index, as it's equivalent to the non-clustered index already in place. As the index QU_EventJournal on ([PersistenceID], [SequenceNr]) was not used, the same way the index on (PersistenceID, Ordering) will not be used.
Both of these indexes has PersistenceID, Ordering in the index as Ordering is clustered index key, so it is presented in index on ([PersistenceID], [SequenceNr]) even if you don't see it in the definition. The suggested index will be also bigger as it is not defined as unique, and the sizes of other fields are the same: Ordering is bigint, SequenceNr is bigint.
It's wrong to think that in index on 2 fields the second field(Ordering) can be used to avoid the sort in the second query, it's not true.
For example the index on PersistenceID, Ordering can have rows like these:
msc:123, 100
msc:124, 5
msc:124, 6
msc:125, 1
I hope you clearly see that the index is ordered by PersistenceID, Ordering,
but the result of the second query is expected to be
msc:125, 1
msc:124, 5
msc:124, 6
msc:123, 100
So the SORT operator is needed, so this index will not be used.
Now to your question:
shouldn't lack of ORDER BY be used by the query analyzer as an
opportunity to build more efficient execution plan without ordering
constraints
Yes you are correct, without order by server is free to choose both the ordered and unordered scan, and yes you are right in this:
I also don't understand why using TOP without ORDER BY is a bad
practice in case I want ANY N rows from the result
When you don't need top N ordered by, because you just want to see what kind of records have 'msc:' in them, you should not add order by because it could cause a SORT in your plan.
And to your main question:
Then why does the first query take long time to execute?
The answer is: this was pure coincidence.
Your data is laying in way that the rows with 'msc:' in them go first, in the order defined by Ordering. And if you scan your index not in order they are just in the middle or at the end of the table.
If you seek for another pattern in PersistenceID the unordered scan will be faster

SQL performance advice

I know that it can be a tricky question and that it highly depends on context.
I have a database with a table containing around 10 millions lines (each line contains 4 varchar).
There is an index non clustered on the field (a varchar) used for the where.
I'm a bit confused because when selecting a line using a where on the indexed columns, it takes around a second to end.
Any advices to improve this response time ?
Would an indexed clustered be a good solution here?
Here is the table definition :
CREATE TABLE [dbo].[MYTABLE](
[ID] [uniqueidentifier] NOT NULL DEFAULT (newid()),
[CreationDate] [datetime] NOT NULL DEFAULT (getdate()),
[UpdateDate] [datetime] NULL,
[FIELD1] [varchar](9) NOT NULL,
[FIELD2] [varchar](max) NOT NULL,
[FIELD3] [varchar](1) NULL,
[FIELD4] [varchar](4) NOT NULL,
CONSTRAINT [PK_MYTABLE] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Here is the index definition :
CREATE UNIQUE NONCLUSTERED INDEX [IX_FIELD1] ON [dbo].[MYTABLE]
(
[FIELD1] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
And here is the query I use (very basic) :
SELECT * FROM MYTABLE WHERE FIELD1 = 'DATA'
For this case, since you are selecting all columns (*), changing your nonclustered index to a clustered one will improve the time, since accessing additional columns (columns not included in the index, not ordered nor by INCLUDE) from a nonclustered index will need to retrieve another page with the actual data.
Since you can't have more than 1 clustered index by table, you will have to drop the existing one (the PRIMARY KEY in this case) and create it afterwards.
ALTER TABLE dbo.MYTABLE DROP [PK_MYTABLE] -- This might fail if you have foreign keys
ALTER TABLE dbo.MYTABLE ADD CONSTRAINT PK_MYTABLE PRIMARY KEY NONCLUSTERED (ID)
DROP INDEX [IX_FIELD1] ON [dbo].[MYTABLE]
CREATE CLUSTERED INDEX [IX_FIELD1] ON [dbo].[MYTABLE] (FIELD1)
Indexes access times might greatly vary depending on their fragmentation also (if you have many inserts, deletes or updates with values that aren't bigger or lower than the last/first one).
Also keep in mind that if you are doing another operation like joins, function calls or additional WHERE filters, the enging might decide not to use the indexes.
If you are certain that the column used in your WHERE clause is unique, you may create a UNIQUE CLUSTERED INDEX on that column.
If values in that column are not unique, you can implement COVERING INDEXES. For example, if you had this index:
CREATE NONCLUSTERED INDEX IX_Column4 ON MyTable(Column4)
INCLUDE (Column1, Column2);
When executing the following query:
SELECT Column1, Column2 FROM MyTable WHERE Column4 LIKE 'Something';
You would (most likely) be using the IX_Column4 index. But when executing something like:
SELECT Column1, Column2, Column3 FROM MyTable WHERE Column4 LIKE 'Something';
You will not be benefited from the advantages that this kind of index have to offer.
If rows in your table are regurlarly INSERTED, DELETED or UPDATED you should check for INDEX FRAGMENTATION and REBUILD or REORGANIZE them.
I would recommend the following articles in case you want to lear more about indexes:
Available index types, check out the guidelines offered for each kind of index.
SQL Server Index Design Guide
Hope it helps.

How to create temporary non cluster index on my searching table Columns and remove it after select operation

I have 1 stored procedure which can return more than 1000 of records.
I want to Create temporary Non cluster index on my searching table columns as because i have heard that non cluster index will speed up data retrieval (SELECT) operations and slow down data updates(UPDATE and DELETE) operations and remove that non cluster index after my Operation have been completed.
Like I am having 2 Tables UserDetails and CategoryMaster and my searching fieds:
UserDetails(ServiceDescription,Skills)
CategoryMaster(Name)
This is my stored procedure:
ALTER PROCEDURE [dbo].[SearchworkerProcedure1]
#SearchKeyword nvarchar(70)
AS
DECLARE #Keywords TABLE
(
sno INT IDENTITY(1,1) PRIMARY KEY,
keyname VARCHAR(100),
Shortkeyname as substring(keyname,0,5)
)
DECLARE #SearchKeywordTable TABLE
(
[VendorId] [int] NULL,
[ServiceDescription] [nvarchar](max) NULL,
[Skills] [nvarchar](max) NULL
)
INSERT INTO #Keywords SELECT * FROM [splitstring_to_table](#SearchKeyword,',')
BEGIN
--My Query
END
My UserDetails Create Query:
CREATE TABLE [dbo].[UserDetails](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Fullname] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_UserDetails] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
So is that Possible to create temporary non cluster index in stored procedure and remove that non cluster index after Select Operation????
The temporary index is a bad idea. To index a table, a table needs to be scanned - just as it would if you were doing a SELECT on it with the current setup.
Permanent (and temporary) indexes on the fields that you have mentioned would have absolutely no effect whatsoever because your search criteria has leading wildcards. This will result in a table scan anyway.
The only place where indexes may help are on your foreign key columns used in joins. However without having any meaningful sizing stats in regards to yoiur tables, it's a guess.

table needs indexes to improve performance

I was having timeout issue when giving long period of DateTime in below query (query runs from c# application). Table had 30 million rows with a non-clustered index on ID(not a primary key).
Found that there was no primary key so I recently updated ID as Primary Key, it’s not giving me timeout now. Can anyone help me for the below query to create index on more than one key for future and also if I remove non clustered index from this table and create on more than one column? Data is increasing rapidly and need improvement on performace
select
ID, ReferenceNo, MinNo, DateTime, DataNo from tbl1
where
DateTime BETWEEN '04/09/2013' AND '20/11/2013'
and ReferenceNo = 4 and MinNo = 3 and DataNo = 14 Order by ID
this is the create script
CREATE TABLE [dbo].[tbl1]( [ID] [int] IDENTITY(1,1) not null, [ReferenceNo] [int] not null, [MinNo] [int] not null, [DateTime] [datetime] not null, [DataNo] [int] not null, CONSTRAINT [tbl1_pk] PRIMARY KEY CLUSTERED ([ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS
= ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY]
Its hard to tell which index you should use without knowing more about your database and how its used.
You may want to change the ID column to a clustered index. If ID is an identity column you will get very few page splits while inserting new data. It will however require you to rebuild the table and that may be a problem depending on your usage of the database. You will be looking at some downtime.
If you want a covering index it should look something like this:
CREATE NONCLUSTERED INDEX [MyCoveringIndex] ON tbl1
(
[ReferenceNo] ASC,
[MinNo] ASC,
[DataNo] ASC,
[DateTime ] ASC
)
Its no need to include ID as a column as its already in the clusted index (clusted index columns will be included in all other indexes). This will however use up a whole lot of space (somewhere in the range of 1GB if the columns above are of the types int and datetime). It will also affect your insert, update and delete performance on the table in (most cases) a negative way.
You can create the index in online mode if you are using Enterprice Edition of SQL server. In all other cases there will be a lock on the table while creating the index.
Its also hard to know what other queries that are made against the table. You may want to tweek the order of the columns in the index to better match other queries.
Indexing all fields would be fastest, but would likely waste a ton of space. I would guess that a date index would provide the most benefit with the least storage capacity cost because the data is probably evenly spread out over a large period of time. If the MIN() MAX() dates are close together, then this will not be as effective:
CREATE NONCLUSTERED INDEX [IDX_1] ON [dbo].[tbl1] (
[DateTime] ASC
)
GO
As a side note, you can use SSMSE's "Display Estimated Execution Plan" which will show you what the DB needs to do to get your data. It will suggest missing indexes and also provide CREATE INDEX statements. These suggestions can be quite wasteful, but they will give you an idea of what is taking so long. This option is in the Standard Toolbar, four icons to the right from "Execute".

How to speed up Xpath performance in SQL Server when searching for an element with specific text

I am supposed to remove whole rows and part of XML-documents from a table with an XML column based on a specific value in the XML column. However the table contains millions of rows and gets locked when I perform the operation. Currently it will take almost a week to clean it up, and the system is too critical to be taken offline for so long.
Are there any ways to optimize the xpath expressions in this script:
declare #slutdato datetime = '2012-03-01 00:00:00.000'
declare #startdato datetime = '2000-02-01 00:00:00.000'
declare #lev varchar(20) = 'suppliername'
declare #todelete varchar(10) = '~~~~~~~~~~'
CREATE TABLE #ids (selId int NOT NULL PRIMARY KEY)
INSERT into #ids
select id from dbo.proevesvar
WHERE leverandoer = #lev
and proevedato <= #slutdato
and proevedato >= #startdato
begin transaction /* delete whole rows */
delete from dbo.proevesvar
where id in (select selId from #ids)
and ProeveSvarXml.exist('/LaboratoryReport/LaboratoryResults/Result[Value=sql:variable(''#todelete'')]') = 1
and Proevesvarxml.exist('/LaboratoryReport/LaboratoryResults/Result[Value!=sql:variable(''#todelete'')]') = 0
commit
go
begin transaction /* delete single results */
UPDATE dbo.proevesvar SET ProeveSvarXml.modify('delete /LaboratoryReport/LaboratoryResults/Result[Value=sql:variable(''#todelete'')]')
where id in (select selId from #ids)
commit
go
The table definitions is:
CREATE TABLE [dbo].[ProeveSvar](
[ID] [int] IDENTITY(1,1) NOT NULL,
[CPRnr] [nchar](10) NOT NULL,
[ProeveDato] [datetime] NOT NULL,
[ProeveSvarXml] [xml] NOT NULL,
[Leverandoer] [nvarchar](50) NOT NULL,
[Proevenr] [nvarchar](50) NOT NULL,
[Lokationsnr] [nchar](13) NOT NULL,
[Modtaget] [datetime] NOT NULL,
CONSTRAINT [PK_ProeveSvar] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [IX_ProeveSvar_1] UNIQUE NONCLUSTERED
(
[CPRnr] ASC,
[Lokationsnr] ASC,
[Proevenr] ASC,
[ProeveDato] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
The first insert statement is very fast. I believe I can handle the locking by committing 50 rows at a time, so other requests can be handled in between my transactions.
The total number of rows for this supplier is about 5.5 million and the total rowcount in the table is around 13 million.
I've not really used xpath within SQL server before, but something which stands out is that you're doing lots of reads and writes in the same command (in the second statement). If possible, change your queries to..
CREATE TABLE #ids (selId int NOT NULL PRIMARY KEY)
INSERT into #ids
select id from dbo.proevesvar
WHERE leverandoer = #lev
and proevedato <= #slutdato
and proevedato >= #startdato
and ProeveSvarXml.exist('/LaboratoryReport/LaboratoryResults/Result[Value=sql:variable(''#todelete'')]') = 1
and Proevesvarxml.exist('/LaboratoryReport/LaboratoryResults/Result[Value!=sql:variable(''#todelete'')]') = 0
begin transaction /* delete whole rows */
delete from dbo.proevesvar
where id in (select selId from #ids)
This means that the first query will only create the new temporary table, and not write anything back, which will take slightly longer than your original, but the key thing is that your second query will ONLY be deleting records based on what's in your temporary table.
What you'll probably find is because it's deleting records, it's constantly re-building indices, and causing the reads to also be slower.
I'd also delete/disable any indices/constraints that don't actually help your query run.
Also, you're creating your clustered primary key on the ID, which isn't always the best thing to do. Especially if you're doing lots of date scans.
Can you also view the estimated execution plan for the top query, it would be interesting to see the order in which it checks the conditions. If it's doing the date first, then that's fine, but if it's doing the xpath before it checks the date, you might have to separte it into 3 queries, or add a new clustered index on 'proevedato,id'. This should force the query to only run the xpath for records which actually match the date.
Hope this helps.

Resources