How to speed up recreating cluster index - sql-server

In SQL Server, there is no option to alter a clustered index if I want to add a new column to its definition. The only option is to drop the clustered index and create it again with the new definition.
From what I understand, dropping and creating a clustered index is very costly and time-consuming for high-volume tables.
Recreating the clustered index rebuilds all the nonclustered indexes on the table, which can be very expensive.
My question to this forum: is there any way to speed up recreating a clustered index?
The one workaround I can think of is to drop all nonclustered indexes before recreating the clustered index. Will this approach work?

Use
CREATE .... WITH (DROP_EXISTING = ON)
Instead of
DROP ...
CREATE ...
This means the nonclustered indexes only have to be updated once (to include the new key column), not twice: first to use the physical RID and then again to use the new CI key.
The DROP_EXISTING clause tells SQL Server that the existing clustered index is being dropped but that a new one will be added in its place, letting SQL Server defer updating the nonclustered indexes until the new clustered index is in place.
Additionally, SQL Server won't rebuild the nonclustered indexes at all if the clustered index key doesn't change and is defined as UNIQUE, which is a non-obvious performance benefit of defining a clustered index as UNIQUE.
Example
CREATE TABLE #T
(
A INT,
B INT,
C INT
)
CREATE CLUSTERED INDEX IX ON #T(A)
CREATE CLUSTERED INDEX IX ON #T(A,B) WITH (DROP_EXISTING = ON)
DROP TABLE #T
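To illustrate the UNIQUE point, here is a minimal sketch with a nonclustered index present. The names (#T2, CIX, NIX_C) and the FILLFACTOR value are made up for illustration; this is not a benchmark.

```sql
-- Hypothetical table and index names, for illustration only.
CREATE TABLE #T2
(
    A INT NOT NULL,
    B INT NOT NULL,
    C INT NOT NULL
);

CREATE UNIQUE CLUSTERED INDEX CIX ON #T2 (A, B);
CREATE NONCLUSTERED INDEX NIX_C ON #T2 (C);

-- Same key columns and still UNIQUE: only the clustered index is rebuilt
-- (here with a different fill factor); NIX_C should not need a rebuild.
CREATE UNIQUE CLUSTERED INDEX CIX ON #T2 (A, B)
    WITH (DROP_EXISTING = ON, FILLFACTOR = 90);

-- Key changes from (A, B) to (A, B, C): the nonclustered index has to be
-- rebuilt, but only once, after the new clustered index is in place.
CREATE UNIQUE CLUSTERED INDEX CIX ON #T2 (A, B, C)
    WITH (DROP_EXISTING = ON);

DROP TABLE #T2;
```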

Related

Clustering key goes up to tree with non-clustered index in SQL Server

It seems that in SQL Server before version 2019, the clustering key is carried up into the tree structure of every non-unique nonclustered index. With a wider, multi-column clustering key, you get a much wider and taller tree that costs more storage and memory.
Because of that, we used to separate the PK from the clustering key. My questions are:
Have SQL Server 2019 and Azure changed nonclustered indexing or not?
Heaps do not have a clustering key at all, so how does indexing work on heaps?
"Have SQL Server 2019 and Azure changed nonclustered indexing or not?"
No. This behavior is older than many people on this site.
"Because of that, we used to separate the PK from the clustering key"
That is an almost-always-unnecessary micro-optimization.
"Heaps do not have a clustering key at all, so how does indexing work on heaps?"
Non-unique nonclustered indexes always have the row locator as part of the index key. For heaps, the row locator is the RID (FileNo, PageNo, SlotNo).
If you want to move the rows out of the leaf level of your wide PK, it's typically for a very large table, so moving the rows to a clustered columnstore index (CCI) can be a good option. To do that, drop the clustered PK (this leaves the table as a heap), create the CCI, and then recreate the PK as a nonclustered PK. For example:
drop table if exists t
go
create table t(id int not null, a int, b int)
alter table t
add constraint pk_t
primary key clustered(id)
go
alter table t drop constraint pk_t
create clustered columnstore index cci_t on t
alter table t
add constraint pk_t
primary key nonclustered (id)
If you have other nonclustered indexes, drop them first, and only recreate them afterwards if you really need them, i.e. unique indexes, indexes supporting a foreign key, or indexes needed to support specific queries. A CCI typically doesn't need many indexes, since it's so efficient to scan.

execution plan suggesting to add an index on columns which are not part of where clause

I am running the following query in SSMS, and the execution plan suggests adding an index that includes columns which are not part of the WHERE clause. I was planning to add an index on only the two columns used in the WHERE clause (OID and TransactionDate).
SELECT
[OID] , -- this is not a PK; the primary key column is not part of this query
[CustomerNum] ,
[Amount] ,
[TransactionDate] ,
[CreatedDate]
FROM [dbo].[Transaction]
WHERE OID = 489
AND TransactionDate > '01/01/2018 06:13:06.46';
Index suggestion
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Transaction] ([OID],[TransactionDate])
INCLUDE ([CustomerNum],[Amount],[CreatedDate])
Updated
Do I need to include the other columns? Data is imported into that table by a back-end process using the SqlBulkCopy class in .NET. I am wondering whether having a nonclustered index on all columns would reduce performance. (My table has a PK column called TransactionID, which is not needed, but I keep it in case it's needed in the future; otherwise SqlBulkCopy works better with a heap. Another option is to drop and recreate the indexes before and after the SqlBulkCopy operation.)
The INCLUDE keyword specifies the non-key columns to be added to the leaf level of the nonclustered index.
This means that if you add this index and run the query again, SQL Server can get all the information it needs from the index, eliminating the need to perform a lookup in the table as well.
As a general rule of thumb, when SSMS suggests an index, create it. You can always drop it later if it doesn't help.
You don't need to add all table columns to your nonclustered index; the suggested index is good for the query provided. SQL Server database engine suggestions are usually really good.
The INCLUDE clause is what avoids the key lookup, so the query can be answered by a nonclustered index seek alone.
All in all:
With no nonclustered index, the result is a clustered index scan.
With a nonclustered index on (OID, TransactionDate) and no included columns, the result is typically a nonclustered index seek plus a key lookup; if the optimizer estimates too many lookups, it may still choose to scan the clustered index instead.
With a nonclustered index with included columns, the result is a nonclustered index seek.
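As a rough sketch, the suggested index and the disable/rebuild option around the bulk load could look like this. The index name is hypothetical; the table and columns come from the suggestion above. Whether disabling the index actually speeds up the import depends on data volume, so treat this as something to measure rather than a recommendation.

```sql
-- Hypothetical index name; columns are taken from the missing-index suggestion.
CREATE NONCLUSTERED INDEX IX_Transaction_OID_TransactionDate
    ON [dbo].[Transaction] ([OID], [TransactionDate])
    INCLUDE ([CustomerNum], [Amount], [CreatedDate]);

-- Before the bulk load: stop maintaining the nonclustered index.
ALTER INDEX IX_Transaction_OID_TransactionDate
    ON [dbo].[Transaction] DISABLE;

-- ... run the SqlBulkCopy import here ...

-- After the load: rebuilding a disabled index re-enables it.
ALTER INDEX IX_Transaction_OID_TransactionDate
    ON [dbo].[Transaction] REBUILD;
```

Only disable nonclustered indexes this way; disabling a clustered index makes the table inaccessible until it is rebuilt.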

Parallel clustered columnstore index build

Is it possible to build a clustered columnstore index using several processes while keeping the row order of the clustered rowstore index? The table is partitioned. Currently I am converting the rowstore index to columnstore like this:
CREATE CLUSTERED columnstore INDEX [index_name] ON schema.table
WITH (
drop_existing = ON
,MAXDOP = 1
)
If I increase MAXDOP, the row order isn't kept. I am thinking about creating a separate table for every partition and then doing partition switching. Maybe there is a better way?

Modifying an existing index: Create index with drop_existing=on vs. alter index statement

I am new to index maintenance. I see most of our indexes are modified using create index with drop_existing = on. For example:
create nonclustered index ixn_SomeTable__SomeIndexName_ic1
on dbo.SomeTable ( Column1 )
include ( Column2, IncludeThisNewColumn3 )
with ( sort_in_tempdb = on, drop_existing = on, online = on, fillfactor = 95 ) on [SomeFileGroup]
go
but I see T-SQL also has an ALTER INDEX statement.
Questions:
What does drop_existing = on do? Does it just drop the index if it exists and recreate it, or does it save on rebuilding the index (re-indexing data etc.) if the modifications don't really need a rebuild (for example, including a column in a nonclustered index)?
What is the difference between create index with drop_existing = on and alter index? When do I absolutely have to use one or the other?
Does the index become unavailable while the modifications are in progress, and is there a way to keep the unavailable time to a minimum?
"What does drop_existing = on do? Does it just drop the index if it exists and recreate it, or does it save on rebuilding the index (re-indexing data etc.)"
The DROP_EXISTING clause tells SQL Server that the existing clustered index is being dropped but that a new one will be added in its place, letting SQL Server defer updating the nonclustered indexes until the new clustered index is in place.
"does it save on rebuilding the index (re-indexing data etc.) if the modifications don't really need a rebuild (for example, including a column in a nonclustered index)?"
SQL Server won't rebuild the nonclustered indexes at all if the clustered index key doesn't change and is defined as UNIQUE.
"What is the difference between create index with drop_existing = on and alter index? When do I absolutely have to use one or the other?"
ALTER INDEX is used to rebuild, reorganize, or disable an existing index, or to set its options. It cannot change the index definition (key columns, included columns, column order) or move the index to another filegroup; for those changes you have to use CREATE INDEX with DROP_EXISTING = ON.
"Does the index become unavailable while the modifications are in progress, and is there a way to keep the unavailable time to a minimum?"
When you combine DROP_EXISTING with ONLINE = ON, as in your example, the existing index stays available for most of the operation. An exclusive lock is required at the very end, but this blocking is very short.
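A short sketch of the two statements side by side, using the index and column names from the question. The option values are illustrative, and ONLINE = ON assumes an edition that supports online index operations.

```sql
-- ALTER INDEX: rebuilds the index with its current definition.
-- Key and included columns cannot be changed this way.
ALTER INDEX ixn_SomeTable__SomeIndexName_ic1
    ON dbo.SomeTable
    REBUILD WITH (ONLINE = ON, FILLFACTOR = 95);

-- CREATE INDEX ... DROP_EXISTING: replaces the index in one step and can
-- change its definition, here adding IncludeThisNewColumn3 to INCLUDE.
CREATE NONCLUSTERED INDEX ixn_SomeTable__SomeIndexName_ic1
    ON dbo.SomeTable (Column1)
    INCLUDE (Column2, IncludeThisNewColumn3)
    WITH (DROP_EXISTING = ON, ONLINE = ON);
```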
References:
https://stackoverflow.com/a/41096665/2975396
https://msdn.microsoft.com/en-us/library/ms188783.aspx
http://sqlmag.com/database-high-availability/use-create-indexs-dropexisting-clause-when-recreating-clustered-index

SQL Server: changing the filegroup of an index (which is also a PK)

We're doing a cleanup on a group of databases, and the first step is to get all indexes in each database into the correct filegroup.
Currently, those indexes are mixed between the DATA filegroup and the INDEXES filegroup; they all need to move to the INDEXES filegroup.
This can be done easily enough in a script, I guess, but how do you best handle an index that enforces a primary key?
The following command
DROP INDEX table.indexname
produces the error:
An explicit DROP INDEX is not allowed on index 'Answer.PK_Answer'. It is being used for PRIMARY KEY constraint enforcement.
So what is the best way? Do I need to drop the Primary Key, then drop the Index, then re-create the primary key and finally re-create the index on the correct filegroup? Are there any drawbacks to this method?
You can try the below statement which drops and recreates the index on the index filegroup
CREATE CLUSTERED INDEX PK_Answer
ON tablename(Answer)
WITH (DROP_EXISTING = ON);
Since he has a primary key:
CREATE UNIQUE CLUSTERED INDEX PK_Answer ON tablename(Answer) WITH (DROP_EXISTING = ON);
In case anyone else needs this information (I did): add the filegroup at the end if you wish to move the recreated primary key index to another location. Neither of the previous answers included this part:
CREATE UNIQUE CLUSTERED INDEX PK_TableName_Answer ON TableName(Answer) WITH(DROP_EXISTING = ON) ON [INDEX];
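To find which indexes still need moving, a query along these lines against the catalog views may help. It is a sketch that assumes the target filegroup is named INDEXES, as in the question; adjust the name to match your database.

```sql
-- Lists indexes on user tables that are not yet on the assumed target filegroup.
SELECT
    OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
    OBJECT_NAME(i.object_id)        AS table_name,
    i.name                          AS index_name,
    i.type_desc,
    i.is_primary_key,
    ds.name                         AS data_space_name
FROM sys.indexes AS i
JOIN sys.data_spaces AS ds
    ON ds.data_space_id = i.data_space_id
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND i.type > 0               -- skip heaps
  AND ds.name <> N'INDEXES'    -- assumed target filegroup name
ORDER BY schema_name, table_name, index_name;
```

Keep in mind that a clustered index is the table itself, so moving a clustered primary key to another filegroup moves the table's data along with it.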
