Parallel clustered columnstore index build - sql-server

Is it possible to build a clustered columnstore index using several processes while keeping the row order of the clustered rowstore index? The table is partitioned. Currently I am converting the rowstore index to columnstore like this:
CREATE CLUSTERED COLUMNSTORE INDEX [index_name] ON schema.table
WITH (
    DROP_EXISTING = ON,
    MAXDOP = 1
);
If I increase MAXDOP, the row order isn't kept. I am thinking about creating a separate table for every partition and then using partition switching. Is there a better way?
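The partition-switching idea in the question could look roughly like the following. This is a minimal sketch under stated assumptions: every object name (pf_Year, ps_Year, dbo.Fact, dbo.Fact_CCI, dbo.Fact_Stage2) is hypothetical, all partitions sit on PRIMARY, and the per-partition part is meant to run once per partition in separate sessions, so each single-threaded build runs alongside the others. Since SWITCH is a metadata-only operation, the rowgroups built on the staging table should move into the target unchanged; the target partition must be empty.

```sql
-- Hypothetical setup so the sketch is self-contained.
CREATE PARTITION FUNCTION pf_Year (INT) AS RANGE RIGHT FOR VALUES (2022, 2023);
CREATE PARTITION SCHEME ps_Year AS PARTITION pf_Year ALL TO ([PRIMARY]);

-- Existing rowstore table, clustered on the partitioning column.
CREATE TABLE dbo.Fact (YearKey INT NOT NULL, Amount MONEY NOT NULL) ON ps_Year (YearKey);
CREATE CLUSTERED INDEX CX_Fact ON dbo.Fact (YearKey) ON ps_Year (YearKey);
INSERT INTO dbo.Fact (YearKey, Amount) VALUES (2021, 10), (2022, 20), (2023, 30);

-- Empty target table that already has a clustered columnstore index.
CREATE TABLE dbo.Fact_CCI (YearKey INT NOT NULL, Amount MONEY NOT NULL) ON ps_Year (YearKey);
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Fact ON dbo.Fact_CCI ON ps_Year (YearKey);

-- Per-partition part: here partition 2 (YearKey = 2022); run one copy per partition, each in its own session.
-- 1. Staging table with identical columns and a matching rowstore clustered index.
CREATE TABLE dbo.Fact_Stage2 (YearKey INT NOT NULL, Amount MONEY NOT NULL) ON ps_Year (YearKey);
CREATE CLUSTERED INDEX CX_Fact ON dbo.Fact_Stage2 (YearKey) ON ps_Year (YearKey);

-- 2. Move the partition out of the rowstore table (metadata only).
ALTER TABLE dbo.Fact SWITCH PARTITION 2 TO dbo.Fact_Stage2 PARTITION 2;

-- 3. Convert to columnstore single-threaded, as in the question, to keep the rowstore order.
CREATE CLUSTERED COLUMNSTORE INDEX CX_Fact ON dbo.Fact_Stage2
    WITH (DROP_EXISTING = ON, MAXDOP = 1)
    ON ps_Year (YearKey);

-- 4. Move the finished partition into the empty columnstore table.
ALTER TABLE dbo.Fact_Stage2 SWITCH PARTITION 2 TO dbo.Fact_CCI PARTITION 2;
```

Once every partition has been moved, the old rowstore table would be empty and the columnstore table could be renamed into its place.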

Related

Change Data Capture cannot be enabled on a table with a clustered columnstore index. Consider dropping clustered columnstore index

I get the following error when trying to enable CDC on a table with a clustered columnstore index:
Change Data Capture cannot be enabled on a table with a clustered columnstore index. Consider dropping clustered columnstore index
But I need both CDC and a clustered columnstore index on the same table.
Is there any workaround for this limitation?
The workaround is to enable CDC first and then create the clustered columnstore index:
1. Drop the clustered columnstore index, if it exists.
2. Enable CDC.
3. Create the clustered columnstore index.
Voilà! It works!
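The three steps might look like this in T-SQL. This is a sketch only: dbo.Sales and CCI_Sales are hypothetical names, sys.sp_cdc_enable_db needs sysadmin rights, and the final CREATE relies on the behavior reported in the answer rather than on documented support, so whether it succeeds may depend on the SQL Server version.

```sql
-- Hypothetical table dbo.Sales and index CCI_Sales.
-- 1. Drop the clustered columnstore index if it exists (DROP INDEX IF EXISTS is SQL Server 2016+).
DROP INDEX IF EXISTS CCI_Sales ON dbo.Sales;

-- 2. Enable CDC for the database (skip if already enabled) and then for the table.
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Sales',
    @role_name     = NULL;   -- no gating role for access to the change data

-- 3. Recreate the clustered columnstore index on the now CDC-tracked table.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Sales ON dbo.Sales;

-- Check: is_tracked_by_cdc should be 1 for the table.
SELECT name, is_tracked_by_cdc
FROM sys.tables
WHERE object_id = OBJECT_ID(N'dbo.Sales');
```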

Adding a column to a table with nonclustered columnstore index

Adding a column to a table that has a nonclustered columnstore index on it is not a problem, but is it possible to add a column to the nonclustered columnstore index itself without dropping and recreating it?
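For reference, the closest single-statement option is to redefine the index with CREATE ... WITH (DROP_EXISTING = ON). This is a sketch with hypothetical names (dbo.Orders, NCCI_Orders), and it does not avoid the cost the question is worried about: the whole columnstore index is still rebuilt, the statement just replaces a separate DROP INDEX followed by CREATE.

```sql
-- Hypothetical table with a nonclustered columnstore index on two columns.
CREATE TABLE dbo.Orders
(
    OrderID    INT   NOT NULL PRIMARY KEY,
    CustomerID INT   NOT NULL,
    Amount     MONEY NOT NULL
);
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders ON dbo.Orders (OrderID, CustomerID);

-- Adding the column to the table itself works.
ALTER TABLE dbo.Orders ADD Discount MONEY NULL;

-- Redefine the index with the new column in one statement; the index is rebuilt.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders ON dbo.Orders (OrderID, CustomerID, Discount)
    WITH (DROP_EXISTING = ON);
```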

Multiple Clustered Indexes on a Single Table?

I thought a table could have only one clustered index but multiple nonclustered indexes, yet with the code below I can apparently add more than one clustered index to my table.
CREATE CLUSTERED INDEX TBL_MULTI_LC_HIST ON dbo.TBL_MULTI_LC_HIST (ID,AsOfDate)
Is this completely wrong?
It isn't possible to create multiple clustered indexes on a single table. From the docs:
Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be stored in only one order.
For example, this will fail:
CREATE TABLE Thing
(
Column1 INT NOT NULL,
Column2 INT NOT NULL
)
CREATE CLUSTERED INDEX IX1 ON dbo.Thing(Column1)
CREATE CLUSTERED INDEX IX2 ON dbo.Thing(Column2)
Error:
Cannot create more than one clustered index on table 'dbo.Thing'. Drop the existing clustered index 'IX1' before creating another.
Example: http://www.sqlfiddle.com/#!18/53a63/1
You can, however, have a single clustered index with multiple key columns, which is perhaps the source of the confusion:
CREATE CLUSTERED INDEX IX3 ON dbo.Thing(Column1, Column2)
You can only have one clustered index. A clustered index IS the rows: its leaf level contains all the columns. Every other index just contains a pointer to the clustered row. The key of the clustered index defines the logical order of the rows.
If there is no clustered index, the rows are stored in a heap, with no defined order.
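The heap/clustered distinction is visible in the sys.indexes catalog view: a heap appears as a row with index_id 0, and the clustered index as index_id 1, so a table can never show more than one clustered entry. A small sketch using a hypothetical table dbo.HeapDemo:

```sql
-- Hypothetical table; no index yet, so it is a heap.
CREATE TABLE dbo.HeapDemo (Column1 INT NOT NULL, Column2 INT NOT NULL);

-- One row: index_id = 0, name = NULL, type_desc = HEAP.
SELECT index_id, name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.HeapDemo');

CREATE CLUSTERED INDEX IX1 ON dbo.HeapDemo (Column1);

-- The heap row is replaced by index_id = 1, name = IX1, type_desc = CLUSTERED.
SELECT index_id, name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.HeapDemo');
```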

Why does the QO choose a clustered index scan instead of a table scan?

If I have a query like this:
SELECT * FROM tTable
where tTable has no indexes, a table scan happens, as expected. If I add a clustered index on some column, the QO uses a clustered index scan for this query. Why is a clustered index scan preferred over a table scan in this case?
If I add a clustered index on some column then QO decides to use clustered index scan on this query
Because when you create a clustered index on a table, the data in the table is rearranged in index order, so the table itself is the clustered index. This is also why you can't have two clustered indexes on the same table.
To summarize: when you create a clustered index, there is only one structure, not two (a clustered index and a table).
The query is "give me all rows and all columns", which means "read every row", which is a scan.
There is nothing to do an index seek on, because there is no WHERE clause.
Unlike this, which can use a clustered index seek because it filters on the clustered key:
SELECT * FROM tTable WHERE PrimaryClusteredKeyValue = 45
The next query may use a nonclustered index seek followed by a key lookup into the clustered index, or it may still scan the clustered index because you ask for all columns. It depends on how many rows match 'gbn':
SELECT * FROM tTable WHERE NonClusteredOtherColumnValue = 'gbn'
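To compare the plans yourself, something like the following could be run in SSMS or sqlcmd (GO is a client-side batch separator, not T-SQL). It is a sketch: the column types and the Filler column of dbo.tTable are hypothetical, and with SHOWPLAN_TEXT on, the queries are compiled but not executed. The plan chosen for the last query depends on the data and statistics.

```sql
-- Hypothetical schema for tTable; column names taken from the answer.
CREATE TABLE dbo.tTable
(
    PrimaryClusteredKeyValue     INT         NOT NULL,
    NonClusteredOtherColumnValue VARCHAR(20) NOT NULL,
    Filler                       CHAR(200)   NULL,
    CONSTRAINT PK_tTable PRIMARY KEY CLUSTERED (PrimaryClusteredKeyValue)
);
CREATE NONCLUSTERED INDEX IX_tTable_Other ON dbo.tTable (NonClusteredOtherColumnValue);
GO
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO
SELECT * FROM dbo.tTable;                                            -- no predicate: clustered index scan
SELECT * FROM dbo.tTable WHERE PrimaryClusteredKeyValue = 45;        -- equality on the clustered key: seek
SELECT * FROM dbo.tTable WHERE NonClusteredOtherColumnValue = 'gbn'; -- seek + key lookup, or a scan
GO
SET SHOWPLAN_TEXT OFF;
GO
```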

How to speed up recreating a clustered index

In SQL Server, there is no option to alter a clustered index if I want to add a new column to its definition. The only option is to drop the clustered index and create it again with the new definition.
From what I understand, dropping and creating a clustered index is very costly and time-consuming for high-volume tables.
Recreating the clustered index rebuilds all the nonclustered indexes on the table, which can be very expensive.
My question is: is there any way to speed up recreating the clustered index?
The one workaround I can think of is to drop all nonclustered indexes before recreating the clustered index. Will this approach work?
Use
CREATE .... WITH (DROP_EXISTING = ON)
Instead of
DROP ...
CREATE ...
This means the nonclustered indexes only have to be updated once (to include the new key column), not twice: first to use the physical RID after the drop, and then again to use the new clustered index key.
The DROP_EXISTING clause tells SQL Server that the existing clustered index is being dropped but that a new one will be added in its place, so SQL Server can defer updating the nonclustered indexes until the new clustered index is in place.
Additionally, SQL Server won't rebuild the nonclustered indexes at all if the clustered index key doesn't change and is defined as UNIQUE, which is a non-obvious performance benefit of defining a clustered index as UNIQUE.
Example
CREATE TABLE #T
(
A INT,
B INT,
C INT
)
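-- Original clustered index on column A
CREATE CLUSTERED INDEX IX ON #T(A)
-- Recreate it with B added to the key in one step; nonclustered indexes (none here) are updated only once
CREATE CLUSTERED INDEX IX ON #T(A,B) WITH (DROP_EXISTING = ON)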
DROP TABLE #T
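Adding a nonclustered index to the example makes both cases from the answer concrete. This is a sketch with a hypothetical permanent table dbo.DropExistingDemo (used instead of #T so the catalog views in the current database can be queried); the comments restate the documented DROP_EXISTING behavior, which the script itself does not measure.

```sql
CREATE TABLE dbo.DropExistingDemo (A INT NOT NULL, B INT NOT NULL, C INT NOT NULL);
CREATE UNIQUE CLUSTERED INDEX CX_Demo ON dbo.DropExistingDemo (A);
CREATE NONCLUSTERED INDEX IX_Demo_C ON dbo.DropExistingDemo (C);

-- Case 1: the key changes (B is added), so IX_Demo_C is rebuilt,
-- but only once, to carry the new clustered key (A, B).
CREATE UNIQUE CLUSTERED INDEX CX_Demo ON dbo.DropExistingDemo (A, B)
    WITH (DROP_EXISTING = ON);

-- Case 2: same name, same unique key, only an option changes,
-- so the nonclustered index does not need to be rebuilt.
CREATE UNIQUE CLUSTERED INDEX CX_Demo ON dbo.DropExistingDemo (A, B)
    WITH (DROP_EXISTING = ON, FILLFACTOR = 90);
```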
