How to drop a clustered columnstore index? - sql-server

How can I drop a clustered columnstore index on a table?
I am trying to alter the length of a column, but getting this error:
The statement failed because a secondary dictionary reached the maximum size limit.
Consider dropping the columnstore index, altering the column, then creating a new columnstore index.
I have a table that looks like this, roughly:
CREATE TABLE [dim].[Ticket]
(
[ID] [bigint] NULL,
[Rev] [int] NULL,
[Timestamp] [datetime2](7) NULL,
[Title] [varchar](260) NULL,
[Description] [varchar](4005) NULL
)
WITH
(
DISTRIBUTION = HASH ( [ID] ),
CLUSTERED COLUMNSTORE INDEX
)
When I try variations of this recommendation:
https://learn.microsoft.com/en-us/sql/t-sql/statements/drop-index-transact-sql?view=sql-server-ver16
I just get errors.

I checked that this works on Synapse too. Discover the CCI's name with
select * from sys.indexes where object_id = object_id('dim.Ticket')
then drop it
drop index ClusteredIndex_fdddc3c574214a2096190cbc54f58cc4 on dim.Ticket
You'll then have a heap. When you're ready re-compress the table with
create clustered columnstore index cci_dim_ticket on dim.Ticket
But it would be more efficient to create a new table with a CTAS, and then rename and drop the old one. Dropping the CCI actually requires rewriting the table as an uncompressed heap, which you can skip with CTAS.
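A minimal sketch of the CTAS route, assuming a Synapse dedicated SQL pool and the table definition from the question. The new table name dim.Ticket_new, the interim name Ticket_old, and the target length varchar(8000) are assumptions for illustration only:

```sql
-- Build the replacement table directly as a clustered columnstore,
-- widening [Description] in the SELECT (8000 is an assumed target length).
CREATE TABLE dim.Ticket_new
WITH
(
    DISTRIBUTION = HASH ( [ID] ),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT
    [ID],
    [Rev],
    [Timestamp],
    [Title],
    CAST([Description] AS varchar(8000)) AS [Description]
FROM dim.Ticket;

-- Swap the tables, then drop the original once the new one is verified.
RENAME OBJECT dim.Ticket TO Ticket_old;
RENAME OBJECT dim.Ticket_new TO Ticket;
DROP TABLE dim.Ticket_old;
```

Any statistics, permissions, or views tied to the old table would need to be recreated or checked separately.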

Related

SQL Server create clustered index on nvarchar column, enforce sorting

I want to have a small table with two columns [Id] [bigint] and [Name] [nvarchar](63). The Name column is used for tags and it will contain all tags that exist.
I want to force an alphabetical sorting by the Name column so that a given tag is found more quickly.
Necessary points are:
The Id is my primary key, I use it e.g. for foreign keys.
The Name is unique as well.
I want to sort by Name alphabetically.
I need the SQL command for creating the constraints since I use scripts to create the table.
I know you can sort the table by using a clustered index, but I know that the table is not necessarily in that order.
My query looks like this but I don't understand how to create the clustered index on Name but still keep the Id as Primary Key:
IF NOT EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[Tags]')
AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[Tags]
(
[Id] [bigint] IDENTITY(1,1) PRIMARY KEY NOT NULL,
[Name] [nvarchar](63) NOT NULL,
CONSTRAINT AK_TagName UNIQUE(Name)
)
END
Edit:
I decided to follow paparazzo's advice. So if you have the same problem make sure you read his answer as well.
You should NOT do what you want to do.
Let the Id identity be the clustered PK. It (under normal use) will not fragment.
A table has no natural order. You have to use ORDER BY to get an order. Yes, data is typically presented in PK order, but that is just a convenience the query optimizer may or may not use.
Just put a non clustered unique index on Name and sort by it in the select.
Do you really need bigint? That implies a massive table.
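As a rough sketch of that advice, using the table from the question (the constraint name PK_Tags is an assumption), the identity stays the clustered primary key, Name gets a unique nonclustered index, and ordering is requested explicitly in the query:

```sql
CREATE TABLE [dbo].[Tags]
(
    [Id]   [bigint] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](63) NOT NULL,
    -- Clustered PK on the ever-increasing identity column.
    CONSTRAINT PK_Tags PRIMARY KEY CLUSTERED ([Id] ASC),
    -- Unique nonclustered index on Name, usable for seeks and ordered scans.
    CONSTRAINT AK_TagName UNIQUE NONCLUSTERED ([Name] ASC)
);

-- Order is only guaranteed when ORDER BY is stated.
SELECT [Id], [Name]
FROM [dbo].[Tags]
ORDER BY [Name];
```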
You can specify that the primary key is NONCLUSTERED when declaring it as a constraint; you can then declare the unique key as the CLUSTERED index.
CREATE TABLE [dbo].[Tags] (
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](63) NOT NULL,
CONSTRAINT PK_Tag PRIMARY KEY NONCLUSTERED (Id ASC),
CONSTRAINT AK_TagName UNIQUE CLUSTERED (Name ASC)
);
Also, specifying ASC or DESC after the column name (within the key/index declaration) sets the index sort order. The default is ascending.

What's the difference between Primary key in table definitions vs. unique clustered index

What is the difference between defining the PK as part of the table definition vs. adding it as a unique clustered index? Using the example below, both tables show up as index_id 1 in sys.indexes, but only Table1 has is_primary_key = 1.
I thought this was the same, but SSMS only shows the key-symbol on table1
Thanks.
CREATE DATABASE IndexVsHeap
GO
USE [IndexVsHeap]
GO
-- Clustered index table
CREATE TABLE [dbo].[Table1](
[LogDate] [datetime2](7) NOT NULL,
[Database_Name] [nvarchar](128) NOT NULL,
[Cached_Size_MB] [decimal](10, 2) NULL,
[Buffer_Pool_Percent] [decimal](5, 2) NULL,
CONSTRAINT [PK_LogDate_DatabaseName] PRIMARY KEY(LogDate, Database_Name)
)
-- Table as heap, PK-CI added later, or did i?
CREATE TABLE [dbo].[Table2](
[LogDate] [datetime2](7) NOT NULL,
[Database_Name] [nvarchar](128) NOT NULL,
[Cached_Size_MB] [decimal](10, 2) NULL,
[Buffer_Pool_Percent] [decimal](5, 2) NULL
)
-- Adding PK-CI to table2
CREATE UNIQUE CLUSTERED INDEX [PK_LogDate_Database_Name] ON [dbo].[Table2]
(
[LogDate] ASC,
[Database_Name] ASC
)
GO
SELECT object_name(object_id), * FROM sys.index_columns
WHERE object_id IN ( object_id('table1'), object_id('table2') )
SELECT * FROM sys.indexes
WHERE name LIKE '%PK_LogDate%'
To all intents and purposes there is no difference here.
A unique index would allow NULL values, which a primary key does not, but the columns here are NOT NULL anyway.
Also a unique index (though not constraint) could be declared with included columns or as a filtered index but neither of those apply here as the index is clustered.
The primary key creates a named constraint object that is schema scoped, so its name must be unique within the schema. An index name only needs to be unique within the table it belongs to.
I would still opt for the PK though to get the visual indicator in the tooling. It allows other developers (and possibly code) to more easily detect what is the unique row identifier.
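As a sketch of how Table2 from the question could be given a real primary key, assuming the unique clustered index has already been created as shown there (the constraint name PK_Table2_LogDate_DatabaseName is hypothetical, chosen so it does not collide with Table1's schema-scoped constraint name):

```sql
-- Remove the plain unique clustered index from the question.
DROP INDEX [PK_LogDate_Database_Name] ON [dbo].[Table2];

-- Re-add it as a primary key constraint; SQL Server builds a
-- unique clustered index behind the scenes to enforce it.
ALTER TABLE [dbo].[Table2]
    ADD CONSTRAINT [PK_Table2_LogDate_DatabaseName]
    PRIMARY KEY CLUSTERED ([LogDate] ASC, [Database_Name] ASC);

-- Compare the index metadata of both tables.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name, i.index_id, i.type_desc, i.is_unique, i.is_primary_key
FROM sys.indexes AS i
WHERE i.object_id IN (OBJECT_ID(N'dbo.Table1'), OBJECT_ID(N'dbo.Table2'));

-- Primary key constraints are also listed as schema-scoped objects here.
SELECT kc.name, kc.type_desc, OBJECT_NAME(kc.parent_object_id) AS table_name
FROM sys.key_constraints AS kc
WHERE kc.parent_object_id IN (OBJECT_ID(N'dbo.Table1'), OBJECT_ID(N'dbo.Table2'));
```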
Also remember that while a table can have only one PK, it could have multiple unique indexes (although only one can be clustered).
I can see where you might want to cluster on information that is unique in some meaningful way but might want to have a separate autogenerated nonclustered PK to make joins faster than joining on the automobile VIN number, for instance. That is why both are available.
Primary key is a key that identifies each row in a unique way (it's a unique index too). It could be clustered or not but it's highly recommended to be clustered. If it is clustered, data is stored based on that key.
A unique clustered index is a unique value (or combination of values) and the data is stored based on that index.
What's the advantage of a clustered index? If you have to do an index scan (scan the whole index), the data is stored together in key order, so it's faster.

What happens when you use page compression on a primary key in SQL Server?

Given that the primary key index is how the table is physically laid out, what effect if any is there by putting a WITH DATA_COMPRESSION on it?
CREATE TABLE [Search].[Property]
(
[PropertyId] [BIGINT]
NOT NULL
CONSTRAINT PK_Property PRIMARY KEY WITH (DATA_COMPRESSION = PAGE),
[Parcel] [GEOMETRY] NULL
CHECK ([Parcel] IS NULL
OR ([Parcel].STSrid = 3857
AND [Parcel].STIsValid() = 1
)),
[StreetNumber] [VARCHAR](20) NULL,
[StreetDir] [VARCHAR](2) NULL,
[StreetName] [VARCHAR](50) NULL,
[StreetType] [VARCHAR](4) NULL,
[StreetPostDir] [VARCHAR](2) NULL
)
WITH (
DATA_COMPRESSION = PAGE);
GO
This has the same effect as compressing the table, such as:
ALTER TABLE [Search].[Property]
REBUILD WITH (DATA_COMPRESSION = PAGE);
See MSDN for details on compression of indexes and tables, or MSDN for details on how page compression is implemented within SQL Server.
AFAIK, it depends on how the primary key is implemented. If the PK is backed by a clustered index (which is the default), compressing it is effectively table-level compression, because compressing the clustered index means compressing the table. If the PK is a nonclustered index, only that index is compressed.
[PropertyId] [BIGINT]
NOT NULL
CONSTRAINT PK_Property PRIMARY KEY WITH (DATA_COMPRESSION = PAGE)
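One way to check what actually ended up compressed, sketched here against the [Search].[Property] table from the question, is to look at the compression setting recorded per index and partition:

```sql
-- index_id 1 is the clustered index (i.e. the table's data);
-- index_id 0 would be a heap; higher values are nonclustered indexes.
SELECT i.name AS index_name,
       i.index_id,
       i.type_desc,
       p.partition_number,
       p.data_compression_desc
FROM sys.partitions AS p
JOIN sys.indexes AS i
    ON i.object_id = p.object_id
   AND i.index_id = p.index_id
WHERE p.object_id = OBJECT_ID(N'Search.Property');
```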

Indexes that appear to be redundant with clustered PK

I am working on a database at a client with the following table:
CREATE TABLE [Example] (
[ID] INT IDENTITY (1, 1) NOT NULL,
....
[AddressID] INT NULL,
[RepName] VARCHAR(50) NULL,
....
CONSTRAINT [PK_Example] PRIMARY KEY CLUSTERED ([ID] ASC)
)
And it has the following indexes:
CREATE NONCLUSTERED INDEX [IDX_Example_Address]
ON [example]( [ID] ASC, [AddressId] ASC);
CREATE NONCLUSTERED INDEX [IDX_Example_Rep]
ON [example]( [ID] ASC, [RepName] ASC);
To me these appear to be redundant with the clustered index. I cannot imagine any scenario where these would be beneficial. If anyone can come up with a situation where these would be useful, let me know.
Here is another example:
CREATE NONCLUSTERED INDEX [IDX_Example_IsDeleted]
ON [example]( [IsDeleted] ASC)
INCLUDE( [ID], [SomeNumber]);
Why would you need to INCLUDE [ID]? My understanding is that the clustered index key is already present in every non-clustered index, so why would they do that? I would just INCLUDE ([SomeNumber])
You are correct that the clustered index key is already included in every non-clustered index, but not in the same sense as your example indexes suggest.
For example, if you have a non-clustered index as in your example for IDX_Example_Rep, and you run this query:
SELECT [RepName], [Id] FROM [Example] WHERE [RepName] = 'some_value';
The IDX_Example_Rep index will be used, but it will be an index scan (every row will be checked). This is because the [Id] column was specified as the first column in the index.
If the index is instead specified as follows:
CREATE NONCLUSTERED INDEX [IDX_Example_Rep]
ON [example]([RepName] ASC);
Then when you run the same sample query, the IDX_Example_Rep index is used and the operation is an index seek - the engine knows exactly where to find the records by [RepName] within the IDX_Example_Rep index and, because the only other field being returned by the SELECT is the [Id] field, which is the key of the clustered index and therefore included in the non-clustered index, no further operations are necessary.
If the SELECT list were expanded to include, say, the [AddressId] field, then you'll find the engine still performs the index seek against IDX_Example_Rep to find the correct records, but then also has to do a key lookup against the clustered index to get the "other" fields (the [AddressId] in this example).
So, no - you probably don't want to repeat the [Id] column as part of the non-clustered indices in general, but when it comes to non-clustered indices you definitely want to pay attention to your SELECTed fields and know whether or not you're covering the fields you're going to need.
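A sketch of what covering that extra column might look like, assuming the [Example] table from the question. The index name IDX_Example_Rep_Covering is hypothetical, and whether the optimizer actually picks it depends on the data and statistics:

```sql
-- Key on RepName so the predicate can seek; AddressID rides along at the
-- leaf level. ID is not listed because the clustered key is carried anyway.
CREATE NONCLUSTERED INDEX [IDX_Example_Rep_Covering]
    ON [Example] ([RepName] ASC)
    INCLUDE ([AddressID]);

-- All three columns can be answered from the index, so no key lookup
-- against the clustered index should be needed.
SELECT [RepName], [ID], [AddressID]
FROM [Example]
WHERE [RepName] = 'some_value';
```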

New uniqueidentifier on the go

I want to add a column for a table which would become a PRIMARY KEY and be of type uniqueidentifier. I have this, but I wonder if there is a faster (in fewer code lines) way?
ALTER TABLE [table] ADD [id] [uniqueidentifier]
DEFAULT('00000000-0000-0000-0000-000000000000') NOT NULL
GO
UPDATE [table] SET [id] = NEWID()
GO
ALTER TABLE [table] ADD CONSTRAINT [PK_table_id] PRIMARY KEY CLUSTERED ([id])
GO
If you want to keep naming your constraints (and you should), I don't think we can reduce it below 2 statements:
create table T (
Col1 varchar(10) not null
)
go
insert into T (Col1)
values ('abc'),('def')
go
ALTER TABLE T ADD [id] [uniqueidentifier] constraint DF_T_id DEFAULT(NEWID()) NOT NULL
GO
ALTER TABLE T ADD constraint PK_T PRIMARY KEY CLUSTERED (id)
go
drop table T
Note that I've added a name for the default constraint. Also, this ensures that new rows also have id values assigned. As I said in my comment, it's usually preferable to avoid clustering on a column whose values are generated by NEWID(), because it leads to lots of fragmentation. If you want to avoid that, consider NEWSEQUENTIALID().
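A sketch of the NEWSEQUENTIALID() variant, reusing the throwaway table T and the constraint names from the example above. NEWSEQUENTIALID() can only be used in a DEFAULT constraint on a uniqueidentifier column, which is how it is used here:

```sql
create table T (
    Col1 varchar(10) not null
)
go
insert into T (Col1)
values ('abc'),('def')
go
-- Existing rows are filled from the default when the NOT NULL column is added.
ALTER TABLE T ADD [id] [uniqueidentifier]
    constraint DF_T_id DEFAULT (NEWSEQUENTIALID()) NOT NULL
GO
ALTER TABLE T ADD constraint PK_T PRIMARY KEY CLUSTERED (id)
go
drop table T
```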
If you don't care about constraint names, you can do it as a single statement:
ALTER TABLE T ADD [id] [uniqueidentifier] DEFAULT(NEWID()) NOT NULL PRIMARY KEY CLUSTERED
