Composite clustered index as primary key vs heap table in SQL Server - sql-server

To ensure uniqueness there is a composite PK (clustered) containing:
[timestamp] [datetime2]
[userId] [varchar](36)
[cost_type] [varchar](20)
There are two more columns in the table:
[cost_cent] [bigint] NULL
[consumption_cent] [bigint] NULL
Composite clustered primary keys are not ideal (incl. varchar) but what is the alternative?
Having a heap table with a non clustered primary key? Additionally add another clustered index? But on what column? There is no identity column.
Background: there is a constant insert/update load on this table via MERGE statements. The table holds ~50 million rows.
Queries will mainly use the PK with a time range.

Your index key is about 58 bytes; I don't see a big issue with that size.
"there is a constant insert/update on this table via Merge statements"
If you keep the existing composite key (its size is not that large), updating the primary key columns is a red flag, since:
1. You may see some fragmentation.
2. Update/delete commands will also have to touch the nonclustered indexes.
Some more options I would experiment with, since 50 million rows is not that many:
1. Leave this table as a heap and add a nonclustered index with the timestamp column as the leading column and the rest of the columns needed by a query as included columns. If you leave the table as a heap, try answering the following questions yourself to see whether that helps you:
Will you ever need to join this table to other tables?
Do you need a way to uniquely identify a record?
2. I would also try adding an identity column and making it the primary key.
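The two options above could be sketched roughly as follows. This is only an illustration: the table, index, and constraint names are hypothetical, the NOT NULL settings on the key columns are assumed, and the unique constraint in the second variant is an assumption added to preserve the uniqueness requirement from the question.

```sql
-- Option 1 (sketch): heap plus a covering nonclustered index led by [timestamp].
-- dbo.UserCost_Heap and the index name are hypothetical.
CREATE TABLE dbo.UserCost_Heap
(
    [timestamp]        datetime2   NOT NULL,
    [userId]           varchar(36) NOT NULL,
    [cost_type]        varchar(20) NOT NULL,
    [cost_cent]        bigint      NULL,
    [consumption_cent] bigint      NULL
);

CREATE NONCLUSTERED INDEX IX_UserCost_Heap_timestamp
    ON dbo.UserCost_Heap ([timestamp])
    INCLUDE ([userId], [cost_type], [cost_cent], [consumption_cent]);

-- Option 2 (sketch): narrow identity column as the clustered primary key.
-- The UNIQUE constraint is an assumption, added so the business key stays unique.
CREATE TABLE dbo.UserCost_Identity
(
    [id]               bigint IDENTITY(1, 1) NOT NULL,
    [timestamp]        datetime2   NOT NULL,
    [userId]           varchar(36) NOT NULL,
    [cost_type]        varchar(20) NOT NULL,
    [cost_cent]        bigint      NULL,
    [consumption_cent] bigint      NULL,
    CONSTRAINT PK_UserCost_Identity PRIMARY KEY CLUSTERED ([id]),
    CONSTRAINT UQ_UserCost_Identity UNIQUE NONCLUSTERED ([timestamp], [userId], [cost_type])
);
```

Which variant behaves better under the MERGE workload would have to be measured; the sketch only shows the shape of each option.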

Related

Is there any way to mark existing clustered index as primary key in SQL Server?

I have table with about 60 000 000 rows in it and having size near 60 GB.
It has 2 indexes: clustered index and primary key on same id identity column.
Primary key index has size near 1GB. It looks excessive. I have several such tables.
Question is, is there any way to effectively mark existing clustered index as primary key also without dropping both indexes and creating one single new index?
A sub-question: is it worth doing such operations at all, or should I just drop the primary keys? What is the real practical advantage of having a primary key, apart from descriptive use by 3rd party tools? Maybe the SQL Server optimizer uses this metadata to optimize queries, or there are other advantages that I am missing?
Small sample of what I want to achieve, but other way (if it exists), without dropping and creating indexes.
I assume that it looks like there's no other way really, but who knows maybe there's some trick.
-- Starting point: a unique clustered index plus a separate nonclustered primary key
create table a
(
    id int identity,
    col1 varchar(50)
);
create unique clustered index cix_id on a(id);
alter table a add constraint pk_a primary key nonclustered(id);

-- Shows two indexes on the same column
select t.name, i.name, i.is_primary_key, i.type_desc
from sys.tables t
inner join sys.indexes i on i.object_id = t.object_id
where t.name = 'a';

-- What I want to end up with: a single clustered primary key
drop index cix_id on a;
alter table a drop constraint pk_a;
alter table a add constraint pk_a primary key clustered(id);

-- Shows one index, which is both clustered and the primary key
select t.name, i.name, i.is_primary_key, i.type_desc
from sys.tables t
inner join sys.indexes i on i.object_id = t.object_id
where t.name = 'a';

Azure Synapse Analytics: Can I use non-unique column as hash column in hash distributed tables?

I'm using Dedicated SQL Pools (AKA Azure Synapse Analytics). Trying to optimize a fact table and according to documentation FACT tables should be hash distributed for better performance.
The problem is:
My fact table has a composite primary key.
You can specify only one column as the hash distribution column.
Can I use one of those columns as distribution column? Any one of the columns would have duplicates, though they are all NOT NULL.
CREATE TABLE myTable
(
    [ITEM] [varchar](50) NOT NULL,
    [LOC] [varchar](50) NOT NULL,
    [MEASURE] [varchar](50) NOT NULL,
    CONSTRAINT [PK] PRIMARY KEY NONCLUSTERED
    (
        [LOC] ASC,
        [ITEM] ASC
    ) NOT ENFORCED
)
WITH
(
    DISTRIBUTION = HASH([ITEM]),
    CLUSTERED COLUMNSTORE INDEX
)
Yes, you can! You can use any column as a hash distribution column, but be aware that this introduces a constraint into your table: you cannot drop the distribution column.
There are two reasons to use a hash distribution column: one is to prevent data movement across distributions for queries, and the other is to spread the data evenly across your distributions so that all the workers are used efficiently in queries. Hash-distributing by a non-skewed column, even if it is not unique, can help with the second case.
However, if you do want to distribute by your primary key, consider creating a single key column by hashing together the columns of your composite primary key. You can hash-distribute by this hashed key, and this will hopefully also reduce data movement if you need to upsert on that hashed key later.
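A minimal sketch of that hashed-key idea, assuming a CTAS rebuild of the table from the question. The target table name `myTable_hashed`, the column name `PK_HASH`, the choice of SHA2_256, and the `'|'` separator are all illustrative assumptions; the separator only works if it cannot occur in `[LOC]` or `[ITEM]` values.

```sql
-- Sketch: derive one key column from the composite key and distribute on it.
-- myTable_hashed and PK_HASH are hypothetical names.
CREATE TABLE myTable_hashed
WITH
(
    DISTRIBUTION = HASH([PK_HASH]),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT
    -- SHA2_256 yields 32 bytes; the cast pins the column to a fixed width
    CAST(HASHBYTES('SHA2_256', CONCAT([LOC], '|', [ITEM])) AS binary(32)) AS [PK_HASH],
    [ITEM],
    [LOC],
    [MEASURE]
FROM myTable;
```

Whether this beats distributing on `[ITEM]` alone depends on how skewed `[ITEM]` is and on which columns the queries join on, so it is worth checking the row counts per distribution before committing to either layout.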

What is different between the two methods of generating cluster primary keys?

I have a table on which I want to create a clustered primary key.
CREATE TABLE dbo.SampleTable
(
    C1 INT NOT NULL,
    C2 INT NOT NULL
)
The first way is to create the primary key as a clustered index.
ALTER TABLE dbo.SampleTable ADD CONSTRAINT IDX_SampleTable PRIMARY KEY CLUSTERED (C1, C2)
The second way is to add a NONCLUSTERED primary key constraint and then CREATE CLUSTERED INDEX on the same columns.
ALTER TABLE dbo.SampleTable ADD CONSTRAINT IDX_SampleTable PRIMARY KEY NONCLUSTERED (C1, C2)
CREATE CLUSTERED INDEX IDX_SampleTable2 ON dbo.SampleTable (C1, C2) -- cannot reuse the constraint name above
Is there a difference in performance between the two methods?
Is one of them not recommended?
Yes, there is a difference. By specifying CLUSTERED, you instruct the database to store the data in a certain way: the table's rows themselves are kept in the order of the index key.
By creating a clustered primary key as in your first statement, the rows will always have unique values in (C1, C2) and the table is always stored in that key order.
In the second example, you do NOT enforce this CLUSTERED behaviour through the primary key, but through a separate index. Though the effects are the same now, you might later drop that index, and then the data would no longer be guaranteed to be stored in a CLUSTERED fashion.
Bottom line: In practice these two statements are the same now, but might make a difference in the future because the CLUSTERED property is not integrated in the PK, but in a separate index.
Creating a Nonclustered Primary Key and then creating a Clustered index on the columns within the Primary key is not a good idea. Effectively you'll create 2 indexes on the columns (C1 and C2 in this case), however, it's very unlikely the nonclustered index will ever be used. This is because the Clustered Index is very likely going to be the first choice for the RDBMS, as the pages will be in the order of the Clustered Index. Also, when using a non-clustered index the data engine will still need to refer to the Clustered Index afterwards, to find out the exact location of the row (in the pages).
If you do want a clustered index on your Primary Key(s) then create the key as a Clustered Primary Key. This is not to say that your Primary Key should always be Clustered, but that is a very different subject.
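To see the "two indexes" effect described above, both variants can be created side by side and inspected through `sys.indexes`. This is a sketch only; the table, constraint, and index names with `_A` and `_B` suffixes are hypothetical.

```sql
-- Way 1: the primary key itself is the clustered index
CREATE TABLE dbo.SampleTable_A (C1 INT NOT NULL, C2 INT NOT NULL);
ALTER TABLE dbo.SampleTable_A
    ADD CONSTRAINT PK_SampleTable_A PRIMARY KEY CLUSTERED (C1, C2);

-- Way 2: nonclustered primary key plus a separate clustered index
CREATE TABLE dbo.SampleTable_B (C1 INT NOT NULL, C2 INT NOT NULL);
ALTER TABLE dbo.SampleTable_B
    ADD CONSTRAINT PK_SampleTable_B PRIMARY KEY NONCLUSTERED (C1, C2);
CREATE CLUSTERED INDEX CIX_SampleTable_B ON dbo.SampleTable_B (C1, C2);

-- List the indexes each table ends up with
SELECT t.name AS table_name,
       i.name AS index_name,
       i.type_desc,
       i.is_primary_key,
       i.is_unique
FROM sys.tables t
INNER JOIN sys.indexes i ON i.object_id = t.object_id
WHERE t.name IN ('SampleTable_A', 'SampleTable_B')
ORDER BY t.name, i.index_id;
```

The first table should list a single clustered index that is also the primary key, while the second lists two indexes on the same columns. Note that the separate clustered index in the second variant is not declared UNIQUE, so uniqueness there is enforced only by the nonclustered primary key.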
This depends on your data:
https://learn.microsoft.com/en-gb/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described?view=sql-server-2017
Clustered indexes sort and store the data rows in the table or view
based on their key values. These are the columns included in the index
definition. There can be only one clustered index per table, because
the data rows themselves can be stored in only one order.
So the clustered key influences the format of your physical data structure.

designing new table for daily uploads - use unique constraint

I am using SQL Server 2012 & am creating a table that will have 8 columns, types below
datetime
varchar(12)
varchar(6)
varchar(100)
float
float
int
datetime
Once a day (normally) there will be an upload of approx 10,000 rows of data. Going forward, it's possible it could be 100,000.
The rows will be unique if I group on the first three columns listed above. I have read I can use the unique constraint on multiple columns which will guarantee the rows are unique.
I think I'm correct in saying that the unique constraint by default sets up non-clustered index. Would a clustered index be better & assuming when the table starts to contain millions of rows this won't cause any issues?
My last question. By applying the unique constraint on my table I am right to say querying the data will be quicker than if the unique constraint wasn't applied (because of the non-clustering or clustering) & uploading the data will be slower (which is fine) with the constraint on the table?
A unique index can be non-clustered.
A primary key is unique and can be clustered.
A clustered index is not unique by default.
A unique clustered index is unique :)
More information is available in this guide.
So, we should separate uniqueness from index keys.
If you need to keep data unique by some columns, create a unique constraint (unique index). That protects your data.
You can also create the primary key (PK) on those columns, and they will be unique as well. But there is a difference: if the PK is clustered (the default), all other indexes use its key to reference rows, so the PK should be as short as possible. So, my advice: create an identity column (int or bigint) and put the PK on it, and create a unique index on your unique columns.
Querying data may become faster if you query on your unique columns; if you query on other columns, you need to create other, specific indexes.
So: unique keys are for data consistency, indexes are for queries.
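The advice above could look roughly like this for the table in the question. The question only lists column types, so every table, column, constraint, and index name below is hypothetical, and the NOT NULL choices are assumptions.

```sql
-- Sketch: short identity PK for row references, unique index for data consistency.
-- All names are hypothetical; only the column types come from the question.
CREATE TABLE dbo.DailyUpload
(
    id          int IDENTITY(1, 1) NOT NULL,
    upload_date datetime     NOT NULL,
    code_a      varchar(12)  NOT NULL,
    code_b      varchar(6)   NOT NULL,
    description varchar(100) NULL,
    value_1     float        NULL,
    value_2     float        NULL,
    quantity    int          NULL,
    loaded_at   datetime     NULL,
    CONSTRAINT PK_DailyUpload PRIMARY KEY CLUSTERED (id)
);

-- Guarantees uniqueness of the first three columns and also serves
-- queries that filter on upload_date as the leading column.
CREATE UNIQUE NONCLUSTERED INDEX UX_DailyUpload_business_key
    ON dbo.DailyUpload (upload_date, code_a, code_b);
```

With this layout a duplicate row in a daily upload fails on insert, while the narrow `id` keeps the other indexes small.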
I think I'm correct in saying that the unique constraint by default
sets up non-clustered index
TRUE
Would a clustered index be better & assuming when the table starts to
contain millions of rows this won't cause any issues?
If (1) you need to make (datetime, varchar(12), varchar(6)) unique, and
(2) your application will access rows using datetime, or datetime, varchar(12), or datetime, varchar(12), varchar(6) in the WHERE condition
ALL the time,
then put the primary key on (datetime, varchar(12), varchar(6)).
By default this puts both uniqueness and a clustered index on all three columns.
But as you commented above:
the queries will vary to be honest. I imagine most queries will make
use of the first datetime column
and since you will deal with huge data and might join this table with other tables,
it is better to have a surrogate key (an ever-increasing unique identifier) in the table, and to satisfy your SELECTs
have nonclustered indexes.
Surrogate Key vs Business Key
NON-CLUSTERED INDEX

Alter a column length

I need to alter the length of a column column_length in more than 500 tables, and the number of records per table might range from 10 to 3 or 4 million.
The column may just be a normal column
CREATE TABLE test(column_length varchar(10))
The column might contain non-clustered index on it.
CREATE TABLE test(column_length varchar(10))
CREATE UNIQUE NONCLUSTERED INDEX column_length_ind ON test (column_length)
The column might contain PRIMARY KEY clustered index on it
CREATE TABLE test(column_length varchar(10) NOT NULL)
ALTER TABLE test ADD PRIMARY KEY CLUSTERED (column_length)
The column might be a composite primary key
The column might have a foreign key reference
In short the column column_length might be anything.
All I need is to create scripts to alter the length of the column_length from varchar(10) to varchar(50). Should I drop the indexes before altering and then recreate them? What about the primary key and foreign key?
Through my research and testing I figured out that I can just alter the column's length without dropping the primary key or any indexes but have to drop and recreate the foreign key alone.
Is this assumption right?
Yes, you should be able to just modify the columns. In my experience it is faster to leave the index and primary key in place.
You will likely need to alter the column on the foreign key tables as well to increase the size. So first you drop the FK constraint, then fix the foreign key fields, then fix the primary key field, then put the constraints back on.
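The order of operations described above can be sketched as follows, assuming a hypothetical parent/child pair of tables. The names `parent_test`, `child_test`, and the constraint names are illustrative only, and the in-place widening of the primary key column relies on the behaviour the question reports from testing.

```sql
-- Hypothetical setup: parent with a clustered PK, child referencing it
CREATE TABLE dbo.parent_test
(
    column_length varchar(10) NOT NULL,
    CONSTRAINT PK_parent_test PRIMARY KEY CLUSTERED (column_length)
);
CREATE TABLE dbo.child_test
(
    id            int         NOT NULL,
    column_length varchar(10) NOT NULL,
    CONSTRAINT FK_child_parent FOREIGN KEY (column_length)
        REFERENCES dbo.parent_test (column_length)
);

-- 1. Drop the foreign key constraint
ALTER TABLE dbo.child_test DROP CONSTRAINT FK_child_parent;

-- 2. Widen the referencing column (restate NOT NULL so nullability is kept)
ALTER TABLE dbo.child_test ALTER COLUMN column_length varchar(50) NOT NULL;

-- 3. Widen the primary key column, leaving the PK and its index in place
ALTER TABLE dbo.parent_test ALTER COLUMN column_length varchar(50) NOT NULL;

-- 4. Put the foreign key back
ALTER TABLE dbo.child_test
    ADD CONSTRAINT FK_child_parent FOREIGN KEY (column_length)
        REFERENCES dbo.parent_test (column_length);
```

Restating NULL or NOT NULL in each ALTER COLUMN matters: if it is left out, the column's nullability may change, which a primary key column does not allow. For 500 tables, the same four steps would presumably be generated from the catalog views rather than written by hand.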

Resources