How does SQL Server handle fill factor on tables with clustered indexes on composite primary keys?
I would assume a key node value would be generated based on the fields that make up the clustered index. Would this mean that each new row inserted would effectively get inserted at the end of the index?
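No, it would be inserted where it belongs in the order of the columns that make up the index. In other words, if your PK is zipcode and state (silly example) and you insert 22222, NJ, then it will be inserted after 22222, NI and before 22222, NK. New rows always land at the end of the index only when the key is ever-increasing. Otherwise a new row goes onto the leaf page that covers its key value and uses the free space the fill factor left on that page when the index was built or rebuilt. If that page is already full, SQL Server splits it.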
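The following toy Python model makes the ordering and the fill factor behavior concrete. It is not SQL Server's storage engine. Leaf pages are plain lists of composite keys, and Python tuple comparison stands in for the index's column-by-column ordering. `PAGE_CAPACITY`, `build_pages` and `insert_key` are hypothetical names used only for this sketch.

```python
import bisect

# Hypothetical page size for the toy model; real pages are 8 KB and hold
# as many rows as fit, not a fixed count.
PAGE_CAPACITY = 4


def build_pages(keys, fill_factor):
    """Lay sorted composite keys onto leaf pages, filling each page only to
    fill_factor percent, the way an index build/rebuild leaves free space."""
    per_page = max(1, PAGE_CAPACITY * fill_factor // 100)
    ordered = sorted(keys)
    return [ordered[i:i + per_page] for i in range(0, len(ordered), per_page)]


def insert_key(pages, key):
    """Insert key where it belongs in composite-key order.
    Returns True if the target page overflowed and had to be split."""
    # Pick the first page whose highest key is >= the new key, else the last page.
    target = len(pages) - 1
    for i, page in enumerate(pages):
        if key <= page[-1]:
            target = i
            break
    page = pages[target]
    bisect.insort(page, key)
    if len(page) <= PAGE_CAPACITY:
        return False
    # Page split: move the upper half of the rows onto a new page.
    mid = len(page) // 2
    pages[target:target + 1] = [page[:mid], page[mid:]]
    return True


keys = [(11111, "MA"), (22222, "NI"), (22222, "NK"), (33333, "NY")]

full = build_pages(keys, fill_factor=100)   # one completely full page
print(insert_key(full, (22222, "NJ")))      # True: no free space, so the page splits

padded = build_pages(keys, fill_factor=50)  # two half-full pages
print(insert_key(padded, (22222, "NJ")))    # False: the free space absorbs the row
print([k for page in padded for k in page])
# [(11111, 'MA'), (22222, 'NI'), (22222, 'NJ'), (22222, 'NK'), (33333, 'NY')]
```

In both cases the new row lands between NI and NK, not at the end. The fill factor only decides whether a page split is needed to make room for it.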
I am using SQL Server 2012 and am creating a table that will have 8 columns with the following types:
datetime
varchar(12)
varchar(6)
varchar(100)
float
float
int
datetime
Once a day (normally) there will be an upload of approximately 10,000 rows of data. Going forward, it's possible it could be 100,000.
The rows will be unique if I group on the first three columns listed above. I have read I can use the unique constraint on multiple columns which will guarantee the rows are unique.
I think I'm correct in saying that the unique constraint by default sets up a non-clustered index. Would a clustered index be better, and would it still be fine once the table contains millions of rows?
My last question: by applying the unique constraint to my table, am I right to say that querying the data will be quicker than without it (because of the clustered or non-clustered index it creates), and that uploading the data will be slower (which is fine)?
A unique index can be non-clustered.
A primary key is unique and can be clustered.
A clustered index is not unique by default.
A unique clustered index is unique :)
More information is available in this guide.
So, we should treat uniqueness and index keys as separate concerns.
If you need to keep data unique by some columns, create a unique constraint (unique index). That protects your data.
You can also create a primary key (PK) on those columns; they will then be unique as well. But there is a difference: every non-clustered index stores the clustered index key (by default, the PK) as its row locator, so that key should be as short as possible. So, my advice: create an identity column (int or bigint) and put the PK on it, then create a unique index on your unique columns.
Querying data may become faster if you query on your unique columns; if you query on other columns, you need to create other, specific indexes.
So: unique keys are for data consistency, indexes are for queries.
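Here is a small sketch of that advice. It uses SQLite through Python's `sqlite3` module as a stand-in, because it runs anywhere. SQLite has no IDENTITY columns or SQL Server-style clustering, so its `INTEGER PRIMARY KEY` plays the role of the narrow surrogate key. The table name, the column names and `try_insert` are hypothetical, since the question does not name its columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Narrow surrogate key as the primary key (INTEGER PRIMARY KEY auto-assigns
# increasing ids in SQLite, standing in for an IDENTITY column).
conn.execute("""
    CREATE TABLE uploads (
        id          INTEGER PRIMARY KEY,
        upload_date TEXT    NOT NULL,
        code        TEXT    NOT NULL,
        region      TEXT    NOT NULL,
        amount      REAL
    )""")
# Separate unique index on the three business-key columns: it protects the data.
conn.execute("CREATE UNIQUE INDEX ux_uploads_business "
             "ON uploads (upload_date, code, region)")


def try_insert(row):
    """Insert a row; return False if the unique index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO uploads (upload_date, code, region, amount) "
                         "VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(("2013-05-01", "ABC123", "UK", 1.5)))   # True
print(try_insert(("2013-05-01", "ABC123", "UK", 9.9)))   # False: duplicate business key
print(conn.execute("SELECT id FROM uploads").fetchall())  # [(1,)]
```

The duplicate is rejected by the unique index, not by the primary key. This is the separation of concerns described above.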
I think I'm correct in saying that the unique constraint by default sets up non-clustered index
TRUE. A UNIQUE constraint creates a non-clustered index unless you declare it CLUSTERED.
Would a clustered index be better & assuming when the table starts to contain millions of rows this won't cause any issues?
(1) If you need to make (datetime, varchar(12), varchar(6)) unique, and
(2) your application will access rows using datetime, or datetime and varchar(12), or datetime, varchar(12) and varchar(6) in the WHERE condition ALL the time,
then put the primary key on (datetime, varchar(12), varchar(6)).
By default, that enforces uniqueness and creates a clustered index on all three columns.
But, as you commented above:
the queries will vary to be honest. I imagine most queries will make use of the first datetime column
Since you will also be dealing with large volumes of data and may join this table with other tables, it's better to have a surrogate key (an ever-increasing unique identifier) in the table and to satisfy your SELECTs with non-clustered indexes.
Surrogate Key vs Business Key
NON-CLUSTERED INDEX
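The sketch below shows the surrogate key plus non-clustered index approach. It again uses SQLite through Python's `sqlite3` as a stand-in, so the plan text is SQLite's and not SQL Server's. The table, column and index names and the `plan` helper are hypothetical. A secondary index led by the datetime column lets date-range queries seek instead of scan. A query on a column the index does not lead with still scans the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE uploads (
        id          INTEGER PRIMARY KEY,   -- surrogate, ever-increasing key
        upload_date TEXT NOT NULL,
        code        TEXT NOT NULL,
        region      TEXT NOT NULL
    )""")
# Secondary (non-clustered-style) index led by the column most queries filter on.
conn.execute("CREATE INDEX ix_uploads_date ON uploads (upload_date)")


def plan(sql, params=()):
    """Return SQLite's query-plan detail strings for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the readable detail


# Date-range query: the plan should mention ix_uploads_date (an index search).
print(plan("SELECT id, code FROM uploads "
           "WHERE upload_date >= ? AND upload_date < ?",
           ("2013-05-01", "2013-06-01")))
# Filter on a non-indexed column: the plan should show a full SCAN.
print(plan("SELECT id FROM uploads WHERE code = ?", ("ABC123",)))
```

If queries on other columns turn out to matter, the same reasoning applies: add a specific index for them rather than widening the primary key.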
I have a table that I intended to partition by a nullable column.
This seems to work just fine except for the primary key. I get an error:
Partition columns for a unique index must be a subset of the index key
I have tried two workarounds:
Create the primary key on a different filegroup. This doesn't work because it removes partitioning.
Skip the primary key altogether and create a non-unique clustered index instead. This won't work either, because I need a primary key.
Any idea on how I can get a primary key on a partitioned table where the partition column is nullable? If not, I am open to suggestions on how to handle it another way.
Thanks in advance.
Not sure what really blocked you. You can create the PK on your unique column and leave the partition column nullable. Just don't create the unique clustered index on the PK column alone. When you need a unique clustered index, put both the PK column and the partition column in its key.
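The error message ("Partition columns for a unique index must be a subset of the index key") has a simple reason behind it. A partitioned (aligned) unique index can only check uniqueness inside each partition. The toy Python model below is not SQL Server internals, and the helper names and sample rows are hypothetical. It shows that per-partition checks miss cross-partition duplicates when the partition column is not in the key. Once the partition column is in the key, per-partition checks are enough.

```python
from collections import defaultdict


def unique_within_partitions(rows, key_cols, partition_col):
    """What an aligned (partitioned) unique index can check cheaply:
    uniqueness of key_cols inside each partition separately."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        part = row[partition_col]
        if key in seen[part]:
            return False
        seen[part].add(key)
    return True


def unique_globally(rows, key_cols):
    """What the constraint actually promises: uniqueness across the whole table."""
    keys = [tuple(row[c] for c in key_cols) for row in rows]
    return len(keys) == len(set(keys))


# Hypothetical rows; region is the nullable partitioning column.
rows = [
    {"id": 1, "region": None},
    {"id": 1, "region": "EU"},   # same id, different partition
    {"id": 2, "region": "EU"},
]

# Key = id only: every partition looks fine, but id 1 is duplicated overall.
print(unique_within_partitions(rows, ["id"], "region"), unique_globally(rows, ["id"]))
# True False
# Key = (id, region): equal keys always fall in the same partition,
# so the per-partition checks now imply global uniqueness.
print(unique_within_partitions(rows, ["id", "region"], "region"),
      unique_globally(rows, ["id", "region"]))
# True True
```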
I need to update a large number of keys in a large SQL Server 2005 database and will be dropping FKs and PKs on a bunch of tables, doing the update (which replaces the values of the PK/FK) and then adding the FK and PK again.
My questions are:
Will this process have any effect on existing indexes on those tables, either indexes that include the PK/FK fields or indexes on other, unaffected fields? I.e., will all the indexes still exist, and will they need a rebuild?
Will this process affect table statistics, requiring a recalculation?
Many thanks
If you drop a PK (which is usually a clustered index), SQL Server will rebuild all non-clustered indexes. This is needed because, when a table has a clustered index, the non-clustered indexes point to rows via the clustered index key. When it has no clustered index (a heap), they point directly to the data row.
Rebuilding an index automatically updates its statistics; reorganizing it does not.
If you created the foreign keys with ON UPDATE CASCADE, the referencing values are updated automatically.
Example:
create table pri(id int not null primary key)
go
create table ForeignK(fid int not null)
go
ALTER TABLE dbo.ForeignK ADD CONSTRAINT
FK_ForeignK_pri FOREIGN KEY
(fid) REFERENCES dbo.pri(id) ON UPDATE CASCADE
ON DELETE NO ACTION
insert pri values(1)
insert ForeignK values(1)
-- now update the PK table
update pri set id = 5
go
-- fid in ForeignK is now 5 as well
select * from ForeignK
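The same cascade can be tried without a SQL Server instance. This sketch mirrors the T-SQL example above with SQLite through Python's `sqlite3`. SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set, and its syntax differs slightly from T-SQL. The wrapper function `cascade_demo` is a hypothetical name for this sketch.

```python
import sqlite3


def cascade_demo():
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE pri (id INT NOT NULL PRIMARY KEY)")
    conn.execute("""
        CREATE TABLE ForeignK (
            fid INT NOT NULL
                REFERENCES pri (id) ON UPDATE CASCADE ON DELETE NO ACTION
        )""")
    conn.execute("INSERT INTO pri VALUES (1)")
    conn.execute("INSERT INTO ForeignK VALUES (1)")
    # Changing the parent key cascades to the child row.
    conn.execute("UPDATE pri SET id = 5")
    return conn.execute("SELECT fid FROM ForeignK").fetchall()


print(cascade_demo())  # [(5,)]
```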
Every change to an indexed column propagates into the index structure.
Foreign keys can be disabled and then re-enabled afterwards. For the primary key, the only thing you can do is rebuild it; SQLMenace explained why above.
I am trying to convert tables from using guid primary keys / clustered indexes to using int identities. This is for SQL Server 2005. There are two tables MainTable and RelatedTable, and the current table structure is as follows:
MainTable [40 million rows]
IDGuid - uniqueidentifier - PK
-- [data columns]
RelatedTable [400 million rows]
RelatedTableID - uniqueidentifier - PK
MainTableIDGuid - uniqueidentifier [foreign key to MainTable]
SequenceNumber - int - incrementing number per main table entry since there can be multiple entries related to a given row in the main table. These go from 1,2,3... etc for each MainTableIDGuid value.
-- [data columns]
The clustered index for MainTable is currently the primary key (IDGuid). The clustered index for RelatedTable is currently (MainTableIDGuid, SequenceNumber).
I want my conversion to do several things:
Change MainTable to use an integer ID instead of GUID
Add a MainTableIDInt column to related table that links to Main Table's integer ID
Change the primary key and clustered index of RelatedTable to (MainTableIDInt, SequenceNumber)
Get rid of the guid columns.
I've written a script to do the following:
Add an IDInt int IDENTITY column to MainTable. This does a table rebuild and generates the new identity ID values.
Add a MainTableIDInt int column to RelatedTable.
The next step is to populate the RelatedTable.MainTableIDInt column for each row with its corresponding MainTable.IDInt value [based on the matching guid IDs]. This is the step I'm hung up on. I understand this is not going to be speedy, but I'd like to have it perform as well as possible.
I can write a SQL statement that does this update:
UPDATE RelatedTable
SET RelatedTable.MainTableIDInt = (SELECT MainTable.IDInt FROM MainTable WHERE MainTable.IDGuid = RelatedTable.MainTableIDGuid)
or
UPDATE RelatedTable
SET RelatedTable.MainTableIDInt = MainTable.IDInt
FROM RelatedTable
LEFT OUTER JOIN MainTable ON RelatedTable.MainTableIDGuid = MainTable.IDGuid
In 'Display Estimated Execution Plan', both of these queries produce roughly the same plan, which does the following:
Clustered index scans over MainTable and RelatedTable and does a Merge Join on them [estimated number of rows = 400 million]
Sorts [estimated number of rows = 400 million]
Clustered index update over RelatedTable [estimated number of rows = 400 million]
I'm concerned about the performance of this [sorting 400 million rows sounds unpleasant]. Are my concerns about the performance of these execution plans justified? Is there a better way to populate the new ID in my related table that will scale to tables this size?
First, this will be a headache. Second, I wouldn't change any of the indexes or constraints until I had the data in place. I.e., I would add the identity column but not make it the primary key nor clustered index. Then I'd add the soon-to-be new foreign keys to the various tables. Your queries should look like:
Update C
Set NewIntForeignKeyId = P.NewIntPrimaryKey
From ChildTable As C
Join ParentTable As P
On P.PrimaryKey = C.ForeignKey
First, notice that I'm using an inner join. There is no reason to use an outer join for this type of query given that you will eventually enforce referential integrity between the new columns. Second, if you populate the columns first and then rebuild the constraints, it will be faster as you'll be able to leverage the existing indexes. Remember that when you change the clustered index, it rebuilds all of the nonclustered indexes. If the tables are large, that will be a serious hit.
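An in-memory Python analogue can make the inner-join semantics of that backfill concrete. It is not what SQL Server executes, since the optimizer picks a merge or hash join on its own. The idea is to build a lookup on the parent side once, stream the child rows past it, and leave unmatched children untouched. The column names come from the question. `backfill_int_fk` and the sample data are hypothetical.

```python
import uuid


def backfill_int_fk(parents, children):
    """Copy each parent's new int key onto its children, matching on the GUID.

    Builds a lookup on the parent side once, then streams the children past
    it (the idea behind a hash join). Inner-join semantics: children with
    no matching parent are left untouched. Returns the number of rows updated."""
    guid_to_int = {p["IDGuid"]: p["IDInt"] for p in parents}
    updated = 0
    for child in children:
        new_id = guid_to_int.get(child["MainTableIDGuid"])
        if new_id is not None:
            child["MainTableIDInt"] = new_id
            updated += 1
    return updated


parents = [{"IDGuid": uuid.uuid4(), "IDInt": i} for i in range(1, 4)]
children = [{"MainTableIDGuid": p["IDGuid"], "SequenceNumber": s, "MainTableIDInt": None}
            for p in parents for s in (1, 2)]
orphan = {"MainTableIDGuid": uuid.uuid4(), "SequenceNumber": 1, "MainTableIDInt": None}
children.append(orphan)

print(backfill_int_fk(parents, children))  # 6: the orphan row is skipped
```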
Once you have the data in place, I'd then drop all primary constraints, unique constraints, foreign key constraints and unique indexes. Drop the clustered index/constraint last. I'd then add the clustered indexes to all of the tables and after that was done, recreate the unique constraints, foreign key constraints and indexes. If you do not drop the existing indexes before you recreate the clustered index, it will rebuild the existing indexes twice: once when you drop the clustered index and again when you recreate it.
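The "rebuilds the existing indexes twice" point is simple arithmetic. The toy model below replays DDL steps and counts non-clustered index builds. It assumes, as the answer states, that every change to the clustered index rebuilds each non-clustered index still present. The step names, index names and `count_nonclustered_builds` are hypothetical.

```python
def count_nonclustered_builds(steps, nonclustered):
    """Replay (operation, index_name) steps against a toy model and count
    how many non-clustered index builds they trigger."""
    existing = set(nonclustered)
    builds = 0
    for op, name in steps:
        if op == "drop_nc":
            existing.discard(name)
        elif op == "create_nc":
            existing.add(name)
            builds += 1
        elif op in ("drop_clustered", "create_clustered"):
            # Changing the clustered index changes every NC index's row locator,
            # so each surviving NC index is rebuilt.
            builds += len(existing)
    return builds


nc = ["ix_a", "ix_b", "ix_c"]  # hypothetical index names

keep_nc = [("drop_clustered", None), ("create_clustered", None)]
drop_nc_first = ([("drop_nc", n) for n in nc]
                 + keep_nc
                 + [("create_nc", n) for n in nc])

print(count_nonclustered_builds(keep_nc, nc))        # 6: every NC index built twice
print(count_nonclustered_builds(drop_nc_first, nc))  # 3: every NC index built once
```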
Btw, I highly doubt there is a way to avoid table scans for this sort of thing since you are going to be updating every row.
What are clustered and non-clustered indexes? How do I index a table using SQL Server 2000 Enterprise Manager?
In a clustered index on ID, the table rows are ordered by ID.
In a non-clustered index on ID, the references to table rows are ordered by ID.
We can compare a database to a CSV file:
ID,Value
-------
1,ReallyReallyLongValue1
3,ReallyReallyLongValue3
In a clustered table, when we insert a new row, we need to squeeze it between the existing rows:
ID,Value
-------
1,ReallyReallyLongValue1
2,ReallyReallyLongValue2
3,ReallyReallyLongValue3
This is slow on insert but fast on retrieval.
In a non-clustered table, we keep a separate index file which orders our rows:
Id,RowNumber
------------
1, 1
3, 2
When we insert the new row, we just append it to our main file and update the short index file:
ID,Value
-------
1,ReallyReallyLongValue1
3,ReallyReallyLongValue3
2,ReallyReallyLongValue2
Id,RowNumber
------------
1, 1
2, 3
3, 2
This is fast on insert but less efficient on retrieval.
In real databases, indexes use more efficient B-trees, but the principle remains the same.
So, in this simplified model, clustered indexes favor SELECT, while non-clustered indexes favor INSERT / UPDATE / DELETE.
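Here is the CSV analogy as a toy Python model. Lists stand in for the files, and real engines use B-tree pages instead. The class names are hypothetical. The heap's index stores 0-based row numbers, while the text above counts from 1.

```python
import bisect


class ClusteredTable:
    """Rows themselves are kept in ID order (the 'clustered' CSV file)."""

    def __init__(self):
        self.ids, self.rows = [], []

    def insert(self, row_id, value):
        pos = bisect.bisect_left(self.ids, row_id)
        # Every row after pos shifts to make room: the slow part.
        self.ids.insert(pos, row_id)
        self.rows.insert(pos, (row_id, value))

    def get(self, row_id):
        # Assumes row_id exists; one search finds the row itself.
        return self.rows[bisect.bisect_left(self.ids, row_id)][1]


class HeapWithIndex:
    """Rows are appended in arrival order; a short sorted index maps ID -> row number."""

    def __init__(self):
        self.rows = []   # main file, unordered
        self.index = []  # sorted (id, row_number) pairs

    def insert(self, row_id, value):
        self.rows.append((row_id, value))                        # cheap append
        bisect.insort(self.index, (row_id, len(self.rows) - 1))  # only the short entry moves

    def get(self, row_id):
        # Assumes row_id exists; (row_id,) sorts just before (row_id, n).
        _, row_number = self.index[bisect.bisect_left(self.index, (row_id,))]
        return self.rows[row_number][1]                           # extra hop into the main file


clustered, heap = ClusteredTable(), HeapWithIndex()
for table in (clustered, heap):
    table.insert(1, "ReallyReallyLongValue1")
    table.insert(3, "ReallyReallyLongValue3")
    table.insert(2, "ReallyReallyLongValue2")

print([r[0] for r in clustered.rows])  # [1, 2, 3]: rows physically reordered
print([r[0] for r in heap.rows])       # [1, 3, 2]: rows in arrival order
print(heap.index)                      # [(1, 0), (2, 2), (3, 1)]
```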
A clustered index means that the rows are physically ordered by the values in that index. A non-clustered index means that an index table is kept up to date that allows for quick seeking and sorting based upon value, but does not physically order the rows.
Only one clustered index can exist per table. In SQL Server, a primary key is created as the clustered index by default, unless you declare it NONCLUSTERED or the table already has a clustered index.
A clustered index defines how the actual table is stored. The rows are stored in a way that makes searches on the fields in the clustered index fast. (They're not necessarily physically stored in the sort order of the index fields, but in a B-tree.)
You can have only one clustered index per table. The clustered index contains all fields in the table, for example:
indexfield1 - indexfield2 - field2 - field3 - ....
A non-clustered index is like a separate table. It contains the fields in the index and a reference to the row in the table. For example:
secondindexfield1 - secondindexfield2 - reference to table row
When searching a non-clustered index, SQL Server finds the value in the index, does a "bookmark lookup" to the table, and retrieves the other fields of the row from there. This is why non-clustered indexes perform slightly less well than clustered indexes.
To add an index in SQL Server Management Studio, expand the table node in Object Explorer, right-click "Indexes", and select "New Index".
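This toy Python model shows the two-step bookmark lookup. Dictionaries stand in for the B-trees. The non-clustered index stores only the indexed value and a row locator, which here is the clustering key. It assumes unique last names to keep the dictionary simple, and all names and data are hypothetical. When the index entry already holds every requested column, no lookup into the clustered index is needed.

```python
# The clustered index *is* the table, keyed by the clustering key.
clustered = {
    101: {"id": 101, "last_name": "Smith", "city": "Leeds", "phone": "555-0101"},
    102: {"id": 102, "last_name": "Jones", "city": "York", "phone": "555-0102"},
}

# Non-clustered index on last_name: indexed value -> row locator (clustering key).
ix_last_name = {row["last_name"]: row["id"] for row in clustered.values()}


def find_by_last_name(name, columns):
    """Seek the non-clustered index, then do a bookmark lookup only if needed.
    Returns the requested columns and how many structures were touched."""
    key = ix_last_name[name]                      # step 1: the non-clustered index
    index_entry = {"last_name": name, "id": key}
    if all(c in index_entry for c in columns):
        # Everything requested is already in the index entry.
        return {c: index_entry[c] for c in columns}, 1
    row = clustered[key]                          # step 2: bookmark lookup
    return {c: row[c] for c in columns}, 2


print(find_by_last_name("Jones", ["id"]))             # ({'id': 102}, 1)
print(find_by_last_name("Jones", ["city", "phone"]))  # ({'city': 'York', 'phone': '555-0102'}, 2)
```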
Clustered Index: Only one clustered index per table is allowed. If an index is clustered, it means that the table on which the clustered index is based is physically sorted according to that index. Think of the page numbers in an encyclopedia.
Non-clustered Index: Can have many non-clustered indexes per table. Think of the keyword index at the back of the book.