SQL Server Performance and clustered index values - sql-server

I have a table myTable with a unique clustered index myId with a fill factor of 100%.
It's an integer, starting at zero (but it's not an identity column for the table).
I need to add a new type of row to the table.
It might be nice if I could distinguish these rows by using negative values of myId.
Would having negative values incur extra page splitting and slow down inserts?
Extra Background:
This table exists as part of the ETL for a data warehouse that gathers data from disparate systems. I now want to accommodate a new type of data. One way for me to do this is to reserve negative ids for this new data, which will thus be automatically clustered together. This will also avoid major key changes or extra columns in the schema.
Answer Summary:
A fill factor of 100% will normally slow down inserts, but not inserts that happen sequentially, and that includes the sequential negative inserts.

Besides the practical administration points you already got and the dubious use of negative ids to represent data model attributes, there is also a valid question here: given a table with int ids from 0 to N, where would newly inserted negative values go, and would they cause additional page splits?
The initial rows will be placed on the clustered index leaf pages: the row with id 0 on the first page and the row with id N on the last page, filling the pages in between. When the first row with a value of -1 is inserted, it will sort ahead of the row with id 0 and as such will add a new page to the tree (it will actually allocate an extent of 8 pages, but that is a different point) and will link that page at the front of the leaf-level linked list of pages. This will NOT cause a page split of the former first page. Further inserts of values -2, -3 etc. will go to the same new page, placed in the proper position (-2 ahead of -1, -3 ahead of -2 etc.) until the page fills. Inserts after that will add a new page ahead of this one to accommodate further new values. Inserts of positive values N+1, N+2 will go on the last page and be placed in it until it fills; then they'll cause a new page to be added and will start filling that page.
So basically the answer is this: inserts at either end of a clustered index should not cause page splits. Page splits can be caused only by inserts between two existing keys. This actually extends to the non-leaf pages as well: an insert at either end of the cluster will not split a non-leaf page either. I do not discuss here the impact of updates, of course (they can cause splits if they increase the length of a variable-length column).
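If you want to check this behavior yourself, here is a minimal repro sketch; the table shape is assumed from the question and all names are made up. Seed the table, rebuild at 100% fill, insert a batch of negative ids, then look at the physical stats: the page count grows, but the pre-existing pages stay full, which is what "no splits" looks like.

-- Hypothetical table modeled on the question.
CREATE TABLE dbo.myTable (
    myId int NOT NULL,
    payload char(200) NOT NULL DEFAULT 'x',
    CONSTRAINT PK_myTable PRIMARY KEY CLUSTERED (myId) WITH (FILLFACTOR = 100)
);

-- Seed ids 0..9999 and rebuild so every leaf page is completely full.
INSERT dbo.myTable (myId)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;
ALTER INDEX PK_myTable ON dbo.myTable REBUILD WITH (FILLFACTOR = 100);

-- Insert 1000 negative ids; they land on freshly allocated pages at the head of the leaf chain.
INSERT dbo.myTable (myId)
SELECT TOP (1000) -ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects;

-- The original pages should still show ~100% space used, i.e. they were not split.
SELECT index_level, page_count, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.myTable'), 1, NULL, 'DETAILED');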
Lately there has been a lot of talk in the SQL Server blogosphere about the potential performance problems of page splits, but I must warn against going to unnecessary extremes to avoid them. Page splits are a normal index operation. If you find yourself in an environment where the page split performance hit is visible during inserts, then you'll probably be hurt worse by the 'mitigation' measures, because you'll create artificial page latch hot spots that are far worse, as they'll affect every insert. What is true is that prolonged operation with frequent splits will result in high fragmentation, which impacts data access time. I'd say that is best mitigated with periodical off-peak index maintenance (reorganize). Avoid premature optimizations; always measure first.
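For reference, the off-peak reorganize mentioned above is a one-liner against the (hypothetical) index from the sketch:

-- Compacts and reorders leaf pages in place; cheaper and less intrusive than a full rebuild.
ALTER INDEX PK_myTable ON dbo.myTable REORGANIZE;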

Not enough to notice for any reasonable system.
Page splits happen when a page is full, either at the start or at the end of the range.
As long as you do regular index maintenance...
Edit, after Fill factor comments:
After a page split with 90 or 100 FF, each page will be 50% full. FF = 100 only means a split will happen sooner (probably on the 1st insert).
With a strictly monotonically increasing (or decreasing) key (+ve or -ve), a page split happens at either end of the range.
However, from BOL, FILLFACTOR, under "Adding Data to the End of the Table":
A nonzero fill factor other than 0 or 100 can be good for performance if the new data is evenly distributed throughout the table. However, if all the data is added to the end of the table, the empty space in the index pages will not be filled. For example, if the index key column is an IDENTITY column, the key for new rows is always increasing and the index rows are logically added to the end of the index. If existing rows will be updated with data that lengthens the size of the rows, use a fill factor of less than 100. The extra bytes on each page will help to minimize page splits caused by extra length in the rows.
So does fill factor matter for strictly monotonic keys...? Especially if it's low-volume writes?

No, not at all. Negative values are just as valid INTegers as positive ones. No problem. Basically, internally, they're all just 4 bytes' worth of zeroes and ones :-)
Marc

You are asking the wrong question!
If you create a clustered index that has a fill factor of 100%, every time a record is inserted or modified, page splits can occur, because there is likely no room on the existing index data page to write the change.
Even with regular index maintenance, a fill factor of 100% is counter-productive on a table where you know inserts are going to be performed. A more usual value would be 90%.
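For what it's worth, the fill factor is applied when the index is built or rebuilt, not maintained afterwards; a sketch with a made-up index name:

-- Leave 10% free space on each leaf page at rebuild time.
ALTER INDEX PK_myTable ON dbo.myTable REBUILD WITH (FILLFACTOR = 90);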

I'm concerned that this post may have taken a wrong turn, in that there seems to be an underlying design issue at work here, irrespective of the resultant page splits.
Why do you need to introduce a negative ID?
An integer primary key, for example, should uniquely identify a row; its sign should be irrelevant. I suspect that there may be a definition issue with the primary key for your table if this is not the case.
If you need to flag/identify the newly inserted records then create a column specifically for this purpose.
This solution would be ideal because you may then be able to ensure that your primary key is sequential (perhaps using an Identity data type, although not essential), thereby avoiding issues with page splits (on insert) altogether.
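A minimal sketch of that approach, with a hypothetical column name:

-- Tag rows by origin instead of encoding it in the sign of the key.
ALTER TABLE dbo.myTable
    ADD SourceType tinyint NOT NULL
        CONSTRAINT DF_myTable_SourceType DEFAULT (0);
-- Rows from the new system would then be inserted with SourceType = 1.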
Also, to confirm if I may: a fill factor of 100% for a clustered index primary key (an identity integer, for example) will not cause page splits for sequential inserts!

Related

Fill factor for nonclustered index on an EmailAddress column?

I'm trying to figure out what an ideal fill factor would be for a non-clustered index on a column such as EmailAddress. If I have a Person table that is frequently added to, a fill factor of 0 would result in heavy fragmentation of the index, since each new person will have an essentially random value here. In my case, the data is written to and read frequently, but we have almost no changes or deletions. Are there any guidelines for indexing these types of columns regarding fill factor?
Fill Factor is irrelevant unless you rebuild the index. An index with "random" insertion points will generate page splits and naturally maintain room on pages to accommodate new rows, as split pages end up 50% full.
If you do rebuild such an index (which there's often no reason to do), then consider using a fill factor so you don't remove all the free space on pages, which would lead to a flurry of page splits after rebuild, the end result of which will be similar to (but more expensive than) rebuilding with a fill factor.
Empirically, 60-75 is a reasonable choice.
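As a concrete sketch (the index and table names are hypothetical):

-- Rebuild the email index leaving ~30% free space per leaf page.
ALTER INDEX IX_Person_EmailAddress ON dbo.Person REBUILD WITH (FILLFACTOR = 70);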

Clustered Index order

I have a CRM system where I am reviewing the structure of the SQL Server 2017 database.
There is a History table with a clustered index on Date_Entered (datetime) and Time_Entered (varchar(8)). There are around 30 million rows (and counting).
When a customer record is opened, their most recent history is displayed in reverse date / time order.
With the index in ascending order, new rows are added at the end of the table. If I were to rebuild it in descending date / time order, I assume that displaying the records would be quicker (though it isn't slow anyway), but would that mean that new records would be inserted at the beginning, and therefore lead to constant page splits, or is that not the case?
You won't end up with constant page splits. That worry usually comes from the wrong mental image of clustered indexes: thinking that they are stored sorted in physical order, so new rows must somehow be inserted physically at the beginning. This is not the case.
But don't do that, as you will end up with 99.x% logical fragmentation without page splits. The new physical pages will tend to be allocated from later in the file than the previous ones, whereas the logical order of the index would need them to be allocated from earlier (example).
Assuming inserts are in order of Date_Entered, Time_Entered then each new page will be filled up and then a new page allocated and filled up. There is no need for page splits that move rows between pages - just allocation of new pages. The only issue is that the physical page order will be likely a complete reversal of the logical index order until you rebuild or reorganize the index.
I assume that displaying the records would be quicker
No, and it is not needed anyway, as the leaf pages of the index are linked together in a doubly linked list and can be read in either the forward or the backward direction.
Backward scans can't be parallelised but hopefully this isn't a concern for you.
In any event, for displaying the customer record you are probably using a different index with leading column customerid, making this largely irrelevant.
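To illustrate the point about backward scans, a hedged sketch; the CustomerId column and the TOP (50) are assumptions, while the sort columns come from the question. With the ascending clustered index in place, this query can be satisfied by a backward range scan, no descending rebuild required:

DECLARE @CustomerId int = 42;  -- hypothetical customer key

SELECT TOP (50) *
FROM dbo.History
WHERE CustomerId = @CustomerId  -- ideally served by a nonclustered index on CustomerId
ORDER BY Date_Entered DESC, Time_Entered DESC;
-- The execution plan will show a backward scan; remember these run serially.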

Why does PostgreSQL (timescaledb) use more storage for a table than expected?

I'm new to databases. Recently I started using timescaledb, which is an extension of PostgreSQL, so I guess this is also PostgreSQL related.
I observed a strange behavior. I calculated my table structure: 1 timestamp, 2 doubles, so 24 bytes per row in total. And I imported (by psycopg2 copy_from) 2,750,182 rows from a csv file. I manually calculated that the size should be 63MB, but when I query timescaledb, it tells me the table size is 137MB, the index size is 100MB, and the total is 237MB. I was expecting the table size to equal my calculation, but it doesn't. Any idea?
There are two basic reasons your table is bigger than you expect:
1. Per tuple overhead in Postgres
2. Index size
Per tuple overhead: An answer to a related question goes into detail that I won't repeat here, but basically Postgres uses 23 (+ padding) bytes per row for various internal things, mostly multi-version concurrency control (MVCC) management (Bruce Momjian has some good intros if you want more info). With 24 bytes of data plus the 23-byte header rounded up to 24 by alignment padding, each row costs roughly 48 bytes, and 2,750,182 rows × 48 bytes ≈ 132 MB, which gets you pretty darn close to the 137 MB you are seeing. The rest might be because of either the fill factor setting of the table or any dead rows still included in the table from, say, a previous insert and subsequent delete.
Index Size: Unlike some other DBMSs Postgres does not organize its tables on disk around an index, unless you manually cluster the table on an index, and even then it will not maintain the clustering over time (see https://www.postgresql.org/docs/10/static/sql-cluster.html). Rather it keeps its indices separately, which is why there is extra space for your index. If on-disk size is really important to you and you aren't using your index for, say, uniqueness constraint enforcement, you might consider a BRIN index, especially if your data is going in with some ordering (see https://www.postgresql.org/docs/10/static/brin-intro.html).
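If you want to see where the bytes are going, a minimal sketch; the table name conditions is made up, and note that for a timescaledb hypertable the rows actually live in chunk tables, so you would sum over the chunks (or use timescaledb's own size functions) rather than query the hypertable directly:

-- Heap vs. index vs. total on-disk size for a plain Postgres table.
SELECT pg_size_pretty(pg_relation_size('conditions'))       AS heap_size,
       pg_size_pretty(pg_indexes_size('conditions'))        AS index_size,
       pg_size_pretty(pg_total_relation_size('conditions')) AS total_size;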

Does changing fill factor on Identity column affect Merge Replication?

I have identified multiple Identity columns in a database that are set to 80 or 90%. I wish to set them all to 100%.
Does anyone know if changing the fill factor on an identity column using Merge Replication causes any issues?
Fill factor comes into the picture only when an index is built or rebuilt; it controls the percentage of space left free on each page.
With merge replication, changes at both sources are tracked through triggers, and they are kept in sync.
When you set the fill factor to 80%, 20% of the space can still be used for inserts. If you set it at 100%, you are not leaving any space, and thereby you have a chance of page splits. Page splits are very expensive in terms of log growth, so there is a chance your inserts will be slower.
But with an identity column, all the values will be increasing, so they will be logically added to the end of the page. So setting a value of 0 or 100 should improve performance. But fill factor affects only your leaf-level pages, and what if you update a row in a way that causes its size to exceed the free space on the page? Here is what MSDN says on this case:
A nonzero fill factor other than 0 or 100 can be good for performance if the new data is evenly distributed throughout the table. However, if all the data is added to the end of the table, the empty space in the index pages will not be filled. For example, if the index key column is an IDENTITY column, the key for new rows is always increasing and the index rows are logically added to the end of the index. If existing rows will be updated with data that lengthens the size of the rows, use a fill factor of less than 100. The extra bytes on each page will help to minimize page splits caused by extra length in the rows.
Setting a good fill factor value depends on how your database is used. Heavy inserts: leave more free space, i.e. use a lower fill factor value, though selects will be somewhat more costly. Few inserts: leave the fill factor at some high value.
A simple search yields many results, but you should test them first and adapt them to your scenario.
FILLFACTOR is mainly used for indexing.
Since you want to change the fill factor to 100, that means you need to drop and recreate the indexes on the merge tables with FILLFACTOR = 100.
And if in your merge replication 'Copy Clustered Index' and 'Copy Nonclustered Index' are TRUE for all article properties, then once you recreate an index on the publisher, it will also get replicated to the subscribers.
So, if you have heavy merge tables with indexes, I would recommend implementing this during off-hours, because index creation will take time to replicate to the subscribers.
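If it helps, you can also inspect the current fill factor of every index on a given article table before deciding what to change; the table name here is made up:

-- 0 in fill_factor means the server default (effectively 100).
SELECT OBJECT_NAME(object_id) AS table_name, name AS index_name, fill_factor
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.MyMergeTable');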
You can check the server-wide default fill factor with the query below, too. Yes, but as #Ragesh said, whenever we change the fill factor, replication performance will be impacted.
Fill factor is directly related to indexes. Every time we hear the word 'index', we directly relate it to performance. An index enhances performance, this is true, but there are several other options along with it.
-- Server-wide default fill factor (0 means "use 100"):
SELECT *
FROM sys.configurations
WHERE name = 'fill factor (%)';
Here is a good article and an explanation for your query:
http://sqlmag.com/blog/what-fill-factor-index-fill-factor-and-performance-part-1
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/9ef72506-f1b0-4700-b836-851e9871d0a6/merge-table-indexes-fill-factor?forum=sqlreplication

How does an index update work when the index key is updated in the table?

I used to think that when I update an indexed column in a table, the index is updated at the same time. But during one of my interviews, the interviewer stressed that it doesn't work that way: for any update to the base table, the index will rebuild/reorganize. Although I am pretty sure this can't happen, as both operations are very costly, I still want to confirm with an expert's view.
While thinking about this, one more thing came to my mind. Say I have index column values 1-1000. As per the B-Tree structure, a value like 999 will go to the rightmost nodes from top to bottom. Now if I update this column from 999 to 2, a lot of shuffling would seem to be required to adjust this value in the index B-Tree. How is this taken care of, if an index rebuild/reorganize doesn't happen after the base table update?
I used to think that when I update an indexed column in a table, the index is updated at the same time.
Yes, that's true. The same holds for deletes and inserts.
Other indexing systems may work differently and may need to be updated incrementally, or rebuilt in their entirety, separately from the indexed data. This may be the source of the confusion.
Statistics do need to be updated separately. (See other active discussions in this group.)
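For example, a manual statistics refresh (hypothetical table name) is simply:

-- Recomputes statistics for all statistics objects on the table.
UPDATE STATISTICS dbo.myTable;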
For any update to the base table, the index will rebuild/reorganize.
No, but if SQL Server cannot fit the node in its physical place, a page split may occur. Or, when a key value changes, a single physical row movement may occur.
Both may cause fragmentation. Too much fragmentation may cause performance issues. That's why DBAs find it necessary to reduce fragmentation by rebuilding or reorganizing an index at a convenient time.
Say I have index column values 1-1000. As per the B-Tree structure, a value like 999 will go to the rightmost nodes from top to bottom. Now if I update this column from 999 to 2, a lot of shuffling would seem to be required to adjust this value in the index B-Tree. How is this taken care of, if an index rebuild/reorganize doesn't happen after the base table update?
Only the changed row is moved to another slot on another page in the B-Tree; the original slot simply becomes empty. If the new page is full, a page split occurs. This causes a change in the parent page, which may cause another page split if that page is also full, and so on. All those events may cause fragmentation, which may cause performance degradation.
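If you would rather observe this than take it on faith, a rough sketch (table and key values are hypothetical): compare the page allocation counters before and after moving a key, since allocations triggered by splits show up there:

-- Leaf allocations include pages allocated by page splits.
SELECT index_id, leaf_allocation_count, nonleaf_allocation_count
FROM sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID('dbo.myTable'), NULL, NULL);

-- Changing the key moves the row: internally a delete plus an insert.
UPDATE dbo.myTable SET myId = 2 WHERE myId = 999;  -- assumes key 2 is currently unused

-- Re-run the first query: if the destination page was full, leaf_allocation_count has grown.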
