I'm trying to figure out what an ideal fill factor would be for a non-clustered index on a column such as EmailAddress. If I have a Person table that is frequently added to, a fill factor of 0 would result in heavy fragmentation of the index, since each new person will have an essentially random value here. In my case, the data is written to and read from frequently, but we have almost no updates or deletions. Are there any guidelines on fill factor for indexing these types of columns?
Fill Factor is irrelevant unless you rebuild the index. An index with "random" insertion points will generate page splits and naturally maintain room on pages to accommodate new rows, as split pages end up 50% full.
If you do rebuild such an index (which there's often no reason to do), then consider using a fill factor so you don't remove all the free space on pages, which would lead to a flurry of page splits after rebuild, the end result of which will be similar to (but more expensive than) rebuilding with a fill factor.
Empirically, a fill factor of 60-75 is a reasonable choice.
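To see why random insertion points leave pages only partly full, here is a toy model of an index leaf level. It is not SQL Server's actual page-split code. Each page is a sorted list of at most `capacity` keys (50 is an arbitrary assumption), a full page splits 50/50, and a key beyond the end of the last page just starts a new page. The function name `average_fill` is made up for this sketch.

```python
import bisect
import random


def average_fill(keys, capacity=50):
    """Insert keys into a toy B-tree leaf level; return average page fullness (0..1).

    Toy model (assumption): each page is a sorted list of at most `capacity`
    keys. A key landing on a full page splits it 50/50, except that a key past
    the end of the last page simply starts a new page (ever-increasing keys).
    """
    pages = [[]]
    bounds = []  # bounds[j] is the first key of pages[j + 1]
    for key in keys:
        i = bisect.bisect_right(bounds, key)  # page whose key range covers `key`
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            pages.append([key])               # new page at the end, no rows move
            bounds.append(key)
        else:
            bisect.insort(page, key)          # temporarily over capacity
            mid = len(page) // 2
            pages[i:i + 1] = [page[:mid], page[mid:]]  # 50/50 split: half the rows move
            bounds.insert(i, page[mid])
    used = sum(len(p) for p in pages)
    return used / (len(pages) * capacity)


random.seed(42)
random_keys = random.sample(range(10_000_000), 20_000)
print(f"random inserts:     {average_fill(random_keys):.2f}")
print(f"sequential inserts: {average_fill(range(20_000)):.2f}")
```

Sequential keys fill every page completely. With random keys, each split leaves two half-full pages that then fill up again, so the average settles between 50% and 100%. The classic analysis of random B-tree insertion puts it near 69% (ln 2), which is in line with the 60-75 range above.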
I have been searching for good information about index benchmarking on PostgreSQL and found nothing really good.
I need to understand how PostgreSQL behaves while handling a huge number of records.
Let's say 2000M records in a single non-partitioned table.
Theoretically, B-trees are O(log n) for reads and writes, but in practice I think that's an idealized scenario that doesn't account for things like HUGE indexes not fitting entirely in memory (swapping?) and maybe other things I am not seeing at this point.
There are no JOIN operations, which is fine, but note that this is not an analytical database and response times below 150ms (the lower the better) are required. All searches are expected to use indexes, of course. We have 2-3 indexes:
UUID PK index
timestamp index
VARCHAR(20) index (non unique but high cardinality)
My concern is how writes and reads will perform once the table reaches its expected top capacity (2500M records).
... so specific questions might be:
Can "adding more iron" achieve reasonable performance in such a scenario?
Note: this is a non-clustered DB, so this means vertical scaling.
What would be the main sources of time consumption for reads and writes?
How many records in a table can we consider "too much" for this standard PostgreSQL setup (no cluster, no partitions, no sharding)?
I know this number of records suggests some alternative (sharding, partitioning, etc.), but this question is about learning and understanding PostgreSQL's capabilities more than anything else.
There should be no performance degradation when inserting into or selecting from a table, even a large one. The cost of index access grows with the logarithm of the table size, but the base of the logarithm is large, so the depth of the index cannot be more than 5 or 6. The upper levels of the index are probably cached, so accessing a single table row through an index scan ends up reading only a handful of blocks. Note that you don't have to cache the whole index.
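The "large base" argument can be checked with simple arithmetic. The number of levels a B-tree needs is the smallest d with fanout^d >= rows. The fanout values below are assumptions, not PostgreSQL measurements: an 8 kB page holding small keys such as a UUID or a timestamp fits on the order of a few hundred index entries, and 100 is a deliberately pessimistic lower bound.

```python
def btree_depth(n_rows, fanout):
    """Levels needed for a B-tree whose pages each hold `fanout` entries.

    Uses integer multiplication instead of math.log to avoid floating-point
    edge cases at exact powers of the fanout.
    """
    levels, reachable = 1, fanout
    while reachable < n_rows:
        reachable *= fanout
        levels += 1
    return levels


for fanout in (100, 300):  # assumed entries per 8 kB index page
    print(fanout, btree_depth(2_500_000_000, fanout))
# 100 5
# 300 4
```

Even at 2500M rows and a pessimistic fanout, a lookup touches only 4-5 index pages plus the heap page, and the top levels are the ones most likely to stay cached.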
Is there any way other than "analyze index validate structure" to find out about index fragmentation in an Oracle database? That command causes row locks in our production environment.
"analyze index validate structure online" doesn't populate INDEX_STATS.
Thanks.
If your optimizer statistics are relatively current, you can do something like this:
get number of LEAF_BLOCKS from USER_INDEXES
get NUM_ROWS from USER_INDEXES
get AVG_COL_LEN for each column in the index from USER_TAB_COLUMNS
Summing the column lengths, plus 6 bytes for the rowid, times the number of rows gives you a total byte figure for the index entries.
Depending on the data within it and how that data was inserted, an index will typically sit at around 65-90% full. Throw in some block-level overhead (let's say 200 bytes per 8k block), and you can use this to get an estimate of how many leaf blocks you expect the index to have.
If that is roughly close to the LEAF_BLOCKS statistic you have, then you can assume that the index is probably not "fragmented" (although that is a term that can cover a multitude of things).
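The estimate described above can be sketched as a small calculation. It assumes an 8 kB block, the rough 200-byte per-block overhead and the 65-90% fill range from this answer, and 6 bytes per rowid. The inputs in the usage line are made-up numbers, not real dictionary output; in practice NUM_ROWS and LEAF_BLOCKS come from USER_INDEXES and AVG_COL_LEN from USER_TAB_COLUMNS.

```python
def estimate_leaf_blocks(num_rows, avg_col_lens, block_size=8192,
                         block_overhead=200, rowid_bytes=6):
    """Return (fewest, most) leaf blocks expected for an unfragmented index.

    Entry size is the sum of AVG_COL_LEN for the indexed columns plus the
    rowid. The range assumes leaf blocks end up between 90% and 65% full.
    """
    entry_bytes = sum(avg_col_lens) + rowid_bytes
    total_bytes = entry_bytes * num_rows
    usable = block_size - block_overhead

    def blocks_at(fill_pct):
        per_block = usable * fill_pct // 100
        return -(-total_bytes // per_block)  # ceiling division

    return blocks_at(90), blocks_at(65)


# Hypothetical index: 1M rows on two columns averaging 20 and 8 bytes.
fewest, most = estimate_leaf_blocks(1_000_000, [20, 8])
print(fewest, most)  # 4728 6547
```

A LEAF_BLOCKS value far above the upper figure would suggest the index holds a lot of empty space. A value inside the range suggests it is about as dense as one would expect.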
But unless you have a performance issue that you can currently tie back to this index, I wouldn't worry too much about index fragmentation.
A row lock is a row lock; as such it is very unlikely that this is due to index fragmentation (which is another topic).
A row lock is generally taken out by the application, and is normal behavior.
I have identified multiple indexes on identity columns in a database whose fill factor is set to 80 or 90%. I wish to set them all to 100%.
Does anyone know if changing the fill factor on an identity column's index in a database that uses merge replication causes any issues?
Fill factor comes into the picture only when an index is created or rebuilt: at that point each leaf page is left with the percentage of free space implied by the FILLFACTOR setting.
With merge replication, changes at both sources are tracked through triggers and kept in sync.
When you set the fill factor to 80%, 20% of the space can still be used for inserts. If you set it to 100%, you are not leaving any space, so there is a chance of page splits. Page splits are very expensive in terms of log growth, so your inserts may be slower.
But with an identity column, all the values are increasing, so new rows are logically added to the end of the index. So setting a value of 0 or 100 should improve performance. However, fill factor affects only your leaf-level pages, and what if you update a row in a way that makes it exceed the space left on its page? Here is what MSDN says about this case:
A nonzero fill factor other than 0 or 100 can be good for performance if the new data is evenly distributed throughout the table. However, if all the data is added to the end of the table, the empty space in the index pages will not be filled. For example, if the index key column is an IDENTITY column, the key for new rows is always increasing and the index rows are logically added to the end of the index. If existing rows will be updated with data that lengthens the size of the rows, use a fill factor of less than 100. The extra bytes on each page will help to minimize page splits caused by extra length in the rows.
Setting a good fill factor value depends on how your database is used. With heavy inserts, more free space should be left, so the fill factor should be lower (but selects will be somewhat more costly). With fewer inserts, leave the fill factor at a high value.
A simple search yields many results, but you should test them first and adapt them to your scenario.
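The trade-off in the last answer (more free space for inserts, more pages to read for selects) can be put into numbers. This sketch assumes about 8,096 bytes of row space per SQL Server page and a fixed row size that already includes per-row overhead. It only approximates how a rebuild packs pages, and the 100-byte row in the usage loop is an arbitrary example.

```python
def pages_after_rebuild(n_rows, row_bytes, fill_factor, page_bytes=8096):
    """Approximate leaf pages right after a rebuild at `fill_factor` (1-100).

    A fill factor of 0 behaves like 100. Assumes fixed-size rows and
    `page_bytes` of usable space per page (both simplifications).
    """
    ff = 100 if fill_factor == 0 else fill_factor
    rows_per_page = page_bytes * ff // 100 // row_bytes
    return -(-n_rows // rows_per_page)  # ceiling division


for ff in (100, 90, 80, 70):
    print(ff, pages_after_rebuild(1_000_000, 100, ff))
```

Going from 100 to 80 leaves room for inserts but makes a full scan read roughly 25% more pages (15,625 instead of 12,500 for this example). That extra read cost is what makes selects "somewhat more costly".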
FILLFACTOR is mainly used for Indexing.
Since you want to change the fill factor to 100, you need to drop and recreate the indexes on the merge tables with FILLFACTOR 100.
And if 'Copy Clustered Index' and 'Copy Non Clustered Index' are set to TRUE in the properties of all articles in your merge publication, then once you recreate an index on the publisher, it will also be replicated to the subscribers.
So, if you have heavy merge tables with indexes, I would recommend doing this during off-hours, because the index creation will take time to replicate to the subscribers.
You can also check the server-wide default fill factor with the query below. And yes, as @Ragesh said, changing the fill factor under replication will impact performance.
Fill factor is directly related to indexes. Every time we hear the word 'index', we directly relate it to performance. Indexes enhance performance, and this is true, but there are several other options that go along with them.
SELECT *
FROM sys.configurations
WHERE name ='fill factor (%)'
Here is a good article explaining your question:
http://sqlmag.com/blog/what-fill-factor-index-fill-factor-and-performance-part-1
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/9ef72506-f1b0-4700-b836-851e9871d0a6/merge-table-indexes-fill-factor?forum=sqlreplication
I used to think that when I update an indexed column in a table, the index is updated at the same time. But during one of my interviews, the interviewer stressed that it doesn't work that way: for any update to the base table, the index will be rebuilt/reorganized. I am pretty sure this can't happen, as both operations are very costly, but I still want to confirm with an expert's view.
While thinking about this, one more thing came to mind. Say I have index column values 1-1000. Per the B-tree structure, value 999 will go to the rightmost nodes from top to bottom. Now if I update this column from 999 to 2, a lot of shuffling will be required to place this value in the index B-tree. How is this taken care of if no index rebuild/reorganize happens after the base table update?
I used to think that when I update an indexed column in a table, the index is updated at the same time.
Yes, that's true. The same goes for deletes and inserts.
Other indexing systems may work differently and need to be updated incrementally, or rebuilt in their entirety, separately from the indexed data. This may be confusing.
Statistics need to be updated separately. (See other active discussions in this group.)
For any update to the base table, the index will be rebuilt/reorganized.
No, but if SQL Server cannot fit the row in its physical place, a page split may occur. Or, when a key value changes, that single physical row may be moved.
Both may cause fragmentation. Too much fragmentation may cause performance issues. That's why DBAs find it necessary to reduce fragmentation by rebuilding or reorganizing an index at a convenient time.
Say I have index column values 1-1000. Per the B-tree structure, value 999 will go to the rightmost nodes from top to bottom. Now if I update this column from 999 to 2, a lot of shuffling will be required to place this value in the index B-tree. How is this taken care of if no index rebuild/reorganize happens after the base table update?
Only the changed row is moved, to a slot on another page of the B-tree; its original slot remains empty. If the new page is full, a page split occurs. This changes the parent page, which may split in turn if it is also full, and so on. All of these events may cause fragmentation, which may degrade performance.
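The 999 to 2 example can be traced with a toy leaf level. The key update is treated as a delete plus an insert, as described above. This is a sketch, not SQL Server internals: the helper names (`build`, `find_page`, `update_key`) are hypothetical, the index is non-unique (so a duplicate 2 is allowed), and parent-page splits and empty pages are not modelled.

```python
import bisect


def build(keys, capacity):
    """Pack sorted keys into full pages, like a freshly rebuilt index at 100% fill."""
    return [list(keys[i:i + capacity]) for i in range(0, len(keys), capacity)]


def find_page(pages, key):
    """Index of the page whose key range should hold `key` (pages assumed non-empty)."""
    firsts = [p[0] for p in pages[1:]]
    return bisect.bisect_right(firsts, key)


def update_key(pages, old, new, capacity):
    """Apply an index-key update as delete + insert; return the number of page splits."""
    pages[find_page(pages, old)].remove(old)  # leaves a free slot; nothing else moves
    dst_i = find_page(pages, new)
    dst = pages[dst_i]
    bisect.insort(dst, new)
    if len(dst) <= capacity:
        return 0                              # the row fit on the target page
    mid = len(dst) // 2                       # page split: upper half moves to a new page
    pages[dst_i:dst_i + 1] = [dst[:mid], dst[mid:]]
    return 1


pages = build(list(range(1, 1001)), 50)  # values 1-1000, 50 per page, all pages full
print(update_key(pages, 999, 2, 50), len(pages), len(pages[-1]))
```

Only one row moves. The last page keeps a one-row gap, and the full first page splits once to make room for the new key. No rebuild is involved; the leftover gap and the split are the "fragmentation" that later maintenance cleans up.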
I have a table myTable with a unique clustered index on myId with a fill factor of 100%.
It's an integer, starting at zero (but it's not an identity column for the table).
I need to add a new type of row to the table.
It might be nice if I could distinguish these rows by using negative values of myId.
Would having negative values incur extra page splitting and slow down inserts?
Extra Background:
This table exists as part of the ETL for a data warehouse that gathers data from disparate systems. I now want to accommodate a new type of data. One way for me to do this is to reserve negative ids for this new data, which will thus be automatically clustered. This will also avoid major key changes or extra columns in the schema.
Answer Summary:
A fill factor of 100% will normally slow down inserts, but not inserts that happen sequentially, and that includes sequential negative inserts.
Besides the practical administration points you already got and the dubious use of negative ids to represent data model attributes, there is also a valid question here: given a table with int ids from 0 to N, where would newly inserted negative values go, and would they cause additional splits?
The initial rows will be placed on the clustered index leaf pages, with the row with id 0 on the first page and the row with id N on the last page, filling the pages in between. When the first row with a value of -1 is inserted, it sorts ahead of the row with id 0, so it will add a new page to the tree (it will actually allocate an extent of 8 pages, but that is a different point) and link that page in front of the leaf-level linked list of pages. This will NOT cause a page split of the former first page. Further inserts of values -2, -3, etc. will go to the same new page and be inserted in the proper position (-2 ahead of -1, -3 ahead of -2, etc.) until the page fills. Further inserts will then add a new page ahead of this one to accommodate further new values. Inserts of positive values N+1, N+2 will go to the last page and be placed in it until it fills; then they'll cause a new page to be added and start filling that page.
So basically the answer is this: inserts at either end of a clustered index should not cause page splits. Page splits can be caused only by inserts between two existing keys. This actually extends to the non-leaf pages as well: an insert at either end of the clustered index should not split a non-leaf page either. I do not discuss here the impact of updates, of course (they can cause splits if they increase the length of a variable-length column).
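The behaviour described in this answer can be illustrated with a toy leaf level that counts only "real" splits, meaning splits where existing rows move to a new page. The model encodes the answer's description (a key past either end gets a fresh page at that end). It is not verified against SQL Server internals, and `count_splits` and the page capacity of 50 are assumptions for the sketch.

```python
import bisect


def count_splits(keys, capacity=50):
    """Insert keys into a toy leaf level and count splits that move existing rows.

    Toy model (assumption): a key before the first page or after the last
    page gets a fresh page at that end; only a key landing between existing
    keys on a full page forces a 50/50 split.
    """
    pages = [[]]
    bounds = []  # bounds[j] is the first key of pages[j + 1]
    splits = 0
    for key in keys:
        i = bisect.bisect_right(bounds, key)
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            pages.append([key])          # new page at the high end (N+1, N+2, ...)
            bounds.append(key)
        elif i == 0 and key < page[0]:
            bounds.insert(0, page[0])    # old first page becomes pages[1]
            pages.insert(0, [key])       # new page at the low end (-1, -2, ...)
        else:
            bisect.insort(page, key)
            mid = len(page) // 2
            pages[i:i + 1] = [page[:mid], page[mid:]]
            bounds.insert(i, page[mid])
            splits += 1
    return splits


ends = list(range(1000)) + list(range(-1, -501, -1)) + list(range(1000, 1500))
middle = list(range(0, 2000, 2)) + list(range(1, 100, 2))
print(count_splits(ends), count_splits(middle))
```

Loading ids 0-999, then -1 down to -500, then 1000-1499 produces no splits at all. Inserting odd keys between the existing even keys splits full pages repeatedly.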
Lately there has been a lot of talk in the SQL Server blogosphere about the potential performance problems of page splits, but I must warn against going to unnecessary extremes to avoid them. Page splits are a normal index operation. If you find yourself in an environment where the page split performance hit is visible during inserts, then you'll probably be hit worse by the 'mitigation' measures, because you'll create artificial page latch hot spots that are far worse, as they'll affect every insert. What is true is that prolonged operation with frequent splits will result in high fragmentation, which impacts data access time. I say that is best mitigated with periodic off-peak index maintenance (reorganize). Avoid premature optimizations; always measure first.
Not enough to notice for any reasonable system.
Page splits happen when a page is full, either at the start or at the end of the range.
As long as you do regular index maintenance...
Edit, after Fill factor comments:
After a page split with 90 or 100 FF, each page will be 50% full. FF = 100 only means the split will happen sooner (probably on the first insert).
With a strictly monotonically increasing (or decreasing) key (+ve or -ve), a page split happens at either end of the range.
However, from BOL, FILLFACTOR
Adding Data to the End of the Table
A nonzero fill factor other than 0 or 100 can be good for performance if the new data is evenly distributed throughout the table. However, if all the data is added to the end of the table, the empty space in the index pages will not be filled. For example, if the index key column is an IDENTITY column, the key for new rows is always increasing and the index rows are logically added to the end of the index. If existing rows will be updated with data that lengthens the size of the rows, use a fill factor of less than 100. The extra bytes on each page will help to minimize page splits caused by extra length in the rows.
So does fill factor matter for strictly monotonic keys, especially with low-volume writes?
No, not at all. Negative values are just as valid INT values as positive ones. No problem. Internally, they're all just 4 bytes' worth of zeroes and ones :-)
Marc
You are asking the wrong question!
If you create a clustered index that has a fillfactor of 100%, every time a record is inserted, deleted or even modified, page splits can occur because there is likely no room on the existing index data page to write the change.
Even with regular index maintenance, a fill factor of 100% is counterproductive on a table where you know inserts are going to be performed. A more usual value would be 90%.
I'm concerned that this post may have taken a wrong turn, in that there seems to be an underlying design issue at work here, irrespective of the resultant page splits.
Why do you need to introduce a negative ID?
An integer primary key, for example, should uniquely identify a row; its sign should be irrelevant. I suspect that there may be a definition issue with the primary key for your table if this is not the case.
If you need to flag/identify the newly inserted records then create a column specifically for this purpose.
This solution would be ideal because you may then be able to ensure that your primary key is sequential (perhaps using an IDENTITY column, although that is not essential), thereby avoiding issues with page splits (on insert) altogether.
Also, to confirm if I may, a fill factor of 100% for a clustered index primary key (identity integer for example), will not cause page splits for sequential inserts!