I have some confusion. If I set the fill factor to 50%, the SQL engine will leave half the space empty for future growth, so data will be stored on each data page only up to about 4 KB, because the maximum page size is 8 KB. Also, fill factor applies only when rebuilding indexes. Please clear up my doubts about the above scenario. Any help would be appreciated.
Thanks
DBAs and developers often read that lowering the fillfactor improves performance by reducing page splits. Perhaps they’re trying to fix a performance problem, or perhaps they’re feeling paranoid. They either lower fillfactor too much on some indexes, or apply a fillfactor change to all indexes.
Here’s the scoop: it’s true that the default fillfactor of 100% isn’t always good. If I fill my pages to the brim, and then go back and need to insert a row onto that page, it won’t fit. To make the data fit and preserve the logical structure of the index, SQL Server will have to do a bunch of complicated things (a “bad” type of page split), including:
1) Add a new page.
2) Move about half the data to the new page.
3) Mark the data that was moved on the old page so it's not valid anymore.
4) Update page link pointers on existing pages to point to the new page.
And yep, that’s a lot of work.
It generates log records and causes extra IO. And yes, if you have this happen a lot, you might want to lower the fillfactor in that index a bit to help make it happen less often.
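For a single problem index, the remedy is a one-line rebuild. A minimal T-SQL sketch, with hypothetical table and index names:

    -- Rebuild one specific index, leaving 10% free space on each leaf page.
    -- dbo.Orders and IX_Orders_CustomerId are made-up names for illustration.
    ALTER INDEX IX_Orders_CustomerId ON dbo.Orders
        REBUILD WITH (FILLFACTOR = 90);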
BEST PRACTICES FOR SETTING FILLFACTOR
Here’s some simple advice on how to set fillfactor safely:
1)Don’t set the system wide value for fillfactor. It’s very unlikely that this will help your performance more than it hurts.
2)Get a good index maintenance solution that checks index fragmentation and only acts on indexes that are fairly heavily fragmented. Have the solution log to a table. Look for indexes that are frequently fragmented. Consider lowering the fillfactor gradually on those individual indexes using a planned change to rebuild the index. When you first lower fillfactor, consider just going to 95 and reassessing the index after a week or two of maintenance running again. (Depending on your version and edition of SQL Server, the rebuild may need to be done offline. Reorganize can’t be used to set a new fillfactor.)
This second option may sound nitpicky, but in most environments it only takes a few minutes to figure out where you need to make a change, and you can do it once a month. It's worth it, because nobody wants to discover that their database has slowed down because many gigabytes of space in memory were being left needlessly empty, causing extra IO.
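As a sketch of the kind of fragmentation check such a maintenance solution runs (the 30% threshold and 1000-page floor are illustrative assumptions, not rules):

    -- Find heavily fragmented indexes in the current database.
    -- LIMITED mode keeps the scan itself cheap.
    SELECT OBJECT_NAME(ips.object_id) AS table_name,
           i.name                     AS index_name,
           ips.avg_fragmentation_in_percent,
           ips.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    WHERE ips.avg_fragmentation_in_percent > 30
      AND ips.page_count > 1000   -- ignore tiny indexes; their stats are noisy
    ORDER BY ips.avg_fragmentation_in_percent DESC;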
I am using SQL Server 2012, and for the last few days I have noticed that the fragmentation of some indexes is growing very rapidly. I have read various articles and applied a fill factor.
First I changed the fill factor to 95 and rebuilt; after one day, fragmentation was about 50%. So I decreased the fill factor to 90, and then to 80, but after one day fragmentation again reached 50%.
I need some help finding the reason for the growing fragmentation and a solution to fix it.
FYI, I am applying the fill factor at the index level. Only 4-5 indexes have this issue; I have applied a fill factor to the other indexes as well, and they are working fine.
Thanks in advance.
There are many things which cause index fragmentation. Some of them are below:
1. Insert and update operations causing page splits
2. Delete operations
3. Initial allocation of pages from mixed extents
4. Large row size
SQL Server only uses fillfactor when you're creating, rebuilding, or reorganizing an index, so even if you specify a fill factor of 70, you may still get page splits. Furthermore, index fragmentation is an "expected" and "unavoidable" characteristic of any OLTP environment.
So with your fill factor setting, SQL Server leaves some space when the index is rebuilt. This helps only in the first scenario above, and it is also subject to your workload.
So I recommend not worrying much about fragmentation unless your workload does a lot of range scans. Below are some links which will help you.
You can also track page splits and deletes, which are some of the causes of fragmentation, using Perfmon counters, Extended Events, and the transaction log (a sketch follows the references below).
https://dba.stackexchange.com/questions/115943/index-fragmentation-am-i-interpreting-the-results-correctly
https://www.brentozar.com/archive/2012/08/sql-server-index-fragmentation/
References:
Notes - SQL Server Index Fragmentation, Types and Solutions
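As a minimal sketch of the Perfmon-counter route mentioned above (this counter is server-wide, not per index, and it counts "good" end-of-index splits along with the expensive mid-index ones):

    -- Page Splits/sec is exposed as a cumulative raw value; sample it twice
    -- and take the difference to get a rate.
    SELECT cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Page Splits/sec'
      AND object_name LIKE '%Access Methods%';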
I have a system which populates an empty database with many millions of records.
The database has various types of indexes, the ones I'm worried about are:
Indices on foreign keys. These are non-clustered, and not necessarily inserted in sequential order.
Indices on BINARY(32) fields. These are content hashes and not ordered at all. Basically, they are like GUIDs and not sequential.
So as the data is bulk-inserted, there is significant fragmentation of these indices.
Question 1: if I set FILLFACTOR=75 on these indices when the database is created, will it have any effect at all as the data is inserted? It seems FILLFACTOR takes effect after data is created, not before. Or will new index pages be allocated with the original fillfactor setting?
Question 2: what other recommended strategies can I use to make sure these indices perform optimally?
Question 1:
Fill factor is used only when indexes are created or rebuilt; SQL Server doesn't try to fill pages based on the fill factor while doing inserts.
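A short sketch of that distinction, with hypothetical names; the fill factor is honored only at CREATE or REBUILD time:

    -- The 25% free-space target applies to whatever data exists right now.
    CREATE NONCLUSTERED INDEX IX_Files_ContentHash
        ON dbo.Files (content_hash)
        WITH (FILLFACTOR = 75);

    -- ... bulk inserts happen here; pages fill and split with no regard
    -- for the fill factor setting ...

    -- Re-applying the fill factor requires a rebuild:
    ALTER INDEX IX_Files_ContentHash ON dbo.Files
        REBUILD WITH (FILLFACTOR = 75);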
Question 2:
It depends on what you mean by optimal. At a minimum, you can check whether your indexes are useful and whether your queries are using them. There are tons of best practices around indexes, like a selective leading key and small keys.
It's good to search for anything about indexes from Kimberly Tripp and DBA.SE.
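A hedged sketch of the "are my queries using my indexes" check (these stats reset on instance restart, so judge them over a representative period):

    -- Reads vs. writes per index since the last restart. Indexes with
    -- zero reads and many writes are candidates for review.
    SELECT OBJECT_NAME(us.object_id) AS table_name,
           i.name                    AS index_name,
           us.user_seeks + us.user_scans + us.user_lookups AS reads,
           us.user_updates           AS writes
    FROM sys.dm_db_index_usage_stats AS us
    JOIN sys.indexes AS i
      ON i.object_id = us.object_id AND i.index_id = us.index_id
    WHERE us.database_id = DB_ID()
    ORDER BY reads ASC, writes DESC;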
References:
http://www.sqlskills.com/blogs/paul/a-sql-server-dba-myth-a-day-2530-fill-factor/
http://www.sqlskills.com/blogs/kimberly/category/indexes/
Check the index fragmentation as well as the write/read ratio. If the write/read ratio is very high AND you see fragmentation, you can experiment with adding a fillfactor during an index rebuild operation.
The right fill factor really depends on the fragmentation you are seeing. If you see 0-20% fragmentation (and it accumulated over a period of time), you may not want any fill factor at all. If you see 20-40% fragmentation, you may try 90%.
Lastly, get a good index maintenance plan. Ola Hallengren's index script is excellent.
NB: The above suggestions are just suggestions - your mileage may vary.
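A sketch of the write/read check described above, with the same caveat that usage stats reset when the instance restarts:

    -- Rough write/read ratio per index; a high ratio combined with
    -- recurring fragmentation is where a lower fill factor may help.
    SELECT i.name AS index_name,
           us.user_updates AS writes,
           us.user_seeks + us.user_scans + us.user_lookups AS reads,
           CASE WHEN us.user_seeks + us.user_scans + us.user_lookups = 0
                THEN NULL
                ELSE 1.0 * us.user_updates
                     / (us.user_seeks + us.user_scans + us.user_lookups)
           END AS write_read_ratio
    FROM sys.dm_db_index_usage_stats AS us
    JOIN sys.indexes AS i
      ON i.object_id = us.object_id AND i.index_id = us.index_id
    WHERE us.database_id = DB_ID();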
When starting a project, should SQL indexes be created at the beginning?
I have a project where I haven't created any indexes yet in production. The table that will grow the most has 30000 rows, and I have measured query times against this table after creating an index and after deleting it. The times are very similar.
I have decided to postpone creating the indexes in production until I notice that creating them reduces query response times.
Is my approach correct? Or should I create them now?
I'm pretty deep into the topic of database indexing (it's actually my full-time job; I also wrote a book about it, SQL Performance Explained, which is available for free here).
In my opinion, indexes should be created at the time you write the query, because that is when you have all the information needed to decide which indexes to create fresh in your head. In other words, doing it at that time takes no extra effort. Another reason is that indexing sometimes affects how you have to write the query so that it can actually benefit from the index.
However, the above statement assumes that you know how indexes work, so you can decide which indexes to create. If you don't know that, I'd really suggest learning about proper indexing first. Again, the book I've written is available for free on the web (Table of Contents). According to a recent survey, it takes about 4-5 hours to read through it. Well-spent time, I'd say.
However, due to the ludicrous speed of modern hardware and the vast amount of memory (even on cheap commodity hardware), it is absolutely possible that you cannot yet measure any difference with these small tables (30k rows is small in the DB world). Nevertheless, just because you cannot measure the difference with a timer's resolution of maybe 10 ms doesn't mean the difference isn't there. Further: did you verify that the index was actually used? Are you sure the index you created was a good index for the given query?
Nevertheless, if the overall system is fast enough for you at the moment, sure, you can go on without indexes. The risk remains, however, that it isn't fast enough on the day a major news outlet covers your app. What is supposed to be your best day might turn out to become your worst day :(
You didn't tell us a lot about your app, so I have to do some guesswork. I guess it is more of an OLTP app like an online website (as opposed to BI/OLAP). Although indexes add some overhead to write operations (insert, update, delete, and merge), this is typically small compared to the benefit they bring to selects (still assuming OLTP). Sure, you can misuse indexes (e.g., by creating hundreds on a single table) so that the overhead becomes a major problem too. But adding "a few" indexes to an OLTP table will most certainly not cause any problems due to the maintenance overhead.
Coming to an end: if you already know which indexes are good for your queries (verify it using explain), add them now before it is too late. If you are not sure, I'd still suggest putting some effort into that now. If you are not afraid of load peaks taking your app down, go on without indexes.
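A minimal sketch of that verification step. The syntax shown is PostgreSQL's EXPLAIN, and the table, index, and query are invented; SQL Server and others have equivalent execution-plan tools:

    -- Hypothetical index and query.
    CREATE INDEX idx_orders_customer ON orders (customer_id);

    EXPLAIN
    SELECT * FROM orders WHERE customer_id = 42;
    -- In the plan, look for index access ("Index Scan using
    -- idx_orders_customer") rather than a sequential scan of the table.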
If you need more help, create a new question containing your query, table and index definitions, and the explain output, and people will be happy to help you figure out whether that index is fine or not.
Just create them now based on sensible choices: start with primary and foreign keys (that'll keep your joins fast), then add indexes on the single columns you'll be searching on (name, phone, etc.).
Avoid creating multiple column indexes until you have a demonstrated performance problem and you can prove that an index helps. Often, reworking the query will fix the problem better than some complicated index.
The only time I delay creating indexes is if I'm about to load a heap of data, since building indexes before loading means a much slower load, as the index is updated for every row added. However, some databases allow the index build to be deferred until after the load, so even then there's no point in waiting.
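A sketch of that starting order, with invented table and column names:

    -- Primary keys (most databases index these automatically).
    ALTER TABLE orders ADD CONSTRAINT pk_orders PRIMARY KEY (order_id);

    -- Foreign key columns usually need an explicit index to keep joins fast.
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);

    -- Single columns you'll be searching on.
    CREATE INDEX idx_customers_name  ON customers (name);
    CREATE INDEX idx_customers_phone ON customers (phone);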
I don't know the correct words for what I'm trying to find out about, and as such I'm having a hard time googling.
I want to know whether it's possible with databases (technology independent, though I would be interested to hear whether it's possible with Oracle, MySQL and Postgres) to point to specific rows instead of executing my query again.
So I might initially execute a query, find some rows of interest, and then wish to avoid searching for them again by keeping a list of pointers or some other metadata which indicates their location in the database, so I can go straight to them the next time I want those results.
I realise there is caching in databases, but I want to keep these "pointers" elsewhere, and as such caching doesn't ultimately solve this problem. Is this just an index, and should I store the index and look rows up by it? Most of my current tables don't have indexes, and I don't want the speed decrease that sometimes comes with indexes.
So what's the magic term I've been trying to put into Google?
Cheers
In Oracle it is called ROWID. It identifies the file, the block number, and the row number in that block. I can't say that what you are describing is a good idea, but this might at least get you started looking in the right direction.
Check here for more info: http://www.orafaq.com/wiki/ROWID.
By the way, the "speed decrease that comes with indexes" that you are afraid of is only relevant if you do more inserts and updates than reads. Indexes only speed up reads, so if the read ratio is high, you might not have an issue and an index might be your best solution.
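A minimal Oracle sketch of the idea (the table, predicate, and ROWID literal are invented; also note a ROWID can change if the row physically moves, e.g. after a table reorganization, so don't treat it as permanent):

    -- Capture the physical row identifiers once.
    SELECT t.ROWID, t.* FROM my_table t WHERE status = 'FLAGGED';

    -- Later, fetch a row directly by its stored ROWID.
    SELECT * FROM my_table WHERE ROWID = 'AAAR3sAAEAAAACXAAA';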
"most of my current tables don't have indexes and I don't want the speed decrease that sometimes comes with indexes."
And you also don't want the speed increase which usually comes with indexes but you want to hand-roll a bespoke pseudo-cache instead?
I'm not being snarky here; this is a serious point. Database designers have expended a great deal of skill and energy on optimizing their products. Wouldn't it be more sensible to learn how to take advantage of their efforts rather than re-implementing some core features?
In general, the best way to handle this sort of requirement is to use the primary key (or in fact any convenient, compact unique identifier) as the 'pointer', and rely on the indexed lookup to be swift - which it usually will be.
You can use ROWID in more DBMSs than just Oracle, but it generally isn't recommended, for a variety of reasons. If you succumb to the 'every table has an autoincrement column' school of database design, then you can record the autoincrement column values as the identifiers.
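A sketch of the key-as-pointer approach, with invented names:

    -- First pass: run the expensive query once and keep only the keys.
    SELECT order_id FROM orders WHERE status = 'FLAGGED';  -- store these ids

    -- Later: fetch the rows directly via the primary key index.
    SELECT * FROM orders WHERE order_id IN (1001, 2057, 3400);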
You should have at least one index on (almost) all of your tables - that index will be for the primary key. The exception might be for a table so small that it fits in memory easily and won't be updated and will be used enough not to be evicted from memory. Then an index might be a distraction; however, such tables are typically seldom updated so the index won't harm anything, and the optimizer will ignore it if the index doesn't help (and it may not).
You may also have auxiliary indexes. In a system where most of the activity is reading the data, you may want to err on the side of having more indexes rather than fewer, because access time is most critical. If your system is update-intensive, then you would go with fewer indexes, because there is a cost associated with updating indexes when data is added, removed, or updated. Clearly, you need to design the indexes to work well with the queries that your users (or your applications) actually perform.
You may also be interested in cursors. (Note that the index debate is still valid with cursors.)
Wikipedia definition here.
I'm creating an app that will have to put at most 32 GB of data into my database. I am using B-tree indexing because the reads will include range queries (like 0 < time < 1 hr).
At the beginning (database size = 0 GB), I get 60 to 70 writes per millisecond. After, say, 5 GB, the three databases I've tested (H2, Berkeley DB, Sybase SQL Anywhere) have REALLY slowed down, to under 5 writes per millisecond.
Questions:
Is this typical?
Would I still see this scalability issue if I REMOVED indexing?
What are the causes of this problem?
Notes:
Each record consists of a few ints
Yes; indexing improves fetch times at the cost of insert times. Your numbers sound reasonable - without knowing more.
You can benchmark it. You'll need to have a reasonable amount of data stored. Consider whether or not to index based upon the queries: heavy fetch and light insert? Index everywhere a WHERE clause might use it. Light fetch, heavy inserts? Probably avoid indexes. Mixed workload? Benchmark it!
When benchmarking, you want data as real or realistic as possible, both in volume and in data domain (the distribution of the data: not just all "henry smith" but all manner of names, for example).
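A rough sketch of such a benchmark, in T-SQL for concreteness (the thread's engines differ, and the names and row counts here are arbitrary):

    -- Two identical tables; one carries an extra nonclustered index on name.
    CREATE TABLE bench_plain   (id INT IDENTITY PRIMARY KEY, name VARCHAR(50));
    CREATE TABLE bench_indexed (id INT IDENTITY PRIMARY KEY, name VARCHAR(50));
    CREATE INDEX ix_bench_name ON bench_indexed (name);

    -- Time 100,000 well-distributed inserts into the indexed table;
    -- repeat against bench_plain and compare the elapsed times.
    DECLARE @start DATETIME2 = SYSDATETIME();
    INSERT INTO bench_indexed (name)
    SELECT TOP (100000) LEFT(CONVERT(VARCHAR(36), NEWID()), 20)
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;
    SELECT DATEDIFF(ms, @start, SYSDATETIME()) AS elapsed_ms;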
It is typical for indexes to sacrifice insert speed for access speed. You can see this taken to an extreme in a database table (and I've seen these in the wild) that indexes every single column. There's nothing inherently wrong with that if the number of updates is small compared to the number of queries.
However, given that:
1/ You seem to be concerned that your writes slow down to 5/ms (that's still 5000/second),
2/ You're only writing a few integers per record; and
3/ Your queries are only based on time ranges,
you may want to consider bypassing a regular database and rolling your own sort-of-database (my thoughts are that you're collecting real-time data such as device readings).
If you're only ever writing sequentially-timed data, you can just use a flat file and periodically write the 'index' information separately (say at the start of every minute).
This will greatly speed up your writes but still allow a relatively efficient read process - worst case is you'll have to find the start of the relevant period and do a scan from there.
This of course depends on my assumption of your storage being correct:
1/ You're writing records sequentially based on time.
2/ You only need to query on time ranges.
Yes, indexes will generally slow inserts down, while significantly speeding up selects (queries).
Do keep in mind that not all inserts into a B-tree are equal. It's a tree; if all you do is insert into it, it has to keep growing. The data structure allows for some padding, but if you keep inserting numbers that are growing sequentially, it has to keep adding new pages and/or shuffling things around to stay balanced. Make sure that your tests insert numbers that are well distributed (assuming that's how they will come in real life), and see if you can do anything to tell the B-tree how many items to expect from the beginning.
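A sketch for comparing insert patterns, again in T-SQL for concreteness (the poster's engines are H2/BDB/Sybase, but B-tree behavior is analogous): it fills two tables whose clustered keys arrive sequentially vs. randomly, then compares fragmentation.

    -- Sequential keys append at the right edge of the tree; random
    -- (GUID-like) keys force splits throughout it.
    CREATE TABLE seq_keys  (id INT IDENTITY PRIMARY KEY CLUSTERED,
                            payload CHAR(200) NOT NULL);
    CREATE TABLE rand_keys (id UNIQUEIDENTIFIER DEFAULT NEWID()
                                PRIMARY KEY CLUSTERED,
                            payload CHAR(200) NOT NULL);

    INSERT INTO seq_keys (payload)
    SELECT TOP (50000) 'x' FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;
    INSERT INTO rand_keys (payload)
    SELECT TOP (50000) 'x' FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;

    SELECT OBJECT_NAME(ips.object_id) AS table_name,
           ips.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    WHERE ips.object_id IN (OBJECT_ID('seq_keys'), OBJECT_ID('rand_keys'));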
Totally agree with @Richard-t: it is quite common in offline/batch scenarios to remove indexes completely before bulk updates to a corpus, only to reapply them when the update is complete.
The type of indices applied also influences insertion performance. For example, with a SQL Server clustered index, update I/O is used for data distribution as well as the index update, whereas nonclustered indexes are updated in separate (and therefore more expensive) I/O operations.
As with any engineering project, the best advice is to measure with real datasets (data skew affects page distribution, page tearing, etc.).
I think somewhere in the BDB docs they mention that page size greatly affects this behavior in B-trees. Assuming you aren't doing much in the way of concurrency and you have fixed record sizes, you should try increasing your page size.