Putting indices on all columns of a read only table - database

I have a table in a database that will be generated from the start and probably never be written to again. Even if it were ever written to, it'll be in the form of batch processes run during a release, and write time is not important at all.
It's a relatively large table with about 80k rows and maybe about 10-12 columns.
The application is likely to retrieve data from this table often.
I was thinking, since it'll never be written to again, should I just put indices on all the columns? That way it'll always be quick to read no matter what type of query I form?
Is this a good idea? Is there any downside to this I should be aware of?

My understanding is that each index does require some (a relatively small amount of) storage space. If you're tight for space this could matter. Exactly how much impact this might make may depend on which DB you are using.

It will depend on the table. If all of the columns will be used in search criteria, then it is not unreasonable to put indexes on them all. That is fairly unlikely though. Also, there may be compound (multi-column) indexes that would be more beneficial than some of the simple (single-column) indexes.
Finally, the query optimizer will have to review all the indexes that are present on the table when evaluating how answer queries. It is hard to say when this becomes a measurable overhead, but more indexes takes more time.
So, given the static nature of the table you describe, it is reasonable to index it more heavily than you might a more dynamic table. Indexing every column is probably not sensible. Choosing carefully which compound indexes to add may be important too.

Choose indexes for a table based on the queries you run against that table.
Indexes you never need for any query are just wasted space.
Individual indexes on each column isn't the full set of indexes possible. You also can make multi-column indexes (i.e. compound indexes), and these can be important for optimizing certain queries. The order of columns in a compound index matters.
SQL Server 2008 supports only 999 nonclustered indexes per table, so if you try to create all possible indexes on a table of more than a few columns, you will reach the limit.
Sorry, but you actually need to learn some things before you can optimize effectively. If it were simply a matter of indexing every column, then the RDBMS would do this by default.

Related

Disadvantages of creating multiple indexes in a PostgreSQL table

After reading documentation on indexes I thought
Hey, since (in my case) almost always reading from database is performed much more often than writing to it, why not create indexes on most fields in the table?
Is it right attitude? Are there any other downsides of doing this, except longer inserting?
Of course, indexes would be limited to fields that I actually use in conditions of SELECT statements.
Indexes have several disadvantages.
First, they consume space. This may be inconsequential, but if your table is particularly large, it may have an impact.
Second, and more importantly, you need to remember that indexes have a performance penalty when it comes to INSERTing new rows, DELETEing old ones or UPDATEing existing values of the indexed column, as now the DML statement need not only modify the table's data, but the index's one too. Once again, this depends largely on your application usecase. If DMLs are so rare that the performance is a non-issue, this may not be a consideration.
Third (although this ties in storngly to my first point), remember that each time you create another database object, you are creating an additional maintenance overhead - it's another index you'd have to occasionally rebuild, collect statistics for (depending on the RDBMS you're using, of course), another objet to clatter the data dictionary, etc.
The bottom line all comes down to your usecase. If you have important queries that you run often and that can be improved by this index - go for it. If you're running this query once in a blue moon, you probably wouldn't want to slow down all your INSERT statements.
There also is a (small) performance downside of having many indexes during read operations: the more you have, the more time Postgres spends evaluating which query plan is best.
As you correctly state, there's no point in having an index on fields that you don't use on conditions. It merely slows down inserts.
Note that there also is little point in having an index on fields that offer dubious selectivity. Picture a table that is roughly split in two depending on whether rows have a boolean set to true or false. An index on that field offers little benefit for queries if true and false values distributed mostly evenly — the index could be useful, from lack of better options, if true and false rows are clustered together. If, in contrast, that field is 90% true, a partial index on the 10% that are false is useful.
And lastly, there is the storage issue: each index takes space. Sometimes (GIN in particular) a few times more than the indexed data.

What are some best practices and "rules of thumb" for creating database indexes?

I have an app, which cycles through a huge number of records in a database table and performs a number of SQL and .Net operations on records within that database (currently I am using Castle.ActiveRecord on PostgreSQL).
I added some basic btree indexes on a couple of the feilds, and as you would expect, the performance of the SQL operations increased substantially. Wanting to make the most of dbms performance I want to make some better educated choices about what I should index on all my projects.
I understand that there is a detrement to performance when doing inserts (as the database needs to update the index, as well as the data), but what suggestions and best practices should I consider with creating database indexes? How do I best select the feilds/combination of fields for a set of database indexes (rules of thumb)?
Also, how do I best select which index to use as a clustered index? And when it comes to the access method, under what conditions should I use a btree over a hash or a gist or a gin (what are they anyway?).
Some of my rules of thumb:
Index ALL primary keys (I think most RDBMS do this when the table is created).
Index ALL foreign key columns.
Create more indexes ONLY if:
Queries are slow.
You know the data volume is going to increase significantly.
Run statistics when populating a lot of data in tables.
If a query is slow, look at the execution plan and:
If the query for a table only uses a few columns, put all those columns into an index, then you can help the RDBMS to only use the index.
Don't waste resources indexing tiny tables (hundreds of records).
Index multiple columns in order from high cardinality to less. This means: first index the columns with more distinct values, followed by columns with fewer distinct values.
If a query needs to access more than 10% of the data, a full scan is normally better than an index.
Here's a slightly simplistic overview: it's certainly true that there is an overhead to data modifications due to the presence of indexes, but you ought to consider the relative number of reads and writes to the data. In general the number of reads is far higher than the number of writes, and you should take that into account when defining an indexing strategy.
When it comes to which columns to index I'v e always felt that the designer ought to know the business well enough to be able to take a very good first pass at which columns are likely to benefit. Other then that it really comes down to feedback from the programmers, full-scale testing, and system monitoring (preferably with extensive internal metrics on performance to capture long-running operations),
As #David Aldridge mentioned, the majority of databases perform many more reads than they do writes and in addition, appropriate indexes will often be utilised even when performing INSERTS (to determine the correct place to INSERT).
The critical indexes under an unknown production workload are often hard to guess/estimate, and a set of indexes should not be viewed as set once and forget. Indexes should be monitored and altered with changing workloads (that new killer report, for instance).
Nothing beats profiling; if you guess your indexes, you will often miss the really important ones.
As a general rule, if I have little idea how the database will be queried, then I will create indexes on all Foriegn Keys, profile under a workload (think UAT release) and remove those that are not being used, as well as creating important missing indexes.
Also, make sure that a scheduled index maintenance plan is also created.

Table design and performance related to database?

I have a table with 158 columns in SQL Server 2005.
any disdvantage of keeping so many columns.
Also I have to keep those many columns,
how can i improve performance - like using SP's, Indexes?
Wide tables can be quite performant when you usually want all the fields for a particular row. Have you traced your user's usage patterns? If they're usually pulling just one or two fields from multiple rows then your performance will suffer. The main issue is when your total row size hits the 8k page mark. That means SQL has to hit the disk twice for every row (first page + overflow page), and thats not counting any index hits.
The guys at Universal Data Models will have some good ideas for refactoring your table. And Red Gate's SQL Refactor makes splitting a table heaps easier.
There is nothing inherently wrong with wide tables. The main case for normalization is database size, where lots of null columns take up a lot of space.
The more columns you have, the slower your queries will be.
That's just a fact. That isn't to say you aren't justified in having many columns. The above does not give one carte blanche to split one entity's worth of table with many columns into multiple tables with fewer columns. The administrative overhead of such a solution would most probably outweigh any perceived performance gains.
My number one recommendation to you, based off of my experience with abnormally wide tables (denormalized schemas of bulk imported data) is to keep the columns as thin as possible. I had to work with a lot of crazy data and left most of the columns as VARCHAR(255). I recommend against this. Although convenient for development purposes, performance would spiral out of control, especially when working with Perl. Shrinking the columns to their bare minimum (VARCHAR(18) for instance) helped immensely.
Stored procedures are just batches of SQL commands; they don't have any direct on speed other than that regular use of certain types of stored procedures will end up using cached query plans (which is a performance boost).
You can use indexes to speed up certain queries, but there's no hard and fast rule here. Good index design depends entirely on the type of queries you're running. Indexing will, by definition, make your writes slower; they exist only to make your reads faster.
The problem with having that many columns in a table is that finding rows using the clustered primary key can be expensive. If it were possible to change the schema, breaking this up into many normalized tables will be the best way to improve efficiency. I would strongly recommend this course.
If not, then you may be able to use indices to make some SELECT queries faster. If you have queries that only use a small number of the columns, adding indices on those columns could mean that the clustered index will not need to be scanned. Of course, there is always a price to pay with indices, in terms of storage space and INSERT, UPDATE and DELTETE time, so this may not be a good idea for you.

should nearly unique fields have indexes

I have a field in a database that is nearly unique: 98% of the time the values will be unique, but it may have a few duplicates. I won't be doing many searches on this field; say twice a month. The table currently has ~5000 records and will gain about 150 per month.
Should this field have an index?
I am using MySQL.
I think the 'nearly unique' is probably a red herring. The data is either unique, or it's not, but that doesn't determine whether you would want to index it for performance reasons.
Answer:
5000 records is really not many at all, and regardless of whether you have an index, searches will still be fast. At that rate of inserts, it'll take you 3 years to get to 10000 records, which is still also not many.
I personally wouldn't bother with adding an index, but it wouldn't matter if you did.
Explanation:
What you have to think about when deciding to add an index is the trade-off between insertion speed, and selection speed.
Without an index, doing a select on that field means MySQL has to walk over every single row and read every single field. Adding an index prevents this.
The downside of the index is that each time data gets inserted, the DB has to update the index in addition to adding the data. This is usually a small overhead, but you'd really notice it if you had loads of indexes, and were doing a lot of writes.
By the time you get this many rows in your database, you'd want an index anyway as otherwise your selects would take all day, but it's just something to be aware about so that you don't end up adding indexes on fields "just in case I need it"
That's not very many records at all; I wouldn't bother making any indexes on that table. The relative uniqueness of the field is irrelevant - even on years-old commodity hardware I'd expect a query on that table to take a fraction of a second.
you can use the general rule of thumb: optimize when it becomes a problem. Just don't use an index until you notice you need one.
From what you say, it doesn't sound like an index is necessary. Rule of thumb is index fields that are being used in SELECTS a lot to speed up the searching, which in turn (can) slows down INSERTS and UPDATES.
On a recordset as small as yours, I don't think you will see much of a real world hit either way.
If you'll only be doing searches on it twice a month and its that few rows then I would say don't index it. Its all but useless.
No. There aren't many records and it's not going to be frequently queried. No need to index.
It's really a judgement call. With such a small table you can search reasonably quickly without an index, so you could get by without it.
On the other hand, the cost of creating an index you don't really need is pretty low, so you're not saving yourself much by not doing it.
Also, if you do create the index, you're covered for the future if you suddenly start getting 1000 new records/week. Possibly you know enough about the situation to say for certain that that will never happen, but requirements do have a way of changing when you least expect.
EDIT: As far as changing requirements, the thing to consider is this: If the DB does grow and you find out later that you do need an index, can you simply create the index and be done? Or will you also need to change lots of code to make use of the new index?
It depends. As others have responded, there's a trade off between table update speed and selection speed. Table update includes inserts, updates, and deletes on the table.
One question you didn't address. Does the table have a primary key, and a corresponding index? A table with no indexes usually benefits form having at least one index. The most common way of getting that index is to declare a primary key, and rely on the DBMS to generate an index accordingly.
If a table has no candidates for primary key, that usually indicates a serious flaw in table design. That's a separate issue and should get a spearate discussion.

How many database indexes is too many?

I'm working on a project with a rather large Oracle database (although my question applies equally well to other databases). We have a web interface which allows users to search on almost any possible combination of fields.
To make these searches go fast, we're adding indexes to the fields and combinations of fields on which we believe users will commonly search. However, since we don't really know how our customers will use this software, it's hard to tell which indexes to create.
Space isn't a concern; we have a 4 terabyte RAID drive of which we are using only a small fraction. However, I'm worried about the possible performance penalties of having too many indexes. Because those indexes need to be updated every time a row is added, deleted, or modified, I imagine it'd be a bad idea to have dozens of indexes on a single table.
So how many indexes is considered too many? 10? 25? 50? Or should I just cover the really, really common and obvious cases and ignore everything else?
It depends on the operations that occur on the table.
If there's lots of SELECTs and very few changes, index all you like.... these will (potentially) speed the SELECT statements up.
If the table is heavily hit by UPDATEs, INSERTs + DELETEs ... these will be very slow with lots of indexes since they all need to be modified each time one of these operations takes place
Having said that, you can clearly add a lot of pointless indexes to a table that won't do anything. Adding B-Tree indexes to a column with 2 distinct values will be pointless since it doesn't add anything in terms of looking the data up. The more unique the values in a column, the more it will benefit from an index.
I usually proceed like this.
Get a log of the real queries run on the data on a typical day.
Add indexes so the most important queries hit the indexes in their execution plan.
Try to avoid indexing fields that have a lot of updates or inserts
After a few indexes, get a new log and repeat.
As with all any optimization, I stop when the requested performance is reached (this obviously implies that point 0. would be getting specific performance requirements).
Everyone else has been giving you great advice. I have an added suggestion for you as you move forward. At some point you have to make a decision as to your best indexing strategy. In the end though, the best PLANNED indexing strategy can still end up creating indexes that don't end up getting used. One strategy that lets you find indexes that aren't used is to monitor index usage. You do this as follows:-
alter index my_index_name monitoring usage;
You can then monitor whether the index is used or not from that point forward by querying v$object_usage. Information on this can be found in the Oracle® Database Administrator's Guide.
Just remember that if you have a warehousing strategy of dropping indexes before updating a table, then recreating them, you will have to set the index up for monitoring again, and you'll lose any monitoring history for that index.
In data warehousing it is very common to have a high number of indexes. I have worked with fact tables having two hundred columns and 190 of them indexed.
Although there is an overhead to this it must be understood in the context that in a data warehouse we generally only insert a row once, we never update it, but it can then participate in thousands of SELECT queries which might benefit from indexing on any of the columns.
For maximum flexibility a data warehouse generally uses single column bitmap indexes except on high cardinality columns, where (compressed) btree indexes can be used.
The overhead on index maintenance is mostly associated with the expense of writing to a great many blocks and the block splits as new rows are added with values that are "in the middle" of existing value ranges for that column. This can be mitigated by partitioning and having the new data loads aligned with the partitioning scheme, and by using direct path inserts.
To address your question more directly, I think it is probably fine to index the obvious at first, but do not be afraid of adding more indexes on if the queries against the table would benefit.
In a paraphrase of Einstein about simplicity, add as many indexes as you need and no more.
Seriously, however, every index you add requires maintenance whenever data is added to the table. On tables that are primarily read only, lots of indexes are a good thing. On tables that are highly dynamic, fewer is better.
My advice is to cover the common and obvious cases and then, as you encounter issues where you need more speed in getting data from specific tables, evaluate and add indices at that point.
Also, it's a good idea to re-evaluate your indexing schemes every few months, just to see if there is anything new that needs indexing or any indices that you've created that aren't being used for anything and should be gotten rid of.
In addition to the points everyone else has raised, the Cost Based Optimizer incurs a cost when creating a plan for an SQL statement if there are more indexes because there are more combinations for it to consider. You can reduce this by correctly using bind variables so that SQL statements stay in the SQL cache. Oracle can then do a soft parse and re-use the plan it found last time.
As always, nothing is simple. If there are skewed columns and histograms involved then this can be a bad idea.
In our web applications we tend to limit the combinations of searches that we allow. Otherwise you would have to test literally every combination for performance to ensure you did not have a lurking problem that someone will find one day. We have also implemented resource limits to stop this causing issues elsewhere in the application should something go wrong.
I made some simple tests on my real project and real MySql database. I already answered in this topic: What is the cost of indexing multiple db columns?
But I think it will be better if I quote it here:
I made some simple tests using my real
project and real MySql database.
My results are: adding average index
(1-3 columns in an index) to a table -
makes inserts slower by 2.1%. So, if
you add 20 indexes, your inserts will
be slower by 40-50%. But your selects
will be 10-100 times faster.
So is it ok to add many indexes? - It
depends :) I gave you my results - You
decide!
Ultimately how many indexes you need depend on the behavior of your applications that ride on top of your database server.
In general the more inserting you do the more painful your indexes become. Each time you do an insert, all the indexes that include that table have to be updated.
Now if your application has a decent amount of reading, or even more so if it's almost all reading, then indexes are the way to go as there will be major performance improvements for very little cost.
There's no static answer in my opinion, this sort of thing falls under 'performance tuning'.
It could be that everything your app does is looked up by a primary key, or it could be the oposite in that queries are done over unristricted combinations of fields and any one in particular could be used at any given time.
Beyond just indexing, there's reogranizing your DB to include calculated search fields, splitting tables, etc - it's really dependant on your load shapes and query parameters, how much/what data 'really' needs to be retruend by a query.
If your entire DB is fronted by stored-procedure facades turning becomes a bit easier, as you don't have to wory about every ad-hoc query. Or you may have a deep understanding of the kind of queries that will hit your DB, and can limit the tuning to those.
For SQL Server I've found the Database Engine Tuning advisor usefull - you set up 'typical' workloads and it can make recommendations about adding/removing indexes and statistics. I'm sure other DBs have similar tools, either 'offical' or third party.
This really is a more theoretical questions than practical. Indexes impact on your performance depends on the hardware you have, the version of Oracle, index types, etc. Yesterday I heard Oracle announced a dedicated storage, made by HP, which is supposed to perform 10 times faster with 11g database.
As for your case, there can be several solutions:
1. Have a large amount of indexes (>20) and rebuild them daily (nightly). This would be especially useful if the table gets thousands of updates/deletes daily.
2. Partition your table (if that applies your data model).
3. Use a separate table for new/updated data, and run a nightly process which combines the data together. This would require a change in your application logic.
4. Switch to IOT (index organized table), if your data support this.
Of course there might be many more solutions for such case. My first suggestion to you, would be to clone the DB to a development environment, and run some stress testing against it.
An index imposes a cost when the underlying table is updated. An index provides a benefit when it is used to spped up a query. For each index, you need to balance the cost against the benefit. How much slower does the query run without the index? How much of a benefit is running faster? Can you or your users tolerate the slow speed when the index is missing?
Can you tolerate the additional time it takes to complete an update?
You need to compare costs and benefits. That's particular to your situation. There's no magic number of indexes that passes the threshold of "too many".
There's also the cost of the space needed to store the index, but you've said that in your situation that's not an issue. The same is true in most situations, given how cheap disk space has become.
If you do mostly reads (and few updates) then there's really no reason not to index everything you'll need to index. If you update often, then you may need to be cautious on how many indexes you have. There's no hard number, but you'll notice when things start to slow down. Make sure your clustered index is the one that makes the most sense based on the data.
One thing you may consider is building indexes to target a standard combination of searches. If column1 is commonly searched, and column2 is often used with it, and column3 is sometimes used with column2 and column1, then an index on column1, column2, and column3 in that order can be used for any of those three circumstances, though it is only one index that has to be maintained.
How many columns are there?
I have always been told to make single-column indexes, not multi-column indexes. So no more indexes than the amount of columns, IMHO.
What it really comes down to is, don't add an index unless you know (and this often means gathering usage statistics) that it will be used far more often than it's updated.
Any index that doesn't meet that criteria will cost you more to rebuild than the performance penalty of not having it in the odd case it got used.
Sql server gives you some good tools that let you see which indexes are actually being used.
This article, http://www.mssqltips.com/tip.asp?tip=1239, gives you some queries that let you get a better insight into how much an index is used, as opposed to how much it is updated.
It is totally based on the columns which are being used in Where Clause.
And as the Thumb of Rule, we must have indexes on Foreign Key Columns to avoid DEADLOCKS.
AWR report should analyze periodically to understand the need of indexes.

Resources