We use an Oracle 10.2.0.5 database in Production.
Optimizer is in "cost-based" mode.
Do we need to calculate statistics (DBMS_STATS package) after:
creating a new index
adding a column
creating a new table
?
Thanks
There's no short answer. It totally depends on your data and how you use it. Here are some things to consider:
As @NullUserException pointed out, statistics are automatically gathered, usually every night. That's usually good enough; in most (OLTP) environments, if you just added new objects they won't contain a lot of data before the stats are automatically gathered. The plans won't be that bad, and if the objects are new they probably won't be used much right away.
creating a new index - No. "Oracle Database now automatically collects statistics during index creation and rebuild".
adding a column - Maybe. If the column will be used in joins and predicates you probably want stats on it. If it's just used for storing and displaying data it won't really affect any plans. But, if the new column takes up a lot of space it may significantly alter the average row length, number of blocks, row chaining, etc., and the optimizer should know about that.
creating a new table - Probably. Oracle is able to compensate for missing statistics through dynamic sampling, although this often isn't good enough. Especially if the new table has a lot of data; bad statistics almost always lead to under-estimating the cardinality, which will lead to nested loops when you want hash joins. Also, even if the table data hasn't changed, you may need to gather statistics one more time to enable histograms. By default, Oracle creates histograms for skewed data, but will not enable those histograms if those columns haven't been used as a predicate. (So this applies to adding a new column as well). If you drop and re-create a table, even with the same name, Oracle will not maintain any of that column use data, and will not know that you need histograms on certain columns.
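For reference, here is a minimal sketch of gathering table statistics manually with DBMS_STATS; the schema and table names are placeholders, and the parameters shown are just the common ones, not a recommendation for your system:
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'APP_OWNER',                  -- placeholder schema
    tabname          => 'NEW_TABLE',                  -- placeholder table
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,  -- let Oracle pick the sample size
    method_opt       => 'FOR ALL COLUMNS SIZE AUTO',  -- histograms based on skew and column usage
    cascade          => TRUE                          -- also gather stats on the table's indexes
  );
END;
/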
Gathering optimizer statistics is much more difficult than most people realize. At my current job, most of our performance problems are ultimately because of bad statistics. If you're trying to come up with a plan for your system you ought to read the Managing Optimizer Statistics chapter.
Update:
There's no need to gather statistics for empty objects; dynamic sampling will work just as quickly as reading stats from the data dictionary. (Based on a quick test hard parsing a large number of queries with and without stats.) If you disable dynamic sampling then there may be some weird cases where the Oracle default values lead to inaccurate plans, and you would be better off with statistics on an empty table.
I think the reason Oracle automatically gathers stats for indexes at creation time is because it doesn't cost much extra. When you create an index you have to read all the blocks in the table, so Oracle might as well calculate the number of levels, blocks, keys, etc., at the same time.
Table statistics can be more complicated, and may require multiple passes of the data. Creating an index is relatively simple compared to the arbitrary SQL that may be used as part of a create-table-as-select. It may not be possible, or efficient, to take those arbitrary SQL statements and transform them into a query that also returns the information needed to gather statistics.
Of course it wouldn't cost anything extra to gather stats for an empty table. But it doesn't gain you anything either, and it would just be misleading to anyone who looks at USER_TABLES.LAST_ANALYZED - the table would appear to be analyzed, but not with any meaningful data.
I'm currently using a MySQL database for an online game under LAMP.
One of the tables is huge (soon millions of rows) and contains only integers (IDs, timestamps, booleans, scores).
I did everything to never have to JOIN on this table. However, I'm worried about the scalability. I'm thinking about moving this single table to another faster database system.
I use intermediary tables to calculate the scores, but in some cases I have to use SUM() or AVG() directly on some filtered rowsets of this table.
For you, what is the best database choice for this table?
My requirements/specs:
This table contains only integers (around 15 columns)
I need to filter by certain columns
I'd like to have UNIQUE KEYS
It would be nice to have "INSERT ... ON DUPLICATE KEY UPDATE", but I suppose my scripts can manage it by themselves.
I have to use SUM() or AVG()
thanks
Just make sure you have the correct indexes in place, so that selects are quick.
Millions of rows in a table isn't huge. You shouldn't expect any problems in selecting, filtering or upserting data if you index on relevant keys as @Tom-Squires suggests.
Aggregate queries (sum and avg) may pose a problem though. The reason is that they require a full table scan and thus multiple fetches of data from disk to memory. A couple of methods to increase their speed:
If your data changes infrequently then caching those query results in your code is probably a good solution.
If it changes frequently, then the quickest way to improve their performance is probably to ensure that your database engine keeps the table in memory; most engines will let you tune that. A quick calculation of expected size: 15 columns x 8 bytes x millions of rows =~ hundreds of MB - not really an issue (unless you're on a shared host). If your RDBMS does not support tuning this for a specific table, simply put the table in a different database schema - that shouldn't be a problem since you're not doing any joins on this table.
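If the aggregates are always computed over a filtered subset (e.g. per player), one option worth testing is a composite index that covers both the filter column and the aggregated column, so the engine can read the index instead of the whole table. A sketch with hypothetical table and column names:
ALTER TABLE game_scores ADD INDEX idx_player_score (player_id, score);  -- hypothetical names

-- SUM/AVG over one player's rows can now be satisfied from the index alone.
SELECT SUM(score), AVG(score)
FROM game_scores
WHERE player_id = 42;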
I have a requirement to create a report that is killing the processor and taking a long time to run.
I think I could speed this up significantly by creating an indexed view that keeps all this data in one place, making it a lot easier to query/report on. This view would not just be used for the report, as I think it would benefit quite a few areas in the data layer.
The indexed view will potentially contain 5 million+ records. I can't seem to find any guidance as to at what point indexed views are no longer recommended. I assume that an indexed view of this size would take considerable time to build when SQL Server first starts, but I would hope that after this the cost of maintaining it would be minimal.
Is there any kind of best practice guide as to when to use indexed views and when not to use them? Would the view rebuild itself after every server restart, or does it get stored somewhere on disk?
The index associated with your Indexed View will be updated whenever updates are made to any of the columns in the index.
High numbers of updates will most likely kill the benefit. If it is mainly reads then it will work fine.
The real benefits of Indexed Views are when you have aggregates that are too expensive to compute in real time.
Please see: Improving Performance with SQL Server 2008 Indexed Views:
Indexed views can increase query performance in the following ways:
Aggregations can be precomputed and stored in the index to minimize expensive computations during query execution.
Tables can be prejoined and the resulting data set stored.
Combinations of joins or aggregations can be stored.
The query optimizer considers indexed views only for queries with nontrivial cost. This avoids situations where trying to match various indexed views during the query optimization costs more than the savings achieved by the indexed view usage. Indexed views are rarely used in queries with a cost of less than 1.
Applications that benefit from the implementation of indexed views include:
Decision support workloads.
Data marts.
Data warehouses.
Online analytical processing (OLAP) stores and sources.
Data mining workloads.
From the query type and pattern point of view, the benefiting applications can be characterized as those containing:
Joins and aggregations of large tables.
Repeated patterns of queries.
Repeated aggregations on the same or overlapping sets of columns.
Repeated joins of the same tables on the same keys.
Combinations of the above.
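As a rough sketch of what such a precomputed aggregate might look like (table and column names are hypothetical, and the aggregated column is assumed NOT NULL, which indexed views require for SUM):
CREATE VIEW dbo.vOrderTotals
WITH SCHEMABINDING                    -- required for an indexed view
AS
SELECT CustomerID,
       COUNT_BIG(*) AS OrderCount,    -- COUNT_BIG(*) is required when the view uses GROUP BY
       SUM(Amount)  AS TotalAmount
FROM dbo.Orders
GROUP BY CustomerID;
GO

-- Creating the unique clustered index is what materializes and maintains the view.
CREATE UNIQUE CLUSTERED INDEX IX_vOrderTotals ON dbo.vOrderTotals (CustomerID);
GO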
An indexed view (aka materialized view) is maintained by SQL Server after every change to the underlying table(s). Needless to say, you should not have an indexed view on a table with heavy write traffic.
For your problem, a better solution would be to run the query and store it in its own table, like:
select * into CachedReport from YourView
That will give you the performance of an indexed view, while you can decide when to refresh it. For example, you could refresh it by running the select into query from a scheduled job every night.
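A sketch of what that nightly job might run (the object names follow the example above; in practice you might load a staging table and rename it so readers never see the table missing):
IF OBJECT_ID('dbo.CachedReport', 'U') IS NOT NULL
    DROP TABLE dbo.CachedReport;       -- drop the previous snapshot

SELECT *
INTO dbo.CachedReport                  -- rebuild the snapshot from the view
FROM dbo.YourView;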
I'm not aware of any guidance concerning size of indexed views. It's effectively a separate table that's being "automagically" updated every time the base tables on which it depends are updated, so I tend to think of it as a separate table.
As to your question on the building of the index - it's stored on disk, the same as every other index, so it doesn't get rebuilt during server restart (other than any repair that takes place due to transactions not having completed before the restart).
There's no hard row number limit on when to use a table or a materialised view.
However, as a guideline, avoid a materialised view over volatile tables - the load can kill your server.
First off, as Timothy suggested, check the indexes on your underlying tables, then the statistics. Your query optimiser might be completely on the wrong track due to missing or out-of-date statistics.
If this doesn't help with performance, check what data is really required from the view; my guess is that (a) the row count and (b) the row size are what is killing your server, by loading the whole view into a temp table and pushing it through I/O contention.
I have a table with 158 columns in SQL Server 2005.
Are there any disadvantages to keeping so many columns?
Also, since I have to keep all of those columns,
how can I improve performance - e.g. using SPs or indexes?
Wide tables can be quite performant when you usually want all the fields for a particular row. Have you traced your user's usage patterns? If they're usually pulling just one or two fields from multiple rows then your performance will suffer. The main issue is when your total row size hits the 8k page mark. That means SQL has to hit the disk twice for every row (first page + overflow page), and that's not counting any index hits.
The guys at Universal Data Models will have some good ideas for refactoring your table. And Red Gate's SQL Refactor makes splitting a table heaps easier.
There is nothing inherently wrong with wide tables. The main case for normalization is database size, where lots of null columns take up a lot of space.
The more columns you have, the slower your queries will be.
That's just a fact. That isn't to say you aren't justified in having many columns. The above does not give one carte blanche to split one entity's worth of table with many columns into multiple tables with fewer columns. The administrative overhead of such a solution would most probably outweigh any perceived performance gains.
My number one recommendation to you, based off of my experience with abnormally wide tables (denormalized schemas of bulk imported data) is to keep the columns as thin as possible. I had to work with a lot of crazy data and left most of the columns as VARCHAR(255). I recommend against this. Although convenient for development purposes, performance would spiral out of control, especially when working with Perl. Shrinking the columns to their bare minimum (VARCHAR(18) for instance) helped immensely.
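If you go down that route, here is a hedged sketch of the shrink (hypothetical table and column names; check the real maximum length first, and re-state nullability explicitly since ALTER COLUMN can otherwise change it):
SELECT MAX(LEN(ExternalCode)) FROM dbo.ImportedData;   -- confirm the data actually fits in 18 characters

ALTER TABLE dbo.ImportedData
ALTER COLUMN ExternalCode VARCHAR(18) NULL;            -- shrink from VARCHAR(255)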
Stored procedures are just batches of SQL commands; they don't have any direct effect on speed, other than that regular use of certain types of stored procedures will end up using cached query plans (which is a performance boost).
You can use indexes to speed up certain queries, but there's no hard and fast rule here. Good index design depends entirely on the type of queries you're running. Indexes will, by definition, make your writes slower; they exist only to make your reads faster.
The problem with having that many columns in a table is that finding rows using the clustered primary key can be expensive. If it were possible to change the schema, breaking this up into many normalized tables will be the best way to improve efficiency. I would strongly recommend this course.
If not, then you may be able to use indices to make some SELECT queries faster. If you have queries that only use a small number of the columns, adding indices on those columns could mean that the clustered index will not need to be scanned. Of course, there is always a price to pay with indices, in terms of storage space and INSERT, UPDATE and DELETE time, so this may not be a good idea for you.
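For example, a narrow nonclustered index that covers one frequent query avoids touching the wide clustered index at all (table and column names are hypothetical; INCLUDE is available from SQL Server 2005):
CREATE NONCLUSTERED INDEX IX_WideTable_Status
ON dbo.WideTable (StatusCode, CreatedDate)
INCLUDE (CustomerID);                          -- included column makes the index covering for the query below

SELECT CustomerID
FROM dbo.WideTable
WHERE StatusCode = 'A'
  AND CreatedDate >= '20090101';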
I'm working on a project with a rather large Oracle database (although my question applies equally well to other databases). We have a web interface which allows users to search on almost any possible combination of fields.
To make these searches go fast, we're adding indexes to the fields and combinations of fields on which we believe users will commonly search. However, since we don't really know how our customers will use this software, it's hard to tell which indexes to create.
Space isn't a concern; we have a 4 terabyte RAID drive of which we are using only a small fraction. However, I'm worried about the possible performance penalties of having too many indexes. Because those indexes need to be updated every time a row is added, deleted, or modified, I imagine it'd be a bad idea to have dozens of indexes on a single table.
So how many indexes is considered too many? 10? 25? 50? Or should I just cover the really, really common and obvious cases and ignore everything else?
It depends on the operations that occur on the table.
If there are lots of SELECTs and very few changes, index all you like... these will (potentially) speed the SELECT statements up.
If the table is heavily hit by UPDATEs, INSERTs and DELETEs, these will be very slow with lots of indexes, since they all need to be modified each time one of these operations takes place.
Having said that, you can clearly add a lot of pointless indexes to a table that won't do anything. Adding a B-tree index to a column with 2 distinct values will be pointless, since it doesn't add anything in terms of looking the data up. The more unique the values in a column, the more it will benefit from an index.
I usually proceed like this.
Get a log of the real queries run on the data on a typical day.
Add indexes so the most important queries hit the indexes in their execution plan.
Try to avoid indexing fields that have a lot of updates or inserts
After a few indexes, get a new log and repeat.
As with any optimization, I stop when the requested performance is reached (this obviously implies that point 0 would be getting specific performance requirements).
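To check that an important query actually uses an index, the execution plan is enough; a hedged Oracle sketch with hypothetical table and column names:
EXPLAIN PLAN FOR
SELECT order_id, order_total
FROM   orders
WHERE  customer_id = :cust_id;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);   -- look for an INDEX RANGE SCAN on the intended index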
Everyone else has been giving you great advice. I have an added suggestion for you as you move forward. At some point you have to make a decision as to your best indexing strategy. In the end though, the best PLANNED indexing strategy can still end up creating indexes that don't end up getting used. One strategy that lets you find indexes that aren't used is to monitor index usage. You do this as follows:
alter index my_index_name monitoring usage;
You can then monitor whether the index is used or not from that point forward by querying v$object_usage. Information on this can be found in the Oracle® Database Administrator's Guide.
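A quick sketch of that check (run as the index owner, since V$OBJECT_USAGE only shows your own indexes; the index name is the one you placed under monitoring above):
SELECT index_name, table_name, monitoring, used, start_monitoring, end_monitoring
FROM   v$object_usage
WHERE  index_name = 'MY_INDEX_NAME';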
Just remember that if you have a warehousing strategy of dropping indexes before updating a table, then recreating them, you will have to set the index up for monitoring again, and you'll lose any monitoring history for that index.
In data warehousing it is very common to have a high number of indexes. I have worked with fact tables having two hundred columns and 190 of them indexed.
Although there is an overhead to this it must be understood in the context that in a data warehouse we generally only insert a row once, we never update it, but it can then participate in thousands of SELECT queries which might benefit from indexing on any of the columns.
For maximum flexibility a data warehouse generally uses single column bitmap indexes except on high cardinality columns, where (compressed) btree indexes can be used.
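For example, a single-column bitmap index on a low-cardinality fact table column might look like this (hypothetical names):
CREATE BITMAP INDEX fact_sales_region_bix ON fact_sales (region_id);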
The overhead on index maintenance is mostly associated with the expense of writing to a great many blocks and the block splits as new rows are added with values that are "in the middle" of existing value ranges for that column. This can be mitigated by partitioning and having the new data loads aligned with the partitioning scheme, and by using direct path inserts.
To address your question more directly, I think it is probably fine to index the obvious at first, but do not be afraid of adding more indexes later if queries against the table would benefit.
To paraphrase Einstein on simplicity: add as many indexes as you need, and no more.
Seriously, however, every index you add requires maintenance whenever data is added to the table. On tables that are primarily read only, lots of indexes are a good thing. On tables that are highly dynamic, fewer is better.
My advice is to cover the common and obvious cases and then, as you encounter issues where you need more speed in getting data from specific tables, evaluate and add indices at that point.
Also, it's a good idea to re-evaluate your indexing schemes every few months, just to see if there is anything new that needs indexing or any indices that you've created that aren't being used for anything and should be gotten rid of.
In addition to the points everyone else has raised, the Cost Based Optimizer incurs a cost when creating a plan for an SQL statement if there are more indexes because there are more combinations for it to consider. You can reduce this by correctly using bind variables so that SQL statements stay in the SQL cache. Oracle can then do a soft parse and re-use the plan it found last time.
As always, nothing is simple. If there are skewed columns and histograms involved then this can be a bad idea.
In our web applications we tend to limit the combinations of searches that we allow. Otherwise you would have to test literally every combination for performance to ensure you did not have a lurking problem that someone will find one day. We have also implemented resource limits to stop this causing issues elsewhere in the application should something go wrong.
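To illustrate the bind variable point above, a minimal sketch (hypothetical table and column):
SELECT * FROM orders WHERE customer_id = 1042;       -- literal: each new value is a distinct statement, so a hard parse
SELECT * FROM orders WHERE customer_id = :cust_id;   -- bind variable: the text is stable, so the cached plan can be reused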
I made some simple tests on my real project and real MySQL database. I already answered in this topic: What is the cost of indexing multiple db columns?
But I think it will be better if I quote it here:
I made some simple tests using my real project and real MySQL database.
My results are: adding an average index (1-3 columns in an index) to a table makes inserts slower by 2.1%. So, if you add 20 indexes, your inserts will be slower by 40-50%. But your selects will be 10-100 times faster.
So is it ok to add many indexes? It depends :) I gave you my results - you decide!
Ultimately, how many indexes you need depends on the behavior of your applications that ride on top of your database server.
In general, the more inserting you do, the more painful your indexes become. Each time you do an insert, all the indexes on that table have to be updated.
Now if your application has a decent amount of reading, or even more so if it's almost all reading, then indexes are the way to go as there will be major performance improvements for very little cost.
There's no static answer, in my opinion; this sort of thing falls under 'performance tuning'.
It could be that everything your app does is looked up by a primary key, or it could be the opposite, in that queries are done over unrestricted combinations of fields and any one in particular could be used at any given time.
Beyond just indexing, there's reorganizing your DB to include calculated search fields, splitting tables, etc. - it's really dependent on your load shapes and query parameters, and on how much/what data 'really' needs to be returned by a query.
If your entire DB is fronted by stored-procedure facades, tuning becomes a bit easier, as you don't have to worry about every ad-hoc query. Or you may have a deep understanding of the kind of queries that will hit your DB, and can limit the tuning to those.
For SQL Server I've found the Database Engine Tuning Advisor useful - you set up 'typical' workloads and it can make recommendations about adding/removing indexes and statistics. I'm sure other DBs have similar tools, either 'official' or third party.
This really is more a theoretical question than a practical one. The impact of indexes on your performance depends on the hardware you have, the version of Oracle, index types, etc. Yesterday I heard Oracle announced a dedicated storage system, made by HP, which is supposed to perform 10 times faster with the 11g database.
As for your case, there can be several solutions:
1. Have a large number of indexes (>20) and rebuild them daily (nightly). This would be especially useful if the table gets thousands of updates/deletes daily.
2. Partition your table (if that applies to your data model).
3. Use a separate table for new/updated data, and run a nightly process which combines the data together. This would require a change in your application logic.
4. Switch to an IOT (index-organized table), if your data supports this.
Of course there might be many more solutions for such a case. My first suggestion to you would be to clone the DB to a development environment, and run some stress testing against it.
An index imposes a cost when the underlying table is updated. An index provides a benefit when it is used to speed up a query. For each index, you need to balance the cost against the benefit. How much slower does the query run without the index? How much of a benefit is running faster? Can you or your users tolerate the slow speed when the index is missing?
Can you tolerate the additional time it takes to complete an update?
You need to compare costs and benefits. That's particular to your situation. There's no magic number of indexes that passes the threshold of "too many".
There's also the cost of the space needed to store the index, but you've said that in your situation that's not an issue. The same is true in most situations, given how cheap disk space has become.
If you do mostly reads (and few updates) then there's really no reason not to index everything you'll need to index. If you update often, then you may need to be cautious on how many indexes you have. There's no hard number, but you'll notice when things start to slow down. Make sure your clustered index is the one that makes the most sense based on the data.
One thing you may consider is building indexes to target a standard combination of searches. If column1 is commonly searched, and column2 is often used with it, and column3 is sometimes used with column2 and column1, then an index on column1, column2, and column3 in that order can be used for any of those three circumstances, though it is only one index that has to be maintained.
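A sketch of that composite index (the column names come from the example above; the table name is hypothetical):
-- Usable for searches on (column1), (column1, column2) and (column1, column2, column3),
-- because those predicates all match a leading prefix of the index key.
CREATE INDEX idx_search_combo ON search_table (column1, column2, column3);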
How many columns are there?
I have always been told to make single-column indexes, not multi-column indexes. So no more indexes than the number of columns, IMHO.
What it really comes down to is, don't add an index unless you know (and this often means gathering usage statistics) that it will be used far more often than it's updated.
Any index that doesn't meet that criterion will cost you more to rebuild than the performance penalty of not having it in the odd case it gets used.
SQL Server gives you some good tools that let you see which indexes are actually being used.
This article, http://www.mssqltips.com/tip.asp?tip=1239, gives you some queries that let you get a better insight into how much an index is used, as opposed to how much it is updated.
It is totally based on the columns that are being used in the WHERE clause.
And as a rule of thumb, we should have indexes on foreign key columns to avoid deadlocks.
AWR reports should be analyzed periodically to understand the need for indexes.