DB non-clustered Index on event log date DESC a bad idea?

We have a SQL table that is populated with events from our website (mostly error logging and the like). The table has several text fields that contain all of the information about the type of event, and a date/time field that shows when the event was logged. The table is fairly large and grows by around 10-100 records per day.
Obviously, when going through this log, we are often looking for the most recent items, so I figured an obvious way to improve our search times would be to add an index to the date field. I figured that while either ASC or DESC would help, DESC would be better since that's the direction we search most of the time. Our DB guy said "no way" - it would be really bad, because the index table would rapidly become fragmented.
I could see why you wouldn't want a clustered index on date DESC, because you'd constantly be trying to insert at the beginning... but I thought with a non-clustered index it would be okay, since the data rows wouldn't need to be moved around. But what he's saying also makes sense: the index entries would still have to be moved around.
But how much? And how big of a hit would it be? And even if it isn't much of a hit, maybe it's still not worth it because the performance on occasional selects just couldn't improve that much? Thoughts?

I don't think it's a bad idea - quite the contrary!
Not knowing your database system, I can't really be sure why your DB guy would think this would be a bad idea. And even so - even an ascending index on the date will be quite beneficial already (at least in the case of SQL Server).
In this case, if you frequently query by date and usually retrieve the most recent entries, this seems like a perfect index to me! You could maybe make it even better by adding the second most likely selection criterion (log application? log type?), so that if you specify both the date and that second criterion, the search scope within the index is even more limited.
If I were you, I would try a few sample queries against the table without this index, then add the non-clustered index on your logdate - first with ASC, and test how your queries perform (check out their execution plans!). Then try the index with DESC, and possibly try the index with LogDate plus an additional criteria field, too. See what the performance looks like.
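A minimal T-SQL sketch of that test sequence (table and column names here are assumptions, not from the original question):

CREATE NONCLUSTERED INDEX IX_EventLog_LogDate
    ON dbo.EventLog (LogDate ASC);
-- run the sample queries, check the execution plans, then swap direction:
DROP INDEX IX_EventLog_LogDate ON dbo.EventLog;
CREATE NONCLUSTERED INDEX IX_EventLog_LogDate
    ON dbo.EventLog (LogDate DESC);
-- optionally widen it with a second, hypothetical criteria column:
CREATE NONCLUSTERED INDEX IX_EventLog_LogDate_EventType
    ON dbo.EventLog (LogDate DESC, EventType);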
Marc

Indexes speed up some queries but slow down all loads. Whether or not an index gives an overall performance improvement depends on how much it speeds up your actual query workload and how much it slows down your actual loading workload (as well as deletes and updates that modify the indexed column).
In many (probably most) applications that involve storing event data, there is a huge amount of loading going on and relatively little querying, which is primarily summary-type queries that don't benefit from indexes. In these sorts of applications, indexes often do more harm than good.
In many such applications, it is possible to do loads during off hours, so even if the index gives an overall slowdown, it might be worth it to increase query speed: someone is waiting for the query output, but no one waits for the load to complete. However, the index can grow so large that it overruns the file cache, and each insert has to read and write a different leaf page from disk. At that point a load requires a random-access disk read and write for nearly every row, which can make it take all day.

Related

Advice on adding index in db2

Good day,
In my Java web application, I have a table with 107 columns. It is a parent table with many child tables, and it currently holds more than 10 million rows in production.
Since last year, the Java web application has kept hitting slowness issues. After checking and debugging, we found that the slowness happens during updates or selects against this table.
Every time this happens, I take the select or update query and run it through the db2advis command, and every time the result says the recommended indexes would give a >99% improvement. Adding those indexes does resolve the slowness issue.
So by now there are already 7-8 indexes applied to this table. Today a slowness issue was reported again; once more it is a slow select from this table joined to another table. As before, db2advis reports a >99% improvement and recommends a few more indexes.
However, I am starting to question myself: is this really a good solution? If another slowness issue appears in the future, should I apply the same fix again?
Also, every db2advis result includes a list of existing indexes it considers unused, with DROP INDEX statements for them; those are indexes I added previously. I believe they are listed because they are not relevant to the current query, so can I ignore this? Or will those existing indexes affect performance?
As I understand it, indexes also have disadvantages, especially for insert and update statements.
Additionally, the system owner has a policy of keeping the data for at least 7 years, so the owner is not going to do any housekeeping on the database.
Other than adding indexes and rewriting queries, is there any other way to overcome this issue?
This answer contains general advice about levers that may be available to you.
Your situation happens in many companies that are subject to regulatory requirements for multi-year online data retention.
When the physical data model is not designed to exploit range partitioning for easy roll-out of old data (without deletes), performance can degrade over time, especially when business or legal changes alter data distributions.
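As a hedged sketch, this is roughly what such a range-partitioned design looks like in Db2 (all names and ranges here are hypothetical):

CREATE TABLE event_log (
    event_id BIGINT NOT NULL,
    log_date DATE NOT NULL,
    message VARCHAR(4000)
)
PARTITION BY RANGE (log_date)
    (STARTING '2017-01-01' ENDING '2024-12-31' EVERY 3 MONTHS);

-- Old data rolls out as a cheap catalog operation instead of a mass DELETE
-- (part0 is the assumed auto-generated name of the oldest partition):
ALTER TABLE event_log DETACH PARTITION part0 INTO TABLE event_log_archive;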
Your question is not about programming; it is about performance management, and that is a big topic.
For that reason, your question may be more suitable for dba.stackexchange.com. This Stack Overflow website is intended for more specific programming questions.
Always focus on the whole workload, not only a single query. A "good solution" for one query may be bad for another aspect of functionality.
Adding one index can speed up one query but negatively impact other insert/update/delete activities, as you mention.
Companies that have a non-production environment with the same (or higher) volumes of data and matching distributions can exploit such environments for performance measurement, especially if they have a realistic test workload generator and instrumentation for profiling.
Separately, keep in mind the importance of designing the statistics collection properly. Sometimes column-group statistics can have a big impact in helping index selection even for existing indexes; other times distribution statistics can greatly help dynamic SQL, and statistical views can help with many problems. So before adding new indexes, always consider whether other techniques can help, especially if the join columns are already indexed correctly and foreign-key indexes are present, but for some reason the Db2 optimiser is ignoring the indexes.
If the Db2 index lastused column (in syscat.indexes) shows that an index is never used or used extremely rarely, then you should investigate why the index was created, and why some queries that might be expected to benefit from that specific index are ignoring the index. Sometimes, it's necessary to reorder the columns in the index to ensure that the highest selectivity columns are at the lowest ordinal position.
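A quick sketch of that lastused check against the catalog (schema and table names are placeholders):

SELECT indname, lastused
FROM syscat.indexes
WHERE tabschema = 'MYSCHEMA'
  AND tabname = 'MYTABLE'
ORDER BY lastused;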
There are other levers you can adjust - MQTs, MDC, optimisation profiles (hints), registry settings, optimisation levels - but the starting point is a good data model and good measurements.

What is the best moment to create SQL indexes?

When starting a project, should SQL indexes be created at the beginning?
I have a project where I haven't created any indexes in production yet. The table that will grow the most has 30000 rows, and I have measured the query times against this table with an index created and after deleting it. The times are very similar.
I have decided to postpone creating the indexes in production until I notice that creating them reduces query response times.
Is my approach correct? Or should I create them now?
I'm pretty deep into the topic of database indexing (it's actually my full-time job; I also wrote a book about it, SQL Performance Explained, which is available for free here).
In my opinion, indexes should be created at the time you write the query because this is the time you have all the required information needed to decide which indexes to create in your head. In other words, if you do it at that time, it doesn't take you any extra effort. Another reason is that indexing sometimes affects the way you have to write the query so it can actually take benefit of that index.
However, the above statement assumes that you know how indexes work so you can decide which indexes to create. If you don't know that, I'd really suggest learning about proper indexing first. Again, the book I've written is available for free on the web (Table of Contents). According to a recent survey, it takes about 4-5 hours to read through it. Well-spent time, I'd say.
However, due to the ludicrous speed of modern hardware and the vast amounts of memory in even cheap commodity machines, it is absolutely possible that you cannot measure any difference with these small tables yet (30k is small in the DB world). Nevertheless, just because you cannot measure the difference with a timer's resolution of maybe 10 ms doesn't mean the difference isn't there. Further: did you verify that the index was actually used? Are you sure the index you created was a good index for the given query?
Nevertheless, if the overall system is fast enough for you at the moment, sure, you can go on without indexes. The risk remains, however, that it isn't fast enough on the day a major news outlet covers your app. What is supposed to be your best day might turn out to become your worst day :(
You didn't tell us a lot about your app, so I have to do some guesswork. I guess it is more like an OLTP app such as an online website (as opposed to BI/OLAP). Although indexes add some overhead to write operations (insert, update, delete and merge), this is typically small compared to the benefit they bring to selects (still assuming OLTP). Sure, you can misuse indexes (e.g., creating hundreds on a single table) so that the overhead becomes a major problem too. But adding "a few" indexes to an OLTP table will most certainly not cause any problems due to the maintenance overhead.
Coming to an end: if you already know which indexes are good for your queries (verify it using explain), add them now before it is too late. If you are not sure, I'd still suggest putting some effort into that now. If you are not afraid of load peaks taking your app down, go on without indexes.
If you need more help, create a new question containing your query, table and index definitions as well as the explain output, and people will be happy to help you figure out whether that index is fine or not.
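To illustrate the "verify it using explain" step, a minimal sketch (table and index names are hypothetical; EXPLAIN spelling varies by database - MySQL/PostgreSQL shown here, SQL Server uses execution plans, Oracle uses EXPLAIN PLAN FOR):

CREATE INDEX idx_orders_customer ON orders (customer_id, order_date);

EXPLAIN
SELECT *
FROM orders
WHERE customer_id = 42
ORDER BY order_date DESC;
-- Check the plan output: you want an index access on idx_orders_customer,
-- not a full table scan.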
Just create them now based on sensible choices: start with primary and foreign keys - that'll keep your joins fast - then add indexes on the single columns you'll be searching on (name, phone, etc.).
Avoid creating multiple column indexes until you have a demonstrated performance problem and you can prove that an index helps. Often, reworking the query will fix the problem better than some complicated index.
The only time I delay creating indexes is when I'm about to load a heap of data: building the indexes before loading means a much slower load, as each index is updated for every row added. Some databases allow the index build to be deferred until after the load, so even then there's no point in waiting.
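A sketch of that drop/load/recreate pattern in SQL Server syntax (table, index and file names are made up for illustration):

DROP INDEX idx_events_logdate ON events;
BULK INSERT events FROM 'C:\loads\events.dat';
CREATE INDEX idx_events_logdate ON events (logdate);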

Indexing all columns for a table

I have a huge database that only ever has select statements run against it, by several different applications. Out of 15 columns, 12 are used by these applications. Is it recommended to index all 12 of these columns? If not, what is the issue?
Please note that inserts or updates happen only once or twice a week, and the idea is to drop all these indexes and recreate them after the table load.
Thanks for your help.
It depends on what these columns are being used for:
Indexing columns uses more disk space, memory and decreases the INSERT and UPDATE speeds as the SQL engine has to update the index.
Indexing gives you big speed gains when you are retrieving data and the indexed fields are used in WHERE, JOIN, GROUP BY or ORDER BY.
If the table is huge, then 12 indexed columns could potentially use a lot of disk space and memory and effectively slow down any retrieving of data. The best thing to do is to use a performance tuner to identify which indexes will give you the most gain.
However, it really depends on your applications. Although it is unusual to have such a large percentage of columns indexed, it may work in your situation.
@JustinHui gave some great insight, and since your situation is read-only, you can definitely go for indexing all the columns if you have the space.
But before you do that, test with a smaller but sizable fraction of data. Start with those columns referenced more often than others in WHERE, JOIN, GROUP BY and ORDER BY and see what improvements you find. Keep gradually increasing if needed. I would guess certain indexes are overkill, but only tests would prove it.
Finally, if you want to index everything and space is an issue, you could always look outside the database with Apache Solr. Once you get the hang of it, you could index everything and even provide users with cool faceted searching. And you would only need to rebuild the Solr index once or twice a week.
Hope that helps.

indexing large table in SQL SERVER

I have a large table (more than 10 million records). This table is heavily used for searches in the application, so I had to create indexes on it. However, I experience slow performance when a record is inserted or updated in the table, most likely because of the updating of the indexes.
Is there a way to improve this?
Thanks in advance
You could try reducing the number of page splits (in your indexes) by reducing the fill factor on your indexes. By default, the fill factor is 0 (same as 100 percent). So, when you rebuild your indexes, the pages are completely filled. This works great for tables that are not modified (insert/update/delete). However, when data is modified, the indexes need to change. With a Fill Factor of 0, you are guaranteed to get page splits.
By modifying the fill factor, you should get better performance on inserts and updates because the page won't ALWAYS have to split. I recommend you rebuild your indexes with a Fill Factor = 90. This will leave 10% of the page empty, which would cause less page splits and therefore less I/O. It's possible that 90 is not the optimal value to use, so there may be some 'trial and error' involved here.
By using a different value for fill factor, your select queries may become slightly slower, but with a 90% fill factor, you probably won't notice it too much.
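As a sketch, the rebuild with a lower fill factor is a one-liner in T-SQL (table name assumed; ALL rebuilds every index on the table):

ALTER INDEX ALL ON dbo.BigTable
REBUILD WITH (FILLFACTOR = 90);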
There are a number of solutions you could choose:
a) You could partition the table
b) Consider performing updates in batches at off-peak hours (like at night)
c) Since engineering is a balancing act of trade-offs, you have to decide which operation matters more (selects or updates/inserts/deletes). Assuming you don't need the results in real time for an insert, you can use SQL Server Service Broker to perform the less important operation asynchronously
http://msdn.microsoft.com/en-us/library/ms166043(SQL.90).aspx
Thanks
-RVZ
We'd need to see your indexes, but likely yes.
Some things to keep in mind are that you don't want to just put an index on every column, and you don't generally want just one column per index.
The best thing to do is if you can log some actual searches, track what users are actually searching for, and create indexes targeted at those specific types of searches.
This is a classic engineering trade-off... you can make the shovel lighter or stronger, never both... (until a breakthrough in materials science).
More indexes mean more DML maintenance but faster queries.
Fewer indexes mean less DML maintenance but slower queries.
It could be possible that some of your indexes are redundant and could be combined.
Besides what Joel wrote, you also need to define the SLA for DML. It's slower, and you noticed that it slowed down, but does it really matter compared with the query improvement you've achieved? In other words, is it OK to have a light shovel that is that weak?
If you have a clustered index that is not on an identity numeric field, rearranging the pages can slow you down. If this is the case for you, see if you can improve speed by making it a non-clustered index (faster than no index, but tending to be a bit slower than a clustered index, so your selects may slow down while inserts improve).
I have found users more willing to tolerate a slightly slower insert or update than a slower select. That is a general rule, but if writes become unacceptably slow (or worse, time out), no one can deal with that well.

How many database indexes is too many?

I'm working on a project with a rather large Oracle database (although my question applies equally well to other databases). We have a web interface which allows users to search on almost any possible combination of fields.
To make these searches go fast, we're adding indexes to the fields and combinations of fields on which we believe users will commonly search. However, since we don't really know how our customers will use this software, it's hard to tell which indexes to create.
Space isn't a concern; we have a 4 terabyte RAID drive of which we are using only a small fraction. However, I'm worried about the possible performance penalties of having too many indexes. Because those indexes need to be updated every time a row is added, deleted, or modified, I imagine it'd be a bad idea to have dozens of indexes on a single table.
So how many indexes is considered too many? 10? 25? 50? Or should I just cover the really, really common and obvious cases and ignore everything else?
It depends on the operations that occur on the table.
If there are lots of SELECTs and very few changes, index all you like... these will (potentially) speed the SELECT statements up.
If the table is heavily hit by UPDATEs, INSERTs and DELETEs, these will be very slow with lots of indexes, since they all need to be modified each time one of these operations takes place.
Having said that, you can clearly add a lot of pointless indexes to a table that won't do anything. Adding B-Tree indexes to a column with 2 distinct values will be pointless since it doesn't add anything in terms of looking the data up. The more unique the values in a column, the more it will benefit from an index.
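A rough way to eyeball that (hypothetical table and column; the exact arithmetic syntax varies slightly by database):

-- Values near 1.0 mean highly unique (good B-tree candidate);
-- values near 0 mean very few distinct values (poor candidate).
SELECT COUNT(DISTINCT status) * 1.0 / COUNT(*) AS selectivity
FROM orders;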
I usually proceed like this.
Get a log of the real queries run on the data on a typical day.
Add indexes so the most important queries hit the indexes in their execution plan.
Try to avoid indexing fields that have a lot of updates or inserts
After a few indexes, get a new log and repeat.
As with any optimization, I stop when the requested performance is reached (this obviously implies that a point 0 would be getting specific performance requirements).
Everyone else has been giving you great advice. I have an added suggestion for you as you move forward. At some point you have to make a decision as to your best indexing strategy. In the end though, the best PLANNED indexing strategy can still end up creating indexes that don't end up getting used. One strategy that lets you find indexes that aren't used is to monitor index usage. You do this as follows:
alter index my_index_name monitoring usage;
You can then monitor whether the index is used or not from that point forward by querying v$object_usage. Information on this can be found in the Oracle® Database Administrator's Guide.
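For example, a sketch of the follow-up query (run as the owning user; the columns are from the documented view):

SELECT index_name, table_name, monitoring, used,
       start_monitoring, end_monitoring
FROM v$object_usage;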
Just remember that if you have a warehousing strategy of dropping indexes before updating a table, then recreating them, you will have to set the index up for monitoring again, and you'll lose any monitoring history for that index.
In data warehousing it is very common to have a high number of indexes. I have worked with fact tables having two hundred columns and 190 of them indexed.
Although there is an overhead to this it must be understood in the context that in a data warehouse we generally only insert a row once, we never update it, but it can then participate in thousands of SELECT queries which might benefit from indexing on any of the columns.
For maximum flexibility a data warehouse generally uses single column bitmap indexes except on high cardinality columns, where (compressed) btree indexes can be used.
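In Oracle terms, that pattern looks roughly like this (the fact table and columns are hypothetical):

-- Low-cardinality column: single-column bitmap index
CREATE BITMAP INDEX bix_sales_region ON sales (region);
-- High-cardinality column: compressed B-tree index
CREATE INDEX ix_sales_customer ON sales (customer_id) COMPRESS;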
The overhead on index maintenance is mostly associated with the expense of writing to a great many blocks and the block splits as new rows are added with values that are "in the middle" of existing value ranges for that column. This can be mitigated by partitioning and having the new data loads aligned with the partitioning scheme, and by using direct path inserts.
To address your question more directly: I think it is probably fine to index the obvious at first, but do not be afraid of adding more indexes later if queries against the table would benefit.
In a paraphrase of Einstein about simplicity, add as many indexes as you need and no more.
Seriously, however, every index you add requires maintenance whenever data is added to the table. On tables that are primarily read only, lots of indexes are a good thing. On tables that are highly dynamic, fewer is better.
My advice is to cover the common and obvious cases and then, as you encounter issues where you need more speed in getting data from specific tables, evaluate and add indices at that point.
Also, it's a good idea to re-evaluate your indexing schemes every few months, just to see if there is anything new that needs indexing or any indices that you've created that aren't being used for anything and should be gotten rid of.
In addition to the points everyone else has raised, the Cost Based Optimizer incurs a cost when creating a plan for an SQL statement if there are more indexes because there are more combinations for it to consider. You can reduce this by correctly using bind variables so that SQL statements stay in the SQL cache. Oracle can then do a soft parse and re-use the plan it found last time.
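A tiny sketch of the bind-variable point (hypothetical table):

-- Each distinct literal can force a fresh optimization:
SELECT * FROM orders WHERE customer_id = 42;
SELECT * FROM orders WHERE customer_id = 43;
-- With a bind variable, the statement text stays identical,
-- so the cached plan can be soft-parsed and reused:
SELECT * FROM orders WHERE customer_id = :cust_id;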
As always, nothing is simple. If there are skewed columns and histograms involved then this can be a bad idea.
In our web applications we tend to limit the combinations of searches that we allow. Otherwise you would have to test literally every combination for performance to ensure you did not have a lurking problem that someone will find one day. We have also implemented resource limits to stop this causing issues elsewhere in the application should something go wrong.
I made some simple tests on my real project and real MySQL database. I already answered in this topic: What is the cost of indexing multiple db columns?
But I think it will be better if I quote it here:
I made some simple tests using my real project and real MySQL database.
My results are: adding an average index (1-3 columns in an index) to a table makes inserts slower by 2.1%. So, if you add 20 indexes, your inserts will be slower by 40-50%. But your selects will be 10-100 times faster.
So is it ok to add many indexes? It depends :) I gave you my results - you decide!
Ultimately, how many indexes you need depends on the behavior of the applications that ride on top of your database server.
In general, the more inserting you do, the more painful your indexes become. Each time you do an insert, all the indexes on that table have to be updated.
Now if your application has a decent amount of reading, or even more so if it's almost all reading, then indexes are the way to go as there will be major performance improvements for very little cost.
There's no static answer in my opinion, this sort of thing falls under 'performance tuning'.
It could be that everything your app does is looked up by a primary key, or it could be the opposite, in that queries are done over unrestricted combinations of fields and any one in particular could be used at any given time.
Beyond just indexing, there's reorganizing your DB to include calculated search fields, splitting tables, etc. - it's really dependent on your load shapes and query parameters, and on how much/what data 'really' needs to be returned by a query.
If your entire DB is fronted by stored-procedure facades, tuning becomes a bit easier, as you don't have to worry about every ad-hoc query. Or you may have a deep understanding of the kind of queries that will hit your DB, and can limit the tuning to those.
For SQL Server I've found the Database Engine Tuning Advisor useful - you set up 'typical' workloads and it can make recommendations about adding/removing indexes and statistics. I'm sure other DBs have similar tools, either official or third party.
This is really more a theoretical question than a practical one. The impact of indexes on your performance depends on the hardware you have, the version of Oracle, the index types, etc. Yesterday I heard that Oracle announced dedicated storage, made by HP, which is supposed to perform 10 times faster with an 11g database.
As for your case, there can be several solutions:
1. Have a large number of indexes (>20) and rebuild them daily (nightly); see the sketch after this list. This would be especially useful if the table gets thousands of updates/deletes daily.
2. Partition your table (if that fits your data model).
3. Use a separate table for new/updated data, and run a nightly process that combines the data together. This would require a change in your application logic.
4. Switch to an IOT (index-organized table), if your data supports this.
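For option 1, the nightly rebuild is a one-liner per index in Oracle (the index name is hypothetical; ONLINE requires an edition that supports it):

ALTER INDEX ix_orders_status REBUILD;
-- or, to keep the table available during the rebuild:
ALTER INDEX ix_orders_status REBUILD ONLINE;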
Of course there might be many more solutions for such case. My first suggestion to you, would be to clone the DB to a development environment, and run some stress testing against it.
An index imposes a cost when the underlying table is updated. An index provides a benefit when it is used to speed up a query. For each index, you need to balance the cost against the benefit. How much slower does the query run without the index? How much of a benefit is running faster? Can you or your users tolerate the slow speed when the index is missing?
Can you tolerate the additional time it takes to complete an update?
You need to compare costs and benefits. That's particular to your situation. There's no magic number of indexes that passes the threshold of "too many".
There's also the cost of the space needed to store the index, but you've said that in your situation that's not an issue. The same is true in most situations, given how cheap disk space has become.
If you do mostly reads (and few updates) then there's really no reason not to index everything you'll need to index. If you update often, then you may need to be cautious on how many indexes you have. There's no hard number, but you'll notice when things start to slow down. Make sure your clustered index is the one that makes the most sense based on the data.
One thing you may consider is building indexes to target a standard combination of searches. If column1 is commonly searched, and column2 is often used with it, and column3 is sometimes used with column2 and column1, then an index on column1, column2, and column3 in that order can be used for any of those three circumstances, though it is only one index that has to be maintained.
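A sketch of that single composite index (names hypothetical):

CREATE INDEX ix_t_c1_c2_c3 ON t (column1, column2, column3);
-- One index covers all three leading-prefix search shapes:
--   WHERE column1 = ?
--   WHERE column1 = ? AND column2 = ?
--   WHERE column1 = ? AND column2 = ? AND column3 = ?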
How many columns are there?
I have always been told to make single-column indexes, not multi-column indexes. So no more indexes than the number of columns, IMHO.
What it really comes down to is, don't add an index unless you know (and this often means gathering usage statistics) that it will be used far more often than it's updated.
Any index that doesn't meet that criterion will cost you more to maintain than the performance penalty of not having it in the odd case it gets used.
SQL Server gives you some good tools that let you see which indexes are actually being used.
This article, http://www.mssqltips.com/tip.asp?tip=1239, gives you some queries that let you get a better insight into how much an index is used, as opposed to how much it is updated.
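On recent versions, the usage DMV gives the same picture directly (a minimal sketch; covers the current database since the last service restart):

SELECT i.name AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups,
       s.user_updates
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID();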
It depends entirely on the columns being used in the WHERE clause.
As a rule of thumb, you should have indexes on foreign key columns, to avoid deadlocks.
AWR reports should be analyzed periodically to understand the need for indexes.
