For any given table that has more than one column you look up against using a WHERE clause, where those columns are int or bigint, at what point does it become worth creating an index on those columns?
I am aware that I should create those indexes anyway; this question is about the table size at which the performance advantage of having those indexes kicks in.
Instantly. Granted, you wouldn't notice it on a small table, but without an index on those columns the engine has to do a table scan.
So you might not notice it in execution time, but if you look at the CPU used by the query, and the reads, you'll see that with the index the query instantly starts performing better.
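For instance, assuming SQL Server and a made-up table dbo.Orders with int columns CustomerId and StatusId, a quick way to see this for yourself is to compare the reads and CPU reported before and after adding the index:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Without an index on these columns this has to scan the whole table.
SELECT OrderId
FROM dbo.Orders
WHERE CustomerId = 42 AND StatusId = 3;

-- Composite index on the columns used in the WHERE clause.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_StatusId
    ON dbo.Orders (CustomerId, StatusId);

-- Same query again; compare the logical reads and CPU time in the Messages tab.
SELECT OrderId
FROM dbo.Orders
WHERE CustomerId = 42 AND StatusId = 3;

Even when the elapsed time looks identical, the logical reads should drop as soon as the index exists.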
The advantage kicks in when data must be retrieved and the most restrictive WHERE predicate involves the indexed column.
Note that on CUD (insert/update/delete) statements, indexes add overhead (which might be compensated for if the statement also involves retrieving data, as explained above).
Related
I have a table with two non-clustered indexes. Both have the same key columns, and their INCLUDE columns are the same except for one column; otherwise they are identical.
I need to drop one as having two almost identical indexes is unnecessary. I'm looking over the Usage Stats and Op Stats that are returned using sp_BlitzIndex.
Usage Stats
Index A: Reads: 652,366 (652,366 seek); Writes: 3,297,125
Index B (has the additional INCLUDE column): Reads: 644,443 (640,332 seek, 4,111 scan); Writes: 3,897,213
Op Stats
Index A: 536,711 singleton lookups; 1,239,859 scans/seeks; 423,781 deletes; 5,125 updates
Index B (has the additional INCLUDE column): 1,070,124 singleton lookups; 1,225,548 scans/seeks; 913,185 deletes; 5,127 updates
Index A has more seek reads but fewer singleton lookups. My first choice is to keep the index with the additional column (Index B), as I would think it would cover more queries in the long run. Should I just be focusing on reads and keep Index A?
EDIT: The table in question has 22 indexes, including the two in question, which were added by a prior DBA. It seems unnecessary to have both indexes around when they are so similar, and I'm trying to reduce the overhead of so many indexes on this table, as it seems to affect performance. I do realize that proper indexing can be tricky, so I guess I'm just asking whether the Usage and Op stats I provided should steer me toward eliminating one or the other, or perhaps neither.
Why do you "need" to drop one? Unless there is a performance hit due to constant updating of the data thus updating the indexes, both could still provide a benefit for your selects with minimal impact.
Have you run a trace under normal use and analyzed it to see which indexes would actually benefit you? You can learn a lot about what is going on by doing this.
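If you'd rather not run a full trace, SQL Server's index usage DMV gives numbers similar to what sp_BlitzIndex reports (dbo.YourTable is a placeholder; the counters reset when the instance restarts):

SELECT i.name AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups,
       s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON  s.object_id   = i.object_id
       AND s.index_id    = i.index_id
       AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.YourTable')
ORDER BY s.user_updates DESC;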
In Oracle 11g, say, I have a table Task which has a column ProcessState. The values of this column can be Queued, Running, and Complete (there may be a couple more states in the future). The table will have 50M+ rows, with 99.9% of them having Complete as the column value. Only a few thousand rows will have the value Queued/Running.
I have read that although a bitmap index is good for a low-cardinality column, it is largely used for static tables.
So, which index can improve the query for Queued/Running tasks: a bitmap index or a normal non-unique B-tree index?
Also, what index can improve the query for a binary column (NUMBER(1,0) with just yes/no values)?
Disclaimer: I am an accidental dba.
A regular (b*tree) index is fine. Just make sure there is a histogram on the column (see the METHOD_OPT parameter of DBMS_STATS.GATHER_TABLE_STATS).
With a histogram on that column, Oracle will have the data it needs to use the index when looking for queued/running jobs but do a full table scan when looking for completed jobs.
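A rough sketch of both steps, assuming the Task/ProcessState names from the question (the index name is made up), run as the table owner:

CREATE INDEX task_processstate_ix ON Task (ProcessState);

BEGIN
  -- Histogram on ProcessState so the optimizer sees the heavy skew.
  -- In practice you would fold this into your regular stats job.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'TASK',
    method_opt => 'FOR COLUMNS ProcessState SIZE 254'
  );
END;
/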
Do NOT use a bitmap index, as suggested in the comments. With lots of updates, you'll have concurrency and, worse, deadlocking issues.
Also, what index can improve the query for a binary column (NUMBER(1,0) with just yes/no values)
Sorry -- I missed this part of your question. If the data in the column is skewed (i.e., almost all 1 or almost all 0), then a regular (b*tree) index as above. If the data is evenly distributed, then no index will help. Reading 50% of your table's rows via an index will be slower than a full table scan.
I guess that you are interested in selecting rows in the Queued/Running states in order to update them. So it would be nice to separate the completed rows from the others, because there is not much sense in indexing completed rows. You can use partitioning here, or a function-based index whose function returns NULL for completed rows and the actual value for the others; in that case only uncompleted rows appear in the index tree.
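A minimal sketch of that function-based index (the index name is made up): Oracle does not store index entries whose entire key is NULL, so the Complete rows never make it into the index.

-- Complete rows map to NULL and are excluded from the B-tree.
CREATE INDEX task_active_state_ix ON Task (
  CASE WHEN ProcessState <> 'Complete' THEN ProcessState END
);

-- Queries must repeat the same expression for the index to be usable.
SELECT *
FROM   Task
WHERE  CASE WHEN ProcessState <> 'Complete' THEN ProcessState END = 'Queued';

The index stays tiny because it only ever contains the few thousand uncompleted rows.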
I have a table in a database that will be populated up front and probably never written to again. Even if it is ever written to, it will be through batch processes run during a release, and write time is not important at all.
It's a relatively large table with about 80k rows and maybe about 10-12 columns.
The application is likely to retrieve data from this table often.
I was thinking, since it'll never be written to again, should I just put indices on all the columns? That way it'll always be quick to read no matter what type of query I form?
Is this a good idea? Is there any downside to this I should be aware of?
My understanding is that each index does require some (a relatively small amount of) storage space. If you're tight for space this could matter. Exactly how much impact this might make may depend on which DB you are using.
It will depend on the table. If all of the columns will be used in search criteria, then it is not unreasonable to put indexes on them all. That is fairly unlikely though. Also, there may be compound (multi-column) indexes that would be more beneficial than some of the simple (single-column) indexes.
Finally, the query optimizer has to consider all the indexes present on the table when working out how to answer a query. It is hard to say when this becomes a measurable overhead, but more indexes take more time.
So, given the static nature of the table you describe, it is reasonable to index it more heavily than you might a more dynamic table. Indexing every column is probably not sensible. Choosing carefully which compound indexes to add may be important too.
Choose indexes for a table based on the queries you run against that table.
Indexes you never need for any query are just wasted space.
Individual indexes on each column are not the full set of possible indexes. You can also create multi-column (compound) indexes, and these can be important for optimizing certain queries. The order of columns in a compound index matters.
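A hypothetical example (table and column names made up) of why the order matters:

CREATE INDEX IX_Orders_Customer_Date
    ON dbo.Orders (CustomerId, OrderDate);

-- Can seek on the index: the leading column is constrained.
SELECT OrderId FROM dbo.Orders
WHERE CustomerId = 42 AND OrderDate >= '2013-01-01';

-- Generally cannot seek on the index: the leading column is not constrained.
SELECT OrderId FROM dbo.Orders
WHERE OrderDate >= '2013-01-01';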
SQL Server 2008 supports only 999 nonclustered indexes per table, so if you try to create all possible indexes on a table of more than a few columns, you will reach the limit.
Sorry, but you actually need to learn some things before you can optimize effectively. If it were simply a matter of indexing every column, then the RDBMS would do this by default.
What I mean is: Does a table with 20 columns benefit more from indexing a certain field (one that's used in search-ish queries) than a table that has just 4 columns?
Also: What is the harm in adding index to fields that I don't search with much, but might later in the future? Is there a negative to adding indexes? Is it just the size it takes up on disk, or can it make things run slower to add unnecessary indexes?
extracted from a comment
I'm using Postgres (latest version) and I have one table that I'll be running a lot of LIKE-type queries against, but the values will undoubtedly change often since my clients have CRUD access. Should I can the idea of indexes? Are they just a headache?
Does a table with 20 columns benefit more from indexing a certain field (one that's used in search-ish queries) than a table that has just 4 columns?
No, the number of columns in a table has no bearing on the benefit of having an index.
An index is solely on the values in the column(s) specified; it's the frequency of those values that determines how much benefit your queries will see. For example, a column containing a boolean value is a poor choice for indexing, because there are only two possible values, often split roughly 50/50 across the rows. At a 50/50 split over all the rows, the index doesn't narrow the search for a particular row.
What is the harm in adding index to fields that I don't search with much, but might later in the future?
Indexes only speed up data retrieval when they can be used, but they negatively impact the speed of INSERT/UPDATE/DELETE statements. Indexes also require maintenance to keep their value.
If you are doing LIKE queries you may find that indexes are not much help anyway. While an index might improve this query ...
select * from t23
where whatever like 'SOMETHING%';
... it is unlikely that an index will help with either of these queries ...
select * from t23
where whatever like '%SOMETHING%';

select * from t23
where whatever like '%SOMETHING';
If you have free-text fields and your users need fuzzy matching then you should look at Postgres's full-text search functionality. This uses the @@ match operator with tsvector/tsquery rather than LIKE, and it requires a special index type.
There is a gotcha: full-text indexes are more complicated than normal ones, and the related design decisions are not simple. Some implementations also require additional maintenance activity.
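A minimal sketch using the t23/whatever names from the example above; the GIN index and the query have to use the same to_tsvector expression:

CREATE INDEX t23_whatever_fts_idx
    ON t23
    USING gin (to_tsvector('english', whatever));

SELECT *
FROM   t23
WHERE  to_tsvector('english', whatever) @@ to_tsquery('english', 'something');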
I have to design a database to store log data, but I have no prior experience with this. My table contains about 19 columns (about 500 bytes per row) and grows by up to 30,000 new rows daily. My app must be able to query this table efficiently.
I'm using SQL Server 2005.
How can I design this database?
EDIT: the data I want to store is a mix of types: datetime, string, short, and int. About 25% of the cells are NULL :)
However else you'll do lookups, a logging table will almost certainly have a timestamp column. You'll want to cluster on that timestamp first to keep inserts efficient. That may also mean always constraining your queries to specific date ranges, so that the selectivity on your clustered index is good.
You'll also want indexes for the fields you'll query on most often, but don't jump the gun here. You can add the indexes later. Profile first so you know which indexes you'll really need. On a table with a lot of inserts, unwanted indexes can hurt your performance.
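As a sketch of the shape this tends to take (all names and the trimmed column list are placeholders; the real table has ~19 columns):

CREATE TABLE dbo.AppLog (
    LogId    bigint IDENTITY(1,1) NOT NULL,
    LoggedAt datetime     NOT NULL,
    Severity smallint     NOT NULL,
    Source   varchar(100) NULL,
    Message  varchar(400) NULL,
    CONSTRAINT PK_AppLog PRIMARY KEY NONCLUSTERED (LogId)
);

-- Cluster on the timestamp so inserts append to the end of the table.
CREATE CLUSTERED INDEX CIX_AppLog_LoggedAt
    ON dbo.AppLog (LoggedAt);

-- Added later, only once profiling shows queries that need it.
CREATE NONCLUSTERED INDEX IX_AppLog_Severity
    ON dbo.AppLog (Severity, LoggedAt);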
Well, given the description you've provided, all you can really do is ensure that your data is normalized and that your 19 columns don't lead you to a "sparse" table (meaning that a great number of those columns are null).
If you'd like to add some more data (your existing schema and some sample data, perhaps) then I can offer more specific advice.
Throw an index on every column you'll be querying against.
Huge amounts of test data, and execution plans (with query analyzer) are your friend here.
In addition to the comment on sparse tables, you should index the table on the columns you wish to query.
Alternatively, you could capture a trace with the Profiler under real usage and feed it to the Database Engine Tuning Advisor to get index suggestions based on actual usage.
Some optimisations you could make:
Cluster your data based on the most likely look-up criteria (e.g. clustered primary key on each row's creation date-time will make look-ups of this nature very fast).
Assuming that rows are written one at a time (not in batches) and that each row is inserted but never updated, you could code all select statements to use the WITH (NOLOCK) hint (see the sketch below). This can offer a big performance improvement if you have many readers, since your reads won't take shared locks. The risk of reading invalid data is greatly reduced given the write-once structure of the table.
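A sketch of the query shape this suggests, reusing the placeholder log table from the earlier example: constrain on the clustered timestamp and add the NOLOCK hint.

SELECT LoggedAt, Severity, Message
FROM   dbo.AppLog WITH (NOLOCK)
WHERE  LoggedAt >= '2013-06-01'
  AND  LoggedAt <  '2013-06-02';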
If you're able to post your table definition I may be able to offer more advice.