Help with index on database

Is it a good idea to create an index on a VARCHAR(500) field? I am going to do a lot of searching on it, but I am not sure whether creating an index on such a 'big' field is a good idea.
What do you think?

It is usually not a good idea, since the index will be huge and searches relatively slow. It is better to index a prefix of the field, such as its first 32 or 64 characters. Another possibility, if it makes sense for your queries, is a full-text index.
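As a sketch of the prefix idea: MySQL supports prefix indexes directly, while SQL Server needs a workaround via a computed column (table and column names below are made up for illustration):

```sql
-- MySQL: index only the first 64 characters of a VARCHAR(500) column.
CREATE INDEX idx_articles_title_prefix ON articles (title(64));

-- SQL Server has no prefix syntax; indexing a computed column is a
-- common workaround:
ALTER TABLE articles ADD title_prefix AS LEFT(title, 64);
CREATE INDEX idx_articles_title_prefix2 ON articles (title_prefix);
```

Note that a prefix index can only satisfy searches on the leading characters; a `WHERE title = ?` predicate still has to re-check the full value against the table.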

In general it's a good idea to create indexes on fields that you'll use for search. But, depending on the use, there are better options:
Full-text search (from Wikipedia): in a full-text search, the search engine examines all of the words in every stored document as it tries to match search words supplied by the user.
Partial index (again, from Wikipedia): in databases, a partial index, also known as a filtered index, is an index which has some condition applied to it so that it includes only a subset of rows in the table.
Maybe you should consider giving more information about how the index will be used.
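A partial index might look like this (PostgreSQL syntax; the table, column, and condition are hypothetical):

```sql
-- PostgreSQL: only rows matching the WHERE clause are stored in the
-- index, keeping it small when most searches target active rows only.
CREATE INDEX idx_orders_active ON orders (customer_name)
WHERE status = 'active';
```

SQL Server has the equivalent feature under the name "filtered index", with the same `CREATE INDEX ... WHERE ...` shape.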

You should put indexes where frequently used queries will run faster. However, there are a number of issues to consider:
Indexes have a limited size, e.g. MS SQL Server has a 900-byte limit on index keys.
Many indexes may incur an overhead while writing (although it was minimal the last time I benchmarked inserting a million rows into a table with 9 indexes).
Many indexes take up precious space in the database.
Many indexes may cause deadlocks when inserting data.
Also take a look at the documentation for the database you use. Most databases have support for text columns with efficient searching in them.


SQL Server not using proper index for query

I have a table on SQL Server with about 10 million rows. It has a nonclustered index ClearingInfo_idx which looks like:
I am running query which isn't using ClearingInfo_idx index and execution plan looks like this:
Can anyone explain why query optimizer chooses to scan clustered index ?
I think it suggests this index because you use an exact search on the two columns immediate and clearingOrder_clearingOrderId. Those values are numbers, which are good to search on. The column status is nvarchar, which isn't the best for a search, and because of your IN clause, SQL Server needs to search for two of those values.
SQL Server would use the two number columns to get a faster result, searching the status column in a second pass, after the number of possible results has been reduced by the exact search on the two number columns.
Hopefully that makes sense. :-) Otherwise, just ask again. :-)
As Luaan already pointed out, the likely reason the system prefers to scan the clustered index is because
you're asking for all fields to be returned (SELECT *), change this to fields that are present in the index ( = index fields + clustered index-fields) and you'll probably see it using just the index. If you'd need a couple of extra fields you can consider INCLUDEing those in the index.
the order of the index fields isn't very optimal. Additionally, it might well be that the 'content' of the fields isn't very helpful either. How many distinct values are present in the index columns, and how are they spread around? If your WHERE covers 90% of the records, there is very little reason to first create a (huge) list of keys and then go fetch those from the clustered index later on. Scanning the latter directly makes much more sense.
Did you try the suggested index? Not sure what other queries run on the table, but for this particular query it seems like a valid replacement to me. Whether the replacement will satisfy the other queries is another question, of course. Adding extra indexes might negatively impact your insert/update/delete operations and it will require more disk space; there is no such thing as a free lunch =)
That said, if performance is an issue, have you considered a filtered index? (again, no such thing as a free lunch; it's all about priorities)
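A sketch of the INCLUDE suggestion above (only immediate, clearingOrder_clearingOrderId and status are named in the question; extraColumn stands in for whatever additional fields the SELECT actually needs):

```sql
-- Equality-searched columns form the key; the rest ride along in the
-- leaf pages so the query never has to visit the clustered index.
CREATE NONCLUSTERED INDEX ClearingInfo_covering_idx
ON ClearingInfo (clearingOrder_clearingOrderId, immediate)
INCLUDE (status, extraColumn);  -- extraColumn is hypothetical
```

With every referenced column present in the index, the optimizer can choose an index seek instead of the clustered index scan.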

Indexes with columns included

I noticed in my database that I have a lot of indexes in a table that only differ in the included columns.
For example for table A I have this:
INDEX ON A(COLUMN_A) INCLUDE (COLUMN_B)
INDEX ON A(COLUMN_A) INCLUDE (COLUMN_C)
INDEX ON A(COLUMN_A) INCLUDE (COLUMN_D)
It seems to me it would be more efficient (for inserts/updates/deletes) to just have this:
INDEX ON A(COLUMN_A) INCLUDE (COLUMN_B, COLUMN_C, COLUMN_D)
Would there be any reason not to do this?
Thanks!
You are right, they should be combined into one. The non-key columns that are INCLUDE-ed are just stored in the leaf nodes of the index so they can be read. Unlike key columns they don't form part of the hierarchy of the index so the order isn't important. Having fewer indexes is a good thing if the indexes aren't adding anything useful, as in this case with your redundant indexes.
See also
https://stackoverflow.com/a/1308012/8479
http://msdn.microsoft.com/en-us/library/ms190806.aspx
Most likely, this is a read optimization, not a write one. If there are multiple queries that use the same key column (COLUMN_A) but a different "data" column (COLUMN_B/C/D), this kind of index saves some I/O (no need to load unnecessary data from the table; you already have it along with the index). Including all the data columns in one index would take less space overall (no need to store the key column three times), but each of the indices in your case is smaller than one "combined" index.
Hopefully, the indices were created this way for a reason, based on performance profiling and real performance issues. This is where documentation with motivation comes in very handy - why were the indices created this way? Was it sloppiness? Little understanding of the way indices work? Automatic optimization based on query plan analysis? Unless you know that, it might be a bad idea to tweak this kind of thing, especially if you don't actually have a performance problem.
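Before consolidating, it may be worth checking whether each index is actually read. A sketch for SQL Server, using the built-in usage DMV (the table name 'A' matches the question's example; note the counters reset at server restart):

```sql
-- Seeks/scans/lookups are reads that benefit from the index;
-- user_updates is the write overhead it costs you.
SELECT i.name, s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE i.object_id = OBJECT_ID('A');
```

An index with high user_updates and near-zero reads is a strong candidate for merging or dropping.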

MS-SQL 2012 - indexing Bit field

On MS-SQL 2012, does it make sense to index a "Deleted" BIT field if one is always going to use it in queries (i.e. SELECT xx FROM oo WHERE Deleted = 0)?
Or does the fact that a field is BIT already come with some sort of automatic index for performance?
When you index a bit field, which holds only 1, 0 or some other limited set of values, you are reducing the number of rows matching that value. For a small number of records this may not matter much, but for a large amount of data it can give you a performance gain.
You can include bit columns as part of compound index
An index on a bit field can be really helpful in scenarios where there is a large discrepancy between the number of 0's and 1's, and you are searching for the smaller of the two.
Indexing a bit field will be pretty useless under most conditions, because the selectivity is so low. An index scan on a large table is not going to be better than a table scan. If there are other conditions you can use to create filtered indexes, you could consider that.
If this field is changing the nature of the logic in such a way that you will always need to consider it in the predicate, you might consider splitting the data into other tables when reporting.
Whether to index a bit field depends on several factors which have been adequately explained in the answer to this question. Link to 231125
As others have mentioned, selectivity is the key. However, if you're always searching on one value or another and that value is highly selective, consider using a filtered index.
Why not put it at the front of your clustered index? If deletes are incremental, you'd have to turn your fill factor down, but they're probably daily, right? And you have way more deleted records than undeleted records? And, as you say, you only ever query undeleted records. So, yes. Don't just index that column. Cluster on it.
It can be useful as part of a composite index, when the bit column is in the first position in the index. But if you plan to use it only for selecting one value (select .... where deleted=1 and another_key=?; but never deleted=0), then create an index on another_key with a filter:
create index i_another on t(another_key) where deleted=1
If the bit column would be the last in the composite index, then its occurrence in the key is useless. However, you can include it for better performance:
create index i_another on t(another_key) include(deleted)
Then the DB engine gets the value while reading the index and doesn't need to fetch it from the base table page.

Database indexing - how does it work?

How does indexing increase the performance of data retrieval?
How does indexing work?
Database products (RDBMSs) such as Oracle and MySQL build their own indexing systems. They give some control to database administrators, but nobody knows exactly what happens in the background, except the people doing research in that area. So, why indexing?
Put simply, database indexes help speed up retrieval of data. The other great benefit of indexes is that your server doesn't have to work as hard to get the data. They are much the same as book indexes, providing the database with quick jump points on where to find the full reference (or to find the database row).
There are many indexing techniques, for example:
Primary indexing, secondary indexing
B-trees and variants (B+-trees,B*-trees)
Hashing and variants (linear hashing, spiral etc.)
For example, just think that you have a database in which the primary keys are sorted (simply), and all this data is stored in blocks (on the HDD). Every time you access the data you don't want to increase the access time (sometimes called transaction time or I/O time); the index tells you which data is stored in which block, using these primary keys.
Alice (the primary key is the name; not a good example, but it gives the idea)
Alice
...
...
AZ...
Bob
Bri
...
Bza
...
Now you have an index; in this index you store only Alice and Bob and the blocks they point to. This way users can access the data faster. The RDBMS deals with the details.
I won't give the details here, but if you want to delve into these topics, I suggest you take a database course or look at this popular book, which is taught at most universities:
Database Management Systems by Ramakrishnan and Gehrke
Each index keep the indexed fields stored separately, sorted (typically) and in a data structure which makes finding the right entries particularly easy. The database finds the entries in the index then cross-references them to the entries in the tables (Except in the case of clustered indexes and covering indexes, in which case the index has it all already). This cross-referencing takes time but is faster (you hope) than scanning the entire table.
A clustered index is where the rows themselves with all columns* are stored together with the index. Scanning clustered indexes is better than scanning non-clustered non-covering indexes because fewer lookups are required.
A covering index is where the query only requires columns which are part of the index, so the rest of the row does not need to be looked up (This is often good for performance).
* typically excluding blob / long text columns etc
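The covering-index idea above can be sketched in SQL Server syntax (table and column names are made up for illustration):

```sql
-- The index below "covers" the query that follows: every referenced
-- column lives in the index, so the base table rows are never touched.
CREATE NONCLUSTERED INDEX ix_orders_customer
ON Orders (CustomerId)
INCLUDE (OrderDate, Total);

SELECT CustomerId, OrderDate, Total
FROM Orders
WHERE CustomerId = 42;
```

Add a column outside the index to the SELECT list (say, a ShippingAddress field) and the plan gains a lookup back into the table for every matching row.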
How does an index in a book increase the ease with which you find the right page?
Much easier to look through an alphabetic list and then go to the right page than read every page.
This is a gross oversimplification, but in general, database indexing creates another list of some of the contents of the table, arranged in a way that lets the database engine find information quickly. By organizing table contents deliberately, this eliminates the need to find a row of data by scanning the entire table, creating great efficiency in searches.
Indexes provide an optimal data structure for lookup queries. If your dataset changes a lot, you might consider the performance of updating/regenerating the index as well.
There are lots of open-source indexing engines, like Lucene, available, and you can search online for detailed performance benchmarks.

Query performance of combined index vs. multiple single indexes vs. fulltext index

Background: I have a table with 5 million address entries which I'd like to search for different fields (customer name, contact name, zip, city, phone, ...), up to 8 fields. The data is pretty stable, maximum 50 changes a day, so almost only read access.
The user isn't supposed to tell me in advance what he's searching for, and I also want support of combined search (AND-concatenation of search terms). For example "lincoln+lond" should search for all records containing both search terms in any of the search fields, also those entries starting with any of the terms (like "London" in this example).
Problem: Now I need to choose an indexing strategy for this search table. (As a side note: I'm trying to achieve sub-second response times; the worst response time should be 2 seconds.) What's better in terms of performance:
Do a combined index out of all queryable columns (would need 2 of them, as index limit of 900 bytes reached)
Put single indexes on each of the queryable columns
Make a fulltext index on the queryable columns and use fulltext query
I'm discarding point 1, as it doesn't seem to have any advantage (index usage will be limited and there will be no "index seek", because not all fields fit in one single index).
Question: Now, should I use the multiple single indexes variant or should I go with the fulltext index? Is there any other way to achieve the functionality mentioned above?
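For reference, the three options might be sketched like this (SQL Server syntax; table and column names are assumptions based on the fields mentioned above):

```sql
-- 1. Combined index (would have to be split in two, as the 900-byte
--    key limit prevents putting all 8 columns in one index)
CREATE INDEX ix_addr_combined ON Addresses (CustomerName, ContactName, Zip, City);

-- 2. One single-column index per searchable field
CREATE INDEX ix_addr_name ON Addresses (CustomerName);
CREATE INDEX ix_addr_city ON Addresses (City);
-- ... and so on for the remaining queryable columns

-- 3. Full-text index over the searchable columns
CREATE FULLTEXT INDEX ON Addresses (CustomerName, ContactName, City)
    KEY INDEX PK_Addresses;  -- assumes a unique index named PK_Addresses
```

Option 2 suits the "starts with" searches described above, since `LIKE 'lond%'` (no leading wildcard) can seek each single-column index.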
Try them both and see which is faster on your system. There are few hard and fast rules for database optimizations, it really depends on your environment.
Originally, I was about to suggest going with FTS, as it has a lot of strong performance features going for it, especially when you're dealing with varied queries (e.g. x AND y, x NEAR y, etc.).
But before I start to ramble on with the pro's of FTS, I just checked your server version -> sql2000.
Poor thing. FTS was very simple back then, so stick with multiple single indexes.
We use Sql2008 and ... it rocks.
Oh, btw. did you know that Sql2008 (free edition) has FTS in it? Is it possible to upgrade?
Going from sql2000 -> sql2008 is very worth it, if you can.
But yeah, stick with your M.S.I. option.
I agree with Grauenwolf, and I'd like to add a note about indexes. Keep in mind that if you use a syntax like the following:
SELECT field1, field2, field3
FROM table
WHERE field1 LIKE '%value%'
Then no index will be used anyway when searching on field1 and you have to resort to a full-text index. For the sake of completeness, the above syntax returns all rows where field1 contains value (not necessarily at the beginning).
If you have to search for "contains", a full-text index is probably more appropriate.
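The difference can be sketched like this (table and column names are hypothetical; the second query requires a full-text index on field1):

```sql
-- A leading wildcard defeats an ordinary index: every row is examined.
SELECT field1 FROM myTable WHERE field1 LIKE '%value%';

-- A full-text predicate uses the full-text index instead; the prefix
-- term below matches words starting with 'value'.
SELECT field1 FROM myTable WHERE CONTAINS(field1, '"value*"');
```

Note the semantics differ slightly: LIKE matches substrings anywhere, while CONTAINS works on word boundaries.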
To answer my own question:
I've chosen the "multiple single indexes" option. I ended up having an index for each of the queried columns, each index containing only the column itself. The search works very well, with mostly sub-second response times. Sometimes it takes up to 2-3 seconds, but I attribute that to my database server (a several-year-old laptop with 3 GB RAM and a slow disk).
I didn't test the full-text option, as it was no longer necessary (and I don't have the time to do it).
