Related
What is the index creating strategy?
Is it possible to create more than one non-clustered index on the same column in SQL Server?
How about creating clustered and non-clustered on same column?
Very sorry, but indexing is very confusing to me.
Is there any way to find out the estimated query execution time in SQL Server?
The words are rather logical and you'll learn them quite quickly. :)
In layman's terms, SEEK implies seeking out precise locations for records, which is what the SQL Server does when the column you're searching in is indexed, and your filter (the WHERE condition) is accurrate enough.
SCAN means a larger range of rows where the query execution planner estimates it's faster to fetch a whole range as opposed to individually seeking each value.
And yes, you can have multiple indexes on the same field, and sometimes it can be a very good idea. Play out with the indexes and use the query execution planner to determine what happens (shortcut in SSMS: Ctrl + M). You can even run two versions of the same query and the execution planner will easily show you how much resources and time is taken by each, making optimization quite easy.
But to expand on these a bit, say you have an address table like so, and it has over 1 billion records:
CREATE TABLE ADDRESS
(ADDRESS_ID INT -- CLUSTERED primary key ADRESS_PK_IDX
, PERSON_ID INT -- FOREIGN KEY, NONCLUSTERED INDEX ADDRESS_PERSON_IDX
, CITY VARCHAR(256)
, MARKED_FOR_CHECKUP BIT
, **+n^10 different other columns...**)
Now, if you want to find all the address information for person 12345, the index on PERSON_ID is perfect. Since the table has loads of other data on the same row, it would be inefficient and space-consuming to create a nonclustered index to cover all other columns as well as PERSON_ID. In this case, SQL Server will execute an index SEEK on the index in PERSON_ID, then use that to do a Key Lookup on the clustered index in ADDRESS_ID, and from there return all the data in all other columns on that same row.
However, say you want to search for all the persons in a city, but you don't need other address information. This time, the most effective way would be to create an index on CITY and use INCLUDE option to cover PERSON_ID as well. That way, a single index seek / scan would return all the information you need without the need to resort to checking the CLUSTERED index for the PERSON_ID data on the same row.
Now, let's say both of those queries are required but still rather heavy because of the 1 billion records. But there's one special query that needs to be really really fast. That query wants all the persons on addresses that have been MARKED_FOR_CHECKUP, and who must live in New York (ignore whatever checkup means, that doesn't matter). Now you might want to create a third, filtered index on MARKED_FOR_CHECKUP and CITY, with INCLUDE covering PERSON_ID, and with a filter saying CITY = 'New York' and MARKED_FOR_CHECKUP = 1. This index would be insanely fast, as it only ever cover queries that satisfy those exact conditions, and therefore has a fraction of the data to go through compared to the other indexes.
(Disclaimer here, bear in mind that the query execution planner is not stupid, it can use multiple nonclustered indexes together to produce the correct results, so the examples above may not be the best ones available as it's very hard to imagine when you would need 3 different indexes covering the same column, but I'm sure you get the idea.)
The types of index, their columns, included columns, sorting orders, filters etc depend entirely on the situation. You will need to make covering indexes to satisfy several different types of queries, as well as customized indexes created specifically for singular, important queries. Each index takes up space on the HDD so making useless indexes is wasteful and requires extra maintenance whenever the data model changes, and wastes time in defragmentation and statistics update operations though... so you don't want to just slap an index on everything either.
Experiment, learn and work out which works best for your needs.
I'm not the expert on indexing either, but here is what I know.
You can have only ONE Clustered Index per table.
You can have up to a certain limit of non clustered indexes per table. Refer to http://social.msdn.microsoft.com/Forums/en-US/63ba3877-e0bd-4417-a04b-19c3bfb02ac9/maximum-number-of-index-per-table-max-no-of-columns-in-noncluster-index-in-sql-server?forum=transactsql
Indexes should just have different names, but its better not to use the same column(s) on a lot of different indexes as you will run into some performance problems.
A very important point to remember is that Indexes although it makes your select faster, influence your Insert/Update/Delete speed as the information needs to be added to the index, which means that the more indexes you have on a column that gets updated a lot, will drastically reduce the speed of the update.
You can include columns that is used on a CLUSTERED index in one or more NON-CLUSTERED indexes.
Here is some more reading material
http://www.sqlteam.com/article/sql-server-indexes-the-basics
http://www.programmerinterview.com/index.php/database-sql/what-is-an-index/
EDIT
Another point to remember is that an index takes up space just like the table. The more indexes you create the more space it uses, so try not to use char/varchar (or nchar/nvarchar) in an index. It uses to much space in the index, and on huge columns give basically no benefit. When your Indexes start to become bigger than your table, it also means that you have to relook your index strategy.
Looking at the missing index DMVs on SQLServer it suggests I add the following index:
CREATE INDEX [IXFoo] ON [a].[b].[MyTable] ([BarFlag]) INCLUDE ([BazID])
There's two things that confuse me.
[BarFlag] is a bit field. Hardly highly selective, why put an index on a bit field?
Why not use a composite index in this case.: CREATE INDEX [IXFoo] ON [a].[b].[MyTable] ([BarFlag],[BazID])
I guess I'm not understanding the INCLUDE keyword properly. I've looked at msdn for an explanation but I'm still unclear.
Can someone explain why this index is suggested over a composite and explain the INCLUDE keyword to me?
The main difference is this:
if you create a composite index on (BarFlag, BazID), then your index will contain both values on all levels of the index b-tree; this means, the query analyzer will also have the chance to use both values when making decisions, and this can support queries that specify both columns in a WHERE clause
if you create an index on (BarFlag) and only include (BazID), then your index will contain only BarFlag values on all levels of the index b-tree, and only on the leaf level, the "last" level, there will also be the values of BazID included. The BazID values cannot be used in selecting the data - they're just present at the index leaf level for lookup.
Just for an INT and a BIT that isn't much of a concern, but if you're dealing with a VARCHAR(2000) column, you cannot add that to the actual index (max. is 900 byte per entry) - but you can include it.
Having a column included in an index can be useful if you select for these two values - then if SQL Server finds a match for BarFlag, it can look up the corresponding BazID value in the leaf-level node of the index itself and it can save itself a trip back to the actual data page (a "bookmark lookup") to go grab that value from the data pages. This can be a massive boost for performance
And you're right - having an index just on BarFlag (BIT) really doesn't make sense - then again, that DMV only suggests indices - you're not supposed to blindly follow all its recommendations - you still need to think and consider if those are good recommendations (or not).
The INCLUDE keyword just means that the value of the included columns should be stored in the index itself so that for queries like the following:
SELECT BazID FROM MyTable WHERE BarFlag = #SomeValue
It isn't necessary to do an additional lookup on the table itself in order to find the value of BazID after doing an index seek.
This question already has answers here:
What are the differences between a clustered and a non-clustered index?
(13 answers)
Closed 7 years ago.
I need to add proper index to my tables and need some help.
I'm confused and need to clarify a few points:
Should I use index for non-int columns? Why/why not
I've read a lot about clustered and non-clustered index yet I still can't decide when to use one over the other. A good example would help me and a lot of other developers.
I know that I shouldn't use indexes for columns or tables that are often updated. What else should I be careful about and how can I know that it is all good before going to test phase?
A clustered index alters the way that the rows are stored. When you create a clustered index on a column (or a number of columns), SQL server sorts the table’s rows by that column(s). It is like a dictionary, where all words are sorted in alphabetical order in the entire book.
A non-clustered index, on the other hand, does not alter the way the rows are stored in the table. It creates a completely different object within the table that contains the column(s) selected for indexing and a pointer back to the table’s rows containing the data. It is like an index in the last pages of a book, where keywords are sorted and contain the page number to the material of the book for faster reference.
You really need to keep two issues apart:
1) the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.
2) the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.
By default, the primary key on a SQL Server table is also used as the clustering key - but that doesn't need to be that way!
One rule of thumb I would apply is this: any "regular" table (one that you use to store data in, that is a lookup table etc.) should have a clustering key. There's really no point not to have a clustering key. Actually, contrary to common believe, having a clustering key actually speeds up all the common operations - even inserts and deletes (since the table organization is different and usually better than with a heap - a table without a clustering key).
Kimberly Tripp, the Queen of Indexing has a great many excellent articles on the topic of why to have a clustering key, and what kind of columns to best use as your clustering key. Since you only get one per table, it's of utmost importance to pick the right clustering key - and not just any clustering key.
GUIDs as PRIMARY KEY and/or clustered key
The clustered index debate continues
Ever-increasing clustering key - the Clustered Index Debate..........again!
Disk space is cheap - that's not the point!
Marc
You should be using indexes to help SQL server performance. Usually that implies that columns that are used to find rows in a table are indexed.
Clustered indexes makes SQL server order the rows on disk according to the index order. This implies that if you access data in the order of a clustered index, then the data will be present on disk in the correct order. However if the column(s) that have a clustered index is frequently changed, then the row(s) will move around on disk, causing overhead - which generally is not a good idea.
Having many indexes is not good either. They cost to maintain. So start out with the obvious ones, and then profile to see which ones you miss and would benefit from. You do not need them from start, they can be added later on.
Most column datatypes can be used when indexing, but it is better to have small columns indexed than large. Also it is common to create indexes on groups of columns (e.g. country + city + street).
Also you will not notice performance issues until you have quite a bit of data in your tables. And another thing to think about is that SQL server needs statistics to do its query optimizations the right way, so make sure that you do generate that.
A comparison of a non-clustered index with a clustered index with an example
As an example of a non-clustered index, let’s say that we have a non-clustered index on the EmployeeID column. A non-clustered index will store both the value of the
EmployeeID
AND a pointer to the row in the Employee table where that value is actually stored. But a clustered index, on the other hand, will actually store the row data for a particular EmployeeID – so if you are running a query that looks for an EmployeeID of 15, the data from other columns in the table like
EmployeeName, EmployeeAddress, etc
. will all actually be stored in the leaf node of the clustered index itself.
This means that with a non-clustered index extra work is required to follow that pointer to the row in the table to retrieve any other desired values, as opposed to a clustered index which can just access the row directly since it is being stored in the same order as the clustered index itself. So, reading from a clustered index is generally faster than reading from a non-clustered index.
In general, use an index on a column that's going to be used (a lot) to search the table, such as a primary key (which by default has a clustered index). For example, if you have the query (in pseudocode)
SELECT * FROM FOO WHERE FOO.BAR = 2
You might want to put an index on FOO.BAR. A clustered index should be used on a column that will be used for sorting. A clustered index is used to sort the rows on disk, so you can only have one per table. For example if you have the query
SELECT * FROM FOO ORDER BY FOO.BAR ASCENDING
You might want to consider a clustered index on FOO.BAR.
Probably the most important consideration is how much time your queries are taking. If a query doesn't take much time or isn't used very often, it may not be worth adding indexes. As always, profile first, then optimize. SQL Server Studio can give you suggestions on where to optimize, and MSDN has some information1 that you might find useful
faster to read than non cluster as data is physically storted in index order
we can create only one per table.(cluster index)
quicker for insert and update operation than a cluster index.
we can create n number of non cluster index.
[11] tells:
"In a nonclustered index, the leaf level does not contain all the data. In addition to the key values, each index row in the leaf level (the lowest level of the tree) contains a bookmark that tells SQL Server where to find the data row corresponding to the key in the index.
A bookmark can take one of two forms. If the table has a clustered index, the bookmark is the clustered index key for the corresponding data row
. If the table is a heap (in other words, it has no clustered index), the bookmark is a row identifier (RID), which is an actual row locator in the form File#:Page#:Slot#."
Is this a copy of clustered index key or nonclustered index has a pointer to it?
Should all clustered index structure, i.e. b-tree with intermediate data, be traversed to get to row data through non-clustered index bookmark on clustered table?
What does clustered index indtroduce that direct referencing become impossible?
Update:
Let me re-phrase the question. How this is done I can read myself but I want to understand why it is done this way.
Would not it be much more efficient to continue referencing row data by RID from non-clustered index having (added) clustered one?
Suppose a table has only non-clustered index(es) (but no clustered index).
Non-clustered index leaves contains RID to real data. For direct access of row data without any need of lookup/traversals.
Adding clustered index means elimination of IAM (Index Allocation Map) pages and substituting all RIDs of all non-clustered indexes by clustered index keys + necessity of additional lookup instead of direct access.
What is the point in this?
Update2:
Was my question downvoted by Microsoft himself? Thanks again, this is an honor.
It is pointless to downvote without explaining.
Update3:
#PerformanceDB", I could not understand the phrase in your answer:
""It also means the B-Tree is reduced by one level in index height (which is why they are tiny if you inspected them)."
Can you explain it?
Yes, I'd like illustrations.
I started to read: Debunking myths about clustered indexes - part 4 (CIXs, TPC-C & Oracle clusters) and it, as many other sources, explicitly refer to the fact that SQL Server, in contrat to Oracle, lacks direct access features on a clustered table.
Update4 (Update5 - corrected by strike-out):
A few answerers referred to the fact that a bookmark CI key in NCI leaf is for address independence in case of page splits.
Aren't during index reorganization or de-fragmenting in non-clustered table with CI NCI (non-clustered index) the rows relocated and corresponding RIDs in NCI change in NCI are modified?
This seems to me as addressing scheme deficiency - the row should have moved with its address, should have not it?
Also, Is heap completely immune to page splits? due to size increase of variable size data types in row
Related questions:
What do I miss in understanding the clustered index?
Why/when/how is whole clustered index scan chosen rather than full table scan?
Reasons not to have a clustered index in SQL Server 2005
Cited:
[11]
Inside Microsoft® SQL Server™ 2005: The Storage Engine
By Kalen Delaney - (Solid Quality Learning)
...............................................
Publisher: Microsoft Press
Pub Date: October 11, 2006
Print ISBN-10: 0-7356-2105-5
Print ISBN-13: 978-0-7356-2105-3
Pages: 464
[11a]
p.250 Section Index organization from Chapter 7. Index Internals and Management
Here is helpful online copypaste from it
http://sqlserverindexeorgnization.blogspot.com/
though without any credits to source
The problem is that the doco is gobbledegook, and increases the very confusion it is alleging it clarifies. If you forget about all that and start again, it is actually quite straight-forward. Since you are inquiring re data storage structures, and concerned re performance, let's look at that perspective (not the logical). There is no data storage structure caled "Table".
Heap
Data pages containing rows. There is no Clustered Index. The rows are not shifted as a result of Inserts/Deletes. The rows can be read in entirety (table scan) or singly (via a NonClustered Index). It gets badly fragmented.
Clustered Index
B-Tree. The Index is Clustered with the Data Rows. The leaf level is the data row. That means one less I/O on every access. It also means the B-Tree is reduced by one level in index height (which is why they are tiny if you inspected them). The Heap (entire data storage stucture) is eliminated. There are no pointers. The rows are maintained in Clustered Index Key order (rows are moved on the page as a result of Insert/Delete/Expand). Pages are trimmed within the extents.
NonClustered Index
B-Tree. Full height as required by number of rows.
Where there is a Clustered Index, the Leaf level is the Clustered Index Key (so that it can go to the exact location in the CI, which is the row).
Where there is no Clustered Index, the Leaf level is a pointer: File:Page:Offset (so that it can go to the Heap, and get the row). The RowIds in the Heap do not change (if they did, every time you Inserted/Deleted one row, you would have to update all the NCI entries in all associated NCIs, for all other rows on the page).
That is why, when you create a CI, all NCIs are automatically rebuilt (they have to be switched from [2] to 1). Obviously, always create the CI before the NCIs.
There is no File:Page:Slot, the row length is variable, it is Offset within Page.
There is no Bookmark or other goobledegook.
Re "No direct access to data row in clustered table - why"
Nonsense. You have direct and immediate access to each data row, via the CI (one less I/O) or the NCI⇢CI Key.
This is very fast, invented by Britton Lee; re-implemented and patented by Sybase; obtained by dishonesty and for a pittance by Darth Vader.
If you need further clarification, I can provide illustrations.
Responses to Comments
"It also means the B-Tree is reduced by one level in index height (which is why they are tiny if you inspected them)."
Let's say you have a tables with 1 billion rows. The "height" of the B-Tree of any given index (eg. Unique, on PK) drawn vertically is say 8; or you can say the index is 8 levels deep, between the top (a single entry) and the bottom, the leaf level. the leaf level is of course the widest, and most polpulated; it will have 1 billion entries. Given that each index page contains say 256 entries, the leaf-minus-one level contains 390K entries.
The CI B-tree (index only portion) will contain 7 levels, 390K entries, taking 10MB; because the leaf level IS the data row (of which there are 1 billion entries, spread nicely across 100GB), and is thus excluded, or not repeated.
Yes, I'd like illustrations.
Ok. I have a set of finished Sybase docs; I have butchered one for you, so as to avoid confusion, and excluded the bits that Sybase has, that MS does not. Sorry. Don't follow the links, just stay on the one page. Also the very low levels of Fragmentation in the heap are different by the fragmentation in the Heap is massive, in both Sybase and MS, so I have left that intact.
Data Storage Basics
(That is a condensed version of my much more elaborate Sybase diagrams, which I have butchered for the MS context. There is a link at the bottom of that doc, if you want the full Sybase set.)
"I started to read: Debunking myths about clustered indexes - part 4 (CIXs, TPC-C & Oracle clusters) and it, as many other sources, explicitly refer to the fact that SQL Server, in contrat to Oracle, lacks direct access features on a clustered table."
Be careful what you read, the web is full of superficial information; half truths discussed out of context; misinformation (from vendiors as well as well-meaning ignorants). As you notice, I just answer questions; I do not waste time answering points raised in references.
Just think about this. Well-implemented Tables with a CI do not need de-fragmentation; and when implemented badly, need infrequent de-fragmentation; tables without a CI need frequent and pretty much offline de-fragmentation. That's your maintenance window running into Monday morning. Just an example of why discussing items in isolation is actually misinformation. Which is why my docs are all linked and related to one another.
"A few answerers referred to the fact that CI key in NCI leaf is for address independence in case of page splits."
Yes, I just would not put it that way, that's as confusing as the first reference you posted. Page splits have nothing to do with it. I put is the way I did in my post above on purpose, for clarity. Since the rows move (the CI keeps the pages and Extents trim), the NCI MUST have the CI key, in order to find the row. It can't use a RowId which would keep changing all the time. Unless you have wide CI keys, this is no big deal; a 4-byte RowId (plus processing overhead) vs an 8-byte CI key (minus said overhead) ... who cares (ok, maybe you). Address the higher level issues, and the low level issues will be small enough to not warrant address. Squeezing out 1% performance improvement at the low level when your db is fragmented and unnormalised, is amore than a bit silly.
A system in an integrated set of components, none can be changed or evaluated in isolation. A bunch of components that are not integrated are dis-integrated, not a system. At your level of questioning, you are not yet in a position to form conclusions, or have grudges against this or that, if you do, they are premature conclusions and grudges, that will impede your progress. On top of that, there is a big difference between knowledge gained by question-and-answer vs knowledge gained by reading plus experience.
"Aren't during undex reotganization or defragmanting of non-clustered table with CI the rows relocated and corresponding RIDs in NCI change in NCI?"
Do you mean "non-clustered INDEX with CI" ? Well the NCIs are not worth de-fragmenting, just drop/create them.
Or do you mean "de-fragmenting a CI [whole table]" ? I have already posted, when you re-create the CI (or de-fragment it in place), the NCIs are automatically rebuilt. It is not about RowIds, it is about the change: when you drop the CI, the NCIs have to be rewritten from CI keys to RowIds; when you create the CI, the NCIs have to the changed back to CI Keys. Switched on DBAs drop the NCI before dropping the CI.
"This seems to me as addressing scheme deficiency - the row should have moved with its address, should have not it?"
You're getting too low-level without understanding the higher levels. If the row moves, its address changes; if the address changes, the row moves. Either you have a CI (rows move) xor you have a Heap (rows do not move).
"Also, Is heap completely immune to page splits?"
No. Page Splits still happen when variable length rows expand and there is no room on the page. But in the scheme of things, massive fragmentation on Heaps, due to never moving rows, due to it being RowId based (which the NCIs rely on), this is a small item.
Let me re-phrase the question. How
this is done I can read myself but I
want to understand why it is done this
way.
Would not it be much more efficient to
continue referencing row data by RID
from non-clustered index having
(added) clustered one?
NO ! If a table has an insert, and a page split occurs, then you would have to potentially update a lot of references that use a RID to point to the new locations of those data rows that have been moved to a new page in SQL Server. That's exactly why the SQL Server team chose to use the clustering key instead, as the "data pointer", so to speak. The value of the clustering key does not change when a page split occurs, so no update to indices are needed.
Would not it be much more efficient to continue referencing row data by RID from non-clustered index having (added) clustered one?
The whole point of a clustered index is that the records are accessed via a logical locator (which is not normally meant to change), not physical.
If the indexes were pointing to a physical RID and a row changed its physical location (say from a page split), all indexes would need to be updated too.
It's exactly the kind of problem the clustered indexes were invented to deal with.
If a table has a clustered index, each non-clustered index row contains a copy of the clustered index key.
If a table does not have a clustered index, i.e. the table is a heap, each non-clustered index row contains a pointer built from the file identifier (ID), page number, and number of the row on the page. The whole pointer is known as a Row ID (RID).
When you identify (select) a row using a clustered index, you have all the columns from the row.
When you identify a row in a non-clustered index, you need to perfrom another lookup step to obtain the columns not included in the non-clustered index.
Would not it be much more efficient to
continue referencing row data by RID
from non-clustered index having
(added) clustered one?
In many cases it would be more efficient, yes. I believe that clustered indexes were originally implemented that way (in version 6.0?). Presumably they were changed for the reasons that marc_s mentioned, which make sense if your clustered index is such that it has a lot of page splits.
I would not have posted this (my) question, have I seen before my posting here that answer by AlexSmith there, which I saw just a few minutes after posting and having been already answered here:
"Greg Linwood wrote a great series of blogs. It is a must read: Debunking myths about clustered indexes"
It is pity, it is impossible to accept it there as an answer here
Update:
The accepted here answer by PerformanceDBA told: "The problem is that the doco is gobbledegook, and increases the very confusion it is alleging it clarifies"
Well, all msdn docs tell and show, for ex., cf. pictures from Clustered Index Structures vs. "Heap Structures" that clustered table does not have IAM page. Meanwhile, the output from following the code from Inside the Storage Engine: Using DBCC PAGE and DBCC IND to find out if page splits ever roll back shows the opposite.
Having no desire to continue spamming here I shifted my questioning and participation to /www.sqlservercentral.com/Forums
My related question there:
Does clustered table have IAM page?
If I'm am trying to squeeze every last drop of performance out of a query what affect does having these types of index's being used by my joins.
clustered index.
non-clustered index.
clustered or non-clustered index with extra columns that may not be involved in the join.
Will I gain any performance if I go through and create clustered index's that only contain the columns involved in my joins and nothing else?
(I realize I may have to move the clustered index from another index(making that index non-clustered) since it can only have one.)
In addition to Gareth Saul's answer a tiny clarification:
Non-clustered indexes repeat the
included fields, with pointer to the
rows that have that value.
This pointer to the actual data value is the column (or the set of columns) that are in your clustering key.
That's one of the main reasons why you should try and keep the clustering key small and static - small because otherwise you'll waste a lot of space, on disk and in your server's RAM, and static because otherwise, you'll have to update not just your clustering index, but also all your non-clustered indices as well, if your value changes.
This "lookup pointer is the clustering key" feature has been in SQL Server since version 7, as Kim Tripp will explain in great detail here:
What is a clustered index?
In SQL Server 7.0 and higher the
internal dependencies on the
clustering key CHANGED. (Yes, it's
important to know that things CHANGED
in 7.0... why? Because there are still
some folks out there that don't
realize how RADICAL of a change
occurred in the internals (wrt to the
clustering key) in SQL Server 7.0).
What changed is that the clustering
key gets used as the "lookup" value
from the nonclustered indexes.
Will I gain any performance if I go through and create clustered index's that only contain the columns involved in my joins and nothing else?
Not as I understand. The point of a clustered index is that it then sorts the data on disk around that index (hence why you can only have the one), so if your join data isn't being sorted by those exact columns as well, I don't think it'd make any difference. Plus by putting data that might change (as opposed to the key) into the clustered index, you make it more likely that things will need rebuilding peridically, slowing the overall database down.
Sorry if this sounds a daft question, but have you tried running your query through the index tuning wizard? Not foolproof by any stretch but I've had some decent improvements from it in the past.
You only get one clustered index - this is what controls the physical storage of the table on disk / in memory.
Non-clustered indexes repeat the included fields, with pointer to the rows that have that value. Having an index on the columns being used in your joins should improve performance. You can further optimise by using "included columns" in your index - this duplicates the row information directly into the index, which can remove the performance penalty of having to look up the row itself to perform the select.
It is useful to pay attention to the order in which your joins occur - the sequence of columns in your index should match up to this. Remember that the SQL engine may optimise and re-order your query internally - profiling may be helpful.
In most situations, you can just use the Database Engine Tuning Advisor - the recommendations it provides are pretty much spot on.
If you can your best bet is for a non-clustered index that has all the element of your join in it and if possible the field you are selecting.
This will create a spanning index meaning that all the fields SQL requires to perform are on one index.
If possible have an index which has no unnessasery field in it. Every field added makes the an individual index record larger, the smaller each index record the more you get in each Page. The more index items you get in each page the less you have to go to the Disk.
Clustered Index - Will mean the table is layed out in the order specified in the Index, this means that you will get better performance for select * from TABLE where INDEXFIELD = 3. Unless you are selecting lots of large data items this should not be required.