Is there benefit to index base tables of an indexed view? - sql-server

After I created the indexed view, I tried disabling all the indexes in base tables including the indexes for foreign key column (constraint is still there) and the query plan for the view stays the same.
It is just like magic to me that the indexed view would be able to optimize the query so much even without base table being indexed. Even without any index on the View, SQL Server is able to do an index scan on the primary key index of the indexed view to retrieve data like 1000 times faster than using the base table.
Something like SELECT * FROM MyView WITH(NOEXPAND) WHERE NotIndexedColumn = 5 ORDER BY NotIndexedColumn
So the first two questions are:
Is there any benefit to index base tables of indexed view?
What is Sql server doing when it is doing a index scan on the PK while the constraint is on a not indexed column?
Then I noticed that if I use full-text search + order by I would see a table spool (eager spool) in the query plan with a cost like 95%.
Query looks like SELECT ID FROM View WITH(NOEXPAND) WHERE CONTAINS(IndexedColumn, '"SomeText*"') ORDER BY IndexedColumn
Question n° 3:
Is there any index I could add to get rid of that operation?

It's important to understand that an indexed view is a "materialized view" and the results are stored onto disk.
So the speedup you are seeing is the actual result of the query you are seeing stored to disk.
To answer your questions:
1) Is there any benefit to index base tables of indexed view?
This is situational. If your view is flattening out data or having many extra aggregate columns, then an indexed view is better than the table. If you are just using your indexed view like such
SELECT * FROM foo WHERE createdDate > getDate() then probably not.
But if you are doing SELECT sum(price),min(id) FROM x GROUP BY id,price then the indexed view would probably be better. Granted, you are doing a more complex query with joins and other advanced options.
2) What is Sql server doing when it is doing a index scan on the PK while the constraint is on a not indexed column?
First we need to understand how clustered indexes are stored. The index is stored in a B-tree. So SQL Server is walking the tree finding all values that match your criteria when you are searching on a clustered index Depending on how you have your indexes set up i.e covering vs non covering and how your non-clustered indexes are set up will determine what the Pages and Extents look like. Without more knowledge of the table structure I can't help you understand what the scan is actually doing.
3)Is there any index I could add to get rid of that operation?
Just because something is taking 95% of the query's time doesn't make that a bad thing. The query time needs to add up to 100%, so no matter what you do there is always going to be something taking up a large percentage of time. What you need to check is the IO reads and how much time the query itself takes.
To determine this, you need to understand that SQL Server caches the results of queries. With this in mind, you can have a query take a long time the first time but afterward since the data itself is cached it would be much quicker. It all depends on the frequency of the query and how your system is set up.
For a more in-depth read on indexed view

Related

What really happens internally when an update happens on a record with indexed column in sql server

I know that when a table has indexed column(s), sql server duplicates the data of these columns so that it can be accessed fast without looking through every record. And if the index is covered with other columns then all these included columns are also stored along with the indexed columns.
So, I am assuming when an update happens on any of the indexed columns or included columns then it is obvious that the update should happen in both the actual record location and the index location. This point looks interesting to me because if a table is expected to have more updates than searches, then wouldn't it be overhead to have the index? I wanted to confirm on this and also would like to know the internals on what actually happens behind the screen when a update happens.
Yes, you have it correct. There is a trade-off when adding indexes. They have the potential to make selects faster, but they will make updates/inserts/deletes slower.
When you design an index, consider the following database guidelines:
Large numbers of indexes on a table affect the performance of INSERT, UPDATE, DELETE, and MERGE statements because all indexes must be adjusted appropriately as data in the table changes. For example, if a column is used in several indexes and you execute an UPDATE statement that modifies that column's data, each index that contains that column must be updated as well as the column in the underlying base table (heap or clustered index).
Avoid over-indexing heavily updated tables and keep indexes narrow, that is, with as few columns as possible.
Use many indexes to improve query performance on tables with low update requirements, but large volumes of data. Large numbers of indexes can help the performance of queries that do not modify data, such as SELECT statements, because the query optimizer has more indexes to choose from to determine the fastest access method.
Indexing small tables may not be optimal because it can take the query optimizer longer to traverse the index searching for data than to perform a simple table scan. Therefore, indexes on small tables might never be used, but must still be maintained as data in the table changes.
For more information have a look at the below link:-
http://technet.microsoft.com/en-us/library/jj835095(v=sql.110).aspx

Create more than one non clustered index on same column in SQL Server

What is the index creating strategy?
Is it possible to create more than one non-clustered index on the same column in SQL Server?
How about creating clustered and non-clustered on same column?
Very sorry, but indexing is very confusing to me.
Is there any way to find out the estimated query execution time in SQL Server?
The words are rather logical and you'll learn them quite quickly. :)
In layman's terms, SEEK implies seeking out precise locations for records, which is what the SQL Server does when the column you're searching in is indexed, and your filter (the WHERE condition) is accurrate enough.
SCAN means a larger range of rows where the query execution planner estimates it's faster to fetch a whole range as opposed to individually seeking each value.
And yes, you can have multiple indexes on the same field, and sometimes it can be a very good idea. Play out with the indexes and use the query execution planner to determine what happens (shortcut in SSMS: Ctrl + M). You can even run two versions of the same query and the execution planner will easily show you how much resources and time is taken by each, making optimization quite easy.
But to expand on these a bit, say you have an address table like so, and it has over 1 billion records:
CREATE TABLE ADDRESS
(ADDRESS_ID INT -- CLUSTERED primary key ADRESS_PK_IDX
, PERSON_ID INT -- FOREIGN KEY, NONCLUSTERED INDEX ADDRESS_PERSON_IDX
, CITY VARCHAR(256)
, MARKED_FOR_CHECKUP BIT
, **+n^10 different other columns...**)
Now, if you want to find all the address information for person 12345, the index on PERSON_ID is perfect. Since the table has loads of other data on the same row, it would be inefficient and space-consuming to create a nonclustered index to cover all other columns as well as PERSON_ID. In this case, SQL Server will execute an index SEEK on the index in PERSON_ID, then use that to do a Key Lookup on the clustered index in ADDRESS_ID, and from there return all the data in all other columns on that same row.
However, say you want to search for all the persons in a city, but you don't need other address information. This time, the most effective way would be to create an index on CITY and use INCLUDE option to cover PERSON_ID as well. That way, a single index seek / scan would return all the information you need without the need to resort to checking the CLUSTERED index for the PERSON_ID data on the same row.
Now, let's say both of those queries are required but still rather heavy because of the 1 billion records. But there's one special query that needs to be really really fast. That query wants all the persons on addresses that have been MARKED_FOR_CHECKUP, and who must live in New York (ignore whatever checkup means, that doesn't matter). Now you might want to create a third, filtered index on MARKED_FOR_CHECKUP and CITY, with INCLUDE covering PERSON_ID, and with a filter saying CITY = 'New York' and MARKED_FOR_CHECKUP = 1. This index would be insanely fast, as it only ever cover queries that satisfy those exact conditions, and therefore has a fraction of the data to go through compared to the other indexes.
(Disclaimer here, bear in mind that the query execution planner is not stupid, it can use multiple nonclustered indexes together to produce the correct results, so the examples above may not be the best ones available as it's very hard to imagine when you would need 3 different indexes covering the same column, but I'm sure you get the idea.)
The types of index, their columns, included columns, sorting orders, filters etc depend entirely on the situation. You will need to make covering indexes to satisfy several different types of queries, as well as customized indexes created specifically for singular, important queries. Each index takes up space on the HDD so making useless indexes is wasteful and requires extra maintenance whenever the data model changes, and wastes time in defragmentation and statistics update operations though... so you don't want to just slap an index on everything either.
Experiment, learn and work out which works best for your needs.
I'm not the expert on indexing either, but here is what I know.
You can have only ONE Clustered Index per table.
You can have up to a certain limit of non clustered indexes per table. Refer to http://social.msdn.microsoft.com/Forums/en-US/63ba3877-e0bd-4417-a04b-19c3bfb02ac9/maximum-number-of-index-per-table-max-no-of-columns-in-noncluster-index-in-sql-server?forum=transactsql
Indexes should just have different names, but its better not to use the same column(s) on a lot of different indexes as you will run into some performance problems.
A very important point to remember is that Indexes although it makes your select faster, influence your Insert/Update/Delete speed as the information needs to be added to the index, which means that the more indexes you have on a column that gets updated a lot, will drastically reduce the speed of the update.
You can include columns that is used on a CLUSTERED index in one or more NON-CLUSTERED indexes.
Here is some more reading material
http://www.sqlteam.com/article/sql-server-indexes-the-basics
http://www.programmerinterview.com/index.php/database-sql/what-is-an-index/
EDIT
Another point to remember is that an index takes up space just like the table. The more indexes you create the more space it uses, so try not to use char/varchar (or nchar/nvarchar) in an index. It uses to much space in the index, and on huge columns give basically no benefit. When your Indexes start to become bigger than your table, it also means that you have to relook your index strategy.

Sql Server 2005 Indexed View

I have created a unique, clustered index on a view. The clustered index contains 5 columns (out of the 30 on this view), but a typical select using this view will want all 30 columns.
Doing some testing shows that the time it takes to query for the 5 columns is way faster than all 30 columns. Is this because that is just natural overhead regarding selecting on 6x as many columns, or because the indexed view is not storing the non-indexed columns in a temp table, and therefore needs to perform some extra steps to gather the missing columns (joins on base tables I guess?)
If the latter, what are some steps to prevent this? Well, even if the former... what are some ways around this!
Edit: for comparison purposes, a select on the indexed view with just the 5 columns is about 10x faster than the same query on the base tables. But a select on all columns is basically equivalent in speed to the query on the base tables.
A clustered index, by definition, contains every field in every row in the table. It basically is a recreation of the table, but with the physical data pages in order by the clustered index, with b-tree sorting to allow quick access to specified values of your clustered key(s).
Are you just pulling values or are you getting aggregate functions like MIN(), MAX(), AVG(), SUM() for the other 25 fields?
An indexed view is a copy of the data, stored (clustered) potentially (and normally) in a different way to the base table. For all purposes
you now have two copies of the data
SQL Server is smart enough to see that the view and table are aliases of each other
for queries that involve only the columns in the indexed view
if the indexed view contains all columns, it is considered a full alias and can be used (substituted) by the optimizer wherever the table is queried
the indexed view can be used as just another index for the base table
When you select only the 5 columns from tbl (which has an indexed view ivw)
SQL Server completely ignores your table, and just gives you data from ivw
because the data pages are shorter (5 columns only), more records can be grabbed into memory in each page retrieval, so you get a 5x increase in speed
When you select all 30 columns - there is no way for the indexed view to be helpful. The query completely ignores the view, and just selects data from the base table.
IF you select data from all 30 columns,
but the query filters on the first 4 columns of the indexed view,*
and the filter is very selective (will result in a very small subset of records)
SQL Server can use the indexed view (scanning/seeking) to quickly generate a small result set, which it can then use to JOIN back to the base table to get the rest of the data.
However, similarly to regular indexes, an index on (a,b,c,d,e) or in this case clustered indexed view on (a,b,c,d,e) does NOT help a query that searches on (b,d,e) because they are not the first columns in the index.

Index Seek with Bookmark Lookup Only Option for SQL Query?

I am working on optimizing a SQL query that goes against a very wide table in a legacy system. I am not able to narrow the table at this point for various reasons.
My query is running slowly because it does an Index Seek on an Index I've created, and then uses a Bookmark Lookup to find the additional columns it needs that do not exist in the Index. The bookmark lookup takes 42% of the query time (according to the query optimizer).
The table has 38 columns, some of which are nvarchars, so I cannot make a covering index that includes all the columns. I have tried to take advantage of index intersection by creating indexes that cover all the columns, however those "covering" indexes are not picked up by the execution plan and are not used.
Also, since 28 of the 38 columns are pulled out via this query, I'd have 28/38 of the columns in the table stored in these covering indexes, so I'm not sure how much this would help.
Do you think a Bookmark Lookup is as good as it is going to get, or what would another option be?
(I should specify that this is SQL Server 2000)
OH,
the covering index with include should work. Another option might be to create a clustered indexed view containing only the columns you need.
Regards,
Lieven
You could create an index with included columns as another option
example from BOL, this is for 2005 and up
CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID);
To answer this part "I have tried to take advantage of index intersection by creating indexes that cover all the columns, however those "covering" indexes are not picked up by the execution plan and are not used."
An index can only be used when the query is created in a way that it is sargable, in other words if you use function on the left side of the operator or leave out the first column of the index in your WHERE clause then the index won't be used. If the selectivity of the index is low then also the index won't be used
Check out SQL Server covering indexes for some more info

What columns generally make good indexes?

As a follow up to "What are indexes and how can I use them to optimise queries in my database?" where I am attempting to learn about indexes, what columns are good index candidates? Specifically for an MS SQL database?
After some googling, everything I have read suggests that columns that are generally increasing and unique make a good index (things like MySQL's auto_increment), I understand this, but I am using MS SQL and I am using GUIDs for primary keys, so it seems that indexes would not benefit GUID columns...
Indexes can play an important role in query optimization and searching the results speedily from tables. The most important step is to select which columns are to be indexed. There are two major places where we can consider indexing: columns referenced in the WHERE clause and columns used in JOIN clauses. In short, such columns should be indexed against which you are required to search particular records. Suppose, we have a table named buyers where the SELECT query uses indexes like below:
SELECT
buyer_id /* no need to index */
FROM buyers
WHERE first_name='Tariq' /* consider indexing */
AND last_name='Iqbal' /* consider indexing */
Since "buyer_id" is referenced in the SELECT portion, MySQL will not use it to limit the chosen rows. Hence, there is no great need to index it. The below is another example little different from the above one:
SELECT
buyers.buyer_id, /* no need to index */
country.name /* no need to index */
FROM buyers LEFT JOIN country
ON buyers.country_id=country.country_id /* consider indexing */
WHERE
first_name='Tariq' /* consider indexing */
AND
last_name='Iqbal' /* consider indexing */
According to the above queries first_name, last_name columns can be indexed as they are located in the WHERE clause. Also an additional field, country_id from country table, can be considered for indexing because it is in a JOIN clause. So indexing can be considered on every field in the WHERE clause or a JOIN clause.
The following list also offers a few tips that you should always keep in mind when intend to create indexes into your tables:
Only index those columns that are required in WHERE and ORDER BY clauses. Indexing columns in abundance will result in some disadvantages.
Try to take benefit of "index prefix" or "multi-columns index" feature of MySQL. If you create an index such as INDEX(first_name, last_name), don’t create INDEX(first_name). However, "index prefix" or "multi-columns index" is not recommended in all search cases.
Use the NOT NULL attribute for those columns in which you consider the indexing, so that NULL values will never be stored.
Use the --log-long-format option to log queries that aren’t using indexes. In this way, you can examine this log file and adjust your queries accordingly.
The EXPLAIN statement helps you to reveal that how MySQL will execute a query. It shows how and in what order tables are joined. This can be much useful for determining how to write optimized queries, and whether the columns are needed to be indexed.
Update (23 Feb'15):
Any index (good/bad) increases insert and update time.
Depending on your indexes (number of indexes and type), result is searched. If your search time is gonna increase because of index then that's bad index.
Likely in any book, "Index Page" could have chapter start page, topic page number starts, also sub topic page starts. Some clarification in Index page helps but more detailed index might confuse you or scare you. Indexes are also having memory.
Index selection should be wise. Keep in mind not all columns would require index.
Some folks answered a similar question here: How do you know what a good index is?
Basically, it really depends on how you will be querying your data. You want an index that quickly identifies a small subset of your dataset that is relevant to a query. If you never query by datestamp, you don't need an index on it, even if it's mostly unique. If all you do is get events that happened in a certain date range, you definitely want one. In most cases, an index on gender is pointless -- but if all you do is get stats about all males, and separately, about all females, it might be worth your while to create one. Figure out what your query patterns will be, and access to which parameter narrows the search space the most, and that's your best index.
Also consider the kind of index you make -- B-trees are good for most things and allow range queries, but hash indexes get you straight to the point (but don't allow ranges). Other types of indexes have other pros and cons.
Good luck!
It all depends on what queries you expect to ask about the tables. If you ask for all rows with a certain value for column X, you will have to do a full table scan if an index can't be used.
Indexes will be useful if:
The column or columns have a high degree of uniqueness
You frequently need to look for a certain value or range of values for
the column.
They will not be useful if:
You are selecting a large % (>10-20%) of the rows in the table
The additional space usage is an issue
You want to maximize insert performance. Every index on a table reduces insert and update performance because they must be updated each time the data changes.
Primary key columns are typically great for indexing because they are unique and are often used to lookup rows.
Any column that is going to be regularly used to extract data from the table should be indexed.
This includes:
foreign keys -
select * from tblOrder where status_id=:v_outstanding
descriptive fields -
select * from tblCust where Surname like "O'Brian%"
The columns do not need to be unique. In fact you can get really good performance from a binary index when searching for exceptions.
select * from tblOrder where paidYN='N'
In general (I don't use mssql so can't comment specifically), primary keys make good indexes. They are unique and must have a value specified. (Also, primary keys make such good indexes that they normally have an index created automatically.)
An index is effectively a copy of the column which has been sorted to allow binary search (which is much faster than linear search). Database systems may use various tricks to speed up search even more, particularly if the data is more complex than a simple number.
My suggestion would be to not use any indexes initially and profile your queries. If a particular query (such as searching for people by surname, for example) is run very often, try creating an index over the relevate attributes and profile again. If there is a noticeable speed-up on queries and a negligible slow-down on insertions and updates, keep the index.
(Apologies if I'm repeating stuff mentioned in your other question, I hadn't come across it previously.)
It really depends on your queries. For example, if you almost only write to a table then it is best not to have any indexes, they just slow down the writes and never get used. Any column you are using to join with another table is a good candidate for an index.
Also, read about the Missing Indexes feature. It monitors the actual queries being used against your database and can tell you what indexes would have improved the performace.
Your primary key should always be an index. (I'd be surprised if it weren't automatically indexed by MS SQL, in fact.) You should also index columns you SELECT or ORDER by frequently; their purpose is both quick lookup of a single value and faster sorting.
The only real danger in indexing too many columns is slowing down changes to rows in large tables, as the indexes all need updating too. If you're really not sure what to index, just time your slowest queries, look at what columns are being used most often, and index them. Then see how much faster they are.
Numeric data types which are ordered in ascending or descending order are good indexes for multiple reasons. First, numbers are generally faster to evaluate than strings (varchar, char, nvarchar, etc). Second, if your values aren't ordered, rows and/or pages may need to be shuffled about to update your index. That's additional overhead.
If you're using SQL Server 2005 and set on using uniqueidentifiers (guids), and do NOT need them to be of a random nature, check out the sequential uniqueidentifier type.
Lastly, if you're talking about clustered indexes, you're talking about the sort of the physical data. If you have a string as your clustered index, that could get ugly.
A GUID column is not the best candidate for indexing. Indexes are best suited to columns with a data type that can be given some meaningful order, ie sorted (integer, date etc).
It does not matter if the data in a column is generally increasing. If you create an index on the column, the index will create it's own data structure that will simply reference the actual items in your table without concern for stored order (a non-clustered index). Then for example a binary search can be performed over your index data structure to provide fast retrieval.
It is also possible to create a "clustered index" that will physically reorder your data. However you can only have one of these per table, whereas you can have multiple non-clustered indexes.
The ol' rule of thumb was columns that are used a lot in WHERE, ORDER BY, and GROUP BY clauses, or any that seemed to be used in joins frequently. Keep in mind I'm referring to indexes, NOT Primary Key
Not to give a 'vanilla-ish' answer, but it truly depends on how you are accessing the data
It should be even faster if you are using a GUID.
Suppose you have the records
100
200
3000
....
If you have an index(binary search, you can find the physical location of the record you are looking for in O( lg n) time, instead of searching sequentially O(n) time. This is because you dont know what records you have in you table.
Best index depends on the contents of the table and what you are trying to accomplish.
Taken an example A member database with a Primary Key of the Members Social Security Numnber. We choose the S.S. because the application priamry referes to the individual in this way but you also want to create a search function that will utilize the members first and last name. I would then suggest creating a index over those two fields.
You should first find out what data you will be querying and then make the determination of which data you need indexed.

Resources