If we create an index to a view, we materialize the view.
Why is the view materialized when it is indexed? What does that signify, as opposed to a non-materialized view?
To my understanding, a normal view doesn't exist physically. Only its definition is stored, and each reference to the view executes the view definition all over again. So when we insert through a view, we insert directly into the table. Is that correct?
If the view is materialized, it becomes a physical table with its own data. In that case, are modifications to the base table no longer reflected in the view (which has been materialized and now lives its own life)?
Let's think about a table with a clustered index for a minute. When you choose your clustering key, SQL Server creates a B-tree, the leaves of which are the actual data rows. Non-clustered indexes work the same way, except the leaf nodes hold the index key plus your clustering key (so you can traverse the clustered index and get back to the actual data).
Extending the example, when you index a view, you first need to provide a clustered index. What would you expect to live at the leaves of that index? The data of course! :) And any non-clustered indexes on the view will behave exactly like their analogs on a physical table.
As to your question about a materialized view becoming stale, it doesn't. That is, SQL Server knows that the view relies on the table (which is why the view needs to be schema bound, so you can't drop one of its constituent tables), and so any DML operations against the constituent tables are also reflected in the view. You can convince yourself of this by creating an indexed view and then looking at the query plan of a simple update to one of the underlying tables. You should see a corresponding update for the indexed view.
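To try the experiment described above, something like the following could work. It is only a sketch: the table, view, and index names (dbo.DemoOrders, dbo.vOpenOrders, IX_vOpenOrders) are made up for illustration, GO is the batch separator used by SSMS/sqlcmd, and the session is assumed to have the default SET options that indexed views require (ANSI_NULLS, QUOTED_IDENTIFIER and so on).

```sql
-- Hypothetical demo objects; all names are illustrative only.
CREATE TABLE dbo.DemoOrders (
    OrderID int            NOT NULL PRIMARY KEY,
    Status  tinyint        NOT NULL,
    Amount  decimal(10, 2) NOT NULL
);
GO

-- SCHEMABINDING is required before a view can be indexed.
CREATE VIEW dbo.vOpenOrders
WITH SCHEMABINDING
AS
SELECT OrderID, Amount
FROM dbo.DemoOrders
WHERE Status = 1;
GO

-- The first index on a view must be unique and clustered.
-- Creating it is what materializes the view's rows.
CREATE UNIQUE CLUSTERED INDEX IX_vOpenOrders
    ON dbo.vOpenOrders (OrderID);
GO

INSERT INTO dbo.DemoOrders (OrderID, Status, Amount)
VALUES (1, 1, 10.00), (2, 0, 20.00);

-- Change the base table only...
UPDATE dbo.DemoOrders SET Status = 1 WHERE OrderID = 2;

-- ...then read the stored view rows. NOEXPAND makes the optimizer
-- use the view's own index instead of expanding the definition.
SELECT OrderID, Amount
FROM dbo.vOpenOrders WITH (NOEXPAND);
```

Both orders should come back from the last query, and if you display the actual execution plan for the UPDATE you should see the view's clustered index being maintained alongside the base table.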
A view is simply a select statement that is saved and can be selected from, for convenience. Inserting/updating through a view does go directly to the table to perform its operation.
An indexed view is stored, indexed, just as a table.
After I created the indexed view, I tried disabling all the indexes in base tables including the indexes for foreign key column (constraint is still there) and the query plan for the view stays the same.
It is just like magic to me that the indexed view is able to optimize the query so much even without the base tables being indexed. Even without any additional index on the view, SQL Server is able to do an index scan on the primary key index of the indexed view and retrieve data something like 1000 times faster than using the base table.
Something like SELECT * FROM MyView WITH(NOEXPAND) WHERE NotIndexedColumn = 5 ORDER BY NotIndexedColumn
So the first two questions are:
Is there any benefit to index base tables of indexed view?
What is SQL Server doing when it does an index scan on the PK while the constraint is on a non-indexed column?
Then I noticed that if I use full-text search + order by I would see a table spool (eager spool) in the query plan with a cost like 95%.
Query looks like SELECT ID FROM View WITH(NOEXPAND) WHERE CONTAINS(IndexedColumn, '"SomeText*"') ORDER BY IndexedColumn
Question n° 3:
Is there any index I could add to get rid of that operation?
It's important to understand that an indexed view is a "materialized view": its results are stored on disk.
So the speedup you are seeing comes from reading the stored result of the view's query from disk instead of recomputing it.
To answer your questions:
1) Is there any benefit to index base tables of indexed view?
This is situational. If your view flattens out data or has many extra aggregate columns, then the indexed view is better than the table. If you are just using your indexed view like this:
SELECT * FROM foo WHERE createdDate > getDate() then probably not.
But if you are doing something like SELECT id, SUM(price), COUNT_BIG(*) FROM x GROUP BY id then the indexed view would probably be better (note that an indexed view cannot contain MIN or MAX). Granted, this assumes a more complex query with joins and other advanced options.
2) What is SQL Server doing when it does an index scan on the PK while the constraint is on a non-indexed column?
First we need to understand how clustered indexes are stored. The index is stored in a B-tree, so when you search on a clustered index SQL Server walks the tree to find all values that match your criteria. In general, a scan with a predicate on a non-indexed column reads every row in the index and filters as it goes. What the pages and extents look like depends on how your indexes are set up, i.e. covering vs. non-covering, and on how your non-clustered indexes are defined. Without more knowledge of the table structure I can't help you understand what the scan is actually doing.
3)Is there any index I could add to get rid of that operation?
Just because an operator shows 95% of the plan's cost doesn't make it a bad thing. The percentages are estimated costs relative to the whole plan and must add up to 100%, so no matter what you do there is always going to be something taking up a large share. What you need to check is the IO reads and how much time the query itself takes.
To determine this, you need to understand that SQL Server caches data pages in memory (it does not cache query results). With this in mind, a query can take a long time the first time but be much quicker afterward, since the data it reads is already cached. It all depends on the frequency of the query and how your system is set up.
For a more in-depth read on indexed view
I want to create a materialized view in SQL Server with an extra auto-increment column named ID.
Is that possible?
No, that's not possible. The restrictions on indexed views prevent this.
The ID would not be stable anyway. It would change in unexpected ways when the underlying data changes. An indexed view is not an independent table; it reflects what the view definition says at all times.
Use something else as the key of the indexed view. Usually, there is a suitable combination of columns from the underlying tables.
Let's say that a column will only be used for joining (i.e. I won't be ordering on the column, nor will I search for specific values in the column individually) ... the only thing that I will use the column for is joining to another table.
If the database supports Hash Joins (which from my understanding don't benefit from indexes) .. then wouldn't the addition of an index be completely redundant? (and wasteful) ?
In SQL Server it will still prevent a Key Lookup.
If you JOIN on an unindexed field, the server needs to get the values for that field from the clustered index.
If you JOIN on a NC index, the values can be obtained directly without loading all the data pages from the clustered index (which really is the whole table).
So essentially you save yourself a lot of IO as the first step filters down based on a very narrow index instead of on the entire table loaded from disk.
I am going to design a Data Warehouse and I heard about materialized views. Actually I want to create a view and it should update automatically when base tables are changed. Can anyone explain with a query example?
They're called indexed views in SQL Server - read these white papers for more background:
Creating an Indexed View
Improving Performance with SQL Server 2008 Indexed Views
Basically, all you need to do is:
create a regular view (it must be defined WITH SCHEMABINDING)
create a unique clustered index on that view
and you're done!
The tricky part is: the view has to satisfy quite a number of constraints and limitations - those are outlined in the white paper. If you do this, that's all there is. The view is being updated automatically, no maintenance needed.
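Since the question asked for a query example, here is a minimal sketch of those two steps for a data-warehouse-style join. All object names (dbo.DimCustomer, dbo.FactSale, dbo.vSaleWithCustomer) are hypothetical, and it assumes the default SET options that indexed views require. Note the two-part table names and the explicit column list, both of which schema binding demands.

```sql
-- Hypothetical base tables; names and columns are illustrative only.
CREATE TABLE dbo.DimCustomer (
    CustomerID   int           NOT NULL PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL
);
CREATE TABLE dbo.FactSale (
    SaleID     int            NOT NULL PRIMARY KEY,
    CustomerID int            NOT NULL
        REFERENCES dbo.DimCustomer (CustomerID),
    Amount     decimal(10, 2) NOT NULL
);
GO

-- Step 1: a regular view, schema bound, inner joins only,
-- two-part names, no SELECT *.
CREATE VIEW dbo.vSaleWithCustomer
WITH SCHEMABINDING
AS
SELECT s.SaleID, s.Amount, c.CustomerID, c.CustomerName
FROM dbo.FactSale AS s
INNER JOIN dbo.DimCustomer AS c
    ON c.CustomerID = s.CustomerID;
GO

-- Step 2: a unique clustered index, which stores the joined rows.
CREATE UNIQUE CLUSTERED INDEX IX_vSaleWithCustomer
    ON dbo.vSaleWithCustomer (SaleID);
GO
```

From then on, inserts, updates and deletes against either base table are carried through to the stored view rows by SQL Server itself.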
Additional resources:
Creating and Optimizing Views in SQL Server
SQL Server Indexed Views
Although from a purely engineering perspective indexed views sound like something everybody could use to improve performance, the real-life scenario is very different. I have been unsuccessful in using indexed views where I most need them because of too many restrictions on what can be indexed and what cannot.
If you have outer joins in the views, they cannot be used. Also, common table expressions are not allowed... In fact if you have any ordering in subselects or derived tables (such as with partition by clause), you are out of luck too.
That leaves only very simple scenarios to be utilizing indexed views, something in my opinion can be optimized by creating proper indexes on underlying tables anyway.
I would be thrilled to hear some real-life scenarios where people have actually used indexed views to their benefit and could not have done without them.
You might need a bit more background on what a Materialized View actually is. In Oracle it is a single object; when you try to build one elsewhere, it consists of a number of elements.
An MVIEW is essentially a snapshot of data from another source. Unlike a view, the data is not computed when you query it; it is stored locally in the form of a table. The MVIEW is refreshed using a background procedure that kicks off at regular intervals or when the source data changes. Oracle allows for full or partial refreshes.
In SQL Server, I would use the following to create a basic MVIEW to (complete) refresh regularly.
First, a view. This should be easy for most since views are quite common in any database
Next, a table. This should be identical to the view in columns and data. This will store a snapshot of the view data.
Then, a procedure that truncates the table, and reloads it based on the current data in the view.
Finally, a job that triggers the procedure to start its work.
Everything else is experimentation.
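The four steps above could be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: dbo.Sales, dbo.vSalesByRegion, dbo.SalesByRegionSnapshot and dbo.RefreshSalesByRegionSnapshot are all made-up names, and the scheduling step is left as a manual EXEC because the job setup depends on your environment (SQL Server Agent is the usual choice).

```sql
-- Hypothetical source table for the example.
CREATE TABLE dbo.Sales (
    SaleID int            NOT NULL PRIMARY KEY,
    Region nvarchar(50)   NOT NULL,
    Amount decimal(10, 2) NOT NULL
);
GO

-- Step 1: an ordinary (non-indexed) view.
CREATE VIEW dbo.vSalesByRegion
AS
SELECT Region, SUM(Amount) AS TotalAmount
FROM dbo.Sales
GROUP BY Region;
GO

-- Step 2: an empty table with the same columns as the view.
SELECT Region, TotalAmount
INTO dbo.SalesByRegionSnapshot
FROM dbo.vSalesByRegion
WHERE 1 = 0;
GO

-- Step 3: a procedure that performs a complete refresh.
CREATE PROCEDURE dbo.RefreshSalesByRegionSnapshot
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- roll back the whole refresh on any error

    BEGIN TRANSACTION;
    TRUNCATE TABLE dbo.SalesByRegionSnapshot;
    INSERT INTO dbo.SalesByRegionSnapshot (Region, TotalAmount)
    SELECT Region, TotalAmount
    FROM dbo.vSalesByRegion;
    COMMIT TRANSACTION;
END;
GO

-- Step 4: run this from a scheduled job; shown here as a manual call.
EXEC dbo.RefreshSalesByRegionSnapshot;
```

Unlike an indexed view, the snapshot is only as fresh as the last run of the procedure.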
When indexed view is not an option, and quick updates are not necessary, you can create a hack cache table:
select * into cachetablename from myviewname
alter table cachetablename add primary key (columns)
-- OR alter table cachetablename add rid bigint identity primary key
create index...
then sp_rename view/table or change any queries or other views that reference it to point to the cache table.
schedule daily/nightly/weekly/whatnot refresh like
begin transaction
truncate table cachetablename
insert into cachetablename select * from myviewname
commit transaction
NB: this will eat space, also in your tx logs. Best used for small datasets that are slow to compute. Maybe refactor to eliminate "easy but large" columns first into an outer view.
For Microsoft SQL Server (T-SQL), I suggest looking into creating an index with the INCLUDE clause. Uniqueness is not required, and neither is the physical sorting of data associated with a clustered index. "CREATE INDEX ... INCLUDE (...)" creates a separate physical data store that is automatically maintained by the system. For a subset of columns from a single table, it is conceptually very similar to an Oracle Materialized View.
https://learn.microsoft.com/en-us/sql/relational-databases/indexes/create-indexes-with-included-columns
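As a rough illustration of that suggestion, assuming a hypothetical dbo.OrderLines table (the table, column, and index names are invented for this sketch):

```sql
-- Hypothetical table; names and columns are illustrative only.
CREATE TABLE dbo.OrderLines (
    OrderLineID int            NOT NULL PRIMARY KEY,
    ProductID   int            NOT NULL,
    Quantity    int            NOT NULL,
    UnitPrice   decimal(10, 2) NOT NULL,
    Notes       nvarchar(max)  NULL
);
GO

-- Non-unique, non-clustered index keyed on ProductID.
-- Quantity and UnitPrice are stored at the leaf level only,
-- so they add no sort cost but are available without a lookup.
CREATE NONCLUSTERED INDEX IX_OrderLines_ProductID
    ON dbo.OrderLines (ProductID)
    INCLUDE (Quantity, UnitPrice);
GO

-- This query can be answered from the index alone (it is "covered"),
-- so the wide Notes column in the clustered index is never read.
SELECT ProductID, Quantity, UnitPrice
FROM dbo.OrderLines
WHERE ProductID = 42;
```

The index is kept in sync with the table automatically, which is the resemblance to a materialized view, but it cannot join, filter across tables, or aggregate the way an indexed view can.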
How is it possible to create clustered indexes on a view in SQL Server 2008?
A view is not a real table, so the physical arrangement of data that a clustered index creates makes no sense for it.
What am I missing?
An index always exists on disk. When you create the index, you are materialising the rows of the view on disk even if the view itself is not "real" rows.
MSDN White paper with an explanation
This is a somewhat simplified explanation. There's lots of technical hoo-hah going on under the hood, but it sounded like you wanted a general "wassup" explanation.
A view is, essentially, a pre-written and stored query; whenever you access the view, you're retrieving and plugging that pre-written query into your current query. (Leastways this is how I think of it.)
So these "basic" views read data that's stored in tables already present in the database/on the hard drive. When you build a clustered index on a view, what you are really doing is making a second physical copy of the data that is referenced by the view. For example, if you have table A, create view vA as "select * from A", and then build a clustered index on that view, what you end up with is two copies of the data on the hard drive.
This can be useful if table A is very large, and you want quick access to a small subset of the table (such as only 2-3 columns, or only where Status = 1, or you want quick access to the data that requires an ugly join to produce.)
The fun comes in when you update table A (really, any of the tables referenced by the view), as any changes to the "base" table must also be made to the "view" table. Not a good idea in heavily used OLTP systems.
FYI, I believe SQL Server's "Indexed Views" are called "Materialized Views" in Oracle. For my money, Materialized View is a much better name/description.
Though a view is not a real object, the clustered index is.
And the rows the view returns can be sorted and stored.
However, to be indexable, the view should satisfy a number of conditions.
Mostly they make sure that the results are persisted and the updates to the underlying table can be easily tracked in a view (so that the index will not have to be rebuilt each time the underlying table is updated).
For instance, SUM and COUNT_BIG(*) are distributive functions:
SUM(set1) + SUM(set2) = SUM(set1 + set2)
COUNT_BIG(set1) + COUNT_BIG(set2) = COUNT_BIG(set1 + set2)
, so it's easy to recalculate values of SUM and COUNT_BIG when the table is changed, using only the view rows and the values of the columns affected.
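However, that is not the case with other aggregates such as MIN and MAX (deleting the row that holds the current minimum would force a rescan of the base table), so they are not allowed in an indexed view.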
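A small sketch of an aggregate view that stays within those rules is shown below. The names (dbo.Payments, dbo.vAccountTotals) are hypothetical. Two details are worth noting: SUM must be over a non-nullable expression, and a view with GROUP BY has to include COUNT_BIG(*) before it can be indexed.

```sql
-- Hypothetical table; Amount is NOT NULL because an indexed view
-- cannot use SUM over a nullable expression.
CREATE TABLE dbo.Payments (
    PaymentID int            NOT NULL PRIMARY KEY,
    AccountID int            NOT NULL,
    Amount    decimal(12, 2) NOT NULL
);
GO

CREATE VIEW dbo.vAccountTotals
WITH SCHEMABINDING
AS
SELECT AccountID,
       SUM(Amount)  AS TotalAmount,
       COUNT_BIG(*) AS PaymentCount  -- required when GROUP BY is used
FROM dbo.Payments
GROUP BY AccountID;
GO

-- One stored row per account; each insert or delete on dbo.Payments
-- only adjusts the matching row's sum and count.
CREATE UNIQUE CLUSTERED INDEX IX_vAccountTotals
    ON dbo.vAccountTotals (AccountID);
GO

-- AVG is not allowed inside the indexed view, but it can be
-- derived at query time from the two stored aggregates.
SELECT AccountID,
       TotalAmount / PaymentCount AS AvgAmount
FROM dbo.vAccountTotals WITH (NOEXPAND);
```

The stored count also lets SQL Server tell when a group has become empty and its row should be removed from the view.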