Joining against views in SQLServer with strange query optimizer behavior - sql-server

I have a complex view that I use to pull a list of primary keys that indicate rows in a table that have been modified between two temporal points.
This view has to query 13 related tables and look at a changelog table to determine if an entity is "dirty" or not.
Even with all of this going on, doing a simple query:
select * from vwDirtyEntities;
Takes only 2 seconds.
However, if I change it to
select
e.Name
from
Entities e
inner join vwDirtyEntities de
on e.Entity_ID = de.Entity_ID
This takes 1.5 minutes.
However, if I do this:
declare @dirtyEntities table
(
Entity_ID uniqueidentifier
)
insert into @dirtyEntities
select * from vwDirtyEntities;
select
e.Name
from
Entities e
inner join @dirtyEntities de
on e.Entity_ID = de.Entity_ID
I get the same results in only 2 seconds.
This leads me to believe that SQL Server is evaluating the view once per row when it is joined to Entities, instead of constructing a query plan that combines the single inner join above with the other joins in the view.
Note that I want to join against the full result set from this view, as it filters out only the keys I want internally.
I know I could make it into a materialized view, but this would involve schema binding the view and its dependencies, and I don't like the overhead that maintaining the index would cause (this view is only queried for exports, while there are far more writes to the underlying tables).
So, aside from using a table variable to cache the view results, is there any way to tell SQL Server to cache the view while evaluating the join? I tried changing the join order (Select from the view and join against Entities), however that did not make any difference.
The view itself is also very efficient, and there is no room to optimize there.

There is nothing magical about a view. It's a macro that expands. When a view is JOINed, the optimiser expands it into the main query.
I'll address other points in your post:
You have ruled out an indexed view. A view can only be a discrete entity when it is indexed.
SQL Server will never do a RBAR (row-by-agonizing-row) query on its own. Only developers can write loops.
There is no concept of caching: every query uses the latest data unless you use temp tables.
You insist on using the view, which you've decided is very efficient, but you have no idea how views are treated by the optimizer, and it has 13 tables.
SQL is declarative: join order usually does not matter.
Many serious DB developers don't use views because of limitations like this: they are not reusable because they are macros.
Edit: another possibility is predicate pushing on SQL Server 2005. That is, SQL Server cannot push the JOIN condition "deeper" into the view.

Related

SQL Server Performance: LEFT JOIN vs NOT IN

The output of these two queries is the same. I'm just wondering whether there's a performance difference between them if the requirement is to list all the customers who have not purchased anything. The database used is the Northwind sample database. I'm using T-SQL.
select companyname
from Customers c
left join Orders o on c.customerid = o.customerid
where o.OrderID is null
select companyname
from Customers c
where Customerid not in (select customerid from orders)
If you want to find out empirically, I'd try them on a database with one customer who placed 1,000,000 orders.
And even then you should definitely keep in mind that the results you'll be seeing are valid only for the particular optimiser you're using (comes with particular version of particular DBMS) and for the particular physical design you're using (different sets of indexes or different detailed properties of some index might yield different performance characteristics).
Potentially the second is faster if the tables are indexed. So if orders has an index on customer ID, then NOT IN will mean that you aren't bringing back the entire ORDERS table.
But as Erwin said, a lot depends on how things are set up. I'd tend to go for the second option as I don't like bringing in tables unless I need data from them.
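Performance aside, the two forms are only interchangeable while the subquery column contains no NULLs. The following is a minimal sketch of that point, assuming a cut-down, hypothetical version of the Customers and Orders tables and using Python's built-in sqlite3 module instead of SQL Server so that it runs anywhere; the three-valued logic behind NOT IN is standard SQL, but timings and plans from it say nothing about SQL Server.

```python
import sqlite3

# Hypothetical miniature of the Northwind tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT);
    INSERT INTO Customers VALUES ('A', 'Alpha'), ('B', 'Beta'), ('C', 'Gamma');
    INSERT INTO Orders VALUES (1, 'A'), (2, 'A');
""")

LEFT_JOIN = """
    SELECT c.CompanyName
    FROM Customers c
    LEFT JOIN Orders o ON c.CustomerID = o.CustomerID
    WHERE o.OrderID IS NULL
    ORDER BY c.CompanyName
"""
NOT_IN = """
    SELECT CompanyName
    FROM Customers
    WHERE CustomerID NOT IN (SELECT CustomerID FROM Orders)
    ORDER BY CompanyName
"""

def names(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]

# With no NULLs in Orders.CustomerID both queries agree.
before_left, before_not_in = names(LEFT_JOIN), names(NOT_IN)

# One order with a NULL CustomerID makes every NOT IN comparison
# evaluate to UNKNOWN, so the NOT IN query returns no rows at all.
conn.execute("INSERT INTO Orders VALUES (3, NULL)")
after_left, after_not_in = names(LEFT_JOIN), names(NOT_IN)

print(before_left, before_not_in)
print(after_left, after_not_in)
```

If Orders.customerid is nullable, NOT EXISTS or the LEFT JOIN form avoids this surprise.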

Benefits to views in stored procedures

I've tried searching in different ways, but haven't found a clear answer to my question. This question almost answers my query, but not quite.
Besides the obvious readability differences, are there any benefits to using a view in a stored procedure:
SELECT
*
FROM
view1
WHERE
view1.fdate > @start AND
view1.fdate <= @end
...over using a linked table list?
SELECT
*
FROM
table1
INNER JOIN
table2
ON table1.pid = table2.fid
INNER JOIN
table3
ON table1.pid = table3.fid
WHERE
table1.fdate > @start AND
table1.fdate <= @end
It's not all about your app and you.
Think enterprise databases, with tens of different apps accessing the same data, and hundreds of individuals querying the data for business purposes. How do you explain to each one of the many individuals how to recompose your highly normalized data? Which lookup field maps to which table? How are they joined? And how do you grant read-only access to the data, making sure some sensitive fields are not accessible, w/o repeating yourself?
You, the DBA, create VIEWs. They denormalize the data into easy-to-process relations for the business people, for the many apps, and for the reports. You grant select permission on the views w/o granting access to the underlying tables, to hide sensitive private fields. And sometimes you write views because you're tired of being called at midnight because the database is 'down' because Johnny from accounting is running a cartesian join.
There is no difference. Query plans will be identical in both cases. The query optimizer can use an indexed view even if you don't reference it explicitly (as in case 2).
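The claim that a plain view is just a named query can be checked on a small scale. The following is a rough sketch, assuming hypothetical two-table versions of table1, table2 and view1 (the column label is invented) and using Python's sqlite3 module instead of SQL Server; it confirms only that both forms return the same rows, and prints SQLite's plans for comparison without claiming anything about SQL Server's plans.

```python
import sqlite3

# Hypothetical schema loosely following the question; "label" is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (pid INTEGER PRIMARY KEY, fdate TEXT);
    CREATE TABLE table2 (fid INTEGER, label TEXT);
    INSERT INTO table1 VALUES (1, '2020-01-01'), (2, '2020-02-01'), (3, '2020-03-01');
    INSERT INTO table2 VALUES (1, 'x'), (2, 'y'), (3, 'z');
    CREATE VIEW view1 AS
        SELECT t1.pid, t1.fdate, t2.label
        FROM table1 t1
        INNER JOIN table2 t2 ON t1.pid = t2.fid;
""")

VIA_VIEW = """
    SELECT pid, fdate, label
    FROM view1
    WHERE fdate > :start AND fdate <= :end
    ORDER BY pid
"""
VIA_JOINS = """
    SELECT t1.pid, t1.fdate, t2.label
    FROM table1 t1
    INNER JOIN table2 t2 ON t1.pid = t2.fid
    WHERE t1.fdate > :start AND t1.fdate <= :end
    ORDER BY t1.pid
"""
params = {"start": "2020-01-01", "end": "2020-02-01"}

via_view = conn.execute(VIA_VIEW, params).fetchall()
via_joins = conn.execute(VIA_JOINS, params).fetchall()

# Print each plan so the two can be compared by eye.
for label, sql in (("view", VIA_VIEW), ("joins", VIA_JOINS)):
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    print(label, plan)
```

On SQL Server the equivalent check is to compare the actual execution plans of the two statements.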

DB Performance - Left outer join over database function

This is a fairly complex query which has multiple joins and returns a lot of records with several data fields. Let's say it is basically used to retrieve manager details.
First set of tables (already implemented query):
Select m.name, d.name, d.address, m.salary , m.age,……
From manager m,department d,…..etc
JOINS …..
Assume one manager can have zero or more employees.
Let's say I need to list all employee names for each manager in the result of the first set of tables, including managers who have no employees (that is, I want to keep the manager list from the first set of tables as it is).
To do that I have to access the "employee" table through the "party" tables (a few more tables might be involved).
Second set of tables (to be newly connected):
That means there are one or more joins with "employee", "party" and …..etc
I have two approaches for this.
Make a left outer join from the first set of tables to the second set of tables.
Create a user-defined function (UDF) at the DB level for the second set of tables. Then I have to pass the manager id to this UDF as a parameter and get all the employees (e1,e2,…) back as a formatted string by calling it in the select clause of the first set of tables.
Can someone please suggest which of these two options is better in terms of DB performance?
Go for the JOIN, using appropriate WHERE clauses and indexes.
The database engine is far better at optimizing than you'll ever be. Let it do its job.
Your way sounds like (n+1) query death.
Write a sample query and ask your database to EXPLAIN PLAN to see what the cost is. If you spot a TABLE
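The JOIN approach recommended above can be sketched as a single query. This is a minimal illustration, assuming hypothetical manager and employee tables linked directly by a manager_id column (the "party" tables from the question are left out) and using Python's sqlite3 module instead of the asker's database. The point is that one LEFT JOIN returns every manager, including those with no employees, without one function call per manager row.

```python
import sqlite3

# Hypothetical, simplified schema: the real one goes through "party" tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE manager (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, manager_id INTEGER, name TEXT);
    INSERT INTO manager VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO employee VALUES (10, 1, 'e1'), (11, 1, 'e2');
""")

# One set-based query; the LEFT JOIN keeps managers without employees.
rows = conn.execute("""
    SELECT m.name, e.name
    FROM manager m
    LEFT JOIN employee e ON e.manager_id = m.id
    ORDER BY m.id, e.id
""").fetchall()

# Fold the joined rows into one formatted string per manager.
employees_by_manager = {}
for manager_name, employee_name in rows:
    names = employees_by_manager.setdefault(manager_name, [])
    if employee_name is not None:  # NULL means the manager has no employees
        names.append(employee_name)

formatted = {m: ", ".join(es) for m, es in employees_by_manager.items()}
print(formatted)
```

Whether the string is built in the application, as here, or in SQL with the database's own string aggregation is a separate choice; the saving comes from replacing per-row lookups with one join.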

Why can we edit a view in SQL Server?

In SQL Server we can update data through a view. I think the concept of view is a read only table.
Why can we edit a view in SQL Server? Is this also possible in Oracle?
To answer your question of why can we create an editable view, it is so that you can limit access to fields that you do not want updated (or viewed). Then you can give a user access to the view, but not to the underlying tables
For a simple example, you could have a personnel table. You could create a view allowing some users to update a field like emergency contact details, but not see or update bank details or salary.
There are lots of criteria to meet to make a view updatable, and you can indeed use INSTEAD OF triggers for extended functionality http://msdn.microsoft.com/en-us/library/ms187956.aspx
I think the concept of view is a read only table
No, it's more of a virtual table - anywhere you have a real table, you ought to be able to replace it with a view, and the users should be none the wiser.
According to Codd:
Rule 6: The view updating rule:
All views that are theoretically updatable must be updatable by the system.
However, in practicality, this ideal has not been achievable.
In addition to what @JamieA wrote, views can not only limit access to fields, but also limit access to data in the table.
Look at this simple SQL Fiddle example and experiment with it.
The view in the example restricts access to only the columns id,val1 of the table, and also restricts access to rows (only id = 2..10). You can update and delete only rows 2..10 through the view.
However, the view does not prevent insertion of a row with id = 20.
Here is another example, a view with check option. In this case the view restricts not only deletes and updates, but also prevents inserting rows that do not match the where clause of the view.
@yogi wrote that we can't update a view if the view joins two tables -> here is a third demo that shows a simple view that joins two tables, and how an update of this view works.
These simple examples are for Oracle, but after small modifications they should also work in MS-SQL (the datatypes in the create table statements must be changed). When I looked through the MSDN documentation (section: updatable views -> http://msdn.microsoft.com/en-us/library/ms187956.aspx), I didn't find any significant differences between MS-SQL and Oracle; it seems that views work similarly on both databases.
Yes, it is possible in Oracle. The other answers have already explained why views are updatable. Updatable views are also allowed in Oracle, but with some restrictions/limitations; here is the Oracle documentation.
For example, the view's select can't have aggregate functions, a distinct clause, group by... read the link for more info.
Since views are read-only tables and don't support DML statements, you can't perform an update on a view.
An interesting point, though, is that you can write an update statement against a view and write an INSTEAD OF trigger for it; that way you can perform multiple update statements on the tables that are in the view.
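The INSTEAD OF trigger idea can be sketched with the personnel example mentioned earlier. This is an illustration only, assuming a hypothetical personnel table and view, and using Python's sqlite3 module: SQLite views are read-only unless an INSTEAD OF trigger is defined, whereas SQL Server also allows direct updates on views that meet its updatability criteria, so the trigger here is a stand-in for that behaviour and not a description of SQL Server's rules.

```python
import sqlite3

# Hypothetical personnel table; the view hides the salary column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE personnel (
        id INTEGER PRIMARY KEY,
        name TEXT,
        emergency_contact TEXT,
        salary INTEGER
    );
    INSERT INTO personnel VALUES (1, 'Ann', 'old contact', 50000);

    CREATE VIEW personnel_contacts AS
        SELECT id, name, emergency_contact FROM personnel;

    -- Redirect updates on the view to the one column users may change.
    CREATE TRIGGER personnel_contacts_update
    INSTEAD OF UPDATE ON personnel_contacts
    BEGIN
        UPDATE personnel
        SET emergency_contact = NEW.emergency_contact
        WHERE id = OLD.id;
    END;
""")

# The update targets the view; the trigger applies it to the base table.
conn.execute(
    "UPDATE personnel_contacts SET emergency_contact = ? WHERE id = ?",
    ("new contact", 1),
)

row = conn.execute(
    "SELECT emergency_contact, salary FROM personnel WHERE id = 1"
).fetchone()
view_columns = [d[0] for d in conn.execute("SELECT * FROM personnel_contacts").description]
print(row, view_columns)
```

The salary column is never exposed through the view, and it is untouched by the update.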
According to Pinal Dave, views have the following limitations:
ORDER BY Does Not Work.
Adding Column is Expensive by Joining Table Outside View
Index Created on View not Used Often
SELECT * and Adding Column Issue in View
COUNT(*) Not Allowed but COUNT_BIG(*) Allowed
UNION Not Allowed but OR Allowed in Index View
Cross Database Queries Not Allowed in Indexed View
Outer Join Not Allowed in Indexed Views
SELF JOIN Not Allowed in Indexed View
Keywords View Definition Must Not Contain for Indexed View
View Over the View Not Possible with Index View

In SQL Server, when should I use an indexed view instead of a real table?

I know that in SQL Server you can create indexes on a view, and then the view stores the data from the underlying table. Then you can query the view. But why would I need to use a view instead of a table?
You may want to use a view to simplify queries. In our projects, the consensus is on using views for interfaces, and especially "report interfaces".
Imagine you've got a client table, and the manager wants a report every morning with the client's name, and their account balance (or whatever). If you code your report against the table, you're creating a strong link between your report and your table, making later changes difficult.
On the other hand if your report hits a view, you can twist the database freely; as long as the view is the same the report works, the manager is happy and you're free to experiment with the database. You want to separate client metadata from the main client table? go for it, and join the two tables in the view. You want to denormalize the cart info for the client? no problem, the view can adapt...
To be honest, it's my view as a programmer but other uses will certainly be found by db gurus :)
One advantage of using an indexed view is for ordering results by 2 or more columns, where the columns are in different tables. For example, have a view which is the result of joining table1 and table2, sorted by table1.column1, table2.column2. You could then create an index on column1, column2 to optimise that query.
A table is where the data is physically stored.
A view is where tables are summarized or grouped to make groups of tables easier to use.
An indexed view allows a query to use a view, and not need to get data from the underlying table, as the view already has the data, thus increasing performance.
You could not achieve the same result with just tables, without denormalizing your database, and thus potentially creating other issues.
Basically, use a view:
When you use the same complex query on many tables, multiple times.
When a new system needs to read old table data, but doesn't want to change its perceived schema.
Indexed views can improve performance by creating more specific indexes without increasing redundancy.
A view is simply a SELECT statement that has been given a name and stored in a database. The main advantage of a view is that once it's created, it acts like a table for any other SELECT statements that you want to write.
The select statement for the view can reference tables, other views and functions.
You can create an index on the view (indexed view) to improve performance. An indexed view is self-updating, immediately reflecting changes to the underlying tables.
If your indexed view only selects columns from one table, you could just as well place the index on that table and query that table directly; the view would only cause overhead for your database. However, if your SELECT statement covers multiple tables with joins etc., then you could gain a performance boost by placing an index on the view.
