Do database views affect query performance?

Are database views only a means of simplifying access to data, or do they provide performance benefits when accessing a view as opposed to just running the query the view is based on? I suspect views are functionally equivalent to adding the stored view query to each query on the view data. Is this correct, or are there other details and/or optimizations happening?

I have always considered Views to be like read-only Stored Procedures. You give the database as much information as you can in advance so it can pre-compile as best it can.
You can index views as well, giving you access to an optimised view of the data you are after for the type of query you are running.
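For example, here is a minimal sketch of an indexed view in SQL Server; the table and view names are hypothetical, and SCHEMABINDING plus COUNT_BIG are requirements SQL Server imposes on indexed views:

    -- Hypothetical tables; columns involved in SUM must be non-nullable.
    CREATE VIEW dbo.vw_OrderTotals
    WITH SCHEMABINDING
    AS
    SELECT o.OrderId,
           COUNT_BIG(*) AS LineCount,   -- required in grouped indexed views
           SUM(ol.Quantity * ol.UnitPrice) AS OrderTotal
    FROM dbo.Orders o
    JOIN dbo.OrderLines ol ON ol.OrderId = o.OrderId
    GROUP BY o.OrderId;
    GO

    -- The unique clustered index is what actually materializes the view on disk.
    CREATE UNIQUE CLUSTERED INDEX IX_vw_OrderTotals ON dbo.vw_OrderTotals (OrderId);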

Although a certain query running inside a view and the same query running outside of the view should perform equivalently, things quickly get much more complicated when you need to join two views together. You can easily end up bringing tables that you don't need into the query, or bringing tables in redundantly. The database's optimizer may have more trouble creating a good query execution plan. So while views can be very good in terms of allowing more fine-grained security and the like, they are not necessarily good for modularity.
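As a contrived sketch of how that happens (all names hypothetical): two views that each join the same customer table will drag that table into the plan twice when you join them, unless the optimizer manages to prune the redundancy.

    CREATE VIEW dbo.vw_CustomerOrders AS
    SELECT c.CustomerId, c.Name, o.OrderId, o.OrderDate
    FROM dbo.Customers c
    JOIN dbo.Orders o ON o.CustomerId = c.CustomerId;
    GO
    CREATE VIEW dbo.vw_CustomerInvoices AS
    SELECT c.CustomerId, c.Name, i.InvoiceId, i.Amount
    FROM dbo.Customers c
    JOIN dbo.Invoices i ON i.CustomerId = c.CustomerId;
    GO
    -- dbo.Customers now appears twice in the expanded query,
    -- even though the caller never asked for it at all.
    SELECT co.OrderId, ci.InvoiceId
    FROM dbo.vw_CustomerOrders co
    JOIN dbo.vw_CustomerInvoices ci ON ci.CustomerId = co.CustomerId;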

It depends on the RDBMS, but usually there isn't optimization going on, and it's just a convenient way to simplify queries. Some database systems use "materialized views" however, which do use a caching mechanism.

Usually a view is just a way to create a common shorthand for defining result sets that you need frequently.
However, there is a downside. The temptation is to add in every column you think you might someday need to the view. So YAGNI is violated. Not only columns: sometimes additional outer joins get tacked on "just in case". So covering indexes might not cover anymore, and the query plan may grow in complexity (and drop in efficiency).
YAGNI is a critical concept in SQL design.

Generally speaking, views should perform equivalently to a query written directly on the underlying tables.
But: there may be edge cases, and it would behoove you to test your code. All modern RDBMSs have tools that will let you see the query plans and monitor execution. Don't take my (or anybody else's) word for it, when you can have the definitive data at your fingertips.

I know this is an old thread, and discussion is good, but I do want to throw in one more thought. Performance also depends on what you are using to pull the data. For example, if you are front-ending with something like Microsoft Access, you can definitely gain performance for some complex queries by using a view. This is because Access does not always pull from the SQL Server as we would like - in some cases it will pull entire tables across and then try to process them locally! Not so if you use a view.

Yes, in all modern RDBMSs (MSSQL after 2005? etc.) views' query plans are cached, removing the overhead of planning the query and speeding up performance over the same SQL performed inline. Prior to this (and it applies to parameterized SQL/prepared statements as well), people correctly thought stored procedures performed better.
Many still hang onto this belief today, making it a modern DB myth. Ever since views and prepared statements got the cached query planning of SPs, they've been pretty much even.

Related

Using NOLOCK Hint in EF4?

We're evaluating EF4 and my DBA says we must use the NOLOCK hint in all our SELECT statements. So I'm looking into how to make this happen when using EF4.
I've read the different ideas on how to make this happen in EF4, but they all seem like workarounds that aren't sanctioned by Microsoft or EF4. What is the "official Microsoft" response to someone who wants their SELECT statement(s) to include the NOLOCK hint when using LINQ-to-SQL / LINQ-to-Entities and EF4?
By the way, the absolute best information I have found was right here and I encourage everyone interested in this topic to read this thread.
Thanks.
NOLOCK = "READ UNCOMMITTED" = dirty reads
I'd assume MS knows why they chose "READ COMMITTED" as the default isolation level.
NOLOCK, in fact any hint, should be used very judiciously: not by default.
Your DBA is a muppet. See this (SO): What can happen as a result of using (nolock) on every SELECT in SQL Server?. If you happen to work at a bank, or any institution where I may have an account, please let me know so I can close it.
I'm a developer on a tools team in the SQL org at Microsoft. I'm in no way authorized to make any official statement, and I'm sure there are people on SO who know more about these things than I do. Nevertheless, I'll offer a friendly rule of thumb, along the theme of "Premature optimization is the root of all evil":
Don't use NOLOCK (or any other query hint for that matter) until you have to. If you have a select statement which has a decent query plan, and it runs fine when there is very little other load on the system, but then it slows down when other queries are accessing the same table, try adding some NOLOCK hints. But always understand that when you do, you run the risk of getting inconsistent data. If you are writing some mission-critical app that does online banking or controls an aircraft, this may be unacceptable. However, for many applications the perf speedup is worth the risk. Evaluate on a case-by-case basis, though. Don't just use them willy-nilly all over the place.
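In raw T-SQL the hint looks like this (the table is hypothetical); WITH (NOLOCK) is READ UNCOMMITTED scoped to a single table reference:

    -- This read can see dirty (uncommitted, possibly rolled-back) rows.
    SELECT OrderId, Status, TotalDue
    FROM dbo.Orders WITH (NOLOCK)
    WHERE Status = 'Pending';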
If you do choose to use NOLOCK, I have blogged a solution in C# using extension methods, so that you can easily change a LINQ query to use NOLOCK hints. If you can adapt this to EF4, please post your adaptation.
EF4 does not currently have a built-in way to do it if EF4 is generating all your queries.
There are ways around this, such as using stored procedures or a more extended inline query model; however, this can be time-consuming, to say the least.
I believe (and I don't speak for Microsoft on this) that caching is Microsoft's intended solution for lightening the load on the server in EF4 sites. Having read uncommitted (or NOLOCK) built into a framework would create unpredictable issues for the expected behaviour of EF4 when two contexts are run at the same time. That doesn't mean your situation needs that level of concurrency.
It sounds like you were asked for NOLOCK on ALL selects. While I agree with the earlier poster that this can be dangerous if you have ANY transactions that need to be transactions, I don't agree that it automatically makes the DBA a muppet. You might just be running a CMS for which dirty reads are totally fine. You can change the ISOLATION LEVEL on your whole database, which can have the same effect.
The DBA may have recommended NOLOCK for operations that were ONLY selects (which is fine, especially if there's an ORM being misused and doing some dodgy data dumps). The funniest thing about that muppet comment is that Stack Overflow itself runs SQL Server in a READ UNCOMMITTED mode. Guess you need to find somewhere else to get answers for your problems then?
Talk to your DBA about the possibility of setting this at a database level, or consider a caching strategy if you only need it in a few places. The web is stateless after all, so concurrency can often be an illusion anyway unless you address it directly.
Info about isolation levels
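On the database-level idea mentioned above: one commonly suggested alternative in SQL Server, sketched here with a hypothetical database name, is READ_COMMITTED_SNAPSHOT, which stops readers blocking writers via row versioning rather than by permitting dirty reads:

    -- Readers see the last committed version instead of blocking,
    -- and no dirty reads are introduced.
    ALTER DATABASE MyAppDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;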
Having worked with EF4 for over a year now, I will offer that using stored procedures for specific tasks is not a hack, and is absolutely necessary for performance in certain situations.
Our platform gets a lot of traffic through our web site, APIs and ETL data feeds. We use EF primarily on our web side, but also for some back-end processes. Sometimes EF does a great job with its query generation, sometimes it is terrible. You need to look at the queries being generated, load them into query analyzer, and decide whether you might be better off writing the operation in another way (stored procedure, etc.).
If you find that you need to make data available via EF and need NOLOCKs, you can always create views with the NOLOCK hints included, and expose the view to EF instead of the underlying table. The same can be done with Stored Procedures. These methods are probably a bit easier when you are using the Code First approach.
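A minimal sketch of that view trick (names hypothetical):

    -- EF maps to the view, and every read through it picks up the hint.
    CREATE VIEW dbo.vw_Orders_Dirty
    AS
    SELECT OrderId, CustomerId, Status, TotalDue
    FROM dbo.Orders WITH (NOLOCK);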
But I think one mistake a lot of people make with EF is believing that the EF object model has to map directly to the physical (table) model in the database. It doesn't, and this is where your DBA comes into play. Let him design your physical model, and the two of you can work together to abstract your logical data model, which is mapped to your object model in EF.
Although this would be a major PITA to do, you can always drop your SQL in a stored procedure and get the functionality you need (or are forced into). It's definitely a hack though!
I know this isn't an answer to your question, but I just wanted to throw this in.
It seems to me that this is (at least partially) the DBA's job. It's fine to say that an application should behave a certain way, and you can and should certainly attempt to program it the way that he would like.
The only way to be sure though, is for the DBA to work on the application with you and construct the DB surface that he would like to present to the app. If he wants critical tables to be queried as READ UNCOMMITTED, then he should help to provide a set of stored procedures with the correct access and isolation level.
Relying on the application code to construct every ad-hoc query correctly is not a scalable approach.
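For instance, a hedged sketch of one such DBA-provided procedure (all names hypothetical):

    CREATE PROCEDURE dbo.usp_GetPendingOrders
    AS
    BEGIN
        SET NOCOUNT ON;
        -- The isolation decision is made here, once, by the DBA,
        -- rather than by every ad-hoc query in the application.
        SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

        SELECT OrderId, CustomerId, TotalDue
        FROM dbo.Orders
        WHERE Status = 'Pending';
    END;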

Optimize database for web usage (lots more reading than writing)

I am trying to lay out the tables for use in a new public-facing website. Seeing as there will be lots more reading than writing of data (guessing >85% reading), I would like to optimize the database for reading.
Whenever we list members we are planning on showing summary information about the members. Something akin to the reputation points and badges that stackoverflow uses. Instead of doing a subquery to find the information each time we do a search, I wanted to have a "calculated" field in the member table.
Whenever an action is initiated that would affect this field, say the member gets more points, we simply update this field by running a query to calculate the new values.
Obviously, there would be the need to keep this field up to date, but even if the field gets out of sync, we can always rerun the query to update this field.
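A hedged sketch of what that might look like (the member/points schema here is hypothetical):

    -- Denormalized summary column on the member table.
    ALTER TABLE dbo.Members ADD ReputationPoints int NOT NULL DEFAULT 0;

    -- Re-run whenever the field may be stale; it just recomputes
    -- the summary from the detail rows.
    UPDATE m
    SET m.ReputationPoints = ISNULL(p.Total, 0)
    FROM dbo.Members m
    LEFT JOIN (SELECT MemberId, SUM(Points) AS Total
               FROM dbo.PointAwards
               GROUP BY MemberId) p ON p.MemberId = m.MemberId;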
My question: is this an appropriate approach to optimizing the database? Or are subqueries fast enough that performance would not suffer?
There are a few options, in rough order of preference:
Caching
Tuned query
Indexed views (AKA materialized views)
Tuned table
The best solution requires querying the database as little as possible, which would require caching. But you still need a query to fill that cache, and the cache needs to be refreshed when it is stale...
Indexed views are the next consideration. Because they are indexed, querying against them is faster than querying an ordinary view (which is equivalent to a subquery). Nonclustered indexes can be applied to indexed views as well. The problem is that indexed views (materialized views in general) are very constrained in what they support - they can't contain non-deterministic functions (e.g. GETDATE()), they have extremely limited aggregate support, etc.
If what you need can't be handled by an indexed view, a table where the data is dumped and refreshed via a SQL Server job is the next alternative. As with the indexed view, indexes would be applied to make fetching the data faster. But data changes mean maintaining the indexes to ensure the query runs as well as it can, and this maintenance can take time.
The least expensive database query is the one that you don't have to run against the database at all.
In the scenario you describe, using a high-performance cache technology (example: memcached) to store query results in your application can be a much better strategy than trying to trick out the database to be highly scalable.
The First Rule of Program Optimization: Don't do it.
The Second Rule of Program Optimization (for experts only!): Don't do it yet.
Michael A. Jackson
If you are just designing the tables, I'd say it's definitely premature to optimize.
You might want to redesign your database a few days later, you might find out that things work pretty fast without any clever hacks, or you might find out they work slowly, but in a different way than you expected. In any of these cases you would waste your time by starting to optimize now.
The approach you describe is generally fine, though; you could keep some pre-computed values, either using triggers/SPs to preserve data consistency, or by running a job to update these values from time to time.
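A sketch of the trigger route, reusing the hypothetical member/points schema from the question:

    CREATE TRIGGER trg_PointAwards_Summary
    ON dbo.PointAwards
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Recompute the summary only for members touched by this statement.
        UPDATE m
        SET m.ReputationPoints = ISNULL(p.Total, 0)
        FROM dbo.Members m
        LEFT JOIN (SELECT MemberId, SUM(Points) AS Total
                   FROM dbo.PointAwards
                   GROUP BY MemberId) p ON p.MemberId = m.MemberId
        WHERE m.MemberId IN (SELECT MemberId FROM inserted
                             UNION
                             SELECT MemberId FROM deleted);
    END;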
All databases are more than 85% read only! Usually high nineties too.
Tune it when you need to and not before.

Why don't databases intelligently create the indexes they need?

I just heard that you should create an index on any column you're joining or querying on. If the criterion is this simple, why can't databases automatically create the indexes they need?
Well, they do; to some extent at least...
See SQL Server Database Engine Tuning Advisor, for instance.
However, creating optimal indexes is not as simple as you mentioned. An even simpler rule could be to create indexes on every column (which is far from optimal)!
Indexes are not free. You create indexes at the cost of storage and update performance among other things. They should be carefully thought about to be optimal.
Every index you add may increase the speed of your queries. It will decrease the speed of your updates, inserts and deletes and it will increase disk space usage.
I, for one, would rather keep the control to myself, using tools such as DB Visualizer and explain statements to provide the information I need to evaluate what should be done. I do not want a DBMS unilaterally deciding what's best.
It's far better, in my opinion, that a truly intelligent entity be making decisions re database tuning. The DBMS can suggest all it wants but the final decision should be left up to the DBAs.
What happens when the database usage patterns change for one week? Do you really want the DBMS creating indexes and destroying them a week later? That sounds like a management nightmare scenario right up alongside Skynet :-)
This is a good question. Databases could create the indexes they need based on data usage patterns, but this means that the database would be slow the first time certain queries were executed and then get faster as time goes on. For example if there is a table like this:
    ID   USERNAME
    --   --------

then the username would be used to look up users very often. After some time the database could see that, say, 50% of queries did this, in which case it could add an index on the username.
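The index such an engine might quietly create (assuming a hypothetical dbo.Users table for the sketch above):

    CREATE NONCLUSTERED INDEX IX_Users_Username ON dbo.Users (Username);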
However, the reason this hasn't been implemented in great detail is simply that it is not a killer feature. Adding indexes is done relatively rarely by the DBA, and automating it (which is a very big task) is probably just not worth it for the database vendors. Remember that every query would have to be analyzed to enable auto-indexing, along with its response time and result-set size, so it is non-trivial to implement.
Because databases simply store and retrieve data - the database engine has no clue how you intend to retrieve that data until you actually do it, in which case it is too late to create an index. And the column you are joining on may not be suitable for an efficient index.
It's a non-trivial problem to solve, and in many cases a sub-optimal automatic solution might actually make things worse. Imagine a database whose read operations were sped up by automatic index creation but whose inserts and updates got hosed as a result of the overhead of managing the index? Whether that's good or bad depends on the nature of your database and the application it's serving.
If there were a one-size-fits-all solution, databases would certainly do this already (and there are tools to suggest exactly this sort of optimization). But tuning database performance is largely an app-specific function and is best accomplished manually, at least for now.
An RDBMS could easily self-tune and create indices as it saw fit but this would only work for simple cases with queries that do not have demanding execution plans. Most indices are created to optimize for specific purposes and these kinds of optimizations are better handled manually.

Query vs. View

I want to know what is the difference between a query and a view in terms of performance. And if a view is costly, what else besides a query could I do to improve performance?
I can't speak for all databases, but in SQL Server the optimizer will only use indexed views automatically if you have an Enterprise edition (other editions need the NOEXPAND hint). An unindexed view can be significantly poorer in terms of performance than a query, especially if you are writing a query against it to add some WHERE conditions. Indexed views generally perform fairly well. An indexed view can also span fields in different tables, and that may improve performance over the ad hoc query. (It may not, too; in performance tuning, you must always test against your particular circumstances.)
One point against views is that they do not allow for run-time selection of WHERE criteria, so you often end up with both a view and a query.
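A sketch of that view-plus-query pattern (names hypothetical): the view fixes the joins, and each caller supplies its own run-time criteria on top.

    CREATE VIEW dbo.vw_OrderSummary AS
    SELECT o.OrderId, o.OrderDate, c.Name AS CustomerName, o.TotalDue
    FROM dbo.Orders o
    JOIN dbo.Customers c ON c.CustomerId = o.CustomerId;
    GO
    -- The run-time WHERE lives in the query, not the view.
    SELECT OrderId, CustomerName, TotalDue
    FROM dbo.vw_OrderSummary
    WHERE OrderDate >= '20100101';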
Views can be more easily maintained (just add that new table to a join, and everything accessing financial reports has it available), but they are much more difficult to performance-tune. This is in part because they tend to be over-generalized and are thus slower than counterparts which return only the minimum necessary. And yes, as Jonathan said, you can far too easily end up joining views together for a report in a mess that joins to the same large tables many more times than necessary and is very slow.
Two places where views shine, though, are:
Making sure that complex relationships are always correctly described. This is one reason why report writers tend to favor them.
Limiting access to a subset of records
There are also limitations on the kinds of queries that can be done in a view versus an ad hoc query or a stored proc. For instance, you can't use an IF statement (or other procedural code such as looping), and as noted above you cannot provide run-time values for the WHERE criteria.
One place where views are often significantly slower is when they call other views. In some databases the underlying views need to be fully realized, and thus you might need to call up 4,459,203 records to see the 10 you are ultimately interested in. Start to layer this more than once and it can get very slow, very fast; views that call views are simply a poor practice.
Views and ad-hoc queries, in the simple case, are nearly identical in terms of performance. So much so that when you program with a view, you should think of it as though the text of the view definition were being cut and pasted into your parent query.
HLGEM points out in their answer that certain editions of SQL Server allow you to "index" views - in this case, behind the scenes, SQL Server maintains the same structures that underlie a table, making an indexed view and a table very similar in terms of performance.
In SQL Server, though you can generally nest views fairly liberally without running into performance problems, it can make things more difficult to understand and debug.
In SQL Server I believe that the performance difference between views and queries is negligible. What I would recommend doing to improve performance is to create another table that holds the results of the view. You could perhaps create a staging table where new data is held and then a stored procedure can be run at some interval that populates the working table with the new information. A trigger might be good for this purpose. Depending on the requirements of your application this design may or may not be suitable. If you are working with near real-time data, this approach will lead to concurrency issues...
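A hedged sketch of that refresh procedure (all names hypothetical); it could be run on a schedule via a SQL Server Agent job:

    CREATE PROCEDURE dbo.usp_RefreshOrderSummaryWork
    AS
    BEGIN
        SET NOCOUNT ON;
        BEGIN TRAN;
            -- Rebuild the working table from the expensive view/query.
            TRUNCATE TABLE dbo.OrderSummaryWork;
            INSERT INTO dbo.OrderSummaryWork (OrderId, OrderDate, CustomerName, TotalDue)
            SELECT OrderId, OrderDate, CustomerName, TotalDue
            FROM dbo.vw_OrderSummary;
        COMMIT;
    END;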
One other thing to look into is making absolutely sure that the base tables you are using to construct your view are indexed correctly, and that the query itself is optimized. Finally, I believe it is possible in SQL Server Enterprise to create indexed views, although I have not used them before.
If they do exactly the same thing a view might be slightly faster on first execution as the database server will have a precompiled execution plan for it. Depends on your server though.
Emphasis on might and slightly...
Views promote code reuse and can abstract away database complexity to give a more coherent 'business' model of the data. However, they are not nearly as tunable. You may find yourself in a position where you need to provide join hints or other low-level optimisations, and many DBAs that I have worked with do not like them being applied to views, as the views may then be reused across many queries; the opinion is that these types of hints should be employed as sparingly as possible. I like using views myself.
A view is barely more expensive to the computer than writing out the query longhand. A view can save the programmer/user a lot of time writing the same query out time after time, and getting it wrong, and so on. The view may also be the only way to access the data if views are also used to enforce authorization (access control) on the underlying tables.
If the query does not perform well, you need to review how the query is formed, and whether the tables all have the appropriate indexes on them. If your system needs accurate statistics for the optimizer to perform well, have you updated those statistics sufficiently recently?
Once upon a long time ago, I came across a system where a query generator had created one query that listed seventeen tables in a single FROM clause, including several LEFT OUTER JOIN of a table with itself. And, in fact, closer scrutiny revealed that several of the 'tables' were in fact multi-table views, and some of these also involved self outer joins, and were themselves involved in self outer joins of the view. To say "ghastly" is an understatement. There was a lot of cleanup possible to improve the performance of that query - eliminating unnecessary outer joins, self joins, and so on. (It actually pre-dated the explicit join notation of SQL-92 - I said a long time ago - so the outer join syntax was DBMS-specific.)
If you mean network performance, then working from a local cache (as with ADO.NET DataSets) would reduce network traffic - but could cause problems with locking. Just a thought.
A view is still a query, it just abstracts certain parts of it so that your queries can be simplified (if they do similar things) and to maximize reuse.

Does LINQ to SQL provide faster response times than using ADO.NET and OLE DB?

LINQ simplifies database programming, no doubt, but does it have a downside? Inline SQL requires one to communicate with the database in a certain way that opens the database to injections. Inline SQL must also be syntax-checked, have a plan built, and then executed, which takes precious cycles. Stored procedures have also been a rock-solid standard in great database application programming. Many programmers I know use a data layer that simplifies development, however, not to the extent LINQ does. Is it time to give up on SPs and go with LINQ?
LINQ to SQL actually presents some alarming performance problems in the database. Basically, it creates multiple execution plans based on the length of the parameter you are using. I posted about it a while back on my blog LINQ to SQL may cause performance problems.
Now, is that to say that LINQ doesn't have a place? Hardly. LINQ definitely has a place in the development toolkit, just like stored procedures. Ultimately, you want to use stored procedures when performance is absolutely necessary and use an ORM tool in any other situation.
As far as inline SQL goes, there are ways to execute inline SQL so that the plan is only built once and is never recompiled. Most ORMs should take care of this aspect of performance tuning as well, and using these methods is usually the safest way to execute your SQL, since it forces you to use parameterized queries.
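In SQL Server, for example, sp_executesql is one such method (the table is hypothetical): because the statement text is identical on every call, one cached plan is compiled and then reused.

    EXEC sp_executesql
        N'SELECT OrderId, TotalDue FROM dbo.Orders WHERE CustomerId = @CustomerId',
        N'@CustomerId int',
        @CustomerId = 42;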
Like most database solutions, the right answer depends on the problem you're trying to solve. If you favor development speed over database/application performance, then using LINQ or another DAL/ORM tool is the best way to go. If you favor performance over ease of development, then using stored procedures and pure datasets is going to be your best bet. LLBLGen even provides a LINQ to LLBLGen layer so you can use LINQ to query LLBLGen's objects and have LLBLGen actually handle building your queries and avoid some of the downfalls of LINQ.
Your basic premise is flawed:
Inline SQL requires one to communicate with the database in a certain way that opens the database to injections.
No it doesn't. Hard-coding user-inputted values into a SQL statement does, but you could do that with stored procedures as well.
Parameterizing your queries guards against injection attacks, and inline SQL can be parameterized just as easily as stored procedures.
Inline SQL must also be syntax-checked, have a plan built, and then executed.
All SQL (SPs and inline) must be syntax-checked and have a plan built on the first call. Thereafter, the exact text of the request and the execution plan are cached. If another request with the exact same text (not counting parameters) is received, the cached execution plan is used.
So, if you hard-code values into inline SQL, the text won't match, and it will have to re-parse the query. However, if you use parameters, the text of the query will match, and you will get a cache hit. In that case, it doesn't matter whether the query is inline SQL or a SP.
In other words, the only problem with inline SQL is that it is easy to do something slow and insecure. But making inline SQL fast and secure is no more work than using a SP.
Which brings us to LINQ, which always uses parameters, even if you hard-code the values into the LINQ statement, making "fast and secure" inline SQL trivial.
LINQ also has the advantage over SPs of keeping all your code in one place, instead of scattering it over two different machines.
If you're interested in benchmarking, Rico Mariani has an excellent 5-part study that covers the qualitative and quantitative differences.
He may be an MS guy, but he's known as a performance nut - his benchmarks are thorough and well thought out.
This is a performance run by Maximilian Beller. According to him, LINQ is much, much slower.
Read his comprehensive study
Just think about changing a column's name - now change the (n) SPs and (x) views.
Do everything that is expensive on the database (like searching, sorting, etc.) and you won't notice a problem.
Also, if you want to display a large grid without paging... then use a DataSet - that one is faster.
Stack Overflow also uses linq2sql - do you see a problem? :)
Use an ORM - it's the way to go for most applications.
PS: also, about micro-benchmarks (like "let's select 10,000 rows with an ORM") - DON'T DO IT. That's not why you use an ORM. If you want to select 10,000 rows, use ADO.
It depends on what you're doing. LINQ is going to be less efficient at the actual data/set manipulation than a real database. But you'll save a lot in not having to connect to the database over a network.
If your database is on the same machine or is formally 'well-connected', you're probably better off using it.
But if you're getting back a large result set from a remote db that could mean significant transmission time, or if it's a really short query that won't justify the overhead, LINQ would likely be better.
Because of the structure of LINQ to SQL, there is no possible way it can be faster than using raw SQL, either your own well-formed queries or as a stored procedure. What LINQ buys you is not speed but type safety and organization; in short most of the benefits that ORMs generally grant you.
LINQ to SQL is not about speed; it's about building a more maintainable software system. It's about all the stuff dedicated software engineers and architects care about, stuff like loose coupling and layering.
That's not to say that you can't build some really unmaintainable code with LINQ -- nobody is keeping you from shooting yourself in the foot but you -- but done properly, LINQ can help tremendously. I'm not saying LINQ is a silver bullet, however. It has a host of issues that make it difficult to use in many enterprise situations -- which is why MS offers Entity Framework (ADO.NET 3.0). Of course, even that's not perfect given the recent EF Vote of No Confidence.
Is LINQ to SQL or even EF better than raw SQL? I'd say a resounding Hells Yeah. Are there other solutions that might work better? Maybe.
