I want to know what is the difference between a query and a view in terms of performance. And if a view is costly, what else besides a query could I do to improve performance?
I can't speak for all databases, but in SQL Server you cannot index views unless you have an Enterprise version. An unindexed view can be significantly poorer in terms of performance than a query especially if you are writing a query against it to add some where conditions. Indexed views generally can perform fairly well. An indexed view can also be against multiple fields which are in differnt tables and that may imporve performance over the ad hoc query. (It may not too, in performance tuning, you must always test against your particular circumstances.)
One point against views is that they do not allow for run-time selection of where criteria. So often you end up with both a view and a query.
Views can be more easily maintained (Just add that new table in a join and everything accessing financial reports has it available) but they are much more difficult to performance tune. This is in part because they tend to be over generalized and thus are slower than their counterparts which only return the minimum necessary. And yes as Jonathan said, you can far too easily get into joining together views for a report into a mess which joins to the same large tables many more times than need be and is very slow.
Two places where views shine though is:
Making sure that complex relationships are always correctly described. This is one reason why report writers tend to favor them.
Limiting access to a subset of records
There are also limitations on the type of queries that can be done for a view vice an ad hoc query or a stored proc. For instance you can't use an if statement (or other procedural type code such as looping) or as noted above you cannot provide run-time values for the where criteria.
One place where views are often significantly slower is when they call other views. The underlying views need to be fully realized in some databases and thus you might need to callup 4,459,203 records to see the 10 you are ultimately interested in. Start to layer this more than once and it can get very slow, very fast; views that call views are simply a poor practice.
Views and ad-hoc queries, in the simple case, are nearly identical in terms of performance. So much so that when you program with a view, you should think of it as though the text of the view definition were being cut and pasted into your parent query.
HLGEM points out in his answer that certain editions of SQL Server allows you to "index" views -- in this case, behind the scenes SQL Server maintains the same structures that underlie a table, making an indexed view and a table very similar in terms of performance.
In SQL Server, though you can generally nest views fairly liberally without running into performance problems, it can make things more difficult to understand and debug.
In SQL Server I believe that the performance difference between views and queries is negligible. What I would recommend doing to improve performance is to create another table that holds the results of the view. You could perhaps create a staging table where new data is held and then a stored procedure can be run at some interval that populates the working table with the new information. A trigger might be good for this purpose. Depending on the requirements of your application this design may or may not be suitable. If you are working with near real-time data, this approach will lead to concurrency issues...
One other thing to look into, is to make absolutely sure that the base tables you are using to construct your view are indexed correctly, and that the query itself is optimized. Finally, I believe it is possible in SQL Server enterprise to create indexed views although I have not used them before.
If they do exactly the same thing a view might be slightly faster on first execution as the database server will have a precompiled execution plan for it. Depends on your server though.
Empasis on might and slightly...
Views promote code reuse and can abstract away database complexity to give a more coherent 'business' model of data. However they are not nearly as tunable. You may find yourself in a position where you need to provide join hints or other low level optimisations and many DBA's that i have worked with do not like them being applied to views as they may then be reused across many queries, the opinion being that these types of hints should be employed as sparingly as possible. I like using views myself.
A view is barely more expensive to the computer than writing out the query longhand. A view can save the programmer/user a lot of time writing the same query out time after time, and getting it wrong, and so on. The view may also be the only way to access the data if views are also used to enforce authorization (access control) on the underlying tables.
If the query does not perform well, you need to review how the query is formed, and whether the tables all have the appropriate indexes on them. If your system needs accurate statistics for the optimizer to perform well, have you updated those statistics sufficiently recently?
Once upon a long time ago, I came across a system where a query generator had created one query that listed seventeen tables in a single FROM clause, including several LEFT OUTER JOIN of a table with itself. And, in fact, closer scrutiny revealed that several of the 'tables' were in fact multi-table views, and some of these also involved self outer joins, and were themselves involved in self outer joins of the view. To say "ghastly" is an understatement. There was a lot of cleanup possible to improve the performance of that query - eliminating unnecessary outer joins, self joins, and so on. (It actually pre-dated the explicit join notation of SQL-92 - I said a long time ago - so the outer join syntax was DBMS-specific.)
If you mean network performance then working from a local cache (as with ADO.Net DataSets) would reduce network traffic- but could cause problems with locking. Just a thought.
A view is still a query, it just abstracts certain parts of it so that your queries can be simplified (if they do similar things) and to maximize reuse.
Related
I just found out that a report I quickly threw together years ago has been the sole means of collecting millions of dollars, and there isn't anything being done to check if it is correct.
For performance reasons, the report makes heavy use of indexed views. This concerns me, since while I have used indexed views a lot, I tend not to use them for anything this critical.
Is it possible that the indexed views can fail to update or otherwise return information different from the data in the tables? How real of a risk is this? Is there a good SQL script I can run periodically to check for errors?
There is zero risk for inconsistency according to the docs.
In practice, you have to deal with product bugs. They are not a realistic concern.
Indexed view maintenance is based on the exact same mechanism that indexes are based on: They are updated as part of the DML query plan. I guess you wouldn't expect indexes to become corrupt so you should trust indexes views with about the same strength.
I am trying to layout the tables for use in new public-facing website. Seeing how there will lots more reading than writing data (guessing >85% reading) I would like to optimize the database for reading.
Whenever we list members we are planning on showing summary information about the members. Something akin to the reputation points and badges that stackoverflow uses. Instead of doing a subquery to find the information each time we do a search, I wanted to have a "calculated" field in the member table.
Whenever an action is initiated that would affect this field, say the member gets more points, we simply update this field by running a query to calculate the new values.
Obviously, there would be the need to keep this field up to date, but even if the field gets out of sync, we can always rerun the query to update this field.
My question: Is this an appropriate approach to optimizing the database? Or are the subqueries fast enough where the performance would not suffer.
There are two parts:
Caching
Tuned Query
Indexed Views (AKA Materialized views)
Tuned table
The best solution requires querying the database as little as possible, which would require caching. But you still need a query to fill that cache, and the cache needs to be refreshed when it is stale...
Indexed views are the next consideration. Because they are indexed, querying against is faster than an ordinary view (which is equivalent to a subquery). Nonclustered indexes can be applied to indexed views as well. The problem is that indexed views (materialized views in general) are very constrained to what they support - they can't have non-deterministic functions (IE: GETDATE()), extremely limited aggregate support, etc.
If what you need can't be handled by an indexed view, a table where the data is dumped & refreshed via a SQL Server Job is the next alternative. Like the indexed view, indexes would be applied to make fetching data faster. But data change means cleaning up the indexes to ensure the query is running as best it can, and this maintenance can take time.
The least expensive database query is the one that you don't have to run against the database at all.
In the scenario you describe, using a high-performance cache technology (example: memcached) to store query results in your application can be a lot better strategy than trying to trick out the database to be highly scalable.
The First Rule of Program Optimization: Don't do it.
The Second Rule of Program Optimization (for experts only!): Don't do it yet.
Michael A. Jackson
If you are just designing the tables, I'd say, it's definitely premature to optimize.
You might want to redesign your database a few days later, you might find out that things work pretty fast without any clever hacks, you might find out they work slow, but in a different way than you expected. In either case you would waste your time, if you start optimizing now.
The approach you describe is generally fine; you could get some pre-computed values, either using triggers/SPs to preserve data consistency, or running a job to update these values time-to-time.
All databases are more than 85% read only! Usually high nineties too.
Tune it when you need to and not before.
I just heard that you should create an index on any column you're joining or querying on. If the criterion is this simple, why can't databases automatically create the indexes they need?
Well, they do; to some extent at least...
See SQL Server Database Engine Tuning Advisor, for instance.
However, creating optimal indexes is not as simple as you mentioned. An even simpler rule could be to create indexes on every column (which is far from optimal)!
Indexes are not free. You create indexes at the cost of storage and update performance among other things. They should be carefully thought about to be optimal.
Every index you add may increase the speed of your queries. It will decrease the speed of your updates, inserts and deletes and it will increase disk space usage.
I, for one, would rather keep the control to myself, using tools such as DB Visualizer and explain statements to provide the information I need to evaluate what should be done. I do not want a DBMS unilaterally deciding what's best.
It's far better, in my opinion, that a truly intelligent entity be making decisions re database tuning. The DBMS can suggest all it wants but the final decision should be left up to the DBAs.
What happens when the database usage patterns change for one week? Do you really want the DBMS creating indexes and destroying them a week later? That sounds like a management nightmare scenario right up alongside Skynet :-)
This is a good question. Databases could create the indexes they need based on data usage patterns, but this means that the database would be slow the first time certain queries were executed and then get faster as time goes on. For example if there is a table like this:
ID USERNAME
-- --------
: then the username would be used to look up the users very often. After some time the database could see that say 50% of queries did this, in which case it could add an index on the username.
However the reason that this hasn't been implemented in great detail is simply because it is not a killer feature. Adding indexes is performed relatively few times by the DBA, and by automating this (which is a very big task) is probably just not worth it for the database vendors. Remember that every query will have to be analyzed to enable auto indexes, and also the query response time, and result set size as well, so it is non-trivial to implement.
Because databases simply store and retrieve data - the database engine has no clue how you intend to retrieve that data until you actually do it, in which case it is too late to create an index. And the column you are joining on may not be suitable for an efficient index.
It's a non-trivial problem to solve, and in many cases a sub-optimal automatic solution might actually make things worse. Imagine a database whose read operations were sped up by automatic index creation but whose inserts and updates got hosed as a result of the overhead of managing the index? Whether that's good or bad depends on the nature of your database and the application it's serving.
If there were a one-size-fits-all solution, databases would certainly do this already (and there are tools to suggest exactly this sort of optimization). But tuning database performance is largely an app-specific function and is best accomplished manually, at least for now.
An RDBMS could easily self-tune and create indices as it saw fit but this would only work for simple cases with queries that do not have demanding execution plans. Most indices are created to optimize for specific purposes and these kinds of optimizations are better handled manually.
I have Six tables which will joined in many queries.If a View created by joining all the tables will increase the performance?.Is there are any alternatives to increase the performance?. I am using oracle as the database and all the joined columns are indexed.
A view in and of itself will not increase performance. With that said depending on the database engine you are using there are things you can do with a view.
In SQL Server you can put an index on the view (Assuming the view fits a variety of requirements). This can greatly improve the performance.
What DBMS are you using?
Edit
In oracle you can create a Materialized View. Here's a link which should get you started. I don't know Oracle so I really can only provide links.
http://searchoracle.techtarget.com/expert/KnowledgebaseAnswer/0,289625,sid41_gci909605,00.html
Correct indexing is the key to improving performance when joining to many tables.
Add a view may improve performance if it is indexed or may decrease performance if it is not. A lot depends on your database design and the actual query the view is based on. Once thing a view will do is ensure that everyone who uses is is looking at the same six tables. This is a plus or a minus depending on the query (not all queries based on the view may actually need all six tables).
The very first thing I would look at is whether your foreign key fields are currently indexed. Primary keys are given an index when created, but foreign keys are not. Since they are used in the joins, they should be indexed for performance.
It should be noted how views work: They run the view query at run time. They are not pre-cached, so without indexes, they are essentially function calls.
Make heavy uses of indexes on your tables and views to improve performance. Don't shotgun them, but well designed indexes can be day and night for your performance.
It also depends on the Database. If the RDMS does not analyze the underlying query based on your criteria, it may not perform well at all.
Most current RDMS's will do a query rewrite (I think some earlier versions of SQL Server, like 6 or 7, did not do this), it will pull apart the view in light of your query, then rewrite the whole thing based on its optimization techniques.
The various iterations of queries using the view could have their plans cached and gain some performance in that way.
So, from my understanding, at least in theory (depending on the database optimizer), using views is not different then writing the whole query.
Are database views only a means to simplify the access of data or does it provide performance benefits when accessing the views as opposed to just running the query which the view is based on? I suspect views are functionally equivalent to just the adding the stored view query to each query on the view data, is this correct or are there other details and/or optimizations happening?
I have always considered Views to be like a read-only Stored Procedures. You give the database as much information as you can in advance so it can pre-compile as best it can.
You can index views as well allowing you access to an optimised view of the data you are after for the type of query you are running.
Although a certain query running inside a view and the same query running outside of the view should perform equivalently, things get much more complicated quickly when you need to join two views together. You can easily end up bringing tables that you don't need into the query, or bringing tables in redundantly. The database's optimizer may have more trouble creating a good query execution plan. So while views can be very good in terms of allowing more fine grained security and the like, they are not necessarily good for modularity.
It depends on the RDBMS, but usually there isn't optimization going on, and it's just a convenient way to simplify queries. Some database systems use "materialized views" however, which do use a caching mechanism.
Usually a view is just a way to create a common shorthand for defining result sets that you need frequently.
However, there is a downside. The temptation is to add in every column you think you might need somewhere sometime when you might like to use the view. So YAGNI is violated. Not only columns, but sometimes additional outer joins get tacked on "just in case". So covering indexes might not cover any more, and the query plan may increase in complexity (and drop in efficiency).
YAGNI is a critical concept in SQL design.
Generally speaking, views should perform equivalently to a query written directly on the underlying tables.
But: there may be edge cases, and it would behoove you to test your code. All modern RDBMS systems have tools that will let you see the queryplans, and monitor execution. Don't take my (or anybody else's) word for it, when you can have the definitive data at your fingertips.
I know this is an old thread. Discussion is good, but I do want to throw in one more thought. Performance also depends on what you are using to pull data with. For example, if you are front-ending with something like Microsoft Access you can definately gain performance for some complex queries by using a view. This is because Access does not always pull from the SQL server as we would like -- in some cases it would pull entire tables across then try to process locally from there! Not so if you use a view.
Yes, in all modern RDBMS's (MSSQL after 2005? etc) view's query plans are cached removing the overhead of planning the query and speeding up performance over the same SQL performed in-line. Previously to this (and it applies to parameterized SQL/Prepared Statements as well) people correctly thought stored procedures performed better.
Many still hang onto this today making it a modern DB myth. Ever since Views/PS's got the cached query planning of SPs they've been pretty much even.