I have Six tables which will joined in many queries.If a View created by joining all the tables will increase the performance?.Is there are any alternatives to increase the performance?. I am using oracle as the database and all the joined columns are indexed.
A view in and of itself will not increase performance. With that said depending on the database engine you are using there are things you can do with a view.
In SQL Server you can put an index on the view (Assuming the view fits a variety of requirements). This can greatly improve the performance.
What DBMS are you using?
Edit
In oracle you can create a Materialized View. Here's a link which should get you started. I don't know Oracle so I really can only provide links.
http://searchoracle.techtarget.com/expert/KnowledgebaseAnswer/0,289625,sid41_gci909605,00.html
Correct indexing is the key to improving performance when joining to many tables.
Add a view may improve performance if it is indexed or may decrease performance if it is not. A lot depends on your database design and the actual query the view is based on. Once thing a view will do is ensure that everyone who uses is is looking at the same six tables. This is a plus or a minus depending on the query (not all queries based on the view may actually need all six tables).
The very first thing I would look at is whether your foreign key fields are currently indexed. Primary keys are given an index when created, but foreign keys are not. Since they are used in the joins, they should be indexed for performance.
It should be noted how views work: They run the view query at run time. They are not pre-cached, so without indexes, they are essentially function calls.
Make heavy uses of indexes on your tables and views to improve performance. Don't shotgun them, but well designed indexes can be day and night for your performance.
It also depends on the Database. If the RDMS does not analyze the underlying query based on your criteria, it may not perform well at all.
Most current RDMS's will do a query rewrite (I think some earlier versions of SQL Server, like 6 or 7, did not do this), it will pull apart the view in light of your query, then rewrite the whole thing based on its optimization techniques.
The various iterations of queries using the view could have their plans cached and gain some performance in that way.
So, from my understanding, at least in theory (depending on the database optimizer), using views is not different then writing the whole query.
Related
I just found out that a report I quickly threw together years ago has been the sole means of collecting millions of dollars, and there isn't anything being done to check if it is correct.
For performance reasons, the report makes heavy use of indexed views. This concerns me, since while I have used indexed views a lot, I tend not to use them for anything this critical.
Is it possible that the indexed views can fail to update or otherwise return information different from the data in the tables? How real of a risk is this? Is there a good SQL script I can run periodically to check for errors?
There is zero risk for inconsistency according to the docs.
In practice, you have to deal with product bugs. They are not a realistic concern.
Indexed view maintenance is based on the exact same mechanism that indexes are based on: They are updated as part of the DML query plan. I guess you wouldn't expect indexes to become corrupt so you should trust indexes views with about the same strength.
This is a follow up to a previous question of mine after definitely deciding on partition switching as the best way to quickly get data into a heavily indexed fact type table that needs to remain available to readers.
While it seems to be the best way, it is not quite good enough to really satisfy the requirement to allow several (< 5) users to bulk insert at the same time, have the new data indexed and to appear in the indexed views (not necessarily real indexed views, just selects that rely on indices).
The idea of partitioning was that each partition and the index subtree rooted at the partition could, in parallel, be locked as read-only, copied into a working table, new data inserted/updated and the indexes rebuilt then switched back into the main table so readers aren't affected.
The problem is the single working table. Each parallel bulk insert needs its own copy, with the same constraints as the main table to allow switching.
So far I've hit several walls trying to get around this bottleneck:
I tried partitioning the working table using the same partition
function. This doesn't work because you can't disable the indexes on
a partition basis to insert into one while rebuilding the index on
another.
Creating a temporary table as the working table. This
doesn't work because, while you can use the same index names, you
can't easily dynamically create the constraints and can't switch
that in anyway.
Have a fixed set of named working tables? How can I select one and work with it under an alias so I have just one stored proc?
Dynamic SQL? I've tried very hard to avoid going that route. It's complicated as it is.
Big challenge but has anyone got any ideas before I accept the bottleneck? Would Sql 2012 help? How do proper data warehouses cope with this?
How do proper data warehouses cope with this? Compromise and set realistic goals for the EDW. The data warehouse can't be everything to everyone. Make sure that what you're implementing is the best solution for the business (not just the techies/analysts). Are your goals realistic if you cannot find solutions from experienced peers and experts?
Associate a cost with all of the hoops you jump through. Does the data really need to be up to the minute? What if I told you that we needed to spend another $200,000 on storage because we're constantly duplicating partitions and rebuilding indexes and the current solution can't keep up with the IOPS demand? At some point, they're going to figure out that it's not free. While you don't need to just say no, you do need to be realistic and up-front about the cost associated. Additionally, your storage admin will thank you.
As for 2012, there is a new columnstore index which can reduce or replace all of the current nonclustereds you're using to cover all you're analysts search requests. It's highly compressed, covers a very wide variety of search arguments, and utilizes the new Batch execution mode. It performs best on low selectivity queries like the ones frequently performed on fact tables. The one catch is that you can't directly do updates. You'll have to switch the partition out to a staging table, drop the columnstore on the staging table, update the staging table, add the columnstore back, then switch the partition back into the fact table. It sounds like alot, but could be significantly faster and require less IO than maintaining all of those nonclustereds.
My question has always been "Is it really a fact table if it is constantly changing?". This is not OLTP is it? Try offsetting transactions or at least push all updates to a scheduled off-peak time. Updating fact tables is becoming a thing of the past. All of the big boys are moving toward the "Update frowned upon" column oriented architecture for data warehousing. PowerPivot and the Analysis Services Tabular Model are built on the columnstore technology.
Finally, Review Kimballs' DW Toolkit books. He has several that lay out best practices and cover edge-case scenarios. What I learned from them was that Data Warehouse Development is not just Database Development on steroids. It also involves politics and focusing resources on what's best for the business.
I am trying to layout the tables for use in new public-facing website. Seeing how there will lots more reading than writing data (guessing >85% reading) I would like to optimize the database for reading.
Whenever we list members we are planning on showing summary information about the members. Something akin to the reputation points and badges that stackoverflow uses. Instead of doing a subquery to find the information each time we do a search, I wanted to have a "calculated" field in the member table.
Whenever an action is initiated that would affect this field, say the member gets more points, we simply update this field by running a query to calculate the new values.
Obviously, there would be the need to keep this field up to date, but even if the field gets out of sync, we can always rerun the query to update this field.
My question: Is this an appropriate approach to optimizing the database? Or are the subqueries fast enough where the performance would not suffer.
There are two parts:
Caching
Tuned Query
Indexed Views (AKA Materialized views)
Tuned table
The best solution requires querying the database as little as possible, which would require caching. But you still need a query to fill that cache, and the cache needs to be refreshed when it is stale...
Indexed views are the next consideration. Because they are indexed, querying against is faster than an ordinary view (which is equivalent to a subquery). Nonclustered indexes can be applied to indexed views as well. The problem is that indexed views (materialized views in general) are very constrained to what they support - they can't have non-deterministic functions (IE: GETDATE()), extremely limited aggregate support, etc.
If what you need can't be handled by an indexed view, a table where the data is dumped & refreshed via a SQL Server Job is the next alternative. Like the indexed view, indexes would be applied to make fetching data faster. But data change means cleaning up the indexes to ensure the query is running as best it can, and this maintenance can take time.
The least expensive database query is the one that you don't have to run against the database at all.
In the scenario you describe, using a high-performance cache technology (example: memcached) to store query results in your application can be a lot better strategy than trying to trick out the database to be highly scalable.
The First Rule of Program Optimization: Don't do it.
The Second Rule of Program Optimization (for experts only!): Don't do it yet.
Michael A. Jackson
If you are just designing the tables, I'd say, it's definitely premature to optimize.
You might want to redesign your database a few days later, you might find out that things work pretty fast without any clever hacks, you might find out they work slow, but in a different way than you expected. In either case you would waste your time, if you start optimizing now.
The approach you describe is generally fine; you could get some pre-computed values, either using triggers/SPs to preserve data consistency, or running a job to update these values time-to-time.
All databases are more than 85% read only! Usually high nineties too.
Tune it when you need to and not before.
Are database views only a means to simplify the access of data or does it provide performance benefits when accessing the views as opposed to just running the query which the view is based on? I suspect views are functionally equivalent to just the adding the stored view query to each query on the view data, is this correct or are there other details and/or optimizations happening?
I have always considered Views to be like a read-only Stored Procedures. You give the database as much information as you can in advance so it can pre-compile as best it can.
You can index views as well allowing you access to an optimised view of the data you are after for the type of query you are running.
Although a certain query running inside a view and the same query running outside of the view should perform equivalently, things get much more complicated quickly when you need to join two views together. You can easily end up bringing tables that you don't need into the query, or bringing tables in redundantly. The database's optimizer may have more trouble creating a good query execution plan. So while views can be very good in terms of allowing more fine grained security and the like, they are not necessarily good for modularity.
It depends on the RDBMS, but usually there isn't optimization going on, and it's just a convenient way to simplify queries. Some database systems use "materialized views" however, which do use a caching mechanism.
Usually a view is just a way to create a common shorthand for defining result sets that you need frequently.
However, there is a downside. The temptation is to add in every column you think you might need somewhere sometime when you might like to use the view. So YAGNI is violated. Not only columns, but sometimes additional outer joins get tacked on "just in case". So covering indexes might not cover any more, and the query plan may increase in complexity (and drop in efficiency).
YAGNI is a critical concept in SQL design.
Generally speaking, views should perform equivalently to a query written directly on the underlying tables.
But: there may be edge cases, and it would behoove you to test your code. All modern RDBMS systems have tools that will let you see the queryplans, and monitor execution. Don't take my (or anybody else's) word for it, when you can have the definitive data at your fingertips.
I know this is an old thread. Discussion is good, but I do want to throw in one more thought. Performance also depends on what you are using to pull data with. For example, if you are front-ending with something like Microsoft Access you can definately gain performance for some complex queries by using a view. This is because Access does not always pull from the SQL server as we would like -- in some cases it would pull entire tables across then try to process locally from there! Not so if you use a view.
Yes, in all modern RDBMS's (MSSQL after 2005? etc) view's query plans are cached removing the overhead of planning the query and speeding up performance over the same SQL performed in-line. Previously to this (and it applies to parameterized SQL/Prepared Statements as well) people correctly thought stored procedures performed better.
Many still hang onto this today making it a modern DB myth. Ever since Views/PS's got the cached query planning of SPs they've been pretty much even.
I want to know what is the difference between a query and a view in terms of performance. And if a view is costly, what else besides a query could I do to improve performance?
I can't speak for all databases, but in SQL Server you cannot index views unless you have an Enterprise version. An unindexed view can be significantly poorer in terms of performance than a query especially if you are writing a query against it to add some where conditions. Indexed views generally can perform fairly well. An indexed view can also be against multiple fields which are in differnt tables and that may imporve performance over the ad hoc query. (It may not too, in performance tuning, you must always test against your particular circumstances.)
One point against views is that they do not allow for run-time selection of where criteria. So often you end up with both a view and a query.
Views can be more easily maintained (Just add that new table in a join and everything accessing financial reports has it available) but they are much more difficult to performance tune. This is in part because they tend to be over generalized and thus are slower than their counterparts which only return the minimum necessary. And yes as Jonathan said, you can far too easily get into joining together views for a report into a mess which joins to the same large tables many more times than need be and is very slow.
Two places where views shine though is:
Making sure that complex relationships are always correctly described. This is one reason why report writers tend to favor them.
Limiting access to a subset of records
There are also limitations on the type of queries that can be done for a view vice an ad hoc query or a stored proc. For instance you can't use an if statement (or other procedural type code such as looping) or as noted above you cannot provide run-time values for the where criteria.
One place where views are often significantly slower is when they call other views. The underlying views need to be fully realized in some databases and thus you might need to callup 4,459,203 records to see the 10 you are ultimately interested in. Start to layer this more than once and it can get very slow, very fast; views that call views are simply a poor practice.
Views and ad-hoc queries, in the simple case, are nearly identical in terms of performance. So much so that when you program with a view, you should think of it as though the text of the view definition were being cut and pasted into your parent query.
HLGEM points out in his answer that certain editions of SQL Server allows you to "index" views -- in this case, behind the scenes SQL Server maintains the same structures that underlie a table, making an indexed view and a table very similar in terms of performance.
In SQL Server, though you can generally nest views fairly liberally without running into performance problems, it can make things more difficult to understand and debug.
In SQL Server I believe that the performance difference between views and queries is negligible. What I would recommend doing to improve performance is to create another table that holds the results of the view. You could perhaps create a staging table where new data is held and then a stored procedure can be run at some interval that populates the working table with the new information. A trigger might be good for this purpose. Depending on the requirements of your application this design may or may not be suitable. If you are working with near real-time data, this approach will lead to concurrency issues...
One other thing to look into, is to make absolutely sure that the base tables you are using to construct your view are indexed correctly, and that the query itself is optimized. Finally, I believe it is possible in SQL Server enterprise to create indexed views although I have not used them before.
If they do exactly the same thing a view might be slightly faster on first execution as the database server will have a precompiled execution plan for it. Depends on your server though.
Empasis on might and slightly...
Views promote code reuse and can abstract away database complexity to give a more coherent 'business' model of data. However they are not nearly as tunable. You may find yourself in a position where you need to provide join hints or other low level optimisations and many DBA's that i have worked with do not like them being applied to views as they may then be reused across many queries, the opinion being that these types of hints should be employed as sparingly as possible. I like using views myself.
A view is barely more expensive to the computer than writing out the query longhand. A view can save the programmer/user a lot of time writing the same query out time after time, and getting it wrong, and so on. The view may also be the only way to access the data if views are also used to enforce authorization (access control) on the underlying tables.
If the query does not perform well, you need to review how the query is formed, and whether the tables all have the appropriate indexes on them. If your system needs accurate statistics for the optimizer to perform well, have you updated those statistics sufficiently recently?
Once upon a long time ago, I came across a system where a query generator had created one query that listed seventeen tables in a single FROM clause, including several LEFT OUTER JOIN of a table with itself. And, in fact, closer scrutiny revealed that several of the 'tables' were in fact multi-table views, and some of these also involved self outer joins, and were themselves involved in self outer joins of the view. To say "ghastly" is an understatement. There was a lot of cleanup possible to improve the performance of that query - eliminating unnecessary outer joins, self joins, and so on. (It actually pre-dated the explicit join notation of SQL-92 - I said a long time ago - so the outer join syntax was DBMS-specific.)
If you mean network performance then working from a local cache (as with ADO.Net DataSets) would reduce network traffic- but could cause problems with locking. Just a thought.
A view is still a query, it just abstracts certain parts of it so that your queries can be simplified (if they do similar things) and to maximize reuse.