I'm primarily a Java developer who works with Hibernate, and in some of my use cases I have queries which perform very slowly compared to what I expect. I've talked to the local DBAs and in many cases they claim the performance can't be improved because of the nature of the query.
However, I'm somewhat hesitant to take them at their word. What resources can I use to learn when I have to suck it up and either find a different way of getting the information I want or learn to live with the speed, and when I can call bullshit on the DBAs?
You are at an interesting juncture. Most Java developers use ORM tools because they don't know anything about databases, they don't want to learn anything about databases and especially they don't want to learn anything about the peculiarities of a specific proprietary DBMS. ORMs ostensibly shield us from all this.
But if you really want to understand how a query ought to perform you are going to have to understand how the Oracle database works. This is a good thing: you will definitely build better Hibernate apps if you work with the grain of the database.
Oracle's documentation set includes a volume on Performance Tuning. This is the place to start. As others have said, the entry level tool is EXPLAIN PLAN. Another essential read is Oracle's Database Concepts Guide. A vital aspect of tuning is understanding the logical and physical architecture of the database. I also agree with DCookie's recommendation of Tom Kyte's book.
Bear in mind that there are contractors and consultants who make a fine living out of Oracle performance tuning. If it was easy they would be a lot poorer. But you can certainly give yourself enough knowledge to force those pesky DBAs to engage with you properly.
DCookie, OMG, and APC all get +1 for their answers. I've been on the DBA side of Hibernate apps and as Tom Kyte puts it, there is no "fast=true" parameter that can take inefficient SQL and make it run faster. Once in a while I've been able to capture a problematic Hibernate query and use the Oracle SQL analyzer to generate a SQL Profile for that statement that improves performance. This profile is a set of hints that "intercept" the optimizer's generation of an execution plan and force it (with no change to the app) to a better one that would normally be overlooked by the optimizer for some reason. Still, these findings are the exception rather than the rule for poorly-performing SQL.
One thing you can do as a developer who presumably understands the data better than the ORM layer is to write efficient views to address specific query problems and present these views to Hibernate.
A very specific issue to watch out for is Oracle DATE (not TIMESTAMP) columns that end up in the Hibernate-generated queries and are compared to bind variables in a WHERE clause - the type mismatch with Java timestamp data types will prevent the use of indexes on these columns.
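To see the kind of thing the DATE/timestamp warning is about, here is a minimal sketch. It uses Python's bundled SQLite rather than Oracle, so the table, the index name and the datetime() call are all stand-ins I made up for illustration; the only point is that a conversion applied to an indexed column in the WHERE clause turns an index search into a full scan, and that the query plan shows it.

```python
import sqlite3

# SQLite stands in for Oracle here; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT, payload TEXT)"
)
conn.execute("CREATE INDEX idx_events_created ON events (created_at)")
conn.executemany(
    "INSERT INTO events (created_at, payload) VALUES (?, ?)",
    [(f"2024-01-{d:02d} 10:00:00", f"event {d}") for d in range(1, 29)],
)


def plan(sql, params=()):
    """Return the 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


ts = "2024-01-15 10:00:00"

# Column compared as-is: the optimizer can search the index.
direct_sql = "SELECT payload FROM events WHERE created_at = ?"
# Column wrapped in a conversion: the plain index no longer applies.
converted_sql = "SELECT payload FROM events WHERE datetime(created_at) = ?"

direct_plan = plan(direct_sql, (ts,))
converted_plan = plan(converted_sql, (ts,))
direct_rows = conn.execute(direct_sql, (ts,)).fetchall()
converted_rows = conn.execute(converted_sql, (ts,)).fetchall()

print(direct_plan)
print(converted_plan)
```

Both queries return the same row; only the plan differs. On Oracle you would look for the equivalent symptom in the EXPLAIN PLAN output of the captured Hibernate query.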
What is the basis for your query performance expectations? Gut feel? If you're going to argue with the DBAs, you'll need to know (at a minimum) what your queries are, understand EXPLAIN PLAN, and be able to point out how to improve things. And, as @OMG Ponies points out, Hibernate may be doing a poor job of constructing queries - then what do you do?
Not easy? Perhaps a better approach would be to take a less adversarial approach with the DBA staff and ask politely what it is about the queries that inhibits performance improvements, and whether they have any suggestions about how you could refactor them to perform better.
Posting my comment as an answer:
That's the trade-off with using an ORM - you're at its mercy for how it constructs the query that is shipped to the database. LINQ is the only one that interested me, because you can use it in tandem with stored procedures for situations like these. I'm surprised the DBAs don't tell you to ditch ORM if you want better speed...
The EXPLAIN plan will give you an idea of the efficiency, but not really a perspective on speed. For Oracle, you'd need to use tkprof (assuming available to you) to analyze what is going on.
It may also be a table structure (normalization) issue. Mostly, this is exactly why I don't go with Hibernate - you should always be able to write your own queries that are optimal.
We're evaluating EF4 and my DBA says we must use the NOLOCK hint in all our SELECT statements. So I'm looking into how to make this happen when using EF4.
I've read the different ideas on how to make this happen in EF4, but all seem like a work around and not sanctioned by Microsoft or EF4. What is the "official Microsoft" response to someone who wants their SELECT statement(s) to include the NOLOCK hint when using LINQ-to-SQL / LINQ-to-Entities and EF4?
By the way, the absolute best information I have found was right here and I encourage everyone interested in this topic to read this thread.
Thanks.
NOLOCK = "READ UNCOMMITTED" = dirty reads
I'd assume MS knows why they chose the default isolation level as "READ COMMITTED"
NOLOCK, in fact any hint, should be used very judiciously: not by default.
Your DBA is a muppet. See this (SO): What can happen as a result of using (nolock) on every SELECT in SQL Sever?. If you happen to work at a bank, or any institution where I may have an account please let me know so I can close it.
I'm a developer on a tools team in the SQL org at Microsoft. I'm in no way authorized to make any official statement, and I'm sure there are people on SO who know more about these things than I do. Nevertheless, I'll offer a friendly rule of thumb, along the theme of "Premature optimization is the root of all evil":
Don't use NOLOCK (or any other query hint for that matter), until you have to. If you have a select statement which has a decent query plan, and it runs fine when there is very little other load on the system, but then it slows down when other queries are accessing the same table, try adding some NOLOCK hints. But always understand that when you do, you run the risk of getting inconsistent data. If you are writing some mission critical app that does online banking or controls an aircraft, this may be unacceptable. However, for many applications the perf speedup is worth the risk. Evaluate on a case-by-case basis, though. Don't just use them willy nilly all over the place.
If you do choose to use NOLOCK, I have blogged a solution in C# using extension methods, so that you can easily change a LINQ query to use NOLOCK hints. If you can adapt this to EF4, please post your adaptation.
EF4 does not currently have a built-in way to do it if EF4 is generating all your queries.
There are ways around this such as using stored procedures or a more extended inline query model, however, this can be time consuming to say the least.
I believe (and I don't speak for Microsoft on this) that caching is Microsoft's intended solution for lightening the load on the server in EF4 sites. Having read uncommitted (or nolock) built into a framework would create unpredictable issues for the expected behaviour of EF4 when 2 contexts are run at the same time. That doesn't mean your situation needs that level of concurrency.
It sounds like you were asked for nolock on ALL selects. While I agree with the earlier poster that this can be dangerous if you have ANY transactions that need to be transactions, I don't agree that automatically makes the DBA a muppet. You might just be running a CMS which is totally cool for dirty reads. You can change the ISOLATION LEVEL on your whole database which can have the same effect.
The DBA may have recommended nolock for operations that were ONLY selects (which is fine, especially if there's an ORM being misused and doing some dodgy data dumps). The funniest thing about that muppet comment is that Stack Overflow itself runs SQL server in a READ UNCOMMITTED mode. Guess you need to find somewhere else to get answers for your problems then?
Talk to your DBA about the possibility of setting this on a database level or consider a caching strategy if you only need it in a few places. The web is stateless after all so concurrency can often be an illusion anyway unless you address it directly.
Info about isolation levels
Having worked with EF4 for over a year now, I will offer that using stored procedures for specific tasks is not a hack and absolutely necessary for performance under certain situations.
Our platform gets a lot of traffic through our web site, APIs and ETL data feeds. We use EF primarily on our web side, but also for some back-end processes. Sometimes EF does a great job with its query generation, sometimes it is terrible. You need to look at the queries being generated, load them into query analyzer, and decide whether you might be better off writing the operation in another way (stored procedure, etc.).
If you find that you need to make data available via EF and need NOLOCKs, you can always create views with the NOLOCK hints included, and expose the view to EF instead of the underlying table. The same can be done with Stored Procedures. These methods are probably a bit easier when you are using the Code First approach.
But I think that one mistake a lot of people make with EF is believing that the EF object model has to map directly to the physical (table) model in the database. It doesn't and this is where your DBA comes into play. Let him design your physical model and you work together to abstract your logical data model which is mapped to your object model in EF.
Although this would be a major PITA to do, you can always drop your SQL in a stored procedure and get the functionality you need (or are forced into). It's definitely a hack though!
I know this isn't an answer to your question, but I just wanted to throw this in.
It seems to me that this is (at least partially) the DBA's job. It's fine to say that an application should behave a certain way, and you can and should certainly attempt to program it the way that he would like.
The only way to be sure though, is for the DBA to work on the application with you and construct the DB surface that he would like to present to the app. If he wants critical tables to be queried as READ UNCOMMITTED, then he should help to provide a set of stored procedures with the correct access and isolation level.
Relying on the application code to construct every ad-hoc query correctly is not a scalable approach.
Triggers seem like a simple solution for audit logging. Why should I use interceptors?
Database portability is one con of triggers...
what are others?
The con of using anything except a trigger is that not all data changes may take place through the GUI and therefore might not get logged. You have to consider that databases are changed from many sources including data imports and set-based queries from the query window (for instance when someone is asked to update all prices by 10%). If you use another method, you had better make sure that it captures any way data can be changed. If you use dynamic SQL at all, then all your tables are open to the users to make changes directly in the database, including fraudulent changes designed to steal from the company. Users committing fraud are one of the key things audit triggers are designed to catch. If you think your audit solution is OK because it captures everything from the user interface and that is all it needs to capture, you are very, very wrong. I don't know how interceptors work, but you had better test with SSIS (or DTS) imports and queries from the query window before you think the solution will work. Also if it works just from the GUI, remember there might be more than one GUI connecting to a database.
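As a rough illustration of that point, here is a small sketch using Python's bundled SQLite instead of SQL Server. The products and price_audit tables and the trigger name are hypothetical. The "update all prices by 10%" statement is issued straight against the database, with no application or interceptor in the path, and the trigger still records every changed row.

```python
import sqlite3

# SQLite stand-in; table and trigger names are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE price_audit (
    product_id INTEGER,
    old_price  REAL,
    new_price  REAL,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Fires for every row whose price column is updated, whatever the source.
CREATE TRIGGER trg_products_price_audit
AFTER UPDATE OF price ON products
FOR EACH ROW
BEGIN
    INSERT INTO price_audit (product_id, old_price, new_price)
    VALUES (OLD.id, OLD.price, NEW.price);
END;
INSERT INTO products (name, price)
VALUES ('widget', 10.0), ('gadget', 20.0), ('gizmo', 30.0);
""")

# A set-based change typed into a "query window": no GUI, no ORM involved.
conn.execute("UPDATE products SET price = price * 1.10")

audit_rows = conn.execute(
    "SELECT product_id, old_price, new_price FROM price_audit ORDER BY product_id"
).fetchall()
for row in audit_rows:
    print(row)
```

An interceptor in the application layer would never see that UPDATE, which is exactly the gap being described.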
I think that the reason for using interceptors is two fold:
So that you don't tie yourself to a particular database. the porting to different DBMS is significantly easier.
So that your domain model doesn't bleed into other areas of your code. ie the database needing to know about if a record has changed.
But all this depends on the context. If it is vital that all changes to particular records are captured, then I think that HLGEM is correct: triggers are the best for handling that type of scenario.
I agree with HLGEM.
A good way to get the advantages of both triggers and DBMS portability is to use an auditing tool that:
Given an audit plan: generate the triggers for the appropriate DBMS
Pablo Javier
Another small issue is with triggers doing any DML. nHibernate uses the affected row count to determine success on a lot of its operations. If you are doing any inserts/updates/etc. inside your triggers, then you'll need to turn NOCOUNT on inside the triggers to prevent those false rowcounts from bubbling up.
Not that this, in any way, prevents you from making triggers work, but I've spent enough time refactoring away from this problem I thought it was worth mentioning. Interceptors, or EventListeners, are a simple, portable way to satisfy auditing requirements.
Plus, no more icky T-SQL code...
Triggers aren't easily testable and they're actually quite difficult to write properly. And if your audit data is to be consumed by business users, it's often difficult to translate from database row-level operations back into the domain model.
I'm of the opinion that a database is really just the persistence area of an application. Of a single application. In other words, I don't think that other systems should use my database directly and so I think auditing should be done outside of the database (i.e. not with triggers).
Are database views only a means to simplify the access of data, or do they provide performance benefits when accessing the views as opposed to just running the query which the view is based on? I suspect views are functionally equivalent to just adding the stored view query to each query on the view data. Is this correct, or are there other details and/or optimizations happening?
I have always considered views to be like read-only stored procedures. You give the database as much information as you can in advance so it can pre-compile as best it can.
You can index views as well allowing you access to an optimised view of the data you are after for the type of query you are running.
Although a certain query running inside a view and the same query running outside of the view should perform equivalently, things get much more complicated quickly when you need to join two views together. You can easily end up bringing tables that you don't need into the query, or bringing tables in redundantly. The database's optimizer may have more trouble creating a good query execution plan. So while views can be very good in terms of allowing more fine grained security and the like, they are not necessarily good for modularity.
It depends on the RDBMS, but usually there isn't optimization going on, and it's just a convenient way to simplify queries. Some database systems use "materialized views" however, which do use a caching mechanism.
Usually a view is just a way to create a common shorthand for defining result sets that you need frequently.
However, there is a downside. The temptation is to add in every column you think you might need somewhere sometime when you might like to use the view. So YAGNI is violated. Not only columns, but sometimes additional outer joins get tacked on "just in case". So covering indexes might not cover any more, and the query plan may increase in complexity (and drop in efficiency).
YAGNI is a critical concept in SQL design.
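Here is a small sketch of the covering-index point, using Python's bundled SQLite as a stand-in. The orders table, both views and the index are hypothetical. One view exposes only what the index holds; the other tacks on a "just in case" column, and the plan shows the index no longer covers the query.

```python
import sqlite3

# SQLite stand-in; every object name here is invented for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL, notes TEXT
);
CREATE INDEX idx_orders_customer_total ON orders (customer_id, total);
-- Narrow view: only the columns the index already contains.
CREATE VIEW v_order_totals AS
    SELECT customer_id, total FROM orders;
-- Wide view: one extra column added "just in case".
CREATE VIEW v_orders_everything AS
    SELECT customer_id, total, notes FROM orders;
""")


def plan(sql):
    """Join the 'detail' column of the EXPLAIN QUERY PLAN rows into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


narrow_plan = plan("SELECT * FROM v_order_totals WHERE customer_id = 1")
wide_plan = plan("SELECT * FROM v_orders_everything WHERE customer_id = 1")

print(narrow_plan)  # answered from the index alone
print(wide_plan)    # index used to find rows, then the table is visited too
```

The same check works on any engine with a plan viewer: compare the plan before and after someone widens the view.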
Generally speaking, views should perform equivalently to a query written directly on the underlying tables.
But: there may be edge cases, and it would behoove you to test your code. All modern RDBMS systems have tools that will let you see the queryplans, and monitor execution. Don't take my (or anybody else's) word for it, when you can have the definitive data at your fingertips.
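For what it's worth, this is the sort of check I mean, sketched with Python's bundled SQLite so it runs anywhere. The users table, the adult_users view and the index are made-up names, and other engines will print their plans differently. It runs the same logical query once inline and once through a view and compares both the rows and the plan.

```python
import sqlite3

# SQLite stand-in; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, state TEXT, age INTEGER);
CREATE INDEX idx_users_state ON users (state);
CREATE VIEW adult_users AS
    SELECT id, name, state FROM users WHERE age >= 18;
INSERT INTO users (name, state, age)
VALUES ('Ann', 'NY', 34), ('Bob', 'NY', 11), ('Cy', 'CA', 52);
""")

inline_sql = "SELECT id, name, state FROM users WHERE age >= 18 AND state = 'NY'"
view_sql = "SELECT id, name, state FROM adult_users WHERE state = 'NY'"


def plan(sql):
    """Join the 'detail' column of the EXPLAIN QUERY PLAN rows into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


inline_plan = plan(inline_sql)
view_plan = plan(view_sql)
inline_rows = conn.execute(inline_sql).fetchall()
view_rows = conn.execute(view_sql).fetchall()

print(inline_plan)
print(view_plan)
```

In this simple case the view is expanded into the outer query and both forms use the same index. The interesting cases are the ones where the two plans stop matching, typically once views are joined to other views.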
I know this is an old thread. Discussion is good, but I do want to throw in one more thought. Performance also depends on what you are using to pull data with. For example, if you are front-ending with something like Microsoft Access you can definitely gain performance for some complex queries by using a view. This is because Access does not always pull from the SQL server as we would like -- in some cases it would pull entire tables across then try to process locally from there! Not so if you use a view.
Yes, in all modern RDBMSs (MSSQL after 2005? etc.) views' query plans are cached, removing the overhead of planning the query and speeding up performance over the same SQL performed in-line. Prior to this (and it applies to parameterized SQL/prepared statements as well) people correctly thought stored procedures performed better.
Many still hang onto this today making it a modern DB myth. Ever since Views/PS's got the cached query planning of SPs they've been pretty much even.
In the past I've never been a fan of using triggers on database tables. To me they always represented some "magic" that was going to happen on the database side, far far away from the control of my application code. I also wanted to limit the amount of work the DB had to do, as it's generally a shared resource and I always assumed triggers could get to be expensive in high load scenarios.
That said, I have found a couple of instances where triggers have made sense to use (at least in my opinion they made sense). Recently though, I found myself in a situation where I sometimes might need to "bypass" the trigger. I felt really guilty about having to look for ways to do this, and I still think that a better database design would alleviate the need for this bypassing. Unfortunately this DB is used by multiple applications, some of which are maintained by a very uncooperative development team who would scream about schema changes, so I was stuck.
What's the general consensus out there about triggers? Love em? Hate em? Think they serve a purpose in some scenarios?
Do you think that having a need to bypass a trigger means that you're "doing it wrong"?
Triggers are generally used incorrectly, introduce bugs and therefore should be avoided. Never design a trigger to do integrity constraint checking that crosses rows in a table (e.g. "the average salary by dept cannot exceed X").
Tom Kyte, VP of Oracle, has indicated that he would prefer to remove triggers as a feature of the Oracle database because of their frequent role in bugs. He knows it is just a dream, and triggers are here to stay, but if he could remove triggers from Oracle, he would (along with the WHEN OTHERS clause and autonomous transactions).
Can triggers be used correctly? Absolutely.
The problem is - they are not used correctly in so many cases that I'd be willing to give up any perceived benefit just to get rid of the abuses (and bugs) caused by them. - Tom Kyte
Think of a database as a great big object - after each call to it, it ought to be in a logically consistent state.
Databases expose themselves via tables, and keeping tables and rows consistent can be done with triggers. Another way to keep them consistent is to disallow direct access to the tables, and only allowing it through stored procedures and views.
The downside of triggers is that any action can invoke them; this is also a strength - no-one is going to screw up the integrity of the system through incompetence.
As a counterpoint, allowing access to a database only through stored procedures and views still allows the backdoor access of permissions. Users with sufficient permissions are trusted not to break database integrity, all others use stored procedures.
As to reducing the amount of work: databases are stunningly efficient when they don't have to deal with the outside world; you'd be really surprised how much even process switching hurts performance. That's another upside of stored procedures: rather than a dozen calls to the database (and all the associated round trips), there's one.
Business logic has to go somewhere, and there's a lot of implied domain rules embedded in the design of a database - relations, constraints and so on are an attempt to codify business rules by saying, for example, a user can only have one password. Given you've started shoving business rules onto the database server by having these relations and so on, where do you draw the line? When does the database give up responsibility for the integrity of the data, and start trusting the calling apps and database users to get it right? Stored procedures with these rules embedded in them can push a lot of political power into the hands of the DBAs. It comes down to how many tiers are going to exist in your n-tier architecture; if there's a presentation, business and data layer, where does the separation between business and data lie? What value-add does the business layer add? Will you run the business layer on the database server as stored procedures?
Yes, I think that having to bypass a trigger means that you're "doing it wrong"; in this case a trigger isn't for you.
I work with web and winforms apps in C# and I HATE triggers with a passion. I have never come across a situation where I could justify using a trigger over moving that logic into the business layer of the application and replicating the trigger logic there.
I don't do any DTS type work or anything like that, so there might be some use cases for using triggers there, but if anyone in any of my teams says that they might want to use a trigger they had better have prepared their arguments well, because I refuse to stand by and let triggers be added to any database I'm working on.
Some reasons why I don't like triggers:
They move logic into the database. Once you start doing that, you're asking for a world of pain because you lose your debugging, your compile time safety, your logic flow. It's all downhill.
The logic they implement is not easily visible to anyone.
Not all database engines support triggers so your solution creates dependencies on database engines
I'm sure I could think of more reasons off the top of my head but those alone are enough for me not to use triggers.
"Never design a trigger to do integrity constraint checking that crosses rows in a table" -- I can't agree. The question is tagged 'SQL Server' and CHECK constraints' clauses in SQL Server cannot contain a subquery; worse, the implementation seems to have a 'hard coded' assumption that a CHECK will involve only a single row so using a function is not reliable. So if I need a constraint which does legitimately involve more than one row -- and a good example here is the sequenced primary key in a classic 'valid time' temporal table where I need to prevent overlapping periods for the same entity -- how can I do that without a trigger? Remember this is a primary key, something to ensure I have data integrity, so enforcing it anywhere other than the DBMS is out of the question. Until CHECK constraints get subqueries, I don't see an alternative to using triggers for certain kinds of integrity constraints.
Triggers can be very helpful. They can also be very dangerous. I think they're fine for house cleaning tasks like populating audit data (created by, modified date, etc) and in some databases can be used for referential integrity.
But I'm not a big fan of putting lots of business logic into them. This can make support problematic because:
it's an extra layer of code to research
sometimes, as the OP learned, when you need to do a data fix the trigger might be doing things with the assumption that the data change is always via an application directive and not from a developer or DBA fixing a problem, or even from a different app
As for having to bypass a trigger to do something, it could mean you are doing something wrong, or it could mean that the trigger is doing something wrong.
The general rule I like to use with triggers is to keep them light, fast, simple, and as non-invasive as possible.
I find myself bypassing triggers when doing bulk data imports. I think it's justified in such circumstances.
If you end up bypassing the triggers very often though, you probably need to take another look at what you put them there for in the first place.
In general, I'd vote for "they serve a purpose in some scenarios". I'm always nervous about performance implications.
I'm not a fan, personally. I'll use them, but only when I uncover a bottleneck in the code that can be cleared by moving actions into a trigger. Generally, I prefer simplicity and one way to keep things simple is to keep logic in one place - the application. I've also worked on jobs where access is very compartmentalized. In those environments, the more code I pack into triggers the more people I have to engage for even the simplest fixes.
I first used triggers a couple of weeks ago. We changed over a production server from SQL 2000 to SQL 2005 and we found that the drivers were behaving differently with NText fields (storing a large XML document), dropping off the last byte. I used a trigger as a temporary fix to add an extra dummy byte (a space) to the end of the data, solving our problem until a proper solution could be rolled out.
Other than this special, temporary case, I would say that I would avoid them since they do hide what is going on, and the function they provide should be handled explicitly by the developer rather than as some hidden magic.
Honestly, the only time I use triggers is to simulate a unique index that is allowed to have NULLs that don't count toward the uniqueness.
As to reducing the amount of work: databases are stunningly efficient when they don't have to deal with the outside world; you'd be really surprised how much even process switching hurts performance. That's another upside of stored procedures: rather than a dozen calls to the database (and all the associated round trips), there's one.
this is a little off topic, but you should also be aware that you're only looking at this from one potential positive.
Bunching stuff up in a single stored proc is fine, but what happens when something goes wrong? Say you have 5 steps and the first step fails, what happens to the other steps? You need to add a whole bunch of logic in there to cater for that situation. Once you start doing that you lose the benefits of the stored procedure in that scenario.
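To make the "what happens to the other steps" question concrete, here is a minimal sketch of the extra logic involved, written in Python against the bundled SQLite rather than as a stored procedure. The accounts table and the run_steps helper are hypothetical. The steps are wrapped in one transaction so that a failure in any step undoes the ones that already ran.

```python
import sqlite3

# SQLite stand-in; the table and helper are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 0)")
conn.commit()


def run_steps(steps):
    """Run all steps in one transaction; return True only if every step succeeded."""
    try:
        # The connection context manager commits on success
        # and rolls back if any statement raises.
        with conn:
            for sql, params in steps:
                conn.execute(sql, params)
        return True
    except sqlite3.Error:
        return False


ok = run_steps([
    # Step 1 succeeds on its own...
    ("UPDATE accounts SET balance = balance + ? WHERE name = 'b'", (500,)),
    # ...step 2 violates the CHECK constraint, so step 1 is undone as well.
    ("UPDATE accounts SET balance = balance - ? WHERE name = 'a'", (500,)),
])
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

Whether this logic lives in the procedure or in the caller, somebody has to write it, which is the cost being pointed out.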
Total fan, but I really have to use them sparingly, when I:
Need to maintain consistency (especially when dimension tables are used in a warehouse and we need to relate the data in the fact table to the proper dimension). Sometimes the proper row in the dimension table can be very expensive to compute, so you want the key to be written straight to the fact table; one good way to maintain that "relation" is with a trigger.
Need to log changes (in an audit table, for instance, it's useful to know which user made the change and when it occurred)
Some RDBMSs like SQL Server 2005 also provide you with triggers on CREATE/ALTER/DROP statements (so you can know who created what table, when, dropped what column, when, etc.)
Honestly, using triggers in those 3 scenarios, I don't see why you would ever need to "disable" them.
The general rule of thumb is: do not use triggers. As mentioned before, they add overhead and complexity that can easily be avoided by moving logic out of the DB layer.
Also, in MS SQL Server, triggers are fired once per sql command, and not per row. For example, the following sql statement will execute the trigger only once.
UPDATE tblUsers
SET Age = 11
WHERE State = 'NY'
Many people, including myself, were under the impression that triggers fire for every row, but this isn't the case. If you have a SQL statement like the one above that may change data in more than one row, the trigger has to handle all of the affected rows itself, for example by including a cursor over them. You can see how this can get convoluted very quickly.