Note: this question asks for a comparison of Hibernate named queries and ordinary session queries. Hibernate Criteria is of no concern in the context of this question.
From what I know, named queries are parsed once when the system starts up and can be used from anywhere in the application. So with named queries, the query isn't parsed from scratch for each caller, and this is the major gain of named queries.
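A rough analogue of the "define once, validate at startup, reuse by name" idea may help frame the question. This is not Hibernate code: Python's built-in sqlite3 stands in for the database, and the registry, query names and schema are hypothetical. In Hibernate itself the corresponding pieces are the @NamedQuery annotation (or a named query in the mapping file) and session.getNamedQuery().

```python
import sqlite3

# Hypothetical registry: each "named" query is written once, in one place,
# and looked up by name everywhere else. NOT Hibernate's API.
NAMED_QUERIES = {
    "user.byName": "SELECT id, name FROM users WHERE name = ?",
    "user.count": "SELECT COUNT(*) FROM users",
}


def validate_named_queries(conn, queries):
    """Compile every query once at startup so a typo fails fast.

    EXPLAIN makes SQLite prepare (compile) the statement without running it.
    Placeholders are bound to NULL; counting '?' is naive and assumes no
    literal '?' characters inside string constants.
    """
    for name, sql in queries.items():
        params = (None,) * sql.count("?")
        try:
            conn.execute("EXPLAIN " + sql, params)
        except sqlite3.Error as exc:
            raise ValueError(f"named query {name!r} is invalid: {exc}") from exc


def run_named(conn, name, *params):
    """Look a query up by name and execute it with bound parameters."""
    return conn.execute(NAMED_QUERIES[name], params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("ann",), ("bob",)])
    validate_named_queries(conn, NAMED_QUERIES)  # raises here if any query is broken
    print(run_named(conn, "user.byName", "bob"))  # [(2, 'bob')]
```

The startup check is the part this sketch demonstrates; it says nothing about per-call speed.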
But then:
Is there a difference between how Hibernate manages its caches for named and ordinary queries? If so, what is it?
Is there any downside to turning ordinary Hibernate queries into named queries?
I've had a discussion with a colleague. He thinks that before I go turning ordinary queries into named queries, I should devise some metrics and write tests to prove that named queries perform better.
I think that generating metrics and writing tests just to measure how, or whether, named queries perform better than ordinary queries is nothing but time burned on something useless. That has been shown already: the reason named queries exist is to get the query parsed once. What data the query pulls or changes in the DB is immaterial. And Hibernate named queries are used by many developers.
My questions:
Am I missing something about named queries that is relevant to this discussion?
Any opinions on how to handle this situation? The options I'm looking at are: (i) do nothing at all and leave the queries as they are; (ii) just convert them to named queries, since reverting if the change is disliked won't have burned much of my time; (iii) do those tests, if I consider that an option at all.
Thanks in advance.
Short answer: use named queries if you can. But if you already have queries that work fine, with tests that cover their functionality, I wouldn't recommend converting them.
Another SO post addressing this can be found here:
Advantages of Named queries in hibernate?
Related
A lot of people on this site say that "premature optimization is the root of all evil". My problem now is that I have a lot of complex SQL queries, many of them using user-created functions in PL/pgSQL or PL/Python. I do not have any performance profiling tool to show me which functions actually make the queries slow. My current method is to exclude the various functions one at a time and time the query for each variant. I know that I could use EXPLAIN ANALYZE as well, but I do not think it will give me information about user-created functions.
My current method is quite tedious, especially since PostgreSQL has no query progress indicator, so I sometimes have to wait 60 seconds for a query to finish if I choose to run it on too much data.
Therefore, I am wondering whether it would be a good idea to create a tool that automatically profiles SQL queries by modifying them and measuring the actual processing time of various versions. Each version would be a simplified one, perhaps containing just a single user-created function. I know I am not describing how to do this clearly, and I can think of many complicating factors, but I can also see workarounds for many of them. I basically need your gut feeling on whether such a method is feasible.
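Here is a minimal sketch of the variant-timing idea, assuming SQLite (Python's built-in sqlite3) as a stand-in for PostgreSQL, and Python callables registered with create_function() as stand-ins for the PL/pgSQL or PL/Python functions. The function names, table and workloads are hypothetical. Each variant is timed in isolation, keeping the best of a few runs to reduce noise.

```python
import sqlite3
import time


# Hypothetical stand-ins for user-created functions of different cost.
def cheap_fn(x):
    return x + 1


def slow_fn(x):
    return sum(i * i for i in range(2000)) + x


def time_query(conn, sql, repeat=3):
    """Run sql `repeat` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch all rows so the work really happens
        best = min(best, time.perf_counter() - start)
    return best


def profile_variants(conn, base_sql, function_exprs):
    """Time one simplified query variant per expression.

    base_sql contains an {expr} slot; the expressions are trusted,
    developer-written SQL, never user input.
    """
    return {expr: time_query(conn, base_sql.format(expr=expr)) for expr in function_exprs}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.create_function("cheap_fn", 1, cheap_fn)
    conn.create_function("slow_fn", 1, slow_fn)
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(500)])
    timings = profile_variants(conn, "SELECT {expr} FROM t", ["x", "cheap_fn(x)", "slow_fn(x)"])
    for expr, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
        print(f"{seconds * 1000:8.2f} ms  {expr}")
```

On PostgreSQL itself, setting track_functions (to 'pl' or 'all') makes the server record per-function call counts and times in the pg_stat_user_functions view, which may cover part of what the variant approach is working around.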
Another, similar idea is to run the query with the server setting work_mem set to various values and show how this affects performance.
Such a tool could be written using JDBC so it could be modified to work across all major databases. In this case it might be a viable commercial product.
Apache JMeter can be used to load test and monitor the performance of SQL queries (using JDBC). It will, however, not modify your SQL.
Actually, I don't think any tool out there could simplify and then re-run your SQL. How would that "simplifying" work?
I have developed several projects based on the Python framework Django, and it greatly improved my productivity. But once a project was released and attracted more and more visitors, the database became the performance bottleneck.
I tried to address the issue and found that the ORM (Django's) is what makes it so slow. Why? Because Django has to offer a uniform interface to the programmer no matter which database backend you use, so it necessarily sacrifices some database performance (it turns one raw SQL statement into several and never uses database-specific operations).
I know the ORM is definitely useful, and it can:
Offer a uniform OO interface for programmers
Make migrating the database backend much easier (from MySQL to SQL Server or others)
Improve the robustness of the code (using an ORM means less code, and less code means fewer errors)
But if I don't need to migrate, what is the point of the ORM for me?
P.S. My friend recently told me that what he is doing now is just rewriting ORM code as raw SQL to get better performance. What a pity!
So what is the real point of an ORM, apart from what I mentioned above?
(Please correct me if I've made a mistake. Thanks.)
You have mostly answered your own question by listing the benefits of an ORM. You will definitely run into some optimisation issues, but the abstraction of the database interface probably outweighs these downsides.
You mention that the ORM sometimes uses many SQL statements where it could use only one. You may want to look at "eager loading", if your ORM supports it. This tells the ORM to fetch data from related models at the same time as it fetches data from the main model, which should result in more efficient SQL.
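A minimal sketch of the difference, using Python's built-in sqlite3 rather than Django (the schema is hypothetical): the lazy version issues one query per book, the eager version a single JOIN, and a trace callback counts the statements actually executed. In Django itself, select_related() asks for a JOIN on foreign keys, and prefetch_related() fetches each relation with one extra query.

```python
import sqlite3


def make_db():
    """Tiny hypothetical schema: books that each point at one author."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                           author_id INTEGER REFERENCES author(id));
        INSERT INTO author (name) VALUES ('Ann'), ('Bob'), ('Cy');
        INSERT INTO book (title, author_id) VALUES ('A1', 1), ('B1', 2), ('C1', 3), ('A2', 1);
    """)
    return conn


def count_statements(conn, work):
    """Run work(conn) and count the SQL statements SQLite executed meanwhile."""
    seen = []
    conn.set_trace_callback(seen.append)
    try:
        result = work(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(seen)


def lazy(conn):
    # Lazy style: one query for the books, then one extra query per book (N+1).
    rows = []
    for title, author_id in conn.execute("SELECT title, author_id FROM book ORDER BY id").fetchall():
        (name,) = conn.execute("SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()
        rows.append((title, name))
    return rows


def eager(conn):
    # Eager style: books and their authors fetched together in a single JOIN.
    return conn.execute(
        "SELECT b.title, a.name FROM book b JOIN author a ON a.id = b.author_id ORDER BY b.id"
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    lazy_rows, lazy_n = count_statements(conn, lazy)
    eager_rows, eager_n = count_statements(conn, eager)
    assert lazy_rows == eager_rows  # same data, different number of round trips
    print(f"lazy: {lazy_n} statements, eager: {eager_n} statement(s)")
```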
I would suggest that you stick with your ORM and optimise the parts that need it, but explore any methods within the ORM that let you improve performance before falling back to writing SQL by hand.
A good ORM allows you to tune the data access if you discover that certain queries are a bottleneck.
But the fact that you might need to do this does not in any way remove the value of the ORM approach, because it rapidly gets you to the point where you can discover where the bottlenecks are. It is rarely the case that every line of code needs the same amount of careful hand-optimisation. Most of it won't. Only a few hotspots require attention.
If you write all the SQL by hand, you are "micro optimising" across the whole product, including the parts that don't need it. So you're mostly wasting effort.
Here is the definition from Wikipedia:
Object-relational mapping is a programming technique for converting data between incompatible type systems in relational databases and object-oriented programming languages. This creates, in effect, a "virtual object database" that can be used from within the programming language.
A good ORM (like Django's) makes it much faster to develop and evolve your application; it lets you assume that all related data is available without having to account for every use in hand-tuned queries.
But a simple one (like Django's) doesn't relieve you of good old DB design. If you're seeing a DB bottleneck with fewer than several hundred simultaneous users, you have serious problems. Either your DB isn't well tuned (typically you're missing some indexes), or it doesn't represent the data design appropriately (if you need many different queries for every page, this is your problem).
So I wouldn't ditch the ORM unless you're Twitter or Flickr. First do all the usual DB analysis. Seeing a lot of full-table scans? Add appropriate indexes. Lots of queries per page? Rethink your tables. Every user needs lots of statistics? Precalculate them in a batch job and serve them from there.
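For the "full-table scans? add indexes" step, here is a minimal sketch using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 (table and index names are hypothetical; on PostgreSQL or MySQL you would use that database's EXPLAIN instead). It shows the plan switching from a scan to an index search once the index exists.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for a statement."""
    # The human-readable detail is the last column in both the older and
    # newer EXPLAIN QUERY PLAN output formats.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
query = "SELECT id, name FROM users WHERE email = ?"

before = plan(conn, query, ("a@example.com",))
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(conn, query, ("a@example.com",))

print(before)  # a full-table SCAN of users
print(after)   # a SEARCH of users via idx_users_email
```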
An ORM saves you from having to write that pesky SQL.
It's also helpful for when you (never) port your software to another database engine.
On the downside, you lose performance, which you then fix by writing a custom flavor of SQL: the very thing the ORM tried to insulate you from having to write in the first place.
An ORM generates SQL queries for you and then returns the results to you as objects; that's why it is slower than accessing the database directly. But I think it is only a little slower. I recommend tuning your database; maybe you need to check the indexes on your tables, etc.
Oracle, for example, needs to be tuned if you want it to be faster (I don't know why, but my DB admin did that, and it runs faster on queries that involve lots of data).
I have a recommendation: if you need to do complex queries (e.g. reports) beyond basic CRUD (create, read, update, delete), and your application won't use another database, you should use direct SQL (I think Django has this feature).
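A minimal sketch of such a report query, assuming a hypothetical orders table and using Python's sqlite3 directly. In Django the same SQL could go through django.db.connection.cursor(), or Model.objects.raw() when the rows map onto a model; either way the values are still passed as parameters rather than pasted into the SQL.

```python
import sqlite3

# One hand-written aggregate query for a hypothetical sales report.
REPORT_SQL = """
    SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM orders
    WHERE ordered_on >= ?
    GROUP BY region
    ORDER BY revenue DESC
"""


def revenue_by_region(conn, since):
    """Run the report; the date filter is bound as a parameter."""
    return conn.execute(REPORT_SQL, (since,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL, ordered_on TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
        ("north", 10.0, "2024-01-05"), ("south", 25.0, "2024-01-07"),
        ("north", 30.0, "2024-02-01"), ("south", 5.0, "2023-12-30"),
    ])
    print(revenue_by_region(conn, "2024-01-01"))  # [('north', 2, 40.0), ('south', 1, 25.0)]
```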
I have a problem similar to the one in this week's podcast.
We have a Java application using Hibernate with SQL Server 2005.
Hibernate is generating a Query for us that is taking nearly 20 minutes to complete.
If we take the same query from show_sql and replace the question marks with constant values, the answer is returned immediately.
I think we need OPTION (RECOMPILE), but I can't figure out how to do that with HQL.
Please help!
From the description of your problem, it sounds like you're running into parameter sniffing. Essentially, SQL Server created a query plan based on an earlier set of parameter values, and that plan is not an efficient execution plan for the values in the currently running query.
Typically I resolve this issue by copying the parameter values into local variables and using those in the query, or by using OPTION (RECOMPILE). However, since you are using Hibernate, my usual solution isn't available to you. As I understand it, the best option is to use Hibernate to run a native SQL query via prepareStatement() or createSQLQuery(), which unfortunately removes some of the benefits of using Hibernate.
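If you do go the native-SQL route, the hint is simply appended to the end of the statement. The Python helper below only illustrates where OPTION (RECOMPILE) goes in the SQL text; it assumes a single SELECT with no existing OPTION clause, and the table name is hypothetical. In Hibernate the resulting string would be passed to session.createSQLQuery(), as described above.

```python
def add_recompile_hint(sql: str) -> str:
    """Append SQL Server's OPTION (RECOMPILE) hint to one native SELECT.

    Assumes a single statement without an OPTION clause; the hint makes SQL
    Server build a fresh plan for the actual parameter values on each run.
    """
    body = sql.strip().rstrip(";").rstrip()
    if "OPTION (" in body.upper():
        # Two OPTION clauses would be invalid T-SQL; merge hints by hand instead.
        raise ValueError("statement already has an OPTION clause")
    return body + " OPTION (RECOMPILE)"


if __name__ == "__main__":
    native_sql = "SELECT * FROM orders WHERE customer_id = ?"  # hypothetical table
    print(add_recompile_hint(native_sql))
    # SELECT * FROM orders WHERE customer_id = ? OPTION (RECOMPILE)
```

The trade-off: OPTION (RECOMPILE) pays a compilation on every execution, so it suits slow, infrequent queries better than very hot ones.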
In my experience, the main problem with complex queries in Hibernate is not the query itself, but rather the creation of all the objects representing the result set.
In my case at work, we had a very large domain model, with lots of couplings, so that even fetching one single object from the database was quite expensive because that object was linked to other objects, which in turn were linked to other objects and so on.
For us, more use of lazy loading solved at least parts of the problem. Smart caching helped even more. What I learned was that in the future, I'll allow more loose coupling between domain classes.
You should post your mapping and HQL statement. If you are using "join" in your HQL, you might want to look at exactly what Hibernate fetches. It might turn out that the query itself is simple, but Hibernate is fetching tons of data before it gets to it.
Are database views only a means of simplifying data access, or do they provide performance benefits when accessing the view as opposed to just running the query the view is based on? I suspect views are functionally equivalent to just adding the stored view query to each query on the view data. Is this correct, or are there other details and/or optimizations happening?
I have always considered views to be like read-only stored procedures: you give the database as much information as you can in advance so it can precompile as best it can.
You can index views as well, giving you access to an optimised view of the data for the type of query you are running.
Although a given query running inside a view and the same query running outside the view should perform equivalently, things quickly get much more complicated when you need to join two views together. You can easily end up bringing tables you don't need into the query, or bringing tables in redundantly, and the database's optimizer may have more trouble creating a good query execution plan. So while views can be very good for allowing more fine-grained security and the like, they are not necessarily good for modularity.
It depends on the RDBMS, but usually there isn't any optimization going on, and a view is just a convenient way to simplify queries. Some database systems, however, support "materialized views", which do use a caching mechanism.
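Where a database lacks materialized views (PostgreSQL has CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW; SQL Server's closest feature is indexed views), the caching idea can be emulated with a table that is rebuilt on demand. A minimal sketch in Python's sqlite3, with hypothetical table names:

```python
import sqlite3


def refresh_materialized(conn, name, select_sql):
    """Emulate a materialized view by storing a query's result in a real table.

    SQLite has no CREATE MATERIALIZED VIEW, so this drops and rebuilds the
    table inside one transaction; readers see a snapshot that is only as
    fresh as the last refresh. select_sql must be trusted, parameter-free SQL.
    """
    script = (
        f'BEGIN; DROP TABLE IF EXISTS "{name}"; '
        f'CREATE TABLE "{name}" AS {select_sql}; COMMIT;'
    )
    try:
        conn.executescript(script)
    except sqlite3.Error:
        if conn.in_transaction:
            conn.rollback()
        raise


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [("north", 10.0), ("south", 5.0)])
    conn.commit()

    totals_sql = "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
    refresh_materialized(conn, "region_totals", totals_sql)

    conn.execute("INSERT INTO orders VALUES ('north', 1.0)")
    conn.commit()
    read = "SELECT total FROM region_totals WHERE region = 'north'"
    print(conn.execute(read).fetchone())  # (10.0,): stale until the next refresh
    refresh_materialized(conn, "region_totals", totals_sql)
    print(conn.execute(read).fetchone())  # (11.0,)
```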
Usually a view is just a way to create a common shorthand for defining result sets that you need frequently.
However, there is a downside. The temptation is to add in every column you think you might need somewhere sometime when you might like to use the view. So YAGNI is violated. Not only columns, but sometimes additional outer joins get tacked on "just in case". So covering indexes might not cover any more, and the query plan may increase in complexity (and drop in efficiency).
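The covering-index effect can be seen directly in SQLite's EXPLAIN QUERY PLAN. This minimal sketch (Python's sqlite3, hypothetical schema) does not use a view; it only shows that one extra column, such as one added "just in case", is enough for an index to stop covering the query.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Join SQLite's EXPLAIN QUERY PLAN detail strings into one line."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT, bio TEXT)")
conn.execute("CREATE INDEX idx_users_email_name ON users (email, name)")

# Needs only email and name, both stored in the index: no table lookup required.
narrow = plan(conn, "SELECT name FROM users WHERE email = ?", ("x@example.com",))
# bio is not in the index, so each match needs an extra lookup in the table.
wide = plan(conn, "SELECT name, bio FROM users WHERE email = ?", ("x@example.com",))

print(narrow)  # mentions COVERING INDEX idx_users_email_name
print(wide)    # mentions INDEX idx_users_email_name, but not COVERING
```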
YAGNI is a critical concept in SQL design.
Generally speaking, views should perform equivalently to a query written directly on the underlying tables.
But there may be edge cases, and it would behoove you to test your code. All modern RDBMSs have tools that let you see query plans and monitor execution. Don't take my (or anybody else's) word for it when you can have the definitive data at your fingertips.
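As a small example of checking for yourself, this sketch (Python's sqlite3, hypothetical schema) runs the same filter through a view and inline, then prints both query plans. SQLite expands a simple view like this into the outer query, so both should use the same index; other engines, and more complex views, may behave differently, which is the point of looking.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for a statement."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
    CREATE INDEX idx_users_name ON users (name);
    CREATE VIEW active_users AS SELECT id, name FROM users WHERE active = 1;
    INSERT INTO users (name, active) VALUES ('ann', 1), ('bob', 0), ('cy', 1);
""")

via_view = "SELECT id, name FROM active_users WHERE name = ?"
inline = "SELECT id, name FROM users WHERE active = 1 AND name = ?"

print(conn.execute(via_view, ("ann",)).fetchall())  # [(1, 'ann')]
print(plan(conn, via_view, ("ann",)))  # compare the two plans side by side
print(plan(conn, inline, ("ann",)))
```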
I know this is an old thread. Discussion is good, but I do want to throw in one more thought. Performance also depends on what you are using to pull the data. For example, if you are front-ending with something like Microsoft Access, you can definitely gain performance for some complex queries by using a view. This is because Access does not always pull from SQL Server the way we would like: in some cases it pulls entire tables across and then tries to process them locally! Not so if you use a view.
Yes: in all modern RDBMSs (MSSQL after 2005? etc.), views' query plans are cached, removing the overhead of planning the query and speeding up performance compared with the same SQL performed inline. Prior to this (and it applies to parameterized SQL/prepared statements as well), people correctly thought stored procedures performed better.
Many still cling to this belief today, making it a modern DB myth. Ever since views and prepared statements got the cached query planning of stored procedures, they've been pretty much even.
LINQ simplifies database programming, no doubt, but does it have a downside? Inline SQL requires one to communicate with the database in a certain way that opens the database to injections. Inline SQL must also be syntax-checked, have a plan built, and then be executed, which takes precious cycles. Stored procedures have also been a rock-solid standard in great database application programming. Many programmers I know use a data layer that simplifies development, though not to the extent LINQ does. Is it time to give up on SPs and go LINQ?
LINQ to SQL actually presents some alarming performance problems in the database. Basically, it creates multiple execution plans based on the length of the parameter you are using. I posted about it a while back on my blog: "LINQ to SQL may cause performance problems".
Now, is that to say that LINQ doesn't have a place? Hardly. LINQ definitely has a place in the development toolkit, just like stored procedures. Ultimately, you want to use stored procedures when performance is absolutely necessary and use an ORM tool in any other situation.
As far as inline SQL goes, there are ways to execute inline SQL so that the plan is only built once and is never recompiled. Most ORMs should take care of this aspect of performance tuning as well and using these methods is usually the safest way to execute your SQL since it forces you to use parameterized queries.
Like most database solutions, the right answer depends on the problem you're trying to solve. If you favor development speed over database/application performance, then using LINQ or another DAL/ORM tool is the best way to go. If you favor performance over ease of development, then using stored procedures and pure datasets is going to be your best bet. LLBLGen even provides a LINQ to LLBLGen layer so you can use LINQ to query LLBLGen's objects and have LLBLGen actually handle building your queries and avoid some of the downfalls of LINQ.
Your basic premise is flawed.
Inline SQL requires one to communicate with the database in a certain way that opens the database to injections.
No, it doesn't. Hard-coding user-supplied values into a SQL statement does, but you could do that with stored procedures as well.
Parameterizing your queries guards against injection attacks, but inline SQL can be parameterized just as easily as stored procedures.
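A minimal demonstration with Python's sqlite3 (hypothetical table): both functions are inline SQL, but only the string-concatenated one can be rewritten by its input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])


def find_unsafe(name):
    # Inline SQL built by string formatting: the input becomes part of the SQL text.
    return conn.execute(f"SELECT name FROM users WHERE name = '{name}'").fetchall()


def find_safe(name):
    # Inline SQL with a bound parameter: the input is only ever treated as a value.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


attack = "nobody' OR '1'='1"
print(find_unsafe(attack))  # every row comes back: the WHERE clause was rewritten
print(find_safe(attack))    # []
```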
Inline SQL must also be syntax-checked, have a plan built, and then executed.
All SQL (SPs and inline) must be syntax-checked and have a plan built on its first call. Thereafter, the exact text of the request and the execution plan are cached. If another request with the exact same text (not counting parameter values) is received, the cached execution plan is used.
So if you hard-code values into inline SQL, the text won't match, and the query will have to be parsed again. However, if you use parameters, the text of the query will match and you will get a cache hit, in which case it doesn't matter whether the query is inline SQL or an SP.
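A toy model can make the cache-key argument concrete. This is not how SQL Server is implemented; it only counts how many distinct statement texts, and therefore plan compilations, each style produces. The @id placeholder and table name are hypothetical. Real servers can also auto-parameterize some simple statements (SQL Server calls this simple or forced parameterization), which the toy ignores.

```python
class ToyPlanCache:
    """Toy plan cache keyed by exact SQL text (not SQL Server's real design)."""

    def __init__(self):
        self.plans = {}
        self.compiles = 0

    def execute(self, sql, params=()):
        # Parameter values are deliberately not part of the cache key.
        if sql not in self.plans:  # cache miss: parse and build a plan
            self.compiles += 1
            self.plans[sql] = f"plan for: {sql}"
        return self.plans[sql]  # cache hit: reuse the stored plan


hard_coded = ToyPlanCache()
for customer_id in range(100):
    hard_coded.execute(f"SELECT * FROM orders WHERE customer_id = {customer_id}")

parameterized = ToyPlanCache()
for customer_id in range(100):
    parameterized.execute("SELECT * FROM orders WHERE customer_id = @id", (customer_id,))

print(hard_coded.compiles, parameterized.compiles)  # 100 1
```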
In other words, the only problem with inline SQL is that it is easy to do something slow and insecure. But making inline SQL fast and secure is no more work than using an SP.
Which brings us to LINQ, which always uses parameters, even if you hard-code the values into the LINQ statement, making "fast & secure" inline SQL trivial.
LINQ also has the advantage over SPs of keeping all your code in one place, instead of scattered across two different machines.
If you're interested in benchmarking, Rico Mariani has an excellent 5-part study that covers the qualitative and quantitative differences.
He may be an MS guy, but he's known as a performance nut - his benchmarks are thorough and well thought out.
Here is a performance test run by Maximilian Beller. According to him, LINQ is much, much slower.
Read his comprehensive study.
Just think about changing a column's name, and then having to change the (n) SPs and (x) views.
Do everything that is expensive on the database (like searches, sorting, etc.) and you won't notice a problem.
Also, if you want to display a large grid without paging, use a DataSet; that one is faster.
Stack Overflow also uses LINQ to SQL. Do you see a problem? :)
Use an ORM - it's the way to go on most applications.
P.S. Also, about micro-benchmarks like "let's select 10,000 rows with an ORM": DON'T DO IT. That's not why you use an ORM. If you want to select 10,000 rows, use ADO.
It depends on what you're doing. LINQ is going to be less efficient at the actual data/set manipulation than a real database. But you'll save a lot in not having to connect to the database over a network.
If your database is on the same machine or is formally 'well-connected', you're probably better off using it.
But if you're getting back a large result set from a remote db that could mean significant transmission time, or if it's a really short query that won't justify the overhead, LINQ would likely be better.
Because of the structure of LINQ to SQL, there is no possible way it can be faster than using raw SQL, either your own well-formed queries or as a stored procedure. What LINQ buys you is not speed but type safety and organization; in short most of the benefits that ORMs generally grant you.
LINQ to SQL is not about speed; it's about building a more maintainable software system. It's about all the stuff dedicated software engineers and architects care about, stuff like loose coupling and layering.
That's not to say that you can't build some really unmaintainable code with LINQ (nobody but you is keeping you from shooting yourself in the foot), but done properly, LINQ can help tremendously. I'm not saying LINQ is a silver bullet, however. It has a host of issues that make it difficult to use in many enterprise situations, which is why MS offers Entity Framework (ADO.NET 3.0). Of course, even that's not perfect, given the recent EF Vote of No Confidence.
Is LINQ to SQL or even EF better than raw SQL? I'd say a resounding Hells Yeah. Are there other solutions that might work better? Maybe.