Entity Framework performance difference with many includes

In my application I have a big nasty query that uses 25 includes. I know that might be a bit excessive, but it hasn't given us many problems and has been working fine. If I take the query generated by EF and run it manually in the database it takes around 500ms, and from the code EF takes around 700ms to get the data from the database and build up the object structure, and that is perfectly acceptable.
The problem, however, is on the production server. If I run the query manually there I see the same roughly 500ms to fetch the data, but Entity Framework now takes around 11000ms to get the data and build the objects, and that is of course not good by any measure.
So my question is: what can cause such extreme differences when the query, fired manually against the database, takes roughly the same time on both servers?
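For reference, a minimal sketch of what such a query looks like (the Order/Customer/OrderLine/Product entities and MyDbContext here are hypothetical stand-ins, not the asker's actual model):

    // Hypothetical entities standing in for the real model; the real query
    // has around 25 Includes.
    using System.Data.Entity; // lambda-based Include() lives here (EF 4.1+)
    using System.Linq;

    public static class OrderQueries
    {
        public static Order LoadOrderGraph(MyDbContext db, int orderId)
        {
            return db.Set<Order>()
                .Include(o => o.Customer)
                .Include(o => o.Customer.Address)
                .Include(o => o.Lines.Select(l => l.Product))
                // ...roughly 22 more Includes in the real query...
                .FirstOrDefault(o => o.Id == orderId);
        }
    }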

I ended up using Entity Framework in a more "manual" way.
So instead of using dbContext.Set<T> and a lot of includes, I manually used a series of dbContext.Database.SqlQuery<T>("select something from something else") calls. After a bit of painful coding to bind all the objects together, I tested it on the machines that had the problem, and now it works as expected on all machines.
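A minimal sketch of that approach, with the same hypothetical Order/OrderLine entities as above -- one raw SqlQuery per related set, then binding the objects together in memory:

    using System.Collections.Generic;
    using System.Linq;

    public static class ManualOrderQueries
    {
        public static List<Order> LoadForCustomer(MyDbContext db, int customerId)
        {
            // One raw query per table; EF materializes the rows but does no
            // relationship fix-up, so the graph is stitched together by hand.
            var orders = db.Database
                .SqlQuery<Order>("SELECT * FROM Orders WHERE CustomerId = @p0",
                                 customerId)
                .ToList();
            var lines = db.Database
                .SqlQuery<OrderLine>(
                    "SELECT * FROM OrderLines WHERE OrderId IN " +
                    "(SELECT Id FROM Orders WHERE CustomerId = @p0)",
                    customerId)
                .ToList();
            foreach (var order in orders)
                order.Lines = lines.Where(l => l.OrderId == order.Id).ToList();
            return orders;
        }
    }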
So I don't know why it worked on some machines and not others, but it seems that EF has problems on some machine setups when there are a great many includes.


How does Entity Framework generate queries?

For the last 5-6 months I've been using EF Code First for all my data access needs in .NET projects.
However, most of the time I get complaints from clients that the web pages are slow.
I recently took some time to drill down into this issue, and it seems most of the time is consumed by EF.
Normally the database has a few tens of thousands of records and only a couple of tables. I regularly use standard LINQ queries to manipulate the database. I also use the repository and unit of work patterns in my EF code, and I apply indexes to the database.
Nothing I do is magic; it's all the regular, standard, suggested way of doing things.
Is EF really taking this long to generate queries, or is there something I might be missing? Why is data access so slow even though I use standard algorithms and suggested approaches?
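For context, a typical read in the setup described above might look like the sketch below (MyDbContext and Customer are hypothetical; names are illustrative only). One thing worth checking in code like this is whether read-only queries run with change tracking enabled, since tracking every returned entity is a common source of exactly this kind of slowdown:

    using System.Collections.Generic;
    using System.Data.Entity;
    using System.Linq;

    public class CustomerRepository
    {
        private readonly MyDbContext _context;

        public CustomerRepository(MyDbContext context)
        {
            _context = context;
        }

        public List<Customer> GetActive()
        {
            // By default every entity returned here is change-tracked; for
            // read-only pages, AsNoTracking() skips that bookkeeping and is
            // often the first thing to try when EF reads feel slow.
            return _context.Customers
                .AsNoTracking()
                .Where(c => c.IsActive)
                .ToList();
        }
    }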

stored procedures and testing -- still a problem even today. Why?

Right now this is just a theory, so if I'm way off, feel free to comment and give some ideas (I'm running out of them). I guess it's more of an update to this question, and as I look at the "related questions" list, there are a lot with 0 answers. That tells me there's a real gap.
We have multiple problems with our SQL setups in general, the majority of which stem from stored procedures that have grown into monsters from hell, plus some other user functions scattered about the db. My biggest concern is that they're completely untested -- when something goes wrong, no one can say with 100% certainty, "yes, I know for a fact this works". That makes debugging a recurring nightmare.
This afternoon, I got this crazy idea that we could start writing some assemblies (CLR-ing, yo!) for SQL Server and test them. I ran into the constraints (static methods only, the safe/external/unsafe permission sets, etc.) and overall, that didn't go all that well -- at least not as well as I'd hoped, and it didn't help me move toward my goal.
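For illustration, this is roughly the kind of thing SQL CLR allows -- a public static method on a static class, which is the constraint mentioned above (the function itself is a hypothetical example):

    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;

    public static class SqlClrFunctions
    {
        // Deployed with CREATE ASSEMBLY, then exposed to T-SQL via
        // CREATE FUNCTION ... AS EXTERNAL NAME.
        [SqlFunction]
        public static SqlString NormalizeName(SqlString name)
        {
            if (name.IsNull)
                return SqlString.Null;
            return new SqlString(name.Value.Trim().ToUpperInvariant());
        }
    }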
I've also tried setting up data in a test by hand (they tried it here too before I showed up). Even using an ORM to seed the data, this becomes rather difficult very quickly and a maintenance hassle. Of course, most of this pain is in the data setup and not the actual test.
So what's out there now in 2011 that helps fix/curb this problem, or have we (as devs) abandoned the idea of testing stored procedures because of the heavy cost?
You can actually create stored procedure tests as a project. Our DBEs at work do that -- here's a link you might like: Database Unit Testing with Visual Studio
We've had a lot of success with DbFit.
Yes, there is a cost to setting up test data (there is no way to avoid this cost, IMHO), but the Fitnesse platform (on which DbFit is based) enables you to reuse data population scripts by including them within multiple tests.
Corporate culture rules the day. Some places test extensively. Other places, well, not so much.
I did a short-term contract with a Fortune 500 a few years ago. Design, build, and deploy internally. My database had to interface with two legacy systems, and it was clear early on that I was going to have to spend more time testing than usual. (Some days, a query of historical data would return 35 rows. Other days the identical query would return 20,000 rows.)
I built a tool in Microsoft Access that stored and executed SQL statements. (Access was the only tool I was allowed to use.) I could build a current version of the database, populate it with test data, and run all the tests I'd built--several hundred of them--in about 20 minutes.
It helped a lot to be able to go into meetings armed with a one-page printout that said my code was working exactly like it was when they signed off on it. But it wasn't easily automated--most of the SQL was hand-coded.
Can DBUnit help you?
I've not used it much myself, but you should be able to set the database to a known state, execute the procedure, and then verify the data has changed as expected.
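DBUnit itself is Java, but the known-state/execute/verify pattern it encourages is the same in any stack. Here is that pattern sketched in C# with NUnit and plain ADO.NET (the Stock table and UpdateStockLevel procedure are hypothetical):

    using System.Data.SqlClient;
    using NUnit.Framework;

    [TestFixture]
    public class UpdateStockLevelTests
    {
        private const string TestConnectionString =
            "Server=.;Database=ShopTest;Integrated Security=true";

        [Test]
        public void UpdateStockLevel_ReducesQuantity()
        {
            using (var conn = new SqlConnection(TestConnectionString))
            {
                conn.Open();
                // 1. Put the database into a known state.
                Exec(conn, "DELETE FROM Stock; " +
                           "INSERT INTO Stock (ProductId, Quantity) VALUES (1, 10);");
                // 2. Execute the stored procedure under test.
                Exec(conn, "EXEC UpdateStockLevel @ProductId = 1, @Delta = -3;");
                // 3. Verify the data changed as expected.
                using (var cmd = new SqlCommand(
                    "SELECT Quantity FROM Stock WHERE ProductId = 1", conn))
                {
                    Assert.AreEqual(7, (int)cmd.ExecuteScalar());
                }
            }
        }

        private static void Exec(SqlConnection conn, string sql)
        {
            using (var cmd = new SqlCommand(sql, conn))
                cmd.ExecuteNonQuery();
        }
    }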
EDIT: After looking into this more, it would seem you need something like SQLUnit rather than DBUnit. SQLUnit is described as:
SQLUnit is a regression and unit testing harness for testing database stored procedures. An SQLUnit test suite would be written as an XML file. The SQLUnit harness, which is written in Java, uses the JUnit unit testing framework to convert the XML test specifications to JDBC calls and compare the results generated from the calls with the specified results.
There are downsides: it's Java-based, which might not be your preference, and, more importantly, there doesn't seem to have been much activity on the project since June '06 :(

What percent of your codebase is represented by Data Access code?

An interesting question came up on Twitter tonight and I thought I'd post it here.
Basically, I am wondering what you are using to persist data to your database, and what percentage of your codebase you estimate is data access code.
-- Edit --
Other interesting metrics (as noted in the comments) include the number of business classes and the overall size of your codebase.
I just looked at a project I did long before discovering NHibernate. A quick look at some of the data access code showed about 10 lines of code for persistence/hydration for every persistable property in a class.
In one project we did, we used LLBLGen Pro as our OR/M. But since we didn't want to litter our application with LLBL entities, we mapped them to our BOs before they hit the client. That meant mapping them back to LLBL entities before hitting the DB again. The DAL code ended up being a very big portion of our application. Not 50%, but sizable.
In a project I am working on now, I started with db4o, hoping to minimize the DAL footprint. And it was very small, but I ran into problems with db4o and had to abandon it. I switched to ADO.NET for a few days just to get something working and was amazed at how much freakin' ADO I had to write just to get a simple repository working. That was a headache, so I finally opted for NHibernate.
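For a sense of the verbosity, a single hand-rolled ADO.NET read looks something like this (hypothetical Customer table; every property needs the same treatment, and the insert/update side is just as long):

    using System.Data.SqlClient;

    public class Customer
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public string Email { get; set; }
    }

    public class AdoCustomerRepository
    {
        private readonly string _connectionString;

        public AdoCustomerRepository(string connectionString)
        {
            _connectionString = connectionString;
        }

        public Customer GetById(int id)
        {
            using (var conn = new SqlConnection(_connectionString))
            using (var cmd = new SqlCommand(
                "SELECT Id, Name, Email FROM Customers WHERE Id = @id", conn))
            {
                cmd.Parameters.AddWithValue("@id", id);
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    if (!reader.Read())
                        return null;
                    // One block like this per property, for every entity...
                    return new Customer
                    {
                        Id = reader.GetInt32(0),
                        Name = reader.GetString(1),
                        Email = reader.IsDBNull(2) ? null : reader.GetString(2)
                    };
                }
            }
        }
    }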
My DAL code with NHibernate (2.0) is probably 5% or less of my codebase, and that's including the XML mapping files. I mean, the DAL footprint is so small and such a pleasure to work with. I had problems with NHibernate 1.2 in the past, in a distributed environment where I needed to work with detached objects, but NHibernate 2.0 seems to have solved that issue. It's the way I'll be doing DAL from now on, until something better comes along.
Interestingly, my colleagues and I had a similar conversation a couple of months ago. But ours was focused not so much on how much code our DAL accounted for, but how much of our application was simply data querying/manipulation. We estimated that probably 90% of our application was just a matter of fetching the correct subset of data and allowing users to edit it.
We use NHibernate most of the time, and it's generated by a tool -- dunno if that counts -- but we still end up adding a lot of code to the NHibernate layer for each of the entities, and yes, it does go on to be a large part of the code.
The DAL code also grows as you add more functionality and more entities to your app.
So, more or less, I guess around 20-30% of our codebase is the DAL.

What is a good balance in an MVC model to have efficient data access?

I am working on a few PHP projects that use MVC frameworks, and while they all have different ways of retrieving objects from the database, it always seems that nothing beats writing your SQL queries by hand, both for speed and for cutting down the number of queries.
For example, one of my web projects (written by a junior developer) executes over 100 queries just to load the home page. The reason is that in one place a method will load an object, but later, deeper in the code, it will load some other object(s) that are related to the first object.
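To make the shape of that problem concrete, here is a sketch in C# (with Dapper, to match the rest of this page; the Posts/Comments schema is hypothetical, and the pattern is identical in PHP):

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Linq;
    using Dapper;

    public class Post
    {
        public int Id { get; set; }
        public string Title { get; set; }
        public List<Comment> Comments { get; set; }
    }

    public class Comment
    {
        public int Id { get; set; }
        public int PostId { get; set; }
        public string Body { get; set; }
    }

    public static class HomePageData
    {
        // N+1: one query for the list, then one more query per row.
        public static List<Post> LoadSlow(SqlConnection conn)
        {
            var posts = conn.Query<Post>("SELECT Id, Title FROM Posts").ToList();
            foreach (var post in posts) // 100 posts => 101 queries
                post.Comments = conn.Query<Comment>(
                    "SELECT Id, PostId, Body FROM Comments WHERE PostId = @Id",
                    new { post.Id }).ToList();
            return posts;
        }

        // Two queries total, regardless of how many posts there are.
        public static List<Post> LoadFast(SqlConnection conn)
        {
            var posts = conn.Query<Post>("SELECT Id, Title FROM Posts").ToList();
            var byPost = conn.Query<Comment>("SELECT Id, PostId, Body FROM Comments")
                             .ToLookup(c => c.PostId);
            foreach (var post in posts)
                post.Comments = byPost[post.Id].ToList();
            return posts;
        }
    }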
This leads to the other part of the question: what are people doing in situations where one part of the code only needs the values of a few columns, and another part needs something else? Right now (in the same project) there is one get() method for each object, and it does a "SELECT *" (or lists all the columns in the table explicitly), so that any time you need the object for any reason, you get the whole thing.
So, in other words, you hear all the talk about how SELECT * is bad, but if you use an ORM class that comes with the framework, that is usually exactly what it wants to do. Are you stuck choosing between an ORM with SELECT * and writing specific SQL queries by hand? It just seems that we're stuck between convenience and efficiency, and if I hand-write the queries, then when I add a column I'll most likely have to add it in several places in the code.
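One possible middle ground, sketched again in C# with hypothetical names: keep the ORM's full get() for editing scenarios, and add narrow, purpose-built projections for the hot read paths:

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Linq;
    using Dapper;

    // A lightweight read model: only the columns the listing page needs.
    public class UserSummary
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public static class UserQueries
    {
        public static List<UserSummary> ListSummaries(SqlConnection conn)
        {
            // Two columns cross the wire instead of the whole row, and adding
            // a column to the table doesn't touch this query.
            return conn.Query<UserSummary>("SELECT Id, Name FROM Users").ToList();
        }
    }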
Sorry for the long question, but I'm explaining the background to get some perspectives from other developers rather than a specific solution. I know we can always use something like Memcached, but I would rather optimize what we can before getting into that.
Thanks for any ideas.
First, assuming you are proficient at SQL and schema design, there are very few instances where any abstraction layer that removes you from the SQL statements will exceed the efficiency of writing the SQL by hand. More often than not, you will end up with suboptimal data access.
There's no excuse for 100 queries just to generate one web page.
Second, if you are using the Object Oriented features of PHP, you will have good abstractions for collections of objects, and the kinds of extended properties that map to SQL joins. But the important thing to keep in mind is to write the best abstracted objects you can, without regard to SQL strategies.
When I write PHP code this way, I always find that I'm able to map the data requirements for each web page to very few, very efficient SQL queries if my schema is proper and my classes are proper. And not only that, but my experience is that this is the simplest and fastest way to implement. Putting framework stuff in the middle between PHP classes and a good solid thin DAL (note: NOT embedded SQL or dbms calls) is the best example I can think of to illustrate the concept of "leaky abstractions".
I got a little lost in your question, but if you are looking for a way to do database access, there are a couple of ways. Your MVC app can use the Zend Framework, which comes with database access abstractions.
Also keep in mind that you should design your system well to ensure there is no contention in the database; if your queries are scattered across the PHP pages, they may lock tables and cause the web application's performance to deteriorate and slow down over time.
That is why it is sometimes preferable to use stored procedures: everything is in one place and can be tuned when needed, though others may argue that it is easier to debug when the query statements are in the front-end code.
No ORM framework will ever get close to hand-written SQL in terms of speed, although 100 queries seems unrealistic (maybe you are exaggerating a bit). Even with the creator of the ORM framework writing the code, it will always be far from the speed of good old SQL.
My advice is to look at the whole picture, not only speed:
Does the framework improve code readability?
Is your team comfortable with writing SQL and mixing it with code?
Do you really understand how to optimize the framework queries? (I think a get() for each object is not the optimal way of retrieving them)
Do the queries (after optimization) of the framework present a bottleneck?
I've never developed anything with PHP, but I think you could mix both approaches (ORM and plain SQL). After a thorough profiling of the app you can determine the real bottlenecks, and only then replace the ORM code with hand-written SQL. (In Ruby you would usually use ActiveRecord, profile the application with something like New Relic, and finally, if you have a complicated AR query, replace it with raw SQL.)
Regards
Trust your experience.
To avoid repeating yourself so much in the code, you could write some simple model functions with your own SQL. This is what I do all the time, and I am happy with it.
Much of the "convenience" stuff was written for people who need magic because they cannot do it by hand or just don't have the experience.
And after all, it's a question of style.
Don't hesitate to add your own layer, or to exchange or extend a given layer with your own stuff. Keep it clean, make a good design and some documentation, so you feel at home when you come back later.

What is a good choice of ORM for an eCommerce website?

I am using C# 3.0 / .NET 3.5 and planning to build an eCommerce website.
I've seen NHibernate, LLBLGen Pro, Genome, LINQ to SQL, Entity Framework, SubSonic, etc.
I don't want to code everything by hand. If there is some specific bottleneck I'll manage to optimize the database/code.
Which ORM would be best? There is so much available these days that I don't even know where to start.
Which feature(s) should I be using?
Links, screencasts, and documentation are welcome.
I've been using NHibernate, which is a very good free solution. The one downside is the lack of documentation, which makes for a slightly steep ramp-up time. But once you get the basics down, it really speeds up development.
I like Fluent NHibernate as a way to configure without the XML files. The one thing I suggest, though, is to abstract your data access away from your application. That way, should you choose wrong, you don't have to worry about re-coding the app tiers.
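A minimal Fluent NHibernate mapping sketch, assuming a hypothetical Product entity, to show what configuring without the XML files looks like:

    using FluentNHibernate.Mapping;

    public class Product
    {
        // NHibernate proxies require virtual members.
        public virtual int Id { get; set; }
        public virtual string Name { get; set; }
        public virtual decimal Price { get; set; }
    }

    // Replaces the Product.hbm.xml file you would otherwise maintain.
    public class ProductMap : ClassMap<Product>
    {
        public ProductMap()
        {
            Id(x => x.Id);
            Map(x => x.Name);
            Map(x => x.Price);
        }
    }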
I can only really speak for LINQ-SQL and can say that it is:
Easy to use
Quick to get you up and running
Good for simple schemas and object models
but it starts to fall down if:
You're using a disconnected (tiered) architecture, because its DataContexts require the same object instances to perform tracking and concurrency (though there are ways around this).
You have a complex object model / database
Plus it has some other niggles and strange behaviour
I'm looking to try EF next myself and MS seem to be quietly dropping LINQ-SQL in favour of EF, which isn't exactly a ringing recommendation of LINQ-SQL :)
That depends on the architecture of the data model. I can speak to the effectiveness of SubSonic, since I'm in the process of launching a web app that it backs.
I've run into problems with JOINs and DISTINCTs while using SubSonic. Both times, all I had to do was patch the source and rebuild the DLL. Now, I'm not at all averse to something like this, but you might be.
Other than those two problems, SubSonic is a joy to use. Selects are very easy and flowing. It maps fairly closely to SQL, much the same way LINQ does. SubSonic also comes with scaffolding functionality that should be able to pre-build certain pages for you. I'm not sure how effective it is, since I like to do that stuff myself.
One more thing: selecting specific columns as opposed to * is slow, but only in debug mode. Once you compile for release, it's actually faster.
That's my two cents.
I started out using LINQ to SQL, as the whole LINQ integration is awesome, but if you want to do model-first rather than schema-first, and you want a rich domain model, then NHibernate/Fluent NHibernate is really the way to go. We switched to it, and it is far simpler and better supported than L2S. However, for straight dragging of your schema into the dbml code generator, LINQ to SQL is great.
I have also heard very good things about Mindscape Lightspeed but have not used it.
