How does Entity Framework generate queries? - sql-server

For the last 5-6 months I've been using EF Code First for all my data access needs in .NET projects.
However, I frequently get complaints from clients that the web pages are slow.
I recently took some time to drill down into this issue, and it seems most of the time is being consumed by EF.
Normally the database has a few tens of thousands of records and only a couple of tables. I regularly use standard LINQ queries to manipulate the database. I also use the repository and unit of work patterns in my EF code, and I apply indexes to the database.
Nothing I do is magic; I just follow the regular, standard, suggested approaches.
Is EF really taking that long to generate queries, or is there something I might be missing? Why is data access so slow even though I use standard algorithms and the suggested approaches?
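A minimal sketch of how the time can be broken down further (assuming EF6; AppContext, Order and customerId are placeholder names). Database.Log writes every command EF generates, along with its execution time, which helps separate the cost of building the query from the cost of running it on SQL Server, and AsNoTracking removes change-tracking overhead for read-only pages:

    using System;
    using System.Data.Entity;
    using System.Linq;

    public class Order
    {
        public int Id { get; set; }
        public int CustomerId { get; set; }
    }

    public class AppContext : DbContext
    {
        public DbSet<Order> Orders { get; set; }
    }

    public static class EfDiagnostics
    {
        public static void DumpOrders(int customerId)
        {
            using (var db = new AppContext())
            {
                // Log the generated SQL and its timing to the console.
                db.Database.Log = Console.WriteLine;

                var orders = db.Orders
                               .AsNoTracking()   // read-only: skip change tracking
                               .Where(o => o.CustomerId == customerId)
                               .ToList();

                Console.WriteLine("Fetched {0} orders", orders.Count);
            }
        }
    }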

Related

Entity Framework performance difference with many includes

In my application I have a big, nasty query that uses 25 includes. I know that might be a bit excessive, but it hasn't given us many problems and has been working fine. If I take the query generated by EF and run it manually against the database it takes around 500 ms, and from code EF takes around 700 ms to get the data from the database and build up the object structure, which is perfectly acceptable.
The problem, however, is on the production server. If I run the query manually there I see roughly the same 500 ms to fetch the data, but Entity Framework now takes around 11000 ms to get the data and build the objects, which is of course not acceptable by any measure.
So my question is: what can cause these extreme differences when the query, fired manually against the database, takes roughly the same time?
I ended up using Entity Framework in a more "manual" way.
So instead of using dbContext.Set<T> and a lot of includes, I manually used a series of dbContext.Database.SqlQuery<T>("select something from something else") calls. After a bit of painful coding to bind all the objects together, I tested it on the machines that had the problem, and now it works as expected on all machines.
So I don't know why it worked on some machines and not others, but it seems that EF has problems on some machine setups when there is a great number of includes.
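A rough sketch of that "manual" pattern (EF6 assumed; the table names, the CustomerRow/OrderRow classes and the SQL text are illustrative placeholders). Each SqlQuery call replaces one Include, and the object graph is stitched together in memory with LINQ:

    using System.Collections.Generic;
    using System.Data.Entity;
    using System.Linq;

    public class CustomerRow
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public class OrderRow
    {
        public int Id { get; set; }
        public int CustomerId { get; set; }
    }

    public class CustomerGraph
    {
        public CustomerRow Customer { get; set; }
        public List<OrderRow> Orders { get; set; }
    }

    public static class ManualLoader
    {
        public static List<CustomerGraph> Load(DbContext db)
        {
            // One plain SQL query per former Include.
            var customers = db.Database
                .SqlQuery<CustomerRow>("SELECT Id, Name FROM Customers")
                .ToList();
            var orders = db.Database
                .SqlQuery<OrderRow>("SELECT Id, CustomerId FROM Orders")
                .ToList();

            // Bind the objects together by hand.
            var ordersByCustomer = orders.ToLookup(o => o.CustomerId);
            return customers
                .Select(c => new CustomerGraph
                {
                    Customer = c,
                    Orders = ordersByCustomer[c.Id].ToList()
                })
                .ToList();
        }
    }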

When should entity framework be used?

I'm new to Entity Framework and, of course, I have found a few questions on SO regarding its target use cases.
Let me give you some information. I'm not dealing with different database vendors or different databases; there is one, and only one, SQL Server 2008 instance, and the database has fewer than 30 tables. Do I really need to redo things and go with Entity Framework?
EDIT:
Thanks to James for fixing my question. So I'm assuming that using EF adds overhead and does some work in the background that I won't know about. That is typical of how MS tools work, so I guess my next questions are:
Does it affect performance as well?
Does it support the hierarchyid data type?
The main reason for using Entity Framework (or any other object-relational mapper such as LINQ to SQL, NHibernate, etc.) is to get easier data access in your application. With EF you gain access to the power of querying through LINQ and simple updating by just assigning the new values to a .NET object, without ever needing to write any SQL yourself.
Personally, I use EF Code First with EF Migrations for green-field development, and LINQ to SQL in the cases where I have an existing database to build upon.
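A minimal sketch of what that looks like in practice (Code First assumed; Blog and BlogContext are made-up names). The query is expressed in LINQ and the update is just a property assignment followed by SaveChanges; EF generates the SELECT and UPDATE statements:

    using System.Data.Entity;
    using System.Linq;

    public class Blog
    {
        public int Id { get; set; }
        public string Title { get; set; }
        public int Rating { get; set; }
    }

    public class BlogContext : DbContext
    {
        public DbSet<Blog> Blogs { get; set; }
    }

    public static class Example
    {
        public static void RenameTopBlog()
        {
            using (var db = new BlogContext())
            {
                // Querying through LINQ; EF translates this to SQL.
                var top = db.Blogs
                            .OrderByDescending(b => b.Rating)
                            .FirstOrDefault();

                if (top != null)
                {
                    // Updating by assigning to the object; SaveChanges
                    // emits the UPDATE statement.
                    top.Title = "Top rated blog";
                    db.SaveChanges();
                }
            }
        }
    }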
ORMs (object-relational mappers) such as Entity Framework should be used if you are doing interactive data manipulation. The typical OLTP app can see a big reduction in the number of lines of code needed compared to something like the plain ADO.NET API (either with dynamic SQL or with stored procedure invocation). ORMs do have a performance penalty, but in interactive systems this penalty becomes negligible: on one hand because of the big reduction in "grunt work", and on the other because the element that introduces the biggest delay is the human interacting with the system.
If, on the other hand, you need to do heavy batch processing of complex data, taking rows from some tables and putting them into other tables without any interactive user intervention, then the performance penalty of the ORM approach becomes noticeable. Under those circumstances it pays off to avoid pulling data out of the database's memory space, and the performance advantage of using stored procedures finally shows (a rough sketch follows the summary below).
So in general:
Use an ORM for interactive stuff
Use stored procedures for non-interactive batch stuff
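To make the batch side of that rule concrete, here is a hedged sketch (EF6 assumed; dbo.ArchiveOldOrders and its @cutoff parameter are hypothetical). The set-based work stays inside the stored procedure, so the rows never leave the database's memory space; the ORM is only used to invoke it:

    using System;
    using System.Data.Entity;
    using System.Data.SqlClient;

    public static class BatchJobs
    {
        public static int ArchiveOldOrders(DbContext db, DateTime cutoff)
        {
            // Returns the number of rows affected by the procedure.
            return db.Database.ExecuteSqlCommand(
                "EXEC dbo.ArchiveOldOrders @cutoff",
                new SqlParameter("@cutoff", cutoff));
        }
    }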

Dataset retrieving data from another dataset

I work with an application that is switching from file-based data storage to database-based storage. It has a very large amount of code written specifically for the file-based system. To make the switch I am implementing functionality that works like the old system; the plan is then to make more optimal use of the database in new code.
One problem is that the file-based system often read single records, and read them repeatedly for reports. This has turned into a lot of queries to the database, which is slow.
The idea I have been trying to flesh out is using two datasets: one dataset to retrieve an entire table, and another dataset to query against the first, thereby decreasing communication overhead with the database server.
I've tried to look at the DataSource property of TADODataSet, but the dataset still seems to require a connection, and it queries the database directly if Connection is assigned.
The reason I would prefer to get the result in another dataset, rather than navigating the first one, is that a good amount of logic has already been implemented to emulate the old system. This logic is based on having a dataset containing only the results as queried with the old interface.
The functionality only has to support reading data, not writing it back.
How can I use one dataset to supply values for another dataset to select from?
I am using Delphi 2007 and MSSQL.
You can use a ClientDataSet/DataSetProvider pair to fetch data from an existing dataset. You can use filters on the source dataset, filters on the ClientDataSet, and provider events to trim the dataset down to only the interesting records.
I've used this technique with success in a couple of migration projects and to mitigate a similar situation where an old SQL Server 7 database was queried thousands of times to retrieve individual records, at a painful performance cost. Querying it only once and then fetching individual records from the client dataset was, at the time, not only an elegant solution but a great performance boost for that particular application: the best example was an 8-hour process reduced to 15 minutes... the poor users loved me for it.
A ClientDataSet is just a TDataSet you can seamlessly integrate into existing code and UI.

Fooling Around With Databases, Online Development

I'm trying to design a database for a small project I'm working on. Eventually I'd like to make it a web app, but right now I don't mind just experimenting with data offline. However, I'm stuck at a crossroads.
The basic concept is that a user inputs values for 10 fields, to be compared against what is in the database, with each item having a weighted value. I know that if I were to code it, I could use a look-up table for each field, add up the values, and display the result to the end user.
Another example would be having to get the distance between two points, each point stored in its own row, with the X value getting its own column as well as the Y value.
Now, if I store the data in a database, should I try to do everything within queries (which I think would involve temporary tables, among other things), or just use simple queries and manipulate the returned rows within the application code?
Right now I'm leaning towards the latter (manipulating the data within the app) and just using queries to reduce the amount of data I would have to sort through. What would you suggest?
EDIT: Right now I'm using Microsoft Access to get the basics down and try to get a good design going. IIRC, from my experience with Oracle and MySQL, you can run commands together in a batch and return just one result, but I'm not sure whether you can do that with Access.
If you're using a database, I would strongly suggest using SQL to do all your data manipulation. SQL is far more capable and powerful for this kind of job than imperative programming languages.
Of course, it does imply that you're comfortable thinking about data as "sets" and programming in a declarative style. But spending time now to get really comfortable with SQL and manipulating data using SQL will pay off big time in the long run, not only for this project but for future projects as well. I would also suggest using stored procedures over queries in code, because stored procedures provide a nice abstraction layer that allows your table design to change over time without impacting the rest of the system.
A very big part of using and working with databases is understanding data modeling, normalization and the like. Like everything else it will take some effort, but in the long run it will pay off.
May I ask why you're using Access when you have a far better database available to you, such as MSSQL Express? The migration path from MSSQL Express to full MSSQL or even SQL Azure is quite seamless, and everything you do and learn today (in this project) translates completely to MSSQL Server/SQL Azure for future projects, and also helps if this project grows beyond your expectations.
I don't understand your last statement about running a batch process and getting just one result, but if you can do it in Oracle and MySQL then you can do it in MSSQL Express as well.
What Shiv said, and also...
A good DBMS has quite a bit of solid engineering in it. There are two components that are especially carefully engineered, namely the query optimizer and the transaction controller. If you adopt the view of using the DBMS as just a stupid table retrieval tool, you will most likely end up inventing your own optimizer and transaction controller inside the application. You won't need the transaction controller until you move to an environment that supports multiple concurrent users.
Unless your engineering talents are extraordinary, you will probably end up with a home brew data management system that is not as good as the one in a good DBMS.
The learning curve for SQL is steep. You need to learn how to phrase queries that join, project, and restrict data from multiple tables. You need to learn how to handle updates in the context of a transaction.
You need to learn simple and sound table design and index design. This includes, but is not limited to, data normalization and data modeling. And you need a DBMS with a good optimizer and good transaction control.
The learning curve is steep. But the view from the top is worth the climb.

SQLite optimization

I am considering using SQLite in a desktop application to persist my model.
I plan to load all the data into model classes when the user opens a project and write it all back when the user saves. I will write all the data, not just the delta that changed (since it is hard for me to tell what changed).
The data may contain thousands of rows which I will need to insert. I am afraid that consecutive insertion of many rows will be slow (and a preliminary test proves it).
Are there any optimization best practices/tricks for such a scenario?
EDIT: I use System.Data.SQLite for .NET.
Like Nick D said: If you are going to be doing lots of inserts or updates at once, put them in a transaction. You'll find the results to be worlds apart. I would suggest re-running your preliminary test within a transaction and comparing the results.
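A minimal sketch of that with System.Data.SQLite (the Item table and its columns are made-up; the point is the single transaction and the reused parameterized command). One commit at the end replaces one implicit commit per row, which is where most of the time usually goes:

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SQLite;

    public static class BulkInsert
    {
        public static void InsertItems(string dbPath, IEnumerable<KeyValuePair<int, string>> items)
        {
            using (var conn = new SQLiteConnection("Data Source=" + dbPath))
            {
                conn.Open();

                using (var tx = conn.BeginTransaction())
                using (var cmd = new SQLiteCommand(
                    "INSERT INTO Item (Id, Name) VALUES (@id, @name)", conn, tx))
                {
                    var id = cmd.Parameters.Add("@id", DbType.Int32);
                    var name = cmd.Parameters.Add("@name", DbType.String);

                    foreach (var item in items)
                    {
                        id.Value = item.Key;
                        name.Value = item.Value;
                        cmd.ExecuteNonQuery();   // runs inside the open transaction
                    }

                    tx.Commit();   // one commit for the whole batch
                }
            }
        }
    }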
