Entity framework performance tuning - sql-server

We are using Entity Framework to query a SQL Server database. The LINQ expression is an IQueryable. This query takes about 10 seconds to execute. If this were in a stored procedure, I would play around with the query to make it more efficient. However, if I am using IQueryable, does Entity Framework itself decide how to build an efficient query, or do I have to play around with the LINQ expression and improve performance by trial and error?

As a database developer more than a C# developer, and with very limited exposure to Entity Framework specifics, I can say:
My understanding is that Entity Framework decides how to build a query, probably without much ability to understand efficiency. There might be some things you can do better or worse in your Linq query or Lambda expression, but for the most part you probably aren't going to be able to really tweak the query. This is a main downside of using an ORM, at least from the perspective of the DBAs who get paged in the middle of the night when the server crawls to a halt and they can't do anything to fix the query and it's not like you can always just add an index ;-).
I can also say that Entity Framework gives you the option to specify a stored procedure for each of the DML operations. If you really need to do something better with this particular query, create a stored procedure for just this one operation and point the EF object to it for SELECT, but allow EF to build the queries for INSERT / UPDATE / DELETE.
Does that help?

does entity framework itself decide on how to build an efficient query
EF will always automatically determine how to build the query, and SQL Server also optimizes queries automagically.
do I have to play around with the linq expression and improve performance with trial and error?
You can try to play with the queries, but typically minor changes (such as reordering expressions) won't affect performance.
You can always use SQL Profiler to watch what EF does and see how efficient the query is. If the query takes a long time, you can rerun it in SSMS with Include Actual Execution Plan turned on and determine where it is slow.

If you're using EF 6, you can enable logging quite easily. You can then inspect what each call is doing. I would start there.
MSDN EF6 Logging
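The idea behind that kind of logging is simply to capture each SQL statement as it is sent to the database. As a rough illustration outside EF, the sketch below uses Python's built-in sqlite3 module as a stand-in (an assumption: it is neither EF nor SQL Server), with a hypothetical customers table, and collects every executed statement through the connection's trace callback.

```python
import sqlite3

# Stand-in database: an in-memory SQLite table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Grace",)])
conn.commit()

logged_sql = []  # every statement the driver executes while tracing is on

# set_trace_callback passes each executed SQL statement to the callback.
conn.set_trace_callback(logged_sql.append)

rows = conn.execute("SELECT name FROM customers WHERE id = ?", (1,)).fetchall()

conn.set_trace_callback(None)  # stop logging

for statement in logged_sql:
    print(statement)
```

Once the generated SQL is visible, it can be pasted into a query tool and examined like any hand-written query.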
Are you able to share a bit more about your query and the result size?

Related

Multiple linq queries or just build a SQL view?

I have an MVC view that requires data from 5 different DB tables. I currently have a big LINQ query that joins all the tables and returns the results, and it works fine. However, I am wondering if it would be better to build a DB view to make the LINQ query simpler.
Querying 5 tables via a single query isn't necessarily a problem. It depends on a ton of external factors like how performant is your database setup and the characteristics of the tables themselves: are they huge with millions of rows or only a few hundred?
Assuming it is a problem, causing excessive load on your database or long page load times, then yes, you might want to look into an alternative solution, but a view is almost certainly not the right choice.
Views have a very key negative in that they cannot have keys nor indexes. That means unless you plan to just return everything in the view, it will almost always be slower to query into a view than even doing joins across tables. Frankly, I've pretty much never found a good use for a database view in a web application context. Maybe they work in other environments, such as reporting, but other than that, they're useless. If you need an alternative to Entity Framework, use a stored procedure.
Since your objective is performance, stick with the 5 joins. You could enable SQL Profiler and track the query that is being generated by EF. If you write the query manually and then have EF execute it, you will probably get better performance too.

SSRS Best Practice - Data Calculations/Aggregation in SQL SP or in SSRS Expressions (VS/Report Builder)

Should I try to do all (or as many as possible) necessary calculations for an SSRS report in SQL code (stored procedures) like summing, percentages etc. or should I do the calculations using Expressions in Report Builder/VS?
Is there an advantage to doing one over the other?
In other words, should I try to keep the data in my Datasets very granular, detailed, low level and then just use Report Builder 3.0/VS to do all the necessary calculations/aggregations?
There is no one-size-fits-all best approach. In a lot of cases, SQL will be faster at performing aggregations than SSRS. SSRS will be faster at performing the kind of operations that would cause a table scan instead of an index seek when it's done in SQL.
Experience, common sense, and testing are the best guides.
Almost always you want to do your filtering and calcs on the server side. If you do it through a stored procedure, SQL Server can optimize the query and create a well-prepared, reusable query plan. You can examine the resulting query plan and optimize it. None of this is possible if you create and run the code on the client side. How will it use indexes on the client? If your report uses a lot of data, it will take much longer to run and your users will blame you. The editor in BIDS is much poorer than the one in SSMS. Procs can be backed up and managed through SVN or TFS. Unless you know for sure that it runs faster on the client (and this is very rare), learn how to create stored procedures.
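To make the server-side versus client-side trade-off concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server and a hypothetical sales table; it says nothing about SSRS itself. Both approaches give the same totals, but the server-side version returns one row per group instead of every detail row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100), ("North", 250), ("South", 75), ("South", 25), ("South", 50)],
)

# Option A: let the database aggregate; one row per region comes back.
server_side = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

# Option B: pull every detail row and aggregate in the client.
detail_rows = conn.execute("SELECT region, amount FROM sales").fetchall()
client_side = {}
for region, amount in detail_rows:
    client_side[region] = client_side.get(region, 0) + amount

print(server_side == client_side)
print(len(server_side), "rows vs", len(detail_rows), "rows transferred")
```

With five rows the difference is invisible; with millions of detail rows, the amount of data shipped to the client is usually what dominates.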

LINQ is so slow with huge database table

I have an ASP.NET MVC application with a SQL Server backend. I have a table with 80 columns, currently holding about 975,413 records. I am using LINQ for the database transactions. The problem is that commands like SaveChanges(), Find(), Select() and so on take a very long time to execute.
How can I reduce the time taken to execute such Linq commands...
You'll have to do some profiling.
Log the actual SQL commands that Linq is generating.
Use SQL Server profiling to suss out which queries are the worst culprits in terms of performance. Examine those queries' execution strategies.
If Linq is generating silly SQL, then you might have to tweak your Linq code, or consider using raw SQL commands. If the execution strategies are showing unwanted strategies like table scans, then you might want to consider adding indices, or changing them (re-ordering the keys, adding included columns).
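The "table scan, then add an index" workflow can be sketched as follows. This is an illustration only: it assumes Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in for SQL Server execution plans, and the orders table and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

QUERY = "SELECT id, total FROM orders WHERE customer_id = ?"

def plan(connection, sql, params):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # last column holds the description

plan_before = plan(conn, QUERY, (42,))  # no usable index yet: a full scan
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
plan_after = plan(conn, QUERY, (42,))   # the new index can now be used

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan reports a scan of the table and the second names the index.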
Note also that Linq is generally quite slow. But really, 1 million records isn't that big, I'm sure you can improve performance using the above.

Performance Difference between LINQ and Stored Procedures

Related
LINQ-to-SQL vs stored procedures?
I have heard a lot of talk back and forth about the advantages of stored procedures being precompiled. But what are the actual performance differences between LINQ and stored procedures on Selects, Inserts, Updates, Deletes? Has anyone run any tests at all to see if there is any major difference? I'm also curious whether a greater number of transactions makes a difference.
My guess is that LINQ statements get cached after the first transaction and performance is probably going to be nearly identical. Thoughts?
LINQ should be close in performance, but I disagree with the statement above that says LINQ is faster. It can't be faster; it could possibly be just as fast, though, all other things being equal.
I think the difference is that a good SQL developer who knows how to optimize and uses stored procedures is always going to have a slight edge in performance. If you are not strong on SQL, let LINQ figure it out for you, and your performance is most likely going to be acceptable. If you are a strong SQL developer, use stored procedures to squeeze out a bit of extra performance if your app requires it.
It certainly is possible if you write terrible SQL to code up some stored procedures that execute slower than Linq would, but if you know what you are doing, stored procedures and a Datareader can't be beat.
Stored procedures are faster than LINQ queries because they can take full advantage of SQL features.
When a stored procedure is executed again, the database uses the cached execution plan to run it,
while a LINQ query is compiled each and every time.
Hence, a LINQ query takes more time to execute than a stored procedure.
Stored procedures are the best way to write complex queries, compared to LINQ.
LINQ queries can (and should be) precompiled as well. I don't have any benchmarks to share with you, but I think everyone should read this article for reference on how to do it. I'd be especially interested to see some comparison of precompiled LINQ queries to SPROCS.
There is not much difference, except that LINQ can degrade when you have a lot of data and you need some database tuning.
LINQ2SQL queries will not perform any differently from any other ad-hoc parameterized SQL query, other than the possibility that the generator may not optimize the query in the best fashion.
The common perception is that stored procedures perform better than ad-hoc SQL queries. However, this is false:
"SQL Server 2000 and SQL Server version 7.0 incorporate a number of changes to statement processing that extend many of the performance benefits of stored procedures to all SQL statements. SQL Server 2000 and SQL Server 7.0 do not save a partially compiled plan for stored procedures when they are created. A stored procedure is compiled at execution time, like any other Transact-SQL statement. SQL Server 2000 and SQL Server 7.0 retain execution plans for all SQL statements in the procedure cache, not just stored procedure execution plans."
-- SqlServer's Books Online
Given the above and the fact that LINQ generates ad-hoc queries, my conclusion is that there is no performance difference between Stored Procedures & LINQ. And I am also apt to believe that SQL Server wouldn't move backwards in terms of query performance.
LINQ should be used at the business logic layer on top of views created in SQL Server or Oracle. LINQ helps you insert another layer for business logic, whose maintenance is in the hands of coders or non-SQL people. It will definitely not perform as well as SQL because it's not precompiled, and you can do lots of different things in stored procedures.
But you can definitely add a programming detail and segregate the business logic from core sql tables and database objects using Linq.
See LINQ-to-SQL vs stored procedures for help - I think that post has most of the info. you need.
Unless you are trying to get every millisecond out of your application, whether to use a stored procedure or LINQ may need to be determined by what you expect developers to know and maintainability.
Stored procedures will be fast, but when you are actively developing an application you may find that the ease of using LINQ is a positive, as you can change your query and the anonymous type created from LINQ very quickly.
Once you are done writing the application and you know what you need, and start to look at optimizing it, then you can look at other technologies and if you have good unit testing then you should be able to compare different techniques and determine which solution is best.
You may find this comparison of various ways for .NET 3.5 to interact with the database useful.
http://toomanylayers.blogspot.com/2009/01/entity-framework-and-linq-to-sql.html
I don't think I would like having my database layer in compiled code. It should be a separate layer, not combined. As I develop and make use of Agile, I am constantly changing the database design, and the process goes very fast. Adding columns, removing columns or creating new tables in SQL Server is as easy as typing into Excel. Normalizing or de-normalizing a table is also pretty fast at the database level. Now with LINQ I would also have to change the object representation every time I make a change to the database, or live with it not truly reflecting how the data is stored. That is a lot of extra work.
I have heard that Linq entities can shelter your application from database change but that doesn't make sense. The database design and application design need to go hand in hand. If I normalize several tables or do some other redesign of the database I wouldn't want a Linq object model to no longer reflect the actual database design.
And what about advantage of tweaking a View or Stored Procedure. You can do that directly at the database level without having to re-compile code and release it to production. If I have a View which shows data from several tables and I decide to change the database design all I have to do is change that View. All my code remains the same.
Consider a database table with a million entries, joined to another table with a million entries... do you honestly think that doing this on the webserver (be it in LINQ or ad-hoc SQL) is going to be faster or more efficient than letting SQL Server do it on the database?
For simple queries, LINQ is obviously better as it will be pre-compiled, giving you the advantage of type checking, etc. However, for any intensive database operations, report building, or bulk data analysis that needs doing, stored procedures will win hands down.

Instrumenting Database Access

Jeff mentioned in one of the podcasts that one of the things he always does is put in instrumentation for database calls, so that he can tell what queries are causing slowness etc. This is something I've measured in the past using SQL Profiler, but I'm interested in what strategies other people have used to include this as part of the application.
Is it simply a case of including a timer across each database call and logging the result, or is there a 'neater' way of doing it? Maybe there's a framework that does this for you already, or is there a flag I could enable in e.g. Linq-to-SQL that would provide similar functionality.
I mainly use c# but would also be interested in seeing methods from different languages, and I'd be more interested in a 'code' way of doing this over a db platform method like SQL Profiler.
If a query is more than just a simple SELECT on a single table, I always run it through EXPLAIN if I am on MySQL or PostgreSQL. If you are using SQL Server, then Management Studio has a Display Estimated Execution Plan option, which is essentially the same. It is useful to see how the engine will access each table and what indexes it will use. Sometimes it will surprise you.
Recording the database calls, the gross timing and the number of records (bytes) returned in the application is useful, but it's not going to give you all the information you need.
It might show you usage patterns you were not expecting. It might show where you're using "row-by-row" access instead of "set-based" operations.
The best tool to use is SQL Profiler: analyse the number of "Reads" against the CPU and duration. You want to avoid high-CPU queries, high reads and long durations (duh!).
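As a minimal sketch of recording calls, gross timing and row counts in the application, the snippet below wraps each query in a timer. It assumes Python's built-in sqlite3 module as a stand-in database; the timed_query helper, the query_log list and the items table are all hypothetical names introduced for illustration.

```python
import sqlite3
import time

query_log = []  # one dict per database call

def timed_query(connection, sql, params=()):
    """Run a query, recording its SQL text, duration and row count."""
    started = time.perf_counter()
    rows = connection.execute(sql, params).fetchall()
    elapsed_ms = (time.perf_counter() - started) * 1000.0
    query_log.append({"sql": sql, "ms": elapsed_ms, "rows": len(rows)})
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)", [("item %d" % i,) for i in range(500)]
)

timed_query(conn, "SELECT * FROM items")
timed_query(conn, "SELECT * FROM items WHERE id = ?", (7,))

# Slowest calls first, so the worst offenders are at the top.
for entry in sorted(query_log, key=lambda e: e["ms"], reverse=True):
    print("%8.3f ms  %4d rows  %s" % (entry["ms"], entry["rows"], entry["sql"]))
```

A log like this shows which calls are slow or chatty, but not why; the reads, CPU and plan details still have to come from the database side.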
The "group by reads" is a useful feature to bring to the top the nastiest queries.
If you're writing queries in SQL Server Management Studio, you can enter SET STATISTICS TIME ON and SQL Server will tell you how long the individual parts of a query took to parse, compile and execute.
You might be able to log this information by handling the InfoMessage event of the SqlConnection class (but I think using the SQL Profiler is much easier.)
I would have thought that the important thing to ask here is "what database platform are you using?"
For example, in Sybase, installing MDA tables might solve your problem, they provide a whole bunch of statistics from procedure call usage to average logical I/O, CPU time and index coverage. It can be as clever as you want it to be.
I definitely see the value in using SQL Profiler while your app is running, and EXPLAIN or SET STATISTICS will give you information about individual queries. But does anyone routinely put measurement points into their code to gather ongoing information about database queries? That would pick up on, for example, a query on a table that performs fine initially but becomes slower and slower as the number of rows grows.
If you're using MySQL or PostgreSQL there are various tools for seeing query activity in real time, but I haven't found a tool as good as SQL Profiler for measuring query performance over time.
I'm wondering if there is (or should be?) something similar to ELMAH in the way it just plugs in and gives you information without much additional effort?
If you're into Firebird you may want to watch sinatica.com.
We'll soon launch a real-time monitoring tool for Firebird DBAs.
< /shameless plug>
If you use Hibernate (I use the Java version, I'd imagine NHibernate has something similar), you can have Hibernate collect statistics about lots of different things. See, for example:
http://www.javalobby.org/java/forums/t19807.html
