LINQ is so slow with huge database table

I have an ASP.NET MVC application with a SQL Server backend. One table has 80 columns and currently holds about 975,413 records. I am using LINQ for all database access. The problem is that commands like SaveChanges(), Find(), and Select() take a long time to execute.
How can I reduce the time taken to execute such LINQ commands?

You'll have to do some profiling.
Log the actual SQL commands that LINQ is generating.
Use SQL Server Profiler to find out which queries are the worst culprits in terms of performance. Examine those queries' execution plans.
If LINQ is generating silly SQL, then you might have to tweak your LINQ code, or consider using raw SQL commands. If the execution plans show unwanted operations like table scans, then you might want to consider adding indexes, or changing existing ones (re-ordering the keys, adding included columns).
Note also that LINQ is generally quite slow. But really, 1 million records isn't that big; I'm sure you can improve performance using the above.
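
To illustrate the point about table scans and indexes, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in, purely because it runs without a server; SQL Server's plans and tooling look different, but the scan-versus-index-lookup idea is the same. The table `orders`, its columns, the index name `ix_orders_customer`, and the helper `query_plan` are all made up for this example.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the plan description strings SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the fourth column of each plan row.
    return [row[3] for row in rows]

# Hypothetical table standing in for a large production table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10000)],
)

sql = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index on customer_id the engine has to read every row.
plan_before = query_plan(conn, sql, (42,))

# Add an index on the filtered column and ask for the plan again.
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The plan before the index reports a scan of the whole table, and the plan after it reports a search using the new index. On SQL Server you would read the same information from the actual execution plan in SSMS.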

Related

Multiple linq queries or just build a SQL view?

I have a mvc view that requires data from 5 different DB tables. I currently have a big LINQ query that joins all the tables and returns the results, works fine. However, I am wondering if it would better to build a DB view to make the LINQ query simple.

Querying 5 tables via a single query isn't necessarily a problem. It depends on a lot of external factors, such as how well your database setup performs and the characteristics of the tables themselves: are they huge, with millions of rows, or do they hold only a few hundred?
Assuming it is a problem, causing excessive load on your database or long page load times, then yes, you might want to look into an alternative solution, but a view is almost certainly not the right choice.
Views have a key drawback: an ordinary view has no keys or indexes of its own (SQL Server does offer indexed views, but they come with restrictions). That means unless you plan to just return everything in the view, it will almost always be slower to query into a view than even doing joins across tables. Frankly, I've pretty much never found a good use for a database view in a web application context. Maybe they work in other environments, such as reporting, but other than that, I find them useless. If you need an alternative to Entity Framework, use a stored procedure.

Since your objective is performance, stay with the 5 joins. You could run SQL Profiler and capture the query that EF generates. If you write the query by hand and have EF execute it, you will probably get better performance too.

Entity framework performance tuning

We are using entity framework to query a SQL server database. The LINQ expression is IQueryable. This query takes about 10 seconds to execute. If this were in a stored procedure, I would play around with the query to make it more efficient. However if I am using IQueryable, does entity framework itself decide on how to build an efficient query or do I have to play around with the linq expression and improve performance with trial and error?

As a database developer more than a C# developer, and with very limited exposure to Entity Framework specifics, I can say:
My understanding is that Entity Framework decides how to build a query, probably without much ability to understand efficiency. There might be some things you can do better or worse in your Linq query or Lambda expression, but for the most part you probably aren't going to be able to really tweak the query. This is a main downside of using an ORM, at least from the perspective of the DBAs who get paged in the middle of the night when the server crawls to a halt and they can't do anything to fix the query and it's not like you can always just add an index ;-).
I can also say that Entity Framework gives you the option to specify a stored procedure for each of the DML operations. So if you really need to do something better with this particular query, create a stored procedure for just this one operation and point the EF object to it for SELECT, but allow EF to build the queries for INSERT / UPDATE / DELETE.
Does that help?

does entity framework itself decide on how to build an efficient query
EF will always automatically determine how to build the query, and SQL Server also optimizes queries automagically.
do I have to play around with the linq expression and improve performance with trial and error?
You can try to play with the queries, but minor changes (such as the ordering of expressions) typically won't affect performance.
You can always use SQL Profiler to watch what EF does and see how efficient the query is. If the query takes a long time, you can rerun it in SSMS with Include Actual Execution Plan turned on and determine where it is slow.

If you're using EF 6, you can enable logging quite easily. You can then inspect what each call is doing. I would start there.
MSDN EF6 Logging
Are you able to share a bit more about your query and the result size?
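
The general idea of logging every statement the data layer sends to the database can be sketched outside EF as well. The following is not EF code: it is a small Python stand-in using the standard-library sqlite3 module, whose `set_trace_callback` hands each executed statement to a callback. The `users` table and the `logged` list are invented for the illustration.

```python
import sqlite3

# Every statement the connection executes is appended to this list.
logged = []

conn = sqlite3.connect(":memory:")
conn.set_trace_callback(logged.append)

# Hypothetical table and data, only here to generate some statements.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
rows = conn.execute("SELECT name FROM users WHERE id = ?", (1,)).fetchall()

# Stop tracing once the statements of interest have been captured.
conn.set_trace_callback(None)

for statement in logged:
    print(statement)
```

Reading such a log next to the code that triggered it is usually the quickest way to spot a query that fetches far more than the caller needs, or one that runs once per row.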

Hive vs SQL Server performance

1) I started using Hive two months ago. I have the same task implemented in SQL. I found that Hive is slow and takes much longer to execute queries, while SQL executes them in a few minutes or seconds.
When I cross-checked the results of the task in both (SQL and Hive), I found differences in some tables, though not all.
E.g.: one table has 2012 records in SQL, but when I ran the same task against the same table in Hive I got 2007 records.
Why is this happening?
2) What should I do to speed up execution in Hive?
(Currently I am running all of this on a single cluster. If I add clusters, how many would I need to improve performance?)
Please suggest some solutions or good practices.
Thanks.

Hive and SQL Server are not comparable in any way other than the similarity in the syntax of the query language.
While SQL Server is built to be able to respond in realtime from a single machine, hive is for processing large data sets that may span hundreds or thousands of machines.
Hive (via hadoop) has a lot of overhead for starting up a job.
Hive and hadoop will not cache data in memory like sql server does.
Hive has only recently added indexes, so most queries end up being a table scan.
If your dataset fits on a single computer you probably want to stick with SQL Server and not hive. Hive performance tuning is mostly based in Hadoop performance tuning although depending on the types of queries you run there can be free performance from using the LazyBinarySerDe.
Hive does have some differences from regular SQL that may be affecting your query. Without more details I can't speculate as to why.

Ignore the "they aren't comparable in any way" comment. If it stores data, it is comparable to any other method of storing data.
But be aware that SQL Server, 13 years ago, had 1000+ people being paid full-time to improve the product. So while that doesn't "prove" anything, it does increase one's confidence that more work = more results.
More importantly, look for any non-trivial benchmark done on an open source and/or non-relational method of storing data vs one of the mainstream relational databases. You won't find them. That says a lot to me. (Also, mainstream isn't necessary since the current world's fastest data engine isn't even mainstream. But if that level is needed, look at ExoSol.)
If your need is to learn to work with technology at your job and that technology is Hive, my recommendation is to find someone who is really focused on getting the most out of Hive query performance as possible. If there is a Hive query guru out there, find them. But if you need a lot more than what they can give you, you're using the wrong technology.
And if Hive isn't a requirement, I would avoid it and other technologies lacking the compelling business model that will guarantee their survival past 5 years and move them out of niche category they currently exist in (currently 20 times less popular than any mainstream data engine - https://db-engines.com/en/ranking).

How do you gather statistics from SQL Server?

sys.dm_exec_query_stats seems to be a very useful function to gather statistics from your database which you can use as a starting point to find queries which need to be optimized. selecting * gives somewhat cryptic results, how do you make the results readable? What type of queries do you get from it? Are there other functions or queries you use to gain performance statistics?

To make the results useful, you need to cross-reference the information with a few other DMVs and concentrate your analysis and tuning efforts on the most poorly performing queries.
Here is an example (one I made earlier) of using the DMV you mentioned to identify the most costly SQL Server queries.
How to identify the most costly SQL Server queries using DMV’s
You can easily extend this to look at other metrics too.
If you want to make performance tuning a breeze for yourself, you should consider installing the freely available SQL Server Performance Dashboard Reports.
These can be used to identify SQL Server Waits, the queries that consume the most I/O, the longest running queries by duration etc.
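
As a rough sketch of the cross-referencing described above, the Python helper below builds a T-SQL query that joins sys.dm_exec_query_stats to sys.dm_exec_sql_text so each row shows the statement text next to its counters. The helper name `build_costly_queries_sql` and the `RANKABLE_COLUMNS` set are made up for this example. It only produces the query string; running it needs a SQL Server connection through whatever driver you use (pyodbc, for instance, is an assumption and not shown).

```python
# Columns of sys.dm_exec_query_stats that this sketch allows ranking by.
RANKABLE_COLUMNS = {
    "total_worker_time",    # CPU time
    "total_elapsed_time",   # wall-clock duration
    "total_logical_reads",  # logical I/O
    "execution_count",
}

def build_costly_queries_sql(order_by="total_worker_time", top_n=10):
    """Return T-SQL listing the top_n cached statements ranked by order_by."""
    if order_by not in RANKABLE_COLUMNS:
        raise ValueError(f"unsupported ranking column: {order_by!r}")
    top_n = int(top_n)
    if top_n <= 0:
        raise ValueError("top_n must be positive")
    # The offsets are byte positions into Unicode text, hence the division
    # by 2; an end offset of -1 means "to the end of the batch".
    return f"""
SELECT TOP ({top_n})
    qs.execution_count,
    qs.total_worker_time,
    qs.total_elapsed_time,
    qs.total_logical_reads,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.{order_by} DESC;
""".strip()

print(build_costly_queries_sql("total_elapsed_time", 5))
```

Restricting `order_by` to a fixed set keeps the column name from becoming an injection point, since it is spliced into the SQL text and cannot be passed as a parameter.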

Why don't you first use 'set pagesize 0'.

Performance Difference between LINQ and Stored Procedures

Related
LINQ-to-SQL vs stored procedures?
I have heard a lot of talk back and forth about the advantages of stored procedures being pre compiled. But what are the actual performance difference between LINQ and Stored procedures on Selects, Inserts, Updates, Deletes? Has anyone run any tests at all to see if there is any major difference. I'm also curious if a greater number of transactions makes a difference.
My guess is that LINQ statements get cached after the first transaction and performance is probably going to be nearly identical. Thoughts?

LINQ should be close in performance, but I disagree with the statement above that says LINQ is faster. It can't be faster; it could possibly be just as fast, all other things being equal.
I think the difference is that a good SQL developer who knows how to optimize and uses stored procedures is always going to have a slight edge in performance. If you are not strong on SQL, let LINQ figure it out for you, and your performance is most likely going to be acceptable. If you are a strong SQL developer, use stored procedures to squeeze out a bit of extra performance if your app requires it.
It certainly is possible if you write terrible SQL to code up some stored procedures that execute slower than Linq would, but if you know what you are doing, stored procedures and a Datareader can't be beat.

Stored procedures are faster than LINQ queries because they can take full advantage of SQL features.
When a stored procedure is executed again, the database uses the cached execution plan to run it,
while a LINQ query is compiled each and every time.
Hence, a LINQ query takes more time to execute than a stored procedure.
Stored procedures are a better way to write complex queries than LINQ.

LINQ queries can (and should be) precompiled as well. I don't have any benchmarks to share with you, but I think everyone should read this article for reference on how to do it. I'd be especially interested to see some comparison of precompiled LINQ queries to SPROCS.

There is not much difference except that LINQ can degrade when you have lot of data and you need some database tuning.

LINQ2SQL queries will not perform any differently from any other ad-hoc parameterized SQL query, other than the possibility that the generator may not optimize the query in the best fashion.

The common perception is that ad-hoc SQL queries perform worse than stored procedures. However, this is false:
SQL Server 2000 and SQL Server version 7.0 incorporate a number of changes to statement processing that extend many of the performance benefits of stored procedures to all SQL statements. SQL Server 2000 and SQL Server 7.0 do not save a partially compiled plan for stored procedures when they are created. A stored procedure is compiled at execution time, like any other Transact-SQL statement. SQL Server 2000 and SQL Server 7.0 retain execution plans for all SQL statements in the procedure cache, not just stored procedure execution plans.
-- SQL Server Books Online
Given the above and the fact that LINQ generates ad-hoc queries, my conclusion is that there is no performance difference between Stored Procedures & LINQ. And I am also apt to believe that SQL Server wouldn't move backwards in terms of query performance.

LINQ should be used at the business logic layer, on top of views created in SQL Server or Oracle. LINQ helps you insert another layer for business logic, whose maintenance is in the hands of coders or people who are not SQL specialists. It will definitely not perform as well as SQL because it's not precompiled, and you can do lots of different things in stored procedures.
But you can definitely add a programming detail and segregate the business logic from core sql tables and database objects using Linq.

See LINQ-to-SQL vs stored procedures for help - I think that post has most of the info. you need.

Unless you are trying to get every millisecond out of your application, whether to use a stored procedure or LINQ may need to be determined by what you expect developers to know and maintainability.
Stored procedures will be fast, but when you are actively developing an application you may find that the ease of using LINQ is a positive, as you can change your query and the anonymous type created from LINQ very quickly.
Once you are done writing the application and you know what you need, and start to look at optimizing it, then you can look at other technologies and if you have good unit testing then you should be able to compare different techniques and determine which solution is best.
You may find this comparison of various ways for .NET 3.5 to interact with the database useful.
http://toomanylayers.blogspot.com/2009/01/entity-framework-and-linq-to-sql.html

I don't think I would like having my database layer in compiled code. It should be a separate layer, not combined. As I develop using Agile I am constantly changing the database design, and the process goes very fast. Adding columns, removing columns, or creating new tables in SQL Server is as easy as typing into Excel. Normalizing or de-normalizing a table is also pretty fast at the database level. Now with LINQ I would also have to change the object representation every time I make a change to the database, or live with it not truly reflecting how the data is stored. That is a lot of extra work.
I have heard that Linq entities can shelter your application from database change but that doesn't make sense. The database design and application design need to go hand in hand. If I normalize several tables or do some other redesign of the database I wouldn't want a Linq object model to no longer reflect the actual database design.
And what about advantage of tweaking a View or Stored Procedure. You can do that directly at the database level without having to re-compile code and release it to production. If I have a View which shows data from several tables and I decide to change the database design all I have to do is change that View. All my code remains the same.

Consider a database table with a million entries, joined to another table with a million entries... do you honestly think that doing this on the webserver (be it in LINQ or ad-hoc SQL) is going to be faster or more efficient than letting SQL Server do it on the database?
For simple queries, LINQ is obviously better, as it will be pre-compiled and gives you the advantage of type checking, etc. However, for any intensive database operations, report building, or bulk data analysis, stored procedures will win hands down.
