Clever tricks to find specific LINQ queries in SQL Profiler - sql-server

Profiling LINQ queries and their execution plans is especially important because of the gnarly SQL that can sometimes be generated.
I often find that I need to track a specific query and have a hard time finding it in the trace. I often do this on a database which has a lot of running transactions (sometimes a production server) - so just opening Profiler is no good.
I've also found trying to trace through the DataContext inadequate, since it doesn't give me SQL I can actually execute myself.
My best strategy so far is to add in a 'random' number to my query, and filter for it in the trace.
LINQ:
where o.CompletedOrderID != "59872547981"
Profiler filter:
'TextData' like '%59872547981%'
This works fine, with a couple of caveats:
I have to be careful to remember to remove the criteria, or pick something that won't affect the query plan too much. Yes, I know leaving it in is asking for trouble.
As far as I can tell though, even with this approach I need to start a new trace for every LINQ query I need to track. If I go to 'File > Properties' for an existing trace I cannot change the filter criteria.
You can't beat running a query in your app and seeing it pop up in Profiler without any extra effort. I was just hoping someone else had come up with a better way than this, or could at least suggest a less 'dangerous' token to search for than a condition on a column.

Messing with the where clause is maybe not the best thing to do since it can and will affect the execution plans for your queries.
Do something funky with projection into anonymous classes instead - use a unique static column name or something that will not affect the execution plan. (That way you can leave it intact in production code in case you later need to do any profiling of production code...)
from someobject in dc.SomeTable
where someobject.xyz == 123
select new { MyObject = someobject, QueryTraceID1234132412 = "boo" }

You can use the LINQ to SQL Debug Visualizer - http://weblogs.asp.net/scottgu/archive/2007/07/31/linq-to-sql-debug-visualizer.aspx - and see the SQL in your watch window.
Or you can use DataContext.GetCommand(query) to see the SQL before it executes.
You can also look at DataContext.GetChangeSet() to view what's going to be inserted, updated, or deleted.
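For example (a minimal sketch; dc and the Customers table are placeholders):
var q = from c in dc.Customers
        where c.City == "London"
        select c;

// GetCommand returns the DbCommand LINQ to SQL would run, before it executes
var cmd = dc.GetCommand(q);
Console.WriteLine(cmd.CommandText);
foreach (System.Data.Common.DbParameter p in cmd.Parameters)
    Console.WriteLine("{0} = {1}", p.ParameterName, p.Value);

// GetChangeSet shows what a SubmitChanges() would insert/update/delete
var changes = dc.GetChangeSet();
Console.WriteLine("{0} inserts, {1} updates, {2} deletes",
    changes.Inserts.Count, changes.Updates.Count, changes.Deletes.Count);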

EF Core has a feature, TagWith(), for exactly this purpose.
var nearestFriends =
(from f in context.Friends.TagWith("This is my spatial query!")
orderby f.Location.Distance(myLocation) descending
select f).Take(5).ToList();
https://learn.microsoft.com/en-us/ef/core/querying/tags
Unfortunately you can't use Query Store to find them :-)
This is because comments before the query are stripped out.
Such a shame! Hope I don't have to wait another 12 years.

You can have your DataContext log the raw SQL, which you can then search for in Profiler to examine performance.
// DebuggerWriter is a custom TextWriter (not part of the framework)
yourDataContext.Log = new DebuggerWriter();
All of your SQL queries will now be displayed in the debugger output window.
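If you don't have a DebuggerWriter class handy, here's a minimal sketch of one (any TextWriter works here; this is not a framework class):
using System.Text;

// Minimal TextWriter that forwards everything to the debugger output window
public class DebuggerWriter : System.IO.TextWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.Unicode; }
    }

    public override void Write(char value)
    {
        System.Diagnostics.Debugger.Log(0, null, value.ToString());
    }

    public override void Write(string value)
    {
        System.Diagnostics.Debugger.Log(0, null, value);
    }
}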

Related

Power BI - How to use native query AND query folding for long queries?

I have a Power BI report pulling from SQL Server that needs to be set up for incremental refresh due to the large data pull. As the load is fairly complex (and PQuery editor is tedious and often breaks folding), I need to use a SQL query (aka a "native query" in PBI speak) while retaining query folding (so that incremental refresh works).
I've been using the nice...
Value.NativeQuery( Source, query, null, [EnableFolding = true])
... trick found here to get that working.
BUT it only seems to work if the native query finishes fairly quickly. When my WHERE clause only pulls data for this year, it's no problem. When I remove the date filter in the WHERE clause (so as not to conflict with the incremental refresh filter), or simply push the year farther back, it takes longer, seemingly causing PBI to determine that:
"We cannot fold on top of this native query. Please modify the native query or remove the 'EnableFolding' option."
The error above comes up after a few minutes, both in the PQuery editor and if I try to "bypass" it by quickly clicking Close & Apply. And unfortunately, the underlying SQL is probably about as good as it gets due to our not-so-great data structures. I've tried tricking PBI's apparent time-out via an OPTION (FAST 1) in the script, but it just can't pull anything quickly enough.
I'm now stuck. This seems like a silly barrier as all I need to do is get that first import to complete as obviously it can query fold for the shorter loads. How do I work past this?
In retrospect, it's silly that I didn't try this initially: even though the Value.NativeQuery() M step doesn't allow a command time-out option, you can still set one manually in a preceding Sql.Database() step and it carries forward. I also removed some common table expressions from my query, which were also breaking query folding (not sure why, but it was an easy fix - I saved my complex logic as a view in SQL Server itself and just joined to that). Takes a while to run now, but doesn't time out!
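In M, that ends up looking something like this (a sketch; the server, database, and query text are placeholders, and the two-hour CommandTimeout is arbitrary):
let
    // The CommandTimeout set on the source step carries into Value.NativeQuery
    Source = Sql.Database("myserver", "mydb", [CommandTimeout = #duration(0, 2, 0, 0)]),
    Result = Value.NativeQuery(Source, "SELECT ... FROM dbo.MyView", null, [EnableFolding = true])
in
    Result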

SQLSTATE[IMSSP]: Tried to bind parameter number 2101. SQL Server supports a maximum of 2100 parameters

I am trying to run this query
$claims = Claim::whereIn('policy_id', $userPolicyIds)->where('claim_settlement_status', 'Accepted')->whereBetween('intimation_date', [$startDate, $endDate])->get();
Here, $userPolicyIds can have thousands of policy ids. Is there any way I can increase the maximum number of parameters in SQL server? If not, could anyone help me find a way to solve this issue?
The whereIn method creates an SQL fragment of the form WHERE policy_id IN (userPolicyIds[0], userPolicyIds[1], userPolicyIds[2]..., userPolicyIds[MAX]). In other words, the entire collection is unwrapped into the SQL statement, and the result is a HUGE statement that SQL Server refuses to execute.
This is a well-known limitation of Microsoft SQL Server, and it is a hard limit: there appears to be no option for changing it. But SQL Server can hardly be blamed for having it, because trying to execute a query with a couple of thousand parameters is an unhealthy situation that you should not have put yourself into in the first place.
So, even if there was a way to change the limit, it would still be advisable to leave the limit as it is, and restructure your code instead, so that this unhealthy situation does not arise.
You have at least a couple of options:
Break your query down to batches of, say, 2000 items each.
Add your fields into a temporary table and make your query join that table.
Personally, I would go with the second option, since it will perform much better than anything else, and it is arbitrarily scalable.
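For reference, the first option is only a few lines in Laravel (a sketch built on the query from the question):
$claims = collect($userPolicyIds)
    ->chunk(2000) // stay under SQL Server's 2100-parameter cap
    ->flatMap(function ($chunkIds) use ($startDate, $endDate) {
        return Claim::whereIn('policy_id', $chunkIds->all())
            ->where('claim_settlement_status', 'Accepted')
            ->whereBetween('intimation_date', [$startDate, $endDate])
            ->get();
    });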
I solved this problem by running this raw query
SELECT claims.*, policies.* FROM claims INNER JOIN policies ON claims.policy_id = policies.id
WHERE policy_id IN $userPolicyIds AND claim_settlement_status = 'Accepted' AND intimation_date BETWEEN '$startDate' AND '$endDate';
Here, $userPolicyIds is a string like this: ('123456','654321','456789'). This query is a bit slow, I'll admit that. But the number of policy ids is always going to be a very big number and I wanted a quick fix.
Just use the prepare driver options (PDO::prepare):
PDO::ATTR_EMULATE_PREPARES => true
https://learn.microsoft.com/en-us/sql/connect/php/pdo-prepare
and split the WHERE IN into pieces: WHERE (column IN (...) OR column IN (...)).
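A sketch of that (untested; per the linked docs, emulated prepares substitute the values client-side, so the server never sees 2100 parameters):
$stmt = $pdo->prepare($sql, [PDO::ATTR_EMULATE_PREPARES => true]);
$stmt->execute($params);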

How to Get Query Plan During Execution

Is it possible to capture the query text of the entire RPC SELECT statement from within the view which it calls? I am developing an enterprise database in SQL Server 2014 which must support a commercial application that submits a complex SELECT statement in which the outer WHERE clause is critical. This statement also includes subqueries against the same object (in my case, a view) which do NOT include that condition. This creates a huge problem because the calling application is joining those subqueries to the filtered result on another field and thus producing ID duplication that throws errors elsewhere.
The calling application assumes it is querying a table (not a view) and it can only be configured to use a simple WHERE clause. My task is to implement a more sophisticated security model behind this rather naive application. I can't re-write the offending query, but I had hoped to retrieve the missing information from the cached query plan. Here's a super-simplified pseudo-view of my proposed solution:
CREATE VIEW schema.important_data AS
WITH a AS (SELECT special_function() AS condition),
b AS (SELECT c.criteria
FROM a, lookup_table AS c
WHERE a.condition IS NULL OR c.criteria = a.condition)
SELECT d.field_1, d.field_2, d.filter_field
FROM b, underlying_table AS d
WHERE d.filter_field = b.criteria;
My "special function" reads the query plan for the RPC, extracts the WHERE condition and preemptively filters the view so it always returns only what it should. Unfortunately, query plans don't seem to be cached until after they are executed. If the RPC is made several times, my solution works perfectly -- but only on the second and subsequent calls.
I am currently using dm_exec_cached_plans, dm_exec_query_stats and dm_exec_sql_text to obtain the full text of the RPC by searching on spid, plan creation time and specific words. It would be so much better if I could somehow get the currently executing plan. I believe dm_exec_requests does that but it only returns the current statement -- which is just the definition of my function.
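For illustration, the search I'm running looks roughly like this (simplified; the LIKE filters are placeholders):
SELECT TOP (1) st.text, qs.creation_time
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE '%important_data%'
  AND st.text NOT LIKE '%dm_exec_query_stats%' -- exclude this probe itself
ORDER BY qs.creation_time DESC;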
Extended Events look promising but unfamiliar and there is a lot to digest. I haven't found any guidance, either, on whether they are appropriate for this particular challenge, whether a notification can be obtained in real time or how to ensure that the Event Session is always running. I am pursuing this investigation now and would appreciate any suggestions or advice. Is there anything else I can do?
This turns out to be an untenable idea, and I have devised a less elegant work-around by restructuring the view itself. There is a performance penalty, but the downstream error is avoided. My fundamental problem is the way the client application generates its SQL statements, and there is nothing I can do about that - so users will just have to accept whatever limitations may result.

Query executed from Nhibernate is slow, but from ADO.NET is fast

I have a query in my MVC application which takes about 20 seconds to complete (using NHibernate 3.1). When I execute the query manually in Management Studio it takes 0 seconds.
I've seen similar questions on SO about problems like this one, so I took my test one step further.
I intercepted the query using SQL Server Profiler, and executed the query using ADO.NET in my application.
The query that I got from Profiler is something like: "exec sp_executesql N'select...."
My ADO.NET code:
SqlConnection conn = (SqlConnection) NHibernateManager.Current.Connection;
var query = @"<query from profiler...>";
var cmd = new SqlCommand(query, conn);
SqlDataReader reader = cmd.ExecuteReader(CommandBehavior.CloseConnection);
return RedirectToAction("Index");
This query is also very fast, taking no time to execute.
Also, I've seen something very strange in Profiler. The query, when executed from NH, has the following statistics:
reads: 281702
writes: 0
The one from ADO.NET:
reads: 333
writes: 0
Anyone has any clue? Is there any info I may provide to help diagnose the problem?
I thought it might be related to some connection settings, but the ADO.NET version is using the same connection from NHibernate.
Thanks in advance
UPDATE:
I'm using NHibernate LINQ. The query is enormous, but it is a paging query, with just 10 records being fetched.
The parameters that are passed to the "exec sp_executesql" are:
@p0 int,@p1 datetime,@p2 datetime,@p3 bit,@p4 int,@p5 int
@p0=10,@p1='2009-12-01 00:00:00',@p2='2009-12-31 23:59:59',@p3=0,@p4=1,@p5=0
I had the ADO.NET and NHibernate versions using different query plans, and I was suffering the effects of parameter sniffing on the NH version. Why? Because I had previously made a query with a small date interval, and the stored query plan was optimized for it.
Afterwards, when querying with a large date interval, the stored plan was used and it took ages to get a result.
I confirmed that this was in fact the problem because a simple:
DBCC FREEPROCCACHE -- clears the query-plan cache
made my query fast again.
I found 2 ways to solve this:
Injecting an "option(recompile)" into the query, using an NH Interceptor
Adding a dummy predicate to my NH Linq expression, like query = query.Where(x => true), when the expected result-set was small (date-interval wise). This way two different query plans would be created: one for large sets of data and one for small sets.
I tried both options, and both worked, but opted for the second approach. It's a little bit hacky but works really well in my case, because the data is uniformly distributed date-wise.
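For reference, the interceptor for the first option can be as small as this (a sketch; you register it on the NHibernate Configuration, and in practice you'd want to filter which statements get the hint):
using NHibernate;
using NHibernate.SqlCommand;

// Appends option(recompile) to every SQL statement NHibernate prepares
public class RecompileInterceptor : EmptyInterceptor
{
    public override SqlString OnPrepareStatement(SqlString sql)
    {
        return sql.Append(" option(recompile)");
    }
}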
I had the exact same problem as the OP. I tried @psousa's suggestion of injecting an "option(recompile)", which did improve my performance. But in the end I found that simply updating statistics on SQL Server did the trick for me.
update statistics tablename;
I ended up backing out my code to inject the "option(recompile)". I realize this may not be the answer for everyone, but wanted to share since it was the cause of my problems.
Look at the parameters being supplied to the sp_executesql stored proc. If the parameters are supplied as nvarchar (N'value') and the columns they reference are varchar, SQL Server will use a very inefficient query plan. This has been the root cause of all the performance issues I've had that exhibit these symptoms (slow in app, fast in SSMS).
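A quick way to see the effect (table and column names are made up; the varchar column gets implicitly converted to match the nvarchar parameter, which typically turns an index seek into a scan):
-- CustomerCode is varchar, indexed
SELECT * FROM dbo.Orders WHERE CustomerCode = N'ABC123'; -- implicit conversion: scan
SELECT * FROM dbo.Orders WHERE CustomerCode = 'ABC123';  -- index seek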
You didn't specify your query or the size of its result set, but there's an issue with fetching a large number of entities with NHibernate.
Basically, the time to 'hydrate' the objects is what's taking that long.
You can try turning on the reflection optimizer, or using an IStatelessSession.
See some suggestions that I've got here.

SQL Server query taking up 100% CPU and runs for hours

I have a query that has been running every day for a little over 2 years now and has typically taken less than 30 seconds to complete. All of a sudden, yesterday, the query started taking 3+ hours to complete and was using 100% CPU the entire time.
The SQL is:
SELECT
@id,
alpha.A, alpha.B, alpha.C,
beta.X, beta.Y, beta.Z,
alpha.P, alpha.Q
FROM
[DifferentDatabase].dbo.fnGetStuff(@id) beta
INNER JOIN vwSomeData alpha ON beta.id = alpha.id
alpha.id is a BIGINT type and beta.id is an INT type. dbo.fnGetStuff() is a simple SELECT statement with 2 INNER JOINs on tables in the same DB, using a WHERE id = @id. The function returns approximately 11000 results.
The view vwSomeData is a simple SELECT statement with two INNER JOINs that returns about 590000 results.
Both the view and the function will complete in less than 10 seconds when executed by themselves. Selecting the results of the function into a temporary table first and then joining on that makes the query finish in < 10 seconds.
How do I troubleshoot what's going on? I don't see any locks in Activity Monitor.
Look at the query plan. My guess is that there is a table scan or more in the execution plan. This will cause huge amounts of I/O for the few records you get in the result.
You could use the SQL Server Profiler tool to monitor what queries are running on SQL Server. It doesn't show the locks, but it can for instance also give you hints on how to improve your query by suggesting indexes.
If you've got a reasonably recent version of SQL Server Management Studio, it has a Database Tuning Advisor as well, under Tools. It takes a trace from the Profiler and makes some, sometimes highly useful, suggestions. Make sure there aren't too many queries in the trace - it takes a long time to build its advice.
I'm not an expert on it, but have had some luck with it in the past.
Do you need to use a function? Can you re-write the entire thing as a stored procedure in which you pass in the @id as a parameter?
Even if your table has indexes, passing @id as a variable in the WHERE clause may keep them from being used, greatly increasing the time the query takes to run.
The reason the indexes may not be used is that the Query Optimizer does not know the value of the variable when it selects an access method to perform the query. Because this is one batch of Transact-SQL code, only one pass is made over it, preventing the Query Optimizer from knowing what it needs to know in order to select an access method that uses the indexes.
You might want to consider an INDEX query hint if you cannot re-write the SQL.
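For example (hypothetical table and index names; OPTION (RECOMPILE) instead makes the optimizer compile the plan with the actual variable value):
SELECT * FROM dbo.BigTable WITH (INDEX (IX_BigTable_Id))
WHERE id = @id;

-- or, instead of forcing an index:
SELECT * FROM dbo.BigTable
WHERE id = @id
OPTION (RECOMPILE);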
It might also be possible, since this just started happening, that the indexes have become fragmented and need to be rebuilt.
I've had similar problems with joining functions that return large datasets. I had to do what you've already suggested. Put the results in a temp table and join on that.
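Using the query from the question, that looks like:
SELECT * INTO #stuff
FROM [DifferentDatabase].dbo.fnGetStuff(@id);

SELECT @id,
       alpha.A, alpha.B, alpha.C,
       beta.X, beta.Y, beta.Z,
       alpha.P, alpha.Q
FROM #stuff beta
INNER JOIN vwSomeData alpha ON beta.id = alpha.id;

DROP TABLE #stuff;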
Look at the estimated plan, this will probably shed some light. Typically when query cost gets orders of magnitude more expensive it is because a loop or merge join is being used where a hash join is more appropriate. If you see a loop or merge join in the estimated plan, look at the number of rows it expects to process - is it far smaller than the number of rows you know will actually be in play? You can also specify a hint to use a hash join and see if it performs much better. If so, try updating statistics and see if it goes back to a hash join without a hint.
SELECT
@id,
alpha.A, alpha.B, alpha.C,
beta.X, beta.Y, beta.Z,
alpha.P, alpha.Q
FROM
[DifferentDatabase].dbo.fnGetStuff(@id) beta
INNER HASH JOIN vwSomeData alpha ON beta.id = alpha.id
Having no idea what type of schema is in place, and just trying to throw out ideas:
Like others have said... use Profiler and find the source of pain... but I'm thinking it is the function on the other database. Since that function might be the source of pain, have you thought about a little denormalization or anything on [DifferentDatabase]? I think you'll find a bit more scalability in joining to a flattened table with indexes than to a costly function.
Run this command:
SET SHOWPLAN_ALL ON
Then run your query. It will display the execution plan; look for a "SCAN" on an index or a table. That is most likely what is happening to your query now. If that is the case, try to figure out why it is not using indexes now (refresh statistics, etc.).
