The image below shows the CPU spikes over a 24-hour period for one of our Azure SQL databases.
[Image: compute utilization chart]
In Query Performance, the top 5 queries by CPU over the same 24-hour period are shown in the image below.
[Image: query performance chart]
However, the spikes in the overview above are more frequent than the executions of the top query by CPU. Where, besides Query Performance, can we find what is causing the spikes, since it seems to be something else altogether?
I know it is a bit late, but this might be useful for others. The following query shows the top 10 active CPU-consuming queries in Azure SQL:
SELECT TOP 10
    GETDATE() runtime,
    *
FROM
(
    SELECT query_stats.query_hash,
        SUM(query_stats.cpu_time) 'Total_Request_Cpu_Time_Ms',
        SUM(logical_reads) 'Total_Request_Logical_Reads',
        MIN(start_time) 'Earliest_Request_start_Time',
        COUNT(*) 'Number_Of_Requests',
        SUBSTRING(REPLACE(REPLACE(MIN(query_stats.statement_text), CHAR(10), ' '), CHAR(13), ' '), 1, 256) AS "Statement_Text"
    FROM
    (
        SELECT req.*,
            SUBSTRING(ST.text,
                (req.statement_start_offset / 2) + 1,
                ((CASE req.statement_end_offset
                      WHEN -1 THEN DATALENGTH(ST.text)
                      ELSE req.statement_end_offset
                  END - req.statement_start_offset) / 2) + 1
            ) AS statement_text
        FROM sys.dm_exec_requests AS req
        CROSS APPLY sys.dm_exec_sql_text(req.sql_handle) AS ST
    ) AS query_stats
    GROUP BY query_hash
) AS t
ORDER BY Total_Request_Cpu_Time_Ms DESC;
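Note that the query above only sees requests executing at the moment you run it, so intermittent spikes in a 24-hour window can slip between samples. Since Query Store is enabled by default on Azure SQL Database, a hedged alternative is to aggregate its runtime stats over the window instead. A minimal sketch (the 24-hour cutoff and TOP 10 are arbitrary choices; avg_cpu_time is reported in microseconds):
SELECT TOP 10
    q.query_id,
    MAX(CONVERT(nvarchar(256), qt.query_sql_text)) AS sample_text,
    SUM(rs.avg_cpu_time * rs.count_executions) AS total_cpu_time_us,
    SUM(rs.count_executions) AS executions
FROM sys.query_store_runtime_stats AS rs
JOIN sys.query_store_runtime_stats_interval AS i
    ON i.runtime_stats_interval_id = rs.runtime_stats_interval_id
JOIN sys.query_store_plan AS p ON p.plan_id = rs.plan_id
JOIN sys.query_store_query AS q ON q.query_id = p.query_id
JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
WHERE i.start_time >= DATEADD(HOUR, -24, SYSUTCDATETIME())
GROUP BY q.query_id
ORDER BY total_cpu_time_us DESC;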
I am investigating a problem with the execution speed of an inline table function in SQL Server. Or that's where I thought the problem lay. I came across
T-SQL code is extremely slow when saved as an Inline Table-valued Function
which looked promising, since it described what I was seeing, but I seemed to have the opposite problem - when I passed variables to my function, it took 17 seconds, but when I ran the code of my function in a query window, using DECLARE statements for the variables (which I thought effectively made them literals), it ran in milliseconds. Same code, same parameters - just wrapping them up in an inline table function seemed to drag it way down.
I tried to reduce my query to the minimum code that still exhibited the behaviour. I am using numerous existing inline table functions (all of which have worked fine for years), and managed to strip my code down to a single call of one existing inline table function that still highlights the speed difference. But in doing so I noticed something very odd:
SELECT strStudentNumber
FROM dbo.udfNominalSnapshot('2019', 'REG')
takes 17 seconds whereas
DECLARE @strAcademicSessionStart varchar(4) = '2019'
DECLARE @strProgressCode varchar(12) = 'REG'
SELECT strStudentNumber
FROM dbo.udfNominalSnapshot(@strAcademicSessionStart, @strProgressCode)
takes milliseconds! So it is nothing to do with wrapping the code in an inline table function, but everything to do with how the parameters are passed to a nested function within it. Based on the cited article, I'm guessing there are two different execution plans in play, but I have no idea why or how, and, more importantly, what I can do to persuade SQL Server to use the efficient one.
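For anyone trying to reproduce the comparison, both timings can be captured in a single batch using the standard SET STATISTICS switches (a plain diagnostic sketch; it changes no data):
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT strStudentNumber
FROM dbo.udfNominalSnapshot('2019', 'REG');   -- slow: literal values visible to the optimizer

DECLARE @strAcademicSessionStart varchar(4) = '2019';
DECLARE @strProgressCode varchar(12) = 'REG';

SELECT strStudentNumber
FROM dbo.udfNominalSnapshot(@strAcademicSessionStart, @strProgressCode); -- fast: values "unknown" at compile time

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;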
P.S. Here is the code of the UDF in question, in response to a comment request:
ALTER FUNCTION [dbo].[udfNominalSnapshot]
(
    @strAcademicSessionStart varchar(4) = '%',
    @strProgressCode varchar(10) = '%'
)
RETURNS TABLE
AS
RETURN
(
    SELECT TOP 100 PERCENT S.strStudentNumber, S.strSurname, S.strForenames, S.strTitle, S.strPreviousSurname, S.dtmDoB, S.strGender, S.strMaritalStatus,
        S.strResidencyCode, S.strNationalityCode, S.strHESAnumber, S.strSLCnumber, S.strPreviousSchoolName, S.strPreviousSchoolCode,
        S.strPreviousSchoolType,
        COLLEGE_EMAIL.strEmailAddress AS strEmailAlias,
        PERSONAL_EMAIL.strEmailAddress AS strPersonalEmail,
        P.[str(Sub)Plan], P.intYearOfCourse, P.strProgressCode,
        P.strAcademicSessionStart, strC2Knumber AS C2K_ID, AcadPlan, strC2KmailAlias,
        ISNULL([strC2KmailAlias], [strC2Knumber]) + '@c2kni.net' AS strC2KmailAddress
    FROM dbo.tblStudents AS S
    LEFT JOIN dbo.udfMostRecentEmail('COLLEGE') AS COLLEGE_EMAIL
        ON S.strStudentNumber = COLLEGE_EMAIL.strStudentNumber
    LEFT JOIN dbo.udfMostRecentEmail('PERSONAL') AS PERSONAL_EMAIL
        ON S.strStudentNumber = PERSONAL_EMAIL.strStudentNumber
    INNER JOIN dbo.udfProgressHistory(@strAcademicSessionStart) AS P
        ON S.strStudentNumber = P.strStudentNumber
    WHERE (P.strProgressCode LIKE @strProgressCode
           OR (SUBSTRING(@strProgressCode, 1, 1) = '^'
               AND P.strProgressCode NOT LIKE SUBSTRING(@strProgressCode, 2, LEN(@strProgressCode))))
      AND (P.strStudentNumber NOT IN
           (SELECT strStudentNumber
            FROM dbo.tblPilgrims
            WHERE (strAcademicSessionStart = @strAcademicSessionStart) AND (strScheme = 'BEI')))
    ORDER BY P.[str(Sub)Plan], P.intYearOfCourse, S.strSurname
)
Expanding on @Ross Presser's comment: this might not really be an answer, but it demonstrates (a bit of) what is happening, with my understanding (which could be wrong!) of why.
Run the setup code at the end, and then execute the following with "Include Actual Execution Plan" on (Ctrl-M). (Note: depending on the random number generator you may or may not get any results; that does not affect the plan.)
declare @one varchar(100) = '379', @two varchar(200) = '726'
select * from wibble(@one, @two) -- 1
select * from wibble('379', '726') -- 2
select * from wibble(@one, @two) OPTION (RECOMPILE) -- 3
select * from wibble(@one, @two) -- 4
Caveat. The following is what happens on MY system, your mileage may vary...
-- 1 (and -- 4) are the most expensive.
SQL Server creates a generic plan, as it does not know what the parameter values are (yes, they are declared, but the plan is compiled for wibble(@one, @two) where, at that point, the parameter values are "unknown").
https://www.brentozar.com/pastetheplan/?id=rJtIRwx_r
-- 2 has a different plan.
Here, SQL Server knows what the parameter values are, so it can create a specific plan, which is quite different from -- 1.
https://www.brentozar.com/pastetheplan/?id=rJa9APldS
-- 3 has the same plan as -- 2.
Testing this further, adding OPTION (RECOMPILE) gets SQL Server to create a specific plan for that specific execution of wibble(@one, @two), so we get the same plan as -- 2.
-- 4 is there for completeness, to show that after all that mucking about the generic plan is still in place.
So, in this simple example, we have a parameterised TVF being called with identical values, passed either as variables or inline, producing different execution plans and different execution times, as per the OP.
Set up
use tempdb
GO
drop table if exists Orders
GO
create table Orders (
    OrderID int primary key,
    UserName varchar(50),
    PhoneNumber1 varchar(50)
)
-- generate 300,000 rows with random "phone" numbers
;WITH TallyTable AS (
    SELECT TOP 300000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS [N]
    FROM dbo.syscolumns tb1, dbo.syscolumns tb2
)
insert into Orders
select n, 'user' + cast(n as varchar(10)), cast(CRYPT_GEN_RANDOM(3) as int)
FROM TallyTable;
GO
drop function if exists wibble
GO
create or alter function wibble (
    @one varchar(4) = '%'
    , @two varchar(4) = '%'
)
returns table
as
return select * from Orders
    where PhoneNumber1 like '%' + @one + '%'
    and PhoneNumber1 like '%' + @two + '%'
    or (SUBSTRING(@one, 1, 1) = '^' AND PhoneNumber1 NOT LIKE SUBSTRING(@two, 2, LEN(@two)))
    and (select 1) = 1
GO
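To verify the generic-vs-specific plan claim while stepping through the four calls above, a small probe of the plan cache can help (a sketch; it simply pattern-matches cached text containing 'wibble', so it is approximate):
SELECT cp.usecounts, cp.objtype, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%wibble%'
  AND st.text NOT LIKE '%dm_exec_cached_plans%'; -- exclude this probe itself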
The problem was overcome (I wouldn't say "fixed") by following up on Ross Presser's observation about the complexity of udfProgressHistory. That function pulls data from a table, tblProgressHistory, which is joined to itself. The table is added to annually; I think this year's additional 2K records caused the sudden cost hike under a particular execution plan. I deleted over 2K redundant records and we're back to sub-second execution.
In MS SQL Server: how can I check whether a certain variable has been declared? Please don't tell me to Ctrl-F. I don't have access to the entire query; I'm composing the columns, which depend on whether a certain variable is declared.
Thanks.
Edit:
I'm working with a somewhat strange query engine. I need to write the SELECT-columns part, and the engine takes care of the rest (hopefully). In some cases this engine declares variables (thankfully, I will know the variable names), and in other cases it doesn't. I need to compose my columns so that they use these variables when they are declared, and fall back to default values when they are not.
Given the limits of my understanding of what you're running (I'm reading the word problem to mean that your "query engine" is actually a query generation engine, something like an ORM), you could observe what's occurring on the server in this scenario with the following query:
select
    sql_handle,
    st.text
from sys.dm_exec_requests r
cross apply sys.dm_exec_sql_text(r.sql_handle) st
where session_id <> @@SPID
and st.text like '%@<<parameter_name>>%';
The statement needs to have begun executing for this to catch it. Depending on circumstances, you may also be able to pull it from the query stats. The following will also get you the query plan (if there is one); note that it will pull stats for itself as well as for the query above, so be discerning when you look at the outer and statement text values:
select
    st.text,
    SUBSTRING(
        st.text,
        (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1)
        AS statement_text,
    plan_generation_num, creation_time, last_execution_time, execution_count,
    total_worker_time, last_worker_time, min_worker_time, max_worker_time,
    total_physical_reads, min_physical_reads, max_physical_reads, last_physical_reads,
    total_logical_writes, min_logical_writes, max_logical_writes, last_logical_writes,
    total_logical_reads, min_logical_reads, max_logical_reads, last_logical_reads,
    total_elapsed_time, last_elapsed_time, min_elapsed_time, max_elapsed_time,
    total_rows, last_rows, min_rows, max_rows,
    qp.*
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) st
outer apply sys.dm_exec_query_plan(qs.plan_handle) qp
where st.text like '%@<<parameter_name>>%';
I'm trying to write a query that will tell me how much time a restore (full or log) took on SQL Server 2008.
I can run this query to find out how much time the backup took:
select database_name,
    [uncompressed_size] = backup_size/1024/1024,
    [compressed_size] = compressed_backup_size/1024/1024,
    backup_start_date,
    backup_finish_date,
    datediff(s, backup_start_date, backup_finish_date) as [TimeTaken(s)]
from msdb..backupset b
where type = 'L' -- for log backups
order by b.backup_start_date desc
This query will tell me what was restored, but not how much time it took:
select * from msdb..restorehistory
restorehistory has a column backup_set_id which links to msdb..backupset, but that holds the start and end dates for the backup, not the restore.
Any idea where to query the start and end time for restores?
To find the RESTORE DATABASE time, I have found that you can use this query:
declare @filepath nvarchar(1000)
SELECT @filepath = cast(value as nvarchar(1000)) FROM [fn_trace_getinfo](NULL)
WHERE [property] = 2 and traceid = 1

SELECT *
FROM [fn_trace_gettable](@filepath, DEFAULT)
WHERE TextData LIKE 'RESTORE DATABASE%'
ORDER BY StartTime DESC;
The downside is that, at least on my test server, the EndTime is always NULL.
So I came up with a second query to try to determine the end time. First of all, I apologize that this is pretty ugly and nested like crazy.
The query below assumes the following:
When a restore is run, for that DatabaseID and ClientProcessID, the next EventSequence contains the TransactionID we need.
I then find the max EventSequence for that transaction.
Finally, I select the record that contains RESTORE DATABASE and the maximum transaction associated with that record.
I'm sure someone can probably take what I've done and refine it, but this appears to work on my test environment:
declare @filepath nvarchar(1000)
SELECT @filepath = cast(value as nvarchar(1000)) FROM [fn_trace_getinfo](NULL)
WHERE [property] = 2 and traceid = 1

SELECT *
FROM [fn_trace_gettable](@filepath, DEFAULT) F5
INNER JOIN
(
    SELECT F4.EventSequence MainSequence,
        MAX(F3.EventSequence) MaxEventSequence, F3.TransactionID
    FROM [fn_trace_gettable](@filepath, DEFAULT) F3
    INNER JOIN
    (
        SELECT F2.EventSequence, MIN(TransactionID) as TransactionID
        FROM [fn_trace_gettable](@filepath, DEFAULT) F1
        INNER JOIN
        (
            SELECT DatabaseID, SPID, StartTime, ClientProcessID, EventSequence
            FROM [fn_trace_gettable](@filepath, DEFAULT)
            WHERE TextData LIKE 'RESTORE DATABASE%'
        ) F2 ON F1.DatabaseID = F2.DatabaseID AND F1.SPID = F2.SPID
            AND F1.ClientProcessID = F2.ClientProcessID
            AND F1.StartTime > F2.StartTime
        GROUP BY F2.EventSequence
    ) F4 ON F3.TransactionID = F4.TransactionID
    GROUP BY F3.TransactionID, F4.EventSequence
) F6 ON F5.EventSequence = F6.MainSequence
    OR F5.EventSequence = F6.MaxEventSequence
ORDER BY F5.StartTime
EDIT
I made some changes to the query, since one of the test databases I used is case-sensitive and the query was losing some records. I also noticed that when restoring from disk the DatabaseID is null, so I'm handling that now as well:
SELECT *
FROM [fn_trace_gettable](@filepath, DEFAULT) F5
INNER JOIN
(
    SELECT F4.EventSequence MainSequence,
        MAX(F3.EventSequence) MaxEventSequence, F3.TransactionID
    FROM [fn_trace_gettable](@filepath, DEFAULT) F3
    INNER JOIN
    (
        SELECT F2.EventSequence, MIN(TransactionID) as TransactionID
        FROM [fn_trace_gettable](@filepath, DEFAULT) F1
        INNER JOIN
        (
            SELECT DatabaseID, SPID, StartTime, ClientProcessID, EventSequence
            FROM [fn_trace_gettable](@filepath, DEFAULT)
            WHERE upper(convert(nvarchar(max), TextData)) LIKE 'RESTORE DATABASE%'
        ) F2 ON (F1.DatabaseID = F2.DatabaseID OR F2.DatabaseID IS NULL)
            AND F1.SPID = F2.SPID
            AND F1.ClientProcessID = F2.ClientProcessID
            AND F1.StartTime > F2.StartTime
        GROUP BY F2.EventSequence
    ) F4 ON F3.TransactionID = F4.TransactionID
    GROUP BY F3.TransactionID, F4.EventSequence
) F6 ON F5.EventSequence = F6.MainSequence
    OR F5.EventSequence = F6.MaxEventSequence
ORDER BY F5.StartTime
Make it a Job, then run it as the Job. Then check View Job History and look at the Duration column.
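If you'd rather query than click, the same duration is available from msdb (a sketch; 'RestoreJob' is a hypothetical job name, and run_duration is encoded as HHMMSS, so 13542 means 1:35:42):
SELECT j.name, h.run_date, h.run_time, h.run_duration
FROM msdb.dbo.sysjobhistory AS h
INNER JOIN msdb.dbo.sysjobs AS j ON j.job_id = h.job_id
WHERE j.name = N'RestoreJob' -- hypothetical job name
  AND h.step_id = 0          -- step 0 is the job outcome row
ORDER BY h.run_date DESC, h.run_time DESC;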
While it is running, you can check something like this DMV:
select
    d.name,
    percent_complete,
    dateadd(second, estimated_completion_time / 1000, getdate()) as estimated_completion,
    getdate() as now,
    datediff(minute, start_time, getdate()) as running,
    estimated_completion_time / 1000 / 60 as togo,
    start_time,
    command
from sys.dm_exec_requests req
inner join sys.sysdatabases d on d.dbid = req.database_id
where req.command like '%RESTORE%'
Or you can use some magic voodoo and interpret the transaction log via the following table function; however, the only person I know of who understands the info in this log is Paul Randal.
I know he sometimes checks Server Fault, but I don't know whether he wanders onto Stack Overflow.
select * from fn_dblog(NULL,NULL)
Hope this helps.
If you manage to use this and find a solution, please tell us.
Good Luck!
I am using SQL Server 2008 and we are using the DMVs to find missing indexes. However, before I create a new index, I am trying to figure out what proc/query wants that index. I want as much information as I can get so that I can make an informed decision about my indexes. Sometimes the indexes SQL Server wants don't make sense to me. Does anyone know how I can figure out what wants them?
You could try something like this query, which lists the QueryText:
;WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
, CachedPlans AS
(SELECT
    RelOp.op.value(N'../../@NodeId', N'int') AS ParentOperationID
    ,RelOp.op.value(N'@NodeId', N'int') AS OperationID
    ,RelOp.op.value(N'@PhysicalOp', N'varchar(50)') AS PhysicalOperator
    ,RelOp.op.value(N'@LogicalOp', N'varchar(50)') AS LogicalOperator
    ,RelOp.op.value(N'@EstimatedTotalSubtreeCost', N'float') AS EstimatedCost
    ,RelOp.op.value(N'@EstimateIO', N'float') AS EstimatedIO
    ,RelOp.op.value(N'@EstimateCPU', N'float') AS EstimatedCPU
    ,RelOp.op.value(N'@EstimateRows', N'float') AS EstimatedRows
    ,cp.plan_handle AS PlanHandle
    ,qp.query_plan AS QueryPlan
    ,st.text AS QueryText
    ,cp.cacheobjtype AS CacheObjectType
    ,cp.objtype AS ObjectType
    ,cp.usecounts AS UseCounts
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY qp.query_plan.nodes(N'//RelOp') RelOp (op)
)
SELECT
PlanHandle
,ParentOperationID
,OperationID
,PhysicalOperator
,LogicalOperator
,UseCounts
,CacheObjectType
,ObjectType
,EstimatedCost
,EstimatedIO
,EstimatedCPU
,EstimatedRows
,QueryText
FROM CachedPlans
WHERE CacheObjectType = N'Compiled Plan'
AND PhysicalOperator IN ('nothing will ever match this one!'
--,'Assert'
--,'Bitmap'
--,'Clustered Index Delete'
--,'Clustered Index Insert'
,'Clustered Index Scan'
--,'Clustered Index Seek'
--,'Clustered Index Update'
--,'Compute Scalar'
--,'Concatenation'
--,'Constant Scan'
,'Deleted Scan'
--,'Filter'
--,'Hash Match'
,'Index Scan'
--,'Index Seek'
--,'Index Spool'
,'Inserted Scan'
--,'Merge Join'
--,'Nested Loops'
--,'Parallelism'
,'Parameter Table Scan'
--,'RID Lookup'
--,'Segment'
--,'Sequence Project'
--,'Sort'
--,'Stream Aggregate'
--,'Table Delete'
--,'Table Insert'
,'Table Scan'
--,'Table Spool'
--,'Table Update'
--,'Table-valued function'
--,'Top'
)
Just add an ORDER BY on something like the combination of UseCounts and EstimatedCost.
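For instance, appended to the query above (an arbitrary weighting; adjust to taste):
ORDER BY UseCounts * EstimatedCost DESC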
Here is what finally worked:
with xmlnamespaces(default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
, CachedPlans as (
    select
        query_plan,
        n.value('../../../@StatementText', 'varchar(1000)') as [Statement],
        n.value('../../../@StatementSubTreeCost', 'varchar(1000)') as [Cost],
        n.value('../../../@StatementEstRows', 'varchar(1000)') as [Rows],
        n.value('@Impact', 'float') as Impact,
        n.value('MissingIndex[1]/@Database', 'varchar(128)') as [Database],
        n.value('MissingIndex[1]/@Table', 'varchar(128)') as [TableName],
        (
            select dbo.concat(c.value('@Name', 'varchar(128)'))
            from n.nodes('MissingIndex/ColumnGroup[@Usage="EQUALITY"][1]') as t(cg)
            cross apply cg.nodes('Column') as r(c)
        ) as equality_columns,
        (
            select dbo.concat(c.value('@Name', 'varchar(128)'))
            from n.nodes('MissingIndex/ColumnGroup[@Usage="INEQUALITY"][1]') as t(cg)
            cross apply cg.nodes('Column') as r(c)
        ) as inequality_columns,
        (
            select dbo.concat(c.value('@Name', 'varchar(128)'))
            from n.nodes('MissingIndex/ColumnGroup[@Usage="INCLUDE"][1]') as t(cg)
            cross apply cg.nodes('Column') as r(c)
        ) as include_columns
    from (
        select query_plan
        from sys.dm_exec_cached_plans p
        outer apply sys.dm_exec_query_plan(p.plan_handle) tp
    ) as tab(query_plan)
    cross apply query_plan.nodes('//MissingIndexGroup') as q(n)
)
select *
from CachedPlans
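Note that dbo.concat here is a user-defined string-concatenation aggregate that isn't shown in the post (most likely a CLR aggregate). If you don't have one, a stand-in using the classic FOR XML PATH trick should produce an equivalent comma-separated list; for example, the equality_columns subquery could be rewritten as follows (a hypothetical substitution, untested against the original schema):
(
    select stuff((
        select ',' + c.value('@Name', 'varchar(128)')
        from n.nodes('MissingIndex/ColumnGroup[@Usage="EQUALITY"][1]') as t(cg)
        cross apply cg.nodes('Column') as r(c)
        for xml path('')
    ), 1, 1, '')
) as equality_columns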
You could run a Profiler trace and check out the procedures that are running and their effectiveness in terms of index seeks / usage.
Rather than just creating all the indexes for everyone, it is better to optimize the biggest problem first; that usually gives the most benefit.
In the Profiler trace, figure out which stored proc / T-SQL statement runs the most times and consumes the most resources. Those are the ones you really want to go after.
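If running a trace isn't practical, a rough alternative is to rank what's already in the plan cache (a sketch; the cache is not a complete history, since plans get evicted, but it often surfaces the heavy hitters):
SELECT TOP 20
    qs.execution_count,
    qs.total_worker_time,
    qs.total_logical_reads,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
              ((CASE qs.statement_end_offset
                    WHEN -1 THEN DATALENGTH(st.text)
                    ELSE qs.statement_end_offset
                END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_worker_time DESC;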