Is it possible to run Firebird 2.5 query in parallel mode? - query-optimization

Colleagues, I have a query that looks like this:
FOR select first 100 contracts.doc from TABLE_1
INTO :contr
DO
BEGIN
insert into tmp_port (
DT,
....)
select
:IN$DT as DT,
...
from p_procedure (0, :contr , :IN$DT);
END
The problem is that the proprietary procedure p_procedure runs very slowly, and I can't optimize p_procedure.
In Oracle, as I remember, I can increase the speed of execution using parallel.
Is there something similar (like the hint 'parallel') in Firebird 2.5?
Are there any other approaches to increase the execution speed?
Thank you for any advice.

Related

EF Count(*) on a TableFunction is extremely slower than launched by management studio

I'm running a trivial count on the result of a table function, and it takes over 10 seconds.
Through SQL Profiler I intercepted the query that is sent to the DB, and if I launch it through Management Studio it takes 10 milliseconds.
IUnitOfWorkAsync<ScooterAccountingPreviewContext> _uow = new UnitOfWork<ScooterAccountingPreviewContext>();
IOrderedQueryable<GetProviders_Result> Query = (IOrderedQueryable<GetProviders_Result>)_uow.DbContext.GetProviders(AspNetUserID);
Int32 DataObjectCount = Query.Count();
This is the extrapolated query
exec sp_executesql N'SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[GetProviders](@AspNetUserID) AS [Extent1]
) AS [GroupBy1]',N'@AspNetUserID int',@AspNetUserID=0
Let's leave aside the fact that EF6 always adds useless code, and that the GroupBy could easily have been avoided; why are the results so different?
I would expect the queries, once sent to the SQL Server engine, to be executed in the same way with comparable timings, but instead they have extremely different timings, CPU loads and reads.

Why is GETDATE slowing down select query if I use a variable?

I'm doing a select on a table with about 6 millions records selecting GETDATE()
select getdate() as date, [...] from MyTable
I verified that the performance issue is on GETDATE(), removing all other fields the query is still slow.
I thought that putting the value of GETDATE() in a separate var would speed the query up
declare @now datetime
set @now = GETDATE()
select @now as date, [...] from MyTable
It is slow as well. Why?
I'd never really noticed this before. But I am seeing the same thing.
Ran the following on a 10 million row table...
-- query #1
DECLARE @now AS DATETIME ;
SET @now = GETDATE() ;
SELECT @now AS [date], * FROM [MyTable] ;
-- cpu time = 2,563 ms
-- duration = 27,511 ms
-- query #2
SELECT GETDATE() AS [date], * FROM [MyTable] ;
-- cpu time = 2,421 ms
-- duration = 26,862 ms
-- query #3
SELECT * FROM [MyTable] ;
-- cpu time = 1,969 ms
-- duration = 23,149 ms
And the cpu times and durations are showing a difference.
All three query plans are more or less the same, with negligible difference between estimated costs for the queries.
The only differences I could see between the plans were the wait stats...
Query #1
WaitType = ASYNC_NETWORK_IO
WaitCount = 77,716
WaitTimeMs = 24,234
Query #2
WaitType = ASYNC_NETWORK_IO
WaitCount = 75,261
WaitTimeMs = 23,662
Query #3
WaitType = ASYNC_NETWORK_IO
WaitCount = 55,434
WaitTimeMs = 20,280
That's an extra 3-4 seconds, between including and not including the GETDATE() column in the result set, just waiting for whatever's running the query to acknowledge it has consumed the data and is ready for more.
In my case, I was using SSMS to execute the queries. So, I can only put it down to SSMS dragging its heels to render that extra column, which amounted to about 75 MB (10M x 8 bytes).
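As a rough sanity check on that "about 75 MB" figure, here is a minimal Python sketch of the arithmetic. It assumes 8 bytes per DATETIME value, as stated above, and ignores any per-row protocol overhead; the function name `extra_column_mib` is made up for illustration.

```python
def extra_column_mib(row_count: int, bytes_per_value: int = 8) -> float:
    """Return the raw payload added by one fixed-width column, in MiB."""
    return row_count * bytes_per_value / (1024 * 1024)


# 10 million rows, one extra 8-byte DATETIME column per row.
size = extra_column_mib(10_000_000)
print(f"{size:.1f} MiB")  # prints "76.3 MiB"
```

So the extra column is on the order of 76 MiB of raw data that the client has to receive and render.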
Having said that, the bulk of the time is obviously taken up with scanning all 10 million rows.
Unfortunately, I think the extra execution time to include your GETDATE() column is unavoidable.
Two points.
ASYNC_NETWORK_IO is SQL Server saying that it is waiting for the client to consume the data already sent, or for network bandwidth to become available, before it can send more data down the pipe.
SSMS stores the output of the Results window in a temp file on your C:\ drive so will be affected by disk I/O, AV scanning, other processes, etc. running on your machine. Same concept if you use a Linux OS.
I'd experiment with limiting the size of the data being returned (10M records can hardly be analysed by a human), and using a different tool to pull the records (if you really need 10M records) for starters.
Also, review the Execution Plan to find out where exactly the delay is. If it still points to the ASYNC_NETWORK_IO wait, then your problem could be one or more of the network components between yourself and the server. Try using a wired connection instead of WiFi. Do you have a VPN? Is there anything limiting data transfer rates? Or the reason might simply be that too much data is being pulled.

Average insert time for a table in SQL Server historically

Is there a way to retrieve the info on what was the average (or better a distribution) insert time into a given table in SQL Server up to the current point in time?
e.g. inserting into 'employees' took on average 1 millisecond per record.
I'm talking about historical data here e.g. over the last year, not what I can get for specific queries when profiling.
You should also check plan cache. From there you can calculate the average duration per statement, and assuming you're not inserting into the table using a lot of different statements (and your queries are parametrized) you should get quite good results.
Here's one example how to query the DMVs:
select top 100
SUBSTRING(t.text, (s.statement_start_offset/2)+1,
((CASE s.statement_end_offset
WHEN -1 THEN DATALENGTH(t.text)
ELSE s.statement_end_offset
END - s.statement_start_offset)/2) + 1) as statement_text,
t.text,
s.total_logical_reads, s.total_logical_reads / s.execution_count as avg_logical_reads,
s.total_worker_time, s.total_worker_time / s.execution_count as avg_worker_time,
s.execution_count,
creation_time,
last_execution_time
--,cast(p.query_plan as xml) as query_plan
from sys.dm_exec_query_stats s
cross apply sys.dm_exec_sql_text (sql_handle) t
--cross apply sys.dm_exec_text_query_plan (plan_handle, statement_start_offset, statement_end_offset) p
order by s.execution_count desc
The part commented out is for query plans.
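To make the "average duration per statement" idea concrete, here is a minimal Python sketch of the calculation on one row of plan cache statistics. It assumes you also select total_elapsed_time (reported in microseconds) from sys.dm_exec_query_stats; the `StatementStats` class, the statement text and the sample numbers are made up for illustration. Note that this is an average per execution, not per inserted record.

```python
from dataclasses import dataclass


@dataclass
class StatementStats:
    statement_text: str
    total_elapsed_time_us: int  # cumulative elapsed time, in microseconds
    execution_count: int


def avg_elapsed_ms(stats: StatementStats) -> float:
    """Average elapsed time per execution, in milliseconds."""
    if stats.execution_count == 0:
        return 0.0
    return stats.total_elapsed_time_us / stats.execution_count / 1000.0


# Hypothetical row: 5,000 executions taking 5 seconds in total.
sample = StatementStats("INSERT INTO employees (name) VALUES (@name)", 5_000_000, 5_000)
print(avg_elapsed_ms(sample))  # prints 1.0
```

The figures only cover the time since each plan entered the cache, so they are not a true long-term history.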
SQL Profiler is not accurate; it has also been marked as deprecated since SQL Server 2012.
The best tools for capturing performance-related data are Extended Events or perfmon. I don't think perfmon will give you object-level performance, but it will tell you if you have bottlenecks at the IO level. You need to enable these tools/features for data collection, so if that hasn't been done already, getting historical data is probably not possible.

Why is a T-SQL variable comparison slower than GETDATE() function-based comparison?

I have a T-SQL statement that I am running against a table with many rows. I am seeing some strange behavior. Comparing a DateTime column against a precalculated value is slower than comparing each row against a calculation based on the GETDATE() function.
The following SQL takes 8 secs:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
GO
DECLARE @TimeZoneOffset int = -(DATEPART("HH", GETUTCDATE() - GETDATE()))
DECLARE @LowerTime DATETIME = DATEADD("HH", ABS(@TimeZoneOffset), CONVERT(VARCHAR, GETDATE(), 101) + ' 17:00:00')
SELECT TOP 200 Id, EventDate, Message
FROM Events WITH (NOLOCK)
WHERE EventDate > @LowerTime
GO
This alternate strangely returns instantly:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
GO
SELECT TOP 200 Id, EventDate, Message
FROM Events WITH (NOLOCK)
WHERE EventDate > GETDATE()-1
GO
Why is the second query so much faster?
EDITED: I updated the SQL to accurately reflect other settings I am using
After doing a lot of reading and researching, I've discovered the issue here is parameter sniffing. SQL Server attempts to determine how best to use indexes based on the WHERE clause, but in this case it isn't doing a very good job.
See the examples below :
Slow version:
declare @dNow DateTime
Select @dNow=GetDate()
Select *
From response_master_Incident rmi
Where rmi.response_date between DateAdd(hh,-2,@dNow) AND @dNow
Fast version:
Select *
From response_master_Incident rmi
Where rmi.response_date between DateAdd(hh,-2,GetDate()) AND GetDate()
The "Fast" version runs around 10x faster than the slow version. The Response_Date field is indexed and is a DateTime type.
The solution is to tell SQL Server how best to optimise the query. Modifying the example as follows to include the OPTIMIZE FOR option resulted in it using the same execution plan as the "Fast" version. The OPTIMIZE option here explicitly tells SQL Server to treat the local @dNow variable as a date (as if declaring it as DateTime wasn't enough :s )
Care should be taken when doing this however because in more complicated WHERE clauses you could end up making the query perform worse than Sql Server's own optimisations.
declare @dNow DateTime
SET @dNow=GetDate()
Select ID, response_date, call_back_phone
from response_master_Incident rmi
where rmi.response_date between DateAdd(hh,-2,@dNow) AND @dNow
-- The optimizer does not know much about the variable, so it assumes it should perform a clustered index scan (on the clustered index ID) - this is slow
-- This hint tells the optimizer that the variable is indeed a datetime in this format (why it does not know that already, who knows)
OPTION(OPTIMIZE FOR (@dNow = '99991231'));
The execution plans must be different, because SQL Server does not evaluate the value of the variable when it creates the execution plan. So it uses average statistics across all the different dates that can be stored in the table.
On the other hand, the function getdate is evaluated at execution time, so the execution plan is created using statistics for that specific date, which of course are more realistic than the previous ones.
If you create a stored procedure with @LowerTime as a parameter, you will get better results.

CPU utilization by database?

Is it possible to get a breakdown of CPU utilization by database?
I'm ideally looking for a Task Manager type interface for SQL server, but instead of looking at the CPU utilization of each PID (like taskmgr) or each SPID (like spwho2k5), I want to view the total CPU utilization of each database. Assume a single SQL instance.
I realize that tools could be written to collect this data and report on it, but I'm wondering if there is any tool that lets me see a live view of which databases are contributing most to the sqlservr.exe CPU load.
Sort of. Check this query out:
SELECT total_worker_time/execution_count AS AvgCPU
, total_worker_time AS TotalCPU
, total_elapsed_time/execution_count AS AvgDuration
, total_elapsed_time AS TotalDuration
, (total_logical_reads+total_physical_reads)/execution_count AS AvgReads
, (total_logical_reads+total_physical_reads) AS TotalReads
, execution_count
, SUBSTRING(st.TEXT, (qs.statement_start_offset/2)+1
, ((CASE qs.statement_end_offset WHEN -1 THEN datalength(st.TEXT)
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2) + 1) AS txt
, query_plan
FROM sys.dm_exec_query_stats AS qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) AS st
cross apply sys.dm_exec_query_plan (qs.plan_handle) AS qp
ORDER BY 1 DESC
This will get you the queries in the plan cache in order of how much CPU they've used up. You can run this periodically, like in a SQL Agent job, and insert the results into a table to make sure the data persists beyond reboots.
When you read the results, you'll probably realize why we can't correlate that data directly back to an individual database. First, a single query can also hide its true database parent by doing tricks like this:
USE msdb
DECLARE @StringToExecute VARCHAR(1000)
SET @StringToExecute = 'SELECT * FROM AdventureWorks.dbo.ErrorLog'
EXEC (@StringToExecute)
The query would be executed in MSDB, but it would pull results from AdventureWorks. Where should we assign the CPU consumption?
It gets worse when you:
- Join between multiple databases
- Run a transaction in multiple databases, and the locking effort spans multiple databases
- Run SQL Agent jobs in MSDB that "work" in MSDB, but back up individual databases
It goes on and on. That's why it makes sense to performance tune at the query level instead of the database level.
In SQL Server 2008R2, Microsoft introduced performance management and app management features that will let us package a single database in a distributable and deployable DAC pack, and they're promising features to make it easier to manage performance of individual databases and their applications. It still doesn't do what you're looking for, though.
For more of those, check out the T-SQL repository at Toad World's SQL Server wiki (formerly at SQLServerPedia).
Updated on 1/29 to include total numbers instead of just averages.
SQL Server (starting with 2000) will install performance counters (viewable from Performance Monitor or Perfmon).
One of the counter categories (from a SQL Server 2005 install) is:
- SQLServer:Databases
With one instance for each database. The counters available, however, do not include a CPU % utilization counter or anything similar, although there are some rate counters that you could use to get a rough estimate of CPU. For example, if you have 2 databases and the measured rate is 20 transactions/sec on database A and 80 transactions/sec on database B, then you would estimate that A contributes roughly 20% of the total CPU and B the other 80%.
There are some flaws here, as that's assuming all the work being done is CPU bound, which of course with databases it's not. But that would be a start I believe.
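The estimate described above is a simple proportion, and a minimal Python sketch may make it clearer. The function name `cpu_share_estimate` and the input dictionary are made up for illustration; the rates would come from the per-database transaction rate counters, and the result is only as good as the assumption that every transaction costs about the same amount of CPU.

```python
from typing import Dict


def cpu_share_estimate(tx_per_sec: Dict[str, float]) -> Dict[str, float]:
    """Split 100% of CPU across databases in proportion to transaction rate."""
    total = sum(tx_per_sec.values())
    if total == 0:
        # No activity measured: attribute nothing to anyone.
        return {name: 0.0 for name in tx_per_sec}
    return {name: rate * 100.0 / total for name, rate in tx_per_sec.items()}


# The example from the answer: 20 trans/sec on A, 80 trans/sec on B.
shares = cpu_share_estimate({"A": 20.0, "B": 80.0})
print(shares)
```

With these inputs, A gets roughly 20% and B roughly 80%, matching the figures in the answer.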
Here's a query that will show the actual database causing high load. It relies on the query cache which might get flushed frequently in low-memory scenarios (making the query less useful).
select dbs.name, cacheobjtype, total_cpu_time, total_execution_count from
(select top 10
sum(qs.total_worker_time) as total_cpu_time,
sum(qs.execution_count) as total_execution_count,
count(*) as number_of_statements,
qs.plan_handle
from
sys.dm_exec_query_stats qs
group by qs.plan_handle
order by sum(qs.total_worker_time) desc
) a
inner join
(SELECT plan_handle, pvt.dbid, cacheobjtype
FROM (
SELECT plan_handle, epa.attribute, epa.value, cacheobjtype
FROM sys.dm_exec_cached_plans
OUTER APPLY sys.dm_exec_plan_attributes(plan_handle) AS epa
/* WHERE cacheobjtype = 'Compiled Plan' AND objtype = 'adhoc' */) AS ecpa
PIVOT (MAX(ecpa.value) FOR ecpa.attribute IN ("dbid", "sql_handle")) AS pvt
) b on a.plan_handle = b.plan_handle
inner join sys.databases dbs on dbid = dbs.database_id
I think the answer to your question is no.
The issue is that one activity on a machine can cause load on multiple databases. If I have a process that is reading from a config DB, logging to a logging DB, and moving transactions in and out of various DBs based on type, how do I partition the CPU usage?
You could divide CPU utilization by the transaction load, but that is again a rough metric that may mislead you. How would you divide transaction log shipping from one DB to another, for instance? Is the CPU load in the reading or the writing?
You're better off looking at the transaction rate for a machine and the CPU load it causes. You could also profile stored procedures and see if any of them are taking an inordinate amount of time; however, this won't get you the answer you want.
With all said above in mind.
Starting with SQL Server 2012 (maybe 2008?), there is a database_id column in sys.dm_exec_sessions.
It gives us an easy calculation of CPU for each database for currently connected sessions. Once a session has disconnected, its results are gone.
select session_id, cpu_time, program_name, login_name, database_id
from sys.dm_exec_sessions
where session_id > 50;
select sum(cpu_time)/1000 as cpu_seconds, database_id
from sys.dm_exec_sessions
group by database_id
order by cpu_seconds desc;
Take a look at SQL Sentry. It does all you need and more.
Regards,
Lieven
Have you looked at SQL profiler?
Take the standard "T-SQL" or "Stored Procedure" template, tweak the fields to group by the database ID (I think you have to use the number, you don't get the database name, but it's easy to find out using exec sp_databases to get the list)
Run this for a while and you'll get the total CPU counts / Disk IO / Wait etc. This can give you the proportion of CPU used by each database.
If you monitor the PerfMon counter at the same time (log the data to a SQL database), and do the same for the SQL Profiler (log to database), you may be able to correlate the two together.
Even so, it should give you enough of a clue as to which DB is worth looking at in more detail. Then, do the same again with just that database ID and look for the most expensive SQL / Stored Procedures.
Please check this query:
SELECT
DB_NAME(st.dbid) AS DatabaseName
,OBJECT_SCHEMA_NAME(st.objectid,dbid) AS SchemaName
,cp.objtype AS ObjectType
,OBJECT_NAME(st.objectid,dbid) AS Objects
,MAX(cp.usecounts)AS Total_Execution_count
,SUM(qs.total_worker_time) AS Total_CPU_Time
,SUM(qs.total_worker_time) / (max(cp.usecounts) * 1.0) AS Avg_CPU_Time
FROM sys.dm_exec_cached_plans cp
INNER JOIN sys.dm_exec_query_stats qs
ON cp.plan_handle = qs.plan_handle
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE DB_NAME(st.dbid) IS NOT NULL
GROUP BY DB_NAME(st.dbid),OBJECT_SCHEMA_NAME(objectid,st.dbid),cp.objtype,OBJECT_NAME(objectid,st.dbid)
ORDER BY sum(qs.total_worker_time) desc

Resources