Background
This question is a follow-up to a previous question. To give you context here as well, I would like to summarize it: in my previous question I wanted a methodology for executing SELECT statements without sending their results to the client. The goal was to measure performance without consuming a lot of resources by sending millions of rows. I am only interested in the time needed to execute those queries, not in the time needed to send the results to the client app. Since I intend to optimize the queries, their results will not change at all, but the methodology will, and I want to be able to compare the methodologies.
Current knowledge
In my other question several ideas were presented. One idea was to select the count of the records into a variable; however, that changed the query plan significantly, so the results were not accurate in terms of performance. The idea of using a temporary table was presented as well, but creating a temporary table and inserting into it is difficult if we do not know in advance which query we will need to measure, and it also introduces a lot of noise, so even though the idea was creative, it was not ideal for my problem. Finally, Vladimir Baranov came up with the idea of creating as many variables as there are columns returned by the selection. This was a great idea, but I refined it further by creating a single variable of nvarchar(max) and selecting all my columns into it. The idea works great, except for a few problems. I have a solution for most of them, but I would like to share them anyway, so I will describe them regardless; but do not misunderstand me, I have a single question.
Problem 1
If I have a @container variable and I assign @container = columnname inside each selection, then I will have conversion problems.
Solution 1
Instead of just doing @container = columnname, I need to do @container = cast(columnname as nvarchar(max)).
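For illustration, a minimal sketch of the pattern, using a hypothetical dbo.SomeTable with columns Col1 and Col2; assigning every column to the single variable forces the query to run while nothing is returned to the client:

DECLARE @container nvarchar(max);

SELECT
    @container = cast(Col1 as nvarchar(max)),
    @container = cast(Col2 as nvarchar(max))
FROM dbo.SomeTable;   -- rows are consumed server-side; no result set is sent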
Problem 2
I will need to convert <whatever> as something into @container = cast(<whatever> as nvarchar(max)) for each column in the selection, but not for subselects, and I will need a general solution that handles CASE WHEN and parentheses. I do not want to have any instance of @container = anywhere except to the left of the columns of the main selection.
Solution 2
Since I am clueless about regular expressions, I can solve this by iterating over the query string until I find the FROM of the main query; each time I find an opening parenthesis I do nothing until it is closed, find the indexes where @container = should be inserted and where as [customname] should be taken out, and then apply all of that to the query string from right to left. This will be long and inelegant code.
Question
Is it possible to make sure that all my main columns, but nothing else, start with @container = and end without as [customname]?
This is much too long for a comment but I'd like to add my $.02 to the other answers and share the scripts I used to test the suggested methods.
I like @MartinSmith's TOP 0 solution but am concerned that it could result in a different execution plan shape in some cases. I didn't see this in the tests I ran, but I think you'll need to verify the plan is about the same as the unmolested query for each query you test. However, the results of my tests suggest the number of columns and/or data types might skew performance with this method.
The SQLCLR method in @VladimirBaranov's answer should provide the exact plan the app code generates (assuming identical SET options for the test). There will still be some slight overhead (YMMV) from SqlClient consuming results within the SQLCLR proc, but there will be less server overhead with this method compared to returning results back to the calling application.
The SSMS discard results method I suggested in my first comment will incur more overhead than the other methods, but it does include the server-side work SQL Server performs not only in running the query, but also in filling buffers for the returned result. Whether or not this additional SQL Server work should be taken into account depends on the purpose of the test. For unit-level performance tests, I prefer to execute tests using the same API as the app code.
I captured server-side performance of these 3 methods with @MartinSmith's original query. The averages over 1,000 iterations on my machine were:
test method     cpu_time        duration        logical_reads
SSMS discard    53031.000000    55358.844000    7190.512000
TOP 0           52374.000000    52432.936000    7190.527000
SQLCLR          49110.000000    48838.532000    7190.578000
I did the same with a trivial query returning 10,000 rows and 2 columns (int and nvarchar(100)) from a user table:
test method     cpu_time        duration        logical_reads
SSMS discard    4204.000000     9245.426000     402.004000
TOP 0           2641.000000     2752.695000     402.008000
SQLCLR          1921.000000     1878.579000     402.000000
Repeating the same test but with a varchar(100) column instead of nvarchar(100):
test method     cpu_time        duration        logical_reads
SSMS discard    3078.000000     5901.023000     402.004000
TOP 0           2672.000000     2616.359000     402.008000
SQLCLR          1750.000000     1798.098000     402.000000
Below are the scripts I used for testing:
Source code for the SQLCLR proc like @VladimirBaranov suggested:
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures   // class name is arbitrary for SQLCLR deployment
{
    [SqlProcedure]   // exposes the method as a SQLCLR stored procedure
    public static void ExecuteNonQuery(string sql)
    {
        // run the query on the context connection and discard any results
        using (var connection = new SqlConnection("Context Connection=true"))
        {
            connection.Open();
            var command = new SqlCommand(sql, connection);
            command.ExecuteNonQuery();
        }
    }
}
Extended Events (XE) trace to capture the actual server-side timings and resource usage:
CREATE EVENT SESSION [test] ON SERVER
ADD EVENT sqlserver.sql_batch_completed(SET collect_batch_text=(1))
ADD TARGET package0.event_file(SET filename=N'QueryTimes')
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF);
GO
User table create and load:
CREATE TABLE dbo.Foo(
FooID int NOT NULL CONSTRAINT PK_Foo PRIMARY KEY
, Bar1 nvarchar(100)
, Bar2 varchar(100)
);
WITH
t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n))
,t10k AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c CROSS JOIN t10 AS d)
INSERT INTO dbo.Foo WITH (TABLOCKX)
SELECT num, REPLICATE(N'X', 100), REPLICATE('X', 100)
FROM t10k;
GO
SQL script run from SSMS with the discard results query option to run 1000 iterations of the test with the 3 different methods:
SET NOCOUNT ON;
GO
--return and discard results
SELECT v.*,
o.name
FROM master..spt_values AS v
JOIN sys.objects o
ON o.object_id % NULLIF(v.number, 0) = 0;
GO 1000
--TOP 0
DECLARE @X NVARCHAR(MAX);
SELECT @X = (SELECT TOP 0 v.*,
o.name
FOR XML PATH(''))
FROM master..spt_values AS v
JOIN sys.objects o
ON o.object_id % NULLIF(v.number, 0) = 0;
GO 1000
--SQLCLR ExecuteNonQuery
EXEC dbo.ExecuteNonQuery @sql = N'
SELECT v.*,
o.name
FROM master..spt_values AS v
JOIN sys.objects o
ON o.object_id % NULLIF(v.number, 0) = 0;
'
GO 1000
--return and discard results
SELECT FooID, Bar1
FROM dbo.Foo;
GO 1000
--TOP 0
DECLARE @X NVARCHAR(MAX);
SELECT @X = (SELECT TOP 0 FooID, Bar1
FOR XML PATH(''))
FROM dbo.Foo;
GO 1000
--SQLCLR ExecuteNonQuery
EXEC dbo.ExecuteNonQuery @sql = N'
SELECT FooID, Bar1
FROM dbo.Foo
';
GO 1000
--return and discard results
SELECT FooID, Bar2
FROM dbo.Foo;
GO 1000
--TOP 0
DECLARE @X NVARCHAR(MAX);
SELECT @X = (SELECT TOP 0 FooID, Bar2
FOR XML PATH(''))
FROM dbo.Foo;
GO 1000
--SQLCLR ExecuteNonQuery
EXEC dbo.ExecuteNonQuery @sql = N'
SELECT FooID, Bar2
FROM dbo.Foo
';
GO 1000
I would try to write a single CLR function that runs as many queries as needed for the measurement. It may have a parameter with the text(s) of the queries to run, or the names of stored procedures to run.
You have a single request to the server. Everything is done locally on the server, with no network overhead. You discard the query results in the .NET CLR code, without using explicit temp tables, by using ExecuteNonQuery for each query that you need to measure.
Don't change the query that you are measuring. The optimizer is complex, and changes to the query may have various effects on performance.
Also, use SET STATISTICS TIME ON and let the server measure the time for you. Fetch what the server has to say, parse it, and return it in the format that suits you.
I think the results of SET STATISTICS TIME ON / OFF are the most reliable and accurate and have the least amount of noise.
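For example, a minimal sketch of what the CLR code could send, wrapping the spt_values query from the test scripts above; the timing output arrives as informational messages that the caller can capture and parse:

SET STATISTICS TIME ON;

SELECT v.*, o.name
FROM master..spt_values AS v
JOIN sys.objects o
    ON o.object_id % NULLIF(v.number, 0) = 0;

SET STATISTICS TIME OFF;
-- the "SQL Server Execution Times" messages report CPU time and elapsed time;
-- they can be captured client-side (e.g. via the SqlConnection.InfoMessage event) and parsed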
Related
I'm working on a huge SQL script, and unfortunately it has a CURSOR which contains another two nested CURSORs within it (three cursors in total inside a stored procedure). It handles millions of rows to be DELETEd, UPDATEd and INSERTed, and it takes a very long time because of the row-by-row execution. I wish to change this to a set-based approach.
Many articles say that the use of CURSORs is not recommended and that the alternative is to use WHILE loops instead, so I tried replacing the three CURSORs with three WHILE loops, nothing more. Though I get the same result, there is no improvement in performance; it took the same time as it did with the CURSORs.
Below is the basic structure of the code I'm working on (I will try to keep it as simple as possible), with comments describing what each part is supposed to do.
declare @projects table (
    ProjectID INT,
    fieldA int,
    fieldB int,
    fieldC int,
    fieldD int)

INSERT INTO @projects
SELECT ProjectID, fieldA, fieldB, fieldC, fieldD
FROM ProjectTable

DECLARE projects1 CURSOR LOCAL FOR /*First cursor - fetch the cursor from ProjectTable*/
    Select ProjectID FROM @projects

OPEN projects1
FETCH NEXT FROM projects1 INTO @ProjectID
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
    BEGIN TRAN

    DELETE FROM T_PROJECTGROUPSDATA td
    WHERE td.ID = @ProjectID

    DECLARE datasets CURSOR FOR /*Second cursor - this will get the 'CollectionDate' field from datasetsTable for every project fetched in the above cursor*/
        Select DataID, GroupID, CollectionDate
        FROM datasetsTable
        WHERE datasetsTable.projectID = @ProjectID /*lets say this will fetch ten records for a single projectID*/

    OPEN datasets
    FETCH NEXT FROM datasets INTO @DataID, @GroupID, @CollectionDate
    WHILE @@FETCH_STATUS = 0
    BEGIN
        DECLARE period CURSOR FOR /*Third cursor - this will process the records from another table called period with the above fetched @CollectionDate*/
            SELECT ID, dbo.fn_GetEndOfPeriod(ID)
            FROM T_PERIODS
            WHERE DATEDIFF(dd, @CollectionDate, dbo.fn_GetEndOfPeriod(ID)) >= 0 /*lets say this will fetch 20 records for the above fetched single @CollectionDate*/
            ORDER BY [YEAR], [Quarter]

        OPEN period
        FETCH NEXT FROM period INTO @PeriodID, @EndDate
        WHILE @@FETCH_STATUS = 0
        BEGIN
            IF EXISTS (some conditions No - 1)
            BEGIN
                BREAK
            END

            IF EXISTS (some conditions No - 2)
            BEGIN
                FETCH NEXT FROM period INTO @PeriodID, @EndDate
                CONTINUE
            END

            /*get the appropriate ID from T_UPLOADS table for the current projectID and periodID fetched*/
            SET @UploadID = (SELECT ID FROM T_UPLOADS u WHERE u.project_savix_ID = @ProjectID AND u.PERIOD_ID = @PeriodID AND u.STATUS = 3)

            /*Update some fields in T_UPLOADS table for the current projectID and periodID fetched*/
            UPDATE T_UPLOADS
            SET fieldA = mp.fieldA, fieldB = mp.fieldB
            FROM @projects mp
            WHERE T_UPLOADS.ID = @UploadID AND mp.ProjectID = @ProjectID

            /*Insert some records in T_PROJECTGROUPSDATA table for the current projectID and periodID fetched*/
            INSERT INTO T_PROJECTGROUPSDATA tpd (fieldA, fieldB, fieldC, fieldD, uploadID)
            SELECT fieldA, fieldB, fieldC, fieldD, @UploadID
            FROM @projects
            WHERE tpd.DataID = @DataID

            FETCH NEXT FROM period INTO @PeriodID, @EndDate
        END
        CLOSE period
        DEALLOCATE period

        FETCH NEXT FROM datasets INTO @DataID, @GroupID, @CollectionDate, @Status, @Createdate
    END
    CLOSE datasets
    DEALLOCATE datasets
    COMMIT
    END TRY
    BEGIN CATCH
        /* Error handling */
        IF @@TRANCOUNT > 0
            ROLLBACK
    END CATCH

    FETCH NEXT FROM projects1 INTO @ProjectID, @FAID
END
CLOSE projects1
DEALLOCATE projects1

SELECT 1 as success
I would appreciate suggestions for any methods to rewrite this code to follow a set-based approach.
Since the table structure and sample data for the expected result are not provided, here are a few quick things I see that can be improved (some of them already mentioned by others above):
A WHILE loop is also effectively a cursor, so changing to a WHILE loop is not going to make things any faster.
Use a LOCAL FAST_FORWARD cursor unless you have a need to backtrack to a record. This will make the execution much faster.
Yes, I agree that a set-based approach would be the fastest in most cases; however, if you must store an intermediate result set somewhere, I would suggest using a temp table instead of a table variable. A temp table is the 'lesser evil' of these two options. Here are a few reasons why you should try to avoid using a table variable:
Since SQL Server does not have any statistics on a table variable when building the execution plan, it always assumes that only one record will be returned by the table variable, and the storage engine accordingly assigns only that much memory for executing the query. In reality, the table variable could hold millions of records during execution. If that happens, SQL Server is forced to spill the data to disk during execution (and you will see lots of PAGEIOLATCH waits in sys.dm_os_wait_stats), making the queries much slower.
One way to get around this issue is to add the statement-level hint OPTION (RECOMPILE) at the end of each query where a table variable is used. This forces SQL Server to construct the execution plan of those queries at runtime, avoiding the low memory grant. The downside is that SQL Server can no longer take advantage of an already cached execution plan for that statement and requires a recompilation every time, which degrades performance to some extent. So, unless you know that the data in the underlying table changes frequently, or that the stored procedure itself is rarely executed, this approach is not recommended by Microsoft MVPs.
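For example, a minimal sketch using the @projects table variable from the question (the predicate is illustrative; the hint only affects the statement it is attached to):

SELECT p.ProjectID, p.fieldA, p.fieldB
FROM @projects AS p
WHERE p.fieldC = 1
OPTION (RECOMPILE);   -- plan is compiled at runtime using the actual row count of @projects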
Blindly replacing a cursor with a WHILE loop is not a recommended option; it will not improve your performance and might even have a negative impact on it.
When you define a cursor with a plain DECLARE C CURSOR, you are in fact going to create a SCROLL cursor, which specifies that all fetch options (FIRST, LAST, PRIOR, NEXT, RELATIVE, ABSOLUTE) are available.
When you need just FETCH NEXT as the scroll option, you can declare the cursor as FAST_FORWARD.
Here is the quote about the FAST_FORWARD cursor from the Microsoft docs:
Specifies that the cursor can only move forward and be scrolled from the first to the last row. FETCH NEXT is the only supported fetch option. All insert, update, and delete statements made by the current user (or committed by other users) that affect rows in the result set are visible as the rows are fetched. Because the cursor cannot be scrolled backward, however, changes made to rows in the database after the row was fetched are not visible through the cursor. Forward-only cursors are dynamic by default, meaning that all changes are detected as the current row is processed. This provides faster cursor opening and enables the result set to display updates made to the underlying tables. While forward-only cursors do not support backward scrolling, applications can return to the beginning of the result set by closing and reopening the cursor.
So you can declare your cursors using DECLARE <CURSOR NAME> FAST_FORWARD FOR ... and you will get a noticeable improvement.
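For example, applied to the outer cursor from the question (a sketch; LOCAL keeps the cursor scoped to the batch or procedure):

DECLARE projects1 CURSOR LOCAL FAST_FORWARD FOR   -- forward-only, read-only, FETCH NEXT only
    SELECT ProjectID FROM @projects;
OPEN projects1;
FETCH NEXT FROM projects1 INTO @ProjectID;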
I think all the cursor code above can be simplified to something like this:
DROP TABLE IF EXISTS #Source;
SELECT DISTINCT p.ProjectID,p.fieldA,p.fieldB,p.fieldC,p.fieldD,u.ID AS [UploadID]
INTO #Source
FROM ProjectTable p
INNER JOIN DatasetsTable d ON d.ProjectID = p.ProjectID
INNER JOIN T_PERIODS s ON DATEDIFF(DAY,d.CollectionDate,dbo.fn_GetEndOfPeriod(s.ID)) >= 0
INNER JOIN T_UPLOADS u ON u.project_savix_ID = p.ProjectID AND u.PERIOD_ID = s.ID AND u.STATUS = 3
WHERE NOT EXISTS (some conditions No - 1)
AND NOT EXISTS (some conditions No - 2)
;
UPDATE u SET u.fieldA = s.fieldA, u.fieldB = s.fieldB
FROM T_UPLOADS u
INNER JOIN #Source s ON s.UploadID = u.ID
;
INSERT INTO T_PROJECTGROUPSDATA (fieldA,fieldB,fieldC,fieldD,uploadID)
SELECT DISTINCT s.fieldA,s.fieldB,s.fieldC,s.fieldD,s.UploadID
FROM #Source s
;
DROP TABLE IF EXISTS #Source;
Also, it would be nice to know the details of "some conditions No", as the query can differ depending on them.
I am running a cursor to automatically generate SQL statements that search a DB for a list of specific values. This will generate, for example, 180 queries stored in SQL_QueryTable. Then, as seen below, I use a second cursor to fetch each statement from SQL_QueryTable, execute it against a table with 150 million records, and ultimately store the results in a result table.
This works, but it takes a very long time.
I am looking for suggestions to improve the running time.
DECLARE @SQLQuery nvarchar(max)
DECLARE @Counter int = 1
DECLARE @TrackerID nvarchar(max)

DECLARE SQLQuery CURSOR
FOR SELECT SQL_Query, TrackerID FROM SQL_QueryTable

OPEN SQLQuery
FETCH NEXT FROM SQLQuery INTO @SQLQuery, @TrackerID

WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @SQLQuery
    PRINT @Counter

    INSERT INTO Table_1 (column1, column2, column3, column4)
    EXEC (@SQLQuery)

    UPDATE Table_1
    SET TrackerID = @TrackerID
    WHERE TrackerID IS NULL

    SET @Counter = @Counter + 1

    FETCH NEXT FROM SQLQuery INTO @SQLQuery, @TrackerID
END

CLOSE SQLQuery
DEALLOCATE SQLQuery
Regardless of any standard optimization methods (indexes, an optimized SELECT statement, ...), you query the table once per value you need to find. This multiplies the effort, so you need a different approach to speed things up.
A simple upgrade could be to search once for a whole SET of values. If, for example, your operator is (=), this is simple: store the search values in a new assistant table and then build one statement for all of them, using IN or an INNER JOIN to get the filtered data. The more values you search for this way, the better the performance. If your values are matched against different columns, you can repeat this per column, because using OR would degrade performance significantly.
The operators complicate things to varying degrees, depending on your custom searches. The equality operator (=) is easy to handle. Range operators (less than, greater than, BETWEEN) each define a set of their own and are harder to combine. The same goes for the LIKE operator, plus the extra effort of searching strings.
In conclusion, if you search using (=) you can group the searches even if they are split across different columns. If you do not use other operators, you are fine. If you use them only in small numbers, use the grouping for the majority and leave the complex searches the way they work now.
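A minimal sketch of the grouping idea, under the assumption that the searched values use the (=) operator against a single column SomeColumn of a hypothetical BigTable; the assistant table #SearchValues replaces many single-value queries with one pass:

CREATE TABLE #SearchValues (SearchValue nvarchar(100) PRIMARY KEY);

INSERT INTO #SearchValues (SearchValue)
VALUES (N'value1'), (N'value2'), (N'value3');   -- the values the generated queries searched for

-- one pass over the big table instead of one query per value
INSERT INTO Table_1 (column1, column2, column3, column4)
SELECT b.column1, b.column2, b.column3, b.column4
FROM BigTable AS b
INNER JOIN #SearchValues AS s
    ON s.SearchValue = b.SomeColumn;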
I have the following query:
DECLARE @DaysNotUsed int = 14
DECLARE @DaysNotPhoned int = 7
--Total Unique Students
DECLARE @totalStudents TABLE (SchoolID uniqueidentifier, TotalUniqueStudents int)
INSERT INTO @totalStudents
SELECT
SSGG.School,
COUNT(DISTINCT S.StudentID)
FROM Student S
INNER JOIN StudentStudents_GroupGroups SSGG ON (SSGG.Students = S.StudentID AND SSGG.School = S.School)
INNER JOIN [Group] G ON (G.GroupID = SSGG.Groups AND G.School = SSGG.School)
INNER JOIN SessionHistory SH ON (SH.Student = S.StudentID AND SH.School = S.School AND SH.StartDateTime > GETDATE() - @DaysNotUsed)
WHERE G.IsBuiltIn = 0
AND S.HasStartedProduct = 1
GROUP BY SSGG.School
--Last Used On
DECLARE @lastUsed TABLE (SchoolID uniqueidentifier, LastUsedOn datetime)
INSERT INTO @lastUsed
SELECT
vi.SchoolID,
MAX(sh.StartDateTime)
FROM View_Installation as vi
INNER JOIN SessionHistory as sh on sh.School = vi.SchoolID
GROUP BY vi.SchoolID
SELECT
VI.SchoolID,
INS.DateAdded,
INS.Removed,
INS.DateRemoved,
INS.DateToInclude,
VI.SchoolName AS [School Name],
VI.UsersLicensed AS [Licenses],
ISNULL(TS.TotalUniqueStudents, 0) as [Total Unique Students],
ISNULL(TS.TotalUniqueStudents, 0) * 100 / VI.UsersLicensed as [% of Students Using],
S.State,
LU.LastUsedOn,
DATEDIFF(DAY, LU.LastUsedOn, GETDATE()) AS [Days Not Used],
SI.AreaSalesManager AS [Sales Rep],
SI.CaseNumber AS [Case #],
SI.RenewalDate AS [Renewal Date],
SI.AssignedTo AS [Assigned To],
SI.Notes AS [Notes]
FROM View_Installation VI
INNER JOIN School S ON S.SchoolID = VI.SchoolID
LEFT OUTER JOIN @totalStudents TS on TS.SchoolID = VI.SchoolID
INNER JOIN @lastUsed LU on LU.SchoolID = VI.SchoolID
LEFT OUTER JOIN InactiveReports..SchoolInfo SI ON S.SchoolID = SI.SchoolID
LEFT OUTER JOIN InactiveReports..InactiveSchools INS ON S.SchoolID = INS.SchoolID
WHERE VI.UsersLicensed > 0
AND VI.LastPhoneHome > GETDATE() - @DaysNotPhoned
AND
(
(
SELECT COUNT(DISTINCT S.StudentID)
FROM Student S
INNER JOIN StudentStudents_GroupGroups SSGG ON (SSGG.Students = S.StudentID AND SSGG.School = S.School)
INNER JOIN [Group] G ON (G.GroupID = SSGG.Groups AND G.School = SSGG.School)
WHERE G.IsBuiltIn = 0
AND S.School = VI.SchoolID
) * 100 / VI.UsersLicensed < 50
OR
VI.SchoolID NOT IN
(
SELECT DISTINCT SH1.School
FROM SessionHistory SH1
WHERE SH1.StartDateTime > GETDATE() - @DaysNotUsed
)
)
ORDER BY [Days Not Used] DESC
Running just the plain SQL like this in SSMS takes about 10 seconds. When I created a stored procedure with exactly the same code, the query takes 50 seconds instead. The only difference in the actual code of the proc is a SET NOCOUNT ON that the IDE put in by default, but adding that line to the query doesn't have any impact. Any idea what would cause such a dramatic slowdown?
EDIT: I neglected to mention the declare statements at the beginning. These are not in the proc, but are parameters to it. Could this be the difference?
I agree about the potential parameter sniffing issue, but I would also check these settings.
For the procedure:
SELECT uses_ansi_nulls, uses_quoted_identifier
FROM sys.sql_modules
WHERE [object_id] = OBJECT_ID('dbo.procedure_name');
For the SSMS query window where the query is running fast:
SELECT [ansi_nulls], [quoted_identifier]
FROM sys.dm_exec_sessions
WHERE session_id = @@SPID;
If either of these don't match, you might consider dropping the stored procedure and re-creating it with those two settings matching. For example, if the procedure has uses_quoted_identifier = 0 and the session has quoted_identifier = 1, you could try:
DROP PROCEDURE dbo.procedure_name;
GO
SET QUOTED_IDENTIFIER ON;
GO
CREATE PROCEDURE dbo.procedure_name
AS
BEGIN
SET NOCOUNT ON;
...
END
GO
Ideally all of your modules will be created with the exact same QUOTED_IDENTIFIER and ANSI_NULLS settings. It's possible the procedure was created when the settings were off (the default is on for both), or it's possible that where you are executing the query, one or both options are off (you can change this behavior in SSMS under Tools/Options/Query Execution/SQL Server/ANSI).
I'm not going to make any disclaimers about the behavior of the stored procedure with the different settings (for example you may have wanted ANSI_NULLS off so you could compare NULL = NULL), that you'll have to test, but at least you'll be comparing queries that are being run with the same options, and it will help narrow down potential parameter sniffing issues. If you're intentionally using SET ANSI_NULLS OFF, however, I caution you to find other approaches as that behavior will eventually be unsupported.
Other ways around parameter sniffing:
make sure you don't inadvertently compile the procedure with atypical parameters
use the recompile option either on the procedure or on the statement that seems to be the victim (I'm not sure if all of these are valid, because I can only tell that you are using SQL Server 2005 or greater, and some of these were introduced in 2008)
declare local variables similar to your input parameters, assign the input parameter values to them, and use the local variables later in the procedure, ignoring the input parameters
The last option is my least favorite, but it's the quickest / easiest fix in the midst of troubleshooting and when users are complaining.
Also, in addition to everything else mentioned, if you are on SQL Server 2008 and up, have a look at OPTIMIZE FOR UNKNOWN http://blogs.msdn.com/b/sqlprogrammability/archive/2008/11/26/optimize-for-unknown-a-little-known-sql-server-2008-feature.aspx
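A minimal sketch of those two workarounds, using a hypothetical dbo.GetRecentSessions procedure over the SessionHistory table from the question; both prevent the plan from being built around one specific sniffed parameter value:

-- option A: copy the parameter into a local variable and use the variable instead
CREATE PROCEDURE dbo.GetRecentSessions
    @DaysNotUsed int
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @LocalDays int;
    SET @LocalDays = @DaysNotUsed;   -- the optimizer cannot sniff a local variable

    SELECT School, StartDateTime
    FROM dbo.SessionHistory
    WHERE StartDateTime > GETDATE() - @LocalDays;
END
GO

-- option B (SQL Server 2008+): keep the parameter but ask for a plan based on average statistics
ALTER PROCEDURE dbo.GetRecentSessions
    @DaysNotUsed int
AS
BEGIN
    SET NOCOUNT ON;

    SELECT School, StartDateTime
    FROM dbo.SessionHistory
    WHERE StartDateTime > GETDATE() - @DaysNotUsed
    OPTION (OPTIMIZE FOR UNKNOWN);
END
GO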
I would recommend recompiling the execution plan for the stored procedure.
usage: sp_recompile '[target]'
example: sp_recompile 'dbo.GetObject'
When you execute a query from SSMS, the query plan is automatically redone every time it's executed. With stored procs, however, SQL Server caches execution plans, and it's this cached execution plan that gets used every time the stored proc is called.
Link for sp_recompile.
You can also change the proc to use the WITH RECOMPILE clause within the stored proc.
Example:
CREATE PROCEDURE dbo.GetObject
(
@parm1 VARCHAR(20)
)
WITH RECOMPILE
AS
BEGIN
-- Queries/work here.
END
However this will force the execution plan to be recompiled every time the stored proc is called. This is good for dev/testing where the proc and/or data changes quite frequently. Make sure you remove it when you deploy it to production, as this can have a performance hit.
sp_recompile only recompiles the execution plan once. If you need to do it again at a later date, you will need to make the call again.
Good luck!
OK, thank you all for your help. Turns out it was a terribly stupid rookie mistake. The first time I created the proc, it created it under my user's schema instead of the dbo schema. When I called the proc I was simply doing 'exec proc_name', which I'm realizing now was using the version of the proc under my user's schema. Running 'exec dbo.proc_name' ran as expected.
I've developed a couple of T-SQL stored procedures that iterate over a fair bit of data. The first one takes a couple of minutes to run over a year's worth of data which is fine for my purposes. The second one, which uses the same structure/algorithm, albeit over more data, takes two hours, which is unbearable.
I'm using SQL Server and Query Analyzer. Are there any profiling tools, and, if so, how do they work?
Alternatively, any thoughts on how to improve the speed, based on the pseudo-code below? In short, I use a cursor to iterate over the data from a straightforward SELECT (from a few joined tables). Then I build an INSERT statement based on the values and INSERT the result into another table. Some of the SELECTed values require a bit of manipulation before INSERTion. These include extracting some date parts from a date value, some basic float operations and some string concatenation.
--- Rough algorithm / pseudo-code
DECLARE <necessary variables>
DECLARE @cmd varchar(1000)
DECLARE @insert varchar(100) = 'INSERT INTO MyTable COL1, COL2, ... COLN, VALUES('

DECLARE MyCursor CURSOR FOR
    SELECT <columns> FROM TABLE_1 t1
    INNER JOIN TABLE_2 t2 on t1.key = t2.foreignKey
    INNER JOIN TABLE_3 t3 on t2.key = t3.foreignKey

OPEN MyCursor
FETCH NEXT FROM MyCursor INTO @VAL1, @VAL2, ..., @VALn
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @F = @VAL2 / 1.1 --- float op
    SET @S = @VAL3 + ' ' + @VAL1

    SET @cmd = @insert
    SET @cmd = @cmd + DATEPART(@VAL1) + ', '
    SET @cmd = @cmd + STR(@F) + ', '
    SET @cmd = @cmd + @S + ', '
    SET @cmd = @cmd + ')'

    EXEC (@cmd)

    FETCH NEXT FROM MyCursor INTO @VAL1, @VAL2, ..., @VALn
END
CLOSE MyCursor
DEALLOCATE MyCursor
The first thing to do - get rid of the cursor...
INSERT INTO MyTable (COL1, COL2, ... , COLN)
SELECT ...cols and manipulations...
FROM TABLE_1 t1
INNER JOIN TABLE_2 t2 on t1.key = t2.foreignKey
INNER JOIN TABLE_3 t3 on t2.key = t3.foreignKey
Most things should be possible direct in TSQL (it is hard to be definite without an example) - and you could consider a UDF for more complex operations.
Lose the cursor. Now. (See here for why: Why is it considered bad practice to use cursors in SQL Server?).
Without being rude, you seem to be taking a procedural programmer's approach to SQL, which is pretty much always going to be sub-optimal.
If what you're doing is complex and you're not confident I'd do it in three steps:
1) Select the core data into a temporary table using INSERT ... SELECT or SELECT INTO.
2) Use update to do the manipulation - you may be able to do this just updating existing columns or you may need to have added a few extra ones in the right format when you create the temporary table. You can use multiple update statements to break it down further if you want.
3) Select it out into wherever you want it.
If you want to call it all as one step then you can then wrap the whole thing up into a stored procedure.
This makes it easy to debug and easy for someone else to work with if they need to. You can break your updates down into individual steps so you can quickly identify what's gone wrong where.
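A rough sketch of that three-step shape, reusing the TABLE_1/TABLE_2/TABLE_3 join and MyTable from the question; the column names SomeDate, SomeFloat and SomeText are hypothetical stand-ins for the manipulated values:

-- 1) select the core data into a temp table
SELECT t1.SomeDate, t2.SomeFloat, t3.SomeText,
       CAST(NULL AS varchar(200)) AS ConcatCol      -- extra column for later manipulation
INTO #Work
FROM TABLE_1 t1
INNER JOIN TABLE_2 t2 ON t1.[key] = t2.foreignKey
INNER JOIN TABLE_3 t3 ON t2.[key] = t3.foreignKey;

-- 2) do the manipulation with set-based updates
UPDATE #Work
SET SomeFloat = SomeFloat / 1.1,
    ConcatCol = SomeText + ' ' + CONVERT(varchar(10), SomeDate, 120);

-- 3) select it out into wherever you want it
INSERT INTO MyTable (COL1, COL2, COL3)
SELECT DATEPART(YEAR, SomeDate), SomeFloat, ConcatCol
FROM #Work;

DROP TABLE #Work;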
That said, from the looks of it I believe what you're doing can be done in a single insert statement. It might not be attractive, but I believe it could be done:
INSERT INTO NewTable
SELECT
    DATEPART(@VAL1) DateCol,
    STR(@VAL2 / 1.1) FloatCol,
    @VAL3 + ' ' + @VAL1 ConcatCol
FROM TABLE_1 t1
INNER JOIN TABLE_2 t2 on t1.key = t2.foreignKey
INNER JOIN TABLE_3 t3 on t2.key = t3.foreignKey
DateCol, FloatCol and ConcatCol are whatever names you want the columns to have. Although they're not needed it's best to assign them as (a) it makes it clearer what you're doing and (b) some languages struggle with unnamed columns (and handle it in a very unclear way).
get rid of the cursor and dynamic sql:
INSERT INTO MyTable
(COL1, COL2, ... COLN)
SELECT
<columns>
,DATEPART(@VAL1) AS DateCol
,STR(@VAL2 / 1.1) AS FloatCol
,@VAL3 + ' ' + @VAL1 AS ConcatCol
FROM TABLE_1 t1
INNER JOIN TABLE_2 t2 on t1.key = t2.foreignKey
INNER JOIN TABLE_3 t3 on t2.key = t3.foreignKey
Are there any profiling tools, and, if so, how do they work?
To answer your question regarding query tuning tools, you can use TOAD for SQL Server to assist in query tuning.
I really like this tool as it will run your SQL statement something like 20 different ways and compare execution plans for you to determine the best one. Sometimes I'm amazed at what it does to optimize my statements, and it works quite well.
More importantly, I've used it to become a better t-sql writer as I use the tips on future scripts that I write. I don't know how TOAD would work with this script because as others have mentioned it uses a cursor, and I don't use them so have never tried to optimize one.
TOAD is a huge toolbox of SQL Server functionality, and query optimization is only a small part. Incidentally, I am not affiliated with Quest Software in any way.
SQL Server also comes with a profiling tool called SQL Server Profiler. It's the first pick on the menu under Tools in SSMS.
Let's say you have inherited a MS SQL 2000 or 2005 database, and you know that some of the Tables, Views, Procs, and Functions are not actually used in the final product.
Is there some kind of internal logging or another mechanism that could tell me which objects are NOT being called, or have only been called a few times versus thousands of times?
This SO question, Identifying Unused Objects In Microsoft SQL Server 2005, might be relevant.
The answer will depend a little on how the database has been put together, but my approach to a similar problem was 3-fold:
Figure out which objects have no internal dependencies. You can work this out from queries against sysdepends such as:
select
    so.id,
    so.name
from
    sys.sysdepends sd
    inner join sys.sysobjects so
        on so.id = sd.id
where
    not exists (
        select
            1
        from
            sysdepends sd2
        where
            sd2.depid = so.id
    )
You should combine this with collecting the type of object (sysobjects.xtype), as you'll only want to isolate the tables, functions, stored procs and views. Also ignore any procedures starting with "sp_", unless people have been creating procedures with those names for your application!
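A possible refinement of the query above along those lines (a sketch; the xtype codes are U = user table, V = view, P = procedure, FN/IF/TF = functions):

select
    so.id,
    so.name,
    so.xtype
from sys.sysobjects so
where so.xtype in ('U', 'V', 'P', 'FN', 'IF', 'TF')
  and so.name not like 'sp[_]%'            -- skip sp_-prefixed procedures
  and not exists (
        select 1
        from sys.sysdepends sd
        where sd.depid = so.id             -- nothing in the database references this object
  );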
Many of the returned procedures may be your application's entry points. That is to say the procedures that are called from your application layer or from some other remote call and don't have any objects that depend on them within the database.
Assuming the process won't be too invasive (it will create some additional load, though not too much), you can now switch on some profiling of the SP:Starting, SQL:BatchStarting and/or SP:StmtStarting events. Run this for as long as you see fit, ideally logging to a SQL table for easy cross-referencing. You should be able to eliminate many of the procedures that are called directly from your application.
By cross-referencing the text data from this log and your dependent object list, you will hopefully have isolated most of the unused procedures.
Finally, you may want to take the candidate list resulting from this process and grep your source code base against it. This is a cumbersome task, and just because you find references in your code doesn't mean you need them! It may simply be that the code hasn't been removed even though it's now logically inaccessible.
This is far from a perfect process. A relatively clean alternative is to set up far more detailed (and therefore invasive) profiling on the server to monitor all the activity. This can include every SQL statement called during the time the log is active. You can then work back through the dependent tables or even cross-database dependencies from this text data. I've found the reliability of the log detail (too many rows per second attempting to be parsed) and the sheer quantity of data difficult to deal with. If your application is less likely to suffer from this, it may be a good approach.
Caveat:
Because, so far as I'm aware, there isn't a perfect answer to this, be particularly wary of removing tables. Procedures, functions and views are easily replaced if something goes wrong (though make sure you have them in source control before burning them, of course!). If you're feeling really nervous, why not rename the table and create a view with the old name; you've then got an easy out.
We can also find unused columns and tables using the following query. I tried to write a cursor; it will give you information about each column in each table.
declare @name varchar(200), @id bigint, @columnname varchar(500)

declare @temptable table
(
    table_name varchar(500),
    Status bit
)

declare @temp_column_name table
(
    table_name varchar(500),
    column_name varchar(500),
    Status bit
)

declare find_table_dependency cursor for
select name, id from sysobjects where xtype = 'U'

open find_table_dependency

fetch find_table_dependency into @name, @id

while @@FETCH_STATUS = 0
begin
    if exists(select top 1 name from sysobjects where id in
              (select id from syscomments where text like '%' + @name + '%'))
        insert into @temptable
        select @name, 1
    else
        insert into @temptable
        select @name, 0

    declare find_column_dependency cursor for
    select name from syscolumns where id = @id

    open find_column_dependency

    fetch find_column_dependency into @columnname

    while @@FETCH_STATUS = 0
    begin
        if exists(select top 1 name from sysobjects where id in
                  (select id from syscomments where text like '%' + @columnname + '%'))
            insert into @temp_column_name
            select @name, @columnname, 1
        else
            insert into @temp_column_name
            select @name, @columnname, 0

        fetch find_column_dependency into @columnname
    end

    close find_column_dependency
    deallocate find_column_dependency

    fetch find_table_dependency into @name, @id
end

close find_table_dependency
deallocate find_table_dependency

select * from @temptable
select * from @temp_column_name