I have a large SQL query with multiple statements and UNION ALL. I am doing something like this now:
DECLARE @condition BIT;
SET @condition = 0;

SELECT * FROM table1
WHERE @condition = 1
UNION ALL
SELECT * FROM table2;
In this case, table1 won't return any results. However, that query is complex, with many joins (such as FullTextTable). The execution plan's estimate shows a high cost, but the actual number of rows and the time to execute seem to show otherwise. Is this the most efficient way of filtering a whole query, or is there a better way? I don't want anything in the first select to run, if possible.
I would imagine that your eventual SQL query, with all of the unions and conditions that depend on pre-calculated values, gets pretty complicated. If you're interested in reducing the complexity of the query (not for the machine but for maintenance purposes), I would go with moving the individual queries into views or table-valued functions to keep that logic elsewhere. Then you can use the IF @condition = 1 syntax that has been suggested elsewhere.
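A minimal sketch of that refactoring, assuming the table names from the question (the view names are made up):

CREATE VIEW dbo.Query1 AS SELECT * FROM table1;  -- the complex branch with all its joins
GO
CREATE VIEW dbo.Query2 AS SELECT * FROM table2;
GO
DECLARE @condition BIT = 0;
IF @condition = 1
BEGIN
    SELECT * FROM dbo.Query1
    UNION ALL
    SELECT * FROM dbo.Query2;
END
ELSE
BEGIN
    SELECT * FROM dbo.Query2;
END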
The best way to solve this is by using dynamic SQL. The problem with DForck's solution is that it may lead to parameter sniffing. Just to give a rough idea, your query might look something like this:
DECLARE @query NVARCHAR(MAX) = N'';
IF (@condition = 1)
    SET @query = N'SELECT * FROM table1
                   UNION ALL ';
SET @query = @query + N'SELECT * FROM table2';
EXEC sp_executesql @query;
This is just a simplified case, but in an actual implementation you would parameterize the dynamic query, which solves the problem of parameter sniffing. Here is an excellent explanation of the problem: Parameter Sniffing (or Spoofing) in SQL Server.
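For instance, a hedged sketch of the parameterized form (the SomeCol filter column and the values are made up for illustration):

DECLARE @condition BIT = 1, @someValue INT = 42;  -- illustrative values
DECLARE @query NVARCHAR(MAX) = N'';
IF (@condition = 1)
    SET @query = N'SELECT * FROM table1 WHERE SomeCol = @p1
                   UNION ALL ';
SET @query = @query + N'SELECT * FROM table2 WHERE SomeCol = @p1';

-- Passing the value as a real parameter (instead of concatenating it in)
-- lets SQL Server sniff it at execution time and reuse the plan safely.
EXEC sp_executesql @query, N'@p1 INT', @p1 = @someValue;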
I think you might be better off with this:
if (@condition=1)
begin
select * from table1
union all
select * from table2
end
else
begin
select * from table2
end
Background
This question is a follow-up to a previous question. To give you context here as well, I will summarize that question: I wanted a methodology for executing selections without sending their results to the client. The goal was to measure performance without eating up a lot of resources by sending millions of rows. I am only interested in the time needed to execute the queries, not in the time spent sending the results to the client app. Since I intend to optimize the queries, their results will not change at all, but the methodology will, and I intend to be able to compare the methodologies.
Current knowledge
In my other question several ideas were presented. One idea was to select the count of the records into a variable; however, that changed the query plan significantly, and the results were not accurate in terms of performance. The idea of using a temporary table was presented as well, but creating a temporary table and inserting into it is difficult if we do not know in advance what query we will be measuring, and it also introduces a lot of white noise, so, even though the idea was creative, it was not ideal for my problem. Finally, Vladimir Baranov came up with the idea of creating as many variables as there are columns in the selection. This was a great idea, but I refined it further by creating a single variable of nvarchar(max) and selecting all my columns into it. The idea works great, except for a few problems. I have a solution for most of them, but I will describe them regardless; do not misunderstand me, I have a single question.
Problem 1
If I have a @container variable and I do a @container = columnname inside each selection, then I will have conversion problems.
Solution 1
Instead of just doing a @container = columnname, I need to do a @container = cast(columnname as nvarchar(max)).
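For example, a minimal sketch of the single-variable sink with that cast applied (the table and column names are made up):

DECLARE @container nvarchar(max);

-- Every column is assigned into the same variable, so nothing is sent to
-- the client, but the server still has to materialize every row and column.
SELECT @container = CAST(SomeIntColumn AS nvarchar(max)),
       @container = CAST(SomeDateColumn AS nvarchar(max))
FROM dbo.SomeTable;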
Problem 2
I will need to convert <whatever> as something into @container = cast(<whatever> as nvarchar(max)) for each column in the selection, but not for subselections, and I will need a general solution that handles case when and parentheses. I do not want any instances of @container = anywhere except to the left of the columns of the main selection.
Solution 2
Since I am clueless about regular expressions, I can solve this by iterating over the query string until I find the from of the main query; each time I find an opening parenthesis, I do nothing until it is closed, find the indexes where @container = should be inserted and where as [customname] should be taken out, and then apply all of that to the query string from right to left. This will be long and inelegant code.
Question
Is it possible to make sure that all my main columns, but nothing else, start with @container = and end without as [Customname]?
This is much too long for a comment but I'd like to add my $.02 to the other answers and share the scripts I used to test the suggested methods.
I like @MartinSmith's TOP 0 solution but am concerned that it could result in a different execution plan shape in some cases. I didn't see this in the tests I ran but I think you'll need to verify the plan is about the same as the unmolested query for each query you test. However, the results of my tests suggest the number of columns and/or data types might skew performance with this method.
The SQLCLR method in @VladimirBaranov's answer should provide the exact plan the app code generates (assuming identical SET options for the test), but there will still be some slight overhead (YMMV) from SqlClient consuming results within the SQLCLR. There will be less server overhead with this method compared to returning results back to the calling application.
The SSMS discard results method I suggested in my first comment will incur more overhead than the other methods but does include the server-side work SQL Server will perform not only in running the query, but also filling buffers for the returned result. Whether or not this additional SQL Server work should be taken into account depends on the purpose of the test. For unit-level performance tests, I prefer to execute tests using the same API as the app code.
I captured server-side performance with these 3 methods with @MartinSmith's original query. The averages of 1,000 iterations on my machine were:
test method cpu_time duration logical_reads
SSMS discard 53031.000000 55358.844000 7190.512000
TOP 0 52374.000000 52432.936000 7190.527000
SQLCLR 49110.000000 48838.532000 7190.578000
I did the same with a trivial query returning 10,000 rows and 2 columns (int and nvarchar(100)) from a user table:
test method cpu_time duration logical_reads
SSMS discard 4204.000000 9245.426000 402.004000
TOP 0 2641.000000 2752.695000 402.008000
SQLCLR 1921.000000 1878.579000 402.000000
Repeating the same test but with a varchar(100) column instead of nvarchar(100):
test method cpu_time duration logical_reads
SSMS discard 3078.000000 5901.023000 402.004000
TOP 0 2672.000000 2616.359000 402.008000
SQLCLR 1750.000000 1798.098000 402.000000
Below are the scripts I used for testing:
Source code for a SQLCLR proc like @VladimirBaranov suggested:
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
    [SqlProcedure]
    public static void ExecuteNonQuery(string sql)
    {
        // Run the passed query on the context connection and discard the results
        using (var connection = new SqlConnection("Context Connection=true"))
        {
            connection.Open();
            var command = new SqlCommand(sql, connection);
            command.ExecuteNonQuery();
        }
    }
}
Extended Events (XE) trace to capture the actual server-side timings and resource usage:
CREATE EVENT SESSION [test] ON SERVER
ADD EVENT sqlserver.sql_batch_completed(SET collect_batch_text=(1))
ADD TARGET package0.event_file(SET filename=N'QueryTimes')
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF);
GO
User table create and load:
CREATE TABLE dbo.Foo(
FooID int NOT NULL CONSTRAINT PK_Foo PRIMARY KEY
, Bar1 nvarchar(100)
, Bar2 varchar(100)
);
WITH
t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n))
,t10k AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c CROSS JOIN t10 AS d)
INSERT INTO dbo.Foo WITH (TABLOCKX)
SELECT num, REPLICATE(N'X', 100), REPLICATE('X', 100)
FROM t10k;
GO
SQL script run from SSMS with the discard results query option to run 1000 iterations of the test with the 3 different methods:
SET NOCOUNT ON;
GO
--return and discard results
SELECT v.*,
o.name
FROM master..spt_values AS v
JOIN sys.objects o
ON o.object_id % NULLIF(v.number, 0) = 0;
GO 1000
--TOP 0
DECLARE @X NVARCHAR(MAX);
SELECT @X = (SELECT TOP 0 v.*,
o.name
FOR XML PATH(''))
FROM master..spt_values AS v
JOIN sys.objects o
ON o.object_id % NULLIF(v.number, 0) = 0;
GO 1000
--SQLCLR ExecuteNonQuery
EXEC dbo.ExecuteNonQuery @sql = N'
SELECT v.*,
o.name
FROM master..spt_values AS v
JOIN sys.objects o
ON o.object_id % NULLIF(v.number, 0) = 0;
'
GO 1000
--return and discard results
SELECT FooID, Bar1
FROM dbo.Foo;
GO 1000
--TOP 0
DECLARE @X NVARCHAR(MAX);
SELECT @X = (SELECT TOP 0 FooID, Bar1
FOR XML PATH(''))
FROM dbo.Foo;
GO 1000
--SQLCLR ExecuteNonQuery
EXEC dbo.ExecuteNonQuery @sql = N'
SELECT FooID, Bar1
FROM dbo.Foo
';
GO 1000
--return and discard results
SELECT FooID, Bar2
FROM dbo.Foo;
GO 1000
--TOP 0
DECLARE @X NVARCHAR(MAX);
SELECT @X = (SELECT TOP 0 FooID, Bar2
FOR XML PATH(''))
FROM dbo.Foo;
GO 1000
--SQLCLR ExecuteNonQuery
EXEC dbo.ExecuteNonQuery @sql = N'
SELECT FooID, Bar2
FROM dbo.Foo
';
GO 1000
I would try to write a single CLR function that runs as many queries as needed to measure. It may have a parameter with the text(s) of queries to run, or names of stored procedures to run.
You have a single request to the server. Everything is done locally on the server. There is no network overhead. You discard query results in the .NET CLR code without using explicit temp tables, by using ExecuteNonQuery for each query that you need to measure.
Don't change the query that you are measuring. The optimizer is complex; changes to the query may have various effects on performance.
Also, use SET STATISTICS TIME ON and let the server measure the time for you. Fetch what the server has to say, parse it and send it back in the format that suits you.
I think the results of SET STATISTICS TIME ON / OFF are the most reliable and accurate and have the least amount of noise.
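For reference, a minimal example of letting the server do the timing (the query here is just a placeholder):

SET STATISTICS TIME ON;

SELECT COUNT(*) FROM sys.objects;  -- the query under test goes here

SET STATISTICS TIME OFF;

The messages output then contains lines such as "SQL Server Execution Times: CPU time = ... ms, elapsed time = ... ms.", which a client can read (for example via SqlConnection.InfoMessage in .NET) and parse.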
I have a question about best practice when creating a WHERE clause in a SQL procedure.
I have written a query three different ways: one using COALESCE in the WHERE clause, one using an IS NULL OR construct, and one that builds the statement dynamically and uses sp_executesql.
Coalesce:
WHERE ClientID = COALESCE(@Client, ClientID) AND
AccessPersonID = COALESCE(@AccessPerson, AccessPersonID)
IsNull Or:
WHERE (@Client IS NULL OR @Client = ClientID)
AND (@AccessPerson IS NULL OR @AccessPerson = AccessPersonID)
and dynamically:
SET @sql = @sql + Char(13) + Char(10) + N'WHERE 1 = 1';
IF @Client <> 0
BEGIN
SET @sql = @sql + Char(13) + Char(10) + N' AND ClientID = @Client '
END
IF @AccessPerson <> 0
BEGIN
SET @sql = @sql + Char(13) + Char(10) + N' AND AccessPersonID = @AccessPerson '
END
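The procedure then presumably finishes by executing the string with the parameters passed through, something like:

EXEC sp_executesql @sql,
     N'@Client int, @AccessPerson int',
     @Client = @Client,
     @AccessPerson = @AccessPerson;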
When I use SQL Sentry Plan Explorer, the results show that the COALESCE version has the best estimated plan but is the least accurate between estimated and actual, whereas the dynamic version has the worst estimate but is 100% accurate to the actual.
This is a very simple procedure; I am just trying to figure out the best way to write procedures like this. I would think the dynamic version is the way to go, since it is the most accurate.
The correct answer is the 'dynamic' option. It's good you left parameters in because it protects against SQL Injection (at this layer anyway).
The reason 'dynamic' is the best is that it will create a query plan that is best for the given query. With your example you might get up to 3 plans for this query, depending on which parameters are > 0, but each plan generated will be optimized for that scenario (it will leave out unnecessary parameter comparisons).
The other two styles will generate one plan (each), and it will only be optimized for the parameters you used AT THAT TIME ONLY. Each subsequent execution will reuse the old plan, which may have been cached for parameter values different from the ones you are calling with.
'Dynamic' is not as clean-code as the other two options, but for performance, it will give you the optimal query plan each time.
And the dynamic SQL operates in a different scope than your sproc does, so even though you declare a variable in your sproc, you'll have to redeclare it in your dynamic SQL or concatenate its value into the statement. But then you should also do NULL checks in your dynamic SQL AND in your sproc, because NULL is neither equal to 0 nor not equal to 0; you can't compare it, because it doesn't exist. :-S
DECLARE @Client int = 1
, @AccessPerson int = NULL
;
DECLARE @sql nvarchar(2000) = N'SELECT * FROM ##TestClientID WHERE 1=1'
;
IF @Client <> 0
BEGIN
SET @sql = CONCAT(@sql, N' AND ClientID = ', CONVERT(nvarchar(10), @Client))
END
;
IF @AccessPerson <> 0
BEGIN
SET @sql = CONCAT(@sql, N' AND AccessPersonID = ', CONVERT(nvarchar(10), @AccessPerson))
END
;
PRINT @sql
EXEC sp_executesql @sql
Note: For demo purposes, I also had to modify my temp table above and make it a global temp instead of a local temp, since I'm calling it from dynamic SQL. It exists in a different scope. Don't forget to clean it up after you're done. :-)
Your top two statements don't do quite the same things if either value is NULL.
http://sqlfiddle.com/#!9/d0aa3/4
IF OBJECT_ID (N'tempdb..#TestClientID', N'U') IS NOT NULL
DROP TABLE #TestClientID;
GO
CREATE TABLE #TestClientID ( ClientID int , AccessPersonID int )
INSERT INTO #TestClientID (ClientID, AccessPersonID)
SELECT 1,1 UNION ALL
SELECT NULL,1 UNION ALL
SELECT 1,NULL UNION ALL
SELECT 0,0
DECLARE @ClientID int = NULL
DECLARE @AccessPersonID int = 1
SELECT * FROM #TestClientID
WHERE ClientID = COALESCE(@ClientID, ClientID)
AND AccessPersonID = COALESCE(@AccessPersonID, AccessPersonID)
SELECT * FROM #TestClientID
WHERE (@ClientID IS NULL OR @ClientID = ClientID)
AND (@AccessPersonID IS NULL OR @AccessPersonID = AccessPersonID)
That said, if you're looking to eliminate a NULL input value, then use COALESCE(). NULLs can get weird when doing comparisons. COALESCE(a, b) is akin to MS SQL's ISNULL(a, b): in other words, if a IS NULL, use b.
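A quick illustration of that behavior:

DECLARE @a varchar(10) = NULL, @b varchar(10) = 'fallback';

SELECT COALESCE(@a, @b) AS coalesce_result,  -- 'fallback'
       ISNULL(@a, @b)   AS isnull_result;    -- 'fallback'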
And again, it really all depends on what you're ultimately trying to do. sp_ExecuteSQL is MS-centric, so if you don't plan to port this to any other database, you can use that. But honestly, in 15 years I've probably ported an application from one db to another fewer than a dozen times. It's more important if you're writing an application that will be used by other people who will install it on different systems, but if it's an enclosed system, the benefits of the database you're using usually outweigh the lack of portability.
I probably should have included one more section of the query.
For the ISNULL and COALESCE versions I am converting a value of 0 to NULL, whereas in the dynamic version I am leaving the value as 0 for the IF clause. That is why they look a bit different.
From what I have been seeing, the COALESCE version is consistently the worst performing.
Surprisingly, from what I have tested, the ISNULL and dynamic versions are very similar, with the ISNULL version being slightly better in most cases.
In most cases testing has revealed indexes that needed to be added, and in most cases those indexes improved the queries the most, but even after they were added, the ISNULL and dynamic versions still perform better than the COALESCE one.
Also, I cannot see us switching from MSSQL in the near or distant future.
Using SQL Server 2008, I would like to specify the column names of an inner join using dynamic SQL. The two tables I am joining have the same names for the columns I am joining on. I know SQL Server does not support natural joins; if it did, the dynamic SQL would look like something like this:
DECLARE @join_columns AS NVARCHAR(100)
DECLARE @sql_1 AS NVARCHAR(4000)
SET @join_columns = 'Age, Gender'
SET @sql_1 = '
SELECT ' + @join_columns + ', table_1.Field_x , table_2.Field_y
FROM table_1 , table_2
NATURAL JOIN ON ' + @join_columns
EXECUTE sp_executesql @sql_1
Now, I realize this won't work because there are no natural joins in SQL Server. So, what is the next best way to do this?
Here are a few things I unsuccessfully pursued:
Tokenizing @join_columns and forming a dynamic WHERE table_1.<col_1> = table_2.<col_1> [AND...] kind of clause. But it doesn't look like T-SQL has built-in string tokenization functions.
Using dynamic SQL to make temp tables, each with a new key column called temp_key that is the concatenation of the fields in @join_columns. If it were easy to dynamically concatenate these, then the final join could always be ON #temp_table_1.temp_key = #temp_table_2.temp_key. One way of setting this up would be to use the REPLACE function to replace the commas in @join_columns with plus signs. The problem I ran into here was that the concatenation required casting for the non-VARCHAR columns, so I'd have to know the column types ahead of time - back to square one.
Ideally, I'd like to keep @join_columns as a comma-delimited string, because I am using it elsewhere in dynamic SQL GROUP BY clauses.
It may be that one of the failed approaches above could work, using something I missed. Or, maybe there's a better overall approach.
Any suggestions?
Update
Solution was a combination of both @usr's and @Karl's posts below. I used @usr's suggestion to track down a tokenizing table-valued UDF (ended up going with this one). Then I used @Karl's COALESCE example to turn the resulting table into the WHERE clause. I also used @Karl's full example for another join problem I just ran into. I wish I could give answer status to both posters - thanks guys!
I find that this works well:
declare @whereClause varchar(8000)
declare @table2 varchar(255)
declare @table1 varchar(255)
set @table1='SomeTable'
set @table2 = 'SomeOtherTable'
SELECT COLUMN_NAME as [joincolumn]
into #join_columns
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @table1
AND COLUMN_NAME not in ('list names of columns that are not to be joined on here')
select @whereClause=coalesce(@whereClause+' and ','')+
'['+@table2+'].'+joincolumn+'=['+@table1+'].'+joincolumn +'
'
from #join_columns
print @whereClause
You can then create a dynamic SQL script and tack the WHERE clause onto the back of it.
Encapsulate the tokenization/splitting functionality in a table-valued UDF. That allows you to build the dynamic SQL string in a clean and architecturally sound way. You can find such splitting functions readily available on the web. It is a shame that they are not built in, but we can build them ourselves.
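A minimal sketch of such a splitter, assuming SQL Server 2008 (2016+ ships STRING_SPLIT built in); the function name is made up:

CREATE FUNCTION dbo.SplitString (@list nvarchar(max), @delim nchar(1))
RETURNS @items TABLE (item nvarchar(4000))
AS
BEGIN
    DECLARE @pos int = CHARINDEX(@delim, @list);
    WHILE @pos > 0
    BEGIN
        -- take everything before the delimiter, trimmed
        INSERT INTO @items VALUES (LTRIM(RTRIM(LEFT(@list, @pos - 1))));
        SET @list = SUBSTRING(@list, @pos + 1, LEN(@list));
        SET @pos = CHARINDEX(@delim, @list);
    END;
    INSERT INTO @items VALUES (LTRIM(RTRIM(@list)));  -- last item
    RETURN;
END

-- Usage: SELECT item FROM dbo.SplitString(N'Age, Gender', N',');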
Using SQL 2005 / 2008
I have to use a forward cursor, but I don't want to suffer poor performance. Is there a faster way I can loop without using cursors?
Here is the example using cursor:
DECLARE @VisitorID int
DECLARE @FirstName varchar(30), @LastName varchar(30)
-- declare cursor called ActiveVisitorCursor
DECLARE ActiveVisitorCursor Cursor FOR
SELECT VisitorID, FirstName, LastName
FROM Visitors
WHERE Active = 1
-- Open the cursor
OPEN ActiveVisitorCursor
-- Fetch the first row of the cursor and assign its values into variables
FETCH NEXT FROM ActiveVisitorCursor INTO @VisitorID, @FirstName, @LastName
-- perform action whilst a row was found
WHILE @@FETCH_STATUS = 0
BEGIN
Exec MyCallingStoredProc @VisitorID, @FirstName, @LastName
-- get next row of cursor
FETCH NEXT FROM ActiveVisitorCursor INTO @VisitorID, @FirstName, @LastName
END
END
-- Close the cursor to release locks
CLOSE ActiveVisitorCursor
-- Free memory used by cursor
DEALLOCATE ActiveVisitorCursor
Now here is an example of how we can get the same result without using a cursor:
/* Here is alternative approach */
-- Create a temporary table, note the IDENTITY
-- column that will be used to loop through
-- the rows of this table
CREATE TABLE #ActiveVisitors (
RowID int IDENTITY(1, 1),
VisitorID int,
FirstName varchar(30),
LastName varchar(30)
)
DECLARE @NumberRecords int, @RowCounter int
DECLARE @VisitorID int, @FirstName varchar(30), @LastName varchar(30)
-- Insert the resultset we want to loop through
-- into the temporary table
INSERT INTO #ActiveVisitors (VisitorID, FirstName, LastName)
SELECT VisitorID, FirstName, LastName
FROM Visitors
WHERE Active = 1
-- Get the number of records in the temporary table
SET @NumberRecords = @@ROWCOUNT
--You can also use: SET @NumberRecords = (SELECT COUNT(*) FROM #ActiveVisitors)
SET @RowCounter = 1
-- loop through all records in the temporary table
-- using the WHILE loop construct
WHILE @RowCounter <= @NumberRecords
BEGIN
SELECT @VisitorID = VisitorID, @FirstName = FirstName, @LastName = LastName
FROM #ActiveVisitors
WHERE RowID = @RowCounter
EXEC MyCallingStoredProc @VisitorID, @FirstName, @LastName
SET @RowCounter = @RowCounter + 1
END
-- drop the temporary table
DROP TABLE #ActiveVisitors
"NEVER use Cursors" is a wonderful example of how damaging simple rules can be. Yes, they are easy to communicate, but when we remove the reason for the rule so that we can have an "easy to follow" rule, then most people will just blindly follow the rule without thinking about it, even if following the rule has a negative impact.
Cursors, at least in SQL Server / T-SQL, are greatly misunderstood. It is not accurate to say "Cursors affect performance of SQL". They certainly have a tendency to, but a lot of that has to do with how people use them. When used properly, Cursors are faster, more efficient, and less error-prone than WHILE loops (yes, this is true and has been proven over and over again, regardless of who argues "cursors are evil").
First option is to try to find a set-based approach to the problem.
If logically there is no set-based approach (e.g. needing to call EXEC per each row), and the query for the cursor is hitting real (non-temp) tables, then use the STATIC keyword, which will put the results of the SELECT statement into an internal temporary table and hence will not lock the base tables of the query as you iterate through the results. By default, cursors are "sensitive" to changes in the underlying tables of the query and will verify that those records still exist as you call FETCH NEXT (hence a large part of why cursors are often viewed as being slow). Using STATIC will not help if you need to be sensitive to records that might disappear while processing the result set, but that is a moot point if you are considering converting to a WHILE loop against a temp table (since that will also not know of changes to the underlying data).
If the query for the cursor is only selecting from temporary tables and/or table variables, then you don't need to prevent locking as you don't have concurrency issues in those cases, in which case you should use FAST_FORWARD instead of STATIC.
I think it also helps to specify the three options of LOCAL READ_ONLY FORWARD_ONLY, unless you specifically need a cursor that is not one or more of those. But I have not tested them to see if they improve performance.
Assuming that the operation is not eligible for being made set-based, then the following options are a good starting point for most operations:
DECLARE [Thing1] CURSOR LOCAL READ_ONLY FORWARD_ONLY STATIC
FOR SELECT columns
FROM Schema.ReadTable(s);
DECLARE [Thing2] CURSOR LOCAL READ_ONLY FORWARD_ONLY FAST_FORWARD
FOR SELECT columns
FROM #TempTable(s) and/or @TableVariables;
You can do a WHILE loop; however, you should seek to achieve a more set-based operation, as anything iterative in SQL is subject to performance issues.
http://msdn.microsoft.com/en-us/library/ms178642.aspx
Common Table Expressions would be a good alternative, as @Neil suggested. Here's an example from AdventureWorks:
WITH cte_PO AS
(
SELECT [LineTotal]
,[ModifiedDate]
FROM [AdventureWorks].[Purchasing].[PurchaseOrderDetail]
),
minmax AS
(
SELECT MIN([LineTotal]) as DayMin
,MAX([LineTotal]) as DayMax
,[ModifiedDate]
FROM cte_PO
GROUP BY [ModifiedDate]
)
SELECT * FROM minmax ORDER BY ModifiedDate
Here's the top few lines of what it returns:
DayMin DayMax ModifiedDate
135.36 8847.30 2001-05-24 00:00:00.000
129.8115 25334.925 2001-06-07 00:00:00.000
Recursive Queries using Common Table Expressions.
I have to use a forward cursor, but I don't want to suffer poor performance. Is there a faster way I can loop without using cursors?
This depends on what you do with the cursor.
Almost everything can be rewritten using set-based operations in which case the loops are performed inside the query plan and since they involve no context switch are much faster.
However, there are some things SQL Server is just not good at, like computing cumulative values or joining on date ranges.
These kinds of queries can be made faster using a CURSOR:
Flattening timespans: SQL Server
But again, this is quite a rare exception; normally a set-based way performs better.
If you posted your query, we could probably optimize it and get rid of a CURSOR.
Depending on what you want it for, you may be able to use a tally table.
Jeff Moden has an excellent article on tally tables Here
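A minimal sketch of building and using one, assuming SQL Server 2008 for the inline DECLARE defaults (all object names are made up):

-- Build a 10,000-row tally (numbers) table once, set-based.
SELECT TOP (10000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
INTO   dbo.Tally
FROM   sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

-- Example: produce every date in a range without a loop or a cursor.
DECLARE @start datetime = '20230101', @end datetime = '20231231';
SELECT DATEADD(day, t.N - 1, @start) AS TheDate
FROM   dbo.Tally AS t
WHERE  t.N <= DATEDIFF(day, @start, @end) + 1;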
Don't use a cursor, instead look for a set-based solution. If you can't find a set-based solution... still don't use a cursor! Post details of what you are trying to achieve, someone will be able to find a set-based solution for you.
There may be some scenarios where one can use tally tables. They can be a good alternative to loops and cursors, but remember that they cannot be applied in every case. A well-explained case can be found here.
I've developed a couple of T-SQL stored procedures that iterate over a fair bit of data. The first one takes a couple of minutes to run over a year's worth of data which is fine for my purposes. The second one, which uses the same structure/algorithm, albeit over more data, takes two hours, which is unbearable.
I'm using SQL-Server and Query-Analyzer. Are there any profiling tools, and, if so, how do they work?
Alternatively, any thoughts on how to improve the speed, based on the pseudo-code below? In short, I use a cursor to iterate over the data from a straightforward SELECT (from a few joined tables). Then I build an INSERT statement based on the values and INSERT the result into another table. Some of the SELECTed variables require a bit of manipulation before insertion. This includes extracting some date parts from a date value, some basic float operations, and some string concatenation.
--- Rough algorithm / pseudo-code
DECLARE <necessary variables>
DECLARE @cmd varchar(1000)
DECLARE @insert varchar(100) = 'INSERT INTO MyTable (COL1, COL2, ..., COLN) VALUES ('
DECLARE MyCursor Cursor FOR
SELECT <columns> FROM TABLE_1 t1
INNER JOIN TABLE_2 t2 on t1.key = t2.foreignKey
INNER JOIN TABLE_3 t3 on t2.key = t3.foreignKey
OPEN MyCursor
FETCH NEXT FROM MyCursor INTO @VAL1, @VAL2, ..., @VALn
WHILE @@FETCH_STATUS = 0
BEGIN
SET @F = @VAL2 / 1.1 --- float op
SET @S = @VAL3 + ' ' + @VAL1
SET @cmd = @insert
SET @cmd = @cmd + DATEPART(<datepart>, @VAL1) + ', '
SET @cmd = @cmd + STR(@F) + ', '
SET @cmd = @cmd + @S + ', '
SET @cmd = @cmd + ')'
EXEC (@cmd)
FETCH NEXT FROM MyCursor INTO @VAL1, @VAL2, ..., @VALn
END
CLOSE MyCursor
DEALLOCATE MyCursor
The first thing to do - get rid of the cursor...
INSERT INTO MyTable (COL1, COL2, ..., COLN)
SELECT ...cols and manipulations...
FROM TABLE_1 t1
INNER JOIN TABLE_2 t2 on t1.key = t2.foreignKey
INNER JOIN TABLE_3 t3 on t2.key = t3.foreignKey
Most things should be possible directly in T-SQL (it is hard to be definite without an example), and you could consider a UDF for more complex operations.
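For example, a hedged sketch of wrapping one of the manipulations from the pseudo-code in a scalar UDF (the names are made up):

CREATE FUNCTION dbo.ConcatLabel (@val3 varchar(100), @val1 varchar(100))
RETURNS varchar(201)
AS
BEGIN
    -- the string concatenation step from the pseudo-code
    RETURN @val3 + ' ' + @val1;
END

It can then be called inline in the set-based INSERT ... SELECT, though bear in mind that scalar UDFs carry a per-row cost, so reserve them for genuinely complex logic.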
Lose the cursor. Now. (See here for why: Why is it considered bad practice to use cursors in SQL Server?).
Without being rude, you seem to be taking a procedural programmer's approach to SQL, which is pretty much always going to be sub-optimal.
If what you're doing is complex and you're not confident, I'd do it in three steps (see the sketch after this list):
1) Select the core data into a temporary table using INSERT ... SELECT or SELECT ... INTO.
2) Use UPDATE to do the manipulation. You may be able to do this just by updating existing columns, or you may need to add a few extra ones in the right format when you create the temporary table. You can use multiple UPDATE statements to break it down further if you want.
3) Select it out into wherever you want it.
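A minimal sketch of those three steps (all table and column names are made up):

-- 1) Stage the core data
SELECT t1.SomeDate, t2.SomeFloat, t3.SomeText
INTO   #Staging
FROM   TABLE_1 t1
INNER JOIN TABLE_2 t2 ON t1.[key] = t2.foreignKey
INNER JOIN TABLE_3 t3 ON t2.[key] = t3.foreignKey;

-- 2) Manipulate in place (multiple UPDATEs keep each step easy to debug)
UPDATE #Staging SET SomeFloat = SomeFloat / 1.1;

-- 3) Select it out into wherever you want it
INSERT INTO MyTable (COL1, COL2, COL3)
SELECT SomeDate, SomeFloat, SomeText
FROM   #Staging;

DROP TABLE #Staging;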
If you want to call it all as one step then you can then wrap the whole thing up into a stored procedure.
This makes it easy to debug and easy for someone else to work with if they need to. You can break your updates down into individual steps so you can quickly identify what's gone wrong where.
That said, from the looks of it, I believe what you're doing can be done in a single insert statement. It might not be attractive, but I believe it could be done:
INSERT INTO NewTable
SELECT DATEPART(<datepart>, @VAL1) AS DateCol,
STR(@VAL2 / 1.1) AS FloatCol,
@VAL3 + ' ' + @VAL1 AS ConcatCol
FROM TABLE_1 t1
INNER JOIN TABLE_2 t2 on t1.key = t2.foreignKey
INNER JOIN TABLE_3 t3 on t2.key = t3.foreignKey
DateCol, FloatCol and ConcatCol are whatever names you want the columns to have. Although they're not needed, it's best to assign them, as (a) it makes it clearer what you're doing and (b) some languages struggle with unnamed columns (and handle them in a very unclear way).
Get rid of the cursor and the dynamic SQL:
INSERT INTO MyTable
(COL1, COL2, ... COLN)
SELECT
<columns>
,DATEPART(<datepart>, @VAL1) AS DateCol
,STR(@VAL2 / 1.1) AS FloatCol
,@VAL3 + ' ' + @VAL1 AS ConcatCol
FROM TABLE_1 t1
INNER JOIN TABLE_2 t2 on t1.key = t2.foreignKey
INNER JOIN TABLE_3 t3 on t2.key = t3.foreignKey
Are there any profiling tools, and, if so, how do they work?
To answer your question regarding query tuning tools, you can use TOAD for SQL Server to assist in query tuning.
I really like this tool as it will run your SQL statement something like 20 different ways and compare execution plans for you to determine the best one. Sometimes I'm amazed at what it does to optimize my statements, and it works quite well.
More importantly, I've used it to become a better T-SQL writer, as I apply its tips to future scripts that I write. I don't know how TOAD would work with this script because, as others have mentioned, it uses a cursor, and I don't use them, so I've never tried to optimize one.
TOAD is a huge toolbox of SQL Server functionality, and query optimization is only a small part. Incidentally, I am not affiliated with Quest Software in any way.
SQL Server also comes with a profiling tool called SQL Server Profiler. It's the first pick on the menu under Tools in SSMS.