I'm trying to insert some data from an XML document into a table variable. What blows my mind is that the same select-into (bulk) runs in no time, while the insert-select takes ages and pegs the SQL Server process at 100% CPU for as long as the query runs.
I took a look at the execution plan, and indeed there's a difference. The insert-select adds an extra "Table Spool" node, even though no cost is assigned to it, and the "Table Valued Function [XML Reader]" then gets 92%. With select-into, the two "Table Valued Function [XML Reader]" nodes get 49% each.
Please explain why this is happening and how to resolve it (elegantly). I can of course bulk-insert into a temporary table and then insert from that into the table variable, but that feels clumsy.
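For the record, the two-step version I mean looks something like this (a sketch reusing @xColumns and @columns from the test case below; #staging is just an illustrative name):

-- Bulk select-into a temp table (fast), then copy into the table variable
select ColumnNames.value('.', 'nvarchar(300)') name
into #staging
from @xColumns.nodes('/columns/name') T1(ColumnNames)

insert @columns select name from #staging
drop table #staging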
I tried this on SQL Server builds 10.50.1600 (2008 R2) and 10.00.2531 (2008), with the same results.
Here's a test case:
declare @xColumns xml
declare @columns table(name nvarchar(300))
if OBJECT_ID('tempdb.dbo.#columns') is not null drop table #columns
insert @columns select name from sys.all_columns
set @xColumns = (select name from @columns for xml path('columns'))
delete @columns

print 'XML data size: ' + cast(datalength(@xColumns) as varchar(30))

--raiserror('selecting', 10, 1) with nowait
--select ColumnNames.value('.', 'nvarchar(300)') name
--from @xColumns.nodes('/columns/name') T1(ColumnNames)

raiserror('selecting into #columns', 10, 1) with nowait
select ColumnNames.value('.', 'nvarchar(300)') name
into #columns
from @xColumns.nodes('/columns/name') T1(ColumnNames)

raiserror('inserting @columns', 10, 1) with nowait
insert @columns
select ColumnNames.value('.', 'nvarchar(300)') name
from @xColumns.nodes('/columns/name') T1(ColumnNames)
Thanks a bunch!!
This is a bug in SQL Server 2008.
Use
insert #columns
select ColumnNames.value('.', 'nvarchar(300)') name
from #xColumns.nodes('/columns/name') T1(ColumnNames)
OPTION (OPTIMIZE FOR ( #xColumns = NULL ))
This workaround comes from an item on the Microsoft Connect site, which also mentions that a hotfix for this Eager Spool / XML Reader issue is available (enabled under trace flag 4130).
The reason for the performance regression is explained in a different Connect item:
The spool was introduced due to general Halloween protection logic (which is not needed for XQuery expressions).
(Halloween protection guards a statement against re-reading rows it has just written; here the optimizer inserts an eager spool between the XML reader and the insert even though the XQuery source can never be affected by the insert target.)
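With that hotfix installed, enabling the fix server-wide is just the usual trace flag syntax; a sketch (it has no effect without the hotfix):

DBCC TRACEON (4130, -1);  -- -1 enables the flag globally; only meaningful once the hotfix from the Connect item is applied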
Looks to be an issue specific to SQL Server 2008. When I run the code in SQL Server 2005, both inserts run quickly and produce identical execution plans that start with the fragment shown below as Plan 1. In 2008, the first insert uses Plan 1, but the second insert produces Plan 2. The remainder of both plans, beyond the fragment shown, is identical.
Plan 1: (execution plan fragment image omitted)
Plan 2: (execution plan fragment image omitted)
My desired end result is simply to be able to SELECT from a stored procedure. I've searched the Internet, and unfortunately the answer seems to be that this can't be done directly: you first need to create a temp table to hold the data. My problem is that you must define the temp table's columns before executing the stored procedure, which is time consuming. I simply want to take the data the stored procedure returns and stick it into a temp table.
What is the fastest route to achieve this from a coding perspective? Put simply, it's time consuming to look up the fields a stored procedure returns and then write them all out.
Is there some sort of tool that can build the CREATE TABLE statement based on the stored procedure?
Most of the stored procedures I'm dealing with have 50+ fields; I don't look forward to defining each of them manually.
Here is a good SO post that got me this far, but it's not quite what I was hoping for, and it still takes too much time. What are experienced SQL Server people doing? I've only recently made the jump from Oracle to SQL Server, and from what I can tell, temp tables are a big deal in SQL Server.
You have several options to ease your task, though none of them is fully automatic. Be aware that they won't work if there's dynamic SQL in the procedure's code. You may be able to format the output of these functions to increase the automation, allowing you to copy and paste easily.
SELECT * FROM sys.dm_exec_describe_first_result_set_for_object(OBJECT_ID('report.MyStoredProcedureWithAnyColumns'), 0) ;
SELECT * FROM sys.dm_exec_describe_first_result_set(N'EXEC report.MyStoredProcedureWithAnyColumns', null, 0) ;
EXEC sp_describe_first_result_set @tsql = N'EXEC report.MyStoredProcedureWithAnyColumns';
GO
If you don't mind a ##global temp table and some dynamic SQL:
NOTE: As Luis Cazares correctly pointed out, the ##temp table runs the risk of name collisions under concurrency (one mitigation is sketched after the example).
Example
Declare @SQL varchar(max) = 'Exec [dbo].[prc-App-Lottery-Search] ''8117'''
Declare @temp varchar(500) = '##myTempTable'

Set @SQL = '
If Object_ID(''tempdb..'+@temp+''') Is Not NULL Drop Table '+@temp+';
Create Table '+@temp+' ('+stuff((Select concat(',',quotename(Name),' ',system_type_name)
                                 From sys.dm_exec_describe_first_result_set(@SQL,null,null) A
                                 Order By column_ordinal
                                 For XML Path ('')),1,1,'')+')
Insert '+@temp+' '+@SQL+'
'
Exec(@SQL)

Select * from ##myTempTable
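As for the collision risk noted above, one mitigation is to make the global temp table name session-unique, e.g. by appending @@SPID; a sketch (the final SELECT then has to be dynamic too):

Declare @temp varchar(500) = '##myTempTable_' + cast(@@SPID as varchar(10))
-- ...build and Exec(@SQL) exactly as above, then read the result back dynamically:
Exec('Select * from ' + @temp)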
Observe the following simple SQL code:
CREATE TABLE #tmp (...) -- Here comes the schema

INSERT INTO #tmp
EXEC(@Sql) -- @Sql is a dynamic query generating a result with a known schema

All is good, because we know the schema of the result produced by @Sql.
But what if the schema is unknown? In that case I use PowerShell to generate a SQL query like this:
SET @Sql = '
SELECT *
INTO ##MySpecialAndUniquelyNamedGlobalTempTable
FROM ($Query) x
'
EXEC(@Sql)
(I omit some details, but the "spirit" of the code is preserved)
And it works fine, except that there is a severe limitation on what $Query can be: it must be a single SELECT statement.
That is not good enough for me; I would like to be able to run any SQL script this way. The problem is that I can no longer concatenate it after FROM (; it must be executed by EXEC or sp_executesql. But then I have no idea how to collect the results into a table, because I have no idea of the schema of that table.
Is this possible in SQL Server 2012?
Motivation: We have many QA databases across different Sql servers and more often than not I find myself running queries on all of them in order to locate the database most likely to yield best results for my tests. Alas, I am only able to run single SELECT statements, which is inconvenient.
We use SP and OPENROWSET for this purpose.
First create a stored procedure based on the query you need, then use OPENROWSET to get its data into a temp table:
USE Test
DECLARE @sql nvarchar(max),
        @query nvarchar(max)
SET @sql = N'Some query'

IF OBJECT_ID(N'SomeSPname') IS NOT NULL DROP PROCEDURE SomeSPname

SET @query = N'
CREATE PROCEDURE SomeSPname
AS
BEGIN
' + @sql + '
END'
EXEC sp_executesql @query

USE tempdb
IF OBJECT_ID(N'#temp') IS NOT NULL DROP TABLE #temp

SELECT *
INTO #temp
FROM OPENROWSET(
    'SQLNCLI',
    'Server=SERVER\INSTANCE;Database=Test;Trusted_Connection=yes;',
    'EXEC dbo.SomeSPname')

SELECT *
FROM #temp
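Note that OPENROWSET calls like this generally require 'Ad Hoc Distributed Queries' to be enabled on the instance; a sketch (needs server-level configuration permission):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;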
Assume that I have a table in my local database called Local_Table, and on another server and database a table called Remote_Table (the table structures are identical). Local_Table has data, Remote_Table doesn't. I want to transfer data from Local_Table to Remote_Table with this query:
Insert into RemoteServer.RemoteDb..Remote_Table
select * from Local_Table (nolock)
But the performance is quite slow.
However, when I use the SQL Server Import/Export wizard, the transfer is really fast.
What am I doing wrong? Why is it fast with the Import/Export wizard and slow with an insert-select statement? Any ideas?
The fastest way is to pull the data rather than push it. When the tables are pushed, every row requires a connection, an insert, and a disconnect.
If you can't pull the data because you have a one-way trust relationship between the servers, the workaround is to construct the entire table as one giant T-SQL statement and run it all at once.
DECLARE @xml XML
SET @xml = (
    SELECT 'insert Remote_Table values (' + '''' + isnull(first_col, 'NULL') + ''',' +
           -- repeat for each col
           '''' + isnull(last_col, 'NULL') + '''' + ');'
    FROM Local_Table
    FOR XML path('')
) -- This concatenates all the rows into a single XML value; the empty path keeps <colname></colname> tags from being wrapped around each value

DECLARE @sql AS VARCHAR(max)
SET @sql = 'set nocount on;' + cast(@xml AS VARCHAR(max)) + 'set nocount off;' -- Converts the XML back into one long string
EXEC ('use RemoteDb;' + @sql) AT RemoteServer
It seems like it's much faster to pull data from a linked server than to push data to a linked server: Which one is more efficient: select from linked server or insert into linked server?
Update: My own, recent experience confirms this. Pull if possible -- it will be much, much faster.
Try this on the other server:
INSERT INTO Local_Table
SELECT * FROM RemoteServer.RemoteDb..Remote_Table
The Import/Export wizard will essentially do this as a bulk insert, whereas your code does not.
Assuming that you have a clustered index on the remote table, make sure the local table has the same clustered index, set trace flag 610 globally on your remote server (a sketch for this follows the code below), and make sure the remote database is in SIMPLE or BULK_LOGGED recovery mode.
If your remote table is a heap (which will speed things up anyway), make sure your remote database is in SIMPLE or BULK_LOGGED recovery mode, and change your code to read as follows:
INSERT INTO RemoteServer.RemoteDb..Remote_Table WITH(TABLOCK)
SELECT * FROM Local_Table WITH (nolock)
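The trace flag mentioned above is set with DBCC TRACEON; a sketch (on SQL Server 2008, TF610 relates to minimally logged inserts into indexed tables):

DBCC TRACEON (610, -1);  -- -1 = set globally; run this on the remote server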
The reason it's so slow to insert into the remote table from the local table is that it inserts a row, checks that the row was inserted, then inserts the next row, checks that it was inserted, and so on.
Don't know if you figured this out or not, but here's how I solved this problem using linked servers.
First, I have a LocalDB.dbo.Table with several columns:
IDColumn (int, PK, Auto Increment)
TextColumn (varchar(30))
IntColumn (int)
And I have a RemoteDB.dbo.Table that is almost the same:
IDColumn (int)
TextColumn (varchar(30))
IntColumn (int)
The main difference is that the remote IDColumn isn't set up as an identity column, so that I can insert into it.
Then I set up a trigger on the remote table that fires on DELETE:
Create Trigger Table_Del
On Table
After Delete
AS
Begin
    Set NOCOUNT ON;

    Insert Into Table (IDColumn, TextColumn, IntColumn)
    Select IDColumn, TextColumn, IntColumn from MainServer.LocalDB.dbo.table L
    Where not exists (Select * from Table R Where L.IDColumn = R.IDColumn)
END
Then when I want to do an insert, I do it like this from the local server:
Insert Into LocalDB.dbo.Table (TextColumn, IntColumn) Values ('textvalue', 123);
Delete From RemoteServer.RemoteDB.dbo.Table Where IDColumn = 0;
--And if I want to clean the table out and make sure it has all the most up to date data:
Delete From RemoteServer.RemoteDB.dbo.Table
By triggering the remote server to pull the data from the local server and then do the insert, I was able to turn a job that took 30 minutes to insert 1258 rows into a job that takes 8 seconds.
This does require a linked server connection on both sides, but once that's set up it works pretty well.
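For completeness, each side's linked server is created with the standard system procedures; a sketch with placeholder names:

-- Run on the remote server so it can reach back to the local one (mirror it on the local server)
EXEC sp_addlinkedserver @server = N'MAINSERVER', @srvproduct = N'SQL Server';
EXEC sp_addlinkedsrvlogin @rmtsrvname = N'MAINSERVER', @useself = N'True';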
Update:
So in the last few years I've made some changes, and have moved away from the delete trigger as a way to sync the remote table.
Instead I have a stored procedure on the remote server that has all the steps to pull the data from the local server:
CREATE PROCEDURE [dbo].[UpdateTable]
-- Add the parameters for the stored procedure here
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
-- Insert statements for procedure here
--Fill Temp table
Insert Into WebFileNamesTemp Select * From MAINSERVER.LocalDB.dbo.WebFileNames
--Fill normal table from temp table
Delete From WebFileNames
Insert Into WebFileNames Select * From WebFileNamesTemp
--empty temp table
Delete From WebFileNamesTemp
END
And on the local server I have a scheduled job that does some processing on the local tables, and then triggers the update through the stored procedure:
EXEC sp_serveroption @server='REMOTESERVER', @optname='rpc', @optvalue='true'
EXEC sp_serveroption @server='REMOTESERVER', @optname='rpc out', @optvalue='true'
EXEC REMOTESERVER.RemoteDB.dbo.UpdateTable
EXEC sp_serveroption @server='REMOTESERVER', @optname='rpc', @optvalue='false'
EXEC sp_serveroption @server='REMOTESERVER', @optname='rpc out', @optvalue='false'
If you must push data from the source to the target (e.g., for firewall or other permissions reasons), you can do the following:
In the source database, convert the recordset to a single XML string (i.e., multiple rows and columns combined into a single XML value).
Then push that XML across as a single row, as a varchar(max), since the xml type isn't allowed across linked servers in SQL Server.
DECLARE @xml XML
SET @xml = (select * from SourceTable FOR XML path('row'))
Insert into TempTargetTable values (cast(@xml AS VARCHAR(max)))
In the target database, cast the varchar(max) as XML and then use XML parsing to turn that single row and column back into a normal recordset.
DECLARE @X XML = (select '<toplevel>' + ImportString + '</toplevel>' from TempTargetTable)
DECLARE @iX INT
EXEC sp_xml_preparedocument @iX output, @X

insert into TargetTable
SELECT [col1],
       [col2]
FROM OPENXML(@iX, '//row', 2)
WITH ([col1] [int],
      [col2] [varchar](128))

EXEC sp_xml_removedocument @iX
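On SQL Server 2005 and later, the same shredding can alternatively be done with the xml type's nodes() method, which avoids the sp_xml_preparedocument/sp_xml_removedocument pairing; a sketch under the same two-column assumption:

-- The xml type accepts fragments, so the <toplevel> wrapper isn't needed here
DECLARE @X XML = (select cast(ImportString as xml) from TempTargetTable)

insert into TargetTable
SELECT r.value('(col1)[1]', 'int'),
       r.value('(col2)[1]', 'varchar(128)')
FROM @X.nodes('/row') AS T(r)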
I've found a workaround. Since I'm not a big fan of GUI tools like SSIS, I've reused a bcp script to unload the table into a CSV file and load it back on the other side. (Yes, it's an odd situation that the bulk tooling supports files but not tables.) Feel free to edit the following script to fit your needs:
exec xp_cmdshell 'bcp "select * from YourLocalTable" queryout C:\CSVFolder\Load.csv -w -T -S .'
exec xp_cmdshell 'bcp YourAzureDBName.dbo.YourAzureTable in C:\CSVFolder\Load.csv -S yourdb.database.windows.net -U youruser@yourdb.database.windows.net -P yourpass -q -w'
Pros:
No need to define table structures every time.
I've tested it, and it worked way faster than inserting directly through the linked server.
It's easier to manage than XML (which is limited to varchar(max) length anyway).
No need for an extra layer of abstraction (tools like SSIS).

Cons:
It uses the external tool bcp through the xp_cmdshell interface.
Table properties are lost after exporting/importing via CSV (datatype, nullability, length, separators within values, etc.).
We have a stored procedure that builds up some dynamic SQL and executes it via a parametrised call to sp_executesql.
Under normal conditions this works wonderfully and has produced a large improvement in execution times for the procedure (~8 seconds down to ~1 second). However, under some unknown conditions something strange happens and performance goes completely the other way (~31 seconds), but only when executed via RPC (i.e. a call from a .NET app with SqlCommand.CommandType set to CommandType.StoredProcedure, or as a remote query from a linked server). If it is executed as a SQL batch in SQL Server Management Studio, we do not see the degradation.
Altering the whitespace in the generated SQL and recompiling the stored procedure seems to resolve the issue, at least in the short term, but we'd like to understand the cause, or find a way to force the execution plans to be rebuilt for the generated SQL; at the moment I'm not sure how to proceed with either.
To illustrate, the Stored Procedure, looks a little like:
CREATE PROCEDURE [dbo].[usp_MyObject_Search]
    @IsActive AS BIT = NULL,
    @IsTemplate AS BIT = NULL
AS
    DECLARE @WhereClause NVARCHAR(MAX) = ''

    IF @IsActive IS NOT NULL
    BEGIN
        SET @WhereClause += ' AND (svc.IsActive = @xIsActive) '
    END

    IF @IsTemplate IS NOT NULL
    BEGIN
        SET @WhereClause += ' AND (svc.IsTemplate = @xIsTemplate) '
    END

    DECLARE @Sql NVARCHAR(MAX) = '
        SELECT svc.[MyObjectId],
               svc.[Name],
               svc.[IsActive],
               svc.[IsTemplate]
        FROM dbo.MyObject svc WITH (NOLOCK)
        WHERE 1=1 ' + @WhereClause + '
        ORDER BY svc.[Name] Asc'

    EXEC sp_executesql @Sql, N'@xIsActive BIT, @xIsTemplate BIT',
         @xIsActive = @IsActive, @xIsTemplate = @IsTemplate
With this approach, the query plan is cached for each permutation of NULL/not-NULL, and we get the benefit of cached query plans. What I don't understand is why it would use a different query plan when executed remotely vs. locally after "something happens"; I also don't understand what that "something" might be.
I realise I could move away from parametrisation, but then we'd lose the benefit of caching what are normally good execution plans.
I would suspect parameter sniffing. If you are on SQL Server 2008, you could try adding OPTIMIZE FOR UNKNOWN to reduce the chance that, when a plan is generated, it is tuned for atypical parameter values.
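Applied to the generated statement from the question, that would look something like this (a sketch; the hint is simply appended to the dynamic SQL):

DECLARE @Sql NVARCHAR(MAX) = '
    SELECT svc.[MyObjectId], svc.[Name], svc.[IsActive], svc.[IsTemplate]
    FROM dbo.MyObject svc WITH (NOLOCK)
    WHERE 1=1 ' + @WhereClause + '
    ORDER BY svc.[Name] Asc
    OPTION (OPTIMIZE FOR UNKNOWN)'  -- optimise for average statistics rather than the sniffed parameter values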
RE: What I don't understand is why it would use a different query plan when executed remotely vs. locally after "something happens"
When you execute it in SSMS, it won't reuse the same bad plan because SSMS uses different SET options (e.g. SET ARITHABORT ON), so it compiles a new plan that works well for the parameter values you are currently testing.
You can see these plans with
SELECT usecounts, cacheobjtype, objtype, text, query_plan, value as set_options
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
CROSS APPLY sys.dm_exec_query_plan(plan_handle)
CROSS APPLY sys.dm_exec_plan_attributes(plan_handle) AS epa
WHERE text like '%FROM dbo.MyObject svc WITH (NOLOCK)%'
  AND attribute = 'set_options'
Edit
The following bit is just in response to badbod99's answer
create proc #foo @mode bit, @date datetime
as
    declare @Sql nvarchar(max)
    if (@mode = 1)
        set @Sql = 'select top 0 * from sys.objects where create_date < @date /*44FC79BD-2AF5-4774-9674-04D6C3D4B228*/'
    else
        set @Sql = 'select top 0 * from sys.objects where modify_date < @date /*44FC79BD-2AF5-4774-9674-04D6C3D4B228*/'
    EXEC sp_executesql @Sql, N'@date datetime',
         @date = @date
go

declare @d datetime
set @d = getdate()
exec #foo 0, @d
exec #foo 1, @d
SELECT usecounts, cacheobjtype, objtype, text, query_plan, value as set_options
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
CROSS APPLY sys.dm_exec_query_plan(plan_handle)
CROSS APPLY sys.dm_exec_plan_attributes(plan_handle) AS epa
WHERE text like '%44FC79BD-2AF5-4774-9674-04D6C3D4B228%'
  AND attribute = 'set_options'
Returns: (results grid omitted)
Recompilation
Any time the execution of the SP would differ significantly because of conditional statements, the execution plan cached from the last request may not be optimal for the current one.
It's all about when SQL Server compiles the execution plan for the SP. The key section regarding SP compilation in the Microsoft docs is this:
... this optimization occurs automatically the first time a stored procedure is run after SQL Server is restarted. It also occurs if an underlying table that is used by the stored procedure changes. But if a new index is added from which the stored procedure might benefit, optimization does not occur until the next time that the stored procedure is run after SQL Server is restarted. In this situation, it can be useful to force the stored procedure to recompile the next time that it executes
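Forcing that recompile is a one-liner with sp_recompile; for example, using the proc from the question:

EXEC sp_recompile N'dbo.usp_MyObject_Search';  -- invalidates the cached plan; it is rebuilt on the next execution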
SQL Server does recompile execution plans at times; from the Microsoft docs:
SQL Server automatically recompiles stored procedures and triggers when it is advantageous to do this.
... but it will not do this on every call (unless WITH RECOMPILE is used), so if each execution can result in different SQL, you may be stuck with a stale plan for at least one call.
RECOMPILE query hint
The RECOMPILE query hint takes your parameter values into account when checking what needs to be recompiled, at the statement level.
WITH RECOMPILE option
WITH RECOMPILE (see section F in the docs) causes the execution plan to be compiled on every call, so you will never have a sub-optimal plan, but you will pay the compilation overhead each time.
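In the question's proc, the statement-level variant would simply append the hint to the dynamic SQL before the sp_executesql call; a sketch:

SET @Sql = @Sql + ' OPTION (RECOMPILE)'  -- fresh statement-level plan on every execution, at the cost of compiling each time
EXEC sp_executesql @Sql, N'@xIsActive BIT, @xIsTemplate BIT',
     @xIsActive = @IsActive, @xIsTemplate = @IsTemplate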
Restructure into multiple SPs
Looking at your specific case, the execution plan for the proc itself never changes, and the two generated SQL statements should each get prepared execution plans.
I would suggest that restructuring the code to split the SP into several procs, rather than generating SQL conditionally, would simplify things and ensure you always have the optimal execution plan without any SQL magic sauce; a sketch follows.
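A sketch of that restructuring for the proc in the question (the name is illustrative; one proc per filter permutation, so each keeps its own stable cached plan):

CREATE PROCEDURE dbo.usp_MyObject_Search_ByIsActive  -- hypothetical name
    @IsActive BIT
AS
    SELECT svc.MyObjectId, svc.Name, svc.IsActive, svc.IsTemplate
    FROM dbo.MyObject svc
    WHERE svc.IsActive = @IsActive
    ORDER BY svc.Name ASC
-- ...plus _ByIsTemplate, _ByBoth and _All variants, each with a straightforward static WHERE clause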
I need the behaviour of SQL Server 2005, where the OBJECT_NAME function takes two arguments (an object id and a database id), whereas on SQL Server 2000 it takes only an object id, so it must execute in the context of the database the inspected object belongs to.
The solution must be implementable in a function, so it can be used in a SELECT query.
In SQL 2005 and up it is of course trivial to do this. The problem is SQL 2000. I used 2000 a lot back when, but no longer have access to any installations of it; the rest of this is largely from memory, and may be inaccurate.
The key thing is how to retrieve data from a database other than the "current" database, when you cannot know what that other database (or databases) will be at the time the code is written. (Yes, the db_id parameter is very convenient!) For this problem and for similar problems, the general work-around is to create dynamic code, something like:
SET @Command = 'select name from ' + @dbname + '.dbo.sysobjects where id = ' + @ObjectId
EXECUTE (@Command)
The problem is, I'm pretty sure you can't run dynamic code within functions (or perhaps just within SQL 2000 functions).
You might have to resort to creating a temp table, populating it via a dynamic query, and then using it within the "main" query you are trying to write. Pseudo-code would look like:
CREATE #TempTable
IF SQL 2000 or earlier
    INSERT #TempTable EXECUTE ('select data from TargetDb.dbo.sysobjects')
    -- Note that the entire INSERT may need to be inside the dynamic statement
ELSE
    INSERT #TempTable SELECT [from query based on object_id]

SELECT [the data you need]
FROM YourTable
JOIN #TempTable
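Concretely, on SQL 2000 the dynamic branch might look like this (a sketch; note that in 2000-era sysobjects the column is named id, not object_id, and the database name here is illustrative):

CREATE TABLE #TempTable (id int, name sysname)

DECLARE @dbname sysname, @Command varchar(500)
SET @dbname = 'TargetDb'  -- illustrative database name
SET @Command = 'select id, name from ' + @dbname + '.dbo.sysobjects'
INSERT #TempTable EXECUTE (@Command)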
In SQL Server 2005 SP2 and up, use:
OBJECT_NAME ( object_id [, database_id ] )
for example:
SELECT TOP 10
    object_schema_name(objectid, dbid) as [SchemaName],
    object_name(objectid, dbid) as [ObjectName],
    E.*
FROM sys.dm_exec_cached_plans P
CROSS APPLY sys.dm_exec_query_plan(P.plan_handle) E