Table Valued Parameter has slow performance because of table scan - sql-server

I have an application that passes parameters to a procedure in SQL. One of the parameters is a table-valued parameter containing items to include in a where clause.
Because the table-valued parameter has no statistics attached to it, I get a very slow query when I join the TVP to a table that has 2 million rows.
What alternatives do I have?
Again, the goal is to pass certain values to a procedure that will be included in a where clause:
select * from table1 where id in
(select id from @mytvp)
or
select * from table1 t1 join @mytvp
tvp on t1.id = tvp.id

Although it looks like it would need to run the subquery once for each row in table1, EXISTS often optimizes to be more efficient than a JOIN or an IN. So, try this:
select * from table1 t where exists (select 1 from @mytvp p where t.id=p.id)
Also, be sure that t.id is the same datatype as p.id and that t.id has an index.

You can use a temp table with an index to boost performance (assuming you have more than a couple of records in your @mytvp).
Just before you join the table, insert the data from the variable @mytvp into a temp table.
Here's sample code to create a temp table with indexes. The PRIMARY KEY and UNIQUE constraints determine which columns to index on:
CREATE TABLE #temp_employee_v3
(rowID int not null identity(1,1)
,lname varchar (30) not null
,fname varchar (30) not null
,city varchar (20) not null
,state char (2) not null
,PRIMARY KEY (lname, fname, rowID)
,UNIQUE (state, city, rowID) )
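Applied to the question, a minimal sketch might look like this (assuming the TVP @mytvp has a single int column named id):
CREATE TABLE #ids
(
id int NOT NULL PRIMARY KEY -- the primary key gives the temp table an index
)
INSERT INTO #ids (id)
SELECT id FROM @mytvp -- copy the TVP rows into the indexed temp table
SELECT t1.*
FROM table1 t1
JOIN #ids i ON t1.id = i.id -- unlike a TVP, the temp table has statistics
DROP TABLE #ids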

I had the same issue that table-valued parameters were very slow in my context. I came up with a solution that passed the list of values as a comma-separated string to the stored procedure. The procedure then made a PATINDEX(...) > 0 comparison. This was about a factor of six faster.
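The answer doesn't include the code, but the idea might look roughly like this; the procedure name, parameter name, and the comma-wrapping convention are all assumptions:
CREATE PROCEDURE dbo.GetByIdList
@idList VARCHAR(MAX) -- assumed format with leading/trailing delimiters, e.g. ',1,5,42,'
AS
BEGIN
SELECT *
FROM table1
WHERE PATINDEX('%,' + CAST(id AS VARCHAR(20)) + ',%', @idList) > 0 -- delimiters prevent 1 from matching 10
END
Note that this still scans table1; the answer only reports that it happened to be faster in that particular context.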

As mentioned here and explained here, you can have primary key and unique constraints on the table type. E.g.
CREATE TYPE IdList AS TABLE ( Id UNIQUEIDENTIFIER NOT NULL PRIMARY KEY )
However, check whether it improves performance in your case: these indexes exist while the TVP is being populated, which can have the opposite effect depending on whether your input is sorted and/or you use more than one column.
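For completeness, a short sketch of populating and joining such a type (it assumes table1.id is also a UNIQUEIDENTIFIER):
DECLARE @ids IdList
INSERT INTO @ids (Id) VALUES (NEWID()), (NEWID()) -- sample values
SELECT t1.*
FROM table1 t1
JOIN @ids i ON t1.id = i.Id -- the PRIMARY KEY on the type backs this join with an index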

In common with table variables, table-valued parameters have no statistics (see the section "restrictions"); the query optimiser works on the assumption that they contain only one row, which if your parameter contains a lot of rows is likely to result in an inappropriate query plan.
One way to improve your chances of a better plan is to add a statement level recompile; this should enable the optimiser to take the size of the TVP into account when selecting a plan.
select * from table1 t where exists (select 1 from @mytvp p where t.id=p.id) OPTION (RECOMPILE)
(incorporating KM's suggestion)

Related

Splitting multiple fields by delimiter

I have to write an SP that can perform partial updates on our databases; the changes are stored in records of the PU table. A Values field contains all values, delimited by a fixed delimiter. A Table field refers to a Schemes table, which contains the column names for each table in a similar fashion in a Columns field.
Now for my SP I need to split the Values field and the Columns field into a temp table with column/value pairs; this happens for each record in the PU table.
An example:
Our PU table looks something like this:
CREATE TABLE [dbo].[PU](
[Table] [nvarchar](50) NOT NULL,
[Values] [nvarchar](max) NOT NULL
)
Insert SQL for this example:
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','John Doe;26');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','Jane Doe;22');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','Mike Johnson;20');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','Mary Jane;24');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Course','Mathematics');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Course','English');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Course','Geography');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Campus','Campus A;Schools Road 1;Educationville');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Campus','Campus B;Schools Road 31;Educationville');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Campus','Campus C;Schools Road 22;Educationville');
And we have a Schemes table similar to this:
CREATE TABLE [dbo].[Schemes](
[Table] [nvarchar](50) NOT NULL,
[Columns] [nvarchar](max) NOT NULL
)
Insert SQL for this example:
INSERT INTO [dbo].[Schemes]([Table],[Columns]) VALUES ('Person','[Name];[Age]');
INSERT INTO [dbo].[Schemes]([Table],[Columns]) VALUES ('Course','[Name]');
INSERT INTO [dbo].[Schemes]([Table],[Columns]) VALUES ('Campus','[Name];[Address];[City]');
As a result, the first record of the PU table should result in a temp table like:
Column|Value
[Name]|John Doe
[Age]|26
The 5th will have:
Column|Value
[Name]|Mathematics
Finally, the 8th PU record should result in:
Column|Value
[Name]|Campus A
[Address]|Schools Road 1
[City]|Educationville
You get the idea.
I tried to use the following query to create the temp tables, but alas, it fails when there's more than one value in the PU record:
DECLARE @Fields TABLE
(
[Column] INT,
[Value] VARCHAR(MAX)
)
INSERT INTO @Fields
SELECT TOP 1
(SELECT Value FROM STRING_SPLIT([dbo].[Schemes].[Columns], ';')),
(SELECT Value FROM STRING_SPLIT([dbo].[PU].[Values], ';'))
FROM [dbo].[PU] INNER JOIN [dbo].[Schemes] ON [dbo].[PU].[Table] = [dbo].[Schemes].[Table]
TOP 1 correctly gets the first PU record as each PU record is removed once processed.
The error is:
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
In the case of a Person record, the splits do indeed return 2 values/columns at a time; I just want to store the values in 2 records instead of getting an error.
Any help on rewriting the above query?
Also do note that the data is just generic nonsense. The point of the question is being able to have 2 fields that both contain delimited values, always equal in number (e.g. a 'Person' record in the PU table will always have 2 delimited values in the field), and break them up into several column/value rows.
UPDATE: Working implementation
Based on the (accepted) answer of Sean Lange, I was able to work out the following implementation to overcome the issue:
As I need to reuse it, the combine column/value functionality is performed by a new function, declared as such:
CREATE FUNCTION [dbo].[JoinDelimitedColumnValue]
(@splitValues VARCHAR(8000), @splitColumns VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH MyValues AS
(
SELECT ColumnPosition = x.ItemNumber,
ColumnValue = x.Item
FROM dbo.DelimitedSplit8K(@splitValues, @pDelimiter) x
)
, ColumnData AS
(
SELECT ColumnPosition = x.ItemNumber,
ColumnName = x.Item
FROM dbo.DelimitedSplit8K(@splitColumns, @pDelimiter) x
)
SELECT cd.ColumnName,
v.ColumnValue
FROM MyValues v
JOIN ColumnData cd ON cd.ColumnPosition = v.ColumnPosition
;
In case of the above sample data, I'd call this function with the following SQL:
DECLARE @FieldValues VARCHAR(8000), @FieldColumns VARCHAR(8000), @Delimiter CHAR(1) = ';' -- @Delimiter declared here so the snippet is self-contained
SELECT TOP 1 @FieldValues=[dbo].[PU].[Values], @FieldColumns=[dbo].[Schemes].[Columns] FROM [dbo].[PU] INNER JOIN [dbo].[Schemes] ON [dbo].[PU].[Table] = [dbo].[Schemes].[Table]
INSERT INTO @Fields
SELECT [Column] = x.[ColumnName],[Value] = x.[ColumnValue] FROM [dbo].[JoinDelimitedColumnValue](@FieldValues, @FieldColumns, @Delimiter) x
This data structure makes this way more complicated than it should be. You can leverage the splitter from Jeff Moden here: http://www.sqlservercentral.com/articles/Tally+Table/72993/. The main difference between that splitter and all the others is that his returns the ordinal position of each element. Why all the other splitters don't do this is beyond me; for things like this it is needed. You have two sets of delimited data, and you must ensure that they are both reassembled in the correct order.
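For illustration, this is the shape DelimitedSplit8K returns; the ItemNumber column is the ordinal position the rest of this answer relies on:
SELECT ItemNumber, Item
FROM dbo.DelimitedSplit8K('John Doe;26', ';')
-- ItemNumber  Item
-- 1           John Doe
-- 2           26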
The biggest issue I see is that you don't have anything in your main table to function as an anchor for ordering the results correctly. You need something, even an identity column, to ensure the output rows stay "together". To accomplish this I just added an identity to the PU table.
alter table PU add RowOrder int identity not null
Now that we have an anchor this is still a little cumbersome for what should be a simple query but it is achievable.
Something like this will now work.
with MyValues as
(
select p.[Table]
, ColumnPosition = x.ItemNumber
, ColumnValue = x.Item
, RowOrder
from PU p
cross apply dbo.DelimitedSplit8K(p.[Values], ';') x
)
, ColumnData as
(
select ColumnName = replace(replace(x.Item, ']', ''), '[', '')
, ColumnPosition = x.ItemNumber
, s.[Table]
from Schemes s
cross apply dbo.DelimitedSplit8K(s.Columns, ';') x
)
select cd.[Table]
, v.ColumnValue
, cd.ColumnName
from MyValues v
join ColumnData cd on cd.[Table] = v.[Table]
and cd.ColumnPosition = v.ColumnPosition
order by v.RowOrder
, v.ColumnPosition
I recommend not storing values like this in the first place. I recommend having a key value in the tables, preferably not using Table and Columns as a composite key, and avoiding reserved words. I also don't know what version of SQL Server you are using; I am going to assume a fairly recent version of Microsoft SQL Server that will support my provided stored procedure.
Here is an overview of the solution:
1) You need to convert both the PU and the Schema table into a table where each "column" value in the list of columns is isolated in its own row. If you can store the data in this format rather than the provided format, you will be a little better off.
What I mean is
Table|Columns
Person|Jane Doe;22
needs to be converted to
Table|Column|OrderInList
Person|Jane Doe|1
Person|22|2
There are multiple ways to do this, but I prefer an XML trick that I picked up. You can find multiple split-string examples online, so I will not focus on that; use whatever gives you the best performance. Unfortunately, you might not be able to get away from this table-valued function.
Update:
Thanks to Shnugo's performance enhancement comment, I have updated my XML splitter to give you the row number, which reduces some of my code. I do the exact same thing to the Schema list.
2) Since the new Schema table and the new PU table now record the order in which each column appears, they can be joined on the "Table" and the OrderInList.
CREATE FUNCTION [dbo].[fnSplitStrings_XML]
(
@List NVARCHAR(MAX),
@Delimiter VARCHAR(255)
)
RETURNS TABLE
AS
RETURN
(
SELECT y.i.value('(./text())[1]', 'nvarchar(4000)') AS Item,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) as RowNumber
FROM
(
SELECT CONVERT(XML, '<i>'
+ REPLACE(@List, @Delimiter, '</i><i>')
+ '</i>').query('.') AS x
) AS a CROSS APPLY x.nodes('i') AS y(i)
);
GO
CREATE Procedure uspGetColumnValues
as
Begin
--Split each value in PU
select p.[Table],p.[Values],a.[Item],CHARINDEX(a.Item,p.[Values]) as LocationInStringForSorting,a.RowNumber
into #PuWithOrder
from PU p
cross apply [fnSplitStrings_XML](p.[Values],';') a --use whatever string split function is working best for you (performance wise)
--Split each value in Schema
select s.[Table],s.[Columns],a.[Item],CHARINDEX(a.Item,s.[Columns]) as LocationInStringForSorting,a.RowNumber
into #SchemaWithOrder
from Schemes s
cross apply [fnSplitStrings_XML](s.[Columns],';') a --use whatever string split function is working best for you (performance wise)
DECLARE @Fields TABLE --If this is an ETL process, maybe make this a permanent table with an auto incrementing Id and reference this table in all steps after this.
(
[Table] NVARCHAR(50),
[Columns] NVARCHAR(MAX),
[Column] VARCHAR(MAX),
[Value] VARCHAR(MAX),
OrderInList int
)
INSERT INTO @Fields([Table],[Columns],[Column],[Value],OrderInList)
Select pu.[Table],pu.[Values] as [Columns],s.Item as [Column],pu.Item as [Value],pu.RowNumber
from #PuWithOrder pu
join #SchemaWithOrder s on pu.[Table]=s.[Table] and pu.RowNumber=s.RowNumber
Select [Table],[Columns],[Column],[Value],OrderInList
from @Fields
order by [Table],[Columns],OrderInList
END
GO
EXEC uspGetColumnValues
GO
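As a quick sanity check (not part of the original answer), calling the XML splitter directly shows each element with its position:
SELECT Item, RowNumber
FROM dbo.fnSplitStrings_XML('Campus A;Schools Road 1;Educationville', ';')
-- Item            RowNumber
-- Campus A        1
-- Schools Road 1  2
-- Educationville  3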
Update:
Since your working implementation is a table-valued function, I have another recommendation. The problem I see is that you're using a table-valued function, which ultimately handles one record at a time; you are going to have better performance with set-based operations and batching as needed. With a table-valued function, you are likely going to be looping through each row. If this is some sort of ETL process, your team will be better off if you have a stored procedure that processes the rows in bulk. It might make sense to stage the results into a better table that your team can work with downstream, rather than have them use a potentially slow table-valued function.

Does MS SQL Server automatically create a temp table if the query contains a lot of id's in an 'IN' clause

I have a big query to get multiple rows by id's like
SELECT *
FROM TABLE
WHERE Id in (1001..10000)
This query runs very slowly and ends up with a timeout exception.
A temporary fix is to query with a limit, breaking the query into 10 parts of 1000 id's each.
I heard that using temp tables may help in this case, but it also looks like MS SQL Server does something like that automatically underneath.
What is the best way to handle problems like this?
You could write the query as follows using a temporary table:
CREATE TABLE #ids(Id INT NOT NULL PRIMARY KEY);
INSERT INTO #ids(Id) VALUES (1001),(1002),/*add your individual Ids here*/,(10000);
SELECT
t.*
FROM
[Table] AS t
INNER JOIN #ids AS ids ON
ids.Id=t.Id;
DROP TABLE #ids;
My guess is that it will probably run faster than your original query, since the lookup can be done directly using an index (if one exists on the [Table].Id column).
Your original query translates to
SELECT *
FROM [TABLE]
WHERE Id=1001 OR Id=1002 OR /*...*/ OR Id=10000;
This would require evaluation of the expression Id=1001 OR Id=1002 OR /*...*/ OR Id=10000 for every row in [Table], which probably takes longer than with a temporary table. The example with a temporary table takes each Id in #ids and looks for a corresponding Id in [Table] using an index.
This all assumes that there are gaps in the Id's between 1001 and 10000. Otherwise it would be easier to write
SELECT *
FROM [TABLE]
WHERE Id BETWEEN 1001 AND 10000;
This would also require an index on [Table].Id to speed it up.
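If [Table].Id isn't indexed yet, creating one is a one-liner (the index name is an assumption; [Table] is the question's placeholder):
CREATE INDEX IX_Table_Id ON [Table](Id);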

Sql Server Performance: table variable inner join vs multiple conditions in where clause

What is faster in MS SQL Server: a where clause with multiple conditions, or an inner join after creating a table variable? For example:
select A.* from A where A.fk='one ' or A.fk='two ' or A.fk='three' ...etc.
vs
declare @temp table ([key] char(5)); -- char(5) stands in for the original's char(matchingWidth); use the width of A.fk
insert into @temp values ('one ');
insert into @temp values ('two ');
insert into @temp values ('three');
select A.* from A inner join @temp t on A.fk=t.[key];
I know normally the difference would be negligible; however, sadly the database I am querying uses the char type for primary keys...
If it helps, in my particular case, table A has a few million records, and there would usually be about a hundred ids I'd be querying for. The column is indexed, but not a clustered index.
EDIT: I am also open to the same thing with a temp table... although I was under the impression that a temp table and a table variable were virtually identical in terms of performance.
Thanks!
In most cases the first approach will win, as a table variable has no statistics: you'll notice a big performance decrease with a large amount of data. When you have just a few values, there shouldn't be any noticeable difference.
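If you do stick with the table variable, a statement-level recompile (the same trick suggested for TVPs earlier on this page) lets the optimizer see the variable's actual row count before picking a plan:
select A.* from A inner join @temp t on A.fk=t.[key] option (recompile);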

How to use INSERT SELECT?

I have a table's structure:
[Subjects]:
id int Identity Specification yes
Deleted bit
[Juridical]:
id int
Name varchar
typeid int
[Individual]:
id int
Name varchar
Juridical and Individual are child classes of the Subjects class, meaning that corresponding rows in the Individual and Subjects tables share the same id.
Now I have a table:
[MyTable]:
typeid varchar
Name varchar
I want to select data from this table and insert it into my table structure, but I don't know how. I tried to use OUTPUT:
INSERT INTO [Individual](Name)
OUTPUT false
INTO [Subjects].[Deleted]
SELECT [MyTable].[Name] as Name
FROM [MyTable]
WHERE [MyTable].[type] = 'Indv'
But the syntax is not correct.
Just use:
INSERT INTO Individual(Name)
SELECT [MyTable].[Name] as Name
FROM [MyTable]
WHERE [MyTable].[type] = 'Indv'
and
INSERT INTO Subjects(Deleted)
SELECT 0 AS Deleted
FROM [MyTable]
WHERE [MyTable].[type] = 'Indv'
You can't insert into two tables in a single query; you need two separate queries for that. For that reason I split your initial query into two INSERT statements, to add records to both your Individual and Subjects tables.
Just as @marc_s said, the number of columns in your SELECT statement must match the number of columns you want to insert into your tables.
Other than these two constraints, which are both related to syntax, you are fully allowed to do any filtering in the SELECT part or apply any complex logic you would use in a normal SELECT query.
You need to use this syntax:
INSERT INTO [Individual] (Name)
SELECT [MyTable].[Name]
FROM [MyTable]
WHERE [MyTable].[type] = 'Indv'
You should define the list of columns to insert into in the INSERT INTO line, and then you must have a SELECT that returns exactly that many columns (and the column types need to match, too).

Including value from temp table slows down query

I have a stored procedure that uses a temporary table to make some joins in a select clause. The select clause contains the value from the Id column of the temporary table like this:
CREATE TABLE #TempTable
(
Id INT PRIMARY KEY,
RootVal INT
)
The Select looks like this:
Select value1, value2, #TempTable.Id AS ValKey
From MainTable INNER JOIN #TempTable ON MainTable.RootVal = #TempTable.RootVal
The query takes over a minute to run in real life, but if I remove "#TempTable.Id" from the select list it runs in a second.
Does anyone know why there is such a huge cost to including a value from a #temp table compared to just using it in a join?
Most likely:
- a data type mismatch (e.g. nvarchar vs int)
- lack of an index on MainTable.RootVal
Why have Id as the PK and then JOIN on another column?
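If the missing index turns out to be the culprit, one possible fix is a covering index. A sketch, where value1 and value2 are assumed to be columns of MainTable:
CREATE INDEX IX_MainTable_RootVal
ON MainTable (RootVal)
INCLUDE (value1, value2); -- covers the select list so the join needs no key lookups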
