concatenate multiple results of table-valued function into a single table - sql-server

Let's assume there is such a table in a SQL Server 2008 database:
CREATE TABLE [dbo].[Test] (
[TableId] [int] IDENTITY(1,1) NOT NULL,
[Data] [xml] NOT NULL
)
and also I have such a table-valued function to parse the Data column in my table:
ALTER FUNCTION [dbo].[fnParseTable] (@header XML)
RETURNS @parsedTable TABLE (
[Type] NVARCHAR(50),
[Value] NVARCHAR(50)
)
AS BEGIN
--parse xml here
RETURN
END
Can I concatenate all results of this function for each row of the table?
I need something like this:
SELECT UNION fnParseTable(Data) FROM dbo.Test
PS. I know I can do it using a cursor, but I want to make sure there is no easier solution.
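For reference, CROSS APPLY invokes a table-valued function once per row and concatenates the per-row results, which is exactly the shape asked for; a minimal sketch, assuming fnParseTable compiles as above:
SELECT t.TableId, p.[Type], p.[Value]
FROM dbo.Test t
CROSS APPLY dbo.fnParseTable(t.Data) AS p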

You don't need a table-valued function; use XPath to extract these values directly in a SELECT statement:
SELECT
Data.query('data(/xpath/to[@your="type"])') AS type,
Data.query('data(/xpath/to[@your="value"])') AS value
FROM Test
/* JOINs, WHERE, HAVING, GROUP BY and/or ORDER BY clauses */
query() executes an XPath expression, while data() extracts a value from the resulting XML node.
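Note that value(), used in the update below, extracts a typed scalar directly rather than returning an XML fragment; a one-line sketch against the same placeholder path (the path itself is illustrative):
SELECT Data.value('(/xpath/to[@your="type"])[1]', 'nvarchar(50)') AS type
FROM Test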
Update
MSDN Link
DECLARE @testTable TABLE(
XmlData XML
)
INSERT INTO @testTable (XmlData)
VALUES ('<row><node><key>key11</key><value>value11</value></node><node><key>key12</key><value>value12</value></node></row>')
INSERT INTO @testTable (XmlData)
VALUES ('<row><node><key>key21</key><value>value21</value></node><node><key>key22</key><value>value22</value></node></row>')
INSERT INTO @testTable (XmlData)
VALUES ('<row><node><key>key31</key><value>value31</value></node><node><key>key32</key><value>value32</value></node></row>')
SELECT
nref.value('key[1]', 'nvarchar(50)') AS [key],
nref.value('value[1]', 'nvarchar(50)') AS value
FROM @testTable CROSS APPLY XmlData.nodes('//node') AS R(nref)
Result
key11 value11
key12 value12
key21 value21
key22 value22
key31 value31
key32 value32

Orangepips' answer seems to be the right solution for your problem.
But taking the question verbatim: YES, there is a way to create a suitable aggregate function using CLR in SQL Server 2005+.
But it is admittedly a bit complicated.
You have to compile the C# code you will find at Invoking CLR User-Defined Aggregate Functions.
I succeeded in running that example, but I prefer to use orangepips' XPath solution when suitable, because it is just T-SQL and requires no
sp_configure 'clr enabled',1
reconfigure

Related

Table valued parameters join performance

I want to pass a list of names to a stored procedure and then perform a left join. I have passed the list of names as a table-valued parameter.
CREATE PROCEDURE [DBO].[INSERTANDGETLATESTNAMES]
(@list [dbo].[NamesCollection] READONLY)
AS
BEGIN
INSERT INTO [dbo].[Employee](NAME)
OUTPUT INSERTED.NAME
SELECT NamesCollection.Name
FROM @list AS NamesCollection
LEFT JOIN [dbo].[Employee] AS emp ON NamesCollection.Name = emp.Name
WHERE emp.Name IS NULL
END
User-defined table type:
CREATE TYPE [dbo].[NamesCollection] AS TABLE
(
[NAME] [varchar](50) NULL
)
GO
SQL Server does not maintain statistics on table-valued parameters; will that affect join performance in the above case? If performance is slow, can I instead pass the list of names as a comma-separated string and write a function to split it and return a table to the stored procedure?
CREATE FUNCTION split_string_XML
(@in_string VARCHAR(MAX),
@delimiter VARCHAR(1))
RETURNS @list TABLE(NAMES VARCHAR(50))
AS
BEGIN
DECLARE @sql_xml XML = Cast('<root><U>'+ Replace(@in_string, @delimiter, '</U><U>')+ '</U></root>' AS XML)
INSERT INTO @list(NAMES)
SELECT f.x.value('.', 'VARCHAR(50)') AS NAMES
FROM @sql_xml.nodes('/root/U') f(x)
WHERE f.x.value('.', 'VARCHAR(50)') <> ''
RETURN
END
GO
or
CREATE FUNCTION split_string_delimiter
(@in_string VARCHAR(MAX),
@delimiter VARCHAR(1))
RETURNS @list TABLE(NAME VARCHAR(50))
AS
BEGIN
INSERT INTO @list(NAME)
SELECT value AS NAME
FROM STRING_SPLIT(@in_string, @delimiter);
RETURN
END
GO
Using STRING_SPLIT is better than using XML PATH for splitting. And if your version supports STRING_SPLIT, it also supports JSON.
Using JSON, transform the JSON variable into a row set and insert it into a #temporary table, not a @table variable: depending on your SQL Server version, #temporary tables perform better when a large amount of data is processed.
Also, if you want to add new fields to the JSON variable, there is no need to edit the table type, while editing the table type is difficult because of referencing.
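A minimal sketch of that JSON approach via OPENJSON (SQL Server 2016+); the variable, the values and the temp table name are illustrative:
DECLARE @json NVARCHAR(MAX) = N'["Alice","Bob","Carol"]';
-- stage into a #temporary table rather than a @table variable, per the advice above
SELECT [value] AS Name
INTO #Names
FROM OPENJSON(@json);
SELECT Name FROM #Names;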
The SELECT in which you're using the table-valued parameter is pretty straightforward.
It's just a join, and given that the cardinality estimate for table variables is one, assuming that dbo.Employee.Name is indexed and that the column types match, that join is going to be implemented as a nested loops join, which is the quickest option for that case.
Just make sure that dbo.Employee.Name is properly indexed.
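For example, a sketch of such an index, assuming one does not already exist:
CREATE NONCLUSTERED INDEX IX_Employee_Name ON dbo.Employee (Name);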

Splitting multiple fields by delimiter

I have to write an SP that can perform partial updates on our databases; the changes are stored in a record of the PU table. A Values field contains all values, delimited by a fixed delimiter. A Table field refers to a Schemes table containing the column names for each table, stored in a similar fashion in a Columns field.
Now for my SP I need to split the Values field and the Columns field into a temp table with column/value pairs; this happens for each record in the PU table.
An example:
Our PU table looks something like this:
CREATE TABLE [dbo].[PU](
[Table] [nvarchar](50) NOT NULL,
[Values] [nvarchar](max) NOT NULL
)
Insert SQL for this example:
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','John Doe;26');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','Jane Doe;22');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','Mike Johnson;20');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','Mary Jane;24');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Course','Mathematics');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Course','English');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Course','Geography');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Campus','Campus A;Schools Road 1;Educationville');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Campus','Campus B;Schools Road 31;Educationville');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Campus','Campus C;Schools Road 22;Educationville');
And we have a Schemes table similar to this:
CREATE TABLE [dbo].[Schemes](
[Table] [nvarchar](50) NOT NULL,
[Columns] [nvarchar](max) NOT NULL
)
Insert SQL for this example:
INSERT INTO [dbo].[Schemes]([Table],[Columns]) VALUES ('Person','[Name];[Age]');
INSERT INTO [dbo].[Schemes]([Table],[Columns]) VALUES ('Course','[Name]');
INSERT INTO [dbo].[Schemes]([Table],[Columns]) VALUES ('Campus','[Name];[Address];[City]');
As a result the first record of the PU table should result in a temp table like:
Column|Value
Name|John Doe
Age|26
The 5th will have:
Column|Value
Name|Mathematics
Finally, the 8th PU record should result in:
Column|Value
Name|Campus A
Address|Schools Road 1
City|Educationville
You get the idea.
I tried using the following query to create the temp tables, but alas, it fails when there's more than one value in the PU record:
DECLARE @Fields TABLE
(
[Column] INT,
[Value] VARCHAR(MAX)
)
INSERT INTO @Fields
SELECT TOP 1
(SELECT Value FROM STRING_SPLIT([dbo].[Schemes].[Columns], ';')),
(SELECT Value FROM STRING_SPLIT([dbo].[PU].[Values], ';'))
FROM [dbo].[PU] INNER JOIN [dbo].[Schemes] ON [dbo].[PU].[Table] = [dbo].[Schemes].[Table]
TOP 1 correctly gets the first PU record, as each PU record is removed once processed.
The error is:
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
In the case of a Person record, the splits are indeed returning 2 values/columns at a time; I just want to store the values in 2 records instead of getting an error.
Any help on rewriting the above query?
Also do note that the data is just generic nonsense. Being able to have two fields that both hold delimited values, always equal in number (e.g. a 'Person' in the PU table will always have 2 delimited values in the field), and break them up into several column/value rows is the point of the question.
UPDATE: Working implementation
Based on the (accepted) answer of Sean Lange, I was able to work out the following implementation to overcome the issue:
As I need to reuse it, the combine column/value functionality is performed by a new function, declared as such:
CREATE FUNCTION [dbo].[JoinDelimitedColumnValue]
(@splitValues VARCHAR(8000), @splitColumns VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH MyValues AS
(
SELECT ColumnPosition = x.ItemNumber,
ColumnValue = x.Item
FROM dbo.DelimitedSplit8K(@splitValues, @pDelimiter) x
)
, ColumnData AS
(
SELECT ColumnPosition = x.ItemNumber,
ColumnName = x.Item
FROM dbo.DelimitedSplit8K(@splitColumns, @pDelimiter) x
)
SELECT cd.ColumnName,
v.ColumnValue
FROM MyValues v
JOIN ColumnData cd ON cd.ColumnPosition = v.ColumnPosition
;
In the case of the above sample data, I'd call this function with the following SQL:
DECLARE @FieldValues VARCHAR(8000), @FieldColumns VARCHAR(8000)
SELECT TOP 1 @FieldValues=[dbo].[PU].[Values], @FieldColumns=[dbo].[Schemes].[Columns] FROM [dbo].[PU] INNER JOIN [dbo].[Schemes] ON [dbo].[PU].[Table] = [dbo].[Schemes].[Table]
INSERT INTO @Fields
SELECT [Column] = x.[ColumnName],[Value] = x.[ColumnValue] FROM [dbo].[JoinDelimitedColumnValue](@FieldValues, @FieldColumns, @Delimiter) x
This data structure makes this way more complicated than it should be. You can leverage the splitter from Jeff Moden here: http://www.sqlservercentral.com/articles/Tally+Table/72993/ The main difference between that splitter and most others is that his returns the ordinal position of each element. Why all the other splitters don't do this is beyond me. For things like this it is needed: you have two sets of delimited data and you must ensure that they are both reassembled in the correct order.
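As an aside, on SQL Server 2022+ and Azure SQL Database, STRING_SPLIT can return the ordinal natively through its optional third argument; a minimal sketch:
SELECT value, ordinal
FROM STRING_SPLIT('John Doe;26', ';', 1); -- 1 enables the ordinal output column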
The biggest issue I see is that you don't have anything in your main table to function as an anchor for ordering the results correctly. You need something, even an identity, to ensure the output rows stay "together". To accomplish this I just added an identity to the PU table.
alter table PU add RowOrder int identity not null
Now that we have an anchor this is still a little cumbersome for what should be a simple query but it is achievable.
Something like this will now work.
with MyValues as
(
select p.[Table]
, ColumnPosition = x.ItemNumber
, ColumnValue = x.Item
, RowOrder
from PU p
cross apply dbo.DelimitedSplit8K(p.[Values], ';') x
)
, ColumnData as
(
select ColumnName = replace(replace(x.Item, ']', ''), '[', '')
, ColumnPosition = x.ItemNumber
, s.[Table]
from Schemes s
cross apply dbo.DelimitedSplit8K(s.Columns, ';') x
)
select cd.[Table]
, v.ColumnValue
, cd.ColumnName
from MyValues v
join ColumnData cd on cd.[Table] = v.[Table]
and cd.ColumnPosition = v.ColumnPosition
order by v.RowOrder
, v.ColumnPosition
I recommend not storing values like this in the first place. I recommend having a key value in the tables, and preferably not using Table and Columns as a composite key. I also recommend avoiding reserved words. I don't know what version of SQL Server you are using; I am going to assume a fairly recent version that will support my provided stored procedure.
Here is an overview of the solution:
1) You need to convert both the PU and the Schema table into a form where each "column" value in the list of columns is isolated in its own row. If you can store the data in this format rather than the provided format, you will be a little better off.
What I mean is
Table|Columns
Person|Jane Doe;22
needs converted to
Table|Column|OrderInList
Person|Jane Doe|1
Person|22|2
There are multiple ways to do this, but I prefer an XML trick that I picked up. You can find multiple split-string examples online, so I will not focus on that; use whatever gives you the best performance. Unfortunately, you might not be able to get away from a table-valued function.
Update:
Thanks to Shnugo's performance enhancement comment, I have updated my XML splitter to give you the row number, which reduces some of my code. I do the exact same thing to the Schema list.
2) Since the new Schema table and the new PU table now record the order in which each column appears, the PU table and the Schema table can be joined on the "Table" value and the OrderInList:
CREATE FUNCTION [dbo].[fnSplitStrings_XML]
(
@List NVARCHAR(MAX),
@Delimiter VARCHAR(255)
)
RETURNS TABLE
AS
RETURN
(
SELECT y.i.value('(./text())[1]', 'nvarchar(4000)') AS Item,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) as RowNumber
FROM
(
SELECT CONVERT(XML, '<i>'
+ REPLACE(@List, @Delimiter, '</i><i>')
+ '</i>').query('.') AS x
) AS a CROSS APPLY x.nodes('i') AS y(i)
);
GO
CREATE Procedure uspGetColumnValues
as
Begin
--Split each value in PU
select p.[Table],p.[Values],a.[Item],CHARINDEX(a.Item,p.[Values]) as LocationInStringForSorting,a.RowNumber
into #PuWithOrder
from PU p
cross apply [fnSplitStrings_XML](p.[Values],';') a --use whatever string split function is working best for you (performance wise)
--Split each value in Schema
select s.[Table],s.[Columns],a.[Item],CHARINDEX(a.Item,s.[Columns]) as LocationInStringForSorting,a.RowNumber
into #SchemaWithOrder
from Schemes s
cross apply [fnSplitStrings_XML](s.[Columns],';') a --use whatever string split function is working best for you (performance wise)
DECLARE @Fields TABLE --If this is an ETL process, maybe make this a permanent table with an auto-incrementing Id and reference this table in all steps after this.
(
[Table] NVARCHAR(50),
[Columns] NVARCHAR(MAX),
[Column] VARCHAR(MAX),
[Value] VARCHAR(MAX),
OrderInList int
)
INSERT INTO @Fields([Table],[Columns],[Column],[Value],OrderInList)
Select pu.[Table],pu.[Values] as [Columns],s.Item as [Column],pu.Item as [Value],pu.RowNumber
from #PuWithOrder pu
join #SchemaWithOrder s on pu.[Table]=s.[Table] and pu.RowNumber=s.RowNumber
Select [Table],[Columns],[Column],[Value],OrderInList
from @Fields
order by [Table],[Columns],OrderInList
END
GO
EXEC uspGetColumnValues
GO
Update:
Since your working implementation is a table-valued function, I have another recommendation. The problem I see is that you're using a table-valued function, which ultimately handles one record at a time. You are going to have better performance with set-based operations, batching as needed. With a table-valued function, you are likely going to be looping through each row. If this is some sort of ETL process, your team will be better off if you have a stored procedure that processes the rows in bulk. It might make sense to stage the results into a better table that your team can work with downstream, rather than have them use a potentially slow table-valued function.

How to check if a value is included in a list in an effective way in SQL Server 2008?

I would like to use something like an .Include function in SQL Server 2008, but I could not find the correct syntax for it. I have a SQL query like below:
--@values has to be a varchar list and start & end with a comma
declare @values varchar(max) = ',7,34,37,74,85,'
select (case when @values like '%,' + m.Id + ',%' then m.Name else null end)
from #myTable m
So the logic is: if the ID of a record matches one of the numbers in the @values list, I would like to see its name in the output list. This query is working fine, but I would like to find a more professional way to handle it, maybe like:
case when @values.Include(m.Id) then m.Name else null end
Any advice would be appreciated. Thanks.
The fastest method to split a delimited string is using XQuery, in my experience.
Ex:
DECLARE @values VARCHAR(50), @XML XML
SET @values = ',7,34,37,74,85,'
SET @XML = cast(('<X>'+replace(@values,',' ,'</X><X>')+'</X>') as xml)
SELECT N.value('.', 'VARCHAR(255)') as value FROM @XML.nodes('X') as T(N)
declare @table table (id varchar(5))
insert into @table(id)
values ('7')
select *
from @table y
where exists (SELECT 1 FROM @XML.nodes('X') as T(N) where N.value('.', 'VARCHAR(255)') = y.id)
If you are calling this code from an application, you might want to consider using Table-Valued Parameters and a stored procedure to do this.
First, you would need to create a table type to use with the procedure:
create type dbo.Ids_udt as table (Id int not null);
go
Then, create the procedure:
create procedure dbo.get_names_from_list (
@Ids as dbo.Ids_udt readonly
) as
begin;
set nocount, xact_abort on;
select t.Name
from t
inner join @Ids i
on t.Id = i.Id
end;
go
Then, assemble and pass the list of Ids to the stored procedure using a DataTable added as a SqlParameter using SqlDbType.Structured.
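For completeness, the T-SQL side of such a call might look like this (a sketch; the values are illustrative):
DECLARE @Ids dbo.Ids_udt;
INSERT INTO @Ids (Id) VALUES (7), (34), (37), (74), (85);
EXEC dbo.get_names_from_list @Ids = @Ids;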
Table Valued Parameter Reference:
SQL Server 2008 Table-Valued Parameters and C# Custom Iterators: A Match Made In Heaven! - Leonard Lobel
Table Value Parameter Use With C# - Jignesh Trivedi
Using Table-Valued Parameters in SQL Server and .NET - Erland Sommarskog
Maximizing Performance with Table-Valued Parameters - Dan Guzman
Maximizing Throughput with TVP - SQLCAT
How to use TVPs with Entity Framework 4.1 and CodeFirst
Assuming that the data/list is not required to be structured as a comma-separated list, you could use IN, EXISTS or SOME / ANY, as sketched below.
If it is unavoidable, you could use JiggsJedi's way, but since you asked for a fast way, you should try to store the data in a form that can be processed faster and does not require additional work to be queried.
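For instance, a sketch assuming the ids already live in a table (the dbo.IdList table and its column are assumptions, not part of the question):
SELECT m.Name
FROM #myTable m
WHERE m.Id IN (SELECT i.Id FROM dbo.IdList i);
-- or, equivalently, with EXISTS:
SELECT m.Name
FROM #myTable m
WHERE EXISTS (SELECT 1 FROM dbo.IdList i WHERE i.Id = m.Id);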
IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
Drop table #Temp
Create table #Temp (ID int, Name varchar(5))
INSERT into #Temp
SELECT 7,'AA' Union all
SELECT 34,'BA' Union all
SELECT 37,'CA' Union all
SELECT 74,'DA' Union all
SELECT 85,'TA'
DECLARE @values varchar(max) = ',,,,,,7,,34,,,74,85,,,,' --extra commas at the start, end, or in the middle of the string are handled
SET @values=','+@values+','
SELECT @values= LEFT(STUFF(@values,1,1,''),LEN(@values)-2)
DECLARE @SelectValuesIn TABLE(Value INT)
INSERT INTO @SelectValuesIn
SELECT Split.a.value('.', 'VARCHAR(100)') AS Data
FROM
(
SELECT
CAST ('<M>' + REPLACE(@values, ',', '</M><M>') + '</M>' AS XML) AS Data
) AS A CROSS APPLY Data.nodes ('/M') AS Split(a);
SELECT * FROM #Temp WHERE ID IN (SELECT Value FROM @SelectValuesIn)

How to avoid duplicate values when bulk inserting into a table in SQL Server 2012

The problem I'm trying to solve is avoiding duplicate data getting into my table. I'm using XML to send bulk data to a stored procedure. The procedure I wrote works with 100 or 200 records, but when it comes to 20000 of them there is a timeout exception.
This is the stored procedure:
DECLARE @TEMP TABLE (Page_No varchar(MAX))
DECLARE @TEMP2 TABLE (Page_No varchar(MAX))
INSERT INTO @TEMP(Page_No)
SELECT
CAST(CC.query('data(PageId)') AS NVARCHAR(MAX)) AS Page_No
FROM
@XML.nodes('DocumentElement/CusipsFile') AS tt(CC)
INSERT INTO @TEMP2(Page_No)
SELECT Page_No
FROM tbl_Cusips_Pages
INSERT INTO tbl_Cusips_Pages(Page_No, Download_Status)
SELECT Page_No, 'False'
FROM @TEMP
WHERE Page_No NOT IN (SELECT Page_No FROM @TEMP
INTERSECT
SELECT Page_No FROM @TEMP2)
How can I solve this? Is there a better way to write this procedure?
As was already suggested, an NVARCHAR(MAX) column/variable is very slow and has limited options. If you can change it, it would help a lot.
MERGE tbl_Cusips_Pages
USING (
SELECT
CAST(CC.query('data(PageId)') AS NVARCHAR(4000))
FROM
@XML.nodes('DocumentElement/CusipsFile') AS tt(CC)
) AS source (Page_No)
ON tbl_Cusips_Pages.Page_No = source.Page_No
WHEN NOT MATCHED BY TARGET
THEN INSERT (Page_No, Download_Status)
VALUES (source.Page_No, 'false');
Anyway, your query is not that bad either; just put the queries directly into the third one (the @TEMP2 one for sure) instead of inserting the data into table variables first. Table variables are quite slow in comparison.
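A sketch of that single-statement form, keeping the original names:
INSERT INTO tbl_Cusips_Pages (Page_No, Download_Status)
SELECT x.Page_No, 'False'
FROM (
SELECT CAST(CC.query('data(PageId)') AS NVARCHAR(4000)) AS Page_No
FROM @XML.nodes('DocumentElement/CusipsFile') AS tt(CC)
) AS x
WHERE NOT EXISTS (SELECT 1 FROM tbl_Cusips_Pages t WHERE t.Page_No = x.Page_No);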
Replace the last INSERT statement with the following script; I have replaced the IN clause with NOT EXISTS, which may give you better performance.
DECLARE @CommanPageNo TABLE (Page_No varchar(MAX))
INSERT INTO @CommanPageNo SELECT Page_No FROM @TEMP
INTERSECT
SELECT Page_No FROM @TEMP2
INSERT INTO tbl_Cusips_Pages(Page_No, Download_Status)
SELECT t.Page_No, 'False'
FROM @TEMP t
WHERE NOT EXISTS (SELECT 1 FROM @CommanPageNo c WHERE c.Page_No = t.Page_No)

Is it possible to use a Stored Procedure as a subquery in SQL Server 2008?

I have two stored procedures, one of which returns a list of payments, while the other returns a summary of those payments, grouped by currency. Right now, I have a duplicated query: the main query of the stored procedure that returns the list of payments is a subquery of the stored procedure that returns the summary of payments by currency. I would like to eliminate this duplication by making the stored procedure that returns the list of payments a subquery of the stored procedure that returns the summary of payments by currency. Is that possible in SQL Server 2008?
You are better off converting the first proc into a table-valued function. If it involves multiple statements, you need to first define the return table structure and populate it.
Sample:
CREATE proc getRecords @t char(1)
as
set nocount on;
-- other statements --
-- final select
select * from master..spt_values where type = @t
GO
-- becomes --
CREATE FUNCTION fn_getRecords(@t char(1))
returns #output table(
name sysname,
number int,
type char(1),
low int,
high int,
status int) as
begin
-- other statements --
-- final select
insert #output
select * from master..spt_values where type = @t
return
end;
However, if it is a straight select (or can be written as a single statement), then you can use the INLINE tvf form, which is highly optimized
CREATE FUNCTION fn2_getRecords(@t char(1))
returns table as return
-- **NO** other statements; single statement table --
select * from master..spt_values where type = @t
The second proc simply selects from the first proc
create proc getRecordsByStatus @t char(1)
as
select status, COUNT(*) CountRows from dbo.fn2_getRecords(@t)
group by status
And where you used to call
EXEC firstProc @param
to get a result, you now select from it
SELECT * FROM firstProc(@param)
You can capture the output from a stored procedure in a temp table and then use the table in your main query.
Capture the output of a stored procedure returning columns ID and Name into a table variable:
declare @T table (ID int, Name nvarchar(50))
insert into @T
exec StoredProcedure
Inserting the results of your stored proc into a table variable or temp table will do the trick.
If you're trying to reuse code in SQL Server from one query to the next, you have more flexibility with table functions. Views are all right if you don't need to pass parameters or use any kind of flow-control logic. Either may be used like a table in any other function, procedure, view or T-SQL statement.
If you made the procedure that returns the list into a table-valued function, then I believe you could use it in a sub-query.
I would use a view unless it needs to be parameterized, in which case I would use an inline table-valued function if possible. If it needs to be a multi-statement operation you can still use a table-valued function, but those are usually less efficient.
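A sketch of the view approach; the table and column names are assumptions, not the asker's actual schema:
CREATE VIEW dbo.PaymentList AS
SELECT PaymentId, Currency, Amount
FROM dbo.Payments;
GO
-- the summary then reuses the same definition:
SELECT Currency, SUM(Amount) AS Total
FROM dbo.PaymentList
GROUP BY Currency;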
