SQL Server : efficient way to find missing Ids - sql-server

I am using SQL Server to store tens of millions of records. I need to be able to query its tables to find missing rows where there are gaps in the Id column, as there should be none.
I am currently using a solution that I have found here on StackOverflow:
CREATE PROCEDURE [dbo].[find_missing_ids]
#Table NVARCHAR(128)
AS
BEGIN
DECLARE #query NVARCHAR(MAX)
SET #query = 'WITH Missing (missnum, maxid) '
+ N'AS '
+ N'('
+ N' SELECT 1 AS missnum, (select max(Id) from ' + #Table + ') '
+ N' UNION ALL '
+ N' SELECT missnum + 1, maxid FROM Missing '
+ N' WHERE missnum < maxid '
+ N') '
+ N'SELECT missnum '
+ N'FROM Missing '
+ N'LEFT OUTER JOIN ' + #Table + ' tt on tt.Id = Missing.missnum '
+ N'WHERE tt.Id is NULL '
+ N'OPTION (MAXRECURSION 0);';
EXEC sp_executesql #query
END;
This solution has been working very well, but it has been getting slower and more resource intensive as the tables have grown. Now, running the procedure on a table of 38 million rows is taking about 3.5 minutes and lots of CPU.
Is there a more efficient way to perform this? After a certain range has been found to not contain any missing Ids, I no longer need to check that range again.

JBJ's answer is almost complete. The query needs to return the From and Through for each range of missing values.
select B+1 as [From],A-1 as[Through]from
(select StuffID as A,
lag(StuffID)over(order by StuffID)as B from Stuff)z
where A<>B+1
order by A
I created a test table with 50 million records, then deleted a few. The first row of the result is:
From Through
33 35
This indicates that all IDs in the range from 33 through 35 are missing, i.e. 33, 34 and 35.
On my machine the query took 37 seconds.

try
select pId
from (select Id, lag(Id) over (order by Id) pId from yourschema.yourtable) e
where pId <> (Id-1)
order by Id
replacing yourschema.yourtable with the appropriate table information

Try this solution, it will be faster than CTE.
;WITH CTE AS
(
SELECT ROW_NUMBER()
OVER (
ORDER BY (SELECT NULL)) RN
FROM ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v(id) --10 ROWS
CROSS JOIN ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v1(id)--100 ROWS
CROSS JOIN ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v2(id) --1000 ROWS
CROSS JOIN ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v3(id) --10000 ROWS
CROSS JOIN ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v4(id)--100000 ROWS
CROSS JOIN ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v5(id)--1000000 ROWS
)
SELECT RN AS Missing
FROM CTE C
LEFT JOIN YOURABLE T ON T.ID=R.ID
WHERE T.ID IS NULL
If you want you can use master..[spt_values] also to generate the number like following.
SELECT (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) RN
FROM master..[spt_values] T1
CROSS JOIN (select top 500 * from master..[spt_values]) T2
Above query will generate 1268500 numbers
Note: You need to add the CROSS JOIN as per your requirement.

Related

Display All Columns from all Table by using Union ALL with Different no of Columns in Each Table

I have Three Tables with Different no of Columns. e.g T1(C1), T2(C1,C2,C3), T3(C1,C4). I want to generate a Dynamic SQL that will create a View like
CREATE VIEW [dbo].[vwData]
AS
SELECT C1,NULL AS C2,NULL AS C3,NULL AS C4
FROM DBO.T1
UNION ALL
SELECT C1,C2,C3,NULL AS C4
FROM DBO.T2
UNION ALL
SELECT C1,NULL AS C2,NULL AS C3,C4
FROM DBO.T3
I have achieved this goal by using two nested loop by Checking Each column If It is Existed in a table or not.
But in Production we have around 30 tables with around 60 Columns in Each table.
Create of Dynamic SQL is taking around 7 minutes and this is not Acceptable to us. We want to improve performance Further.
Immediate help would be highly appreciated.
Here's some Dynamic SQL which would create and execute what you describe. How does this compare to your current SQL's performance?
Fiddle: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=800747a3d832e6e29a15484665f5cc8b
declare #tablesOfInterest table(tableName sysname, sql nvarchar(max))
declare #allColumns table(columnName sysname)
declare #sql nvarchar(max)
insert #tablesOfInterest(tableName) values ('table1'), ('table2')
insert #allColumns (columnName)
select distinct c.name
from sys.columns c
where c.object_id in
(
select object_id(tableName)
from #tablesOfInterest
)
update t
set sql = 'select ' + columnSql + ' from ' + quotename(tableName)
from #tablesOfInterest t
cross apply
(
select string_agg(coalesce(quotename(c.Name), 'null') + ' ' + quotename(ac.columnName), ', ') within group (order by ac.columnName)
from #allColumns ac
left outer join sys.columns c
on c.object_id = object_id(t.tableName)
and c.Name = ac.columnName
) x(columnSql)
select #sql = string_agg(sql, ' union all ')
from #tablesOfInterest
print #sql
exec (#sql)
As mentioned in the comments, rather than running this dynamic SQL every time you need to execute this query, you could use it to generate a view which you can then reuse as required.
Adding indexes and filters to the underlying tables as appropriate could further improve performance; but without knowing more of the context, we can't give much advise on specifics.
You might try this:
I use some general tables where I know, that they share some of their columns to show the principles. Just replace the tables with your own tables:
Attention: I do not use these INFORMATION_SCHEMA tables to read their content. They serve as examples with overlapping columns...
DECLARE #statement NVARCHAR(MAX);
WITH cte(x) AS
(
SELECT
(SELECT TOP 1 * FROM INFORMATION_SCHEMA.TABLES FOR XML AUTO, ELEMENTS XSINIL,TYPE) AS [*]
,(SELECT TOP 1 * FROM INFORMATION_SCHEMA.COLUMNS FOR XML AUTO, ELEMENTS XSINIL,TYPE) AS [*]
,(SELECT TOP 1 * FROM INFORMATION_SCHEMA.ROUTINES FOR XML AUTO, ELEMENTS XSINIL,TYPE) AS [*]
--add all your tables here...
FOR XML PATH(''),TYPE
)
,AllColumns AS
(
SELECT DISTINCT a.value('local-name(.)','nvarchar(max)') AS ColumnName
FROM cte
CROSS APPLY x.nodes('/*/*') A(a)
)
,AllTables As
(
SELECT a.value('local-name(.)','nvarchar(max)') AS TableName
,a.query('*') ConnectedColumns
FROM cte
CROSS APPLY x.nodes('/*') A(a)
)
SELECT #statement=
STUFF((
(
SELECT 'UNION ALL SELECT ' +
'''' + TableName + ''' AS SourceTableName ' +
(
SELECT ',' + CASE WHEN ConnectedColumns.exist('/*[local-name()=sql:column("ColumnName")]')=1 THEN QUOTENAME(ColumnName) ELSE 'NULL' END + ' AS ' + QUOTENAME(ColumnName)
FROM AllColumns ac
FOR XML PATH('root'),TYPE
).value('.','nvarchar(max)') +
' FROM ' + REPLACE(QUOTENAME(TableName),'.','].[')
FROM AllTables
FOR XML PATH(''),TYPE).value('.','nvarchar(max)')
),1,10,'');
EXEC( #statement);
Short explanation:
The first row of each table will be tranformed into an XML. Using AUTO-mode will use the table's name in the <root> and add all columns as nested elements.
The second CTE will create a distinct list of all columns existing in any of the tables.
the third CTE will extract all Tables with their connected columns.
The final SELECT will use a nested string-concatenation to create a UNION ALL SELECT of all columns. The existance of a given name will decide, whether the column is called with its name or as NULL.
Just use PRINT to print out the #statement in order to see the resulting dynamically created SQL command.

return value at a position from STRING_SPLIT in SQL Server 2016

Can I return a value at a particular position with the STRING_SPLIT function in SQL Server 2016 or higher?
I know the order from a select is not guaranteed, but is it with STRING_SPLIT?
DROP TABLE IF EXISTS #split
SELECT 'z_y_x' AS splitIt
INTO #split UNION
SELECT 'a_b_c'
SELECT * FROM #split;
WITH cte
AS (
SELECT ROW_NUMBER() OVER ( PARTITION BY s.splitIt ORDER BY s.splitIt ) AS position,
s.splitIt,
value
FROM #split s
CROSS APPLY STRING_SPLIT(s.splitIt, '_')
)
SELECT * FROM cte WHERE position = 2
Will this always return the value at the 2nd element? b for a_b_c and y for z_y_x?
I don't understand why Microsoft doesn't return a position indicator column alongside the value for this function.
There is - starting with v2016 - a solution via FROM OPENJSON():
DECLARE #str VARCHAR(100) = 'val1,val2,val3';
SELECT *
FROM OPENJSON('["' + REPLACE(#str,',','","') + '"]');
The result
key value type
0 val1 1
1 val2 1
2 val3 1
The documentation tells clearly:
When OPENJSON parses a JSON array, the function returns the indexes of the elements in the JSON text as keys.
For your case this was:
SELECT 'z_y_x' AS splitIt
INTO #split UNION
SELECT 'a_b_c'
DECLARE #delimiter CHAR(1)='_';
SELECT *
FROM #split
CROSS APPLY OPENJSON('["' + REPLACE(splitIt,#delimiter,'","') + '"]') s
WHERE s.[key]=1; --zero based
Let's hope, that future versions of STRING_SPLIT() will include this information
UPDATE Performance tests, compare with popular Jeff-Moden-splitter
Try this out:
USE master;
GO
CREATE DATABASE dbTest;
GO
USE dbTest;
GO
--Jeff Moden's splitter
CREATE FUNCTION [dbo].[DelimitedSplit8K](#pString VARCHAR(8000), #pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (
SELECT TOP (ISNULL(DATALENGTH(#pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(#pString,t.N,1) = #pDelimiter
),
cteLen(N1,L1) AS(
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(#pDelimiter,#pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(#pString, l.N1, l.L1)
FROM cteLen l
;
GO
--Avoid first call bias
SELECT * FROM dbo.DelimitedSplit8K('a,b,c',',');
GO
--Table to keep the results
CREATE TABLE Results(ID INT IDENTITY,ResultSource VARCHAR(100),durationMS INT, RowsCount INT);
GO
--Table with strings to split
CREATE TABLE dbo.DelimitedItems(ID INT IDENTITY,DelimitedNString nvarchar(4000),DelimitedString varchar(8000));
GO
--Get rows wiht randomly mixed strings of 100 items
--Try to play with the count of rows (count behind GO) and the count with TOP
INSERT INTO DelimitedItems(DelimitedNString)
SELECT STUFF((
SELECT TOP 100 ','+REPLACE(v.[name],',',';')
FROM master..spt_values v
WHERE LEN(v.[name])>0
ORDER BY NewID()
FOR XML PATH('')),1,1,'')
--Keep it twice in varchar and nvarchar
UPDATE DelimitedItems SET DelimitedString=DelimitedNString;
GO 500 --create 500 differently mixed rows
--The tests
DECLARE #d DATETIME2;
SET #d = SYSUTCDATETIME();
SELECT DI.ID, DS.Item, DS.ItemNumber
INTO #TEMP
FROM dbo.DelimitedItems DI
CROSS APPLY dbo.DelimitedSplit8K(DI.DelimitedNString,',') DS;
INSERT INTO Results(ResultSource,RowsCount,durationMS)
SELECT 'delimited8K with NVARCHAR(4000)'
,(SELECT COUNT(*) FROM #TEMP) AS RowCountInTemp
,DATEDIFF(MILLISECOND,#d,SYSUTCDATETIME()) AS Duration_NV_ms_delimitedSplit8K
SET #d = SYSUTCDATETIME();
SELECT DI.ID, DS.Item, DS.ItemNumber
INTO #TEMP2
FROM dbo.DelimitedItems DI
CROSS APPLY dbo.DelimitedSplit8K(DI.DelimitedString,',') DS;
INSERT INTO Results(ResultSource,RowsCount,durationMS)
SELECT 'delimited8K with VARCHAR(8000)'
,(SELECT COUNT(*) FROM #TEMP2) AS RowCountInTemp
,DATEDIFF(MILLISECOND,#d,SYSUTCDATETIME()) AS Duration_V_ms_delimitedSplit8K
SET #d = SYSUTCDATETIME();
SELECT DI.ID, OJ.[Value] AS Item, OJ.[Key] AS ItemNumber
INTO #TEMP3
FROM dbo.DelimitedItems DI
CROSS APPLY OPENJSON('["' + REPLACE(DI.DelimitedNString,',','","') + '"]') OJ;
INSERT INTO Results(ResultSource,RowsCount,durationMS)
SELECT 'OPENJSON with NVARCHAR(4000)'
,(SELECT COUNT(*) FROM #TEMP3) AS RowCountInTemp
,DATEDIFF(MILLISECOND,#d,SYSUTCDATETIME()) AS Duration_NV_ms_OPENJSON
SET #d = SYSUTCDATETIME();
SELECT DI.ID, OJ.[Value] AS Item, OJ.[Key] AS ItemNumber
INTO #TEMP4
FROM dbo.DelimitedItems DI
CROSS APPLY OPENJSON('["' + REPLACE(DI.DelimitedString,',','","') + '"]') OJ;
INSERT INTO Results(ResultSource,RowsCount,durationMS)
SELECT 'OPENJSON with VARCHAR(8000)'
,(SELECT COUNT(*) FROM #TEMP4) AS RowCountInTemp
,DATEDIFF(MILLISECOND,#d,SYSUTCDATETIME()) AS Duration_V_ms_OPENJSON
GO
SELECT * FROM Results;
GO
--Clean up
DROP TABLE #TEMP;
DROP TABLE #TEMP2;
DROP TABLE #TEMP3;
DROP TABLE #TEMP4;
USE master;
GO
DROP DATABASE dbTest;
Results:
200 items in 500 rows
1220 delimited8K with NVARCHAR(4000)
274 delimited8K with VARCHAR(8000)
417 OPENJSON with NVARCHAR(4000)
443 OPENJSON with VARCHAR(8000)
100 items in 500 rows
421 delimited8K with NVARCHAR(4000)
140 delimited8K with VARCHAR(8000)
213 OPENJSON with NVARCHAR(4000)
212 OPENJSON with VARCHAR(8000)
100 items in 5 rows
10 delimited8K with NVARCHAR(4000)
5 delimited8K with VARCHAR(8000)
3 OPENJSON with NVARCHAR(4000)
4 OPENJSON with VARCHAR(8000)
5 items in 500 rows
32 delimited8K with NVARCHAR(4000)
30 delimited8K with VARCHAR(8000)
28 OPENJSON with NVARCHAR(4000)
24 OPENJSON with VARCHAR(8000)
--unlimited length (only possible with OPENJSON)
--Wihtout a TOP clause while filling
--results in about 500 items in 500 rows
1329 OPENJSON with NVARCHAR(4000)
1117 OPENJSON with VARCHAR(8000)
Facit:
the popular splitter function does not like NVARCHAR
the function is limited to strings within 8k byte volumen
Only the case with many items and many rows in VARCHAR lets the splitter function be ahead.
In all other cases OPENJSON seems to be more or less faster...
OPENJSON can deal with (almost) unlimited counts
OPENJSON demands for v2016
Everybody is waiting for STRING_SPLIT with the position
UPDATE Added STRING_SPLIT to the test
In the meanwhile I re-run the test with two more test sections using STRING_SPLIT(). As position I had to return a hardcoded value as this function does not return the part's index.
In all tested cases OPENJSON was close with STRING_SPLIT and often faster:
5 items in 1000 rows
250 delimited8K with NVARCHAR(4000)
124 delimited8K with VARCHAR(8000) --this function is best with many rows in VARCHAR
203 OPENJSON with NVARCHAR(4000)
204 OPENJSON with VARCHAR(8000)
235 STRING_SPLIT with NVARCHAR(4000)
234 STRING_SPLIT with VARCHAR(8000)
200 items in 30 rows
140 delimited8K with NVARCHAR(4000)
31 delimited8K with VARCHAR(8000)
47 OPENJSON with NVARCHAR(4000)
31 OPENJSON with VARCHAR(8000)
47 STRING_SPLIT with NVARCHAR(4000)
31 STRING_SPLIT with VARCHAR(8000)
100 items in 10.000 rows
8145 delimited8K with NVARCHAR(4000)
2806 delimited8K with VARCHAR(8000) --fast with many rows!
5112 OPENJSON with NVARCHAR(4000)
4501 OPENJSON with VARCHAR(8000)
5028 STRING_SPLIT with NVARCHAR(4000)
5126 STRING_SPLIT with VARCHAR(8000)
The simple answer is, no. Microsoft so far have refused to provide Ordinal position as part of the return dataset in STRING_SPLIT. You'll need to use a different solution I'm afraid. For example Jeff Moden's DelimitedSplit8k.
(Yes, I realise this is more or less a link only answer, however, pasting Jeff's solution here would effectively be plagiarism).
If you were to use Jeff's solution, then you would be able to do something like:
SELECT *
FROM dbo.DelimitedSplit8K('a,b,c,d,e,f,g,h,i,j,k',',') DS
WHERE ItemNumber = 2;
Of course, you'd likely be passing column rather than a literal string.
I just extended #Shnugo's answer if the splitted text would contain line breaks, unicode and other non json compatible characters, to use
STRING_ESCAPE
My Test code with pipe as separator instead comma:
DECLARE #Separator VARCHAR(5) = STRING_ESCAPE('|', 'json'); -- here pipe or use any other separator (even ones escaped by json)
DECLARE #LongText VARCHAR(MAX) = 'Albert says: "baby, listen!"|ve Çağrı söylüyor: "Elma"|1st Line' + CHAR(13) + CHAR(10) + '2nd line';
SELECT * FROM OPENJSON('["' + REPLACE(STRING_ESCAPE(#LongText, 'json'), #Separator ,'","') + '"]'); -- ok
-- SELECT * FROM OPENJSON('["' + REPLACE(#LongText, #Separator ,'","') + '"]'); -- fails with: JSON text is not properly formatted. ...
Updated due to comment from Simon Zeinstra
I didn't want to deal with OPENJSON, but still wanted to get string_split() value by index.
The performance was not an issue in my case.
I used CTE (Common Table Expression)
Assume you have string str = "part1 part2 part3".
WITH split_res_list as
(
SELECT value FROM STRING_SPLIT('part1 part2 part3', ' ')
),
split_res_list_with_index as
(
SELECT [value],
ROW_NUMBER() OVER (ORDER BY [value] ASC) as [RowNumber]
FROM split_res_list
)
SELECT * FROM split_res_list_with_index WHERE RowNumber = 2
BUT: please be aware that the order of 3 parts is changed according to ORDER BY condition!
The output for the second row with "part2" value:
Using STRING_SPLIT:
STRING_SPLIT ( string , separator [ , enable_ordinal ] )
enable_ordinal
An int or bit expression that serves as a flag to enable or disable the ordinal output column. A value of 1 enables the ordinal column. If enable_ordinal is omitted, NULL, or has a value of 0, the ordinal column is disabled.
The enable_ordinal argument and ordinal output column are currently only supported in Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics (serverless SQL pool only).
Query:
SELECT value FROM STRING_SPLIT('part1_part2_part3', '_', 1) WHERE ordinal = 2;
Here is my workaround. I will follow the Question waiting for a better answer:
UPDATED: Original code did not take into consideration if a word contains another.
UPDATE 2: Performance was horrible in production so i have to think another way. you have it at the end as option 2, implementation for table.
UPDATE 3: Added code for UDF in the implementation in a string.
Implementation in a string:
declare #a as nvarchar(100) = 'Lorem ipsum dolor dol ol sit amet. D Lorem DO ipsum DOL dolor sit amet. DOLORES ipsum';
WITH T AS (
SELECT T1.value
,charindex(' ' + T1.value + ' ',' ' + #a + ' ' ,0) AS INDX
,RN = ROW_NUMBER() OVER (PARTITION BY value order BY value)
FROM STRING_SPLIT(#a, ' ') AS T1
WHERE T1.value <> ''
),
R (VALUE,INDX,RN) AS (
SELECT *
FROM T
WHERE T.RN = 1
UNION ALL
SELECT T.VALUE
,charindex(' ' + T.value + ' ',' ' + #a + ' ',R.INDX + 1) AS INDX
,T.RN
FROM T
JOIN R
ON T.value = R.VALUE
AND T.RN = R.RN + 1
)
SELECT * FROM R ORDER BY INDX
result:
tableOfResults
UDF:
CREATE FUNCTION DBO.UDF_get_word(#string nvarchar(100),#wordNumber int)
returns nvarchar(100)
AS
BEGIN
DECLARE #searchedWord nvarchar(100);
WITH T AS (
SELECT T1.value
,charindex(' ' + T1.value + ' ',' ' + #string + ' ' ,0) AS INDX
,RN = ROW_NUMBER() OVER (PARTITION BY value order BY value)
FROM STRING_SPLIT(#string, ' ') AS T1
WHERE T1.value <> ''
),
R (VALUE,INDX,RN) AS (
SELECT *
FROM T
WHERE T.RN = 1
UNION ALL
SELECT T.VALUE
,charindex(' ' + T.value + ' ',' ' + #string + ' ',R.INDX + 1) AS INDX
,T.RN
FROM T
JOIN R
ON T.value = R.VALUE
AND T.RN = R.RN + 1
)
SELECT #searchedWord = (value) FROM ( SELECT *, ORD = ROW_NUMBER() OVER (ORDER BY INDX) FROM R )AS TBL WHERE ORD = #wordNumber
RETURN #searchedword
END
GO
Modification for a column in a table, OPTION 1:
WITH T AS (
SELECT T1.stringToBeSplit
,T1.column1 --column1 is an example of column where stringToBeSplit is the same for more than one record. better to be avoid but if you need to added here it is how just follow column1 over the code
,T1.column2
,T1.value
,T1.column3
/*,...any other column*/
,charindex(' ' + T1.value + ' ',' ' + T1.stringToBeSplit + ' ' ,0) AS INDX
,RN = ROW_NUMBER() OVER (PARTITION BY t1.column1, T1.stringToBeSplit, T1.value order BY T1.column1, T1.T1.stringToBeSplit, T1.value) --any column that create duplicates need to be added here as example i added column1
FROM (SELECT TOP 10 * FROM YourTable D CROSS APPLY string_split(D.stringToBeSplit,' ')) AS T1
WHERE T1.value <> ''
),
R (stringToBeSplit, column1, column2, value, column3, INDX, RN) AS (
SELECT stringToBeSplit, column1, column2, value, column3, INDX, RN
FROM T
WHERE T.RN = 1
UNION ALL
SELECT T.stringToBeSplit, T.column1, column2, T.value, T.column3
,charindex(' ' + T.value + ' ',' ' + T.stringToBeSplit + ' ',R.INDX + 1) AS INDX
,T.RN
FROM T
JOIN R
ON T.value = R.VALUE AND T.COLUMN1 = R.COLUMN1 --any column that create duplicates need to be added here as exapmle i added column1
AND T.RN = R.RN + 1
)
SELECT * FROM R ORDER BY column1, stringToBeSplit, INDX
Modification for a column in a table, OPTION 2 (max performance i could get, main action came from removing the join and finding a way of properly execute (and stop) the recursive loop of the CTE, from 1.30 for 1000 lines to 2 sec for 30K lines of strings of similar type and length):
WITH T AS (
SELECT T1.stringToBeSplit --no extracolumns this time
,T1.value
,charindex(' ' + T1.value + ' ',' ' + T1.stringToBeSplit + ' ' ,0) AS INDX
,RN = ROW_NUMBER() OVER (PARTITION BY T1.stringToBeSplit,T1.value order BY T1.stringToBeSplit,T1.value) --from clause use distinct and where if possible
FROM (SELECT DISTINCT stringToBeSplit, VALUE FROM [your table] D CROSS APPLY string_split(D.stringToBeSplit,' ') WHERE [your filter]) AS T1
WHERE T1.value <> ''
),
R (stringToBeSplit, value, INDX, RN) AS (
SELECT stringToBeSplit, value, INDX, RN
FROM T
WHERE T.RN = 1
UNION ALL
SELECT R.stringToBeSplit, R.value
,charindex(' ' + R.value + ' ',' ' + R.stringToBeSplit + ' ',R.INDX + 1) AS INDX
,R.RN + 1
FROM R
WHERE charindex(' ' + R.value + ' ',' ' + R.stringToBeSplit + ' ',R.INDX + 1) <> 0
)
SELECT * FROM R ORDER BY stringToBeSplit, INDX
For getting the word ordinal instead of SELECT * FROM R USE:
SELECT stringToBeSplit ,value , ROW_NUMBER() OVER (PARTITION BY stringToBeSplit order BY [indX]) AS ORD FROM R
if instead of having one RW per word you prefer one column:
select * FROM (SELECT [name 1],value , ROW_NUMBER() OVER (PARTITION BY [name 1] order BY [indX]) AS ORD FROM R ) as R2
pivot (MAX(VALUE) FOR ORD in ([1],[2],[3]) ) AS PIV
if you don't want to specify the number of columns QUOTNAME() like in this link, in my case i only need first 4 words rest are irrelevant for the moment. Below the code from the page in case link fail:
DECLARE
#columns NVARCHAR(MAX) = '',
#sql NVARCHAR(MAX) = '';
-- select the category names
SELECT
#columns+=QUOTENAME(category_name) + ','
FROM
production.categories
ORDER BY
category_name;
-- remove the last comma
SET #columns = LEFT(#columns, LEN(#columns) - 1);
-- construct dynamic SQL
SET #sql ='
SELECT * FROM
(
SELECT
category_name,
model_year,
product_id
FROM
production.products p
INNER JOIN production.categories c
ON c.category_id = p.category_id
) t
PIVOT(
COUNT(product_id)
FOR category_name IN ('+ #columns +')
) AS pivot_table;';
-- execute the dynamic SQL
EXECUTE sp_executesql #sql;
Last but not least i'm really looking forward to know if there is an easier way with same performance either in SQL server or in C#. i just think everything that does not use external info should stay in the Server and run as query or batch but not sure to be honest as i heard the contrary (specially from people that use panda) but no one have convince me just yet.
This works
Example:
String = "pos1-pos2-pos3"
REVERSE(PARSENAME(REPLACE(REVERSE(String), '-', '.'), 1))
With 1 Returns "pos1"
With 2 will return "pos2"...

SQL Pivot table without aggregate

I have a number of text files that are in a format similar to what is shown below.
ENTRY,1,000000,Widget 4000,1,,,2,,
FIELD,Type,A
FIELD,Component,Widget 4000
FIELD,Vendor,Acme
ENTRY,2,000000,PRODUCT XYZ,1,,,3,
FIELD,Type,B
FIELD,ItemAssembly,ABCD
FIELD,Component,Product XYZ - 123
FIELD,Description1,Product
FIELD,Description2,XYZ-123
FIELD,Description3,Alternate Part #440
FIELD,Vendor,Contoso
They have been imported into a table with VARCHAR(MAX) as the only field. Each ENTRY is a "new" item, and all the subsequent FIELD rows are properties of that item. The data next to the FIELD is the column name of the property. The data to the right of the property is the data I want to display.
The desired output would be:
ENTRY Type Component Vendor ItemAssembly Description1
1,000000,Widget 4000 A Widget 4000 Acme
2,000000,Product XYZ B Product XYZ-123 Contoso ABCD Product
I've got the column names using the code below (there are several tables that I have UNIONed together to list all the property names).
select #cols =
STUFF (
(select Distinct ', ' + QUOTENAME(ColName) from
(SELECT
SUBSTRING(ltrim(textFileData),CHARINDEX(',', textFileData, 1)+1,CHARINDEX(',', textFileData, CHARINDEX(',', textFileData, 1)+1)- CHARINDEX(',', textFileData, 1)-1) as ColName
FROM [MyDatabase].[dbo].[MyTextFile]
where
(LEFT(textFileData,7) LIKE #c)
UNION
....
) A
FOR XML PATH(''), TYPE).value('.','NVARCHAR(MAX)'),1,1,'')
Is a Pivot table the best way to do this? No aggregation is needed. Is there a better way to accomplish this? I want to list out data next to the FIELD name in a column format.
Thanks!
Here is the solution in SQL fiddle:
http://sqlfiddle.com/#!3/8f0b0/8
Prepare raw data in format (entry, field, value), use dynamic SQL to make pivot on unknown column count.
MAX() for string is enough to simulate "without aggregate" behavior in this case.
create table t(data varchar(max))
insert into t values('ENTRY,1,000000,Widget 4000,1,,,2,,')
insert into t values('FIELD,Type,A')
insert into t values('FIELD,Component,Widget 4000')
insert into t values('FIELD,Vendor,Acme ')
insert into t values('ENTRY,2,000000,PRODUCT XYZ,1,,,3,')
insert into t values('FIELD,Type,B')
insert into t values('FIELD,ItemAssembly,ABCD')
insert into t values('FIELD,Component,Product XYZ - 123')
insert into t values('FIELD,Description1,Product ')
insert into t values('FIELD,Description2,XYZ-123 ')
insert into t values('FIELD,Description3,Alternate Part #440')
insert into t values('FIELD,Vendor,Contoso');
create type preparedtype as table (entry varchar(max), field varchar(max), value varchar(max))
declare #prepared preparedtype
;with identified as
(
select
row_number ( ) over (order by (select 1)) as id,
substring(data, 1, charindex(',', data) - 1) as type,
substring(data, charindex(',', data) + 1, len(data)) as data
from t
)
, tree as
(
select
id,
(select max(id)
from identified
where type = 'ENTRY'
and id <= i.id) as parentid,
type,
data
from identified as i
)
, pivotsrc as
(
select
p.data as entry,
substring(c.data, 1, charindex(',', c.data) - 1) as field,
substring(c.data, charindex(',', c.data) + 1, len(c.data)) as value
from tree as p
inner join tree as c on c.parentid = p.id
where p.id = p.parentid
and c.parentid <> c.id
)
insert into #prepared
select * from pivotsrc
declare #dynamicPivotQuery as nvarchar(max)
declare #columnName as nvarchar(max)
select #columnName = ISNULL(#ColumnName + ',','')
+ QUOTENAME(field)
from (select distinct field from #prepared) AS fields
set #dynamicPivotQuery = N'select * from #prepared
pivot (max(value) for field in (' + #columnName + ')) as result'
exec sp_executesql #DynamicPivotQuery, N'#prepared preparedtype readonly', #prepared
Here your are, this comes back exactly as you need it. I love tricky SQL :-). This is a real ad-hoc singel-statement call.
DECLARE #tbl TABLE(OneCol VARCHAR(MAX));
INSERT INTO #tbl
VALUES('ENTRY,1,000000,Widget 4000,1,,,2,,')
,('FIELD,Type,A')
,('FIELD,Component,Widget 4000')
,('FIELD,Vendor,Acme ')
,('ENTRY,2,000000,PRODUCT XYZ,1,,,3,')
,('FIELD,Type,B')
,('FIELD,ItemAssembly,ABCD')
,('FIELD,Component,Product XYZ - 123')
,('FIELD,Description1,Product ')
,('FIELD,Description2,XYZ-123 ')
,('FIELD,Description3,Alternate Part #440')
,('FIELD,Vendor,Contoso');
WITH OneColumn AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS inx
,CAST('<root><r>' + REPLACE(OneCol,',','</r><r>') + '</r></root>' AS XML) AS Split
FROM #tbl AS tbl
)
,AsParts AS
(
SELECT inx
,Each.part.value('/root[1]/r[1]','varchar(max)') AS Part1
,Each.part.value('/root[1]/r[2]','varchar(max)') AS Part2
,Each.part.value('/root[1]/r[3]','varchar(max)') AS Part3
,Each.part.value('/root[1]/r[4]','varchar(max)') AS Part4
,Each.part.value('/root[1]/r[5]','varchar(max)') AS Part5
FROM OneColumn
CROSS APPLY Split.nodes('/root') AS Each(part)
)
,TheEntries AS
(
SELECT DISTINCT *
FROM AsParts
WHERE Part1='ENTRY'
)
SELECT TheEntries.Part2 + ',' + TheEntries.Part3 + ',' + TheEntries.Part4 AS [ENTRY]
,MyFields.AsXML.value('(fields[1]/field[Part2="Type"])[1]/Part3[1]','varchar(max)') AS [Type]
,MyFields.AsXML.value('(fields[1]/field[Part2="Component"])[1]/Part3[1]','varchar(max)') AS Component
,MyFields.AsXML.value('(fields[1]/field[Part2="Vendor"])[1]/Part3[1]','varchar(max)') AS Vendor
,MyFields.AsXML.value('(fields[1]/field[Part2="ItemAssembly"])[1]/Part3[1]','varchar(max)') AS ItemAssembly
,MyFields.AsXML.value('(fields[1]/field[Part2="Description1"])[1]/Part3[1]','varchar(max)') AS Description1
FROM TheEntries
CROSS APPLY
(
SELECT *
FROM AsParts AS ap
WHERE ap.Part1='FIELD' AND ap.inx>TheEntries.inx
AND ap.inx < ISNULL((SELECT TOP 1 nextEntry.inx FROM TheEntries AS nextEntry WHERE nextEntry.inx>TheEntries.inx ORDER BY nextEntry.inx DESC),10000000)
ORDER BY ap.inx
FOR XML PATH('field'), ROOT('fields'),TYPE
) AS MyFields(AsXML)

Sql Insert the same row multiple times

I'm trying to write a stored procedure which will take rows from a table like the one below
and insert them as many times as the value of the Quantity column. It should also assign a unique name & number to the rows inserted.
The end result should look something like the screenshot below
I can get very close to the what I want by the SQL below
Source
INSERT INTO dbo. MyTable (....)
SELECT
t1.Name + ' (' + CAST(E.n as VARCHAR(3)) + ')',
#Prefix + ' - ' + ROW_NUMBER () OVER (ORDER BY t1.Name )
FROM
MyFirstTable t1
JOIN ....
JOIN .....
CROSS JOIN
(SELECT TOP 500 ROW_NUMBER() OVER(ORDER BY (SELECT 1)) FROM sys.columns)E(n)
WHERE
E.n <= t1.Quantity
AND....
The above statement works because I do know that quantity will never exceed 500 but I'm not a big fan of the way it is done. Is there a better way to accomplish this?
I'm not very experienced in sql.
Seems like you have already figured out what you need for the most part. As far as the top 500 not exceeding goes, you could either leave it there or remove it. I think this is what you may be looking for:
SELECT
id, --not sure where this id comes from but looks different in your output
CASE
WHEN E.n-1 > 0
THEN t1.Name + ' (' + CAST(E.n-1 as VARCHAR(3)) + ')'
ELSE t1.Name
END as Name,
#prefix + ' - ' + cast(ROW_NUMBER () OVER (ORDER BY t1.id) as varchar(10)) as Number
FROM
test t1
JOIN ...
JOIN ...
CROSS JOIN
(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) FROM sys.columns)E(n)
WHERE
E.n <= t1.Quantity
AND ....;
SQL Fiddle Demo

PIVOT in sql 2005

I need to pivot one column (Numbers column).
example need this data:
a 1
a 2
b 3
b 4
c 5
d 6
d 7
d 8
d 9
e 10
e 11
e 12
e 13
e 14
Look like this
a 1 2
b 3 4
c 5
d 6 7 8 9
e 10 11 12 13 14
any help would be greatly appreciated...
Using ROW_NUMBER(), PIVOT and some dynamic SQL (but no cursor necessary) :
CREATE TABLE [dbo].[stackoverflow_198716](
[code] [varchar](1) NOT NULL,
[number] [int] NOT NULL
) ON [PRIMARY]
DECLARE #sql AS varchar(max)
DECLARE #pivot_list AS varchar(max) -- Leave NULL for COALESCE technique
DECLARE #select_list AS varchar(max) -- Leave NULL for COALESCE technique
SELECT #pivot_list = COALESCE(#pivot_list + ', ', '') + '[' + CONVERT(varchar, PIVOT_CODE) + ']'
,#select_list = COALESCE(#select_list + ', ', '') + '[' + CONVERT(varchar, PIVOT_CODE) + '] AS [col_' + CONVERT(varchar, PIVOT_CODE) + ']'
FROM (
SELECT DISTINCT PIVOT_CODE
FROM (
SELECT code, number, ROW_NUMBER() OVER (PARTITION BY code ORDER BY number) AS PIVOT_CODE
FROM stackoverflow_198716
) AS rows
) AS PIVOT_CODES
SET #sql = '
;WITH p AS (
SELECT code, number, ROW_NUMBER() OVER (PARTITION BY code ORDER BY number) AS PIVOT_CODE
FROM stackoverflow_198716
)
SELECT code, ' + #select_list + '
FROM p
PIVOT (
MIN(number)
FOR PIVOT_CODE IN (
' + #pivot_list + '
)
) AS pvt
'
PRINT #sql
EXEC (#sql)
Just because I wanted to get some more experience with CTEs, I came up with the following:
WITH CTE(CTEstring, CTEids, CTElast_id)
AS
(
SELECT string, CAST(id AS VARCHAR(1000)), id
FROM dbo.Test_Pivot TP1
WHERE NOT EXISTS (SELECT * FROM dbo.Test_Pivot TP2 WHERE TP2.string = TP1.string AND TP2.id < TP1.id)
UNION ALL
SELECT CTEstring, CAST(CTEids + ' ' + CAST(TP.id AS VARCHAR) AS VARCHAR(1000)), TP.id
FROM dbo.Test_Pivot TP
INNER JOIN CTE ON
CTE.CTEstring = TP.string
WHERE
TP.id > CTE.CTElast_id AND
NOT EXISTS (SELECT * FROM dbo.Test_Pivot WHERE string = CTE.CTEstring AND id > CTE.CTElast_id AND id < TP.id)
)
SELECT
t1.CTEstring, t1.CTEids
FROM CTE t1
INNER JOIN (SELECT CTEstring, MAX(LEN(CTEids)) AS max_len_ids FROM CTE GROUP BY CTEstring) SQ ON SQ.CTEstring = t1.CTEstring AND SQ.max_len_ids = LEN(t1.CTEids)
ORDER BY CTEstring
GO
It might need some tweaking, but it worked with your example
The coalesce function could also be used here, similar to other questions that have been asked about concatenating data.
How to create a SQL Server function to "join" multiple rows from a subquery into a single delimited field?
This related question should have the answer you need: SQL Server: Examples of PIVOTing String data
A Matrix control in SSRS has dynamic columns, if this data is bound for a report anyways then you could use that. Otherwise you'll have to create a sql sproc that generates the sql like in the exaamples dynamicly and then executes it.
I'm not sure that what you're doing is really possible (or at least practical) in SQL - I'm not sure, because I'm still not exactly sure what you want to do.
You could build that pivot table in your client application, for example with:
select distinct Letter from MyTable
to get the list of letters, and then use a parameterized query inside a loop:
select Number from MyTable where Letter=:letter
to get the list of numbers for each letter.

Resources