Split string into rows in SQL Server

I have a string, ABCDEF, stored in the First_Name column of an EMP table.
I want to split this string into rows, as shown below.
Please note there is no delimiter, comma, or space. It is a string without any special characters, symbols, or spaces.
First_Name
A
B
C
D
E
F

A loop obviously comes to mind but we can do much better. This is where a tally or numbers table is perfect. I have a view in my system like this which creates such a table with 10,000 rows on demand. There are plenty of ways to create such a table, or you could create a persistent table for a slight performance gain.
create View [dbo].[cteTally] as
WITH
E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
select N from cteTally
Now that we have the view this type of thing is super painless using a totally set based approach and abandoning the idea of looping.
declare @Something varchar(20) = 'ABCDEF'
select SUBSTRING(@Something, N, 1)
from cteTally t
where t.N <= LEN(@Something) -- <= so the last character is included

Solving this is trivial if you have a Numbers table. No loops or cursors are necessary, resulting in performance that is orders of magnitude better than other solutions:
declare @name varchar(10)='ABCDEF'
select @name,SUBSTRING(@name,n,1)
from numbers
where n<=LEN(@name)
Or
select EMP.First_Name,SUBSTRING(EMP.First_Name,n,1)
from EMP,numbers
where n<=LEN(EMP.First_Name)
A Numbers table contains only numbers from 1 to a sufficiently large number. You can create such a table with the following statement (borrowed from the linked article):
SELECT TOP (1000000) n = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
INTO dbo.Numbers
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1);
CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers(n)

You can use a CTE to parse out each character of your string onto a new row:
DECLARE @data varchar(200) = 'ABCDEF'
;WITH CTE AS (
SELECT
1 as CharacterPosition,
SUBSTRING(@data,1,1) as Character
UNION ALL
SELECT
CharacterPosition + 1,
SUBSTRING(@data,CharacterPosition + 1,1)
FROM
CTE
WHERE CharacterPosition < LEN(@data)
)
SELECT Character
FROM CTE
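One caveat with the recursive version: SQL Server stops a recursive CTE after 100 recursion levels by default, so a string longer than about 100 characters raises an error unless the limit is lifted. A minimal sketch of the same query with the limit removed, assuming a 150-character test value built with REPLICATE (the test value is not part of the original answer):

```sql
-- Assumption: a 150-character test string; any value longer than ~100 characters shows the issue.
DECLARE @data varchar(200) = REPLICATE('ABCDEF', 25);

WITH CTE AS (
    SELECT 1 AS CharacterPosition,
           SUBSTRING(@data, 1, 1) AS [Character]
    UNION ALL
    SELECT CharacterPosition + 1,
           SUBSTRING(@data, CharacterPosition + 1, 1)
    FROM CTE
    WHERE CharacterPosition < LEN(@data)
)
SELECT [Character]
FROM CTE
ORDER BY CharacterPosition
OPTION (MAXRECURSION 0);  -- 0 removes the default limit of 100 recursion levels
```

The ORDER BY on CharacterPosition is added here only to make the row order explicit; the tally-based answers avoid the recursion limit altogether.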

Related

Get all Numbers that are not existing from a Microsoft SQL-VarChar-Column

I have a table with a column of ID numbers that do not increase in single steps.
So there are numbers that are not used, and these are what I need. The column is a VarChar column.
For example:
Used numbers in the table = 2, 5, 7, 9, 10 etc.
So I need a query that gives me = 1, 3, 4, 6, 8 etc.
Pseudo-code, something like:
select numbers from Table NOT IN (select numbers from table)!
I have tried NOT IN and NOT EXISTS, but nothing works.
Can someone help me achieve this?
EDIT: The range of numbers is from 0 to 99999999!
DECLARE @Table AS TABLE
(
Id VARCHAR(5)
)
INSERT INTO @Table
VALUES
('1')
,('3')
,('5')
,('7')
,('10')
DECLARE @Range AS TABLE
(
RangeId VARCHAR(10)
)
INSERT INTO @Range
SELECT TOP (1000000) n = CONVERT(VARCHAR(10), ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1)
select
MissingId = RangeId
from
@Range AS R
LEFT OUTER JOIN @Table AS T ON T.Id = R.RangeId
WHERE
CONVERT(INT,R.RangeId) <= (SELECT MAX(CONVERT(INT,Id)) FROM @Table)
AND T.Id IS NULL
order by MissingId
As you don't mention what the upper limit is, and recursive Common Table Expressions are inherently slow, you would likely be better off with a Tally to achieve this:
CREATE TABLE dbo.YourTable (ID int);
INSERT INTO dbo.YourTable (ID)
VALUES(1),(3),(5),(7),(9),(11),(13),(15),(216); --Big jump on purpose
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT TOP (SELECT MAX(ID) FROM dbo.YourTable) --Limit the tally for performance
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3) --1000 rows, add more Ns for more rows
SELECT I AS ID
FROM Tally T
LEFT JOIN dbo.YourTable YT ON T.I = YT.ID
WHERE YT.ID IS NULL;
Warning: Based on the comment on another answer:
This is the right Direction. When I do it in my Situation, I get only 100 Numbers. But the Numbers have 8 Digits and there are MUCH more then 100 Numbers!
8 digits means you have IDs with a value of 10,000,000 (10 Million) plus. Creating over 10 million rows in a tally will be very IO intensive. I strongly suggest putting this into batches.
Edit2: OK, the max (from a comment on this answer) is 99,999,999! This is information that definitely should have been in the question. This process must be batched or you will kill your transaction logs.
Without using a CTE, you can use the master.dbo.spt_values trick. I'm not sure of the actual purpose of this table in master, but it contains the values we need. Give it a try. If you have bigger values than spt_values holds, divide your max id by spt_values' max and replace number+1 with number+1+(@currentbatch*@maxsptvalues) (the first batch is batch 0). I haven't tested it or written the code for it, but something of that sort should work. You can do it in a WHILE loop, for instance.
IF OBJECT_ID('tmptbl') IS NOT null
DROP TABLE tmptbl
GO
SELECT * INTO tmptbl
FROM
(
SELECT '1' [id]
UNION
SELECT '3'
UNION
SELECT '5' ) t
DECLARE @maxid INT = 0
SELECT @maxid = MAX(CONVERT(INT,[id])) FROM tmptbl -- compare as numbers, not as strings
SELECT number+1
FROM master.dbo.spt_values
WHERE number < @maxid
AND Type = 'p'
AND NOT EXISTS ( SELECT 1
FROM dbo.tmptbl
WHERE CONVERT(INT,[id]) = (number+1))
ORDER BY number
The Result:
2,4
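The batching idea described in this answer was left uncoded, so here is a rough sketch of how it might look. It assumes the type 'P' rows of master.dbo.spt_values hold a contiguous run of numbers starting at 0 (the table is undocumented, so treat that as an assumption), and it uses a hypothetical #ids temp table and @missing table variable rather than the answer's tmptbl:

```sql
-- Assumptions: #ids is hypothetical sample data; spt_values type 'P' rows are 0..(count - 1).
CREATE TABLE #ids (id varchar(10));
INSERT INTO #ids (id) VALUES ('1'), ('3'), ('5'), ('5000');  -- big jump on purpose

DECLARE @maxid int, @batchsize int, @currentbatch int = 0;
SELECT @maxid = MAX(CONVERT(int, id)) FROM #ids;
SELECT @batchsize = COUNT(*) FROM master.dbo.spt_values WHERE type = 'P';

DECLARE @missing TABLE (id int PRIMARY KEY);

-- Each pass shifts the 0-based numbers by one batch, as the answer suggests.
WHILE @currentbatch * @batchsize < @maxid
BEGIN
    INSERT INTO @missing (id)
    SELECT v.number + 1 + (@currentbatch * @batchsize)
    FROM master.dbo.spt_values AS v
    WHERE v.type = 'P'
      AND v.number + 1 + (@currentbatch * @batchsize) <= @maxid
      AND NOT EXISTS (SELECT 1
                      FROM #ids AS i
                      WHERE CONVERT(int, i.id) = v.number + 1 + (@currentbatch * @batchsize));
    SET @currentbatch += 1;
END;

SELECT id FROM @missing ORDER BY id;
DROP TABLE #ids;
```

For a range as large as the one in the question, collecting every missing value in a table variable may itself be too heavy; writing each batch to a permanent table is probably the safer variant.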

Remove special characters and numbers from column

How can I remove all special characters and numbers, except spaces, from a specific column in Microsoft SQL Server?
The links above all use loops to solve this. There is no need to resort to loops for this type of thing. Instead we can use a tally table. I keep a tally table as a view in my system like this.
create View [dbo].[cteTally] as
WITH
E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
select N from cteTally
GO
Now we can leverage that tally as a set. This is a LOT faster than those loop based solutions.
create function GetOnlyCharacters
(
@SearchVal varchar(8000)
) returns table as return
with MyValues as
(
select substring(@SearchVal, N, 1) as number
, t.N
from cteTally t
where N <= len(@SearchVal)
and substring(@SearchVal, N, 1) like '[a-z ]' -- keep letters and spaces, as the question asks
)
select distinct NumValue = STUFF((select number + ''
from MyValues mv2
order by mv2.N
for xml path('')), 1, 0, '')
from MyValues mv
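To apply the function to a whole column rather than a single value, it can be called once per row with APPLY. A small sketch, assuming the cteTally view and GetOnlyCharacters function above were created in the dbo schema; the @People table variable and its FullName column are hypothetical sample data:

```sql
-- Assumptions: dbo.cteTally and dbo.GetOnlyCharacters exist; @People is made-up sample data.
DECLARE @People TABLE (FullName varchar(100));
INSERT INTO @People (FullName)
VALUES ('J0hn!'), ('Mary-Ann 2nd'), ('12345');

SELECT p.FullName, c.NumValue
FROM @People AS p
OUTER APPLY dbo.GetOnlyCharacters(p.FullName) AS c;  -- OUTER APPLY keeps rows where nothing matches
```

The function returns no row when a value contains no matching characters, so CROSS APPLY would drop such rows while OUTER APPLY returns them with a NULL NumValue.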

Efficiently insert sequential numbers 1-N and renumber duplicates

I have a table whose primary key is a positive integer:
CREATE TABLE T
(
ID int PRIMARY KEY CHECK (ID > 0) -- not an IDENTITY column
-- ... other irrelevant columns...
)
Given a positive integer N, I want to insert N records with the IDs 1–N, inclusive. However, if a record with a particular ID already exists, I want to instead insert the next highest unused ID. For example, with N = 5:
If the table contains...   Then insert...
(Nothing)                  1,2,3,4,5
1,2,3                      4,5,6,7,8
3,6,9,12                   1,2,4,5,7
Here's a naïve way to do this:
DECLARE @N int = 5 -- number of records to insert
DECLARE @ID int = 1 -- next candidate ID
WHILE @N > 0 -- repeat N times
BEGIN
WHILE EXISTS(SELECT * FROM T WHERE ID = @ID) -- conflicting record?
SET @ID = @ID + 1
INSERT T VALUES (@ID)
SET @ID = @ID + 1
SET @N = @N - 1
END
But if E is the number of existing records, then in the worst case, this code performs E + N SELECTs and N INSERTs, which is quite inefficient.
Is there a smart way to perform this task with a small number of SELECTs and just one INSERT?
You can use a tally table and NOT IN I suppose...
WITH
E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
select N into #temp from cteTally
declare @table table (i int)
insert into @table
values
(3),
(6),
(9),
(12)
insert into @table
select top 5 N from #temp where N not in (select i from @table) order by N
select * from @table
drop table #temp
Credit to @SeanLange for stressing tally tables and originally showing me.
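The same idea can be written as a single INSERT without the intermediate temp table. This is only a sketch: #T stands in for the question's table T, and the 10,000-row tally is enough only while the number of existing rows plus @N stays below 10,000.

```sql
-- Assumption: #T is a stand-in for the question's table T, seeded with the third example row.
CREATE TABLE #T (ID int PRIMARY KEY CHECK (ID > 0));
INSERT INTO #T (ID) VALUES (3), (6), (9), (12);

DECLARE @N int = 5;  -- number of records to insert

WITH
E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) dt(n)),
E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),   -- 100 rows
E4(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b),   -- 10,000 rows
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4)
INSERT INTO #T (ID)
SELECT TOP (@N) ct.N
FROM cteTally AS ct
WHERE NOT EXISTS (SELECT 1 FROM #T AS t WHERE t.ID = ct.N)  -- skip IDs already in use
ORDER BY ct.N;                                              -- lowest unused IDs first

SELECT ID FROM #T ORDER BY ID;  -- 1,2,4,5,7 are added to the original 3,6,9,12
DROP TABLE #T;
```

NOT EXISTS is used instead of NOT IN mainly as a habit; with a non-nullable key column the two behave the same here.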
Try this;
insert into T
select top 5
[ID]
from
(
select
[ID]=RANK()over(order by [ID])+5
from
T
union
select [ID]=1 union
select [ID]=2 union
select [ID]=3 union
select [ID]=4 union
select [ID]=5
)IDs
where
not exists(select 1 from T data where data.ID=IDs.ID)
order by
[ID] -- without an ORDER BY, TOP 5 may return any five of the candidates
Doesn't need temp tables and possibly easier to read and maintain (happy to be corrected :))

INSERT INTO slows down table-valued FUNCTION

On the SQL Server 2008 side I have a table-valued function that receives 45k integer IDs merged into a single VARBINARY(MAX), splits them, and returns them as a table. SplitIds takes up to 5s. As I see in the estimated execution plan, 100% is 'Table Insert'. Is it possible to speed up this function somehow?
ALTER FUNCTION [dbo].[SplitIds](@data VARBINARY(MAX))
RETURNS @result TABLE(Id INT NOT NULL)
AS
BEGIN
IF @data IS NULL
RETURN
DECLARE @ptr INT = 0, @size INT = 4
WHILE @ptr * @size < LEN(@data)
BEGIN
INSERT INTO @result(Id)
VALUES(SUBSTRING(@data, @ptr * @size + 1, @size))
SET @ptr += 1
END
RETURN
END
Currently on the C# side it is used in a Linq-to-SQL query in the following way:
XDbOrder[] orders =
database.SplitIds(ConvertToVarbinary(orderIds)).
Join(
database.Get<XDbOrder>(),
r => r.Id,
o => o.Id,
(r, o) => o).
ToArray();
More general question: is it possible in Linq-to-SQL to implement the following without SplitIds? .Contains does not work: it creates a query with more than 2100 SQL parameters and crashes.
int[] orderIds = { ... 45k random entries .....};
XDbOrder[] orders =
database.Get<XDbOrder>().
Where(o => orderIds.Contains(o.Id)).
ToArray();
You could try a more set based approach.
(I've kept the multi statement TVF approach because the inline approach to generating a table of numbers works well in isolation but the execution plans when incorporated into a larger query can be quite catastrophically bad - this ensures that the split happens once and only once)
I've also added a Primary Key to the return table so it contains a useful index.
CREATE FUNCTION [dbo].[SplitIds](@data VARBINARY(MAX))
RETURNS @result TABLE(Id INT NOT NULL PRIMARY KEY WITH (IGNORE_DUP_KEY=ON))
AS
BEGIN
IF @data IS NULL
RETURN
DECLARE @size INT = 4;
WITH E1(N)
AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), -- 1*10^1 or 10 rows
E2(N)
AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows
E4(N)
AS (SELECT 1 FROM E2 a, E2 b), -- 1*10^4 or 10,000 rows
E8(N)
AS (SELECT 1 FROM E4 a, E4 b), -- 1*10^8 or 100,000,000 rows
Nums(N)
AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
FROM E8)
INSERT INTO @result
(Id)
SELECT TOP (DATALENGTH(@data)/@size) SUBSTRING(@data, N * @size + 1, @size)
FROM Nums
RETURN
END
The following completes in about 160ms for me
DECLARE @data VARBINARY(MAX) = 0x
WHILE DATALENGTH(@data) < 184000
SET @data = @data + CRYPT_GEN_RANDOM(8000)
SELECT COUNT(*)
FROM [dbo].[SplitIds](@data)
Here is my version of a set-based approach:
create FUNCTION [dbo].[SplitIds1](@data VARBINARY(MAX))
returns table with SCHEMABINDING
as
return
WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 10
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b), -- 10*10
e3(n) AS (SELECT 1 FROM e1 CROSS JOIN e2), -- 10*100
e4(n) AS (SELECT 1 FROM e3 A CROSS JOIN e3 B), -- 1000*1000
Numbers(ptr,Size) AS (SELECT ROW_NUMBER() OVER (ORDER BY n)-1,4 FROM e4)
SELECT SUBSTRING(@data, ptr * Size + 1, Size) as Id
FROM Numbers
WHERE ptr * Size < LEN(@data)
A few notes about my approach:
Adding SCHEMABINDING to the function avoids an unnecessary Table Spool operator in the execution plan.
I also removed the @size variable, since the size is hard-coded inside the function.
I changed the multi-statement table-valued function to an inline table-valued function, which lets you see the execution plan of the SELECT statement inside the function just like any view or SELECT query.
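For anyone testing these functions, it may help to see how the packed input is laid out. The sketch below builds a three-ID value by concatenating 4-byte conversions and decodes it with the same SUBSTRING arithmetic the functions use; the small VALUES list is a stand-in for the numbers CTE, and the sample IDs are made up.

```sql
-- Assumption: sample IDs 10, 20, 30; each int converts to exactly 4 bytes.
DECLARE @data varbinary(max) =
      CONVERT(varbinary(4), 10)
    + CONVERT(varbinary(4), 20)
    + CONVERT(varbinary(4), 30);

-- ptr is the zero-based slot; slot ptr starts at byte ptr * 4 + 1.
SELECT v.ptr,
       CONVERT(int, SUBSTRING(@data, v.ptr * 4 + 1, 4)) AS Id
FROM (VALUES (0), (1), (2), (3), (4)) AS v(ptr)   -- stand-in for the tally
WHERE v.ptr * 4 < DATALENGTH(@data);              -- returns 10, 20, 30
```

Note that the inline version above returns the raw 4-byte slices, so a caller that needs integers would wrap its Id column in CONVERT(int, ...) in the same way.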

Insert row for each integer between 0 and <value> without cursor

I have a source table with id and count.
id count
a 5
b 2
c 31
I need to populate a destination table with each integer up to the count for each id.
id value
a 1
a 2
a 3
a 4
a 5
b 1
b 2
c 1
c 2
etc...
My current solution is like so:
INSERT INTO destination (id,value)
SELECT
source.id,
sequence.number
FROM
(VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9)) AS sequence(number)
INNER JOIN
source ON sequence.number <= source.count
This solution has an upper limit and is plain lame. Is there any way to replace the sequence with a set of all integers? Or is there another solution that does not use looping?
this should work:
WITH r AS (
SELECT id, count, 1 AS n FROM SourceTable
UNION ALL
SELECT id, count, n+1 FROM r WHERE n<count
)
SELECT id,n FROM r
order by id,n
OPTION (MAXRECURSION 0)
Unfortunately, there is no set of all integers in SQL Server. However, using a little trickery, you can easily generate such a set:
select N from (
select ROW_NUMBER() OVER (ORDER BY t1.object_id) AS N
from sys.all_objects t1, sys.all_objects t2
) AS numbers
where N between 1 and 1000000
will generate a set of all numbers from 1 through 1000000. If you need more than a few million numbers, add sys.all_objects to the cross join a third time.
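Plugged into the original problem, the generated numbers simply replace the hard-coded VALUES list. A sketch, assuming #source and #destination as stand-ins for the question's tables; the TOP clause is an addition that caps the generated set at the largest count so the full cross join is not numbered:

```sql
-- Assumptions: #source and #destination are hypothetical stand-ins for the question's tables.
CREATE TABLE #source (id varchar(10), [count] int);
CREATE TABLE #destination (id varchar(10), value int);
INSERT INTO #source (id, [count]) VALUES ('a', 5), ('b', 2), ('c', 31);

WITH numbers AS (
    SELECT TOP (SELECT MAX([count]) FROM #source)   -- only as many numbers as needed
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
    FROM sys.all_objects AS t1
    CROSS JOIN sys.all_objects AS t2
)
INSERT INTO #destination (id, value)
SELECT s.id, n.N
FROM #source AS s
JOIN numbers AS n
  ON n.N <= s.[count];                              -- one row per integer up to the count

SELECT id, value FROM #destination ORDER BY id, value;
DROP TABLE #source, #destination;
```

With the sample data this produces 5 rows for a, 2 for b, and 31 for c, with no loop and no fixed upper limit beyond the size of the cross join.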
You can find many examples in this page:
DECLARE @table TABLE (ID VARCHAR(1), counter INT)
INSERT INTO @table SELECT 'a', 5
INSERT INTO @table SELECT 'b', 3
INSERT INTO @table SELECT 'c', 31
;WITH cte (ID, counter) AS (
SELECT id, 1
FROM @table
UNION ALL
SELECT c.id, c.counter +1
FROM cte AS c
INNER JOIN @table AS t
ON t.id = c.id
WHERE c.counter + 1 <= t.counter
)
SELECT *
FROM cte
ORDER BY ID, Counter
