How to extract every 7 characters of an nvarchar into another table?

I have an nvarchar(200) called ColumnA in Table1 that contains, for example, the value:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
I want to extract every 7 characters into Table2, ColumnB and end up with all of these values below.
ABCDEFG
BCDEFGH
CDEFGHI
DEFGHIJ
EFGHIJK
FGHIJKL
GHIJKLM
HIJKLMN
IJKLMNO
JKLMNOP
KLMNOPQ
LMNOPQR
MNOPQRS
NOPQRST
OPQRSTU
PQRSTUV
QRSTUVW
RSTUVWX
STUVWXY
TUVWXYZ
[Not the real table and column names.]
The data is being loaded to Table1 and Table2 in an SSIS Package, and I'm puzzling whether it is better to do the string handling in TSQL in a SQL Task or parse out the string in a VB Script Component.
[Yes, I think we're the last four on the planet using VB in Script Components. I cannot persuade the other three that this C# thing is here to stay. Although, maybe it is a perfect time to go rogue.]

You can use a recursive CTE calculating the offsets step by step and substring().
WITH
cte
AS
(
SELECT 1 n
UNION ALL
SELECT n + 1 n
FROM cte
WHERE n + 1 <= len('ABCDEFGHIJKLMNOPQRSTUVWXYZ') - 7 + 1
)
SELECT substring('ABCDEFGHIJKLMNOPQRSTUVWXYZ', n, 7)
FROM cte;

If you have a physical numbers table, this is easy. If not, you can create a tally-on-the-fly:
DECLARE @string VARCHAR(100)='ABCDEFGHIJKLMNOPQRSTUVWXYZ';
--We create the tally using ROW_NUMBER against any table with enough rows.
WITH Tally(Nmbr) AS
(SELECT TOP(LEN(@string)-6) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
SELECT Nmbr
,SUBSTRING(@string,Nmbr,7) AS FragmentOf7
FROM Tally
ORDER BY Nmbr;
The idea in short:
The tally returns a list of numbers from 1 to n (n=LEN(@string)-6). Each number is used in SUBSTRING as the starting position.
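To apply the same idea to every row of the source table rather than to a single variable, the tally can be joined against the table. This is only a minimal sketch: Table1.ColumnA and Table2.ColumnB are the question's placeholder names, and the fixed TOP (200) assumes the nvarchar(200) column from the question.

```sql
-- Sketch only: Table1/ColumnA and Table2/ColumnB are the question's placeholder names.
WITH Tally(Nmbr) AS
(
    -- 200 numbers cover every possible start position in an nvarchar(200)
    SELECT TOP (200) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM master..spt_values
)
INSERT INTO Table2 (ColumnB)
SELECT SUBSTRING(t.ColumnA, n.Nmbr, 7)
FROM Table1 AS t
JOIN Tally AS n
  ON n.Nmbr <= LEN(t.ColumnA) - 6;  -- last start position that still yields 7 characters
```

Rows whose value is shorter than 7 characters produce no output rows. Note that LEN ignores trailing spaces, so trailing blanks in ColumnA are not counted.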

You can do it with T-SQL like this:
DECLARE C CURSOR LOCAL FOR SELECT [ColumnA] FROM [Table1]
OPEN C
DECLARE @Val nvarchar(200);
FETCH NEXT FROM C INTO @Val
WHILE @@FETCH_STATUS = 0 BEGIN
DECLARE @I INTEGER;
SELECT @I = 1;
WHILE @I <= LEN(@Val)-6 BEGIN
PRINT SUBSTRING(@Val, @I, 7) -- replace PRINT with an INSERT into Table2 to store the value
SELECT @I = @I + 1
END
FETCH NEXT FROM C INTO @Val
END
CLOSE C
DEALLOCATE C

Script Component solution
Assuming that the input Column name is Column1
Add a script component
Open the script component configuration form
Go to Inputs and Outputs Tab
Click on the Output icon and set the Synchronous Input property to None
Add an Output column (example outColumn1)
In the Script editor, use code similar to this in the row processing function:
Dim idx As Integer = 0
' idx is the zero-based start position; stop when fewer than 7 characters remain
While Row.Column1.Length >= idx + 7
Output0Buffer.AddRow()
Output0Buffer.outColumn1 = Row.Column1.Substring(idx, 7)
idx += 1
End While

Related

Searching for multiple patterns in a string in T-SQL

In t-sql my dilemma is that I have to parse a potentially long string (up to 500 characters) for any of over 230 possible values and remove them from the string for reporting purposes. These values are a column in another table and they're all upper case and 4 characters long with the exception of two that are 5 characters long.
Examples of these values are:
USFRI
PROME
AZCH
TXJS
NYDS
XVIV. . . . .
Example of string before:
"Offered to XVIV and USFRI as back ups. No response as of yet."
Example of string after:
"Offered to and as back ups. No response as of yet."
Pretty sure it will have to be a UDF but I'm unable to come up with anything other than stripping ALL the upper case characters out of the string with PATINDEX which is not the objective.

This is unavoidably kludgy, but one way is to split your string into rows. Once you have a set of words the rest is easy: simply re-aggregate while ignoring the matching values*:
with t as (
select 'Offered to XVIV and USFRI as back ups. No response as of yet.' s
union select 'Another row AZCH and TXJS words.'
), v as (
select * from (values('USFRI'),('PROME'),('AZCH'),('TXJS'),('NYDS'),('XVIV'))v(v)
)
select t.s OriginalString, s.Removed
from t
cross apply (
select String_Agg(j.[value], ' ') within group(order by Convert(tinyint,j.[key])) Removed
from OpenJson(Concat('["',replace(s, ' ', '","'),'"]')) j
where not exists (select * from v where v.v = j.[value])
)s;
* Requires a fully-supported version of SQL Server: STRING_AGG needs SQL Server 2017 or later, and OPENJSON needs 2016 or later.

Build a function that cleans one sentence, then call that function from your query, something like this: SELECT Col1, dbo.fn_ReplaceValue(Col1) AS cleanValue, * FROM MySentencesTable. Your fn_ReplaceValue will be something like the code below. You could also create the table variable outside the function and pass it in as a parameter to speed up the process, but this way it is all self-contained.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION dbo.fn_ReplaceValue(@sentence VARCHAR(500))
RETURNS VARCHAR(500)
AS
BEGIN
DECLARE @ResultVar VARCHAR(500)
DECLARE @allValues TABLE (rowID int, sValues VARCHAR(15))
DECLARE @id INT = 0
DECLARE @ReplaceVal VARCHAR(15)
DECLARE @numberOfValues INT = (SELECT COUNT(*) FROM MyValuesTable)
--Populate table variable with all values
INSERT @allValues
SELECT ROW_NUMBER() OVER(ORDER BY MyValuesCol) AS rowID, MyValuesCol
FROM MyValuesTable
SET @ResultVar = @sentence
--rowID runs from 1 to @numberOfValues; going past it would set @ReplaceVal to NULL,
--and REPLACE with a NULL argument returns NULL
WHILE (@id < @numberOfValues)
BEGIN
SET @id = @id + 1
SET @ReplaceVal = (SELECT sValues FROM @allValues WHERE rowID = @id)
SET @ResultVar = REPLACE(@ResultVar, @ReplaceVal, SPACE(0))
END
RETURN @ResultVar
END
GO

I suggest creating a table (either temporary or permanent), and loading these 230 string values into this table. Then use it in the following delete:
DELETE
FROM yourTable
WHERE col IN (SELECT col FROM tempTable);
If you just want to view your data sans these values, then use:
SELECT *
FROM yourTable
WHERE col NOT IN (SELECT col FROM tempTable);

SQL Server Management Studio using While loop function to split result

I have a task to find a way to use a WHILE loop to split a single result set into many. The table is a simple column in (var) format and looks like this:
SELECT [Names ID]
FROM [Names Database]
The result is just a column of numbers, some of which are repeated many times. My question: is there a way to use a WHILE loop to split the result, grouped by those [Names ID] numbers, so that it looks as if I had used SELECT with a WHERE filter for each different number?
I used this:
SELECT CUSTOMER_ID
,DENSE_RANK() OVER(ORDER BY CUSTOMER_ID) as [number]
INTO [ID NUMBERS]
FROM [Customer]
USE [TEST];
GO
DECLARE @N int = 0
WHILE (SELECT max(NUMBER) FROM [ID NUMBERS] ) > @N
BEGIN
SET @N = @N + 1
SELECT CUSTOMER_ID,number FROM [ID NUMBERS]
END
I used DENSE_RANK so that I would not have to step through the long CUSTOMER_ID values one by one. The result I get looks like:
How can I fix it to look like this:
I managed to get the result I want by using this:
SELECT distinct CUSTOMER_ID
,DENSE_RANK() OVER(ORDER BY CUSTOMER_ID) as [number]
INTO [TEST].[Trainee].[ID NUMBERS]
FROM [TEST]
USE [TEST];
GO
DECLARE @N BIGINT = 1
WHILE (SELECT max(NUMBER) FROM [TEST].[ID NUMBERS] ) >= @N
BEGIN
SELECT @N
,ID.[Customer_ID]
,[number]
FROM [TEST].[ID NUMBERS] AS ID
WHERE @N = NUMBER
SET @N = @N + 1
END

You can do a WHILE loop from the MIN value of the table to the MAX value, increasing it by one each time, and doing an IF EXISTS(...) SELECT... inside the loop.
EDIT based on edit to question:
I don't know what made you think you could use DENSE_RANK as a shortcut. You can't.
Start with @N as the MIN Customer_ID.
In the loop, SELECT the rows where Customer_ID=@N. Be sure to use IF EXISTS() so you don't get empty result sets for non-existing customer numbers.
Loop WHILE @N is less than or equal to the MAX Customer_ID.
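The three steps above might be sketched as follows. This is only an illustration and assumes that CUSTOMER_ID is an integer column in the [Customer] table from the question.

```sql
-- Sketch only: assumes CUSTOMER_ID is an integer column in [Customer].
DECLARE @N bigint, @Max bigint;
SELECT @N = MIN(CUSTOMER_ID), @Max = MAX(CUSTOMER_ID) FROM [Customer];

WHILE @N <= @Max
BEGIN
    -- skip IDs that do not occur, so no empty result sets are returned
    IF EXISTS (SELECT 1 FROM [Customer] WHERE CUSTOMER_ID = @N)
        SELECT CUSTOMER_ID FROM [Customer] WHERE CUSTOMER_ID = @N;

    SET @N = @N + 1;
END
```

If the IDs are sparse, most iterations do nothing. In that case it may be cheaper to jump straight to the next existing value with SELECT @N = MIN(CUSTOMER_ID) FROM [Customer] WHERE CUSTOMER_ID > @N and to stop when @N becomes NULL.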

How can I complete this Excel function in SQL Server?

I have approximately 30,000 records where I need to split the Description field and so far I can only seem to achieve this in Excel. An example Description would be:
1USBCP 2RJ45C6 1DVI 1DP 3MD 3MLP HANDS
Below is my Excel function:
=TRIM(MID(SUBSTITUTE($G309," ",REPT(" ",LEN($G309))),((COLUMNS($G309:G309)-1)*LEN($G309))+1,LEN($G309)))
This is then dragged across ten Excel columns, and splits the description field at each space.
I have seen many questions asked about splitting a string in SQL but they only seem to cover one space, not multiple spaces.

There is no easy function in SQL Server to split strings, or at least I don't know of one. I usually use a trick that I found somewhere on the Internet some time ago, and I have adapted it to your example.
The trick is that we first figure out how many columns we need. We can do that by counting the spaces in the string. The easiest way is the length of the string minus the length of the string with the spaces removed.
After that, for each string we find the start and end position of each word. Finally, we simply cut the string at the start and end positions and assign the pieces to columns. The details are in the query. Have fun!
CREATE TABLE test(id int, data varchar(100))
INSERT INTO test VALUES (1,'1USBCP 2RJ45C6 1DVI 1DP 3MD 3MLP HANDS')
INSERT INTO test VALUES (2,'Shorter one')
DECLARE @pivot varchar(8000)
DECLARE @select varchar(8000)
SELECT
@pivot=coalesce(@pivot+',','')+'[col'+cast(number+1 as varchar(10))+']'
FROM
master..spt_values where type='p' and
number<=(SELECT max(len(data)-len(replace(data,' ',''))) FROM test)
SELECT
@select='
select p.*
from (
select
id,substring(data, start+2, endPos-Start-2) as token,
''col''+cast(row_number() over(partition by id order by start) as varchar(10)) as n
from (
select
id, data, n as start, charindex('' '',data,n+2) endPos
from (select number as n from master..spt_values where type=''p'') num
cross join
(
select
id, '' '' + data +'' '' as data
from
test
) m
where n < len(data)-1
and substring(data,n+1,1) = '' '') as data
) pvt
Pivot ( max(token)for n in ('+@pivot+'))p'
EXEC(@select)
I didn't notice that you want to get rid of multiple blank spaces.
To do that, create a function that prepares your data:
CREATE FUNCTION dbo.[fnRemoveExtraSpaces] (@Number AS varchar(1000))
Returns Varchar(1000)
As
Begin
Declare @n int -- Length of counter
Declare @old char(1)
Set @n = 1
--Begin Loop of field value
While @n <=Len (@Number)
BEGIN
If Substring(@Number, @n, 1) = ' ' AND @old = ' '
BEGIN
Select @Number = Stuff( @Number , @n , 1 , '' )
END
Else
BEGIN
SET @old = Substring(@Number, @n, 1)
Set @n = @n + 1
END
END
Return @Number
END
After that use the new version that removes extra spaces.
DECLARE @pivot varchar(8000)
DECLARE @select varchar(8000)
SELECT
@pivot=coalesce(@pivot+',','')+'[col'+cast(number+1 as varchar(10))+']'
FROM
master..spt_values where type='p' and
number<=(SELECT max(len(dbo.fnRemoveExtraSpaces(data))-len(replace(dbo.fnRemoveExtraSpaces(data),' ',''))) FROM test)
SELECT
@select='
select p.*
from (
select
id,substring(data, start+2, endPos-Start-2) as token,
''col''+cast(row_number() over(partition by id order by start) as varchar(10)) as n
from (
select
id, data, n as start, charindex('' '',data,n+2) endPos
from (select number as n from master..spt_values where type=''p'') num
cross join
(
select
id, '' '' + dbo.fnRemoveExtraSpaces(data) +'' '' as data
from
test
) m
where n < len(data)-1
and substring(data,n+1,1) = '' '') as data
) pvt
Pivot ( max(token)for n in ('+@pivot+'))p'
EXEC(@select)
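The column-count step that drives the pivot column list can be checked on its own. A small sketch, using the sample description from the question; the variable name @d is only illustrative:

```sql
-- Sketch only: @d is an illustrative variable holding the question's sample description.
DECLARE @d varchar(100) = '1USBCP 2RJ45C6 1DVI 1DP 3MD 3MLP HANDS';

-- spaces = length minus length with the spaces removed; tokens = spaces + 1
SELECT LEN(@d) - LEN(REPLACE(@d, ' ', ''))     AS SpaceCount,
       LEN(@d) - LEN(REPLACE(@d, ' ', '')) + 1 AS TokenCount;
```

The count is only reliable when the value has no trailing spaces, because LEN does not count them, and when repeated spaces have been collapsed first.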

I am probably not understanding your question, but all that you are doing in that formula, can be done almost exactly the same in SQL. I see someone has already answered but to my mind, how can it be necessary to do all that when you can do this. I might be wrong. But here goes.
declare @test as varchar(100)
set @test='abcd1234567'
select right(@test,2)
, left(@test,2)
, len(@test)
, case when len(@test)%2>0
then left(right(@test,round(len(@test)/2,0)+1),1)
else left(right(@test,round(len(@test)/2,0)+1),2) end
Results
67 ab 11 2
So right, left, length and mid can all be achieved.
If the spaces are the "substring" dividers, then a WHILE loop can do it. I don't remember the exact syntax well and have not actually done this before, but I don't see why it should not be possible. If that is not enough you need a temporary table, and if that does not work you need a cursor. The cursor would be an external loop around this one to fetch and process a single string at a time. Or you can do something more clever. I am just a novice.
declare @str varchar(100) -- this is your description. Fetch it and assign it. If in a cursor, just use the column name
declare @n integer -- position of the next space
declare @token varchar(100) -- the current substring
set @str = @str + ' ' -- append a space so the last substring is found too
while len(@str) > 0
begin
set @n = charindex(' ', @str)
set @token = left(@str, @n - 1)
--insert into or update #temptable with @token here
set @str = ltrim(substring(@str, @n + 1, len(@str))) -- drop the processed part, including the space
end
Inside the loop, @n locates the substring and left() cuts it out. You could also SELECT it instead of storing it, but that gets messy if there are many substrings.
A final comment: the space appended to @str is what lets the loop handle the last "substring". Without it the code throws an error when it gets to the end of the string. This is a suggestion at least.

How to make unique random alphanumeric sequence in SQL Server

I want to make unique random alphanumeric sequence to be the primary key for a database table.
Each char in the sequence is either a letter (a-z) or number (0-9)
Examples for what I want :
kl7jd6fgw
zjba3s0tr
a9dkfdue3
I want to make a function that could handle that task!

You can use a uniqueidentifier. This can be generated with the NEWID() function:
SELECT NEWID()
will return something like:
BE228C22-C18A-4B4A-9AD5-1232462F7BA9

It is a very bad idea to use random strings as a primary key.
It will affect performance as well as storage size, and you will be much better off using an int or a bigint with an identity property.
However, generating a random string in SQL may be useful for other things, and this is why I offer this solution:
Create a table to hold permitted char values.
In my example the permitted chars are 0-9 and A-Z.
CREATE TABLE Chars (C char(1))
DECLARE @i as int = 0
WHILE @i < 10
BEGIN
INSERT INTO Chars (C) VALUES (CAST(@i as Char(1)))
SET @i = @i+1
END
SET @i = 65
WHILE @i < 91
BEGIN
INSERT INTO Chars (C) VALUES (CHAR(@i))
SET @i = @i+1
END
Then use this simple select statement to generate a random string from this table:
SELECT TOP 10 C AS [text()]
FROM Chars
ORDER BY NEWID()
FOR XML PATH('')
The advantages:
You can easily control the allowed characters.
The generation of a new string is a simple select statement and not manipulation on strings.
The disadvantages:
This select results with an ugly name (i.e XML_F52E2B61-18A1-11d1-B105-00805F49916B). This is easily solved by setting the result into a local variable.
Characters will only appear once in every string. This can easily be solved by adding union:
example:
SELECT TOP 10 C AS [text()]
FROM (
SELECT * FROM Chars
UNION ALL SELECT * FROM Chars
) InnerSelect
ORDER BY NEWID()
FOR XML PATH('')
Another option is to use STUFF function instead of As [Text()] to eliminate those pesky XML tags:
SELECT STUFF((
SELECT TOP 100 ''+ C
FROM Chars
ORDER BY NEWID()
FOR XML PATH('')
), 1, 1, '') As RandomString;
This option doesn't have the disadvantage of the ugly column name, and can have an alias directly. The execution plan is a little different, but it should not suffer much performance loss.
If there are any more advantages / disadvantages you think of please leave a comment. Thanks.

The NEWID() function generates unique values, so I generated a series of them in a loop and picked out a combination of alphanumeric characters using the CHARINDEX and LEFT functions.
;with list as
(
select 1 as id,newid() as val
union all
select id + 1,NEWID()
from list
where id + 1 < 100
)
select ID,left(val, charindex('-', val) - 2) from list
option (maxrecursion 0)

The drawback of NEWID() for this request is that it limits the character pool to 0-9 and A-F. To define your own character pool, you have to roll a custom solution.
This solution adapted from Generating random strings with T-SQL
--Define list of characters to use in random string
DECLARE @CharPool VARCHAR(255)
SET @CharPool = '0123456789abcdefghijklmnopqrstuvwxyz'
--Store length of CharPool for use later
DECLARE @PoolLength TINYINT
SET @PoolLength = LEN(@CharPool) --36
--Define random string length
DECLARE @StringLength TINYINT
SET @StringLength = 9
--Declare target parameter for random string
DECLARE @RandomString VARCHAR(255)
SET @RandomString = ''
--Loop control variable
DECLARE @LoopCount TINYINT
SET @LoopCount = 0
--For each char in string, choose random char from char pool
WHILE(@LoopCount < @StringLength)
BEGIN
--RAND() * @PoolLength truncates to 0..35; add 1 because SUBSTRING positions start at 1
SELECT @RandomString += SUBSTRING(@CharPool, CONVERT(int, RAND() * @PoolLength) + 1, 1)
SELECT @LoopCount += 1
END
SELECT @RandomString
http://sqlfiddle.com/#!6/9eecb/4354
I must reiterate, however, that I agree with the others: this is a horrible idea.

Generating a unique key in SQL Server on a per company basis

I store positions in a SQL Server 2012 database, where each position is defined by a position number and a company number.
The position numbers are unique for each company only.
For instance, my database could have the following
POSITION_NO COMPANY_NO
1 1
2 1
3 1
1 2
2 2
3 2
1 3
I need a function which takes a company number as a parameter, and returns the next sequential position number, which in the example table above would be 2 for COMPANY_NO = 3
What I use at the moment is:
CREATE PROCEDURE [DB].[GenerateKey]
@p_company_no float(53),
@return_value_argument float(53) OUTPUT
AS
BEGIN
DECLARE
@v_position_no numeric(5, 0)
SELECT @v_position_no = max(POSITION_NO) + 1
FROM DB.POSITION_TABLE with (nolock)
WHERE COMPANY_NO = @p_company_no
SET @return_value_argument = @v_position_no
RETURN
END
I am aware of the potential pitfalls of using with (nolock), but this was added in an unsuccessful attempt to prevent data-locks on my database. In fact, besides the fact that well-written code is obviously preferable, the main reason I am asking this question is to try and cut down the amount of places that could be causing the data-lock.
Is there any way my code could be improved?

Create an auxiliary table with sequences, with one row for every company (as you already did):
create table seq (company int, sequence int);
go
Seed the counters, one for every company (say there are two companies, 1 and 2):
insert seq values
(1, 1), (2, 1);
go
Then all you need is a way to both update and select the new value in a single statement to avoid race conditions. This is how to do it:
declare @next int;
declare @company int;
set @company = 2;
update seq
set @next = sequence = sequence + 1
where company = @company;
select @next
It would be nice to wrap this in a scalar function, but unfortunately updates are not allowed in functions. But you already have a stored procedure in place, so just modify the code in it.
And please tell me that the datatypes used are not really floats? Why not ints?
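Wrapped in a stored procedure, the single-statement update could look roughly like this. It is a sketch under assumptions: the procedure name dbo.GenerateKey_Seq and its parameter names are hypothetical, int is used instead of float, and seq is the auxiliary table created above.

```sql
-- Sketch only: dbo.GenerateKey_Seq and its parameter names are hypothetical.
CREATE PROCEDURE dbo.GenerateKey_Seq
    @p_company_no int,
    @next_position_no int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    -- increment the counter and read the new value in one statement
    UPDATE seq
    SET @next_position_no = sequence = sequence + 1
    WHERE company = @p_company_no;
END
GO

-- usage
DECLARE @next int;
EXEC dbo.GenerateKey_Seq @p_company_no = 2, @next_position_no = @next OUTPUT;
SELECT @next AS next_position_no;
```

If seq has no row for the company, the UPDATE touches nothing and the output parameter keeps whatever value was passed in (NULL in the usage example), so the caller should check for that.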

WHILE(1=1)
BEGIN
SELECT @v_position_no = max(POSITION_NO)
FROM DB.POSITION_TABLE with (nolock)
WHERE COMPANY_NO = @p_company_no
INSERT INTO DB.POSITION_TABLE
(COMPANY_NO, POSITION_NO)
SELECT TOP 1 @p_company_no, @v_position_no + 1
FROM DB.POSITION_TABLE with (nolock)
WHERE NOT EXISTS (SELECT 1
FROM DB.POSITION_TABLE with (nolock)
WHERE COMPANY_NO = @p_company_no
AND POSITION_NO = @v_position_no + 1)
IF(@@ROWCOUNT > 0)
BREAK;
END
SET @return_value_argument = @v_position_no + 1
Note that the second statement only inserts if POSITION_NO + 1 has not been added in the meantime. If it has, the loop tries again.