I have created a user defined function to gain performance with queries containing 'WHERE col IN (...)' like this case:
SELECT myCol1, myCol2
FROM myTable
WHERE myCol3 IN (100, 200, 300, ..., 4900, 5000);
The queries are generated from an web application and are in some cases much more complex.
The function definition looks like this:
CREATE FUNCTION [dbo].[udf_CSVtoIntTable]
(
#CSV VARCHAR(MAX),
#Delimiter CHAR(1) = ','
)
RETURNS
#Result TABLE
(
[Value] INT
)
AS
BEGIN
DECLARE #CurrStartPos SMALLINT;
SET #CurrStartPos = 1;
DECLARE #CurrEndPos SMALLINT;
SET #CurrEndPos = 1;
DECLARE #TotalLength SMALLINT;
-- Remove space, tab, linefeed, carrier return
SET #CSV = REPLACE(#CSV, ' ', '');
SET #CSV = REPLACE(#CSV, CHAR(9), '');
SET #CSV = REPLACE(#CSV, CHAR(10), '');
SET #CSV = REPLACE(#CSV, CHAR(13), '');
-- Add extra delimiter if needed
IF NOT RIGHT(#CSV, 1) = #Delimiter
SET #CSV = #CSV + #Delimiter;
-- Get total string length
SET #TotalLength = LEN(#CSV);
WHILE #CurrStartPos < #TotalLength
BEGIN
SET #CurrEndPos = CHARINDEX(#Delimiter, #CSV, #CurrStartPos);
INSERT INTO #Result
VALUES (CAST(SUBSTRING(#CSV, #CurrStartPos, #CurrEndPos - #CurrStartPos) AS INT));
SET #CurrStartPos = #CurrEndPos + 1;
END
RETURN
END
The function is intended to be used like this (or as an INNER JOIN):
SELECT myCol1, myCol2
FROM myTable
WHERE myCol3 IN (
SELECT [Value]
FROM dbo.udf_CSVtoIntTable('100, 200, 300, ..., 4900, 5000', ',');
Do anyone have some optimiztion idears of my function or other ways to improve performance in my case?
Is there any drawbacks that I have missed?
I am using MS SQL Server 2005 Std and .NET 2.0 framework.
I'm not sure of the performance increase, but I would use it as an inner join and get away from the inner select statement.
Using a UDF in a WHERE clause or (worse) a subquery is asking for trouble. The optimizer sometimes gets it right, but often gets it wrong and evaluates the function once for every row in your query, which you don't want.
If your parameters are static (they appear to be) and you can issue a multistatement batch, I'd load the results of your UDF into a table variable, then use a join against the table variable to do your filtering. This should work more reliably.
that loop will kill performance!
create a table like this:
CREATE TABLE Numbers
(
Number int not null primary key
)
that has rows containing values 1 to 8000 or so and use this function:
CREATE FUNCTION [dbo].[FN_ListAllToNumberTable]
(
#SplitOn char(1) --REQUIRED, the character to split the #List string on
,#List varchar(8000) --REQUIRED, the list to split apart
)
RETURNS
#ParsedList table
(
RowNumber int
,ListValue varchar(500)
)
AS
BEGIN
/*
DESCRIPTION: Takes the given #List string and splits it apart based on the given #SplitOn character.
A table is returned, one row per split item, with a columns named "RowNumber" and "ListValue".
This function workes for fixed or variable lenght items.
Empty and null items will be included in the results set.
PARAMETERS:
#List varchar(8000) --REQUIRED, the list to split apart
#SplitOn char(1) --OPTIONAL, the character to split the #List string on, defaults to a comma ","
RETURN VALUES:
a table, one row per item in the list, with a column name "ListValue"
TEST WITH:
----------
SELECT * FROM dbo.FN_ListAllToNumTable(',','1,12,123,1234,54321,6,A,*,|||,,,,B')
DECLARE #InputList varchar(200)
SET #InputList='17;184;75;495'
SELECT
'well formed list',LEFT(#InputList,40) AS InputList,h.Name
FROM Employee h
INNER JOIN dbo.FN_ListAllToNumTable(';',#InputList) dt ON h.EmployeeID=dt.ListValue
WHERE dt.ListValue IS NOT NULL
SET #InputList='17;;;184;75;495;;;'
SELECT
'poorly formed list join',LEFT(#InputList,40) AS InputList,h.Name
FROM Employee h
INNER JOIN dbo.FN_ListAllToNumTable(';',#InputList) dt ON h.EmployeeID=dt.ListValue
SELECT
'poorly formed list',LEFT(#InputList,40) AS InputList, ListValue
FROM dbo.FN_ListAllToNumTable(';',#InputList)
**/
/*this will return empty rows, and row numbers*/
INSERT INTO #ParsedList
(RowNumber,ListValue)
SELECT
ROW_NUMBER() OVER(ORDER BY number) AS RowNumber
,LTRIM(RTRIM(SUBSTRING(ListValue, number+1, CHARINDEX(#SplitOn, ListValue, number+1)-number - 1))) AS ListValue
FROM (
SELECT #SplitOn + #List + #SplitOn AS ListValue
) AS InnerQuery
INNER JOIN Numbers n ON n.Number < LEN(InnerQuery.ListValue)
WHERE SUBSTRING(ListValue, number, 1) = #SplitOn
RETURN
END /*Function FN_ListAllToNumTable*/
I have other versions that do not return empty or null rows, ones that return just the item and not the row number, etc. Look in the header comment to see how to use this as part of a JOIN, which is much faster than in a where clause.
The CLR solution did not give me an good performance so I will use a recursive query. So here is the definition of the SP I will use (mostly based on Erland Sommarskogs examples):
CREATE FUNCTION [dbo].[priudf_CSVtoIntTable]
(
#CSV VARCHAR(MAX),
#Delimiter CHAR(1) = ','
)
RETURNS
#Result TABLE
(
[Value] INT
)
AS
BEGIN
-- Remove space, tab, linefeed, carrier return
SET #CSV = REPLACE(#CSV, ' ', '');
SET #CSV = REPLACE(#CSV, CHAR(9), '');
SET #CSV = REPLACE(#CSV, CHAR(10), '');
SET #CSV = REPLACE(#CSV, CHAR(13), '');
WITH csvtbl(start, stop) AS
(
SELECT start = CONVERT(BIGINT, 1),
stop = CHARINDEX(#Delimiter, #CSV + #Delimiter)
UNION ALL
SELECT start = stop + 1,
stop = CHARINDEX(#Delimiter, #CSV + #Delimiter, stop + 1)
FROM csvtbl
WHERE stop > 0
)
INSERT INTO #Result
SELECT CAST(SUBSTRING(#CSV, start, CASE WHEN stop > 0 THEN stop - start ELSE 0 END) AS INT) AS [Value]
FROM csvtbl
WHERE stop > 0
OPTION (MAXRECURSION 1000)
RETURN
END
Thank for the input, I have to admit that I have made som bad research before I started my work. I found that Erland Sommarskog has written a lot of this problem on his webpage, after your responeses and after reading his page I decided that I will try to make a CLR to solve this.
I tried a recursive query, this resulted in good performance but I will try CLR function anyway.
Related
In t-sql my dilemma is that I have to parse a potentially long string (up to 500 characters) for any of over 230 possible values and remove them from the string for reporting purposes. These values are a column in another table and they're all upper case and 4 characters long with the exception of two that are 5 characters long.
Examples of these values are:
USFRI
PROME
AZCH
TXJS
NYDS
XVIV. . . . .
Example of string before:
"Offered to XVIV and USFRI as back ups. No response as of yet."
Example of string after:
"Offered to and as back ups. No response as of yet."
Pretty sure it will have to be a UDF but I'm unable to come up with anything other than stripping ALL the upper case characters out of the string with PATINDEX which is not the objective.
This is unavoidably cludgy but one way is to split your string into rows, once you have a set of words the rest is easy; Simply re-aggregate while ignoring the matching values*:
with t as (
select 'Offered to XVIV and USFRI as back ups. No response as of yet.' s
union select 'Another row AZCH and TXJS words.'
), v as (
select * from (values('USFRI'),('PROME'),('AZCH'),('TXJS'),('NYDS'),('XVIV'))v(v)
)
select t.s OriginalString, s.Removed
from t
cross apply (
select String_Agg(j.[value], ' ') within group(order by Convert(tinyint,j.[key])) Removed
from OpenJson(Concat('["',replace(s, ' ', '","'),'"]')) j
where not exists (select * from v where v.v = j.[value])
)s;
* Requires a fully-supported version of SQL Server.
build a function to do the cleaning of one sentence, then call that function from your query, something like this SELECT Col1, dbo.fn_ReplaceValue(Col1) AS cleanValue, * FROM MySentencesTable. Your fn_ReplaceValue will be something like the code below, you could also create the table variable outside the function and pass it as parameter to speed up the process, but this way is all self contained.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION fn_ReplaceValue(#sentence VARCHAR(500))
RETURNS VARCHAR(500)
AS
BEGIN
DECLARE #ResultVar VARCHAR(500)
DECLARE #allValues TABLE (rowID int, sValues VARCHAR(15))
DECLARE #id INT = 0
DECLARE #ReplaceVal VARCHAR(10)
DECLARE #numberOfValues INT = (SELECT COUNT(*) FROM MyValuesTable)
--Populate table variable with all values
INSERT #allValues
SELECT ROW_NUMBER() OVER(ORDER BY MyValuesCol) AS rowID, MyValuesCol
FROM MyValuesTable
SET #ResultVar = #sentence
WHILE (#id <= #numberOfValues)
BEGIN
SET #id = #id + 1
SET #ReplaceVal = (SELECT sValue FROM #allValues WHERE rowID = #id)
SET #ResultVar = REPLACE(#ResultVar, #ReplaceVal, SPACE(0))
END
RETURN #ResultVar
END
GO
I suggest creating a table (either temporary or permanent), and loading these 230 string values into this table. Then use it in the following delete:
DELETE
FROM yourTable
WHERE col IN (SELECT col FROM tempTable);
If you just want to view your data sans these values, then use:
SELECT *
FROM yourTable
WHERE col NOT IN (SELECT col FROM tempTable);
I have a string like this:
Apple
I want to include a separator after each character so the end result will turn out like this:
A,p,p,l,e
In C#, we have one liner method to achieve the above with Regex.Replace('Apple', ".{1}", "$0,");
I can only think of looping each character with charindex to append the separator but seems a little complicated. Is there any elegant way and simpler way to achieve this?
Thanks HABO for the suggestions. I'm able to generate the result that I want using the code but takes a little bit of time to really understand how the code work.
After some searching, I manage to found one useful article to insert empty spaces between each character and it's easier for me to understand.
I modify the code a little to define and include desire separator instead of fixing it to space as the separator:
DECLARE #pos INT = 2 -- location where we want first space
DECLARE #result VARCHAR(100) = 'Apple'
DECLARE #separator nvarchar(5) = ','
WHILE #pos < LEN(#result)+1
BEGIN
SET #result = STUFF(#result, #pos, 0, #separator);
SET #pos = #pos+2;
END
select #result; -- Output: A,p,p,l,e
Reference
In following SQL scripts, I get each character using SUBSTRING() function using with a number table (basically I used spt_values view here for simplicity) and then I concatenate them via two different methods, you can choose one
If you are using SQL Server 2017, we have a new SQL string aggregation function
First script uses string_agg function
declare #str nvarchar(max) = 'Apple'
SELECT
string_agg( substring(#str,number,1) , ',') Within Group (Order By number)
FROM master..spt_values n
WHERE
Type = 'P' and
Number between 1 and len(#str)
If you are working with a previous version, you can use string concatenation using FOR XML Path and SQL Stuff function as follows
declare #str nvarchar(max) = 'Apple'
; with cte as (
SELECT
number,
substring(#str,number,1) as L
FROM master..spt_values n
WHERE
Type = 'P' and
Number between 1 and len(#str)
)
SELECT
STUFF(
(
SELECT
',' + L
FROM cte
order by number
FOR XML PATH('')
), 1, 1, ''
)
Both solution yields the same result, I hope it helps
If you have SQL Server 2017 and a copy of ngrams8k it's ultra simple:
declare #word varchar(100) = 'apple';
select newString = string_agg(token, ',') within group (order by position)
from dbo.ngrams8k(#word,1);
For pre-2017 systems it's almost as simple:
declare #word varchar(100) = 'apple';
select newstring =
( select token + case len(#word)+1-position when 1 then '' else ',' end
from dbo.ngrams8k(#word,1)
order by position
for xml path(''))
One ugly way to do it is to split the string into characters, ideally using a numbers table, and reassemble it with the desired separator.
A less efficient implementation uses recursion in a CTE to split the characters and insert the separator between pairs of characters as it goes:
declare #Sample as VarChar(20) = 'Apple';
declare #Separator as Char = ',';
with Characters as (
select 1 as Position, Substring( #Sample, 1, 1 ) as Character
union all
select Position + 1,
case when Position & 1 = 1 then #Separator else Substring( #Sample, Position / 2 + 1, 1 ) end
from Characters
where Position < 2 * Len( #Sample ) - 1 )
select Stuff( ( select Character + '' from Characters order by Position for XML Path( '' ) ), 1, 0, '' ) as Result;
You can replace the select Stuff... line with select * from Characters; to see what's going on.
Try this
declare #var varchar(50) ='Apple'
;WITH CTE
AS
(
SELECT
SeqNo = 1,
MyStr = #var,
OpStr = CAST('' AS VARCHAR(50))
UNION ALL
SELECT
SeqNo = SeqNo+1,
MyStr = MyStR,
OpStr = CAST(ISNULL(OpStr,'')+SUBSTRING(MyStR,SeqNo,1)+',' AS VARCHAR(50))
FROM CTE
WHERE SeqNo <= LEN(#var)
)
SELECT
OpStr = LEFT(OpStr,LEN(OpStr)-1)
FROM CTE
WHERE SeqNo = LEN(#Var)+1
I'm currently working on a problem where certain characters need to be cleaned from strings that exist in a table. Normally I'd do a simple UPDATE with a replace, but in this case there are 32 different characters that need to be removed.
I've done some looking around and I can't find any great solutions for quickly cleaning strings that already exist in a table.
Things I've looked into:
Doing a series of nested replaces
This solution is do-able, but for 32 different replaces it would require either some ugly code, or hacky dynamic sql to build a huge series of replaces.
PATINDEX and while loops
As seen in this answer it is possible to mimic a kind of regex replace, but I'm working with a lot of data so I'm hesitant to even trust the improved solution to run in a reasonable amount of time when the data volume is large.
Recursive CTEs
I tried a CTE approuch to this problem, but it didn't run terribly fast once the number of rows got large.
For reference:
CREATE TABLE #BadChar(
id int IDENTITY(1,1),
badString nvarchar(10),
replaceString nvarchar(10)
);
INSERT INTO #BadChar(badString, replaceString) SELECT 'A', '^';
INSERT INTO #BadChar(badString, replaceString) SELECT 'B', '}';
INSERT INTO #BadChar(badString, replaceString) SELECT 's', '5';
INSERT INTO #BadChar(badString, replaceString) SELECT '-', ' ';
CREATE TABLE #CleanMe(
clean_id int IDENTITY(1,1),
DirtyString nvarchar(20)
);
DECLARE #i int;
SET #i = 0;
WHILE #i < 100000 BEGIN
INSERT INTO #CleanMe(DirtyString) SELECT 'AAAAA';
INSERT INTO #CleanMe(DirtyString) SELECT 'BBBBB';
INSERT INTO #CleanMe(DirtyString) SELECT 'AB-String-BA';
SET #i = #i + 1
END;
WITH FixedString (Step, String, cid) AS (
SELECT 1 AS Step, REPLACE(DirtyString, badString, replaceString), clean_id
FROM #BadChar, #CleanMe
WHERE id = 1
UNION ALL
SELECT Step + 1, REPLACE(String, badString, replaceString), cid
FROM FixedString AS T1
JOIN #BadChar AS T2 ON T1.step + 1 = T2.id
Join #CleanMe AS T3 on T1.cid = t3.clean_id
)
SELECT String FROM FixedString WHERE step = (SELECT MAX(STEP) FROM FixedString);
DROP TABLE #BadChar;
DROP TABLE #CleanMe;
Use a CLR
It seems like this is a common solution many people use, but the environment I'm in doesn't make this a very easy one to embark on.
Are there any other ways to go about this I've over looked? Or any improvements upon the methods I've already looked into for this?
Leveraging the idea from Alan Burstein's solution, you could do something like this, if you wanted to hard code the bad/replace strings. This would work for bad/replace strings longer than a single character as well.
CREATE FUNCTION [dbo].[CleanStringV1]
(
#String nvarchar(4000)
)
RETURNS nvarchar(4000) WITH SCHEMABINDING AS
BEGIN
SELECT #string = REPLACE
(
#string COLLATE Latin1_General_BIN,
badString,
replaceString
)
FROM
(VALUES
('A', '^')
, ('B', '}')
, ('s', '5')
, ('-', ' ')
) t(badString, replaceString)
RETURN #string;
END;
Or, if you have a table containing the bad/replace strings, then
CREATE FUNCTION [dbo].[CleanStringV2]
(
#String nvarchar(4000)
)
RETURNS nvarchar(4000) AS
BEGIN
SELECT #string = REPLACE
(
#string COLLATE Latin1_General_BIN,
badString,
replaceString
)
FROM BadChar
RETURN #string;
END;
These are case sensitive. You can remove the COLLATE bit if you want case insensitive. I did a few small tests, and these were not much slower than nested REPLACE. The first one with the hardcoded strings was a the faster of the two, and was nearly as fast as nested REPLACE.
Given the below table and data:
IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
DROP TABLE #Temp
CREATE TABLE #Temp
(
ID INT,
Code INT,
PDescription VARCHAR(2000)
)
INSERT INTO #Temp
(ID,
Code,
PDescription)
VALUES (1,0001,'c and d, together'),
(2,0002,'equals or Exceeds $27.00'),
(3,0003,'Fruit Evaporating Or preserving'),
(4,0004,'Domestics And domestic Maintenance'),
(5,0005,'Bakeries and cracker')
SELECT *
FROM #Temp
DROP TABLE #Temp
Output:
ID Code PDescription
1 1 c and d, together
2 2 equals or Exceeds $27.00
3 3 Fruit Evaporating Or preserving
4 4 Domestics And domestic Maintenance
5 5 Bakeries and cracker
I need a way to achieve the below update to the description field:
ID Code PDescription
1 1 C and D, Together
2 2 Equals or Exceeds $27.00
3 3 Fruit Evaporating or Preserving
4 4 Domestics and Domestic Maintenance
5 5 Bakeries and Cracker
If you fancied going the SQL CLR route the function could look something like
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
public partial class UserDefinedFunctions
{
//One or more "word characters" or apostrophes
private static readonly Regex _regex = new Regex("[\\w']+");
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString ProperCase(SqlString subjectString)
{
string resultString = null;
if (!subjectString.IsNull)
{
resultString = _regex.Replace(subjectString.ToString().ToLowerInvariant(),
(Match match) =>
{
var word = match.Value;
switch (word)
{
case "or":
case "of":
case "and":
return word;
default:
return char.ToUpper(word[0]) + word.Substring(1);
}
});
}
return new SqlString(resultString);
}
}
Doubtless there may be Globalization issues in the above but it should do the job for English text.
You could also investigate TextInfo.ToTitleCase but that still leaves you needing to handle your exceptions.
The following function is not the most elegant of solutions but should do what you want.
ALTER FUNCTION [dbo].[ToProperCase](#textValue AS NVARCHAR(2000))
RETURNS NVARCHAR(2000)
AS
BEGIN
DECLARE #reset BIT;
DECLARE #properCase NVARCHAR(2000);
DECLARE #index INT;
DECLARE #character NCHAR(1);
SELECT #reset = 1, #index=1, #properCase = '';
WHILE (#index <= len(#textValue))
BEGIN
SELECT #character= substring(#textValue,#index,1),
#properCase = #properCase + CASE WHEN #reset=1 THEN UPPER(#character) ELSE LOWER(#character) END,
#reset = CASE WHEN #character LIKE N'[a-zA-Z\'']' THEN 0 ELSE 1 END,
#index = #index +1
END
SET #properCase = N' ' + #properCase + N' ';
SET #properCase = REPLACE(#properCase, N' And ', N' and ');
SET #properCase = REPLACE(#properCase, N' Or ', N' or ');
SET #properCase = REPLACE(#properCase, N' Of ', N' of ');
RETURN RTRIM(LTRIM(#properCase))
END
Example use:
IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
DROP TABLE #Temp
CREATE TABLE #Temp
(
ID INT,
Code INT,
PDescription VARCHAR(2000)
)
INSERT INTO #Temp
(ID,
Code,
PDescription)
VALUES (1,0001, N'c and d, together and'),
(2,0002, N'equals or Exceeds $27.00'),
(3,0003, N'Fruit Evaporating Or preserving'),
(4,0004, N'Domestics And domestic Maintenance'),
(5,0005, N'Bakeries and cracker')
SELECT ID, Code, dbo.ToProperCase(PDescription) AS [Desc]
FROM #Temp
DROP TABLE #Temp
If you want to convert your text to proper case before inserting into table, then simply call function as follow:
INSERT INTO #Temp
(ID,
Code,
PDescription)
VALUES (1,0001, dbo.ToProperCase( N'c and d, together and')),
(2,0002, dbo.ToProperCase( N'equals or Exceeds $27.00')),
(3,0003, dbo.ToProperCase( N'Fruit Evaporating Or preserving')),
(4,0004, dbo.ToProperCase( N'Domestics And domestic Maintenance')),
(5,0005, dbo.ToProperCase( N'Bakeries and cracker'))
This is a dramatically modified version of my Proper UDF. The good news is you may be able to process the entire data-set in ONE SHOT rather than linear.
Take note of #OverR (override)
Declare #Table table (ID int,Code int,PDescription varchar(150))
Insert into #Table values
(1,1,'c and d, together'),
(2,2,'equals or Exceeds $27.00'),
(3,3,'Fruit Evaporating Or preserving'),
(4,4,'Domestics And domestic Maintenance'),
(5,5,'Bakeries and cracker')
-- Generate Base Mapping Table - Can be an Actual Table
Declare #Pattn table (Key_Value varchar(25));Insert into #Pattn values (' '),('-'),('_'),(','),('.'),('&'),('#'),(' Mc'),(' O''') -- ,(' Mac')
Declare #Alpha table (Key_Value varchar(25));Insert Into #Alpha values ('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('I'),('J'),('K'),('L'),('M'),('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U'),('V'),('W'),('X'),('Y'),('X')
Declare #OverR table (Key_Value varchar(25));Insert Into #OverR values (' and '),(' or '),(' of ')
Declare #Map Table (MapSeq int,MapFrom varchar(25),MapTo varchar(25))
Insert Into #Map
Select MapSeq=1,MapFrom=A.Key_Value+B.Key_Value,MapTo=A.Key_Value+B.Key_Value From #Pattn A Join #Alpha B on 1=1
Union All
Select MapSeq=99,MapFrom=A.Key_Value,MapTo=A.Key_Value From #OverR A
-- Convert Base Data Into XML
Declare #XML xml
Set #XML = (Select KeyID=ID,String=+' '+lower(PDescription)+' ' from #Table For XML RAW)
-- Convert XML to varchar(max) for Global Search & Replace
Declare #String varchar(max)
Select #String = cast(#XML as varchar(max))
Select #String = Replace(#String,MapFrom,MapTo) From #Map Order by MapSeq
-- Convert Back to XML
Select #XML = cast(#String as XML)
-- Generate Final Results
Select KeyID = t.col.value('#KeyID', 'int')
,NewString = ltrim(rtrim(t.col.value('#String', 'varchar(150)')))
From #XML.nodes('/row') AS t (col)
Order By 1
Returns
KeyID NewString
1 C and D, Together
2 Equals or Exceeds $27.00
3 Fruit Evaporating or Preserving
4 Domestics and Domestic Maintenance
5 Bakeries and Cracker
You don't even need functions and temporary objects. Take a look at this query:
WITH Processor AS
(
SELECT ID, Code, 1 step,
CONVERT(nvarchar(MAX),'') done,
LEFT(PDescription, CHARINDEX(' ', PDescription, 0)-1) process,
SUBSTRING(PDescription, CHARINDEX(' ', PDescription, 0)+1, LEN(PDescription)) waiting
FROM #temp
UNION ALL
SELECT ID, Code, step+1,
done+' '+CASE WHEN process IN ('and', 'or', 'of') THEN LOWER(process) ELSE UPPER(LEFT(process, 1))+LOWER(SUBSTRING(process, 2, LEN(process))) END,
CASE WHEN CHARINDEX(' ', waiting, 0)>0 THEN LEFT(waiting, CHARINDEX(' ', waiting, 0)-1) ELSE waiting END,
CASE WHEN CHARINDEX(' ', waiting, 0)>0 THEN SUBSTRING(waiting, CHARINDEX(' ', waiting, 0)+1, LEN(waiting)) ELSE NULL END FROM Processor
WHERE process IS NOT NULL
)
SELECT ID, Code, done PSDescription FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY step DESC) RowNum FROM Processor
) Ordered
WHERE RowNum=1
ORDER BY ID
It produces desired result as well. You can SELECT * FROM Processor to see all steps executed.
I have approximately 30,000 records where I need to split the Description field and so far I can only seem to achieve this in Excel. An example Description would be:
1USBCP 2RJ45C6 1DVI 1DP 3MD 3MLP HANDS
Below is my Excel function:
=TRIM(MID(SUBSTITUTE($G309," ",REPT(" ",LEN($G309))),((COLUMNS($G309:G309)-1)*LEN($G309))+1,LEN($G309)))
This is then dragged across ten Excel columns, and splits the description field at each space.
I have seen many questions asked about splitting a string in SQL but they only seem to cover one space, not multiple spaces.
There is no easy function in SQL server to split strings. At least I don't know it. I use usually some trick that I found somewhere in the Internet some time ago. I modified it to your example.
The trick is that first we try to figure out how many columns do we need. We can do it by checking how many empty strings we have in the string. The easiest way is lenght of string - lenght of string without empty string.
After that for each string we try to find start and end of each word by position. At the end we cut simply string by start and end position and assign to coulmns. The details are in the query. Have fun!
CREATE TABLE test(id int, data varchar(100))
INSERT INTO test VALUES (1,'1USBCP 2RJ45C6 1DVI 1DP 3MD 3MLP HANDS')
INSERT INTO test VALUES (2,'Shorter one')
DECLARE #pivot varchar(8000)
DECLARE #select varchar(8000)
SELECT
#pivot=coalesce(#pivot+',','')+'[col'+cast(number+1 as varchar(10))+']'
FROM
master..spt_values where type='p' and
number<=(SELECT max(len(data)-len(replace(data,',',''))) FROM test)
SELECT
#select='
select p.*
from (
select
id,substring(data, start+2, endPos-Start-2) as token,
''col''+cast(row_number() over(partition by id order by start) as varchar(10)) as n
from (
select
id, data, n as start, charindex('','',data,n+2) endPos
from (select number as n from master..spt_values where type=''p'') num
cross join
(
select
id, '' '' + data +'' '' as data
from
test
) m
where n < len(data)-1
and substring(odata,n+1,1) = '','') as data
) pvt
Pivot ( max(token)for n in ('+#pivot+'))p'
EXEC(#select)
Here you can find example in SQL Fiddle
I didn't notice that you want to get rid of multiple blank spaces.
To do it please create some function that preprare your data :
CREATE FUNCTION dbo.[fnRemoveExtraSpaces] (#Number AS varchar(1000))
Returns Varchar(1000)
As
Begin
Declare #n int -- Length of counter
Declare #old char(1)
Set #n = 1
--Begin Loop of field value
While #n <=Len (#Number)
BEGIN
If Substring(#Number, #n, 1) = ' ' AND #old = ' '
BEGIN
Select #Number = Stuff( #Number , #n , 1 , '' )
END
Else
BEGIN
SET #old = Substring(#Number, #n, 1)
Set #n = #n + 1
END
END
Return #number
END
After that use the new version that removes extra spaces.
DECLARE #pivot varchar(8000)
DECLARE #select varchar(8000)
SELECT
#pivot=coalesce(#pivot+',','')+'[col'+cast(number+1 as varchar(10))+']'
FROM
master..spt_values where type='p' and
number<=(SELECT max(len(dbo.fnRemoveExtraSpaces(data))-len(replace(dbo.fnRemoveExtraSpaces(data),' ',''))) FROM test)
SELECT
#select='
select p.*
from (
select
id,substring(data, start+2, endPos-Start-2) as token,
''col''+cast(row_number() over(partition by id order by start) as varchar(10)) as n
from (
select
id, data, n as start, charindex('' '',data,n+2) endPos
from (select number as n from master..spt_values where type=''p'') num
cross join
(
select
id, '' '' + dbo.fnRemoveExtraSpaces(data) +'' '' as data
from
test
) m
where n < len(data)-1
and substring(data,n+1,1) = '' '') as data
) pvt
Pivot ( max(token)for n in ('+#pivot+'))p'
EXEC(#select)
I am probably not understanding your question, but all that you are doing in that formula, can be done almost exactly the same in SQL. I see someone has already answered but to my mind, how can it be necessary to do all that when you can do this. I might be wrong. But here goes.
declare #test as varchar(100)
set #test='abcd1234567'
select right(#test,2)
, left(#test,2)
, len(#test)
, case when len(#test)%2>0
then left(right(#test,round(len(#test)/2,0)+1),1)
else left(right(#test,round(len(#test)/2,0)+1),2) end
Results
67 ab 11 2
So right, left, length and mid can all be achieved.
If the spaces are the "substring" dividers, then: I dont remember well the actual syntax for do-while inside selects of sql, neither have i actually done that per se, but I don't see why it should not be possible. If it doesn't work then you need a temporary table and if that does not work you need a cursor. The cursor would be an external loop around this one to fetch and process a single string at a time. Or you can do something more clever. I am just a novice.
declare #x varchar(1)
declare #n integer
declare #i integer
declare #str varchar(100) -- this is your description. Fetch it and assign it. if in a cursor just use column-name
set #x = null
set #n = 0
set #i = 0
while n < len(#str)
while NOT #x = " "
begin
set #x = left(right(#str,n),1)
n = n+1
end
--insert into or update #temptable blablabla here.
Use i and n to locate substring and then left(right()) it out. or you can SELECT it, but that is a messy procedure if the number of substrings are long. Continue with:
set i = n
set #str = right(#str, i) -- this includes the " ". left() it out at will.
end
Now, a final comment, there should perhaps be a third loop checking for if you are at the last "substring" because I see now this code will throw error when it gets to the end. or "add" an empty space at the end to #str, that will also work. But my time is up. This is a suggestion at least.