String Manipulation in SQL Server (STRING_SPLIT, REVERSE) - sql-server

I want to reverse the order of the sub-strings in an NVARCHAR column that are separated by a single character such as '-', for example:
Input column cl:
a1-b1-c1-d1
Required output:
d1-c1-b1-a1
I tried REVERSE(cl), the result was 1d-1c-1b-1a!
The best approach I think is using:
STRING_SPLIT(cl,'-')
and then reversing the resulting sub-strings and rejoining them, but since we don't know how many delimited sub-strings there are, this is still difficult to handle.
How can we achieve this?
Thank you in advance

REVERSE isn't what you are after here. What you are after is a string splitter that supports (is aware of) ordinal positions; STRING_SPLIT is documented as explicitly not guaranteeing the ordinal positions of values in a delimited string.
One function that is aware of ordinal positions is DelimitedSplit8K_LEAD. You can then use that, along with STRING_AGG, to recreate your delimited string:
SELECT STRING_AGG(DS.item,'-') WITHIN GROUP (ORDER BY DS.ItemNumber DESC) AS R
FROM (VALUES('a1-b1-c1-d1'))V(S)
CROSS APPLY dbo.DelimitedSplit8K_LEAD(V.S,'-') DS;
Of course, the real solution here is to stop storing delimited data in your database in the first place.

If using a UDF is not an option, you may try a JSON-based approach. You need to transform the input values into a valid JSON array (a1-b1-c1-d1 into ["a1","b1","c1","d1"]) and parse this array with OPENJSON():
CREATE TABLE Data (ColumnData nvarchar(max));
INSERT INTO Data (ColumnData) VALUES (N'a1-b1-c1-d1');
UPDATE Data
SET ColumnData = (
    SELECT STRING_AGG([value], N'-') WITHIN GROUP (ORDER BY CONVERT(int, [key]) DESC)
    FROM OPENJSON(CONCAT(N'["', REPLACE(STRING_ESCAPE(ColumnData, 'json'), N'-', '","'), '"]'))
);
Result:
ColumnData
-----------
d1-c1-b1-a1

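If you have a modern version of SQL Server (STRING_SPLIT requires 2016 or higher, STRING_AGG requires 2017 or higher), you can do it as you said: split the string with string_split, reverse its order, and aggregate the result with string_agg. Be aware that row_number() over (order by (select null)) is not guaranteed to number the values in their original order, so this relies on undocumented behaviour.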
with cte as (
select value,
row_number() over (order by (select null)) as number
from string_split('a1-b1-c1-d1', '-')
)
select string_agg(value, '-') within group (order by number desc)
from cte
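On newer platforms the ordering problem can be avoided without a custom splitter. The following is a minimal sketch, assuming SQL Server 2022 (16.x) or Azure SQL Database, where STRING_SPLIT accepts an optional third argument (enable_ordinal) and returns an extra ordinal column; it will not run on older versions:

```sql
-- Assumes SQL Server 2022 (16.x) or Azure SQL Database:
-- passing 1 as the third argument adds an "ordinal" column (1-based position).
SELECT STRING_AGG(s.value, '-') WITHIN GROUP (ORDER BY s.ordinal DESC) AS reversed
FROM STRING_SPLIT('a1-b1-c1-d1', '-', 1) AS s;
```

Because the order comes from the documented ordinal column rather than from row_number(), the reversed result does not depend on how the engine happens to return the rows.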

This splits the string while preserving the ordinal position without using a custom splitter. To preserve the order of the values, CHARINDEX is used to find each value's position in the original string. Note that this only works if every value is distinct: duplicate values all resolve to the position of the first occurrence.
declare
    @string nvarchar(4000)='a1-b1-c1-d1',
    @added_delimitter CHAR(1)= '-';
;with ndx_split_cte(split_val, split_ndx) as (
    select
        sp.[value],
        CHARINDEX(@added_delimitter + sp.[value] + @added_delimitter, @added_delimitter + @string + @added_delimitter)
    from
        string_split(@string, '-') sp)
select string_agg(split_val, '-') within group (order by split_ndx desc) rev_split_str
from ndx_split_cte;
Results
rev_split_str
d1-c1-b1-a1

Related

I want to know How to Segregate Digits Chars and Special Chars In Sql server Using CTE?

I have a string which contains letters, special characters and numbers.
I have to separate all the letters into one column, the special characters into another column and the numbers into a third column, using a common table expression. It would be helpful if anyone could provide code that uses a common table expression to obtain the required result.
The Code what I have Tried So far:
DECLARE @search VARCHAR(200)
SET @search='123%#'
;with cte (Num,indexing) as (
SELECT
@search,0
UNION ALL
SELECT
indexing
num
from
cte
cross apply
(select indexing+1) C(NewLevel)
CROSS APPLY
(SELECT REPLACE(Num, NewLevel, '')) C2(NewInput)
WHERE
indexing<=LEN(NUM)
)
select Num
from cte
My input should be: abcd12345$###
My expected output is:
Column 1 Column 2 Column 3
12345 $#### abcd
I would, personally, use a Tally to split the string into separate characters, and then use conditional string aggregation to create the new strings:
DECLARE @YourString varchar(200) = 'abcd12345$###';
WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
    SELECT TOP(LEN(@YourString))
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1, N N2, N N3, N N4),
Chars AS(
    SELECT V.YourString,
           SS.C,
           T.I,
           CASE WHEN SS.C LIKE '[A-z]' THEN 'Letter'
                WHEN SS.C LIKE '[0-9]' THEN 'Number'
                ELSE 'Special'
           END AS CharType
    FROM (VALUES(@YourString)) V(YourString)
         CROSS JOIN Tally T
         CROSS APPLY (VALUES(SUBSTRING(V.YourString,T.I,1)))SS(C))
SELECT STRING_AGG(CASE CharType WHEN 'Letter' THEN C END,'') WITHIN GROUP (ORDER BY I) AS Letters,
       STRING_AGG(CASE CharType WHEN 'Number' THEN C END,'') WITHIN GROUP (ORDER BY I) AS Numbers,
       STRING_AGG(CASE CharType WHEN 'Special' THEN C END,'') WITHIN GROUP (ORDER BY I) AS Special
FROM Chars
GROUP BY YourString;
Note, as well, the use of whitespace, so you can clearly see where one CTE ends and another begins.
Explanation
The first CTE (N), is just 10 NULL values. I use 10 as multiples of 10 are easy to work with.
The next CTE (Tally) creates a Tally table. It cross joins N to itself 4 times, creating 10^4 rows, or 10,000 rows, each with an incrementing number, due to ROW_NUMBER. We don't need all of those rows, however, so I limit it to the length of your variable.
Then we get onto Chars; this splits the string into its individual characters, 1 row per character, numbers them and defines the type they are, a Letter, Number or Special Character.
Then we finally, after the CTEs, aggregate the values using a conditional STRING_AGG, so that only the character type you want is aggregated in the column.
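To see what the first two CTEs produce on their own, here is a reduced sketch of the same idea. It cross joins N only twice (100 rows is enough for the 13-character sample string, which is an assumption about the input size) and returns one row per character with its position:

```sql
-- Sketch only: same sample string as above, smaller Tally (10 x 10 = 100 rows).
DECLARE @YourString varchar(200) = 'abcd12345$###';

WITH N AS (
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)
),
Tally AS (
    -- Number the candidate rows and keep only as many as the string has characters
    SELECT TOP (LEN(@YourString))
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1, N N2
)
SELECT T.I,
       SUBSTRING(@YourString, T.I, 1) AS C   -- the character at position I
FROM Tally T
ORDER BY T.I;
```

Each row of this result is what the Chars CTE then classifies as a Letter, Number or Special character.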
If you "must" use an rCTE, you can do it this way, but, like I have mentioned, an rCTE is a far slower solution and suffers from maximum recursion errors (if you have more than 100 characters, you'll need to change the MAXRECURSION value in the OPTION clause):
DECLARE @YourString varchar(200) = 'abcd12345$###';
WITH Chars AS(
    SELECT V.YourString,
           1 AS I,
           SS.C,
           CASE WHEN SS.C LIKE '[A-z]' THEN 'Letter'
                WHEN SS.C LIKE '[0-9]' THEN 'Number'
                ELSE 'Special'
           END AS CharType
    FROM (VALUES(@YourString))V(YourString)
         CROSS APPLY (VALUES(SUBSTRING(@YourString,1,1)))SS(C)
    UNION ALL
    SELECT C.YourString,
           C.I + 1 AS I,
           SS.C,
           CASE WHEN SS.C LIKE '[A-z]' THEN 'Letter'
                WHEN SS.C LIKE '[0-9]' THEN 'Number'
                ELSE 'Special'
           END AS CharType
    FROM Chars C
         CROSS APPLY (VALUES(SUBSTRING(@YourString,C.I + 1,1)))SS(C)
    WHERE C.I + 1 <= LEN(C.YourString))
SELECT STRING_AGG(CASE CharType WHEN 'Letter' THEN C END,'') WITHIN GROUP (ORDER BY I) AS Letters,
       STRING_AGG(CASE CharType WHEN 'Number' THEN C END,'') WITHIN GROUP (ORDER BY I) AS Numbers,
       STRING_AGG(CASE CharType WHEN 'Special' THEN C END,'') WITHIN GROUP (ORDER BY I) AS Special
FROM Chars
GROUP BY YourString;
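The MAXRECURSION hint mentioned above goes in an OPTION clause at the end of the statement that uses the rCTE. A minimal sketch, using a hypothetical Numbers CTE that needs more than the default 100 levels of recursion:

```sql
-- Hypothetical example: counting to 150 exceeds the default limit of 100,
-- so the statement fails unless MAXRECURSION is raised.
WITH Numbers AS (
    SELECT 1 AS I
    UNION ALL
    SELECT I + 1
    FROM Numbers
    WHERE I < 150
)
SELECT COUNT(*) AS RowsGenerated
FROM Numbers
OPTION (MAXRECURSION 200);  -- 0 removes the limit; the maximum explicit value is 32767
```

The same OPTION clause would be appended after GROUP BY YourString in the query above if the string could be longer than 100 characters.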

Order by string returns wrong order

I am using a column which is named ItemCode. ItemCode is Varchar(50) type.
Here is my query
Select * from Inventory order by ItemCode
My result currently looks like this:
ItemCode-1
ItemCode-10
ItemCode-2
ItemCode-20
And so on.
How can I order my string as the example below?
ItemCode-1
ItemCode-2
ItemCode-10
ItemCode-20
Should I convert my column to a number? I should also mention that some values contain no number.
You could order by the numbers as
SELECT Str
FROM
(
VALUES
('ItemCode-1'),
('ItemCode-10'),
('ItemCode-2'),
('ItemCode-20')
) T(Str)
ORDER BY CAST(RIGHT(Str, LEN(Str) - CHARINDEX('-', Str)) AS INT)
Note: Since you tagged your Q with SQL Server 2008 tag, you should upgrade as soon as possible because it's out of support.
UPDATE:
Since you haven't provided good sample data, I'm just guessing.
Here is another way that may fit your requirements:
SELECT Str
FROM
(
VALUES
('ItemCode-1'),
('ItemCode-10'),
('ItemCode-2'),
('ItemCode-20'),
('Item-Code')
) T(Str)
ORDER BY CASE WHEN Str LIKE '%[0-9]' THEN CAST(RIGHT(Str, LEN(Str) - CHARINDEX('-', Str)) AS INT) ELSE 0 END
This is an expected behavior, based on this:
that is lexicographic sorting which means basically the language
treats the variables as strings and compares character by character
You need to use something like this:
ORDER BY
CASE WHEN ItemCode like '%[0-9]%'
THEN Replicate('0', 100 - Len(ItemCode)) + ItemCode
ELSE ItemCode END

Query for date value AFTER specified patindex value in TEXT column using SQL Server

I am querying for a date value that follows a specific phrase in the same text column value. There are a number of date values in this text column but I'm needing only the DATE following the phrase I've found via a patindex value. Any help/direction would be appreciated. Thanks.
Here is my SQL code:
SELECT
NotesSysID,
PATINDEX('%Demand Due Date:%', NoteText) AS [Index of DemandDueDate text],
PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%', NoteText) AS [Index of DemandDueDate date],
SUBSTRING(NoteText, PATINDEX('%Demand Due Date:%', NoteText), (PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%', NoteText)))
FROM
#temp_ExtractedNotes;
Here is a pic of my data, please note that the 2nd index is LESS than the Index of DemandDueDate text found and I'm needing the subsequent date AFTER the Index of DemandDueDate text column. Hopefully this makes sense.
I think your code will be easier for you to work with if you use a CTE and some stepwise refinement. That'll free you from trying to do everything with one hugely-nested SELECT statement.
;
WITH FirstCut as (
SELECT
NotesSysID,
LocationOfText = PATINDEX('%Demand Due Date:%', NoteText),
NoteText
FROM #temp_ExtractedNotes
),
SecondCut as (
SELECT
NotesSysID,
NoteText,
-- making assumption date will be within first 250 chars of text
DemandDueDateSection = SUBSTRING( NoteText, [LocationOfText], 250)
FROM FirstCut
),
ThirdCut as (
SELECT
NotesSysID,
NoteText,
DemandDueDateSection,
LocationOfDate = PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%', DemandDueDateSection)
FROM SecondCut
),
FourthCut as (
SELECT
NotesSysID,
NoteText,
DateAsText = SUBSTRING( DemandDueDateSection, LocationOfDate, 10 )
FROM ThirdCut
)
SELECT
NotesSysID,
NoteText,
DemandDueDate = CONVERT( DateTime, DateAsText)
FROM FourthCut
The issue you're having is because your second PATINDEX isn't finding the date you want; it's finding the first date in the string, which, as you noted, appears before the PATINDEX of the phrase you're searching for %Demand Due Date:%. The third parameter of SUBSTRING is LENGTH, which is to say how many characters after the second parameter you want to pull. By using that second PATINDEX value as the third parameter in your SUBSTRING you're returning a sub-string that starts where you want it to, and is of LENGTH equal to the number of characters into the string where that first date appears.
Which, of course, isn't what you want. To @ZLK's point in the comments, first, you need to do a nested PATINDEX. That's going to be pretty slow, so I'm hoping there aren't a bazillion records in your temp table.
Based on the sample image, it looks like the date you're interested in can appear a variable number of characters after Demand Due Date:. We'll start by adding 16 to the PATINDEX of %Demand Due Date:% (because that's how many characters are in Demand Due Date:, so we'll just start right after that). Then we'll pick up the next 100 characters. You can tweak that later if you need more or fewer.
So your first SUBSTRING will look like this:
SUBSTRING(NoteText, PATINDEX('%Demand Due Date:%', NoteText) + 16, 100)
Now we have to search that sub-string for the second pattern, the one that should yield a date for you.
PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%', SUBSTRING(NoteText, PATINDEX('%Demand Due Date:%', NoteText) + 16, 100))
The number returned there is the point where your date value starts within the 100 characters following %Demand Due Date:%. Armed with that number, you just need to SUBSTRING out the next ten characters, and, just for fun, CAST it as a DATE. That big ugly formula will look like this:
DECLARE @test VARCHAR(200) = 'foo bar Demand Due Date: 12/21/2018 bar foo foo bar';
SELECT
    CAST(
        SUBSTRING(
            SUBSTRING
                (@test, PATINDEX('%Demand Due Date:%', @test) + 16, 100),
            PATINDEX
                ( '%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',
                  SUBSTRING
                      (@test, PATINDEX('%Demand Due Date:%', @test) + 16, 100)
                )
            ,10)
    AS DATE);
Result:
2018-12-21
Rextester: https://rextester.com/KCY79989
Grab a copy of PatternSplitCM.
CREATE FUNCTION dbo.PatternSplitCM
(
    @List VARCHAR(8000) = NULL
   ,@Pattern VARCHAR(50)
) RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
WITH numbers AS (
    SELECT TOP(ISNULL(DATALENGTH(@List), 0))
        n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
    FROM
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n))
SELECT
    ItemNumber = ROW_NUMBER() OVER(ORDER BY MIN(n)),
    Item = SUBSTRING(@List,MIN(n),1+MAX(n)-MIN(n)),
    Matched
FROM (
    SELECT n, y.Matched, Grouper = n - ROW_NUMBER() OVER(ORDER BY y.Matched,n)
    FROM numbers
    CROSS APPLY (
        SELECT Matched = CASE WHEN SUBSTRING(@List,n,1) LIKE @Pattern THEN 1 ELSE 0 END
    ) y
) d
GROUP BY Matched, Grouper;
Then it's pretty easy:
-- Sample Data
DECLARE @temp_ExtractedNotes TABLE (someID INT IDENTITY, NoteText VARCHAR(8000));
INSERT @temp_ExtractedNotes (NoteText) VALUES
('blah blah blah 2/4/2016... Demand Due Date: 1/05/2011; blah blah 12/1/2017...'),
('Yada yad..... Demand Due Date: 11/21/2016;...'),
('adas dasd asd a sd asdas Demand Due Date: 09/09/2019... 5/05/2005, 10/10/2010....'),
('nothing to see here - moving on... 01/02/2003!');
-- Solution
SELECT TOP (1) WITH TIES t.someID, ns.s, DueDate = CASE SIGN(nt.s) WHEN 1 THEN f.item END
FROM @temp_ExtractedNotes AS t
CROSS APPLY (VALUES(CHARINDEX('Demand Due Date:',t.NoteText))) AS nt(s)
CROSS APPLY (VALUES(SUBSTRING(t.noteText, nt.s+16, 8000))) AS ns(s)
CROSS APPLY dbo.patternsplitCM(ns.s,'[0-9/]') AS f
WHERE f.matched = 1
ORDER BY ROW_NUMBER() OVER (PARTITION BY t.someID ORDER BY f.itemNumber);
Results:
someID      s                                         DueDate
----------- ----------------------------------------- ------------
1           1/05/2011; blah blah 12/1/2017...         1/05/2011
2           11/21/2016;...                            11/21/2016
3           09/09/2019... 5/05/2005, 10/10/2010....   09/09/2019
4           here - moving on... 01/02/2003!           NULL

SQL to split a column values into rows in Netezza

I have data in the below way in a column. The data within the column is separated by two spaces.
4EG C6CC C6DE 6MM C6LL L3BC C3
I need to split it as shown below. I tried using REGEXP_SUBSTR to do it, but it looks like it's not in the SQL toolkit. Any suggestions?
1. 4EG
2. C6CC
3. C6DE
4. 6MM
5. C6LL
6. L3BC
7. C3
This has been answered here: http://nz2nz.blogspot.com/2016/09/netezza-transpose-delimited-string-into.html?m=1
Please note the comment at the bottom about the best-performing way to use the array functions. I have measured the use of regexp_extract_all_sp() versus repeated regex matches, and the benefit can be quite large.
The examples from nz2nz.blogspot.com are hard to follow. I was able to piece together this method:
with
n_rows as (--update on your end
select row_number() over(partition by 1 order by some_field) as seq_num
from any_table_with_more_rows_than_delimited_values
)
, find_values as ( -- fake data
select 'A' as id, '10,20,30' as orig_values
union select 'B', '5,4,3,2,1'
)
select
id,
seq_num,
orig_values,
array_split(orig_values, ',') as array_list,
get_value_varchar(array_list, seq_num) as value
from
find_values
cross join n_rows
where
seq_num <= regexp_match_count(orig_values, ',') + 1 -- one row for each value in list
order by
id,
seq_num

ORDER BY not putting SELECT statement in numerical order

I am working on a SELECT statement.
USE SCRUMAPI2
DECLARE @userParam VARCHAR(100)
    ,@statusParam VARCHAR(100)
SET @userParam = '%'
SET @statusParam = '%'
SELECT ROW_NUMBER() OVER (
        ORDER BY PDT.[Name] DESC
        ) AS 'RowNumber'
    ,PDT.[Name] AS Project
    ,(
        CASE WHEN (
                    STY.KanBanProductId IS NOT NULL
                    AND STY.SprintId IS NULL
                    ) THEN 'KanBan' WHEN (
                    STY.KanBanProductId IS NULL
                    AND STY.SprintId IS NOT NULL
                    ) THEN 'Sprint' END
        ) AS ProjectType
    ,STY.[Number] StoryNumber
    ,STY.Title AS StoryTitle
    ,TSK.[Name] AS Task
    ,CONVERT(VARCHAR(20), STY.Effort) AS Effort
    ,CONVERT(VARCHAR(20), TSK.OriginalEstimateHours) AS OriginalEstimateHours
    ,TSK.STATUS AS STATUS
FROM Task TSK
LEFT JOIN Story STY ON TSK.StoryId = STY.PK_Story
LEFT JOIN Sprint SPT ON STY.SprintId = SPT.PK_Sprint
LEFT JOIN Product PDT ON STY.ProductId = PDT.PK_Product
WHERE TSK.PointPerson LIKE @userParam
    AND TSK.STATUS LIKE @statusParam
GROUP BY STY.[Number]
    ,TSK.STATUS
    ,STY.Title
    ,PDT.[Name]
    ,TSK.CreateDate
    ,TSK.[Name]
    ,STY.KanBanProductId
    ,STY.SprintId
    ,TSK.OriginalEstimateHours
    ,STY.Effort
My issue is that, although I have the ORDER BY sorting by story number first, it is not returning as expected (below is column STY.[Number]):
As you can see, it goes from 33 to 4 to 42. I want it in numerical order, so that 4 would be between 3 and 5, not between 33 and 42. How do I achieve this?
Given the structure of your data (with a constant prefix), probably the easiest way to get what you want is:
order by len(STY.[Number]), STY.[Number]
This orders first by the length and then by the number itself.
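As a quick illustration of why this works, here is a sketch with made-up story numbers (the SUPP- prefix and the values are assumptions for the demo, not the asker's data). Shorter strings sort first, and strings of equal length then compare correctly character by character:

```sql
-- Hypothetical sample values with a constant prefix
SELECT V.Number
FROM (VALUES ('SUPP-3'), ('SUPP-33'), ('SUPP-4'), ('SUPP-42'), ('SUPP-5')) AS V(Number)
ORDER BY LEN(V.Number), V.Number;
```

This relies on the prefix having a constant length and on the numeric part having no leading zeros; if either assumption fails, the length no longer tracks the size of the number.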
Those are strings. Do you really expect SQL Server to be able to identify that there is a number at character 6 in every single row in the result, and instead of ordering by character 6, they pretend that, say, SUPP-5 is actually SUPP-05? If that worked for you, people who expect the opposite behavior (to treat the whole string as a string) would be complaining. The real fix is to store this information in two separate columns, since it is clearly two separate pieces of data.
In the meantime, you can hack something, like:
ORDER BY LEFT(col, 4), CONVERT(INT, SUBSTRING(col, 6, 255));
As Martin explained, this should be on the outer query, not just used to generate a ROW_NUMBER() - generating a row number alone doesn't guarantee the results will be ordered by that value. And this will only work with additional checks to ensure that every single row has a value following the dash that can be converted to an int. As soon as you have SUPP-5X this will break.
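One possible way to keep the query from failing on values like SUPP-5X is TRY_CONVERT, which returns NULL instead of raising an error when the conversion fails. This is only a sketch: it assumes SQL Server 2012 or later, and the sample values are made up:

```sql
-- Hypothetical sample values; 'SUPP-5X' cannot be converted to int
SELECT V.col
FROM (VALUES ('SUPP-3'), ('SUPP-33'), ('SUPP-4'), ('SUPP-5X')) AS V(col)
ORDER BY LEFT(V.col, 4),
         TRY_CONVERT(int, SUBSTRING(V.col, 6, 255)),  -- NULL for non-numeric suffixes
         V.col;                                       -- tie-breaker for the NULL rows
```

Rows whose suffix is not numeric get NULL and therefore sort first within their prefix (NULLs sort lowest in ascending order in SQL Server); whether that is the right place for them is a decision for the data owner.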
It's sorting the strings in lexicographic order. To get numerical ordering you need to extract the number from the string (with substring()) and cast it to an integer.

Resources