Split a Description Column to 3 Columns, T-SQL / SQL Server 2008 - sql-server

I need to split a nvarchar(100) column into three nvarchar(28) columns without a known delimiter and without breaking mid-word. I am thinking that I would need to find the space that is near the 28th character, measure length and position of the word just before that space and decide weather the delimiter should be before or after that word. Then, again for the 3rd column.
From:
Col
-------------------------------------------------------------------------------------
Royal Mission Open Hutch with 4-Arch Doors, Plain Glass (with Glass Shelves Standard)
To:
Col1 Col2 Col3
------------------------ ------------------------- --------------------------
Royal Mission Open Hutch 4-Arch Doors, Plain Glass (with Glass Shelves Standa
Any Ideas?
Thanks nab
I am using SQL 2008

Using CROSS APPLYs, you could try something like this:
WITH data (col) AS (
SELECT CAST('Royal Mission Open Hutch with 4-Arch Doors, Plain Glass (with Glass Shelves Standard)' AS nvarchar(100))
)
SELECT
col1, col2, col3
FROM data
CROSS APPLY (
SELECT
NULLIF(30 - CHARINDEX(' ', REVERSE(LEFT(col, 29))), 0)
) AS x1 (first_space_pos)
CROSS APPLY (
SELECT
LEFT(col, ISNULL(first_space_pos, 28)),
LTRIM(NULLIF(SUBSTRING(col, ISNULL(first_space_pos, 29), 999), ''))
) AS x2 (col1, col23)
CROSS APPLY (
SELECT
NULLIF(30 - CHARINDEX(' ', REVERSE(LEFT(col23, 29))), 0)
) AS x3 (second_space_pos)
CROSS APPLY (
SELECT
LEFT(col23, ISNULL(second_space_pos, 28)),
LTRIM(NULLIF(SUBSTRING(col23, ISNULL(second_space_pos, 29), 28), ''))
) AS x4 (col2, col3)
;
The first CROSS APPLY searches for the last space in the first 29 characters of the big column, and the second one uses the found position to produce the first smaller column, col1 and return the rest of the string as col23.
The next two CROSS APPLYs do perform the same manipulations on col23, thus producing col2 and col3. The only difference is, the last CROSS APPLY puts into col3 at most 28 characters rather than all the remaining ones.
The hardcoded values like 28, 29 and 30 could be parametrised, but I'll leave that part of the job to you.
You can try this query at SQL Fiddle.

Related

T-SQL split string containing alpha and numeric characters by variable delimiter

I am trying to split a string that has 3 set Alpha Characters that can appear in any order followed by a numeric value. The issue I am having is that the order of the alpha characters isn't fixed. And neither is the number of numeric values after the alpha character it may contain any of the following examples:
X1Y45Z1
Y25Z1
X1Y9Z1
X2Z6
With a a lot of help from our local IT ( I am still learning SQL) I have managed to separate out X Y and Z into separate columns with the numbers after them, but they don't always appear in order
Col1 may contain X or Y
Col2 may contain Y or Z
Col3 may contain Z or nothing
I am trying to get a result like the following:
If X is in Col1, Show number(s) after X, in new column "X", if Y is in col1, Show number(s) after Y in new column "Y", etc.
At present we are using 2 cte's to break up the string. and I am trying to simplify it so that I can search the string, have 3 columns after created 'X','Y','Z' and put the correct number(s) after each Alpha delimiter into it. I should note I Do Not have full admin access so I cannot create new tables or update/insert data or clean it.
Also apologies if this is slightly formatted incorrectly. It is my first post on StackOverflow
declare #tbl table
(
Col1 varchar(100), <-------This Column contains the values I want
)
insert into #tbl
select Col1,
from table1,
where xyz
;with cte as
(
select
Col1,
replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(**Col1**,'P', '</x><x>P'),'C', '</x><x>C'),'I', '</x><x>I'),'M', '</x><x>M'),'S', '</x><x>S'),'Q', '</x><x>Q'),'L', '</x><x>L'),'T', '</x><x>T'),'E', '</x><x>E'),'R', '</x><x>R'),'U', '</x><x>U'),'W', '</x><x>W')
**Col1NODES**
from
#tbl
)
, cte2 (Col1, Col1Nodes) as
(
select
Col1,
convert(xml,'<z><x>' + Col1nodes + '</x></z>') **Col1NODES**
from
cte
)
select
Col1,
isnull(Col1Nodes.value('/z[1]/x[2]','varchar(100)'),'-') F1,
isnull(Col1Nodes.value('/z[1]/x[3]','varchar(100)'),'-') F2,
isnull(Col1Nodes.value('/z[1]/x[4]','varchar(100)'),'-') F3
from
cte2
Current output is below:
If you have SQL Server 2016+ you may try to use the following solution, based on JSON. The important part is to transform the input data into a valid JSON object (X1Y45Z1 is transformed into {"X":1,"Y":45,"Z":1} for example). After that you need to parse this object with OPENJSON() function using the appropriate WITH clause to define the columns in the output.
Table:
CREATE TABLE Data (
TextData nvarchar(100)
)
INSERT INTO Data
(TextData)
VALUES
('X1Y45Z1'),
('Y25Z1'),
('X1Y9Z1'),
('X2Z6'),
('Z1X6')
Statement:
SELECT d.TextData, j.*
FROM Data d
CROSS APPLY OPENJSON(
CONCAT(
N'{',
STUFF(REPLACE(REPLACE(REPLACE(d.TextData, N'X', N',"X":'), N'Y', N',"Y":'), N'Z', N',"Z":'), 1, 1, N''),
N'}'
)
) WITH (
X int '$.X',
Y int '$.Y',
Z int '$.Z'
) j
Output:
---------------------
TextData X Y Z
---------------------
X1Y45Z1 1 45 1
Y25Z1 25 1
X1Y9Z1 1 9 1
X2Z6 2 6
Z1X6 6 1
For versions before SQL Server 2016, you may use an XML based approach. You need to transform text data into an appropriate XML (X1Y45Z1 is transformed into <row><name>X</name><value>1</value></row><row><name>Y</name><value>45</value></row><row><name>Z</name><value>1</value></row> for example):
SELECT
TextData,
XmlData.value('(/row[name = "X"]/value/text())[1]', 'nvarchar(4)') AS X,
XmlData.value('(/row[name = "Y"]/value/text())[1]', 'nvarchar(4)') AS Y,
XmlData.value('(/row[name = "Z"]/value/text())[1]', 'nvarchar(4)') AS Z
FROM (
SELECT
TextData,
CONVERT(
xml,
CONCAT(
STUFF(REPLACE(REPLACE(REPLACE(d.TextData, N'X', N'</value></row><row><name>X</name><value>'), N'Y', N'</value></row><row><name>Y</name><value>'), N'Z', N'</value></row><row><name>Z</name><value>'), 1, 14, N''),
N'</value></row>'
)
) AS XmlData
FROM Data d
) x

Query for date value AFTER specified patindex value in TEXT column using SQL Server

I am querying for a date value that follows a specific phrase in the same text column value. There are a number of date values in this text column but I'm needing only the DATE following the phrase I've found via a patindex value. Any help/direction would be appreciated. Thanks.
Here is my SQL code:
SELECT
NotesSysID,
PATINDEX('%Demand Due Date:%', NoteText) AS [Index of DemandDueDate text],
PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%', NoteText) AS [Index of DemandDueDate date],
SUBSTRING(NoteText, PATINDEX('%Demand Due Date:%', NoteText), (PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%', NoteText)))
FROM
#temp_ExtractedNotes;
Here is a pic of my data, please note that the 2nd index is LESS than the Index of DemandDueDate text found and I'm needing the subsequent date AFTER the Index of DemandDueDate text column. Hopefully this makes sense.
I think your code will be easier for you to work with if you use a CTE and some stepwise refinement. That'll free you from trying to do everything with one hugely-nested SELECT statement.
;
WITH FirstCut as (
SELECT
NotesSysID,
LocationOfText = PATINDEX('%Demand Due Date:%', NoteText),
NoteText
FROM #temp_ExtractedNotes
),
SecondCut as (
SELECT
NotesSysID,
NoteText,
-- making assumption date will be within first 250 chars of text
DemandDueDateSection = SUBSTRING( NoteText, [LocationOfText], 250)
FROM FirstCut
),
ThirdCut as (
SELECT
NotesSysID,
NoteText,
DemandDueDateSection,
LocationOfDate = PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%', DemandDueDateSection)
FROM SecondCut
),
FourthCut as (
SELECT
NotesSysID,
NoteText,
DateAsText = SUBSTRING( DemandDueDateSection, LocationOfDate, 10 )
FROM ThirdCut
)
SELECT
NotesSysID,
NoteText,
DemandDueDate = CONVERT( DateTime, DateAsText)
FROM FourthCut
The issue you're having is because your second PATINDEX isn't finding the date you want; it's finding the first date in the string, which, as you noted, appears before the PATINDEX of the phrase you're searching for %Demand Due Date:%. The third parameter of SUBSTRING is LENGTH, which is to say how many characters after the second parameter you want to pull. By using that second PATINDEX value as the third parameter in your SUBSTRING you're returning a sub-string that starts where you want it to, and is of LENGTH equal to the number of characters into the string where that first date appears.
Which, of course, isn't what you want. To #ZLK's point in the comments, first, you need to do a nested PATINDEX. That's going to be pretty slow, so I'm hoping there aren't a bazillion records in your temp table.
Based on the sample image, it looks like the date you're interested in can appear a variable number of characters after %Demand Due Date:%. We'll start by adding 16 to the PATINDEX of %Demand Due Date:% (because that's how many characters are in %Demand Due Date:%, so we'll just start right after that). Then we'll pick up the next 100 characters. You can tweak that later if you need more or not that many.
So you're first SUBSTRING will look like this:
SUBSTRING(NoteText, PATINDEX('%Demand Due Date:%', NoteText) + 16, 100)
Now we have to search that sub-string for the second pattern, the one that should yield a date for you.
PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%', SUBSTRING(NoteText, PATINDEX('%Demand Due Date:%', NoteText) + 16, 100))
The number returned there is the point where your date value starts within the 100 characters following %Demand Due Date:%. Armed with that number, you just need to SUBSTRING out the next ten characters, and, just for fun, CAST it as a DATE. That big ugly formula will look like this:
DECLARE #test VARCHAR(200) = 'foo bar Demand Due Date: 12/21/2018 bar foo foo bar';
SELECT
CAST(
SUBSTRING(
SUBSTRING
(#test, PATINDEX('%Demand Due Date:%', #test) + 16, 100),
PATINDEX
( '%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',
SUBSTRING
(#test, PATINDEX('%Demand Due Date:%', #test) + 16, 100)
)
,10)
AS DATE);
Result:
2018-12-21
Rextester: https://rextester.com/KCY79989
Grab a copy of PatternSplitCM.
CREATE FUNCTION dbo.PatternSplitCM
(
#List VARCHAR(8000) = NULL
,#Pattern VARCHAR(50)
) RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
WITH numbers AS (
SELECT TOP(ISNULL(DATALENGTH(#List), 0))
n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n),
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n))
SELECT
ItemNumber = ROW_NUMBER() OVER(ORDER BY MIN(n)),
Item = SUBSTRING(#List,MIN(n),1+MAX(n)-MIN(n)),
Matched
FROM (
SELECT n, y.Matched, Grouper = n - ROW_NUMBER() OVER(ORDER BY y.Matched,n)
FROM numbers
CROSS APPLY (
SELECT Matched = CASE WHEN SUBSTRING(#List,n,1) LIKE #Pattern THEN 1 ELSE 0 END
) y
) d
GROUP BY Matched, Grouper;
Then it's pretty easy:
-- Sample Data
DECLARE #temp_ExtractedNotes TABLE (someID INT IDENTITY, NoteText VARCHAR(8000));
INSERT #temp_ExtractedNotes (NoteText) VALUES
('blah blah blah 2/4/2016... Demand Due Date: 1/05/2011; blah blah 12/1/2017...'),
('Yada yad..... Demand Due Date: 11/21/2016;...'),
('adas dasd asd a sd asdas Demand Due Date: 09/09/2019... 5/05/2005, 10/10/2010....'),
('nothing to see here - moving on... 01/02/2003!');
-- Solution
SELECT TOP (1) WITH TIES t.someID, ns.s, DueDate = CASE SIGN(nt.s) WHEN 1 THEN f.item END
FROM #temp_ExtractedNotes AS t
CROSS APPLY (VALUES(CHARINDEX('Demand Due Date:',t.NoteText))) AS nt(s)
CROSS APPLY (VALUES(SUBSTRING(t.noteText, nt.s+16, 8000))) AS ns(s)
CROSS APPLY dbo.patternsplitCM(ns.s,'[0-9/]') AS f
WHERE f.matched = 1
ORDER BY ROW_NUMBER() OVER (PARTITION BY t.someID ORDER BY f.itemNumber);
Results:
someID s DueDate
----------- ---------------------------------------- ------------
1 1/05/2011; blah blah 12/1/2017... 1/05/2011
2 11/21/2016;... 11/21/2016
3 09/09/2019... 5/05/2005, 10/10/2010.... 09/09/2019
4 here - moving on... 01/02/2003! NULL

How to merge multiple values from multiple columns into one row?

I have this table:
I need to convert it to (with parenthesis as well):
row_nbr - row_label - default_order
10 - TOTAL ACCOUNTABLE GROSS - (1, 3)
12 - DEDUCTIBLE TERMS - (3)
20 - TOTAL DEDUCTIBLE TERMS - (3)
34 - AMOUNT DUE (UNRECOUPED) - (4)
36 - ACCOUNTABLE GROSS - (2)
41 - TOTAL CONTINGENT COMPENSATION - (3)
I could have more than twice of the same row_nbr.
In this case the 10 is there twice, but I could have 3 10's, 4 12's, etc.
I kind of started the pivot table but honestly, even by looking at the Microsoft site, I cannot for the life of me figure this out.
select row_nbr, row_label, default_order
from #temp
pivot
(
max(row_nbr)
for default_order in (default_order)
) piv;
Anyone care to help?
Thanks.
As #Vinit says, you can use the string_agg function in 2017, but if you're at least on 2005 you can use a horrible, torturous XML generator:
SELECT row_nbr
,row_label
,default_order = '(' +
STUFF(
(SELECT ', ' + CAST(default_order AS VARCHAR(10))
FROM #temp
WHERE row_nbr = t.row_nbr
ORDER BY default_order
FOR XML PATH('') ,
ROOT('MyString'),
TYPE ).value('/MyString[1]', 'varchar(max)'), 1, 2, '')
+ ')'
FROM #temp t;
You can read more about it in this blog post
PIVOTS are definitely wonky to get a grip on. Fortunately, in this case, while you could use one as an intermediate step, it's not necessary. PIVOT will take each value and put it into a corresponding distinct column, and what you're wanting is a single column, with them all concatenated together. Like I said, you could do the pivot, then just concatenate all the generated columns together, but that's way more work than is needed.
On 2014, the easiest way to do this is using FOR XML. Russell Fox's answer pretty much covers how that technique works (although there are a few variants on how you can do that should you so choose).
If you know definitively the values are all integers, you can save a bit of typing and omit the type and value operators, as those are only necessary when you have to escape certain XML characters in string fields
select
row_nbr,
row_label,
default_order,
stuff
(
(
select concat(',', default_order)
from #temp i
where i.row_nbr = o.row_nbr
for xml path('')
), 1, 1, ''
)
from #temp o

SQL to split a column values into rows in Netezza

I have data in the below way in a column. The data within the column is separated by two spaces.
4EG C6CC C6DE 6MM C6LL L3BC C3
I need to split it into as beloW. I tried using REGEXP_SUBSTR to do it but looks like it's not in the SQL toolkit. Any suggestions?
1. 4EG
2. C6CC
3. C6DE
4. 6MM
5. C6LL
6. L3BC
7. C3
This has ben answered here: http://nz2nz.blogspot.com/2016/09/netezza-transpose-delimited-string-into.html?m=1
Please note the comment at the button about the best performing way of use if array functions. I have measured the use of regexp_extract_all_sp() versus repeated regex matches and the benefit can be quite large
The examples from nz2nz.blogpost.com are hard to follow. I was able to piece together this method:
with
n_rows as (--update on your end
select row_number() over(partition by 1 order by some_field) as seq_num
from any_table_with_more_rows_than_delimited_values
)
, find_values as ( -- fake data
select 'A' as id, '10,20,30' as orig_values
union select 'B', '5,4,3,2,1'
)
select
id,
seq_num,
orig_values,
array_split(orig_values, ',') as array_list,
get_value_varchar(array_list, seq_num) as value
from
find_values
cross join n_rows
where
seq_num <= regexp_match_count(orig_values, ',') + 1 -- one row for each value in list
order by
id,
seq_num

Expression to find multiple spaces in string

We handle a lot of sensitive data and I would like to mask passenger names using only the first and last letter of each name part and join these by three asterisks (***),
For example: the name 'John Doe' will become 'J***n D***e'
For a name that consists of two parts this is doable by finding the space using the expression:
LEFT(CardHolderNameFromPurchase, 1) +
'***' +
CASE WHEN CHARINDEX(' ', PassengerName) = 0
THEN RIGHT(PassengerName, 1)
ELSE SUBSTRING(PassengerName, CHARINDEX(' ', PassengerName) -1, 1) +
' ' +
SUBSTRING(PassengerName, CHARINDEX(' ', PassengerName) +1, 1) +
'***' +
RIGHT(PassengerName, 1)
END
However, the passenger name can have more than two parts, there is no real limit to it. How should can I find the indices of all spaces within an expression? Or should I maybe tackle this problem in a different way?
Any help or pointer is much appreciated!
This solution does what you want it to, but is really the wrong approach to use when trying to hide personally identifiable data, as per Gordon's explanation in his answer.
SQL:
declare #t table(n nvarchar(20));
insert into #t values('John Doe')
,('JohnDoe')
,('John Doe Two')
,('John Doe Two Three')
,('John O''Neill');
select n
,stuff((select ' ' + left(s.item,1) + '***' + right(s.item,1)
from dbo.fn_StringSplit4k(t.n,' ',null) as s
for xml path('')
),1,1,''
) as mask
from #t as t;
Output:
+--------------------+-------------------------+
| n | mask |
+--------------------+-------------------------+
| John Doe | J***n D***e |
| JohnDoe | J***e |
| John Doe Two | J***n D***e T***o |
| John Doe Two Three | J***n D***e T***o T***e |
| John O'Neill | J***n O***l |
+--------------------+-------------------------+
String splitting function based on Jeff Moden's Tally Table approach:
create function [dbo].[fn_StringSplit4k]
(
#str nvarchar(4000) = ' ' -- String to split.
,#delimiter as nvarchar(1) = ',' -- Delimiting value to split on.
,#num as int = null -- Which value to return, null returns all.
)
returns table
as
return
-- Start tally table with 10 rows.
with n(n) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
-- Select the same number of rows as characters in #str as incremental row numbers.
-- Cross joins increase exponentially to a max possible 10,000 rows to cover largest #str length.
,t(t) as (select top (select len(isnull(#str,'')) a) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4)
-- Return the position of every value that follows the specified delimiter.
,s(s) as (select 1 union all select t+1 from t where substring(isnull(#str,''),t,1) = #delimiter)
-- Return the start and length of every value, to use in the SUBSTRING function.
-- ISNULL/NULLIF combo handles the last value where there is no delimiter at the end of the string.
,l(s,l) as (select s,isnull(nullif(charindex(#delimiter,isnull(#str,''),s),0)-s,4000) from s)
select rn
,item
from(select row_number() over(order by s) as rn
,substring(#str,s,l) as item
from l
) a
where rn = #num
or #num is null;
GO
If you consider PassengerName as sensitive information, then you should not be storing it in clear text in generally accessible tables. Period.
There are several different options.
One is to have reference tables for sensitive information. Any table that references this would have an id rather than the name. Viola. No sensitive information is available without access to the reference table, and that would be severely restricted.
A second method is a reversible compression algorithm. This would allow the the value to be gibberish, but with the right knowledge, it could be transformed back into a meaningful value. Typical methods for this are the public key encryption algorithms devised by Rivest, Shamir, and Adelman (RSA encoding).
If you want to do first and last letters of names, I would be really careful about Asian names. Many of them consist of two or three letters, when written in Latin script. That isn't much hiding. SQL Server does not have simple mechanisms to do this. You can write a user-defined function with a loop to manager the process. However, I view this as the least secure and least desirable approach.
This uses Jeff Moden's DelimitedSplit8K, as well as the new functionality in SQL Server 2017 STRING_AGG. As I don't know what version you're using, I've just gone "whole hog" and assumed you're using the latest version.
Jeff's function is invaluable here, as it returns the ordinal position, something which Microsoft have foolishly omitted from their own function, STRING_SPLIT (and didn't add in 2017 either). Ordinal position is key here, so we can't make use of the built in function.
WITH VTE AS(
SELECT *
FROM (VALUES ('John Doe'),('Jane Bloggs'),('Edgar Allan Poe'),('Mr George W. Bush'),('Homer J Simpson')) V(FullName)),
Masking AS (
SELECT *,
ISNULL(STUFF(Item, 2, LEN(item) -2,'***'), Item) AS MaskedPart
FROM VTE V
CROSS APPLY dbo.delimitedSplit8K(V.Fullname, ' '))
SELECT STRING_AGG(MaskedPart,' ') AS MaskedFullName
FROM Masking
GROUP BY Fullname;
Edit: Nevermind, OP has commented they are using 2008, so STRING_AGG is out of the question. #iamdave, however, has posted an answer which is very similar to my own, just do it the "old fashioned XML way".
Depending on your version of SQL Server, you may be able to use the built-in string split to rows on spaces in the name, do your string formatting, and then roll back up to name level using an XML path.
create table dataset (id int identity(1,1), name varchar(50));
insert into dataset (name) values
('John Smith'),
('Edgar Allen Poe'),
('One Two Three Four');
with split as (
select id, cs.Value as Name
from dataset
cross apply STRING_SPLIT (name, ' ') cs
),
formatted as (
select
id,
name,
left(name, 1) + '***' + right(name, 1) as out
from split
)
SELECT
id,
(SELECT ' ' + out
FROM formatted b
WHERE a.id = b.id
FOR XML PATH('')) [out_name]
FROM formatted a
GROUP BY id
Result:
id out_name
1 J***n S***h
2 E***r A***n P***e
3 O***e T***o T***e F***r
You can do that using this function.
create function [dbo].[fnMaskName] (#var_name varchar(100))
RETURNS varchar(100)
WITH EXECUTE AS CALLER
AS
BEGIN
declare #var_part varchar(100)
declare #var_return varchar(100)
declare #n_position smallint
set #var_return = ''
set #n_position = 1
WHILE #n_position<>0
BEGIN
SET #n_position = CHARINDEX(' ', #var_name)
IF #n_position = 0
SET #n_position = LEN(#var_name)
SET #var_part = SUBSTRING(#var_name, 1, #n_position)
SET #var_name = SUBSTRING(#var_name, #n_position+1, LEN(#var_name))
if #var_part<>''
SET #var_return = #var_return + stuff(#var_part, 2, len(#var_part)-2, replicate('*',len(#var_part)-2)) + ' '
END
RETURN(#var_return)
END

Resources