How to clean a string and extract number postfixes in T-SQL

I have a string that consists of a name and, in most cases, a postfix of one or two numbers at the end. This number postfix should be cut off from the name. One of the numbers represents a status and should be extracted: if there are two numbers, it is the second from the right; if there is one number, it is the first from the right. The numbers are separated by underscores. Underscores can also occur within the name.
The result should be a column with the clearname and a column with the extracted status.
I tried to solve the problem with the standard string functions such as SUBSTRING, CHARINDEX, PATINDEX, LEN and so on, but my approach quickly became bulky and hard to maintain. I wonder if there is an elegant solution with the usual SQL Server capabilities (if possible, without installing extras for regex).
SELECT _data.myStr
-- , ... AS clearname /*String cleaned from number_postfixes*/
-- , ... AS Status /*second number from the right*/
FROM (
SELECT 'tree_leafs_offer_2_1' AS myStr --clearname: tree_leafs_offer; cut off: _2_1; extracted status: 2
UNION
SELECT 'tree_leafs_offer_2_10' AS myStr --clearname: tree_leafs_offer; cut off: _2_10; extracted status: 2
UNION
SELECT 'tree_leafs_offer_2_2' AS myStr --clearname: tree_leafs_offer; cut off: _2_2; extracted status: 2
UNION
SELECT 'tree_leafs_offer_1150_1' AS myStr --clearname: tree_leafs_offer; cut off: _1150_1; extracted status: 1150
UNION
SELECT 'tree_leafs_offer_1150_10' AS myStr --clearname: tree_leafs_offer; cut off: _1150_10; extracted status: 1150
UNION
SELECT 'builder_bundle_less_xl_1' AS myStr --clearname: builder_bundle_less_xl; cut off: _1; extracted status: 1
UNION
SELECT 'builder_bundle_less_xl_10' AS myStr --clearname: builder_bundle_less_xl; cut off: _10; extracted status: 10
UNION
SELECT 'static_components_wolves_10_4' AS myStr --clearname: static_components_wolves; cut off: _10_4; extracted status: 10
UNION
SELECT 'coke_0_boring_components_bundle_grant_1' AS myStr --clearname: coke_0_boring_components_bundle_grant; cut off: _1; extracted status: 1
UNION
SELECT 'coke_0_soccer18_end_1_4h_101' AS myStr --clearname: coke_0_soccer18_end_1_4h; cut off: _101; extracted status: 101
UNION
SELECT 'coke_0_late_downsell_bundle_high_114' AS myStr --clearname: coke_0_late_downsell_bundle_high; cut off: _114; extracted status: 114
UNION
SELECT 'itembundle_mine_bundle_small' AS myStr --clearname: itembundle_mine_bundle_small; cut off: <nothing>; extracted status: NULL
) AS _data
As-Is Result:
-----------------
myStr:
---------------------------------------
builder_bundle_less_xl_1
builder_bundle_less_xl_10
coke_0_boring_components_bundle_grant_1
coke_0_late_downsell_bundle_high_114
coke_0_soccer18_end_1_4h_101
itembundle_mine_bundle_small
static_components_wolves_10_4
tree_leafs_offer_1150_1
tree_leafs_offer_1150_10
tree_leafs_offer_2_1
tree_leafs_offer_2_10
tree_leafs_offer_2_2
To-Be Result (two new columns):
-------------------
clearname: |Status
----------------------------------------------
builder_bundle_less_xl | 1
builder_bundle_less_xl | 10
coke_0_boring_components_bundle_grant | 1
coke_0_late_downsell_bundle_high | 114
coke_0_soccer18_end_1_4h | 101
itembundle_mine_bundle_small |NULL
static_components_wolves | 10
tree_leafs_offer |1150
tree_leafs_offer |1150
tree_leafs_offer | 2
tree_leafs_offer | 2
tree_leafs_offer | 2

To be honest: this format is awful! If this is not a one-time-action you really should try to change this before you have to deal with it.
But - if you have to stick with this - you might give this a try:
EDIT: resolved a bad computation of the status position...
DECLARE @tbl TABLE(ID INT IDENTITY,myStr VARCHAR(1000));
INSERT INTO @tbl VALUES
('tree_leafs_offer_2_1')
,('tree_leafs_offer_2_10')
,('tree_leafs_offer_2_2')
,('tree_leafs_offer_1150_1')
,('tree_leafs_offer_1150_10')
,('builder_bundle_less_xl_1')
,('builder_bundle_less_xl_10')
,('static_components_wolves_10_4')
,('coke_0_boring_components_bundle_grant_1')
,('coke_0_soccer18_end_1_4h_101')
,('coke_0_late_downsell_bundle_high_114')
,('itembundle_mine_bundle_small');
The query
WITH cte AS
(
SELECT t.ID
,t.myStr
,CAST(A.[key] AS INT) AS Position -- OPENJSON returns [key] as NVARCHAR; cast it so positions compare numerically
,A.[value] AS WordFragment
,B.CastedToInt
FROM @tbl t
CROSS APPLY OPENJSON(N'["' + REPLACE(t.myStr,'_','","') + '"]') A
CROSS APPLY(SELECT TRY_CAST(A.[value] AS INT)) B(CastedToInt)
)
SELECT ID
,myStr
,STUFF(
(SELECT CONCAT('_',cte2.WordFragment)
FROM cte cte2
WHERE cte2.ID=cte.ID
AND cte2.Position<=A.PositionHighestNonInt
ORDER BY cte2.Position
FOR XML PATH('')
),1,1,'') AS ClearName
,(SELECT cte3.CastedToInt FROM cte cte3 WHERE cte3.ID=cte.ID AND cte3.Position=A.PositionHighestNonInt+1) AS [Status]
FROM cte
CROSS APPLY (
SELECT ISNULL(MAX(x.Position),1000)
FROM cte x
WHERE x.ID=cte.ID AND x.CastedToInt IS NULL
) A(PositionHighestNonInt)
GROUP BY ID,myStr,PositionHighestNonInt;
The result
+----+---------------------------------------+--------+
| ID | ClearName | Status |
+----+---------------------------------------+--------+
| 1 | tree_leafs_offer | 2 |
+----+---------------------------------------+--------+
| 2 | tree_leafs_offer | 2 |
+----+---------------------------------------+--------+
| 3 | tree_leafs_offer | 2 |
+----+---------------------------------------+--------+
| 4 | tree_leafs_offer | 1150 |
+----+---------------------------------------+--------+
| 5 | tree_leafs_offer | 1150 |
+----+---------------------------------------+--------+
| 6 | builder_bundle_less_xl | 1 |
+----+---------------------------------------+--------+
| 7 | builder_bundle_less_xl | 10 |
+----+---------------------------------------+--------+
| 8 | static_components_wolves | 10 |
+----+---------------------------------------+--------+
| 9 | coke_0_boring_components_bundle_grant | 1 |
+----+---------------------------------------+--------+
| 10 | coke_0_soccer18_end_1_4h | 101 |
+----+---------------------------------------+--------+
| 11 | coke_0_late_downsell_bundle_high | 114 |
+----+---------------------------------------+--------+
| 12 | itembundle_mine_bundle_small | NULL |
+----+---------------------------------------+--------+
The idea:
Provide your data in a mockup table
Use a trick with OPENJSON to split the string at the underscores and find the parts which can be cast to INT.
Find the position of the last non-int fragment. The Status is the fragment at the next position.
With v2017 you could use STRING_AGG, but with v2016 we have to use an XML-based trick (FOR XML PATH plus STUFF) to concatenate all fragments before the [Status].
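The STRING_AGG variant mentioned above could look roughly like the following. This is only a sketch, assuming SQL Server 2017 or later; the table variable @sample and the CTE name marked are made up for this illustration and are not part of the original answer.

```sql
-- Hypothetical sample data; @sample is an illustrative name.
DECLARE @sample TABLE(ID INT IDENTITY, myStr VARCHAR(1000));
INSERT INTO @sample VALUES
 ('tree_leafs_offer_2_1')
,('coke_0_soccer18_end_1_4h_101')
,('itembundle_mine_bundle_small');

WITH cte AS
(
    -- Split at the underscores via OPENJSON and mark the fragments that are integers.
    SELECT t.ID
          ,t.myStr
          ,CAST(A.[key] AS INT)       AS Position
          ,A.[value]                  AS WordFragment
          ,TRY_CAST(A.[value] AS INT) AS CastedToInt
    FROM @sample t
    CROSS APPLY OPENJSON(N'["' + REPLACE(t.myStr,'_','","') + '"]') A
)
,marked AS
(
    -- Position of the last non-int fragment per row (1000 as fallback, as in the query above).
    SELECT cte.*
          ,ISNULL(MAX(CASE WHEN CastedToInt IS NULL THEN Position END)
                      OVER(PARTITION BY ID),1000) AS PositionHighestNonInt
    FROM cte
)
SELECT ID
      ,myStr
      -- STRING_AGG skips NULLs, so only fragments up to the last non-int fragment are joined.
      ,STRING_AGG(CASE WHEN Position<=PositionHighestNonInt THEN WordFragment END,'_')
           WITHIN GROUP(ORDER BY Position) AS ClearName
      -- The status is the fragment right after the last non-int fragment, if there is one.
      ,MAX(CASE WHEN Position=PositionHighestNonInt+1 THEN CastedToInt END) AS [Status]
FROM marked
GROUP BY ID,myStr;
```

The window function replaces the correlated CROSS APPLY, and STRING_AGG replaces the FOR XML PATH/STUFF concatenation; the rule for picking the status is unchanged.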

One possible approach is to use string replacement and the JSON capabilities of SQL Server 2016+. Each row is reversed and transformed into a valid JSON array ('tree_leafs_offer_2_1' is transformed into '["1","2","reffo","sfael","eert"]', for example). Then you can easily check whether the first and the second items are valid numbers using JSON_VALUE(<json_array>, '$[0]'), JSON_VALUE(<json_array>, '$[1]') and TRY_CONVERT(). This works as long as there are at most two numbers at the right end. Note that in this version a string without a number postfix gets the status 0 instead of NULL.
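To make the transformation easier to follow, here is a small sketch that runs it for a single value. The variable names @s and @json are made up for this illustration; the functions are the same ones used in the full statement.

```sql
-- Illustrative variables, not part of the original answer.
DECLARE @s varchar(100) = 'tree_leafs_offer_2_1';

-- Reverse the string, escape it for JSON and turn each underscore into an array separator.
DECLARE @json nvarchar(max) =
    CONCAT('["', REPLACE(STRING_ESCAPE(REVERSE(@s), 'json'), '_', '","'), '"]');

SELECT @json AS ReversedJsonArray,  -- ["1","2","reffo","sfael","eert"]
       -- Each item is reversed back before the conversion attempt.
       TRY_CONVERT(int, REVERSE(JSON_VALUE(@json, '$[0]'))) AS LastItem,        -- 1
       TRY_CONVERT(int, REVERSE(JSON_VALUE(@json, '$[1]'))) AS SecondLastItem,  -- 2
       TRY_CONVERT(int, REVERSE(JSON_VALUE(@json, '$[2]'))) AS ThirdLastItem;   -- NULL ('offer' is not a number)
```

Because the array is built from the reversed string, index 0 is always the last fragment and index 1 the second to last, no matter how many underscores the name itself contains.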
Input:
CREATE TABLE #Data (
myStr varchar(max)
)
INSERT INTO #Data
(MyStr)
VALUES
('tree_leafs_offer_2_1'),
('tree_leafs_offer_2_10'),
('tree_leafs_offer_2_2'),
('tree_leafs_offer_1150_1'),
('tree_leafs_offer_1150_10'),
('builder_bundle_less_xl_1'),
('builder_bundle_less_xl_10'),
('static_components_wolves_10_4'),
('coke_0_boring_components_bundle_grant_1'),
('coke_0_soccer18_end_1_4h_101'),
('coke_0_late_downsell_bundle_high_114'),
('itembundle_mine_bundle_small')
T-SQL:
SELECT
LEFT(myStr, LEN(myStr) - CHARINDEX('_', REVERSE(myStr))) as ClearName,
REVERSE(LEFT(REVERSE(myStr), CHARINDEX('_', REVERSE(myStr)) - 1)) AS Status
FROM (
SELECT
CASE
WHEN
TRY_CONVERT(int, REVERSE(JSON_VALUE(CONCAT('["', REPLACE(STRING_ESCAPE(REVERSE(MyStr), 'json'), '_', '","'), '"]'), '$[1]'))) IS NULL AND
TRY_CONVERT(int, REVERSE(JSON_VALUE(CONCAT('["', REPLACE(STRING_ESCAPE(REVERSE(MyStr), 'json'), '_', '","'), '"]'), '$[0]'))) IS NULL
THEN CONCAT(myStr, '_0')
WHEN
TRY_CONVERT(int, REVERSE(JSON_VALUE(CONCAT('["', REPLACE(STRING_ESCAPE(REVERSE(MyStr), 'json'), '_', '","'), '"]'), '$[1]'))) IS NULL AND
TRY_CONVERT(int, REVERSE(JSON_VALUE(CONCAT('["', REPLACE(STRING_ESCAPE(REVERSE(MyStr), 'json'), '_', '","'), '"]'), '$[0]'))) IS NOT NULL
THEN MyStr
ELSE LEFT(myStr, LEN(myStr) - CHARINDEX('_', REVERSE(myStr)))
END AS myStr
FROM #Data
) fixed
ORDER BY MyStr
Output:
----------------------------------------------
ClearName Status
----------------------------------------------
builder_bundle_less_xl 1
builder_bundle_less_xl 10
coke_0_boring_components_bundle_grant 1
coke_0_late_downsell_bundle_high 114
coke_0_soccer18_end_1_4h 101
itembundle_mine_bundle_small 0
static_components_wolves 10
tree_leafs_offer 1150
tree_leafs_offer 1150
tree_leafs_offer 2
tree_leafs_offer 2
tree_leafs_offer 2

Related

sql split row value before and after substring

I have a table that contains a list of names. However, some rows contain the name and the alias separated by , f/k/a—, , f/k/a or , n/k/a . I'm trying to split the names and aliases into separate rows. Can someone please help?
Sample data below:
|---------------------------------------------------------|
| ID | Name |
|---------------------------------------------------------|
| 1 | Evil Empire, f/k/a - Starbucks |
| 2 | Aubrey Drake Graham, n/k/a Drake |
| 3 | Thomas Johnson Bridge, f/k/a Solomans Bridge |
|---------------------------------------------------------|
Desired output below:
|---------------------------------------------------------|
| ID | Name |
|---------------------------------------------------------|
| 1 | Evil Empire |
| 1.1 | Starbucks |
| 2 | Aubrey Drake Graham |
| 2.1 | Drake |
| 3 | Thomas Johnson Bridge |
| 3.1 | Solomans Bridge |
|---------------------------------------------------------|

No need for ordinal splitter. It's a simple unpivot using CROSS APPLY
[EDIT] Changed method of splitting to look for '%/%/%' in cases where there's an alias
select cast(unpvt.id as varchar(9))+iif(unpvt.seq=1, '', '.1') ID,
trim(replace(replace(replace(unpvt.[Name],'f/k/a - ',''),'f/k/a ',''),'n/k/a ','')) [Name]
from (values (1, 'Evil Empire, f/k/a - Starbucks'),
(2, 'Aubrey Drake Graham, n/k/a Drake'),
(3, 'Thomas Johnson Bridge, f/k/a Solomans Bridge'),
(4, 'Thomas, J, Cat, f/k/a Solomans,,,Bridge'),
(5, 'Thomas')) v(id, [Name])
cross apply (values (v.id, substring(v.[Name], 1, patindex('%/%/%', v.Name)-4), 1),
(v.id, substring(v.[Name], patindex('%/%/%', v.Name)-1, len(v.[Name])), 2)) unpvt(id, [Name], seq)
where patindex('%/%/%', v.[Name])>1;
Results
ID Name
1 Evil Empire
1.1 Starbucks
2 Aubrey Drake Graham
2.1 Drake
3 Thomas Johnson Bridge
3.1 Solomans Bridge
4 Thomas, J, Cat
4.1 Solomans,,,Bridge

As you can't use the built-in STRING_SPLIT, you will need to add a table-valued function to do the splitting for you. Using one of these allows you to split your data like this:
Query
declare @t table(ID int,[Name] varchar(100));
insert into @t values
(1,'Evil Empire, f/k/a - Starbucks')
,(2,'Aubrey Drake Graham, n/k/a Drake')
,(3,'Thomas Johnson Bridge, f/k/a Solomans Bridge')
;
select case when s.rn = 1
then t.ID
else t.ID + ((s.rn - 1)/10.)
end as ID
,replace(replace(replace(s.item,' f/k/a - ',''),' f/k/a ',''),' n/k/a ','') as [Name]
from @t as t
cross apply dbo.fn_StringSplit4k(t.[Name],',',null) as s
order by t.ID
,s.rn;
Output
+----------+-----------------------+
| ID | Name |
+----------+-----------------------+
| 1.000000 | Evil Empire |
| 1.100000 | Starbucks |
| 2.000000 | Aubrey Drake Graham |
| 2.100000 | Drake |
| 3.000000 | Thomas Johnson Bridge |
| 3.100000 | Solomans Bridge |
+----------+-----------------------+
Function
create function [dbo].[fn_StringSplit4k]
(
@str nvarchar(4000) = ' ' -- String to split.
,@delimiter as nvarchar(20) = ',' -- Delimiting value to split on.
,@num as int = null -- Which value to return.
)
returns table
as
return
-- Start tally table with 10 rows.
with n(n) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
-- Select the same number of rows as characters in @str as incremental row numbers.
-- Cross joins increase exponentially to a max possible 10,000 rows to cover largest @str length.
,t(t) as (select top (select len(isnull(@str,'')) a) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4)
-- Return the position of every value that follows the specified delimiter.
,s(s) as (select 1 union all select t+1 from t where case when @delimiter = '' and t < len(@str) then 1 else case when substring(isnull(@str,''),t,1) = @delimiter then 1 else 0 end end = 1)
-- Return the start and length of every value, to use in the SUBSTRING function.
-- ISNULL/NULLIF combo handles the last value where there is no delimiter at the end of the string.
,l(s,l) as (select s,case when @delimiter = '' then 1 else isnull(nullif(charindex(@delimiter,isnull(@str,''),s),0)-s,4000) end from s)
select rn
,item
from(select row_number() over(order by s) as rn
,substring(@str,s,l) as item
from l
) a
where rn = @num
or @num is null;

PIVOT Multiple columns in tsql

I have a table that is constructed like this
custid|prodid|calls|orders|upsell
34 | 2 | 4 | 2 | 1
However, I need to pivot or reconstruct the table to reflect something like
custid|prodid|code |Value
34 | 2 | call | 4
34 | 2 | order | 2
34 | 2 | upsell| 1

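You could use a union approach. UNION ALL is enough here, because the code column already makes the rows distinct, and it avoids the duplicate-removal step of a plain UNION:
SELECT custid, prodid, 'call' AS code, calls AS [Value] FROM yourTable UNION ALL
SELECT custid, prodid, 'order', orders FROM yourTable UNION ALL
SELECT custid, prodid, 'upsell', upsell FROM yourTable;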
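As an alternative to the union, the same reshaping can be sketched with CROSS APPLY and a table value constructor, which reads the source table only once. This is a sketch under assumptions: @yourTable is a made-up table variable standing in for the real table, with the column types guessed as int.

```sql
-- Hypothetical stand-in for the real table; column types are assumed.
DECLARE @yourTable TABLE (custid int, prodid int, calls int, orders int, upsell int);
INSERT INTO @yourTable VALUES (34, 2, 4, 2, 1);

-- Each source row is expanded into one row per (code, Value) pair.
SELECT t.custid, t.prodid, v.code, v.[Value]
FROM @yourTable AS t
CROSS APPLY (VALUES ('call',   t.calls),
                    ('order',  t.orders),
                    ('upsell', t.upsell)) AS v(code, [Value]);
```

For the single sample row this yields the three rows from the question (call 4, order 2, upsell 1). Add an ORDER BY if the order of the codes matters.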

How to select all PK's (column 1) where the MAX(ISNULL(value, 0)) in column 3 grouped by a value in column 2?

I couldn't find an answer to my question, since none of the similar questions use a nullable int as the max value and return just one column from it.
My table is as follows:
| ContractId | ContractNumber | ContractVersion |
+------------+----------------+-----------------+
| 1 | 11 | NULL |
| 2 | 11 | 1 |
| 3 | 11 | 2 |
| 4 | 11 | 3 | --get this one
| 5 | 24 | NULL |
| 6 | 24 | 1 | --get this one
| 7 | 75 | NULL | --get this one
The first version is NULL and all following versions get a number starting with 1.
So now I only want to get the rows of the latest contracts (as shown in the comments behind the rows).
So for each ContractNumber I want to select the ContractId from the latest ContractVersion.
The MAX() function won't work since it's a nullable int.
So I was thinking to use the ISNULL(ContractVersion, 0) in combination with the MAX() function, but I wouldn't know how.
I tried the following code:
SELECT
ContractNumber,
MAX(ISNULL(ContractVersion, 0))
FROM
Contracts
GROUP BY
ContractNumber
...which returned all of the latest version numbers combined with the ContractNumber, but I need the ContractId. When I add ContractId in the SELECT and the GROUP BY, I'm getting all the versions again.
The result should be:
| ContractId |
+------------+
| 4 |
| 6 |
| 7 |

It's just a simple application of ROW_NUMBER() when you're wanting to select rows based on Min/Max:
declare @t table (ContractId int, ContractNumber int, ContractVersion int)
insert into @t(ContractId,ContractNumber,ContractVersion) values
(1,11,NULL ),
(2,11, 1 ),
(3,11, 2 ),
(4,11, 3 ),
(5,24,NULL ),
(6,24, 1 ),
(7,75,NULL )
;With Numbered as (
select *,ROW_NUMBER() OVER (
PARTITION BY ContractNumber
order by ContractVersion desc) rn -- NULL sorts lowest in SQL Server, so it comes last with DESC
from @t
)
select
*
from
Numbered
where rn = 1

This should also work, ranking the versions per ContractNumber and treating NULL as 0:
select ContractId
from (select ContractId,
rank() over (partition by ContractNumber order by isnull(ContractVersion, 0) desc) as rnk
from Contracts) ranked
where rnk = 1;

How can I use column reference in REGEXP_REPLACE in Oracle?

My script looks like this :
SELECT
regexp_replace(column1,'"resId":([^"]+?)..','"resId":column2,"')
FROM
table;
Here, I need to replace resId value in column1 by value from column2.

It is hard to tell exactly what you want without sample data and expected output.
It appears you want to transform a pattern like this: "resId":Value1,"otherid":othervalue
Please note that I assumed a key-value pair of "resId":value exists in the data and that there is a separator (a space or a comma) between such key-value pairs.
(,|$) indicates a comma separator or end of line after the value. You may change this to contain any separator in your data that distinguishes it from other combinations. If there is no such separator, or the data is structured differently, please describe it clearly by editing your question so that we can provide a proper solution.
SQL Fiddle
Query:
SELECT column1,
column2,
regexp_replace(column1,'"resId":[^"]+(,|$)','"resId":' || column2 || '\1') as replaced
FROM t
Results:
| COLUMN1 | COLUMN2 | REPLACED |
|-------------------------------------|---------|-------------------------------------|
| "resId":Value1,"otherid":othervalue | Value2 | "resId":Value2,"otherid":othervalue |
| "otherid":othervalue,"resId":Value1 | Value2 | "otherid":othervalue,"resId":Value2 |

Concatenate the column2 value into your replacement string:
SELECT regexp_replace(
column1,
'"resId":([^"]+?),"','"resId":' || column2 || ',"'
)
FROM table;
However, if your data is well-formed JSON and the "resId" value is a simple literal (not an array or an object), then you can use a regular expression that parses it, like:
'("resId":)(null|true|false|(-?0|[1-9]\d*)(\.\d*)?([eE][+-]?\d+)?|"(\\["\/bfrnt]|\\u\d{4}|[^"\/'||CHR(8)||CHR(9)|| CHR(10)||CHR(12)||CHR(13)||'])*")'
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( column1, column2 ) AS
SELECT '{"resId":null}', 1 FROM DUAL UNION ALL
SELECT '{"resId":true}', 2 FROM DUAL UNION ALL
SELECT '{"resId":false}', 3 FROM DUAL UNION ALL
SELECT '{"resId":123}', 4 FROM DUAL UNION ALL
SELECT '{"resId":""}', 5 FROM DUAL UNION ALL
SELECT '{"resId":"\r\n"}', 6 FROM DUAL UNION ALL
SELECT '{"resId":"test"}', 7 FROM DUAL UNION ALL
SELECT '{"resId":"' || CHR(13) || CHR(10) || '"}', 8 FROM DUAL;
Query 1:
SELECT column1,
column2,
regexp_replace(
column1,
'("resId":)(null|true|false|(-?0|[1-9]\d*)(\.\d*)?([eE][+-]?\d+)?|"(\\["\/bfrnt]|\\u\d{4}|[^"\/'||CHR(8)||CHR(9)|| CHR(10)||CHR(12)||CHR(13)||'])*")',
'\1' || column2
) As repl
FROM table_name
Results:
| COLUMN1 | COLUMN2 | REPL |
|------------------|---------|----------------|
| {"resId":null} | 1 | {"resId":1} |
| {"resId":true} | 2 | {"resId":2} |
| {"resId":false} | 3 | {"resId":3} |
| {"resId":123} | 4 | {"resId":4} |
| {"resId":""} | 5 | {"resId":5} |
| {"resId":"\r\n"} | 6 | {"resId":6} |
| {"resId":"test"} | 7 | {"resId":7} |
| {"resId":" | 8 | {"resId":" | -- Note: not well-formed JSON
| "} | | "} | -- so did not get matched.

Return column names based on which holds the maximum value in the record

I have a table with the following structure ...
+--------+------+------+------+------+------+
| ID | colA | colB | colC | colD | colE | [...] etc.
+--------+------+------+------+------+------+
| 100100 | 15 | 100 | 90 | 80 | 10 |
+--------+------+------+------+------+------+
| 100200 | 10 | 80 | 90 | 100 | 10 |
+--------+------+------+------+------+------+
| 100300 | 100 | 90 | 10 | 10 | 80 |
+--------+------+------+------+------+------+
I need to return a concatenated value of column names which hold the maximum 3 values per row ...
+--------+----------------------------------+
| ID | maxCols |
+--------+----------------------------------+
| 100100 | colB,colC,colD |
+--------+----------------------------------+
| 100200 | colD,colC,colB |
+--------+----------------------------------+
| 100300 | colA,colB,colE |
+--------+----------------------------------+
It's okay to not concatenate the column names, and have maxCol1 | maxCol2 | maxCol3 if that's simpler
The order of the columns is important when concatenating them
The number of columns is limited and not dynamic
The number of rows is many

You could use UNPIVOT and get TOP 3 for each ID
;with temp AS
(
SELECT ID, ColValue, ColName
FROM #SampleData sd
UNPIVOT
(
ColValue For ColName in ([colA], [colB], [colC], [colD], [colE])
) unp
)
SELECT sd.ID, ca.ColMax
FROM #SampleData sd
CROSS APPLY
(
SELECT STUFF(
(
SELECT TOP 3 WITH TIES
',' + t.ColName
FROM temp t
WHERE t.ID = sd.ID
ORDER BY t.ColValue DESC
FOR XML PATH('')
)
,1,1,'') AS ColMax
) ca
See demo here: http://rextester.com/CZCPU51785

Here is one trick to do it using Cross Apply and Table Valued Constructor
SELECT Id,
maxCols= Stuff(cs.maxCols, 1, 1, '')
FROM Yourtable
CROSS apply(SELECT(SELECT TOP 3 ',' + NAME
FROM (VALUES (colA,'colA'),(colB,'colB'),(colC,'colC'),
(colD,'colD'),(colE,'colE')) tc (val, NAME)
ORDER BY val DESC
FOR xml path, type).value('.[1]', 'nvarchar(max)')) cs (maxCols)
If needed it can be made dynamic using Information_schema.Columns
