Substring replacement in Snowflake SQL - snowflake-cloud-data-platform

I need to reformat a string so that it matches the key:
-- remove the symbols ( ) and , (replace them with nothing),
-- replace each single space with an underscore,
-- replace BAND with BD
e.g. x_input = Higher Education Worker Level 10, Band 2 (salaried)
x_output = HEW_HIGHER_EDUCATION_WORKER_LEVEL_10_BD_2_SALARIED
I have written the code with nested REPLACE calls, which gives the correct output pattern:
select 'Higher Education Worker Level 10, Band 2 (salaried)' as class_0,
replace(replace(replace(upper('Higher Education Worker Level 10, Band 2 (salaried)'), '(', ''), ')', ''), ' ', '_') as class_1,
replace(class_1, ',', '') as class_2,
replace(class_2, 'BAND', 'BD') as class_4
Is there a more elegant way to do this? I was reading through the Snowflake regex pattern-matching help but could not find a cleaner way, and it also took a couple of nested iterations.
Any hint would be appreciated.
Thanks

For the single-character replacements and removals you can use translate(), which shortens the chain of replace() calls considerably.
https://docs.snowflake.com/en/sql-reference/functions/translate.html
Query with identical results from the question, but way less code:
select 'Higher Education Worker Level 10, Band 2 (salaried)' as class_0,
translate(upper(class_0), ' ()', '_') as class_1,
replace(class_1, ',', '') as class_2,
replace(class_2, 'BAND', 'BD') as class_4;
In one step:
select replace(translate(upper(class_0), ' (),', '_'), 'BAND', 'BD') class_4
from (
select 'Higher Education Worker Level 10, Band 2 (salaried)' class_0
)
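Since the question asked about regex, here is a minimal sketch using Snowflake's REGEXP_REPLACE to strip the three symbols with one character class instead of TRANSLATE. REPLACE is case-sensitive, so after UPPER() the search text has to be 'BAND', not 'Band'; note also that a plain REPLACE would change BAND inside longer words (e.g. HUSBAND). The HEW_ prefix in the desired output does not follow from the listed rules and is not produced here.

```sql
-- Remove ( ) and , with one bracket expression, then turn spaces into underscores
-- and abbreviate BAND (REPLACE is case-sensitive, so match the upper-cased text).
select replace(
         replace(regexp_replace(upper(class_0), '[(),]', ''), ' ', '_'),
         'BAND', 'BD') as class_4
from (
  select 'Higher Education Worker Level 10, Band 2 (salaried)' as class_0
);
-- expected: HIGHER_EDUCATION_WORKER_LEVEL_10_BD_2_SALARIED
```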

Related

Issue with MSSQL full-text search

The original table contains JSON, but I've stripped it down to the table below:
id  json
1   "name":"one.it.two"
2   "name": "one.it.two"
The difference between the two rows is the space after the colon (:).
The catalog has no stopwords.
Searching for CONTAINS (json, 'it') returns both rows.
Searching for CONTAINS (json, 'two') returns both rows.
Searching for CONTAINS (json, 'one') returns only the second row.
Why does searching for one not return the first row?
I've reduced the test case even further, thanks to @RobinWebb.
This is no longer a JSON or delimited-text issue.
id  text1
1   name:first.it
2   name: first.it
The difference between the two rows is the space after the colon.
Searching for first does not return the first row.
The search works if I change first.it to first.and.
Thanks to @AlwaysLearning, this turned out to be an issue with the word breaker.
The results from sys.dm_fts_parser are not consistent:
text            words
name:first.it   name:first.it, name, :first, it
name:first.and  name, first.and, first, and
name:first,it   name, first, it
I used SELECT * FROM sys.dm_fts_parser ('"<text>"', 1033, NULL, 0)
Based on the info provided in this answer https://dba.stackexchange.com/a/65845/94130
it seems that .it is treated as a special word (possibly a top level domain) by word breaker.
I can only infer that this "special word logic" has a b̶u̶g̶ feature in it, where : is treated as part of the name. Examples:
SELECT * FROM sys.dm_fts_parser (' "name:first.it" ', 1033, 0, 0);
SELECT * FROM sys.dm_fts_parser (' "name:first.net" ', 1033, 0, 0);
SELECT * FROM sys.dm_fts_parser (' "name:first.com" ', 1033, 0, 0);
SELECT * FROM sys.dm_fts_parser (' "name:first.gov" ', 1033, 0, 0);
Notice that it always returns one extra result. I assume it includes the extra line when it thinks that part of the string is a URL.
special_term  display_term    expansion_type  source_term
Exact Match   name:first.gov  0               name:first.gov
Exact Match   name            0               name:first.gov
Exact Match   :first          0               name:first.gov
Exact Match   gov             0               name:first.gov
Note that some words are not affected:
SELECT * FROM sys.dm_fts_parser (' "name:first.he" ', 1033, 0, 0);
I have modified code provided in https://dba.stackexchange.com/a/25848/94130
to get all characters that are treated this way.
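-- Loop over character codes 0-254, build the phrase "name<c>first.net" for each,
-- and report every character that the word breaker glues onto 'first'
-- (i.e. a token whose display_term is <c>first).
declare @i integer
declare @cnt integer
set @i=0
while @i<255
begin
set @cnt=0
-- double any embedded quote so the full-text phrase stays well-formed
select @cnt=COUNT(1) FROM sys.dm_fts_parser ('"name'+REPLACE(CHAR(@i),'"','""')+'first.net"', 1033, 0, 0)
WHERE display_term = CHAR(@i) + 'first'
if @cnt=1
begin
-- print the character itself only when it is printable (code > 31)
print 'this char - '+CASE WHEN @i > 31 THEN char(@i) ELSE '' END+' - char('+convert(varchar(3),@i)+') is included'
end
set @i=@i+1
end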
Output:
this char - - char(0) is included
this char - : - char(58) is included
this char - ­ - char(173) is included
There is a Microsoft article explaining how to switch word breakers/stemmers, which may or may not solve this; I have not tried it.
Note: Above code was executed on Win 11 and SQL 2019 Dev
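To compare the two rows from the question directly, a sketch that runs both strings through the same sys.dm_fts_parser call used above and labels each token by variant. It needs the same permissions sys.dm_fts_parser already requires; no particular output is claimed here, since the tokens depend on the installed word breaker.

```sql
-- Tokenize both variants with identical settings (LCID 1033, system stoplist,
-- accent-insensitive) so their display_term lists can be compared side by side.
SELECT 'no space after colon' AS variant, occurrence, display_term
FROM sys.dm_fts_parser(' "name:first.it" ', 1033, 0, 0)
UNION ALL
SELECT 'space after colon', occurrence, display_term
FROM sys.dm_fts_parser(' "name: first.it" ', 1033, 0, 0)
ORDER BY variant, occurrence;
```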

SQL Server 2014: Invalid length parameter passed to the LEFT or SUBSTRING function

I am getting the following error while trying to execute the query shown below in SQL Server 2014. I have customer chat data, and I want to replace the customer's name with "Customer" and the agent's name with "Agent".
Error: Invalid length parameter passed to the LEFT or SUBSTRING function.
Data format:
11:35:41 Daniella Sichtman : I don't mind. It's ok
11:35:55 Daniella Sichtman : Did you understand my problem?
11:36:09 Madan : Yes, I got your issue.
11:36:20 Madan : Please stay connected while I check what best I can do for you.
11:37:01 Daniella Sichtman : OK. If I may suggest. Mail the hotel that we need 2 nights. I have their contact information if you need that.
11:37:21 Daniella Sichtman : The room are availible they told us
11:37:41 Daniella Sichtman : Just need an ok
11:37:43 Madan : Have you visited the hotel reception to extend your stay ?
11:38:01 Daniella Sichtman : Yes. They told us you need to give the ok
11:39:14 Madan : Nico, I would like to Inform you that we have already authorized to the hotel to extend the stay for our guests.
11:39:46 Daniella Sichtman : They don't know about that or did you told them this morning?
SQL code:
SELECT REPLACE(Transcript,
SUBSTRING(SUBSTRING(transcript, CHARINDEX(' ', transcript) + 1, (((LEN(transcript)) - CHARINDEX(':', REVERSE(transcript))) - CHARINDEX(' ', transcript))),
1,
CASE
WHEN CHARINDEX(':', SUBSTRING(transcript, CHARINDEX(' ', transcript) + 1, (((LEN(transcript)) - CHARINDEX(':', REVERSE(transcript))) - CHARINDEX(' ', transcript)))) = 0 THEN LEN(Transcript)
ELSE CHARINDEX(':', SUBSTRING(transcript, CHARINDEX(' ', transcript) + 1, (((LEN(transcript)) - CHARINDEX(':', REVERSE(transcript))) - CHARINDEX(' ', transcript)))) - 2
END),
'Customer')
FROM [easyJet_Staging].[dbo].[KANA_Chat_Transcript_StageAll]
WHERE Transcript IS NOT NULL
AND Transcript <> '';
Note: I have seen many posts but didn't get the desired output. Could you please help me?
Expected output:
XX:XX:XX Agent : I do understand what you are saying but I am afraid, but any instrument larger than XXcm x XXXcm x XXcm like a double bass or harp can’t be taken on board the aircraft as cabin baggage.
XX:XX:XX Customer : Hello, it's a GUITAR. it is just X cm bigger, the case is Xcm thicker.
XX:XX:XX Customer : it's a soft case that can be pushed to fit your dimensions
XX:XX:XX Agent : I do understand what you are saying but even it is X cm then also they will ask you to put it on hold and we do understand that you are worried if it gets damaged but there won't be any case and you can get the fragile tag from the Airport so that it will be taken care.
XX:XX:XX Customer : What if I take it on board (priority boarding) and it fits within their dimensions exactly and can go in an overhead locker?
The SUBSTRING documentation indicates that it accepts any int for the second argument, but the third argument (length) must not be negative; a negative length raises exactly this error. So the problem is probably in the third parameter. ( https://learn.microsoft.com/en-us/sql/t-sql/functions/substring-transact-sql?view=sql-server-2017 )
It sounds like you have records in your data that don't fit what you are expecting. Try this query to identify them: it looks for any records where the third parameter of either SUBSTRING is zero or negative.
Because of the way SQL works, sometimes functions in the select will be applied to records before they are filtered out using the where clause. It is safest to make sure your select will work on all records, even the ones that will be filtered out in the where clause.
SELECT *
FROM
[easyJet_Staging].[dbo].[KANA_Chat_Transcript_StageAll]
WHERE (((LEN(transcript)) - CHARINDEX(':', REVERSE(transcript))) - CHARINDEX(' ', transcript)) <= 0
OR CASE
WHEN CHARINDEX(':', SUBSTRING(transcript, CHARINDEX(' ', transcript) + 1, (((LEN(transcript)) - CHARINDEX(':', REVERSE(transcript))) - CHARINDEX(' ', transcript)))) = 0 THEN LEN(Transcript)
ELSE CHARINDEX(':', SUBSTRING(transcript, CHARINDEX(' ', transcript) + 1, (((LEN(transcript)) - CHARINDEX(':', REVERSE(transcript))) - CHARINDEX(' ', transcript)))) - 2
END <=0
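Another way to avoid the error is to compute the positions once and only call SUBSTRING/STUFF when the length is positive. A minimal sketch for a single chat line, assuming each value holds one line in the `HH:MM:SS Name : text` format shown in the question; the agent name 'Madan' is taken from the sample and is a hypothetical stand-in for however agents are really identified. A multi-line transcript would have to be split into lines first.

```sql
DECLARE @line varchar(400) = '11:35:41 Daniella Sichtman : I don''t mind. It''s ok';
DECLARE @agentName varchar(100) = 'Madan';  -- hypothetical: agent identification is not part of the question

-- The name starts after the first space (the one following the timestamp)
-- and ends just before the ' : ' separator.
DECLARE @nameStart int = CHARINDEX(' ', @line) + 1;
DECLARE @sep int = CHARINDEX(' : ', @line);

SELECT CASE
         -- only rewrite when the separator comes after the name start, so the length is positive
         WHEN @sep > @nameStart THEN
           STUFF(@line, @nameStart, @sep - @nameStart,
                 CASE WHEN SUBSTRING(@line, @nameStart, @sep - @nameStart) = @agentName
                      THEN 'Agent' ELSE 'Customer' END)
         ELSE @line  -- leave malformed lines untouched instead of raising an error
       END AS masked_line;
-- expected: 11:35:41 Customer : I don't mind. It's ok
```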

MS SQL REPLACE based on 1 character to the left of the $

I am not a SQL expert so please forgive me if this is SQL 101 :).
In a select statement there are 2 REPLACE functions. They look for a server name and its admin share d$ by its UNC path, for example '\\SERVERNAME\d$'.
It then replaces '\\SERVERNAME\d$' with 'D:'.
Here is the query currently:
select Replace(p.Path,'\\SERVERNAME\d$','D:') as searchpath
,p.path as fullpath
,s.ShareName
,s.SharePath
,p.Member
,p.Access
From Paths As p
Left Outer Join Shares as s on
Replace(p.Path,'\\SERVERNAME\d$','D:') Like s.SharePath + '\%'
Up until now it has always been d$.
Today my needs have changed, and I need the query to find ANY server name UNC path admin share regardless of share letter (c$, d$, e$, f$, etc.) and replace it with its respective drive letter (D:, E:, F:, etc.).
My thought is that the replace function could find the $ and look one character to the left of it to get the proper share letter, then use that for the replace. The issue I have, not being a SQL professional, is that I know SQL can likely do what I need it to do; I just don't know how to get there. I've googled and found some examples, but haven't had any luck getting them to work.
Any help would be greatly appreciated.
You can use a combination of STUFF, PATINDEX, LEN to get what you want.
Sample Query
DECLARE @ReplaceChar VARCHAR(100) = '[prefixcharacters]\\SERVERNAME\d$[postcharacter]'
-- '_' is a single-character wildcard in PATINDEX patterns, so this matches c$, d$, e$, ...
DECLARE @SearchString VARCHAR(100) = '\\SERVERNAME\_$'
SELECT
STUFF(@ReplaceChar,PATINDEX('%' + @SearchString + '%',@ReplaceChar),LEN(@SearchString),
UPPER(SUBSTRING(@ReplaceChar,PATINDEX('%' + @SearchString + '%',@ReplaceChar) + LEN(@SearchString) - 2,1)) + ':') as searchpath
WHERE PATINDEX('%' + @SearchString + '%',@ReplaceChar) > 0
Output
[prefixcharacters]D:[postcharacter]
Alternate Query
You can shorten the query if you want to get the previous character before $ as per your title. Something like this
DECLARE @ReplaceChar VARCHAR(100) = '[prefixcharacters]\\SERVERNAME\d$[postcharacter]'
DECLARE @SearchString VARCHAR(100) = '\\SERVERNAME\_$'
SELECT
STUFF(@ReplaceChar,
PATINDEX('%'+@SearchString+'%',@ReplaceChar),
LEN(@SearchString),
-- the drive letter is the character just before the '$'
UPPER(SUBSTRING(@ReplaceChar,CHARINDEX('$',@ReplaceChar) -1,1)) + ':')
WHERE PATINDEX('%'+@SearchString+'%',@ReplaceChar) > 0
In this query
STUFF replaces your pattern with the character before $ + ':'
Start of pattern is identified by PATINDEX('%'+#SearchString+'%',#ReplaceChar)
D is identified by getting the charindex of '$' and then getting the previous character using SUBSTRING
What about this:
-- position 14 is the drive letter, right after the 13-character '\\SERVERNAME\'
select Replace(SUBSTRING(p.path, 14, Len(p.path)-13),'$',':') as searchpath
,p.path as fullpath
,s.ShareName
,s.SharePath
,p.Member
,p.Access
From Paths As p
Left Outer Join Shares as s on
Replace(SUBSTRING(p.path, 14, Len(p.path)-13),'$',':') Like s.SharePath + '\%'
DECLARE @str nvarchar (100)
SET @str = '\\SERVERNAME\d$'
IF @str LIKE '\\SERVERNAME\_$'
SET @str = UPPER(SUBSTRING(@str, 14, 1)) + ':'
SELECT @str
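Starting from the previous query, something like:
-- drive letter at position 14, then everything after the '$' at position 15
select UPPER(SUBSTRING(p.path, 14, 1)) + ':' + SUBSTRING(p.path, 16, LEN(p.path)) as searchpath
,p.path as fullpath
,s.ShareName
,s.SharePath
,p.Member
,p.Access
From Paths As p
Left Outer Join Shares as s on
SUBSTRING(p.path, 14, 1) + ':' + SUBSTRING(p.path, 16, LEN(p.path)) Like s.SharePath + '\%'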
I am no MySQL expert either :)
Based on the logic you mentioned in the last part of the question, I have used concat and substring to get to the drive letter in the column.
Hope this helps
select replace(path, concat(substring(path, 1, locate('$', path) - 2), substring(path, locate('$', path) - 1, 1) , '$'), concat(substring(path, locate('$', path) - 1, 1) , ':')) as searchpath ...
The remaining part of the query would be the same.
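A T-SQL sketch that does not depend on the server name having a fixed length: it locates the '$', takes the character before it as the drive letter, and keeps everything after the '$'. It assumes the first '$' in the path is the one that ends the admin share (c$, d$, ...) and only rewrites paths matching the `\\server\x$` shape; other paths pass through unchanged. Table and column names are the ones from the question.

```sql
select x.searchpath
      ,p.Path as fullpath
      ,s.ShareName
      ,s.SharePath
      ,p.Member
      ,p.Access
From Paths As p
-- position of the first '$' (assumed to end the admin share name)
Cross Apply (select CHARINDEX('$', p.Path) as dollar) as d
-- '\\SRV\e$\Projects' -> 'E:\Projects'; anything else is left as-is
Cross Apply (
    select case
             when p.Path like '\\%\_$%' and d.dollar > 1
             then UPPER(SUBSTRING(p.Path, d.dollar - 1, 1)) + ':'
                  + SUBSTRING(p.Path, d.dollar + 1, LEN(p.Path))
             else p.Path
           end as searchpath
) as x
Left Outer Join Shares as s on
    x.searchpath Like s.SharePath + '\%'
```

Computing the rewritten path once in CROSS APPLY lets the SELECT list and the join condition share it instead of repeating the expression.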

Get a Substring in SQL between two characters and remove all white spaces

I have strings such as:
Games/Maps/MapsLevel1/Level 1.swf
Games/AnimalWorld/Animal1.1/Level 1.1.swf
Games/patterns and spatial understanding/Level 13.5/Level 13.5.swf
I want to get only the file name without its extension (the string after the last slash and before the last dot), i.e. Level 1, Level 1.1 and Level 13.5. I also want to remove all white space, and the final string should be lower case, i.e. the final output should be
level1
level1.1
level13.5 and so on..
I tried the following query but got Level 1.swf. How do I change this query?
SELECT SUBSTRING(vchServerPath, LEN(vchServerPath) - CHARINDEX('/', REVERSE(vchServerPath)) + 2, LEN(vchServerPath)) FROM Games
SELECT (left((Path), LEN(Path) - charindex('.', reverse(Path))))
FROM
(
SELECT SUBSTRING(vchServerPath,
LEN(vchServerPath) - CHARINDEX('/', REVERSE(vchServerPath)) + 2,
LEN(vchServerPath)) Path
FROM Games
) A
This should work. I kept your inner SUBSTRING, which got you part of the way, and added the stripping of the extension after the last dot.
I have included an SQL Fiddle link so you can see it in action.
Edited:
The following removes the white space and returns lower case:
SELECT REPLACE(LOWER((left((Path), LEN(Path) - charindex('.', reverse(Path))))), ' ', '')
FROM
(
SELECT SUBSTRING(vchServerPath,
LEN(vchServerPath) - CHARINDEX('/', REVERSE(vchServerPath)) + 2,
LEN(vchServerPath)) Path
FROM Games
) A
Try this:
select
case
when vchServerPath is not null
then lower(reverse(replace(substring(reverse(vchServerPath),charindex('.',reverse(vchServerPath))+1, charindex('/',reverse(vchServerPath))-(charindex('.',reverse(vchServerPath))+1)),' ','')))
else ''
end
from Games
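This should also work, with the extension removed. It assumes a three-character extension such as .swf (the hard-coded 5 starts just past the reversed "fws."):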
select
REVERSE(
SUBSTRING(
reverse('Games/patterns and spatial understanding/Level 13.5/Level 13.5.swf'),
5,
(charindex('/',
reverse('Games/patterns and spatial understanding/Level 13.5/Level 13.5.swf')) - 5)
))
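The steps from the answers above can also be combined into one pass. A sketch assuming SQL Server and that every path contains at least one '/': CROSS APPLY extracts the file name once, then everything from the last dot onward is dropped, and the result is lower-cased with spaces removed. Unlike the hard-coded 5 above, this handles extensions of any length, and a name without a dot is kept whole.

```sql
SELECT g.vchServerPath,
       -- drop everything from the last '.', lower-case, strip spaces
       REPLACE(LOWER(LEFT(f.FileName, LEN(f.FileName) - CHARINDEX('.', REVERSE(f.FileName)))), ' ', '') AS CleanName
FROM Games AS g
-- file name = text after the last '/'
CROSS APPLY (SELECT RIGHT(g.vchServerPath, CHARINDEX('/', REVERSE(g.vchServerPath)) - 1) AS FileName) AS f;
```

For the three sample paths in the question this gives level1, level1.1 and level13.5.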

Oracle split text into multiple rows

Inside a varchar2 column I have text values like:
aaaaaa. fgdfg.
bbbbbbbbbbbbbb ccccccccc
dddddd ddd dddddddddddd,
asdasdasdll
sssss
If I do select column from table where id=..., I get the whole text in a single row, as expected.
But I would like to get the result in multiple rows, 5 for the example above.
I have to use just one select statement, and the delimiters will be new line or carriage return (chr(10), chr(13) in Oracle).
Thank you!
Like this, maybe (but it all depends on the version of Oracle you are using):
WITH yourtable AS (SELECT REPLACE('aaaaaa. fgdfg.' ||chr(10)||
'bbbbbbbbbbbbbb ccccccccc ' ||chr(13)||
'dddddd ddd dddddddddddd,' ||chr(10)||
'asdasdasdll ' ||chr(13)||
'sssss '||chr(10),chr(13),chr(10)) AS astr FROM DUAL)
SELECT REGEXP_SUBSTR ( astr, '[^' ||chr(10)||']+', 1, LEVEL) data FROM yourtable
CONNECT BY LEVEL <= LENGTH(astr) - LENGTH(REPLACE(astr, chr(10))) + 1
see: Comma Separated values in Oracle
The answer by Kevin Burton contains a bug if your data contains empty lines.
The adaptation below, based on the solution invented here, works. Check that post for an explanation on the issue and the solution.
WITH yourtable AS (SELECT REPLACE('aaaaaa. fgdfg.' ||chr(10)||
'bbbbbbbbbbbbbb ccccccccc ' ||chr(13)||
chr(13)||
'dddddd ddd dddddddddddd,' ||chr(10)||
'asdasdasdll ' ||chr(13)||
'sssss '||chr(10),chr(13),chr(10)) AS astr FROM DUAL)
SELECT REGEXP_SUBSTR ( astr, '([^' ||chr(10)||']*)('||chr(10)||'|$)', 1, LEVEL, null, 1) data FROM yourtable
CONNECT BY LEVEL <= LENGTH(astr) - LENGTH(REPLACE(astr, chr(10))) + 1;
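A variant that reads from the actual table (the question's `select column from table where id=...`), assuming Oracle 11g or later for REGEXP_COUNT and the subexpression argument of REGEXP_SUBSTR. It first collapses Windows-style CR+LF pairs into a single LF, so a CR+LF line ending is not counted as an empty line, then applies the pattern from the answer above. `my_table`, `id`, `txt` and the bind variable `:id` are placeholder names.

```sql
WITH src AS (
  -- Normalise line endings: CR+LF first, then any lone CR, all become LF
  SELECT REPLACE(REPLACE(txt, CHR(13) || CHR(10), CHR(10)), CHR(13), CHR(10)) AS astr
  FROM my_table            -- hypothetical table/column names
  WHERE id = :id           -- one row, as in the question
)
SELECT LEVEL AS line_no,
       -- subexpression 1 = the text of the LEVEL-th line (empty lines come back as NULL)
       REGEXP_SUBSTR(astr, '([^' || CHR(10) || ']*)(' || CHR(10) || '|$)', 1, LEVEL, NULL, 1) AS line
FROM src
CONNECT BY LEVEL <= REGEXP_COUNT(astr, CHR(10)) + 1;
```

If the text ends with a line break, the last row returned is an empty (NULL) line.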
