Extract particular text from String in Snowflake - snowflake-cloud-data-platform

I'm new to Snowflake.
Input String : ["http://info.wealthenhancement.com/ppc-rt-retirement-planning"]
Output String : info.wealthenhancement.com/ppc-rt-retirement-planning
Please help to get output string.
Thanks

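Use the substr function to take everything from the 8th character to the end. This skips the seven characters of http://, so it assumes the brackets and quotes are already removed and the URL does not use https: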
select
'http://info.wealthenhancement.com/ppc-rt-retirement-planning' as orig_value,
substr(orig_value, 8) as new_value
The output is:
+-------------------------------------------------------------+-------------------------------------------------------+
|ORIG_VALUE                                                   | NEW_VALUE                                             |
+-------------------------------------------------------------+-------------------------------------------------------+
|http://info.wealthenhancement.com/ppc-rt-retirement-planning | info.wealthenhancement.com/ppc-rt-retirement-planning |
+-------------------------------------------------------------+-------------------------------------------------------+

This works for both http and https URLs by splitting on // as a delimiter. Only the SET and the final SELECT are needed; the middle SELECT just shows the intermediate step:
-- Set a session variable to the string
set INPUT_STRING = '["http://info.wealthenhancement.com/ppc-rt-retirement-planning"]';
-- Trim leading and trailing square brackets and double quotes
select (trim($INPUT_STRING, '"[]'));
-- Split using // as a delimiter and keep only the right part and cast as string
select split((trim($INPUT_STRING, '"[]')), '//')[1]::string as URL
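Because the input looks like a JSON array holding one string, another option is to parse it rather than trim it. This is only a sketch, assuming the value is always valid JSON with the URL as its first element; the column aliases are illustrative:

```sql
select
    '["http://info.wealthenhancement.com/ppc-rt-retirement-planning"]' as input_string,
    -- Read the first array element as a plain string (drops the brackets and quotes)
    parse_json(input_string)[0]::string as url,
    -- Remove a leading http:// or https:// only
    regexp_replace(url, '^https?://', '') as without_scheme;
```

The pattern is anchored with ^, so a // that appears later in the path is left alone.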

Related

Regex string with 2+ different numbers and some optional characters in Snowflake syntax

I would like to check if a specific column in one of my tables meets the following conditions:
String must contain at least three characters
String must contain at least two different numbers [e.g. 123 would work but 111 would not]
Characters which are allowed in the string:
Numbers (0-9)
Uppercase letters
Lowercase letters
Underscores (_)
Dashes (-)
I have some experience with Regex but am having issues with Snowflake's syntax. Whenever I try using the '?' regex character (to mark something as optional) I receive an error. Can someone help me understand a workaround and provide a solution?
What I have so far:
SELECT string,
LENGTH(string) AS length
FROM tbl
WHERE REGEXP_LIKE(string,'^[0-9]+{3,}[-+]?[A-Z]?[a-z]?$')
ORDER BY length;
Thanks!
Your regex is a little confusing, and it doesn't quite meet your needs either. I read this expression as matching a string that:
Must start with one or more digits, repeated at least 3 times
The confusing part to me is that '+' is itself a quantifier, so it should not be quantifiable again with {3,}, yet it somehow doesn't produce an error for me
Optionally followed by either a dash or plus sign
Followed by an uppercase character zero or one times (giving back as needed)
Followed by and ending with a lowercase character zero or one times (giving back as needed)
Questions
You say that your string must contain at least 3 characters and at least 2 different numbers. Numbers are characters too, so I'm not sure whether you mean 3 letters.
Are you considering the numbers to be characters?
Does the order of the characters matter?
Can you provide an example of the error you are receiving?
Notes
Checking for a second digit that differs from the first would normally involve a lookahead with a backreference. Snowflake does not support backreferences inside a pattern (they are only available in the replacement string of REGEXP_REPLACE).
One thing about pattern matching with regular expressions is that order makes a difference. If order is not of importance to you, then you'll have multiple patterns to match against.
Example
Below is how you can test each part of your requirements individually. I've included a few regexp_substr functions to show how extraction can work to check if something exists again.
Uncomment the WHERE clause to see the dataset filtered. The filters are written as expressions so you can remove any/all of the regexp_* columns.
select randstr(36,random(123)) as r_string
,length(r_string) AS length
,regexp_like(r_string,'^[0-9]+{3,}[-+]?[A-Z]?[a-z]?$') as reg
,regexp_like(r_string,'.*[A-Za-z]{3,}.*') as has_3_consecutive_letters
,regexp_like(r_string,'.*\\d+.*\\d+.*') as has_2_digits
,regexp_substr(r_string,'(\\d)',1,1) as first_digit
,regexp_substr(r_string,'(\\d)',1,2) as second_digit
,first_digit <> second_digit as digits_1st_not_equal_2nd
,not(regexp_instr(r_string,regexp_substr(r_string,'(\\d)',1,1),1,2)) as first_digit_does_not_appear_again
,has_3_consecutive_letters and has_2_digits and first_digit_does_not_appear_again as test
from table(generator(rowcount => 10))
//where regexp_like(r_string,'.*[A-Za-z]{3,}.*') // has_3_consecutive_letters
// and regexp_like(r_string,'.*\\d+.*\\d+.*') // has_2_digits
// and not(regexp_instr(r_string,regexp_substr(r_string,'(\\d)',1,1),1,2)) // first_digit_does_not_appear_again
;
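Without backreferences, one workaround for the "two different digits" rule is to remove every copy of the first digit and check whether any digit is left. A minimal sketch, assuming the table tbl and column string from the question, and that all three rules must hold at once:

```sql
select string,
       length(string) as length
from tbl
-- RLIKE matches the whole string: only allowed characters, at least 3 of them
where string rlike '[A-Za-z0-9_-]{3,}'
  -- Strip every occurrence of the first digit; if a digit remains, it must be a different one.
  -- When the string has no digit, regexp_substr returns NULL and the row is filtered out.
  and regexp_like(
        replace(string, regexp_substr(string, '[0-9]'), ''),
        '.*[0-9].*'
      )
order by length;
```

With these rules, 123 and 111222 would pass, while 111 and 12 would not.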
Assuming the digits need to be contiguous, you can use a JavaScript UDF to find the number in a string with the largest number of distinct digits:
create or replace function f(S text)
returns float
language javascript
returns null on null input
as
$$
const m = S.match(/\d+/g)
if (!m) return 0
const lengths = m.map(m=> [...new Set (m.split(''))].length)
const max_length = lengths.reduce((a,b) => Math.max(a,b))
return max_length
$$
;
Combined with a WHERE clause, this does what you want, I believe:
select column1, f(column1) max_length
from t
where max_length>1 and length(column1)>2 and column1 rlike '[\\w\\d-]+';
Yielding:
COLUMN1 | MAX_LENGTH
------------------------+-----------
abc123def567ghi1111_123 | 3
123 | 3
111222 | 2
Assuming this input:
create or replace table t as
select * from values ('abc123def567ghi1111_123'), ('xyz111asdf'), ('123'), ('111222'), ('abc 111111111 abc'), ('12'), ('asdf'), ('123 456'), (null);
The function is even simpler if the digits don't have to be contiguous (i.e. count the distinct digits in a string). Then core logic changes to:
const m = S.match(/\d/g)
if (!m) return 0
const length = [...new Set (m)].length
return length
Hope that's helpful!
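For completeness, the non-contiguous variant can be wrapped the same way as the first function. This is a sketch only: the name distinct_digits and the alias n_digits are made up for illustration, and the table t is the sample input shown above:

```sql
-- distinct_digits is a hypothetical name, not part of the original answer
create or replace function distinct_digits(S text)
returns float
language javascript
returns null on null input
as
$$
  // Collect every single digit anywhere in the string
  const m = S.match(/\d/g)
  if (!m) return 0
  // A Set keeps one copy of each digit, so its size is the distinct count
  return new Set(m).size
$$
;

select column1, distinct_digits(column1) as n_digits
from t
where n_digits > 1 and length(column1) > 2 and column1 rlike '[\\w\\d-]+';
```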

Find first appearance of a character in a set of possible characters in a string in SQL Server 2012

I'm aware of the SQL Server CHARINDEX function, which returns the position of a character (or substring) within another string. Still, I did not find any evidence that there is support for regular expressions (unless I develop my own UDF).
What I'm looking for is the ability to find the first position of any character in a set within a string.
Example:
DECLARE @_Source_String NVARCHAR(100) = 'This is "MY" string \ and here is more text' ;
SELECT <some function> (@_Source_String,'"\') ;
This should return 9 because " appears before \. On the other hand:
SELECT <some function> (@_Source_String,'x\') ;
should return 21 because \ is before x.
I should add that performance is very important since this function/mechanism will be invoked with very high frequency.
Pattern matching capabilities in T-SQL are pretty basic, and you would often need CLR and regular expressions.
You can meet this requirement with PATINDEX though. A list of characters in square brackets denotes a set of characters to match.
DECLARE @_Source_String NVARCHAR(100) = 'This is "MY" string \ and here is more text';
SELECT PATINDEX('%["\]%', @_Source_String),
PATINDEX('%[x\]%', @_Source_String);
Returns
+------------------+------------------+
| (No column name) | (No column name) |
+------------------+------------------+
| 9                | 21               |
+------------------+------------------+
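If the character set comes from a variable, the pattern can be built by concatenation. The sketch below assumes a hypothetical variable @_Chars and also shows that PATINDEX returns 0 when none of the characters occur:

```sql
DECLARE @_Source_String NVARCHAR(100) = 'This is "MY" string \ and here is more text';
DECLARE @_Chars NVARCHAR(10) = 'x\';   -- hypothetical variable holding the set of characters

SELECT PATINDEX('%[' + @_Chars + ']%', @_Source_String) AS first_pos,  -- 21, the backslash
       PATINDEX('%[#~]%', @_Source_String)               AS no_match;   -- 0, neither character occurs
```

Characters that have a special meaning inside the brackets, such as ], ^ and -, would need extra care before being placed in the set.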

SQL Select statement until a character

I'm looking to extract all the text up until a '\' (backslash).
The substring is required to remove all preceding characters (16 in total), so I would like to return everything from the 17th character until it comes across a backslash.
I've tried using CHARINDEX, but it doesn't seem to stop at the \; it returns the characters afterward as well. My code is as follows:
SELECT path, substring(path,17, CHARINDEX('\',Path)+ LEN(Path)) As Data
FROM [Table].[dbo].[Projects]
WHERE Path like '\ENQ%\' AND
Deleted = '0'
Example
The screenshot below shows the basic query and result, i.e. the whole string.
I then use substring to remove the first X characters, as there will always be the same number of preceding characters.
But what I'm actually after is (based on the above result) the "Testing 1", "Testing 2" and "Testing ABC" section.
The substring is required to remove all preceding characters (16 in total), so I would like to return everything from the 17th character until it comes across a backslash.
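select
substring(path,17,CHARINDEX('\',Path,17)-17) -- search for the backslash from position 17 onward
from
[Table].[dbo].[Projects]
To avoid the "Invalid length parameter passed to the LEFT or SUBSTRING function" error when no backslash follows position 17, you can use a CASE expression:
select
substring(path,17,
CASE when CHARINDEX('\',Path,17)>0
Then CHARINDEX('\',Path,17)-17
else LEN(Path) -- assumed fallback (the original value here was unclear): keep the rest of the string
end
)
from
[Table].[dbo].[Projects]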
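A variation that avoids the CASE expression is to append a backslash to the searched value, so that CHARINDEX always finds one. The sample paths below are invented for illustration and assume a fixed 16-character prefix:

```sql
SELECT v.path,
       -- Searching path + '\' guarantees a match at or after position 17
       SUBSTRING(v.path, 17, CHARINDEX('\', v.path + '\', 17) - 17) AS Data
FROM (VALUES ('\ENQ\Projects\1\Testing 1\'),
             ('\ENQ\Projects\1\Testing ABC\Sub\'),
             ('\ENQ\Projects\1\NoTrailingSlash')) AS v(path);
-- Data: 'Testing 1', 'Testing ABC', 'NoTrailingSlash'
```

This still assumes every path is at least 16 characters long; shorter values would need a separate check.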

Make substring using a specific delimiter in SQL

I want to make a substring of a column value using a specific delimiter. I tried SUBSTRING_INDEX, but it doesn't work for SQL. Is there any way to achieve this?
Column values are:
ARTCSOFT-1111
ARTCSOFT-1112
ARTCSOFT-1113
and I want to achieve the same thing in SQL:
SUBSTRING_INDEX(Code,'SOFT-',1)
i.e. I want the number after SOFT- in my substring. I can't use only - because there is a chance that a - may occur before SOFT- (a rare case, but I don't want to take the chance).
Try using just SUBSTRING. For example:
SELECT
SUBSTRING(code, CHARINDEX('SOFT-', code) + 5, LEN(code)) AS [name] from dbo.yourtable
Hope this helps.
Tested Result:
SELECT RIGHT(Code , CHARINDEX ('-' ,REVERSE(Code))-1)
Read this as: Get the rightmost string after the first '-' in a reversed string - which is the same as the string after the last '-' character.
Try this query:
select substring(col,charindex('-',col)+1,len(col)-charindex('-',col)) from #Your_table
Explanation of the query:
CHARINDEX finds the position of the '-' delimiter in the given string. That position plus 1 is the starting point, and len(col) minus that position is the number of characters remaining after the delimiter. SUBSTRING then returns that part of the string.
Result of Query:
Required_col
1111
1112
1113
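The approaches above differ once a second dash appears in the value. A small sketch with inline sample values (the third row, containing an extra dash, is invented to show the rare case from the question):

```sql
SELECT v.Code,
       -- Everything after the literal 'SOFT-' (5 characters long)
       SUBSTRING(v.Code, CHARINDEX('SOFT-', v.Code) + 5, LEN(v.Code)) AS after_soft,
       -- Everything after the last '-'
       RIGHT(v.Code, CHARINDEX('-', REVERSE(v.Code)) - 1)             AS after_last_dash
FROM (VALUES ('ARTCSOFT-1111'), ('ARTCSOFT-1112'), ('ART-CSOFT-1113')) AS v(Code);
-- Both columns return 1111, 1112, 1113
```

Splitting on the first '-' instead would return CSOFT-1113 for the third row, which is why anchoring on SOFT- or on the last dash is safer here.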

Validate the first part of a split string

I need to validate in my query whether the first part of a string is equal to a defined value, for instance:
String
----------
F11-EDEDED
F1-SAFSDA
F455-ADADD
F11-ASDA-FAFA
I want to detect when the first part of the string is F11. I was looking for something like Split in VBA, but I can't find it.
I'm working with:
Case when ("splitted string") =F11 then X)
Use LEFT() and CHARINDEX() to grab the beginning of your strings.
Declare @str varchar(100)='F11-ASDA-FAFA'
Select @str,Case When left(@str,charindex('-',@str)-1)='F11' Then 1 Else 0 End
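Applied to a whole column, the same idea might look like the sketch below. The inline values are the samples from the question plus one invented value without a dash; appending '-' before searching keeps LEFT from receiving a negative length in that case:

```sql
SELECT v.val,
       CASE WHEN LEFT(v.val, CHARINDEX('-', v.val + '-') - 1) = 'F11'
            THEN 1 ELSE 0 END AS is_f11
FROM (VALUES ('F11-EDEDED'), ('F1-SAFSDA'), ('F455-ADADD'),
             ('F11-ASDA-FAFA'), ('F11')) AS v(val);
-- is_f11: 1, 0, 0, 1, 1
```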

Resources