Snowflake REGEX to identify if column contains any digit - snowflake-cloud-data-platform

I currently have select REGEXP_LIKE(col, '[0-9]+') which seems to return True only if all the characters in the string are numeric.
For example, it returns True for 12345 but False for something like 100 Apple St.
What is the necessary regex pattern to return True in both examples above?

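REGEXP_LIKE matches the pattern against the whole string (Snowflake implicitly anchors it at both ends), which is why '[0-9]+' returns True only when every character is a digit. To check whether a column contains any digit, wrap the digit class in .* so that any characters may appear before or after it: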
SELECT REGEXP_LIKE(col, '.*[0-9].*')
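The difference between matching the whole string and matching somewhere inside it can be tried outside Snowflake. Below is a minimal Python sketch; regexp_like is a hypothetical helper that uses re.fullmatch as a stand-in for Snowflake's whole-string matching. It is an analogy, not Snowflake's regex engine.

```python
import re

def regexp_like(subject: str, pattern: str) -> bool:
    """Rough stand-in for REGEXP_LIKE: the pattern must cover the whole string."""
    return re.fullmatch(pattern, subject) is not None

samples = ["12345", "100 Apple St.", "Apple St."]
for s in samples:
    # First column: original pattern; second column: pattern wrapped in .*
    print(repr(s), regexp_like(s, "[0-9]+"), regexp_like(s, ".*[0-9].*"))
```

With whole-string matching, only the wrapped pattern accepts a value such as 100 Apple St., and a value without any digit is rejected by both.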

Related

Regex string with 2+ different numbers and some optional characters in Snowflake syntax

I would like to check if a specific column in one of my tables meets the following conditions:
String must contain at least three characters
String must contain at least two different numbers [e.g. 123 would work but 111 would not]
Characters which are allowed in the string:
Numbers (0-9)
Uppercase letters
Lowercase letters
Underscores (_)
Dashes (-)
I have some experience with Regex but am having issues with Snowflake's syntax. Whenever I try using the '?' regex character (to mark something as optional) I receive an error. Can someone help me understand a workaround and provide a solution?
What I have so far:
SELECT string,
LENGTH(string) AS length
FROM tbl
WHERE REGEXP_LIKE(string,'^[0-9]+{3,}[-+]?[A-Z]?[a-z]?$')
ORDER BY length;
Thanks!
Your regex looks a little confusing and invalid, and it doesn't look like it quite meets your needs either. I read this expression as a string that:
Must start with one or more digits, at least 3 or more times
The confusing part to me is that '+' is already a quantifier, so it should not be quantifiable again with {3,}, yet this somehow doesn't produce an error for me
Optionally followed by either a dash or plus sign
Followed by an uppercase character zero or one times (giving back as needed)
Followed by and ending with a lowercase character zero or one times (giving back as needed)
Questions
You say that your string must contain 3 characters and at least 2 different numbers. Numbers are characters, but I'm not sure whether you mean 3 letters...
Are you considering the numbers to be characters?
Does the order of the characters matter?
Can you provide an example of the error you are receiving?
Notes
Checking for a second digit that is not the same as the first would normally involve a lookahead with a backreference. Snowflake supports neither lookaheads nor backreferences in patterns (backreferences work only in the replacement string of REGEXP_REPLACE).
One thing about pattern matching with regular expressions is that order makes a difference. If order is not of importance to you, then you'll have multiple patterns to match against.
Example
Below is how you can test each part of your requirements individually. I've included a few regexp_substr functions to show how extraction can work to check if something exists again.
Uncomment the WHERE clause to see the dataset filtered. The filters are written as expressions so you can remove any/all of the regexp_* columns.
select randstr(36,random(123)) as r_string
,length(r_string) AS length
,regexp_like(r_string,'^[0-9]+{3,}[-+]?[A-Z]?[a-z]?$') as reg
,regexp_like(r_string,'.*[A-Za-z]{3,}.*') as has_3_consecutive_letters
,regexp_like(r_string,'.*\\d+.*\\d+.*') as has_2_digits
,regexp_substr(r_string,'(\\d)',1,1) as first_digit
,regexp_substr(r_string,'(\\d)',1,2) as second_digit
,first_digit <> second_digit as digits_1st_not_equal_2nd
,not(regexp_instr(r_string,regexp_substr(r_string,'(\\d)',1,1),1,2)) as first_digit_does_not_appear_again
,has_3_consecutive_letters and has_2_digits and first_digit_does_not_appear_again as test
from table(generator(rowcount => 10))
//where regexp_like(r_string,'.*[A-Za-z]{3,}.*') // has_3_consecutive_letters
// and regexp_like(r_string,'.*\\d+.*\\d+.*') // has_2_digits
// and not(regexp_instr(r_string,regexp_substr(r_string,'(\\d)',1,1),1,2)) // first_digit_does_not_appear_again
;
Assuming the digits need to be contiguous, you can use a JavaScript UDF to find the number in a string with the largest number of distinct digits:
create or replace function f(S text)
returns float
language javascript
returns null on null input
as
$$
const m = S.match(/\d+/g)
if (!m) return 0
const lengths = m.map(m=> [...new Set (m.split(''))].length)
const max_length = lengths.reduce((a,b) => Math.max(a,b))
return max_length
$$
;
Combined with WHERE-clause, this does what you want, I believe:
select column1, f(column1) max_length
from t
where max_length>1 and length(column1)>2 and column1 rlike '[\\w\\d-]+';
Yielding:
COLUMN1 | MAX_LENGTH
------------------------+-----------
abc123def567ghi1111_123 | 3
123 | 3
111222 | 2
Assuming this input:
create or replace table t as
select * from values ('abc123def567ghi1111_123'), ('xyz111asdf'), ('123'), ('111222'), ('abc 111111111 abc'), ('12'), ('asdf'), ('123 456'), (null);
The function is even simpler if the digits don't have to be contiguous (i.e. count the distinct digits in a string). Then core logic changes to:
const m = S.match(/\d/g)
if (!m) return 0
const length = [...new Set (m)].length
return length
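For quick experiments outside Snowflake, both variants of the function can be ported to Python. This is a sketch under a few assumptions: the function names are made up for illustration, re.fullmatch stands in for rlike's whole-string matching, and the NULL row from the sample input is left out.

```python
import re

def max_distinct_digits_in_run(s: str) -> int:
    """Largest number of distinct digits within any contiguous run of digits."""
    runs = re.findall(r"[0-9]+", s)
    return max((len(set(run)) for run in runs), default=0)

def distinct_digits(s: str) -> int:
    """Number of distinct digits anywhere in the string."""
    return len(set(re.findall(r"[0-9]", s)))

def is_valid(s: str) -> bool:
    # Mirrors the WHERE clause: more than one distinct digit in a run,
    # more than two characters, and only word characters or dashes.
    return (
        max_distinct_digits_in_run(s) > 1
        and len(s) > 2
        and re.fullmatch(r"[\w-]+", s, flags=re.ASCII) is not None
    )

rows = ["abc123def567ghi1111_123", "xyz111asdf", "123", "111222",
        "abc 111111111 abc", "12", "asdf", "123 456"]
kept = [(s, max_distinct_digits_in_run(s)) for s in rows if is_valid(s)]
print(kept)
```

The kept rows are the same three rows as in the output table above, which gives some confidence that the filter does what the question asks.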
Hope that's helpful!

How to Convert Get Text Value to ArrayList in Robot framework

I would like to know how to convert this value to ArrayList?
${doc1}= Open Excel Document filename=${OpenExcel} doc_id=doc1
${view_bicccicmdu}= Read Excel Row row_num=1 max_num=6 sheet_name=UpperTT
${view_bicccicmduCheckLength}= Get Length ${view_bicccicmdu}
${HG}= Get Text ${ClickAV.CheckColumn}
${HGLenght}= Get Line Count ${HG}
Should Be Equal ${HGLenght} ${view_bicccicmduCheckLength}
Should Contain ${HG} ${view_bicccicmdu} ignore_case=True
Close Excel Document
But the result is
${HG} = Nodename
Transdate
BICC Support FAX Detection
Trunk Group Number
Bill Trunk Group Number
MGW Name Trunk
Group Name
Sub-Route Name
Circuit Type
Group Direction
Circuit Selection Mode
I need to convert it to be ArrayList and should count to be 11 Records, What should I do?
Your data is separated by line breaks, so you can use the String library's Split String keyword with \n as the separator to split the text into a list. From the keyword documentation:
Splits the string using separator as a delimiter string.
If a separator is not given, any whitespace string is a separator. In
that case also possible consecutive whitespace as well as leading and
trailing whitespace is ignored.
Split words are returned as a list. If the optional max_split is
given, at most max_split splits are done, and the returned list will
have maximum max_split + 1 elements
You can do the following.
*** Test Cases ***
Test
${HG} = Set Variable Nodename\n ransdate\n ICC Support FAX Detection\n Trunk Group Number\n Bill Trunk Group Number\n MGW Name Trunk\n Group Name\n Sub-Route Name\n Circuit Type\n Group Direction\n Circuit Selection Mode\n
@{words} = Split String ${HG} \n
${HGLenght}= Get length ${words}
log ${words}
Results
${HGLenght} = 11
${words} = ['Nodename', 'ransdate', 'ICC Support FAX Detection', 'Trunk Group Number', 'Bill Trunk Group Number', 'MGW Name Trunk', 'Group Name', 'Sub-Route Name', 'Circuit Type', 'Group Direction', 'Circuit Selection Mode']
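As far as I know, Split String passes the work on to Python's str.split, so the splitting behaviour can be tried in plain Python. The sketch below uses the column names from the question's output; the variable names are only for illustration. It also shows one thing to watch for: a trailing line break adds an empty last element with split, which splitlines avoids.

```python
text = (
    "Nodename\nTransdate\nBICC Support FAX Detection\nTrunk Group Number\n"
    "Bill Trunk Group Number\nMGW Name Trunk\nGroup Name\nSub-Route Name\n"
    "Circuit Type\nGroup Direction\nCircuit Selection Mode"
)

# Splitting on the line break gives one list element per line.
words = text.split("\n")
print(len(words), words[:3])

# With a trailing line break, split() yields an extra empty string at the end,
# while splitlines() does not.
with_trailing = (text + "\n").split("\n")
lines_only = (text + "\n").splitlines()
print(len(with_trailing), len(lines_only))
```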
Hope This Helps
Thank you again, @WojTek T
My final code is
`${HG}= Get Text ${ClickAV.CheckColumn}
@{words} = Split String ${HG} \n
${UPPER1}= Evaluate "${words}".upper()
${UPPER2}= Evaluate "${view_dnc}".upper()
${HGLenght}= Get Line Count ${HG}
Should Be Equal ${HGLenght} ${view_dncCheckLength}
Should Contain ${UPPER1} ${UPPER2}`
I tried to use "Get List Item" with the table name, but it doesn't work. I should use this solution for the last question I asked you as well. haha
Thank you again.

SQL Server - How to get last numeric value in the given string

I am trying to get last numeric part in the given string.
For Example, below are the given strings and the result should be last numeric part only
SB124197 --> 124197
287276ACBX92 --> 92
R009321743-16 --> 16
How to achieve this functionality. Please help.
Try this:
select right(@str, patindex('%[^0-9]%',reverse(@str)) - 1)
Explanation:
Using PATINDEX with '%[^0-9]%' as a search pattern you get the starting position of the first occurrence of a character that is not a number.
Using REVERSE you get the position of the first non numeric character starting from the back of the string.
Edit:
To handle the case of strings not containing non numeric characters you can use:
select case
when patindex('%[^0-9]%', @str) = 0 then @str
else right(@str, patindex('%[^0-9]%',reverse(@str)) - 1)
end
If your data always contains at least one non-numeric character then you can use the first query, otherwise use the second one.
Actual query:
So, if your table is something like this:
mycol
--------------
SB124197
287276ACBX92
R009321743-16
123456
then you can use the following query (works in SQL Server 2012+):
select iif(x.i = 0, mycol, right(mycol, x.i - 1))
from mytable
cross apply (select patindex('%[^0-9]%', reverse(mycol) )) as x(i)
Output:
mynum
------
124197
92
16
123456
Here is one way using Patindex
SELECT RIGHT(strg, COALESCE(NULLIF(Patindex('%[^0-9]%', Reverse(strg)), 0) - 1, Len(strg)))
FROM (VALUES ('SB124197'),
('287276ACBX92'),
('R009321743-16')) tc (strg)
After reversing the string, we are finding the position of first non numeric character and extracting the data from that position till the end..
Result :
-----
124197
92
16
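The reverse-then-search idea used in both answers is easy to check outside the database. Below is a minimal Python sketch; last_numeric_part is a hypothetical helper that mimics PATINDEX on the reversed string followed by RIGHT, and it is not a replacement for the T-SQL.

```python
def last_numeric_part(s: str) -> str:
    """Return the trailing run of digits, or the whole string if it is all digits."""
    reversed_s = s[::-1]
    # 1-based position of the first non-digit in the reversed string, 0 if none
    # (the same convention PATINDEX uses).
    pos = next(
        (i + 1 for i, ch in enumerate(reversed_s) if ch not in "0123456789"),
        0,
    )
    if pos == 0:
        return s  # no non-digit character at all
    count = pos - 1  # number of trailing digits
    return s[len(s) - count:]  # equivalent of RIGHT(s, count)

for value in ["SB124197", "287276ACBX92", "R009321743-16", "123456"]:
    print(value, "-->", last_numeric_part(value))
```

A string that ends with a non-digit gives an empty result, which matches what RIGHT with a length of 0 returns.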

String in Postgres; validation of the contents of a string

I would like a way to validate a string, it needs to have only letters and uppercase; but I could not find a way to do that.
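The ~ operator succeeds if the pattern matches anywhere in the string, so '[A-Za-z]' on its own only checks that the value contains at least one letter. To find all records where the value consists only of upper or lowercase letters from a-z, anchor the pattern and repeat the character class (use '^[A-Z]+$' if only uppercase letters are allowed). VALUES is a reserved word in Postgres, so a table with that name has to be double-quoted:
SELECT value FROM "values" WHERE value ~ '^[A-Za-z]+$';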

Regex for one column that has numbers and letters but not one or the other

I am attempting to search a column that contains alphanumeric ids in it but want to write a query that returns records with letters and numbers but not one or the other.
i.e Acceptable: jjk44kndkfndFF
i.e Not acceptable: 223232323232 or aajnfdskDFdd
So far I have:
where PATINDEX('%[^a-zA-Z0-9 ]%',columnInQuestion)
This returns all alphanumeric records. Any direction appreciated
I think you need three predicates in the WHERE clause:
WHERE (columnInQuestion NOT LIKE '%[^a-zA-Z0-9]%') AND
(PATINDEX('%[a-zA-Z]%', columnInQuestion) <> 0) AND
(PATINDEX('%[0-9]%', columnInQuestion) <> 0)
First predicate (columnInQuestion NOT LIKE '%[^a-zA-Z0-9]%') is true if columnInQuestion contains only alphanumeric characters
Second predicate (PATINDEX('%[a-zA-Z]%', columnInQuestion) <> 0) is true if there is at least one alphabetic character in columnInQuestion
Third predicate (PATINDEX('%[0-9]%', columnInQuestion) <> 0) is true if there is at least one numeric character in columnInQuestion
It can be done with just one regexp:
^[a-zA-Z0-9]*([a-zA-Z][0-9]|[0-9][a-zA-Z])[a-zA-Z0-9]*$
It starts and ends with zero or more allowed characters.
And somewhere there is a switch from a letter to a digit or from a digit to a letter.
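The two answers can be compared side by side. SQL Server's LIKE and PATINDEX only understand wildcard patterns, so the single regular expression needs an actual regex engine. The Python sketch below is one such setting; the function names are made up for illustration and the three predicates are re-expressed with re.search.

```python
import re

SINGLE = re.compile(r"^[a-zA-Z0-9]*([a-zA-Z][0-9]|[0-9][a-zA-Z])[a-zA-Z0-9]*$")

def three_predicates(s: str) -> bool:
    """Only alphanumerics, at least one letter, at least one digit."""
    only_alnum = re.search(r"[^a-zA-Z0-9]", s) is None
    has_letter = re.search(r"[a-zA-Z]", s) is not None
    has_digit = re.search(r"[0-9]", s) is not None
    return only_alnum and has_letter and has_digit

def single_regex(s: str) -> bool:
    """Same check with one pattern: a letter/digit switch must occur somewhere."""
    return SINGLE.fullmatch(s) is not None

for value in ["jjk44kndkfndFF", "223232323232", "aajnfdskDFdd", "ab 12"]:
    print(value, three_predicates(value), single_regex(value))
```

Both checks agree on the examples from the question: only the mixed value is accepted.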
