How to use a regular expression in Snowflake?

Rule:
1. Starts with R
2. Followed by one or more digits
3. Then one space
4. Followed by other characters
Test case:
Input: 'R1 ABC' 'R4 DEF' 'Randwick Acca' 'R11 PPP'
Expected output: 'R1 ABC' 'R4 DEF' 'R11 PPP'
Regular expression: "R\d{1,} "
I tried it in a regular expression tester, and it works:
https://regex-golang.appspot.com/assets/html/index.html?_sm_au_=iHVPMjQb0QjFkMTHfLJ4vK7214sJW
Test query:
WITH tbl
AS (select t.column1 mycol from values('R1 ABC'),('R4 DEF'),('Randwick Acca'),('R11 PPP') t)
SELECT *
FROM tbl
WHERE mycol regexp 'R\d{1,} ' ;
It returns NULL.
Thanks,
Bin

1) Where's the "any other characters" part? The pattern you have ends with the space, period, so it cannot cover the characters that follow.
2) Welcome to SQL: \ is an escape character inside a string literal, so it has to be escaped itself.
So:
WHERE mycol regexp 'R\\d{1,} .*';
I tested it on your query and it seemed to work.
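Both points in the answer can be checked outside the database. The following is a minimal Python sketch, assuming (as point 1 implies) that the pattern has to match the whole value; `re.fullmatch` is used here only to approximate that behaviour and is not Snowflake's own engine.

```python
import re

# Sample values from the question.
values = ["R1 ABC", "R4 DEF", "Randwick Acca", "R11 PPP"]

# The regex as the engine sees it after the SQL literal 'R\\d{1,} .*'
# has been unescaped to R\d{1,} .*
pattern = re.compile(r"R\d{1,} .*")

# fullmatch approximates a pattern anchored at both ends (assumption).
matches = [v for v in values if pattern.fullmatch(v)]

# Without the trailing .*, a whole-string match fails for every value.
no_tail = [v for v in values if re.fullmatch(r"R\d{1,} ", v)]

print(matches)  # ['R1 ABC', 'R4 DEF', 'R11 PPP']
print(no_tail)  # []
```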

Related

Leetcode SQL 1440. Evaluate Boolean Expression

Table Variables:
Column Name   Type
------------  -------
name          varchar
value         int
name is the primary key for this table.
This table contains the stored variables and their values.
Table Expressions:
Column Name    Type
-------------  -------
left_operand   varchar
operator       enum
right_operand  varchar
(left_operand, operator, right_operand) is the primary key for this table.
This table contains a boolean expression that should be evaluated.
operator is an enum that takes one of the values ('<', '>', '=').
The values of left_operand and right_operand are guaranteed to be in the Variables table.
Write an SQL query to evaluate the boolean expressions in Expressions table.
Return the result table in any order.
I am working on the SQL problem shown above. I used MS SQL Server and tried:
SELECT
    left_operand, operator, right_operand,
    IIF(
        (left_values > right_values AND operator = '>') OR
        (left_values < right_values AND operator = '<') OR
        (left_values = right_values AND operator = '='), 'true', 'false') as 'value'
FROM
    (SELECT *,
        IIF(left_operand = 'x', (SELECT value FROM Variables WHERE name='x'),
            (SELECT value FROM Variables WHERE name='y')) as left_values,
        IIF(right_operand = 'x', (SELECT value FROM Variables WHERE name='x'),
            (SELECT value FROM Variables WHERE name='y')) as right_values
    FROM Expressions) temp;
It works on the test set but gives a wrong answer when I submit it.
I think my logic is correct. Could anyone take a look and let me know what the problem is?
Thank you!
Your example code is more complicated than it needs to be. In your FROM clause you use sub-selects where a simple inner join would do. More importantly, the code only handles variables named x and y, so it breaks as soon as other variable names appear, which is probably why it fails the check. Here's the code I wrote in Postgres. Note that it returns the comparison as a boolean value; SQL Server does not allow that in a SELECT list, so there you would wrap each comparison in a CASE or IIF that yields 'true' or 'false'.
SELECT e.left_operand, l.value AS left_val, e.operator, e.right_operand, r.value AS right_val,
       CASE e.operator
           WHEN '<' THEN (l.value < r.value)
           WHEN '=' THEN (l.value = r.value)
           WHEN '>' THEN (l.value > r.value)
       END AS eval
FROM expression AS e
-- join the variables table twice: once per operand
JOIN variable AS l ON e.left_operand = l.name
JOIN variable AS r ON e.right_operand = r.name
I also have a db-fiddle link for you to check out.
https://www.db-fiddle.com/f/fdnJVSUQHS9Vep4uDSe5ZP/0
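The join-based approach amounts to looking up both operands by name and then applying the operator. Here is a small Python sketch of that logic; the sample rows and the names `variables`, `expressions`, `OPS`, and `evaluate` are made up for illustration and are not taken from the problem's test data.

```python
import operator

# Hypothetical sample data, not taken from the question.
variables = {"x": 66, "y": 77}
expressions = [("x", ">", "y"), ("x", "<", "y"), ("x", "=", "x")]

# Map each operator symbol to the comparison it stands for.
OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}


def evaluate(rows, values):
    """Return (left, op, right, 'true'/'false') for every expression row."""
    result = []
    for left, op, right in rows:
        # The dictionary lookups play the role of the two joins.
        ok = OPS[op](values[left], values[right])
        result.append((left, op, right, "true" if ok else "false"))
    return result


results = evaluate(expressions, variables)
for row in results:
    print(row)
```

Because the operands are looked up by name, the sketch works for any variable names, not only x and y.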

SQL Server - How to get last numeric value in the given string

I am trying to get the last numeric part of a given string.
For example, given the strings below, the result should be the last numeric part only:
SB124197 --> 124197
287276ACBX92 --> 92
R009321743-16 --> 16
How can I achieve this? Please help.
Try this:
select right(@str, patindex('%[^0-9]%', reverse(@str)) - 1)
Explanation:
Using PATINDEX with '%[^0-9]%' as the search pattern gives you the starting position of the first character that is not a digit.
Applying it to REVERSE(@str) gives you the position of the first non-numeric character counted from the end of the string, so the trailing number is everything to the right of that position.
Edit:
To handle strings that contain no non-numeric characters, you can use:
select case
when patindex('%[^0-9]%', @str) = 0 then @str
else right(@str, patindex('%[^0-9]%', reverse(@str)) - 1)
end
If your data always contains at least one non-numeric character then you can use the first query, otherwise use the second one.
Actual query:
So, if your table is something like this:
mycol
--------------
SB124197
287276ACBX92
R009321743-16
123456
then you can use the following query (works in SQL Server 2012+):
select iif(x.i = 0, mycol, right(mycol, x.i - 1)) as mynum
from mytable
cross apply (select patindex('%[^0-9]%', reverse(mycol) )) as x(i)
Output:
mynum
------
124197
92
16
123456
Here is one way using Patindex
SELECT RIGHT(strg, COALESCE(NULLIF(Patindex('%[^0-9]%', Reverse(strg)), 0) - 1, Len(strg)))
FROM (VALUES ('SB124197'),
('287276ACBX92'),
('R009321743-16')) tc (strg)
After reversing the string, we find the position of the first non-numeric character and take everything to the right of it in the original string. If there is no such character, NULLIF and COALESCE fall back to the full length of the string.
Result :
-----
124197
92
16
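The reverse-then-search idea used in the answers above can be traced step by step in Python. This is only a sketch of the same logic, not SQL Server code; the function name `last_numeric_part` is made up for illustration.

```python
DIGITS = "0123456789"


def last_numeric_part(s: str) -> str:
    """Return the trailing run of digits in s (all of s if it is all digits)."""
    reversed_s = s[::-1]
    # 1-based position of the first non-digit in the reversed string,
    # or 0 when there is none (the convention PATINDEX uses).
    pos = next(
        (i + 1 for i, ch in enumerate(reversed_s) if ch not in DIGITS), 0
    )
    if pos == 0:
        return s
    # Equivalent of RIGHT(s, pos - 1): keep the last pos - 1 characters.
    return s[len(s) - (pos - 1):]


samples = ["SB124197", "287276ACBX92", "R009321743-16", "123456"]
extracted = [last_numeric_part(s) for s in samples]
print(extracted)  # ['124197', '92', '16', '123456']
```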

sql like statement picking up unexpected results

I have a simple table like the following
id, target
-----------
1, test_1
2, test_2
3, test_3
4, testable
I have a simple query like so:
select * from my_table where target like 'test_%'
What I'm expecting are the first 3 records but I'm getting all 4 records
Underscore is a pattern matching character. Try this:
select * from my_table where target like 'test[_]%'
_ is also a wildcard. You can escape it like:
... like 'test\_%' escape '\'
The underscore character _ as you've used it is a wildcard for a single character, hence it returns 4 rows. Try using [_] instead of _.
To illustrate:
CREATE TABLE #tmp (val varchar(10))
INSERT INTO #tmp (val)
VALUES ('test_1'), ('test_2'), ('test_3'), ('testing')
-- This returns all four
SELECT * FROM #tmp WHERE val LIKE 'test_%'
-- This returns the three test_ rows
SELECT * FROM #tmp WHERE val LIKE 'test[_]%'
The underscore is a wildcard character that says "match any single character", just like the % is a wildcard that says "match any 0 or more characters". If you're familiar with regular expressions, the underscore is equivalent to the dot there. You'll need to escape the underscore properly to match that character literally.
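The wildcard behaviour and the ESCAPE fix can be reproduced with Python's built-in sqlite3 module. This is a sketch under one assumption: SQLite stands in for the asker's database. SQLite supports the `_` wildcard and the ESCAPE clause, but not the `[_]` bracket form, which is specific to SQL Server.

```python
import sqlite3

# In-memory table with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, target TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "test_1"), (2, "test_2"), (3, "test_3"), (4, "testable")],
)

# Unescaped: _ matches any single character, so 'testable' matches too.
unescaped = [row[0] for row in conn.execute(
    "SELECT target FROM my_table WHERE target LIKE 'test_%' ORDER BY id"
)]

# Escaped: \_ now means a literal underscore.
escaped = [row[0] for row in conn.execute(
    "SELECT target FROM my_table "
    "WHERE target LIKE 'test\\_%' ESCAPE '\\' ORDER BY id"
)]

print(unescaped)  # ['test_1', 'test_2', 'test_3', 'testable']
print(escaped)    # ['test_1', 'test_2', 'test_3']
conn.close()
```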

Sql Server's regex LIKE - behaviour clarification?

Someone asked how to select only the values that are numbers.
So, if the table is:
DECLARE @Table TABLE(
Col nVARCHAR(50)
)
INSERT INTO @Table SELECT 'ABC'
INSERT INTO @Table SELECT '234.62'
INSERT INTO @Table SELECT '10:10:10:10'
INSERT INTO @Table SELECT 'France'
INSERT INTO @Table SELECT '2'
then the desired results are:
234.62
2
But when I tested this query:
SELECT * FROM @Table WHERE Col LIKE '%[0-9.]%' --expected to see only 234.62
it showed:
234.62
10:10:10:10
2
Question #1
How come 10:10:10:10 and 2 satisfy the condition?
Question #2
I saw an answer which does work:
SELECT * FROM @Table WHERE Col NOT LIKE '%[^0-9.]%'
But I don't understand why this works. AFAIU, it selects all values which are not like (not (has number) and not (has dot)), which by De Morgan becomes not like (has number or has dot).
Can someone please shed some light?
N.B. I already know that ISNUMERIC can also be used, but it's unsafe (+). Also, the valid wildcards are %, _, [], [^].
Any particular use of [set] within a LIKE expression is a check against one character in the target string.
So, LIKE '%[0-9.]%' says: % matches 0-to-many arbitrary characters, then [0-9.] matches one character in the set 0-9., and then % matches 0-to-many arbitrary characters. Paraphrased, it says "match any string that contains at least one character in the set 0-9.". So, 10:10:10:10 can be matched as 0 arbitrary characters, then 1 matches [0-9.], and then 0:10:10:10 matches the final %.
LIKE '%[^0-9.]%' says: % matches 0-to-many arbitrary characters, then [^0-9.] matches one character not in the set 0-9., and then % matches 0-to-many arbitrary characters. Paraphrased, it says "match any string that contains at least one character outside of the set 0-9.". So when we apply the NOT to the front of that, we are saying "match any string that doesn't contain at least one character outside of the set 0-9.", or "match strings that only contain characters in the set 0-9.".
Essentially, the double-negative is a way to make an assertion about all characters in the string.
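The difference between the two patterns is the difference between "any" and "all". The Python sketch below models the two LIKE conditions on the question's data; the helper names `contains_allowed` and `only_allowed` are made up for illustration.

```python
# The character set 0-9. from the LIKE patterns.
ALLOWED = set("0123456789.")

values = ["ABC", "234.62", "10:10:10:10", "France", "2"]


def contains_allowed(s: str) -> bool:
    """Models LIKE '%[0-9.]%': at least one character is in the set."""
    return any(ch in ALLOWED for ch in s)


def only_allowed(s: str) -> bool:
    """Models NOT LIKE '%[^0-9.]%': no character is outside the set."""
    return not any(ch not in ALLOWED for ch in s)


like_rows = [v for v in values if contains_allowed(v)]
not_like_rows = [v for v in values if only_allowed(v)]

print(like_rows)      # ['234.62', '10:10:10:10', '2']
print(not_like_rows)  # ['234.62', '2']
```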

How can i find the pattern identified by PATINDEX()

Which pattern is matched by PATINDEX in the statement below? Could anyone help me analyse it?
How can we find out which of ('I','II','III') was matched?
select PATINDEX ('%[I,II,III]%','sjfhasjdg II')
Please help me find it.
This is not how PATINDEX works: , is not an alternation operator.
You are telling it to find any character in the set I,II,III. The set just repeats the same characters, so it simplifies to "find the first location of either I or ,".
You could try:
WITH SearchTerms(Term)
AS (SELECT 'I'
UNION ALL
SELECT 'II'
UNION ALL
SELECT 'III'),
ToBeSearched(string)
AS (SELECT 'sjfhasjdg II')
SELECT string,
Term,
Charindex(Term, string) AS Location
FROM ToBeSearched
JOIN SearchTerms
ON Charindex(Term, string) > 0
Returns
string Term Location
------------ ---- -----------
sjfhasjdg II I 11
sjfhasjdg II II 11
Of course both I and II match, since anything that matches the second will always match the first.
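Both behaviours can be modelled in a few lines of Python. This is a sketch, not SQL Server code: `patindex_set` and `charindex` are hypothetical helpers that only mimic the 1-based results described above, and the comparison here is case-sensitive, which may differ from the collation on your server.

```python
def patindex_set(chars, text):
    """1-based index of the first character of text that is in chars, else 0."""
    for i, ch in enumerate(text, start=1):
        if ch in chars:
            return i
    return 0


def charindex(term, text):
    """1-based index of the first occurrence of term in text, else 0."""
    return text.find(term) + 1


text = "sjfhasjdg II"

# The bracket expression [I,II,III] is just the character set {'I', ','}.
first_hit = patindex_set(set("I,II,III"), text)

# Searching for each term separately shows which ones occur, and where.
locations = {term: charindex(term, text) for term in ("I", "II", "III")}
found = [term for term, loc in locations.items() if loc > 0]

print(first_hit)  # 11
print(locations)  # {'I': 11, 'II': 11, 'III': 0}
print(found)      # ['I', 'II']
```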

Resources