Regular expression in Snowflake

I have a requirement where a column holds a string value such as "/Date(-34905600000)/". The value inside the parentheses can follow any one of these patterns:
"/Date(-34905600000)/"
"/Date(1407283200000)/"
"/Date(1636654411000+0000)/"
For examples 1 and 2, I need to extract everything inside the parentheses, including the "-" if present. For the 3rd example, I need only the digits before the "+", i.e. 1636654411000.
I tried the following, but the output still includes the parentheses (or, for the last attempt, drops the sign):
select REGEXP_substr("/Date(-34905600000)/", '\\([[:alnum:]\-]+\\)')
from table A;
select REGEXP_substr("/Date(-34905600000)/", '\\((.*?)\\)') from table A;
select REGEXP_substr("/Date(-34905600000)/", '[0-9]+') from table A;

Using regexp_replace() instead you could do:
regexp_replace(colA, '(\\/Date\\()([-0-9]*)(.*)', '\\2')
That splits the string into three substitution groups and then only keeps the second. I often end up doing regexp_replace() with substitution groups like this when regexp_substr() fails me.
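The substitution-group idea can be tried outside the database. The following is a minimal sketch that uses Python's `re` module as a stand-in for Snowflake's regex engine (an assumption; the two engines are not identical), with a hypothetical helper name `extract_with_replace`:

```python
import re

# Same three groups as the regexp_replace() call above:
#   1: the literal "/Date("   2: sign and digits   3: everything after them
# Python's re is only a stand-in for Snowflake's engine (assumption).
PATTERN = re.compile(r"(/Date\()([-0-9]*)(.*)")


def extract_with_replace(value: str) -> str:
    """Replace the whole match with the second group only."""
    return PATTERN.sub(r"\2", value)


if __name__ == "__main__":
    for sample in (
        "/Date(-34905600000)/",
        "/Date(1407283200000)/",
        "/Date(1636654411000+0000)/",
    ):
        print(sample, "->", extract_with_replace(sample))
```

Because the replacement only touches the matched part, any text before "/Date(" would be left in place, which is worth keeping in mind if the column values carry a prefix.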

If you want REGEXP_SUBSTR to return a sub-match, you need to use the 'e' (extract) <regex_parameters> option; you can then use 1 as the <group_num> to select your first capture group, thus:
SELECT column1,
REGEXP_substr(column1, 'Date\\(([-+]?[0-9]+)',1,1,'e')
FROM VALUES
('"/Date(-34905600000)/"'),
('"/Date(1407283200000)/"'),
('"/Date(1636654411000+0000)/"');
gives:
COLUMN1 | REGEXP_SUBSTR(COLUMN1, 'DATE\(([-+]?[0-9]+)',1,1,'E')
"/Date(-34905600000)/" | -34905600000
"/Date(1407283200000)/" | 1407283200000
"/Date(1636654411000+0000)/" | 1636654411000
I am fairly sure the regex is greedy by default; if it is not, you can force the match to run up to the timezone sign or the closing parenthesis with:
'Date\\(([-+]?[0-9]+)[-+\\)]'
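To see the capture-group extraction in isolation, here is a small sketch in Python. It assumes Python's `re` behaves like Snowflake's engine for this simple pattern; `extract_millis` is a hypothetical helper name, not a Snowflake function:

```python
import re

# Same pattern as in the REGEXP_SUBSTR call: optional sign, then digits,
# captured as group 1. Returning group 1 mirrors the 'e' option.
DATE_RE = re.compile(r"Date\(([-+]?[0-9]+)")


def extract_millis(value: str):
    """Return the signed digit run after 'Date(' or None if absent."""
    match = DATE_RE.search(value)
    return match.group(1) if match else None


if __name__ == "__main__":
    for sample in (
        '"/Date(-34905600000)/"',
        '"/Date(1407283200000)/"',
        '"/Date(1636654411000+0000)/"',
    ):
        print(sample, "->", extract_millis(sample))
```

The `+` quantifier is greedy, so the digit run is consumed up to the first non-digit, which is why the "+0000" suffix is left out of the captured group.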

Related

Extract string after first '/' using snowflake query

I have an input table in Snowflake with a column whose data follows these patterns:
city, state/LOCATION/designation
city state/LOCATION/designation
city, state/LOCATION
I want to extract only the location and store it in another column. Can you help me do this?
You could use SPLIT_PART, as mentioned in a previous answer, but if you wanted to use regular expressions I would use REGEXP_SUBSTR, like this:
REGEXP_SUBSTR(YOUR_FIELD_HERE,'/([^/]+)',1,1,'e')
To break it down, briefly, it's looking for a slash and then takes all the non-slash characters that follow it, meaning it ends just before the next slash, or at the end of the string.
The 1,1,'e' correspond to: starting at the first character of the string, returning the 1st match, and extracting the substring (everything in the parentheses).
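As a quick check of that breakdown, the same pattern can be exercised in Python. This is only a sketch: `re` stands in for Snowflake's regex engine, and `location_of` is a hypothetical helper name:

```python
import re

# A slash, then one or more non-slash characters captured as group 1.
FIRST_SEGMENT_RE = re.compile(r"/([^/]+)")


def location_of(value: str):
    """Return the text between the first slash and the next slash (or end)."""
    match = FIRST_SEGMENT_RE.search(value)
    return match.group(1) if match else None


if __name__ == "__main__":
    for sample in (
        "city, state/LOCATION/designation",
        "city state/LOCATION/designation",
        "city, state/LOCATION",
    ):
        print(sample, "->", location_of(sample))
```

Since the capture stops at the next slash or at the end of the string, the two-part and three-part inputs are handled by the same pattern.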
Snowflake documentation is here.
There are several ways to do this:
A) using the SPLIT_PART function:
SELECT SPLIT_PART('city, state/LOCATION/designation', '/', 2);
Reference: SPLIT_PART
B) using the SPLIT_TO_TABLE tabular function:
SELECT t.VALUE
FROM TABLE(SPLIT_TO_TABLE('city, state/LOCATION/designation', '/')) AS t
WHERE t.INDEX = 2;
Reference: SPLIT_TO_TABLE
C) using REGEXP expressions:
SELECT REGEXP_REPLACE('city, state/LOCATION/designation', '(.*)/(.*)/(.*)', '\\2');
but this one doesn't work if you don't have a third term ('designation'); you need to combine two calls and choose between them by the number of slashes:
SELECT IFF(REGEXP_COUNT('city, state/LOCATION', '/') = 1,
REGEXP_REPLACE('city, state/LOCATION','(.*)/(.*)','\\2'),
REGEXP_REPLACE('city, state/LOCATION','(.*)/(.*)/(.*)','\\2'));
Reference: REGEXP_REPLACE
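The difference between the split approach and the three-group regex can be sketched in Python. The function names are hypothetical, `re` is a stand-in for Snowflake's engine, and returning an empty string for a missing part is a choice made for this sketch:

```python
import re


def second_part(value: str, sep: str = "/") -> str:
    """Split on the separator and return the 2nd piece ('' if missing)."""
    parts = value.split(sep)
    return parts[1] if len(parts) > 1 else ""


def location_via_regex(value: str) -> str:
    """Pick the two-group or three-group pattern by counting slashes."""
    if value.count("/") == 1:
        return re.sub(r"(.*)/(.*)", r"\2", value)
    return re.sub(r"(.*)/(.*)/(.*)", r"\2", value)


if __name__ == "__main__":
    two = "city, state/LOCATION"
    three = "city, state/LOCATION/designation"
    # The three-group pattern needs two slashes, so with only one slash
    # nothing matches and the input comes back unchanged.
    print(re.sub(r"(.*)/(.*)/(.*)", r"\2", two))
    print(second_part(two), second_part(three))
    print(location_via_regex(two), location_via_regex(three))
```

Splitting avoids the branching entirely, which is one reason it may be the simpler option when the separator is a fixed character.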

SQL Pattern matching not giving the correct output

I am trying to find a certain set of characters in a column of a table. I have tried the pattern that seems most logical to me (right below), but it doesn't seem to do the job. What I want to match is something like '["5"]', that is: an opening square bracket, a quotation mark, any integer, a quotation mark, a closing square bracket. The output I am getting is empty, and I can't understand why. Besides this, I would like to update the records that do not follow this pattern so that they do. Does anyone have a solution for this?
To give you some context, here is the test table:
I want to get only the last three records.
Here is what I have tried:
SELECT ToJsonTestValue
FROM Test
WHERE ToJsonTestValue LIKE '["%"]'
and
UPDATE dbo.Test
SET ToJsonTestValue = '["'+ToJsonTestValue+'"]'
WHERE ToJsonTestValue LIKE '#';
You have a couple of problems here. Firstly, you have square brackets, which need escaping. Secondly, you use %, which is a multi-character wildcard, whereas it appears that you want a single character. It also appears that the character can only be a digit, so you might want to be more specific. Either of these should give you the result you want:
--Using single character wildcard:
SELECT *
FROM (VALUES('["1"]'),('["["1"]"]'))V(S)
WHERE V.S LIKE '[[]"_"[\]]' ESCAPE '\';
--Specifically requiring int:
SELECT *
FROM (VALUES('["1"]'),('["["1"]"]'))V(S)
WHERE V.S LIKE '[[]"[0-9]"[\]]' ESCAPE '\';
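For readers more used to regular expressions, the second LIKE pattern corresponds roughly to the regex below. This is a sketch in Python, not T-SQL, and it assumes the pattern is meant to match an opening bracket, a quote, exactly one digit, a quote and a closing bracket; `is_quoted_digit` is a hypothetical name:

```python
import re

# Rough regex counterpart of the digit-only LIKE pattern above.
# LIKE always matches the whole value, hence fullmatch().
QUOTED_DIGIT_RE = re.compile(r'\["[0-9]"\]')


def is_quoted_digit(value: str) -> bool:
    """True for values shaped like ["5"] with a single digit inside."""
    return QUOTED_DIGIT_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ('["1"]', '["["1"]"]', '["12"]', "5"):
        print(sample, "->", is_quoted_digit(sample))
```

Note that a single `[0-9]` class covers one digit only, so a value such as '["12"]' would need a longer pattern.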

Use String parameter for RegEx in query

In my query (the database is a sql server) I use a RegEx for a select command like this:
SELECT * FROM test WHERE id LIKE '1[2,3]'
(This query is tested and returns the data I want)
I want to use a parameter for this RegEx. For that I defined the parameter $P{id} in iReport as a String with the value "1[2,3]".
In my query I now use this parameter like this:
SELECT * FROM test WHERE id LIKE $P{id}
As a result I get a blank page. I think the problem is that the value of the parameter is defined with " ". But with ' ' I get a compiler error saying that the parameter isn't a string.
I hope someone can help me.
LIKE applies to text values, not to numeric values. Since id is numeric use something like this:
SELECT * FROM test WHERE id IN (12, 13)
with the parameter
SELECT * FROM test WHERE id IN ($P!{id_list})
and supply a comma separated list of ids for the parameter. The bang (!) makes sure that the parameter will be inserted as-is, without string delimiters.
Btw: LIKE (Transact-SQL) uses wildcards, not regex.
You can still use LIKE since there exists an implicit conversion from numeric types to text in T-SQL, but this will result in a (table or index) scan, where as the IN clause can take advantage of indexes.
The accepted answer works, but it relies on string replacement. Read up on SQL injection to understand why this is not good practice.
The correct way to execute this IN query in JasperReports (using a prepared statement) is:
SELECT * FROM test WHERE $X{IN, id, id_list}
For more information on the use of NOTIN, BETWEEN, etc., see the JasperReports sample reference for queries.
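The general idea behind a prepared-statement IN clause is one bound placeholder per value instead of pasting the list into the SQL text. The sketch below illustrates that idea with Python's `sqlite3` module; it is not JasperReports code, and the table contents and the `fetch_by_ids` helper are made up for the example:

```python
import sqlite3


def fetch_by_ids(conn, ids):
    """Run an IN query with one bound '?' per id (no string pasting of values)."""
    ids = list(ids)
    if not ids:
        return []
    # Only the placeholders are generated; the values travel as parameters.
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id FROM test WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, ids).fetchall()


# Hypothetical in-memory table standing in for the real 'test' table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO test (id) VALUES (?)", [(i,) for i in range(10, 16)])

if __name__ == "__main__":
    print(fetch_by_ids(conn, [12, 13]))
```

Because the values never become part of the SQL string, a malicious value cannot change the structure of the query.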

Find rows where text array contains value similar to input

I'm trying to get rows where a column of type text[] contains a value similar to some user input.
What I've thought and done so far is to use the 'ANY' and 'LIKE' operator like this:
select * from someTable where '%someInput%' LIKE ANY(someColum);
But it doesn't work. The query returns the same rows as this query:
select * from someTable where 'someInput' = ANY(someColum);
I've got a good result using the unnest() function in a subquery, but I need to do this in the WHERE clause if possible.
Why doesn't the LIKE operator work with the ANY operator, and why don't I get any errors? I thought one reason might be that the ANY operator is on the right-hand side of the expression, but ...
Is there a solution that does not use unnest(), and is it possible in the WHERE clause?
It's also important to understand that ANY is not an operator but an SQL construct that can only be used to the right of an operator. More:
How to use ANY instead of IN in a WHERE clause with Rails?
The LIKE operator (or more precisely: expression, which Postgres rewrites internally to the ~~ operator) expects the value on the left and the pattern on the right. There is no COMMUTATOR for this operator (like there is for the simple equality operator =), so Postgres cannot flip the operands around.
Your attempt:
select * from someTable where '%someInput%' LIKE ANY(someColum);
has flipped left and right operand so '%someInput%' is the value and elements of the array column someColum are taken to be patterns (which is not what you want).
It would have to be ANY(someColum) LIKE '%someInput%' - except that's not possible with the ANY construct which is only allowed to the right of an operator. You are hitting a road block here.
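The effect of the flipped operands can be reproduced with a tiny LIKE emulation. This is a Python sketch, not Postgres: the hypothetical `like()` helper only supports % and _ (no escape character), and the sample array is made up:

```python
import re


def like(value: str, pattern: str) -> bool:
    """Minimal LIKE emulation: % = any run of characters, _ = one character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


# Hypothetical array column value for one row.
tags = ["contains someInput inside", "other"]

# As written in the question: the literal is the value and each array
# element is used as the pattern.
flipped = any(like("%someInput%", element) for element in tags)

# What was intended: each element is the value, the input is the pattern.
intended = any(like(element, "%someInput%") for element in tags)

if __name__ == "__main__":
    print("flipped:", flipped)
    print("intended:", intended)
```

With the operands flipped, the % signs on the left are ordinary characters of the value, so an element only matches if it happens to work as a pattern for that exact string.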
Related:
Is there a way to usefully index a text column containing regex patterns?
Can PostgreSQL index array columns?
You can normalize your relational design and save elements of the array in separate rows in a separate table. Barring that, unnest() is the solution, as you already found yourself. But while you are only interested in the existence of at least one matching element, an EXISTS subquery will be most efficient while avoiding duplicates in the result - Postgres can stop the search as soon as the first match is found:
SELECT *
FROM tbl
WHERE EXISTS (
SELECT -- can be empty
FROM unnest(someColum) elem
WHERE elem LIKE '%someInput%'
);
You may want to escape special characters in someInput. See:
Escape function for regular expression or LIKE patterns
Careful with the negation (NOT LIKE ALL (...)) when NULL can be involved:
Check if NULL exists in Postgres array
An admittedly imperfect possibility might be to use ARRAY_TO_STRING, then use LIKE against the result. For example:
SELECT *
FROM someTable
WHERE ARRAY_TO_STRING(someColum, '||') LIKE '%someInput%';
This approach is potentially problematic, though, because someone could search over two array elements if they discover the joining character sequence. For example, an array of {'Hi','Mom'}, connected with || would return a result if the user had entered i||M in place of someInput. Instead, the expectation would probably be that there would be no result in that case since neither Hi nor Mom individually contain the i||M sequence of characters.
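The false positive described above is easy to reproduce. A minimal Python sketch, using the same {'Hi','Mom'} array and '||' separator from the example (variable names are only illustrative):

```python
tags = ["Hi", "Mom"]
needle = "i||M"

# Joined-string search: the separator becomes searchable text,
# so a needle spanning two elements is found.
joined_hit = needle in "||".join(tags)

# Element-wise search: no single element contains the needle.
element_hit = any(needle in tag for tag in tags)

if __name__ == "__main__":
    print("joined:", joined_hit)
    print("element-wise:", element_hit)
```

Choosing a separator that cannot occur in user input reduces the risk, but the element-wise check avoids it altogether.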
My question was marked duplicate and linked to a question out of context by a careless mod. This question comes closest to what I asked so I leave my answer here. (I think it may help people for who unnest() would be a solution)
In my case a combination of DISTINCT and unnest() was the solution:
SELECT DISTINCT ON (id_) *
FROM (
SELECT unnest(tags) tag, *
FROM someTable
) x
WHERE (tag like '%someInput%');
unnest(tags) expands the text array to a list of rows and DISTINCT ON (id_) removes the duplicates that result from the expansion, based on a unique id_ column.
Update
Another way to do this without DISTINCT within the WHERE clause would be:
SELECT *
FROM someTable
WHERE (
0 < (
SELECT COUNT(*)
FROM unnest(tags) AS tag
WHERE tag LIKE '%someInput%'
)
);
Please check this out.
This answer was exactly what I was looking for. It also provides for some useful tips (and examples) in case you need more flexibility.
It basically explains ANY() and the @> and && operators.
"If you want to search multiple values, you can use the @> operator."
"@> means the array contains all the values in the other array. If you want to check whether the current array contains any of the values in another array, you can use &&."
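The two behaviours quoted above, "contains all" versus "contains any", map onto ordinary set operations. The following Python sketch only illustrates those semantics; it ignores NULL handling, and the function names are hypothetical:

```python
def contains_all(column_values, wanted):
    """'Contains all': every wanted value appears in the array."""
    return set(wanted) <= set(column_values)


def overlaps(column_values, wanted):
    """'Contains any': at least one wanted value appears in the array."""
    return bool(set(column_values) & set(wanted))


if __name__ == "__main__":
    tags = ["red", "green", "blue"]
    print(contains_all(tags, ["red", "blue"]))
    print(contains_all(tags, ["red", "pink"]))
    print(overlaps(tags, ["red", "pink"]))
    print(overlaps(tags, ["pink", "black"]))
```

Both checks compare whole elements for equality, so neither of them performs the substring (LIKE-style) matching the original question asked about.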

How to compose LIKE in T-SQL to show all rows except those containing ":","-","~"?

I have an engine in SQL Server with a field that holds a filter clause. I need that clause composed so that it shows all rows except those that contain :, -, or ~.
My query is:
SELECT 1
WHERE '' LIKE '%[^:-~]%'
It is not working: it returns zero rows. I also tried this:
SELECT 1
WHERE 'aa:a' LIKE '%[^:-~]%'
And it returns 1, which is not the desired result.
Is there a way to manage this?
REMARK: the expression after LIKE must be a string that will be saved in the table field (for example: '%[^:-~]%' will be used as LIKE x.fldFilter).
EDIT: I need to do validation inside my engine in SQL Server. I have a table with Parameters, and it has a column Format. For a specific parameter I need to check whether the value provided fits the Format column.
For example:
DECLARE @value AS VARCHAR(1000) = 'aaa:aa';
SELECT 1 FROM dbo.ParameterDefinitions X WHERE @value LIKE X.[Format];
Where X.[Format] column contains '%[^:-~]%'.
When I test a value, the check must return 1 if it fits the condition and nothing if it does not.
So if I test the value 'aaa:aa' or even ' ', it works. But when I have an empty string (''), the condition does not work.
Unfortunately I cannot change my engine and cannot replace '' with a space, for example. I just wonder why '' does not fit the condition.
This is due to SQL Server not having a solid regex implementation.
Instead of negating your search with ^, negate it with NOT:
SELECT 1
WHERE '' NOT LIKE '%[:-~]%'
Returns 1 row
SELECT 1
WHERE 'aa:a' NOT LIKE '%[:-~]%'
Returns 0 rows
EDIT:
Breaking down your search cases
'' LIKE '%[^:-~]%'
[^:-~] requires a single character, so an empty string must fail.
'aa:a' LIKE '%[^:-~]%'
% matches zero or more characters, which lets [^:-~] take its pick of an 'a' while one of the % wildcards absorbs your forbidden character.
With a full regex engine we could repeat your negated range with the following [^:-~]* but SQL Server doesn't support that. Docs
The only option left to us is to perform a search for the forbidden characters '%[:-~]%' and to negate the like.
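The two readings of the filter can be spelled out in plain code. This Python sketch mirrors the logic of the breakdown above rather than SQL Server's LIKE itself, and it treats :, - and ~ as three literal characters, which is an assumption about the intent; the function names are hypothetical:

```python
# The three forbidden characters, taken literally (assumption).
FORBIDDEN = set(":-~")


def has_allowed_char(value: str) -> bool:
    """Logic of LIKE '%[^...]%': at least one character outside the set."""
    return any(ch not in FORBIDDEN for ch in value)


def has_no_forbidden_char(value: str) -> bool:
    """Logic of NOT LIKE '%[...]%': no character from the set at all."""
    return not any(ch in FORBIDDEN for ch in value)


if __name__ == "__main__":
    for sample in ("", "aa:a", "aaa", ":-~"):
        print(repr(sample), has_allowed_char(sample), has_no_forbidden_char(sample))
```

The empty string has no characters, so the first check cannot succeed for it, while the second check succeeds trivially; that is the asymmetry the question ran into.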

Resources