Extract string after first '/' using snowflake query - snowflake-cloud-data-platform

I have an input table in Snowflake with a column whose values follow these patterns:
city, state/LOCATION/designation
city state/LOCATION/designation
city, state/LOCATION
I want to extract only the LOCATION part and store it in another column. Can you help me do this?

You could use SPLIT_PART, as mentioned in a previous answer, but if you wanted to use regular expressions I would use REGEXP_SUBSTR, like this:
REGEXP_SUBSTR(YOUR_FIELD_HERE,'/([^/]+)',1,1,'e')
To break it down, briefly, it's looking for a slash and then takes all the non-slash characters that follow it, meaning it ends just before the next slash, or at the end of the string.
The 1,1,'e' correspond to: starting at the first character of the string, returning the 1st match, and extracting the substring (everything in the parentheses).
Snowflake documentation is here.
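To sanity-check the pattern outside Snowflake, here is a minimal Python sketch of the same idea. It assumes Python's re module treats this simple pattern the same way Snowflake does; first_segment_after_slash is a hypothetical helper name, not a Snowflake function.

```python
import re

def first_segment_after_slash(value):
    """Return the text between the first '/' and the next '/' (or end of string)."""
    match = re.search(r"/([^/]+)", value)
    # Like REGEXP_SUBSTR, report "no match" explicitly instead of raising.
    return match.group(1) if match else None

samples = [
    "city, state/LOCATION/designation",
    "city state/LOCATION/designation",
    "city, state/LOCATION",
]
results = [first_segment_after_slash(s) for s in samples]
print(results)
```

The capture group is what the 'e' option returns in the SQL version; without it, the match would include the leading slash.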

There are several ways to do this:
A) using the SPLIT_PART function:
SELECT SPLIT_PART('city, state/LOCATION/designation', '/', 2);
Reference: SPLIT_PART
B) using the SPLIT_TO_TABLE tabular function:
SELECT t.VALUE
FROM TABLE(SPLIT_TO_TABLE('city, state/LOCATION/designation', '/')) AS t
WHERE t.INDEX = 2;
Reference: SPLIT_TO_TABLE
C) using REGEXP expressions:
SELECT REGEXP_REPLACE('city, state/LOCATION/designation', '(.*)/(.*)/(.*)', '\\2');
but this one doesn't work if there is no third term ('designation'); you need to combine two calls and choose between them by the number of slashes:
SELECT IFF(REGEXP_COUNT('city, state/LOCATION', '/') = 1,
REGEXP_REPLACE('city, state/LOCATION','(.*)/(.*)','\\2'),
REGEXP_REPLACE('city, state/LOCATION','(.*)/(.*)/(.*)','\\2'));
Reference: REGEXP_REPLACE
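The difference between options A and C can be illustrated with a small Python sketch. split_part below is a hypothetical stand-in for the SQL function (positive, 1-based part numbers only), and the empty-string result for an out-of-range part is an assumption about the behaviour being mimicked.

```python
import re

def split_part(value, delimiter, part_number):
    """Rough stand-in for SPLIT_PART with a positive, 1-based part number."""
    parts = value.split(delimiter)
    if 1 <= part_number <= len(parts):
        return parts[part_number - 1]
    return ""  # assumption: out-of-range parts come back as an empty string

three_terms = "city, state/LOCATION/designation"
two_terms = "city, state/LOCATION"

# Splitting works regardless of how many terms follow the first slash.
second_parts = [split_part(s, "/", 2) for s in (three_terms, two_terms)]

# The three-group pattern needs two slashes, so it cannot match the short form.
three_group = re.compile(r"(.*)/(.*)/(.*)")
matches_long = three_group.fullmatch(three_terms) is not None
matches_short = three_group.fullmatch(two_terms) is not None
print(second_parts, matches_long, matches_short)
```

This is why the splitting approach needs no special case, while the regex replacement has to branch on the slash count.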

Related

Regular expression in snowflake

I have a requirement where a string column holds a value such as "/Date(-34905600000)/". The value within the parentheses can follow any one of these patterns:
"/Date(-34905600000)/"
"/Date(1407283200000)/"
"/Date(1636654411000+0000)/"
I need to extract everything inside the parentheses for examples 1 and 2, including the "-" if there is one. For the 3rd example, it should be only the numbers inside the parentheses before the "+", i.e. 1636654411000.
I tried the following, but I am not getting the result I want because the output still includes the parentheses.
select REGEXP_substr("/Date(-34905600000)/", '\\([[:alnum:]\-]+\\)')
from table A;
select REGEXP_substr("/Date(-34905600000)/", '\\((.*?)\\)') from table A;
select REGEXP_substr("/Date(-34905600000)/", '[0-9]+') from table A;
Using regexp_replace() instead you could do:
regexp_replace(colA, '(\\/Date\\()([-0-9]*)(.*)', '\\2')
That splits the string into three substitution groups and then only keeps the second. I often end up doing regexp_replace() with substitution groups like this when regexp_substr() fails me.
If you want REGEXP_SUBSTR to return sub-matches, you need to use the 'e' <regex_parameters> option; the <group_num> then defaults to 1, which matches your first grouping, thus:
SELECT column1,
REGEXP_substr(column1, 'Date\\(([-+]?[0-9]+)',1,1,'e')
FROM VALUES
('"/Date(-34905600000)/"'),
('"/Date(1407283200000)/"'),
('"/Date(1636654411000+0000)/"');
gives:
COLUMN1 | REGEXP_SUBSTR(COLUMN1, 'DATE\(([-+]?[0-9]+)',1,1,'E')
"/Date(-34905600000)/" | -34905600000
"/Date(1407283200000)/" | 1407283200000
"/Date(1636654411000+0000)/" | 1636654411000
I am quite sure the regexp is greedy by default, but otherwise you can force the match to the timezone or paren with
'Date\\(([-+]?[0-9]+)[-+\\)]'

How to take apart information between hyphens in SQL Server

How would I take apart a column that contains string:
92873-987dsfkj80-2002-04-11
20392-208kj48384-2008-01-04
Data would look like this:
Filename Yes/No Key
Abidabo Yes 92873-987dsfkj80-2002-04-11
Bibiboo No 20392-208kj48384-2008-01-04
Want it to look like this:
Filename Yes/No Key
Abidabo Yes 92873-987dsfkj80-20020411
Bibiboo No 20392-208kj48384-20080104
whereby I would like to concat the dates in the end as 20020411 and 20080104. From the right side, the information is the same always. From the left it is not, otherwise I could have concatenated it. It is not an import issue.
As mentioned in the comments already, storing data like this is a bad idea. However, you can obtain the dates from those strings by using a RIGHT function like so:
SELECT RIGHT('20392-208kj48384-2008-01-04', 10)
Output:
2008-01-04
Depending on the SQL Server version you are using, you can use STRING_SPLIT, which requires COMPATIBILITY_LEVEL 130. You can also build your own user-defined function to split the contents of a field and manipulate it as you need; you can find some useful examples of SPLIT functions in this thread:
Split function equivalent in T-SQL?
Assuming I'm correct and the date part is always on the right side of the string, you can simply use RIGHT and CAST to get the date (assuming, again, that the date is represented as yyyy-mm-dd):
SELECT CAST(RIGHT(YourColumn, 10) As Date)
FROM YourTable
However, Panagiotis is correct in his comment - You shouldn't store data like that. Each column in the database should hold only a single point of data, be it string, number or date.
Update following your comment and the updated question:
SELECT LEFT(YourColumn, LEN(YourColumn) - 10) + REPLACE(RIGHT(YourColumn, 10), '-', '')
FROM YourTable
will return the desired results.
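The same slicing logic, written as a minimal Python sketch for checking the expected output. It relies on the assumption stated in the answer that the last 10 characters are always a yyyy-mm-dd date; compact_trailing_date is a hypothetical helper name.

```python
def compact_trailing_date(key):
    """Keep everything before the last 10 characters, then drop the date's hyphens."""
    prefix, date_part = key[:-10], key[-10:]
    return prefix + date_part.replace("-", "")

keys = [
    "92873-987dsfkj80-2002-04-11",
    "20392-208kj48384-2008-01-04",
]
compacted = [compact_trailing_date(k) for k in keys]
print(compacted)
```

Only the date portion is passed through the replacement, so hyphens in the left part of the key are preserved.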

Use String parameter for RegEx in query

In my query (the database is a sql server) I use a RegEx for a select command like this:
SELECT * FROM test WHERE id LIKE '1[2,3]'
(This query is tested and returns the data I want)
I want to use a parameter for this RegEx. For that I defined the parameter in iReport as $P{id}, a string with the value "1[2,3]".
In my query I use now this parameter like this:
SELECT * FROM test WHERE id LIKE $P{id}
As a result I get a blank page. I think the problem is that the value of the parameter is defined with " ". But with ' ' I get a compiler error that the parameter isn't a string.
I hope someone can help me.
LIKE applies to text values, not to numeric values. Since id is numeric use something like this:
SELECT * FROM test WHERE id IN (12, 13)
with the parameter
SELECT * FROM test WHERE id IN ($P!{id_list})
and supply a comma separated list of ids for the parameter. The bang (!) makes sure that the parameter will be inserted as-is, without string delimiters.
Btw: LIKE (Transact-SQL) uses wildcards, not regex.
You can still use LIKE, since there is an implicit conversion from numeric types to text in T-SQL, but this will result in a (table or index) scan, whereas the IN clause can take advantage of indexes.
The accepted answer works, but it uses string replacement; read about SQL injection to understand why this is not good practice.
The correct way to execute this IN query in jasper report (using prepared statement) is:
SELECT * FROM test WHERE $X{IN, id, id_list}
For more information on the use of NOTIN, BETWEEN, etc., see the JasperReports sample reference for queries.
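To illustrate why bound parameters are preferred over pasting text into the query, here is a small Python sketch using the standard sqlite3 module. It is only an analogy: the in-memory table and its ids are made up for the example, and JasperReports builds its own placeholders when $X{IN, ...} is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO test (id) VALUES (?)", [(11,), (12,), (13,), (14,)])

id_list = [12, 13]

# One "?" per value: the values travel as bound parameters, never as SQL text.
placeholders = ", ".join("?" for _ in id_list)
query = f"SELECT id FROM test WHERE id IN ({placeholders}) ORDER BY id"

rows = conn.execute(query, id_list).fetchall()
print(rows)
```

Only the number of placeholders depends on the input; the values themselves can never change the structure of the statement.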

T-SQL Regex for social security number (SQL Server 2008 R2)

I need to find invalid social security numbers in a varchar field in a SQL Server 2008 database table. (Valid SSNs are defined as being in the format ###-##-####; it doesn't matter what the numbers are, as long as they follow that "3-digit dash 2-digit dash 4-digit" pattern.)
I do have a working regex:
SELECT *
FROM mytable
WHERE ssn NOT LIKE '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]'
That does find the invalid SSNs in the column, but I know (okay - I'm pretty sure) that there is a way to shorten that to indicate that the previous pattern can have x iterations.
I thought this would work:
'[0-9]{3}-[0-9]{2}-[0-9]{4}'
But it doesn't.
Is there a shorter regex than the one above in the select, or not? Or perhaps there is, but T-SQL/SQL Server 2008 doesn't support it!?
If you plan to get a shorter variant of your LIKE expression, then the answer is no.
In T-SQL, you can only use the following wildcards in the pattern:
%
Any string of zero or more characters.
WHERE title LIKE '%computer%' finds all book titles with the word computer anywhere in the book title.
_ (underscore)
Any single character.
WHERE au_fname LIKE '_ean' finds all four-letter first names that end with ean (Dean, Sean, and so on).
[ ]
Any single character within the specified range ([a-f]) or set ([abcdef]).
WHERE au_lname LIKE '[C-P]arsen' finds author last names ending with arsen and starting with any single character between C and P, for example Carsen, Larsen, Karsen, and so on. In range searches, the characters included in the range may vary depending on the sorting rules of the collation.
[^]
Any single character not within the specified range ([^a-f]) or set ([^abcdef]).
So, your LIKE statement is already the shortest possible expression. No limiting quantifiers (such as {min,max}) can be used, nor shorthand classes like \d.
If you were using MySQL, you could use a richer set of regex utilities, but it is not the case.
I suggest using another solution like this:
-- Use `REPLICATE` if you really want to use a number to repeat
DECLARE @rgx nvarchar(max) = REPLICATE('#', 3) + '-' +
REPLICATE('#', 2) + '-' +
REPLICATE('#', 4);
-- or use your simple format string (keep only one of the two DECLAREs)
-- DECLARE @rgx nvarchar(max) = '###-##-####';
-- then use this to get your final `LIKE` string.
SET @rgx = REPLACE(@rgx, '#', '[0-9]');
And you can also use something like '_' for characters then replace it with [A-Z] and so on.
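The mask-expansion idea can be sketched in Python to see what the final pattern looks like. like_pattern_from_mask is a hypothetical helper; the check uses Python's re module only because a pattern built purely from [0-9] classes and hyphens happens to be a valid regular expression too, which is not true of LIKE patterns in general.

```python
import re

def like_pattern_from_mask(mask, placeholder="#", char_class="[0-9]"):
    """Expand each placeholder in the mask into a character class."""
    return mask.replace(placeholder, char_class)

pattern = like_pattern_from_mask("###-##-####")
print(pattern)

# Sanity check of the expanded pattern against a well-formed and a malformed value.
valid = re.fullmatch(pattern, "123-45-6789") is not None
invalid = re.fullmatch(pattern, "123456789") is not None
print(valid, invalid)
```

The expanded string is exactly the long LIKE pattern from the question; the mask only makes it easier to read and maintain.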

TSQL Split Function - Comma Delimited Parameter Values

I've been trying to solve an issue where a sproc I'm using is passed user names with commas in them. The part before the comma is a position prefix, for example 'sel, MyName'. Our split function splits on commas, and the value passed in looks like 'sel, MyName, sel, YourName'.
I cannot figure out how to keep the comma inside each name while still splitting on the commas between names, so that I can run the query with: where username in (select result from dbo.split(@namestosplit))
I've tried removing the comma and then putting it back, replacing it temporarily, and prefixing with the text (removing the prefix from the params passed in).
I figured out a way to do this.
select 'someprefix, ' + Item from Split(replace(@valueToSplit,'sel, ',''), ',')
This ends up forming a select like this:
row 1: sel, Some Person
row 2: sel, Another Person
I essentially removed it to split the names then put it back as a concat for the where in (do something) piece.
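The remove-split-reattach trick can be sketched in Python as follows. split_prefixed_names is a hypothetical helper, and trimming whitespace around each name is an assumption; whether the T-SQL Split function trims its items depends on how it was written.

```python
def split_prefixed_names(value, prefix="sel, "):
    """Strip the prefix everywhere, split on commas, then put the prefix back."""
    without_prefix = value.replace(prefix, "")
    # Trim each piece and skip empty ones (assumption: names never contain commas).
    names = [name.strip() for name in without_prefix.split(",") if name.strip()]
    return [prefix + name for name in names]

rows = split_prefixed_names("sel, Some Person, sel, Another Person")
print(rows)
```

This only works when every name carries the same prefix; values with mixed prefixes would need the prefix stored separately from the name.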
