I need to remove leading characters with variable zeros in my column. The string always starts with '049' + filling zeros + some number. I need to extract the number after leading zeros.
04912040 -> 12040
04901204 -> 1204
04900100 -> 100
04900012 -> 12
04900008 -> 8
I have found this solution and added replace for the leading '049' to be replaced with '000':
SUBSTRING(mycolumn, PATINDEX('%[^0]%', REPLACE(mycolumn, '049', '000')), LEN(mycolumn))
However, this won't work if my string looks like 04904901, since instead of 4901 I will get 1.
Just remove the first 3 characters and convert it to an int:
SELECT CONVERT(int,STUFF(YourColumn,1,3,''))
FROM dbo.YourTable;
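To see why dropping the fixed prefix is safer than replacing '049', here is a small Python sketch of both approaches. It is only an illustration outside SQL Server: the function names are made up for this example, and `str.replace` stands in for REPLACE (both replace every occurrence).

```python
def strip_prefix_and_zeros(value: str) -> int:
    """Drop the fixed 3-character prefix; int() then discards leading zeros."""
    return int(value[3:])


def replace_then_first_nonzero(value: str) -> str:
    """Mimic the attempt from the question: replace every '049' with '000',
    then return the original string from the first non-zero position on."""
    masked = value.replace("049", "000")
    for i, ch in enumerate(masked):
        if ch != "0":
            return value[i:]
    return ""


print(strip_prefix_and_zeros("04904901"))      # 4901
print(replace_then_first_nonzero("04904901"))  # 1 (the second '049' was masked too)
```

The prefix is removed by position rather than by content, so a later '049' inside the number is left alone.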
I would like to check if a specific column in one of my tables meets the following conditions:
String must contain at least three characters
String must contain at least two different numbers [e.g. 123 would work but 111 would not]
Characters which are allowed in the string:
Numbers (0-9)
Uppercase letters
Lowercase letters
Underscores (_)
Dashes (-)
I have some experience with Regex but am having issues with Snowflake's syntax. Whenever I try using the '?' regex character (to mark something as optional) I receive an error. Can someone help me understand a workaround and provide a solution?
What I have so far:
SELECT string,
LENGTH(string) AS length
FROM tbl
WHERE REGEXP_LIKE(string,'^[0-9]+{3,}[-+]?[A-Z]?[a-z]?$')
ORDER BY length;
Thanks!
Your regex looks a little confusing and invalid, and it doesn't look like it quite meets your needs either. I read this expression as a string that:
Must start with one or more digits, at least 3 or more times
The confusing part to me is that '+' is itself a quantifier, so it should not be quantifiable with {3,}, yet somehow this doesn't produce an error for me
Optionally followed by either a dash or plus sign
Followed by an uppercase character zero or one times (giving back as needed)
Followed by and ending with a lowercase character zero or one times (giving back as needed)
Questions
You say that your string must contain 3 characters and at least 2 different numbers. Numbers are characters, but I'm not sure whether you mean 3 letters...
Are you considering the numbers to be characters?
Does the order of the characters matter?
Can you provide an example of the error you are receiving?
Notes
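Checking for a second digit that is not the same as the first would normally involve a lookahead with a backreference. Snowflake's regular expression functions support neither lookaheads nor backreferences inside a pattern (backreferences are only available in the replacement string of REGEXP_REPLACE).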
One thing about pattern matching with regular expressions is that order makes a difference. If order is not of importance to you, then you'll have multiple patterns to match against.
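For comparison, this is roughly what the lookahead-plus-backreference check looks like in an engine that does support it. The sketch below uses Python's `re` module, not Snowflake, and it assumes that "two different numbers" means two distinct digits anywhere in the string; `is_valid` and the two pattern names are hypothetical.

```python
import re

# Only digits, letters, underscores and dashes; at least three characters.
ALLOWED = re.compile(r"[A-Za-z0-9_-]{3,}")

# A digit, followed somewhere later by a digit that differs from it:
# (?!\1) is a negative lookahead on the backreference to group 1.
TWO_DIFFERENT_DIGITS = re.compile(r"([0-9]).*(?!\1)[0-9]")


def is_valid(s: str) -> bool:
    """True if s satisfies all three requirements from the question."""
    return bool(ALLOWED.fullmatch(s)) and bool(TWO_DIFFERENT_DIGITS.search(s))


print(is_valid("123"))     # True
print(is_valid("111"))     # False: only one distinct digit
print(is_valid("abc 12"))  # False: a space is not an allowed character
```

Because Snowflake cannot express the second pattern, the same check has to be split into separate expressions or moved into a UDF there.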
Example
Below is how you can test each part of your requirements individually. I've included a few regexp_substr functions to show how extraction can work to check if something exists again.
Uncomment the WHERE clause to see the dataset filtered. The filters are written as expressions so you can remove any/all of the regexp_* columns.
select randstr(36,random(123)) as r_string
,length(r_string) AS length
,regexp_like(r_string,'^[0-9]+{3,}[-+]?[A-Z]?[a-z]?$') as reg
,regexp_like(r_string,'.*[A-Za-z]{3,}.*') as has_3_consecutive_letters
,regexp_like(r_string,'.*\\d+.*\\d+.*') as has_2_digits
,regexp_substr(r_string,'(\\d)',1,1) as first_digit
,regexp_substr(r_string,'(\\d)',1,2) as second_digit
,first_digit <> second_digit as digits_1st_not_equal_2nd
,not(regexp_instr(r_string,regexp_substr(r_string,'(\\d)',1,1),1,2)) as first_digit_does_not_appear_again
,has_3_consecutive_letters and has_2_digits and first_digit_does_not_appear_again as test
from table(generator(rowcount => 10))
//where regexp_like(r_string,'.*[A-Za-z]{3,}.*') // has_3_consecutive_letters
// and regexp_like(r_string,'.*\\d+.*\\d+.*') // has_2_digits
// and not(regexp_instr(r_string,regexp_substr(r_string,'(\\d)',1,1),1,2)) // first_digit_does_not_appear_again
;
Assuming the digits need to be contiguous, you can use a JavaScript UDF to find the number in a string with the largest number of distinct digits:
create or replace function f(S text)
returns float
language javascript
returns null on null input
as
$$
const m = S.match(/\d+/g)
if (!m) return 0
const lengths = m.map(m=> [...new Set (m.split(''))].length)
const max_length = lengths.reduce((a,b) => Math.max(a,b))
return max_length
$$
;
Combined with a WHERE clause, this does what you want, I believe:
select column1, f(column1) max_length
from t
where max_length>1 and length(column1)>2 and column1 rlike '[\\w\\d-]+';
Yielding:
COLUMN1 | MAX_LENGTH
------------------------+-----------
abc123def567ghi1111_123 | 3
123 | 3
111222 | 2
Assuming this input:
create or replace table t as
select * from values ('abc123def567ghi1111_123'), ('xyz111asdf'), ('123'), ('111222'), ('abc 111111111 abc'), ('12'), ('asdf'), ('123 456'), (null);
The function is even simpler if the digits don't have to be contiguous (i.e. count the distinct digits in a string). Then core logic changes to:
const m = S.match(/\d/g)
if (!m) return 0
const length = [...new Set (m)].length
return length
Hope that's helpful!
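If you want to sanity-check the two variants outside Snowflake, the same logic can be sketched in Python. This is not the UDF itself, only a re-implementation for checking; the function names are invented for the example.

```python
import re


def max_distinct_in_run(s: str) -> int:
    """Contiguous variant: the largest count of distinct digits found
    within any single run of consecutive digits (0 if there are none)."""
    runs = re.findall(r"[0-9]+", s)
    return max((len(set(run)) for run in runs), default=0)


def distinct_digits(s: str) -> int:
    """Non-contiguous variant: distinct digits anywhere in the string."""
    return len(set(re.findall(r"[0-9]", s)))


for value in ["abc123def567ghi1111_123", "123", "111222", "xyz111asdf"]:
    print(value, max_distinct_in_run(value), distinct_digits(value))
```

For the sample rows, the contiguous variant gives 3, 3, 2 and 1, which lines up with the MAX_LENGTH column shown earlier (the last row is filtered out by max_length>1).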
When I execute this
select PATINDEX('%[0 ]%', '03/SI/00807/18-19')
I am getting 1.
By using ^ like this:
select PATINDEX('%[^0 ]%', '03/SI/00807/18-19')
I am getting 2.
[^] Allows you to match on any character not in the [^] brackets (for example, [^abc] would match on any character that is not a, b, or c characters) Whereas
[ ] Allows you to match on any character in the [ ] brackets (for example, [abc] would match on a, b, or c characters)
_ Allows you to match on a single character
% Allows you to match any string of any length (including zero length)
[^abcd] means: any one character EXCEPT a,b,c or d
select PATINDEX('%[0 ]%', '03/SI/00807/18-19')
The first character in your string which is (0 or space) is the 0 in the first place, so patindex returns 1.
select PATINDEX('%[^0 ]%', '03/SI/00807/18-19')
The first character in your string which is (neither 0 nor space) is the 3 in the second place, so patindex returns 2.
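The same two lookups can be reproduced outside SQL Server with a short Python sketch. It assumes the simple bracket classes used here ([0 ] and [^0 ]) mean the same thing in Python's `re` module, and it leaves out the surrounding % wildcards because `re.search` already scans the whole string. `patindex_char_class` is a made-up helper, not a real API.

```python
import re


def patindex_char_class(char_class: str, text: str) -> int:
    """1-based position of the first character matching char_class,
    or 0 when nothing matches (the convention PATINDEX uses)."""
    match = re.search(char_class, text)
    return match.start() + 1 if match else 0


print(patindex_char_class("[0 ]", "03/SI/00807/18-19"))   # 1
print(patindex_char_class("[^0 ]", "03/SI/00807/18-19"))  # 2
```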
I am trying to get last numeric part in the given string.
For Example, below are the given strings and the result should be last numeric part only
SB124197 --> 124197
287276ACBX92 --> 92
R009321743-16 --> 16
How to achieve this functionality. Please help.
Try this:
select right(@str, patindex('%[^0-9]%',reverse(@str)) - 1)
Explanation:
Using PATINDEX with '%[^0-9]%' as a search pattern you get the starting position of the first occurrence of a character that is not a number.
Using REVERSE you get the position of the first non numeric character starting from the back of the string.
Edit:
To handle the case of strings not containing non numeric characters you can use:
select case
when patindex('%[^0-9]%', @str) = 0 then @str
else right(@str, patindex('%[^0-9]%',reverse(@str)) - 1)
end
If your data always contains at least one non-numeric character then you can use the first query, otherwise use the second one.
Actual query:
So, if your table is something like this:
mycol
--------------
SB124197
287276ACBX92
R009321743-16
123456
then you can use the following query (works in SQL Server 2012+):
select iif(x.i = 0, mycol, right(mycol, x.i - 1)) as mynum
from mytable
cross apply (select patindex('%[^0-9]%', reverse(mycol) )) as x(i)
Output:
mynum
------
124197
92
16
123456
Here is one way using Patindex
SELECT RIGHT(strg, COALESCE(NULLIF(Patindex('%[^0-9]%', Reverse(strg)), 0) - 1, Len(strg)))
FROM (VALUES ('SB124197'),
('287276ACBX92'),
('R009321743-16')) tc (strg)
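After reversing the string, we find the position of the first non-numeric character. That position minus one is the length of the trailing run of digits, which RIGHT then extracts. If there is no non-numeric character at all, the whole string is returned.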
Result :
-----
124197
92
16
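The reverse-then-search idea can be followed step by step in a short Python sketch. It mirrors the logic of the queries above rather than SQL Server itself, and `last_numeric_part` is a name chosen only for this illustration.

```python
DIGITS = "0123456789"


def last_numeric_part(s: str) -> str:
    """Return the trailing run of digits in s."""
    reversed_s = s[::-1]
    # 1-based position of the first non-digit in the reversed string,
    # or 0 if every character is a digit (the PATINDEX convention).
    pos = next(
        (i for i, ch in enumerate(reversed_s, start=1) if ch not in DIGITS),
        0,
    )
    if pos == 0:
        return s
    # Equivalent of RIGHT(s, pos - 1): keep the last pos - 1 characters.
    return s[len(s) - (pos - 1):]


for value in ["SB124197", "287276ACBX92", "R009321743-16", "123456"]:
    print(value, "->", last_numeric_part(value))
```

A string that ends in a non-digit yields an empty string, which matches what RIGHT returns for a length of 0.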
I am writing a program that has to split a phone number into its parts. The parts are separated by a hyphen, by spaces, by '( )', or by nothing at all.
Example: Input: 0xx-xxxx-xxxx or 0xxxxxxxxxx or (0xx)xxxx-xxxx
Output: code 1: 0xx
code 2: xxxx
code 3: xxxx
But my problem is that sometimes "Code 1" is just 0x, so "Code 2" must be xxxxx (the first part always has a hyphen or parentheses when it is 2 digits long).
Can anyone give me a hand? I would be grateful.
According to your comments, the following regex will extract the information you need
^\(?(0\d{1,2})\)?[- ]?(\d{4,5})[- ]?(\d{4})$
Break down:
^\(?(0\d{1,2})\)? matches 0x, 0xx, (0xx) and (0x) at the beginning of the string
[- ]? as parenthesis can only be used for the first group, the only valid separators left are space and the hyphen. ? means 0 or 1 time.
(\d{4,5}) will match the second group. Because the 3rd group has a fixed length (4 digits) and the pattern is anchored at both ends, backtracking settles how the remaining digits are split between groups 1 and 2.
(\d{4})$ matches the 4 digits at the end of the number.
You can then extract the data from capture groups 1, 2 and 3
Note: As mentioned in the comments of the OP, this only extracts data from correctly formed numbers. It will match some ill-formed numbers.
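As a quick check of the pattern, here is a Python sketch that runs it against the three input shapes from the question. The question does not say which language the program is written in, so Python is an assumption here, and `split_phone` is a hypothetical helper name.

```python
import re

PHONE = re.compile(r"^\(?(0\d{1,2})\)?[- ]?(\d{4,5})[- ]?(\d{4})$")


def split_phone(number: str):
    """Return (code1, code2, code3), or None if the pattern does not match."""
    match = PHONE.match(number)
    return match.groups() if match else None


print(split_phone("012-3456-7890"))   # ('012', '3456', '7890')
print(split_phone("(012)3456-7890"))  # ('012', '3456', '7890')
print(split_phone("01234567890"))     # ('012', '3456', '7890')
print(split_phone("03-12345-6789"))   # ('03', '12345', '6789')
print(split_phone("(012-3456-7890"))  # still matches: unbalanced parenthesis
```

The last call shows one of the ill-formed inputs the note refers to: each parenthesis is optional on its own, so the pattern does not require them to be balanced.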
I have a weight column in a table where the weight must be inserted in the format '09.230'. The weight column is of varchar type, so when the value comes from the front end as '9.23' it should be converted to the above format, i.e. '09.230'. I am able to add the trailing zero, but adding the leading zero is a problem.
This is what i have done to add trailing zero
CAST(ROUND(@Weight,3,0) AS DECIMAL (9,3))
Suppose @Weight = 6.56: the output of the above is '6.560', but the output wanted is '06.560'.
RIGHT('0'+ CONVERT(VARCHAR, CAST(ROUND(@Weight,3,0) AS DECIMAL (9,3))), 6)
This
takes your expression,
converts it to a varchar (retaining the trailing zeros, since the source data type was decimal),
adds a 0 in front of it, and
trims it to 6 characters by removing characters from the front, if needed (e.g. 012.560 -> 12.560, but 06.560 -> 06.560).
Do note, though, that this only works for numbers with at most two digits before the decimal point: 100.123 would be truncated to 00.123!
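The truncation is easy to reproduce outside SQL Server. The Python sketch below mirrors the prepend-and-trim logic with string slicing and, for contrast, shows a pad-only variant that never cuts digits. Both function names are invented for this example, and the inputs are passed as strings to avoid floating-point surprises.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_three_places(weight: str) -> str:
    """Round to 3 decimal places and keep the trailing zeros."""
    return str(Decimal(weight).quantize(Decimal("0.001"), rounding=ROUND_HALF_UP))


def pad_like_right(weight: str) -> str:
    """Mirror RIGHT('0' + value, 6): prepend a zero, keep the last 6 characters."""
    return ("0" + to_three_places(weight))[-6:]


def pad_only(weight: str) -> str:
    """Alternative: pad to at least 6 characters without ever truncating."""
    return to_three_places(weight).zfill(6)


print(pad_like_right("6.56"))     # 06.560
print(pad_like_right("100.123"))  # 00.123 (leading digit lost)
print(pad_only("100.123"))        # 100.123
```

Whether the pad-only behaviour is acceptable depends on whether the column may ever hold values of 100 or more, which the question does not say.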