Sql Server's regex LIKE - behaviour clarification? - sql-server

Someone asked here how to get only the values that are numbers.
So, if the table is:
DECLARE @Table TABLE(
    Col nvarchar(50)
)
INSERT INTO @Table SELECT 'ABC'
INSERT INTO @Table SELECT '234.62'
INSERT INTO @Table SELECT '10:10:10:10'
INSERT INTO @Table SELECT 'France'
INSERT INTO @Table SELECT '2'
then the desired results are:
234.62
2
But when I tested this query:
SELECT * FROM @Table WHERE Col LIKE '%[0-9.]%' --expected to see only 234.62
it showed:
234.62
10:10:10:10
2
Question #1
How come 10:10:10:10 and 2 satisfy the condition?
Question #2
I saw this answer here, which does work:
SELECT * FROM @Table WHERE Col NOT LIKE '%[^0-9.]%'
But I don't understand why this works. AFAIU, it selects all values which are not like (not(has number) and not(has dot)), which is ===>(De Morgan)===> not like (has number or has dot).
Can someone please shed some light?
N.B. I already know that ISNUMERIC can also be used, but it's unsafe (+). Also, the valid wildcards are %, _, [] and [^].

Any particular use of [set] within a LIKE expression is a check against exactly one character in the target string.
So, LIKE '%[0-9.]%' says: % matches zero or more arbitrary characters, then [0-9.] matches one character in the set 0-9., and then % matches zero or more arbitrary characters. Paraphrased, it says "match any string that contains at least one character in the set 0-9.". So, 10:10:10:10 can be matched as zero arbitrary characters, then 1 matches [0-9.], and then 0:10:10:10 matches the final %. In the same way, 2 matches with both % wildcards matching zero characters.
LIKE '%[^0-9.]%' says: % matches zero or more arbitrary characters, then [^0-9.] matches one character not in the set 0-9., and then % matches zero or more arbitrary characters. Paraphrased, it says "match any string that contains at least one character outside of the set 0-9.". So when we apply the NOT to the front of that, we are saying "match any string that doesn't contain at least one character outside of the set 0-9." or "match strings that only contain characters in the set 0-9.".
Essentially, the double negative is a way to make an assertion about all characters in the string.
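To see the difference side by side, here is a minimal sketch that evaluates both patterns against the question's sample data. The two extra rows ('' and '1.2.3') and the column aliases are additions made purely for illustration; they are not part of the original question.

```sql
-- Sample data from the question, plus two illustrative edge cases ('' and '1.2.3')
DECLARE @Table TABLE (Col nvarchar(50));

INSERT INTO @Table (Col)
VALUES ('ABC'), ('234.62'), ('10:10:10:10'), ('France'), ('2'), (''), ('1.2.3');

SELECT Col,
       -- 1 when at least ONE character is a digit or a dot
       CASE WHEN Col LIKE '%[0-9.]%'      THEN 1 ELSE 0 END AS has_one_allowed_char,
       -- 1 when NO character falls outside digits and dots
       CASE WHEN Col NOT LIKE '%[^0-9.]%' THEN 1 ELSE 0 END AS has_only_allowed_chars
FROM @Table;
```

Note that the double-negative form only restricts which characters may appear. It therefore also accepts the empty string and values such as 1.2.3, so it is a character filter rather than a full numeric validation.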

Related

Is there a way to find values that contain only 0's and a symbol of any length?

I want to find strings of any length that contain only 0's and a symbol such as a /, a . or a -
Examples include 0__0 and 000/00/00000 and .00000
Considering this sample data:
CREATE TABLE dbo.things(thing varchar(255));
INSERT dbo.things(thing) VALUES
('0__0'),('000/00/00000'),('00000'),('0123456');
Try the following, which locates the first position of any character that is NOT a 0, a period, a forward slash, or an underscore. PATINDEX returns 0 if the pattern is not found. Inside the brackets only the leading ^ negates the set; any further ^ would be read as a literal caret, so the allowed characters are simply listed after it.
SELECT thing FROM dbo.things
WHERE PATINDEX('%[^0./_]%', thing) = 0;
Results:
thing
0__0
000/00/00000
00000
The opposite:
SELECT thing FROM dbo.things
WHERE PATINDEX('%[^0./_]%', thing) > 0;
Results:
thing
0123456
Example db<>fiddle
I can see a way of doing this, but it wouldn't perform well if you use it as a search criterion.
We use the TRANSLATE function in SQL Server to replace the allowed characters (or symbols, as you called them) with a zero, and then remove the zeroes. If the result is an empty string, there are two possible cases: either the value contained only zeroes and allowed characters, or it was already an empty string.
So, by checking for that and for non-empty strings, we can determine whether a value matches your criteria.
-- Test scenario
create table #example (something varchar(200) )
insert into #example(something) values
--Example cases from Stack Overflow
('0__0'),('000/00/00000'),('.00000'),
-- With something not allowed (don't know, just put a number)
('1230__0'),('000/04560/00000'),('.00000789'),
-- Just not allowed characters, zero, blank, and NULL
('1234567489'),('0'), (''),(null)
-- Shows the data, with a column to check if it matches your criteria
select *
from #example e
cross apply (
    select case when
            -- If it *must* have at least a zero
            e.something like '%0%' and
            -- Eliminates zeroes
            replace(
                -- Replaces the allowed characters with zero
                translate(
                    e.something
                    ,'_./'
                    ,'000'
                )
                ,'0'
                ,''
            ) = ''
        then cast(1 as bit)
        else cast(0 as bit)
    end as doesItMatch
) as criteria(doesItMatch)
I really discourage you from using this as a search criteria.
-- Queries the table over this criteria.
-- This is going to compute over your entire table, so it can get very CPU intensive
select *
from #example e
where
    -- If it *must* have at least a zero
    e.something like '%0%' and
    -- Eliminates zeroes
    replace(
        -- Replaces the allowed characters with zero
        translate(
            e.something
            ,'_./'
            ,'000'
        )
        ,'0'
        ,''
    ) = ''
If you must use this as a search criterion, and it will be a common filter in your application, I suggest you create a new bit column that flags whether a row matches, and index it. That way, the extra computational effort is spread across the inserts/updates/deletes, and the search queries won't overload the database.
The code can be seen executing here, on DB Fiddle.
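One possible way to get such a flag column is a persisted computed column, sketched below. This is a variation on the suggestion above rather than the answer's own code: the table name, column name and index name are hypothetical, and it assumes SQL Server 2017 or later (for TRANSLATE) with default SET options.

```sql
-- Hypothetical table; all object names here are illustrative only.
CREATE TABLE dbo.example_flagged
(
    something varchar(200) NULL,
    -- 1 when the value has at least one zero and nothing but zeroes and _ . /
    -- (NULL and '' both evaluate to 0 because the LIKE test is not true for them)
    doesItMatch AS
        CAST(CASE
                 WHEN something LIKE '%0%'
                  AND REPLACE(TRANSLATE(something, '_./', '000'), '0', '') = ''
                 THEN 1
                 ELSE 0
             END AS bit) PERSISTED
);

-- Index the flag so the filter does not recompute the expression per row.
CREATE INDEX IX_example_flagged_doesItMatch
    ON dbo.example_flagged (doesItMatch);

-- The search then becomes a simple predicate on the stored flag.
SELECT something
FROM dbo.example_flagged
WHERE doesItMatch = 1;
```

Because the column is persisted, the expression is evaluated when rows are written rather than on every search. An index on a single bit column is only selective when matching rows are rare, so whether it helps depends on the data.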
What I got from the question is that the strings must contain both 0 and any combination of the special characters in the string.
If you have SQL Server 2017 and above, you can use translate() to replace multiple characters with a space and compare this with the empty string. Also you can use LIKE to enforce that both a 0 and any combination of the special character(s) appear at least once:
DECLARE @temp TABLE (val varchar(100))
INSERT INTO @temp VALUES
('0__0'), ('000/00/00000'), ('.00000'), ('w0hee/'), ('./')
SELECT *
FROM @temp
WHERE val LIKE '%0%' --must have at least one zero somewhere
AND val LIKE '%[_/.]%' --must have at least one special character somewhere
AND TRANSLATE(val, '0./_', '    ') = '' --zeros and special characters become spaces (TRANSLATE needs both lists to be the same length); an all-space string compares equal to ''
Creates output:
val
0__0
000/00/00000
.00000

SQL Server - How to get last numeric value in the given string

I am trying to get last numeric part in the given string.
For Example, below are the given strings and the result should be last numeric part only
SB124197 --> 124197
287276ACBX92 --> 92
R009321743-16 --> 16
How to achieve this functionality. Please help.
Try this:
select right(@str, patindex('%[^0-9]%',reverse(@str)) - 1)
Explanation:
Using PATINDEX with '%[^0-9]%' as a search pattern you get the starting position of the first occurrence of a character that is not a digit.
Using REVERSE you get the position of the first non-numeric character starting from the back of the string.
Edit:
To handle the case of strings not containing any non-numeric characters you can use:
select case
    when patindex('%[^0-9]%', @str) = 0 then @str
    else right(@str, patindex('%[^0-9]%',reverse(@str)) - 1)
end
If your data always contains at least one non-numeric character then you can use the first query, otherwise use the second one.
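As a worked illustration of the steps just described, the sketch below breaks the expression into its intermediate values for one of the question's sample strings. The column aliases are made up for readability.

```sql
DECLARE @str varchar(50) = 'R009321743-16';

SELECT REVERSE(@str) AS reversed,                        -- '61-347123900R'
       PATINDEX('%[^0-9]%', REVERSE(@str)) AS first_non_digit_from_end,  -- 3
       RIGHT(@str, PATINDEX('%[^0-9]%', REVERSE(@str)) - 1) AS last_number;  -- '16'
```

The first non-digit in the reversed string sits at position 3, so the original string ends in 3 - 1 = 2 digits, and RIGHT returns those two characters.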
Actual query:
So, if your table is something like this:
mycol
--------------
SB124197
287276ACBX92
R009321743-16
123456
then you can use the following query (works in SQL Server 2012+):
select iif(x.i = 0, mycol, right(mycol, x.i - 1)) as mynum
from mytable
cross apply (select patindex('%[^0-9]%', reverse(mycol))) as x(i)
Output:
mynum
------
124197
92
16
123456
Demo here
Here is one way using Patindex
SELECT RIGHT(strg, COALESCE(NULLIF(Patindex('%[^0-9]%', Reverse(strg)), 0) - 1, Len(strg)))
FROM (VALUES ('SB124197'),
('287276ACBX92'),
('R009321743-16')) tc (strg)
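After reversing the string, we find the position of the first non-numeric character; that position minus one is the number of trailing digits, which RIGHT then extracts from the original string. If there is no non-numeric character at all, NULLIF and COALESCE fall back to the full length of the string.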
Result :
-----
124197
92
16

Oracle: Check if number column contains a value from a formatted string of numbers

In my local table, I am trying to check whether an Oracle NUMBER column called JOB_NUMBER has a value that exists in a string parameter. Technically I am passing in the string as a stored procedure NVARCHAR2 parameter, but for simplicity I hardcoded the string in my query below:
SELECT FIRST_NAME, JOB_NUMBER
FROM JOBTABLE
WHERE TO_CHAR(JOB_NUMBER) IN ('00052, 00048');
When Oracle runs the query above, it returns no values even though 00052 is a number value in the table column for JOB_NUMBER. I'm thinking that it checks for the whole string ('00052, 00048') in JOB_NUMBER and can't find it, so it returns no values. The string will contain different values each time, and there will be several numbers (of type string) in that string.
Does anyone know how to do this?
The trick is to keep the leading zeroes of the number when comparing it to the string, then loop through the string to compare. Here a CTE is used to simulate a numeric job number and a string to search. TO_CHAR with the '00000' format preserves the leading zeroes, and the FM modifier removes the leading space that TO_CHAR leaves for the sign. CONNECT BY loops once per element (the count of the delimiter + 1 times), keeping the current iteration number in LEVEL. This value is used in REGEXP_SUBSTR to step through the elements, comparing the converted numeric value to each element to see if a match is found. Note that this regular expression allows for NULL elements, should you need to know which item in the list is your match.
SQL> with tbl(job_nbr_in, job_str_in) as (
select 00052, '00052, 00048' from dual
)
select --level element_nbr,
to_char(job_nbr_in, 'FM00000') search_for, job_str_in in_string,
regexp_substr(job_str_in, '(.*?)(, |$)', 1, level, NULL, 1) found
from tbl
where to_char(job_nbr_in, 'FM00000') = regexp_substr(job_str_in, '(.*?)(, |$)', 1, level, NULL, 1)
connect by level <= regexp_count(job_str_in, ',')+1;
SEARCH_FOR IN_STRING FOUND
---------- ------------ ------------
00052 00052, 00048 00052
If you are not sure if you will always have a space after the comma, remove spaces with REPLACE and adjust the delimiter in REGEXP_SUBSTR:
with tbl(job_nbr_in, job_str_in) as (
select 00052, '00052, 00048' from dual
)
select to_char(job_nbr_in, 'FM00000') search_for, job_str_in in_string,
regexp_substr(replace(job_str_in, ' '), '(.*?)(,|$)', 1, level, NULL, 1) found
from tbl
where to_char(job_nbr_in, 'FM00000') = regexp_substr(replace(job_str_in, ' '), '(.*?)(,|$)', 1, level, NULL, 1)
connect by level <= regexp_count(job_str_in, ',')+1;
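Applied to the table from the question, the same splitting technique could be used as a subquery, roughly as sketched below. The bind variable :job_list is a hypothetical stand-in for the stored procedure parameter, and the five-digit 'FM00000' format is assumed from the sample values.

```sql
-- :job_list is a hypothetical bind variable, e.g. '00052, 00048'
SELECT first_name, job_number
FROM   jobtable
WHERE  TO_CHAR(job_number, 'FM00000') IN (
           -- Split the comma-separated list into one row per element
           SELECT REGEXP_SUBSTR(REPLACE(:job_list, ' '), '(.*?)(,|$)', 1, LEVEL, NULL, 1)
           FROM   dual
           CONNECT BY LEVEL <= REGEXP_COUNT(:job_list, ',') + 1
       );
```

The list is split once by the subquery, and each row's JOB_NUMBER is then compared against the resulting elements as zero-padded strings.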

Return words in between specific phrases in string in T-SQL

My column Details returns a big message such as the one below, and the only thing I want to extract is the number 874659.29. This number varies among rows, but it always comes after ,"CashAmount": and before a comma (,).
There will be only one ,"CashAmount": but several commas after it.
dhfgdh&%^&%,"CashAmount":874659.29,"Hasdjhf"&^%^%
Therefore, I was wondering if I could use anything to only show the number in my output column.
Thanks in advance!
Here is another option for this just using some string manipulation.
declare @Details varchar(100) = 'dhfgdh&%^&%,"CashAmount":874659.29,"Hasdjhf"&^%^%'
select left(substring(@Details, CHARINDEX('CashAmount":', @Details) + 12 /*12 is the length of CashAmount":*/, LEN(@Details))
, charindex(',', substring(@Details, CHARINDEX('CashAmount":', @Details) + 12, LEN(@Details))) - 1)
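The same idea can be written so that the start position is computed only once. The sketch below is one way to do that; the @key variable and the start_pos alias are hypothetical names, and it assumes the key is always present and always followed by a comma, as the question states.

```sql
DECLARE @Details varchar(100) = 'dhfgdh&%^&%,"CashAmount":874659.29,"Hasdjhf"&^%^%';
DECLARE @key varchar(20) = '"CashAmount":';

SELECT SUBSTRING(@Details,
                 s.start_pos,
                 -- length = position of the next comma minus the start of the number
                 CHARINDEX(',', @Details, s.start_pos) - s.start_pos) AS CashAmount  -- '874659.29'
FROM (SELECT CHARINDEX(@key, @Details) + LEN(@key) AS start_pos) AS s;
```

Using LEN(@key) avoids the hard-coded 12, and the third argument of CHARINDEX starts the comma search at the number itself, so earlier commas in the message are ignored.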
You could use one of the split string functions as described here.
declare @string varchar(max)
set @string='dhfgdh&%^&%,"CashAmount":874659.29,"Hasdjhf"&^%^%'
select b.val from
[dbo].[SplitStrings_Numbers](@string,',') a
cross apply
(
    select isnumeric(replace(a.item,'"CashAmount":','')), replace(a.item,'"CashAmount":','')
) b(chk,val)
where b.chk=1
Output:
874659.29
The above will work only if the number comes after "CashAmount": and before a comma, and if it doesn't contain any special characters.
If your number has special characters, you can use TRY_PARSE and check for NULL.

Parsing Chars In SQL Using PATINDEX

I'm trying to validate a string using raw SQL.
I tried using:
DECLARE @AlphaNumeric varchar(50)
SET @AlphaNumeric = '1017a'
SELECT SUBSTRING(@AlphaNumeric, 1, (PATINDEX('%[^0-9]%', @AlphaNumeric) - 1)) AS 'Numeric',
SUBSTRING(@AlphaNumeric, PATINDEX('%[^0-9]%', @AlphaNumeric), DATALENGTH(@AlphaNumeric)) AS 'Alpha'
But if the user types 101a7a, this doesn't work properly. What I want exactly is this:
I want the variable to always be numeric followed by letters; the length doesn't matter.
For example :
2303A OK
23A434A NOT OK
A344 NOT OK.
4324AAC OK
This would be dead easy if i could do it in Regex but sql gives me headaches :(
Numbers followed by letters are OK; letters followed by numbers aren't; all characters must be letters or numbers. Hence...
select * from yourtable
where yourfield like '%[0-9][a-z]%'
and not (yourfield like '%[a-z][0-9]%')
and not (yourfield like '%[^0-9a-z]%')
I think this will do what you want. At least, it works on your sample data:
with t as (
select '2303A' as col union all
select '23A434A' union all
select 'A344'
)
select *,
(case when col like '%[0-9]%' and
substring(col, patindex('%[A-Z]%', col), len(col)) not like '%[^A-Z]%'
then 'OK'
else 'NOT OK'
end)
from t;
There are two conditions. First, check that the character string has a number somewhere. Then, check that only letters appear from the first letter onwards. I'm assuming that all letters are uppercase.
EDIT:
There might be an easier way. You can check that a number is followed by a letter somewhere in the string, but that a letter is never followed by a number. For this, you only need like:
select (case when col not like '%[^A-Z0-9]%' and
col like '%[0-9][A-Z]%' and
col not like '%[A-Z][0-9]%'
then 'OK'
else 'NOT OK'
end)
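To check the three LIKE conditions against the question's examples, a small self-contained sketch follows. The inline VALUES list and the verdict alias are additions for illustration; the sample values themselves come from the question.

```sql
SELECT v.col,
       CASE WHEN v.col NOT LIKE '%[^A-Z0-9]%'   -- only letters and digits
             AND v.col LIKE '%[0-9][A-Z]%'      -- a digit is followed by a letter somewhere
             AND v.col NOT LIKE '%[A-Z][0-9]%'  -- a letter is never followed by a digit
            THEN 'OK'
            ELSE 'NOT OK'
       END AS verdict
FROM (VALUES ('2303A'), ('23A434A'), ('A344'), ('4324AAC'), ('101a7a')) AS v(col);
```

With these inputs, 2303A and 4324AAC come out as OK and the other three as NOT OK. How lowercase letters are treated by [A-Z] depends on the column's collation, so that part is worth testing against your own database.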
I have an approach that should work in your situation. Basically, identify the position of the last digit and compare it to the position of the first non-digit. You can get the position of the last digit like this:
len(@AlphaNumeric) - PATINDEX('%[0-9]%', Reverse(@AlphaNumeric))+1
and you can get the position of the first non-digit like this:
PATINDEX('%[^0-9]%', @AlphaNumeric)
so that would make your WHERE clause (where all digits precede any non-digits) like this:
Where (len(@AlphaNumeric) - PATINDEX('%[0-9]%', Reverse(@AlphaNumeric))+1 ) < PATINDEX('%[^0-9]%', @AlphaNumeric)

Resources