Regex Function in Salesforce

Could any of you please explain the following code? For example, what are \\D and \\d used for?
NOT(REGEX(Phone, "\\D*?(\\d\\D*?){10}"))

The double backslashes are used because of Java's string escaping rules. The pure regex means:
\D*? # Match any number of non-digit characters (the "?" is useless here)
( # Match...
\d # a single digit
\D*? # optionally followed by any number of non-digits (again, useless "?")
){10} # Repeat the previous group 10 times.
So this regex matches any string that contains exactly ten digits (plus any number of other, non-digit characters).
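The "exactly ten digits" claim can be checked with a minimal Python sketch. It assumes that Salesforce's REGEX has to match the whole field, which re.fullmatch imitates here; the function name and the sample values are made up for illustration.

```python
import re

# The "pure" regex from above, without the extra string-escaping layer.
TEN_DIGITS = re.compile(r"\D*?(\d\D*?){10}")

def has_exactly_ten_digits(phone: str) -> bool:
    # fullmatch stands in for the assumed whole-field matching of REGEX.
    return TEN_DIGITS.fullmatch(phone) is not None

for sample in ["(555) 123-4567", "this1234567890that", "12345", "12345678901"]:
    print(repr(sample), has_exactly_ten_digits(sample))
```

Strings with fewer or more than ten digits are rejected, but any amount of non-digit text around the digits is accepted.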

If you're using the REGEX from the example in Salesforce, it's useless for validating a phone format: it matches "this1234567890that", where "this" and "that" can be any non-digit text. I used NOT(REGEX(Phone, "\\([0-9]{3}\\) [0-9]{3}-[0-9]{4}|\\d{10}")) to get the desired behavior.
My version translates to:
\\( # Match '('
[0-9]{3} # Match 3 digits
\\) # Match ')' followed by a space
[0-9]{3} # Match 3 digits
- # Match hyphen
[0-9]{4} # Match 4 more digits
|\\d{10} # or match 10 digits instead of all the previous
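A quick way to compare this stricter pattern with the loose one is a small Python sketch. As before, re.fullmatch is only a stand-in for the assumed whole-field matching of REGEX, and is_valid_phone is a hypothetical helper name.

```python
import re

# Same pattern as in the formula, with single backslashes because a
# Python raw string needs no extra escaping layer.
STRICT_PHONE = re.compile(r"\([0-9]{3}\) [0-9]{3}-[0-9]{4}|\d{10}")

def is_valid_phone(phone: str) -> bool:
    # Accept "(ddd) ddd-dddd" or ten bare digits, and nothing else.
    return STRICT_PHONE.fullmatch(phone) is not None

for sample in ["(555) 123-4567", "5551234567", "this1234567890that", "555-123-4567"]:
    print(repr(sample), is_valid_phone(sample))
```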

Related

Regular expression unexpected pattern matching

I am trying to create a syntax parser using C-Bison and Flex. In Flex I have a regular expression which matches integers based on the following:
Must start with any digit in range 1-9 and followed by any number of digits in range 0-9. (ex. Correct: 1,12,11024 | Incorrect: 012)
Can be signed (ex. +2,-5)
The number 0 must not be followed by any digit (0-9) and must not signed. (ex. Correct: 0 | Incorrect: 012,+0,-0)
Here is the regex I have created to perform the matching:
[^+-]0[^0-9]|[+-]?[1-9][0-9]*
Here is the expression I am testing:
(1 + 1 + 10)
The matches:
1
1
10)
And here is my question, why does it match '10)'?
The reason I used the above expression, instead of the much simpler one,
(0|[+-]?[1-9][0-9]*) is due to inability of the parser to recognise incorrect expressions such as 012.
The problem seems to occur only when the digit '0' comes immediately before the ')'. However, if the '0' is preceded by two or more digits (e.g. 100), then the ')' is not matched.
I know for a fact if I remove [^0-9] from the regex it doesn't match the ')'.
It matches 10) because 1 matches [^+-], 0 matches 0 and ) matches [^0-9].
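Flex picks the longest match at the current position, which is why the three-character alternative wins over the two-character one. The following Python sketch imitates that rule by trying both alternatives separately and keeping the longer lexeme; longest_match is a hypothetical helper, not part of Flex.

```python
import re

ALTERNATIVES = [
    re.compile(r"[^+-]0[^0-9]"),      # meant as "a lone 0", but consumes its neighbours
    re.compile(r"[+-]?[1-9][0-9]*"),  # optionally signed non-zero integer
]

def longest_match(text: str, pos: int) -> str:
    """Return the longest lexeme any alternative matches at pos ('' if none)."""
    best = ""
    for pattern in ALTERNATIVES:
        m = pattern.match(text, pos)
        if m and len(m.group()) > len(best):
            best = m.group()
    return best

expr = "(1 + 1 + 10)"
print(repr(longest_match(expr, expr.index("10"))))

expr = "(1 + 1 + 100)"
print(repr(longest_match(expr, expr.index("100"))))
```

With 100 the first alternative fails, because the character after the first 0 is another digit, so only the integer alternative matches and the ')' is left alone.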
The reason I used the above expression, instead of the much simpler one, (0|[+-]?[1-9][0-9]*) is due to inability of the parser to recognise incorrect expressions such as 012.
How so? Using the above regex, 012 would be recognized as two tokens: 0 and 12. Would this not cause an error in your parser?
Admittedly, this would not produce a very good error message, so a better approach might be to just use [0-9]+ as the regex and then use the action to check for a leading zero. That way 012 would be a single token and the lexer could produce an error or warning about the leading zero (I'm assuming here that you actually want to disallow leading zeros - not use them for octal literals).
Instead of a check in the action, you could also keep your regex and then add another one for integers with a leading zero (like 0[0-9]+ { warn("Leading zero"); return INT; }), but I'd go with the check in the action since it's an easy check and it keeps the regex short and simple.
PS: If you make - and + part of the integer token, something like 2+3 will be seen as the integer 2, followed by the integer +3, rather than the integers 2 and 3 with a + token in between. Therefore it is generally a better idea to not make the sign a part of the integer token and instead allow prefix + and - operators in the parser.
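The suggested approach (a plain [0-9]+ rule, a leading-zero check in the action, and signs left to the parser) could look roughly like this. It is a Python stand-in for the Flex rules, not Flex code; the token names INT and OP and the tokenize function are assumptions made for the sketch.

```python
import re

# One rule for integers, one for the operator/parenthesis characters.
TOKEN = re.compile(r"\s*(?:(?P<INT>[0-9]+)|(?P<OP>[-+()]))")

def tokenize(text: str):
    tokens, warnings, pos = [], [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        pos = m.end()
        if m.lastgroup == "INT":
            lexeme = m.group("INT")
            # The "action": the whole lexeme is one token, so the
            # leading zero can be reported with a useful message.
            if len(lexeme) > 1 and lexeme[0] == "0":
                warnings.append(f"leading zero in {lexeme!r}")
            tokens.append(("INT", lexeme))
        else:
            tokens.append(("OP", m.group("OP")))
    return tokens, warnings

print(tokenize("(1 + 1 + 10)"))
print(tokenize("012"))
print(tokenize("2+3"))
```

Because the sign is its own token, 2+3 comes out as INT, OP, INT, and the parser can decide whether a + or - is a prefix or a binary operator.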

Valid regex expressions won't work with AngularJS

I'd like to check if user input is correct for phone numbers in two formats:
01 1234567 (two numbers, a space, seven numbers)
+12 123 123 123 (plus sign, two numbers, a space, three numbers, a space, three numbers, a space, three numbers)
no character at all (no input)
I wrote a regex for this [0-9]{2} [0-9]{3} [0-9]{3} [0-9]{3}|[0-9]{2} [0-9]{7}|. It works when checked with online regex checkers, but it won't work (user can write whatever they want) when used with AngularJS: ng-pattern="[0-9]{2} [0-9]{3} [0-9]{3} [0-9]{3}|[0-9]{2} [0-9]{7}|".
You need a regex that matches the whole string, with your patterns wrapped in a single optional group:
ng-pattern="/^(?:\+[0-9]{2} [0-9]{3} [0-9]{3} [0-9]{3}|[0-9]{2} [0-9]{7})?$/"
Or, a bit shorter:
ng-pattern="/^(?:\+[0-9]{2}(?: [0-9]{3}){3}|[0-9]{2} [0-9]{7})?$/"
If you define the pattern in a JS file as a variable use
var mypattern = /^(?:\+[0-9]{2}(?: [0-9]{3}){3}|[0-9]{2} [0-9]{7})?$/;
Note that when using regex delimiters the anchors are required for the regex to match entire input.
Details
^ - start of string
(?: - start of an optional non-capturing group:
\+ - a + char
[0-9]{2} [0-9]{3} [0-9]{3} [0-9]{3} (equal to [0-9]{2}(?: [0-9]{3}){3}) - 2 digits and then 3 occurrences of a space, 3 digits
| - or
[0-9]{2} [0-9]{7} - 2 digits, space, 7 digits
)? - end of the optional group
$ - end of string.
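The effect of the anchors and of the optional group can be tried out in Python, whose syntax is the same as JavaScript's for these patterns. In this sketch re.search stands in for an unanchored regex test; how AngularJS treats a plain-string ng-pattern internally is not modelled, and accepts is a hypothetical helper.

```python
import re

# Anchored pattern with the two formats wrapped in one optional group.
ANCHORED = re.compile(r"^(?:\+[0-9]{2}(?: [0-9]{3}){3}|[0-9]{2} [0-9]{7})?$")

# The original pattern: no anchors, and a trailing empty alternative.
UNANCHORED = re.compile(r"[0-9]{2} [0-9]{3} [0-9]{3} [0-9]{3}|[0-9]{2} [0-9]{7}|")

def accepts(pattern: re.Pattern, value: str) -> bool:
    # search() succeeds if the pattern matches anywhere in the value.
    return pattern.search(value) is not None

for value in ["01 1234567", "+12 123 123 123", "", "whatever", "01 12345678"]:
    print(repr(value), accepts(ANCHORED, value), accepts(UNANCHORED, value))
```

The unanchored pattern accepts every input, because its empty last alternative matches at the start of any string.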

Regex to reject if all numbers and reject colon

I am looking for a regex that should:
reject input that is all numbers
accept alphanumeric input
reject a colon ':'
I tried:
ng-pattern="/[^0-9]/" and
ng-pattern="/[^0-9] [^:]*$/"
For example:
"Block1 Grand-street USA" must be accepted
"111132322" must be rejected
"Block 1 grand : " must be rejected
You may use
ng-pattern="/^(?!\d+$)[^:]+$/"
To only forbid a : at the end of the string, use
ng-pattern="/^(?!\d+$)(?:.*[^:])?$/"
The pattern matches
^ - start of string
(?!\d+$) - a negative lookahead that fails the match if the whole string consists of digits only
[^:]+ - one or more chars other than :
(?:.*[^:])? - an optional non-capturing group that matches 1 or 0 occurrences of
.* - any 0+ chars other than line break chars, as many as possible
[^:] - any char other than : (if you do not want to match an empty string, remove the (?: and )? around the group)
$ - end of string.
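Both patterns can be checked against the sample inputs from the question with a short Python sketch (the syntax is the same in Python and JavaScript for these patterns). The constant names are made up for illustration.

```python
import re

NO_COLON_ANYWHERE = re.compile(r"^(?!\d+$)[^:]+$")
NO_COLON_AT_END = re.compile(r"^(?!\d+$)(?:.*[^:])?$")

samples = [
    "Block1 Grand-street USA",
    "111132322",
    "Block 1 grand : ",
    "abc:",
    "a:b",
]

for value in samples:
    print(
        repr(value),
        bool(NO_COLON_ANYWHERE.search(value)),
        bool(NO_COLON_AT_END.search(value)),
    )
```

Note that "Block 1 grand : " ends with a space rather than a colon, so only the first pattern rejects it.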
According to comments, you want to match any character but colon.
This should do the job:
ng-pattern="/^(?!\d+$)[^:]+$/"

Lex/Flex - Split the phone number Up?

I am writing a program that has to split a phone number into its parts. The parts are separated by a hyphen, a space, parentheses '( )', or nothing at all.
Example: Input: 0xx-xxxx-xxxx or 0xxxxxxxxxx or (0xx)xxxx-xxxx
Output: code 1: 0xx
code 2: xxxx
code 3: xxxx
My problem is that sometimes "code 1" is just 0x, in which case "code 2" must be xxxxx (when the first part is 2 digits long, it is always followed by a hyphen or enclosed in parentheses).
Can anyone give me a hand? I would be grateful.
According to your comments, the following regex will extract the information you need
^\(?(0\d{1,2})\)?[- ]?(\d{4,5})[- ]?(\d{4})$
Break down:
^\(?(0\d{1,2})\)? matches 0x, 0xx, (0xx) and (0x) at the beginning of the string
[- ]? as parentheses can only be used for the first group, the only valid separators left are the space and the hyphen. ? means 0 or 1 time.
(\d{4,5}) will match the second group. As the length of the third group is fixed (4 digits), backtracking works out the lengths of groups 1 and 2 automatically.
(\d{4})$ matches the 4 digits at the end of the number.
You can then extract the data from capture groups 1, 2 and 3.
Note: As mentioned in the comments on the question, this only extracts data from correctly formed numbers. It will also match some ill-formed numbers.
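Here is a small Python sketch of the extraction, using the regex above unchanged. split_phone is a hypothetical helper and the digits in the samples are arbitrary; in Flex the same groups would have to be pulled out in the rule's action.

```python
import re

PHONE = re.compile(r"^\(?(0\d{1,2})\)?[- ]?(\d{4,5})[- ]?(\d{4})$")

def split_phone(number: str):
    """Return (code1, code2, code3), or None if the number does not match."""
    m = PHONE.match(number)
    return m.groups() if m else None

samples = [
    "012-3456-7890",
    "01234567890",
    "(012)3456-7890",
    "01-23456-7890",
    "(012-3456-7890",   # ill-formed (unbalanced parenthesis) but still matched
    "12-3456-7890",     # does not start with 0
]

for number in samples:
    print(repr(number), split_phone(number))
```

The unbalanced "(012-3456-7890" is an example of the ill-formed input the note warns about: each parenthesis is optional on its own, so the regex does not check that they come in pairs.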

Posix regex capture group matching sequence

I have the following text string and regex pattern in a C program:
char text[] = "        identification     division. ";
char pattern[] = "^(.*)(identification *division)(.*)$";
Using regexec() library function, I got the following results:
String: identification division.
Pattern: ^(.*)(identification *division)(.*)$
Total number of subexpressions: 3
OK, pattern has matched ...
begin: 0, end: 37,match: identification division.
subexpression 1 begin: 0, end: 8, match:
subexpression 2 begin: 8, end: 35, match: identification division
subexpression 3 begin: 35, end: 37, match: .
I was wondering: since the regex engine matches greedily and the first capture group (.*) matches any number of characters (except newline characters), why doesn't it match all the way to the end of the text string (up to '.') as opposed to matching only the first 8 spaces?
Does each capture group have to be matched?
Are there any rules on how the capture group matches the text string?
Thanks.
Regexes are as greedy as possible, without being too greedy. Had the left group been as greedy as you expect, the group that matches "identification division" would have been unable to match, erroneously rejecting text that is clearly in the language.
Just as you said, if the greedy group (.*) had consumed the whole string, there would be nothing left for the rest of the regex to match, and the overall match would fail. So, yes, each capture group (and every other part of the pattern) needs to be matched. This is exactly what you specified in your regex.
Try the following string instead and run the code with both a reluctant and a greedy first group and you will see the difference.
char text[] = " identification division identification division. ";
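The difference can be seen without recompiling the C program by using Python's re module as a stand-in for regexec(); it supports both greedy and reluctant quantifiers. In this sketch the spacing of the single-occurrence string is reconstructed from the offsets reported in the question (an assumption), and the spacing of the two-occurrence string is arbitrary.

```python
import re

GREEDY = re.compile(r"^(.*)(identification *division)(.*)$")
RELUCTANT = re.compile(r"^(.*?)(identification *division)(.*)$")

def group_spans(pattern: re.Pattern, text: str):
    """Return the (begin, end) offsets of the three capture groups."""
    m = pattern.match(text)
    return [m.span(i) for i in (1, 2, 3)] if m else None

# 8 spaces, 5 spaces between the words, then ". " (37 characters in total).
single = " " * 8 + "identification" + " " * 5 + "division" + ". "
double = "  identification division identification division. "

print(group_spans(GREEDY, single))
print(group_spans(GREEDY, double))
print(group_spans(RELUCTANT, double))
```

With one occurrence the first group has no choice and stops at offset 8. With two occurrences the greedy first group swallows the first "identification division" and the second group matches the last one, while the reluctant first group stops at the leading spaces.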

Resources