Split String at every dot except when some word appears - arrays

I'm really new to regex and am having trouble solving a problem. I would appreciate any help :)
Using ruby 2.4.2
The problem: Split a string at every dot, except when the word asd comes right after the dot
String: str = "qwer.qwer.asd"
Code: str.split(/\./)
Output: ["qwer", "qwer", "asd"]
The output should be: ["qwer", "qwer.asd"]

Use
str.split(/\.(?!asd\b)/)
The \.(?!asd\b) pattern matches any dot that is not immediately followed by asd and a word boundary. The (?!asd\b) is a negative lookahead that fails the match if the lookahead pattern finds a match immediately to the right of the current location.
In case the "word" ends with a period or end of string, use
str.split(/\.(?!asd(?:\.|\z))/)
where (?:\.|\z) is a non-capturing group matching either a dot (\.) or (|) end of string (\z).
See the Ruby demo and a regex demo.
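For anyone who wants to try the idea outside Ruby, here is a minimal JavaScript sketch of the same negative lookahead. It is only an illustration, not the answerer's code: JavaScript has no \z, so $ (without the m flag) stands in for end of string, and the helper names and the extra sample input a.asd-x are made up.

```javascript
// Split on dots, except a dot that is directly followed by the word "asd".
// Hypothetical helper names; the patterns mirror the Ruby ones above.
const splitLoose = (s) => s.split(/\.(?!asd\b)/);

// Stricter variant: "asd" must be followed by a dot or the end of the string.
// JavaScript has no \z, so $ (no m flag) plays that role here.
const splitStrict = (s) => s.split(/\.(?!asd(?:\.|$))/);

console.log(splitLoose("qwer.qwer.asd")); // [ 'qwer', 'qwer.asd' ]
console.log(splitLoose("a.asd-x"));       // [ 'a.asd-x' ]
console.log(splitStrict("a.asd-x"));      // [ 'a', 'asd-x' ]
```

The last two lines show where the two patterns differ: \b treats the hyphen as the end of the word, while the stricter version only protects the dot when asd is followed by another dot or nothing at all.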

Related

Regex: Match string pattern with AND condition

Is there a regex which can match the strings 1 and 2 but not 3:
TABLE_SP_02.csv.gz --match
TABLE.csv.gz --match
TABLE_REMARK.csv.gz --not match
I have many files in the TABLE_SP format, so I would like to match string 2 and every other string starting with TABLE_SP.
Any help is appreciated. Thanks!
I tried a few regex patterns. I was able to match strings 1 and 2 individually. I could write a regex to match them both together. I am using the REGEXP_SUBSTR function in Snowflake for this. I tested the pattern on regex101:
This matches the first 2 but not the 3rd.
^TABLE(_SP|\.)(.*)?csv.gz$
https://regex101.com/r/UE9I5C/1
The TABLE(_SP|\.) part of the pattern is what solves the problem: right after TABLE it requires either _SP or a literal dot, so TABLE_REMARK is rejected.
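A quick way to check the logic against the three file names is a small script. The sketch below is JavaScript, not Snowflake SQL, so treat it as a test of the pattern only; Snowflake's regex dialect and string escaping may differ. The variable names are illustrative, and the second pattern is there purely to show what happens when the dots are left unescaped.

```javascript
// Illustrative check in JavaScript, not Snowflake SQL.
const fileNames = ["TABLE_SP_02.csv.gz", "TABLE.csv.gz", "TABLE_REMARK.csv.gz"];

// After TABLE, require either _SP or a literal dot; the dot in ".gz" is escaped too.
const escaped = /^TABLE(?:_SP|\.).*csv\.gz$/;
// Unescaped dots for comparison: "." matches any character, including "_".
const unescaped = /^TABLE(?:_SP|.).*csv.gz$/;

for (const name of fileNames) {
  console.log(name, escaped.test(name), unescaped.test(name));
}
// TABLE_SP_02.csv.gz true true
// TABLE.csv.gz true true
// TABLE_REMARK.csv.gz false true
```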

Regex empty string or number [duplicate]

I have the following Regular Expression which matches an email address format:
^[\w\.\-]+@([\w\-]+\.)+[a-zA-Z]+$
This is used for validation with a form using JavaScript. However, this is an optional field. Therefore how can I change this regex to match an email address format, or an empty string?
From my limited regex knowledge, I think \b matches an empty string, and | means "Or", so I tried to do the following, but it didn't work:
^[\w\.\-]+@([\w\-]+\.)+[a-zA-Z]+$|\b
To match pattern or an empty string, use
^$|pattern
Explanation
^ and $ are the beginning and end of the string anchors respectively.
| is used to denote alternates, e.g. this|that.
References
regular-expressions.info/Anchors and Alternation
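A minimal JavaScript sketch of the ^$|pattern idea, using the e-mail pattern from the question as the pattern. It assumes @ is the separator between local part and domain; the variable name is made up.

```javascript
// Alternation: either the empty string (^$) or the full e-mail pattern.
// Assumes "@" separates the local part from the domain.
const optionalEmail = /^$|^[\w.\-]+@([\w\-]+\.)+[a-zA-Z]+$/;

console.log(optionalEmail.test(""));                 // true
console.log(optionalEmail.test("user@example.com")); // true
console.log(optionalEmail.test("not an email"));     // false
```

Because | has the lowest precedence, each side of the alternation carries its own ^ and $ anchors.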
On \b
\b in most flavors is a "word boundary" anchor. It is a zero-width match, i.e. an empty string, but it only matches those strings at very specific places, namely at the boundaries of a word.
That is, \b is located:
Between consecutive \w and \W (either order):
i.e. between a word character and a non-word character
Between ^ and \w
i.e. at the beginning of the string if it starts with \w
Between \w and $
i.e. at the end of the string if it ends with \w
References
regular-expressions.info/Word Boundaries
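To see these positions in practice, here is a short JavaScript sketch. The sample strings are arbitrary, and \d+ is only a stand-in for the real pattern from the question.

```javascript
// Mark every position where \b matches with a "|".
console.log("foo bar".replace(/\b/g, "|")); // |foo| |bar|

// \b needs a word character next to it, so it never matches in "" or "!!!".
console.log(/\b/.test(""));    // false
console.log(/\b/.test("!!!")); // false

// Hence "pattern|\b" accepts any input that contains a word character
// and still rejects the empty string.
const attempt = /^\d+$|\b/; // \d+ is a stand-in for the real pattern
console.log(attempt.test("anything")); // true
console.log(attempt.test(""));         // false
```

This is why the attempt with |\b fails in both directions: it lets invalid input through and does not accept an empty field.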
On using regex to match e-mail addresses
Depending on the specification you follow, this is not trivial.
Related questions
What is the best regular expression for validating email addresses?
Regexp recognition of email address hard?
How far should one take e-mail address validation?
An alternative would be to place your regexp in non-capturing parentheses. Then make that expression optional using the ? quantifier, which will look for 0 (i.e. empty string) or 1 instances of the non-captured group.
For example:
/(?: some regexp )?/
In your case the regular expression would look something like this:
/^(?:[\w\.\-]+@([\w\-]+\.)+[a-zA-Z]+)?$/
No | "or" operator necessary!
Here is the Mozilla documentation for JavaScript Regular Expression syntax.
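The optional-group approach can be wrapped in a small helper. This is a rough sketch, not part of the original answer: makeOptional is a hypothetical name, it expects an unanchored pattern source, and the e-mail pattern again assumes @ as the separator.

```javascript
// Hypothetical helper: wrap an unanchored pattern source in an optional
// non-capturing group so that the empty string is accepted as well.
function makeOptional(source) {
  return new RegExp("^(?:" + source + ")?$");
}

// E-mail pattern from the question, assuming "@" is the separator.
const emailSource = String.raw`[\w.\-]+@([\w\-]+\.)+[a-zA-Z]+`;
const optionalEmailGroup = makeOptional(emailSource);

console.log(optionalEmailGroup.test(""));                 // true
console.log(optionalEmailGroup.test("user@example.com")); // true
console.log(optionalEmailGroup.test("user@"));            // false
```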
I'm not sure why you'd want to validate an optional email address, but I'd suggest you use
^$|^[^@\s]+@[^@\s]+$
meaning
^$ empty string
| or
^ beginning of string
[^@\s]+ any character but @ or whitespace
@
[^@\s]+
$ end of string
You won't stop fake emails anyway, and this way you won't stop valid addresses.
\b matches a word boundary. I think you can use ^$ for empty string.
^$ did not work for me if there were multiple patterns in regex.
Another solution:
/(pattern1)(pattern2)?/g
"pattern2" is optional. If empty, not matched.
? matches (pattern2) between zero and one times.
Tested here ("m" is there for multi-line example purposes): https://regex101.com/r/mezfvx/1
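A small JavaScript sketch of what the optional second group does. The patterns (a number followed by an optional px unit) are made up for illustration.

```javascript
// Required number followed by an optional unit (illustrative patterns).
const size = /^(\d+)(px)?$/;

const withUnit = size.exec("12px");
const withoutUnit = size.exec("12");

console.log(withUnit[1], withUnit[2]);       // 12 px
console.log(withoutUnit[1], withoutUnit[2]); // 12 undefined
console.log(size.test(""));                  // false
```

Note that only the second group is optional here. The first group is still required, so an empty string does not match unless the whole expression is made optional.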

Regex to validate particular url

I am using AngularJS. I want to add validation to my URL field. I am a beginner with regular expressions. The URL should start with 'https' or 'http', followed by a host (this can be a name or an IP address). E.g. https://localhost or http://100.100.100.100 should be valid, and ftp://localhost should be invalid because it starts with ftp.
I am using ng-pattern to validate this field. What regular expression should I use? I appreciate your help.
The following regexp should do it, or at least be a good start:
https?:\/\/[0-9A-Za-z.]+
What it does:
http matches the characters http literally (case sensitive)
s? matches the character s literally (case sensitive), zero or one time
? Quantifier: matches between zero and one times, as many times as possible
: matches the character : literally
\/ matches the character / literally (the backslash escapes it)
\/ matches the character / literally (the backslash escapes it)
[0-9A-Za-z.]+ matches any character in the set (0-9 = all digits from 0 to 9, A-Z and a-z = ASCII letters in either case, . = a literal dot)
+ Quantifier: matches between one and unlimited times, as many times as possible
Note that the shorter range A-z is not the same as A-Za-z: it also includes the characters [ \ ] ^ _ ` that sit between Z and a in ASCII.
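Here is a rough JavaScript sketch to try the pattern on the URLs from the question. It adds ^ and $ so the whole string has to match, and spells the letter ranges out as A-Za-z, because a single A-z range covers ASCII 65 to 122 and therefore some punctuation too. The variable name httpUrl is made up.

```javascript
// [A-z] spans the ASCII range 65..122, which also covers [ \ ] ^ _ and `.
console.log(/^[A-z]$/.test("_")); // true
console.log(/^[A-z]$/.test("^")); // true

// Sketch of an anchored validator with explicit letter ranges.
const httpUrl = /^https?:\/\/[0-9A-Za-z.]+$/;
for (const url of ["https://localhost", "http://100.100.100.100", "ftp://localhost"]) {
  console.log(url, httpUrl.test(url));
}
// https://localhost true
// http://100.100.100.100 true
// ftp://localhost false
```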
By the way, this is simple enough that you could have figured it out by yourself. Googling url regex gives tons of other possible solutions.
If you are interested in trying out your regexes, this website will be really useful to you: regex101

Regex pattern matching for input type time

To check whether an input field of type "time" is filled in (e.g. 09:00am), I use a regular expression:
ng-pattern="\b((1[0-2]|0?[1-9]):([0-5][0-9]) ([AaPp][Mm]))"
But the same regular expression should also accept an empty field. In other words, the time field can be either empty or filled in (e.g. 09:30am).
Can anyone help me with this?
In an ng-pattern, you need to use
ng-pattern="/^(?:(?:1[0-2]|0?[1-9]):[0-5]\d\s*[AaPp][Mm])?$/"
and if leading/trailing spaces should make the value invalid, also add ng-trim="false" so that the input is not trimmed before validation.
See this regex demo.
The (?:...)? optional non-capturing group is a wrapper for the whole pattern that becomes optional, i.e. can match an empty string.
The ^ anchor will only match at the start of the string, and $ will anchor the match at the end of the string, so that an entire string should match.
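The same pattern can be tried as a plain JavaScript RegExp outside AngularJS. A minimal sketch with made-up test values:

```javascript
// Same pattern as in the ng-pattern above, as a plain JavaScript RegExp.
const optionalTime12h = /^(?:(?:1[0-2]|0?[1-9]):[0-5]\d\s*[AaPp][Mm])?$/;

const cases = ["", "09:30am", "9:05 PM", "13:00pm", "09:60am", "09:30"];
for (const value of cases) {
  console.log(JSON.stringify(value), optionalTime12h.test(value));
}
// "" true
// "09:30am" true
// "9:05 PM" true
// "13:00pm" false
// "09:60am" false
// "09:30" false
```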
In case somebody is looking for a regex for the 24-hour time format (00:00-23:59), as I was, here is the regex for that:
^(?:1[0-9]|2[0-3]|0?[0-9]):[0-5]\d$
Hope this helps somebody save a minute or two.
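A quick JavaScript check of the 24-hour pattern. This is a sketch with arbitrary sample values; the trailing \d{1}? is written as a plain \d, which matches the same strings.

```javascript
// 24-hour clock: hours 0-23 (leading zero optional), minutes 00-59.
const time24h = /^(?:1[0-9]|2[0-3]|0?[0-9]):[0-5]\d$/;

for (const value of ["00:00", "23:59", "7:45", "24:00", "12:60"]) {
  console.log(value, time24h.test(value));
}
// 00:00 true
// 23:59 true
// 7:45 true
// 24:00 false
// 12:60 false
```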

Matching Regular Expressions In SQL Server

I am trying to extract the id of an Android app from its URL, but I am getting extra characters.
I am using the REPLACE function in SQL Server. Below are two sample URLs, each followed by the output I currently get:
URL: https://play.google.com/store/apps/details?id=com.flipkart.android&hl=en
Output: com.flipkart.android
URL: https://play.google.com/store/apps/details?hl=en_US&id=com.surveysampling.mobile.quickthoughts&referrer=mat_click_id%3Df1901cef59f79b1542d05a1fdfa67202-20150429-5128
Output: en_US&id=com.surveysampling.mobile.quickthoughts&r
I am doing this right now:
SELECT
SUBSTRING(REPLACE(PREVIEW, '&hl=en',''), CHARINDEX('?', PREVIEW) + 4 , 50)
FROM OFFERS_TABLE;
For the 1st URL I get com.flipkart.android, which is correct, but for the 2nd I get en_US&id=com.surveysampling.mobile.quickthoughts&r.
I want to remove en_US&id= from the start of it and &r from the end.
Can someone point me to a post or URL I can refer to?
What you are actually trying to do is extract the string that follows id= up to the next &, which is the separator between parameters in a URL. With that condition in mind, I came up with the following regex.
Regex: (?<=id=)[^&]*
Explanation: The lookbehind assertion (?<=id=) requires the match to be preceded by id=, and [^&]* then takes everything up to the first &.
Regex101 Demo
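The pattern can be exercised with a short JavaScript sketch. JavaScript is used here only to test the regex; how to run a regex from SQL Server depends on the setup and is not covered. The second pattern, which requires ? or & before id=, is an assumed refinement and not part of the answer, and the third URL is made up to show the difference.

```javascript
// Lookbehind: match what follows "id=" up to the next "&".
const naive = /(?<=id=)[^&]*/;
// Assumed refinement: require "?" or "&" before "id=" so that parameters
// such as "click_id=" are not picked up by accident.
const anchored = /(?<=[?&]id=)[^&]*/;

const url1 = "https://play.google.com/store/apps/details?id=com.flipkart.android&hl=en";
const url2 = "https://play.google.com/store/apps/details?hl=en_US&id=com.surveysampling.mobile.quickthoughts&referrer=mat_click_id%3Df1901cef59f79b1542d05a1fdfa67202-20150429-5128";
const tricky = "https://example.com/app?click_id=abc&id=com.example.app"; // made-up URL

console.log(url1.match(naive)[0]);      // com.flipkart.android
console.log(url2.match(naive)[0]);      // com.surveysampling.mobile.quickthoughts
console.log(tricky.match(naive)[0]);    // abc
console.log(tricky.match(anchored)[0]); // com.example.app
```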
It seems like you've made some assumptions about lengths. The &r is appearing because that is where the 50 characters end. You are also getting the en_US because you assumed 4 characters at the beginning, but your second string has more. Perhaps you can split on & and then look for the variable that begins with id=.
It seems like a function like this would help.
http://www.sqlservercentral.com/blogs/querying-microsoft-sql-server/2013/09/19/how-to-split-a-string-by-delimited-char-in-sql-server/
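The split-on-& idea, sketched in JavaScript rather than T-SQL just to show the logic. extractAppId is a hypothetical helper name; a SQL Server version would need a split function like the one linked above.

```javascript
// Hypothetical helper: return the value of the "id" query parameter,
// or null when the URL has no such parameter.
function extractAppId(url) {
  const queryStart = url.indexOf("?");
  if (queryStart === -1) return null;
  // Split the query string into "name=value" pairs.
  const pairs = url.slice(queryStart + 1).split("&");
  // Keep the pair whose name is exactly "id".
  const idPair = pairs.find((pair) => pair.startsWith("id="));
  return idPair ? idPair.slice("id=".length) : null;
}

console.log(extractAppId("https://play.google.com/store/apps/details?id=com.flipkart.android&hl=en"));
// com.flipkart.android
console.log(extractAppId("https://play.google.com/store/apps/details?hl=en_US&id=com.surveysampling.mobile.quickthoughts&referrer=mat_click_id%3Df1901cef59f79b1542d05a1fdfa67202-20150429-5128"));
// com.surveysampling.mobile.quickthoughts
```

Unlike the fixed offset of 4 and length of 50, this does not depend on where the id parameter sits in the URL or how long its value is.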
