I'm trying to build a .NET regex to match SQL Server constant strings... but not Unicode strings.
Here's a bit of SQL:
select * from SomeTable where SomeKey = 'abc''def' and AnotherField = n'another''value'
Note that within a string, two consecutive single quotes escape a single quote.
The regex should match 'abc''def' but not n'another''value'.
I have a regex now that manages to locate a string, but it also matches the Unicode string (starting just after the N):
'('{2})*([^']*)('{2})*([^']*)('{2})*'
Thanks!
This pattern will do most of what you are looking to do:
(?<unicode>n)?'(?<value>(?:''|[^'])*)'
The upside is that it should accurately match any number of escaped quotes. (SomeKey = 'abc''''def''' will match abc''''def''.)
The downside is it also matches Unicode strings, although it captures the leading n to identify it as a Unicode string. When you process the regular expression, you can ignore matches where the match group "unicode" was successful.
The pattern creates the following groups for each match:
unicode: succeeds if the string is a Unicode string; fails to match otherwise
value: the string value; escaped single quotes remain escaped
If you are using .NET regular expressions, you could add (?(unicode)(?<-value>)) to the end of the pattern to suppress matching the value, although the pattern as a whole would still match.
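A minimal sketch of the match-then-filter approach, transposed to Python's `re` module as an assumption (the question targets .NET, where named groups are written `(?<name>...)` rather than `(?P<name>...)`). The helper name `non_unicode_literals` is hypothetical, and `re.IGNORECASE` is added here so that an uppercase `N` prefix is recognized too.

```python
import re

# Python spelling of the pattern above: named groups use (?P<name>...).
SQL_STRING = re.compile(r"(?P<unicode>n)?'(?P<value>(?:''|[^'])*)'", re.IGNORECASE)

def non_unicode_literals(sql):
    """Return the bodies of string literals that have no leading N prefix."""
    return [
        m.group("value")
        for m in SQL_STRING.finditer(sql)
        if m.group("unicode") is None  # skip N'...' literals
    ]

sample = ("select * from SomeTable where SomeKey = 'abc''def' "
          "and AnotherField = n'another''value'")
literals = non_unicode_literals(sample)
print(literals)
```

Matching every literal and discarding the Unicode ones afterwards keeps the pattern short, at the cost of one extra check per match.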
Edit
Having thought about it some more, the following pattern should get closer to what you wanted: it is meant not to match Unicode strings at all. Test it against Unicode strings that contain doubled quotes before relying on it; the above approach might still be more readable.
(?:n'(?:''|[^'])*'[^']*)*(?<!n)'(?<value>(?:''|[^'])*)'
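The lookbehind version can be checked the same way. This sketch again assumes Python's `re` module in place of .NET (only the named group is respelled), and `values` is a hypothetical helper. It tries one Unicode literal without doubled quotes and one with, since the two cases are worth testing separately.

```python
import re

# Same pattern as above, with the named group in Python syntax.
PATTERN = re.compile(r"(?:n'(?:''|[^'])*'[^']*)*(?<!n)'(?P<value>(?:''|[^'])*)'")

def values(sql):
    """Return the captured value of every match of PATTERN in sql."""
    return [m.group("value") for m in PATTERN.finditer(sql)]

# Unicode literal without doubled quotes: it is skipped as intended.
plain = values("a = 'abc''def' and b = n'another'")

# Unicode literal with doubled quotes: check what comes back.
doubled = values("a = 'abc''def' and b = n'another''value'")

print(plain)
print(doubled)
```

In Python's engine the second call also returns `value`, because backtracking lets the match restart at the doubled quote inside the Unicode literal. I would expect a backtracking engine such as .NET's to behave similarly, but that is an assumption, so filtering on the `unicode` group looks like the safer option for such input.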
I have the following Regular Expression which matches an email address format:
^[\w\.\-]+@([\w\-]+\.)+[a-zA-Z]+$
This is used for validation with a form using JavaScript. However, this is an optional field. Therefore how can I change this regex to match an email address format, or an empty string?
From my limited regex knowledge, I think \b matches an empty string, and | means "Or", so I tried to do the following, but it didn't work:
^[\w\.\-]+@([\w\-]+\.)+[a-zA-Z]+$|\b
To match pattern or an empty string, use
^$|pattern
Explanation
^ and $ are the beginning and end of the string anchors respectively.
| is used to denote alternates, e.g. this|that.
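As a rough illustration of `^$|pattern`, here is a sketch in Python rather than JavaScript (an assumption made for easy testing; the regex syntax used is the same in both). The names `EMAIL`, `OPTIONAL_EMAIL` and `is_valid_optional_email` are hypothetical, and the e-mail pattern is the one from the question written with a literal `@` separator.

```python
import re

# The e-mail pattern from the question, without its anchors.
EMAIL = r"[\w.\-]+@([\w\-]+\.)+[a-zA-Z]+"

# Alternation has the lowest precedence, so each branch carries its own anchors.
OPTIONAL_EMAIL = re.compile(r"^$|^" + EMAIL + r"$")

def is_valid_optional_email(text):
    """True for an empty string or a string matching the e-mail pattern."""
    return OPTIONAL_EMAIL.search(text) is not None

for candidate in ["", "user@example.com", "user@", "not an email"]:
    print(repr(candidate), is_valid_optional_email(candidate))
```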
References
regular-expressions.info/Anchors and Alternation
On \b
\b in most flavors is a "word boundary" anchor. It is a zero-width match, i.e. an empty string, but it only matches at very specific places, namely at the boundaries of a word.
That is, \b is located:
Between consecutive \w and \W (either order):
i.e. between a word character and a non-word character
Between ^ and \w
i.e. at the beginning of the string if it starts with \w
Between \w and $
i.e. at the end of the string if it ends with \w
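The positions listed above can be observed directly. This is a small sketch using Python's `re` module (chosen as an assumption; `\b` behaves the same way in JavaScript for ASCII input), with `boundaries` as a hypothetical helper name.

```python
import re

def boundaries(text):
    # Collect every index at which the zero-width \b anchor matches.
    return [m.start() for m in re.finditer(r"\b", text)]

print(boundaries("ab cd"))  # start of "ab", end of "ab", start of "cd", end of "cd"
print(boundaries(""))       # an empty string contains no word boundary
```

Because an empty string has no word characters, `\b` never matches it, which is why adding `|\b` to the pattern does not make an empty input valid.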
References
regular-expressions.info/Word Boundaries
On using regex to match e-mail addresses
This is not trivial, depending on which specification you want to follow.
Related questions
What is the best regular expression for validating email addresses?
Regexp recognition of email address hard?
How far should one take e-mail address validation?
An alternative would be to place your regexp in non-capturing parentheses, then make that group optional using the ? quantifier, which matches 0 (i.e. an empty string) or 1 instances of the non-captured group.
For example:
/(?: some regexp )?/
In your case the regular expression would look something like this:
/^(?:[\w\.\-]+@([\w\-]+\.)+[a-zA-Z]+)?$/
No | "or" operator necessary!
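The optional-group variant can be sketched the same way. Python is used here as an assumption (the group syntax `(?:...)?` is identical in JavaScript), and `OPTIONAL_EMAIL` is a hypothetical name.

```python
import re

# The whole e-mail pattern sits in an optional non-capturing group,
# so the anchors ^ and $ may also meet around nothing at all.
OPTIONAL_EMAIL = re.compile(r"^(?:[\w.\-]+@([\w\-]+\.)+[a-zA-Z]+)?$")

empty_ok = OPTIONAL_EMAIL.search("") is not None
email_ok = OPTIONAL_EMAIL.search("user@example.com") is not None
partial_ok = OPTIONAL_EMAIL.search("user@") is not None

print(empty_ok, email_ok, partial_ok)
```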
Here is the Mozilla documentation for JavaScript Regular Expression syntax.
I'm not sure why you'd want to validate an optional email address, but I'd suggest you use
^$|^[^@\s]+@[^@\s]+$
meaning
^$ empty string
| or
^ beginning of string
[^@\s]+ one or more characters other than @ or whitespace
@ a literal @
[^@\s]+ one or more characters other than @ or whitespace
$ end of string
You won't stop fake emails anyway, and this way you won't stop valid addresses.
\b matches a word boundary. I think you can use ^$ for empty string.
^$ did not work for me if there were multiple patterns in regex.
Another solution:
/(pattern1)(pattern2)?/g
"pattern2" is optional. If empty, not matched.
? matches (pattern2) between zero and one times.
Tested here ("m" is there for multi-line example purposes): https://regex101.com/r/mezfvx/1
How do you escape strings for SQLite table names in C?
I found a document, but it does not give the details: https://www.sqlite.org/lang_keywords.html
This document also says that the SQL text ends at '\x00': https://www.sqlite.org/c3ref/prepare.html
Here is the similar question in python: How do you escape strings for SQLite table/column names in Python?
Identifiers should be wrapped in double quotes if they need escaping, with all double quotes in them escaped by doubling them up. bad"name needs to become "bad""name" to be used in a SQL statement.
SQLite comes with custom versions of the *printf() functions that include formats for escaping SQL identifiers and strings (strings use single quotes in SQL). The one that escapes double quotes for identifiers is %w:
/* %w doubles any embedded double quotes; release the result with sqlite3_free(). */
char *sanitized_ddl = sqlite3_mprintf("CREATE TABLE \"%w\"(\"%w\", \"%w\");",
                                      "bad\"name", "foo bar", "baz");
Ideally, though, you're not going to use table or column names that need escaping, but it's good practice to escape user-supplied names to help protect against SQL injection attacks and the like.
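For comparison, the same quoting rule can be sketched outside C. This uses Python's standard `sqlite3` module as an assumption (the question is about the C API), and `quote_identifier` is a hypothetical helper, not part of any SQLite binding.

```python
import sqlite3

def quote_identifier(name):
    """Wrap a table or column name in double quotes, doubling embedded ones."""
    if "\x00" in name:
        raise ValueError("identifier must not contain a NUL character")
    return '"' + name.replace('"', '""') + '"'

conn = sqlite3.connect(":memory:")
table = quote_identifier('bad"name')
first = quote_identifier("foo bar")
second = quote_identifier("baz")

# Identifiers cannot be bound as parameters, so they are quoted and spliced in;
# the values still go through ordinary ? placeholders.
conn.execute(f"CREATE TABLE {table} ({first}, {second})")
conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (1, 2))

stored_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
rows = conn.execute(f"SELECT {first}, {second} FROM {table}").fetchall()
print(stored_names, rows)
```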
Example:
example becomes 'example'
'a becomes '''a'
Detail:
Do not use the byte value 0 in the string. It causes an error such as unrecognized token: "'", even if you pass the correct zSql length to sqlite3_prepare_v2().
Replace each ' in the string with '', then add ' to the start and the end of the string. ' is the single quote (byte 39).
Using an invalid UTF-8 string is not recommended. According to the parsing code in sqlite3 version 3.28.0, the string does not need to be valid UTF-8, and I have tested that an invalid UTF-8 string can be used as a table name. However, the documentation of sqlite3_prepare_v2() says the SQL statement must be UTF-8 encoded.
I have written a program to confirm that, in sqlite3 version 3.28.0, any byte sequence of length 1 or 2 that does not contain the byte value 0 can be used as a table name, and that the program can read values from that table and list it from the SQLITE_MASTER table.
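The replacement steps described above amount to a few lines. This is a sketch in Python rather than C (an assumption, for brevity), and `single_quote` is a hypothetical helper name.

```python
def single_quote(text):
    """Double every single quote, then wrap the result in single quotes."""
    if "\x00" in text:
        # The answer above advises against byte value 0 in the string.
        raise ValueError("string must not contain byte value 0")
    return "'" + text.replace("'", "''") + "'"

print(single_quote("example"))
print(single_quote("'a"))
```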
I am trying to filter items with a stored procedure using like. The column is a varchar(15). The items I am trying to filter have square brackets in the name.
For example: WC[R]S123456.
If I do a LIKE 'WC[R]S123456' it will not return anything.
I found some information on using the ESCAPE keyword with LIKE, but how can I use it to treat the square brackets as a regular string?
LIKE 'WC[[]R]S123456'
or
LIKE 'WC\[R]S123456' ESCAPE '\'
Should work.
Let's say you want to match the literal its[brac]et.
You don't need to escape the ] as it has special meaning only when it is paired with [.
Therefore escaping [ suffices to solve the problem. You can escape [ by replacing it with [[].
I needed to exclude names that started with an underscore from a query, so I ended up with this:
WHERE b.[name] not like '\_%' escape '\' -- use \ as the escape character
Here is what I actually used:
like 'WC![R]S123456' ESCAPE '!'
The ESCAPE keyword is used if you need to search for special characters like % and _, which are normally wildcards. If you specify ESCAPE, the character that follows the escape character is treated as a literal, so % and _ can be searched for literally.
Here's a good article with some more examples
SELECT columns FROM table WHERE
column LIKE '%[[]SQL Server Driver]%'
-- or
SELECT columns FROM table WHERE
column LIKE '%\[SQL Server Driver]%' ESCAPE '\'
According to documentation:
You can use the wildcard pattern matching characters as literal
characters. To use a wildcard character as a literal character,
enclose the wildcard character in brackets.
You need to escape these three characters %_[:
'5%' LIKE '5[%]' -- true
'5$' LIKE '5[%]' -- false
'foo_bar' LIKE 'foo[_]bar' -- true
'foo$bar' LIKE 'foo[_]bar' -- false
'foo[bar' LIKE 'foo[[]bar' -- true
'foo]bar' LIKE 'foo]bar' -- true
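If the search text comes from code rather than being typed by hand, the bracket rule can be applied programmatically. This is a minimal sketch in Python (an assumption; any client language would do), with `bracket_escape_like` as a hypothetical helper. It only builds the pattern string and does not talk to SQL Server.

```python
import re

def bracket_escape_like(value):
    """Enclose each of %, _ and [ in square brackets, in a single pass."""
    # A single pass matters: chained replacements would re-escape the
    # brackets added by an earlier step.
    return re.sub(r"[\[%_]", lambda m: "[" + m.group(0) + "]", value)

print(bracket_escape_like("WC[R]S123456"))
print(bracket_escape_like("foo_bar"))
print(bracket_escape_like("5%"))
```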
If you need to escape special characters like '_' (underscore), as in my case, and you are unwilling or unable to define an ESCAPE clause, you can enclose the special character in square brackets '[' and ']'.
This explains the meaning of the "weird" string '[[]': it simply encloses the '[' character in square brackets, effectively escaping it.
My use case was to specify the name of a stored procedure with underscores in it as a filter criterion for the Profiler. So I put the string '%name[_]of[_]a[_]stored[_]procedure%' in a TextData LIKE field, and it gave me the trace results I wanted.
Here is a good example from the documentation:
LIKE (Transact-SQL) - Using Wildcard Characters As Literals
There is a problem in that while
LIKE 'WC[[]R]S123456'
and
LIKE 'WC\[R]S123456' ESCAPE '\'
both work for SQL Server, neither works for Oracle.
It seems that there isn't any ISO/IEC 9075 way to recognize a pattern involving a left square bracket.
Instead of '\' or another character on the keyboard, you can also use special characters that aren't on the keyboard. Depending on your use case this might be necessary, if you don't want user input to accidentally be used as an escape character.
Use the following.
To search for user input exactly as entered, use ESCAPE, which requires the following replacements for all special characters (the list below covers all of SQL Server's LIKE wildcards).
The single quote (') is not included here because it does not affect the LIKE clause; it is a matter of string concatenation.
The "-", "^" and "]" characters do not need replacing because we are escaping "[".
String FormattedString = "UserString".Replace("ð","ðð").Replace("_", "ð_").Replace("%", "ð%").Replace("[", "ð[");
Then, in the SQL query, use it as follows. (In a parameterised query, the wildcards can be added to the string after the above replacement.)
To search for an exact string:
like 'FormattedString' ESCAPE 'ð'
To search for values ending with the string:
like '%FormattedString' ESCAPE 'ð'
To search for values starting with the string:
like 'FormattedString%' ESCAPE 'ð'
To search for values containing the string:
like '%FormattedString%' ESCAPE 'ð'
And so on for other pattern matching. Direct user input, however, needs to be formatted as described above.
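The replacement chain above can also be written as a single pass over the input. This sketch is in Python rather than C# (an assumption, to stay consistent with the other examples here); `escape_like` is a hypothetical helper, and the escape character is a parameter so that 'ð', '!' or '\' can be used.

```python
def escape_like(value, escape="ð"):
    """Prefix the escape character itself and the wildcards %, _ and [."""
    if len(escape) != 1:
        raise ValueError("escape must be a single character")
    out = []
    for ch in value:
        if ch == escape or ch in "%_[":
            out.append(escape)  # mark the next character as literal
        out.append(ch)
    return "".join(out)

user_input = "WC[R]S_50%"
# Wildcards are added only after escaping, as described above.
contains_pattern = "%" + escape_like(user_input, "!") + "%"
print(contains_pattern)
```

The resulting pattern would then be passed as a query parameter together with `ESCAPE '!'`; walking the string once avoids having to order the replacements carefully.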
As expected, I get an error when entering some characters not included in my database collation:
(1267, "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='")
Is there any function I could use to make sure a string only contains characters existing in my database collation?
thanks
You can use a regular expression to only allow certain characters. The following allows only letters, numbers and _(underscore), but you can change to include whatever you want:
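import re

# Allow only ASCII letters, digits and underscore.
exp = r'^[A-Za-z0-9_]+$'
my_string = 'some_value_1'  # example input
result = re.match(exp, my_string)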
If a match object is returned, the string is valid; if None is returned, the string is invalid.
I'd look at Python's unicode.translate() and codecs.encode() functions. Both of these would allow more elegant handling of illegal input characters, and IIRC, translate() has been shown to be faster than a regexp for similar use cases (the findings should be easy to google).
From Python's docs:
"For Unicode objects, the translate() method does not accept the optional deletechars argument. Instead, it returns a copy of the s where all characters have been mapped through the given translation table which must be a mapping of Unicode ordinals to Unicode ordinals, Unicode strings or None. Unmapped characters are left untouched. Characters mapped to None are deleted. Note, a more flexible approach is to create a custom character mapping codec using the codecs module (see encodings.cp1251 for an example)."
http://docs.python.org/library/stdtypes.html
http://docs.python.org/library/codecs.html
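One way to apply the encoding idea is simply to try encoding the string and see whether it fails. This is a minimal sketch for Python 3, assuming the column's character set corresponds to Python's "latin-1" codec; MySQL's latin1 is documented as cp1252-based, so the codec name is an assumption to adjust. `fits_encoding` and `replace_unencodable` are hypothetical helper names.

```python
def fits_encoding(text, encoding="latin-1"):
    """True if every character of text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

def replace_unencodable(text, encoding="latin-1"):
    """Swap characters the codec cannot represent for '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)

print(fits_encoding("café"))
print(fits_encoding("日本語"))
print(replace_unencodable("naïve 日本"))
```

Rejecting the input with `fits_encoding` keeps the data unchanged, while `replace_unencodable` trades fidelity for never raising an error.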