Dot in regular expression (regex)

I am using slre (https://code.google.com/p/slre/) as the regex library for a C program.
I want to match an IP address with the following pattern: "^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$"
I get the following compiler warning: Warning: unknown escape sequence '\.'
I also tried '\\.'. The warning is gone, but the program still reports that the string doesn't match.
if (!slre_compile(&slre, settings[i].regex)) {
    printf("Error compiling RE: %s\n", slre.err_str);
}
else if (!slre_match(&slre, settings[i].value, strlen(settings[i].value), captures)) {
    printf("\nSetting '%s' does not match the regular expression!", settings[i].internName);
}
settings[i].regex is a char* with the regular expression I mentioned above
settings[i].value is a char*
the string I am trying to match is 8.8.8.8
Is there any other way to check for a dot?

Try [.]
A dot isn't special inside a character class.

The C compiler is seeing your backslash as an attempt to escape a character in C, in the same way that \n becomes a newline. You need to use a double-backslash:
\\.
The C compiler will turn that into a single backslash and pass that to the regex library.
That's the source of the compiler warning - if it's still not matching after you add the extra backslash then you have a different problem as well.
According to http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx your regex does match 8.8.8.8, so the problem isn't with the regex itself.
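As a minimal sketch of the double-backslash fix, the function below checks the pattern from the question using POSIX regcomp/regexec. That is an assumption made to keep the example self-contained: it is not slre, and slre may support a different subset of regex syntax (so a pattern that works here could still fail there). The name is_ipv4 and the OCTET macro are made up for illustration.

```c
#include <regex.h>
#include <stddef.h>

/* One octet, 0-255, exactly as written in the question. */
#define OCTET "([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])"

/* Returns 1 if s is a dotted-quad IPv4 address, 0 if it is not,
 * and -1 if the pattern fails to compile. */
int is_ipv4(const char *s)
{
    /* "\\." in C source reaches the regex engine as the two
     * characters \. which match one literal dot. */
    static const char pattern[] = "^(" OCTET "\\.){3}" OCTET "$";
    regex_t re;
    int matched;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    matched = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

Calling is_ipv4("8.8.8.8") should report a match with this engine, which suggests looking at the library or the surrounding code rather than the pattern if slre still rejects it.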

Your question is about C, but if you can compile as C++11 you can have a look at raw string literals, where backslashes are not treated as escapes:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2442.htm
std::string literal_string = R"(literal string\. \n)";

Related

How do I fix this error in the C file I am trying to run on the command prompt ? the code is written in the notepad [duplicate]

I have a problem compiling the following exploit code:
http://downloads.securityfocus.com/vulnerabilities/exploits/59846-1.c
I am using "gcc file.c" and "gcc -O2 file.c", but both result in the following errors:
sorbolinux-exec.c: In function ‘sc’:
sorbolinux-exec.c:76: error: stray ‘\302’ in program
sorbolinux-exec.c:76: error: stray ‘\244’ in program
sorbolinux-exec.c:76: error: ‘t’ undeclared (first use in this function)
sorbolinux-exec.c:76: error: (Each undeclared identifier is reported only once
sorbolinux-exec.c:76: error: for each function it appears in.)
I tried compiling them on both Kali Linux and Ubuntu 10.04 (Lucid Lynx) and got the same result.

You have an invalid character on that line. This is what I saw:

You have invalid characters in your source. If your source contains no non-ASCII characters that you want to keep (for example inside a double-quoted string literal), you can simply convert your file back to ASCII with:
tr -cd '\11\12\15\40-\176' < old.c > new.c
The iconv approach stops at the first character it cannot convert, which doesn't help here. The command above works with the example file.

Sure, convert the file to ASCII and blast all Unicode characters away.
It will probably work... But...
You won't know what you fixed.
It will also destroy any Unicode comments. Example: //: A²+B²=C²
It could damage the logic in ways that leave the code broken while making the cause less obvious.
For example: A string with "Smart-Quotes" (“ & ”) or a pointer with a full-width asterisk (＊). After stripping, “SOME_THING” becomes the bare token SOME_THING, which looks like a #define, and ＊SomeType becomes SomeType, which is the wrong type.
Two more surgical approaches to fixing the problem:
Switch fonts to see the character. (It might be invisible in your current font.)
Search with a regular expression for every character outside the 7-bit ASCII range.
In Notepad++ I can search up to FFFF, which hasn't failed me yet.
[\x{80}-\x{FFFF}]
80 is hex for 128, the first extended ASCII character.
After hitting "find next" and highlighting what appears to be empty space, you can close your search dialog and press Ctrl + C to copy to clipboard.
Then paste the character into a Unicode search tool.
I usually use an online one.
http://unicode.scarfboy.com/
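If no editor with such a search is at hand, the same idea can be sketched in C: walk the file contents and report where the first byte outside 7-bit ASCII sits. The struct and function names below are hypothetical, and the column is counted in bytes, not in displayed characters.

```c
#include <stddef.h>

struct src_pos {
    long offset;   /* byte offset of the first hit, or -1 if none */
    size_t line;   /* 1-based line number (meaningful if offset >= 0) */
    size_t col;    /* 1-based column in bytes (meaningful if offset >= 0) */
};

/* Scans len bytes of buf for the first byte with the high bit set. */
struct src_pos find_non_ascii(const char *buf, size_t len)
{
    struct src_pos pos = { -1, 1, 1 };

    for (size_t i = 0; i < len; i++) {
        unsigned char b = (unsigned char)buf[i];
        if (b >= 0x80) {
            pos.offset = (long)i;
            return pos;
        }
        if (b == '\n') {
            pos.line++;
            pos.col = 1;
        } else {
            pos.col++;
        }
    }
    return pos;
}
```

The reported line and column can then be compared with the position in the compiler's "stray" message.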
Example:
I had a bullet point (•) in my code somehow.
The Unicode code point is U+2022, but the file stores it as the three UTF-8 bytes E2 80 A2, which the compiler reports one by one in octal as \342 \200 \242.
So it's not as simple as converting each octal value to hex and smashing them together: "E2 80 A2" is the UTF-8 encoding of the character, not its hexadecimal code point.

I got the same with a character that visibly appeared as an asterisk, but it was a UTF-8 sequence instead:
Encoder * st;
When compiled, it returned:
g.c:2:1: error: stray ‘\342’ in program
g.c:2:1: error: stray ‘\210’ in program
g.c:2:1: error: stray ‘\227’ in program
342 210 227 turns out to be UTF-8 for ASTERISK OPERATOR (Unicode code point U+2217).
Deleting the '*' and typing it again fixed the problem.
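To go from the octal bytes in a "stray" error to a code point you can look up, the bytes have to be decoded as UTF-8. Here is a small sketch of that decoding; utf8_decode is a made-up name, and the function only checks the basic lead/continuation byte structure (it does not reject overlong forms or surrogates).

```c
#include <stddef.h>

/* Decodes the single UTF-8 sequence (1 to 4 bytes) that starts at s.
 * Returns the code point, or -1 if the bytes are not a structurally
 * valid sequence that fits in len bytes. */
long utf8_decode(const unsigned char *s, size_t len)
{
    long cp;
    size_t need;   /* number of continuation bytes expected */

    if (len == 0)
        return -1;
    if (s[0] < 0x80)                { cp = s[0];        need = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; need = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; need = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; need = 3; }
    else
        return -1;   /* a continuation byte cannot start a sequence */

    if (len < need + 1)
        return -1;
    for (size_t i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;
        cp = (cp << 6) | (s[i] & 0x3F);   /* append 6 payload bits */
    }
    return cp;
}
```

Fed the bytes from the errors on this page, \342\210\227 decodes to 0x2217, \342\200\242 to 0x2022, and \302\244 to 0xA4 (the currency sign).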

Whenever the compiler finds a byte that is not valid in C source, it gives this kind of compile error. The errors I got were:
error: stray '\302' in program and error: stray '\240' in program
I had copied a piece of code from a chat in Facebook Messenger, where it contained a special character. After pasting into the Vim editor it looked like the correct character, but the compiler still gave the above errors. Retyping that statement by hand resolved it.

It's perhaps because you copied the code from the Internet, from a page that is UTF-8 encoded rather than plain ASCII. You can convert the code to ASCII with this site:
"http://www.percederberg.net/tools/text_converter.html"
There you can either detect errors manually by converting it back to UTF-8, or you can automatically convert it to ASCII and remove all the stray characters.

This problem appears when you have copied text from an HTML page, or when you have edited the file in a Windows environment and are trying to compile it in a Unix/Solaris environment.
Run "dos2unix" to remove the Windows line-ending characters from the file:
dos2unix fileName.ext fileName.ext

Invalid character in your code.
It is a common copy-paste error, especially when code is copied from Microsoft Word documents or PDF files.

I noticed an issue with the tr command above: it COMPLETELY removes the "smart quotes". It would be better to replace the "smart quotes" with plain ones, like this.
This will give you a quick preview of what will be replaced.
sed 's/[”“]/"/g' File.txt
This will do the replacements and put the result in a new file called WithoutSmartQuotes.txt.
sed 's/[”“]/"/g' File.txt > WithoutSmartQuotes.txt
This will overwrite the original file and keep a backup with the .bk suffix (BSD/macOS sed syntax; GNU sed takes -i.bk instead).
sed -i ".bk" 's/[”“]/"/g' File.txt
http://developmentality.wordpress.com/2010/10/11/how-to-remove-smart-quotes-from-a-text-file/
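The same replacement can be sketched in C for cases where sed is not available. This assumes the text is UTF-8, where the two curly double quotes U+201C and U+201D are the byte sequences E2 80 9C and E2 80 9D; straighten_quotes is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Rewrites the NUL-terminated string s in place, turning each UTF-8
 * curly double quote (U+201C or U+201D) into a plain '"'.
 * Returns the new length of the string. */
size_t straighten_quotes(char *s)
{
    size_t n = strlen(s);
    size_t r = 0;   /* read index */
    size_t w = 0;   /* write index, never ahead of r */

    while (r < n) {
        const unsigned char *p = (const unsigned char *)s + r;
        if (n - r >= 3 && p[0] == 0xE2 && p[1] == 0x80 &&
            (p[2] == 0x9C || p[2] == 0x9D)) {
            s[w++] = '"';   /* three bytes collapse into one */
            r += 3;
        } else {
            s[w++] = s[r++];
        }
    }
    s[w] = '\0';
    return w;
}
```

Other non-ASCII bytes are left untouched, so they still show up in a later search and can be fixed one at a time.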

Codo was exactly right on Oct. 5 that &current[i] is the intended text. The currency symbol was introduced inadvertently when the source was put into HTML, where the leading &curren was rendered as the entity for ¤ (see the original):
http://downloads.securityfocus.com/vulnerabilities/exploits/59846-1.c
Codo's change makes this exploit code compile without error. I did that and was able to use the exploit on Ubuntu 12.04 (Precise Pangolin) to escalate to root privilege.

The explanations given here are correct. I just wanted to add that this problem usually comes from copying the code from somewhere, such as a website or a PDF file, which leaves invalid characters in the code.
Try to find those invalid characters, or just retype the code if you can't. It should compile then.
Source: stray error reason

For me, this error occurred when I copied and pasted code as text into my editor (gedit).
The code was in a text document (.odt). I copied it and pasted it into gedit.
If you did the same, you have to retype the code manually.

Unknown Conversion Type Character `"'

I have the following line of C code:
sprintf (ptr, "<th width=\"25%\">Head One</th>\n");
I have tried combinations of replacements:
Replacing: \" with ""
Replacing % with %%
But on compilation, using a Makefile, all produce the same error:
warning: unknown conversion type character `"' in format
Any suggestions on how to avoid this please?

Use strcpy() since you don't want formatting (or strlcpy()/strcpy_s() for safety).
Doubling the percent should work, and if you get the same error then that of course points at something being wrong in your build environment.
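For reference, a minimal sketch of the doubled-percent version. It uses snprintf instead of sprintf so the buffer size is respected; write_header is a made-up wrapper name.

```c
#include <stdio.h>

/* Writes the header cell into buf (at most n bytes including the
 * terminating NUL). "%%" in the format string produces one literal
 * '%' in the output. Returns snprintf's result: the length of the
 * full text, not counting the NUL. */
int write_header(char *buf, size_t n)
{
    return snprintf(buf, n, "<th width=\"25%%\">Head One</th>\n");
}
```

If this form still triggers the warning, that would support the suspicion that the Makefile is compiling a different or stale copy of the file.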

C compiler warning Unknown escape sequence '\.' using regex for c program

I am using regex to determine whether a command line argument has the .dat extension. I am trying the following regex:
#define to_find "^.*\.(dat)?"
For some reason I am getting the warning I stated in the title of this question. First, is this expression correct? I believe it is. Second, if it is correct, how can I get rid of this warning?
I am coding a C program in Xcode and the above #define is in my .h file.
Thanks!

The warning is coming from the C compiler. It is telling you that \. is not a known escape sequence in C. Since this string is going to a regex engine, you need to escape the backslash itself, like this:
#define to_find "^.*\\.(dat)?"
This regex matches a string in which dat is optional but the dot . is required. If you want the dot to be optional as well, put it inside the parentheses, like this: ^.*(\\.dat)?.
Note that you can avoid escaping the individual metacharacters by enclosing them in square brackets, like this:
#define to_find "^.*([.]dat)?"
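A hedged sketch of how the escaped pattern could be used with POSIX regcomp/regexec, assuming that is the regex engine in play (the question does not say). Note two deliberate changes from the pattern above, both my own: the "?" is dropped and a "$" anchor is added, because without an end anchor the pattern appears to accept any name that contains a dot. has_dat_extension is a hypothetical name.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if name ends in ".dat", 0 if it does not, and -1 if the
 * pattern fails to compile. */
int has_dat_extension(const char *name)
{
    regex_t re;
    int matched;

    /* "\\." is a backslash plus a dot once the C compiler is done,
     * which the regex engine reads as a literal dot. */
    if (regcomp(&re, "^.*\\.dat$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    matched = (regexec(&re, name, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

The bracket form "^.*[.]dat$" would behave the same way and needs no backslash at all.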

You need
#define to_find "^.*\\.(dat)?"
That should do the trick: at this stage the \ has to be escaped for the C compiler, not for the regex engine.

Weird gcc error stray/missing terminating " character in C

I get the following errors:
error: missing terminating " character
and
error: stray `\' in program
In this line of C code:
system("sqlite3 -html /home/user/.rtcom-eventlogger/el.db \"SELECT service_id, event_type_id,free_text, remote_uid FROM Events WHERE remote_uid=\'%d\' ORDER BY start_time DESC;\" > lol.html", nr);
"nr" is a integer variable.
I have gone over this so many times but am totally stuck on finding a solution.
EDIT: The errors are the output from compiling with gcc, in case I didn't make that clear.

Within a double-quoted string in C, I don't think that \' has any meaning. It looks like your backslashing there is meant to protect the single quotes in the shell, which means they should be double-backslashed within the string: remote_uid=\\'%d\\'.

Well, you don't need to escape the single quotes inside the string (e.g. \' should just be '), but I'm not sure that that would cause the error you're seeing.
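Separately from the quoting, system() takes a single string and does no printf-style formatting, so the %d has to be filled in first. A minimal sketch of that step, with the single quotes left unescaped (in a C string literal \' and ' both yield a plain quote); build_query_command is a made-up helper name, and the command text is copied from the question.

```c
#include <stdio.h>

/* Formats the sqlite3 command line for the given id into buf.
 * Returns 0 on success, -1 if formatting failed or the command
 * did not fit into n bytes. */
int build_query_command(char *buf, size_t n, int nr)
{
    int len = snprintf(buf, n,
        "sqlite3 -html /home/user/.rtcom-eventlogger/el.db "
        "\"SELECT service_id, event_type_id,free_text, remote_uid "
        "FROM Events WHERE remote_uid='%d' "
        "ORDER BY start_time DESC;\" > lol.html", nr);

    return (len < 0 || (size_t)len >= n) ? -1 : 0;
}
```

The caller would then pass the finished buffer to system(buf) only when the function returns 0.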

I had the same problem, trying to do basically the same thing.
My problem was that I used WinZip to decompress the source. After using 7z it worked fine.

In my case I had an external define variable with escaped ", like this:
#define DEFINE \"string\"
It was transcluded into code like this:
cout << DEFINE; // source code
cout << \"string\"; // source code during compilation
