I'm trying to run a regex through a system() call in my code. I have gone through the Stack Overflow threads on similar warnings, but I couldn't work out how to fix the warnings below. They seem to come only from the closing braces. If I write \\} instead, the warnings disappear, but I no longer get the expected output in the redirected file.
#include<stdio.h>
int main(){
FILE *in;
char buff[512];
if(system("grep -o '[0-9]\{1,3\}\\.[0-9]\{1,3\}\\.[0-9]\{1,3\}\\.[0-9]\{1,3\}' /home/santosh/Test/text >t2.txt") < 0){
printf("system failed:");
exit(1);
}
}
Warnings:
dup.c:9:11: warning: unknown escape sequence '\}'
dup.c:9:11: warning: unknown escape sequence '\}'
dup.c:9:11: warning: unknown escape sequence '\}'
dup.c:9:11: warning: unknown escape sequence '\}'
dup.c: In function 'main':
In C string literals the \ has a special meaning: it introduces escape sequences such as the line ending \n. If you want to put a \ in a string, you need to use \\.
For example
"\\Hello\\Test"
will actually result in "\Hello\Test".
So your regexp needs to be written as:
"[0-9]\\{1,3\\}\\.[0-9]\\{1,3\\}\\.[0-9]\\{1,3\\}\\.[0-9]\\{1,3\\}"
instead of:
"[0-9]\{1,3\}\\.[0-9]\{1,3\}\\.[0-9]\{1,3\}\\.[0-9]\{1,3\}"
Sure, this is painful, because \ is used as the escape character for the regexp and again as the escape character for the string literal.
So basically: whenever you want a single \ to reach the shell, you need to write \\ in the C source.
When I compile a C file with the contents below:
#include <stdio.h>
#define FILE_NAME "text\ 1"
int main()
{
FILE* file_ptr = fopen(FILE_NAME, "w");
fclose(file_ptr);
return 0;
}
I get this warning:
tt.c: In function ‘main’:
tt.c:6:37: warning: unknown escape sequence: '\040'
6 | FILE* file_ptr = fopen(FILE_NAME, "w");
|
I know it is caused by the \ in a string in my C code, and that octal 40 is decimal 32, the ASCII code of SPACE. Why is the warning '\040' and not '\x20'?
It also seems that Bash turns \ followed by a space into \040 before sending it to binaries (not sure).
Is there a rule that forces this?
Update: deleted '\32', which I had used to represent the ASCII code of SPACE in decimal.
How did I encounter this problem?
I just wanted to know how Bash processes an ESCAPED SPACE. I thought Bash turned it into a SPACE, but after checking the Bash source code (hard for me), I found that Bash may treat \ as an ordinary character, since the definitions below do not mention an escaped space:
#define slashify_in_quotes "\\`$\"\n"
#define slashify_in_here_document "\\`$"
#define shell_meta_chars "()<>;&|"
#define shell_break_chars "()<>;&| \t\n"
#define shell_quote_chars "\"`'"
So I think Bash passes the \ through to the command or binary to process, so I wrote the simple C file above to check how C treats \ followed by a space.
So my question is: why does gcc warn about '\040' and not '\x20'?
How Bash treats \ is something I still need to check...
Answer to Updated Question
Why the warning is '\040' not '\x20'?
This is merely a choice by the compiler implementors. When you have \ in a string or character constant followed by something that is not a recognized escape sequence, the compiler warns you. For example, if you had \g, the compiler would warn you that \g is not recognized. When the character after the \ may be unclear, because it is a white space character that cannot be distinguished from others (like space from tab) or is not a printable character, the compiler shows it by value in the error message. This helps you find the exact character in your text editor, in case some unprintable character has slipped into the source code. The compiler authors could have used hexadecimal but simply chose to use octal.
I will fault them for using an inconsistent style. In GCC 10.2, \g results in the message unknown escape sequence: '\g', but \ results in the message unknown escape sequence: '\040'. These should either be:
unknown escape sequence: 'g' and unknown escape sequence: '\040' or
unknown escape sequence: '\g' and unknown escape sequence: '\\040'.
Answer to Original Vague Question
C 2018 6.4.4.4 specifies character constants in C source code, and paragraph 1 lists four choices for escape-sequence: simple-escape-sequence, octal-escape-sequence, hexadecimal-escape-sequence, and universal-character-name.
An octal-escape-sequence is \ followed by one to three octal digits. Thus, \040 is the character with code 040 in octal, which is 32 in decimal, and \32 is the character with code 32 in octal, which is 26 in decimal.
There is no decimal escape sequence; \32 is an octal escape sequence, not decimal. (Also note that because octal escape sequences can have various lengths, if one wishes to follow it by an octal digit, one must use all three allowed digits. \324 will be parsed as one character, not as \32 followed by 4, whereas \0324 is \032 followed by 4.)
A hexadecimal-escape-sequence is \x followed by any positive integer number of hexadecimal digits. \x20 is equal to \040.
(A simple-escape-sequence is one of \', \", \?, \\, \a, \b, \f, \n, \r, \t, or \v. A universal-character-name is \u followed by four hexadecimal digits or \U followed by eight hexadecimal digits.)
I created code here that is supposed to determine if a URL contains an invalid set of characters, and regex may be a good way to go.
The problem here is that the target string in this code (stored in the value of the char array variable "find") is not being taken as a valid match even though my regex means match any character between square brackets at least once, and an exclamation mark is listed in the character set.
Also, when compiling with all warnings on, I receive these warnings:
./test2.c:6:25: warning: unknown escape sequence '\#'
./test2.c:6:25: warning: unknown escape sequence '\!'
./test2.c:6:25: warning: unknown escape sequence '\$'
./test2.c:6:25: warning: unknown escape sequence '\&'
./test2.c:6:25: warning: unknown escape sequence '\-'
./test2.c:6:25: warning: unknown escape sequence '\;'
./test2.c:6:25: warning: unknown escape sequence '\='
./test2.c:6:25: warning: unknown escape sequence '\]'
./test2.c:6:25: warning: unknown escape sequence '\_'
./test2.c:6:25: warning: unknown escape sequence '\~'
And the one that bugs me is:
./test2.c:6:25: warning: unknown escape sequence '\]'
because if I don't escape it, then I'm using it to end a set of characters to check for, yet I want that character to be included as a literal character in the check.
What can I do to fix this regex issue?
I want to be able to make an Apache module from this afterwards in C, so that if a hacker tries using strange, unacceptable characters in the URL, he will be directed to an error page. Once I figure this regex mess out, I'll be on my way.
This is my code so far:
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
int main(){
const char* regex="/^[\#\!\$\&\-\;\=\?\[\]\_\~]+$/";
const char* find="!!!";
regex_t r;int s;
if ((s=regcomp(&r,regex,REG_EXTENDED)) != 0){
printf("Error compiling\n");return 1;
}
const int maxmat=10;
regmatch_t ml[maxmat];
if (regexec(&r,find,maxmat,ml,0) != 0){
printf("No match\n");
}else{
printf("Matched");
}
regfree(&r);
return 0;
}
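This regex seems to work for me:
char* regex="(.*)[#!$&-;=?_~]+";
(Note that &-; inside the brackets is a range running from & to ;, so this class also accepts digits and other punctuation. Move the - to the end of the class if only a literal hyphen is intended.)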
The various warnings you got were from the C compiler itself, not the regex compiler. The C compiler does not know anything about regular expressions or character sets. It does know about string literals, and the escape character for C strings is also '\', so it is trying to interpret each of the backslashes as the start of a C string escape sequence, for things like:
\n - newline
\" - quote character
\\ - backslash character
In order to pass a backslash to the regex engine, you must first escape it in the C string literal. Simply replace all of your \ with \\ and you will have more luck with your regular expressions.
If you can compile with a C++11-compliant compiler, you also have the option of using raw strings, which get rid of all of the escaping needed in normal C strings:
strlen("\n") => 1
strlen(R"(\n)"); => 2
In the second case the string starts with R"( and continues until it finds )". So the second string consists of two characters \ and n rather than a single newline character.
This is very handy for using with regular expressions as it does not require multiple levels of escape characters.
A common beginner mistake is the assumption that you need or want to backslash stuff in a regular expression character class. You don't; inside square brackets, every character represents just itself. There are a few special cases which require special handling, but not with backslashing.
If you want a literal ^ in the character class, it mustn't go first.
If you want a literal ] in the character class, it needs to go first (after any ^ to specify negation).
If you want a literal - in the character class, it needs to go last (or first, when the class does not also need a literal ], which has to take the first position).
By convention, if you want both ] and [, you usually put them next to each other.
So, you want
const char* regex="^[][#!$&;=?_~-]+$";
The slashes you had before and after the regex looked like you thought they were necessary or useful as regex separators; but they're not, so I took them out.
This will match a string consisting solely of the characters in your class. By your description, that's not really what you want. But you don't need a regex for finding an occurrence of one of these characters somewhere in a string; look at the general C string search functions.
After reading over some K&R C, I saw that printf can "recognize %% for itself". I tested this and it printed "%". I then tried "\%", which also printed "%".
So, is there any difference?
Edit for code request:
#include <stdio.h>
int main()
{
printf("%%\n");
printf("\%\n");
return 0;
}
Output:
%
%
Compiled with GCC using -o
GCC version: gcc (SUSE Linux) 4.8.1 20130909 [gcc-4_8-branch revision 202388]
%% is not a C escape sequence, but a printf formatter acting like an escape for its own special character.
\% is illegal because it has the syntax of a C escape sequence, but no defined meaning. Escape sequences besides the few listed as standard are compiler-specific. In all likelihood the compiler ignored the backslash, and printf did not see any backslash at runtime. If it had, it would have printed the backslash in the output, because backslash is not special to printf.
The two are not the same. "%%" will print %, but with "\%" you will get a compiler warning:
[Warning] unknown escape sequence: '%' [enabled by default]
The warning is self explanatory that there is no escape sequence like \% in C.
6.4.4.4 Character constants says:
The double-quote " and question-mark ? are representable either by themselves or by the escape sequences \" and \?, respectively, but the single-quote ' and the backslash \ shall be represented, respectively, by the escape sequences \' and \\.
It is clear that % can't be represented as \%. There isn't any \% in C.
When "%%" is passed to printf it will print % to standard output, but "\%" is not a valid escape sequence in C. Hence the program will compile, but it will not print anything and will generate a warning:
warning: spurious trailing ‘%’ in format [-Wformat=] printf("%");
The list of escape sequences in C can be found in Escape sequences in C.
This won't print % for the second printf.
int main()
{
printf("%%\n");
printf("\%");
printf("\n");
return 0;
}
Output:
%
#include<stdio.h>
#include<conio.h>
FILE *fp;
int main()
{
int val;
char line[80];
fp=fopen("\Users\P\Desktop\Java\a.txt","rt");
while( fgets(line,80,fp)!=NULL )
{
sscanf(line,"%d",&val);
printf("val is:: %d",val);
}
fclose(fp);
return 0;
}
Why is there a compile error in the line fp=fopen("\Users\P\Desktop\Java\a.txt","rt")?
Escape your backslashes.
fp=fopen("\\Users\\P\\Desktop\\Java\\a.txt","rt");
xx.c:8:12: error: \u used with no following hex digits
fp=fopen("\Users\P\Desktop\Java\a.txt","rt");
^
xx.c:8:12: warning: unknown escape sequence '\P'
xx.c:8:12: warning: unknown escape sequence '\D'
xx.c:8:12: warning: unknown escape sequence '\J'
The issue is with the backslash. Backslash is an escape character in a C char string.
Try this
fp=fopen("\\Users\\P\\Desktop\\Java\\a.txt","rt");
or this depending on your OS:
fp=fopen("/Users/P/Desktop/Java/a.txt","rt");
You may be familiar with how "\n" (newline) and "\t" (tab) are used in C-strings.
The compiler will look at any \<Character> and try to interpret it as an Escape-Sequence.
So, where you wrote "\Users\P\Desktop\Java\a.txt", the compiler is trying to treat
\U, \P, \D, \J and \a as special escape-sequences.
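(The only one that is valid is \a, which is the Bell/Beep sequence. \U starts a universal character name and produces an error without hex digits after it; \P, \D and \J are unknown escape sequences and produce warnings.)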
As others have said, use \\ to insert a literal Backslash character, and not start an escape sequence.
P.S. Shame on you for not including the compiler message in your question.
The worst questions all say, "I got an error", without ever describing what the error was.
I was playing around a bit with the C preprocessor, when something which seemed so simple failed:
#define STR_START "
#define STR_END "
int puts(const char *);
int main() {
puts(STR_START hello world STR_END);
}
When I compile it with gcc (note: similar errors with clang), it fails, with these errors:
$ gcc test.c
test.c:1:19: warning: missing terminating " character
test.c:2:17: warning: missing terminating " character
test.c: In function ‘main’:
test.c:7: error: missing terminating " character
test.c:7: error: ‘hello’ undeclared (first use in this function)
test.c:7: error: (Each undeclared identifier is reported only once
test.c:7: error: for each function it appears in.)
test.c:7: error: expected ‘)’ before ‘world’
test.c:7: error: missing terminating " character
Which sort of confused me, so I ran it through the pre-processor:
$ gcc -E test.c
# 1 "test.c"
# 1 ""
# 1 ""
# 1 "test.c"
test.c:1:19: warning: missing terminating " character
test.c:2:17: warning: missing terminating " character
int puts(const char *);
int main() {
puts(" hello world ");
}
Which, despite the warnings, produces completely valid code (the last four lines of the output)!
If macros in C are simply a textual replacement, why is it that my initial example fails? Is this a compiler bug? If not, where in the standards does it have information pertaining to this scenario?
Note: I am not looking for how to make my initial snippet compile. I am simply looking for info on why this scenario fails.
The problem is that even though the code expands to " hello world ", it's not being recognized as a single string literal token by the preprocessor; instead, it's being recognized as the (invalid) sequence of four tokens: a lone ", then hello, then world, then another lone ".
N1570:
6.4 Lexical elements
...
3 A token is the minimal lexical element of the language in translation phases 7 and 8. The categories of tokens are: keywords, identifiers, constants, string literals, and punctuators. A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6. The categories of preprocessing tokens are: header names, identifiers, preprocessing numbers, character constants, string literals, punctuators, and single non-white-space characters that do not lexically match the other preprocessing token categories.(69) If a ' or a " character matches the last category, the behavior is undefined. Preprocessing tokens can be separated by white space; this consists of comments (described later), or white-space characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both. As described in 6.10, in certain circumstances during translation phase 4, white space (or the absence thereof) serves as more than preprocessing token separation. White space may appear within a preprocessing token only as part of a header name or between the quotation characters in a character constant or string literal.
(69) An additional category, placemarkers, is used internally in translation phase 4 (see 6.10.3.3); it cannot occur in source files.
Note that neither ' nor " are punctuators under this definition.
The preprocessor runs in multiple phases. Phase 3, tokenization, occurs before expansion, so preprocessor macros must represent full tokens. In your case, STR_START and STR_END are tokenized and then substituted, which makes those tokens invalid.
Here
#define STR_START "
the compiler expects a string literal. A string literal must end with a closing quote. That's why the compiler complains about a missing terminating " character.
After macro expansion the compiler complains again, because the tokens are invalid.
For example, the MSVC compiler phrases the complaint differently:
error C2001: newline in constant
and after expansion it complains about missing quotes.