This question already has an answer here:
Why is "\?" an escape sequence in C/C++?
(1 answer)
Closed 8 years ago.
I was reading this and found the escape \?. What exactly does this escape mean? Is it a literal ? inside a string (I still can't see a reason for that), or is this a BNF grammar rule that I don't know about?
It specifies a literal question mark. See http://en.wikipedia.org/wiki/Digraphs_and_trigraphs
The backslash is used as a marker character to tell the compiler/interpreter that the next character has some special meaning. What that next character means is up to the implementation. For example C-style languages use \n to mean newline and \t to mean tab.
The use of the word "escape" really means to temporarily escape out of parsing the text and into another mode where the subsequent character is treated differently.
It exists because of a feature called trigraphs: three-character sequences that start with two question marks and are replaced by another character. \? specifies a literal question mark, so writing ?\? keeps two adjacent question marks from being read as the start of a trigraph.
From C11
C11 §6.4.4.4 Character constants, paragraph 4
The double-quote " and question-mark ? are representable either by
themselves or by the escape sequences \" and \?, respectively, but the
single-quote ' and the backslash \ shall be represented, respectively,
by the escape sequences \' and \\.
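As a rough illustration of the point above, here is a minimal C sketch. The function names are hypothetical, chosen only for this example. It assumes a compiler mode in which trigraph replacement is active (for instance, the trigraph ??! stands for |), which is where the \? escape earns its keep.

```c
#include <string.h>

/* "\?" and "?" spell the same one-character string. */
int escape_matches_plain(void)
{
    return strcmp("\?", "?") == 0;
}

/* Escaping the second question mark means the source never contains two
   adjacent question marks, so no trigraph can be formed, and the string
   still ends in two question marks and an exclamation mark. */
const char *surprised(void)
{
    return "Really?\?!";
}
```

Without the escape, a compiler that performs trigraph replacement would turn the last three characters of the literal into a single | character.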
I have one question.
I'm writing some code in C, on UNIX.
I need to write a special character in a file, because I need to divide my file into small sections.
Example:
'SPECIAL_CHARACTER'
section 1 with some text
'SPECIAL_CHARACTER'
section 2 with some text
etc..
I was thinking of using the character '\1'. It seems to work, but is it OK? Or is it wrong?
What should I do to achieve this without using characters like "\0" or "\n"?
I hear two different questions where you ask "Or is it wrong?"
I hear you asking "how can I designate a separator byte in my code?", and I hear you asking "what is a good choice for a separator byte?"
First, fundamentally, what you are asking about is covered in section 6.4.4.4 of the C language specification, which covers "C Character Constants". There are various places you can look up the formal C language spec, or you can search for "C Character Constants" for perhaps a friendlier description, etc.
In detail, a handful of letters can be used in escape sequences to stand in for single bytes of specific values; e.g., \n is one of those, as a stand-in for 0x0a (decimal 10), a byte designated (in ASCII) as a newline. Here are the legal ones:
\a \b \f \n \r \t \v
The escape sequences \0 and \1 work because C supports using \ followed by digits as an octal value. So, that'll also work with, say, \3 and \35, but not \9, and note that \35 has a decimal value of 29. (Google "octal values" if you don't immediately see why that's the case.)
There are other legal escape sequences:
\' \" \\ \? : ' " \ and ?, respectively
\xNNNN... : each 'N' can be a hexadecimal digit
And, of course, escape sequences are just one aspect of C character constants.
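To make the octal and hexadecimal forms concrete, here is a small sketch. The constant and function names are made up for this example. The one extra fact it relies on is that a hex escape has no length limit: it consumes every hex digit that follows it.

```c
#include <stdio.h>

/* Character constants have type int in C, so they can initialize enums. */
enum {
    OCTAL_1  = '\1',    /* octal 1  = decimal 1 */
    OCTAL_35 = '\35',   /* octal 35 = 3*8 + 5 = decimal 29 */
    HEX_1E   = '\x1e'   /* hex 1e   = 1*16 + 14 = decimal 30 */
};

/* Writing "\x1eab" would be read as ONE escape with the hex digits 1eab.
   Splitting the literal ends the escape after 1e; the compiler then joins
   the two adjacent literals into one string. */
const char rs_then_text[] = "\x1e" "ab";

/* Print the numeric values of the constants above. */
void print_escape_values(void)
{
    printf("%d %d %d\n", OCTAL_1, OCTAL_35, HEX_1E);
}
```

The array rs_then_text holds four bytes: 0x1e, 'a', 'b', and the terminating null.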
Second, whether or not you should use a given byte value as your file's section separator depends entirely on how your program will be used. As others have pointed out in the comments, there are commonplace prevailing practices on what sort of byte value to use for this sort of thing.
I personally agree that 0x1e makes perhaps the most sense since in ASCII it is the "record separator". Conforming to ASCII can matter if the data will need to be understood by other programs, or if your program will need to be understood by other people.
On the other hand, a simple code comment can make it clear to anyone reading your code what byte value you are using for separating sections of your data file, and any program that needs to understand your data files needs to 'know' a lot more about the file format than just what the record separator is. There is nothing magical about 0x1e : it is merely a convention, and a reserved spot on the ASCII table to facilitate a common need -- that is, record separation of text that could contain normal text separators like space, newline, and null.
Broadly, any byte value that won't show up in the contents of your sections would make a fine section separator. Since you say those contents will be text, there are well over 100 choices, even if you exclude \0 (0x00) and \n (0x0a). In ASCII, a handful of byte values have been set aside for this sort of purpose, so that helps reduce the choice from several dozen to just several. Even among those several, there are only a few commonly used as separators.
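Here is one way the separator idea might look in code. This is only a sketch under assumptions: the names SECTION_SEP, write_sections and count_sections are invented for this example, 0x1e is just the conventional choice discussed above, and the section text is assumed never to contain that byte.

```c
#include <stdio.h>

/* ASCII "record separator" (RS), decimal 30. Any byte that cannot occur
   in the section text would work equally well. */
#define SECTION_SEP '\x1e'

/* Write each section preceded by the separator byte.
   Returns 0 on success, -1 on a write error. */
int write_sections(FILE *fp, const char *sections[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (fputc(SECTION_SEP, fp) == EOF || fputs(sections[i], fp) == EOF)
            return -1;
    }
    return 0;
}

/* Count the sections in a file by counting separator bytes. */
size_t count_sections(FILE *fp)
{
    size_t n = 0;
    int c;  /* int, not char, so EOF can be told apart from data bytes */

    while ((c = fgetc(fp)) != EOF) {
        if (c == SECTION_SEP)
            n++;
    }
    return n;
}
```

A caller would open the file with fopen, pass an array of section strings to write_sections, and later rewind or reopen the file before reading it back.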
This question already has an answer here:
Simple parsing question [duplicate]
(1 answer)
Closed 5 years ago.
In many C expressions, whitespace is ignored (for example, in a**b, where b is a pointer). But in a few cases it cannot be ignored. We get many SO posts on x+++y and related questions (c++ spaces in operators, what are the rules). I know x+++y really means (x++) + y because of higher precedence for postfix. Also, there is a difference between x++ +y and x+ ++y.
So whitespace is not always ignored in C expressions. I want to know what the rule for whitespace in expressions is. Where is it defined? When is it not ignored? Is it when two operators come one after the other, especially the increment/decrement operators?
Whitespace is only relevant for creating tokens. + and ++ are both valid tokens. The rule in C is that a token is formed from the longest sequence of characters that would create a valid token, so "++" without whitespace becomes a single "++" token while "+ +" with a space character becomes two "+" tokens. Since there is no "+++" token, "+++" becomes a "++" token followed by a "+" token.
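The longest-token rule can be seen in a few one-line functions. This is just a sketch; the function names are hypothetical and exist only to give each expression somewhere to live.

```c
/* No "+++" token exists, so the tokens are x, ++, +, y: (x++) + y. */
int munch_post_increment(int x, int y)
{
    return x+++y;
}

/* The space ends the first token early: x, +, ++, y: x + (++y). */
int space_forces_pre_increment(int x, int y)
{
    return x+ ++y;
}

/* No "**" token exists in C, so the tokens are a, *, *, b: a * (*b). */
int multiply_by_pointee(int a, int *b)
{
    return a**b;
}
```

With x = 1 and y = 2, the first function returns 3 (the old value of x plus y) and the second returns 4. The same rule explains why x+++++y does not compile: it is split into x ++ ++ + y, never into x ++ + ++ y.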
I am currently learning about regex and I am trying to figure out how to capture a string in C that does not allow newlines. I have searched around and found answers regarding flex and lex, but I'm trying to learn it as simply as I can to gain a better understanding.
This is an expression that I found while searching, and it appears to be common (I have found it a lot). But I have yet to find a clear explanation of what it means and how it is used.
\"(\\.|[^"])*\"
What this expression means is that there must be a doublequote \" at the beginning and at the end, and between them a sequence of zero or more of the following:
A backslash character \\ followed by any single character ., or
A non-doublequote character [^"]
The non-doublequote clause is self-explanatory. The backslash clause is there to treat any single character preceded by a backslash as an escape sequence, so an escaped doublequote does not end the match. This ensures that the expression would capture any of the following strings to the end:
"string \"one\" has embedded doublequotes"
"string two \
is split across \
multiple lines"
"string\tthree\nhas\tembedded\tescape\tcharacters"
Apologies for the vagueness; I barely know how to pose this question.
Can anyone tell me the name of that family of 3 character constructs that represent another character or characters?
I think they were used in the old VT100 terminal days.
I know C supports them.
They are called trigraphs. There are also two-character codes called digraphs.
They are called trigraph sequences. E.g. ??/ maps to \. You have to take care to remember this when building regular expression-type parsers for C code.
I am doing programs in The C Programming Language by Kernighan and Ritchie.
I am currently at exercise 1-24 that says:
Write a program to check a C Program for rudimentary syntax errors
like unbalanced parentheses, brackets and braces. Don't forget about
quotes, both single and double, escape sequences, and comments.
I have done everything well, but I don't understand how escape sequences would affect these parentheses, brackets and braces.
Why did they warn about escape sequences?
In "\"", there are three double quote characters, but still it's a valid string literal. The middle " is escaped, meaning the outer two balance each other. Similarly, '\'' is a valid character literal.
Parentheses, brackets and braces are not affected, unless of course they appear in a string literal that you don't parse correctly because of an escaped quote.
I'd guess they mean that you need to differentiate between " (which starts or ends a string) and \" (which is a " character, possibly inside a string)
This is important if you're to avoid reporting e.g. strlen("\")"); as having unbalanced parentheses.
The obvious possibility would be an escaped quote inside a string. If you don't take the escape into account, you might think the string ended there. For example: "\")\"". The ) is part of the string literal, so it doesn't count as a mis-matched parenthesis.
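A minimal sketch of that idea, for parentheses only: the function name parens_balanced is made up, and comments, brackets and braces are deliberately left out, so this is not a full solution to the exercise. The key step is skipping the character after a backslash while inside a literal.

```c
#include <stddef.h>

/* Return 1 if ( and ) balance outside string and character literals,
   and every literal is closed; return 0 otherwise. */
int parens_balanced(const char *src)
{
    int depth = 0;
    char quote = 0;  /* 0 outside a literal, else the opening quote char */

    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i];

        if (quote) {
            if (c == '\\' && src[i + 1] != '\0')
                i++;              /* skip the escaped character, e.g. \" */
            else if (c == quote)
                quote = 0;        /* an unescaped quote closes the literal */
        } else if (c == '"' || c == '\'') {
            quote = c;
        } else if (c == '(') {
            depth++;
        } else if (c == ')') {
            if (--depth < 0)
                return 0;         /* a closer with no matching opener */
        }
    }
    return depth == 0 && quote == 0;
}
```

Given the text strlen("\")"); this reports balanced parentheses, because the ) between the quotes is consumed as part of the string literal.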