Rule for ignoring whitespace in C expressions [duplicate]

In many C expressions, whitespace is ignored (for example, a**b, where b is a pointer, means the same as a * *b). But in a few cases it cannot be ignored. There are many SO posts on x+++y and related questions (c++ spaces in operators, what are the rules). I know x+++y really means (x++) + y because postfix has higher precedence. There is also a difference between x++ +y and x+ ++y.
So whitespace is not always ignored in C expressions. I want to know the rule for whitespace in expressions. Where is it defined? When is it not ignored? Is it when two operators come one after the other, especially increment/decrement operators?

Whitespace is only relevant for creating tokens. + and ++ are both valid tokens. The rule in C (C11 §6.4 paragraph 4) is that the next token is formed from the longest sequence of characters that could constitute a valid token, so "++" without whitespace becomes a single "++" token, while "+ +" with a space character becomes two "+" tokens. Since there is no "+++" token, "+++" becomes a "++" token followed by a "+" token.

Related

Writing Regular Expressions for a C string

I am currently learning about regex, and I am trying to figure out how to match a C string literal that does not allow newlines. I have searched around and found answers involving flex and lex, but I'm trying to learn it as simply as I can to gain a better understanding.
This is an expression I found while searching, and it appears to be common (I have found it a lot). But I have yet to find a clear explanation of what it means and how it is used.
\"(\\.|[^"])*\"
This expression means that there must be a double quote at the beginning and at the end (\"), with a sequence of zero or more of the following in between:
A backslash character \\ followed by any single character ., or
A non-doublequote character [^"]
The second clause is self-explanatory. The first clause treats any single character preceded by a backslash as an escape sequence, so an escaped double quote does not end the match. This ensures that the expression captures each of the following strings in full:
"string \"one\" has embedded doublequotes"
"string two \
is split across \
multiple lines"
"string\tthree\nhas\tembedded\tescape\tcharacters"

What does the \? escape mean in C grammar? [duplicate]

I was reading this and found the escape \?. What exactly does this escape mean? Is it the literal ? inside a string (I still can't see a reason for it), or is this a BNF grammar rule I don't know about?
It specifies a literal question mark. See http://en.wikipedia.org/wiki/Digraphs_and_trigraphs
The backslash is used as a marker character to tell the compiler/interpreter that the next character has a special meaning. What that next character means is defined by the language. For example, C-style languages use \n to mean newline and \t to mean tab.
The word "escape" means temporarily escaping out of normal parsing of the text into another mode where the subsequent character is treated differently.
It exists because of a feature called trigraphs: three-character sequences beginning with two question marks, such as ??= for #, are replaced by another character. Writing \? for one of the question marks prevents such a sequence from forming.
From C11
C11 §6.4.4.4 Character constants, paragraph 4:
The double-quote " and question-mark ? are representable either by themselves or by the escape sequences \" and \?, respectively, but the single-quote ' and the backslash \ shall be represented, respectively, by the escape sequences \' and \\.

Subexpressions and grouping of subexpressions

I'm new to the C language. Does precedence determine the grouping of subexpressions? Can you explain how grouping works?
Please explain the strange behavior I see: with i=7, ++i+++i+++i; gives an error, while just adding spaces, ++i + ++i + ++i;, gives no error and the answer is 22 with GCC. How does this output come about?
I checked books too; most of them give a precedence order and then some associativity rules, but no clear explanation of grouping.
Can you explain what to do whenever I see these kinds of mixed expressions? Almost every C language aptitude test asks this type of question.
This is a duplicate of a few questions on SO, but here goes.
Maximal Munch
The C tokenizer will try to grab as many characters as it can when splitting your program into tokens. In ++i+++i+++i; it splits the string into:
++
i
++
+
i
++
+
i
;
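Postfix ++ binds more tightly than prefix ++, so the first three tokens group as ++(i++): both the preincrement (token 1) and the postincrement (token 3) apply to the first i (token 2). The result of i++ is not a modifiable lvalue, so it cannot be incremented again, and the compiler reports an error. The tokenizer does not backtrack and re-split the string to use + for token 3 and ++ for token 4. If the compiler had license to do this, a malicious program could take an arbitrarily long time to parse.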
Multiple Side-Effects
C and its family of languages define a sequence point as a point in execution at which all side effects of previous evaluations are complete. It is undefined behavior to have more than one side effect on a variable between sequence points. Simplify your example a bit: what could this code do? I have changed one preincrement to a predecrement so I can talk about them more easily.
int j = ++i + --i;
Increment i.
Use the incremented value for the first summand.
Decrement i.
Use the decremented value for the second summand.
Add the two values and assign to j.
However, the C standard does not fix the order of these effects. The data dependencies suggest that step 1 precedes step 2, step 3 precedes step 4, and step 5 comes last, but because the expression has undefined behavior, the standard places no requirements on it at all. What your compiler does need not be what another compiler does, and it need not be consistent, even in the same program. As the joke in the Jargon File goes:
nasal demons, n.
Recognized shorthand on the Usenet group comp.std.c for any unexpected behavior of a C
compiler on encountering an undefined construct. During a discussion on that group in early
1992, a regular remarked “When the compiler encounters [a given undefined construct] it is
legal for it to make demons fly out of your nose” (the implication is that the compiler may
choose any arbitrarily bizarre way to interpret the code without violating the ANSI C
standard). Someone else followed up with a reference to “nasal demons”, which quickly
became established. The original post is web-accessible at http://groups.google.com/groups?hl=en&selm=10195%40ksr.com.

K and R exercise 1-24

I am working through the programs in The C Programming Language by Kernighan and Ritchie.
I am currently at exercise 1-24 that says:
Write a program to check a C Program for rudimentary syntax errors
like unbalanced parentheses, brackets and braces. Don't forget about
quotes, both single and double, escape sequences, and comments.
I have done everything else, but I don't see how escape sequences would affect parentheses, brackets and braces.
Why do they warn about escape sequences?
In "\"", there are three double quote characters, but still it's a valid string literal. The middle " is escaped, meaning the outer two balance each other. Similarly, '\'' is a valid character literal.
Parentheses, brackets and braces are not affected, unless of course they appear in a string literal that you don't parse correctly because of an escaped quote.
I'd guess they mean that you need to distinguish between " (which starts or ends a string) and \" (which is a " character, possibly inside a string).
This is important if you're to avoid reporting e.g. strlen("\")"); as having unbalanced parentheses.
The obvious possibility would be an escaped quote inside a string. If you don't take the escape into account, you might think the string ended there. For example: "\")\"". The ) is part of the string literal, so it doesn't count as a mis-matched parenthesis.

3 plus symbols between two variables (like a+++b) in C [duplicate]

#include <stdio.h>
int main(void)
{
    int a = 8, b = 9, c;
    c = a+++b;                              /* tokenized as a ++ + b */
    printf("a=%d b=%d c=%d\n", a, b, c);    /* prints a=9 b=9 c=17 */
    return 0;
}
The program above outputs a=9 b=9 c=17. In a+++b, why does the compiler take a++ and then add b? Why doesn't it take a + and ++b? Is there a specific name for this? Please help me understand.
I like the explanation from Expert C Programming:
The ANSI standard specifies a convention that has come to be known as
the maximal munch strategy. Maximal munch says that if there's more
than one possibility for the next token, the compiler will prefer to
bite off the one involving the longest sequence of characters. So the
example will be parsed
c = a++ + b;
Read about the Maximal Munch Principle:
"maximal munch" or "longest match" is the principle that when creating some construct, as much of the available input as possible should be consumed.
Every compiler has a tokenizer, which is a component that parses a source file into distinct tokens (keywords, operators, identifiers, etc.). One of the tokenizer's rules is called "maximal munch", which says that the tokenizer should keep reading characters from the source file until adding one more character causes the current token to stop making sense.
Once the tokens are formed, the order of operations in C dictates that unary and postfix operators have higher precedence than binary operators, so a++ + b groups as (a++) + b.
You could use a + (++b) if you wanted b to be incremented first.
