I tried to solve exercise 1-24 in the K&R C book, in which you have to write a program that detects basic syntax errors (unbalanced parentheses, brackets and so on). To debug it, I ran some tests on C source files scattered around my system.
My program detected an error when it met this line in a file :
av_opt_set_q (abuffer_ctx, "time_base", (AVRational ){ 1, INPUT_SAMPLERATE }, AV_OPT_SEARCH_CHILDREN);
I made the assumption that, every time a regular curly bracket is encountered (outside comments, double quotes), parentheses and brackets must be balanced. This is not true as this error showed. Unfortunately I cannot find what it means. Thanks for your help.
This
(AVRational ){ 1, INPUT_SAMPLERATE }
is a compound literal. Check more about it here.
From C11, §6.5.2.5:
A postfix expression that consists of a parenthesized type name followed by a brace-enclosed list of initializers is a compound literal. It provides an unnamed object whose value is given by the initializer list.
That said, I do not see how the braces are unbalanced here. This is valid syntax, and your tool should take it into account when making its decision.
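To make the shape of the construct concrete, here is a minimal sketch. The struct rational type and both function names are hypothetical stand-ins for illustration (they are not the real AVRational API); the point is only that a brace-enclosed list can legally appear inside an argument list, before the call's closing parenthesis.

```c
/* Hypothetical stand-in for a rational type such as AVRational. */
struct rational {
    int num;
    int den;
};

/* Takes the struct by value, like the call in the question does. */
int rational_den(struct rational r)
{
    return r.den;
}

/* Passes an unnamed struct object built in place by a compound literal. */
int compound_literal_demo(void)
{
    /* Shape: '(' type-name ')' '{' initializer-list '}' inside the
       still-open parentheses of the function call. */
    return rational_den((struct rational){ 1, 44100 });
}
```

A bracket checker therefore cannot require that all parentheses are closed when it meets an opening brace; it only needs each closing delimiter to match the most recently opened one.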
In clang, a pointer to a function and a function designator are reported with different grouping symbols, parentheses versus curly braces, as seen in the code fragments further down.
What is:
type * type(*)(type)
used for and versus/what is:
type *(type)
used for? Curly braces are used in lists, compound statements and aggregates, while parentheses are primarily used for ordering, grouping, casting and arguments.
The first looks like a reference to a function pointer type that accepts an argument of a specific type. Enclosed in parentheses, it either forms a group that is evaluated on its own within a larger expression, or forms a cast that, once the function is resolved, converts the result to a value of the return type, with that type's size and alignment requirements, instead of the binary sequence containing the value.
The second one is enclosed in curly braces and is thus a compound object or statement block; I can't see how it would be attached to an aggregate, though it may refer to the result of calling a function body.
The second snippet also looks like a reference to a defined type or object group. Since a type may not share a name with an object, which effectively makes it a keyword, I presume that this somehow creates a named object that is multiplied with itself, or is a reverse cast or something similar, since keywords with an asterisk have no real use aside from object declarations.
Why is the first symbol enclosed in parentheses whereas the second is enclosed in curly braces?
For example, when the following code fragment is compiled,
(*function)(1);
clang's note in the compilation log describes the symbol as:
(type *(*)(type))
with parentheses around the representation, whereas for:
function(1);
the note describes the symbol as:
{type *(type)}
with curly braces around the representation.
What is the distinction between the two, and under what circumstances is each applied in practice?
My general observations of the syntax and semantics of the C language suggest that the first is a basic ordered-group expression, which may be used in a comma-separated list or as a function parameter, whereas the second appears to be a struct, union or block statement.
I was exploring function pointers, so I tried both, and those two notes were clang's comments during compilation. None of the other 5 posts I read seemed to clarify the nature of the grouping symbols in any context, but they did provide additional insight into function pointers and their applications.
Thank you!
A function designator is an expression that has function type.
For example, in
a = foo(x);
foo is a function designator.
A function pointer is a value that refers to a function. Function designators decay to function pointers when used as values (the operands of sizeof and unary & are the exceptions).
int (*fptr)(int) = foo;
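A small sketch of that decay, assuming a hypothetical function twice (not from the question). Because the designator converts to a pointer, and dereferencing a function pointer yields a designator again, all of the call forms below reach the same function.

```c
/* Hypothetical function used only for illustration. */
int twice(int x)
{
    return 2 * x;
}

/* Calls twice() through every equivalent spelling and checks they agree. */
int call_all_forms(int x)
{
    int (*fptr)(int) = twice;   /* designator decays to a pointer */
    int (*gptr)(int) = &twice;  /* explicit address-of gives the same pointer */

    int a = twice(x);           /* call through the designator */
    int b = fptr(x);            /* call through the pointer */
    int c = (*fptr)(x);         /* dereference yields a designator, which decays again */
    int d = (*gptr)(x);

    return (a == b && b == c && c == d && fptr == gptr) ? a : -1;
}
```

So (*function)(1) and function(1) behave identically; the two spellings differ only in the type of the expression that the compiler reports before the conversion.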
So from my limited understanding, C has syntax ambiguity as seen in the expression:
T(*b)[4];
Here it is said about this sort of thing:
The well-known "typedef problem" with parsing C is that the standard C grammar is ambiguous unless the lexer distinguishes identifiers bound by typedef and other identifiers as two separate lexical classes. This means that the parser needs to feed scope information to the lexer during parsing. One upshot is that lexing must be done concurrently with parsing.
The problem is it can be interpreted as either multiplication or as a pointer depending on context (I don't 100% understand the details of this since I'm not expert in C, but I get the gist of it and why it's a problem).
typedef int a;
b * a; // multiplication
a * b; // b is pointer to type a
What I'm wondering is if you were to parse C with a Parsing Expression Grammar (PEG) such as this C grammar, how does it handle this ambiguity? I assume this grammar is not 100% correct because of this problem, and so am wondering how you would go about fixing it. What does it need to keep track of or do differently to account for this?
The usual way this is handled in a PEG grammar is to use a semantic predicate on a rule such that the rule only matches when the predicate is true, and have the predicate check whether the name in question is a type in the current context or not. In the link you give, there's a rule
typedefName : Identifier
which is the (only) one that needs the semantic predicate to resolve this ambiguity. The predicate simply checks the Identifier in question against the definitions in the current scope. If it is not defined as a type, then it rejects this rule, so the next lower priority one will (try to) match.
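As a rough sketch of what such a predicate might look like, assuming a single flat scope and hypothetical helper names (declare_typedef, is_typedef_name and classify_star_statement are invented for this illustration and are not part of any PEG tool):

```c
#include <string.h>

/* Hypothetical flat "scope": the names currently bound by typedef. */
#define MAX_TYPEDEFS 8
static const char *typedef_names[MAX_TYPEDEFS];
static int typedef_count = 0;

/* What the parser would do after accepting something like "typedef int a;". */
void declare_typedef(const char *name)
{
    if (typedef_count < MAX_TYPEDEFS)
        typedef_names[typedef_count++] = name;
}

/* The semantic predicate: true only if the identifier currently names a type. */
int is_typedef_name(const char *name)
{
    for (int i = 0; i < typedef_count; i++)
        if (strcmp(typedef_names[i], name) == 0)
            return 1;
    return 0;
}

/* Classify "lhs * rhs;": 'D' if it is a declaration, 'E' if it is an expression. */
char classify_star_statement(const char *lhs)
{
    /* The typedefName alternative is tried first and matches only when the
       predicate holds; otherwise the expression alternative is tried. */
    return is_typedef_name(lhs) ? 'D' : 'E';
}
```

A real parser would keep one such table per scope and pop it at the closing brace of each block, because an inner declaration can hide an outer typedef name.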
I was digging through SnoopSnitch's source code when I found this line, written in C, in one of its libraries:
(_s, m);
_s and m are both structures, so what can it be?
PS: Check the end of this file to see the actual source code.
C has no "methods" at all; it has functions.
In any event, the code you present is not a function call, it is an expression statement. The parentheses serve their precedence-overriding grouping function, albeit unnecessarily, and the comma is the comma operator, which evaluates both operands, and has as its result the value of its second operand.
Inasmuch as the result is unused and the comma's operands are simple variable names, the statement overall has no side effects. The only purpose I can think of is the one @chux suggested in comments: to provide a statement where you can insert a breakpoint for debugging, and especially for examining the values at that point of the two variables involved.
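Here is a minimal sketch of the same construct, using a hypothetical struct pair in place of the real structures. Compilers usually warn that the statement has no effect, which is consistent with the explanation above.

```c
struct pair { int x; int y; };  /* hypothetical structure type */

int comma_demo(void)
{
    struct pair s = { 1, 2 };
    struct pair m = { 3, 4 };

    (s, m);               /* evaluates s, discards it, then evaluates m: no effect */

    int v = (s.x, m.y);   /* a comma expression has the value of its right operand */
    return v;
}
```

In the second use the value of the comma expression is actually consumed, so v receives m.y.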
I am learning the basics of the C language and I do not understand how this code works. It should give an error, because I am using the assignment operator (=) instead of the equality operator (==) in the if condition.
#include <stdio.h>
int main()
{
    int i=4;
    if(i=5){
        printf("Yup");
    }
    else
        printf("Nope");
}
While it might not seem intuitive, an assignment is actually a valid expression, with the assigned value being the value of the expression.
So when you see this:
if(i=5){
It is effectively:
if(5){
So why is this behavior allowed? A classic example is that it allows you to call a function, save the return value, and check the return value in one shot:
FILE *fp;
if ((fp = fopen("filename","r")) == NULL) {
    perror("fopen failed");
    exit(1);
}
// use fp
Here, the return value of fopen is assigned to fp, then fp is checked to see if it is NULL, i.e. if fopen failed.
An assignment operator used inside an if statement will not give any error.
The assignment i = 5 will take place, and the if statement will be evaluated according to the value of the assignment expression, which is the value stored in i. In this case that is 5.
The expression i=5 evaluates to a nonzero value, hence the if() condition is true.
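A short sketch of both points, with hypothetical function names chosen for illustration: the branch taken depends only on the value that was assigned, and the assignment expression itself yields that value.

```c
/* Returns 1 if the "then" branch of if (i = value) would run, else 0. */
int branch_taken(int value)
{
    int i = 4;
    if ((i = value))   /* extra parentheses signal that the assignment is intended */
        return 1;
    return 0;
}

/* The value of an assignment expression is the value stored in the left operand. */
int assignment_value(void)
{
    int i = 4;
    int j = (i = 5);   /* j receives 5, and i is now 5 as well */
    return i + j;
}
```

With this sketch, assigning 5 takes the "then" branch while assigning 0 takes the "else" branch, regardless of what i held before.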
I believe this question can be interpreted in two different ways. The first is the most literal: "Why does a C compiler allow this syntax?" The second is probably more vague: "Why was C designed to allow such syntax to be legal?"
The answer to the first can be found in The C Programming Language (a highly recommended book if you do not already have it) and comes down to "because the language says so". It's just the way it is defined.
In the book you can refer to Appendix A to find a description of how the grammar is broken down. Specifically A7. Expressions, and A9. Statements.
A9.4 Selection Statements states:
selection-statement:
if ( expression ) statement
if ( expression ) statement else statement
switch ( expression ) statement
Meaning that any valid expression, which includes an assignment, is legal as the 'argument' to the selection, with a minor caveat (emphasis is my own):
In both forms of the if statement, the expression, which must have arithmetic or pointer type, is evaluated, including all side effects, and if it compares unequal to 0, the first substatement is executed.
This might seem odd if you are coming from a language like Java, which requires the expression used in a conditional to be explicitly boolean, in an attempt to reduce runtime errors caused by typographical mistakes (i.e. using = instead of ==).
As for why C's syntax is like this, I am not sure. A quick Google search returns nothing immediately, but I offer this conjecture (I stress that I have found nothing to back up my claim, and my experience with assembly languages is minimal):
C was designed to be a low level language that mapped closely to assembly level mechanisms; making it easier to implement a compiler for, and to translate assembly to.
In assembly level languages, branches are the result of instructions that look at registers and decide what to do. The work previously placed in the register is of no concern. Decrementing a counter is not a boolean operation, but testing the resulting value in the register is. Allowing a general expression possibly made implementations of C easier to write. The original compiler written by Dennis Ritchie simply spat out assembly files that needed to be assembled manually.
In C, the assignment operator = is just that: an operator. You can use it everywhere an expression is expected,† including in the control expression of an if statement. Modern compilers typically warn about this; make sure to turn on this warning.
† Except where a constant expression is expected, as an expression involving the = operator is not a constant expression.
Consider the function call (calling int sum(int, int))
printf("%d", sum(a,b));
How does the compiler decide that the , used in the function call sum(int, int) is not a comma operator?
NOTE: I didn't want to actually use the comma operator in the function call. I just wanted to know how the compiler knows that it is not a comma operator.
Look at the grammar for the C language. It's listed, in full, in Annex A of the standard. The way it works is that you can step through each token in a C program and match them up with the next item in the grammar. At each step you have only a limited number of options, so the interpretation of any given character will depend on the context in which it appears. Inside each rule in the grammar, each line gives a valid alternative for the program to match.
Specifically, if you look for argument-expression-list, you will see that it contains an explicit comma. Therefore, whenever the compiler's C parser is in "argument-expression-list" mode, commas that it finds will be understood as argument separators, not as comma operators. The same is true for brackets (that can also occur in expressions).
This works because the argument-expression-list rule is careful to use assignment-expression rules, rather than just the plain expression rule. An expression can contain commas, whereas an assignment-expression cannot. If this were not the case the grammar would be ambiguous, and the compiler would not know what to do when it encountered a comma inside an argument list.
However, an opening bracket, for example, that is not part of a function definition/call, or an if, while, or for statement, will be interpreted as part of an expression (because there's no other option, but only if the start of an expression is a valid choice at that point), and then, inside the brackets, the expression syntax rules will apply, and that allows comma operators.
From C99 6.5.17:
As indicated by the syntax, the comma operator (as described in this subclause) cannot
appear in contexts where a comma is used to separate items in a list (such as arguments to functions or lists
of initializers). On the other hand, it can be used within a parenthesized expression or within the second
expression of a conditional operator in such contexts. In the function call
f(a, (t=3, t+2), c)
the function has three arguments, the second of which has the value 5.
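The standard's example can be tried directly. In this sketch, second_of_three is a hypothetical function that just returns its second argument, so the result shows what the parenthesized comma expression evaluated to.

```c
/* Hypothetical three-argument function that simply returns its second argument. */
int second_of_three(int a, int b, int c)
{
    (void)a;  /* unused on purpose */
    (void)c;
    return b;
}

int standard_example(void)
{
    int t = 0;
    /* The two top-level commas separate arguments; the comma inside the
       parentheses is the comma operator, so t = 3 runs first and the
       argument's value is t + 2. */
    return second_of_three(1, (t = 3, t + 2), 9);
}
```

The comma operator introduces a sequence point between its operands, which is why reading t in the right operand after assigning it in the left one is well defined.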
Another similar example is the initializer list of arrays or structs:
int array[5] = {1, 2};
struct Foo bar = {1, 2};
If a comma operator were to be used as the function parameter, use it like this:
sum((a,b))
This won't compile, of course, because sum then receives a single argument (the value of b) where it expects two.
The reason is the C Grammar. While everyone else seems to like to cite the example, the real deal is the phrase structure grammar for function calls in the Standard (C99). Yes, a function call consists of the () operator applied to a postfix expression (like for example an identifier):
6.5.2 postfix-expression:
    ...
    postfix-expression ( argument-expression-list_opt )
together with
argument-expression-list:
    assignment-expression
    argument-expression-list , assignment-expression   <-- arglist comma
expression:
    assignment-expression
    expression , assignment-expression   <-- comma operator
The comma operator can only occur in an expression, i.e. further down in the grammar. So the compiler treats a comma in a function argument list as a separator between assignment-expressions, not as a comma operator.
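The effect of those two rules can be imitated with a simple depth counter. This is only a sketch, not a real parser: count_call_arguments is a hypothetical helper that assumes its input is a single call with no string literals, character constants or comments. A comma directly inside the call's own parentheses is an argument separator; a comma nested any deeper can only belong to a parenthesized expression, where the comma operator is allowed.

```c
/* Count the arguments of a call such as "f(a, (t=3, t+2), c)".
   Returns -1 if the parentheses never close. */
int count_call_arguments(const char *call)
{
    int depth = 0;       /* current parenthesis nesting level */
    int separators = 0;  /* commas seen at depth 1 */
    int seen_token = 0;  /* any non-space character inside the call's parentheses */

    for (const char *p = call; *p != '\0'; p++) {
        if (*p == '(') {
            if (depth > 0)
                seen_token = 1;
            depth++;
        } else if (*p == ')') {
            depth--;
            if (depth == 0)
                return seen_token ? separators + 1 : 0;
        } else if (depth == 1 && *p == ',') {
            separators++;    /* argument separator */
        } else if (depth >= 1 && *p != ' ') {
            seen_token = 1;  /* includes nested commas, i.e. comma operators */
        }
    }
    return -1;
}
```

Under these assumptions, "sum(a,b)" counts as two arguments while "sum((a,b))" counts as one, which matches the grammar's distinction.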
Existing answers say "because the C language spec says it's a list separator, and not an operator".
However, your question is asking "how does the compiler know...", and that's altogether different: It's really no different from how the compiler knows that the comma in printf("Hello, world\n"); isn't a comma operator: The compiler 'knows' because of the context where the comma appears - basically, what's gone before.
The C 'language' can be described in Backus-Naur Form (BNF): essentially, a set of rules that the compiler's parser uses to scan your input file. The BNF for C will distinguish between these different possible occurrences of commas in the language.
There are lots of good resources on how compilers work, and how to write one.
The draft C99 standard says:
As indicated by the syntax, the comma operator (as described in this subclause) cannot
appear in contexts where a comma is used to separate items in a list (such as arguments to functions or lists of initializers). On the other hand, it can be used within a parenthesized expression or within the second expression of a conditional operator in such contexts. In the function call f(a, (t=3, t+2), c) the function has three arguments, the second of which has the value 5.
In other words, "because".
There are multiple facets to this question. One part is that the definition says so. Well, how does the compiler know what context this comma is in? That's the parser's job. For C in particular, the language can be parsed by an LR(1) parser (http://en.wikipedia.org/wiki/Canonical_LR_parser).
The way this works is that the parser generates a bunch of tables that make up the possible states of the parser. Only a certain set of symbols are valid in certain states, and the symbols may have different meaning in different states. The parser knows that it is parsing a function because of the preceding symbols. Thus, it knows the possible states do not include the comma operator.
I am being very general here, but you can read all about the details in the Wiki.