I am writing a script that can process .c and .h files. Using regular expressions I am finding all functions within a given file. During my experiences with C I always defined functions in the following manner:
void foo(int a){
//random code
}
Is it possible to declare a function in the following manner:
void
foo(int a){
//random code
}
I always assumed that the function type, name, and parameters needed to be in the same line, but I've been told otherwise so I'm not exactly sure.
Firstly, what kind of whitespace - space, newline, tab etc. - you use in C source code does not matter, as long as there's whitespace where whitespace is required. Also, it does not matter how much whitespace you use.
Secondly, taking into account C preprocessor capabilities, one can write function declarations (and the rest of the code) as
vo\
id f\
o\
o(i\
n\
t\
\
a)
(Obviously, there are many more ways in which the preprocessor can obfuscate a function definition. For your particular task, it would be a better idea to work on already preprocessed source code.)
Thirdly, C up to C17 still supports K&R-style function definitions (C23 removed them), which look as follows:
void foo(a)
int a;
{
...
}
C is a free-form language; white space is not significant, in general, except to separate tokens. (There are caveats to that assertion, notably in preprocessor directives, and inside string literals and character literals, but in general that is accurate.) Thus, the following is a ghastly but legitimate C function definition:
/* Comment before the type */ SomeUserDefinedTypeName
/??/
* comments, with trigraphs to boot
*??/
/
FunctionName
(
SomeType param1,
AnotherType
(
*
param2
)
[
]
)
/\
/ one line comment
// another line comment \
yes, this is part \
of that one-line comment too
{
...
}
Of course, anyone who produces a function like that deserves to be hung, drawn and quartered (or, at least, severely castigated), but you will have to decide on how general-purpose you want your code to be. If it needs to work with any C whatsoever, you will need to handle c**p [1] like this. On the other hand, you can probably get away with a lot less sophisticated parsing.
[1] There's an A and an R missing, and I'm not talking about fish.
Don't try to parse C with regexes
This is a valid C function, named test, which takes a pointer to const void (named ptr) and returns a pointer to a function that takes an array of five pointers to functions returning int and itself returns an unsigned int.
unsigned int (*(test)(const void *ptr)) (int (*[5])())
{
return 0;
}
(bonus points if someone can find a real-world scenario where this thing could have any use)
Although deprecated, you may also come in contact with the "old style" function notation:
// declaration
unsigned int test2();
// definition
unsigned int test2(ptr)
const void *ptr;
{
return 0;
}
Intermixed in this you can find comments (both multi-line and single-line since C99), trigraphs and even macros:
#define defun(fn) fn (
#define fstart ){
#define fend }
void defun(test3) int a, double b
fstart
printf("%d %f", a, b);
fend
http://ideone.com/JDDeMr
Even excluding the pathological macro scenario, "plain" regexes cannot even start to parse this thing because they can't match parentheses; maybe you can do something with extended regexes, but let's be honest, do you really want to cope with this stuff? Use a ready-made parser or even a compiler (libclang comes to mind) and let it do the dirty work.
I think that for a beginner, writing code from scratch that uses regexes to parse source code is quite difficult, and the result could be quite inefficient too.
As I've stated before, I suggest using a well-written library like pyparsing, which lets you translate the BNF notation of the language into the library's own objects.
Once you have defined a parsing element with the pyparsing API, you can use the library to parse anything from a simple string to a complex file.
It may be a bit difficult at first, but I think you can get great results with it fairly easily.
I suggest you have a look at this simple C grammar defined using the pyparsing library. It's very well written and documented.
Both of these are correct (will compile) because the C compiler will ignore whitespace(s) between the return type and function-name. The format for function definition is usually:
<return type> <function name> (<parameter list>) {
<body>
}
During compilation the return type and function name are separate tokens, and the whitespace between them is discarded before the parser ever looks at them. Hope this helps.
So, whitespace includes characters like tabs, newlines, and spaces (among others).
In general, these whitespace characters are interchangeable. That is, you could replace every space with a newline (or vice-versa), and the compiler wouldn't care.
There are a few places where newlines are treated specially. Some that come to mind include the preprocessor, string literals, character literals, and single line comments.
With the two examples that you've shown, both are parsed identically. Additionally, we could also write it as:
void
foo (
int
a
) { //random code
}
or:
void
foo
(
int
a
)
{
//random code
}
or:
void foo(int a){ /* random code */ }
I have a very simple parser that handles a small subset of the C language; it looks at a well-formed translation unit and, in one pass and online, determines what the global symbols and types are (function, struct, union, variable), provided one is not trying to trick it. However, I'm having trouble determining whether it's a struct or a function in this example:
#define CAT_(x, y) x ## y
#define CAT(x, y) CAT_(x, y)
#define F_(thing) CAT(foo, thing)
static struct F_(widget) { int i; }
F_(widget);
static struct F_(widget) a(void) { int i;
return i = 42, F_(widget).i = i, F_(widget); }
int main(void) {
a();
return 0;
}
It assumes that the parenthesis introduces a function and parses it this way:
[ID<stati>, ID<struc>, ID<F_>, LPAR<(>, ID<widge>, RPAR<)>, LBRA<{>, RBRA<}>].
[ID<F_>, LPAR<(>, ID<widge>, RPAR<)>, SEMI<;>].
[ID<stati>, ID<struc>, ID<F_>, LPAR<(>, ID<widge>, RPAR<)>, ID<a>, LPAR<(>, ID<void>, RPAR<)>, LBRA<{>, RBRA<}>].
[ID<int>, ID<main>, LPAR<(>, ID<void>, RPAR<)>, LBRA<{>, RBRA<}>].
When in fact, what it thinks is the function at the top is actually a struct declaration, and the top two should be concatenated. What is the simplest way to recognise this?
Two-pass, emulating what actually happens in macro replacement; I would have to build a subset of the C pre-processor;
like the C lexer hack, except with macros;
backtrack with the semicolon at the end; that seems hard;
somehow recognise the difference at the beginning, (probably requiring me to add struct to my symbol table.)
As mentioned in the comments, if you want to be able to handle preprocessor macros, you will need to implement (or borrow) a preprocessor.
Writing a preprocessor mostly involves coming to terms with the formal description in the C standard, but it is not otherwise particularly challenging. It can be done online with the resulting token stream fed into a parser, so it doesn't really require a second pass.
(This depends on how you define a "pass" I suppose, but in my usage a one-pass parser reads the input only once without creating and rereading a temporary file. And that is definitely doable.)
I'm learning C and programming in general, and I don't know when to return a value and when to use void.
Is there any rule for when to use one over the other?
Is there any difference between these two cases? I know that the first case is working with a local copy of the int (n), and the second with the original value.
#include <stdio.h>
int case_one(int n)
{
return n + 2;
}
void case_two(int *n)
{
*n = *n + 2;
}
int main(int argc, char *argv[])
{
int n = 5;
n = case_one(n);
printf("%i\n", n);
n = 5;
case_two(&n);
printf("%i\n", n);
return 0;
}
There is one more reason to use an out parameter instead of a return value: error handling. Commonly, the return value (an int) of a function call in C reports whether the operation succeeded, with an error represented by a nonzero value.
Example:
#include <stdio.h>
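/* Stub: always reports failure so that the error path below runs. */
int extract_ip(const char *str, int out[4]) {
(void)str;
(void)out;
return -1;
}
int main(void) {
int out[4];
int rv = extract_ip("test", out);
if (rv != 0) {
printf("Error: %d\n", rv);
}
return 0;
}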
This approach is used in the POSIX socket API, for example.
It very much depends on what you want to do, but basically you should use the former unless you have a good reason to use the latter.
Think of the implications of the choices. First, let's think about the way we provide input to function. It is quite often that you provide explicitly constant, compile-time constant or temporary data as input:
foo(1);
const int a = 2;
foo(a);
int x = 5;
int y = 5;
foo(x + y);
In all of the above cases the source of initial value is not a viable location for storing the result.
Next, let's think about how we may want to store the result. Foremost we may often want to use the result elsewhere. It may be inconvenient to use the same variable to store and then pass input, and to store output. But furthermore, often we would like to use the result immediately. We invoke the function as a part of larger expression:
x = foo(1) + foo(2);
Rewriting the preceding line in a manner that would use a pointer would require much unnecessary code - that is time and complication that we certainly don't want, when it's not really buying anything.
So when do we actually want to use a pointer? C functions are pass-by-value. Whenever we pass anything, a copy is created. We can then work on that copy, and returning it requires copying again. If we know that all we want to do is manipulate a certain data set in place, we can provide a pointer as a handle, and all that is copied is the few bytes that store the address.
So the former, prevalent way to create functions is flexible and leads to concise usage. The latter is useful for manipulation of objects in place.
Obviously sometimes our input actually is an address, and that's a trivial case for using pointers as function parameters.
I'm learning C and programming in general, and i don't know when to return a value and when to use void.
There is no definitive rule, and it is a matter of opinion. Notice that you might re-code your case_one as the following:
// we take the convention that the first argument would be ...
// a pointer to the "result"
void case_one_proc(int *pres, int n) {
*pres = n+2;
}
then a code like
int i = j+3; /// could be any arbitrary expression initializing i
int r = case_one(i);
is equivalent to
int i = j+3; // the same expression initializing i
int r;
case_one_proc(&r, i); //we pass the address of the "result" as first argument
Hence, you can guess that you might replace any whole C program with an equivalent program having only void returning functions (that is, only procedures). Of course, you may have to introduce supplementary variables like r above.
So you see that you might even avoid any value returning function. However, that would not be convenient (for human developers to code and to read other code) and might not be efficient.
(actually you could even make a complex C program which transforms the text of any C program -given by their several translation units- into another equivalent C program without any value returning function)
Notice that at the most elementary machine code level (at least on real processors like x86 and ARM), everything is an instruction, and expressions don't exist anymore! And your favorite C compiler transforms your program (and every C program in practice) into such machine code.
If you want more theory about such whole-program transformations, read about A-normal forms and about continuations (and CPS transformations)
Is there any rule to apply when to use one over the another ?
The rule is to be pragmatic, and favor first the readability and understandability of your source code. As a rule of thumb, any pure function implementing a mathematical function (like your case_one, which mathematically is a translation) is better coded as returning some result. Conversely, any program function which has mostly side effects is often coded as returning void. For cases in between, use your common sense, and look at existing practice -their source code- in existing free software projects (e.g. on github). Often a side effecting procedure might return some error code (or some success flag). Read documentation of printf & scanf for good examples.
Notice also that many ABIs and calling conventions pass the result (and some arguments) in processor registers (and that is faster than passing through memory).
Notice also that C has only call by value.
Some programming languages have only functions returning one value (sometimes ignored, or uninteresting), and have just expressions, without any notion of statement or instruction. Read about functional programming. An excellent example of a programming language with only value returning functions is Scheme, and SICP is an excellent introductory book (freely available) to programming that I strongly recommend.
The first approach is preferred, where you return a value. The second can be used when multiple values are computed by the function.
For example, the function strtol has this prototype:
long strtol(const char *s, char **endp, int base);
strtol attempts to interpret the initial portion of the string pointed to by s as the representation of a long integer expressed in base base. It returns the converted value with a return statement, and stores a pointer to the character that follows the number in s into *endp. Note however that this standard function should have returned 3 values: the converted value, the updated pointer and a success indicator.
There are other ways to return multiple values:
returning a structure.
updating a structure to which you receive a pointer.
C offers some flexibility. Sometimes different methods are equivalent and choosing one is mostly a matter of conventions, personal style or local practice, but for the example you give, the first option is definitely preferred.
The question is: could you please help me better understand the RAII macro in the C language (not C++), using only the resources I supply at the bottom of this question? I am trying to analyse it so as to understand what it says and how it makes sense (it does not make sense in my mind). The syntax is hard. The focus of the question is this: I have trouble reading and understanding the weird syntax and its implementation in C.
For instance i can easily read, understand and analyse(it makes sense to me) the following swap macro:
#define myswap(type,A,B) {type _z; _z = (A); (A) = (B); (B) = _z;}
(the following passage is lifted from the book: Understanding C pointers)
In C language the GNU compiler provides a nonstandard extension to
support RAII.
The GNU extension uses a macro called RAII_VARIABLE. It declares a
variable and associates with the variable:
A type
A function to execute when the variable is created
A function to execute when the variable goes out of scope
The macro is shown below:
#define RAII_VARIABLE(vartype,varname,initval,dtor) \
void _dtor_ ## varname (vartype * v) { dtor(*v); } \
vartype varname __attribute__((cleanup(_dtor_ ## varname))) = (initval)
Example:
void raiiExample() {
RAII_VARIABLE(char*, name, (char*)malloc(32), free);
strcpy(name,"RAII Example");
printf("%s\n",name);
}
int main(void){
raiiExample();
}
When this function is executed, the string "RAII Example" will be displayed. Similar results can be achieved without using the GNU extension.
Of course you can achieve anything without using RAII. The use case for RAII is not having to think about releasing resources explicitly. A pattern like:
void f() {
char *v = malloc(...);
// use v
free(v);
}
requires you to take care of releasing the memory yourself; if you don't, you have a memory leak. As it is not always easy to release resources correctly, RAII gives you a way to automate the freeing:
void f() {
RAII_VARIABLE(char*, v, malloc(...), free);
// use v
}
What is interesting is that the resource will be released whatever the path of execution turns out to be. So if your code is a kind of spaghetti code, full of complex conditions and tests, RAII frees you from having to think about releasing it on every path.
Ok, let's look at the parts of the macro line by line
#define RAII_VARIABLE(vartype,varname,initval,dtor) \
This first line is, of course, the macro name plus its argument list. Nothing unexpected here, we seem to pass a type, a token name, some expression to init a variable, and some destructor that will hopefully get called in the end. So far, so easy.
void _dtor_ ## varname (vartype * v) { dtor(*v); } \
The second line declares a function. It takes the provided token varname and prepends the prefix _dtor_ to it (the ## operator instructs the preprocessor to fuse the two tokens together into a single token). This function takes a pointer to vartype as an argument, and calls the provided destructor on the value that pointer refers to (*v).
This syntax may be unexpected here (like the use of the ## operator, or the fact that it relies on the ability to declare nested functions), but it's no real magic yet. The magic appears on the third line:
vartype varname __attribute__((cleanup(_dtor_ ## varname))) = (initval)
Here the variable is declared. Without the __attribute__() this looks pretty straightforward: vartype varname = (initval). The magic is the __attribute__((cleanup(_dtor_ ## varname))) directive. It instructs the compiler to ensure that the provided function is called when the variable falls out of scope.
The __attribute__() syntax is a language extension provided by the compiler, so you are deep into implementation-specific territory here. You cannot rely on other compilers providing the same __attribute__((cleanup())). Many may provide it, but none has to. Some older compilers may not even know the __attribute__() syntax at all, in which case the standard procedure is to #define __attribute__() empty, stripping all __attribute__() declarations from the code. You don't want that to happen with RAII variables. So, if you rely on an __attribute__(), know that you've lost the ability to compile with any standard-conforming compiler.
The syntax is a little bit tricky, because __attribute__((cleanup)) expects a function that takes a pointer to the variable. From the GCC documentation (emphasis mine):
The function must take one parameter, a pointer to a type compatible
with the variable. The return value of the function (if any) is
ignored.
Consider the following incorrect example:
char *name __attribute__((cleanup(free))) = malloc(32);
It would be much simpler to implement it like that; however, in this case free would implicitly be passed a pointer to name, whose type is char **, rather than the char * that malloc returned. You need some way to force passing the proper object, which is the very idea of the RAII_VARIABLE function-like macro.
A simplified, non-generic incarnation of RAII_VARIABLE would be to define a function, say raii_free:
#include <stdlib.h>
void raii_free(char **var) { free(*var); }
int main(void)
{
char *name __attribute__((cleanup(raii_free))) = malloc(32);
return 0;
}
I want to concatenate a lot of macros in order to pass them as a parameter in a struct array. To be more specific, I have this struct:
static struct
{
unsigned int num_irqs;
volatile __int_handler *_int_line_handler_table;
}_int_handler_table[INTR_GROUPS];
and I want to pass as num_irqs parameter a series of macros
AVR32_INTC_NUM_IRQS_PER_GRP1
AVR32_INTC_NUM_IRQS_PER_GRP2
...
first I thought to use this code
for (int i=0;i<INTR_GROUPS;i++)
{
_int_handler_table[i].num_irqs = TPASTE2(AVR32_INTC_NUM_IRQS_PER_GRP,i);
}
but it takes the i as char and not the specific value each time. I saw also that there is a MREPEAT macro defined in the preprocessor.h but I do not understand how it is used from the examples.
Can anyone explain the use of MREPEAT, or another way to do the above?
Keep in mind the preprocessor (which manipulates macros) runs before the compiler. It's meant to manipulate the final source code to be submitted to the compiler.
Hence, it has no idea what value a variable has. For the preprocessor, i means i.
What you try to do is a bit complex, especially keeping in mind that preprocessor cannot generate preprocessor directives.
But it can generate constants.
Speaking of which, for your use case, I would prefer to use a table of constants, such as :
const int AVR32_INTC_NUM_IRQS_PER_GRP[] = { 1, 2, 3, 4, 5 };
for (int i=0;i<INTR_GROUPS;i++)
{
_int_handler_table[i].num_irqs = AVR32_INTC_NUM_IRQS_PER_GRP[i];
}
C doesn't work like that.
Macros are just text replacement, which happens at compile time. You can't write code to construct a macro name at run time; that doesn't make sense. The compiler is no longer around when your code runs.
You probably should just do it manually, unless the amount of code is very large (in which case code-generation is a common solution).
I'm trying to write a C parser, for my own education. I know that I could use tools like YACC to simplify the process, but I want to learn as much as possible from the experience, so I'm starting from scratch.
My question is how I should handle a line like this:
doSomethingWith((foo)(bar));
It could be that (foo)(bar) is a type cast, as in:
typedef int foo;
void doSomethingWith(foo aFoo) { ... }
int main() {
float bar = 23.6;
doSomethingWith((foo)(bar));
return 0;
}
Or, it could be that (foo)(bar) is a function call, as in:
int foo(int bar) { return bar; }
void doSomethingWith(int anInt) { ... }
int main() {
int bar = 10;
doSomethingWith((foo)(bar));
return 0;
}
It seems to me that the parser cannot determine which of the two cases it is dealing with solely by looking at the line doSomethingWith((foo)(bar)); This annoys me, because I was hoping to be able to separate the parsing stage from the "interpretation" stage where you actually determine that the line typedef int foo; means that foo is now a valid type. In my imagined scenario, Type a = b + c * d would parse just fine, even if Type, a, b, c, and d aren't defined anywhere, and problems would only arise later, when actually trying to "resolve" the identifiers.
So, my question is: how do "real" C parsers deal with this? Is the separation between the two stages that I was hoping for just a naive wish, or am I missing something?
Historically, typedefs were a relatively late addition to C. Before they were added to the language, type names consisted of keywords (int, char, double, struct, etc.) and punctuation characters (*, [], ()), and so were easy to recognize unambiguously. An identifier could never be a type name, so an identifier in parentheses followed by an expression could not be a cast expression.
Typedefs made it possible for a user-defined identifier to be a type name, which rather seriously messed up the grammar.
Take a look at the syntax of type-specifier in the C standard (I'll use the C90 version since it's slightly simpler):
type-specifier:
void
char
short
int
long
float
double
signed
unsigned
struct-or-union-specifier
enum-specifier
typedef-name
All but the last can be easily recognized because they either are keywords, or start with a keyword. But a typedef-name is just an identifier.
When a C compiler processes a typedef declaration, it needs to, in effect, introduce the typedef name as a new keyword. Which means that, unlike for a language with a context-free grammar, there needs to be feedback from the symbol table to the parser.
And even that's a bit of an oversimplification. A typedef name can still be redefined, either as another typedef or as something else, in an inner scope:
{
typedef int foo; /* foo is a typedef name */
{
int foo; /* foo is now an ordinary identifier, an object name */
}
/* And now foo is a typedef name again */
}
So a typedef name is effectively a user-defined keyword if it's used in a context where a type name is valid, but is still an ordinary identifier if it's redeclared.
TL;DR: Parsing C is hard.
What you're talking about is a "context-free grammar", where you can parse everything without having to remember what's a type and what's a variable (or, in general, use any semantic attributes associated with an identifier). C, unfortunately, is not context-free, so you don't have that luxury.
Virtually no modern language is context-free (i.e., can have the meaning of a phrase determined entirely locally).
The smart money is to build a context-free parser, and resolve context-dependencies later, isolating the two tasks.
Thus the question of "how does the parser know type cast from function call" becomes a non-topic; the only reason it exists is that people insist on tangling raw parsing with name and type resolution.
For a cleaner model, consider using a GLR parser. See this SO answer for more detail, using the problem of resolving what
x*y;
means in C, the same problem for OP, if he hasn't tripped over it yet.