Context: In a recent conversation, the question "does gcc/clang do strlen("static string") at compile time?" came up. After some testing, the answer seems to be yes, regardless of the optimization level. I was a bit surprised to see this done even at -O0, so I experimented further and eventually arrived at the following code:
#include <stdio.h>
unsigned long strlen(const char* s) {
    return 10;
}
unsigned long f() {
    return strlen("abcd");
}
unsigned long g(const char* s) {
    return strlen(s);
}
int main() {
    printf("%lu %lu\n", f(), g("abcd"));
    return 0;
}
To my surprise, it prints 4 10 and not 10 10. I tried compiling with gcc and clang, and with various flags (-pedantic, -O0, -O3, -std=c89, -std=c11, ...) and the behavior is consistent between the tests.
Since I didn't include string.h, I expected my definition of strlen to be used. But the assembly code shows indeed that strlen("abcd") was basically replaced by return 4 (which is what I'm observing when running the program).
Also, the compilers print no warnings with -Wall -Wextra (more precisely, none related to the issue: they still warn that parameter s is unused in my definition of strlen).
Two (related) questions arise (I think they are related enough to be asked in the same question):
- is it allowed to redefine a standard function in C when the header declaring it isn't included?
- does this program behave as it should? If so, what happens exactly?
Per C 2011 (draft N1570) 7.1.3 1 and 2:
All identifiers with external linkage in any of the following subclauses … are always reserved for use as identifiers with external linkage.
If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved identifier as a macro name, the behavior is undefined.
The “following subclauses” specify the standard C library, including strlen. Your program defines strlen, so its behavior is undefined.
What is happening in the case you observe is:
The compiler knows how strlen is supposed to behave, regardless of your definition, so, while optimizing strlen("abcd") in f, it evaluates strlen at compile time, resulting in four.
In g("abcd"), the compiler fails to recognize that, because of the definition of g, this is equivalent to strlen("abcd"), so it does not evaluate it at compile time. Instead, it compiles the expression to a call to g, compiles g to call strlen, and also compiles your definition of strlen. As a result, g("abcd") ends up calling your strlen, which returns ten.
The C standard would allow the compiler to discard your definition of strlen completely, so that g returned four. However, a good compiler should warn that your program defines a reserved identifier.
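To sidestep the undefined behavior, the program can give its function a name that is not reserved. The following is only a minimal sketch, not the asker's code: my_strlen is a hypothetical name chosen for illustration, and the constant 10 is kept just to mirror the question. Since the compiler has no built-in knowledge of my_strlen, both f and g call the user's definition.

```c
/* Hypothetical replacement for the question's strlen; the name
   my_strlen is an assumption, chosen because it is not reserved. */
unsigned long my_strlen(const char *s) {
    (void)s;   /* parameter deliberately unused, as in the question */
    return 10;
}

unsigned long f(void) {
    return my_strlen("abcd");   /* no built-in to substitute here */
}

unsigned long g(const char *s) {
    return my_strlen(s);
}
```

With these definitions, printf("%lu %lu\n", f(), g("abcd")) should print 10 10. gcc and clang also accept -fno-builtin, which asks them not to treat functions such as strlen as built-ins, but as far as the standard is concerned, redefining strlen remains undefined behavior.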
Related
I'm new to C, just a question on strlen. I know that it is defined in <string.h>. But if I don't include <string.h> as:
#include <stdio.h>
int main()
{
    char greeting[6] = { 'z', 'e', 'l', 'l', 'o', '\0' };
    printf("message size: %d\n", strlen(greeting));
    return 0;
}
it still works and prints 5. I did receive a warning that says "warning: incompatible implicit declaration of built-in function ‘strlen’", but how come it still compiles? And who provides strlen()?
There are several things going on here.
As of the 1990 version of C, it was legal to call a function without a visible declaration. Your call strlen(greeting) would cause the compiler to assume that it's declared as
int strlen(char*);
Since the return type is actually size_t and not int, the call has undefined behavior. Since strlen is a standard library function, the compiler knows how it should be declared and warns you that the implicit declaration created by the call doesn't match.
As of the 1999 version of the language, a call to a function with no visible declaration is invalid (the "implicit int" rule was dropped). The language requires a diagnostic -- if you've told the compiler to conform to C99 or later. Many C compilers are more lax by default, and will let you get away with some invalid constructs.
If you compiled with options to conform to the current standard, you would most likely get a more meaningful diagnostic -- even a fatal error if that's what you want (which is a good idea). If you're using gcc or something reasonably compatible with it, try
gcc -std=c11 -pedantic-errors ...
Of course the best solution is to add the required #include <string.h> -- and be aware that you can't always rely on your C compiler to tell you everything that's wrong with your code.
Some more problems with your code:
int main() should be int main(void) though this is unlikely to be a real problem.
Since strlen returns a result of type size_t, use %zu, not %d, to print it. (The %zu format was introduced in C99. In the unlikely event that you're using an implementation that doesn't support it, you can cast the result to a known type.)
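Putting these points together, a corrected version of the program might look like the sketch below. The helper names message_size and print_message_size are assumptions, introduced only so that the length can be checked separately from the printing.

```c
#include <stdio.h>
#include <string.h>   /* declares strlen, so no implicit declaration */

/* Hypothetical helper (not in the question): returns the length as size_t. */
size_t message_size(const char *s)
{
    return strlen(s);
}

/* Prints the size with %zu, the conversion that matches size_t. */
void print_message_size(const char *s)
{
    printf("message size: %zu\n", message_size(s));
}
```

Calling print_message_size with the question's greeting array prints "message size: 5".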
The C standard library is linked automatically. That's what gives you access to the strlen function.
Including the header file helps the compiler out by telling it what kind of function strlen is. That is, it tells it what kind of value it returns and what kind of parameters it takes.
That's why your compiler is warning you. It's saying, "You're calling this function but I haven't been told what kind of function it is." Often a compiler will make a guess. You're passing in a char[] so the compiler figures the function must take a char* as an argument. As for the return type, the compiler makes the guess that the return value is an int.
I have code:
#include <stdio.h>
int main() {
    int a = sum(1, 3);
    return 0;
}
int sum(int a, int b, int c) {
    printf("%d\n", c);
    return a + b + c;
}
I know that I have to declare functions first, and only after that can I call them, but I want to understand what happens.
(Compiled by gcc v6.3.0)
I ignored the "implicit declaration of function" warning and ran the program several times. The output was:
1839551928
-2135227064
41523672
// And more strange numbers
I have 2 questions:
1) What do these numbers mean?
2) How does function main know how to call function sum without its declaration?
I'll assume that the code in your question is the code you're actually compiling and running:
int main() {
    int a = sum(1, 3);
    return 0;
}
int sum(int a, int b, int c) {
    printf("%d\n", c);
    return a + b + c;
}
The call to printf is invalid, since you don't have the required #include <stdio.h>. But that's not what you're asking about, so we'll ignore it. The question was edited to add the include directive.
In standard C, since the 1999 standard, calling a function (sum in this case) with no visible declaration is a constraint violation. That means that a diagnostic is required (but a conforming compiler can still successfully compile the program if it chooses to). Along with syntax errors, constraint violations are the closest C comes to saying that something is illegal. (Except for #error directives, which must cause a translation unit to be rejected.)
Prior to C99, C had an "implicit int" rule, which meant that if you call a function with no visible declaration an implicit declaration would be created. That declaration would be for a function with a return type of int, and with parameters of the (promoted) types of the arguments you passed. Your call sum(1, 3) would create an implicit declaration int sum(int, int), and generate a call as if the function were defined that way.
Since it isn't defined that way, the behavior is undefined. (Most likely the value of one of the parameters, perhaps the third, will be taken from some arbitrary register or memory location, but the standard says nothing about what the call will actually do.)
C99 (the 1999 edition of the ISO C standard) dropped the implicit int rule. If you compile your code with a conforming C99 or later compiler, the compiler is required to diagnose an error for the sum(1, 3) call. Many compilers, for backward compatibility with old code, will print a non-fatal warning and generate code that assumes the definition matches the implicit declaration. And many compilers are non-conforming by default, and might not even issue a warning. (BTW, if your compiler did print an error or warning message, it is tremendously helpful if you include it in your question.)
Your program is buggy. A conforming C compiler must at least warn you about it, and possibly reject it. If you run it in spite of the warning, the behavior is undefined.
This is undefined behavior per 6.5.2.2 Function calls, paragraph 9 of the C standard:
If the function is defined with a type that is not compatible with the type (of the expression) pointed to by the expression that denotes the called function, the behavior is undefined.
Functions without prototypes are allowed under 6.5.2.2 Function calls, paragraph 6:
If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions. If the number of arguments does not equal the number of parameters, the behavior is undefined. ...
Note again: if the arguments passed don't match the parameters expected, the behavior is undefined.
In older standard C (C90), if you don't declare a function before using it, the compiler assumes certain default argument types for it. This dates from early versions of C with a weaker type system, and was retained only for backwards compatibility. It should not generally be relied on.
I'll skip the details here, but in your case it assumes sum takes two ints and returns an int.
Calling a function with the wrong number of parameters, as you are doing here, is undefined behaviour. When you call sum, the compiler thinks that it takes two integers, so it passes two integers to it. When the function is actually called, however, it tries to read one more integer, c. Since you only passed 2 ints, the space for c contains random crap, which is what you're seeing when you print out. Note that it doesn't have to do this, since this is undefined behaviour, it could do anything. It could have given values for b & c, for example.
Obviously this behaviour is confusing, and you should not rely on undefined behaviour, so you'd be better off compiling with stricter compiler settings so this program wouldn't compile. (The proper version would declare sum above main.)
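A minimal sketch of that proper version follows. The prototype placed before the call lets the compiler check the argument count. The third argument 0 and the wrapper name call_sum are assumptions, since the question never says what value c was meant to have.

```c
#include <stdio.h>

/* Prototype visible before any call: a two-argument call such as
   sum(1, 3) is now rejected at compile time. */
int sum(int a, int b, int c);

/* Hypothetical caller standing in for the question's main. */
int call_sum(void)
{
    return sum(1, 3, 0);   /* third argument is an assumed value */
}

int sum(int a, int b, int c)
{
    printf("%d\n", c);
    return a + b + c;
}
```

Here call_sum() returns 4 and prints the value of c that was actually passed, rather than whatever happened to be in a register or on the stack.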
1) Since you didn't provide a value for parameter "c" when calling function "sum", its value inside the function is indeterminate. If you had declared the function before main, your program wouldn't even compile, and you would get an error such as "error: too few arguments to function call".
2) Normally, it doesn't. A function has to be declared before the call so that the compiler can check the call against the function's signature. In this case the compiler fell back on an implicit declaration instead of rejecting the call.
I'm not 100% sure C works exactly like this, but function calls behave like a stack in memory. When you call a function, your arguments are put on that stack, and inside the function you access them by reading a fixed number of positions back in memory. So:
You call sum(1, 3).
The stack will hold 1, with 3 on top of it.
When the function executes, it reads the last position in memory for the 1st argument (recovering the 1) and then the position before that for the 2nd argument (recovering the 3). However, there is a 3rd argument, so it accesses the position before that as well.
That position holds garbage, since you never put anything there, and it is different every time you run the program.
Hope that was clear enough. Remember that the stack is inverted, so every time you add something it goes to the previous memory position, not the next.
I'm fetching the version number of an hpux machine and trying to convert it to a float using atof, but this happens:
#include <stdio.h>
#include <sys/utsname.h>
int main(int argc, char *argv[]) {
    struct utsname u;
    uname(&u);
    char* release = u.release;
    while (*release != '.')
        release++;
    release++;
    printf("%s\n", release);
    printf("%f\n", atof(release));
}
prints this:
# ./test
11.31
0.000000
Returning double 0 just means the conversion failed. The utsname man page says the strings are null terminated, so I don't understand why the atof command is failing. Am I making some kind of obvious mistake here?
The atof function is declared in <stdlib.h>.
If you call it without the required #include <stdlib.h>, the most likely result is that (a) your compiler will print a warning (which you've apparently ignored), and (b) the compiler will assume that atof returns an int result.
Add
#include <stdlib.h>
to the top of your source file. And pay attention to compiler warnings. If you didn't get a warning, find out how to invoke your compiler in a way that makes it warn about this kind of thing. For gcc, the compiler you're using, the -Wall option will enable this warning.
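As a sketch of the corrected parsing step, the version below includes <stdlib.h> and uses strtod, a standard alternative to atof that also reports where parsing stopped. The function name parse_release and the sample string "B.11.31" mentioned below are assumptions for illustration; the question does not show the full contents of u.release.

```c
#include <stdlib.h>   /* declares strtod (and atof) */
#include <string.h>   /* declares strchr */

/* Hypothetical helper: returns the number that follows the first '.'
   in release, or 0.0 if there is no '.' or no digits after it. */
double parse_release(const char *release)
{
    const char *dot = strchr(release, '.');
    if (dot == NULL)
        return 0.0;               /* avoids running past the end of the string */

    char *end;
    double value = strtod(dot + 1, &end);
    if (end == dot + 1)
        return 0.0;               /* nothing could be converted */
    return value;
}
```

For an input such as "B.11.31" this yields a value close to 11.31. Unlike the loop in the question, it also stops safely when the string contains no '.' at all.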
Some background:
Prior to the 1999 edition of the ISO C standard, it was permitted to call a function with no visible declaration. The compiler would assume that the function takes arguments of the (promoted) types passed in the call, and that it returns a result of type int. This typically would work correctly if the function actually does return an int, and if you write the call correctly (and if the function is not variadic, like printf).
The 1999 version of the standard ("C99") dropped the "implicit int" rule, making any call to a function with no visible declaration a constraint violation, requiring a diagnostic (which can be a non-fatal warning). Many compilers will merely warn about the error and then handle the call under the older rules. And as of version 4.8, gcc still doesn't enforce C99 rules by default; its default dialect is "GNU90", consisting of the 1990 ISO C standard plus GNU-specific extensions. You can ask it to use C99 semantics with "-std=c99"; to enforce those semantics, you can use "-std=c99 -pedantic" or "-std=c99 -pedantic-errors".
Newer versions of gcc (partially) support the latest 2011 standard if you specify "-std=c11".
See the gcc 4.8.2 manual for details.
I've always believed that GCC would place a static const variable in the .rodata segment (or in the .text segment as an optimization) of an ELF or similar file. But that seems not to be the case.
I'm currently using gcc (GCC) 4.7.0 20120505 (prerelease) on a laptop with GNU/Linux. And it does place a static constant variable in the .bss segment:
/*
 * this is a.c, and in its generated asm file a.s, the following line gives:
 *     .comm a,4,4
 * which would place variable a in .bss but not .rodata (or .text)
 */
static const int a;
int main()
{
    int *p = (int*)&a;
    *p = 0; /* since a is in .bss, write access to that region */
            /* won't trigger an exception */
    return 0;
}
So, is this a bug or a feature? I've decided to file this as a bug to bugzilla but it might be better to ask for help first.
Are there any reasons that GCC can't place a const variable in .rodata?
UPDATED:
As tested, a constant variable with an explicit initialization (like const int a = 0;) would be placed into .rodata by GCC, while I left the variable uninitialized. Thus this question might be closed later -- maybe I didn't ask the right question.
Also, in my previous words I wrote that the variable a is placed in '.data' section, which is incorrect. It's actually placed into .bss section since not initialized. Text above now is corrected.
The compiler has made it a common, which can be merged with other compatible symbols, and which can go in bss (taking no space on disk) if it ends up with no explicitly initialized definition. Putting it in rodata would be a trade-off; you'd save memory (commit charge) at runtime, but would use more space on disk (potentially a lot for a huge array).
If you'd rather it go in rodata, use the -fno-common option to GCC.
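Another option, based on the asker's own observation in the update, is to give the object an explicit initializer. The sketch below is only illustrative: the accessor name read_a is an assumption, and where the object ends up still depends on the compiler and its options, so it is worth inspecting the output of gcc -S to confirm.

```c
/* Explicit initializer: per the asker's update, GCC placed such an
   object in .rodata rather than emitting it as a common symbol. */
static const int a = 0;

/* Hypothetical accessor, added only so the value can be read. */
int read_a(void)
{
    return a;
}
```

Reading a through read_a() is well defined and returns 0; writing to it through a cast pointer would still be undefined behavior.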
Why does GCC do it? Can't really answer that question without asking the developers themselves. If I'm allowed to speculate, I'd wager it has to do with optimization--compilers don't have to enforce const.
That said, I think it's better if we look at the language itself, particularly undefined behavior. There are a few mentions of undefined behavior, but none of them go in-depth.
Modifying a constant is undefined behavior. Const is a contract, and that is especially true in C (and C++).
"But what if I const_cast away the const and modify the object anyway?" Then you have undefined behavior.
What undefined behavior means is that the compiler is allowed to do quite literally anything it wants, and whatever the compiler decides to do will not be considered a violation of the ISO 9899 standard.
3.4.3
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
ISO/IEC 9899:1999, §3.4.3
What this means is that, because you have invoked undefined behavior, anything the compiler does is technically correct by way of not being incorrect. Ergo, it is correct for GCC to take...
static const int a = 0;
...and turn it into a .rodata symbol, while taking...
static const int a; // guaranteed to be zero
...and turning it into a .bss symbol.
In the former case, any attempt to modify a--even by proxy--will typically result in a segmentation violation, causing the kernel to force-kill the running program. In the latter case, the program will probably run without crashing.
That said, it is not reasonable to guess which one the compiler will do. Const is a contract, and it is up to you, the programmer, to uphold that contract by not modifying data that is supposed to be constant. Violating that contract means undefined behavior, and all the portability issues and program bugs that come with it.
So GCC can do a couple things.
It might write the symbol to .rodata, giving it protection under the OS kernel
It might write the object to somewhere where memory protection is not guaranteed, in which case...
It might change the value
It might change the value and immediately change it back
It might completely delete the offending code under the rationale that the value isn't changing (0 -> 0), essentially optimizing...
int main(){
    int *p = (int*)&a;
    *p = 0;
    return 0;
}
...to...
int main(void){
    return 0;
}
It might even send a model T-800 back in time to terminate your parents before you're born.
All of these behaviors are legal (well, legal in the sense of adhering to the standard), so the bug report was not warranted.
Writing to an object that has been declared const-qualified is undefined behavior: anything can happen, even that.
There is no way in C to declare the object itself to be immutable; you only forbid modification through the particular access that you have to it. Here you have an int*, so modification is "allowed" in the sense that the compiler is not forced to issue a diagnostic. Doing a cast in C means that you are supposed to know what you are doing.
Are there any reasons that GCC can't place a const variable in .rodata?
Your program is optimized by the compiler (even in -O0 some optimizations are done). Constant propagation is done: http://en.wikipedia.org/wiki/Constant_folding
Try to deceive the compiler like this (note that this program is still technically undefined behavior):
#include <stdio.h>
static const int a;
int main(void)
{
    *(int *) &a = printf(""); // compiler cannot assume it is 0
    printf("%d\n", a);
    return 0;
}
Please take a look at my code below:
#include <stdio.h>
void printOut()
{
    static int i = 0;
    if (i < 10)
    {
        printOut(i);
    }
}
int main(int argc, char *argv[])
{
    return 0;
}
I expected an error, because the call doesn't match the function's declaration. Actually, the code compiles fine with the mingw5 compiler, which seems weird to me. When I switched to the Borland compiler, I got a warning saying there is no printOut function prototype. Is this only a warning? What's more, the program runs fine without any pop-up error windows.
In C, a function declared with an empty parameter list can still be called with arguments.
That's why it compiles. The way to specify that it doesn't take any parameters is:
void printOut(void)
This is the proper way to do it, but it is less common, especially among those from a C++ background.
Your program's behavior is undefined, because you define printOut() with no parameters, but you call it with one argument. You need to fix it. But you've written it in such a way that the compiler isn't required to diagnose the problem. (gcc, for example, doesn't warn about the parameter mismatch, even with -std=c99 -pedantic -Wall -Wextra -O3.)
The reasons for this are historical.
Pre-ANSI C (prior to 1989) didn't have prototypes; function declarations could not specify the expected type or number of arguments. Function definitions, on the other hand, specified the function's parameters, but not in a way that the compiler could use to diagnose mismatched calls. For example, a function with one int parameter might be declared (say, in a header file) like this:
int plus_one();
and defined (say, in the corresponding .c file) like this:
int plus_one(n)
int n;
{
    return n + 1;
}
The parameter information was buried inside the definition.
ANSI C added prototypes, so the above could be written like this:
int plus_one(int n);
int plus_one(int n)
{
    return n + 1;
}
But the language continued to support the old-style declarations and definitions, so as not to break existing code. Even the upcoming C201X standard still permits pre-ANSI function declarations and definitions, though they've been obsolescent for 22 years now.
In your definition:
void printOut()
{
    ...
}
you're using an old-style function definition. It says that printOut has no parameters -- but it doesn't let the compiler warn you if you call it incorrectly. Inside your function you call it with one argument. The behavior of this call is undefined. It could quietly ignore the extraneous argument -- or it could conceivably corrupt the stack and cause your program to die horribly. (The latter is unlikely; for historical reasons, most C calling conventions are tolerant of such errors.)
If you want your printOut() function to have no parameters and you want the compiler to complain if you call it incorrectly, define it as:
void printOut(void)
{
    ...
}
This is the one and only correct way to write it in C.
Of course if you simply make this change in your program and then add a call to printOut() in main(), you'll have an infinite recursive loop on your hands. You probably want printOut() to take an int argument:
void printOut(int n)
{
    ...
}
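One way to fill in that last version is sketched below. The recursion now advances n on each call, so it terminates. The int return value (the number of lines printed) is an assumption added only to make the behavior easy to check, since the question's function returns void.

```c
#include <stdio.h>

/* Prototype: exactly one int parameter, so a call with the wrong
   number of arguments is diagnosed at compile time. */
int printOut(int n);

/* Prints n, n+1, ..., 9 and returns how many lines were printed
   (the return value is an illustrative addition, not in the question). */
int printOut(int n)
{
    if (n >= 10)
        return 0;                 /* base case: nothing left to print */
    printf("%d\n", n);
    return 1 + printOut(n + 1);   /* recurse with the next value */
}
```

Calling printOut(0) from main prints the numbers 0 through 9 and returns 10.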
As it happens, C++ has different rules. C++ was derived from C, but with less concern for backward compatibility. When Stroustrup added prototypes to C++, he dropped old-style declarations altogether. Since there was no need for a special-case void marker for parameterless functions, void printOut() in C++ says explicitly that printOut has no parameters, and a call with arguments is an error. C++ also permits void printOut(void) for compatibility with C, but that's probably not used very often (it's rarely useful to write code that's both valid C and valid C++.) C and C++ are two different languages; you should follow the rules for whichever language you're using.