Can printf get replaced by puts automatically in a C program?

#include <stdio.h>
int puts(const char* str)
{
    return printf("Hiya!\n");
}
int main()
{
    printf("Hello world.\n");
    return 0;
}
This code outputs "Hiya!" when run. Could someone explain why?
The compile line is:
gcc main.c
EDIT: it's now pure C, and any extraneous stuff has been removed from the compile line.

Yes, a compiler may replace a call to printf by an equivalent call to puts.
Because you defined your own function puts with the same name as a standard library function, your program's behavior is undefined.
Reference: N1570 7.1.3:
All identifiers with external linkage in any of the following subclauses [this includes puts] are always reserved for use as identifiers with external linkage.
...
If the program declares or defines an identifier in a
context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved
identifier as a macro name, the behavior is undefined.
If you remove your own puts function and examine an assembly listing, you might find a call to puts in the generated code where you called printf in the source code. (I've seen gcc perform this particular optimization.)

It depends upon the compiler and the optimization level. Most recent versions of GCC, on some common systems, with some optimizations, are able to do such an optimization (replacing a simple printf with puts, which AFAIU is legal w.r.t. standards like C99).
You should enable warnings when compiling (e.g. first compile with gcc -Wall -g, then debug with gdb, and once you are confident in your code, compile it with gcc -Wall -O2).
BTW, redefining puts is really, really ugly, unless you do it on purpose (i.e. you are coding your own C library, and then you have to obey the standards). You are getting undefined behavior (see also this answer about possible consequences of UB). In fact, you should avoid redefining names mentioned in the standard unless you really know what you are doing and what is happening inside the compiler.
Also, if you compiled with static linking like gcc -Wall -static -O main.c -o yourprog I'll bet that the linker would have complained (about multiple definition of puts).
But IMNSHO your code is plain wrong, and you know that.
Also, you could compile to get the assembler, e.g. with gcc -fverbose-asm -O -S; and you could even ask gcc to spill a lot of "dump" files with gcc -fdump-tree-all -O, which might help you understand what gcc is doing.
Again, this particular optimization is valid and very useful: the printf routine of any libc has to "interpret" the format string at run time (handling %s etc. specially), which is in practice quite slow. A good compiler is right to avoid calling printf (replacing it with puts) when possible.
BTW gcc is not the only compiler doing that optimization. clang also does it.
Also, if you compile with
gcc -ffreestanding -O2 almo.c -o almo
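the almo program prints "Hello world." (-ffreestanding implies -fno-builtin, so GCC no longer treats printf as the standard function and does not rewrite the call into puts).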
If you want another fancy and surprising optimization, try to compile
// file bas.c
#include <stdlib.h>
int f (int x, int y) {
    int r;
    int* p = malloc(2*sizeof(int));
    p[0] = x;
    p[1] = y;
    r = p[0]+p[1];
    free (p);
    return r;
}
with gcc -O2 -fverbose-asm -S bas.c then look into bas.s; you won't see any call to malloc or to free (actually, no call machine instruction is emitted) and again, gcc is right to optimize (and so does clang)!
PS: Gnu/Linux/Debian/Sid/x86-64; gcc is version 4.9.1, clang is version 3.4.2

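Try ltrace on your executable. You will see that the compiler has replaced the printf call with a call to puts. Whether this happens depends on how you called printf (here, with a constant format string that ends in a newline and contains no conversion specifiers).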
An interesting reading on this is here

Presumably, your library's printf() calls puts().
Your puts() is replacing the library version.

Related

Why am I able to link without including ctype.h

Without #include<ctype.h>, the following program outputs 1 and 0.
With the include, it outputs 1 and 1.
I am using TDM-GCC 4.9.2 64-bit. I wonder what the implementation of isdigit is in the first case, and why it is able to link.
#include<stdio.h>
//#include<ctype.h>
int main()
{
    printf("%d %d\n",isdigit(48),isdigit(48.4));
    return 0;
}
By default, GCC versions before 5 (including the asker's 4.9.2) use the C90 standard with GNU extensions, which allows implicit declarations. Under those rules, a call to an undeclared isdigit implicitly declares it as int isdigit(), with no parameter information, so the arguments are passed with only the default argument promotions: 48 is passed as an int, but 48.4 is passed as a double. That does not match the real function, which takes an int, so when the library function is called at run time it receives the wrong arguments and you have undefined behavior.
When you include the <ctype.h> header file, there is a correct prototype, so the compiler knows that isdigit takes an int argument and can convert the double literal 48.4 to the integer 48 for the call.
As for why it's linking, it's because while these functions may be implemented as macros, that's not a requirement. What is a requirement is that those functions, at least in the C11 standard (I don't have any older version available at the moment), have to be aware of the current locale which will make their implementation as macros much harder, and much easier as normal library functions. And as the standard library is always linked (unless you tell GCC otherwise) the functions will be available.
First of all, #include directives have nothing to do with linking. Remember that anything with a # in front in C is meant for the preprocessor, not the compiler or the linker.
But the function still has to be linked, doesn't it?
Let's go through the build one step at a time.
$ gcc -c -Werror --std=c99 st.c
st.c: In function ‘main’:
st.c:5:22: error: implicit declaration of function ‘isdigit’ [-Werror=implicit-function-declaration]
printf("%d %d\n",isdigit(48),isdigit(48.4));
^
cc1: all warnings being treated as errors
Well, as you can see, gcc's lint (static analyzer) is in action!
Whatever, we will ignore it and proceed...
$ gcc -c --std=c99 st.c
st.c: In function ‘main’:
st.c:5:22: warning: implicit declaration of function ‘isdigit’ [-Wimplicit-function-declaration]
printf("%d %d\n",isdigit(48),isdigit(48.4));
This time it is only a warning. Now we have an object file in the current directory. Let's inspect it...
$ nm st.o
U isdigit
0000000000000000 T main
U printf
As you can see, both printf and isdigit are listed as undefined. So the code has to come from somewhere, doesn't it?
Let's proceed to link it...
$ gcc st.o
$ nm a.out | grep 'printf\|isdigit'
U isdigit@@GLIBC_2.2.5
U printf@@GLIBC_2.2.5
Well, as you can see, the situation has improved somewhat: isdigit and printf are no longer the helpless loners they were in st.o. Both functions are now marked as coming from GLIBC_2.2.5. But where is that GLIBC?
Well let's examine the final executable a bit more...
$ ldd a.out
linux-vdso.so.1 => (0x00007ffe58d70000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb66f299000)
/lib64/ld-linux-x86-64.so.2 (0x000055b26631d000)
AHA... there is that libc. So it turns out that, although you have not given any instruction, the executable is linked against three libraries by default, one of which is libc, which contains both printf and isdigit.
You can see the default behaviour of the linker by running:
$ gcc -dumpspecs
*link:
%{!r:--build-id} %{!static:--eh-frame-hdr} %{!mandroid|tno-android-ld:%{m16|m32|mx32:;:-m elf_x86_64} %{m16|m32:-m elf_i386} %{mx32:-m elf32_x86_64} --hash-style=gnu --as-needed %{shared:-shared} %{!shared: %{!static: %{rdynamic:-export-dynamic} %{m16|m32:-dynamic-linker %{muclibc:/lib/ld-uClibc.so.0;:%{mbionic:/system/bin/linker;:/lib/ld-linux.so.2}}} %{m16|m32|mx32:;:-dynamic-linker %{muclibc:/lib/ld64-uClibc.so.0;:%{mbionic:/system/bin/linker64;:/lib64/ld-linux-x86-64.so.2}}} %{mx32:-dynamic-linker %{muclibc:/lib/ldx32-uClibc.so.0;:%{mbionic:/system/bin/linkerx32;:/libx32/ld-linux-x32.so.2}}}} %{static:-static}};:%{m16|m32|mx32:;:-m elf_x86_64} %{m16|m32:-m elf_i386} %{mx32:-m elf32_x86_64} --hash-style=gnu --as-needed %{shared:-shared} %{!shared: %{!static: %{rdynamic:-export-dynamic} %{m16|m32:-dynamic-linker %{muclibc:/lib/ld-uClibc.so.0;:%{mbionic:/system/bin/linker;:/lib/ld-linux.so.2}}} %{m16|m32|mx32:;:-dynamic-linker %{muclibc:/lib/ld64-uClibc.so.0;:%{mbionic:/system/bin/linker64;:/lib64/ld-linux-x86-64.so.2}}} %{mx32:-dynamic-linker %{muclibc:/lib/ldx32-uClibc.so.0;:%{mbionic:/system/bin/linkerx32;:/libx32/ld-linux-x32.so.2}}}} %{static:-static}} %{shared: -Bsymbolic}}
What are the other two libraries?
Well, remember when you dug into a.out, both printf and isdigit were still shown as U, which means undefined. In other words, there was no memory address associated with these symbols.
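In reality this is where the magic lies. These libraries are actually loaded at run time, when the program starts, rather than being copied in at link time as on older systems.
How is it implemented? The jargon for it is lazy binding. The dynamic loader (ld-linux-x86-64.so.2 in the ldd output above) maps the shared libraries into the process at startup, but by default it resolves a function's address only when that function is first called. Each call goes through a small stub in the PLT (procedure linkage table), which jumps via an entry in the GOT (global offset table). Initially that entry points back into the loader's resolver, which looks up the symbol, writes its real address into the GOT entry and then jumps to the function; later calls go straight to the function. This happens entirely in user space, without a trap into the kernel, and it can be disabled (for example with LD_BIND_NOW=1) so that all symbols are resolved at startup.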
There are multiple theoretical reasons why doing things lazily is better than doing them beforehand. I guess that's a whole new topic, which we will discuss at some other time.

printf and memcpy linkage to standard C library

It is my understanding that if I call printf in a program, by default (if the program isn't statically compiled) it makes a call to printf in the standard C library. However, if I were to call say memcpy, I'd hope the code would be inlined, as a function call is very expensive if memcpy is only copying a few bytes. If you're inlining sometimes and calling out others, the behaviour of your program after a libc upgrade is implementation dependent.
What actually occurs in both of these cases and generally?
First of all the function is never truly "inlined" - that applies to functions that you've written that are visible in the same compilation unit.
If you're inlining sometimes and calling out others, the behaviour of your program after a libc upgrade is implementation dependent.
This is not the case. The memcpy might be "inlined" at compile time. Once compiled, your libc version makes no difference.
In GCC, memcpy is recognized as a builtin. That means that if GCC decides to, it will replace the call to memcpy with a suitable implementation. On x86, this will usually be a rep movsb or similar instruction, depending on the size of the copy and on whether that size is constant.
An implementation is allowed by the C standard to behave "as if" the actual standard library function were called. This is indeed a common optimization: small memcpy calls can be unrolled/inlined, and much more.
You're right that in some cases you could upgrade your libc and not see any change in function calls which were optimized out.
It's going to depend on a lot of things; here's how you can find out. GNU Binutils comes with a utility, objdump, that gives all sorts of details on what's in a binary.
On my system (an ARM Chromebook), compiling test.c:
#include <stdio.h>
int main(void) {
    printf("Hello, world!\n");
}
with gcc test.c -o test and then running objdump -R test gives
test: file format elf32-littlearm
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
000105e4 R_ARM_GLOB_DAT __gmon_start__
000105d4 R_ARM_JUMP_SLOT puts
000105d8 R_ARM_JUMP_SLOT __libc_start_main
000105dc R_ARM_JUMP_SLOT __gmon_start__
000105e0 R_ARM_JUMP_SLOT abort
These are the dynamic relocation entries that are in the file, all the stuff that will be linked in from libraries external to the binary. Here it seems that the printf has been entirely optimized out, since it is only giving a constant string, and thus puts is sufficient. If we modify this to
printf("Hello world #%d\n", 1);
then we get the expected
000105e0 R_ARM_JUMP_SLOT printf
To make the program link against the library's memcpy explicitly, we have to prevent gcc from using its own builtin version with -fno-builtin-memcpy.
You can always attempt to drive the compiler behavior. For instance, with gcc:
gcc -fno-inline -fno-builtin-inline -fno-inline-functions -fno-builtin...
You should then compare the results with nm, or look directly at the call instructions in the generated assembly code.

Problems with calling an assembly function from C

(Running MingW on 64-bit Windows 7 and the GCC on Kubuntu)
This may possibly be just a MingW problem, but it's failed on at least one Kubuntu installation as well, so I'm doubtful.
I have a short, simple C program, which is supposed to call an assembly function. I compile the assembler using nasm and the C program using MingW's implementation of gcc. The two are linked together with a makefile, bog-simple. And yet linking fails, with the claim that the external function is an 'undefined reference'.
Relevant part of the makefile:
assign0: ass0.o main.o
	gcc -v -m32 -g -Wall -o assign0 ass0.o main.o
main.o: main.c
	gcc -g -c -Wall -m32 -o main.o main.c
ass0.o: ass0.s
	nasm -g -f elf -w+all -o ass0.o ass0.s
The beginning of the assembly file:
section .data ; data section, read-write
an: DD 0 ; this is a temporary var
section .text ; our code is always in the .text section
global do_str ; makes the function appear in global scope
extern printf
do_str: ; functions are defined as labels
[Just Code]
And the c file's declaration:
extern int do_str(char* a);
This has worked on at least one Kubuntu installation, failed on another, and failed on MingW. Does anyone have an idea?
... the claim that the external function is an 'undefined reference'
LOL! Linkers do not "claim" falsehoods. You will not convince it to change its mind by insisting that you are correct or it is wrong. Accept what the tools tell you to be the truth without delay. This is key to rapidly identifying the problem.
Many C compilers, notably those targeting 32-bit Windows such as MinGW with -m32, generate global symbols with an underscore prefix to minimize name collisions with assembly language symbols (ELF targets such as Linux normally do not). For example, change your code to
extern _printf
...
call _printf
and the error messages about printf being undefined will go away. If you do get an undefined reference to _printf, it is because the linker is not accessing the C runtime library. The link command can be challenging to get right. Usually doing so is not very educational, so crib from a working project, or look for an example. This is one way that IDEs are very helpful.
As for the C code calling the assembly function, it is usually easiest to write the assembly function using C's conventions:
global _do_str
_do_str:
Alternatively, you could declare the function to use the Pascal calling convention:
extern int pascal do_str ( whatever parameters are needed);
...
retval = do_str ("hello world");
The Pascal calling convention is substantially different from C's: it does not prepend a leading underscore to the symbol, the callee (not the caller) is responsible for removing the parameters from the stack before returning, and the parameters are pushed in the opposite order, possibly with some parameter data types being passed in registers rather than on the stack. See the compiler references for all the details.
C compilers may give the function a different symbol name, e.g. _do_str instead of do_str. Whether this decoration happens depends on the system (and of course on the compiler). Try naming the asm function _do_str. Using proper attributes (in gcc) could also fix the problem. Also read this.

Performance difference between C program executables created by gcc and g++ compilers

Let's say I have written a program in C and compiled it with both gcc (as C) and g++ (as C++). Which compiled executable will run faster: the one created by gcc or the one created by g++? I think using the g++ compiler will make the executable slower, but I'm not sure about it.
Let me clarify my question again because of confusion about gcc:
Let's say I compile program a.c like this in the terminal:
gcc a.c
g++ a.c
Which a.out executable will run faster?
Firstly: the question (and some of the other answers) seems to be based on the faulty premise that C is a strict subset of C++, which is not in fact the case. Compiling C as C++ is not the same as compiling it as C: it can change the meaning of your program!
C will mostly compile as C++, and will mostly give the same results, but there are some things that are explicitly defined to give different behaviour.
Here's a simple example - if this is your a.c:
#include <stdio.h>
int main(void)
{
    printf("%d\n", (int)sizeof('x')); /* cast: %d expects an int, sizeof yields size_t */
    return 0;
}
then compiling as C will give one result:
$ gcc a.c
$ ./a.out
4
and compiling as C++ will give a different result (unless you're using an unusual platform where int and char are the same size):
$ g++ a.c
$ ./a.out
1
because the C specification defines a character literal to have type int, and the C++ specification defines it to have type char.
Secondly: gcc and g++ are not "the same compiler". The same back end code is used, but the C and C++ front ends are different pieces of code (gcc/c-*.c and gcc/cp/*.c in the gcc source).
Even if you stick to the parts of the language that are defined to do the same thing, there is no guarantee that the C++ front end will parse the code in exactly the same way as the C front end (i.e. giving exactly the same input to the back end), and hence no guarantee that the generated code will be identical. So it is certainly possible that one might happen to generate faster code than the other in some cases - although I would imagine that you'd need complex code to have any chance of finding a difference, as most of the optimisation and code generation magic happens in the common back end of the compiler; and the difference could be either way round.
I think they will both produce the same machine code, and therefore the same speed on your computer.
If you want to find out, you could compile the assembly for both and compare the two, but I'm betting that they create the same assembly, and therefore the same machine code.
Profile it and try it out. I'm certain it will depend on the actual code, even if it would potentially require a really weird case to get any different machine code. Though if you don't have extern "C" {} around your C code and it works fine in C, I'm not sure how "compiling it as though it were C++" could provide any speed, unless the particular compiler optimizations in g++ just happen to be a bit better for your particular situation...
The machine code generated should be identical. The g++ version of a.out will probably link in a couple of extra support libraries. This will make the startup time of a.out be slower by a few system calls.
There is not really any practical difference though. The Linux linker will not become noticeably slower until you reach 20-40 linked libraries and thousands of symbols to resolve.
The gcc and g++ executables are just frontends, they are not the actual compilers. They both run the actual C or C++ compilers (and ld, ar, whatever is needed to produce the output you asked for) based on the file extensions. So you'll get the exact same result. G++ is commonly used for C++ because it links with the standard C++ library (iostreams etc.).
If you want to compile C code as C++, either change the file extension, or put -x c++ before the file name (it applies only to the input files that follow it on the command line):
gcc -x c++ test.c -o test
http://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/G_002b_002b-and-GCC.html
GCC is a compiler collection. It is mainly used for compilation of C, C++, Ada, Java and many more programming languages.
G++ is a part of the GNU Compiler Collection (GCC).
I mean gcc includes g++ as well. When we use gcc for compilation of C++, it uses g++. The output files will be different because the G++ compiler uses its own run-time library.
Edit: Okay, to clarify things, because we have a bit of confusion in naming here. GCC is the GNU Compiler Collection. It can compile Ada, C++, C, and a billion and a half other languages. It is a "backend" for the various languages' "front end" compilers like GNAT. Go read the link I made at the top of the page from GCC.GNU.Org.
GCC can also refer to the GNU C Compiler. This will compile C++ code if given the -lstdc++ option, but normally will choke and die because it's not pulling in the C++ libraries.
G++, the GNU C++ Compiler, like the GNU C Compiler, is a front end to the GNU Compiler Collection. Its difference from the C compiler is that it automatically includes those libraries and makes a few other small tweaks, because it's assuming it's going to be fed C++ code to compile.
This is where the confusion comes from. Does this clarify things a bit?

gcc - 2 versions, different treatment of inline functions

Recently I've come across a problem in my project. I normally compile it in gcc-4, but after trying to compile in gcc-3, I noticed a different treatment of inline functions. To illustrate this I've created a simple example:
main.c:
#include "header.h"
#include <stdio.h>
int main()
{
    printf("f() %i\n", f());
    return 0;
}
file.c:
#include "header.h"
int some_function()
{
    return f();
}
header.h:
inline int f()
{
    return 2;
}
When I compile the code in gcc-3.4.6 with:
gcc main.c file.c -std=c99 -O2
I get a linker error (multiple definition of f), and the same if I remove the -O2 flag. I know the compiler does not have to inline anything if it doesn't want to, so I assumed it placed f in the object file instead of inlining it for both main.c and file.c, hence the multiple definition error. Obviously I could fix this by making f static; then, in the worst case, I would have a few copies of f in the binary.
But I tried compiling this code in gcc-4.3.5 with:
gcc main.c file.c -std=c99 -O2
And everything worked fine, so I assumed the newer gcc inlined f in both cases and there was no function f in the binary at all (checked in gdb and I was right).
However, when I removed the -O2 flag, I got two undefined references to int f().
And here, I really don't understand what is happening. It seems like gcc assumed f would be inlined, so it didn't add it to the object file, but later (because there was no -O2) it decided to generate calls to these functions instead of inlining and that's where the linker error came from.
Now comes the question: how should I define and declare simple and small functions, which I want inline, so that they can be used throughout the project without the fear of problems in various compilers? And is making all of them static the right thing to do? Or maybe gcc-4 is broken and I should never have multiple definitions of inline functions in a few translation units unless they're static?
Yes, the behavior has been changed from gcc-4.3 onwards. The gcc inline doc (http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Inline.html) details this.
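Short story: plain inline only serves to tell gcc (in the old version anyway) to inline calls to the function from the same file scope. However, it does not tell gcc that all callers would be from the file scope, so gcc also keeps a linkable version of f() around, which explains your duplicate symbols error above.
Gcc 4.3 changed this behavior to be compatible with C99. Under C99 rules, a plain inline definition with no extern declaration in the translation unit is only an inline definition: it does not emit an external symbol. So when the calls are not inlined (as without -O2), the linker finds no f() at all, which is the undefined reference you saw.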
And, to answer your specific question:
Now comes the question: how should I define and declare simple and small functions, which I want inline, so that they can be used throughout the project without the fear of problems in various compilers? And is making all of them static the right thing to do? Or maybe gcc-4 is broken and I should never have multiple definitions of inline functions in a few translation units unless they're static?
If you want portability across gcc versions use static inline.

Resources