Problems with calling an assembly function from C - c

(Running MingW on 64-bit Windows 7 and the GCC on Kubuntu)
This may possibly be just a MingW problem, but it's failed on at least one Kubuntu installation as well, so I'm doubtful.
I have a short, simple C program, which is supposed to call an assembly function. I compile the assembler using nasm and the c program using MingW's implementation of the gcc. The two are linked together with a makefile - bog-simple. And yet, linkage fails on the claim the claim that the external function is an 'undefined reference'
Relevant part of the makefile:
assign0: ass0.o main.o
gcc -v -m32 -g -Wall -o assign0 ass0.o main.o
main.o: main.c
gcc -g -c -Wall -m32 -o main.o main.c
ass0.o: ass0.s
nasm -g -f elf -w+all -o ass0.o ass0.s
The beginning of the assembly file:
section .data ; data section, read-write
an: DD 0 ; this is a temporary var
section .text ; our code is always in the .text section
global do_str ; makes the function appear in global scope
extern printf
do_str: ; functions are defined as labels
[Just Code]
And the c file's declaration:
extern int do_str(char* a);
This has worked on at least one Kubuntu installation, failed on another, and failed on MingW. Does anyone have an idea?

... the claim that the external function is an 'undefined reference'
LOL! Linkers do not "claim" falsehoods. You will not convince it to change its mind by insisting that you are correct or it is wrong. Accept what the tools tell you to be the truth without delay. This is key to rapidly identifying the problem.
Almost every C compiler, including those you are using, generates global symbols with an underscore prefix to minimize name collisions with assembly language symbols. For example, change your code to
extern _printf
...
call _printf
and error messages about printf being undefined will go away. If you do get an undefined reference to _printf, it is because the linker is not accessing the C runtime library. The link command can be challenging to get correct. Usually doing so is not very educational, so crib from a working project, or look for an example. This is way that IDEs are very helpful.
As for the C code calling the assembly function, it is usually easiest to write the assembly function using C's conventions:
global _do_str
_do_str:
Alternatively, you could declare the function to use the Pascal calling convention:
extern int pascal do_str ( whatever parameters are needed);
...
retval = do_str ("hello world");
The Pascal calling convention is substantially different from C's: it does not prepend a leading underscore to the symbol, the caller is responsible for removing the parameters after return, and the parameters are in a different order, possibly with some parameter data types being passed in registers rather than on the stack. See the compiler references for all the details.

C compilers may call the actual "function" differently, e.g. _do_str instead of do_str. Name mangling not happening always could depends on the system (and of course on the compiler). Try calling the asm function _do_str. Using proper attributes (in gcc) could also fix the problem. Also read this.

Related

Where does GCC find printf ? My code worked without any #include

I am a C beginner so I tried to hack around the stuff.
I read stdio.h and I found this line:
extern int printf (const char *__restrict __format, ...);
So I wrote this code and i have no idea why it works.
code:
extern int printf (const char *__restrict __format, ...);
main()
{
printf("Hello, World!\n");
}
output:
sh-5.1$ ./a.out
Hello, World!
sh-5.1$
Where did GCC find the function printf? It also works with other compilers.
I am a beginner in C and I find this very strange.
gcc will link your program, by default, with the c library libc which implements printf:
$ ldd ./a.out
linux-vdso.so.1 (0x00007ffd5d7d3000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdf2d307000)
/lib64/ld-linux-x86-64.so.2 (0x00007fdf2d4f0000)
$ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep ' printf' | head -1
0000000000056cf0 T printf##GLIBC_2.2.5
If you build your program with -nolibc you have to satisfy a few symbols on your own (see
Compiling without libc):
$ gcc -nolibc ./1.c
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/10/../../../x86_64-linux-gnu/Scrt1.o: in function `_start':
(.text+0x12): undefined reference to `__libc_csu_fini'
/usr/bin/ld: (.text+0x19): undefined reference to `__libc_csu_init'
/usr/bin/ld: (.text+0x26): undefined reference to `__libc_start_main'
/usr/bin/ld: /tmp/user/1000/ccCFGFhf.o: in function `main':
1.c:(.text+0xc): undefined reference to `puts'
collect2: error: ld returned 1 exit status
You need to understand the difference between the compile and link phases of program compilation.
In the compilation phase you describe to the compiler the various things you intend to call that may be in this file, in other files or in libraries. This is done using function declarations.
int woodle(char*);
for example. This is what header files are full of.
If the function is in the same file then the compiler will work out how to call it while it compiles that file. But for other functions it leaves a note in the generated code that says
please wire up the woodle function here so I can call it.
Usually called an import and there are tools you can use to look at the imports in an object file - name depends on platform and toolset
The linkers job is to find those imports and resolve them. It will look at objects files passed on the command line, at libraries included on the command line and also standard libraries that the c standard says should be available to all programs.
In your printf case the linker found printf in the c standard library that the linker includes automatically.
BTW - the linker looks for 'exports' from objects and libraries, there are tools to look at those too. The linkers job is to match each 'import' to an 'export'
First, realize what the gcc program is. Technically, it is not a compiler, but a compiler driver. A compiler driver is responsible for driving the various other tools which perform compilation-related tasks. Some of the tools are found in PATH, whereas others are in internal compiler directories.
There are various ways to check what the driver is doing. I won't go into much detail about how I made the rest of this post, but briefly:
strace -f -e %process gcc is a Linux-specific way of showing all the programs executed (elsewhere in this answer, I assume Linux when specifying details but it doesn't matter)
gcc -v will dump out various information, but you have to learn what parts actually matter for whatever you are doing.
there exists a "specs" file that controls some of the argument-related stuff the driver does
Now for the actual data:
Here's the tree of processes that gcc might execute:
gcc, the "driver" (input various, output various. Some arguments are handled by the driver itself, but most are passed to the various subprocesses)
(these are repeated for every input file. If -pipe is passed, temporary files are omitted and processes are run in parallel; if --save-temps is passed, intermediate files are preserved):
cc1 -E -lang-asm, the "preprocessor" for assembly code (input .S, output .s - yes, case matters. Only relevant if you're trying to compile separate ASM files that need preprocessing)
cc1 -E, the "preprocessor" for C code (input .c; output .i. Only a separate process if -fno-integrated-cpp is passed, which is rare. Note that the cpp program in PATH is never called, even though it is provided by GCC - rather, it calls this. If -E is passed, the driver stops after this)
cc1, the "compiler" proper (input (usually) .c or (rarely) .i; output .s. If -S is passed, the driver stops after this; if -fsyntax-only is passed, this stage doesn't even complete)
(For other languages, replace cc1 with cc1plus, cc1d, cc1obj, f951, gnat1, etc. Note that the different drivers like g++, gdc, etc. only affect what extra libraries are linked by default)
as, the "assembler" (input .s; output .o. This is looked up in PATH; it is shipped as part of Binutils, not GCC. If -c is passed, the driver stops here)
collect2, the "linker" wrapper (supposedly this has something to do with constructors, and potentially calls ld twice, but in practice I've never seen it. Just think of it as forwarding all its arguments to ld, even if you have constructors normally)
ld, the "linker" proper (input .o or others (assumed to be libraries); output executable or shared library. Like as, this is actually part of Binutils, not GCC, so it is looked up in PATH)
The driver has a lot of logic, so it is important that you use it. Notably, you should never call as or ld yourself, since that will omit arguments that rely on the driver's sense of "exact current platform".
Now, getting to your specific question:
Ignoring irrevelant arguments and simplifying paths, the ld call ends up looking like:
ld -o foo Scrt1.o crti.o crtbeginS.o foo.o -lgcc -lgcc_s -lc -lgcc -lgcc_s crtendS.o crtn.o
The various "crt" loose object files are a mixture of parts of GLIBC and GCC, needed to support the C runtime (note that there are others as well; which are linked depends on arguments). The gcc and gcc_s libraries are needed to run code on the platform at all; they are repeated because they rely on the c library which also relies on them.
Since -lc is passed by default (regardless of language), the printf symbol can be resolved. Notably, -lm, -lrt, -lpthread and others are not passed by default, so other symbols from differents parts of the C library will not be resolved unless you pass them manually.
All of this is completely independent of what headers are included.
That your program compiles without a header present means that the compiler settings were lenient. You should still get a warning though. The reason that your program links is that the C standard library, which contains the code of the function printf, is linked automatically. Almost every C program needs it because input and output, or generally interaction with peripherals, which that library handles, are the general means of generating a "side effect", an effect outside the program. The opposite is so uncommon that one must make the wish to not link with it explicit.
So why does your compiler accept a call to a function which has not been declared?
C emerged at a time when programs were much smaller and software development as an engineering discipline didn't formally exist:
Four years later [i.e., in 1978], as a still-junior faculty member, I tried to get my colleagues [...] to create an undergraduate computer-science degree. A senior mechanical engineer of forbidding mien snorted surely not: Harvard had never offered a degree in automotive science, why would we create one in computer science? I waited until I had tenure before trying again (and succeeding) in 1982. -Harry R. Lewis
That was about 10 years after Denis Ritchie had started to develop this versatile new programming language, the successor to B. The problems involved in creating and maintaining large programs back then were simply not as pressing and not as well-understood as they are, perhaps, today.
Among the many things that help us today, at least in most compiled languages, is strong typing. Every identifier we use is declared with a static type. But the importance and benefits of that were not that obvious in the 1970s, and early C permitted mixing and matching integers and pointers at will. It's all numbers, right? And a function is just a name for a jump address, right? The user will know what to put on the stack, and the function will read it off the stack — I really don't see a problem here ;-). This attitude brought us functions like printf().
After this stage-setting we are slowly getting to the point. Because a function is just a jump address, no function declaration needed to be present in order to to call one. The assumed parameters were what you presented, and the presumed return type defaulted to int, which was often correct or at least didn't hurt. And for a long time C kept this backward compatibility. I think the C99 standard forbid the use of undeclared identifiers, and the standard drafts for C11 and C21 both say:
An identifier is a primary expression, provided it has been declared as designating an object (in which case it is an lvalue) or a function (in which case it is a function designator)91
Footnote 91 says "Thus, an undeclared identifier is a violation of the syntax." (All emphasis by me.)
All compilers I tried compile it anyway (with a warning), perhaps because some ancient code that still gets compiled frequently depends on it.

when do printf() and scanf() functions are linked statically or dynamically to application?

When a C program is compiled it under goes in the order of pre-processor,compiler,assembler,linker.
One of the main tasks for linker is to make code of library functions available to your program.
Linker can link them in two ways static or dynamically..
stdio.h contains only declarations,no definitions are present in it.
we only include stdio.h in program to say compiler about the return type and name of functions eg(printf(),scanf(),getc(),putc()...)..
Then how printf() and scanf() are linked in the example program below?
If it is linking dynamically which "DLL" is responsible in linking??
Is total "C" Library is linked dynamically to program??
#include "stdio.h"
int main()
{
int n;
printf("Enter an integer\n");
scanf("%d", &n);
if (n%2 == 0)
printf("Even\n");
else
printf("Odd\n");
return 0;
}
I think the question you are trying to ask is: “I know that functions like printf and scanf are implemented by the C runtime library. But I can use them without telling my compiler and/or IDE to link my program with the C runtime library. Why don’t I need to do that?”
The answer to that question is: “Programs that don’t need to be linked with the C runtime library are very, very rare. Even if you don’t explicitly use any library functions, you will still need the startup code, and the compiler might issue calls to memcpy, floating-point emulation functions, and so on ‘under the hood.’ Therefore, as a convenience, the compiler automatically links your program with the C runtime library, unless you tell it to not do that.”
You will have to consult the documentation for your compiler to learn how to tell it not to link in the C runtime library. GCC uses the -nostdlib command-line option. Below, I demonstrate the hoops you have to jump through to make that work...
$ cat > test.c
#include <stdio.h>
int main(void) { puts("hello world"); return 0; }
^D
$ gcc -nostdlib test.c && { ./a.out; echo $?; }
/usr/bin/ld: warning: cannot find entry symbol _start
/tmp/cc8svIx5.o: In function ‘main’:
test.c:(.text+0xa): undefined reference to ‘puts’
collect2: error: ld returned 1 exit status
puts is obviously in the C library, but so is this mysterious "entry symbol _start". Turn off the C library and you have to provide that yourself, too...
$ cat > test.c
int _start(void) { return 0; }
^D
$ gcc -nostdlib test.c && { ./a.out; echo $?; }
Segmentation fault
139
It links now, but we get a segmentation fault, because _start has nowhere to return to! The operating system expects it to call _exit. OK, let's do that...
$ cat > test.c
extern void _exit(int);
void _start(void) { _exit(0); }
^D
$ gcc -nostdlib test.c && { ./a.out; echo $?; }
/tmp/ccuDrMQ9.o: In function `_start':
test.c:(.text+0xa): undefined reference to `_exit'
collect2: error: ld returned 1 exit status
... nuts, _exit is a function in the C runtime library, too! Raw system call time...
$ cat > test.c
#include <unistd.h>
#include <sys/syscall.h>
void _start(void) { syscall(SYS_exit, 0); }
^D
$ gcc -nostdlib test.c && { ./a.out; echo $?; }
/tmp/cchtZnbP.o: In function `_start':
test.c:(.text+0x14): undefined reference to `syscall'
collect2: error: ld returned 1 exit status
... nope, syscall is also a function in the C runtime. I guess we just have to use assembly!
$ cat > test.S
#include <sys/syscall.h>
.text
.globl _start
.type _start, #function
_start:
movq $SYS_exit, %rax
movq $0, %rdi
syscall
$ gcc -nostdlib test.S && { ./a.out; echo $?; }
0
And that, finally, works. On my computer. It wouldn't work on a different operating system, with a different assembly-level convention for system calls.
You might now be wondering what the heck -nostdlib is even good for, if you have to drop down to assembly language just to make system calls. It's intended to be used when compiling completely self-contained, low-level system programs like the bootloader, the kernel, and (parts of) the C runtime itself — things that were going to have to implement their own everything anyway.
If we had it to do all over again from scratch, it might well make sense to separate out a low-level language-independent runtime, with just the syscall wrappers, language-independent process startup code, and the functions that any language's compiler might need to call "under the hood" (memcpy, _Unwind_RaiseException, __muldi3, that sort of thing). The problem with that idea is it rapidly suffers mission creep — do you include errno? Generic threading primitives? (Which ones, with which semantics?) The dynamic linker? An implementation of malloc, which several of the above things need? Windows's ntdll.dll began as this concept, and it's 1.8MB on disk in Windows 10, which is (slightly) bigger than libc.so + ld.so on my Linux partition. And it's rare and difficult to write a program that only uses ntdll.dll, even if you're Microsoft (the only example I'm sure of is csrss.exe, which might as well be a kernel component).
Generally, standard C libraries are linked dynamically. This is mainly because of the reasons that once a program has been statically linked, the code in it is fixed forever. If someone finds and fixes a bug in printf or scanf, then every program has to be linked again in order to pick up the fixed code.
In case of dynamic linking, none of the executable files (created after linking) contains a copy of the code for printf or scanf. If a new, fixed, version of printf is available, then it is picked up at run time.
-static-libstdc++
When the g++ program is used to link a C++ program, it normally
automatically links against libstdc++. If libstdc++ is available as a
shared library, and the -static option is not used, then this links
against the shared version of libstdc++. That is normally fine.
However, it is sometimes useful to freeze the version of libstdc++
used by the program without going all the way to a fully static link.
The -static-libstdc++ option directs the g++ driver to link libstdc++
statically, without necessarily linking other libraries statically.
For more details, please check this thread.
How can i statically link standard library to my c++ program?
They are linked statically, that way your program will be able to determine if there are any compilation errors before preceding to run the program.

Can printf get replaced by puts automatically in a C program?

#include <stdio.h>
int puts(const char* str)
{
return printf("Hiya!\n");
}
int main()
{
printf("Hello world.\n");
return 0;
}
This code outputs "Hiya!" when run. Could someone explain why?
The compile line is:
gcc main.c
EDIT: it's now pure C, and any extraneous stuff has been removed from the compile line.
Yes, a compiler may replace a call to printf by an equivalent call to puts.
Because you defined your own function puts with the same name as a standard library function, your program's behavior is undefined.
Reference: N1570 7.1.3:
All identifiers with external linkage in any of the following subclauses [this includes puts] are always reserved for use as identifiers with external linkage.
...
If the program declares or defines an identifier in a
context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved
identifier as a macro name, the behavior is undefined.
If you remove your own puts function and examine an assembly listing, you might find a call to puts in the generated code where you called printf in the source code. (I've seen gcc perform this particular optimization.)
It depends upon the compiler and the optimization level. Most recent versions of GCC, on some common systems, with some optimizations, are able to do such an optimization (replacing a simple printf with puts, which AFAIU is legal w.r.t. standards like C99)
You should enable warnings when compiling (e.g. try first to compile with gcc -Wall -g, then debug with gdb, then when you are confident with your code compile it with gcc -Wall -O2)
BTW, redefining puts is really really ugly, unless you do it on purpose (i.e. are coding your own C library, and then you have to obey to the standards). You are getting some undefined behavior (see also this answer about possible consequences of UB). Actually you should avoid redefining names mentioned in the standard, unless you really really know well what you are doing and what is happening inside the compiler.
Also, if you compiled with static linking like gcc -Wall -static -O main.c -o yourprog I'll bet that the linker would have complained (about multiple definition of puts).
But IMNSHO your code is plain wrong, and you know that.
Also, you could compile to get the assembler, e.g. with gcc -fverbose-asm -O -S; and you could even ask gcc to spill a lot of "dump" files, with gcc -fdump-tree-all -O which might help you understanding what gcc is doing.
Again, this particular optimization is valid and very useful : the printf routine of any libc has to "interpret" at runtime the print format string (handling %s etc ... specially); this is in practice quite slow. A good compiler is right in avoiding calling printf (and replacing with puts) when possible.
BTW gcc is not the only compiler doing that optimization. clang also does it.
Also, if you compile with
gcc -ffreestanding -O2 almo.c -o almo
the almo program shows Hello world.
If you want another fancy and surprizing optimization, try to compile
// file bas.c
#include <stdlib.h>
int f (int x, int y) {
int r;
int* p = malloc(2*sizeof(int));
p[0] = x;
p[1] = y;
r = p[0]+p[1];
free (p);
return r;
}
with gcc -O2 -fverbose-asm -S bas.c then look into bas.s; you won't see any call to malloc or to free (actually, no call machine instruction is emitted) and again, gcc is right to optimize (and so does clang)!
PS: Gnu/Linux/Debian/Sid/x86-64; gcc is version 4.9.1, clang is version 3.4.2
Try ltrace on your executable. You will see that printf gets replaced by puts call by the compiler. This depends on the way you called printf
An interesting reading on this is here
Presumably, your library's printf() calls puts ().
Your puts() is replacing the library version.

A C Project containing both C Code and Assembly Code

I have a sample project that contains C code and Assembly Code
There are Main.c, Main.h and convert.S.
Inside the assembly code convert.S there is the following code:
.global
.section .bss
.section .text
.global _FIL_2ORD
_FIL_2ORD:
inside the convert.h file:
extern int FIL_2ORD(
tFIL2HISTORY *history;
tFIL2COEFF *coeff;
int input;
);
Inside the Main.c function if it calls FIL_2ORD(); then would it be resolved through the function inside the assembly code as declared in convert.h file?
My question is whether the assembly code would get compiled and linked, and whenever the main.c calls the function would it be referenced and resolved?
Compile the C, assemble the ASM, and link the two together into an executable. The linker will find FIL_2ORD() inside the ASM's object file after it sees that the C's object file needs it.
The object files are created by the C compiler and the Assembler for each source file respectively.
My question is whether the assembly code would get compiled and
linked, and whenever the main.c calls the function would it be
referenced and resolved?
I'm assuming you're using the GCC compiler - Yes the .global directive in the assembly file makes the _FIL_2ORD symbol public to the linker, so it will become callable from outside the assembly source code.
This is an example of how you can compile, assemble and link using the command-line
gcc -o myexe Main.c convert.S
The extern declaration in convert.h is hinting the C compiler on what parameters the external function is expecting. The assembly source code should honor this declaration. You should look-up the standard C calling convention of your target platform to see the rules of how parameters are passed, and write your assembly code accordingly.
Depending on the target platform, the leading underscore char in the _FIL_2ORD declaration(inside convert.S) may or may not be necessary (this is part of the platform-specific C calling convention I referred to in the previous paragraph). If the program fails to link, try again, this time removing the leading underscore.

Compiling without libc

I want to compile my C-code without the (g)libc. How can I deactivate it and which functions depend on it?
I tried -nostdlib but it doesn't help: The code is compilable and runs, but I can still find the name of the libc in the hexdump of my executable.
If you compile your code with -nostdlib, you won't be able to call any C library functions (of course), but you also don't get the regular C bootstrap code. In particular, the real entry point of a program on Linux is not main(), but rather a function called _start(). The standard libraries normally provide a version of this that runs some initialization code, then calls main().
Try compiling this with gcc -nostdlib -m32:
// Tell the compiler incoming stack alignment is not RSP%16==8 or ESP%16==12
__attribute__((force_align_arg_pointer))
void _start() {
/* main body of program: call main(), etc */
/* exit system call */
asm("movl $1,%eax;"
"xorl %ebx,%ebx;"
"int $0x80"
);
__builtin_unreachable(); // tell the compiler to make sure side effects are done before the asm statement
}
The _start() function should always end with a call to exit (or other non-returning system call such as exec). The above example invokes the system call directly with inline assembly since the usual exit() is not available.
The simplest way to is compile the C code to object files (gcc -c to get some *.o files) and then link them directly with the linker (ld). You will have to link your object files with a few extra object files such as /usr/lib/crt1.o in order to get a working executable (between the entry point, as seen by the kernel, and the main() function, there is a bit of work to do). To know what to link with, try linking with the glibc, using gcc -v: this should show you what normally comes into the executable.
You will find that gcc generates code which may have some dependencies to a few hidden functions. Most of them are in libgcc.a. There may also be hidden calls to memcpy(), memmove(), memset() and memcmp(), which are in the libc, so you may have to provide your own versions (which is not hard, at least as long as you are not too picky about performance).
Things might get clearer at times if you look at the produced assembly (use the -S flag).

Resources