This code compiles, but no surprises, it fails while linking (no main found):
Listing 1:
void main();
Link error: \mingw\lib\libmingw32.a(main.o):main.c:(.text+0x106) undefined reference to _WinMain#16'
But, the code below compiles and links fine, with a warning:
Listing 2:
void (*main)();
warning: 'main' is usually a function
Questions:
In listing 1, linker should have
complained for missing "main". Why
is it looking for _WinMain#16?
The executable generated from
listing 2 simply crashes. What is
the reason?
Thanks for your time.
True, main doesn't need to be a function. This has been exploited in some obfuscated programs that contain binary program code in an array called main.
The return type of main() must be int (not void). If the linker is looking for WinMain, it thinks that you have a GUI application.
In most C compilation systems, there is no type information associated with symbols that are linked. You could declare main as e.g.:
char main[10];
and the linker would be perfectly happy. As you noted, the program would probably crash, uless you cleverly initialized the contents of the array.
Your first example doesn't define main, it just declares it, hence the linker error.
The second example defines main, but incorrectly.
Case 1. is Windows-specific - the compiler probably generates _WinMain symbol when main is properly defined.
Case 2. - you have a pointer, but as static variable it's initialized to zero, thus the crash.
On Windows platforms the program's main unit is WinMain if you don't set the program up as a console app. The "#16" means it is expecting 16 bytes of parameters. So the linker would be quite happy with you as long as you give it a function named WinMain with 16 bytes of parameters.
If you wanted a console app, this is your indication that you messed something up.
You declared a pointer-to-function named main, and the linker warned you that this wouldn't work.
The _WinMain message has to do with how Windows programs work. Below the level of the C runtime, a Windows executable has a WinMain.
Try redefining it as int main(int argc, char *argv[])
What you have is a linker error. The linker expects to find a function with that "signature" - not void with no parameters
See http://publications.gbdirect.co.uk/c_book/chapter10/arguments_to_main.html etc
In listing 1, you are saying "There's a main() defined elsewhere in my code --- I promise!". Which is why it compiles. But you are lying there, which is why the link fails. The reason you get the missing WinMain16 error, is because the standard libraries (for Microsoft compiler) contain a definition for main(), which calls WinMain(). In a Win32 program, you'd define WinMain() and the linker would use the library version of main() to call WinMain().
In Listing 2, you have a symbol called main defined, so both the compiler & the linker are happy, but the startup code will try to call the function that's at location "main", and discover that there's really not a function there, and crash.
1.) An (compiler/platform) dependent function is called before code in main is executed and hence your behavior(_init in case of linux/glibc).
2) The code crash in 2nd case is justified as the system is unable to access the contents of the symbol main as a function which actually is a function pointer pointing to arbitrary location.
Related
Been working on my own language for some time now and I've been implementing implicit and explicit casts. Went back to check what happens when invoking them. Everything went well, until I tried to invoke a cast from a double. Passing doubles to functions for some reason results in clang giving me an error when linking the obj file to the cpp file that invokes it, namely:"clang.exe: error: linker command failed with exit code 1143 (use -v to see invocation)program.obj : fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x5"
I've then noticed this also happens with floats, but does not occur with integers and the likes. I've written a short function (takes a double by value and returns it) to check whether the problem was related to parameters of types float/double. The error only occurs when invoking the function, when defining the function but not invoking it, everything functioned properly. I've also tried to simply store a double literal in memory (alloca) and that worked as well, so the problem has to be with actually passing the argument to the function.
Simple LLVM IR for a function that takes a double by value and returns it, then being invoked from main (main is names _mainCRTStartup as to not confuse it with the symbol for the main function from the cpp file that then invokes it (extern "C"))
Thanks for any help in advance! :D
I am trying to teach myself some C, and I am stuck almost right off the bat in xCode. Currently I have a folder structure that looks like this:
main.c looks like this:
#include <stdio.h>
int main(void)
{
int dogs;
printf("How many dogs do you have?\n");
scanf("%d", &dogs);
printf("So you have %d dog(s)! \n", dogs);
return 0;
}
concrete.c looks like this:
#include <stdio.h>
int main(void)
{
printf("Concrete contains gravel and cement.\n");
return 0;
}
No matter which one I run via Product->Run I get the same error:
Any ideas what is causing this? Coming from Python background only so these errors and xCode in general are totally new to me
EDIT- deleting the concrete.c file fixed the issue, but that doesn't seem reasonable to me. Is that because I had 2 files with int main(void) in it?
When you click on Xcode’s display of “Linker command failed with exit code 1”, it should show you the portion of the build log where the linker is displaying messages and exiting with code 1.
These messages include:
duplicate symbol _main in:
[Some path on your system…]/Build/Intermediates.noindex/foo.build/Debug/foo.build/Objects-normal/x86_64/foo.o
[Some path on your system…]/Build/Intermediates.noindex/foo.build/Debug/foo.build/Objects-normal/x86_64/main.o
ld: 1 duplicate symbol for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
That message “duplicate symbol _main” tells you the problem: You have two definitions of main. main is the name for the unique routine that is called to start your program, and you should have only one definition for it.
Supplement
By default, in C, function names have external linkage, meaning the same name in different translation units (source files, when used normally) refer to the same thing. So there cannot be two functions with the same name—any name, not just main—because then you have two things with the same name. You can give a function internal linkage instead of external by declaring it with static, and then that name will not conflict with the same name in other translation units.
Nonetheless, defining a static version of main may result in compiler complaints, both that main should not be declared with static and that it ought to be declared to return int, if you declared it otherwise. These stem from text in the C standard, at C 2018 5.1.2.2.1 1:
The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters: … or with two parameters (referred to here as argc and argv, though any names may be used, as they are local to the function in which they are declared): …
Although that text restricts the declaration of main, it seems to me that it could be interpreted as referring to the main with external linkage, not to any routine that happens to be declared main. Nonetheless, it is generally bad practice to reuse well-known special-purpose names for other purposes.
I saw a snippet of code on CodeGolf that's intended as a compiler bomb, where main is declared as a huge array. I tried the following (non-bomb) version:
int main[1] = { 0 };
It seems to compile fine under Clang and with only a warning under GCC:
warning: 'main' is usually a function [-Wmain]
The resulting binary is, of course, garbage.
But why does it compile at all? Is it even allowed by the C specification? The section that I think is relevant says:
5.1.2.2.1 Program startup
The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters [...] or with two parameters [...] or in some other implementation-defined manner.
Does "some other implementation-defined manner" include a global array? (It seems to me that the spec still refers to a function.)
If not, is it a compiler extension? Or a feature of the toolchains, that serves some other purpose and they decided to make it available through the frontend?
It's because C allows for "non-hosted" or freestanding environment which doesn't require the main function. This means that the name main is freed for other uses. This is why the language as such allows for such declarations. Most compilers are designed to support both (the difference is mostly how linking is done) and therefore they don't disallow constructs that would be illegal in hosted environment.
The section you refers to in the standard refers to hosted environment, the corresponding for freestanding is:
in a freestanding environment (in which C program execution may take place without any
benefit of an operating system), the name and type of the function called at program
startup are implementation-defined. Any library facilities available to a freestanding
program, other than the minimal set required by clause 4, are implementation-defined.
If you then link it as usual it will go bad since the linker normally has little knowledge about the nature of the symbols (what type it has or even if it's a function or variable). In this case the linker will happily resolve calls to main to the variable named main. If the symbol is not found it will result in link error.
If you're linking it as usual you're basically trying to use the compiler in hosted operation and then not defining main as you're supposed to means undefined behavior as per appendix J.2:
the behavior is undefined in the following circumstances:
...
program in a hosted environment does not define a function named
main
using one
of the specified forms (5.1.2.2.1)
The purpose of the freestanding possibility is to be able to use C in environments where (for example) standard libraries or CRT initialization is not given. This means that the code that is run before main is called (that's the CRT initialization that initializes the C runtime) might not provided and you would be expected to provide that yourself (and you may decide to have a main or may decide not to).
If you are interested how to create program in main array: https://jroweboy.github.io/c/asm/2015/01/26/when-is-main-not-a-function.html. The example source there just contains a char (and later int) array called main which is filled with machine instructions.
The main steps and problems were:
Obtain the machine instructions of a main function from a gdb memory dump and copy it into the array
Tag the data in main[] executable by declaring it const (data is apparently either writable or executable)
Last detail: Change an address for actual string data.
The resulting C code is just
const int main[] = {
-443987883, 440, 113408, -1922629632,
4149, 899584, 84869120, 15544,
266023168, 1818576901, 1461743468, 1684828783,
-1017312735
};
but results in an executable program on a 64 bit PC:
$ gcc -Wall final_array.c -o sixth
final_array.c:1:11: warning: ‘main’ is usually a function [-Wmain]
const int main[] = {
^
$ ./sixth
Hello World!
The problem is that main is not a reserved identifier. The C standard only says that in hosted systems there is usually a function called main. But nothing in the standard prevents you from abusing the same identifier for other sinister purposes.
GCC gives you a smug warning "main is usually a function", hinting that the use of the identifier main for other unrelated purposes isn't a brilliant idea.
Silly example:
#include <stdio.h>
int main (void)
{
int main = 5;
main:
printf("%d\n", main);
main--;
if(main)
{
goto main;
}
else
{
int main (void);
main();
}
}
This program will repeatedly print the numbers 5,4,3,2,1 until it gets a stack overflow and crashes (don't try this at home). Unfortunately, the above program is a strictly conforming C program and the compiler can't stop you from writing it.
main is - after compiling - just another symbol in an object file like many others (global functions, global variables, etc).
The linker will link the symbol main regardless of its type. Indeed, the linker cannot see the type of the symbol at all (he can see, that it isn't in the .text-section however, but he doesn't care ;))
Using gcc, the standard entry point is _start, which in turn calls main() after preparing the runtime environment. So it will jump to the address of the integer array, which usually will result in a bad instruction, segfault or some other bad behaviour.
This all of course has nothing to do with the C-standard.
It only compiles because you don't use the proper options (and works because linkers sometimes only care for the names of symbols, not their type).
$ gcc -std=c89 -pedantic -Wall x.c
x.c:1:5: warning: ISO C forbids zero-size array ‘main’ [-Wpedantic]
int main[0];
^
x.c:1:5: warning: ‘main’ is usually a function [-Wmain]
const int main[1] = { 0xc3c3c3c3 };
This compiles and executes on x86_64... does nothing just return :D
I'm looking at example abo3.c from Insecure Programming and I'm not grokking the casting in the example below. Could someone enlighten me?
int main(int argv,char **argc)
{
extern system,puts;
void (*fn)(char*)=(void(*)(char*))&system;
char buf[256];
fn=(void(*)(char*))&puts;
strcpy(buf,argc[1]);
fn(argc[2]);
exit(1);
}
So - what's with the casting for system and puts? They both return an int so why cast it to void?
I'd really appreciate an explanation of the whole program to put it in perspective.
[EDIT]
Thank you both for your input!
Jonathan Leffler, there is actually a reason for the code to be 'bad'. It's supposed to be exploitable, overflowing buffers and function pointers etc. mishou.org has a blog post on how to exploit the above code. A lot of it is still above my head.
bta, I gather from the above blog post that casting system would somehow prevent the linker from removing it.
One thing that is not immediately clear is that the system and puts addresses are both written to the same location, I think that might be what gera is talking about “so the linker doesn’t remove it”.
While we are on the subject of function pointers, I'd like to ask a follow-up question now that the syntax is clearer. I was looking at some more advanced examples using function pointers and stumbled upon this abomination, taken from a site hosting shellcode.
#include <stdio.h>
char shellcode[] = "some shellcode";
int main(void)
{
fprintf(stdout,"Length: %d\n",strlen(shellcode));
(*(void(*)()) shellcode)();
}
So the array is getting cast to a function returning void, referenced and called? That just looks nasty - so what's the purpose of the above code?
[/EDIT]
Original question
User bta has given a correct explanation of the cast - and commented on the infelicity of casting system.
I'm going to add:
The extern line is at best weird. It is erroneous under strict C99 because there is no type, which makes it invalid; under C89, the type will be assumed to be int. The line says 'there is an externally defined integer called system, and another called puts', which is not correct - there are a pair of functions with those names. The code may actually 'work' because the linker might associate the functions with the supposed integers. But it is not safe for a 64-bit machine where pointers are not the same size as int. Of course, the code should include the correct headers (<stdio.h> for puts() and <stdlib.h> for system() and exit(), and <string.h> for strcpy()).
The exit(1); is bad on two separate counts.
It indicates failure - unconditionally. You exit with 0 or EXIT_SUCCESS to indicate success.
In my view, it is better to use return at the end of main() than exit(). Not everyone necessarily agrees with me, but I do not like to see exit() as the last line of main(). About the only excuse for it is to avoid problems from other bad practices, such as functions registered with atexit() that depend on the continued existence of local variables defined in main().
/usr/bin/gcc -g -std=c99 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -c nasty.c
nasty.c: In function ‘main’:
nasty.c:3: warning: type defaults to ‘int’ in declaration of ‘system’
nasty.c:3: warning: type defaults to ‘int’ in declaration of ‘puts’
nasty.c:3: warning: built-in function ‘puts’ declared as non-function
nasty.c:8: warning: implicit declaration of function ‘strcpy’
nasty.c:8: warning: incompatible implicit declaration of built-in function ‘strcpy’
nasty.c:10: warning: implicit declaration of function ‘exit’
nasty.c:10: warning: incompatible implicit declaration of built-in function ‘exit’
nasty.c: At top level:
nasty.c:1: warning: unused parameter ‘argv’
Not good code! I worry about a source of information that contains such code and doesn't explain all the awfulness (because the only excuse for showing such messy code is to dissect it and correct it).
There's another weirdness in the code:
int main(int argv,char **argc)
That is 'correct' (it will work) but 100% aconventional. The normal declaration is:
int main(int argc, char **argv)
The names are short for 'argument count' and 'argument vector', and using argc as the name for the vector (array) of strings is abnormal and downright confusing.
Having visited the site referenced, you can see that it is going through a set of graduated examples. I'm not sure whether the author simply has a blind spot on the argc/argv issue or is deliberately messing around ('abo1' suggests that he is playing, but it is not helpful in my view). The examples are supposed to feed your mind, but there isn't much explanation of what they do. I don't think I could recommend the site.
Extension question
What's the cast in this code doing?
#include <stdio.h>
char shellcode[] = "some shellcode";
int main(void)
{
fprintf(stdout,"Length: %d\n",strlen(shellcode));
(*(void(*)()) shellcode)();
}
This takes the address of the string 'shellcode' and treats it as a pointer to a function that takes an indeterminate set of arguments and returns no values and executes it with no arguments. The string contains the binary assembler code for some exploit - usually running the shell - and the objective of the intruder is to get a root-privileged program to execute their shellcode and give them a command prompt, with root privileges. From there, the system is theirs to own. For practicing, the first step is to get a non-root program to execute the shellcode, of course.
Reviewing the analysis
The analysis at Mishou's web site is not as authoritative as I'd like:
One, this code uses the extern keyword in the C language to make the system and puts functions available. What this does (I think) is basically references directly the location of a function defined in the (implied) header files…I get the impression that GDB is auto-magically including the header files stdlib.h for system and stdio.h for puts. One thing that is not immediately clear is that the system and puts addresses are both written to the same location, I think that might be what gera is talking about “so the linker doesn’t remove it”.
Dissecting the commentary:
The first sentence isn't very accurate; it tells the compiler that the symbols system and puts are defined (as integers) somewhere else. When the code is linked, the address of puts()-the-function is known; the code will treat it as an integer variable, but the address of the integer variable is, in fact, the address of the function - so the cast forces the compiler to treat it as a function pointer after all.
The second sentence is not fully accurate; the linker resolves the addresses of the external 'variables' via the function symbols system() and puts() in the C library.
GDB has nothing whatsoever to do the compilation or linking process.
The last sentence does not make any sense at all. The addresses only get written to the same location because you have an initialization and an assignment to the same variable.
This didn't motivate me to read the whole article, it must be said. Due diligence forces me onwards; the explanation afterwards is better, though still not as clear as I think it could be. But the operation of overflowing the buffer with an overlong but carefully crafted argument string is the core of the operation. The code mentions both puts() and system() so that when run in non-exploit mode, the puts() function is a known symbol (otherwise, you'd have to use dlopen() to find its address), and so that when run in exploit mode, the code has the symbol system() available for direct use. Unused external references are not made available in the executable - a good thing when you realize how many symbols there are in a typical system header compared with the number used by a program that includes the header.
There are some neat tricks shown - though the implementation of those tricks is not shown on the specific page; I assume (without having verified it) that the information for getenvaddr program is available.
The abo3.c code can be written as:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
void (*fn)(char*) = (void(*)(char*))system;
char buf[256];
fn = (void(*)(char*))puts;
strcpy(buf, argv[1]);
fn(argv[2]);
exit(1);
}
Now it compiles with only one warning with the fussy compilation options I originally used - and that's the accurate warning that 'argc' is not used. It is just as exploitable as the original; it is 'better' code though because it compiles cleanly. The indirections were unnecessary mystique, not a crucial part of making the code exploitable.
Both system and puts normally return int. The code is casting them to a pointer that returns void, presumably because they want to ignore whatever value is returned. This should be equivalent to using (void)fn(argc[2]); as the penultimate line if the cast didn't change the return type. Casting away the return type is sometimes done for callback functions, and this code snippet seems to be a simplistic example of a callback.
Why the cast for system if it is never used is beyond me. I'm assuming that there's more code that isn't shown here.
How does the compiler know the prototype of sleep function or even printf function, when I did not include any header file in the first place?
Moreover, if I specify sleep(1,1,"xyz") or any arbitrary number of arguments, the compiler still compiles it.
But the strange thing is that gcc is able to find the definition of this function at link time, I don't understand how is this possible, because actual sleep() function takes a single argument only, but our program mentioned three arguments.
/********************************/
int main()
{
short int i;
for(i = 0; i<5; i++)
{
printf("%d",i);`print("code sample");`
sleep(1);
}
return 0;
}
Lacking a more specific prototype, the compiler will assume that the function returns int and takes whatever number of arguments you provide.
Depending on the CPU architecture arguments can be passed in registers (for example, a0 through a3 on MIPS) or by pushing them onto the stack as in the original x86 calling convention. In either case, passing extra arguments is harmless. The called function won't use the registers passed in nor reference the extra arguments on the stack, but nothing bad happens.
Passing in fewer arguments is more problematic. The called function will use whatever garbage happened to be in the appropriate register or stack location, and hijinks may ensue.
In classic C, you don't need a prototype to call a function. The compiler will infer that the function returns an int and takes a unknown number of parameters. This may work on some architectures, but it will fail if the function returns something other than int, like a structure, or if there are any parameter conversions.
In your example, sleep is seen and the compiler assumes a prototype like
int sleep();
Note that the argument list is empty. In C, this is NOT the same as void. This actually means "unknown". If you were writing K&R C code, you could have unknown parameters through code like
int sleep(t)
int t;
{
/* do something with t */
}
This is all dangerous, especially on some embedded chips where the way parameters are passed for a unprototyped function differs from one with a prototype.
Note: prototypes aren't needed for linking. Usually, the linker automatically links with a C runtime library like glibc on Linux. The association between your use of sleep and the code that implements it happens at link time long after the source code has been processed.
I'd suggest that you use the feature of your compiler to require prototypes to avoid problems like this. With GCC, it's the -Wstrict-prototypes command line argument. In the CodeWarrior tools, it was the "Require Prototypes" flag in the C/C++ Compiler panel.
C will guess int for unknown types. So, it probably thinks sleep has this prototype:
int sleep(int);
As for giving multiple parameters and linking...I'm not sure. That does surprise me. If that really worked, then what happened at run-time?
This is to do with something called 'K & R C' and 'ANSI C'.
In good old K & R C, if something is not declared, it is assumed to be int.
So any thing that looks like a function call, but not declared as function
will automatically take return value of 'int' and argument types depending
on the actuall call.
However people later figured out that this can be very bad sometimes. So
several compilers added warning. C++ made this error. I think gcc has some
flag ( -ansic or -pedantic? ) , which make this condition an error.
So, In a nutshell, this is historical baggage.
Other answers cover the probable mechanics (all guesses as compiler not specified).
The issue that you have is that your compiler and linker have not been set to enable every possible error and warning. For any new project there is (virtually) no excuse for not doing so. for legacy projects more excuse - but should strive to enable as many as possible
Depends on the compiler, but with gcc (for example, since that's the one you referred to), some of the standard (both C and POSIX) functions have builtin "compiler intrinsics". This means that the compiler library shipped with your compiler (libgcc in this case) contains an implementation of the function. The compiler will allow an implicit declaration (i.e., using the function without a header), and the linker will find the implementation in the compiler library because you're probably using the compiler as a linker front-end.
Try compiling your objects with the '-c' flag (compile only, no link), and then link them directly using the linker. You will find that you get the linker errors you expect.
Alternatively, gcc supports options to disable the use of intrinsics: -fno-builtin or for granular control, -fno-builtin-function. There are further options that may be useful if you're doing something like building a homebrew kernel or some other kind of on-the-metal app.
In a non-toy example another file may include the one you missed. Reviewing the output from the pre-processor is a nice way to see what you end up with compiling.