I'm reading the Shellcoder's Handbook and trying to follow along with their simple example of overflowing buffers on the stack, but I'm stuck.
I'm running GCC on Windows, and before a function call, instead of pushing the arguments onto the stack as the book says it should, it just moves the values into registers and then makes the call. The book is using Linux, I assume; does it use a different calling convention than Windows? How would I get the Linux behavior?
Also, when a program accepts user input, how do I feed data into the program so that it shows up in gdb?
It looks like your book is assuming the cdecl calling convention on an IA32 platform, but that your compiler is using a different calling convention that puts parameters in registers. Are you using an AMD64 platform by any chance? The standard for AMD64 is to put the first n arguments in registers and only additional arguments on the stack (Windows only uses four registers for parameters; every other common platform uses six).
More information on calling conventions: https://en.wikipedia.org/wiki/X86_calling_conventions
If you add a bunch of additional function parameters before the ones that you care about, you should get the last ones on the stack. Alternately, if you compile as 32-bit instead of 64-bit, you might get what you're looking for.
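A minimal sketch of the "extra parameters" idea, assuming GCC; the function name sum8 is made up for illustration. Compiling this file with gcc -S for a 64-bit target should show the first four (Windows) or six (most other platforms) arguments going into registers and the rest onto the stack. Compiling with gcc -m32 -S, assuming your toolchain has 32-bit support, should show every argument placed on the stack, although newer GCC versions may store them with mov rather than push.

```c
/* Hypothetical helper: eight integer parameters, which is more than
   either common 64-bit convention passes in registers, so the last
   ones have to travel on the stack. */
int sum8(int a, int b, int c, int d, int e, int f, int g, int h)
{
    return a + b + c + d + e + f + g + h;
}
```

Calling sum8(1, 2, 3, 4, 5, 6, 7, 8) from main and reading the generated assembly around the call is enough to see where each argument ends up.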
I am wondering how mixing C and assembly is possible when compilers generate code in different ways. For example, many C compilers pass arguments in registers rather than pushing them onto the stack when making a function call, and the called function then moves those registers into the appropriate memory locations. What happens if you write assembly code, or link with an object file created by a different compiler, that calls the C function but pushes the arguments onto the stack instead of setting the registers?
My guess is that the C compiler's assembly output handles this in such a clever way that it doesn't make a difference and still works, but I can't be sure; looking at the assembly code, it doesn't appear that it would work.
Can anyone answer this? I am writing a compiler and need to know so that I don't make any mistakes should I want to link with a C module in the future.
The conventions that are used for calling functions are part of what's called the "application binary interface" (ABI). If this interface is specified, then all code that follows the specification can be linked together.
There is no standard ABI for C. However, most popular platforms have one prevailing C compiler that effectively produces a de-facto standard ABI (e.g. there's one for Windows, one for Linux on x86 (32 and 64 bit), one for Linux on ARM, etc.). ABIs may specify a large number of separate "calling conventions", and your C compiler will typically let you specify the desired convention at the point of function declaration using some vendor extension.
Conversely, if there is no documented ABI for your C compiler, or for an existing bit of object code, then you cannot in general link (or otherwise interact) with it successfully.
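As a rough illustration of such a vendor extension, here is a sketch that assumes GCC or Clang targeting x86-64, where the ms_abi and sysv_abi function attributes select the Windows or the System V 64-bit convention per function. The macro and function names are made up for this example, and on other compilers or targets the macros expand to nothing, so the default convention is used.

```c
/* Assumption: GCC/Clang on x86-64. Elsewhere the macros are empty and
   both functions simply use the platform's default convention. */
#if defined(__GNUC__) && defined(__x86_64__)
#define WIN64_CC __attribute__((ms_abi))
#define SYSV_CC  __attribute__((sysv_abi))
#else
#define WIN64_CC
#define SYSV_CC
#endif

/* Same source, two calling conventions: callers and callee agree
   because both see the attribute on the declaration. */
WIN64_CC long add_win64(long a, long b) { return a + b; }
SYSV_CC  long add_sysv(long a, long b)  { return a + b; }
```

A compiler or assembly module that wants to call one of these only needs to follow the convention named in the declaration, which is the sense in which a specified ABI makes linking possible.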
Since the system calls that any C library function (say printf()) makes are OS-dependent, does that imply that we have a different printf() function for each OS?
It depends on your definition of "different", because I can think of at least three levels of difference:
Interface differences
High-level code differences
Machine code differences
The C standard suggests an interface, and this interface is supposed to be respected across the board. This means that for any OS with a C standard library, the OS should show your program an outlet called printf, and if your program plugs into it, it can expect it to behave as documented. This means that for all you're concerned, printf is the same across the board.
This doesn't mean that printf has to be the same piece of code in every standard library. If someone told me to write a printf function and told you to write a printf function, we could have a different approach, and that would still be fine as long as we both respected the documented behavior. As a matter of fact, for copyright reasons, you can be certain that the code for Windows's printf is different from Linux's printf code.
And finally, even with the same source code, printf would have to be different to accommodate platform differences. You can't expect an x86 printf to work on ARM, for instance. And as you noted, you can't expect a Linux printf to work on Windows because of platform conventions and system call differences.
So the machine code behind the printf outlet will be different, but the point of the standard is to make it work the same.
If you mean "does printf behave differently on different OSes", then the answer is:
Externally (from the viewpoint of the user of the function), no: its semantics are standardized. That means that a given call to the function leads to the same results, whatever the OS.
Internally, probably: its implementation is free. That means that the computation the function actually performs to produce the result can be different.
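To make the "same interface" point concrete, here is a small sketch using snprintf, a standard member of the printf family. The helper name format_matches is made up for illustration. Whatever machine code sits behind the call, a conforming C library should produce the same characters and return value.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the library formats the values as
   the C standard documents, regardless of how it is implemented. */
int format_matches(void)
{
    char buf[32];
    /* snprintf returns the number of characters written, not counting
       the terminating null byte. */
    int n = snprintf(buf, sizeof buf, "%d-%s", 42, "ok");
    return n == 5 && strcmp(buf, "42-ok") == 0;
}
```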
I'm teaching myself Linux assembly language and I've come across an interesting difference between BSD and Linux. In BSD, you push your syscall parameters onto the stack before invoking interrupt 80h; by contrast, in Linux, you pass your parameters in registers.
Does anyone know what the rationale was for the Linux developers to go with registers instead of the stack?
Note: Here's a good page detailing this difference, though it doesn't explain the rationale: FreeBSD Developer's Handbook: System Calls.
The syscall convention is different because the standard function calling sequence is different. I'm assuming you're talking about the difference between the x86-32 calling convention and the AMD64 calling convention. You can check out the AMD64 ABI here.
But if you want to get to the point quickly, check this post. Basically, it's about speed. By changing the calling convention and using registers instead of the stack, you can shave off instructions in the prologue and the epilogue of a call.
You can use some registers with 32-bit code as well. There are several calling conventions for 32-bit code: cdecl, stdcall, pascal and fastcall. Windows and Linux use the same calling conventions for 32-bit code. With fastcall (__attribute__((fastcall)) in GCC), the first two integer parameters (three with some compilers) can be passed in registers. The other calling conventions use the stack.
For 64-bit code, Windows and Linux use different calling conventions. Linux can pass up to 14 arguments in registers (six integer and eight floating-point) and Windows only four. Using registers can make the code faster. That could be part of the reason some 64-bit code with many function calls runs on the order of 10% faster than the same 32-bit code.
I am using a bootloader program written in assembly, and I frequently call a C function to send and receive one character at a time. The controller I am using seems to have just 3 general-purpose registers, which it uses frequently. Apart from that, I am storing some bytes in fixed RAM locations.
So, my question is:
Will the C function overwrite these RAM locations, which were defined in assembly?
I am doing a PUSH and PULL of the registers concerned before calling and after returning from these C functions.
If I understand your question correctly, you are concerned about the RAM locations used in your assembly module overlapping with some variable declared in a C module. You can examine the list file output by your linker to determine if this is the case. The linker list file will show all of the RAM addresses used by your C modules which you can compare to the fixed RAM locations used in the assembly module.
Note that if your linker does not produce a list file automatically, you will have to read through your linker's documentation to find the right command line option to do so.
As long as you keep the previous values on the stack when making the C calls, you should be fine. Just make sure that you push onto the stack before the call and pop off the stack after returning.
It all depends on the calling convention that the C code was compiled with. A calling convention defines how the caller and callee communicate with regard to passing data into the function and returning values afterwards. This includes who backs up registers onto the stack before and after the call, whether the registers need to be prepared before calling the C function, whether the registers are guaranteed to come back the way they were, and so on.
You'll need to find out how the C code was compiled (with what Calling Convention setting). Note that this is also architecture specific. A summary of the different calling conventions and a description of what each entails can be found at Wikipedia here:
http://en.wikipedia.org/wiki/Calling_convention
http://en.wikipedia.org/wiki/X86_calling_conventions
On x86, cdecl and stdcall are the most popular conventions. With cdecl, the caller (your ASM code) removes the arguments from the stack after the call, while with stdcall the function being called is responsible for it. If you have the source code for the C function, I would suggest passing the necessary flags to the compiler to make it a "callee cleanup" convention (usually stdcall, but safecall and fastcall are also options), which means you can call the C function without having to pop the arguments afterwards. Note that this only covers stack cleanup; which registers the C function may overwrite is specified separately by the convention.
It occurs to me that functions like printf() do not limit the number of parameters.
But when debugging a program on Solaris, I noticed it pushes at most 5 parameters onto the stack; general-purpose registers are used if there are more than 5 parameters.
So what happens if even the general-purpose registers are not enough for a function like printf? Does the compiler do something for me?
The behaviour is controlled by the ABI for the platform. If there are more parameters than fit in the registers, then they will be handled in a different way. There isn't a simple upper limit on the number of arguments that can be passed, so the compiler and the ABI define a mechanism that works on the hardware in question. What works on SPARC does not necessarily work on, for example, Intel IA32.
Normally platforms where the ABI uses registers for argument passing switch to a different calling convention for variadic functions, whereby everything is passed on the stack. This is why the C standard assigned undefined behavior to calling a variadic function without a prototype; without a prototype, on such platforms the compiler will generate an incorrect call.
It should be noted that some platforms use more complicated (uselessly complicated, I would say) methods of passing arguments to variadic functions, such as constructing a sort of linked list and passing a hidden pointer to that list, which the implementation of va_start is then somehow able to obtain. As a programmer, you should just treat the whole stdarg.h machinery as a black box that does what's expected, and pray that you never have to see the gory details of some of the uglier implementations...
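Treating stdarg.h as a black box looks roughly like the sketch below. The function name sum_ints is made up for illustration, and the caller is assumed to pass exactly count arguments of type int. The same source works whether the ABI delivers the arguments in registers, on the stack, or through some other mechanism.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical variadic helper: adds up `count` int arguments.
   The caller must pass exactly `count` values of type int. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);            /* start after the last named parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next argument as an int */
    va_end(ap);
    return total;
}
```

Because the function is declared with a prototype ending in an ellipsis, the compiler knows to generate a variadic call at each call site, for example sum_ints(3, 10, 20, 30).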