snprintf() prints garbage floats with newlib nano - c

I am running a bare metal embedded system with an ARM Cortex-M3 (STM32F205). When I try to use snprintf() with float numbers, e.g.:
float f;
f = 1.23;
snprintf(s, 20, "%5.2f", f);
I get garbage into s. The format seems to be honored, i.e. the garbage is a well-formed string with digits, decimal point, and two trailing digits. However, if I repeat the snprintf, the string may change between two calls.
Floating point mathematics seems to work otherwise, and snprintf works with integers, e.g.:
snprintf(s, 20, "%10d", 1234567);
I use the newlib-nano implementation with the -u _printf_float linker switch. The compiler is arm-none-eabi-gcc.
I do have a strong suspicion of memory allocation problems, as integers are printed without any hiccups, but floats act as if they got corrupted in the process. The printf family functions call malloc with floats, not with integers.
The only piece of code not belonging to newlib I am using in this context is my _sbrk(), which is required by malloc.
caddr_t _sbrk(int incr)
{
extern char _Heap_Begin; // Defined by the linker.
extern char _Heap_Limit; // Defined by the linker.
static char* current_heap_end;
char* current_block_address;
// first allocation
if (current_heap_end == 0)
current_heap_end = &_Heap_Begin;
current_block_address = current_heap_end;
// increment and align to 4-octet border
incr = (incr + 3) & (~3);
current_heap_end += incr;
// Overflow?
if (current_heap_end > &_Heap_Limit)
{
errno = ENOMEM;
current_heap_end = current_block_address;
return (caddr_t) - 1;
}
return (caddr_t)current_block_address;
}
As far as I have been able to track, this should work. It seems that no-one ever calls it with negative increments, but I guess that is due to the design of the newlib malloc. The only slightly odd thing is that the first call to _sbrk has a zero increment. (But this may be just malloc's curiosity about the starting address of the heap.)
The stack should not collide with the heap, as there is around 60 KiB RAM for the two. The linker script may be insane, but at least the heap and stack addresses seem to be correct.

As it may happen that someone else gets bitten by the same bug, I post an answer to my own question. However, it was #Notlikethat 's comment which suggested the correct answer.
This is a lesson of Thou shall not steal. I borrowed the gcc linker script which came with the STMCubeMX code generator. Unfortunately, the script along with the startup file is broken.
The relevant part of the original linker script:
_estack = 0x2000ffff;
and its counterparts in the startup script:
Reset_Handler:
ldr sp, =_estack /* set stack pointer */
...
g_pfnVectors:
.word _estack
.word Reset_Handler
...
The first interrupt vector position (at 0) should always point to the startup stack top. When the reset interrupt is reached, it also loads the stack pointer. (As far as I can say, the latter one is unnecessary as the HW anyway reloads the SP from the 0th vector before calling the reset handler.)
The Cortex-M stack pointer should always point to the last item in the stack. At startup there are no items in the stack and thus the pointer should point to the first address above the actual memory, 0x020010000 in this case. With the original linker script the stack pointer is set to 0x0200ffff, which actually results in sp = 0x0200fffc (the hardware forces word-aligned stack). After this the stack is misaligned by 4.
I changed the linker script by removing the constant definition of _estack and replacing it by _stacktop as shown below. The memory definitions were there before. I changed the name just to see where the value is used.
MEMORY
{
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 128K
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 64K
}
_stacktop = ORIGIN(RAM) + LENGTH(RAM);
After this the value of _stacktop is 0x20010000, and my numbers float beautifully... The same problem could arise with any external (library) function using double length parameters, as the ARM Cortex ABI states that the stack must be aligned to 8 octets when calling external functions.

snprintf accepts size as second argument. You might want to go through this example http://www.cplusplus.com/reference/cstdio/snprintf/
/* snprintf example */
#include <stdio.h>
int main ()
{
char buffer [100];
int cx;
cx = snprintf ( buffer, 100, "The half of %d is %d", 60, 60/2 );
snprintf ( buffer+cx, 100-cx, ", and the half of that is %d.", 60/2/2 );
puts (buffer);
return 0;
}

Related

How do I call hex data stored in an array with inline assembly?

I have an OS project that I am working on and I am trying to call data that I have read from the disk in C with inline assembly.
I have already tried reading the code and executing it with the assembly call instruction, using inline assembly.
void driveLoop() {
uint16_t sectors = 31;
uint16_t sector = 0;
uint16_t basesector = 40000;
uint32_t i = 40031;
uint16_t code[sectors][256];
int x = 0;
while(x==0) {
read(i);
for (int p=0; p < 256; p++) {
if (readOut[p] == 0) {
} else {
x = 1;
//kprint_int(i);
}
}
i++;
}
kprint("Found sector!\n");
kprint("Loading OS into memory...\n");
for (sector=0; sector<sectors; sector++) {
read(basesector+sector);
for (int p=0; p<256; p++) {
code[sector][p] = readOut[p];
}
}
kprint("Done loading.\n");
kprint("Attempting to call...\n");
asm volatile("call (%0)" : : "r" (&code));
When the inline assembly is called I expect it to run the code from the sectors I read from the "disk" (this is in a VM, because its a hobby OS). What it does instead is it just hangs.
I probably don't much understand how variables, arrays, and assembly work, so if you could fill me in, that would be nice.
EDIT: The data I am reading from the disk is a binary file that was added
to the disk image file with
cat kernel.bin >> disk.img
and the kernel.bin is compiled with
i686-elf-ld -o kernel.bin -Ttext 0x4C4B40 *insert .o files here* --oformat binary
What it does instead is it just hangs.
Run your OS inside BOCHS so you can use BOCHS's built-in debugger to see exactly where it's stuck.
Being able to debug lockups, including with interrupts disabled, is probably very useful...
asm volatile("call (%0)" : : "r" (&code)); is unsafe because of missing clobbers.
But even worse than that it will load a new EIP value from the first 4 bytes of the array, instead of setting EIP to that address. (Unless the data you're loading is an array of pointers, not actual machine code?)
You have the %0 in parentheses, so it's an addressing mode. The assembler will warn you about an indirect call without *, but will assemble it like call *(%eax), with EAX = the address of code[0][0]. You actually want a call *%eax or whatever register the compiler chooses, register-indirect not memory-indirect.
&code and code are both just a pointer to the start of the array; &code doesn't create an anonymous pointer object storing the address of another address. &code takes the address of the array as a whole. code in this context "decays" to a pointer to the first object.
https://gcc.gnu.org/wiki/DontUseInlineAsm (for this).
You can get the compiler to emit a call instruction by casting the pointer to a function pointer.
__builtin___clear_cache(&code[0][0], &code[30][255]); // don't optimize away stores into the buffer
void (*fptr)(void) = (void*)code; // casting to void* instead of the actual target type is simpler
fptr();
That will compile (with optimization enabled) to something like lea 16(%esp), %eax / call *%eax, for 32-bit x86, because your code[][] buffer is an array on the stack.
Or to have it emit a jmp instead, do it at the end of a void function, or return funcptr(); in a non-void function, so the compiler can optimize the call/ret into a jmp tailcall.
If it doesn't return, you can declare it with __attribute__((noreturn)).
Make sure the memory page / segment is executable. (Your uint16_t code[]; is a local, so gcc will allocate it on the stack. This might not be what you want. The size is a compile-time constant so you could make it static, but if you do that for other arrays in other sibling functions (not parent or child), then you lose out on the ability to reuse a big chunk of stack memory for different arrays.)
This is much better than your unsafe inline asm. (You forgot a "memory" clobber, so nothing tells the compiler that your asm actually reads the pointed-to memory). Also, you forgot to declare any register clobbers; presumably the block of code you loaded will have clobbered some registers if it returns, unless it's written to save/restore everything.
In GNU C you do need to use __builtin__clear_cache when casting a data pointer to a function pointer. On x86 it doesn't actually clear any cache, it's telling the compiler that the stores to that memory are not dead because it's going to be read by execution. See How does __builtin___clear_cache work?
Without that, gcc could optimize away the copying into uint16_t code[sectors][256]; because it looks like a dead store. (Just like with your current inline asm which only asks for the pointer in a register.)
As a bonus, this part of your OS becomes portable to other architectures, including ones like ARM without coherent instruction caches where that builtin expands to a actual instructions. (On x86 it purely affects the optimizer).
read(basesector+sector);
It would probably be a good idea for your read function to take a destination pointer to read into, so you don't need to bounce data through your readOut buffer.
Also, I don't see why you'd want to declare your code as a 2D array; sectors are an artifact of how you're doing your disk I/O, not relevant to using the code after it's loaded. The sector-at-a-time thing should only be in the code for the loop that loads the data, not visible in other parts of your program.
char code[sectors * 512]; would be good.

how many recursive function calls causes stack overflow?

I am working on a simulation problem written in c, the main part of my program is a recursive function.
when the recursive depth reaches approximately 500000 it seems stack overflow occurs.
Q1 : I want to know that is this normal?
Q2 : in general how many recursive function calls causes stack overflow?
Q3 : in the code below, removing local variable neighbor can prevent from stack overflow?
my code:
/*
* recursive function to form Wolff Cluster(= WC)
*/
void grow_Wolff_cluster(lattic* l, Wolff* wolff, site *seed){
/*a neighbor of site seed*/
site* neighbor;
/*go through all neighbors of seed*/
for (int i = 0 ; i < neighbors ; ++i) {
neighbor = seed->neighbors[i];
/*add to WC according to the Wolff Algorithm*/
if(neighbor->spin == seed->spin && neighbor->WC == -1 && ((double)rand() / RAND_MAX) < add_probability)
{
wolff->Wolff_cluster[wolff->WC_pos] = neighbor;
wolff->WC_pos++; // the number of sites that is added to WC
neighbor->WC = 1; // for avoiding of multiple addition of site
neighbor->X = 0;
///controller_site_added_to_WC();
/*continue growing Wolff cluster(recursion)*/
grow_Wolff_cluster(l, wolff, neighbor);
}
}
}
I want to know that is this normal?
Yes. There's only so much stack size.
In the code below, removing local variable neighbor can prevent from stack overflow?
No. Even with no variables and no return values the function calls themselves must be stored in the stack so the stack can eventually be unwound.
For example...
void recurse() {
recurse();
}
int main (void)
{
recurse();
}
This still overflows the stack.
$ ./test
ASAN:DEADLYSIGNAL
=================================================================
==94371==ERROR: AddressSanitizer: stack-overflow on address 0x7ffee7f80ff8 (pc 0x00010747ff14 bp 0x7ffee7f81000 sp 0x7ffee7f81000 T0)
#0 0x10747ff13 in recurse (/Users/schwern/tmp/./test+0x100000f13)
SUMMARY: AddressSanitizer: stack-overflow (/Users/schwern/tmp/./test+0x100000f13) in recurse
==94371==ABORTING
Abort trap: 6
In general how many recursive function calls causes stack overflow?
That depends on your environment and function calls. Here on OS X 10.13 I'm limited to 8192K by default.
$ ulimit -s
8192
This simple example with clang -g can recurse 261976 times. With -O3 I can't get it to overflow, I suspect compiler optimizations have eliminated my simple recursion.
#include <stdio.h>
void recurse() {
puts("Recurse");
recurse();
}
int main (void)
{
recurse();
}
Add an integer argument and it's 261933 times.
#include <stdio.h>
void recurse(int cnt) {
printf("Recurse %d\n", cnt);
recurse(++cnt);
}
int main (void)
{
recurse(1);
}
Add a double argument, now it's 174622 times.
#include <stdio.h>
void recurse(int cnt, double foo) {
printf("Recurse %d %f\n", cnt, foo);
recurse(++cnt, foo);
}
int main (void)
{
recurse(1, 2.3);
}
Add some stack variables and it's 104773 times.
#include <stdio.h>
void recurse(int cnt, double foo) {
double this = 42.0;
double that = 41.0;
double other = 40.0;
double thing = 39.0;
printf("Recurse %d %f %f %f %f %f\n", cnt, foo, this, that, other, thing);
recurse(++cnt, foo);
}
int main (void)
{
recurse(1, 2.3);
}
And so on. But I can increase my stack size in this shell and get twice the calls.
$ ./test 2> /dev/null | wc -l
174622
$ ulimit -s 16384
$ ./test 2> /dev/null | wc -l
349385
I have a hard upper limit to how big I can make the stack of 65,532K or 64M.
$ ulimit -Hs
65532
A stack overflow isn’t defined by the C standard, but by the implementation. The C standard defines a language with unlimited stack space (among other resources) but does have a section about how implementations are allowed to impose limits.
Usually it’s the operating system that actually first creates the error. The OS doesn’t care about how many calls you make, but about the total size of the stack. The stack is composed of stack frames, one for each function call. Usually a stack frame consists of some combination of the following five things (as an approximation; details can vary a lot between systems):
The parameters to the function call (probably not actually here, in this case; they’re probably in registers, although this doesn’t actually buy anything with recursion).
The return address of the function call (in this case, the address of the ++i instruction in the for loop).
The base pointer where the previous stack frame starts
Local variables (at least those that don’t go in registers)
Any registers the caller wants to save when it makes a new function call, so the called function doesn’t overwrite them (some registers may be saved by the caller instead, but it doesn’t particularly matter for stack size analysis). This is why passing parameters in registers doesn’t help much in this case; they’ll end up on the stack sooner or later.
Because some of these (specifically, 1., 4., and 5.) can vary in size by a lot, it can be difficult to estimate how big an average stack frame is, although it’s easier in this case because of the recursion. Different systems also have different stack sizes; it currently looks like by default I can have 8 MiB for a stack, but an embedded system would probably have a lot less.
This also explains why removing a local variable gives you more available function calls; you reduced the size of each of the 500,000 stack frames.
If you want to increase the amount of stack space available, look into the setrlimit(2) function (on Linux like the OP; it may be different on other systems). First, though, you might want to try debugging and refactoring to make sure you need all that stack space.
Yes and no - if you come across a stack overflow in your code, it could mean a few things
Your algorithm is not implemented in a way that respects the amount of memory on the stack you have been given. You may adjust this amount to suit the needs of the algorithm.
If this is the case, it's more common to change the algorithm to more efficiently utilize the stack, rather than add more memory. Converting a recursive function to an iterative one, for example, saves a lot of precious memory.
It's a bug trying to eat all your RAM. You forgot a base case in the recursion or mistakenly called the same function. We've all done it at least 2 times.
It's not necessarily how many calls cause an overflow - it's dependent upon how much memory each individual call takes up on a stack frame. Each function call uses up stack memory until the call returns. Stack memory is statically allocated -- you can't change it at runtime (in a sane world). It's a last-in-first-out (LIFO) data structure behind the scenes.
It's not preventing it, it's just changing how many calls to grow_Wolff_cluster it takes to overflow the stack memory. On a 32-bit system, removing neighbor from the function costs a call to grow_Wolff_cluster 4 bytes less. It adds up quickly when you multiply that in the hundreds of thousands.
I suggest you learn more about how stacks work for you. Here's a good resource over on the software engineering stack exchange. And another here on stack overflow (zing!)
For each time a function recurs, your program takes more memory on the stack, the memory it takes for each function depends upon the function and variables within it. The number of recursions that can be done of a function is entirely dependant upon your system.
There is no general number of recursions that will cause stack overflow.
Removing the variable 'neighbour' will allow for the function to recur further as each recursion takes less memory, but it will still eventually cause stack overflow.
This is a simple c# function that will show you how many iteration your computer can take before stack overflow (as a reference, I have run up to 10478):
private void button3_Click(object sender, EventArgs e)
{
Int32 lngMax = 0;
StackIt(ref lngMax);
}
private void StackIt(ref Int32 plngMax, Int32 plngStack = 0)
{
if (plngStack > plngMax)
{
plngMax = plngStack;
Console.WriteLine(plngMax.ToString());
}
plngStack++;
StackIt(ref plngMax, plngStack);
}
in this simple case, the condition check: "if (plngStack > plngMax)" could be removed,
but if you got a real recursive function, this check will help you localize the problem.

How to corrupt the stack in a C program

I have to change the designated section of function_b so that it changes the stack in such a way that the program prints:
Executing function_a
Executing function_b
Finished!
At this point it also prints Executed function_b in between Executing function_b and Finished!.
I have the following code and I have to fill something in, in the part where it says // ... insert code here
#include <stdio.h>
void function_b(void){
char buffer[4];
// ... insert code here
fprintf(stdout, "Executing function_b\n");
}
void function_a(void) {
int beacon = 0x0b1c2d3;
fprintf(stdout, "Executing function_a\n");
function_b();
fprintf(stdout, "Executed function_b\n");
}
int main(void) {
function_a();
fprintf(stdout, "Finished!\n");
return 0;
}
I am using Ubuntu Linux with the gcc compiler. I compile the program with the following options: -g -fno-stack-protector -fno-omit-frame-pointer. I am using an intel processor.
Here is a solution, not exactly stable across environments, but works for me on x86_64 processor on Windows/MinGW64.
It may not work for you out of the box, but still, you might want to use a similar approach.
void function_b(void) {
char buffer[4];
buffer[0] = 0xa1; // part 1
buffer[1] = 0xb2;
buffer[2] = 0xc3;
buffer[3] = 0x04;
register int * rsp asm ("rsp"); // part 2
register size_t r10 asm ("r10");
r10 = 0;
while (*rsp != 0x04c3b2a1) {rsp++; r10++;} // part 3
while (*rsp != 0x00b1c2d3) rsp++; // part 4
rsp -= r10; // part 5
rsp = (int *) ((size_t) rsp & ~0xF); // part 6
fprintf(stdout, "Executing function_b\n");
}
The trick is that each of function_a and function_b have only one local variable, and we can find the address of that variable just by searching around in the memory.
First, we put a signature in the buffer, let it be the 4-byte integer 0x04c3b2a1 (remember that x86_64 is little-endian).
After that, we declare two variables to represent the registers: rsp is the stack pointer, and r10 is just some unused register.
This allows to not use asm statements later in the code, while still being able to use the registers directly.
It is important that the variables don't actually take stack memory, they are references to processor registers themselves.
After that, we move the stack pointer in 4-byte increments (since the size of int is 4 bytes) until we get to the buffer. We have to remember the offset from the stack pointer to the first variable here, and we use r10 to store it.
Next, we want to know how far in the stack are the instances of function_b and function_a. A good approximation is how far are buffer and beacon, so we now search for beacon.
After that, we have to push back from beacon, the first variable of function_a, to the start of instance of the whole function_a on the stack.
That we do by subtracting the value stored in r10.
Finally, here comes a werider bit.
At least on my configuration, the stack happens to be 16-byte aligned, and while the buffer array is aligned to the left of a 16-byte block, the beacon variable is aligned to the right of such block.
Or is it something with a similar effect and different explanation?..
Anyway, so we just clear the last four bits of the stack pointer to make it 16-byte aligned again.
The 32-bit GCC doesn't align anything for me, so you might want to skip or alter this line.
When working on a solution, I found the following macro useful:
#ifdef DEBUG
#define show_sp() \
do { \
register void * rsp asm ("rsp"); \
fprintf(stdout, "stack pointer is %016X\n", rsp); \
} while (0);
#else
#define show_sp() do{}while(0);
#endif
After this, when you insert a show_sp(); in your code and compile with -DDEBUG, it prints what is the value of stack pointer at the respective moment.
When compiling without -DDEBUG, the macro just compiles to an empty statement.
Of course, other variables and registers can be printed in a similar way.
ok, let assume that epilogue (i.e code at } line) of function_a and for function_b is the same
despite functions A and B not symmetric, we can assume this because it have the same signature (no parameters, no return value), same calling conventions and same size of local variables (4 byte - int beacon = 0x0b1c2d3 vs char buffer[4];) and with optimization - both must be dropped because unused. but we must not use additional local variables in function_b for not break this assumption. most problematic point here - what is function_A or function_B will be use nonvolatile registers (and as result save it in prologue and restore in epilogue) - but however look like here no place for this.
so my next code based on this assumption - epilogueA == epilogueB (really solution of #Gassa also based on it.
also need very clearly state that function_a and function_b must not be inline. this is very important - without this any solution impossible. so I let yourself add noinline attribute to function_a and function_b. note - not code change but attribute add, which author of this task implicitly implies but not clearly stated. don't know how in GCC mark function as noinline but in CL __declspec(noinline) for this used.
next code I write for CL compiler where exist next intrinsic function
void * _AddressOfReturnAddress();
but I think that GCC also must have the analog of this function. also I use
void* _ReturnAddress();
but however really _ReturnAddress() == *(void**)_AddressOfReturnAddress() and we can use _AddressOfReturnAddress() only. simply using _ReturnAddress() make source (but not binary - it equal) code smaller and more readable.
and next code is work for both x86 and x64. and this code work (tested) with any optimization.
despite I use 2 global variables - code is thread safe - really we can call main from multiple threads in concurrent, call it multiple time - but all will be worked correct (only of course how I say at begin if epilogueA == epilogueB)
hope comments in code enough self explained
__declspec(noinline) void function_b(void){
char buffer[4];
buffer[0] = 0;
static void *IPa, *IPb;
// save the IPa address
_InterlockedCompareExchangePointer(&IPa, _ReturnAddress(), 0);
if (_ReturnAddress() == IPa)
{
// we called from function_a
function_b();
// <-- IPb
if (_ReturnAddress() == IPa)
{
// we called from function_a, change return address for return to IPb instead IPa
*(void**)_AddressOfReturnAddress() = IPb;
return;
}
// we at stack of function_a here.
// we must be really at point IPa
// and execute fprintf(stdout, "Executed function_b\n"); + '}' (epilogueA)
// but we will execute fprintf(stdout, "Executing function_b\n"); + '}' (epilogueB)
// assume that epilogueA == epilogueB
}
else
{
// we called from function_b
IPb = _ReturnAddress();
return;
}
fprintf(stdout, "Executing function_b\n");
// epilogueB
}
__declspec(noinline) void function_a(void) {
int beacon = 0x0b1c2d3;
fprintf(stdout, "Executing function_a\n");
function_b();
// <-- IPa
fprintf(stdout, "Executed function_b\n");
// epilogueA
}
int main(void) {
function_a();
fprintf(stdout, "Finished!\n");
return 0;
}

Strange load instructions produced by mipsel-gcc when compiling glibc

I'm trying to get a small piece of hello-world MIPS program running in Gem 5 simulator. The program was compiled with gcc 4.9.2 and glibc 2.19 (built by crosstool-ng) and runs well in qemu, but it crashed with a page fault (trying to access address 0) in gem5.
Code is rather simple:
#include <stdio.h>
int main()
{
printf("hello, world\n");
return 0;
}
file ./test result:
./test: ELF 32-bit LSB executable, MIPS, MIPS-I version 1, statically
linked, for GNU/Linux 3.15.4, not stripped
After some debugging with gdb, I figured out that the page fault is triggered by _dl_setup_stack_chk_guard function in glibc. It accepts a void pointer called _dl_random passed by __libc_start_main function, which happens to be NULL. However, as far as I know, these functions never dereference the pointer, but instructions were generated to load values from the memory _dl_random pointer points to. Some code pieces might help understanding:
in function __libc_start_main (macro THREAD_SET_STACK_GUARD is not set):
/* Initialize the thread library at least a bit since the libgcc
functions are using thread functions if these are available and
we need to setup errno. */
__pthread_initialize_minimal ();
/* Set up the stack checker's canary. */
uintptr_t stack_chk_guard = _dl_setup_stack_chk_guard (_dl_random);
# ifdef THREAD_SET_STACK_GUARD
THREAD_SET_STACK_GUARD (stack_chk_guard);
# else
__stack_chk_guard = stack_chk_guard;
# endif
in function _dl_setup_stack_chk_guard (always inlined):
static inline uintptr_t __attribute__ ((always_inline))
_dl_setup_stack_chk_guard (void *dl_random)
{
union
{
uintptr_t num;
unsigned char bytes[sizeof (uintptr_t)];
} ret = { 0 };
if (dl_random == NULL)
{
ret.bytes[sizeof (ret) - 1] = 255;
ret.bytes[sizeof (ret) - 2] = '\n';
}
else
{
memcpy (ret.bytes, dl_random, sizeof (ret));
#if BYTE_ORDER == LITTLE_ENDIAN
ret.num &= ~(uintptr_t) 0xff;
#elif BYTE_ORDER == BIG_ENDIAN
ret.num &= ~((uintptr_t) 0xff << (8 * (sizeof (ret) - 1)));
#else
# error "BYTE_ORDER unknown"
#endif
}
return ret.num;
}
disassembly code:
0x00400ea4 <+228>: jal 0x4014b4 <__pthread_initialize_minimal>
0x00400ea8 <+232>: nop
0x00400eac <+236>: lui v0,0x4a
0x00400eb0 <+240>: lw v0,6232(v0)
0x00400eb4 <+244>: li a0,-256
0x00400eb8 <+248>: lwl v1,3(v0)
0x00400ebc <+252>: lwr v1,0(v0)
0x00400ec0 <+256>: addiu v0,v0,4
0x00400ec4 <+260>: and v1,v1,a0
0x00400ec8 <+264>: lui a0,0x4a
0x00400ecc <+268>: sw v1,6228(a0)
0x4a1858 (0x4a0000 + 6232) is the address of _dl_random
0x4a1854 (0x4a0000 + 6228) is the address of __stack_chk_guard
Page fault occurs at 0x00400eb8. I don't quite get it how instruction 0x00400eb8 and 0x00400ebc are generated. Could someone shed some light on it please? Thanks.
Here is how I find the root of this problem and my suggestion for solution.
I think it helpful to dive into the Glibc source code to see what really happens. Starting from _dl_random or __libc_start_main are both OK.
As the value of _dl_random is unexpectedly NULL, we need to find how this variable initialize and where it is assigned. With the help of code analysing tools, we can find _dl_random in Glibc is only assigned with meaningful value in function _dl_aux_init, and this function is called by __libc_start_min.
_dl_aux_init iterates on its parameter -- auxvec -- and acts corresponding to auxvec[i].at_type. AT_RANDOM is the case for the assignment of _dl_random. So the problem is that there isn't an AT_RANDOM element to make _dl_random assigned.
As the program runs well in user mode qemu, the root of this problem resides in system environment provider, say, gem5, which has the responsibility to construct auxvec. Having that keyword, we can find that the auxv is constructed in gem5/src/arch/<arch-name>/process.cc.
The current auxv for MIPS is constructed as below:
// Set the system page size
auxv.push_back(auxv_t(M5_AT_PAGESZ, MipsISA::PageBytes));
// Set the frequency at which time() increments
auxv.push_back(auxv_t(M5_AT_CLKTCK, 100));
// For statically linked executables, this is the virtual
// address of the program header tables if they appear in the
// executable image.
auxv.push_back(auxv_t(M5_AT_PHDR, elfObject->programHeaderTable()));
DPRINTF(Loader, "auxv at PHDR %08p\n", elfObject->programHeaderTable());
// This is the size of a program header entry from the elf file.
auxv.push_back(auxv_t(M5_AT_PHENT, elfObject->programHeaderSize()));
// This is the number of program headers from the original elf file.
auxv.push_back(auxv_t(M5_AT_PHNUM, elfObject->programHeaderCount()));
//The entry point to the program
auxv.push_back(auxv_t(M5_AT_ENTRY, objFile->entryPoint()));
//Different user and group IDs
auxv.push_back(auxv_t(M5_AT_UID, uid()));
auxv.push_back(auxv_t(M5_AT_EUID, euid()));
auxv.push_back(auxv_t(M5_AT_GID, gid()));
auxv.push_back(auxv_t(M5_AT_EGID, egid()));
Now we know what to do. We just need to provide an accessible address value to _dl_random tagged by MT_AT_RANDOM. Gem5's ARM arch implements this already (code). Maybe we can take it as an example.

Buffer overflow in C

I'm attempting to write a simple buffer overflow using C on Mac OS X 10.6 64-bit. Here's the concept:
void function() {
char buffer[64];
buffer[offset] += 7; // i'm not sure how large offset needs to be, or if
// 7 is correct.
}
int main() {
int x = 0;
function();
x += 1;
printf("%d\n", x); // the idea is to modify the return address so that
// the x += 1 expression is not executed and 0 gets
// printed
return 0;
}
Here's part of main's assembler dump:
...
0x0000000100000ebe <main+30>: callq 0x100000e30 <function>
0x0000000100000ec3 <main+35>: movl $0x1,-0x8(%rbp)
0x0000000100000eca <main+42>: mov -0x8(%rbp),%esi
0x0000000100000ecd <main+45>: xor %al,%al
0x0000000100000ecf <main+47>: lea 0x56(%rip),%rdi # 0x100000f2c
0x0000000100000ed6 <main+54>: callq 0x100000ef4 <dyld_stub_printf>
...
I want to jump over the movl instruction, which would mean I'd need to increment the return address by 42 - 35 = 7 (correct?). Now I need to know where the return address is stored so I can calculate the correct offset.
I have tried searching for the correct value manually, but either 1 gets printed or I get abort trap – is there maybe some kind of buffer overflow protection going on?
Using an offset of 88 works on my machine. I used Nemo's approach of finding out the return address.
This 32-bit example illustrates how you can figure it out, see below for 64-bit:
#include <stdio.h>
void function() {
char buffer[64];
char *p;
asm("lea 4(%%ebp),%0" : "=r" (p)); // loads address of return address
printf("%d\n", p - buffer); // computes offset
buffer[p - buffer] += 9; // 9 from disassembling main
}
int main() {
volatile int x = 7;
function();
x++;
printf("x = %d\n", x); // prints 7, not 8
}
On my system the offset is 76. That's the 64 bytes of the buffer (remember, the stack grows down, so the start of the buffer is far from the return address) plus whatever other detritus is in between.
Obviously if you are attacking an existing program you can't expect it to compute the answer for you, but I think this illustrates the principle.
(Also, we are lucky that +9 does not carry out into another byte. Otherwise the single byte increment would not set the return address how we expected. This example may break if you get unlucky with the return address within main)
I overlooked the 64-bitness of the original question somehow. The equivalent for x86-64 is 8(%rbp) because pointers are 8 bytes long. In that case my test build happens to produce an offset of 104. In the code above substitute 8(%%rbp) using the double %% to get a single % in the output assembly. This is described in this ABI document. Search for 8(%rbp).
There is a complaint in the comments that 4(%ebp) is just as magic as 76 or any other arbitrary number. In fact the meaning of the register %ebp (also called the "frame pointer") and its relationship to the location of the return address on the stack is standardized. One illustration I quickly Googled is here. That article uses the terminology "base pointer". If you wanted to exploit buffer overflows on other architectures it would require similarly detailed knowledge of the calling conventions of that CPU.
Roddy is right that you need to operate on pointer-sized values.
I would start by reading values in your exploit function (and printing them) rather than writing them. As you crawl past the end of your array, you should start to see values from the stack. Before long you should find the return address and be able to line it up with your disassembler dump.
Disassemble function() and see what it looks like.
Offset needs to be negative positive, maybe 64+8, as it's a 64-bit address. Also, you should do the '+7' on a pointer-sized object, not on a char. Otherwise if the two addresses cross a 256-byte boundary you will have exploited your exploit....
You might try running your code in a debugger, stepping each assembly line at a time, and examining the stack's memory space as well as registers.
I always like to operate on nice data types, like this one:
struct stackframe {
char *sf_bp;
char *sf_return_address;
};
void function() {
/* the following code is dirty. */
char *dummy;
dummy = (char *)&dummy;
struct stackframe *stackframe = dummy + 24; /* try multiples of 4 here. */
/* here starts the beautiful code. */
stackframe->sf_return_address += 7;
}
Using this code, you can easily check with the debugger whether the value in stackframe->sf_return_address matches your expectations.

Resources