Why does a buffer overflow happen? - C

#include <stdio.h>
void ft_rev_int_tab(int *tab, int size)
{
int *rev_tab;
int *ptab;
ptab = tab;
rev_tab = tab + size ;
while (rev_tab != ptab)
{
*rev_tab-- = *tab++;
printf("%d\n",*rev_tab);
}
printf("%d", *rev_tab);
}
int main()
{ int array[10] = {0,1,2,3,4,5,6,7,8,9};
ft_rev_int_tab(array, 10);
return 0;
}
I created a function which reverses a given array of integers.
But it shows this error. If I run ./a.out | cat -e on Linux,
it shows
and if I run ./a.out it shows
Why does Linux behave differently with and without cat -e?
And if I make the array size an odd number, there is no error!
Why does this happen?

Fixing the memory corruption
The error message means that your program crashed with an abort signal. This can be the result of memory corruption, which tends to result in undefined behavior. So if your program corrupts memory, it can happen that it sometimes crashes and sometimes works fine. This is the case for your example. Your problem is unrelated to cat -e; it was only by luck that your program worked fine when you called it without cat -e. If you had run your program a few more times, eventually you would have seen another crash.
In your program, the following line is causing a memory corruption:
*rev_tab-- = *tab++;
The expression rev_tab-- is a post-decrement. A postfix decrement decreases the variable by one and evaluates to the old value. For a pointer, "by one" means by one element, i.e. by sizeof(int) bytes here. So for example, if rev_tab is a pointer to memory location 100000, int is 4 bytes wide, and you run the following line:
int* result = rev_tab--;
then after that line, rev_tab is 99996, but result received the old value of 100000.
So in your program, rev_tab is initialized with rev_tab = tab + size, which is the memory location immediately after the table (i.e. the last memory location in the table is tab + size - 1). Likewise, ptab = tab; makes ptab point to the first memory location in the table. So when you do *rev_tab-- = *tab++;, even though rev_tab-- decreases rev_tab by one, meaning it now points to the last memory location in the table, the expression rev_tab-- evaluates to the previous value of rev_tab, so *rev_tab-- dereferences the pointer that points to the memory location immediately after the table. You then write the result of *tab++ to that memory location. This is a buffer overflow, which results in undefined behavior, since arbitrary memory of your process can be overwritten, resulting in corrupted memory. Your program then may or may not crash, depending on how critical the corruption is.
So to fix this problem, you should use the pre-decrement:
*--rev_tab = *tab++;
This will fix the memory corruption, avoiding the crash.
I would advise you to enable Address Sanitization to detect such memory problems during debugging. If you use GCC or Clang, you can enable Address Sanitization by compiling with -fsanitize=address, which would give you the following output when applied to your original program:
$ gcc -fsanitize=address -g test.c
$ ./a.out
=================================================================
==4848==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffe3396f688 at pc 0x562ef80e528a bp 0x7ffe3396f5e0 sp 0x7ffe3396f5d0
WRITE of size 4 at 0x7ffe3396f688 thread T0
#0 0x562ef80e5289 in ft_rev_int_tab test.c:10
#1 0x562ef80e5659 in main test.c:17
#2 0x7f6dbecf8b24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
#3 0x562ef80e50fd in _start (a.out+0x10fd)
Other problems
In your question you state that you want to reverse the input array. However, the code you posted does not accomplish this task. You have two pointers, one pointing to the front and one pointing to the back of the array. You then loop, incrementing the front pointer and decrementing the back pointer, until the back pointer reaches the first element. In every step of the iteration, you write the value referenced by the front pointer to the back pointer. However, you do not write the value referenced by the back pointer to the front pointer. So when the latter half of the array has been traversed, it has been overwritten with the values from the first half, but the values from the latter half have not been written to the first half, so they are lost. Then you continue to traverse the first half, writing the values you wrote to the latter half back to the first half. So when your input array contains 0,1,2,3,4,5, your result would be 0,1,2,2,1,0. This is not the result you want.

Related

Strange Pointers Behaviour in C

I was experimenting with pointers. Look at this code:
#include <stdio.h>
int main() {
int numba = 1;
int *otherintptr = &numba;
printf("%d\n", otherintptr);
printf("%d\n", *otherintptr);
*otherintptr++;
printf("%d\n", otherintptr);
printf("%d\n", *otherintptr);
return 0;
}
The output is:
2358852
1
2358856
2358856
Now, I am well aware that (*otherintptr)++ would have incremented my int, but my question is not this.
After the increment, the memory location is correctly increased by 4 bytes, which is the size of an integer.
I'd like to know why the last printf instruction prints the memory location, while I am clearly asking to print the content of memory locations labelled 2358856 (I was expecting some dirty random content).
Note that the second printf statement prints the content of memory cell 2358852, (the integer 1) as expected.
Here is what happens with these two lines:
int numba = 1;
int *otherintptr = &numba;
The compiler happened to lay out the two variables sequentially in memory, so otherintptr initially points to the address of numba, which lives in the stack frame allocated when main was called.
Now, the next position on the stack (actually the previous one, if we consider that the stack grows down on x86-based architectures) is occupied by the otherintptr variable itself. Incrementing otherintptr makes it point to itself, thus you see the same value twice.
To exemplify, let's assume that the stack for main begins at the 0x20 offset in memory:
0x20: 0x1 #numba variable
0x24: 0x20 #otherintptr variable pointing to numba
After executing the otherintptr++ instruction, the memory layout will look like this:
0x20: 0x1 #numba variable
0x24: 0x24 #otherintptr variable now pointing to itself
This is why the last two printf calls produce the same output.
When you did otherintptr++, you accidentally made otherintptr point to otherintptr, i.e. to itself. otherintptr just happened to be stored in memory immediately after your numba.
In any case, you got lucky on several occasions here. It is illegal to use an int * pointer to access something that is not an int and not compatible with int. It is illegal to use %d to print pointer values.
I suppose you wanted to increment the integer that otherintptr points to (numba). However, you actually incremented the pointer, as ++ binds more tightly than *.
So otherintptr pointed past the variable. And as there is no valid variable there, dereferencing the pointer is undefined behaviour. Thus, anything can happen and you were just lucky that the program did not crash. It just happened by chance that otherintptr itself resided at that address.

Array Destruction At The End Of Function Call

Here is my code
#include<stdio.h>
int * fun(int a1,int b)
{
int a[2];
a[0]=a1;
a[1]=b;
//int c=5;
printf("%x\n",&a[0]);
return a;
}
int main()
{
int *r=fun(3,5);
printf("%d\n",r[0]);
printf("%d\n",r[0]);
}
I am running codeblocks on Windows 7
Every time I run the loop I get the outputs as
22fee8
3
2293700
Here is the part I do not understand :
r expects a pointer to a region of memory that is interpreted as a sequence of boxes (each box 4 bytes wide -> integers), obtained by invoking the function fun.
What should happen is that the printf inside the function prints the address of a, i.e. the address of a[0]:
Seconded
NOW THE QUESTION IS:
Why do I get the same address each time I run the program?
The array a should be destroyed at the end of the function fun; only the pointer should remain after the function call.
Then why on earth does r[0] print 3?
r is pointing to something that doesn't exist anymore. You are returning a pointer to something on the stack. That stack will rewind when fun() ends. It can point to anything after that but nothing has overwritten it because another function is never called.
Nothing forces r[0] to be 3 - it's just a result of going for the simplest acceptable behaviour.
Basically, you're right that a must be destroyed at the end of fun. All this means is that the returned pointer (r in your case) is completely unreliable. That is, even though r[0] == 3 for you on your particular machine with your particular compiler, there's no guarantee that this will always hold on every machine.
To understand why it is so consistent for you, think about this: what does it mean for a to be destroyed? Only that you can't use it in any reliable way. The simplest way of satisfying this simple requirement is for the stack pointer to move back to the point where fun was called. So when you use r[0], the values of a are still present, but they are junk data - you can't count on them existing.
This is what happens:
int a[2]; is allocated on the stack (or similar). Suppose it gets allocated at the stack at address 0x12345678.
Various data gets pushed on the stack at this address, as the array is filled. Everything works as expected.
The address 0x12345678 pointing at the stack gets returned. (Depending on the calling convention, the address itself is returned in a register or on the stack.)
The memory allocated on the stack for a ceases to be valid. For now the two int values still sit at the given address in RAM, containing the values assigned to them. But the stack pointer isn't reserving those cells, nor is anything else in the program keeping track of that data. Computers don't delete data by erasing the value etc, they delete cells by forgetting that anything of use is stored at that memory location.
When the function ends, those memory cells are free to be used for the rest of the program. For example, a value returned by the function might end up there instead.
The function returned a pointer to a segment on the stack where there used to be valid data. The pointer is still 0x12345678 but at that address, anything might be stored by now. Furthermore, the contents at that address may change as different items are pushed/popped from the stack.
Printing the contents of that address will therefore give random results. Or it could print the same garbage value each time the program is executed. In fact it isn't guaranteed to print anything at all: printing the contents of an invalid memory cell is undefined behavior in C. The program could even crash when you attempt it.
The memory r points to is no longer valid once the stack frame of int * fun(int a1,int b) is released, right after the function ends, so r[0] can be 3 or 42 or any other value. It still contains your expected value because that memory hasn't been used for anything else yet: a chunk of memory is reserved for your program's stack, and your program has not used that part of the stack again. After the first 3 is printed you get another value, which means that stack area was used for something else. You can blame printf(), since it's the only thing running and it does a LOT of work to get those numbers onto the console.
Why does it always print the same results? Because you always do the same process, there's no magic in it. But there's no guarantee that it'll be 3 since that 'memory space' is not yours and you are only 'peeking' into it.
Also, check the optimization level of your compiler: fun() and main(), being as simple as they are, could be inlined or replaced when the binary is optimized, reducing the 'randomness' expected in your results. Although I wouldn't expect it to change much either.
You can find pretty good answers here:
can-a-local-variables-memory-be-accessed-outside-its-scope
returning-the-address-of-local-or-temporary-variable
return-reference-to-local-variable
Though the examples are for C++, underlying idea is same.

Memory Allocation: Why this C program works? [duplicate]

Possible Duplicate:
Returning the address of local or temporary variable
The add function is implemented wrongly. It should return a value instead of a pointer.
Why aren't there any errors when ans and *ans_ptr are printed, and why does the program even give the correct result? I guess the variable z is already out of scope and there should be a segmentation fault.
#include <stdio.h>
int * add(int x, int y) {
int z = x + y;
int *ans_ptr = &z;
return ans_ptr;
}
int main() {
int ans = *(add(1, 2));
int *ans_ptr = add(1, 2);
printf("%d\n", *ans_ptr);
printf("%d\n", ans);
return 0;
}
The reason it 'works' is because you got lucky. Returning a pointer to a local variable is Undefined Behaviour!! You should NOT do it.
int * add(int x, int y) {
int z = x + y; //z is a local variable in this stack frame
int *ans_ptr = &z; // ans_ptr points to z
return ans_ptr;
}
// at return of function, z is destroyed, so what does ans_ptr point to? No one knows. UB results
Because C has no garbage collection, when the "z" variable goes out of scope, nothing happens to the actual memory. It is simply freed for another variable to overwrite if the compiler pleases.
Since no memory is allocated between calling "add" and printing, the value is still sitting in memory, and you can access it because you have its address. You "got lucky."
However, as Tony points out, you should NEVER do this. It will work some of the time, but as soon as your program gets more complex, you will start ending up with spurious values.
No. Your question displays a fundamental lack of understanding of how the C memory model works.
The value z is allocated at an address on the stack, in the frame which is created when control enters add(). ans_ptr is then set to this memory address and returned.
The space on the stack will be overwritten by the next function that is called, but remember that C never performs memory clean up unless explicitly told to (eg via a function like calloc()).
This means that the value in the memory location &z (from the just-vacated stack frame) is still intact in the immediately following statement, ie. the printf() statement in main().
You should never ever rely on this behaviour - as soon as you add additional code into the above it will likely break.
The answer is: this program works because you are fortunate, but it will betray you in no time, as the address you return is not reserved for you anymore and anyone can use it again. It's like renting a room, making a duplicate key, giving up the room, and then trying to enter it later with the duplicate key. If the room is empty and not rented to someone else, you are fortunate; otherwise it can land you in police custody (something bad), and if the lock of the room was changed you get a segfault. So you can't trust a duplicate key made for a room you no longer hold.
The z is a local variable allocated in stack and its scope is as long as the particular call to the function block. You return the address of such a local variable. Once you return from the function, all the addresses local to the block (allocated in the function call stack frame) might be used for another call and be overwritten, therefore you might or might not get what you expect. Which is undefined behavior, and thus such operation is incorrect.
If you are getting correct output, you are fortunate that the old value held by that memory location has not been overwritten. And because your program still has access to the page in which the address lies, you do not get a segmentation fault.
A quick test shows, as the OP points out, that neither GCC 4.3 nor MSVC 10 provide any warnings. But the Clang Static Analyzer does:
ccc-analyzer -c foo.c
...
ANALYZE: foo.c add
foo.c:6:5: warning: Address of stack memory associated with local
variable 'z' returned to caller
return ans_ptr;
^ ~~~~~~~

Explain the output

#include<stdio.h>
int * fun(int a1,int b)
{
int a[2];
a[0]=a1;
a[1]=b;
return a;
}
int main()
{
int *r=fun(3,5);
printf("%d\n",*r);
printf("%d\n",*r);
}
Output after running the code:
3
-1073855580
I understand that a[2] is local to fun(), but why does the value read through the same pointer change?
The variable a is indeed local to fun. When you return from that function, the stack is popped. The memory itself remains unchanged (for the moment). When you dereference r the first time, the memory is what you'd expect it to be. And since the dereference happens before the call to printf, nothing bad happens. When printf executes, it modifies the stack and the value is wiped out. The second time through you're seeing whatever value happened to be put there by printf the first time through.
The order of events for a "normal" calling convention (I know, I know -- no such thing):
Dereference r (the first time through, this is what it should be)
Push value onto stack (notice this is making a copy of the value) (may wipe out a)
Push other parameters on to stack (order is usually right to left, IIRC) (may wipe out a)
Allocate room for return value on stack (may wipe out a)
Call printf
Push local printf variables onto stack (may wipe out a)
Do your thang
Return from function
If you change int a[2]; to static int a[2]; this will alleviate the problem.
Because r points to a location on the stack that is likely to be overwritten by a function call.
In this case, it's the first call to printf itself which is changing that location.
In detail, the return from fun has that particular location being preserved simply because nothing has overwritten it yet.
The *r is then evaluated (as 3) and passed to printf to be printed. The actual call to printf changes the contents of that location (since it uses the memory for its own stack frame), but the value has already been extracted at that point so it's safe.
On the subsequent call, *r has the different value, changed by the first call. That's why it's different in this case.
Of course, this is just the likely explanation. In reality, anything could be happening since what you've coded up there is undefined behaviour. Once you do that, all bets are off.
As you've mentioned, a[2] is local to fun(); meaning it is created on the stack right before the code within fun() starts executing. When fun exits the stack is popped, meaning it is unwound so that the stack pointer is pointing to where it was before fun started executing.
The compiler is now free to stick whatever it wants into those locations that were unwound. So, it is possible that the first location of a was skipped for a variety of reasons. Maybe it now represents an uninitialized variable. Maybe it was for memory alignment of another variable. Simple answer is, by returning a pointer to a local variable from a function, and then de-referencing that pointer, you're invoking undefined behavior and anything can happen, including demons flying out of your nose.
When you compile your code with the following command:
$ gcc -Wall yourProgram.c
it will yield a warning, which says:
In function ‘fun’:
warning: function returns address of local variable
When r is dereferenced for the first printf statement, it's okay as the memory is still preserved. However, that first printf call overwrites the stack, so the second printf statement gives an undesired result.
Because printf is using the stack location and changes it after printing the first value.

Expected segmentation fault, as soon as assigning a value to index 1?

In the code snippet, I expected a segmentation fault as soon as trying to assign a value to count[1]. However the code continues and executes the second for-loop, only indicating a segmentation fault when the program terminates.
#include <stdio.h>
int main()
{
int count[1];
int i;
for(i = 0; i < 100; i++)
{
count[i] = i;
}
for(i = 0; i < 100; i++)
{
printf("%d\n", count[i]);
}
return 0;
}
Could someone explain what is happening?
Reasons for edit:
Improved the example code as per comments of users,
int count[0] -> int count[1],
to avoid flame wars.
You're writing beyond the bounds of the array. That doesn't mean you're going to get a segmentation fault. It just means that you have undefined behavior. Your program's behavior is no longer constrained by the C standard. Anything could happen (including the program seeming to work) -- a segfault is just one possible outcome.
In practice, a segmentation fault occurs when you try to access a memory page that is not mapped to your process by the OS. Each page is 4KB on a typical x86 PC, so basically, your process is given access to memory in 4KB chunks, and you only get a segfault if you write outside the current chunk.
With the small indices you're using, you're still staying within the current memory page, which is allocated to your process, and so the CPU doesn't detect that you're accessing memory out of bounds.
When you write beyond the array bounds, you are probably still writing into memory under the control of your process; you are also almost certainly overwriting memory used by other parts of your program, such as heap bookkeeping data or stack frame contents (saved registers, return addresses). It is only when that data is used, such as when the current function attempts to return, that your code might go awry. Actually, you really hope for a seg fault.
Your code is broken:
seg.c:5: warning: ISO C forbids zero-size array ‘count’
Always compile with high warning levels, for example -Wall -pedantic for GCC.
Edit:
What you are effectively doing is corrupting main's stack frame. Since the stack nowadays pretty much always grows down, this is what's happening:
The first loop overwrites stack memory holding main's parameters and the return address to the crt0 routines.
The second loop happily reads that memory.
When main returns, the segmentation fault is triggered since the return address is fubar-ed.
This is a classic case of buffer overrun and is the basis of many network worms.
Run the program under the debugger and check the addresses of local variables. In GDB you can say set backtrace past-main so backtrace would show you all the routines leading to main.
By the way, the same effect could be achieved without a zero-length array - just make its size smaller than the number of loop iterations.

Resources