Here is the source code for the program:
#include <stdio.h>
/* filename: test.c */
int main() {
    int local = 0;
    char buff[7];

    printf("Password: ");
    gets(buff);

    if (local)
        printf("Buff: %s, local:%d\n", buff, local);
    return 0;
}
I am compiling with "gcc test.c -fno-stack-protector -o test". When I run the program, in order to make gets overflow into local, I need to enter at least 13 characters. My reasoning was: since I declared only 7 bytes for buff, entering 8 or more characters should cause the overflow. But that doesn't seem to be the case. Why?
The stack layout is implementation dependent.
There is no guarantee that local object will be after or before buff object or that both will be contiguous.
In your case, what is likely is 5-bytes (1 + 4) of padding were inserted after buff object. Such padding is very common and is usually inserted by the compiler for performance reasons. The size of the padding can vary between compilers, compiler versions, compiler options or even from different source codes.
To have a better idea of the layout just print the addresses of both buff and local objects:
printf("%p %p\n", (void *) buff, (void *) &local);
gets(buff);
There is no bounds checking in C by default. At best the program will crash; usually the overflow lands in memory that is not otherwise used, so everything appears to work fine. Sometimes you overrun memory that is already in use and may not realize it until you run into problems later.
There's no guarantee that the items on your stack will be allocated packed tightly together, in fact I'm pretty certain they're not even guaranteed to be put in the same order.
If you want to understand why it's not failing as you seem to expect, your best bet is to look at the assembler output, with something like gcc -S.
For what it's worth, gcc 4.8.3 under CygWin fails at eight characters as one might expect. It overflows at seven characters but, because the overflowing character is the NUL terminator, that still leaves local as zero.
Bottom line, undefined behaviour is exactly that. Its most annoying feature (to developers) is that it sometimes works, meaning that we sometimes miss it until the code gets out into production. If UB were somehow tied to electrodes connected to the private parts of developers, I guarantee there'd be far fewer bugs :-)
Here is my code:
#include <stdio.h>
#include <stdlib.h>

#define LEN 2

int main(void)
{
    char num1[LEN], num2[LEN]; // works fine with
    // char *num1 = malloc(LEN), *num2 = malloc(LEN);
    int number1, number2;
    int sum;

    printf("first integer to add = ");
    scanf("%s", num1);
    printf("second integer to add = ");
    scanf("%s", num2);

    // adds integers
    number1 = atoi(num1);
    number2 = atoi(num2);
    sum = number1 + number2;

    // prints sum
    printf("Sum of %d and %d = %d \n", number1, number2, sum);
    return 0;
}
Here is the output:
first integer to add = 15
second integer to add = 12
Sum of 0 and 12 = 12
Why is it printing 0 instead of the first value, 15? I could not understand why this is happening.
It is working fine if I am using
char *num1= malloc(LEN), *num2= malloc(LEN);
instead of
char num1[LEN],num2[LEN];
But it should work fine with this.
Edited:
Yes, it worked for LEN 3, but why did the original show this undefined behaviour? I mean, it did not work with plain arrays but did work with malloc. I now understand that it should not work with malloc either, but why did it work for me? Please be specific so that I can debug more accurately.
Is there any issue with my system, compiler, or IDE?
Please explain a bit more, or provide links to resources, as that would be helpful, because I don't want to rely on being lucky anymore.
LEN is 2, which is enough to store both digits but not the required null terminating character. You are therefore overrunning the arrays (and the heap allocations, in that version of the code!) and this causes undefined behavior. The fact that one works and the other does not is simply a byproduct of how the undefined behavior plays out on your particular system; the malloc version could indeed crash on a different system or a different compiler.
Correct results, incorrect results, crashing, or something completely different are all possibilities when you invoke undefined behavior.
Change LEN to 3 and your example input would work fine.
I would suggest indicating the size of your buffers in your scanf() line to avoid the undefined behavior. You may get incorrect results, but your program at least would not crash or have a security vulnerability:
scanf("%2s", num1);
Note that the number you use there must be one less than the size of the array -- in this example it assumes an array of size 3 (so you read a maximum of 2 characters, because you need the last character for the null terminating character).
LEN is defined as 2. You left no room for a null terminator. In the array case you would overrun the array end and damage your stack. In the malloc case you would overrun your heap and potentially damage the malloc structures.
Both are undefined behaviour. You are unlucky that your code works at all: if you were "lucky", your program would decide to crash in every case just to show you that you were triggering undefined behaviour. Unfortunately that's not how undefined behaviour works, so as a C programmer, you just have to be defensive and avoid entering into undefined behaviour situations.
Why are you using strings, anyway? Just use scanf("%d", &number1) and you can avoid all of this.
Your program does not "work fine" (and should not "work fine") with either explicitly declared arrays or malloc-ed arrays. Strings like 15 and 12 require char buffers of size 3 at least. You provided buffers of size 2. Your program overruns the buffer boundary in both cases, thus causing undefined behavior. It is just that the consequences of that undefined behavior manifest themselves differently in different versions of the code.
The malloc version has a greater chance of producing the illusion of "working", since the sizes of dynamically allocated memory blocks are typically rounded up to the nearest implementation-dependent "round" boundary (like 8 or 16 bytes). That means your malloc calls actually allocate more memory than you ask for. This can temporarily hide the buffer overruns present in your code and produce the illusion that your program is "working fine".
Meanwhile, the version with explicit arrays uses local arrays. Local arrays often have precise size (as declared) and also have a greater chance to end up located next to each other in memory. This means that buffer overrun in one array can easily destroy the contents of the other array. This is exactly what happened in your case.
However, even in the malloc-based version I'd still expect a good debugging version of standard library implementation to catch the overrun problems. It is quite possible that if you attempt to actually free these malloc-ed memory blocks (something you apparently didn't bother to do), free will notice the problem and tell you that heap integrity has been violated at some point after malloc.
P.S. Don't use atoi to convert strings to integers: it gives you no way to detect errors. The function that properly converts strings to integers is strtol.
I am using gcc on Linux, and the code below compiles successfully but does not print the value of i correctly: if characters are entered one at a time, i jumps around or drops to 0. I know I am using %d for a char in scanf (I was trying to clobber the stack). Is this a case of the stack being overwritten, or something else? (I thought that if the stack were clobbered, the program would crash.)
#include <stdio.h>

int main()
{
    int i;
    char c;

    for (i = 0; i < 5; i++) {
        scanf("%d", &c);
        printf("%d ", i);
    }
    return 0;
}
Besides the arguments to main, you have an int and a char on the stack.
Let's assume sizeof(int) == 4 and look only at i and c.
( int i )(char c )
[ 0 ][ 1 ][ 2 ][ 3 (&i)][ 4 (&c)]
So this is actually your stack layout without argc and *argv.
With i consuming four times more memory than c in this case.
The stack grows in the opposite direction, so if you write something bigger than a char to c, it will write to [4] and further to the left, never to the right. So [5] will never get written to. Instead you overwrite [3].
For the case where you write an int to c and int is four times bigger than c, you'll actually write to [1][2][3][4], just [0] will not be overwritten, but 3/4 of the memory for the int will be corrupted.
On a big-endian system, the most significant byte of i will be stored in [3] and therefore get overwritten by this operation. On a little-endian system, the most significant byte is stored in [0] and would be preserved. Nonetheless, you corrupt your stack this way.
As ams mentions this is not always true. There could be different alignments for efficiency or because the platform only supports aligned access, leaving gaps between variables. Also a compiler is allowed to do any optimizations as long as it has no visible side-effects as stated by the as-if rule. In this case the variables could perfectly be stored in a register and never be saved on the stack at all. But a lot of other compiler optimizations and platform dependencies can make this way more complex.
So this is only the simplest case without taking platform dependencies and compiler optimizations into account and also seems to be what happens in your special case with maybe some minor differences.
With your scanf(), you are storing an int into a char. A char is usually stored in just 1 byte, so the write will overflow it, but it may or may not overwrite other values, depending on the alignment and placement of the variables.
If your compiler reserves 1 byte for the char, and the memory address of the int is just after the address of the char (that will probably be the case), then your scanf() will just overwrite the first bytes of i. If you are in a little-endian machine and you enter values smaller than 256, then i will always be 0.
But it can grow larger if you enter a bigger value. Try entering 256; i will become 1. With 512, i will become 2, and so one.
But you are not "erasing the stack", just overwriting some bytes (in fact, you are overwriting sizeof(int) bytes; one of them correspond to the char and the others will probably be all the bytes in your int but one).
If you really want to "erase the stack", you could do something like:
#include <string.h>

int main(void) {
    char c;
    memset(&c, 0, 10000);   /* deliberately writes far beyond c */
    return 0;
}
foobar has given a very nice description of what one compiler on one architecture happens to do, along with an answer to what a hypothetical big-endian compiler/system might do.
To be fair, that's probably the most useful answer, but it's still not correct.
There is no rule that says the stack must grow in one way or the other (although, much like whether to drive on the left or the right, a choice must be made, and a descending stack is most common).
There is no rule that says the compiler must layout the variables on the stack in any particular way. The C standard doesn't care. Not even the official architecture/OS-specific ABI cares a jot about that, as long as the bits necessary for unwinding work. In this case, the compiler could even choose a different scheme for every function in the system (not that it's likely).
All that is certain is that scanf will try to write something int-sized to a char, and that this is undefined. In practice there are several possible outcomes:
It works fine, nothing extra is overwritten, and you got lucky. (Perhaps int is the same size as char, or there was padding in the stack.)
It overwrites the data following the char in memory, whatever that is.
It overwrites the data just before and/or after the char. (This might happen on an architecture where an aligned store instruction disregards the bottom bits of the write address.)
It crashes with an unaligned access exception.
It detects the stack scribble, prints a message, and reports the incident.
Of course, none of this will happen because you compile with -Wall -Wextra -Werror enabled, and GCC tells you that your scanf format doesn't match your variable types.
As Kerrek SB commented, the behavior is undefined.
As you know, you pass a char * to the scanf function but tell it to treat the pointer as an int *.
It might (although very unlikely to) overwrite something else on the stack, for example, i, the previous stack pointer or the return address.
It might just overwrite unused bytes, for example if the compiler uses padding to align the stack.
It might cause a crash, for example if the address of c is not 4- or 8-byte aligned and the platform requires de-reference of int to be 4- or 8-byte aligned.
And it might do anything else.
But the answer is still - anything is possible in such case. The behavior is simply not defined.
I want to understand a number of things about the strings on C:
I could not understand why you cannot change a string with a normal assignment (but only through the string.h functions). For example, I can't do d = "aa" (where d is a char pointer or a char array).
Can someone explain what is going on behind the scenes? The compiler lets such a thing build and run, and you receive a segmentation fault.
Something else, I run a program in C that contains the following lines:
char c='a',*pc=&c;
printf("Enter a string:");
scanf("%s",pc);
printf("your first char is: %c",c);
printf("your string is: %s",pc);
If I enter more than 2 letters (in scanf), I get a segmentation fault. Why is this happening?
If I enter two letters, the first letter prints correctly! But the string is printed with a lot of spaces (incorrect).
If I enter one letter, the letter prints correctly! But the string is printed with a lot of spaces and something weird at the end (a square with four digits made of zeros and ones).
Can anyone explain what is happening behind?
Please note: I do not want the program to work, I did not ask the question to get suggestions for another program, I just want to understand what happens behind the scenes in these situations.
Strings almost do not exist in C (except as C string literals like "abc" in some C source file).
In fact, strings are mostly a convention: a C string is an array of char whose last element is the zero char '\0'.
So declaring
const char s[] = "abc";
is exactly the same as
const char s[] = {'a','b','c','\0'};
in particular, sizeof(s) is 4 (3+1) in both cases (and so is sizeof("abc")).
The standard C library contains a lot of functions (such as strlen(3) or strncpy(3)...) which obey and/or presuppose the convention that strings are zero-terminated arrays of char-s.
Better code would be:
char buf[16] = "a", *pc = buf;
printf("Enter a string:"); fflush(NULL);
scanf("%15s", pc);
printf("your first char is: %c", buf[0]);
printf("your string is: %s", pc);
Some comments: be afraid of buffer overflow. When reading a string, always give a bound to the read string, or else use a function like getline(3) which dynamically allocates the string in the heap. Beware of memory leaks (use a tool like valgrind ...)
When computing a string, be also aware of the maximum size. See snprintf(3) (avoid sprintf).
Often, you adopt the convention that a string is returned and dynamically allocated in the heap. You may want to use strdup(3) or asprintf(3) if your system provides it. But you should adopt the convention that the calling function (or something else, but well defined in your head) is free(3)-ing the string.
Your program can be semantically wrong and, by bad luck, happen to work sometimes. Read carefully about undefined behavior. Avoid it absolutely (your points 1, 2, 3 are probably UB). Sadly, UB may sometimes happen to "work".
To explain some actual undefined behavior, you have to take into account your particular implementation: the compiler, the flags -notably optimization flags- passed to the compiler, the operating system, the kernel, the processor, the phase of the moon, etc etc... Undefined behavior is often non reproducible (e.g. because of ASLR etc...), read about heisenbugs. To explain the behavior of points 1,2,3 you need to dive into implementation details; look into the assembler code (gcc -S -fverbose-asm) produced by the compiler.
I suggest you to compile your code with all warnings and debugging info (e.g. using gcc -Wall -g with GCC ...), to improve the code till you got no warning, and to learn how to use the debugger (e.g. gdb) to run your code step by step.
If I enter more than 2 letters (in scanf), I get a segmentation fault. Why is this happening?
Because memory is allocated for only one byte.
c is declared as a single char initialized with 'a', so exactly one byte of memory is set aside for it.
If scanf() uses this memory for reading more than one byte, then that is simply undefined behavior.
char c = "a"; is a wrong declaration in C: even a single character enclosed in double quotes ("") is treated as a string, i.e. as "a\0", since all strings end with the '\0' null character.
char c = "a"; is wrong, whereas char c = 'a'; is correct.
Also note that the memory allocated for a char is only 1 byte, so it can hold only one character.
Consider the following test program:
char a[10];
strcpy(a, "test");
for (int i = 0; i < 3; i++) {
    char b[2];
    strcpy(b, "tt");
    strcat(a, b);
}
printf("%d %d %s\n", strlen(a), sizeof(a), a);
Output: 10 10 testtttttt.
Everything seems ok.
With i < 7 the buffer overflows, yet there is no error. Output: 18 10 testtttttttttttttt. The program seems to be working.
If i<11 then we see an error "stack smashing detected"...
Why doesn't the program report an error when the condition is i < 7?
What you are doing is undefined behaviour. Anything could happen. What you saw is just one possible outcome, that some automatic tool detected it quite late instead of instantly. Your code is wrong, any assumption what should happen is wrong, and asking for a "why" is pointless. With a different compiler, or with different compiler settings, or on a different day, the outcome could be completely different.
By the way, there is a buffer overflow when i = 0 since you are trying to copy two chars and a trailing zero byte into a buffer that only has space for two chars.
If i<7 the buffer is overflow, however there is no error. Output: 18
10 testtttttttttttttt. Program seems to be working.
The reason is that this is undefined behavior. You can expect any value to appear, since you are accessing the array outside its limits.
You may check Valgrind for these scenarios.
Your buffer a only allows 10 characters, but the string you build is longer; increase the buffer so it can hold the final string.
char a[10];
The error you are getting, i.e. "stack smashing detected", comes from a protection mechanism used by gcc to catch buffer overflow errors.
You are asking why there is no error: the buffer overflow detection is a feature to help you, but there is absolutely no guarantee that it will detect every buffer overflow.
I might be stupid, and you'll have to excuse me in that case... but I don't get this.
I'm allocating a buffer of 16 chars and then (in a for loop) putting 23(!?) characters into it, then printing that out.
What I don't get is how I can put 23 chars into a buffer that was malloc'ed as 16 chars... When I change the loop to 24 characters, I do get an error (at least on Linux with gcc)... but why not "earlier" (17 characters should already break it... no?)
This is my example code:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int n;
    char *buf;

    buf = malloc(16 * sizeof(*buf));
    if (buf == NULL) exit(1);

    for (n = 0; n < 22; n++)
    {
        buf[n] = rand() % 26 + 'a';
    }
    buf[n] = '\0';

    printf("Random string: %s\n", buf);

    free(buf);
    buf = NULL;

    getchar();
    return 0;
}
You are producing an error, but like many bugs it just happens to not be noticed. That's all there is to it.
It might be for one of several reasons, probably that, because of the way the free store is structured, there is slack space between allocations: the system needs (or wants) to keep the addresses of allocatable blocks aligned on certain boundaries. So writes a little past your allocated block don't interfere with the free store's data structures, but a little further and they do.
It is also quite possible that your bug did corrupt something the free store manager was using, but it just happened to not be actually used in your simple program so the error wasn't noticed (yet).
Most memory allocation strategies round your malloc request up to some quantization value. Often 16 or 32 bytes. That quantization usually happens after the allocator has added in its overhead (used to keep track of the allocated blocks), so it's common to find that you can overrun a malloc by some small number of bytes without doing any actual damage to the heap, especially when the allocation size is odd.
Of course, this isn't something that you want to depend on, it's an implementation detail of the c runtime library, and subject to change without notice.
The behaviour when you overrun a buffer in C is undefined. So anything may happen including nothing. Specifically C is not required and is designed intentionally not to perform bounds checking.
If you get any runtime error at all, it will generally be because it has been detected and trapped by the operating system, not by the C runtime. And that will only occur if the access encroaches upon memory not mapped to the process. More often it will simply access memory belonging to your process but which may be in use by other memory objects; the behaviour of your program will then depend on what your code does with the invalid data.
In C you will often get away with these kinds of things for a while. Some time later, other parts of the program may come along and overwrite the area you were not supposed to use. So it is better not to rely on such behaviour.