Why no segmentation fault on strcpy? [duplicate] - c

Closed 11 years ago as a possible duplicate of: Undefined, unspecified and implementation-defined behavior
This should seg fault. Why doesn't it?
#include <string.h>
#include <stdio.h>
char str1[] = "Sample string. Sample string. Sample string. Sample string. Sample string. ";
char str2[2];
int main ()
{
    strcpy (str2,str1);
    printf("%s\n", str2);
    return 0;
}
I am using gcc version 4.4.3 with the following command:
gcc -std=c99 testString.c -o test
I also tried setting the optimisation level to 0 (-O0).

This should seg fault
There's no reason it "should" segfault. The behaviour of the code is undefined. This does not mean it necessarily has to crash.

A segmentation fault only occurs when you perform an access to memory the operating system knows you're not supposed to.
So, what's likely going on, is that the OS allocates memory in pages (which are typically around 4KiB). str2 is probably on the same page as str1, and you're not running off the end of the page, so the OS doesn't notice.
That's the thing about undefined behavior. Anything can happen. Right now, that program actually "works" on your machine. Tomorrow, str2 may be put at the end of a page, and then segfault. Or possibly, you'll overwrite something else in memory, and have completely unpredictable results.
edit: how to cause a segfault:
Two ways. One is still undefined behavior, the other is not.
int main() {
    *((volatile char *)0) = 42; /* undefined behavior, but normally segfaults */
}
Or to do it in a defined way:
#include <signal.h>
int main() {
    raise(SIGSEGV); /* segfault using defined behavior */
}
edit: third and fourth way to segfault
Here is a variation of the first method using strcpy:
#include <string.h>
const char src[] = "hello, world";
int main() {
    strcpy(0, src); /* undefined */
}
And this variant only crashes for me with -O0:
#include <string.h>
const char src[] = "hello, world";
int main() {
    char too_short[1];
    strcpy(too_short, src); /* smashes stack; undefined */
}
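For contrast with the overflowing strcpy in the question, here is a minimal sketch of a copy that refuses to write past the destination. The helper name copy_checked and its return convention are made up for illustration; it is not a standard library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, terminator
   included. Returns 0 on success, -1 if dst (dst_size bytes) is too small. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;   /* +1 for the terminating '\0' */
    if (needed > dst_size)
        return -1;                     /* refuse instead of overflowing */
    memcpy(dst, src, needed);
    return 0;
}
```

Called as copy_checked(str2, sizeof str2, str1) with the arrays from the question, this would return -1 and leave str2 untouched, because str1 does not fit into 2 bytes.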

Your program writes beyond the allocated bounds of the array, which results in Undefined Behavior.
The program is incorrect, and it might crash or it might not. An explanation may or may not be possible.
It probably doesn't crash because it overwrites some memory beyond the array bounds which is not being used, but it may well do so once the rightful owner of that memory tries to access it.

A seg-fault is NOT guaranteed behavior. It is one possible (and sometimes likely) outcome of doing something bad. Another possible outcome is that it works by pure luck. A third possible outcome is nasal demons.

If you really want to find out what this might be corrupting, I would suggest looking at what follows the overwritten memory. Generating a linker map file should give you a fair idea, but then again this all depends on how things are laid out in memory. You can also try running this under gdb to work out why it does or does not segfault.
That being said, the granularity of the built-in (hardware-assisted) access-violation checks cannot be finer than a page unless some software magic is thrown in. Even with page-granularity checking, the immediately following page may well be mapped to something else belonging to the program you are executing, and it may be a writable page.
Someone who knows about valgrind (or libefence) can explain how it is able to detect such access violations. Most likely (I might be very wrong with this explanation, correct me if I am wrong!) it uses some form of markers or comparisons to check whether an out-of-bounds access has happened.
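The "markers or comparisons" idea from this answer can be sketched as follows. This only illustrates the marker approach in general; it is not a description of how valgrind or Electric Fence are implemented. The names guarded_alloc, guard_intact, GUARD_SIZE and GUARD_BYTE are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 8      /* hypothetical number of marker bytes */
#define GUARD_BYTE 0xA5   /* hypothetical marker value */

/* Allocate n usable bytes followed by GUARD_SIZE marker bytes. */
char *guarded_alloc(size_t n)
{
    char *p = malloc(n + GUARD_SIZE);
    if (p != NULL)
        memset(p + n, GUARD_BYTE, GUARD_SIZE);
    return p;
}

/* Return 1 if the markers after the n usable bytes are intact, else 0. */
int guard_intact(const char *p, size_t n)
{
    const unsigned char *g = (const unsigned char *)p + n;
    for (size_t i = 0; i < GUARD_SIZE; i++)
        if (g[i] != GUARD_BYTE)
            return 0;
    return 1;
}
```

A check like this only notices an overrun after the fact, when guard_intact is called, and only if the overrun actually changed a marker byte. That is the trade-off compared with a page fault, which stops the program at the offending write.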

Related

The function of malloc (using malloc correctly)

So I'm quite new to this, sorry if it sounds like a dumb question.
I'm trying to understand malloc by creating a very simple program which will print "ABC" using ASCII codes.
Here is my code (what our professor taught us) so far:
char *i;
i = malloc(sizeof(char)*4);
*i = 65;
*(i+1) = 66;
*(i+2) = 67;
*(i+3) = '\0';
What I don't understand is, why do I have to put malloc there?
The professor told us the program won't run without the malloc,
but when I tried running it without the malloc, the program ran just fine.
So what's the function of malloc there?
Am I even using it right?
Any help or explanation would be really appreciated.
the professor told us the program won't run without the malloc
This is not quite true, the correct wording would be: "The program's behavior is undefined without malloc()".
The reason for this is that
char *i;
just declares a pointer to a char, but there's no initialization -- this pointer points to some indeterminate location. You could be just lucky in that writing values to this "random" location works and won't result in a crash. I'd personally call it unlucky because this hides a bug in your program. Undefined behavior just means anything can happen, including a "correct" program execution.
malloc() will dynamically request some usable memory and return a pointer to that memory, so after the malloc(), you know i points to 4 bytes of memory you can use. If malloc() fails for some reason (no more memory available), it returns NULL -- your program should test for it before writing to *i.
All that said, of course the program CAN work without malloc(). You could just write
char i[4];
and i would be a local variable with room for 4 characters.
Final side note: sizeof(char) is defined to be 1, so you can just write i = malloc(4);.
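Putting these points together, a minimal sketch of the professor's snippet with the NULL check added might look like this. The function name make_abc is made up for illustration, and the character codes assume an ASCII execution character set, as in the question.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a heap-allocated "ABC", or NULL if
   malloc() fails. The caller owns the result and must free() it. */
char *make_abc(void)
{
    char *s = malloc(4);   /* 3 letters + '\0'; sizeof(char) is 1 */
    if (s == NULL)
        return NULL;       /* never write through a NULL pointer */
    s[0] = 65;             /* 'A' in ASCII */
    s[1] = 66;             /* 'B' */
    s[2] = 67;             /* 'C' */
    s[3] = '\0';
    return s;
}
```

The caller would print the result with printf("%s\n", s) and then call free(s) once it is no longer needed.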
Unfortunately, the "runs fine" criterion proves nothing about a C program. A great many C programs that run to completion have undefined behavior, which just does not happen to manifest itself on your particular platform.
You need special tools to see this error. For example, you can run your code through valgrind and see it access an uninitialized pointer.
As for the malloc, you do not have to use dynamic buffer in your code. It would be perfectly fine to allocate the buffer in automatic memory, like this:
char buf[4], *i = buf;
You have to allocate memory before writing through the pointer. In the example below, I did not allocate memory for i, which on my machine resulted in a segmentation fault (the program tried to access memory that it doesn't have access to):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    char *i; /* uninitialised: points nowhere in particular */
    strcpy(i, "hello"); /* undefined behavior */
    printf("%s\n", i);
    return (0);
}
Output: Segmentation fault (core dumped)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    char *i;
    /* Allocate 6 bytes: 5 characters plus 1 for the '\0' terminator */
    i = malloc(sizeof(char) * 6);
    if (i == NULL)
        return (1);
    strcpy(i, "hello");
    printf("%s\n", i);
    free(i);
    return (0);
}
Result: hello
Malloc allows you to create space, so you can write to a spot in memory. In the first example, "It won't work without malloc" because i is pointing to a spot in memory that doesn't have space allocated yet.

C Stack-Allocated String Scope

For straight C and GCC, why doesn't the pointed-to string get corrupted here?
#include <stdio.h>
int main(int argc, char *argv[])
{
    char* str_ptr = NULL;
    {
        //local to this scope-block
        char str[4]={0};
        sprintf(str, "AGH");
        str_ptr = str;
    }
    printf("str_ptr: %s\n", str_ptr);
    getchar();
    return 0;
}
|----OUTPUT-----|
str_ptr: AGH
|--------------------|
Here's a link to the above code compiled and executed using an online compiler.
I understand that if str was a string literal, str would be stored in the bss ( essentially as a static ), but sprintf(ing) to a stack-allocated buffer, I thought the string buffer would be purely stack-based ( and thus the address meaningless after leaving the scope block )? I understand that it may take additional stack allocations to over-write the memory at the given address, but even using a recursive function until a stack-overflow occurred, I was unable to corrupt the string pointed to by str_ptr.
FYI I am doing my testing in a VS2008 C project, although GCC seems to exhibit the same behavior.
While nasal lizards are a popular part of C folklore, code whose behaviour is undefined can actually exhibit any behaviour at all, including magically resuscitating variables whose lifetime has expired. The fact that code with undefined behaviour can appear to "work" should neither be surprising nor an excuse to neglect correcting it. Generally, unless you're in the business of writing compilers, it's not very useful to examine the precise nature of undefined behaviour in any given environment, especially as it might be different after you blink.
In this particular case, the explanation is simple, but it's still undefined behaviour, so the following explanation cannot be relied upon at all. It might at any time be replaced with reptilian emissions.
Generally speaking, C compilers will make each function's stack frame a fixed size, rather than expanding and contracting as control flow enters and leaves internal blocks. Unless called functions are inlined, their stack frames will not overlap with the stack frame of the caller.
So, in certain C compilers with certain sets of compile options and except for particular phases of the moon, the character array str will not be overwritten by the call to printf, even though the variable's lifetime has expired.
Most likely the compiler does some sort of simple optimizations resulting in the string still being in the same place on the stack. In other words, the compiler allows the stack to grow to store 'str'. But it doesn't shrink the stack in the scope of main, because it is not required to do so.
If you really want to see the result of saving the address of variables on the stack, call a function.
#include <stdio.h>
char * str_ptr = NULL;
void onstack(void)
{
    char str[4] = {0};
    sprintf(str,"AGH");
    str_ptr = str;
}
int main(int argc, char *argv[])
{
    onstack();
    int x = 0x61626364;
    printf("str_ptr: %s\n", str_ptr);
    printf("x:%i\n",x);
    getchar();
    return 0;
}
With gcc -O0 -std=c99 strcorrupt.c I get random output on the first printf. It will vary from machine to machine and architecture to architecture.
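One way to avoid the dangling pointer altogether is to let the caller own the storage, so the buffer outlives the function that fills it. The sketch below assumes a hypothetical helper named fill_agh; the name and return convention are not from the original code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "AGH" into caller-owned storage.
   Returns 0 on success, -1 if the buffer is too small. */
int fill_agh(char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s", "AGH");
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With char str[4] declared in main and passed as fill_agh(str, sizeof str), the array's lifetime covers the later printf, so there is nothing left to "corrupt".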

Does memcpy() use realloc()?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
    char *buff = (char*)malloc(sizeof(char) * 5);
    char *str = "abcdefghijklmnopqrstuvwxyz";
    memcpy (buff, str, strlen(str));
    while(*buff) {
        printf("%c" , *buff++);
    }
    printf("\n");
    return 0;
}
This code prints the whole string "abc...xyz", but "buff" does not have enough memory to hold that string. How does memcpy() work? Does it use realloc()?
Your code has Undefined Behavior. To answer your question, NO, memcpy doesn't use realloc.
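The memory that buff points to must be large enough to hold strlen(str) + 1 bytes (the characters plus the terminating null). Anything less is undefined behavior, which may or may not show up as a crash.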
The output might be printed as it's a small program, but in real big code it will cause hard to debug errors. Change your code to,
const char* const str = "abcdefghijklmnopqrstuvwxyz";
char* const buff = (char*)malloc(strlen(str) + 1);
Also, don't do *buff++, because you will lose the pointer to the memory you allocated. After malloc(), call free(buff) once you are done with the memory; otherwise it's a memory leak.
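A minimal sketch combining these suggestions: allocate strlen(str) + 1 bytes, copy the terminator too, and walk the string with a second pointer so the original can still be freed. The helper names copy_string and count_chars are made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL on failure.
   The caller must free() the result. */
char *copy_string(const char *src)
{
    size_t len = strlen(src);
    char *buff = malloc(len + 1);   /* room for the terminating '\0' */
    if (buff == NULL)
        return NULL;
    memcpy(buff, src, len + 1);     /* copy the terminator as well */
    return buff;
}

/* Count characters by walking a second pointer, leaving the caller's
   pointer untouched so it can still be passed to free(). */
size_t count_chars(const char *buff)
{
    size_t n = 0;
    for (const char *p = buff; *p != '\0'; p++)
        n++;
    return n;
}
```

Note that the memcpy in the question copies only strlen(str) bytes, so even with a large enough buffer the while(*buff) loop would read an unterminated string; copying len + 1 bytes avoids that.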
You might be getting the whole string printed out, but it is not safe and you are writing to and reading from unallocated memory. This produces Undefined Behavior.
memcpy does not do any memory allocation. It simply reads from and writes to the locations you provide. It doesn't check that it is alright to do so, and in this case you're lucky if your program doesn't crash.
how memcpy() works?
Because you've invoked undefined behavior. Undefined behavior may work exactly as you expect, and it may do something completely different. It may even differ between different runs of the same program. It could also format your hard disk and still be compliant with the standard (Though of course that's unlikely :P )
Undefined behavior means that the behavior is literally not defined to do anything. Anything is valid, including the behavior you're seeing. Note that if you try to free that memory the C runtime of your target platform will probably complain. ;)
No, memcpy does not use malloc. As you suspected, you are writing off the end of buff. In your simple example, that does no apparent harm, but it is bad. Here are some of the things that could go wrong in a "real" program:
You might scribble on something allocated in the memory following your buff leading to subtle (or not so subtle) bugs later on.
You might scribble on headers used internally by malloc and free, leading to crashes or other problems on your next call to those functions.
You might end up writing to an address that has not been allocated to your process, in which case your program will immediately crash. (I suspect this is what you were expecting.)
There are malloc implementations that put unmapped guard pages around allocated memory to (usually) cause the program to crash in cases like this. Other implementations will detect this, but only on your next call to malloc or free (or when you call a special function to check the heap).

Why does it NOT give a segmentation violation?

The code below is said to give a segmentation violation:
#include <stdio.h>
#include <string.h>
void function(char *str) {
    char buffer[16];
    strcpy(buffer,str);
}
int main() {
    char large_string[256];
    int i;
    for( i = 0; i < 255; i++)
        large_string[i] = 'A';
    function(large_string);
    return 1;
}
It's compiled and run like this:
gcc -Wall -Wextra hw.cpp && a.exe
But there is nothing output.
NOTE
The above code indeed overwrites the ret address and so on if you really understand what's going underneath.
The ret address will be 0x41414141 to be specific.
Important
This requires profound knowledge of stack
You're just getting lucky. There's no reason that code has to generate a segmentation fault (or any other kind of error). It's still probably a bad idea, though. You can probably get it to fail by increasing the size of large_string.
Probably in your implementation buffer is immediately below large_string on the stack. So when the call to strcpy overflows buffer, it's just writing most of the way into large_string without doing any particular damage. It will write at least 255 bytes, but whether it writes more depends what's above large_string (and the uninitialised value of the last byte of large_string). It seems to have stopped before doing any damage or segfaulting.
By fluke, the return address of the call to function isn't being trashed. Either it's below buffer on the stack or it's in a register, or maybe the function is inlined, I can't remember what no optimisation does. If you can't be bothered to check the disassembly, I can't either ;-). So you're returning and exiting without problems.
Whoever said that code would give a segfault probably isn't reliable. It results in undefined behaviour. On this occasion, the behaviour was to output nothing and exit.
[Edit: I checked on my compiler (GCC on cygwin), and for this code it is using the standard x86 calling convention and entry/exit code. And it does segfault.]
You're compiling a .cpp (C++) program by invoking gcc (instead of g++). I'm not sure if this is the cause, but on a Linux system (it appears you're running on Windows, given the default .exe output) it throws the following error when trying to compile as you have stated:
/tmp/ccSZCCBR.o:(.eh_frame+0x12): undefined reference to `__gxx_personality_v0'
collect2: ld returned 1 exit status
It's UB (undefined behavior).
strcpy might have copied more bytes into the memory pointed to by buffer than it can hold, and that might not cause a problem at that moment.
It's undefined behavior, which means that anything can happen. The program can even appear to work correctly.
It seems that you just happen not to overwrite any parts of memory that are still needed by the rest of the (short) program (or that are outside the program's address space, write-protected, ...), so nothing special happens. At least nothing that would lead to any output.
There's a zero byte on the stack somewhere that stops the strcpy(), and there's enough room on the stack not to hit a protected page. Try printing out strlen(buffer) in that function. In any case the result is undefined behavior.
Get into the habit of using the strlcpy(3) family of functions.
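strlcpy is not part of ISO C and is not available on every platform, so here is a rough sketch of a truncating copy built on the standard snprintf instead. The helper name copy_truncating and its return convention are hypothetical and do not match strlcpy's actual interface.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies at most dst_size - 1 characters of src,
   always terminates dst when dst_size > 0, and returns 1 if src had to
   be truncated (or on error), else 0. */
int copy_truncating(char *dst, size_t dst_size, const char *src)
{
    int n = snprintf(dst, dst_size, "%s", src);
    return (n < 0 || (size_t)n >= dst_size) ? 1 : 0;
}
```

Like strcpy, this still requires src to be a properly terminated string. The large_string in the question never gets a terminator, so it would have to be fixed first (for example by setting large_string[255] = '\0').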
You can test this in other ways:
#include <stdlib.h>
int main() {
    int *a=(int *)malloc(10*sizeof(int));
    int i;
    for (i=0;i<1000000; i++) a[i] = i;
    return 0;
}
In my machine, this causes SIGSEGV only at around i = 37000! (tested by inspecting the core with gdb).
To guard against these problems, test your programs using a malloc debugger... and use lots of mallocs, since there are no memory debugging libraries that I know of that can look into static memory. Example: Electric Fence
gcc -g -Wall docore.c -o c -lefence
And now the SIGSEGV is triggered as soon as i=10, as would be expected.
As everyone says, your program has undefined behaviour. In fact your program has more bugs than you thought it did, but after it's already undefined it doesn't get any further undefined.
Here's my guess about why there was no output. You didn't completely disable optimization. The compiler saw that the code in function() doesn't have any defined effect on the rest of the program. The compiler optimized out the call to function().
Odds are that the long string is, in fact, terminated by the zero byte in i. Assuming that the variables in main are laid out in the order they are declared -- which isn't required by anything in the language spec that I know of but seems likely in practice -- then large_string would be first in memory, followed by i. The loop sets i to 0 and counts up to 255. Whether i is stored big-endian or little-endian, either way it has a zero byte in it. So in traversing large_string, at either byte 256 or 257 you'll hit a null byte.
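The claim that i contains a zero byte either way can be checked by looking at the object representation of an int byte by byte. This is a small sketch with a made-up helper name, has_zero_byte; it assumes int is wider than one byte, which holds on mainstream platforms.

```c
#include <stddef.h>

/* Return 1 if any byte of value's object representation is zero. */
int has_zero_byte(int value)
{
    const unsigned char *bytes = (const unsigned char *)&value;
    for (size_t k = 0; k < sizeof value; k++)
        if (bytes[k] == 0)
            return 1;
    return 0;
}
```

After the loop i holds 255, which fits in a single byte, so the remaining bytes of the int are zero regardless of byte order. Whether strcpy actually reaches that byte still depends on where the compiler places i relative to large_string.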
Beyond that, I'd have to study the generated code to figure out why this didn't blow up. As you seem to indicate, I'd expect that the copy to buffer would overwrite the return address of function(), so when it tried to return you'd be going into deep space somewhere and would quickly blow up on something.
But as others say, "undefined" means "unpredictable".
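There may be anything in your 'char buffer[16]' beforehand, including \0, but that does not matter: strcpy copies until it finds the first \0 in the source, not the destination. Since large_string is never terminated, the copy runs past the boundary of 16 characters and only stops at whatever zero byte happens to follow large_string in memory.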
