Writing more characters than malloced. Why does it not fail? - c

Why does the following work and not throw some kind of segmentation fault?
char *path = "/usr/bin/";
char *random = "012";
// path + random + \0
// so it's malloc(13), but I get 16 bytes due to memory alignment (I'm on 32-bit)
newPath = (char *) malloc(strlen(path) + strlen(random) + 1);
strcat(newPath, path);
strcat(newPath, "random");
// newPath is now: "/usr/bin/012\0" which makes 13 characters.
However, if I add
strcat(newPath, "RANDOMBUNNIES");
shouldn't this call fail, because strcat uses more memory than allocated? Consequently, shouldn't
free(newPath)
also fail because it tries to free 16 bytes but I used 26 bytes ("/usr/bin/012RANDOMBUNNIES\0")?
Thank you so much in advance!

Most often this kind of overrun problem doesn't make your program explode in a cloud of smoke and the smell of burnt sulphur. It's more subtle: the variable that happens to be allocated after the overrun one gets altered, causing unexplainable and seemingly random behavior of the program later on.

The whole program snippet is wrong. You are assuming that malloc() returns something that has at least the first byte set to 0. This is not generally the case, so even your "safe" strcat() is wrong.
But otherwise, as others have said, undefined behavior doesn't mean your program will crash. It only means it can do anything (including crashing, but also not crashing, if you are unlucky).
(Also, you shouldn't cast the return value of malloc().)
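Putting those points together, here is a minimal corrected sketch of the snippet (assuming the goal is simply to build "/usr/bin/012"): copy first with strcpy() since malloc'd memory isn't zeroed, concatenate the variable random rather than the literal "random", and drop the cast.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *path = "/usr/bin/";
    char *random = "012";

    char *newPath = malloc(strlen(path) + strlen(random) + 1);
    if (newPath == NULL)
        return 1;

    strcpy(newPath, path);    /* copy first: malloc'd memory is not zeroed */
    strcat(newPath, random);  /* the variable, not the literal "random" */

    printf("%s\n", newPath);  /* "/usr/bin/012" */
    free(newPath);
    return 0;
}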

Writing more characters than you malloc'd is undefined behavior.
Undefined behavior means anything can happen and the behavior cannot be predicted or explained.

A segmentation fault generally occurs because you access an invalid memory section. Here it won't give an error (segmentation fault) because the memory just past your allocation is still accessible. However, you are overwriting other memory locations, which is undefined behavior, even though your code appears to run fine.

It will fail or not fail seemingly at random, depending on what happens to live in memory just after the malloc'd block.
Also, when you want to concatenate random you shouldn't put it in quotes. That should be
strcat(newPath, random);

Many C library functions do not check whether they overrun. It's up to the programmer to manage the memory allocated. You may just be writing over another variable in memory, with unpredictable effects on the operation of your program. C is designed for efficiency, not for pointing out errors in programming.
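One way to do that bookkeeping explicitly is to size the buffer up front and let the standard snprintf() enforce the bound, so overly long input gets truncated instead of overrunning the allocation. A rough sketch (the names path and name are just illustrative):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *path = "/usr/bin/";
    const char *name = "012";

    size_t len = strlen(path) + strlen(name) + 1;
    char *full = malloc(len);
    if (full == NULL)
        return 1;

    /* snprintf never writes more than len bytes, including the '\0' */
    snprintf(full, len, "%s%s", path, name);

    puts(full);
    free(full);
    return 0;
}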

You got lucky with this call. You don't get a segfault because your writes presumably stay within an allocated part of the address space. This is undefined behaviour: the last chars of what has been written are not guaranteed to stay intact, and these calls may just as well fail.

Buffer overruns aren't guaranteed to cause a segfault. The behavior is simply undefined. You may get away with writing to memory that's not yours one time, cause a crash another time, and silently overwrite something completely unrelated a third time. Which one of these happens depends on the OS (and OS version), the hardware, the compiler (and compiler flags), and pretty much everything else that is running on your system.
This is what makes buffer overruns such nasty sources of bugs: often, the apparent symptom shows in production but not when run through a debugger, and the symptoms usually don't show in the part of the program where they originate. And of course, they are a welcome vulnerability for injecting your own code.

Operating systems allocate at a certain granularity, which on my system is a page size of 4 KB (typical on 32-bit machines). Whether a malloc() always takes a fresh page from the OS depends on your C runtime library.
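If you want to see the page size the answer refers to, a small sketch (assuming a POSIX system, where sysconf() is available) is:
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Typically prints 4096 bytes on x86/x86-64 Linux */
    printf("page size: %ld bytes\n", sysconf(_SC_PAGESIZE));
    return 0;
}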

Related

Why did the heap not corrupt earlier?

I am trying to understand at a lower level how C manages memory. I found some code on a web page whose aim is to teach how bad poor memory management can be, so I copied it, pasted it, and compiled:
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    char *p, *q;
    p = malloc(1024);
    q = malloc(1024);
    if (argc >= 2)
        strcpy(p, argv[1]);
    free(q);
    free(p);
    return 0;
}
The test cases were executed with the generic command
/development/heapbug$ ./heapbug `perl -e 'print "A"x$K'`
For $K < 1023 I did not expect problems, but for $K = 1024 I expected a core dump, which didn't take place. Long story short, I started having segfaults for $K > 1033.
Two questions:
1) Why did this happen?
2) Is there a formula that states the "tolerance" of a system?
When you write past the bounds of allocated memory, you invoke undefined behavior. This means you can't accurately predict the behavior of the program. It may crash, it may output strange results, or it may appear to work properly.
Also, making a seemingly unrelated change such as adding an unused local variable or a printf call for debugging can change how undefined behavior manifests itself, as can compiling with a different compiler or with the same compiler with different optimization settings.
Just because the program could crash doesn't mean it will.
That being said, what probably happened has to do with how malloc is implemented on your system. It probably puts aside a few more bytes than what was requested, for alignment and bookkeeping purposes. Without aggressive optimization those extra bytes probably aren't used for anything else, so you get away with writing to them, but you run into trouble once you write further, into bytes that might contain internal structures used by malloc and free, which you then corrupt.
But again, you can't depend on this behavior. C depends on the developer to follow the rules, and if you don't bad things happen.
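If you want to see those "few more bytes" in practice, glibc exposes the non-portable malloc_usable_size(); a rough sketch (assuming glibc, so this is not standard C) is below. Even the extra usable bytes are not yours to rely on as far as the C standard is concerned.
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* glibc-specific header for malloc_usable_size() */

int main(void)
{
    char *p = malloc(1024);
    if (p == NULL)
        return 1;

    /* Often prints a value somewhat larger than 1024,
       because of alignment and allocator bookkeeping. */
    printf("requested 1024, usable %zu\n", malloc_usable_size(p));

    free(p);
    return 0;
}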
Undefined behaviour is just that. It might crash. It might not. It might work flawlessly. It might drink all the milk in your fridge. It might steal your favourite pair of shoes and stomp around in the mud with them.
Just because something is undefined behaviour does not mean it will be immediately obvious as such. You've overflowed the buffer here, but the consequences weren't observed. It's likely because you don't actually use the second buffer you allocate, so even if your overflow spilled into it, no code that reads that data is affected.
This is why tools like Valgrind exist, to look for mistakes that may not always produce obvious or undesirable results.
From my understanding, if you overflow into memory controlled by the user space of your application (code/stack/etc.) it isn't guaranteed to cause a core dump and can indeed silently overwrite some of that memory, which is exactly the risk posed by unintentional buffer overflows.
Once you start attempting to overwrite data outside of those bounds, the OS is more likely to block it.
Writing to unallocated memory is undefined behavior. The outcome isn't specified. It may or may not cause a crash. A heap overflow may corrupt the contents of other memory addresses, but how that will affect the program is unknown.

Odd behavior regarding malloc()

Why does this work?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main()
{
    char *abc = malloc(1) + 4; // WRONG use of malloc.
    char *xyz = "abc";
    strcpy(abc, xyz);          // Should fail.
    printf("%s\n", abc);       // Prints abc
}
I would expect the strcpy to fail for not having enough memory, as I'm passing in 1 to the argument of malloc(). Instead, this compiles and runs flawlessly (in both GCC on linux and dev c++ on Windows).
Is this expected behavior, or a happy coincidence?
I assume this isn't good practice, but why does it work?
Without the +4 at the end of malloc(), I get a segmentation fault. This is mostly what I'm curious about.
This is undefined behavior. Don't do that!
You're trying to access memory location beyond the allocated region. So, the memory location is invalid and accessing invalid memory invokes UB.
FWIW,
there is nothing in the C standard that stops you from accessing out-of-bounds (invalid) memory, and
neither does strcpy() check the size of the destination buffer against the length of the source,
so this code compiles. As soon as you run it and it hits UB, nothing is guaranteed anymore.
P.S - the only guaranteed thing here is undefined behavior.
This is basically another demonstration of the fact that pointers in C are low-level and (typically) not checked. You said you expected it to "fail for not having enough memory", but think about it: what did you expect to fail? The strcpy function most assuredly does not check that there's enough room for the string it's copying. It has no way to do so; all it gets is a pointer. It just starts copying characters, and in practice it either succeeds or dies on a segmentation violation. (But the point is it does not die on "out of memory".)
Do not rely on that behavior. The answerers responding "vigorously" are justified in that relying on such behavior can lurk undetected for years and then, one day, a minor adjustment to the runtime system suddenly causes catastrophic failure.
It seems to work because, since the advent of 32-bit computers, many (if not most) C runtime libraries implement malloc()/free() so that the heap is managed with 16-byte granularity. That is, calling malloc() with any parameter from 1 to 16 provides the same allocation. So you get a little more memory than you asked for, and that allows the code to execute.
A tool like valgrind would certainly detect a problem.
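For comparison, a sketch of the same snippet with the allocation sized to the string it actually holds (no +4 offset, no undersized buffer):
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
    const char *xyz = "abc";

    char *abc = malloc(strlen(xyz) + 1);  /* room for "abc" plus '\0' */
    if (abc == NULL)
        return 1;

    strcpy(abc, xyz);
    printf("%s\n", abc);  /* prints abc, this time legitimately */

    free(abc);
    return 0;
}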

C strange int array pointer

int *array;                  // it declares a pointer to an int, right?
array = malloc(sizeof(int)); // it allocates space for ONE int, right?
scanf("%d", &array[4]);      // must generate a segmentation fault, because there isn't enough allocated space, right?
printf("%d", array[4]);      // it shouldn't print anything...
But it prints 4! Why?
Reading or writing off the end of an array in C results in undefined behavior, meaning that absolutely anything can happen. It can crash the program, or format the hard drive, or set the computer on fire. In your case, it just so happens to be the case that this works out perfectly fine, probably because malloc pads the length of the allocated memory for efficiency reasons (or perhaps because you're trashing memory that malloc wants to use later on). However, this isn't at all portable, isn't guaranteed to work, and is a disaster waiting to happen.
Hope this helps!
Because the operating system happens to have that memory mapped for your process. This code is by no means guaranteed to run on another machine or at another time.
C doesn't check whether your code goes past the boundaries of an array; that is left to the programmer. It is undefined behavior, though in practice the location you hit is usually still part of the stack or the heap, so you can still get to it.
When you write array[4] you actually say *(array + 4), which in terms of bytes is an offset of 4*sizeof(int) from array, so you land at some specific place in memory. That place exists: it could be read-only, it could belong to another array or variable in your program, or it might just happen to work. There is no guarantee you'll get an error, but it is still undefined behavior.
To understand more about undefined behavior you can go to this article (which I find very interesting).
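A sketch of what the snippet would have to look like for array[4] to be a valid element (allocating at least five ints instead of one):
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* allocate space for five ints so that array[4] is a valid element */
    int *array = malloc(5 * sizeof *array);
    if (array == NULL)
        return 1;

    if (scanf("%d", &array[4]) == 1)
        printf("%d\n", array[4]);

    free(array);
    return 0;
}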

C program help: Insufficient memory allocation but still works...why? [duplicate]

Possible Duplicate:
behaviour of malloc(0)
I'm trying to understand memory allocation in C, so I am experimenting with malloc. I allocated 0 bytes for this pointer, yet it can still hold an integer. In fact, no matter what number I put into the parameter of malloc, it can still hold any number I give it. Why is this?
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *ptr = (int*)malloc(0);
    *ptr = 9;
    printf("%i", *ptr); // 9
    free(ptr);
    return 0;
}
It still prints 9, what's up with that?
If size is 0, then malloc() returns either NULL, or a unique pointer
value that can later be successfully passed to free().
I guess you are hitting the 2nd case.
Anyway, that pointer just happens, by accident, to point into an area where you can write without generating a segmentation fault, but you are probably writing into the space of some other variable, messing up its value.
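If you are curious which of the two cases your implementation picks, a tiny sketch that only inspects the pointer (without dereferencing it, which is the undefined part) is:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *ptr = malloc(0);

    /* Either NULL or a unique pointer that may only be passed to free() */
    printf("malloc(0) returned %p\n", (void *)ptr);

    free(ptr);  /* free(NULL) is also well defined: it does nothing */
    return 0;
}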
A lot of good answers here. But it is definitely undefined behavior. Some people declare that undefined behavior means that purple dragons may fly out of your computer or something like that... there's probably some history behind that outrageous claim that I'm missing, but I promise you that purple dragons won't appear regardless of what the undefined behavior will be.
First of all, let me mention that in the absence of an MMU, on a system without virtual memory, your program would have direct access to all of the memory on the system, regardless of its address. On a system like that, malloc() is merely the guy who helps you carve out pieces of memory in an ordered manner; the system can't actually force you to use only the addresses that malloc() gave you. On a system with virtual memory, the situation is slightly different... well, ok, a lot different. But within your program, any code can access any part of the virtual address space that's mapped via the MMU to real physical memory. It doesn't matter whether you got an address from malloc() or whether you called rand() and happened to get an address that falls in a mapped region of your program; if it's mapped and not marked execute-only, you can read it. And if it isn't marked read-only, you can write it as well. Yes. Even if you didn't get it from malloc().
Let's consider the possibilities for the malloc(0) undefined behavior:
malloc(0) returns NULL.
OK, this is simple enough. There really is a physical address 0x00000000 in most computers, and even a virtual address 0x00000000 in all processes, but the OS intentionally doesn't map any memory at that address so that it can trap null pointer accesses. There's a whole page (generally 4 KB) there that's simply never mapped at all, and maybe even much more than 4 KB. Therefore if you try to read or write through a null pointer, even with an offset from it, you'll hit these pages of virtual memory that aren't even mapped, and the MMU will throw an exception (a hardware exception, or interrupt) that the OS catches and turns into a SIGSEGV (on Linux/Unix) or an access violation (on Windows).
malloc(0) returns a valid address to previously unallocated memory of the smallest allocable unit.
With this, you actually get a real piece of memory that you can legally call your own, of some size you don't know. You really shouldn't write anything there (and probably not read either) because you don't know how big it is, and for that matter, you don't know if this is the particular case you're experiencing (see the following cases). If this is the case, the block of memory you were given is almost guaranteed to be at least 4 bytes and probably is 8 bytes or perhaps even larger; it all depends on whatever the size is of your implementation's minimum allocable unit.
malloc(0) intentionally returns the address of an unmapped page of memory other than NULL.
This is probably a good option for an implementation, as it would allow you or the system to track & pair together malloc() calls with their corresponding free() calls, but in essence, it's the same as returning NULL. If you try to access (read/write) via this pointer, you'll crash (SEGV or illegal access).
malloc(0) returns an address in some other mapped page of memory that may be used by "someone else".
I find it highly unlikely that a commercially available system would take this route, as it serves to simply hide bugs rather than bring them out as soon as possible. But if it did, malloc() would be returning a pointer to somewhere in memory that you do not own. If this is the case, sure, you can write to it all you want, but you'd be corrupting some other code's memory, though it would be memory in your own program's process, so you can be assured that you're at least not going to be stomping on another program's memory. (I hear someone getting ready to say, "But it's UB, so technically it could be stomping on some other program's memory." Yes, in some environments, like an embedded system, that is right. No modern commercial OS would let one process get access to another process's memory as easily as simply calling malloc(0), though; in fact, you simply can't reach another process's memory without going through the OS to do it for you.) Anyway, back to reality... This is the one where "undefined behavior" really kicks in: if you're writing to "someone else's memory" (in your own program's process), you'll be changing the behavior of your program in difficult-to-predict ways. Knowing the structure of your program and where everything is laid out in memory, it's fully predictable. But from one system to another, things will be laid out differently in memory, so the effect on one system would not necessarily be the same as the effect on another system, or on the same system at a different time.
And finally... no, that's it. There really, truly, are only those four possibilities. You could argue for special-case subsets of the last two above, but the end result will be the same.
For one thing, your compiler may be seeing these two lines back to back and optimizing them:
*ptr = 9;
printf("%i", *ptr);
With such a simplistic program, your compiler may actually be optimizing away the entire memory allocate/free cycle and using a constant instead. A compiler-optimized version of your program could end up looking more like simply:
printf("9");
The only way to tell if this is indeed what is happening is to examine the assembly that your compiler emits. If you're trying to learn how C works, I recommend explicitly disabling all compiler optimizations when you build your code.
Regarding your particular malloc usage, remember that you will get a NULL pointer back if allocation fails. Always check the return value of malloc before you use it for anything. Blindly dereferencing it is a good way to crash your program.
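A minimal sketch of the check that answer recommends, applied to the original allocation (here asking for one int rather than zero bytes):
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *ptr = malloc(sizeof *ptr);   /* ask for one int, not zero bytes */
    if (ptr == NULL) {
        fprintf(stderr, "out of memory\n");
        return 1;
    }

    *ptr = 9;
    printf("%i\n", *ptr);

    free(ptr);
    return 0;
}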
The link that Nick posted gives a good explanation about why malloc(0) may appear to work (note the significant difference between "works" and "appears to work"). To summarize the information there, malloc(0) is allowed to return either NULL or a pointer. If it returns a pointer, you are expressly forbidden from using it for anything other than passing it to free(). If you do try to use such a pointer, you are invoking undefined behavior and there's no way to tell what will happen as a result. It may appear to work for you, but in doing so you may be overwriting memory that belongs to another program and corrupting their memory space. In short: nothing good can happen, so leave that pointer alone and don't waste your time with malloc(0).
You can find the answer to why the malloc(0)/free() calls don't crash here:
zero size malloc
About the *ptr = 9: it is just like overflowing a buffer (like malloc'ing 10 bytes and accessing the 11th). You are writing to memory you don't own, and doing that is looking for trouble. In this particular implementation malloc(0) happens to return a pointer instead of NULL.
Bottom line, it is wrong even if it seems to work on a simple case.
Some memory allocators have the notion of a "minimum allocatable size". So even if you pass zero, they will return a pointer to, for example, word-sized memory. You need to check your system allocator's documentation. But even if it does return a pointer to some memory, it would be wrong to rely on that, as the pointer is only supposed to be passed to realloc() or free().

Array is larger than allocated?

I have an array that's declared as char buff[8]. That should only be 8 bytes, but looking at the assembly and testing the code, I get a segmentation fault when I input something larger than 32 characters into that buff, whereas I would expect it for anything larger than 8 characters. Why is this?
What you're saying is not a contradiction:
You have space for 8 characters.
You get an error when you input more than 32 characters.
So what?
The point is that nobody told you that you would be guaranteed to get an error if you input more than 8 characters. That's simply undefined behaviour, and anything can (and will) happen.
You absolutely mustn't think that the absence of obvious misbehaviour is proof of the correctness of your code. Code correctness can only be verified by checking the code against the rules of the language (though some automated tools such as valgrind are an immense help).
Writing beyond the end of the array is undefined behavior. Undefined behavior means nothing (including a segmentation fault) is guaranteed.
In other words, it might do anything. More practically, it's likely the write didn't touch anything protected, so from the point of view of the OS everything is still OK until you go past 32 characters.
This raises an interesting point. What is "totally wrong" from the point of view of C might be OK with the OS. The OS only cares about what pages you access:
Is the address mapped for your process?
Does your process have the rights?
You shouldn't count on the OS slapping you if anything goes wrong. A useful tool for this (slapping) is valgrind, if you are using Unix. It will warn you if your process is doing nasty things, even if those nasty things are technically OK with the OS.
C arrays have no bound checking.
As others have said, you are hitting undefined behavior; as long as you stay inside the bounds of the array, everything works fine. If you cheat, as far as the standard is concerned anything can happen, including your program seeming to work right as well as the explosion of the Sun.
What happens in practice is that with stack-allocated variables you are likely to overwrite other variables on the stack, getting "impossible" bugs, or, if you hit a canary value put by the compiler, it may detect the buffer overflow on return from the function. For variables allocated in the so-called heap, the heap allocator may have given some more room than requested, so the mistake may be less easy to spot, although you may easily mess up the internal structures of the heap.
In both cases you can also hit a protected memory page, which will result in your program being terminated forcibly (for the stack this happens less often because usually you have to overwrite the entire stack to get to a protected page).
Your declaration char buff[8] sounds like a stack-allocated variable, although it could be heap-allocated if it is part of a struct. Accessing out of bounds of an array is undefined behaviour and is known as a buffer overrun. Buffer overruns on stack-allocated memory may corrupt the current stack frame and possibly other stack frames in the call stack. With undefined behaviour, anything could happen, including no apparent error. You would not expect a seg fault immediately because the stack is typically allocated as one large region when the thread starts.
For heap allocated memory, memory managers typically allocate large blocks of memory and then sub-allocate from those larger blocks. That is why you often don't get a seg fault when you access beyond the end of a block of memory.
It is undefined behaviour to access beyond the end of a memory block. And it is perfectly valid, according to the standard, for such out of bounds accesses to result in seg faults or indeed an apparently successful read or write. I say apparently successful because if you are writing then you will quite possibly produce a heap corruption by writing out of bounds.
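Whatever the input routine in the original program was (it is not shown), a hedged sketch of reading into char buff[8] without overrunning it would look something like this: fgets() stops before the buffer is full, so long input is truncated rather than written out of bounds.
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buff[8];

    /* fgets writes at most sizeof buff - 1 characters plus '\0',
       so long input is truncated instead of overflowing the array */
    if (fgets(buff, sizeof buff, stdin) != NULL) {
        buff[strcspn(buff, "\n")] = '\0';  /* strip trailing newline, if any */
        printf("read: %s\n", buff);
    }

    return 0;
}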
Unless you are not telling us something, you answered your own question.
declaring
char buff[8];
means that the compiler grabs 8 bytes of memory. If you try to stuff 32 chars into it you may well get a seg fault; that's called a buffer overflow.
Each char is one byte (unless you are using wide characters, which take more than one), so you are trying to put 4x the number of chars that will fit in your buffer.
Is this your first time coding in C?
