Why doesn't this segfault?
#include <stdio.h>

int main()
{
    int i;
    int arr[] = {1, 2, 3, 4};
    for (i = 0; i < 8; i++)
    {
        arr[i] = i;
        printf(" %d", arr[i]);
    }
    printf("\n");
    return 0;
}
But it does when I replace 8 with 9 in the for loop.
Note: I am trying it on 32-bit CrunchBang Linux.
Technically speaking, this program results in undefined behavior, meaning that there are absolutely no guarantees whatsoever about what this program is allowed to do. It could in principle format your hard drive, email nasty messages to all of your friends, set your computer on fire, or become sentient and enslave humanity.
In this case, the undefined behavior with a loop bound of 8 happens not to do anything bad, while the undefined behavior with a bound of 9 causes a segfault. Both are totally permissible behaviors for the program, but neither is at all guaranteed to be portable.
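One way to make the overrun visible, instead of silently tolerated, is to compile with AddressSanitizer (assuming a reasonably recent gcc or clang), which instruments every array access:

gcc -fsanitize=address -g test.c && ./a.out

Built this way, the program aborts with a stack-buffer-overflow report on the first out-of-bounds write (arr[4]) rather than exhibiting whatever the undefined behavior happens to be. (The file name test.c is just a placeholder here.)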
Hope this helps!
@templatetypedef is right; he beat me to the answer. Undefined behavior doesn't necessarily mean a segfault.
But I would like to offer some speculation as to why it segfaults for n=9 but not 8. Normally, variables like int i and int arr[] reside on the stack, which grows down. So i might reside at address 0x4000, and if it's a 4-byte int, then arr[0] is at 0x4004, arr[1] is at 0x4008, and so on. In this case, the compiler will most likely have allocated 16 bytes for arr, spanning 0x4004 through 0x4013; the first byte past arr is 0x4014, i.e. the address of arr[4]. But there are typically other things on the stack besides the variables you manually declare. For instance, the arguments to the printf call are probably on the stack, and there could be other bookkeeping information. So if arr[9] coincides with a stack location that the compiler is using for something else, you will inadvertently clobber that information, which can lead to a segfault.
Alternatively, if arr[8] is at the very end of the stack memory allocated from the OS, then your OS will have configured your processor to literally refuse to execute any instruction that loads or stores a value at the address coinciding with arr[9]. I think this scenario is unlikely: the stack is mapped in page-sized (typically 4 KB) chunks and default stack limits are on the order of megabytes, so for such a short program you should be nowhere near the end of the allocated stack.
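If you are curious about the layout on your own machine, you can print the addresses involved; this is purely an illustration, since the arrangement is entirely implementation-specific and will vary with compiler and flags:

#include <stdio.h>

int main(void)
{
    int i;
    int arr[] = {1, 2, 3, 4};
    /* Where do i and the array actually live? The layout is
       entirely up to the compiler, so results will vary. */
    printf("&i      = %p\n", (void *)&i);
    for (i = 0; i < 4; i++)
        printf("&arr[%d] = %p\n", i, (void *)&arr[i]);
    return 0;
}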
templatetypedef's answer is correct; this is a case of undefined behavior (which is not as uncommon as you might think). But I would like to add something:
9*sizeof(int) is not a power of 2. When allocators hand out memory, they usually round up to power-of-two byte counts to limit fragmentation.
You might wonder why it didn't instead allocate exactly 4*sizeof(int), as that would be exactly what you asked for. I'm not sure, but it might be that the compiler allocates a couple of extra bytes, or that it has a minimum amount of memory it will allocate for arrays. It depends on the compiler and the options you compiled your code with.
Try running your code without optimizations (-O0 on the command line); it might then allocate the exact amount of memory you need and segfault for i >= 4.
I might be wrong though; I'm not sure whether C compilers place arrays like this on the heap or the stack.
Having recently switched to C, I've been told a thousand ways to Sunday that referencing a value that hasn't been initialized isn't good practice, and leads to unexpected behavior. Specifically (because my previous language initializes integers to 0), I was told that integers might not be equal to zero when uninitialized. So I decided to put that to the test.
I wrote the following piece of code to test this claim:
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#include <assert.h>

int main(){
    size_t counter = 0;
    size_t testnum = 2000; //The number of ints to allocate and test.
    for(size_t i = 0; i < testnum; i++){
        int* temp = malloc(sizeof(int));
        assert(temp != NULL); //Just in case there's no space.
        if(*temp == 0) counter++;
    }
    printf(" %zu", counter);
    return 0;
}
I compiled it like so (in case it matters):
gcc -std=c99 -pedantic name-of-file.c
Based on what my instructors had said, I expected temp to point to a random integer, and that the counter would not be incremented very often. However, my results blow this assumption out of the water:
testnum: || code returns:
2 2
20 20
200 200
2000 2000
20000 20000
200000 200000
2000000 2000000
... ...
The results go on for a couple more powers of 10 (*2), but you get the point.
I then tested a similar version of the above code, but first I allocated an integer array, set every even index to one plus its previous (uninitialized) value, freed the array, and then ran the test above on the same number of integers as the size of the array (i.e. testnum). These results are much more interesting:
testnum: || code returns:
2 2
20 20
200 175
2000 1750
20000 17500
200000 200000
2000000 2000000
... ...
Based on this, it's reasonable to conclude that C reuses freed memory (obviously), and sets some of those new integer pointers to point to addresses which contain the previously incremented integers. My question is why all of my integer pointers in the first test consistently point to 0. Shouldn't they point to whatever empty spaces on the heap my computer has offered the program, which could (and should, at some point) contain non-zero values?
In other words, why does it seem like all of the new heap space that my C program has access to has been wiped to all 0s?
As you already know, you are invoking undefined behavior, so all bets are off. To explain the particular results you are observing ("why is uninitialized memory that I haven't written to all zeros?"), you first have to understand how malloc works.
First of all, malloc does not just directly ask the system for a page whenever you call it. It has an internal "cache" from which it can hand you memory. Let's say you call malloc(16) twice. The first time you call malloc(16), it will scan the cache, see that it's empty, and request a fresh page (4KB on most systems) from the OS. It then splits this page into two chunks, gives you the smaller chunk, and saves the other chunk in its cache. The second time you call malloc(16), it will see that it has a large enough chunk in its cache, and allocate memory by splitting that chunk again.
Freeing memory simply returns it to the cache. There, it may (or may not) be merged with other chunks to form a bigger chunk, which is then used for other allocations. Depending on the details of your allocator, it may also choose to return free pages to the OS if possible.
Now the second piece of the puzzle -- any fresh pages you obtain from the OS are filled with 0s. Why? Imagine it simply handed you an unused page that was previously used by some other process that has now terminated. Now you have a security problem, because by scanning that "uninitialized memory", your process could potentially find sensitive data such as passwords and private keys that were used by the previous process. Note that there is no guarantee by the C language that this happens (it may be guaranteed by the OS, but the C specification doesn't care). It's possible that the OS filled the page with random data, or didn't clear it at all (especially common on embedded devices).
Now you should be able to explain the behavior you're observing. The first time, you are obtaining fresh pages from the OS, so they are empty (again, this is an implementation detail of your OS, not the C language). However, if you malloc, free, then malloc again, there is a chance that you are getting back the same memory that was in the cache. This cached memory is not wiped, since the only process that could have written to it was your own. Hence, you just get whatever data was previously there.
Note: this explains the behavior for your particular malloc implementation. It doesn't generalize to all malloc implementations.
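As a minimal sketch of the cache-reuse effect described above (nothing here is guaranteed; whether the second malloc returns the same address is entirely up to your allocator):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *a = malloc(sizeof(int));
    *a = 42;
    printf("first  allocation: %p\n", (void *)a);
    free(a);                      /* chunk goes back into malloc's cache */

    int *b = malloc(sizeof(int)); /* may well be the very same chunk */
    printf("second allocation: %p\n", (void *)b);
    /* On many allocators the two addresses match, and the old 42 may
       still be sitting there -- but reading *b before writing to it is
       undefined behavior regardless. */
    free(b);
    return 0;
}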
First off, you need to understand, that C is a language that is described in a standard and implemented by several compilers (gcc, clang, icc, ...). In several cases, the standard mentions that certain expressions or operations result in undefined behavior.
What is important to understand is that this means you have no guarantees on what the behavior will be. In fact any compiler/implementation is basically free to do whatever it wants!
In your example, this means you cannot make any assumptions about what the uninitialized memory will contain. So assuming it will be random, or will contain elements of a previously freed object, is just as wrong as assuming it is zero, because any of that could happen at any time.
Many compilers (or OSes) will consistently do the same thing (such as the 0s you observe), but that is also not guaranteed.
(To maybe see different behaviors, try using a different compiler or different flags.)
Undefined behavior does not mean "random behavior" nor does it mean "the program will crash." Undefined behavior means "the compiler is allowed to assume that this never happens," and "if this does happen, the program could do anything." Anything includes doing something boring and predictable.
Also, the implementation is allowed to define any instance of undefined behavior. For instance, ISO C never mentions the header unistd.h, so #include <unistd.h> has undefined behavior, but on an implementation conforming to POSIX, it has well-defined and documented behavior.
The program you wrote is probably observing uninitialized malloced memory to be zero because, nowadays, the system primitives for allocating memory (sbrk and mmap on Unix, VirtualAlloc on Windows) always zero out the memory before returning it. That's documented behavior for the primitives, but it is not documented behavior for malloc, so you can only rely on it if you call the primitives directly. (Note that only the malloc implementation is allowed to call sbrk.)
A better demonstration is something like this:
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    {
        int *x = malloc(sizeof(int));
        *x = 0xDEADBEEF;
        free(x);
    }
    {
        int *y = malloc(sizeof(int));
        printf("%08X\n", *y);
    }
    return 0;
}
which has pretty good odds of printing "DEADBEEF" (but is allowed to print 00000000, or 5E5E5E5E, or make demons fly out of your nose).
Another better demonstration would be any program that makes a control-flow decision based on the value of an uninitialized variable, e.g.
int foo(int x)
{
    int y;
    if (y == 5)
        return x;
    return 0;
}
Current versions of gcc and clang will generate code that always returns 0, but the current version of ICC will generate code that returns either 0 or the value of x, depending on whether register EDX is equal to 5 when the function is called. Both possibilities are correct, and so is generating code that always returns x, and so is generating code that makes demons fly out of your nose.
Useless deliberations, wrong assumptions, wrong test. In your test, every malloc call hands you sizeof(int) bytes of fresh memory. To see the UB you were hoping for, you should put something in that allocated memory and then free it; otherwise you never reuse it, you just leak it. Most OSes clear all the memory handed to a program before it executes, for security reasons (so when your program starts, everything is zeroed or set to its static initial values).
Change your program to:
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

int main(){
    size_t counter = 0;
    size_t testnum = 2000; //The number of ints to allocate and test.
    for(size_t i = 0; i < testnum; i++){
        int* temp = malloc(sizeof(int));
        assert(temp != NULL); //Just in case there's no space.
        if(*temp == 0) counter++;
        *temp = rand(); // scribble on the memory before returning it,
        free(temp);     // so the next malloc may hand back a dirty chunk
    }
    printf(" %zu", counter);
    return 0;
}
I am a beginner with C, and I am wondering how malloc works.
Here is some sample code I wrote while trying to understand its workings.
CODE:
#include <stdio.h>
#include <stdlib.h>
int main() {
int i;
int *array = malloc(sizeof *array);
for (i = 0; i < 5; i++) {
array[i] = i+1;
}
printf("\nArray is: \n");
for (i = 0; i < 5; i++) {
printf("%d ", array[i]);
}
free(array);
return 0;
}
OUTPUT:
Array is:
1 2 3 4 5
In the program above, I have only allocated space for 1 element, but the array now holds 5 elements. Since the program runs smoothly without any error, what is the purpose of realloc()?
Could anybody explain why?
Thanks in advance.
The fact that the program runs smoothly does not mean it is correct!
Try increasing the 5 in the for loop to some extent (500000, for instance, should suffice). At some point, it will stop working, giving you a SEGFAULT.
This is called Undefined Behaviour.
valgrind would also warn you about the issue with something like the following.
==16812== Invalid write of size 4
==16812== at 0x40065E: main (test.cpp:27)
If you don't know what valgrind is, check this out: How do I use valgrind to find memory leaks?. (BTW, it's a fantastic tool.)
This should help gives you some more clarifications: Accessing unallocated memory C++
This is typical undefined behavior (UB).
You are not allowed to code like that. As a beginner, think of it as a mistake, a fault, a sin, something very dirty, etc.
Could anybody explain why?
If you need to understand what is really happening (and the details are complex) you need to dive into your implementation details (and you don't want to). For example, on Linux, you could study the source code of your C standard library, of the kernel, of the compiler, etc. And you need to understand the machine code generated by the compiler (so with GCC compile with gcc -S -O1 -fverbose-asm to get an .s assembler file).
See also this (which has more references).
Read as soon as possible Lattner's blog on What Every C programmer should know about undefined behavior. Every one should have read it!
The worst thing about UB is that sadly, sometimes, it appears to "work" like you want it to (but in fact it does not).
So learn as quickly as possible to avoid UB systematically.
BTW, enabling all warnings in the compiler might help (but perhaps not in your particular case). Get into the habit of compiling with gcc -Wall -Wextra -g if using GCC.
Notice that your program doesn't have any arrays. The array variable is a pointer (not an array), so it is very badly named. You need to read more about pointers and C dynamic memory allocation.
int *array = malloc(sizeof *array); //WRONG
is very wrong. The name array is very poorly chosen (it is a pointer, not an array; you should spend days reading about the difference, and about what "arrays decay into pointers" means). You allocate sizeof(*array) bytes, which is exactly the same as sizeof(int) (generally 4 bytes, at least on my machine). So you allocate space for only one int element. Any access beyond that (i.e. with any positive index, however small, e.g. array[1] or array[i] with some positive i) is undefined behavior. And you don't even test for failure of malloc (which can happen).
If you want to allocate memory space for (let's say) 8 int-s, you should use:
int* ptr = malloc(sizeof(int) * 8);
and of course you should check against failure, at least:
if (!ptr) { perror("malloc"); exit(EXIT_FAILURE); };
and you need to initialize that array (the memory you've got contains unpredictable junk), e.g.
for (int i=0; i<8; i++) ptr[i] = 0;
or you could clear all bits (with the same result on all machines I know of) using
memset(ptr, 0, sizeof(int)*8);
Notice that even after such a successful malloc (or a failed one), sizeof(ptr) stays the same (on my Linux/x86-64 box, it is 8 bytes), since it is the size of the pointer itself (even if you malloc-ed a memory zone for a million int-s).
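A quick check (the exact numbers are target-specific; 8 and 4 are what my 64-bit box prints):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *ptr = malloc(1000000 * sizeof(int));
    printf("%zu\n", sizeof ptr);  /* size of the pointer itself, e.g. 8 */
    printf("%zu\n", sizeof *ptr); /* size of one int, e.g. 4 */
    free(ptr);
    return 0;
}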
In practice, when you use C dynamic memory allocation, you need to keep track, by some convention, of the allocated size behind that pointer. In the code above, I used 8 in several places, which is poor style. It would have been better to at least
#define MY_ARRAY_LENGTH 8
and use MY_ARRAY_LENGTH everywhere instead of 8, starting with
int* ptr = malloc(MY_ARRAY_LENGTH*sizeof(int));
In practice, allocated memory often has a size determined at run time, and you would keep that size somewhere (in a variable, a parameter, etc.).
Study the source code of some existing free software project (e.g. on github), you'll learn very useful things.
Read also (perhaps in a week or two) about flexible array members. Sometimes they are very useful.
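Here is a minimal sketch of a flexible array member carrying its own length; the struct and function names are made up for illustration:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example type: the length travels with the data. */
struct int_vector {
    size_t length;
    int data[];        /* flexible array member, C99 */
};

struct int_vector *make_int_vector(size_t length) {
    struct int_vector *v =
        malloc(sizeof *v + length * sizeof v->data[0]);
    if (!v) { perror("malloc"); exit(EXIT_FAILURE); }
    v->length = length;
    for (size_t i = 0; i < length; i++)
        v->data[i] = 0;
    return v;
}

int main(void) {
    struct int_vector *v = make_int_vector(8);
    printf("%zu elements, first is %d\n", v->length, v->data[0]);
    free(v);
    return 0;
}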
So as the programs runs smoothly without any error
That's just because you were lucky. Keep running this program and you might segfault soon. You were relying on undefined behaviour (UB), which is always A Bad Thing™.
What is the purpose of realloc()?
From the man pages:
void *realloc(void *ptr, size_t size);

The realloc() function changes the size of the memory block pointed to by ptr to size bytes. The contents will be unchanged in the range from the start of the region up to the minimum of the old and new sizes. If the new size is larger than the old size, the added memory will not be initialized. If ptr is NULL, then the call is equivalent to malloc(size), for all values of size; if size is equal to zero, and ptr is not NULL, then the call is equivalent to free(ptr). Unless ptr is NULL, it must have been returned by an earlier call to malloc(), calloc() or realloc(). If the area pointed to was moved, a free(ptr) is done.
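To connect this back to your program: realloc() is what you would use to grow the allocation legally once you need more room than you first asked for. A minimal sketch (using the temporary-pointer idiom so the original block isn't leaked if realloc fails):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t capacity = 1;
    int *array = malloc(capacity * sizeof *array);
    if (!array) { perror("malloc"); return EXIT_FAILURE; }

    for (size_t i = 0; i < 5; i++) {
        if (i == capacity) {          /* out of room: grow the block */
            size_t new_capacity = capacity * 2;
            int *tmp = realloc(array, new_capacity * sizeof *tmp);
            if (!tmp) { free(array); perror("realloc"); return EXIT_FAILURE; }
            array = tmp;
            capacity = new_capacity;
        }
        array[i] = (int)i + 1;
    }

    printf("\nArray is: \n");
    for (size_t i = 0; i < 5; i++)
        printf("%d ", array[i]);
    printf("\n");

    free(array);
    return 0;
}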
I want to write a C code to see the difference between static and dynamic allocation.
That's my idea but it doesn't work.
The idea: declare an array of size 10, but assign 100 elements instead of 10. Then initialize another, much larger array, hoping it will overwrite the 90 elements that are not part of array1, and finally print out the 100 elements of array1.
int i;
int array1[10];
int array2[10000];

for (i = 0; i < 100; i++)
    array1[i] = i;

for (i = 0; i < 10000; i++)
    array2[i] = i + 1;

for (i = 0; i < 100; i++)
{
    printf("%d \n", array1[i]);
}
What I hoped to see is garbage beyond the first 10 elements when using static allocation; afterwards, I'd use malloc and realloc to show the 100 elements stored correctly. But unfortunately, it seems the memory is laid out such that the rest of the 100 elements aren't overwritten!
I tried to run the code on linux and use "ulimit" to limit the memory size, but it didn't work either.
Any ideas please?
C doesn't actually do any boundary checking with regard to arrays. It depends on the OS to ensure that you are accessing valid memory.
Accessing outside the array bounds is undefined behavior. From the C99 draft standard, Annex J.2 (Undefined behavior) includes the following point:

An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).
In this example you are declaring a stack-based array. Accessing it out of bounds will read or write memory from already-allocated stack space. Here the undefined behavior is not in your favor, as there is no segfault. It is the programmer's responsibility to handle boundary conditions when writing code in C/C++.
You do get garbage after the first 10 elements of array1. All of the data after element 9 is not part of the array's allocation and can be written over at any time. When the program prints the 100 elements of array1, you might see the remnants of either for loop, because the two arrays are allocated next to each other and normally haven't been written over. If this were part of a larger program, other arrays might take up the space after these two example arrays.
When you access array1[10] and higher index values, the program will just keep writing into adjacent memory locations even though they don't "belong" to your array. At some point you might try to access a memory location that's forbidden, but as long as you're mucking with memory that the OS has given to your program, this will run. The results will be unpredictable though. It could happen that this will corrupt data that belongs to another variable in your program, for example. It could also happen that the value that you wrote there will still be there when you go back to read it if no other variable has been "properly assigned" that memory location. (This seems to be what's happening in the specific case that you posted.)
All of that being said, I'm not clear at all how this relates to potential differences between static and dynamic memory allocation since you've only done static allocation in the program and you've deliberately introduced a bug.
Changing the memory size won't resolve your problem, because when you create your two arrays, the second one should be right after the first one in memory.
Your code should do what you think it will, and on my computer, it does.
Here's my output :
0
1
2
3
4
5
6
7
8
9
10
11
1
2
3
4
5
...
What OS are you running your code on? (I'm on 64-bit Linux.)
Anyway, as everybody told you, DON'T EVER DO THIS IN A REAL PROGRAM. Writing outside an array is undefined behaviour and could cause your program to crash.
Writing out of bounds of an array will prove nothing and is not well-defined. Generally, there's nothing clever or interesting involved in invoking undefined behavior. The only thing you'll achieve by that is random crashes.
If you wish to know where a variable is allocated, you have to look at addresses. Here's one example:
#include <stdio.h>
#include <stdlib.h>

int main (void)
{
    int stack;
    static int data = 1;
    static int bss = 0;
    int* heap = malloc(sizeof(*heap));

    printf("stack: %p\n", (void*)&stack);
    printf(".data: %p\n", (void*)&data);
    printf(".bss:  %p\n", (void*)&bss);
    printf("heap:  %p\n", (void*)heap);
}
This should print 4 distinctively different addresses (.data and .bss probably close to each other though). To know exactly where a certain memory area starts, you either need to check some linker script or use a system-specific API. And once you know the memory area's offset and size, you can determine if a variable is stored within one of the different memory segments.
I'm a bit confused about malloc() function.
If sizeof(char) is 1 byte and malloc() takes the number of bytes to allocate as its argument, then if I do:
char* buffer = malloc(3);
I allocate a buffer that can store 3 characters, right?
char* s = malloc(3);
int i = 0;
while(i < 1024) { s[i] = 'b'; i++; }
s[i++] = '$';
s[i] = '\0';
printf("%s\n",s);
It works fine and stores 1024 b's in s:
bbbb[...]$
why doesn't the code above cause a buffer overflow? Can anyone explain?
malloc(size) returns a location in memory where at least size bytes are available for you to use. You are likely to be able to write to the bytes immediately after s[size], but:
Those bytes may belong to other bits of your program, which will cause problems later in the execution.
Or, the bytes might be fine for you to write to - they might belong to a page your program uses, but aren't used for anything.
Or, they might belong to the structures that malloc() has used to keep track of what your program has used. Corrupting this is very bad!
Or, they might NOT belong to your program, which will result in an immediate segmentation fault. This is likely if you access say s[size + large_number]
It's difficult to say which one of these will happen because accessing outside the space you asked malloc() for will result in undefined behaviour.
In your example, you are overflowing the buffer, but not in a way that causes an immediate crash. Keep in mind that C does no bounds checking on array/pointer accesses.
Also, malloc() creates memory on the heap, but buffer overflows are usually about memory on the stack. If you want to create one as an exercise, use
char s[3];
instead. This will create an array of 3 chars on the stack. On most systems, there won't be any free space after the array, and so the space after s[2] will belong to the stack. Writing to that space can overwrite other variables on the stack, and ultimately cause segmentation faults by (say) overwriting the current stack frame's return pointer.
One other thing:
if sizeof(char) is 1 byte
sizeof(char) is actually defined by the standard to always be 1 byte. However, the size of that 1 byte might not be 8 bits on exotic systems. Of course, most of the time you don't have to worry about this.
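If you ever need to check, <limits.h> exposes the number of bits in a byte as CHAR_BIT:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* CHAR_BIT is at least 8; on mainstream platforms it is exactly 8. */
    printf("bits per byte: %d\n", CHAR_BIT);
    return 0;
}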
It is Undefined Behavior (UB) to write beyond the bounds of allocated memory.
No diagnostic is required for UB, and any behavior at all may be encountered.
UB does not necessarily produce a segmentation fault.
In a way, you did overflow your 3-character buffer. However, you did not overflow your program's address space (yet). So you are well out of the bounds of s, but you are overwriting random other data in your program. Because your program owns this data, the program doesn't crash, but it still does very, very wrong things, and future behaviour is undefined.
In practice what this is doing is corrupting the heap. The effects may not appear immediately (in fact, that's part of what makes such errors a PITA to debug). However, you may trash anything else that happens to be in the heap, or in that part of your program's address space for that matter. It's likely that you have also trashed malloc() internal data structures, and so it's likely that subsequent malloc() or free() calls may crash your program, leading many programmers to (falsely) believe they've found a bug in malloc().
You're overflowing the buffer. Whether you get an error message depends on what memory you're overflowing into.
Did you try executing your code in release mode, or did you try to free the memory of s? It is undefined behavior.
It's a bit of a language hack, and a bit dubious in its use.
Help me understand malloc's behaviour. My code is as follows:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int *ptr = NULL;
    ptr = (int *)malloc(1);   /* deliberately only 1 byte */
    //check for malloc
    *ptr = 1000;
    printf("address of ptr is %p and value at ptr is %d\n", (void *)ptr, *ptr);
    return 0;
}
The above program works fine (runs without error)... how?? I have stored the value 1000 in only 1 byte!
Am I overwriting the next memory address in the heap?
If yes, then why is there no SIGSEGV?
Many implementations of malloc will allocate at a certain "resolution" for efficiency.
That means that, even though you asked for one byte, you may well have gotten 16 or 32.
However, it's not something you can rely on since it's undefined behaviour.
Undefined behaviour means that anything can happen, including the whole thing working despite the problematic code :-)
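If you're on glibc, you can peek at the actual size handed out with malloc_usable_size() (glibc-specific and non-portable; knowing the usable size is still not a license to rely on it):

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* glibc-specific header for malloc_usable_size */

int main(void)
{
    void *p = malloc(1);
    /* Typically prints something like 24 or 32 on glibc, even though
       only 1 byte was requested. Writing past the 1 byte you asked
       for is still undefined behaviour. */
    printf("usable size: %zu\n", malloc_usable_size(p));
    free(p);
    return 0;
}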
Using a debug heap, you would most likely get a crash or some other notification when freeing the memory (but you didn't call free).
Segmentation faults are for page-level access violations, and a memory page is usually on the order of 4K, so an overrun of 3 bytes isn't likely to be detected until some finer-grained check catches it, or some other part of your code crashes because you overwrote memory with 'garbage'.