Here is the code that I want to understand:
#include <stdio.h>
#include <stdlib.h>

#define MAX 100

int main()
{
    int *ptr = (int *)malloc(5 * sizeof(int)), i;
    for (i = 0; i < MAX; i++)
    {
        ptr[i] = i;
    }
    for (i = 0; i < MAX; i++)
    {
        printf("%d\n", ptr[i]);
    }
    return 0;
}
My question: I allocated 5 * int size of memory, so why does it take more than 5 integers?
Thanks
You reserved space for 5 integers. For the other 95 integers, you're writing into space that is reserved for other purposes. Your program may or may not crash, but you should expect that it will fail one way or another.
It doesn't "take" more than 5 integers; you are just invoking undefined behavior. You can't expect the code to "succeed" even if you are seeing it work on your implementation.
It's not 'taking' more than 5 integers: you allocated 5 * sizeof(int) bytes and invoke undefined behavior by accessing memory beyond this size.
There's no question as to whether you should set MAX to 10, 1024, or 100000: the code is fundamentally wrong, and the fact that it didn't fail when you ran it doesn't make it any less wrong. Tools like valgrind may help you detect such mistakes.
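For example (a hedged sketch; the exact report text varies by valgrind version), compile with debugging info and run the program under valgrind, which reports the out-of-bounds stores as invalid writes:
gcc -g -o program program.c
valgrind ./program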
You are allocating 5 integers; anything you write or read beyond that is incorrect.
OS protection boundaries are one page, which generally means 4 KB.
Even if you have allocated only 5 integers, you still have the rest of the page unprotected. That is how buffer overflows and many program misbehaviors happen.
I am betting that if your MAX is set to 1025, you will get a segfault (assuming this is your whole program).
C doesn't perform bounds checking on arrays. If you have a 5-element array, C will happily let you assign to arr[5], arr[100], or even arr[-1].
If you're lucky, this will merely overwrite unused memory and your program will work anyway.
If you're unlucky, you'll overwrite other variables in your program, the metadata for malloc, or the OS, and Bad Things will happen. Get used to seeing the phrase "segmentation fault".
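For comparison, here is a minimal corrected sketch of the program above (one possible fix): allocate as many ints as the loops actually touch, check that malloc succeeded, and free the memory when done.
#include <stdio.h>
#include <stdlib.h>

#define MAX 100

int main(void)
{
    int i;
    int *ptr = malloc(MAX * sizeof *ptr); /* space for MAX ints, not 5 */
    if (ptr == NULL)                      /* malloc can fail */
    {
        perror("malloc");
        return EXIT_FAILURE;
    }
    for (i = 0; i < MAX; i++)
        ptr[i] = i;
    for (i = 0; i < MAX; i++)
        printf("%d\n", ptr[i]);
    free(ptr);
    return 0;
}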
I am expecting the following snippet to allocate memory for five members using calloc.
$ cat calloc.c
// C program to demonstrate the use of calloc()
// and malloc()
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int *arr;
    arr = (int *)calloc(5, sizeof(int));
    printf("%x\n", *arr);
    printf("%x\n", *(arr+1));
    printf("%x\n", *(arr+2));
    printf("%x\n", *(arr+3));
    printf("%x\n", *(arr+4));
    printf("%x\n", *(arr+5));
    printf("%x\n", *(arr+6));
    // Deallocates memory previously allocated by calloc() function
    free(arr);
    return 0;
}
But it seems to be allocating more than five; it appears to hold six members. Why?
./a.out
0
0
0
0
0
0
411
Allocating memory isn't like buying lollipops. Allocating memory is more like buying land.
If you buy five lollipops and you try to eat the sixth one, this obviously doesn't work — it's pretty nonsensical to even talk about "trying to eat the sixth one".
But if you buy a ten foot by fifty foot plot of land, and you start putting up 10x10 foot buildings, and after building five of them (completely occupying your land), you build a sixth one encroaching over onto your neighbor's land, your neighbor might not notice right away, so you might get away with it. (For a little while, anyway. There's bound to be trouble in the end.)
Similarly, if you allocate an array of size 5 and then try to access a nonexistent 6th element, there's no law of nature that prevents it the way there was when you tried to eat the nonexistent 6th lollipop. In C you don't generally get an error message about out-of-bound array access, so you might get code that seems to work, even though it's doing something totally unacceptable, like encroaching on a neighbor's land.
First of all, let's get something clear: for all purposes, *(a+x) is the same as a[x].
C does not restrict memory access. If you do arr[1000], you may still get a value printed, or the program may crash with a segmentation fault. This is the classic case of undefined behaviour: the compiler cannot know whether the code you wrote is wrong or not, so it cannot report an error. Instead, the C standard says this is undefined behaviour. What this means is that you are accessing memory you shouldn't.
You, as the programmer, are responsible for checking that you don't go out of the bounds of the array; the compiler is not. Also, calloc initializes all elements to 0. Why do you think you got 411? Try running it again; you will probably get a different value. The memory you are accessing at arr[5] is not allocated for the array: you are going out of the bounds of the array. That memory could very well have been allocated to something else. If it had been allocated to another program, you would get a segmentation fault when you run yours.
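A minimal sketch of the same program staying within bounds (the count variable is my own addition, not from the question): keep the element count in one place and only ever index 0 through count - 1.
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t count = 5;               /* the number of elements we allocate */
    int *arr = calloc(count, sizeof *arr);
    if (arr == NULL)
        return EXIT_FAILURE;              /* calloc can fail */
    for (size_t i = 0; i < count; i++)    /* never index past count - 1 */
        printf("%x\n", (unsigned)arr[i]);
    free(arr);
    return 0;
}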
It hasn't allocated more than 5. It has allocated 5 members and initialized them to 0. When you access memory outside of the allocation, it may contain anything, not necessarily a non-zero value.
Every time malloc, calloc, or realloc is called, the memory is allocated from the heap.
Run-time / dynamic allocation -> heap
Compile-time / automatic (local) variables -> stack
// C program to demonstrate the use of calloc()
// and malloc()
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int *arr;
    // Dynamically allocate memory for 5 elements, each the size of an int,
    // and initialize all of them to 0.
    arr = (int *)calloc(5, sizeof(int));
    printf("%x\n", *arr);      // arr points to the base address of the allocation
    printf("%x\n", *(arr+1));
    printf("%x\n", *(arr+2));
    printf("%x\n", *(arr+3));
    printf("%x\n", *(arr+4));
    printf("%x\n", *(arr+5));  // out of bounds: this memory is not part of your
    printf("%x\n", *(arr+6));  // allocation and may belong to something else
    // Deallocate the memory previously allocated by calloc()
    free(arr);
    arr = NULL;  // set the pointer to NULL to avoid a dangling pointer
    return 0;
}
Having recently switched to C, I've been told a thousand ways to Sunday that referencing a value that hasn't been initialized isn't good practice and leads to unexpected behavior. Specifically (because my previous language initializes integers to 0), I was told that integers might not be equal to zero when uninitialized. So I decided to put that to the test.
I wrote the following piece of code to test this claim:
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#include <assert.h>
int main(){
    size_t counter = 0;
    size_t testnum = 2000; //The number of ints to allocate and test.
    for(size_t i = 0; i < testnum; i++){
        int* temp = malloc(sizeof(int));
        assert(temp != NULL); //Just in case there's no space.
        if(*temp == 0) counter++;
    }
    printf(" %zu", counter); // %zu is the correct conversion for size_t
    return 0;
}
I compiled it like so (in case it matters):
gcc -std=c99 -pedantic name-of-file.c
Based on what my instructors had said, I expected temp to point to a random integer, and that the counter would not be incremented very often. However, my results blow this assumption out of the water:
testnum: || code returns:
2 2
20 20
200 200
2000 2000
20000 20000
200000 200000
2000000 2000000
... ...
The results go on for a couple more powers of 10 (*2), but you get the point.
I then tested a similar version of the above code: I allocated an integer array, set every even index to one plus its previous (uninitialized) value, freed the array, and then ran the code above, testing the same number of integers as the size of the array (i.e. testnum). These results are much more interesting:
testnum: || code returns:
2 2
20 20
200 175
2000 1750
20000 17500
200000 200000
2000000 2000000
... ...
Based on this, it's reasonable to conclude that C reuses freed memory (obviously), and sets some of those new integer pointers to addresses that contain the previously incremented integers. My question is why all of my integer pointers in the first test consistently point to 0. Shouldn't they point to whatever empty spaces on the heap my computer has offered the program, which could (and at some point should) contain non-zero values?
In other words, why does it seem like all of the new heap space that my c program has access to has been wiped to all 0s?
As you already know, you are invoking undefined behavior, so all bets are off. To explain the particular results you are observing ("why is uninitialized memory that I haven't written to all zeros?"), you first have to understand how malloc works.
First of all, malloc does not just directly ask the system for a page whenever you call it. It has an internal "cache" from which it can hand you memory. Let's say you call malloc(16) twice. The first time you call malloc(16), it will scan the cache, see that it's empty, and request a fresh page (4KB on most systems) from the OS. It then splits this page into two chunks, gives you the smaller chunk, and saves the other chunk in its cache. The second time you call malloc(16), it will see that it has a large enough chunk in its cache, and allocate memory by splitting that chunk again.
Freeing memory simply returns it to the cache. There, it may (or may not) be merged with other chunks to form a bigger chunk, and is then used for other allocations. Depending on the details of your allocator, it may also choose to return free pages to the OS if possible.
Now the second piece of the puzzle -- any fresh pages you obtain from the OS are filled with 0s. Why? Imagine it simply handed you an unused page that was previously used by some other process that has now terminated. Now you have a security problem, because by scanning that "uninitialized memory", your process could potentially find sensitive data such as passwords and private keys that were used by the previous process. Note that there is no guarantee by the C language that this happens (it may be guaranteed by the OS, but the C specification doesn't care). It's possible that the OS filled the page with random data, or didn't clear it at all (especially common on embedded devices).
Now you should be able to explain the behavior you're observing. The first time, you are obtaining fresh pages from the OS, so they are empty (again, this is an implementation detail of your OS, not the C language). However, if you malloc, free, then malloc again, there is a chance that you are getting back the same memory that was in the cache. This cached memory is not wiped, since the only process that could have written to it was your own. Hence, you just get whatever data was previously there.
Note: this explains the behavior for your particular malloc implementation. It doesn't generalize to all malloc implementations.
First off, you need to understand that C is a language described in a standard and implemented by several compilers (gcc, clang, icc, ...). In several cases, the standard states that certain expressions or operations result in undefined behavior.
What is important to understand is that this means you have no guarantees on what the behavior will be. In fact any compiler/implementation is basically free to do whatever it wants!
In your example, this means you cannot make any assumptions about what the uninitialized memory will contain. Assuming it will be random, or that it will contain elements of a previously freed object, is just as wrong as assuming it is zero, because any of those could happen at any time.
Many compilers (or OSes) will consistently do the same thing (such as the 0s you observe), but that is also not guaranteed.
(To maybe see different behaviors, try using a different compiler or different flags.)
Undefined behavior does not mean "random behavior" nor does it mean "the program will crash." Undefined behavior means "the compiler is allowed to assume that this never happens," and "if this does happen, the program could do anything." Anything includes doing something boring and predictable.
Also, the implementation is allowed to define any instance of undefined behavior. For instance, ISO C never mentions the header unistd.h, so #include <unistd.h> has undefined behavior, but on an implementation conforming to POSIX, it has well-defined and documented behavior.
The program you wrote is probably observing uninitialized malloced memory to be zero because, nowadays, the system primitives for allocating memory (sbrk and mmap on Unix, VirtualAlloc on Windows) always zero out the memory before returning it. That's documented behavior for the primitives, but it is not documented behavior for malloc, so you can only rely on it if you call the primitives directly. (Note that only the malloc implementation is allowed to call sbrk.)
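As an illustration, here is a minimal sketch that requests a page directly from mmap (assuming a Unix-like system with the widespread MAP_ANONYMOUS extension; anonymous mappings are documented to be zero-filled on such systems):
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Ask the OS for one anonymous page; on Linux and the BSDs,
       MAP_ANONYMOUS memory is documented to be initialized to zero. */
    size_t len = 4096;
    int *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("%d\n", p[0]);   /* prints 0: the page came zeroed from the OS */
    munmap(p, len);
    return 0;
}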
A better demonstration is something like this:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    {
        int *x = malloc(sizeof(int));
        *x = 0xDEADBEEF;
        free(x);
    }
    {
        int *y = malloc(sizeof(int));
        printf("%08X\n", *y);
    }
    return 0;
}
which has pretty good odds of printing "DEADBEEF" (but is allowed to print 00000000, or 5E5E5E5E, or make demons fly out of your nose).
Another better demonstration would be any program that makes a control-flow decision based on the value of an uninitialized variable, e.g.
int foo(int x)
{
    int y;
    if (y == 5)
        return x;
    return 0;
}
Current versions of gcc and clang will generate code that always returns 0, but the current version of ICC will generate code that returns either 0 or the value of x, depending on whether register EDX is equal to 5 when the function is called. Both possibilities are correct, and so is generating code that always returns x, and so is generating code that makes demons fly out of your nose.
Useless deliberations, wrong assumptions, wrong test. In your test, every call to malloc hands you sizeof(int) bytes of fresh memory. To see the UB you wanted to see, you should put something in that allocated memory and then free it. Otherwise you never reuse it; you just leak it. Most OSes clear all the memory given to a program before executing it, for security reasons (so when your program starts, everything is zeroed or initialized to static values).
Change your program to:
int main(){
    size_t counter = 0;
    size_t testnum = 2000; //The number of ints to allocate and test.
    for(size_t i = 0; i < testnum; i++){
        int* temp = malloc(sizeof(int));
        assert(temp != NULL); //Just in case there's no space.
        if(*temp == 0) counter++;
        *temp = rand();   // write something into the block...
        free(temp);       // ...and free it so a later malloc can reuse it
    }
    printf(" %zu", counter); // %zu for size_t
    return 0;
}
I am a beginner with C. I am wondering how malloc works.
Here is a sample code I wrote while trying to understand its working.
CODE:
#include <stdio.h>
#include <stdlib.h>

int main() {
    int i;
    int *array = malloc(sizeof *array);
    for (i = 0; i < 5; i++) {
        array[i] = i+1;
    }
    printf("\nArray is: \n");
    for (i = 0; i < 5; i++) {
        printf("%d ", array[i]);
    }
    free(array);
    return 0;
}
OUTPUT:
Array is:
1 2 3 4 5
In the program above, I have only allocated space for 1 element, but the array now holds 5 elements. So, as the program runs smoothly without any error, what is the purpose of realloc()?
Could anybody explain why?
Thanks in advance.
The fact that the program runs smoothly does not mean it is correct!
Try increasing the 5 in the for loop (500000, for instance, should suffice). At some point, it will stop working and give you a SEGFAULT.
This is called Undefined Behaviour.
valgrind would also warn you about the issue with something like the following.
==16812== Invalid write of size 4
==16812== at 0x40065E: main (test.cpp:27)
If you don't know what valgrind is, check this out: How do I use valgrind to find memory leaks?. (BTW, it's a fantastic tool.)
This should give you some more clarification: Accessing unallocated memory C++
This is typical undefined behavior (UB).
You are not allowed to code like that. As a beginner, think of it as a mistake, a fault, a sin, something very dirty, etc.
Could anybody explain why?
If you need to understand what is really happening (and the details are complex) you need to dive into your implementation details (and you don't want to). For example, on Linux, you could study the source code of your C standard library, of the kernel, of the compiler, etc. And you need to understand the machine code generated by the compiler (so with GCC compile with gcc -S -O1 -fverbose-asm to get an .s assembler file).
See also this (which has more references).
Read Lattner's blog on What Every C Programmer Should Know About Undefined Behavior as soon as possible. Everyone should have read it!
The worst thing about UB is that sadly, sometimes, it appears to "work" like you want it to (but in fact it does not).
So learn as quickly as possible to avoid UB systematically.
BTW, enabling all warnings in the compiler might help (but perhaps not in your particular case). Get into the habit of compiling with gcc -Wall -Wextra -g if using GCC.
Notice that your program doesn't have any arrays. The array variable is a pointer (not an array), so it is very badly named. You need to read more about pointers and C dynamic memory allocation.
int *array = malloc(sizeof *array); //WRONG
is very wrong. The name array is very poorly chosen (it is a pointer, not an array; you should spend days reading about the difference, and about what "arrays decay into pointers" means). You allocate sizeof(*array) bytes, which is exactly the same as sizeof(int) (generally 4 bytes, at least on my machine). So you allocate space for only one int element. Any access beyond that (i.e. with any positive index, however small, e.g. array[1] or array[i] with some positive i) is undefined behavior. And you don't even test for failure of malloc (which can happen).
If you want to allocate memory space for (let's say) 8 int-s, you should use:
int* ptr = malloc(sizeof(int) * 8);
and of course you should check against failure, at least:
if (!ptr) { perror("malloc"); exit(EXIT_FAILURE); };
and you need to initialize that array (the memory you get contains unpredictable junk), e.g.
for (int i=0; i<8; i++) ptr[i] = 0;
or you could clear all bits (with the same result on all machines I know of) using
memset(ptr, 0, sizeof(int)*8);
Notice that even after such a successful malloc (or a failed one), sizeof(ptr) is always the same (on my Linux/x86-64 box, 8 bytes), since it is the size of a pointer (even if you malloc'd a memory zone for a million ints).
In practice, when you use C dynamic memory allocation, you need to keep track (by some convention) of the allocated size behind that pointer. In the code above, I used 8 in several places, which is poor style. It would have been better to at least
#define MY_ARRAY_LENGTH 8
and use MY_ARRAY_LENGTH everywhere instead of 8, starting with
int* ptr = malloc(MY_ARRAY_LENGTH*sizeof(int));
In practice, allocated memory often has a size that is only known at run time, and you would keep that size somewhere (in a variable, a parameter, etc.).
Study the source code of some existing free software project (e.g. on github), you'll learn very useful things.
Read also (perhaps in a week or two) about flexible array members. Sometimes they are very useful.
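For a taste, here is a minimal hedged sketch of a flexible array member (the struct and field names are my own invention): the element count and the elements live in a single allocation.
#include <stdio.h>
#include <stdlib.h>

struct int_vec {
    size_t len;   /* number of elements actually allocated */
    int data[];   /* flexible array member: storage follows the struct */
};

int main(void)
{
    size_t n = 8;
    /* one allocation holds both the header and the n elements */
    struct int_vec *v = malloc(sizeof *v + n * sizeof v->data[0]);
    if (!v) { perror("malloc"); exit(EXIT_FAILURE); }
    v->len = n;
    for (size_t i = 0; i < v->len; i++)
        v->data[i] = (int)i;
    printf("last element: %d\n", v->data[v->len - 1]);
    free(v);
    return 0;
}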
So as the program runs smoothly without any error
That's just because you were lucky. Keep running this program and you might segfault soon. You were relying on undefined behaviour (UB), which is always A Bad Thing™.
What is the purpose of realloc()?
From the man pages:
void *realloc(void *ptr, size_t size);
The realloc() function changes the size of the memory block pointed to by ptr to size bytes. The contents will be unchanged in the range from the start of the region up to the minimum of the old and new sizes. If the new size is larger than the old size, the added memory will not be initialized. If ptr is NULL, then the call is equivalent to malloc(size), for all values of size; if size is equal to zero, and ptr is not NULL, then the call is equivalent to free(ptr). Unless ptr is NULL, it must have been returned by an earlier call to malloc(), calloc() or realloc(). If the area pointed to was moved, a free(ptr) is done.
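A minimal sketch of how realloc is typically used to grow an allocation (the variable names are my own; note the temporary pointer, so the original block isn't leaked if realloc fails):
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t n = 5;
    int *arr = malloc(n * sizeof *arr);
    if (!arr) return EXIT_FAILURE;
    for (size_t i = 0; i < n; i++)
        arr[i] = (int)i;

    /* Need room for 100 elements now: grow the block instead of
       writing past the end of it. */
    int *tmp = realloc(arr, 100 * sizeof *tmp);
    if (!tmp) {          /* on failure the old block is still valid */
        free(arr);
        return EXIT_FAILURE;
    }
    arr = tmp;
    for (size_t i = n; i < 100; i++)
        arr[i] = (int)i; /* the new space is now ours to use */

    printf("%d\n", arr[99]);
    free(arr);
    return 0;
}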
I want to write a C code to see the difference between static and dynamic allocation.
That's my idea but it doesn't work.
It simply declares an array of size 10 but assigns 100 elements instead of 10. I'll then initialize another array, large enough that I hope it will overwrite the 90 elements that aren't part of array1[10]; then I print out the 100 elements of array1.
int i;
int array1[10];
int array2[10000];

for (i = 0; i < 100; i++)
    array1[i] = i;
for (i = 0; i < 10000; i++)
    array2[i] = i+1;
for (i = 0; i < 100; i++)
{
    printf("%d \n", array1[i]);
}
What I hope to get is garbage outside the first 10 elements when using static allocation; afterwards, I'll use malloc and realloc to ensure that all 100 elements are stored correctly. But unfortunately, it seems the memory region is large enough that the rest of the 100 elements don't get overwritten!
I tried to run the code on Linux and used "ulimit" to limit the memory size, but that didn't work either.
Any ideas please?
C doesn't actually do any boundary checking with regards to arrays. It depends on the OS to ensure that you are accessing valid memory.
Accessing outside the array bounds is undefined behavior. From the C99 draft standard, Annex J.2 (Undefined behavior) includes the following point:
An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).
In this example you are declaring a stack-based array. Accessing out of bounds will read or write memory in already-allocated stack space. Here, the undefined behavior happens not to be in your favor, as there is no segfault. It's the programmer's responsibility to handle boundary conditions when writing code in C/C++.
You do get garbage after the first 10 elements of array1. Everything after element 9 should not be considered allocated to the array and can be overwritten at any time. When the program prints the 100 elements of array1, you may see remnants of either for loop, because the two arrays are allocated next to each other and normally haven't been overwritten. In a larger program, other variables might occupy the space after these two example arrays.
When you access array1[10] and higher index values, the program will just keep writing into adjacent memory locations even though they don't "belong" to your array. At some point you might try to access a memory location that's forbidden, but as long as you're mucking with memory that the OS has given to your program, this will run. The results will be unpredictable though. It could happen that this will corrupt data that belongs to another variable in your program, for example. It could also happen that the value that you wrote there will still be there when you go back to read it if no other variable has been "properly assigned" that memory location. (This seems to be what's happening in the specific case that you posted.)
All of that being said, I'm not clear at all how this relates to potential differences between static and dynamic memory allocation since you've only done static allocation in the program and you've deliberately introduced a bug.
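To see the layout concretely, here is a small hedged sketch (the variable names mirror the question) that prints where the two arrays start. On a typical implementation the addresses are close together on the stack, though the language promises nothing about adjacency or order:
#include <stdio.h>

int main(void)
{
    int array1[10];
    int array2[10000];
    /* %p prints the addresses; adjacency (and even relative order)
       is an implementation detail, not a language guarantee */
    printf("array1 starts at %p\n", (void *)&array1[0]);
    printf("array2 starts at %p\n", (void *)&array2[0]);
    return 0;
}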
Changing the memory size won't resolve your problem, because when you create your two arrays, the second one should be right after the first one in memory.
Your code should do what you think it will, and on my computer, it does.
Here's my output :
0
1
2
3
4
5
6
7
8
9
10
11
1
2
3
4
5
...
What OS are you running your code on? (I'm on 64-bit Linux.)
Anyway, as everybody told you, DON'T EVER DO THIS IN A REAL PROGRAM. Writing outside an array is undefined behaviour and could cause your program to crash.
Writing out of bounds of an array will prove nothing and is not well-defined. Generally, there's nothing clever or interesting involved in invoking undefined behavior. The only thing you'll achieve by that is random crashes.
If you wish to know where a variable is allocated, you have to look at addresses. Here's one example:
#include <stdio.h>
#include <stdlib.h>

int main (void)
{
    int stack;
    static int data = 1;
    static int bss = 0;
    int* heap = malloc(sizeof(*heap));

    printf("stack: %p\n", (void*)&stack);
    printf(".data: %p\n", (void*)&data);
    printf(".bss:  %p\n", (void*)&bss);
    printf(".heap: %p\n", (void*)heap);
}
This should print 4 distinctively different addresses (.data and .bss probably close to each other though). To know exactly where a certain memory area starts, you either need to check some linker script or use a system-specific API. And once you know the memory area's offset and size, you can determine if a variable is stored within one of the different memory segments.
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *arr = (int*)malloc(10);
    int i;
    for (i = 0; i < 100; i++)
    {
        arr[i] = i;
        printf("%d", arr[i]);
    }
    return 0;
}
I am running the above program, and the call to malloc will allocate 10 bytes of memory. Since (on my system) each int variable takes up 2 bytes, I can store 5 int variables of 2 bytes each, making up the total of 10 bytes I dynamically allocated.
But in the for loop it lets me assign values all the way up to index 99, and it stores all of those values as well. So if I am storing 100 int values, that means 200 bytes of memory, whereas I allocated only 10 bytes.
So where is the flaw in this code, and how does malloc behave? If malloc's behaviour is non-deterministic in this way, how do we achieve proper dynamic memory handling?
The flaw is in your expectations. You lied to the compiler: "I only need 10 bytes" when you actually wrote 100*sizeof(int) bytes. Writing beyond an allocated area is undefined behavior, and anything may happen, ranging from nothing, to what you expected, to crashes.
If you do silly things, expect silly behaviour.
That said, malloc is usually implemented to ask the OS for chunks of memory in sizes the OS prefers (like a page) and then manage that memory itself. This speeds up future mallocs, especially if you are doing lots of small mallocs, and it reduces the number of context switches, which are quite expensive.
First of all, on most operating systems the size of int is 4 bytes. You can check that with:
printf("the size of int is %zu\n", sizeof(int));
When you call the malloc function, you allocate space on the heap. The heap is a region set aside for dynamic allocation. There's no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time. Because your program is small and nothing else collides with that region of the heap, you can run the loop with values larger than 100 and it still appears to run.
When you know what you are doing with malloc, you can build programs with proper dynamic memory handling. When your code has an improper malloc allocation, the behaviour of the program is "unknown". You can use the gdb debugger to find where a segmentation fault occurs and inspect the state of the heap.
malloc behaves exactly as it states: it allocates n bytes of memory, nothing more. Your code might run on your PC, but operating on non-allocated memory is undefined behavior.
A small note...
int might not be 2 bytes; it varies across architectures/SDKs. When you want to allocate memory for n integer elements, you should use malloc( n * sizeof( int ) ).
In short, you manage dynamic memory with the other tools that the language provides (sizeof, realloc, free, etc.).
C doesn't do any bounds-checking on array accesses; if you define an array of 10 elements, and attempt to write to a[99], the compiler won't do anything to stop you. The behavior is undefined, meaning the compiler isn't required to do anything in particular about that situation. It may "work" in the sense that it won't crash, but you've just clobbered something that may cause problems later on.
When doing a malloc, don't think in terms of bytes, think in terms of elements. If you want to allocate space for N integers, write
int *arr = malloc( N * sizeof *arr );
and let the compiler figure out the number of bytes.
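Putting that advice together, here is a minimal hedged sketch of well-behaved dynamic allocation (N is an arbitrary example size): allocate in elements rather than bytes, check for failure, stay in bounds, and free when done.
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t N = 100;                      /* number of elements, not bytes */
    int *arr = malloc(N * sizeof *arr);  /* the compiler computes the bytes */
    if (arr == NULL) {                   /* malloc can fail */
        perror("malloc");
        return EXIT_FAILURE;
    }
    for (size_t i = 0; i < N; i++)       /* stay within [0, N) */
        arr[i] = (int)i;
    printf("%d\n", arr[N - 1]);
    free(arr);                           /* release the memory when done */
    return 0;
}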