Here len is at A[10] and i is at A[11]. Is there a way to catch these errors??
I tried compiling with gcc -Wall -W but no warnings are displayed.
int main()
{
int A[10];
int i, len;
len = sizeof(A) / sizeof(0[A]);
printf("Len = %d\n",len);
for(i = 0; i < len; ++i){
A[i] = i*19%7;
}
A[i] = 5;
A[i + 1] = 6;
printf("Len = %d i = %d\n",len,i);
return 0;
}
Output :
Len = 10
Len = 5 i = 6
You are accessing memory outside the bounds of the array; in C, there is no bounds checking done on array indices.
Accessing memory beyond the end of the array technically results in undefined behavior. This means that there are no guarantees about what happens when you do it. In your example, you end up overwriting the memory occupied by another variable. However, undefined behavior can also cause your application to crash, or worse.
Is there a way to catch these errors?
The compiler can catch some errors like this, but not many. It is often impossible to catch this sort of error at compile-time and report a warning.
Static analysis tools can catch other instances of this sort of error and are usually built to report warnings about code that is likely to cause this sort of error.
C does not generally do bounds-checking, but a number of people have implemented bounds-checking for C. For instance there is a patch for GCC at http://sourceforge.net/projects/boundschecking/. Of course bounds-checking does have some overhead, but it can often be enabled and disabled on a per-file basis.
The array allocation of A is adjacent in memory to i and len. Remember that when you address via an array, it's exactly like using pointers, and you're walking off the end of the array, bumping into the other stuff you put there.
C by default does not do bounds checking. You're expected to be careful as a programmer; in exchange you get speed and size benefits.
Usually external tools, like lint, will catch the problems via static code analysis. Some libraries, depending on compiler vendor, will add additional padding or memory protection to detect when you've walked off the end.
Lots of interesting, dangerous, and non-portable things reside in memory at "random spots." Most of the house keeping for heap memory allocations occur in memory locations before the one the compiler gives you.
The general rule is that if you didn't allocate or request it, don't mess with it.
i's location in memory is just past the end of A. That's not guaranteed with every compiler and every architecture, but most probably wouldn't have a reason to do it any other way.
Note that counting from 0 to 9, you have 10 elements.
Array indexing starts from 0. Hence the size of array is equal to one less than the declared value. You are overwriting the memory beyond what is allowed.
These errors may not be reported as warnings but you can use tools like prevent, sparrow, Klockworks or purify to find such "malpractices" if i may call them that.
The short answer is that local variables are al-located on stack, and indexing is just like *(ptr + index). So it could happen that the room for int y[N] is adjacent to the room for another int x; e.g. x is located after the last y. So, y[N-1] is this last y, while y[N] is the int past the last y, and in this case, by accident, it happens you get x (or whatever in your practical example). But it is absolutely not a sure fact what you can get going past the bounds of an array and so you can't rely on that. Even though undetected, it's a "index out of bound error", and a source of bugs.
Related
Having recently switched to c, I've been told a thousand ways to Sunday that referencing a value that hasn't been initialized isn't good practice, and leads to unexpected behavior. Specifically, (because my previous language initializes integers as 0) I was told that integers might not be equal to zero when uninitialized. So I decided to put that to the test.
I wrote the following piece of code to test this claim:
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#include <assert.h>
int main(){
size_t counter = 0;
size_t testnum = 2000; //The number of ints to allocate and test.
for(int i = 0; i < testnum; i++){
int* temp = malloc(sizeof(int));
assert(temp != NULL); //Just in case there's no space.
if(*temp == 0) counter++;
}
printf(" %d",counter);
return 0;
}
I compiled it like so (in case it matters):
gcc -std=c99 -pedantic name-of-file.c
Based on what my instructors had said, I expected temp to point to a random integer, and that the counter would not be incremented very often. However, my results blow this assumption out of the water:
testnum: || code returns:
2 2
20 20
200 200
2000 2000
20000 20000
200000 200000
2000000 2000000
... ...
The results go on for a couple more powers of 10 (*2), but you get the point.
I then tested a similar version of the above code, but I initialized an integer array, set every even index to plus 1 of its previous value (which was uninitialized), freed the array, and then performed the code above, testing the same amount of integers as the size of the array (i.e. testnum). These results are much more interesting:
testnum: || code returns:
2 2
20 20
200 175
2000 1750
20000 17500
200000 200000
2000000 2000000
... ...
Based on this, it's reasonable to conclude that c reuses freed memory (obviously), and sets some of those new integer pointers to point to addresses which contain the previously incremented integers. My question is why all of my integer pointers in the first test consistently point to 0. Shouldn't they point to whatever empty spaces on the heap that my computer has offered the program, which could (and should, at some point) contain non-zero values?
In other words, why does it seem like all of the new heap space that my c program has access to has been wiped to all 0s?
As you already know, you are invoking undefined behavior, so all bets are off. To explain the particular results you are observing ("why is uninitialized memory that I haven't written to all zeros?"), you first have to understand how malloc works.
First of all, malloc does not just directly ask the system for a page whenever you call it. It has an internal "cache" from which it can hand you memory. Let's say you call malloc(16) twice. The first time you call malloc(16), it will scan the cache, see that it's empty, and request a fresh page (4KB on most systems) from the OS. It then splits this page into two chunks, gives you the smaller chunk, and saves the other chunk in its cache. The second time you call malloc(16), it will see that it has a large enough chunk in its cache, and allocate memory by splitting that chunk again.
freeing memory simply returns it to the cache. There, it may (or may not be) be merged with other chunks to form a bigger chunk, and is then used for other allocations. Depending on the details of your allocator, it may also choose to return free pages to the OS if possible.
Now the second piece of the puzzle -- any fresh pages you obtain from the OS are filled with 0s. Why? Imagine it simply handed you an unused page that was previously used by some other process that has now terminated. Now you have a security problem, because by scanning that "uninitialized memory", your process could potentially find sensitive data such as passwords and private keys that were used by the previous process. Note that there is no guarantee by the C language that this happens (it may be guaranteed by the OS, but the C specification doesn't care). It's possible that the OS filled the page with random data, or didn't clear it at all (especially common on embedded devices).
Now you should be able to explain the behavior you're observing. The first time, you are obtaining fresh pages from the OS, so they are empty (again, this is an implementation detail of your OS, not the C language). However, if you malloc, free, then malloc again, there is a chance that you are getting back the same memory that was in the cache. This cached memory is not wiped, since the only process that could have written to it was your own. Hence, you just get whatever data was previously there.
Note: this explains the behavior for your particular malloc implementation. It doesn't generalize to all malloc implementations.
First off, you need to understand, that C is a language that is described in a standard and implemented by several compilers (gcc, clang, icc, ...). In several cases, the standard mentions that certain expressions or operations result in undefined behavior.
What is important to understand is that this means you have no guarantees on what the behavior will be. In fact any compiler/implementation is basically free to do whatever it wants!
In your example, this means you cannot make any assumptions of when the uninitialized memory will contain. So assuming it will be random or contain elements of a previously freed object are just as wrong as assuming that it is zero, because any of that could happen at any time.
Many compilers (or OS's) will consistently do the same thing (such as the 0s you observer), but that is also not guaranteed.
(To maybe see different behaviors, try using a different compiler or different flags.)
Undefined behavior does not mean "random behavior" nor does it mean "the program will crash." Undefined behavior means "the compiler is allowed to assume that this never happens," and "if this does happen, the program could do anything." Anything includes doing something boring and predictable.
Also, the implementation is allowed to define any instance of undefined behavior. For instance, ISO C never mentions the header unistd.h, so #include <unistd.h> has undefined behavior, but on an implementation conforming to POSIX, it has well-defined and documented behavior.
The program you wrote is probably observing uninitialized malloced memory to be zero because, nowadays, the system primitives for allocating memory (sbrk and mmap on Unix, VirtualAlloc on Windows) always zero out the memory before returning it. That's documented behavior for the primitives, but it is not documented behavior for malloc, so you can only rely on it if you call the primitives directly. (Note that only the malloc implementation is allowed to call sbrk.)
A better demonstration is something like this:
#include <stdio.h>
#include <stdlib.h>
int
main(void)
{
{
int *x = malloc(sizeof(int));
*x = 0xDEADBEEF;
free(x);
}
{
int *y = malloc(sizeof(int));
printf("%08X\n", *y);
}
return 0;
}
which has pretty good odds of printing "DEADBEEF" (but is allowed to print 00000000, or 5E5E5E5E, or make demons fly out of your nose).
Another better demonstration would be any program that makes a control-flow decision based on the value of an uninitialized variable, e.g.
int foo(int x)
{
int y;
if (y == 5)
return x;
return 0;
}
Current versions of gcc and clang will generate code that always returns 0, but the current version of ICC will generate code that returns either 0 or the value of x, depending on whether register EDX is equal to 5 when the function is called. Both possibilities are correct, and so generating code that always returns x, and so is generating code that makes demons fly out of your nose.
useless deliberations, wrong assumptions, wrong test. In your test every time you malloc sizeof int of the fresh memory. To see the that UB you wanted to see you should put something in that allocated memory and then free it. Otherwise you do not reuse it, you just leak it. Most of the OS-es clear all the memory allocated to the program before executing it for the security reasons (so when you start the program everything was zeroed or initialised to the static values).
Change your program to:
int main(){
size_t counter = 0;
size_t testnum = 2000; //The number of ints to allocate and test.
for(int i = 0; i < testnum; i++){
int* temp = malloc(sizeof(int));
assert(temp != NULL); //Just in case there's no space.
if(*temp == 0) counter++;
*temp = rand();
free(temp);
}
printf(" %d",counter);
return 0;
}
I am a beginner with C. I am wondering, how's malloc working.
Here is a sample code, I wrote on while trying to understand it's working.
CODE:
#include<stdio.h>
#include<stdlib.h>
int main() {
int i;
int *array = malloc(sizeof *array);
for (i = 0; i < 5; i++) {
array[i] = i+1;
}
printf("\nArray is: \n");
for (i = 0; i < 5; i++) {
printf("%d ", array[i]);
}
free(array);
return 0;
}
OUTPUT:
Array is:
1 2 3 4 5
In the program above, I have only allocated space for 1 element, but the array now holds 5 elements. So as the programs runs smoothly without any error, what is the purpose of realloc().
Could anybody explain why?
Thanks in advance.
The fact that the program runs smoothly does not mean it is correct!
Try to increase the 5 in the for loop to some extent (500000, for instance, should suffices). At some point, it will stop working giving you a SEGFAULT.
This is called Undefined Behaviour.
valgrind would also warn you about the issue with something like the following.
==16812== Invalid write of size 4
==16812== at 0x40065E: main (test.cpp:27)
If you dont know what valgrind is check this out: How do I use valgrind to find memory leaks?. (BTW it's a fantastic tool)
This should help gives you some more clarifications: Accessing unallocated memory C++
This is typical undefined behavior (UB).
You are not allowed to code like that. As a beginner, think it is a mistake, a fault, a sin, something very dirty etc.
Could anybody explain why?
If you need to understand what is really happening (and the details are complex) you need to dive into your implementation details (and you don't want to). For example, on Linux, you could study the source code of your C standard library, of the kernel, of the compiler, etc. And you need to understand the machine code generated by the compiler (so with GCC compile with gcc -S -O1 -fverbose-asm to get an .s assembler file).
See also this (which has more references).
Read as soon as possible Lattner's blog on What Every C programmer should know about undefined behavior. Every one should have read it!
The worst thing about UB is that sadly, sometimes, it appears to "work" like you want it to (but in fact it does not).
So learn as quickly as possible to avoid UB systematically.
BTW, enabling all warnings in the compiler might help (but perhaps not in your particular case). Take the habit to compile with gcc -Wall -Wextra -g if using GCC.
Notice that your program don't have any arrays. The array variable is a pointer (not an array) so is very badly named. You need to read more about pointers and C dynamic memory allocation.
int *array = malloc(sizeof *array); //WRONG
is very wrong. The name array is very poorly chosen (it is a pointer, not an array; you should spend days in reading what is the difference - and what do "arrays decay into pointers" mean). You allocate for a sizeof(*array) which is exactly the same as sizeof(int) (and generally 4 bytes, at least on my machine). So you allocate space for only one int element. Any access beyond that (i.e. with any even small positive index, e.g. array[1] or array[i] with some positive i) is undefined behavior. And you don't even test against failure of malloc (which can happen).
If you want to allocate memory space for (let's say) 8 int-s, you should use:
int* ptr = malloc(sizeof(int) * 8);
and of course you should check against failure, at least:
if (!ptr) { perror("malloc"); exit(EXIT_FAILURE); };
and you need to initialize that array (the memory you've got contain unpredictable junk), e.g.
for (int i=0; i<8; i++) ptr[i] = 0;
or you could clear all bits (with the same result on all machines I know of) using
memset(ptr, 0, sizeof(int)*8);
Notice that even after a successful such malloc (or a failed one) you always have sizeof(ptr) be the same (on my Linux/x86-64 box, it is 8 bytes), since it is the size of a pointer (even if you malloc-ed a memory zone for a million int-s).
In practice, when you use C dynamic memory allocation you need to know conventionally the allocated size of that pointer. In the code above, I used 8 in several places, which is poor style. It would have been better to at least
#define MY_ARRAY_LENGTH 8
and use MY_ARRAY_LENGTH everywhere instead of 8, starting with
int* ptr = malloc(MY_ARRAY_LENGTH*sizeof(int));
In practice, allocated memory has often a runtime defined size, and you would keep somewhere (in a variable, a parameter, etc...) that size.
Study the source code of some existing free software project (e.g. on github), you'll learn very useful things.
Read also (perhaps in a week or two) about flexible array members. Sometimes they are very useful.
So as the programs runs smoothly without any error
That's just because you were lucky. Keep running this program and you might segfault soon. You were relying on undefined behaviour (UB), which is always A Bad Thing™.
What is the purpose of realloc()?
From the man pages:
void *realloc(void *ptr, size_t size);
The realloc() function changes the size of the memory block pointed to
by ptr to size bytes. The contents will be unchanged in the range
from the start of the region up to the minimum of the old and new sizes. If the new size is larger than the old size, the added
memory
will not be initialized. If ptr is NULL, then the call is equivalent to malloc(size), for all values of size; if size is equal
to zero,
and ptr is not NULL, then the call is equivalent to free(ptr). Unless ptr is NULL, it must have been returned by an
earlier call to malloc(), calloc() or realloc(). If the area pointed to was moved, a free(ptr) is done.
I want to write a C code to see the difference between static and dynamic allocation.
That's my idea but it doesn't work.
It simply initializes an array of size 10, but assigns 100 elements instead of 10. I'll then initialize another array large enough hoping to replace the 90 elements that're not part of array1[10], then I print out the 100 elements of array1.
int i;
int array1[10];
int array2[10000];
for(i=0;i<100;i++)
array1[i] = i;
for(i=0;i<10000;i++)
array2[i] = i+1;
for(i=0;i<100;i++)
{
printf("%d \n",array1[i]);
}
What I hope to get is garbage outside then first 10 elements when using static allocation, afterwards, I'll use malloc and realloc to ensure that the 100 elements would be there correctly. But unfortunately, it seems that the memory is large enough so that the rest of the 100 elements wouldn't be replaced!
I tried to run the code on linux and use "ulimit" to limit the memory size, but it didn't work either.
Any ideas please?
Cdoesn't actually do any boundary checking with regards to arrays. It depends on the OS to ensure that you are accessing valid memory.
Accessing outside the array bounds is undefined behavior, from the c99 draft standard section Annex J.2 J.2 Undefined behavior includes the follow point:
An array subscript is out of range, even if an object is apparently accessible with the
given subscript (as in the lvalue expression a[1][7] given the declaration int
a[4][5]) (6.5.6).
In this example you are declaring a stack based array. Accessing out of bound will get memory from already allocated stack space. Currently undefined behavior is not in your favor as there is no Seg fault. Its programmer's responsibility to handle boundary conditions while writing code in C/C++.
You do get garbage after the first 10 elements of array1. All of the data after element 9 should not be considered allocated by the stack and can be written over at any time. When the program prints the 100 elements of array1, you might see the remnants of either for loop because the two arrays are allocated next to each other and normally haven't been written over. If this were implemented in a larger program, other arrays might take up the space after these two example arrays.
When you access array1[10] and higher index values, the program will just keep writing into adjacent memory locations even though they don't "belong" to your array. At some point you might try to access a memory location that's forbidden, but as long as you're mucking with memory that the OS has given to your program, this will run. The results will be unpredictable though. It could happen that this will corrupt data that belongs to another variable in your program, for example. It could also happen that the value that you wrote there will still be there when you go back to read it if no other variable has been "properly assigned" that memory location. (This seems to be what's happening in the specific case that you posted.)
All of that being said, I'm not clear at all how this relates to potential differences between static and dynamic memory allocation since you've only done static allocation in the program and you've deliberately introduced a bug.
Changing the memory size won't resolve your problem, because when you create your two arrays, the second one should be right after the first one in memory.
Your code should do what you think it will, and on my computer, it does.
Here's my output :
0
1
2
3
4
5
6
7
8
9
10
11
1
2
3
4
5
...
What OS are you running your code on ? (I'm on linux 64bit).
Anyway, as everybody told you, DON'T EVER DO THIS IN A REAL PROGRAM. Writing outside an array is an undefined behaviour and could lead your program to crash.
Writing out of bounds of an array will prove nothing and is not well-defined. Generally, there's nothing clever or interesting involved in invoking undefined behavior. The only thing you'll achieve by that is random crashes.
If you wish to know where a variable is allocated, you have to look at addresses. Here's one example:
#include <stdio.h>
#include <stdlib.h>
int main (void)
{
int stack;
static int data = 1;
static int bss = 0;
int* heap = malloc(sizeof(*heap));
printf("stack: %p\n", (void*)&stack);
printf(".data: %p\n", (void*)&data);
printf(".bss: %p\n", (void*)&bss);
printf(".heap: %p\n", (void*)heap);
}
This should print 4 distinctively different addresses (.data and .bss probably close to each other though). To know exactly where a certain memory area starts, you either need to check some linker script or use a system-specific API. And once you know the memory area's offset and size, you can determine if a variable is stored within one of the different memory segments.
Can anyone tell me the effect that the below code has on memory?
My questions are:
In code 1, is the same memory location getting updated every
time in the loop?
In code 2, is new memory allocated as the variable is declared and assigned in for loop?
Code 1:
int num;long result;
for(int i=0;i<500;i++){
num = i;
result = num * num;
}
Code 2:
for(int i=0;i<500;i++){
int num = i;
long result = num * num;
}
In both cases only one num and one result instance will be created.
The only different is that on Code 2 num and result won't be accessible after the loop and the memory used to hold them can be reused for other members.
Important: Where you declare a local variable in your source code has very little impact on when the actual allocation and deallocation (for example push/pop on the stack) takes place. If at all, the variable might get optimized away entirely.
Where in your C code you allocate your local variable has most likely no impact on performance what-so-ever. Therefore, you should declare local variables where it gives the best possible readability.
In both cases, the compiler will deduce that num is completely superfluous and optimize it away. No memory will be allocated for it.
In code 2, the compiler will deduce that the local variable result isn't used anywhere in the program, and therefore it will most likely optimize away the whole of code 2 into nothing.
The machine code of code 1 will look something like this:
allocate room for i
allocate room for result
set i to 0
loop:
multiply i by i
store in result
if i is less than 500, jump to loop
Can anyone tell me the effect that the below code has on memory?
As others have mentioned, the answer to your question hugely depends on the compiler you are using.
[1] Assuming you intend to use result and num later in the loop for other computations as in:
for(int i=0; i<500; ++i){
int num = i;
long result = num * num;
// more operations using num and result, e.g. function calls, IO, etc.
}
, a modern compiler (gcc, llvm, mvc, icc) would be smart enough to optimise the two codes you provided to the same thing, thus in both codes the same "memory location" would be updated on each iteration.
I put "memory location" in quotes, as the variables can be promoted to registers, which, while strictly speaking are still memory, are a lot faster to access and thus a preferable location for frequently used variables.
The only difference between your codes is the scope of your variables, but you probably already know that.
If, conversly to [1], you don't intend to use your variables later, the compiler would probably detect that and just skip the loop, not generating any machine code for it, as it is redundant.
I need help with my C code. I have a function that sets a value to the spot in memory
to the value that you have input to the function.
The issue that I am facing is if the pointer moves past the allocated amount of memory
It should throw an error. I am not sure how to check for this issue.
unsignded char the_pool = malloc(1000);
char *num = a pointer to the start of the_pool up to ten spots
num[i] = val;
num[11] = val; //This should throw an error in my function which
So how can I check to see that I have moved into unauthorized memory space.
C will not catch this error for you. You must do it yourself.
For example, you could safely wrap access to your array in a function:
typedef struct
{
char *data;
int length;
} myArrayType;
void MakeArray( myArrayType *p, int length )
{
p->data = (char *)malloc(length);
p->length = length;
}
int WriteToArrayWithBoundsChecking( myArrayType *p, int index, char value )
{
if ( index >= 0 && index < p->length )
{
p->data[index] = value;
return 1; // return "success"
}
else
{
return 0; // return "failure"
}
}
Then you can look at the return value of WriteToArrayWithBoundsChecking() to see if your write succeeded or not.
Of course you must remember to clean up the memory pointed at by myArrayType->data when you are done. Otherwise you will cause a leak.
dont you mean?
num[11] = val
Yes there is no way to check that it is beyond bounds except doing it yourself, C provides no way to do this. Also note that arrays start at zero so num[10] is also beyond bounds.
The standard defines this as Undefined behavior.
It might work, it might not, you never know, when coding in C/C++, make sure you check for bounds before accessing your arrays
Common C compilers will not perform array bounds checking for you.
Some compilers are available that claim to support array bounds -- but their performance is usually poor enough compared to the normal compilers that they are usually not distributed far and wide.
There are even dialects of C intended to provide memory safety, but again, these usually do not get very far. (The linked Cyclone, for example, only supports 32 bit platforms, last time I looked into it.)
You may build your own datastructures to provide bounds checking if you wish. If you maintain a structure that includes a pointer to the start of your data, a data member that includes the allocated size, and functions that work on the structure, you can implement all this. But the onus is entirely on you or your environment to provide these datastructures.
I guess you could use sizeof to avoid your array access out of bound index. But c allows you to access some memory out of your array bound. That's OK for c compiler, and OS will manage the behavior when you do that.
C/C++ doesn't actually do any boundary checking with regards to arrays. It depends on the OS to ensure that you are accessing valid memory.
You could use array like this:
type name[size];
if you are using the Visual Studio 2010 ( or 2011 Beta ) it will till you after u try to free the allocated memory.
there is advanced tools to check for leaked memory.
in you example, you have actually moved to unauthorized memory space indeed. your indexes should be between 0 to ( including ) 999.