defining a simple array of integers - c

Why can i do this?
int array[4]; // i assume this creates an array with 4 indexes
array[25]=1; // i have assigned an index larger than the int declaration
NSLog(#"result: %i", array[25]); // this prints "1" to the screen
Why does this work, if the index exceeds the declaration? what is the significance of the number in the declaration if it has no effect on what you can actually do with the array?
Thanks.

You are getting undefined behavior. It could print anything, it could crash, it could burst into singing (okay that isn't likely but you get the idea).
If it happens to write to a location that is mapped with the adequate permissions it will work. Until one day when it won't because of a different layout.

it is undefined. some OS will give you segmentation fault, while some tolerate this. anyhow, exceeding the array's size should be avoided.

An array is really just a pointer to the start of a contiguous, allocated block of memory.
In this case, you have allocated 4 ints worth of memory.
So if you went array[2] it would think "the memory at array + sizeof(int) * 2"
Change the 2 to 25, and you're just looking at 25 int's worth of memory past the start. Since there's no checks to verify you're in bounds (either when assigning or printing) it works.
There's nothing to say somehting else might be allocated there though!

The number in the decleration determines how many memory should be reserved, in that case 4 * sizeof(int).
If you write to memory out of the bounds, this is possible but not recommended. In C you can access any point in memory available to your program, but if you write to that if it's not reserved for that thing, you can cause memory corruption. Arrays are pointers (but not the other way around).
The behavior depends on the compiler, the platform and some randomness. Don't do it.

It's doing very bad things. If the array is declared locally to a function, you're probably writing on stack locations above the current function's stack frame. If it is declared globally, you're probably writing on memory allocated to adjacent variables. Either way, it is "working" by pure luck.

It is possible your C compiler had padded your array for memory alignment purposes, and by luck your array overrun just happens to still be within the rounded-up allocation. Don't rely on it though.

This is unsafe programming. It really should be avoided because it may not crash your program. Which is really the best thing you could hope for. It could give you garbage results. These are unpredictable and could really screw up your program. However since you don't know that is wrong because it not crashing it will ruin the integrity of your data. Since there is no try/catch with C you really should check inputs. Remember scanf returns an int.

C by design does not perform array bounds checking. It was designed as a systems level language, and the overhead of explicit run-time bounds checking could be prohibitive in a great many cases. Consequently C will allow 'dangerous' code and must be used carefully. If you need something 'safer' then C# or Java may be more appropriate, but there is a definite performance hit.
Automated static analysis tools may help, and there are run-time bounds checking tools for use in development and debugging.
In C an array is a contiguous block of memory. By accessing the array out-of-bounds, you are simply accessing memory beyond the end of the array. What accessing such memory will do is non-deterministic, it may be junk, it may belong to an adjacent variable, or it may belong to the variables of the calling function or above. It maybe a return address for the current function or a calling function. In a memory protected OS such as Windows or Linux, if you access so far out of bounds as to be no longer within the address range assigned to the process, a fault exception will occur.

Related

Accessing an array element that does not exist in C [duplicate]

Why does C/C++ differentiates in case of array index out of bound
#include <stdio.h>
int main()
{
int a[10];
a[3]=4;
a[11]=3;//does not give segmentation fault
a[25]=4;//does not give segmentation fault
a[20000]=3; //gives segmentation fault
return 0;
}
I understand that it's trying to access memory allocated to process or thread in case of a[11] or a[25] and it's going out of stack bounds in case of a[20000].
Why doesn't compiler or linker give an error, aren't they aware of the array size? If not then how does sizeof(a) work correctly?
The problem is that C/C++ doesn't actually do any boundary checking with regards to arrays. It depends on the OS to ensure that you are accessing valid memory.
In this particular case, you are declaring a stack based array. Depending upon the particular implementation, accessing outside the bounds of the array will simply access another part of the already allocated stack space (most OS's and threads reserve a certain portion of memory for stack). As long as you just happen to be playing around in the pre-allocated stack space, everything will not crash (note i did not say work).
What's happening on the last line is that you have now accessed beyond the part of memory that is allocated for the stack. As a result you are indexing into a part of memory that is not allocated to your process or is allocated in a read only fashion. The OS sees this and sends a seg fault to the process.
This is one of the reasons that C/C++ is so dangerous when it comes to boundary checking.
The segfault is not an intended action of your C program that would tell you that an index is out of bounds. Rather, it is an unintended consequence of undefined behavior.
In C and C++, if you declare an array like
type name[size];
You are only allowed to access elements with indexes from 0 up to size-1. Anything outside of that range causes undefined behavior. If the index was near the range, most probably you read your own program's memory. If the index was largely out of range, most probably your program will be killed by the operating system. But you can't know, anything can happen.
Why does C allow that? Well, the basic gist of C and C++ is to not provide features if they cost performance. C and C++ has been used for ages for highly performance critical systems. C has been used as a implementation language for kernels and programs where access out of array bounds can be useful to get fast access to objects that lie adjacent in memory. Having the compiler forbid this would be for naught.
Why doesn't it warn about that? Well, you can put warning levels high and hope for the compiler's mercy. This is called quality of implementation (QoI). If some compiler uses open behavior (like, undefined behavior) to do something good, it has a good quality of implementation in that regard.
[js#HOST2 cpp]$ gcc -Wall -O2 main.c
main.c: In function 'main':
main.c:3: warning: array subscript is above array bounds
[js#HOST2 cpp]$
If it instead would format your hard disk upon seeing the array accessed out of bounds - which would be legal for it - the quality of implementation would be rather bad. I enjoyed to read about that stuff in the ANSI C Rationale document.
You generally only get a segmentation fault if you try to access memory your process doesn't own.
What you're seeing in the case of a[11] (and a[10] by the way) is memory that your process does own but doesn't belong to the a[] array. a[25000] is so far from a[], it's probably outside your memory altogether.
Changing a[11] is far more insidious as it silently affects a different variable (or the stack frame which may cause a different segmentation fault when your function returns).
C isn't doing this. The OS's virtual memeory subsystem is.
In the case where you are only slightly out-of-bound you are addressing memeory that is allocated for your program (on the stack call stack in this case). In the case where you are far out-of-bounds you are addressing memory not given over to your program and the OS is throwing a segmentation fault.
On some systems there is also a OS enforced concept of "writeable" memory, and you might be trying to write to memeory that you own but is marked unwriteable.
Just to add what other people are saying, you cannot rely on the program simply crashing in these cases, there is no gurantee of what will happen if you attempt to access a memory location beyond the "bounds of the array." It's just the same as if you did something like:
int *p;
p = 135;
*p = 14;
That is just random; this might work. It might not. Don't do it. Code to prevent these sorts of problems.
As litb mentioned, some compilers can detect some out-of-bounds array accesses at compile time. But bounds checking at compile time won't catch everything:
int a[10];
int i = some_complicated_function();
printf("%d\n", a[i]);
To detect this, runtime checks would have to be used, and they're avoided in C because of their performance impact. Even with knowledge of a's array size at compile time, i.e. sizeof(a), it can't protect against that without inserting a runtime check.
As I understand the question and comments, you understand why bad things can happen when you access memory out of bounds, but you're wondering why your particular compiler didn't warn you.
Compilers are allowed to warn you, and many do at the highest warning levels. However the standard is written to allow people to run compilers for all sorts of devices, and compilers with all sorts of features so the standard requires the least it can while guaranteeing people can do useful work.
There are a few times the standard requires that a certain coding style will generate a diagnostic. There are several other times where the standard does not require a diagnostic. Even when a diagnostic is required I'm not aware of any place where the standard says what the exact wording should be.
But you're not completely out in the cold here. If your compiler doesn't warn you, Lint may. Additionally, there are a number of tools to detect such problems (at run time) for arrays on the heap, one of the more famous being Electric Fence (or DUMA). But even Electric Fence doesn't guarantee it will catch all overrun errors.
That's not a C issue its an operating system issue. You're program has been granted a certain memory space and anything you do inside of that is fine. The segmentation fault only happens when you access memory outside of your process space.
Not all operating systems have seperate address spaces for each proces, in which case you can corrupt the state of another process or of the operating system with no warning.
C philosophy is always trust the programmer. And also not checking bounds allows the program to run faster.
As JaredPar said, C/C++ doesn't always perform range checking. If your program accesses a memory location outside your allocated array, your program may crash, or it may not because it is accessing some other variable on the stack.
To answer your question about sizeof operator in C:
You can reliably use sizeof(array)/size(array[0]) to determine array size, but using it doesn't mean the compiler will perform any range checking.
My research showed that C/C++ developers believe that you shouldn't pay for something you don't use, and they trust the programmers to know what they are doing. (see accepted answer to this: Accessing an array out of bounds gives no error, why?)
If you can use C++ instead of C, maybe use vector? You can use vector[] when you need the performance (but no range checking) or, more preferably, use vector.at() (which has range checking at the cost of performance). Note that vector doesn't automatically increase capacity if it is full: to be safe, use push_back(), which automatically increases capacity if necessary.
More information on vector: http://www.cplusplus.com/reference/vector/vector/

What happens if I set a value outside of the memory allocated with calloc?

Consider the following:
int* x = calloc(3,sizeof(int));
x[3] = 100;
which is located inside of a function.
I get no error when I compile and run the program, but when I run it with valgrind I get an "Invalid write of size 4".
I understand that I am accessing a memory place outside of what I have allocated with calloc, but I'm trying to understand what actually happens.
Does some address in the stack(?) still have the value 100? Because there must certainly be more available memory than what I have allocated with calloc. Is the valgrind error more of a "Hey, you probably did not mean to do that"?
I understand that I am accessing a memory place outside of what I have allocated with calloc, but I'm trying to understand what actually happens.
"What actually happens" is not well-defined; it depends entirely on what gets overwritten. As long as you don't overwrite anything important, your code will appear to run as expected.
You could wind up corrupting other data that was allocated dynamically. You could wind up corrupting some bit of heap bookkeeping.
The language does not enforce any kind of bounds-checking on array accesses, so if you read or write past the end of the array, there are no guarantees on what will happen.
Does some address in the stack(?) still have the value 100?
First of all, calloc allocates memory on the heap not stack.
Now, regarding the error.
Sure most of the time there is plenty of memory available when your program is running. However when you allocate memory for x bytes, the memory manager looks for some free chunk of memory of that exact size(+ maybe some more if calloc requested larger memory to store some auxiliary info), there are no guaranties on what the bytes after that chunk are used for, and even no guaranties that they are not read-only or can be accessed by your program.
So anything can happen. In the case if the memory was just there waiting for it to be used by your program, nothing horrible happens, but if that memory was used by something else in your program, the values would be mess up, or worst of all the program could crash because of accessing something that wasn't supposed to be accessed.
So the valgrind error should be treated very seriously.
The C language doesn't require bounds checking on array accesses, and most C compilers don't implement it. Besides if you used some variable size instead of constant value 3, the array size could be unknown during compilation, and there would be no way to check if the access isn't out of bound.
There's no guarantees on what was allocated in the space past x[3] or what will be written there in the future. alinsoar mentioned that x[3] itself does not cause undefined behavior, but you should not attempt to fetch or store a value from there. Often you will probably be able to write and access this memory location without problems, but writing code that relies on reaching outside of your allocated arrays is setting yourself up for very hard to find errors in the future.
Does some address in the stack(?) still have the value 100?
When using calloc or malloc, the values of the array are not actually on the stack. These calls are used for dynamic memory allocation, meaning they are allocated in a seperate area of memory known as the "Heap". This allows you to access these arrays from different parts of the stack as long as you have a pointer to them. If the array were on the stack, writing past the bounds would risk overwriting other information contained in your function (like in the worst case the return location).
The act of doing that is what is called undefined behavior.
Literally anything can happen, or nothing at all.
I give you extra points for testing with Valgrind.
In practice, it is likely you will find the value 100 in the memory space after your array.
Beware of nasal demons.
You're allocating memory for 3 integer elements but accessing the 4th element (x[3]). Hence, the warning message from valgrind. Compiler will not complain about it.

C Array Segmentation Faults only after a certain threshold

I'm looking at this simple program. My understanding is that trying to modify the values at memory addresses past the maximum index should result in a segmentation fault. However, the following code runs without any problems. I am even able to print out all 6 of the array indexes 0->6
main()
{
int i;
int a[3];
for(i=0; i<6; i++)
a[i] = i;
}
However, when I change the for loop to
for(i=0; i<7; i++)
and executing the program, it will segfault.
This almost seems to me like it is some kind of extra padding done by malloc. Why does this happen only after the 6th index (s+6)? Will this behavior happen with longer/shorter arrays? Excuse my foolishness as lowly java programmer :)
This almost seems to me like it is some kind of extra padding done by malloc.
You did not call malloc(). You declared a as an array of 3 integers in stack memory, whereas malloc() uses heap memory. In both cases, accessing any element past the last one (the third one, a[2], in this case) is undefined behaviour. When it's done with heap memory, it usually causes a segmentation fault.
Well, malloc didn't do it, because you didn't call malloc. My guess is, the extra three writes were enough to chew through your frame pointer and your stack pointer, but not through your return address (but the seventh one hit that). Your program is not guaranteed to crash when you access out-of-bounds memory (though there are other languages which do guarantee it), any more than it is guaranteed not to crash. That's what undefined behavior is: unpredictable.
As others said, accessing an array beyond its limits is undefined behaviour. So what happens? If you access memory that you should not access, it depends on where that memory is.
Generally (but very generally, specifics can vary between systems and compilers), the following main things can happen:
It can be that you simply access other variables of your process that lie directly "behind" the array. If you write to that memory, you simply modify the values of the other variables. You will probably not get a segfault, so you may never notice why your program produces bad results or acts so weird or why, during debugging, your variables have values you never (knowlingly) assigned to them. This is, IMO, really bad, because you think everything is fine while it isn't.
It can be, especially on a stack, that you access other data on the stack, like saved processor registers or even the address to which the processor should return if the function ends. If you overwrite these with some other data, it is hard to tell what happens. In that case, a segfault is probably the lesser of all possible evils. You simply don't know what can happen.
If the memory beyond your array does not belong to your process then, on most modern computers, you will get a segfault or similar exception (e.g. an access violation, or whatever your OS calls it).
I may have forgotten a few more possible problems that can occur, but those are, IMO, the most usual things that happen if you write beyond array bounds.
It depends on the free memory available, if free memory available is less then it will give segmentation fault otherwise it will use the extra memory to store the data and it will not be giving segmentation fault.There is no need for malloc because array itself allocates memory.
In your system memory is available only for 6 integers and when you are trying to access to next memory(which is not accessible or say not free)it is giving segmentation fault.

C program help: Insufficient memory allocation but still works...why? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
behaviour of malloc(0)
I'm trying to understand memory allocation in C. So I am experimenting with malloc. I allotted 0 bytes for this pointer but yet it can still hold an integer. As a matter of fact, no matter what number I put into the parameter of malloc, it can still hold any number I give it. Why is this?
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *ptr = (int*)malloc(0);
*ptr = 9;
printf("%i", *ptr); // 9
free(ptr);
return 0;
}
It still prints 9, what's up with that?
If size is 0, then malloc() returns either NULL, or a unique pointer
value that can later be successfully passed to free().
I guess you are hitting the 2nd case.
Anyway that pointer just by mistake happens to be in an area where you can write without generating segmentation fault, but you are probably writing in the space of some other variable messing up its value.
A lot of good answers here. But it is definitely undefined behavior. Some people declare that undefined behavior means that purple dragons may fly out of your computer or something like that... there's probably some history behind that outrageous claim that I'm missing, but I promise you that purple dragons won't appear regardless of what the undefined behavior will be.
First of all, let me mention that in the absence of an MMU, on a system without virtual memory, your program would have direct access to all of the memory on the system, regardless of its address. On a system like that, malloc() is merely the guy who helps you carve out pieces of memory in an ordered manner; the system can't actually enforce you to use only the addresses that malloc() gave you. On a system with virtual memory, the situation is slightly different... well, ok, a lot different. But within your program, any code in your program can access any part of the virtual address space that's mapped via the MMU to real physical memory. It doesn't matter whether you got an address from malloc() or whether you called rand() and happened to get an address that falls in a mapped region of your program; if it's mapped and not marked execute-only, you can read it. And if it isn't marked read-only, you can write it as well. Yes. Even if you didn't get it from malloc().
Let's consider the possibilities for the malloc(0) undefined behavior:
malloc(0) returns NULL.
OK, this is simple enough. There really is a physical address 0x00000000 in most computers, and even a virtual address 0x00000000 in all processes, but the OS intentionally doesn't map any memory to that address so that it can trap null pointer accesses. There's a whole page (generally 4KB) there that's just never mapped at all, and maybe even much more than 4KB. Therefore if you try to read or write through a null pointer, even with an offset from it, you'll hit these pages of virtual memory that aren't even mapped, and the MMU will throw an exception (a hardware exception, or interrupt) that the OS catches, and it declares a SIGSEGV (on Linux/Unix), or an illegal access (on Windows).
malloc(0) returns a valid address to previously unallocated memory of the smallest allocable unit.
With this, you actually get a real piece of memory that you can legally call your own, of some size you don't know. You really shouldn't write anything there (and probably not read either) because you don't know how big it is, and for that matter, you don't know if this is the particular case you're experiencing (see the following cases). If this is the case, the block of memory you were given is almost guaranteed to be at least 4 bytes and probably is 8 bytes or perhaps even larger; it all depends on whatever the size is of your implementation's minimum allocable unit.
malloc(0) intentionally returns the address of an unmapped page of
memory other than NULL.
This is probably a good option for an implementation, as it would allow you or the system to track & pair together malloc() calls with their corresponding free() calls, but in essence, it's the same as returning NULL. If you try to access (read/write) via this pointer, you'll crash (SEGV or illegal access).
malloc(0) returns an address in some other mapped page of memory
that may be used by "someone else".
I find it highly unlikely that a commercially-available system would take this route, as it serves to simply hide bugs rather than bring them out as soon as possible. But if it did, malloc() would be returning a pointer to somewhere in memory that you do not own. If this is the case, sure, you can write to it all you want, but you'd be corrupting some other code's memory, though it would be memory in your program's process, so you can be assured that you're at least not going to be stomping on another program's memory. (I hear someone getting ready to say, "But it's UB, so technically it could be stomping on some other program's memory. Yes, in some environments, like an embedded system, that is right. No modern commercial OS would let one process have access to another process's memory as easily as simply calling malloc(0) though; in fact, you simply can't get from one process to another process's memory without going through the OS to do it for you.) Anyway, back to reality... This is the one where "undefined behavior" really kicks in: If you're writing to "someone else's memory" (in your own program's process), you'll be changing the behavior of your program in difficult-to-predict ways. Knowing the structure of your program and where everything is laid out in memory, it's fully predictable. But from one system to another, things would be laid out in memory (appearing a different locations in memory), so the effect on one system would not necessarily be the same as the effect on another system, or on the same system at a different time.
And finally.... No, that's it. There really, truly, are only those four
possibilities. You could argue for special-case subset points for
the last two of the above, but the end result will be the same.
For one thing, your compiler may be seeing these two lines back to back and optimizing them:
*ptr = 9;
printf("%i", *ptr);
With such a simplistic program, your compiler may actually be optimizing away the entire memory allocate/free cycle and using a constant instead. A compiler-optimized version of your program could end up looking more like simply:
printf("9");
The only way to tell if this is indeed what is happening is to examine the assembly that your compiler emits. If you're trying to learn how C works, I recommend explicitly disabling all compiler optimizations when you build your code.
Regarding your particular malloc usage, remember that you will get a NULL pointer back if allocation fails. Always check the return value of malloc before you use it for anything. Blindly dereferencing it is a good way to crash your program.
The link that Nick posted gives a good explanation about why malloc(0) may appear to work (note the significant difference between "works" and "appears to work"). To summarize the information there, malloc(0) is allowed to return either NULL or a pointer. If it returns a pointer, you are expressly forbidden from using it for anything other than passing it to free(). If you do try to use such a pointer, you are invoking undefined behavior and there's no way to tell what will happen as a result. It may appear to work for you, but in doing so you may be overwriting memory that belongs to another program and corrupting their memory space. In short: nothing good can happen, so leave that pointer alone and don't waste your time with malloc(0).
The answer to the malloc(0)/free() calls not crashing you can find here:
zero size malloc
About the *ptr = 9, is just like overflowing a buffer (like malloc'ing 10 bytes and access the 11th), you are writing to memory you don't own, and doing that is looking for trouble. In this particular implementation malloc(0) happens to return a pointer instead of NULL.
Bottom line, it is wrong even if it seems to work on a simple case.
Some memory allocators have the notion of "minimum allocatable size". So, even if you pass zero, this will return pointer to the memory of word-size, for example. You need to check up with your system allocator documentation. But if it does return pointer to some memory it'd be wrong to rely on it as the pointer is only supposed to be passed either to be passed realloc() or free().

Using Malloc to allocate array size in C

In a program I'm writing, I have an array of accounts(account is a struct I made). I need this visible to all functions and threads in my program. However, I won't know the size it has to be until the main function figures that out. so I created it with:
account *accounts;
and try to allocate space to it in main with this:
number of accounts = 100 //for example
accounts = (account*)malloc(numberOfAccounts * sizeof (account));
However, it appears to be sizing the array larger than it needs to be. For example, accounts[150] exists, and so on.
Is there anything I am doing wrong? How can I get the size of accounts to be exactly 100?
Thanks
You can't do that - malloc() doesn't provide any guarantees about how much memory it actually allocates (except that if it succeeds it will return a pointer to at least as much as you requested). If you access anything outside the range you asked for, it causes undefined behaviour. That means it might appear to work, but there's nothing you can do about that.
BTW, in C you don't need to typecast the returned value from malloc().
Even though it may look like it, accounts[150] does not truly exist.
So why does your program continue to run? Well, that's because even though accounts[150] isn't a real element, it lies within the memory space your program is allowed to access.
C contains no runtime checking of indexes - it just calculates the appropriate address and accesses that. If your program doesn't have access to that memory address, it'll crash with a segmentation fault (or, in Windows terms, an access violation). If, on the other hand, the program is allowed to access that memory address, then it'll simply treat whatever is at that address as an account.
If you try to modify that, almost anything can happen - depending on a wide variety of factors, it could modify some other variables in your program, or given some very unlucky circumstances, it could even modify the program code itself, which could lead to all kinds of funky behavior (including a crash). It is even possible that no side effects can ever be observed if malloc (for whatever reason) allocated more memory than you explicitly requested (which is possible).
If you want to make sure that such errors are caught at runtime, you'll have to implement your own checking and error handling.
I can't seem to find anything wrong with what you provide. If you have a struct, e.g.:
struct account{
int a,b,c,d;
float e,f,g,h;
}
Then you can indeed create an array of accounts using:
struct account *accounts = (struct account *) malloc(numAccounts * sizeof(account));
Note that for C the casting of void* (retun type of malloc) is not necessary. It will get upcasted automatically.
[edit]
Ahhh! I see your problem now! Right. Yes you can still access accounts[150], but basically what happens is that accounts will point to some memory location. accounts[150] simply points 150 times the size of the struct further. You can get the same result by doing this:
*(accounts + 150), which basically says: Give me the value at location accounts+150.
This memory is simply not reserved, and therefore causes undefined behavior. It basically comes down to: Don't do this!
Your code is fine. When you say accounts[150] exits do you mean exits or exists?
If your code is crashing when accessing accounts[150] (assuming numberOfAccounts = 100) then this is to be expected you are accessing memory outside that you allocated.
If you meant exists it doesn't really, you are just walking off the end of the array and the pointer you get back is to a different area of memory than you allocated.
Size of accounts is exacly for 100 structures from malloc result pointer starts if this address is non-zero.
Just because it works doesn't mean it exists as part of the memory you allocated, most likely it belongs to someone else.
C doesn't care or know that your account* came from malloc, all it knows is that is a memory pointer to something that is sizeof(account).
accounts[150] accesses the 150th account-sized object from the value in the pointer, which may be random data, may be something else, depending on your system it may even be your program.
The reason things seem to "work" is that whatever is there happens to be unimportant, but that might not always be the case.

Resources