Using malloc to allocate an array in C

In a program I'm writing, I have an array of accounts (account is a struct I made). I need it visible to all functions and threads in my program, but I won't know how big it has to be until the main function figures that out. So I declared it with:
account *accounts;
and try to allocate space to it in main with this:
numberOfAccounts = 100; // for example
accounts = (account*)malloc(numberOfAccounts * sizeof (account));
However, it appears to be sizing the array larger than it needs to be. For example, accounts[150] exists, and so on.
Is there anything I am doing wrong? How can I get the size of accounts to be exactly 100?
Thanks

You can't do that - malloc() doesn't provide any guarantees about how much memory it actually allocates (except that if it succeeds, it will return a pointer to at least as much as you requested). If you access anything outside the range you asked for, it causes undefined behaviour. That means it might appear to work, but you can't rely on it.
BTW, in C you don't need to typecast the returned value from malloc().

Even though it may look like it, accounts[150] does not truly exist.
So why does your program continue to run? Well, that's because even though accounts[150] isn't a real element, it lies within the memory space your program is allowed to access.
C contains no runtime checking of indexes - it just calculates the appropriate address and accesses that. If your program doesn't have access to that memory address, it'll crash with a segmentation fault (or, in Windows terms, an access violation). If, on the other hand, the program is allowed to access that memory address, then it'll simply treat whatever is at that address as an account.
If you try to modify that, almost anything can happen - depending on a wide variety of factors, it could modify some other variables in your program, or given some very unlucky circumstances, it could even modify the program code itself, which could lead to all kinds of funky behavior (including a crash). It is even possible that no side effects can ever be observed, for example if malloc (for whatever reason) happened to allocate more memory than you explicitly requested.
If you want to make sure that such errors are caught at runtime, you'll have to implement your own checking and error handling.
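For example, a minimal sketch of such a check might look like this (the account typedef below is just a stand-in for whatever your struct actually contains):

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { int id; double balance; } account;   /* stand-in for your struct */

    /* Hypothetical helper: refuses to hand back a pointer for an index
       outside the block that was actually allocated. */
    account *get_account(account *accounts, size_t numberOfAccounts, size_t i)
    {
        if (i >= numberOfAccounts) {
            fprintf(stderr, "index %zu out of range (only %zu accounts)\n",
                    i, numberOfAccounts);
            return NULL;
        }
        return &accounts[i];
    }

    int main(void)
    {
        size_t numberOfAccounts = 100;
        account *accounts = malloc(numberOfAccounts * sizeof *accounts);
        if (accounts == NULL)
            return 1;

        if (get_account(accounts, numberOfAccounts, 150) == NULL)
            puts("accounts[150] rejected, as it should be");

        free(accounts);
        return 0;
    }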

I can't seem to find anything wrong with what you provide. If you have a struct, e.g.:
struct account {
    int a, b, c, d;
    float e, f, g, h;
};
Then you can indeed create an array of accounts using:
struct account *accounts = (struct account *) malloc(numAccounts * sizeof(struct account));
Note that in C the cast from void * (the return type of malloc) is not necessary; the conversion happens automatically.
[edit]
Ahhh! I see your problem now! Right. Yes you can still access accounts[150], but basically what happens is that accounts will point to some memory location. accounts[150] simply points 150 times the size of the struct further. You can get the same result by doing this:
*(accounts + 150), which basically says: Give me the value at location accounts+150.
This memory is simply not reserved, and therefore causes undefined behavior. It basically comes down to: Don't do this!
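You can see the arithmetic without touching the unreserved memory at all, just by printing the byte offsets involved (a small sketch; the struct members are stand-ins):

    #include <stdio.h>

    typedef struct { int id; double balance; } account;   /* stand-in for your struct */

    int main(void)
    {
        /* accounts[150] means "start of the block plus 150 struct-sizes";
           the byte offset shows how far past your 100 elements that is. */
        printf("sizeof(account)            = %zu bytes\n", sizeof(account));
        printf("offset of element 150      = %zu bytes\n", 150 * sizeof(account));
        printf("end of a 100-element block = %zu bytes\n", 100 * sizeof(account));
        return 0;
    }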

Your code is fine. When you say accounts[150] exits do you mean exits or exists?
If your code is crashing when accessing accounts[150] (assuming numberOfAccounts = 100) then this is to be expected you are accessing memory outside that you allocated.
If you meant exists it doesn't really, you are just walking off the end of the array and the pointer you get back is to a different area of memory than you allocated.

The size of accounts is exactly 100 structures, starting at the pointer malloc returns, provided that pointer is non-null.

Just because it works doesn't mean it exists as part of the memory you allocated, most likely it belongs to someone else.
C doesn't care or know that your account* came from malloc; all it knows is that it is a pointer to something that is sizeof(account) in size.
accounts[150] accesses the 150th account-sized object past the address in the pointer; that may be random data, it may be something else, and depending on your system it may even be your program itself.
The reason things seem to "work" is that whatever is there happens to be unimportant, but that might not always be the case.

Related

What happens if I set a value outside of the memory allocated with calloc?

Consider the following:
int* x = calloc(3,sizeof(int));
x[3] = 100;
which is located inside of a function.
I get no error when I compile and run the program, but when I run it with valgrind I get an "Invalid write of size 4".
I understand that I am accessing a memory place outside of what I have allocated with calloc, but I'm trying to understand what actually happens.
Does some address in the stack(?) still have the value 100? Because there must certainly be more available memory than what I have allocated with calloc. Is the valgrind error more of a "Hey, you probably did not mean to do that"?
I understand that I am accessing a memory place outside of what I have allocated with calloc, but I'm trying to understand what actually happens.
"What actually happens" is not well-defined; it depends entirely on what gets overwritten. As long as you don't overwrite anything important, your code will appear to run as expected.
You could wind up corrupting other data that was allocated dynamically. You could wind up corrupting some bit of heap bookkeeping.
The language does not enforce any kind of bounds-checking on array accesses, so if you read or write past the end of the array, there are no guarantees on what will happen.
Does some address in the stack(?) still have the value 100?
First of all, calloc allocates memory on the heap not stack.
Now, regarding the error.
Sure, most of the time there is plenty of memory available when your program is running. However, when you allocate memory for x bytes, the memory manager looks for a free chunk of memory of that size (plus perhaps a bit more if the allocator needs room for bookkeeping). There are no guarantees about what the bytes after that chunk are used for, and no guarantee that they are even readable or writable by your program.
So anything can happen. If the memory was just sitting there waiting to be used by your program, nothing horrible happens; but if that memory was already used by something else in your program, its values get messed up, or, worst of all, the program could crash because it accessed something it wasn't supposed to.
So the valgrind error should be treated very seriously.
The C language doesn't require bounds checking on array accesses, and most C compilers don't implement it. Besides, if you used a variable size instead of the constant 3, the array size could be unknown at compile time, and there would then be no way to check at compile time whether an access is out of bounds.
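If you need the check, it has to be written explicitly in your own code. A minimal sketch, with n and i standing in for values only known at run time:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        size_t n = 3;                     /* element count, e.g. read from input */
        size_t i = 3;                     /* index to write, e.g. read from input */

        int *x = calloc(n, sizeof *x);
        if (x == NULL) {
            perror("calloc");
            return 1;
        }

        if (i < n) {
            x[i] = 100;                   /* safe: inside the allocated block */
        } else {
            fprintf(stderr, "index %zu is out of bounds for %zu elements\n", i, n);
        }

        free(x);
        return 0;
    }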
There's no guarantee about what was allocated in the space past x[3] or what will be written there in the future. alinsoar mentioned that forming the address x + 3 (one past the end of the array) does not itself cause undefined behavior, but you should not attempt to fetch or store a value there. Often you will probably be able to write to and read this memory location without problems, but writing code that relies on reaching outside of your allocated arrays is setting yourself up for very hard-to-find errors in the future.
Does some address in the stack(?) still have the value 100?
When using calloc or malloc, the values of the array are not actually on the stack. These calls are used for dynamic memory allocation, meaning the memory is allocated in a separate area known as the "heap". This allows you to access these arrays from different parts of your program as long as you have a pointer to them. If the array were on the stack, writing past the bounds would risk overwriting other information contained in your function (in the worst case, the return address).
The act of doing that is what is called undefined behavior.
Literally anything can happen, or nothing at all.
I give you extra points for testing with Valgrind.
In practice, it is likely you will find the value 100 in the memory space after your array.
Beware of nasal demons.
You're allocating memory for 3 integer elements but accessing the 4th element (x[3]). Hence the warning message from valgrind. The compiler will not complain about it.

Why can I still access a member of a struct after the pointer to it is freed?

If I define a structure...
struct LinkNode
{
    int node_val;
    struct LinkNode *next_node;
};
and then create a pointer to it...
struct LinkNode *mynode = malloc(sizeof(struct LinkNode));
...and then finally free() it...
free(mynode);
...I can still access the 'next_node' member of the structure.
mynode->next_node
My question is this: which piece of the underlying mechanics keeps track of the fact that this block of memory is supposed to represent a struct LinkNode? I'm a newbie to C, and I expected that after I used free() on the pointer to my LinkNode, that I would no longer be able to access the members of that struct. I expected some sort of 'no longer available' warning.
I would love to know more about how the underlying process works.
The compiled program no longer has any knowledge about struct LinkNode, or a field named next_node, or anything like that. Any names are completely gone from the compiled program. The compiled program operates in terms of numerical values, which can play the roles of memory addresses, offsets, indices and so on.
In your example, when you read mynode->next_node in the source code of your program, it is compiled into machine code that simply reads the 4-byte numerical value from some reserved memory location (known as the variable mynode in your source code), adds 4 to it (which is the offset of the next_node field) and reads the 4-byte value at the resultant address (which is mynode->next_node). This code, as you can see, operates in terms of integer values - addresses, sizes and offsets. It does not care about any names, like LinkNode or next_node. It does not care whether the memory is allocated and/or freed. It does not care whether any of these accesses are legal or not.
(The constant 4 I repeatedly use in the above example is specific for 32-bit platforms. On 64-bit platforms it would be replaced by 8 in most (or all) instances.)
If an attempt is made to read memory that has been freed, these accesses might crash your program. Or they might not. It is a matter of pure luck. As far as the language is concerned, the behavior is undefined.
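You can inspect those raw numbers on your own platform with sizeof and offsetof from <stddef.h> (a small sketch; the exact values depend on pointer size and struct padding):

    #include <stddef.h>
    #include <stdio.h>

    struct LinkNode {
        int node_val;
        struct LinkNode *next_node;
    };

    int main(void)
    {
        printf("sizeof(struct LinkNode)       = %zu\n", sizeof(struct LinkNode));
        printf("offsetof(LinkNode, next_node) = %zu\n",
               offsetof(struct LinkNode, next_node));
        /* On a typical 64-bit platform this prints 16 and 8: the int takes
           4 bytes, padding brings the pointer to an 8-byte boundary. */
        return 0;
    }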
There isn't and you can't. This is a classic case of undefined behavior.
When you have undefined behavior, anything can happen. It may even appear to work, only to randomly crash a year later.
It works by pure luck, because the freed memory has not yet been overwritten by something else. Once you free the memory, it is your responsibility to avoid using it again.
No part of the underlying memory keeps track of it. That is just the semantics the programming language gives to that chunk of memory. You could, for example, cast it to something completely different and still access the same memory region. However, the catch is that this is much more likely to lead to errors; in particular, type safety is gone. In your case, just because you called free doesn't mean that the underlying memory changes at all. There is typically just a flag in the allocator (or operating system) that marks this region as free again.
Think about it this way: the free function is something like a "minimal" memory management system. If the call required more than setting a flag, it would introduce unnecessary overhead. Also, when you access the member, your runtime or operating system could check whether the flag for this memory region is set to "free" or "in use". But that's overhead again.
Of course that doesn't mean it wouldn't make sense to do those kinds of things. It would avoid a lot of security holes, and it is done, for example, in .NET and Java. But those runtimes are much younger than C, and we have many more resources these days.
When your compiler translates your C code into executable machine code, a lot of information is thrown away, including type information. Where you write:
int x = 42;
the generated code just copies a certain bit pattern into a certain chunk of memory (a chunk that might typically be 4 bytes). You can't tell by examining the machine code that the chunk of memory is an object of type int.
Similarly, when you write:
if (mynode->next_node == NULL) { /* ... */ }
the generated code will fetch a pointer sized chunk of memory by dereferencing another pointer-sized chunk of memory, and compare the result to the system's representation of a null pointer (typically all-bits-zero). The generated code doesn't directly reflect the fact that next_node is a member of a struct, or anything about how the struct was allocated or whether it still exists.
The compiler can check a lot of things at compile time, but it doesn't necessarily generate code to perform checks at execution time. It's up to you as a programmer to avoid making errors in the first place.
In this specific case, after the call to free, mynode has an indeterminate value. It doesn't point to any valid object, but there's no requirement for the implementation to do anything with that knowledge. Calling free doesn't destroy the allocated memory, it merely makes it available for allocation by future calls to malloc.
There are a number of ways that an implementation could perform checks like this, and trigger a run-time error if you dereference a pointer after freeing it. But such checks are not required by the C language, and they're generally not implemented because (a) they would be quite expensive, making your program run more slowly, and (b) checks can't catch all errors anyway.
C is defined so that memory allocation and pointer manipulation will work correctly if your program does everything right. If you make certain errors that can be detected at compile time, the compiler can diagnose them. For example, assigning a pointer value to an integer object requires at least a compile-time warning. But other errors, such as dereferencing a freed pointer, cause your program to have undefined behavior. It's up to you, as a programmer, to avoid making those errors in the first place. If you fail, you're on your own.
Of course there are tools that can help. Valgrind is one; clever optimizing compilers are another. (Enabling optimization causes the compiler to perform more analysis of your code, and that can often enable it to diagnose more errors.) But ultimately C is not a language that holds your hand. It's a sharp tool -- and one that can be used to build safer tools, such as interpreted languages that do more run-time checking.
You should assign NULL to the pointer itself after freeing the memory:
mynode = NULL;
to indicate that you are no longer using that allocation. Note that writing mynode->next_node = NULL after the free would itself be an access to freed memory; the assignment belongs on the pointer variable, not on the object it used to point to.
Without assigning NULL, mynode keeps pointing at the previously freed memory location.
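Putting the pieces of this thread together, a minimal sketch of that defensive pattern (free, then null the pointer, then guard later uses) might look like this:

    #include <stdio.h>
    #include <stdlib.h>

    struct LinkNode {
        int node_val;
        struct LinkNode *next_node;
    };

    int main(void)
    {
        struct LinkNode *mynode = malloc(sizeof *mynode);
        if (mynode == NULL)
            return 1;
        mynode->node_val = 42;
        mynode->next_node = NULL;

        free(mynode);
        mynode = NULL;               /* mark the pointer as no longer usable */

        if (mynode != NULL)          /* later accesses can now be guarded */
            printf("%d\n", mynode->node_val);

        return 0;
    }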

C program help: Insufficient memory allocation but still works...why? [duplicate]

Possible Duplicate: behaviour of malloc(0)
I'm trying to understand memory allocation in C, so I am experimenting with malloc. I allocated 0 bytes for this pointer, yet it can still hold an integer. As a matter of fact, no matter what number I put into the parameter of malloc, it can still hold any number I give it. Why is this?
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int *ptr = (int *)malloc(0);
    *ptr = 9;
    printf("%i", *ptr); // 9
    free(ptr);
    return 0;
}
It still prints 9, what's up with that?
If size is 0, then malloc() returns either NULL, or a unique pointer value that can later be successfully passed to free().
I guess you are hitting the 2nd case.
Anyway, that pointer just happens, by luck, to land in an area where you can write without generating a segmentation fault, but you are probably writing into the space of some other variable and messing up its value.
A lot of good answers here. But it is definitely undefined behavior. Some people declare that undefined behavior means that purple dragons may fly out of your computer or something like that... there's probably some history behind that outrageous claim that I'm missing, but I promise you that purple dragons won't appear regardless of what the undefined behavior will be.
First of all, let me mention that in the absence of an MMU, on a system without virtual memory, your program would have direct access to all of the memory on the system, regardless of its address. On a system like that, malloc() is merely the guy who helps you carve out pieces of memory in an ordered manner; the system can't actually force you to use only the addresses that malloc() gave you. On a system with virtual memory, the situation is slightly different... well, ok, a lot different. But within your program, any code in your program can access any part of the virtual address space that's mapped via the MMU to real physical memory. It doesn't matter whether you got an address from malloc() or whether you called rand() and happened to get an address that falls in a mapped region of your program; if it's mapped and not marked execute-only, you can read it. And if it isn't marked read-only, you can write it as well. Yes. Even if you didn't get it from malloc().
Let's consider the possibilities for the malloc(0) undefined behavior:
malloc(0) returns NULL.
OK, this is simple enough. There really is a physical address 0x00000000 in most computers, and even a virtual address 0x00000000 in all processes, but the OS intentionally doesn't map any memory to that address so that it can trap null pointer accesses. There's a whole page (generally 4KB) there that's just never mapped at all, and maybe even much more than 4KB. Therefore if you try to read or write through a null pointer, even with an offset from it, you'll hit these pages of virtual memory that aren't even mapped, and the MMU will throw an exception (a hardware exception, or interrupt) that the OS catches and turns into SIGSEGV (on Linux/Unix) or an access violation (on Windows).
malloc(0) returns a valid address to previously unallocated memory of the smallest allocable unit.
With this, you actually get a real piece of memory that you can legally call your own, of some size you don't know. You really shouldn't write anything there (and probably not read either) because you don't know how big it is, and for that matter, you don't know if this is the particular case you're experiencing (see the following cases). If this is the case, the block of memory you were given is almost guaranteed to be at least 4 bytes and probably is 8 bytes or perhaps even larger; it all depends on whatever the size is of your implementation's minimum allocable unit.
malloc(0) intentionally returns the address of an unmapped page of memory other than NULL.
This is probably a good option for an implementation, as it would allow you or the system to track & pair together malloc() calls with their corresponding free() calls, but in essence, it's the same as returning NULL. If you try to access (read/write) via this pointer, you'll crash (SEGV or illegal access).
malloc(0) returns an address in some other mapped page of memory that may be used by "someone else".
I find it highly unlikely that a commercially-available system would take this route, as it serves to simply hide bugs rather than bring them out as soon as possible. But if it did, malloc() would be returning a pointer to somewhere in memory that you do not own. If this is the case, sure, you can write to it all you want, but you'd be corrupting some other code's memory, though it would be memory in your program's process, so you can be assured that you're at least not going to be stomping on another program's memory. (I hear someone getting ready to say, "But it's UB, so technically it could be stomping on some other program's memory." Yes, in some environments, like an embedded system, that is right. But no modern commercial OS would let one process have access to another process's memory as easily as simply calling malloc(0); in fact, you simply can't get from one process to another process's memory without going through the OS to do it for you.) Anyway, back to reality... This is the one where "undefined behavior" really kicks in: if you're writing to "someone else's memory" (in your own program's process), you'll be changing the behavior of your program in difficult-to-predict ways. Knowing the structure of your program and where everything is laid out in memory, it's fully predictable. But from one system to another, things would be laid out differently in memory, so the effect on one system would not necessarily be the same as the effect on another system, or on the same system at a different time.
And finally... no, that's it. There really, truly, are only those four possibilities. You could argue for special-case subsets of the last two of the above, but the end result will be the same.
For one thing, your compiler may be seeing these two lines back to back and optimizing them:
*ptr = 9;
printf("%i", *ptr);
With such a simplistic program, your compiler may actually be optimizing away the entire memory allocate/free cycle and using a constant instead. A compiler-optimized version of your program could end up looking more like simply:
printf("9");
The only way to tell if this is indeed what is happening is to examine the assembly that your compiler emits. If you're trying to learn how C works, I recommend explicitly disabling all compiler optimizations when you build your code.
Regarding your particular malloc usage, remember that you will get a NULL pointer back if allocation fails. Always check the return value of malloc before you use it for anything. Blindly dereferencing it is a good way to crash your program.
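As a sketch of both points together - allocating real space for the int rather than zero bytes, and checking the result before dereferencing - the question's program might be rewritten like this:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int *ptr = malloc(sizeof *ptr);   /* space for one int, not zero bytes */
        if (ptr == NULL) {
            fprintf(stderr, "allocation failed\n");
            return 1;
        }
        *ptr = 9;
        printf("%i\n", *ptr);
        free(ptr);
        return 0;
    }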
The link that Nick posted gives a good explanation about why malloc(0) may appear to work (note the significant difference between "works" and "appears to work"). To summarize the information there, malloc(0) is allowed to return either NULL or a pointer. If it returns a pointer, you are expressly forbidden from using it for anything other than passing it to free(). If you do try to use such a pointer, you are invoking undefined behavior and there's no way to tell what will happen as a result. It may appear to work for you, but in doing so you may be overwriting memory that belongs to some other part of your program and corrupting its data. In short: nothing good can happen, so leave that pointer alone and don't waste your time with malloc(0).
The answer to why the malloc(0)/free() calls don't crash can be found here:
zero size malloc
About the *ptr = 9: it is just like overflowing a buffer (like malloc'ing 10 bytes and accessing the 11th); you are writing to memory you don't own, and doing that is asking for trouble. In this particular implementation malloc(0) happens to return a pointer instead of NULL.
Bottom line, it is wrong even if it seems to work on a simple case.
Some memory allocators have the notion of a "minimum allocatable size". So even if you pass zero, they may return a pointer to a word-sized chunk of memory, for example. You need to check your system allocator's documentation. But even if it does return a pointer to some memory, it would be wrong to rely on that, because the pointer is only supposed to be passed to realloc() or free().
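In other words, the only portable thing to do with whatever malloc(0) returns is to hand it back to free(). A minimal sketch:

    #include <stdlib.h>

    int main(void)
    {
        void *p = malloc(0);   /* may be NULL or a unique pointer; both are allowed */
        /* do not read or write through p */
        free(p);               /* free(NULL) is also well defined: it does nothing */
        return 0;
    }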

Reading an arbitrary memory location makes the program crash

I am trying to read the value at a random memory location using the following C code:
main()
{
    int a, *b;
    printf("enter the value of a");
    scanf("%d", &a);
    b = a;
    printf("%d\n%d\n", a, *b);
    getch();
}
But the program crashes when a negative value, or even zero, is entered for a through scanf.
What am I doing wrong? Can pointers not hold negative values?
The thing is that you are probably running on a modern, full-service operating system, and it provides a more complicated abstract machine than the one described in most intro-to-C programming books.
In particular, there are OS imposed restrictions on access to arbitrary addresses. You don't notice this when you look at addresses associated with standard variables because you do have permission to use the stack and the data segment, and the alloc family of functions takes care of making sure that you have permission to access the parts of the heap that they hand you.
So, what you are doing is accessing memory for which you do not have permission, which results in a failure called a "segmentation fault" on most OS, and that abruptly ends your program.
What can you do about it?
Allocate a big block on the heap, by calling char *p = malloc(1000000); and then find the starting and ending addresses with something like printf("start:\t%p\nend\t%p\n",(void*)p,(void*)(p+1000000)); and only entering numbers in that range.
Note that the %p print specifier outputs in hexadecimal, so you probably want to enter addresses in the same base. The standard library function strtol will be helpful in that regard.
A more sophisticated approach would be to use your OS's API to request access permission for arbitrary address, but for some values the OS is likely to simply refuse.
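Here is a rough sketch of the first approach, assuming input is read with fgets and parsed with strtoul (a close relative of the strtol mentioned above); converting the parsed integer back to a pointer is implementation-defined, which is fine for an experiment like this:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        size_t size = 1000000;
        char *p = malloc(size);
        if (p == NULL)
            return 1;

        printf("start:\t%p\nend:\t%p\n", (void *)p, (void *)(p + size));
        printf("enter an address in that range (hex): ");
        fflush(stdout);

        char buf[64];
        if (fgets(buf, sizeof buf, stdin) != NULL) {
            char *addr = (char *)strtoul(buf, NULL, 16);  /* parse hex input */
            if (addr >= p && addr < p + size)
                printf("value at %p: %d\n", (void *)addr, *addr);
            else
                printf("that address is outside the allocated block\n");
        }

        free(p);
        return 0;
    }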
I see some confusion here over just what a pointer is.
First, you ask the user for a value. This is fine. Then you assign that value as the location of the pointer b. This MAY be fine but likely not.
Think for a moment, what does *(-500) mean? What would *(0) mean?
In general you can never just take user input and use it without first checking it or manipulating it. This is one place where security breaches come from.
If you want to experiment with dereferencing memory, just hard code some values at first. Load the program up in a debugger and see what happens.
int c;
b = (int *)500;
c = *b;   // what happens?
b = (int *)0;
c = *b;   // what happens?
b = (int *)-100;
c = *b;   // what happens?
Let me greatly oversimplify for you...
In almost all modern computers, with most operating systems, very little of the memory in the machine is directly addressable by your program. You can't take a pointer, point it at something, and try to read it. It will almost always fail.
There are generally three things that will go wrong:
The memory doesn't exist where you're pointing. Pointers can hold large range of values, and not all of them mean anything. It's like a house number in a postal address. Technically, you can put anything you want on the envelope. Only some are valid.
The memory exists, but isn't yours. The vast majority of memory in the computer is "owned" by the operating system and if you touch it, it will terminate your program. This is for your safety.
The memory you're trying to address is valid, in the right range, but not quite the right type. From the earlier example, you might have a reasonable house number but there's no house at that location. Or the address is really an apartment and just a number won't do.
In an old 8-bit computer from the 1980's with a full 64k of memory, you could just read any location you wanted to and it would be fine. Not so much anymore.
In theory, you have permission to read from any address within your virtual address space (e.g. 0 to 0xFFFFFFFF on a 32-bit machine).
0 and negative numbers are not a problem in themselves - once you assign them to pointers, they are simply treated as addresses, which have no sign.
In practice, it won't work. OS will protect itself (and you) from this - it won't let you read from address that doesn't really belong to you.
That is, if you haven't allocated the memory page and haven't write something there, OS won't let you to read from there.
Moreover, you don't really own the whole address space - lower part of it is owned by kernel et al., so OS won't let you access it.
Pointers are not signed values; addresses have no sign. 0 (the NULL pointer) is guaranteed not to point at anything, and dereferencing it will typically crash the program. Further, operating systems (and even the hardware, if I remember correctly) provide memory protection. This is why programs used to be able to crash each other, but that is now much less common. When your program runs, the OS decides what memory it has access to and will raise a segmentation fault if you try to access memory that isn't yours.
Then again, perhaps you just wanted b = &a? This would make b point to where a is stored, so *b would give you the value stored in a.
Since you declared b as a pointer, it is wrong to do b = a; you will likely get a segmentation fault when you dereference it. A pointer is meant to hold an address, not an arbitrary integer, float, or char value.
Alternatively you could do b = &a, which means that b points to the memory address of a. You could then print the value stored in a through it.
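For completeness, here is a sketch of the question's program with that fix applied (getch() is non-standard, so the sketch uses getchar() instead):

    #include <stdio.h>

    int main(void)
    {
        int a, *b;

        printf("enter the value of a: ");
        if (scanf("%d", &a) != 1)
            return 1;

        b = &a;                      /* b now points at a, not at address a */
        printf("%d\n%d\n", a, *b);   /* both lines print the same value */

        getchar();                   /* standard stand-in for getch() */
        return 0;
    }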

defining a simple array of integers

Why can I do this?
int array[4];                    // i assume this creates an array with 4 indexes
array[25] = 1;                   // i have assigned an index larger than the declared size
NSLog(@"result: %i", array[25]); // this prints "1" to the screen
Why does this work if the index exceeds the declaration? What is the significance of the number in the declaration if it has no effect on what you can actually do with the array?
Thanks.
You are getting undefined behavior. It could print anything, it could crash, it could burst into singing (okay that isn't likely but you get the idea).
If it happens to write to a location that is mapped with the adequate permissions it will work. Until one day when it won't because of a different layout.
It is undefined. Some OSes will give you a segmentation fault, while others tolerate this. Anyhow, exceeding the array's bounds should be avoided.
An array is essentially a contiguous, allocated block of memory, and its name behaves like a pointer to the start of that block.
In this case, you have allocated 4 ints worth of memory.
So if you went array[2] it would think "the memory at array + sizeof(int) * 2"
Change the 2 to 25, and you're just looking at 25 int's worth of memory past the start. Since there's no checks to verify you're in bounds (either when assigning or printing) it works.
There's nothing to say something else isn't allocated there, though!
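A small sketch makes the arithmetic visible (staying inside the bounds this time):

    #include <stdio.h>

    int main(void)
    {
        int array[4] = {10, 20, 30, 40};

        /* array[2] and *(array + 2) are the same access: the base address
           plus 2 * sizeof(int) bytes. */
        printf("array[2]     = %d\n", array[2]);
        printf("*(array + 2) = %d\n", *(array + 2));
        printf("&array[2]    = %p\n", (void *)&array[2]);
        printf("array + 2    = %p\n", (void *)(array + 2));
        return 0;
    }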
The number in the declaration determines how much memory should be reserved, in this case 4 * sizeof(int).
Writing to memory outside those bounds is possible but not recommended. In C you can access any address available to your program, but if you write to memory that isn't reserved for that purpose, you can cause memory corruption. Array names decay to pointers in most expressions (but pointers are not arrays).
The behavior depends on the compiler, the platform and some randomness. Don't do it.
It's doing very bad things. If the array is declared locally to a function, you're probably writing on stack locations above the current function's stack frame. If it is declared globally, you're probably writing on memory allocated to adjacent variables. Either way, it is "working" by pure luck.
It is possible your C compiler had padded your array for memory alignment purposes, and by luck your array overrun just happens to still be within the rounded-up allocation. Don't rely on it though.
This is unsafe programming and it really should be avoided, precisely because it may not crash your program - a crash is really the best thing you could hope for. Instead it could give you garbage results, which are unpredictable and can really screw up your program; since nothing crashes to tell you something is wrong, it quietly ruins the integrity of your data. Since there is no try/catch in C, you really should check inputs. Remember that scanf returns an int.
C by design does not perform array bounds checking. It was designed as a systems level language, and the overhead of explicit run-time bounds checking could be prohibitive in a great many cases. Consequently C will allow 'dangerous' code and must be used carefully. If you need something 'safer' then C# or Java may be more appropriate, but there is a definite performance hit.
Automated static analysis tools may help, and there are run-time bounds checking tools for use in development and debugging.
In C an array is a contiguous block of memory. By accessing the array out of bounds, you are simply accessing memory beyond the end of the array. What accessing such memory will do is non-deterministic: it may be junk, it may belong to an adjacent variable, or it may belong to the variables of the calling function or those above it. It may even be a return address for the current function or a calling function. In a memory-protected OS such as Windows or Linux, if you access so far out of bounds as to be no longer within the address range assigned to the process, a fault exception (access violation or segmentation fault) will occur.
