I'm studying this malloc function and I could use some help:
static void *malloc(int size)
{
    void *p;
    if (size < 0)
        error("Malloc error");
    if (!malloc_ptr)
        malloc_ptr = free_mem_ptr;
    malloc_ptr = (malloc_ptr + 3) & ~3; /* Align */
    p = (void *)malloc_ptr;
    malloc_ptr += size;
    if (free_mem_end_ptr && malloc_ptr >= free_mem_end_ptr)
        error("Out of memory");
    malloc_count++;
    return p;
}
I know that malloc allocates memory for any type, provided there is enough memory, but the lines I don't understand are:
p = (void *)malloc_ptr;
malloc_ptr += size;
How can it point to any data type like that? I just can't understand that void pointer or its location.
NOTE: malloc_ptr is an unsigned long
The reason it returns a void pointer is that it has no idea what you are allocating space for in the malloc call. All it knows is the amount of space you requested. It is up to you or your compiler to decide what will fill the memory. In a typical general-purpose allocator, the record of which regions of memory are free is kept in a linked list, and, perhaps surprisingly, much of that bookkeeping is done in the free function.
This is the implementation of malloc, so it is allowed to do things that would not be legitimate in a regular program. Specifically, it is making use of the implementation-defined conversion from unsigned long to void *. Program initialization sets malloc_ptr to the numeric address of a large block of unallocated memory. Then, when you ask for an allocation, malloc makes a pointer out of the current value of malloc_ptr and increases malloc_ptr by the number of bytes you asked for. That way, the next time you call malloc it will return a new pointer.
This is about the simplest possible implementation of malloc. Most notably, it appears not to ever reuse freed memory.
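This style is often called a bump allocator. Here is a minimal sketch of the same idea, assuming a fixed static buffer in place of the region between free_mem_ptr and free_mem_end_ptr. The names arena, bump_ptr, bump_alloc and bump_alloc_demo are made up for illustration and are not part of the code in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena; the code in the question uses free_mem_ptr instead. */
static unsigned char arena[1024];
static uintptr_t bump_ptr;              /* plays the role of malloc_ptr */

void *bump_alloc(size_t size)
{
    uintptr_t end = (uintptr_t)arena + sizeof arena;

    if (!bump_ptr)
        bump_ptr = (uintptr_t)arena;    /* first call: start of the arena */

    /* Round the start up to a multiple of 4, as the question's code does. */
    uintptr_t start = (bump_ptr + 3) & ~(uintptr_t)3;
    if (start > end || size > end - start)
        return NULL;                    /* not enough room left */

    bump_ptr = start + size;            /* the next call starts past this block */
    return (void *)start;               /* the integer address becomes a pointer */
}

void bump_alloc_demo(void)
{
    char *a = bump_alloc(5);
    char *b = bump_alloc(8);

    assert(a != NULL && b != NULL);
    assert(b - a == 8);                 /* 5 bytes rounded up to the next multiple of 4 */
    assert(bump_alloc(4096) == NULL);   /* larger than the whole arena */
}
```

Unlike the original, this sketch returns NULL instead of calling error() and checks the size before advancing the pointer. There is no free function here, which matches the observation that memory is never reused.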
Malloc is returning a pointer for a chunk of completely unstructured, flat memory. The (void *) pointer means that it has no idea what it's pointing to (no structure), merely that it points to some memory of size size.
Outside of your call to malloc, you can then tell your program that this pointer has some structure. I.e., if you have a structure some_struct you can say: struct some_struct *pStruct = (struct some_struct *) malloc(sizeof(struct some_struct)).
See how malloc only knows the size of what it is going to allocate, but does not actually know its structure? Your call to malloc is passing in no information about the structure, merely the size of how much memory to allocate.
This is C's way of being generic: malloc returns you a certain amount of memory and it's your job to cast it to the structured memory you need.
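To make that concrete, here is a small sketch in which the same malloc interface backs two unrelated types. The struct point type and all function names are invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct point { int x, y; };             /* hypothetical example type */

/* malloc sees only a byte count; the caller's pointer type gives it structure. */
struct point *make_point(int x, int y)
{
    struct point *p = malloc(sizeof *p);
    if (p != NULL) {
        p->x = x;
        p->y = y;
    }
    return p;
}

double *make_doubles(size_t n)
{
    double *d = malloc(n * sizeof *d);  /* same function, different use */
    if (d != NULL)
        for (size_t i = 0; i < n; i++)
            d[i] = 0.5 * (double)i;
    return d;
}

void typed_alloc_demo(void)
{
    struct point *p = make_point(3, 4);
    double *d = make_doubles(3);

    assert(p != NULL && d != NULL);
    assert(p->x == 3 && p->y == 4);
    assert(d[2] == 1.0);
    free(p);
    free(d);
}
```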
p = (void *)malloc_ptr;
malloc returns a void pointer, which indicates that it is a pointer to a region of unknown data type. The use of casting is only required in C++ due to the strong type system, whereas this is not the case in C. The lack of a specific pointer type returned from malloc is type-unsafe behaviour according to some programmers:
malloc allocates based on byte count but not on type.
malloc_ptr += size;
C implicitly casts from and to void*, so the cast will be done automatically. In C++ only conversion to void* would be done implicitly, for the other direction an explicit cast is required.
Wiki explanation about type casting: malloc function returns an untyped pointer type void *, which the calling code must cast to the appropriate pointer type. Older C specifications required an explicit cast to do so, therefore the code
(struct foo *) malloc(sizeof(struct foo))
became the accepted practice.
However, this practice is discouraged in ANSI C as it can mask a failure to include the header file in which malloc is defined, resulting in downstream errors on machines where the int and pointer types are of different sizes, such as the now-ubiquitous x86_64 architecture. A conflict arises in code that is required to compile as C++, since the cast is necessary in that language.
Look at these two lines:
p = (void *)malloc_ptr;
malloc_ptr += size;
Here malloc_ptr has type unsigned long, so the first line casts its value to a void pointer and stores it in p.
In the same way, the second line is shorthand for malloc_ptr = malloc_ptr + size;
Both lines are there for the caller's convenience. p is a void pointer because malloc cannot know what type of object the application will put in the block, so it always returns this generic pointer and the application converts it to whatever pointer type it needs.
As for the second line: a negative size is already rejected at the top of the function, and once malloc_ptr has been advanced this condition catches a request that runs past the end of the available memory:
if (free_mem_end_ptr && malloc_ptr >= free_mem_end_ptr)
error("Out of memory");
Related
malloc() function forms a single block of memory (say 20 bytes typecasted to int), so how it can be used as an array of int blocks like as calloc() function? Shouldn't it be used to store just one int value in whole 20 bytes (20*8 bits)?
(say 20 bytes typecasted to int)
No, the returned memory is given as a pointer to void, an incomplete type.
We assign the returned pointer to a variable of pointer to some type, and we can use that variable to access the memory.
Quoting C11, chapter §7.22.3, Memory management functions
[....] The
pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to
a pointer to any type of object with a fundamental alignment requirement and then used
to access such an object or an array of such objects in the space allocated (until the space
is explicitly deallocated). [...] The pointer returned points to the start (lowest byte address) of the
allocated space. [....]
Since the allocated memory is contiguous, pointer arithmetic works, just as in case of arrays, since in arrays also, elements are placed in contiguous memory.
One point to clarify, a pointer is not an array.
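As a rough illustration of the point about contiguous memory, the sketch below treats one malloc'ed block as an array of int and walks it twice, once by index and once with explicit pointer arithmetic. The function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Stores 0..n-1 in a malloc'ed block used as an int array and returns the sum,
   or -1 if the allocation fails. */
int sum_first(size_t n)
{
    int *a = malloc(n * sizeof *a);     /* room for n ints, not one big int */
    if (a == NULL)
        return -1;

    for (size_t i = 0; i < n; i++)
        a[i] = (int)i;                  /* a[i] is *(a + i): steps of sizeof(int) bytes */

    int sum = 0;
    for (int *p = a; p != a + n; p++)   /* the same walk with pointer arithmetic */
        sum += *p;

    free(a);
    return sum;
}

void sum_first_demo(void)
{
    assert(sum_first(5) == 10);         /* 0 + 1 + 2 + 3 + 4 */
}
```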
There's an abstract concept in C formally known as effective type, meaning the actual type of the data stored in memory. This is something the compiler keeps track of internally.
Most objects in C have such an effective type at the point when the variable is declared, for example if we type int a; then the effective type of what's stored in a is int.
Meaning it is legal to do evil things like this:
int a;
double* d = (double*)&a;
*(int*)d = 1;
This works because the effective type of the actual memory remains an int, even though we pointed at it with a wildly incompatible type. As long as we access it with the same type as the effective type, all is well. If we access the data using the wrong type, very bad things will happen, such as program crashes or dormant bugs.
But when we call malloc family of functions, we only tell them to reserve n number of bytes, with no type specified. This memory is guaranteed to be allocated in adjacent memory cells, but nothing else. The only difference between malloc and calloc is that the latter sets all values in this raw memory to zero. Neither function knows anything about types or arrays.
The returned chunk of raw memory has no effective type. Not until the point when we access it, then it gets the effective type which corresponds to the type used for the access.
So just as in the previous example, it doesn't matter which type of pointer we set to point at the data. It doesn't matter if we write int* i = malloc(n); or bananas_t* b = malloc(n);, because the pointed-at memory does not yet have a type. It does not get one until at the point where we access it for the first time.
There is nothing special about memory returned from malloc compared to memory returned from calloc, other than the fact that the bytes of the memory block returned by calloc are initialized to 0. Memory returned by malloc does not have to be used for a single object but may also be used for an array.
This means that the following are equivalent:
int *p1 = malloc(3 * sizeof(int));
p1[0] = 1;
p1[1] = 2;
p1[2] = 3;
...
int *p2 = calloc(3, sizeof(int));
p2[0] = 1;
p2[1] = 2;
p2[2] = 3;
Both will return 3 * sizeof(int) bytes of memory which can be used as an array of int of size 3.
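One way to check that claim is to fill both blocks the same way and compare them byte for byte. This is only a sketch; same_after_fill is a made-up helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if a malloc'ed and a calloc'ed block of n ints hold the same bytes
   once every element has been assigned, 0 if not, -1 if an allocation fails. */
int same_after_fill(size_t n)
{
    int *m = malloc(n * sizeof *m);     /* contents indeterminate */
    int *c = calloc(n, sizeof *c);      /* all bytes zero */
    int result = -1;

    if (m != NULL && c != NULL) {
        for (size_t i = 0; i < n; i++) {
            m[i] = (int)i + 1;
            c[i] = (int)i + 1;
        }
        result = memcmp(m, c, n * sizeof *m) == 0;
    }
    free(m);                            /* free(NULL) is a no-op */
    free(c);
    return result;
}

void same_after_fill_demo(void)
{
    assert(same_after_fill(3) == 1);
}
```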
What malloc returns back to you is just a pointer to the starting memory address where the contiguous block of memory was allocated.
The size of the contiguous block of memory that you allocated using malloc depends on the argument you passed into malloc function. http://www.cplusplus.com/reference/cstdlib/malloc/
If you want to store int values, declare the pointer that receives the result as a pointer to int.
example:
int *p; //pointer to int
size_t size = 20;
p = (int *) malloc(size); //stores the address of the block in p
After this, the programmer can use the pointer p to access int values (sizeof(int) bytes each, which is 4 on most current platforms).
The only difference between calloc and malloc is that calloc initializes every byte of the memory block to zero.
I'm trying to implement a new malloc that stores the size at the front of the malloc'ed region, and then returns a pointer to the incremented location (what comes after the stored unsigned int).
void* malloc_new(unsigned size) {
    void* result = malloc(size + sizeof(unsigned));
    ((unsigned*)result)[0] = size;
    result += sizeof(unsigned);
    return result;
}
I'm having doubts regarding whether the
result += sizeof(unsigned);
line is correct (does what I want).
Say the original address in the heap for the malloc is X, and the size of unsigned is 4, I want the 'result' pointer to point to X + 4, right? Meaning that the memory location in the stack that stores the 'result' pointer should contain (the original heap address location + 4).
result += sizeof(unsigned); should give you at least a warning (standard C does not define arithmetic on void *; some compilers accept it only as an extension).
unsigned *result = malloc(size + sizeof size);
result[0] = size;
return result + 1;
should be the easier way.
Please note that the returned memory is not suitably aligned for all possible data types. You will run into trouble if you use this memory for double or other 64-bit data types. If you store the size in an 8-byte type such as uint64_t instead, the block that follows is aligned for 8-byte types. Types with a stricter requirement, such as long double on some platforms, would need a larger header.
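A different way to sidestep the alignment question, sketched here under the assumption of a C11 compiler, is to pad the header to the strictest fundamental alignment with max_align_t. This swaps in a union header rather than the uint64_t suggested above, and the names malloc_sized, sized_size and sized_free are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header padded to the strictest fundamental alignment (C11 max_align_t). */
union header {
    size_t size;
    max_align_t align;
};

void *malloc_sized(size_t size)
{
    if (size > SIZE_MAX - sizeof(union header))
        return NULL;                    /* size plus header would overflow */

    union header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;                       /* first byte after the header */
}

size_t sized_size(const void *p)
{
    return ((const union header *)p - 1)->size;
}

void sized_free(void *p)
{
    if (p != NULL)
        free((union header *)p - 1);    /* hand back the pointer malloc returned */
}

void malloc_sized_demo(void)
{
    double *d = malloc_sized(4 * sizeof *d);

    assert(d != NULL);
    d[3] = 2.5;                         /* the block is aligned for double */
    assert(sized_size(d) == 4 * sizeof *d);
    sized_free(d);
}
```

The cost is a few wasted bytes per allocation, since the header is as large as max_align_t even though only a size_t is stored in it.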
In addition to the problems noted in other answers with performing pointer arithmetic on void * pointers, you're also likely violating one of the restrictions the C standard places on memory returned from functions such as malloc().
7.22.3 Memory management functions, paragraph 1 of the C standard states:
The order and contiguity of storage allocated by successive calls to the aligned_alloc, calloc, malloc, and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object. The pointer returned points to the start (lowest byte address) of the allocated space. If the space cannot be allocated, a null pointer is returned. If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.
Note the sentence about alignment: the returned pointer must be suitably aligned for any type of object with a fundamental alignment requirement.
Unless your system has a fundamental alignment that's only four bytes (8 or 16 is much more typical), you are violating that restriction, and will invoke undefined behavior per 6.3.2.3 Pointers, paragraph 7 for any object type with a fundamental alignment requirement larger than four bytes:
... If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. ...
How void pointer arithmetic is happening in GCC
C does not allow pointer arithmetic with void * pointer type.
GNU C allows it by considering the size of void is 1.
The result is void* so result += sizeof(unsigned); just happens to work on compatible compilers.
You can refactor your function into:
void *malloc_new(unsigned size) {
    void* result = malloc(size + sizeof(unsigned));
    ((unsigned*)result)[0] = size;
    result = (char*)result + sizeof(unsigned);
    return result;
}
Side note, before void type and void* generic pointer existed in the C language, programmers used char* to represent a generic pointer.
You can do void* arithmetic if you cast the type first to char*, for instance. And then cast back to void*. To get better alignment, use 64 bit type for the size, e.g. uint64_t.
#define W_REF_VOID_PTR(ptr,offset) \
((void*)((char*) (ptr) + (offset)))
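For what it's worth, here is a short sketch of how a macro like that might be used to step forward and back by a byte count. The macro is repeated so the snippet compiles on its own, and byte_offset_demo is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

#define W_REF_VOID_PTR(ptr, offset) \
    ((void *)((char *)(ptr) + (offset)))

void byte_offset_demo(void)
{
    int values[4] = { 10, 20, 30, 40 };
    void *base = values;

    /* Step forward by two ints, measured in bytes. */
    int *third = W_REF_VOID_PTR(base, 2 * sizeof(int));
    assert(*third == 30);

    /* A negative offset steps back; the cast makes the offset signed. */
    void *again = W_REF_VOID_PTR(third, -(ptrdiff_t)(2 * sizeof(int)));
    assert(again == base);
}
```

Because the arithmetic happens on a char *, the offset is always in bytes, which is what the void * arithmetic extension gives you, but this form is portable.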
So I'm a bit confused on how to make a function that will return a pointer to an array of ints in C. I understand that you cannot do:
int* myFunction() {
    int myInt[aDefinedSize];
    return myInt;
}
because this is returning a pointer to a local variable.
So, I thought about this:
int* myFunction(){
    int* myInt = (int) malloc(aDefinedSize * sizeof(int));
    return myInt;
}
This gives the warning: cast from pointer to integer of different size
That suggests using this instead, which works:
int* myFunction(){
    int* myInt = (int*) malloc(aDefinedSize * sizeof(int));
    return myInt;
}
What I'm confused by though is this:
the (int*) before the malloc was explained to me to do this: it tells the compiler what the datatype of the memory being allocated is. This is then used when, for example, you are stepping through the array and the compiler needs to know how many bytes to increment by.
So, if this explanation I was given is correct, isn't memory being allocated for aDefinedSize number of pointers to ints, not actually ints? Thus, isn't myInt a pointer to an array of pointers to ints?
Some help in understanding this would be wonderful. Thanks!!
So, if this explanation I was given is correct, isn't memory being allocated for aDefinedSize number of pointers to ints, not actually ints?
No, you asked malloc for aDefinedSize * sizeof(int) bytes, not
aDefinedSize * sizeof(int *) bytes. That's the size of memory you get, the type depends on the pointer used to access the memory.
Thus, isn't myInt a pointer to an array of pointers to ints?
No, since you defined it as a int *, a pointer-to-an-int.
Of course the pointer has no knowledge of how large the allocated memory area is, but only points at the first int that fits there. It's up to you as programmer to keep track of the size.
Note that you shouldn't use that explicit typecast. malloc returns a void *, that can be silently assigned to any pointer, as in here:
int* myInt = malloc(aDefinedSize * sizeof(int));
Arithmetic on the pointer works in strides of the pointed-to type, i.e. with int *p, p[3] is the same as *(p+3), which means roughly "go to p, go forward three times sizeof(int) in bytes, and access that location".
int **q would be a pointer-to-a-pointer-to-an-int, and might point to an array of pointers.
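To show the difference between the two, here is a sketch where an int ** really does point to an array of pointers, each of which points to its own array of ints. make_table, free_table and table_demo are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* rows x cols table: an array of pointers (int **), where each element points
   at its own zeroed array of ints (int *). */
int **make_table(size_t rows, size_t cols)
{
    int **t = malloc(rows * sizeof *t);         /* rows pointers-to-int */
    if (t == NULL)
        return NULL;

    for (size_t r = 0; r < rows; r++) {
        t[r] = calloc(cols, sizeof *t[r]);      /* cols ints */
        if (t[r] == NULL) {
            while (r > 0)                       /* undo the rows already made */
                free(t[--r]);
            free(t);
            return NULL;
        }
    }
    return t;
}

void free_table(int **t, size_t rows)
{
    if (t == NULL)
        return;
    for (size_t r = 0; r < rows; r++)
        free(t[r]);
    free(t);
}

void table_demo(void)
{
    int **t = make_table(2, 3);

    assert(t != NULL);
    t[1][2] = 7;
    assert(*(*(t + 1) + 2) == 7);       /* same element, spelled with pointer arithmetic */
    assert(t[0][0] == 0);
    free_table(t, 2);
}
```

Note that the first allocation uses sizeof *t, which is the size of an int *, while each row uses sizeof *t[r], which is the size of an int.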
malloc allocates an array of bytes and returns void* pointing to the first byte. Or NULL if the allocation failed.
To treat this array as an array of a different data type, the pointer must be converted to a pointer to that data type.
In C, void* converts implicitly to any object pointer type, so no explicit cast is required:
int* allocateIntArray(unsigned number_of_elements) {
    int* int_array = malloc(number_of_elements * sizeof(int)); // <--- no cast is required here.
    return int_array;
}
Arrays in C
In C, you want to remember that an array is just an address in memory, plus a length and an object type. When you pass it as an argument to a function or a return value from a function, the length gets forgotten and it’s treated interchangeably with the address of the first element. This has led to a lot of security bugs in programs that either read or write past the end of a buffer.
The name of an array automatically converts to the address of its first element in most contexts, so you can for example pass either arrays or pointers to memmove(), but there are a few exceptions where the fact it also has a length matters. The sizeof() operator on an array is the number of bytes in the array, but sizeof() a pointer is the size of a pointer variable. So if we declare int a[SIZE];, sizeof(a) is the same as sizeof(int)*(size_t)(SIZE), whereas sizeof(&a[0]) is the same as sizeof(int*). Another important one is that the compiler can often tell at compile time if an array access is out of bounds, whereas it does not know which accesses to a pointer are safe.
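The sizeof difference is easy to see in a few lines. This is just an illustrative sketch; bytes_seen_by_callee and decay_demo are names I made up.

```c
#include <assert.h>
#include <stddef.h>

#define SIZE 8

/* Inside the function a is a plain pointer; the array length is gone and
   would have to be passed separately. */
size_t bytes_seen_by_callee(int *a)
{
    return sizeof a;                    /* size of an int *, not of the array */
}

void decay_demo(void)
{
    int a[SIZE];

    assert(sizeof a == SIZE * sizeof(int));            /* the whole array */
    assert(sizeof &a[0] == sizeof(int *));             /* just a pointer */
    assert(bytes_seen_by_callee(a) == sizeof(int *));  /* the array decayed */
    assert(sizeof a / sizeof a[0] == SIZE);            /* element-count idiom */
}
```

The element-count idiom on the last line only works where the array itself is in scope, which is exactly why functions that take a pointer usually take a length as well.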
How to Return an Array
If you want to return a pointer to the same, static array, and it’s fine that you’ll get the same array each time you call the function, you can do this:
#define ARRAY_SIZE 32U
int* get_static_array(void)
{
    static int the_array[ARRAY_SIZE];
    return the_array;
}
You must not call free() on a static array.
If you want to create a dynamic array, you can do something like this, although it is a contrived example:
#include <stdlib.h>
int* make_dynamic_array(size_t n)
// Returns an array that you must free with free().
{
    return calloc( n, sizeof(int) );
}
The dynamic array must be freed with free() when you no longer need it, or the program will leak memory.
Practical Advice
For anything that simple, you would actually write:
int * const p = calloc( n, sizeof(int) );
Unless for some reason the array pointer would change, such as:
int* p = calloc( n, sizeof(int) );
/* ... */
p = realloc( p, new_size );
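One caveat with assigning the result of realloc() straight back to p: if realloc() fails it returns NULL and leaves the old block allocated, so the only pointer to it is lost. A common pattern, sketched here with a hypothetical helper named resize_ints and assuming new_count is greater than zero, goes through a temporary:

```c
#include <assert.h>
#include <stdlib.h>

/* Resizes *arr to hold new_count ints (new_count > 0 assumed). Returns 0 on
   success and -1 on failure; on failure *arr is untouched and still valid. */
int resize_ints(int **arr, size_t new_count)
{
    int *tmp = realloc(*arr, new_count * sizeof **arr);
    if (tmp == NULL)
        return -1;
    *arr = tmp;
    return 0;
}

void resize_demo(void)
{
    int *p = calloc(2, sizeof *p);

    assert(p != NULL);
    p[1] = 42;
    assert(resize_ints(&p, 100) == 0);
    assert(p[1] == 42);                 /* old contents are preserved */
    free(p);
}
```

Note that the size passed to realloc() is in bytes, so an element count has to be multiplied by the element size. The added elements are not zeroed, unlike the ones calloc() returned.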
I would recommend calloc() over malloc() as a general rule, because it initializes the block of memory to zeroes, and malloc() leaves the contents unspecified. That means, if you have a bug where you read uninitialized memory, using calloc() will always give you predictable, reproducible results, and using malloc() could give you different undefined behavior each time. In particular, if you allocate a pointer and then dereference it on an implementation where 0 is a trap value for pointers (like typical desktop CPUs), a pointer created by calloc() will always give you a segfault immediately, while a garbage pointer created by malloc() might appear to work, but corrupt any part of memory. That kind of bug is a lot harder to track down. It’s also easier to see in the debugger that memory is or is not zeroed out than whether an arbitrary value is valid or garbage.
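A small sketch of the pointer case, assuming a platform where a null pointer is represented as all-bits-zero (true of mainstream desktop systems, but not guaranteed by the C standard). struct node and new_node are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct node {                           /* hypothetical example type */
    struct node *next;
    int value;
};

/* calloc zero-fills the node, so next starts out as all-bits-zero. On the
   platforms assumed here that is a null pointer, so a node that was never
   linked is easy to recognise. */
struct node *new_node(int value)
{
    struct node *n = calloc(1, sizeof *n);
    if (n != NULL)
        n->value = value;
    return n;
}

void new_node_demo(void)
{
    struct node *n = new_node(5);

    assert(n != NULL);
    assert(n->next == NULL);            /* holds where null is all-bits-zero */
    assert(n->value == 5);
    free(n);
}
```

With malloc() in place of calloc(), n->next would be indeterminate and the second assertion could not be relied on.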
Further Discussion
In the comments, one person objects to some of the terminology I used. In particular, C++ offers a few different kinds of ways to return a reference to an array that preserve more information about its type, for example:
#include <array>
#include <cstdlib>
using std::size_t;
constexpr size_t size = 16U;
using int_array = int[size];
int_array& get_static_array()
{
static int the_array[size];
return the_array;
}
std::array<int, size>& get_static_std_array()
{
static std::array<int, size> the_array;
return the_array;
}
So, one commenter (if I understand correctly) objects that the phrase “return an array” should only refer to this kind of function. I use the phrase more broadly than that, but I hope that clarifies what happens when you return the_array; in C. You get back a pointer. The relevance to you is that you lose the information about the size of the array, which makes it very easy to write security bugs in C that read or write past the block of memory allocated for an array.
There was also some kind of objection that I shouldn’t have told you that using calloc() instead of malloc() to dynamically allocate structures and arrays that contain pointers will make almost all modern CPUs segfault if you dereference those pointers before you initialize them. For the record: this is not true of absolutely all CPUs, so it’s not portable behavior. Some CPUs will not trap. Some old mainframes will trap on a special pointer value other than zero. However, it’s come in very handy when I’ve coded on a desktop or workstation. Even if you’re running on one of the exceptions, at least your pointers will have the same value each time, which should make the bug more reproducible, and when you debug and look at the pointer, it will be immediately obvious that it’s zero, whereas it will not be immediately obvious that a pointer is garbage.
I'm trying to use shared memory segments in POSIX and am having a lot of trouble figuring out if there is memory at a certain address.
I saw a solution that uses file_size = *(size_t *)ptr
Where ptr is the returned pointer from some call to mmap (.... )
I don't really understand how this works. What does the *(size_t *) typecast do? I assume (size_t)*var would cast the value at pointer var to a size_t type. But then, when I put another asterisk... this would give me a pointer again, wouldn't it?
There is no general way to determine the size of the allocated memory to which a given pointer points, or even whether it points to a valid object. There might be some system-specific ways to determine something similar, but they're likely to be unreliable -- and they can't determine that a pointer points to a valid object, but not to the one that it's supposed to point to.
You'll just have to keep careful track of this information yourself.
The method you describe:
file_size = *(size_t *)ptr;
can work if the memory happens to have been allocated by something that specifically stores the size at the beginning of the allocated region -- but only if you already know that ptr is valid.
ptr could be a pointer of any type (other than a function pointer). The cast (size_t *) converts the value of ptr so you can treat it as a pointer to a size_t object (size_t is an unsigned integer type used to represent sizes). Dereferencing that size_t* value with the * dereference operator gives you the value of the size_t object.
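To contrast the two orders of operations from the question, here is a sketch that uses a malloc'ed block to stand in for the mapped region. Since a void * cannot be dereferenced directly, the second form goes through unsigned char * to have something to compare against. cast_order_demo is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

void cast_order_demo(void)
{
    size_t *region = malloc(sizeof *region);    /* stands in for the mapped region */
    assert(region != NULL);
    *region = 258;

    void *ptr = region;

    /* Convert the pointer first, then dereference: reads a whole size_t. */
    size_t whole = *(size_t *)ptr;
    assert(whole == 258);

    /* Dereference first (as a single byte), then convert the value read. */
    size_t first_byte = (size_t)*(unsigned char *)ptr;
    assert(first_byte != 258);          /* an 8-bit byte cannot hold 258 */

    free(region);
}
```

So the cast in *(size_t *)ptr changes how the pointer is interpreted, and the outer * then fetches the object it points to; the result is a size_t value, not a pointer.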
Here's an example of a hypothetical allocation function that might work this way:
void *allocate(size_t size) {
    void *result = malloc(sizeof (size_t) + size);
    if (result != NULL) {
        *(size_t*)result = size;
    }
    return result;
}
and a function that gives you the currently allocated size:
size_t curr_size(void *ptr) {
    return *(size_t*)ptr;
}
NOTE that this ignores alignment issues. If you're allocating memory for something that requires stricter alignment than size_t does, this can fail badly.
I want to know if it is ok to free() a pointer cast to another type.
For instance if I do this:
char *p = malloc (sizeof (int));
int *q = (int *)p;
free (q);
I get no warning on gcc (-Wall).
On Linux, the man page for free says it is illegal to call free on a pointer that was not returned by malloc(), calloc() or realloc(). But what happens if the pointer was cast to another type in between?
I ask this because I read that the C standard does not require different pointer types (e.g. int* and char*) to have the same size, and I fail to understand how this is possible since they both need to be convertible to a void* in order to call the malloc/free functions.
Is the above code legal?
It's probably safe, but it's not absolutely guaranteed to be safe.
On most modern systems, all pointers (at least all object pointers) have the same representation, and converting from one pointer type to another just reinterprets the bits that make up the representation. But the C standard doesn't guarantee this.
char *p = malloc (sizeof (int));
This gives you a char* pointer to sizeof (int) bytes of data (assuming malloc() succeeds.)
int *q = (int *)p;
This converts the char* pointer to an int* pointer. Since int is bigger than char, an int* pointer could require less information to indicate what it points to. For example, on a word-oriented machine, an int* might just point to a word, while a char* has to contain a word pointer and an offset that indicates which byte within the word it points to. (I've actually worked on a system, the Cray T90, that worked like this.) So a conversion from char* to int* can actually lose information.
free (q);
Since free() takes an argument of type void*, the argument q is implicitly converted from int* to void*. There is no guarantee in the language standard that converting a char* pointer to int*, and then converting the result to void*, gives you the same result as converting a char* directly to a void*.
On the other hand, since malloc() always returns a pointer that's correctly aligned to point to any type, even on a system where int* and char* have different representations, it's unlikely to cause problems in this particular case.
So your code is practically certain to work correctly on any system you're likely to be using, and very very likely to work correctly even on exotic systems you've probably never seen.
Still, I advise writing code that you can easily demonstrate is correct, by saving the original pointer value (of type char*) and passing it to free(). If it takes several paragraphs of text to demonstrate that your code is almost certainly safe, simplifying your assumptions is likely to save you effort in the long run. If something else goes wrong in your program (trust me, something will), it's good to have one less possible source of error to worry about.
A bigger potential problem with your code is that you don't check whether malloc() succeeded. You don't do anything that would fail if it doesn't (both the conversion and the free() call are ok with null pointers), but if you refer to the memory you allocated you could be in trouble.
UPDATE:
You asked whether your code is legal; you didn't ask whether it's the best way to do what you're doing.
malloc() returns a void* result, which can be implicitly converted to any pointer-to-object type by an assignment. free() takes a void* argument; any pointer-to-object type argument that you pass to it will be implicitly converted to void*. This round-trip conversion (void* to something_else* to void*) is safe. Unless you're doing some kind of type-punning (interpreting the same chunk of data as two different types), there's no need for any casts.
Rather than:
char *p = malloc (sizeof (int));
int *q = (int *)p;
free (q);
you can just write:
int *p = malloc(sizeof *p);
...
free(p);
Note the use of sizeof *p in the argument to malloc(). This gives you the size of whatever p points to without having to refer to its type explicitly. It avoids the problem of accidentally using the wrong type:
double *oops = malloc(sizeof (int));
which the compiler likely won't warn you about.
Yes, it's legal. free() takes a void pointer (void*), so the type doesn't matter. As long as the pointer passed to it was returned by malloc/realloc/calloc it's valid.
Yes the pointer is not changed, the cast is merely how the compiler interprets the bunch of bits.
edit: The malloc call returns an address in memory, i.e. a 32 (or 64) bit number.
The cast only tells the compiler how to interpret the value stored at that address (is it a float, an integer, a string, etc.) and, when you do arithmetic on the address, how big a unit it should step in.
The code is legal, but the detour through char * is not necessary. The cast changes only the type of the pointer, not the address it holds, so free(q) releases the same block that malloc returned.