C malloc function's size parameter


I am reading in a book that the malloc function in C takes the number of 'chunks' of memory you wish to allocate as a parameter and determines how many bytes the chunks are based on what you cast the value returned by malloc to. For example on my system an int is 4 bytes:
int *pointer;
pointer = (int *)malloc(10);
Would allocate 40 bytes because the compiler knows that ints are 4 bytes.
This confuses me for two reasons:
1) I was reading up, and the size parameter is actually the number of bytes you want to allocate and is not related to the sizes of any types.
2) malloc is a function that returns an address. How does it adjust the size of the memory it allocated based on an external cast of the address it returned from void to a different type? Is it just some compiler magic I am supposed to accept?
I feel like the book is wrong. Any help or clarification is greatly appreciated!
Here is what the book said:
char *string;
string = (char *)malloc(80);
The 80 sets aside 80 chunks of storage. The chunk size is set by the typecast, (char *), which means that malloc() is finding storage for 80 characters of text.

Yes, the book is wrong and you are correct. Please throw away that book.
Also, do let everyone know the name of the book so we can permanently put it on our never-to-recommend blacklist.
Good Read:
What is the Best Practice for malloc?
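To make the point concrete, here is a minimal sketch showing that the argument to malloc() is a byte count that the caller has to compute; the cast plays no part in it. The helper names bytes_for_ints and alloc_ints are made up for this illustration, and the sketch does not guard against overflow of the multiplication.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: number of bytes needed to hold n ints.
   malloc() must be given this value; it never sees the pointer type. */
size_t bytes_for_ints(size_t n)
{
    return n * sizeof(int);
}

/* Hypothetical helper: room for n ints, or NULL if the allocation fails. */
int *alloc_ints(size_t n)
{
    return malloc(bytes_for_ints(n));
}
```

With this, alloc_ints(10) requests 10 * sizeof(int) bytes, whereas a bare malloc(10) requests exactly 10 bytes no matter what the result is cast to.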

When using malloc(), use the sizeof operator and apply it to the object being allocated, not the type of it.
Not a good idea:
int *pointer = malloc (10 * sizeof (int)); /* works, but repeats the type */
Better method:
int *pointer = malloc (10 * sizeof *pointer);
Rationale: If you change the data type that pointer points to, you don't need to change the malloc() call as well. Maintenance win.
Also, this method is less prone to errors during code development. You can check that it's correct without having to look at the declaration in cases where the malloc() call occurs apart from the variable declaration.
Regarding your question about casting malloc(), note that there is no need for a cast on malloc() in C today. Also, if the data type should change in a future revision, any cast there would have to be changed as well or become another source of errors. Also, always make sure you have <stdlib.h> included. People often put in the cast to get rid of a warning that results from not including that header. The other reason is that the cast is required in C++, but you typically don't write C++ code that uses malloc().
The exact details of how malloc() works internally are not defined in the spec; in effect it is a black box with a well-defined interface. To see how it works in practice, you'd want to look at an open source implementation. But malloc() can (and does) vary wildly across platforms.
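As a small sketch of the sizeof *pointer idiom with an error check, assuming a made-up element type struct sample and a made-up helper make_samples: if the element type changes later, the malloc() line stays as it is.

```c
#include <stdlib.h>

/* Hypothetical element type; changing it needs no change to the malloc() call. */
struct sample {
    double value;
    int    count;
};

/* Allocates an array of n samples; the size is derived from the pointer itself. */
struct sample *make_samples(size_t n)
{
    struct sample *s = malloc(n * sizeof *s);   /* no cast, no repeated type */
    if (s == NULL)
        return NULL;                            /* caller must check for failure */
    for (size_t i = 0; i < n; i++) {
        s[i].value = 0.0;
        s[i].count = 0;
    }
    return s;
}
```

The caller owns the returned block and releases it with a single free().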

Related

Allocating space of 1 char to an int pointer

I am learning C and am a bit confused about why I don't get any warnings/errors from GCC with the following snippet. I am allocating space of 1 char to a pointer to int, is it some changes done by GCC (like optimizing the allocated space for an int silently)?
#include <stdlib.h>
#include <stdio.h>
typedef int *int_ptr;
int main()
{
int_ptr ip;
ip = calloc(1, sizeof(char));
*ip = 1000;
printf("%d", *ip);
free(ip);
return 0;
}
Update
Having read the answers below, would it still be unsafe and risky if I did it the other way around, e.g. allocating space for an int to a pointer to char? The source of my confusion is the following answer on Rosetta Code: in the function StringArray StringArray_new(size_t size), the coder seems to be doing exactly this with this->elements = calloc(size, sizeof(int)); where this->elements is a char** elements.

The result of calloc is of the type void* which implicitly gets converted to an int* type. The C programming language and GCC simply trust the programmer to write sensible casts and thus do not produce any warnings. Your code is technically valid C, even though it produces an invalid memory write at runtime. So no, GCC does not implicitly allocate space for an integer.
If you would like to see warnings of this kind before running (or compilation), you may want to use, e.g., Clang Static Analyzer.
If you would like to see errors of this kind at runtime, run your program with Valgrind.
Update
Allocating space for 1 int (i.e. 4 bytes, generally) and then interpreting it as a char (1 char is 1 byte) will not result in any memory errors, as the space required for an int is larger than the space required for a char. In fact, you could use the result as an array of 4 chars.
The sizeof operator returns the size of that type as a number of bytes. The calloc function then allocates that number of bytes for each requested element; it is not aware of what type will be stored in the allocated segment.
While this does not produce any errors, it can indeed be considered a "risky and unsafe" programming practice. Exceptions exist for advanced applications where you'd want to reuse the same memory segment for storing values of a different type.
The code on Rosetta Code you linked to contains a bug in exactly that line. It should allocate memory for a char* instead of an int. These are generally not equal. On my machine, the size of an int is 4 bytes, while the size of a char* is 8 bytes.
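A minimal sketch of the sizing that avoids both problems, using sizeof on the pointed-to object so the element size always matches the pointer type. The helper names store_and_read and new_string_slots are invented for this example and are not taken from the Rosetta Code page.

```c
#include <stdlib.h>

/* Hypothetical helper: stores one int in heap memory sized for an int,
   reads it back and releases the memory. Returns -1 if allocation fails. */
int store_and_read(int value)
{
    int *ip = calloc(1, sizeof *ip);    /* sizeof *ip is sizeof(int), not sizeof(char) */
    if (ip == NULL)
        return -1;
    *ip = value;
    int result = *ip;
    free(ip);
    return result;
}

/* Same idea for an array of char pointers: each element is a char *,
   so the element size is sizeof *slots, not sizeof(int). */
char **new_string_slots(size_t n)
{
    char **slots = calloc(n, sizeof *slots);
    return slots;                       /* NULL on failure; caller frees */
}
```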

C has very little type safety and malloc has none. It allocates exactly as many bytes as you tell it to allocate. It's not the compiler's duty to warn about it, it is the programmer's duty to get the parameters right.
The reason why it "seems to work" is undefined behavior; *ip = 1000; might just as well crash. See: What is undefined behavior and how does it work?
Also you should never hide pointers behind typedef. This is very bad practice and only serves to confuse the programmer and everyone reading the code.

The compiler only cares that you pass the right number and types of arguments to calloc - it doesn’t check to see if those arguments make sense, since that’s a runtime issue.
Yes, you could probably add some special case logic to the compiler when both arguments are constant expressions and sizeof operations like in this case, but how would it handle a case where both arguments are runtime variables like calloc( num, size );?
This is one of those cases where C assumes you’re smart enough to know what you’re doing.

The compiler only checks syntax, not semantics.
Your code's syntax is OK, but its semantics are not.

What is wrong with how I'm dynamically allocating space for this 2d array?

I'm trying to create a 2D array that will be able to store each character of a .txt file as an element in the 2D array.
How do I dynamically allocate space for it?
This is what I've done so far to malloc it (this was copied from GeeksForGeeks):
char *arr[rownum2];
for (i = 0; i < rownum2; i++) {
    arr[i] = (char *)malloc(colnum * sizeof(char));
}
However, I think this is the source of serious memory related issues later on in my program, and I've also been told some parts of this are unnecessary.
Can I please get the most suitable way to dynamically allocate memory for the 2D array in this specific scenario?

The code you have posted is 'OK', so long as you remember to call free() on the allocated memory, later in your code, like this:
for (i=0;i<rownum2;i++) free(arr[i]);
...and I've also been told some parts of this are unnecessary.
The explicit cast is unnecessary, so, instead of:
arr[i] = (char *)malloc(colnum*sizeof(char));
just use:
arr[i] = malloc(colnum*sizeof(char));
The sizeof(char) is also, strictly speaking, unnecessary (char will always have a size of 1) but you can leave that, for clarity.

Technically, it's not a 2D array but an array of pointers to separately allocated rows. The difference is that a true 2D array cannot have rows of different lengths, whereas your array of row pointers can.
If you don't need that, you can allocate rownum2*colnum elements in one block and access each element as arr[x+colnum*y]. This is often used because all the data is kept in one place, which is friendlier to the CPU cache and avoids the allocator's bookkeeping overhead for every separately allocated row.
Also, even rows of different lengths can be placed in a 1D array and accessed like a 2D one (at least if they do not change size, or are read-only). You can allocate char body[total_size], read the whole file into it, allocate char* arr[rownum2] and set each arr[i]=body+line_beginning_offset.
BTW, don't forget that these rows are not actual C strings because they are not null-terminated. You'll need an additional column for the null terminator. If you store ASCII art, a 2D array is a very good solution.
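The single-block layout described above could look roughly like this. It is a sketch only: grid_new and grid_index are hypothetical helper names, and calloc is used so that every cell starts out as '\0'.

```c
#include <stdlib.h>

/* Hypothetical helper: one contiguous, zero-filled block of rows * cols chars.
   Returns NULL on failure; release it with a single free(). */
char *grid_new(size_t rows, size_t cols)
{
    return calloc(rows, cols);
}

/* Offset of column x in row y, matching arr[x + colnum * y]. */
size_t grid_index(size_t x, size_t y, size_t cols)
{
    return x + cols * y;
}
```

For example, with 4 columns the cell at column 2 of row 1 lives at offset 2 + 4 * 1 = 6, and freeing the grid takes one free() call instead of one per row.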

The only serious problem I see in your code is that you are casting the value returned by malloc(3), and you have probably also forgotten to #include <stdlib.h> (a dangerous cocktail). If so, the cast hides the fact that the returned value is being corrupted. Let me explain:
First, you probably have (I have to guess) a 64-bit architecture, as is common today, so pointers are 64 bits wide on your system while int is 32 bits wide.
You have probably forgotten to #include <stdlib.h> in your code (again, I have to guess), so the compiler assumes that malloc(3) returns int. This is legacy C behaviour for a function that is called without a prototype in scope. The compiler therefore generates code that takes only a 32-bit value from malloc(3) rather than the 64-bit pointer that malloc(3) (probably) actually returns.
You then cast that 32-bit int (already incorrect) to a 64-bit pointer (even more incorrect). The cast makes any warning about converting between integers and pointers disappear, because the compiler assumes that you, as a wise programmer, put it there on purpose and know what you are doing.
The returned value is first truncated to 32 bits (undefined behaviour) and then converted from int to char * (more undefined behaviour) before your code uses it. The pointer you end up with can therefore be completely different from the one malloc(3) returned, so your pointers point somewhere else and the program breaks at run time.
Your code should be something like this (again a snippet, as your code is not complete):
#include <stdlib.h> /* for malloc() */
/* ... */
char *arr[rownum2];
for (i = 0; i < rownum2; i++) {
    arr[i] = malloc(colnum); /* sizeof(char) is always 1 */
}
Finally, one recommendation:
Please read (and follow) the "how to create a minimal, verifiable example" page, because your probable missing #include is something I had to guess. Posting only a snippet often leaves the actual mistake out, and we have to guess what might be happening. This is the most important thing to learn from this answer. Post complete, compilable and verifiable code (that is, code that you have checked actually fails before posting, not a snippet you selected because you guess the problem is there). The code you posted does not allow anybody to verify why it fails, because it must be completed (and probably repaired) before it can be run.

using malloc over array

A similar question may already exist on SO, but I didn't find it. Here is the scenario:
Case 1
void main()
{
char g[10];
char a[10];
scanf("%[^\n] %[^\n]",a,g);
swap(a,g);
printf("%s %s",a,g);
}
Case 2
void main()
{
char *g=malloc(sizeof(char)*10);
char *a=malloc(sizeof(char)*10);
scanf("%[^\n] %[^\n]",a,g);
swap(a,g);
printf("%s %s",a,g);
}
I'm getting the same output in both cases. So my question is: when should I prefer malloc() over an array, or vice versa, and why? The common definition I found is that malloc() provides dynamic allocation. Is that the only difference between them? Could anyone please explain with an example what dynamic means, given that we are specifying the size in malloc() anyway?

The principal difference relates to when and how you decide the array length. Using fixed length arrays forces you to decide your array length at compile time. In contrast, using malloc allows you to decide the array length at runtime.
In particular, deciding at runtime allows you to base the decision on user input, on information not known at the time you compile. For example, you may allocate the array to be a size big enough to fit the actual data input by the user. If you use fixed length arrays, you have to decide at compile time an upper bound, and then force that limitation onto the user.
Another more subtle issue is that allocating very large fixed length arrays as local variables can lead to stack overflow runtime errors. And for that reason, you sometimes prefer to allocate such arrays dynamically using malloc.
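As an illustration of sizing at runtime, the following sketch allocates exactly as much memory as a given string needs, something a fixed-length array cannot do without picking an upper bound in advance. The function name copy_string is an assumption made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of s whose size is decided at run time.
   The caller owns the result and must free() it. Returns NULL on failure. */
char *copy_string(const char *s)
{
    size_t len = strlen(s);
    char *copy = malloc(len + 1);       /* +1 for the terminating '\0' */
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len + 1);           /* copies the terminator as well */
    return copy;
}
```

The same call works for a 5-character input and a 5-megabyte input; only the value passed to malloc() differs.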

Please any one explain with example, what is the meaning of dynamic although we are specifying the size.
I suspect this was significant before C99. Before C99, you couldn't have dynamically-sized auto arrays:
void somefunc(size_t sz)
{
char buf[sz];
}
is valid C99 but invalid C89. However, using malloc(), you can specify any value, you don't have to call malloc() with a constant as its argument.
Also, to clear up what other purpose malloc() has: you can't return stack-allocated memory from a function, so if your function needs to return allocated memory, you typically use malloc() (or some other member of the malloc family, including realloc() and calloc()) to obtain a block of memory. To understand this, consider the following code:
char *foo()
{
char buf[13] = "Hello world!";
return buf;
}
Since buf is a local variable, it's invalidated at the end of its enclosing function, so using the returned pointer results in undefined behavior. The function above is erroneous. However, a pointer obtained using malloc() remains valid across function calls (until you call free() on it):
#include <stdlib.h>
#include <string.h>
char *bar()
{
    char *buf = malloc(13);
    if (buf == NULL)
        return NULL; /* allocation failed */
    strcpy(buf, "Hello World!");
    return buf;
}
This is absolutely valid.

I would add that in this particular example, malloc() is very wasteful, as there is more memory allocated for the array than what would appear [due to overhead in malloc] as well as the time it takes to call malloc() and later free() - and there's overhead for the programmer to remember to free it - memory leaks can be quite hard to debug.
Edit: Case in point, your code is missing the free() at the end of main() - may not matter here, but it shows my point quite well.
So small structures (less than 100 bytes) should typically be allocated on the stack. If you have large data structures, it's better to allocate them with malloc (or, if it's the right thing to do, use globals - but this is a sensitive subject).
Clearly, if you don't know the size of something beforehand, and it MAY be very large (kilobytes in size), it is definitely a case of "consider using malloc".
On the other hand, stacks are pretty big these days (for "real computers" at least), so allocating a couple of kilobytes of stack is not a big deal.

Why does calloc require two parameters and malloc just one?

It's very bothersome for me to write calloc(1, sizeof(MyStruct)) all the time. I don't want to wrap this function or anything like that. I want to know what the two parameters give me. If they give me something, why doesn't malloc have two parameters too?
By the way, I searched for an answer to this question but didn't find a really good one. Those answers were that calloc can allocate larger blocks than malloc can, and so on.
I saw another answer saying that calloc allocates an array. With malloc I can multiply to get an array, and for a single object I can call it without the leading 1.

Historical reasons.
At the time calloc was introduced, the malloc function didn't exist and the calloc function would provide the correct alignment for an object of one element's size.
When malloc was introduced afterwards, it was decided the memory returned would be properly aligned for any use (which costs more memory) and so only one parameter was necessary. The API for calloc was not changed but calloc now also returns memory properly aligned for any use.
EDIT:
See the discussion in the comments and the interesting input from @JimBalter.
My first statement regarding the introduction of malloc and calloc may be totally wrong.
Also, the real reasons may well be unrelated to alignment. C's history has been shaped a lot by compiler implementers. malloc and calloc could have come from different groups of compiler implementers, which would explain the API difference. I actually favor this explanation as the real reason.

The only reason I could come up with is that
int *foo = calloc(42, sizeof *foo);
is one character shorter than
int *foo = malloc(42 * sizeof *foo);
The real reason is apparently lost to the decades of C history and needs a programming language archaeologist to unearth, but might be related to the following fact:
In contrast to malloc() - which needs to return a memory block aligned in accordance to the full block size - when using calloc() as intended, the memory block would only need to be aligned in accordance to the size passed as second argument. However, the C standard forbids this optimization in conforming implementations.

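It is just by design.
You could write your own calloc:
#include <stdlib.h>
#include <string.h>
void *mycalloc(size_t num, size_t size)
{
    /* reject requests where num * size would overflow size_t */
    if (size != 0 && num > (size_t)-1 / size)
        return NULL;
    void *block = malloc(num * size);
    if (block != NULL)
        memset(block, 0, num * size);
    return block;
}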

You shouldn't allocate objects with calloc (or malloc or anything like that). Even though calloc zero-initializes the memory, the object still hasn't been constructed as far as C++ is concerned. Use constructors for that:
class MyClass
{
private:
short m_a;
int m_b;
long m_c;
float m_d;
public:
MyClass() : m_a(0), m_b(0), m_c(0), m_d(0.0) {}
};
And then instantiate it with new (or on the stack if you can):
MyClass* mc = new MyClass();

Does malloc() allocate a contiguous block of memory?

I have a piece of code written by a very old school programmer :-). It goes something like this:
typedef struct ts_request
{
ts_request_buffer_header_def header;
char package[1];
} ts_request_def;
ts_request_def* request_buffer =
malloc(sizeof(ts_request_def) + (2 * 1024 * 1024));
The programmer is basically relying on a buffer overflow concept. I know the code looks dodgy, so my questions are:
1) Does malloc always allocate a contiguous block of memory? If the blocks are not contiguous, this code will fail big time.
2) Will free(request_buffer) free all the bytes allocated by malloc, i.e. sizeof(ts_request_def) + (2 * 1024 * 1024), or only the bytes of the size of the structure, sizeof(ts_request_def)?
3) Do you see any evident problems with this approach? I need to discuss this with my boss and would like to point out any loopholes in this approach.

To answer your numbered points.
1) Yes.
2) All the bytes. Malloc/free doesn't know or care about the type of the object, just the size.
3) It is strictly speaking undefined behaviour, but a common trick supported by many implementations. See below for other alternatives.
The C standard has allowed flexible array members since ISO/IEC 9899:1999 (informally C99).
An example of this would be:
#include <stdlib.h>
int main(void)
{
    struct { size_t x; char a[]; } *p;
    p = malloc(sizeof *p + 100);
    if (p)
    {
        /* You can now access up to p->a[99] safely */
        free(p);
    }
    return 0;
}
This now standardized feature allows you to avoid using the common, but non-standard, implementation extension that you describe in your question. Strictly speaking, using a non-flexible array member and accessing beyond its bounds is undefined behaviour, but many implementations document and encourage it.
Furthermore, gcc allows zero-length arrays as an extension. Zero-length arrays are illegal in standard C, but gcc introduced this feature before C99 gave us flexible array members.
In a response to a comment, I will explain why the snippet below is technically undefined behaviour. Section numbers I quote refer to C99 (ISO/IEC 9899:1999)
struct {
char arr[1];
} *x;
x = malloc(sizeof *x + 1024);
x->arr[23] = 42;
Firstly, 6.5.2.1#2 shows a[i] is identical to (*((a)+(i))), so x->arr[23] is equivalent to (*((x->arr)+(23))). Now, 6.5.6#8 (on the addition of a pointer and an integer) says:
"If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."
For this reason, because x->arr[23] is not within the array, the behaviour is undefined. You might still think that it's okay because the malloc() implies the array has now been extended, but this is not strictly the case. Informative Annex J.2 (which lists examples of undefined behaviour) provides further clarification with an example:
An array subscript is out of range, even if an object is apparently accessible with the
given subscript (as in the lvalue expression a[1][7] given the declaration int
a[4][5]) (6.5.6).

3 - That's a pretty common C trick to allocate a dynamic array at the end of a struct. The alternative would be to put a pointer into the struct and then allocate the array separately, and not forgetting to free it too. That the size is fixed to 2mb seems a bit unusual though.

This is a standard C trick, and it isn't more dangerous than any other buffer.
If you are trying to show your boss that you are smarter than the "very old school programmer", this code isn't the case to do it with. Old school is not necessarily bad. It seems the "old school" guy knows enough about memory management ;)

1) Yes it does, or malloc will fail if there isn't a large enough contiguous block available. (A failure with malloc will return a NULL pointer)
2) Yes it will. The internal memory allocation will keep track of the amount of memory allocated with that pointer value and free all of it.
3) It's a bit of a language hack, and its use is a bit dubious. It's still subject to buffer overflows as well; it just may take attackers slightly longer to find a payload that will cause one. The cost of the 'protection' is also pretty hefty (do you really need >2mb per request buffer?). It's also very ugly, although your boss may not appreciate that argument :)

I don't think the existing answers quite get to the essence of this issue. You say the old-school programmer is doing something like this;
typedef struct ts_request
{
ts_request_buffer_header_def header;
char package[1];
} ts_request_def;
ts_request_def* request_buffer =
malloc(sizeof(ts_request_def) + (2 * 1024 * 1024));
I think it's unlikely he's doing exactly that, because if that's what he wanted to do he could do it with simplified equivalent code that doesn't need any tricks;
typedef struct ts_request
{
ts_request_buffer_header_def header;
char package[2*1024*1024 + 1];
} ts_request_def;
ts_request_def* request_buffer =
malloc(sizeof(ts_request_def));
I'll bet that what he's really doing is something like this;
typedef struct ts_request
{
ts_request_buffer_header_def header;
char package[1]; // effectively package[x]
} ts_request_def;
ts_request_def* request_buffer =
malloc( sizeof(ts_request_def) + x );
What he wants to achieve is allocation of a request with a variable package size x. It is of course illegal to declare the array's size with a variable, so he is getting around this with a trick. It looks as if he knows what he's doing to me, the trick is well towards the respectable and practical end of the C trickery scale.
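For comparison, the same variable-size allocation can be sketched with a C99 flexible array member instead of the [1] trick. The struct layout and the names request_header, request and request_new below are stand-ins invented for this example; the real header type from the question is not shown in the source.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the header type in the question. */
struct request_header {
    size_t package_len;
};

struct request {
    struct request_header header;
    char package[];                     /* C99 flexible array member, must be last */
};

/* Allocates a request with room for x bytes of package data, zero-filled. */
struct request *request_new(size_t x)
{
    if (x > (size_t)-1 - sizeof(struct request))
        return NULL;                    /* sizeof *r + x would overflow */
    struct request *r = malloc(sizeof *r + x);
    if (r == NULL)
        return NULL;
    r->header.package_len = x;
    memset(r->package, 0, x);
    return r;
}
```

One malloc() and one free() cover both the header and the package bytes, and indexing r->package up to x - 1 is well defined.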

As for #3, without more code it's hard to answer. I don't see anything wrong with it, unless it's happening a lot. I mean, you don't want to allocate 2mb chunks of memory all the time. You also don't want to do it needlessly, e.g. if you only ever use 2k.
The fact that you don't like it for some reason isn't sufficient to object to it, or justify completely re-writing it. I would look at the usage closely, try to understand what the original programmer was thinking, look closely for buffer overflows (as workmad3 pointed out) in the code that uses this memory.
There are lots of common mistakes that you may find. For example, does the code check to make sure malloc() succeeded?

The exploit (question 3) is really up to the interface towards this structure of yours. In context this allocation might make sense, and without further information it is impossible to say if it's secure or not.
But if you mean problems with allocating memory bigger than the structure, this is by no means a bad C design (I wouldn't even say it's THAT old school... ;) )
Just a final note here: the point of having a char[1] is that the terminating NUL will always fit in the declared struct, meaning there can be 2 * 1024 * 1024 characters in the buffer and you don't have to account for the NUL with a "+1". It might look like a small feat, but I just wanted to point it out.

I've seen and used this pattern frequently.
Its benefit is to simplify memory management and thus avoid risk of memory leaks. All it takes is to free the malloc'ed block. With a secondary buffer, you'll need two free. However one should define and use a destructor function to encapsulate this operation so you can always change its behavior, like switching to secondary buffer or add additional operations to be performed when deleting the structure.
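To show what the alternative costs, here is a rough sketch of the two-allocation layout with a destructor function that hides the number of free() calls from callers. All names (struct message, message_create, message_destroy) are hypothetical and chosen only for this illustration.

```c
#include <stdlib.h>

/* Hypothetical layout using a separately allocated buffer. */
struct message {
    size_t len;
    char  *data;                        /* second allocation */
};

struct message *message_create(size_t len)
{
    struct message *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->data = calloc(len ? len : 1, 1); /* avoid a zero-size request */
    if (m->data == NULL) {
        free(m);                        /* do not leak the struct on failure */
        return NULL;
    }
    m->len = len;
    return m;
}

/* The single place that knows how many free() calls are needed. */
void message_destroy(struct message *m)
{
    if (m == NULL)
        return;
    free(m->data);
    free(m);
}
```

If the layout later switches to a single block with a trailing array, only message_create and message_destroy need to change.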
Access to array elements is also slightly more efficient but that is less and less significant with modern computers.
The code will also correctly work if memory alignment changes in the structure with different compilers as it is quite frequent.
The only potential problem I see is if the compiler permutes the order of storage of the member variables because this trick requires that the package field remains last in the storage. I don't know if the C standard prohibits permutation.
Note also that the size of the allocated buffer will most probably be bigger than required, at least by one byte with the additional padding bytes if any.

Yes. malloc returns only a single pointer - how could it possibly tell a requester that it had allocated multiple discontiguous blocks to satisfy a request?

I would like to add that not only is it common, but I might also call it standard practice, because the Windows API is full of such uses.
Check the very common BITMAP header structure, for example.
http://msdn.microsoft.com/en-us/library/aa921550.aspx
The last RGB quad member is an array of size 1, which depends on exactly this technique.

This common C trick is also explained in this StackOverflow question (Can someone explain this definition of the dirent struct in solaris?).

In response to your third question.
free always releases all the memory allocated at a single shot.
int* i = (int*) malloc(1024*2);
free(i+1); // wrong: undefined behaviour, this is not the pointer malloc returned
free(i); // correct: releases all 2KB of memory

The answer to question 1 and 2 is Yes
About ugliness (ie question 3) what is the programmer trying to do with that allocated memory?

The thing to realize here is that malloc does not see the calculation being made in this call:
malloc(sizeof(ts_request_def) + (2 * 1024 * 1024));
It's the same as
size_t sz = sizeof(ts_request_def) + (2 * 1024 * 1024);
malloc(sz);
You might think that it's allocating two chunks of memory, and in your mind they are "the struct" and "some buffers". But malloc doesn't see that at all.
