I understand that malloc is used to dynamically allocate memory. In my code, I have the following function that I sometimes call:
int memory_get_log(unsigned char day, unsigned char date, unsigned char month){
char fileName[11];
unsigned long readItems, itemsToRead;
F_FILE *file;
sprintf(fileName, "%s_%u%u%u%s", "LOG", day, date, month, ".bin");
file = f_open(fileName , "r");
itemsToRead = f_filelength( fileName );
//unsigned char *fileData = (unsigned char *) malloc(itemsToRead);
unsigned char fileData[itemsToRead]; //here I am not using malloc
readItems = f_read(fileData, 1, itemsToRead, file);
transmit_data(fileData, itemsToRead);
f_close(file);
return 0;
}
As you may see, the number of items I read from the file can be different each time. The line
unsigned char fileData[itemsToRead]; is used to read these variable sized files. I can see that I am allocating memory dynamically in some way. This function works fine. Do I really need to use malloc here?
Is there anything wrong with the way I am declaring this array?
TL;DR
If you don't know what you're doing, use malloc or a fixed-size array in all situations. VLAs are never strictly necessary. And do note that VLAs cannot be static or global.
Do I really need to use malloc here?
Yes. You're reading a file. Files are typically far bigger than what's suitable for a VLA. VLAs should only be used for small arrays, if at all.
Long version
Is there anything wrong with the way I am declaring this array?
It depends. VLAs were made optional in C11, so strictly speaking you are relying on conditional compiler support, which reduces portability. In the future, VLAs might get removed from your compiler (the chance is probably extremely low). Maybe you also want to recompile the code on a compiler without support for VLAs. That risk analysis is up to you. But I might mention that the same is true for alloca: although commonly available, it's not required by the standard.
Another problem is if the allocation fails. If you're using malloc, you have a chance to recover from this, but if you're only going to do something like this:
unsigned char *fileData = malloc(itemsToRead);
if(!fileData)
exit(EXIT_FAILURE);
That is, if you just exit on failure and don't try to recover, then it does not really matter. At least not from a pure recovery point of view.
But also, although the C standard does not impose any requirement that VLAs end up on the stack or the heap, it's common in practice to put them on the stack. This means that the risk of the allocation failing due to insufficient available memory is much, much higher. On Linux the default stack is usually 8 MB, and on Windows 1 MB; in almost all cases, the available heap is much larger. The declaration char arr[n] is basically the same as char *arr = alloca(n), with the exception of how the sizeof operator works.
While I can understand that you might sometimes want to use the sizeof operator on a VLA, I find it hard to see a real need for it. After all, the size can never change, and it is known at the point of allocation. So instead of:
int arr[n];
...
for(int i=0; i<sizeof(arr)/sizeof(*arr); i++) ...
Just do:
const size_t size = n;
int arr[size];
...
for(int i=0; i<size; ...
VLAs are not a replacement for malloc. They are a replacement for alloca. If you don't want to change a malloc to an alloca, then you should not change it to a VLA either.
Also, in many situations where a VLA would seem to be a good idea, it is ALSO a good idea to check that the size is below a certain limit, like this:
int foo(size_t n)
{
if(n > LIMIT) { /* Handle error */ }
int arr[n];
/* Code */
}
That would work, but compare it to this:
int foo(size_t n)
{
int *arr = malloc(n*sizeof(*arr));
if(!arr) { /* Handle error */ }
/* Code */
free(arr);
}
You did not really make things much easier. It's still an error check, so the only thing you really got rid of was the free call. I might also add that there's a MUCH higher risk that a VLA allocation fails because the size is too big. So if you KNOW that the size is small, the check is not necessary; but then again, if you KNOW that it is small, just use a regular fixed-size array that will fit what you need.
However, I will not deny that there are some advantages to VLAs. You can read about them here. But IMO, while they have those advantages, they are not worth it. Whenever you find VLAs useful, I would say you should at least consider switching to another language.
Also, one advantage of VLAs (and of alloca) is that they are typically faster than malloc. So if you have performance issues, you might want to switch to alloca instead of malloc. A malloc call involves asking the allocator (and ultimately the operating system) for a suitable piece of memory, which it has to search for. An alloca call, on the other hand, is typically implemented by just adjusting the stack pointer, often in a single CPU instruction.
There are many things to consider, but I would avoid using VLAs. If you ask me, the biggest risk with them is that, since they are so easy to use, people become careless with them. For the few cases where I find them suitable, I would use alloca instead, because then I don't hide the dangers.
Short summary
VLAs are optional in C11 and later, so strictly speaking you're relying on conditional compiler support. However, the same is true for alloca. So if this is a very big concern, use fixed-size arrays if you don't want to use malloc.
VLAs are syntactic sugar (not 100% correct, especially when dealing with multidimensional arrays) for alloca, not for malloc. So don't use them instead of malloc. With the exception of how sizeof works on a VLA, they offer no benefit at all except a somewhat simpler declaration.
VLAs are (usually) stored on the stack, while allocations done by malloc are (usually) stored on the heap, so a big allocation has a much higher risk of failing.
You cannot check if a VLA allocation failed or not, so it can be a good idea to check if the size is too big in advance. But then we have an error check just as we do with checking if malloc returned NULL.
A VLA cannot be global nor static. The static part alone will likely not cause any problems whatsoever, but if you want a global array, then you're forced to use malloc or a fixed size array.
This function works fine.
No it does not. It has undefined behavior. As pointed out by Jonathan Leffler in comments, the array fileName is too short. It would need to be at least 12 bytes to include the \0-terminator. You can make this a bit safer by changing to:
snprintf(fileName,
sizeof(fileName),
"%s_%u%u%u%s",
"LOG", day, date, month, ".bin");
With snprintf, the too-small array would instead manifest as a truncated filename ending in .bi instead of .bin, which is a much better bug than the undefined behavior you currently have.
You also have no error checks in your code. I would rewrite it like this. And for those who think that goto is bad: well, it usually is, but error handling is one of the uses that is both practical and widely accepted among experienced C coders. Another common use is breaking out of nested loops, but that's not applicable here.
int memory_get_log(unsigned char day, unsigned char date, unsigned char month){
char fileName[12];
unsigned long readItems, itemsToRead;
int ret = 0;
F_FILE *file;
snprintf(fileName,
sizeof(fileName),
"%s_%u%u%u%s", "LOG",
day, date, month, ".bin");
file = f_open(fileName , "r");
if(!file) {
ret = 1;
goto END;
}
itemsToRead = f_filelength( fileName );
unsigned char *fileData = malloc(itemsToRead);
if(!fileData) {
ret=2;
goto CLOSE_FILE;
}
readItems = f_read(fileData, 1, itemsToRead, file);
// Maybe not necessary. I don't know. It's up to you.
if(readItems != itemsToRead) {
ret=3;
goto FREE;
}
// Assuming transmit_data has some kind of error check
if(!transmit_data(fileData, itemsToRead)) {
ret=4;
}
FREE:
free(fileData);
CLOSE_FILE:
f_close(file);
END:
return ret;
}
If a function can only ever return 0, then it's pointless to return anything; declare it as void instead. Here I used the return value to make it possible for the caller to detect errors and their type.
Firstly, the line unsigned char fileData[itemsToRead] asks for memory on the stack, and this will be a terrible mistake if the file is big. You should use malloc to get memory from the heap.
Secondly, if the file really is big, you should consider using virtual memory or loading it in chunks, e.g. with fseek.
Related
When writing a program in which I ask the user to enter a number N, which I then have to use to allocate the memory for an int array, what is the correct way to handle this:
First approach:
int main() {
int array[],n;
scanf("%d\n",&n);
array = malloc(n * sizeof(int));
}
or the second approach:
int main() {
int n;
scanf("%d\n",&n);
int array[n];
}
Either one will work (though the first case needs to be changed from int array[] to int *array); the difference depends on where the array is stored.
In the first case, the array will be stored in the heap, while in the second case, it'll (most likely) be stored on the stack. When it's stored on the stack, the maximum size of the array will be much more limited based on the limit of the stack size. If it's stored in the heap, however, it can be much larger.
Your second approach is called a variable length array (VLA), and is supported only as of C99. This means that if you intend your code to be compatible with older compilers (or to be read and understood by older people..), you may have to fall back to the first option, which is more standard. Note that dynamically allocated data requires proper maintenance, the most important part being to free it when you're done (which you don't do in your program).
Assuming you meant to use int *array; instead of int array[]; (the latter wouldn't compile).
Always use the first approach unless you know the array size is going to be very small and you have intimate knowledge of the platforms your code will be running on. Naturally, the question arises: how small is small enough?
The main problem with the second approach is that there's no portable way to verify whether the VLA (Variable Length Array) allocation succeeded. The advantage is that you don't have to manage the memory, but that's hardly an "advantage" considering the risk of undefined behaviour if the allocation fails.
VLAs were introduced in C99 and made optional in C11, which suggests the committee found them not-so-useful. Also, C11 compilers may not support them, so you have to perform an additional check for support by testing whether __STDC_NO_VLA__ is defined.
Automatic storage allocation for an array as small as int my_arr[10]; could fail. This is an extreme and unrealistic example on modern operating systems, but possible in theory. So I suggest avoiding VLAs in any serious project.
You did say you wanted a COMMAND LINE parameter:
int main(int argc, char **argv)
{
    int *array;
    int count;

    if (argc < 2)
        return 1;
    count = atoi(argv[1]);
    array = malloc(sizeof(int) * count);
    . . . . .
    free(array);
    return 0;
}
Before starting, I know that there have been quite a lot of questions about this, and I hope that my question is not redundant. I have read quite a lot on the internet, and I have a specific question: when allocating memory in C, what is the best way to do it?
So imagine I want to allocate an int* nb; is there a better way to allocate the memory?
First solution I have read:
nb=malloc(sizeof *nb);
Second solution I have read:
nb=malloc(sizeof(nb));
Third solution I have read:
nb=malloc(sizeof(int*));
The reason I am asking this is because I have read all three solutions on the internet and, if I understood well, the allocation size may differ depending on the system you are on; that would be the reason for using sizeof(nb), which may allocate more memory than sizeof(int). So am I wrong?
[EDIT]
The aim here is mostly to allocate an array of arbitrary size
[/EDIT]
Thanks for any help and again, hoping my question is not redundant
int *IntPtr=NULL;
IntPtr=malloc(100*sizeof(int));
is the same as
int *IntPtr=NULL;
IntPtr=malloc(100*sizeof(*IntPtr));
because IntPtr is of type int*, so dereferencing it (i.e. *IntPtr) leads to type int.
On the other hand:
nb=malloc(sizeof(nb));
is a bad choice, since sizeof(nb) is the size of the pointer itself, i.e. the size of an address in memory. On 32-bit systems this is always 4, no matter what the type of nb is.
If you later need to resize a dynamic allocation, you should look at realloc.
The advantage of the first solution is that you can change the type in one place (e.g. from int* nb to double* nb) and the "malloc()" will automatically be correct for the new type and won't need to be modified.
The advantage of the third solution is that there are more visual hints for the programmer. E.g. if you see nb=malloc(sizeof(int)); you don't need to find the type of nb to determine what it's allocating.
In theory; which is preferred depends on context (which advantage is more important at the time) and the former is more likely to be better. In practice, I have a habit of always doing the latter and then being disappointed that mistakes aren't detected by the compiler's type checking. :-)
1) Normally, we declare and statically allocate int variable like below:
int nb;
2) If we want to create an array or for some reasons, we can declare a dynamically allocatable variable (a pointer):
int* nb;
In case 2) we need to allocate a memory block for the declared pointer:
nb = malloc( sizeof(*nb) );
For single int storage we use sizeof(int) or, equivalently, sizeof(*nb).
Because nb is a pointer to int, the type of *nb is int.
There may be a similar question on SO, but I didn't find one. Here is the scenario:
Case 1
void main()
{
char g[10];
char a[10];
scanf("%[^\n] %[^\n]",a,g);
swap(a,g);
printf("%s %s",a,g);
}
Case 2
void main()
{
char *g=malloc(sizeof(char)*10);
char *a=malloc(sizeof(char)*10);
scanf("%[^\n] %[^\n]",a,g);
swap(a,g);
printf("%s %s",a,g);
}
I'm getting the same output in both cases. So my question is: when should I prefer malloc() over an array, or vice versa, and why? The common definition I found is that malloc() provides dynamic allocation. Is that the only difference between them? Can anyone explain with an example what "dynamic" means here, given that we still specify the size in malloc()?
The principal difference relates to when and how you decide the array length. Using fixed-length arrays forces you to decide the length at compile time. In contrast, using malloc allows you to decide the length at runtime.
In particular, deciding at runtime allows you to base the decision on user input, on information not known at the time you compile. For example, you may allocate the array to be a size big enough to fit the actual data input by the user. If you use fixed length arrays, you have to decide at compile time an upper bound, and then force that limitation onto the user.
Another more subtle issue is that allocating very large fixed length arrays as local variables can lead to stack overflow runtime errors. And for that reason, you sometimes prefer to allocate such arrays dynamically using malloc.
Please any one explain with example, what is the meaning of dynamic although we are specifying the size.
I suspect this was significant before C99. Before C99, you couldn't have dynamically-sized auto arrays:
void somefunc(size_t sz)
{
char buf[sz];
}
is valid C99 but invalid C89. However, using malloc(), you can specify any value; you don't have to call malloc() with a constant as its argument.
Also, to clear up what other purpose malloc() has: you can't return stack-allocated memory from a function, so if your function needs to return allocated memory, you typically use malloc() (or some other member of the malloc familiy, including realloc() and calloc()) to obtain a block of memory. To understand this, consider the following code:
char *foo()
{
char buf[13] = "Hello world!";
return buf;
}
Since buf is a local variable, it's invalidated at the end of its enclosing function; returning it results in undefined behavior. The function above is erroneous. However, a pointer obtained using malloc() remains valid across function calls (until you call free() on it):
char *bar()
{
char *buf = malloc(13);
strcpy(buf, "Hello World!");
return buf;
}
This is absolutely valid.
I would add that in this particular example, malloc() is rather wasteful: there is more memory allocated for the array than would appear [due to bookkeeping overhead in malloc], there is the time it takes to call malloc() and later free(), and there's overhead for the programmer in remembering to free it; memory leaks can be quite hard to debug.
Edit: Case in point, your code is missing the free() at the end of main() - may not matter here, but it shows my point quite well.
So small structures (less than 100 bytes) should typically be allocated on the stack. If you have large data structures, it's better to allocate them with malloc (or, if it's the right thing to do, use globals - but this is a sensitive subject).
Clearly, if you don't know the size of something beforehand, and it MAY be very large (kilobytes in size), it is definitely a case of "consider using malloc".
On the other hand, stacks are pretty big these days (for "real computers" at least), so allocating a couple of kilobytes of stack is not a big deal.
It's very bothersome for me to write calloc(1, sizeof(MyStruct)) all the time, and I don't want to wrap it in a helper function. What do the two parameters actually give me? If they give something, why doesn't malloc have two parameters too?
By the way, I searched for an answer to this question, but I didn't find a really good one. The answers said things like calloc being able to allocate larger blocks than malloc, etc.
I saw another answer saying that calloc allocates an array. With malloc I can multiply the sizes myself and get an array, and use it without the 1, at the start.
Historical reasons.
At the time calloc was introduced, the malloc function didn't exist, and the calloc function would provide the correct alignment for an object of the element type.
When malloc was introduced afterwards, it was decided the memory returned would be properly aligned for any use (which costs more memory) and so only one parameter was necessary. The API for calloc was not changed but calloc now also returns memory properly aligned for any use.
EDIT:
See the discussion in the comments and the interesting input from #JimBalter.
My first statement regarding the introduction of malloc and calloc may be totally wrong.
Also the real reasons could also be well unrelated to alignment. C history has been changed a lot by compiler implementers. malloc and calloc could come from different groups / compilers implementers and this would explain the API difference. And I actually favor this explanation as the real reason.
The only reason I could come up with is that
int *foo = calloc(42, sizeof *foo);
is one character shorter than
int *foo = malloc(42 * sizeof *foo);
The real reason is apparently lost to the millennia centuries decades of C history and needs a programming language archaeologist to unearth, but might be related to the following fact:
In contrast to malloc() - which needs to return a memory block aligned in accordance to the full block size - when using calloc() as intended, the memory block would only need to be aligned in accordance to the size passed as second argument. However, the C standard forbids this optimization in conforming implementations.
it is just by design.
you could write your own calloc
void *mycalloc(size_t num, size_t size)
{
void *block = malloc(num * size);
if(block != NULL)
memset(block, 0, num * size);
return block;
}
You shouldn't allocate C++ objects with calloc (or malloc or anything like that). Even though calloc zero-initializes the memory, the object still hasn't been constructed as far as C++ is concerned. Use constructors for that:
class MyClass
{
private:
short m_a;
int m_b;
long m_c;
float m_d;
public:
MyClass() : m_a(0), m_b(0), m_c(0), m_d(0.0) {}
};
And then instantiate it with new (or on the stack if you can):
MyClass* mc = new MyClass();
I have a piece of code written by a very old school programmer :-) . It goes something like this:
typedef struct ts_request
{
ts_request_buffer_header_def header;
char package[1];
} ts_request_def;
ts_request_def* request_buffer =
malloc(sizeof(ts_request_def) + (2 * 1024 * 1024));
The programmer is basically relying on writing past the declared end of the struct into the extra allocated space. I know the code looks dodgy, so my questions are:
Does malloc always allocate contiguous block of memory? because in this code if the blocks are not contiguous, the code will fail big time
Does free(request_buffer) free all the bytes allocated by malloc, i.e. sizeof(ts_request_def) + (2 * 1024 * 1024), or only the bytes of the size of the structure, sizeof(ts_request_def)?
Do you see any evident problems with this approach, I need to discuss this with my boss and would like to point out any loopholes with this approach
To answer your numbered points.
Yes.
All the bytes. Malloc/free doesn't know or care about the type of the object, just the size.
It is strictly speaking undefined behaviour, but a common trick supported by many implementations. See below for other alternatives.
The C99 standard, ISO/IEC 9899:1999, allows flexible array members.
An example of this would be:
int main(void)
{
struct { size_t x; char a[]; } *p;
p = malloc(sizeof *p + 100);
if (p)
{
/* You can now access up to p->a[99] safely */
}
}
This now standardized feature allowed you to avoid using the common, but non-standard, implementation extension that you describe in your question. Strictly speaking, using a non-flexible array member and accessing beyond its bounds is undefined behaviour, but many implementations document and encourage it.
Furthermore, gcc allows zero-length arrays as an extension. Zero-length arrays are illegal in standard C, but gcc introduced this feature before C99 gave us flexible array members.
In a response to a comment, I will explain why the snippet below is technically undefined behaviour. Section numbers I quote refer to C99 (ISO/IEC 9899:1999)
struct {
char arr[1];
} *x;
x = malloc(sizeof *x + 1024);
x->arr[23] = 42;
Firstly, 6.5.2.1#2 shows a[i] is identical to (*((a)+(i))), so x->arr[23] is equivalent to (*((x->arr)+(23))). Now, 6.5.6#8 (on the addition of a pointer and an integer) says:
"If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."
For this reason, because x->arr[23] is not within the array, the behaviour is undefined. You might still think that it's okay because the malloc() implies the array has now been extended, but this is not strictly the case. Informative Annex J.2 (which lists examples of undefined behaviour) provides further clarification with an example:
An array subscript is out of range, even if an object is apparently accessible with the
given subscript (as in the lvalue expression a[1][7] given the declaration int
a[4][5]) (6.5.6).
3 - That's a pretty common C trick to allocate a dynamic array at the end of a struct. The alternative would be to put a pointer into the struct and then allocate the array separately, not forgetting to free it too. That the size is fixed to 2 MB seems a bit unusual, though.
This is a standard C trick, and isn't more dangerous than any other buffer.
If you are trying to show your boss that you are smarter than a "very old school programmer", this code isn't a case for you. Old school is not necessarily bad. It seems the "old school" guy knows enough about memory management ;)
1) Yes it does, or malloc will fail if there isn't a large enough contiguous block available. (A failure with malloc will return a NULL pointer)
2) Yes it will. The internal memory allocation will keep track of the amount of memory allocated with that pointer value and free all of it.
3) It's a bit of a language hack, and its use is a bit dubious. It's still subject to buffer overflows as well; it may just take attackers slightly longer to find a payload that causes one. The cost of the 'protection' is also pretty hefty (do you really need >2 MB per request buffer?). It's also very ugly, although your boss may not appreciate that argument :)
I don't think the existing answers quite get to the essence of this issue. You say the old-school programmer is doing something like this;
typedef struct ts_request
{
ts_request_buffer_header_def header;
char package[1];
} ts_request_def;
ts_request_def* request_buffer =
malloc(sizeof(ts_request_def) + (2 * 1024 * 1024));
I think it's unlikely he's doing exactly that, because if that's what he wanted to do he could do it with simplified equivalent code that doesn't need any tricks;
typedef struct ts_request
{
ts_request_buffer_header_def header;
char package[2*1024*1024 + 1];
} ts_request_def;
ts_request_def* request_buffer =
malloc(sizeof(ts_request_def));
I'll bet that what he's really doing is something like this;
typedef struct ts_request
{
ts_request_buffer_header_def header;
char package[1]; // effectively package[x]
} ts_request_def;
ts_request_def* request_buffer =
malloc( sizeof(ts_request_def) + x );
What he wants to achieve is allocation of a request with a variable package size x. It is of course illegal to declare the array's size with a variable, so he is getting around this with a trick. It looks as if he knows what he's doing to me, the trick is well towards the respectable and practical end of the C trickery scale.
As for #3, without more code it's hard to answer. I don't see anything wrong with it, unless its happening a lot. I mean, you don't want to allocate 2mb chunks of memory all the time. You also don't want to do it needlessly, e.g. if you only ever use 2k.
The fact that you don't like it for some reason isn't sufficient to object to it, or justify completely re-writing it. I would look at the usage closely, try to understand what the original programmer was thinking, look closely for buffer overflows (as workmad3 pointed out) in the code that uses this memory.
There are lots of common mistakes that you may find. For example, does the code check to make sure malloc() succeeded?
The exploit (question 3) is really up to the interface towards this structure of yours. In context this allocation might make sense, and without further information it is impossible to say if it's secure or not.
But if you mean problems with allocating memory bigger than the structure, this is by no means a bad C design (I wouldn't even say it's THAT old school... ;) )
Just a final note here: the point of having a char[1] is that the terminating '\0' will always fit inside the declared struct, meaning there can be 2 * 1024 * 1024 characters in the buffer and you don't have to account for the terminator with a "+1". It might look like a small thing, but I just wanted to point it out.
I've seen and used this pattern frequently.
Its benefit is to simplify memory management and thus avoid the risk of memory leaks: all it takes is to free the single malloc'ed block. With a secondary buffer, you'd need two frees. That said, one should define and use a destructor function to encapsulate this operation, so you can always change its behavior later, such as switching to a secondary buffer or adding extra operations to perform when deleting the structure.
Access to array elements is also slightly more efficient but that is less and less significant with modern computers.
The code will also work correctly if structure padding changes with different compilers, which is quite common.
The only potential problem I could see would be the compiler permuting the order of the member variables, because this trick requires that the package field remain last in storage. However, the C standard guarantees that struct members are laid out in declaration order, so this is safe.
Note also that the size of the allocated buffer will most probably be bigger than required, at least by one byte with the additional padding bytes if any.
Yes. malloc returns only a single pointer - how could it possibly tell a requester that it had allocated multiple discontiguous blocks to satisfy a request?
I would like to add that not only is it common, I might even call it standard practice, because the Windows API is full of such uses.
Check the very common BITMAP header structure for example.
http://msdn.microsoft.com/en-us/library/aa921550.aspx
The last RGB quad is an array of size 1, which depends on exactly this technique.
This common C trick is also explained in this StackOverflow question (Can someone explain this definition of the dirent struct in solaris?).
In response to your third question.
free always releases, in a single shot, all the memory that was allocated by the corresponding malloc call.
int* i = malloc(1024 * 2);
free(i + 1024); // undefined behavior: not the pointer malloc returned
free(i);        // correct: releases the whole 2 KB block
The answer to question 1 and 2 is Yes
About ugliness (i.e. question 3): what is the programmer trying to do with that allocated memory?
The thing to realize here is that malloc does not see the calculation being made in
malloc(sizeof(ts_request_def) + (2 * 1024 * 1024));
It's the same as
int sz = sizeof(ts_request_def) + (2 * 1024 * 1024);
malloc(sz);
You might think that it's allocating two chunks of memory, and in your mind they are "the struct" and "some buffer". But malloc doesn't see that at all; it just receives a single size.