Pretty new to C, but I thought I had the hang of allocating and managing memory until I ran into this issue recently.
I am working on a "make" utility. (It's not homework, just my friend's old assignment that I thought I could glean valuable practice from.) As I'm sure most of you know, makefiles have various targets, and these targets have dependencies that must be attended to before the target's commands can be executed.
In order to store data for a given target's dependencies found while parsing the makefile I made the following:
typedef struct{
char* target;
char** dependency_list;
}dependency_tracker;
In order to keep track of multiple dependency_trackers, I declared (and subsequently allocated for) the following variable. (NOTICE the "+4" after "total_number_of_targets". THE PROGRAM DOESN'T WORK WITHOUT IT, AND MY QUESTION IS WHY THAT IS.)
dependency_tracker** d_tracker_ptr = (dependency_tracker**) malloc((total_number_of_targets+4)*sizeof(dependency_tracker*));
I then sent the pointer for this to the parsing method with the following line:
parse_file(filename,&d_tracker_ptr);
Within the parse_file function, I believe these are the most important calls I make (left out string parsing calls). Note that target_counter is the number of targets parsed so far. I think everything else should be somewhat manageable to figure out:
dependency_tracker** tracker_ptr = *tracker_ptr_address; // tracker_ptr_address is the pointer I passed to the function above
// declare and allocate for the new struct we are creating
dependency_tracker* new_tracker_ptr = (dependency_tracker*) malloc(sizeof(dependency_tracker));
char* new_tracker_ptr_target = (char*) malloc((size_of_target)*sizeof(char)); // size_of_target is the string length
new_tracker_ptr->target = new_tracker_ptr_target;
*(tracker_ptr+target_counter*sizeof(dependency_tracker*)) = new_tracker_ptr;
As I mentioned earlier, I have to allocate space for four more (dependency_tracker*)'s than I would have thought I needed to in order for this program to complete without a segfault.
I came to the conclusion that this was because I was overwriting the space I had allocated for the pointer I pass to parse_file.
My question is: why does this happen? Even if space for a NULL pointer is needed, that shouldn't require the space of 4 additional pointers. And the program segfaults if I allocate anything less than 25 additional bytes in the original call to malloc.
Let me know if anything needs clarification. I know this is a bit of a novel.
This is broken:
*(tracker_ptr+target_counter*sizeof(dependency_tracker*)) = new_tracker_ptr;
The pointer size is accounted for by C. You want:
tracker_ptr[target_counter] = new_tracker_ptr;
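To see why the sizeof multiplication overshoots, here is a small sketch that measures how far `base + n` actually moves in bytes for an array of pointers. The helper name `index_offset_bytes` is hypothetical. C already scales pointer arithmetic by the element size, so multiplying by `sizeof` again scales twice.

```c
#include <stddef.h>

/* Stand-in for the question's struct. */
typedef struct {
    char *target;
    char **dependency_list;
} dependency_tracker;

/* Hypothetical helper: returns how many bytes separate element 0 from
   element n of an array of dependency_tracker pointers. */
ptrdiff_t index_offset_bytes(dependency_tracker **base, size_t n)
{
    /* base + n already advances by n * sizeof(dependency_tracker *) bytes.
       Writing base + n * sizeof(dependency_tracker *), as in the question,
       would scale a second time and land at element n * 8 when pointers
       are 8 bytes wide. */
    return (char *)(base + n) - (char *)base;
}
```

With 8-byte pointers, `index_offset_bytes(slots, 3)` is 24, and `&slots[3]` is the same address as `slots + 3`.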
Also, as I mentioned in the comments, you did not allow room for a null terminator in the strings.
Another comment: C does not require a cast on malloc, and using one invites trouble. Also it's safer to just dereference the pointer you're assigning to inform sizeof. So just say:
dependency_tracker *new_tracker_ptr = malloc(sizeof *new_tracker_ptr);
char* new_tracker_ptr_target = malloc((size_of_target + 1) * sizeof *new_tracker_ptr_target); /* +1 for the '\0' */
new_tracker_ptr->target = new_tracker_ptr_target;
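Putting those fixes together, a minimal sketch of adding one tracker might look like this. The helper name `add_tracker`, its parameters, and the error policy (return -1 on allocation failure) are assumptions for illustration, not part of the original program.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *target;
    char **dependency_list;
} dependency_tracker;

/* Hypothetical helper: copies the first name_len characters of name into a
   new tracker and stores it at trackers[index]. Returns 0 on success. */
int add_tracker(dependency_tracker **trackers, size_t index,
                const char *name, size_t name_len)
{
    dependency_tracker *tracker = malloc(sizeof *tracker);
    if (tracker == NULL)
        return -1;

    /* +1 leaves room for the terminating '\0' */
    tracker->target = malloc(name_len + 1);
    if (tracker->target == NULL) {
        free(tracker);
        return -1;
    }
    memcpy(tracker->target, name, name_len);
    tracker->target[name_len] = '\0';
    tracker->dependency_list = NULL;

    /* Plain indexing: the compiler scales by sizeof(dependency_tracker *) */
    trackers[index] = tracker;
    return 0;
}
```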
Additionally, you may want to reconsider the vacuous words in your variable names. I'm actually a big fan of longish, explanatory identifiers, but "tracker" and "target" are so vague that they add little clarity. Similarly, embedding type information in variable names à la _ptr was a fad about 30 years ago. It's over now. If you have a function where a variable's declaration and its uses can't be grokked on the same screen, the function is too big.
*(tracker_ptr+target_counter*sizeof(dependency_tracker*)) = ...
This is the problem. Pointer arithmetic doesn't work like that. You do not have to multiply by sizeof(anything) when using properly typed (i.e. not char*) pointer arithmetic. Better still, you don't have to use pointer arithmetic at all.
tracker_ptr[target_counter] = ...
is all that's needed.
I'm trying to create a 2D array that can store each character of a .txt file as an element.
How do I dynamically allocate space for it?
This is what I've done so far to allocate it (copied from GeeksForGeeks):
char *arr[rownum2];
for (i = 0; i < rownum2; i++) {
arr[i] = (char *)malloc(colnum * sizeof(char));
}
However, I think this is the source of serious memory related issues later on in my program, and I've also been told some parts of this are unnecessary.
Can I please get the most suitable way to dynamically allocate memory for the 2D array in this specific scenario?
The code you have posted is 'OK', so long as you remember to call free() on the allocated memory, later in your code, like this:
for (i=0;i<rownum2;i++) free(arr[i]);
...and I've also been told some parts of this are unnecessary.
The explicit cast is unnecessary, so, instead of:
arr[i] = (char *)malloc(colnum*sizeof(char));
just use:
arr[i] = malloc(colnum*sizeof(char));
The sizeof(char) is also, strictly speaking, unnecessary (char will always have a size of 1) but you can leave that, for clarity.
Technically, it's not a 2D array but an array of pointers, each pointing to a separately allocated row. The difference is that a true 2D array can't have rows of different sizes, whereas your array of pointers can.
If you don't need that, you can allocate rownum2*colnum elements in one block and access each element as arr[x+colnum*y]. This is common because all the data is kept in one place, which is friendlier to the CPU cache and avoids the allocator's bookkeeping for every separately allocated row.
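The single-block layout described above can be sketched like this, assuming the grid is rectangular. The names `char_grid`, `grid_create`, `grid_at` and `grid_destroy` are hypothetical. One `malloc` and one `free` cover the whole grid, and `x + cols * y` maps column x, row y to the flat index.

```c
#include <stdlib.h>

/* Hypothetical rectangular grid stored in one contiguous block. */
typedef struct {
    size_t rows, cols;
    char *cells;              /* rows * cols characters, row after row */
} char_grid;

int grid_create(char_grid *g, size_t rows, size_t cols)
{
    g->rows = rows;
    g->cols = cols;
    /* sizeof(char) is 1; a real version should check rows * cols for overflow */
    g->cells = malloc(rows * cols);
    return g->cells != NULL ? 0 : -1;
}

/* Element at column x, row y: the arr[x + colnum * y] formula. */
char *grid_at(char_grid *g, size_t x, size_t y)
{
    return &g->cells[x + g->cols * y];
}

void grid_destroy(char_grid *g)
{
    free(g->cells);           /* one free releases every row */
    g->cells = NULL;
}
```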
Even rows of different sizes can be placed in a 1D array and accessed like a 2D one (at least if they never change size, or are read-only). You can allocate char body[total_size], read the whole file into it, allocate char* arr[rownum2] and set each arr[i]=body+line_beginning_offset.
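A sketch of that idea for read-only text, assuming the file has already been read into one buffer (`body`) that has one spare byte after the text, and that lines end with '\n'. The function name `index_lines` is hypothetical. It overwrites each '\n' with '\0', so every row also becomes a proper C string, and records where each line starts.

```c
#include <stddef.h>

/* Hypothetical helper: splits body (len characters of text, allocated with
   at least len + 1 bytes) into lines in place. Stores at most max_lines
   pointers in lines[] and returns how many were stored. */
size_t index_lines(char *body, size_t len, char **lines, size_t max_lines)
{
    size_t count = 0;
    size_t start = 0;

    for (size_t i = 0; i < len && count < max_lines; i++) {
        if (body[i] == '\n') {
            body[i] = '\0';                  /* terminate this row */
            lines[count++] = body + start;   /* arr[i] = body + line offset */
            start = i + 1;
        }
    }
    /* Last line without a trailing '\n': uses the spare byte at body[len]. */
    if (start < len && count < max_lines) {
        body[len] = '\0';
        lines[count++] = body + start;
    }
    return count;
}
```

Only one buffer is ever allocated for the text, so freeing `body` releases every row at once.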
BTW, don't forget that these are not actual C strings, because they are not null-terminated. You'll need an additional column for the null terminator. If you store ASCII art, a 2D array is a very good solution.
The only serious problem I see in your code is that you are casting the value returned by malloc(3), and you have probably also forgotten to #include <stdlib.h> (a dangerous cocktail). In that case the cast hides the fact that the value returned by malloc(3) has already been mangled. Let me explain:
First, you probably have a 64-bit architecture (as is common today; I have to guess), where pointers are 64 bits wide while int is 32 bits wide.
You have probably forgotten to #include <stdlib.h> (again, a guess), so the compiler assumes that malloc(3) returns int. This is a C legacy: under C89, a function called without a prototype in scope is implicitly declared as returning int. C99 removed implicit declarations, but many compilers still accept them with only a warning. The compiler therefore generates code that takes just a 32-bit value from malloc(3), not the 64-bit pointer it (probably) actually returns.
You then cast that 32-bit int value (already incorrect) to a 64-bit pointer (even more incorrect). The cast makes the warning about conversions between integers and pointers disappear: the compiler assumes that, as a wise programmer, you put the cast there on purpose and know what you are doing.
Every step here is undefined behaviour: the returned pointer is cut down to 32 bits and then converted from int to char *. The value you end up with can be completely different from the pointer malloc(3) returned, so your pointers point to the wrong place and the program breaks at run time.
Your code should be something like (again, a snippet has to be used, as your code is not complete):
#include <stdlib.h> /* for malloc() */
/* ... */
char *arr[rownum2];
for (i = 0; i < rownum2; i++) {
arr[i] = malloc(colnum); /* sizeof(char) is always 1 */
}
Finally, one recommendation:
Please read (and follow) the "how to create a minimal, verifiable example" page, as your probable missing #include is something I had to guess. When you post only snippets, the actual mistake is often in the part you left out, and we have to guess what is happening. This is the most important thing to learn from this answer: post complete, compilable and verifiable code (that is, code you have checked actually fails before posting, not a snippet you selected because you guess the problem is there). The code you posted doesn't allow anybody to verify why it fails, because it must first be completed (and probably repaired) to make it executable.
I am reading in a book that the malloc function in C takes the number of 'chunks' of memory you wish to allocate as a parameter and determines how many bytes the chunks are based on what you cast the value returned by malloc to. For example on my system an int is 4 bytes:
int *pointer;
pointer = (int *)malloc(10);
Would allocate 40 bytes because the compiler knows that ints are 4 bytes.
This confuses me for two reasons:
I was reading up, and the size parameter is actually the number of bytes you want to allocate and is not related to the sizes of any types.
Malloc is a function that returns an address. How does it adjust the size of the memory it allocated based on an external cast of the address it returned from void to a different type? Is it just some compiler magic I am supposed to accept?
I feel like the book is wrong. Any help or clarification is greatly appreciated!
Here is what the book said:
char *string;
string = (char *)malloc(80);
The 80 sets aside 80 chunks of storage. The chunk size is set by the typecast, (char *), which means that malloc() is finding storage for 80 characters of text.
Yes, the book is wrong and you are correct. Please throw away that book.
Also, do let everyone know the name of the book so we can permanently put it on our never-recommend blacklist.
Good Read:
What is the Best Practice for malloc?
When using malloc(), use the sizeof operator and apply it to the object being allocated, not the type of it.
Not a good idea:
int *pointer = malloc (10 * sizeof (int)); /* Wrong way */
Better method:
int *pointer = malloc (10 * sizeof *pointer);
Rationale: If you change the data type that pointer points to, you don't need to change the malloc() call as well. Maintenance win.
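To make the book's mistake concrete: `malloc` takes a byte count, so `malloc(10)` returns 10 bytes no matter what the result is assigned to, which with 4-byte ints is room for only two of them. A minimal sketch of the recommended form, which asks for `n * sizeof *p` bytes; the helper name `make_int_array` is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates room for n ints and fills them with 0..n-1.
   sizeof *p follows p's declared type, so if that type changes the malloc()
   line still requests the right number of bytes. */
int *make_int_array(size_t n)
{
    int *p = malloc(n * sizeof *p);   /* a byte count, not "chunks" */
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        p[i] = (int)i;
    return p;
}
```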
Also, this method is less prone to errors during code development. You can check that it's correct without having to look at the declaration in cases where the malloc() call occurs apart from the variable declaration.
Regarding your question about casting malloc(), note that there is no need for a cast on malloc() in C today. Also, if the data type should change in a future revision, any cast there would have to be changed as well or be another error source. Also, always make sure you have <stdlib.h> included. Often people put in the cast to get rid of a warning that results from not having the include file. The other reason is that it is required in C++, but you typically don't write C++ code that uses malloc().
The exact details of how malloc() works internally are not defined in the spec; in effect it is a black box with a well-defined interface. To see how it works in practice, look at an open source implementation for examples. But malloc() can (and does) vary wildly across platforms.
I have a misunderstanding regarding this code -
typedef struct _EXP{
int x;
char* name;
char lastName[40];
}XMP;
...main...
XMP a;
a.name = "eaaa";
strcpy(a.lastName, "bbb");
Why can't I use: a.lastName = "bbbb"; and that's all?
Well consider the types here. The array has the contents of the string, while the char* merely points to the data. Consequently the array requires strcpy and friends.
Besides, if you allocated memory for the char* on the heap or stack and then wanted to put some content in it, you'd also have to use strcpy, because a mere assignment would just repoint the pointer elsewhere and, for heap memory, leak the block you allocated.
Because the location of an array is fixed, while the value of a pointer (which is itself a location) is not. You can assign new values to a pointer, but not an array.
Under the hood, an array name decays to a pointer to its first element in most expressions, but an array is not a pointer: from a semantic point of view you cannot reassign an array, but you can repoint a pointer.
When you write
a.name = "eaaa" ;
the compiler will allocate memory for the null-terminated string "eaaa\0" and, because of that statement, make the pointer name point to that location (i.e. the name variable will contain the address of the memory location where the first byte of the string resides).
If you have the array instead, you already have an allocated area of memory (which cannot be assigned to another memory location!), and you can only fill it with data (in this case bytes representing your string).
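A short sketch contrasting the two members. The struct mirrors the question's, except that `name` is declared `const char *` here because it points at a string literal; `fill_names` is a hypothetical helper. Assigning to the pointer member just repoints it, while the array member can only be filled.

```c
#include <string.h>

typedef struct {
    int x;
    const char *name;     /* points at storage that lives elsewhere */
    char lastName[40];    /* the 40 bytes live inside the struct itself */
} XMP;

void fill_names(XMP *a)
{
    a->name = "eaaa";               /* OK: the pointer now holds the literal's address */
    /* a->lastName = "bbbb"; */     /* error: an array is not assignable */
    strcpy(a->lastName, "bbbb");    /* copies 5 bytes ("bbbb" plus '\0') into the array */
}
```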
This is my understanding about what might be the reason for this.
I think it's about the way the language works. C (and also C++) produces unmanaged code, which means it doesn't need an environment (like the JVM) to manage memory, threading, etc. The code is compiled to an executable that is run by the OS directly. For that reason, the compiler must know at compile time how much space to allocate for each type (not sure about dynamic types, though), including arrays. (This is also why a type's full definition, typically from a header file, must be visible wherever the compiler needs to know the size of an object.)
So, when the compiler sees an array of characters, it calculates how much space is needed for it during compilation and puts that information into the executable. When the program runs, it knows how much space is required and allocates that much memory. If you could change this multiple times, say in a C function, each assignment would make the previous one(s) invalid. So, IMO, that's why the compiler doesn't allow it.
I am writing an app with the following dynamic config structure:
typedef struct {
char apphash[41];
char filenames_count;
char * filename[64];
} config;
But this code is wrong: I can't figure out how to copy data to and from c->filename[0] properly. c is a pointer to a config structure, allocated dynamically like this:
config * c = (config *) malloc( 42 + 64 * 2 ); // alloc for 2 filenames. can realloc() later.
It segfaults if I use something like strcpy(c->filename[0],"file1.txt").
Can someone please help me with this?
Currently, I'm using direct address calculation, like
strcpy(
(char*)
((unsigned long) c + 42 /* apphash + filenames_count */ +
64 * 0 /* first item */ ),
"file1.txt"
);
and it works of course.
You see, I'm more of an assembly programmer than a C programmer, but I'd like this code to be more human-readable. This code looks that bad because I'm a newcomer to C.
Oh, I gave a bad description of the situation. Sorry for that :(
The real code looks like:
config * c = (config*) malloc( 42 + 64 * 2 );
// we may realloc() it later if we are going to add more filenames.
// failing example how I do copy one default filename
strcpy(c->filename[0],"file1.txt");
// working example (i386)
strcpy((char*)((unsigned long) c + 42 + 64 * 0),"file1.txt");
I am using a fully static structure type because it is going to be loaded directly from a file next time. That's why I can't really use pointers inside the structure: I need the real data to be placed there.
I do check all lengths, no BOFs in real code, I just omitted all that stuff here.
I still didn't find a good solution to this.
Thanks again and sorry for bad question information.
I assume you have many filenames since you have filenames_count. Try
config_obj.filename[0] = strdup("file1.txt");
In your struct you're allocating an array of pointers to chars, not an array of chars. You must explicitly allocate the targets of the pointers as well, or at the very least make the struct contain the arrays of chars themselves:
char filename[64][MAX_PATH+1];
Replace MAX_PATH with the maximum length of any filename. Mind that this is not a very elegant solution, albeit a really simple one, because you're wasting lots of space.
Your direct address calculation is doing something different: it's placing the string directly in the space allocated for the pointers (and this is a Terribly Wrong Thing To Do™)
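Since the asker wants to load the structure straight from a file, the fixed-array layout fits that requirement because it holds no pointers. A minimal sketch, assuming a fixed cap on names: `MAX_NAMES`, `MAX_NAME_LEN`, `config_add_filename`, `config_save` and `config_load` are hypothetical names, `filenames_count` is made `unsigned char` here, and the raw `fwrite`/`fread` format is only valid for files written by the same build (it is not portable across compilers or architectures).

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAMES    64    /* hypothetical cap on the number of filenames */
#define MAX_NAME_LEN 255   /* hypothetical stand-in for MAX_PATH */

typedef struct {
    char apphash[41];
    unsigned char filenames_count;
    char filename[MAX_NAMES][MAX_NAME_LEN + 1];   /* real bytes, no pointers */
} config;

/* Copies name into the next free slot; returns -1 if full or too long. */
int config_add_filename(config *c, const char *name)
{
    size_t len = strlen(name);
    if (c->filenames_count >= MAX_NAMES || len > MAX_NAME_LEN)
        return -1;
    memcpy(c->filename[c->filenames_count], name, len + 1);  /* includes '\0' */
    c->filenames_count++;
    return 0;
}

/* Raw save/load of the whole struct in native layout. */
int config_save(const config *c, FILE *f)
{
    return fwrite(c, sizeof *c, 1, f) == 1 ? 0 : -1;
}

int config_load(config *c, FILE *f)
{
    return fread(c, sizeof *c, 1, f) == 1 ? 0 : -1;
}
```

The price is the one the answer mentions: every slot reserves `MAX_NAME_LEN + 1` bytes whether it is used or not.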
Right now config is a type that means the struct you have defined. You don't show us an identifier referring to an actual variable of type config.
So, first thing we need an instance of type config. You will either do
config c;
... c.filename ...
note the structure access operator is a ., or you will do something like
config *p = malloc(sizeof *p);
/* error checking */
... p->filename ...
where -> is the pointer-dereference-and-access operator. The first form is preferred unless you have a reason to want dynamic allocation (which, alas, happens a lot in C).
Then you have to figure out just what you want filename to be. As it is, you have allocated space for 64 character pointers which don't point at allocated memory (except by purest accident, and then not at the memory you mean). You probably wanted{*} char filename[64] (a single filename allowed to be up to 63 characters long, to leave room for the null terminator), in which case you would use
strcpy(c.filename,"file1.txt");
/* or */
strcpy(p->filename,"file1.txt");
depending on how you allocated the structure in the first place.
If you really wanted a list of filenames, then you may want char *filenames[64], but you will have to allocate a buffer for each name before you can use it
c.filenames[0] = malloc(sizeOfString);
/* error checking */
strcpy(c.filenames[0],...
or as another poster suggested
c.filenames[0] = strdup(...
The first form may be better if you are building your filenames from multiple pieces and can project the total length from the get-go.
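For the malloc-then-copy route, a sketch assuming the name is built from a directory and a file part. The helper name `join_path` and the '/' separator are hypothetical choices. It measures the total length up front, allocates once, then copies; the caller must `free` the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated "dir/file" string, or NULL. */
char *join_path(const char *dir, const char *file)
{
    size_t dir_len = strlen(dir);
    size_t file_len = strlen(file);
    char *out = malloc(dir_len + 1 + file_len + 1);   /* room for '/' and '\0' */
    if (out == NULL)
        return NULL;
    memcpy(out, dir, dir_len);
    out[dir_len] = '/';
    memcpy(out + dir_len + 1, file, file_len + 1);    /* copies the '\0' too */
    return out;
}
```

The result could then be stored with `c.filenames[0] = join_path(...)` and later released with `free(c.filenames[0])`.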
{*} Later you may want to scrap this fixed length buffer, but leave that for now.
It is currently failing because you aren't allocating memory for the filenames. Either use strdup or malloc+strcpy (I'd use strdup).
Your filename field is an array of pointers to zero terminated string. You need to allocate the memory for the string and copy the string to that memory. You save the address of the new string in one of the pointers, e.g. filename[0].
The direct memory address code doesn't work. It just doesn't crash, yet! That code just overwrites the array of pointers. Don't ever write code like that. Never ever ever!! Writing code like that is morally equivalent to eating baby unicorns.
I have a piece of code written by a very old-school programmer :-). It goes something like this:
typedef struct ts_request
{
ts_request_buffer_header_def header;
char package[1];
} ts_request_def;
ts_request_def* request_buffer =
malloc(sizeof(ts_request_def) + (2 * 1024 * 1024));
The programmer is basically working on a buffer overflow concept. I know the code looks dodgy, so my questions are:
Does malloc always allocate a contiguous block of memory? Because in this code, if the blocks are not contiguous, the code will fail big time.
If I do free(request_buffer), will it free all the bytes allocated by malloc, i.e. sizeof(ts_request_def) + (2 * 1024 * 1024),
or only sizeof(ts_request_def) bytes, the size of the structure?
Do you see any evident problems with this approach, I need to discuss this with my boss and would like to point out any loopholes with this approach
To answer your numbered points.
1. Yes.
2. All the bytes. Malloc/free doesn't know or care about the type of the object, just the size.
3. It is strictly speaking undefined behaviour, but a common trick supported by many implementations. See below for other alternatives.
The C standard has allowed flexible array members since ISO/IEC 9899:1999 (informally C99).
An example of this would be:
#include <stdlib.h>
int main(void)
{
struct { size_t x; char a[]; } *p;
p = malloc(sizeof *p + 100);
if (p)
{
/* You can now access up to p->a[99] safely */
free(p);
}
return 0;
}
This standardized feature lets you avoid the common, but non-standard, implementation extension that you describe in your question. Strictly speaking, using a non-flexible array member and accessing beyond its bounds is undefined behaviour, but many implementations document and encourage it.
Furthermore, gcc allows zero-length arrays as an extension. Zero-length arrays are illegal in standard C, but gcc introduced this feature before C99 gave us flexible array members.
In a response to a comment, I will explain why the snippet below is technically undefined behaviour. Section numbers I quote refer to C99 (ISO/IEC 9899:1999)
struct {
char arr[1];
} *x;
x = malloc(sizeof *x + 1024);
x->arr[23] = 42;
Firstly, 6.5.2.1#2 shows a[i] is identical to (*((a)+(i))), so x->arr[23] is equivalent to (*((x->arr)+(23))). Now, 6.5.6#8 (on the addition of a pointer and an integer) says:
"If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."
For this reason, because x->arr[23] is not within the array, the behaviour is undefined. You might still think that it's okay because the malloc() implies the array has now been extended, but this is not strictly the case. Informative Annex J.2 (which lists examples of undefined behaviour) provides further clarification with an example:
"An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6)."
3 - That's a pretty common C trick to allocate a dynamic array at the end of a struct. The alternative would be to put a pointer into the struct and then allocate the array separately, without forgetting to free it too. That the size is fixed at 2 MB seems a bit unusual, though.
This is a standard C trick, and isn't more dangerous than any other buffer.
If you are trying to show your boss that you are smarter than the "very old school programmer", this code isn't the case to make it with. Old school is not necessarily bad. It seems the "old school" guy knows enough about memory management ;)
1) Yes it does, or malloc will fail if there isn't a large enough contiguous block available. (A failure with malloc will return a NULL pointer)
2) Yes it will. The internal memory allocation will keep track of the amount of memory allocated with that pointer value and free all of it.
3) It's a bit of a language hack, and its use is a bit dubious. It's still subject to buffer overflows as well; it just may take attackers slightly longer to find a payload that will cause one. The cost of the 'protection' is also pretty hefty (do you really need >2 MB per request buffer?). It's also very ugly, although your boss may not appreciate that argument :)
I don't think the existing answers quite get to the essence of this issue. You say the old-school programmer is doing something like this:
typedef struct ts_request
{
ts_request_buffer_header_def header;
char package[1];
} ts_request_def;
ts_request_def* request_buffer =
malloc(sizeof(ts_request_def) + (2 * 1024 * 1024));
I think it's unlikely he's doing exactly that, because if that's what he wanted to do he could do it with simplified equivalent code that doesn't need any tricks:
typedef struct ts_request
{
ts_request_buffer_header_def header;
char package[2*1024*1024 + 1];
} ts_request_def;
ts_request_def* request_buffer =
malloc(sizeof(ts_request_def));
I'll bet that what he's really doing is something like this:
typedef struct ts_request
{
ts_request_buffer_header_def header;
char package[1]; // effectively package[x]
} ts_request_def;
ts_request_def* request_buffer =
malloc( sizeof(ts_request_def) + x );
What he wants to achieve is allocation of a request with a variable package size x. A struct member array can't be declared with a variable size, of course, so he is getting around this with a trick. To me it looks as if he knows what he's doing; the trick is well towards the respectable and practical end of the C trickery scale.
As for #3, without more code it's hard to answer. I don't see anything wrong with it, unless it's happening a lot. I mean, you don't want to allocate 2 MB chunks of memory all the time. You also don't want to do it needlessly, e.g. if you only ever use 2 KB.
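If a variable-size package is the goal, C99's flexible array member expresses the same layout without indexing past a declared array bound. A minimal sketch: `ts_request_buffer_header_def` is not shown in the question, so a placeholder header with a single `length` field is assumed here, and `request_create` is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>

/* Placeholder: the real header type is not shown in the question. */
typedef struct {
    size_t length;
} ts_request_buffer_header_def;

typedef struct ts_request {
    ts_request_buffer_header_def header;
    char package[];        /* flexible array member (C99) */
} ts_request_def;

/* Allocates a request whose package holds x bytes; one free() releases it all. */
ts_request_def *request_create(size_t x)
{
    ts_request_def *req = malloc(sizeof *req + x);
    if (req == NULL)
        return NULL;
    req->header.length = x;
    memset(req->package, 0, x);
    return req;
}
```

Unlike the char package[1] version, `sizeof *req` is not guaranteed to include any byte of `package`, so add 1 to x if the package must also hold a terminating '\0'.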
The fact that you don't like it for some reason isn't sufficient to object to it, or justify completely re-writing it. I would look at the usage closely, try to understand what the original programmer was thinking, look closely for buffer overflows (as workmad3 pointed out) in the code that uses this memory.
There are lots of common mistakes that you may find. For example, does the code check to make sure malloc() succeeded?
The exploit (question 3) is really up to the interface towards this structure of yours. In context this allocation might make sense, and without further information it is impossible to say if it's secure or not.
But if you mean problems with allocating memory bigger than the structure, this is by no means a bad C design (I wouldn't even say it's THAT old school... ;) )
Just a final note here: the point of having a char[1] is that the terminating null character will always fit in the declared struct, meaning there can be 2 * 1024 * 1024 characters in the buffer without accounting for the terminator with a "+1". It might look like a small detail, but I just wanted to point it out.
I've seen and used this pattern frequently.
Its benefit is to simplify memory management and thus avoid the risk of memory leaks. All it takes is to free the malloc'ed block. With a secondary buffer, you'll need two calls to free. However, one should define and use a destructor function to encapsulate this operation, so you can always change its behavior, like switching to a secondary buffer or adding operations to be performed when deleting the structure.
Access to array elements is also slightly more efficient but that is less and less significant with modern computers.
The code will also work correctly if the memory alignment in the structure changes between compilers, which is quite frequent.
The only potential problem would be the compiler permuting the storage order of the member variables, because this trick requires that the package field remains last in storage. The C standard rules this out, though: within a structure, members have addresses that increase in the order in which they are declared (C99 6.7.2.1).
Note also that the size of the allocated buffer will most probably be bigger than required, at least by one byte with the additional padding bytes if any.
Yes. malloc returns only a single pointer - how could it possibly tell a requester that it had allocated multiple discontiguous blocks to satisfy a request?
I would add that this is not only common; I might even call it standard practice, because the Windows API is full of such uses.
Check the very common BITMAP header structure for example.
http://msdn.microsoft.com/en-us/library/aa921550.aspx
The last member, an array of RGBQUAD, is declared with a size of 1 and depends on exactly this technique.
This common C trick is also explained in this StackOverflow question (Can someone explain this definition of the dirent struct in solaris?).
In response to your second question:
free always releases all the memory allocated at a single shot.
int *i = malloc(1024 * 2); // a 2 KB block
// free(i + 1); // wrong: undefined behaviour, because i + 1 is not a pointer returned by malloc
free(i); // releases the whole 2 KB block
The answer to questions 1 and 2 is yes.
About ugliness (i.e. question 3): what is the programmer trying to do with that allocated memory?
The thing to realize here is that malloc does not see the calculation being made in this:
malloc(sizeof(ts_request_def) + (2 * 1024 * 1024));
It's the same as:
int sz = sizeof(ts_request_def) + (2 * 1024 * 1024);
malloc(sz);
You might think that it's allocating 2 chunks of memory, and in your mind they are "the struct" and "some buffers". But malloc doesn't see that at all.