This is a snippet of code from an array library I'm using. It runs fine on Windows, but when I compile it with gcc on Linux it crashes in this function. When trying to narrow down the problem, I added a printf statement to it, and the code stopped crashing.
void _arrayCreateSize( void ***array, int capacity )
{
(*array) = malloc( (capacity * sizeof(int)) + sizeof(ArrayHeader) );
((ArrayHeader*)(*array))->size = 0;
((ArrayHeader*)(*array))->capacity = capacity;
// printf("Test!\n");
*(char**)array += sizeof(ArrayHeader);
}
As soon as that printf is taken out it starts crashing on me again. I'm completely baffled as to why it's happening.
The allocation in this function is not doing what was intended. The code is obscure to the point of impenetrability.
It appears that the goal is to allocate an array of int, because of the sizeof(int) in the first memory allocation. The void *** parameter, however, says the array holds pointers. If you are meant to be allocating an array of pointers, you need to use sizeof(SomeType *), the size of some pointer type (sizeof(void *) would do). As written, this will fail horribly in a 64-bit environment: pointers are typically 8 bytes there and int is 4, so the block is half the size the caller expects, and writes to the upper half of the array land outside the allocation.
The array is allocated with a structure header (ArrayHeader) followed by the array proper. The returned value is supposed to be the start of the array proper; the ArrayHeader will presumably be found by subtraction from the pointer. This is ugly as sin, and unmaintainable to boot. It can be made to work, but it requires extreme care, and (as Brian Kernighan said) "if you're as clever as possible when you write the code, how are you ever going to debug it?".
The last line is the hardest to read:
*(char**)array += sizeof(ArrayHeader);
It treats the void ** stored in *array as a char * and advances it by sizeof(ArrayHeader) bytes, because arithmetic on a char * is done in units of sizeof(char). That is the intended offset, but it relies on void ** and char * having the same representation. The same step can be written without the cast trickery, as noted in the comments and an alternative answer:
*array = (void **)((ArrayHeader *)*array + 1);
I note in passing that the function name should not really start with an underscore. External names starting with an underscore are reserved to the implementation (of the C compiler and library).
The question asks "why does the printf() statement 'fix' things". The answer is because it moves the problem around. You've got a Heisenbug because there is abuse of the allocated memory, and the presence of the printf() manages to alter the behaviour of the code slightly.
Recommendation
Run the program under valgrind. If you don't have it, get it.
Revise the code so that the function checks the return value from malloc(), and so it returns a pointer to a structure for the allocated array.
Use the clearer code outlined in Michael Burr's answer.
Arbitrary random crashing when adding seemingly unrelated printf() statements is often a sign of a corrupted heap. The allocator often stores bookkeeping information about allocated memory directly on the heap itself, next to the blocks it hands out. Overwriting that metadata leads to surprising runtime behavior.
A few suggestions:
are you sure that you need void ***?
try replacing your argument to malloc() with 10000. Does it work now?
Moreover, if you just want arrays that store some metadata, your current code is a bad approach. A clean solution would probably use a structure like the following:
struct Array {
size_t nmemb; // size of an array element
size_t size; // current size of array
size_t capacity; // maximum size of array
void *data; // the array itself
};
Now you can pass an object of type Array to functions that know about the Array type, and Array->data cast to the proper type to everything else. The memory layout might even be the same as in your current approach, but access to the metadata is significantly easier and especially more obvious.
Your main audience is the poor guy that has to maintain your code 5 years from now.
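A minimal sketch of how such a structure might be set up and torn down. The helper names array_init and array_free are hypothetical and not part of the library in the question; nmemb is treated as the element size, following the comment in the structure above.

```c
#include <stdint.h>
#include <stdlib.h>

struct Array {
    size_t nmemb;     /* size of an array element */
    size_t size;      /* current size of array */
    size_t capacity;  /* maximum size of array */
    void *data;       /* the array itself */
};

/* Hypothetical helper: returns 0 on success, -1 on bad arguments,
 * size overflow, or allocation failure. */
int array_init(struct Array *a, size_t nmemb, size_t capacity)
{
    if (nmemb == 0 || capacity == 0 || capacity > SIZE_MAX / nmemb)
        return -1;

    void *data = malloc(nmemb * capacity);
    if (data == NULL)
        return -1;

    a->nmemb = nmemb;
    a->size = 0;
    a->capacity = capacity;
    a->data = data;
    return 0;
}

/* Hypothetical helper: releases the storage and resets the metadata. */
void array_free(struct Array *a)
{
    free(a->data);
    a->data = NULL;
    a->size = 0;
    a->capacity = 0;
}
```

Code that knows about struct Array reads a.size and a.capacity directly; everything else can be handed a.data converted to the element type, for example int *values = a.data;.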
Now that Jonathan Leffler has pointed out what the bug was, might I suggest that the function be written in a manner that's a little less puzzling?:
void _arrayCreateSize( void ***array, int capacity )
{
// allocate a header followed by an appropriately sized array of pointers
ArrayHeader* p = malloc( sizeof(ArrayHeader) + (capacity * sizeof(void*)));
p->size = 0;
p->capacity = capacity;
*array = (void**)(p+1); // return a pointer to just past the header
// (pointing at the array of pointers)
}
Mix in your own desired handling of malloc() failure.
I think this will probably help the next person who needs to look at it.
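One possible way to mix in the failure handling, sketched under assumptions: the question does not show ArrayHeader, so the two-int layout below is a guess, and array_create, array_header and array_destroy are hypothetical names. The sketch also assumes sizeof(ArrayHeader) is a multiple of the alignment of void *, which holds for this layout on common platforms.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout; the real ArrayHeader is not shown in the question. */
typedef struct {
    int size;
    int capacity;
} ArrayHeader;

/* Returns the array of pointers that follows the header,
 * or NULL if capacity is invalid or malloc() fails. */
void **array_create(int capacity)
{
    if (capacity < 0 ||
        (size_t)capacity > (SIZE_MAX - sizeof(ArrayHeader)) / sizeof(void *))
        return NULL;

    ArrayHeader *p = malloc(sizeof(ArrayHeader) + (size_t)capacity * sizeof(void *));
    if (p == NULL)
        return NULL;

    p->size = 0;
    p->capacity = capacity;
    return (void **)(p + 1);   /* just past the header */
}

/* Steps back from the array to the header that precedes it. */
ArrayHeader *array_header(void **array)
{
    return (ArrayHeader *)array - 1;
}

/* The block must be freed through the header address, not the array address. */
void array_destroy(void **array)
{
    if (array != NULL)
        free(array_header(array));
}
```

Returning the pointer instead of writing through a void *** parameter lets the caller test for NULL in one place.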
Related
For an experiment I created a function to initialize an array that has a built-in length, like in Java
int *create_arr(int len) {
void *ptr = malloc(sizeof(int[len + 1]));
int *arr = ptr + sizeof(int);
arr[-1] = len;
return arr;
}
that can later be used like this
int *arr = create_arr(12);
and allows finding the length at arr[-1]. I was asking myself whether this is a common practice, and whether there is an error in what I did.
First of all, your code has some bugs, mainly that in standard C you can't do arithmetic on void pointers (as commented by MikeCAT). Probably a more typical way to write it would be:
int *create_arr(int len) {
int *ptr = malloc((len + 1) * sizeof(int));
if (ptr == NULL) {
return NULL; // allocation failed; the caller must check for NULL
}
ptr[0] = len;
return ptr + 1;
}
This is legal but no, it's not common. It's more idiomatic to keep track of the length in a separate variable, not as part of the array itself. An exception is functions that try to reproduce the effect of malloc, where the caller will later pass back the pointer to the array but not the size.
One other issue with this approach is that it limits your array length to the maximum value of an int. On, let's say, a 64-bit system with 32-bit ints, you could conceivably want an array whose length did not fit in an int. Normally you'd use size_t for array lengths instead, but that won't work if you need to fit the length in an element of the array itself. (And of course this limitation would be much more severe if you wanted an array of short or char or bool :-) )
Note that, as Andrew Henle comments, the pointer returned by your function could be used for an array of int, but would not be safe to use for other arbitrary types as you have destroyed the alignment promised by malloc. So if you're trying to make a general wrapper or replacement for malloc, this doesn't do it.
Apart from the small mistakes that have already been pointed out in comments, this is not common, because C programmers are used to handling arrays as an initial pointer and a size. I have mainly seen that in mixed programming environments, for example in Windows COM/DCOM where C++ programs can exchange data with VB programs.
Your array with a built-in size is close to the WinAPI BSTR: an array of 16-bit wide chars whose length is stored immediately before the first character (as a 4-byte integer holding the byte count). So there is nothing really bad with it.
But in the general case, you could have an alignment problem. malloc does return a pointer with a suitable alignment for any type, and you should make sure that the 0th index of your returned array also has a suitable alignment. If the element type has a stricter alignment than int, skipping just one int can leave the array misaligned.
Furthermore, as the pointer is not at the beginning of the allocated memory, the array would require a special function for its deallocation. It should probably be documented in a red flashing font, because this would be very uncommon for most C programmers.
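A sketch of how both concerns could be handled, assuming a C11 compiler: the length lives in a header padded to the alignment of max_align_t, so the pointer handed out stays aligned for any fundamental type, and a matching deallocation function steps back to the real start of the block. The names arr_header, arr_alloc, arr_len and arr_free are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header whose size is a multiple of the strictest fundamental alignment. */
typedef union {
    size_t len;
    max_align_t align_;   /* never read; only forces size and alignment */
} arr_header;

/* Allocates count elements of elem_size bytes, preceded by the header.
 * Returns a pointer to element 0, or NULL on overflow or allocation failure. */
void *arr_alloc(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > (SIZE_MAX - sizeof(arr_header)) / elem_size)
        return NULL;

    arr_header *h = malloc(sizeof *h + count * elem_size);
    if (h == NULL)
        return NULL;

    h->len = count;
    return h + 1;
}

/* Reads the length stored just before element 0. */
size_t arr_len(const void *arr)
{
    return ((const arr_header *)arr - 1)->len;
}

/* Must be used instead of free(): the block starts at the header. */
void arr_free(void *arr)
{
    if (arr != NULL)
        free((arr_header *)arr - 1);
}
```

Because the length is a size_t in its own header rather than an element of the array, it is no longer limited by the element type.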
This technique is not as uncommon as people expect. For example, the stb header-only library collection uses this method to implement a type-safe, vector-like container in C. See https://github.com/nothings/stb/blob/master/stretchy_buffer.h
It would be more idiomatic to do something like:
struct array {
int *d;
size_t s;
};
struct array *
create_arr(size_t len)
{
struct array *a = malloc(sizeof *a);
if( a ){
a->d = malloc(len * sizeof *a->d);
a->s = a->d ? len : 0;
}
return a;
}
I want to make a program that reports how many uppercase and lowercase letters are in a word and such, but I ran into a problem: I can't set the contents of an array dynamically. This is all C code.
I tried this:
char something;
scanf("%c",somethnig);
char somethingmore[]=something;
printf("%c",something[0])
but it wouldn't compile. I also tried something like this:
char *something;
scanf("%c",something);
printf("%c",something[0]);
which compiled but crashed when the array pointer was accessed (I apologize if the naming is wrong). I'm a programming beginner, so this may be a silly question.
This is all just an example of the problem I ran into, not the code of my program.
Well, disregarding the weirdly wrong syntax in your snippet, I think a good answer comes down to reminding you of one thing:
C doesn't do any memory management for you.
Or, in other words, managing memory has to be done explicitly. As a consequence, arrays have a fixed size in C (must be known at compile time, so the compiler can reserve appropriate space in the binary, typically in a data segment, or on the stack for a local variable).
One notable exception is variable length arrays in C99, but even with them, the size of the array can be set only one time -- at initialization. It's a matter of taste whether to consider this a great thing or just a misfeature, but it will not solve your problem of changing the size of something at runtime.
If you want to dynamically grow something, there's only one option: make it an allocated object and manage memory for it yourself, using the functions malloc(), calloc(), realloc() and free(). All these functions are part of standard C, so you should read up on them. A typical usage (not related to your question) would be something like:
#include <stdlib.h>
int *list = 0;
size_t capacity = 0;
size_t count = 0;
void append(int value)
{
if (capacity)
{
if (count == capacity)
{
/* reserve more space, for real world code check realloc()
* return value */
capacity *= 2;
list = realloc(list, capacity * sizeof(int));
}
}
else
{
/* reserve an initial amount, for real world code check malloc()
* return value */
capacity = 16;
list = malloc(capacity * sizeof(int));
}
list[count++] = value;
}
This is very simplified, you'd probably define a container as a struct containing your pointer to the "array" as well as the capacity and count members and define functions working on that struct in some real world code. Or you could go and use predefined containers in a library like e.g. glib.
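A sketch of what such a container might look like, with the realloc() return value checked. The names int_list, int_list_append and int_list_free are hypothetical; the doubling strategy and the initial capacity of 16 are carried over from the example above.

```c
#include <stdint.h>
#include <stdlib.h>

struct int_list {
    int *items;       /* the "array" */
    size_t count;     /* elements in use */
    size_t capacity;  /* elements reserved */
};

/* Appends value, growing the storage when it is full.
 * Returns 0 on success, -1 on failure (the list is left unchanged). */
int int_list_append(struct int_list *l, int value)
{
    if (l->count == l->capacity) {
        size_t new_capacity = l->capacity ? l->capacity * 2 : 16;
        if (new_capacity > SIZE_MAX / sizeof *l->items)
            return -1;

        /* Keep the old pointer until realloc() is known to have succeeded. */
        int *tmp = realloc(l->items, new_capacity * sizeof *tmp);
        if (tmp == NULL)
            return -1;

        l->items = tmp;
        l->capacity = new_capacity;
    }
    l->items[l->count++] = value;
    return 0;
}

void int_list_free(struct int_list *l)
{
    free(l->items);
    l->items = NULL;
    l->count = 0;
    l->capacity = 0;
}
```

A list starts out zero-initialized, as in struct int_list l = {0};, which works because realloc() on a null pointer behaves like malloc().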
I ran into a rather weird problem,
I have the following code:
typedef struct{
char *a;
char *b;
char *c;
}Str;
typedef struct{
int size;
str array[]; //flexible array.
}strArr;
The purpose here is to allocate a,b, and c for the new element from the realloc.
StrArr *arr;
int arrSize;
arrSize = 1;
arr = malloc(sizeof(strArr)+sizeof(int)*arrSize);
arr->size++;
arr = realloc(arr, sizeof(strArr)+sizeof(int)*arr->size);
arr->array[arr->size-1].a = malloc(sizeof(char)*75);
arr->size++;
card = realloc(arr, sizeof(strArr)+sizeof(int)*arr->size);
The question is: whenever arr is realloc'd to be one bigger, do you have to allocate memory for the strings of the new element? This code will fail if it is run because it gives me glibc detected at the second realloc. What am I doing wrong? If I take out the malloc statement in the middle it runs. Also, if I try a strcpy into arr->array[arr->size-1].a, it would segfault.
Any help would be appreciated.
Thank you.
There are numerous issues with this code, enough to suggest that whatever you're experiencing can't be reproduced. Nonetheless, there are sufficient problems to cause instability (i.e. segmentation violations). I'm going to assume you meant to use a lowercase s in str rather than an uppercase S in Str; it only makes sense that way. Similarly for the lowercase s (which should be) in strArray.
At which point have you assigned arr->size a value in order for arr->size++; to be useful? That itself is a mistake, but that's interlaced into another mistake:
arr = realloc(arr, sizeof(strArr)+sizeof(int)*arr->size);
That turns out to be a major issue, as you continue to use the uninitialised variable in critical pieces of logic again and again. Nonetheless, once that issue is resolved, the next mistake here is:
Anything that resembles the pattern X = realloc(X, Y); is suspicious. It's the Xes. Those should be different. You're not supposed to just replace the values like that. I mean, it'll work, kind of... but it's not much more effort to do it properly, and unless done properly, this won't be valgrind-friendly. That should be a big deal to you, because valgrind is a tool that helps us identify memory leaks!
You should store this into a temporary variable:
void *temp = realloc(X, Y);
... and then you can handle errors, perhaps by cleaning up and exiting properly:
if (temp == NULL) {
perror("realloc");
/* free(X); // what would valgrind cease complaining about? */
exit(EXIT_FAILURE);
}
... and replacing X with temp:
X = temp;
sizeof(int) should not be assumed to be the same size as sizeof str (whatever str is). Given the type of arr->array, I would expect sizeof str or, better yet, here's a nice pattern to keep in mind:
// X = realloc(Y, Z); or ...
void *temp = realloc(arr, sizeof *arr + arr->size * sizeof arr->array[0]);
// XXX: handle errors
The question is: whenever arr is realloc'd to be one bigger, do you have to allocate memory for the strings of the new element?
The strings themselves should be in a separate storage location to the list nodes. What is this? Strings and list nodes, in the same array?!
I suppose it might make sense if by strings you mean fixed-width, null padded fields. Fixing the width of the field makes expressing the array in a one-dimensional space much easier.
Otherwise, you should keep your strings allocated separately from your list nodes, ideally in a manner that leaves the down-stream programmer in complete control of the allocation. That is kinda nice, though you lose it the moment you use realloc, malloc, etc. (and thus the moment you use VLAs, hmmmm!)...
What am I doing wrong?
I think I've picked apart your code enough to say:
Initialise all of your variables before you use them. In this case, there are some variables pointed at by arr which are used without first being initialised.
Don't assume sizeof(int) and sizeof (/*any pointer type*/) have the same width. There are very real systems where this won't be true.
Remember to use that X = realloc(Y, Z); pattern, followed by error handling, followed by Y = X;.
I'm still not sure whether forcing down-stream programmers to rely upon malloc/realloc/etc and free is necessary, or even beneficial, here.
Also, if i try a strcpy into arr->array[arr->size-1].a, it would segfault.
Yes, well... there's that phantom arr->size-related issue again!
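Putting those points together, a sketch of growing the flexible array by one element might look like this. The helper names str_arr_push and str_arr_free are hypothetical, and size is a size_t here rather than the int in the question. The new element's pointers start out as NULL, so each string still has to be allocated separately before anything is copied into it.

```c
#include <stdlib.h>

typedef struct {
    char *a;
    char *b;
    char *c;
} str;

typedef struct {
    size_t size;
    str array[];   /* flexible array member */
} strArr;

/* Grows *arr by one element (starting from NULL is allowed).
 * Returns the new element, or NULL on failure, in which case *arr is untouched. */
str *str_arr_push(strArr **arr)
{
    size_t old = *arr ? (*arr)->size : 0;

    /* Size the block from the element type, not from sizeof(int). */
    strArr *tmp = realloc(*arr, sizeof *tmp + (old + 1) * sizeof tmp->array[0]);
    if (tmp == NULL)
        return NULL;

    tmp->size = old + 1;
    tmp->array[old] = (str){ NULL, NULL, NULL };
    *arr = tmp;
    return &tmp->array[old];
}

/* Frees every string, then the array itself. */
void str_arr_free(strArr *arr)
{
    if (arr == NULL)
        return;
    for (size_t i = 0; i < arr->size; i++) {
        free(arr->array[i].a);
        free(arr->array[i].b);
        free(arr->array[i].c);
    }
    free(arr);
}
```

After a successful push, something like e->a = malloc(75); followed by a NULL check gives strcpy a valid destination.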
I'm trying to learn C and, as a start, I set off writing a strcpy for my own practice. As we know, the original strcpy easily allows for security problems so I gave myself the task to write a "safe" strcpy.
The path I've chosen is to check whether the source string (character array) actually fits in the destination memory. As I've understood it, a string in C is nothing more than a pointer to a character array, 0x00 terminated.
So my challenge is how to find how much memory the compiler actually reserved for the destination string?
I tried:
sizeof(dest)
but that doesn't work, since it will return (as I later found out) the size of dest which is actually a pointer and on my 64 bit machine, will always return 8.
I also tried:
strlen(dest)
but that doesn't work either because it will just return the length until the first 0x0 is encountered, which doesn't necessarily reflect the actual memory reserved.
So this all sums up to the following question: How to find out how much memory the compiler reserved for my destination "string"???
Example:
char s[80] = "";
int i = someFunction(s); // should return 80
What is "someFunction"?
Thanks in advance!
Once you pass a char pointer to the function you are writing, you will lose knowledge of how much memory is allocated to s. You will need to pass this size as an argument to the function.
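A minimal sketch of a copy function that takes the destination size as a parameter. The name safe_strcpy and its return convention are assumptions made for illustration; this is not a standard function.

```c
#include <string.h>

/* Copies src into dst only if it fits, including the terminating null.
 * Returns 0 on success, -1 if dst is too small. When it does not fit and
 * dst_size is nonzero, dst is set to the empty string. */
int safe_strcpy(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);

    if (dst_size == 0)
        return -1;
    if (len >= dst_size) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);   /* len + 1 copies the terminator too */
    return 0;
}
```

The caller supplies the size where the array is still in scope, for example safe_strcpy(s, sizeof s, "hello") with char s[80].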
You can use sizeof to check at compile time:
char s[80] = "";
int i = sizeof s ; // should return 80
Note that this fails if s is a pointer:
char *s = "";
int j = sizeof s; /* probably 4 or 8. */
Arrays are not pointers. To keep track of the size allocated for a pointer, the program simply must keep track of it. Also, you cannot pass an array to a function. When you use an array as an argument to a function, the compiler converts that to a pointer to the first element, so if you want the size to be available to the called function, it must be passed as a parameter. For example:
char s[ SIZ ] = "";
foo( s, sizeof s );
So this all sums up to the following question: How to find out how much memory the compiler reserved for my destination "string"???
There is no portable way to find out how much memory is allocated. You have to keep track of it yourself.
The implementation must keep track of how much memory was malloced to a pointer, and it may make something available for you to find out. For example, glibc's malloc.h exposes
size_t malloc_usable_size (void *__ptr)
that gives you access to roughly that information, however, it doesn't tell you how much you requested, but how much is usable. Of course, that only works with pointers you obtained from malloc (and friends). For an array, you can only use sizeof where the array itself is in scope.
char s[80] = "";
int i = someFunction(s); // should return 80
In an expression s is a pointer to the first element of the array s. You cannot deduce the size of an array object with the only information of the value of a pointer to its first element. The only thing you can do is to store the information of the size of the array after you declare the array (here sizeof s) and then pass this information to the functions that need it.
There's no portable way to do it. However, the implementation certainly needs to know this information internally. Unix-based OSes, like Linux and OS X, provide functions for this task:
// OS X
#include <malloc/malloc.h>
size_t allocated = malloc_size(somePtr);
// Linux
#include <malloc.h>
size_t allocated = malloc_usable_size(somePtr);
// Maybe Windows...
size_t allocated = _msize(somePtr);
A way to tag the memory returned by malloc is to always malloc an extra sizeof(size_t) bytes. Store the requested size (the malloced size minus sizeof(size_t)) at the address malloc returns, hand out that address plus sizeof(size_t), and you have the basis for your new set of functions.
When you pass two of these sorts of pointers into your new-special strcpy, you can subtract sizeof(size_t) off the pointers, and access the sizes directly. That lets you decide if the memory can be copied safely.
If you are doing strcat, then the two sizes, along with calculating the strlens means you can do the same sort of check to see if the results of the strcat will overflow the memory.
It's doable.
It's probably more trouble than it's worth.
Consider what happens if you pass in a character pointer that was not mallocated.
The assumption is that the size is before the pointer. That assumption is false.
Attempting to access the size in that case is undefined behavior. If you are lucky, you may get a signal.
One other implication of that sort of implementation is that when you go to free the memory, you have to pass in exactly-the-pointer-that-malloc-returned. If you don't get that right, heap corruption is possible.
Long story short...
Don't do it that way.
For situations where you are using character buffers in your program, you can do some smoke and mirrors to get the effect that you want. Something like this.
char input[] = "test";
char output[3];
if (sizeof(output) >= sizeof(input))
{
memcpy(output,input,sizeof(input)); /* sizeof(input) already counts the terminating null */
}
else
{
printf("Overflow detected value <%s>\n",input);
}
One can improve the error message by wrapping the code in a macro.
#define STRCPYX(output,input) \
do { \
if (sizeof(output) >= sizeof(input)) \
{ \
memcpy(output,input,sizeof(input)); \
} \
else \
{ \
printf("STRCPYX would overflow %s with value <%s> from %s\n", \
#output, input, #input); \
} \
} while (0)
char input[] = "test";
char output[3];
STRCPYX(output,input);
While this does give you what you want, the same sort of risks apply.
char *input = "testing 123 testing";
char output[9];
STRCPYX(output,input);
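Here sizeof(input) is 8 (the size of a pointer on a 64-bit machine, not the length of the string) and sizeof(output) is 9, so the copy goes ahead and output ends up as "testing " with no terminating null.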
C was not designed to protect the programmer from doing things incorrectly.
It is kind of like you are attempting to paddle upriver :)
It is a good exercise to think about.
Although arrays and pointers can appear to be interchangeable, they differ in one important aspect: an array has size. However, because an array "decays" to a pointer when passed to a function, the size information is lost.
The point is that at some point you know the size of the object - because you allocated it or declared it to be a certain size. The C language makes it your responsibility to retain and disseminate that information as necessary. So after your example:
char s[80] = ""; // sizeof(s) here is 80, because an array has size
int i = someFunction(s, sizeof(s)) ; // You have to tell the function how big the array is.
There is no "magic" method of determining the size of the array within someFunction(), because that information is discarded (for reasons of performance and efficiency - C is relatively low level in this respect, and does not add code or data that is not explicit); if the information is needed, you must explicitly pass it.
One way in which you can pass a string and retain size information, and even pass the string by copy rather than by reference is to wrap the string in a struct thus:
typedef struct
{
char s[80] ;
} charArray_t ;
then
charArray_t s ;
int i = someFunction( &s ) ;
with a definition of someFunction() like:
int someFunction( charArray_t* s )
{
return sizeof( s->s ) ;
}
You don't really gain much by doing that, however; you just avoid the additional parameter. In fact you lose some flexibility because someFunction() now only takes a fixed array length defined by charArray_t, rather than any array. Sometimes such restrictions are useful. One feature of this approach is that you can pass by copy:
int i = someFunction( s ) ;
then
int someFunction( charArray_t s )
{
return sizeof( s.s ) ;
}
since structures, unlike arrays, can be passed this way. You can equally return by copy as well. It can be somewhat inefficient, but sometimes the convenience and safety outweigh the inefficiency.
I'm new to C and haven't really grasped when C decides to free an object and when it decides to keep an object.
heap_t is pointer to a struct heap.
heap_t create_heap(){
heap_t h_t = (heap_t)malloc(sizeof(heap));
h_t->it = 0;
h_t->len = 10;
h_t->arr = (token_t)calloc(10, sizeof(token));
//call below a couple of times to fill up arr
app_heap(h_t, ENUM, "enum", 1);
return h_t;
}
putting h_t through
int app_heap(heap_t h, enum symbol s, char* word, int line){
int it = h->it;
int len = h->len;
if (it + 1 < len ){
token temp;
h->arr[it] = temp;
h->arr[it].sym = s;
h->arr[it].word = word;
h->arr[it].line = line;
h->it = it + 1;
printf(h->arr[it].word);
return 1;
} else {
h->len = len*2;
h->arr = realloc(h->arr, len*2);
return app_heap(h, s, word, line);
}
}
Why does my h_t->arr fill up with junk and eventually I get a segmentation fault? How do I fix this? Any C coding tips/styles to avoid stuff like this?
First, to answer your question about the crash, I think the reason you are getting segmentation fault is that you fail to multiply len by sizeof(token) in the call to realloc. You end up writing past the end of the block that has been allocated, eventually triggering a segfault.
As far as "deciding to free an object and when [...] to keep an object" goes, C does not decide any of it for you: it simply does it when you tell it to by calling free, without asking you any further questions. This "obedience" ends up costing you sometimes, because you can accidentally free something you still need. It is a good idea to NULL out the pointer, to improve your chance of catching the issue faster (unfortunately, this is not enough to eliminate the problem altogether, because of shared pointers).
free(h->arr);
h -> arr = NULL; // Doing this is a good practice
To summarize, managing memory in C is a tedious task that requires a lot of thinking and discipline. You need to check the result of every allocation call to see if it has failed, and perform many auxiliary tasks when it does.
C does not "decide" anything, if you have allocated something yourself with an explicit call to e.g. malloc(), it will stay allocated until you free() it (or until the program terminates, typically).
I think this:
token temp;
h->arr[it] = temp;
h->arr[it].sym = s;
/* more accesses */
is very weird, the first two lines don't do anything sensible.
As pointed out by dasblinkenlight, you're failing to scale the re-allocation into bytes, which will cause dramatic shrinkage of the array when it tries to grow, and corrupt it totally.
You shouldn't cast the return values of malloc() and realloc(), in C.
Remember that realloc() might fail, in which case you will lose your pointer if you overwrite it like you do.
Lots of repetition in your code, i.e. realloc(h->arr, len*2) instead of realloc(h->arr, h->len * sizeof *h->arr) and so on.
Note how the last bullet point also fixes the realloc() scaling bug mentioned above.
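Taken together, the points above might lead to something like the following sketch. The question does not show its type definitions, so enum symbol, token and heap below are assumed layouts, and heap * stands in for the heap_t pointer typedef. The return value keeps the question's convention of 1 for success.

```c
#include <stdlib.h>

/* Assumed definitions; the question does not show the real ones. */
enum symbol { ENUM };

typedef struct {
    enum symbol sym;
    const char *word;
    int line;
} token;

typedef struct {
    size_t it;    /* next free slot */
    size_t len;   /* slots allocated */
    token *arr;
} heap;

/* Appends one token, doubling the array when it is full.
 * Returns 1 on success, 0 if realloc() fails (the heap is left unchanged). */
int app_heap(heap *h, enum symbol s, const char *word, int line)
{
    if (h->it == h->len) {
        size_t new_len = h->len ? h->len * 2 : 10;

        /* Scale by the element size, and keep the old pointer on failure. */
        token *tmp = realloc(h->arr, new_len * sizeof *tmp);
        if (tmp == NULL)
            return 0;

        h->arr = tmp;
        h->len = new_len;
    }
    h->arr[h->it].sym = s;
    h->arr[h->it].word = word;
    h->arr[h->it].line = line;
    h->it++;
    return 1;
}
```

The loop-free form also avoids the recursion in the original, which would never terminate if the array could not grow.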
You're not reallocating to the proper size, the realloc statement needs to be:
realloc(h->arr, sizeof(token) * len*2);
^^^^^^^^^^^^
(Or perhaps better realloc(h->arr, sizeof *h->arr * h->len);)
In C, you are responsible for freeing the memory you allocate. You have to free() the memory you've malloc/calloc/realloc'ed when it's suitable to do so. The C runtime never frees anything, except when the program has terminated (some more esoteric systems might not release the memory even then).
Also, try to be consistent: the general form for allocating is always T *foo = malloc(sizeof *foo), and don't duplicate stuff.
e.g.
h_t->arr = (token_t)calloc(10, sizeof(token));
^^^^^^^^ ^^ ^^^^^^^^^^^^^
Don't cast the return value of malloc in C. It's unnecessary and might hide a serious compiler warning and bug if you forget to include stdlib.h
the cast is token_t but the sizeof applies to token, why are they different, and are they the same type as *h_t->arr ?
You already have the magic 10 value, use h_t->len
If you ever change the type of h_t->arr, you have to remember to change the sizeof(..)
So make this
h_t->arr = calloc(h_t->len, sizeof *h_t->arr);
Two main causes of dangling pointers in C are not assigning NULL to a pointer after freeing its allocated memory, and shared pointers.
There is a solution to the first problem: automatically nulling out the pointer.
void SaferFree(void *AFree[])
{
free(AFree[0]);
AFree[0] = NULL;
}
The caller, instead calling
free(p);
will call
SaferFree(&p);
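One caveat with the function form: &p converts to the void ** parameter without a cast only when p itself is declared as void *. A macro sidesteps that and works for any object pointer type. The name FREE_AND_NULL is hypothetical; this is a sketch, not an established idiom from any particular library.

```c
#include <stdlib.h>

/* Frees the object p points to and then sets p to NULL.
 * p must be a modifiable lvalue; it is evaluated twice, so avoid
 * arguments with side effects such as FREE_AND_NULL(list[i++]). */
#define FREE_AND_NULL(p) \
    do {                 \
        free(p);         \
        (p) = NULL;      \
    } while (0)
```

Calling it a second time on the same pointer is harmless, because free(NULL) does nothing.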
With respect to the second, harder-to-solve issue:
The rule of three says:
If you need to explicitly declare either the destructor, copy constructor or copy assignment operator yourself, you probably need to explicitly declare all three of them.
Sharing a pointer in C is simply copying it (copy assignment). It means that using the rule of three (or the general rule of 0) when programming in C obliges the programmer to supply a way to construct and especially destruct such an assignment, which is possible, but not an easy task, especially when C does not supply a destructor that is implicitly activated as in C++.