Potential issues with p* in structs? - c

I have been churning through C for the last several months. To learn the language, I am writing an arithmetic parser: formulas, variables, etc.
I recently decided to go ahead and work out garbage collection because I have a lot of calls to this function:
char* read_token(const Source* source, const Token* token) {
int szWord = token->t_L + 1; // +1 for the NUL terminator
char* word = (char*)malloc(sizeof(char)*szWord);
memset(word, '\0', sizeof(char)*(szWord));
char* p_T = source->p_Src + token->t_S;
memcpy(word, p_T, token->t_L);
return word;
}
... which means calling free(...) quite a bit.
The Source struct has two buffer properties among others:
typedef struct source Source;
struct source {
// ...
char* p_Src; // malloc'd source buffer
int srcLen;
Token* p_tokens; // malloc'd Token buffer
// ...
};
The Token struct has start and length properties:
typedef struct token Token;
struct token {
int t_S; // buffer start index
int t_L; // token length
};
In addition, since there could be many sources, a Source* buffer is malloc'd.
When a buffer is malloc'd, the size of the struct is provided (* numStructs). But if a given struct has a buffer that may be allocated at a later time, such as Token*, does that change the size of Source? Is the code in danger of overwriting previously allocated memory?
For some reason I was getting the idea that all of the memory used for a struct, including any buffers, is allocated in a linear manner. If Token* buffer in struct is allocated to 10 tokens, that space is not then linearly allocated within the Source struct right?

The pointer members in your struct are variables that store addresses of memory blocks and, as you state yourself, pointers and pointees are allocated independently. Hence those buffers might be located right next to where their 'parent' struct is stored, or not (and most probably won't be).
If it is needed, ensuring contiguous storage of the struct members and its pointed buffers can be achieved by allocating everything in one call to the *alloc function.
This can be done in several ways:
- using fixed-size buffers: not really convenient, since any flexibility on the buffer sizes is lost. Also note that declaring a fixed-size array member increases sizeof(struct foo) accordingly.
- using C99's flexible array member, or tricks that emulate the feature in pre-C99 C (see "Allocate Pointer and pointee at once").
- using hacks that rely on pointer arithmetic, which are not recommended and require watching out for the compiler's alignment policy.
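As a rough illustration of the flexible array member option, here is a minimal sketch. The TokenList type and the token_list_new and token_list_at helpers are hypothetical names invented for this example; only Token comes from the question.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct token {
    int t_S; /* buffer start index */
    int t_L; /* token length */
} Token;

/* Hypothetical container: the tokens live in the same block as the header. */
typedef struct token_list {
    size_t count;
    Token  tokens[]; /* C99 flexible array member, must be the last member */
} TokenList;

/* One malloc covers the header plus `count` tokens; one free() releases both. */
TokenList *token_list_new(size_t count) {
    TokenList *list = malloc(sizeof *list + count * sizeof list->tokens[0]);
    if (list == NULL)
        return NULL;
    list->count = count;
    for (size_t i = 0; i < count; i++) {
        list->tokens[i].t_S = 0;
        list->tokens[i].t_L = 0;
    }
    return list;
}

/* Bounds-checked access to one token. */
Token *token_list_at(TokenList *list, size_t i) {
    assert(list != NULL && i < list->count);
    return &list->tokens[i];
}
```

With this layout sizeof(TokenList) covers only the header; the tokens follow it in the same allocation, so there is nothing to free separately.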

A pointer in a struct has a fixed size, regardless of what it is pointing at, even if it is uninitialised. That way, sizeof(struct source) is a fixed length.
When malloc is used, the memory is taken somewhere from the heap, we know not where, and should not care either. It is highly unlikely that memory will be allocated anywhere near the original struct, and even if it was, that would be implementation specific and you could not count on that behaviour.
Obviously (?) you should call free() on the pointer before the struct that it lives in is destroyed.
Also note C99's Variable Length Arrays (VLAs).
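To illustrate the point about pointer members having a fixed size, here is a small sketch built on reduced versions of the question's structs. The helpers source_size_is_fixed and source_destroy are made-up names for this example.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct token { int t_S; int t_L; } Token;

typedef struct source {
    char  *p_Src;    /* malloc'd source buffer */
    int    srcLen;
    Token *p_tokens; /* malloc'd Token buffer */
} Source;

/* Returns 1 if sizeof(Source) is the same before and after allocating tokens. */
int source_size_is_fixed(void) {
    Source src = { NULL, 0, NULL };
    size_t before = sizeof src;

    src.p_tokens = malloc(10 * sizeof *src.p_tokens);
    size_t after = sizeof src; /* still only two pointers, an int and padding */

    free(src.p_tokens);
    return before == after;
}

/* Frees the buffers the struct points at; call before the struct goes away. */
void source_destroy(Source *src) {
    assert(src != NULL);
    free(src->p_tokens);
    free(src->p_Src);
    src->p_tokens = NULL;
    src->p_Src = NULL;
}
```

sizeof is evaluated from the type alone, so the ten tokens never show up in it; they live in a separate block that only p_tokens refers to.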

Sorry, I'm writing this in the "Answer" section instead of the "Comments" section because my Stack Overflow reputation isn't high enough yet.
What I was about to comment is: why don't you just use this line:
char word[ sizeof(char)*szWord ];
instead of
char* word = (char*)malloc(sizeof(char)*szWord); ?
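One caveat with the array suggestion: an array declared inside read_token has automatic storage, so a pointer to it cannot safely be returned to the caller. A related way to avoid malloc is to let the caller supply the buffer. The sketch below assumes a hypothetical read_token_into function and reduced versions of Source and Token; it is not the asker's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct token  { int t_S; int t_L; } Token;
typedef struct source { char *p_Src; int srcLen; } Source;

/* Copies the token text into buf (capacity bufSize) and NUL-terminates it.
   Returns 0 on success, -1 if the token does not fit. */
int read_token_into(const Source *source, const Token *token,
                    char *buf, size_t bufSize) {
    assert(source != NULL && token != NULL && buf != NULL);
    if (token->t_L < 0 || (size_t)token->t_L + 1 > bufSize)
        return -1;
    memcpy(buf, source->p_Src + token->t_S, (size_t)token->t_L);
    buf[token->t_L] = '\0';
    return 0;
}
```

The caller can then use a local array, for example char word[64];, and pass word and sizeof word, so no free() is needed for short-lived tokens.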

Related

malloc'd pointer inside struct that is passed by value

I am putting together a project in C where I must pass around a variable length byte sequence, but I'm trying to limit malloc calls due to potentially limited heap.
Say I have a struct, my_struct, that contains the variable length byte sequence, ptr, and a function, my_func, that creates an instance of my_struct. In my_func, my_struct.ptr is malloc'd and my_struct is returned by value. my_struct will then be used by other functions being passed by value: another_func. Code below.
Is this "safe" to do against memory leaks provided somewhere on the original or any copy of my_struct when passed by value, I call my_struct_destroy or free the malloc'd pointer? Specifically, is there any way that when another_func returns, that inst.ptr is open to being rewritten or dangling?
Since stackoverflow doesn't like opinion-based questions, are there any good references that discuss this behavior? I'm not sure what to search for.
typedef struct {
char * ptr;
} my_struct;
// allocates n bytes to pointer in structure and initializes.
my_struct my_func(size_t n) {
my_struct out = {(char *) malloc(n)};
/* initialization of out.ptr */
return out;
}
void another_func(my_struct inst) {
/*
do something using the passed-by-value inst
are there problems with inst.ptr here or after this function returns?
*/
}
void my_struct_destroy(my_struct * ms_ptr) {
free(ms_ptr->ptr);
ms_ptr->ptr = NULL;
}
int main() {
my_struct inst = my_func(20);
another_func(inst);
my_struct_destroy(&inst);
}
It's safe to pass and return a struct containing a pointer by value as you did. Each copy of the struct contains a copy of ptr, so all copies point at the same buffer. Nothing is changed in the calling function. There would, of course, be a big problem if another_func frees ptr and then the caller tries to use it or free it again.
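To make the aliasing concrete, here is a small sketch. The helper names write_first_byte and copy_shares_buffer are invented for illustration; my_struct is the type from the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *ptr;
} my_struct;

/* Receives a copy of the struct; the copy's ptr aliases the caller's buffer. */
void write_first_byte(my_struct inst, char c) {
    assert(inst.ptr != NULL);
    inst.ptr[0] = c;  /* visible to the caller: same buffer */
    inst.ptr = NULL;  /* not visible to the caller: only the copy changes */
}

/* Returns 1 if the caller's pointer is untouched and the write went through. */
int copy_shares_buffer(void) {
    my_struct inst = { malloc(20) };
    if (inst.ptr == NULL)
        return 0;
    memset(inst.ptr, 0, 20);

    char *original = inst.ptr;
    write_first_byte(inst, 'A');

    int ok = (inst.ptr == original) && (inst.ptr[0] == 'A');
    free(inst.ptr);   /* exactly one free for the one allocation */
    return ok;
}
```

The one rule to keep is that the buffer is freed exactly once, no matter how many copies of the struct exist.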
Locality of alloc+free is a best practice. Wherever possible, make the function that allocates an object also responsible for freeing it. Where that's not feasible, malloc and free of the same object should be in the same source file. Where that's not possible (think complex graph data structure with deletes), the collection of files that manage objects of a given type should be clearly identified and conventions documented.
There's a common technique useful for programs (like compilers) that work in stages where much of the memory allocated in one stage should be freed before the next starts. Here, memory is only malloc'd in big blocks by a manager. From these, the manager allocates objects of any size. But it knows only one way to free: all at once, presumably at the end of a stage. This is a gcc idea: obstacks.
When allocation is more complex, bigger systems implement some kind of garbage collector. Beyond these ideas, there are as many ways to manage C storage as there are colors. Sorry I don't have any pointers to references (pun intended :)
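The "allocate in big blocks, free all at once" idea can be sketched as a tiny bump allocator. This is a simplified illustration, not the GNU obstack API; Arena, arena_init, arena_alloc and arena_release are hypothetical names, and the sketch assumes a single fixed-size block that never grows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Union whose size is a multiple of the strictest common alignment. */
typedef union { long long ll; long double ld; void *p; double d; } arena_align_t;
#define ARENA_ALIGN sizeof(arena_align_t)

typedef struct arena {
    unsigned char *base;     /* the one big malloc'd block */
    size_t         capacity; /* total bytes in the block */
    size_t         used;     /* bytes handed out so far */
} Arena;

/* Returns 0 on success, -1 if the block could not be allocated. */
int arena_init(Arena *a, size_t capacity) {
    a->base = malloc(capacity);
    a->capacity = a->base ? capacity : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Hands out `size` bytes (rounded up for alignment), or NULL when full. */
void *arena_alloc(Arena *a, size_t size) {
    size_t rounded = (size + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
    assert(a->used <= a->capacity);
    if (rounded < size || rounded > a->capacity - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* The only way to free: everything at once, e.g. at the end of a stage. */
void arena_release(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```

A parser stage could call arena_alloc for every token string and then call arena_release once when the stage ends, instead of pairing each malloc with its own free.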
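If you only have one variable-length field and its size doesn't need to be dynamically updated, consider making the last field in the struct an array to hold it. This "struct hack" is widely supported in practice, although strictly speaking only the C99 flexible array member form is guaranteed by the standard: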
typedef struct {
    /* ... other fields ... */
    char a[1]; // variable length
} my_struct;
my_struct *my_func(size_t n) { // returns a pointer; assumes n >= 1
    my_struct *p = malloc(sizeof *p + (n - 1) * sizeof p->a[0]);
    /* ... check p for NULL, then initialize fields of p ... */
    return p;
}
This avoids the need to separately free the variable length field. Unfortunately it only works for one.
If you're okay with gcc extensions, you can declare the array with size zero. In C99, you can get the same effect with a[]. This avoids the - 1 in the size calculation.

the difference between struct with flexible arrays members and struct with pointer members

I'm quite confused about the difference between flexible arrays and pointers as struct members. Someone suggested that a struct with pointers needs malloc twice. However, consider the following code:
struct Vector {
size_t size;
double *data;
};
int len = 20;
struct Vector* newVector = malloc(sizeof *newVector + len * sizeof*newVector->data);
printf("%p\n",newVector->data);//print 0x0
newVector->data =(double*)((char*)newVector + sizeof*newVector);
// do sth
free(newVector);
One difference I find is that the address stored in the data member of Vector is not defined. The programmer needs to compute the exact address by casting. However, if Vector is defined as:
struct Vector {
size_t size;
double data[];
};
Then the address of data is defined.
I am wondering whether it is safe to malloc a struct with pointers like this, and what exactly the reason is that programmers malloc twice when using a struct with pointers.
The difference is how the struct is stored. In the first example you over-allocate memory but that doesn't magically mean that the data pointer gets set to point at that memory. Its value after malloc is in fact indeterminate, so you can't reliably print it.
Sure, you can set that pointer to point beyond the part allocated by the struct itself, but that means potentially slower access since you need to go through the pointer each time. Also you allocate the pointer itself as extra space (and potentially extra padding because of it), whereas in a flexible array member sizeof doesn't count the flexible array member. Your first design is overall much more cumbersome than the flexible version, but other than that well-defined.
The reason why people malloc twice when using a struct with pointers could either be that they aren't aware of flexible array members or using C90, or alternatively that the code isn't performance-critical and they just don't care about the overhead caused by fragmented allocation.
I am wondering whether it is safe to malloc a struct with pointers like this, and what exactly the reason is that programmers malloc twice when using a struct with pointers.
If you use pointer method and malloc only once, there is one extra thing you need to care of in the calculation: alignment.
Let's add one extra field to the structure:
struct Vector {
size_t size;
uint32_t extra;
double *data;
};
Let's assume that we are on a system where each field is 4 bytes, there is no trailing padding in the struct, and the total size is 12 bytes. Let's also assume that double is 8 bytes and requires 8-byte alignment.
Now there is a problem: the expression (char*)newVector + sizeof*newVector no longer gives an address that is divisible by 8. There need to be 4 bytes of manual padding between the structure and the data. This complicates both the malloc size calculation and the data pointer offset calculation.
So the main reason you see the single-malloc pointer version less often is that it is harder to get right. With a pointer and 2 mallocs, or with a flexible array member, the compiler and malloc take care of the necessary alignment and padding so you don't have to.
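For completeness, the manual padding calculation might look like the sketch below. The names vector_data_offset, vector_new, align_probe and DOUBLE_ALIGN are hypothetical; the alignment is found with the pre-C11 offsetof trick, and C11 code could use _Alignof(double) instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct Vector {
    size_t   size;
    uint32_t extra;
    double  *data;
};

/* Alignment requirement of double, measured via a probe struct. */
struct align_probe { char c; double d; };
#define DOUBLE_ALIGN offsetof(struct align_probe, d)

/* sizeof(struct Vector) rounded up to the next multiple of DOUBLE_ALIGN. */
size_t vector_data_offset(void) {
    return (sizeof(struct Vector) + DOUBLE_ALIGN - 1) / DOUBLE_ALIGN * DOUBLE_ALIGN;
}

/* Single allocation: header, padding if needed, then len doubles. */
struct Vector *vector_new(size_t len) {
    size_t offset = vector_data_offset();
    struct Vector *v;

    if (len > (SIZE_MAX - offset) / sizeof(double))
        return NULL; /* size calculation would overflow */
    v = malloc(offset + len * sizeof(double));
    if (v == NULL)
        return NULL;
    v->size  = len;
    v->extra = 0;
    v->data  = (double *)((char *)v + offset);
    assert((uintptr_t)v->data % DOUBLE_ALIGN == 0);
    return v;
}
```

A single free(v) releases both the header and the data. On most systems the struct size is already a multiple of the alignment of double, so the rounding adds nothing, but the code does not depend on that.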

Segfault when allocating memory for pointer to pointer in struct

I have a pointer to a char pointer in a struct. There is a struct pointer object that I will pass to a thread, but that's not relevant.
This is my struct:
struct shard
{
sem_t *semaphore_o;
char **input;
char **image_o;
};
I am creating object like
struct shard *obj=malloc(sizeof(struct shard));
and I am allocating the input in shard struct
*(obj->input)=malloc(sizeof(char)+1);
but the above line gives a segfault.
*(obj->input) dereferences input before input points to valid memory, invoking undefined behavior, where manifesting with a segfault is common. You first have to malloc memory for it before dereferencing. You don't provide enough details about what you're trying to do, but something like:
struct shard *obj = malloc(sizeof(struct shard));
if (obj == NULL)
{
    // handle error how you want, and make a check like this for every malloc
    exit(-1);
}
obj->input = malloc(howManyCharPtrsYouWant * sizeof(*(obj->input)));
for (int i = 0; i < howManyCharPtrsYouWant; i++)
{
    obj->input[i] = malloc(howManyCharsYouWant); // sizeof(char) is guaranteed to be 1, so no need to include it in `malloc`
}
See my previous answer for an explanation of sizeof(*(obj->input)) if you're unsure what's happening there.
*(obj->input)=malloc(sizeof(char)+1); only allocates space for 2 chars (sizeof(char) == 1, +1 == 2 total). That is just enough for a one-character string: one char followed by a NUL terminator. So if you're trying to put more than 2 bytes (terminator included) in that buffer, that's an overflow --> UB --> possible segfault. If all you need is one char, better to bypass the malloc and simply use char c;, for example.
This is getting off topic, but the top two reasons you need dynamic memory allocation IMO are:
You won't know how much memory you'll need until runtime
You need "a lot" of memory (on your regular vanilla PC, I'd put this around 4+MB, but of course your mileage may vary depending on your needs and system).
SO has some good questions weighing the benefits of when to malloc or not, you can look them up.
If you know at compile time you'll need a max of 30 strings, only 30 chars long for example, then you can do something like
struct shard
{
sem_t *semaphore_o;
char input[30][30];
// .. etc
};
and
struct shard obj; // you only need one of these, malloc makes little sense IMO
obj.semaphore_o = malloc(..); // this doesn't need to be a pointer either
strcpy(obj.input[0], "hello there"); // fits in 29 chars plus NUL
strcpy(obj.input[5], "I'm Jim"); // same
My preference is always to use automatic (stack) storage when possible, it's simpler because it relieves you of the burden of memory management.

Memory allocation of nested char pointer

I have a question regarding the allocation of memory for a given char pointer inside a struct. The following typedef bson_value_t is given by an API and I would like to use it inside my own typedef ObjectInfo, shown in my code:
typedef struct _bson_value_t {
bson_type_t value_type;
union {
int64_t v_int64;
int32_t v_int32;
int8_t v_int8;
double v_double;
struct {
uint32_t len;
char *str;
} v_utf8;
} value;
} bson_value_t;
typedef struct _ObjectInfo {
char key[100];
bson_value_t value;
} ObjectInfo;
I have other data packages that contain hundreds of these ObjectInfo types, but all simply initialized like:
typedef struct _DataPackage {
    ObjectInfo single;
    ObjectInfo multiple[100];
    ...
} DataPackage;
So they do not contain any useful data yet. I would like to use strcpy to put a string to the location where char *str is pointing. But as far as I know that does not work because there is no allocated memory where *str is pointing to, right?
My question would be, how do I accomplish that without changing the given typedef bson_value_t? Do I need to allocate memory for any one bson_value_t that I initialized?
strcpy(DataPackage.single.value.value.v_utf8.str, "test");
That does not work, unless I change it to:
strcpy(&DataPackage.single.value.value.v_utf8.str, "test");
but this is giving me compiler warnings.
I would like to use strcpy to put a string to the location where char *str is pointing. But as far as I know that does not work because there is no allocated memory where *str is pointing to, right?
Right.
My question would be, how do I accomplish that without changing the given typedef bson_value_t? Do I need to allocate memory for any one bson_value_t that I initialized?
No, you do not (for this purpose) need to dynamically allocate memory for any bson_value_t. The space, if any, to which any of those value.v_utf8.str members point will be external to the bson_value_t, so arranging for that memory is a separate consideration.
As far as the data structure itself is concerned, the string data could be dynamically allocated, or they could be statically allocated (such as the contents of a string literal), or they could even be a locally-declared array (automatically allocated), though this last will present issues if the lifetime of the string data is shorter than that of the bson_value_t.
However, do make sure that you know what assumptions are made by the library from which you are drawing this. For example, if any of its functions assume that they can reallocate space to provide for lengthening the string, or if they assume that they can modify the contents in place, then such assumptions affect what kind of storage you need to provide.
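As one possible way to arrange that memory, the sketch below gives the string member its own heap copy. It uses a simplified stand-in struct, utf8_value, rather than the real bson_value_t, and the helper names utf8_value_set and utf8_value_clear are hypothetical. Whether the library expects storage obtained with plain malloc is an assumption that should be checked against its documentation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the v_utf8 member (illustration only). */
typedef struct {
    uint32_t len;
    char    *str;
} utf8_value;

/* Gives the value its own heap copy of `text`.
   Assumes v->str does not currently own any memory.
   Returns 0 on success, -1 on failure. */
int utf8_value_set(utf8_value *v, const char *text) {
    assert(v != NULL && text != NULL);
    size_t n = strlen(text);
    if (n > UINT32_MAX)
        return -1;              /* length would not fit in len */
    char *copy = malloc(n + 1); /* +1 for the NUL terminator */
    if (copy == NULL)
        return -1;
    memcpy(copy, text, n + 1);
    v->str = copy;
    v->len = (uint32_t)n;
    return 0;
}

/* Releases the copy made by utf8_value_set. */
void utf8_value_clear(utf8_value *v) {
    assert(v != NULL);
    free(v->str);
    v->str = NULL;
    v->len = 0;
}
```

The struct itself never grows; only the str pointer changes to refer to the separately allocated bytes, which is why the hundreds of ObjectInfo instances need no allocation of their own.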

Storing a pointer in C

I'm trying to create a memory allocation system, and part of this involves storing integers at pointer locations to create a sort of header. I store a couple of integers, and then two pointers (with locations to the next and prev spots in memory).
Right now I'm trying to figure out if I can store the pointer at a location that I could later use as the original pointer.
int * header;
int * prev;
int * next;
...
*(header+3) = prev;
*(header+4) = next;
Then later...
headerfunction(*(header+4));
would perform an operation using the pointer to the 'next' location in memory.
(code for illustration only)
Any help or suggestions greatly appreciated!
Don't do direct pointer manipulation. Structs were made to eliminate the need for you to do that directly.
Instead, do something a bit more like this:
typedef struct
{
    size_t cbSize;
} MyAwesomeHeapHeader;
void* MyAwesomeMalloc(size_t cbSize)
{
    MyAwesomeHeapHeader* header;
    void* internalAllocatorPtr;
    size_t cbAlloc;
    // TODO: Maybe I want a heap footer as well?
    // TODO: I should really check the following for an integer overflow:
    cbAlloc = sizeof(MyAwesomeHeapHeader) + cbSize;
    internalAllocatorPtr = MyAwesomeRawAllocator(cbAlloc);
    // TODO: Check for null
    header = (MyAwesomeHeapHeader*)internalAllocatorPtr;
    header->cbSize = cbSize; // record the requested size in the header
    // TODO: other fields here.
    // Hand back the address just past the header.
    return (uint8_t*)(internalAllocatorPtr) + sizeof(MyAwesomeHeapHeader);
}
What you are doing is not safe, because you are writing to a memory location that header does not point to: *(header+3) writes to a location 12 bytes past header (assuming a 4-byte int), and if that memory is held by another variable it will cause problems.
Instead, first allocate one big block of memory. Its start address is the base of your pool, and you can use some of its leading bytes, described by structures, to control the remaining memory.
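A minimal sketch of that layout follows, assuming the big block comes from malloc. BlockHeader, pool_create, block_payload and block_from_payload are hypothetical names for this example, and splitting the block or aligning payloads is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header stored at the start of each block inside the pool. */
typedef struct block_header {
    size_t               size;   /* payload bytes that follow the header */
    int                  in_use;
    struct block_header *prev;   /* neighbouring blocks, NULL at the ends */
    struct block_header *next;
} BlockHeader;

/* Allocates the pool and formats it as one free block. */
BlockHeader *pool_create(size_t bytes) {
    BlockHeader *h;
    if (bytes < sizeof *h)
        return NULL;
    h = malloc(bytes);
    if (h == NULL)
        return NULL;
    h->size   = bytes - sizeof *h;
    h->in_use = 0;
    h->prev   = NULL;
    h->next   = NULL;
    return h;
}

/* Address handed to the user: just past the header. */
void *block_payload(BlockHeader *h) {
    assert(h != NULL);
    return (unsigned char *)h + sizeof *h;
}

/* Recovers the header from a payload pointer, e.g. inside a free routine. */
BlockHeader *block_from_payload(void *payload) {
    assert(payload != NULL);
    return (BlockHeader *)((unsigned char *)payload - sizeof(BlockHeader));
}
```

Because prev and next are real pointer members, they can be stored and read back without casts or manual offsets such as header+3.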
Akp is correct, just looking at what you are trying to accomplish in your code segment, if you are trying to store integer pointers in header, header should be defined as such:
int **header;
and then memory should be allocated for it.
With regards to the actual memory allocation, if on a Unix machine, you should look into the brk() syscall.
You are building a memory allocation system, and thus we assume you have a chunk of memory somewhere that you can use freely to manage allocations and frees.
As per your question, the header pointer is allocated in the heap memory (by the compiler and libraries) - and you may wonder if it is safe to use that memory since you are allocating memory. It depends on your system, and if there is another (system) memory allocation management.
But what you could do is
int main(void) {
    void *header;
    void *prev;
    void *next;
    manage_memory_allocations(&header, &prev, &next); // never returns
}
In this case, the pointers are created on the stack - so the allocation depends on the memory where the processor stack points to.
Note the "never returns" as the memory is "freed" as soon as main ends.

Resources