Avoiding freeing a string literal - c

Suppose you have a function in C that takes ownership of whatever is passed into it, such as a function that adds a struct to a vector buffer by value, and this struct contains a member pointer to a character array (a string).
During the buffer's cleanup routine, it should release the strings that it owns, but what if some strings are allocated at runtime while others are string literals fixed at compile time?
There is no safe and standard (non-proprietary) way to detect if a char* points to read-only memory, so what is this hypothetical freeVector function to do with a pointer to a char buffer?
#include <stdlib.h>

struct Element {
    int id;
    char* name;
};
struct Vector {
    size_t maxIndex;
    size_t length;
    struct Element buffer[];
};
void addToVector(struct Vector* vector, struct Element element) {
    // lazy-reallocation logic here if maxIndex >= length
    vector->buffer[ vector->maxIndex++ ] = element; // by-value copy
}
void freeVector(struct Vector* vector) {
    for(size_t i = 0; i < vector->maxIndex; i++ ) {
        free( vector->buffer[ i ].name ); // undefined behavior (typically a segfault/AV) if name is a literal
    }
}

The blessing and the curse of C is that it leaves this totally up to you. Two choices are to allocate everything on the heap and to define a fat pointer type that includes a bit to say whether each instance needs freeing. A clever albeit non-portable implementation might use a low-order bit of the pointer itself, because on many architectures the bottom 2 bits or more of all pointers are always zero. Garbage collectors have used this trick to distinguish pointers from unboxed discrete types (fixnums in the biz) almost forever.
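The fat-pointer option could look roughly like the sketch below. It is only one way to do it, and everything in it is made up for illustration: the struct OwnedElement, its owns_name flag, and the helpers element_from_literal, element_from_copy and element_release are not part of the question's code.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of struct Element that records ownership. */
struct OwnedElement {
    int   id;
    char *name;
    bool  owns_name;   /* true: name came from malloc and must be freed */
};

/* Wrap a string literal (or any storage the element must not free). */
struct OwnedElement element_from_literal(int id, const char *literal) {
    struct OwnedElement e = { id, (char *)literal, false };
    return e;
}

/* Take a private heap copy of s; on allocation failure name stays NULL. */
struct OwnedElement element_from_copy(int id, const char *s) {
    struct OwnedElement e = { id, NULL, false };
    size_t n = strlen(s) + 1;
    e.name = malloc(n);
    if (e.name != NULL) {
        memcpy(e.name, s, n);
        e.owns_name = true;
    }
    return e;
}

/* Free the name only if this element owns it. */
void element_release(struct OwnedElement *e) {
    if (e->owns_name) {
        free(e->name);
    }
    e->name = NULL;
    e->owns_name = false;
}
```

A cleanup routine such as freeVector would then call element_release on every slot instead of calling free on the name directly.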
If you allow more than one pointer to the same object (think graph data structure), then things get more complex or interesting depending on your point of view. For this, you'll probably need a garbage collection scheme: obstacks, reference counting, mark and sweep, arena copying, etc. Other languages tend to give you one of these as a built-in or (as in C++) language features deliberately meant to support implementing one or more yourself. With C, not so much...
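Of the schemes listed, reference counting is probably the smallest to sketch. The following is a minimal illustration under my own assumptions: the RcString type and the rc_new, rc_retain and rc_release names are hypothetical, and the code is not thread-safe.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string: the count lives next to the data. */
struct RcString {
    size_t refs;
    char   text[];   /* C99 flexible array member */
};

/* Create a string with one reference; returns NULL on allocation failure. */
struct RcString *rc_new(const char *s) {
    size_t n = strlen(s) + 1;
    struct RcString *r = malloc(sizeof *r + n);
    if (r != NULL) {
        r->refs = 1;
        memcpy(r->text, s, n);
    }
    return r;
}

/* Register one more owner and hand back the same pointer. */
struct RcString *rc_retain(struct RcString *r) {
    r->refs++;
    return r;
}

/* Drop one owner; frees the storage when the last one lets go.
   Returns the number of owners remaining. */
size_t rc_release(struct RcString *r) {
    size_t left = --r->refs;
    if (left == 0) {
        free(r);
    }
    return left;
}
```

Every structure that stores the pointer calls rc_retain once and rc_release once, so nobody has to know who the "last" owner is.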

Related

malloc'd pointer inside struct that is passed by value

I am putting together a project in C where I must pass around a variable length byte sequence, but I'm trying to limit malloc calls due to potentially limited heap.
Say I have a struct, my_struct, that contains the variable length byte sequence, ptr, and a function, my_func, that creates an instance of my_struct. In my_func, my_struct.ptr is malloc'd and my_struct is returned by value. my_struct will then be used by other functions being passed by value: another_func. Code below.
Is this "safe" to do against memory leaks provided somewhere on the original or any copy of my_struct when passed by value, I call my_struct_destroy or free the malloc'd pointer? Specifically, is there any way that when another_func returns, that inst.ptr is open to being rewritten or dangling?
Since stackoverflow doesn't like opinion-based questions, are there any good references that discuss this behavior? I'm not sure what to search for.
#include <stdlib.h>

typedef struct {
    char * ptr;
} my_struct;
// allocates n bytes to pointer in structure and initializes.
my_struct my_func(size_t n) {
    my_struct out = {(char *) malloc(n)};
    /* initialization of out.ptr */
    return out;
}
void another_func(my_struct inst) {
    /*
    do something using the passed-by-value inst
    are there problems with inst.ptr here or after this function returns?
    */
}
void my_struct_destroy(my_struct * ms_ptr) {
    free(ms_ptr->ptr);
    ms_ptr->ptr = NULL;
}
int main() {
    my_struct inst = my_func(20);
    another_func(inst);
    my_struct_destroy(&inst);
}
It's safe to pass and return a struct containing a pointer by value as you did. The callee gets a copy of the struct, and therefore a copy of ptr that points to the same buffer. Nothing is changed in the calling function. There would, of course, be a big problem if another_func frees ptr and then the caller tries to use it or free it again.
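To make the copy semantics concrete, here is a small sketch. The byte_seq type stands in for my_struct from the question, and write_first_byte is a hypothetical helper invented for this example.

```c
#include <stdlib.h>

typedef struct {
    char *ptr;
} byte_seq;   /* stands in for my_struct from the question */

/* Receives a copy of the struct; the copy's ptr still points at the
   caller's buffer, so writes through it are visible to the caller. */
void write_first_byte(byte_seq inst, char value) {
    inst.ptr[0] = value;
    inst.ptr = NULL;   /* changes only the local copy, not the caller's struct */
}
```

After the call, the caller's ptr member is unchanged and still valid, but the byte it points to has been modified. The buffer is shared; only the struct holding the pointer is copied.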
Locality of alloc+free is a best practice. Wherever possible, make the function that allocates an object also responsible for freeing it. Where that's not feasible, malloc and free of the same object should be in the same source file. Where that's not possible (think complex graph data structure with deletes), the collection of files that manage objects of a given type should be clearly identified and conventions documented. There's a common technique useful for programs (like compilers) that work in stages where much of the memory allocated in one stage should be freed before the next starts. Here, memory is only malloced in big blocks by a manager. From these, the manager allocs objects of any size. But it knows only one way to free: all at once, presumably at the end of a stage. This is a gcc idea: obstacks. When allocation is more complex, bigger systems implement some kind of garbage collector. Beyond these ideas, there are as many ways to manage C storage as there are colors. Sorry I don't have any pointers to references (pun intended :)
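The "allocate from big blocks, free all at once" idea can be sketched in a few lines. This is not the GNU obstack API; it is a hypothetical single-block arena (struct Arena, arena_init, arena_alloc, arena_free_all are names I made up) that never grows and only shows the shape of the technique.

```c
#include <stddef.h>
#include <stdlib.h>

/* Rounding unit: sizeof of this union is a multiple of the strictest
   alignment among its members, which covers ordinary object types. */
union ArenaAlign { long long ll; long double ld; void *p; void (*fp)(void); };

/* Hypothetical minimal arena: one malloc'd block, bump allocation,
   and a single "free everything" operation. */
struct Arena {
    unsigned char *base;
    size_t         size;
    size_t         used;
};

/* Returns 1 on success, 0 if the backing block could not be allocated. */
int arena_init(struct Arena *a, size_t size) {
    a->base = malloc(size);
    a->size = (a->base != NULL) ? size : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Hand out n bytes, rounded up to the alignment unit.
   Returns NULL when the arena cannot satisfy the request. */
void *arena_alloc(struct Arena *a, size_t n) {
    const size_t align = sizeof(union ArenaAlign);
    size_t rounded = (n + align - 1) / align * align;
    if (rounded < n || rounded > a->size - a->used) {
        return NULL;
    }
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* Release every object handed out by this arena in one call. */
void arena_free_all(struct Arena *a) {
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}
```

A stage of the program would call arena_init at its start, arena_alloc for every object it creates, and arena_free_all once at its end, so there is no per-object free to forget.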
If you only have one variable-length field and its size doesn't need to be dynamically updated, consider making the last field in the struct an array to hold it. This is okay with the C standard:
typedef struct {
    ... other fields
    char a[1]; // variable length
} my_struct;
my_struct *my_func(size_t n) {
    my_struct *p = malloc(sizeof *p + (n - 1) * sizeof p->a[0]);
    ... initialize fields of p
    return p;
}
This avoids the need to separately free the variable length field. Unfortunately it only works for one.
If you're okay with gcc extensions, you can declare the array with size zero. In C99, you can get the same effect with a[]. Either form avoids the - 1 in the size calculation.
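For completeness, the C99 a[] form might look like the sketch below. The flex_buf type, its len field and the flex_buf_new helper are assumptions of mine, not part of the question's code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical C99 version of the struct: the array is declared with []. */
typedef struct {
    size_t len;      /* number of bytes stored in data */
    char   data[];   /* flexible array member, must be the last member */
} flex_buf;

/* One allocation holds the header and n bytes of payload copied from src.
   Returns NULL on allocation failure. */
flex_buf *flex_buf_new(const void *src, size_t n) {
    flex_buf *p = malloc(sizeof *p + n * sizeof p->data[0]);
    if (p != NULL) {
        p->len = n;
        memcpy(p->data, src, n);
    }
    return p;
}
```

A single free(p) releases the header and the payload together, which is the point of the technique.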

Is it possible to define a pointer without a temp/aux variable? (Or would this be bad C-coding?)

I'm trying to understand C-pointers. As background, I'm used to coding in both C# and Python3.
I understand that pointers can be used to save the address of a variable (writing something like type* ptr = &var;) and that incrementing a pointer is equivalent to incrementing the index into an array of objects of that type. But what I don't understand is whether or not you can use pointers and dereferenced objects of the type (e.g. int) without referencing an already-defined variable.
I couldn't think of a way to do this, and most of the examples of C/C++ pointers all seem to use them to reference a variable. So it might be that what I'm asking is either impossible and/or bad coding practice. If so, it would be helpful to understand why.
For example, to clarify my confusion, if there is no way to use pointers without using predefined hard-coded variables, why would you use pointers at all instead of the basic object directly, or arrays of objects?
There is a short piece of code below to describe my question formally.
Many thanks for any advice!
// Learning about pointers and C-coding techniques.
#include <stdio.h>
/* Is there a way to define the int-pointer age WITHOUT the int variable auxAge? */
int main() // no command-line params being passed
{
    int auxAge = 12345;
    int* age = &auxAge;
    // *age is an int, and age is an int* (i.e. age is a pointer-to-an-int, just an address to somewhere in memory where data defining some int is expected)
    // do stuff with my *age int e.g. "(*age)++;" or "*age = 37;"
    return 0;
}
Yes, you can use dynamic memory (also known as "heap") allocation:
#include <stdio.h>
#include <stdlib.h>
int * const integer = malloc(sizeof *integer);
if (integer != NULL)
{
    *integer = 4711;
    printf("forty seven eleven is %d\n", *integer);
    free(integer);
    // At this point we can no longer use the pointer, the memory is not ours any more.
}
This asks the C library to allocate some memory from the operating system and return a pointer to it. Allocating sizeof *integer bytes makes the allocation fit an integer exactly, and we can then use *integer to dereference the pointer, that will work pretty much exactly like referencing an integer directly.
There are many good reasons to use pointers in C. One of them is that C only passes arguments by value; you cannot pass by reference. Passing a pointer to an existing variable therefore saves you the overhead of copying it to the stack. As an example, let's assume this very large structure:
struct very_large_structure {
    uint8_t kilobyte[1024];
};
And now assume a function which needs to use this structure:
bool has_zero(struct very_large_structure structure) {
    for (size_t i = 0; i < sizeof(structure); i++) {
        if (0 == structure.kilobyte[i]) {
            return true;
        }
    }
    return false;
}
So for this function to be called, you need to copy the whole structure to the stack, and that can be an unacceptable requirement, especially on embedded platforms where C is widely used.
If you pass the structure via a pointer, you copy only the pointer itself to the stack, typically a 32-bit or 64-bit number:
bool has_zero(const struct very_large_structure *structure) {
    for (size_t i = 0; i < sizeof(*structure); i++) {
        if (0 == structure->kilobyte[i]) {
            return true;
        }
    }
    return false;
}
This is by no means the only or the most important use of pointers, but it clearly shows why pointers are important in C.
But what I don't understand is whether or not you can use pointers and dereferenced objects of the type (e.g. int) without referencing an already-defined variable.
Yes, there are two cases where this is possible.
The first case occurs with dynamic memory allocation. You use the malloc, calloc, or realloc functions to allocate memory from a dynamic memory pool (the "heap"):
int *ptr = malloc( sizeof *ptr ); // allocate enough memory for a single `int` object
*ptr = some_value;
The second case occurs where you have a fixed, well-defined address for an I/O channel or port or something:
char *port = (char *) 0xDEADBEEF;
although this is more common in embedded systems than general applications programming.
EDIT
Regarding the second case, chapter and verse:
6.3.2.3 Pointers
...
5 An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation. (67)
(67) The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to
be consistent with the addressing structure of the execution environment.
Parameters to a function in C are always passed by value, so changing a parameter's value inside a function isn't reflected in the caller. You can, however, use pointers to emulate pass by reference. For example:
#include <stdio.h>

void clear(int *x)
{
    *x = 0;
}
int main()
{
    int a = 4;
    printf("a=%d\n", a); // prints 4
    clear(&a);
    printf("a=%d\n", a); // prints 0
    return 0;
}
You can also use pointers to point to dynamically allocated memory:
#include <stdio.h>
#include <stdlib.h>

int *getarray(int size)
{
    int *array = malloc(size * sizeof *array);
    if (!array) {
        perror("malloc failed");
        exit(1);
    }
    return array;
}
These are just a few examples.
Most common reason: because you wish to modify the contents without passing them around.
Analogy:
If you want your living room painted, you don't want to place your house on a truck trailer, move it to the painter, let him do the job and then haul it back. It would be expensive and time consuming. And if your house is too wide to be hauled around on the streets, the truck might crash. You would rather tell the painter your address, have him go there and do the job.
In C terms, if you have a big struct or similar, you'll want a function to access this struct without making a copy of it, passing the copy to the function, and then copying the modified contents back into the original variable.
// BAD CODE, DON'T DO THIS
typedef struct { ... } really_big;
really_big rb;
rb = do_stuff(rb);
...
really_big do_stuff (really_big thing) // pass by value, return by value
{
    thing.something = ...;
    ...
    return thing;
}
This makes a copy of rb called thing. It is placed on the stack, wasting lots of memory and needlessly increasing the stack space used, increasing the possibility of stack overflow. And copying the contents from rb to thing takes lots of execution time. Then when it is returned, you make yet another copy, from thing back to rb.
By passing a pointer to the struct, none of the copying takes place, but the end result is the very same:
void do_stuff (really_big* thing)
{
    thing->something = ...;
}

How to include a variable-sized array as struct member in C?

I must say, I have quite a conundrum in a seemingly elementary problem. I have a structure in which I would like to store an array as a field. I'd like to reuse this structure in different contexts, and sometimes I need a bigger array, sometimes a smaller one. C prohibits the use of a variable-sized buffer. So the natural approach would be declaring a pointer to this array as a struct member:
struct my {
    struct other* array;
};
The problem with this approach however, is that I have to obey the rules of MISRA-C, which prohibits dynamic memory allocation. So then if I'd like to allocate memory and initialize the array, I'm forced to do:
var.array = malloc(n * sizeof(...));
which is forbidden by MISRA standards. How else can I do this?
Since you are following MISRA-C, I would guess that the software is somehow mission-critical, in which case all memory allocation must be deterministic. Heap allocation is banned by every safety standard out there, not just by MISRA-C but by the more general safety standards as well (IEC 61508, ISO 26262, DO-178 and so on).
In such systems, you must always design for the worst-case scenario, which will consume the most memory. You need to allocate exactly that much space, no more, no less. Everything else does not make sense in such a system.
Given those pre-requisites, you must allocate a static buffer of size LARGE_ENOUGH_FOR_WORST_CASE. Once you have realized this, you simply need to find a way to keep track of what kind of data you have stored in this buffer, by using an enum and maybe a "size used" counter.
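A fixed worst-case buffer with a tag and a "size used" counter might be laid out as in the sketch below. All names here (MAX_ITEMS, enum payload_kind, struct payload, payload_store) and the capacity of 16 are placeholders I chose for illustration; the sketch shows the layout only and makes no claim about conformance to any particular MISRA-C rule set.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ITEMS 16u   /* assumed worst case; pick the real bound per project */

/* Hypothetical tag saying what the buffer currently holds. */
enum payload_kind { PAYLOAD_NONE, PAYLOAD_SAMPLES, PAYLOAD_IDS };

struct payload {
    enum payload_kind kind;
    size_t            used;              /* elements in use, <= MAX_ITEMS */
    int32_t           items[MAX_ITEMS];  /* fixed worst-case storage */
};

/* Copy n values into the fixed buffer. Returns 0 on success and -1 if
   n exceeds the worst-case capacity (nothing is written in that case). */
int payload_store(struct payload *p, enum payload_kind kind,
                  const int32_t *src, size_t n) {
    if (n > MAX_ITEMS) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        p->items[i] = src[i];
    }
    p->kind = kind;
    p->used = n;
    return 0;
}
```

Every instance costs the worst-case amount of memory, but the total is known at link time, which is exactly the determinism such systems ask for.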
Please note that not just malloc/calloc, but also VLAs and flexible array members are banned by MISRA-C:2012. And if you are using C90/MISRA-C:2004, there are no VLAs, nor is there any well-defined use of flexible array members: they invoked undefined behavior until C99.
Edit: This solution does not conform to MISRA-C rules.
You can kind of include VLAs in a struct definition, but only when it's inside a function. A way to get around this is to use a "flexible array member" at the end of your main struct, like so:
#include <stdio.h>
struct my {
int len;
int array[];
};
You can create functions that operate on this struct.
void print_my(struct my *my) {
    int i;
    for (i = 0; i < my->len; i++) {
        printf("%d\n", my->array[i]);
    }
}
Then, to create variable length versions of this struct, you can create a new type of struct in your function body, containing your my struct, but also defining a length for that buffer. This can be done with a varying size parameter. Then, for all the functions you call, you can just pass around a pointer to the contained struct my value, and they will work correctly.
void create_and_use_my(int nelements) {
    int i;
    // Declare the containing struct with variable number of elements.
    struct {
        struct my my;
        int array[nelements];
    } my_wrapper;
    // Initialize the values in the struct.
    my_wrapper.my.len = nelements;
    for (i = 0; i < nelements; i++) {
        my_wrapper.my.array[i] = i;
    }
    // Print the struct using the generic function above.
    print_my(&my_wrapper.my);
}
With GCC you can call this function with any positive value of nelements. It needs at least C99 for the flexible array member, and it additionally relies on GCC extensions: standard C allows neither a VLA as a struct member nor a struct with a flexible array member nested inside another struct.
Important: If you pass the struct my to another function, and not a pointer to it, I can pretty much guarantee you it will cause all sorts of errors, since it won't copy the variable length array with it.
Here's a thought that may be totally inappropriate for your situation, but given your constraints I'm not sure how else to deal with it.
Create a large static array and use this as your "heap":
static struct other heap[SOME_BIG_NUMBER];
You'll then "allocate" memory from this "heap" like so:
var.array = &heap[start_point];
You'll have to do some bookkeeping to keep track of what parts of your "heap" have been allocated. This assumes that you don't have any major constraints on the size of your executable.
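The simplest possible bookkeeping is a single "next free index", as in the sketch below. POOL_SIZE stands in for SOME_BIG_NUMBER, the body of struct other is a placeholder, and pool_take and pool_reset are hypothetical names; individual blocks cannot be returned in this version.

```c
#include <stddef.h>

#define POOL_SIZE 64   /* stands in for SOME_BIG_NUMBER */

struct other { int value; };   /* placeholder definition for illustration */

static struct other pool[POOL_SIZE];
static size_t pool_next = 0;   /* index of the first unused element */

/* Reserve n consecutive elements; returns NULL if the pool is exhausted. */
struct other *pool_take(size_t n) {
    if (n > POOL_SIZE - pool_next) {
        return NULL;
    }
    struct other *p = &pool[pool_next];
    pool_next += n;
    return p;
}

/* The only way to give memory back in this sketch: reset the whole pool. */
void pool_reset(void) {
    pool_next = 0;
}
```

The assignment from the answer then becomes var.array = pool_take(n); followed by a NULL check.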

Why do we use zero length array instead of pointers?

It's said that a zero-length array is for variable-length structures, which I can understand. But what puzzles me is why we don't simply use a pointer; we can dereference it and allocate a different-size structure in the same way.
EDIT - Added example from comments
Assuming:
struct p
{
char ch;
int *arr;
};
We can use this:
struct p *p = malloc(sizeof(*p) + (sizeof(int) * n));
p->arr = (int *)(p + 1);
To get a contiguous chunk of memory. However, I seemed to forget the space p->arr occupies and it seems to be a disparate thing from the zero size array method.
If you use a pointer, the structure would no longer be of variable length: it will have fixed length, but its data will be stored in a different place.
The idea behind zero-length arrays* is to store the data of the array "in line" with the rest of the data in the structure, so that the array's data follows the structure's data in memory. Pointer to a separately allocated region of memory does not let you do that.
* Such arrays are also known as flexible arrays; in C99 you declare them as element_type flexArray[] instead of element_type flexArray[0], i.e. you drop zero.
The pointer isn't really needed, so it costs space for no benefit. Also, it might imply another level of indirection, which also isn't really needed.
Compare these example declarations, for a dynamic integer array:
typedef struct {
size_t length;
int data[0];
} IntArray1;
and:
typedef struct {
size_t length;
int *data;
} IntArray2;
Basically, the pointer expresses "the first element of the array is at this address, which can be anything" which is more generic than is typically needed. The desired model is "the first element of the array is right here, but I don't know how large the array is".
Of course, the second form makes it possible to grow the array without risking that the "base" address (the address of the IntArray2 structure itself) changes, which can be really neat. You can't do that with IntArray1, since you need to allocate the base structure and the integer data elements together. Trade-offs, trade-offs ...
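The "grow without moving the base" property of the pointer form can be sketched as follows. The intarray2_resize function is a hypothetical helper, not something from the answer.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    size_t length;
    int   *data;
} IntArray2;

/* Resize the element storage. The IntArray2 object itself stays where
   it is; only arr->data may change. Returns 0 on success, -1 on failure
   (in which case the array is left untouched). */
int intarray2_resize(IntArray2 *arr, size_t new_length) {
    if (new_length == 0 || new_length > SIZE_MAX / sizeof *arr->data) {
        return -1;
    }
    int *grown = realloc(arr->data, new_length * sizeof *grown);
    if (grown == NULL) {
        return -1;
    }
    arr->data = grown;
    arr->length = new_length;
    return 0;
}
```

Anyone holding a pointer to the IntArray2 keeps a valid pointer across the resize. With IntArray1 the whole block would have to be reallocated, so every such pointer would need updating.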
These are various forms of the so-called "struct hack", discussed in question 2.6 of the comp.lang.c FAQ.
Defining an array of size 0 is actually illegal in C, and has been at least since the 1989 ANSI standard. Some compilers permit it as an extension, but relying on that leads to non-portable code.
A more portable way to implement this is to use an array of length 1, for example:
struct foo {
size_t len;
char str[1];
};
You could allocate more than sizeof (struct foo) bytes, using len to keep track of the allocated size, and then access str[N] to get the Nth element of the array. Since C compilers typically don't do array bounds checking, this would generally "work". But, strictly speaking, the behavior is undefined.
The 1999 ISO standard added a feature called "flexible array members", intended to replace this usage:
struct foo {
size_t len;
char str[];
};
You can deal with these in the same way as the older struct hack, but the behavior is well defined. But you have to do all the bookkeeping yourself; sizeof (struct foo) still doesn't include the size of the array, for example.
You can, of course, use a pointer instead:
struct bar {
size_t len;
char *ptr;
};
And this is a perfectly good approach, but it has different semantics. The main advantage of the "struct hack", or of flexible array members, is that the array is allocated contiguously with the rest of the structure, and you can copy the array along with the structure using memcpy (as long as the target has been properly allocated). With a pointer, the array is allocated separately -- which may or may not be exactly what you want.
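A sketch of that "copy the array along with the structure" property, using the flexible array member version of struct foo. The helpers foo_new and foo_clone are names invented for this example.

```c
#include <stdlib.h>
#include <string.h>

struct foo {
    size_t len;
    char   str[];
};

/* Allocate a struct foo holding a copy of the len bytes at s.
   Returns NULL on allocation failure. */
struct foo *foo_new(const char *s, size_t len) {
    struct foo *f = malloc(sizeof *f + len);
    if (f != NULL) {
        f->len = len;
        memcpy(f->str, s, len);
    }
    return f;
}

/* Duplicate header and array together: one allocation, one memcpy. */
struct foo *foo_clone(const struct foo *src) {
    size_t total = sizeof *src + src->len;
    struct foo *copy = malloc(total);
    if (copy != NULL) {
        memcpy(copy, src, total);
    }
    return copy;
}
```

Note that the size has to be computed by hand from len, since sizeof (struct foo) covers only the fixed part. With struct bar the same clone would need two allocations and a pointer fix-up.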
This is because with a pointer you need a separate allocation and assignment.
struct WithPointer
{
    int someOtherField;
    ...
    int* array;
};
struct WithArray
{
    int someOtherField;
    ...
    int array[1];
};
To get an 'object' of WithPointer you need to do:
struct WithPointer* withPointer = malloc(sizeof(struct WithPointer));
withPointer->array = malloc(ARRAY_SIZE * sizeof(int));
To get an 'object' of WithArray:
struct WithArray* withArray = malloc(sizeof(struct WithArray) +
                                     (ARRAY_SIZE - 1) * sizeof(int));
That's it.
In some cases it's also very handy, or even necessary, to have the array in consecutive memory; for example in network protocol packets.

Copying one structure to another

I know that I can copy the structure member by member, instead of that can I do a memcpy on structures?
Is it advisable to do so?
In my structure, I have a string also as member which I have to copy to another structure having the same member. How do I do that?
Copying by plain assignment is best, since it's shorter, easier to read, and has a higher level of abstraction. Instead of saying (to the human reader of the code) "copy these bits from here to there", and requiring the reader to think about the size argument to the copy, you're just doing a plain assignment ("copy this value from here to here"). There can be no hesitation about whether or not the size is correct.
Also, if the structure is heavily padded, assignment might make the compiler emit something more efficient, since it doesn't have to copy the padding (and it knows where it is), whereas memcpy() has no such knowledge and will always copy the exact number of bytes you tell it to copy.
If your string is an actual array, i.e.:
struct {
char string[32];
size_t len;
} a, b;
strcpy(a.string, "hello");
a.len = strlen(a.string);
Then you can still use plain assignment:
b = a;
To get a complete copy. For variable-length data modelled like this though, this is not the most efficient way to do the copy since the entire array will always be copied.
Beware though, that copying structs that contain pointers to heap-allocated memory can be a bit dangerous, since by doing so you're aliasing the pointer, and typically making it ambiguous who owns the pointer after the copying operation.
For these situations a "deep copy" is really the only choice, and that needs to go in a function.
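Such a deep-copy function might look like the sketch below. The struct person type and the person_copy name are made up for illustration, and the sketch assumes name is either NULL or a '\0'-terminated string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct that owns a heap-allocated string. */
struct person {
    int   id;
    char *name;
};

/* Deep copy: dst gets its own copy of src->name.
   Returns 0 on success, -1 if the allocation fails (dst is untouched). */
int person_copy(struct person *dst, const struct person *src) {
    char *name = NULL;
    if (src->name != NULL) {
        size_t n = strlen(src->name) + 1;
        name = malloc(n);
        if (name == NULL) {
            return -1;
        }
        memcpy(name, src->name, n);
    }
    *dst = *src;        /* plain assignment copies the non-pointer members */
    dst->name = name;   /* replace the aliased pointer with the private copy */
    return 0;
}
```

After a successful call each struct owns its own string, so each one can be freed independently.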
Since C90, you can simply use:
dest_struct = source_struct;
as long as the string is stored inside an array:
struct xxx {
char theString[100];
};
Otherwise, if it's a pointer, you'll need to copy it by hand.
struct xxx {
char* theString;
};
dest_struct = source_struct;
dest_struct.theString = malloc(strlen(source_struct.theString) + 1);
strcpy(dest_struct.theString, source_struct.theString);
If the structures are of compatible types, yes, you can, with something like:
memcpy (dest_struct, source_struct, sizeof (*dest_struct));
The only thing you need to be aware of is that this is a shallow copy. In other words, if you have a char * pointing to a specific string, both structures will point to the same string.
And changing the contents of one of those string fields (the data that the char * points to, not the char * itself) will change the other as well.
If you want an easy copy without having to manually do each field but with the added bonus of non-shallow string copies, use strdup:
memcpy (dest_struct, source_struct, sizeof (*dest_struct));
dest_struct->strptr = strdup (source_struct->strptr);
This will copy the entire contents of the structure, then deep-copy the string, effectively giving a separate string to each structure.
And, if your C implementation doesn't have a strdup (it comes from POSIX and only became part of ISO C with C23), it is easy to write one yourself.
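A replacement can be as short as the sketch below. The name dup_string is my own choice, picked so that it cannot clash with a library-provided strdup.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strdup. Returns a heap copy of s, or NULL
   if the allocation fails. The caller frees the result. */
char *dup_string(const char *s) {
    size_t n = strlen(s) + 1;   /* include the terminating '\0' */
    char *copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, s, n);
    }
    return copy;
}
```

As with strdup itself, the result should be checked for NULL before it is stored in the destination struct.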
You can memcpy structs, or you can just assign them like any other value.
struct {int a, b;} c, d;
c.a = c.b = 10;
d = c;
In C, memcpy is only foolishly risky. As long as you get all three parameters exactly right, none of the struct members are pointers (or, you explicitly intend to do a shallow copy) and there aren't large alignment gaps in the struct that memcpy is going to waste time looping through (or performance never matters), then by all means, memcpy. You gain nothing except code that is harder to read, fragile to future changes and has to be hand-verified in code reviews (because the compiler can't), but hey yeah sure why not.
In C++, we advance to the ludicrously risky. You may have members of types which are not safely memcpyable, like std::string, which will cause your receiving struct to become a dangerous weapon, randomly corrupting memory whenever used. You may get surprises involving virtual functions when emulating slice-copies. The optimizer, which can do wondrous things for you because it has a guarantee of full type knowledge when it compiles =, can do nothing for your memcpy call.
In C++ there's a rule of thumb - if you see memcpy or memset, something's wrong. There are rare cases when this is not true, but they do not involve structs. You use memcpy when, and only when, you have reason to blindly copy bytes.
Assignment on the other hand is simple to read, checks correctness at compile time and then intelligently moves values at runtime. There is no downside.
You can use the following solution to accomplish your goal:
#include <stdio.h>

struct student
{
    char name[20];
    char country[20];
};
int main(void)
{
    struct student S={"Wolverine","America"};
    struct student X;
    X=S;
    printf("%s%s",X.name,X.country);
    return 0;
}
1. You can use a struct to read and write into a file.
You do not need to cast it as a char*.
Struct size will also be preserved.
(This point is not the closest to the topic, but consider it:
working with hard storage is often similar to working with RAM.)
2. To move (to and from) a single string field you must use strncpy
and a transient '\0'-terminated string buffer.
Somewhere you must remember the length of the record string field.
3. To move other fields you can use the dot notation, e.g.:
NodeB->one=intvar;
floatvar2=(NodeA->insidebisnode_subvar).myfl;
struct mynode {
    int one;
    int two;
    char txt3[3];
    struct{char txt2[6];}txt2fi;
    struct insidenode{
        char txt[8];
        long int myl;
        void * mypointer;
        size_t myst;
        long long myll;
    } insidenode_subvar;
    struct insidebisnode{
        float myfl;
    } insidebisnode_subvar;
} mynode_subvar;
typedef struct mynode* Node;
...(main)
Node NodeA=malloc...
Node NodeB=malloc...
4. You can embed each string into a struct that fits it,
to evade point-2 and behave like Cobol:
NodeB->txt2fi=NodeA->txt2fi
...but you will still need a transient string
plus one strncpy as mentioned at point-2 for scanf and printf,
otherwise a longer (shorter) operator input
would not be truncated (padded with spaces).
5. (NodeB->insidenode_subvar).mypointer=(NodeA->insidenode_subvar).mypointer
will create a pointer alias.
6. NodeB->txt3=NodeA->txt3
causes the compiler to reject:
error: incompatible types when assigning to type ‘char[3]’ from type ‘char *’
7. point-4 works only because NodeB->txt2fi and NodeA->txt2fi have the same type!
I found a correct and simple answer to this topic in the question
"In C, why can't I assign a string to a char array after it's declared?":
"Arrays (also of chars) are second-class citizens in C"!

Resources