Sacrificing expression of intent for memory management - c

I'm pretty new at C programming, and this type of thing keeps popping up. As a simple example, suppose I have a struct http_header with some char pointers:
struct http_header {
char* name;
char* value;
};
I want to fill an http_header where value is the string representation of an int. I "feel" like, semantically, I should be able to write a function that takes in an empty header pointer, a name string, and an int and fills out the header appropriately.
void fill_header(struct http_header *h, char* name, int value)
{
h->name = name;
char *value_str = malloc(100);
sprintf(value_str, "%d", value);
h->value = value_str;
}
int main(int argc, const char * argv[])
{
struct http_header h;
char *name = "Header Name";
int val = 42;
fill_header(&h, name, val);
...
free(h.value);
}
Here, the calling code reads exactly as my intent, but in this case I'm creating the value string dynamically, which means I'd have to free it later. That doesn't smell right to me; it seems like the caller then knows too much about the implementation of fill_header. And in actual implementations it may not be so easy to know what to free: consider filling an array of http_headers where only one of them needed to have its value malloced.
To get around this, I'd have to create the string beforehand:
void fill_header2(struct http_header *h, char* name, char *value_str)
{
h->name = name;
h->value = value_str;
}
int main(int argc, const char * argv[])
{
struct http_header h;
char *name = "Header Name";
int value = 42;
char value_str[100];
sprintf(value_str, "%d", value);
fill_header2(&h, name, value_str);
}
As this pattern continues down the chain of structures with pointers to other structures, I end up doing so much work in top-level functions that the lower-level ones seem hardly worth it. Furthermore, I've essentially sacrificed the "fill a header with an int" idea which I set out to write in the first place. Am I missing something here? Is there some pattern or design choice that will make my life easier and keep my function calls expressing my intent?
P.S. Thanks to all at Stack Overflow for being the best professor I've ever had.

Well, I would go with the first approach (with a twist), and also provide a destroy function:
struct http_header *make_header(char *name, int value)
{
struct http_header *h = malloc(sizeof *h);
/* ... */
return h;
}
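void destroy_header(struct http_header *h)
{
free(h->name); /* assumes make_header stored its own copy of name */
free(h->value); /* value was allocated by make_header */
free(h);
}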
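For completeness, here is a minimal sketch of what the elided body of make_header might look like. It assumes the header owns copies of both strings, so destroy_header can free them unconditionally. The copy_string helper is hypothetical (a portable stand-in for POSIX strdup). The value buffer is sized with snprintf(NULL, 0, ...) instead of a fixed 100 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct http_header {
    char *name;
    char *value;
};

/* Hypothetical helper: duplicate a string with malloc (like POSIX strdup). */
static char *copy_string(const char *src)
{
    size_t len = strlen(src) + 1;
    char *dst = malloc(len);
    if (dst != NULL)
        memcpy(dst, src, len);
    return dst;
}

/* Allocate a header that owns copies of both strings.
 * Returns NULL (and leaks nothing) if any allocation fails. */
struct http_header *make_header(const char *name, int value)
{
    assert(name != NULL);
    struct http_header *h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;

    /* With a NULL buffer and size 0, snprintf returns the length it needs. */
    int needed = snprintf(NULL, 0, "%d", value);
    h->name = copy_string(name);
    h->value = (needed < 0) ? NULL : malloc((size_t)needed + 1);
    if (h->name == NULL || h->value == NULL) {
        free(h->name);
        free(h->value);
        free(h);
        return NULL;
    }
    snprintf(h->value, (size_t)needed + 1, "%d", value);
    return h;
}

/* Release everything make_header allocated; accepts NULL like free(). */
void destroy_header(struct http_header *h)
{
    if (h == NULL)
        return;
    free(h->name);
    free(h->value);
    free(h);
}
```

The caller only ever pairs make_header with destroy_header. It never needs to know which fields were malloc'd.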
This way the caller doesn't have to know anything about how http_header manages its memory.
You might also get away with a version that leaves the main allocation (the struct itself) to the caller and does its own internal allocation. Then you would have to provide a clear_header that frees only what fill_header allocated. But clear_header leaves you with a partially valid object: the struct still exists, but its fields no longer point at anything usable.
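Here is a sketch of that caller-owned variant. It assumes the caller keeps ownership of name and the header owns only value. The int return code (0 or -1) is an assumption added for the allocation check. clear_header resets both pointers to NULL after freeing, so the leftover object is at least empty rather than dangling. The 12-byte buffer assumes a 32-bit int.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct http_header {
    char *name;
    char *value;
};

/* Fill a caller-owned header: the caller keeps ownership of `name`,
 * the header owns `value`. Returns 0 on success, -1 on allocation failure. */
int fill_header(struct http_header *h, char *name, int value)
{
    assert(h != NULL);
    h->name = name;
    h->value = malloc(12);   /* 10 digits + sign + '\0' for a 32-bit int */
    if (h->value == NULL)
        return -1;
    snprintf(h->value, 12, "%d", value);
    return 0;
}

/* Free only what fill_header allocated, then reset the pointers so the
 * struct is empty rather than holding dangling pointers. */
void clear_header(struct http_header *h)
{
    assert(h != NULL);
    free(h->value);
    h->value = NULL;
    h->name = NULL;
}
```

The struct itself can now live on the stack, e.g. `struct http_header h; if (fill_header(&h, "X-Answer", 42) == 0) { /* use h */ clear_header(&h); }`.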

I think your problem is simply that you are programming asymmetrically. You should once and for all decide who is responsible for the string inside your structure. Then you should have two functions, not only one, that should be called something like header_init and header_destroy.
For the init function I'd be a bit more careful. Check for a null pointer argument, and initialize your data structure completely, with something like *h = (struct http_header){ .name = name }. You never know whether you or someone else will end up adding another field to your structure; this way, at least all the other fields are zero-initialized.
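Here is a minimal sketch of that advice. It assumes a hypothetical flags member was added to the struct later. The C99 compound literal sets every member it does not name to zero (or NULL for pointers), so header_init stays correct as the struct grows. The null check uses assert here; returning an error code would work too.

```c
#include <assert.h>
#include <stddef.h>

struct http_header {
    char *name;
    char *value;
    int flags;   /* hypothetical field added later */
};

/* Initialise every member: anything not named in the compound literal
 * (value, flags, and any future field) is zero-initialised. */
void header_init(struct http_header *h, char *name)
{
    assert(h != NULL);   /* catch a null pointer argument in debug builds */
    *h = (struct http_header){ .name = name };
}
```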

If you are new at C programming, you might perhaps want to use the Boehm's conservative garbage collector. Boehm's GC works very well in practice, and by using it systematically in your own code you could use GC_malloc instead of malloc and never bother about calling free or GC_free.
Hunting memory leaks in C (or even C++) code is often a headache. There are tools (like valgrind) which can help you, but you could decide to not bother by using Boehm's GC.
Garbage collection (and memory management) is a global property of a program, so if you use Boehm's GC you should decide that early.

The general solution to your problem is that of object ownership, as others have suggested. The simplest solution to your particular problem, however, is to use a char array for value, i.e., char value[12]: a 32-bit int has at most 10 decimal digits (INT_MIN is -2147483648), plus 1 for the sign and 1 for the null terminator.
You should 1) ensure at compile time that int is no wider than 32 bits, 2) check that the value is within some acceptable range (HTTP status codes have only 3 digits) before formatting it, and 3) use snprintf instead of sprintf.
So by using a fixed-size array member you get rid of the ownership problem, AND you use less memory than the 100-byte malloc.
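Putting the three checks together might look like the sketch below. It assumes C11 for _Static_assert. It also assumes a hypothetical fill_status_header name and an accepted range of 100 to 999 for a three-digit status code. Nothing needs to be freed afterwards.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* 1) Refuse to compile if int is wider than 32 bits. */
_Static_assert(sizeof(int) * CHAR_BIT <= 32, "value[12] assumes a 32-bit int");

struct http_header {
    const char *name;   /* not owned: must outlive the header */
    char value[12];     /* 10 digits + sign + '\0' */
};

/* Returns 0 on success, -1 if the status code is out of range. */
int fill_status_header(struct http_header *h, const char *name, int status)
{
    assert(h != NULL);
    /* 2) Three-digit status codes only. */
    if (status < 100 || status > 999)
        return -1;
    h->name = name;
    /* 3) snprintf never writes more than sizeof h->value bytes. */
    snprintf(h->value, sizeof h->value, "%d", status);
    return 0;
}
```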

Related

Is it possible to store function arguments in a pointer to function?

Just out of curiosity, I'm trying to understand how pointers to functions work in C.
In order to associate a function with a struct type, I've declared a function pointer member in it, and then stored the address of the desired function there.
This is what I was able to achieve:
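#include <stdlib.h>
/* The asker's input function, defined elsewhere. */
void hiddenStringInput(char *buffer, int size);
typedef struct
{
void (*get)(char*, int);
char string[10];
} password;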
int main()
{
password userPassword;
userPassword.get = &hiddenStringInput;
userPassword.get(userPassword.string, 10);
return EXIT_SUCCESS;
}
While this does actually work perfectly, I'd like "userPassword.get" to be a shortcut that, when used, calls the hiddenStringInput function and fills in the requested arguments (in this case, an array of characters and an integer).
Basically, since I'm always going to use userPassword.get in association with the arguments "userPassword.string" and "10", I'm trying to figure out a way to somehow store those parameters in the pointer that points to the hiddenString function. Is it even possible?
The way I usually see this done is by providing a "dispatch" function:
void get(password * pw) {
pw->get(pw->string, sizeof pw->string);
}
Then, after setting userPassword.get to your function, you just call:
get(&userPassword);
Obviously this adds some boilerplate code when done for multiple functions. It does, however, allow further "class-like" tricks.
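Here is a self-contained sketch of the dispatch approach. fake_input is a hypothetical stand-in for the asker's hiddenStringInput; it writes a fixed string instead of reading input. The buffer size comes from sizeof instead of being repeated as 10.

```c
#include <assert.h>
#include <string.h>

typedef struct
{
    void (*get)(char *, int);   /* function to dispatch to */
    char string[10];
} password;

/* Dispatch function: supplies the arguments that always accompany
 * pw->get, so callers only pass the struct. */
void get(password *pw)
{
    assert(pw != NULL && pw->get != NULL);
    pw->get(pw->string, (int)sizeof pw->string);
}

/* Hypothetical stand-in for hiddenStringInput, for demonstration only:
 * copies a fixed string instead of reading from the terminal. */
void fake_input(char *buffer, int size)
{
    strncpy(buffer, "secret", (size_t)size - 1);
    buffer[size - 1] = '\0';
}
```

With `pw.get = fake_input;`, the call `get(&pw);` leaves "secret" in `pw.string`.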
You can do this in Clang using the "Blocks" language extension. As commented, there have been attempts to standardize this (and it's not been received with hostility or anything), but they're moving slowly.
Translated to use Blocks, your example could look like this:
#include <stdlib.h>
#include <Block.h>
typedef void (^GetPw)(int); // notice how Block pointer types are used
typedef void (*GetPw_Impl)(char*, int); // the same way as function pointer types
typedef struct
{
GetPw get;
char string[10];
} password;
extern void hiddenStringInput(char*, int);
extern void setPw(char dst [static 10], char * src);
GetPw bindPw (GetPw_Impl get_impl, char * pw)
{
return Block_copy (^ (int key) {
get_impl (pw, key);
});
}
int main()
{
password userPassword;
setPw(userPassword.string, "secret");
userPassword.get = bindPw(hiddenStringInput, userPassword.string);
userPassword.get(10);
Block_release(userPassword.get); // balance the Block_copy done in bindPw
return EXIT_SUCCESS;
}
There are some subtleties to the way arrays are captured that might confuse this case; the example captures the password by normal pointer and assumes userPassword is responsible for ownership of it, separately from the block.
Since a block captures values, it needs to provide and release dynamic storage for the copies of the captured values that will be created when the block itself is copied out of the scope where it was created; this is done with the Block_copy and Block_release functions.
Block types (syntactically function pointers, but using ^ instead of *) are just pointers - there's no way to access the underlying block entity, just like basic C functions.
This is the Clang API - standardization would change this slightly, and will probably reduce the requirement for dynamic memory allocation to copy a block around (but the Clang API reflects how these are currently most commonly used).
So, I've just realized that I can write functions directly inside of structs
typedef struct
{
char string[10];
void get(void)
{
hiddenStringInput(string, 10);
return;
}
void set(const char* newPassword)
{
strcpy(string, newPassword);
return;
}
void show(void)
{
printf("%s", string);
return;
}
} password;
Now I can just call userPassword.get(), userPassword.show() and userPassword.set("something"), and what happens is exactly what the label says. Are there any reasons I shouldn't do this? This looks like it could come pretty handy.
EDIT: So this is only possible in C++. I didn't realize I was using a C++ compiler, and by trying random things I came up with this solution. So this isn't really what I was looking for.

Preventing char pointer overflow in struct

I have a function that accepts a struct * pointer containing sensitive data (in a char array) as an argument (sort of a small library).
The two struct models are as follows:
struct struct1 {
char str[1024]; /* maybe even 4096 or 10KB+ */
size_t str_length;
};
struct struct2 {
char *str;
size_t str_length;
};
The test function is:
/* Read str_length bytes from the char array (the same code works for struct struct1). */
void foo(struct struct2 *s) {
size_t i;
for (i = 0; i < s->str_length; i++) {
printf("%c\n", s->str[i]);
}
}
My concern is that, since str_length is an arbitrary value, someone could deliberately set it to cause a buffer overflow (admittedly only someone foolish enough to create a security flaw in their own program, but I feel I have to take such cases into account). With the struct1 model, however, I could check for a possible overflow simply by using:
if (s->str_length > sizeof(s->str)) {
/* ERROR */
}
The problem is that the length of the array is actually unknown at compile time. So I don't know whether to use a char * pointer (struct2 style, so no overflow check) or define a very big array (struct1), which would limit the maximum length (something I would like to avoid) and would waste space most of the time (which could be problematic in embedded systems with scarce memory, I suppose). I know I have to make a compromise; I'd personally use the struct2 model, but I'm not sure whether it's a good choice security-wise.
Where does the user of your library get the struct2 instance to pass to the function from? I don't think he creates it by himself and then passes its address to your function, that would be a weird way to pass arguments. It is most likely returned from another function in your library, in which case you can make struct2 an opaque data type that the user cannot alter directly (or only in hacky ways):
/* in the header file */
typedef struct struct2_s struct2;
/* in the implementation file, where allocation is handled as well
* so you know str_length is set to the proper value.
*/
struct struct2_s {
char *str;
size_t str_length;
};
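Here is a sketch of the opaque-type idea in a single file. Comments mark what would go in the header and what in the implementation file. The function names struct2_new, struct2_char_at and struct2_free are hypothetical. The point is that str_length is only ever set by the code that performed the allocation, and every read is bounds-checked against it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* --- would go in the public header --- */
typedef struct struct2_s struct2;
struct2 *struct2_new(const char *data, size_t length);
int struct2_char_at(const struct2 *s, size_t index);
void struct2_free(struct2 *s);

/* --- would go in the implementation file --- */
struct struct2_s {
    char *str;
    size_t str_length;   /* always matches the allocation made below */
};

struct2 *struct2_new(const char *data, size_t length)
{
    assert(data != NULL || length == 0);
    struct2 *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->str = malloc(length ? length : 1);
    if (s->str == NULL) {
        free(s);
        return NULL;
    }
    if (length > 0)
        memcpy(s->str, data, length);
    s->str_length = length;
    return s;
}

/* Returns the byte at index, or -1 if index is out of range. */
int struct2_char_at(const struct2 *s, size_t index)
{
    if (s == NULL || index >= s->str_length)
        return -1;
    return (unsigned char)s->str[index];
}

void struct2_free(struct2 *s)
{
    if (s != NULL) {
        free(s->str);
        free(s);
    }
}
```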
Put the big array at the end:
struct struct1 {
anyType thisVar; /* placeholder members */
someType anotherVar;
size_t str_length;
char str[10240000]; /* deliberately oversized; only the part the user allocates is usable */
};
Let the user malloc it to whatever 'real' size they wish. If they set 'str_length' wrong, well, there's not much you can do about it, no matter what you do:(
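Since C99, the "big array at the end" trick has a standard form: the flexible array member, which needs no bogus size. Here is a minimal sketch. The struct1_new constructor is hypothetical; it sets str_length from the same value used to size the allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct struct1 {
    size_t str_length;
    char str[];   /* C99 flexible array member: must be the last member */
};

/* Allocate the struct plus exactly `length` bytes of payload in one block,
 * so str_length always agrees with the real size. Returns NULL on failure. */
struct struct1 *struct1_new(const char *data, size_t length)
{
    assert(data != NULL || length == 0);
    struct struct1 *s = malloc(sizeof *s + length);
    if (s == NULL)
        return NULL;
    s->str_length = length;
    if (length > 0)
        memcpy(s->str, data, length);
    return s;
}
```

The whole object is released with a single free(s).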

Valgrind use of uninitialised value

I am creating a symbol table for a compiler I am writing, and when I try adding to my symbol table I keep getting valgrind errors. I call my add function like this:
stAdd (&sSymbolTable, "test", RSRVWRD, 4, 9);
and my stAdd function currently looks like this:
void stAdd (StPtr psSymbolTable, char *identifier, SymbolTableType type,
int addressField, int arrayDimensions)
{
int hashValue;
hashValue = hash (identifier, psSymbolTable->numBuckets);
if (psSymbolTable->spSymbolTable[hashValue] == NULL)
{
psSymbolTable->spSymbolTable[hashValue] = (StEntryPtr) malloc (sizeof(StEntry));
strcpy (psSymbolTable->spSymbolTable[hashValue]->identifier, identifier);
psSymbolTable->spSymbolTable[hashValue]->entryLevel = psSymbolTable->currentLevel;
psSymbolTable->spSymbolTable[hashValue]->type = type;
psSymbolTable->spSymbolTable[hashValue]->addressField = addressField;
psSymbolTable->spSymbolTable[hashValue]->arrayDimensions = arrayDimensions;
psSymbolTable->spSymbolTable[hashValue]->psNext = NULL;
}
}
But every time I set a value within my StEntry struct, I get the error
Use of uninitialised value of size 8
on every assignment inside the if statement. Does anyone see where I am going wrong?
My StEntry is
typedef struct StEntry
{
char identifier[32];
SymbolTableLevel entryLevel;
SymbolTableType type;
int addressField;
int arrayDimensions;
StEntryPtr psNext;
} StEntry;
This would be a lot easier if I could see the definition of struct StEntry or even the precise valgrind error. But I'll take a wild guess anyway, because I'm feeling overconfident.
Here, you malloc a new StEntry which you will proceed to fill in:
psSymbolTable->spSymbolTable[hashValue] = (StEntryPtr) malloc (sizeof(StEntry));
This is C, by the way. You don't need to cast the result of the malloc, and it is generally a good idea not to do so. Personally, I'd prefer:
StEntry* new_entry = malloc(sizeof *new_entry);
// Fill in the fields in new_entry
psSymbolTable->spSymbolTable[hashValue] = new_entry;
And actually, I'd ditch the Hungarian prefixes too, but that's an entirely different discussion, and primarily opinion-based. But I digress.
The next thing you do is:
strcpy (psSymbolTable->spSymbolTable[hashValue]->identifier, identifier);
Now, psSymbolTable->spSymbolTable[hashValue]->identifier might well be a char *, which will point to the character string of the identifier corresponding to this symbol table entry. So it's a pointer. But what is its value? Answer: it doesn't have one. It's sitting in a block of malloc'd and uninitialized memory.
So when strcpy tries to use it as the address of a character string... well, watch out for the flying lizards. (If that's the problem, you could fix it in a flash by using strdup instead of strcpy.)
Now, I could well be wrong. Maybe the identifier member is not char*, but rather char[8]. Then there is no problem with what it points to, but there's also nothing stopping the strcpy from writing beyond its end. So either way, there's something ungainly about that line, which needs to be fixed.
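This sketch takes the char[32] member from the question and shows a bounds-aware replacement for the strcpy line. It checks the identifier length against the array size before copying, and it checks the malloc result. The struct is trimmed to a few fields, and make_entry and ID_MAX are hypothetical names. It guards against the overflow described above; it is not a diagnosis of the valgrind message itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ID_MAX 32   /* matches identifier[32] in the question */

/* Trimmed stand-in for the question's StEntry (other fields omitted). */
typedef struct StEntry {
    char identifier[ID_MAX];
    int addressField;
    struct StEntry *psNext;
} StEntry;

/* Create an entry, rejecting identifiers that would not fit (instead of
 * letting strcpy write past the end of the array). Returns NULL on failure. */
StEntry *make_entry(const char *identifier, int addressField)
{
    assert(identifier != NULL);
    if (strlen(identifier) >= ID_MAX)   /* no room left for the '\0' */
        return NULL;
    StEntry *entry = malloc(sizeof *entry);
    if (entry == NULL)
        return NULL;
    strcpy(entry->identifier, identifier);   /* safe: length checked above */
    entry->addressField = addressField;
    entry->psNext = NULL;
    return entry;
}
```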

Pointers and assignment in a sub-function

I have a small program that creates a semver struct with some variables in it:
typedef struct {
unsigned major;
unsigned minor;
unsigned patch;
char * note;
char * tag;
} semver;
Then, I would like to create a function which creates a semver struct and returns it to the caller. Basically, a Factory.
That factory would call an initialize function to set the default values of the semver struct:
void init_semver(semver * s) {
s->major = 0;
s->minor = 0;
s->patch = 0;
s->note = "alpha";
generate_semver(s->tag, s);
}
And on top of that, I would like a function to generate a string of the complete semver tag.
void generate_semver(char * tag, semver * s) {
sprintf( tag, "v%d.%d.%d-%s",
s->major, s->minor, s->patch, s->note);
}
My problem appears to lie in this function. I have tried returning a string, but have heard that mallocing some space is bad unless you explicitly free it later ;) In order to avoid this problem, I decided to try to pass a string to the function to have it be changed within the function with no return value. I'm trying to loosely follow something like DI practices, even though I'd really like to separate the concerns of these functions and have the generate_semver function return a string that I can use like so:
char * generate_semver(semver * s) {
char * full_semver;
sprintf( full_semver, "v%d.%d.%d-%s",
s->major, s->minor, s->patch, s->note);
return full_semver; // I know this won't work because it is defined in the local stack and not outside.
}
semver->tag = generate_semver(semver);
How can I do this?
My problem appears to lie in this function. I have tried returning a string, but have heard that mallocing some space is bad unless you explicitly free it later.
Explicitly freeing dynamically allocated memory is required to avoid memory leaks. However, it is not necessarily a task that the end users need to perform directly: an API often provides a function to deal with this.
In your case, you should provide a deinit_semver function that does the clean up of memory that init_semver has allocated dynamically. These two functions behave in a way that is similar to constructor and destructor; init_semver is not a factory function, because it expects the semver struct to be allocated, rather than allocating it internally.
Here is one way of doing it:
void init_semver(semver * s, unsigned major, unsigned minor, unsigned patch, const char * note) {
s->major = major;
s->minor = minor;
s->patch = patch;
size_t len = strlen(note);
s->note = malloc(len+1); /* allocation checks omitted for brevity */
strcpy(s->note, note);
s->tag = malloc(40 + len); /* "v", three 10-digit numbers, ".", ".", "-", note, '\0' */
sprintf(s->tag, "v%u.%u.%u-%s", major, minor, patch, note);
}
void deinit_semver(semver *s) {
free(s->note);
free(s->tag);
}
Note the changes above: rather than using fixed values for the components of struct semver, this code takes the values as parameters. In addition, the code copies the note into a dynamically allocated buffer, rather than pointing to it directly.
The deinit function does the clean-up by free-ing both fields that were allocated dynamically.
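Here is a variation on init_semver that sizes the tag exactly instead of reserving 40 extra bytes. Calling snprintf with a NULL buffer and size 0 returns the number of characters the output would need (C99). It also checks both allocations and leaves nothing allocated on failure. The 0/-1 return convention is an assumption, not part of the answer above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned major;
    unsigned minor;
    unsigned patch;
    char *note;
    char *tag;
} semver;

/* Returns 0 on success, -1 on failure (with nothing left allocated). */
int init_semver(semver *s, unsigned major, unsigned minor, unsigned patch,
                const char *note)
{
    assert(s != NULL && note != NULL);
    s->major = major;
    s->minor = minor;
    s->patch = patch;

    size_t note_len = strlen(note);
    s->note = malloc(note_len + 1);
    /* NULL buffer, size 0: snprintf just reports the length it would need. */
    int tag_len = snprintf(NULL, 0, "v%u.%u.%u-%s", major, minor, patch, note);
    s->tag = (tag_len < 0) ? NULL : malloc((size_t)tag_len + 1);
    if (s->note == NULL || s->tag == NULL) {
        free(s->note);
        free(s->tag);
        s->note = s->tag = NULL;
        return -1;
    }
    memcpy(s->note, note, note_len + 1);
    snprintf(s->tag, (size_t)tag_len + 1, "v%u.%u.%u-%s",
             major, minor, patch, note);
    return 0;
}

/* Free both strings and reset the pointers. */
void deinit_semver(semver *s)
{
    assert(s != NULL);
    free(s->note);
    free(s->tag);
    s->note = s->tag = NULL;
}
```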
A char * on its own is just a pointer to memory. To accomplish what you want, you will need either to use a fixed-size field instead, e.g. char[33], or to allocate the memory dynamically as needed.
As it is, your generate_semver function is attempting to print to an unknown address. Let's look at one solution.
typedef struct {
unsigned major;
unsigned minor;
unsigned patch;
char note[32];
char tag[32];
} semver;
Now, in your init_semver function, the line s->note = "alpha"; becomes a string copy, because you cannot assign to an array (an array is not a modifiable lvalue).
strncpy(s->note, "alpha", 31);
s->note[31] = '\0';
strncpy will copy a string from the second parameter to the first up to the number of bytes in the third parameter. The second line ensures that a trailing null terminator is in place.
Similarly, in the generate_semver function, it would directly work in the buffer:
void generate_semver(semver * s) {
snprintf( s->tag, sizeof s->tag, "v%u.%u.%u-%s",
s->major, s->minor, s->patch, s->note);
}
This will directly print to the array in the structure, with a maximum character limit. snprintf does append a trailing null terminator (unlike strncpy), so we don't need to worry about adding it ourselves.
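Assembled into one piece, the fixed-size version might look like this sketch. The 32-byte limits come from the answer above. Because both strings live inside the struct, there is nothing to free.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    unsigned major;
    unsigned minor;
    unsigned patch;
    char note[32];
    char tag[32];
} semver;

/* Rebuild the tag from the other fields; snprintf truncates safely
 * and always null-terminates. */
void generate_semver(semver *s)
{
    assert(s != NULL);
    snprintf(s->tag, sizeof s->tag, "v%u.%u.%u-%s",
             s->major, s->minor, s->patch, s->note);
}

void init_semver(semver *s)
{
    assert(s != NULL);
    s->major = 0;
    s->minor = 0;
    s->patch = 0;
    /* Arrays cannot be assigned, so copy the default note in. */
    strncpy(s->note, "alpha", sizeof s->note - 1);
    s->note[sizeof s->note - 1] = '\0';
    generate_semver(s);
}
```

After changing a field, e.g. `v.major = 2;`, calling `generate_semver(&v)` refreshes the tag in place.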
You mention having to free allocated memory, and then say: "In order to avoid this problem". Well, it's not so much a problem, but rather a necessity of the C language. It's common to have functions that allocate memory, and require the caller to free it again.
The idiomatic way is to have a pair of "create" and "destroy" functions. So I'd suggest doing it like this:
// Your factory function
semver* create_semver() {
semver* instance = malloc(sizeof(*instance));
if (instance != NULL)
init_semver(instance); // will also allocate instance->tag and ->note
return instance;
}
// Your destruction function
void free_semver(semver* s) {
free(s->tag);
free(s->note);
free(s);
}
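Here is a fuller sketch of the factory/destructor pair, assuming the defaults from the question (0.0.0, "alpha"). Unlike the snippet above, create_semver checks every allocation and frees whatever was already allocated if a later step fails. copy_string is a hypothetical stand-in for POSIX strdup. The 48-byte tag buffer is an assumed upper bound for three 32-bit unsigned numbers plus "alpha".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned major;
    unsigned minor;
    unsigned patch;
    char *note;
    char *tag;
} semver;

/* Hypothetical helper: duplicate a string with malloc (like POSIX strdup). */
static char *copy_string(const char *src)
{
    size_t len = strlen(src) + 1;
    char *dst = malloc(len);
    if (dst != NULL)
        memcpy(dst, src, len);
    return dst;
}

/* Factory: allocates the struct and both strings; returns NULL on
 * failure without leaking any partial allocation. */
semver *create_semver(void)
{
    semver *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->major = s->minor = s->patch = 0;
    s->note = copy_string("alpha");
    s->tag = malloc(48);   /* assumed bound: "v" + 3 x 10 digits + 3 separators + "alpha" + '\0' */
    if (s->note == NULL || s->tag == NULL) {
        free(s->note);
        free(s->tag);
        free(s);
        return NULL;
    }
    snprintf(s->tag, 48, "v%u.%u.%u-%s", s->major, s->minor, s->patch, s->note);
    return s;
}

/* Destructor: frees everything create_semver allocated; accepts NULL. */
void free_semver(semver *s)
{
    if (s == NULL)
        return;
    free(s->tag);
    free(s->note);
    free(s);
}
```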

How to return an integer from a function

Which is considered better style?
int set_int (int *source) {
*source = 5;
return 0;
}
int main(){
int x;
set_int (&x);
}
OR
int *set_int (void) {
int *temp = NULL;
temp = malloc(sizeof (int));
*temp = 5;
return temp;
}
int main (void) {
int *x = set_int ();
}
Coming from a higher-level programming background, I gotta say I like the second version more. Any tips would be very helpful. Still learning C.
Neither.
// "best" style for a function which sets an integer taken by pointer
void set_int(int *p) { *p = 5; }
int i;
set_int(&i);
Or:
// then again, minimise indirection
int an_interesting_int() { return 5; /* well, in real life more work */ }
int i = an_interesting_int();
Just because higher-level programming languages do a lot of allocation under the covers, does not mean that your C code will become easier to write/read/debug if you keep adding more unnecessary allocation :-)
If you do actually need an int allocated with malloc, and to use a pointer to that int, then I'd go with the first one (but bugfixed):
void set_int(int *p) { *p = 5; }
int *x = malloc(sizeof(*x));
if (x == NULL) { /* do something about the error */ }
set_int(x);
Note that the function set_int is the same either way. It doesn't care where the integer it's setting came from, whether it's on the stack or the heap, who owns it, whether it has existed for a long time or whether it's brand new. So it's flexible. If you then want to also write a function which does two things (allocates something and sets the value) then of course you can, using set_int as a building block, perhaps like this:
int *allocate_and_set_int() {
int *x = malloc(sizeof(*x));
if (x != 0) set_int(x);
return x;
}
In the context of a real app, you can probably think of a better name than allocate_and_set_int...
Some errors:
int main(){
int x*; //should be int* x; or int *x;
set_int(x);
}
Also, you are not allocating any memory in the first code example.
int *x = malloc(sizeof(int));
About the style:
I prefer the first one, because there is less chance of forgetting to free the memory held by the pointer.
The first one is incorrect (apart from the syntax error) - you're passing an uninitialised pointer to set_int(). The correct call would be:
int main()
{
int x;
set_int(&x);
}
If they're just ints, and it can't fail, then the usual answer would be "neither" - you would usually write that like:
int get_int(void)
{
return 5;
}
int main()
{
int x;
x = get_int();
}
If, however, it's a more complicated aggregate type, then the second version is quite common:
struct somestruct *new_somestruct(int p1, const char *p2)
{
struct somestruct *s = malloc(sizeof *s);
if (s)
{
s->x = 0;
s->j = p1;
s->abc = p2;
}
return s;
}
int main()
{
struct somestruct *foo = new_somestruct(10, "Phil Collins");
free(foo);
return 0;
}
This allows struct somestruct * to be an "opaque pointer", where the complete definition of type struct somestruct isn't known to the calling code. The standard library uses this convention - for example, FILE *.
Definitely go with the first version. Notice that it lets you avoid a dynamic memory allocation, which is comparatively slow and can be a source of bugs if you forget to free that memory later.
Also, if you decide for some reason to use the second style, notice that you don't need to initialize the pointer to NULL. This value will either way be overwritten by whatever malloc() returns. And if you're out of memory, malloc() will return NULL by itself, without your help :-).
So int *temp = malloc(sizeof(int)); is sufficient.
Memory management rules usually state that whoever allocates a memory block should also deallocate it. That is impossible when you return allocated memory. Therefore, the first should be better.
For a more complex type like a struct, you'll usually end up with a function to initialize it and maybe a function to dispose of it. Allocation and deallocation should be done separately, by you.
C gives you the freedom to allocate memory dynamically or statically, and having a function work only with one of the two modes (which would be the case if you had a function that returned dynamically allocated memory) limits you.
typedef struct
{
int x;
float y;
} foo;
void foo_init(foo* object, int x, float y)
{
object->x = x;
object->y = y;
}
int main()
{
foo myFoo;
foo_init(&myFoo, 1, 3.1416f);
}
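To illustrate the point about static versus dynamic storage, here is a sketch in which the same foo_init serves both a stack object and a heap object. foo_new is a hypothetical convenience wrapper, not part of the answer above.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct
{
    int x;
    float y;
} foo;

/* Works on any foo, wherever its storage came from. */
void foo_init(foo *object, int x, float y)
{
    assert(object != NULL);
    object->x = x;
    object->y = y;
}

/* Hypothetical wrapper for callers who want heap storage;
 * the caller must free() the result. */
foo *foo_new(int x, float y)
{
    foo *object = malloc(sizeof *object);
    if (object != NULL)
        foo_init(object, x, y);
    return object;
}
```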
In the second one you would need a pointer to a pointer for it to work, and in the first you are not using the return value, though you should.
I tend to prefer the first one, in C, but that depends on what you are actually doing, as I doubt you are doing something this simple.
Keep your code as simple as you need to get it done, the KISS principle is still valid.
It is best not to return a piece of allocated memory from a function: if somebody does not know how it works, they might not deallocate the memory.
The memory deallocation should be the responsibility of the code allocating the memory.
The first is preferred (assuming the simple syntax bugs are fixed) because it is how you simulate an Out Parameter. However, it's only usable where the caller can arrange for all the space to be allocated to write the value into before the call; when the caller lacks that information, you've got to return a pointer to memory (maybe malloced, maybe from a pool, etc.)
What you are asking more generally is how to return values from a function. It's a great question because it's so hard to get right. What you can learn are some rules of thumb that will stop you making horrid code. Then, read good code until you internalize the different patterns.
Here is my advice:
In general any function that returns a new value should do so via its return statement. This applies for structures, obviously, but also arrays, strings, and integers. Since integers are simple types (they fit into one machine word) you can pass them around directly, not with pointers.
Never pass pointers to integers; it's an anti-pattern. Always pass integers by value.
Learn to group functions by type so that you don't have to learn (or explain) every case separately. A good model is a simple OO one: a _new function that creates an opaque struct and returns a pointer to it; a set of functions that take the pointer to that struct and do stuff with it (set properties, do work); a set of functions that return properties of that struct; a destructor that takes a pointer to the struct and frees it. Hey presto, C becomes much nicer like this.
When you do modify arguments (only structs or arrays), stick to conventions: e.g. the standard C library functions always copy from right to left (destination first, as in strcpy(dst, src)); the OO model I explained would always put the structure pointer first.
Avoid modifying more than one argument in one function. Otherwise you get complex interfaces that you can't remember and eventually get wrong.
Return 0 for success, -1 for errors, when the function does something which might go wrong. In some cases you may have to return -1 for errors, 0 or greater for success.
The standard POSIX APIs are a good template but don't use any kind of class pattern.
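The OO model and the 0/-1 convention above might combine like the following sketch. The counter type and all four function names are hypothetical illustrations of the _new / action / getter / destructor grouping, with the struct pointer always first.

```c
#include <assert.h>
#include <stdlib.h>

/* In a real library only this typedef would be in the header,
 * keeping struct counter opaque. */
typedef struct counter counter;
struct counter {
    int value;
    int limit;
};

/* _new: create and return a pointer (NULL on allocation failure). */
counter *counter_new(int limit)
{
    counter *c = malloc(sizeof *c);
    if (c != NULL) {
        c->value = 0;
        c->limit = limit;
    }
    return c;
}

/* Action: struct pointer first; returns 0 on success, -1 on error. */
int counter_add(counter *c, int amount)
{
    assert(c != NULL);
    if (amount < 0 || amount > c->limit - c->value)
        return -1;
    c->value += amount;
    return 0;
}

/* Getter: returns a property by value. */
int counter_value(const counter *c)
{
    assert(c != NULL);
    return c->value;
}

/* Destructor: accepts NULL, like free(). */
void counter_free(counter *c)
{
    free(c);
}
```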
