Better practice to strcpy() or point to another data structure? - c

Because it's always easier to see code...
My parser fills this object:
typedef struct pair {
char* elementName;
char* elementValue;
} pair;
My interpreter wants to read that object and fill this one:
typedef struct thing {
char* label;
} thing;
Should I do this:
thing.label = pair.elementName;
or this:
thing.label = (char*)malloc(strlen(pair.elementName)+1);
strcpy(thing.label, pair.elementName);
EDIT: Yes, I guess I should have specified what the rest of the program will do with the objects. I will eventually need to save "pair" to a file. So when thing.label is modified, then (at some point) pair.elementName needs to be modified to match. So I guess the former is the best way to do it?

No good answer to that question as there is too little context. It all depends on how the rest of the program manages the lifetimes of the objects it creates.

I would personally do the former, but it's a tradeoff. The former avoids the need to allocate new memory and copy data to it, but the latter avoids the confusion of aliasing by keeping thing.label and pair.elementName pointing to separate memory addresses, which means you need to free both of them (with the former you need to be sure to free exactly one, to avoid either a memory leak or a double free)

Here are some of the things that need to be known to answer the question:
Which object will 'own' the string? Or will both own their string (in which case a 'deep' copy is necessary)?
are the lifetimes of the pair and thing objects related in any way - will one object always 'outlive' the other? Does one of these objects own the other one?
If the pair and thing objects are independent, then copying the string data is probably the correct thing to do. If one is owned by the other, then that might indicate that a simple sharing of the pointer is appropriate.
Not that these are the only possible answers - just a couple of the easier ones.

From an "object" independence standpoint, it is probably better to make a copy of the data to avoid problems with dangling pointers.
It would be more efficient and faster to just assign the pointer, but unless that extra performance is highly critical, you will probably be better off (from a debugging standpoint) by making the copy.

The answer is, as always, "It Depends." If all you are doing with the "copied" value is reading it, it is probably okay to just copy the pointer address (i.e., the former), as long as you cleanup properly. If the "copied" value is going to be modified in any way, you are going to want to create a new string entirely (i.e., the latter) to avoid any unintended side effects caused by the "original" value changing (unless, of course, that is exactly the desired effect).

If you want to do a copy, and do all the cleanup afterwards... in C you should do this:
thing.label = strdup(pair.elementName);

I don't want to be a c-police, but please use safer strncpy() instead of strcpy().
char* strncpy(char *s1, const char *s2, size_t n);
strncpy function copies at most n characters from s2 into s1.

Related

Is it bad programming practice to declare an array type in C?

I have a bunch of arrays of various lengths and I'd like to keep the length close to the array, is it bad programming practice to define something like this?
typedef struct Array {
long len;
int* buf;
} Array;
Are there any obvious downsides or pitfalls to this?
No, not necessarily. You will need to pass the length of a buffer around with the pointer to make sure that you don't overrun it. It would be a slight performance hit due to the extra dereference, but that can be a worthwhile tradeoff if always having the length available avoids causing bugs. You would need to make sure that the length gets updated if you resize the buffer, and make sure that the buffer gets freed if you free your struct. Other than that, it makes sense to me, and it's what a lot of other higher-level languages do in their array implementations.
This is fine, a similar structure is commonly used for counted strings, aka vectors, which are not null terminated, hence the need to store their length. I wouldn't bother with this approach if the arrays are only used locally, though.
As it's been noted, it would be better to use an unsigned type such as size_t or uint32_t for the length.
When dealing with pointers one needs to take the usual precautions to ensure that it always has a valid reference before using or passing around, that allocated memory is freed, etc. Using valgrind is highly recommended.

Embedded versus pointer in nested String-like structures

When designing structures to contain textual data, I have been using two basic approaches illustrated below:
typedef struct {
STRING address1;
STRING address2;
STRING city;
STRING state;
STRING zip;
} ADDRESS;
typedef struct {
STRING* address1;
STRING* address2;
STRING* city;
STRING* state;
STRING* zip;
} ADDRESS;
where STRING is some variable length string-storing type. The advantage of the pointer version is that I can store NULL indicating that data is missing. For example, address2 might be not provided for some addresses. In the type with embedded STRINGs, I have to use a "blank" string, meaning one that has 0 length.
With the pointers there is (possibly) more code burden because I have to check every member for NULL before using. The advantage is not that great, however, because usually the embedded version has to be checked too. For example, if I am printing an address, I have to check for a zero-length string and skip that line. With pointers the user can actually indicate they want a "blank" versus a missing value, although it is hard to see a use for this.
When creating or freeing the structure, pointers add a bunch of additional steps. My instinct is to standardize on the embedded style to save these steps, but I am concerned that there might be a hidden gotcha. Is this an unwarranted fear, or should I be using the pointers for some compelling reason?
Note that memory use is an issue, but it is pretty minor. The pointer version takes a little bit more memory because I am storing pointers to the structs in addition to the structs. But each string struct takes maybe 40 bytes on average, so if I am storing 4 byte pointers, then the pointer version costs maybe 10% more memory which is not significant. Having null pointers possible does not save significant memory because most fields are populated.
Question is About ADDRESS not STRING
Some of the respondents seem to be confused and think I am asking about global tradeoffs, like how to minimize my total work. That is not the case. I am asking about how to design ADDRESS, not STRING. The members of address could have fixed arrays, or in other cases not. For the purposes of my question, I am not concerned about are the consequences for the container.
I have already stated that the only issue I can see is that it costs more time to use pointers, but I get the benefit of being able to store a NULL. However, as I already said, that benefit does not seem to be significant, but maybe it is for some reason. That is the essence of my question: is there some hidden benefit of having this flexibility that I am not seeing and will wish I had later on.
If you don't understand the question, please read the preliminary answer I have written myself below (after some additional thought) to see the kind of answer I am looking for.
Tradeoffs over memory usage and reduction in mallocs
Seems like the tradeoffs center around two questions: 1) How precious is memory? and 2) Does it matter that a fixed amount of memory is allocated for the strings, limiting the lengths to be stored in each field?
If memory is more important than anything else, then the pointer version probably wins. If predictability of storage usage and avoidance of mallocs is preferred, and limiting the length of the names to some fixed amount is acceptabe, then the fixed length version may be the winner.
One problem with the embedded style is that STRING needs to be defined as something like char[MAX_CHAR + 1] where MAX_CHAR is a pessimistic maximum length for the given fields. The pointer style allows one to allocate the correct amount of memory. The downside as you mention is a much higher cognitive overhead managing your struct.
I have been considering this more deeply and I think that in most cases pointers are necessary because it is important to discriminate between blank and missing. The reason for this is that the missing data is needed when an input is invalid or corrupt or left out. For example, let's imagine that when reading from a file, the file is corrupted so a field like zip code is unreadable. In that case the data is "missing" and the pointer should be NULL. On the other hand, let's imagine that the place has no zip code, then it is "blank". So, NULL means that the user has not yet provided information, but blank means the user has provided the information and there is none of the type in question.
So, to further illustrate the importance of using the pointer, imagine that a complex structure is populated over time in different, asynchronous steps. Here we need to know what fields have been read, and which ones have not. Unless we are using pointers (or adding additional metadata), we have no way of telling the difference between a field that has been answered and one for which the answer is "none". Imagine the system prompting the user "what is the zip code?". User says, "this place has no zip code". Then 5 minutes later the system asks again, "what is the zip code?". That use case makes it clear that we need pointers in most cases.
In this light, the only situation where I should use embedded structs is when the container structure is guaranteed to have a full set of data whenever it is created.

How to distinguish a malloced string from a string literal?

Is there a way (in pure C) to distinguish a malloced string from a string literal, without knowing which is which? Strictly speaking, I'm trying to find a way to check a variable whether it's a malloced string or not and if it is, I'm gonna free it; if not, I'll let it go.
Of course, I can dig the code backwards and make sure if the variable is malloced or not, but just in case if an easy way exists...
edit: lines added to make the question more specific.
char *s1 = "1234567890"; // string literal
char *s2 = strdup("1234567890"); // malloced string
char *s3;
...
if (someVar > someVal) {
s3 = s1;
} else {
s3 = s2;
}
// ...
// after many, many lines of code an algorithmic branches...
// now I lost track of s3: is it assigned to s1 or s2?
// if it was assigned to s2, it needs to be freed;
// if not, freeing a string literal will pop an error
Is there a way (in pure C) to distinguish a malloced string from a string literal,
Not in any portable way, no. No need to worry though; there are better alternatives.
When you write code in C you do so while making strong guarantees about "who" owns memory. Does the caller own it? Then it's their responsibility to deallocate it. Does the callee own it? Similar thing.
You write code which is very clear about the chain of custody and ownership and you don't run into problems like "who deallocates this?" You shouldn't feel the need to say:
// after many, many lines of code an algorithmic branches...
// now I forgot about s3: was it assigned to s1 or s2?
The solution is; don't forget! You're in control of your code, just look up the page a bit. Design it to be bulletproof against leaking memory out to other functions without a clear understanding that "hey, you can read this thing, but there are no guarantees that it will be valid after X or Y. It's not your memory, treat it as such."
Or maybe it is your memory. Case in point; your call to strdup. strdup let's you know (via documentation) that it is your responsibility to deallocate the string that it returns to you. How you do that is up to you, but your best bet is to limit its scope to be as narrow as possible and to only keep it around for as short a time as necessary.
It takes time and practice for this to become second nature. You will create some projects which handle memory poorly before you get good at it. That's ok; your mistakes in the beginning will teach you exactly what not to do, and you'll avoid repeating them in the future (hopefully!)
Also, as #Lasse alluded to in the comments, you don't have to worry about s3, which is a copy of a pointer, not the entire chunk of memory. If you call free on s2 and s3 you end up with undefined behavior.
Here is a practical way:
Although the C-language standard does not dictate this, for all identical occurrences of a given literal string in your code, the compiler generates a single copy within the RO-data section of the executable image.
In other words, every occurrence of the string "1234567890" in your code is translated into the same memory address.
So at the point where you want to deallocate that string, you can simply compare its address with the address of the literal string "1234567890":
if (s3 != "1234567890") // address comparison
free(s3);
To emphasize this again, it is not imposed by the C-language standard, but it is practically implemented by any decent compiler of the language.
UPDATE:
The above is merely a practical trick referring directly to the question at hand (rather than to the motivation behind this question).
Strictly speaking, if you've reached a point in your implementation where you have to distinguish between a statically allocated string and a dynamically allocated string, then I would tend to guess that your initial design is flawed somewhere along the line.

Is it a common practice to re-use the same buffer name for various things in C?

For example, suppose I have a buffer called char journal_name[25] which I use to store the journal name. Now suppose a few lines later in the code I want to store someone's name into a buffer. Should I go char person_name[25] or just reuse the journal_name[25]?
The trouble is that everyone who reads the code (and me too after a few weeks) has to understand journal_name is now actually person_name.
But the counter argument is that having two buffers increases space usage. So better to use one.
What do you think about this problem?
Thanks, Boda Cydo.
The way to solve this in the C way, if you really don't want to waste memory, is to use blocks to scope the buffers:
int main()
{
{
char journal_name[26];
// use journal name
}
{
char person_name[26];
// use person name
}
}
The compiler will reuse the same memory location for both, while giving you a perfectly legible name.
As an alternative, call it name and use it for both <.<
Some code is really in order here. However a couple of points to note:
Keep the identifier decoupled from your objects. Call it scratchpad or anything. Also, by the looks of it, this character array is not dynamically allocated. Which means you have to allocate a large enough scratch-pad to be able to reuse them.
An even better approach is to probably make your functions shorter: One function should ideally do one thing at a time. See if you can break up and still face the issue.
As an alternative to the previous (good) answers, what about
char buffer[25];
char* journal_name = buffer;
then later
char* person_name = buffer;
Would it be OK ?
If both buffers are automatic why not use this?
Most of compilers will handle this correctly with memory reused.
But you keep readability.
{
char journal_name[25];
/*
Your code which uses journal_name..
*/
}
{
char person_name[25];
/*
Your code which uses person_name...
*/
}
By the way even if your compiler is stupid and you are very low of memory you can use union but keep different names for readability. Usage of the same variable is worst way.
Please use person_name[25]. No body likes hard to read code. Its not going to do much if anything at all for your program in terms of memory. Please, just do it the readable way.
You should always (well unless you're very tight on memory) go for readability and maintainability when writing code for the very reason you mentioned in your question.
25 characters (unless this is just an example) isn't going to "break the bank", but if memory is at a premium you could dynamically allocate the storage for journal_name and then free it when you've finished with it before dynamically allocating the storage for person_name. Though there is the "overhead" of the pointer to the array.
Another way would be to use local scoping on the arrays:
void myMethod()
{
... some code
{
char journal_name[25];
... some more code
}
... even more code
{
char person_name[25];
... yet more code
}
}
Though even with this pseudo code the method is getting quite long and would benefit from refactoring into subroutines which wouldn't have this problem.
If you are worried about memory, and I doubt 25 bytes will be an issue, but then you can just use malloc and free and then you just have an extra 4-8 bytes being used for the pointer.
But, as others mentioned, readability is important, and you may want to decompose your function so that the two buffers are used in functions that actually give more indication as to their uses.
UPDATE:
Now, I have had a buffer called buffer that I would use for reading from a file, for example, and then I would use a function pointer that was passed, to parse the results so that the function reads the file, and handles it appropriately, so that the buffer isn't filled in and then I have to remember that it shouldn't be overwritten yet.
So, yet, reusing a buffer can be useful, when reading from sockets or files, but you want to localize the usage of this buffer otherwise you may have race conditions.

Is it a best practice to wrap arrays and their length variable in a struct in C?

I will begin to use C for an Operating Systems course soon and I'm reading up on best practices on using C so that headaches are reduced later on.
This has always been one my first questions regarding arrays as they are easy to screw up.
Is it a common practice out there to bundle an array and its associated variable containing it's length in a struct?
I've never seen it in books and usually they always keep the two separate or use something like sizeof(array[]/array[1]) kind of deal.
But with wrapping the two into a struct, you'd be able to pass the struct both by value and by reference which you can't really do with arrays unless using pointers, in which case you have to again keep track of the array length.
I am beginning to use C so the above could be horribly wrong, I am still a student.
Cheers,
Kai.
Yes this is a great practice to have in C. It's completely logical to wrap related values into a containing structure.
I would go ever further. It would also serve your purpose to never modify these values directly. Instead write functions which act on these values as a pair inside your struct to change length and alter data. That way you can add invariant checks and also make it very easy to test.
Sure, you can do that. Not sure if I'd call it a best practice, but it's certainly a good idea to make C's rather rudimentary arrays a bit more manageable. If you need dynamic arrays, it's almost a requirement to group the various fields needed to do the bookkeeping together.
Sometimes you have two sizes in that case: one current, and one allocated. This is a tradeoff where you trade fewer allocations for some speed, paying with a bit of memory overhead.
Many times arrays are only used locally, and are of static size, which is why the sizeof operator is so handy to determine the number of elements. Your syntax is slightly off with that, by the way, here's how it usually looks:
int array[4711];
int i;
for(i = 0; i < sizeof array / sizeof *array; i++)
{
/* Do stuff with each element. */
}
Remember that sizeof is not a function, the parenthesis are not always needed.
EDIT: One real-world example of a wrapping exactly as that which you describe is the GArray type provided by glib. The user-visible part of the declaration is exactly what you describe:
typedef struct {
gchar *data;
guint len;
} GArray;
Programs are expected to use the provided API to access the array whenever possible, not poke these fields directly.
There are three ways.
For static array (not dynamically allocated and not passed as pointer) size is knows at compile time so you can used sizeof operator, like this: sizeof(array)/sizeof(array[0])
Use terminator (special value for last array element which cannot be used as regular array value), like null-terminated strings
Use separate value, either as a struct member or independent variable. It doesn't really matter because all the standard functions that work with arrays take separate size variable, however joining the array pointer and size into one struct will increase code readability. I suggest to use to have a cleaner interface for your own functions. Please note that if you pass your struct by value, called function will be able to change the array, but not the size variable, so passing struct pointer would be a better option.
For public API I'd go with the array and the size value separated. That's how it is handled in most (if not all) c library I know. How you handle it internally it's completely up to you. So using a structure plus some helper functions/macros that do the tricky parts for you is a good idea. It's always making me head-ache to re-think how to insert an item or to remove one, so it's a good source of problems. Solving that once and generic, helps you getting bugs from the beginning low. A nice implementation for dynamic and generic arrays is kvec.
I'd say it's a good practice. In fact, it's good enough that in C++ they've put it into the standard library and called it vector. Whenever you talk about arrays in a C++ forum, you'll get inundated with responses that say to use vector instead.
I don't see anything wrong with doing that but I think the reason that that is not usually done is because of the overhead incurred by such a structure. Most C is bare-metal code for performance reasons and as such, abstractions are often avoided.
I haven't seen it done in books much either, but I've been doing the same the same thing for a while now. It just seems to make sense to "package" those things together. I find it especially useful if you need to return an allocated array from a method for instance.
If you use static arrays you have access to the size of array using sizeof operator. If you'll put it into struct, you can pass it to function by value, reference and pointer. Passing argument by reference and by pointer is the same on assembly level (I'm almost sure of it).
But if you use dynamic arrays, you don't know the size of array at compile time. So you can store this value in struct, but you will also store only a pointer to array in structure:
struct Foo {
int *myarray;
int size;
};
So you can pass this structure by value, but what you realy do is passing pointer to int (pointer to array) and int (size of array).
In my opinion it won't help you much. The only thing that is in plus, is that you store the size and the array in one place and it is easy to get the size of the array. If you will use a lot of dynamic arrays you can do it this way. But if you will use few arrays, easier will be not to use structures.
I've never seen it done that way, but I haven't done OS level work in over a decade... :-) Seems like a reasonable approach at first glance. Only concern would be to make sure that the size somehow stays accurate... Calculating as needed doesn't have that concern.
considering you can calculate the length of the array (in bytes, that is, not # of elements) and the compiler will replace the sizeof() calls with the actual value (its not a function, calls to it are replaced by the compiler with the value it 'returns'), then the only reason you'd want to wrap it in a struct is for readability.
It isn't a common practice, if you did it, someone looking at your code would assume the length field was some special value, and not just the array size. That's a danger lurking in your proposal, so you'd have to be careful to comment it properly.
I think this part of your question is backwards:
"But with wrapping the two into a struct, you'd be able to pass the struct both by value and by reference which you can't really do with arrays unless using pointers, in which case you have to again keep track of the array length."
Using pointers when passing arrays around is the default behavior; that doesn't buy you anything in terms of passing the entire array by value. If you want to copy the entire array instead of having it decay to a pointer you NEED to wrap the array in a struct. See this answer for more information:
Pass array by value to recursive function possible?
And here is more on the special behavior of arrays:
http://denniskubes.com/2012/08/20/is-c-pass-by-value-or-reference/

Resources