C dynamic array of strings - c

I'm trying to write a simple implementation of dynamic arrays of strings in C. Here's my code (minus the includes and main function etc ...):
typedef char* string;
typedef struct {
    string* list;
    size_t size;
    size_t used;
} list;
void initList(list* l, size_t initSize) {
    l->list = malloc(initSize * sizeof(string));
    l->used = 0;
    l->size = initSize;
}
void insertList(list* l, string elem) {
    if (l->used == l->size) {
        l->size *= 2;
        l->list = realloc(l->list, l->size * sizeof(string));
    }
    l->list[l->used++] = elem;
}
My code seems to work as I expect. I'm asking because I read that you should use char[] instead of char*.
I read that using typedef char* string declares the string in read-only memory, so trying to modify it causes undefined behaviour.
Yet GCC gives me no errors or warnings, and the compiled code seems to work.
The functions for creating and growing the dynamic array were taken from another question on StackOverflow; the original question created a dynamic array of integers.
I'm just curious whether my code is good or bad practice.

I read that using typedef char* string declares the string in read-only memory, so trying to modify it causes undefined behaviour.
Well, that's nonsense. You might confuse this with something like
char *foo = "foo";
In that case, although foo is declared as a mutable pointer to mutable characters, the characters it points to must NOT be modified. This is because modifying a string literal in C is undefined behaviour, and compilers typically place literals in read-only memory. It doesn't help that foo isn't const here. A good compiler can warn you, though (GCC does with -Wwrite-strings): you should only assign string literals to const char *.
If you want a mutable string, initialize a char[] from the literal. That way it's not a pointer, and the compiler knows to place it in mutable memory. But this really only concerns literals.
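A minimal sketch of that difference, assuming a hypothetical helper shout() that upper-cases a buffer in place. The names shout and demo_mutable_copy are made up for illustration and are not part of the question:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: upper-cases a writable, NUL-terminated buffer in place. */
static void shout(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; ++s)
        *s = (char)toupper((unsigned char)*s);
}

/* Returns 1 if the array copy was modified while the literal stayed intact. */
static int demo_mutable_copy(void)
{
    const char *literal = "foo"; /* points at the literal: read-only by contract */
    char copy[] = "foo";         /* separate, writable array initialised from the literal */

    shout(copy);                 /* fine: copy is ordinary mutable storage */
    /* shout((char *)literal) would compile, but the write is undefined behaviour */

    return strcmp(copy, "FOO") == 0 && strcmp(literal, "foo") == 0;
}
```

Declaring the pointer as const char * means an accidental shout(literal) is diagnosed at compile time instead of misbehaving at run time.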
So there's nothing wrong with using char * as your string type. In fact, that's what C does implicitly for string literals. I'd still have a little objection: Seasoned C-programmers will know about char * et al and they will expect that "plain" strings are char * (or const char *). If you name something string, it should somehow provide more than that. If it doesn't, just don't confuse people and go by char *.
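For the list from the question itself, here is one possible sketch that uses char * directly. It rests on two assumptions that are not in the original code: the list stores its own heap copy of every element (so each element is mutable and has a clear owner), and allocation failures are reported through a return value. The names strlist, strlist_init, strlist_push and strlist_free are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char **items;   /* heap array of owned, heap-allocated strings */
    size_t used;    /* number of stored strings */
    size_t size;    /* number of slots allocated */
} strlist;

/* Returns 0 on success, -1 if allocation fails. */
static int strlist_init(strlist *l, size_t init_size)
{
    assert(l != NULL);
    if (init_size == 0)
        init_size = 1;              /* so that doubling actually grows */
    l->items = malloc(init_size * sizeof *l->items);
    l->used = 0;
    l->size = l->items ? init_size : 0;
    return l->items ? 0 : -1;
}

/* Appends a private copy of elem; returns 0 on success, -1 on failure. */
static int strlist_push(strlist *l, const char *elem)
{
    assert(l != NULL && elem != NULL);
    if (l->used == l->size) {
        size_t new_size = l->size ? l->size * 2 : 1;
        char **grown = realloc(l->items, new_size * sizeof *l->items);
        if (grown == NULL)
            return -1;              /* the old block is still valid */
        l->items = grown;
        l->size = new_size;
    }
    size_t len = strlen(elem) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, elem, len);
    l->items[l->used++] = copy;
    return 0;
}

/* Releases every stored copy and the slot array itself. */
static void strlist_free(strlist *l)
{
    assert(l != NULL);
    for (size_t i = 0; i < l->used; ++i)
        free(l->items[i]);
    free(l->items);
    l->items = NULL;
    l->used = l->size = 0;
}
```

Assigning the result of realloc to a temporary first keeps the old block reachable if the call fails. Whether the list should copy its elements at all is a design choice; storing borrowed pointers, as the question does, is also valid as long as the caller keeps them alive.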

Related

pointer to a string literal - rewrite/reassign

Is it fine to use a char pointer (char *retStatus) in the way shown below, i.e. assigning and reassigning string literals whenever required without allocating memory? I tested it and it works fine, but I would like to know whether it is a good approach to assign error messages to a char * and then copy or concatenate them into another static or allocated buffer.
void fn(char *status, size_t maxLen)
{
    char *retStatus = NULL;
    ...
    ...
    if(failure)
    {
        retStatus = "error1";
        if((strlen(retStatus) + strlen(status)) < maxLen)
        {
            strcat_s(status, maxLen, retStatus);
        }
    }
    ...
    ...
    if(failure)
    {
        retStatus = "error2";
        if((strlen(retStatus) + strlen(status)) < maxLen)
        {
            strcat_s(status, maxLen, retStatus);
        }
    }
}
int main()
{
    char status[10] = { 0 };
    size_t statusMaxLen = sizeof(status) / sizeof(status[0]);
    fn(status, statusMaxLen);
    return 0;
}
Is it fine to use a char pointer (char *retStatus) in the way shown below, i.e. assigning and reassigning string literals whenever required without allocating memory?
A string literal represents an array of char with static storage duration, with the (rather substantial) restriction that any attempt to modify the array contents produces undefined behavior. You may use string literals in any way that you may use any other char array, subject to the restriction that you do not attempt to modify them.
With that said, it is better form to avoid assigning string literals to variables of type char *, or passing them as function arguments corresponding to parameters of that type. Instead, limit yourself to pointers of type const char *, which convey the relevant restriction explicitly.
I tested it and it works fine, but I would like to know whether it is a good approach to assign error messages to a char * and then copy or concatenate them into another static or allocated buffer.
The particular combination of assignment followed by non-mutating access is allowed, and will work reliably, but again, it would be better to use a variable of type const char * instead of a variable of type char *. Do note, however, that it is possible to get yourself into trouble that way if you're not careful. For example, sizeof("error1") is unlikely to be equal to (retStatus = "error1", sizeof(retStatus)).
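To make the sizeof pitfall concrete, here is a small sketch. The function names are hypothetical and exist only to expose the three values:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size of the literal's array type: six characters plus the terminating NUL. */
static size_t literal_array_size(void)
{
    return sizeof("error1");
}

/* Size of a pointer object; the actual value depends on the platform. */
static size_t pointer_size(void)
{
    const char *retStatus = "error1";
    return sizeof(retStatus);
}

/* Length that is correct whether the message is reached via array or pointer. */
static size_t message_length(const char *msg)
{
    assert(msg != NULL);
    return strlen(msg);
}
```

literal_array_size() is 7 everywhere, whereas pointer_size() is whatever a pointer occupies on the target, so once the literal has been assigned to a pointer, strlen() is the reliable way to get its length.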
This is a valid and indeed a clever way of using pointers. The approach works fine for the given example.
The only possible issue with the approach, in the context of big and long running programs, is with the intended lifetime of the variable.
If memory is allocated explicitly using malloc, it can be released with free as soon as the variable is no longer required. Explicit memory management of this kind can help conserve memory.
In the current approach, the storage for the string literals persists for the whole running time of the program.
If it is desirable to have the variable persist throughout the program runtime, then the followed approach is perfect.
If conservation of memory is a crucial requirement, then using malloc and free is a recommended approach.
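The append step from the question can also be sketched without strcat_s, which belongs to the optional Annex K interfaces and is not available in every C library. The helper name append_status is hypothetical; the messages are passed as const char *, as suggested above:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: appends msg to status if the result, including the
   terminating NUL, fits in maxLen bytes. Returns 1 if appended, 0 if not. */
static int append_status(char *status, size_t maxLen, const char *msg)
{
    assert(status != NULL && msg != NULL);
    size_t used = strlen(status);
    size_t need = strlen(msg);
    if (used >= maxLen || need >= maxLen - used)
        return 0;                           /* would not fit: leave status untouched */
    memcpy(status + used, msg, need + 1);   /* copies the NUL as well */
    return 1;
}
```

With the 10-byte buffer from the question, appending "error1" succeeds, and a following "error2" is rejected because only three more characters fit.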

How Should I Define/Declare String Constants

I've always used string constants in C as one of the following
char *filename = "foo.txt";
const char *s = "bar"; /* preferably this or the next one */
const char * const s3 = "baz";
But, after reading this, now I'm wondering, should I be declaring my string constants as
const char s4[] = "bux";
?
Please note that linked question suggested as a duplicate is different because this one is specifically asking about constant strings. I know how the types are different and how they are stored. The array version in that question is not const-qualified. This was a simple question as to whether I should use constant array for constant strings vs. the pointer version I had been using. The answers here have answered my question, when two days of searching on SO and Google did not yield an exact answer. Thanks to these answers, I've learned that the compiler can do special things when the array is marked const, and there are indeed (at least one) case where I will now be using the array version.
Pointers and arrays are different. Defining string constants as pointers or arrays fits different purposes.
When you define a global string constant that is not subject to change, I would recommend you make it a const array:
const char product_name[] = "The program version 3";
Defining it as const char *product_name = "The program version 3"; actually defines 2 objects: the string constant itself, which will reside in a constant segment, and the pointer which can be changed to point to another string or set to NULL.
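A short sketch of the "one object versus two objects" point. The names product_name_ptr, array_bytes and retarget are hypothetical and used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One object: the array itself holds the characters. */
static const char product_name[] = "The program version 3";

/* Two objects: an unnamed array for the literal plus a pointer variable
   that may later be redirected or set to NULL. */
static const char *product_name_ptr = "The program version 3";

/* sizeof sees the whole array, including the terminating NUL. */
static size_t array_bytes(void)
{
    assert(strlen(product_name) + 1 == sizeof product_name);
    return sizeof product_name;
}

/* Redirects the pointer and reports what it now points at; the same
   assignment applied to product_name would not compile. */
static const char *retarget(const char *replacement)
{
    product_name_ptr = replacement;
    return product_name_ptr;
}
```

Because the array name cannot be reassigned, the array form rules out the accidental retargeting that the pointer form permits.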
Conversely, defining a string constant as a local variable would be better done as a local pointer variable of type const char *, initialized with the address of a string constant:
#include <stdio.h>
int main() {
    const char *s1 = "world";
    printf("Hello %s\n", s1);
    return 0;
}
If you define this one as an array, depending on the compiler and usage inside the function, the code will make space for the array on the stack and initialize it by copying the string constant into it, a more costly operation for long strings.
Note also that const char const *s3 = "baz"; is a redundant form of const char *s3 = "baz";. It is different from const char * const s3 = "baz"; which defines a constant pointer to a constant array of characters.
Finally, string constants are immutable and as such should have type const char []. The C Standard purposely allows programmers to store their addresses into non-const pointers, as in char *s2 = "hello";, to avoid producing warnings for legacy code. In new code, it is highly advisable to always use const char * pointers to manipulate string constants. This may force you to declare function arguments as const char * when the function does not change the string contents. This process is known as constification and avoids subtle bugs.
Note that some functions violate this const propagation: strchr() does not modify the string received, declared as const char *, but returns a char *. It is therefore possible to store a pointer to a string constant into a plain char * pointer this way:
char *p = strchr("Hello World\n", 'H');
This problem is solved in C++ via overloading. C programmers must deal with this as a shortcoming. An even more annoying situation is that of strtol() where the address of a char * is passed and a cast is required to preserve proper constness.
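One way to contain the strtol() wart is to hide it in a small wrapper. This is only a sketch; parse_long is a hypothetical name, and a local char * is used so that no cast is needed at the call site:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: parses a decimal number at the start of text and
   reports the unparsed remainder through a const-correct out-parameter. */
static long parse_long(const char *text, const char **rest)
{
    assert(text != NULL && rest != NULL);
    char *end;                          /* strtol insists on char ** here */
    long value = strtol(text, &end, 10);
    *rest = end;                        /* char * converts to const char * implicitly */
    return value;
}
```

Callers can then keep every pointer const-qualified, for example const char *rest; long n = parse_long("42abc", &rest);.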
The linked article explores a small artificial situation, and the difference demonstrated vanishes if you insert const after * in const char *ptr = "Lorum ipsum"; (tested in Apple LLVM 10.0.0 with clang-1000.11.45.5).
The fact the compiler had to load ptr arose entirely from the fact it could be changed in some other module not visible to the compiler. Making the pointer const eliminates that, and the compiler can prepare the address of the string directly, without loading the pointer.
If you are going to declare a pointer to a string and never change the pointer, then declare it as static const char * const ptr = "string";, and the compiler can happily provide the address of the string whenever the value of ptr is used. It does not need to actually load the contents of ptr from memory, since it can never change and will be known to point to wherever the compiler chooses to store the string. This is then the same as static const char array[] = "string";—whenever the address of the array is needed, the compiler can provide it from its knowledge of where it chose to store the array.
Furthermore, with the static specifier, ptr cannot be known outside the translation unit (the file being compiled), so the compiler can remove it during optimization (as long as you have not taken its address, perhaps when passing it to another routine outside the translation unit). The result should be no differences between the pointer method and the array method.
Rule of thumb: Tell the compiler as much as you know about stuff: If it will never change, mark it const. If it is local to the current module, mark it static. The more information the compiler has, the more it can optimize.
From the performance perspective, this is a fairly small optimization which makes sense for low-level code that needs to run with the lowest possible latency.
However, I would argue that const char s4[] = "bux"; is better from the semantic perspective, because the type of the right-hand side is closer to the type of the left-hand side. For that reason, I think it makes sense to declare string constants with the array syntax.

C - Enforcing string parameter to be in read-only memory

I'm optimizing some code, and I have a function like this:
const char * gStrPtr = NULL;
void foo (const char *str) {
    gStrPtr = strdup(str);
}
As of now, foo() is only called with constant strings. eg:
const char fooStr[]="Some really long string...";
foo(fooStr);
Notice that because it's always called with a constant, I should be able to just do:
void foo (const char *str) {
    gStrPtr=str;
}
But this leaves a sharp edge: if someone in the future breaks the convention and calls foo() with a dynamically allocated string that is later freed, it could cause undefined behavior.
I'm wondering if it's possible to create a compile-time or even a run-time check that checks if str is in read-only memory to avoid expensive bug-chases down the road.
Note: if I assume str is a string literal, then I can do it with a macro as so:
#define foo(str) foo_func("" str)
which will cause compile errors on non string-literals. But it also does not accept pointers to const chars.
EDIT
I thought I would post this after the discussion below. #CraigEtsy pointed out the use of __builtin_constant_p, which is a best-effort approach at this problem (but will likely suffice for my needs). I did the following tests with this, and got these results:
void foo(const char *str) {
    if (__builtin_constant_p(*str))
        printf("%s is constant\n", str);
    else
        printf("%s is not constant\n", str);
}
const char globalArray[] = "globalArray";
const char *globalPtr = "globalPtr";
int main()
{
    const char localArray[]="localArray";
    const char *localPtr="localPtr";
    char localNonConst[]="localNonConst";
    foo("literal"); // constant
    foo(localArray); // not constant
    foo(localPtr); // constant
    foo(globalArray); // constant
    foo(globalPtr); // not constant
    foo(localNonConst); // not constant
}
And when compiled with -O3, it gave results:
literal is constant
localArray is not constant
localPtr is constant
globalArray is constant
globalPtr is not constant
localNonConst is not constant
So, for my particular case, I can just switch the const char arr[]="str"'s to const char * arr="str", and then, in my foo(), I can check if the value is constant, and allocate memory and raise a runtime warning if not (and mark a flag so I know whether to free the pointer later on...).
I don't think there's any reasonable way to enforce this at runtime, at least not without machinery that would be many orders of magnitude more expensive than just calling strdup.
If the function is only supposed to take immutable strings as arguments (that's the word you're looking for -- immutable, in the sense that its lifetime will be the remainder of the process's lifetime and its contents will not change for the remainder of its lifetime), this needs to be a documented part of its interface contract.
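One way to put such a contract into the interface is to offer two entry points, so the caller states explicitly whether the string is immutable. This is a sketch only: foo_static, foo_copy, release_current and the gStrOwned flag are hypothetical names, and malloc plus memcpy stand in for strdup to stay within standard C:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static const char *gStrPtr = NULL;
static int gStrOwned = 0;   /* hypothetical flag: 1 if gStrPtr came from malloc */

/* Frees the stored string if this module owns it, then clears the slot. */
static void release_current(void)
{
    if (gStrOwned)
        free((void *)gStrPtr);
    gStrPtr = NULL;
    gStrOwned = 0;
}

/* Contract: str must stay valid and unchanged for the rest of the program. */
static void foo_static(const char *str)
{
    assert(str != NULL);
    release_current();
    gStrPtr = str;
}

/* No lifetime requirement on str: a private copy is stored instead.
   Returns 0 on success, -1 if allocation fails. */
static int foo_copy(const char *str)
{
    assert(str != NULL);
    size_t len = strlen(str) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, str, len);
    release_current();
    gStrPtr = copy;
    gStrOwned = 1;
    return 0;
}
```

Nothing here verifies that the argument to foo_static really is immutable. The function name and its comment carry the contract, and the copy is paid for only when the caller asks for it.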

Purpose of NULL at the end of array?

I found this code from Fedora's Defensive Coding manual. They have this code snippet:
static const char *const string_list[] = {
    "first",
    "second",
    "third",
    NULL
};
What's the point of putting NULL at the end of the array? Surely the size of the array will always be known since it's an array at global scope.
This is a so-called sentinel value. It lets you use whichever method of bounds checking suits your current approach.
Are you already dereferencing each element? Then check for NULL, and you don't need to keep track of an index. Are you counting? Then check the count variable.
The most prevalent use of this is there in most programs you write: The argument vector (char **argv) passed to main always has a NULL as the last member, although technically the int argc would have sufficed.
The number of elements is known (the count below includes the NULL entry):
elems = sizeof(string_list) / sizeof(char*);
but it's cleaner (and works more generally) to test for the NULL value, since you will be handling the string pointer anyway.
Note too the title of your link: "defensive". In a more general case, the array might not be full so relying on its size is unsafe - there may be uninitialised pointers.
This allows you to iterate through the list with a pointer without keeping track of the index:
for (const char * const * string = string_list; *string; string++)
    do_something(*string);
Without the const, for clarity:
for (char **string = string_list; *string; string++)
    do_something(*string);
This is the same technique that is often used with strings themselves:
for (char *c = string; *c; c++)
    process_char(c);
You can also pass the list to a function without having to also pass the length.
It also makes it easy to refer to a subsequence of the array, provided it's at the end.
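The last two points can be sketched with a function that receives only the list and no length. The name count_strings is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

static const char *const string_list[] = {
    "first",
    "second",
    "third",
    NULL
};

/* Counts entries up to, but not including, the NULL sentinel. */
static size_t count_strings(const char *const *list)
{
    assert(list != NULL);
    size_t n = 0;
    for (const char *const *p = list; *p != NULL; ++p)
        ++n;
    return n;
}
```

count_strings(string_list) yields 3, and count_strings(string_list + 1) yields 2 for the tail that starts at "second", without the caller ever computing a length.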

Distinguish between string and byte array?

I have a lot of functions that expect a string as argument, for which I use char*, but all my functions that expect a byte-array, also use char*.
The problem is that I can easily make the mistake of passing a byte-array in a string-function, causing all kinds of overflows, because the null-terminator cannot be found.
How is this usually dealt with? I can imagine changing all my byte-array functions to take a uint8_t*, and then the compiler will warn about signedness when I pass a string. Or what is the right approach here?
I generally make an array type something like the following:
typedef struct {
    unsigned char* data;
    unsigned long length;
    unsigned long max_length;
} array_t;
then pass array_t* around and create array functions that take array_t*:
void array_create(array_t* a, unsigned long length); // allocates memory, sets the max_length, zero length
void array_add(array_t* a, unsigned char byte); // add a byte
etc.
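A possible implementation of those two functions, as a sketch only. The int return values for error reporting, the doubling growth policy and the extra array_destroy function are assumptions that the answer does not spell out:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    unsigned char *data;
    unsigned long length;      /* bytes currently stored */
    unsigned long max_length;  /* bytes allocated */
} array_t;

/* Allocates room for max_length bytes; returns 0 on success, -1 on failure. */
static int array_create(array_t *a, unsigned long max_length)
{
    assert(a != NULL && max_length > 0);
    a->data = malloc(max_length);
    a->length = 0;
    a->max_length = a->data ? max_length : 0;
    return a->data ? 0 : -1;
}

/* Appends one byte, doubling the buffer when it is full; returns 0 or -1. */
static int array_add(array_t *a, unsigned char byte)
{
    assert(a != NULL);
    if (a->length == a->max_length) {
        unsigned long new_max = a->max_length ? a->max_length * 2 : 1;
        unsigned char *grown = realloc(a->data, new_max);
        if (grown == NULL)
            return -1;         /* the old buffer is still valid */
        a->data = grown;
        a->max_length = new_max;
    }
    a->data[a->length++] = byte;
    return 0;
}

/* Hypothetical addition: releases the buffer and resets the fields. */
static void array_destroy(array_t *a)
{
    assert(a != NULL);
    free(a->data);
    a->data = NULL;
    a->length = a->max_length = 0;
}
```

Since the length travels with the data, a zero byte is just another value, and a function that takes array_t * cannot be handed a plain string by accident.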
The problem is more general in C than you are thinking. Since char* and char[] are equivalent for function parameters, such a parameter may refer to three different semantic concepts:
a pointer to one char object (this is the "official" definition of pointer types)
a char array
a string
Where possible, the modern interfaces in the C standard use void* for an untyped byte array; you should probably adhere to that convention and use char* only for strings.
Plain char[] arrays are probably rarely used as such; I can't imagine a lot of use cases for them. If you think of the elements as numbers, you should use the signed or unsigned variant; if you see them just as bit patterns, unsigned char should be your choice.
If you really mean an array as function parameter (char or not) you can mark that fact for the casual reader of your code by clearly indicating it:
void toto(size_t n, char A[const n]);
This is equivalent to
void toto(size_t n, char *const A);
but makes your intention clearer. And in the future there might even be tools that do the bounds checking for you.
Write a common structure to handle both strings and bytes.
struct str_or_byte
{
    int type;
    union
    {
        char *buf;
        char *str;
    } pointer;
    int buf_length;
};
If type is not string, then access pointer.buf only up to buf_length. Otherwise access pointer.str directly without checking buf_length, and maintain it as a null-terminated string.
Alternatively, treat strings as byte arrays too by tracking only the length, without keeping a null terminator.
struct str_or_byte
{
    char *buf;
    int buf_length;
};
And don't use string manipulation functions that do not take a length. That means using strncpy, strncat, strncmp ... instead of strcpy, strcat, strcmp...
C relies on convention here. Here are the rules I use (fashioned after the standard library):
void foo(char* a_string);
void bar(void* a_byte_array, size_t number_of_bytes_in_the_array);
This is easy to remember. If you are passing a single char* ptr, then it MUST be a null-terminated char array.
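A small sketch of the two conventions side by side. The functions count_spaces and count_zero_bytes are hypothetical stand-ins for foo and bar:

```c
#include <assert.h>
#include <stddef.h>

/* String convention: a lone char pointer must be NUL-terminated. */
static size_t count_spaces(const char *a_string)
{
    assert(a_string != NULL);
    size_t n = 0;
    for (; *a_string != '\0'; ++a_string)
        if (*a_string == ' ')
            ++n;
    return n;
}

/* Byte-array convention: void pointer plus explicit length; zero bytes are data. */
static size_t count_zero_bytes(const void *a_byte_array, size_t number_of_bytes)
{
    assert(a_byte_array != NULL || number_of_bytes == 0);
    const unsigned char *bytes = a_byte_array;
    size_t n = 0;
    for (size_t i = 0; i < number_of_bytes; ++i)
        if (bytes[i] == 0)
            ++n;
    return n;
}
```

If the byte buffers themselves are declared as unsigned char, GCC and Clang with warnings enabled typically flag an attempt to pass one to a const char * parameter, which gives some compiler help on top of the convention.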

Resources