I couldn't find any question on Stack Overflow that addresses this.
I realize that char* arrays don't have to be NULL terminated, but was wondering when you would want it to be?
For example, when debugging my code, I use a lot of printf() calls to see if my variable is correct at a certain stage of the code.
I have a char** values that holds 4 char*, and I made the last char* NULL.
With NULL terminating, printfs from values[0] to values[3] give me this
note: names is just another array that I print right after I finish printing the values array
Testing values1[0]: %HOME/bin:%PATH
Testing values1[1]: /%HOME/include
Testing values1[2]: /%HOME/lib
Testing values1[3]: (null)
Testing names2[0]: PATH
Testing names2[1]: IDIR
Testing names2[2]: LIBDIR
I have a char** with 3 char*, all of which are valid char*.
Without NULL terminating, printf from values[0] to values[3] gives me this (names doesn't show)
Testing values1[0]: %HOME/bin:%PATH
Testing values1[1]: /%HOME/include
Testing values1[2]: /%HOME/lib
I thought printf(...., values[3]) would be undefined behavior, such as printing a garbage value, but as the output above shows, nothing from printf(...., values[3]) onward seems to have been executed.
There's a lot of confusion here. First of all:
NULL refers to a null pointer constant which you only use for setting a pointer to point at "nothing", an invalid memory location.
Null termination, sometimes spelled "nul" to not confuse it with the above, means putting a zero character '\0' (sometimes called "nul character") at the end of a character array to state where your string ends. It has nothing to do with NULL pointers whatsoever. A better name than "null termination" might be zero termination, as that is less confusing.
Now as it happens, 0, NULL and '\0' all give the value zero, so they could in practice be used for the wrong purpose and the code will still work. What you should do, however, is this:
Use 0 for integers.
Use NULL for pointers.
Use '\0' to terminate a character array and thereby make it a C string.
Next matter of confusion:
I have an char** values that holds 4 char*
Pointers do not hold values. Arrays hold values. Pointers are not arrays, arrays are not pointers. Pointer-to-pointer is not an array, nor is it a 2D array.
Though in some circumstances, you can get a pointer to the first element from an array.
An array of pointers to strings of variable length can be declared as: char* string_array [N];. You could iterate through this array by using a pointer-to-pointer, but that's not a good idea. A better idea is to use array indexing: string_array[i].
Overall, there exists very few cases where you actually need to use a pointer-to-pointer. Returning a pointer to an allocated resource through a function parameter is the normal use for them. If you find yourself using pointer-to-pointers elsewhere, it is almost a certain indication of bad program design.
For example, one particular case of very wide-spread but 100% incorrect use of pointer-to-pointer is when allocating 2D arrays dynamically on the heap.
when should char** be null terminated?
Never. That doesn't make any sense, as explained above. You should most likely not use char** to begin with.
You could however end a character pointer array with NULL, to indicate the end of the array. This is common practice, but don't confuse this with zero termination of strings.
Example:
const char* str_array [] =
{
    "hello",
    "world",
    NULL
};
for(size_t i = 0; str_array[i] != NULL; i++)
{
    puts(str_array[i]);
}
I got an answer from my TA, as a response to "when should char** be null terminated?" which I find reasonable. It would be cool if there are other reasons to why you would do this.
"This is a good conceptual question, and you can think of it as analogous to why C strings are null-terminated.
Suppose you didn't want to explicitly store the length of an array (because it's extra data for you to manage and pass around, etc). How would you know where the array ends? The NULL at the end acts as a sentinel value so you can simply iterate over it until you reach the magic end-of-array value.
If you have a fixed array size or are storing it in some other way, the NULL end isn't necessary."
Each string needs to be null terminated. An easy option is to memset the full array to zero (i.e. 0 or '\0') before filling it.
Alternatively, if you don't want to null terminate, then you need to keep track of the length of the string.
As per C11, chapter §7.21.6.1, fprintf(), %s conversion specifier
s If no l length modifier is present, the argument shall be a pointer to the initial
element of an array of character type.
So, you may not pass a NULL as the argument. It invokes undefined behavior. You cannot predict the behaviour of UB.
What you can do is, put a check on the argument to be != NULL and then, pass the variable. Something like
if (values[n])
    puts(values[n]);
In C99 (& POSIX), the only array of char* which is required to be NULL terminated is the argv second argument to main. Hence your main function is (or should be) declared as
int main(int argc, char**argv);
and (at least on POSIX systems) it is required (and enforced by the runtime crt0) and you should expect that:
argc is positive
argv is an array of argc+1 pointers, each of them being a C string (so ending with a \0 byte)
its last element argv[argc] is required to be the NULL pointer
two different pointers in argv (i.e. argv[i] and argv[j] with both i and j being non-negative and less than argc+1) are not pointer aliases (that is the same pointer value)
Of course, some libraries may also have functions whose argument might have similar requirements. That should be documented.
#include <stdio.h>
#include <string.h>
int main()
{
    char ch[20] = {'h','i'};
    int k=strlen(ch);
    printf("%d",k);
    return 0;
}
The output is 2.
As far as I know '\0' helps compiler identify the end of string but the output here suggests the strlen can detect the end on its own then why do we need '\0'?
long story short: it's your compiler making proactive decisions based on the standard.
long story:
char ch[20] = {'h','i'}
in the line above what you are implying to your compiler is;
allocate a memory big enough to store 20 characters (aka, array of 20 chars).
initialize first two slices (first two members of the array) as 'h' & 'i'.
implicitly initialize the rest.
since you are initializing your char array with fewer initializers than it has elements, the standard requires all remaining elements to be initialized to zero. The third element is therefore '\0', which acts as the null terminator, and so is every element after it.
if you were to remove the initialization syntax and assign each member manually like below, the remaining elements are left uninitialized (for an array with automatic storage), so calling strlen on it is undefined behavior.
char ch[20];
ch[0] = 'h';
ch[1] = 'i';
Also, if you were to leave no extra space for the null terminator, calling strlen would still be undefined behavior even with an initializer, as in the code snippet below:
char ch[2] = { 'h','i' };
int k = strlen(ch);
printf("%d\n%s\n", k, ch);
now, if you were to increase the array size of 'ch' from 2 to 3 or any other number higher than 2, you can see that your compiler initializes it with the null terminator thus no more undefined behavior.
In this declaration:
char ch[20] = {'h','i'};
the first two elements are initialized explicitly and all other elements are initialized implicitly by zeroes.
The above declaration in fact (with one exception: the third element of the array is also explicitly initialized) is equivalent to:
char ch[20] = "hi";
Note that the string literal is represented as the following array:
{ 'h', 'i', '\0' }
That is the array contains a string that is terminated by the zero character '\0' and the function strlen can successfully find the length of the stored string.
If you would write for example:
char ch[2] = "hi";
then in this case the array ch does not have a space to store the terminating zero of the string literal. In this case applying the function strlen to this array invokes undefined behavior.
A null byte (i.e. the value 0) is what defines the end of a string in C.
When you defined ch, you gave fewer initializers than there are elements in the array, so the remaining elements are set to 0. This results in a null terminated string.
The strlen function is basically looking for that value and counting how many elements it sees before it finds the null byte.
As far as I know '\0' helps compiler identify the end of string
Technically, it helps user code and the C runtime library identify the ends of strings. To the extent that the compiler needs to know where strings end, it knows without looking for a terminator.
but the output here suggests the strlen can detect the end on its own
That would be a misinterpretation. The actual fact is that your string is null-terminated even though you did not put a null terminator in it explicitly. This is a consequence of declaring your array with an initializer that specifies values for only some of the elements. As some of your other answers describe in more detail, that does not produce a partial initialization. Rather, elements for which the initializer does not specify values are default-initialized. For elements of type char, that means initialization with 0, which serves as a string terminator.
Moreover, if the array were without a terminator then the result of passing it to strlen() would be undefined. You could not then conclude anything from the result.
then why do we need '\0'?
So that user code and many standard library functions can recognize the ends of strings. You already know this.
But in many cases we do not need to provide terminators explicitly. In particular, we do not need to represent them in string literals (and it means something different than you probably intended if you do), and you don't need to represent them in the initializers for char arrays storing strings, provided that the array has more elements than you specify in the initializer.
Your array ch does contain zeros: because of the initializer, every byte after i is set to zero. You can view it with a debugger or simply test it in the code. Trust me, strlen needs the zero to work.
char string1[3][4]={"koo","kid","kav"}; //This is a 2D array
char * string[3]={"koo","kid","kav"}; //This is an array of 3 pointers pointing to 1D array as strings are stored as arrays in memory
char (*string1Ptr)[4]=string1; //This is a pointer to a 1D array of 4 characters
//I want to know differences between string1Ptr(pointer to array mentioned in question) and string(array of pointers mentioned in question). I only typed string1 here to give string1Ptr an address to strings
Besides the fact that string can point to strings of any size while string1Ptr can only point to strings of size 4 (otherwise pointer arithmetic would go wrong), I don't see any differences between them.
For example,
printf("%s\n", string1[2]); // All print the same thing, ie, the word "kav"
printf("%s\n", string1Ptr[2]);
printf("%s\n", string[2]);
They all seem to perform the same pointer arithmetic.(My reason for assuming string and string1Ptr are almost similar besides for the difference I stated above)
So what are the differences between string and string1Ptr? Any reason to use one over the other?
PS: I'm a newbie so please go easy on me.
Also, I did check C pointer to array/array of pointers disambiguation, it didn't seem to answer my question.
char string1[3][4]={"koo","kid","kav"}; //This is a 2D array
char * string[3]={"koo","kid","kav"}; //This is an array of 3 pointers pointing to 1D array as strings are stored as arrays in memory
char (*string1Ptr)[4]=string1; //This is a pointer to a 1D array of 4 characters
Besides the fact that string can point to strings of any size and
string1Ptr can only point to strings of size 4 only(otherwise
pointer arithmetic would go wrong), I don't see any differences between
them.
They are absolutely, fundamentally different, but C goes to some trouble to hide the distinction from you.
string is an array. It identifies a block of contiguous memory wherein its elements are stored. Those elements happen to be of type char * in this example, but that's a relatively minor detail. One can draw an analogy here to a house containing several rooms -- the rooms are physically part of and exist inside the physical boundaries of the house. I can decorate the rooms however I want, but they always remain the rooms of that house.
string1Ptr is a pointer. It identifies a chunk of memory whose contents describe how to access another, different chunk of memory wherein an array of 4 chars resides. In our real estate analogy, this is like a piece of paper on which is written "42 C Street, master bedroom". Using that information, you can find the room and redecorate it as you like, just as in the other case. But you can also replace the paper with a locator for a different room, maybe in a different house, or with random text, or you can even burn the paper altogether, without any of that affecting the room on C Street.
string1, for its part, is an array of arrays. It identifies a block of contiguous memory where its elements are stored. Each of those elements is itself an array of 4 chars, which, incidentally, happens to be just the type of object to which string1Ptr can point.
For example,
printf("%s\n", string1[2]); // All print the same thing, ie, the word "kav"
printf("%s\n", string1Ptr[2]);
printf("%s\n", string[2]);
They all seem to perform the same pointer arithmetic.(My reason for
assuming string and string1Ptr are almost similar besides for the
difference I stated above)
... and that is where C hiding the distinction comes in. One of the essential things to understand about C arrays is that in nearly all expressions,* values of array type are silently and automatically converted to pointers [to the array's first element]. This is sometimes called pointer "decay". The indexing operator is thus an operator on pointers, not on arrays, and indeed it does have similar behavior in your three examples. In fact, the pointer type to which string1 decays is the same as the type of string1Ptr, which is why the initialization you present for the latter is permitted.
But you should understand that the logical sequence of operations is not the same in those three cases. First, consider
printf("%s\n", string1Ptr[2]);
Here, string1Ptr is a pointer, to which the indexing operator is directly applicable. The result is equivalent to *(string1Ptr + 2), which has type char[4]. As a value of array type, that is converted to a pointer to the first element (resulting in a char *).
Now consider
printf("%s\n", string1[2]);
string1 is an array, so first it is converted to a pointer to its first element, resulting in a value of type char(*)[4]. This is the same type as string1Ptr, and evaluation proceeds accordingly, as described above.
But this one is a bit more different:
printf("%s\n", string[2]);
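Here, string is an array as well, so it is first converted to a pointer to its first element, a value of type char **, and the indexing operation applies to that. The result is equivalent to *(string + 2), which has type char *. That is already a pointer, so no further conversion is performed.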
Any reason to use one over the other?
Many, in both directions, depending on your particular needs at the time. Generally speaking, pointers are more flexible, especially in that they are required for working with dynamically allocated memory. But they suffer from the issues that
a pointer may be in scope, but not point to anything, and
declaring a pointer does not create anything for it to point to. Also,
even if a pointer points to something at one time during an execution of the program, and its value is not subsequently written by the program, it can nevertheless stop pointing to anything. (This most often is a result of the pointer outliving the object to which it points.)
Additionally, it can be both an advantage and a disadvantage that
a pointer can freely be assigned to point to a new object, any number of times during its lifetime.
Generally speaking, arrays are easier to use for many purposes:
declaring an array allocates space for all its elements. You may optionally specify initial values for them at the point of declaration, or in some (but not all) cases avail yourself of default initialization.
the identifier of an array is valid and refers to the array wherever it is in scope.
Optionally, if an initializer is provided then an array declaration can use it to automatically determine the array dimension(s).
* But only nearly all. There are a few exceptions, with the most important being the operand of a sizeof operator.
The difference between string1 and string is the same as the difference between:
char s1[4] = "foo";
char *s2 = "foo";
s1 is a writable array of 4 characters, s2 is a pointer to a string literal, which is not writable. See Why do I get a segmentation fault when writing to a string initialized with "char *s" but not "char s[]"?.
So in your example, it's OK to do string1[0][0] = 'f'; to change string1[0] to "foo", but string[0][0] = 'f'; causes undefined behavior.
Also, since string is an array of pointers, you can reassign those pointers, e.g. string[0] = "abc";. You can't assign to string1[0] because the elements are arrays, not pointers, just as you can't reassign s1.
The reason that string1Ptr works is because the string1 is a 2D array of char, which is guaranteed to be contiguous. string1Ptr is a pointer to an array of 4 characters, and when you index it you increment by that number of characters, which gets you to the next row of the string1 array.
I am looking for a solution for my problem (newbie here).
I have an array of strings (char** arrayNum) and I would like to store the number of elements of that array at the first index.
But I can't find the right way to convert the number of elements to a string (or character).
I have tried itoa, casting (char), +'0', snprintf ... nothing works.
As every time I ask a question on Stack Overflow, I am sure the solution will be obvious. Thanks in advance for your help.
So I have an array of strings char** arrayNum which I populate, leaving the index 0 empty.
When I try to assign a string to arrayNum[0]:
This works: arrayNum[0] = "blabla";
This does not work: arrayNum[0] = (char) ((arraySize - 1)+'0');
I have tried countless other combinations, I don't even remember...
arrayNum can be thought of as an array of strings (char *). So you will naturally have trouble trying to assign a char (or indeed any type other than char *) to an element of this array.
I think it would be preferable to store the length of the array separately from the array. For example, using a struct. To do otherwise invites confusion.
If you really, really want to store the length in the first element, then you could do something like:
arrayNum[0] = malloc(sizeof(char));
arrayNum[0][0] = (char) ((arraySize - 1)+'0');
This takes advantage of the fact that arrayNum is strictly an array of pointers and each of those pointers is a pointer to a char. So it can point to a single character or an array of characters (a "string").
Compare this for clarity with (say):
struct elements {
    int length;
    char **data;
};
arrayNum is not an "array of strings."
It might be useful for you to think about it that way, but it is important for you to know what it really is. It is an array of pointers where each pointer is a pointer to char.
Sometimes a pointer to char is a "string," and sometimes it's a pointer into the middle of a string, and sometimes it's just a pointer to some character somewhere. It all depends on how you use it.
The C programming language does not really have strings. It has string literals, but a string literal is just an array of characters that you must not modify and that happens to end with a \000. The reason you can write arrayNum[0] = "blabla"; is because the value of the string literal "blabla" is a pointer to the first 'b' in "blabla", and the elements of the arrayNum array are pointers to characters.
It's your responsibility to decide whether arrayNum[i] points to the first character of some string, or whether it just happens to point to some single character; and it's your responsibility to decide and keep track of whether it points to something that needs to be free()d or whether it points to read-only memory, or whether it points to something on the stack, or whether it points to/into some statically allocated data structure.
The language doesn't care.
Before putting my question, I want to quote "Expert C Programming" [Page :276, last paragraph]:
"The beauty of Iliffe vector data structure is that it allows arbitrary arrays of pointers to strings to be passed to functions, but only arrays of pointers, and only pointers to strings.
This is because both strings and pointers have the convention of an explicit out-of-bound value(NUL and NULL, respectively) that can be used as an end marker."
So, what I understood from the above text is that if there is an array of pointers, it has an explicit out-of-bound value like NULL. (Correct me if I'm wrong...)
So, it left me wondering what are the default values of an array of pointers(thinking that an array of pointers would have last pointer as NULL). Tried below code-snippets and result was very different.
int *x[2];
printf("%p %p",x[0],x[1]);
Output is: (nil) 0x400410
int *x[3];
printf("%p %p %p",x[0],x[1],x[2]);
Output is: 0xf0b2ff 0x400680 (nil)
int *x[4];
printf("%p %p %p %p", x[0],x[1],x[2],x[3]);
Output is: (nil) 0x4003db 0x7fffe48e4776 0x4006c5
So, with the above outputs , it is clear that there is an explicit Out-of-Bound (nil) value assigned to one of the pointers(one pointer is NIL), but is it truly the end-marker? No.
Is it one of those "Implementation defined" things of C-language?
I'm using a GCC compiler(4.6.3) on a Ubuntu machine.
Is it one of those "Implementation defined" things of C-language?
No, that's not implementation-defined - it's plain "undefined". The same is true for arrays of all types: the values that you see in them are undefined until explicitly initialized.
What I understood from above text is that if there is an array of pointers they have explicit out-of-bound value like NULL.
The author wanted to say that there is a value (specifically, NULL value) that can be used to mark a "no value" in an array of pointer. The author did not mean to imply that such a no-value marker would be placed into an array of pointers by default.
An array, or any object, with automatic storage duration (i.e., any object defined within a function body without the static keyword) has no default initial value unless you specify one. Its initial value is garbage, and you must not access that value before assigning something to it.
An object with static storage duration (i.e., any object defined outside any function and/or with the static keyword) is initialized to zero, with the meaning of "zero" (0 for integers, 0.0 for floating-point, null for pointers) applied recursively to subobjects.
You can use an initializer to ensure that a pointer object is set to a null pointer, or to whatever value you like:
int *x[2] = { NULL, NULL };
or, more simply:
int *x[2] = { 0 }; /* sets first element to 0, which is converted to a null
pointer; other elements are implicitly set to null
pointers as well */
You are misreading the quotation from "Expert C Programming." The key phrase there is the following:
This is because both strings and pointers have the *convention* of an explicit
out-of-bound value (NUL and NULL, respectively).
It is possible and even conventional to have an array of strings such that the last pointer is set to NULL. This can allow one to iterate over the array quite easily without knowing exactly how many elements there are in the array:
char* dwarves[] = { "Dopey",
                    "Grumpy",
                    "Sleepy",
                    "Happy",
                    "Sneezy",
                    "Bashful",
                    "Doc",
                    NULL
                  };
But you have to explicitly set the last pointer to NULL. Such structures are useful because they allow elegant code. So if you want to print or otherwise manipulate the array, you don't need to worry about how many strings are in it, as the NULL pointer will signal the end:
for (char** pWalk = dwarves; *pWalk; pWalk++)
    printf ("%s\n", *pWalk);
The beauty of this particular type of ragged-array structure is that strings by definition have a built-in NUL terminator, and the array of pointers is terminated with the NULL, so the endpoints of both dimensions are known. However, the NULL as the last pointer in the array is not something that's built into the language. It has to be explicitly set. Failing to do so would be the equivalent of declaring an array of char but not terminating it with a NUL:
char myString[] = { 'H', 'e', 'l', 'l', 'o' }; // No NUL termination
Just as you would have to know how many characters there are in this array if you want to manipulate it in any useful way, without the NULL at the end of the array of pointers, manipulating it would be more difficult.
That's really all that Peter van der Linden is saying in the paragraph you quoted about Iliffe data structures.
There is no requirement in C that any local variable should have any 'default' value. So, when the compiler reserves two (or three) memory locations, the initial value is whatever these memory locations contained before; there will not be any default initialization.
Unless your array was declared at file scope (outside of any function) or with the static keyword, the array contents will not be initialized; each element will contain some random bit pattern that may or may not correspond to a valid address.
If your array was declared at file scope or with the static keyword, then each element would be implicitly initialized to NULL. Note that attempting to dereference a NULL pointer results in undefined behavior, so you will want to check that your pointer isn't NULL before doing something with it.
A null pointer represents a well-defined "nowhere", guaranteed to compare unequal to any valid memory address. Note that there is a null pointer constant [1] and a null pointer value [2], and the two are not necessarily the same. In your source code, the macro NULL expands to the null pointer constant. During translation, each occurrence of NULL in your source code is replaced with the real null pointer value.
There are invalid pointer values other than NULL; it's just that NULL is well-defined and works the same everywhere.
1. Any 0-valued integral expression, as used in a pointer context. Could be a naked 0, or (void *) 0, or something else that evaluates to 0.
2. Value used by the platform to represent a null pointer, which does not have to be 0.
This is a general question about C.(I dont have a lot of experience coding in C)
So, if I have a function that takes a char* as an argument, how do I know whether it's a pointer to a single char or a char array? If it's a char array I can expect a \0, but if it's not a char array then I wouldn't want to search for \0.
Is char* in argument a pointer to a single char or a char array?
Yes.
A parameter of type char* is always a pointer to a char object (or a null pointer, not pointing to anything, if that's what the caller passes as the corresponding argument).
It's not a pointer to an array (that would be, for example, a pointer of type char(*)[42]), but the usual way to access the elements of an array is via a pointer to the element type, not to the whole array. Why? Because an actual pointer-to-array must always specify the length of the array (42 in my example), which is inflexible and doesn't let the same function deal with arrays of different lengths.
A char* parameter can be treated just as a pointer to a single char object. For example, a function that gets a character of input might be declared like this:
bool get_next_char(char *c);
The idea here is that the function's result tells you whether it was successful; the actual input character is "returned" via the pointer. (This is a contrived example; <stdio.h> already has several functions that read characters from input, and they don't use this mechanism.)
Compare the strlen function, which computes the length of a string:
size_t strlen(const char *s);
s points to the first element of an array of char; internally, strlen uses that pointer to traverse the array, looking for the terminating '\0' character.
Ignoring the const, there's no real difference between the char* parameters for these two functions. In fact, C has no good way to distinguish between these cases: a pointer that simply points to a single object vs. a pointer that points to the first element of an array.
It does have a bad way to make that distinction. For example, strlen could be declared as:
size_t strlen(const char s[]);
But C doesn't really have parameters of array type at all. The parameter declaration const char s[] is "adjusted" to const char *s; it means exactly the same thing. You can even declare a length for something that looks like an array parameter:
void foo(char s[42]);
and it will be quietly ignored; the above really means exactly the same thing as:
void foo(char *s);
The [42] may have some documentation value, but a comment has the same value -- and the same significance as far as the compiler is concerned.
Any distinction between a pointer to a single object and a pointer to the first element of an array has to be made by the programmer, preferably in the documentation for the function.
Furthermore, this mechanism doesn't let the function know how long the array is. For char* pointers in particular, it's common to use the null character '\0' as a marker for the end of a string -- which means it's the caller's responsibility to ensure that that marker is actually there. Otherwise, you can pass the length as a separate argument, probably of type size_t. Or you can use any other mechanism you like, as long as everything is done consistently.
... because if it's a char array I can expect a \0 ...
No, you can't, at least not necessarily. A char* could easily point to the first element of a char array that's not terminated by a '\0' character (i.e., that doesn't contain a string). You can impose such a requirement if you like. The standard library functions that operate on strings impose that requirement -- but they don't enforce it. For example, if you pass a pointer to an unterminated array to strlen, the behavior is undefined.
Recommended reading: Section 6 of the comp.lang.c FAQ.
You cannot determine how many bytes are referenced by a pointer. You need to keep track of this yourself.
It is possible that a char array is NOT terminated with a \0 in which case you need to know the length of the array. Also, it is possible for an array to have a length of 1, in which case you have one character with no terminating \0.
The nice thing about C is that you get to define the details about data structures, thus you are NOT limited to a char array always ending with \0.
Some of the terms used to describe C data structures overlap. For example, an array is a sequential series of data elements, and an array of characters holds a string when it is terminated with a null char (\0).