Defining string arrays - c

Why is the following an acceptable way to initialize an array of strings:
char * strings[] = { "John", "Paul", NULL};
But this way will fail:
char ** strings = { "John", "Paul", NULL};
My thought was that it would work relatively the same as doing:
char string[] = "John";
char * string = "Paul";
Where both work. What's the difference between the two?

char * strings[] is an array of pointers. When you initialize it as
char * strings[] = { "John", "Paul", NULL};
the strings "John" and "Paul" are string literals. They are constant data stored somewhere in the program image, often in read-only memory. The initialization copies a pointer to the string literal "John" into strings[0], and so on, i.e.
strings[0] --> holds a pointer to "John".
strings[1] --> holds a pointer to "Paul"
Note that the string literals should not be modified by your program. If you do, it is undefined behaviour.
In the case of char ** strings, you have a pointer to a pointer. It is a single object holding one address and cannot hold several pointers on its own. So you cannot initialize it as below.
char ** strings = { "John", "Paul", NULL}; // error
However, a pointer to a pointer can be used along with dynamic memory allocation (malloc, calloc, etc.) to point to an array of strings.
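A minimal sketch of that dynamic approach. The helper names make_strings and free_strings are hypothetical (they are not part of the original answer), and the array is assumed to be NULL-terminated like the one in the question:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds a NULL-terminated array holding heap-allocated
   copies of the given strings. Returns NULL if any allocation fails. */
char **make_strings(const char *const *src, size_t count)
{
    assert(src != NULL);

    /* One extra slot for the terminating NULL pointer. */
    char **strings = malloc((count + 1) * sizeof *strings);
    if (strings == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(src[i]) + 1;   /* include the '\0' */
        strings[i] = malloc(len);
        if (strings[i] == NULL) {
            while (i > 0)                  /* undo the earlier allocations */
                free(strings[--i]);
            free(strings);
            return NULL;
        }
        memcpy(strings[i], src[i], len);
    }
    strings[count] = NULL;
    return strings;
}

/* Releases everything allocated by make_strings. */
void free_strings(char **strings)
{
    if (strings == NULL)
        return;
    for (char **p = strings; *p != NULL; p++)
        free(*p);
    free(strings);
}
```

Because every element points to memory obtained from malloc, the copies may be modified, unlike the string literals they were made from.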
char string[] = "John";
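In this case, you have a char array into which the string literal is copied. The compiler arranges this copy: for an array with static storage duration the contents are typically set up before main starts, and for a local array the copy is made when the declaration is reached at run time.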
char * string = "Paul";
Here you have a char pointer which points to a string literal.
The difference between the above two statements is that with the char array you can modify the elements of string, but in the second case you cannot, because the pointer refers to a string literal.
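The difference can be sketched as follows. The function names are made up for this illustration, and the pointer is declared const (an addition to the original snippet) so that an accidental write is rejected at compile time:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the literal into a local array, changes one character of that copy,
   and writes the result to out, which must have room for 5 bytes. */
void modify_array_copy(char *out, size_t out_size)
{
    char string[] = "John";          /* array of 5 chars: 'J','o','h','n','\0' */
    assert(out != NULL && out_size >= sizeof string);

    string[0] = 'P';                 /* fine: the array is writable storage */
    memcpy(out, string, sizeof string);
}

/* Size of the array created from the literal "John". */
size_t array_size(void)
{
    char string[] = "John";
    return sizeof string;            /* four letters plus the terminator */
}

/* Reading through a pointer to a literal is always allowed. */
char read_through_pointer(void)
{
    const char *string = "Paul";     /* writing through it would not compile */
    return string[0];
}
```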

Related

What is the difference between char stringA[LEN] and char* stringB[LEN] in C

I read a few similar questions, C: differences between char pointer and array, What is the difference between char s[] and char *s?, What is the difference between char array[] and char *array? but none of them seem to clear my doubt.
I'm aware that
char *s = "Hello world";
makes the string immutable whereas
char s[] = "Hello world";
can be modified.
My doubt is if I do char stringA[LEN]; and char* stringB[LEN]; Are they any different? Or does stringB again becomes immutable as in the case before?
Let me give you a visual explanation:
As you can see, a is an array of 4 characters, whereas b is an array of 4 character pointers, each pointing to the beginning of a C string. Each one of those strings can have a different length.
Are they any different?
Yes.
Both variables stringA and stringB are arrays. stringA is an array of char of size LEN and stringB is an array of char * of size LEN.
char and char * are two different types. stringA can hold a single character string of at most LEN - 1 characters plus the terminating '\0', while the elements of stringB can point to LEN different strings.
Or does stringB again becomes immutable as in the case before?
Whether the strings pointed to by the elements of stringB are mutable depends on how their memory is allocated. If the elements are initialized with string literals
char* stringB[LEN] = { "Apple", "Bapple", "Capple"};
then they are immutable. In case of
for (int i = 0; i < LEN; i++)
    stringB[i] = malloc(30); // Allocating 30 bytes for each element
strcpy(stringB[0], "Apple");
strcpy(stringB[1], "Bapple");
strcpy(stringB[2], "Capple");
they are mutable.
They are not the same.
Here stringA is an array of char, that means printing stringA[0] will show the letter S:
char stringA[] = "Something";
Whereas printing stringB[0] here will show Something (array of pointer to char):
char* stringB[] = { "Something", "Else" };
They do not have the same data type, so they are not even comparable.
char stringA[LEN]; is a char array of LEN length. (Array of chars)
char* stringB[LEN]; is a char * array of LEN length. (Array of char pointers)
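One way to see the difference is to compare the sizes of the two arrays. This is only a sketch: the value of LEN and the function names are assumptions chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define LEN 8   /* assumed value, chosen only for this illustration */

/* Size in bytes of an array of LEN chars. */
size_t size_of_char_array(void)
{
    char stringA[LEN];
    return sizeof stringA;           /* LEN * sizeof(char), i.e. LEN */
}

/* Size in bytes of an array of LEN pointers to char. */
size_t size_of_pointer_array(void)
{
    char *stringB[LEN];
    /* Every element is a pointer, so the total is a multiple of the pointer size. */
    assert(sizeof stringB % sizeof(char *) == 0);
    return sizeof stringB;           /* LEN * sizeof(char *) */
}
```

stringA only has room for characters, while stringB only has room for addresses. The characters that the elements of stringB point to live somewhere else.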
FWIW, in case of char *s = "Hello world"; s is a pointer which points to a string literal which is non-modifiable. The pointer itself can certainly be changed. Only the content it points to (values) cannot be changed.
The reason
char* s = "Hello World!";
is immutable is that "Hello World!" is typically stored in read-only (RO) memory, so trying to change it is undefined behaviour and often crashes the program. Declaring it as a pointer to the first element of an "array" is NOT the reason it's immutable. You seem to be slightly confused as well. Assuming your question is exactly what you mean,
char stringA[LEN] = "ABC";
is a string in traditional C style, but stringB as you've defined isn't a string, it's an array of strings -
char* stringB[LEN] = {"ABC", "DEF"};
Assuming you mean what I think you mean,
char stringA[LEN] = "Hello World!";
char *stringB = malloc(LEN);
strcpy(stringB, stringA);
In this case, stringB IS mutable, since it refers to writable memory.

Why can't define string array using char **?

We can assign a string constant to char * or char [ ] just like:
char *p = "hello";
char a[] = "hello";
Now for string array, naturally it'll be like this:
char **p = {"hello", "world"}; // Error
char *a[] = {"hello", "world"};
The first way generates a warning when compiling and causes a segmentation fault when I try to print the string constant with printf("%s\n", p[0]);
Why ?
char **p = {"hello", "world"};
Here, p is a pointer to pointer to char, which can't be initialized with an array initializer holding several pointers (each string literal is converted to a pointer during initialization; the actual type of a string literal in C is char[n]).
The types are incompatible: p is of type char **, while the initializer would fit an object of type char *[]. Hence, the compiler issues a diagnostic.
Whereas,
char *a[] = {"hello", "world"};
is valid as a is an array of pointers to char and the types match. Hence this is a valid initialization.
Since C99, the C language supports Compound literals (6.5.2.5, C99) using which you can initialize:
char **p = (char *[]) {"hello", "world"};
So use a compound literal if your compiler supports C99 or later. Otherwise, stick to the array of pointers initialization (char *a[]).
The GCC manual has various examples of compound literals.
To summarize: the plain initializer fails because char **p points to a pointer, not to an array of pointers.
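A small sketch of the compound literal form. The NULL terminator and the function names are assumptions added for this illustration, so that the number of entries can be counted:

```c
#include <assert.h>
#include <stddef.h>

/* Counts the entries of a NULL-terminated array of strings. */
size_t count_strings(char **p)
{
    assert(p != NULL);
    size_t n = 0;
    while (p[n] != NULL)
        n++;
    return n;
}

/* Initializes a char ** from a compound literal and counts its entries. */
size_t count_compound_literal(void)
{
    /* The compound literal creates an unnamed array of three pointers.
       Inside a function it has automatic storage duration, so it only
       lives until the enclosing block ends. */
    char **p = (char *[]){ "hello", "world", NULL };
    return count_strings(p);
}
```

Because of that lifetime, a pointer to such a compound literal should not be returned from the function that creates it.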
That happens because you need to build the array yourself. For each level (or dimension) you need to reserve memory for the pointer holders.
What you need to do is:
// this holds your pointers
char **p = malloc(sizeof(char *) * nr_of_elements);
// to set the elements, you need to:
p[0] = "hello";
p[1] = "world";
And this needs to be done for as many levels (or dimensions) as you have.
Because char **p is a pointer to a pointer, not an array of pointers, whereas char *a[] is an array of pointers.
char *ptr ="hello";
defines ptr to be a pointer to the (read-only) string "hello", so ptr contains the address of that string, say 100. ptr must itself be stored somewhere, say at location 200.
char **p = &ptr;
Now p points to ptr, that is, it contains the address of ptr (which is 200).
printf("%s", *p);
So you're creating a pointer to another pointer, but no extra memory to store more than one string, whereas char *a[] creates an array of pointers whose size depends on the initializer you give it.
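The two levels of indirection can be sketched like this. The function names are hypothetical, and const is added because the target is a string literal:

```c
#include <assert.h>
#include <string.h>

/* Follows one level: returns the string that the pointed-to pointer refers to. */
const char *follow(const char **p)
{
    assert(p != NULL);
    return *p;                       /* *p is the char pointer itself */
}

/* Follows both levels: returns the first character of that string. */
char first_char(const char **p)
{
    assert(p != NULL && *p != NULL);
    return **p;                      /* **p is a single char, not a string */
}
```

This is why %s needs *p as its argument: passing **p would hand printf a single char where it expects a pointer.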
A piece of advice:
Do not use char *p = "hello";
Use const char *p = "hello";
because string literals may be stored in read-only memory.
First, char *p = "hello" is different from char a[] = "hello".
char *p = "hello"
"hello" is a string literal, an array of char that is converted to a pointer to its first element here. That pointer is assigned to p. Since the literal must not be modified, it is better to use const char *p = "hello".
char a[] = "hello"
The characters in "hello" are copied into the array a, including the terminating '\0'.
Second,
char *a[]
defines an array of pointers, so it's OK to use char *a[] = {"hello", "world"};
char **p
defines a pointer to a pointer to a char, so it makes no sense to use char **p = {"hello", "world"};

c structs, pointer and memory allocation for fields

Suppose the following code:
struct c {
char* name;
};
int main(int argc, char *argv[]) {
struct c c1;
c1.name = "Ana";
printf ("%s\n",c1.name);
return 0;
}
My first reaction would have been to think that I needed to allocate some space, either on the heap, or by an explicit char name[] = "Anna", but my example above works. Is the compiler just storing that string in the Data segment and pointing to it? In other words, is that like doing a
struct c {
char* name = "Ana";
};
Thanks.
struct c c1;
c1.name = "Ana";
You don't have to allocate memory here because you are making the pointer c1.name point to a string literal, and string literals have static storage duration. This is NOT similar to:
char name[] = "Anna";
Because in this case memory is allocated for the array name, and the string literal "Anna" is then copied into it. What you do with the struct assignment c1.name = "Ana" is similar to when you do:
char *name = "Anna";
i.e. make the pointer point to a string literal.
I am new to C but from what I think this could be just the same as
char *cThing;
cThing = "Things!";
where printf("%s\n", cThing); would then print "Things!", except you're declaring the pointer in a struct.
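If the struct should own a writable copy of the name instead of pointing at a literal, memory does have to be allocated. A minimal sketch, assuming a hypothetical helper c_set_name_copy that is not part of the question:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct c {
    char *name;
};

/* Hypothetical helper: gives the struct its own heap-allocated copy of name.
   Returns 0 on success and -1 if the allocation fails. The caller is
   responsible for calling free(obj->name) later. */
int c_set_name_copy(struct c *obj, const char *name)
{
    assert(obj != NULL && name != NULL);

    size_t len = strlen(name) + 1;   /* include the '\0' */
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;

    memcpy(copy, name, len);
    obj->name = copy;
    return 0;
}
```

With the plain assignment c1.name = "Ana" nothing needs to be freed, but the characters must not be modified. With the copy, the characters are writable and the memory has to be released.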

C pointers and array declaration

I'm currently learning C through "Learning C the Hard Way"
I am a bit confused in some sample code as to why some arrays must be initialized with a pointer.
int ages[] = {23, 43, 12, 89, 2};
char *names[] = {
"Alan", "Frank",
"Mary", "John", "Lisa"
};
In the above example, why does the names[] array require a pointer when declared? How do you know when to use a pointer when creating an array?
A string literal such as "Alan" is of type char[5], and to point to the start of a string you use a char *. "Alan" itself is made up of:
{ 'A', 'l', 'a', 'n', '\0' }
As you can see it's made up of multiple chars. This char * points to the start of the string, the letter 'A'.
Since you want an array of these strings, you then add [] to your declaration, so it becomes: char *names[].
Prefer const pointers when you use string literals.
const char *names[] = {
"Alan", "Frank",
"Mary", "John", "Lisa"
};
In this declaration, names is an array of const char pointers, which means it holds 5 const char * pointers to C strings. When you want to use a pointer, you use a pointer; it's as simple as that.
Example:
const char *c = "Hello world";
So, when you use them in an array, you're creating 5 const char* pointers which point to string literals.
Because the content of the array is a char*. The other example has an int. "Alan" is a string, and in C you declare strings as char pointers.
In the case of char *names[] you are declaring an array of string pointers.
A string in C, e.g. "Alan", is a series of characters in memory ended with a \0 value marking the end of the string.
So with that declaration you are doing this:
names[0] -> "Alan\0"
names[1] -> "Frank\0"
...
Then you can use names[n] as the pointer to the string:
printf( "%s:%zu", names[0], strlen(names[0]) );
which gives output "Alan:4"
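The same formatting can be sketched with snprintf so that the result is easy to check. The function name format_name is made up for this example; note that strlen returns a size_t, for which the matching conversion is %zu:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "<name>:<length>" into buf and returns what snprintf returns,
   i.e. the number of characters the full result needs. */
int format_name(char *buf, size_t size, const char *name)
{
    assert(buf != NULL && name != NULL);
    /* strlen returns size_t, so the conversion is %zu rather than %d. */
    return snprintf(buf, size, "%s:%zu", name, strlen(name));
}
```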
The use of "array of pointer" is not required.
The following will work as well. It's an array of 20 byte character arrays. The compiler only needs to know the size of the thing in the array, not the length of the array. What you end up with is an array with 5 elements of 20 bytes each with a name in each one.
#include <stdio.h>
char names[][20] = {
"Alan", "Frank",
"Mary", "John", "Lisa"
};
int main(int argc, char *argv[])
{
int idx;
for (idx = 0; idx < 5; idx++) {
printf("'%s'\n", names[idx]);
}
}
In your example the size of the thing in the array is "pointer to char". A string constant can be used to initialize either a "pointer to char" or an "array of char".
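A sketch of what the fixed-width layout implies, using the same names array. The accessor functions are hypothetical and only serve the illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Five rows of 20 bytes each; every row holds its own copy of a name. */
static char names[][20] = {
    "Alan", "Frank",
    "Mary", "John", "Lisa"
};

/* Number of rows, derived from the sizes the compiler worked out. */
size_t names_count(void)
{
    return sizeof names / sizeof names[0];
}

/* Total storage used by the array: rows times 20 bytes. */
size_t names_bytes(void)
{
    return sizeof names;
}

/* The rows are ordinary char arrays, so they can be overwritten in place. */
void rename_entry(size_t idx, const char *new_name)
{
    assert(idx < names_count());
    assert(new_name != NULL && strlen(new_name) < sizeof names[0]);
    strcpy(names[idx], new_name);
}

/* Read access to one row. */
const char *get_name(size_t idx)
{
    assert(idx < names_count());
    return names[idx];
}
```

The trade-off is that every name occupies 20 bytes regardless of its length, whereas an array of pointers stores one pointer per name and leaves the characters wherever the literals live.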

Do these statements about pointers have the same effect?

Does this...
char* myString = "hello";
... have the same effect as this?
char actualString[] = "hello";
char* myString = actualString;
No.
char str1[] = "Hello world!"; //char-array on the stack; string can be changed
char* str2 = "Hello world!"; //char-array in the data-segment; it's READ-ONLY
The first example creates an array of size 13*sizeof(char) on the stack and copies the string "Hello world!" into it.
The second example creates a char* on the stack and points it to a location in the data-segment of the executable, which contains the string "Hello world!". This second string is READ-ONLY.
str1[1] = 'u'; //Valid
str2[1] = 'u'; //Invalid - MAY crash program!
No. The first one gives you a pointer to const data, and if you change any character via that pointer, it's undefined behavior. The second one copies the characters into an array, which isn't const, so you can change any characters (either directly in array, or via pointer) at will with no ill effects.
No. In the first one, you can't modify the string pointed to by myString; in the second one you can.
It isn't the same, because the unnamed array pointed to by myString in the first example is read-only and has static storage duration, whereas the named array in the second example is writeable and has automatic storage duration.
On the other hand, this is closer to being equivalent:
static const char actualString[] = "hello";
char* myString = (char *)actualString;
It's still not quite the same though, because the unnamed arrays created by string literals are not guaranteed to be unique, whereas explicit arrays are. So in the following example:
static const char string_a[] = "hello";
static const char string_b[] = "hello";
const char *ptr_a = string_a;
const char *ptr_b = string_b;
const char *ptr_c = "hello";
const char *ptr_d = "hello";
ptr_a and ptr_b are guaranteed to compare unequal, whereas ptr_c and ptr_d may be either equal or unequal - both are valid.
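The guarantee can be checked with a short sketch. The function names are invented for this example, and the second function deliberately makes no promise about its result:

```c
#include <assert.h>
#include <stdbool.h>

static const char string_a[] = "hello";
static const char string_b[] = "hello";

/* Two named arrays are distinct objects, so their addresses differ. */
bool named_arrays_share_address(void)
{
    const char *ptr_a = string_a;
    const char *ptr_b = string_b;
    return ptr_a == ptr_b;
}

/* Two identical string literals may or may not be merged by the compiler,
   so this result is unspecified and can differ between builds. */
bool literals_share_address(void)
{
    const char *ptr_c = "hello";
    const char *ptr_d = "hello";
    return ptr_c == ptr_d;
}
```

Code that needs to know whether two strings have the same contents should therefore use strcmp rather than comparing pointers.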

Resources