I am confused about how an array of pointers to char works in C.
Here is a sample of the code I am using to understand arrays of pointers to char.
#include <stdio.h>
int main() {
char *d[]={"hi","bye"};
int a;
a = (d[0]=="hi") ? 1:0;
printf("%d\n",a);
return 0;
}
I am getting a = 1, so d[0] == "hi". What confuses me is that, since d is an array of char pointers, shouldn't d[0] be equal to the address of the h of the "hi" string?
C 2018 6.4.5 specifies the behavior of string literals. Paragraph 6 specifies that a string literal in source code causes the creation of an array of characters. When an array is used in an expression other than as the operand of sizeof, the operand of unary &, or as a string literal used to initialize an array, it is automatically converted to a pointer to its first character. So char *d[]={"hi","bye"}; initializes d[0] and d[1] to point to the first characters of "hi" and "bye", and d[0]=="hi" compares d[0] to a pointer to the first character of another "hi".
Paragraph 7 says that the same array may be used for identical string literals:
It is unspecified whether these arrays are distinct provided their elements have the appropriate values…
Thus, when your compiler is taking the address of the first element of "hi" in d[0]=="hi", it may, but is not required to, use the same memory for "hi" as it did when initializing d[0] in char *d[]={"hi","bye"};.
(Note that the paragraph also allows the same memory to be used for string literals identical to the substrings at the ends of other string literals. For example, "phi" and "hi" could share memory.)
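A minimal sketch of the practical consequence, assuming you want to compare text rather than addresses: == on two char * values compares addresses, so for identical literals its result is unspecified, whereas strcmp from <string.h> compares the characters. The helper names same_address, same_contents and first_char_of_hi are hypothetical, introduced only for this illustration.

```c
#include <string.h>

/* Hypothetical helpers, for illustration only. */

/* Compares two pointers: true only if they hold the same address.
   For two identical string literals the result is unspecified. */
int same_address(const char *a, const char *b)
{
    return a == b;
}

/* Compares the characters the two pointers point to. */
int same_contents(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Returns the character d[0] points to. */
char first_char_of_hi(void)
{
    const char *d[] = {"hi", "bye"};
    return *d[0];      /* d[0] holds the address of the 'h' in "hi" */
}
```

A local array such as char buf[] = "hi" is a distinct object, so same_address(buf, "hi") is always false while same_contents(buf, "hi") is true.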
I am learning about pointers. I don't understand the difference between these two char variables.
#include <stdio.h>
int main() {
char *cards = "JQK";
char cards2[] = "JQK";
printf("%c\n", cards[2]);
printf("%c\n", cards2[2]);
}
I experimented with them in printf() and they seem to work the same way, except that cards2 can't be reassigned while cards can. Why is this the case?
The difference is that the second is an array object initialized with the contents of the string literal. The first is a char* that contains the address of the string literal. String literals are arrays; the array is converted into a pointer to its first element, and that pointer is assigned to the char*.
The thing is, arrays are non-modifiable lvalues: an array cannot appear on the left side of the = assignment operator. A pointer (not marked const) can. And since the pointer is pointing to a string literal, you shouldn't try to modify what it points to. Doing so invokes undefined behavior.
Pointers and arrays are not the same thing; arrays decay into pointers in most cases. That is what happened here with the string literal on the right-hand side of the pointer initialization. The second case is different: the standard explicitly says that the contents of the string literal are copied into the declared array, which is why that array (here cards2) is modifiable, unlike in the previous case.
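The two behaviors described above can be sketched as follows; reassign_pointer and modify_array are hypothetical helper names, and the illegal array assignment is left commented out because it would not compile.

```c
#include <string.h>

/* Hypothetical helpers contrasting the two declarations in the question. */

/* A pointer variable may be reassigned to point at another literal. */
char reassign_pointer(void)
{
    char *cards = "JQK";
    cards = "A23";           /* OK: cards now holds another address */
    return cards[0];         /* reading through it is fine */
}

/* An array cannot be assigned, but its elements can be changed. */
void modify_array(char out[4])
{
    char cards2[] = "JQK";   /* cards2 is a 4-byte copy of the literal */
    /* cards2 = "A23";          error: arrays are not assignable */
    cards2[0] = 'A';         /* OK: writes to the copy, not the literal */
    memcpy(out, cards2, sizeof cards2);
}
```

Writing cards[0] = 'A' instead would compile but attempt to modify the literal, which is the undefined behavior mentioned above.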
To clarify a bit, let's first look at what is going on and at the difference between an array and a pointer.
I said above that string literals are arrays. From §6.4.5¶6:
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals.78) The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence. For UTF-8 string literals, the array elements have type char, and are initialized with the characters of the multibyte character sequence, as encoded in UTF-8.
This is what the C11 standard says; I quote it to show you that string literals are indeed character arrays of static storage duration. Now the question is what happens when we write
char *cards = "JQK";
So now we have an array on the right-hand side of a pointer declaration. What will happen?
Now comes the concept of array decay. In most cases an array is converted into a pointer to its first element. This may seem strange at first, but this is what happens. For example,
int a[]={1,2,3};
Now if you write a[i], this is equivalent to *(a+i), where a is the decayed pointer to the first element of the array. You are asking to go to the position a+i and give the value stored at that address, which is 3 for i=2. The same thing happened here.
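A small sketch of this equivalence, using the array from the example; subscript_forms_agree and element_at are hypothetical names. Since a[i] is defined as *((a)+(i)) and addition commutes, the unusual spelling i[a] means the same thing.

```c
#include <stddef.h>

/* Checks that a[i], *(a + i) and i[a] all read the same element. */
int subscript_forms_agree(size_t i)
{
    int a[] = {1, 2, 3};
    if (i >= sizeof a / sizeof a[0])
        return 0;                      /* stay within the array */
    return a[i] == *(a + i) && *(a + i) == i[a];
}

/* Reads an element through the decayed pointer; caller keeps i < 3. */
int element_at(size_t i)
{
    int a[] = {1, 2, 3};
    return *(a + i);                   /* a decays to &a[0] */
}
```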
The pointer to the first element of the literal array is assigned to cards. So cards points to the first element of the literal array, which is 'J'.
The story doesn't end here. The standard also restricts what you may do with a string literal: you are not supposed to change it. If you try, the behavior is undefined, which, as the name implies, means the standard does not define what happens. You should avoid it. So what does that mean? You shouldn't try to change the string literal, or whatever cards points to. Most implementations put string literals in a read-only section, so trying to write to one will typically fail at run time (for example, with a segmentation fault).
That being said, what happened in the second case? The only difference is that this time we write
char cards2[] = "JQK";
Now cards2 is an array object, and to the right of the = in its initializer there is a string literal again. What will happen? From §6.7.9¶14:
An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
This means you can put a string literal on the right side of the = when initializing a char array, and the array will be initialized with the literal's contents. That is what is done here. The result is modifiable, so you can change it, unlike in the previous case. That is the key difference.
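One observable consequence of the copy, sketched under the assumption that both declarations sit inside a function (pointer_size, array_size and copy_is_independent are hypothetical names): sizeof reports the pointer's size for cards but the full array, terminator included, for cards2.

```c
#include <stddef.h>

/* cards only holds an address. */
size_t pointer_size(void)
{
    char *cards = "JQK";
    return sizeof cards;        /* size of a char *, not of the string */
}

/* cards2 holds its own copy of 'J', 'Q', 'K', '\0'. */
size_t array_size(void)
{
    char cards2[] = "JQK";
    return sizeof cards2;       /* 3 characters + terminating '\0' = 4 */
}

/* Changing the copy leaves the literal itself untouched. */
int copy_is_independent(void)
{
    char cards2[] = "JQK";
    cards2[0] = 'A';            /* modifies cards2 only */
    return cards2[0] == 'A' && "JQK"[0] == 'J';
}
```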
Also, if you are curious whether there are cases where an array is treated as an array and not as a pointer, the whole rule is stated here.
From §6.3.2.1¶3
Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type 'array of type' is converted to an expression with type 'pointer to type' that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
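To illustrate the & exception from this rule, here is a sketch (whole_array_step and element_step are hypothetical names): &a keeps the array type and yields a pointer to the whole array, int (*)[3], so stepping it by one moves past all three elements, while the decayed a moves by one int.

```c
#include <stddef.h>

/* &a does not decay: pa points to the whole array. */
ptrdiff_t whole_array_step(void)
{
    int a[3] = {1, 2, 3};
    int (*pa)[3] = &a;                       /* pointer to array of 3 int */
    return (char *)(pa + 1) - (char *)pa;    /* equals sizeof a */
}

/* a decays to &a[0], a pointer to a single int. */
ptrdiff_t element_step(void)
{
    int a[3] = {1, 2, 3};
    int *p = a;
    return (char *)(p + 1) - (char *)p;      /* equals sizeof(int) */
}
```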
That is all there is to it. I hope this gives you a clearer idea than before.
From C in a Nutshell:
In most cases, the compiler implicitly converts an expression
with an array type, such as the name of an array, into a pointer to
the array’s first element.
The array expression is not converted into a pointer only in the
following cases:
• When the array is the operand of the sizeof operator
• When the array is the operand of the address operator &
• When a string literal is used to initialize an array of char,
wchar_t, char16_t, or char32_t
Could you explain what the last bullet means with some positive and
negative examples? I don't find an example in the book for the last
bullet.
Also, why does this apply only to arrays of characters and not to arrays of other element types?
char *ptr = "Hello OP!!";
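ptr is a pointer to the first char of the string literal, which is typically stored in a read-only data segment (RODATA). When you dereference it you can only read values, not write them: in C a string literal has type char[N] rather than const char[N], but modifying it is undefined behavior.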
char arr[] = "Hello OP!! How are you my friend?";
In this case:
Space is allocated for the arr array, with the length of the string literal including the terminating null character.
The string literal is copied into the space allocated for arr.
So arr is the place in memory where the string literal is copied.
You can both read and write the elements of arr.
And now, answering the question:
sizeof of an array is the size in bytes of all the array elements. If the array were converted to a pointer, the result would be the size of the pointer, which is obviously wrong in this case.
An array is just the contiguous space in memory that holds all its elements, so the address of the array is always the address of that memory location.
The third case I have explained above.
You can see the code here:
https://godbolt.org/g/xVL5cR
** Note to TIM ** String literals used this way are not converted to anything. A string literal is simply stored as a char (wchar_t, ...) array with a NUL (not NULL) terminator at the end, typically in read-only memory.
Why only arrays of characters and not other element types?
It's because string literals have static storage duration, and thus exist in memory for the whole life of the program.
Attempting to modify a string literal (through a pointer to the literal) results in undefined behavior: literals may be stored in read-only storage (such as .rodata) or combined with other string literals.
No other kind of constant is stored like this, which is why this applies only to arrays of characters (string literals).
Could you explain what the last bullet means with some positive and
negative examples? I don't find an example in the book for the last
bullet.
String literal initialization looks like this:
#include <wchar.h>   // wchar_t
#include <uchar.h>   // char16_t, char32_t
char     s1[] = "Hello world!";   // This is char[]
wchar_t  s2[] = L"Hello world!";  // This is wchar_t[]
char     s3[] = u8"Hello world!"; // This is char[]
char16_t s4[] = u"Hello world!";  // This is char16_t[]
char32_t s5[] = U"Hello world!";  // This is char32_t[]
The contents of the string literal are copied into the array (which has automatic storage duration when declared inside a function), and it's possible to modify that array.
While
char ptr[] = {'H','e','l','l','o',' ','w','o','r','l','d','\0'};
does not use a string literal at all: it is an ordinary brace-enclosed initializer list, so no array with static storage duration is involved.
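A sketch of positive and negative cases for the last bullet, assuming the declarations sit inside functions (initializers_match, no_room_for_terminator and decayed_size are hypothetical names). The explicit-size case relies on the "if there is room" wording of §6.7.9¶14 quoted earlier.

```c
#include <string.h>

/* Positive: a string literal initializes a char array and is copied
   into it; the brace-enclosed list gives exactly the same bytes. */
int initializers_match(void)
{
    char from_literal[] = "Hello world";
    char from_braces[]  = {'H','e','l','l','o',' ','w','o','r','l','d','\0'};
    return sizeof from_literal == sizeof from_braces
        && memcmp(from_literal, from_braces, sizeof from_literal) == 0;
}

/* With an explicit size and no room left, the '\0' is not stored
   (valid C, but abc is then not a null-terminated string). */
size_t no_room_for_terminator(char out[3])
{
    char abc[3] = "abc";
    memcpy(out, abc, sizeof abc);
    return sizeof abc;
}

/* Negative: outside an array initializer the literal decays. */
size_t decayed_size(void)
{
    char *p = "abc";
    /* int n[] = "abc";   error: a plain string literal cannot
       initialize an array of int */
    return sizeof p;    /* size of a char *, not of the string */
}
```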
Given the source code of strcpy():
char * strcpy(char *s1, const char *s2)
{
char *s = s1;
while ((*s++ = *s2++) != 0);
return (s1);
}
Why does passing the second argument work, and how does it look in memory, given that I do not pass a pointer to the function?
char dest[100];
strcpy(dest, "HelloWorld");
This works because:
For dest: arrays passed as function arguments decay to the address of their first element. That's a pointer.
So, a call like
strcpy(dest, "HelloWorld");
is the same as
strcpy(&dest[0], "HelloWorld");
For "HelloWorld": a string literal has an array type, here char[11]. So it, too, decays and gives you the address of its first element.
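The copying loop from the question can be exercised with a sketch like this one. It is renamed to the hypothetical my_strcpy, because defining your own function called strcpy is undefined behavior (the standard library reserves that name); both arguments arrive in the function as plain pointers.

```c
/* Hypothetical rename of the strcpy loop shown in the question. */
char *my_strcpy(char *s1, const char *s2)
{
    char *s = s1;
    while ((*s++ = *s2++) != '\0')
        ;                /* copies up to and including the '\0' */
    return s1;
}
```

A call such as my_strcpy(dest, "HelloWorld") with char dest[100] passes &dest[0] and the address of the literal's 'H', and returns dest.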
In C, string literals have character array types. From the C Standard (6.4.5 String literals):
6 In translation phase 7, a byte or code of value zero is appended to
each multibyte character sequence that results from a string literal
or literals.78) The multibyte character sequence is then used to
initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals,
the array elements have type char, and are initialized with the
individual bytes of the multibyte character sequence.
Also, with rare exceptions, arrays are converted to pointers in expressions. From the C Standard, 6.3.2.1 Lvalues, arrays, and function designators:
3 Except when it is the operand of the sizeof operator or the unary &
operator, or is a string literal used to initialize an array, an
expression that has type ‘‘array of type’’ is converted to an
expression with type ‘‘pointer to type’’ that points to the initial
element of the array object and is not an lvalue. If the array object
has register storage class, the behavior is undefined.
Thus in this call
strcpy(dest, "HelloWorld");
the string literal has type char[11], which is converted to a value of type char * equal to the address of the first character of the string literal.
You could also write for example
strcpy(dest, &"HelloWorld"[0]);
Or even :)
strcpy(dest, &0["HelloWorld"]);
or:)
strcpy(dest, &*"HelloWorld");
All three expressions yield the address of the initial element of the string literal and have type char *.
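A sketch checking the claim, under the assumption that comparing the pointed-to characters is enough (each occurrence of "HelloWorld" may be a distinct array, so addresses are not compared); all_forms_point_to_H and literal_size are hypothetical names. The sizeof case shows that the literal itself really is a char[11].

```c
#include <string.h>

/* Every spelling yields a char * to an initial 'H'. */
int all_forms_point_to_H(void)
{
    const char *p1 = "HelloWorld";       /* the array decays to &[0] */
    const char *p2 = &"HelloWorld"[0];
    const char *p3 = &0["HelloWorld"];   /* 0["..."] is *(0 + "...") */
    const char *p4 = &*"HelloWorld";
    return *p1 == 'H' && *p2 == 'H' && *p3 == 'H' && *p4 == 'H'
        && strcmp(p2, p3) == 0 && strcmp(p3, p4) == 0;
}

/* As the operand of sizeof, the literal does not decay. */
size_t literal_size(void)
{
    return sizeof "HelloWorld";          /* 10 characters + '\0' = 11 */
}
```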
Take into account that it is unspecified (and in practice usually controlled by compiler options) whether
"HelloWorld" == "HelloWorld"
evaluates to true, that is, whether the compiler allocates separate extents of memory for identical string literals or stores only one copy of them.
In this expression the addresses of the first characters of the string literals are compared.
If you write
strcmp( "HelloWorld", "HelloWorld" )
then the result will be 0, that is, the string literals are equal to each other (they contain the same sequence of characters).
strcpy(dest, "HelloWorld");
dest, being an array of char, decays to a pointer to its first element when passed to the function, and is accessed through that pointer there.
And "HelloWorld", being a string literal, is of type char[11] (but must not be modified), so it likewise decays to a char * and is a correct argument to pass.
There are some basic concepts I got confused about while reading a C book recently.
It says: A variable that points to a string literal can’t be used to change the contents of the string.
As far as I know, there are also character literals and integer literals. What about them? Can they also not be updated? If so, can you give an example?
Besides, what's the difference between a literal and an array? For example, are a character array and a string literal actually the same thing?
What should I call the variable below? An integer array? An integer literal?
int contestants[] = {1, 2, 3};
I've worked out some examples, but I'm still somewhat confused:
char s[] = "How big is it?"; //s is an array variable
char *t = s; //t is a pointer variable to a string literal "How big is it?"
string literal:"ABC"
Character literal:'a'
Integer literal:1
I'm confused by these four terms: character, array, string, literal.
Are a character array and a string literal the same thing?
Are an array of characters and an array literal the same?
A literal is a token in a program text that denotes a value. There are string literals like "123", character literals like 'a' and numeric literals like 7.
int contestants[] = {1, 2, 3};
In the program fragment above there are three literals 1 2 and 3 and no others. In particular, neither contestants nor {1, 2, 3} are literals.
It is worth noting that the C standard uses the word literal only in reference to string literals. The other kinds are officially known as constants. But you may find them referred to as literals in all kinds of places, so I have included them here. "Integer literal" and "integer constant" are the same thing.
A string literal is also an object (a piece of data, a region of storage) in a program that is associated with a string literal in the previous sense. This piece of data is a character array. Not every character array is a string literal. No int array is a literal.
A pointer can point to a string literal, but not to a character literal or an integer literal, because the latter two kinds are not objects (they have no storage associated with them). A pointer can only point to an object. You cannot point a pointer to a literal 5. So the question of whether such things can be modified does not arise.
char* p = "123";
In the program fragment above, "123" is a literal and p points to it. You must not modify the object pointed to by p; attempting to do so is undefined behavior.
char a[] = "123";
In the program fragment above, a is a character array. It is initialized with a string literal "123", but it is not a literal itself and can be modified freely.
int i = 5;
Above, 5 is a literal and i is not. i is initialized with a literal, but it isn't one itself.
int k[] = {1, 2, 3};
int* kp = k;
In the lines above, much like in the one before them, neither the array k nor its elements are literals. They are merely initialized with literals. kp is a pointer that points to the first element of the array. One can update the array through this pointer: kp[1] = 3;
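The last fragment can be sketched as a complete function (update_through_pointer is a hypothetical name); the commented-out line shows why a pointer cannot be aimed at a literal such as 5.

```c
/* k is an object initialized from the literals 1, 2 and 3, so it can
   be updated through a pointer to its first element.
   int *bad = &5;   error: 5 is not an object, so it has no address */
int update_through_pointer(void)
{
    int k[] = {1, 2, 3};
    int *kp = k;        /* k decays to &k[0] */
    kp[1] = 3;          /* same effect as k[1] = 3 */
    return k[1];
}
```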
Strings:
char s[] = "How big is it?"; //array variable
Here s is an array holding the string, and it can be both read and written. You can modify the values of the array. The array's size is computed from the string literal "How big is it?": its length plus one for the terminating null character.
What is a string literal?
char *p = "someString";
Here the string "someString" is typically stored in a read-only location, and the address of its first character is assigned to your pointer. So you must not write to the location the pointer is pointing to.
Integers:
int a[] = {1,2,3};
a is an array that holds values, and they can be both read and modified.
In the code
int i;
for(i=0;i<10;i++)
10 is an integer literal: it represents a decimal value and is written directly in the code.
One more example is
int b;
b=1;
Now 1 is an integer literal.
A literal is a syntactic form that directly represents a value in a programming language. Thus, 1 + 64 is an expression that evaluates to 65 and is not a literal; x after int x = 65 also evaluates to 65 but is not a literal. 65 is a literal that represents 65, and 0x41 is the same; 65L is also a literal that represents a "long integer" version of 65. 'A' is another literal that also represents the number 65 (on ASCII-based systems), this time written as a character constant; note that in C it has type int, not char. "ABC" is a string literal that, when put into code, represents a four-element array of characters filled with the values 65, 66, 67 and 0. You could also use the compound literal (char[]){ 65, 66, 67, 0 }, and it would represent the same value, since strings are arrays of characters.
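A sketch of these equivalences, assuming an ASCII-based execution character set so that 'A' is 65 (same_value_spellings, string_vs_compound_literal and char_constant_size are hypothetical names). The last function shows that a C character constant has type int.

```c
#include <string.h>

/* Different literal spellings of the value 65 (assumes ASCII). */
int same_value_spellings(void)
{
    return 65 == 0x41 && 65L == 65 && 'A' == 65;
}

/* A string literal and a compound literal with the same elements. */
int string_vs_compound_literal(void)
{
    const char *c = (char[]){ 65, 66, 67, 0 };
    return sizeof "ABC" == sizeof ((char[]){ 65, 66, 67, 0 })
        && memcmp("ABC", c, 4) == 0;
}

/* In C a character constant has type int (in C++ it is char). */
size_t char_constant_size(void)
{
    return sizeof 'A';
}
```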
Meanwhile, an array is a data structure that can contain multiple values, each indexed by an integer. Arrays can have literal syntax (such as the compound literal above, or array literals in languages like JavaScript), and literals can denote arrays; but the two are apples and oranges.
tl;dr: arrays are a specific kind of data structure; literal is how you write data in code.
For int contestants[] = {1, 2, 3};
contestants is an array of 3 items of type int, initialized by the 3 literals 1, 2 and 3.
So literals are particular values written in the code, and you should not confuse the terms literal (a value of some type) and array (which also has a type, but is a data structure).
Concerning your example with strings
char s[] = "How big is it?"; //array variable
char *t = s; //pointer variable to a string literal
I understand t as a pointer to the first element of the array s, which was initialized with a string literal; t does not point to the string literal itself.
To start with, let's see some definitions.
Array
An array is an ordered data structure consisting of a collection of elements (values or variables), each identified by one (single-dimensional array, or vector) or multiple indexes. The elements are stored in contiguous memory locations.
String Literals [From C11 standard, chapter 6.4.5]
A character string literal is a sequence of zero or more multibyte characters enclosed in
double-quotes, as in "xyz".
Integer constants [From C11 standard, chapter 6.4.4.1]
An integer constant begins with a digit, but has no period or exponent part. It may have a
prefix that specifies its base and a suffix that specifies its type.
So, int x = 5;, here 5 is an integer constant.
Character constants [From C11 standard, chapter 6.4.4.4]
An integer character constant is a sequence of one or more multibyte characters enclosed
in single-quotes, as in 'x'.
So, char y = 'S';, here 'S' is a character constant.
Now,
int contestants[] = {1, 2, 3};
contestants here is an integer array.
char s[] = "How big is it?";
s is a character array; being null-terminated, it can also be referred to as a string. OTOH, "How big is it?" is an unnamed string literal. We're initializing the contents of s using the string literal. s is in read-write memory and its contents are modifiable.
char *p = "Hello World";
p is a pointer to the string literal "Hello World". The literal is usually stored in a read-only memory location, and modifying it is not allowed (it is undefined behavior).
The following is the rule for when an array decays to a pointer:
An lvalue [see question 2.5] of type array-of-T which appears in an expression decays (with three exceptions) into a pointer to its first element; the type of the resultant pointer is pointer-to-T.
(The exceptions are when the array is the operand of a sizeof or & operator, or is a literal string initializer for a character array.)
How to understand the case when the array is "literal string initializer for a character array"? Some example please.
Thanks!
The three exceptions where an array does not decay into a pointer are the following:
Exception 1. — When the array is the operand of sizeof.
#include <stdio.h>
int main()
{
int a[10];
printf("%zu", sizeof(a)); /* prints 10 * sizeof(int) */
int* p = a;
printf("%zu", sizeof(p)); /* prints sizeof(int*) */
}
Exception 2. — When the array is the operand of the & operator.
#include <stdio.h>
int main()
{
int a[10];
printf("%p", (void*)(&a)); /* prints the array's address */
int* p = a;
printf("%p", (void*)(&p)); /*prints the pointer's address */
}
Exception 3. — When the array is initialized with a literal string.
int main()
{
char a[] = "Hello world"; /* the contents of the literal are copied into a local array, which is destroyed when a goes out of scope */
char* p = "Hello world"; /* the literal decays: p points to its first character, which typically lives in read-only memory (any attempt to modify it is undefined behavior) */
}
Assume the declarations
char foo[] = "This is a test";
char *bar = "This is a test";
In both cases, the type of the string literal "This is a test" is "15-element array of char". Under most circumstances, array expressions are implicitly converted from type "N-element array of T" to "pointer to T", and the expression evaluates to the address of the first element of the array. In the declaration for bar, that's exactly what happens.
In the declaration for foo, however, the expression is being used to initialize the contents of another array, and is therefore not converted to a pointer type; instead, the contents of the string literal are copied to foo.
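The two declarations can be checked with a short sketch, assuming they appear inside a function (literal_elements and foo_is_modifiable_copy are hypothetical names): foo gets its own modifiable 15-byte copy, while bar only holds the address of the literal.

```c
#include <stddef.h>

/* The literal itself: 14 characters + '\0' = 15 elements. */
size_t literal_elements(void)
{
    return sizeof "This is a test";
}

int foo_is_modifiable_copy(void)
{
    char foo[] = "This is a test";   /* contents copied into foo */
    char *bar = "This is a test";    /* bar = address of the literal's 'T' */
    foo[0] = 't';                    /* fine: foo is its own array */
    /* bar[0] = 't';   would be undefined behavior */
    return sizeof foo == 15 && sizeof bar == sizeof(char *)
        && foo[0] == 't' && bar[0] == 'T';
}
```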
This is a literal string initializer for a character array:
char arr[] = "literal string initializer";
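Compare it with the following, which is not a character array initializer: here the literal decays to a pointer to its first element, and that pointer initializes str: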
char* str = "literal string initializer";
Definition from K&R2:
A string literal, also called a string
constant, is a sequence of characters
surrounded by double quotes as in
"...". A string has type ``array of
characters'' and storage class static
(see Par.A.3 below) and is initialized
with the given characters. Whether
identical string literals are distinct
is implementation-defined, and the
behavior of a program that attempts to
alter a string literal is undefined.
It seems like you pulled that quote from the comp.lang.c FAQ (maybe an old version or maybe the printed version; it doesn't quite match the current online version):
http://c-faq.com/aryptr/aryptrequiv.html
The corresponding section links to other sections of the FAQ to elaborate on those exceptions. In your case, you should look at:
http://c-faq.com/decl/strlitinit.html