Use of pointers to store character strings - c

I started learning pointers in C. I understood it fine until I came across the topic "Using Pointers to store character arrays".
A sample program to highlight my doubt is as follows
#include <stdio.h>
int main(void)
{
char *string;
string = "good";
printf ("%s", string);
}
This prints the character string, i.e, good.
Pointers are supposed to store memory addresses, or in other words, we assign the address of a variable (using the address operator) to a pointer variable.
What I don't understand is how are we able to assign a character string directly to the pointer? That too without address operator?
Also, how are we able to print the string without the indirection operator (*) ?

A literal string like "good" is really stored as a (read-only) array of characters. Also, all strings in C must be terminated with a special "null" character '\0'.
When you do the assignment
string = "good";
what is really happening is that you make string point to the first character in that array.
Functions that handle strings know how to deal with pointers like that: they loop over the array through the pointer, reading characters until they find the terminator.
Looking at it a little differently, the compiler creates its own array
char internal_array[] = { 'g', 'o', 'o', 'd', '\0' };
then you make string point to the first element in the array
string = &internal_array[0];
Note that &internal_array[0] is actually equal to internal_array, since arrays naturally decay to pointers to their first element.

"cccccc" is a string literal which is actually the char array stored in the ReadOnly memory. You assign the pointer to the address of the first character of this literal.
If you want a copy of the string literal in writable memory, initialize an array with it:
char string[] = "fgdfdfgdfgf";
Bear in mind that the array initialization (when you declare it) is the only place where you can use = to copy a string literal into a char array (string).
In any other circumstances you need to use an appropriate library function, for example:
strcpy(string, "asdf");
(string must have enough space to hold the new string, including its terminating null character)

What I don't understand is how are we able to assign a character string directly to the pointer? That too without address operator?
When an array is assigned to something, the array is converted to a pointer.
"good" is a string literal. It has type array 5 of char, which includes a trailing null character. It exists in memory that should not be written to. Attempting to write is undefined behavior (UB). It might "work", it might not. Code may die, etc.
char *string; declares string as a pointer to char.
string = "good"; causes an assignment. The operation takes "good" and converts that array to the address and type (char *) of its first element 'g'. It then assigns that char * to string.
Also, how are we able to print the string without the indirection operator (*) ?
printf() expects a char * - which matches the type of string.
printf ("%s", string); passes string to printf() as a char * - no conversion is made. printf ("%s",... expects to see a "... the argument shall be a pointer to the initial element of an array of character type." then "Characters from the array are written up to (but not including) the terminating null character." C11 §7.21.6.1 8.

Your first question:
What I don't understand is how are we able to assign a character string directly to the pointer? That too without address operator?
A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, e.g. "good".
From C Standard#6.4.5 [String literals]:
...The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.....
In C, an expression that has type array of type is converted to an expression with type pointer to type that points to the initial element of the array object [there are a few exceptions]. Hence, the string literal, which is an array, decays into a pointer that can be assigned to a variable of type char *.
In the statement:
string = "good";
string will point to the initial character in the array where "good" is stored.
Your second question:
Also, how are we able to print the string without the indirection operator (*) ?
From printf():
s
writes a character string
The argument must be a pointer to the initial element of an array of characters...
So, the format specifier %s expects a pointer to the initial element, which is what the variable string is - a pointer to the initial character of "good". Hence, you don't need the indirection operator (*).

Related

Why de-referencing is not used in case of printing of a string?

How we use a string name in C to print a string without dereferencing it?
Like here:
char str[];
printf("%s",str);
But in case of arrays we use dereferencing by square brackets[] to print an array:
int a[10];
for(i=0;i<10;i++)
printf("%d",a[i]);
What's the difference between printing a string and an array? We need to dereference an array with square brackets [], but for a string, printf() prints the value of the string, not its address, without any dereferencing.
Because that is what the interface to printf dictates.
In case of a "%s" it expects a pointer to a NUL terminated string.
In case of a "%d" it expects an int.
It all depends on what the function (printf or else) expects.
When you use printf with %s, like
printf("%s",str);
printf() wants "a string" which, in C, is the same as "an array of [NUL-terminated] char". So, you pass the one thing that actually identifies the whole array: its name (which decays to the address of the first element).
The same applies to every function that expects an array as argument.
If, to the function you call, you must pass something else, you use different approaches. When printf is used to print an integer, like
printf("%d",a[i]);
you must pass an integer; and an array is not an integer, so you must dereference (or "select") an integer from the array, using the correct notation/mechanism. But things are not different if you want to use a member of a struct, for example. You can not pass printf() a struct, so you use the dot to select a [correct] member of the struct. And the same applies to pointers to integer, you dereference them when you want the value and not the pointer itself. And so on for functions... like
printf("%d", (int) floor(PI));
In the above line, again, if printf wants an integer, you must give it an integer, so to use PI (3.14) you must convert it to an int.
For starters, a code snippet such as
char str[5] = "Hello";
printf("%s",str);
can result in undefined behavior.
The format string "%s" expects the argument to be a pointer to the first character of a string.
In C a string is defined as a sequence of characters terminated by the zero character '\0'. That is, a character array that contains a string holds the sentinel value '\0'.
So in fact such a call of printf
printf("%s",str);
logically is similar to the following code
while ( *str != '\0' ) putchar( *str++ );
In the example above
char str[5] = "Hello";
the character array str does not contain a string, because its five elements were initialized with the characters of the string literal "Hello" excluding its terminating zero character (there is not enough space in the array to accommodate the sixth character '\0' of the string literal).
So such a call
printf("%s",str);
for the array declared above will try to output anything that follows the array in the memory.
To avoid the undefined behavior you should declare the array for example like
char str[6] = "Hello";
or like
char str[] = "Hello";
Arrays of other fundamental types in most cases have no standard sentinel value. Unlike for strings, no such general sentinel value can be defined for, say, integer or float arrays.
So character arrays that contain strings are an exception among arrays. A string always ends with the terminating zero character '\0', by the definition of a string in C. That does not hold for other kinds of arrays.
Hence, to output an array of any other kind you generally need a loop and need to know how many elements to output.

Why does a string not return its memory location?

They say a string is naturally an array of characters which represents the memory location of its first character: string == &string[0]. That is why a user does not have to specify an ampersand before a string variable when they want the program to read data into this variable:
scanf("%s", string);
But then there is a reasonable question: why don't we see the string's memory location when printing string out?
printf("%s", string);
But then there is a reasonable question: why don't we see &string[0] when printing string out?
printf("%s", string);
I assume you mean that you would expect to see the address printed instead of the content.
This is what the format specifier is for.
You pass an address of the first character to printf and because there is a %s format specifier for that parameter, the function knows that it has to take the address, dereference it and print any characters until a 0 is found in memory.
If you had provided a %p format specifier, you would see the address printed.
In this case you need to cast the address to void* to be fully standard compliant.
They say a string is naturally an array of characters which represents the memory location of its first character: string == &string[0]
Two small clarifications here.
A string is technically not an array of characters. It is defined as "a contiguous sequence of code units terminated by the first zero code unit". Or, as RobertS said in the comments: "A string can be (and high-probably is) stored in an array, but a string is not an array. A string is just a sequence of characters terminated by the null character. A string isn't an array and an array isn't a string."
If string is an array and not a pointer, then string and &string[0] have different types: string is of type char[] and &string[0] is of type char*. However, if you do the comparison string == &string[0], then string will decay to char* and the expression will evaluate to true.
Related:
Is an array name a pointer?
Can it cause problems to pass the address to an array instead of the array?

Pointers and Arrays in c, what is the difference? [duplicate]

I am learning about pointers. I don't understand what the difference between the char variables is.
#include <stdio.h>
int main() {
char *cards = "JQK";
char cards2[] = "JQK";
printf("%c\n", cards[2]);
printf("%c\n", cards2[2]);
}
I experimented with them in the printf() and they seem to be working the same way, except that cards2[] can't be reassigned while cards can. Why is this the case?
The difference is that the second is an array object initialized with the contents of the string literal, while the first is a char * that holds the address of the string literal. A string literal is an array; that array is converted to a pointer to its first element, and the pointer is then assigned to the char *.
Arrays are non-modifiable lvalues: an array cannot appear on the left side of the = assignment operator. A pointer (one not marked const) can. And because the pointer points to a string literal, you shouldn't try to modify what it points to. Doing so invokes undefined behavior.
Pointers and arrays are not the same thing, although arrays decay into pointers in most contexts. That is what happened to the string literal on the right-hand side of the pointer initialization. The second case is different: the initialization copies the contents of the string literal into the declared array (cards2 here), which is why that array is modifiable, unlike the literal in the previous case.
To clarify a bit, let's first look at what is going on and how an array differs from a pointer.
Here I have said that string literals are arrays. From §6.4.5¶6
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals.78) The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence. For UTF-8 string literals, the array elements have type char, and are initialized with the characters of the multibyte character sequence, as encoded in UTF-8.
This is what the C11 standard says; it is quoted to show that a string literal is indeed a character array of static storage duration. Now the question is what happens when we write
char *cards = "JQK";
So now we have an array on the right-hand side of a pointer declaration. What will happen?
Now comes the concept of array decay. In most cases an array is converted into a pointer to its first element. This may seem strange at first, but this is what happens. For example,
int a[]={1,2,3};
Now if you write a[i], this is equivalent to *(a+i), where a is the decayed pointer to the first element of the array. You are asking to go to the position a+i and fetch the value at that address, which is 3 for i=2. The same thing happens here.
The pointer to the first element of the literal's array is assigned to cards. What happens next? cards points to the first element of that array, which is J.
The story doesn't end here. The standard imposes a restriction on string literals: you are not supposed to change them. If you try, the behavior is undefined, which, as the name implies, means the standard does not define what happens. You should avoid it. So you shouldn't try to change the string literal, or whatever cards points to. Most implementations put string literals in a read-only section, so trying to write to one is an error.
That said, what happens in the second case? This time we write
char cards2[] = "JQK";
Now cards2 is an array object, and to the right of the = there is a string literal again. What will happen? From §6.7.9¶14
An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
This means you can use a string literal as the initializer of a char array. The array is initialized with a copy of the literal's characters, which is what happens here. The array is modifiable, so you can change it, unlike the previous case. That is the key difference.
Also, if you are curious whether there are cases where an array is treated as an array and not as a pointer, the whole rule is stated here.
From §6.3.2.1¶3
Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type 'array of type' is converted to an expression with type 'pointer to type' that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
That is all there is to it. I hope this gives you a clearer idea than you had before.

Why does direct passage of string to printf correctly works?

I know that in C both of these works:
char* string = "foo";
printf("string value: %s", string);
and more simply:
printf("string value: %s", "foo");
But I was asking myself why.
I know that the %s specifier expects the argument to be a char*, and string actually is one (and it will be the same with an array of characters, because these two data types are pretty much the same in C)
but when I pass directly a string to printf shouldn't it be different? I mean "foo" is not a pointer anymore... Right?.
The string constant "foo" has type char[4]. When passed to a function, the array decays to a pointer, i.e. char *. So you can pass it to a function that expects a char *.
For the same reason, you can also pass a variable of this type:
char string[4] = "foo";
printf("string value: %s", string);
The "foo" is a string literal. It represents an unnamed array object with static storage duration of type char[4] (that is, without a const qualifier). As a function argument it is handled just like any "normal" array: it is converted to a pointer to its first element, and that pointer is passed by value.
Even though the array is not const, you are not allowed to modify its values. Such modification results in undefined behavior:
char* string = "foo";
string[0] = 'b'; // wrong, this invokes UB
The array has four elements because of the trailing null character '\0', sometimes referred to as the NUL character. Please don't confuse it with NULL, which is a different thing. The purpose of that character is to terminate the given string literal.
The function's parameter receives a pointer to char, as the array object is converted into a pointer to the array's first element (i.e. a pointer to the first character in the array). To be precise, the pointer is passed by value: the function receives a copy of the address.
In C, all strings are null-terminated char arrays, so your example behaves in exactly the same way.
The ISO C standard, section 7.1.1, defines a string this way:
A string is a contiguous sequence of characters terminated by and
including the first null character.
What printf() gets, is a pointer:
ISO/IEC 9899:TC3, 6.5.2.2 – 4:
An argument may be an expression of any object type. In preparing for the call to a function, the arguments are evaluated, and each parameter is assigned the value of the corresponding argument.81)
81) A parameter declared to have array or function type is adjusted to have a pointer type as described in 6.9.1.
ISO/IEC 9899:TC3, 6.9.1 – 10:
On entry to the function, the size expressions of each variably modified parameter are evaluated and the value of each argument expression is converted to the type of the corresponding parameter as if by assignment. (Array expressions and function designators as arguments were converted to pointers before the call.)
"foo", in the end, is converted to a pointer to a statically allocated 4-byte memory region (likely marked read-only) that is initialized with the content 'f', 'o', 'o', '\0'.

How to work with strings and pointers in C?

Why is it p+=*string[2]-*string[1] instead of p+=string[2]-string[1] (without asterisks)?
res[0]=*p; its value is 'c' why? Why p moves on the word letters while *string[1] moves among the words and not letters?
Edit: I have edited the code. Can "char* p=string;" be replaced by "char p=*string[0];"?
The code's output is
c
#include <stdio.h>
int main (void)
{
char* strings[]={"abcdb","bbb","dddd"};
char*p=*strings;
char res[2];
p+=*string[2]-*string[1];
res[0]=*p;
p+=3;
res[1]=*p;
printf("%s\n",res);
return 0;
}
The program has undefined behaviour.
We will ignore the typo where the identifier string is sometimes used instead of strings, and assume that strings is used everywhere in the code snippet.
In this statement
char*p=*strings;
the first element of the array is assigned to the pointer. The first element of the array is a pointer to the first character of the string literal "abc". So p points to the character 'a' of the string literal.
In this statement
p += *strings[2] - *strings[1];
strings[2] is the third element of the array. It has type char * and its value is the address of the first character of the string literal "dddd". Dereferencing this pointer with *strings[2] gives the first character of this string literal, that is 'd'.
strings[1] is the second element of the array. It has type char * and its value is the address of the first character of the string literal "bbb". Dereferencing this pointer with *strings[1] gives the first character of this string literal, that is 'b'.
The difference between the internal codes of the characters 'd' and 'b' (for example, in ASCII the code of 'b' is 98 while the code of 'd' is 100) is equal to 2.
So this statement
p += *strings[2]-*strings[1];
increases the pointer by 2. At first it pointed to character 'a' of the first string literal "abc" and after increasing by 2 it points now to character 'c' of the same string literal "abc".
Thus in this statement
res[0] = *p;
character 'c' is assigned to res[0].
After this statement
p+=3;
the value of p becomes invalid because it now points beyond the string literal "abc", and the compiler is not required to place the string literal "bbb" exactly after "abc".
So dereferencing this pointer in the next statement
res[1]=*p;
results in undefined behaviour.
According to the C Standard
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is
evaluated
It just so happened that the compiler placed the string literals one after another in memory, though this is not guaranteed by the Standard.
So if after statement
res[1]=*p;
res[1] does not contain character '\0' then the next statement
printf("%s\n",res);
also has undefined behaviour.
First, there is nothing named string defined in your program, so your statement should be p+=*strings[2]-*strings[1]
The answer to both of your questions is dereferencing a pointer. You need to understand how pointers work with strings. Please check this link.
