I don't understand this use of pointers - c

I'm trying to understand this use of pointers. From what I realized so far the value pointers hold is a reference to the memory address of another entity, and when using the * sign we access the value of the entity referenced by the pointer.
However, in this code that I encountered in the tutorial I'm using, the ptr_str pointer has a string value, which is not a memory address, so I don't understand how *ptr_str (which I expected to be the value of a referenced entity) is used in the for loop.
char *ptr_str; int i;
ptr_str = "Assign a string to a pointer.";
for (i=0; *ptr_str; i++)
printf("%c", *ptr_str++);

This:
ptr_str = "Assign a string to a pointer.";
Is a shorthand for this:
// Somewhere else:
char real_str[] = {'A', 's', 's', 'i', 'g', ..., '.', '\0'};
// In your main():
ptr_str = real_str;
// or
ptr_str = &real_str[0];
In other words, a string literal like "Hello World" is actually a character array that the compiler stores for you, and using it in an expression like this gives you a pointer to its first character. This is all done transparently by the compiler, so it might be confusing at first sight.
If you're curious, take a look at this other answer of mine, where I explain this in more detail.

Related

differences in array initialization(char, string, other) regarding storage duration

In this question it was said in the comments:
char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; and char arr[10] = "Hello"; are strictly the same thing. – Michael Walz
This got me thinking.
I know that "Hello" is a string literal. String literals are stored with static storage duration and are immutable.
But if both are really the same, then char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; would also create a similar string literal.
Does char b[10]= {72, 101, 108, 108, 111, 0}; also create a "string" literal with static storage duration? Because theoretically it is the same thing.
char a = 'a'; is the same thing as char a; ...; a = 'a';, so your thoughts are correct 'a' is simply written to a
Are there differences between:
char a = 'a';
char a = {'a'};
How/where are the differences defined?
EDIT:
I see that I haven't made it clear enough that I am particularly interested in the memory usage/storage duration of the literals. I will leave the question as it is, but would like to make the emphasis of the question more clear in this edit.
I know that "Hello" is a string literal. String literals are stored with static storage duration and are immutable.
Yes, but string literals are also a grammatical item in the C language. char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; is not a string literal, it is an initializer list. The initializer list does however behave as if it has static storage duration, remaining elements after the explicit \0 are set to zero etc.
The initializer list itself is stored in some manner of ROM memory. If your variable arr has static storage duration too, it will get allocated in the .data segment and initialized from the ROM init list before the program is started. If arr has automatic storage duration (local), then it is initialized from ROM in run-time, when the function containing arr is called.
The ROM memory where the initializer list is stored may or may not be the same ROM memory as used for string literals. Often there's a segment called .rodata where these things end up, but they may as well end up in some other segment, such as the code segment .text.
Compilers like to store string literals in a particular memory segment, because that means that they can perform an optimization called "string pooling". Meaning that if you have the string literal "Hello" several times in your program, the compiler will use the same memory location for it. It may not necessarily do this same optimization for initializer lists.
Regarding 'a' versus {'a'} in an initializer list, that's just a syntax hiccup in the C language. C11 6.7.9/11:
The initializer for a scalar shall be a single expression, optionally enclosed in braces. The initial value of the object is that of the expression (after conversion); the same type constraints and conversions as for simple assignment apply, taking the type of the scalar to be the unqualified version of its declared type.
In plain English, this means that a "non-array" (scalar) can be initialized either with or without braces; the meaning is the same. Apart from that, the same rules as for regular assignment apply.
I know that "Hello" is a string literal. String literals are stored with static storage duration and are immutable.
Yes. But with char arr[10] = "Hello";, you are copying the string literal to an array arr and there's no need to "keep" the string literal. So if an implementation chooses to remove the string literal altogether after copying it to arr, that's totally valid.
But if both are really the same, then char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; would also create a similar string literal.
Again there's no need to make/store a string literal for this.
Only if you point directly at a string literal would it usually be stored somewhere, as in:
char *string = "Hello, world!\n";
Even then an implementation can choose not to do so under the "as-if" rule. E.g.,
#include <stdio.h>
#include <string.h>
static const char *str = "Hi";
int main(void)
{
char arr[10];
strcpy(arr, str);
puts(arr);
}
"Hi" can be eliminated because it's used only for copying into arr and isn't accessed directly anywhere. Eliminating the string literal (and the strcpy call too), as if you had written char arr[10] = "Hi";, wouldn't affect the observable behaviour.
Basically, the C standard doesn't require that a string literal be stored anywhere, as long as the properties associated with a string literal are satisfied.
Are there differences between: char a = 'a'; char a = {'a'}; How/where are the differences defined?
Yes. C11, 6.7.9 says:
The initializer for a scalar shall be a single expression, optionally enclosed in braces. [..]
Per the syntax, even:
char c = {'a',}; is valid and equivalent too (though I wouldn't recommend this :).
In the abstract machine, char arr[10] = "Hello"; means that arr is initialized by copying data from the string literal "Hello" which has its own existence elsewhere; whereas the other version just has initial values like any other variable -- there is no string literal involved.
However, the observable behaviour of both versions is identical: arr is created with the values as specified. This is what the other poster meant by the code being identical; according to the Standard, two programs are the same if they have the same observable behaviour. Compilers are allowed to generate the same assembly for both versions.
Your second question is entirely separate from the first; but char a = 'a'; and char a = {'a'}; are identical. A single initializer may optionally be enclosed in braces.
I believe your question is highly implementation dependent (hardware and compiler wise). However, in general, arrays are placed in RAM, whether they are global or not.
I know that "Hello" is a string literal. String literals are stored with static storage duration and are immutable.
Yes, this saves the string "Hello" in ROM (read-only memory). Your array is loaded with the literal at run time.
But if both are really the same, then char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; would also create a similar string literal.
Yes, but in this case the single characters are placed in ROM. The array you are initializing is loaded with the character literals at run time.
Does char b[10]= {72, 101, 108, 108, 111, 0}; also create a "string" literal with static storage duration? Because theoretically it is the same thing.
If you use UTF-8, then yes, since char == uint8_t and those are the values.
Are there differences between:
char a = 'a';
char a = {'a'};
How/where are the differences defined?
I believe not.
In reply to edit
Do you mean the lifetime of storage of string literals? Have a look at this.
So a string literal has static storage duration. It remains throughout the lifetime of the program, hardcoded in memory.

Shouldn't it be impossible to point directly to text in C?

I am learning C and I came across the pointers.
Even though I learned more with this tutorial than from the textbook I still wonder about the char pointers.
If I program this
#include <stdio.h>
int main()
{
char *ptr_str;
ptr_str = "Hello World";
printf(ptr_str);
return 0;
}
The result is
Hello World
I don't understand how there isn't an error while compiling since the pointer ptr_str is pointing directly to the text and not to the first character of the text. I thought that only this would work
#include <stdio.h>
int main()
{
char *ptr_str;
char var_str[] = "Hello World";
ptr_str = var_str;
printf(ptr_str);
return 0;
}
So in the first example how was I pointing directly to the text?
Your code works because string literals are essentially static arrays.
ptr_str = "Hello World";
is treated by the compiler as if it were
static char __tmp_0[] = {'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0' };
ptr_str = __tmp_0;
(except trying to modify the contents of a string literal has undefined behavior).
You can even apply sizeof to a string literal and you'll get the size of the array: sizeof "Hello" is 6, for example.
In the context of assignment to a char pointer the 'value' of a string literal is the address of its first character.
so
ptr_str = "Hello World";
sets ptr_str to the address of the 'H'
Why won't the first one work? It will work as you have seen.
String literals are arrays. From §6.4.5p6 C11 Standard N1570
The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.
Now in the first case the literal array decays into a pointer to its first element, so the decayed pointer points to 'H'. You assigned that pointer to ptr_str. printf treats its first argument as the format string; since "Hello World" contains no % conversions, it is printed as is, character by character, until the \0 is reached. (The usual form would be printf("%s", ptr_str), with %s as the format specifier and the char * as the corresponding argument.) That's all that happened. This is how you ended up pointing directly to the text.
Note that the second case is quite different from the first: in the second case a copy is made, and that copy can be modified (trying to modify the literal in the first case would be undefined behavior). We are basically initializing a char array with the content of the string literal.

Initializing an array with a pointer in c

When I run the code:
char *abc="Goodbye";
for(i=0; i<=7; i++){
printf("%c\n",*(abc+i)); }
it runs without problem, but when I run the following:
char *der={'a','a','a','a','a'};
for(i=0; i<=4; i++){
printf("%c\n",*(der+i)); }
it doesn't show the correct results and I receive warnings.
So why is this happening since "Goodbye" and {'a','a','a','a','a'} are arrays of chars?
{'a','a','a','a','a'} is not an array but an initializer list, and it can be used to initialize either an aggregate or a scalar.
(Yes, int x = {'a'}; is valid.)
If it's used to initialize a scalar, such as a pointer, only the first value is used, so your declaration of der is equivalent to
char *der = 'a';
You can probably see what the problem is.
So today's programming lessons:
When your compiler warns you that something might be wrong, it probably is.
(Most experienced programmers treat warnings as errors, -Werror. It's an even more important habit for the inexperienced.)
If you don't understand what a warning means, don't do anything until you've found out.
One way to initialize the array - which is related to your second example - is by doing:
char foo[] = {'a', 'b', 'c'};
In this syntax, on the right side of = you provide the array elements as comma-separated values inside { }. So in the above example, array foo has three elements. The first is a, the second is b, and the third is c. If you want a \0 at the end, you need to add it explicitly, as in char foo[] = {'a', 'b', 'c', '\0'};.
Regarding your second example, the answer by @molbdnilo already explains what is semantically wrong with your statement char *der={'a','a','a','a','a'};. If you want to define der as an array and initialize it using { }, you can do:
char der[] = {'a','a','a','a','a'};
This way you are actually defining a char array and initializing it with the content you want.
Note that, you NEED to mention the array size if you are not initializing it while defining it. Which means:
char foo[]; // Will give error because no size is mentioned
char foo[10]; // Fine because size is given as 10
However, mentioning size is optional if you initialize the array while defining it, as we saw in the examples above. But, if you mention the size and if your initializer is smaller than the array size, remaining elements will be initialized to 0. Like:
char bar[10] = {'a', 'b', 'c', '\0'};
/* The array's content will be 'a', 'b', 'c', '\0',
 * and all remaining 6 elements will be 0
 */

C String manipulation pointer vs array notation [duplicate]

Why does the first version make the program crash, while the second one doesn't? Aren't they the same thing?
Pointer Notation
char *shift = "mondo";
shift[3] = shift[2];
Array Notation
char shift[] = {'m', 'o', 'n', 'd', 'o', '\0'};
shift[3] = shift[2];
MWE
int main( void )
{
/* Pointer notation: crashes */
{
char *shift = "mondo";
shift[3] = shift[2];
}
/* Array notation: works */
{
char shift[] = {'m', 'o', 'n', 'd', 'o', '\0'};
shift[3] = shift[2];
}
return 0;
}
No! This is one of the important issues in C. In the first, you create a pointer to a read-only part of memory, i.e. you cannot change it, only read it. The second makes an array of characters, i.e. a contiguous block of memory to which you have both read and write access, meaning you can both read and change the values of the array.
The first one points to a string literal (usually in a read-only section; it should really be const char *, but you can get away with char * for historical reasons).
The second one creates an array and then populates that array.
Therefore they are not the same.
The first typically places the string in a read-only segment (often .rodata, sometimes .text), while the second puts it in writable memory (the stack for a local array, .data for a static one). Memory in the read-only segment is, effectively, const:
char *string = "AAAA";
This creates what is effectively a const char *, since the string literal will be allocated in that read-only segment. Since this will typically be marked read-only, an attempt to write to it will generate an access violation or segmentation fault.
You want to do this:
char string[] = "AAAA";
This will work as expected: it allocates a writable array of five chars (four capital As plus the terminating '\0'), and the name string can be used as a pointer to that location.
This creates a pointer to an existing string:
char *shift = "mondo";
This creates a new array of characters:
char shift[] = {'m', 'o', 'n', 'd', 'o', '\0'};
In the second case, you are allowed to modify the characters because they are the ones that you just created.
In the first case, you are just pointing to an existing string, which should never be modified. The details of where the string is stored are up to the particular compiler. For example, it can store the string in unmodifiable memory. The compiler is also allowed to do tricks to save space. For example:
char *s1 = "hello there";
char *s2 = "there";
s2 might actually point to the same letter 't' that is at the seventh position of the string that s1 points to.
To avoid confusion, prefer to use pointers to const with string literals:
const char *shift = "mondo";
This way, the compiler will let you know if you accidentally try to modify it.
Whenever you define a string using
char * str = "hello";
the compiler effectively treats it as
const char * str = "hello";
which places the string in a read-only location of program memory.
In the case of an array, the array name behaves like a constant pointer, as in
char * const array;
That's why the compiler complains when the user tries to change the base address of the array.
This is done implicitly by the compiler.

Cannot modify char array

Consider the following code.
char message[]="foo";
void main(void){
message[] = "bar";
}
Why is there a syntax error in MPLAB IDE v8.63? I am just trying to change the value of character array.
You cannot use a character array like that after its declaration. If you want to assign a new value to your character array, you can do it like this:
strcpy(message, "bar");
Assignments like
message[] = "bar";
or
message = "bar";
are not supported by C.
The reason the initial assignment works is that it's actually array initialization masquerading as assignment. The compiler interprets
char message[]="foo";
as
char message[4] = {'f', 'o', 'o', '\0'};
There is actually no string literal "foo" involved here.
But when you try to
message = "bar";
The "bar" is interpreted as an actual string literal, and not only that, but message is not a modifiable lvalue, ie. you can't assign stuff to it. If you want to modify your array you must do it character by character:
message[0] = 'b';
message[1] = 'a';
etc, or (better) use a library function that does it for you, like strcpy().
You can do that only in the initialisation, when you declare the char array:
char message[] = "bar";
You cannot do it later in your code.
To modify it you can use strcpy from <string.h>
strcpy(message, "bar");
You can't change the character array like this. If you want to change the value of the character array, you have to modify it character by character, or you can use
strcpy(message,"bar");
char message[]="foo";
This statement causes the compiler to create memory space for 4 char variables. The starting address of this block of memory is the pointer value of message. The address of message is unchangeable; you cannot change where it points. In this case, your only option is to change the data that message points to.
char* message="foo";
This time, memory is created to store the pointer itself, so the address that message points to can change during execution. Then you can safely do message="bar".

Resources