C String manipulation pointer vs array notation [duplicate] - c

This question already has answers here:
getting segmentation fault in a small c program
(3 answers)
Closed 7 years ago.
Why does the first version make the program crash, while the second one doesn't? Aren't they the same thing?
Pointer Notation
char *shift = "mondo";
shift[3] = shift[2];
Array Notation
char shift[] = {'m', 'o', 'n', 'd', 'o', '\0'};
shift[3] = shift[2];
MWE
int main( void )
{
char *shift = "mondo";
shift[3] = shift[2];
char shift[] = {'m', 'o', 'n', 'd', 'o', '\0'};
shift[3] = shift[2];
return 0;
}

No! This is one of the important issues in C. In the first, you create a pointer to a read-only part of memory, i.e. you can not change it, only read it. The second, makes an array of characters, i.e. a part of memory of continuous characters where you can have both read and write access, meaning you can both read and change the values of the array.

First one points to a string literal (usually in a read only section of code, should really be const char * but able to get away with it due to historical reasons)|.
The second one creates an array and then populates that array.
Therefore they are not the same

The first is allocating memory in the .TEXT segment while the second is putting it into the .BSS. Memory in the .TEXT segment is, effectively, read only or const:
char *string = "AAAA";
This creates what is effectively a const char * since the memory will be allocated in the .TEXT segment as a string literal. Since this will typically be marked read-only, an attempt to write to it will generate an access violation or segmentation fault.
You want to do this:
char string[] = "AAAA";
This will work as expected and allocate memory for a string of four capital As and use the variable string as a pointer to the location.

This creates a pointer to an existing string:
char *shift = "mondo";
This creates a new array of characters:
char shift[] = {'m', 'o', 'n', 'd', 'o', '\0'};
In the second case, you are allowed to modify the characters because they are the ones that you just created.
In the first case, you are just pointing to an existing string, which should never be modified. The details of where the string is stored is up to the particular compiler. For example, it can store the string in unmodifyable memory. The compiler is also allowed to do tricks to save space. For example:
char *s1 = "hello there";
char *s2 = "there";
s2 might actually point to the same letter 't' that is at the seventh position of the string that s1 points to.
To avoid confusion, prefer to use const pointers with string literals:
const char *shift = "mondo";
This way, the compiler will let you know if you accidentally try to modify it.

Whenever you define a string using
char * str = "hello";
This is implicitly expressed by compiler
const char * str= "hello";
Which makes this symbol goes to read only location of program memory.
But in case of array the same is interpreted as
char const *array[];
That's why compiler screams when user try to change base address of array.
This is implicit done by compiler

Related

I don't understand this use of pointers

I'm trying to understand this use of pointers. From what I realized so far the value pointers hold is a reference to the memory address of another entity, and when using the * sign we access the value of the entity referenced by the pointer.
However, in this code that I encountered in the tutorial i'm using, the ptr_strpointer has a string value which is not a memory address, so I don't understand how *ptr_str (which I expected to be the value of a referenced entity) is used in the for loop.
char *ptr_str; int i;
ptr_str = "Assign a string to a pointer.";
for (i=0; *ptr_str; i++)
printf("%c", *ptr_str++);
This:
ptr_str = "Assign a string to a pointer.";
Is a shorthand for this:
// Somewhere else:
char real_str[] = {'A', 's', 's', 'i', 'g', ..., '.', '\0'};
// In your main():
ptr_str = real_str;
// or
ptr_str = &real_str[0];
In other words, string literals like "Hello World" are actually pointers to a character array holding your string. This is all done transparently by the compiler, so it might be confusing at first sight.
If you're curious, take a look at this other answer of mine, where I explain this in more detail.

differences in array initialization(char, string, other) regarding storage duration

In this question it was said in the comments:
char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; and char arr[10] =
"Hello"; are strictly the same thing. – Michael Walz
This got me thinking.
I know that "Hello" is string literal. String literals are stored with static storage duraction and are immutable.
But if both are are really the same then char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; would also create a similar string literal with.
Does char b[10]= {72, 101, 108, 108, 111, 0}; also create a "string" literal with static storage duration? Because theoretically it is the same thing.
char a = 'a'; is the same thing as char a; ...; a = 'a';, so your thoughts are correct 'a' is simply written to a
Are there differences between:
char a = 'a';
char a = {'a'};
How/where are the differences defined?
EDIT:
I see that I haven't made it clear enough that I am particularly interested in the memory usage/storage duration of the literals. I will leave the question as it is, but would like to make the emphasis of the question more clear in this edit.
I know that "Hello" is string literal. String literals are stored with static storage duraction and are immutable.
Yes, but string literals are also a grammatical item in the C language. char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; is not a string literal, it is an initializer list. The initializer list does however behave as if it has static storage duration, remaining elements after the explicit \0 are set to zero etc.
The initializer list itself is stored in some manner of ROM memory. If your variable arr has static storage duration too, it will get allocated in the .data segment and initialized from the ROM init list before the program is started. If arr has automatic storage duration (local), then it is initialized from ROM in run-time, when the function containing arr is called.
The ROM memory where the initializer list is stored may or may not be the same ROM memory as used for string literals. Often there's a segment called .rodata where these things end up, but they may as well end up in some other segment, such as the code segment .text.
Compilers like to store string literals in a particular memory segment, because that means that they can perform an optimization called "string pooling". Meaning that if you have the string literal "Hello" several times in your program, the compiler will use the same memory location for it. It may not necessarily do this same optimization for initializer lists.
Regarding 'a' versus {'a'} in an initializer list, that's just a syntax hiccup in the C language. C11 6.7.6/11:
The initializer for a scalar shall be a single expression, optionally enclosed in braces. The
initial value of the object is that of the expression (after conversion); the same type
constraints and conversions as for simple assignment apply,
In plain English, this means that a "non-array" (scalar) can be either initialized with or without braces, it has the same meaning. Apart from that, the same rules as for regular assignment apply.
I know that "Hello" is string literal. String literals are stored with static storage duraction and are immutable.
Yes. But with char arr[10] = "Hello";, you are copying the string literal to an array arr and there's no need to "keep" the string literal. So if an implementation chooses to do remove the string literal altogether after copying it to arr and that's totally valid.
But if both are are really the same then char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; would also create a similar string literal.
Again there's no need to make/store a string literal for this.
Only if you directly have a pointer to a string literal, it'd be usually stored somewhere such as:
char *string = "Hello, world!\n";
Even then an implementation can choose not to do so under the "as-if" rule. E.g.,
#include <stdio.h>
#include <string.h>
static const char *str = "Hi";
int main(void)
{
char arr[10];
strcpy(arr, str);
puts(arr);
}
"Hi" can be eliminated because it's used only for copying it into arr and isn't accessed directly anywhere. So eliminating the string literal (and the strcpy call too) as if you had "char arr[10] = "Hi"; and wouldn't affect the observable behaviour.
Basically the C standard doesn't necessitate a string literal has to be stored anywhere as long as the properties associated with a string literal are satisfied.
Are there differences between: char a = 'a'; char a = {'a'}; How/where are the differences defined?
Yes. C11, 6.7.9 says:
The initializer for a scalar shall be a single expression, optionally enclosed in braces. [..]
Per the syntax, even:
char c = {'a',}; is valid and equivalent too (though I wouldn't recommend this :).
In the abstract machine, char arr[10] = "Hello"; means that arr is initialized by copying data from the string literal "Hello" which has its own existence elsewhere; whereas the other version just has initial values like any other variable -- there is no string literal involved.
However, the observable behaviour of both versions is identical: there is created arr with values set as specified. This is what the other poster meant by the code being identical; according to the Standard, two programs are the same if they have the same observable behaviour. Compilers are allowed to generate the same assembly for both versions.
Your second question is entirely separate to the first; but char a = 'a'; and char a = {'a'}; are identical. A single initializer may optionally be enclosed in braces.
I belive your question is highly implementation dependant (HW and compiler wise). However, in general: arrays are placed in RAM, let it be global or not.
I know that "Hello" is string literal. String literals are stored with static storage duraction and are immutable.
Yes this saves the string "Hello" in ROM (read only memory). Your array is loaded the literal in runtime.
But if both are are really the same then char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; would also create a similar string literal.
Yes but in this case the single characters are placed in ROM. The array you are initialized is loaded with character literals in runtime.
Does char b[10]= {72, 101, 108, 108, 111, 0}; also create a "string" literal with static storage duration? Because theoretically it is the same thing.
If you use UTF-8, then yes, since char == uint8_t and those are the values.
Are there differences between:
char a = 'a';
char a = {'a'};
How/where are the differences defined?
I believe not.
In reply to edit
Do you mean the lifetime of storage of string literals? Have a look at this.
So a string literal has static storage duration. It remains throughout the lifetime of the program, hardcoded in memory.

Shouldn't it be impossible to point directly to text in C?

I am learning C and I came across the pointers.
Even though I learned more with this tutorial than from the textbook I still wonder about the char pointers.
If I program this
#include <stdio.h>
int main()
{
char *ptr_str;
ptr_str = "Hello World";
printf(ptr_str);
return 0;
}
The result is
Hello World
I don't understand how there isn't an error while compiling since the pointer ptr_str is pointing directly to the text and not to the first character of the text. I thought that only this would work
#include <stdio.h>
int main()
{
char *ptr_str;
char var_str[] = "Hello World";
ptr_str = var_str;
printf(ptr_str);
return 0;
}
So in the first example how was I pointing directly to the text?
Your code works because string literals are essentially static arrays.
ptr_str = "Hello World";
is treated by the compiler as if it were
static char __tmp_0[] = {'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0' };
ptr_str = __tmp_0;
(except trying to modify the contents of a string literal has undefined behavior).
You can even apply sizeof to a string literal and you'll get the size of the array: sizeof "Hello" is 6, for example.
In the context of assignment to a char pointer the 'value' of a string literal is the address of its first character.
so
ptr_str = "Hello World";
sets ptr_str to the address of the 'H'
Why won't the first one work? It will work as you have seen.
String literals are arrays. From §6.4.5p6 C11 Standard N1570
The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.
Now in the first case literal array decayed into pointer to first element - so decayed pointer will basically be pointing to 'H'. You assigned that pointer to ptr_str. Now printf will expect a format specifier and the corresponding argument. Here it will be %s and corresponding argument would be char*. And printf will print every character until it reached the \0. That's all it happened. This is how you ended up pointing directly to the text.
Note that second case is quite different from first case in that - second case a copy is being made which can be modified (Trying to modify the first one would be undefined behavior). We are basically initializing a char array with the content of the string literal.

Different type of string declarations in C [duplicate]

This question already has answers here:
How to declare strings in C [duplicate]
(4 answers)
Closed 8 years ago.
What is the difference between the following notations in string definition from a program address space point of view?
char str[20] = "Just to Ask";
char *str = "Just to Ask";
char str[20] = "Just to Ask";
The above statement defines an array str of 20 characters and initializes the array with the string literal "Just to Ask". The above statement is equivalent to
char str[20] = {'J', 'u', 's', 't', ' ', 't', 'o', ' ', 'A', 's', 'k', '\0'};
The array initializer list has only 12 elements. The rest 8 elements of the array str are initialized to 0. If the array has automatic storage allocation, then it is allocated on the stack. If it has static storage allocation, then it is allocated in the data section of the program memory space.
The below statement
char *str = "Just to Ask";
defines str not an array but a pointer which points to the string literal "Just to Ask". The string literal is read-only and attempting to modify it is undefined behaviour. Therefore, the two statements stated in your questions are totally different and the second statement better be written as
const char *str = "Just to Ask"; // string literal is read-only

strtok - char array versus char pointer [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
strtok wont accept: char *str
When using the strtok function, using a char * instead of a char [] results in a segmentation fault.
This runs properly:
char string[] = "hello world";
char *result = strtok(string, " ");
This causes a segmentation fault:
char *string = "hello world";
char *result = strtok(string, " ");
Can anyone explain what causes this difference in behaviour?
char string[] = "hello world";
This line initializes string to be a big-enough array of characters (in this case char[12]). It copies those characters into your local array as though you had written out
char string[] = { 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' };
The other line:
char* string = "hello world";
does not initialize a local array, it just initializes a local pointer. The compiler is allowed to set it to a pointer to an array which you're not allowed to change, as though the code were
const char literal_string[] = "hello world";
char* string = (char*) literal_string;
The reason C allows this without a cast is mainly to let ancient code continue compiling. You should pretend that the type of a string literal in your source code is const char[], which can convert to const char*, but never convert it to a char*.
In the second example:
char *string = "hello world";
char *result = strtok(string, " ");
the pointer string is pointing to a string literal, which cannot be modified (as strtok() would like to do).
You could do something along the lines of:
char *string = strdup("hello world");
char *result = strtok(string, " ");
so that string is pointing to a modifiable copy of the literal.
strtok modifies the string you pass to it (or tries to anyway). In your first code, you're passing the address of an array that's been initialized to a particular value -- but since it's a normal array of char, modifying it is allowed.
In the second code, you're passing the address of a string literal. Attempting to modify a string literal gives undefined behavior.
In the second case (char *), the string is in read-only memory. The correct type of string constants is const char *, and if you used that type to declare the variable you would get warned by the compiler when you tried to modify it. For historical reasons, you're allowed to use string constants to initialize variables of type char * even though they can't be modified. (Some compilers let you turn this historic license off, e.g. with gcc's -Wwrite-strings.)
The first case creates a (non const) char array that is big enough to hold the string and initializes it with the contents of the string. The second case creates a char pointer and initializes it to point at the string literal, which is probably stored in read only memory.
Since strtok wants to modify the memory pointed at by the argument you pass it, the latter case causes undefined behavior (you're passing in a pointer that points at a (const) string literal), so its unsuprising that it crashes
Because the second one declares a pointer (that can change) to a constant string...
So depending on your compiler / platform / OS / memory map... the "hello world" string will be stored as a constant (in an embedded system, it may be stored in ROM) and trying to modify it will cause that error.

Resources