#include <stdio.h>

void modify(char **s){
    char *new_name = "Australia";
    *s = new_name;
}

int main(){
    char *name = "New Holland";
    modify(&name);
    printf("%s\n", name);
    return 0;
}
Here we are trying to modify name, a char * that points to a string literal stored in a read-only location. Does this count as undefined behavior?
First of all, you do not modify the string; you assign to the pointers, and pointers are not strings! They can only reference the C strings (which are nul-terminated arrays of char).
It's my understanding of the C standard (6.5.2.5 - especially examples, quoted below) that string literals have a static storage duration, and it is not UB as you assign the reference to the string literal (not the local variable). It does not matter where string literals are physically stored.
EXAMPLE 5 The following three expressions have different meanings:
"/tmp/fileXXXXXX"
(char []){"/tmp/fileXXXXXX"}
(const char []){"/tmp/fileXXXXXX"}
The first always has static storage
duration and has type array of char, but need not be modifiable; the
last two have automatic storage duration when they occur within the
body of a function, and the first of these two is modifiable.
I have a question about string literals inside compound literals.
As the lifetime of the struct S compound literal ends with the block surrounding the func call, I wonder if it is still all right to use the name pointer inside the global gs variable. I guess the string literal "foo" is not bound to the lifetime of the compound literal but rather resides in .rodata? Is that right? At least gs.name still prints foo.
#include <stddef.h>
#include <stdio.h>

struct S
{
    const char *name;
    int size;
};

static struct S gs;

static void func(struct S *s)
{
    gs = *s;
}

int main(void)
{
    {
        func(&(struct S){.name="foo",.size=20});
    }
    printf("name: %s size: %d\n", gs.name, gs.size);
    return 0;
}
I guess the string literal "foo" is not bound to the lifetime of the compound literal but rather resides in .rodata?
Yes. That is correct. In
func(&(struct S){.name="foo",.size=20});
where "foo" initializes a pointer, it's almost as if you had written (if that could be written in C):
func(&(struct S){.name=(static const char[]){'f','o','o','\0'},.size=20});
(C doesn't technically allow static on compound literals, and C's string literals are really de-facto const-char-arrays, but their formal type is char[] without the const (for weird historical reasons)).
But be careful with it. In C, you can also use string literals to initialize arrays, and if .name were a member array, then storing "it" (its decayed pointer) in a global variable would be ill-advised, because the array would live inside the compound literal and end with it.
I've been reading in various sources that string literals remain in memory for the whole lifetime of the program. In that case, what is the difference between those two functions
char *f1() { return "hello"; }
char *f2() {
    char str[] = "hello";
    return str;
}
While f1 compiles fine, f2 complains that I'm returning stack allocated data. What happens here?
if the str points to the actual string literal (which has static duration), why do I get an error?
if the string literal is copied to the local variable str, where does the original string literal go? does it remain in memory with no reference to it?
I've been reading in various sources that string literals remain in
memory for the whole lifetime of the program.
Yes.
In that case, what is
the difference between those two functions
char *f1() { return "hello"; }
char *f2() {
    char str[] = "hello";
    return str;
}
f1 returns a pointer to the first element of the array represented by a string literal, which has static storage duration. f2 returns a pointer to the first element of the automatic array str. str has a string literal for an initializer, but it is a separate object.
While f1 compiles fine, f2 complains that I'm returning stack
allocated data. What happens here?
if the str points to the actual string literal (which has static duration), why do I get an error?
It does not. In fact, it itself does not point to anything. It is an array, not a pointer.
if the string literal is copied to the local variable str, where does the original string literal go? does it remain in memory with no
reference to it?
C does not specify, but in practice, yes, some representation of the string literal must be stored somewhere in the program, perhaps in the function implementation, because it needs to be used to initialize str anew each time f2 is called.
This
char str[] = "hello";
is a declaration of a local array that is initialized by the string literal "hello".
In fact it is the same as if you declared the array the following way
char str[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
That is, the array has its own area of memory (with automatic storage duration), and that memory is initialized from the string literal.
After exiting the function the array will not be alive.
That is the function
char *f2() {
    char str[] = "hello";
    return str;
}
tries to return a pointer to the first element of the local character array str that has the automatic storage duration.
As for this function definition
char *f1() { return "hello"; }
then the function returns a pointer to the first character of the string literal "hello" that indeed has the static storage duration.
You may imagine the first function definition the following way
char literal[] = "hello";
char *f1() { return literal; }
Now compare where the arrays are defined in the first function definition and in the second function definition.
In the first function definition the array literal is defined globally while in the second function definition the array str is defined locally.
if the str points to the actual string literal (which has static
duration), why do I get an error?
str is not a pointer. It is a named extent of memory that was initialized by a string literal. That is, the array has the type char[6].
In the return statement
return str;
the array is implicitly converted to pointer to its first element of the type char *.
Functions in C and C++ may not return arrays. In C++ functions may return references to arrays.
The string that you see on your stack is a copy, not the string literal itself. In the case of ELF, the literal's bytes are typically stored in a read-only data section of the executable (commonly .rodata), along with the other string literals in the program. Whenever the stack frame of the function that declares the array is set up, the contents of the literal are copied from that section into the array on the stack.
A brief reading that you might be interested in:
http://refspecs.linuxbase.org/elf/gabi4+/ch4.strtab.html
char str[] = "hello"; is a special syntax which copies the string literal, and your function returns a pointer to this local variable, which is destroyed once the function returns.
char *f1() { return "hello"; } is correct but returning const char* would probably be better.
I am trying to change value of character array components using a pointer. But I am not able to do so. Is there a fundamental difference between declaring arrays using the two different methods i.e. char A[] and char *A?
I tried accessing arrays using A[0] and it worked. But I am not able to change values of the array components.
{
    char *A = "ab";
    printf("%c\n", A[0]); //Works. I am able to access A[0]
    A[0] = 'c'; //Segmentation fault. I am not able to edit A[0]
    printf("%c\n", A[0]);
}
Expected output:
a
c
Actual output:
a
Segmentation fault
The difference is that char A[] defines an array and char * does not.
The most important thing to remember is that arrays are not pointers.
In this declaration:
char *A = "ab";
the string literal "ab" creates an anonymous array object of type char[3] (2 plus 1 for the terminating '\0'). The declaration creates a pointer called A and initializes it to point to the initial character of that array.
The array object created by a string literal has static storage duration (meaning that it exists through the entire execution of your program) and does not allow you to modify it. (Strictly speaking an attempt to modify it has undefined behavior.) It really should be const char[3] rather than char[3], but for historical reasons it's not defined as const. You should use a pointer to const to refer to it:
const char *A = "ab";
so that the compiler will catch any attempts to modify the array.
In this declaration:
char A[] = "ab";
the string literal does the same thing, but the array object A is initialized with a copy of the contents of that array. The array A is modifiable because you didn't define it with const -- and because it's an array object you created, rather than one implicitly created by a string literal, you can modify it.
An array indexing expression, like A[0], actually requires a pointer as one of its operands (and an integer as the other). Very often that pointer will be the result of an array expression "decaying" to a pointer, but it can also be just a pointer -- as long as that pointer points to an element of an array object.
The relationship between arrays and pointers in C is complicated, and there's a lot of misinformation out there. I recommend reading section 6 of the comp.lang.c FAQ.
You can use either an array name or a pointer to refer to elements of an array object. You ran into a problem with an array object that's read-only. For example:
#include <stdio.h>

int main(void) {
    char array_object[] = "ab"; /* array_object is writable */
    char *ptr = array_object; /* or &array_object[0] */
    printf("array_object[0] = '%c'\n", array_object[0]);
    printf("ptr[0] = '%c'\n", ptr[0]);
}
Output:
array_object[0] = 'a'
ptr[0] = 'a'
String literals like "ab" are supposed to be immutable, like any other literal (you can't alter the value of a numeric literal like 1 or 3.1419, for example). Unlike numeric literals, however, string literals require some kind of storage to be materialized. Some implementations (such as the one you're using, apparently) store string literals in read-only memory, so attempting to change the contents of the literal will lead to a segfault.
The language definition leaves the behavior undefined - it may work as expected, it may crash outright, or it may do something else.
String literals are not meant to be overwritten, think of them as read-only. It is undefined behavior to overwrite the string and your computer chose to crash the program as a result. You can use an array instead to modify the string.
char A[3] = "ab";
A[0] = 'c';
Is there a fundamental difference between declaring arrays using the two different methods i.e. char A[] and char *A?
Yes, because the second one is not an array but a pointer.
The type of "ab" is char /*readonly*/ [3]. It is an array with immutable content. So when you want a pointer to that string literal, you should use a pointer to char const:
char const *foo = "ab";
That keeps you from altering the literal by accident. If you however want to use the string literal to initialize an array:
char foo[] = "ab"; // the size of the array is determined by the initializer
// here: 3 - the characters 'a', 'b' and '\0'
The elements of that array can then be modified.
Array indexing, by the way, is nothing more than syntactic sugar:
foo[bar]; /* is the same as */ *(foo + bar);
That's why one can do funny things like
"Hello!"[2]; /* 'l' but also */ 2["Hello!"]; // 'l'
A string in C can be initialized in two ways: as an array or through a character pointer.
Both forms can access the string and print it.
Editing a string that was initialized as an array is straightforward: the individual characters can be changed through the array.
Is it impossible to edit a string that was initialized through a character pointer?
Let us consider the following two programs:
Program #1:
#include <stdio.h>

void str_change(char *);

int main()
{
    char str[] = "abcdefghijklm";
    printf("%s\n", str);
    str_change(str);
    printf("%s\n", str);
    return 0;
}

void str_change(char *temp)
{
    int i = 0;
    while (temp[i] != '\0') {
        temp[i] = 'n' + temp[i] - 'a';
        i++;
    }
}
Program #2:
#include <stdio.h>

void str_change(char *);

int main()
{
    char *str = "abcdefghijklm";
    printf("%s\n", str);
    str_change(str);
    printf("%s\n", str);
    return 0;
}

void str_change(char *temp)
{
    int i = 0;
    while (temp[i] != '\0') {
        temp[i] = 'n' + temp[i] - 'a';
        i++;
    }
}
I tried the following version of function to program #2, but of no use
void str_change(char *temp)
{
    while (*temp != '\0') {
        *temp = 'n' + *temp - 'a';
        temp++;
    }
}
The first program works well, but the second gives a segmentation fault. So, is it mandatory to pass only the string constants that are initialized using arrays between functions, if editing of the string is required?
So is it mandatory to pass only the string constants that are initialized using arrays between functions, if editing of the string is required?
Basically, yes, though the real explanation is not these exact words. The following definition creates an array:
char str[] = "abc";
this is not a string literal. The "abc" token is a string literal syntax, not a string literal object. Here, that literal specifies the initial value for the str array. Array objects are modifiable.
char *str = "abc";
Here the "abc" syntax in the source code is an expression denoting a string literal object in the translated program image. It's also a kind of array, with static storage duration (regardless of the storage duration of str). The "abc" syntax evaluates to a pointer to the first character of this array, and the str pointer is initialized with that pointer value.
String literals are not required to support modification; the behavior of attempting to modify a string literal object is undefined behavior.
Even in systems where you don't get a predictable segmentation fault, strange things can happen. For instance:
char *a = "xabc";
char *b = "abc";
b[0] = 'b'; /* b changes to "bbc" */
Suppose the assignment works. It's possible that a will also be changed to "xbbc". A C compiler is allowed to merge the storage of identical literals, or literals which are suffixes of other literals.
It doesn't matter whether or not a and b are close together; this sneaky effect could occur even between distant declarations in different functions, perhaps even in different translation units.
String literals should be considered to be part of the program's image; a program which successfully modifies a string literal is effectively self-modifying code. The reason you get a "segmentation fault" in your environment is precisely because of a safeguard against self-modifying code: the "text" section of the compiled program (which contains the machine code) is located in write-protected pages of virtual memory. And the string literals are placed there together with the machine code (often interspersed among its functions). Attempts to modify a string literal result in write accesses to the text section, which are blocked by the permission bits on the pages.
In another kind of environment, C code might be used to produce a software image which goes into read-only memory: actual ROM chips. The string literals go into the ROM together with the code. Attempting to modify one adds up to attempting to modify ROM. The hardware might have no detection for that. For instance, the instruction might appear to execute, but when the location is read back, the original value is still there, not the new value. Like the segmentation fault, this is within the specification range of "undefined behavior": any behavior is!
String literals are stored in static-duration storage, which exists for the lifetime of the program and may be read-only. Changing the content of a literal leads to undefined behavior.
Copy the literal to a modifiable array and pass that array to the function.
char array[5];
strcpy(array, "test");
If you are declaring a pointer to a string literal, make it const so the compiler will warn you if you try to modify it.
const char * ptr = " string literal";
I think that when you use a pointer to a string literal you can only read the array; writing to its elements in a loop is not allowed.
I am working on Microsoft Visual Studio environment. I came across a strange behavior
char *src ="123";
char *des ="abc";
printf("\nThe src string is %c", src[0]);
printf("\tThe dest string is %c", des[0]);
des[0] = src[0];
printf("\nThe src string is %c", src[0]);
printf("\tThe dest string is %c", des[0]);
The result is:
1 a
1 a
That means the des[0] is not being initialized. As src is pointing to the first element of the string. I guess by rules this should work.
This is undefined behavior:
des[0] = src[0];
Try this instead:
char des[] ="abc";
Since src and des are initialized with string literals, their type should actually be const char *, not char *; like this:
const char * src ="123";
const char * des ="abc";
There was never memory allocated for either of them, they just point to the predefined constants. Therefore, the statement des[0] = src[0] is undefined behavior; you're trying to change a constant there!
A C++ compiler should warn you about the implicit conversion from const char * to char *. In C the literal's type is char[N], so a warning appears only with an option such as GCC's -Wwrite-strings.
If using C++, consider using std::string instead of char *, and std::cout instead of printf.
Section 2.13.4 of ISO/IEC 14882 (Programming languages - C++) says:
A string literal is a sequence of characters (as defined in 2.13.2) surrounded by double quotes, optionally beginning with the letter L, as in "..." or L"...". A string literal that does not begin with L is an ordinary string literal, also referred to as a narrow string literal. An ordinary string literal has type “array of n const char” and static storage duration (3.7), where n is the size of the string as defined below, and is initialized with the given characters. ...
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation defined. The effect of attempting to modify a string literal is undefined.
In C, string literals such as "123" are stored as arrays of char (const char in C++). These arrays are stored in memory such that they are available over the lifetime of the program. Attempting to modify the contents of a string literal results in undefined behavior; sometimes it will "work", sometimes it won't, depending on the compiler and the platform, so it's best to treat string literals as unwritable.
Remember that under most circumstances, an expression of type "N-element array of T" will be converted to an expression of type "pointer to T" whose value is the location of the first element in the array.
Thus, when you write
char *src = "123";
char *des = "abc";
the expressions "123" and "abc" are converted from "4-element array of char" (three characters plus the terminating '\0') to "pointer to char", and src will point to the '1' in "123", and des will point to the 'a' in "abc".
Again, attempting to modify the contents of a string literal results in undefined behavior, so when you write
des[0] = src[0];
the compiler is free to treat that statement any way it wants to, from ignoring it completely to doing exactly what you expect it to do to anything in between. That means that string literals, or a pointer to them, cannot be used as target parameters to calls like strcpy, strcat, memcpy, etc., nor should they be used as parameters to calls like strtok.
vinaygarg: That means the des[0] is not being initialized. As src is pointing to the first element of the string. I guess by rules this should work.
Firstly, you must remember that src and des are defined as pointers, nothing more, nothing less.
So you must then ask yourself what exactly "123" and "abc" are and why they cannot be altered. To cut a long story short, they are stored in application memory, which is read-only. Why? The strings must be stored with the program in order to be available to your code at run time. In theory you should get a compiler warning for assigning a const char * to a non-const char *. Why is it read-only? The memory for exe's and dll's needs to be protected from being overwritten, so it is made read-only to stop bugs and viruses from modifying executing code.
So how can you get this string into modifiable memory?
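// Copying into an array (requires <string.h>).
enum { BUFFER_SIZE = 256 };  // a true constant, so buffer is not a VLA in C
char buffer[BUFFER_SIZE];
strcpy(buffer, "abc");  // fine when the source is known to fit
// Or, bounded: strncpy does not terminate on truncation, so do it by hand.
strncpy(buffer, "abc", BUFFER_SIZE - 1);
buffer[BUFFER_SIZE - 1] = '\0';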