I am trying to understanding the passing of string to a called function and modifying the elements of the array inside the called function.
void foo(char p[]){
p[0] = 'a';
printf("%s",p);
}
void main(){
char p[] = "jkahsdkjs";
p[0] = 'a';
printf("%s",p);
foo("fgfgf");
}
Above code returns an exception. I know that string in C is immutable, but would like to know what is there is difference between modifying in main and modifying the calling function. What happens in case of other date types?
I know that string in C is immutable
That's not true. The correct version is: modifying string literals in C are undefined behaviors.
In main(), you defined the string as:
char p[] = "jkahsdkjs";
which is a non-literal character array, so you can modify it. But what you passed to foo is "fgfgf", which is a string literal.
Change it to:
char str[] = "fgfgf";
foo(str);
would be fine.
In the first case:
char p[] = "jkahsdkjs";
p is an array that is initialized with a copy of the string literal. Since you don't specify the size it will determined by the length of the string literal plus the null terminating character. This is covered in the draft C99 standard section 6.7.8 Initialization paragraph 14:
An array of character type may be initialized by a character string literal, optionally
enclosed in braces. Successive characters of the character string literal (including the
terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
in the second case:
foo("fgfgf");
you are attempting to modify a string literal which is undefined behavior, which means the behavior of program is unpredictable, and an exception is one possibility. From the C99 draft standard section 6.4.5 String literals paragraph 6 (emphasis mine):
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
The difference is in how you are initializing p[].
char p[] = "jkahsdkjs";
This initializas a writeable array called p, auto-sized to be large enough to contain your string and stored on the stack at runtime.
However, in the case of:
foo("fgfgf");
You are passing in a pointer to the actual string literal, which are usually enforced as read-only in most compilers.
What happens in case of other date types?
String literals are a very special case. Other data types, such as int, etc do not have an issue that is analogous to this, since they are stored strictly by value.
Related
char *str = "String!";
char *str = (char []){"String!"};
Are these two initializations equivalent? If not, what is the difference between them?
Are these two initializations equivalent?
No.
If not, what is the difference between them?
One of the major differences between the two is that you cannot modify the object that str is pointing to in the first snippet of code, while in the second one, you can.
Trying to modify a string literal in C is undefined behavior. So in case of
char *str = "String!";
if you try to modify the object pointed to by str, it'll invoke UB.
In case of
char *str = (char []){"String!"};
however, (char[]){"String!"} is a compound literal, which has type array of chars, and str points to the first element of that array.
Since the compound literal is not read-only (doesn't have a const qualifier), you can modify the object pointed to by str.
Another difference you should note is that the string literal "String!" in the first one has static storage duration, while the compound literal (char []){"String!"} in the second one has static storage duration only if it occurs outside the body of a function; otherwise, it has automatic storage duration associated with the enclosing block.
From n1570 6.5.2.5 (Compound literals) p12:
12 EXAMPLE 5 The following three expressions have different meanings:
"/tmp/fileXXXXXX"
(char []){"/tmp/fileXXXXXX"}
(const char []){"/tmp/fileXXXXXX"}
The first always has static storage duration and has type array of
char, but need not be modifiable; the last two have automatic storage
duration when they occur within the body of a function, and the first
of these two is modifiable.
Are these two initializations equivalent?
No. There are at least three differences:
The C standard does not define the behavior upon attempting to modify the characters of "String!", but it does define the behavior of attempting to modify the characters of (char []){"String!"}.
Each occurrence of "String!" may be the same object, but each occurrence of (char []){"String!"} is a distinct object.
The array defined by "String!" has static lifetime (the memory for it is reserved for all of program execution), but the array defined by (char []){"String!"} has static lifetime if it is outside of any function and automatic lifetime if it is inside a function.
Let’s examine these differences:
The C standard only requires the contents of string literals to be available for reading, not writing. It does not define the behavior if you attempt to modify them. This does not mean a program may not modify them, just that the standard does not say what happens if a program tries. In consequence, good engineers will not attempt to modify them in ordinary situations. (However, a C implementation may define the behavior, in which case a program for that implementation could make use of that.)
C implementations are allowed to coalesce string literals and parts of them, so defining two pointers initialized with the same string, as with char *str0 = "String!"; and char *str1 = "String!";, may yield two pointers with the same value. This is due to a special rule in the C standard for string literals, so it does not apply for compound literals. When two pointers are initialized with the same compound literal source code, they must have different values. This means that multiple uses of compound literals must use more memory than multiple uses of strings, unless the compiler is able to optimize them away. Even a single use of a compound literal to initialize a pointer inside a function may cause more use of memory because the compiler typically must use space on the stack for the compound literal and separately have a copy of the string used to initialize it.
Because a string literal has static lifetime, its address may be returned from a function and used by the caller. However, a compound literal may not be relied on after the function it is defined in returns. For example:
char *GetErrorMessageFromCode(int code)
{
char *ErrorMessages[] =
{
"invalid argument",
"out of range",
"resource unavailable",
…
}
return ErrorMessages[code];
/* The above is working code for returning pointers to (the first
characters of) static string literals. If compound literals were
used instead, the behavior would not be defined by the C standard
because the memory for the compound literals would not be reserved
after the function returns.
*/
}
A book on C programming says,
"There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals.
Takeaway - String literals are read-only.
If introduced today, the type of string literals would certainly be char const[], an array of const-qualified characters. Unfortunately, the const keyword was introduced to the C language much later than string literals, and therefore it remained as it is for backward
compatibility."
Question 1. How can strings be read only, like takeaway says, if they can be modified?
Question 2. "There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals."
What type is this referring to, which doesn't keep a string literal from being modified?
Question 3. If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
How can strings be read only, like takeaway says, if they can be modified?
Because Unfortunately, the const keyword was introduced to the C language much later than string literals, and therefore it remained as it is for backward compatibility.
String literals existed before protection (in the form of const keyword) existed in the language.
Ergo, they are not protected, because the tools to protect it did not exist. A classic example of undefined behavior is writing to a string literal, see Undefined, unspecified and implementation-defined behavior .
What type is this referring to, which doesn't keep a string literal from being modified?
It has the type char[N] - "array of N chars" - where N is the number of characters in the string plus one for zero terminating character.
See https://en.cppreference.com/w/c/language/string_literal .
char *str = "string literal";
// ^^^ - implicit decay from `char [N]` -> `char *`
str[0] = 'a'; // compiler will be fine, but it's invalid code
// or super shorter form:
"string literal"[0] = 'a'; // invalid code
If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
The type would be const char[N] - an array of N constant chars, which means you can't modify the characters.
// assuming string literal has type const char [N]
const char *str = "string literal";
// ^^^ - implicit decay from `const char [N]` -> `const char *`
str[0] = 'a'; // compile time error
// or super shorter form:
"string literal"[0] = 'a'; // with `const char [N]` would be a compiler time error
With gcc compiler use -Wwrite-strings to protect against mistakes like writing to string literals.
Question 1. How can strings be read only, like takeaway says, if they can be modified?
The text does not say they can be modified. It says they are not protected from being modified. That is a slight error; properly, it should say they are not protected from attempts to modify them: The rules of the C standard do not prevent you from writing code that attempts to modify a string literal, and they do not define the results when a program executes such an attempt. In some circumstances, attempting to modify a string literal may result in a signal, usually ending program execution by default. In other circumstances, the attempt may succeed, and the string literal will be modified. In other circumstances, nothing will happen; there will be neither a signal nor a change to the string literal. It is also possible other behaviors may occur.
Question 2. "There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals."
What type is this referring to, which doesn't keep a string literal from being modified?
Technically, a string literal is a piece of source code that has a character sequence inside quotes, optionally with an encoding prefix. During compilation or program execution, an array is generated with the contents of the character sequence and a terminating null character. For string literals without a prefix, the type of that array is char []. (If there is a prefix, the type may also be wchar_t [], char16_t [], or char32_t, depending on the prefix.)
Colloquially, we often refer to this array as the string literal, even though the array is the thing that results from a string literal (an array in memory) not the actual string literal (in the source code).
The type char [] does not contain const, so it does not offer the protections that const char [] does. (Those protections are fairly mild.)
Question 3. If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
Your confusion here is unclear. When a string literal appears in source code, the compiler arranges for its contents to be in the memory of the running program. Those contents are in memory as an array of characters. If the rules of C were different, the type of that array would be const char [] instead of char [].
I know that string literal used in program gets storage in read only area for eg.
//global
const char *s="Hello World \n";
Here string literal "Hello World\n" gets storage in read only area of program .
Now suppose I declare some literal in body of some function like
func1(char *name)
{
const char *s="Hello World\n";
}
As local variables to function are stored on activation record of that function, is this the
same case for string literals also? Again assume I call func1 from some function func2 as
func2()
{
//code
char *s="Mary\n";
//call1
func1(s);
//call2
func1("Charles");
//code
}
Here above,in 1st call of func1 from func2, starting address of 's' is passed i.e. address of s[0], while in 2nd call I am not sure what does actually happens. Where does string literal "Charles" get storage. Whether some temperory is created by compiler and it's address is passed or something else happens?
I found literals get storage in "read-only-data" section from
String literals: Where do they go?
but I am unclear about whether that happens only for global literals or for literals local to some function also. Any insight will be appreciable. Thank you.
A C string literal represents an array object of type char[len+1], where len is the length, plus 1 for the terminating '\0'. This array object has static storage duration, meaning that it exists for the entire execution of the program. This applies regardless of where the string literal appears.
The literal itself is an expression type char[len+1]. (In most but not all contexts, it will be implicitly converted to a char* value pointing to the first character.)
Compilers may optimize this by, for example, storing identical string literals just once, or by not storing them at all if they're never referenced.
If you write this:
const char *s="Hello World\n";
inside a function, the literal's meaning is as I described above. The pointer object s is initialized to point to the first character of the array object.
For historical reasons, string literals are not const in C, but attempting to modify the corresponding array object has undefined behavior. Declaring the pointer const, as you've done here, is not required, but it's an excellent idea.
Where string literals (or rather, the character arrays they are compiled to) are located in memory is an implementation detail in the compiler, so if you're thinking about what the C standard guarantees, they could be in a number of places, and string literals used in different ways in the program could end up in different places.
But in practice most compilers will treat all string literals the same, and they will probably all end up in a read-only segment. So string literals used as function arguments, or used inside functions, will be stored in the same place as the "global" ones.
I was going through some documentation which states that
First Case
char * p_var="Sack";
will create a constant string literal.
And hence code like
p_var[1]="u";
will fail because of that property.
Second Case
Also mentioned is that this is possible only for character literals and not for other data types through pointers. So code like
float *p="3.14";
will fail, resulting in a compiler error.
But when i try it out i don't get compiler errors ,accessing it though gives me 0.000000f(using gcc on Ubuntu).
So regarding the above, i have three queries:
Why are string literals created in First Case read-only?
Why are only string literals allowed to be created and not other constants like float through pointers?
3. Why is Second Case not giving me compiler errors?
Update
Please discard the 3rd question and second case. I tested it by adding quotes.
Thanks
The premise is wrong: pointers don’t create any string literals, neither read-only nor writeable.
What does create a read-only string literal is the literal itself: "foo" is a read-only string literal. And if you assign it to a pointer, then that pointer points to a read-only memory location.
With that, let’s turn to your questions:
Why are string literals created in First Case read-only?
The real question is: why not? In most cases, you won’t want to change the value of a string literal later on so the default assumption makes sense. Furthermore, you can create writeable strings in C via other means.
Why are only string literals allowed to be created and not other constants like float?
Again, wrong assumption. You can create other constants:
float f = 1.23f;
Here, the 1.23f literal is read-only. You can also assign it to a constant variable:
const float f = 1.23f;
Why is Second Case not giving me compiler errors?
Because the compiler cannot check in general whether your pointer points to read-only memory or to writeable memory. Consider this:
char* p = "Hello";
char str[] = "world"; // `str` is a writeable string!
p = &str[0];
p[1] = 'x';
Here, p[1] = 'x' is entirely legal – if we hadn’t re-assigned p beforehand, it would have been illegal. Checking this cannot be generally done at compile-time.
Regarding your question:
Why are string literals created in First Case read-only?
char *p_var="Sack";
Well, the p_var is assigned with the starting address of the memory allocated to the string "Sack". p_var content is not read-only, since you haven't put the const keyword anywhere in C constructs. Although manipulating the p_var contents like strcpy or strcat may cause undefined behavior.
Quote C ISO 9899:
The declaration
char s[] = "abc", t[3] = "abc";
defines plain char array objects s and t whose elements are initialized with character string literals.
This declaration is identical to:
char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };
The contents of the arrays are modifiable. On the other hand, the declaration:
char *p = "abc";
defines p with type pointer to char and initializes it to point to an object with type array of char with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
An explanation of why it could be read-only per your platform and compiler:
Commonly string literals will be put in "read-only-data" section which gets mapped into the process space as read-only (which is why you seem to not being allowed to change it).
But some platforms do allow, the data segment to be writable.
Why are only string literals allowed to be created and not other constants like float? and the third question.
To create a float constant you should use:
const float f=1.5f;
Now, when you are doing:
float *p="3.14";
you are basically assigning the string literal's address to a float pointer.
Try compiling with -Wall -Werror -Wextra. You will find out what is happening.
It works because, in practice, there's no difference between a char * and a float * under the hood.
Its as if you are writing this:
float *p=(float*) "3.14";
This is a well-defined behaviour, unless the memory alignment requirements of float and char differ, in which case it results in undefined behaviour (Reference: C99, 6.3.2.3 p7).
Efficiency
they are
It's a string mind the quotes
float *p="3.14";
This is also a string literal !
Why are string literals created in First Case read-only?
No, both "sack" and "3.14" are string literals and both are read-only.
Why are only string literals allowed to be created and not other constants like float?
If you want to create a float const then do:
const float p=3.14;
Why is Second Case not giving me compiler errors?
You are making the pointer p point to a string literal. When you dereference p, it expects to read a float value. So there's nothing wrong as far as the compiler can see.
Wouldn't the pointer returned by the following function be inaccessible?
char *foo(int rc)
{
switch (rc)
{
case 1:
return("one");
case 2:
return("two");
default:
return("whatever");
}
}
So the lifetime of a local variable in C/C++ is practically only within the function, right? Which means, after char* foo(int) terminates, the pointer it returns no longer means anything, right?
I'm a bit confused about the lifetime of a local variable. What is a good clarification?
Yes, the lifetime of a local variable is within the scope({,}) in which it is created.
Local variables have automatic or local storage. Automatic because they are automatically destroyed once the scope within which they are created ends.
However, What you have here is a string literal, which is allocated in an implementation-defined read-only memory. String literals are different from local variables and they remain alive throughout the program lifetime. They have static duration [Ref 1] lifetime.
A word of caution!
However, note that any attempt to modify the contents of a string literal is an undefined behavior (UB). User programs are not allowed to modify the contents of a string literal.
Hence, it is always encouraged to use a const while declaring a string literal.
const char*p = "string";
instead of,
char*p = "string";
In fact, in C++ it is deprecated to declare a string literal without the const though not in C. However, declaring a string literal with a const gives you the advantage that compilers would usually give you a warning in case you attempt to modify the string literal in the second case.
Sample program:
#include<string.h>
int main()
{
char *str1 = "string Literal";
const char *str2 = "string Literal";
char source[]="Sample string";
strcpy(str1,source); // No warning or error just Undefined Behavior
strcpy(str2,source); // Compiler issues a warning
return 0;
}
Output:
cc1: warnings being treated as errors
prog.c: In function ‘main’:
prog.c:9: error: passing argument 1 of ‘strcpy’ discards qualifiers from pointer target type
Notice the compiler warns for the second case, but not for the first.
To answer the question being asked by a couple of users here:
What is the deal with integral literals?
In other words, is the following code valid?
int *foo()
{
return &(2);
}
The answer is, no this code is not valid. It is ill-formed and will give a compiler error.
Something like:
prog.c:3: error: lvalue required as unary ‘&’ operand
String literals are l-values, i.e: You can take the address of a string literal, but cannot change its contents.
However, any other literals (int, float, char, etc.) are r-values (the C standard uses the term the value of an expression for these) and their address cannot be taken at all.
[Ref 1]C99 standard 6.4.5/5 "String Literals - Semantics":
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type wchar_t, and are initialized with the sequence of wide characters...
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
It's valid. String literals have static storage duration, so the pointer is not dangling.
For C, that is mandated in section 6.4.5, paragraph 6:
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence.
And for C++ in section 2.14.5, paragraphs 8-11:
8 Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration (3.7).
9 A string literal that begins with u, such as u"asdf", is a char16_t string literal. A char16_t string literal has type “array of n const char16_t”, where n is the size of the string as defined below; it has static storage duration and is initialized with the given characters. A single c-char may produce more than one char16_t character in the form of surrogate pairs.
10 A string literal that begins with U, such as U"asdf", is a char32_t string literal. A char32_t string literal has type “array of n const char32_t”, where n is the size of the string as defined below; it has static storage duration and is initialized with the given characters.
11 A string literal that begins with L, such as L"asdf", is a wide string literal. A wide string literal has type “array of n const wchar_t”, where n is the size of the string as defined below; it has static storage duration and is initialized with the given characters.
String literals are valid for the whole program (and are not allocated not the stack), so it will be valid.
Also, string literals are read-only, so (for good style) maybe you should change foo to const char *foo(int)
Yes, it is valid code, see case 1 below. You can safely return C strings from a function in at least these ways:
const char* to a string literal. It can't be modified and must not be freed by caller. It is rarely useful for the purpose of returning a default value, because of the freeing problem described below. It might make sense if you actually need to pass a function pointer somewhere, so you need a function returning a string..
char* or const char* to a static char buffer. It must not be freed by the caller. It can be modified (either by the caller if not const, or by the function returning it), but a function returning this can't (easily) have multiple buffers, so it is not (easily) threadsafe, and the caller may need to copy the returned value before calling the function again.
char* to a buffer allocated with malloc. It can be modified, but it must usually be explicitly freed by the caller and has the heap allocation overhead. strdup is of this type.
const char* or char* to a buffer, which was passed as an argument to the function (the returned pointer does not need to point to the first element of argument buffer). It leaves responsibility of buffer/memory management to the caller. Many standard string functions are of this type.
One problem is, mixing these in one function can get complicated. The caller needs to know how it should handle the returned pointer, how long it is valid, and if caller should free it, and there's no (nice) way of determining that at runtime. So you can't, for example, have a function, which sometimes returns a pointer to a heap-allocated buffer which caller needs to free, and sometimes a pointer to a default value from string literal, which caller must not free.
Good question. In general, you would be right, but your example is the exception. The compiler statically allocates global memory for a string literal. Therefore, the address returned by your function is valid.
That this is so is a rather convenient feature of C, isn't it? It allows a function to return a precomposed message without forcing the programmer to worry about the memory in which the message is stored.
See also #asaelr's correct observation re const.
Local variables are only valid within the scope they're declared, however you don't declare any local variables in that function.
It's perfectly valid to return a pointer to a string literal from a function, as a string literal exists throughout the entire execution of the program, just as a static or a global variable would.
If you're worrying about what you're doing might be invalid undefined, you should turn up your compiler warnings to see if there is in fact anything you're doing wrong.
str will never be a dangling pointer, because it points to a static address where string literals resides.
It will be mostly read-only and global to the program when it will be loaded.
Even if you try to free or modify, it will throw a segmentation fault on platforms with memory protection.
A local variable is allocated on the stack. After the function finishes, the variable goes out of scope and is no longer accessible in the code. However, if you have a global (or simply - not yet out of scope) pointer that you assigned to point to that variable, it will point to the place in the stack where that variable was. It could be a value used by another function, or a meaningless value.
In the above example shown by you, you are actually returning the allocated pointers to whatever function that calls the above. So it would not become a local pointer. And moreover, for the pointers that are needed to be returned, memory is allocated in the global segment.