I have a question about string literals inside compound literals.
As the lifetime of the struct S compound literal is limited to the block surrounding the func call, I wonder if it is still alright to use the name pointer stored in the global gs variable. I guess the string literal "foo" is not bound to the lifetime of the compound literal but rather resides in .rodata? Is that right? At least gs.name still prints foo.
#include <stddef.h>
#include <stdio.h>
struct S
{
    const char *name;
    int size;
};
static struct S gs;
static void func(struct S *s)
{
    gs = *s;
}
int main(void)
{
    {
        func(&(struct S){.name="foo",.size=20});
    }
    printf("name: %s size: %d\n", gs.name, gs.size);
    return 0;
}
I guess the string literal "foo" is not bound to the lifetime of the compound literal but rather resides in .rodata?
Yes. That is correct. In
func(&(struct S){.name="foo",.size=20});
where "foo" initializes a pointer, it's almost as if you had written (if that could be written in C):
func(&(struct S){.name=(static const char[]){'f','o','o','\0'},.size=20});
(Before C23, C didn't allow static on compound literals. Also, C's string literals are de facto const char arrays, but their formal type is char[] without the const, for historical reasons.)
But be careful with it. In C, you can also use string literals to initialize arrays, and if .name were a member array, then storing "it" (its decayed pointer) in a global variable would be ill-advised.
Related
#include <stdio.h>
void modify(char **s){
    char *new_name = "Australia";
    *s = new_name;
}
int main(){
    char *name = "New Holland";
    modify(&name);
    printf("%s\n", name);
    return 0;
}
Here we are trying to modify the string literal name of char* type, which is stored in a read-only location. So does this fall into undefined behavior?
First of all, you do not modify the string; you assign to the pointers, and pointers are not strings! They can only reference the C strings (which are nul-terminated arrays of char).
It's my understanding of the C standard (6.5.2.5, especially the examples, quoted below) that string literals have static storage duration, and it is not UB because you assign a reference to the string literal (not to the local variable). It does not matter where string literals are physically stored.
EXAMPLE 5 The following three expressions have different meanings:
"/tmp/fileXXXXXX"
(char []){"/tmp/fileXXXXXX"}
(const char []){"/tmp/fileXXXXXX"}
The first always has static storage
duration and has type array of char, but need not be modifiable; the
last two have automatic storage duration when they occur within the
body of a function, and the first of these two is modifiable.
Is it safe to initialize pointers using compound literals in this way, and is it possible at all?
#include <stdio.h>
#include <string.h>
void numbers(int **p)
{
    *p = (int []){1, 2, 3};
}
void chars(char **p)
{
    *p = (char[]){'a','b','c'};
}
int main()
{
    int *n;
    char *ch;
    numbers(&n);
    chars(&ch);
    printf("%d %c %c\n", n[0], ch[0], ch[1]);
}
output:
1 a b
I don't understand exactly how it works. Isn't it the same as initializing a pointer with the address of a local variable?
Also, if I try to print:
printf("%s\n", ch);
it prints nothing.
A compound literal declared inside a function has automatic storage duration associated with its enclosing block (C 2018 6.5.2.5 5), which means its lifetime ends when execution of the block ends.
Inside numbers, *p = (int []){1, 2, 3}; assigns the address of the compound literal to *p. When numbers returns, the compound literal ceases to exist, and the pointer is invalid. After this, the behavior of a program that uses the pointer is undefined. The program might be able to print values because the data is still in memory, or the program might print different values because memory has changed, or the program might trap because it tried to access inaccessible memory, or the entire behavior of the program may change in drastic ways because compiler optimization changed the undefined behavior into something else completely.
It depends on where the compound literal is placed.
C17 6.5.2.5 §5
The value of the compound literal is that of an unnamed object initialized by the
initializer list. If the compound literal occurs outside the body of a function, the object
has static storage duration; otherwise, it has automatic storage duration associated with
the enclosing block.
That is, if the compound literal is at local scope, it works exactly like a local variable/array and it is not safe to return a pointer to it from a function.
If it is however declared at file scope, it works like any other variable with static storage duration, and you can safely return a pointer to it. However, doing so is probably an indication of questionable design. Plus you'll get the usual thread-safety issues in a multi-threaded application.
Here is one not-so-common way of initializing the pointer:
int *p = (int[10]){[1]=1};
Here, the pointer points to a compound literal.
#include <stdio.h>
int main(void)
{
    int *p = (int[10]){[1]=1};
    printf("%d\n", p[1]);
}
Output:
1
This program compiles and runs fine with the G++ compiler.
So,
Is this a correct way of initializing a pointer to a compound literal? Or
is it undefined behaviour to initialize a pointer to a compound literal?
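Yes, it is valid to have a pointer to a compound literal. The standard allows this. Inside a function, as in the question, the unnamed object has automatic storage duration, so the pointer is valid until the end of the enclosing block. At file scope, the object has static storage duration: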
n1570-§6.5.2.5 (p8):
EXAMPLE 1 The file scope definition
int *p = (int []){2, 4};
initializes p to point to the first element of an array of two ints, the first having the value two and the second, four. The expressions in this compound literal are required to be constant. The unnamed object
has static storage duration.
I came across a piece of code making the following initialization:
static const uint8_t s[] = {"Some string"};
I would expect it to be interpreted as follows: the right side is an array of char pointers with a single element, which points to the string literal "Some string", while the left side is an array of uint8_t. I would then expect the first element of s to receive some truncated value of the pointer to the string literal, causing unexpected behavior in the following code, which assumes s is a string.
I've made the following test code:
#include <stdint.h>
#include <stdio.h>
static const uint8_t s1[] = "String1";
static const uint8_t s2[] = { "String2" };
int main(void){
    printf("%p, %p\n", s1, s2);
    printf("%s, %s\n", s1, s2);
    return 0;
}
To my surprise, that is not what happens. Not only does the code work correctly, but the disassembly also shows that both s1 and s2 are initialized with the corresponding strings in an identical way.
Is this something gcc specific? Does C syntax permit enclosing a single string literal in {} and still interpreting it as a string literal?
Quoted from N1570 (the final draft of C11), 6.7.9 Initialization (emphasis mine):
An array of character type may be initialized by a character string
literal or UTF-8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.
sun qingyao's answer correctly mentions that you can add extra braces to such an initializer. It's worth mentioning that this does not only apply to arrays:
int x = { 0 };
compiles even though the element being initialized is not an array. This is thanks to the following clause:
6.7.9.11
The initializer for a scalar shall be a single expression, optionally enclosed in braces.
But why would such a thing be allowed? The answer is that this makes it possible to initialize values with a single syntax:
T x = { 0 };
works for any T and zero-initializes everything (for structs, every member, for arrays, every element, for scalar types, just initializes the value).
Consider this code:
char* foo(int myNum) {
    char* StrArray[5] = {"TEST","ABC","XYZ","AA","BB"};
    return StrArray[4];
}
When I return StrArray[4] to the caller, is this supposed to work?
Since the array is defined on the stack, when the caller gets the pointer, that part of memory has gone out of scope. Or will this code work?
This code will work. You are returning the value of the pointer in StrArray[4], which points to a constant string "BB". Constant strings have a lifetime equal to that of your entire program.
The important thing is the lifetime of what the pointer points to, not where the pointer is stored. For example, the following similar code would not work:
char* foo(int myNum) {
    char bb[3] = "BB";
    char* StrArray[5] = {"TEST","ABC","XYZ","AA",bb};
    return StrArray[4];
}
This is because the bb array is a temporary value on the stack of the foo() function, and disappears when you return.
Beware: you're lying to the compiler.
Each element of StrArray points to read-only char data (a string literal);
You're telling the compiler the return value of your function is a modifiable char *.
Lie to the compiler and it will get its revenge sooner or later.
In C, the way to declare a pointer to read-only data is to qualify it with const.
I'd write your code as:
const char* foo(int myNum) {
    const char* StrArray[5] = {"TEST","ABC","XYZ","AA","BB"};
    return StrArray[4];
}
The code will work. The pointer you are returning (StrArray[4]) points to the string literal "BB". String literals in C are anonymous array objects with static storage duration, which means that they live as long as your program lives (i.e. forever). It doesn't matter where you create that string literal. Even if it is introduced inside a function, it still has static storage duration.
Just remember, that string literals are not modifiable, so it is better to use const char* pointers with string literals.
C uses arrays with indices beginning at 0. So the first element is StrArray[0].
Thus StrArray[5] is not an element of the array declared in your code.
C will allow you to write code that returns StrArray[5], but what happens is undefined and will differ between OSes and compilers; often it will crash the program.
Only the array of pointers is on the stack, not the string-literal-constants.