String initializer with curly braces - c

I came across a piece of code with the following initialization:
static const uint8_t s[] = {"Some string"};
I would expect it to be interpreted as follows: the right side is an array of char pointers with a single element, which points to the string literal "Some string", while the left side is an array of uint8_t. I would then expect the first element of s to receive some truncated value of the pointer to the string literal, causing unexpected behavior in any following code that treats s as a string.
I've made the following test code:
#include <stdint.h>
#include <stdio.h>
static const uint8_t s1[] = "String1";
static const uint8_t s2[] = { "String2" };
int main(void){
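printf("%p, %p\n", (const void *)s1, (const void *)s2); /* %p expects void * */
printf("%s, %s\n", (const char *)s1, (const char *)s2); /* %s expects char * */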
return 0;
}
To my surprise, this does not happen. Not only does the code work correctly, but the disassembly also shows that s1 and s2 are both initialized with the corresponding strings in an identical way.
Is this something gcc-specific? Does C syntax permit enclosing a single string literal in {} and still interpret it as a string literal?

Quoted from N1570 (the final draft of C11), 6.7.9 Initialization, paragraph 14 (emphasis mine):
An array of character type may be initialized by a character string
literal or UTF-8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.

sun qingyao's answer correctly mentions that you can add extra braces to such an initializer. It's worth mentioning that this does not only apply to arrays:
int x = { 0 };
compiles even though the element being initialized is not an array. This is thanks to the following clause:
6.7.9 Initialization, paragraph 11:
The initializer for a scalar shall be a single expression, optionally enclosed in braces.
But why is such a thing allowed? Because it makes it possible to initialize values of any type with a single syntax:
T x = { 0 };
works for any complete object type T that is not a variable-length array, and zero-initializes everything: every member of a struct, every element of an array, and, for a scalar type, the value itself.

Related

Is this a syntax error or an error with the compiler?

#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char charName[] = "John";
int charAge[] = 35;
printf("There once was a man named %s\n", charName);
printf("He was %s\n", charAge);
return 0;
}
First error: array initializer must be an initializer list or wide string literal
int charAge[] = 35;
Second error: make: *** [<builtin>: main.o] Error 1
exit status 2
I did everything I could to fix this, but nothing worked.
The error comes from the fact that you're declaring an array of integers, int charAge[], and yet you're initializing it with an individual value, 35, rather than with a braced list.
Based on your usage, you don't want an array. So, you can just do
int charAge = 35;
Furthermore, when you're printing charAge, you need to use %i (or %d; that's also acceptable for integers) instead of %s. %s is for char arrays like charName. That is, you should do
printf("He was %i\n", charAge);
If it were a compiler bug, it would have been fixed already.
If you declare an array then the initializer must be a braced list like
int charAge[] = { 35 };
The only exception is character arrays initialized with string literals (and, per paragraph 15 quoted below, wchar_t arrays initialized with wide string literals), like
char charName[] = "John";
which may also be written as
char charName[] = { "John" };
From the C Standard (6.7.9 Initialization)
14 An array of character type may be initialized by a character string
literal or UTF-8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.
15 An array with element type compatible with a qualified or
unqualified version of wchar_t may be initialized by a wide string
literal, optionally enclosed in braces. Successive wide characters of
the wide string literal (including the terminating null wide character
if there is room or if the array is of unknown size) initialize the
elements of the array.
16 Otherwise, the initializer for an object that has aggregate or union type shall be a brace-enclosed list of initializers for the
elements or named members.
Note that this call to printf
printf("He was %s\n", charAge);
invokes undefined behavior, because %s expects a char * while charAge is converted to an int *.
Judging by the name of the array, it seems you intended to declare it with the element type char, like
char charAge[] = "35";
Otherwise you need to write
printf("He was %d\n", *charAge);
In general, the elements of an array of an integer type (except arrays that contain strings) should be output one by one in a loop.
Though in this particular case there is no point in declaring an array. You could declare just a scalar object
int charAge = 35;
In this case the call to printf looks like
printf("He was %d\n", charAge);

Why pointers can't be used to index arrays? [duplicate]

This question already has answers here:
Why do I get a segmentation fault when writing to a "char *s" initialized with a string literal, but not "char s[]"?
I am trying to change the values of a character array's elements using a pointer, but I am not able to do so. Is there a fundamental difference between declaring arrays using the two different methods i.e. char A[] and char *A?
I tried accessing the array using A[0] and it worked, but I am not able to change the values of its elements.
{
char *A = "ab";
printf("%c\n", A[0]); //Works. I am able to access A[0]
A[0] = 'c'; //Segmentation fault. I am not able to edit A[0]
printf("%c\n", A[0]);
}
Expected output:
a
c
Actual output:
a
Segmentation fault
The difference is that char A[] defines an array and char * does not.
The most important thing to remember is that arrays are not pointers.
In this declaration:
char *A = "ab";
the string literal "ab" creates an anonymous array object of type char[3] (2 plus 1 for the terminating '\0'). The declaration creates a pointer called A and initializes it to point to the initial character of that array.
The array object created by a string literal has static storage duration (meaning that it exists through the entire execution of your program) and does not allow you to modify it. (Strictly speaking an attempt to modify it has undefined behavior.) It really should be const char[3] rather than char[3], but for historical reasons it's not defined as const. You should use a pointer to const to refer to it:
const char *A = "ab";
so that the compiler will catch any attempts to modify the array.
In this declaration:
char A[] = "ab";
the string literal does the same thing, but the array object A is initialized with a copy of the contents of that array. The array A is modifiable because you didn't define it with const -- and because it's an array object you created, rather than one implicitly created by a string literal, you can modify it.
An array indexing expression like A[0] actually requires a pointer as one of its operands (and an integer as the other). Very often that pointer will be the result of an array expression "decaying" to a pointer, but it can also be just a pointer, as long as that pointer points to an element of an array object.
The relationship between arrays and pointers in C is complicated, and there's a lot of misinformation out there. I recommend reading section 6 of the comp.lang.c FAQ.
You can use either an array name or a pointer to refer to elements of an array object. You ran into a problem with an array object that's read-only. For example:
#include <stdio.h>
int main(void) {
char array_object[] = "ab"; /* array_object is writable */
char *ptr = array_object; /* or &array_object[0] */
printf("array_object[0] = '%c'\n", array_object[0]);
printf("ptr[0] = '%c'\n", ptr[0]);
}
Output:
array_object[0] = 'a'
ptr[0] = 'a'
String literals like "ab" are supposed to be immutable, like any other literal (you can't alter the value of a numeric literal like 1 or 3.1419, for example). Unlike numeric literals, however, string literals require some kind of storage to be materialized. Some implementations (such as the one you're using, apparently) store string literals in read-only memory, so attempting to change the contents of the literal will lead to a segfault.
The language definition leaves the behavior undefined - it may work as expected, it may crash outright, or it may do something else.
String literals are not meant to be overwritten, think of them as read-only. It is undefined behavior to overwrite the string and your computer chose to crash the program as a result. You can use an array instead to modify the string.
char A[3] = "ab";
A[0] = 'c';
Is there a fundamental difference between declaring arrays using the two different methods i.e. char A[] and char *A?
Yes, because the second one is not an array but a pointer.
The type of "ab" is char /*readonly*/ [3]. It is an array with immutable content. So when you want a pointer to that string literal, you should use a pointer to char const:
char const *foo = "ab";
That keeps you from altering the literal by accident. If you however want to use the string literal to initialize an array:
char foo[] = "ab"; // the size of the array is determined by the initializer
// here: 3 - the characters 'a', 'b' and '\0'
The elements of that array can then be modified.
By the way, array indexing is nothing more than syntactic sugar:
foo[bar]; /* is the same as */ *(foo + bar);
That's why one can do funny things like
"Hello!"[2]; /* 'l' but also */ 2["Hello!"]; // 'l'

Why does C not allow concatenating strings when using the conditional operator?

The following code compiles without problems:
#include <stdio.h>
int main() {
printf("Hi" "Bye");
}
However, this does not compile:
#include <stdio.h>
int main() {
int test = 0;
printf("Hi" (test ? "Bye" : "Goodbye"));
}
What is the reason for that?
As per the C11 standard, §5.1.1.2, concatenation of adjacent string literals happens during translation, in phase 6:
Adjacent string literal tokens are concatenated.
On the other hand:
printf("Hi" (test ? "Bye" : "Goodbye"));
involves the conditional operator, which is evaluated at run-time. So, at compile time, during the translation phase, there are no adjacent string literals present, hence the concatenation is not possible. The syntax is invalid and thus reported by your compiler.
To elaborate a bit on the why: in translation phase 6, which follows the execution of preprocessing directives but precedes syntax analysis, adjacent string literals are concatenated and represented as a single string literal token. Storage is allocated accordingly, and the concatenated string literal is treated as a single entity (one string literal).
On the other hand, run-time concatenation would need a destination with enough memory to hold the combined result; otherwise there would be nowhere to store the concatenated output. String literals are allocated at compile time and cannot be extended or appended to, so there is no way for the result to be presented as a single string literal. This construct is therefore inherently incorrect.
Just FYI, for run-time concatenation of strings (not literals) there is the library function strcat(), which concatenates two strings. Notice what the description says:
char *strcat(char * restrict s1,const char * restrict s2);
The strcat() function appends a copy of the string pointed to by s2 (including the
terminating null character) to the end of the string pointed to by s1. The initial character
of s2 overwrites the null character at the end of s1. [...]
So s1 must be a modifiable string with enough room for the result, not a string literal. However, since the contents of s2 are not altered in any way, s2 can very well be a string literal.
According to the C Standard (5.1.1.2 Translation phases)
1 The precedence among the syntax rules of translation is specified by
the following phases.
6. Adjacent string literal tokens are concatenated.
And only after that
7. White-space characters separating tokens are no longer significant. Each
preprocessing token is converted into a token. The resulting
tokens are syntactically and semantically analyzed and translated as a
translation unit.
In this construction
"Hi" (test ? "Bye" : "Goodbye")
there are no adjacent string literal tokens. So this construction is invalid.
String literal concatenation is performed by the preprocessor at compile-time. There is no way for this concatenation to be aware of the value of test, which is not known until the program actually executes. Therefore, these string literals cannot be concatenated.
Because the general case is that you wouldn't have a construction like this for values known at compile-time, the C standard was designed to restrict the auto-concatenation feature to the most basic case: when the literals are literally right alongside each other.
But even if the standard had worded or constructed this restriction differently, your example would still be impossible to realise without making the concatenation a run-time process. For that, we have library functions such as strcat.
Because C has no string type. String literals are compiled to char arrays, referenced by a char* pointer.
C allows adjacent literals to be combined at compile-time, as in your first example. The C compiler itself has some knowledge about strings. But this information is not present at runtime, and thus concatenation cannot happen.
During the compilation process, your first example is "translated" to:
int main() {
static const char char_ptr_1[] = {'H', 'i', 'B', 'y', 'e', '\0'};
printf(char_ptr_1);
}
Note how the two strings are combined to a single static array by the compiler, before the program ever executes.
However, your second example is "translated" to something like this:
int main() {
static const char char_ptr_1[] = {'H', 'i', '\0'};
static const char char_ptr_2[] = {'B', 'y', 'e', '\0'};
static const char char_ptr_3[] = {'G', 'o', 'o', 'd', 'b', 'y', 'e', '\0'};
int test = 0;
printf(char_ptr_1 (test ? char_ptr_2 : char_ptr_3));
}
It should be clear why this does not compile. The ternary operator ? is evaluated at runtime, not compile-time, when the "strings" no longer exist as such, but only as simple char arrays, referenced by char* pointers. Unlike adjacent string literals, adjacent char pointers are simply a syntax error.
If you really want to have both branches produce compile-time string constants to be chosen at runtime, you'll need a macro.
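#include <stdio.h>
/* s a and s b are adjacent literals after expansion, so each branch is
   concatenated at compile time; only the choice happens at run time. */
#define ccat(s, t, a, b) ((t) ? (s a) : (s b))
int main(int argc, char **argv) {
printf("%s\n", ccat("hello ", argc > 2, "y'all", "you"));
return 0;
}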
What is the reason for that?
Your code uses the conditional operator to choose between two string literals. Whether or not the condition is known at compile time, the result of that expression is not a string literal token, so it cannot be concatenated with "Hi". Even the statement printf("Hi" (1 ? "Bye" : "Goodbye")); wouldn't compile; the reason is explained in depth in the answers above. Another way to make such a statement compile while keeping the conditional operator is to use a format specifier and pass the result of the conditional expression as an additional argument to printf. Even then, printf() would only give the impression of "having concatenated" those strings, and only at run time.
#include <stdio.h>
int main() {
int test = 0;
printf("Hi %s\n", (test ? "Bye" : "Goodbye")); //specify format and print as result
}
In printf("Hi" "Bye"); you have two consecutive arrays of char which the compiler can make into a single array.
In printf("Hi" (test ? "Bye" : "Goodbye")); you have one array followed by a pointer to char (an array converted to a pointer to its first element). The compiler cannot merge an array and a pointer.
To answer the question, I would go to the definition of printf. The function printf expects a const char * as its first argument. Any string literal such as "Hi" can be passed there; however, an expression such as (test) ? "str1" : "str2" is not itself a string literal, because which operand it yields is determined only at run time, so it cannot be concatenated with "Hi" at compile time, and the compiler duly complains. On the other hand, this works perfectly well: printf("hi %s", test ? "yes" : "no")
This does not compile because the parameter list for the printf function is
(const char *format, ...)
and
("Hi" (test ? "Bye" : "Goodbye"))
does not fit the parameter list.
gcc tries to make sense of it by imagining that
(test ? "Bye" : "Goodbye")
is a parameter list, and complains that "Hi" is not a function.

Modifying the array element in called function

I am trying to understand passing a string to a called function and modifying the elements of the array inside the called function.
#include <stdio.h>
void foo(char p[]){
p[0] = 'a';
printf("%s",p);
}
int main(void){
char p[] = "jkahsdkjs";
p[0] = 'a';
printf("%s",p);
foo("fgfgf");
}
The code above throws an exception. I know that string in C is immutable, but I would like to know what the difference is between modifying it in main and modifying it in the called function. What happens in case of other data types?
I know that string in C is immutable
That's not true. The correct version is: modifying a string literal in C is undefined behavior.
In main(), you defined the string as:
char p[] = "jkahsdkjs";
which is a non-literal character array, so you can modify it. But what you passed to foo is "fgfgf", which is a string literal.
Change it to:
char str[] = "fgfgf";
foo(str);
would be fine.
In the first case:
char p[] = "jkahsdkjs";
p is an array that is initialized with a copy of the string literal. Since you don't specify the size, it is determined by the length of the string literal plus the terminating null character. This is covered in the draft C99 standard section 6.7.8 Initialization paragraph 14:
An array of character type may be initialized by a character string literal, optionally
enclosed in braces. Successive characters of the character string literal (including the
terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
In the second case:
foo("fgfgf");
you are attempting to modify a string literal, which is undefined behavior: the behavior of the program is unpredictable, and an exception is one possibility. From the C99 draft standard section 6.4.5 String literals paragraph 6 (emphasis mine):
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
The difference is in how you are initializing p[].
char p[] = "jkahsdkjs";
This initializes a writable array called p, automatically sized to be large enough to contain your string, with automatic storage duration (typically on the stack).
However, in the case of:
foo("fgfgf");
You are passing in a pointer to the actual string literal, which most implementations place in read-only memory.
What happens in case of other data types?
String literals are a very special case. Other data types, such as int, do not have an analogous issue, since a scalar argument is passed by value: the called function receives a copy.

Simple modification of C strings using pointers

I have two pointers to the same C string. If I increment the second pointer by one, and assign the value of the second pointer to that of the first, I expect the first character of the first string to be changed. For example:
#include <stdio.h>
int main() {
char* original_str = "ABC"; // Get pointer to "ABC"
char* off_by_one = original_str; // Duplicate pointer to "ABC"
off_by_one++; // Increment duplicate by one: now "BC"
*original_str = *off_by_one; // Set 1st char of one to 1st char of other
printf("%s\n", original_str); // Prints "ABC" (why not "BBC"?)
*original_str = *(off_by_one + 1); // Set 1st char of one to 2nd char of other
printf("%s\n", original_str); // Prints "ABC" (why not "CBC"?)
return 0;
}
This doesn't work. I'm sure I'm missing something obvious - I have very, very little experience with C.
Thanks for your help!
You are attempting to modify a string literal. String literals are not modifiable (i.e., they are read-only).
A program that attempts to modify a string literal exhibits undefined behavior: the program may be able to "successfully" modify the string literal, the program may crash (immediately or at a later time), a program may exhibit unusual and unexpected behavior, or anything else might happen. All bets are off when the behavior is undefined.
Your code declares original_str as a pointer to the string literal "ABC":
char* original_str = "ABC";
If you change this to:
char original_str[] = "ABC";
you should be good to go. This declares an array of char that is initialized with the contents of the string literal "ABC". The array is automatically given a size of four elements (at compile-time), because that is the size required to hold the string literal (including the null terminator).
The problem is that you can't modify the literal "ABC", which is read-only.
Try char original_str[] = "ABC";, which uses an array to hold a string that you can modify.

Resources