Result of comparing hardcoded strings with == - c

The following C code attempts to compare hardcoded strings using ==:
#include <stdbool.h>
bool test_address_aliasing() {
const char * a = "1";
const char * b = "1";
return a==b;
}
This will compare pointers to a and b, and an optimizing compiler will most likely combine the references to a and b to store them in the same place, thus resulting in a==b being true. However, a compiler is not required to combine references to equivalent strings, so is the actual result implementation defined? (I am most interested in C99 if the answer depends on the version of the C spec.)

It is unspecified whether or not two string constants with the same content point to the same object. Section 6.4.5p7 of the C11 standard regarding string constants states:
It is unspecified whether these arrays are distinct provided
their elements have the appropriate values. If the program
attempts to modify such an array, the behavior is undefined.
The actual pointer comparison is valid in either case (see C11 6.5.9p6):
Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (including a pointer to an object
and a subobject at its beginning) or function, both are pointers to
one past the last element of the same array object, or one is a
pointer to one past the end of one array object and the other is a
pointer to the start of a different array object that happens to
immediately follow the first array object in the address
space.

From C standard, 6.4.5 "String literals", item 6:
It is unspecified whether these [string literals] arrays are distinct provided their
elements have the appropriate values.
That means, result of this comparison is unspecified.

Since the question specifically sought advice on C99 (not C17), relevant quotes from the 1999 C standard follow;
Section 6.4.5 "String Literals", para 6 says
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
Section 6.5.2.5 "Compound literals", para 8.
String literals, and compound literals with const-qualified types, need not designate
distinct objects.
The above also refers to footnote 82, which (while informative, not normative) says;
This allows implementations to share storage for string literals and constant compound literals with the same or overlapping representations.
Para 14 of the same section also provides an example;
EXAMPLE 6 Like string literals, const-qualified compound literals can be placed into read-only memory and can even be shared. For example,
(const char []){"abc"} == "abc"
might yield 1 if the literals’ storage is shared.
Section J.1 (informative) lists a bunch of things that are unspecified in the standard, of which the 14th in the list is
Whether two string literals result in distinct arrays (6.4.5).
From all of the above, it is unspecified whether two string literals with equivalent content are distinct. Two string literals of the form "abc" are allowed but not required to be at the same address in memory. This also means that the result of a (pointer) comparison would be unspecified.

if you do so:
char *a = "1";
char *b ="1";
and then:
if(a==b){ DO SOMETHING} HERE you are comparing the address of memory, where the two pointers are stored.
To compare the content of the memory where the variables are stored(and so to compare the value of the 2 variables) you have to do so:
if(*a == *b){DO SOMETHING}
I hope you mean that

Related

Struct vs string literals? Read only vs read-write? [duplicate]

This question already has answers here:
Why are compound literals in C modifiable
(2 answers)
Why do I get a segmentation fault when writing to a "char *s" initialized with a string literal, but not "char s[]"?
(19 answers)
Closed 4 years ago.
Does the C99 standard permit writing to compound literals (structs)? It seems it doesn't provide writing to literal strings. I ask about this because it says in C Programming: A Modern Approach, 2nd Edition on Page 406.
Q. Allowing a pointer to a compound literal would seem to make it possible to modify the literal. Is that the case?
A. Yes. Compound literals are lvalues that can be modified.
But, I don't quite get how that works, and how that works with string literals which you certainly can't modify.
char *foo = "foo bar";
struct bar { char *a; int g; };
struct bar *baz = &(struct bar){.a = "foo bar", .g = 5};
int main () {
// Segfaults
// (baz->a)[0] = 'X';
// printf( "%s", baz->a );
// Segfaults
// foo[0] = 'a';
// printf("%s", foo);
baz->g = 9;
printf("%d", baz->g);
return 0;
}
You can see on my list of things that segfault, writing to baz->a causes a segfault. But, writing to baz->g does not. Why is that one of them would cause a segfault and not the other one? How are struct-literals different from string-literals? Why would struct-literals not also be put into read-only section of memory and is the behavior defined or undefined for both of these (standards question)?
First thing first: your struct literal has a pointer member initialized to a string literal. The members of the struct itself are writeable, including the pointer member. It is only the content of the string literal that is not writeable.
String literals were part of the language since the beginning, while struct literals (officially known as compound literals) are a relatively recent addition, as of C99. By that time many implementations existed that placed string literals in read-only memory, especially on embedded systems with tiny amounts of RAM. By then designers of the standard had a choice of requiring string literals to be moved to a writeable location, allowing struct literals to be read-only, or leaving things as-is. None of the three solutions was ideal, so it looks like they went on the path of least resistance, and left everything the way it is.
Does the C99 standard permit writing to compound literals (structs)?
C99 standard does not explicitly prohibit writing to data objects initialized with compound literals. This is different from string literals, whose modification is considered undefined behavior by the standard.
The standard essentially defines the same characteristics to string literals and to compound literals with a const-qualified type used outside the body of a function.
Lifetime
String literals: Always static.
§6.4.5p6 In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence.
Compound literals: Automatic if used inside a function body, otherwise static.
§6.5.2.5p5 The value of the compound literal is that of an unnamed object initialized by the initializer list. If the compound literal occurs outside the body of a function, the object has static storage duration; otherwise, it has automatic storage duration associated with the enclosing block.
Possibly shared
Both string literals and const-qualified compound literals might be shared. You should be prepared for the possibility but cannot rely on it happening.
§6.4.5p7 It is unspecified whether [the arrays created for the string literals] are distinct provided their elements have the appropriate values.
§6.5.2.5p7 String literals, and compound literals with const-qualified types, need not designate distinct objects.
Mutability
Modifying either a string literal or a const-qualified compound literal is undefined behaviour. Indeed attempting to modify any const-qualified object is undefined behaviour, although the wording of the standard is probably subject to hair-splitting.
§6.4.5p7 If the program attempts to modify [the array containing a string literal], the behavior is undefined.
§6.7.3p6 If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.
A non-const-qualified compound literal can be freely modified. I don't have a quote for this, but the fact that modification is not explicitly prohibited seems to me to be definitive. It's not necessary to explicitly say that mutable objects may be mutated.
The fact that the lifetime of compound literals inside function bodies is automatic can lead to subtle bugs:
/* This is fine */
const char* foo(void) {
return "abcde";
}
/* This is not OK */
const int* oops(void) {
return (const int[]){1, 2, 3, 4, 5};
;
Does the C99 standard permit writing to compound literals (structs)?
By writing to compound literal if you mean modifying elements of a compound literal, then yes, it does if it is not a read only compound literal.
C99-6.5.2.5:
If the type name specifies an array of unknown size, the size is determined by the initializer list as specified in 6.7.8, and the type of the compound literal is that of the completed array type. Otherwise (when the type name specifies an object type), the type of the compound literal is that specified by the type name. In either case, the result is an lvalue.
It means, compound literals are lvalues like arrays, and elements of a compound literal can be modified, just like you can modify an aggregate type. For example
// 1
((int []) {1,2,3})[0] = 100; // OK
// 2
(char[]){"Hello World"}[0] = 'Y'; // OK. This is not a string literal!
// 3
char* str = (char[]){"Hello World"};
*str = 'Y'; // OK. Writing to a compound literal via pointer.
// 4
(const float []){1e0, 1e1, 1e2}[0] = 1e7 // ERROR. Read only compound literal
In your code what you are trying to do is modifying a compound literal element which is pointing to a string literal which is non-modifiable. If that element is initialized with a compound literal then it can be modified.
struct bar *baz = &(struct bar){.a = (char[]){"foo bar"}, .g = 5};
This snippet will work now
Segfaults
(baz->a)[0] = 'X';
printf( "%s", baz->a );
Further standard also gives an example, in the same section mentioned above, and differentiate between a string literal, compound literal and read only compound literal:
13 EXAMPLE 5 The following three expressions have different meanings:
"/tmp/fileXXXXXX"
(char []){"/tmp/fileXXXXXX"}
(const char []){"/tmp/fileXXXXXX"}
The first always has static storage duration and has type array of char, but need not be modifiable; the last two have automatic storage duration when they occur within the body of a function, and the first of these two is modifiable.

Identically-valued pointers comparing unequal [duplicate]

Inspired by this answering this question, I dug a little into the C11 and C99 standards for the use of equality operators on pointers (the original question concerns relational operators). Here's what C11 has to say (C99 is similar) at §6.5.9.6:
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.94)
Footnote 94 says (and note that footnotes are non-normative):
Two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated. If prior invalid pointer operations (such as accesses outside array bounds) produced undefined behavior, subsequent comparisons also produce undefined behavior.
The body of the text and the non-normative note appear to be in conflict. If one takes the 'if and only if' from the body of the text seriously, then in no other circumstances than those set out should equality be returned, and there is no room for UB. So, for instance this code:
uintptr_t a = 1;
uintptr_t b = 1;
void *ap = (void *)a;
void *bp = (void *)b;
printf ("%d\n", ap <= bp); /* UB by §6.5.8.5 */
printf ("%d\n", ap < bp); /* UB by §6.5.8.5 */
printf ("%d\n", ap == bp); /* false by §6.5.9.6 ?? */
should print zero, as ap and bp are neither pointers to the same object or function, or any of the other bits set out.
In §6.5.8.5 (relational operators) the behaviour is more clear (my emphasis):
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object or incomplete types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
Questions:
I am correct that there is some ambiguity as to when equality operators with pointers are permitted UB (comparing the footnote and the body of the text)?
If there is no ambiguity, when precisely can comparison of pointers with equality operators be UB? For instance, is it always UB if at least one pointer is artificially created (per above)? What if one pointer refers to memory that has been free()d? Given the footnote is non-normative, can one conclude there is never UB, in the sense that all 'other' comparisons must yield false?
Does §6.5.9.6 really mean that equality comparison of meaningless but bitwise equal pointers should always be false?
Note this question is tagged language-lawyer; I am not asking what in practice compilers do, as I believe already know the answer to that (compare them using the same technique as comparing integers).
Am I correct that there is some ambiguity as to when equality operators with pointers are UB?
No, because this passage from §6.5.9(3):
The == and != operators are analogous to the relational operators except for their lower precedence.
Implies that the following from §6.5.9(6) also applies to the equality operators:
When two pointers are compared [...] In all other cases, the behavior is undefined.
If there is no ambiguity, when precisely can comparison of pointers with equality operators be UB?
There is undefined behaviour in all cases for which the standard does not explicitly define the behaviour.
Is it always UB if at least one pointer is artificially created converted from an arbitrary integer?
§6.3.2.3(5):
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
What if one pointer refers to memory that has been freed?
§6.2.4(2):
The value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime.
can one conclude there is never UB, in the sense that all 'other' comparisons must yield false?
No. The standard defines under what conditions two pointers must compare equal, and under what conditions two pointers must compare not equal. Any equality comparisons between two pointers that falls outside both of those two sets of conditions invokes undefined behaviour.
Does §6.5.9(6) really mean that equality comparison of meaningless but bitwise equal pointers should always be false?
No, it is undefined.

Difference between char array declaration forms

we had this question in programming exam, and we are all debating the correct answer, soo what do you think?
3.1 Which of the following is an incorrect string initialization?
(a) char plant[] = "Tree";
(b) char plant[] = {'T','R','E','E'};
(c) char plant[80] = "Tree";
(d) char plant[80] = {'T','R','E','E'};
(e) None of the above
thanks in advance :)
They're all syntactically valid, but I'm assuming what the question is leaning towards is that (b) will simply create a char [4] - that is, it will not be null terminated, whereas the other three will be.
The C99 and draft C11 standards define explicitly that a string is null-terminated: 7 Library 7.1.1 Definitions “A string is a contiguous sequence of characters terminated by and including the first null character.” The term “string” being so defined – and more than just a convention in the libraries – a “incorrect string initialization” (as referred to in the question) could be one that does not include a null character.
The C11 standard stipulates in 6.7.9 ¶22 “If an array of unknown size is initialized, its size is determined by the largest indexed element with an explicit initializer.” C99 6.7.8 ¶22 says the same. This is the case in (b), which therefore is unterminated and incorrect:
char plant[] = {'T','R','E','E'};
6.7.9 /6.7.8 ¶21 that “If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration”; ¶10 says that such objects are filled with (various) zeroes; this implies that (c) and (d) are null-terminated:
char plant[80] = "Tree";
char plant[80] = {'T','R','E','E'};
6.7.9 / 6.7.8 ¶14 says “An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.” This implies that this (a) is null-terminated:
char plant[] = "Tree";

Array as compound literal [duplicate]

This question already has answers here:
Why are compound literals in C modifiable
(2 answers)
Closed 4 years ago.
In C99 we can use compound literals as unnamed array.
But are this literals constants like for example 100, 'c', 123.4f, etc.
I noticed that I can do:
((int []) {1,2,3})[0] = 100;
and, I have no compilation error and is guessable that the first element of that unnamed array is modified with 100.
So it seems as array as compound literal are lvalue and not constant value.
It is an lvalue, we can see this if we look at the draft C99 standard section 6.5.2.5 Compound literals it says (emphasis mine):
If the type name specifies an array of unknown size, the size is
determined by the initializer list as specified in 6.7.8, and the type
of the compound literal is that of the completed array type. Otherwise
(when the type name specifies an object type), the type of the
compound literal is that specified by the type name. In either case,
the result is an lvalue.
If you want a const version, later on in the same section it gives the following example:
EXAMPLE 4 A read-only compound literal can be specified through
constructions like:
(const float []){1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6}
We can find an explanation of the terminology in this Dr Dobb's article The New C: Compound Literals and says:
Compound literals are not true constants in that the value of the
literal might change, as is shown later. This brings us to a bit of
terminology. The C99 and C90 Standards [2, 3] use the word “constant”
for tokens that represent truly unchangeable values that are
impossible to modify in the language. Thus, 10 and 3.14 are an integer
decimal constant and a floating constant of type double, respectively.
The word “literal” is used for the representation of a value that
might not be so constant. For example, early C implementations
permitted the values of quoted strings to be modified. C90 and C99
banned the practice by saying that any program than modified a string
literal had undefined behavior, which is the Standard’s way of saying
it might work, or the program might fail in a mysterious way. [...]
As far I remeber you are right, compound literals are lvalues*, you can also take pointer of such literal (which points to its first element):
int *p = (int []){1, 2, 3};
*p = 5; /* modified first element */
It is also possible to apply const qualifier on such compound literal, so elements are read-only:
const int *p = (const int []){1, 2, 3};
*p = 5; /* wrong, violation of `const` qualifier */
*Note this not means it's automatically modifiable lvalue (so it can used as left operand for assignment operator) since it has array type and refering to C99 draft 6.3.2.1 Lvalues, arrays, and function designators:
A modifiable lvalue is an lvalue that does not have array type, [...]
Referring to the C11 standard draft N1570:
Section 6.5.2.5p4:
In either case, the result is an lvalue.
An "lvalue" is, roughly, an expression that designates an object -- but it's important to note that not all lvalues are modifiable. A simple example:
const int x = 42;
The name x is an lvalue, but it's not a modifiable lvalue. (Expressions of array type cannot be modifiable lvalues, because you can't assign to an array object, but the elements of an array may be modifiable.)
Paragraph 5 of the same section:
The value of the compound literal is that of an unnamed object
initialized by the initializer list. If the compound literal occurs
outside the body of a function, the object has static storage
duration; otherwise, it has automatic storage duration associated with
the enclosing block.
The section describing compound literals doesn't specifically say that whether the unnamed object is modifiable or not. In the absence of such a statement, the object is taken to be modifiable unless the type is const-qualified.
The example in the question:
((int []) {1,2,3})[0] = 100;
is not particularly useful, since there's no way to refer to the unnamed object after the assignment. But a similar construct can be quite useful. A contrived example:
#include <stdio.h>
int main(void) {
int *ptr = (int[]){1, 2, 3};
ptr[0] = 100;
printf("%d %d %d\n", ptr[0], ptr[1], ptr[2]);
}
As mentioned above, the array has automatic storage duration, which means that if it's created inside a function, it will cease to exist when the function returns. Compound literals are not a replacement for malloc.
Compound literals are lvalues and it's elements can be modifiable. You can assign value to it. Even pointer to compound literals are allowed.

Are the literal and constant the same concept in C?

Are the literal and constant the same concept in C?
Is there any difference between them in usage?
Literals and constants are significantly different things in C. It can be said that the term literal in C stands for unnamed object that occupies memory (literals are typically lvalues), while the term constant stands for (possibly named) value that does not necessarily occupy memory (constants are rvalues).
"Classic" C (C89/90) had only one kind of literal: string literal. There were no other kinds of literals in that C. C99 additionally introduced so called compound literals.
Meanwhile, the term constant refers to explicit values, like 1, 2.5f, 0xA and 's'. Additionally, enum members are also recognized as constants in C.
Again, since literals in C are lvalues, you can take and use their addresses
const char *s = "Hello";
char (*p)[6] = &"World";
int (*a)[4] = &(int []) { 1, 2, 3, 4 };
Since constants are rvalues, you can't take their addresses.
Objects declared with const keyword are not considered constants in C terminology. They cannot be used where constant values are required (like case labels, bit-field widths or global,static variable initialization).
P.S. Note that the relevant terminology in C is quite different from that of C++. In C++ the term literal actually covers most of what is known in C as constants. And in C++ const objects can form constant expressions. People are sometimes trying to impose C++ terminology onto C, which often leads to confusion.
(See also Shall I prefer constants over defines?)
The C standard (specifically ISO/IEC 9899, Second edition, 1999-12-01) does not define “literal” by itself, so this is not a specific concept for C. I find three uses of “literal”, described below.
First, a string literal (or “string-literal” in the formal grammar) is a sequence of characters inside quotation marks, optionally prefixed with an “L” (which makes it a wide string literal). This is undoubtedly called a literal because each character in the string stands for itself: A ”b” in the source text results in a “b” in the string. (In contrast the characters “34” in the source text as a number, not a string, result in the value 34 in the program, not in the characters “3” and “4”.) The semantics of these literals are defined in paragraphs 4 and 5 of 6.4.5. Essentially, adjacent literals are concatened ("abc" "def" is made into "abcdef"), a zero byte is appended, and the result is used to initialize an array of static storage duration. So a string literal results in an object.
Second, a compound literal is a more complicated construct used to create values for structures. “Compound literal” is an unfortunate name, because it is not a literal at all. That is, the characters in the source text of the literal do not necessarily stand for exactly themselves. In fact, a compound literal is not even a constant. A compound literal can contain expressions that are evaluated at run-time, including function calls!
Third, in 6.1 paragraph 1, the standard indicates that “literal words and character set members” are indicated by bold type. This use of “literal” is describing the standard itself rather than describing things within the C language; it means that “goto” in the standard means the string “goto” in the C language.
Aside from string literals, I do not think “literal” is a particular concept in C.
”Constant”, however, is a significant concept.
First, there are simple constants, such as “34”, “4.5f”, and “'b'”. The latter is a character constant; although written with a character, it has an integer value. Constants include integer constants (in decimal, octal, and hexadecimal), floating constants (in decimal and hexadecimal), character constants, and enumeration constants. Enumeration constants are names specified in “enum” constructs.
Second, there are constant expressions, defined in 6.6. A constant expression can be evaluated during translation (compile time) rather than run time and may be used any place that a constant may be. 6.6 sets certain rules for constant expressions. As an example, if x is a static array, as declared by static int x[8];, then &x[(int) (6.8 * .5)] is a constant expression: It is the address of element 3 of x. (You would rarely using floating-point to index arrays, but I include it in the example to show it is allowed as part of a constant expression.)
Regarding “const”: The standard does not appear to specifically define a const-qualified object as constant. Rather, it says that attempting to modify an object defined with a const-qualified type through an lvalue (such as a dereferenced pointer) with non-const-qualified type, the behavior is undefined. This implies a C implementation is not required to enforce the constantness of const objects. (Also, note that having a pointer-to-const does not imply the pointed-to object is const, just that the dereferenced pointer is not a modifiable lvalue. There could be a separate pointer to the same object that is not const-qualified. For example, in int i; int *p = &i; const int *q = p;, q is a pointer-to-const-int, but i is not a const object, and p is a pointer to the same int although it is pointer-to-int, without const.)
Not quite. Constant variables (as opposed to constants defined in macros) are actual variables with dedicated storage space, so you can take the address of them. You can't take the address of a literal.
Edit: Sorry everyone, it appears that I was mistaken.

Resources