Initializing bit-fields - c

When you write
struct {
unsigned a:3, b:2;
} x = {10, 11};
is x.b guaranteed to be 3 by ANSI C (C89)? I have read and reread the standard, but can't seem to find exactly that case.
For example, "result that cannot be represented by the
resulting unsigned integer type is reduced modulo the number that is
one greater than the largest value that can be represented by the
resulting unsigned integer type." speaks about computation, not about initialization. And moreover, bit-field is not really a type.
Also, (when speaking about unsigned t:4) "contains values in the range [0,15]", but it doesn't necessarily mean that initializer must be reduced modulo 16 to be mapped to [0,15].
Struct initialization is really painstakingly detailedly described, but I really can't seem to find exactly that behavior. (Of course compilers do exactly that. And IBM documentation says " when you assign a value that is out of range to a bit field, the low-order bit pattern is preserved and the appropriate bits are assigned.", but I'd like to know if ANSI C standardizes that.

"ANSI C"/C89 has been obsolete for 25 years. Therefore, my answer cites the current C standard ISO 9899:2011, also known as C11.
Pretty much everything related to bit-fields in the C standard is poorly defined. Typically, you will not find anything explicitly addressing the behavior of bit fields, but their behavior is rather specified implicitly, "between the lines". This is why you should avoid using bit fields.
However, I believe that this specific case is well-defined: it should work like any other integer initialization.
The detailed struct initialization rules you mention (6.7.9) show how the literal 11 in the initializer list is related to the variable b. Nothing strange with that. What then applies is "simple assignment", the same thing that would happen as if you wrote x.b = 11;.
When doing any kind of assignment or initialization in C, the right operand is converted to the type of the left operand. This is specified by C11 6.5.16:
In simple assignment (=), the value of the right operand is converted
to the type of the assignment expression and replaces the value stored
in the object designated by the left operand.
In your case, the literal 11 of type int is converted to a bit field of unsigned int:2.
Therefore, the rule you are looking for should be found in the chapter dealing with conversions (C11 6.3). What applies is what you already cited in your question, C11 6.3.1.3:
...if the new type is unsigned, the value is converted by repeatedly
adding or subtracting one more than the maximum value that can be
represented in the new type until the value is in the range of the new
type.
The maximum value of an unsigned int:2 is 3. One more than the maximum value is 3+1=4. The compiler should repeatedly subtract this from the value 11:
11 - (3+1) = 7 does not fit, subtract once more:
7 - (3+1) = 3 does fit, store value 3
But then of course, this is the very same thing as taking the 2 least significant bits of the decimal value 11 and storing them in the bit field.

WRT "speaks about computation, not about initialization", the C89 standard explicitly applies the rules of assignment and conversion to initialization. It also says:
A bit-field is interpreted as an integral type consisting of the specified number of bits.
Given those, while a compiler warning would clearly be in order, it seems that throwing away upper-order bits is guaranteed by the standard.

Related

using bit shifting variables inside if statement - error or not

Suppose we have some variables x and y, and the following if statement which involves bit shifting:
if (x<<y)
I've read some posts which also deal with the issue of using bit shifting with variables (of some type) and inside if statement, but unfortunately I haven't been able to reach a unequivocal conclusion whether it is an error or not.
I assume that if it is an error, then it's a semantic error or a run-time error .
But is it necessarily en error ?
If x is of an unsigned integer type that is at least as large as unsigned int, and y is less than the number of bits in x's type, then the above partial statement will test whether bits in x that aren't in the top y are set. The C89 Standard would require that implementations behave likewise if x is of a signed type or a small unsigned type, with the caveat that setting the top bit of a small signed type is regarded as setting all bits beyond. The C99 and later standards, however, wouldn't require that implementations usefully process any situation in which x is non-zero but the expression x<<y would yield zero, unless x is an unsigned integer type at least as large as unsigned int.
It's not a syntactic error. if expects a parenthesized expression. (int_x<<int_y) satisfies that. The shift expression may cause a runtime error, but only if the particular values of int_x and int_y invoke undefined behavior (see 6.5.7 for when that might happen).

Are there any (valid) C implementations where float cannot represent the value 0?

If all floats are represented as x = (-1)^s * 2^e * 1.m , there is no way to store zero without support for special cases.
No, all conforming C implementations must support a floating-point value of 0.0.
The floating-point model is described in section 5.2.4.2.2 of the C standard (the link is to a recent draft). That model does not make the leading 1 in the significand (sometimes called the mantissa) implicit, so it has no problem representing 0.0.
Most implementations of binary floating-point don't store the leading 1, and in fact the formula you cited in the question:
x = (-1)^s * 2^e * 1.m
is typically correct (though the way e is stored can vary).
In such implementations, including IEEE, a special-case bit pattern, typically all-bits-zero, is used to represent 0.0.
Following up on the discussion in the comments, tmyklebu argues that not all numbers defined by the floating-point model in 5.2.4.2.2 are required to be representable. I disagree; if not all such numbers are required to be representable, then the model is nearly useless. But even leaving that argument aside, there is an explicit requirement that 0.0 must be representable. N1570 6.7.9 paragraph 10:
If an object that has static or thread storage duration is not
initialized explicitly, then:
...
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
...
This is a very long-standing requirement. A C reference from 1975 (3 years before the publication of K&R1) says:
The initial value of any externally-defined object not explicitly initialized is guaranteed to be 0.
which implies that there must be a representable 0 value. K&R1 (published in 1978) says, on page 198:
Static and external variables which are not initialized are guaranteed
to start off as 0; automatic and register variables which are not
initialized are guaranteed to start off as garbage.
Interestingly, the 1990 ISO C standard (equivalent to the 1989 ANSI C standard) is slightly less explicit than its predecessors and successors. In 6.5.7, it says:
If an object that has static storage duration is not initialized
explicitly, it is initialized implicitly as if every member that has
arithmetic type were assigned 0 and every member that has pointer type
were assigned a null pointer constant.
If a floating-point type were not required to have an exact representation for 0.0, then the "assigned 0" phrase would imply a conversion from the int value 0 to the floating-point type, yielding a small value close to 0.0. Still, C90 has the same floating-point model as C99 and C11 (but with no mention of subnormal or unnormalized values), and my argument above about model numbers still applies. Furthermore, the C90 standard was officially superseded by C99, which in turn was superseded by C11.
After o search a while a found this.
ISO/IEC 9899:201x section 6.2.5, Paragraph 13
Each complex type has the same representation and alignment requirements as an array
type containing exactly two elements of the corresponding real type; the first element is
equal to the real part, and the second element to the imaginary part, of the complex number.
section 6.3.1.7, Paragraph 1
When a value of real type is converted to a complex type, the real part of the complex
result value is determined by the rules of conversion to the corresponding real type and
the imaginary part of the complex result value is a positive zero or an unsigned zero.
So, if i understand this right, any implementations where supports C99 (first C standard with _Complex types), must support a floating-point value of 0.0.
EDIT
Keith Thompson pointed out that complex types are optional in C99, so this argument is pointless.
I believe the following floating-point system is an example of a conforming floating-point arithmetic without a representation for zero:
A float is a 48-bit number with one sign bit, 15 exponent bits, and 32 significand bits. Every choice of sign bit, exponent bits, and significant bits corresponds to a normal floating-point number with an implied leading 1 bit.
Going through the constraints in section 5.2.4.2.2 of the draft C standard Keith Thompson linked:
This floating-point system plainly conforms to paragraphs 1 and 2 of 5.2.4.2.2 in the draft standard.
We only represent normalised numbers; paragraph 3 merely permits us to go farther.
Paragraph 4 is tricky; it says that zero and "values that are not floating-point numbers" may be signed or unsigned. But paragraph 3 didn't force us to have any values that aren't floating-point numbers, so I can't imagine interpreting paragraph 4 as requiring there to be a zero.
The range of representable values in paragraph 5 is apparently -0x1.ffffffffp+16383 to 0x1.ffffffffp+16383.
Paragraph 6 states that +, -, *, / and the math library have implementation-defined accuracy. Still OK.
Paragraph 7 doesn't really constrain this implementation as long as we can find appropriate values for all the constants.
We can set FLT_ROUNDS to 0; this way, I don't even have to specify what happens when addition or subtraction overflows.
FLT_EVAL_METHOD shall be 0.
We don't have subnormals, so FLT_HAS_SUBNORM shall be zero.
FLT_RADIX shall be 2, FLT_MANT_DIG shall be 32, FLT_DECIMAL_DIG shall be 10, FLT_DIG shall be 9, FLT_MIN_EXP shall be -16383, FLT_MIN_10_EXP shall be -4933, FLT_MAX_EXP shall be 16384, and FLT_MAX_10_EXP shall be 4933.
You can work out FLT_MAX, FLT_EPSILON, FLT_MIN, and FLT_TRUE_MIN.

Is pointer tagging in C undefined according to the standard?

Some dynamically-typed languages use pointer tagging as a quick way to identify or narrow down the runtime type of the value being represented. A classic way to do this is to convert pointers to a suitably sized integer, and add a tag value over the least significant bits which are assumed to be zero for aligned objects. When the object needs to be accessed, the tag bits are masked away, the integer is converted to a pointer, and the pointer is dereferenced as normal.
This by itself is all in order, except it all hinges on one colossal assumption: that the aligned pointer will convert to an integer guaranteed to have zero bits in the right places.
Is it possible to guarantee this according to the letter of the standard?
Although standard section 6.3.2.3 (references are to the C11 draft) says that the result of a conversion from pointer to integer is implementation-defined, what I'm wondering is whether the pointer arithmetic rules in 6.5.2.1 and 6.5.6 effectively constrain the result of pointer->integer conversion to follow the same predictable arithmetic rules that many programs already assume. (6.3.2.3 note 67 seemingly suggests that this is the intended spirit of the standard anyway, not that that means much.)
I'm specifically thinking of the case where one might allocate a large array to act as a heap for the dynamic language, and therefore the pointers we're talking about are to elements of this array. I'm assuming that the start of the C-allocated array itself can be placed at an aligned position by some secondary means (by all means discuss this too though). Say we have an array of eight-byte "cons cells"; can we guarantee that the pointer to any given cell will convert to an integer with the lowest three bits free for a tag?
For instance:
typedef Cell ...; // such that sizeof(Cell) == 8
Cell heap[1024]; // such that ((uintptr_t)&heap[0]) & 7 == 0
((char *)&heap[11]) - ((char *)&heap[10]); // == 8
(Cell *)(((char *)&heap[10]) + 8); // == &heap[11]
&(&heap[10])[0]; // == &heap[10]
0[heap]; // == heap[0]
// So...
&((char *)0)[(uintptr_t)&heap[10]]; // == &heap[10] ?
&((char *)0)[(uintptr_t)&heap[10] + 8]; // == &heap[11] ?
// ...implies?
(Cell *)((uintptr_t)&heap[10] + 8); // == &heap[11] ?
(If I understand correctly, if an implementation provides uintptr_t then the undefined behaviour hinted at in 6.3.2.3 paragraph 6 is irrelevant, right?)
If all of these hold, then I would assume that it means that you can in fact rely on the low bits of any converted pointer to an element of an aligned Cell array to be free for tagging. Do they && does it?
(As far as I'm aware this question is hypothetical since the normal assumption holds for common platforms anyway, and if you found one where it didn't, you probably wouldn't want to look to the C standard for guidance rather than the platform docs; but that's beside the point.)
This by itself is all in order, except it all hinges on one colossal
assumption: that the aligned pointer will convert to an integer
guaranteed to have zero bits in the right places.
Is it possible to guarantee this according to the letter of the
standard?
It's possible for an implementation to guarantee this. The result of converting a pointer to an integer is implementation-defined, and an implementation can define it any way it likes, as long as it meets the standard's requirements.
The standard absolutely does not guarantee this in general.
A concrete example: I've worked on a Cray T90 system, which had a C compiler running under a UNIX-like operating system. In the hardware, an address is a 64-bit word containing the address of a 64-bit word; there were no hardware byte addresses. Byte pointers (void*, char*) were implemented in software by storing a 3-bit offset in the otherwise unused high-order 3 bits of a 64-bit word pointer.
All pointer-to-pointer, pointer-to-integer, and integer-to-pointer conversions simply copied the representation.
Which means that a pointer to an 8-byte aligned object, when converted to an integer, could have any bit pattern in its low-order 3 bits.
Nothing in the standard forbids this.
The bottom line: A scheme like the one you describe, that plays games with pointer representations, can work if you make certain assumptions about how the current system represents pointers -- as long as those assumptions happen to be valid for the current system.
But no such assumptions can be 100% reliable, because the standard says nothing about how pointers are represented (other than that they're of a fixed size for each pointer type, and that the representation can be viewed as an array of unsigned char).
(The standard doesn't even guarantee that all pointers are the same size.)
You're right about the relevant parts of the standard. For reference:
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
Since the conversions are implementation defined (except when the integer type is too small, in which case it's undefined), there's nothing the standard is going to tell you about this behaviour. If your implementation makes the guarantees you want, you're set. Otherwise, too bad.
I guess the answer to your explicit question:
Is it possible to guarantee this according to the letter of the standard?
Is "yes", since the standard punts on this behaviour and says the implementation has to define it. Arguably, "no" is just as good an answer for the same reason.

How is {int i=999; char c=i;} different from {char c=999;}?

My friend says he read it on some page on SO that they are different,but how could the two be possibly different?
Case 1
int i=999;
char c=i;
Case 2
char c=999;
In first case,we are initializing the integer i to 999,then initializing c with i,which is in fact 999.In the second case, we initialize c directly with 999.The truncation and loss of information aside, how on earth are these two cases different?
EDIT
Here's the link that I was talking of
why no overflow warning when converting int to char
One member commenting there says --It's not the same thing. The first is an assignment, the second is an initialization
So isn't it a lot more than only a question of optimization by the compiler?
They have the same semantics.
The constant 999 is of type int.
int i=999;
char c=i;
i created as an object of type int and initialized with the int value 999, with the obvious semantics.
c is created as an object of type char, and initialized with the value of i, which happens to be 999. That value is implicitly converted from int to char.
The signedness of plain char is implementation-defined.
If plain char is an unsigned type, the result of the conversion is well defined. The value is reduced modulo CHAR_MAX+1. For a typical implementation with 8-bit bytes (CHAR_BIT==8), CHAR_MAX+1 will be 256, and the value stored will be 999 % 256, or 231.
If plain char is a signed type, and 999 exceeds CHAR_MAX, the conversion yields an implementation-defined result (or, starting with C99, raises an implementation-defined signal, but I know of no implementations that do that). Typically, for a 2's-complement system with CHAR_BIT==8, the result will be -25.
char c=999;
c is created as an object of type char. Its initial value is the int value 999 converted to char -- by exactly the same rules I described above.
If CHAR_MAX >= 999 (which can happen only if CHAR_BIT, the number of bits in a byte, is at least 10), then the conversion is trivial. There are C implementations for DSPs (digital signal processors) with CHAR_BIT set to, for example, 32. It's not something you're likely to run across on most systems.
You may be more likely to get a warning in the second case, since it's converting a constant expression; in the first case, the compiler might not keep track of the expected value of i. But a sufficiently clever compiler could warn about both, and a sufficiently naive (but still fully conforming) compiler could warn about neither.
As I said above, the result of converting a value to a signed type, when the source value doesn't fit in the target type, is implementation-defined. I suppose it's conceivable that an implementation could define different rules for constant and non-constant expressions. That would be a perverse choice, though; I'm not sure even the DS9K does that.
As for the referenced comment "The first is an assignment, the second is an initialization", that's incorrect. Both are initializations; there is no assignment in either code snippet. There is a difference in that one is an initialization with a constant value, and the other is not. Which implies, incidentally, that the second snippet could appear at file scope, outside any function, while the first could not.
Any optimizing compiler will just make the int i = 999 local variable disappear and assign the truncated value directly to c in both cases. (Assuming that you are not using i anywhere else)
It depends on your compiler and optimization settings. Take a look at the actual assembly listing to see how different they are. For GCC and reasonable optimizations, the two blocks of code are probably equivalent.
Aside from the fact that the first also defines an object iof type int, the semantics are identical.
i,which is in fact 999
No, i is a variable. Semantically, it doesn't have a value at the point of the initialization of c ... the value won't be known until runtime (even though we can clearly see what it will be, and so can an optimizing compiler). But in case 2 you're assigning 999 to a char, which doesn't fit, so the compiler issues a warning.

Is using memcmp on array of int strictly conforming?

Is the following program a strictly conforming program in C? I am interested in c90 and c99 but c11 answers are also acceptable.
#include <stdio.h>
#include <string.h>
struct S { int array[2]; };
int main () {
struct S a = { { 1, 2 } };
struct S b;
b = a;
if (memcmp(b.array, a.array, sizeof(b.array)) == 0) {
puts("ok");
}
return 0;
}
In comments to my answer in a different question, Eric Postpischil insists that the program output will change depending on the platform, primarily due to the possibility of uninitialized padding bits. I thought the struct assignment would overwrite all bits in b to be the same as in a. But, C99 does not seem to offer such a guarantee. From Section 6.5.16.1 p2:
In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.
What is meant by "converted" and "replaces" in the context of compound types?
Finally, consider the same program, except that the definitions of a and b are made global. Would that program be a strictly conforming program?
Edit: Just wanted to summarize some of the discussion material here, and not add my own answer, since I don't really have one of my own creation.
The program is not strictly conforming. Since the assignment is by value and not by representation, b.array may or may not contain bits set differently from a.array.
a doesn't need to be converted since it is the same type as b, but the replacement is by value, and done member by member.
Even if the definitions in a and b are made global, post assignment, b.array may or may not contain bits set differently from a.array. (There was little discussion about the padding bytes in b, but the posted question was not about structure comparison. c99 lacks a mention of how padding is initialized in static storage, but c11 explicitly states it is zero initialized.)
On a side note, there is agreement that the memcmp is well defined if b was initialized with memcpy from a.
My thanks to all involved in the discussion.
In C99 §6.2.6
§6.2.6.1 General
1 The representations of all types are unspecified except as stated in this subclause.
[...]
4 [..] Two values (other than NaNs) with the same object representation compare equal, but values that compare equal may have different object representations.
6 When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.42)
42) Thus, for example, structure assignment need not copy any padding bits.
43) It is possible for objects x and y with the same effective type T to have the same value when they are accessed as objects of type T, but to have different values in other contexts. In particular, if == is defined for type T, then x == y does not imply that memcmp(&x, &y, sizeof (T)) == 0. Furthermore, x == y does not necessarily imply that x and y have the same value; other operations on values of type T may distinguish between them.
§6.2.6.2 Integer Types
[...]
2 For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits;[...]
[...]
5 The values of any padding bits are unspecified.[...]
In J.1 Unspecified Behavior
The value of padding bytes when storing values in structures or unions (6.2.6.1).
[...]
The values of any padding bits in integer representations (6.2.6.2).
Therefore there may be bits in the representation of a and b that differ while not affecting the value. This is the same conclusion as the other answer, but I thought that these quotes from the standard would be good additional context.
If you do a memcpy then the memcmp would always return 0 and the program would be strictly conforming. The memcpy duplicates the object representation of a into b.
My opinion is that it is strictly conforming. According to 4.5 that Eric Postpischil mentioned:
A strictly conforming program shall use only those features of the
language and library specified in this International Standard. It
shall not produce output dependent on any unspecified, undefined, or
implementation-defined behavior, and shall not exceed any minimum
implementation limit.
The behavior in question is the behavior of memcmp, and this is well-defined, without any unspecified, undefined or implementation-defined aspects. It works on the raw bits of the representation, without knowing anything about the values, padding bits or trap representations. Thus the result (but not the functionality) of memcmp in this specific case depends on the implementation of the values stored within these bytes.
Footnote 43) in 6.2.6.2:
It is possible for objects x and y with the same effective type T to
have the same value when they are accessed as objects of type T, but
to have different values in other contexts. In particular, if == is
defined for type T, then x == y does not imply that memcmp(&x, &y,
sizeof (T)) == 0. Furthermore, x == y does not necessarily imply that
x and y have the same value; other operations on values of type T may
distinguish between them.
EDIT:
Thinking it a bit further, I'm not so sure about the strictly conforming anymore because of this:
It shall not produce output dependent on any unspecified [...]
Clearly the result of memcmp depends on the unspecified behavior of the representation, thereby fulfilling this clause, even though the behavior of memcmp itself is well defined. The clause doesn't say anything about the depth of functionality until the output happens.
So it is not strictly conforming.
EDIT 2:
I'm not so sure that it will become strictly conforming when memcpy is used to copy the struct. According to Annex J, the unspecified behavior happens when a is initialized:
struct S a = { { 1, 2 } };
Even if we assume that the padding bits won't change and memcpy always returns 0, it still uses the padding bits to obtain its result. And it relies on the assumption that they won't change, but there is no guarantee in the standard about this.
We should differentiate between paddings bytes in structs, used for alignment, and padding bits in specific native types like int. While we can safely assume that the padding bytes won't change, but only because there is no real reason for it, the same does not apply for the padding bits. The standard mentions a parity flag as an example of a padding bit. This may be a software function of the implementation, but it may as well be a hardware function. Thus there may be other hardware flags used for the padding bits, including one that changes on read accesses for whatever reason.
We will have difficulties in finding such an exotic machine and implementation, but I see nothing that forbid this. Correct me if I'm wrong.

Resources