Signed bit field representation - C

I made a bit field with a field sized 1 bit, and used int instead of unsigned. Later on, when I tried to check the value of the field, I found that it was -1.
I used this code to check the binary representation and the value of my bit field:
#include <stdio.h>
#include <stdlib.h>

union {
    struct {
        int bit:1;
    } field;
    int rep;
} n;

int main() {
    int c, k;
    n.field.bit = 1;
    for (c = 31; c >= 0; c--) {
        k = n.rep >> c;
        if (k & 1)
            printf("1");
        else
            printf("0");
    }
    printf("\n %d \n", n.field.bit);
    return 0;
}
the output was:
00000000000000000000000000000001
-1
In that case, why is the value of my bit field -1, and will it always be a negative number when I use signed int instead of unsigned?

You should never use plain int as the bit-field type if you expect anything about the value other than that it can hold n bits: according to the C11 standard (6.7.2p5), it is implementation-defined whether int in a bit-field designates a signed or an unsigned type:
5 Each of the comma-separated multisets designates the same type, except that for bit-fields, it is implementation-defined whether the specifier int designates the same type as signed int or the same type as unsigned int.
In your case the int designates the same type as signed int; this is the default in GCC:
Whether a “plain” int bit-field is treated as a signed int bit-field or as an unsigned int bit-field (C90 6.5.2, C90 6.5.2.1, C99 and C11 6.7.2, C99 and C11 6.7.2.1).
By default it is treated as signed int but this may be changed by the -funsigned-bitfields option.
Thus any sane program always specifies either signed int or unsigned int, depending on which is appropriate for the current use case.
Then it is implementation-defined whether signed numbers use one's complement, two's complement, or sign-and-magnitude. With one's complement or sign-and-magnitude, the only bit in a 1-bit field is the sign bit, so the only value that can be stored is 0; a signed bit-field of one bit therefore only really makes sense with two's complement.
Your system seems to use 2's complement - this is e.g. what GCC always uses:
Whether signed integer types are represented using sign and magnitude, two’s complement, or one’s complement, and whether the extraordinary value is a trap representation or an ordinary value (C99 and C11 6.2.6.2).
GCC supports only two’s complement integer types, and all bit patterns are ordinary values.
and thus the bit values 1 and 0 are interpreted in terms of signed two's complement numbers: the former has the sign bit set, so it is negative (-1), and the latter doesn't have the sign bit set, so it is non-negative (0).
Thus for a signed bit-field of 2 bits, the possible bit patterns and their integer values on a 2's complement machine are
00 - has int value 0
01 - has int value 1
10 - has int value -2
11 - has int value -1
In an n-bit signed bit-field, the minimum value is -2^(n-1) and the maximum is 2^(n-1) - 1.
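To make the table concrete, here is a minimal sketch (the struct and field names are mine, not from the question) that stores each 2-bit pattern and prints the resulting value; it assumes a two's complement implementation such as GCC, where the out-of-range stores of 2 and 3 wrap in an implementation-defined way:

#include <stdio.h>

struct two_bits {
    signed int f:2;   /* explicitly signed 2-bit field: holds -2..1 */
};

int main(void) {
    struct two_bits t;
    int pattern;
    for (pattern = 0; pattern < 4; pattern++) {
        /* Storing 2 or 3 is out of range for the field; the result is
           implementation-defined, and on GCC it wraps to -2 and -1. */
        t.f = pattern;
        printf("%d%d - has int value %d\n", (pattern >> 1) & 1, pattern & 1, t.f);
    }
    return 0;
}

On GCC this prints exactly the four lines of the table above.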
Now, when arithmetic is performed on a signed integer operand whose rank is less than int, it is converted to int first, so the value -1 is sign-extended to a full-width int; the same happens under the default argument promotions, so the value arrives at printf as a (full-width) int.
Thus, if you expect a sensible value from a one-bit bit-field, use unsigned bit:1; or, if it is to be understood as a boolean flag, _Bool bit:1;.
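A minimal sketch of that fix (the struct and field names are illustrative); both fields below hold 0 or 1 and never print as negative:

#include <stdio.h>

struct flags {
    unsigned int ubit:1;  /* holds 0 or 1 */
    _Bool        bbit:1;  /* boolean flag, also holds 0 or 1 */
};

int main(void) {
    struct flags f = { .ubit = 1, .bbit = 1 };
    /* Both fields promote to int with value 1 when passed to printf. */
    printf("%d %d\n", f.ubit, f.bbit);   /* prints: 1 1 */
    return 0;
}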

When you call a variadic function (like printf), some arguments are promoted. For example, a bit-field undergoes integer promotion, where it is promoted to an ordinary int value. That promotion brings sign extension with it (because the base type of your bit-field is signed), and the sign extension makes the value -1.
When using bit-fields, almost always use unsigned types as the base.

Related

Range of character is positive, so if I store an integer in it, it should still print a value between 0-255. Why is the output -24?

#include <stdio.h>

int main(void)
{
    int i = 1000;
    char c = 'A';
    c = i;
    printf("%d", c);
    return 0;
}
The output is -24.
Why this output, when the range of character is 0-255?
When explaining that behaviour, the following things need to be considered:
First, in an assignment like c = i in
int i = 1000;
char c = 'A';
c = i;
we need to consider that i is converted to the type of c before assignment. Integral conversion is defined here in an online C99 standard draft:
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if the value
can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.60)
3 Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined or an
implementation-defined signal is raised.
So it is necessary to know whether char is a signed integral type or not. This can differ from compiler to compiler, but it seems that your compiler treats plain char as signed char by default.
In case of signed char, the "result is implementation-defined", and we would need to have a look at the compiler's specification.
A common implementation is that the integral value 1000, which in binary is 00000011 11101000, is truncated to 8 bits and stored in the char object.
What 11101000 then means for a signed char is defined in the representation of types:
6.2.6.2 Integer types
2 For signed integer types, the bits of the object representation
shall be divided into three groups: value bits, padding bits, and the
sign bit. There need not be any padding bits; signed char shall not
have any padding bits. There shall be exactly one sign bit. Each bit
that is a value bit shall have the same value as the same bit in the
object representation of the corresponding unsigned type (if there are
M value bits in the signed type and N in the unsigned type, then M <=
N ). If the sign bit is zero, it shall not affect the resulting value.
If the sign bit is one, the value shall be modified in one of the
following ways:
the corresponding value with sign bit 0 is negated (sign and magnitude);
the sign bit has the value -(2^M) (two's complement);
the sign bit has the value -(2^M - 1) (ones' complement).
Which of these applies is implementation-defined, as ...
Again, the result is defined by the implementation of the compiler; but a common interpretation is two's complement, interpreting the most significant bit as the sign bit:
In your case, the 8 bits 11101000 are one sign bit (set) and 7 value bits; the value bits 1101000 are 104. In two's complement the sign bit contributes -2^7 = -128, so the actual value is -128 + 104 = -24.
Note that it is not guaranteed that the result is -24; other compilers might yield different results.
There is even one more step to consider, as you print the signed character value using format specifier "%d":
printf("%d",c)
This means that the negative signed char value gets promoted to int; but that yields the "same" negative value. I omit the explanation of "promotion" and of why arguments to printf are promoted at all.
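Here is a small sketch of those steps, assuming 8-bit char, signed plain char, and two's complement, as on the asker's compiler; the variable names are mine:

#include <stdio.h>

int main(void) {
    int i = 1000;               /* binary: ...00000011 11101000 */
    unsigned char low8 = i;     /* conversion keeps the low 8 bits: 11101000 = 232 */
    signed char sc = i;         /* out of range: implementation-defined, commonly the same bits */

    printf("low 8 bits, unsigned: %d\n", low8);  /* 232 */
    printf("same bits, signed:    %d\n", sc);    /* commonly -128 + 104 = -24 */
    return 0;
}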
As @JohnBollinger said, the typical range of type char, which is usually signed, is -128 to 127. When you assign 1000 (1111101000 in binary) to the char, only the 8 least significant bits (assuming your chars are 8 bits) are kept, leaving 11101000 in binary. This translates to -24 when it's printed as a signed integer.
Your compiler by default considers the type char as a signed integer type similarly to the type signed char.
The range of values for the type signed char is
— minimum value for an object of type signed char
SCHAR_MIN    -127    // −(2^7 − 1)
— maximum value for an object of type signed char
SCHAR_MAX    +127    // 2^7 − 1
From the C Standard (5.2.4.2.1 Sizes of integer types <limits.h>)
2 If the value of an object of type char is treated as a signed
integer when used in an expression, the value of CHAR_MIN shall be the
same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same
as that of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and
the value of CHAR_MAX shall be the same as that of UCHAR_MAX. 20) The
value UCHAR_MAX shall equal 2^CHAR_BIT − 1.
Here is a demonstrative program where there is used the type unsigned char instead of the type char.
#include <stdio.h>

int main(void)
{
    int i = 1000;
    unsigned char c;

    printf("i = %#x\n", i);
    c = i;
    printf("c = %d\n", c);

    return 0;
}
The program output is
i = 0x3e8
c = 232
As you can see, an object of type signed char cannot hold a value as big as 232. The bit pattern that represents 232 in an unsigned char represents -24 when interpreted as a signed char, because the two interpretations differ by exactly 2^8 = 256: 232 - 256 = -24.

Type punning a positive signed integer into an unsigned (and vice versa)

union Positive_Small {
    int8_t  s;
    uint8_t u;
};

union Positive_Small x = {.s = 3};
union Positive_Small y = {.u = 4};
assert(x.u == 3);
assert(y.s == 4);
Is this defined behaviour?
Does the Standard guarantee that the positive range of a signed integral type has the same representation as its unsigned equivalent?
I imagine that there is no implementation crazy enough (DS9K maybe?) to not do it, but is it defined?
Succinctly, yes — the standard guarantees that for the shared positive range of values, the bitwise representation of the values for the unsigned type are the same as for the signed type.
C11 Section 6.2.5 Types defines this (and a lot of other terminology and behaviour):
¶6 For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements. The type _Bool and the unsigned integer types that correspond to the standard signed integer types are the standard unsigned integer types. The unsigned integer types that correspond to the extended signed integer types are the extended unsigned integer types. The standard and extended unsigned integer types are collectively called unsigned integer types.40)
¶9 The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same.41) A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
40) Therefore, any statement in this Standard about unsigned integer types also applies to the extended unsigned integer types.
41) The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.
As dbush pointed out, section 6.2.6 Representation of types, and section 6.2.6.2 Integer types specifically, also contains relevant information:
¶2 For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:
the corresponding value with sign bit 0 is negated (sign and magnitude);
the sign bit has the value -(2^M) (two's complement);
the sign bit has the value -(2^M - 1) (ones' complement).
Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.
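To make the question's snippet runnable and to contrast it with a value outside the shared range, here is a sketch (the includes, the main wrapper, and the extra variable z are mine): on a two's complement implementation z.u comes out as 255, which the guarantee quoted above does not cover.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

union Positive_Small {
    int8_t  s;
    uint8_t u;
};

int main(void) {
    union Positive_Small x = { .s = 3 };
    union Positive_Small y = { .u = 4 };
    assert(x.u == 3);   /* guaranteed: 3 is in the shared non-negative range */
    assert(y.s == 4);   /* likewise for 4 */

    union Positive_Small z = { .s = -1 };
    /* -1 is outside the shared range; on two's complement z.u is 255. */
    printf("%d\n", z.u);
    return 0;
}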

Weird behavior of right shift in C (sometimes arithmetic, sometimes logical)

GCC version 5.4.0
Ubuntu 16.04
I have noticed some weird behavior with the right shift in C, depending on whether or not I store the value in a variable first.
This code snippet prints 0xf0000000, which is the behavior I expected:
#include <stdio.h>

int main() {
    int x = 0x80000000;
    printf("%x", x >> 3);
}
The following two code snippets print 0x10000000, which seems very weird to me: it appears to perform a logical shift on a negative number.
1.
#include <stdio.h>

int main() {
    int x = 0x80000000 >> 3;
    printf("%x", x);
}
2.
#include <stdio.h>

int main() {
    printf("%x", (0x80000000 >> 3));
}
Any insight would be really appreciated. I do not know whether it is a specific issue with my personal computer, in which case it can't be replicated, or just how C behaves.
Quoting from https://en.cppreference.com/w/c/language/integer_constant, for a hexadecimal integer constant without any suffix:
The type of the integer constant is the first type in which the value can fit, from the list of types which depends on which numeric base and which integer-suffix was used.
int
unsigned int
long int
unsigned long int
long long int (since C99)
unsigned long long int (since C99)
Also, later
There are no negative integer constants. Expressions such as -1 apply the unary minus operator to the value represented by the constant, which may involve implicit type conversions.
So, if int has 32 bits on your machine, 0x80000000 has the type unsigned int, as it can't fit in an int and can't be negative.
The statement
int x = 0x80000000;
converts the unsigned int to an int in an implementation-defined way, but the statement
int x = 0x80000000 >> 3;
performs a right shift on the unsigned int before converting it to an int, so the results you see are different.
EDIT
Also, as M.M noted, the format specifier %x requires an unsigned integer argument and passing an int instead causes undefined behavior.
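If you want to check the literal's type directly, here is a small diagnostic sketch I am adding (C11 _Generic; the TYPE_NAME macro is mine, and it assumes a 32-bit int):

#include <stdio.h>

#define TYPE_NAME(x) _Generic((x),              \
        int:           "int",                   \
        unsigned int:  "unsigned int",          \
        long:          "long",                  \
        unsigned long: "unsigned long",         \
        default:       "something else")

int main(void) {
    printf("0x7FFFFFFF has type %s\n", TYPE_NAME(0x7FFFFFFF)); /* int           */
    printf("0x80000000 has type %s\n", TYPE_NAME(0x80000000)); /* unsigned int  */
    return 0;
}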
Right shift of a negative integer has implementation-defined behavior, so when right-shifting a negative number you can't "expect" anything.
It is just whatever your implementation does; it is not weird.
6.5.7/5 [...] If E1 has a signed type and a negative value, the resulting value is implementation-defined.
It may also invoke the UB
6.5.7/4 [...] If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
As noted by @P__J__, right-shifting a negative value is implementation-defined, so you should not rely on it being consistent across platforms.
As for your specific test, which runs on a single platform (possibly 32-bit Intel, or another platform that uses a two's complement 32-bit representation of integers) yet still shows two different behaviors:
The statement x = 0x80000000 >> 3 is not compiled into code that does a right shift at run time; the compiler sees that both operands are constants and folds the expression into x = 0x10000000. The key point is that the literal 0x80000000 is NOT a negative number: with a 32-bit int it has type unsigned int and the positive value 2^31, so the compile-time shift is a logical shift on an unsigned value.
On the other hand, x = 0x80000000 tries to store the value 2^31 into x, but a 32-bit two's complement signed integer cannot represent that value. The conversion of an out-of-range value to a signed type is implementation-defined; on GCC it wraps, so the high-order bit ends up in the sign bit and x holds a negative value, though you get no warning or error by default. Then, when you use x >> 3, the shift is performed at run time on a 32-bit int holding a negative number, and GCC implements it as an arithmetic shift.
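Here is a side-by-side sketch of the two paths (the variable names are mine); both the out-of-range store into int and the right shift of a negative value are implementation-defined, and the comments describe what GCC documents:

#include <stdio.h>

int main(void) {
    /* Path 1: shift the unsigned int literal, then store. Logical shift. */
    unsigned int folded = 0x80000000 >> 3;       /* 0x10000000 */

    /* Path 2: store first (implementation-defined wrap on GCC), then shift
       the now-negative int at run time (arithmetic shift on GCC). */
    int x = 0x80000000;
    int shifted = x >> 3;

    printf("%x\n", folded);              /* 10000000 */
    printf("%x\n", (unsigned)shifted);   /* f0000000 on GCC */
    return 0;
}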

Type of integer literals and ~ in C

I'm a C beginner, and I'm confused by the following example found in the C answer book.
One way to find the size of unsigned long long on your system is to type:
printf("%llu", (unsigned long long) ~0);
I have no idea why this syntax works?
On my system, int is 32 bits and long long is 64 bits.
What I expected was that, since 0 is a constant of type int, ~0 calculates the negation of a 32-bit integer, which is then converted to an unsigned long long by the cast operator. This should give 2^32 - 1 as a result.
Somehow, it looks like the ~ operator already knows that it should act on 64 bits?
Does the compiler interpret this instruction as printf("%llu", ~(unsigned long long)0);? That doesn't sound right, since the cast and ~ have the same precedence.
Somehow, it looks like the ~ operator already knows that it should act on 64 bits?
It's not the ~ operator, it's the cast. Here is how the integer conversion is done according to the standard:
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
The value of signed int ~0 corresponds to -1 on systems with two's complement representation of negative values. It cannot be represented by an unsigned long long, so the first bullet point does not apply.
The second bullet point does apply: the new type is unsigned, so one more than the maximum of unsigned long long (ULLONG_MAX + 1) is added to -1 once, giving ULLONG_MAX. This has the same effect as sign-extending -1 to 64 bits.
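A short sketch contrasting the question's expression with a version that stays in unsigned arithmetic; it assumes two's complement and a 64-bit unsigned long long:

#include <stdio.h>

int main(void) {
    /* ~0 is an int with value -1; the cast converts the value -1 to
       unsigned long long by adding ULLONG_MAX + 1, giving ULLONG_MAX. */
    printf("%llu\n", (unsigned long long)~0);

    /* Complementing in the target type directly avoids the signed step. */
    printf("%llu\n", ~0ull);
    return 0;
}

Both lines print 18446744073709551615 on such a system.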
0 is of type int, not unsigned int. ~0 will therefore (on machines that use two's complement integer representation, which is all that are in use today) be -1, not 2^32 - 1.
Assuming a 64-bit unsigned long long, (unsigned long long) -1 is -1 modulo 2^64, which is 2^64 - 1.
0 is an int
~0 is still an int, namely the value -1.
Casting an int to unsigned long long is there merely to match the type that printf expects with the conversion llu.
However, the value -1 converted to an unsigned long long is 0xffffffffffffffff (for a 64-bit unsigned long long), regardless of whether int is 4 or 8 bytes: the conversion works on the value, not on the original bit width.
According to N1570 Committee Draft:
6.5.3.3 Unary arithmetic operators
The result of the ~ operator is the bitwise complement of its (promoted) operand (that is, each bit in the result is set if and only if the corresponding bit in the converted operand is not set). The integer promotions are performed on the operand, and the result has the promoted type. If the promoted type is an unsigned type, the expression ~E is equivalent to the maximum value representable in that type minus E.
§6.2.6.2:
[...] Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.
Hence, the behavior of code:
printf("%llu", (unsigned long long) ~0);
on some machines is implementation-defined (and possibly undefined, if ~0 produces a trap representation), and not what you might expect: it depends on the machine's internal representation of integers.
According to section 6.5.3.3, the approved way to write this code would be:
printf("%llu", (unsigned long long) ~0u);
Further, the type of ~0u is unsigned int, whereas you are casting it to unsigned long long int, for which the format specifier is %llu. To print ~0u itself, use the format specifier %u.
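For completeness, a tiny sketch pairing each complement with its matching format specifier:

#include <stdio.h>

int main(void) {
    printf("%u\n",   ~0u);    /* unsigned int complement, printed with %u        */
    printf("%llu\n", ~0ull);  /* unsigned long long complement, printed with %llu */
    return 0;
}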
To learn the basic concept of type casting, you may like to read: What exactly is a type cast in C/C++?

Printing declared char value in C

I understand that a character variable holds values from -128 to 127 (signed) or from 0 to 255 (unsigned).
char x;
x = 128;
printf("%d\n", x);
But how does it work? Why do I get -128 for x?
printf is a variadic function, only providing an exact type for the first argument.
That means the default argument promotions are applied to the following arguments, so all integers of rank less than int are promoted to int or unsigned int, and all floating values of rank smaller than double are promoted to double.
If your implementation has CHAR_BIT of 8, plain char is signed, and you have an obliging two's complement implementation, you thus get
128 (literal) to -128 (char/signed char) to -128 (int), printed as int => -128
If all the listed conditions except the obliging two's complement implementation are fulfilled, you get a signal or some implementation-defined value.
Otherwise, you get an output of 128, because 128 fits in char / unsigned char.
Standard quote for case 2 (Thanks to Matt for unearthing the right reference):
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.60)
3 Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
This all has nothing to do with variadic functions, default argument promotions etc.
Assuming your system has signed chars, then x = 128; is performing an out-of-range assignment. The behaviour of this is implementation-defined, meaning that the compiler may choose an action, but it must document what it does (and therefore do it reliably). This action is allowed to include raising a signal.
The usual behaviour that modern compilers do for out-of-range assignment is to truncate the representation of the value to fit in the destination type.
In binary representation, 128 is 000....00010000000.
Truncating this into a signed char gives a signed char with the binary representation 10000000. In two's complement representation, which is used by all modern C systems for negative numbers, this is the representation of the value -128. (For historical curiosity: in one's complement this is -127, and in sign-magnitude it is -0, which may be a trap representation and thus raise a signal.)
Finally, printf accurately prints out this char's value of -128. The %d conversion works for char because of the default argument promotions and the facts that INT_MIN <= CHAR_MIN and INT_MAX >= CHAR_MAX; this behaviour is guaranteed except on systems where plain char is unsigned and sizeof(int) == 1 (which do exist, but you'd know if you were on one).
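A small diagnostic sketch (mine, not from the answer) that prints what the implementation defines for plain char and what the out-of-range assignment actually produces:

#include <limits.h>
#include <stdio.h>

int main(void) {
    /* Inspect what this implementation actually does with plain char. */
    printf("CHAR_MIN = %d, CHAR_MAX = %d, CHAR_BIT = %d\n",
           CHAR_MIN, CHAR_MAX, CHAR_BIT);

    char x = 128;          /* out of range if plain char is signed:
                              implementation-defined result */
    printf("x = %d\n", x); /* commonly -128 on signed-char, two's complement systems */
    return 0;
}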
Let's look at the binary representation of 128 when stored into 8 bits:
1000 0000
And now let's look at the binary representation of -128 when stored into 8 bits:
1000 0000
The default for char with your current setup appears to be signed char (note that this isn't fixed by the C standard; whether plain char is signed is implementation-defined), and thus when you assign the value 128 to x you're storing the bit pattern 1000 0000, and when you compile and print it out you're printing the signed value of that binary representation (meaning -128).
It turns out my environment makes the same assumption, treating char as signed char. As expected, if I cast x to unsigned char then I get the expected output of 128:
#include <stdio.h>
#include <stdlib.h>

int main() {
    char x;
    x = 128;
    printf("%d %d\n", x, (unsigned char)x);
    return 0;
}
gives me the output of -128 128
Hope this helps!

Resources