I was reading John Regehr's blog on how he gives his students an assignment about saturating arithmetic. The interesting part is that the code has to compile as-is while using typedefs to specify different integer types, see the following excerpt of the full header:
typedef signed int mysint;
//typedef signed long int mysint;
mysint sat_signed_add (mysint, mysint);
mysint sat_signed_sub (mysint, mysint);
The corresponding unsigned version is simple to implement (although I'm actually not sure whether padding bits wouldn't make that problematic too), but I actually don't see how I can get the maximum (or minimum) value of an unknown signed type in C without using macros for MAX_ and MIN_ or causing undefined behavior.
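For reference, here is the sort of unsigned version I have in mind (a sketch, assuming a myuint typedef analogous to mysint and no padding bits):

typedef unsigned int myuint;

myuint sat_unsigned_add(myuint a, myuint b)
{
    myuint sum = a + b;                 /* unsigned overflow wraps, which is well-defined */
    return sum < a ? (myuint)-1 : sum;  /* it wrapped, so clamp to the maximum, (myuint)-1 */
}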
Am I missing something here or is the assignment just flawed (or more likely I'm missing some crucial information he gave his students)?
I don't see any way to do this without making assumptions or invoking implementation-defined (not necessarily undefined) behavior. If you assume that there are no padding bits in the representation of mysint or of uintmax_t, however, then you can compute the maximum value like this:
/* requires <stdint.h> and <limits.h> for uintmax_t and CHAR_BIT */
mysint mysint_max = (mysint)
    ((~(uintmax_t)0) >> (1 + CHAR_BIT * (sizeof(uintmax_t) - sizeof(mysint))));
The minimum value is then either -mysint_max (sign/magnitude or ones' complement) or -mysint_max - 1 (two's complement), but it is a bit tricky to determine which. You don't know a priori which bit is the sign bit, and there are possible trap representations that differ for different representation styles. You also must be careful about evaluating expressions, because of the possibility of "the usual arithmetic conversions" converting values to a type whose representation has different properties than those of the one you are trying to probe.
Nevertheless, you can distinguish the type of negative-value representation by computing the bitwise negation of the mysint representation of -1. For two's complement the mysint value of the result is 0, for ones' complement it is 1, and for sign/magnitude it is mysint_max - 1.
If you add the assumption that all signed integer types have the same kind of negative-value representation then you can simply perform such a test using an ordinary expression on default int literals. You don't need to make that assumption, however. Instead, you can perform the operation directly on the type representation's bit pattern, via a union:
union mysint_bits {
    mysint i;
    unsigned char bits[sizeof(mysint)];
} msib;

size_t counter = 0;
for (msib.i = -1; counter < sizeof(mysint); counter += 1) {
    msib.bits[counter] = ~msib.bits[counter];   /* flip every byte of -1's representation */
}
As long as the initial assumption holds (that there are no padding bits in the representation of type mysint) msib.i must then be a valid representation of the desired result.
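Building on that union, one might then select the minimum from the negation test described above (a sketch under the same no-padding-bits assumption):

mysint mysint_min;
if (msib.i == 0) {              /* ~(-1) == 0: two's complement */
    mysint_min = -mysint_max - 1;
} else {                        /* 1 or mysint_max - 1: ones' complement or sign/magnitude */
    mysint_min = -mysint_max;
}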
I don't see a way to determine the largest and smallest representable values for an unknown signed integer type in C, without knowing something more. (In C++, you have std::numeric_limits available, so it is trivial.)
The largest representable value for an unsigned integer type is (myuint)(-1). That is guaranteed to work independent of padding bits, because (§ 6.3.1.3/1-2):
When a value with integer type is converted to another integer type… if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
So to convert -1 to an unsigned type, you add one more than the maximum representable value to it, and that result must be the maximum representable value. (The standard makes it clear that the meaning of "repeatedly adding or subtracting" is mathematical.)
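As a concrete instance (assuming a 16-bit unsigned short with no padding bits, so its maximum is 65535):

unsigned short m = (unsigned short)(-1);   /* -1 + 65536 == 65535, the maximum */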
Now, if you knew that the number of padding bits in the signed type was the same as the number of padding bits in the unsigned type [but see below], you could compute the largest representable signed value from the largest representable unsigned value:
(mysint)( (myuint)(-1) / (myuint)2 )
Unfortunately, that's not enough to compute the minimum representable signed value, because the standard permits the minimum to be either one less than the negative of the maximum (2's-complement representation) or exactly the negative of the maximum (1's-complement or sign/magnitude representations).
Moreover, the standard does not actually guarantee that the number of padding bits in the signed type is the same as the number of padding bits in the unsigned type. All it guarantees is that the number of value bits in the signed type be no greater than the number of value bits in the unsigned type. In particular, it would be legal for the unsigned type to have one more padding bit than the corresponding signed type, in which case they would have the same number of value bits and the maximum representable values would be the same. [Note: a value bit is neither a padding bit nor the sign bit.]
In short, if you knew (for example by being told) that the architecture were 2's-complement and that corresponding signed and unsigned types had the same number of padding bits, then you could certainly compute both signed min and max:
myuint max_myuint = (myuint)(-1);
mysint max_mysint = (mysint)(max_myuint / (myuint)2);
mysint min_mysint = (-max_mysint) - (mysint)1;
Finally, casting an out-of-range unsigned integer to a signed integer is not undefined behaviour, although most other signed overflows are. The conversion, as indicated by §6.3.1.3/3, is implementation-defined behaviour:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Implementation-defined behaviour is required to be documented by the implementation. So, suppose we knew that the implementation was gcc. Then we could examine the gcc documentation, where we would read the following, in the section "C Implementation-defined behaviour":
Whether signed integer types are represented using sign and magnitude, two's complement, or one's complement, and whether the extraordinary value is a trap representation or an ordinary value (C99 6.2.6.2).

GCC supports only two's complement integer types, and all bit patterns are ordinary values.

The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 6.3.1.3).

For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
Knowing that signed integers are two's complement and that unsigned-to-signed conversions will not trap but will produce the expected pattern of low-order bits, we can find the maximum and minimum values for any signed type, starting with the maximum representable value for the widest unsigned type, uintmax_t:
uintmax_t umax = (uintmax_t)(-1);
while ( (mysint)(umax) < 0 ) umax >>= 1;
mysint max_mysint = (mysint)(umax);
mysint min_mysint = (-max_mysint) - (mysint)1;
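Wrapped into a complete program for illustration (assuming here that mysint is plain int, so the results can be checked against <limits.h>):

#include <stdint.h>
#include <limits.h>
#include <stdio.h>

typedef signed int mysint;

int main(void)
{
    uintmax_t umax = (uintmax_t)(-1);
    while ((mysint)(umax) < 0)        /* shed bits until the cast result is non-negative */
        umax >>= 1;
    mysint max_mysint = (mysint)(umax);
    mysint min_mysint = (-max_mysint) - (mysint)1;

    printf("max = %d (INT_MAX = %d)\n", max_mysint, INT_MAX);
    printf("min = %d (INT_MIN = %d)\n", min_mysint, INT_MIN);
    return 0;
}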
This is a suggestion for getting the MAX value of a type introduced with a typedef, without using any library:
typedef signed int mysint;

// Number of value bits, assuming 8-bit bytes, no padding bits, and
// one bit reserved for the sign.
mysint size = sizeof(mysint) * (mysint)8 - (mysint)1;

mysint max = 1;                 // start with the lowest bit set
while (--size) {
    max = (max << (mysint)1) | (mysint)1;   // shift in one more 1-bit
}
// max now holds the largest value representable in mysint
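If one additionally assumes two's complement, the minimum then follows directly from the computed maximum:

mysint min = -max - (mysint)1;   // two's complement only; not valid for ones' complement or sign/magnitude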
If you assume eight-bit chars and a two's complement representation (both reasonable on all modern hardware, with the exception of some embedded DSP stuff), then you just need to form an unsigned integer (use uintmax_t to make sure it's big enough) with sizeof(mysint)*8 - 1 1's in the bottom bits, then cast it to mysint. For the minimum value, negate the maximum value and subtract one.
If you don't want to assume those things, then it's still possible, but you'll need to do some more digging through limits.h to compensate for the size of chars and the sign representation.
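A sketch of that approach, under the stated assumptions (8-bit chars, two's complement, no padding bits); the names here are illustrative:

#include <stdint.h>

typedef signed int mysint;

int main(void)
{
    /* sizeof(mysint)*8 - 1 one-bits in the low positions of a wide unsigned value */
    uintmax_t ones = ~(uintmax_t)0 >> (sizeof(uintmax_t) * 8 - (sizeof(mysint) * 8 - 1));
    mysint max = (mysint)ones;   /* the value fits in mysint, so the conversion preserves it */
    mysint min = -max - 1;
    (void)min;
    return 0;
}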
I guess this should work irrespective of the negative-number representation:
// MSB 1 with all other bits zero is the minimum number in both two's
// and ones' complement representations (assuming no padding bits; note
// that shifting a 1 into the sign bit is formally undefined behavior,
// so this relies on common implementation behavior).
mysint min = (mysint)1 << (sizeof(mysint) * 8 - 1);
mysint max = ~min;
Do unsigned int and signed int have any relevance as far as storage is concerned? I know they matter in a print statement, i.e. -1 will be printed as -1 with %d but as 4294967295 with %u. If we consider just the storage of the value, does signed versus unsigned make a difference?
In C, you cannot have a value without a type. (Various operations are defined in terms of mathematical values, but each operation is specified to produce a result in a particular type, so, at each point in a C expression where there is a value, it has a type.) So any value is stored by storing the bytes of the object that represents it.
The C 2018 standard specifies the representations of types in 6.2.6 and of integer types specifically in 6.2.6.2. Objects are composed of one or more bits. Unsigned integers are represented with pure binary plus optional padding bits. The order of the bits is not specified. For a signed integer type, one of the bits is a sign bit, and each value bit has the same values as the same bit of the corresponding unsigned type. Some of the value bits in the unsigned type may be padding bits (unused for value) in the signed type. (But the total number of bits is the same, per 6.2.5 6.) The sign bit either indicates the value is negated or it represents the value -(2^M) or -(2^M - 1), where M is the number of value bits. (Which of those three is implementation-defined.)
Therefore, whether an integer type is signed or unsigned makes no difference regarding the interpretation of the common value bits. It affects only the interpretation of bits that are value bits in the unsigned type but a sign bit or padding bits in the signed type. (The latter are rare.)
If the value in a signed integer type is the same as the value in its corresponding unsigned integer type, they have the same value in each of their common value bits and zeros in all of their unshared sign or value bits. (The padding bits are not specified by the C standard.)
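A minimal illustration of the point (a hypothetical demo, assuming an implementation without padding bits):

#include <stdio.h>
#include <string.h>

int main(void)
{
    int s = 42;
    unsigned u;
    memcpy(&u, &s, sizeof u);    /* reinterpret the same object bytes */
    printf("%d %u\n", s, u);     /* both print 42 on such implementations */
    return 0;
}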
In the C11 standard:
6.3.1.3 Signed and unsigned integers

1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. 60)

3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
I was wondering what is meant by "if the value can be represented by the new type" in the first point. Two integer types might have different ranges of integers, but can have the same range of bit representations. (For example, unsigned int and int.)
Are both the bit representation and the integer value not changed before and after the conversion?
Thanks.
The C standard does not dictate one specific representation for integers. While most implementations you are likely to come across use two's complement for signed integers, that is not guaranteed. It also allows for one's complement where negating a number means inverting all bits (and not adding 1 as two's complement does), or sign-and-magnitude where negating a number means inverting only the high order bit.
As such, the language of the standard talks about what happens to the value of an integer when it is converted, not the representation.
As an example, suppose an int has range -2^31 to 2^31 - 1 and unsigned int has range 0 to 2^32 - 1. If you have an int with the value 45 that is converted to unsigned int, that value can be represented in both types, so now you have an unsigned int with value 45.
Now suppose your int has value -1000. That value cannot be represented in an unsigned int, so it has to be converted as per the rule in clause 2, i.e. 2^32 is added to -1000 to give 2^32 - 1000 == 4294966296. Now in two's complement representation, an unsigned int with value 4294966296 and an int with value -1000 happen to have the same representation, which is FFFFFC18 in hex. This equivalence does not hold for one's complement or sign-and-magnitude.
So converting a number to a different type may or may not change the representation, depending on the implementation.
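For instance, a small program showing the worked example above (on a typical implementation where int is 32 bits):

#include <stdio.h>

int main(void)
{
    int n = -1000;
    unsigned int u = (unsigned int)n;  /* value converted by adding 2^32 once */
    printf("%u\n", u);                 /* prints 4294966296 on such a platform */
    return 0;
}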
The char type is used in the example, but the question relates to any integer type.
Can I be sure that
signed char foo = 127;
will always be binary identical to
unsigned char foo = 127;
so that it's possible to use signed variant for raw byte representation if MSB not needed?
The bits representing the values are not necessarily identical if padding bits are present but are identical if there are no padding bits.
C 2018 6.2.6.2 2 says of signed integer types:
Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type…
So the value bits in a signed integer type are the same as the value bits in the corresponding unsigned type. This leaves three more sets of bits to consider:
The sign bit.
Value bits that are in the unsigned type but not the signed type.
Padding bits.
The sign bit must be zero, because this paragraph also says:
… If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:…
Those “following ways” (sign-and-magnitude, one’s complement, two’s complement) all result in negative values if the value is not zero. Since we are told the represented value is positive, it is not negative, and so the sign bit must be zero. (We should note the question asserts the value is positive, thus excluding zero. With sign-and-magnitude or one’s complement, zero can be represented with a sign bit of one, and thus it could have different bits from an unsigned integer zero, which has all zero bits.)
Any value bits that exist in the unsigned type but not the signed type must be zero, since the value is the same in both types.
That leaves the padding bits, and that is where the correspondence fails. The values of padding bits are not specified by the C standard and therefore may differ between signed and unsigned type or even between two instances of the same value in the same type. A specific C implementation may of course define its padding bits so that signed and unsigned types with the same value always have the same padding bits, and that the padding bits correspond between the signed and unsigned types. (We could imagine that the sign bit in the signed type corresponds to a padding bit in the unsigned type instead of to a value bit.)
C 1999 has the same wording.
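Since signed char is guaranteed to have no padding bits (6.2.6.2 2), the case in the question can even be checked directly; a tiny sanity check might look like:

#include <stdio.h>
#include <string.h>

int main(void)
{
    signed char s = 127;
    unsigned char u = 127;
    /* signed char has no padding bits, so the object bytes must match */
    printf("%s\n", memcmp(&s, &u, 1) == 0 ? "identical" : "different");
    return 0;
}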
uint32_t a = -1; // 11111111111111111111111111111111
int64_t b = (int64_t) a; // 0000000000000000000000000000000011111111111111111111111111111111
int32_t c = -1; // 11111111111111111111111111111111
int64_t d = (int64_t) c; // 1111111111111111111111111111111111111111111111111111111111111111
From the observation above, it appears that only the original value's sign matters.
I.e if the original 32 bit number is unsigned, casting it to a 64 bit value will add 0's to its left regardless of the destination value being signed or unsigned and;
if the original 32 bit number is signed and negative, casting it to a 64 bit value will add 1's to its left regardless of the destination value being signed or unsigned.
Is the above statement correct?
Correct, it's the source operand that dictates this.
uint32_t a = -1;
int64_t b = (int64_t) a;
No sign extension happens here because the source value is an unsigned uint32_t. The basic idea of sign extension is to ensure the wider variable has the same value (including sign). Coming from an unsigned integer type, the value is positive, always. This is covered by the standards snippet /1 below.
Negative sign extension (in the sense that the top 1-bit in a two's complement value is copied to all the higher bits in the wider type(a)) only happens when a signed type is extended in width, since only signed types can be negative.
If the original 32 bit number is signed and negative, casting it to a 64 bit value will add 1's to its left regardless of the destination value being signed or unsigned.
This is covered by the standards snippet /2 below. You still have to maintain the sign of the value when extending the bits but pushing a negative value (assuming the source was negative) into an unsigned variable will simply mathematically add the MAX_VAL + 1 to the value until it is within the range of the target type (in reality, for two's complement, no adding is done, it just interprets the same bit pattern in a different way).
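To make that concrete, a small demo of a negative value pushed into a wider unsigned type (the printed value assumes the usual 64-bit uint64_t):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    int32_t c = -1;
    uint64_t e = (uint64_t)c;          /* 2^64 added once: 18446744073709551615 */
    printf("%" PRIu64 "\n", e);
    return 0;
}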
Both these scenarios are covered in the standard, in this case C11 6.3.1.3 Signed and unsigned integers /1 and /2:
1/ When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2/ Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
3/ Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Note that your widening conversions are covered by the first two points above. I've included the third point for completion as it covers things like conversion from uint32_t to int32_t, or unsigned int to long where they have the same width (they both have a minimum range but there's no requirement that unsigned int be "thinner" than long).
(a) This may be different in ones' complement or sign-magnitude representations but, since they're in the process of being removed, nobody really cares that much.
See:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r1.html (WG21, C++); and
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm (WG14, C)
for more detail.
In any case, the fixed width types are two's complement so you don't have to worry about this aspect for your example code.
As per the C standard, the value representation of an integer type is implementation-defined. So 5 might not be represented as 00000000000000000000000000000101, or -1 as 11111111111111111111111111111111, as we usually assume for 32-bit two's complement. So even though the operators ~, << and >> are well defined, the bit patterns they work on are implementation-defined. The only defined bit pattern I could find was "§5.2.1/3 A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string.".
So my question is: is there an implementation-independent way of converting integer types to a bit pattern?
We can always start with a null character and do enough bit operations on it to get it to a desired value, but I find that too cumbersome. I also realise that practically all implementations use a two's complement representation, but I want to know how to do it in a pure C-standard way. Personally I find this topic quite intriguing because of device-driver programming, where all code written to date assumes a particular implementation.
In general, it's not that hard to accommodate unusual platforms in most cases (if you don't want to simply assume 8-bit char, two's complement, no padding, no trap, and truncating unsigned-to-signed conversion); the standard mostly gives enough guarantees (a few macros to inspect certain implementation details would be helpful, though).
As far as a strictly conforming program can observe (outside bit-fields), 5 is always encoded as 00...0101. This is not necessarily the physical representation (whatever this should mean), but what is observable by portable code. A machine using Gray code internally, for example, would have to emulate a "pure binary notation" for bitwise operators and shifts.
For negative values of signed types, different encodings are allowed, which leads to different (but well-defined for every case) results when re-interpreting as the corresponding unsigned type. For example, strictly conforming code must distinguish between (unsigned)n and *(unsigned *)&n for a signed integer n: They are equal for two's complement without padding bits, but different for the other encodings if n is negative.
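A small demonstration of that distinction (using memcpy rather than a pointer cast for the reinterpretation; it has the same effect here since the types have equal size):

#include <stdio.h>
#include <string.h>

int main(void)
{
    int n = -1;
    unsigned by_value = (unsigned)n;       /* conversion: always UINT_MAX */
    unsigned by_bits;
    memcpy(&by_bits, &n, sizeof by_bits);  /* same effect as *(unsigned *)&n here */
    printf("%u %u\n", by_value, by_bits);  /* equal on two's complement without padding */
    return 0;
}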
Further, padding bits may exist, and signed integer types may have more padding bits than their corresponding unsigned counterparts (but not the other way round, type-punning from signed to unsigned is always valid). sizeof cannot be used to get the number of non-padding bits, so e.g. to get an unsigned value where only the sign-bit (of the corresponding signed type) is set, something like this must be used:
/* type-pun via a compound literal; requires <limits.h> for INT_MIN */
#define TYPE_PUN(to, from, x) ( *(to *)&(from){(x)} )
unsigned sign_bit = TYPE_PUN(unsigned, int, INT_MIN) &
                    TYPE_PUN(unsigned, int, -1) & ~1u;
(there are probably nicer ways) instead of
unsigned sign_bit = 1u << sizeof sign_bit * CHAR_BIT - 1;
as this may shift by more than the width. (I don't know of a constant expression giving the width, but sign_bit from above can be right-shifted until it's 0 to determine it; GCC can constant-fold that.) Padding bits can be inspected by memcpying into an unsigned char array, though they may appear to "wobble": reading the same padding bit twice may give different results.
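Such an inspection might look like the following sketch (dump_object is a hypothetical helper name):

#include <stdio.h>
#include <string.h>

/* Dump every byte of an int's object representation, padding bits
 * included, through the unsigned char view the standard permits. */
static void dump_object(int n)
{
    unsigned char bytes[sizeof n];
    memcpy(bytes, &n, sizeof n);
    for (size_t i = 0; i < sizeof n; i++)
        printf("%02x ", bytes[i]);
    putchar('\n');
}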
If you want the bit pattern (without padding bits) of a signed integer (little endian):
#include <stdio.h>
#include <limits.h>

int print_bits_u(unsigned n) {
    for (; n; n >>= 1) {
        putchar(n & 1 ? '1' : '0');   // n&1 never traps
    }
    return 0;
}

int print_bits(int n) {
    return print_bits_u(*(unsigned *)&n & INT_MAX);
    /* This masks padding bits if int has more of them than unsigned int.
     * Note that INT_MAX is promoted to unsigned int here. */
}

int print_bits_2scomp(int n) {
    return print_bits_u(n);   // the conversion to unsigned yields the two's complement pattern
}
print_bits gives different results for negative numbers depending on the representation used (it gives the raw bit pattern), while print_bits_2scomp gives the two's complement representation (possibly with a greater width than signed int has, if unsigned int has fewer padding bits).
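For example (a hypothetical driver using the functions above; the exact digits for the negative case depend on the representation):

#include <stdio.h>

int main(void)
{
    print_bits(5);            putchar('\n');   /* "101" on any conforming implementation */
    print_bits_2scomp(-5);    putchar('\n');   /* "1101" followed by 28 ones on 32-bit two's complement */
    return 0;
}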
Care must be taken not to generate trap representations when using bitwise operators and when type-punning from unsigned to signed, see below how these can potentially be generated (as an example, *(int *)&sign_bit can trap with two's complement, and -1 | 1 can trap with ones' complement).
Unsigned-to-signed integer conversion (if the converted value isn't representable in the target type) is always implementation-defined, I would expect non-2's complement machines to differ from the common definition more likely, though technically, it could also become an issue on 2's complement implementations.
From C11 (n1570) 6.2.6.2:

(1) For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^N - 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.

(2) For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:

the corresponding value with sign bit 0 is negated (sign and magnitude);
the sign bit has the value -(2^M) (two's complement);
the sign bit has the value -(2^M - 1) (ones' complement).

Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.
To add to mafso's excellent answer, there's a part of the ANSI C rationale which talks about this:
The Committee has explicitly restricted the C language to binary architectures, on the grounds that this stricture was implicit in any case:
Bit-fields are specified by a number of bits, with no mention of “invalid integer” representation. The only reasonable encoding for such bit-fields is binary.
The integer formats for printf suggest no provision for “invalid integer” values, implying that any result of bitwise manipulation produces an integer result which can be printed by printf.
All methods of specifying integer constants — decimal, hex, and octal — specify an integer value. No method independent of integers is defined for specifying “bit-string constants.” Only a binary encoding provides a complete one-to-one mapping between bit strings and integer values.
The restriction to binary numeration systems rules out such curiosities as Gray code and makes possible arithmetic definitions of the bitwise operators on unsigned types.
The relevant part of the standard might be this quote:
3.1.2.5 Types

[...]

The type char, the signed and unsigned integer types, and the enumerated types are collectively called integral types. The representations of integral types shall define values by use of a pure binary numeration system.
If you want to get the bit-pattern of a given int, then the bitwise operators are your friends. If you want to convert an int to its two's complement representation, then the arithmetic operators are your friends. The two representations can be different, as the mapping is implementation-defined:
Std Draft 2011, 6.5/4: Some operators (the unary operator ~, and the binary operators <<, >>, &, ^, and |, collectively described as bitwise operators) are required to have operands that have integer type. These operators yield values that depend on the internal representations of integers, and have implementation-defined and undefined aspects for signed types.
So it means that i<<1 will effectively shift the bit pattern one position to the left, but the value produced is only guaranteed to equal i*2 for non-negative values of i whose doubled result is representable; for negative i the shift is undefined behavior.
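A short sketch of the distinction (the negative case is the one to avoid):

int i = 3;
int shifted = i << 1;   /* defined here: the result 6 is representable, so i<<1 == i*2 */
int doubled = i * 2;    /* same value, computed arithmetically */

int j = -3;
/* j << 1 would be undefined behavior for negative signed j (C11 6.5.7). */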