In section 2.7.1 Integer constants, it says:
To illustrate some of the subtleties of integer constants, assume that
type int uses a 16-bit twos-complement representation, type long uses
a 32-bit twos-complement representation, and type long long uses a
64-bit twos-complement representation. We list in Table 2-6 some
interesting integer constants...
An interesting point to note from this table is that integers in the
range 2^15 through 2^16 - 1 will have positive values when written as
decimal constants but negative values when written as octal or
hexadecimal constants (and cast to type int).
But, as far as I know, integers in the range 2^15 through 2^16 - 1 written as hex/octal constants also have positive values when cast to type unsigned. Is the book wrong?
In the described setup, decimal literals in the range [32768,65535] have type long int, and hexadecimal literals in that range have type unsigned int.
So, the constant 0xFFFF is an unsigned int with value 65535, and the constant 65535 is a signed long int with value 65535.
I think your text is trying to discuss the cases:
(int)0xFFFF
(int)65535
Now, since int cannot represent the value 65535, both of these casts cause an out-of-range conversion, which is implementation-defined (or may raise an implementation-defined signal).
Most commonly (in fact, on all 2's-complement systems I've ever heard of), the conversion truncates and reinterprets the bits, giving a value of -1 in both of those cases.
So the last paragraph of your quote is a bit strange. 65535 and 0xFFFF are both large positive numbers; (int)0xFFFF and (int)65535 are (probably) both negative numbers; but if you cast one and don't cast the other then you get a discrepancy which is not surprising.
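A minimal sketch of the point, using int16_t to stand in for the book's 16-bit int (the out-of-range conversions are implementation-defined, but on common 2's-complement systems both casts yield -1):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* 0xFFFF and 65535 are both large positive constants here, since int is
       wider than 16 bits on typical systems; the cast to int16_t emulates
       the book's 16-bit int. The results are implementation-defined. */
    printf("(int16_t)0xFFFF = %d\n", (int)(int16_t)0xFFFF);
    printf("(int16_t)65535  = %d\n", (int)(int16_t)65535);
    return 0;
}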
Related
I'm reading Modern C, by Jens Gustedt and author points out the following,
All values have a type that is statically determined.
- At the start of the chapter, the author says, "C programs primarily reason about values and not about their representation". So are literals, say decimal integer constants, just ways to represent a value, or do they have an intrinsic value?
Also stated was: "We don't want the result of computation to depend on the executable, which is platform specific, but ideally depend only on the program specification itself. AN IMPORTANT STEP TO ACHIEVE THIS PLATFORM INDEPENDENCE IS THE CONCEPT OF TYPES".
What does the text in uppercase actually mean? How do types help with platform independence? What is the use of types?
Why are decimal integer constants signed while their hexadecimal counterparts can be either signed or unsigned, even though they refer to the same set of values?
I'm really confused at this point. If someone could answer each point and elaborate, I'll be grateful.
Why are decimal integer constants signed while their hexadecimal counterparts can be either signed or unsigned, even though they refer to the same set of values?
This appears to allude to C 2018 6.4.4.1 5, which specifies the type of an integer constant. For a decimal constant with no suffix, the candidate types are all signed: int, long int, and long long int. (The choice from among these depends on the value of the constant.) For a hexadecimal constant with no suffix, the candidate types are a mix of signed and unsigned: int, unsigned int, long int, unsigned long int, long long int, and unsigned long long int.
This specification of candidate lists is not based on the values that each notation can represent, since, as you point out, decimal and hexadecimal notations can both represent any integer value. It is simply based on common use. Decimal constants were largely used with signed types (e.g., when doing general arithmetic), while hexadecimal constants saw more diverse use (e.g., when working with bits), and presumably the C committee felt these rules suited the existing usage.
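One way to see the different candidate lists in action is a small _Generic (C11) sketch. Assuming a common platform with 32-bit int and 64-bit long, the decimal constant 2147483648 picks up type long, while the hexadecimal constant 0x80000000, which denotes the same value, picks up type unsigned int:

#include <stdio.h>

#define TYPE_NAME(x) _Generic((x), \
    int: "int", unsigned int: "unsigned int", \
    long: "long", unsigned long: "unsigned long", \
    long long: "long long", unsigned long long: "unsigned long long", \
    default: "other")

int main(void)
{
    printf("2147483648 has type %s\n", TYPE_NAME(2147483648));
    printf("0x80000000 has type %s\n", TYPE_NAME(0x80000000));
    return 0;
}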
Why are decimal integer constants signed while their hexadecimal counterparts can be either signed or unsigned, even though they refer to the same set of values?
You are kind of misunderstanding this part. It's really about the type of the constant. It's not about the signedness.
So assuming 32-bit int (2's complement), the following apply:
0x7fffffff has type int
0x80000000 has type unsigned int
The reason is that 0x80000000 is greater than INT_MAX and therefore can't be an int.
This is important because of the way the usual arithmetic conversions work. In the case of 0x7fffffff the conversion will be towards int, while in the case of 0x80000000 the conversion will be towards unsigned int.
To put it another way:
A hexadecimal constant 0x.... always has a value greater than or equal to zero. If its value fits into the range of int, its type will be int. If it can't fit into int, the next step is to see whether it fits into unsigned int, in which case its type will be unsigned int, and so on through long int, unsigned long int, long long int, and unsigned long long int.
An example (32-bit 2's complement):
The bit pattern 0x80000000, interpreted as int, represents the value -2147483648.
The bit pattern 0x7fffffff, interpreted as int, represents the value 2147483647.
So from this, 0x7fffffff > 0x80000000 must be true. Right?
Try:
#include <stdio.h>

int main(void)
{
    if (0x7fffffff > 0x80000000)
    {
        puts("0x7fffffff is bigger than 0x80000000");
    }
    else
    {
        puts("0x7fffffff is less than 0x80000000");
    }
    return 0;
}
So this "should" print: 0x7fffffff is bigger than 0x80000000 Right?
No, it won't. It prints 0x7fffffff is less than 0x80000000. The reason is (again) that 0x80000000 is considered unsigned int so the comparison is done on unsigned types. This shows why the type of integer constants can be important.
Finally try:
#include <stdio.h>

int main(void)
{
    if (((0x7fffffff - 0x40000000) - 0x40000000) > 0)
    {
        puts("((0x7fffffff - 0x40000000) - 0x40000000) is bigger than zero");
    }
    else
    {
        puts("((0x7fffffff - 0x40000000) - 0x40000000) is less than zero");
    }
    return 0;
}
Output:
((0x7fffffff - 0x40000000) - 0x40000000) is less than zero
This shows that the expression ((0x7fffffff - 0x40000000) - 0x40000000) was calculated as int (i.e. had any operand been unsigned, the result could not have been less than zero).
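The same point can be made by comparing -1 against the same value written in decimal and in hexadecimal. Assuming 32-bit int and 64-bit long, 0x80000000 is an unsigned int while 2147483648 is a signed long, so the two comparisons go different ways:

#include <stdio.h>

int main(void)
{
    if (-1 > 0x80000000)    /* -1 converts to unsigned int (UINT_MAX): true */
        puts("-1 > 0x80000000 (compared as unsigned int)");
    if (-1 > 2147483648)    /* both operands are signed long: false */
        puts("-1 > 2147483648");
    else
        puts("-1 <= 2147483648 (compared as signed long)");
    return 0;
}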
Let's say I have the following variables and the following equation:
int16_t a,b,c,d;
int32_t result;
result = a*b - c*d;
Will the intermediate results of a*b and c*d be stored in 16 bits or in 32 bits?
PS: I can test this faster than I can write the question, I want to know what does the C specification say.
The intermediate results will be of type int.
Any type narrower than int will first be promoted. These integer promotions are to type int or unsigned **. Math therefore must occur at either int, unsigned, or the original type.
int16_t is certainly narrower than or the same width as int.
The type of result is irrelevant to the type of the intermediate results
int16_t a,b,c,d;
int32_t result = a*b - c*d;
To make this portable to all platforms, including those with an int narrower than int32_t, ensure the products are calculated using at least 32-bit math.
#include <stdint.h>
int32_t result = INT32_C(1)*a*b - INT32_C(1)*c*d;
Of course the result is stored as 32 bits, possibly sign-extending the int intermediate results.
On machines with a 32- or 64-bit int, the intermediate results always fit in int32_t with no change in value. Results range from -2147450880 to 2147450880 (0x80008000 to 0x7FFF8000).
** Never long, not even on unicorn platforms.
I will be updating this answer soon. I no longer believe that the standard permits int16_t to be promoted to long. But it can, in some extremely obscure cases, promote to unsigned int. The integer conversion rank rules have some odd results for exotic systems.
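For concreteness, here is a small program along the lines of the suggestion above (the specific values are chosen to hit the extreme mentioned; INT32_C and PRId32 come from <stdint.h> and <inttypes.h>):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    int16_t a = INT16_MIN, b = INT16_MIN, c = INT16_MAX, d = INT16_MIN;
    /* a, b, c, d are promoted before the multiplications; multiplying by
       INT32_C(1) first forces at least 32-bit math even where int is only
       16 bits wide. */
    int32_t result = INT32_C(1) * a * b - INT32_C(1) * c * d;
    printf("%" PRId32 "\n", result);    /* 2147450880, the upper extreme */
    return 0;
}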
chux's answer is practically correct. There are a couple of obscure and unlikely cases where the intermediate result is of type long int.
int16_t is required to be a 16-bit 2's-complement integer type with no padding bits. Operands of type int16_t will be promoted to a type that can represent all possible values of type int16_t and that is at least as wide as int.
The standard requires int to have a range of at least -32767 to +32767.
Suppose int is represented using 1's-complement or sign-and-magnitude or that it's represented using 2's-complement but the representation that would normally be -32768 is treated as a trap representation. Then int cannot hold all values of type int16_t, and the operands must be promoted to long (which is guaranteed to have a wide enough range).
For this to happen, the implementation would have to support both an int type with a restricted 16-bit range (most likely not 2's-complement) and a type that's suitable for int16_t, meaning it has a 2's-complement representation and no padding bits or trap representations. A 1's-complement system, for example, would be more likely not to have such a type, and so it would not define int16_t at all (though it would define int_fast16_t and int_least16_t).
For practically all real-world implementations, int can hold all values of type int16_t, so the intermediate results are of type int. For practically all remaining real-world or hypothetical systems, int16_t would not be provided. For the hypothetical tiny fraction of nonexisting but conforming C implementations, the intermediate results are of type long int.
UPDATE: chux points out a possible weakness in my argument. In a comment, he argues that N1570 6.2.6.2 paragraph 2, which says that integer types may be represented using either two's complement, ones' complement, or sign and magnitude, is intended to require that all integer types use the same representation (differing in number of bits, of course, but all using the same one of those three choices).
The phrasing of the non-normative text in J.3.5, saying that:
Whether signed integer types are represented using sign and magnitude,
two’s complement, or ones’ complement, and whether the extraordinary
value is a trap representation or an ordinary value
is implementation-defined, tends to support that interpretation. If different integer types could differ in that respect, it should say so for each integer type.
However:
6.2.6.2p2 doesn't explicitly say that all integer types must use the same representation, and I'm not aware of anything else that implies that they must do so.
It could be useful to support integer types with different representations. For example, the hardware might support ones' complement, but the implementation might support two's complement integers in software for the sake of code that depends on int16_t et al. Or the hardware might directly support both representations. (My guess is that the authors didn't consider this possibility, but we can only go by what the standard actually says.)
In any case, it's not actually necessary to invoke non-two's-complement representations to construct a case where int16_t promotes to long int.
Assume the following:
All integer types use two's complement representation.
INT_MAX == +32767
INT_MIN == -32767
An int with sign bit 1 and all value bits 0 is a trap representation.
int has no padding bits, but the bit pattern that would normally represent -32768 is a trap representation. This is explicitly permitted by N1570 6.2.6.2 paragraph 2.
But the range of int16_t must be -32768 to +32767. 7.20.2.1 says that INTN_MIN is required to be exactly -(2^(N-1)) (N==16 in this case).
So under this almost entirely implausible but conforming implementation, int16_t is defined, but int cannot represent all values of int16_t, so int16_t values are promoted to long int (which must be at least 32 bits).
I was reading John Regehr's blog on how he gives his students an assignment about saturating arithmetic. The interesting part is that the code has to compile as-is while using typedefs to specify different integer types, see the following excerpt of the full header:
typedef signed int mysint;
//typedef signed long int mysint;
mysint sat_signed_add (mysint, mysint);
mysint sat_signed_sub (mysint, mysint);
The corresponding unsigned version is simple to implement (although I'm actually not sure if padding bits wouldn't make that problematic too), but I actually don't see how I can get the maximum (or minimum) value of an unknown signed type in C, without using macros for MAX_ and MIN_ or causing undefined behavior.
Am I missing something here or is the assignment just flawed (or more likely I'm missing some crucial information he gave his students)?
I don't see any way to do this without making assumptions or invoking implementation-defined (not necessarily undefined) behavior. If you assume that there are no padding bits in the representation of mysint or of uintmax_t, however, then you can compute the maximum value like this:
mysint mysint_max = (mysint)
((~(uintmax_t)0) >> (1 + CHAR_BIT * (sizeof(uintmax_t) - sizeof(mysint))));
The minimum value is then either -mysint_max (sign/magnitude or ones' complement) or -mysint_max - 1 (two's complement), but it is a bit tricky to determine which. You don't know a priori which bit is the sign bit, and there are possible trap representations that differ for different representations styles. You also must be careful about evaluating expressions, because of the possibility of "the usual arithmetic conversions" converting values to a type whose representation has different properties than those of the one you are trying to probe.
Nevertheless, you can distinguish the type of negative-value representation by computing the bitwise negation of the mysint representation of -1. For two's complement the mysint value of the result is 0, for ones' complement it is 1, and for sign/magnitude it is mysint_max - 1.
If you add the assumption that all signed integer types have the same kind of negative-value representation then you can simply perform such a test using an ordinary expression on default int literals. You don't need to make that assumption, however. Instead, you can perform the operation directly on the type representation's bit pattern, via a union:
union mysint_bits {
    mysint i;
    unsigned char bits[sizeof(mysint)];
} msib;

int counter = 0;
for (msib.i = -1; counter < sizeof(mysint); counter += 1) {
    msib.bits[counter] = ~msib.bits[counter];
}
As long as the initial assumption holds (that there are no padding bits in the representation of type mysint) msib.i must then be a valid representation of the desired result.
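Putting the pieces together, a sketch (assuming, as above, no padding bits in mysint or uintmax_t, and using the mysint typedef from the question) might look like this:

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

typedef signed int mysint;    /* assumed typedef from the question */

int main(void)
{
    /* Maximum value, assuming no padding bits. */
    mysint mysint_max = (mysint)
        ((~(uintmax_t)0) >> (1 + CHAR_BIT * (sizeof(uintmax_t) - sizeof(mysint))));

    /* Probe the negative-value representation via the bit pattern of -1. */
    union { mysint i; unsigned char bits[sizeof(mysint)]; } msib;
    msib.i = -1;
    for (size_t k = 0; k < sizeof(mysint); k++)
        msib.bits[k] = (unsigned char)~msib.bits[k];

    /* 0 -> two's complement; otherwise ones' complement or sign/magnitude. */
    mysint mysint_min = (msib.i == 0) ? -mysint_max - 1 : -mysint_max;

    printf("max = %jd, min = %jd\n", (intmax_t)mysint_max, (intmax_t)mysint_min);
    return 0;
}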
I don't see a way to determine the largest and smallest representable values for an unknown signed integer type in C, without knowing something more. (In C++, you have std::numeric_limits available, so it is trivial.)
The largest representable value for an unsigned integer type is (myuint)(-1). That is guaranteed to work independent of padding bits, because (§ 6.3.1.3/1-2):
When a value with integer type is converted to another integer type… if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
So to convert -1 to an unsigned type, you add one more than the maximum representable value to it, and that result must be the maximum representable value. (The standard makes it clear that the meaning of "repeatedly adding or subtracting" is mathematical.)
Now, if you knew that the number of padding bits in the signed type was the same as the number of padding bits in the unsigned type [but see below], you could compute the largest representable signed value from the largest representable unsigned value:
(mysint)( (myuint)(-1) / (myuint)2 )
Unfortunately, that's not enough to compute the minimum representable signed value, because the standard permits the minimum to be either one less than the negative of the maximum (2's-complement representation) or exactly the negative of the maximum (1's-complement or sign/magnitude representations).
Moreover, the standard does not actually guarantee that the number of padding bits in the signed type is the same as the number of padding bits in the unsigned type. All it guarantees is that the number of value bits in the signed type be no greater than the number of value bits in the unsigned type. In particular, it would be legal for the unsigned type to have one more padding bit than the corresponding signed type, in which case they would have the same number of value bits and the maximum representable values would be the same. [Note: a value bit is neither a padding bit nor the sign bit.]
In short, if you knew (for example by being told) that the architecture were 2's-complement and that corresponding signed and unsigned types had the same number of padding bits, then you could certainly compute both signed min and max:
myuint max_myuint = (myuint)(-1);
mysint max_mysint = (mysint)(max_myuint / (myuint)2);
mysint min_mysint = (-max_mysint) - (mysint)1;
Finally, casting an out-of-range unsigned integer to a signed integer is not undefined behaviour, although most other signed overflows are. The conversion, as indicated by §6.3.1.3/3, is implementation-defined behaviour:
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
Implementation-defined behaviour is required to be documented by the implementation. So, suppose we knew that the implementation was gcc. Then we could examine the gcc documentation, where we would read the following, in the section "C Implementation-defined behaviour":
Whether signed integer types are represented using sign and
magnitude, two's complement, or one's complement, and whether the
extraordinary value is a trap representation or an ordinary value
(C99 6.2.6.2).
GCC supports only two's complement integer types, and all bit
patterns are ordinary values.
The result of, or the signal raised by, converting an integer to a
signed integer type when the value cannot be represented in an
object of that type (C90 6.2.1.2, C99 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo
2^N to be within range of the type; no signal is raised.
Knowing that signed integers are 2s-complement and that unsigned to signed conversions will not trap, but will produce the expected pattern of low-order bits, we can find the maximum and minimum values for any signed type starting with the maximum representable value for the widest unsigned type, uintmax_t:
uintmax_t umax = (uintmax_t)(-1);
while ( (mysint)(umax) < 0 ) umax >>= 1;
mysint max_mysint = (mysint)(umax);
mysint min_mysint = (-max_mysint) - (mysint)1;
This is a suggestion for getting the MAX value of a specific type, set with a typedef, without using any library:
typedef signed int mysint;

mysint size;                                    // number of value bits in the type
size = sizeof(mysint) * (mysint)8 - (mysint)1;  // one bit is reserved for the sign
                                                // (assumes 8-bit bytes, no padding bits)
mysint max = 1;                                 // start with the lowest bit set
while (--size)
{
    mysint temp;
    temp = (max << (mysint)1) | (mysint)1;      // shift in another 1 bit
    max = temp;
}
// max now contains the max value of the type mysint
If you assume eight-bit chars and a two's complement representation (both reasonable on all modern hardware, with the exception of some embedded DSP stuff), then you just need to form an unsigned integer (use uintmax_t to make sure it's big enough) with sizeof(mysint)*8 - 1 1's in the bottom bits, then cast it to mysint. For the minimum value, negate the maximum value and subtract one.
If you don't want to assume those things, then it's still possible, but you'll need to do some more digging through limits.h to compensate for the size of chars and the sign representation.
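A sketch of that approach, assuming 8-bit chars, two's complement, and no padding bits (the macro names are just illustrative):

#include <stdint.h>

typedef signed int mysint;    /* assumed typedef from the question */

/* sizeof(mysint)*8 - 1 one-bits in the bottom of a uintmax_t, cast to mysint. */
#define MYSINT_MAX ((mysint)(((uintmax_t)1 << (sizeof(mysint) * 8 - 1)) - 1))
#define MYSINT_MIN (-MYSINT_MAX - 1)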
I guess this should work irrespective of negative number representation:
// MSB is 1 and the rest are zero: this is the minimum number in both 2's
// and 1's complement representations.
mysint min = (1 << (sizeof(mysint) * 8 - 1));
mysint max = ~min;
When you cast a character to an int in C, what exactly is happening? Since characters are one byte and ints are four, how are you able to get an integer value for a character? Is it the bit pattern that is treated as a number. Take for example the character 'A'. Is the bit pattern 01000001 (i.e 65 in binary)?
char and int are both integer types.
When you convert a value from any arithmetic (integer or floating-point) type to another arithmetic type, the conversion preserves the value whenever possible. Arithmetic conversions are always defined in terms of values, not representations (though some of the rules are designed to be simply implemented on most hardware).
In your case, you might have:
char c = 'A';
int i = c;
c is an object of type char with the value 65 (assuming an ASCII representation). The conversion from char to int yields an int with the value 65. The compiler generates whatever code is necessary to make that happen; in terms of representation, it could either sign-extend or pad with 0 bits.
This applies when the value of the source expression can be represented as a value of the target type. For a char to int conversion, that's (almost) always going to be the case. For some other conversions, there are various rules for what to do when the value won't fit:
For any conversion to or from floating-point, if the value is out of range the behavior is undefined ((int)1.0e100 may yield some arbitrary value or it can crash your program), and if it's within range but inexact it's approximated by rounding or truncation;
For conversion of a signed or unsigned integer to an unsigned integer, the result is wrapped (e.g., (unsigned)-1 == UINT_MAX);
For conversion of a signed or unsigned integer to a signed integer, the result is implementation-defined (wraparound semantics are common) -- or an implementation-defined signal can be raised.
(Floating-point conversions also have to deal with precision.)
Other than converting integers to unsigned types, you should generally avoid out-of-range conversions.
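As a quick illustration of those rules (the last conversion assumes 32-bit int, and its result is implementation-defined):

#include <stdio.h>

int main(void)
{
    char c = 'A';
    int i = c;                   /* value-preserving: 65 with ASCII */
    unsigned u = (unsigned)-1;   /* well-defined wraparound: equals UINT_MAX */
    int j = (int)3000000000u;    /* out of range for 32-bit int: implementation-defined */
    printf("%d %u %d\n", i, u, j);
    return 0;
}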
Incidentally, though int may happen to be 4 bytes on your system, it could be any size as long as it's able to represent values from -32767 to +32767. The ranges of the various integer types, and even the number of bits in a byte, are implementation-defined (with some restrictions imposed by the standard). 8-bit bytes are almost universal. 32-bit int is very common, though older systems commonly had 16-bit int (and I've worked on systems with 64-bit int).
This C code tries to find the absolute value of a negative number but the output also is negative. Can anyone tell me how to overcome this?
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <inttypes.h>
int main() {
    int64_t a = 0x8000000000000000;
    a = llabs(a);
    printf("%" PRId64 "\n", a);
    return 0;
}
Output
-9223372036854775808
UPDATE:
Thanks for all your answers. I understand that this is a non-standard value and that is why I am unable to perform an absolute operation on it. However, I did encounter this in an actual codebase that is a Genetic Programming simulation. The "organisms" in this do not know about the C standard and insist on generating this value :) Can anyone tell me an efficient way of working around this? Thanks again.
If the result of llabs() cannot be represented in the type long long, then the behaviour is undefined. We can infer that this is what's happening here: the out-of-range constant 0x8000000000000000 becomes the value -9223372036854775808 when converted to int64_t, and your long long is 64 bits wide, so the value 9223372036854775808 is unrepresentable.
In order for your program to have defined behaviour, you must ensure that the value passed to llabs() is not less than -LLONG_MAX. How you do this is up to you - either modify the "organisms" so that they cannot generate this value (e.g. filter out those that create the out-of-range value as immediately unfit) or clamp the value before you pass it to llabs().
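For example, a small clamping wrapper (the name safe_llabs is just illustrative) could look like this:

#include <stdlib.h>
#include <limits.h>

/* Clamp before calling llabs() so the result is always representable.
   On 2's-complement systems, LLONG_MIN is the only problematic input. */
static long long safe_llabs(long long v)
{
    if (v < -LLONG_MAX)
        v = -LLONG_MAX;
    return llabs(v);
}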
Basically, you can't.
The range of representable values for int64_t is -2^63 to +2^63 - 1. (And the standard requires int64_t to have a pure 2's-complement representation; if that's not supported, an implementation just won't define int64_t.)
That extra negative value has no corresponding representable positive value.
So unless your system has an integer type bigger than 64 bits, you're just not going to be able to represent the absolute value of 0x8000000000000000 as an integer.
In fact, your program's behavior is undefined according to the ISO C standard. Quoting section 7.22.6.1 of the N1570 draft of the 2011 ISO C standard:
The abs, labs, and llabs functions compute the absolute
value of an integer j. If the result cannot be represented, the
behavior is undefined.
For that matter, the result of
int64_t a = 0x8000000000000000;
is implementation-defined. Assuming long long is 64 bits, that constant is of type unsigned long long. It's implicitly converted to int64_t. It's very likely, but not guaranteed, that the stored value will be -2^63, or -9223372036854775808. (It's even permitted for the conversion to raise an implementation-defined signal, but that's not likely.)
(It's also theoretically possible for your program's behavior to be merely implementation-defined rather than undefined. If long long is wider than 64 bits, then the evaluation of llabs(a) is not undefined, but the conversion of the result back to int64_t is implementation-defined. In practice, I've never seen a C compiler with long long wider than 64 bits.)
If you really need to represent integer values that large, you might consider a multi-precision arithmetic package such as GNU GMP.
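For instance, a minimal GMP sketch (assuming long is 64 bits, since mpz_init_set_si takes a long; compile with -lgmp):

#include <gmp.h>
#include <stdint.h>

int main(void)
{
    int64_t a = INT64_MIN;
    mpz_t big;
    mpz_init_set_si(big, a);    /* exact, assuming long is 64 bits */
    mpz_abs(big, big);          /* |INT64_MIN| = 9223372036854775808 fits fine */
    gmp_printf("%Zd\n", big);
    mpz_clear(big);
    return 0;
}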
0x8000000000000000 is the smallest number that can be represented by a signed 64-bit integer. Because of quirks in two's complement, this is the only 64-bit integer with an absolute value that cannot be represented as a 64-bit signed integer.
This is because 0x8000000000000000 = -2^63, while the maximum representable 64-bit integer is 0x7FFFFFFFFFFFFFFF = 2^63-1.
Because of this, taking the absolute value of this is undefined behaviour that will generally result in the same value.
A signed 64-bit integer ranges from −(2^63) to 2^63 − 1. The absolute value of 0x8000000000000000, or −(2^63), is 2^63, which is bigger than the maximum 64-bit signed integer.
A signed integer with its highest bit set and all other bits clear has an absolute value that is not representable in the same type.
Observe an 8-bit integer
int8_t x = 0x80; // binary 1000_0000, decimal -128
An 8-bit signed integer can hold values between -128 and +127 inclusive, so the value +128 is out of range.
For a 16-bit integer this holds as well:
int16_t y = 0x8000; // binary 1000_0000_0000_0000, decimal -32,768
A 16-bit integer can hold values between -32,768 and +32,767 inclusive.
This pattern holds for any size integer as long as it is represented in two's complement, as is the de-facto representation for integers in computers. Two's complement holds 0 as all bits low and -1 as all bits high.
So an N-bit signed integer can hold values between -2^(N-1) and 2^(N-1)-1 inclusive, while an N-bit unsigned integer can hold values between 0 and 2^N-1 inclusive.
Interestingly:
int64_t value = std::numeric_limits<int64_t>::max();
std::cout << abs(value) << std::endl;
yields a value of 1 on gcc-9.
Frustrating!