Can someone explain what maxBit is? - c

I am trying to understand what is maxBit in the following and what it represents?
When I print min and max, I get numbers that make no sense to me.
Thank you.
#include <stdio.h>
#include <math.h>

int main() {
    union { double a; size_t b; } u;
    u.a = 12345;
    size_t max = u.b;
    u.a = 6;
    size_t min = u.b;
    int maxBit = floor(log(max - min) / log(2));
    printf("%d", maxBit);
    return 0;
}

This code appears to be using a horrible kludge. I am one of the more welcoming participants here when it comes to tolerating code that uses compiler extensions or other things beyond the C standard, but this code does unnecessary things for no apparent good purpose. It relies on size_t being 64 bits. It may be 64 bits in some specific C implementation this was written for, but that is not portable, and C implementations that use a 64-bit size_t are generally modern, and modern implementations ought to support the uint64_t of <stdint.h>, which would be an appropriate type for this. So better code would have used uint64_t.
Unless there is some quite surprising motivation for this and other issues in the code, it is low quality, bad code. Do not use it, and regard any code from the same source with skepticism.
That said, the code likely assumes the IEEE-754 binary64 is used for double, and max-min gives the difference between the representations of 12345 and 6. log(max-min) / log(2) finds the base-two-logarithm of max-min, and the integer portion of that will be the index of the highest bit that changed. For 12345, the exponent field is 1036. For 6, the exponent field is 1025. The difference is 11 (binary 1011), in which the first set bit is bit 3 of the exponent field. The field runs from bits 62 to 52 in the binary64 format, so bit 3 in the exponent field is bit 55 (52+3) in the whole 64 bits of the representation. So maxBit will be 55. However, there is no apparent significance to this. There is no great value in knowing that bit 55 is the highest bit set in the difference between the representations of 12345 and 6. I am familiar with a variety of IEEE-754 bit-twiddling hacks, and I do not recognize this. I expect nobody can tell you much more about this without context, such as where the code came from or how it is used.
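Here is a minimal sketch of what this answer describes, using uint64_t and memcpy instead of the union (it assumes IEEE-754 binary64 doubles, as above; the variable names are just for illustration):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

int main(void) {
    double a = 12345, b = 6;
    uint64_t max, min;
    memcpy(&max, &a, sizeof max);   /* raw representation of 12345.0 */
    memcpy(&min, &b, sizeof min);   /* raw representation of 6.0 */

    printf("max = %llu (0x%016llX)\n", (unsigned long long)max, (unsigned long long)max);
    printf("min = %llu (0x%016llX)\n", (unsigned long long)min, (unsigned long long)min);

    uint64_t diff = max - min;
    int maxBit = (int)floor(log((double)diff) / log(2));
    printf("highest changed bit: %d\n", maxBit);   /* 55 on such a platform */
    return 0;
}

On a typical implementation this prints the representations 4668012349850910720 and 4618441417868443648 and then 55, matching the explanation above.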

From the C17 document, 6.5.2.3 Structure and union members, footnote 97:
If the member used to read the contents of a union object is not the
same as the member last used to store a value in the object, the
appropriate part of the object representation of the value is
reinterpreted as an object representation in the new type as described
in 6.2.6 (a process sometimes called “type punning”). This might be a
trap representation.
Therefore, when you store u.a = 12345 and then access size_t max = u.b, the bit pattern in the memory of u.a is reinterpreted as a size_t. Since u.a is a double, it is represented in IEEE 754 format.
The values stored in max and min are:
4668012349850910720 (0100000011001000000111001000000000000000000000000000000000000000-> IEEE754)
4618441417868443648 (0100000000011000000000000000000000000000000000000000000000000000-> IEEE754)
Then max - min = 49570931982467072, log(max-min)/log(2) = 55.460344, and floor(55.460344) = 55. That is why the output is 55.
PS: There are two common IEEE 754 binary formats: single precision (32-bit) and double precision (64-bit). Please visit this website IEEE754 for more details.

Related

Can a type in C have more than one object representation?

The C99 standard, section 6.2.6.1 8, states:
Where an operator is applied to a value that has more than one object
representation, which object representation is used shall not affect
the value of the result (43). Where a value is stored in an object using a
type that has more than one object representation for that value, it
is unspecified which representation is used, but a trap representation
shall not be generated.
I understood object to mean a location (bytes) in memory and value as the interpretation of those bytes based on the type used to access it. If so, then:
How can a value have more than one object representation?
How can a type have more than one object representation for a value?
The standard adds a footnote as well, but it's still not clear to me. Can someone please simplify it for me and explain with examples?
An object is a region of storage (memory) that can contain values of a certain type [C18 3.15].
The object representation is the set of bytes that make up the contents of an object [C18 6.2.6.1].
Not every possible combination of bytes in an object representation has to correspond to a value of the type (an object representation that doesn't is called a trap representation [C18 3.19.4]).
And not all the bits in an object representation have to participate in representing a value. Consider the following type:
struct A
{
    char c;
    int n;
};
Compilers are allowed to (and generally will) insert padding bytes between the members c and n of this struct to ensure correct alignment of n. These padding bytes are part of an object of type struct A. They are, thus, part of the object representation. But the values of these padding bytes do not have any effect on the logical value of type struct A that is stored in the object.
Let's say we're on a target platform where bytes consist of 8 bits, an int consists of 4 bytes in little-endian order, and there are 3 padding bytes between c and n to ensure that n starts at an offset that is a multiple of 4. The value (struct A){42, 1} may be stored in an object as
2A 00 00 00 01 00 00 00
But it may as well be stored in an object as
2A FF FF FF 01 00 00 00
or whatever else the padding bytes may happen to be. Each of these sequences of bytes is a valid object representation of the same logical value of type struct A.
This is also what the footnote is about. If you had two objects x and y of a type for which == is defined that each contained a different object representation of the same value, then x == y would evaluate to true while memcmp(&x, &y, sizeof x) would not return 0, since memcmp() simply compares the bytes of the object representations without any consideration of what logical value is actually stored in these objects…
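As a minimal sketch of the padding point (the contents of the padding bytes are unspecified, so what this prints depends on the implementation; the memset calls just make differing padding likely):

#include <stdio.h>
#include <string.h>

struct A {
    char c;
    int n;
};

int main(void) {
    struct A x, y;
    /* Fill the storage with different garbage first so any padding
       bytes are likely to differ between the two objects. */
    memset(&x, 0x00, sizeof x);
    memset(&y, 0xFF, sizeof y);

    x.c = 42; x.n = 1;
    y.c = 42; y.n = 1;

    /* Both objects hold the same logical value of struct A ... */
    printf("members equal: %d\n", x.c == y.c && x.n == y.n);

    /* ... but their object representations may differ in the padding bytes. */
    printf("memcmp == 0:   %d\n", memcmp(&x, &y, sizeof x) == 0);

    /* Dump the raw bytes of both object representations side by side. */
    const unsigned char *px = (const unsigned char *)&x;
    const unsigned char *py = (const unsigned char *)&y;
    for (size_t i = 0; i < sizeof x; i++)
        printf("%02X %02X\n", px[i], py[i]);
    return 0;
}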
How can a value have more than one object representation?
How can a type have more than one object representation for a value?
Yes: by not having each bit pattern correspond to a different value.
Typically one bit pattern is preferred (the canonical form), and the others are rarely generated by normal means.
The x86 extended precision format contains bit patterns that represent the same value as other bit patterns, even with the same sign. Research the "pseudo denormal" and "unnormal" bit patterns.
A side effect is that this 80-bit encoding does not realize 2^80 different values, due to this redundancy (even after accounting for not-a-numbers).
Using 2 double to encode a long double has a similar impact.
Oversimplified example of 2 doubles representing one long double value:
1000001.0 + 0.0 (canonical form) has the same value as 1000000.0 + 1.0
Decimal floating point has this issue too.
Because the significand is not normalized, most values with less than 16 significant digits have multiple possible representations; 1×10^2 = 0.1×10^3 = 0.01×10^4, etc.
As multiple bit patterns for the same value reduce the gamut of possible numbers, such encodings tend to fall out of favor compared to non-redundant ones. An effect is that we do not see them as much these days.
A reason for their existence in the first place was to facilitate hardware realizations or to keep the format simple to define ("let's explicitly encode the most significant digit for our new FP format; using an implied one is so confusing").
@Eric brought up an interesting comment concerning value and operator that hinges on:
Where an operator is applied to a value that has more than one object representation, which object representation is used shall not affect the value of the result. C17 § 6.2.6.1 ¶8
Given x = +0.0 and y = -0.0, which have the same numeric value of zero, they would still qualify as having different values, since the operator / distinguishes them, as in 1.0/x != 1.0/y.
Still the various FP examples above have many other cases where x,y have different bit pattern yet the same value.
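A minimal sketch of that zero example, assuming IEEE-754 semantics where 1.0/+0.0 is +infinity and 1.0/-0.0 is -infinity:

#include <stdio.h>

int main(void) {
    double x = +0.0;
    double y = -0.0;

    /* The two zeros compare equal ... */
    printf("x == y     : %d\n", x == y);                    /* 1 */

    /* ... yet the division operator can still distinguish them. */
    printf("1.0/x      : %f\n", 1.0 / x);                   /* inf */
    printf("1.0/y      : %f\n", 1.0 / y);                   /* -inf */
    printf("1/x != 1/y : %d\n", (1.0 / x) != (1.0 / y));    /* 1 */
    return 0;
}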
explain with examples?
For example, on a compiler that uses decimal floating point (per the IEEE 754-2008 standard) to represent the float type, and assuming that the stars are properly aligned (CHAR_BIT == 8, sizeof(int) == 4, floats are 32 bits wide with no padding bits, and the compiler is little-endian), the following code (tested with gcc 9.2 with -Dfloat=typeof(1.0df)):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    float a, b;
    // simulate `a = 314` and `b = 314` with a compiler
    // that chose to use different object representations for the same value
    memcpy(&a, (int[1]){0x3280013a}, 4);
    memcpy(&b, (int[1]){0x32000c44}, 4);
    printf("a = %d, b = %d\n", (int)a, (int)b);
    printf("a %s b and memcmp(&a, &b, sizeof a) %s 0\n",
           a == b ? "==" : "!=",
           memcmp(&a, &b, sizeof(a)) == 0 ? "==" : "!=");
}
should (could) output:
a = 314, b = 314
a == b and memcmp(&a, &b, sizeof a) != 0
A simple example of a value with more than one representation is an IEEE floating-point zero. It has "positive zero" and "negative zero" representations.
Note: an implementation that conforms to IEC 60559 must distinguish between positive and negative zeros, so in such an implementation they are different values rather than different representations of the same value. However, an implementation doesn't need to conform to IEC 60559. Such implementations are allowed to e.g. always return the same value for signbit of zero, even though the underlying hardware distinguishes +0 and -0.
On a sign-and-magnitude machine, integer zeros also have more than one representation.
On a segmented architecture like the 16-bit 8086, "long" pointers have more than one representation, for example 0x0000:0x0010 and 0x0001:0x0000 are two representations of the same pointer value.
Finally, in any data type with padding, padding bits do not influence the value. Examples include structs with padding holes.

Any precision loss when converting float64 to uint64 in C? assuming only the whole number part of the data is meaningful

I have a counter field from one TCP protocol which keeps track of samples sent out. It is defined as float64. I need to translate this protocol to a different one, in which the counter is defined as uint64.
Can I safely store the float64 value in the uint64 without losing any precision on the whole number part of the data? Assuming the fractional part can be ignored.
EDIT: Sorry if I didn't describe it clearly. There is no code. I'm just looking at two different protocol documentations to see if one can be translated to the other.
Please treat float64 as double, the documentation isn't well written and is pretty old.
Many thanks.
I am assuming you are asking about 64 bit floating point values such as IEEE Binary64 as documented in https://en.wikipedia.org/wiki/Double-precision_floating-point_format .
Converting a double represented as a 64-bit IEEE floating-point value to a uint64_t will not lose any precision in the integral part of the value, as long as the value itself is non-negative and less than 2^64. But if the value is larger than 2^53, the representation as a double does not allow complete precision, so whatever computation led to the value probably was inaccurate anyway.
Note that the reverse is not true: a double has only 53 bits of significand precision, less than the 64 bits of a uint64_t, so close but distinct large integer values will convert to the same double value.
Note that a counter implemented as a 64-bit double is intrinsically limited by the precision of the floating-point type. Incrementing a value larger than 2^53 by one is likely to have no effect at all. Using a floating-point type to implement a packet counter seems a very bad idea. Using a uint64_t counter directly seems a safer bet. You only have to worry about wrap-around at 2^64, a condition that you can check for in the unlikely case where you would actually expect to count that far.
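To illustrate the point about incrementing beyond 2^53, a small sketch (assuming IEEE-754 binary64 for double):

#include <stdio.h>

int main(void) {
    double counter = 9007199254740992.0;   /* 2^53 */
    printf("%.1f\n", counter);             /* 9007199254740992.0 */
    printf("%.1f\n", counter + 1.0);       /* still 9007199254740992.0: the
                                              increment is lost to rounding */
    return 0;
}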
If you cannot change the format, verify that the floating point value is within range for the conversion and store an appropriate value if it is not:
#include <stdint.h>
...
double v = get_64bit_value();
uint64_t result;

if (v < 0) {
    result = 0;
} else if (v >= (double)UINT64_MAX) {
    result = UINT64_MAX;
} else {
    result = (uint64_t)v;
}
Yes, information can be lost: negative numbers cannot be converted properly to uint64 (as that type is unsigned), and neither can numbers greater than 2^64-1. In all other cases the conversion of the whole-number part is exact (provided you regard the float64 value as exact and the conversion rounds correctly).

Max value of datatypes in C

I am trying to understand the maximum value that I can store in C. I tried doing printf("%f", pow(2, x)). The answer holds good until x = 1023. It says Inf when x = 1024.
I am sorry that it is a basic question but I am trying to understand how C assigns datatypes' sizes based on my machine.
I have a Mac (64-bit processor). A clear understanding that I have is that my processor being a 64-bit one, it will be able to do calculations up to the value 2^64. Clearly pow(2, 1023) is greater than that. But my program is working fine till x = 1023. How is this possible? Does the GNU compiler have something to do with this?
If this is a duplicate of other question kindly give the link.
In C the pow() function returns a double, and the double type is typically a 64-bit IEEE-format representation of a floating-point number.
The basic idea of floating point is to express a number in the same general way as e.g. 1.234×10^56. Here you have a mantissa 1.234 and an exponent 56. C and C++ allow this kind of decimal exponent notation for floating-point literals (but not for integer literals), but in practice the internal representation will be binary, with a power of 2 rather than a power of 10.
The limit you ran up against was the supported range for the exponent in your compiler's representation of double numbers; probably 64-bit IEEE 754.
The limits of the various built-in integral numerical types are available as symbolic constants from <limits.h>. The limits of the built-in floating point types are available as symbolic constants from <float.h>. See the table over at cppreference.com for more details.
In C++ these limits are also available via the numeric_limits class template from <limits>.
"64-bit processor" typically means that it can deal with integers that contain at most 64 bits at a time (i.e. in a single instruction), not that it can only process numbers with 64 binary digits or less. Using arbitrary precision arithmetic you can do calculations on numbers that are arbitrarily large, provided that you have enough memory (and time), just like how us humans can do operations on big values with only 10 fingers. Read more here: What is the biggest number you can generate using a 64-bit processor?
However pow(2, 1023) is a little bit different. It's not an integer but a floating-point number (of type double in C) represented by a sign, a mantissa and an exponent like this (-1)sign × 1 × 21023. Not all the digits are stored so it's only accurate to the first few digits. However most systems use binary floating-point types so they can store the precise value of a power of 2 up to a large exponent depending on the exponent range. Most modern systems' floating-point types conform to IEEE-754 standard with double maps to binary64/double precision, therefore the maximum value will be
21023 × (1 + (1 − 2−52)) ≈ 1.7976931348623157 × 10308
The maximum value for a double is DBL_MAX. This is defined by <float.h> in C, or <cfloat> in C++. The numeric value may vary across systems, but you can always refer to it by the macro DBL_MAX.
You can print this:
printf("%f\n", DBL_MAX);
The integer data types all have similar macros defined in <limits.h>: e.g. ULLONG_MAX is the biggest value for unsigned long long. If printing with printf make sure to use the correct format specifier.
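For instance, a small sketch printing a few of these limits with matching format specifiers:

#include <stdio.h>
#include <limits.h>
#include <float.h>

int main(void) {
    printf("INT_MAX    = %d\n", INT_MAX);       /* %d for int */
    printf("ULLONG_MAX = %llu\n", ULLONG_MAX);  /* %llu for unsigned long long */
    printf("DBL_MAX    = %e\n", DBL_MAX);       /* %e keeps the output readable */
    return 0;
}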

A small program for understanding unions in C [duplicate]

Suppose I define a union like this:
#include <stdio.h>

int main() {
    union u {
        int i;
        float f;
    };
    union u tst;
    tst.f = 23.45;
    printf("%d\n", tst.i);
    return 0;
}
Can somebody tell me what the memory where tst is stored will look like?
I am trying to understand the output 1102813594 that this program produces.
It depends on the implementation (compiler, OS, etc.) but you can use the debugger to actually see the memory contents if you want.
For example, in my MSVC 2008:
0x00415748 9a 99 bb 41
is the memory contents. Read from LSB on the left side (Intel, little-endian machine), this is 0x41bb999a or indeed 1102813594.
Generally, however, the integer and the float are stored in the same bytes. Depending on how you access the union, you get the integer or the floating-point interpretation of those bytes. The size of the memory space, again, depends on the implementation, although it's usually the size of the largest member, aligned to some fixed boundary.
Why is the value what it is in your (or my) case? You should read about floating-point number representation for that (look up IEEE 754).
The result depends on the compiler implementation, but for most x86 compilers, float and int will be the same size. Wikipedia has a pretty good diagram of the layout of a 32-bit float, http://en.wikipedia.org/wiki/Single_precision_floating-point_format, which can help explain 1102813594.
If you print out the int as a hex value, it will be easier to figure out.
printf("%x\n", tst.i);
With a union, both variables are stored starting at the same memory location. A float is stored in an IEEE format (IEEE 754, as pointed out by others). It is not stored as a two's complement integer; it is a normalized floating-point number made up of a sign, a biased exponent, and a significand (for normal numbers the significand is always between 1 and 2).
You are taking the 4 bytes of that number and reinterpreting them as an int (you can look up exactly which bits go where in the 32 bits that a float takes up). So the result basically means nothing as an int, and it isn't useful unless you know why you would want to do something like that; usually, a float and int combo isn't very useful.
And it is not entirely up to the compiler: virtually every implementation stores float in the IEEE 754 single-precision format, although the C standard itself does not strictly require it.
In a union, the members share the same memory, so we can read the float value back as an integer value.
The floating-point storage format is different from the integer format, and the union lets us observe that difference.
For example:
If I store the integer value 12 in 32 bits, I can read those same 32 bits back in floating-point format.
A float is stored as a sign (1 bit), an exponent (8 bits) and a significand (23 bits).
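A small sketch along these lines, pulling apart the sign, exponent and significand fields of 23.45f (it assumes a 32-bit IEEE-754 float and uses memcpy rather than a union, which is equally valid in C):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float f = 23.45f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);         /* copy the object representation */

    uint32_t sign     = bits >> 31;
    uint32_t exponent = (bits >> 23) & 0xFF;
    uint32_t mantissa = bits & 0x7FFFFF;

    printf("raw bits: 0x%08X (%u as an int)\n", (unsigned)bits, (unsigned)bits);
    printf("sign=%u exponent=%u (unbiased %d) mantissa=0x%06X\n",
           (unsigned)sign, (unsigned)exponent, (int)exponent - 127, (unsigned)mantissa);
    return 0;
}

On a typical platform this prints the raw bits 0x41BB999A, i.e. the 1102813594 seen above.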
I wrote a little program that shows what happens when you preserve the bit pattern of a 32-bit float into a 32-bit integer. It gives you the exact same output you are experiencing:
#include <iostream>

int main()
{
    float f = 23.45;
    int x = *reinterpret_cast<int*>(&f);
    std::cout << x; // 1102813594
}

How to get the upper-/lower machine-word of a double according to IEEE 754 (ansi-c)?

I want to use the sqrt implementation of fdlibm.
This implementation defines (according to the endianness) some macros for accessing the lower/upper 32 bits of a double in the following way (here: only the little-endian version):
#define __HI(x) *(1+(int*)&x)
#define __LO(x) *(int*)&x
#define __HIp(x) *(1+(int*)x)
#define __LOp(x) *(int*)x
The readme of fdlibm says the following (slightly shortened):
Each double precision floating-point number must be in IEEE 754
double format, and that each number can be retrieved as two 32-bit
integers through the using of pointer bashing as in the example
below:
Example: let y = 2.0
double fp number y: 2.0
IEEE double format: 0x4000000000000000
Referencing y as two integers:
*(int*)&y,*(1+(int*)&y) = {0x40000000,0x0} (on sparc)
{0x0,0x40000000} (on 386)
Note: Four macros are defined in fdlibm.h to handle this kind of
retrieving:
__HI(x) the high part of a double x
(sign,exponent,the first 21 significant bits)
__LO(x) the least 32 significant bits of x
__HIp(x) same as __HI except that the argument is a pointer
to a double
__LOp(x) same as __LO except that the argument is a pointer
to a double
If the behavior of pointer bashing is undefined, one may hack on the
macro in fdlibm.h.
I want to use this implementation and these macros with the cbmc model checker, which is supposed to conform to ANSI C.
I don't know exactly what's wrong, but the following example shows that these macros aren't working (little-endian, 32-bit machine word):
temp=24376533834232348.000000l (0100001101010101101001101001010100000100000000101101110010000111)
high=0 (00000000000000000000000000000000)
low=67296391 (00000100000000101101110010000111)
Both seem to be wrong. High seems to be empty for every value of temp.
Any ideas for accessing both 32-bit words in ANSI C?
UPDATE: Thanks for all your answers and comments. All of your proposals worked for me. For the moment I decided to use "R.."'s version and marked it as the accepted answer because it seems to be the most robust in my tool regarding endianness.
Why not use a union?
union {
    double value;
    struct {
        int upper;
        int lower;
    } words;
} converter;

converter.value = 1.2345;
printf("%d", converter.words.upper);
(Note that the behaviour of this code is implementation-dependent and relies on the internal representation and specific data sizes. In particular, on a little-endian machine the member declared first overlays the low-order half of the double, so upper and lower end up swapped relative to their names.)
On top of that, if you make that struct contain bitfields, you can access the individual floating-point parts (sign, exponent and mantissa) separately:
union {
    double value;
    struct {
        int upper;
        int lower;
    } words;
    struct {
        long long mantissa : 52; // not 2C!
        int exponent : 11;       // not 2C!
        int sign : 1;
    };
} converter;
Casting pointers like you're doing violates the aliasing rules of the C language (pointers of different types may be assumed by the compiler not to point to the same data, except in certain very restricted cases). A better approach might be:
#define REP(x) ((union { double v; uint64_t r; }){ x }).r
#define HI(x) (uint32_t)(REP(x) >> 32)
#define LO(x) (uint32_t)(REP(x))
Note that this also fixes the endian dependency (assuming the floating-point and integer endianness are the same) and the illegal __ prefix on the macro names.
An even better way might be not breaking it into high/low portions at all, and using the uint64_t representation REP(x) directly.
From a standards perspective, this use of unions is a little bit suspect, but better than the pointer casts. Using a cast to unsigned char * and accessing the data byte-by-byte would be better in some ways, but worse in that you have to worry about endian considerations, and probably a lot slower..
I would suggest taking a look at the disassembly to see exactly why the existing "pointer-bashing" method does not work. In its absence, you might use something more traditional like a binary shift (if you're on a 64-bit system).
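For reference, a minimal, self-contained usage sketch of the union-based REP/HI/LO macros from the answer above (it assumes IEEE-754 binary64 doubles and uses a designated initializer, which requires C99 or later):

#include <stdio.h>
#include <stdint.h>

#define REP(x) ((union { double v; uint64_t r; }){ .v = (x) }).r
#define HI(x)  (uint32_t)(REP(x) >> 32)
#define LO(x)  (uint32_t)(REP(x))

int main(void) {
    double y = 2.0;   /* IEEE-754 binary64 representation: 0x4000000000000000 */
    printf("REP = 0x%016llX\n", (unsigned long long)REP(y));
    printf("HI  = 0x%08lX, LO = 0x%08lX\n", (unsigned long)HI(y), (unsigned long)LO(y));
    return 0;
}

Expected output on such a platform: REP = 0x4000000000000000, HI = 0x40000000, LO = 0x00000000, matching the fdlibm readme's example for y = 2.0.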
