Explain how a specific C #define works

I have been looking at some of the code at http://www.netlib.org/fdlibm/ to see how some functions work. I was looking at the code for e_log.c, and in some parts of the code it says:
hx = __HI(x); /* high word of x */
lx = __LO(x); /* low word of x */
The code for __HI(x) and __LO(x) is:
#define __HI(x) *(1+(int*)&x)
#define __LO(x) *(int*)&x
which I really don't understand because I am not familiar with this type of C. Can someone please explain to me what __HI(x) and __LO(x) are doing?
Also later in the code for the function there is a statement:
__HI(x) = hx|(i^0x3ff00000);
Can someone please explain to me:
how is it possible to make a function equal to something (I generally work with python so I don't really know what is going on)?
what are __HI(x) and __LO(x) doing?
what does the program mean by "high word" and "low word" of x?
The final purpose of my analysis is understanding this code in order to port it into a Python implementation

These macros use compiler-dependent properties to access the representations of double types.
In C, all objects other than bit-fields are represented as sequences of bytes. The fdlibm code you are looking at is designed for implementations where int is four bytes and the double type is represented using eight bytes in a format defined by the IEEE-754 floating-point specification. That format is called binary64 or IEEE-754 basic 64-bit binary floating-point. It is also designed for an implementation where the C compiler guarantees that aliasing via pointer conversions is supported. (This is not guaranteed by the C standard, but C implementations may support it.)
Consider a double object named x. Given these macros:
#define __HI(x) *(1+(int*)&x)
#define __LO(x) *(int*)&x
When __LO(x) is used in source code, it is replaced by *(int*)&x. The &x takes the address of x. The address of x has type double *. The cast (int *) converts this to int *, a pointer to an int. Then * dereferences this pointer, resulting in a reference to the int that is at the low-address part of x.
When __HI(x) is used in the source code, (int*)&x again points to the low-address part of x. Adding 1 changes it to point to the high-address part. Then * dereferences this, resulting in a reference to the int that is at the high-address part.
The routines in fdlibm are special mathematical routines. To operate, they need to examine and modify the bytes that represent double values. The __LO and __HI macros give them this access.
These definitions of __HI and __LO work for implementations that store the double values in little-endian order (with the “least significant” part of the double in the lower-addressed memory location). The fdlibm code may contain alternate definitions for big-endian systems, likely selected by some #if statement.
In the code __HI(x) = hx|(i^0x3ff00000);, the value 0x3ff00000 is a bit mask for the bits that encode the exponent (and part of the significand) of a double value. Without context, we cannot say precisely what is happening here, but the code appears to be merging hx with some value from i. It is likely completing some computation of the bytes representing a new double value it is creating and storing those bytes in the “high” part of a double object.
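For illustration, here is a minimal, self-contained sketch (not part of fdlibm) that uses the same macro definitions on a platform matching the assumptions above (little-endian, 32-bit int, IEEE-754 binary64):

#include <stdio.h>

/* Same definitions as in the fdlibm source (little-endian variant). */
#define __HI(x) *(1+(int*)&x)
#define __LO(x) *(int*)&x

int main(void)
{
    double x = 1.0;                    /* bit pattern 0x3FF0000000000000 */

    int hx = __HI(x);                  /* word holding sign, exponent, top of significand */
    int lx = __LO(x);                  /* word holding the low part of the significand */
    printf("hi = 0x%08x  lo = 0x%08x\n", (unsigned)hx, (unsigned)lx);  /* 3ff00000 00000000 */

    /* Because the macro expands to a dereferenced pointer, it is an lvalue
       and can appear on the left of an assignment; this rewrites only the
       high word, turning 1.0 into 2.0 on the assumed platform. */
    __HI(x) = 0x40000000;
    printf("x = %g\n", x);             /* prints 2 */

    return 0;
}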

I am adding an answer to complement the one already present (not to replace it).
hx = __HI(x); /* high word of x */
lx = __LO(x); /* low word of x */
Comments are useful... even if in this case the macro names could be clear enough. "High" and "low" refer to the two halves of an integer representation, typically a 16- or 32-bit one; for the halves of an 8-bit value the usual term is "nibble".
If we take a 16-bit unsigned integer which can range from 0 to 65535, or in hex 0x0000 to 0xFFFF, for example 0x1234, the two halves are:
0x1234
    ^^------ lower half, or "low"
  ^^-------- upper half, or "high"
Note that "lower" means the less significant part. The correct way to get the two halves, assuming 16 bits, is to make a logical (bitwise) AND with 0xFF to get lo(), and to shift 8 bit right (divide by 256) to get high.
Now, inside a CPU the number 0x1234 is stored in two consecutive memory locations, either as 0x12 then 0x34 on a big-endian machine, or as 0x34 then 0x12 on a little-endian one. Given this, there are other ways to read a single half: reading the correct byte directly from memory, without any calculation. To get the lo() of 0x1234 on a little-endian machine, it is possible to read the single byte at the first location.
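As a toy sketch (not from fdlibm), here are both ways of getting the halves of a 16-bit value:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    uint16_t v = 0x1234;

    /* Arithmetic way: mask for the low half, shift for the high half. */
    uint8_t lo = v & 0xFF;    /* 0x34 */
    uint8_t hi = v >> 8;      /* 0x12 */

    /* Memory way: look at the individual bytes in place. */
    uint8_t bytes[2];
    memcpy(bytes, &v, sizeof v);
    /* On a little-endian machine bytes[0] is 0x34 and bytes[1] is 0x12;
       on a big-endian machine it is the other way around. */
    printf("lo=0x%02X hi=0x%02X  bytes in memory: %02X %02X\n",
           lo, hi, bytes[0], bytes[1]);
    return 0;
}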
From the question:
#define __HI(x) *(1+(int*)&x)
#define __LO(x) *(int*)&x
Both __LO and __HI peek directly into memory instead of using masks and shifts: __LO reads the int stored at the address of x, and __HI reads the int stored just above it. Note that the value being split in two is twice the size of an int: if int is 32 bits, the value being split is 64 bits long (the double). And there is another caveat: those macros can read the halves, but they can also be used to write the two halves separately. In fact, from the question:
__HI(x) = hx|(i^0x3ff00000);
the result is to set only the HI part (upper, most significant half) of x. Note also the value used, 0x3ff00000: it is a 32-bit constant, consistent with x being a 64-bit double whose halves are 32 bits each.
Hope this is clear enough to translate the C to Python. Work with the 64-bit integer representation of the double: when you need the LO() part, use a bitwise AND with 0xFFFFFFFF; to get HI(), shift right by 32 bits or do the equivalent division.
When HI and LO are on the left of an assignment, only that half of the value is written; you should construct the two halves separately and then combine them (add them, or bitwise OR them together).
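For reference, a minimal C sketch of the same split, assuming a 64-bit IEEE double and using memcpy instead of pointer casts (the helper names hi_word/lo_word are made up here):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Copy the 8 bytes of the double into a 64-bit integer, then split that
   integer with a shift and a mask. The shift/mask view matches __HI/__LO
   as long as the integer and the double use the same byte order, which is
   the case on common platforms. */
static uint32_t hi_word(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* no pointer aliasing involved */
    return (uint32_t)(bits >> 32);    /* sign, exponent, top of significand */
}

static uint32_t lo_word(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits & 0xFFFFFFFFu);   /* low half of the significand */
}

int main(void)
{
    printf("hi=0x%08x lo=0x%08x\n", (unsigned)hi_word(1.0), (unsigned)lo_word(1.0));
    return 0;                         /* prints 3ff00000 00000000 */
}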
Hope it helps...

#define A B
is a preprocessor directive that substitutes literal A with literal B all over the source code before the compilation.
#define A(x) B
is a function-like preprocessor macro which uses a parameter x in order to do a parameterized preprocessor substitution. In this case, B can be a function of x as well.
Your macros
#define __HI(x) *(1+(int*)&x)
#define __LO(x) *(int*)&x
// called as
__HI(x) = hx|(i^0x3ff00000);
Since it is just a matter of code substitution, the assignment is perfectly legit. Why? Because in both cases the macro expands to an lvalue (something that can appear on the left of an assignment).
That lvalue is in both cases an object of type int:
take x's address
cast it to a pointer to int
dereference it (in the case of __LO())
add 1 and then dereference it (in the case of __HI())
What it actually points to depends on the architecture, because pointer arithmetic is architecture-dependent. Endianness also has to be taken into account.
What we can say is that they are designed to access the lower and the upper halves of a data type whose size is 2*sizeof(int) (so, if for example int is 32 bits wide, they allow access to the lower 32 bits and to the upper 32 bits). Furthermore, from the macro names we understand that it is a little-endian architecture (the least significant part comes first in memory).
In order to port code containing these macros to Python you will need to do it at a higher level, since Python does not support pointers.
These tips don't solve your specific task, but they give you a working method for this task and similar ones:
A way to understand what a macro does is to check how it is actually expanded by the preprocessor. This can be done on most compilers through the -E compiler option.
Use a debugger to understand the functionality: set a breakpoint just before the use of the macro, and analyze its effects on addresses and variables.


Operating Rightmost/Leftmost n-Bits, Not All the Bits of an Integer Type Data Variable

In a programming task, I have to add a smaller integer in variable B (data type int)
to a larger integer (a 20-digit decimal integer) in variable A (data type long long int),
then compare A with variable C, which is also as large an integer (data type long long int) as A.
What I realized is that, since I add a smaller B to A,
I don't need to check all the digits of A when I compare it with C; in other words, we don't need to check all the bits of A and C.
Given that I know how many bits from the right I need to check, say n bits,
is there a way/technique to check only those specific n bits from the right (not all the bits of A and C) to make the program faster in the C programming language?
Because comparing all the bits takes more time, and since I am working with large numbers, the program becomes slower.
Every time I search on Google, bit-masking appears, which uses all the bits of A and C; that doesn't do what I am asking for, so probably I am not using the correct terminology. Please help.
Addition:
Initial comments on this post made me think there is no way, but I found the following:
Bit Manipulation by University of Colorado Boulder
(#cuboulder, after 7:45)
...the bit band region is accessed via a bit band alias; each bit in a
supported bit band region has its own unique address, and we can access
that bit using a pointer to its bit band alias location. The least
significant bit in an alias location can be set or cleared, and that
will be mapped to the bit in the corresponding data or peripheral
memory. Unfortunately this will not help you if you need to write to
multiple bit locations; memory dependent operations only allow a
single bit to be cleared or set...
Is the above what I am asking for? If yes, then
where can I find the details as a beginner?
Updated question:
Is there a way/technique to check only those specific n bits from the right (not all the bits of A and C), in the C programming language (or any other language), that makes the program faster?
Your assumption that comparing fewer bits is faster might be true in some cases but is probably not true in most cases.
I'm only familiar with x86 CPUs. An x86-64 processor has 64-bit wide registers. These can be accessed as 64-bit registers, but the lower bits can also be accessed as 32-, 16- and 8-bit registers. There are processor instructions which work with the 64-, 32-, 16- or 8-bit part of the registers. Comparing 8 bits is one instruction, but so is comparing 64 bits.
If using the 32 bit comparison would be faster than the 64 bit comparison you could gain some speed. But it seems like there is no speed difference for current processor generations. (Check out the "cmp" instruction with the link to uops.info from #harold.)
If your long long data type is actually bigger than the word size of your processor, then it's a different story. E.g. if your long long is 64 bit but you are on a 32 bit processor, then these comparisons cannot be handled in one register and you would need multiple instructions. So if you know that comparing only the lower 32 bits would be enough, this could save some time.
Also note that comparing only e.g. 20 bits would actually take more time than comparing 32 bits. You would have to compare 32 bits and then mask the 12 highest bits. So you would need a comparison and a bitwise and instruction.
As you see this is very processor specific. And you are at the processor's opcode level. As #RawkFist wrote in his comment, you could try to get the C compiler to create such instructions, but that does not automatically mean that this is even faster.
All of this is only relevant if these operations are executed a lot. I'm not sure what you are doing. If e.g. you add many values B to A and compare them to C each time it might be faster to start with C, subtract the B values from it and compare with 0. Because the compare-operation works internally like a subtraction. So instead of an add and a compare instruction a single subtraction would be enough within the loop. But modern CPUs and compilers are very smart and optimize a lot. So maybe the compiler automatically performs such or similar optimizations.
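As a sketch of that last restructuring (variable and function names are assumed here, not taken from the asker's code):

#include <stdbool.h>

/* Original shape (assumed): A += B[i]; if (A == C) ...
   Restructured so the repeated comparison is against zero. */
bool reaches_target(const long long *B, int n, long long A, long long C)
{
    long long remaining = C - A;      /* start from the target instead of the running sum */
    for (int i = 0; i < n; i++) {
        remaining -= B[i];            /* subtract instead of adding to A */
        if (remaining == 0)           /* compare with 0 */
            return true;
    }
    return false;
}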
Try this question.
Is there a way/technique to check only those specific n bits from the right (not all the bits of A and C), in the C programming language (or any other language), that makes the program faster?
Yes - when A + B != C. We can short-cut the comparison once a difference is found: from least to most significant.
No - when A + B == C. All bits need comparison.
Now back to OP's original question
Is there a way/technique to check only those specific n bits from the right (not all the bits of A and C), in the C programming language (or any other language), that makes the program faster?
No. In order to do so, we need to out-think the compiler. A well enabled compiler itself will notice any "tricks" available for long long + (signed char)int == long long and emit efficient code.
Yet what about really long compares? How about a custom uint1000000 for A and C?
For long compares of a custom type, a quick compare can be had.
First, select a fast working type. unsigned is a prime candidate.
typedef unsigned ufast;
Now define the wide integer.
#include <limits.h>
#include <stdbool.h>
#define UINT1000000_N (1000000/(sizeof(ufast) * CHAR_BIT))
typedef struct {
    // Least significant first
    ufast digit[UINT1000000_N];
} uint1000000;
Perform the addition and compare one "digit" at a time.
bool uint1000000_fast_offset_compare(const uint1000000 *A, unsigned B,
                                     const uint1000000 *C) {
    ufast carry = B;
    for (unsigned i = 0; i < UINT1000000_N; i++) {
        ufast sum = A->digit[i] + carry;
        if (sum != C->digit[i]) {
            return false;
        }
        carry = sum < A->digit[i];
    }
    return true;
}
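A hypothetical caller, assuming the definitions above are in scope, might look like this (it checks whether *A + B equals *C):

#include <stdio.h>

int main(void)
{
    static uint1000000 A = { .digit = { 41 } };   /* represents the value 41 */
    static uint1000000 C = { .digit = { 42 } };   /* represents the value 42 */
    printf("%d\n", uint1000000_fast_offset_compare(&A, 1u, &C));   /* prints 1: 41 + 1 == 42 */
    return 0;
}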

"Bit-fields are assigned left to right on some machines and right to left on others"- unable to get the concept from "The C Programming Language" book

I was going through the text "The C Programming Language" by Kernighan and Ritchie. While discussing bit-fields at the end of that section, the authors say:
"Fields are assigned left to right on some machines and right to left on others. This means that although fields are useful for maintaining internally-defined data structures, the question of which end comes first has to be carefully considered when picking apart externally-defined data; programs that depend on such things are not portable."
- The C Programming Language [2e] by Kernighan & Ritchie [Section 6.9, p.150]
Strictly I do not get the meaning of these lines. Can anyone please explain me with a possible diagram?
PS: Well I have taken a computer organization and architecture course. I know how computers deal with bits and bytes. In a computer system, the smallest unit of information is a single bit, which can be either 0 or 1. 8 such bits form a byte. Memories are byte-addressable, which means that each byte in the memory has an address associated with it. But usually, processors have word lengths of 2 bytes (very old systems), 4 bytes, 8 bytes... This means in one memory cycle, the CPU can take a word-length number of bytes from the main memory and put it inside its registers. Now how these bytes are placed in registers depends on the endianness of the system.
But I do not get what the authors mean by "left to right" or "right to left". The words seem like they are related to the endianness but endianness depends on the CPU and C compilers have nothing to do with it... The question which comes to my mind is "left to right" of "what"? What object are the authors referring to?
When a structure contains bit-fields, the C implementation uses some storage unit to hold them (or multiple storage units if needed). The storage unit might be one eight-bit byte or it might be four bytes, or it might be other sizes—this is a determination made by each C implementation. The C standard only requires that it be addressable, which effectively means it has to be a whole number of bytes.
Once we have a storage unit, it is some number of bits. Say it is 32 bits, and number the bits from 31 to 0, where, if we consider the bits to represent a binary numeral, bit 0 represents 2^0, and bit 31 represents 2^31. Note that Kernighan and Ritchie are imprecise to use "left" and "right" here. There is no inherent left or right. We usually write numerals with the most significant digits on the left, so we might consider bit 31 to be the leftmost and bit 0 to be the rightmost.
Now we have a storage unit with some number of bits and some labeling for those bits (31 to 0 or left to right). Say you want to put two bit-fields in them, say fields of width 7 and 5.
Which 7 of the bits from bit 31 to bit 0 are used for the first field? Which 5 of the bits are used for the second field?
We could use bits 31-25 for the first field and bits 24-20 for the second field. Or we could use bits 6-0 for the first field and bits 11-7 for the second field.
In theory, we could also use bits 27-21 for the first field and bits 15-11 for the second field. However, the C standard does say that "If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit" (C 2018 6.7.2.1 11). "Adjacent" is not formally defined, but we can assume it means consecutively numbered bits. So, if the C implementation puts the first field in bits 31-25, it is required to put the second field in bits 24-20. Conversely, if it puts the first field in bits 6-0, it must put the second field in bits 11-7.
Thus, the C standard requires an implementation to arrange successive bit-fields in a storage unit from left-to-right or from right-to-left, but it does not say which.
(I do not see anything in the standard that says the first field must start at one end of the storage unit or the other, rather than somewhere in the middle. That would lead to wasting some bits.)
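A minimal sketch of the 7-bit/5-bit example above; overlaying the storage unit with an unsigned int (assumed here to be the 32-bit storage unit) lets you see which end your implementation starts packing from:

#include <stdio.h>

/* The 7-bit and 5-bit fields discussed above, overlaid with an unsigned int
   so the raw bits can be inspected. */
union layout {
    unsigned int raw;
    struct {
        unsigned int first  : 7;
        unsigned int second : 5;
    } f;
};

int main(void)
{
    union layout u = { .raw = 0 };
    u.f.first = 0x7F;        /* set all seven bits of the first field */
    /* Typically prints 0000007f if the first field occupies bits 6-0,
       or fe000000 if it occupies bits 31-25. */
    printf("%08x\n", u.raw);
    return 0;
}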
When you write:
struct {
    unsigned int version : 4;
    unsigned int length : 4;
    unsigned char dcsn;
};
you end up with a big headache you weren't expecting because your code is non-portable.
When you set version to 4 and length to 5, some systems may set the first byte of the structure to 0x45 and other systems may set the first byte of the structure to 0x54.
When I went to college this thing was #ifdef'd as follows (incorrect):
struct {
#if BIG_ENDIAN
    unsigned int version : 4;
    unsigned int length : 4;
#else
    unsigned int length : 4;
    unsigned int version : 4;
#endif
    unsigned char dcsn;
};
but this is still rolling the dice as there's no rule that the order of the bits in the bytes in a bitfield corresponds to the order of bytes in the word in the machine. I would not be surprised that when you cross-compile the bit order in the struct comes from the host machine's rules while the bit order of integers comes from the target machine's rules (as it must). In theory the code could be corrected by having a separate #ifdef for BIG_ENDIAN_BITFIELD but I've never seen it done.
Here is some demonstration code. The only goal is to demonstrate what you are asking about. Clean coding etc. is neglected.
#include <stdio.h>
#include <stdint.h>

union
{
    uint32_t Everything;
    struct
    {
        uint32_t FirstMentionedBit : 1;
        uint32_t FewOtherBits      : 30;
        uint32_t LastMentionedBit  : 1;
    } bitfield;
} Demonstration;

int main()
{
    Demonstration.Everything = 0;
    Demonstration.bitfield.LastMentionedBit = 1;
    printf("%x\n", Demonstration.Everything);

    Demonstration.Everything = 0;
    Demonstration.bitfield.FirstMentionedBit = 1;
    printf("%x\n", Demonstration.Everything);

    return 0;
}
If you use this here https://www.tutorialspoint.com/compile_c_online.php
the output is
80000000
1
But in other environments it might easily be
1
80000000
This is because compilers are free to consider the first mentioned bit the MSB or the LSB and correspondingly the last mentioned bit to be the LSB or MSB.
And that is what your quote describes.

Bitwise operation results in unexpected variable size

Context
We are porting C code that was originally compiled using an 8-bit C compiler for the PIC microcontroller. A common idiom that was used in order to prevent unsigned global variables (for example, error counters) from rolling over back to zero is the following:
if(~counter) counter++;
The bitwise operator here inverts all the bits and the statement is only true if counter is less than the maximum value. Importantly, this works regardless of the variable size.
Problem
We are now targeting a 32-bit ARM processor using GCC. We've noticed that the same code produces different results. So far as we can tell, it looks like the bitwise complement operation returns a value that is a different size than we would expect. To reproduce this, we compile, in GCC:
uint8_t i = 0;
int sz;
sz = sizeof(i);
printf("Size of variable: %d\n", sz); // Size of variable: 1
sz = sizeof(~i);
printf("Size of result: %d\n", sz); // Size of result: 4
In the first line of output, we get what we would expect: i is 1 byte. However, the bitwise complement of i is actually four bytes which causes a problem because comparisons with this now will not give the expected results. For example, if doing (where i is a properly-initialized uint8_t):
if(~i) i++;
we will see i "wrap around" from 0xFF back to 0x00. This behaviour is different in GCC compared with when it used to work as we intended in the previous compiler and 8-bit PIC microcontroller.
We are aware that we can resolve this by casting like so:
if((uint8_t)~i) i++;
or, by
if(i < 0xFF) i++;
however in both of these workarounds, the size of the variable must be known and is error-prone for the software developer. These kinds of upper bounds checks occur throughout the codebase. There are multiple sizes of variables (eg., uint16_t and unsigned char etc.) and changing these in an otherwise working codebase is not something we're looking forward to.
Question
Is our understanding of the problem correct, and are there options available for resolving this that do not require re-visiting each case where we've used this idiom? Is our assumption correct that an operation like bitwise complement should return a result that is the same size as the operand? It seems like this would break, depending on processor architecture. I feel like I'm taking crazy pills and that C should be a bit more portable than this. Again, our understanding of this could be wrong.
On the surface this might not seem like a huge issue but this previously-working idiom is used in hundreds of locations and we're eager to understand this before proceeding with expensive changes.
Note: There is a seemingly similar but not exact duplicate question here: Bitwise operation on char gives 32 bit result
I didn't see the actual crux of the issue discussed there, namely, the result size of a bitwise complement being different than what's passed into the operator.
What you are seeing is the result of integer promotions. In most cases where an integer value is used in an expression, if the type of the value is smaller than int the value is promoted to int. This is documented in section 6.3.1.1p2 of the C standard:
The following may be used in an expression wherever an int or unsigned int may be used:
- An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
- A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
So if a variable has type uint8_t and the value 255, using any operator other than a cast or assignment on it will first convert it to type int with the value 255 before performing the operation. This is why sizeof(~i) gives you 4 instead of 1.
Section 6.5.3.3 describes that integer promotions apply to the ~ operator:
The result of the ~ operator is the bitwise complement of its
(promoted) operand (that is, each bit in the result is set if and only
if the corresponding bit in the converted operand is not set). The
integer promotions are performed on the operand, and the
result has the promoted type. If the promoted type is an unsigned
type, the expression ~E is equivalent to the maximum value
representable in that type minus E.
So assuming a 32 bit int, if counter has the 8 bit value 0xff it is converted to the 32 bit value 0x000000ff, and applying ~ to it gives you 0xffffff00.
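A short demonstration of that promotion (a sketch; the hexadecimal output shown in the comments assumes a 32-bit int):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t counter = 0xFF;
    /* counter is promoted to int before ~ is applied, so the result has
       32 bits, with all the upper bits set. */
    printf("~counter          = 0x%08x\n", (unsigned)~counter);   /* 0xffffff00 */
    printf("(uint8_t)~counter = 0x%02x\n", (uint8_t)~counter);    /* 0x00       */
    printf("if(~counter) is %s\n", ~counter ? "true" : "false");  /* true, so counter wraps */
    return 0;
}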
Probably the simplest way to handle this is without having to know the type is to check if the value is 0 after incrementing, and if so decrement it.
if (!++counter) counter--;
The wraparound of unsigned integers works in both directions, so decrementing a value of 0 gives you the largest positive value.
in sizeof(i); you request the size of the variable i, so 1
in sizeof(~i); you request the size of the type of the expression, which is an int, in your case 4
To use
if(~i)
to test whether i is not 255 (in your case, with a uint8_t) is not very readable; just do
if (i != 255)
and you will have portable and readable code
There are multiple sizes of variables (eg., uint16_t and unsigned char etc.)
To handle an unsigned type of any size:
if (i != (((uintmax_t) 2 << (sizeof(i)*CHAR_BIT-1)) - 1))
The expression is constant, so computed at compile time.
#include <limits.h> for CHAR_BIT and #include <stdint.h> for uintmax_t
Here are several options for implementing “Add 1 to x but clamp at the maximum representable value,” given that x is some unsigned integer type:
Add one if and only if x is less than the maximum value representable in its type:
x += x < Maximum(x);
See the following item for the definition of Maximum. This method
stands a good chance of being optimized by a compiler to efficient
instructions such as a compare, some form of conditional set or move,
and an add.
Compare to the largest value of the type:
if (x < ((uintmax_t) 2u << sizeof x * CHAR_BIT - 1) - 1) ++x;
(This calculates 2^N, where N is the number of bits in x, by shifting 2 by N−1 bits. We do this instead of shifting 1 by N bits because a shift by the number of bits in a type is not defined by the C standard. The CHAR_BIT macro may be unfamiliar to some; it is the number of bits in a byte, so sizeof x * CHAR_BIT is the number of bits in the type of x.)
This can be wrapped in a macro as desired for aesthetics and clarity:
#define Maximum(x) (((uintmax_t) 2u << sizeof (x) * CHAR_BIT - 1) - 1)
if (x < Maximum(x)) ++x;
Increment x and correct if it wraps to zero, using an if:
if (!++x) --x; // !++x is true if ++x wraps to zero.
Increment x and correct if it wraps to zero, using an expression:
++x; x -= !x;
This is nominally branchless (sometimes beneficial for performance), but a compiler may implement it the same as above, using a branch if needed but possibly with unconditional instructions if the target architecture has suitable instructions.
A branchless option, using the above macro, is:
x += 1 - x/Maximum(x);
If x is the maximum of its type, this evaluates to x += 1-1. Otherwise, it is x += 1-0. However, division is somewhat slow on many architectures. A compiler may optimize this to instructions without division, depending on the compiler and the target architecture.
Before stdint.h, variable sizes could vary from compiler to compiler; the actual variable types in C are still int, long, etc., and their sizes are defined by the compiler author, not by some standard nor by target-specific assumptions. The compiler author(s) then need to create stdint.h to map the two worlds; that is the purpose of stdint.h: to map the uintN_t types onto int, long, short.
If you are porting code from another compiler and it uses char, short, int, long, then you have to go through each type and do the port yourself; there is no way around it. Either the variable ends up with the right size as-is, or the declaration changes but the code as written still works...
if(~counter) counter++;
or...supply the mask or typecast directly
if((~counter)&0xFF) counter++;
if((uint8_t)(~counter)) counter++;
At the end of the day if you want this code to work you have to port it to the new platform. Your choice as to how. Yes, you have to spend the time hit each case and do it right, otherwise you are going to keep coming back to this code which is even more expensive.
Before porting, work out which variable types the code uses and what size each one is; then isolate the variables that rely on this idiom (they should be easy to grep for) and change their declarations to stdint.h definitions, which hopefully won't change in the future. You would be surprised, but the wrong headers are sometimes used, so you can even put checks in so you can sleep better at night:
if(sizeof(uint8_t)!=1) return(FAIL);
And while that style of coding works (if(~counter) counter++;), for portability, now and in the future, it is best to use a mask to specifically limit the size (and not rely on the declaration). Do this when the code is written in the first place, or just finish the port now so you won't have to re-port it again some other day. Or, to make the code more readable, write if (x < 0xFF) or x != 0xFF or something like that; the compiler can optimize it into the same code as any of these solutions, and it is more readable and less risky.
It depends on how important the product is, and how many times you want to send out patches/updates or roll a truck or walk to the lab to fix the thing, whether you try to find a quick solution or just touch the affected lines of code. If it is only a hundred or so, that is not that big of a port.
6.5.3.3 Unary arithmetic operators
...
4 The result of the ~ operator is the bitwise complement of its (promoted) operand (that is,
each bit in the result is set if and only if the corresponding bit in the converted operand is
not set). The integer promotions are performed on the operand, and the result has the
promoted type. If the promoted type is an unsigned type, the expression ~E is equivalent
to the maximum value representable in that type minus E.
C 2011 Online Draft
The issue is that the operand of ~ is being promoted to int before the operator is applied.
Unfortunately, I don't think there's an easy way out of this. Writing
if ( counter + 1 ) counter++;
won't help because promotions apply there as well. The only thing I can suggest is creating some symbolic constants for the maximum value you want that object to represent and testing against that:
#define MAX_COUNTER 255
...
if ( counter < MAX_COUNTER ) counter++;

platform independent way to reduce precision of floating point constant values

The use case:
I have some large data arrays containing floating point constants.
The file defining that array is generated and the template can be easily adapted.
I would like to run some tests on how reduced precision influences the results, both in terms of quality and in terms of compressibility of the binary.
Since I do not want to change source code other than the generated file, I am looking for a way to reduce the precision of the constants.
I would like to limit the mantissa to a fixed number of bits (set the lower ones to 0). But since floating point literals are written in decimal, it is difficult to specify the numbers in a way that guarantees the binary representation contains all zeros in the lower mantissa bits.
The best case would be something like:
#define FP_REDUCE(float) /* some macro */
static const float32_t veryLargeArray[] = {
FP_REDUCE(23.423f), FP_REDUCE(0.000023f), FP_REDUCE(290.2342f),
// ...
};
#undef FP_REDUCE
This should be done at compile time and it should be platform independent.
The following uses the Veltkamp-Dekker splitting algorithm to remove n bits (with rounding) from x, where p = 2n (for example, to remove eight bits, use 0x1p8f for the second argument). The casts to float32_t coerce the results to that type, as the C standard otherwise permits implementations to use more precision within expressions. (Double-rounding could produce incorrect results in theory, but this will not occur when float32_t is the IEEE basic 32-bit binary format and the C implementation computes this expression in that format or the 64-bit format or wider, as the former is the desired format and the latter is wide enough to represent intermediate results exactly.)
IEEE-754 binary floating-point is assumed, with round-to-nearest. Overflow occurs if x•(p+1) rounds to infinity.
#define RemoveBits(x, p) (float32_t) (((float32_t) ((x) * ((p)+1))) - (float32_t) (((float32_t) ((x) * ((p)+1))) - (x)))
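In the setup from the question, this could be wrapped as the FP_REDUCE macro the asker sketched, assuming the RemoveBits macro above is in scope and that float32_t is a typedef for the IEEE 32-bit float type (here eight low mantissa bits are removed, so p = 0x1p8f = 2^8):

typedef float float32_t;   /* assumed: the question's 32-bit float type */

#define FP_REDUCE(x) RemoveBits((x), 0x1p8f)
static const float32_t veryLargeArray[] = {
    FP_REDUCE(23.423f), FP_REDUCE(0.000023f), FP_REDUCE(290.2342f),
};
#undef FP_REDUCE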
What you're asking for can be done with varying degrees of partial portability, but not absolute unless you want to run the source file through your own preprocessing tool at build time to reduce the precision. If that's an option for you, it's probably your best one.
Short of that, I'm going to assume at least that your floating point types are base 2 and obey Annex F/IEEE semantics. This should be a reasonable assumption, but the latter is false with gcc on platforms (including 32-bit x86) with extended-precision under the default standards-conformance profile; you need -std=cNN or -fexcess-precision=standard to fix it.
One approach is to add and subtract a power of two chosen to cause rounding to the desired precision:
#define FP_REDUCE(x,p) ((x)+(p)-(p))
Unfortunately, this works in absolute precisions, not relative, and requires knowing the right value p for the particular x, which is going to be equal to the value of the leading base-2 place of x, times 2 raised to the power of FLT_MANT_DIG minus the bits of precision you want. This cannot be evaluated as a constant expression for use as an initializer, but you can write it in terms of FLT_EPSILON and, if you can assume C99+, a preprocessor-token-pasting to form a hex float literal, yielding the correct value for this factor. But you still need to know the power of two for the leading digit of x; I don't see any way to extract that as a constant expression.
Edit: I believe this is fixable, so as not to need an absolute precision but rather automatically scale to the value, but it depends on correctness of a work in progress. See Is there a correct constant-expression, in terms of a float, for its msb?. If that works I will later integrate the result with this answer.
Another approach I like, if your compiler supports compound literals in static initializers and if you can assume IEEE type representations, is using a union and masking off bits:
union fr { float x; uint32_t r; };
#define FP_REDUCE(v) ((union fr){ .r = (union fr){ .x = (v) }.r & (0xffffffffu << n) }.x)
where n is the number of bits you want to drop. This will round towards zero rather than to nearest; if you want to make it round to nearest, it should be possible by adding an appropriate constant to the low bits before masking, but you have to take care about what happens when the addition overflows into the exponent bits.
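Putting it together as a runnable sketch (the value of n and the parameter name are choices made here; truncation toward zero as noted above):

#include <stdio.h>
#include <stdint.h>

union fr { float x; uint32_t r; };

enum { n = 8 };   /* number of low mantissa bits to drop */
#define FP_REDUCE(v) ((union fr){ .r = (union fr){ .x = (v) }.r & (0xffffffffu << n) }.x)

int main(void)
{
    /* The low 8 mantissa bits of each constant are cleared. */
    printf("%.10g\n", FP_REDUCE(23.423f));
    printf("%.10g\n", FP_REDUCE(0.000023f));
    return 0;
}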

Porting C endianness & pointers black magic to Swift

I'm trying to translate this snippet :
ntohs(*(UInt16*)VALUE) / 4.0
and some other ones, looking alike, from C to Swift.
Problem is, I have very few knowledge of Swift and I just can't understand what this snippet does... Here's all I know :
ntohs swaps network endianness to host endianness
VALUE is a char[32]
I just discovered that Swift : (UInt(data.0) << 6) + (UInt(data.1) >> 2) does the same thing. Could one please explain ?
I'm willing to return a Swift UInt (UInt64)
Thanks !
VALUE is a pointer to 32 bytes (char[32]).
The pointer is cast to UInt16 pointer. That means the first two bytes of VALUE are being interpreted as UInt16 (2 bytes).
* will dereference the pointer. We get the two bytes of VALUE as a 16-bit number. However it is in network byte order, so we cannot do integer operations on it directly.
We now swap the endianness to host order and get a normal integer.
We divide the integer by 4.0.
To do the same in Swift, let's just compose the byte values to an integer.
let host = (UInt(data.0) << 8) | UInt(data.1)
Note that to divide by 4.0 you will have to convert the integer to Float.
The C you quote is technically incorrect, although it will be compiled as intended by most production C compilers.¹ A better way to achieve the same effect, which should also be easier to translate to Swift, is
unsigned int val = ((((unsigned int)VALUE[0]) << 8) |  // ² ³
                    (((unsigned int)VALUE[1]) << 0));  // ⁴
double scaledval = ((double)val) / 4.0; // ⁵
The first statement reads the first two bytes of VALUE, interprets them as a 16-bit unsigned number in network byte order, and converts them to host byte order (whether or not those byte orders are different). The second statement converts the number to double and scales it.
¹ Specifically, *(UInt16*)VALUE provokes undefined behavior because it violates the type-based aliasing rules, which are asymmetric: a pointer with character type may be used to access an object with any type, but a pointer with any other type may not be used to access an object with (array-of-)character type.
² In C, a cast to unsigned int here is necessary in order to make the subsequent shifting and or-ing happen in an unsigned type. If you cast to uint16_t, which might seem more appropriate, the "usual arithmetic conversions" would then convert it to int, which is signed, before doing the left shift. This would provoke undefined behavior on a system where int was only 16 bits wide (you're not allowed to shift into the sign bit). Swift almost certainly has completely different rules for arithmetic on types with small ranges; you'll probably need to cast to something before the shift, but I cannot tell you what.
³ I have over-parenthesized this expression so that the order of operations will be clear even if you aren't terribly familiar with C.
⁴ Left shifting by zero bits has no effect; it is only included for parallel structure.
⁵ An explicit conversion to double before the division operation is not necessary in C, but it is in Swift, so I have written it that way here.
It looks like the code is taking the single byte VALUE[0]. This is then dereferenced, which should retrieve a number from a low memory address, 1 to 127 (possibly 255).
Whatever number is there is then divided by 4.
I genuinely can't believe my interpretation is correct, and I can't check it because I have no laptop. I really think there may be a typo in your code, as it is not a good thing to do, portability- and reusability-wise.
I must stress that the string is not converted to a number which is then used.
