What exactly is low bits subtraction - c

I was reading this article and thought that everything was perfectly clear until I stumbled upon this:
Again, most real Scheme systems use a slightly different implementation; for example, if GET_PAIR subtracts off the low bits of x, instead of masking them off, the optimizer will often be able to combine that subtraction with the addition of the offset of the structure member we are referencing, making a modified pointer as fast to use as an unmodified pointer.
How exactly can one perform this subtraction, and how does the optimizer work its magic to make the modified pointer as fast to use as an unmodified one?

The trick presented in the article is to encode type information into the unused three lowest bits of an 8 byte aligned pointer. After using this information to find out the type,
#define PAIR_P(x) (((int) (x) & 7) == 2)
one has to clear those additional bits before using the pointer as an address again.
#define GET_PAIR(x) ((struct pair *) ((int) (x) & ~7))
Note that at this point, we already know the type, so we know the value of the three least significant bits. They will always be 0b010 (decimal 2). So, instead of writing ((int) (x) & ~7), the author suggests writing ((int) (x) - 2). The idea is that if you write code like this,
if (PAIR_P(x))
{
SCM * thing = GET_PAIR(x)->cdr;
/* Use the thing… */
}
because we are accessing the cdr member inside the struct pair pointed to by the x (after clearing out the lower bits), the compiler will generate code to adjust the pointer appropriately. Something like this.
SCM * thing = *(SCM **) ((char *) ((int) (x) - 2) + offsetof(struct pair, cdr));
Thanks to the associativity of integer addition and subtraction, we can omit one level of parentheses and get (not showing the outer pointer casts that produce no machine code anyway)
(int) (x) - 2 + offsetof(struct pair, cdr)
where both the 2 and the offsetof(struct pair, cdr) are compile-time constants and can be folded into a single constant. If we had asked for the car member (which has offset 0), this trick wouldn't help, but helping every other time is not too bad.
A modern optimizer might be able to figure out by itself that after we've just tested that (x & 7) == 2, x & ~7 is equivalent to x - 2 so the trick might not be required any more these days. You would like to measure this before you rely on it, though.
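As a minimal sketch of the two variants, the helpers below load the cdr of a tagged pair once by masking and once by subtracting. It assumes uintptr_t in place of the article's int cast (so it also works where pointers are wider than int) and a hypothetical two-member struct pair; the names TAG_PAIR, tag_pair, cdr_by_mask and cdr_by_sub are made up for illustration and do not come from the article.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pair layout; the article's real struct may differ. */
struct pair { void *car; void *cdr; };

#define TAG_PAIR 2u  /* value of the low three bits of a tagged pair pointer */

/* Tag an 8-byte-aligned pair pointer by setting its low bits to 2. */
static uintptr_t tag_pair(struct pair *p) { return (uintptr_t)p + TAG_PAIR; }

/* Variant 1: mask the tag off, then let the compiler add the member offset. */
static void *cdr_by_mask(uintptr_t x)
{
    return ((struct pair *)(x & ~(uintptr_t)7))->cdr;
}

/* Variant 2: subtract the known tag. The load address is
   x - 2 + offsetof(struct pair, cdr), and the two constants can be folded. */
static void *cdr_by_sub(uintptr_t x)
{
    return ((struct pair *)(x - TAG_PAIR))->cdr;
}
```

Whether the subtracting variant actually produces shorter machine code depends on the compiler and target, so it is worth comparing the generated assembly of both functions.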

Related

Explain how specific C #define works

I have been looking at some of the code at http://www.netlib.org/fdlibm/ to see how some functions work. In e_log.c, some parts of the code say:
hx = __HI(x); /* high word of x */
lx = __LO(x); /* low word of x */
The code for __HI(x) and __LO(x) is:
#define __HI(x) *(1+(int*)&x)
#define __LO(x) *(int*)&x
which I really don't understand because I am not familiar with this type of C. Can someone please explain to me what __HI(x) and __LO(x) are doing?
Also later in the code for the function there is a statement:
__HI(x) = hx|(i^0x3ff00000);
Can someone please explain to me:
how is it possible to make a function equal to something (I generally work with python so I don't really know what is going on)?
what are __HI(x) and __LO(x) doing?
what does the program mean by "high word" and "low word" of x?
The final purpose of my analysis is to understand this code in order to port it to a Python implementation.
These macros use compiler-dependent properties to access the representations of double types.
In C, all objects other than bit-fields are represented as sequences of bytes. The fdlibm code you are looking at is designed for implementations where int is four bytes and the double type is represented using eight bytes in a format defined by the IEEE-754 floating-point specification. That format is called binary64 or IEEE-754 basic 64-bit binary floating-point. It is also designed for an implementation where the C compiler guarantees that aliasing via pointer conversions is supported. (This is not guaranteed by the C standard, but C implementations may support it.)
Consider a double object named x. Given these macros:
#define __HI(x) *(1+(int*)&x)
#define __LO(x) *(int*)&x
When __LO(x) is used in source code, it is replaced by *(int*)&x. The &x takes the address of x. The address of x has type double *. The cast (int *) converts this to int *, a pointer to an int. Then * dereferences this pointer, resulting in a reference to the int that is at the low-address part of x.
When __HI(x) is used in the source code, (int*)&x again points to the low-address part of x. Adding 1 changes it to point to the high-address part. Then * dereferences this, resulting in a reference to the int that is at the high-address part.
The routines in fdlibm are special mathematical routines. To operate, they need to examine and modify the bytes that represent double values. The __LO and __HI macros give them this access.
These definitions of __HI and __LO work for implementations that store the double values in little-endian order (with the “least significant” part of the double in the lower-addressed memory location). The fdlibm code may contain alternate definitions for big-endian systems, likely selected by some #if statement.
In the code __HI(x) = hx|(i^0x3ff00000);, the value 0x3ff00000 is the high word of the double 1.0: its exponent field holds the biased exponent 0x3ff, and its sign bit and upper significand bits are zero. Without context, we cannot say precisely what is happening here, but the code appears to be merging hx with some value from i. It is likely completing some computation of the bytes representing a new double value it is creating and storing those bytes in the “high” part of a double object.
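For comparison, here is a minimal sketch of reading and writing the two words without relying on pointer aliasing. It assumes double is IEEE-754 binary64 and that integers and doubles share the same byte order, which holds on mainstream platforms; the helper names double_bits, high_word, low_word and with_high_word are hypothetical and not part of fdlibm.

```c
#include <stdint.h>
#include <string.h>

/* Copy the 64-bit pattern of a double into an integer (no aliasing issues). */
static uint64_t double_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Most significant 32 bits: sign, exponent, top of the significand. */
static uint32_t high_word(double x) { return (uint32_t)(double_bits(x) >> 32); }

/* Least significant 32 bits of the significand. */
static uint32_t low_word(double x) { return (uint32_t)(double_bits(x) & 0xffffffffu); }

/* Return x with its high word replaced by hi, like "__HI(x) = hi;". */
static double with_high_word(double x, uint32_t hi)
{
    uint64_t bits = ((uint64_t)hi << 32) | low_word(x);
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Because the halves are taken from the integer value rather than from memory addresses, this version does not need separate definitions for little-endian and big-endian systems.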
I am adding this answer to complement the one already present, not to replace it.
hx = __HI(x); /* high word of x */
lx = __LO(x); /* low word of x */
Comments are useful... even if in this case the macro names could be clear enough. "High" and "low" refer to the two halves of an integer representation, typically a 16- or 32-bit one, because the halves of an 8-bit value are called "nibbles".
If we take a 16-bit unsigned integer which can range from 0 to 65535, or in hex 0x0000 to 0xFFFF, for example 0x1234, the two halves are:
0x1234
    ^^-------------------- lower half, or "low"
  ^^---------------------- upper half, or "high"
Note that "lower" means the less significant part. The reliable way to get the two halves, assuming 16 bits, is to do a bitwise AND with 0xFF to get lo(), and to shift right by 8 bits (divide by 256) to get hi().
Now, in memory the number 0x1234 is written in two consecutive locations, either as 0x12 then 0x34 if big-endian, or 0x34 then 0x12 if little-endian. Given this, other ways are possible to read single halves, reading the correct one directly from memory without calculation. To get the lo() of 0x1234 on a little-endian machine, it is possible to read the single byte in the first location.
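A small sketch of both approaches for a 16-bit value, with hypothetical helper names lo8, hi8 and first_byte_in_memory. The arithmetic helpers give the same answer everywhere, while the memory-based one depends on the byte order of the machine.

```c
#include <stdint.h>
#include <string.h>

/* Arithmetic approach: independent of how the machine stores the value. */
static uint8_t lo8(uint16_t v) { return (uint8_t)(v & 0xFF); }
static uint8_t hi8(uint16_t v) { return (uint8_t)(v >> 8); }

/* Memory approach: returns the byte stored at the lowest address, which is
   the low half on a little-endian machine and the high half on a
   big-endian one. */
static uint8_t first_byte_in_memory(uint16_t v)
{
    uint8_t b;
    memcpy(&b, &v, 1);
    return b;
}
```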
From the question:
#define __HI(x) *(1+(int*)&x)
#define __LO(x) *(int*)&x
Both macros peek directly into memory: the & in (int*)&x is the unary address-of operator, not a bitwise AND. The value being split is twice the size of an int, so if int is 32 bits, the value to be split is 64 bits long, which matches a double. And there is another caveat: those macros can read the halves, but they can also be used to write the two halves separately. In fact, from the question:
__HI(x) = hx|(i^0x3ff00000);
the result is to set only the HI part (upper, most significant) of x. Note also the value used, 0x3ff00000: it fits in 32 bits, which is consistent with x being 64 bits long and each half being 32 bits long.
Hope this is clear enough to translate the C to Python. You can hold the bit pattern of x in a 64-bit integer. When you need the LO() part, use a bitwise AND with 0xFFFFFFFF; to get HI(), shift right by 32 bits or do the equivalent division.
When HI and LO are on the left of an assignment, only that half of the value is written, so you should construct the two halves separately and sum them up (or bitwise OR them together).
Hope it helps...
#define A B
is a preprocessor directive that replaces every occurrence of the token A with B throughout the source code before compilation.
#define A(x) B
is a function-like preprocessor macro which uses a parameter x in order to do a parameterized preprocessor substitution. In this case, B can be a function of x as well.
Your macros
#define __HI(x) *(1+(int*)&x)
#define __LO(x) *(int*)&x
// called as
__HI(x) = hx|(i^0x3ff00000);
Since it is just a matter of code substitution, the assignment is perfectly legitimate. Why? Because in both cases the macro expands to an lvalue, an expression that designates an object and can therefore appear on the left of an assignment.
That lvalue is in both cases an object of type int:
take x's address
cast it to a pointer to int
dereference it (in case of __LO())
add 1 and then dereference it (in case of __HI()).
What it will actually point to depends on the architecture, because pointer arithmetic depends on the size of int there. Endianness also has to be taken into account.
What we can say is that they are designed to access the lower and the higher halves of a data type whose size is 2*sizeof (int) (so, if for example int is 32 bits wide, they will allow access to the lower 32 bits and to the upper 32 bits). Furthermore, from the macro definitions we understand that it is a little-endian architecture (the least significant part comes first).
In order to port code containing these macros to Python, you will need to do it at a higher level, since Python does not support pointers.
These tips don't solve your specific task, but they give you a working method for this task and similar ones:
A way to understand what a macro does is checking how it is actually translated by the preprocessor. This can be done on most compilers through the -E compiler option.
Use a debugger to understand the functionality: set a breakpoint just before the call to the macro, and analyze its effects on addresses and variables.
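To see why a macro can appear on the left of an assignment, here is a minimal sketch with a hypothetical macro BYTE_OF. It accesses the object through unsigned char, which C permits for any object type, so unlike the int-based macros it does not depend on the compiler tolerating aliasing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: expands to the i-th byte of x, as an lvalue. */
#define BYTE_OF(x, i) (((unsigned char *)&(x))[(i)])

/* Set every byte of a 32-bit value by assigning through the macro. */
static uint32_t fill_bytes(unsigned char b)
{
    uint32_t v = 0;
    for (size_t i = 0; i < sizeof v; i++)
        BYTE_OF(v, i) = b;  /* after preprocessing: (((unsigned char *)&(v))[(i)]) = b; */
    return v;
}
```

The assignment compiles because the preprocessor has already replaced BYTE_OF(v, i) with an indexing expression by the time the compiler proper sees the line.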

In C, How do I calculate the signed difference between two 48-bit unsigned integers?

I've got two values from an unsigned 48bit nanosecond counter, which may wrap.
I need the difference, in nanoseconds, of the two times.
I think I can assume that the readings were taken at roughly the same time, so of the two possible answers I think I'm safe taking the smallest.
They're both stored as uint64_t. Because I don't think I can have 48 bit types.
I'd like to calculate the difference between them, as a signed integer (presumably int64_t), accounting for the wrapping.
so e.g. if I start out with
x=5
y=3
then the result of x-y is 2, and will stay so if I increment both x and y, even as they wrap over the top of the max value 0xffffffffffff
Similarly if x=3, y=5, then x-y is -2, and will stay so whenever x and y are incremented simultaneously.
If I could declare x,y as uint48_t, and the difference as int48_t, then I think
int48_t diff = x - y;
would just work.
How do I simulate this behaviour with the 64-bit arithmetic I've got available?
(I think any computer this is likely to run on will use 2's complement arithmetic)
P.S. I can probably hack this out, but I wonder if there's a nice neat standard way to do this sort of thing, which the next person to read my code will be able to understand.
P.P.S Also, this code is going to end up in the tightest of tight loops, so something that will compile efficiently would be nice, so that if there has to be a choice, speed trumps readability.
You can simulate a 48-bit unsigned integer type by just masking off the top 16 bits of a uint64_t after any arithmetic operation. So, for example, to take the difference between those two times, you could do:
uint64_t diff = (after - before) & 0xffffffffffff;
You will get the right value even if the counter wrapped around during the procedure. If the counter didn't wrap around, the masking is not needed but not harmful either.
Now if you want this difference to be recognized as a signed integer by your compiler, you have to sign extend the 48th bit. That means that if the 48th bit is set, the number is negative, and you want to set the 49th through the 64th bit of your 64-bit integer. I think a simple way to do that is:
int64_t diff_signed = (int64_t)(diff << 16) >> 16;
Warning: You should probably test this to make sure it works, and also beware there is implementation-defined behavior when I cast the uint64_t to an int64_t, and I think there is implementation-defined behavior when I shift a signed negative number to the right. I'm sure a C language lawyer could come up with something more robust.
Update: The OP points out that if you combine the operation of taking the difference and doing the sign extension, there is no need for masking. That would look like this:
int64_t diff = (int64_t)((x - y) << 16) >> 16;
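Put together as a function, the approach might look like the sketch below. The name diff48 is hypothetical, and the code assumes a two's complement target where right-shifting a negative int64_t is an arithmetic shift, which is what common compilers such as GCC and Clang provide. The left shift is done on the unsigned value, because left-shifting a negative signed value is undefined in C.

```c
#include <stdint.h>

/* Signed difference x - y of two 48-bit counter readings held in uint64_t. */
static int64_t diff48(uint64_t x, uint64_t y)
{
    /* Unsigned subtraction wraps modulo 2^64; shifting left by 16 discards
       the top 16 bits, leaving the 48-bit difference in bits 63..16. */
    uint64_t shifted = (x - y) << 16;

    /* Converting to int64_t and shifting right sign-extends bit 47 of the
       difference (implementation-defined, see the assumptions above). */
    return (int64_t)shifted >> 16;
}
```

With the question's numbers, diff48(5, 3) gives 2 and diff48(3, 5) gives -2, and the result stays the same when both readings have wrapped past 0xffffffffffff.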
struct Nanosecond48{
unsigned long long u48 : 48;
// int res : 12; // just for clarity, don't need this one really
};
Here we just declare the field with an explicit width of 48 bits, and with that (admittedly somewhat awkward) type you leave it up to your compiler to properly handle different architectures/platforms/whatnot.
Like the following:
Nanosecond48 u1, u2, overflow;
overflow.u48 = -1L;
u1.u48 = 3;
u2.u48 = 5;
const auto diff = (u2.u48 + (overflow.u48 + 1) - u1.u48) & 0x0000FFFFFFFFFFFF;
Of course in the last statement you can just do the remainder operation with % (overflow.u48 + 1) if you prefer.
Do you know which was the earlier reading and which was later? If so:
diff = (earlier <= later) ? later - earlier : WRAPVAL - earlier + later;
where WRAPVAL is (1ULL << 48), is pretty easy to read.

Porting C endianness & pointers black magic to Swift

I'm trying to translate this snippet :
ntohs(*(UInt16*)VALUE) / 4.0
and some other ones, looking alike, from C to Swift.
The problem is, I have very little knowledge of Swift and I just can't understand what this snippet does... Here's all I know:
ntohs swaps from network byte order to host byte order
VALUE is a char[32]
I just discovered that the Swift expression (UInt(data.0) << 6) + (UInt(data.1) >> 2) does the same thing. Could someone please explain?
I want to return a Swift UInt (UInt64)
Thanks !
VALUE is a pointer to 32 bytes (char[32]).
The pointer is cast to UInt16 pointer. That means the first two bytes of VALUE are being interpreted as UInt16 (2 bytes).
* will dereference the pointer. We get the two bytes of VALUE as a 16-bit number. However, it is in network byte order (big-endian), so we cannot yet do integer arithmetic on it.
We now swap the byte order to the host's and get a normal integer.
We divide the integer by 4.0.
To do the same in Swift, let's just compose the byte values to an integer.
let host = (UInt(data.0) << 8) | UInt(data.1)
Note that to divide by 4.0 you will have to convert the integer to Float.
The C you quote is technically incorrect, although it will be compiled as intended by most production C compilers.¹ A better way to achieve the same effect, which should also be easier to translate to Swift, is
unsigned int val = ((((unsigned int)(unsigned char)VALUE[0]) << 8) | // ² ³
(((unsigned int)(unsigned char)VALUE[1]) << 0)); // ⁴
double scaledval = ((double)val) / 4.0; // ⁵
The first statement reads the first two bytes of VALUE, interprets them as a 16-bit unsigned number in network byte order, and converts them to host byte order (whether or not those byte orders are different). The second statement converts the number to double and scales it.
¹ Specifically, *(UInt16*)VALUE provokes undefined behavior because it violates the type-based aliasing rules, which are asymmetric: a pointer with character type may be used to access an object with any type, but a pointer with any other type may not be used to access an object with (array-of-)character type.
² In C, a cast to unsigned int here is necessary in order to make the subsequent shifting and or-ing happen in an unsigned type. If you cast to uint16_t, which might seem more appropriate, the "usual arithmetic conversions" would then convert it to int, which is signed, before doing the left shift. This would provoke undefined behavior on a system where int was only 16 bits wide (you're not allowed to shift into the sign bit). Swift almost certainly has completely different rules for arithmetic on types with small ranges; you'll probably need to cast to something before the shift, but I cannot tell you what.
³ I have over-parenthesized this expression so that the order of operations will be clear even if you aren't terribly familiar with C.
⁴ Left shifting by zero bits has no effect; it is only included for parallel structure.
⁵ An explicit conversion to double before the division operation is not necessary in C, but it is in Swift, so I have written it that way here.
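As a self-contained sketch of the same idea, the helpers below read the first two bytes of a char buffer as a big-endian 16-bit number and scale it. The names read_be16 and scaled_value are hypothetical. Each byte goes through unsigned char first, so that bytes of 0x80 and above are not sign-extended on platforms where plain char is signed.

```c
#include <stdint.h>

/* Read a 16-bit unsigned value stored in network (big-endian) byte order. */
static uint16_t read_be16(const char *buf)
{
    unsigned int hi = (unsigned char)buf[0];  /* most significant byte  */
    unsigned int lo = (unsigned char)buf[1];  /* least significant byte */
    return (uint16_t)((hi << 8) | lo);
}

/* Hypothetical helper mirroring "ntohs(*(UInt16*)VALUE) / 4.0". */
static double scaled_value(const char *buf)
{
    return (double)read_be16(buf) / 4.0;
}
```

For the bytes 0x01, 0x02 this gives 258 and then 64.5. The Swift expression from the question, (UInt(data.0) << 6) + (UInt(data.1) >> 2), yields 64 for the same bytes: it is the same division by 4 done in integer arithmetic, so the fractional part is dropped.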
It looks like the code is taking the single byte value[0]. This is then dereferenced, this should retrieve a number from a low memory address, 1 to 127 (possibly 255).
What ever number is there is then divided by 4.
I genuinely can't believe my interpretation is correct, and I can't check it because I have no laptop. I really think there may be a typo in your code, as it is not a good thing to do. Portable, reusable
I must stress that the string is not converted to a number. Which is then used

C Operator Precedence with pointer increments

I am trying to understand a line of C-code which includes using a pointer to struct value (which is a pointer to something as well).
Example C-code:
// Given
typedef struct {
uint8 *output;
uint32 bottom;
} myType;
myType *e;
// Then at some point:
*e->output++ = (uint8) (e->bottom >> 24);
Source: https://www.rfc-editor.org/rfc/rfc6386#page-22
My question is:
What exactly does that line of C-code do?
"What exactly does that line of C-code do?"
Waste a lot of time having to carefully read it instead of just knowing at a glance. If I was doing code review of that, I'd throw it back to the author and say break it up into two lines.
The two things it does are save something at e->output, then advance e->output to the next byte. I think if you need to describe code with two pieces though, it should be on two lines with two separate statements.
As pointed out by Deduplicator in the comments above, looking at an operator precedence table might help.
*e->output++ = ... means "assign value ... to the location e->output is pointing to, and let e->output point to a new location 8 bits further afterwards" (because output is of type uint8 *).
(uint8) (e->bottom >> 24) is then evaluated to get a value for ...
The line
*e->output++ = (uint8) (e->bottom >> 24);
does the following:
Find the field bottom of the structure pointed to by the pointer e.
Fetch the 32-bit value from that field.
Shift that value right 24 bits.
Re-interpret that value as a uint8_t, which now contains the high order byte.
Find the field output of the structure. It's a pointer to uint8_t.
Store the uint8_t we computed earlier into the address pointed to by output.
And finally, add 1 to output, causing it to point to the next uint8_t.
The order of some of those things might be rearranged a bit as long as the result
behaves as if they had been done in that order. Operator precedence is a completely
separate question from order in which operations are performed, and not really
relevant here.
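The steps above can be sketched as two functions that have the same effect, one using the original one-liner and one spelling it out. The struct mirrors the one in the question, with the standard uint8_t and uint32_t standing in for its uint8 and uint32; the function names are hypothetical.

```c
#include <stdint.h>

/* Same shape as the struct in the question, with <stdint.h> type names. */
typedef struct {
    uint8_t *output;
    uint32_t bottom;
} myType;

/* The one-liner: parsed as *((e->output)++) = ... */
static void put_top_byte_terse(myType *e)
{
    *e->output++ = (uint8_t)(e->bottom >> 24);
}

/* The same effect written as two statements. */
static void put_top_byte_explicit(myType *e)
{
    *e->output = (uint8_t)(e->bottom >> 24);  /* store the high-order byte */
    e->output = e->output + 1;                /* advance to the next byte  */
}
```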

Using bit operations to "turn off" binary digits of a pointer

I was able to use bit operations to "turn off" binary digits of a number.
Ex:
x = x & ~(1<<0)
x = x & ~(1<<1)
(and repeat until desired number of digits starting from the right are changed to 0)
I would like to apply this technique to a pointer's address.
Unfortunately, the & operator cannot be used with pointers. Using the same lines of code as above, where x is a pointer, the compiler says "invalid operands to binary & (have int and int)."
I tried to typecast the pointers as ints, but that doesn't work as I assume the ints are too small (and I just realized I'm not allowed to cast).
(note: though this is part of a homework problem, I've already reasoned out why I need to turn off some digits after a good couple hours, so I'm fine in that regard. I'm simply trying to see if I can get a clever technique to do what I want to do here).
Restrictions: I cannot use loops, conditionals, any special functions, constants greater than 255, division, mod.
(edit: added restrictions to the bottom)
Use uintptr_t from <stdint.h>. You should always use unsigned types for bit twiddling, and (u)intptr_t is specifically chosen to be able to hold a pointer's value.
Note however that adjusting a pointer manually and dereferencing it is undefined behaviour, so watch your step. You must recover the exact original value of the pointer (or another valid pointer) before dereferencing it.
Edit: from your comment I understand that you don't plan on dereferencing the twiddled pointer at all, so no undefined behaviour for you. Here is how you can check if your pointers share the same 64-byte block:
uintptr_t p1 = (uintptr_t)yourPointer1;
uintptr_t p2 = (uintptr_t)yourPointer2;
uintptr_t mask = ~(uintptr_t)63u; // Shave off 6 low-order bits
return (p1 & mask) == (p2 & mask);
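Wrapped into a complete function, the check above might look like this sketch. The name same_block64 is hypothetical, and it assumes that converting a pointer to uintptr_t yields its numeric address, which is the case on mainstream platforms but is not spelled out by the C standard.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if both addresses fall in the same 64-byte-aligned block. */
static bool same_block64(const void *a, const void *b)
{
    uintptr_t mask = ~(uintptr_t)63u;  /* clears the 6 low-order bits */
    return ((uintptr_t)a & mask) == ((uintptr_t)b & mask);
}
```

Note that two pointers can be less than 64 bytes apart and still lie in different blocks, for example the last byte of one block and the first byte of the next.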
The C standard library includes the (optional) type intptr_t, for which there is a guarantee that "any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer".
Of course, if you perform bitwise operations on the integer, the result of converting it back to a pointer is no longer covered by that guarantee, and dereferencing it is undefined behaviour.
Edit:
How unfortunate haha. I need a function to show two pointers are in
the same 64-byte block of memory. This holds true so long as every
digit but the least significant 6 digits of their binary
representations are equal. By making sure the last 6 digits are all
the same (ex: 0), I can return true if both pointers are equal. Well,
at least I hope so.
You should be able to check if they're in the same 64-byte block of memory with something like this:
if ((char *)high_pointer - (char *)low_pointer < 64) {
// do stuff
}
Edit2: This is likely to be undefined behaviour as pointed out by chris.
Original post:
You're probably looking for intptr_t or uintptr_t. The standard says you can cast to and from these types to pointers and have the value equal to the original.
However, despite it being a standard type, it is optional so some library implementations may choose not to implement it. Some architectures might not even represent pointers as integers so such a type wouldn't make sense.
It is still better than casting to and from an int or a long since it is guaranteed to work on implementations that supply it. Otherwise, at least you'll know at compile time that your program will break on a certain implementation/architecture.
(Oh, and as other answers have stated, manually changing the pointer after casting it to an integer type and then dereferencing it is undefined behaviour)

Resources