I use these two methods to get bit-field information from a register. The location of the bit field I need to extract is given by the Intel Manual, as in the code below. But the results I get from the two methods are different.
I cannot find any problem with either method. But to my understanding, the maximum_power field should not be 0, as it is in the first method (it is a value that Intel has already defined in the register).
Method 1:
typedef struct rapl_parameters_msr_t {
    uint64_t thermal_spec_power : 15;
    uint64_t : 1;
    uint64_t minimum_power : 15;
    uint64_t : 1;
    uint64_t maximum_power : 15;
    uint64_t : 1;
    uint64_t maximum_limit_time_window : 6;
    uint64_t : 10;
} rapl_parameters_msr_t;

uint64_t msr;
read_msr(cpu, 0x614, &msr);
rapl_parameters_msr_t domain_msr = *(rapl_parameters_msr_t *)&msr;

printf("%ld\n", domain_msr.thermal_spec_power);        //print: 280
printf("%ld\n", domain_msr.minimum_power);             //print: 192
printf("%ld\n", domain_msr.maximum_power);             //print: 0
printf("%ld\n", domain_msr.maximum_limit_time_window); //print: 16
Method 2:
uint64_t
extractBitField(uint64_t inField, uint64_t width, uint64_t offset)
{
    uint64_t bitMask;
    uint64_t outField;

    if ((offset+width) == 32)
    {
        bitMask = (0xFFFFFFFF<<offset);
    }
    else
    {   /* Just keep the field that needs to be extracted */
        bitMask = (0xFFFFFFFF<<offset) ^ (0xFFFFFFFF<<(offset+width));
    }

    /* Move the extracted field to the right-most position */
    outField = (inField & bitMask) >> offset;
    return outField;
}
uint64_t flags;
read_msr(cpu, 0x614, &flags);
printf("thermal power: %d\n", extractBitField(flags,15,0)); //print: 280
printf("minimum power: %d\n", extractBitField(flags,15,16));//print: 192
printf("maximum power: %d\n", extractBitField(flags,15,32));//print: 0
printf("time window: %d\n", extractBitField(flags,6,48)); //print: 0
Do you have any insight into where the problem might be?
Update:
Sorry for the confusing part. I changed all the types to uint64_t, and method 2 now returns 0 for both maximum power and the time window.
If the compiler can produce a wrong result for method 1, I am still doubtful how much I can trust the result of method 2.
The following is the bit-field layout from the Intel Manual:
Thermal Spec Power (bits 14:0)
Minimum Power (bits 30:16)
Maximum Power (bits 46:32)
Maximum Time Window (bits 53:48)
Thanks to David, this is the correct version for 64-bit extraction.
uint64_t
extractBitField(uint64_t inField, uint64_t width, uint64_t offset)
{
    uint64_t bitMask;
    uint64_t outField;

    if ((offset+width) == 64)
    {
        bitMask = (0xFFFFFFFFFFFFFFFF<<offset);
    }
    else
    {   /* Just keep the field that needs to be extracted */
        bitMask = (0xFFFFFFFFFFFFFFFF<<offset) ^ (0xFFFFFFFFFFFFFFFF<<(offset+width));
    }

    /* Move the extracted field to the right-most position */
    outField = (inField & bitMask) >> offset;
    return outField;
}
uint64_t flags;
read_msr(cpu, 0x614, &flags);
printf("thermal power: %d\n", extractBitField(flags,15,0)); //print: 280
printf("minimum power: %d\n", extractBitField(flags,15,16));//print: 192
printf("maximum power: %d\n", extractBitField(flags,15,32));//print: 0
printf("time window: %d\n", extractBitField(flags,6,48)); //print: 16
The ordering of bits in C bit-fields is implementation-defined, so be careful if you plan on using them; the order you think you're getting may not be what you actually get. Check your compiler's documentation to see how it handles this.
Also, your second function takes a uint32 while your first example uses a 64-bit struct, so your types aren't matching up. Can you correct that and update your results?
edit: Additionally, you have the time window defined as six bits in the first example and 15 in the second.
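For illustration, a minimal sketch (assuming the rapl_parameters_msr_t struct from the question) that compares the bit-field view of a known 64-bit pattern against a plain shift-and-mask; if the two disagree, your compiler packs bit-fields in the other direction:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint64_t raw = 0x00100000A0C00118ULL;   /* arbitrary test pattern */
    rapl_parameters_msr_t v;
    memcpy(&v, &raw, sizeof v);             /* reinterpret the raw bits */

    /* bits 14:0 via the bit-field vs. via shift-and-mask */
    printf("bit-field : %llu\n", (unsigned long long)v.thermal_spec_power);
    printf("shift/mask: %llu\n", (unsigned long long)(raw & 0x7FFF));
    return 0;
}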
C99 6.7.2.1p10: An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
You have tried two ways to do the same thing, and I wouldn't trust either of them.
First, bit fields. Don't use them! The ordering of bit fields is unreliable, the behaviour of anything other than unsigned int is unreliable, the distribution of bit fields across struct members is unreliable. All these things can be fixed, but it just isn't worth it.
Second, shift and mask. This is the right way to do it but the code is wrong. You have a 32-bit mask (0xffffffff) shifted by 32 and 48 bits. Not a good idea at all.
So, what you need to do is write a simple reliable function that is an implementation of the signature given.
extractBitField(uint64_t inField, uint64_t width, uint64_t offset)
This is a good place to start. Write the function in a test program and unit test it until you are 100% certain it works exactly right. Step through with the debugger, check out all the shift combinations. Be absolutely sure you have it right.
When the test program works properly then transfer the function to the real program and watch it work first time.
I guess I could code that function for you but I don't think I will. You really need to go through the exercise so you learn how it works and why.
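For example, a minimal test-harness sketch along those lines (it assumes the 64-bit extractBitField from the update above and checks a few known patterns):
#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint64_t pattern = 0x0123456789ABCDEFULL;

    assert(extractBitField(pattern,  4,  0) == 0xF);      /* low nibble */
    assert(extractBitField(pattern,  8,  8) == 0xCD);     /* bits 15:8  */
    assert(extractBitField(pattern, 16, 48) == 0x0123);   /* bits 63:48 */
    assert(extractBitField(pattern, 64,  0) == pattern);  /* full width */
    return 0;
}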
Related
I have read a 6-byte unsigned char array from memory.
The endianness is big endian here.
Now I want to assign the value that is stored in the array to an integer variable. I assume this has to be a long long, since it must hold up to 6 bytes.
At the moment I am assigning it this way:
unsigned char aFoo[6];
long long nBar;
// read values to aFoo[]...
// aFoo[0]: 0x00
// aFoo[1]: 0x00
// aFoo[2]: 0x00
// aFoo[3]: 0x00
// aFoo[4]: 0x26
// aFoo[5]: 0x8e
nBar = (aFoo[0] << 40) + (aFoo[1] << 32) + (aFoo[2] << 24) + (aFoo[3] << 16) + (aFoo[4] << 8) + (aFoo[5]);
A memcpy approach would be neat, but when I do this
memcpy(&nBar, &aFoo, 6);
the 6 bytes are being copied to the long long from the start and thus have padding zeros at the end.
Is there a better way than my assignment with the shifting?
What you want to accomplish is called de-serialisation or de-marshalling.
For values that wide, using a loop is a good idea, unless you really need the max. speed and your compiler does not vectorise loops:
uint8_t array[6];
...
uint64_t value = 0;
uint8_t *p = array;

for ( int i = (sizeof(array) - 1) * 8 ; i >= 0 ; i -= 8 )
    value |= (uint64_t)*p++ << i;

// left-align
value <<= 64 - (sizeof(array) * 8);
Note the use of stdint.h types; sizeof(uint8_t) cannot differ from 1. Only these types are guaranteed to have the expected bit widths. Also use unsigned integers when shifting values: right-shifting negative signed values is implementation-defined, while left-shifting them invokes undefined behaviour.
Iff you need a signed value, just
int64_t final_value = (int64_t)value;
after the shifting. This is still implementation defined, but all modern implementations (and likely the older) just copy the value without modifications. A modern compiler likely will optimize this, so there is no penalty.
The declarations can be moved, of course. I just put them before where they are used for completeness.
You might try
nBar = 0;
memcpy((unsigned char*)&nBar + 2, aFoo, 6);
No & is needed before the array name because it already decays to a pointer to its first element.
The correct way to do what you need is to use a union:
#include <stdio.h>

typedef union {
    struct {
        char padding[2];
        char aFoo[6];
    } chars;
    long long nBar;
} Combined;

int main ()
{
    Combined x;

    // reset the content of "x"
    x.nBar = 0; // or memset(&x, 0, sizeof(x));

    // put values directly in x.chars.aFoo[]...
    x.chars.aFoo[0] = 0x00;
    x.chars.aFoo[1] = 0x00;
    x.chars.aFoo[2] = 0x00;
    x.chars.aFoo[3] = 0x00;
    x.chars.aFoo[4] = 0x26;
    x.chars.aFoo[5] = 0x8e;

    printf("nBar: %llx\n", x.nBar);
    return 0;
}
The advantage: the code is more clear, there is no need to juggle with bits, shifts, masks etc.
However, you have to be aware that, for speed optimization and hardware reasons, the compiler might insert padding bytes into the struct, leading to aFoo not overlapping the desired bytes of nBar. This minor disadvantage can be solved by telling the compiler to align the members of the union at byte boundaries (as opposed to the default, which is alignment at word boundaries, the word being 32-bit or 64-bit depending on the hardware architecture).
This used to be achieved using a #pragma directive and its exact syntax depends on the compiler you use.
Since C11/C++11, the alignas() specifier became the standard way to specify the alignment of struct/union members (given your compiler already supports it).
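For example, a hedged sketch of the packed variant; the exact directive is compiler-specific as noted above, and this one assumes a GCC/Clang/MSVC-style #pragma pack:
#pragma pack(push, 1)   /* byte-align the members of the union */
typedef union {
    struct {
        char padding[2];
        char aFoo[6];
    } chars;
    long long nBar;
} CombinedPacked;
#pragma pack(pop)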
In a (real-time) system, computer 1 (big endian) gets integer data from computer 2 (which is little endian). Given that we do not know the size of int, I check it using a switch statement on sizeof and use the corresponding __builtin_bswapX method as follows (assume that this builtin method is usable).
...
int data;
getData(&data); // not the actual function call. just represents what data is.
...
switch (sizeof(int)) {
    case 2:
        intVal = __builtin_bswap16(data);
        break;
    case 4:
        intVal = __builtin_bswap32(data);
        break;
    case 8:
        intVal = __builtin_bswap64(data);
        break;
    default:
        break;
}
...
Is this a legitimate way of swapping the bytes of the integer data, or is the switch-case statement totally unnecessary?
Update: I do not have access to the internals of getData() method, which communicates with the other computer and gets the data. It then just returns an integer data which needs to be byte-swapped.
Update 2: I realize that I caused some confusion. The two computers have the same int size but we do not know that size. I hope it makes sense now.
It seems odd to assume the size of int is the same on the two machines yet compensate for differing endian encodings.
The following only reflects the int size of the receiving side, not the sending side:
switch(sizeof(int))
sizeof(int) is the size, in units of char, of an int on the local machine. It should be sizeof(int)*CHAR_BIT to get the bit size. [OP has edited the post]
The sending machine should specify the data width, as 16, 32, or 64 bits, without regard to its int size, and the receiving end should be able to detect that value as part of the message, or an agreed-upon width should be used.
Much like hton() converts from local endian to network endian, these functions work with fixed-width integers:
#include <netinet/in.h>
uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);
So suggest sending/receiving the "int" as a 32-bit uint32_t in network endian.
[Edit]
Consider that computers exist with different endianness (little and big are the most common; others exist) and various int sizes, with bit widths of 32 (common), 16, 64, maybe even some odd-ball 36-bit and such, and room for growth to 128-bit. Let us assume N combinations. Rather than write routines to convert from 1 of N formats to N different formats (N*N routines), let us define a network format and fix its endianness to big and its bit width to 32. Now each computer does not care about, nor need to know, the int width/endianness of the sender/recipient of the data. Each platform sends/receives data in a locally optimized way, converting between its own endian/int and the network endian/int width.
OP describes not knowing the sender's int width, yet hints that the int width on the sender/receiver might be the same as on the local machine. If the int widths are specified to be the same and the endiannesses are specified to be one big/one little as described, then OP's code works.
However, such an "endians are opposite and int width the same" setup seems very selective. I would prepare code to cope with an interchange standard (a network standard), as certainly, even if today it is "opposite endian, same int", tomorrow it will evolve toward a network standard.
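For example, a minimal sketch of that interchange idea, where send_all()/recv_all() are hypothetical transport helpers and the agreed wire format is a 32-bit big-endian (network-order) integer:
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>                      /* htonl, ntohl */

void send_all(const void *buf, size_t len); /* hypothetical transport */
void recv_all(void *buf, size_t len);       /* hypothetical transport */

void send_int(int32_t value)
{
    uint32_t wire = htonl((uint32_t)value); /* local endian -> network endian */
    send_all(&wire, sizeof wire);
}

int32_t recv_int(void)
{
    uint32_t wire;
    recv_all(&wire, sizeof wire);
    return (int32_t)ntohl(wire);            /* network endian -> local endian */
}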
A portable approach would not depend on any machine properties, but only rely on mathematical operations and a definition of the communication protocol that is also hardware independent. For example, given that you want to store bytes in a defined way:
void serializeLittleEndian(uint8_t *buffer, uint32_t data) {
    size_t i;
    for (i = 0; i < sizeof(uint32_t); ++i) {
        buffer[i] = data % 256;
        data /= 256;
    }
}
and to restore that data to whatever machine:
uint32_t deserializeLittleEndian(uint8_t *buffer) {
    uint32_t data = 0;
    size_t i;
    for (i = sizeof(uint32_t); i > 0; --i) {
        data *= 256;
        data += buffer[i - 1];   /* start from the most significant byte */
    }
    return data;
}
EDIT: This is not portable to systems with other than 8 bits per byte, due to the use of uint8_t and uint32_t. The use of uint8_t implies a system with 8-bit chars. However, it will fail to compile on systems where these conditions are not met. Thanks to Olaf and Chqrlie.
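A short round-trip usage sketch of the two functions above (hypothetical value):
uint8_t buf[sizeof(uint32_t)];
serializeLittleEndian(buf, 0x12345678u);      /* buf = {0x78, 0x56, 0x34, 0x12} */
uint32_t back = deserializeLittleEndian(buf); /* back == 0x12345678 */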
Yes, this is totally cool - given you fix your switch for proper sizeof return values. One might be a little fancy and provide, for example, template specializations based on the size of int. But a switch like this is totally cool and will not produce any branches in optimized code.
As already mentioned, you generally want to define a protocol for communications across networks, which the hton/ntoh functions are mostly meant for. Network byte order is generally treated as big endian, which is what the hton/ntoh functions use. If the majority of your machines are little endian, it may be better to standardize on it instead though.
A couple of people have been critical of using __builtin_bswap, which I personally consider fine as long as you don't plan to target compilers that don't support it. Although, you may want to read Dan Luu's critique of intrinsics.
For completeness, I'm including a portable version of bswap that (at very least Clang) compiles into a bswap for x86(64).
#include <stddef.h>
#include <stdint.h>
size_t bswap(size_t x) {
    for (size_t i = 0; i < sizeof(size_t) >> 1; i++) {
        size_t d = sizeof(size_t) - i - 1;
        size_t mh = ((size_t) 0xff) << (d << 3);
        size_t ml = ((size_t) 0xff) << (i << 3);
        size_t h = x & mh;
        size_t l = x & ml;
        size_t t = (l << ((d - i) << 3)) | (h >> ((d - i) << 3));
        x = t | (x & ~(mh | ml));
    }
    return x;
}
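A quick sanity-check sketch for it (assuming a 64-bit size_t):
#include <stdio.h>

int main(void)
{
    size_t v = (size_t)0x0102030405060708ULL;
    printf("%zx\n", bswap(v));   /* expect 807060504030201 */
    return 0;
}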
Given a counter/timer that increases and simply wraps at a given bit width, a well-known solution to the problem of finding the difference between two captured values of the counter (where the counter might have wrapped between the two points) is simply to perform unsigned subtraction on the counter (possibly then interpreting the result as signed if it's not known which one is larger).
For example given a 32-bit timer, code like this can be used to determine the length of time some code takes to run:
uint32_t start = GetSomePlatformSpecificTimer();
RunSomeOtherCode();
uint32_t end = GetSomePlatformSpecificTimer();
uint32_t platformTicksTakenByCode = end - start;
Or alternatively to check if some time limit has been reached:
uint32_t limit = GetSomePlatformSpecificTimer() + timeLimitInTicks;
while (true)
{
bool finished = DoSomethingSmall();
if (finished)
break;
if ((int32_t)(GetSomePlatformSpecificTimer() - limit) >= 0)
return ERROR_TIMEOUT;
}
This works great if the timer is known to be 32 bits wide. It also can be adjusted for 16-bit or 8-bit timers by changing the types used.
Is there a similarly simple way to do the same thing where the timer size does not match a type size? For example, a 24-bit timer, or an 18-bit timer.
Assume that the bit size is <= 32 and is specified by a #define COUNTER_WIDTH in some external header (and might change).
Is the best solution to sign-extend the two counter values from COUNTER_WIDTH to 32-bits and then use the code above? I can see that possibly working for the FF -> 00 rollover but I think it would break the 7F -> 80 rollover, so presumably there would have to be some sort of check for this (perhaps sign-extending if the values are near zero and zero-extending if the values are near the midpoint). I think this also means that the difference between two values should be no more than a quarter of the counter range, otherwise it could cause issues.
Or is there a better way to do this?
Instead of sign-extending, you could multiply up so that the full range becomes the same size as your arithmetic type. In other words, use fixed-point arithmetic to fill the integer. In your case, with uint32_t, that would look like
uint32_t start = GetSomePlatformSpecificTimer();
RunSomeOtherCode();
uint32_t end = GetSomePlatformSpecificTimer();
start <<= (32 - COUNTER_WIDTH);
end <<= (32 - COUNTER_WIDTH);
uint32_t platformTicksTakenByCode = end - start;
platformTicksTakenByCode >>= (32 - COUNTER_WIDTH);
Obviously you'd want to encapsulate that arithmetic:
const uint32_t start = GetScaledTimer();
RunSomeOtherCode();
const uint32_t end = GetScaledTimer();
const uint32_t platformTicksTakenByCode = RescaleDuration(end - start);
with
uint32_t GetScaledTimer()
{
    return GetSomePlatformSpecificTimer() << (32 - COUNTER_WIDTH);
}

uint32_t RescaleDuration(uint32_t d)
{
    return d >> (32 - COUNTER_WIDTH);
}
You then have much the same behaviour as for your full-width timer, and the same option to use signed types if necessary.
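For instance, the timeout loop from the question could be adapted like this (a sketch, assuming COUNTER_WIDTH <= 32 as stated; DoSomethingSmall, timeLimitInTicks and ERROR_TIMEOUT are the names from the question):
uint32_t limit = GetScaledTimer() + (timeLimitInTicks << (32 - COUNTER_WIDTH));
while (true)
{
    bool finished = DoSomethingSmall();
    if (finished)
        break;
    if ((int32_t)(GetScaledTimer() - limit) >= 0)
        return ERROR_TIMEOUT;
}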
I am trying to solve Exercise 2-1 of K&R's C book. The exercise asks, among other things, to determine the ranges of char by direct computation (rather than printing the values directly from limits.h). Any idea how this should be done nicely?
OK, I'll throw my version into the ring:
unsigned char uchar_max = (unsigned char)~0;
// min is 0, of course
signed char schar_min = (signed char)(uchar_max & ~(uchar_max >> 1));
signed char schar_max = (signed char)(0 - (schar_min + 1));
It does assume 2's complement for signed and the same size for signed and unsigned char. While the former I just stipulate, the latter I'm sure can be deduced from the standard, as both are char and have to hold all encodings of the "execution charset" (what would that imply for variable-length encodings like UTF-8?).
It is straightforward to get a ones' complement and sign/magnitude version from this. Note that the unsigned version is always the same.
One advantage is that it runs completely with char types and uses no loops, etc., so it will still be performant on 8-bit architectures.
Hmm ... I really thought this would need a loop for signed. What did I miss?
Assuming that the type will wrap intelligently1, you can simply start by setting the char variable to be zero.
Then increment it until the new value is less than the previous value.
The new value is the minimum, the previous value was the maximum.
The following code should be a good start:
#include <stdio.h>

int main (void) {
    char prev = 0, c = 0;
    while (c >= prev) {
        prev = c;
        c++;
    }
    printf ("Minimum is %d\n", c);
    printf ("Maximum is %d\n", prev);
    return 0;
}
1 Technically, overflowing a variable is undefined behaviour and anything can happen, but the vast majority of implementations will work. Just keep in mind it's not guaranteed to work.
In fact, the difficulty in working this out in a portable way (some implementations had various different bit-widths for char and some even used different encoding schemes for negative numbers) is probably precisely why those useful macros were put into limits.h in the first place.
You could always try the ol' standby, printf...
Let's just strip things down for simplicity's sake.
This isn't a complete answer to your question, but it will check to see if a char is 8-bit--with a little help (yes, there's a bug in the code). I'll leave it up to you to figure out how.
#include <stdio.h>
#DEFINE MMAX_8_BIT_SIGNED_CHAR 127

main ()
{
    char c;

    c = MAX_8_BIT_SIGNED_CHAR;
    printf("%d\n", c);
    c++;
    printf("%d\n", c);
}
Look at the output. I'm not going to give you the rest of the answer because I think you will get more out of it if you figure it out yourself, but I will say that you might want to take a look at the bit shift operator.
There are 3 relatively simple functions that can cover both the signed and unsigned types on both x86 & x86_64:
/* signed data type low storage limit */
long long limit_s_low (unsigned char bytes)
{ return -(1ULL << (bytes * CHAR_BIT - 1)); }

/* signed data type high storage limit */
long long limit_s_high (unsigned char bytes)
{ return (1ULL << (bytes * CHAR_BIT - 1)) - 1; }

/* unsigned data type high storage limit */
unsigned long long limit_u_high (unsigned char bytes)
{
    if (bytes < sizeof (long long))
        return (1ULL << (bytes * CHAR_BIT)) - 1;
    else
        return ~0ULL;
}
With CHAR_BIT generally being 8.
The smart way: simply take sizeof() of your variable, and you know it is that many times larger than whatever has sizeof() == 1, usually char. Given that, you can use math to calculate the range (a sketch follows below). This doesn't work if you have odd-sized types, like 3-bit chars or something.
The try-hard way: put 0 in the type and increment until it doesn't increase anymore (it wraps around or stays the same, depending on the machine). Whatever the number before that was, that's the max. Do the same for the min.
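Since sizeof(char) is 1 by definition, a sketch of the "use math" idea has to obtain the bit count some other way; this one counts the value bits at run time and assumes two's complement for the signed range:
#include <stdio.h>

int main(void)
{
    unsigned char all_ones = (unsigned char)~0u;
    unsigned long umax = 0;              /* maximum of unsigned char */

    while (all_ones) {                   /* count value bits without limits.h */
        umax = umax * 2 + 1;
        all_ones >>= 1;
    }

    long smax = (long)(umax / 2);        /* 2^(bits-1) - 1 */
    long smin = -smax - 1;               /* assumes two's complement */

    printf("unsigned char: 0 .. %lu\n", umax);
    printf("signed char  : %ld .. %ld\n", smin, smax);
    return 0;
}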
Suppose I have the following tagged union:
// f32 is a float of 32 bits
// uint32 is an unsigned int of 32 bits
struct f32_or_uint32 {
    char tag;
    union {
        f32 f;
        uint32 u;
    };
};
If tag == 0, then it is an f32. If tag == 1, then it is a uint32. There is only one problem with that representation: it uses 64 bits, when only 33 should be necessary. That is almost a 1/2 waste, which can be considerable when you are dealing with huge buffers. I never need the full 32 bits of the value, so I thought of using one bit as the flag and doing this instead:
#define IS_UINT32(x)   (!((x) & 0x80000000))
#define IS_F32(x)      ((x) & 0x80000000)
#define MAKE_F32(x)    ((x) | 0x80000000)
#define EXTRACT_F32(x) ((x) & 0x7FFFFFFF)

union f32_or_uint32 {
    f32 f;
    uint32 u;
};
This way, I am using 31 bits for the value and only 1 for the tag. My question is: could this practice be detrimental to performance, maintainability and portability?
No, you can't do that. At least, not in the general sense.
An unsigned integer takes on 2^32 different values. It uses all 32 bits. Likewise, a float takes on (nearly) 2^32 different values. It uses all 32 bits.
With some care it might well be possible to isolate a bit that will always be 1 in one type and 0 for the other, across the range of values that you actually want to use. The high bit of unsigned int would be available if you decided to use values only up to 2^31. The low bit of float could be available if you didn't mind a small rounding error.
There is a better strategy available if the range of unsigned ints is smaller (say only 23 bits). You could select a high order bit pattern of 1+8 bits that was illegal for your usage of float. Perhaps you can manage without +/- infinity? Try 0x1ff.
To answer your other questions, it's relatively easy to create a new type like this in C++, using a class and some inline functions, and get good performance. Doing it with macros in C would tend to be more invasive of the code and more prone to bugs, but with similar performance. The instruction overhead required to do these tests and perhaps do some mask operations is unlikely to be detectable in most normal usages. Obviously that would have to be reconsidered in the case of a computationally intensive usage, but you can just see this as a typical space/speed trade-off.
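A hedged sketch of that 0x1ff idea (the macro names are made up for illustration; it assumes integers of at most 23 bits and that the stored floats are never negative NaN or negative infinity, whose bit patterns also start with nine 1-bits):
#define TAG_MASK       0xFF800000u                 /* top 1+8 bits, i.e. 0x1FF << 23 */
#define IS_UINT23(x)   (((x) & TAG_MASK) == TAG_MASK)
#define MAKE_UINT23(v) ((v) | TAG_MASK)            /* v must fit in 23 bits */
#define GET_UINT23(x)  ((x) & 0x007FFFFFu)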
Let's talk first about whether this works conceptually. This trick more or less works if you're storing unsigned 32-bit numbers but you know they will never be greater than 2^31. It works because all numbers smaller than 2^31 will always have a "0" in the high bit. If you know it will always be 0, you don't actually have to store it.
The trick also more or less works if you are storing floating point numbers that are never negative. For single-precision floating point numbers, the high bit indicates sign, and is always 0 if the number is positive. (This property of floating-point numbers is not nearly as well-known among programmers, so you'd want to document this).
So assuming your use case fits in these parameters, the approach works conceptually. Now let's investigate whether it is possible to express in C.
You can't perform bitwise operations on floating-point values (see the question on why you can't perform a bitwise operation on floating-point numbers for more detail). So to get at the floating-point number's bit pattern, you need to copy it into an integer:
typedef uint32_t tagged_t;
tagged_t float_to_tagged(float f) {
    uint32_t ret;
    memcpy(&ret, &f, sizeof(f));
    // Make sure the user didn't pass us a negative number.
    assert((ret & 0x80000000) == 0);
    return ret | 0x80000000;
}
Don't worry about that memcpy() call -- any compiler worth its salt will optimize it away. This is the best and fastest way to get at the float's underlying bit pattern.
And you'd likewise need to use memcpy to get the original float back.
float tagged_to_float(tagged_t val) {
    float ret;
    val &= 0x7FFFFFFF;   // clear the tag bit
    memcpy(&ret, &val, sizeof(val));
    return ret;
}
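A short usage sketch of the two helpers (hypothetical values; uint32 values below 2^31 are stored as-is with the tag bit clear):
tagged_t a = 12345u;                 /* plain non-negative integer, high bit 0  */
tagged_t b = float_to_tagged(3.5f);  /* non-negative float, high bit set as tag */

if (b & 0x80000000) {                /* tag test: high bit set means float      */
    float f = tagged_to_float(b);    /* f == 3.5f                               */
}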
I have answered your question directly because I believe in giving people the facts. That said, I agree with other posters who say this is unlikely to be your best design choice. Reflect on your use case: if you have very large buffers of these values, is it really the case that every single one can be either a uint32 or a float, and there is no pattern to it? If you can move this type information to a higher level, where the type info applies to all values in some part of the buffer, it will most definitely be more efficient than making your loops test the type of every value individually.
Using the high bit is going to be annoying on the most widespread platform, x86, because it's the sign bit and the most significant bit for unsigned ints.
A scheme that's IMO slightly better is to use the lowest bit instead but that requires decoding (i.e. storing a shifted integer):
#include <stdio.h>
typedef union tag_uifp {
    unsigned int ui32;
    float fp32;
} uifp;

#define FLOAT_VALUE 0x00
#define UINT_VALUE  0x01

int get_type(uifp x) {
    return x.ui32 & 1;
}

unsigned get_uiv(uifp x) {
    return x.ui32 >> 1;
}

float get_fpv(uifp x) {
    return x.fp32;
}

uifp make_uiv(unsigned x) {
    uifp result;
    result.ui32 = 1 + (x << 1);
    return result;
}

uifp make_fpv(float x) {
    uifp result;
    result.fp32 = x;
    result.ui32 &= ~1;
    return result;
}
uifp data[10];
void setNumbers() {
    int i;
    for (i=0; i<10; i++) {
        data[i] = (i & 1) ? make_fpv(i/10.0) : make_uiv(i);
    }
}
void printNumbers() {
    int i;
    for (i=0; i<10; i++) {
        if (get_type(data[i]) == FLOAT_VALUE) {
            printf("%0.3f\n", get_fpv(data[i]));
        } else {
            printf("%u\n", get_uiv(data[i]));
        }
    }
}
int main(int argc, const char *argv[]) {
    setNumbers();
    printNumbers();
    return 0;
}
With this approach what you are losing is the least significant bit of precision from the float number (i.e. storing a float value and re-reading it is going to lose some accuracy) and only 31 bits are available for the integer.
You could instead try to use only NaN floating-point values, but that means only 22 bits are easily available for the integers because of the float format (23 if you're willing to also lose infinity).
The idea of using lowest bits for tagging is used often (e.g. Lisp implementations).