How to find the nth bit of an integer in C

I've got an assignment where I need to convert from an 8-bit sign-magnitude number to two's complement and then add those two numbers. I have a relatively good idea of how to do this, but I can't work out how to find the eighth bit of an integer so that I can tell what sign the number has.
The overall idea is that if the sign bit is 0, I just return the number, as it is already in two's complement. If it is 1, I want to set it to 0, invert all bits with the ~ operator, and then add 1.
Thanks in advance

You can check if the high bit is set by creating a mask that has just that bit set and using a bitwise AND to see if the result is non-zero.
Once you know the high bit is set, you can convert to two's complement by clearing the sign bit, flipping all bits, and adding one.
uint8_t x = (some value)
if (x & (1 << 7)) {
printf("sign bit set\n");
x = (uint8_t)((~(x & (0x7F))) & 0xFF) + 1;
printf("converted value: %02X\n", x);
}
Then you can add this number to any other normally.
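As a rough sketch of the whole assignment (convert two sign-magnitude bytes, then add them), something like the following might work. The helper names sm8_to_tc8 and add_sm8 are made up for illustration, and the sum is assumed to simply wrap to 8 bits.

```c
#include <stdint.h>

/* Hypothetical helper: 8-bit sign-magnitude -> 8-bit two's complement. */
uint8_t sm8_to_tc8(uint8_t sm)
{
    if (sm & (1u << 7)) {
        /* Clear the sign bit, invert, add one. The cast truncates the
           promoted (wider) result back to 8 bits. */
        sm = (uint8_t)(~(sm & 0x7Fu) + 1u);
    }
    return sm;
}

/* Hypothetical helper: add two sign-magnitude bytes. The result is in
   two's complement and wraps to 8 bits (assumption: overflow is ignored). */
uint8_t add_sm8(uint8_t a, uint8_t b)
{
    return (uint8_t)(sm8_to_tc8(a) + sm8_to_tc8(b));
}
```

For example, add_sm8(0x85, 0x03) computes -5 + 3 and yields 0xFE, which is -2 in 8-bit two's complement.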

This assumes that your computer/compiler uses two's complement (almost certainly the case) and that you want the result to be in two's complement.
Use a uint8_t to hold the sign-and-magnitude number.
To check if a bit is set, use the bitwise AND operator &, together with a bit mask corresponding to the msb. To get a bit mask corresponding to bit n, left shift the value 1 n times. In C code:
#define SIGN (1 << 7)
uint8_t sm = ...;
if(sm & SIGN) // if non-zero, then the SIGN bit is set
{
}
else // it was zero, the SIGN bit is not set
{
}
To do the actual conversion, there are several ways. I simply would mask out and copy the relevant parts of the number, again with bitwise AND:
#define MAGNITUDE 0x7F
int8_t magnitude = sm & MAGNITUDE; // variable magnitude is two's compl.
EDIT complete solution (since someone already posted one):
#define SIGN (1 << 7)
#define MAGNITUDE 0x7F
uint8_t sm = ...;
int8_t twos_compl = sm & MAGNITUDE;
if(sm & SIGN) // if non-zero, then the SIGN bit is set
{
twos_compl = -twos_compl;
}
int8_t x = ...; // some other number in two's complement
int16_t result = twos_compl + x;
As a side note, be very careful when mixing the ~ operator with small integer types, because it performs an implicit integer promotion. For example, given uint8_t x = 1, the expression ~x gives you 0xFFFFFFFE (on a system with 32-bit int) and not 0xFE as you might expect.
For the above task, there is no need to use ~ at all.
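To make the promotion pitfall concrete, here is a small sketch. The function names are hypothetical; the only point is that ~ is evaluated on the promoted int, and a cast back to uint8_t is needed to get an 8-bit result.

```c
#include <stdint.h>

/* ~ operates on the promoted int, so every bit above bit 7 is set too. */
unsigned complement_promoted(uint8_t x)
{
    return (unsigned)~x;
}

/* Casting back to uint8_t keeps only the low 8 bits. */
uint8_t complement_narrow(uint8_t x)
{
    return (uint8_t)~x;
}
```

With x = 1, complement_narrow returns 0xFE, while complement_promoted returns all ones except bit 0 (0xFFFFFFFE when int is 32 bits wide).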

Related

Unknown system bitsize for int, how to create mask

I would like to create a mask for the MSB only. However, the width of int on the system is supposed to be unknown, so you cannot assume 32 bits.
See the following:
// THE FOLLOWING FAILS BECAUSE OF SYSTEM IMPLEMENTING A LOGICAL
// RIGHT SHIFT
// Idea is
// 1. 0 inverted = all 1's
// 2. Arithmetic shift right
// 3. Then invert again to preserve MSB '1'
const int unsigned mask = ~(~0>>1); // FAIL, because of logic shift
Assuming 16 bit system
~0 give FFFF
~0>>1 give 7FFF
~(~0 >> 1) give 8000
You should add a u suffix to make the shifted value unsigned, so that a logical right shift is performed instead of an arithmetic one.
const int unsigned mask = ~(~0u>>1);
You can just left shift the (unsigned) value 1 by the number of bits in the type minus 1 (i.e. for a 32-bit type, the MSB will be 1 << 31). To get the number of bits, use a combination of the sizeof operator and the CHAR_BIT constant (defined in <limits.h>):
const unsigned int MSB = 1u << (sizeof(unsigned int) * CHAR_BIT - 1);
INT_MAX is the int bit pattern of 0111...1111 (of some width)* for all implementations.
To form 1000...0000, invert those bits.
~INT_MAX
The above treads on undefined behavior (UB).
Better to look to unsigned or wider types.
unsigned mask = ~(unsigned) INT_MAX;
On rare machines, INT_MAX == UINT_MAX, so on those, look to wider types:
long long mask = ~(long long) INT_MAX;
On rarer machines (unheard of), INT_MAX == LONG_MAX is also true, then we are out of luck.
Pedantic: Rare machines use padding on int/unsigned, so it is best to drive code with (U)INT_MAX rather than sizeof.
* Maybe some padding bits too - very rare.
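The three suggestions above can be put side by side as a sketch. The function names are invented here, and the code assumes unsigned has no padding bits and that INT_MAX < UINT_MAX, which holds on mainstream platforms but is not guaranteed by the standard.

```c
#include <limits.h>

/* Shift 1u up to the top bit; relies on sizeof, so assumes no padding bits. */
unsigned msb_by_shift(void)
{
    return 1u << (sizeof(unsigned) * CHAR_BIT - 1);
}

/* Unsigned operand, so the right shift is logical: ~0u >> 1 is 0111...1. */
unsigned msb_by_invert(void)
{
    return ~(~0u >> 1);
}

/* Invert INT_MAX after converting to unsigned (assumes INT_MAX < UINT_MAX). */
unsigned msb_by_intmax(void)
{
    return ~(unsigned)INT_MAX;
}
```

On a typical platform all three return the same value, with only the most significant bit of unsigned set.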

Finding if a value falls within a range, using bitwise operators in C

So I am working on this method, but am restricted to using ONLY these operators:
<<, >>, !, ~, &, ^, |, +
I need to find if a given int parameter can be represented using 2's complement arithmetic in a given amount of bits.
Here is what I have so far:
int validRange(int val, int bits){
int minInRange = ~(1<<(bits + ~0 ))+1; //the smallest 2's comp value possible with this many bits
int maxInRange = (1<<(bits+~0))+~0; //largest 2's comp value possible
..........
}
This is what I have so far, and all I need to do now is figure out how to tell if minInRange <= val <=maxInRange. I wish I could use the greater than or less than operator, but we are not allowed. What is the bitwise way to check this?
Thanks for any help!
Two's complement negative numbers always have a '1' in their high bit.
You can convert from negative to positive (and vice versa) by converting from FF -> 00 -> 01. That is, invert the bits, add 1. (01 -> FE -> FF also works: invert the bits, add 1)
A positive number can be represented if the highest set bit in the number is within your range. (nbits - 1: 7 bits for an 8 bit signed char, etc.)
I'm not sure if your constraints allow you to use arrays. They would speed up some things but can be replaced with loops or if statements.
Anyway, if 1 << (NUM_INT_BITS-1) is set on your input, then it's negative.
Invert, add one.
Now, consider 0. Zero is a constant, and it's always the same no matter how many bits. But if you invert 0, you get "all the bits" which changes by architecture. So, ALL_BITS = ~0.
If you want to know if a positive number can be represented in 2 bits, check to see if any bits greater than or equal to bit 2 are set. Example:
two_bits = 0b00000011
any_other_bits = ~two_bits # Result: 0b11...11100
if positive_number & any_other_bits
this number is too fat for these bits!
But how do you know what ~two_bits should be? Well, it's "all set bits except the bottom however-many". And you can construct that by starting with "all set bits" and shifting them upwards (aka, "left") however-many places:
any_other_bits = ~0 << 2 # where "2" is the number of bits to check
All together now:
if (val & ((unsigned)INT_MAX + 1))
val = ~val + 1;
mask = ~0 << bits;
too_wide = val & mask;
return !too_wide;
To test whether a number can be represented as an N-bit two's complement number, simply test that either:
The number bitwise-ANDed with the complement of a word with the low (N-1) bits set is equal to zero
OR the high InputBitWidth-(N-1) bits of the number are all 1s.
mask=(1<<(bits-1))-1; return ( !(val&~mask) | !((val&~mask)^~mask) );
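A more compact variant of the same idea, using only the operators the question allows, is sketched below. The name fits_in_bits is hypothetical. It assumes 1 <= bits <= the width of int, and that right-shifting a negative int is an arithmetic shift, which is implementation-defined in C but is what common compilers such as GCC and Clang do.

```c
/* Returns 1 if val is representable in a `bits`-wide two's complement
 * field, 0 otherwise.
 * Assumptions: 1 <= bits <= width of int; >> on a negative int is an
 * arithmetic shift (implementation-defined). */
int fits_in_bits(int val, int bits)
{
    /* bits + ~0 is bits - 1. After shifting, everything from the sign
       position of the narrow field upwards remains. */
    int high = val >> (bits + ~0);

    /* The value fits if those remaining bits are all 0 (non-negative)
       or all 1 (negative), i.e. high is 0 or -1. */
    return !high | !(~high);
}
```

For a 3-bit field this accepts -4 through 3 and rejects 4 and -5.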

Convert Raw 14 bit Two's Complement to Signed 16 bit Integer

I am doing some work in embedded C with an accelerometer that returns data as a 14 bit 2's complement number. I am storing this result directly into a uint16_t. Later in my code I am trying to convert this "raw" form of the data into a signed integer to represent / work with in the rest of my code.
I am having trouble getting the compiler to understand what I am trying to do. In the following code I'm checking if the 14th bit is set (meaning the number is negative) and then I want to invert the bits and add 1 to get the magnitude of the number.
int16_t fxls8471qr1_convert_raw_accel_to_mag(uint16_t raw, enum fxls8471qr1_fs_range range) {
int16_t raw_signed;
if(raw & _14BIT_SIGN_MASK) {
// Convert 14 bit 2's complement to 16 bit 2's complement
raw |= (1 << 15) | (1 << 14); // 2's complement extension
raw_signed = -(~raw + 1);
}
else {
raw_signed = raw;
}
uint16_t divisor;
if(range == FXLS8471QR1_FS_RANGE_2G) {
divisor = FS_DIV_2G;
}
else if(range == FXLS8471QR1_FS_RANGE_4G) {
divisor = FS_DIV_4G;
}
else {
divisor = FS_DIV_8G;
}
return ((int32_t)raw_signed * RAW_SCALE_FACTOR) / divisor;
}
This code unfortunately doesn't work. The disassembly shows me that for some reason the compiler is optimizing out my statement raw_signed = -(~raw + 1); How do I achieve the result I desire?
The math works out on paper, but I feel like for some reason the compiler is fighting with me :(.
Converting the 14 bit 2's complement value to 16 bit signed, while maintaining the value, is simply a matter of:
int16_t accel = (int16_t)(raw << 2) / 4 ;
The left-shift pushes the sign bit into the 16 bit sign bit position; the divide by four restores the magnitude but maintains its sign. The divide avoids the implementation defined behaviour of a right-shift, but will normally result in a single arithmetic-shift-right on instruction sets that allow it. The cast is necessary because raw << 2 is an int expression, and unless int is 16 bit, the divide will simply restore the original value.
It would be simpler however to just shift the accelerometer data left by two bits and treat it as if the sensor was 16 bit in the first place. Normalising everything to 16 bit has the benefit that the code needs no change if you use a sensor with any number of bits up-to 16. The magnitude will simply be four times greater, and the least significant two bits will be zero - no information is gained or lost, and the scaling is arbitrary in any case.
int16_t accel = raw << 2 ;
In both cases, if you want the unsigned magnitude then that is simply:
int32_t mag = (int32_t)labs( (int)accel ) ;
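Both variants can be wrapped in small helpers, roughly as follows. The function names are hypothetical, and the conversion from uint16_t to int16_t of a value above 0x7FFF is implementation-defined; the sketch assumes it wraps, as it does with GCC and Clang.

```c
#include <stdint.h>

/* Sign-extend a 14-bit two's complement reading, keeping its value
 * (range -8192..8191). */
int16_t accel14_to_int16(uint16_t raw)
{
    /* Move bit 13 (the 14-bit sign) up to bit 15; the uint16_t cast
       drops anything shifted past bit 15. */
    uint16_t shifted = (uint16_t)(raw << 2);

    /* Assumption: this conversion wraps (implementation-defined).
       Dividing by 4 undoes the scaling while keeping the sign. */
    return (int16_t)((int16_t)shifted / 4);
}

/* Treat the sensor as if it were 16-bit: same sign, four times the value. */
int16_t accel14_as_16bit(uint16_t raw)
{
    return (int16_t)(uint16_t)(raw << 2);
}
```

A raw reading of 0x3FFF (14-bit -1) gives -1 from the first helper and -4 from the second.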
I would do simple arithmetic instead. The result is 14-bit signed, which is represented as a number from 0 to 2^14 - 1. Test if the number is 2^13 or above (signifying a negative) and then subtract 2^14.
int16_t fxls8471qr1_convert_raw_accel_to_mag(uint16_t raw, enum fxls8471qr1_fs_range range)
{
int16_t raw_signed = raw;
if(raw_signed >= 1 << 13) {
raw_signed -= 1 << 14;
}
uint16_t divisor;
if(range == FXLS8471QR1_FS_RANGE_2G) {
divisor = FS_DIV_2G;
}
else if(range == FXLS8471QR1_FS_RANGE_4G) {
divisor = FS_DIV_4G;
}
else {
divisor = FS_DIV_8G;
}
return ((int32_t)raw_signed * RAW_SCALE_FACTOR) / divisor;
}
Please check my arithmetic. (Do I have 13 and 14 correct?)
Supposing that int in your particular C implementation is 16 bits wide, the expression (1 << 15), which you use in mangling raw, produces undefined behavior. In that case, the compiler is free to generate code to do pretty much anything -- or nothing -- if the branch of the conditional is taken wherein that expression is evaluated.
Also if int is 16 bits wide, then the expression -(~raw + 1) and all intermediate values will have type unsigned int == uint16_t. This is a result of "the usual arithmetic conversions", given that (16-bit) int cannot represent all values of type uint16_t. The result will have the high bit set and therefore be outside the range representable by type int, so assigning it to an lvalue of type int produces implementation-defined behavior. You'd have to consult your documentation to determine whether the behavior it defines is what you expected and wanted.
If you instead perform a 14-bit sign conversion, forcing the higher-order bits off ((~raw + 1) & 0x3fff) then the result -- the inverse of the desired negative value -- is representable by a 16-bit signed int, so an explicit conversion to int16_t is well-defined and preserves the (positive) value. The result you want is the inverse of that, which you can obtain simply by negating it. Overall:
raw_signed = -(int16_t)((~raw + 1) & 0x3fff);
Of course, if int were wider than 16 bits in your environment then I see no reason why your original code would not work as expected. That would not invalidate the expression above, however, which produces consistently-defined behavior regardless of the size of default int.
Assuming when code reaches return ((int32_t)raw_signed ..., it has a value in the [-8192 ... +8191] range:
If RAW_SCALE_FACTOR is a multiple of 4 then a little savings can be had.
So rather than
int16_t raw_signed = raw << 2;
raw_signed >>= 2;
instead
int16_t fxls8471qr1_convert_raw_accel_to_mag(uint16_t raw,enum fxls8471qr1_fs_range range){
int16_t raw_signed = raw << 2;
uint16_t divisor;
...
// return ((int32_t)raw_signed * RAW_SCALE_FACTOR) / divisor;
return ((int32_t)raw_signed * (RAW_SCALE_FACTOR/4)) / divisor;
}
To convert the 14-bit two's-complement into a signed value, you can flip the sign bit and subtract the offset:
int16_t raw_signed = (raw ^ 1 << 13) - (1 << 13);
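For comparison, here is a sketch of the flip-and-subtract approach next to the plain arithmetic one, done in int32_t so that nothing depends on the width of int. The function names are made up, and the input is masked to 14 bits on the assumption that the upper two bits of the raw reading carry no data.

```c
#include <stdint.h>

/* Flip the 14-bit sign bit, then subtract its weight. */
int16_t sext14_xor(uint16_t raw)
{
    int32_t v = (int32_t)(raw & 0x3FFF);   /* keep only the 14 data bits */
    return (int16_t)((v ^ 0x2000) - 0x2000);
}

/* Plain arithmetic: values of 2^13 and above represent negatives. */
int16_t sext14_sub(uint16_t raw)
{
    int32_t v = (int32_t)(raw & 0x3FFF);
    if (v >= 0x2000) {
        v -= 0x4000;                       /* subtract 2^14 */
    }
    return (int16_t)v;
}
```

Both map 0x3FFF to -1, 0x2000 to -8192, and 0x1FFF to 8191, and every intermediate value fits in int16_t, so the final conversion is value-preserving.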

How to convert from sign-magnitude to two's complement

How would I convert from sign-magnitude to two's complement? I don't know where to start. Any help would be appreciated.
I can only use the following operations: !, ~, |, &, ^, +, >>, <<.
/*
* sm2tc - Convert from sign-magnitude to two's complement
* where the MSB is the sign bit
* Example: sm2tc(0x80000005) = -5.
*
*/
int sm2tc(int x) {
return 2;
}
You can convert sign-magnitude to two's complement by subtracting the number from 0x80000000 if the number is negative. This works for a 32-bit integer on a machine that uses two's complement to represent negative values. If the value is positive, however, the same expression results in a two's complement negation, so it must only be applied to negative inputs. A right shift of a negative two's complement number typically shifts in ones, and we can use this to build a mask that selects between the original value and the conversion of a negative sign-magnitude value to a negative two's complement value.
int sm2tc(int x) {
int m = x >> 31;
return (~m & x) | (((x & 0x80000000) - x) & m);
}
There you go.
uint32_t sm2tc(uint32_t x)
{
return (x & 0x80000000)
? ((~(x & 0x7fffffff)) + (uint32_t)1)
: x;
}
Interestingly, the conversion between the two formats is symmetrical, so you need only one conversion function to swap from one format to the other. Here is the complete conversion without relying on any conditionals:
uint32_t convertSignRepresentation(uint32_t in) {
uint32_t mask = -(in >> 31);
return (mask & (0x80000000 - in)) | (~mask & in);
}
The technique I used here is essentially replacing the conditional operator in
uint32_t convertSignRepresentation(uint32_t in) {
return (in >> 31) ? 0x80000000-in : in;
}
by a bitmask of only zeros or ones to select the correct resulting value.
Please note that 0x80000000 (either the smallest possible value, or negative zero) is projected to positive zero and cannot be recovered. So convertSignRepresentation(convertSignRepresentation(0x80000000)) yields zero instead of 0x80000000. This might give nasty surprises. It might be avoided in theory by mapping 0x80000000 onto itself, but that is not as easy to do and has even nastier surprises...
Edit:
A comment pointed out that subtraction was not on the list of allowed operators, even though addition is. I don't know whether this was deliberate or a mistake. Anyways, the operation -x can be written as ~x + 1. With this, the code becomes:
uint32_t convertSignRepresentation(uint32_t in) {
uint32_t mask = ~(in >> 31) + 1;
return (mask & (0x80000001 + ~in)) | (~mask & in);
}
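As a quick way to convince yourself of the symmetry claim, the subtraction-free version can be written with explicit parentheses and tried on a few values. This is only a sketch: sm_swap is a hypothetical name, and it assumes 32-bit values as in the question.

```c
#include <stdint.h>

/* Converts sign-magnitude <-> two's complement (same operation both ways). */
uint32_t sm_swap(uint32_t in)
{
    /* All ones if the sign bit is set, all zeros otherwise (-x as ~x + 1). */
    uint32_t mask = ~(in >> 31) + 1u;

    /* 0x80000001 + ~in equals 0x80000000 - in, without using '-'. */
    return (mask & (0x80000001u + ~in)) | (~mask & in);
}
```

Here sm_swap(0x80000005) gives 0xFFFFFFFB (-5), applying it again gives 0x80000005 back, and positive inputs pass through unchanged. As noted above, 0x80000000 maps to 0 and is not recovered.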
Signed Numbers are 8 bit quantities with the least significant 7 bits representing the magnitude and the most significant bit indicating the sign. 0 in this bit indicates the number is positive, and 1 indicates it is negative. There is no magnitude information in this 8th bit-just the sign.
To convert a negative signed number to two's complement, first set the 8th bit to zero. Then invert all 8 bits. Finally add 1. An example follows:
Signed Number:
10001111
set the 8th bit to zero: (use & operator)
00001111
invert all 8 bits: (use bitwise-complement operator)
11110000
finally, add 1; resulting in the final two's complement number: (use + operator)
11110001
If the 8th bit is 0, indicating that the signed number is positive, the number requires no conversion. Its two's complement representation is the same as the signed magnitude representation.
To convert from Sign Magnitude x to Two's Complement y:
1) On a two's complement machine.
2) Use only !,~,|,&,^,+,>>,<<
3) Does not use ?:, -, *, /
4) Does not assume 4-byte int
5) Work with all Sign Magnitude including +0 and -0
#include <limits.h>
int sm2tc(int x) {
int sign = x & INT_MIN;
int negmask = UINT_MAX + !sign;
return (x & ~negmask) | (negmask & ((~x + 1)^INT_MIN));
}

How to create mask with least significant bits set to 1 in C

Can someone please explain this function to me?
A mask with the least significant n bits set to 1.
Ex:
n = 6 --> 0x2F, n = 17 --> 0x1FFFF // I don't get these at all, especially how n = 6 --> 0x2F
Also, what is a mask?
The usual way is to take a 1, and shift it left n bits. That will give you something like: 00100000. Then subtract one from that, which will clear the bit that's set, and set all the less significant bits, so in this case we'd get: 00011111.
A mask is normally used with bitwise operations, especially and. You'd use the mask above to get the 5 least significant bits by themselves, isolated from anything else that might be present. This is especially common when dealing with hardware that will often have a single hardware register containing bits representing a number of entirely separate, unrelated quantities and/or flags.
A mask is a common term for an integer value that is bit-wise ANDed, ORed, XORed, etc with another integer value.
For example, if you want to extract the 8 least significant bits of an int variable, you do variable & 0xFF. 0xFF is a mask.
Likewise if you want to set bits 0 and 8, you do variable | 0x101, where 0x101 is a mask.
Or if you want to invert the same bits, you do variable ^ 0x101, where 0x101 is a mask.
To generate a mask for your case you should exploit the simple mathematical fact that if you add 1 to your mask (the mask having all its least significant bits set to 1 and the rest to 0), you get a value that is a power of 2.
So, if you generate the closest power of 2, then you can subtract 1 from it to get the mask.
Positive powers of 2 are easily generated with the left shift << operator in C.
Hence, 1 << n yields 2^n. In binary it's 10...0 with n 0s.
(1 << n) - 1 will produce a mask with n lowest bits set to 1.
Now, you need to watch out for overflows in left shifts. In C (and in C++) you can't legally shift a variable left by as many bit positions as the variable has, so if ints are 32-bit, 1<<32 results in undefined behavior. Signed integer overflows should also be avoided, so you should use unsigned values, e.g. 1u << 31.
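Putting those caveats together, a guarded helper might look like the sketch below. The name low_bits_mask is invented, and treating any n at or above the width of unsigned as "all bits set" is a design choice made here, not something the text above prescribes.

```c
#include <limits.h>

/* Mask with the n least significant bits set to 1. */
unsigned low_bits_mask(unsigned n)
{
    const unsigned width = (unsigned)(sizeof(unsigned) * CHAR_BIT);

    /* Shifting by the full width (or more) is undefined, so handle it
       separately (assumption: such n means "all bits"). */
    if (n >= width) {
        return ~0u;
    }
    /* Unsigned arithmetic avoids signed overflow when n == width - 1. */
    return (1u << n) - 1u;
}
```

With this, n = 6 gives 0x3F and n = 17 gives 0x1FFFF.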
For both correctness and performance, the best way to accomplish this has changed since this question was asked back in 2012 due to the advent of BMI instructions in modern x86 processors, specifically BLSMSK.
Here's a good way of approaching this problem, while retaining backwards compatibility with older processors.
This method is correct, whereas the current top answers produce undefined behavior in edge cases.
Clang and GCC, when allowed to optimize using BMI instructions, will condense gen_mask() to just two ops. With supporting hardware, be sure to add compiler flags for BMI instructions:
-mbmi -mbmi2
#include <inttypes.h>
#include <stdio.h>
uint64_t gen_mask(const uint_fast8_t msb) {
const uint64_t src = (uint64_t)1 << msb;
return (src - 1) ^ src;
}
int main() {
uint_fast8_t msb;
for (msb = 0; msb < 64; ++msb) {
printf("%016" PRIx64 "\n", gen_mask(msb));
}
return 0;
}
First, for those who only want the code to create the mask:
uint64_t bits = 6;
uint64_t mask = ((uint64_t)1 << bits) - 1;
// Results in 0b111111 (or 0x03F)
Thanks to @Benni who asked about using bits = 64. If you need the code to support this value as well, you can use:
uint64_t bits = 6;
uint64_t mask = (bits < 64)
? ((uint64_t)1 << bits) - 1
: (uint64_t)0 - 1;
For those who want to know what a mask is:
A mask is usually a name for a value that we use to manipulate other values using bitwise operations such as AND, OR, XOR, etc.
Short masks are usually represented in binary, where we can explicitly see all the bits that are set to 1.
Longer masks are usually represented in hexadecimal, which is easy to read once you get the hang of it.
You can read more about bitwise operations in C here.
I believe your first example should be 0x3f.
0x3f is hexadecimal notation for the number 63 which is 111111 in binary, so that last 6 bits (the least significant 6 bits) are set to 1.
The following little C program will calculate the correct mask:
#include <stdarg.h>
#include <stdio.h>
int mask_for_n_bits(int n)
{
int mask = 0;
for (int i = 0; i < n; ++i)
mask |= 1 << i;
return mask;
}
int main (int argc, char const *argv[])
{
printf("6: 0x%x\n17: 0x%x\n", mask_for_n_bits(6), mask_for_n_bits(17));
return 0;
}
0x2F is 0010 1111 in binary - this should be 0x3f, which is 0011 1111 in binary and which has the 6 least-significant bits set.
Similarly, 0x1FFFF is 0001 1111 1111 1111 1111 in binary, which has the 17 least-significant bits set.
A "mask" is a value that is intended to be combined with another value using a bitwise operator like &, | or ^ to individually set, unset, flip or leave unchanged the bits in that other value.
For example, if you combine the mask 0x3F with some value n using the & operator, the result will have zeroes in all but the 6 least significant bits, and those 6 bits will be copied unchanged from the value n.
In the case of an & mask, a binary 0 in the mask means "unconditionally set the result bit to 0" and a 1 means "set the result bit to the input value bit". For an | mask, a 0 in the mask sets the result bit to the input bit and a 1 unconditionally sets the result bit to 1, and for an ^ mask, a 0 sets the result bit to the input bit and a 1 sets the result bit to the complement of the input bit.
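The three kinds of mask described above can be illustrated with a few one-line helpers. The names are hypothetical and exist only to label which operator does what.

```c
/* & mask: keep the bits where the mask is 1, clear the rest. */
unsigned keep_bits(unsigned value, unsigned mask)
{
    return value & mask;
}

/* | mask: force the bits where the mask is 1 to 1, leave the rest. */
unsigned set_bits(unsigned value, unsigned mask)
{
    return value | mask;
}

/* ^ mask: invert the bits where the mask is 1, leave the rest. */
unsigned flip_bits(unsigned value, unsigned mask)
{
    return value ^ mask;
}
```

For instance, keep_bits(0xABCD, 0x3F) is 0x0D, set_bits(0xA0, 0x101) is 0x1A1, and flip_bits(0x1A1, 0x101) is 0xA0 again.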

Resources