How to convert from sign-magnitude to two's complement

How would I convert from sign-magnitude to two's complement? I don't know where to start. Any help would be appreciated.
I can only use the following operations: !, ~, |, &, ^, +, >>, <<.
/*
* sm2tc - Convert from sign-magnitude to two's complement
* where the MSB is the sign bit
* Example: sm2tc(0x80000005) = -5.
*
*/
int sm2tc(int x) {
return 2;
}

You can convert sign-magnitude to two's complement by subtracting the number from 0x80000000 if the number is negative. This works for a 32-bit integer on a machine that uses two's complement to represent negative values. Applied to a positive value, however, the same subtraction gives the wrong result, so it must only be used when the sign bit is set. A right shift of a negative two's complement number shifts in ones (on common compilers; strictly, this is implementation-defined). We can use this to build a mask that selects between the original value and the result of converting a negative sign-magnitude value to a negative two's complement value.
int sm2tc(int x) {
int m = x >> 31;
return (~m & x) | (((x & 0x80000000) - x) & m);
}
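The version above relies on >> of a negative int shifting in ones, which is implementation-defined, and it uses -, which is not on the allowed list. A sketch of the same mask idea that avoids both, assuming 32-bit values held in a uint32_t (the name sm2tc_u32 is hypothetical):

```c
#include <stdint.h>

/* Hypothetical sm2tc_u32: sign-magnitude -> two's complement bit pattern.
   Uses only ~ | & + >> so the shift is a well-defined logical shift.
   Assumes 32-bit values. */
uint32_t sm2tc_u32(uint32_t x) {
    uint32_t m = ~(x >> 31) + 1;          /* 0xFFFFFFFF if sign bit set, else 0 */
    uint32_t neg = 0x80000000u + ~x + 1;  /* 0x80000000 - x, computed mod 2^32 */
    return (~m & x) | (m & neg);          /* select original or converted value */
}
```

For example, 0x80000005 becomes 0xFFFFFFFB, the 32-bit pattern of -5, while positive inputs pass through unchanged.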

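There you go. This version uses the conditional operator, which is not in the allowed set, but it shows the logic plainly:
#include <stdint.h>
uint32_t sm2tc(uint32_t x)
{
/* negative: negate the magnitude (invert and add 1); positive: unchanged */
return (x & 0x80000000)
? ((~(x & 0x7fffffff)) + (uint32_t)1)
: x;
}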

Interestingly, the conversion between the two formats is symmetric, so you need only one conversion function to go from either format to the other. Here is the complete conversion without relying on any conditionals:
uint32_t convertSignRepresentation(uint32_t in) {
uint32_t mask = -(in >> 31); /* all ones if the sign bit is set, else 0 */
return (mask & (0x80000000 - in)) | (~mask & in);
}
The technique I used here is essentially replacing the conditional operator in
uint32_t convertSignRepresentation(uint32_t in) {
return (in >> 31) ? 0x80000000-in : in;
}
by a bitmask of only zeros or ones to select the correct resulting value.
Please note that 0x80000000 (either the smallest possible value or negative zero) is mapped to positive zero and cannot be recovered. So convertSignRepresentation(convertSignRepresentation(0x80000000)) yields zero instead of 0x80000000. This might give nasty surprises. In theory it could be avoided by mapping 0x80000000 onto itself, but that is not as easy to do and has even nastier surprises...
Edit:
A comment pointed out that subtraction was not on the list of allowed operators, even though addition is. I don't know whether this was deliberate or a mistake. Anyway, the operation -x can be written as ~x + 1. With this, the code becomes:
uint32_t convertSignRepresentation(uint32_t in) {
uint32_t mask = ~(in >> 31) + 1;
return (mask & (0x80000001 + ~in)) | (~mask & in);
}
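The symmetry claim and the 0x80000000 caveat are easy to check by round-tripping a few values. A minimal sketch that reproduces the subtraction-free version with explicit parentheses so it stands alone (assumes 32-bit uint32_t):

```c
#include <stdint.h>

/* Subtraction-free conversion; it is its own inverse except for 0x80000000. */
uint32_t convertSignRepresentation(uint32_t in) {
    uint32_t mask = ~(in >> 31) + 1;                     /* all ones if MSB set */
    return (mask & (0x80000001u + ~in)) | (~mask & in);  /* 0x80000000 - in, or in */
}
```

For example, 0xFFFFFFFB (-5 in two's complement) maps to 0x80000005 and back again, while 0x80000000 maps to 0 and stays there.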

Signed numbers here are 8-bit quantities, with the least significant 7 bits representing the magnitude and the most significant bit indicating the sign. A 0 in this bit indicates the number is positive, and a 1 indicates it is negative. There is no magnitude information in this 8th bit, just the sign.
To convert a negative signed number to two's complement, first set the 8th bit to zero. Then invert all 8 bits. Finally add 1. An example follows:
Signed Number:
10001111
set the 8th bit to zero: (use & operator)
00001111
invert all 8 bits: (use bitwise-complement operator)
11110000
finally, add 1; resulting in the final two's complement number: (use + operator)
11110001
If the 8th bit is 0, indicating that the signed number is positive, the number requires no conversion. Its two's complement representation is the same as its sign-magnitude representation.
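The three steps above translate directly to C. A sketch assuming an 8-bit input held in a uint8_t (sm8_to_tc is a hypothetical name); the final conversion to int8_t relies on the usual two's-complement behaviour when converting an out-of-range unsigned value:

```c
#include <stdint.h>

/* Hypothetical sm8_to_tc: 8-bit sign-magnitude -> two's complement. */
int8_t sm8_to_tc(uint8_t sm) {
    if (!(sm & 0x80))                        /* sign bit clear: already correct */
        return (int8_t)sm;
    uint8_t mag = sm & 0x7F;                 /* step 1: set the 8th bit to zero */
    uint8_t inverted = (uint8_t)~mag;        /* step 2: invert all 8 bits */
    return (int8_t)(uint8_t)(inverted + 1);  /* step 3: add 1 (kept to 8 bits) */
}
```

sm8_to_tc(0x8F) returns -15, the value of the 11110001 result in the worked example.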

To convert from Sign Magnitude x to Two's Complement y:
1) On a two's complement machine.
2) Use only !,~,|,&,^,+,>>,<<
3) Does not use ?:, -, *, /
4) Does not assume 4-byte int
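5) Works with all sign-magnitude values, including +0 and -0
#include <limits.h>
int sm2tc(int x) {
int sign = x & INT_MIN; /* nonzero iff the sign bit is set */
int negmask = UINT_MAX + !sign; /* all ones if negative; UINT_MAX + 1 wraps to 0 if positive */
/* ~(x & INT_MAX) + 1 negates the magnitude without signed overflow, even for -0 */
return (x & ~negmask) | (negmask & (~(x & INT_MAX) + 1));
}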

Related

How can I convert this number representation to a float?

I read this 16-bit value from a temperature sensor (type MCP9808)
Ignoring the first three MSBs, what's an easy way to convert the other bits to a float?
I managed to convert the values 2^7 through 2^0 to an integer with some bit-shifting:
uint16_t rawBits = readSensor();
int16_t value = (rawBits << 3) / 128;
However I can't think of an easy way to also include the bits with an exponent smaller than 0, except for manually checking if they're set and then adding 1/2, 1/4, 1/8 and 1/16 to the result respectively.

Something like this seems pretty reasonable. Take the number portion, divide by 16, and fix the sign.
#include <stdbool.h>
#include <stdint.h>
float tempSensor(uint16_t value) {
bool negative = (value & 0x1000);
return (negative ? -1 : 1) * (value & 0x0FFF) / 16.0f;
}

float convert(unsigned char msb, unsigned char lsb)
{
return ((lsb | ((msb & 0x0f) << 8)) * ((msb & 0x10) ? -1 : 1)) / 16.0f;
}
or
float convert(uint16_t val)
{
return (((val & 0x1000) ? -1 : 1) * (uint16_t)(val << 4)) / 256.0f; /* cast drops bits 15..12 of val */
}
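The answers above treat bit 12 as a plain sign bit in front of a 12-bit magnitude. My understanding of the MCP9808 datasheet is that bits 12..0 instead form a 13-bit two's-complement value in units of 1/16 degree C (please check this against your datasheet revision). Under that assumption, negative readings need sign extension rather than a sign flip. A minimal sketch (mcp9808_to_celsius is a hypothetical name):

```c
#include <stdint.h>

/* Hypothetical decoder, ASSUMING bits 12..0 hold a 13-bit two's-complement
   temperature in units of 1/16 degree C; bits 15..13 (flags) are ignored. */
float mcp9808_to_celsius(uint16_t raw) {
    int32_t v = raw & 0x0FFF;   /* low 12 bits */
    if (raw & 0x1000)           /* bit 12 set: value is negative */
        v -= 0x1000;            /* sign-extend the 13-bit value */
    return (float)v / 16.0f;
}
```

Under this reading, 0x1FF0 decodes to -1.0, whereas a sign-magnitude interpretation would give -255.0.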

If performance isn't a super big deal, I would go for something less clever and more explicit, along the lines of:
#include <stdbool.h>
#include <stdint.h>
bool is_bit_set(uint16_t value, uint16_t bit) {
uint16_t mask = 1 << bit;
return (value & mask) == mask;
}
float parse_temperature(uint16_t raw_reading) {
if (is_bit_set(raw_reading, 15)) { /* temp is above Tcrit. Do something about it. */ }
if (is_bit_set(raw_reading, 14)) { /* temp is above Tupper. Do something about it. */ }
if (is_bit_set(raw_reading, 13)) { /* temp is above Tlower. Do something about it. */ }
uint16_t whole_degrees = (raw_reading & 0x0FF0) >> 4;
float magnitude = (float) whole_degrees;
if (is_bit_set(raw_reading, 0)) magnitude += 1.0f/16.0f;
if (is_bit_set(raw_reading, 1)) magnitude += 1.0f/8.0f;
if (is_bit_set(raw_reading, 2)) magnitude += 1.0f/4.0f;
if (is_bit_set(raw_reading, 3)) magnitude += 1.0f/2.0f;
bool is_negative = is_bit_set(raw_reading, 12);
// TODO: What do the 3 most significant bits do?
return magnitude * (is_negative ? -1.0 : 1.0);
}
Honestly, this is a lot of simple constant math; I'd be surprised if the compiler weren't able to optimize it heavily. That would need confirmation, of course.

If your C compiler has a clz builtin or equivalent, it could be useful for avoiding a multiplication.
In your case, since the provided temp value looks like a mantissa, and if your C compiler uses the IEEE-754 float representation, translating the temp value into its IEEE-754 equivalent may be the most efficient way:
Update: compacted the code a little and clarified the explanation of the mantissa.
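#include <stdint.h>
#include <string.h>
float convert(uint16_t val) {
uint16_t mantissa = (uint16_t)(val << 4);
if (mantissa == 0) return 0.0f;
unsigned char e = (unsigned char)(__builtin_clz(mantissa) - 16);
/* shift as uint32_t: shifting a 1 into bit 31 of a signed int is undefined */
uint32_t r = ((uint32_t)(val & 0x1000) << 19) | ((uint32_t)(0x86 - e) << 23) | (((uint32_t)mantissa << (e + 8)) & 0x07FFFFF);
float f;
memcpy(&f, &r, sizeof f); /* avoids the strict-aliasing violation of *(float *)&r */
return f;
}
or
#include <stdint.h>
#include <string.h>
float convert(unsigned char msb, unsigned char lsb) {
uint16_t mantissa = (uint16_t)((msb << 8 | lsb) << 4);
if (mantissa == 0) return 0.0f;
unsigned char e = (unsigned char)(__builtin_clz(mantissa) - 16);
uint32_t r = ((uint32_t)(msb & 0x10) << 27) | ((uint32_t)(0x86 - e) << 23) | (((uint32_t)mantissa << (e + 8)) & 0x07FFFFF);
float f;
memcpy(&f, &r, sizeof f);
return f;
}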
Explanation:
We use the fact that the temp value is essentially a mantissa: 12 magnitude bits in units of 1/16, so its magnitude is below 256.
The largest power of two involved is therefore 2^7 = 128, and the smallest nonzero magnitude is 1/16 = 2^-4.
We use the clz builtin to get the "order" of the first bit set in the mantissa;
this way we can define the exponent as the theoretical max (2^7 => 128) minus this "order".
We also use this order to left-shift the temp value to get the IEEE-754 mantissa,
plus one extra left shift to remove the implied leading '1' of the IEEE-754 significand.
Thus we build a 32-bit binary IEEE-754 representation from the temp value with:
First, the sign bit, placed in the 32nd bit (bit 31) of our binary IEEE-754 representation.
Then, the computed exponent: the theoretical max 7 (2^7 => 128) plus the IEEE-754 bias (127) minus the actual "order" of the temp value.
The "order" of the temp value is deduced from the number of leading '0' bits of its 12-bit representation in the variable mantissa, via the clz builtin.
Beware that here we assume the clz builtin expects a 32-bit value as its parameter, which is why we subtract 16. This code may need adapting if your clz expects anything else.
The number of leading '0' bits can range from 0 (magnitude of 128 or more) to 11, since we directly return 0.0 for a zero temp value.
Since the bit following those leading zeros is a 1, each leading zero corresponds to one power-of-2 reduction from the theoretical max of 7.
Thus, with 7 + 127 => 0x86, we can simply subtract the "order" (the number of leading '0' bits) from it to get the biased IEEE-754 exponent.
If the "order" is greater than 7, we still get the negative (unbiased) exponent required for values below 1.
We then place this 8-bit exponent in the 24th to 31st bits (bits 23 to 30, counting from 0) of our binary IEEE-754 representation.
The temp value is already essentially a mantissa: we remove its leading '0' bits and its first set bit by shifting it left by (e + 1), and shift left by 7 more bits to place the mantissa within the 32 bits (e + 7 + 1 => e + 8). We then keep only the desired 23 bits (AND with 0x7FFFFF).
Its first set bit must be removed because it is the implied '1' of the IEEE-754 significand (the power of 2 given by the exponent).
We then have the IEEE-754 mantissa, placed in the 1st to 23rd bits (bits 0 to 22) of our binary IEEE-754 representation.
The 4 initial trailing zeros of our 16-bit temp value and the zeros shifted in from the right do not change the effective IEEE-754 value.
Since we start from a 32-bit value and OR (|) together the 32-bit sign, exponent, and mantissa fields, we end up with the final IEEE-754 representation.
We can then return this binary representation as an IEEE-754 C float value.
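As a worked check of these steps (my own arithmetic, following the first convert() above), take val = 0x0190, which is 400/16 = 25.0 degrees:

$$
\begin{aligned}
\text{mantissa} &= \texttt{0x1900} = 0001\,1001\,0000\,0000_2, \qquad e = 3 \\
\text{exponent field} &= \texttt{0x86} - 3 = \texttt{0x83} = 131 \;\Rightarrow\; 2^{131-127} = 2^{4} \\
\text{fraction} &= (\texttt{0x1900} \ll 11) \,\&\, \texttt{0x7FFFFF} = \texttt{0xC80000} \,\&\, \texttt{0x7FFFFF} = \texttt{0x480000} \\
1 + \frac{\texttt{0x480000}}{2^{23}} &= 1 + 0.5625 = 1.5625 \\
r &= (\texttt{0x83} \ll 23) \mid \texttt{0x480000} = \texttt{0x41C80000} = 1.5625 \times 2^{4} = 25.0
\end{aligned}
$$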
Due to the required clz and the IEEE-754 translation, this way is less portable. The main interest is to avoid MUL operations in the resulting machine code for performance on arch with a "poor" FPU.
P.S.: Explanation of the casts. I've added explicit casts to let the C compiler know that we are deliberately discarding some bits:
uint16_t mantissa = (uint16_t)(val <<4); : The cast here tells the compiler that we know we'll "lose" the four top bits, as that is the goal here. We discard the first four bits of the temp value for the mantissa.
(unsigned char)(__builtin_clz(mantissa) - 16) : We tell the C compiler that we will only consider an 8-bit range for the builtin's return value, as we know our mantissa has only 12 significant bits, so the result ranges from 0 to 11 (zero is handled earlier). Thus we do not need the full int return.
uint32_t r = (uint32_t) ... : We tell the C compiler to not bother with the sign representation here as we build an IEEE-754 representation.

arithmetic right shift shifts in 0s when MSB is 1

As an exercise I have to write the following function:
multiply x by 2, saturating to Tmin / Tmax if overflow, using only bit-wise and bit-shift operations.
Now this is my code:
// xor MSB and 2nd MSB. if different, we have an overflow and SHOULD get 0xFFFFFFFF. otherwise we get 0.
int overflowmask = ((x & 0x80000000) ^ ((x & 0x40000000)<<1)) >>31;
// ^ this arithmetic bit shift seems to be wrong
// this gets you Tmin if x < 0 or Tmax if x >= 0
int overflowreplace = ((x>>31)^0x7FFFFFFF);
// if no overflow, return x*2, otherwise overflowreplace
return ((x<<1) & ~overflowmask)|(overflowreplace & overflowmask);
Now, when overflowmask should be 0xFFFFFFFF, it is 1 instead, which means that the arithmetic shift >>31 shifted in 0s instead of 1s (the MSB got XORed to 1, then shifted to the bottom).
x is signed and the MSB is 1, so according to C99 an arithmetic right shift should fill in 1s. What am I missing?
EDIT: I just guessed that this code isn't correct. To detect an overflow it suffices for the 2nd MSB to be 1.
However, I still wonder why the bit shift filled in 0s.
EDIT:
Example: x = 0xA0000000
x & 0x80000000 = 0x80000000
x & 0x40000000 = 0
XOR => 0x80000000
>>31 => 0x00000001
EDIT:
Solution:
int msb = x & 0x80000000;
int msb2 = (x & 0x40000000) <<1;
int overflowmask = (msb2 | (msb^msb2)) >>31;
int overflowreplace = (x >>31) ^ 0x7FFFFFFF;
return ((x<<1) & ~overflowmask) | (overflowreplace & overflowmask);

Even on twos-complement machines, the behaviour of right-shift (>>) on negative operands is implementation-defined.
A safer approach is to work with unsigned types and explicitly OR-in the MSB.
While you're at it, you probably also want to use fixed-width types (e.g. uint32_t) rather than failing on platforms that don't meet your expectations.
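Following that advice, the whole exercise can be done on uint32_t without needing an arithmetic shift at all: the all-ones mask can be built as ~bit + 1. A sketch under those assumptions (sat_double is a hypothetical name; it returns the bit pattern of the saturated result):

```c
#include <stdint.h>

/* Hypothetical sat_double: saturating x*2 on the bit pattern of a 32-bit
   two's-complement int held in uint32_t, using only ~ ^ & | + >> <<. */
uint32_t sat_double(uint32_t x) {
    uint32_t msb  = x >> 31;            /* sign of x: 0 or 1 */
    uint32_t msb2 = (x >> 30) & 1u;     /* sign that x*2 would have */
    uint32_t ovf  = ~(msb ^ msb2) + 1;  /* all ones iff the two bits differ */
    uint32_t sat  = 0x7FFFFFFFu + msb;  /* Tmax if x >= 0, Tmin if x < 0 */
    return ((x << 1) & ~ovf) | (sat & ovf);
}
```

For the example 0xA0000000, msb is 1 and msb2 is 0, so the result saturates to 0x80000000 (Tmin). Overflow is flagged only when the top two bits differ: -1 (0xFFFFFFFF) has both bits set and doubles to -2 (0xFFFFFFFE) without saturating.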

On a platform with 32-bit int, 0x80000000 does not fit in int, so it is treated as an unsigned number, which causes everything to be converted to unsigned. You can do this instead:
// xor MSB and 2nd MSB. if different, we have an overflow and SHOULD get 0xFFFFFFFF. otherwise we get 0.
// INT_MIN (from <limits.h>) is a signed int with only the MSB set
int overflowmask = ((x & INT_MIN) ^ ((x & 0x40000000)<<1)) >>31;
// this gets you Tmin if x < 0 or Tmax if x >= 0
int overflowreplace = ((x>>31)^0x7FFFFFFF);
// if no overflow, return x*2, otherwise overflowreplace
return ((x<<1) & ~overflowmask)|(overflowreplace & overflowmask);
Or write the constants as negative decimals.
Or, as I would do, store all the constants in const int variables so that they are guaranteed to be signed.

Never use bitwise operators on signed types. For a right shift of a negative signed integer, it is implementation-defined (that is, up to the compiler) whether you get an arithmetic or a logical shift.
That's only one of your problems, though. On a platform with 32-bit int, the hex integer constant 0x80000000 does not fit in int, so it is actually of type unsigned int. This accidentally turns your whole expression (x & 0x80000000) ^ ... into an unsigned type because of the implicit conversion rules known as "the usual arithmetic conversions". The 0x40000000 expression, by contrast, is a signed int and works as (the specific compiler) expected.
Solution:
All variables involved must be of type uint32_t.
All hex constants involved must be u suffixed.
To get an arithmetic shift portably, you would have to do something like
(x >> n) | (0xFFFFFFFFu << (32-n)) when the MSB of x is set (for 0 < n < 32), or some similar hack.
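That hack only applies when the sign bit is set, and 32-n becomes an invalid shift count of 32 when n is 0. A sketch that handles both cases, assuming a 32-bit pattern in a uint32_t and 0 <= n <= 31 (asr32 is a hypothetical name):

```c
#include <stdint.h>

/* Hypothetical asr32: arithmetic right shift emulated on unsigned values. */
uint32_t asr32(uint32_t x, unsigned n) {
    uint32_t sign = ~(x >> 31) + 1;  /* 0xFFFFFFFF if MSB set, else 0 */
    /* shifting in two steps avoids an undefined shift by 32 when n == 0 */
    return (x >> n) | ((sign << (31 - n)) << 1);
}
```

When the MSB is clear, sign is 0 and this is just a logical shift.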

How to find the nth bit of an integer in C

I've got an assignment where I need to convert from an 8 bit sign magnitude number to two's complement and then add those two numbers. I've got a relatively good idea as to how to do this, however I can't work out how to find the eighth bit of an integer such that I can tell what sign the number has.
The overall idea is that if the sign bit is 0, I just return the number as it is, since it is already in two's complement; if it is 1, I want to set it to 0, invert all bits with the ~ operator, and then add 1.
Thanks in advance

You can check whether the high bit is set by creating a mask that has just that bit set and using a bitwise AND to see if the result is non-zero.
Once you know the high bit is set, you can convert to twos complement by flipping all bits and adding one.
uint8_t x = ...; /* some sign-magnitude value */
if (x & (1 << 7)) {
printf("sign bit set\n");
x = (uint8_t)(~(x & 0x7F) + 1); /* clear the sign bit, invert, add 1 */
printf("converted value: %02X\n", x);
}
Then you can add this number to any other normally.

I'll assume that your computer/compiler uses two's complement (almost certainly the case) and that you want the result to be in two's complement.
Use a uint8_t to hold the sign-magnitude number.
To check if a bit is set, use the bitwise AND operator &, together with a bit mask corresponding to the msb. To get a bit mask corresponding to bit n, left shift the value 1 n times. In C code:
#define SIGN (1 << 7)
uint8_t sm = ...;
if(sm & SIGN) // if non-zero, then the SIGN bit is set
{
}
else // it was zero, the SIGN bit is not set
{
}
To do the actual conversion, there are several ways. I would simply mask out and copy the relevant parts of the number, again with bitwise AND:
#define MAGNITUDE 0x7F
int8_t magnitude = sm & MAGNITUDE; // variable magnitude is two's compl.
EDIT complete solution (since someone already posted one):
#define SIGN (1 << 7)
#define MAGNITUDE 0x7F
uint8_t sm = ...;
int8_t twos_compl = sm & MAGNITUDE;
if(sm & SIGN) // if non-zero, then the SIGN bit is set
{
twos_compl = -twos_compl;
}
int8_t x = ...; // some other number in two's complement
int16_t result = twos_compl + x;
As a side note, be very careful when mixing the ~ operator with small integer types, because it performs an implicit integer promotion. For example, with uint8_t x = 1, ~x gives you 0xFFFFFFFE (on a system with 32-bit int) and not 0xFE as you might expect.
For the above task, there is no need to use ~ at all.
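The promotion pitfall is easy to see in isolation. A small sketch (both function names are hypothetical): ~ on a uint8_t operates on the promoted int, so the result is -2, i.e. 0xFFFFFFFE when int is 32 bits, and only a cast back to uint8_t yields 0xFE:

```c
#include <stdint.h>

/* ~ applied to a uint8_t first promotes it to int, so the result has int width. */
int promoted_not(uint8_t v) { return ~v; }

/* Casting back to uint8_t keeps only the low 8 bits. */
uint8_t narrowed_not(uint8_t v) { return (uint8_t)~v; }
```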

Convert two's complement to sign-magnitude

I need to convert from two's complement to sign-magnitude in C using only the operators
! ~ & ^ | + << >>
My approach is to find sign:
int sign = !(!(a>>31));
Basically, if sign == 1, I want to flip the number and add 1; otherwise I just want to display the number.
The thing is I can't use any loops, if statements etc.
This is what I'm working on:
int s_M = ((((a+1)>>31)^sign)+1)&sign;
any suggestions?

From http://graphics.stanford.edu/~seander/bithacks.html#IntegerAbs
int const mask = v >> 31;
unsigned int r = (v + mask) ^ mask;
This gives the absolute value (magnitude). If you wish to add the sign bit back, simply mask and OR in the 32nd bit:
unsigned int s_M = r | (v & 0x80000000);
Or if you're looking for a one liner:
unsigned int s_M = ((v + (v >> 31)) ^ (v >> 31)) | (v & 0x80000000);

When you're converting from two's complement, you should subtract 1, not add it: for a negative a, the magnitude is ~(a - 1), which equals -a.

I'm not entirely sure what the output should be, but to obtain the magnitude you can do something like this:
int m = (a^(a>>31)) + sign;
Basically, shifting a negative number 31 bits to the right (with the usual arithmetic shift) makes it all 1s, or 0xffffffff, which you can then XOR with the input number to make it non-negative. As you correctly noted, sign then needs to be added to get the correct result in that case.
If the input number was positive to begin with, the shift results in a zero and so the xor does nothing. Adding sign in that case also doesn't do anything, so it results in the input number.
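Combining the XOR-with-mask idea above with the sign bit, a sketch of the full conversion using only the allowed operators on uint32_t (tc2sm is a hypothetical name; assumes 32-bit values). INT_MIN has no sign-magnitude equivalent; this sketch maps it to 0x80000000, which reads as -0:

```c
#include <stdint.h>

/* Hypothetical tc2sm: two's complement -> sign-magnitude, ops ! ~ & ^ | + << >> only. */
uint32_t tc2sm(uint32_t a) {
    uint32_t sign = a >> 31;            /* 1 if negative, else 0 */
    uint32_t mask = ~sign + 1;          /* 0xFFFFFFFF if negative, else 0 */
    uint32_t mag  = (a ^ mask) + sign;  /* ~a + 1 when negative, a otherwise */
    return mag | (sign << 31);          /* put the sign bit back */
}
```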

To get the last (most significant) bit, you could use a mask operation:
int last_bit = num & 0x80000000; /* num is a 32-bit integer */
The result is either 0 or 0x80000000.
If it is 0, just display the given number; otherwise, perform the following operations to represent it in sign-magnitude:
1) Subtract 1 from the number
2) Perform the ones' complement on the result (that is, bitwise negation ~)
3) Set the last (most significant) bit of the resulting number
I mean (~(num - 1)) | 0x80000000
Since you're not allowed to use the - operator, compute -1 as the two's complement of 1 (~1 + 1) and add it to num.
To put it simply in one line:
num & 0x80000000 ? printf("%d", (~(num + ((~1) + 1))) | 0x80000000) : printf("%d", num);

Multiply with negative integer just by shifting

I'm trying to find a way to multiply an integer value with negative value just with bit shifting.
Usually I do this by shifting with the power of 2 which is closest to my factor and just adding / subtracting the rest, e.g. x * 7 = ((x << 3) - x)
Let's say I'd want to calculate x * -112. The only way I can imagine is -((x << 7) - (x << 4)), i.e. calculating x * 112 and negating it afterwards.
Is there a "prettier" way to do this?

Get the compiler to do it, then check the produced assembly.

The negative of a positive number in two's complement is obtained by inverting all the bits and then adding 1 to the result. For example, to get -4 from 4 you would do:
4 = 000...0100 in binary. ~4 = 111...1011. -4 = 111...1100.
The same operation reverses the sign of a negative number.
So you could do this:
(~((x << 7) - (x << 4))) + 1.
Not necessarily prettier, but faster if we consider bitwise operations faster than arithmetic operations (especially multiplication) and ignore compiler optimizations.
Not that I'm saying you should do this, because you shouldn't. It's good to know about it though.
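Since -112 = 16 - 128, another arrangement (my own, not from the answers above) avoids the separate negation by swapping the operands of the subtraction. The sketch does the shifts on uint32_t because left-shifting a negative signed int is undefined behaviour in C; converting the result back to int32_t assumes the usual two's-complement wrap-around (mul_neg112 is a hypothetical name):

```c
#include <stdint.h>

/* Hypothetical mul_neg112: x * -112 as (x << 4) - (x << 7), i.e. 16x - 128x. */
int32_t mul_neg112(int32_t x) {
    uint32_t u = (uint32_t)x;                /* well-defined shifts and wrap-around */
    return (int32_t)((u << 4) - (u << 7));
}
```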

Computers internally represent negative integers in two's complement form. One of the nice properties of two's complement arithmetic is that multiplying by negative numbers works just like multiplying by positive numbers. Hence, find the two's complement and use your normal approach.
Here's a simple example. For ease of exposition, I'm going to use 8-bit integers and multiply by -15.
15 in hex is 0x0f. The two's complement of 0x0f is 0xf1.
Since these are 8-bit integers, all arithmetic is mod 0x100. In particular, note that 0x100 * anything = 0.
x * 0xf1
= x * (0x100 - 0x10 + 0x01)
= -(x * 0x10) + x
= -(x << 4) + x
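The derivation can be checked exhaustively for 8-bit values. A sketch (mul_neg15_u8 is a hypothetical name) that computes x * -15 as x - (x << 4), with the cast to uint8_t providing the mod 0x100 reduction:

```c
#include <stdint.h>

/* Hypothetical mul_neg15_u8: x * 0xf1 (i.e. x * -15) mod 0x100 via one shift. */
uint8_t mul_neg15_u8(uint8_t x) {
    return (uint8_t)(x - (x << 4));  /* -(x << 4) + x, reduced mod 0x100 */
}
```

For example, mul_neg15_u8(3) is 211, the 8-bit pattern of -45.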