Return the mantissa as bits - c

I'm having a bit of trouble with my C code. I am trying to extract the sign, the exponent and the mantissa. So far it has worked out for the sign and exponent. The problem begins with the mantissa.
For example, let's take -5.0. 1 bit for sign, 8 bits for the exponent and 23 bits for the mantissa.
This is the binary notation of -5.0:
1 10000001 01000000000000000000000
1 = sign
10000001 = exponent
01000000000000000000000 = mantissa.
I want to return the bits of the mantissa.
static int fpmanitssa(uint32_t number) {
    uint32_t numbTemp = 8388607;
    number = number & numbTemp; //bitwise AND, number should be 2097152.
    int numbDiv;
    int leftover;
    char resultString[23];
    size_t idx = 0;
    do {
        numbDiv = number/2;
        leftover = number % 2;
        resultString[idx++] = leftover;
        number = numbDiv;
    } while (number != 0);
    return resultString;
}
What I get as a result are negative numbers. I don't know why it isn't working. Could you please help me find the problem?
regards
Hagi

1) Floating-point numbers have significands (with linear scale, also called the fraction portion), not mantissas (with logarithmic scales).
2) Your function is declared to return an int, but you attempt to return resultString, which is an array of char. An array of char is not an int and cannot be automatically converted to an int.
3) If you did return resultString, it would fail because the lifetime of resultString is only the time during function execution; it cannot be used after the function returns. If you do intend to return characters from the function, it should be declared so that the caller passes in an existing array of suitable size or the function dynamically allocates an array (as with malloc) and returns its address to the caller, who is responsible for eventually deallocating it (with free).
4) If you want to return an int with the bits of the encoded significand, and number contains the encoded floating-point number with the encoded significand in the low bits, then all you have to do is remove the high bits with an AND operation and return the low bits (a short sketch follows this list).
5) If the array of char is to form a printable numeral rather than numeric bits, then '0' should be added to each one, to convert it from the integer 0 or 1 to the character “0” or “1”. In this case, you may also want a null character at the end of resultString.
6) The bits are assigned to resultString in order from low bit to high bit, which may be the opposite order from desired if resultString is intended to be printed later. (The order may be fine if resultString will only be used for calculation, not printing.)
7) The above presume you want to return only the encoded significand, which is the explicit bits in the floating-point encoding. If you want to return the represented significand, including the implicit bit, then you need a 24th position, and it is 1 if the exponent is non-zero and 0 otherwise. (And presuming, of course, that the number is not a NaN or infinity.)
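Putting points 4 and 7 together, here is a minimal sketch of what such a function could look like (the helper name and the subnormal check are mine, not part of the original code):

#include <stdint.h>

/* Return the significand bits of an IEEE-754 binary32 encoding.
   The low 23 bits are the encoded fraction field (point 4); for a
   non-zero exponent the implicit leading 1 is added in bit 23
   (point 7). NaNs and infinities are not handled specially. */
static uint32_t significand_bits(uint32_t encoding)
{
    uint32_t fraction = encoding & 0x7FFFFF;      /* low 23 bits */
    uint32_t exponent = (encoding >> 23) & 0xFF;  /* biased exponent field */

    if (exponent == 0)
        return fraction;              /* zero or subnormal: no implicit 1 */
    return fraction | (1u << 23);     /* normal: include the implicit 1 */
}

For the encoding of -5.0 shown in the question, this returns 0xA00000, i.e. the 24-bit significand 1.01000...0.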

If you happen to be on uClibc or glibc, you can use ieee754.h and the mantissa field, like so
#include <ieee754.h>
static unsigned int fpmanitssa(float number)
{
    union ieee754_float x = {.f = number};
    return x.ieee.mantissa;
}
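For example, with the -5.0 from the question, fpmanitssa(-5.0f) returns 2097152 (0x200000), which is exactly the fraction field the question computes.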

For one thing, your function declaration specifies that you are returning an int, but then you return resultString, which is a char*. I think you may be relying on automatic type conversion which is not part of C. To get resultString to even look like what you want, you need to add 0x30 to leftover (0x30 is '0' and 0x31 is '1').
Then you would need to null terminate the string and convert it back to an int with atoi().
The following probably doesn't work exactly as you intend, but includes changes to handle the ascii/integer conversions:
static int fpmanitssa(uint32_t number) {
    uint32_t numbTemp = 8388607;
    number = number & numbTemp; //bitwise AND, number should be 2097152.
    int numbDiv;
    int leftover;
    char resultString[24] = {0};
    size_t idx = 0;
    do {
        numbDiv = number/2;
        leftover = number % 2;
        resultString[idx++] = leftover + 0x30;
        number = numbDiv;
    } while (number != 0);
    return atoi(resultString);
}
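(Note that atoi requires <stdlib.h>.) If the goal is actually a printable bit string rather than a decimal reinterpretation of it, here is a sketch along the lines of the first answer's points, using a caller-supplied buffer filled most-significant-bit first (the function name and signature are mine):

#include <stdint.h>

/* Write the 23 fraction bits of an IEEE-754 binary32 encoding into buf,
   most significant bit first, as a NUL-terminated string of '0'/'1'.
   buf must have room for at least 24 characters. */
static void fraction_to_string(uint32_t number, char buf[24])
{
    number &= 0x7FFFFF;                      /* keep the low 23 bits */
    for (int i = 0; i < 23; i++)
        buf[i] = (number & (1u << (22 - i))) ? '1' : '0';
    buf[23] = '\0';
}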

Related

Need help understanding code converting doubles to binary in C

Below is some code which I am using to understand how to get the binary representation of a double in C. However, there are some areas of the code which do not make sense to me, and I have tried using print statements to help me understand them, but to no avail.
#include <stdio.h>
#include <string.h>
#include <limits.h>

void double_to_bits(double val);

int main(void)
{
    unsigned idx;
    double vals[] = { 10 };
    for (idx = 0; idx < 4; idx++ ) {
        printf("\nvals[%u]= %+lf-->>", idx, vals[idx] );
        double_to_bits(vals[idx] );
    }
    printf("\n");
    return 0;
}

void double_to_bits(double val)
{
    unsigned idx;
    unsigned char arr[sizeof val];
    memcpy (arr, &val, sizeof val);
    printf("array is : %s\n", arr);
    for (idx=CHAR_BIT * sizeof val; idx-- ; ) {
        putc(
            ( arr[idx/CHAR_BIT] & (1u << (idx%CHAR_BIT) ) )
            ? '1'
            : '0'
            , stdout
        );
    }
}
What does arr[idx/CHAR_BIT] return? I understand, say, if idx = 63 then we get arr[7], but in print statements these seem to just give random integer values.
Also, why does the & operator between arr[idx/CHAR_BIT] and (1u << (idx%CHAR_BIT)) give 2 weird character symbols? How does the & operator work on these two values?
Thank you for your time and help.
This code defines vals to have one element:
double vals[] = { 10 };
This code uses vals as if it has four elements:
for (idx = 0; idx < 4; idx++ ) {
    printf("\nvals[%u]= %+lf-->>", idx, vals[idx] );
    double_to_bits(vals[idx] );
}
Because of that, the behavior of the program is not defined by the C standard, and different compilers will do different things. You have not shown us what output you get, so we have no way to analyze what your compiler did. Mine seems to figure that, since vals has only one element, it never needs to bother computing the value of idx in vals[idx]; it can always use vals[0]. So every iteration operates on the value 10. Your compiler may have loaded from memory outside the array and used different values.
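A minimal fix is to derive the loop bound from the array itself rather than hard-coding 4:

for (idx = 0; idx < sizeof vals / sizeof vals[0]; idx++) {
    printf("\nvals[%u]= %+lf-->>", idx, vals[idx]);
    double_to_bits(vals[idx]);
}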
What does arr[idx/CHAR_BIT] return?
CHAR_BIT is defined in <limits.h> to be the number of bits in a byte. Because of the definition unsigned char arr[sizeof val];, arr is an array with as many elements (each one byte) as there are bytes in a double. So, for example, if there are 8 bits in a byte and 8 bytes in a double, then arr has 8 8-bit elements, 64 bits total.
Using these example numbers, the for loop iterates idx through the numbers from 63 to 0. Then the code uses idx as a counter of the bits in arr. The expression idx/CHAR_BIT figures out which element of the array contains the bit numbered idx: Each array element has CHAR_BIT bits, bits 0-7 (when CHAR_BIT is 8) are in element 0, bits 8-15 are in element 1, bits 16-23 are in element 2, and so on. So dividing idx by CHAR_BIT and truncating to an integer (which is what the / operator does for integer operands) gives the index of the array element that has the bit with number idx.
Then arr[idx/CHAR_BIT] gets that element. Then we need to pick out the individual bit from that element. idx%CHAR_BIT takes the remainder of idx divided by CHAR_BIT, so it gives the number of a bit within a byte. Then 1u << (idx%CHAR_BIT) shifts 1 by that number. The result is number that, when expressed in binary, has a 1 in the bit with that number and a 0 in other bits.
Then the & operator ANDs that with the array element. If the array element has a 0 where the 1 has been shifted to, the result is zero. If the array element has a 1 where the 1 has been shifted to, the result is that bit (still in its position).
Then ? '1' : '0' uses the result of that AND to select either the character '1' (if the result is not zero) or the character '0' (if the result is zero). That character is passed to putc for printing.
I understand, say, if idx = 63 then we get arr[7], but in print statements these seem to just give random integer values.
When the array bounds problem described above is corrected and double_to_bits is passed a valid value, it should print the bits that encode the value of the double. In most C implementations, this will be 0100000000100100000000000000000000000000000000000000000000000000, which is the encoding in the IEEE-754 binary64 format, also called double precision.
Also, why does the & operator between arr[idx/CHAR_BIT] and (1u << (idx%CHAR_BIT)) give 2 weird character symbols?
You have not shown the weird character symbols or other output, so we have no output to interpret.
The statement printf("array is : %s\n", arr); prints arr as if it were a string containing printable characters, but it is used to contain “raw binary data,” so you should not expect that data to result in meaningful characters when printed. Remove that statement. Also, the fact that your program accessed vals out of bounds could cause other complications in output.

How is float to int type conversion done in C? [duplicate]

I was wondering if you could help explain the process on converting an integer to float, or a float to an integer. For my class, we are to do this using only bitwise operators, but I think a firm understanding on the casting from type to type will help me more in this stage.
From what I know so far, for int to float, you will have to convert the integer into binary, normalize the value of the integer by finding the significand, exponent, and fraction, and then output the value in float from there?
As for float to int, you will have to separate the value into the significand, exponent, and fraction, and then reverse the instructions above to get an int value?
I tried to follow the instructions from this question: Casting float to int (bitwise) in C.
But I was not really able to understand it.
Also, could someone explain why rounding will be necessary for values greater than 23 bits when converting int to float?
First, a paper you should consider reading, if you want to understand floating point foibles better: "What Every Computer Scientist Should Know About Floating Point Arithmetic," http://www.validlab.com/goldberg/paper.pdf
And now to some meat.
The following code is bare bones, and attempts to produce an IEEE-754 single-precision float from an unsigned int in the range 0 < value < 2^24. That's the format you're most likely to encounter on modern hardware, and it's the format you seem to reference in your original question.
IEEE-754 single-precision floats are divided into three fields: A single sign bit, 8 bits of exponent, and 23 bits of significand (sometimes called a mantissa). IEEE-754 uses a hidden 1 significand, meaning that the significand is actually 24 bits total. The bits are packed left to right, with the sign bit in bit 31, the exponent in bits 30 .. 23, and the significand in bits 22 .. 0. (The Wikipedia article linked below includes a diagram of this layout.)
The exponent has a bias of 127, meaning that the actual exponent associated with the floating point number is 127 less than the value stored in the exponent field. An exponent of 0 therefore would be encoded as 127.
(Note: The full Wikipedia article may be interesting to you. Ref: http://en.wikipedia.org/wiki/Single_precision_floating-point_format )
Therefore, the IEEE-754 number 0x40000000 is interpreted as follows:
Bit 31 = 0: Positive value
Bits 30 .. 23 = 0x80: Exponent = 128 - 127 = 1 (i.e. 2^1)
Bits 22 .. 0 are all 0: Significand = 1.00000000_00000000_0000000. (Note I restored the hidden 1).
So the value is 1.0 x 2^1 = 2.0.
To convert an unsigned int in the limited range given above, then, to something in IEEE-754 format, you might use a function like the one below. It takes the following steps:
Aligns the leading 1 of the integer to the position of the hidden 1 in the floating point representation.
While aligning the integer, records the total number of shifts made.
Masks away the hidden 1.
Using the number of shifts made, computes the exponent and appends it to the number.
Using reinterpret_cast, converts the resulting bit-pattern to a float. This part is an ugly hack, because it uses a type-punned pointer. You could also do this by abusing a union. Some platforms provide an intrinsic operation (such as _itof) to make this reinterpretation less ugly.
There are much faster ways to do this; this one is meant to be pedagogically useful, if not super efficient:
float uint_to_float(unsigned int significand)
{
    // Only support 0 < significand < 1 << 24.
    if (significand == 0 || significand >= 1 << 24)
        return -1.0;  // or abort(); or whatever you'd like here.

    int shifts = 0;

    // Align the leading 1 of the significand to the hidden-1
    // position. Count the number of shifts required.
    while ((significand & (1 << 23)) == 0)
    {
        significand <<= 1;
        shifts++;
    }

    // The number 1.0 has an exponent of 0, and would need to be
    // shifted left 23 times. The number 2.0, however, has an
    // exponent of 1 and needs to be shifted left only 22 times.
    // Therefore, the exponent should be (23 - shifts). IEEE-754
    // format requires a bias of 127, though, so the exponent field
    // is given by the following expression:
    unsigned int exponent = 127 + 23 - shifts;

    // Now merge significand and exponent. Be sure to strip away
    // the hidden 1 in the significand.
    unsigned int merged = (exponent << 23) | (significand & 0x7FFFFF);

    // Reinterpret as a float and return. This is an evil hack.
    return *reinterpret_cast< float* >( &merged );
}
You can make this process more efficient using functions that detect the leading 1 in a number. (These sometimes go by names like clz for "count leading zeros", or norm for "normalize".)
You can also extend this to signed numbers by recording the sign, taking the absolute value of the integer, performing the steps above, and then putting the sign into bit 31 of the number.
For integers >= 2^24, the entire integer does not fit into the significand field of the 32-bit float format. This is why you need to "round": You lose LSBs in order to make the value fit. Thus, multiple integers will end up mapping to the same floating point pattern. The exact mapping depends on the rounding mode (round toward -Inf, round toward +Inf, round toward zero, round toward nearest even). But the fact of the matter is you can't shove 24 bits into fewer than 24 bits without some loss.
You can see this in terms of the code above. It works by aligning the leading 1 to the hidden 1 position. If a value was >= 2^24, the code would need to shift right, not left, and that necessarily shifts LSBs away. Rounding modes just tell you how to handle the bits shifted away.
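To see the rounding concretely, here is a small check using an ordinary cast (not the bit-twiddling above):

#include <stdio.h>

int main(void)
{
    unsigned int a = (1u << 24) + 1;  /* 16777217: needs 25 significant bits */
    float f = (float)a;               /* rounds to the nearest representable float */
    printf("%u -> %.1f\n", a, f);     /* prints: 16777217 -> 16777216.0 */
    return 0;
}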
Have you checked the IEEE 754 floating-point representation?
In 32-bit normalized form, it has the (mantissa's) sign bit, an 8-bit exponent (excess-127) and a 23-bit mantissa written as a binary fraction with the leading "1." dropped (for normalized values it is always there), and the radix is 2, not 10. That is: the MSB of the stored fraction has value 1/2, the next bit 1/4 and so on.
Joe Z's answer is elegant, but the range of input values is highly limited. A 32-bit float can store all integer values from the following range:
[-2^24...+2^24] = [-16777216...+16777216]
and some other values outside this range.
The whole range would be covered by this:
float int2float(int value)
{
    // handles all values from [-2^24...2^24]
    // outside this range only some integers may be represented exactly
    // this method will use truncation 'rounding mode' during conversion

    // we can safely reinterpret it as 0.0
    if (value == 0) return 0.0;

    if (value == (1U<<31)) // ie -2^31
    {
        // -(-2^31) = -2^31 so we'll not be able to handle it below - use const
        // value = 0xCF000000;
        return (float)INT_MIN; // *((float*)&value); is undefined behaviour
    }

    int sign = 0;
    // handle negative values
    if (value < 0)
    {
        sign = 1U << 31;
        value = -value;
    }

    // although right shift of signed is undefined - all compilers (that I know) do
    // arithmetic shift (copies sign into MSB) is what I prefer here
    // hence using unsigned abs_value_copy for shift
    unsigned int abs_value_copy = value;

    // find leading one
    int bit_num = 31;
    int shift_count = 0;
    for(; bit_num >= 0; bit_num--) // include bit 0 so that value == 1 is normalized too
    {
        if (abs_value_copy & (1U<<bit_num))
        {
            if (bit_num >= 23)
            {
                // need to shift right
                shift_count = bit_num - 23;
                abs_value_copy >>= shift_count;
            }
            else
            {
                // need to shift left
                shift_count = 23 - bit_num;
                abs_value_copy <<= shift_count;
            }
            break;
        }
    }

    // exponent is biased by 127
    int exp = bit_num + 127;

    // clear leading 1 (bit #23) (it will implicitly be there but not stored)
    int coeff = abs_value_copy & ~(1<<23);

    // move exp to the right place
    exp <<= 23;

    union
    {
        int rint;
        float rfloat;
    } ret = { sign | exp | coeff };

    return ret.rfloat;
}
Of course there are other means to find the absolute value of an int (branchless). Similarly, counting leading zeros can also be done without a branch, so treat this example just as an example ;-).
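A quick way to sanity-check int2float (assuming the platform's float is IEEE-754 binary32) is to compare it against the compiler's own conversion:

#include <assert.h>
#include <limits.h>

float int2float(int value);   /* as defined above */

int main(void)
{
    int tests[] = { 0, 1, -1, 5, -5, 16777216, -16777216, INT_MIN };
    for (unsigned i = 0; i < sizeof tests / sizeof tests[0]; i++)
        assert(int2float(tests[i]) == (float)tests[i]);
    return 0;
}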

How to sign extend a 9-bit value when converting from an 8-bit value?

I'm implementing a relative branching function in my simple VM.
Basically, I'm given an 8-bit relative value. I then shift this left by 1 bit to make it a 9-bit value. So, for instance, if you were to say "branch +127" this would really mean 127 instructions, and thus would add 254 to the IP.
My current code looks like this:
uint8_t argument = 0xFF; //-1 or whatever
int16_t difference = argument << 1;
*ip += difference; //ip is a uint16_t
I don't believe difference will ever be detected as less than 0 with this, however. I'm rusty on how signed-to-unsigned conversion works. Beyond that, I'm not sure the difference would be correctly subtracted from IP in the case argument is, say, -1 or -2 or something.
Basically, I'm wanting something that would satisfy these "tests"
//case 1
argument = -5
difference -> -10
ip = 20 -> 10 //ip starts at 20, but becomes 10 after applying difference
//case 2
argument = 127 (must fit in a byte)
difference -> 254
ip = 20 -> 274
Hopefully that makes it a bit more clear.
Anyway, how would I do this cheaply? I saw one "solution" to a similar problem, but it involved division. I'm working with slow embedded processors (assumed to be without efficient ways to multiply and divide), so that's a pretty big thing I'd like to avoid.
To clarify: you worry that left shifting a negative 8 bit number will make it appear like a positive nine bit number? Just pad the top 9 bits with the sign bit of the initial number before left shift:
diff = 0xFF;
int16 diff16=(diff + (diff & 0x80)*0x01FE) << 1;
Now your diff16 is signed 2*diff
As was pointed out by Richard J Ross III, you can avoid the multiplication (if that's expensive on your platform) with a conditional branch:
int16 diff16 = (diff + ((diff & 0x80)?0xFF00:0))<<1;
If you are worried about things staying in range and such ("undefined behavior"), you can do
int16 diff16 = diff;
diff16 = (diff16 | ((diff16 & 0x80)?0x7F00:0))<<1;
At no point does this produce numbers that are going out of range.
The cleanest solution, though, seems to be "cast and shift":
diff16 = (signed char)diff; // recognizes and preserves the sign of diff
diff16 = (short int)((unsigned short)diff16)<<1; // left shift, preserving sign
This produces the expected result, because the compiler automatically takes care of the sign bit (so no need for the mask) in the first line; and in the second line, it does a left shift on an unsigned int (for which overflow is well defined per the standard); the final cast back to short int ensures that the number is correctly interpreted as negative. I believe that in this form the construct is never "undefined".
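Applied to the original VM case, here is a compact sketch (the function name is mine; the uint8_t-to-int8_t conversion is implementation-defined before C23, but behaves as expected on two's complement targets):

#include <stdint.h>

/* Sign-extend the 8-bit relative operand and double it
   (one instruction = 2 bytes). */
static int16_t branch_offset(uint8_t argument)
{
    int16_t diff16 = (int8_t)argument;   /* recognizes and preserves the sign */
    return (int16_t)(diff16 * 2);        /* -5 -> -10, 127 -> 254 */
}

With this, *ip += (uint16_t)branch_offset(argument) takes ip from 20 to 10 for argument -5 and from 20 to 274 for argument 127, matching the tests in the question.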
All of my quotes come from the C standard, section 6.3.1.3. Unsigned to signed is well defined when the value is within range of the signed type:
1 When a value with integer type is converted to another integer type
other than _Bool, if the value can be represented by the new type, it
is unchanged.
Signed to unsigned is well defined:
2 Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
Unsigned to signed, when the value lies out of range isn't too well defined:
3 Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined or an
implementation-defined signal is raised.
Unfortunately, your question lies in the realm of point 3. C doesn't guarantee any implicit mechanism to convert out-of-range values, so you'll need to explicitly provide one. The first step is to decide which representation you intend to use: Ones' complement, two's complement or sign and magnitude
The representation you use will affect the translation algorithm you use. In the example below, I'll use two's complement: If the sign bit is 1 and the value bits are all 0, this corresponds to your lowest value. Your lowest value is another choice you must make: In the case of two's complement, it'd make sense to use either of INT16_MIN (-32768) or INT8_MIN (-128). In the case of the other two, it'd make sense to use INT16_MIN - 1 or INT8_MIN - 1 due to the presence of negative zeros, which should probably be translated to be indistinguishable from regular zeros. In this example, I'll use INT8_MIN, since it makes sense that (uint8_t) -1 should translate to -1 as an int16_t.
Separate the sign bit from the value bits. The value should be the absolute value, except in the case of a two's complement minimum value when sign will be 1 and the value will be 0. Of course, the sign bit can be where-ever you like it to be, though it's conventional for it to rest at the far left hand side. Hence, shifting right 7 places obtains the conventional "sign" bit:
uint8_t sign = input >> 7;
uint8_t value = input & (UINT8_MAX >> 1);
int16_t result;
If the sign bit is 1, we'll call this a negative number and add to INT8_MIN to construct the sign so we don't end up in the same conundrum we started with, or worse: undefined behaviour (which is the fate of one of the other answers).
if (sign == 1) {
    result = INT8_MIN + value;
}
else {
    result = value;
}
This can be shortened to:
int16_t result = (input >> 7) ? INT8_MIN + (input & (UINT8_MAX >> 1)) : input;
... or, better yet:
int16_t result = input <= INT8_MAX ? input
: INT8_MIN + (int8_t)(input % (uint8_t) INT8_MIN);
The sign test now involves checking if it's in the positive range. If it is, the value remains unchanged. Otherwise, we use addition and modulo to produce the correct negative value. This is fairly consistent with the C standard's language above. It works well for two's complement, because int16_t and int8_t are guaranteed to use a two's complement representation internally. However, types like int aren't required to use a two's complement representation internally. When converting unsigned int to int for example, there needs to be another check, so that we're treating values less than or equal to INT_MAX as positive, and values greater than or equal to (unsigned int) INT_MIN as negative. Any other values need to be handled as errors; In this case I treat them as zeros.
/* Generate some random input */
srand(time(NULL));
unsigned int input = rand();
for (unsigned int x = UINT_MAX / ((unsigned int) RAND_MAX + 1); x > 1; x--) {
input *= (unsigned int) RAND_MAX + 1;
input += rand();
}
int result = /* Handle positives: */ input <= INT_MAX ? input
: /* Handle negatives: */ input >= (unsigned int) INT_MIN ? INT_MIN + (int)(input % (unsigned int) INT_MIN)
: /* Handle errors: */ 0;
If the offset is in the 2's complement representation, then
convert this
uint8_t argument = 0xFF; //-1
int16_t difference = argument << 1;
*ip += difference;
into this:
uint8_t argument = 0xFF; //-1
int8_t signed_argument;

signed_argument = argument; // this relies on implementation-defined
                            // conversion of unsigned to signed, usually it's
                            // just a bit-wise copy on 2's complement systems
// OR
// memcpy(&signed_argument, &argument, sizeof argument);

*ip += signed_argument + signed_argument;

Convert from binary to floating point

I'm doing some exercises for university Computer Science and one of them is about converting an array of 64 bits into the double-precision floating-point value it represents.
Understanding the first bit, the sign +/-, is quite easy. Same for the exponent, since we know that the bias is 1023.
We are having problems with the significand. How can I calculate it?
In the end, I would like to obtain the real numbers that the bits meant.
Computing the significand from the given 64 bits is quite easy.
According to the wiki article on IEEE 754, the significand is made up of 53 bits (from bit 0 to bit 52 in the numbering used below).
Now if you want to round a number having, say, 67 bits down to a 64-bit value, the trailing bit of your value gets adjusted by the rounding (possibly carrying into higher bits) because of the extra 3 bits:
11110000 11110010 11111 becomes 11110000 11110011 after the rounding of the last byte;
Therefore there is no need to store the 53rd bit, because for normalized numbers it always has the value one.
That's why you only store 52 bits of the significand instead of 53.
Now to compute it, you just need to take the bit range of the significand [bit(1) - bit(52)] (bit(0) is always 1) and use it.
int index_signf = 1;          // starting at 1, not 0
int significand_length = 52;
int byteArray[53];            // array containing the bits of the significand
double significand_endValue = 0;

for( ; index_signf <= significand_length ; index_signf ++)
{
    significand_endValue += byteArray[index_signf] * (pow(2,-(index_signf)));
}
significand_endValue += 1;
Now you just have to fill byteArray accordlingly before computing it, using function like that:
int* getSignificandBits(int* array64bits){
//returned array
int significandBitsArray[53];
// indexes++
int i_array64bits = 0;
int i_significandBitsArray=1;
//set the first bit = 1
significandBitsArray[0] = 1;
// fill it
for(i_significandBitsArray=1, i_array64bits = (63 - 1); i_array64bits >= (64 - 52); i_array64bits--, i_significandBitsArray ++)
significandBitsArray[i_significandBitsArray] = array64bits[i_array64bits];
return significandBitsArray;
}
You could just load the bits into an unsigned integer of the same size as a double, take the address of that and cast it to a void* which you then cast to a double* and dereference.
Of course, this might be "cheating" if you really are supposed to parse the floating point standard, but this is how I would have solved the problem given the parameters you've stated so far.
If you have a byte representation of an object you can copy the bytes into the storage of a variable of the right type to convert it.
double convert_to_double(uint64_t x) {
double result;
mempcy(&result, &x, sizeof(x));
return result;
}
You will often see code like *(double *)&x to do the conversion, but although in practice this will usually work, it's undefined behavior in C.
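For instance, with the binary64 encoding of 10.0 discussed in an earlier question:

#include <stdint.h>
#include <stdio.h>

double convert_to_double(uint64_t x);   /* as defined above */

int main(void)
{
    /* 0x4024000000000000 is the IEEE-754 binary64 encoding of 10.0 */
    printf("%f\n", convert_to_double(0x4024000000000000ULL));  /* prints 10.000000 */
    return 0;
}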

Manually cast signed char

I'm working with some embedded hardware, a Rabbit SBC, which uses Dynamic C 9.
I'm using the microcontroller to read information from a digital compass sensor using one of its serial ports.
The sensor sends values to the microcontroller using a single signed byte. (-85 to 85)
When I receive this data, I am putting it into a char variable
This works fine for positive values, but when the sensor starts to send negative values, the reading jumps to 255, then works its way back down to 0. I presume this is because the last bit is being used to determine the negative/positive, and is skewing the real values.
My inital thought was to change my data type to a signed char.
However, the problem I have is that the version of Dynamic C on the Microcontroller I am using does not natively support signed char values, only unsigned.
I am wondering if there is a way to manually cast the data I receive into a signed value?
You just need to pull out your reference book and read how negative numbers are represented by your controller. The rest is just typing.
For example, two's complement is represented by taking the value mod 256, so you just need to adjust by the modulus.
int signed_from_unsignedchar(unsigned char c)
{
    int result = c;
    if (result >= 128) result -= 256;
    return result;
}
One's complement is much simpler: You just flip the bits.
int signed_from_unsignedchar(unsigned char c)
{
    int result = c;
    if (result >= 128) result = -(int)(unsigned char)~c;
    return result;
}
Sign-magnitude represents negative numbers by setting the high bit, so you just need to clear the bit and negate:
int signed_from_unsignedchar(unsigned char c)
{
    int result = c;
    if (result >= 128) result = -(result & 0x7F);
    return result;
}
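For a two's complement sensor, the first version gives signed_from_unsignedchar(0xFF) == -1 and signed_from_unsignedchar(0xAB) == -85, while values below 128 come back unchanged, which matches the -85 to 85 range described in the question.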
I think this is what you're after (assumes a 32-bit int and an 8-bit char):
unsigned char c = 255;
int i = ((int)(((unsigned int)c) << 24)) >> 24;
of course I'm assuming here that your platform does support signed integers, which may not be the case.
Signed and unsigned values are all just a bunch of bits, it is YOUR interpretation that makes them signed or unsigned. For example, if your hardware produces 2's complement, if you read 0xff, you can either interpret it as -1 or 255 but they are really the same number.
Now if you have only unsigned char at your disposal, you have to emulate the behavior of negative values with it.
For example:
c < 0
changes to
c > 127
Luckily, addition doesn't need any change. Subtraction is also the same (check this; I'm not 100% sure).
For multiplication for example, you need to check it yourself. First, in 2's complement, here's how you get the positive value of the number:
pos_c = ~neg_c+1
which is, mathematically speaking, 256-neg_c, which is congruent to -neg_c modulo 256
Now let's say you want to multiply two numbers that are unsigned, but you want to interpret them as signed.
unsigned char abs_a = a, abs_b = b;
char final_sign = 0; // 0 for positive, 1 for negative

if (a > 127)
{
    abs_a = ~a+1;
    final_sign = 1-final_sign;
}
if (b > 127)
{
    abs_b = ~b+1;
    final_sign = 1-final_sign;
}

result = abs_a*abs_b;
if (final_sign == 1)
    result = ~result+1;
You get the idea!
If your platform supports signed ints, check out some of the other answers.
If not, and the value is definitely between -85 and +85, and it is two's complement, add 85 to the input value and work out your program logic to interpret values between 0 and 170 so you don't have to mess with signed integers anymore.
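For the two's complement case just described, a minimal sketch (the helper name is mine):

/* Map a two's complement reading in [-85, +85], received as an unsigned
   byte, onto the range [0, 170] by adding 85 modulo 256. */
unsigned char offset_reading(unsigned char c)
{
    return (unsigned char)(c + 85);
}

For example, 0x00 (0) maps to 85, 0x55 (85) maps to 170, and 0xAB (-85) maps to 0.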
If it's one's complement, try this:
if (x >= 128) {
    x = 85 - (x ^ 0xff);
} else {
    x = x + 85;
}
That will leave you with a value between 0 and 170 as well.
EDIT: Yes, there is also sign-magnitude. Then use the same code here but change the second line to x = 85 - (x & 0x7f).

Resources