Print double without printf - c

I'm trying to display a double without printf or any other library except stdlib.h (for malloc).
I know how a double is stored, but I'm having trouble with the calculation.
I know a double is stored in 64 bits:
1 for the sign;
11 for the exponent;
52 for the fraction;
I used some conversions to get all those values, but I'm failing to get the 1.fraction value (source: https://en.wikipedia.org/wiki/Double-precision_floating-point_format). I get the mantissa, but I don't know how to correctly add this leading 1.
Here is some code:
double d;
unsigned long long *double_as_int;
unsigned long long value;
d = 0.5;
double_as_int = (unsigned long long *)&d;
value = *double_as_int & 0x001FFFFFFFFFFFFFULL;
printf("value = %llu\n", value); /* <- just for verification */
I already know that to get the mantissa I only need to mask with 0x000FFFFFFFFFFFFFULL, but I'm trying to add the one in the 1.fraction part.
Do you have any idea how to solve this part?

I know double is stored in 64 bits
Not necessarily. An "IEEE 754 double-precision binary floating-point" number is stored in 64 bits. A "double" may be anything; it may or may not follow the IEEE 754 standard. You should check the __STDC_IEC_559__ macro before assuming the implementation conforms to C11 Annex F.
If you want to manipulate floating-point numbers, you should use frexp and similar functions, which are specifically meant to manipulate the representation of floating-point numbers abstractly, without any (super unsafe) casts:
double d = DBL_MIN / 2;
int exponent;
double fraction = frexp(d, &exponent);
if (fraction == 0 && exponent == 0) abort(); /*handle error*/
printf("%g = %d * 2^%d * %f\n", d, d<0?-1:1, exponent, fraction);
how to resolve this part?
The 1.fraction represents a fractional number like 1.01010111... in base 2. The digits after the binary point are just the bits in the fraction part of the floating-point number, in order. The following program (which may well contain bugs) is meant to output the floating-point value in the form sign * 2^(exp) * [0/1].fraction(2), where fraction is in base 2:
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <limits.h>
#include <float.h>
#if !__STDC_IEC_559__
#error
#endif
int main() {
    double d = DBL_MIN / 2;
    // Bit-field order is implementation-defined. Listing the fields from
    // least to most significant matches GCC and Clang on little-endian
    // targets such as x86-64; a union here would overlap all three fields.
    typedef struct {
        unsigned long long fract : 52;
        unsigned long long exp : 11;
        unsigned long long sign : 1;
    } double64u;
    double64u di;
    static_assert(sizeof(double) == sizeof(double64u), "");
    memcpy(&di, &d, sizeof(double));
    // extract **binary** digits from value into buffer
    char buffer[53] = {0};
    char *p = buffer + 52;
    unsigned long long tmp = di.fract;
    for (int i = 0; i < 52; ++i) {
        *(--p) = (tmp & 0x1) + '0';
        tmp >>= 1;
    }
    // the sign field is unsigned, so test it for non-zero rather than < 0
    int sign = di.sign ? -1 : 1;
    bool normal = di.exp != 0;
    printf("%g = \n", d);
    if (normal) {
        printf("%d * 2^(%d - 1023) * 1.%s(2)\n",
               sign, (int)di.exp, buffer);
    } else {
        printf("%d * 2^(1 - 1023) * 0.%s(2)\n",
               sign, buffer);
    }
}
On my x86-64 this program outputs:
1.11254e-308
1 * 2^(1 - 1023) * 0.1000000000000000000000000000000000000000000000000000(2)
You can then feed the 0.10..., which is a base-2 number (hence the (2) at the end), to a "binary to decimal converter" such as rapidtables; 0.1 in base 2 is 0.5 in base 10 (this example is a simple one). So the number is:
1 * 2^(1 - 1023) * 0.5
You can then enter this into an arbitrary-precision calculator like bc to compute the actual result:
$ bc
scale=400
1 * 2^(1 - 1023) * 0.5
.0000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000011125369292536006915451163586662\
0203210960799023116591527666370844360221740695909792714157950
which is the same number as 1.11254e-308.
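To come back to the original question of adding the implicit 1: a minimal sketch, assuming double is an IEEE 754 binary64 value and ignoring infinities and NaNs, is to OR bit 52 into the 52-bit fraction whenever the biased exponent is non-zero. The names double_parts and split_double are made up for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical type: the three fields of a binary64 value, decoded. */
typedef struct {
    int sign;              /* 0 for positive, 1 for negative */
    int exponent;          /* unbiased exponent */
    uint64_t significand;  /* up to 53 bits, implicit 1 included when present */
} double_parts;

/* Assumes double is IEEE 754 binary64; infinities and NaNs are not handled. */
double_parts split_double(double d)
{
    uint64_t bits;
    double_parts p;

    memcpy(&bits, &d, sizeof bits);          /* copy the bytes, no pointer cast */
    uint64_t fraction = bits & 0x000FFFFFFFFFFFFFULL;
    int biased = (int)((bits >> 52) & 0x7FF);

    p.sign = (int)(bits >> 63);
    if (biased == 0) {
        /* zero or subnormal: the leading digit is 0, exponent is fixed */
        p.significand = fraction;
        p.exponent = 1 - 1023;
    } else {
        /* normal: this is where the "1." of 1.fraction is added */
        p.significand = fraction | (1ULL << 52);
        p.exponent = biased - 1023;
    }
    return p;
}
```

With these fields the value is (-1)^sign * significand * 2^(exponent - 52); for 0.5 that gives significand = 2^52 and exponent = -1.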
Printing floating point numbers yourself is a very hard job to do. I can recommend https://www.ryanjuckett.com/printing-floating-point-numbers/ and papers that introduced Grisu3 and Ryu and Errol1 algorithms. For inspiration, read code from existing implementations: newlib vfprintf.c cvt(), musl vfprintf.c fmt_fp(), glibc printf_fp_ stuff.

Related

Concatenating binary numbers

I am trying to code a program that will take a floating point number in base 10 and convert its fractional part in base 2. In the following code, I am intending to call my converting function into a printf, and format the output; the issue I have lies in my fra_binary() where I can't figure out the best way to return an integer made of the result of the conversion at each turn respectively (concatenation). Here is what I have done now (the code is not optimized because I am still working on it) :
#include <stdio.h>
#include <math.h>
int fra_binary(double fract) ;
int main()
{
long double n ;
double fract, deci ;
printf("base 10 :\n") ;
scanf("%Lf", &n) ;
fract = modf(n, &deci) ;
int d = deci ;
printf("base 2: %d.%d\n", d, fra_binary(fract)) ;
return(0) ;
}
int fra_binary(double F)
{
double fl ;
double decimal ;
int array[30] ;
for (int i = 0 ; i < 30 ; i++) {
fl = F * 2 ;
F = modf(fl, &decimal) ;
array[i] = decimal ;
if (F == 0) break ;
}
return array[0] ;
}
Obviously this returns partly the desired output, because I would need the whole array concatenated as one int or char to display the series of 1 and 0s I need. So at each turn, I want to use the decimal part of the number I work on as the binary number to concatenate (1 + 0 = 10 and not 1). How would I go about it?
Hope this makes sense!
return array[0] ; is only the first value of int array[30] set in fra_binary(). Code discards all but the first calculation of the loop for (int i = 0 ; i < 30 ; i++).
convert its fractional part in base 2
OP's loop idea is a good starting point. Yet int array[30] is insufficient to encode the fractional portion of all double into a "binary".
can't figure out the best way to return an integer
Returning an int will be insufficient. Instead consider using a string - or manage an integer array in a likewise fashion.
Use defines from <float.h> to drive the buffer requirements.
#include <stdio.h>
#include <math.h>
#include <float.h>
char *fra_binary(char *dest, double x) {
_Static_assert(FLT_RADIX == 2, "Unexpected FP base");
double deci;
double fract = modf(x, &deci);
fract = fabs(fract);
char *s = dest;
do {
double d;
fract = modf(fract * 2.0, &d);
*s++ = "01"[(int) d];
} while (fract);
*s = '\0';
// For debug
printf("%*.*g --> %.0f and .", DBL_DECIMAL_DIG + 8, DBL_DECIMAL_DIG, x,
deci);
return dest;
}
int main(void) {
// Perhaps 53 - -1021 + 1
char fraction_string[DBL_MANT_DIG - DBL_MIN_EXP + 1];
puts(fra_binary(fraction_string, -0.0));
puts(fra_binary(fraction_string, 1.0));
puts(fra_binary(fraction_string, acos(-1))); // machine pi
puts(fra_binary(fraction_string, -0.1));
puts(fra_binary(fraction_string, DBL_MAX));
puts(fra_binary(fraction_string, DBL_MIN));
puts(fra_binary(fraction_string, DBL_TRUE_MIN));
}
Output
-0 --> -0 and .0
1 --> 1 and .0
3.1415926535897931 --> 3 and .001001000011111101101010100010001000010110100011
-0.10000000000000001 --> -0 and .0001100110011001100110011001100110011001100110011001101
1.7976931348623157e+308 --> 179769313486231570814527423731704356798070600000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 and .0
2.2250738585072014e-308 --> 0 and .00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
4.9406564584124654e-324 --> 0 and .000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
Also unclear why input is long double, yet processing is with double. Recommend using just one FP type.
Note that your algorithm finds out the binary representation of the fraction most significant bit first.
One way to convert the fractional part to a binary string, would be to supply the function with a string and a string length, and have the function fill it with up to that many binary digits:
/* This function returns the number of chars needed in dst
to describe the fractional part of value in binary,
not including the trailing NUL ('\0').
Returns zero in case of an error (non-finite value).
*/
size_t fractional_bits(char *dst, size_t len, double value)
{
double fraction, integral;
size_t i = 0;
if (!isfinite(value))
return 0;
if (value > 0.0)
fraction = modf(value, &integral);
else
if (value < 0.0)
fraction = modf(-value, &integral);
else {
/* Zero fraction. */
if (len > 1) {
dst[0] = '0';
dst[1] = '\0';
} else
if (len > 0)
dst[0] = '\0';
/* One binary digit was needed for exact representation. */
return 1;
}
while (fraction > 0.0) {
fraction = fraction * 2.0;
if (fraction >= 1.0) {
fraction = fraction - 1.0;
if (i < len)
dst[i] = '1';
} else
if (i < len)
dst[i] = '0';
i++;
}
if (i < len)
dst[i] = '\0';
else
if (len > 0)
dst[len - 1] = '\0';
return i;
}
The above function works very much like snprintf(), except that it takes only the double whose fractional bits are to be stored as a string of binary digits (0 or 1), and it returns 0 in case of an error (non-finite double value).
Another option is to use an unsigned integer type to hold the bits. For example, if your code is intended to work on architectures where double is an IEEE-754 Binary64 type or similar, the mantissa has up to 53 bits of precision, and a uint64_t would suffice.
Here is an example of that:
uint64_t fractional_bits(const double val, size_t bits)
{
double fraction, integral;
uint64_t result = 0;
if (bits < 1 || bits > 64) {
errno = EINVAL;
return 0;
}
if (!isfinite(val)) {
errno = EDOM;
return 0;
}
if (val > 0.0)
fraction = modf(val, &integral);
else
if (val < 0.0)
fraction = modf(-val, &integral);
else {
errno = 0;
return 0;
}
while (bits-->0) {
result = result << 1;
fraction = fraction * 2.0;
if (fraction >= 1.0) {
fraction = fraction - 1.0;
result = result + 1;
}
}
errno = 0;
return result;
}
The return value is the binary representation of the fractional part: fractional_part ≈ result / 2^bits, where bits is between 1 and 64, inclusive.
In order for the caller to detect an error, the function clears errno to zero if no error occurred. If an error does occur, the function returns zero with errno set to EDOM if the value is not finite, or to EINVAL if bits is less than 1 or greater than 64.
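If the uint64_t form is used, the caller still has to turn the bits into digits for display. One possible sketch, using a hypothetical helper bits_to_string that is not part of the answer above:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the low `bits` bits of `value` into dst as
   '0'/'1' characters, most significant bit first, and NUL-terminates.
   Assumes 1 <= bits <= 64 and that dst has room for bits + 1 chars. */
void bits_to_string(uint64_t value, size_t bits, char *dst)
{
    for (size_t i = 0; i < bits; i++) {
        size_t shift = bits - 1 - i;   /* position of the digit being written */
        dst[i] = ((value >> shift) & 1u) ? '1' : '0';
    }
    dst[bits] = '\0';
}
```

For example, a fractional part of 0.75 with bits = 4 yields result 12, which this helper renders as 1100, i.e. binary 0.1100.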
You can combine the two approaches, if you implement an arbitrary-size unsigned integer type, or a bitmap type.

Random float in C using getrandom

I'm trying to generate a random floating point number between 0 and 1 (whether it's on [0,1] or [0,1) shouldn't matter for me). Every question online about this seems to involve the rand() call, seeded with time(NULL), but I want to be able to invoke my program more than once a second and get different random numbers every time. This led me to the getrandom syscall in Linux, which pulls from /dev/urandom. I came up with this:
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
int main() {
uint32_t r = 0;
for (int i = 0; i < 20; i++) {
syscall(SYS_getrandom, &r, sizeof(uint32_t), 0);
printf("%f\n", ((double)r)/UINT32_MAX);
}
return 0;
}
My question is simply whether or not I'm doing this correctly. It appears to work, but I'm worried that I'm misusing something, and there are next to no examples using getrandom() online.
OP has 2 issues:
How to seed the sequence with good randomness.
How to generate a double on the [0...1) range.
The usual method is to take a very random source like /dev/urandom or the result from the syscall() or maybe even seed = time() ^ process_id; and seed via srand(). Then call rand() as needed.
Below is a quickly written method to generate a uniform value on [0.0 to 1.0) (linear distribution). But like all random generating functions, really good ones are based on extensive study. This one simply calls rand() a few times based on DBL_MANT_DIG and RAND_MAX.
[Edit] Original double rand_01(void) has a weakness in that it only generates 2^52 different doubles rather than 2^53. It has been amended. Alternative: a double version of rand_01_ld(void) far below.
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
double rand_01(void) {
assert(FLT_RADIX == 2); // needed for DBL_MANT_DIG
unsigned long long limit = (1ull << DBL_MANT_DIG) - 1;
double r = 0.0;
do {
r += rand();
// Assume RAND_MAX is a power-of-2 - 1
r /= (RAND_MAX/2 + 1)*2.0;
limit = limit / (RAND_MAX/2 + 1) / 2;
} while (limit);
// Use only DBL_MANT_DIG (53) bits of precision.
if (r < 0.5) {
volatile double sum = 0.5 + r;
r = sum - 0.5;
}
return r;
}
int main(void) {
FILE *istream = fopen("/dev/urandom", "rb");
assert(istream);
unsigned long seed = 0;
for (unsigned i = 0; i < sizeof seed; i++) {
seed *= (UCHAR_MAX + 1);
int ch = fgetc(istream);
assert(ch != EOF);
seed += (unsigned) ch;
}
fclose(istream);
srand(seed);
for (int i=0; i<20; i++) {
printf("%f\n", rand_01());
}
return 0;
}
If one wanted to extend to an even wider FP, unsigned wide integer types may be insufficient. Below is a portable method that does not have that limitation.
long double rand_01_ld(void) {
// These should be calculated once rather than each function call
// Leave that as a separate implementation problem
// Assume RAND_MAX is power-of-2 - 1
assert((RAND_MAX & (RAND_MAX + 1U)) == 0);
double rand_max_p1 = (RAND_MAX/2 + 1)*2.0;
unsigned BitsPerRand = (unsigned) round(log2(rand_max_p1));
assert(FLT_RADIX != 10);
unsigned BitsPerFP = (unsigned) round(log2(FLT_RADIX)*LDBL_MANT_DIG);
long double r = 0.0;
unsigned i;
for (i = BitsPerFP; i >= BitsPerRand; i -= BitsPerRand) {
r += rand();
r /= rand_max_p1;
}
if (i) {
r += rand() % (1 << i);
r /= 1 << i;
}
return r;
}
If you need to generate doubles, the following algorithm could be of use:
CPython generates random numbers using the following algorithm (I changed the function name, typedefs and return values, but algorithm remains the same):
double get_random_double() {
uint32_t a = get_random_uint32_t() >> 5;
uint32_t b = get_random_uint32_t() >> 6;
return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0);
}
The source of that algorithm is a Mersenne Twister 19937 random number generator by Takuji Nishimura and Makoto Matsumoto. Unfortunately the original link mentioned in the source is not available for download any longer.
The comment on this function in CPython notes the following:
[this function] is the function named genrand_res53 in the original code;
generates a random number on [0,1) with 53-bit resolution; note that
9007199254740992 == 2**53; I assume they're spelling "/2**53" as
multiply-by-reciprocal in the (likely vain) hope that the compiler will
optimize the division away at compile-time. 67108864 is 2**26. In
effect, a contains 27 random bits shifted left 26, and b fills in the
lower 26 bits of the 53-bit numerator.
The original code credited Isaku Wada for this algorithm, 2002/01/09
Simplifying from that code, if you want to create a float fast, you should mask the bits of uint32_t with (1 << FLT_MANT_DIG) - 1 and divide by (1 << FLT_MANT_DIG) to get the proper [0, 1) interval:
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <float.h>
int main() {
uint32_t r = 0;
float result;
for (int i = 0; i < 20; i++) {
syscall(SYS_getrandom, &r, sizeof(uint32_t), 0);
result = (float)(r & ((1 << FLT_MANT_DIG) - 1)) / (1 << FLT_MANT_DIG);
printf("%f\n", result);
}
return 0;
}
Since it can be assumed that your Linux has a C99 compiler, we can use ldexpf instead of that division:
#include <math.h>
result = ldexpf(r & ((1 << FLT_MANT_DIG) - 1), -FLT_MANT_DIG);
To get the closed interval [0, 1], the modulus has to be one larger so that the value 2^FLT_MANT_DIG itself can occur; you can use the slightly less efficient
result = ldexpf(r % ((1 << FLT_MANT_DIG) + 1), -FLT_MANT_DIG);
To generate lots of good quality random numbers fast, I'd just use the system call to fetch enough data to seed a PRNG or CPRNG, and proceed from there.
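For doubles, the 53-bit idea from the CPython function carries over when the random source already delivers 64 bits at once (for example, eight bytes from getrandom): keep the top 53 bits and scale by 2^-53. This is only a sketch with a hypothetical function name, assuming DBL_MANT_DIG is 53; it is not taken from CPython.

```c
#include <stdint.h>

/* Hypothetical helper: maps 64 random bits to a double in [0, 1).
   Assumes double has a 53-bit significand (DBL_MANT_DIG == 53). */
double bits_to_unit_double(uint64_t random_bits)
{
    /* Drop the low 11 bits; the remaining 53-bit integer converts exactly. */
    uint64_t top53 = random_bits >> 11;

    /* 9007199254740992 == 2^53, so the largest result is 1 - 2^-53. */
    return (double)top53 * (1.0 / 9007199254740992.0);
}
```

Because every 53-bit integer is exactly representable in such a double, each of the 2^53 possible outputs is equally likely if the input bits are uniform.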

Correct algorithm to convert binary floating point "1101.11" into decimal (13.75)?

I have written a program in C to convert a floating point number represented in binary (1101.11) into a decimal (13.75).
However, I cannot seem to get the correct value out of the algorithm.
What is the correct method for converting a binary floating point number into a decimal?
I am using Dev CPP compiler (32 bit). The algorithm is defined below:
void b2d(double p, double q )
{
double rem, dec=0, main, f, i, t=0;
/* integer part operation */
while ( p >= 1 )
{
rem = (int)fmod(p, 10);
p = (int)(p / 10);
dec = dec + rem * pow(2, t);
t++;
}
/* fractional part operation */
t = 1; //assigning '1' to use 't' in new operation
while( q > 0 )
{
main = q * 10;
q = modf(main, &i); //extration of frational part(q) and integer part(i)
dec = dec+i*pow(2, -t);
t++;
}
printf("\nthe decimal value=%lf\n",dec); //prints the final output
}
int main()
{
double bin, a, f;
printf("Enter binary number to convert:\n");
scanf("%lf",&bin);
/* separation of integer part and decimal part */
a = (int)bin;
f = bin - a;
b2d(a, f); // function calling for conversion
getch();
return 0;
}
You are not, as you believe, reading "1101.11" as a floating point number represented in binary. You are reading it as a base-10 floating point number converted into an IEEE double-precision floating-point value, and then trying to change the base.
The inherent imprecision of this intermediate step is the reason for your problem.
A better approach, as suggested by Vicky, is to:
read "1101.11" as a string or line of text
convert the whole and fractional parts (whole=b1101=13 and numerator=b11=3, denominator=4)
re-combine these into whole + numerator/denominator = 13.75
Solution
The following will work as expected:
Output:
➤ gcc bin2dec.c -lm -o bin2dec && bin2dec
1101.11 -> 13.750000
1101 -> 13.000000
1101. -> 13.000000
.11 -> 0.750000
Code (bin2dec.c):
#include <stdio.h>
#include <math.h>
double convert(const char binary[]){
int bi,i;
int len = 0;
int dot = -1;
double result = 0;
for(bi = 0; binary[bi] != '\0'; bi++){
if(binary[bi] == '.'){
dot = bi;
}
len++;
}
if(dot == -1)
dot=len;
for(i = dot; i >= 0 ; i--){
if (binary[i] == '1'){
result += (double) pow(2,(dot-i-1));
}
}
for(i=dot; binary[i] != '\0'; i++){
if (binary[i] == '1'){
result += 1.0/(double) pow(2.0,(double)(i-dot));
}
}
return result;
}
int main()
{
char bin[] = "1101.11";
char bin1[] = "1101";
char bin2[] = "1101.";
char bin3[] = ".11";
printf("%s -> %f\n",bin, convert(bin));
printf("%s -> %f\n",bin1, convert(bin1));
printf("%s -> %f\n",bin2, convert(bin2));
printf("%s -> %f\n",bin3, convert(bin3));
return 0;
}
Explanation
The above code works by first finding the index of the decimal point in the number.
Once that is known, it walks the string both backwards and forwards from this index, adding the appropriate value to the result variable.
The first loop walks backwards from the decimal point and accumulates the powers of 2 if the character is 1. It takes the distance from the decimal point as the power of two, minus one for the indexing to be correct. I.e., it accumulates:
pow(2,<distance-from-decimal-point>)
The loop stops when the index reaches the beginning of the string.
The second loop walks forward until the end of the string and deals with the fractional part. It also uses the distance from the index, but this time accumulates fractional parts:
1/pow(2,<distance-from-decimal-point>)
Worked out example:
1101.11 = 1101 + 0.11
1101 = 1*2^3 + 1*2^2 + 0*2^1 + 1*2^0 = 8 + 4 + 0 + 1 = 13
0.11 = 1/(2^1) + 1/(2^2) = 0.5 + 0.25 = 0.75
1101.11 = 13.75
Beware of malformed input. "10gsh.9701072.67812" will give you a result. It won't mean much :)
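If malformed input is a concern, a variant that rejects anything other than 0, 1 and a single point could look like the sketch below. It also avoids pow() by doubling the integer part and halving a running weight for the fractional part. The name parse_binary and its return convention are assumptions made for this example, not part of the answer above.

```c
#include <stddef.h>

/* Hypothetical helper: parses strings such as "1101.11" into *out.
   Returns 0 on success, -1 on malformed input (bad character, more
   than one '.', or no digits at all). */
int parse_binary(const char *s, double *out)
{
    double value = 0.0;
    double weight = 0.5;   /* weight of the next fractional digit */
    int seen_dot = 0;
    int digits = 0;

    if (s == NULL || out == NULL)
        return -1;

    for (; *s != '\0'; s++) {
        if (*s == '.') {
            if (seen_dot)
                return -1;              /* second point: reject */
            seen_dot = 1;
        } else if (*s == '0' || *s == '1') {
            int bit = *s - '0';
            if (!seen_dot) {
                value = value * 2.0 + bit;   /* integer part */
            } else {
                value += bit * weight;       /* fractional part */
                weight /= 2.0;
            }
            digits++;
        } else {
            return -1;                  /* anything else: reject */
        }
    }
    if (digits == 0)
        return -1;
    *out = value;
    return 0;
}
```

With this version "1101.11" still gives 13.75, while "10gsh.9701072.67812" is reported as an error instead of producing a meaningless number.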
This piece of code behaves abnormally. I added some simple print statements:
while(q>0)
{
double i;
main=q*10.0;
q=modf(main, &i); //extration of frational part(q) and integer part(i)
cout << "main = " << main << " frac part " << q << " int part " << i << endl;
cin.get();
dec=dec+i*pow(2,-t);
t++;
}
When you input 1101.11, the following output is shown:
Enter binary number to convert(e.g: 1101.11 which will be 13.75 in decimal):
1101.11
bin in main 1101.11
p 1101 q 0.11
//inside the above while loop code
main = 1.1 frac part 0.1 int part 1
main = 1 frac part 1 int part 0 //^^^^^Error, given main=1, it should output integer part 1, fraction part 0
main = 10 frac part 1 int part 9 //^^^^^same strange error here, it should exit while already
So you get the wrong result. I tested modf separately with input 1, and it gave the correct result.
So my guess is that you are reading the binary number as a double and then trying to convert this double back to binary. There may be something going on under the hood with the precision of the number, even though it is displayed as 1101.11. As suggested by @Useless, you may need to read the number as a string and figure out the substrings before and after the decimal point, then convert these two parts into decimal separately.

How can I convert a float/double to ASCII without using sprintf or ftoa in C?

How can I convert a float/double to ASCII without using sprintf or ftoa in C?
I am using an embedded system.
The approach you take will depend on the possible range of values. You certainly have some internal knowledge of the possible range, and you may only be interested in conversions within a more narrow range.
So, suppose you are only interested in the integer value. In this case, I would just assign the number to an int or long, at which point the problem becomes fairly obvious.
Or, suppose the range won't include any large exponents but you are interested in several digits of fraction. To get three digits of fraction, I might say int x = f * 1000;, convert x, and then insert the decimal point as a string operation.
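The three-digit idea can be sketched as follows. This is only an illustration under stated assumptions: the value times 1000 must fit in an unsigned long, and format_fixed3 is a name invented here. It scales, rounds, emits the digits in reverse and drops the point in after the third one.

```c
#include <stddef.h>

/* Hypothetical helper: writes f with exactly three fractional digits.
   Assumes |f| * 1000 fits in an unsigned long. Returns the number of
   characters written (excluding the NUL), or 0 if dst is too small. */
size_t format_fixed3(float f, char *dst, size_t len)
{
    char tmp[24];                 /* digits in reverse order */
    size_t n = 0, i = 0;
    int negative = f < 0.0f;
    /* scale by 1000 and round to the nearest integer */
    unsigned long scaled =
        (unsigned long)((negative ? -f : f) * 1000.0f + 0.5f);

    do {
        tmp[n++] = (char)('0' + scaled % 10);
        scaled /= 10;
        if (n == 3)
            tmp[n++] = '.';       /* point goes after three fraction digits */
    } while (scaled != 0 || n < 5);   /* n < 5 forces a leading "0." */

    if (negative)
        tmp[n++] = '-';
    if (n + 1 > len)
        return 0;
    while (n > 0)
        dst[i++] = tmp[--n];      /* reverse into the caller's buffer */
    dst[i] = '\0';
    return i;
}
```

Called with 3.25f this produces "3.250", and with -1.5f it produces "-1.500".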
Failing all of the above, a float or double has a sign bit, a fraction, and an exponent. There is a hidden 1 in the fraction. (The numbers are normalized until they have no leading zeroes, at which point they do one more shift to gain an extra bit of precision.) The number is then equal to the fraction (plus a leading '1') * 2 ** exponent. With essentially all systems using the IEEE 754 representation you can just use this Wikipedia IEEE 754 page to understand the format. It's not that different from just converting an integer.
For single precision, once you get the exponent and fraction, the value (see Note 1) of the number is then (frac / 2^23 + 1) * 2^exp, or frac * 2^(exp - 23) + 2^exp.
Here is an example that should get you started on a useful conversion:
$ cat t.c
#include <stdio.h>
void xconvert(unsigned frac)
{
if (frac) {
xconvert(frac / 10);
printf("%c", frac % 10 | '0');
}
}
void convert(unsigned i)
{
unsigned sign, frac;
int exp;
sign = i >> 31;
exp = (int)((i >> (31 - 8)) & 0xff) - 127; /* mask off the sign bit */
frac = i & 0x007fffff;
if (sign)
printf("-");
xconvert(frac);
printf(" * 2 ** %d + 2 ** %d\n", exp - 23, exp);
printf("\n");
}
int main(void)
{
union {
float f;
unsigned i;
} u;
u.f = 1.234e9;
convert(u.i);
return 0;
}
$ ./a.out
1252017 * 2 ** 7 + 2 ** 30
Note 1. In this case the fraction is being converted as if the binary point was on the right instead of the left, with compensating adjustments then made to the exponent and hidden bit.
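To check the formula (frac / 2^23 + 1) * 2^exp, the following sketch rebuilds a float from its raw bits using only multiplication and division by two. It assumes a 32-bit IEEE 754 float and handles normal numbers only (no zero, subnormals, infinities or NaNs); rebuild_float is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: reconstructs a normal binary32 value from its bits.
   Computes (frac + 2^23) * 2^(exp - 23), which equals
   (frac / 2^23 + 1) * 2^exp. */
float rebuild_float(uint32_t bits)
{
    uint32_t frac = bits & 0x007FFFFFu;
    int exp = (int)((bits >> 23) & 0xFFu) - 127;   /* remove the bias */
    /* frac plus the hidden bit is below 2^24, so it converts exactly */
    float value = (float)(frac | 0x00800000u);
    int shift = exp - 23;

    while (shift > 0) { value *= 2.0f; shift--; }
    while (shift < 0) { value /= 2.0f; shift++; }

    return (bits >> 31) ? -value : value;          /* apply the sign bit */
}
```

Feeding it the bit pattern of 1.234e9f returns the same float, which confirms that the fraction, hidden bit and exponent were combined correctly.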
#include<stdio.h>
void flot(char* p, float x)
{
int n,i=0,k=0;
n=(int)x;
while(n>0)
{
x/=10;
n=(int)x;
i++;
}
*(p+i) = '.';
x *= 10;
n = (int)x;
x = x-n;
while((n>0)||(i>k))
{
if(k == i)
k++;
*(p+k)='0'+n;
x *= 10;
n = (int)x;
x = x-n;
k++;
}
/* Null-terminated string */
*(p+k) = '\0';
}
int main()
{
float x;
char a[20]={0};
char* p=a;
printf("Enter the float value.");
scanf("%f",&x);
flot(p,x);
printf("The value=%s",p);
getchar();
return 0;
}
Even in an embedded system, you'd be hard pressed to beat the performance of ftoa. Why reinvent the wheel?

Subtraction without minus sign in C

How can I subtract two integers in C without the - operator?
int a = 34;
int b = 50;
You can convert b to negative value using negation and adding 1:
int c = a + (~b + 1);
printf("%d\n", c);
-16
This is two's complement negation. The processor does this when you use the '-' operator to negate a value or subtract it.
Converting float is simpler. Just flip the sign bit (shoosh gave an example of how to do this).
EDIT:
Ok, guys. I give up. Here is my compiler independent version:
#include <stdio.h>
unsigned int adder(unsigned int a, unsigned int b) {
unsigned int loop = 1;
unsigned int sum = 0;
unsigned int ai, bi, ci;
while (loop) {
ai = a & loop;
bi = b & loop;
ci = sum & loop;
sum = sum ^ ai ^ bi; // add i-th bit of a and b, and add carry bit stored in sum i-th bit
loop = loop << 1;
if ((ai&bi)|(ci&ai)|(ci&bi)) sum = sum^loop; // add carry bit
}
return sum;
}
unsigned int sub(unsigned int a, unsigned int b) {
return adder(a, adder(~b, 1)); // add negation + 1 (two's complement here)
}
int main() {
unsigned int a = 35;
unsigned int b = 40;
printf("%u - %u = %d\n", a, b, sub(a, b)); // printf function isn't compiler independent here
return 0;
}
I'm using unsigned int so that any compiler will treat it the same.
If you want to subtract negative values, do it this way:
unsigned int negative15 = adder(~15, 1);
Now we are completely independent of signed value conventions. In my approach all ints are stored as two's complement, so you have to be careful with bigger ints (they have to start with a 0 bit).
Pontus is right, 2's complement is not mandated by the C standard (even if it is the de facto hardware standard). +1 for Phil's creative answers; here's another approach to getting -1 without using the standard library or the -- operator.
C mandates three possible representations, so you can sniff which is in operation and get a different -1 for each:
negation= ~1;
if (negation+1==0) /* one's complement arithmetic */
minusone= ~1;
else if (negation+2==0) /* two's complement arithmetic */
minusone= ~0;
else /* sign-and-magnitude arithmetic */
minusone= ~0x7FFFFFFE;
r= a+b*minusone;
The value 0x7FFFFFFE would depend on the width (number of ‘value bits’) of the type of integer you were interested in; if unspecified, you have more work to find that out!
+ No bit setting
+ Language independent
+ Can be adjusted for different number types (int, float, etc)
- Almost certainly not your C homework answer (which is likely to be about bits)
Expand a-b:
a-b = a + (-b)
= a + (-1).b
Manufacture -1:
float (with math.h):  pi = 2.0 * asin(1.0);
                      minusone_flt = sin(3.0/2.0*pi);
                      or = cos(pi)
                      or = log10(0.1)
complex:              minusone_cpx = (0,1)**2; // i squared
integer:              minusone_int = 0; minusone_int--; // or convert one of the floats above
+ No bit setting
+ Language independent
+ Independent of number type (int, float, etc)
- Requires a>b (ie positive result)
- Almost certainly not your C homework answer (which is likely to be about bits)
a - b = c
restricting ourselves to the number space 0 <= c < (a+b):
(a - b) mod(a+b) = c mod(a+b)
a mod(a+b) - b mod(a+b) = c mod(a+b)
simplifying the second term:
(-b).mod(a+b) = (a+b-b).mod(a+b)
= a.mod(a+b)
substituting:
a.mod(a+b) + a.mod(a+b) = c.mod(a+b)
2a.mod(a+b) = c.mod(a+b)
if a>b, then a-b>0, so:
c.mod(a+b) = c
c = 2a.mod(a+b)
So, if a is always greater than b, then this would work.
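As a quick check of the derivation, here is a sketch under the stated restriction. It assumes a >= b > 0 (with b = 0 the result a is not smaller than a + b, so the identity breaks down) and that 2 * a does not wrap around; sub_mod is a hypothetical name.

```c
/* Hypothetical helper: a - b computed as 2a mod (a + b).
   Assumes a >= b > 0 and that 2 * a fits in an unsigned. */
unsigned sub_mod(unsigned a, unsigned b)
{
    return (2u * a) % (a + b);
}
```

For a = 50 and b = 34 this evaluates 100 % 84, which is 16.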
Given that encoding integers to support two's complement is not mandated in C, iterate until done. If they want you to jump through flaming hoops, no need to be efficient about it!
int subtract(int a, int b)
{
if ( b < 0 )
return a+abs(b);
while (b-- > 0)
--a;
return a;
}
Silly question... probably silly interview!
For subtracting in C two integers you only need:
int subtract(int a, int b)
{
return a + (~b) + 1;
}
I don't believe there is a simple and elegant solution for float or double numbers as there is for integers. You could instead transform your float numbers into arrays and apply an algorithm similar to the one simulated here.
If you want to do it for floats, start from a positive number and change its sign bit like so:
float f = 3;
*(int*)&f |= 0x80000000;
// now f is -3.
float m = 4 + f;
// m = 1
You can also do this for doubles using the appropriate 64-bit integer. In Visual Studio this is __int64, for instance.
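The pointer cast above breaks the strict aliasing rules, so a compiler is free to miscompile it. A sketch of the same idea with memcpy, assuming float is a 32-bit IEEE 754 value, is shown below; flip_sign is a made-up name, and unlike the |= above it toggles the sign bit rather than always setting it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns f with its sign bit flipped.
   Assumes float is a 32-bit IEEE 754 binary32 value. */
float flip_sign(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);   /* well defined, unlike *(int*)&f */
    bits ^= UINT32_C(0x80000000);     /* toggle the sign bit */
    memcpy(&f, &bits, sizeof bits);
    return f;
}
```

With this, 4 + flip_sign(3) evaluates to 1, as in the example above.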
I suppose this
b - a = ~( a + ~b)
Assembly (accumulator) style:
int result = a;
result -= b;
As the question asked for integers not ints, you could implement a small interpreter that uses Church numerals.
Create a lookup table for every possible case of int-int!
Not tested. Without using 2's complement:
#include <stdlib.h>
#include <stdio.h>
int sillyNegate(int x) {
if (x <= 0)
return abs(x);
else {
// setlocale(LC_ALL, "C"); // if necessary.
char buffer[256];
snprintf(buffer, 255, "%c%d", 0x2d, x);
sscanf(buffer, "%d", &x);
return x;
}
}
Assuming the length of an int is much less than 255, and the snprintf/sscanf round-trip won't produce any unspecified behavior (right? right?).
The subtraction can be computed using a - b == a + (-b).
Alternative:
#include <math.h>
int moreSillyNegate(int x) {
return x * ilogb(0.5); // ilogb(0.5) == -1;
}
This would work using integer overflow:
#include<limits.h>
int subtractWithoutMinusSign(int a, int b){
return a + (b * (INT_MAX + INT_MAX + 1));
}
This also works for floats (assuming you make a float version…)
Within the range of a data type, the bitwise complement (~) of a value gives its negation decreased by 1. For example:
~1 --------> -2
~2 --------> -3
and so on. I will show this observation using a small code snippet:
#include<stdio.h>
int main()
{
int a , b;
a=10;
b=~a; // b-----> -11
printf("%d\n",a+~b+1);// equivalent to a-b
return 0;
}
Output: 21
Note: This is valid only within the range of the data type; for the int data type, this rule applies only to values in the range [-2,147,483,648 to 2,147,483,647].
Thank you. I hope this helps.
Iff:
The Minuend is greater than or equal to 0, or
The Subtrahend is greater than or equal to 0, or
The Subtrahend and the Minuend are both less than 0
multiply the Subtrahend by -1 and add the result to the Minuend:
MIN + (SUB * -1)
Else multiply the Subtrahend by 1 and add the result to the Minuend.
MIN + (SUB * 1)
Example (Try it online):
#include <stdio.h>
int subtract (int a, int b)
{
if ( a >= 0 || b >= 0 || ( a < 0 && b < 0 ) )
{
return a + (b * -1);
}
return a + (b * 1);
}
int main (void)
{
int x = -1;
int y = -5;
printf("%d - %d = %d", x, y, subtract(x, y) );
}
Output:
-1 - -5 = 4
int num1, num2, count = 0;
Console.WriteLine("Enter two numbers");
num1 = int.Parse(Console.ReadLine());
num2 = int.Parse(Console.ReadLine());
if (num1 < num2)
{
num1 = num1 + num2;
num2 = num1 - num2;
num1 = num1 - num2;
}
for (; num2 < num1; num2++)
{
count++;
}
Console.WriteLine("The difference is " + count);
#include <stdio.h>
int main(void)
{
int a=5;
int b=7;
while(b--)a--;
printf("sub=%d\n",a);
return 0;
}
