As you've read, I created a decimal to binary program and it works well, but it cannot handle user input equal to 100,000,000. My solution is to print each character as it goes, but I do not know what the appropriate loop to use is, and I am also not that great with the math so the main formula to be used is unclear to me. Arrays are not allowed. Any advice is appreciated. Thank you.
#include <stdio.h>

unsigned long long int input, inp, rem = 0, ans = 0, place_value = 1;

int main()
{
    printf("\nYou have chosen Decimal to Binary and Octal Conversion!\n");
    printf("Enter a decimal number:\n");
    scanf("%llu", &input);
    inp = input;
    while (input) {
        rem = input % 2;
        input = input / 2;
        ans = ans + (rem * place_value);
        place_value = place_value * 10;
    }
    printf("%llu in Decimal is %llu in Binary Form.\n", inp, ans);
    return 0;
}
Edit: I have already read all your answers and I have done my best to understand them. I was able to understand most of what was brought up, but some terms or lessons mentioned will require more time from me to learn. I have already submitted my output without solving the 100,000,000 issue, but I intend to use the knowledge I have now to create better outputs. I tried asking a friend of mine and he told me he was able to do it using method 2 found here: https://www.wikihow.com/Convert-from-Decimal-to-Binary. Perhaps my instructor simply wanted to teach us how to fully utilize control structures and data types, which is why there are so many restrictions. Thank you all for your time and God bless.
So as the comments have explained, the decimal number 100000000 has the 27-bit binary representation 101111101011110000100000000. We can therefore store that in a 32-bit int with no problem. But if we were to try to store the decimal number 101111101011110000100000000, which just happens to look like a binary number, well, that would require 87 bits, so it won't even fit into a 64-bit long long integer.
And the code in this question does try to compute its result, ans, as a decimal number which just happens to look like a binary number. And for that reason this code can't work for numbers larger than 1048575 (assuming a 64-bit unsigned long long int).
And this is one reason that "decimal to binary" conversion (or, for that matter, conversion to any base) should normally not be done to a result variable that's an integer. Normally, the result of such a conversion — to any base — should either be done to a result variable that's a string, or it should be printed out immediately. (The moral here is that the base only matters when a number is printed out for a human to read, which implies either a string, and/or something printed to, say, stdout.)
However, in C a string is of course an array. So asking someone to do base conversion without using arrays is a perverse, pointless exercise.
If you print the digits out immediately, you don't have to store them in an array. But the standard algorithm, repeated division by 2 (or whatever the base is), generates digits in reverse order, from least significant to most significant, which ends up being right-to-left, the wrong order for printing them as you go. Conventional convert-to-digits code usually stores the computed digits into an array, and then reverses the array — but if there's a prohibition against using arrays, this strategy is (again pointlessly) denied to us.
The other way to get the digits out in the other order is to use a recursive algorithm, as @chux has demonstrated in his answer.
But just to be perverse in my own way, I'm going to show another way to do it.
Even though it's generally a horrible idea, constructing the digits into an integer, that's in base 10 but looks like it's in base 2, is at least one way to store things up and get the answer back out with the digits in the right order. The only problem is that, as we've seen, the number can get outrageously big, especially for base 2. (The other problem, not that it matters here, is that this approach won't work for bases greater than 10, since there's obviously no way to construct a decimal number that just happens to look like it's in, say, base 16.)
The question is, how can we represent integers that might be as big as 87 bits? And my answer is, we can use what's called "multiple precision arithmetic". For example, if we use a pair of 64-bit unsigned long long int variables, we can theoretically represent numbers up to 128 bits in size, or 340282366920938463463374607431768211455!
Multiple precision arithmetic is an advanced but fascinating and instructive topic. Normally it uses arrays, too, but if we limit ourselves to just two "halves" of our big numbers, and make certain other simplifications, we can do it pretty simply, and achieve something just powerful enough to solve the problem in the question.
So, to repeat, we're going to represent a 128-bit number as a "high half" and a "low half". Actually, to keep things simpler, it's not quite going to be a 128-bit number: the "high half" is going to be the first 18 digits of a 36-digit decimal number, and the "low half" is going to be the other 18 digits. This will give us the equivalent of only about 120 bits, but it will still be plenty for our purposes.
So how do we do arithmetic on 36-digit numbers represented as "high" and "low" halves? Actually, it ends up being more or less the same way we learned to do pencil-and-paper arithmetic on numbers represented as digits.
If I have one of these "big" numbers, in its two halves:
high1 low1
and if I have a second one, also in two halves:
high2 low2
and if I want to compute the sum
high1 low1
+ high2 low2
-----------
high3 low3
the way I do it is to add low1 and low2 to get the low half of the sum, low3. If low3 is less than 1000000000000000000 — that is, if it has 18 digits or less — I'm okay, but if it's bigger than that, I have a carry into the next column. And then to get the high half of the sum, high3, I just add high1 plus high2 plus the carry, if any.
Multiplication is harder, but it turns out for this problem we're never going to have to compute a full 36-digit × 36-digit product. We're only ever going to have to multiply one of our big numbers by a small number, like 2 or 10. The problem will look like this:
high1 low1
× fac
-----------
high3 low3
So, again by the rules of paper-and-pencil arithmetic we learned long ago, low3 is going to be low1 × fac, and high3 is going to be high1 × fac, again with a possible carry.
The next question is how we're going to carry these low and high halves around. As I said, normally we'd use an array, but we can't here. The second choice might be a struct, but you may not have learned about those yet, and if your crazy instructor won't let you use arrays, it seems that using structures might well be out of bounds, also. So we'll just write a few functions that accept high and low halves as separate arguments.
Here's our first function, to add two 36-digit numbers. It's actually pretty simple:
void long_add(unsigned long long int *hi, unsigned long long int *lo,
              unsigned long long int addhi, unsigned long long int addlo)
{
    *hi += addhi;
    *lo += addlo;
}
The way I've written it, it doesn't compute c = a + b; it's more like a += b. That is, it takes addhi and addlo and adds them in to hi and lo, modifying hi and lo in the process. So hi and lo are passed in as pointers, so that the pointed-to values can be modified. The high half is *hi, and we add in the high half of the number to be added in, addhi. And then we do the same thing with the low half. And then — whoops — what about the carry? That's not too hard, but to keep things nice and simple, I'm going to defer it to a separate function. So my final long_add function looks like:
void long_add(unsigned long long int *hi, unsigned long long int *lo,
              unsigned long long int addhi, unsigned long long int addlo)
{
    *hi += addhi;
    *lo += addlo;
    check_carry(hi, lo);
}
And then check_carry is simple, too. It looks like this:
void check_carry(unsigned long long int *hi, unsigned long long int *lo)
{
    if(*lo >= 1000000000000000000ULL) {
        int carry = *lo / 1000000000000000000ULL;
        *lo %= 1000000000000000000ULL;
        *hi += carry;
    }
}
Again, it accepts pointers to lo and hi, so that it can modify them.
The low half is *lo, which is supposed to be at most an 18-digit number, but if it's got 19 digits (that is, if it's greater than or equal to 1000000000000000000), it has overflowed, and we have to do the carry thing. The carry is the extent by which *lo exceeds 18 digits — it's actually just the top 19th (and any greater) digit(s). If you're not super comfortable with this kind of math, it may not be immediately obvious that taking *lo and dividing it by that big number (it's literally 1 with eighteen 0's) will give you the top 19th digit, or that using % will give you the low 18 digits, but that's exactly what / and % do, and this is a good way to learn that.
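If it helps, here's a tiny stand-alone demonstration of that behavior; the value is arbitrary, just one digit too long:

#include <stdio.h>

int main(void)
{
    /* A 19-digit value: one digit more than our 18-digit "low half" allows. */
    unsigned long long int lo = 1234000000000000005ULL;

    /* Dividing by 1 followed by eighteen 0's peels off the 19th digit (the carry);
       the remainder is the low 18 digits. */
    printf("carry = %llu\n", lo / 1000000000000000000ULL);   /* prints 1 */
    printf("low   = %llu\n", lo % 1000000000000000000ULL);   /* prints 234000000000000005 */
    return 0;
}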
In any case, having computed the carry, we add it in to *hi, and we're done.
So now we're done with addition, and we can tackle multiplication. For our purposes, it's just about as easy:
void long_multiply(unsigned long long int *hi, unsigned long long int *lo,
                   unsigned int fac)
{
    *hi *= fac;
    *lo *= fac;
    check_carry(hi, lo);
}
It looks eerily similar to the addition case, but it's just what our pencil-and-paper analysis said we were going to have to do. (Again, this is a simplified version.) We can re-use the same check_carry function, and that's why I chose to break it out as a separate function.
With these functions in hand, we can now rewrite the decimal-to-binary program so that it will work with these even bigger numbers:
int main()
{
    unsigned int inp, input;
    unsigned long long int anslo = 0, anshi = 0;
    unsigned long long int place_value_lo = 1, place_value_hi = 0;

    printf("Enter a decimal number:\n");
    scanf("%u", &input);
    inp = input;

    while (input) {
        int rem = input % 2;
        input = input / 2;

        // ans=ans+(rem*place_value);
        unsigned long long int tmplo = place_value_lo;
        unsigned long long int tmphi = place_value_hi;
        long_multiply(&tmphi, &tmplo, rem);
        long_add(&anshi, &anslo, tmphi, tmplo);

        // place_value=place_value*10;
        long_multiply(&place_value_hi, &place_value_lo, 10);
    }

    printf("%u in Decimal is ", inp);
    if (anshi == 0)
        printf("%llu", anslo);
    else
        printf("%llu%018llu", anshi, anslo);
    printf(" in Binary Form.\n");
}
This is basically the same program as in the question, with these changes:
The ans and place_value variables have to be greater than 64 bits, so they now exist as _hi and _lo halves.
We're calling our new functions to do addition and multiplication on big numbers.
We need a tmp variable (actually tmp_hi and tmp_lo) to hold the intermediate result in what used to be the simple expression ans = ans + (rem * place_value);.
There's no need for the user's input variable to be big, so I've reduced it to a plain unsigned int.
There's also some mild trickiness involved in printing the two halves of the final answer, anshi and anslo, back out. But if you compile and run this program, I think you'll find it now works for any input numbers you can give it. (It should theoretically work for inputs up to 68719476735 or so, which is bigger than will fit in a 32-bit input inp.)
Also, for those still with me, I have to add a few disclaimers. The only reason I could get away with writing long_add and long_multiply functions that looked so small and simple was that they are simple, and work only for "easy" problems, without undue overflow. I chose 18 digits as the maximum for the "high" and "low" halves because a 64-bit unsigned long long int can actually hold numbers up to the equivalent of 19 digits, and that means that I can detect overflow — of up to one digit — simply, with that >= 1000000000000000000ULL test. If any intermediate result ever overflowed by two digits, I'd have been in real trouble. But for simple additions, there's only ever a single-digit carry. And since I'm only ever doing tiny multiplications, I could cheat and assume (that is, get away with) a single-digit carry there, too.
If you're trying to do multiprecision arithmetic in full generality, for multiplication you have to consider partial products that have up to twice as many digits/bits as their inputs. So you either need to use an output type that's twice as wide as the inputs, or you have to split the inputs into halves ("sub-halves"), and work with them individually, basically doing a little 2×2 problem, with various carries, for each "digit".
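To make that second option concrete, here's a minimal sketch, in binary with 32-bit "sub-halves" rather than the decimal limbs used above, of a full 64×64-bit multiply in which no intermediate product ever overflows 64 bits:

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

/* Multiply two 64-bit values into a 128-bit (hi, lo) result using 32-bit halves. */
void mul_64x64_128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* low x low */
    uint64_t p1 = a_lo * b_hi;   /* cross products */
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;   /* high x high */

    uint64_t mid  = p1 + (p0 >> 32);        /* cannot overflow 64 bits */
    uint64_t mid2 = p2 + (uint32_t)mid;     /* cannot overflow 64 bits */

    *lo = (mid2 << 32) | (uint32_t)p0;
    *hi = p3 + (mid >> 32) + (mid2 >> 32);
}

int main(void)
{
    uint64_t hi, lo;
    mul_64x64_128(UINT64_MAX, UINT64_MAX, &hi, &lo);
    printf("%" PRIx64 " %016" PRIx64 "\n", hi, lo);   /* fffffffffffffffe 0000000000000001 */
    return 0;
}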
Another problem with multiplication is that the "obvious" algorithm, the one based on the pencil-and-paper technique everybody learned in elementary school, can be unacceptably inefficient for really big problems, since it's basically O(N²) in the number of digits.
People who do this stuff for a living have lots of more-sophisticated techniques they've worked out, for things like detecting overflow and for doing multiplication more efficiently.
And then if you want some real fun (or a real nightmare, full of bad flashbacks to elementary school), there's long division...
OP's code suffers from overflow in place_value*10
A way to avoid both the no-array restriction and the range limitation is to use recursion.
Perhaps beyond where OP is now.
#include <stdio.h>

void print_lsbit(unsigned long long x) {
    if (x > 1) {
        print_lsbit(x / 2); // Print more significant digits first
    }
    putchar(x % 2 + '0'); // Print the LSBit
}
int main(void) {
    printf("\nYou have chosen Decimal to Binary and Octal Conversion!\n");
    printf("Enter a decimal number:\n");
    //scanf("%llu", &input);
    unsigned long long input = 100000000;
    printf("%llu in Decimal is ", input);
    print_lsbit(input);
    printf(" in Binary Form.\n");
    return 0;
}
Output
You have chosen Decimal to Binary and Octal Conversion!
Enter a decimal number:
100000000 in Decimal is 101111101011110000100000000 in Binary Form.
I'm struggling to understand the behavior of gcc here. The size of a float is 4 bytes on my architecture, but I can still store an 8-byte real value in a float, and my compiler says nothing about it.
For example I have :
#include <stdio.h>

int main(int argc, char** argv){
    float someFloatNumb = 0xFFFFFFFFFFFF;
    printf("%i\n", sizeof(someFloatNumb));
    printf("%f\n", someFloatNumb);
    printf("%i\n", sizeof(281474976710656));
    return 0;
}
I expected the compiler to insult me, or to display a disclaimer of some sort, because I shouldn't be able to do something like that; at least, I think it's kind of twisted wizardry.
The program simply runs:
4
281474976710656.000000
8
So, if I print the size of someFloatNumb, I get 4 bytes, which is expected. But the assigned value isn't limited by that, as the output shows.
So I have a few questions:
Does sizeof(variable) simply get the variable type and return sizeof(type), which in this case would explain the result?
Does/Can gcc grow the capacity of a type? (managing multiple variables behind the curtains to allow us that sort of thing)
1)
Does sizeof(variable) simply get the variable type and return sizeof(type), which in this case would explain the result ?
Except for variable-length arrays, sizeof doesn't evaluate its operand. So yes, all it cares about is the type. So sizeof(someFloatNumb) is 4, which is equivalent to sizeof(float). This explains printf("%i\n", sizeof(someFloatNumb));.
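A tiny demonstration of that, in case it helps:

#include <stdio.h>

int main(void)
{
    int i = 0;
    /* sizeof only inspects the type of its operand; the expression
       itself is never evaluated, so i is not incremented here. */
    printf("%zu\n", sizeof(i++));
    printf("%d\n", i);   /* still prints 0 */
    return 0;
}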
2)
[..] But I can still store an 8-byte real value in a float, and my compiler says nothing about it.
Does/Can gcc grow the capacity of a type ? (managing multiple variables behind the curtains to allow us that sort of things)
No. Capacity doesn't grow. You simply misunderstood how floats are represented/stored. sizeof(float) being 4 doesn't mean
it can't store more than 2^32 (assuming 1 byte == 8 bits). See Floating point representation.
The maximum value a float can represent is defined by the constant FLT_MAX (see <float.h>). sizeof(someFloatNumb) simply yields how many bytes the object (someFloatNumb) takes up in memory, which isn't necessarily equal to the range of values it can represent.
This explains why printf("%f\n", someFloatNumb); prints the value as expected (and there's no automatic "capacity growth").
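A minimal program contrasting the two; the exact value of FLT_MAX is implementation-defined, but it's about 3.4e38 for the common IEEE-754 single-precision format:

#include <stdio.h>
#include <float.h>

int main(void)
{
    /* The size in memory and the range of representable values are
       two different things. */
    printf("sizeof(float) = %zu\n", sizeof(float));
    printf("FLT_MAX       = %e\n", FLT_MAX);
    return 0;
}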
3)
printf("%i\n", sizeof(281474976710656));
This is slightly more involved. As said before in (1), sizeof only cares about the type here. But the type of 281474976710656 is not necessarily int.
The C standard defines the type of integer constants according to the smallest type that can represent the value. See https://stackoverflow.com/a/42115024/1275169 for an explanation.
On my system 281474976710656 can't be represented in an int, so it's stored in a long int, which is likely to be the case on your system as well. So what you see is essentially equivalent to sizeof(long).
There's no portable way to determine the type of integer constants. But since you are using gcc, you could use a little trick with typeof:
typeof(281474976710656) x;
printf("%s", x); /* deliberately using '%s' to generate warning from gcc. */
generates:
warning: format ‘%s’ expects argument of type ‘char *’, but argument 2
has type ‘long int’ [-Wformat=]
printf("%s", x);
P.S.: sizeof yields a size_t, for which the correct format specifier is %zu. So that's what you should be using in your 1st and 3rd printf statements.
This doesn't store "8 bytes" of data; that value is an integer constant, which gets converted to a float for the assignment:
float someFloatNumb = 0xFFFFFFFFFFFF; // 6 bytes of data
Since float can represent large values, this isn't a big deal, but you will lose a lot of precision if you're only using 32-bit floats. Notice there's a slight but important difference here:
float value = 281474976710656.000000;
long long value = 281474976710655;
This is because float becomes an approximation when it runs out of precision.
Capacities don't "grow" for standard C types. You'll have to use a "bignum" library for that.
But I can still store an 8-byte real value in a float, and my compiler
says nothing about it.
That's not what's happening.
float someFloatNumb = 0xFFFFFFFFFFFF;
0xFFFFFFFFFFFF is an integer constant. Its value, expressed in decimal, is 281474976710655, and its type is probably either long or long long. (Incidentally, that value can be stored in 48 bits, but most systems don't have a 48-bit integer type, so it will probably be stored in 64 bits, of which the high-order 16 bits will be zero.)
When you use an expression of one numeric type to initialize an object of a different numeric type, the value is converted. This conversion doesn't depend on the size of the source expression, only on its numeric value. For an integer-to-float conversion, the result is the closest representation to the integer value. There may be some loss of precision (and in this case, there is). Some compilers may have options to warn about loss of precision, but the conversion is perfectly valid so you probably won't get a warning by default.
Here's a small program to illustrate what's going on:
#include <stdio.h>

int main(void) {
    long long ll = 0xFFFFFFFFFFFF;
    float f = 0xFFFFFFFFFFFF;
    printf("ll = %lld\n", ll);
    printf("f = %f\n", f);
}
The output on my system is:
ll = 281474976710655
f = 281474976710656.000000
As you can see, the conversion has lost some precision. 281474976710656 is an exact power of two, and floating-point types generally can represent those exactly. There's a very small difference between the two values because you chose an integer value that's very close to one that can be represented exactly. If I change the value:
#include <stdio.h>

int main(void) {
    long long ll = 0xEEEEEEEEEEEE;
    float f = 0xEEEEEEEEEEEE;
    printf("ll = %lld\n", ll);
    printf("f = %f\n", f);
}
the apparent loss of precision is much larger:
ll = 262709978263278
f = 262709979381760.000000
0xFFFFFFFFFFFF == 281474976710655
If you init a float with that value, it will end up being
0xFFFFFFFFFFFF +1 == 0x1000000000000 == 281474976710656 == 1<<48
That fits easily in a 4-byte float: simple mantissa, small exponent.
It does however NOT store the correct value (one lower) because that IS hard to store in a float.
Note that the " +1" does not imply incrementation. It ends up one higher because the representation can only get as close as off-by-one to the attempted value. You may consider that "rounding up to the next power of 2 mutliplied by whatever the mantisse can store". Mantisse, by the way, usually is interpreted as a fraction between 0 and 1.
Getting closer would indeed require the 48 bits of your initialisation in the mantissa; plus whatever number of bits would be used to store the exponent; and maybe a few more for other details.
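For the curious, a small sketch showing the mantissa width and the resulting rounding, assuming the usual IEEE-754 single-precision float:

#include <stdio.h>
#include <float.h>

int main(void)
{
    /* float has FLT_MANT_DIG (typically 24) mantissa bits, so the 48
       significant bits of 0xFFFFFFFFFFFF cannot all be stored; the value
       is rounded to the nearest representable float, 1 << 48. */
    float f = 0xFFFFFFFFFFFF;
    printf("FLT_MANT_DIG = %d\n", FLT_MANT_DIG);
    printf("f = %.1f\n", f);   /* 281474976710656.0 */
    return 0;
}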
Look at the value printed... 0xFFFF...FFFF is an odd value, but the value printed in your example is even. You are feeding the float variable with an int value that is converted to float. The conversion is losing precision, as expected, because the value doesn't fit in the 23 bits reserved for the target variable's mantissa. You finally get an approximation, which is the value 0x1000000....0000 (the next value up, which is the closest representable value to the one you used, as @Yunnosch posted in his answer).
I need to transfer a double value (-47.1235648, for example) using sockets. Since I'll have a lot of platforms, I must convert to network byte order to ensure correct endianness on all ends... but the conversion functions don't accept double, just integer and short, so I'm 'cutting' my double into two integers to transfer, like this:
double lat = -47.848945;
int a;
int b;
a = (int)lat;
b = (int)(lat+1);
Now, I need to restore this on the other end, but using as little computation as possible (I saw some examples using POW, but it looks like pow uses a lot of resources for this, I'm not sure). Is there any way to join these as simply as possible, like bit manipulation?
Your code makes no sense.
The typical approach is to use memcpy():
const double lat = -47.848945;
uint32_t ints[sizeof lat / sizeof (uint32_t)];
memcpy(ints, &lat, sizeof lat);
Now send the elements of ints, which are just 32-bit unsigned integers.
This of course assumes:
That you know how to send uint32_ts in a safe manner, i.e. byte per byte or using endian-conversion functions (a minimal sketch follows after this list).
That all hosts share the same binary double format (typically IEEE-754).
That you somehow can manage the byte order requirements when moving to/from a pair of integers from/to a single double value (see @JohnBollinger's answer).
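For the first assumption, here's a minimal sketch of writing one uint32_t into a buffer byte per byte in big-endian (network) order, independent of the host's endianness; the actual send call is up to you:

#include <stdint.h>
#include <stdio.h>

/* Serialize a uint32_t in big-endian (network) byte order by hand. */
static void put_u32_be(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)(v);
}

int main(void)
{
    unsigned char buf[4];
    put_u32_be(buf, 0x11223344u);
    printf("%02X %02X %02X %02X\n", buf[0], buf[1], buf[2], buf[3]);   /* 11 22 33 44 */
    return 0;
}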
I interpreted your question to mean all of these assumptions were safe; that might be a bit over the top. I can't delete this answer as long as it's accepted.
It's good that you're considering differences in numeric representation, but your idea for how to deal with this problem just doesn't work reliably.
Let us suppose that every machine involved uses 64-bit IEEE-754 format for its double representation. (That's already a potential point of failure, but in practice you probably don't have to worry about failures there.) You seem to postulate that the byte order for machines' doubles will map in a consistent way onto the byte order for their integers, but that is not a safe assumption. Moreover, even where that assumption holds true, you need exactly the right kind of mapping for your scheme to work, and that is not only not safe to assume, but very plausibly will not be what you actually see.
For the sake of argument, suppose machine A, which features big-endian integers, wants to transfer a double value to machine B, which features little-endian integers. Suppose further that on B, the byte order for its double representation is the exact reverse of the order on A (which, again, is not safe to assume). Thus, if on A, the bytes of that double are in the order
S T U V W X Y Z
then we want them to be in order
Z Y X W V U T S
on B. Your approach is to split the original into a pair (STUV, WXYZ), transfer the pair in a value-preserving manner to get (VUTS, ZYXW), and then put the pair back together to get ... uh oh ...
V U T S Z Y X W
Don't imagine fixing that by first swapping the pair. That doesn't serve your purpose because you must avoid such a swap in the event that the two communicating machines have the same byte order, and you have no way to know from just the 8 bytes whether such a swap is needed. Thus even if we make simplifying assumptions that we know to be unsafe, your strategy is insufficient for the task.
Alternatives include:
transfer your doubles as strings.
transfer your doubles as integer (significand, scale) pairs. The frexp() and ldexp() functions can help with encoding and decoding such representations.
transfer an integer-based fixed-point representation of your doubles (the same as the previous option, but with a pre-determined scale that is not transferred)
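For the second option, a rough sketch of the (significand, scale) idea using frexp() and ldexp(), assuming a double with a 53-bit significand:

#include <stdio.h>
#include <math.h>
#include <stdint.h>

int main(void)
{
    double lat = -47.848945;

    /* frexp() splits the double into a fraction whose magnitude is in
       [0.5, 1) and a power-of-two exponent; scaling the fraction by 2^53
       gives an integer significand that can be sent as a plain integer. */
    int exp;
    double frac = frexp(lat, &exp);            /* lat == frac * 2^exp */
    int64_t sig = (int64_t)ldexp(frac, 53);    /* integer significand */

    /* ...send sig and (exp - 53) as integers, then rebuild on the far side: */
    double rebuilt = ldexp((double)sig, exp - 53);

    printf("original = %.17g\nrebuilt  = %.17g\n", lat, rebuilt);
    return 0;
}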
I need to transfer a double value (-47.1235648, for example) using sockets.
If the platforms have potentially different encodings for double, then sending the bit pattern of the double is a problem. If the code wants portability, something other than a "just copy the bits" approach is needed. An alternative is below.
If platforms always have the same double format, just copy the n bits; see @Rishikesh Raje's answer for an example.
In detail, OP's problem is only loosely defined. On many platforms a double is a binary64, yet this is not required by C. That double can represent about 2^64 different values exactly. Neither -47.1235648 nor -47.848945 is one of those. So it is possible OP does not have a strong precision concern.
"using the minimum computation as possible" implies minimal code, usually to have minimal time. For speed, any solution should be rated on order of complexity and with code profiling.
A portable method is to send via a string. This approach addresses correctness and best possible precision first and performance second. It removes endian issues, as data is sent via a string, and there is no precision/range loss in sending the data. The receiving side, if using the same double format, will re-form the double exactly. With a different double format, it has a good string representation to do the best it can.
#include <float.h>
#include <limits.h>
#include <stdio.h>

// some ample sized buffer
#define N (sizeof(double)*CHAR_BIT)

double x = foo();
char buf[N];

#if FLT_RADIX == 10
// Rare base-10 platforms
// If macro DBL_DECIMAL_DIG is not available, use (DBL_DIG+3)
sprintf(buf, "%.*e", DBL_DECIMAL_DIG-1, x);
#else
// Print the mantissa in hexadecimal notation with a power-of-2 exponent
sprintf(buf, "%a", x);
#endif
bar_send_string(buf);
To reconstitute the double
char *s = foo_get_string();
double y;
// %lf decodes strings in decimal (f), exponential (e), or hexadecimal/exponential (a) notation
if (sscanf(s, "%lf", &y) != 1) Handle_Error(s);
else use(y);
A much better idea would be to send the double directly as 8 bytes in network byte order.
You can use a union
#include <stdint.h>
#include <string.h>

typedef union
{
    double a;
    uint8_t bytes[8];
} DoubleUnionType;

DoubleUnionType DoubleUnion;
//Assign the double by
DoubleUnion.a = -47.848945;
Then you can make a network byte order conversion function
void htonfl(uint8_t *out, uint8_t *in)
{
#if LITTLE_ENDIAN // Use macro name as per architecture
    out[0] = in[7];
    out[1] = in[6];
    out[2] = in[5];
    out[3] = in[4];
    out[4] = in[3];
    out[5] = in[2];
    out[6] = in[1];
    out[7] = in[0];
#else
    memcpy (out, in, 8);
#endif
}
And call this function before transmission and after reception.
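A brief usage sketch, continuing the snippets above (and assuming the LITTLE_ENDIAN test is set up correctly for your build):

uint8_t wire[8];

DoubleUnion.a = -47.848945;
htonfl(wire, DoubleUnion.bytes);     // before transmission: host -> network order
// ... send wire[0..7] over the socket, receive into wire on the other host ...
htonfl(DoubleUnion.bytes, wire);     // after reception: the byte swap is its own inverse
printf("%f\n", DoubleUnion.a);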
I have the following code where I have an array. I assign a large number to an element of that array, but when printing it, it shows a smaller, incorrect value. Why is that, and is there a way to fix this?
int x[10];
x[0] = 252121521121;
printf(" %i " , x[0]); //prints short wrong value
Your number requires 38 bits. If your platform's int isn't that big (and there's no reason it should be), the number simply won't fit. (In fact, even the int literal should already have triggered a compiler warning, supposing that this is C or C++.)
You could always use a data type of guaranteed size, like an int64 or something like that, depending on your language and platform. Probably no need for arbitrary-precision libraries here.
In C, include <stdint.h> and use int64_t, or just use long long int, and make sure you initialize it from a long long integer literal, e.g. 252121521121LL. (Long longs are only officially part of the most recent language standards, I might add.)
(Edit: long long int is guaranteed to be at least 64 bits, so it should be a good choice.)
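For example, a minimal fix along those lines:

#include <stdio.h>

int main(void)
{
    /* A 64-bit type, a matching LL literal suffix, and the matching
       printf format avoid the truncation seen with a plain int. */
    long long int x = 252121521121LL;
    printf("%lld\n", x);   /* prints 252121521121 */
    return 0;
}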
An int, on most systems, is 32 bits. That's enough to store a number of about 2 billion signed, or 4 billion unsigned. To store larger numbers you need a larger form of int. (Unfortunately, on some systems a long int is the same as an int -- good ol' standardization -- so you need to go to a long long int. Better if you can find a typedef in your library such as int64_t.)
If you only have the problem with this particular number, then just use a long long int as suggested in previous answers.
Otherwise, for even larger numbers (>1E19 for signed numbers), you might want to switch to a large-number library or code this kind of data type yourself. You basically need to store each digit of your number in an array (or linked list) and manually code the basic operations you need on them: adding, subtracting, multiplying etc.
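As a tiny illustration of that approach, here's a sketch of digit-by-digit addition; the capacity and example values are arbitrary:

#include <stdio.h>

#define NDIGITS 40   /* arbitrary capacity for this sketch */

/* Each number is NDIGITS decimal digits, least significant digit first. */
void big_add(const int a[NDIGITS], const int b[NDIGITS], int sum[NDIGITS])
{
    int carry = 0;
    for (int i = 0; i < NDIGITS; i++) {
        int d = a[i] + b[i] + carry;   /* one pencil-and-paper column */
        sum[i] = d % 10;
        carry = d / 10;
    }
}

int main(void)
{
    int a[NDIGITS] = {9, 9, 9};   /* 999 */
    int b[NDIGITS] = {2};         /* 2 */
    int s[NDIGITS];
    big_add(a, b, s);
    for (int i = NDIGITS - 1; i >= 0; i--)
        putchar('0' + s[i]);      /* prints leading zeros, then 1001 */
    putchar('\n');
    return 0;
}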
Some libraries include
https://mattmccutchen.net/bigint/
or GMP.
Well, your number just seems to exceed the maximum value a 32-bit integer can hold.
long long int A = 3289168178315264;
long long int B = 1470960727228416;
double D = sqrt(5);
long long int out = A + B*D;
printf("%lld",out);
This gives result : -2147483648
I am not able to figure out why (it should be a positive result).
Can somebody help?
maybe you have to specify those constants as "long long" literals? e.g. 3289168178315264LL
What compiler/operating system are you using? I ran your code using Visual C++ 2008 Express Edition on Windows XP and IT WORKS - answer: 6578336356630528 (this is a 53-bit number, so it just fits inside a double).
I also tried two variations to see if the order of operations mattered:
long long int out = A;
out+=B*D;
long long int out = B*D;
out+=A;
These both work as well!
Curious.
My guess is that the compiler needs to round the result from "A+B*D" to an integer first, because you're storing the result inside an int field. So basically, you're having a datatype conflict.
Both A and B are still valid numbers for a long long int, which is 8 bytes long. You could even multiply them by 1000 and still have valid long long int values. In some other languages it's also known as an int64.
A double, however, is also 64 bits, but part of those are used as the exponent. When you multiply a double with an int64, the result would be another double. Adding another int64 to it still keeps it a double. Then you're assigning the result to an int64 again without using a specific rounding method. It would not surprise me if the compiler used a 4-bit rounding function for this. I'm even amazed that the compiler doesn't puke and break on that statement!
Anyways, when using large numbers like these, you have to be extra careful when mixing different types.
Your answer (I'd have to verify) calculates successfully; however, it causes an overflow into the sign bit, which makes the answer negative. Solution: make all your variables unsigned.
Why:
Numbers are stored as a series of bits in your computer's memory. The first bit in such a series, when set, means that your number is negative. So the calculation works, but overflows into the sign bit.
Recommendation:
If you're working with numbers this big, I recommend you get a multiprecision arithmetic library. It will save you a lot of time and trouble.
The parameter to sqrt should be double.
#include <math.h>
double sqrt( double num );
And we should also explicitly cast the result of B * D to long long.
#include <math.h>
#include <stdio.h>

int main(void)
{
    long long int A = 3289168178315264;
    long long int B = 1470960727228416;
    double D = sqrt(5.0);
    printf("%f\n", D);
    long long int out = A + (long long) (B * D);
    printf("%lld\n", out);
}