Division by 3 without division operator - c

I was given this question in an interview and asked to describe the output in the comments.
unsigned int d2(unsigned int a)
{
    __int64 q = (__int64)a * 0x0AAAAAAAB; // (2^33+1) / 3
    return (unsigned int)(q >> 33);
}
I have checked other questions on Stack Overflow related to division by 3, but none seems as fast and small.
Can anybody explain how the function produces the output described in the comments?

The function divides a 32-bit unsigned number by 3.
If you multiply by 2^33 and then divide by 2^33 (by right shifting), then you get the original number. But if you multiply by (2^33)/3 and then divide by 2^33, you effectively divide by three.
The last digit is B instead of A because the constant is 2^33/3 rounded up to the next integer; if it were rounded down instead, the truncated result after the shift would come out one too low for some inputs.
There is no need to actually write this in your code, because the compiler will usually do this optimization for you. Try it and see. (Furthermore, for a signed input the compiler can safely use an arithmetic right shift, whereas in C the result of right-shifting a negative value is implementation-defined.)
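If you want to convince yourself, here is a standalone test sketch; it uses the standard unsigned long long type rather than the Microsoft-specific __int64, and the sample values are just ones picked for illustration:

#include <assert.h>
#include <stdio.h>

/* Same trick as d2(), written with standard types. */
static unsigned int div3(unsigned int a)
{
    unsigned long long q = (unsigned long long)a * 0xAAAAAAABULL; /* (2^33+1)/3 */
    return (unsigned int)(q >> 33);
}

int main(void)
{
    /* Spot-check a few values, including one at the 32-bit limit. */
    unsigned int samples[] = { 0, 1, 2, 3, 4, 299, 1000000007u, 4294967295u };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(div3(samples[i]) == samples[i] / 3);
        printf("%u / 3 = %u\n", samples[i], div3(samples[i]));
    }
    return 0;
}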

I need to create a decimal to binary program that can receive input of up to 100,000,000 and output the whole answer without displaying rubbish

As you've read, I created a decimal to binary program and it works well, but it cannot handle user input equal to 100,000,000. My solution is to print each character as it goes, but I do not know what the appropriate loop to use is, and I am also not that great with the math so the main formula to be used is unclear to me. Arrays are not allowed. Any advice is appreciated. Thank you.
#include <stdio.h>

unsigned long long int input, inp, rem = 0, ans = 0, place_value = 1;

int main()
{
    printf("\nYou have chosen Decimal to Binary and Octal Conversion!\n");
    printf("Enter a decimal number:\n");
    scanf("%llu", &input);
    inp = input;
    while (input) {
        rem = input % 2;
        input = input / 2;
        ans = ans + (rem * place_value);
        place_value = place_value * 10;
    }
    printf("%llu in Decimal is %llu in Binary Form.\n", inp, ans);
    return 0;
}
Edit: I have already read all your answers and I have done my best to understand them. I was able to understand most of what was brought up but some terms or lessons mentioned will require more time from me to learn. I have already submitted my output without solving the 100,000,000 issue but I intend to use the knowledge I have now to create better outputs. I tried asking a friend of mine and he told me he was able to do it using method 2 found here: https://www.wikihow.com/Convert-from-Decimal-to-Binary. Perhaps my instructor simply wanted to teach us how to fully utilize control structures and data types which is why there are so many restrictions. Thank you all for your time and god bless.
So as the comments have explained, the decimal number 100000000 has the 27-bit binary representation 101111101011110000100000000. We can therefore store that in a 32-bit int with no problem. But if we were to try to store the decimal number 101111101011110000100000000, which just happens to look like a binary number, well, that would require 87 bits, so it won't even fit into a 64-bit long long integer.
And the code in this question does try to compute its result, ans, as a decimal number which just happens to look like a binary number. And for that reason this code can't work for numbers larger than 1048575 (assuming a 64-bit unsigned long long int).
And this is one reason that "decimal to binary" conversion (or, for that matter, conversion to any base) should normally not be done into a result variable that's an integer. The result of such a conversion, to any base, should either go into a result variable that's a string, or be printed out immediately. (The moral here is that the base only matters when a number is printed out for a human to read, which implies either a string or something printed to, say, stdout.)
However, in C a string is of course an array. So asking someone to do base conversion without using arrays is a perverse, pointless exercise.
If you print the digits out immediately, you don't have to store them in an array. But the standard algorithm, repeated division by 2 (or whatever the base is), generates digits in reverse order, from least-significant to most-significant, which ends up being right-to-left, which is the wrong order to just print them out. Conventional convert-to-digits code usually stores the computed digits into an array and then reverses the array; but if there's a prohibition against using arrays, this strategy is (again pointlessly) denied to us.
The other way to get the digits out in the other order is to use a recursive algorithm, as @chux has demonstrated in his answer.
But just to be perverse in my own way, I'm going to show another way to do it.
Even though it's generally a horrible idea, constructing the digits into an integer, that's in base 10 but looks like it's in base 2, is at least one way to store things up and get the answer back out with the digits in the right order. The only problem is that, as we've seen, the number can get outrageously big, especially for base 2. (The other problem, not that it matters here, is that this approach won't work for bases greater than 10, since there's obviously no way to construct a decimal number that just happens to look like it's in, say, base 16.)
The question is, how can we represent integers that might be as big as 87 bits? And my answer is, we can use what's called "multiple precision arithmetic". For example, if we use a pair of 64-bit unsigned long long int variables, we can theoretically represent numbers up to 128 bits in size, or 340282366920938463463374607431768211455!
Multiple precision arithmetic is an advanced but fascinating and instructive topic. Normally it uses arrays, too, but if we limit ourselves to just two "halves" of our big numbers, and make certain other simplifications, we can do it pretty simply, and achieve something just powerful enough to solve the problem in the question.
So, to repeat, we're going to represent a 128-bit number as a "high half" and a "low half". Actually, to keep things simpler, it's not really going to be a 128-bit number: the "high half" is going to be the first 18 digits of a 36-digit decimal number, and the "low half" is going to be the other 18 digits. This gives us the equivalent of only about 120 bits, but it will still be plenty for our purposes.
So how do we do arithmetic on 36-digit numbers represented as "high" and "low" halves? It ends up being more or less the same way we learned to do pencil-and-paper arithmetic on ordinary multi-digit numbers.
If I have one of these "big" numbers, in its two halves:
high1 low1
and if I have a second one, also in two halves:
high2 low2
and if I want to compute the sum
  high1   low1
+ high2   low2
--------------
  high3   low3
the way I do it is to add low1 and low2 to get the low half of the sum, low3. If low3 is less than 1000000000000000000 — that is, if it has 18 digits or less — I'm okay, but if it's bigger than that, I have a carry into the next column. And then to get the high half of the sum, high3, I just add high1 plus high2 plus the carry, if any.
Multiplication is harder, but it turns out for this problem we're never going to have to compute a full 36-digit × 36-digit product. We're only ever going to have to multiply one of our big numbers by a small number, like 2 or 10. The problem will look like this:
  high1   low1
×          fac
--------------
  high3   low3
So, again by the rules of paper-and-pencil arithmetic we learned long ago, low3 is going to be low1 × fac, and high3 is going to be high1 × fac, again with a possible carry.
The next question is how we're going to carry these low and high halves around. As I said, normally we'd use an array, but we can't here. The second choice might be a struct, but you may not have learned about those yet, and if your crazy instructor won't let you use arrays, it seems that using structures might well be out of bounds, also. So we'll just write a few functions that accept high and low halves as separate arguments.
Here's our first function, to add two 36-digit numbers. It's actually pretty simple:
void long_add(unsigned long long int *hi, unsigned long long int *lo,
              unsigned long long int addhi, unsigned long long int addlo)
{
    *hi += addhi;
    *lo += addlo;
}
The way I've written it, it doesn't compute c = a + b; it's more like a += b. That is, it takes addhi and addlo and adds them in to hi and lo, modifying hi and lo in the process. So hi and lo are passed in as pointers, so that the pointed-to values can be modified. The high half is *hi, and we add in the high half of the number to be added in, addhi. And then we do the same thing with the low half. And then — whoops — what about the carry? That's not too hard, but to keep things nice and simple, I'm going to defer it to a separate function. So my final long_add function looks like:
void long_add(unsigned long long int *hi, unsigned long long int *lo,
              unsigned long long int addhi, unsigned long long int addlo)
{
    *hi += addhi;
    *lo += addlo;
    check_carry(hi, lo);
}
And then check_carry is simple, too. It looks like this:
void check_carry(unsigned long long int *hi, unsigned long long int *lo)
{
    if (*lo >= 1000000000000000000ULL) {
        int carry = *lo / 1000000000000000000ULL;
        *lo %= 1000000000000000000ULL;
        *hi += carry;
    }
}
Again, it accepts pointers to lo and hi, so that it can modify them.
The low half is *lo, which is supposed to be at most an 18-digit number, but if it has 19 digits, that is, if it's greater than or equal to 1000000000000000000, that means it has overflowed, and we have to do the carry thing. The carry is the extent by which *lo exceeds 18 digits; it's actually just the 19th (and any higher) digit(s). If you're not super-comfortable with this kind of math, it may not be immediately obvious that taking *lo and dividing it by that big number (it's literally 1 with eighteen 0's) will give you the 19th-and-up digits, or that using % will give you the low 18 digits, but that's exactly what / and % do, and this is a good way to learn that. For example, if *lo has ended up as 2500000000000000000, the division gives a carry of 2, and the %= leaves 500000000000000000.
In any case, having computed the carry, we add it in to *hi, and we're done.
So now we're done with addition, and we can tackle multiplication. For our purposes, it's just about as easy:
void long_multiply(unsigned long long int *hi, unsigned long long int *lo,
                   unsigned int fac)
{
    *hi *= fac;
    *lo *= fac;
    check_carry(hi, lo);
}
It looks eerily similar to the addition case, but it's just what our pencil-and-paper analysis said we were going to have to do. (Again, this is a simplified version.) We can re-use the same check_carry function, and that's why I chose to break it out as a separate function.
With these functions in hand, we can now rewrite the decimal-to-binary program so that it will work with these even bigger numbers:
int main()
{
    unsigned int inp, input;
    unsigned long long int anslo = 0, anshi = 0;
    unsigned long long int place_value_lo = 1, place_value_hi = 0;

    printf("Enter a decimal number:\n");
    scanf("%u", &input);
    inp = input;
    while (input) {
        int rem = input % 2;
        input = input / 2;
        // ans = ans + (rem * place_value);
        unsigned long long int tmplo = place_value_lo;
        unsigned long long int tmphi = place_value_hi;
        long_multiply(&tmphi, &tmplo, rem);
        long_add(&anshi, &anslo, tmphi, tmplo);
        // place_value = place_value * 10;
        long_multiply(&place_value_hi, &place_value_lo, 10);
    }
    printf("%u in Decimal is ", inp);
    if (anshi == 0)
        printf("%llu", anslo);
    else
        printf("%llu%018llu", anshi, anslo);
    printf(" in Binary Form.\n");
}
This is basically the same program as in the question, with these changes:
The ans and place_value variables have to be greater than 64 bits, so they now exist as _hi and _lo halves.
We're calling our new functions to do addition and multiplication on big numbers.
We need a tmp variable (actually tmp_hi and tmp_lo) to hold the intermediate result in what used to be the simple expression ans = ans + (rem * place_value);.
There's no need for the user's input variable to be big, so I've reduced it to a plain unsigned int.
There's also some mild trickiness involved in printing the two halves of the final answer, anshi and anslo, back out. But if you compile and run this program, I think you'll find it now works for any input numbers you can give it. (It should theoretically work for inputs up to 68719476735 or so, which is bigger than will fit in a 32-bit input inp.)
Also, for those still with me, I have to add a few disclaimers. The only reason I could get away with writing long_add and long_multiply functions that looked so small and simple was that they are simple, and work only for "easy" problems, without undue overflow. I chose 18 digits as the maximum for the "high" and "low" halves because a 64-bit unsigned long long int can actually hold numbers up to the equivalent of 19 digits, and that means that I can detect overflow, of up to one digit, simply, with that >= 1000000000000000000ULL test. If any intermediate result ever overflowed by two digits, I'd have been in real trouble. But for simple additions, there's only ever a single-digit carry. And since I'm only ever doing tiny multiplications, I could cheat and assume (that is, get away with) a single-digit carry there, too.
If you're trying to do multiprecision arithmetic in full generality, for multiplication you have to consider partial products that have up to twice as many digits/bits as their inputs. So you either need to use an output type that's twice as wide as the inputs, or you have to split the inputs into halves ("sub-halves"), and work with them individually, basically doing a little 2×2 problem, with various carries, for each "digit".
Another problem with multiplication is that the "obvious" algorithm, the one based on the pencil-and-paper technique everybody learned in elementary school, can be unacceptably inefficient for really big problems, since it's basically O(N^2) in the number of digits.
People who do this stuff for a living have lots of more-sophisticated techniques they've worked out, for things like detecting overflow and for doing multiplication more efficiently.
And then if you want some real fun (or a real nightmare, full of bad flashbacks to elementary school), there's long division...
OP's code suffers from overflow in place_value*10
A way to avoid the array and range limitations is to use recursion.
Perhaps beyond where OP is now.
#include <stdio.h>

void print_lsbit(unsigned long long x) {
    if (x > 1) {
        print_lsbit(x / 2); // Print more significant digits first
    }
    putchar(x % 2 + '0'); // Print the LSBit
}

int main(void) {
    printf("\nYou have chosen Decimal to Binary and Octal Conversion!\n");
    printf("Enter a decimal number:\n");
    //scanf("%llu", &input);
    unsigned long long input = 100000000;
    printf("%llu in Decimal is ", input);
    print_lsbit(input);
    printf(" in Binary Form.\n");
    return 0;
}
Output
You have chosen Decimal to Binary and Octal Conversion!
Enter a decimal number:
100000000 in Decimal is 101111101011110000100000000 in Binary Form.

Why doesn't my code work when replacing 622.08E6 with 622080000?

I recently came across a C code (working by the way) where I found
freq_xtal = ((622.08E6 * vcxo_reg_val->hiv * vcxo_reg_val->n1)/(temp_rfreq));
From my intuition it seems that 622.08E6 should mean 622.08 x 10^6. From this question this assumption is correct.
So I tried replacing 622.08e6 with
uint32_t default_freq = 622080000;
For some reason this doesn't seem to work
Any thoughts or suggestions appreciated
The problem you are having (and I'm speculating here because I don't have the rest of your code) appears to be that replacing the floating point with an integer caused the multiplication and division to be integer based, and not decimal based. As a result, you now compute the wrong value.
Try type casting your uint32_t to a double and see if that clears it up.
The problem is due to overflow!
The original expression (622.08E6 * vcxo_reg_val->hiv * vcxo_reg_val->n1)/temp_rfreq (you have more parentheses than you need, by the way) is done in double precision, because 622.08E6 is a double literal. That results in a floating-point value.
However, if you replace the literal with 622080000 then the whole expression is done in integer math if all the variables are integers. More importantly, integer math overflows (at least much sooner than floating-point math does).
Notice that UINT32_MAX / 622080000.0 ≈ 6.9. That means multiplying the constant by just 7 already overflows, and in the code you multiply 622080000 by two other values whose product may well be greater than 6. You should add the ULL suffix to do the math in unsigned long long:
freq_xtal = (622080000ULL * vcxo_reg_val->hiv * vcxo_reg_val->n1)/temp_rfreq;
or change the variable to uint64_t default_freq = 622080000ULL;
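Here is a minimal sketch of the difference; the hiv, n1 and temp_rfreq values below are made up purely for illustration (the real ones come from the vcxo register structure):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical register values, chosen only to make the overflow visible. */
    uint32_t hiv = 4, n1 = 8, temp_rfreq = 43;

    /* 32-bit math: 622080000 * 4 * 8 exceeds UINT32_MAX, so the product wraps modulo 2^32. */
    uint32_t bad = (622080000u * hiv * n1) / temp_rfreq;

    /* 64-bit math: the ULL suffix promotes the whole product to unsigned long long. */
    uint64_t good = (622080000ULL * hiv * n1) / temp_rfreq;

    printf("32-bit result: %u\n", (unsigned)bad);
    printf("64-bit result: %llu\n", (unsigned long long)good);
    return 0;
}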

How does C perform the % operation internally

I am curious to understand the logic behind the mod operation since I understand that bit-shifting operations can be performed to do different things such as bit shifting to multiply.
One way I can see it being done is by a recursive algorithm that keeps dividing until you cannot divide anymore, but this does not seem efficient.
Any ideas will be helpful. Thanks in advance!
The quick version is: it depends on the hardware, the optimizer, whether it's division by a constant or not (pdf), whether there are exceptions to be checked for (e.g. modulo by 0), whether and how negative numbers are handled (this is a scary question for C++), etc...
R gave a nice, concise answer for unsigned integers, but it's difficult to understand unless you're well versed with C.
The crux of the technique illuminated by R is to strip away multiples of q until there's no more multiples of q left. We could naively do this with a simple loop:
while (p >= q) p -= q; // One liner, woohoo!
The code may be short, but for large values of p and small values of q this might take a very long time.
Better than stripping away one q at a time would be to strip away many q's at a time. Note that we actually want to strip away as many q's as possible -- that is, floor(p/q) many q's... And indeed, that's a valid technique. For unsigned integers, one would expect that p % q == p - (p / q) * q. (Note that unsigned integer division rounds down.)
But this almost feels like cheating because division and remainder operations are so intimately related. (In fact, often if hardware natively supports division, it supports a divide-and-compute-remainder operation because they're so strongly related.)
Assuming we've no access to division, how shall we find a multiple of q greater than 1 to strip away? In hardware, fixed shift operations are cheap (if not practically free) and conceptually represent multiplication by a non-negative power of two. For example, shifting a bit string left by 3 is equivalent to multiplying by 8 (that is, 2^3), e.g. 5 decimal is equivalent to '101' binary. Shift '101' in binary by adding three zeroes on the right (giving '101000') and the result is 40 in decimal -- five times eight.
Likewise, shift operations are very cheap as software operations and you'll struggle to find a controller that doesn't support them and quickly. (Some architectures such as ARM can even combine shifts with other instructions to make them 'free' a good deal of the time.)
ARMed (couldn't resist) with these shift operations, we can proceed as follows:
Find out the largest power of two we can multiply q by and still be less than p.
Working from the largest power of two to the smallest, multiply q by each power of two and if it's less than what's left of p subtract it from what's left of p.
Whatever you've got left is the remainder.
Why does this work? Because in the end you'll find that all the subtracted powers of two actually sum to floor(p / q)! Don't take my word for it, similar knowledge has been known for a very long time.
Breaking apart R's answer:
#define HI (-1U-(-1U/2))
This effectively gives you an unsigned integer with only the highest value bit set.
unsigned i;
for (i=0; !(HI & (q<<i)); i++);
This line actually finds the highest power of two that q can be multiplied by before overflowing an unsigned integer. This isn't strictly necessary, but it doesn't change the results other than increasing the amount of execution time required.
In case you're not familiar with the C-isms in this line:
(q<<i) is a left bit shift by i. Recall this is equivalent to multiplying by 2^i.
HI & (q<<i) performs a bitwise-AND. Since HI only has its top bit populated this will only result in a non-zero value when (q<<i) is large enough to cause the top bit to be non-zero. One more shift over to the left and there'd be an integer overflow.
!(HI & (q<<i)) is 'true' when (HI & (q<<i)) is zero and 'false' otherwise.
do { if (p >= (q<<i)) p -= (q<<i); } while (i--);
This is a simple decreasing loop do { .... } while (i--);. Note that post-decrementing is used on i so the loop executes, then it checks to see if i is not zero, then it subtracts one from i, and then if its earlier check resulted in true it continues. This has the property that the loop executes its last time when i is 0. This is important because we may need to strip away an unmultiplied copy of q.
if (p >= (q<<i)) checks if the 2^i * q is less than or equal to p. If it is, p -= (q<<i) strips it away.
The remainder is left.
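Spelled out a little more verbosely, the same shift-and-subtract idea might look like the sketch below; the function and variable names are mine, and it assumes q is non-zero:

#define HI (-1U - (-1U / 2))   /* unsigned int with only the top bit set */

/* p % q for unsigned p and q (q must be non-zero), using only shifts,
 * comparisons and subtraction. */
unsigned int remainder_by_shifts(unsigned int p, unsigned int q)
{
    int shift = 0;

    /* Find the largest shift such that q << shift neither loses bits
     * nor overshoots p when doubled once more. */
    while (!((q << shift) & HI) && (q << (shift + 1)) <= p)
        shift++;

    /* Strip away q * 2^shift, then q * 2^(shift-1), ..., down to q * 1. */
    for (; shift >= 0; shift--)
        if (p >= (q << shift))
            p -= (q << shift);

    return p;
}

For example, remainder_by_shifts(100, 7) strips away 7*8, 7*4 and 7*2 and returns 2.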
While most C implementations run on hardware that has a division instruction, the remainder operation can be performed roughly like this, for computing p%q, assuming unsigned values:
#define HI (-1U-(-1U/2))
unsigned i;
for (i=0; !(HI & (q<<i)); i++);
do { if (p >= (q<<i)) p -= (q<<i); } while (i--);
The resulting remainder is in p.
In addition to a hardware instruction and implementation using shifts, as R.. suggests, there's also reciprocal multiplication.
This technique can be used when the right-hand side of % is a constant, known at compile time.
Reciprocal multiplication is used to implement division, but using it for % is easy, based on the formula a%b == a-(a/b)*b.
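As a rough sketch of the idea, here is a % 3 computed with the same magic constant as in the divide-by-3 question at the top of this page; it is only an illustration, not the exact code any particular compiler emits:

#include <stdio.h>

/* a % 3 without a division instruction: first a/3 via multiplication by
 * the precomputed reciprocal (2^33+1)/3, then the remainder as a - (a/3)*3. */
static unsigned int mod3(unsigned int a)
{
    unsigned int q = (unsigned int)(((unsigned long long)a * 0xAAAAAAABULL) >> 33);
    return a - q * 3;
}

int main(void)
{
    for (unsigned int a = 0; a < 10; a++)
        printf("%u %% 3 = %u\n", a, mod3(a));
    return 0;
}

Optimizing compilers typically perform this kind of transformation automatically when the divisor is a compile-time constant.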
Depending on the smarts of the optimizer, there is a shortcut for modulo by a power of 2. For example, a % 32 can be implemented as a & 31. In general, a % (2^N) == a & (2^N - 1). This is lightning fast compared to division. Most dividers (even in hardware) require at least 1 cycle for each bit of the result, while a logical AND is just a few-cycle operation (in the pipeline).
EDIT: this only works if a is unsigned!
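For example (the unsigned case works, and the signed case is the one the edit warns about):

#include <stdio.h>

int main(void)
{
    unsigned int u = 1234567u;
    printf("%u %% 32 = %u, %u & 31 = %u\n", u, u % 32, u, u & 31);     /* identical: 7 and 7 */

    int s = -5;
    printf("%d %% 32 = %d, but %d & 31 = %d\n", s, s % 32, s, s & 31); /* -5 versus 27 on a two's-complement machine */
    return 0;
}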

Checking overflow in C

Let us have
int a, b, c; // may be char or float, anything actually
c = a + b;
let the int type be represented by 4 bytes. Let's say a+b requires 1 bit more than 32 bits (i.e., the result is 1 followed by 32 zeroes, in binary). This would result in c = 0, and I am sure the computer's microprocessor would set some kind of overflow flag. Is there any built-in method to check this in C?
I am actually working on building a number type that is 1024 bits long (for example, int is a built-in number type that is 32 bits long). I have attempted this using unsigned char type arrays with 128 elements. I also need to define addition and subtraction operations on these numbers. I have written the code for addition but I am having problems with subtraction. I don't need to worry about getting negative results because the way I will call the subtracting function always ensures that the result of subtraction is positive, but to implement the subtraction function I need to somehow get the 2's complement of the subtrahend, which is itself my custom 1024-bit number.
I am sorry if it is difficult to understand my description. If needed I will elaborate it more. I am including my code for the adding function and the incomplete subtracting function. the NUM_OF_WORDS is a constant declared as
#define NUM_OF_WORDS 128
Please let me know if you did not understand my question or any part of my code.
PS: I don't see how to upload attachments in this forum so I am directing you to another website. My code may be found there
click on download in this page
Incidentally, I found this
I intend to replace INT_MAX by UCHAR_MAX as my 1024 bit numbers consist of array of char types (8-bit variable)
Is this check sufficient for all cases?
Update:
Yes I am working on Cryptography.
I need to implement a Montgomery Multiplication routine for 1024 bit size integers.
I had also considered using GMP library but couldn't find out how to use it.
I looked up a tutorial and after a few small modifications I was able to build the GMP project file in VC++ 6 which resulted in a lot of .obj files, but now I am not sure what to do with them.
Still it would be good if I can write my own data types, as it will give me complete control over how the arithmetic operations on my custom data type work, and I also need to be able to extend it from 1024 bits to larger numbers in the future.
If you're adding unsigned numbers then you can do this
c = a + b;
if (c < a) {
    // you'll get here if and only if overflow has occurred
}
and you may even find that your compiler is clever enough to implement it by checking the overflow or carry flag instead of doing an extra comparison. For instance, I just fed this to gcc -O3 -S:
unsigned int foo() {
    unsigned int x = g(), y = h();
    unsigned int z = x + y;
    return z < x ? 0 : z;
}
and got this for the key bit of the code:
    movl  $0, %edx
    addl  %ebx, %eax
    cmovb %edx, %eax
where you'll notice there's no extra comparison instruction.
Contrary to popular belief, an int overflow results in undefined behavior. This means that once a + b overflows, it doesn't make sense to use this value (or do anything else, for that matter). The wrap-around is just what most machines happen to do in case of overflow, but they might as well explode.
To check whether an int overflow will occur when adding two non-negative integers a and b, you can do the following:
if (INT_MAX - b < a) {
    /* int overflow when evaluating a+b */
}
This is due to the fact that if a + b > INT_MAX, then INT_MAX - b < a, but INT_MAX - b can not overflow.
You will have to pay special attention to the case where b is negative, which is left as an exercise for the reader ;)
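If you want to check your work on that exercise, here is one possible formulation of the complete check (a sketch; there are several equivalent ways to write it):

#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would fall outside the int range.
 * The comparisons themselves can never overflow. */
static bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* a + b > INT_MAX */
    if (b < 0)
        return a < INT_MIN - b;   /* a + b < INT_MIN */
    return false;                 /* b == 0 never overflows */
}

For instance, add_would_overflow(INT_MAX, 1) and add_would_overflow(INT_MIN, -1) are true, while add_would_overflow(-5, 3) is false.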
Regarding your actual goal: 1024-bit numbers suffer from exactly the same overall issues as 32-bit numbers. It might be more promising to choose a completely different approach, e.g. representing numbers as, say, linked lists of digits, using a very large base B. Usually, B is chosen such that B = sqrt(INT_MAX), so multiplication of digits doesn't overflow the machine's int type.
This way, you can represent arbitrarily large numbers, where "arbitrary" means "only limited by the amount of main memory available".
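A rough sketch of what such a representation might look like is below. The base is kept deliberately small so that printing stays simple, the names are made up for illustration, and error handling and freeing are omitted; a real implementation would pick B near sqrt(INT_MAX):

#include <stdio.h>
#include <stdlib.h>

#define BASE 10000   /* each node holds one base-10000 "digit" */

struct digit {
    int value;           /* in the range [0, BASE) */
    struct digit *next;  /* next more-significant digit, or NULL */
};

/* Append one digit at the more-significant end; returns the new tail link. */
static struct digit **append(struct digit **tail, int value)
{
    struct digit *d = malloc(sizeof *d);
    d->value = value;
    d->next = NULL;
    *tail = d;
    return &d->next;
}

/* Build a big number (least-significant digit first) from an ordinary integer. */
static struct digit *from_ull(unsigned long long n)
{
    struct digit *head = NULL, **tail = &head;
    do {
        tail = append(tail, (int)(n % BASE));
        n /= BASE;
    } while (n);
    return head;
}

/* Sum of two big numbers, one digit at a time, carrying as we go. */
static struct digit *big_add(const struct digit *a, const struct digit *b)
{
    struct digit *head = NULL, **tail = &head;
    int carry = 0;
    while (a || b || carry) {
        int sum = carry;
        if (a) { sum += a->value; a = a->next; }
        if (b) { sum += b->value; b = b->next; }
        tail = append(tail, sum % BASE);
        carry = sum / BASE;
    }
    return head;
}

/* Print most-significant digit first; the recursion reverses the list order. */
static void print_big(const struct digit *d)
{
    if (d->next) {
        print_big(d->next);
        printf("%04d", d->value);   /* lower groups need their leading zeros */
    } else {
        printf("%d", d->value);     /* most-significant group, no padding */
    }
}

int main(void)
{
    struct digit *a = from_ull(123456789012345ULL);
    struct digit *b = from_ull(987654321098765ULL);
    print_big(big_add(a, b));       /* prints 1111111110111110 */
    putchar('\n');
    return 0;
}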
If you are working with unsigned numbers, then if a <= UINT_MAX, b <= UINT_MAX, and a + b > UINT_MAX, then c = (a + b) mod (UINT_MAX + 1) will always be smaller than both a and b. And this is the only case where that can happen.
So you can detect overflow this way.
int add_return_overflow(unsigned int a, unsigned int b, unsigned int* c) {
    *c = a + b;
    return *c < a && *c < b;
}
Information which may be useful on this subject:
Secure Coding in C and C++
IntSafe library
You can base a solution on a particular feature of the C language. According to the specification, when you add two unsigned ints, "the result value is congruent mod 2^n to the true mathematical result" ("C - A Reference Manual" by Harbison and Steele). This means you can use some simple arithmetic checks to detect overflow:
#include <stdio.h>

int main() {
    unsigned int a, b, c;
    char *overflow;

    a = (unsigned int)-1;
    for (b = 0; b < 3; b++) {
        c = a + b;
        overflow = (c < a) ? "yes" : "no";
        printf("%u + %u = %u, %s overflow\n", a, b, c, overflow);
    }
    return 0;
}
Just XOR the MSB of both operands and of the result; the result of this operation is the overflow flag.
But that will not show you whether the result is correct or not (it might be a correct result even with overflow); for instance 3 + (-1) is 2, with overflow.
In order to figure that out using signed arithmetic you need to check whether both operands were of the same sign (XOR of the MSBs).
Once you add 1 to INT_MAX, you end up getting INT_MIN (i.e. overflow).
In C, there's no reliable way to test for overflow, because all 32 bits are used to represent the integer (and not a state flag). You can only test to see if the number you get will be within a valid range, as in your link.
You'll get answers suggesting that you can test if (c < a); however, note that you could overflow the values of a and/or b to the point where their addition forms a number greater than a, yet one that has still overflowed.

Finding the smallest integer that can not be represented as an IEEE-754 32 bit float [duplicate]

Possible Duplicate:
Which is the first integer that an IEEE 754 float is incapable of representing exactly?
Firstly, this IS a homework question, just to clear this up immediately. I'm not looking for a spoon fed solution of course, just maybe a little pointer to the right direction.
So, my task is to find the smallest positive integer that can not be represented as an IEEE-754 float (32 bit). I know that testing for equality on something like "5 == 5.00000000001" will fail, so I thought I'd simply loop over all the numbers and test for that in this fashion:
#include <stdio.h>

int main(int argc, char **argv)
{
    unsigned int i; /* Loop counter. No need to initialize here. */

    /* Header output */
    printf("IEEE floating point rounding failure detection\n\n");

    /* Main program processing */

    /* Loop over every integer number */
    for (i = 0;; ++i)
    {
        float result = (float)i;

        /* TODO: Break condition for integer wrapping */

        /* Test integer representation against the IEEE-754 representation */
        if (result != i)
            break; /* Break the loop here */
    }

    /* Result output */
    printf("The smallest integer that can not be precisely represented as IEEE-754"
           " is:\n\t%u", i);

    return 0;
}
This failed. Then I tried to subtract the integer "i" from the floating point "result" (which is "i" converted to float), hoping to get something like a "0.000000002" that I could try to detect, which failed, too.
Can someone point me out a property of floating points that I can rely on to get the desired break condition?
-------------------- Update below ---------------
Thanks for help on this one! I learned multiple things here:
My original thought was indeed correct and produced the result on the machine it was intended to be run on (Solaris 10, 32 bit), yet it failed to work on my Linux systems (64 bit and 32 bit).
The changes that Hans Passant suggested made the program work on my systems as well; there seem to be some platform differences going on here that I didn't expect.
Thanks to everyone!
The problem is that your equality test is a floating-point test. The i variable will be converted to float first, and that of course produces the same float. Convert the float back to int to get an integer equality test:
float result = (float)i;
int truncated = (int)result;
if (truncated != i) break;
If it starts with the digits 16 then you found the right one. Convert it to hex and explain why that was the one that failed for a grade bonus.
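If you want to double-check the result after the fact (a verification sketch, not the intended way to solve the assignment), the key is that a float has a 24-bit significand:

#include <stdio.h>

int main(void)
{
    /* A float has 23 stored significand bits plus an implicit leading 1,
     * so every integer up to 2^24 converts exactly, and 2^24 + 1 is the
     * first one that doesn't. */
    unsigned int exact = 16777216;    /* 2^24,     0x1000000 */
    unsigned int inexact = 16777217;  /* 2^24 + 1, 0x1000001 */

    printf("%u -> %.1f\n", exact, (double)(float)exact);
    printf("%u -> %.1f\n", inexact, (double)(float)inexact);  /* rounds back to 16777216.0 */
    return 0;
}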
I think you should reason about the representation of floating-point numbers as (base, sign, significand, exponent).
Here is an excerpt from Wikipedia that can give you a clue:
A given format comprises:
* Finite numbers, which may be either base 2 (binary) or base 10 (decimal). Each finite number is most simply described by three integers: s = a sign (zero or one), c = a significand (or 'coefficient'), q = an exponent. The numerical value of a finite number is
(−1)^s × c × b^q
where b is the base (2 or 10). For example, if the sign is 1 (indicating negative), the significand is 12345, the exponent is −3, and the base is 10, then the value of the number is −12.345.
That would be FLT_MAX+1. See float.h.
Edit: or actually not. Check the modf() function in math.h
