Find leading 1s in the binary number - c

everyone,
I am new in C and I try to understand the work with bytes, binary numbers and another important thing for the beginner.
I hope someone can push me in the right direction here.
For example, I have a 32- bits number 11000000 10101000 00000101 0000000 (3232236800).
I also assigned each part of this number to separate variables as a=11000000 (192), b = 10101000 (168), c =00000101 (5) d = 0000000 (0). I am not sure if I really need this.
Is there any way to find the last 1 in the number and use this location to calculate the number of leading 1s?
Thank you for help!

You can determine the bitposition of the first leading 1 by this formula:
floor(ln(number)/ln(2))
Where "floor()" means rounding down.
For counting the number of consecutive leading ones (if I get the second part of your question correctly) I can only imagine a loop.
Note 1:
The formula is the math formula for "logarithm of number on the base of 2".
The same works with log10(). Basically you can use any logarithm (i.e. to any base) this way to adapt to a different base.
Note 2:
It is of course questionable whether this formula is more efficient than search from MSB downwards with a loop. It could be with good FPU support. It probably is not for 8bit values. Double check in case you are out to speed optimise.

Sorry I am not a C expert, but here's the Python code I came up with:
num_ones = 0
while integer > 0:
if integer % 2 == 1:
num_ones += 1
else:
num_ones = 0
integer = integer >> 1
Basically, I count the number of continuous 1's by bit shifting the given integer.
A zero would reset the counter.

Related

How does a float get converted to scientific notation for storage?

http://www.cs.yale.edu/homes/aspnes/pinewiki/C(2f)FloatingPoint.html
I was looking into why there are sometimes rounding issues when storing a float. I read the above link, and see that floats are converted to scientific notation.
https://babbage.cs.qc.cuny.edu/IEEE-754/index.xhtml
Base is always 2. So, 8 is stored as 1 * 2^3. 9 is stored as 1.001 * 2^3.
What is the math algorithm to determine the mantissa/significand and exponent?
Here is C++ code to convert a decimal string to a binary floating-point value. Although the question is tagged C, I presume the question is more about the algorithm and calculations than the programming language.
The DecimalToFloat class is constructed with a string that contains solely decimal digits and a decimal point (a period, most one). In its constructor, it shows how to use elementary school multiplication and long division to convert the number from decimal to binary. This demonstrates the fundamental concepts using elementary arithmetic. Real implementations of decimal-to-floating-point conversion in commercial software using algorithms that are faster and more complicated. They involve prepared tables, analysis, and proofs and are the subjects of academic papers. A significant problem of quality implementations of decimal-to-binary-floating-point conversion is getting the rounding correct. The disparate nature of powers of ten to powers of two (both positive and negative powers) makes it tricky to correctly determine when some values are above or below a point where rounding changes. Normally, when we are parsing something like 123e300, we want to figure out the binary floating-point result without actually calculating 10300. That is a much more extensive subject.
The GetValue routine finishes the preparation fo the number, taking the information prepared by the constructor and rounding it to the final floating-point form.
Negative numbers and exponential (scientific) notation are not handled. Handling negative numbers is of course easy. Exponential notation could be accommodated by shifting the input—moving the decimal point right for positive exponents or left for negative exponents. Again, this is not the fastest way to perform the conversion, but it demonstrates fundamental ideas.
/* This code demonstrates conversion of decimal numerals to binary
floating-point values using the round-to-nearest-ties-to-even rule.
Infinities and subnormal values are supported and assumed.
The basic idea is to convert the decimal numeral to binary using methods
taught in elementary school. The integer digits are repeatedly divided by
two to extract a string of bits in low-to-high position-value order. Then
sub-integer digits are repeatedly multiplied by two to continue extracting
a string of bits in high-to-low position-value order. Once we have enough
bits to determine the rounding direction or the processing exhausts the
input, the final value is computed.
This code is not (and will not be) designed to be efficient. It
demonstrates the fundamental mathematics and rounding decisions.
*/
#include <algorithm>
#include <limits>
#include <cmath>
#include <cstring>
template<typename Float> class DecimalToFloat
{
private:
static_assert(std::numeric_limits<Float>::radix == 2,
"This code requires the floatng-point radix to be two.");
// Abbreviations for parameters describing the floating-point format.
static const int Digits = std::numeric_limits<Float>::digits;
static const int MaximumExponent = std::numeric_limits<Float>::max_exponent;
static const int MinimumExponent = std::numeric_limits<Float>::min_exponent;
/* For any rounding rule supported by IEEE 754 for binary floating-point,
the direction in which a floating-point result should be rounded is
completely determined by the bit in the position of the least
significant bit (LSB) of the significand and whether the value of the
trailing bits are zero, between zero and 1/2 the value of the LSB,
exactly 1/2 the LSB, or between 1/2 the LSB and 1.
In particular, for round-to-nearest, ties-to-even, the decision is:
LSB Trailing Bits Direction
0 0 Down
0 In (0, 1/2) Down
0 1/2 Down
0 In (1/2, 1) Up
1 0 Down
1 In (0, 1/2) Down
1 1/2 Up
1 In (1/2, 1) Up
To determine whether the value of the trailing bits is 0, in (0, 1/2),
1/2, or in (1/2, 1), it suffices to know the first of the trailing bits
and whether the remaining bits are zeros or not:
First Remaining Value of Trailing Bits
0 All zeros 0
0 Not all zeros In (0, 1/2)
1 All zeros 1/2
1 Not all zeros In (1/2, 1)
To capture that information, we maintain two bits in addition to the
bits in the significand. The first is called the Round bit. It is the
first bit after the position of the least significand bit in the
significand. The second is called the Sticky bit. It is set if any
trailing bit after the first is set.
The bits for the significand are kept in an array along with the Round
bit and the Sticky bit. The constants below provide array indices for
locating the LSB, the Round Bit, and the Sticky bit in that array.
*/
static const int LowBit = Digits-1; // Array index for LSB in significand.
static const int Round = Digits; // Array index for rounding bit.
static const int Sticky = Digits+1; // Array index for sticky bit.
char *Decimal; // Work space for the incoming decimal numeral.
int N; // Number of bits incorporated so far.
char Bits[Digits+2]; // Bits for significand plus two for rounding.
int Exponent; // Exponent adjustment needed.
/* PushBitHigh inserts a new bit into the high end of the bits we are
accumulating for the significand of a floating-point number.
First, the Round bit shifted down by incorporating it into the Sticky
bit, using an OR so that the Sticky bit is set iff any bit pushed below
the Round bit is set.
Then all bits from the significand are shifted down one position,
which moves the least significant bit into the Round position and
frees up the most significant bit.
Then the new bit is put into the most significant bit.
*/
void PushBitHigh(char Bit)
{
Bits[Sticky] |= Bits[Round];
std::memmove(Bits+1, Bits, Digits * sizeof *Bits);
Bits[0] = Bit;
++N; // Count the number of bits we have put in the significand.
++Exponent; // Track the absolute position of the leading bit.
}
/* PushBitLow inserts a new bit into the low end of the bits we are
accumulating for the significand of a floating-point number.
If we have no previous bits and the new bit is zero, we are just
processing leading zeros in a number less than 1. These zeros are not
significant. They tell us the magnitude of the number. We use them
only to track the exponent that records the position of the leading
significant bit. (However, exponent is only allowed to get as small as
MinimumExponent, after which we must put further bits into the
significand, forming a subnormal value.)
If the bit is significant, we record it. If we have not yet filled the
regular significand and the Round bit, the new bit is recorded in the
next space. Otherwise, the new bit is incorporated into the Sticky bit
using an OR so that the Sticky bit is set iff any bit below the Round
bit is set.
*/
void PushBitLow(char Bit)
{
if (N == 0 && Bit == 0 && MinimumExponent < Exponent)
--Exponent;
else
if (N < Sticky)
Bits[N++] = Bit;
else
Bits[Sticky] |= Bit;
}
/* Determined tells us whether the final value to be produced can be
determined without any more low bits. This is true if and only if:
we have all the bits to fill the significand, and
we have at least one more bit to help determine the rounding, and
either we know we will round down because the Round bit is 0 or we
know we will round up because the Round bit is 1 and at least one
further bit is 1 or the least significant bit is 1.
*/
bool Determined() const
{
if (Digits < N)
if (Bits[Round])
return Bits[LowBit] || Bits[Sticky];
else
return 1;
else
return 0;
}
// Get the floating-point value that was parsed from the source numeral.
Float GetValue() const
{
// Decide whether to round up or not.
bool RoundUp = Bits[Round] && (Bits[LowBit] || Bits[Sticky]);
/* Now we prepare a floating-point number that contains a significand
with the bits we received plus, if we are rounding up, one added to
the least significant bit.
*/
// Start with the adjustment to the LSB for rounding.
Float x = RoundUp;
// Add the significand bits we received.
for (int i = Digits-1; 0 <= i; --i)
x = (x + Bits[i]) / 2;
/* If we rounded up, the addition may have carried out of the
initial significand. In this case, adjust the scale.
*/
int e = Exponent;
if (1 <= x)
{
x /= 2;
++e;
}
// Apply the exponent and return the value.
return MaximumExponent < e ? INFINITY : std::scalbn(x, e);
}
public:
/* Constructor.
Note that this constructor allocates work space. It is bad form to
allocate in a constructor, but this code is just to demonstrate the
mathematics, not to provide a conversion for use in production
software.
*/
DecimalToFloat(const char *Source) : N(), Bits(), Exponent()
{
// Skip leading sources.
while (*Source == '0')
++Source;
size_t s = std::strlen(Source);
/* Count the number of integer digits (digits before the decimal
point if it is present or before the end of the string otherwise)
and calculate the number of digits after the decimal point, if any.
*/
size_t DigitsBefore = 0;
while (Source[DigitsBefore] != '.' && Source[DigitsBefore] != 0)
++DigitsBefore;
size_t DigitsAfter = Source[DigitsBefore] == '.' ? s-DigitsBefore-1 : 0;
/* Allocate space for the integer digits or the sub-integer digits,
whichever is more numerous.
*/
Decimal = new char[std::max(DigitsBefore, DigitsAfter)];
/* Copy the integer digits into our work space, converting them from
digit characters ('0' to '9') to numbers (0 to 9).
*/
for (size_t i = 0; i < DigitsBefore; ++i)
Decimal[i] = Source[i] - '0';
/* Convert the integer portion of the numeral to binary by repeatedly
dividing it by two. The remainders form a bit string representing
a binary numeral for the integer part of the number. They arrive
in order from low position value to high position value.
This conversion continues until the numeral is exhausted (High <
Low is false) or we see it is so large the result overflows
(Exponent <= MaximumExponent is false).
Note that Exponent may exceed MaximumExponent while we have only
produced 0 bits during the conversion. However, because we skipped
leading zeros above, we know there is a 1 bit coming. That,
combined with the excessive Exponent, guarantees the result will
overflow.
*/
for (char *High = Decimal, *Low = Decimal + DigitsBefore;
High < Low && Exponent <= MaximumExponent;)
{
// Divide by two.
char Remainder = 0;
for (char *p = High; p < Low; ++p)
{
/* This is elementary school division: We bring in the
remainder from the higher digit position and divide by the
divisor. The remainder is kept for the next position, and
the quotient becomes the new digit in this position.
*/
char n = *p + 10*Remainder;
Remainder = n % 2;
n /= 2;
/* As the number becomes smaller, we discard leading zeros:
If the new digit is zero and is in the highest position,
we discard it and shorten the number we are working with.
Otherwise, we record the new digit.
*/
if (n == 0 && p == High)
++High;
else
*p = n;
}
// Push remainder into high end of the bits we are accumulating.
PushBitHigh(Remainder);
}
/* Copy the sub-integer digits into our work space, converting them
from digit characters ('0' to '9') to numbers (0 to 9).
The convert the sub-integer portion of the numeral to binary by
repeatedly multiplying it by two. The carry-outs continue the bit
string. They arrive in order from high position value to low
position value.
*/
for (size_t i = 0; i < DigitsAfter; ++i)
Decimal[i] = Source[DigitsBefore + 1 + i] - '0';
for (char *High = Decimal, *Low = Decimal + DigitsAfter;
High < Low && !Determined();)
{
// Multiply by two.
char Carry = 0;
for (char *p = Low; High < p--;)
{
/* This is elementary school multiplication: We multiply
the digit by the multiplicand and add the carry. The
result is separated into a single digit (n % 10) and a
carry (n / 10).
*/
char n = *p * 2 + Carry;
Carry = n / 10;
n %= 10;
/* Here we discard trailing zeros: If the new digit is zero
and is in the lowest position, we discard it and shorten
the numeral we are working with. Otherwise, we record the
new digit.
*/
if (n == 0 && p == Low-1)
--Low;
else
*p = n;
}
// Push carry into low end of the bits we are accumulating.
PushBitLow(Carry);
}
delete [] Decimal;
}
// Conversion operator. Returns a Float converted from this object.
operator Float() const { return GetValue(); }
};
#include <iostream>
#include <cstdio>
#include <cstdlib>
static void Test(const char *Source)
{
std::cout << "Testing " << Source << ":\n";
DecimalToFloat<float> x(Source);
char *end;
float e = std::strtof(Source, &end);
float o = x;
/* Note: The C printf is used here for the %a conversion, which shows the
bits of floating-point values clearly. If your C++ implementation does
not support this, this may be replaced by any display of floating-point
values you desire, such as printing them with all the decimal digits
needed to distinguish the values.
*/
std::printf("\t%a, %a.\n", e, o);
if (e != o)
{
std::cout << "\tError, results do not match.\n";
std::exit(EXIT_FAILURE);
}
}
int main(void)
{
Test("0");
Test("1");
Test("2");
Test("3");
Test(".25");
Test(".0625");
Test(".1");
Test(".2");
Test(".3");
Test("3.14");
Test(".00000001");
Test("9841234012398123");
Test("340282346638528859811704183484516925440");
Test("340282356779733661637539395458142568447");
Test("340282356779733661637539395458142568448");
Test(".00000000000000000000000000000000000000000000140129846432481707092372958328991613128026194187651577175706828388979108268586060148663818836212158203125");
// This should round to the minimum positive (subnormal), as it is just above mid-way.
Test(".000000000000000000000000000000000000000000000700649232162408535461864791644958065640130970938257885878534141944895541342930300743319094181060791015626");
// This should round to zero, as it is mid-way, and the even rule applies.
Test(".000000000000000000000000000000000000000000000700649232162408535461864791644958065640130970938257885878534141944895541342930300743319094181060791015625");
// This should round to zero, as it is just below mid-way.
Test(".000000000000000000000000000000000000000000000700649232162408535461864791644958065640130970938257885878534141944895541342930300743319094181060791015624");
}
One of the surprising things about a real, practical computer -- surprising to beginning programmers who have been tasked with writing artificial little binary-to-decimal conversion programs, anyway -- is how thoroughly ingrained the binary number system is in an actual computer, and how few and how diffuse any actual binary/decimal conversion routines actually are. In the C world, for example (and if we confine our attention to integers for the moment), there is basically one binary-to-decimal conversion routine, and it's buried inside printf, where the %d directive is processed. There are perhaps three decimal-to-binary converters: atof(), strtol(), and the %d conversion inside scanf. (There might be another one inside the C compiler, where it converts your decimal constants into binary, although the compiler might just call strtol() directly for those, too.)
I bring this all up for background. The question of "what's the actual algorithm for constructing floating-point numbers internally?" is a fair one, and I'd like to think I know the answer, but as I mentioned in the comments, I'm chagrined to discover that I don't, really: I can't describe a clear, crisp "algorithm". I can and will show you some code that gets the job done, but you'll probably find it unsatisfying, as if I'm cheating somehow -- because a number of the interesting details happen more or less automatically, as we'll see.
Basically, I'm going to write a version of the standard library function atof(). Here are my ground rules:
I'm going to assume that the input is a string of characters. (This isn't really an assumption at all; it's a restatement of the original problem, which is to write a version of atof.)
I'm going to assume that we can construct the floating-point number "0.0". (In IEEE 754 and most other formats, it's all-bits-0, so that's not too hard.)
I'm going to assume that we can convert the integers 0-9 to their corresponding floating-point equivalents.
I'm going to assume that we can add and multiply any floating-point numbers we want to. (This is the biggie, although I'll describe those algorithms later.) But on any modern computer, there's almost certainly a floating-point unit, that has built-in instructions for the basic floating-point operations like addition and multiplication, so this isn't an unreasonable assumption, either. (But it does end up hiding some of the interesting aspects of the algorithm, passing the buck to the hardware designer to have implemented the instructions correctly.)
I'm going to initially assume that we have access to the standard library functions atoi and pow. This is a pretty big assumption, but again, I'll describe later how we could write those from scratch if we wanted to. I'm also going to assume the existence of the character classification functions in <ctype.h>, especially isdigit().
But that's about it. With those prerequisites, it turns out we can write a fully-functional version of atof() all by ourselves. It might not be fast, and it almost certainly won't have all the right rounding behaviors out at the edges, but it will work pretty well. (I'm even going to handle negative numbers, and exponents.) Here's how it works:
skip leading whitespace
look for '-'
scan digit characters, converting each one to the corresponding digit by subtracting '0' (aka ASCII 48)
accumulate a floating-point number (with no fractional part yet) representing the integer implied by the digits -- the significand -- and this is the real math, multiplying the running accumulation by 10 and adding the next digit
if we see a decimal point, count the number of digits after it
when we're done scanning digits, see if there's an e/E and some more digits indicating an exponent
if necessary, multiply or divide our accumulated number by a power of 10, to take care of digits past the decimal, and/or the explicit exponent.
Here's the code:
#include <ctype.h>
#include <stdlib.h> /* just for atoi() */
#include <math.h> /* just for pow() */
#define TRUE 1
#define FALSE 0
double my_atof(const char *str)
{
const char *p;
double ret;
int negflag = FALSE;
int exp;
int expflag;
p = str;
while(isspace(*p))
p++;
if(*p == '-')
{
negflag = TRUE;
p++;
}
ret = 0.0; /* assumption 2 */
exp = 0;
expflag = FALSE;
while(TRUE)
{
if(*p == '.')
expflag = TRUE;
else if(isdigit(*p))
{
int idig = *p - '0'; /* assumption 1 */
double fdig = idig; /* assumption 3 */
ret = 10. * ret + fdig; /* assumption 4 */
if(expflag)
exp--;
}
else break;
p++;
}
if(*p == 'e' || *p == 'E')
exp += atoi(p+1); /* assumption 5a */
if(exp != 0)
ret *= pow(10., exp); /* assumption 5b */
if(negflag)
ret = -ret;
return ret;
}
Before we go further, I encourage you to copy-and-paste this code into a nearby C compiler, and compile it, to convince yourself that I haven't cheated too badly. Here's a little main() to invoke it with:
#include <stdio.h>
int main(int argc, char *argv[])
{
double d = my_atof(argv[1]);
printf("%s -> %g\n", argv[1], d);
}
(If you or your IDE aren't comfortable with command-line invocations, you can use fgets or scanf to read the string to hand to my_atof, instead.)
But, I know, your question was "How does 9 get converted to 1.001 * 2^3 ?", and I still haven't really answered that, have I? So let's see if we can find where that happens.
First of all, that bit pattern 10012 for 9 came from... nowhere, or everywhere, or it was there all along, or something. The character 9 came in, probably with a bit pattern of 1110012 (in ASCII). We subtracted 48 = 1100002, and out popped 10012. (Even before doing the subtraction, you can see it hiding there at the end of 111001.)
But then what turned 1001 into 1.001E3? That was basically my "assumption 3", as embodied in the line
double fdig = idig;
It's easy to write that line in C, so we don't really have to know how it's done, and the compiler probably turns it into a 'convert integer to float' instruction, so the compiler writer doesn't have to know how to do it, either.
But, if we did have to implement that ourselves, at the lowest level, we could. We know we have a single-digit (decimal) number, occupying at most 4 bits. We could stuff those bits into the significand field of our floating-point format, with a fixed exponent (perhaps -3). We might have to deal with the peculiarities of an "implicit 1" bit, and if we didn't want to inadvertently create a denormalized number, we might have to some more tinkering, but it would be straightforward enough, and relatively easy to get right, because there are only 10 cases to test. (Heck, if we found writing code to do the bit manipulations troublesome, we could even use a 10-entry lookup table.)
Since 9 is a single-digit number, we're done. But for a multiple-digit number, our next concern is the arithmetic we have to do: multiplying the running sum by 10, and adding in the next digit. How does that work, exactly?
Again, if we're writing a C (or even an assembly language) program, we don't really need to know, because our machine's floating-point 'add' and 'multiply' instructions will do everything for us. But, also again, if we had to do it ourselves, we could. (This answer's getting way too long, so I'm not going to discuss floating-point addition and multiplication algorithms just yet. Maybe farther down.)
Finally, the code as presented so far "cheated" by calling the library functions atoi and pow. I won't have any trouble convincing you that we could have implemented atoi ourselves if we wanted/had to: it's basically just the same digit-accumulation code we already wrote. And pow isn't too hard, either, because in our case we don't need to implement it in full generality: we're always raising to integer powers, so it's straightforward repeated multiplication, and we've already assumed we know how to do multiplication.
(With that said, computing a large power of 10 as part of our decimal-to-binary algorithm is problematic. As #Eric Postpischil noted in his answer, "Normally we want to figure out the binary floating-point result without actually calculating 10N." Me, since I don't know any better, I'll compute it anyway, but if I wrote my own pow() I'd use the binary exponentiation algorithm, since it's super easy to implement and quite nicely efficient.)
I said I'd discuss floating-point addition and multiplication routines. Suppose you want to add two floating-point numbers. If they happen to have the same exponent, it's easy: add the two significands (and keep the exponent the same), and that's your answer. (How do you add the significands? Well, I assume you have a way to add integers.) If the exponents are different, but relatively close to each other, you can pick the smaller one and add N to it to make it the same as the larger one, while simultaneously shifting the significand to the right by N bits. (You've just created a denormalized number.) Once the exponents are the same, you can add the significands, as before. After the addition, it may be important to renormalize the numbers, that is, to detect if one or more leading bits ended up as 0 and, if so, shift the significand left and decrement the exponent. Finally, if the exponents are too different, such that shifting one significand to the right by N bits would shift it all away, this means that one number is so much smaller than the other that all of it gets lost in the roundoff when adding them.
Multiplication: Floating-point multiplication is actually somewhat easier than addition. You don't have to worry about matching up the exponents: the final product is basically a new number whose significand is the product of the two significands, and whose exponent is the sum of the two exponents. The only trick is that the product of the two M-bit significands is nominally 2M bits, and you may not have a multiplier that can do that. If the only multiplier you have available maxes out at an M-bit product, you can take your two M-bit significands and literally split them in half by bits:
signif1 = a * 2M/2 + b
signif2 = c * 2M/2 + d
So by ordinary algebra we have
signif1 × signif2 = ac × 2M + ad × 2M/2 + bc × 2M/2 + bd
Each of those partial products ac, ad, etc. is an M-bit product. Multiplying by 2M/2 or 2M is easy, because it's just a left shift. And adding the terms up is something we already know how to do. We actually only care about the upper M bits of the product, so since we're going to throw away the rest, I imagine we could cheat and skip the bd term, since it contributes nothing (although it might end up slightly influencing a properly-rounded result).
But anyway, the details of the addition and multiplication algorithms, and the knowledge they contain about the floating-point representation we're using, end up forming the other half of the answer to the question of the decimal-to-binary "algorithm" you're looking for. If you convert, say, the number 5.703125 using the code I've shown, out will pop the binary floating-point number 1.011011012 × 22, but nowhere did we explicitly compute that significand 1.01101101 or that exponent 2 -- they both just fell out of all the digitwise multiplications and additions we did.
Finally, if you're still with me, here's a quick and easy integer-power-only pow function using binary exponentiation:
double my_pow(double a, unsigned int b)
{
double ret = 1;
double fac = a;
while(1) {
if(b & 1) ret *= fac;
b >>= 1;
if(b == 0) break;
fac *= fac;
}
return ret;
}
This is a nifty little algorithm. If we ask it to compute, say, 1021, it does not multiply 10 by itself 21 times. Instead, it repeatedly squares 10, leading to the exponential sequence 101, 102, 104, 108, or rather, 10, 100, 10000, 100000000... Then it looks at the binary representation of 21, namely 10101, and selects only the intermediate results 101, 104, and 1016 to multiply into its final return value, yielding 101+4+16, or 1021, as desired. It therefore runs in time O(log2(N)), not O(N).
And, tune in tomorrow for our next exciting episode when we'll go in the opposite direction, writing a binary-to-decimal converter which will require us to do... (ominous chord)
floating point long division!
Here's a completely different answer, that tries to focus on the "algorithm" part of the question. I'll start with the example you asked about, converting the decimal integer 9 to the binary scientific notation number 1.0012×23. The algorithm is in two parts: (1) convert the decimal integer 9 to the binary integer 10012, and (2) convert that binary integer into binary scientific notation.
Step 1. Convert a decimal integer to a binary integer. (You can skip over this part if you already know it. Also, although this part of the algorithm is going to look perfectly fine, it turns out it's not the sort of thing that's actually used anywhere on a practical binary computer.)
The algorithm is built around a number we're working on, n, and a binary number we're building up, b.
Set n initially to the number we're converting, 9.
Set b to 0.
Compute the remainder when dividing n by 2. In our example, the remainder of 9 ÷ 2 is 1.
The remainder is one bit of our binary number. Tack it on to b. In our example, b is now 1. Also, here we're going to be tacking bits on to b on the left.
Divide n by 2 (discarding the remainder). In our example, n is now 4.
If n is now 0, we're done.
Go back to step 3.
At the end of the first trip through the algorithm, n is 4 and b is 1.
The next trip through the loop will extract the bit 0 (because 4 divided by 2 is 2, remainder 0). So b goes to 01, and n goes to 2.
The next trip through the loop will extract the bit 0 (because 2 divided by 2 is 1, remainder 0). So b goes to 001, and n goes to 1.
The next trip through the loop will extract the bit 1 (because 1 divided by 2 is 0, remainder 1). So b goes to 1001, and n goes to 0.
And since n is now 0, we're done. Meanwhile, we've built up the binary number 1001 in b, as desired.
Here's that example again, in tabular form. At each step, we compute n divided by two (or in C, n/2), and the remainder when dividing n by 2, which in C is n%2. At the next step, n gets replaced by n/2, and the next bit (which is n%2) gets tacked on at the left of b.
step n b n/2 n%2
0 9 0 4 1
1 4 1 2 0
2 2 01 1 0
3 1 001 0 1
4 0 1001
Let's run through that again, for the number 25:
step n b n/2 n%2
0 25 0 12 1
1 12 1 6 0
2 6 01 3 0
3 3 001 1 1
4 1 1001 0 1
5 0 11001
You can clearly see that the n column is driven by the n/2 column, because in step 5 of the algorithm as stated we divided n by 2. (In C this would be n = n / 2, or n /= 2.) You can clearly see the binary result appearing (in right-to-left order) in the n%2 column.
So that's one way to convert decimal integers to binary. (As I mentioned, though, it's likely not the way your computer does it. Among other things, the act of tacking a bit on to the left end of b turns out to be rather unorthodox.)
Step 2. Convert a binary integer to a binary number in scientific notation.
Before we begin with this half of the algorithm, it's important to realize that scientific (or "exponential") representations are typically not unique. Returning to decimal for a moment, let's think about the number "one thousand". Most often we'll represent that as 1 × 103. But we could also represent it as 10 × 102, or 100 × 101, or even crazier representations like 10000 × 10-1, or 0.01 × 105.
So, in practice, when we're working in scientific notation, we'll usually set up an additional rule or guideline, stating that we'll try to keep the mantissa (also called the "significand") within a certain range. For base 10, usually the goal is either to keep it in the range 0 ≤ mantissa < 10, or 0 ≤ mantissa < 1. That is, we like numbers like 1 × 103 or 0.1 × 104, but we don't like numbers like 100 × 101 or 0.01 × 105.
How do we keep our representations in the range we like? What if we've got a number (perhaps the intermediate result of a calculation) that's in a form we don't like? The answer is simple, and it depends on a pattern you've probably already noticed: If you multiply the mantissa by 10, and if you simultaneously subtract 1 from the exponent, you haven't changed the value of the number. Similarly, you can divide the mantissa by 10 and increment the exponent, again without changing anything.
When we convert a scientific-notation number into the form we like, we say we're normalizing the number.
One more thing: since 100 is 1, we can preliminarily convert any integer to scientific notation by simply multiplying it by 100. That is, 9 is 9×100, and 25 is 25×100. If we do it that way we'll usually get a number that's in a form we "don't like" (that is "nonnormalized"), but now we have an idea of how to fix that.
So let's return to base 2, and the rest of this second half of our algorithm. Everything we've said so far about decimal scientific notation is also true about binary scientific notation, as long as we make the obvious changes of "10" to "2".
To convert the binary integer 10012 to binary scientific notation, we first multiply it by 20, resulting in: 10012×20. So actually we're almost done, except that this number is nonnormalized.
What's our definition of a normalized base-two scientific notation number? We haven't said, but the requirement is usually that the mantissa is between 0 and 102 (that is, between 0 and 210), or stated another way, that the high-order bit of the mantissa is always 1 (unless the whole number is 0). That is, these mantissas are normalized: 1.0012, 1.12, 1.02, 0.02. These mantissas are nonnormalized: 10.012, 0.0012.
So to normalize a number, we may need to multiply or divide the mantissa by 2, while incrementing or decrementing the exponent.
Putting this all together in step-by-step form: to convert a binary integer to a binary scientific number:
Multiply the integer by 20: set the mantissa to the number we're converting, and the exponent to 0.
If the number is normalized (if the mantissa is 0, or if its leading bit is 1), we're done.
If the mantissa has more than one bit to the left of the decimal point (really the "radix point" or "binary point"), divide the mantissa by 2, and increment the exponent by 1. Return to step 2.
(This step will never be necessary if the number we started with was an integer.) If the mantissa is nonzero but the bit to the left of the radix point is 0, multiply the mantissa by 2, and decrement the exponent by 1. Return to step 2.
Running this algorithm in tabular form for our number 9, we have:
step mantissa exponent
0 1001. 0
1 100.1 1
2 10.01 2
3 1.001 3
So, if you're still with me, that's how we can convert the decimal integer 9 to the binary scientific notation (or floating-point) number 1.0012×23.
And, with all of that said, the algorithm as stated so far only works for decimal integers. What if we wanted to convert, say, the decimal number 1.25 to the binary number 1.012×20, or 34.125 to 1.000100012×25? That's a discussion that will have to wait for another day (or for this other answer), I guess.

precision multiplication of 255-bit integer in radix-2^16 [duplicate]

I'm trying to learn C and have come across the inability to work with REALLY big numbers (i.e., 100 digits, 1000 digits, etc.). I am aware that there exist libraries to do this, but I want to attempt to implement it myself.
I just want to know if anyone has or can provide a very detailed, dumbed down explanation of arbitrary-precision arithmetic.
It's all a matter of adequate storage and algorithms to treat numbers as smaller parts. Let's assume you have a compiler in which an int can only be 0 through 99 and you want to handle numbers up to 999999 (we'll only worry about positive numbers here to keep it simple).
You do that by giving each number three ints and using the same rules you (should have) learned back in primary school for addition, subtraction and the other basic operations.
In an arbitrary precision library, there's no fixed limit on the number of base types used to represent our numbers, just whatever memory can hold.
Addition for example: 123456 + 78:
12 34 56
78
-- -- --
12 35 34
Working from the least significant end:
initial carry = 0.
56 + 78 + 0 carry = 134 = 34 with 1 carry
34 + 00 + 1 carry = 35 = 35 with 0 carry
12 + 00 + 0 carry = 12 = 12 with 0 carry
This is, in fact, how addition generally works at the bit level inside your CPU.
Subtraction is similar (using subtraction of the base type and borrow instead of carry), multiplication can be done with repeated additions (very slow) or cross-products (faster) and division is trickier but can be done by shifting and subtraction of the numbers involved (the long division you would have learned as a kid).
I've actually written libraries to do this sort of stuff using the maximum powers of ten that can be fit into an integer when squared (to prevent overflow when multiplying two ints together, such as a 16-bit int being limited to 0 through 99 to generate 9,801 (<32,768) when squared, or 32-bit int using 0 through 9,999 to generate 99,980,001 (<2,147,483,648)) which greatly eased the algorithms.
Some tricks to watch out for.
1/ When adding or multiplying numbers, pre-allocate the maximum space needed then reduce later if you find it's too much. For example, adding two 100-"digit" (where digit is an int) numbers will never give you more than 101 digits. Multiply a 12-digit number by a 3 digit number will never generate more than 15 digits (add the digit counts).
2/ For added speed, normalise (reduce the storage required for) the numbers only if absolutely necessary - my library had this as a separate call so the user can decide between speed and storage concerns.
3/ Addition of a positive and negative number is subtraction, and subtracting a negative number is the same as adding the equivalent positive. You can save quite a bit of code by having the add and subtract methods call each other after adjusting signs.
4/ Avoid subtracting big numbers from small ones since you invariably end up with numbers like:
10
11-
-- -- -- --
99 99 99 99 (and you still have a borrow).
Instead, subtract 10 from 11, then negate it:
11
10-
--
1 (then negate to get -1).
Here are the comments (turned into text) from one of the libraries I had to do this for. The code itself is, unfortunately, copyrighted, but you may be able to pick out enough information to handle the four basic operations. Assume in the following that -a and -b represent negative numbers and a and b are zero or positive numbers.
For addition, if signs are different, use subtraction of the negation:
-a + b becomes b - a
a + -b becomes a - b
For subtraction, if signs are different, use addition of the negation:
a - -b becomes a + b
-a - b becomes -(a + b)
Also special handling to ensure we're subtracting small numbers from large:
small - big becomes -(big - small)
Multiplication uses entry-level math as follows:
475(a) x 32(b) = 475 x (30 + 2)
= 475 x 30 + 475 x 2
= 4750 x 3 + 475 x 2
= 4750 + 4750 + 4750 + 475 + 475
The way in which this is achieved involves extracting each of the digits of 32 one at a time (backwards) then using add to calculate a value to be added to the result (initially zero).
ShiftLeft and ShiftRight operations are used to quickly multiply or divide a LongInt by the wrap value (10 for "real" math). In the example above, we add 475 to zero 2 times (the last digit of 32) to get 950 (result = 0 + 950 = 950).
Then we left shift 475 to get 4750 and right shift 32 to get 3. Add 4750 to zero 3 times to get 14250 then add to result of 950 to get 15200.
Left shift 4750 to get 47500, right shift 3 to get 0. Since the right shifted 32 is now zero, we're finished and, in fact 475 x 32 does equal 15200.
Division is also tricky but based on early arithmetic (the "gazinta" method for "goes into"). Consider the following long division for 12345 / 27:
457
+-------
27 | 12345 27 is larger than 1 or 12 so we first use 123.
108 27 goes into 123 4 times, 4 x 27 = 108, 123 - 108 = 15.
---
154 Bring down 4.
135 27 goes into 154 5 times, 5 x 27 = 135, 154 - 135 = 19.
---
195 Bring down 5.
189 27 goes into 195 7 times, 7 x 27 = 189, 195 - 189 = 6.
---
6 Nothing more to bring down, so stop.
Therefore 12345 / 27 is 457 with remainder 6. Verify:
457 x 27 + 6
= 12339 + 6
= 12345
This is implemented by using a draw-down variable (initially zero) to bring down the segments of 12345 one at a time until it's greater or equal to 27.
Then we simply subtract 27 from that until we get below 27 - the number of subtractions is the segment added to the top line.
When there are no more segments to bring down, we have our result.
Keep in mind these are pretty basic algorithms. There are far better ways to do complex arithmetic if your numbers are going to be particularly large. You can look into something like GNU Multiple Precision Arithmetic Library - it's substantially better and faster than my own libraries.
It does have the rather unfortunate misfeature in that it will simply exit if it runs out of memory (a rather fatal flaw for a general purpose library in my opinion) but, if you can look past that, it's pretty good at what it does.
If you cannot use it for licensing reasons (or because you don't want your application just exiting for no apparent reason), you could at least get the algorithms from there for integrating into your own code.
I've also found that the bods over at MPIR (a fork of GMP) are more amenable to discussions on potential changes - they seem a more developer-friendly bunch.
While re-inventing the wheel is extremely good for your personal edification and learning, its also an extremely large task. I don't want to dissuade you as its an important exercise and one that I've done myself, but you should be aware that there are subtle and complex issues at work that larger packages address.
For example, multiplication. Naively, you might think of the 'schoolboy' method, i.e. write one number above the other, then do long multiplication as you learned in school. example:
123
x 34
-----
492
+ 3690
---------
4182
but this method is extremely slow (O(n^2), n being the number of digits). Instead, modern bignum packages use either a discrete Fourier transform or a Numeric transform to turn this into an essentially O(n ln(n)) operation.
And this is just for integers. When you get into more complicated functions on some type of real representation of number (log, sqrt, exp, etc.) things get even more complicated.
If you'd like some theoretical background, I highly recommend reading the first chapter of Yap's book, "Fundamental Problems of Algorithmic Algebra". As already mentioned, the gmp bignum library is an excellent library. For real numbers, I've used MPFR and liked it.
Don't reinvent the wheel: it might turn out to be square!
Use a third party library, such as GNU MP, that is tried and tested.
You do it in basically the same way you do with pencil and paper...
The number is to be represented in a buffer (array) able to take on an arbitrary size (which means using malloc and realloc) as needed
you implement basic arithmetic as much as possible using language supported structures, and deal with carries and moving the radix-point manually
you scour numeric analysis texts to find efficient arguments for dealing by more complex function
you only implement as much as you need.
Typically you will use as you basic unit of computation
bytes containing with 0-99 or 0-255
16 bit words contaning wither 0-9999 or 0--65536
32 bit words containing...
...
as dictated by your architecture.
The choice of binary or decimal base depends on you desires for maximum space efficiency, human readability, and the presence of absence of Binary Coded Decimal (BCD) math support on your chip.
You can do it with high school level of mathematics. Though more advanced algorithms are used in reality. So for example to add two 1024-byte numbers :
unsigned char first[1024], second[1024], result[1025];
unsigned char carry = 0;
unsigned int sum = 0;
for(size_t i = 0; i < 1024; i++)
{
sum = first[i] + second[i] + carry;
carry = sum - 255;
}
result will have to be bigger by one place in case of addition to take care of maximum values. Look at this :
9
+
9
----
18
TTMath is a great library if you want to learn. It is built using C++. The above example was silly one, but this is how addition and subtraction is done in general!
A good reference about the subject is Computational complexity of mathematical operations. It tells you how much space is required for each operation you want to implement. For example, If you have two N-digit numbers, then you need 2N digits to store the result of multiplication.
As Mitch said, it is by far not an easy task to implement! I recommend you take a look at TTMath if you know C++.
One of the ultimate references (IMHO) is Knuth's TAOCP Volume II. It explains lots of algorithms for representing numbers and arithmetic operations on these representations.
#Book{Knuth:taocp:2,
author = {Knuth, Donald E.},
title = {The Art of Computer Programming},
volume = {2: Seminumerical Algorithms, second edition},
year = {1981},
publisher = {\Range{Addison}{Wesley}},
isbn = {0-201-03822-6},
}
Assuming that you wish to write a big integer code yourself, this can be surprisingly simple to do, spoken as someone who did it recently (though in MATLAB.) Here are a few of the tricks I used:
I stored each individual decimal digit as a double number. This makes many operations simple, especially output. While it does take up more storage than you might wish, memory is cheap here, and it makes multiplication very efficient if you can convolve a pair of vectors efficiently. Alternatively, you can store several decimal digits in a double, but beware then that convolution to do the multiplication can cause numerical problems on very large numbers.
Store a sign bit separately.
Addition of two numbers is mainly a matter of adding the digits, then check for a carry at each step.
Multiplication of a pair of numbers is best done as convolution followed by a carry step, at least if you have a fast convolution code on tap.
Even when you store the numbers as a string of individual decimal digits, division (also mod/rem ops) can be done to gain roughly 13 decimal digits at a time in the result. This is much more efficient than a divide that works on only 1 decimal digit at a time.
To compute an integer power of an integer, compute the binary representation of the exponent. Then use repeated squaring operations to compute the powers as needed.
Many operations (factoring, primality tests, etc.) will benefit from a powermod operation. That is, when you compute mod(a^p,N), reduce the result mod N at each step of the exponentiation where p has been expressed in a binary form. Do not compute a^p first, and then try to reduce it mod N.
Here's a simple ( naive ) example I did in PHP.
I implemented "Add" and "Multiply" and used that for an exponent example.
http://adevsoft.com/simple-php-arbitrary-precision-integer-big-num-example/
Code snip
// Add two big integers
function ba($a, $b)
{
if( $a === "0" ) return $b;
else if( $b === "0") return $a;
$aa = str_split(strrev(strlen($a)>1?ltrim($a,"0"):$a), 9);
$bb = str_split(strrev(strlen($b)>1?ltrim($b,"0"):$b), 9);
$rr = Array();
$maxC = max(Array(count($aa), count($bb)));
$aa = array_pad(array_map("strrev", $aa),$maxC+1,"0");
$bb = array_pad(array_map("strrev", $bb),$maxC+1,"0");
for( $i=0; $i<=$maxC; $i++ )
{
$t = str_pad((string) ($aa[$i] + $bb[$i]), 9, "0", STR_PAD_LEFT);
if( strlen($t) > 9 )
{
$aa[$i+1] = ba($aa[$i+1], substr($t,0,1));
$t = substr($t, 1);
}
array_unshift($rr, $t);
}
return implode($rr);
}

How to use bit manipulation to find 5th bit, and to return number of 1 bits in an integer

Assume Z is an unsigned integer. Using ~, <<, >>, &, | , +, and - provide statements which return the desired result.
I am allowed to introduce new binary values if needed.
I have these problems:
1.Extract the 5th bit from the left Z.
For this I was thinking about doing something like
x x x x x x x x
& 0 0 0 0 1 0 0 0
___________________
0 0 0 0 1 0 0 0
Does this make sense for extracting the fifth bit? I am not totally sure how I would make this work by using just Z when I do not know its values. (I am relatively new to all of this). Would this type of idea work though?
2.Return the number of 1 bits in Z
Here I kind of have no idea how to work this out. What I really need to know is how to work on just Z with the operators, but I m not sure exactly how to.
Like I said I am new to this, so any help is appreciated.
Problem 1
You’re right on the money. I’d do an & and a >> so that you get either a nice 0 or 1.
result = (z & 0x08) >> 3;
However, this may not be strictly necessary. For example, if you’re trying to check whether the bit is set as part of an if conditional, you can exploit C’s definition of anything nonzero as true.
if (z & 0x08)
do_stuff();
Problem 2
There are a whole variety of ways to do this. According to that page, the following methodology dates from 1960, though it wasn’t published in C until 1988.
for (result = 0; z; result++)
z &= z - 1;
Exactly why this works might not be obvious at first, but if you work through a few examples, you’ll quickly see why it does.
It’s worth noting that this operation – determining the number of 1 bits in a number – is sufficiently important to have a name (population count or Hamming weight) and, on recent Intel and AMD processors, a dedicated instruction. If you’re using GCC, you can use the __builtin_popcount intrinsic.
Problem 1 looks right, except you should finish it by shifting the result right by 4 to get that bit after the mask.
To implement the mask, you need to know what integer is represented by a single 5th bit. That number is incidentally 2^5 = 32. So you can just AND z with 32 and shift it right by 4.
Problem 2:
int answer = 0;
while (z != 0){ //stop when there are no more 1 bits in z
//the following masks the lowest bit in z and adds it into answer
//if z ends with a 0, nothing is added, otherwise 1 is added
answer += (z & 1);
//this shifts z right by 1 to get the next higher bit
z >>= 1;
}
return answer;
To find out the value of the fifth bit, you don't care about the bottom bits so you can get rid of them:
unsigned int answer = z >> 4;
The fifth bit becomes the bottom bit, so you can strip it off with a bitwise-AND:
answer = answer & 1;
To find the number of 1-bits in a number you can apply stakSmashr's solution. You could optimise this further if you know you need to count the number of bits in a lot of integers - precompute the number of bits in every possible 8-bit number and store it in a table. There will only be 256 entries in the table so it won't use much memory. Then, you can loop over your data one byte at a time and find the answer from the table. This lookup will be quicker than looping again over each bit.

nth smallest number with n bits set to 1

There is a sequence of increasing numbers that have the same number of binary 1s in them. Given n (the number of 1 bits set in each number in the series) write an algorithm or C program to find the n'th number in the series.
I found this question on internet and I think the answer is just (((1 << (n+1)) - 1) & ~2). Isn't that right? I found some scary programs to compute the answer.
(1 << n+1) - 3 is a more concise way to express the result, but yes, I believe your expression is also correct.
Yes, it's true. When we have 3 bits:
1: 00000111
2: 00001011
3: 00001101 // bit 1 will be 0
4: 00001110
so the answer is n+1 bits, where bit 1 is 0.
I believe you are right. A simpler way to write it though will be:
((1 << n) - 1) << 1.
The question doesn't seem to specify where the sequence begins, or how much the number will increase each time, whereas your answer appears to assume that the sequence will start with 011111, then move to 101111, and so on. It could conceivably start with 0011111000, and the next element could be 1111100000.
Edit
The question, as stated at https://groups.google.com/group/algogeeks/browse_thread/thread/5fda06c0be475c41/ is titled, "nth number of the series", so the title of this post ("nth smallest number with n bits set to 1") isn't really in keeping the the question's origin.

Fastest way to loop every number with conditions

Given a 64 bit integer, where the last 52 bits to be evaluated and the leading 12 bits are to be ignored, what is the fastest way to loop every single combination of 7 bits on and all other bits off?
Example:
First permutation:
0[x57]1111111
Last permutation
00000000000011111110[x45]
Where 0[xn] means n off (zero) bits.
Speed is absolutely crucial, we are looking to save every clock cycle we can as it is part of a greater solution that needs to evaluate billions of states in a reasonable amount of time.
A working solution is not required, but some pseudo code would do just fine :)
I think you'll be interested in this article: http://realtimecollisiondetection.net/blog/?p=78
It solves your problem in very efficient way.
What you need is a good algorithm that will take you from one permutation to the next in minimal time.
Now, the first algorithm that comes to mind is to go through all combinations with seven loops.
The first loop goes through the 52 bits, setting one for the next loop.
The second loop goes through the bits after the set one, setting one for the third loop.
...ect
This will give you the fastest iteration. Here is some pseudo C++ code:
__int64 deck;
int bit1, bit2, bit3, ...;
for (bit1=0;bit1<52-6;bit1++) {
for (bit2=bit1+1;bit2<52-5;bit2++) {
...
for (bit7=bit6+1;bit7<52;bit7++) {
deck = (1<<bit1)+(1<<bit2)+(1<<bit3)+...; // this could be optimized.
// do whatever with deck
}
...
}
}
// note: the 52-6, 52-5, will be pre-calculated by the compiler and are there for convenience. You don't have to worry about optimizing this.
There is your solution right there. If you want to check that it works, I always downscale it. For example, following that algorithm on a 4bit number where you need to set 2 bits would go like this:
1100
1010
1001
0110
0101
0011
I think there is a relationship between each permutation.
We can see the number increases with permutation # with a pattern.
This maths isn't correct for all solutions, but works for some, hopefully indicating what I mean:
Permutation 3 difference = ((3%7+1)^2) * (roundUp(3/7) = 16
Permutation 10 difference = ((10%7+1)^2) * (roundUp(10/7) = 32
So we would loop from the absolute values:
int perm = 1;
for int64 i = 127; perm < totalPermutations
{
i = i + ((perm%7+1)^2) * (roundUp(perm/7);
perm++;
}
Again the maths is wrong, but gives an idea, I am sure it is possible to come up with a formula for this. As to whether it outperforms bitwise operations would have to be tested.

Resources