strtod with base parameter - c

I don't want to unnecessarily re-invent the wheel, but I have been looking for the functionality of strtod but with a base parameter (2,8,10,16). (I know strtoul allows a base parameter but I'm looking for return type double). Any advice / pointers in the right direction? Thanks.

For arbitrary base, this is a hard problem, but as long as your base is a power of two, the plain naive algorithm will work just fine.
strtod (in C99) supports hex floats in the same format as the C language's hex float constants. 0x prefix is required, p separates the exponent, and the exponent is in base 10 and represents a power of 2. If you need to support pre-C99 libraries, you'll have no such luck. But since you need base 2/4/8 too, it's probably just best to roll your own anyway.
Edit: An outline of the naive algorithm:
Start with a floating point accumulator variable (double or whatever, as you prefer) initialized to 0.
Starting from the leftmost digit, and up to the radix point, for each character you process, multiply the accumulator by the base and add the value of the character as a digit.
After the radix point, start a new running place-value variable, initially 1/base. On each character you process, add the digit value times the place-value variable, and then divide the place-value variable by base.
If you see the exponent character, read the number following it as an integer and use one of the standard library functions to scale a floating point number by a power of 2.
If you want to correctly round inputs that have more digits than the significand can hold, you have to work out that logic once you exceed the number of significant places in step 2 or 3. Otherwise you can ignore it.

Unlikely - I have never seen floating point numbers coded as 'decimals' in other number bases.

Related

C float and double comparisons

I'm comparing simple floats and doubles in C, specifically the value 8.7 for both of them. Now I assign 8.7 to each variable; when I print, I get a result of 8.7000 for both values. Why has the compiler added these zeros? And the main question I wanted to ask was: are there any further numbers that I'm not seeing, as in hidden after the trailing zeros? I read that I shouldn't do comparisons like this with float because of a lack of precision, but I thought with such a small value surely it can store 8.7 with the degree of accuracy needed to compare itself with another 8.7 value?
My only worry is that it's actually being represented somewhere in memory as e.g. 8.70000003758 or something, which is throwing my comparisons off. I tried printf with %.20f to see any further numbers that might be hiding, but I think that just created numbers that were otherwise not there, as the whole accuracy of the number changed to 8.6918734634834929 or something similar.
I'm comparing simple floats and doubles in C, specifically the value 8.7 for both of them.
Bad choice, since 8.7 has no exact binary representation.
Now I assign 8.7 to each variable; when I print, I get a result of 8.7000 for both values. Why has the compiler added these zeros?
It hasn't, your print routine has.
And the main question I wanted to ask was: are there any further numbers that I'm not seeing, as in hidden after the trailing zeros?
Definitely, since 8.7 has no exact binary representation. (Try to write it out as the sum of integer powers of 2, you can't do it.)
I read that I shouldn't do comparisons like this with float because of a lack of precision, but I thought with such a small value surely it can store 8.7 with a degree of accuracy needed to compare itself with another 8.7 value?
You thought wrong. 1/3 is small but has no exact decimal representation with a finite number of digits. Whether a value is big or small has nothing to do with whether it can be represented exactly with a finite number of digits in a particular base.
My only worry is that it's actually being represented somewhere in memory as e.g. 8.70000003758 or something, which is throwing my comparisons off.
Exactly, just as representing 1/3 as 0.333333333 would do.
I tried to printf with %.20f to see any further numbers that might be hiding but I think that just created numbers that were otherwise not there as the whole accuracy of the number changed to 8.6918734634834929 or something similar.
That's probably just a bug. Show us that code. (Note that a missing l is harmless with printf: %f and %lf are equivalent there, since float arguments are promoted to double in variadic calls. It does matter with scanf.)

Calculate square root using integer arithmetic

I want to calculate the square root of some integer without using any floating point arithmetic. The catch, however, is that I don't want to discard precision from the output. That is to say, I do not want a rounded integer as the result, I would like to achieve the decimal point value as well, at least to two significant digits. As an example:
sqrt(9) = 3
sqrt(10) = 3.16
sqrt(999999) = 999.99
I've been thinking about it but I haven't particularly come up with solutions, nor has searching helped much since most similar questions are just that, only similar.
Output is acceptable in any form which is not a floating point number and accurately represents the data. Preferably, I would have two ints, one for the portion before the decimal and one for the portion after the decimal.
I'm okay with just pseudo-code or an explained algorithm, though if code helps, C would be best. Thanks.
You can calculate an integer numerator and an integer denominator, such that the floating-point division of the numerator by the denominator will yield the square root of the input number.
Please note, however, that no square-root method can be 100% accurate for every natural number, since the square root of such a number can be irrational.
Here is the algorithm:
Function (input number, input num_of_iterations, output root):
    Set root.numerator   = number
    Set root.denominator = 1
    Repeat num_of_iterations times:
        Set root = root - (root^2 - number) / (2 * root)    (using fraction arithmetic)
You might find this C++ implementation useful (it also includes the conversion of the numerator divided by the denominator into a numerical string with predefined floating-point precision).
Please note that no floating-point operations are required (as demonstrated at the given link).

Floating Point Square Root Reciprocal Method Correct Rounding

I have implemented a 32-bit IEEE-754 Floating Point Square Root using the Newton-Raphson method (in assembly) based upon finding the reciprocal of the square root.
I am using the round-to-nearest rounding method.
My square root method only accepts normalized values and zeros, but no denormalized values or special values (NaN, Inf, etc.)
I am wondering how I can ACHIEVE correct rounding (with assembly like instructions) so that my results are correct (to IEEE-754) for all inputs?
Basically, I know how to test if my results are correct, but I want to adjust the algorithm below so that I obtain correctly rounded results. What instructions should I add to the algorithm?
See: Determining Floating Point Square Root
for more information
Thank you!
There are only about 2 billion floats matching your description. Try them all, compare against sqrtf from your C library, and examine all differences. You can get a higher-precision square root using sqrt or sqrtl from your C library if you are worried. sqrt, sqrtf, and sqrtl are correctly-rounded by typical C libraries, though, so a direct comparison ought to work.
Why not square the result, and if it's not equal to the input, add or subtract (depending on the sign of the difference) a least significant bit, square, and check whether that would have given a better result?
Better here could mean with less absolute difference. The only case where this could get tricky is when "crossing" √2 with the mantissa, but this could be checked once and for all.
EDIT
I realize that the above answer is insufficient. Simply squaring in 32-bit FP and comparing to the input doesn't give you enough information. Let's say y = your_sqrt(x). You compare y² to x, find that y² > x, subtract 1 LSB from y obtaining z (y1 in your comments), then compare z² to x and find that not only z² < x, but, within the available bits, y²-x == x-z². How do you choose between y and z? You should either work with all the bits (I guess this is what you were looking for), or at least with more bits (which I guess is what njuffa is suggesting).
From a comment of yours I suspect you are on strictly 32-bit hardware, but let me suppose that you have a 32-bit by 32-bit integer multiplication with 64-bit result available (if not, it can be constructed). If you take the 23 bits of the mantissa of y as an integer, put a 1 in front, and multiply it by itself, you have a number that, except for a possible extra shift by 1, you can directly compare to the mantissa of x treated the same way. This way you have all 48 bits available for the comparison, and can decide without any approximation whether abs(y²-x) ≷ abs(z²-x).
If you are not sure you are within one LSB of the final result (but you are sure you are not much farther than that), you should repeat the above until y²-x changes sign or hits 0. Watch out for edge cases, though, which should essentially be the cases when the exponent is adjusted because the mantissa crosses a power of 2.
It can also be helpful to remember that positive floating point numbers can be correctly compared as integers, at least on those machines where 1.0F is 0x3f800000.

C: Adding Exponentials

What I thought was a trivial addition in standard C code compiled by GCC has confused me somewhat.
If I have a double called A and also a double called B, and A is a very small value, say 1e-20, and B is a larger value, for example 1e-5, why does my double C, which equals the summation A+B, take on the dominant value B? I was hoping that when I specify to print to 25 decimal places I would get 1.00000000000000100000e-5.
Instead what I get is just 1.00000000000000000000e-5. Do I have to use long double or something else?
Very confused, and an easy question for most to answer I'm sure! Thanks for any guidance in advance.
Yes, there is not enough precision in the double mantissa. 2^53 (the precision of the double mantissa) is about 9×10^15, only about an order of magnitude larger than 10^15 (the ratio between 1e-5 and 1e-20), so binary expansion and round-off can easily squash small bits at the end.
http://en.wikipedia.org/wiki/Double-precision_floating-point_format
Google is your friend etc.
Floating point variables can hold a bigger range of values than fixed point, but their precision in significant digits is limited.
You can represent very big or very small numbers, but precision depends on the number of significant digits available.
If you try to perform operations on numbers that are very far apart in exponent, the result depends on being able to represent both with the same exponent.
In your case, when you sum the two numbers, the smaller number is shifted to match the exponent of the bigger one; its significant digits fall off the end of the mantissa, so it contributes little or nothing.
You can learn more, for example, on Wikipedia.

What is the most efficient way to store and work with a floating point number with 1,000,000 significant digits in C?

I'm writing a utility to calculate π to a million digits after the decimal. On a 32- or 64-bit consumer desktop system, what is the most efficient way to store and work with such a large number accurate to the millionth digit?
clarification: The language would be C.
Forget floating point; you need bit strings that represent integers.
This takes a bit less than 1/2 megabyte per number. "Efficient" can mean a number of things: space-efficient? Time-efficient? Easy to program with?
Your question is tagged floating-point, but I'm quite sure you do not want floating point at all. The entire idea of floating point is that our data is only known to a few significant figures and even the famous constants of physics and chemistry are known precisely to only a handful or two of digits. So there it makes sense to keep a reasonable number of digits and then simply record the exponent.
But your task is quite different. You must account for every single bit. Given that, no floating point or decimal arithmetic package is going to work unless it's a template you can arbitrarily size, and then the exponent will be useless. So you may as well use integers.
What you really really need is a string of bits. This is simply an array of convenient types. I suggest <stdint.h> and simply using uint32_t[125000] (or 64) to get started. This actually might be a great use of the more obscure constants from that header that pick out bit sizes that are fast on a given platform.
To be more specific we would need to know more about your goals. Is this for practice in a specific language? For some investigation into number theory? If the latter, why not just use a language that already supports bignums, like Ruby?
Then the storage is someone else's problem. But if what you really want is to implement a big-number package yourself, I might suggest using BCD (4-bit) strings or even ordinary ASCII 8-bit strings of printable digits, simply because things will be easier to write and debug, and maximum space and time efficiency may not matter so much.
I'd recommend storing it as an array of short ints, one per digit, and then carefully write utility classes to add and subtract portions of the number. You'll end up moving from this array of ints to floats and back, but you need a 'perfect' way of storing the number - so use its exact representation. This isn't the most efficient way in terms of space, but a million ints isn't very big.
It's all in the way you use the representation. Decide how you're going to 'work with' this number, and write some good utility functions.
If you're willing to tolerate computing pi in hex instead of decimal, there's a very cute algorithm that allows you to compute a given hexadecimal digit without knowing the previous digits. This means, by extension, that you don't need to store (or be able to do computation with) million digit numbers.
Of course, if you want to get the nth decimal digit, you will need to know all of the hex digits up to that precision in order to do the base conversion, so depending on your needs, this may not save you much (if anything) in the end.
Unless you're writing this purely for fun and/or learning, I'd recommend using a library such as GMP (the GNU Multiple Precision library). Look into the mpf_t data type and its associated functions for storing arbitrary-precision floating-point numbers.
If you are just doing this for fun/learning, then represent numbers as an array of chars, with each array element storing one decimal digit. You'll have to implement long addition, long multiplication, etc.
Try PARI/GP, see wikipedia.
You could store its decimal digits as text in a file and mmap it to an array.
I once worked on an application that used really large numbers (but didn't need good precision). What we did was store the numbers as logarithms, since you can store a pretty big number as a log10 within an int.
Think along these lines before resorting to bit stuffing or some complex bit representations.
I am not too good with complex math, but I reckon there are solutions which are elegant when storing numbers with millions of bits of precision.
IMO, any programmer of arbitrary-precision arithmetic needs an understanding of base conversion. It solves two problems at once: being able to calculate pi in hex digits and convert the result to decimal representation, and finding the optimal container.
The dominant constraint is the number of correct bits in the multiplication instruction.
In JavaScript one always has 53 bits of accuracy, meaning that a Uint32Array with numbers having at most 26 bits can be processed natively (a waste of 6 bits per word).
In a 32-bit architecture with C/C++ one can easily get A*B mod 2^32, suggesting a basic element of 16 bits. (Those can be parallelized in many SIMD architectures, starting with MMX.) Each 16-bit element can also hold a 4-digit decimal number (wasting about 2.5 bits per word).
