Pretty print a double number in a fixed number of chars - c

What is the simplest solution to print a double (printf) in C so that:
exactly N characters are used (N will be around 6) for all double numbers (nan and infinities are handled separately), positive and negative alike (+ or - always as the first char);
decimal representation ('.' always present) is used unless the numeric chars would all be 0 (i.e. the number is too small) or the decimal point would be the last of the N chars (i.e. the number is too big); in those cases switch to scientific representation, still occupying exactly N chars.
All the solutions I can think of seem quite involved. Any idea how to obtain this result easily (efficiency is not a concern here)?
Thanks!

I could not find a way to do this via a single printf call; here is my solution.
At least 9 chars must be used, since (with + or - in front) that is the minimum number of chars for scientific notation (for example: +1.0E-002). In the following I consider the case of 9 chars. One of the following two formats is used, based on the conditions reported below:
Scientific format '%+.1e':
chars 4 to 9 as per decimal format are 0 and the number is not identical to 0 (i.e. too small for decimal)
the '.' char is not present between char 3 and char 8 as per decimal format (i.e. too large for decimal)
Decimal format '%+.6f':
Infinite or nan
All other cases
It's easy to adapt to a representation longer than 9 chars by changing the constants above.
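For illustration, here is a minimal sketch of this approach for N = 9 (print9 is a hypothetical helper name; note that the number of exponent digits %e emits is platform-dependent: MSVC prints three, as in +1.0E-002, while glibc prints two, so the scientific branch is exactly 9 chars only on the former):

#include <math.h>
#include <stdio.h>
#include <string.h>

static void print9(double d)
{
    char buf[350];                      /* %f output of a double can be long */

    if (!isfinite(d)) {                 /* nan and infinities: handled separately */
        printf("%+9f", d);
        return;
    }

    snprintf(buf, sizeof buf, "%+.6f", d);
    char *dot = strchr(buf, '.');

    /* too big: the '.' would be the last of the 9 chars, or beyond */
    int too_big = (dot - buf) >= 8;
    /* too small: chars 4 to 9 are all '0' yet the number is not 0 */
    int too_small = (d != 0.0 && strspn(buf + 3, "0") >= 6);

    if (too_big || too_small)
        printf("%+.1e", d);             /* scientific, e.g. +1.0e-002 */
    else
        printf("%.9s", buf);            /* keep the first 9 chars */
}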

Multiplication of 2 numbers with a maximum of 2000 digits [duplicate]

Implement a program to multiply two numbers, with the mention that the first can have a maximum of 2048 digits, and the second number is less than 100. HINT: multiplication can be done using repeated additions.
Up to a certain point, the program works using long double, but when working with larger numbers, only INF is displayed. Any ideas?
Implement a program to multiply two numbers, with the mention that the first can have a maximum of 2048 digits, and the second number is less than 100.
OK. The nature of multiplication is that if a number with N digits is multiplied by a number with M digits, then the result will have up to N+M digits. In other words, you need to handle a result that has up to 2148 digits.
A long double could be anything (it's implementation-dependent). Most likely (on Windows, or on non-80x86 hardware) it's a synonym for double, but sometimes it is larger (e.g. the 80-bit format described on this Wikipedia page). The best you can realistically hope for is a dodgy estimate with lots of precision loss, not a correct result.
The worst case (the most likely case) is that the exponent isn't big enough either. E.g. for double the (unbiased) exponent has to be in the range −1022 to +1023, so attempting to shove a 2148-digit number (over 7000 bits) in there will cause an overflow (an infinity).
What you're actually being asked to do is implement a program that uses "big integers". The idea would be to store the numbers as arrays of integers, like uint32_t result[7136/32]; (2148 decimal digits need about 7136 bits), so that you actually do have enough bits to get a correct result without precision loss or overflow problems.
With this in mind, you want a multiplication algorithm that can work with big integers. Note: I'd recommend something from the Wikipedia multiplication-algorithm page's "Algorithms for multiplying by hand" section - there are faster/more advanced algorithms that are way too complicated for (what I assume is) a university assignment.
Also, the "HINT: multiplication can be done using repeated additions" is a red herring to distract you. It'd take literally days for a computer do the equivalent of a while(source2 != 0) { result += source1; source2--; } with large numbers.
Here's a few hints.
Multiplying a 2048-digit string by a 100-digit string might yield a string with as many as 2148 digits. That's too big for any primitive C type, so you'll have to do all the math the hard way, against "strings". Stay in string space, since your input will most likely be read in as a string anyway.
Let's say you are trying to multiply "123456" x "789".
That's equivalent to 123456 * (700 + 80 + 9)
Which is equivalent to 123456 * 700 + 123456 * 80 + 123456 * 9
Which is equivalent to doing these steps:
result1 = Multiply 123456 by 7 and add two zeros at the end
result2 = Multiply 123456 by 8 and add one zero at the end
result3 = Multiply 123456 by 9
final result = result1+result2+result3
So all you need is a handful of primitives that can take a digit string of arbitrary length and do some math operations on it.
You just need these three functions:
// Returns a new string that is identical to s but with a specific number of
// zeros added to the end.
// e.g. MultiplyByPowerOfTen("123", 3) returns "123000"
char* MultiplyByPowerOfTen(char* s, size_t zerosToAdd)
{
    /* left as an exercise */
}

// Performs multiplication on the big integer represented by s
// by the specified digit
// e.g. Multiply("12345", 2) returns "24690"
char* Multiply(char* s, int digit) // where digit is between 0 and 9
{
    /* left as an exercise */
}

// Performs addition on the big integers represented by s1 and s2
// e.g. Add("12345", "678") returns "13023"
char* Add(char* s1, char* s2)
{
    /* left as an exercise */
}
Final hint. Any character at position i in your string can be converted to its integer equivalent like this:
int digit = s[i] - '0';
And any digit can be converted back to a printable char:
char c = '0' + digit;
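Putting those two conversions to work, here is one possible sketch of the Multiply primitive (my illustration, not the original answer's code; it assumes s contains only the characters '0'..'9' and returns a freshly malloc'd string):

#include <stdlib.h>
#include <string.h>

char* Multiply(char* s, int digit) // where digit is between 0 and 9
{
    size_t len = strlen(s);
    char *out = malloc(len + 2);       /* room for one carry digit plus NUL */
    size_t pos = len + 1;
    int carry = 0;

    out[pos] = '\0';
    for (size_t i = len; i-- > 0; ) {
        int prod = (s[i] - '0') * digit + carry;  /* char digit -> int */
        out[--pos] = '0' + prod % 10;             /* int digit -> char */
        carry = prod / 10;
    }
    if (carry)
        out[--pos] = '0' + carry;
    memmove(out, out + pos, len + 2 - pos);       /* drop the unused lead cell */
    return out;
}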

Why is log base 10 used in this code to convert int to string?

I saw a post explaining how to convert an int to a string. In the explanation there is a line of code to get the number of chars in a string:
(int)((ceil(log10(num))+1)*sizeof(char))
I’m wondering why log base 10 is used?
ceil(log10(num))+1 is incorrectly being used instead of floor(log10(num))+2.
The code is attempting to determine the amount of memory needed to store the decimal representation of the positive integer num as a string.
The two formulas presented above are equal except for numbers which are exact powers of 10, in which case the former version returns one less than the desired number.
For example, 10,000 requires 6 bytes, yet ceil(log10(10000))+1 returns 5. floor(log10(10000))+2 correctly returns 6.
How was floor(log10(num))+2 obtained?
A 4-digit number such as 4567 will be between 1,000 (inclusive) and 10,000 (exclusive), so it will be between 10³ (inclusive) and 10⁴ (exclusive), so log10(4567) will be between 3 (inclusive) and 4 (exclusive).
As such, floor(log10(num))+1 will return number of digits needed to represent the positive value num in decimal.
As such, floor(log10(num))+2 will return the amount of memory needed to store the decimal representation of the positive integer num as a string. (The extra char is for the NUL that terminates the string.)
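As a quick self-contained check of the corrected formula (my example, not from the original post):

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int num = 10000;                                /* a power of 10 */
    size_t size = (size_t)(floor(log10(num)) + 2);  /* digits + NUL = 6 */
    char *buf = malloc(size);
    snprintf(buf, size, "%d", num);
    printf("\"%s\" fits in %zu bytes\n", buf, size);
    free(buf);
    return 0;
}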
I’m wondering why log base 10 is used?
I'm wondering the same thing. It uses a very complex calculation that happens at runtime, to save a couple bytes of temporary storage. And it does it wrong.
In principle, you get the number of digits in base 10 by taking the base-10 logarithm and flooring and adding 1. It comes exactly from the fact that
log10(1) = log10(10⁰) = 0
log10(10) = log10(10¹) = 1
log10(100) = log10(10²) = 2
and all numbers between 10 and 100 have their logarithms between 1 and 2 so if you floor the logarithm for any two digit number you get 1... add 1 and you get the number of digits.
But you do not need to do this at runtime. The maximum number of bytes needed for a 32-bit int in base 10 is 12 chars: 10 digits, negative sign, and null terminator. The maximum you can save with the runtime calculation is 10 bytes of RAM, and since it is usually temporary storage it is not worth it. If it is stack memory, well, the calls to log10, ceil and so forth might require far more.
In fact, we know the maximum number of bits needed to represent an integer: sizeof (int) * CHAR_BIT. This is greater than or equal to log2 of INT_MAX + 1. And we know that log2(x) =~ 3.32192809489 * log10(x), so we get a good (possibly floored) approximation of log10(INT_MAX) by just dividing sizeof (int) * CHAR_BIT by 3. Then add 1 because we were supposed to add 1 to the floored logarithm to get the number of digits, then 1 for a possible sign, and 1 for the null terminator, and we get
sizeof (int) * CHAR_BIT / 3 + 3
Unlike the one from your question, this is an integer constant expression, i.e. the compiler can easily fold it at compile time, and it can be used to set the size of a statically sized array. For 32 bits it gives 13, which is only one more than the 12 actually required; for 16 bits it gives 8, again only one more than the maximum required 7; and for 8 bits it gives 5, which is the exact maximum.
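For illustration, here is that constant expression used as a compile-time buffer size (INT_DEC_BUFSIZE is a made-up macro name):

#include <limits.h>
#include <stdio.h>

#define INT_DEC_BUFSIZE (sizeof (int) * CHAR_BIT / 3 + 3)

int main(void)
{
    char buf[INT_DEC_BUFSIZE];                /* 13 bytes when int is 32-bit */
    snprintf(buf, sizeof buf, "%d", INT_MIN); /* worst case: sign + 10 digits */
    printf("%s fits in %zu bytes\n", buf, sizeof buf);
    return 0;
}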
ceil(log10(num)) + 1 is intended to provide the number of characters needed for the output string.
For example, if num=101, the expression's value is 4, the correct length of '101' plus the null terminator.
But if num=100, the value is 3. This behavior is incorrect.
This is because it's allocating enough space for the number to fit in the string.
If, for example, you had the number 1034, log10(1034) = 3.0145.... ceil(3.0145) is 4, which is the number of digits in the number. The + 1 is for the null-terminator.
This isn't perfect though: take 1000, for example. Despite having four digits, log(1000) = 3, and ceil(3) = 3, so this will allocate space for too few digits. Plus, as @phuclv mentions below, the log() function is very time-consuming for this purpose, especially since the length of a number has a (relatively low) upper bound.
The reason it's log base 10 is because, presumably, this function represents the number in decimal form. If, for example, it were hexadecimal, log base 16 would be used.
A number N has n decimal digits iff 10^(n-1) <= N < 10^n which is equivalent to n-1 <= log(N) < n or n = floor(log(N)) + 1.
Since the double representation has only limited precision, floor(log(N)) may be off by 1 for certain values, so it is safer to allow for an extra digit, i.e. allocate floor(log(N)) + 2 characters, and then another char for the nul terminator, for a total of floor(log(N)) + 3.
The expression in the original question ceil(log(N)) + 1 appears to not count the nul terminator, and neither allow for the chance of rounding errors, so it is one shorter in general, and two shorter for powers of 10.

Algorithm for printing the decimal value of a huge (over 128 bits) binary number?

TLDR, at the bottom :)
Brief:
I am in the process of creating a basic arithmetic library (addition, subtraction, ...) for handling huge numbers. One of the problems I am facing is printing these huge binary numbers as decimal.
I have a huge binary number stored in an array of uint64_t, e.g.
uint64_t a[64] = {0};
Now, the goal is to print the 64*64-bit binary number in the console/file as its decimal value.
Initial Work:
To elaborate the problem, I want to describe how I printed the hex value.
int i;
int s = 1;
a[1] = (uint64_t)0xFF;
for (i = s; i >= 0; i--)
{
    printf("0x%08llX, ", a[i]);
}
Output:
0x000000FF, 0x00000000,
Similarly, for printing the OCT value I can just take the 3 LSBs of a, print the octal digit they represent, right-shift all the bits of a by 3, and keep repeating until all of a has been printed (printing in reverse order to keep the first Oct digit on the right).
I can print the Hex and Oct value of a binary number of unlimited size just by repeating this unit algorithm, but I could not find/develop one for Decimal which I can repeat over and over again to print a (or something bigger).
What I have thought of:
My initial idea was to keep subtracting
max_64 = (uint64_t)10000000000000000000ULL; // i.e. 10^19
the biggest power of 10 that fits inside uint64_t, from a until the value inside a is smaller than max_64 (which is basically equivalent to rem_64 = a % max_64) and print the rem_64 value using
printf("%019llu", rem_64);
which gives the last (least significant) 19 decimal digits of the number a.
Then do an arithmetic operation similar to (not the code):
a = a/max_64; /* Integer division (no fractional part) to remove the rightmost 19 dec digits from 'a' */
and keep repeating, printing 19 decimal digits at a time (printed so that the first 19 digits found go on the right, the next 19 to their left, and so on...).
The problem is that this process is too slow, and I don't want to use all of this just to print the dec value. I was looking for a process which avoids these huge, time-consuming arithmetic operations.
What I believe is that there must be a way to print a number of huge size just by repeating a unit algorithm (similar to how Hex and Oct can be printed), and I hope someone could point me in the right direction.
What my library can do (so far):
Add (Using Full-Adder)
Sub (Using Full-subtractor)
Compare (by comparing array size and comparing array elements)
Div (Integer division, no fractional part)
Modulus (%)
Multiplication (basically adding several times :( )
I will write code for other operations if needed, but I would like to implement the printing function independently of the library if possible.
Consider the problem like this:
You have been given a binary number X of n bits (1 <= n <= 64*64); you have to print out X in decimal. You can use an existing library if absolutely needed, but it is better not to.
TLDR:
Any code, reference, or unit algorithm that I can repeat for printing the decimal value of a binary number of very big and/or unknown size would be much appreciated. Emphasis on algorithm: I don't need code; if someone can describe a process, I will be able to implement it. Thanks in advance.
When faced with such doubts, and given that there are many bigint libraries out there, it is interesting to look into their code. I had a look at Java's BigInteger, which has a toString method, and they do two things:
for small numbers, they bite the bullet and do something similar to what you proposed - straightforward link-by-link base conversion, outputting decimal numbers in each step.
for large numbers, they use the recursive Schönhage algorithm, which they quote in the comments as being referred to in, among other places: Knuth, Donald, The Art of Computer Programming, Vol. 2, Answers to Exercises (4.4) Question 14.
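For reference, here is a minimal sketch of that straightforward small-number path applied to the asker's layout (my illustration, not BigInteger's actual code). It assumes a[0] is the least significant word and relies on the unsigned __int128 extension of GCC/Clang for the two-word division:

#include <inttypes.h>
#include <stdio.h>

#define LIMBS 64    /* the number lives in uint64_t a[LIMBS], a[0] = LSW */

/* One "unit step": divide the whole number by 10^19 in place and return
   the remainder, i.e. peel off the 19 least significant decimal digits. */
static uint64_t divmod_1e19(uint64_t a[LIMBS])
{
    const uint64_t d = UINT64_C(10000000000000000000);  /* 10^19 */
    unsigned __int128 rem = 0;
    for (int i = LIMBS - 1; i >= 0; i--) {
        unsigned __int128 cur = (rem << 64) | a[i];
        a[i] = (uint64_t)(cur / d);
        rem  = cur % d;
    }
    return (uint64_t)rem;
}

static int is_zero(const uint64_t a[LIMBS])
{
    for (int i = 0; i < LIMBS; i++)
        if (a[i] != 0) return 0;
    return 1;
}

static void print_decimal(uint64_t a[LIMBS])  /* destroys a */
{
    uint64_t chunk[LIMBS * 64 / 63 + 1];      /* generous bound on chunk count */
    int n = 0;
    do {
        chunk[n++] = divmod_1e19(a);          /* lowest 19 digits first */
    } while (!is_zero(a));

    printf("%" PRIu64, chunk[--n]);           /* top chunk: no zero padding */
    while (n > 0)
        printf("%019" PRIu64, chunk[--n]);    /* inner chunks: pad to 19 */
    printf("\n");
}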

e format in printf() and precision modifiers

Could you explain to me why
printf("%2.2e", 1201.0);
gives a result 1.20e+03 and not just 12.01e2?
My way of thinking: the number is 1201.0, and the specifier tells us that there should be 2 digits after the decimal point.
What is wrong?
According to Wikipedia:
In normalized scientific notation, the exponent b is chosen so that the absolute value of a remains at least one but less than ten (1 ≤ |a| < 10). Thus 350 is written as 3.5×10². This form allows easy comparison of numbers, as the exponent b gives the number's order of magnitude. In normalized notation, the exponent b is negative for a number with absolute value between 0 and 1 (e.g. 0.5 is written as 5×10⁻¹). The 10 and exponent are often omitted when the exponent is 0.
Normalized scientific form is the typical form of expression of large numbers in many fields, unless an unnormalised form, such as engineering notation, is desired. Normalized scientific notation is often called exponential notation—although the latter term is more general and also applies when a is not restricted to the range 1 to 10 (as in engineering notation for instance) and to bases other than 10 (as in 3.15×2²⁰).
The first 2 in "%2.2e" is the minimum character width to print. 1.20e+03 is 8 characters which is more than 2.
e directs that the number is printed: (sign), 1 digit, '.', followed by some digits and an exponent.
The 2nd 2 in "%2.2e" is the number of digits after the decimal point to print. 6 is used if this 2nd value is not provided.
The %e format uses scientific notation, i.e. one digit before the decimal separator and an exponent for scaling. You can't set the digits before the decimal separator using this format.
This is just how the scientific notation is defined. The result you expect is a very weird notation. I don't think you can get it with printf.
The number before the dot in the format specifier defines the minimum width of the resulting sub-string. Try %20.2e to see what that means.
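A small self-contained example showing both fields in action:

#include <stdio.h>

int main(void)
{
    printf("[%2.2e]\n", 1201.0);   /* [1.20e+03]  output already wider than 2 */
    printf("[%20.2e]\n", 1201.0);  /* [            1.20e+03]  padded to 20 */
    printf("[%.4e]\n", 1201.0);    /* [1.2010e+03]  4 digits after the point */
    return 0;
}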

GMP most significant digits

I'm performing some calculations on arbitrary-precision integers using the GNU Multiple Precision (GMP) library. Then I need the decimal digits of the result. But not all of them: just, let's say, a hundred of the most significant digits (that is, the digits the number starts with) or a selected range of digits from the middle of the number (e.g. digits 100..200 of a 1000-digit number).
Is there any way to do it in GMP?
I couldn't find any functions in the documentation to extract a range of decimal digits as a string. The conversion functions which convert mpz_t to character strings always convert the entire number. One can only specify the radix, but not the starting/ending digit.
Is there any better way to do it other than converting the entire number into a humongous string only to take a small piece of it and throw out the rest?
Edit: What I need is not to control the precision of my numbers or limit it to a particular fixed number of digits, but to select a subset of digits from the digit string of a number of arbitrary precision.
Here's an example of what I need:
7^1316831 = 19821203202357042996...2076482743
The actual number has 1112852 digits, which I contracted into the "...".
Now, I need only an arbitrarily chosen substring of this humongous string of digits. For example, the ten most significant digits (1982120320 in this case). Or the digits from 1112841th to 1112849th (21203202 in this case). Or just a single digit at the 1112841th position (2 in this case).
If I were to first convert my GMP number to a string of decimal digits with mpz_get_str, I would have to allocate a tremendous amount of memory for these digits only to use a tiny fraction of them and throw out the rest. (Not to mention that the original mpz_t number in binary representation already eats up quite a lot.)
If you know the number of decimal digits of x = 7^1316831 in advance (e.g., 1112852), then you get your lower, say, 10 digits with:
x % (10^10), and the upper 20 digits with:
x / (10^(1112852 - 20)).
Note, I get 19821203202357042995 for the latter; 5 as the final digit, not 6.
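Here is a sketch of that suggestion using GMP's mpz functions (the variable names and the 20/10-digit choices are mine):

#include <gmp.h>
#include <stdio.h>

int main(void)
{
    mpz_t x, q, p;
    mpz_inits(x, q, p, NULL);

    mpz_ui_pow_ui(x, 7, 1316831);            /* x = 7^1316831 */
    size_t nd = mpz_sizeinbase(x, 10);       /* digit count (may be 1 too big) */

    mpz_ui_pow_ui(p, 10, nd - 20);
    mpz_tdiv_q(q, x, p);                     /* q = roughly the top 20 digits */
    gmp_printf("top digits: %Zd\n", q);

    mpz_ui_pow_ui(p, 10, 10);
    mpz_tdiv_r(q, x, p);                     /* q = the low 10 digits */
    gmp_printf("low digits: %010Zd\n", q);

    mpz_clears(x, q, p, NULL);
    return 0;
}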
I don't think you can do that in GMP. However, you can use the Boost Multiprecision Library:
Depending upon the number type, precision may be arbitrarily large (limited only by available memory), fixed at compile time (for example 50 or 100 decimal digits), or a variable controlled at run-time by member functions. The types are expression-template-enabled for better performance than naive user-defined types.
Emphasis mine
Another alternative is ttmath, with the type ttmath::Big<e,m>, whose precision you can control. Any fixed-precision type will work, provided that you only need the most significant digits, since such types drop the low significant digits, just as float and double do. Those digits don't affect the high digits of the result, hence can be omitted safely. For instance, if you need the high 20 digits then use a type that can store 20 digits and a little more, in order to provide enough data for correct rounding later.
For demonstration let's take a simple example of 7^7 = 823543, where you only need the top 2 digits. Using a 4-digit type for calculation you'll get this
7^5 = 16807 => round to 1681×10¹ and store
7^5×7 = 1681×10¹×7 = 11767×10¹ ≈ 1177×10²
7^5×7×7 = 1177×10²×7 = 8239×10²
As you can see the top digits are the same even without needing to get the full exact result. Calculating the full precision using GMP not only wastes a lot of time but also memory. Think about the amount of memory you need to store the result of another operation on 2 bigints to get the digits you want. By fixing the precision instead of leaving it at infinite you'll decrease the CPU and memory usage significantly.
If you need the 100th to 200th high-order digits then use a type that has enough room for 201 digits and a little more, and extract those 101 digits after the calculation. But this will be more wasteful, so you may want to change to an arbitrary-precision (or fixed-precision) type that uses a base that's a power of 10 for its limbs (I'm using GMP notation here). For example, if the type uses base 10⁹ then each limb represents 9 digits in the decimal output, and you can get an arbitrary decimal digit directly without any conversion from binary to decimal. That means zero waste for the string. I'm not sure which libraries use base 10ⁿ, but you can look at Mini-Pi's implementation, which uses base 10⁹, or write it yourself. This way it also works for efficiently getting the high digits.
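As a tiny illustration of why base-10⁹ limbs make digit extraction trivial (hypothetical layout: limb[i] holds decimal digits 9i..9i+8 of the number, least significant limb first):

#include <stdint.h>
#include <stdio.h>

/* Digit k (counted from the least significant, 0-based) is a pure lookup. */
static int digit_at(const uint32_t *limb, size_t k)
{
    static const uint32_t pow10[9] = { 1, 10, 100, 1000, 10000,
                                       100000, 1000000, 10000000, 100000000 };
    return limb[k / 9] / pow10[k % 9] % 10;
}

int main(void)
{
    uint32_t limbs[1] = { 823543 };    /* 7^7 stored in base 10^9: one limb */
    printf("%d%d\n", digit_at(limbs, 5), digit_at(limbs, 4));   /* prints 82 */
    return 0;
}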
See
How are extremely large floating-point numbers represented in memory?
What is the simplest way of implementing bigint in C?
