Why is log base 10 used in this code to convert int to string? - c

I saw a post explaining how to convert an int to a string. In the explanation there is a line of code to get the number of chars in a string:
(int)((ceil(log10(num))+1)*sizeof(char))
I’m wondering why log base 10 is used?

ceil(log10(num))+1 is incorrectly being used instead of floor(log10(num))+2.
The code is attempting to determine the amount of memory needed to store the decimal representation of the positive integer num as a string.
The two formulas presented above are equal except for numbers which are exact powers of 10, in which case the former version returns one less than the desired number.
For example, 10,000 requires 6 bytes, yet ceil(log10(10000))+1 returns 5. floor(log10(10000))+2 correctly returns 6.
How was floor(log10(num))+2 obtained?
A 4-digit number such as 4567 will be between 1,000 (inclusive) and 10,000 (exclusive), so it will be between 10³ (inclusive) and 10⁴ (exclusive), so log10(4567) will be between 3 (inclusive) and 4 (exclusive).
As such, floor(log10(num))+1 will return the number of digits needed to represent the positive value num in decimal.
As such, floor(log10(num))+2 will return the amount of memory needed to store the decimal representation of the positive integer num as a string. (The extra char is for the NUL that terminates the string.)

I’m wondering why log base 10 is used?
I'm wondering the same thing. It uses a very complex calculation that happens at runtime, to save a couple bytes of temporary storage. And it does it wrong.
In principle, you get the number of digits in base 10 by taking the base-10 logarithm and flooring and adding 1. It comes exactly from the fact that
log10(1) = log10(10⁰) = 0
log10(10) = log10(10¹) = 1
log10(100) = log10(10²) = 2
and all numbers between 10 and 100 have their logarithms between 1 and 2 so if you floor the logarithm for any two digit number you get 1... add 1 and you get the number of digits.
But you do not need to do this at runtime. The maximum number of bytes needed for a 32-bit int in base 10 is 10 digits, plus a negative sign and a null terminator, for 12 chars. The maximum you can save with the runtime calculation is 10 bytes of RAM, but the buffer is usually temporary so it is not worth it. If it is stack memory, well, the calls to log10, ceil and so forth might require far more.
In fact, we know the maximum number of bits needed to represent an integer: sizeof (int) * CHAR_BIT. This is greater than or equal to log2(INT_MAX + 1). And we know that log2(x) ≈ 3.32192809489 * log10(x), so we get a good (slightly high, which is what we want) approximation of log10(INT_MAX) by just dividing sizeof (int) * CHAR_BIT by 3. Then add 1, because we were supposed to add 1 to the floored logarithm to get the number of digits, then 1 for a possible sign, and 1 for the null terminator, and we get
sizeof (int) * CHAR_BIT / 3 + 3
Unlike the one from your question, this is an integer constant expression, i.e. the compiler can easily fold it at compile time, and it can be used to set the size of a statically-sized array. For 32-bit ints it gives 13, which is only one more than the 12 actually required; for 16 bits it gives 8, again only one more than the maximum required 7; and for 8 bits it gives 5, which is the exact maximum.
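For instance, a minimal sketch using the constant expression to size a buffer (INT_STR_MAX and int_to_decimal are our names, not from the answer):

```c
#include <limits.h>   /* CHAR_BIT, INT_MIN */
#include <stdio.h>    /* snprintf */

/* The compile-time bound from the answer above:
   digits + possible sign + terminating NUL for any int. */
#define INT_STR_MAX (sizeof(int) * CHAR_BIT / 3 + 3)

/* Convert v to decimal text; the bound guarantees no truncation. */
const char *int_to_decimal(int v, char buf[INT_STR_MAX])
{
    snprintf(buf, INT_STR_MAX, "%d", v);
    return buf;
}
```

On a 32-bit int the buffer is 13 bytes, comfortably holding "-2147483648" and its NUL.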

ceil(log10(num)) + 1 is intended to provide the number of characters needed for the output string.
For example, if num=101, the expression's value is 4, the correct length of '101' plus the null terminator.
But if num=100, the value is 3. This behavior is incorrect.

This is because it's allocating enough space for the number to fit in the string.
If, for example, you had the number 1034, log10(1034) = 3.0145.... ceil(3.0145) is 4, which is the number of digits in the number. The + 1 is for the null-terminator.
This isn't perfect though: take 1000, for example. Despite having four digits, log(1000) = 3, and ceil(3) = 3, so this will allocate space for too few digits. Plus, as phuclv mentions in the comments, the log() function is very time-consuming for this purpose, especially since the length of a number has a (relatively low) upper bound.
The reason it's log base 10 is because, presumably, this function represents the number in decimal form. If, for example, it were hexadecimal, log base 16 would be used.

A number N has n decimal digits iff 10^(n-1) <= N < 10^n which is equivalent to n-1 <= log(N) < n or n = floor(log(N)) + 1.
Since double representation has only limited precision, floor(log(N)) may be off by 1 for certain values, so it is safer to allow for an extra digit, i.e. allocate floor(log(N)) + 2 characters, and then another char for the nul terminator, for a total of floor(log(N)) + 3.
The expression in the original question, ceil(log(N)) + 1, appears not to count the nul terminator, nor to allow for the chance of rounding errors, so it is one char short in general, and two short for powers of 10.

Related

Multiplication of 2 numbers with a maximum of 2000 digits [duplicate]

This question already has answers here:
What is the simplest way of implementing bigint in C?
How can I compute a very big number (e.g. 1000 digits) in C and print it out using an array?
Store very big numbers in an integer in C
Implement a program to multiply two numbers, with the mention that the first can have a maximum of 2048 digits, and the second number is less than 100. HINT: multiplication can be done using repeated additions.
Up to a certain point, the program works using long double, but when working with larger numbers, only INF is displayed. Any ideas?
Implement a program to multiply two numbers, with the mention that the first can have a maximum of 2048 digits, and the second number is less than 100.
OK. The nature of multiplication is that if a number with N digits is multiplied by a number with M digits, then the result will have up to N+M digits. Here the first number has up to 2048 digits and the second (being less than 100) has at most 2, so you need to handle a result with up to 2050 decimal digits.
A long double could be anything (it's implementation dependent). Most likely (on Windows, or on anything that isn't 80x86) it's a synonym for double, but sometimes it's larger (e.g. the 80-bit format described on this Wikipedia page). The best you can realistically hope for is a dodgy estimate with lots of precision loss, not a correct result.
The worst case (the most likely case) is that the exponent isn't big enough either. E.g. for double the (unbiased) exponent has to be in the range −1022 to +1023, so attempting to shove a 2048-digit number (around 6800 bits) in there will cause an overflow (an infinity).
What you're actually being asked to do is implement a program that uses "big integers". The idea would be to store the numbers as arrays of integers, like uint32_t result[6816/32]; (213 32-bit words, enough for the roughly 6810 bits of a 2050-digit result), so that you actually do have enough bits to get a correct result without precision loss or overflow problems.
With this in mind, you want a multiplication algorithm that can work with big integers. Note: I'd recommend something from that Wikipedia page's "Algorithms for multiplying by hand" section - there's faster/more advanced algorithms that are way too complicated for (what I assume is) a university assignment.
Also, note that the "HINT: multiplication can be done using repeated additions" is only workable here because the second number is less than 100: while(source2 != 0) { result += source1; source2--; } is at most 99 big-number additions. If both operands could be thousands of digits, repeated addition would take literally days.
Here's a few hints.
Multiplying a 2048-digit string by a number of less than 100 (at most 2 digits) might yield a string with as many as 2050 digits. That's too big for any primitive C type, so you'll have to do all the math the hard way, against "strings". Staying in string space makes sense anyway, since your input will most likely be read in as a string.
Let's say you are trying to multiple "123456" x "789".
That's equivalent to 123456 * (700 + 80 + 9)
Which is equivalent to 123456 * 700 + 123456 * 80 + 123456 * 9
Which is equivalent to doing these steps:
result1 = Multiply 123456 by 7 and add two zeros at the end
result2 = Multiply 123456 by 8 and add one zero at the end
result3 = Multiply 123456 by 9
final result = result1+result2+result3
So all you need is a handful of primitives that can take a digit string of arbitrary length and do some math operations on it.
You just need these three functions:
// Returns a new string that is identical to s but with a specific number of
// zeros added to the end.
// e.g. MultiplyByPowerOfTen("123", 3) returns "123000"
char* MultiplyByPowerOfTen(char* s, size_t zerosToAdd);

// Performs multiplication on the big integer represented by s
// by the specified digit
// e.g. Multiply("12345", 2) returns "24690"
char* Multiply(char* s, int digit); // where digit is between 0 and 9

// Performs addition on the big integers represented by s1 and s2
// e.g. Add("12345", "678") returns "13023"
char* Add(char* s1, char* s2);
Final hint. Any character at position i in your string can be converted to its integer equivalent like this:
int digit = s[i] - '0';
And any digit can be converted back to a printable char:
char c = '0' + digit;
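To make the hints concrete, here is one possible sketch of the Multiply() primitive described above (this is our implementation, not the only way to write it; error handling is kept minimal):

```c
#include <stdlib.h>
#include <string.h>

/* Multiply the decimal string s by a single digit (0-9), returning a
   freshly malloc'd result string, e.g. Multiply("12345", 2) -> "24690".
   (When digit == 0 the result is a string of zeros; callers can
   normalize that if they care.) */
char *Multiply(const char *s, int digit)
{
    size_t n = strlen(s);
    char *out = malloc(n + 2);          /* room for one extra digit + NUL */
    if (out == NULL)
        return NULL;
    out[n + 1] = '\0';

    int carry = 0;
    for (size_t i = n; i-- > 0; ) {     /* walk right to left */
        int p = (s[i] - '0') * digit + carry;
        out[i + 1] = (char)('0' + p % 10);
        carry = p / 10;
    }
    out[0] = (char)('0' + carry);
    if (carry == 0)                     /* drop the unused leading zero */
        memmove(out, out + 1, n + 1);
    return out;
}
```

Add() is the same right-to-left walk with a carry over two strings, and MultiplyByPowerOfTen() is just a copy with '0' characters appended.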

Reduce() is not working as expected in Kotlin for sum operation for some value

val arr = arrayListOf(256741038, 623958417 ,467905213, 714532089, 938071625)
arr.sort()
val max = (arr.slice(1 until arr.size)).reduce { x, ars -> x+ars }
so I want the maximum sum of 4 out of 5 elements in the array, but I am getting an answer which is not expected:
max = -1550499952
I don't know what is going wrong there, because it works for many cases but not for this one.
The expected output would be:
max = 2744467344
If you ever see a negative number appearing out of nowhere, that's a sign that you've got an overflow. The largest number an Int can represent is 2147483647 - add 1 to that and you get -2147483648, a negative number.
That's because signed integers represent negative numbers with a 1 in the most significant bit of the binary representation. Your largest positive number is 0111 (except with 32 bits not 4!), then you add 1 and it ticks over to 1000, the largest negative number. Then as you add to that, it moves towards zero, until you have 1111 (which is -1). Add another 1 and it overflows (there's no space to represent 10000) and you're back at zero, 0000.
Anyway point is you're adding lots of big numbers together and an Int can't hold the result. It keeps overflowing, so you lose the bigger digits (it can't represent more than ~2 billion) and it can be negative depending on where the overflow ends up, which half of the binary range it lands in.
You can fix this by using Longs instead (64-bits, max values +/- 9 quintillion, lots of room):
// note the L's to make them Longs
val arr = arrayListOf(256741038L, 623958417L, 467905213L, 714532089L, 938071625L)
arr.sort()
val max = arr.slice(1 until arr.size).reduce { x, ars -> x + ars } // 2744467344

Can someone explain why GMP mpz_sizeinbase will return 1 too big?

I'm using the GMP library in a C program, with a line like
len = mpz_sizeinbase(res, 10);
When res = 9, it gives me 2. So I checked the manual, which says:
size_t mpz_sizeinbase (mpz_t op, int base)
Return the size of op measured in number of digits in the given base. base can vary from 2 to 62. The sign of op is ignored, just the absolute value is used. The result will be either exact or 1 too big. If base is a power of 2, the result is always exact. If op is zero the return value is always 1.
I just want to know why this function was designed with this imprecision. Why CAN'T it be exact?
Some similar questions I found:
GMP mpz_sizeinbase returns size 2 for 9 in base 10
Number of digits of GMP integer
mpz_sizeinbase does not look at the whole number but only at the highest word, and estimates the size from that. The problem is that it might be looking at 999999999 or 1000000000; to know exactly which of the two it is, all the bits of the number would have to be examined. What mpz_sizeinbase does is (using word == digit for the example) compute the size for 9xxxxxxxx. The xxxxxxxx part is ignored and could cause a carry into an extra digit, so the size is increased by one and returned.
This lets you allocate enough space for converting the number quickly, with only minimal waste in some cases. The alternative would be to convert the whole number just to get the size, allocate the buffer, and then do it all over again to actually store the result.
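The same "exact or one too big" behavior can be reproduced in plain C by estimating the digit count from the bit length alone (an illustrative sketch of the idea, not GMP's actual code; the function name is ours):

```c
#include <stdint.h>

/* Estimate the number of decimal digits of v from its bit length,
   much as mpz_sizeinbase estimates from the top limb. The result is
   either exact or one too big, because the bit length alone cannot
   distinguish, say, 999 from 1000 (both are 10-bit numbers).
   v == 0 yields 1, matching GMP's documented behavior for zero. */
unsigned estimated_digits(uint64_t v)
{
    unsigned bits = 0;
    while (v) {                 /* count significant bits */
        v >>= 1;
        bits++;
    }
    return bits * 28 / 93 + 1;  /* 28/93 is slightly above log10(2) */
}
```

For v = 9 (4 bits) this returns 2, one too big, exactly the case in the question; for v = 10 (also 4 bits) the same estimate of 2 is exact.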

The max number of digits in an int based on number of bits

So, I needed a constant value to represent the max number of digits in an int, and it needed to be calculated at compile time to pass into the size of a char array.
To add some more detail: The compiler/machine I'm working with has a very limited subset of the C language, so none of the std libraries work as they have unsupported features. As such I cannot use INT_MIN/MAX as I can neither include them, nor are they defined.
I need a compile time expression that calculates the size. The formula I came up with is:
((sizeof(int) / 2) * 3 + sizeof(int)) + 2
It is marginally successful for n-byte integers, based on hand calculation:
sizeof(int)    INT_MAX                 characters    formula
2              32767                    5             7
4              2147483647              10            12
8              9223372036854775807     19            22
You're looking for a result related to a logarithm of the maximum value of the integer type in question (which logarithm depends on the radix of the representation whose digits you want to count). You cannot compute exact logarithms at compile time, but you can write macros that estimate them closely enough for your purposes, or that compute a close enough upper bound for your purposes. For example, see How to compute log with the preprocessor.
It is useful also to know that you can convert between logarithms in different bases by multiplying by appropriate constants. In particular, if you know the base-a logarithm of a number and you want the base-b logarithm, you can compute it as
logb(x) = loga(x) / loga(b)
Your case is a bit easier than the general one, though. For the dimension of an array that is not a variable-length array, you need an "integer constant expression". Furthermore, your result does not need more than two digits of precision (three if you wanted the number of binary digits) for any built-in integer type you'll find in a C implementation, and it seems like you need only a close enough upper bound.
Moreover, you get a head start from the sizeof operator, which can appear in integer constant expressions and which, when applied to an integer type, gives you an upper bound on the base-256 logarithm of values of that type (supposing that CHAR_BIT is 8). This estimate is very tight if every bit is a value bit, but signed integers have a sign bit, and they may have padding bits as well, so this bound is a bit loose for them.
If you want a bound on the number of digits in a power-of-two radix then you can use sizeof pretty directly. Let's suppose, though, that you're looking for the number of decimal digits. Mathematically, the maximum number of digits in the decimal representation of an int is
N = ceil(log10(INT_MAX))
or
N = floor(log10(INT_MAX)) + 1
provided that INT_MAX is not a power of 10. Let's express that in terms of the base-256 logarithm:
N = floor( log256(INT_MAX) / log256(10) ) + 1
Now, log256(10) cannot be part of an integer constant expression, but it or its reciprocal can be pre-computed: 1 / log256(10) = 2.40824 (to a pretty good approximation; the actual value is slightly less). Now, let's use that to rewrite our expression:
N <= floor( sizeof(int) * 2.40824 ) + 1
That's not yet an integer constant expression, but it's close. This expression is an integer constant expression, and a good enough approximation to serve your purpose:
N = 241 * sizeof(int) / 100 + 1
Here are the results for various integer sizes:
sizeof(int)    INT_MAX                 True N    Computed N
1              127                      3         3
2              32767                    5         5
4              2147483647              10        10
8              ~9.223372037e+18        19        20
(The values in the INT_MAX and True N columns suppose one of the allowed forms of signed representation, and no padding bits; the former and maybe both will be smaller if the representation contains padding bits.)
I presume that in the unlikely event that you encounter a system with 8-byte ints, the extra one byte you provide for your digit array will not break you. The discrepancy arises from the difference between having (at most) 63 value bits in a signed 64-bit integer, and the formula accounting for 64 value bits in that case, with the result that sizeof(int) is a bit too much of an overestimation of the base-256 log of INT_MAX. The formula gives exact results for unsigned int up to at least size 8, provided there are no padding bits.
As a macro, then:
// Expands to an integer constant expression evaluating to a close upper bound
// on the number of decimal digits in a value expressible in the
// integer type given by the argument (if it is a type name) or the integer
// type of the argument (if it is an expression). The meaning of the resulting
// expression is unspecified for other arguments.
#define DECIMAL_DIGITS_BOUND(t) (241 * sizeof(t) / 100 + 1)
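A quick usage sketch of the macro (the typedef, the "+2" buffer math, and the helper name are ours, not from the answer):

```c
#include <limits.h>   /* INT_MIN */
#include <stdio.h>    /* snprintf */

/* The macro from the answer above; an integer constant expression,
   so it can size an ordinary (non-VLA) array. */
#define DECIMAL_DIGITS_BOUND(t) (241 * sizeof(t) / 100 + 1)

/* +2 leaves room for a possible '-' and the terminating NUL. */
typedef char int_text[DECIMAL_DIGITS_BOUND(int) + 2];

const char *format_int(int v, int_text buf)
{
    snprintf(buf, sizeof(int_text), "%d", v);
    return buf;
}
```

For a 4-byte int the bound is 10 digits, so int_text is 12 bytes, exactly enough for "-2147483648" plus its NUL.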
An upper bound on the number of decimal digits an int may produce depends on INT_MIN.
// Mathematically
max_digits = ceil(log10(-INT_MIN))
It is easier to use the bit-width of the int as that approximates a log of -INT_MIN. sizeof(int)*CHAR_BIT - 1 is the max number of value bits in an int.
// Mathematically
max_digits = ceil((sizeof(int)*CHAR_BIT - 1)* log10(2))
// log10(2) --> ~ 0.30103
On rare machines, int has padding, so the above will over estimate.
For log10(2), which is about 0.30103, we could use 1/3 or one-third.
As a macro, perform integer math and add 1 for the ceiling
#include <limits.h>
#define INT_DIGIT10_WIDTH ((sizeof(int)*CHAR_BIT - 1)/3 + 1)
To account for a sign and null character add 2, use the following. With a very tight log10(2) fraction to not over calculate the buffer needs:
#define INT_STRING_SIZE ((sizeof(int)*CHAR_BIT - 1)*28/93 + 3)
Note 28/93 = 0.3010752... > log10(2)
The buffer size needed for any base down to base 2 follows below. It is interesting that +2 is needed and not +1: consider that a 2-bit signed number in base 2 could be "-10", a size of 4.
#define INT_STRING2_SIZE (sizeof(int)*CHAR_BIT + 2)
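A small check of those macros against snprintf (a sketch; fits() is our helper name, not from the answer):

```c
#include <limits.h>   /* CHAR_BIT, INT_MIN, INT_MAX */
#include <stdio.h>    /* snprintf */

#define INT_DIGIT10_WIDTH ((sizeof(int)*CHAR_BIT - 1)/3 + 1)
#define INT_STRING_SIZE   ((sizeof(int)*CHAR_BIT - 1)*28/93 + 3)

/* snprintf returns the length the full text would need; the macro is
   a valid buffer size when that length (plus the NUL) fits inside it. */
int fits(int v)
{
    char buf[INT_STRING_SIZE];
    return snprintf(buf, sizeof buf, "%d", v) < (int)sizeof buf;
}
```

For a 32-bit int, INT_STRING_SIZE is 12, and "-2147483648" needs exactly 11 chars plus the NUL.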
Boringly, I think you need to hardcode this, centred around inspecting sizeof(int) and consulting your compiler documentation to see what kind of int you actually have. (All the C standard specifies is that it can't be smaller than a short, and needs to have a range of at least -32767 to +32767, and 1's complement, 2's complement, and signed magnitude can be chosen. The manner of storage is arbitrary although big and little endianness are common.) Note that an arbitrary number of padding bits are allowed, so you can't, in full generality, impute the number of decimal digits from the sizeof.
C doesn't support the level of compile time evaluable constant expressions you'd need for this.
So hardcode it and make your code intentionally brittle so that compilation fails if a compiler encounters a case that you have not thought of.
You could solve this in C++ using constexpr and metaprogramming techniques.
((sizeof(int) / 2) * 3 + sizeof(int)) + 2
is the formula I came up with.
The +2 is for the negative sign and the null terminator.
If we suppose that integral values are either 2, 4, or 8 bytes, and if we determine the respective digit counts to be 5, 10, 20, then an integer constant expression yielding those values could be written as follows (an enum constant keeps it a true integer constant expression in C, where a const int would not be):
enum { digits = (sizeof(int)==8) ? 20 : ((sizeof(int)==4) ? 10 : 5) };
int testArray[digits];
I hope that I did not miss something essential. I've tested this at file scope.

Pretty print a double number in a fixed number of chars

What is the simplest solution to print a double (printf) in C so that:
exactly N characters are used (will be around 6) for all double numbers (nan and infinities are handled separately), positive and negative alike (+ or - always as first char);
decimal representation ('.' always present) is used as long as the numeric chars are not all 0 (i.e. too small number) or the decimal point is the last of the N char (i.e too big number). Otherwise switch to scientific representation, always occupying exactly N chars.
All the solutions I can think of seem quite involved, any idea to obtain this result easily (efficiency is not a concern here) ?
Thanks!
I could not find a way to do this via a single printf call, here is my solution.
At least 9 chars must be used, as (with + or - in front) that's the minimum number of chars for scientific notation (for example: +1.0E-002). In the following I consider the case of 9 chars. One of the two formats below is used, based on the conditions listed with each:
Scientific format '%+.1e', when either:
- chars 4 to 9 as per the decimal format are all 0 and the number is not identical to 0 (i.e. too small for decimal), or
- the '.' char is not present between char 3 and char 8 as per the decimal format (i.e. too large for decimal).
Decimal format '%+.6f', for:
- infinities and NaN;
- all other cases.
It's easy to adapt to a representation longer than 9 chars by changing the constants above.
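A possible end-to-end sketch of this scheme for N = 9 (the function name and the '%+.2e' precision choice are ours; note that printf's exponent width is platform-dependent, so "exactly N chars" in the scientific case may need extra care on some systems):

```c
#include <math.h>
#include <stdio.h>
#include <string.h>

/* Render x in 9 chars following the scheme above: decimal "+d.dddddd"
   when it fits and isn't all zeros, otherwise scientific "+d.dde+dd".
   NaN and infinity fall back to a width-padded "%f" rendering. */
void pretty9(double x, char out[10])
{
    if (isnan(x) || isinf(x)) {             /* handled separately */
        snprintf(out, 10, "%+9f", x);
        return;
    }
    char dec[352];                          /* full decimal rendering */
    snprintf(dec, sizeof dec, "%+.6f", x);

    const char *dot = strchr(dec, '.');
    size_t dotpos = (size_t)(dot - dec) + 1;       /* 1-based position */
    int too_big = dotpos < 3 || dotpos > 8;        /* '.' outside chars 3..8 */

    int too_small = (x != 0.0);                    /* chars 4..9 all '0'? */
    for (int i = 3; i < 9 && too_small; i++)
        if (dec[i] != '0' && dec[i] != '.')
            too_small = 0;

    if (too_big || too_small)
        snprintf(out, 10, "%+.2e", x);      /* e.g. +1.00e-09 */
    else {
        memcpy(out, dec, 9);                /* truncate decimal to 9 chars */
        out[9] = '\0';
    }
}
```

For example, 12.5 renders as "+12.50000", while 1e-9 (all-zero decimal chars) falls through to "+1.00e-09".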
