Buffer for a 3 digit number - c

I want to create a buffer for a 3-digit number, status, and the null character. Currently I have
char buf[4];
sprintf(buf, "%d", status);
My question is whether this is a sufficient size for a 3-digit number, which is returned as the status from waitpid().

Yes, but there's nothing preventing the user (or however status is filled in) from punching in a lot more than three digits and overflowing that little buffer.
The snprintf function should be used instead; it will cut off output that is too long:
snprintf(buf, 4, "%d", status); // the 4 is the length of buf

Yes: 3 characters/digits and the trailing null.

Better safe than sorry.
If you want it to be safe, make sure it really is a 3-digit number.
For example, you can use the mod operator:
char buf[4];
sprintf(buf, "%d", status % 1000);
Note that if you choose the snprintf solution and pass 1001, you'll get a result of 100; with this solution you'll get 1.
You can choose whichever is more suitable for you.
You can also combine the two:
snprintf(buf, sizeof(buf), "%d", status % 1000);

It is safe for a strictly three-digit decimal, but waitpid() makes no such guarantee; the type pid_t is a 32-bit signed integer, and three digits seems a somewhat optimistic assumption.
You should range-check the waitpid() return value to guarantee that it will fit in your buffer, but it is generally unnecessary to be quite so sparing with memory; a buffer of 12 characters will allow any 32-bit signed integer to be represented as a decimal string (worst case 10 digits, a sign character and a nul terminator). Similarly, for 16 bits use 7, and for 64 bits use 23. In general:
decimal_string_buffer_len = ceil( log10( 2^(sizeof(type) * CHAR_BIT) ) ) + 3
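For example, a minimal sketch along those lines (assuming a POSIX system; the 12-byte size is the 32-bit bound above, and error handling is skeletal):
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    int status = 0;
    /* 12 chars: worst case 10 digits + sign + nul for a 32-bit int */
    char buf[12];
    pid_t pid = waitpid(-1, &status, 0); /* wait for any child */
    if (pid > 0) {
        snprintf(buf, sizeof buf, "%d", status);
        puts(buf);
    }
    return 0;
}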


Why is log base 10 used in this code to convert int to string?

I saw a post explaining how to convert an int to a string. In the explanation there is a line of code to get the number of chars in a string:
(int)((ceil(log10(num))+1)*sizeof(char))
I’m wondering why log base 10 is used?
ceil(log10(num))+1 is incorrectly being used instead of floor(log10(num))+2.
The code is attempting to determine the amount of memory needed to store the decimal representation of the positive integer num as a string.
The two formulas presented above are equal except for numbers which are exact powers of 10, in which case the former version returns one less than the desired number.
For example, 10,000 requires 6 bytes, yet ceil(log10(10000))+1 returns 5. floor(log10(10000))+2 correctly returns 6.
How was floor(log10(num))+2 obtained?
A 4-digit number such as 4567 will be between 1,000 (inclusive) and 10,000 (exclusive), so it will be between 10^3 (inclusive) and 10^4 (exclusive), so log10(4567) will be between 3 (inclusive) and 4 (exclusive).
As such, floor(log10(num))+1 will return the number of digits needed to represent the positive value num in decimal.
As such, floor(log10(num))+2 will return the amount of memory needed to store the decimal representation of the positive integer num as a string. (The extra char is for the NUL that terminates the string.)
I’m wondering why log base 10 is used?
I'm wondering the same thing. It uses a very complex calculation that happens at runtime, to save a couple bytes of temporary storage. And it does it wrong.
In principle, you get the number of digits in base 10 by taking the base-10 logarithm and flooring and adding 1. It comes exactly from the fact that
log10(1) = log10(10⁰) = 0
log10(10) = log10(10¹) = 1
log10(100) = log10(10²) = 2
and all numbers between 10 and 100 have their logarithms between 1 and 2, so if you floor the logarithm of any two-digit number you get 1; add 1 and you get the number of digits.
But you do not need to do this at runtime. The maximum number of bytes needed for a 32-bit int in base 10 is 10 digits, a negative sign and a null terminator, for 12 chars. The maximum you can save with the runtime calculation is 10 bytes of RAM, but it is usually temporary so it is not worth it. If it is stack memory, well, the calls to log10, ceil and so forth might require far more.
In fact, we know the maximum number of bits needed to represent an integer: sizeof (int) * CHAR_BIT. This is greater than or equal to log2 of INT_MAX + 1. And we know that log2(x) =~ 3.32192809489 * log10(x), so dividing sizeof (int) * CHAR_BIT by 3 gives a safe overestimate of log10(INT_MAX). Then add 1 because we were supposed to add 1 to the floored logarithm to get the number of digits, then 1 for a possible sign, and 1 for the null terminator, and we get
sizeof (int) * CHAR_BIT / 3 + 3
Unlike the one from your question, this is an integer constant expression, i.e. the compiler can easily fold it at compile time, and it can be used to set the size of a statically-sized array. For 32 bits it gives 13, which is only one more than the 12 actually required; for 16 bits it gives 8, again only one more than the maximum required 7; and for 8 bits it gives 5, which is the exact maximum.
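For example, a minimal sketch using that expression to size a buffer at compile time (the macro name INT_DEC_BUFLEN is just illustrative):
#include <limits.h>
#include <stdio.h>

#define INT_DEC_BUFLEN (sizeof (int) * CHAR_BIT / 3 + 3)

int main(void)
{
    char buf[INT_DEC_BUFLEN];                  /* size folded at compile time */
    snprintf(buf, sizeof buf, "%d", INT_MIN);  /* worst case fits */
    puts(buf);
    return 0;
}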
ceil(log10(num)) + 1 is intended to provide the number of characters needed for the output string.
For example, if num=101, the expression's value is 4, the correct length of '101' plus the null terminator.
But if num=100, the value is 3. This behavior is incorrect.
This is because it's allocating enough space for the number to fit in the string.
If, for example, you had the number 1034, log10(1034) = 3.0145.... ceil(3.0145) is 4, which is the number of digits in the number. The + 1 is for the null-terminator.
This isn't perfect though: take 1000, for example. Despite having four digits, log10(1000) = 3, and ceil(3) = 3, so this will allocate space for too few digits. Plus, as @phuclv mentions below, the log() function is very time-consuming for this purpose, especially since the length of a number has a (relatively low) upper bound.
The reason it's log base 10 is because, presumably, this function represents the number in decimal form. If, for example, it were hexadecimal, log base 16 would be used.
A number N has n decimal digits iff 10^(n-1) <= N < 10^n which is equivalent to n-1 <= log(N) < n or n = floor(log(N)) + 1.
Since double representation has only limited precision, floor(log(N)) may be off by 1 for certain values, so it is safer to allow for an extra digit, i.e. allocate floor(log(N)) + 2 characters, and then another char for the nul terminator, for a total of floor(log(N)) + 3.
The expression in the original question, ceil(log(N)) + 1, appears to neither count the nul terminator nor allow for the chance of rounding errors, so it is one shorter in general, and two shorter for powers of 10.
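A throwaway test (illustrative only; link with -lm) makes the off-by-one concrete:
#include <math.h>
#include <stdio.h>

int main(void)
{
    int nums[] = { 9, 999, 1000, 4567, 10000 };
    for (size_t i = 0; i < sizeof nums / sizeof nums[0]; i++) {
        printf("%5d: ceil+1 = %d, floor+2 = %d\n", nums[i],
               (int)(ceil(log10(nums[i])) + 1),   /* one short for powers of 10 */
               (int)(floor(log10(nums[i])) + 2));
    }
    return 0;
}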

Difference in the values of atoi

I have the following code:
char* input = (char*)malloc(sizeof(char) * BUFFER); // BUFFER is defined as 100
int digit = atoi(input); // convert the string into a number
int digit_check = 0;
digit_check += digit % 10; // get the last digit
When I run it with the input 1234567896, digit = 1234567896 and digit_check = 6.
However when I run the input 9999999998, digit = 1410065406 and therefore digit_check = 6 when it should be 8.
For the second example, why is there a difference between input and digit when it should be the same value?
Probably because 9999999998 is bigger than the maximum (signed) integer representation, so you get an overflow.
In fact, these are the binary representations of 9999999998 and 1410065406:
10 01010100 00001011 11100011 11111110
01010100 00001011 11100011 11111110
As you can see, 1410065406 is just the low 32 bits of 9999999998.
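A quick illustration (note: converting an out-of-range value to int is implementation-defined; typical two's-complement platforms keep the low 32 bits):
#include <stdio.h>

int main(void)
{
    long long big = 9999999998LL;
    int truncated = (int)big; /* implementation-defined for out-of-range values */
    printf("%lld -> %d\n", big, truncated); /* prints 9999999998 -> 1410065406 */
    return 0;
}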
atoi is limited to the size of an int (32 bits on most recent platforms).
If you want to handle large numbers, you can use atol or scanf("%ld").
Don't forget to declare your variable as long int (or long).
You could also just take the very last character of your input (kept as a string rather than converted to an int) and use atoi on it, so it would never overflow.
On many platforms the size of int is 4 bytes, which limits digit to [-2^31, 2^31 - 1].
Use long (or long long) with strtol (or strtoll), depending on the platform you build for. For example, GCC on x86 will have a 64-bit long long, and for amd64 it will have 64-bit long and long long types.
So:
long long digit = strtoll(input, NULL, 10);
NOTE: strtoll() is common on Unix-like systems and is standard since C99 (C++ adopted it in C++11), but not all VC++ implementations have it. Use _strtoi64() instead:
__int64 digit = _strtoi64(input, NULL, 10);
You probably want to use the atoll function, which returns a long long int, twice as big as int (most likely 64 bits in your case).
It is declared in stdlib.h
http://linux.die.net/man/3/atoll
You should avoid calling atoi on an uninitialized string: if there is no \0 in the string, you will read invalid memory and may get a segmentation fault.
You should use strtoimax instead; it's safer.
9999999998 is bigger than the maximum value that an int can represent. Use either atol() or atoll().
You should stop using the atoi function, or any other function from the ato... group. These functions are not officially deprecated, but they have been effectively abandoned since 1995 and exist only for legacy-code compatibility purposes. Forget about these functions as if they did not exist. These functions provide no usable feedback in case of error or overflow, and overflow is what apparently happens in your example.
In order to convert strings to numbers, the C standard library provides the strtol function and other functions from the strto... group. These are the functions you should use to perform the conversion. And don't forget to check the result for overflow: the strto... functions provide this feedback through the return value and the errno variable.
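For example, a minimal sketch of that checking (assuming, as in the question, that the value must still fit in an int; the error messages are illustrative):
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *input = "9999999998";
    char *end;

    errno = 0;
    long value = strtol(input, &end, 10);
    if (end == input)
        fprintf(stderr, "no digits found\n");
    else if (errno == ERANGE || value > INT_MAX || value < INT_MIN)
        fprintf(stderr, "out of int range\n");
    else
        printf("digit_check = %ld\n", value % 10);
    return 0;
}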

Dynamic allocation in C

I'm writing a program and I have the following problem:
char *tmp;
sprintf (tmp,"%ld",(long)time_stamp_for_file_name);
Could someone explain how much memory to allocate for the string tmp?
How many chars does a long variable take?
Thank you; I would also appreciate a link to an exhaustive resource on this kind of information.
UPDATE:
Using your examples I got the following problem:
root#-[/tmp]$cat test.c
#include <stdio.h>
int
main()
{
    int len;
    long time=12345678;
    char *tmp;
    len=snprintf(NULL,0,"%ld",time);
    printf ("Lunghezza:di %ld %d\n",time,len);
    return 0;
}
root#-[/tmp]$gcc test.c
root#-[/tmp]$./a.out
Lunghezza:di 12345678 -1
root#-[/tmp]$
So the len result from snprintf is -1, I compiled on Solaris 9 with the standard compiler.
Please help me!
If your compiler conforms to C99, you should be able to do:
char *tmp;
int req_bytes = snprintf(NULL, 0, "%ld", (long)time_stamp_for_file_name);
tmp = malloc(req_bytes + 1); // add +1 for the nul terminator
if (!tmp) {
    die_horrible_death();
}
if (snprintf(tmp, req_bytes + 1, "%ld", (long)time_stamp_for_file_name) != req_bytes) {
    die_horrible_death();
}
Relevant parts of the standard (from the draft document):
7.19.6.5.2: If n is zero, nothing is written, and s may be a null pointer.
7.19.6.5.3: The snprintf function returns the number of characters that would have been written
had n been sufficiently large, not counting the terminating null character, or a negative
value if an encoding error occurred. Thus, the null-terminated output has been
completely written if and only if the returned value is nonnegative and less than n.
If this is not working, I'm guessing your compiler/libc does not support this part of C99, or you might need to explicitly enable it. When I run your example (with gcc version 4.5.0 20100610 (prerelease), Linux 2.6.34-ARCH), I get
$./example
Lunghezza:di 12345678 8
The number of chars actually used obviously depends on the value: if time_stamp_for_file_name is 0 you only actually need 2 bytes. If there's any doubt, you can use snprintf, which tells you how much space you need:
int len = snprintf(0, 0, "%ld", (long)time_stamp_for_file_name) + 1;
char *tmp = malloc(len);
if (tmp == 0) { /* handle error */ }
snprintf(tmp, len, "%ld", (long)time_stamp_for_file_name);
Beware implementations where snprintf returns -1 for insufficient space, rather than the space required.
As Paul R says, though, you can figure out a fixed upper bound based on the size of long on your implementation. That way you avoid dynamic allocation entirely. For example:
#define LONG_LEN (((sizeof(long)*CHAR_BIT)/3)+2)
(based on the fact that the base-2 log of 10 is greater than 3). That +2 gives you 1 for the minus sign and 1 for the fact that integer division rounds down. You'd need another 1 for the nul terminator.
Or:
#define STRINGIFY(ARG) #ARG
#define EXPAND_AND_STRINGIFY(ARG) STRINGIFY(ARG)
#define VERBOSE_LONG EXPAND_AND_STRINGIFY(LONG_MIN)
#define LONG_LEN sizeof(VERBOSE_LONG)
char tmp[LONG_LEN];
sprintf(tmp, "%ld", (long)time_stamp_for_file_name);
VERBOSE_LONG might be a slightly bigger string than you actually need. On my compiler it's (-2147483647L-1). I'm not sure whether LONG_MIN can expand to something like a hex literal or a compiler intrinsic, but if so then it could be too short, and this trick won't work. It's easy enough to unit-test, though.
If you want a tight upper bound to cover all possibilities within the standard, up to a certain limit, you could try something like this:
#if LONG_MAX <= 2147483647L
#define LONG_LEN 11
#else
#if LONG_MAX <= 4294967295L
#define LONG_LEN 11
#else
#if LONG_MAX <= 8589934591L
... etc, add more clauses as new architectures are invented with bigger longs
#endif
#endif
#endif
But I doubt it's worth it: better just to define it in some kind of portability header and configure it manually for new platforms.
It's hard to tell in advance, although I guess you could estimate that it'll be at most 64 bits, and thus "18,446,744,073,709,551,615" should be the largest possible value. That's 2+6*3 = 20 digits; the commas are generally not included. It'd be 21 for a negative number. So, go for 32 bytes as a nice round size.
Better would be to couple that with using snprintf(), so you don't get a buffer overflow if your estimate is off.
It depends on how big long is on your system. Assuming a worst case of 64 bits then you need 22 characters max - this allows for 20 digits, a preceding - and a terminating \0. Of course if you're feeling extravagant you could always allow a little extra and make it a nice round number like 32.
It takes log2(10) (~3.32) bits to represent a decimal digit; thus, you can compute the number of digits like so:
#include <limits.h>
#include <math.h>
long time;
double bitsPerDigit = log10(10.0) / log10(2.0); /* or log2(10.0) in C99 */
size_t digits = ceil((sizeof time * (double) CHAR_BIT) / bitsPerDigit);
char *tmp = malloc(digits+2); /* or simply "char tmp[digits+2];" in C99 */
The "+2" accounts for sign and the 0 terminator.
Octal requires one character per three bits. You print in base ten, which never gives more digits than octal for the same number. Therefore, allocate one character for each three bits.
sizeof(long) gives you the size of the type in bytes at compile time. Multiply that by 8 to get bits. Add two before dividing by three so you get a ceiling instead of a floor. Remember that C strings want a final zero byte at their end, so add one to the result (and another one for a possible minus sign, as described in the comments).
char tmp[(sizeof(long)*8+2)/3+2];
sprintf (tmp,"%ld",(long)time_stamp_for_file_name);
3*sizeof(type)+2 is a safe general rule for the number of bytes needed to format an integer type type as a decimal string, the reason being that 3 is an upper bound on log10(256): an n-byte integer has n digits in base 256 and thus ceil(log10(256^n)) == ceil(n*log10(256)) digits in base 10. The +2 accounts for the terminating NUL byte and a possible minus sign (which only matters for very small types; for larger ones the sign fits in the slack from rounding log10(256) up to 3).
If you want to be pedantic and support DSPs and such with CHAR_BIT!=8 then use 3*sizeof(type)*((CHAR_BIT+7)/8)+2. (Note that for POSIX systems this is irrelevant since POSIX requires UCHAR_MAX==255 and CHAR_BIT==8.)

How do I prevent buffer overflow converting a double to char?

I'm converting a double to a char string:
char txt[10];
double num;
num = 45.344322345;
sprintf(txt, "%.1f", num);
and using "%.1f" to truncate the decimal places to the tenths digit.
i.e. - txt contains 45.3
I usually use precision in sprintf to ensure the char buffer is not overflowed.
How can I do that here also truncating the decimal, without using snprintf?
(i.e. if num = 345694876345.3 for some reason)
Thanks
EDIT: If num is too big for the buffer, the result no longer matters; I just do not want to crash. I'm not sure what would make the most sense in that case.
EDIT2: I should have made it clearer than in just the tag that this is a C program.
I am having issues using snprintf in a C program, and I don't want to add any 3rd-party libraries.
Use snprintf(), which will tell you how many bytes would have been written. In general, you should size your array to be large enough to handle the longest string representation of the target type. If that is not known in advance, use malloc() (or asprintf(), which is non-standard, but present on many platforms).
Edit
snprintf() will fail gracefully if the formatted output exceeds the given buffer; it won't overflow. If you don't need to detect that, then simply using it will solve your problem. I can't think of an instance where you would not want to handle that, but then again, I'm not working on whatever you are working on :)
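For example, a minimal sketch (the return value tells you whether truncation happened):
#include <stdio.h>

int main(void)
{
    char txt[10];
    double num = 345694876345.3;

    int needed = snprintf(txt, sizeof txt, "%.1f", num);
    if (needed >= (int)sizeof txt)
        fprintf(stderr, "truncated: needed %d chars\n", needed);
    else
        printf("%s\n", txt);
    return 0;
}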
Why not just make your buffer big enough to hold the largest possible string representation of a double?
Assuming a 64-bit double using the IEEE standard for floating point arithmetic, which uses 52 bits for the mantissa: 2^52 = 4,503,599,627,370,496. So we need 16 characters to hold all the digits before and after the decimal point; 19 considering the decimal point, sign character and null terminator.
I would just use a buffer size of at least 20 characters and move on.
If you need to print a double using scientific notation, you will need to add enough space for the exponent. Assuming an 11-bit signed exponent, that's another 4 characters for the exponent, plus a sign for the exponent and the letter 'E'. I would just go with 30 characters in that case.
If you absolutely must do it on your own, count the digits in the number before trying to convert:
int whole = num;
int wholeDigits = 0;
do {
    ++wholeDigits;
} while (whole /= 10);

double fraction = num - (int) num;
int decimalDigits = 0;
while (fraction > 0) {
    ++decimalDigits;
    fraction *= 10;
    fraction = fraction - (int) fraction;
}

int totalLength = decimalDigits ? wholeDigits + decimalDigits + 1 : wholeDigits;
You should probably verify that this ad-hoc code works as advertised before relying on it to guard against crashes. I recommend that you use snprintf or something similar instead of my code, as others have said.
Why do you want to do it without snprintf? You should be using snprintf regardless of whether your format string contains a double, another string or anything else, really. As far as I can see, there's no reason not to.

Allocating memory for a char array to concatenate a known piece of text and an integer

I want to concatenate a piece of text, for example "The answer is ", with a signed integer, to give the output "The answer is 42".
I know how long the piece of text is (14 characters), but I don't know how many characters the string representation of the number will be.
I assume the worst-case scenario: the largest signed 16-bit integer has 5 digits, plus one extra in case it is negative. So is the following code the correct way to do it?
#include <stdio.h>
#include <stdlib.h>
int main()
{
    char *message;
    message = malloc(14*sizeof(char)+(sizeof(int)*5)+1);
    sprintf(message, "The answer is %d", 42);
    puts(message);
    free(message);
}
Use:
malloc(14*sizeof(char) /*for the 14 char text*/
+(sizeof(char)*5) /*for the magnitude of the max number*/
+1 /* for the sign of the number*/
+1 /* for NULL char*/
);
Since the digits will be represented as char, you have to use sizeof(char) instead of sizeof(int).
Not quite: you only need a number of characters, so sizeof(int) is not required.
However, for easily maintainable and portable code, you should have something like:
#define TEXT "The answer is "
#undef CHARS_PER_INT
#if INT_MAX == 32767
#define CHARS_PER_INT 6
#endif
#if INT_MAX == 2147483647
#define CHARS_PER_INT 11
#endif
#ifndef CHARS_PER_INT
#error Suspect system, I have no idea how many chars to allocate for an int.
#endif
int main (void) {
    char *message;
    message = malloc(sizeof(TEXT)+CHARS_PER_INT+1);
    sprintf(message, TEXT "%d", 42);
    puts(message);
    free(message);
    return 0;
}
This has a number of advantages:
If you change the string, you change one thing and one thing only. The argument to malloc adjusts automatically.
The expression sizeof(TEXT)+CHARS_PER_INT+1 is calculated at compile time. A solution involving strlen would have a runtime cost.
If you try to compile your code on a system where integers may cause overflow, you'll be told about it (go fix the code).
You should actually allocate an extra character for the number since the biggest 16-bit number (in terms of character count) is -32768 (six characters long). You'll notice I still have a +1 on the end - that's because you need space for the string null terminator.
One way of doing it (not necessarily recommended) that gives you the exact size of the number in characters is using the stdio functions themselves.
For example, if you print the number (somewhere, for whatever reason) before you allocate your memory, you can use the %n format identifier with printf. %n doesn't print anything; rather, you supply it with a pointer to int, and printf fills that with how many characters have been written so far.
Another example is snprintf, if you have it available. You pass it the maximum number of characters you want it to write to your string, and it returns the number of characters it should have written, not counting the final nul. (Or -1 on error.) So, using a 1-byte dummy string, snprintf can tell you exactly how many characters your number is.
A big advantage to using these functions is that if you decide to change the format of your number (leading 0's, padding spaces, octal output, long longs, whatever) you will not overrun your memory.
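For example, a small illustrative snippet of the %n approach (measuring as a side effect of printing):
#include <stdio.h>

int main(void)
{
    int num = 42, count = 0;
    printf("The answer is %d%n\n", num, &count); /* count = chars written so far */
    printf("that took %d characters\n", count);  /* includes the text and digits */
    return 0;
}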
If you have GNU extensions to stdio, you may want to consider using asprintf. This is exactly like sprintf, except it does the memory allocation for you! No assembly required. (Although you do need to free it yourself.) But you shouldn't rely on it to be portable.
malloc((14 + 6 + 1) * sizeof(char));
14 chars for the string
6 for the digits + sign
1 for the '\0'
Note: sizeof(int) gives you the size of the type in bytes. sizeof(int) == 4 if int is 32 bits, 8 if it's 64 bits.
I think that the correct formula to get the maximum length of the decimal representation of an integer would be (floor(log10(INT_MAX))+1); you could also abuse the preprocessor this way:
#include <limits.h>
#define TOSTRING_(x) #x
#define TOSTRING(x) TOSTRING_(x)
/* ... */
#define YOUR_MESSAGE "The answer is "
char message[]=YOUR_MESSAGE "+" TOSTRING(INT_MAX);
sprintf(message+sizeof(YOUR_MESSAGE)-1,"%d", 42);
, which also avoids the heap allocation. You may want to use snprintf for better security, although with this method it shouldn't be necessary.
Another trick like that would be to create a function like this:
size_t GetIntMaxLength()
{
    const char dummy[]=TOSTRING(INT_MAX);
    return sizeof(dummy)+1;
}
if the compiler is smart enough it could completely sweep away the dummy var from the compiled code, otherwise it may be wise to declare that var as static to avoid reinitializing it every time the function is called.
A safe approximation for signed int is (number of digits including the potential - sign):
(CHAR_BIT * sizeof(int) + 1) / 3 + 1
The equivalent for unsigned is:
(CHAR_BIT * sizeof(unsigned) + 2) / 3
This calculates the number of digits - add one to both of them to account for the terminator, if allocating space for a null-terminated string.
This will slightly overestimate the space required for very long types (and will also overestimate in the unusual case where int has padding bits), but is a good approximation and has the advantage that it is a compile-time constant. CHAR_BIT is provided by <limits.h>.
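For example, a minimal sketch of those formulas in use (the macro names are illustrative):
#include <limits.h>
#include <stdio.h>

#define INT_CHARS  ((CHAR_BIT * sizeof(int) + 1) / 3 + 1)  /* digits + sign */
#define UINT_CHARS ((CHAR_BIT * sizeof(unsigned) + 2) / 3) /* digits only */

int main(void)
{
    char sbuf[INT_CHARS + 1];  /* + 1 for the terminator */
    char ubuf[UINT_CHARS + 1];
    sprintf(sbuf, "%d", INT_MIN);
    sprintf(ubuf, "%u", UINT_MAX);
    printf("%s %s\n", sbuf, ubuf);
    return 0;
}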
