Two-part question:
I'm coming from a high-level language, so this is a question about form, not function.
I've written an isnumeric() function that takes a char[] and returns 1 if the string is a number, taking advantage of the isdigit() function in ctype. Similar functions are built into other languages, and I have always used something like that to integrity-check the data before converting it to a numeric type, mostly because some languages' conversion functions fail badly if you try to convert a non-numeric string to an integer.
But it seems like a kludge to do all that looping to compensate for the lack of strings in C, which poses the first part of the question:
Is it acceptable practice in C to trap for a 0 return from atoi() in lieu of doing an integrity check on the data before calling atoi()? The way atoi() (and other ascii to xx functions) works seems to lend itself well to eliminating the integrity check altogether. It would certainly seem more efficient to just skip the check.
The second part of the question is;
Is there a C function or common library function for a numeric integrity check
on a string? (by string, I of course mean char[])
Is it acceptable practice in C to trap for a 0 return from atoi() in lieu of doing an integrity check on the data before calling atoi()?
Never ever trap on error unless the error indicates a programming error that can't happen if there isn't a bug in the code. Always return some sort of error result in case of an error. Look at the OpenBSD strtonum function for how you could design such an interface.
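As a rough illustration of "return an error result instead of trapping", here is a minimal sketch in the spirit of strtonum. The name parse_long_range and its return codes are made up for this example; it is not the OpenBSD function, and it assumes base 10 only.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, loosely modelled on the idea behind strtonum():
 * the value goes through *out and the return value is a status, so the
 * caller never has to guess what a result of 0 means.
 * Returns 0 on success, -1 if s is not a complete base-10 integer,
 * -2 if the value lies outside [minval, maxval]. */
int parse_long_range(const char *s, long minval, long maxval, long *out)
{
    char *end;
    int saved_errno = errno;
    int range_error;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    range_error = (errno == ERANGE);
    errno = saved_errno;          /* leave the caller's errno untouched */

    if (end == s || *end != '\0')
        return -1;                /* no digits at all, or trailing junk */
    if (range_error || v < minval || v > maxval)
        return -2;                /* does not fit the requested range */
    *out = v;
    return 0;
}
```

A caller would check the status first and only then look at the value, e.g. if (parse_long_range(text, 0, 100, &n) != 0) { /* report bad input */ }.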
The second part of the question is; Is there a C function or common library function for a numeric integrity check on a string? (by string, I of course mean char[])
Never use atoi unless you are writing a program without error checking, as atoi doesn't do any error checking. The strtol family of functions allows you to check for errors. Here is a simple example of how you could use them:
#include <errno.h>
#include <stdlib.h>

int check_is_number(const char *buf)
{
    char *endptr;   /* strtol takes a char **, so this must not be const char * */
    int errsave = errno, errval;
    long result;

    errno = 0;
    result = strtol(buf, &endptr, 0);
    (void)result;   /* only validity matters here, not the value */
    errval = errno;
    errno = errsave;
    if (errval != 0)
        return 0; /* an error occurred */
    if (buf[0] == '\0' || *endptr != '\0')
        return 0; /* not a number */
    return 1;
}
See the manual page linked before for how the third argument to strtol (base) affects what it does.
errno is set to ERANGE if the value is out of range for the desired type (i.e. long). In this case, the return value is LONG_MAX or LONG_MIN.
If the conversion method returns an error indication (as distinct from going bananas if an error occurs, or not providing a definitive means to check if an error has occurred) then there is actually no need to check if a string is numeric before trying to convert it.
With that in mind, atoi() is not a particularly good function to use if you need to check for errors on conversion. Zero is returned for an input of zero as well as on error, and there is no way to tell which happened. A better function to use (assuming you want to read an integral value) is strtol(). Although strtol() also returns zero on error, it additionally returns information that can be used to check for failure. For example:
long x;
char *end;
x = strtol(your_string, &end, 10);
if (end == your_string)
{
/* nothing was read due to invalid character or the first
character marked the end of string */
}
else if (*end != '\0')
{
/* an integral value was read, but there is following non-numeric data */
}
Second, there are alternatives to using strtol(), albeit involving more overhead. The return values from sscanf() (and, in fact, all functions in the scanf() family) can be checked for error conditions.
There is no standard function for checking if a string is numeric, but it can be easily rolled using the above.
int IsNumeric(char *your_string)
{
/* This has undefined behaviour if your_string is not a C-style string
It also deems that a string like "123AB" is non-numeric
*/
long x;
char *end;
x = strtol(your_string, &end, 10);
return !(end == your_string || *end != '\0');
}
No (explicit) loops in any of the above options.
Is it acceptable practice in C to trap for a 0 return from atoi() in lieu of doing an integrity check on the data before calling atoi()?
No. @FUZxxl's answer covers that well.
Is there a C function or common library function for a numeric integrity check on a string?
In C, the conversion of a string to a number and the check to see if the conversion is valid is usually done together. The function used depends on the type of number sought. "1.23" would make sense for a floating point type, but not an integer.
// No error handle functions
int atoi(const char *nptr);
long atol(const char *nptr);
long long atoll(const char *nptr);
double atof(const char *nptr);
// Some error detection functions
if (sscanf(buffer, "%d", &some_int) == 1) ...
if (sscanf(buffer, "%lf", &some_double) == 1) ...
// Robust methods use
long strtol( const char *nptr, char ** endptr, int base);
long long strtoll( const char *nptr, char ** endptr, int base);
unsigned long strtoul( const char *nptr, char ** endptr, int base);
unsigned long long strtoull( const char *nptr, char ** endptr, int base);
intmax_t strtoimax(const char *nptr, char ** endptr, int base);
uintmax_t strtoumax(const char *nptr, char ** endptr, int base);
float strtof( const char *nptr, char ** endptr);
double strtod( const char *nptr, char ** endptr);
long double strtold( const char *nptr, char ** endptr);
These robust methods use char ** endptr to store the string location where scanning stopped. If no numeric data was found, then *endptr == nptr. So a common test is
char *endptr;
y = strto...(buffer, &endptr, ...);
if (buffer == endptr) puts("No conversion");
if (*endptr != '\0') puts("Extra text");
If the range was exceeded, these functions all set the global variable errno to ERANGE and return a minimum or maximum value for the type.
errno = 0;
double y = strtod("1.23e10000000", &endptr);
if (errno == ERANGE) puts("Range exceeded");
The integer functions accept a base from 2 to 36. If the base is 0, the prefix of the string selects it: "0x" or "0X" means base 16, a leading "0" means base 8, and anything else means base 10.
long y = strtol(buffer, &endptr, 10);
Read the specification or help page for more details.
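To make the base 0 rule concrete, here is a tiny sketch. The wrapper name parse_auto_base is made up for illustration, and it deliberately skips all error checking so that only the prefix handling is visible.

```c
#include <stdlib.h>

/* Hypothetical wrapper: base 0 lets the prefix of the string pick the radix.
 * No error checking here on purpose; see the endptr/errno tests above. */
long parse_auto_base(const char *s)
{
    return strtol(s, NULL, 0);
}
```

With this, "0x1A" and "0X1a" are read as hexadecimal (26), "010" as octal (8), and "10" as decimal (10). If a leading zero in user input must not switch to octal, pass 10 as the base instead.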
You probably don't need a function to check whether a string is numeric. You will most likely need to convert the string to a number, so just do that, then check whether the conversion was successful.
long number;
char *end;
number = strtol(string, &end, 10);
if ((*string == '\0') || (*end != '\0'))
{
// empty string or invalid number
}
the second argument of strtol is used to indicate where the parsing ended (the first non-numeric character). That character will be \0 if we've reached the end of the string. If you want to permit other characters after the number (like ), you can use switch to check for it.
strtol works with long integers. If you need some other type, you should consult the man page: man 3 strtol. For floating-point numbers you can use strtod.
Don't trap if the program logic permits that the string is not numeric (e.g. if it comes from the user or a file).
OP later commented:
I'm looking for a way to determine if the string contains ONLY base 10 digits or a decimal or a comma. So if the string is 100,000.01 I want a positive return from func. Any other ascii characters anywhere in the string would result in a negative return value.
If that is all you are interested in, use:
if (buffer[strspn(buffer, "0123456789.,")] == '\0') return 0; // Success
else return -1; // Failure
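Wrapped up as a function, that check might look like the sketch below. The name only_digits_dot_comma is made up, and unlike the two-line snippet it returns 1 for a match and treats the empty string as a failure, which is an assumption about what the OP wants.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if s is non-empty and consists only of
 * the characters 0-9, '.' and ','; returns 0 otherwise. */
int only_digits_dot_comma(const char *s)
{
    /* strspn() gives the length of the leading run of allowed characters;
     * if that run reaches the terminator, nothing else is in the string. */
    return s[0] != '\0' && s[strspn(s, "0123456789.,")] == '\0';
}
```

Note that this only checks the character set. A string such as ".,.," passes too, so it does not prove the text is a well-formed number.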
Traditional strtol() is usually used like this:
int main()
{
errno = 0;
char *s = "12345678912345678900";
char *endptr;
long i = strtol(s, &endptr, 10);
if(i == LONG_MAX && errno == ERANGE)
printf("overflow");
}
We need to access errno two times, and errno nowadays is usually a C macro finally expanded to a function. It seems a little expensive considering parsing string to integer isn't a heavy job.
So, is it better to implement strtol without errno but using some other ways to indicating overflow?
like:
long strtol(const char *nptr, char **endptr, int base, bool *is_overflow);
instead of
long strtol(const char *nptr, char **endptr, int base);
is it better to implement strtol without errno ...
No.
... but using some other ways to indicating overflow?
No.
long int strtol(const char * restrict nptr, char ** restrict endptr, int base);
strtol() is a standard C library function and any implementation must adhere to proper use of the 3 inputs and errno to be compliant.
Of course OP can implement some other my_strtol() as desired.
Any performance concerns around avoiding errno are a micro-optimization yet a reasonable design goal.
It really comes down to how to convey the problems of a string-to-long conversion:
Overflow "12345678912345678901234567890"
No conversions "abc"
Excess junk "123 abc"
Leading space allowed, trailing space allowed?
Allow various bases?
Once the behavior for all exceptional cases is defined, not just overflow, then coding concerns about errno become useful to consider, even if they are unlikely to yield any meaningful performance improvement.
IMO, coding to one base only is likely a more productive path to speed improvements than errno.
OP code is not a robust strtol() usage. Suggest:
char *s = "12345678912345678900";
char *endptr;
errno = 0;
long i = strtol(s, &endptr, 10);
if (errno == ERANGE) printf("Overflow %ld\n", i);
else if (s == endptr) printf("No conversion %ld\n", i);
else if (*endptr) printf("Extra Junk %ld\n", i);
else printf("Success %ld\n", i);
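For what it is worth, the interface proposed in the question can be layered on top of the standard function rather than replacing it. A minimal sketch, assuming the hypothetical name my_strtol and that only overflow needs reporting through the flag:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical wrapper with the signature shape proposed in the question.
 * It still calls the standard strtol(), so it does not avoid the errno
 * accesses; it only hides them from the caller. */
long my_strtol(const char *nptr, char **endptr, int base, bool *is_overflow)
{
    int saved_errno = errno;
    long v;

    errno = 0;
    v = strtol(nptr, endptr, base);
    *is_overflow = (errno == ERANGE);
    errno = saved_errno;          /* caller sees errno unchanged */
    return v;
}
```

The "no conversion" and "extra junk" cases are still detected through endptr exactly as in the snippet above.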
There is actually some overhead besides errno in strtol(), like skipping spaces, taking care of the base (10 or hexa), check characters ...
In a specific environment where speed is critical and you know the string provided is a number base 10 that fits in a long, you could make your own quick function, like
#include <ctype.h>
long mystrtol(const char *s) {
    long res = 0, minus = *s == '-';
    if (minus || *s == '+') s++;
    /* cast: passing a negative char value to isdigit() is undefined */
    while (isdigit((unsigned char)*s)) {
        res = res*10 + (*s++ - '0');
    }
    return minus ? -res : res;
}
and choose to inline it.
If I'm reading numbers of type double from stdin, how can I check if the numbers being read are in fact valid (that the numbers are in fact a double)?
You can use strtod. Check if the result is zero and subsequently if endptr == nptr, according to the man page:
If no conversion is performed, zero is returned and the value of nptr is stored in the location referenced by endptr.
Something like this:
char input[50];
char * end;
double result = 0;
fgets(input, sizeof input, stdin);
errno = 0;
result = strtod(input, &end);
if(result == 0 && (errno != 0 || end == input)){
fprintf(stderr, "Error: input is not a valid double\n");
exit(EXIT_FAILURE);
}
EDIT there seems to be a bit of a discrepancy between the standard and the man page. The man page says that endptr == nptr when no conversion is performed, while the standard seems to imply this isn't necessarily the case. Worse still it says that in case of no conversion errno may be set to EINVAL. Edited the example code to check errno as well.
Alternatively, sscanf could be used (preferred over scanf), in conjunction with fgets:
/* just fgetsed input */
if(sscanf(input, "%lf", &result) != 1){
fprintf(stderr, "Error: input is not a valid double\n");
exit(EXIT_FAILURE);
}
Also, don't forget to check the return value of fgets for NULL, in case it failed!
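One detail worth spelling out: fgets keeps the trailing newline, so a strict "whole string was consumed" test has to allow for it. A minimal sketch of a per-line check, where the name parse_double_line and the choice to treat any ERANGE as failure are assumptions for this example:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: line is what fgets() stored, possibly ending in '\n'.
 * Returns 1 and stores the value in *out if the line holds exactly one
 * double; returns 0 otherwise. */
int parse_double_line(const char *line, double *out)
{
    char *end;
    double v;

    errno = 0;
    v = strtod(line, &end);
    if (end == line)
        return 0;              /* no conversion at all */
    if (errno == ERANGE)
        return 0;              /* overflow (and, on some systems, underflow) */
    if (*end == '\n')
        end++;                 /* newline left in the buffer by fgets */
    if (*end != '\0')
        return 0;              /* trailing junk such as "1blah" */
    *out = v;
    return 1;
}
```

The caller would do something like if (fgets(input, sizeof input, stdin) == NULL) { /* EOF or read error */ } before passing input to the helper.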
Neither simple strtod nor sscanf are enough to distinguish cases such as 1,5 or 1blah from desired 1.0 - All of these will result in 1.0. The reason is that
The strtod(), strtof(), and strtold() functions convert the initial portion of the string pointed to by nptr to double, float, and long double representation, respectively.
To ensure that the entire string was a valid double literal, use strtod like this:
#include <stdlib.h>
#include <errno.h>
#include <stdio.h>
...
char *endptr;
errno = 0;
double result = strtod(input, &endptr);
if (errno != 0 || *endptr != '\0') {
fprintf(stderr, "the value could not be represented as a double exactly\n");
}
errno will be set if the value cannot be represented (ERANGE). Additionally, endptr will point to the first character not converted. If the locale has not been set, when parsing 1,5 or 1blah, endptr will point to the second character. Iff the entire string was successfully parsed as a double constant, endptr will point to the terminating '\0'.
Note that the errno must be set to zero prior to calling the function, otherwise it will retain the value from a previous failed function call.
How can we check if an input string is a valid double?
Start with
strtod() for double,
strtof() for float and
strtold() for long double.
double strtod(const char * restrict nptr, char ** restrict endptr);
The strtod, ... functions convert the initial portion of the string pointed to by nptr to double ....
A pointer to the final string is stored in the object pointed to by endptr, provided that endptr is not a null pointer.
C11dr §7.22.1.3 2&5
Simplified code to check loosely for validity. Does not complain about over/underflow nor extra text.
// Return true on valid
bool valid_string_to_double(const char *s) {
char *end;
strtod(s, &end);
return s != end;
}
Challenges of using strto*() include:
errno == ERANGE on arithmetic overflow and maybe underflow.
The return value on overflow is only specified in the default rounding mode. That value is HUGE_VAL, which may be an infinity or a very large number.
The return value on underflow is implementation-defined.
errno has been known to be set to other non-zero values under conditions not specified by the C spec.
Leading white-space is allowed; trailing white-space is not considered.
Sample function that looks for 1) conversion, 2) extra space, 3) over/underflow. It not only returns a valid indication, it also addresses the value of the conversion and the state of errno afterward.
// Return 0 on success
// Return non-0 on error, adjust these values as needed - maybe as an `enum`?
int convert_string_to_double(double *y, const char *s) {
char *end;
errno = 0;
*y = strtod(s, &end);
if (s == end) {
return 1; // Failed: No conversion, *y will be 0
}
// This may/may not constitute an error - adjust per coding goals
// Too great or too small (yet not exactly 0.0)
if (errno == ERANGE) {
if (fabs(*y) > 1.0) {
return 2; // Overflow
}
// In the case of too small, errno _may_ be set. See §7.22.1.3 10.
// For high consistency, return 0.0 and/or clear errno and/or return success.
// *y = 0.0; errno = 0;
}
// What to do if the remainder of the string is not \0?
// Since leading whitespace is allowed,
// let code be generous and tolerate trailing whitespace too.
while (isspace((unsigned char) *end)) {
end++;
}
if (*end) {
return 3; // Failed: Extra non-white-space junk at the end.
}
return 0; // success
}
If the result underflows (7.12.1), the functions return a value whose magnitude is no greater than the smallest normalized positive number in the return type; whether errno acquires the value ERANGE is implementation-defined. C11dr §7.22.1.3 10
One consideration is the value of errno after this function is done. The C spec only specifies errno == ERANGE for strtod(), yet various implementations have been known to set errno to other values for other reasons, including "no conversion". For high consistency, code could clear errno except when it is ERANGE.
You could use the standard atof function. It returns 0 on fail - and you could test if the string was 0 beforehand.
http://www.cplusplus.com/reference/cstdlib/atof/
I tried
sscanf(str, "%016llX", &int64 );
but seems not safe. Is there a fast and safe way to do the type casting?
Thanks~
Don't bother with functions in the scanf family. They're nearly impossible to use robustly. Here's a general safe use of strtoull:
char *str, *end;
unsigned long long result;
errno = 0;
result = strtoull(str, &end, 16);
if (result == 0 && end == str) {
/* str was not a number */
} else if (result == ULLONG_MAX && errno) {
/* the value of str does not fit in unsigned long long */
} else if (*end) {
/* str began with a number but has junk left over at the end */
}
Note that strtoull accepts an optional 0x prefix on the string, as well as optional initial whitespace and a sign character (+ or -). If you want to reject these, you should perform a test before calling strtoull, for instance:
if (!isxdigit(str[0]) || (str[1] && !isxdigit(str[1])))
If you also wish to disallow overly long representations of numbers (leading zeros), you could check the following condition before calling strtoull:
if (str[0]=='0' && str[1])
One more thing to keep in mind is that "negative numbers" are not considered outside the range of conversion; instead, a prefix of - is treated the same as the unary negation operator in C applied to an unsigned value, so for example strtoull("-2", 0, 16) will return ULLONG_MAX-1 (without setting errno).
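Putting those pre-checks together, a strict variant could look like the sketch below. The name parse_hex_u64_strict is made up, and it takes the simple route of rejecting anything that is not a hex digit before calling strtoull at all, which rules out signs, whitespace and the 0x prefix in one pass.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: accepts a non-empty string of hex digits only.
 * Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_hex_u64_strict(const char *s, unsigned long long *out)
{
    char *end;
    unsigned long long v;
    size_t i;

    for (i = 0; s[i] != '\0'; i++)
        if (!isxdigit((unsigned char)s[i]))
            return 0;          /* sign, whitespace, 'x' of "0x", or junk */
    if (i == 0)
        return 0;              /* empty string */

    errno = 0;
    v = strtoull(s, &end, 16);
    if (errno == ERANGE || *end != '\0')
        return 0;              /* too large for unsigned long long */
    *out = v;
    return 1;
}
```

Leading zeros are still accepted here; add the str[0]=='0' && str[1] test from above if they should be rejected as well.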
Your title (at present) contradicts the code you provided. If you want to do what your title was originally (convert a string to an integer), then you can use this answer.
You could use the strtoull function, which unlike sscanf is a function specifically geared towards reading textual representations of numbers.
const char *test = "123456789abcdef0";
errno = 0;
unsigned long long result = strtoull(test, NULL, 16);
if (errno == EINVAL)
{
// not a valid number
}
else if (errno == ERANGE)
{
// does not fit in an unsigned long long
}
At the time I wrote this answer, your title suggested you'd want to write an uint64_t into a string, while your code did the opposite (reading a hex string into an uint64_t). I answered "both ways":
The <inttypes.h> header has conversion macros to handle the ..._t types safely:
#include <stdio.h>
#include <inttypes.h>
sprintf( str, "%016" PRIx64, uint64 );
Or (if that is indeed what you're trying to do), the other way round:
#include <stdio.h>
#include <inttypes.h>
sscanf( str, "%" SCNx64, &uint64 );
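Note that you cannot enforce a minimum width or zero padding with the scanf() function family; a width in a scanf() format is only a maximum field width. It parses what it gets, which can yield undesired results when the input does not adhere to the expected formatting. Oh, and <inttypes.h> only defines the (lowercase) SCNx64 macro for scanning; there is no (uppercase) SCNX64, although the x conversion accepts hex digits in either case.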
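A small round-trip sketch using those macros, in case it helps. The helper names u64_to_hex and hex_to_u64 are made up, the "%n" check for leftover characters is an addition of mine, and as with any scanf-based conversion the behaviour on overflow is not defined.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: writes value as 16 zero-padded hex digits.
 * Returns 1 if the result fit into buf (size bytes), 0 otherwise. */
int u64_to_hex(char *buf, size_t size, uint64_t value)
{
    int n = snprintf(buf, size, "%016" PRIx64, value);
    return n > 0 && (size_t)n < size;
}

/* Hypothetical helper: reads a hex string back into *out.
 * Returns 1 only if something was converted and nothing follows it. */
int hex_to_u64(const char *s, uint64_t *out)
{
    int consumed = 0;

    if (sscanf(s, "%" SCNx64 "%n", out, &consumed) != 1)
        return 0;
    return s[consumed] == '\0';
}
```

The buffer passed to u64_to_hex needs at least 17 bytes: 16 digits plus the terminator.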
gcc 4.4.4 c89
What is better to convert a string to an integer value.
I have tried 2 different methods atoi and sscanf. Both work as expected.
char digits[3] = "34";
int device_num = 0;
if(sscanf(digits, "%d", &device_num) == EOF) {
fprintf(stderr, "WARNING: Incorrect value for device\n");
return FALSE;
}
or using atoi
device_num = atoi(digits);
I was thinking that sscanf would be better as you can check for errors, whereas atoi doesn't do any checking.
You have 3 choices:
atoi
This is probably the fastest if you're using it in performance-critical code, but it does no error reporting. If the string does not begin with an integer, it will return 0. If the string contains junk after the integer, it will convert the initial part and ignore the rest. If the number is too big to fit in int, the behaviour is undefined.
sscanf
Some error reporting, and you have a lot of flexibility for what type to store (signed/unsigned versions of char/short/int/long/long long/size_t/ptrdiff_t/intmax_t).
The return value is the number of conversions that succeed, so scanning for "%d" will return 0 if the string does not begin with an integer. You can use "%d%n" to store the index of the first character after the integer that's read in another variable, and thereby check to see if the entire string was converted or if there's junk afterwards. However, like atoi, behaviour on integer overflow is undefined.
strtol and family
Robust error reporting, provided you set errno to 0 before making the call. Return values are specified on overflow and errno will be set. You can choose any number base from 2 to 36, or specify 0 as the base to auto-interpret leading 0x and 0 as hex and octal, respectively. Choices of type to convert to are signed/unsigned versions of long/long long/intmax_t.
If you need a smaller type you can always store the result in a temporary long or unsigned long variable and check for overflow yourself.
Since these functions take a pointer to pointer argument, you also get a pointer to the first character following the converted integer, for free, so you can tell if the entire string was an integer or parse subsequent data in the string if needed.
Personally, I would recommend the strtol family for most purposes. If you're doing something quick-and-dirty, atoi might meet your needs.
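For completeness, the "%d%n" trick mentioned under sscanf can be sketched like this. The helper name scan_whole_int is made up; note that *out may already have been written when the function reports trailing junk, and overflow is still not detected.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if s holds an int with nothing after it. */
int scan_whole_int(const char *s, int *out)
{
    int consumed = 0;

    /* %n stores how many characters were read so far; it does not
     * count towards the return value of sscanf(). */
    if (sscanf(s, "%d%n", out, &consumed) != 1)
        return 0;                 /* no integer at the start, or empty input */
    return s[consumed] == '\0';   /* 0 if junk follows the number */
}
```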
As an aside, sometimes I find I need to parse numbers where leading whitespace, sign, etc. are not supposed to be accepted. In this case it's pretty damn easy to roll your own for loop, eg.,
for (x=0; (unsigned)*s-'0'<10; s++)
x=10*x+(*s-'0');
Or you can use (for robustness):
if (isdigit(*s))
x=strtol(s, &s, 10);
else /* error */
The *scanf() family of functions returns the number of values converted, so you should check that sscanf() returns 1 in your case. EOF is returned only for an "input failure", which for sscanf() means reaching the end of the string before any conversion; input such as "abc" yields 0, not EOF, so the == EOF test misses it.
For sscanf(), the function has to parse the format string, and then decode an integer. atoi() doesn't have that overhead. Both suffer from the problem that out-of-range values result in undefined behavior.
You should use strtol() or strtoul() functions, which provide much better error-detection and checking. They also let you know if the whole string was consumed.
If you want an int, you can always use strtol(), and then check the returned value to see if it lies between INT_MIN and INT_MAX.
@R.. I think it's not enough to check errno for error detection in a strtol call.
long strtol (const char *String, char **EndPointer, int Base)
You'll also need to check EndPointer for errors.
Combining R.. and PickBoy answers for brevity
long strtol (const char *String, char **EndPointer, int Base)
// examples
strtol(s, NULL, 10);
strtol(s, &s, 10);
When there is no concern about invalid string input or range issues, use the simplest: atoi()
Otherwise, the method with best error/range detection is neither atoi(), nor sscanf().
This good answer already details the lack of error checking with atoi() and some error checking with sscanf().
strtol() is the most stringent function in converting a string to int. Yet it is only a start. Below are detailed examples to show proper usage and so the reason for this answer after the accepted one.
// Over-simplified use
int strtoi(const char *nptr) {
int i = (int) strtol(nptr, (char **)NULL, 10);
return i;
}
This is like atoi() and neglects to use the error detection features of strtol().
To fully use strtol(), there are various features to consider:
Detection of no conversion: Examples: "xyz", or "" or "--0"? In these cases, endptr will match nptr.
char *endptr;
int i = (int)strtol(nptr, &endptr, 10);
if (nptr == endptr) return FAIL_NO_CONVERT;
Should the whole string convert or just the leading portion: Is "123xyz" OK?
char *endptr;
int i = (int)strtol(nptr, &endptr, 10);
if (*endptr != '\0') return FAIL_EXTRA_JUNK;
Detect if the value was so big that the result is not representable as a long, like "999999999999999999999999999999".
errno = 0;
long L = strtol(nptr, &endptr, 10);
if (errno == ERANGE) return FAIL_OVERFLOW;
Detect if the value was outside the range of int, but not long. If int and long have the same range, this test is not needed.
long L = strtol(nptr, &endptr, 10);
if (L < INT_MIN || L > INT_MAX) return FAIL_INT_OVERFLOW;
Some implementations go beyond the C standard and set errno for additional reasons, such as EINVAL when no conversion was performed or EINVAL when the value of the base parameter is not valid. The best time to test for these errno values is implementation-dependent.
Putting this all together: (Adjust to your needs)
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
int strtoi(const char *nptr, int *error_code) {
char *endptr;
errno = 0;
long i = strtol(nptr, &endptr, 10);
#if LONG_MIN < INT_MIN || LONG_MAX > INT_MAX
if (errno == ERANGE || i > INT_MAX || i < INT_MIN) {
errno = ERANGE;
i = i > 0 ? INT_MAX : INT_MIN;
*error_code = FAIL_INT_OVERFLOW;
}
#else
if (errno == ERANGE) {
*error_code = FAIL_OVERFLOW;
}
#endif
else if (endptr == nptr) {
*error_code = FAIL_NO_CONVERT;
} else if (*endptr != '\0') {
*error_code = FAIL_EXTRA_JUNK;
} else if (errno) {
*error_code = FAIL_IMPLEMENTATION_REASON;
}
return (int) i;
}
Note: All functions mentioned allow leading spaces, an optional leading sign character and are affected by locale change. Additional code is required for a more restrictive conversion.
Note: Non-OP title change skewed emphasis. This answer applies better to original title "convert string to integer sscanf or atoi"
If user enters 34abc and you pass them to atoi it will return 34.
If you want to validate the value entered then you have to use isdigit on the entered string iteratively
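A minimal sketch of that loop, assuming "valid" means a non-empty string of decimal digits with no sign or whitespace (the name all_digits is made up):

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if s is non-empty and every character
 * is a decimal digit, 0 otherwise. */
int all_digits(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))   /* cast avoids UB for negative char */
            return 0;
    return 1;
}
```

This says nothing about whether the value fits in an int, so a range check is still needed after conversion.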
Working on a simple C program I'm stuck with an if test:
int line_number = 0;
if ((line_number >= argv[2]) && (line_number <= argv[4]))
gcc says:
cp.c:25: warning: comparison between pointer and integer
cp.c:25: warning: comparison between pointer and integer
What can I do to properly check the range of lines I want to deal with?
Of course it doesn't work: argv is a pointer to pointer to char. It's not clear what you want to do, but keep in mind that argv[2] is the third argument and argv[4] is the fifth. Both are of type char * (they are strings), so if you want to treat them as integers you have to convert them first, for example with atoi:
int value = atoi(argv[2]);
This parses the third argument as an int and stores it in a variable, which you can then compare however you like.
You should not be using the atoi function. In fact, you should forget it ever existed. It has no practical uses.
While Jack's answer is correct in stating that the argv strings have to be converted to numbers first, using atoi for that purpose (specifically in the situation when the input comes from the "outside world") is a crime against C programming. There are virtually no situations when atoi can be meaningfully used in a program.
The function that you should be using in this case is strtol
char *end;
errno = 0; /* strtol only sets errno on failure, so clear it first */
long long_value = strtol(argv[2], &end, 10);
if (end == argv[2] || *end != '\0' || errno == ERANGE)
    /* Conversion error happened */;
The exact error checking condition (like whether to require *end == '\0') will actually depend on your intent.
If you want to obtain an int in the end, you should also check the value for int range (or for your application-specific range)
if (long_value < INT_MIN || long_value > INT_MAX)
/* Out of bounds error */;
int value = long_value;
/* This is your final value */
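Tying this back to the original if test, here is one way the pieces could fit together. Both helper names (arg_to_int, line_in_range) are made up for this sketch, and it assumes that an unparsable bound should simply make the test fail.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts one command-line argument to int.
 * Returns 1 on success, 0 on any conversion problem. */
int arg_to_int(const char *arg, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(arg, &end, 10);
    if (end == arg || *end != '\0' || errno == ERANGE)
        return 0;                  /* empty, junk, or out of long range */
    if (v < INT_MIN || v > INT_MAX)
        return 0;                  /* fits in long but not in int */
    *out = (int)v;
    return 1;
}

/* Hypothetical helper: 1 if line_number lies in [first, last], where both
 * bounds are given as text (e.g. argv[2] and argv[4]). */
int line_in_range(int line_number, const char *first, const char *last)
{
    int lo, hi;

    if (!arg_to_int(first, &lo) || !arg_to_int(last, &hi))
        return 0;
    return line_number >= lo && line_number <= hi;
}
```

In a real program you would convert the two arguments once, right after checking argc, and report a usage error instead of silently returning 0.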