I am using getopt to parse arguments passed to my C program. I want to check that optarg is all numeric right at the point of parsing it, but I do not know how to get the length of the value in optarg:
int opt;
unsigned short port;
while((opt = getopt(argc, argv, "p:")) != EOF) {
switch(opt) {
case 'p':
for(int i = 0; i < OPTARG_LENGTH; i++) {
if(!isnum[optarg[i]]) exit();
}
port = strtoul(optarg, NULL, 10);
break;
}
So how do I find out OPTARG_LENGTH?
strlen(optarg) would give you that information, but don't check for an integer value with that naive method; let strtoul do the exact checking for you.
For that, change the call to pass a pointer to a char pointer, so that strtoul can report where parsing stopped if there is an error.
(For instance, if the number exceeds the unsigned long limit, your digit check is useless.)
Also note that port is an unsigned short, so you have to perform an extra range check before assigning it, or the value will silently wrap.
char *temp;
unsigned long port_ul = strtoul(optarg,&temp,10);
if (optarg != temp && *temp == '\0' && port_ul <= USHRT_MAX) // needs #include <limits.h>
{
// argument is properly parsed: ok
port = port_ul;
}
The check is done in 3 parts:
check that the string isn't empty, i.e. that temp has moved past optarg (often forgotten, thanks to chux for the tip in another answer)
check that temp actually points to the end of the string (*temp is '\0')
since there's no standard string-to-unsigned-short conversion, use limits.h to ensure the value isn't truncated when assigned to unsigned short.
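The three checks above, plus the unsigned long overflow case mentioned earlier, can be gathered into one small helper. This is only a sketch: the name parse_port and its 1/0 return convention are assumptions made for illustration, not part of the original answer.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the port on success, 0 on failure. */
int parse_port(const char *arg, unsigned short *port)
{
    char *end;
    unsigned long value;

    errno = 0;
    value = strtoul(arg, &end, 10);

    if (end == arg)          /* no digits were consumed (covers the empty string) */
        return 0;
    if (*end != '\0')        /* trailing characters after the number */
        return 0;
    if (errno == ERANGE)     /* value does not fit in unsigned long */
        return 0;
    if (value > USHRT_MAX)   /* fits in unsigned long, but not in unsigned short */
        return 0;

    *port = (unsigned short)value;
    return 1;
}
```

In the case 'p': branch this would be called as parse_port(optarg, &port), exiting with an error message when it returns 0. Note that strtoul itself still accepts leading whitespace and a sign, so a stricter format needs extra checks.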
Two-part question:
I'm coming from a high-level language, so this is a question about form, not function.
I've written an isnumeric() function that takes a char[] and returns 1 if the string is a number, taking advantage of the isdigit() function in ctype. Similar functions are built in to other languages, and I have always used something like that to integrity-check the data before converting it to a numeric type, mostly because some languages' conversion functions fail badly if you try to convert a non-number string to an integer.
But it seems like a kludge having to do all that looping to compensate for the lack of strings in C, which poses the first part of the question:
Is it acceptable practice in C to trap for a 0 return from atoi() in lieu of doing an integrity check on the data before calling atoi()? The way atoi() (and other ascii to xx functions) works seems to lend itself well to eliminating the integrity check altogether. It would certainly seem more efficient to just skip the check.
The second part of the question is;
Is there a C function or common library function for a numeric integrity check
on a string? (by string, I of course mean char[])
Is it acceptable practice in C to trap for a 0 return from atoi() in lieu of doing an integrity check on the data before calling atoi()?
Never ever trap on error unless the error indicates a programming error that can't happen if there isn't a bug in the code. Always return some sort of error result in case of an error. Look at the OpenBSD strtonum function for how you could design such an interface.
The second part of the question is; Is there a C function or common library function for a numeric integrity check on a string? (by string, I of course mean char[])
Never use atoi unless you are writing a program without error checking, as atoi doesn't do any error checking. The strtol family of functions allows you to check for errors. Here is a simple example of how you could use them:
#include <errno.h>
#include <stdlib.h>

int check_is_number(const char *buf)
{
    char *endptr; /* strtol takes a char **, so this must not be const */
    int errsave = errno, errval;
    errno = 0;
    (void)strtol(buf, &endptr, 0); /* only the error state matters here */
    errval = errno;
    errno = errsave; /* restore the caller's errno */
    if (errval != 0)
        return 0; /* an error occurred */
    if (buf[0] == '\0' || *endptr != '\0')
        return 0; /* not a number */
    return 1;
}
See the manual page linked before for how the third argument to strtol (base) affects what it does.
errno is set to ERANGE if the value is out of range for the desired type (i.e. long). In this case, the return value is LONG_MAX or LONG_MIN.
If the conversion method returns an error indication (as distinct from going bananas if an error occurs, or not providing a definitive means to check if an error has occurred) then there is actually no need to check if a string is numeric before trying to convert it.
With that in mind, atoi() is not a particularly good function to use if you need to check for errors on conversion. Zero will be returned for zero input as well as for an error, and there is no way to tell which. A better function to use (assuming you want to read an integral value) is strtol(). Although strtol() also returns zero on error, it provides additional information that can be used to check for failure. For example:
long x;
char *end;
x = strtol(your_string, &end, 10);
if (end == your_string)
{
/* nothing was read due to invalid character or the first
character marked the end of string */
}
else if (*end != '\0')
{
/* an integral value was read, but there is following non-numeric data */
}
Second, there are alternatives to using strtol(), albeit involving more overhead. The return values from sscanf() (and, in fact, all functions in the scanf() family) can be checked for error conditions.
There is no standard function for checking if a string is numeric, but it can be easily rolled using the above.
int IsNumeric(char *your_string)
{
/* This has undefined behaviour if your_string is not a C-style string
It also deems that a string like "123AB" is non-numeric
*/
long x;
char *end;
x = strtol(your_string, &end, 10);
return !(end == your_string || *end != '\0');
}
No (explicit) loops in any of the above options.
Is it acceptable practice in C to trap for a 0 return from atoi() in lieu of doing an integrity check on the data before calling atoi()?
No. @FUZxxl answers that well.
Is there a C function or common library function for a numeric integrity check on a string?
In C, the conversion of a string to a number and the check to see if the conversion is valid is usually done together. The function used depends on the type of number sought. "1.23" would make sense for a floating point type, but not an integer.
// No error handle functions
int atoi(const char *nptr);
long atol(const char *nptr);
long long atoll(const char *nptr);
double atof(const char *nptr);
// Some error detection functions
if (sscanf(buffer, "%d", &some_int) == 1) ...
if (sscanf(buffer, "%lf", &some_double) == 1) ...
// Robust methods use
long strtol( const char *nptr, char ** endptr, int base);
long long strtoll( const char *nptr, char ** endptr, int base);
unsigned long strtoul( const char *nptr, char ** endptr, int base);
unsigned long long strtoull( const char *nptr, char ** endptr, int base);
intmax_t strtoimax(const char *nptr, char ** endptr, int base);
uintmax_t strtoumax(const char *nptr, char ** endptr, int base);
float strtof( const char *nptr, char ** endptr);
double strtod( const char *nptr, char ** endptr);
long double strtold( const char *nptr, char ** endptr);
These robust methods use char ** endptr to store the string location where scanning stopped. If no numeric data was found, then *endptr == nptr. So a common test is
char *endptr;
y = strto...(buffer, &endptr, ...);
if (buffer == endptr) puts("No conversion");
if (*endptr != '\0') puts("Extra text");
If the range was exceeded, these functions all set the global variable errno to ERANGE and return a minimum or maximum value for the type.
errno = 0;
double y = strtod("1.23e10000000", &endptr);
if (errno == ERANGE) puts("Range exceeded");
The integer functions allow a radix from base 2 to 36. If 0 is used, the base is taken from the leading part of the string: "0x" or "0X" selects base 16, a leading "0" selects base 8, and anything else selects base 10.
long y = strtol(buffer, &endptr, 10);
Read the specification or help page for more details.
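To make the base-0 behaviour described above concrete, here is a tiny sketch. The wrapper name parse_auto_base is an assumption for illustration, and it deliberately skips the endptr and errno checks shown earlier.

```c
#include <stdlib.h>

/* Hypothetical wrapper: lets strtol pick the base from the string's prefix. */
long parse_auto_base(const char *text)
{
    return strtol(text, NULL, 0);
}

/*
 * parse_auto_base("0x1A") -> 26  (prefix 0x: base 16)
 * parse_auto_base("010")  -> 8   (leading 0: base 8)
 * parse_auto_base("10")   -> 10  (no prefix: base 10)
 */
```

If user input such as "010" should mean ten, pass 10 as the base explicitly rather than 0.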
You probably don't need a function to check whether a string is numeric. You will most likely need to convert the string to a number, so just do that. Then check whether the conversion was successful.
long number;
char *end;
number = strtol(string, &end, 10);
if ((*string == '\0') || (*end != '\0'))
{
// empty string or invalid number
}
The second argument of strtol is used to indicate where the parsing ended (the first non-numeric character). That character will be \0 if we've reached the end of the string. If you want to permit other characters after the number (like ), you can use switch to check for it.
strtol works with long integers. If you need some other type, you should consult the man page: man 3 strtol. For floating-point numbers you can use strtod.
Don't trap if the program logic permits that the string is not numeric (e.g. if it comes from the user or a file).
OP later commented:
I'm looking for a way to determine if the string contains ONLY base 10 digits or a decimal or a comma. So if the string is 100,000.01 I want a positive return from func. Any other ascii characters anywhere in the string would result in a negative return value.
If that is all you are interested in, use:
if (buffer[strspn(buffer, "0123456789.,")] == '\0') return 0; // Success
else return -1; // Failure
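Wrapped in a function, the strspn test above might look like this sketch. The function name is an assumption for illustration; the return convention (0 for success, -1 for failure) follows the snippet above.

```c
#include <string.h>

/* Returns 0 if buffer contains only the characters 0-9, '.' and ','; -1 otherwise. */
int only_digits_dots_commas(const char *buffer)
{
    /* strspn gives the length of the leading run made up of allowed characters;
       if that run reaches the terminator, nothing else is in the string. */
    return buffer[strspn(buffer, "0123456789.,")] == '\0' ? 0 : -1;
}
```

Keep in mind that this only checks the character set: an empty string and something like ".,.," both pass, so it does not prove the text is a well-formed number.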
In my program, I input an int value into argv[1]. I need to put an if statement like this:
num = 3;
if (argv[1] == num)
{
[...]
}
I get a warning: comparison between pointer and integer [enabled by default]
How can I compare those two values?
Remember that argv, as passed to main, is an array of strings.
You can convert a string to an integer with functions like atoi or strtol (the latter is the preferred alternative). Or you convert the integer to a string, and do a strcmp.
num is an integer, while argv[1] is a string that may (or may not) be representing an integer. You can compare only items of the same type, so either compare a string-to-string or an integer-to-integer:
if (strcmp(argv[1], "3") == 0) {
// ...
}
or
if (atoi(argv[1]) == 3) {
// ...
}
The second way will fall apart when you try comparing to zero (atoi returns zero to indicate an error).
num = 3;
if (atoi(argv[1]) == num)
{
[...]
}
The command line arguments are strings. You will need to convert these strings first using atoi (not suggested) or strtol/strtoul (better, has error handling) and then use the converted value to compare with whatever integer you want to compare with.
char *endptr;
errno = 0;
long int n = strtol(argv[1], &endptr, 10);
if (endptr == argv[1])
...; /* no conversion */
else if (*endptr != '\0')
...; /* conversion incomplete */
else if (errno == ERANGE)
...; /* out of long int's range */
...
You may need to read a number from argv[1] using various methods, then compare with num. (s*scanf)
One that's most specific for you: http://pubs.opengroup.org/onlinepubs/7908799/xsh/strtol.html
Or print num into a string and do a strcmp with argv[1] (s*printf)
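The print-then-compare route can be sketched as follows. The helper name arg_equals_int is an assumption for illustration, and snprintf requires C99 or later.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if arg is exactly the decimal text of num. */
int arg_equals_int(const char *arg, int num)
{
    char text[32];   /* ample room for any int in decimal, sign and terminator included */

    snprintf(text, sizeof text, "%d", num);
    return strcmp(arg, text) == 0;
}
```

This is a textual comparison, so "03" or "+3" will not match 3. If those spellings should count as equal, convert argv[1] with strtol and compare the numbers instead.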
I want to get the following piece of code to work:
#define READIN(a, b) if(scanf('"#%d"', '"&a"') != 1) { printf("ERROR"); return EXIT_FAILURE; }
int main(void)
{
unsigned int stack_size;
printf("Type in size: ");
READIN(d, stack_size);
}
I don't get how to use directives with the # operator. I want to use the scanf with print ERROR etc. several times, but the "'"#%d"' & '"&a"'" is, I think, completely wrong. Is there any way to get that running? I think a macro is the best solution — or do you disagree?
You should only stringify arguments to the macro, and they must be outside of strings or character constants in the replacement text of the macro. Thus you probably should use:
#define READIN(a, b) do { if (scanf("%" #a, &b) != 1) \
{ fprintf(stderr, "ERROR\n"); return EXIT_FAILURE; } \
} while (0)
int main(void)
{
unsigned int stack_size;
printf("Type in size: ");
READIN(u, stack_size);
printf("You entered %u\n", stack_size);
return(0);
}
There are many changes. The do { ... } while (0) idiom prevents you from getting compilation errors in circumstances such as:
if (i > 10)
READIN(u, j);
else
READIN(u, k);
With your macro, you'd get an unexpected keyword 'else' type of message because the semi-colon after the first READIN() would be an empty statement after the embedded if, so the else could not belong to the visible if or the if inside the macro.
The type of stack_size is unsigned int; the correct format specifier, therefore, is u (d is for a signed int).
And, most importantly, the argument a in the macro is stringized correctly (and string concatenation of adjacent string literals - an extremely useful feature of C89! - takes care of the rest for you). And the argument b in the macro is not embedded in a string either.
The error reporting is done to stderr (the standard stream for reporting errors on), and the message ends with a newline so it will actually appear. I didn't replace return EXIT_FAILURE; with exit(EXIT_FAILURE);, but that would probably be a sensible choice if the macro will be used outside of main(). That assumes that 'terminate on error' is the appropriate behaviour in the first place. It often isn't for interactive programs, but fixing it is a bit harder.
I'm also ignoring my reservations about using scanf() at all; I usually avoid doing so because I find error recovery too hard. I've only been programming in C for about 28 years, and I still find scanf() too hard to control, so I essentially never use it. I typically use fgets() and sscanf() instead. Amongst other merits, I can report on the string that caused the trouble; that's hard to do when scanf() may have gobbled some of it.
My thought with scanf() here is, to only read in positive numbers and no letters. My overall code does create a stack, which the user types in and the type should be only positive, otherwise error. [...] I only wanted to know if there's a better solution to forbid the user to type in something other than positive numbers?
I just tried the code above (with #include <stdlib.h> and #include <stdio.h> added) and entered -2 and got told 4294967294, which isn't what I wanted (the %u format does not reject -2, at least on MacOS X 10.7.2). So, I would go with fgets() and strtoul(), most likely. However, accurately detecting all possible problems with strtoul() is an exercise of some delicacy.
This is the alternative code I came up with:
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <limits.h>
#include <string.h>
int main(void)
{
unsigned int stack_size = 0;
char buffer[4096];
printf("Type in size: ");
if (fgets(buffer, sizeof(buffer), stdin) == 0)
printf("EOF or error detected\n");
else
{
char *eos;
unsigned long u;
size_t len = strlen(buffer);
if (len > 0)
buffer[len - 1] = '\0'; // Zap newline (assuming there is one)
errno = 0;
u = strtoul(buffer, &eos, 10);
if (eos == buffer ||
(u == 0 && errno != 0) ||
(u == ULONG_MAX && errno != 0) ||
(u > UINT_MAX))
{
printf("Oops: one of many problems occurred converting <<%s>> to unsigned integer\n", buffer);
}
else
stack_size = u;
printf("You entered %u\n", stack_size);
}
return(0);
}
The specification of strtoul() is given in ISO/IEC 9899:1999 §7.20.1.4:
¶1 [...]
unsigned long int strtoul(const char * restrict nptr,
char ** restrict endptr, int base);
[...]
¶2 [...] First,
they decompose the input string into three parts: an initial, possibly empty, sequence of
white-space characters (as specified by the isspace function), a subject sequence
resembling an integer represented in some radix determined by the value of base, and a
final string of one or more unrecognized characters, including the terminating null
character of the input string. Then, they attempt to convert the subject sequence to an
integer, and return the result.
¶3 [...]
¶4 The subject sequence is defined as the longest initial subsequence of the input string,
starting with the first non-white-space character, that is of the expected form. The subject
sequence contains no characters if the input string is empty or consists entirely of white
space, or if the first non-white-space character is other than a sign or a permissible letter
or digit.
¶5 If the subject sequence has the expected form and the value of base is zero, the sequence
of characters starting with the first digit is interpreted as an integer constant according to
the rules of 6.4.4.1. If the subject sequence has the expected form and the value of base
is between 2 and 36, it is used as the base for conversion, ascribing to each letter its value
as given above. If the subject sequence begins with a minus sign, the value resulting from
the conversion is negated (in the return type). A pointer to the final string is stored in the
object pointed to by endptr, provided that endptr is not a null pointer.
¶6 [...]
¶7 If the subject sequence is empty or does not have the expected form, no conversion is
performed; the value of nptr is stored in the object pointed to by endptr, provided
that endptr is not a null pointer.
Returns
¶8 The strtol, strtoll, strtoul, and strtoull functions return the converted
value, if any. If no conversion could be performed, zero is returned. If the correct value
is outside the range of representable values, LONG_MIN, LONG_MAX, LLONG_MIN,
LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type
and sign of the value, if any), and the value of the macro ERANGE is stored in errno.
The error I got was from a 64-bit compilation where -2 was converted to a 64-bit unsigned long, and that was outside the range acceptable to a 32-bit unsigned int (the failing condition was u > UINT_MAX). When I recompiled in 32-bit mode (so sizeof(unsigned int) == sizeof(unsigned long)), the value -2 was accepted again and interpreted as 4294967294 again. So, even this is not delicate enough... you probably have to do a manual skip of leading blanks and reject a negative sign (and maybe a positive sign too); you'd also need to #include <ctype.h>:
char *bos = buffer;
while (isspace((unsigned char)*bos)) /* cast avoids undefined behaviour for negative char values */
bos++;
if (!isdigit((unsigned char)*bos))
...error - not a digit...
char *eos;
unsigned long u;
size_t len = strlen(bos);
if (len > 0)
bos[len - 1] = '\0'; // Zap newline (assuming there is one)
errno = 0;
u = strtoul(bos, &eos, 10);
if (eos == bos ||
(u == 0 && errno != 0) ||
(u == ULONG_MAX && errno != 0) ||
(u > UINT_MAX))
{
printf("Oops: one of many problems occurred converting <<%s>> to unsigned integer\n", buffer);
}
As I said, the whole process is rather non-trivial.
(Looking at it again, I'm not sure whether the u == 0 && errno != 0 clause would ever catch any errors...maybe not because the eos == buffer (or eos == bos) condition catches the case there's nothing to convert at all.)
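Pulling the pieces above together, a minimal sketch of the stricter check might look like the following. The helper name parse_unsigned and its 1/0 return convention are assumptions made for illustration, not part of the original code; it skips leading blanks by hand, insists on a digit, and tolerates a trailing newline instead of zapping it.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value on success, 0 on any problem. */
int parse_unsigned(const char *text, unsigned int *out)
{
    const char *start = text;
    char *end;
    unsigned long value;

    while (isspace((unsigned char)*start))    /* skip leading blanks by hand */
        start++;
    if (!isdigit((unsigned char)*start))      /* rejects "", "-2", "+2", "abc" */
        return 0;

    errno = 0;
    value = strtoul(start, &end, 10);
    if (errno == ERANGE || value > UINT_MAX)  /* too big for unsigned long or unsigned int */
        return 0;
    if (*end == '\n')                         /* tolerate the newline that fgets() keeps */
        end++;
    if (*end != '\0')                         /* anything else after the digits is an error */
        return 0;

    *out = (unsigned int)value;
    return 1;
}
```

After a successful fgets(buffer, sizeof(buffer), stdin), the call would be parse_unsigned(buffer, &stack_size). Because the first non-blank character must be a digit, -2 is rejected in both 32-bit and 64-bit builds.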
You are incorrectly encasing your macro argument(s), it should look like:
#define READIN(a, b) if(scanf("%"#a, &b) != 1) { printf("ERROR"); return EXIT_FAILURE; }
Your use of the stringify operator was also incorrect; it must directly prefix the argument name.
In short, use "%"#a, not '"#%d"', and &b, not '"&a"'.
As a side note, for longish macros like these, it helps to split them over multiple lines using \; this keeps them readable:
#define READIN(a, b) \
if(scanf("%"#a, &b) != 1) \
{ \
printf("ERROR"); \
return EXIT_FAILURE; \
}
When doing something like this, one should preferably use a function, something along the lines of this should work:
static inline int readIn(const char* szFormat, void* pDst)
{
if(scanf(szFormat,pDst) != 1)
{
puts("Error");
return 0;
}
return 1;
}
invoking it would be like so:
if(!readIn("%u",&stack_size))
return EXIT_FAILURE;
scanf(3) takes a const char * as a first argument. You are passing '"..."', which is not a C "string". C strings are written with the " double quotes. The ' single quotes are for individual characters: 'a' or '\n' etc.
Placing a return statement inside a C preprocessor macro is usually considered very poor form. I've seen goto error; coded inside preprocessor macros before for repetitive error handling code when storing formatted data to and reading data from a file or kernel interface, but these are definitely exceptional circumstances. You would detest debugging this in six months time. Trust me. Do not hide goto, return, break, continue, inside C preprocessor macros. if is alright so long as it is entirely contained within the macro.
Also, please get in the habit of writing your printf(3) statements like this:
printf("%s", "ERROR");
Format string vulnerabilities are exceedingly easy to write. Your code does not contain any such vulnerability now, but trust me, at some point in the future those strings are inevitably modified to include some user-supplied content, and putting in an explicit format string now will help prevent these in the future. At least you'll think about it in the future if you see this.
It is considered polite to wrap your multi-line macros in do { } while (0) blocks.
Finally, the stringification is not quite done correctly; try this instead:
#define READIN(A, B) do { if (scanf("%" #A, B) != 1) { \
/* error handling */ \
} else { \
/* success case */ \
} } while(0)
Edit: I feel I should re-iterate akappa's advice: Use a function instead. You get better type checking, better backtraces when something goes wrong, and it is far easier to work with. Functions are good.
gcc 4.4.4 c89
Which is better for converting a string to an integer value?
I have tried 2 different methods atoi and sscanf. Both work as expected.
char digits[3] = "34";
int device_num = 0;
if(sscanf(digits, "%d", &device_num) == EOF) {
fprintf(stderr, "WARNING: Incorrect value for device\n");
return FALSE;
}
or using atoi
device_num = atoi(digits);
I was thinking that sscanf would be better, as you can check for errors, whereas atoi doesn't do any checking.
You have 3 choices:
atoi
This is probably the fastest if you're using it in performance-critical code, but it does no error reporting. If the string does not begin with an integer, it will return 0. If the string contains junk after the integer, it will convert the initial part and ignore the rest. If the number is too big to fit in int, the behaviour is undefined.
sscanf
Some error reporting, and you have a lot of flexibility for what type to store (signed/unsigned versions of char/short/int/long/long long/size_t/ptrdiff_t/intmax_t).
The return value is the number of conversions that succeed, so scanning for "%d" will return 0 if the string does not begin with an integer. You can use "%d%n" to store the index of the first character after the integer that's read in another variable, and thereby check to see if the entire string was converted or if there's junk afterwards. However, like atoi, behaviour on integer overflow is undefined.
strtol and family
Robust error reporting, provided you set errno to 0 before making the call. Return values are specified on overflow and errno will be set. You can choose any number base from 2 to 36, or specify 0 as the base to auto-interpret leading 0x and 0 as hex and octal, respectively. Choices of type to convert to are signed/unsigned versions of long/long long/intmax_t.
If you need a smaller type you can always store the result in a temporary long or unsigned long variable and check for overflow yourself.
Since these functions take a pointer to pointer argument, you also get a pointer to the first character following the converted integer, for free, so you can tell if the entire string was an integer or parse subsequent data in the string if needed.
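The "%d%n" technique mentioned in the sscanf entry above can be sketched like this. The helper name scan_whole_int is an assumption for illustration; as noted there, integer overflow is still not detected by this approach.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if str is exactly one int with nothing after it. */
int scan_whole_int(const char *str, int *out)
{
    int consumed = 0;

    /* %n stores the number of characters read so far; it is not counted
       in sscanf's return value, so a successful scan still returns 1. */
    if (sscanf(str, "%d%n", out, &consumed) != 1)
        return 0;
    return str[consumed] == '\0';   /* junk after the number means failure */
}
```

With this, "34" succeeds while "34abc", "abc" and the empty string all fail.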
Personally, I would recommend the strtol family for most purposes. If you're doing something quick-and-dirty, atoi might meet your needs.
As an aside, sometimes I find I need to parse numbers where leading whitespace, sign, etc. are not supposed to be accepted. In this case it's pretty damn easy to roll your own for loop, eg.,
for (x=0; (unsigned)*s-'0'<10; s++)
x=10*x+(*s-'0');
Or you can use (for robustness):
if (isdigit(*s))
x=strtol(s, &s, 10);
else /* error */
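The *scanf() family of functions return the number of values converted, so you should check that sscanf() returns 1 in your case. EOF is returned only for an "input failure" before the first conversion, which for sscanf() means the string ran out (for example, an empty string); for input such as "abc", sscanf() returns 0, not EOF, so comparing against EOF misses the error.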
For sscanf(), the function has to parse the format string, and then decode an integer. atoi() doesn't have that overhead. Both suffer from the problem that out-of-range values result in undefined behavior.
You should use strtol() or strtoul() functions, which provide much better error-detection and checking. They also let you know if the whole string was consumed.
If you want an int, you can always use strtol(), and then check the returned value to see if it lies between INT_MIN and INT_MAX.
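A minimal sketch of that range check follows. The helper name str_to_int and its 1/0 return convention are assumptions for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value if s is a whole int, else 0. */
int str_to_int(const char *s, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(s, &end, 10);

    if (end == s || *end != '\0')     /* nothing converted, or trailing junk */
        return 0;
    if (errno == ERANGE)              /* out of range even for long */
        return 0;
    if (value < INT_MIN || value > INT_MAX)   /* fits in long but not in int */
        return 0;

    *out = (int)value;
    return 1;
}
```

On platforms where int and long have the same range, the last test never fires, but it is harmless to keep.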
To @R..: I think it's not enough to check errno for error detection in a strtol call.
long strtol (const char *String, char **EndPointer, int Base)
You'll also need to check EndPointer for errors.
Combining R.. and PickBoy answers for brevity
long strtol (const char *String, char **EndPointer, int Base)
// examples
strtol(s, NULL, 10);
strtol(s, &s, 10);
When there is no concern about invalid string input or range issues, use the simplest: atoi()
Otherwise, the method with best error/range detection is neither atoi(), nor sscanf().
This good answer already details the lack of error checking with atoi() and the limited error checking with sscanf().
strtol() is the most stringent function for converting a string to int, yet it is only a start. The detailed examples below show proper usage, which is the reason for adding this answer after the accepted one.
// Over-simplified use
int strtoi(const char *nptr) {
int i = (int) strtol(nptr, (char **)NULL, 10);
return i;
}
This is like atoi() and neglects to use the error detection features of strtol().
To fully use strtol(), there are various features to consider:
Detection of no conversion: Examples: "xyz", or "" or "--0"? In these cases, endptr will match nptr.
char *endptr;
int i = (int)strtol(nptr, &endptr, 10);
if (nptr == endptr) return FAIL_NO_CONVERT;
Should the whole string convert or just the leading portion: Is "123xyz" OK?
char *endptr;
int i = (int)strtol(nptr, &endptr, 10);
if (*endptr != '\0') return FAIL_EXTRA_JUNK;
Detect if the value was so big that the result is not representable as a long, like "999999999999999999999999999999".
errno = 0;
long L = strtol(nptr, &endptr, 10);
if (errno == ERANGE) return FAIL_OVERFLOW;
Detect if the value was outside the range of int, but not of long. If int and long have the same range, this test is not needed.
long L = strtol(nptr, &endptr, 10);
if (L < INT_MIN || L > INT_MAX) return FAIL_INT_OVERFLOW;
Some implementations go beyond the C standard and set errno for additional reasons, such as setting errno to EINVAL when no conversion was performed or when the value of the base parameter is not valid. The best time to test for these errno values is implementation dependent.
Putting this all together: (Adjust to your needs)
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
int strtoi(const char *nptr, int *error_code) {
char *endptr;
errno = 0;
long i = strtol(nptr, &endptr, 10);
#if LONG_MIN < INT_MIN || LONG_MAX > INT_MAX
if (errno == ERANGE || i > INT_MAX || i < INT_MIN) {
errno = ERANGE;
i = i > 0 ? INT_MAX : INT_MIN;
*error_code = FAIL_INT_OVERFLOW;
}
#else
if (errno == ERANGE) {
*error_code = FAIL_OVERFLOW;
}
#endif
else if (endptr == nptr) {
*error_code = FAIL_NO_CONVERT;
} else if (*endptr != '\0') {
*error_code = FAIL_EXTRA_JUNK;
} else if (errno) {
*error_code = FAIL_IMPLEMENTATION_REASON;
}
return (int) i;
}
Note: All functions mentioned allow leading spaces, an optional leading sign character and are affected by locale change. Additional code is required for a more restrictive conversion.
Note: Non-OP title change skewed emphasis. This answer applies better to original title "convert string to integer sscanf or atoi"
If the user enters 34abc and you pass it to atoi, it will return 34.
If you want to validate the value entered, you have to use isdigit on each character of the entered string.
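A minimal sketch of such a character-by-character check; the function name all_digits is an assumption for illustration, and treating the empty string as invalid is a design choice of this sketch.

```c
#include <ctype.h>

/* Returns 1 if s is non-empty and every character is a decimal digit, else 0. */
int all_digits(const char *s)
{
    if (*s == '\0')                 /* treat the empty string as invalid */
        return 0;
    for (; *s != '\0'; s++) {
        /* the cast avoids undefined behaviour for negative char values */
        if (!isdigit((unsigned char)*s))
            return 0;
    }
    return 1;
}
```

This rejects "34abc" and also signed input such as "-3". It says nothing about whether the value fits in an int, so a range check is still needed after conversion.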