How to use `strtoul` to parse string where zero may be valid? - c

According to the documentation for strtoul, regarding its return value...
This function returns the converted integral number as a long int value. If no valid conversion could be performed, a zero value is returned.
What if I'm parsing a user-supplied string of "0" where, for my application, "0" may be a valid entry? In that case it seems that I have no way to determine from using strtoul if a valid conversion was performed. Is there another way to handle this?

Read further the man page:
Since strtoul() can legitimately return 0 or ULONG_MAX (ULLONG_MAX for strtoull()) on both success and failure, the calling program should set errno to 0 before the call, and then determine if an error occurred by checking whether errno has a nonzero value after the call.
You also need to handle the case where no digits were read from the input at all. When that happens, strtoul(), like strtol(), stores the value of nptr in *endptr, so you should also check whether the two pointers compare equal.
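A minimal sketch of the two checks described above, assuming base 10 and that trailing characters after the number are acceptable. The helper name `to_ulong` and its return convention are hypothetical, not part of any library.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value in *out on success,
   0 if no digits were found or the value is out of range. */
int to_ulong(const char *s, unsigned long *out)
{
    char *end;

    errno = 0;                      /* strtoul() never clears errno itself */
    unsigned long v = strtoul(s, &end, 10);

    if (end == s)                   /* no digits consumed: "", "abc" */
        return 0;
    if (errno == ERANGE)            /* value does not fit in unsigned long */
        return 0;

    *out = v;
    return 1;
}
```

With this shape, a return of 1 with `*out == 0` means the input really was zero, while a return of 0 means nothing usable was parsed.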

How to use strtoul to parse string where zero may be valid?
Any value returned by strtoul() may come from an expected input string or from a less expected one, so further tests are useful.
The following strings all return 0 from strtoul()
OK "0", "-0", "+0"
Not OK "", "abc"
Usually considered OK: " 0"
OK or not OK depending on goals: "0xyz", "0 ", "0.0"
strtoul() offers several ways to detect these cases:
int base = 10;
char *endptr; // Store the location where conversion stopped
errno = 0;
unsigned long y = strtoul(s, &endptr, base);
if (s == endptr) puts("No conversion"); // "", "abc"
else if (errno == ERANGE) puts("Overflow");
else if (*endptr) puts("Extra text after the number"); // "0xyz", "0 ", "0.0"
else puts("Mostly successful");
What is not yet detected:
Negative input. strtoul() effectively wraps around, such that strtoul("-1", 0, 10) == ULONG_MAX. This issue is often missed in a cursory review of the documentation.
Leading white space allowed. This may or may not be desired.
To also detect negative values:
// find sign
while (isspace((unsigned char) *s)) {
s++;
}
char sign = *s;
int base = 10;
char *endptr; // Store the location where conversion stopped
errno = 0;
unsigned long y = strtoul(s, &endptr, base);
if (s == endptr) puts("No conversion");
else if (errno == ERANGE) puts("Overflow");
else if (*endptr) puts("Extra text after the number");
else if (sign == '-' && y != 0) puts("Negative value");
else puts("Successful");
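If leading white space and signs are both unwanted, one possible approach is to require that the string start with a digit before calling strtoul() at all. The sketch below assumes base 10 and that the whole string must be digits; the name `parse_plain_ulong` is hypothetical.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: accepts only strings made entirely of decimal digits.
   Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_plain_ulong(const char *s, unsigned long *out)
{
    char *end;

    /* Requiring a leading digit rejects "", " 0", "+0", "-0" and "-1" up front. */
    if (!isdigit((unsigned char)*s))
        return 0;

    errno = 0;
    unsigned long v = strtoul(s, &end, 10);

    if (errno == ERANGE)            /* overflow */
        return 0;
    if (*end != '\0')               /* trailing text such as "0xyz", "0 ", "0.0" */
        return 0;

    *out = v;
    return 1;
}
```

Because the first character is known to be a digit, the "no conversion" case cannot occur and the endptr comparison is not needed here.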

One solution would be to pass the address of a char pointer and check if it is pointing to the beginning of the string:
char *str = "0";
char *endptr;
unsigned long x = strtoul(str, &endptr, 10);
if(endptr == str)
{
//Nothing was read
}

Consider the following function:
#include <stdlib.h>
#include <errno.h>
/* SPDX-Identifier: CC0-1.0 */
const char *parse_ulong(const char *src, unsigned long *to)
{
const char *end;
unsigned long val;
if (!src) {
errno = EINVAL;
return NULL;
}
end = src;
errno = 0;
val = strtoul(src, (char **)(&end), 0);
if (errno)
return NULL;
if (end == src) {
errno = EINVAL;
return NULL;
}
if (to)
*to = val;
return end;
}
This function parses the unsigned long in the string src, returning a pointer to the first unparsed character in src, with the unsigned long saved to *to. If there is an error, the function will return NULL with errno set to indicate the error.
If you compare the function to man 3 strtoul, you'll see it handles all error cases correctly, and only returns non-NULL when src yields a valid unsigned long. Especially see the Notes section. Also pay attention to how negative numbers are handled.
This same pattern works for strtol(), strtod(), strtoull().
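As an illustration of that last remark, here is a sketch of the same structure applied to strtol(). The name `parse_long` is hypothetical; the error convention (NULL return with errno set) simply mirrors parse_ulong() above.

```c
#include <errno.h>
#include <stdlib.h>

/* Sketch: the parse_ulong() pattern applied to strtol().
   Returns a pointer to the first unparsed character,
   or NULL with errno set to indicate the error. */
const char *parse_long(const char *src, long *to)
{
    char *end;
    long val;

    if (!src) {
        errno = EINVAL;
        return NULL;
    }

    errno = 0;
    val = strtol(src, &end, 0);     /* base 0: accepts decimal, 0x hex, 0 octal */
    if (errno)
        return NULL;                /* typically ERANGE: out of range for long */
    if (end == src) {
        errno = EINVAL;             /* no digits at all */
        return NULL;
    }

    if (to)
        *to = val;
    return end;
}
```

Returning the end pointer lets the caller decide whether trailing text is an error or the start of the next field.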

Related

How I can handle integer overflow?

I am trying to handle integer overflow. My code is :
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<errno.h>
#include<limits.h>
int isInt (char *s)
{
char *ep = NULL;
long i = strtol (s, &ep, 10);
if ((*ep == 0) || (!strcmp(ep,"\n")))
return 1; // it's an int
return 0;
}
int main()
{
char *buffer = NULL;
size_t count = 0;
ssize_t ret;
//AMINO *a_acid;
int num;
for(;;)
{
printf("Please enter an integer:");
if((ret = getline(&buffer, &count, stdin)) < 0)
{
perror("getline: error\n");
free(buffer);
exit(EXIT_FAILURE);
}
if(!isInt(buffer))
{
perror("you are not entering int , Try again:");
continue;
}
sscanf(buffer, "%d",&num);
printf("%d\n", num);
if ((num > INT_MAX)|| (num < 0))
{
perror("you overflowed int variable , Try again:\n ");
continue;
}
break;
}
}
Now I was checking how this code responds, and I saw something weird. When I enter a very big number, the overflow is sometimes detected and sometimes not.
Here is my terminal view:
> nazmul#nazmul-Lenovo-G50-80:~/2nd_sem/biophysics$ gcc torson.c
> nazmul#nazmul-Lenovo-G50-80:~/2nd_sem/biophysics$ ./a.out
> Please enter an integer:ksdjfjklh
> you are not entering int , Try again:: Success
> Please enter an integer:338479759475637465765
> -1
> you overflowed int variable , Try again: : Numerical result out of
> range
> Please enter an integer:58678946895785
> 1103697833
> nazmul#nazmul-Lenovo-G50-80:~/2nd_sem/biophysics$
Why does it work for the number 338479759475637465765 but not for 58678946895785? The logic I used in my program is that when the value is out of bounds, the int variable gets -1 or some other negative value. I have read many articles, but it is still not quite clear.
strtol converts the value to a long int, whose range may differ from that of int. Furthermore, it returns LONG_MAX or LONG_MIN if the value could be converted but is outside the range of long int. In that case, and only in that case, errno is set to ERANGE. In the case of a matching failure the value returned is 0 and errno is not set, but ep points to the beginning of the string.
int isInt (char *s)
{
char *ep = NULL;
// zero errno first!
errno = 0;
long i = strtol (s, &ep, 10);
if (errno) {
return 0;
}
// matching failure.
if (ep == s) {
return 0;
}
// garbage follows
if (! ((*ep == 0) || (!strcmp(ep,"\n")))) {
return 0;
}
// it is outside the range of `int`
if (i < INT_MIN || i > INT_MAX) {
return 0;
}
return 1;
}
What dbush says about the use of perror is correct, though. strtol sets an error only in case of long overflow, which is not the only possible failing case in your function, so perror could print anything like Is a directory or Multihop attempted.
sscanf(buffer, any_format_without_width, &anytype); is not sufficient to detect overflow.
if the result of the conversion cannot be represented in the object, the behavior is undefined. C11dr §7.21.6.2 10
Do not use *scanf() family to detect overflow. It may work in select cases, but not in general.
Instead use the strto*() functions. Yet even OP's isInt() is mis-coded, as it incorrectly assesses isInt("\n"), isInt(""), and isInt("999..various large values ...999") as good ints.
Alternative:
bool isint_alt(const char *s) {
char *endptr;
errno = 0;
long y = strtol(s, &endptr, 10);
if (s == endptr) {
return false; // No conversion
}
if (errno == ERANGE) {
return false; // Outside long range
}
if (y < INT_MIN || y > INT_MAX) {
return false; // Outside int range
}
// Ignore trailing white space
while (isspace((unsigned char)*endptr)) {
endptr++;
}
if (*endptr) {
return false; // Trailing junk
}
return true;
}
You're getting your types mixed up.
In the isInt function you use strtol, which returns a long, to check the value. Then in your main function you use sscanf with %d, which reads into an int.
On your system, it seems that a long is 64 bits while an int is 32 bits. So strtol fails to fully convert 338479759475637465765 because it is larger than a 64 bit variable can hold. Then you try to convert 58678946895785 which will fit in a 64 bit variable but not a 32 bit variable.
You should instead have sscanf read into a long. Then you can compare the value against INT_MAX:
long num;
...
sscanf(buffer, "%ld", &num);
printf("%ld\n", num);
if ((num > INT_MAX)|| (num < INT_MIN))
{
printf("you overflowed int variable , Try again:\n ");
continue;
}
Also note that it doesn't make sense to call perror here. You only use it right after calling a function which sets errno.
If one must use sscanf() to detect int overflow rather than the robust strtol(), there is a cumbersome way.
Use a wider type and a width limit to prevent overflow when scanning.
bool isint_via_sscanf(const char *s) {
long long y;
int n = 0;
if (sscanf(s, "%18lld %n", &y, &n) != 1) { // Width limit of 18 digits: overflow not possible
return false; // Conversion failed
}
if (y < INT_MIN || y > INT_MAX) {
return false; // Outside int range
}
if (s[n]) {
return false; // Trailing junk
}
return true;
}
It is insufficient on rare platforms where INT_MAX > 1e18.
It also incorrectly returns input like "lots of leading space and/or lot of leading zeros 000123" as invalid.
More complex code using sscanf() can address these short-comings, yet the best approach is strto*().

C Integer Safe Input

How can I get a safe input of integer (especially, positive number) using scanf or gets? I've tried several solutions and each solution had some problems.
1. Using getchar() to remove string inputs
int safeInput() {
int input;
scanf("%d", &input);
while(getchar() != '\n');
return input;
}
This method effectively handles string inputs, however, if strings such as 3a are inputted, the value of input becomes 3, which is not a true exception handle.
2. Retrieving input as a string then converting to integer value.
int safeInput() {
char input[200], safe_input[200];
gets(input);
// I know about the security issue about gets - but it's not the point.
int i = 0;
while (1) {
if (input[i] >= 48 && input[i] <= 57) safe_input[i] = input[i];
else break;
i++;
}
return atoi(safe_input);
}
The problem with this method is that it cannot handle an input string longer than the buffer allocated for input.
3. What if defining a string using pointer?
I considered defining input as a pointer, like char *input;. However, once I executed gets(input) (or scanf("%s", input)), it raised a runtime error.
So what is a proper way to retrieve an integer value from console window using scanf or gets?
The answer depends on what exactly you mean by safe. If you want to catch any possible input error, your only option is to use a function of the strtol() family, which even allows for a range check. In my beginners' guide away from scanf(), I'm describing its use.
Here's the code adapted to what you're attempting here, with comments:
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <limits.h>
// return success as boolean (0, 1), on success write result through *number:
int safeInput(int *number)
{
long a;
char buf[1024]; // use 1KiB just to be sure
if (!fgets(buf, 1024, stdin))
{
// reading input failed:
return 0;
}
// have some input, convert it to integer:
char *endptr;
errno = 0; // reset error number
a = strtol(buf, &endptr, 10);
if (errno == ERANGE)
{
// out of range for a long
return 0;
}
if (endptr == buf)
{
// no character was read
return 0;
}
if (*endptr && *endptr != '\n')
{
// *endptr is neither end of string nor newline,
// so we didn't convert the *whole* input
return 0;
}
if (a > INT_MAX || a < INT_MIN)
{
// result will not fit in an int
return 0;
}
// write result through the pointer passed
*number = (int) a;
return 1;
}
First, if you want safe input, do not use gets. Saying that you know about the issues is not a real excuse when you could use fgets. Next, the trick is to try to read a non-blank character after the int: if you find none, then there is nothing after the int on the line.
int safeInput(int *input) { // returns 1 on success, 0 on a failed read
int c;
char dummy[2]; // never forget the terminating null!
if (scanf("%d%1s", input, dummy) == 1) return 1;
// in case of error, skip anything up to end of line or end of file
while (((c = fgetc(stdin)) != '\n') && (c != EOF));
return 0;
}
The nice point here is that when scanf returns 1, the %1s has eaten anything up to the end of line, including the terminating '\n'. But this has a major drawback: the scanf will only end on end of stream or after reading one additional (non-blank) character. For that reason, Felix Palmen's answer is easier and safer to use.

strtol not changing errno

I'm working on a program that performs calculations given a char array that represents a time in the format HH:MM:SS. It has to parse the individual time units.
Here's a cut down version of my code, just focusing on the hours:
unsigned long parseTime(const char *time)
{
int base = 10; //base 10
long hours = 60; //defaults to something out of range
char localTime[BUFSIZ]; //declares a local array
strncpy(localTime, time, BUFSIZ); //copies parameter array to local
errno = 0; //sets errno to 0
char *par; //pointer
par = strchr(localTime, ':'); //parses to the nearest ':'
localTime[par - localTime] = '\0'; //sets the ':' to null character
hours = strtol(localTime, &par, base); //updates hours to parsed numbers in the char array
printf("errno is: %d\n", errno); //checks errno
errno = 0; //resets errno to 0
par++; //moves pointer past the null character
}
The problem is that if the input is invalid (e.g. aa:13:13), strtol() apparently doesn't detect an error because it's not updating errno to 1, so I can't do error handling. What am I getting wrong?
strtol is not required to produce an error code when no conversion can be performed. Instead you should use the second argument which stores the final position after conversion and compare it to the initial position.
BTW there are numerous other errors in your code that do not affect the problem you're seeing but which should also be fixed, such as incorrect use of strncpy.
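A minimal sketch of that comparison for the hours field. The function name `parse_hours`, the -1 error value, and the 0 to 23 range are assumptions made for illustration; they are not taken from the question.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parses the hours field of "HH:MM:SS".
   Returns the hours, or -1 if the field is missing, malformed or out of range. */
long parse_hours(const char *time)
{
    char *end;

    errno = 0;
    long hours = strtol(time, &end, 10);

    if (end == time)                /* nothing converted, e.g. "aa:13:13" */
        return -1;
    if (errno == ERANGE)            /* does not fit in long */
        return -1;
    if (*end != ':')                /* hours must be followed by ':' */
        return -1;
    if (hours < 0 || hours > 23)    /* assumed valid range */
        return -1;

    return hours;
}
```

For "aa:13:13" the end pointer is left equal to the start of the string, which is what identifies the failure; errno plays no part in that case.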
As others have explained, strtol may not update errno if it cannot perform any conversion. The C Standard only specifies that errno be set to ERANGE when the converted value does not fit in a long integer.
Your code has other issues:
Copying the string with strncpy is incorrect: if the source string has BUFSIZ or more characters, localTime will not be null terminated. Avoid strncpy, a poorly understood function that almost never fits the purpose.
In this case, you do not need to overwrite the : with '\0'; strtol will stop at the first non-digit character. Also, localTime[par - localTime] = '\0'; is a complicated way to write *par = '\0';
A much simpler version is this:
long parseTime(const char *time) {
char *par;
long hours;
if (!isdigit((unsigned char)*time)) {
/* invalid format */
return -1;
}
errno = 0;
hours = strtol(time, &par, 10);
if (errno != 0) {
/* overflow */
return -2;
}
/* you may want to check that hour is within a decent range... */
if (*par != ':') {
/* invalid format */
return -3;
}
par++;
/* now you can parse further fields... */
return hours;
}
I changed the return type to long so you can easily check for invalid format and even determine which error from a negative return value.
For an even simpler alternative, use sscanf:
long parseTime(const char *time) {
unsigned int hours, minutes, seconds;
char c;
if (sscanf(time, "%u:%u:%u%c", &hours, &minutes, &seconds, &c) != 3) {
/* invalid format */
return -1;
}
if (hours > 1000 || minutes > 59 || seconds > 59) {
/* invalid values */
return -2;
}
return hours * 3600L + minutes * 60 + seconds;
}
This approach still accepts incorrect strings such as 1: 1: 1 or 12:00000002:1. Parsing the string by hand seems the most concise and efficient solution.
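Parsing by hand could look like the sketch below. It assumes a strict format of exactly two digits per field and an hour range of 0 to 23, which is narrower than the limit of 1000 used above; the names `two_digits` and `parse_time_strict` are hypothetical.

```c
#include <ctype.h>

/* Reads exactly two decimal digits at p; returns their value or -1. */
static int two_digits(const char *p)
{
    if (!isdigit((unsigned char)p[0]) || !isdigit((unsigned char)p[1]))
        return -1;
    return (p[0] - '0') * 10 + (p[1] - '0');
}

/* Hypothetical strict parser: accepts only "HH:MM:SS" with nothing before
   or after. Returns the number of seconds, or -1 on any format or range error. */
long parse_time_strict(const char *t)
{
    int h = two_digits(t);
    if (h < 0 || t[2] != ':')
        return -1;

    int m = two_digits(t + 3);
    if (m < 0 || t[5] != ':')
        return -1;

    int s = two_digits(t + 6);
    if (s < 0 || t[8] != '\0')
        return -1;

    if (h > 23 || m > 59 || s > 59)
        return -1;

    return h * 3600L + m * 60 + s;
}
```

Each check short-circuits before the next index is read, so the function never looks past the terminating null of a short string. Inputs such as 1: 1: 1 and 12:00000002:1 are rejected.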
A useful trick with sscanf() is that code can do multiple passes to detect errant input:
// HH:MM:SS
int parseTime(const char *hms, unsigned long *secs) {
int n = 0;
// Check for valid text
// Each %*1[...] matches exactly one character from the set
sscanf(hms, "%*1[0-2]%*1[0-9]:%*1[0-5]%*1[0-9]:%*1[0-5]%*1[0-9]%n", &n);
if (n == 0) return -1; // fail
// Scan and convert to integers
unsigned h,m,s;
sscanf(hms, "%u:%u:%u", &h, &m, &s);
// Range checks as needed
if (h >= 24 || m >= 60 || s >= 60) return -1;
*secs = (h*60 + m)*60L + s;
return 0;
}
After the hours = strtol(localTime, &par, base); statement, you should first save the value of errno, because any library function you call afterwards, such as printf(), may also modify errno.
printf("errno is: %d\n", errno);
If other library calls run before errno is inspected, the value you see may no longer reflect the strtol() call. Save errno immediately after the call you care about, because most library functions are allowed to change it.
The correct use is :
hours = strtol(localTime, &par, base);
int saved_error = errno; // Saving the error...
printf("errno is: %d\n", saved_error);
This prints the value that errno held right after strtol() returned. To convert the error number into a meaningful string, use the strerror() function:
printf("Error is: %s\n", strerror(saved_error));

Comparing an input string with a string that has a integer variable in C?

I'm trying to compare an input of characters with a string that can be of the format "!x" where x is any integer.
What's the easiest way to do this? I tried
int result = strcmp(input,"!%d");
which did not work.
Here's one way to do it:
int is_bang_num(const char *s) {
if (*s != '!') {
return 0;
}
size_t n = strspn(s + 1, "0123456789");
return n > 0 && s[1 + n] == '\0';
}
This verifies that the first character is !, that it is followed by more characters, and that all of those following characters are digits.
You see, the scanf() family of functions returns a value indicating how many parameters were converted.
Even books usually ignore this value, which leads programmers to forget that it exists. One consequence is undefined behavior when scanf() fails and the target variable is then read uninitialized: it was not initialized before the call, and since scanf() failed, it was not set by scanf() either.
You can use this value returned by sscanf() to check for success, like this
#include <stdio.h>
int
main(void)
{
const char *string;
int value;
int result;
string = "!12345";
result = sscanf(string, "!%d", &value);
if (result == 1)
fprintf(stderr, "the value was: %d\n", value);
else
fprintf(stderr, "the string did not match the pattern\n");
return 0;
}
As you can see, if one parameter was successfully scanned, the string matched the pattern; otherwise it didn't.
With this approach you also extract the integral value, but you should be careful: the scanf() functions are not meant to replace regular expressions, so this works only in very simple situations.
Since the string must begin with a ! followed by an integer, use a qualified strtol(), which allows a leading sign character. As OP did not specify the range of the integer, let us allow any range.
int is_sc_num(const char *str) {
if (*str != '!') return 0;
str++;
// Ensure no spaces - something strtol() allows.
if (isspace((unsigned char) *str)) return 0;
char *endptr;
// errno = 0;
// By using base 0, input like "0xABC" allowed
strtol(str, &endptr, 0);
// no check for errno as code allows any range
// if (errno == ERANGE) return 0
if (str == endptr) return 0; // no digits
if (*endptr) return 0; // Extra character at the end
return 1;
}
If you want to test that a string matches a format of an exclamation point followed by a series of digits, this regex: "!\d+" will match that. That won't catch the case where the first digit is a zero, which is invalid. This will: "![1-9]\d*".
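Standard C has no regular expression support, but POSIX systems provide <regex.h>. The sketch below assumes a POSIX environment and anchors the pattern so the whole string must match; the function name `matches_bang_num` is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if s is '!' followed by a decimal integer without a leading zero,
   0 if it does not match, -1 if the pattern fails to compile. */
int matches_bang_num(const char *s)
{
    regex_t re;

    /* POSIX extended regular expressions have no \d,
       so the digit classes are spelled out. */
    if (regcomp(&re, "^![1-9][0-9]*$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Compiling the pattern on every call keeps the sketch short; code that matches many strings would normally compile it once and reuse the regex_t.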

return value of strtod() if string equals to zero

As per MSDN:
strtod returns 0 if no conversion can be performed or an underflow occurs.
What if my string equals to zero (i.e., 0.0000)? How can I know if there is no error from the conversion?
OK, I use the following code to verify the idea:
char *Y = "XYZ";
double MyNum;
char *MyEndPtr;
int Err_Conversion = 0;
errno = 0; //reset
MyNum = strtod (Y, &MyEndPtr);
if ( (MyNum == 0) && (errno != 0) && (strcmp(Y, MyEndPtr) == 0) )
{ Err_Conversion = 1; }
I see that MyNum = 0, but I never see the content of Y copied into MyEndPtr, or errno = 0 in this forced error. Any idea?
Use the str_end parameter of the function. For example:
const char* str = "123junk";
char* str_end;
double d = strtod(str, &str_end);
// Here:
// d will be 123
// str_end will point to (str + 3) (the 'j')
// You can tell the string has some junk data if *str_end != '\0'
if (*str_end != '\0') {
printf("Found bad data '%s' at end of string\n", str_end);
}
If conversion totally fails, str will equal str_end:
const char* str = "junk";
char* str_end;
double d = strtod(str, &str_end);
// Here:
// d will be 0 (conversion failed)
// str_end will equal str
if (str == str_end) {
printf("The string doesn't start with a number!\n");
}
You can combine these two methods to make sure the string was (completely) successfully converted (that is, by checking str != str_end && *str_end == '\0')
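A minimal sketch of the combined check, under the assumption that range errors are handled elsewhere. The helper name `parse_double_whole` is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value in *out only if the
   whole string is a number. Range errors are not checked here. */
int parse_double_whole(const char *str, double *out)
{
    char *str_end;
    double d = strtod(str, &str_end);

    if (str == str_end)             /* nothing converted: "junk", "" */
        return 0;
    if (*str_end != '\0')           /* trailing characters: "123junk" */
        return 0;

    *out = d;
    return 1;
}
```

A string such as "0.0000" returns 1 with `*out` equal to zero, which distinguishes a genuine zero from a failed conversion.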
The signature is (give or take restrict keywords):
double strtod(const char *nptr, char **endptr);
If you pass a non-null pointer as the second argument, it will be returned with the value of nptr if it could perform no conversion. If it found a genuine zero in the input string, then the value stored in *endptr won't be nptr.
char *end;
const char *data = "0.00000";
errno = 0;
double d = strtod(data, &end);
if (end != data)
...a conversion was performed...
else
...trouble...
You can also look at errno, but you need to zero it before the call because no function in the standard C library or the POSIX library sets errno to zero.
The standard says:
If the subject sequence is empty or does not have the expected form, no conversion is performed; the value of nptr is stored in the object pointed to by endptr, provided that endptr is not a null pointer.
Returns
The functions return the converted value, if any. If no conversion could be performed, zero is returned. If the correct value overflows and default rounding is in effect (7.12.1), plus or minus HUGE_VAL, HUGE_VALF, or HUGE_VALL is returned (according to the return type and sign of the value), and the value of the macro ERANGE is stored in errno. If the result underflows (7.12.1), the functions return a value whose magnitude is no greater than the smallest normalized positive number in the return type; whether errno acquires the value ERANGE is implementation-defined.
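The cases in the quoted text can be told apart roughly as in the sketch below. The enum and the function name `classify_double` are hypothetical, and underflow is only reported on implementations that choose to set errno for it.

```c
#include <errno.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical classification of the outcome of a strtod() call. */
enum num_status { NUM_OK, NUM_NO_CONVERSION, NUM_OVERFLOW, NUM_UNDERFLOW };

enum num_status classify_double(const char *s, double *out)
{
    char *end;

    errno = 0;                      /* must be cleared by the caller */
    double d = strtod(s, &end);

    if (end == s)
        return NUM_NO_CONVERSION;   /* nptr was stored in *endptr */
    if (errno == ERANGE) {
        if (d == HUGE_VAL || d == -HUGE_VAL)
            return NUM_OVERFLOW;
        return NUM_UNDERFLOW;       /* only where the implementation reports it */
    }

    *out = d;
    return NUM_OK;
}
```

Here "0.0000" yields NUM_OK with a stored value of zero, while "XYZ" yields NUM_NO_CONVERSION, even though strtod() returns 0 in both cases.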

Resources