Using strtol to validate integer input in ANSI C - c

I am new to programming and to C in general and am currently studying it at university. This is for an assignment, so I would like to avoid direct answers; I am more after tips, hints, or pushes in the right direction.
I am trying to use strtol to validate my keyboard input, more specifically, test whether the input is numeric. I have looked over other questions on here and other sites and I have followed instructions given to other users but it hasn't helped me.
From what I have read and understand of strtol (long int strtol(const char *str, char **endptr, int base);), if endptr is not a null pointer, the function sets *endptr to point to the first character after the number.
So if I were to enter 84948ldfk, the end pointer would point to 'l', telling me there are characters other than digits in the input, which would make it invalid.
However in my case, what is happening, is that no matter what I enter, my program is returning an Invalid input. Here is my code:
void run_perf_square(int *option_stats)
{
char input[MAX_NUM_INPUT + EXTRA_SPACES]; /*MAX_NUM_INPUT + EXTRA_SPACES are defined
*in header file. MAX_NUM_INPUT = 7
*and EXTRA_SPACES
*(for '\n' and '\0') = 2. */
char *ptr;
unsigned num=0; /*num is unsigned as it was specified in the start up code for the
*assignment. I am not allow to change it*/
printf("Perfect Square\n");
printf("--------------\n");
printf("Enter a positive integer (1 - 1000000):\n");
if(fgets(input, sizeof input, stdin) != NULL)
{
num=strtol(input, &ptr, 10);
if( num > 1000001)
{
printf("Invalid Input! PLease enter a positive integer between 1
and 1000000\n");
read_rest_of_line(); /*clears buffer to avoid overflows*/
run_perf_square(option_stats);
}
else if (num <= 0)
{
printf("Invalid Input! PLease enter a positive integer between 1
and 1000000\n");
run_perf_square(option_stats);
}
else if(ptr != NULL)
{
printf("Invalid Input! PLease enter a positive integer between 1
and 1000000\n");
run_perf_square(option_stats);
}
else
{
perfect_squares(option_stats, num);
}
}
}
Can anyone help me in the right direction? Obviously the error is with my if(ptr != NULL) condition, but as I understand it seems right. As I said, I have looked at previous questions similar to this and took the advice in the answers but it doesn't seem to work for me. Hence, I thought it best to ask for my help tailored to my own situation.
Thanks in advance!

You're checking the outcome of strtol in the wrong order: check ptr first. Also, don't check ptr against NULL; strtol always sets it, so dereference it and check what it points to. Because fgets keeps the newline, a fully parsed line leaves ptr pointing at '\n' (or at the NUL ('\0') string terminator if no newline was stored).
if (ptr != input && (*ptr == '\n' || *ptr == '\0')) {
// this means at least one digit was parsed and converted to `long`
// and nothing but the line ending follows the number
}
else {
// this means either no characters were parsed correctly in which
// case the return value is completely invalid
// or
// there was a partial parsing of a number to `long` which is returned
// and ptr points to the remaining string
}
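num > 1000001 also needs to be num > 1000000.
num <= 0 is equivalent to num < 1, but because num is unsigned it can never be negative: a negative input wraps around to a large positive value. Store the result of strtol in a long and range-check that before assigning it to num.
You can also, with some reorganising and logic tweaks, collapse your sequence of if statements down to a single invalid branch and an okay branch.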
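For readers who want to see those checks in one place, here is a minimal sketch, not the assignment's required structure. The helper name parse_in_range and its parameters are made up for illustration; it assumes the line came from fgets, so a trailing '\n' is accepted.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value in *out when line
 * holds exactly one decimal integer in [lo, hi], optionally followed by
 * the newline that fgets keeps; returns 0 otherwise. */
int parse_in_range(const char *line, long lo, long hi, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(line, &end, 10);

    if (end == line)                    /* no digits were converted */
        return 0;
    if (*end != '\n' && *end != '\0')   /* leftover text, e.g. "84948ldfk" */
        return 0;
    if (errno == ERANGE)                /* does not fit in a long */
        return 0;
    if (value < lo || value > hi)       /* outside the accepted range */
        return 0;

    *out = value;
    return 1;
}
```

Because the value is kept in a long until every check has passed, a negative input is rejected by the range test instead of wrapping around in an unsigned variable.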

OP would like to avoid direct answers ....
validate integer input
Separate I/O from validation - 2 different functions.
I/O: Assume hostile input. (Text, too much text, too little text. I/O errors.) Do you want to consume leading spaces as part of I/O? Do you want to consume leading 0 as part of I/O? (suggest not)
Validate the string (NULL, lead space OK?, digits after a trailing space, too short, too long, under-range, over-range, Is 123.0 an OK integer)
strtol() is your friend to do the heavy conversion lifting. Check how errno should be set and tested afterward. Use the endptr: should its value be set beforehand? How should it be tested afterward? It consumes leading spaces; is that OK? It converts text to a long, but OP wants the nebulous "integer".
Qapla'

The function strtol returns long int, which is a signed value. I suggest that you use another variable (entry_num), which you could test for <0, thus detecting negative numbers.
I would also suggest that regex could test string input for digits and valid input, or you could use strtok and anything but digits as the delimiter ;-) Or you could scan the input string using validation, something like:
#include <ctype.h> /* isdigit(), isspace() */
int validate_input ( char* input )
{
char *p = input;
if( !input ) return 0;
for( p=input; *p && (isdigit((unsigned char)*p) || isspace((unsigned char)*p)); ++p )
{
}
if( *p ) return 0;
return 1;
}

Related

Checking just integers and no strings using fgets only

I'm trying to read an integer from standard input without scanf(), using just fgets(). How can I filter the fgets() contents and report an error if I enter a character or a string? The problem is that when I enter something like a character or a string, the atoi() function (essential to some operations in my algorithm) converts that string to 0, whereas I'd prefer to exit if the value entered is not an integer.
Here's a code part:
.....
char pos[30];
printf("\n Insert a number: ");
fgets (pos, sizeof(pos), stdin);
if (atoi(pos) < 0) //missing check for string character
exit(1);
else{
printf ("%d\n", atoi(pos)); //a string or character converted through atoi() gives 0
}
int number = atoi(pos);
......
As commenters have said, use strtol() not atoi().
The catch with strtol() is that, per the specification, it only reports an ERANGE error when the converted number does not fit in a long. So if you ask it to convert " 1" it gives 1. If you ask it to convert "apple", it returns 0 and sets endptr equal to the start of the string to indicate that nothing was converted.
Obviously you need to decide if " 12" is going to be acceptable input or not — strtol() will happily skip the leading white space.
EDIT: Function updated to better handle errors via the endptr.
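#include <errno.h>  // errno, ERANGE
#include <stdlib.h> // strtol()
// Convert the given <text> string to a decimal long, in <value>
// Allow a string of digits, or white space then digits;
// anything after the digits (such as the newline from fgets) is ignored
// returns 1 for OK, or 0 otherwise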
int parseLong( const char *text, long *value )
{
int rc = 0; // fail
char *endptr; // used to determine failure
if ( text && value )
{
errno = 0; // Clear any errors
*value = strtol( text, &endptr, 10 ); // Do the conversion
// Check that conversion was performed, and
// that the value fits in a long
if ( endptr != text && errno != ERANGE )
{
rc = 1; // success
}
}
return rc;
}
First, keep in mind that characters are not necessarily alphabetic characters; be precise.
I think what you're looking for is an "is integer" function.
The standard header <ctype.h> declares functions called isalpha and isdigit.
https://www.programiz.com/c-programming/library-function/ctype.h/isalpha
So you could make a function that verifies if a char * contains only numeric characters.
#include <ctype.h>
// returns 0 if str contains only decimal digits, -1 otherwise
int str_is_only_numeric(const char *str) {
int i = 0;
while (str[i] != '\0') {
if (isdigit((unsigned char)str[i++]) == 0) {
return -1;
}
}
return 0;
}
Here's a working example of the function: https://onlinegdb.com/SJBdLdy78
I solved it on my own by using strcspn() to strip the trailing newline before checking the input with isdigit(); without strcspn(), the function always returned -1 because of the '\n' that fgets() keeps.
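A minimal sketch of that combination, for illustration only: the helper name line_is_all_digits is made up, and unlike the function above it returns 1 for success and 0 for failure. It also treats an empty line as invalid, which is an assumption about what the program wants.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: strips the newline left by fgets, then returns 1 if
 * the remaining text is non-empty and made only of decimal digits, else 0. */
int line_is_all_digits(char *line)
{
    size_t i;

    line[strcspn(line, "\n")] = '\0';   /* cut at the first newline, if any */

    if (line[0] == '\0')
        return 0;                       /* empty input is not a number */

    for (i = 0; line[i] != '\0'; i++) {
        if (!isdigit((unsigned char)line[i]))
            return 0;
    }
    return 1;
}
```

After a successful check the buffer no longer contains the newline, so it can be passed on to strtol() or atoi() as it is. Note that a sign character such as '-' is rejected by this test.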

Store hex input into int variable without using scanf() function in C

Pre-History:
I had the issue that a call to getchar() appeared to be skipped: the program did not wait for any input and just continued processing.
I searched the internet for what could cause this and found that if scanf() is called before getchar(), getchar() does not behave as expected and acts exactly as in my case.
Citation:
I will bet you ONE HUNDRED DOLLARS you only see this problem when the call to getchar() is preceded by a scanf().
Don't use scanf for interactive programs. There are two main reasons for this:
1) scanf can't recover from malformed input. You have to get the format string right, every time, or else it just throws away whatever input it couldn't match and returns a value indicating failure. This might be fine if you're parsing a fixed-format file when poor formatting is unrecoverable anyway, but it's the exact opposite of what you want to do with user input. Use fgets() and sscanf(), fgets() and strtok(), or write your own user input routines using getchar() and putchar().
1.5) Even properly used, scanf inevitably discards input (whitespace) that can sometimes be important.
2) scanf has a nasty habit of leaving newlines in the input stream. This is fine if you never use anything but scanf, since scanf will usually skip over any whitespace characters in its eagerness to find whatever it's expecting next. But if you mix scanf with fgets/getchar, it quickly becomes a total mess trying to figure out what might or might not be left hanging out in the input stream. Especially if you do any looping -- it's quite common for the input stream to be different on the first iteration, which results in a potentially weird bug and even weirder attempts to fix it.
tl;dr -- scanf is for formatted input. User input is not formatted. //
Here is the link, to that thread: https://bbs.archlinux.org/viewtopic.php?id=161294
scanf() with:
scanf("%x", &integer_variable);
seems to me, as a newbie, to be the only possible way to input a hex number from the keyboard (or, better said, from stdin) and store it in an int variable.
Is there a different way to read a hex value from stdin and store it in an integer variable?
Bonus challenge: It would also be nice if I could write negative values (through negative hex input, of course) into a signed int variable.
INFO: I have read many threads about C here on Stack Overflow about similar problems, but none of them answers my explicit question well, so I've posted this question.
I work under Linux Ubuntu.
The quote about the hundred dollar bet is accurate. Mixing scanf with getchar is almost always a bad idea; it almost always leads to trouble. It's not that they can't be used together, though. It's possible to use them together -- but usually, it's just way too difficult. There are too many fussy little details and "gotcha!"s to keep track of. It's more trouble than it's worth.
At first you had said
scanf() with ... %d ... seems for me as a newbie to the scene as the only way possible to input a hex number from the keyboard
There was some side confusion there, because of course %d is for decimal input. But since I'd written this answer by the time you corrected that, let's proceed with decimal for the moment.
(Also for the moment I'm leaving out error checking -- that is, these code fragments don't check for or do anything graceful if the user doesn't type the requested number.) Anyway, here are several ways of reading an integer:
scanf("%d", &integer_variable);
You're right, this is the (superficially) easiest way.
char buf[100];
fgets(buf, sizeof(buf), stdin);
integer_variable = atoi(buf);
This is, I think, the easiest way that doesn't use scanf. But most people these days frown on using atoi, because it doesn't do much useful error checking.
char buf[100];
fgets(buf, sizeof(buf), stdin);
integer_variable = strtol(buf, NULL, 10);
This is almost the same as before, but avoids atoi in favor of the preferred strtol.
char buf[100];
fgets(buf, sizeof(buf), stdin);
sscanf(buf, "%d", &integer_variable);
This reads a line and then uses sscanf to parse it, another popular and general technique.
All of these will work; all of these will handle negative numbers. It's important to think about error conditions, though -- I'll have more to say about that later.
If you want to input hexadecimal numbers, the techniques are similar:
scanf("%x", &integer_variable);
char buf[100];
fgets(buf, sizeof(buf), stdin);
integer_variable = strtol(buf, NULL, 16);
char buf[100];
fgets(buf, sizeof(buf), stdin);
sscanf(buf, "%x", &integer_variable);
These should all work, too. I wouldn't necessarily expect them to handle "negative hexadecimal", though, because that's an unusual requirement. Most of the time, hexadecimal notation is used for unsigned integers. (In fact, strictly speaking, %x with scanf and sscanf must be used with an integer_variable that has been declared as unsigned int, not plain int.)
Sometimes it's useful or necessary to do this sort of thing "by hand". Here's a code fragment that reads exactly two hexadecimal digits. I'll start out with the version using getchar:
int c1 = getchar();
if(c1 != EOF && isascii(c1) && isxdigit(c1)) {
int c2 = getchar();
if(c2 != EOF && isascii(c2) && isxdigit(c2)) {
if(isdigit(c1)) integer_variable = c1 - '0';
else if(isupper(c1)) integer_variable = 10 + c1 - 'A';
else if(islower(c1)) integer_variable = 10 + c1 - 'a';
integer_variable = integer_variable * 16;
if(isdigit(c2)) integer_variable += c2 - '0';
else if(isupper(c2)) integer_variable += 10 + c2 - 'A';
else if(islower(c2)) integer_variable += 10 + c2 - 'a';
}
}
As you can see, it's a bit of a jawbreaker. Me, although I almost never use members of the scanf family, this is one place where I sometimes do, precisely because doing it "by hand" is so much work. You can simplify it considerably by using an auxiliary function or macro to do the digit conversion:
int c1 = getchar();
if(c1 != EOF && isascii(c1) && isxdigit(c1)) {
int c2 = getchar();
if(c2 != EOF && isascii(c2) && isxdigit(c2)) {
integer_variable = Xctod(c1);
integer_variable = integer_variable * 16;
integer_variable += Xctod(c2);
}
}
Or you could collapse those inner expressions down to just
integer_variable = 16 * Xctod(c1) + Xctod(c2);
These work in terms of an auxiliary function:
int Xctod(int c)
{
if(!isascii(c)) return 0;
else if(isdigit(c)) return c - '0';
else if(isupper(c)) return 10 + c - 'A';
else if(islower(c)) return 10 + c - 'a';
else return 0;
}
Or perhaps a macro (though this is definitely an old-school sort of thing):
#define Xctod(c) (isdigit(c) ? (c) - '0' : (c) - (isupper(c) ? 'A' : 'a') + 10)
Often I'm parsing hexadecimal digits like this not from stdin using getchar(), but from a string. Often I'm using a character pointer (char *p) to step through the string, meaning that I end up with code more like this:
char c1 = *p++;
if(isascii(c1) && isxdigit(c1)) {
char c2 = *p++;
if(isascii(c2) && isxdigit(c2))
integer_variable = 16 * Xctod(c1) + Xctod(c2);
}
It's tempting to omit the temporary variables and the error checking and boil this down still further:
integer_variable = 16 * Xctod(*p++) + Xctod(*p++);
But don't do this! Besides the lack of error checking, this expression has undefined behavior, because p is modified twice with no sequence point in between, and it definitely won't always do what you want, because there's no longer any guarantee about what order you read the characters in. If you know p points at the first of two hex digits, you don't want to collapse it any further than
integer_variable = Xctod(*p++);
integer_variable = 16 * integer_variable + Xctod(*p++);
and even then, this will work only with the function version of Xctod, not the macro, since the macro evaluates its argument multiple times.
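One way to sidestep the ordering question entirely is to index the two characters instead of incrementing the pointer inside the expression. This is a sketch under assumptions: parse_hex_byte and xdigit_value are made-up names, and xdigit_value is a function (like the function version of Xctod) that is only ever called on characters already checked with isxdigit.

```c
#include <ctype.h>

/* A function rather than a macro, so its argument is evaluated only once.
 * Only call it with a character for which isxdigit() is true. */
static int xdigit_value(int c)
{
    if (isdigit(c)) return c - '0';
    if (isupper(c)) return 10 + c - 'A';
    return 10 + c - 'a';
}

/* Hypothetical helper: reads exactly two hex digits starting at *pp.
 * On success stores 0..255 in *out, advances *pp by two and returns 1.
 * On failure returns 0 and leaves *pp unchanged. */
int parse_hex_byte(const char **pp, int *out)
{
    const unsigned char *p = (const unsigned char *)*pp;

    /* p[1] is only examined when p[0] is a hex digit, so the string
     * terminator is never stepped over. */
    if (!isxdigit(p[0]) || !isxdigit(p[1]))
        return 0;

    *out = 16 * xdigit_value(p[0]) + xdigit_value(p[1]);
    *pp += 2;
    return 1;
}
```

The pointer is advanced in a separate statement after both digits have been read, so there is only one modification of it per call.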
Finally, let's talk about error handling. There are quite a few possibilities to worry about:
1. The user hits Return without typing anything.
2. The user types whitespace before or after the number.
3. The user types extra garbage after the number.
4. The user types non-numeric input instead of a number.
5. The code hits end-of-file; there are no characters to read at all.
And then how you handle these depends on what input techniques you're using. Here are the basic rules:
A. If you're calling scanf, fscanf, or sscanf, always check the return value. If it's not 1 (or, in the case where you had multiple % specifiers, it's not the number of values you expected to read), it means something went wrong. This will generally catch problems 4 and 5, and will handle case 2 gracefully. But it will often quietly ignore problems 1 and 3. (In particular, scanf and fscanf treat an extra \n just like leading whitespace.)
B. If you're calling fgets, again, always check the return value. You'll get NULL on EOF (problem 5). Handling the other problems depends on what you do with the line you read.
C. If you're calling atoi, it will deal gracefully with problem 2, but it will ignore problem 3, and it will quietly turn problem 4 into the number 0 (which is why atoi is usually not recommended any more).
D. If you're calling strtol or any of the other "strto" functions, they will deal gracefully with problem 2, and if you let them give you back an "end pointer", you can check for and deal with problems 3 and 4. (Note that I left the end-pointer handling out of my two strtol examples above.)
E. Finally, if you're doing something down-and-dirty like my "hardway" two-digit hex converter, you generally have to take care of all these problems, explicitly, yourself. If you want to skip leading whitespace you have to do so (the isspace function from <ctype.h> can help), and if there might be unexpected non-digit characters, you have to check for those, too. (That's what the calls to isascii and isxdigit are doing in my "hardway" two-digit hex converter.)
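Putting rules B and D together, a hexadecimal reader that deals with all five problems might look like the following. This is a sketch, not the only way to do it: the names parse_hex_line and read_hex_int and their return codes are invented for illustration, and rejecting out-of-range values is an extra assumption.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parses one line as a hexadecimal int.
 * Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_hex_line(const char *line, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(line, &end, 16);      /* skips leading whitespace (problem 2) */

    if (end == line)
        return 0;                        /* problems 1 and 4: nothing converted */

    while (isspace((unsigned char)*end)) /* trailing whitespace, including '\n' */
        end++;
    if (*end != '\0')
        return 0;                        /* problem 3: garbage after the number */

    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                        /* does not fit in an int */

    *out = (int)value;
    return 1;
}

/* Hypothetical wrapper: returns -1 on end-of-file or read error (problem 5),
 * otherwise whatever parse_hex_line returns. */
int read_hex_int(FILE *in, int *out)
{
    char buf[100];

    if (fgets(buf, sizeof buf, in) == NULL)
        return -1;
    return parse_hex_line(buf, out);
}
```

Since strtol accepts an optional sign and, for base 16, an optional 0x prefix, input such as -ff or 0x1A is handled without extra code, which also covers the "negative hexadecimal" request.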
Per the scanf man page, you can use scanf to read a hex number from stdin into an (unsigned) integer variable.
unsigned int v ;
if ( scanf("%x", &v) == 1 ) {
// do something with v.
}
As per man page, %x is always unsigned. If you want to support negative values, you will have to add explicit logic.
As mentioned in the link you posted, using fgets and sscanf is the best way to handle this. fgets will read a full line of text and sscanf will parse the line.
For example
char line[100];
fgets(line, sizeof(line), stdin);
unsigned int x; // %x expects a pointer to unsigned int
int rval = sscanf(line, "%x", &x);
if (rval == 1) {
printf("read value %x\n", x);
} else {
printf("please enter a hexadecimal integer\n");
}
Since you're only reading in a single integer, you could also use strtol instead of sscanf. This also has the advantage of detecting if any additional characters were entered:
char *ptr;
errno = 0; // requires <errno.h>
long x = strtol(line, &ptr, 16);
if (errno) {
perror("parsing failed");
} else if (ptr == line) {
printf("no hexadecimal digits entered\n");
} else if (*ptr != '\n' && *ptr != 0) {
printf("extra characters entered: %s\n", ptr);
} else {
printf("read value %lx\n", x);
}

How to check if input is numeric(float) or it is some character?

I was asked to write a program to find sum of two inputs in my college so I should first check whether the input is valid.
For example, if I input 2534.11s35 the program should detect that it is not a valid input for this program because of s in the input.
to check input is numeric(float)
1) Take input as a string in char buf[Big_Enough]. I'd expect 160 digits will handle all but the most arcane "float" strings [1].
#define N 160
char buf[N];
if (fgets(buf, sizeof buf, stdin)) {
2) Apply float strtof() for float, (strtod() for double, strtold() for long double).
char *endptr;
errno = 0;
float d = strtof(buf, &endptr);
// endptr now points to the end of the conversion, if any.
3) Check results.
if (buf == endptr) return "No_Conversion";
// Recommend to tolerate trailing white-space.
// as leading white-spaces are already allowed by `strtof()`
while (isspace((unsigned char)*endptr)) {
endptr++;
}
if (*endptr) return "TrailingJunkFound";
return "Success";
4) Tests for extremes, if desired.
At this point, the input is numeric. The question remains whether the "finite string" can be well represented by a finite float: whether the |result| is 0 or in the range [FLT_TRUE_MIN...FLT_MAX].
This involves looking at errno.
The conversion "succeeds", yet finite string values outside the float range become HUGE_VALF, which may be infinity or FLT_MAX.
Wee |values| close to 0.0, but not 0.0, become something in the range [0.0 ... FLT_MIN].
Since the goal is to detect if a conversion succeeded (it did), I'll leave these details for a question that wants to get into the gory bits of what value.
An alternative is to use fscanf() to directly read and convert, yet the error handling there has its troubles too and hard to portably control.
[1] Typical float range is +/- 10^38. So allowing for 40 or so characters makes sense. An exact print of FLT_TRUE_MIN can take ~150 characters. To distinguish an arbitrary "float" string near FLT_TRUE_MIN from the next larger value needs about that many digits.
If "float" strings are not arbitrary, but only come from the output of a printed float, then far fewer digits are needed - about 40.
Of course it is wise to allow for extra leading/trailing spaces and zeros.
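Steps 2 to 4 can be collected into a single function. What follows is a sketch: the enum and the name parse_float are made up for illustration, and treating every ERANGE as an error is an assumption (whether errno is set on underflow is implementation-defined).

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result codes, named for this sketch only. */
enum float_status {
    FLOAT_OK,
    FLOAT_NO_CONVERSION,
    FLOAT_TRAILING_JUNK,
    FLOAT_OUT_OF_RANGE
};

/* Parses buf as a float. On FLOAT_OK the value is stored in *out. */
enum float_status parse_float(const char *buf, float *out)
{
    char *endptr;
    float value;

    errno = 0;
    value = strtof(buf, &endptr);           /* leading white-space is skipped */

    if (endptr == buf)
        return FLOAT_NO_CONVERSION;         /* nothing numeric at the start */

    while (isspace((unsigned char)*endptr)) /* tolerate trailing white-space */
        endptr++;
    if (*endptr != '\0')
        return FLOAT_TRAILING_JUNK;         /* e.g. the 's' in "2534.11s35" */

    if (errno == ERANGE)
        return FLOAT_OUT_OF_RANGE;          /* too large, or possibly too tiny */

    *out = value;
    return FLOAT_OK;
}
```

Keep in mind that strtof also accepts forms such as exponents, hexadecimal floats, "inf" and "nan", so a program that wants plain decimal input only needs an additional check of its own.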
You need to take the input as a string and then, make use of strtod() to parse the input.
Regarding the return values, from the man page:
double strtod(const char *nptr, char **endptr);
These functions return the converted value, if any.
If endptr is not NULL, a pointer to the character after the last character used in the conversion is stored in the location referenced by endptr.
If no conversion is performed, zero is returned and the value of nptr is stored in the location referenced by endptr.
Getting to the detection of errors, a couple of points:
Ensure errno is set to 0 before the call and is still 0 after the call.
The return value is not +/-HUGE_VAL.
The pointer stored in *endptr is not equal to nptr (if it is, no conversion has been performed), and the character it points to is the terminating null (or the trailing newline, if the input came from fgets).
The above checks, combined together, will ensure a successful conversion.
In your case, the last point is essential: if there is an invalid character in the input, *endptr will not point to the terminating null; instead it will hold the address of that (first) invalid character in the input.
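#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void){
char num1[15];
float number1;
int dot_check1=0,check=0,i;
printf("enter the numbers :\n");
if(fgets(num1, sizeof num1, stdin)==NULL) /* gets() is unsafe and was removed in C11 */
return 1;
num1[strcspn(num1, "\n")]='\0'; /* drop the newline that fgets keeps */
i=0;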
while(num1[i]){
if(num1[i]>'/' && num1[i]<':')
;
else { if(dot_check1==0){
if(num1[i]=='.')
dot_check1=1;
else {
check=1;
break;
}
}
else {
check=1;
break;
}
}
i++;
}
if(check){
printf("please check the number you have entered");
}
else{
number1=atof(num1);
printf("you entered number is %f",number1);
}
}
Here is untested code to check whether a string meets the requested specification.
#include <ctype.h>
/* IsFloatNumeral returns true (1) if the string pointed to by p contains a
valid numeral and false (0) otherwise. A valid numeral:
Starts with optional white space.
Has an optional hyphen as a minus sign.
Contains either digits, a period followed by digits, or both.
Ends with optional white space.
Notes:
It is unusual not to accept "3." for a float literal, but this was
specified in a comment, so the code here is written for that.
The question does not state that leading or trailing white space
should be accepted (and ignored), but that is included here. To
exclude such white space, simply delete the relevant lines.
*/
_Bool IsFloatNumeral(const char *p)
{
_Bool ThereAreInitialDigits = 0;
_Bool ThereIsAPeriod = 0;
// Skip initial spaces. (Not specified in question; removed if undesired.)
while (isspace(*p))
++p;
// Allow an initial hyphen as a minus sign.
if (*p == '-')
++p;
// Allow initial digits.
if (isdigit(*p))
{
ThereAreInitialDigits = 1;
do
++p;
while (isdigit(*p));
}
// Allow a period followed by digits. Require at least one digit to follow the period.
if (*p == '.')
{
++p;
if (!isdigit(*p))
return 0;
ThereIsAPeriod = 1;
do
++p;
while (isdigit(*p));
}
/* If we did not see either digits or a period followed by digits,
reject the string (return 0).
*/
if (!ThereAreInitialDigits && !ThereIsAPeriod)
return 0;
// Skip trailing spaces. (Not specified in question; removed if undesired.)
while (isspace(*p))
++p;
/* If we are now at the end of the string (the null terminating
character), accept the string (return 1). Otherwise, reject it (return
0).
*/
return *p == 0;
}

C - scanf() vs gets() vs fgets()

I've been doing a fairly easy program of converting a string of Characters (assuming numbers are entered) to an Integer.
After I was done, I noticed some very peculiar "bugs" that I can't answer, mostly because of my limited knowledge of how the scanf(), gets() and fgets() functions work. (I did read a lot of literature though.)
So without writing too much text, here's the code of the program:
#include <stdio.h>
#define MAX 100
int CharToInt(const char *);
int main()
{
char str[MAX];
printf(" Enter some numbers (no spaces): ");
gets(str);
// fgets(str, sizeof(str), stdin);
// scanf("%s", str);
printf(" Entered number is: %d\n", CharToInt(str));
return 0;
}
int CharToInt(const char *s)
{
int i, result, temp;
result = 0;
i = 0;
while(*(s+i) != '\0')
{
temp = *(s+i) & 15;
result = (temp + result) * 10;
i++;
}
return result / 10;
}
So here's the problem I've been having. First, when using gets() function, the program works perfectly.
Second, when using fgets(), the result is slightly wrong because apparently fgets() function reads newline (ASCII value 10) character last which screws up the result.
Third, when using scanf() function, the result is completely wrong because first character apparently has a -52 ASCII value. For this, I have no explanation.
Now I know that gets() is discouraged to use, so I would like to know if I can use fgets() here so it doesn't read (or ignores) newline character.
Also, what's the deal with the scanf() function in this program?
Never use gets. It offers no protections against a buffer overflow vulnerability (that is, you cannot tell it how big the buffer you pass to it is, so it cannot prevent a user from entering a line larger than the buffer and clobbering memory).
Avoid using scanf. If not used carefully, it can have the same buffer overflow problems as gets. Even ignoring that, it has other problems that make it hard to use correctly.
Generally you should use fgets instead, although it's sometimes inconvenient (you have to strip the newline, you must determine a buffer size ahead of time, and then you must figure out what to do with lines that are too long–do you keep the part you read and discard the excess, discard the whole thing, dynamically grow the buffer and try again, etc.). There are some non-standard functions available that do this dynamic allocation for you (e.g. getline on POSIX systems, Chuck Falconer's public domain ggets function). Note that ggets has gets-like semantics in that it strips a trailing newline for you.
Yes, you want to avoid gets. fgets will always read the new-line if the buffer was big enough to hold it (which lets you know when the buffer was too small and there's more of the line waiting to be read). If you want something like fgets that won't read the new-line (losing that indication of a too-small buffer) you can use fscanf with a scan-set conversion like: "%N[^\n]", where the 'N' is replaced by the buffer size - 1.
One easy (if strange) way to remove the trailing new-line from a buffer after reading with fgets is: strtok(buffer, "\n"); This isn't how strtok is intended to be used, but I've used it this way more often than in the intended fashion (which I generally avoid).
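One caveat with the strtok trick: if the buffer holds only "\n" (the user just pressed Enter), strtok finds no token and leaves the newline in place. As an alternative, here is a sketch of an fgets wrapper that strips the newline with strcspn and also deals with over-long lines by discarding the excess. The name read_line and its return codes are assumptions made for this example; discarding is just one of the policies mentioned above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. Reads one line into buf (size bytes) without the
 * newline. Returns 1 for a complete line, 0 if the line was too long (the
 * excess is read and discarded), -1 on end-of-file or error with nothing
 * read. */
int read_line(FILE *in, char *buf, size_t size)
{
    size_t len;
    int c;

    if (fgets(buf, (int)size, in) == NULL)
        return -1;

    len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';        /* complete line: drop the newline */
        return 1;
    }

    /* No newline stored: the line filled the buffer or input just ended. */
    c = getc(in);
    if (c == EOF || c == '\n')
        return 1;               /* the whole line fit after all */

    while (c != '\n' && c != EOF)
        c = getc(in);           /* throw away the rest of the long line */
    return 0;
}
```

A caller can treat a return of 0 as invalid input and prompt again, knowing that the leftover characters will not be picked up by the next read.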
There are numerous problems with this code. We'll fix the badly named variables and functions and investigate the problems:
First, CharToInt() should be renamed to the proper StringToInt() since it operates on a string, not a single character.
The function CharToInt() [sic.] is unsafe. It doesn't check if the user accidentally passes in a NULL pointer.
It doesn't validate input, or more correctly, skip invalid input. If the user enters in a non-digit the result will contain a bogus value. i.e. If you enter in N the code *(s+i) & 15 will produce 14 !?
Next, the nondescript temp in CharToInt() [sic.] should be called digit since that is what it really is.
Also, the kludge return result / 10; is just that -- a bad hack to work around a buggy implementation.
Likewise MAX is badly named since it may appear to conflict with the common usage, i.e. #define MAX(x,y) (((x)>(y))?(x):(y))
The verbose *(s+i) is not as readable as simply *s. There is no need to use and clutter up the code with yet another temporary index i.
gets()
This is bad because it can overflow the input string buffer. For example, if the buffer size is 2, and you enter in 16 characters, you will overflow str.
scanf()
This is equally bad because, with a bare %s and no field width, it can overflow the input string buffer.
You mention "when using scanf() function, the result is completely wrong because first character apparently has a -52 ASCII value."
That is due to an incorrect usage of scanf(). I was not able to duplicate this bug.
fgets()
This is safe because you can guarantee you never overflow the input string buffer by passing in the buffer size (which includes room for the NUL terminator.)
getline()
A few people have suggested the C POSIX standard getline() as a replacement. Unfortunately this is not a practical portable solution as Microsoft does not implement a C version; only the standard C++ string template function as this SO #27755191 question answers. Microsoft's C++ getline() was available at least far back as Visual Studio 6 but since the OP is strictly asking about C and not C++ this isn't an option.
Misc.
Lastly, this implementation is buggy in that it doesn't detect integer overflow. If the user enters too large a number the number may become negative! i.e. 9876543210 will become -18815698?! Let's fix that too.
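This is trivial to fix for an unsigned int. If the current partial number is less than the previous partial number then we have overflowed and we return the previous partial number. Note that this comparison catches only some overflows: the wrapped product can still be larger than the previous value (for example 5000000000 with a 32-bit unsigned int), so comparing against UINT_MAX before multiplying is more robust.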
For a signed int this is a little more work. In assembly we could inspect the carry-flag, but in C there is no standard built-in way to detect overflow with signed int math. Fortunately, since we are multiplying by a constant, * 10, we can easily detect this if we use an equivalent equation:
n = x*10 = x*8 + x*2
If x*8 overflows then logically x*10 will as well. For a 32-bit int overflow will happen when x*8 = 0x100000000 thus all we need to do is detect when x >= 0x20000000. Since we don't want to assume how many bits an int has we only need to test if the top 3 msb's (Most Significant Bits) are set.
Additionally, a second overflow test is needed. If the msb is set (sign bit) after the digit concatenation then we also know the number overflowed.
Code
Here is a fixed safe version along with code that you can play with to detect overflow in the unsafe versions. I've also included both a signed and unsigned versions via #define SIGNED 1
#include <stdio.h>
#include <ctype.h> // isdigit()
// 1 fgets
// 2 gets
// 3 scanf
#define INPUT 1
#define SIGNED 1
// re-implementation of atoi()
// Test Case: 2147483647 -- valid 32-bit
// Test Case: 2147483648 -- overflow 32-bit
int StringToInt( const char * s )
{
int result = 0, prev, msb = (sizeof(int)*8)-1, overflow;
if( !s )
return result;
while( *s )
{
if( isdigit( *s ) ) // Alt.: if ((*s >= '0') && (*s <= '9'))
{
prev = result;
overflow = result >> (msb-2); // test if top 3 MSBs will overflow on x*8
result *= 10;
result += *s++ & 0xF;// OPTIMIZATION: *s - '0'
if( (result < prev) || overflow ) // check if would overflow
return prev;
}
else
break; // you decide SKIP or BREAK on invalid digits
}
return result;
}
// Test case: 4294967295 -- valid 32-bit
// Test case: 4294967296 -- overflow 32-bit
unsigned int StringToUnsignedInt( const char * s )
{
unsigned int result = 0, prev;
if( !s )
return result;
while( *s )
{
if( isdigit( *s ) ) // Alt.: if (*s >= '0' && *s <= '9')
{
prev = result;
result *= 10;
result += *s++ & 0xF; // OPTIMIZATION: += (*s - '0')
if( result < prev ) // check if would overflow
return prev;
}
else
break; // you decide SKIP or BREAK on invalid digits
}
return result;
}
int main()
{
int detect_buffer_overrun = 0;
#define BUFFER_SIZE 2 // set to small size to easily test overflow
char str[ BUFFER_SIZE+1 ]; // C idiom is to reserve space for the NUL terminator
printf(" Enter some numbers (no spaces): ");
#if INPUT == 1
fgets(str, sizeof(str), stdin);
#elif INPUT == 2
gets(str); // can overflow
#elif INPUT == 3
scanf("%s", str); // can also overflow
#endif
#if SIGNED
printf(" Entered number is: %d\n", StringToInt(str));
#else
printf(" Entered number is: %u\n", StringToUnsignedInt(str) );
#endif
if( detect_buffer_overrun )
printf( "Input buffer overflow!\n" );
return 0;
}
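One caveat with the signed version above: it multiplies first and checks afterwards, and signed overflow is undefined behavior in C, so the result < prev test is not guaranteed to fire. Below is a minimal sketch of checking before the multiplication instead, using INT_MAX from <limits.h>. The name parse_int_checked and its return convention are made up for illustration and are not part of the code above.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not from the answer above): parse the leading
 * decimal digits of s into *out.
 * Returns 1 on success, 0 if s is NULL, does not start with a digit,
 * or the value would exceed INT_MAX. */
int parse_int_checked(const char *s, int *out)
{
    int result = 0;

    if (s == NULL || !isdigit((unsigned char)*s))
        return 0;

    while (isdigit((unsigned char)*s)) {
        int digit = *s++ - '0';

        /* Test BEFORE multiplying: result * 10 + digit must fit in an int. */
        if (result > (INT_MAX - digit) / 10)
            return 0;

        result = result * 10 + digit;
    }

    *out = result;
    return 1;
}
```

Like the original, it stops at the first non-digit; whether trailing characters should count as an error is up to you.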
You're correct that you should never use gets. If you want to use fgets, you can simply overwrite the newline.
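#include <string.h> // strlen()
char *result = fgets(str, sizeof(str), stdin);
size_t len = (result != NULL) ? strlen(str) : 0; // don't inspect str if fgets failed
if(len > 0 && str[len - 1] == '\n')
{
str[len - 1] = '\0';
}
else
{
// handle error: EOF, read failure, or a line too long for the buffer
}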
This does assume there are no embedded NUL characters. Another option is POSIX getline:
char *line = NULL;
size_t len = 0;
ssize_t count = getline(&line, &len, stdin);
if(count >= 1 && line[count - 1] == '\n')
{
line[count - 1] = '\0';
}
else
{
// Handle error
}
The advantage of getline is that it does allocation and reallocation for you, it handles possible embedded NUL characters, and it returns the count so you don't have to waste time with strlen. Note that you can't use an array with getline: the pointer must be NULL or point to memory obtained from malloc, and you should free it when you're done.
I'm not sure what issue you're having with scanf.
Never use gets(); it can lead to unpredictable overflows. If your string array is of size 1000 and I enter 1001 characters, I can overflow the buffer in your program.
Try using fgets() with this modified version of your CharToInt():
int CharToInt(const char *s)
{
int i, result, temp;
result = 0;
i = 0;
while(*(s+i) != '\0')
{
if (isdigit(*(s+i)))
{
temp = *(s+i) & 15; /* low 4 bits of an ASCII digit are its value */
result = result * 10 + temp; /* shift previous digits left, then add the new one */
}
i++;
}
return result;
}
It essentially validates the input digits and ignores anything else. This is very crude so modify it and salt to taste.
I am not much of a programmer, but let me try to answer your question about scanf(). I think scanf() is fine and I use it for almost everything without issues, but your call is not structured quite correctly. It should be:
char str[MAX];
printf("Enter some text: ");
scanf("%s", str);
Note that no "&" is needed in front of an array name: str already converts to a pointer to its first element, which tells scanf() where to store the scanned text. The "&" is needed for non-array variables such as an int.
Do not use fflush(stdin) to clear leftover input; its behavior is undefined in standard C. To discard the rest of a line, read and drop characters until you reach '\n' or EOF.
As for the difference between the functions: scanf("%s") stops at the first whitespace character, while gets() and fgets() read a whole line. Only fgets() takes a buffer size, so a plain "%s" and gets() can both overflow the buffer; give scanf() a field width (for example "%99s" for a 100-byte array) if you use it.
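A recurring problem in these answers is leftover input when the user types more than the buffer holds. Here is a rough sketch of one way to deal with it using only standard C: read with fgets(), strip the newline, and if no newline arrived, consume the rest of the line. The name read_line and its return codes are my own invention for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from fp into buf (capacity size).
 * Returns  1 if a complete line was read (newline removed),
 *          0 on EOF or read error,
 *         -1 if the line was too long; the excess is consumed so it
 *            cannot leak into the next read. */
int read_line(FILE *fp, char *buf, size_t size)
{
    size_t len;
    int c;

    if (size == 0 || fgets(buf, (int)size, fp) == NULL)
        return 0;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }

    /* No newline stored: the line either filled the buffer exactly,
     * ended at EOF, or was truncated. Peek at the next character. */
    c = getc(fp);
    if (c == '\n' || c == EOF)
        return 1;

    /* Truncated: discard everything up to the end of the line. */
    while (c != '\n' && c != EOF)
        c = getc(fp);
    return -1;
}
```

Taking a FILE * rather than hard-coding stdin makes it easy to try out on a file; for keyboard input you would call read_line(stdin, str, sizeof str).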

Best way to do binary arithmetic in C?

I am learning C and writing a simple program that will take 2 string values assumed to each be binary numbers and perform an arithmetic operation according to user selection:
Add the two values,
Subtract input 2 from input 1, or
Multiply the two values.
My implementation assumes each character in the string is a binary bit, e.g. char *bin5 = "0101";, but it seems too naive an approach to parse through the string a character at a time. Ideally, I would want to work with the binary values directly.
What is the most efficient way to do this in C? Is there a better way to treat the input as binary values rather than scanf() and get each bit from the string?
I did some research but I didn't find any approach that was obviously better from the perspective of a beginner. Any suggestions would be appreciated!
Advice:
There's not much that's obviously better than marching through the string a character at a time and making sure the user entered only ones and zeros. Keep in mind that even though you could write a really fast assembly routine if you assume everything is 1 or 0, you don't really want to do that. The user could enter anything, and you'd like to be able to tell them if they screwed up or not.
It's true that this seems mind-bogglingly slow compared to the couple cycles it probably takes to add the actual numbers, but does it really matter if you get your answer in a nanosecond or a millisecond? Humans can only detect 30 milliseconds of latency anyway.
Finally, it already takes far longer to get input from the user and write output to the screen than it does to parse the string or add the numbers, so your algorithm is hardly the bottleneck here. Save your fancy optimizations for things that are actually computationally intensive :-).
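The character-at-a-time check described above can be sketched in a few lines. The function name is_binary_string is hypothetical, and treating an empty string as invalid is my assumption:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if s is a non-empty string made up
 * only of '0' and '1' characters, 0 otherwise. */
int is_binary_string(const char *s)
{
    if (s == NULL || *s == '\0')
        return 0;

    for (; *s != '\0'; s++) {
        if (*s != '0' && *s != '1')
            return 0;   /* the user typed something that is not a bit */
    }
    return 1;
}
```

Remember to strip the trailing newline first if the string came from fgets(), otherwise the '\n' will be reported as an invalid character.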
What you should focus on here is making the task less manpower-intensive. And, it turns out someone already did that for you.
Solution:
Take a look at the strtol() manpage:
long strtol(const char *nptr, char **endptr, int base);
This will let you convert a string (nptr) in any base from 2 to 36 to a long. It reports errors, too: endptr is set to the first character it could not convert. Sample usage for converting a binary string:
#include <stdlib.h>
char buf[MAX_BUF];
get_some_input(buf);
char *err;
long number = strtol(buf, &err, 2);
if (err == buf || *err != '\0') {
// bad input (nothing converted, or a character other than 0 or 1 was found): try again?
} else {
// number is now a long converted from a valid binary string.
}
Supplying base 2 tells strtol to convert binary literals.
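For reference, here is a slightly fuller sketch of the strtol() approach that also rejects empty input and checks errno for overflow. The wrapper name parse_binary and its return convention are assumptions for illustration:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical wrapper: convert a binary string to a long.
 * Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_binary(const char *s, long *out)
{
    char *end;
    long value;

    errno = 0;                  /* strtol only sets errno on failure */
    value = strtol(s, &end, 2);

    if (end == s)               /* nothing was converted at all */
        return 0;
    if (*end != '\0')           /* trailing junk, e.g. the '2' in "1012" */
        return 0;
    if (errno == ERANGE)        /* value does not fit in a long */
        return 0;

    *out = value;
    return 1;
}
```

Keep in mind that strtol() skips leading whitespace and accepts a leading + or -, so " -101" parses as -5; check the first character yourself if you want to forbid that.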
First off, I do recommend that you use functions like strtol, as recommended by tgamblin;
it's better to use what the library gives you than to reinvent the wheel over and over again.
But since you are learning C, I wrote a small version without strtol.
It's neither fast nor safe, but I played a little with bit manipulation as an example.
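#include <stdio.h>
#include <string.h> // strlen()
int main()
{
unsigned int data = 0;
int i = 0;
char str[] = "1001";
size_t len = strlen(str);
// walk from the last character (least significant bit) back to the first,
// stopping at the start of the string or at the first non-binary character
while(len > 0 && (str[len-1] == '0' || str[len-1] == '1'))
{
data += (unsigned int)(str[len-1] - '0') << i;
i++;
len--;
}
printf("data %u\n", data);
return 0;
}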
In order to get the best performance, you need to distinguish between trusted and untrusted input to your functions.
For example, a function like getBinNum() which accepts input from the user should be checked for valid characters and compressed to remove leading zeroes. First, we'll show a general purpose in-place compression function:
// General purpose compression removes leading zeroes.
void compBinNum (char *num) {
char *src, *dst;
// Find first non-'0' and move chars if there are leading '0' chars.
for (src = dst = num; *src == '0'; src++);
if (src != dst) {
while (*src != '\0')
*dst++ = *src++;
*dst = '\0';
}
// Make zero if we removed the last zero.
if (*num == '\0')
strcpy (num, "0");
}
Then provide a checker function that returns either the passed in value, or NULL if it was invalid:
// Check untested number, return NULL if bad.
char *checkBinNum (char *num) {
char *ptr;
// Check for valid number.
for (ptr = num; *ptr != '\0'; ptr++)
if ((*ptr != '1') && (*ptr != '0'))
return NULL;
return num;
}
Then the input function itself:
#define MAXBIN 256
// Get number from (untrusted) user, return NULL if bad.
char *getBinNum (char *prompt) {
char *num, *ptr;
// Allocate space for the number.
if ((num = malloc (MAXBIN)) == NULL)
return NULL;
// Get the number from the user.
printf ("%s: ", prompt);
if (fgets (num, MAXBIN, stdin) == NULL) {
free (num);
return NULL;
}
// Remove newline if there.
if (num[strlen (num) - 1] == '\n')
num[strlen (num) - 1] = '\0';
// Check for valid number then compress.
if (checkBinNum (num) == NULL) {
free (num);
return NULL;
}
compBinNum (num);
return num;
}
Other functions to add or multiply should be written to assume the input is already valid since it will have been created by one of the functions in this library. I won't provide the code for them since it's not relevant to the question:
char *addBinNum (char *num1, char *num2) {...}
char *mulBinNum (char *num1, char *num2) {...}
If the user chooses to source their data from somewhere other than getBinNum(), you could allow them to call checkBinNum() to validate it.
If you were really paranoid, you could check every number passed in to your routines and act accordingly (return NULL), but that would require relatively expensive checks that aren't necessary.
Wouldn't it be easier to parse the strings into integers, and then perform your maths on the integers?
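That suggestion might look something like the sketch below: strtoul() with base 2 does the parsing, ordinary integer addition does the maths, and a small helper turns the result back into a binary string. The names to_binary and add_binary are hypothetical, and the sketch assumes both inputs have already been validated and that the sum fits in an unsigned long.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: write the binary form of value into buf.
 * buf must hold at least sizeof(unsigned long) * CHAR_BIT + 1 chars. */
void to_binary(unsigned long value, char *buf)
{
    char tmp[sizeof(unsigned long) * CHAR_BIT];
    size_t n = 0, i;

    /* Collect bits from least to most significant. */
    do {
        tmp[n++] = (char)('0' + (value & 1UL));
        value >>= 1;
    } while (value != 0);

    /* Reverse them into the output so the most significant bit is first. */
    for (i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
}

/* Hypothetical helper: add two binary strings and store the sum in out.
 * Assumes a and b contain only '0' and '1' and the sum does not overflow. */
void add_binary(const char *a, const char *b, char *out)
{
    unsigned long x = strtoul(a, NULL, 2);
    unsigned long y = strtoul(b, NULL, 2);

    to_binary(x + y, out);
}
```

Multiplication works the same way with x * y. Subtraction needs a decision about negative results first, since unsigned arithmetic wraps around.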
I'm assuming this is a school assignment, but I'm upvoting you because you appear to be giving it a good effort.
Assuming that a string is a binary number simply because it consists only of digits from the set {0,1} is dangerous. For example, when your input is "11", the user may have meant eleven in decimal, not three in binary. It is this kind of carelessness that gives rise to horrible bugs. Your input is ambiguously incomplete and you should really request that the user specifies the base too.

Resources