How to avoid wrapping of input data in C using gcc?

#include<stdio.h>
void main()
{
unsigned int a;
printf("Enter a number:");
scanf("%u",&a);
if ( a <= 4294967295) {
printf("Entered no is within limit\n");
}
else {
printf("Entered no is not in the limit");
}
}
Which input number will execute the else block with the above condition?
The if block is always executed, even when the input is greater than the limit. This is because of wrapping. Is there any way to detect wrapping?
The maximum value of unsigned int is 4294967295, which is 32 one bits in binary.
If my input is 4294967296, it becomes a 1 followed by 32 zero bits.
We can't access that 33rd bit.
Is there any way to access that 33rd bit?

Which input number will execute the else block with the above condition?
Most likely your system uses 32 bit unsigned int with maximum value 4294967295.
If so, there is no input that can trigger the else block.
As commented by many, the simple solution is to use a variable with more bits (if possible). But as pointed out by @Brendan, it just gives you other problems.
A more robust solution is to write your own function to parse the input instead of using scanf.
Such a function could be something like this:
#include <stdio.h>
#include <limits.h>
// Returns 1 if the string can be converted to unsigned int without overflow,
// else returns 0
int string2unsigned(const char* s, unsigned int* u)
{
unsigned int t = 0;
if (*s > '9' || *s < '0') return 0; // Check for unexpected char
while(*s)
{
if (*s == '\n') break; // Stop if '\n' is found
if (*s > '9' || *s < '0') return 0; // Check for unexpected char
if (t > UINT_MAX/10) return 0; // Check for overflow
t = 10 * t;
if (t > UINT_MAX - (*s - '0')) return 0; // Check for overflow
t = t + (*s - '0');
s++;
}
*u = t;
return 1;
}
int main(void) {
unsigned int u;
char s[100];
if (!fgets(s, 100, stdin)) // Use fgets to read a line
{
printf("Failed to read input\n");
}
else
{
if (string2unsigned(s, &u))
{
printf("OK %u\n", u);
}
else
{
printf("Illegal %s", s);
}
}
return 0;
}
Example:
input: 4294967295
output: OK 4294967295
input: 4294967296
output: Illegal 4294967296
(Thanks to @chux for suggesting a better alternative to my original code)

Using a larger integer type (with more bits) is wrong and broken. All that will do is give you a new problem (e.g. accepting numbers above 18446744073709551615 when it shouldn't).
For user interface design; a single "Entered number is not within the limit" error message is unacceptable. You have to tell the user what the real problem is and also remind the user of what you are expecting.
At a minimum, there are at least 4 different kinds of errors you should handle (and therefore at least 4 different error messages you should be able to display):
No input received (stdin gave you an EOF)
Input contained something that isn't a valid character (e.g. the byte 0x00, malformed UTF-8 multi-byte sequences, ASCII characters above 0x7F, etc)
Input contained an unrecognised/unaccepted character (e.g. the user typed "Bork!", or "0x1234", or "twelve"; or the user typed "12,345.00" and your code can't handle thousands separators or fractions)
Input is a valid number but is not within a certain range. This might include negative values like "-1234" (and in that case the minus sign should not be treated as unrecognised/unaccepted character causing the wrong error message to be given).
Also note that the limit does not depend on the size of the variable type you use; the size of the variable type you choose depends on the limit. For example, if you're asking someone to enter their age and they type "12345" then that should cause an error message like "Number out of range (age must be between 1 and 150)." regardless of whether it causes an overflow or not. In this case the limit would be 150 (and not something like INT_MAX because nobody is ever likely to be that old). Because the limit is 150, you can't choose to use signed char as your variable type but could choose to use uint8_t (or anything larger).
With all of this in mind; scanf() is never usable for parsing "strings from humans". You must write your own parser or find something (a "non-standard" library) that is suitable.

Related

How to check an edge case in taking command line argument in C and evaluating to int or double?

So I have an assignment to figure out whether a number on the command line is either an integer or a double.
I have it mostly figured it out by doing:
sscanf(argv[x], "%lf", &d)
Where "d" is a double. I then cast it to an int and subtract the result from "d" to check whether the difference is 0.0, like so:
d - (int)d == 0.0
My problem is if the command line arguments contains doubles that can be technically classified as ints.
I need to classify 3.0 as a double whereas my solution considers it an int.
For example initializing the program.
a.out 3.0
I need it to print out
"3.0 is a double"
However right now it becomes
"3 is an int."
What would be a way to check for this? I did look around for similar problems which led me to the current solution but just this one edge case I do not know how to account for.
Thank you.
For example, a way like this:
#include <stdio.h>
int main(int argc, char *argv[]){
if(argc != 2){
puts("Need an argument!");
return -1;
}
int int_v, read_len = 0;
double double_v;
printf("'%s' is ", argv[1]);
//==1 : It was able to read normally.
//!argv[1][read_len] : It used all the argument strings.
if(sscanf(argv[1], "%d%n", &int_v, &read_len) == 1 && !argv[1][read_len])
puts("an int.");
else if(sscanf(argv[1], "%lf%n", &double_v, &read_len) == 1 && !argv[1][read_len])
puts("a double.");
else
puts("isn't the expected input.");
}
To test whether a string will convert to an int and/or double (completely, without integer overflow, without undefined behavior), call strtol()/strtod(). @Tom Karzes
The trouble with a sscanf() approach is that the result is undefined behavior (UB) on overflow. To detect overflow properly, use strtol()/strtod().
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
bool is_int(const char *src) {
char *endptr;
// Clear, so it may be tested after strtol().
errno = 0;
// Using 0 here allows 0x1234, octal 0123 and decimal 1234.
// or use 10 to allow only decimal text.
long num = strtol(src, &endptr, 0 /* or 10 */);
#if LONG_MIN < INT_MIN || LONG_MAX > INT_MAX
if (num < INT_MIN || num > INT_MAX) {
errno = ERANGE;
}
#endif
return !errno && endptr > src && *endptr == '\0';
}
bool is_double(const char *src) {
char *endptr;
// errno is not cleared or tested here; see the comment below.
strtod(src, &endptr);
// In this case, detecting over/underflow IMO is not a concern.
return endptr > src && *endptr == '\0';
}
It is not entirely clear what the specific expectations are for your program, but it has at least something to do with the form of the input, since "3.0" must be classified as a double. If the form is all it should care about, then you should not try to convert the argument strings to numbers at all, for then you will run into trouble with unrepresentable values. In that case, you should analyze the character sequence of the argument to see whether it matches the pattern of an integer, and if not, whether it matches the pattern of a floating-point number.
For example:
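#include <ctype.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
    for (int arg_num = 1; arg_num < argc; arg_num++) {
        char *arg = argv[arg_num];
        int i = (arg[0] == '-' || arg[0] == '+') ? 1 : 0; // skip any leading sign
        // scan through all the decimal digits
        // (the cast keeps isdigit() defined for negative char values)
        while(isdigit((unsigned char)arg[i])) {
            ++i;
        }
        // anything left after the digits means "not integer form"
        printf("Argument %d is %s.\n", arg_num, arg[i] ? "floating-point" : "integer");
    }
}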
That makes several assumptions, chief among them:
the question is strictly about form, so that the properties of your system's built-in data types (such as int and double) are not relevant.
each argument will have the form of either an integer or a floating-point number, so that eliminating "integer" as a possibility leaves "floating-point" as the only alternative. If "neither" is a possibility that must also be accommodated, then you'll also need to compare the inputs that do not have integer form to a pattern for floating-point numbers, too.
only decimal (or smaller radix) integers need be accommodated -- not, for example, hexadecimal inputs.
Under those assumptions, particularly the first, it is not just unnecessary but counterproductive to attempt to convert the arguments to one of the built-in numeric data types, because you would then come to the wrong conclusion about arguments that, say, are not within the bounds of representable values for those types.
For example, consider how the program should classify "9000000000". It has the form of an integer, but supposing that your system's int type has 31 value bits, that type cannot accommodate a value as large as the one the string represents.
int main (int argc,char *argv[])
{
if(argc==2)
{
int i;
double d;
d=atof(argv[1]);
i=atoi(argv[1]);
if(d!=i)
printf("%s is a double.",argv[1]);
else if(d==i)
printf("%s is an int.",argv[1]);
}
else
printf("Invalid input\n");
return 0;
}
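You must add #include <stdlib.h> (for atof and atoi) and #include <stdio.h> (for printf). Note that this compares values, so "3.0" is still reported as an int.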

Validation of infinite input char \ number

I need to get a valid number from the user between 0-9 without duplicates.
The valid number can have any number of digit, from 1 to 10.
If the user type "space" or any kind of char, then the input is invalid.
My algorithm :
1) Create an array of char in size of 10, then initialize all cells to '0'.
2) For every char that reads from the user, check if the char actually between 0-9.
2.1) If true: count the respectively cell number +1.
2.2) Else "error".
2.3) If I get to a cell that already has +1, means this number already exist, then "error".
Now a few questions about my idea:
1) Is there any better\easy algorithm to do that?
2) The user doesn't type char by char, means I can get an infinite char length, so where do I store everything?
The answer to 2) is: you don't store the characters at all, you process them one by one. You only need storage to remember which digits you have already seen. I'd do it like this:
#include <stdio.h>
#include <ctype.h>
int main(void)
{
char seen[10] = { 0 };
int c, loops;
for (loops = 0; (c = getchar()) != EOF && loops < 10; ++loops)
{
if (!isdigit(c)) {
printf ("Not a digit: %c\n", c);
break;
}
c -= '0';
if (seen[c]) {
printf ("Already seen: %d\n", c);
break;
}
seen[c] = 1;
}
return 0;
}
Try to modify this program as an exercise: reduce the storage requirements of the seen[] array. As written it uses one byte per digit. Make the program use only one bit per digit.

Using strtol to validate integer input in ANSI C

I am new to programming and to C in general and am currently studying it at university. This is for an assignment so I would like to avoid direct answers but are more after tips or hints/pushes in the right direction.
I am trying to use strtol to validate my keyboard input, more specifically, test whether the input is numeric. I have looked over other questions on here and other sites and I have followed instructions given to other users but it hasn't helped me.
From what I have read/ understand of strtol (long int strtol (const char* str, char** endptr, int base);) if the endptr is not a null pointer the function will set the value of the endptr to the first character after the number.
So if I was to enter 84948ldfk, the endptr would point to 'l', telling me there is characters other than numbers in the input and which would make it invalid.
However in my case, what is happening, is that no matter what I enter, my program is returning an Invalid input. Here is my code:
void run_perf_square(int *option_stats)
{
char input[MAX_NUM_INPUT + EXTRA_SPACES]; /*MAX_NUM_INPUT + EXTRA_SPACES are defined
*in header file. MAX_NUM_INPUT = 7
*and EXTRA_SPACES
*(for '\n' and '\0') = 2. */
char *ptr;
unsigned num=0; /*num is unsigned as it was specified in the start up code for the
*assignment. I am not allow to change it*/
printf("Perfect Square\n");
printf("--------------\n");
printf("Enter a positive integer (1 - 1000000):\n");
if(fgets(input, sizeof input, stdin) != NULL)
{
num=strtol(input, &ptr, 10);
if( num > 1000001)
{
printf("Invalid Input! PLease enter a positive integer between 1 and 1000000\n");
read_rest_of_line(); /*clears buffer to avoid overflows*/
run_perf_square(option_stats);
}
else if (num <= 0)
{
printf("Invalid Input! PLease enter a positive integer between 1 and 1000000\n");
run_perf_square(option_stats);
}
else if(ptr != NULL)
{
printf("Invalid Input! PLease enter a positive integer between 1 and 1000000\n");
run_perf_square(option_stats);
}
else
{
perfect_squares(option_stats, num);
}
}
}
Can anyone help me in the right direction? Obviously the error is with my if(ptr != NULL) condition, but as I understand it seems right. As I said, I have looked at previous questions similar to this and took the advice in the answers but it doesn't seem to work for me. Hence, I thought it best to ask for my help tailored to my own situation.
Thanks in advance!
You're checking the outcome of strtol in the wrong order: check ptr first. Also, don't compare ptr against NULL; dereference it and check that it points to the NUL ('\0') string terminator (or to the '\n' that fgets leaves in the buffer).
if (*ptr == '\0') {
// this means all characters were parsed and converted to `long`
}
else {
// this means either no characters were parsed correctly in which
// case the return value is completely invalid
// or
// there was a partial parsing of a number to `long` which is returned
// and ptr points to the remaining string
}
num > 1000001 also needs to be num > 1000000
num <= 0 is equivalent to num < 1 here, because num is unsigned and can never be negative
You can also, with some reorganising and logic tweaks, collapse your sequence of if statements down to only a single invalid branch and an okay branch.
OP would like to avoid direct answers ....
validate integer input
Separate I/O from validation - 2 different functions.
I/O: Assume hostile input. (Text, too much text, too little text. I/O errors.) Do you want to consume leading spaces as part of I/O? Do you want to consume leading 0 as part of I/O? (suggest not)
Validate the string (NULL, lead space OK?, digits after a trailing space, too short, too long, under-range, over-range, Is 123.0 an OK integer)
strtol() is your friend to do the heavy conversion lifting. Check how errno should be set and tested afterward. Use the endptr. Should its value be set before? How should it be tested afterward? It consumes leading spaces, is that OK? It converts text to a long, but OP wants the nebulous "integer".
Qapla'
The function strtol returns long int, which is a signed value. I suggest that you use another variable (entry_num), which you could test for <0, thus detecting negative numbers.
I would also suggest that regex could test string input for digits and valid input, or you could use strtok and anything but digits as the delimiter ;-) Or you could scan the input string using validation, something like:
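#include <ctype.h>
int validate_input ( char* input )
{
    char *p;
    if( !input ) return 0;
    // skip over digits and white space (isspace() is the standard name;
    // the casts keep the <ctype.h> calls defined for negative char values)
    for( p=input; *p && (isdigit((unsigned char)*p) || isspace((unsigned char)*p)); ++p )
    {
    }
    if( *p ) return 0; // stopped early: found something else
    return 1;
}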

Macro directives in C — my code example doesn't work

I want to get the following piece of code to work:
#define READIN(a, b) if(scanf('"#%d"', '"&a"') != 1) { printf("ERROR"); return EXIT_FAILURE; }
int main(void)
{
unsigned int stack_size;
printf("Type in size: ");
READIN(d, stack_size);
}
I don't get how to use directives with the # operator. I want to use the scanf with print ERROR etc. several times, but the "'"#%d"' & '"&a"'" is, I think, completely wrong. Is there any way to get that running? I think a macro is the best solution — or do you disagree?
You should only stringify arguments to the macro, and they must be outside of strings or character constants in the replacement text of the macro. Thus you probably should use:
#define READIN(a, b) do { if (scanf("%" #a, &b) != 1) \
{ fprintf(stderr, "ERROR\n"); return EXIT_FAILURE; } \
} while (0)
int main(void)
{
unsigned int stack_size;
printf("Type in size: ");
READIN(u, stack_size);
printf("You entered %u\n", stack_size);
return(0);
}
There are many changes. The do { ... } while (0) idiom prevents you from getting compilation errors in circumstances such as:
if (i > 10)
READIN(u, j);
else
READIN(u, k);
With your macro, you'd get an unexpected keyword 'else' type of message because the semi-colon after the first READIN() would be an empty statement after the embedded if, so the else could not belong to the visible if or the if inside the macro.
The type of stack_size is unsigned int; the correct format specifier, therefore, is u (d is for a signed int).
And, most importantly, the argument a in the macro is stringized correctly (and string concatenation of adjacent string literals, an extremely useful feature of C89, takes care of the rest for you). The argument b in the macro is not embedded in a string either.
The error reporting is done to stderr (the standard stream for reporting errors on), and the message ends with a newline so it will actually appear. I didn't replace return EXIT_FAILURE; with exit(EXIT_FAILURE);, but that would probably be a sensible choice if the macro will be used outside of main(). That assumes that 'terminate on error' is the appropriate behaviour in the first place. It often isn't for interactive programs, but fixing it is a bit harder.
I'm also ignoring my reservations about using scanf() at all; I usually avoid doing so because I find error recovery too hard. I've only been programming in C for about 28 years, and I still find scanf() too hard to control, so I essentially never use it. I typically use fgets() and sscanf() instead. Amongst other merits, I can report on the string that caused the trouble; that's hard to do when scanf() may have gobbled some of it.
My thought with scanf() here is, to only read in positive numbers and no letters. My overall code does create a stack, which the user types in and the type should be only positive, otherwise error. [...] I only wanted to know if there's a better solution to forbid the user to type in something other than positive numbers?
I just tried the code above (with #include <stdlib.h> and #include <stdio.h> added) and entered -2 and got told 4294967294, which isn't what I wanted (the %u format does not reject -2, at least on MacOS X 10.7.2). So, I would go with fgets() and strtoul(), most likely. However, accurately detecting all possible problems with strtoul() is an exercise of some delicacy.
This is the alternative code I came up with:
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <limits.h>
#include <string.h>
int main(void)
{
unsigned int stack_size = 0;
char buffer[4096];
printf("Type in size: ");
if (fgets(buffer, sizeof(buffer), stdin) == 0)
printf("EOF or error detected\n");
else
{
char *eos;
unsigned long u;
size_t len = strlen(buffer);
if (len > 0)
buffer[len - 1] = '\0'; // Zap newline (assuming there is one)
errno = 0;
u = strtoul(buffer, &eos, 10);
if (eos == buffer ||
(u == 0 && errno != 0) ||
(u == ULONG_MAX && errno != 0) ||
(u > UINT_MAX))
{
printf("Oops: one of many problems occurred converting <<%s>> to unsigned integer\n", buffer);
}
else
stack_size = u;
printf("You entered %u\n", stack_size);
}
return(0);
}
The specification of strtoul() is given in ISO/IEC 9899:1999 §7.20.1.4:
¶1 [...]
unsigned long int strtoul(const char * restrict nptr,
char ** restrict endptr, int base);
[...]
¶2 [...] First,
they decompose the input string into three parts: an initial, possibly empty, sequence of
white-space characters (as specified by the isspace function), a subject sequence
resembling an integer represented in some radix determined by the value of base, and a
final string of one or more unrecognized characters, including the terminating null
character of the input string. Then, they attempt to convert the subject sequence to an
integer, and return the result.
¶3 [...]
¶4 The subject sequence is defined as the longest initial subsequence of the input string,
starting with the first non-white-space character, that is of the expected form. The subject
sequence contains no characters if the input string is empty or consists entirely of white
space, or if the first non-white-space character is other than a sign or a permissible letter
or digit.
¶5 If the subject sequence has the expected form and the value of base is zero, the sequence
of characters starting with the first digit is interpreted as an integer constant according to
the rules of 6.4.4.1. If the subject sequence has the expected form and the value of base
is between 2 and 36, it is used as the base for conversion, ascribing to each letter its value
as given above. If the subject sequence begins with a minus sign, the value resulting from
the conversion is negated (in the return type). A pointer to the final string is stored in the
object pointed to by endptr, provided that endptr is not a null pointer.
¶6 [...]
¶7 If the subject sequence is empty or does not have the expected form, no conversion is
performed; the value of nptr is stored in the object pointed to by endptr, provided
that endptr is not a null pointer.
Returns
¶8 The strtol, strtoll, strtoul, and strtoull functions return the converted
value, if any. If no conversion could be performed, zero is returned. If the correct value
is outside the range of representable values, LONG_MIN, LONG_MAX, LLONG_MIN,
LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type
and sign of the value, if any), and the value of the macro ERANGE is stored in errno.
The error I got was from a 64-bit compilation where -2 was converted to a 64-bit unsigned long, and that was outside the range acceptable to a 32-bit unsigned int (the failing condition was u > UINT_MAX). When I recompiled in 32-bit mode (so sizeof(unsigned int) == sizeof(unsigned long)), then the value -2 was accepted again, interpreted as 4294967294 again. So, even this is not delicate enough... you probably have to do a manual skip of leading blanks and reject a negative sign (and maybe a positive sign too; you'd also need to #include <ctype.h>):
char *bos = buffer;
while (isspace((unsigned char)*bos))
    bos++;
if (!isdigit((unsigned char)*bos))
{
    /* ...error - not a digit... */
}
char *eos;
unsigned long u;
size_t len = strlen(bos);
if (len > 0)
bos[len - 1] = '\0'; // Zap newline (assuming there is one)
errno = 0;
u = strtoul(bos, &eos, 10);
if (eos == bos ||
(u == 0 && errno != 0) ||
(u == ULONG_MAX && errno != 0) ||
(u > UINT_MAX))
{
printf("Oops: one of many problems occurred converting <<%s>> to unsigned integer\n", buffer);
}
As I said, the whole process is rather non-trivial.
(Looking at it again, I'm not sure whether the u == 0 && errno != 0 clause would ever catch any errors...maybe not because the eos == buffer (or eos == bos) condition catches the case there's nothing to convert at all.)
You are incorrectly encasing your macro argument(s), it should look like:
#define READIN(a, b) if(scanf("%"#a, &b) != 1) { printf("ERROR"); return EXIT_FAILURE; }
Your use of the stringify operator was also incorrect; it must directly prefix the argument name.
In short, use "%"#a, not '"#%d"', and &b, not '"&a"'.
As a side note, for longish macros like those, it helps to make them multi-line using \, which keeps them readable:
#define READIN(a, b) \
if(scanf("%"#a, &b) != 1) \
{ \
printf("ERROR"); \
return EXIT_FAILURE; \
}
When doing something like this, one should preferably use a function, something along the lines of this should work:
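/* static avoids the C99 rule that a plain inline function also needs an
   external definition; const lets string literals be passed safely */
static inline int readIn(const char* szFormat, void* pDst)
{
    if(scanf(szFormat,pDst) != 1)
    {
        puts("Error");
        return 0;
    }
    return 1;
}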
invoking it would be like so:
if(!readIn("%u",&stack_size)) /* %u because stack_size is unsigned int */
    return EXIT_FAILURE;
scanf(3) takes a const char * as a first argument. You are passing '"..."', which is not a C "string". C strings are written with the " double quotes. The ' single quotes are for individual characters: 'a' or '\n' etc.
Placing a return statement inside a C preprocessor macro is usually considered very poor form. I've seen goto error; coded inside preprocessor macros before for repetitive error handling code when storing formatted data to and reading data from a file or kernel interface, but these are definitely exceptional circumstances. You would detest debugging this in six months time. Trust me. Do not hide goto, return, break, continue, inside C preprocessor macros. if is alright so long as it is entirely contained within the macro.
Also, please get in the habit of writing your printf(3) statements like this:
printf("%s", "ERROR");
Format string vulnerabilities are exceedingly easy to write. Your code does not contain any such vulnerability now, but trust me, at some point in the future those strings are inevitably modified to include some user-supplied content, and putting in an explicit format string now will help prevent these in the future. At least you'll think about it in the future if you see this.
It is considered polite to wrap your multi-line macros in do { } while (0) blocks.
Finally, the stringification is not quite done correctly; try this instead:
#define READIN(A, B) do { if (scanf("%" #A, B) != 1) { \
/* error handling */ \
} else { \
/* success case */ \
} } while(0)
Edit: I feel I should re-iterate akappa's advice: Use a function instead. You get better type checking, better backtraces when something goes wrong, and it is far easier to work with. Functions are good.

C - scanf() vs gets() vs fgets()

I've been doing a fairly easy program of converting a string of Characters (assuming numbers are entered) to an Integer.
After I was done, I noticed some very peculiar "bugs" that I can't answer, mostly because of my limited knowledge of how the scanf(), gets() and fgets() functions work. (I did read a lot of literature though.)
So without writing too much text, here's the code of the program:
#include <stdio.h>
#define MAX 100
int CharToInt(const char *);
int main()
{
char str[MAX];
printf(" Enter some numbers (no spaces): ");
gets(str);
// fgets(str, sizeof(str), stdin);
// scanf("%s", str);
printf(" Entered number is: %d\n", CharToInt(str));
return 0;
}
int CharToInt(const char *s)
{
int i, result, temp;
result = 0;
i = 0;
while(*(s+i) != '\0')
{
temp = *(s+i) & 15;
result = (temp + result) * 10;
i++;
}
return result / 10;
}
So here's the problem I've been having. First, when using gets() function, the program works perfectly.
Second, when using fgets(), the result is slightly wrong because apparently fgets() function reads newline (ASCII value 10) character last which screws up the result.
Third, when using scanf() function, the result is completely wrong because first character apparently has a -52 ASCII value. For this, I have no explanation.
Now I know that gets() is discouraged to use, so I would like to know if I can use fgets() here so it doesn't read (or ignores) newline character.
Also, what's the deal with the scanf() function in this program?
Never use gets. It offers no protections against a buffer overflow vulnerability (that is, you cannot tell it how big the buffer you pass to it is, so it cannot prevent a user from entering a line larger than the buffer and clobbering memory).
Avoid using scanf. If not used carefully, it can have the same buffer overflow problems as gets. Even ignoring that, it has other problems that make it hard to use correctly.
Generally you should use fgets instead, although it's sometimes inconvenient (you have to strip the newline, you must determine a buffer size ahead of time, and then you must figure out what to do with lines that are too long–do you keep the part you read and discard the excess, discard the whole thing, dynamically grow the buffer and try again, etc.). There are some non-standard functions available that do this dynamic allocation for you (e.g. getline on POSIX systems, Chuck Falconer's public domain ggets function). Note that ggets has gets-like semantics in that it strips a trailing newline for you.
Yes, you want to avoid gets. fgets will always read the new-line if the buffer was big enough to hold it (which lets you know when the buffer was too small and there's more of the line waiting to be read). If you want something like fgets that won't read the new-line (losing that indication of a too-small buffer) you can use fscanf with a scan-set conversion like: "%N[^\n]", where the 'N' is replaced by the buffer size - 1.
One easy (if strange) way to remove the trailing new-line from a buffer after reading with fgets is: strtok(buffer, "\n"); This isn't how strtok is intended to be used, but I've used it this way more often than in the intended fashion (which I generally avoid).
There are numerous problems with this code. We'll fix the badly named variables and functions and investigate the problems:
First, CharToInt() should be renamed to the proper StringToInt() since it operates on a string, not a single character.
The function CharToInt() [sic.] is unsafe. It doesn't check if the user accidentally passes in a NULL pointer.
It doesn't validate input, or more correctly, skip invalid input. If the user enters in a non-digit the result will contain a bogus value. i.e. If you enter in N the code *(s+i) & 15 will produce 14 !?
Next, the nondescript temp in CharToInt() [sic.] should be called digit since that is what it really is.
Also, the kludge return result / 10; is just that -- a bad hack to work around a buggy implementation.
Likewise MAX is badly named since it may appear to conflict with the common usage, i.e. #define MAX(x,y) (((x)>(y))?(x):(y))
The verbose *(s+i) is not as readable as simply *s. There is no need to use and clutter up the code with yet another temporary index i.
gets()
This is bad because it can overflow the input string buffer. For example, if the buffer size is 2, and you enter in 16 characters, you will overflow str.
scanf()
This is equally bad because it can overflow the input string buffer.
You mention "when using scanf() function, the result is completely wrong because first character apparently has a -52 ASCII value."
That is due to an incorrect usage of scanf(). I was not able to duplicate this bug.
fgets()
This is safe because you can guarantee you never overflow the input string buffer by passing in the buffer size (which includes room for the null terminator).
getline()
A few people have suggested the POSIX standard getline() as a replacement. Unfortunately this is not a practical portable solution as Microsoft does not implement a C version; only the standard C++ string template function, as the answers to SO question #27755191 explain. Microsoft's C++ getline() was available at least as far back as Visual Studio 6, but since the OP is strictly asking about C and not C++ this isn't an option.
Misc.
Lastly, this implementation is buggy in that it doesn't detect integer overflow. If the user enters too large a number the number may become negative! i.e. 9876543210 will become -18815698?! Let's fix that too.
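This is easy to approximate for an unsigned int. If the current partial number is less than the previous partial number then we have overflowed, and we return the previous partial number. (This wraparound test does not catch every overflow, though: with a 32-bit unsigned int, 9999999999 wraps to 1410065407, which is larger than the previous partial number 999999999. Comparing against UINT_MAX / 10 before multiplying is more reliable.)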
For a signed int this is a little more work. In assembly we could inspect the carry-flag, but in C there is no standard built-in way to detect overflow with signed int math. Fortunately, since we are multiplying by a constant, * 10, we can easily detect this if we use an equivalent equation:
n = x*10 = x*8 + x*2
If x*8 overflows then logically x*10 will as well. For a 32-bit int overflow will happen when x*8 = 0x100000000 thus all we need to do is detect when x >= 0x20000000. Since we don't want to assume how many bits an int has we only need to test if the top 3 msb's (Most Significant Bits) are set.
Additionally, a second overflow test is needed. If the msb is set (sign bit) after the digit concatenation then we also know the number overflowed.
Code
Here is a fixed safe version along with code that you can play with to detect overflow in the unsafe versions. I've also included both a signed and unsigned versions via #define SIGNED 1
#include <stdio.h>
#include <ctype.h> // isdigit()
#include <limits.h> // INT_MAX
// 1 fgets
// 2 gets
// 3 scanf
#define INPUT 1
#define SIGNED 1
// re-implementation of atoi()
// Test Case: 2147483647 -- valid 32-bit
// Test Case: 2147483648 -- overflow 32-bit
int StringToInt( const char * s )
{
int result = 0, digit;
if( !s )
return result;
while( *s )
{
if( isdigit( (unsigned char)*s ) ) // Alt.: if ((*s >= '0') && (*s <= '9'))
{
digit = *s++ - '0';
// check BEFORE the arithmetic: signed overflow is undefined behavior,
// so it cannot be detected reliably after it has happened
if( result > (INT_MAX - digit) / 10 )
return result; // would overflow: return the previous partial number
result = result * 10 + digit;
}
else
break; // you decide SKIP or BREAK on invalid digits
}
return result;
}
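// Test case: 4294967295 -- valid 32-bit
// Test case: 4294967296 -- overflow 32-bit
unsigned int StringToUnsignedInt( const char * s )
{
const unsigned int max = (unsigned int)-1; // same value as UINT_MAX from <limits.h>
unsigned int result = 0, digit;
if( !s )
return result;
while( *s )
{
if( isdigit( (unsigned char)*s ) ) // Alt.: if (*s >= '0' && *s <= '9')
{
digit = (unsigned int)(*s++ - '0');
// testing (result < prev) after the multiply misses some wraps
// (e.g. 5000000000 on 32-bit), so check before multiplying
if( result > (max - digit) / 10 )
return result; // would overflow: return the previous partial number
result = result * 10 + digit;
}
else
break; // you decide SKIP or BREAK on invalid digits
}
return result;
}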
int main(void)
{
int detect_buffer_overrun = 0; // stays 0 unless an overrun of str happens to clobber it
#define BUFFER_SIZE 2 // set to small size to easily test overflow
char str[ BUFFER_SIZE+1 ] = ""; // C idiom is to reserve space for the NUL terminator
printf(" Enter some numbers (no spaces): ");
#if INPUT == 1
fgets(str, sizeof(str), stdin); // never writes past sizeof(str)
#elif INPUT == 2
gets(str); // can overflow str; removed from the language in C11
#elif INPUT == 3
scanf("%s", str); // can also overflow str (no field width)
#endif
#if SIGNED
printf(" Entered number is: %d\n", StringToInt(str));
#else
printf(" Entered number is: %u\n", StringToUnsignedInt(str) );
#endif
if( detect_buffer_overrun )
printf( "Input buffer overflow!\n" );
return 0;
}
You're correct that you should never use gets. If you want to use fgets, you can simply overwrite the newline.
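char *result = fgets(str, sizeof(str), stdin);
size_t len = (result != NULL) ? strlen(str) : 0; // str is only valid if fgets succeeded
if(len > 0 && str[len - 1] == '\n')
{
str[len - 1] = '\0';
}
else
{
// handle error: read failure, or no newline before the buffer filled
}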
This does assume there are no embedded NUL bytes. Another option is POSIX getline:
char *line = NULL;
size_t len = 0;
ssize_t count = getline(&line, &len, stdin);
if(count >= 1 && line[count - 1] == '\n')
{
line[count - 1] = '\0';
}
else
{
// Handle error
}
free(line); // getline allocated the buffer; release it when done with the line
The advantage of getline is that it does allocation and reallocation for you, it handles possible embedded NUL bytes, and it returns the count so you don't have to waste time with strlen. Note that you can't use an array with getline. The pointer must be NULL or free-able.
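Wrapped into a function, the getline approach might look like the sketch below. It assumes a POSIX system, and read_line_alloc is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L   /* expose getline() in <stdio.h> */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>            /* ssize_t */

/* Hypothetical helper: returns the next line of `in` without its newline,
 * in a malloc'd buffer the caller must free(). Returns NULL on EOF or error. */
char *read_line_alloc(FILE *in)
{
    char *line = NULL;            /* NULL + 0 asks getline to allocate */
    size_t cap = 0;
    ssize_t count = getline(&line, &cap, in);

    if (count < 0) {
        free(line);               /* getline may allocate even on failure */
        return NULL;
    }
    if (count > 0 && line[count - 1] == '\n')
        line[count - 1] = '\0';
    return line;
}
```

Because the buffer is sized by getline itself, there is no fixed limit to overrun. The cost is that the caller owns the returned pointer and has to free() it.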
I'm not sure what issue you're having with scanf.
Never use gets(); it can lead to unpredictable overflows. If your string array is of size 1000 and I enter 1001 characters, I can overflow the buffer in your program.
Try using fgets() with this modified version of your CharToInt():
int CharToInt(const char *s)
{
int i, result, temp;
result = 0;
i = 0;
while(*(s+i) != '\0')
{
if (isdigit((unsigned char)*(s+i)))
{
temp = *(s+i) & 15; // low 4 bits of an ASCII digit are its value
result = result * 10 + temp; // no 10x intermediate, so no early overflow
}
i++;
}
return result;
}
It essentially validates the input digits and ignores anything else. This is very crude so modify it and salt to taste.
I am not much of a programmer, but let me try to answer your question about scanf(). I think scanf() is fine, and I use it for almost everything without issues, but the way you call it is not quite right. It should be:
char str[MAX];
int c;
printf("Enter some text: ");
scanf("%s", str); // better: add a field width of at most MAX - 1, e.g. "%99s" when MAX is 100
while ((c = getchar()) != '\n' && c != EOF) { } // discard the rest of the line
The "&" in front of a variable is important for scalars such as an int: it tells scanf() where to store the scanned value. An array name like str already converts to a pointer to its first element, so it is passed without "&".
The loop discards whatever is left on the input line, so the next read starts clean. fflush(stdin) is often used for this, but the C standard defines fflush() only for output streams, so it is not portable. Discarding input does not prevent a buffer overflow; only the field width does.
The difference between these functions is that scanf("%s") reads only up to the first whitespace, while gets() and fgets() read the whole line. fgets() also takes the buffer size and keeps the newline, so remember to strip it.
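A bounded word read with the scanf family can be sketched as follows. WORD_MAX and read_word are assumptions made for this example, and the leftover input is discarded with a read loop rather than fflush(stdin), which the C standard does not define for input streams:

```c
#include <stdio.h>

#define WORD_MAX 16   /* assumed buffer size for this sketch */

/* Hypothetical helper: reads one whitespace-delimited word from `in`
 * into word (capacity WORD_MAX), then discards the rest of the line.
 * Returns 1 if a word was read, 0 otherwise. */
int read_word(FILE *in, char word[WORD_MAX])
{
    /* The width 15 is WORD_MAX - 1: at most 15 chars are stored, plus NUL. */
    int got = fscanf(in, "%15s", word);

    /* Consume leftover input up to the newline. */
    int c;
    while ((c = getc(in)) != '\n' && c != EOF)
        ;

    return got == 1;
}
```

Given the line "hello world", this stores "hello" and throws away " world". A 26-letter word is cut to its first 15 characters instead of running past the buffer.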
