I want to read the input of a user and save it. What I have now does work, but I need to know whether it's legitimate (following the ANSI standard, C90) that scanf assigns the variable "length" before it allocates memory for the input, or if it's just a quirk of the compiler.
#include <stdio.h>
int main()
{
char* text;
int length = 0;
scanf("%s%n", text = malloc(length+1), &length);
printf("%s", text);
return 0;
}
This will not work as you expect.
At the time you call malloc, length still has the value 0, so you're only allocating one byte. All of scanf's arguments, including the malloc call, are evaluated before scanf starts running, and length isn't updated until scanf processes the %n directive. So any non-empty string will write past the bounds of the allocated buffer, invoking undefined behavior.
While not exactly the same, what you can do is use getline, assuming you're running on a POSIX system such as Linux. This function reads a line of text (including the newline) and allocates space for that line.
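#define _POSIX_C_SOURCE 200809L /* ask for the POSIX getline() declaration */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h> /* ssize_t */
int main(void)
{
    char *text = NULL; /* getline allocates the buffer */
    size_t n = 0;
    ssize_t rval = getline(&text, &n, stdin);
    if (rval == -1) {
        perror("getline failed");
    } else {
        printf("%s", text);
    }
    free(text); /* free even on failure: getline may have allocated */
    return 0;
}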
Apart from the obvious problem with misuse of scanf addressed in another answer, this doesn't follow any C standard either:
#include <stdio.h>
...
text = malloc(length+1)
Since you didn't include stdlib.h, where malloc is declared, C90 will implicitly declare malloc as int malloc(); (a function returning int, with unspecified parameters), which is of course nonsense.
And then when you try to assign an int (the result of malloc) to a char*, you have a constraint violation of C90 6.3.16.1, the rules of simple assignment.
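Therefore your code is not allowed to compile cleanly: the compiler must give a diagnostic message.
The fix is to include stdlib.h. Compiling against a newer ISO C standard (C99 or later), which removed implicit function declarations, also makes this kind of bug easier to catch.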
The issues are well explained by others.
I want to read the input of a user and save it
To meet OP's goal, similar code could do:
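int length = 255;
char *text = malloc(length + 1);
if (text == NULL) {
    Handle_OOM();
}
if (scanf("%255s", text) != 1) {
    tbd(); // nothing was read: end of file or input error
} else {
    // strlen, unlike %n, does not count leading whitespace skipped by %s
    length = (int) strlen(text);
    if (length < 255) {
        // Shrink the buffer to fit; keep the old one if realloc fails
        char *t = realloc(text, length + 1);
        if (t) text = t;
    } else {
        tbd(); // fail when length == 255, as the whole "word" was not certainly read.
    }
}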
The above would be a simple approach if excessive input was deemed hostile.
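The same allocate, read, and shrink idea can be packaged as a self-contained function. This is only a sketch under assumptions: the name read_word, reading from a FILE * instead of stdin, and returning NULL where the snippet calls the placeholders Handle_OOM() and tbd() are illustrative choices, not part of the answer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one whitespace-delimited word of at most 255
   characters from f and returns it in a right-sized malloc'd buffer.
   Returns NULL on end of file, input error, allocation failure, or when the
   word fills all 255 characters and may therefore have been cut off. */
char *read_word(FILE *f)
{
    size_t length;
    char *shrunk;
    char *text = malloc(255 + 1);

    if (text == NULL)
        return NULL;                 /* stands in for Handle_OOM() */
    if (fscanf(f, "%255s", text) != 1) {
        free(text);                  /* nothing read: end of file or error */
        return NULL;
    }
    length = strlen(text);           /* unlike %n, ignores skipped whitespace */
    if (length == 255) {
        free(text);                  /* stands in for tbd(): treat as hostile */
        return NULL;
    }
    shrunk = realloc(text, length + 1);  /* give back the unused space */
    return shrunk != NULL ? shrunk : text;
}
```

The caller owns the returned string and must free() it.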
Related
#include <stdio.h>
int main() {
char *mystring = calloc(2, sizeof(char));
scanf("%10[^\n]s", mystring);
printf("\nValue: %s\nSize of array: %d\nAllocated space: %d\n",
mystring, 2 * sizeof(char), sizeof(char) * strlen(mystring));
free(mystring);
}
Output:
$ ./"dyn_mem"
laaaaaaaaaaa
Value: laaaaaaaaa
Size of array: 2
Allocated space: 10
This code can produce an undefined behavior if I enter in the scanf input a string bigger than array size. How can I handle this?
There are multiple problems in your code:
mystring is initialized to point to an allocated block of 2 bytes. Technically, you should test for memory allocation failure.
the conversion format "%10[^\n]s" is incorrect: the trailing s should be removed, because the syntax for a character class (scan set) ends with the ].
the number 10 means store at most 10 characters and a null terminator into mystring. If more than 1 character needs to be stored, the code has undefined behavior.
the printf conversion specifier for size_t is %zu, not %d. If your C library is C99 compliant, use %zu; otherwise cast the last 2 arguments to (int).
the sizes output do not correspond to the labels: the first is the allocated size, and the second is the length of the string.
scanf() will fail if the file is empty or starts with a newline. You should test the return value of scanf(), which must be 1, to avoid undefined behavior in case of invalid input.
sizeof(char) is 1 by definition.
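The width in "%10[^\n]" has to be kept in sync with the buffer size by hand. One way to avoid that, sketched here with a hypothetical helper named read_bounded_line(), is to build the format string at run time from the allocated size. The cast to unsigned long with %lu keeps the sprintf call valid in C90, where %zu is not available.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocates cap bytes and reads at most cap - 1
   characters of the next line into them, deriving the scanf width from cap.
   Returns NULL if cap < 2, on allocation failure, or if fscanf converts
   nothing (end of file, or an empty line). */
char *read_bounded_line(FILE *f, size_t cap)
{
    char fmt[32];     /* room for "%", up to 20 digits, "[^\n]" and '\0' */
    char *buf;

    if (cap < 2)
        return NULL;
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    /* for cap == 11 this builds the same format as "%10[^\n]" */
    sprintf(fmt, "%%%lu[^\n]", (unsigned long)(cap - 1));
    if (fscanf(f, fmt, buf) != 1) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

With cap set to 11, the input from the question is cut to its first 10 characters instead of overflowing the buffer.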
There are many ways to achieve your goal:
On systems that support it, such as Linux with the GNU C library, you could use an m modifier between the % and the [ in the scanf() conversion format and pass the address of a char * as an argument. scanf() will allocate an array with malloc() large enough to receive the converted input.
Here is a modified version for Linux:
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strlen */
#include <malloc.h> /* malloc_usable_size, a glibc extension */
int main() {
char *mystring = NULL;
if (scanf("%m[^\n]", &mystring) == 1) {
printf("Value: %s\n"
"Length of string: %zu\n"
"Allocated space: %zu\n",
mystring, strlen(mystring), malloc_usable_size(mystring));
free(mystring);
}
return 0;
}
On POSIX systems, you could use getline() that reads a line into an allocated array.
On other systems, you would need to write a function that reads the input stream and reallocates the destination array as long as you don't get a newline or the end of file.
A common compromise is to make an assumption about the maximum length of the input:
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strlen, strdup (POSIX; standard C only since C23) */
int main() {
char buf[1024];
if (scanf("%1023[^\n]", buf) == 1) {
char *mystring = strdup(buf);
if (mystring) {
printf("Value: %s\n"
"Length of string: %d\n"
"Minimum allocated size: %d\n",
mystring, (int)strlen(mystring), (int)strlen(mystring) + 1);
free(mystring);
}
}
return 0;
}
You could also use fgets() to read a line from the input stream and strip the newline (if any). This approach has the advantage of not failing on empty lines.
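A minimal sketch of the fgets route, using a hypothetical helper named read_line_fgets(): strcspn returns the index of the first '\n', or of the terminating '\0' when there is none, so overwriting that position strips the newline without a separate check. An empty input line comes back as "" rather than as a failure.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (size bytes) with fgets and
   strips the trailing newline, if any. Returns buf, or NULL at end of file
   or on a read error. */
char *read_line_fgets(char *buf, int size, FILE *f)
{
    if (fgets(buf, size, f) == NULL)
        return NULL;
    /* overwrites the '\n' if present; otherwise rewrites the existing '\0' */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

Lines longer than the buffer are split across successive calls, because the rest of the line stays in the stream.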
Here is a simple implementation of getline() that should fit your needs:
#include <stdio.h>
#include <stdlib.h>
int my_getline(char **lineptr, size_t *n, FILE *stream) {
    char *ptr = *lineptr;
    size_t size = *n;
    size_t pos = 0;
    int c;
    for (;;) {
        c = getc(stream);
        if (c == EOF && pos == 0)
            return EOF; /* end of file (or read error) before any character */
        if (pos + 1 >= size) {
            /* reallocate the array, increasing size by about the golden ratio (x1.625) */
            size = size + (size / 2) + (size / 8) + 16;
            ptr = realloc(ptr, size);
            if (ptr == NULL) {
                if (c != EOF)
                    ungetc(c, stream);
                return EOF;
            }
            *n = size;
            *lineptr = ptr;
        }
        if (c == EOF || c == '\n')
            break; /* end of line: the buffer has room for the '\0' */
        ptr[pos++] = c;
    }
    ptr[pos] = '\0';
    return (int)pos;
}
int main() {
char *mystring = NULL; // must be initialized
size_t size = 0; // must be initialized
int res;
while ((res = my_getline(&mystring, &size, stdin)) >= 0) {
printf("Value: %s\n"
"Length of string: %d\n"
"Allocated size: %d\n",
mystring, res, (int)size);
}
free(mystring);
return 0;
}
Option #1
from Kernighan and Ritchie 2nd ed appendix B.1.4
char *fgets(char *s, int n, FILE *stream)
fgets reads at most the next n-1 characters into the array s, stopping if a newline is encountered; the newline is included in the array, which is terminated by '\0'. fgets returns s, or NULL if end of file or error occurs.
replace n with the allocated size of mystring (2 * sizeof(char) in your code), not strlen(mystring): strlen measures the string already stored, not the space available
Option #2
also from Kernighan and Ritchie 2nd ed appendix B.1.4
int fgetc(FILE *stream)
fgetc returns the next character of stream as an unsigned char (converted to an int), or EOF if end of file or error occurs.
and read the characters manually in a for loop, using the allocated size of mystring minus one (to leave room for the '\0') as the limit
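Option #2 might look like the sketch below. The function name is hypothetical, and it assumes size is the number of bytes actually allocated for s (2 for the question's calloc(2, sizeof(char))) and at least 2. Characters beyond the limit are read and dropped, so the next call starts on a fresh line.

```c
#include <stdio.h>

/* Hypothetical helper for Option #2: copies at most size - 1 characters of
   the next line from stream into s using fgetc, always null-terminates, and
   reads and discards the rest of an over-long line. Assumes size >= 2.
   Returns the number of characters stored, or -1 at end of file. */
int read_line_fgetc(char *s, int size, FILE *stream)
{
    int c;          /* int, not char, so EOF can be told apart */
    int i = 0;

    while ((c = fgetc(stream)) != EOF && c != '\n') {
        if (i < size - 1)
            s[i++] = (char)c;   /* store while there is room */
        /* otherwise the character is dropped */
    }
    if (c == EOF && i == 0)
        return -1;              /* nothing left to read */
    s[i] = '\0';
    return i;
}
```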
This code can produce an undefined behavior if I enter in the scanf input a string bigger than array size.
Yes.
How can I "handle" this?
By ensuring that you always pass scanf a pointer to an object of a type appropriate for the corresponding conversion directive. That is always your responsibility as a C programmer. For s and [ directives, "appropriate" includes being large enough to accommodate all possible converted values.
It is easy enough to do that when the format expresses the maximum size of the input, either directly, as in the example, or parametrically. And the format is under your control. But if you need to handle input of unbounded size then scanf isn't up to the task, at least not by itself. In that case, you need to implement a variation on guessing how much space you'll need, and acquiring more if that turns out not to be enough. Among other things, that means being prepared to read the input in more than one piece, and probably obtaining space for it by dynamic allocation.
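As one illustration of reading unbounded input "in more than one piece" while staying with the scanf family, the sketch below calls fscanf repeatedly with a bounded %64[^\n] conversion and appends each piece to a buffer grown with realloc. The function name and the piece size of 64 are arbitrary assumptions; growing by exactly the piece length keeps the code short but is not the most efficient growth strategy.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line of any length from f by calling
   fscanf with a bounded scan set in a loop and appending each piece.
   Returns a malloc'd string without the newline, or NULL at end of file
   (nothing read) or on allocation failure. */
char *scanf_line_in_pieces(FILE *f)
{
    char piece[64 + 1];        /* at most 64 characters per fscanf call */
    char *line = NULL;
    size_t len = 0;
    int rc;

    while ((rc = fscanf(f, "%64[^\n]", piece)) == 1) {
        size_t n = strlen(piece);
        char *tmp = realloc(line, len + n + 1);
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        memcpy(line + len, piece, n + 1);   /* also copies the '\0' */
        len += n;
    }
    if (rc == EOF && line == NULL)
        return NULL;           /* end of file before any character */
    if (rc == 0)
        fgetc(f);              /* next character is '\n': consume it */
    if (line == NULL)
        line = calloc(1, 1);   /* empty line: return "" */
    return line;
}
```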
I am trying to write a function that does the following things:
Start an input loop, printing '> ' each iteration.
Take whatever the user enters (unknown length) and read it into a character array, dynamically allocating the size of the array if necessary. The user-entered line will end at a newline character.
Add a null byte, '\0', to the end of the character array.
Loop terminates when the user enters a blank line: '\n'
This is what I've currently written:
void input_loop(){
char *str = NULL;
printf("> ");
while(printf("> ") && scanf("%a[^\n]%*c",&input) == 1){
/*Add null byte to the end of str*/
/*Do stuff to input, including traversing until the null byte is reached*/
free(str);
str = NULL;
}
free(str);
str = NULL;
}
Now, I'm not too sure how to go about adding the null byte to the end of the string. I was thinking something like this:
last_index = strlen(str);
str[last_index] = '\0';
But I'm not too sure if that would work though. I can't test if it would work because I'm encountering this error when I try to compile my code:
warning: ISO C does not support the 'a' scanf flag [-Wformat=]
So what can I do to make my code work?
EDIT: changing scanf("%a[^\n]%*c",&input) == 1 to scanf("%as[^\n]%*c",&input) == 1 gives me the same error.
First of all, scanf format strings are not regular expressions (the closest feature is the %[...] scan set), so I don't think something close to what you want will work. As for the warning you get: in C99, %a is a floating-point conversion, while the GNU C library, in pre-C99 modes, also accepts a as a nonstandard modifier that allocates the buffer for you (POSIX now spells it %m). GCC warns that ISO C does not support that 'a' flag, which suggests your compiler is configured for C90.
But then you have a bigger problem. Standard scanf expects you to pass it a previously allocated buffer to fill in with the read input. It does not malloc the string for you, so your attempts at initializing str to NULL and the corresponding frees will not work with standard scanf.
The simplest thing you can do is to give up on arbitrary-length strings. Create a large buffer and reject inputs that are longer than that.
You can then use the fgets function to populate your buffer. To check if it managed to read the full line, check if your string ends with a '\n'.
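#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void input_loop(void){
    char str[256+1];
    while(true){
        printf("> ");
        if(!fgets(str, sizeof str, stdin)){
            //error or end of file
            break;
        }
        size_t len = strlen(str);
        if(len > 0 && str[len - 1] != '\n' && !feof(stdin)){
            //no newline and not at end of file: user typed something too long
            exit(1);
        }
        printf("user typed %s", str);
    }
}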
Another alternative is to use a nonstandard library function. For example, POSIX systems such as Linux provide the getline function, which reads a full line of input using malloc behind the scenes.
No error checking; don't forget to free the pointer when you're done with it. If you use this code to read enormous lines, you deserve all the pain it will bring you.
#include <stdio.h>
#include <stdlib.h>
char *readInfiniteString() {
int l = 256;
char *buf = malloc(l);
int p = 0;
int ch; /* int, not char, so that EOF can be detected */
ch = getchar();
while(ch != '\n' && ch != EOF) {
buf[p++] = ch;
if (p == l) {
l += 256;
buf = realloc(buf, l);
}
ch = getchar();
}
buf[p] = '\0';
return buf;
}
int main(int argc, char *argv[]) {
printf("> ");
char *buf = readInfiniteString();
printf("%s\n", buf);
free(buf);
}
If you are on a POSIX system such as Linux, you should have access to getline. It can be made to behave like fgets, but if you start with a null pointer and a zero length, it will take care of memory allocation for you.
You can use it in a loop like this:
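#define _POSIX_C_SOURCE 200809L // request the POSIX getline() declaration
#include <stdlib.h>
#include <stdio.h>
#include <string.h> // for strcmp
#include <sys/types.h> // for ssize_t
int main(void)
{
char *line = NULL;
size_t nline = 0;
for (;;) {
ssize_t n; // getline returns -1 on end of file or error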
printf("> ");
// read line, allocating as necessary
n = getline(&line, &nline, stdin);
if (n < 0) break;
// remove trailing newline
if (n && line[n - 1] == '\n') line[n - 1] = '\0';
// do stuff
printf("'%s'\n", line);
if (strcmp("quit", line) == 0) break;
}
free(line);
printf("\nBye\n");
return 0;
}
The passed pointer and the length value must be consistent, so that getline can reallocate memory as required. (That means that you shouldn't change nline or the pointer line in the loop.) If the line fits, the same buffer is used in each pass through the loop, so that you have to free the line string only once, when you're done reading.
Some have mentioned that scanf is probably unsuitable for this purpose. I wouldn't suggest using fgets, either. Though it is slightly more suitable, there are problems that seem difficult to avoid, at least at first. Few C programmers manage to use fgets right the first time without reading the fgets manual in full. The parts most people manage to neglect entirely are:
what happens when the line is too large, and
what happens when EOF or an error is encountered.
The fgets() function shall read bytes from stream into the array pointed to by s, until n-1 bytes are read, or a <newline> is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte.
Upon successful completion, fgets() shall return s. If the stream is at end-of-file, the end-of-file indicator for the stream shall be set and fgets() shall return a null pointer. If a read error occurs, the error indicator for the stream shall be set, fgets() shall return a null pointer...
I don't feel I need to stress the importance of checking the return value too much, so I won't mention it again. Suffice to say, if your program doesn't check the return value your program won't know when EOF or an error occurs; your program will probably be caught in an infinite loop.
When no '\n' is present, the remaining bytes of the line have yet to be read. Note that fgets already scans the line once internally, looking for the '\n'; when you add your own check for a '\n' on top of that, you're parsing the data a second time.
This allows you to realloc the storage and call fgets again if you want to dynamically resize the storage, or discard the remainder of the line (warning the user of the truncation is a good idea), perhaps using something like fscanf(file, "%*[^\n]");.
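A sketch of the discard option, using a hypothetical helper named read_line_discard_excess(). When fgets returns without a '\n', one extra character is read with fgetc to tell an over-long line apart from one that merely filled the buffer exactly; only then is the rest skipped with fscanf(f, "%*[^\n]") and the newline consumed. The truncated flag lets the caller warn the user.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads a line into buf with fgets and removes the
   newline. If the line did not fit, the rest of it is discarded and
   *truncated is set to 1 so the caller can warn the user.
   Returns buf, or NULL at end of file or on a read error. */
char *read_line_discard_excess(char *buf, int size, FILE *f, int *truncated)
{
    size_t len;

    *truncated = 0;
    if (fgets(buf, size, f) == NULL)
        return NULL;
    len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';               /* whole line read: drop the newline */
    } else {
        int c = fgetc(f);              /* did the line end exactly at the limit? */
        if (c != '\n' && c != EOF) {
            *truncated = 1;
            (void)fscanf(f, "%*[^\n]"); /* skip the rest of the line */
            (void)fgetc(f);             /* and its newline, if present */
        }
    }
    return buf;
}
```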
hugomg mentioned using multiplication in the dynamic resize code to avoid quadratic runtime problems. Along this line, it would be a good idea to avoid parsing the same data over and over each iteration (thus introducing further quadratic runtime problems). This can be achieved by storing the number of bytes you've read (and parsed) somewhere. For example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *get_dynamic_line(FILE *f) {
    size_t bytes_read = 0;
    char *bytes = NULL, *temp;
    do {
        size_t alloc_size = bytes_read * 2 + 2; /* +2 so the buffer grows even when bytes_read is 0 */
        temp = realloc(bytes, alloc_size);
        if (temp == NULL) {
            free(bytes);
            return NULL;
        }
        bytes = temp;
        temp = fgets(bytes + bytes_read, alloc_size - bytes_read, f); /* Parsing data the first time */
        if (temp != NULL)
            bytes_read += strcspn(bytes + bytes_read, "\n"); /* Parsing data the second time */
    } while (temp && bytes[bytes_read] != '\n');
    bytes[bytes_read] = '\0';
    return bytes;
}
Those who do manage to read the manual and come up with something correct (like this) may soon realise the complexity of an fgets solution is at least twice as poor as the same solution using fgetc. We can avoid parsing data the second time by using fgetc, so using fgetc might seem most appropriate. Alas most C programmers also manage to use fgetc incorrectly when neglecting the fgetc manual.
The most important detail is to realise that fgetc returns an int, not a char. It typically returns one of 256 distinct values, between 0 and UCHAR_MAX (inclusive), or otherwise EOF, meaning there are typically 257 distinct values that fgetc (or, consequently, getchar) may return. Trying to store those values into a char or unsigned char results in loss of information, specifically the error modes. (Of course, this typical value of 257 will change if CHAR_BIT is greater than 8, and consequently UCHAR_MAX is greater than 255.)
#include <stdio.h>
#include <stdlib.h>
char *get_dynamic_line(FILE *f) {
size_t bytes_read = 0;
char *bytes = NULL;
do {
if ((bytes_read & (bytes_read + 1)) == 0) {
void *temp = realloc(bytes, bytes_read * 2 + 1);
if (temp == NULL) {
free(bytes);
return NULL;
}
bytes = temp;
}
int c = fgetc(f);
bytes[bytes_read] = c >= 0 && c != '\n'
? c
: '\0';
} while (bytes[bytes_read++]);
return bytes;
}
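To make the int-versus-char point concrete, here is a small sketch (the function name count_bytes is hypothetical) that counts the bytes remaining in a stream. Because c is an int, a 0xFF byte is counted like any other byte. Had c been declared char on a platform where char is signed, that byte would typically become -1, compare equal to EOF, and end the loop early.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes left in f. Keeping fgetc's result
   in an int lets every byte value 0..UCHAR_MAX stay distinct from EOF. */
long count_bytes(FILE *f)
{
    long count = 0;
    int c;

    while ((c = fgetc(f)) != EOF)
        count++;
    return count;
}
```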
K & R section 1.9 code for saving the longest line of an input has the function:
int getline(char s[], int lim)
{
int c, i;
for(i = 0; i < lim -1 && (c =getchar()) != EOF && c != '\n'; ++i)
s[i] = c;
if (c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
Yet I've learned that, as a best practice, a function should do only one thing. I believe this function copies the line from its input into the char array s AND returns the length. Is this not considered two things? Would I be correct in assuming that this is bad practice?
To elaborate, we do use the input from the getline function, but in a very non-intuitive way.
main()
{
int len; /*current line length*/
int max; /*Current max line length seen so far*/
char line[MAXLINE]; /*Current input line */
char longest[MAXLINE]; /*Longest line saved here*/
max = 0;
while ((len = getline(line, MAXLINE)) > 0)
if (len > max) {
max = len;
copy(longest, line);
}
if (max > 0) /* there was a line */
printf("%s", longest);
return 0;
}
/*FUNCTION GETLINE TAKEN OUT */
/*copy: copy 'from' into 'to'; assume to is big enough */
void copy(char to[], char from[])
{
int i;
i = 0;
while ((to[i] = from[i]) != '\0')
++i;
}
No, for two reasons.
First, one could argue that the purpose of getline (as the name suggests) is to read a line from input. The fact that it also returns the number of characters read could be explained by the way C strings work: without the length, the function could not be used to read data containing null bytes.
Second, the function does not contain any additional code to calculate the length. It is a byproduct of reading the string. The function would otherwise be of type void, so there are really no drawbacks to returning the length of the string.
Also, coding guidelines are not an end in themselves but should help produce good code. I do not see how this code could possibly improve by omitting the return statement and writing a separate O(n) function to retrieve the length.
The getline() function is not doing two separate things. It's doing two very closely related things, and it's definitely not bad practice to have it return the string's length.
Do not try to fit functions, algorithms and other similar things into schemas blindly. What getline() does is conceptually correct, since a string is an object. It has a contents buffer and a length. Both "properties" belong to the string object, and in fact, I would consider it bad practice to separate them.
Also, it would be unnecessarily complicated (and inefficient) to have yet another function that computes the string length. In C, strings are 0-terminated and thus such a function has to walk the entire string in order to find its length.
(Not to mention that there already is such a function in the C standard library, it's called strlen().)
Often for strings, and in general arrays, this is not an exception but actually very useful behavior. In a sense, arrays have an implicit length property. Sometimes this is known at compile time, sometimes it is known at runtime and stored in a variable and sometimes it can only be determined by virtue of being a null-terminated array.
In any case, since one cannot return an array by value, returning the length of the array (which is often very handy to know) is a very useful property of functions that write an array into a buffer. I might even argue it is idiomatic in C when the written array's length cannot be known by the caller.
ADDITION:
The above answer is with respect to functions that do not allocate memory but only write into a provided buffer. It's sometimes useful to return a simple struct such as struct { size_t size; valtype* vals; } if knowing the allocated array size is always useful to the caller and you don't want to iterate over the array later. Drawing the parallel with your question, you can see why, in a way, it isn't really doing two things; it's just giving you a more complete result.
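A sketch of the struct idea, taking valtype to be int (an assumption) and using a hypothetical parse_ints() that reads whitespace-separated integers with strtol. The caller cannot know in advance how many numbers there are, so the count travels with the array instead of being recomputed.

```c
#include <stdlib.h>

struct int_array {
    size_t size;  /* number of elements stored in vals */
    int *vals;    /* malloc'd; the caller frees it */
};

/* Hypothetical helper: parses whitespace-separated decimal integers from s.
   Stops at the first token that is not a number. On allocation failure
   the result is {0, NULL}. Assumes each value fits in an int. */
struct int_array parse_ints(const char *s)
{
    struct int_array result = { 0, NULL };
    size_t cap = 0;
    char *end;

    for (;;) {
        long v = strtol(s, &end, 10);  /* skips leading whitespace */
        if (end == s)
            break;                     /* no more numbers */
        if (result.size == cap) {
            size_t new_cap = cap ? cap * 2 : 4;
            int *tmp = realloc(result.vals, new_cap * sizeof *tmp);
            if (tmp == NULL) {
                free(result.vals);
                result.vals = NULL;
                result.size = 0;
                return result;
            }
            result.vals = tmp;
            cap = new_cap;
        }
        result.vals[result.size++] = (int)v;
        s = end;
    }
    return result;
}
```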
char sentence2[10];
strncpy(sentence2, second, sizeof(sentence2)); //shouldn't I specify the sizeof(source) instead of sizeof(destination)?
sentence2[10] = '\0'; //Is this okay since strncpy does not provide the null character.
puts(sentence2);
//////////////////////////////////////////////////////////////
char *pointer = first;
for(int i =0; i < 500; i++) //Why does it crashes without this meaningless loop?!
{
printf("%c", *pointer);
if(*pointer == '\n')
putchar('\n');
pointer++;
}
So here's the problem. When I run the first part of this code, the program crashes.
However, when I add the for loop that just prints garbage values in memory locations, it does not crash but still won't strcpy properly.
Second, when using strncpy, shouldn't I specify the sizeof(source) instead of sizeof(destination), since I'm moving the bytes of the source?
Third, it makes sense to me to add the null-terminating character after strncpy, since I've read that it doesn't add the null character on its own, but my Pelles C IDE warns that it's a possible out-of-bounds store.
Fourth, and most importantly, why doesn't the simple strcpy work?!
////////////////////////////////////////////////////////////////////////////////////
UPDATE:
#include <stdio.h>
#include <string.h>
void main3(void)
{
puts("\n\n-----main3 reporting for duty!------\n");
char *first = "Metal Gear";
char *second = "Suikoden";
printf("strcmp(first, first) = %d\n", strcmp(first, first)); //returns 0 when both strings are identical.
printf("strcmp(first, second) = %d\n", strcmp(first, second)); //returns a negative when the first different char is less in first string. (M=77 S=83)
printf("strcmp(second, first) = %d\n", strcmp(second, first)); //returns a positive when the first different char is greater in first string.(M=77 S=83)
char sentence1[10];
strcpy(sentence1, first);
puts(sentence1);
char sentence2[10];
strncpy(sentence2, second, 10); //shouldn't I specify the sizeof(source) instead of sizeof(destination).
sentence2[9] = '\0'; //Is this okay since strncpy does not provide the null character.
puts(sentence2);
char *pointer = first;
for(int i =0; i < 500; i++) //Why does it crashes without this nonsensical loop?!
{
printf("%c", *pointer);
if(*pointer == '\n')
putchar('\n');
pointer++;
}
}
This is how I teach myself to program. I write code and comment all I know about it so that the next time I need to look up something, I just look at my own code in my files. In this one, I'm trying to learn the string library in C.
char *first = "Metal Gear";
char sentence1[10];
strcpy(sentence1, first);
This doesn't work because first has 11 characters: the ten in the string, plus the null terminator. So you would need char sentence1[11]; or more.
strncpy(sentence2, second, sizeof(sentence2));
//shouldn't I specify the sizeof(source) instead of sizeof(destination)?
No. The third argument to strncpy is supposed to be the size of the destination. The strncpy function will always write exactly that many bytes.
If you want to use strncpy you must also put a null terminator on (and there must be enough space for that terminator), unless you are sure that strlen(second) < sizeof sentence2.
Generally speaking, strncpy is almost never a good idea. If you want to put a null-terminated string into a buffer that might be too small, use snprintf.
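A minimal sketch of the snprintf suggestion; the helper name copy_string is hypothetical, and snprintf itself requires C99. snprintf always writes a terminating '\0' when the size is nonzero and returns the length it would have written, so comparing that against the buffer size reports truncation.

```c
#include <stdio.h>

/* Hypothetical helper: copies src into dst (dst_size bytes, dst_size > 0),
   always null-terminating. Returns 1 if src had to be truncated, else 0. */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed < 0 || (size_t)needed >= dst_size;
}
```

Copying "Metal Gear" into a char[10] this way yields "Metal Gea" plus a truncation report, instead of writing past the end of the array.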
This is how I teach myself to program.
Learning C by trial and error is not good. The problem is that if you write bad code, you may never know. It might appear to work, and then fail later on. For example, whether your strcpy steps on any other variable's toes depends on what lies in memory after sentence1.
Learning from a book is by far and away the best idea. K&R 2 is a decent starting place if you don't have any other.
If you don't have a book, do look up online documentation for standard functions anyway. You could have learnt all this about strcpy and strncpy by reading their man pages, or their definitions in a C standard draft, etc.
Your problems start from here:
char sentence1[10];
strcpy(sentence1, first);
The number of characters in first, excluding the terminating null character, is 10. The space allocated for sentence1 has to be at least 11 for the program to behave in a predictable way. Since you have already used memory that you are not supposed to use, expecting anything to behave after that is not right.
You can fix this problem by changing
char sentence1[10];
to
char sentence1[N]; // where N > 10.
But then, you have to ask yourself. What are you trying to accomplish by allocating memory on the stack that's on the edge of being wrong? Are you trying to learn how things behave at the boundary of being wrong/right? If the answer to the second question is yes, hopefully you learned from it. If not, I hope you learned how to allocate adequate memory.
This is an out-of-bounds array write; the valid indices are only 0-9:
sentence2[10] = '\0';
it should be
sentence2[9] = '\0';
Second, you're protecting the destination from buffer overflow, so specifying its size is appropriate.
EDIT:
Lastly, there is this amazingly bad piece of code, which really isn't worth mentioning and is relevant to neither strcpy() nor strncpy(), yet seems to have earned me the disfavor of #nonsensicke, who seems to write very verbose and thoughtful posts. It has the following problems:
char *pointer = first;
for(int i =0; i < 500; i++)
{
printf("%c", *pointer);
if(*pointer == '\n')
putchar('\n');
pointer++;
}
Your use of int i=0 in the for loop is C99 specific. Depending on your compiler and compiler arguments, it can result in a compilation error.
for(int i =0; i < 500; i++)
better
int i = 0;
...
for(i=0;i<500;i++)
You neglect to check the return code of printf or indicate that you are deliberately ignoring it. I/O can fail after all...
printf("%c", *pointer);
better
int n = 0;
...
n = printf("%c", *pointer);
if(n!=1) { /* error! */ }
or
(void) printf("%c", *pointer);
Some folks will get on your case for not using {} with your if statements:
if(*pointer == '\n') putchar('\n');
better
if(*pointer == '\n') {
putchar('\n');
}
but wait there's more... you didn't check the return code of putchar()... dang
better
int c = 0; /* putchar returns an int: the character written, or EOF */
...
if(*pointer == '\n') {
c = putchar('\n');
if(c == EOF) { /* error */ }
}
and lastly, with this nasty little loop you're basically romping through memory like a Kiwi in a Tulip field and lucky if you hit a newline. Depending on the OS (if you even have an OS), you might actually encounter some type of fault, e.g. outside your process space, maybe outside addressable RAM, etc. There's just not enough info provided to say actually, but it could happen.
My recommendation, beyond the absurdity of actually performing some type of detailed analysis on the rest of that code, would be to just remove it altogether.
Cheers!
I've been writing a fairly easy program that converts a string of characters (assuming numbers are entered) to an integer.
After I was done, I noticed some very peculiar "bugs" that I can't answer, mostly because of my limited knowledge of how the scanf(), gets() and fgets() functions work. (I did read a lot of literature though.)
So without writing too much text, here's the code of the program:
#include <stdio.h>
#define MAX 100
int CharToInt(const char *);
int main()
{
char str[MAX];
printf(" Enter some numbers (no spaces): ");
gets(str);
// fgets(str, sizeof(str), stdin);
// scanf("%s", str);
printf(" Entered number is: %d\n", CharToInt(str));
return 0;
}
int CharToInt(const char *s)
{
int i, result, temp;
result = 0;
i = 0;
while(*(s+i) != '\0')
{
temp = *(s+i) & 15;
result = (temp + result) * 10;
i++;
}
return result / 10;
}
So here's the problem I've been having. First, when using gets() function, the program works perfectly.
Second, when using fgets(), the result is slightly wrong because apparently fgets() function reads newline (ASCII value 10) character last which screws up the result.
Third, when using scanf() function, the result is completely wrong because first character apparently has a -52 ASCII value. For this, I have no explanation.
Now I know that gets() is discouraged to use, so I would like to know if I can use fgets() here so it doesn't read (or ignores) newline character.
Also, what's the deal with the scanf() function in this program?
Never use gets. It offers no protections against a buffer overflow vulnerability (that is, you cannot tell it how big the buffer you pass to it is, so it cannot prevent a user from entering a line larger than the buffer and clobbering memory).
Avoid using scanf. If not used carefully, it can have the same buffer overflow problems as gets. Even ignoring that, it has other problems that make it hard to use correctly.
Generally you should use fgets instead, although it's sometimes inconvenient (you have to strip the newline, you must determine a buffer size ahead of time, and then you must figure out what to do with lines that are too long: do you keep the part you read and discard the excess, discard the whole thing, dynamically grow the buffer and try again, etc.). There are some non-standard functions available that do this dynamic allocation for you (e.g. getline on POSIX systems, Chuck Falconer's public domain ggets function). Note that ggets has gets-like semantics in that it strips a trailing newline for you.
Yes, you want to avoid gets. fgets will always read the new-line if the buffer was big enough to hold it (which lets you know when the buffer was too small and there's more of the line waiting to be read). If you want something like fgets that won't read the new-line (losing that indication of a too-small buffer) you can use fscanf with a scan-set conversion like: "%N[^\n]", where the 'N' is replaced by the buffer size - 1.
One easy (if strange) way to remove the trailing new-line from a buffer after reading with fgets is: strtok(buffer, "\n"); This isn't how strtok is intended to be used, but I've used it this way more often than in the intended fashion (which I generally avoid).
There are numerous problems with this code. We'll fix the badly named variables and functions and investigate the problems:
First, CharToInt() should be renamed to the proper StringToInt(), since it operates on a string, not a single character.
The function CharToInt() [sic.] is unsafe. It doesn't check if the user accidentally passes in a NULL pointer.
It doesn't validate input, or more correctly, skip invalid input. If the user enters a non-digit, the result will contain a bogus value. For example, if you enter N, the code *(s+i) & 15 will produce 14!?
Next, the nondescript temp in CharToInt() [sic.] should be called digit, since that is what it really is.
Also, the kludge return result / 10; is just that: a bad hack to work around a buggy implementation.
Likewise, MAX is badly named, since it may appear to conflict with the common function-like macro of that name, e.g. #define MAX(x, y) ((x) > (y) ? (x) : (y))
The verbose *(s+i) is not as readable as simply *s. There is no need to use and clutter up the code with yet another temporary index i.
gets()
This is bad because it can overflow the input string buffer. For example, if the buffer size is 2, and you enter in 16 characters, you will overflow str.
scanf()
This is equally bad because it can overflow the input string buffer.
You mention "when using scanf() function, the result is completely wrong because first character apparently has a -52 ASCII value."
That is due to an incorrect usage of scanf(). I was not able to duplicate this bug.
fgets()
This is safe because you can guarantee you never overflow the input string buffer by passing in the buffer size (which includes room for the null terminator).
getline()
A few people have suggested POSIX getline() as a replacement. Unfortunately, this is not a practical portable solution, as Microsoft does not implement a C version; only the standard C++ string template function, as this SO #27755191 question answers. Microsoft's C++ getline() was available at least as far back as Visual Studio 6, but since the OP is strictly asking about C and not C++, this isn't an option.
Misc.
Lastly, this implementation is buggy in that it doesn't detect integer overflow. If the user enters too large a number the number may become negative! i.e. 9876543210 will become -18815698?! Let's fix that too.
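This is simple to detect for an unsigned int in most cases: if the current partial number is less than the previous partial number, then we have overflowed, and we return the previous partial number. (Not every wrap shows up this way: with a 32-bit unsigned int, 644245094 * 10 wraps to 2147483644, which is still larger than 644245094.)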
For a signed int this matters even more, because signed overflow is undefined behavior: the check has to happen before the multiplication, not after. In assembly we could inspect the overflow flag, but C has no standard built-in way to detect overflow in signed int math. Since we are multiplying by a constant, * 10, one idea is to use an equivalent equation:
n = x*10 = x*8 + x*2
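If x*8 overflows, then x*10 will as well. Testing whether any of the top 3 MSBs (Most Significant Bits) is set, i.e. x >= 0x20000000 for a 32-bit int, detects when x*8 no longer fits in 32 bits, without assuming how many bits an int has. A second test then checks whether the sign bit is set after the digit is added.
Together these tests still miss cases, because the implication only goes one way: x*10 can overflow even when x*8 does not. For example, with a 32-bit int, 5000000000 leaves a partial number of 500000000, which is below 0x20000000, yet 500000000 * 10 exceeds INT_MAX; on a typical wrapping machine the result is 705032704, which is positive and larger than the previous value, so neither test fires. The same pre-check as for unsigned, with INT_MAX from <limits.h>, covers every case: if result > (INT_MAX - digit) / 10, stop before multiplying.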
Code
Here is a fixed, safe version, along with code you can play with to see overflow in the unsafe versions. I've included both signed and unsigned versions, selected via #define SIGNED 1 (set it to 0 for the unsigned one).
#include <stdio.h>
#include <ctype.h>  // isdigit()
#include <limits.h> // INT_MAX, UINT_MAX
// 1 fgets
// 2 gets (removed in C11; newer compilers may reject it)
// 3 scanf
#define INPUT 1
#define SIGNED 1
// re-implementation of atoi() that stops before overflowing
// Test Case: 2147483647 -- valid 32-bit
// Test Case: 2147483648 -- overflow 32-bit (returns 214748364)
int StringToInt( const char * s )
{
    int result = 0, digit;
    if( !s )
        return result;
    while( *s )
    {
        if( isdigit( (unsigned char)*s ) ) // Alt.: if ((*s >= '0') && (*s <= '9'))
        {
            digit = *s++ - '0'; // '0'..'9' are guaranteed to be contiguous
            if( result > (INT_MAX - digit) / 10 ) // would result*10 + digit overflow?
                return result; // yes: return the previous partial number
            result = result * 10 + digit;
        }
        else
            break; // you decide SKIP or BREAK on invalid digits
    }
    return result;
}
// Test case: 4294967295 -- valid 32-bit
// Test case: 4294967296 -- overflow 32-bit (returns 429496729)
unsigned int StringToUnsignedInt( const char * s )
{
    unsigned int result = 0, digit;
    if( !s )
        return result;
    while( *s )
    {
        if( isdigit( (unsigned char)*s ) ) // Alt.: if (*s >= '0' && *s <= '9')
        {
            digit = (unsigned int)(*s++ - '0');
            if( result > (UINT_MAX - digit) / 10 ) // would result*10 + digit overflow?
                return result; // yes: return the previous partial number
            result = result * 10 + digit;
        }
        else
            break; // you decide SKIP or BREAK on invalid digits
    }
    return result;
}
int main( void )
{
    int detect_buffer_overrun = 0; // crude canary: an overflow of str *may* clobber it (not reliable: depends on stack layout and optimization)
#define BUFFER_SIZE 2 // set to small size to easily test overflow
    char str[ BUFFER_SIZE+1 ] = ""; // C idiom is to reserve space for the '\0' terminator
    printf(" Enter a number (no spaces): ");
#if INPUT == 1
    if( fgets(str, sizeof(str), stdin) == NULL ) // cannot overflow
        return 1;
#elif INPUT == 2
    gets(str); // can overflow
#elif INPUT == 3
    scanf("%s", str); // can also overflow ("%2s" would not)
#endif
#if SIGNED
    printf(" Entered number is: %d\n", StringToInt(str));
#else
    printf(" Entered number is: %u\n", StringToUnsignedInt(str) );
#endif
    if( detect_buffer_overrun )
        printf( "Input buffer overflow!\n" );
    return 0;
}
You're correct that you should never use gets. If you want to use fgets, you can simply overwrite the newline.
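char *result = fgets(str, sizeof(str), stdin);
if(result != NULL)
{
    size_t len = strlen(str); // needs <string.h>
    if(len > 0 && str[len - 1] == '\n')
    {
        str[len - 1] = '\0';
    }
    // else: the line didn't fit in str, or input ended without a newline
}
else
{
    // handle error or end of file
}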
This does assume there are no embedded NUL characters. Another option is POSIX getline:
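char *line = NULL;
size_t len = 0;
ssize_t count = getline(&line, &len, stdin);
if(count == -1)
{
    // Handle error or end of file
}
else if(count >= 1 && line[count - 1] == '\n')
{
    line[count - 1] = '\0';
}
// ... use line ...
free(line); // getline() allocated it; needs <stdlib.h>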
The advantage of getline is that it does allocation and reallocation for you, it handles possible embedded NUL characters, and it returns the count so you don't have to waste time with strlen. Note that you can't pass an array to getline: the pointer must be NULL or point to memory obtained from malloc() (with its size in the second argument), and you must free() it when you are done.
I'm not sure what issue you're having with scanf.
Never use gets(); it can lead to unpredictable overflows. If your string array has size 1000 and I enter 1000 or more characters, I can overflow the buffer in your program, since one byte is needed for the '\0' terminator.
Try using fgets() with this modified version of your CharToInt():
#include <ctype.h> // isdigit()
int CharToInt(const char *s)
{
int i, result, temp;
result = 0;
i = 0;
while(*(s+i) != '\0')
{
if (isdigit((unsigned char)*(s+i)))
{
temp = *(s+i) & 15;
result = (temp + result) * 10;
}
i++;
}
return result / 10;
}
It essentially validates the input digits and ignores anything else. This is very crude so modify it and salt to taste.
So I am not much of a programmer, but let me try to answer your question about scanf(). I think scanf() is pretty fine and I use it for most things without any issues, but the structure you have isn't quite right. It should be:
char str[MAX];
printf("Enter some text: ");
scanf("%s", str);
The second argument tells scanf() where (in which variable) to save the scanned value. For a scalar such as an int you need the "&" (scanf("%d", &n)), but an array name like str already converts to a pointer to its first element, so no "&" is needed; &str has the wrong type for %s. To stay safe, add a field width one less than MAX, e.g. %99s if MAX is 100.
Don't use fflush(stdin): the C standard only defines fflush() for output streams, so on stdin it is undefined behavior. Some implementations define it to discard pending input, but you can't rely on that, and it does nothing to prevent a buffer overflow; only the field width does that.
And the difference between gets/scanf and fgets: scanf("%s") only scans until the first whitespace character, while gets() and fgets() read the whole line. fgets() also keeps the newline and never writes more than the size you pass it.