Issues with Printf() and fgets() - c

I am working on creating a project that has a custom header file, and ran into some problems:
I'm not exactly sure what happened to this code, but after I started using the fgets() function, it seems to have stopped the output of some printf()'s in my code. Here are the important parts of the code, probably missing some #include statements here:
main.cpp
#include "ichat.h"
int main() {
prompts(P_IDLE, response);
prompts(P_USERNAME, response);
}
ichat.cpp
#include "ichat.h"
int prompts(int p_tag, char *response) {
if (p_tag == P_IDLE) {
printf("\nWaiting for connections. Press [ENTER] to start a chat.");
char msg[50];
char *ptr;
while (true) {
fgets(msg, strlen(msg), stdin);
ptr = (char*)memchr(msg, '\n', strlen(msg));
if (ptr != NULL) {
break;
}
} else if (p_tag == P_USERNAME) {
printf("\nPlease enter a username you'd like to use: ");
....
} ...
some of ichat.h, incase you were curious...
#define P_IDLE 0
#define P_USERNAME 1
compiled with gcc main.cpp ichat.cpp.
The issue is that when adding in the while loop with the fgets() function, none of the printf() functions produce output in the command line. I'm not exactly sure what is happening here, because those print functions were working, and sometimes they work if I remove the \n, or if I add another printf() function right before them. It's pretty confusing...
Is there a better way to get input from the user? I'm open to any hints, critiques, and ideas about any part of my code.

The nub of your problem is
char msg[50];
/* no initialisation of data in msg */
fgets(msg, strlen(msg), stdin);
msg is uninitialised, so the values of any of the characters in it are indeterminate - they are not guaranteed to be zero. Formally, even accessing their values (which strlen() does to search for a '\0') gives undefined behaviour.
What you need to do is change strlen(msg) to sizeof msg (which will consistently give the value of 50).
fgets() will modify msg based on input. Then you can use strlen() on the result.
Also: try not to use (char *) type conversions on the result of functions like memchr() that return void *. It is only needed in C if you have forgotten to #include <string.h>. Including the right header is the better solution than the type conversion. If your C compiler is actually a C++ compiler, the conversion will be needed, but C++ provides alternative approaches that are often preferable (more type safe, etc).

Related

How to read an input given by a driver function

I am working on a project that will have a driver program redirect its output into the standard input of my program, how would I be able to scan what this program is feeding into my program and have my program respond accordingly. I was thinking of using scanf, would that work?
Additional info:
In the first line of the input (out of many lines), the driver gives a number ending in a new line character (\n). Depending on that number, my program will parse the rest of the lines in the input and output a response. Each line will be a string of random letters and my program will need to dynamically allocate memory for each string. These strings will be part of a struct in a linked list.
You can treat the input just like regular console input, all the console stdio input calls will work. Just dont try to have a conversation with it - ie dont go
Enter foodle count : dd
Invalid number
Enter Foodle count :
becuase there is nobody on the other end
It is generally recommended to use a more robust method than scanf, as it is full of edge cases, and offers little in the way of recovering from bad input.
A beginners' guide away from scanf() is a decent read, although I would offer stricter advice: forget scanf exists.
Given the requirement of
Each line will be a string of random letters and my program will need to dynamically allocate memory for each string. These strings will be part of a struct in a linked list.
POSIX getline (or getdelim) is a solid choice, if available. This function handles reading input from a file, and will dynamically allocate memory if requested.
Here is an example skeleton program, with functionality vaguely similar to what you have described.
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
char *input(FILE *file) {
char *line = NULL;
size_t n;
if (-1 == getline(&line, &n, file))
return NULL;
return line;
}
size_t parse_header(char *header) {
return 42;
}
void use_data(char *data) {
free(data);
}
int main(void) {
char *header = input(stdin);
char *data;
if (!header)
return EXIT_FAILURE;
size_t lines_expected = parse_header(header);
size_t lines_read = 0;
free(header);
while ((data = input(stdin))) {
lines_read++;
use_data(data);
}
if (lines_read != lines_expected)
fprintf(stderr, "Mismatched lines.\n");
}
If POSIX is not available on your system, or you just want to explore alternatives, one of the better methods the standard library offers for reading input is fgets. Combined with malloc, strlen, and strcpy, you have will have roughly implemented getline, but do note the caveats in the provided fgets manual.
While scanf is a poor choice for both reading and parsing input, sscanf remains useful for parsing strings, as you have greater control over the state of your data.
The strtol/strtod family of functions are usually preferred over atoi style functions for parsing numbers, though they can be difficult to use properly.

Why do I keep getting a 'fgets' function error in my C program and how do I fix it? [duplicate]

When I try to compile C code that uses the gets() function with GCC, I get this warning:
(.text+0x34): warning: the `gets' function is dangerous and should not be used.
I remember this has something to do with stack protection and security, but I'm not sure exactly why.
How can I remove this warning and why is there such a warning about using gets()?
If gets() is so dangerous then why can't we remove it?
Why is gets() dangerous
The first internet worm (the Morris Internet Worm) escaped about 30 years ago (1988-11-02), and it used gets() and a buffer overflow as one of its methods of propagating from system to system. The basic problem is that the function doesn't know how big the buffer is, so it continues reading until it finds a newline or encounters EOF, and may overflow the bounds of the buffer it was given.
You should forget you ever heard that gets() existed.
The C11 standard ISO/IEC 9899:2011 eliminated gets() as a standard function, which is A Good Thing™ (it was formally marked as 'obsolescent' and 'deprecated' in ISO/IEC 9899:1999/Cor.3:2007 — Technical Corrigendum 3 for C99, and then removed in C11). Sadly, it will remain in libraries for many years (meaning 'decades') for reasons of backwards compatibility. If it were up to me, the implementation of gets() would become:
char *gets(char *buffer)
{
assert(buffer != 0);
abort();
return 0;
}
Given that your code will crash anyway, sooner or later, it is better to head the trouble off sooner rather than later. I'd be prepared to add an error message:
fputs("obsolete and dangerous function gets() called\n", stderr);
Modern versions of the Linux compilation system generates warnings if you link gets() — and also for some other functions that also have security problems (mktemp(), …).
Alternatives to gets()
fgets()
As everyone else said, the canonical alternative to gets() is fgets() specifying stdin as the file stream.
char buffer[BUFSIZ];
while (fgets(buffer, sizeof(buffer), stdin) != 0)
{
...process line of data...
}
What no-one else yet mentioned is that gets() does not include the newline but fgets() does. So, you might need to use a wrapper around fgets() that deletes the newline:
char *fgets_wrapper(char *buffer, size_t buflen, FILE *fp)
{
if (fgets(buffer, buflen, fp) != 0)
{
size_t len = strlen(buffer);
if (len > 0 && buffer[len-1] == '\n')
buffer[len-1] = '\0';
return buffer;
}
return 0;
}
Or, better:
char *fgets_wrapper(char *buffer, size_t buflen, FILE *fp)
{
if (fgets(buffer, buflen, fp) != 0)
{
buffer[strcspn(buffer, "\n")] = '\0';
return buffer;
}
return 0;
}
Also, as caf points out in a comment and paxdiablo shows in their answer, with fgets() you might have data left over on a line. My wrapper code leaves that data to be read next time; you can readily modify it to gobble the rest of the line of data if you prefer:
if (len > 0 && buffer[len-1] == '\n')
buffer[len-1] = '\0';
else
{
int ch;
while ((ch = getc(fp)) != EOF && ch != '\n')
;
}
The residual problem is how to report the three different result states — EOF or error, line read and not truncated, and partial line read but data was truncated.
This problem doesn't arise with gets() because it doesn't know where your buffer ends and merrily tramples beyond the end, wreaking havoc on your beautifully tended memory layout, often messing up the return stack (a Stack Overflow) if the buffer is allocated on the stack, or trampling over the control information if the buffer is dynamically allocated, or copying data over other precious global (or module) variables if the buffer is statically allocated. None of these is a good idea — they epitomize the phrase 'undefined behaviour`.
There is also the TR 24731-1 (Technical Report from the C Standard Committee) which provides safer alternatives to a variety of functions, including gets():
§6.5.4.1 The gets_s function
###Synopsis
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
char *gets_s(char *s, rsize_t n);
Runtime-constraints
s shall not be a null pointer. n shall neither be equal to zero nor be greater than RSIZE_MAX. A new-line character, end-of-file, or read error shall occur within reading n-1 characters from stdin.25)
3 If there is a runtime-constraint violation, s[0] is set to the null character, and characters are read and discarded from stdin until a new-line character is read, or end-of-file or a read error occurs.
Description
4 The gets_s function reads at most one less than the number of characters specified by n from the stream pointed to by stdin, into the array pointed to by s. No additional characters are read after a new-line character (which is discarded) or after end-of-file. The discarded new-line character does not count towards number of characters read. A null character is written immediately after the last character read into the array.
5 If end-of-file is encountered and no characters have been read into the array, or if a read error occurs during the operation, then s[0] is set to the null character, and the other elements of s take unspecified values.
Recommended practice
6 The fgets function allows properly-written programs to safely process input lines too long to store in the result array. In general this requires that callers of fgets pay attention to the presence or absence of a new-line character in the result array. Consider using fgets (along with any needed processing based on new-line characters) instead of gets_s.
25) The gets_s function, unlike gets, makes it a runtime-constraint violation for a line of input to overflow the buffer to store it. Unlike fgets, gets_s maintains a one-to-one relationship between input lines and successful calls to gets_s. Programs that use gets expect such a relationship.
The Microsoft Visual Studio compilers implement an approximation to the TR 24731-1 standard, but there are differences between the signatures implemented by Microsoft and those in the TR.
The C11 standard, ISO/IEC 9899-2011, includes TR24731 in Annex K as an optional part of the library. Unfortunately, it is seldom implemented on Unix-like systems.
getline() — POSIX
POSIX 2008 also provides a safe alternative to gets() called getline(). It allocates space for the line dynamically, so you end up needing to free it. It removes the limitation on line length, therefore. It also returns the length of the data that was read, or -1 (and not EOF!), which means that null bytes in the input can be handled reliably. There is also a 'choose your own single-character delimiter' variation called getdelim(); this can be useful if you are dealing with the output from find -print0 where the ends of the file names are marked with an ASCII NUL '\0' character, for example.
In order to use gets safely, you have to know exactly how many characters you will be reading, so that you can make your buffer large enough. You will only know that if you know exactly what data you will be reading.
Instead of using gets, you want to use fgets, which has the signature
char* fgets(char *string, int length, FILE * stream);
(fgets, if it reads an entire line, will leave the '\n' in the string; you'll have to deal with that.)
gets remained an official part of the language up to the 1999 ISO C standard, but it was officially removed in the 2011 standard. Most C implementations still support it, but at least gcc issues a warning for any code that uses it.
Because gets doesn't do any kind of check while getting bytes from stdin and putting them somewhere. A simple example:
char array1[] = "12345";
char array2[] = "67890";
gets(array1);
Now, first of all you are allowed to input how many characters you want, gets won't care about it. Secondly the bytes over the size of the array in which you put them (in this case array1) will overwrite whatever they find in memory because gets will write them. In the previous example this means that if you input "abcdefghijklmnopqrts" maybe, unpredictably, it will overwrite also array2 or whatever.
The function is unsafe because it assumes consistent input. NEVER USE IT!
You should not use gets since it has no way to stop a buffer overflow. If the user types in more data than can fit in your buffer, you will most likely end up with corruption or worse.
In fact, ISO have actually taken the step of removing gets from the C standard (as of C11, though it was deprecated in C99) which, given how highly they rate backward compatibility, should be an indication of how bad that function was.
The correct thing to do is to use the fgets function with the stdin file handle since you can limit the characters read from the user.
But this also has its problems such as:
extra characters entered by the user will be picked up the next time around.
there's no quick notification that the user entered too much data.
To that end, almost every C coder at some point in their career will write a more useful wrapper around fgets as well. Here's mine:
#include <stdio.h>
#include <string.h>
#define OK 0
#define NO_INPUT 1
#define TOO_LONG 2
static int getLine (char *prmpt, char *buff, size_t sz) {
int ch, extra;
// Get line with buffer overrun protection.
if (prmpt != NULL) {
printf ("%s", prmpt);
fflush (stdout);
}
if (fgets (buff, sz, stdin) == NULL)
return NO_INPUT;
// If it was too long, there'll be no newline. In that case, we flush
// to end of line so that excess doesn't affect the next call.
if (buff[strlen(buff)-1] != '\n') {
extra = 0;
while (((ch = getchar()) != '\n') && (ch != EOF))
extra = 1;
return (extra == 1) ? TOO_LONG : OK;
}
// Otherwise remove newline and give string back to caller.
buff[strlen(buff)-1] = '\0';
return OK;
}
with some test code:
// Test program for getLine().
int main (void) {
int rc;
char buff[10];
rc = getLine ("Enter string> ", buff, sizeof(buff));
if (rc == NO_INPUT) {
printf ("No input\n");
return 1;
}
if (rc == TOO_LONG) {
printf ("Input too long\n");
return 1;
}
printf ("OK [%s]\n", buff);
return 0;
}
It provides the same protections as fgets in that it prevents buffer overflows but it also notifies the caller as to what happened and clears out the excess characters so that they do not affect your next input operation.
Feel free to use it as you wish, I hereby release it under the "do what you damn well want to" licence :-)
fgets.
To read from the stdin:
char string[512];
fgets(string, sizeof(string), stdin); /* no buffer overflows here, you're safe! */
You can't remove API functions without breaking the API. If you would, many applications would no longer compile or run at all.
This is the reason that one reference gives:
Reading a line that overflows the
array pointed to by s results in
undefined behavior. The use of fgets()
is recommended.
I read recently, in a USENET post to comp.lang.c, that gets() is getting removed from the Standard. WOOHOO
You'll be happy to know that the
committee just voted (unanimously, as
it turns out) to remove gets() from
the draft as well.
In C11(ISO/IEC 9899:201x), gets() has been removed. (It's deprecated in ISO/IEC 9899:1999/Cor.3:2007(E))
In addition to fgets(), C11 introduces a new safe alternative gets_s():
C11 K.3.5.4.1 The gets_s function
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
char *gets_s(char *s, rsize_t n);
However, in the Recommended practice section, fgets() is still preferred.
The fgets function allows properly-written programs to safely process input lines too
long to store in the result array. In general this requires that callers of fgets pay
attention to the presence or absence of a new-line character in the result array. Consider
using fgets (along with any needed processing based on new-line characters) instead of
gets_s.
gets() is dangerous because it is possible for the user to crash the program by typing too much into the prompt. It can't detect the end of available memory, so if you allocate an amount of memory too small for the purpose, it can cause a seg fault and crash. Sometimes it seems very unlikely that a user will type 1000 letters into a prompt meant for a person's name, but as programmers, we need to make our programs bulletproof. (it may also be a security risk if a user can crash a system program by sending too much data).
fgets() allows you to specify how many characters are taken out of the standard input buffer, so they don't overrun the variable.
The C gets function is dangerous and has been a very costly mistake. Tony Hoare singles it out for specific mention in his talk "Null References: The Billion Dollar Mistake":
http://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare
The whole hour is worth watching but for his comments view from 30 minutes on with the specific gets criticism around 39 minutes.
Hopefully this whets your appetite for the whole talk, which draws attention to how we need more formal correctness proofs in languages and how language designers should be blamed for the mistakes in their languages, not the programmer. This seems to have been the whole dubious reason for designers of bad languages to push the blame to programmers in the guise of 'programmer freedom'.
I would like to extend an earnest invitation to any C library maintainers out there who are still including gets in their libraries "just in case anyone is still depending on it": Please replace your implementation with the equivalent of
char *gets(char *str)
{
strcpy(str, "Never use gets!");
return str;
}
This will help make sure nobody is still depending on it. Thank you.
Additional info:
From man 3 gets on Linux Ubuntu you'll see (emphasis added):
DESCRIPTION
Never use this function.
And, from the cppreference.com wiki here (https://en.cppreference.com/w/c/io/gets) you'll see: Notes Never use gets().:
Notes
The gets() function does not perform bounds checking, therefore this function is extremely vulnerable to buffer-overflow attacks. It cannot be used safely (unless the program runs in an environment which restricts what can appear on stdin). For this reason, the function has been deprecated in the third corrigendum to the C99 standard and removed altogether in the C11 standard. fgets() and gets_s() are the recommended replacements.
Never use gets().
As you can see, the function has been deprecated and removed entirely in C11 or later.
Use fgets() or gets_s() instead.
Here is my demo usage of fgets(), with full error checking:
From read_stdin_fgets_basic_input_from_user.c:
#include <errno.h> // `errno`
#include <stdio.h> // `printf()`, `fgets()`
#include <stdlib.h> // `exit()`
#include <string.h> // `strerror()`
// int main(int argc, char *argv[]) // alternative prototype
int main()
{
char buf[10];
// NEVER USE `gets()`! USE `fgets()` BELOW INSTEAD!
// USE THIS!: `fgets()`: "file get string", which reads until either EOF is
// reached, OR a newline (`\n`) is found, keeping the newline char in
// `buf`.
// For `feof()` and `ferror()`, see:
// 1. https://en.cppreference.com/w/c/io/feof
// 1. https://en.cppreference.com/w/c/io/ferror
printf("Enter up to %zu chars: ", sizeof(buf) - 1); // - 1 to save room
// for null terminator
char* retval = fgets(buf, sizeof(buf), stdin);
if (feof(stdin))
{
// Check for `EOF`, which means "End of File was reached".
// - This doesn't really make sense on `stdin` I think, but it is a good
// check to have when reading from a regular file with `fgets
// ()`. Keep it here regardless, just in case.
printf("EOF (End of File) reached.\n");
}
if (ferror(stdin))
{
printf("Error indicator set. IO error when reading from file "
"`stdin`.\n");
}
if (retval == NULL)
{
printf("ERROR in %s(): fgets() failed; errno = %i: %s\n",
__func__, errno, strerror(errno));
exit(EXIT_FAILURE);
}
size_t num_chars_written = strlen(buf) + 1; // + 1 for null terminator
if (num_chars_written >= sizeof(buf))
{
printf("Warning: user input may have been truncated! All %zu chars "
"were written into buffer.\n", num_chars_written);
}
printf("You entered \"%s\".\n", buf);
return 0;
}
Sample runs and output:
eRCaGuy_hello_world/c$ gcc -Wall -Wextra -Werror -O3 -std=c17 read_stdin_fgets_basic_input_from_user.c -o bin/a && bin/a
Enter up to 9 chars: hello world!
Warning: user input may have been truncated! All 10 chars were written into buffer.
You entered "hello wor".
eRCaGuy_hello_world/c$ gcc -Wall -Wextra -Werror -O3 -std=c17 read_stdin_fgets_basic_input_from_user.c -o bin/a && bin/a
Enter up to 9 chars: hey
You entered "hey
".
In a few words gets() (can) be dangerous because the user might input something bigger than what the variable has enough space to store. First answer says about fgets() and why it is safer.

Why does comparing the result of scanf() with NULL generate a compiler warning?

I'm writing in C and I need to read everything that arrives in input but I don't know how many characters I will receive. I wrote
while (scanf("%c", &read) != NULL)
but the compiler tells me: [Warning] comparison between pointer and integer, so what should I write instead?
Instead, of scanf("%c", &read) you may consider read = getc(stdin).
Please, be aware that getc()/fgetc() return int.
This allows to store any character as number in range [0, 255] as well as to return EOF (usually -1) in case of failure.
So, with getc() it would look like:
int read;
while ((read = getc(stdin)) != EOF)
Note:
You may assign read to a variable of type char – it will be implicitly converted. In case of getc() succeeded, there shouldn't be any data loss in this implicit conversion.
A little sample to show this at work:
#include <stdio.h>
int main(void)
{
enum { N = 10 };
char buffer[N];
/* read characters and print if buffer ful */
int read, n = 0;
while ((read = getc(stdin)) != EOF) {
if (n == N) {
printf("%.*s", N, buffer); n = 0;
}
buffer[n++] = read;
}
/* print rest if buffer not empty */
if (n > 0) printf("%.*s", n, buffer);
/* done */
return 0;
}
Note:
The read characters are stored in buffer without a termination '\0'. This is handled in printf() respectively by the formatter %.*s meaning string with max. width * where width and string are read as consecutive arguments.
Live Demo on ideone
Your
scanf("%c", &read) != NULL
is a typing mistake : the types of left and right operands to != don't match. Read about type systems.
Read the documentation of scanf. It says that scanf returns some int value.
But NULL is a pointer value (it is (void*)0).
How can a pointer be meaningfully compared to an int? On my Debian/x86-64, they don't even have the same size (as returned by sizeof): a pointer takes 8 bytes but an int takes 4 bytes.
So the compiler is right in warning you.
You probably want to code instead something like while (scanf("%c", &read) >0) or even while (scanf("%c", &read) == 1) since for a successful scan of "%c" the scanf function is documented to give 1.
Of course, in that particular case, using fgetc is better (more readable, and probably slightly faster). Read documentation of fgetc and follow the advice given by Sheff's answer.
Next time:
be sure to read the documentation of every function that you are using. Never use a function whose documentation you have not read and understood.
ask the compiler to give all warnings and debug info, so with GCC compile with gcc -Wall -Wextra -g. Trust your compiler, and improve your code to get no warnings at all.
read How To Debug Small Programs.
read the documentation of gdb (and perhaps of valgrind).
study, for inspiration, the source code of some small free software similar to yours (perhaps on github).
PS. Naming your variable read is poor taste. For most people it conflicts with the POSIX read function.

Read a string from a file including white space and search a file for a given string in C

I am trying to search a string in a text file,when the text file is like what given below :
"Naveen; Okies
PSG; Diploma
SREC; BECSE"
When output console ask for input string and when i type naveen it will result in printing Okies, when i typed PSG it will print Diploma. This works fine as I am using the below code :
fscanf(fp, "%[^;];%s\n", temp, Mean);
However below text file is not working,
"Naveen; Okies Is it working
PSG; Diploma Is it working
SREC; BECSE Is it working"
My code still gives me Okies as output for Naveen, where i need "Okies Is it working" as output.
So i changed my code to fscanf(fp, "%[^;];%[^\n]s", temp, Mean); where i am getting 'Okies Is it working' as output. But for searching string it's not searching next line. When i search PSG, I dont get any ouput.
Kindly help me to understand my issue.
Side-bar
Note that you should check the return value from fscanf().
You say you tried:
fscanf(fp, "%[^;];%[^\n]s", temp, Mean);
This is probably a confused format. The s at the end is looking for a literal s in the input, but it will never be found and you'll have no way of knowing that it is not found. The %[^\n] scan set conversion specification looks for a sequence of 'non-newlines'. It will only stop when the next character is a newline, or EOF. The s therefore is a literal s that will never be matched. But the return values from fscanf() is the number of successful conversions, which would probably be 2. You have no way of spotting whether that s was read. It should be removed from the format string.
Main answer
To address your main question, the %s format stops at the first blank. If you want to process the whole line, don't use %s. Use either POSIX getline() or standard C fgets() to read the line, and then analyze it.
You can analyze it with strtok(). I wouldn't do that in any library code because any library function that calls strtok() cannot be used from code that might also be using strtok(), nor can it call any function where that function, or one of the functions it calls directly or indirectly, uses strtok(). The strtok() function is poisonous — you can only use it in one function at a time. These comments do not apply to
strtok_r() or the analogous Microsoft-provided variant strtok_s() — which is similar to but different from the strtok_s() defined in optional Annex K of C11. The variants with a suffix are reentrant and do not poison the system like strtok() does.
I'd probably use strchr() in this context; you could also look at
strstr(),
strpbrk(),
strspn(),
strcpsn(). All of these are standard C functions, and have been since C89/C90.
I think Jonathan has explained it pretty well, however, I am adding a sample to show how to deal with your example for learning purposes. Bear in mind you might wanna change some functions as I used the deprecated (insecure) ones, probably this could be an exercise for you.
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
int main(int argc, char *argv[]) {
FILE *fp = NULL;
char szBuffer[1024] = { '\0' };
char szChoice[256] = { '\0' };
char szResult[256] = { '\0' };
if ((fp = fopen("test.txt", "r")) == NULL) {
printf("Error opening file\n");
return EXIT_FAILURE;
}
printf("Enter your choice: ");
scanf("%s", &szChoice);
while (fgets(szBuffer, sizeof(szBuffer), fp) != NULL) {
if (!strncmp(szBuffer, szChoice, strlen(szChoice))) {
char *pch = szBuffer;
pch += (strlen(szChoice) + 1);
printf("Result: %s", pch);
}
}
getchar();
return EXIT_SUCCESS;
}

Why is the gets function so dangerous that it should not be used?

When I try to compile C code that uses the gets() function with GCC, I get this warning:
(.text+0x34): warning: the `gets' function is dangerous and should not be used.
I remember this has something to do with stack protection and security, but I'm not sure exactly why.
How can I remove this warning and why is there such a warning about using gets()?
If gets() is so dangerous then why can't we remove it?
Why is gets() dangerous
The first internet worm (the Morris Internet Worm) escaped about 30 years ago (1988-11-02), and it used gets() and a buffer overflow as one of its methods of propagating from system to system. The basic problem is that the function doesn't know how big the buffer is, so it continues reading until it finds a newline or encounters EOF, and may overflow the bounds of the buffer it was given.
You should forget you ever heard that gets() existed.
The C11 standard ISO/IEC 9899:2011 eliminated gets() as a standard function, which is A Good Thing™ (it was formally marked as 'obsolescent' and 'deprecated' in ISO/IEC 9899:1999/Cor.3:2007 — Technical Corrigendum 3 for C99, and then removed in C11). Sadly, it will remain in libraries for many years (meaning 'decades') for reasons of backwards compatibility. If it were up to me, the implementation of gets() would become:
char *gets(char *buffer)
{
assert(buffer != 0);
abort();
return 0;
}
Given that your code will crash anyway, sooner or later, it is better to head the trouble off sooner rather than later. I'd be prepared to add an error message:
fputs("obsolete and dangerous function gets() called\n", stderr);
Modern versions of the Linux compilation system generates warnings if you link gets() — and also for some other functions that also have security problems (mktemp(), …).
Alternatives to gets()
fgets()
As everyone else said, the canonical alternative to gets() is fgets() specifying stdin as the file stream.
char buffer[BUFSIZ];
while (fgets(buffer, sizeof(buffer), stdin) != 0)
{
...process line of data...
}
What no-one else yet mentioned is that gets() does not include the newline but fgets() does. So, you might need to use a wrapper around fgets() that deletes the newline:
char *fgets_wrapper(char *buffer, size_t buflen, FILE *fp)
{
if (fgets(buffer, buflen, fp) != 0)
{
size_t len = strlen(buffer);
if (len > 0 && buffer[len-1] == '\n')
buffer[len-1] = '\0';
return buffer;
}
return 0;
}
Or, better:
char *fgets_wrapper(char *buffer, size_t buflen, FILE *fp)
{
if (fgets(buffer, buflen, fp) != 0)
{
buffer[strcspn(buffer, "\n")] = '\0';
return buffer;
}
return 0;
}
Also, as caf points out in a comment and paxdiablo shows in their answer, with fgets() you might have data left over on a line. My wrapper code leaves that data to be read next time; you can readily modify it to gobble the rest of the line of data if you prefer:
if (len > 0 && buffer[len-1] == '\n')
buffer[len-1] = '\0';
else
{
int ch;
while ((ch = getc(fp)) != EOF && ch != '\n')
;
}
The residual problem is how to report the three different result states — EOF or error, line read and not truncated, and partial line read but data was truncated.
This problem doesn't arise with gets() because it doesn't know where your buffer ends and merrily tramples beyond the end, wreaking havoc on your beautifully tended memory layout, often messing up the return stack (a Stack Overflow) if the buffer is allocated on the stack, or trampling over the control information if the buffer is dynamically allocated, or copying data over other precious global (or module) variables if the buffer is statically allocated. None of these is a good idea — they epitomize the phrase 'undefined behaviour`.
There is also the TR 24731-1 (Technical Report from the C Standard Committee) which provides safer alternatives to a variety of functions, including gets():
§6.5.4.1 The gets_s function
###Synopsis
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
char *gets_s(char *s, rsize_t n);
Runtime-constraints
s shall not be a null pointer. n shall neither be equal to zero nor be greater than RSIZE_MAX. A new-line character, end-of-file, or read error shall occur within reading n-1 characters from stdin.25)
3 If there is a runtime-constraint violation, s[0] is set to the null character, and characters are read and discarded from stdin until a new-line character is read, or end-of-file or a read error occurs.
Description
4 The gets_s function reads at most one less than the number of characters specified by n from the stream pointed to by stdin, into the array pointed to by s. No additional characters are read after a new-line character (which is discarded) or after end-of-file. The discarded new-line character does not count towards number of characters read. A null character is written immediately after the last character read into the array.
5 If end-of-file is encountered and no characters have been read into the array, or if a read error occurs during the operation, then s[0] is set to the null character, and the other elements of s take unspecified values.
Recommended practice
6 The fgets function allows properly-written programs to safely process input lines too long to store in the result array. In general this requires that callers of fgets pay attention to the presence or absence of a new-line character in the result array. Consider using fgets (along with any needed processing based on new-line characters) instead of gets_s.
25) The gets_s function, unlike gets, makes it a runtime-constraint violation for a line of input to overflow the buffer to store it. Unlike fgets, gets_s maintains a one-to-one relationship between input lines and successful calls to gets_s. Programs that use gets expect such a relationship.
The Microsoft Visual Studio compilers implement an approximation to the TR 24731-1 standard, but there are differences between the signatures implemented by Microsoft and those in the TR.
The C11 standard, ISO/IEC 9899-2011, includes TR24731 in Annex K as an optional part of the library. Unfortunately, it is seldom implemented on Unix-like systems.
getline() — POSIX
POSIX 2008 also provides a safe alternative to gets() called getline(). It allocates space for the line dynamically, so you end up needing to free it. It removes the limitation on line length, therefore. It also returns the length of the data that was read, or -1 (and not EOF!), which means that null bytes in the input can be handled reliably. There is also a 'choose your own single-character delimiter' variation called getdelim(); this can be useful if you are dealing with the output from find -print0 where the ends of the file names are marked with an ASCII NUL '\0' character, for example.
In order to use gets safely, you have to know exactly how many characters you will be reading, so that you can make your buffer large enough. You will only know that if you know exactly what data you will be reading.
Instead of using gets, you want to use fgets, which has the signature
char* fgets(char *string, int length, FILE * stream);
(fgets, if it reads an entire line, will leave the '\n' in the string; you'll have to deal with that.)
gets remained an official part of the language up to the 1999 ISO C standard, but it was officially removed in the 2011 standard. Most C implementations still support it, but at least gcc issues a warning for any code that uses it.
Because gets doesn't do any kind of check while getting bytes from stdin and putting them somewhere. A simple example:
char array1[] = "12345";
char array2[] = "67890";
gets(array1);
Now, first of all you are allowed to input how many characters you want, gets won't care about it. Secondly the bytes over the size of the array in which you put them (in this case array1) will overwrite whatever they find in memory because gets will write them. In the previous example this means that if you input "abcdefghijklmnopqrts" maybe, unpredictably, it will overwrite also array2 or whatever.
The function is unsafe because it assumes consistent input. NEVER USE IT!
You should not use gets since it has no way to stop a buffer overflow. If the user types in more data than can fit in your buffer, you will most likely end up with corruption or worse.
In fact, ISO have actually taken the step of removing gets from the C standard (as of C11, though it was deprecated in C99) which, given how highly they rate backward compatibility, should be an indication of how bad that function was.
The correct thing to do is to use the fgets function with the stdin file handle since you can limit the characters read from the user.
But this also has its problems such as:
extra characters entered by the user will be picked up the next time around.
there's no quick notification that the user entered too much data.
To that end, almost every C coder at some point in their career will write a more useful wrapper around fgets as well. Here's mine:
#include <stdio.h>
#include <string.h>
#define OK 0
#define NO_INPUT 1
#define TOO_LONG 2
static int getLine (char *prmpt, char *buff, size_t sz) {
int ch, extra;
// Get line with buffer overrun protection.
if (prmpt != NULL) {
printf ("%s", prmpt);
fflush (stdout);
}
if (fgets (buff, sz, stdin) == NULL)
return NO_INPUT;
// If it was too long, there'll be no newline. In that case, we flush
// to end of line so that excess doesn't affect the next call.
if (buff[strlen(buff)-1] != '\n') {
extra = 0;
while (((ch = getchar()) != '\n') && (ch != EOF))
extra = 1;
return (extra == 1) ? TOO_LONG : OK;
}
// Otherwise remove newline and give string back to caller.
buff[strlen(buff)-1] = '\0';
return OK;
}
with some test code:
// Test program for getLine().
int main (void) {
int rc;
char buff[10];
rc = getLine ("Enter string> ", buff, sizeof(buff));
if (rc == NO_INPUT) {
printf ("No input\n");
return 1;
}
if (rc == TOO_LONG) {
printf ("Input too long\n");
return 1;
}
printf ("OK [%s]\n", buff);
return 0;
}
It provides the same protections as fgets in that it prevents buffer overflows but it also notifies the caller as to what happened and clears out the excess characters so that they do not affect your next input operation.
Feel free to use it as you wish, I hereby release it under the "do what you damn well want to" licence :-)
fgets.
To read from the stdin:
char string[512];
fgets(string, sizeof(string), stdin); /* no buffer overflows here, you're safe! */
You can't remove API functions without breaking the API. If you would, many applications would no longer compile or run at all.
This is the reason that one reference gives:
Reading a line that overflows the
array pointed to by s results in
undefined behavior. The use of fgets()
is recommended.
I read recently, in a USENET post to comp.lang.c, that gets() is getting removed from the Standard. WOOHOO
You'll be happy to know that the
committee just voted (unanimously, as
it turns out) to remove gets() from
the draft as well.
In C11(ISO/IEC 9899:201x), gets() has been removed. (It's deprecated in ISO/IEC 9899:1999/Cor.3:2007(E))
In addition to fgets(), C11 introduces a new safe alternative gets_s():
C11 K.3.5.4.1 The gets_s function
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
char *gets_s(char *s, rsize_t n);
However, in the Recommended practice section, fgets() is still preferred.
The fgets function allows properly-written programs to safely process input lines too
long to store in the result array. In general this requires that callers of fgets pay
attention to the presence or absence of a new-line character in the result array. Consider
using fgets (along with any needed processing based on new-line characters) instead of
gets_s.
gets() is dangerous because it is possible for the user to crash the program by typing too much into the prompt. It can't detect the end of available memory, so if you allocate an amount of memory too small for the purpose, it can cause a seg fault and crash. Sometimes it seems very unlikely that a user will type 1000 letters into a prompt meant for a person's name, but as programmers, we need to make our programs bulletproof. (it may also be a security risk if a user can crash a system program by sending too much data).
fgets() allows you to specify how many characters are taken out of the standard input buffer, so they don't overrun the variable.
The C gets function is dangerous and has been a very costly mistake. Tony Hoare singles it out for specific mention in his talk "Null References: The Billion Dollar Mistake":
http://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare
The whole hour is worth watching but for his comments view from 30 minutes on with the specific gets criticism around 39 minutes.
Hopefully this whets your appetite for the whole talk, which draws attention to how we need more formal correctness proofs in languages and how language designers should be blamed for the mistakes in their languages, not the programmer. This seems to have been the whole dubious reason for designers of bad languages to push the blame to programmers in the guise of 'programmer freedom'.
I would like to extend an earnest invitation to any C library maintainers out there who are still including gets in their libraries "just in case anyone is still depending on it": Please replace your implementation with the equivalent of
char *gets(char *str)
{
strcpy(str, "Never use gets!");
return str;
}
This will help make sure nobody is still depending on it. Thank you.
Additional info:
From man 3 gets on Linux Ubuntu you'll see (emphasis added):
DESCRIPTION
Never use this function.
And, from the cppreference.com wiki here (https://en.cppreference.com/w/c/io/gets) you'll see: Notes Never use gets().:
Notes
The gets() function does not perform bounds checking, therefore this function is extremely vulnerable to buffer-overflow attacks. It cannot be used safely (unless the program runs in an environment which restricts what can appear on stdin). For this reason, the function has been deprecated in the third corrigendum to the C99 standard and removed altogether in the C11 standard. fgets() and gets_s() are the recommended replacements.
Never use gets().
As you can see, the function has been deprecated and removed entirely in C11 or later.
Use fgets() or gets_s() instead.
Here is my demo usage of fgets(), with full error checking:
From read_stdin_fgets_basic_input_from_user.c:
#include <errno.h> // `errno`
#include <stdio.h> // `printf()`, `fgets()`
#include <stdlib.h> // `exit()`
#include <string.h> // `strerror()`
// int main(int argc, char *argv[]) // alternative prototype
int main()
{
char buf[10];
// NEVER USE `gets()`! USE `fgets()` BELOW INSTEAD!
// USE THIS!: `fgets()`: "file get string", which reads until either EOF is
// reached, OR a newline (`\n`) is found, keeping the newline char in
// `buf`.
// For `feof()` and `ferror()`, see:
// 1. https://en.cppreference.com/w/c/io/feof
// 1. https://en.cppreference.com/w/c/io/ferror
printf("Enter up to %zu chars: ", sizeof(buf) - 1); // - 1 to save room
// for null terminator
char* retval = fgets(buf, sizeof(buf), stdin);
if (feof(stdin))
{
// Check for `EOF`, which means "End of File was reached".
// - This doesn't really make sense on `stdin` I think, but it is a good
// check to have when reading from a regular file with `fgets
// ()`. Keep it here regardless, just in case.
printf("EOF (End of File) reached.\n");
}
if (ferror(stdin))
{
printf("Error indicator set. IO error when reading from file "
"`stdin`.\n");
}
if (retval == NULL)
{
printf("ERROR in %s(): fgets() failed; errno = %i: %s\n",
__func__, errno, strerror(errno));
exit(EXIT_FAILURE);
}
size_t num_chars_written = strlen(buf) + 1; // + 1 for null terminator
if (num_chars_written >= sizeof(buf))
{
printf("Warning: user input may have been truncated! All %zu chars "
"were written into buffer.\n", num_chars_written);
}
printf("You entered \"%s\".\n", buf);
return 0;
}
Sample runs and output:
eRCaGuy_hello_world/c$ gcc -Wall -Wextra -Werror -O3 -std=c17 read_stdin_fgets_basic_input_from_user.c -o bin/a && bin/a
Enter up to 9 chars: hello world!
Warning: user input may have been truncated! All 10 chars were written into buffer.
You entered "hello wor".
eRCaGuy_hello_world/c$ gcc -Wall -Wextra -Werror -O3 -std=c17 read_stdin_fgets_basic_input_from_user.c -o bin/a && bin/a
Enter up to 9 chars: hey
You entered "hey
".
In a few words gets() (can) be dangerous because the user might input something bigger than what the variable has enough space to store. First answer says about fgets() and why it is safer.

Resources