I want to read line-by-line from a given input file,, process each line (i.e. its words) and then move on to other line...
So i am using fscanf(fptr,"%s",words) to read the word and it should stop once it encounters end of line...
but this is not possible in fscanf, i guess... so please tell me the way as to what to do...
I should read all the words in the given line (i.e. end of line should be encountered) to terminate and then move on to other line, and repeat the same process..
Use fgets(). Yeah, link is to cplusplus, but it originates from c stdio.h.
You may also use sscanf() to read words from string, or just strtok() to separate them.
In response to comment: this behavior of fgets() (leaving \n in the string) allows you to determine if the actual end-of-line was encountered. Note, that fgets() may also read only part of the line from file if supplied buffer is not large enough. In your case - just check for \n in the end and remove it, if you don't need it. Something like this:
// actually you'll get str contents from fgets()
char str[MAX_LEN] = "hello there\n";
size_t len = strlen(str);
if (len && str[len-1] == '\n') {
str[len-1] = 0;
}
Simple as that.
If you are working on a system with the GNU extensions available there is something called getline (man 3 getline) which allows you to read a file on a line by line basis, while getline will allocate extra memory for you if needed. The manpage contains an example which I modified to split the line using strtok (man 3 strtrok).
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
FILE * fp;
char * line = NULL;
size_t len = 0;
ssize_t read;
fp = fopen("/etc/motd", "r");
if (fp == NULL)
{
printf("File open failed\n");
return 0;
}
while ((read = getline(&line, &len, fp)) != -1) {
// At this point we have a line held within 'line'
printf("Line: %s", line);
const char * delim = " \n";
char * ptr;
ptr = (char * )strtok(line,delim);
while(ptr != NULL)
{
printf("Word: %s\n",ptr);
ptr = (char *) strtok(NULL,delim);
}
}
if (line)
{
free(line);
}
return 0;
}
Given the buffering inherent in all the stdio functions, I would be tempted to read the stream character by character with getc(). A simple finite state machine can identify word boundaries, and line boundaries if needed. An advantage is the complete lack of buffers to overflow, aside from whatever buffer you collect the current word in if your further processing requires it.
You might want to do a quick benchmark comparing the time required to read a large file completely with getc() vs. fgets()...
If an outside constraint requires that the file really be read a line at a time (for instance, if you need to handle line-oriented input from a tty) then fgets() probably is your friend as other answers point out, but even then the getc() approach may be acceptable as long as the input stream is running in line-buffered mode which is common for stdin if stdin is on a tty.
Edit: To have control over the buffer on the input stream, you might need to call setbuf() or setvbuf() to force it to a buffered mode. If the input stream ends up unbuffered, then using an explicit buffer of some form will always be faster than getc() on a raw stream.
Best performance would probably use a buffer related to your disk I/O, at least two disk blocks in size and probably a lot more than that. Often, even that performance can be beat by arranging the input to be a memory mapped file and relying on the kernel's paging to read and fill the buffer as you process the file as if it were one giant string.
Regardless of the choice, if performance is going to matter then you will want to benchmark several approaches and pick the one that works best in your platform. And even then, the simplest expression of your problem may still be the best overall answer if it gets written, debugged and used.
but this is not possible in fscanf,
It is, with a bit of wickedness ;)
Update: More clarification on evilness
but unfortunately a bit wrong. I assume [^\n]%*[^\n] should read [^\n]%*. Moreover, one should note that this approach will strip whitespaces from the lines. – dragonfly
Note that xstr(MAXLINE) [^\n] reads MAXLINE characters which can be anything except the newline character (i.e. \n). The second part of the specifier i.e. *[^\n] rejects anything (that's why the * character is there) if the line has more than MAXLINE characters upto but NOT including the newline character. The newline character tells scanf to stop matching. What if we did as dragonfly suggested? The only problem is scanf will not know where to stop and will keep suppressing assignment until the next newline is hit (which is another match for the first part). Hence you will trail by one line of input when reporting.
What if you wanted to read in a loop? A little modification is required. We need to add a getchar() to consume the unmatched newline. Here's the code:
#include <stdio.h>
#define MAXLINE 255
/* stringify macros: these work only in pairs, so keep both */
#define str(x) #x
#define xstr(x) str(x)
int main() {
char line[ MAXLINE + 1 ];
/*
Wickedness explained: we read from `stdin` to `line`.
The format specifier is the only tricky part: We don't
bite off more than we can chew -- hence the specification
of maximum number of chars i.e. MAXLINE. However, this
width has to go into a string, so we stringify it using
macros. The careful reader will observe that once we have
read MAXLINE characters we discard the rest upto and
including a newline.
*/
int n = fscanf(stdin, "%" xstr(MAXLINE) "[^\n]%*[^\n]", line);
if (!feof(stdin)) {
getchar();
}
while (n == 1) {
printf("[line:] %s\n", line);
n = fscanf(stdin, "%" xstr(MAXLINE) "[^\n]%*[^\n]", line);
if (!feof(stdin)) {
getchar();
}
}
return 0;
}
Related
I need to ask one more question about reading from the stdin.
I am reading a huge trunk of lines from the stdin, but it is definitely unknown which is the size of every line. So I don't want to have a buffer like 50Mio just for a file having lines of three char and than a file using these 50 Mio per line.
So at the moment I am having this code:
int cur_max = 2047;
char *str = malloc(sizeof(char) * cur_max);
int length = 0;
while(fgets(str, sizeof(str), stdin) != NULL) {
//do something with str
//for example printing
printf("%s",str);
}
free(str);
So I am using fgets for every line and I do have a first size of 2047 char per line.
My plan is to increase the size of the buffer (str) when a line hits the limit. So my idea is to count the size with length and if the current length is bigger than cur_max then I am doubling the cur_max.
The idea comes from here Read line from file without knowing the line length
I am currently not sure how to do this with fgets because I think fgets is not doing this char by char so I don't know the moment when to increase the size.
Incorrect code
sizeof(str) is the size of a pointer, like 2, 4 or 8 bytes. Pass to fgets() the size of the memory pointed to by str. #Andrew Henle #Steve Summit
char *str = malloc(sizeof(char) * cur_max);
...
// while(fgets(str, sizeof(str), stdin) != NULL
while(fgets(str, cur_max, stdin) != NULL
Environmental limits
Text files and fgets() are not the portable solution for reading excessively long lines.
An implementation shall support text files with lines containing at least 254 characters, including the terminating new-line character. The value of the macro BUFSIZ shall be at least 256 C11 §7.21.2 9
So once the line length exceeds BUFSIZ - 2, code is on its own as to if the C standard library functions can handle a text file.
So either read the data as binary, use other libraries that insure the desired functionality, or rely on hope.
Note: BUFSIZ defined in <stdio.h>
POSIX.1 getline() (man 3 getline) is available in almost all operating systems' C libraries (the only exception I know of is Windows). A loop to read lines of any length is
char *line_ptr = NULL;
size_t line_max = 0;
ssize_t line_len;
while (1) {
line_len = getline(&line_ptr, &line_max, stdin);
if (line_len == -1)
break;
/* You now have 'line_len' chars at 'line_ptr',
but it may contain embedded nul chars ('\0').
Also, line_ptr[line_len] == '\0'.
*/
}
/* Discard dynamically allocated buffer; allow reuse later. */
free(line_ptr);
line_ptr = NULL;
line_max = 0;
There is also a related function getdelim(), that takes an extra parameter (specified before the stream), used as an end-of-record marker. It is particularly useful in Unixy/POSIXy environments when reading file names from e.g. standard input, as you can use nul itself ('\0') as the separator (see e.g. find -print0 or xargs -0), allowing correct handling for all possible file names.
If you use Windows, or if you have text files with varying newline conventions (not just '\n', but any of '\n', '\r', "\r\n", or "\n\r"), you can use my getline_universal() function implementation from another of my answers. It differs from standard getline() and fgets() in that the newline is not included in the line it returns; it is also left in the stream and consumed/ignored by the next call to getline_universal(). If you use getline_universal() to read each line in a file or stream, it will work as expected.
How to scan total line from user input with c program?
I tried scanf("%99[^\n]",st), but it is not working when I scan something before this scan statment.It worked if this is the first scan statement.
How to scan total line from user input with c program?
There are many ways to read a line of input, and your usage of the word scan suggests you're already focused on the scanf() function for the job. This is unfortunate, because, although you can (to some extent) achieve what you want with scanf(), it's definitely not the best tool for reading a line.
As already stated in the comments, your scanf() format string will stop at a newline, so the next scanf() will first find that newline and it can't match [^\n] (which means anything except newline). As a newline is just another whitespace character, adding a blank in front of your conversion will silently eat it up ;)
But now for the better solution: Assuming you only want to use standard C functions, there's already one function for exactly the job of reading a line: fgets(). The following code snippet should explain its usage:
char line[1024];
char *str = fgets(line, 1024, stdin); // read from the standard input
if (!str)
{
// couldn't read input for some reason, handle error here
exit(1); // <- for example
}
// fgets includes the newline character that ends the line, but if the line
// is longer than 1022 characters, it will stop early here (it will never
// write more bytes than the second parameter you pass). Often you don't
// want that newline character, and the following line overwrites it with
// 0 (which is "end of string") **only** if it was there:
line[strcspn(line, "\n")] = 0;
Note that you might want to check for the newline character with strchr() instead, so you actually know whether you have the whole line or maybe your input buffer was to small. In the latter case, you might want to call fgets() again.
How to scan total line from user input with c program?
scanf("%99[^\n]",st) reads a line, almost.
With the C Standard Library a line is
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. C11dr §7.21.2 2
scanf("%99[^\n]",st) fails to read the end of the line, the '\n'.
That is why on the 2nd call, the '\n' remains in stdin to be read and scanf("%99[^\n]",st) will not read it.
There are ways to use scanf("%99[^\n]",st);, or a variation of it as a step in reading user input, yet they suffer from 1) Not handling a blank line "\n" correctly 2) Missing rare input errors 3) Long line issues and other nuances.
The preferred portable solution is to use fgets(). Loop example:
#define LINE_MAX_LENGTH 200
char buf[LINE_MAX_LENGTH + 1 + 1]; // +1 for long lines detection, +1 for \0
while (fgets(buf, sizeof buf, stdin)) {
size_t eol = strcspn(buf, "\n"); **
buf[eol] = '\0'; // trim potential \n
if (eol >= LINE_MAX_LENGTH) {
// IMO, user input exceeding a sane generous threshold is a potential hack
fprintf(stderr, "Line too long\n");
// TBD : Handle excessive long line
}
// Use `buf[[]`
}
Many platforms support getline() to read a line.
Short-comings: Non C-standard and allow a hacker to overwhelm system resources with insanely long lines.
In C, there is not a great solution. What is best depends on the various coding goals.
** I prefer size_t eol = strcspn(buf, "\n\r"); to read lines in a *nix environment that may end with "\r\n".
scanf() should never be used for user input. The best way to get input from the user is with fgets().
Read more: http://sekrit.de/webdocs/c/beginners-guide-away-from-scanf.html
char str[1024];
char *alline = fgets(str, 1024, stdin);
scanf("%[^'\n']s",alline);
I think the correct solution should be like this. It is worked for me.
Hope it helps.
I'm trying to read a line using the following code:
while(fscanf(f, "%[^\n\r]s", cLine) != EOF )
{
/* do something with cLine */
}
But somehow I get only the first line every time. Is this a bad way to read a line? What should I fix to make it work as expected?
It's almost always a bad idea to use the fscanf() function as it can leave your file pointer in an unknown location on failure.
I prefer to use fgets() to get each line in and then sscanf() that. You can then continue to examine the line read in as you see fit. Something like:
#define LINESZ 1024
char buff[LINESZ];
FILE *fin = fopen ("infile.txt", "r");
if (fin != NULL) {
while (fgets (buff, LINESZ, fin)) {
/* Process buff here. */
}
fclose (fin);
}
fgets() appears to be what you're trying to do, reading in a string until you encounter a newline character.
If you want read a file line by line (Here, line separator == '\n') just make that:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
FILE *fp;
char *buffer;
int ret;
// Open a file ("test.txt")
if ((fp = fopen("test.txt", "r")) == NULL) {
fprintf(stdout, "Error: Can't open file !\n");
return -1;
}
// Alloc buffer size (Set your max line size)
buffer = malloc(sizeof(char) * 4096);
while(!feof(fp))
{
// Clean buffer
memset(buffer, 0, 4096);
// Read a line
ret = fscanf(fp, "%4095[^\n]\n", buffer);
if (ret != EOF) {
// Print line
fprintf(stdout, "%s\n", buffer);
}
}
// Free buffer
free(buffer);
// Close file
fclose(fp);
return 0;
}
Enjoy :)
If you try while( fscanf( f, "%27[^\n\r]", cLine ) == 1 ) you might have a little more luck. The three changes from your original:
length-limit what gets read in - I've used 27 here as an example, and unfortunately the scanf() family require the field width literally in the format string and can't use the * mechanism that the printf() can for passing the value in
get rid of the s in the format string - %[ is the format specifier for "all characters matching or not matching a set", and the set is terminated by a ] on its own
compare the return value against the number of conversions you expect to happen (and for ease of management, ensure that number is 1)
That said, you'll get the same result with less pain by using fgets() to read in as much of a line as will fit in your buffer.
Using fscanf to read/tokenise a file always results in fragile code or pain and suffering. Reading a line, and tokenising or scanning that line is safe, and effective. It needs more lines of code - which means it takes longer to THINK about what you want to do (and you need to handle a finite input buffer size) - but after that life just stinks less.
Don't fight fscanf. Just don't use it. Ever.
It looks to me like you're trying to use regex operators in your fscanf string. The string [^\n\r] doesn't mean anything to fscanf, which is why your code doesn't work as expected.
Furthermore, fscanf() doesn't return EOF if the item doesn't match. Rather, it returns an integer that indicates the number of matches--which in your case is probably zero. EOF is only returned at the end of the stream or in case of an error. So what's happening in your case is that the first call to fscanf() reads all the way to the end of the file looking for a matching string, then returns 0 to let you know that no match was found. The second call then returns EOF because the entire file has been read.
Finally, note that the %s scanf format operator only captures to the next whitespace character, so you don't need to exclude \n or \r in any case.
Consult the fscanf documentation for more information: http://www.cplusplus.com/reference/clibrary/cstdio/fscanf/
Your loop has several issues. You wrote:
while( fscanf( f, "%[^\n\r]s", cLine ) != EOF )
/* do something */;
Some things to consider:
fscanf() returns the number of items stored. It can return EOF if it reads past the end of file or if the file handle has an error. You need to distinguish a valid return of zero in which case there is no new content in the buffer cLine from a successfully read.
You do a have a problem when a failure to match occurs because it is difficult to predict where the file handle is now pointing in the stream. This makes recovery from a failed match harder to do than might be expected.
The pattern you wrote probably doesn't do what you intended. It is matching any number of characters that are not CR or LF, and then expecting to find a literal s.
You haven't protected your buffer from an overflow. Any number of characters may be read from the file and written to the buffer, regardless of the size allocated to that buffer. This is an unfortunately common error, that in many cases can be exploited by an attacker to run arbitrary code of the attackers choosing.
Unless you specifically requested that f be opened in binary mode, line ending translation will happen in the library and you will generally never see CR characters, and usually not in text files.
You probably want a loop more like the following:
while(fgets(cLine, N_CLINE, f)) {
/* do something */ ;
}
where N_CLINE is the number of bytes available in the buffer starting a cLine.
The fgets() function is a much preferred way to read a line from a file. Its second parameter is the size of the buffer, and it reads up to 1 less than that size bytes from the file into the buffer. It always terminates the buffer with a nul character so that it can be safely passed to other C string functions.
It stops on the first of end of file, newline, or buffer_size-1 bytes read.
It leaves the newline character in the buffer, and that fact allows you to distinguish a single line longer than your buffer from a line shorter than the buffer.
It returns NULL if no bytes were copied due to end of file or an error, and the pointer to the buffer otherwise. You might want to use feof() and/or ferror() to distinguish those cases.
i think the problem with this code is because when you read with %[^\n\r]s, in fact, you reading until reach '\n' or '\r', but you don't reading the '\n' or '\r' also.
So you need to get this character before you read with fscanf again at loop.
Do something like that:
do{
fscanf(f, "%[^\n\r]s", cLine) != EOF
/* Do something here */
}while(fgetc(file) != EOF)
I'm brand new to C and trying to learn how to read a file. My file is a simple file (just for testing) which contains the following:
this file
has been
successfully read
by C!
So I read the file using the following C code:
#include <stdio.h>
int main() {
char str[100];
FILE *file = fopen("/myFile/path/test.txt", "r");
if(file == NULL) {
puts("This file does not exist!");
return -1;
}
while(fgets(str, 100, file) != '\0') {
puts(str);
}
fclose(file);
return 0;
}
This prints my text like this:
this file
has been
successfully read
by C!
When I compile and run it, I pipe its output to hexdump -C and can see an extra 0a at the end of each line.
Finally, why do I need to declare an array of chars to read from a file? What if I don't know how much data is on each line?
fgets() reads up to the newline and keeps the newline in the string and puts() always adds a newline to the string it is given to print. Hence you get double-spaced output when used as in your code.
Use fputs(str, stdout) instead of puts(); it does not add a newline.
The obsolete function gets() — removed from the 2011 version of the C standard — read up to the newline but removed it. The gets() and puts() pair worked well together, as do fgets() and fputs(). However, you should certainly NOT use gets(); it is a catastrophe waiting to happen. (The first internet worm in 1988 used gets() to migrate — Google search for 'morris internet worm').
In comments, inquisitor asked:
Why does the line need to be read into a char array of a specific size?
Because you need to make sure you don't overrun the space that is available. C does not do automatic allocation of space for strings. That is one of its weaknesses from some viewpoints; it is also a strength, but it routinely confuses newcomers to the language. If you want the input code to allocate enough space for a line, use the POSIX function getline().
So is it better to just read and output until I hit a '\0' since I won't always know the amount of chars on a given line?
No. In general, you won't hit '\0'; most text files do not contain any of those. If you don't want to allocate enough space for a line, then use:
int c;
while ((c = getchar()) != EOF)
putchar(c);
which reads one character at a time in the user code, but the underlying standard I/O packages buffer the input up so it isn't too costly — it is perfectly feasible to implement a program that way. If you need to work on lines, either allocate enough space for lines (I use char buffer[4096]; routinely) or use getline().
And Charlie Burns asked in a comment:
Why don't we see getline() suggested more often?
I think it is not mentioned all that often because getline() is relatively new, and not necessarily available everywhere yet. It was added to POSIX 2008; it is available on Linux and BSD. I'm not sure about the other mainline Unix variants (AIX, HP-UX, Solaris). It isn't hard to write for yourself (I've done it), but it is a nuisance if you need to write portable code (especially if 'portable' includes 'Microsoft'). One of its merits is that it tells you how long the line it read actually was.
Example using getline()
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
char *line = 0;
size_t length = 0;
char const name[] = "/myFile/path/test.txt";
FILE *file = fopen(name, "r");
if (file == NULL)
{
fprintf(stderr, "%s: failed to open file %s\n", argv[0], name);
return -1;
}
while (getline(&line, &length, file) > 0)
fputs(str, stdout);
free(line);
fclose(file);
return 0;
}
fgets saves the newline character at the end of the line when reading line by line. This allows you to determine wether actually a line was read or just your buffer was too small.
puts always adds a newline when printing.
Either trim off the newline from fgets or use printf
printf("%s", str);
When I try to compile C code that uses the gets() function with GCC, I get this warning:
(.text+0x34): warning: the `gets' function is dangerous and should not be used.
I remember this has something to do with stack protection and security, but I'm not sure exactly why.
How can I remove this warning and why is there such a warning about using gets()?
If gets() is so dangerous then why can't we remove it?
Why is gets() dangerous
The first internet worm (the Morris Internet Worm) escaped about 30 years ago (1988-11-02), and it used gets() and a buffer overflow as one of its methods of propagating from system to system. The basic problem is that the function doesn't know how big the buffer is, so it continues reading until it finds a newline or encounters EOF, and may overflow the bounds of the buffer it was given.
You should forget you ever heard that gets() existed.
The C11 standard ISO/IEC 9899:2011 eliminated gets() as a standard function, which is A Good Thing™ (it was formally marked as 'obsolescent' and 'deprecated' in ISO/IEC 9899:1999/Cor.3:2007 — Technical Corrigendum 3 for C99, and then removed in C11). Sadly, it will remain in libraries for many years (meaning 'decades') for reasons of backwards compatibility. If it were up to me, the implementation of gets() would become:
char *gets(char *buffer)
{
assert(buffer != 0);
abort();
return 0;
}
Given that your code will crash anyway, sooner or later, it is better to head the trouble off sooner rather than later. I'd be prepared to add an error message:
fputs("obsolete and dangerous function gets() called\n", stderr);
Modern versions of the Linux compilation system generates warnings if you link gets() — and also for some other functions that also have security problems (mktemp(), …).
Alternatives to gets()
fgets()
As everyone else said, the canonical alternative to gets() is fgets() specifying stdin as the file stream.
char buffer[BUFSIZ];
while (fgets(buffer, sizeof(buffer), stdin) != 0)
{
...process line of data...
}
What no-one else yet mentioned is that gets() does not include the newline but fgets() does. So, you might need to use a wrapper around fgets() that deletes the newline:
char *fgets_wrapper(char *buffer, size_t buflen, FILE *fp)
{
if (fgets(buffer, buflen, fp) != 0)
{
size_t len = strlen(buffer);
if (len > 0 && buffer[len-1] == '\n')
buffer[len-1] = '\0';
return buffer;
}
return 0;
}
Or, better:
char *fgets_wrapper(char *buffer, size_t buflen, FILE *fp)
{
if (fgets(buffer, buflen, fp) != 0)
{
buffer[strcspn(buffer, "\n")] = '\0';
return buffer;
}
return 0;
}
Also, as caf points out in a comment and paxdiablo shows in their answer, with fgets() you might have data left over on a line. My wrapper code leaves that data to be read next time; you can readily modify it to gobble the rest of the line of data if you prefer:
if (len > 0 && buffer[len-1] == '\n')
buffer[len-1] = '\0';
else
{
int ch;
while ((ch = getc(fp)) != EOF && ch != '\n')
;
}
The residual problem is how to report the three different result states — EOF or error, line read and not truncated, and partial line read but data was truncated.
This problem doesn't arise with gets() because it doesn't know where your buffer ends and merrily tramples beyond the end, wreaking havoc on your beautifully tended memory layout, often messing up the return stack (a Stack Overflow) if the buffer is allocated on the stack, or trampling over the control information if the buffer is dynamically allocated, or copying data over other precious global (or module) variables if the buffer is statically allocated. None of these is a good idea — they epitomize the phrase 'undefined behaviour`.
There is also the TR 24731-1 (Technical Report from the C Standard Committee) which provides safer alternatives to a variety of functions, including gets():
§6.5.4.1 The gets_s function
###Synopsis
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
char *gets_s(char *s, rsize_t n);
Runtime-constraints
s shall not be a null pointer. n shall neither be equal to zero nor be greater than RSIZE_MAX. A new-line character, end-of-file, or read error shall occur within reading n-1 characters from stdin.25)
3 If there is a runtime-constraint violation, s[0] is set to the null character, and characters are read and discarded from stdin until a new-line character is read, or end-of-file or a read error occurs.
Description
4 The gets_s function reads at most one less than the number of characters specified by n from the stream pointed to by stdin, into the array pointed to by s. No additional characters are read after a new-line character (which is discarded) or after end-of-file. The discarded new-line character does not count towards number of characters read. A null character is written immediately after the last character read into the array.
5 If end-of-file is encountered and no characters have been read into the array, or if a read error occurs during the operation, then s[0] is set to the null character, and the other elements of s take unspecified values.
Recommended practice
6 The fgets function allows properly-written programs to safely process input lines too long to store in the result array. In general this requires that callers of fgets pay attention to the presence or absence of a new-line character in the result array. Consider using fgets (along with any needed processing based on new-line characters) instead of gets_s.
25) The gets_s function, unlike gets, makes it a runtime-constraint violation for a line of input to overflow the buffer to store it. Unlike fgets, gets_s maintains a one-to-one relationship between input lines and successful calls to gets_s. Programs that use gets expect such a relationship.
The Microsoft Visual Studio compilers implement an approximation to the TR 24731-1 standard, but there are differences between the signatures implemented by Microsoft and those in the TR.
The C11 standard, ISO/IEC 9899-2011, includes TR24731 in Annex K as an optional part of the library. Unfortunately, it is seldom implemented on Unix-like systems.
getline() — POSIX
POSIX 2008 also provides a safe alternative to gets() called getline(). It allocates space for the line dynamically, so you end up needing to free it. It removes the limitation on line length, therefore. It also returns the length of the data that was read, or -1 (and not EOF!), which means that null bytes in the input can be handled reliably. There is also a 'choose your own single-character delimiter' variation called getdelim(); this can be useful if you are dealing with the output from find -print0 where the ends of the file names are marked with an ASCII NUL '\0' character, for example.
In order to use gets safely, you have to know exactly how many characters you will be reading, so that you can make your buffer large enough. You will only know that if you know exactly what data you will be reading.
Instead of using gets, you want to use fgets, which has the signature
char* fgets(char *string, int length, FILE * stream);
(fgets, if it reads an entire line, will leave the '\n' in the string; you'll have to deal with that.)
gets remained an official part of the language up to the 1999 ISO C standard, but it was officially removed in the 2011 standard. Most C implementations still support it, but at least gcc issues a warning for any code that uses it.
Because gets doesn't do any kind of check while getting bytes from stdin and putting them somewhere. A simple example:
char array1[] = "12345";
char array2[] = "67890";
gets(array1);
Now, first of all you are allowed to input how many characters you want, gets won't care about it. Secondly the bytes over the size of the array in which you put them (in this case array1) will overwrite whatever they find in memory because gets will write them. In the previous example this means that if you input "abcdefghijklmnopqrts" maybe, unpredictably, it will overwrite also array2 or whatever.
The function is unsafe because it assumes consistent input. NEVER USE IT!
You should not use gets since it has no way to stop a buffer overflow. If the user types in more data than can fit in your buffer, you will most likely end up with corruption or worse.
In fact, ISO have actually taken the step of removing gets from the C standard (as of C11, though it was deprecated in C99) which, given how highly they rate backward compatibility, should be an indication of how bad that function was.
The correct thing to do is to use the fgets function with the stdin file handle since you can limit the characters read from the user.
But this also has its problems such as:
extra characters entered by the user will be picked up the next time around.
there's no quick notification that the user entered too much data.
To that end, almost every C coder at some point in their career will write a more useful wrapper around fgets as well. Here's mine:
#include <stdio.h>
#include <string.h>
#define OK 0
#define NO_INPUT 1
#define TOO_LONG 2
static int getLine (char *prmpt, char *buff, size_t sz) {
int ch, extra;
// Get line with buffer overrun protection.
if (prmpt != NULL) {
printf ("%s", prmpt);
fflush (stdout);
}
if (fgets (buff, sz, stdin) == NULL)
return NO_INPUT;
// If it was too long, there'll be no newline. In that case, we flush
// to end of line so that excess doesn't affect the next call.
if (buff[strlen(buff)-1] != '\n') {
extra = 0;
while (((ch = getchar()) != '\n') && (ch != EOF))
extra = 1;
return (extra == 1) ? TOO_LONG : OK;
}
// Otherwise remove newline and give string back to caller.
buff[strlen(buff)-1] = '\0';
return OK;
}
with some test code:
// Test program for getLine().
int main (void) {
int rc;
char buff[10];
rc = getLine ("Enter string> ", buff, sizeof(buff));
if (rc == NO_INPUT) {
printf ("No input\n");
return 1;
}
if (rc == TOO_LONG) {
printf ("Input too long\n");
return 1;
}
printf ("OK [%s]\n", buff);
return 0;
}
It provides the same protections as fgets in that it prevents buffer overflows but it also notifies the caller as to what happened and clears out the excess characters so that they do not affect your next input operation.
Feel free to use it as you wish, I hereby release it under the "do what you damn well want to" licence :-)
fgets.
To read from the stdin:
char string[512];
fgets(string, sizeof(string), stdin); /* no buffer overflows here, you're safe! */
You can't remove API functions without breaking the API. If you would, many applications would no longer compile or run at all.
This is the reason that one reference gives:
Reading a line that overflows the
array pointed to by s results in
undefined behavior. The use of fgets()
is recommended.
I read recently, in a USENET post to comp.lang.c, that gets() is getting removed from the Standard. WOOHOO
You'll be happy to know that the
committee just voted (unanimously, as
it turns out) to remove gets() from
the draft as well.
In C11(ISO/IEC 9899:201x), gets() has been removed. (It's deprecated in ISO/IEC 9899:1999/Cor.3:2007(E))
In addition to fgets(), C11 introduces a new safe alternative gets_s():
C11 K.3.5.4.1 The gets_s function
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
char *gets_s(char *s, rsize_t n);
However, in the Recommended practice section, fgets() is still preferred.
The fgets function allows properly-written programs to safely process input lines too
long to store in the result array. In general this requires that callers of fgets pay
attention to the presence or absence of a new-line character in the result array. Consider
using fgets (along with any needed processing based on new-line characters) instead of
gets_s.
gets() is dangerous because it is possible for the user to crash the program by typing too much into the prompt. It can't detect the end of available memory, so if you allocate an amount of memory too small for the purpose, it can cause a seg fault and crash. Sometimes it seems very unlikely that a user will type 1000 letters into a prompt meant for a person's name, but as programmers, we need to make our programs bulletproof. (it may also be a security risk if a user can crash a system program by sending too much data).
fgets() allows you to specify how many characters are taken out of the standard input buffer, so they don't overrun the variable.
The C gets function is dangerous and has been a very costly mistake. Tony Hoare singles it out for specific mention in his talk "Null References: The Billion Dollar Mistake":
http://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare
The whole hour is worth watching but for his comments view from 30 minutes on with the specific gets criticism around 39 minutes.
Hopefully this whets your appetite for the whole talk, which draws attention to how we need more formal correctness proofs in languages and how language designers should be blamed for the mistakes in their languages, not the programmer. This seems to have been the whole dubious reason for designers of bad languages to push the blame to programmers in the guise of 'programmer freedom'.
I would like to extend an earnest invitation to any C library maintainers out there who are still including gets in their libraries "just in case anyone is still depending on it": Please replace your implementation with the equivalent of
char *gets(char *str)
{
strcpy(str, "Never use gets!");
return str;
}
This will help make sure nobody is still depending on it. Thank you.
Additional info:
From man 3 gets on Linux Ubuntu you'll see (emphasis added):
DESCRIPTION
Never use this function.
And, from the cppreference.com wiki here (https://en.cppreference.com/w/c/io/gets) you'll see: Notes Never use gets().:
Notes
The gets() function does not perform bounds checking, therefore this function is extremely vulnerable to buffer-overflow attacks. It cannot be used safely (unless the program runs in an environment which restricts what can appear on stdin). For this reason, the function has been deprecated in the third corrigendum to the C99 standard and removed altogether in the C11 standard. fgets() and gets_s() are the recommended replacements.
Never use gets().
As you can see, the function has been deprecated and removed entirely in C11 or later.
Use fgets() or gets_s() instead.
Here is my demo usage of fgets(), with full error checking:
From read_stdin_fgets_basic_input_from_user.c:
#include <errno.h> // `errno`
#include <stdio.h> // `printf()`, `fgets()`
#include <stdlib.h> // `exit()`
#include <string.h> // `strerror()`
// int main(int argc, char *argv[]) // alternative prototype
int main()
{
char buf[10];
// NEVER USE `gets()`! USE `fgets()` BELOW INSTEAD!
// USE THIS!: `fgets()`: "file get string", which reads until either EOF is
// reached, OR a newline (`\n`) is found, keeping the newline char in
// `buf`.
// For `feof()` and `ferror()`, see:
// 1. https://en.cppreference.com/w/c/io/feof
// 1. https://en.cppreference.com/w/c/io/ferror
printf("Enter up to %zu chars: ", sizeof(buf) - 1); // - 1 to save room
// for null terminator
char* retval = fgets(buf, sizeof(buf), stdin);
if (feof(stdin))
{
// Check for `EOF`, which means "End of File was reached".
// - This doesn't really make sense on `stdin` I think, but it is a good
// check to have when reading from a regular file with `fgets
// ()`. Keep it here regardless, just in case.
printf("EOF (End of File) reached.\n");
}
if (ferror(stdin))
{
printf("Error indicator set. IO error when reading from file "
"`stdin`.\n");
}
if (retval == NULL)
{
printf("ERROR in %s(): fgets() failed; errno = %i: %s\n",
__func__, errno, strerror(errno));
exit(EXIT_FAILURE);
}
size_t num_chars_written = strlen(buf) + 1; // + 1 for null terminator
if (num_chars_written >= sizeof(buf))
{
printf("Warning: user input may have been truncated! All %zu chars "
"were written into buffer.\n", num_chars_written);
}
printf("You entered \"%s\".\n", buf);
return 0;
}
Sample runs and output:
eRCaGuy_hello_world/c$ gcc -Wall -Wextra -Werror -O3 -std=c17 read_stdin_fgets_basic_input_from_user.c -o bin/a && bin/a
Enter up to 9 chars: hello world!
Warning: user input may have been truncated! All 10 chars were written into buffer.
You entered "hello wor".
eRCaGuy_hello_world/c$ gcc -Wall -Wextra -Werror -O3 -std=c17 read_stdin_fgets_basic_input_from_user.c -o bin/a && bin/a
Enter up to 9 chars: hey
You entered "hey
".
In a few words gets() (can) be dangerous because the user might input something bigger than what the variable has enough space to store. First answer says about fgets() and why it is safer.