Reading from stdin and storing \n and whitespace - c

I've been trying to use scanf to get input from stdin but it truncates the string after seeing whitespace or after hitting return.
What I'm trying to get is a way to read keyboard input that stores in the buffer linebreaks as well as whitespace. And ending when ctrl-D is pressed.
Should I try using fgets? I figured that wouldn't be optimal either since fgets returns after reading in a \n

There is no ready-made function to read everything from stdin, but creating your own is fortunately easy. Here is an untested code snippet, with some explanation in comments, which can read an arbitrarily large number of chars from stdin:
size_t size = 0; // how many chars have actually been read
size_t reserved = 10; // how much space is allocated
char *buf = malloc(reserved);
int ch;
if (buf == NULL) exit(1); // out of memory
// read one char at a time from stdin, until EOF.
// let stdio handle input buffering
while ((ch = getchar()) != EOF) {
    buf[size] = (char)ch;
    ++size;
    // make buffer larger if needed, must have room for '\0' below!
    // size is always doubled,
    // so reallocation happens only a limited number of times
    if (size == reserved) {
        reserved *= 2;
        buf = realloc(buf, reserved);
        if (buf == NULL) exit(1); // out of memory
    }
}
// add terminating NUL character to end the string,
// maybe useless with binary data but won't hurt there either
buf[size] = 0;
// now buf contains size chars, everything from stdin until EOF,
// optionally shrink the allocation to contain just input and '\0'
// (keep the old pointer if the shrinking realloc fails)
char *shrunk = realloc(buf, size + 1);
if (shrunk != NULL) buf = shrunk;

scanf() splits the input at whitespace boundaries, so it's not suitable in your case. Indeed fgets() is the better choice. What you need to do is keep reading after fgets() returns; each call will read a line of input. You can keep reading until fgets() returns NULL, which means that nothing more can be read.
You can also use fgetc() instead if you prefer getting input character by character. It will return EOF when nothing more can be read.
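As a rough sketch of the "keep calling fgets() until it returns NULL" idea, something like the following could collect every line, newlines included, into one growing string. The function name read_all_fgets and the chunk size of 128 are my own choices for illustration, not anything standard, and the error handling is minimal.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read everything from `in` with repeated fgets()
   calls, keeping newlines and other whitespace.
   Returns a malloc'd string (caller frees) or NULL if allocation fails. */
char *read_all_fgets(FILE *in)
{
    char chunk[128];               /* arbitrary chunk size */
    size_t len = 0;                /* characters collected so far */
    char *text = malloc(1);
    if (text == NULL) return NULL;
    text[0] = '\0';

    /* fgets() returns NULL at end of file (Ctrl-D on a terminal) or on error */
    while (fgets(chunk, sizeof chunk, in) != NULL) {
        size_t n = strlen(chunk);
        char *tmp = realloc(text, len + n + 1);
        if (tmp == NULL) {
            free(text);
            return NULL;
        }
        text = tmp;
        memcpy(text + len, chunk, n + 1);   /* copies the '\0' as well */
        len += n;
    }
    return text;
}
```

You would call it as char *all = read_all_fgets(stdin); and free(all) when done. A long line simply arrives in several chunks, so nothing is lost.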

If you want to read all input, regardless of whether it is whitespace or not, try fread.
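A minimal sketch of the fread approach, assuming you want the whole input in one NUL-terminated buffer. The name read_all_fread, the out_len parameter and the starting capacity of 4096 are assumptions made for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all bytes from `in` with fread().
   Stores the byte count in *out_len (if non-NULL) and returns a
   NUL-terminated malloc'd buffer, or NULL if allocation fails. */
char *read_all_fread(FILE *in, size_t *out_len)
{
    size_t cap = 4096, len = 0;    /* arbitrary starting capacity */
    char *buf = malloc(cap);
    if (buf == NULL) return NULL;

    for (;;) {
        /* always leave one byte free for the terminating '\0' */
        size_t got = fread(buf + len, 1, cap - len - 1, in);
        len += got;
        if (len < cap - 1) break;  /* short read: end of file or error */

        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }
    buf[len] = '\0';
    if (out_len != NULL) *out_len = len;
    return buf;
}
```

Because fread does not care about line boundaries, whitespace and newlines end up in the buffer exactly as typed. Use ferror(in) afterwards if you need to tell an error from a normal end of file.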

Read like this:
int ch;          // int, not char, so that EOF can be told apart from real characters
char line[20];
int i = 0;       // counter
// check the counter first to avoid overflow (and to avoid consuming a
// character that cannot be stored), then read a character into ch
// and check whether it is end of file
while (i < 19 && (ch = getchar()) != EOF)
{
    line[i] = (char)ch;
    i++;
}
line[i] = '\0';

Related

Dynamically allocate user inputted string

I am trying to write a function that does the following things:
Start an input loop, printing '> ' each iteration.
Take whatever the user enters (unknown length) and read it into a character array, dynamically allocating the size of the array if necessary. The user-entered line will end at a newline character.
Add a null byte, '\0', to the end of the character array.
Loop terminates when the user enters a blank line: '\n'
This is what I've currently written:
void input_loop(){
char *str = NULL;
printf("> ");
while(printf("> ") && scanf("%a[^\n]%*c",&input) == 1){
/*Add null byte to the end of str*/
/*Do stuff to input, including traversing until the null byte is reached*/
free(str);
str = NULL;
}
free(str);
str = NULL;
}
Now, I'm not too sure how to go about adding the null byte to the end of the string. I was thinking something like this:
last_index = strlen(str);
str[last_index] = '\0';
But I'm not too sure if that would work though. I can't test if it would work because I'm encountering this error when I try to compile my code:
warning: ISO C does not support the 'a' scanf flag [-Wformat=]
So what can I do to make my code work?
EDIT: changing scanf("%a[^\n]%*c",&input) == 1 to scanf("%as[^\n]%*c",&input) == 1 gives me the same error.
First of all, scanf format strings do not use regular expressions, so I don't think something close to what you want will work. As for the error you get, according to my trusty manual, the %a conversion flag is for floating point numbers, but it only works on C99 (and your compiler is probably configured for C90)
But then you have a bigger problem. scanf expects that you pass it a previously allocated empty buffer for it to fill in with the read input. It does not malloc the string for you, so your attempts at initializing str to NULL and the corresponding frees will not work with scanf.
The simplest thing you can do is to give up on arbitrary-length strings. Create a large buffer and forbid inputs that are longer than that.
You can then use the fgets function to populate your buffer. To check if it managed to read the full line, check if your string ends with a "\n".
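// needs <stdbool.h>, <stdio.h>, <stdlib.h> and <string.h>
char str[256+1];
while (true) {
    printf("> ");
    if (!fgets(str, sizeof str, stdin)) {
        // error or end of file
        break;
    }
    size_t len = strlen(str);
    if (len + 1 == sizeof str && str[len - 1] != '\n') {
        // buffer is full and holds no newline: user typed something too long
        exit(1);
    }
    printf("user typed %s", str);
}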
Another alternative is to use a function that is not part of the standard C library. For example, POSIX systems such as Linux provide the getline function, which reads a full line of input using malloc behind the scenes.
Allocation failures are not checked, and don't forget to free the pointer when you're done with it. If you use this code to read enormous lines, you deserve all the pain it will bring you.
#include <stdio.h>
#include <stdlib.h>
char *readInfiniteString(void) {
    int l = 256;
    char *buf = malloc(l);
    int p = 0;
    int ch; // int, not char: getchar() reports EOF as an out-of-band int value
    ch = getchar();
    // stop at the end of the line, and also at EOF so the loop cannot spin forever
    while (ch != '\n' && ch != EOF) {
        buf[p++] = (char)ch;
        if (p == l) {
            l += 256;
            buf = realloc(buf, l);
        }
        ch = getchar();
    }
    buf[p] = '\0';
    return buf;
}
int main(int argc, char *argv[]) {
    printf("> ");
    char *buf = readInfiniteString();
    printf("%s\n", buf);
    free(buf);
}
If you are on a POSIX system such as Linux, you should have access to getline. It can be made to behave like fgets, but if you start with a null pointer and a zero length, it will take care of memory allocation for you.
You can use it in a loop like this:
#define _POSIX_C_SOURCE 200809L // may be needed to expose getline
#include <stdlib.h>
#include <stdio.h>
#include <string.h> // for strcmp
#include <sys/types.h> // for ssize_t
int main(void)
{
    char *line = NULL;
    size_t nline = 0;
    for (;;) {
        ssize_t n; // getline returns ssize_t, -1 on EOF or error
        printf("> ");
        // read line, allocating as necessary
        n = getline(&line, &nline, stdin);
        if (n < 0) break;
        // remove trailing newline
        if (n && line[n - 1] == '\n') line[n - 1] = '\0';
        // do stuff
        printf("'%s'\n", line);
        if (strcmp("quit", line) == 0) break;
    }
    free(line);
    printf("\nBye\n");
    return 0;
}
The passed pointer and the length value must be consistent, so that getline can reallocate memory as required. (That means that you shouldn't change nline or the pointer line in the loop.) If the line fits, the same buffer is used in each pass through the loop, so that you have to free the line string only once, when you're done reading.
Some have mentioned that scanf is probably unsuitable for this purpose. I wouldn't suggest using fgets, either. Though it is slightly more suitable, there are problems that seem difficult to avoid, at least at first. Few C programmers manage to use fgets right the first time without reading the fgets manual in full. The parts most people manage to neglect entirely are:
what happens when the line is too large, and
what happens when EOF or an error is encountered.
The fgets() function shall read bytes from stream into the array pointed to by s, until n-1 bytes are read, or a newline is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte.
Upon successful completion, fgets() shall return s. If the stream is at end-of-file, the end-of-file indicator for the stream shall be set and fgets() shall return a null pointer. If a read error occurs, the error indicator for the stream shall be set, fgets() shall return a null pointer...
I don't feel I need to stress the importance of checking the return value too much, so I won't mention it again. Suffice to say, if your program doesn't check the return value your program won't know when EOF or an error occurs; your program will probably be caught in an infinite loop.
When no '\n' is present, the remaining bytes of the line are yet to have been read. Thus, fgets will always parse the line at least once, internally. When you introduce extra logic, to check for a '\n', to that, you're parsing the data a second time.
This allows you to realloc the storage and call fgets again if you want to dynamically resize the storage, or discard the remainder of the line (warning the user of the truncation is a good idea), perhaps using something like fscanf(file, "%*[^\n]");.
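To make the "discard the remainder" option concrete, here is a small sketch under my own assumptions: the helper name read_line_truncating and its return codes are invented for this example. It reads one line with fgets, strips the newline, and throws away whatever did not fit using the fscanf directive mentioned above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (n bytes, n >= 2).
   Returns  1 if the whole line fit (newline stripped),
            0 if the line was too long and the excess was discarded,
           -1 on end of file or error with nothing read. */
int read_line_truncating(char *buf, size_t n, FILE *f)
{
    if (fgets(buf, (int)n, f) == NULL) return -1;

    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';           /* complete line: drop the newline */
        return 1;
    }

    /* No newline was stored. Peek at the next character to see whether
       the line merely fit exactly or the input ended. */
    int c = fgetc(f);
    if (c == '\n' || c == EOF) return 1;

    /* Truncated: "%*[^\n]" consumes everything up to, but not
       including, the newline; fgetc then consumes the newline itself. */
    if (fscanf(f, "%*[^\n]") != EOF)
        fgetc(f);
    return 0;
}
```

When it returns 0 you can warn the user that the input was cut short. The next call starts cleanly at the following line.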
hugomg mentioned using multiplication in the dynamic resize code to avoid quadratic runtime problems. Along this line, it would be a good idea to avoid parsing the same data over and over each iteration (thus introducing further quadratic runtime problems). This can be achieved by storing the number of bytes you've read (and parsed) somewhere. For example:
char *get_dynamic_line(FILE *f) {
    size_t bytes_read = 0;
    char *bytes = NULL, *temp;
    do {
        /* + 2, not + 1: a 1-byte buffer lets fgets read nothing, so the loop would never advance */
        size_t alloc_size = bytes_read * 2 + 2;
        temp = realloc(bytes, alloc_size);
        if (temp == NULL) {
            free(bytes);
            return NULL;
        }
        bytes = temp;
        temp = fgets(bytes + bytes_read, alloc_size - bytes_read, f); /* Parsing data the first time */
        if (temp != NULL) /* the buffer is only valid to scan if fgets succeeded */
            bytes_read += strcspn(bytes + bytes_read, "\n"); /* Parsing data the second time */
    } while (temp && bytes[bytes_read] != '\n');
    bytes[bytes_read] = '\0';
    return bytes;
}
Those who do manage to read the manual and come up with something correct (like this) may soon realise the complexity of an fgets solution is at least twice as poor as the same solution using fgetc. We can avoid parsing data the second time by using fgetc, so using fgetc might seem most appropriate. Alas most C programmers also manage to use fgetc incorrectly when neglecting the fgetc manual.
The most important detail is to realise that fgetc returns an int, not a char. It may return typically one of 256 distinct values, between 0 and UCHAR_MAX (inclusive). It may otherwise return EOF, meaning there are typically 257 distinct values that fgetc (or consequently, getchar) may return. Trying to store those values into a char or unsigned char results in loss of information, specifically the error modes. (Of course, this typical value of 257 will change if CHAR_BIT is greater than 8, and consequently UCHAR_MAX is greater than 255)
char *get_dynamic_line(FILE *f) {
    size_t bytes_read = 0;
    char *bytes = NULL;
    do {
        if ((bytes_read & (bytes_read + 1)) == 0) {
            void *temp = realloc(bytes, bytes_read * 2 + 1);
            if (temp == NULL) {
                free(bytes);
                return NULL;
            }
            bytes = temp;
        }
        int c = fgetc(f);
        bytes[bytes_read] = c >= 0 && c != '\n'
                          ? c
                          : '\0';
    } while (bytes[bytes_read++]);
    return bytes;
}

How can I make fgets() read all of the 1024 byte line? [duplicate]

Is this safe to do? Does fgets terminate the buffer with null or should I be setting the 20th byte to null after the call to fgets and before I call clean?
// strip new lines
void clean(char *data)
{
while (*data)
{
if (*data == '\n' || *data == '\r') *data = '\0';
data++;
}
}
// for this, assume that the file contains 1 line no longer than 19 bytes
// buffer is freed elsewhere
char *load_latest_info(char *file)
{
FILE *f;
char *buffer = (char*) malloc(20);
if (f = fopen(file, "r"))
if (fgets(buffer, 20, f))
{
clean(buffer);
return buffer;
}
free(buffer);
return NULL;
}
Yes, fgets() always null-terminates the buffer when it succeeds. From the man page:
The fgets() function reads at most one less than the number of characters specified by n from the given stream and stores them in the string s. Reading stops when a newline character is found, at end-of-file or error. The newline, if any, is retained. If any characters are read and there is no error, a '\0' character is appended to end the string.
If there is an error, fgets() may or may not store any zero bytes anywhere in the buffer. Code which doesn't check the return value of fgets() won't be safe unless it ensures there's a zero in the buffer somewhere; the easiest way to do that is to unconditionally store a zero to the last spot. Doing that will mean that an unnoticed error may (depending upon implementation) cause a bogus extra line of data to be read, but won't fall off into Undefined Behavior.
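A small sketch of that defensive pattern, with a wrapper name (fgets_terminated) that I made up for illustration. It still returns what fgets returned so the caller can check it, but the buffer holds a valid string either way.

```c
#include <stdio.h>

/* Hypothetical wrapper: like fgets(), but buf (n bytes, n >= 1) always
   holds a terminated string afterwards, even when the caller forgets
   to look at the return value. */
char *fgets_terminated(char *buf, size_t n, FILE *f)
{
    char *result = fgets(buf, (int)n, f);
    if (result == NULL)
        buf[0] = '\0';     /* failure or EOF: present an empty string */
    buf[n - 1] = '\0';     /* unconditional zero in the last spot */
    return result;
}
```

Emptying the buffer on failure is my addition. The answer above only stores the zero in the last spot, which keeps reads in bounds but may leave stale data in front of it.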

How to discard the rest of a line in C

I'm trying to write a function that removes the rest of a line in C. I'm passing in a char array and a file pointer (which the char array was read from). The array is only supposed to have 80 chars in it, and if there isn't a newline in the array, read (and discard) characters in the file until you reach it (newline). Here's what I have so far, but it doesn't seem to be working, and I'm not sure what I'm doing wrong. Any help would be greatly appreciated! Here's the given information about what the function should do:
discardRest - if the fgets didn't read a newline then an entire line hasn't been read. This function takes as input the most recently read line and the pointer to the file being read. discardRest looks for the newline character in the input line. If newline character is not in the line, the function reads (and discards) characters from the file until the newline is read. This will cause the file pointer to be positioned to the beginning of the next line in the input file.
And here's the code:
void discardRest(char line[], FILE* file)
{
bool newlineFound = FALSE;
int i;
for(i = 0; i < sizeof(line); i++)
{
if(line[i] == '\n') newlineFound = TRUE;
}
if(!newlineFound)
{
int c = getc(file);
while(c != '\n')
{
c = getc(file);
}
}
}
Your approach is more complicated than it needs to be, and it has two bugs. First, sizeof always gives the size of its operand, and here line is a pointer (array parameters decay to pointers), so sizeof(line) is the size of a pointer and not of the array you think it is. Second, the discard loop never checks for EOF, so it spins forever if the file ends before a newline.
fgets has the following contract:
It returns NULL: some kind of error or end of file. Do not use the buffer, its content might be indeterminate.
Otherwise the buffer contains a 0-terminated string, with the last non-0 character being the retained '\n' if the buffer and the file were both large enough.
Thus, this should work:
Use strlen() to get the length of the string in the buffer.
Determine if a whole line was read (length && line[length-1] == '\n').
As appropriate:
remove the newline character and return, or
discard the rest of the line like you tried, stopping at EOF as well as at '\n'.
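Put together, those steps might look like the sketch below. It keeps the discardRest signature from the question and assumes line holds a string that fgets just read from file. Stripping the newline goes slightly beyond what the assignment text asks for.

```c
#include <stdio.h>
#include <string.h>

/* Sketch: `line` must contain a string that fgets() just read from `file`. */
void discardRest(char line[], FILE *file)
{
    size_t length = strlen(line);

    if (length > 0 && line[length - 1] == '\n') {
        line[length - 1] = '\0';   /* whole line was read: strip the newline */
        return;
    }

    /* No newline in the buffer: skip characters up to and including the
       next newline, and stop at EOF so the loop cannot run forever. */
    int c;
    while ((c = getc(file)) != EOF && c != '\n')
        ;
}
```

Note that c is an int, so that EOF can be distinguished from every valid character.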

Elegant way to determine EOF?

I am reading from a text file, iterating with a while(!feof) loop,
but whenever I use this condition the loop iterates an extra time.
I solved the problem with this 'patchy' code
while (stop == FALSE)
{
...
terminator = fgetc(input);
if (terminator == EOF)
stop = TRUE;
else
fseek(input, -1, SEEK_CUR);
}
But it looks and feels very bad.
You can take advantage of the fact that an assignment evaluates to the value being assigned, in this case the character being read (declare terminator as an int, not a char, so that EOF can be told apart from real characters):
while((terminator = fgetc(input))!= EOF) {
// ...
}
Here is an idiomatic example:
fp = fopen("datafile.txt", "r"); // error check this!
// this while-statement assigns into c, and then checks against EOF
// (c must be an int, not a char):
while ((c = fgetc(fp)) != EOF) {
    /* ... */
}
fclose(fp);
Similarly you can read line-by-line:
char buf[MAXLINE];
// ...
while (fgets(buf, MAXLINE, stdin) != NULL) {
    do_something(buf);
}
Since fgets copies the detected newline character, you can detect the end of a line by checking the last character before the terminating null, buf[strlen(buf) - 1]. If it is not '\n', the line did not fit in the buffer. You can use realloc to resize the buffer (be sure you keep a pointer to the beginning of the buffer, but pass buf+n to the next fgets, where n is the number of characters read so far). From the documentation for fgets:
Reads characters from stream and stores them as a C string into str until (num-1) characters have been read or either a newline or the end-of-file is reached, whichever happens first. A newline character makes fgets stop reading, but it is considered a valid character by the function and included in the string copied to str.
Alternatively, you could read the whole file in one go using fread().

