How to get fscanf to stop if it hits a newline? [duplicate] - c

I'm trying to read a line using the following code:
while(fscanf(f, "%[^\n\r]s", cLine) != EOF )
{
/* do something with cLine */
}
But somehow I get only the first line every time. Is this a bad way to read a line? What should I fix to make it work as expected?

It's almost always a bad idea to use the fscanf() function as it can leave your file pointer in an unknown location on failure.
I prefer to use fgets() to get each line in and then sscanf() that. You can then continue to examine the line read in as you see fit. Something like:
#define LINESZ 1024
char buff[LINESZ];
FILE *fin = fopen ("infile.txt", "r");
if (fin != NULL) {
while (fgets (buff, LINESZ, fin)) {
/* Process buff here. */
}
fclose (fin);
}
fgets() appears to be what you're trying to do, reading in a string until you encounter a newline character.
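To make the fgets() then sscanf() idea concrete, here is a minimal sketch. The helper name sum_values and the "name value" line layout are assumptions made up for illustration; they are not part of the question.

```c
#include <stdio.h>

#define LINESZ 1024

/* Hypothetical helper: reads "name value" lines from fin and returns the
   sum of the values. Lines that do not parse are counted in *bad. */
long sum_values(FILE *fin, int *bad)
{
    char buff[LINESZ];
    char name[64];
    long value, total = 0;

    *bad = 0;
    while (fgets(buff, sizeof buff, fin) != NULL) {
        /* sscanf() works on the line already in memory, so a failed match
           cannot leave the stream at an unexpected position. */
        if (sscanf(buff, "%63s %ld", name, &value) == 2)
            total += value;
        else
            (*bad)++;
    }
    return total;
}
```

A bad line costs you only that line: the next fgets() call starts cleanly at the following one.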

If you want to read a file line by line (here, the line separator is '\n'), just do this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
FILE *fp;
char *buffer;
int ret;
// Open a file ("test.txt")
if ((fp = fopen("test.txt", "r")) == NULL) {
fprintf(stdout, "Error: Can't open file !\n");
return -1;
}
// Alloc buffer size (Set your max line size)
buffer = malloc(4096);
if (buffer == NULL) {
fclose(fp);
return -1;
}
// Read lines for as long as fscanf() stores one. Testing the return value
// instead of feof() avoids looping forever when a match fails.
while ((ret = fscanf(fp, "%4095[^\n]\n", buffer)) == 1)
{
// Print line
fprintf(stdout, "%s\n", buffer);
}
// Free buffer
free(buffer);
// Close file
fclose(fp);
return 0;
}
Enjoy :)

If you try while( fscanf( f, "%27[^\n\r]", cLine ) == 1 ) you might have a little more luck. The three changes from your original:
length-limit what gets read in - I've used 27 here as an example, and unfortunately the scanf() family require the field width literally in the format string and can't use the * mechanism that the printf() can for passing the value in
get rid of the s in the format string - %[ is the format specifier for "all characters matching or not matching a set", and the set is terminated by a ] on its own
compare the return value against the number of conversions you expect to happen (and for ease of management, ensure that number is 1)
That said, you'll get the same result with less pain by using fgets() to read in as much of a line as will fit in your buffer.
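If the literal width in the format string bothers you, one workaround is to build the format at run time with snprintf(). The sketch below is only an illustration: scan_line is a hypothetical helper, and it also consumes the newline that the scanset leaves behind, so it can be called in a loop.

```c
#include <stdio.h>

/* Hypothetical helper: reads one line into buf (size bytes, size >= 2),
   discarding the newline and anything that does not fit.
   Returns 1 if a line was read (even an empty one), 0 at end of input. */
int scan_line(FILE *f, char *buf, size_t size)
{
    char fmt[32];
    int n, c;

    /* Builds e.g. "%255[^\n]" when size is 256. */
    snprintf(fmt, sizeof fmt, "%%%zu[^\n]", size - 1);

    n = fscanf(f, fmt, buf);
    if (n == EOF)
        return 0;          /* nothing left to read */
    if (n == 0)
        buf[0] = '\0';     /* empty line: the scanset matched nothing */

    /* Skip whatever did not fit, then the newline itself. */
    while ((c = fgetc(f)) != EOF && c != '\n')
        ;
    return 1;
}
```

Usage would be along the lines of while (scan_line(f, cLine, sizeof cLine)) { ... }. It works, but it is noticeably more code than the fgets() version.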

Using fscanf to read/tokenise a file always results in fragile code or pain and suffering. Reading a line, and tokenising or scanning that line is safe, and effective. It needs more lines of code - which means it takes longer to THINK about what you want to do (and you need to handle a finite input buffer size) - but after that life just stinks less.
Don't fight fscanf. Just don't use it. Ever.

It looks to me like you're mixing two conversions in your fscanf format string. %[^\n\r] is a scanset: it matches a run of characters that are not \n or \r. The s that follows is not part of the specifier; fscanf treats it as a literal character to match.
Furthermore, fscanf() doesn't return EOF if the item doesn't match. Rather, it returns an integer that indicates the number of successful conversions. EOF is only returned if the end of the stream or an error occurs before the first conversion. So what's happening in your case is that the first call reads the first line and stops at the newline without consuming it. Every later call then fails immediately on that newline and returns 0, which is not EOF, so the loop keeps running with cLine still holding the first line.
Finally, note that the %s scanf format operator skips leading whitespace and only captures up to the next whitespace character, so it reads words rather than lines.
Consult the fscanf documentation for more information: http://www.cplusplus.com/reference/clibrary/cstdio/fscanf/

Your loop has several issues. You wrote:
while( fscanf( f, "%[^\n\r]s", cLine ) != EOF )
/* do something */;
Some things to consider:
fscanf() returns the number of items stored. It can return EOF if it reaches the end of the file, or if the stream has an error, before anything is stored. You need to distinguish a return of zero, in which case there is no new content in the buffer cLine, from a successful read.
You do have a problem when a failure to match occurs, because it is difficult to predict where the file position now is in the stream. This makes recovery from a failed match harder to do than might be expected.
The pattern you wrote probably doesn't do what you intended. It is matching any number of characters that are not CR or LF, and then expecting to find a literal s.
You haven't protected your buffer from an overflow. Any number of characters may be read from the file and written to the buffer, regardless of the size allocated to that buffer. This is an unfortunately common error that in many cases can be exploited by an attacker to run arbitrary code of the attacker's choosing.
Unless you specifically requested that f be opened in binary mode, line-ending translation happens in the library, so you will generally not see CR characters when reading a text file native to your platform.
You probably want a loop more like the following:
while(fgets(cLine, N_CLINE, f)) {
/* do something */ ;
}
where N_CLINE is the number of bytes available in the buffer starting at cLine.
The fgets() function is a much preferred way to read a line from a file. Its second parameter is the size of the buffer, and it reads up to 1 less than that size bytes from the file into the buffer. It always terminates the buffer with a nul character so that it can be safely passed to other C string functions.
It stops on the first of end of file, newline, or buffer_size-1 bytes read.
It leaves the newline character in the buffer, and that fact allows you to distinguish a single line longer than your buffer from a line shorter than the buffer.
It returns NULL if no bytes were copied due to end of file or an error, and the pointer to the buffer otherwise. You might want to use feof() and/or ferror() to distinguish those cases.
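Those properties can be wrapped up in a small helper. This is only a sketch: the names read_line and line_status are hypothetical, chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

enum line_status { LINE_OK, LINE_TRUNCATED, LINE_EOF, LINE_ERROR };

/* Hypothetical helper: reads one line with fgets(), strips the newline,
   and reports whether the line fit into buf. */
enum line_status read_line(FILE *f, char *buf, int size)
{
    size_t len;

    if (fgets(buf, size, f) == NULL)
        return ferror(f) ? LINE_ERROR : LINE_EOF;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';   /* a complete line: drop the newline */
        return LINE_OK;
    }

    /* No newline was stored: either the buffer filled up mid-line, or this
       is a final line without a newline. In the second case fgets() hit
       end of file, so feof() is set. A final unterminated line that fills
       the buffer exactly is still reported as truncated. */
    return feof(f) ? LINE_OK : LINE_TRUNCATED;
}
```

After LINE_TRUNCATED the next call simply returns the rest of the same line, so the caller decides whether to glue the pieces together or skip them.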

I think the problem with this code is that when you read with %[^\n\r]s, you are in fact reading until you reach '\n' or '\r', but you are not reading the '\n' or '\r' itself.
So you need to consume this character before you call fscanf again in the loop.
Do something like this:
do {
if (fscanf(f, "%[^\n\r]", cLine) == 1) {
/* Do something here */
}
} while (fgetc(f) != EOF);

Related

Reading files with fgets()

I have a doubt about reading files in C using fgets(). I've seen people use loops in order to do this, but I skip the loop part, doing this instead.
What's the difference between using a loop and my way?
#include <stdio.h>
#include <stdlib.h>
int main() {
FILE *file = NULL;
char string[30];
file = fopen("test.txt", "r"); //test.txt contains "Hello world!"
if (file == NULL) {
puts("ERROR");
return 1;
}
fgets(string, 30, file);
puts(string);
fclose(file);
return 0;
}
Outputs:
Hello world!
What's the difference between using a loop and my way?
OP's way has many problems.
Wrong buffer size
fgets(string, 30, file); overstates the buffer size of 10 allowing undefined behavior (UB) due to a potential buffer overflow.
Input result not checked
fgets(string, 30, file); does not check the return value of fgets().
Until the return value is checked, the contents of string are not known to be updated correctly.
Extra '\n'
puts(string); appends an extra '\n'.
The entire file is not necessarily read
A single read might not read the entire contents. Use a loop.
Alternative: read until fgets() returns NULL.
while (fgets(string, sizeof string, file)) {
fputs(string, stdout);
}
From the man page:
The fgets() function shall read bytes from stream into the array pointed to by s until n-1 bytes are read, or a <newline> is read and transferred to s, or an end-of-file condition is encountered. A null byte shall be written immediately after the last byte read into the array.
According to this, fgets stops reading when it encounters a newline, an EOF condition, an input error, or when it has read n - 1 characters. So your approach only reads one line from the file. That's well and good if you need to read only a single line.
To read a whole file line by line, fgets is called in a loop until an EOF condition is reached. Another way would be to read the whole file into a buffer with fread, and then parse it.
Or read it character by character by calling getc in a loop.
EDIT: In your code, fgets may store up to n - 1 = 29 characters plus a terminating null byte, whereas you allocated only 10 bytes for the buffer. This leads to undefined behaviour: memory outside the array must not be written. Use sizeof (string) instead.
"Hello World!" can not fit in a buffer you allocated 10 bytes for.
RETURN VALUE:
Upon successful completion, fgets() shall return s. If the stream is at end-of-file, the end-of-file indicator for the stream shall be set and fgets() shall return a null pointer. If a read error occurs, the error indicator for the stream shall be set, fgets() shall return a null pointer, and shall set errno to indicate the error.
You didn't check the return value of fgets.
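For the fread alternative mentioned above, a minimal sketch could look like this. The function name read_all and the starting capacity are assumptions chosen for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything remaining in fp into a freshly
   allocated, null-terminated buffer. Stores the byte count in *len.
   Returns NULL on allocation failure or read error; the caller frees. */
char *read_all(FILE *fp, size_t *len)
{
    size_t cap = 4096, used = 0, n;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    /* Always keep one byte spare for the terminating '\0'. */
    while ((n = fread(buf + used, 1, cap - used - 1, fp)) > 0) {
        used += n;
        if (used == cap - 1) {           /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[used] = '\0';
    *len = used;
    return buf;
}
```

Once the whole file is in memory you can split it on '\n' yourself, at the cost of holding the entire file in RAM.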

Confused with making an input into an empty array.

Say I make an input :
"Hello world" // hit a new line
"Goodbye world" // second input
How could I scan through the two lines and input them separately in two different arrays. I believe I need to use getchar until it hits a '\n'. But how do I scan for the second input.
Thanks in advance. I am a beginner in C, so it would be helpful to do it without pointers, as I haven't covered that topic.
Try this code out :
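#include <stdio.h>
int main(void)
{
    int flx = 0, fly = 0;
    int a;                   /* int, not char, so EOF can be told apart from data */
    char b[10][100] = {{0}}; /* zero-filled, so every stored string is terminated */
    while (1)
    {
        a = getchar();
        if (a == EOF) return 0;
        else if (a == '\n')
        {
            if (++flx == 10) return 0; /* all 10 rows are in use */
            fly = 0;
        }
        else if (fly < 99)   /* leave room for the terminating '\0' */
        {
            b[flx][fly++] = (char)a;
        }
    }
}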
Here I use a two-dimensional array to store the strings. I read the input character by character. First I create an infinite loop which continues reading characters. If the user enters the end-of-file character, the input stops. If there is a newline character, the flx variable is incremented and the next characters are stored in the next array position. You can refer to the strings stored with b[n], where n is the index.
The function that you should probably look at is fgets. At least on my system, the definition is as follows:
char *fgets(char * restrict str, int size, FILE * restrict stream);
So a very simple program to read input from the keyboard would run something like this:
#include <stdio.h>
#include <stdlib.h>
#define MAXSTRINGSIZE 128
int main(void)
{
char array[2][MAXSTRINGSIZE];
int i;
void *result;
for (i = 0; i < 2; i++)
{
printf("Input String %d: ", i);
result = fgets(&array[i][0], MAXSTRINGSIZE, stdin);
if (result == NULL) exit(1);
}
printf("String 1: %s\nString 2: %s\n", &array[0][0], &array[1][0]);
exit(0);
}
That compiles and runs correctly on my system. The only issue with fgets, though, is that it retains the newline character \n in the string. So if you don't want that, you will need to remove it. As for the FILE * parameter, stdin is a predefined FILE * stream that refers to standard input, or file descriptor 0. There are also stdout for standard output (file descriptor 1) and stderr for error messages and diagnostics (file descriptor 2). The file descriptor numbers correspond to the ones used in a shell like so:
$$$-> cat somefile > someotherfile 2>&1
What that does is take the output of file descriptor 2 and redirect it to 1, with 1 in turn being redirected to a file. In addition, I am using the & operator because we are addressing parts of an array, and the functions in question (fgets, printf) require pointers. As for the result, the man page for gets and fgets states the following:
RETURN VALUES
Upon successful completion, fgets() and gets() return a pointer to the string. If end-of-file occurs before any characters are read,
they return NULL and the buffer contents remain unchanged. If an
error occurs, they return NULL and the buffer contents are
indeterminate. The fgets() and gets() functions do not distinguish
between end-of-file and error, and callers must use feof(3) and
ferror(3) to determine which occurred.
So to make your code more robust, if you get a NULL result, you need to check for errors using ferror or for end of file using feof, and respond appropriately. Furthermore, never EVER use gets. The only way that you can use it securely is that you have to have the ability to see into the future, which clearly nobody can do, so it cannot be used securely. It will just open you up to a buffer overflow attack.

Using fscanf until EOF without using getc

I have a text file composed of sequences of 2 bytes which I have to store in an array.
I have declared FILE *ptr.
How can I loop until EOF without using the method:
while ((c = getc(ptr)) != EOF)
{
// do something
}
I want to implement something along the lines of (PSEUDOCODE):
while (ptr is not pointing to the end of the file)
{
fscanf(...) // I will read in the bytes etc.
}
The getc() method wouldn't work well for me because I am reading in blocks of 2 bytes at a time.
You can use fread to read more than one byte at a time. fread returns the number of items it was able to read.
For example, to read 2-byte chunks you might use:
while (fread(target, 2, 1, ptr) == 1) {
/* ... */
}
Here 2 is the number of bytes in each "item", and 1 is the number of "items" you want to read on each call.
In general, you shouldn't use feof() to control when to terminate an input loop. Use the value returned by whichever input routine you're using. Different input functions vary in the information they provide; you'll have to read the documentation for the one you're using.
Note that this will treat an end-of-line as a single '\n' character. You say you're reading from a text file; it's not clear how you want to handle line endings. You should also decide what you want to do if the file has an odd number of characters.
Another option is to call getc() twice in the loop, checking its result both times.
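As a rough sketch of the fread loop in context, the helper below stores each 2-byte item in an array. The name read_pairs and the array layout are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: fills pairs[] with up to max 2-byte items read from
   ptr and returns how many complete items were stored. */
size_t read_pairs(FILE *ptr, unsigned char pairs[][2], size_t max)
{
    size_t count = 0;

    /* fread() returns the number of complete items read, so a leftover
       single byte at the end of the file makes it return 0. */
    while (count < max && fread(pairs[count], 2, 1, ptr) == 1)
        count++;
    return count;
}
```

If the file has an odd number of bytes, the last byte is consumed but not counted; check feof() and ferror() afterwards if you need to know why the loop stopped.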
The only way to tell when you've reached the end of the file is when you try to read past it, and the read fails. (Yes, there is an feof() function, but it only returns true after you've tried to read past the end of the file.)
This means that, if you're going to use fscanf() to read your input, it's the return value of fscanf() itself that you need to check.
Specifically, fscanf() returns the number of items it has successfully read, or EOF (which is a negative value, typically -1) if the input ended before anything at all could be read. Thus, your input loop might look something like this:
while (1) {
/* ... */
int n = fscanf(ptr, "...", ...);
if (n == EOF && !ferror(ptr)) {
/* we've reached the end of the input; stop the loop */
break;
} else if (n < items_requested) {
if (ferror(ptr)) perror("Error reading input file");
else fprintf(stderr, "Parse error or unexpected end of input!\n");
exit(1); /* or do whatever you want to handle the error */
}
/* ... */
}
That said, there may be other options, too. For example, if your input is structured as lines (which a lot of text input is), you may be better off reading the input line by line with fgets(), and then parsing the lines e.g. with sscanf().
Also, technically, there is a way to peek one byte ahead in the input, using ungetc(). That is, you could do something like this:
int c;
while ((c = getc(ptr)) != EOF) {
ungetc(c, ptr);
/* ... now read and parse the input ... */
}
The problem is that this only checks that you can read one more byte before EOF; it doesn't, and can't, actually check that your fscanf() call will have enough data to match all the requested items. Thus, you still need to check the return value of fscanf() anyway — and if you're going to do that, you might as well use it for EOF detection too.
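To show the return-value check on something concrete, here is a minimal sketch that reads whitespace-separated integers. The helper name read_ints and the choice of %d are assumptions; substitute whatever format your data needs.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to max integers from ptr into out.
   Returns the number read, or -1 on a parse or read error. */
int read_ints(FILE *ptr, int *out, int max)
{
    int count = 0;

    while (count < max) {
        int n = fscanf(ptr, "%d", &out[count]);

        if (n == 1) {
            count++;               /* one item converted, keep going */
        } else if (n == EOF && !ferror(ptr)) {
            break;                 /* clean end of input */
        } else {
            return -1;             /* n == 0 (bad token) or a read error */
        }
    }
    return count;
}
```

Note that feof() never appears in the loop: the return value of fscanf() carries all the information needed.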

Use of fgets() and gets()

#include <stdlib.h>
#include <stdio.h>
int main() {
char ch, file_name[25];
FILE *fp;
printf("Enter the name of file you wish to see\n");
gets(file_name);
fp = fopen(file_name,"r"); // is for read mode
if (fp == NULL) {
printf(stderr, "There was an Error while opening the file.\n");
return (-1);
}
printf("The contents of %s file are :\n", file_name);
while ((ch = fgetc(fp)) != EOF)
printf("%c",ch);
fclose(fp);
return 0;
}
This code seems to work but I keep getting a warning stating "warning: this program uses gets(), which is unsafe."
So I tried to use fgets() but I get an error which states "too few arguments to function call expected 3".
Is there a way around this?
First : Never use gets() .. it can cause buffer overflows
second: show us how you used fgets() .. the correct way should look something like this:
fgets(file_name,sizeof(file_name),fp); // if fp has been opened
fgets(file_name,sizeof(file_name),stdin); // if you want to input the file name on the terminal
// argument 1 -> name of the array which will store the value
// argument 2 -> size of the input you want to take ( size of the input array to protect against buffer overflow )
// argument 3 -> input source
FYI:
fgets converts the whole input into a string by putting a \0 character at the end ..
If there was enough space then fgets will also get the \n from your input (stdin) .. to get rid of the \n and still make the whole input as a string , do this:
fgets(file_name,sizeof(file_name),stdin);
file_name[strcspn(file_name, "\n")] = '\0'; // overwrites the \n if there is one; needs #include <string.h>
Yes: fgets expects 3 arguments: the buffer (same as with gets), the size of the buffer and the stream to read from. In your case your buffer-size can be obtained with sizeof file_name and the stream you want to read from is stdin. All in all, this is how you'll call it:
fgets(file_name, sizeof file_name, stdin);
The reason gets is unsafe is because it doesn't (cannot) know the size of the buffer that it will read into. Therefore it is prone to buffer-overflows because it will just keep on writing to the buffer even though it's full.
fgets doesn't have this problem because it makes you provide the size of the buffer.
ADDIT: your call to printf inside the if( fp == NULL ) is invalid. printf expects as its first argument the format, not the output stream. I think you want to call fprintf instead.
Finally, in order to correctly detect EOF in your while-condition you must declare ch as an int. EOF may not necessarily fit into a char, but it will fit in an int (and getc also returns an int). You can still print it with %c.
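Putting those two points together (strip the newline that fgets keeps, and hold the result of fgetc in an int), a sketch might look like this. The helper names strip_newline and copy_stream are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: removes the trailing newline left by fgets(), if any. */
void strip_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}

/* Hypothetical helper: copies everything from in to out and returns the
   number of bytes copied. */
long copy_stream(FILE *in, FILE *out)
{
    int ch;          /* int, so EOF is distinct from every valid byte */
    long count = 0;

    while ((ch = fgetc(in)) != EOF) {
        fputc(ch, out);
        count++;
    }
    return count;
}
```

In the program from the question you would call fgets(file_name, sizeof file_name, stdin), then strip_newline(file_name) before fopen, and copy_stream(fp, stdout) in place of the printing loop. With a plain char, a byte of value 0xFF in the file can compare equal to EOF on some platforms and end the loop early.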
Rather than ask how to use fgets() you should either use google, or look at the Unix/Linux man page or the VisualStudio documentation for the function. There are hundreds of functions in C, C++ and lots of class objects. You need to first figure out how to answer the basics yourself, so that your real questions stand a chance of being answered.
If you are new to C, you are definitely doing the right thing of experimenting, but take a look at other code, as you go along, to learn some of the tips/tricks of how code is written.

File Handling question on C programming

I want to read line by line from a given input file, process each line (i.e. its words), and then move on to the next line.
So I am using fscanf(fptr,"%s",words) to read a word, and it should stop once it encounters the end of the line.
But I guess this is not possible with fscanf, so please tell me what to do instead.
I need to read all the words in the given line (i.e. until the end of the line is encountered), then move on to the next line and repeat the same process.
Use fgets(). Yeah, link is to cplusplus, but it originates from c stdio.h.
You may also use sscanf() to read words from string, or just strtok() to separate them.
In response to comment: this behavior of fgets() (leaving \n in the string) allows you to determine if the actual end-of-line was encountered. Note, that fgets() may also read only part of the line from file if supplied buffer is not large enough. In your case - just check for \n in the end and remove it, if you don't need it. Something like this:
// actually you'll get str contents from fgets()
char str[MAX_LEN] = "hello there\n";
size_t len = strlen(str);
if (len && str[len-1] == '\n') {
str[len-1] = 0;
}
Simple as that.
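For the sscanf() route, one way to walk through the words of a line you already hold is the %n directive, which reports how many characters have been consumed so far. A minimal sketch, with count_words as a hypothetical helper name:

```c
#include <stdio.h>

/* Hypothetical helper: counts the whitespace-separated words in line. */
int count_words(const char *line)
{
    char word[64];
    int count = 0;
    int used = 0;

    /* %63s skips leading whitespace and reads one word; %n then stores how
       many characters were consumed, so we can advance past that word. */
    while (sscanf(line, "%63s%n", word, &used) == 1) {
        count++;          /* process 'word' here */
        line += used;
    }
    return count;
}
```

Unlike strtok(), this leaves the original line untouched. A word longer than 63 characters would be split into several pieces by the width limit.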
If you are working on a system with the GNU extensions available (getline has also been part of POSIX since POSIX.1-2008), there is something called getline (man 3 getline) which allows you to read a file line by line and which will allocate extra memory for you if needed. The manpage contains an example which I modified to split the line using strtok (man 3 strtok).
#define _POSIX_C_SOURCE 200809L /* for getline() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>    /* strtok() */
#include <sys/types.h> /* ssize_t */
int main(void)
{
    FILE *fp;
    char *line = NULL;
    size_t len = 0;
    ssize_t read;
    const char *delim = " \n";
    fp = fopen("/etc/motd", "r");
    if (fp == NULL)
    {
        printf("File open failed\n");
        return 1;
    }
    while ((read = getline(&line, &len, fp)) != -1) {
        // At this point we have a line held within 'line'
        printf("Line: %s", line);
        // strtok() modifies 'line' in place, replacing delimiters with '\0'
        char *ptr = strtok(line, delim);
        while (ptr != NULL)
        {
            printf("Word: %s\n", ptr);
            ptr = strtok(NULL, delim);
        }
    }
    free(line); // free(NULL) is a no-op, so no check is needed
    fclose(fp);
    return 0;
}
Given the buffering inherent in all the stdio functions, I would be tempted to read the stream character by character with getc(). A simple finite state machine can identify word boundaries, and line boundaries if needed. An advantage is the complete lack of buffers to overflow, aside from whatever buffer you collect the current word in if your further processing requires it.
You might want to do a quick benchmark comparing the time required to read a large file completely with getc() vs. fgets()...
If an outside constraint requires that the file really be read a line at a time (for instance, if you need to handle line-oriented input from a tty) then fgets() probably is your friend as other answers point out, but even then the getc() approach may be acceptable as long as the input stream is running in line-buffered mode which is common for stdin if stdin is on a tty.
Edit: To have control over the buffer on the input stream, you might need to call setbuf() or setvbuf() to force it to a buffered mode. If the input stream ends up unbuffered, then using an explicit buffer of some form will always be faster than getc() on a raw stream.
Best performance would probably use a buffer related to your disk I/O, at least two disk blocks in size and probably a lot more than that. Often, even that performance can be beat by arranging the input to be a memory mapped file and relying on the kernel's paging to read and fill the buffer as you process the file as if it were one giant string.
Regardless of the choice, if performance is going to matter then you will want to benchmark several approaches and pick the one that works best in your platform. And even then, the simplest expression of your problem may still be the best overall answer if it gets written, debugged and used.
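As an illustration of the getc() state machine idea, here is a minimal sketch that counts words and lines without any line buffer. The function name count_words_lines is an assumption, and "word" here simply means a run of non-whitespace characters.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: counts words and newlines in fp with a two-state
   machine (inside a word / between words). */
void count_words_lines(FILE *fp, long *words, long *lines)
{
    int c;             /* int, so EOF can be detected */
    int in_word = 0;   /* the current state */

    *words = 0;
    *lines = 0;
    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            (*lines)++;
        if (isspace(c)) {
            in_word = 0;           /* a word, if any, has just ended */
        } else if (!in_word) {
            in_word = 1;           /* first character of a new word */
            (*words)++;
        }
    }
}
```

If you need the text of each word rather than a count, you still have to collect it in a bounded buffer, which is the one buffer mentioned above.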
but this is not possible in fscanf,
It is, with a bit of wickedness ;)
Update: More clarification on evilness
but unfortunately a bit wrong. I assume [^\n]%*[^\n] should read [^\n]%*. Moreover, one should note that this approach will strip whitespaces from the lines. – dragonfly
Note that xstr(MAXLINE) [^\n] reads up to MAXLINE characters, which can be anything except the newline character (i.e. \n). The second part of the specifier, i.e. %*[^\n], discards the rest of the line (that's why the * character is there) if the line has more than MAXLINE characters, up to but NOT including the newline character. The newline character tells scanf to stop matching. What if we did as dragonfly suggested? The only problem is scanf will not know where to stop and will keep suppressing assignment until the next newline is hit (which is another match for the first part). Hence you will trail by one line of input when reporting.
What if you wanted to read in a loop? A little modification is required. We need to add a getchar() to consume the unmatched newline. Here's the code:
#include <stdio.h>
#define MAXLINE 255
/* stringify macros: these work only in pairs, so keep both */
#define str(x) #x
#define xstr(x) str(x)
int main() {
char line[ MAXLINE + 1 ];
/*
Wickedness explained: we read from `stdin` to `line`.
The format specifier is the only tricky part: We don't
bite off more than we can chew -- hence the specification
of maximum number of chars i.e. MAXLINE. However, this
width has to go into a string, so we stringify it using
macros. The careful reader will observe that once we have
read MAXLINE characters we discard the rest upto and
including a newline.
*/
int n = fscanf(stdin, "%" xstr(MAXLINE) "[^\n]%*[^\n]", line);
if (!feof(stdin)) {
getchar();
}
while (n == 1) {
printf("[line:] %s\n", line);
n = fscanf(stdin, "%" xstr(MAXLINE) "[^\n]%*[^\n]", line);
if (!feof(stdin)) {
getchar();
}
}
return 0;
}
