Why does C print an extra line when reading from a file?

Why does C print an extra line when reading from a file? - c

I'm brand new to C and trying to learn how to read a file. My file is a simple file (just for testing) which contains the following:
this file
has been
successfully read
by C!
So I read the file using the following C code:
#include <stdio.h>
int main() {
char str[100];
FILE *file = fopen("/myFile/path/test.txt", "r");
if(file == NULL) {
puts("This file does not exist!");
return -1;
}
while(fgets(str, 100, file) != '\0') {
puts(str);
}
fclose(file);
return 0;
}
This prints my text like this:
this file
has been
successfully read
by C!
When I compile and run it, I pipe its output to hexdump -C and can see an extra 0a at the end of each line.
Finally, why do I need to declare an array of chars to read from a file? What if I don't know how much data is on each line?

fgets() reads up to the newline and keeps the newline in the string and puts() always adds a newline to the string it is given to print. Hence you get double-spaced output when used as in your code.
Use fputs(str, stdout) instead of puts(); it does not add a newline.
The obsolete function gets() — removed from the 2011 version of the C standard — read up to the newline but removed it. The gets() and puts() pair worked well together, as do fgets() and fputs(). However, you should certainly NOT use gets(); it is a catastrophe waiting to happen. (The first internet worm in 1988 used gets() to migrate — Google search for 'morris internet worm').
In comments, inquisitor asked:
Why does the line need to be read into a char array of a specific size?
Because you need to make sure you don't overrun the space that is available. C does not do automatic allocation of space for strings. That is one of its weaknesses from some viewpoints; it is also a strength, but it routinely confuses newcomers to the language. If you want the input code to allocate enough space for a line, use the POSIX function getline().
So is it better to just read and output until I hit a '\0' since I won't always know the amount of chars on a given line?
No. In general, you won't hit '\0'; most text files do not contain any of those. If you don't want to allocate enough space for a line, then use:
int c;
while ((c = getchar()) != EOF)
putchar(c);
which reads one character at a time in the user code, but the underlying standard I/O packages buffer the input up so it isn't too costly — it is perfectly feasible to implement a program that way. If you need to work on lines, either allocate enough space for lines (I use char buffer[4096]; routinely) or use getline().
And Charlie Burns asked in a comment:
Why don't we see getline() suggested more often?
I think it is not mentioned all that often because getline() is relatively new, and not necessarily available everywhere yet. It was added to POSIX 2008; it is available on Linux and BSD. I'm not sure about the other mainline Unix variants (AIX, HP-UX, Solaris). It isn't hard to write for yourself (I've done it), but it is a nuisance if you need to write portable code (especially if 'portable' includes 'Microsoft'). One of its merits is that it tells you how long the line it read actually was.
Example using getline()
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
char *line = 0;
size_t length = 0;
char const name[] = "/myFile/path/test.txt";
FILE *file = fopen(name, "r");
if (file == NULL)
{
fprintf(stderr, "%s: failed to open file %s\n", argv[0], name);
return -1;
}
while (getline(&line, &length, file) > 0)
fputs(str, stdout);
free(line);
fclose(file);
return 0;
}

fgets saves the newline character at the end of the line when reading line by line. This allows you to determine wether actually a line was read or just your buffer was too small.
puts always adds a newline when printing.
Either trim off the newline from fgets or use printf
printf("%s", str);

Related

How to get fscanf to stop if it hits a newline? [duplicate]

I'm trying to read a line using the following code:
while(fscanf(f, "%[^\n\r]s", cLine) != EOF )
{
/* do something with cLine */
}
But somehow I get only the first line every time. Is this a bad way to read a line? What should I fix to make it work as expected?

It's almost always a bad idea to use the fscanf() function as it can leave your file pointer in an unknown location on failure.
I prefer to use fgets() to get each line in and then sscanf() that. You can then continue to examine the line read in as you see fit. Something like:
#define LINESZ 1024
char buff[LINESZ];
FILE *fin = fopen ("infile.txt", "r");
if (fin != NULL) {
while (fgets (buff, LINESZ, fin)) {
/* Process buff here. */
}
fclose (fin);
}
fgets() appears to be what you're trying to do, reading in a string until you encounter a newline character.

If you want read a file line by line (Here, line separator == '\n') just make that:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
FILE *fp;
char *buffer;
int ret;
// Open a file ("test.txt")
if ((fp = fopen("test.txt", "r")) == NULL) {
fprintf(stdout, "Error: Can't open file !\n");
return -1;
}
// Alloc buffer size (Set your max line size)
buffer = malloc(sizeof(char) * 4096);
while(!feof(fp))
{
// Clean buffer
memset(buffer, 0, 4096);
// Read a line
ret = fscanf(fp, "%4095[^\n]\n", buffer);
if (ret != EOF) {
// Print line
fprintf(stdout, "%s\n", buffer);
}
}
// Free buffer
free(buffer);
// Close file
fclose(fp);
return 0;
}
Enjoy :)

If you try while( fscanf( f, "%27[^\n\r]", cLine ) == 1 ) you might have a little more luck. The three changes from your original:
length-limit what gets read in - I've used 27 here as an example, and unfortunately the scanf() family require the field width literally in the format string and can't use the * mechanism that the printf() can for passing the value in
get rid of the s in the format string - %[ is the format specifier for "all characters matching or not matching a set", and the set is terminated by a ] on its own
compare the return value against the number of conversions you expect to happen (and for ease of management, ensure that number is 1)
That said, you'll get the same result with less pain by using fgets() to read in as much of a line as will fit in your buffer.

Using fscanf to read/tokenise a file always results in fragile code or pain and suffering. Reading a line, and tokenising or scanning that line is safe, and effective. It needs more lines of code - which means it takes longer to THINK about what you want to do (and you need to handle a finite input buffer size) - but after that life just stinks less.
Don't fight fscanf. Just don't use it. Ever.

It looks to me like you're trying to use regex operators in your fscanf string. The string [^\n\r] doesn't mean anything to fscanf, which is why your code doesn't work as expected.
Furthermore, fscanf() doesn't return EOF if the item doesn't match. Rather, it returns an integer that indicates the number of matches--which in your case is probably zero. EOF is only returned at the end of the stream or in case of an error. So what's happening in your case is that the first call to fscanf() reads all the way to the end of the file looking for a matching string, then returns 0 to let you know that no match was found. The second call then returns EOF because the entire file has been read.
Finally, note that the %s scanf format operator only captures to the next whitespace character, so you don't need to exclude \n or \r in any case.
Consult the fscanf documentation for more information: http://www.cplusplus.com/reference/clibrary/cstdio/fscanf/

Your loop has several issues. You wrote:
while( fscanf( f, "%[^\n\r]s", cLine ) != EOF )
/* do something */;
Some things to consider:
fscanf() returns the number of items stored. It can return EOF if it reads past the end of file or if the file handle has an error. You need to distinguish a valid return of zero in which case there is no new content in the buffer cLine from a successfully read.
You do a have a problem when a failure to match occurs because it is difficult to predict where the file handle is now pointing in the stream. This makes recovery from a failed match harder to do than might be expected.
The pattern you wrote probably doesn't do what you intended. It is matching any number of characters that are not CR or LF, and then expecting to find a literal s.
You haven't protected your buffer from an overflow. Any number of characters may be read from the file and written to the buffer, regardless of the size allocated to that buffer. This is an unfortunately common error, that in many cases can be exploited by an attacker to run arbitrary code of the attackers choosing.
Unless you specifically requested that f be opened in binary mode, line ending translation will happen in the library and you will generally never see CR characters, and usually not in text files.
You probably want a loop more like the following:
while(fgets(cLine, N_CLINE, f)) {
/* do something */ ;
}
where N_CLINE is the number of bytes available in the buffer starting a cLine.
The fgets() function is a much preferred way to read a line from a file. Its second parameter is the size of the buffer, and it reads up to 1 less than that size bytes from the file into the buffer. It always terminates the buffer with a nul character so that it can be safely passed to other C string functions.
It stops on the first of end of file, newline, or buffer_size-1 bytes read.
It leaves the newline character in the buffer, and that fact allows you to distinguish a single line longer than your buffer from a line shorter than the buffer.
It returns NULL if no bytes were copied due to end of file or an error, and the pointer to the buffer otherwise. You might want to use feof() and/or ferror() to distinguish those cases.

i think the problem with this code is because when you read with %[^\n\r]s, in fact, you reading until reach '\n' or '\r', but you don't reading the '\n' or '\r' also.
So you need to get this character before you read with fscanf again at loop.
Do something like that:
do{
fscanf(f, "%[^\n\r]s", cLine) != EOF
/* Do something here */
}while(fgetc(file) != EOF)

Scans multiple lines in C

I am new to C programming. I am taking a class where I have to:
The program will take all input from standard input, possibly transform it, and output it to standard output.
The program will read in input line by line. Transformations, if any, will be done per line. Then print out the transformed line.
You will have to read from the user until there is no more text left. Ctrl+D can be typed into the terminal to indicate there is no text left.
I am not a student who is looking for the answer to be done for me, but I am completely lost here. I tried to use:
char*buf = NULL;
while (fscanf(stdin, "%ms", &buf) > 0)
{ do transform }
but I have no luck. So any help is appreciated. Also I have no idea about the Ctrl+D part.

char*buf = NULL;
while (fscanf(stdin, "%ms", &buf) > 0)
has the following problems.
buf does not point to anything valid where input can be read and stored.
%ms is not a standard C format specifier (it is supported in POSIX standard compliant platforms, thanks #JonathanLeffler).
It will be better to use fgets to read lines of text.
I sugguest:
// Make LINE_LENGTH large enough for your needs.
#define LINE_LENGTH 200
char buf[LINE_LENGTH];
while ( fgets(buf, LINE_LENGTH, stdin) != NULL )
{
// Use buf
}

Ctrl+D is EOF usually so just check for that. fscanf(stdin, "%ms", &buf)!=EOF
Also you reserved just a pointer to char, you should either statically reserve array or do dynamic allocation.
char buf[255];
or
char *buf = (char*) malloc(255);
EDIT:
As noted by Jonathan Leffler fscanf() is really terrible idea if your lines don't have specific format use fgets() https://www.tutorialspoint.com/c_standard_library/c_function_fgets.htm

Since you tagged as C++, try this:
std::string text;
std::getline(cin, text);
The std::string will dynamically expand as necessary.
The getline function will read until an end-of-line character is read.
Much safer than reading into a character array.

C : Best way to go to a known line of a file

I have a file in which I'd like to iterate without processing in any sort the current line. What I am looking for is the best way to go to a determined line of a text file. For example, storing the current line into a variable seems useless until I get to the pre-determined line.
Example :
file.txt
foo
fooo
fo
here
Normally, in order to get here, I would have done something like :
FILE* file = fopen("file.txt", "r");
if (file == NULL)
perror("Error when opening file ");
char currentLine[100];
while(fgets(currentLine, 100, file))
{
if(strstr(currentLine, "here") != NULL)
return currentLine;
}
But fgetswill have to read fully three line uselessly and currentLine will have to store foo, fooo and fo.
Is there a better way to do this, knowing that here is line 4? Something like a go tobut for files?

Since you do not know the length of every line, no, you will have to go through the previous lines.
If you knew the length of every line, you could probably play with how many bytes to move the file pointer. You could do that with fseek().

You cannot access directly to a given line of a textual file (unless all lines have the same size in bytes; and with UTF8 everywhere a Unicode character can take a variable number of bytes, 1 to 6; and in most cases lines have various length - different from one line to the next). So you cannot use fseek (because you don't know in advance the file offset).
However (at least on Linux systems), lines are ending with \n (the newline character). So you could read byte by byte and count them:
int c= EOF;
int linecount=1;
while ((c=fgetc(file)) != EOF) {
if (c=='\n')
linecount++;
}
You then don't need to store the entire line.
So you could reach the line #45 this way (using while ((c=fgetc(file)) != EOF) && linecount<45) ...) and only then read entire lines with fgets or better yet getline(3) on POSIX systems (see this example). Notice that the implementation of fgets or of getline is likely to be built above fgetc, or at least share some code with it. Remember that <stdio.h> is buffered I/O, see setvbuf(3) and related functions.
Another way would be to read the file in two passes. A first pass stores the offset (using ftell(3)...) of every line start in some efficient data structure (a vector, an hashtable, a tree...). A second pass use that data structure to retrieve the offset (of the line start), then use fseek(3) (using that offset).
A third way, POSIX specific, would be to memory-map the file using mmap(2) into your virtual address space (this works well for not too huge files, e.g. of less than a few gigabytes). With care (you might need to mmap an extra ending page, to ensure the data is zero-byte terminated) you would then be able to use strchr(3) with '\n'
In some cases, you might consider parsing your textual file line by line (using appropriately fgets, or -on Linux- getline, or generating your parser with flex and bison) and storing each line in a relational database (such as PostGreSQL or sqlite).
PS. BTW, the notion of lines (and the end-of-line mark) vary from one OS to the next. On Linux the end-of-line is a \n character. On Windows lines are rumored to end with \r\n, etc...

A FILE * in C is a stream of chars. In a seekable file, you can address these chars using the file pointer with fseek(). But apart from that, there are no "special characters" in files, a newline is just another normal character.
So in short, no, you can't jump directly to a line of a text file, as long as you don't know the lengths of the lines in advance.
This model in C corresponds to the files provided by typical operating systems. If you think about it, to know the starting points of individual lines, your file system would have to store this information somewhere. This would mean treating text files specially.
What you can do however is just count the lines instead of pattern matching, something like this:
#include <stdio.h>
int main(void)
{
char linebuf[1024];
FILE *input = fopen("seekline.c", "r");
int lineno = 0;
char *line;
while (line = fgets(linebuf, 1024, input))
{
++lineno;
if (lineno == 4)
{
fputs("4: ", stdout);
fputs(line, stdout);
break;
}
}
fclose(input);
return 0;
}

If you don't know the length of each line, you have to go through all of them. But if you know the line you want to stop you can do this:
while (!found && fgets(line, sizeof line, file) != NULL) /* read a line */
{
if (count == lineNumber)
{
//you arrived at the line
//in case of a return first close the file with "fclose(file);"
found = true;
}
else
{
count++;
}
}
At least you can avoid so many calls to strstr

C: read (text) file into array, can't see what's wrong?

first time posting here. I have looked at a few other peoples ways to do this, and one of the ways was nearly exactly the same way that I am trying to do. But, it doesn't work for me?
#include<stdio.h>
int main()
{
FILE *file;
char buffer[15];
char *text[12];
file = fopen("Text.txt", "r");
if(!file) {
printf("Failed");
return 1;
}
int count = 0;
while(fgets(buffer,sizeof buffer, file) != NULL) {
printf("%s", buffer);
text[count] = buffer;
count++;
}
printf("\n");
for (count=0;count<10;count++) {
printf("%s\n", text[count]);
}
fclose(file);
return 0;
}
Now on another site (while looking for a solution or help I found this http://www.daniweb.com/software-development/c/threads/316766/storing-string-in-a-array-reading-from-text-file
Where the person has done it the same way as me (apart from obviously slight differences in what they're reading etc).
My text file reads:
The
Quick
Brown
Fox
Jumps
Over
The
Lazy
Dog (all on their own lines).
Basically I want to read the file in line by line and save each line in the next space in the array.
In the code when I use the line
printf("%s", buffer);
It prints out each word of each line okay to the console window.
However, when I print out the contents of the array with the for loop it simply prints out "dog, dog, dog..." for each space in the array.
Can someone help me here? Am I doing it wrong?

You haven't allocated any memory for text. And you have to copy the value of buffer as it changes in each iteration. Something like:
while(fgets(buffer,sizeof buffer, file) != NULL) {
printf("%s", buffer);
text[count] = malloc(strlen(buffer) + 1);
strcpy(text[count], buffer);
count++;
}
Also, check the return value of malloc for allocation failure.

The buffer is always the same address, so all the elements of array text contains the same pointer (or are uninitialized).
You should consider (assuming a POSIX system):
initializing all your arrays by zeroing them (and also initialize all pointers to NULL), e.g. memset(buffer, 0, sizeof(buffer)); (and for your line buffer, clear it also at the end of the loop).
duplicating the read string with strdup i.e. code
text[count] = strdup(buffer);
and don't forget to free that when appropriate. Actually, you should check that strdup did not fail by returning a NULL pointer.
use getline to read a line (don't forget to initialize the pointer, perhaps to NULL and to eventually free it).
end your printf formatting string with a newline \n or else call fflush, notably before any input.
use the error code errno to display a better error message. Or just call perror like
file = fopen("Text.txt", "r");
if (!file) {
perror("fopen Text.txt failed");
exit (EXIT_FAILURE);
}
Take the habit to compile with all warnings and debugging information (e.g. gcc -Wall -g on Linux) and learn how to use the debugger (e.g. gdb), notably try running line by line your program.
You might be interested in coding for a strict (non POSIX) but plain C2011 standard. But then you have much less freedom to use many POSIX functions (like strdup etc...). I leave the boring exercise to code for strict C99 compliance to the reader (remember that C99 does not even know about directories and suppose nearly "flat" filesystems), because I am a Posix and Linux fan and don't care about non Posix systems (in particular I don't care and did not use Windows or Microsoft systems since 1990).
Also, try to read the code of some free software project (e.g. from freecode). You'll learn a lot

There are two problems in the code:
In the while loop you're storing the same pointer (buffer) over and over again into the array of pointers and you are overwriting the buffer contents too. You should probably advance the buffer pointer by the length of each read string.
In the for loop you're destroying the actual line count and potentially dereferencing uninitialized pointers. Don't do that.

File Handling question on C programming

I want to read line-by-line from a given input file,, process each line (i.e. its words) and then move on to other line...
So i am using fscanf(fptr,"%s",words) to read the word and it should stop once it encounters end of line...
but this is not possible in fscanf, i guess... so please tell me the way as to what to do...
I should read all the words in the given line (i.e. end of line should be encountered) to terminate and then move on to other line, and repeat the same process..

Use fgets(). Yeah, link is to cplusplus, but it originates from c stdio.h.
You may also use sscanf() to read words from string, or just strtok() to separate them.
In response to comment: this behavior of fgets() (leaving \n in the string) allows you to determine if the actual end-of-line was encountered. Note, that fgets() may also read only part of the line from file if supplied buffer is not large enough. In your case - just check for \n in the end and remove it, if you don't need it. Something like this:
// actually you'll get str contents from fgets()
char str[MAX_LEN] = "hello there\n";
size_t len = strlen(str);
if (len && str[len-1] == '\n') {
str[len-1] = 0;
}
Simple as that.

If you are working on a system with the GNU extensions available there is something called getline (man 3 getline) which allows you to read a file on a line by line basis, while getline will allocate extra memory for you if needed. The manpage contains an example which I modified to split the line using strtok (man 3 strtrok).
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
FILE * fp;
char * line = NULL;
size_t len = 0;
ssize_t read;
fp = fopen("/etc/motd", "r");
if (fp == NULL)
{
printf("File open failed\n");
return 0;
}
while ((read = getline(&line, &len, fp)) != -1) {
// At this point we have a line held within 'line'
printf("Line: %s", line);
const char * delim = " \n";
char * ptr;
ptr = (char * )strtok(line,delim);
while(ptr != NULL)
{
printf("Word: %s\n",ptr);
ptr = (char *) strtok(NULL,delim);
}
}
if (line)
{
free(line);
}
return 0;
}

Given the buffering inherent in all the stdio functions, I would be tempted to read the stream character by character with getc(). A simple finite state machine can identify word boundaries, and line boundaries if needed. An advantage is the complete lack of buffers to overflow, aside from whatever buffer you collect the current word in if your further processing requires it.
You might want to do a quick benchmark comparing the time required to read a large file completely with getc() vs. fgets()...
If an outside constraint requires that the file really be read a line at a time (for instance, if you need to handle line-oriented input from a tty) then fgets() probably is your friend as other answers point out, but even then the getc() approach may be acceptable as long as the input stream is running in line-buffered mode which is common for stdin if stdin is on a tty.
Edit: To have control over the buffer on the input stream, you might need to call setbuf() or setvbuf() to force it to a buffered mode. If the input stream ends up unbuffered, then using an explicit buffer of some form will always be faster than getc() on a raw stream.
Best performance would probably use a buffer related to your disk I/O, at least two disk blocks in size and probably a lot more than that. Often, even that performance can be beat by arranging the input to be a memory mapped file and relying on the kernel's paging to read and fill the buffer as you process the file as if it were one giant string.
Regardless of the choice, if performance is going to matter then you will want to benchmark several approaches and pick the one that works best in your platform. And even then, the simplest expression of your problem may still be the best overall answer if it gets written, debugged and used.

but this is not possible in fscanf,
It is, with a bit of wickedness ;)
Update: More clarification on evilness
but unfortunately a bit wrong. I assume [^\n]%*[^\n] should read [^\n]%*. Moreover, one should note that this approach will strip whitespaces from the lines. – dragonfly
Note that xstr(MAXLINE) [^\n] reads MAXLINE characters which can be anything except the newline character (i.e. \n). The second part of the specifier i.e. *[^\n] rejects anything (that's why the * character is there) if the line has more than MAXLINE characters upto but NOT including the newline character. The newline character tells scanf to stop matching. What if we did as dragonfly suggested? The only problem is scanf will not know where to stop and will keep suppressing assignment until the next newline is hit (which is another match for the first part). Hence you will trail by one line of input when reporting.
What if you wanted to read in a loop? A little modification is required. We need to add a getchar() to consume the unmatched newline. Here's the code:
#include <stdio.h>
#define MAXLINE 255
/* stringify macros: these work only in pairs, so keep both */
#define str(x) #x
#define xstr(x) str(x)
int main() {
char line[ MAXLINE + 1 ];
/*
Wickedness explained: we read from `stdin` to `line`.
The format specifier is the only tricky part: We don't
bite off more than we can chew -- hence the specification
of maximum number of chars i.e. MAXLINE. However, this
width has to go into a string, so we stringify it using
macros. The careful reader will observe that once we have
read MAXLINE characters we discard the rest upto and
including a newline.
*/
int n = fscanf(stdin, "%" xstr(MAXLINE) "[^\n]%*[^\n]", line);
if (!feof(stdin)) {
getchar();
}
while (n == 1) {
printf("[line:] %s\n", line);
n = fscanf(stdin, "%" xstr(MAXLINE) "[^\n]%*[^\n]", line);
if (!feof(stdin)) {
getchar();
}
}
return 0;
}