Reading a File Line by Line in C

Preface:
This question is about reading a file, line by line, and inserting each line into a linked list.
I have already written the implementation for the linked list, and tested the insert() function manually. This works.
I have also written the code for reading text from a file, and writing it out. Again, this also works.
OKAY: HERE'S MY QUESTION
How can I merge these concepts, and write a function that reads text from a file, line by line, and inserts each line as a node in the linked list?
When reading from a file, I do the following:
//Code for reading a file
int c;
while ((c = getc(f)) != EOF) {
    putchar(c); //Prints out the character
}
fclose(f); //Close the file
The insert() function takes two parameters, one being the linked list head node, and the second one being the dataEntry ("string") to be held in that node.
void insert(node_lin *head, char *dataEntry) { ... }
Hence, since the function getc gets each character separately, and the putchar writes out each character to the screen, I imagine the code to do something along these lines:
Read each character until the end of file (EOF)
For each character, until reaching a new line ('\n'), append it to the previously read characters (building a "string")
If reaching the end of the line, insert this "string" into the linked list
Repeat until reaching EOF
//Code for reading a file
int c;
while ((c = getc(f)) != EOF) {
    //Build a string here consisting of characters from c, until reaching a new line.
    /*
    if (c == '\n') { //Indicates a new line
        //Insert the line you have into the linked list: insert(myLinkedList, line);
    }
    */
}
fclose(f); //Close the file
The thing is, I already have a working read_file function, as well as a working insert() function. The thing I need help with is separating the file into lines, and inserting these.
Thanks guys!

Replace your character-by-character reading with something more high-level.
The most typical choice would be fgets(), but that requires you to specify a fixed limit for the line's length.
If you have getline() you can use that; it will handle any line length, but it is POSIX, not standard C.
Also, you should change your insert() function to accept const char * as the second argument (and remember to allocate memory inside and copy the text, of course).
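Here is a minimal sketch of what that could look like. The node_lin fields, the dummy head node and the read_lines_into_list() name are assumptions for illustration, not your real code; adapt them to your own list. It uses POSIX getline() so line length is not an issue.

/* Sketch only: node_lin fields and function names are guesses; adapt them. */
#define _POSIX_C_SOURCE 200809L   /* for getline() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct node_lin {
    char *data;
    struct node_lin *next;
} node_lin;

/* insert() now takes const char * and keeps its own copy of the text.
   It inserts right after an assumed dummy head node, so the list ends up
   in reverse file order; append at the tail instead if order matters. */
void insert(node_lin *head, const char *dataEntry)
{
    node_lin *n = malloc(sizeof *n);
    if (n == NULL)
        return;                              /* real code should report this */
    n->data = malloc(strlen(dataEntry) + 1);
    if (n->data == NULL) { free(n); return; }
    strcpy(n->data, dataEntry);
    n->next = head->next;
    head->next = n;
}

/* Read every line of f and insert it; getline() copes with any line length. */
void read_lines_into_list(FILE *f, node_lin *head)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    while ((len = getline(&line, &cap, f)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            line[len - 1] = '\0';            /* strip the newline */
        insert(head, line);
    }
    free(line);                              /* getline()'s buffer */
}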

You can use fgets to read an entire line from the file; it reads up to and including the newline character (or stops when the buffer is full):
fgets(buffer, 128, f);
When reading from a file, you can do the following:
//Code for reading a file
char buffer[128]; // decide the buffer size as per your requirements.
while (fgets(buffer, 128, f) != NULL) {
    printf("%s", buffer); // never pass file data directly as the format string
}
fclose(f); //Close the file
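To tie this back to your linked list, you would strip the newline that fgets keeps in the buffer and hand the buffer to insert() (assuming, as the other answer notes, that insert() makes its own copy of the text). A sketch:

char buffer[128];
while (fgets(buffer, sizeof buffer, f) != NULL) {
    buffer[strcspn(buffer, "\n")] = '\0';   /* strcspn() is in <string.h>; drops the '\n' fgets kept */
    insert(myLinkedList, buffer);           /* insert() must copy the text it is given */
}
fclose(f); //Close the file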

Related

C : Best way to go to a known line of a file

I have a file that I'd like to iterate over without doing any processing on the current line. What I am looking for is the best way to jump to a predetermined line of a text file. For example, storing the current line into a variable seems useless until I get to the predetermined line.
Example :
file.txt
foo
fooo
fo
here
Normally, in order to get here, I would have done something like:
FILE* file = fopen("file.txt", "r");
if (file == NULL)
    perror("Error when opening file ");

char currentLine[100];
while (fgets(currentLine, 100, file))
{
    if (strstr(currentLine, "here") != NULL)
        return currentLine;
}
But fgets will have to read three lines uselessly, and currentLine will have to store foo, fooo and fo in turn.
Is there a better way to do this, knowing that here is line 4? Something like a goto, but for files?
Since you do not know the length of every line, no, you will have to go through the previous lines.
If you knew the length of every line, you could probably play with how many bytes to move the file pointer. You could do that with fseek().
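For example, if every line (newline included) were exactly RECLEN bytes long, a sketch of jumping straight to line n could look like this (RECLEN and the function name are made up for illustration):

#include <stdio.h>

#define RECLEN 80   /* only valid if every line occupies exactly RECLEN bytes */

int print_fixed_line(FILE *f, long n)       /* n is 1-based */
{
    char buf[RECLEN + 1];
    if (fseek(f, (n - 1) * RECLEN, SEEK_SET) != 0)
        return -1;
    if (fread(buf, 1, RECLEN, f) != RECLEN)
        return -1;
    buf[RECLEN] = '\0';
    fputs(buf, stdout);                     /* the newline is part of the record */
    return 0;
}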
You cannot directly access a given line of a textual file (unless all lines have the same size in bytes; with UTF-8 everywhere, a Unicode character can take a variable number of bytes, 1 to 4; and in most cases lines have various lengths, different from one line to the next). So you cannot use fseek (because you don't know the file offset in advance).
However (at least on Linux systems), lines are ending with \n (the newline character). So you could read byte by byte and count them:
int c = EOF;
int linecount = 1;
while ((c = fgetc(file)) != EOF) {
    if (c == '\n')
        linecount++;
}
You then don't need to store the entire line.
So you could reach line #45 this way (using while ((c = fgetc(file)) != EOF && linecount < 45) ...) and only then read entire lines with fgets or better yet getline(3) on POSIX systems (see this example). Notice that the implementation of fgets or of getline is likely to be built above fgetc, or at least to share some code with it. Remember that <stdio.h> does buffered I/O; see setvbuf(3) and related functions.
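A sketch of that skip-then-read idea, with the target line number passed in (1-based):

#include <stdio.h>

/* Skip to the start of line `target` by counting '\n', then read that line. */
int read_nth_line(FILE *file, int target, char *buf, int bufsize)
{
    int c;
    int linecount = 1;
    while (linecount < target && (c = fgetc(file)) != EOF) {
        if (c == '\n')
            linecount++;
    }
    if (linecount < target)
        return -1;                          /* the file has fewer lines than that */
    return fgets(buf, bufsize, file) != NULL ? 0 : -1;
}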
Another way would be to read the file in two passes. A first pass stores the offset (using ftell(3)...) of every line start in some efficient data structure (a vector, a hash table, a tree...). A second pass uses that data structure to retrieve the offset of the line start, then uses fseek(3) with that offset.
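A sketch of that two-pass approach with a plain array as the "efficient data structure" (the fixed MAXLINES is just for illustration; real code would grow it dynamically):

#include <stdio.h>

#define MAXLINES 10000

static long offsets[MAXLINES];              /* offsets[i] = start of line i+1 */
static long nlines;

/* Pass 1: record where every line starts. */
void index_lines(FILE *f)
{
    int c;
    nlines = 0;
    offsets[nlines++] = ftell(f);
    while ((c = fgetc(f)) != EOF) {
        if (c == '\n' && nlines < MAXLINES)
            offsets[nlines++] = ftell(f);   /* a trailing '\n' leaves one extra, harmless entry */
    }
}

/* Pass 2: jump straight to line n (1-based) and read it. */
int fetch_line(FILE *f, long n, char *buf, int bufsize)
{
    if (n < 1 || n > nlines)
        return -1;
    if (fseek(f, offsets[n - 1], SEEK_SET) != 0)
        return -1;
    return fgets(buf, bufsize, f) != NULL ? 0 : -1;
}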
A third way, POSIX specific, would be to memory-map the file using mmap(2) into your virtual address space (this works well for files that are not too huge, e.g. less than a few gigabytes). With care (you might need to mmap an extra ending page, to ensure the data is zero-byte terminated) you would then be able to use strchr(3) with '\n'.
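A sketch of the mmap() route (Linux/POSIX only); it uses memchr() instead of strchr() so the mapping does not need to be zero-terminated:

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Print line `target` (1-based) of `path`, scanning the mapped bytes for '\n'. */
int print_nth_line(const char *path, long target)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return -1; }

    char *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) { close(fd); return -1; }

    char *p = base, *end = base + st.st_size;
    long line = 1;
    while (line < target && p < end) {
        char *nl = memchr(p, '\n', end - p);
        if (nl == NULL)
            break;                          /* fewer lines than requested */
        p = nl + 1;
        line++;
    }
    if (line == target && p < end) {
        char *nl = memchr(p, '\n', end - p);
        size_t len = nl ? (size_t)(nl - p) : (size_t)(end - p);
        fwrite(p, 1, len, stdout);
        putchar('\n');
    }
    munmap(base, st.st_size);
    close(fd);
    return 0;
}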
In some cases, you might consider parsing your textual file line by line (using fgets appropriately, or getline on Linux, or generating your parser with flex and bison) and storing each line in a relational database (such as PostgreSQL or SQLite).
PS. BTW, the notion of lines (and the end-of-line mark) vary from one OS to the next. On Linux the end-of-line is a \n character. On Windows lines are rumored to end with \r\n, etc...
A FILE * in C is a stream of chars. In a seekable file, you can address these chars using the file pointer with fseek(). But apart from that, there are no "special characters" in files, a newline is just another normal character.
So in short, no, you can't jump directly to a line of a text file, as long as you don't know the lengths of the lines in advance.
This model in C corresponds to the files provided by typical operating systems. If you think about it, to know the starting points of individual lines, your file system would have to store this information somewhere. This would mean treating text files specially.
What you can do however is just count the lines instead of pattern matching, something like this:
#include <stdio.h>

int main(void)
{
    char linebuf[1024];
    FILE *input = fopen("seekline.c", "r");
    int lineno = 0;
    char *line;

    if (input == NULL)
    {
        perror("seekline.c");
        return 1;
    }
    while ((line = fgets(linebuf, 1024, input)))
    {
        ++lineno;
        if (lineno == 4)
        {
            fputs("4: ", stdout);
            fputs(line, stdout);
            break;
        }
    }
    fclose(input);
    return 0;
}
If you don't know the length of each line, you have to go through all of them. But if you know the line at which you want to stop, you can do this:
/* Assumes: bool found = false; int count = 1; int lineNumber = target line;
   char line[...]; FILE *file; that is, the variables from your own code. */
while (!found && fgets(line, sizeof line, file) != NULL) /* read a line */
{
    if (count == lineNumber)
    {
        // you arrived at the line
        // in case of a return, first close the file with "fclose(file);"
        found = true;
    }
    else
    {
        count++;
    }
}
At least you can avoid so many calls to strstr().

Finding a substring of a string in a file C

I'm trying to selectively filter a text file by a string which is input to the standard input.
I would like to know why the following code does not work and how to fix it:
void get_filtered_list() {
    FILE *f;
    f = fopen("presentlist.txt", "r");
    printf("Enter the city by which you want to select lines:\n");
    char stringToFind[20];
    fgets(stringToFind, sizeof(stringToFind), stdin);
    char line[160];
    while (!feof(f)) {
        fgets(line, sizeof(line), f);
        if (strstr(line, stringToFind) != NULL) {
            printf("%s", line);
        }
    }
    fclose(f);
}
The code above opens a text file, reads it line by line, and for each line calls strstr() with the current line of the file as the first argument and the given city name as the second argument.
However what I get as a result is the ENTIRE contents of the file printed (and the last line prints twice, though this is a separate issue and I know the fix to this part).
The C book I'm reading states that the strstr() function is used to find a needle string in a haystack string, so it's the C equivalent of the C++ substr() function.
strstr() takes argument 1 as the haystack and argument 2 as the needle.
I first read in from the standard input into the needle, then line by line I check whether strstr() returns NULL or not (it should return NULL if the needle is not found in the haystack) and if it returns something other than NULL that means it found the substring in the string and it should only print the line THEN.
Instead it prints all of the lines in the file. Why?
If I switch it to if (strstr(line, stringToFind)) instead, then it prints absolutely nothing.
Why?
You do not find the string because you did not strip the trailing '\n' from the string read into stringToFind by fgets. Actually, you will find the string if and only if it is the last word on a line.
You can remove the linefeed with this:
#include <string.h>
stringToFind[strcspn(stringToFind, "\n")] = '\0';
There are other ways to strip the linefeed, but be aware that if the last line of the file does not end with a linefeed, there will not be one in the buffer filled by fgets, therefore you cannot just overwrite the last character of the line. For your problem, it would be a good idea to remove all whitespace characters at the beginning and at the end of stringToFind.
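A sketch of such a trimming helper (the name trim() is just for illustration):

#include <ctype.h>
#include <string.h>

/* Remove leading and trailing whitespace in place. */
void trim(char *s)
{
    char *start = s;
    while (isspace((unsigned char)*start))
        start++;
    size_t len = strlen(start);
    while (len > 0 && isspace((unsigned char)start[len - 1]))
        len--;
    memmove(s, start, len);                 /* shift the kept part to the front */
    s[len] = '\0';
}

Calling trim(stringToFind) right after the fgets() also takes care of the trailing '\n'.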
Also check this question: Why is “while ( !feof (file) )” always wrong?
Testing the end of file with while (!feof(f)) will catch the end of file too late: fgets will fail and you do not test its return value, so the last line of the file will appear to be handled twice. The correct way to write this loop is this:
while (fgets(line, sizeof(line), f)) {
if (strstr(line, stringToFind) != NULL) {
printf("%s", line);
}
}
Note also that lines longer than 159 characters will be split by fgets, which can cause incorrect output if they contain the searched string, and a match can be missed entirely if the string itself is split across two reads.
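A simple way to at least detect that situation is to check whether the chunk returned by fgets actually ends in a newline. A sketch, reusing the f and stringToFind from the question:

char line[160];
while (fgets(line, sizeof(line), f)) {
    if (strchr(line, '\n') == NULL && !feof(f)) {
        /* The buffer filled up before the end of the line was reached:
           this chunk is only part of a longer line. */
        fprintf(stderr, "warning: line longer than %zu characters\n",
                sizeof(line) - 1);
    }
    if (strstr(line, stringToFind) != NULL) {
        printf("%s", line);
    }
}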

How to make an array of words using a line from fgets?

My input is from a file with multiple lines of text. After obtaining a line with fgets(), how do I make an array containing the words from that line, which I can then iterate through? I.e. from "pink floyd" to {"pink", "floyd"}.
int main() {
    char line[500];
    while (fgets(line, sizeof(line), stdin) != NULL) {
        ...
    }
    return 0;
}
You can extract words from a line of text using the strtok() function.
See How does the strtok function in C work? and http://www.cplusplus.com/reference/cstring/strtok/.
The strtok() function will modify the contents of line[], but
I suppose that's OK for this usage because you just wrote a line of input there
and you will soon write another line of input over it.
You will have to allocate a separate array to hold the pointers to the individual words.
If you intend to keep using this array after reading the next line of input, you will
need to make new copies of the strings returned by strtok(), because what it returns
is pointers into char line[500] and the next line of input will overwrite them.
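A sketch pulling that together (MAXWORDS is an arbitrary limit for illustration; the word pointers point into line[], so copy them if you need them after the next fgets):

#include <stdio.h>
#include <string.h>

#define MAXWORDS 50

int main(void)
{
    char line[500];
    char *words[MAXWORDS];
    while (fgets(line, sizeof(line), stdin) != NULL) {
        int nwords = 0;
        /* Split on spaces, tabs and the trailing newline. */
        for (char *tok = strtok(line, " \t\n");
             tok != NULL && nwords < MAXWORDS;
             tok = strtok(NULL, " \t\n")) {
            words[nwords++] = tok;          /* points into line[] */
        }
        for (int i = 0; i < nwords; i++)
            printf("word %d: %s\n", i, words[i]);
    }
    return 0;
}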

Why does this code not specify which element of the array is accessed?

#include <stdio.h>
#include <stdlib.h>
FILE *fptr;
int main()
{
    char fileLine[100];
    fptr = fopen("C:\\Users\\user\\Desktop\\Summary.h", "r");
    if (fptr != 0) {
        while (!feof(fptr)) {
            fgets(fileLine, 100, fptr); // << not specified like fileLine[1] ?
            if (!feof(fptr)) {
                puts(fileLine); // The same thing ?
            }
        }
    }
    else
    {
        printf("\nError opening file.\n");
    }
    fclose(fptr);
    return 0;
}
What really puzzles me is why the array elements are not specified, and how the array holds the lines.
char fileLine[100];
This is not an array of lines, it's an array of characters. One char represents one character (or more precisely one byte). The declaration char fileLine[100] makes it an array of 100 characters. C doesn't have distinct types for strings and for arrays of characters: a string (such as the content of a line) is just an array of characters, with a null byte after the last character.
At each run through the loop, fileLine contains the line that was read by fgets. That string is printed out by puts. Each call to fgets overwrites the line that was previously stored there.
Note that since fgets retains the newline character that terminates each line, and puts adds a newline after printing the string, you will get double-spaced output. If a line is more than 99 characters long (strictly speaking, again, more than 99 bytes long), you'll get a line break after each block of 99 characters.
If you wanted to store all the lines, you'd need an array of strings, i.e. an array of arrays of characters.
char fileLines[42][100];
int i = 0;
while (!feof(fptr)) {
    fgets(fileLines[i], 100, fptr);
    ++i;
}
/* i-1 lines have been read, from fileLines[0] to fileLines[i-2] */
The way you're using feof is quite awkward there. feof tells you whether the last attempt to read reached the end of the file, not whether the next attempt to read would reach the end of the file. For example, here, after the last line has been read, feof() is false (because the program doesn't know yet that this is the last line, it has to attempt to read more); then fgets runs again, and returns NULL because it couldn't read anything. Nonetheless i is incremented; and after that feof() returns true, which terminates the loop. Thus i ends up being one plus the number of lines read.
While you can fix this here by decrementing i, the way that actually works even in real-life programs — and that also makes more sense — is to test the result of fgets. You know that you've reached the end of the file because fgets is unable to read a line.
char fileLines[42][100];
int i = 0;
while (fgets(fileLines[i], 100, fptr))
    ++i;
/* i lines have been read, from fileLines[0] to fileLines[i-1] */
(This is a toy example, real-life code would need dynamic memory management and error checks for long lines, too many lines, and read errors.)
The array of characters that is fileLine is treated as a string.

Why does C print an extra line when reading from a file?

I'm brand new to C and trying to learn how to read a file. My file is a simple file (just for testing) which contains the following:
this file
has been
successfully read
by C!
So I read the file using the following C code:
#include <stdio.h>
int main() {
    char str[100];
    FILE *file = fopen("/myFile/path/test.txt", "r");
    if (file == NULL) {
        puts("This file does not exist!");
        return -1;
    }
    while (fgets(str, 100, file) != '\0') {
        puts(str);
    }
    fclose(file);
    return 0;
}
This prints my text like this:
this file

has been

successfully read

by C!
When I compile and run it, I pipe its output to hexdump -C and can see an extra 0a at the end of each line.
Finally, why do I need to declare an array of chars to read from a file? What if I don't know how much data is on each line?
fgets() reads up to the newline and keeps the newline in the string and puts() always adds a newline to the string it is given to print. Hence you get double-spaced output when used as in your code.
Use fputs(str, stdout) instead of puts(); it does not add a newline.
The obsolete function gets() — removed from the 2011 version of the C standard — read up to the newline but removed it. The gets() and puts() pair worked well together, as do fgets() and fputs(). However, you should certainly NOT use gets(); it is a catastrophe waiting to happen. (The first internet worm in 1988 used gets() to migrate — Google search for 'morris internet worm').
In comments, inquisitor asked:
Why does the line need to be read into a char array of a specific size?
Because you need to make sure you don't overrun the space that is available. C does not do automatic allocation of space for strings. That is one of its weaknesses from some viewpoints; it is also a strength, but it routinely confuses newcomers to the language. If you want the input code to allocate enough space for a line, use the POSIX function getline().
So is it better to just read and output until I hit a '\0' since I won't always know the amount of chars on a given line?
No. In general, you won't hit '\0'; most text files do not contain any of those. If you don't want to allocate enough space for a line, then use:
int c;
while ((c = getchar()) != EOF)
putchar(c);
which reads one character at a time in the user code, but the underlying standard I/O packages buffer the input up so it isn't too costly — it is perfectly feasible to implement a program that way. If you need to work on lines, either allocate enough space for lines (I use char buffer[4096]; routinely) or use getline().
And Charlie Burns asked in a comment:
Why don't we see getline() suggested more often?
I think it is not mentioned all that often because getline() is relatively new, and not necessarily available everywhere yet. It was added to POSIX 2008; it is available on Linux and BSD. I'm not sure about the other mainline Unix variants (AIX, HP-UX, Solaris). It isn't hard to write for yourself (I've done it), but it is a nuisance if you need to write portable code (especially if 'portable' includes 'Microsoft'). One of its merits is that it tells you how long the line it read actually was.
Example using getline()
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char *line = 0;
    size_t length = 0;
    char const name[] = "/myFile/path/test.txt";
    FILE *file = fopen(name, "r");
    if (file == NULL)
    {
        fprintf(stderr, "%s: failed to open file %s\n", argv[0], name);
        return -1;
    }
    while (getline(&line, &length, file) > 0)
        fputs(line, stdout);
    free(line);
    fclose(file);
    return 0;
}
fgets saves the newline character at the end of the line when reading line by line. This lets you determine whether a whole line was actually read or your buffer was just too small.
puts always adds a newline when printing.
Either trim off the newline from fgets or use printf
printf("%s", str);
