How to skip white lines while reading text file? - c

I want to read a text file line by line, but I'm not interested in the white lines. What nice way is there of skipping the blank lines? I know I could read a line, check if it's blank and free it if it is, and so on until I reach a good line, but I'm wondering if there's some other way to do it.

I think your method is good enough. Technically you should even check if it's only spaces :-) Note that if you are using fscanf (quite used in homework problems), white line skipping is "Included in the price" :-) AND you don't have to fight against "this line is bigger than my buffer, what should I do?"

The general concept is fine: you read line by line and check whether each line contains a non-whitespace character. A fairly efficient way to check is strspn, for example:
#include <stdio.h>
#include <string.h>
int is_blank_line(const char *line) {
const char accept[]=" \t\r\n"; /* white space characters (fgets stores \n) */
return (strspn(line, accept) == strlen(line));
}
int main(int argc, char *argv[]) {
char line[256]; /* assuming no line is longer than 255 bytes, including the newline */
FILE *fp;
if ( argc < 2 ) {
fprintf(stderr, "Need a file name\n");
return -1;
}
fp = fopen(argv[1], "r");
if ( !fp ) {
perror(argv[1]);
return -1;
}
while (fgets(line, sizeof(line), fp)) { /* stops at end of file or on a read error */
if (is_blank_line(line)) {
continue;
}
printf("%s", line);
}
fclose(fp);
return 0;
}

If reading line by line, a simple check for '\n' is enough (in text mode the C library translates the platform's newline, such as \r\n, to '\n' for you).
If you use fread to read the whole file, use strtok or strtok_r to split it into lines with sep="\n"; empty lines will be chopped out automatically.
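As a rough illustration of the strtok approach, here is a minimal sketch. The function name split_nonempty_lines and its parameters are made up for this example, and it assumes the whole file has already been read into a writable, NUL-terminated buffer.

```c
#include <stddef.h>
#include <string.h>

/* Split buf in place into lines, storing up to max pointers in out.
 * strtok treats a run of '\r' or '\n' characters as one separator,
 * so empty lines never show up in the result.
 * Returns the number of lines stored. */
size_t split_nonempty_lines(char *buf, char **out, size_t max)
{
    size_t n = 0;
    char *tok = strtok(buf, "\r\n");

    while (tok != NULL && n < max) {
        out[n++] = tok;
        tok = strtok(NULL, "\r\n");
    }
    return n;
}
```

Note that a line containing only spaces or tabs is still returned, so combine this with a strspn check if those should be skipped as well.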

Related

how to extract data from file starting from 2nd line in c

I'm very new to this language, can you help me:
Instead of making the user input col, row, and direction (scanf), I want to extract the data from a file (format below).
From the file I do not want to extract the first line (5,6); I only want to extract the remaining lines.
Below is code that reads a file (using command-line arguments), but it reads the first line too and only prints the lines. I do not want to print the lines but to extract the data from the file instead of making the user input it.
File format:
colrow direction(starting from 2nd line)
5,6
A0 H
D0 V
C1 V
A4 H
F0 v
code of scanf
yourcolumn = getchar();
col = charToNum(yourcolumn); //function to input column
printf("enter row");
scanf("%d",&row);
printf("h: horizontally or v: vertically?\n");
scanf(" %c",&direction);
Code for extracting data from file:
#include <stdio.h>
int main(int argc, char* argv[])
{
char const* const fileName = argv[1]; /* should check that argc > 1 */
FILE* file = fopen(fileName, "r"); /* should check the result */
char line[256];
while (fgets(line, sizeof(line), file)) {
/* note that fgets don't strip the terminating \n, checking its
presence would allow to handle lines longer that sizeof(line) */
printf("%s", line);
}
/* may check feof here to make a difference between eof and io failure -- network
timeout for instance */
fclose(file);
return 0;
}
Since you are reading line by line, I suggest you restructure your file reading to match your logic
while (EOF != fscanf(file, "%[^\n]\n", line)) {
printf("> %s\n", line);
}
This is one way to read every line, one at a time. Look up the caveats of using fscanf and how to adjust the code so it cannot overflow your line buffer (for example, by giving the %[ conversion a maximum field width).
Then, if you want to skip the first line, your code could look like this
if (EOF != fscanf(file, "%[^\n]\n", line)) {
// skip the first line
}
while (EOF != fscanf(file, "%[^\n]\n", line)) {
printf("> %s\n", line);
}
And your processing logic will look a lot like your mental process.
Yes, you could use a line counter and only process lines once the counter is high enough, but it is generally better to avoid introducing variables you can live without. One extra variable doesn't make the code much harder to reason about, but after you've applied that "just one more variable" rationale five or six times, the code becomes harder to maintain and harder to reason about. By the time you reach twenty or more extra variables, the odds of changing the code quickly without breaking it are low.
Read the first line also with fgets() into a string and then scan the string for row, direction.
char line[256];
if (fgets(line, sizeof(line), file)) {
if (sscanf(line, "%d %c", &row, &direction) != 2) {
printf("Invalid first line '%s'\n", line);
} else {
while (fgets(line, sizeof(line), file)) {
printf("%s", line);
}
}
}
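Putting the suggestions in this thread together, here is a minimal sketch that skips the header line with fgets and parses each remaining "A0 H" style line with sscanf. The names Placement, parse_placement and read_placements are hypothetical, and converting the column letter by subtracting 'A' is an assumption (it stands in for the asker's charToNum and relies on an ASCII-like character set).

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical record for one "A0 H" style line. */
typedef struct {
    int col;        /* 0 for 'A', 1 for 'B', ... (assumes ASCII-like letters) */
    int row;
    char direction; /* stored upper-cased, e.g. 'H' or 'V' */
} Placement;

/* Parse one line such as "A0 H". Returns 1 on success, 0 otherwise. */
int parse_placement(const char *line, Placement *p)
{
    char colch, dir;
    int row;

    if (sscanf(line, " %c%d %c", &colch, &row, &dir) != 3)
        return 0;
    if (!isalpha((unsigned char)colch))
        return 0;
    p->col = toupper((unsigned char)colch) - 'A';
    p->row = row;
    p->direction = (char)toupper((unsigned char)dir);
    return 1;
}

/* Skip the first line of fp, then parse up to max placements.
 * Blank or malformed lines are ignored.
 * Returns the number of placements stored in out. */
size_t read_placements(FILE *fp, Placement *out, size_t max)
{
    char line[256];
    size_t n = 0;

    if (fgets(line, sizeof line, fp) == NULL)   /* header line, e.g. "5,6" */
        return 0;
    while (n < max && fgets(line, sizeof line, fp) != NULL) {
        if (parse_placement(line, &out[n]))
            n++;
    }
    return n;
}
```

A caller would open the file named in argv[1], pass the FILE pointer to read_placements, and then loop over the returned records instead of prompting the user.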

Best way to read this file while ignoring the new line character

Here's the file I want to read.
single
splash
single
V-Line
h-line
Macro for checking if string is equal.
#define STR_MATCH(a,b) (strncmp((a),(b),strlen(b)+1) == 0)
Here's what i'm using to read it.
void readMissilesFile(char* fileName)
{
FILE* mFile;
char missile[7];
/* Open the file. */
mFile = fopen(fileName, "r");
if (mFile != NULL)
{
while (!feof(mFile))
{
fgets(missile, 7, mFile);
if (!(STR_MATCH(missile, "\n")))
{
printf("Missile: %s", missile);
}
}
fclose(mFile);
}
else
{
perror("Could not open the file.");
}
}
So I'm having difficulties, as it's printing out spaces when I read the lines. I tried to avoid this by ensuring it only reads 7 characters, which is the max length of each missile. Then I made a macro, similar to strcmp, which just checks whether two strings are equal (to hopefully not print the empty ones).
Please find the macro attached as well.
Thanks in advance and any help is great. :)
If I understand your question correctly, you can remove the newline characters using strcspn.
You should not use feof like this; this post explains why. A safe way to read the file to the end is to use the return value of fgets as the condition of the while loop.
The container, missile, should be big enough for the longest string, the newline that fgets stores, and the terminating '\0' (so at least 8 bytes here). With only 7 bytes, fgets reads the six letters and leaves the newline behind for the next call.
Live sample
#include <string.h>
//...
char missile[10];
//...
if (mFile != NULL)
{
while (fgets(missile, 10, mFile)) //will read till there are no more lines
{
missile[strcspn(missile, "\r\n")] = '\0'; //remove newline characters
printf("Missile: %s ", missile);
}
}
//...
I would advise the reading of this post which has detailed info about fgets, namely the issue of newline characters consumption.
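To make the buffer-size point concrete, here is a minimal sketch of a helper that strips the line ending and also copes with lines that do not fit in the buffer. The name read_line and its return convention are assumptions made for this example, and it expects size to be at least 2.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (capacity size) without its line ending.
 * If the line does not fit, the excess is read and thrown away.
 * Returns 1 if a line was read, 0 at end of file or on error. */
int read_line(char *buf, size_t size, FILE *fp)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;

    /* Buffer completely full and no '\n' stored: the rest of the line
       is still waiting in the stream, so drain it up to the newline. */
    if (strchr(buf, '\n') == NULL && strlen(buf) == size - 1) {
        int ch;
        while ((ch = fgetc(fp)) != EOF && ch != '\n')
            ;
    }

    buf[strcspn(buf, "\r\n")] = '\0';   /* cut at the first CR or LF */
    return 1;
}
```

An empty string in buf then means a blank line, which the caller can skip with a simple check of buf[0].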
There is also the getline function in stdio.h, which reads a whole line up to the newline and grows the buffer as needed. It is POSIX rather than standard C, though, so if you are on Windows you may lack it.
Here is an example implementation:
https://github.com/ivanrad/getline/blob/master/getline.c

fgets() not working after fscanf()

I am using fscanf to read in the date and then fgets to read the note.
However after the first iteration, fscanf returns a value of -1.
I used GDB to debug the program step by step. It works fine until the first use of fgets. When I try print out the line read by fgets on the first iteration, it gives me this:
(gdb) print line
$6 = "\rtest\r18/04/2010\rtest2\r03/05/2010\rtest3\r05/08/2009\rtest4\r\n\000\000\000\000q\352\261\a\370\366\377\267.N=\366\000\000\000\000\003\000\000\000\370xC\000\000\000\000\000\000\000\000\000\001\000\000\000\227\b\000\000\070\367\377\267H\364\377\267\362\202\004\bdoD\000\354\201\004\b\001\000\000\000\304oC\000p\363\377\277\260zC\000D\363\377\277\n!B\000\064\363\377\277\354\201\004\b(\363\377\277TzC\000\000\000\000\000\070\367\377\267\001\000\000\000\000\000\000\000\001\000\000\000\370xC\000\001\000\000\000\000\000\312\000\000\000\000\000\377\260\360\000\001\000\000\000\277\000\000\000\364\317\000\000\344\261\\\000\000\000\000\000p\363\377\277|\233\004\b\350\362\377\277 \204\004\b\005\000\000\000|\233\004\b\030\363\377\277"
It looks like fgets reads the remaining entries and then stores them all in a single string.
I am not sure why it is doing this.
Here is the main code:
int main(int argc, char* argv[]) {
FILE* file;
int numEntries, i = 0;
int index = atoi(argv[1]);
char line[SIZE];
JournalEntry *entry;
/*argument provided is the entry user wants to be displayed*/
if (argc > 2) {
perror("Error: Too many arguments provided");
}
file = fopen("journalentries.txt", "r");
if (file == NULL) {
perror("Error in opening file");
}
if (fscanf(file, "%d", &numEntries) != 1) {
perror("Unable to read number of entries");
}
entry = (JournalEntry*)malloc(numEntries * sizeof(JournalEntry));
if (entry == NULL) {
perror("Malloc failed");
}
for (i = 0; i < numEntries; i++) {
if (fscanf(file, "%d/%d/%d", &entry[i].day, &entry[i].month, &entry[i].year) != 3) {
perror("Unable to read date of entry");
}
if (fgets(line, sizeof(line), file) == NULL) {
perror("Unable to read text of entry");
}
}
printf("%d-%02d-%02d %s: ", entry[index].year, entry[index].month, entry[index].day, entry[index].text);
if(ferror(file)) {
perror("Error with file");
}
fclose(file);
free(entry);
return 0;
}
The file that I have to read:
The very first line contains the number of entries to be read
4
12/04/2010
test
18/04/2010
test2
03/05/2010
test3
05/08/2009
test4
The struct JournalEntry located in the header file:
typedef struct {
int day;
int month;
int year;
char text[250];
} JournalEntry;
It looks like fgets reads the remaining entries and then stores them all in a single string.
Yes, '\r' on its own is not a line terminator. So when fscanf stops parsing at the first character that does not match and leaves the rest in the buffer, fgets reads on until the end of the line. Since the file contains no valid line terminators before its end, that is effectively the rest of the file.
You should probably fix the file to have valid (Unix?) line endings, for example with a suitable text editor which can do it. But that is another question, which has been asked before (like here), and depends on details not included in your question.
Additionally, you need to check the fscanf return value in two ways. Use perror only if the return value is EOF (-1); otherwise the error message will not be related to the actual problem. If the return value is >= 0 but different from what you wanted, print a custom error message such as "invalid input syntax" (and possibly use fgets to read the rest of the line out of the buffer).
Also, to reliably mix scanf and fgets, you need to add a space at the end of the fscanf format string, so it will consume any whitespace at the end of the line (and also at the start of the next line, plus any empty lines, so be careful if that matters), like this:
int items_read = scanf("%d ", &intvalue);
As stated in another answer, it's probably best to read lines with fgets only, then parse them with sscanf line-by-line.
Don't mix fscanf() and fgets(), since the former might leave stuff in the stream's buffer.
For a line-oriented format, read only full lines using fgets(), then use e.g. sscanf() to parse what you've read.
The string you see when running GDB really ends at the first null character:
"\rtest\r18/04/2010\rtest2\r03/05/2010\rtest3\r05/08/2009\rtest4\r\n\000"
The data after it is ignored by the ordinary str* functions.
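For what it's worth, here is a minimal sketch of the fgets-then-sscanf approach for this file layout. The JournalEntry struct is copied from the question; the helper names read_count and read_entry and their return conventions are assumptions for illustration. It still requires the file to use '\n' (or "\r\n") line endings, so a file with bare '\r' separators has to be converted first.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    int day;
    int month;
    int year;
    char text[250];
} JournalEntry;

/* Read the entry count from the first line. Returns -1 if unreadable. */
int read_count(FILE *fp)
{
    char line[256];
    int n;

    if (fgets(line, sizeof line, fp) == NULL || sscanf(line, "%d", &n) != 1)
        return -1;
    return n;
}

/* Read one entry: a "dd/mm/yyyy" line followed by a text line.
 * Returns 1 on success, 0 at end of file, -1 on malformed input. */
int read_entry(FILE *fp, JournalEntry *e)
{
    char line[256];
    size_t len;

    if (fgets(line, sizeof line, fp) == NULL)
        return 0;
    if (sscanf(line, "%d/%d/%d", &e->day, &e->month, &e->year) != 3)
        return -1;                      /* a syntax problem, not an I/O error */
    if (fgets(line, sizeof line, fp) == NULL)
        return -1;                      /* date without a text line */

    line[strcspn(line, "\r\n")] = '\0'; /* drop the line ending */
    len = strlen(line);
    if (len >= sizeof e->text)          /* truncate to fit text[250] */
        len = sizeof e->text - 1;
    memcpy(e->text, line, len);
    e->text[len] = '\0';
    return 1;
}
```

Because every read consumes a whole line, nothing is left over in the stream between the date and the note, which is the problem the mixed fscanf/fgets version runs into.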

print binary file

Say you have a file dog.txt
The
quick
brown
fox
jumps
over
the
lazy
dog
You can print the lines like this
#include <stdio.h>
int
main (void)
{
char buf[10];
FILE *fp = fopen ("dog.txt", "r");
while (fgets (buf, sizeof buf, fp))
printf ("%s", buf);
return 0;
}
But what if each "line" was separated by a null character (\0), instead of a newline (\n)? How would you print each "line" ?
The difference between "text" file handling and any other file handling is that the "text" functions assume certain things (for example, that \n is a separator). If that's not the case for you, you obviously cannot use the "text" manipulation functions. You do fread, and parse the content yourself.
If the file is not a text file (that is, if it contains non-printable ASCII characters), treat it as binary.
Rather than reading a "line" at a time (which is a text file concept), read in a buffer at a time (e.g. 1024 characters at a time).
Output each character that you read one at a time, unless you encounter whatever line delimiter the file uses (e.g. the "null" character in your question). When that character is encountered, output a newline instead.
You open a file in binary mode by including the "b" flag, e.g.
FILE *fp = fopen("dog.txt", "rb");
Use fread to read data one buffer at a time.
n = fread(buffer, sizeof(char), BUFFER_SIZE, source);
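The buffer-at-a-time steps above can be sketched roughly as follows. The function name copy_nul_as_newline is made up for this example, and BUFFER_SIZE is set to 1024 only because that is the figure mentioned above.

```c
#include <stdio.h>

#define BUFFER_SIZE 1024

/* Copy in to out, writing '\n' wherever the input holds a '\0' byte.
 * Returns 0 on success, -1 on a read or write error. */
int copy_nul_as_newline(FILE *in, FILE *out)
{
    char buffer[BUFFER_SIZE];
    size_t n;

    while ((n = fread(buffer, sizeof(char), BUFFER_SIZE, in)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (buffer[i] == '\0')
                buffer[i] = '\n';       /* the file's separator becomes a newline */
        }
        if (fwrite(buffer, sizeof(char), n, out) != n)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

A caller would open the input with fopen("dog.txt", "rb") and pass stdout as the second argument.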
This is a trimmed-down version of WhozCraig's deleted answer.
If all you want to do is dump data from the input file to stdout, replacing any embedded null chars (0) with newlines, then just do that. Read-ahead buffering and such is honestly overkill for a problem this simple, and besides, the fopen/fread family already buffers for you.
Note: this assumes exactly what the OP specified: that this is otherwise a regular "text" file, save for the oddity that embedded null chars (0) should be treated as newlines in the output stream:
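#include <stdio.h>
#include <stdlib.h>  /* EXIT_SUCCESS, EXIT_FAILURE */
#include <stdbool.h> /* true */
int main(int argc, char* argv[])
{
FILE *fp = NULL;
if (argc < 2)
{
fprintf(stderr, "usage: %s file\n", argv[0]);
return EXIT_FAILURE;
}
fp = fopen(argv[1], "rb");
if (fp == NULL)
{
perror(argv[1]);
return EXIT_FAILURE;
}
do
{ // pull next char, break on EOF, subst '\n' on 0.
int ch = fgetc(fp);
if (EOF == ch)
break;
if (0 == ch)
ch = '\n';
fputc(ch, stdout);
} while (true);
fclose(fp);
return EXIT_SUCCESS;
}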

C, reading a multiline text file

I know this is a dumb question, but how would I load data from a multiline text file?
while (!feof(in)) {
fscanf(in,"%s %s %s \n",string1,string2,string3);
}
^^This is how I load data from a single line, and it works fine. I just have no clue how to load the same data from the second and third lines.
Again, I realize this is probably a dumb question.
Edit: Problem not solved. I have no idea how to read text from a file that's not on the first line. How would I do this? Sorry for the stupid question.
Try something like:
char line[512]; // or however large you think these lines will be
in = fopen ("multilinefile.txt", "rt"); /* open the file for reading */
/* "rt" means open the file for reading text */
int cur_line = 0;
while(fgets(line, 512, in) != NULL) {
if (cur_line == 2) { // 3rd line
/* get a line, up to 512 chars from in. done if NULL */
sscanf (line, "%s %s %s \n",string1,string2,string3);
// now you should store or manipulate those strings
break;
}
cur_line++;
}
fclose(in); /* close the file */
or maybe even...
char line[512];
in = fopen ("multilinefile.txt", "rt"); /* open the file for reading */
fgets(line, 512, in); // throw out line one
fgets(line, 512, in); // on line 2
sscanf (line, "%s %s %s \n",string1,string2,string3); // line 2 is loaded into 'line'
// do stuff with line 2
fgets(line, 512, in); // on line 3
sscanf (line, "%s %s %s \n",string1,string2,string3); // line 3 is loaded into 'line'
// do stuff with line 3
fclose(in); // close file
Putting \n in a scanf format string has no different effect from a space. You should use fgets to get the line, then sscanf on the string itself.
This also allows for easier error recovery. If it were just a matter of matching the newline, you could use "%*[ \t]%*1[\n]" instead of " \n" at the end of the string. You should probably use %*[ \t] in place of all your spaces in that case, and check the return value from fscanf. Using fscanf directly on input is very difficult to get right (what happens if there are four words on a line? what happens if there are only two?) and I would recommend the fgets/sscanf solution.
Also, as Delan Azabani mentioned, it's not clear from this fragment whether you are already doing so, but you have to either reserve space (e.g. in a large array or some dynamic structure allocated with malloc) to store the entire dataset, or do all your processing inside the loop.
You should also be specifying how much space is available for each string in the format specifier. %s by itself in scanf is always a bug and may be a security vulnerability.
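As a small sketch of what a width-limited version might look like: the function name parse_three_words and the 32-byte field size are assumptions chosen for this example. The trailing " %c" is one way to notice a fourth word on the line.

```c
#include <stdio.h>

#define FIELD_LEN 32   /* each destination buffer must hold this many bytes */

/* Parse exactly three whitespace-separated words from line.
 * Returns 1 if the line holds exactly three words, 0 otherwise.
 * A word longer than 31 characters is split, and its remainder is
 * read as the next field, so the count check usually catches it. */
int parse_three_words(const char *line, char *s1, char *s2, char *s3)
{
    char extra;
    /* %31s stores at most 31 characters plus the terminating '\0',
       which fits a FIELD_LEN (32) byte buffer. */
    int n = sscanf(line, "%31s %31s %31s %c", s1, s2, s3, &extra);

    return n == 3;   /* 2 or fewer: too few words; 4: an extra word */
}
```

Used on a line obtained from fgets, this answers both "what if there are four words?" and "what if there are only two?" by rejecting the line rather than silently misreading it.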
First off, you don't use feof() like that...it shows a probable Pascal background, either in your past or in your teacher's past.
For reading lines, you are best off using either POSIX 2008 (Linux) getline() or standard C fgets(). Either way, you try reading the line with the function, and stop when it indicates EOF:
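/* standard C: fgets() with a fixed-size buffer */
while (fgets(buffer, sizeof(buffer), fp) != NULL)
{
/* ...use the line of data in buffer... */
}
/* POSIX: getline() allocates and grows the buffer for you */
char *bufptr = NULL;
size_t buflen = 0;
while (getline(&bufptr, &buflen, fp) != -1)
{
/* ...use the line of data in bufptr... */
}
free(bufptr);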
To read multiple lines, you need to decide whether you need previous lines available as well. If not, a single string (character array) will do. If you need the previous lines, then you need to read into an array, possibly an array of dynamically allocated pointers.
Every time you call fscanf, it reads more values. The problem you have right now is that you're re-reading each line into the same variables, so in the end, the three variables have the last line's values. Try creating an array or other structure that can hold all the values you need.
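If you do need every line kept around, one possible shape for that "array or other structure" is sketched below. The Lines struct and the read_all_lines and free_lines helpers are hypothetical names, and the 512-byte line buffer is an assumption carried over from the earlier answer (longer lines would be split across entries).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char **items;   /* one heap-allocated string per line */
    size_t count;
} Lines;

/* Release everything held by ls and reset it to empty. */
void free_lines(Lines *ls)
{
    for (size_t i = 0; i < ls->count; i++)
        free(ls->items[i]);
    free(ls->items);
    ls->items = NULL;
    ls->count = 0;
}

/* Read every line of fp into ls, with line endings stripped.
 * Returns 0 on success, -1 if memory runs out (ls is then empty). */
int read_all_lines(FILE *fp, Lines *ls)
{
    char buffer[512];
    size_t capacity = 0;

    ls->items = NULL;
    ls->count = 0;
    while (fgets(buffer, sizeof buffer, fp) != NULL) {
        buffer[strcspn(buffer, "\r\n")] = '\0';
        if (ls->count == capacity) {            /* grow the pointer array */
            size_t new_cap = capacity ? capacity * 2 : 16;
            char **tmp = realloc(ls->items, new_cap * sizeof *tmp);
            if (tmp == NULL) {
                free_lines(ls);
                return -1;
            }
            ls->items = tmp;
            capacity = new_cap;
        }
        size_t len = strlen(buffer) + 1;        /* include the '\0' */
        char *copy = malloc(len);
        if (copy == NULL) {
            free_lines(ls);
            return -1;
        }
        memcpy(copy, buffer, len);
        ls->items[ls->count++] = copy;
    }
    return 0;
}
```

Each stored line can then be handed to sscanf separately, so the values from line 2 no longer overwrite those from line 1.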
One way to do this is to use a two-dimensional array and read each line into its own row of the array. Here is an example that reads a .txt file of the poem Ozymandias:
#include <stdio.h>
int main(void) {
char line[15][255];
int count = 0;
FILE * fpointer = fopen("ozymandias.txt", "r");
if (fpointer == NULL) {
perror("ozymandias.txt");
return 1;
}
/* stop after 15 lines or at end of file, whichever comes first */
while (count < 15 && fgets(line[count], sizeof line[count], fpointer) != NULL) {
count++;
}
fclose(fpointer);
for (int b = 0; b < count; b++) {
printf("%s", line[b]);
}
return 0;
}
This prints the poem. Notice that the poem is only 14 lines long. If you loop a fixed 15 times without checking the return value of fgets, the 15th call fails, line[14] is never filled in, and printing it produces garbage characters. Checking the result in the loop condition, as in
while (fgets(....) != NULL) {
avoids that, as long as you do not call fgets a second time inside the loop body (doing so is what makes every other line appear to be skipped). The fixed-size array still limits the file to 15 lines, but for a short file of known length this solution is fine.
I have an even easier solution, if you can use C++ rather than C (no offense to the answers above). Here it is:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
string line;//read the line
ifstream myfile ("MainMenu.txt"); // make sure to put this inside the project folder with all your .h and .cpp files
if (myfile.is_open())
{
while ( getline (myfile,line) ) // stops cleanly when there are no more lines
{
cout << line << endl;
}
myfile.close();
}
else cout << "Unable to open file";
return 0;
}
Happy coding

Resources