Reading CR terminated keyword text file

I want to read a list of keywords from a text file and add them to a CString array in C. I am reading the file line by line, and it contains one word per line. I can populate the array successfully, but when I look up these keywords in another string, the search fails, I am guessing because each keyword has \n at the end.
Alternatively, I could make the text file comma-separated, read it as a single line, and tokenize it. But then I would not know how to read a line whose size can be VERY large, as the list of keywords is ever expanding.
Saad Rehman

If your problem is that a string may have a rogue newline at the end, you can use:
size_t len = strlen (mystring);
if (len > 0)
    if (mystring[len-1] == '\n')
        mystring[--len] = '\0';
Do this to mystring after you've read it in but before you use it.
It simply checks if the last character is a newline and, if so, replaces it with a string terminator.
The first check is to ensure you don't try this on an empty string where mystring[-1] would invoke the dreaded undefined behaviour.
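A slightly more general variant of the snippet above, as a minimal sketch: because the file in question may use CR or CRLF line endings, this version strips any run of trailing '\n' and '\r' characters. The name chomp is hypothetical (it is not a standard C function), and the assert is only a precondition check.

```c
#include <assert.h>
#include <string.h>

/* Remove every trailing '\n' or '\r' from s, in place.
   Returns the length of the trimmed string.
   "chomp" is an illustrative name, not part of the C library. */
size_t chomp(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    /* len > 0 guards against reading s[-1] on an empty string */
    while (len > 0 && (s[len - 1] == '\n' || s[len - 1] == '\r'))
        s[--len] = '\0';
    return len;
}
```

Call it on each line right after a successful fgets and before storing the keyword in the array.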

Related

C: Parsing through an input file in order to check formatting

My goal here is to read through a text file that has to follow these formatting regulations:
No spaces/tabs between characters
Characters must be either non-negative integer or new line character (no letters/symbols)
I can only use functions given in stdlib.h and stdio.h
I am thinking of reading through the file character by character using the fgetc() function, but I can't think of a way to test whether or not the character is a new line character (isn't a new line char \n, which would be two chars together, which would ruin the idea of going char by char?).
Following this train of thought I was thinking that using getline(), which would negate the necessity of checking if a char is a new line character, would be easier (am I right in thinking this or would this not negate such a requirement?). Yet if I were to do this what would be the easiest way to traverse through the char string that this would produce in order to still check each individual character?
Also, if someone could think of an easier route as to checking for the format of a file using the given libraries that would be much appreciated.

If you use getline it will retain the newline character so you'll still have to check for it.
If you can use any of the standard C library, then you can use isdigit(...) from ctype.h.
It returns non-zero if the input character is a decimal digit.
If you use the getline function, the input is written into the buffer you pass via the first argument. It appends a null terminator to this buffer, so you can walk through it like so:
for(char* s = buffer; *s; s++)
    /* test *s */
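As a rough sketch of what the loop body could look like under the question's rules: the newline is a single char written '\n' in source code, so it can be compared directly, and the digits can be tested with plain comparisons instead of isdigit. The function name line_is_valid is hypothetical; the assert is only a precondition check and can be dropped if assert.h is not allowed.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if buffer holds only the characters '0'..'9' and '\n',
   0 otherwise. "line_is_valid" is an illustrative name. */
int line_is_valid(const char *buffer)
{
    assert(buffer != NULL);
    for (const char *s = buffer; *s; s++) {
        if (*s == '\n')
            continue;            /* '\n' is one character, not two */
        if (*s < '0' || *s > '9')
            return 0;            /* space, tab, letter, symbol, '-' ... */
    }
    return 1;
}
```

The same two comparisons work on the int returned by fgetc if the file is read character by character instead.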

Searching for strings that are NULL terminated within a file where they are not NULL terminated

I am writing a program that opens two files for reading: the first file contains 20 names, which I store in an array of the form Names[0] = John\0. The second file is a large text file that contains many occurrences of each of the 20 names.
I need my program to scan the entirety of the second file and, each time it finds one of the names, increment a variable Count, so that on completion of the program the total number of appearances of all the names in the text is stored in Count.
Here is my loop, which searches for and counts the name occurrences:
char LineOfText[85];
char *TempName;
while(fgets(LineOfText, sizeof(LineOfText), fpn)){
    for(a = 0; a<NumOfNames; a++){
        TempName = strstr(LineOfText, Names[a]);
        if(TempName != NULL){
            Count++;
        }
    }
}
No matter what I do, this loop doesn't work as I would expect it to, but I have discovered what is wrong (I think!). My problem is that each name in the array is NULL terminated, but when a name appears in the text file it is not NULL terminated, unless it occurs as the last word of a line. Therefore, this while loop is only counting the number of times any of the names appear at the end of a line, rather than the number of appearances of any of the names anywhere in the text file. How can I adjust this loop to combat this problem?
Thank you for any advice in advance.

The issue here is probably your use of fgets, which does not trim the newline from the line it reads.
If you are creating your names array by reading lines with fgets, then all the names will be terminated with a newline character. The lines in the file being read with fgets will also be terminated with a newline character, so the names will only match at the end of the lines.
strstr does not compare the NUL byte which terminates the pattern string, for obvious reasons. If it did, it would only match suffix strings, which would make it a very different function.
Also, you will only find a maximum of one instance of each name in each line. If you think that a name might appear more than once in the same line, you should replace:
TempName = strstr(LineOfText, Names[a]);
if(TempName != NULL){
    Count++;
}
with something like:
for (TempName = LineOfText;
     (TempName = strstr(TempName, Names[a])) != NULL;
     ++Count, ++TempName) {
}
For reference, here is the definition of fgets from the C standard (emphasis added):
The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
This is different from gets, which does not retain the new-line character.

I think the NULL termination of the names array is not an issue (See strstr function reference). The strstr function is not going to compare the terminator. You do have the possibility of missing additional names on each line. See my adjustment below for an example of how you could count multiple names on each line.
char LineOfText[85];
char *TempName;
while(fgets(LineOfText, sizeof(LineOfText), fpn)){
    for(a = 0; a<NumOfNames; a++){
        TempName = strstr(LineOfText, Names[a]);
        /* Iterate through line for multiple occurrences of each name */
        while(TempName != NULL){
            Count++;
            /* Get next occurrence of name on line. fgets is going to
               leave a newline at the end of the LineOfText string so
               unless some of your names contain a newline, it shouldn't
               move past the end of the buffer */
            TempName = strstr(TempName + 1, Names[a]);
        }
    }
}
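The per-line counting in both answers can be pulled out into a small helper, shown here as a sketch. The name count_occurrences is hypothetical. It counts overlapping matches and substring matches alike (so "John" is also found inside "Johnny"), and it assumes the name has already had any trailing newline removed.

```c
#include <assert.h>
#include <string.h>

/* Count how many times name occurs in line, including overlapping
   matches. "count_occurrences" is an illustrative name. */
size_t count_occurrences(const char *line, const char *name)
{
    assert(line != NULL && name != NULL);
    size_t count = 0;
    if (*name == '\0')
        return 0;   /* an empty pattern would match at every position */
    /* p points at a match, so p + 1 is still inside the string */
    for (const char *p = strstr(line, name); p != NULL;
         p = strstr(p + 1, name))
        count++;
    return count;
}
```

Inside the fgets loop this would be used as Count += count_occurrences(LineOfText, Names[a]);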

How do I check if the line is over?

I ran into a problem today. I can't find a way to check if a line in a file is over and the words are read from the next one already. I read word by word from the file using fscanf, then process the word as I need to and print it out into another file but there is a problem.
for example my data file is:
Hello, how are you
doing?
and the result file shows:
Hello, how are you doing?
but I need the words to stay on the same lines I took them from. Please keep in mind that I need those words one by one, which is why I don't use getline().
here is my code of how I read words from the file:
while( fscanf(file, "%s", A) != EOF )
{
    check(A, B, &a); // I edit the words and put them in B string
                     // which is printed to the write file
}
Thank you for any tips!

Read the line into a string with getline() or fgets(), then use sscanf to get the words out of this string.
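That suggestion might look roughly like the sketch below. It uses the %n conversion of sscanf, which stores the number of characters consumed so far, to step through the line one word at a time. The function name words_in_line and the WORD_MAX limit are assumptions for illustration; a word longer than WORD_MAX - 1 characters would be split into pieces.

```c
#include <assert.h>
#include <stdio.h>

#define WORD_MAX 32   /* assumed limit; must match the width in "%31s" */

/* Split one line into whitespace-separated words.
   Stores at most max_words words and returns how many were stored.
   "words_in_line" is an illustrative name. */
int words_in_line(const char *line, char words[][WORD_MAX], int max_words)
{
    assert(line != NULL && words != NULL);
    int count = 0;
    int used = 0;
    /* %31s reads one word, %n records how far sscanf got */
    while (count < max_words &&
           sscanf(line, "%31s%n", words[count], &used) == 1) {
        line += used;   /* continue after the word just read */
        count++;
    }
    return count;
}
```

Read each line with fgets, call the helper, process and print the words it returns, then print a '\n' so the output keeps the input's line structure.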

You can use simpler logic instead, such as matching characters like . or ? that generally end lines.

You need to add a check for the end of the line.
The end of a line is represented by the newline character, '\n'. So in the while loop, instead of copying everything at once, process the input line by line by checking for '\n'.

Reading line by line in C

Currently to read a file line by line in C I am using:
char buffer[1024];
while(fgets(buffer, sizeof(buffer), file) != NULL) {
    //do something with each line that is now stored in buffer
}
However there is no guarantee in the file that the line will be shorter than 1024. What will happen if a line is longer than 1024? Will the rest of the line be read in the next iteration of the while loop?
And how can I read line by line without a maximum length?

Yes, the rest of the line will be read in the next iteration.
You can detect whether or not you read a whole line by inspecting the last character of the string (i.e. the one before the null terminator) to see if it is '\n' or not -- fgets passes '\n' through to you.
There is no Standard C function which will read a line whilst dynamically allocating enough memory for it, however there is a POSIX function getline() which does that. You could write your own that uses fgets or otherwise to do the reading, in a loop with realloc, of course.
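One possible shape for such a function, as a sketch only: it calls fgets repeatedly and doubles the buffer with realloc until it sees the newline. The name read_line and the starting capacity of 64 are arbitrary choices, and the code assumes a text file without embedded null bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line of any length from fp. Returns a malloc'd string
   without the trailing newline, or NULL at end-of-file or if
   allocation fails. The caller must free() the result.
   "read_line" is an illustrative name. */
char *read_line(FILE *fp)
{
    assert(fp != NULL);
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[--len] = '\0';          /* whole line read: drop '\n' */
            return buf;
        }
        /* No newline yet: grow the buffer and keep reading */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }
    if (len == 0) {                     /* nothing read: end of file */
        free(buf);
        return NULL;
    }
    return buf;                         /* last line had no newline */
}
```

A caller would loop with while ((line = read_line(file)) != NULL) { ...; free(line); }.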

From the C standard, §7.19.7.2:
char *fgets(char * restrict s, int n, FILE * restrict stream);
The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into the
array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
From MSDN,
fgets reads characters from the current stream position to and including the first newline character, to the end of the stream, or until the number of characters read is equal to n – 1, whichever comes first. The newline character, if read, is included in the string.
So yes, fgets will read the rest of the line in the next iteration if it doesn't encounter the newline character within the first sizeof(buffer)-1 characters.
If you want to read the whole line in one shot, it is better to allocate the buffer with malloc and grow it with realloc as needed.

I/O in C Errors

I've been trying for hours to find the answer to this question I got at university. I tried running the function on a file containing these two lines:
hello
world
and it reads the file perfectly, so I can't find the answer. I would appreciate your help!
A student wrote the following function for reading a text file and printing it exactly as it is.
void ReadFile(FILE *fIn)
{
    char nextLine[MAX_LINE_LENGTH];
    while(!feof(fIn))
    {
        fscanf(fIn,"%s",nextLine);
        printf("%s\n",nextLine);
    }
}
What are the two errors in this function?
You can assume that each line in the file is not longer than MAX_LINE_LENGTH characters, and that it is a text file that contains only alphabet characters, and that each line is terminated by '\n'.
Thanks.

It discards white space. Try adding multiple spaces and tabs.
It may process the last input more than once, and if there is a read error, the loop never terminates.
See: Why is “while ( !feof (file) )” always wrong?
Reading strings via scanf is dangerous. There is no bounds checking, so you may write past your MAX_LINE_LENGTH buffer (and boom! Segfault).

The main error is that fscanf( fIn, "%s", nextLine ) doesn't scan a complete line.
From man page:
s
Matches a sequence of non-white-space characters; the next pointer must be a pointer to character array that is long enough to hold the input sequence and the terminating null byte ('\0'), which is added automatically. The input string stops at white space or at the maximum field width, whichever occurs first.
Thus if you have a line "a b" the first fscanf() will scan just "a" and the second one "b" and both are printed in two different lines. You can use fgets() to read a whole line.
The second one is perhaps that the question states "each line in the file is not longer than MAX_LINE_LENGTH characters", but nextLine can hold at most MAX_LINE_LENGTH-1 characters (plus the '\0'). That problem becomes even more important if you replace fscanf() with fgets(), because then nextLine must also have room to store '\n' or '\r\n' (depending on the platform you're on).

A correct way of doing that is:
void ReadFile(FILE *fIn)
{
    char nextLine[MAX_LINE_LENGTH];
    while(fgets(nextLine, MAX_LINE_LENGTH, fIn)) {
        printf("%s", nextLine);
    }
}
As others have posted, using feof to control a loop is not a good idea, and neither is using fscanf to read lines.
