Finding a substring of a string in a file in C

I'm trying to selectively filter a text file by a string which is input to the standard input.
I would like to know why the following code does not work and how to fix it:
void get_filtered_list() {
FILE *f;
f = fopen("presentlist.txt", "r");
printf("Enter the city by which you want to select lines:\n");
char stringToFind[20];
fgets(stringToFind, sizeof(stringToFind), stdin);
char line[160];
while (!feof(f)) {
fgets(line, sizeof(line), f);
if (strstr(line, stringToFind) != NULL) {
printf("%s", line);
}
}
fclose(f);
}
This code above is trying to take a text file, opening that file, then reading the file line by line, and for each line executing the strstr() function with the current line of the file as argument 1 as a string, and the given name of the city as argument 2 as a string.
However what I get as a result is the ENTIRE contents of the file printed (and the last line prints twice, though this is a separate issue and I know the fix to this part).
The C book I'm reading states that the strstr() function is used to find a needle string in a haystack string, so it's the C equivalent of the C++ substr() function.
strstr() takes argument 1 as the haystack and argument 2 as the needle.
I first read in from the standard input into the needle, then line by line I check whether strstr() returns NULL or not (it should return NULL if the needle is not found in the haystack) and if it returns something other than NULL that means it found the substring in the string and it should only print the line THEN.
Instead it prints all of the lines in the file. Why?
If I switch it to f(strstr(line, stringToFind)) instead then it prints absolutely nothing.
Why?

You do not find the string because you did not strip the trailing '\n' from the string read into stringToFind by fgets. As written, the search succeeds if and only if the string appears at the very end of a line, immediately before the newline.
You can remove the linefeed with this:
#include <string.h>
stringToFind[strcspn(stringToFind, "\n")] = '\0';
There are other ways to strip the linefeed, but be aware that if the last line of the file does not end with a linefeed, there will not be one in the buffer filled by fgets, therefore you cannot just overwrite the last character of the line. For your problem, it would be a good idea to remove all whitespace characters at the beginning and at the end of stringToFind.
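The whitespace trimming suggested above could be sketched as follows. The helper name trim_in_place is hypothetical and not part of the original answer; it is a minimal sketch that assumes the string is a writable, null-terminated buffer such as stringToFind.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: remove leading and trailing whitespace from s in place.
   Returns s for convenience. */
char *trim_in_place(char *s)
{
    size_t start = 0;
    size_t len = strlen(s);

    /* Skip leading whitespace. */
    while (s[start] != '\0' && isspace((unsigned char)s[start])) {
        start++;
    }
    /* Drop trailing whitespace, including the '\n' left by fgets. */
    while (len > start && isspace((unsigned char)s[len - 1])) {
        len--;
    }
    /* Shift the remaining characters to the front and terminate. */
    memmove(s, s + start, len - start);
    s[len - start] = '\0';
    return s;
}
```

Calling trim_in_place(stringToFind) right after the fgets call would replace the strcspn line, since the trailing newline counts as whitespace.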
Also check this question: Why is “while ( !feof (file) )” always wrong?
Testing the end of file with while (!feof(f)) will catch the end of file too late: fgets will fail and you do not test its return value, so the last line of the file will appear to be handled twice. The correct way to write this loop is this:
while (fgets(line, sizeof(line), f)) {
if (strstr(line, stringToFind) != NULL) {
printf("%s", line);
}
}
Note also that lines longer than 159 characters will be split by fgets and will cause incorrect output if they contain the searched string, especially if the string itself is split.
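Putting the pieces of this answer together, the filtering loop might look like the sketch below. The function name print_matching_lines and its parameters are assumptions made for illustration; the 160-byte buffer is taken from the question, so over-long lines are still read in pieces.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy every line of `in` that contains `needle` to `out`.
   Returns the number of lines (or line pieces) written. */
int print_matching_lines(FILE *in, FILE *out, const char *needle)
{
    char line[160];
    int matches = 0;

    /* fgets returns NULL at end of file or on error, which ends the loop. */
    while (fgets(line, sizeof(line), in) != NULL) {
        if (strstr(line, needle) != NULL) {
            fputs(line, out);
            matches++;
        }
    }
    return matches;
}
```

In get_filtered_list this would be called as print_matching_lines(f, stdout, stringToFind) once the newline has been stripped. Note that strstr treats an empty needle as a match for every line, so a caller may want to reject empty input first.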

Related

Trying to make a simple program (in C) that copies all non empty lines from a given text file into a new text file

This is what I tried doing (if the first character of a line is '\n' it must necessarily be an empty line) but it gives me the error message: "Thread 1: EXC_BAD_ACCESS (code=1, address=0x68" at the line of fgets..
#include<stdio.h>
#define MAX_LEN 80
int main(int argc, char *argv[])
{
FILE *fin,*fout;
fin=fopen("poem_in.txt","r");
fout=fopen("poem_out.txt","w");
char line[MAX_LEN];
do {
fgets(line, MAX_LEN, fin);
if ((line[0])!='\n') fputs(line,fout);
} while(fgets(line, MAX_LEN, fin)!=NULL);
fclose(fin);
fclose(fout);
return 0;
}
I tried looking at the correction my professor gave but she used strcmp(line,"\n") so it's not very useful and I don't get how it's possible to compare a string and a char? Any help at all would be greatly appreciated and would be of great help in my studies!
You're calling fgets() twice each time through the loop. As a result, you only check every other line for being empty.
Do it like this instead.
while (fgets(line, MAX_LEN, fin)) {
if ((line[0])!='\n') fputs(line,fout);
}
If you're getting an error on the fgets() line, it's probably because the file wasn't opened successfully. You should check that first (exit() needs #include <stdlib.h>).
fin=fopen("poem_in.txt","r");
if (!fin) {
fprintf(stderr, "Can't open input file poem_in.txt\n");
exit(1);
}
fout=fopen("poem_out.txt","w");
if (!fout) {
fprintf(stderr, "Can't open output file poem_out.txt\n");
exit(1);
}
I tried looking at the correction my professor gave but she used strcmp(line,"\n") so it's not very useful and I don't get how it's possible to compare a string and a char?
Actually, "\n" is not a char but a C string. The char would be written '\n', whereas "\n" is a string holding a newline followed by the terminating null character, so strcmp can compare it with line. Furthermore, you might want to take a look at the strcmp documentation http://www.cplusplus.com/reference/cstring/strcmp/
Remember, however, that strcmp reads both arguments until it finds a null character, so input that is not properly terminated would make it read more than intended, which could lead to crashes. To guard against this there is a bounded variant of strcmp called strncmp, which takes an additional parameter: the maximum number of characters to compare. You have this defined as MAX_LEN, so if you decide to follow your professor's suggestion it is better to use strncmp http://www.cplusplus.com/reference/cstring/strncmp/
This is what I tried doing (if the first character of a line is '\n' it must necessarily be an empty line) but it gives me the error message: "Thread 1: EXC_BAD_ACCESS (code=1, address=0x68" at the line of fgets..
Now, reading through your code, there are a few places that require your attention. For example, you call fgets twice in the do-while loop:
do {
fgets(line, MAX_LEN, fin);
if ((line[0])!='\n') fputs(line,fout);
} while(fgets(line, MAX_LEN, fin)!=NULL);
You read a line from the file, and possibly write it to the other file. Then, in the while condition, you read another line, but you never look at it: the next iteration immediately overwrites it with another read. Effectively you are skipping every second line.
I think what you should do is use a while loop instead of do-while and call fgets in the loop condition. Then use strncmp to compare the line with a newline string and write it to the file the way you do now. Something like:
while (fgets(line, MAX_LEN, fin) != NULL) {
if (strncmp(line, "\n", MAX_LEN) != 0) {
fputs(line, fout);
}
}
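As a fuller illustration of the loop above, here is a minimal sketch that wraps it in a function taking already opened streams. The name copy_nonempty_lines is hypothetical, and the sketch assumes that no line is longer than MAX_LEN - 2 characters; longer lines are read in pieces and a piece consisting only of "\n" would be dropped.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LEN 80

/* Hypothetical helper: copy every non-empty line from fin to fout.
   Returns the number of lines copied. */
int copy_nonempty_lines(FILE *fin, FILE *fout)
{
    char line[MAX_LEN];
    int copied = 0;

    while (fgets(line, MAX_LEN, fin) != NULL) {
        /* strcmp returns 0 when the line is exactly "\n", i.e. empty. */
        if (strcmp(line, "\n") != 0) {
            fputs(line, fout);
            copied++;
        }
    }
    return copied;
}
```

The caller stays responsible for checking that both fopen calls succeeded before passing the streams in, and for closing them afterwards.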

Read last line of a text file - C programming

I'm still a novice in C as I just started out. Here is a part of my function to open the file and then save the file lines into variables. I did while to loop until the end of file so I can get the last line, however it did not go as expected. So, I was wondering how can I get just the last line from a text file? Thank you.
tfptr = fopen("trans.txt", "r");
while (!feof(tfptr)){
fscanf(tfptr, "%u:%u:%.2f\n", &combo_trans, &ala_trans, &grand_total);
}
fclose(tfptr);
sample text file:
0:1:7.98
1:1:20.97
2:1:35.96
2:2:44.95
2:2:44.95
3:2:55.94
In your fscanf(tfptr, "%u:%u:%.2f\n", &combo_trans, &ala_trans, &grand_total);, the %.2f will cause a problem.
You can't specify the precision for floating-point numbers in scanf() unlike in the case of printf(). See this answer.
So, instead of %.2f in the scanf format string, use just %f.
Since you just need the last line, you could just read the file line by line with fgets() and keep the last line.
while( fgets(str, sizeof(str), tfptr)!=NULL );
printf("\nLast line: %s", str);
fgets() will return NULL when the file is over (or if some error occurred while reading).
The lines in the input file are read one by one and when there are no more lines to read, str (a character array of suitable size) will have the line that was read last.
You could then parse the string in str with sscanf() like
sscanf(str, "%u:%u:%f", &combo_trans, &ala_trans, &grand_total);
Also, you should be checking the return value of fopen() to see if the file was really opened. fopen() will return NULL if some error occurred.
if( (tfptr = fopen("trans.txt", "r"))==NULL )
{
perror("Error");
}
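The fgets() and sscanf() steps described above can be combined into one function. This is only a sketch: the name read_last_record and the 128-byte buffer are assumptions, and it expects the last line of the file to be a record, so a trailing blank line would make it report failure.

```c
#include <stdio.h>

/* Hypothetical helper: parse the last line of fp as "combo:ala:total".
   Returns 1 on success, 0 if the file is empty or the last line is malformed. */
int read_last_record(FILE *fp, unsigned *combo, unsigned *ala, float *total)
{
    char str[128];
    int have_line = 0;

    /* Each successful fgets overwrites str, so after the loop it holds the last line. */
    while (fgets(str, sizeof(str), fp) != NULL) {
        have_line = 1;
    }
    if (!have_line) {
        return 0;
    }
    /* sscanf returns the number of fields converted; all three are required. */
    return sscanf(str, "%u:%u:%f", combo, ala, total) == 3;
}
```

With the sample file from the question, a call such as read_last_record(tfptr, &combo_trans, &ala_trans, &grand_total) would fill the variables from the line 3:2:55.94.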
What went wrong? Did you get a different line?
Note that the "&" operators in the fscanf() call are required, because fscanf() needs pointers to the variables it stores into. The likely cause of the failure is the format string.

How to get fscanf to stop if it hits a newline? [duplicate]

I'm trying to read a line using the following code:
while(fscanf(f, "%[^\n\r]s", cLine) != EOF )
{
/* do something with cLine */
}
But somehow I get only the first line every time. Is this a bad way to read a line? What should I fix to make it work as expected?
It's almost always a bad idea to use the fscanf() function as it can leave your file pointer in an unknown location on failure.
I prefer to use fgets() to get each line in and then sscanf() that. You can then continue to examine the line read in as you see fit. Something like:
#define LINESZ 1024
char buff[LINESZ];
FILE *fin = fopen ("infile.txt", "r");
if (fin != NULL) {
while (fgets (buff, LINESZ, fin)) {
/* Process buff here. */
}
fclose (fin);
}
fgets() appears to be what you're trying to do, reading in a string until you encounter a newline character.
If you want to read a file line by line (here, line separator == '\n'), just do this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
FILE *fp;
char *buffer;
int ret;
// Open a file ("test.txt")
if ((fp = fopen("test.txt", "r")) == NULL) {
fprintf(stdout, "Error: Can't open file !\n");
return -1;
}
// Alloc buffer size (Set your max line size)
buffer = malloc(sizeof(char) * 4096);
// Read one line per call; the leading space in the format skips the
// newline left over from the previous line (and any blank lines)
while ((ret = fscanf(fp, " %4095[^\n]", buffer)) == 1)
{
// Print line
fprintf(stdout, "%s\n", buffer);
}
// Free buffer
free(buffer);
// Close file
fclose(fp);
return 0;
}
Enjoy :)
If you try while( fscanf( f, "%27[^\n\r]", cLine ) == 1 ) you might have a little more luck. The three changes from your original:
length-limit what gets read in - I've used 27 here as an example, and unfortunately the scanf() family require the field width literally in the format string and can't use the * mechanism that the printf() can for passing the value in
get rid of the s in the format string - %[ is the format specifier for "all characters matching or not matching a set", and the set is terminated by a ] on its own
compare the return value against the number of conversions you expect to happen (and for ease of management, ensure that number is 1)
That said, you'll get the same result with less pain by using fgets() to read in as much of a line as will fit in your buffer.
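The field width restriction mentioned in the first point is often worked around with preprocessor stringification, so that the width and the buffer size come from a single constant. The sketch below shows that idea; the names CLINE_MAX, STR and read_line_chunk are hypothetical and not taken from the answer.

```c
#include <stdio.h>

#define CLINE_MAX 27              /* maximum characters stored, excluding the '\0' */
#define STR_(x) #x
#define STR(x) STR_(x)            /* expands CLINE_MAX before turning it into "27" */

/* Hypothetical helper: read up to CLINE_MAX characters of the current line
   into cLine, which must have room for CLINE_MAX + 1 bytes.
   Returns 1 if characters were stored, 0 if the next character is a line
   ending, and EOF at end of input. */
int read_line_chunk(FILE *f, char *cLine)
{
    /* The adjacent string literals concatenate to "%27[^\n\r]". */
    return fscanf(f, "%" STR(CLINE_MAX) "[^\n\r]", cLine);
}
```

The buffer would be declared as char cLine[CLINE_MAX + 1], and the caller still has to consume the line ending itself (for example with fgetc) before the next call, because the scanset stops in front of it.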
Using fscanf to read/tokenise a file always results in fragile code or pain and suffering. Reading a line, and tokenising or scanning that line is safe, and effective. It needs more lines of code - which means it takes longer to THINK about what you want to do (and you need to handle a finite input buffer size) - but after that life just stinks less.
Don't fight fscanf. Just don't use it. Ever.
It looks to me like you're treating the fscanf format string as a regular expression. It isn't one: %[^\n\r] is a scanset conversion that matches a run of one or more characters other than \n and \r, and the s after it is a literal s that fscanf then tries to match, which is why your code doesn't work as expected.
Furthermore, fscanf() doesn't return EOF if an item doesn't match. Rather, it returns the number of items assigned, which can be zero. EOF is only returned if the end of the stream or an error is reached before the first conversion. So what's happening in your case is that the first call reads the first line up to, but not including, the newline and returns 1. The second call sees that newline straight away, matches nothing, and returns 0 without consuming it. Every later call does the same, so the loop never reaches EOF and cLine keeps holding the first line.
Finally, note that the %s scanf format operator only captures up to the next whitespace character, so it is not a substitute for the scanset if your lines contain spaces.
Consult the fscanf documentation for more information: http://www.cplusplus.com/reference/clibrary/cstdio/fscanf/
Your loop has several issues. You wrote:
while( fscanf( f, "%[^\n\r]s", cLine ) != EOF )
/* do something */;
Some things to consider:
fscanf() returns the number of items stored. It can return EOF if it reads past the end of file or if the file handle has an error. You need to distinguish a valid return of zero, in which case there is no new content in the buffer cLine, from a successful read.
You do have a problem when a failure to match occurs, because it is difficult to predict where the file handle is now pointing in the stream. This makes recovery from a failed match harder to do than might be expected.
The pattern you wrote probably doesn't do what you intended. It is matching any number of characters that are not CR or LF, and then expecting to find a literal s.
You haven't protected your buffer from an overflow. Any number of characters may be read from the file and written to the buffer, regardless of the size allocated to that buffer. This is an unfortunately common error that in many cases can be exploited by an attacker to run arbitrary code of the attacker's choosing.
Unless you specifically requested that f be opened in binary mode, line ending translation will happen in the library and you will generally never see CR characters in text files.
You probably want a loop more like the following:
while(fgets(cLine, N_CLINE, f)) {
/* do something */ ;
}
where N_CLINE is the number of bytes available in the buffer starting at cLine.
The fgets() function is a much preferred way to read a line from a file. Its second parameter is the size of the buffer, and it reads up to 1 less than that size bytes from the file into the buffer. It always terminates the buffer with a nul character so that it can be safely passed to other C string functions.
It stops on the first of end of file, newline, or buffer_size-1 bytes read.
It leaves the newline character in the buffer, and that fact allows you to distinguish a single line longer than your buffer from a line shorter than the buffer.
It returns NULL if no bytes were copied due to end of file or an error, and the pointer to the buffer otherwise. You might want to use feof() and/or ferror() to distinguish those cases.
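The properties listed above are enough to tell a complete line from one that did not fit in the buffer. The following is a minimal sketch of that check; the function name read_line and its return codes are assumptions chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (size bytes).
   Returns 0 at end of file or on error, 1 for a complete line (newline
   removed), and 2 when the buffer filled up before a newline was seen. */
int read_line(char *buf, size_t size, FILE *f)
{
    size_t len;

    if (fgets(buf, (int)size, f) == NULL) {
        return 0;
    }
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';   /* complete line: drop the newline */
        return 1;
    }
    /* No newline: the buffer is full (2), or this is a final line that
       does not end with a newline (1). A final line that exactly fills
       the buffer is also reported as 2. */
    return (len == size - 1) ? 2 : 1;
}
```

A caller that gets 2 can keep calling read_line to collect the remaining pieces of the same line, or discard them, depending on what the program needs.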
I think the problem with this code is that when you read with %[^\n\r]s, you read until you reach '\n' or '\r', but you do not read the '\n' or '\r' itself.
So you need to consume that character before you call fscanf again in the loop.
Do something like this:
do {
if (fscanf(f, "%[^\n\r]", cLine) == 1) {
/* Do something here */
}
} while (fgetc(f) != EOF);

fscanf() only picking up first line of file

I have a tab delimited file that I am trying to convert to a tab delimited file. I am using C. I am getting stuck on trying to read the second line of the file. Right now I just get tens of thousands of lines repeating the first line.
#include <stdio.h>
#include <string.h>
#define SELLERCODE A2LQ9QFN82X636
int main ()
{
typedef char* string;
FILE* stream;
FILE* output;
string asin[200];
string sku[15];
string fnsku[15];
int quality = 0;
stream = fopen("c:\\out\\a.txt", "r");
output = fopen("c:\\out\\output.txt", "w");
if (stream == NULL)
{
perror("open");
return 0;
}
for(;;)
{
fscanf(stream, "%[^\t]\t%[^\t]", sku, fnsku);
printf("%s\t%s\n", sku, fnsku);
fprintf(output, "%s\t%s\t%\t%s\t%s\t%i\n", sku, fnsku, asin, quality);
}
}
Prefer fgets() to read the input and parse the lines in your program, using, for example, sscanf() or strtok().
fscanf is notoriously difficult to use.
Your fscanf is not performing any conversions after the first line.
It reads characters up to a TAB, then ignores the TAB, and reads more characters up to the next TAB. On the 2nd time through the loop, there is no data for sku: the 1st character is a TAB.
Do check the return value though. It helps enormously.
chk = fscanf(stream, "%[^\t]\t%[^\t]", sku, fnsku);
/* 2 conversions: sku and fnsku */
if (chk != 2) {
/* something went wrong */
}
You are reading with
fscanf(stream, "%[^\t]\t%[^\t]", sku, fnsku);
After the first call, the second %[^\t] has stopped at a tab character '\t', which is left unread in the input. In the next iteration the first %[^\t] sees that tab immediately, matches nothing, and fscanf returns at once without storing anything, so the buffers still hold the values read the first time. From then on every iteration fails the same way on the same '\t'. You never progress through the file, and the first values read are printed over and over.
You need to read out the last character which terminated the scanset matching. You can either use a fgetc (stream) after the fscanf () call or use the following format string: "%[^\t]\t%[^\t]%*c" . The %*c is the assignment suppression syntax. This will make one character read from the input file but then discard it.
Also you should check what the fscanf () returns. If it does not return 2 (the number of elements to read) then there is a problem which you should handle. This way you can ensure the correct number of elements were read at one call.
So either you can do:
while (fscanf(stream, "%[^\t]\t%[^\t]", sku, fnsku) == 2)
{
fgetc (stream); /* discard the tab that ended the second field */
printf("%s\t%s\n", sku, fnsku);
/* one conversion per argument */
fprintf(output, "%s\t%s\t%s\t%i\n", sku, fnsku, asin, quality);
}
Or you can do:
while (fscanf(stream, "%[^\t]\t%[^\t]%*c", sku, fnsku) == 2)
{
printf("%s\t%s\n", sku, fnsku);
/* one conversion per argument */
fprintf(output, "%s\t%s\t%s\t%i\n", sku, fnsku, asin, quality);
}
But I would recommend reading it with fgets () and then parsing it inside your program with strtok () or other means.
EDIT1:
Note that if you have the original file terminated with a '\n' then after you read the lines as above an extra newline would be added into your buffers. If you still want to read the fields directly with fscanf () where each line has multiple fields separated with '\t' and an entry is terminated with a '\n' then you should use the following format string: "%[^\t]\t%[^\t]\n".
It is difficult to answer while we do not know the exact format of the file. Does the file contain only one single line with fields separated by tabs? Or are there multiple lines, with each line having tab-separated fields? If the latter is true, it is best to read the whole line at once and then parse it internally.
Ok, here is what is actually happening. You are reading the first line, and from then on you aren't reading anything and just reusing those values. You should check the return value of fscanf and exit the loop if it is less than two (which it will be after the first iteration). Your fscanf line should look like this:
if( fscanf(stream, "%[^\t]\t%[^\t]\n", sku, fnsku) < 2 ) break;
The key is the newline at the end, which will eat the newline in the input.
There are some problems with your printf as well. (Incorrect number of formatting strings.) I'll leave that to you.
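The recommendation repeated throughout this thread, reading each line with fgets() and then parsing it, might look like the sketch below. It assumes one record per line with at least two tab-separated fields of at most 14 characters each; FIELD_MAX, parse_two_fields and convert_file are hypothetical names, and the output format is deliberately reduced to the two fields that are actually read.

```c
#include <stdio.h>

#define FIELD_MAX 14   /* assumed maximum field length, excluding the '\0' */

/* Hypothetical helper: split one line of the form "sku<TAB>fnsku..." into its
   first two fields. Each output buffer must hold FIELD_MAX + 1 bytes.
   Returns 1 if both fields were found, 0 otherwise. */
int parse_two_fields(const char *line, char *sku, char *fnsku)
{
    /* %14[^\t\n] reads at most 14 characters up to a tab or newline. */
    return sscanf(line, "%14[^\t\n]\t%14[^\t\n]", sku, fnsku) == 2;
}

/* Read `in` line by line, parse each line, and write the two fields to `out`.
   Returns the number of lines converted; malformed lines are skipped. */
int convert_file(FILE *in, FILE *out)
{
    char line[512];
    char sku[FIELD_MAX + 1], fnsku[FIELD_MAX + 1];
    int converted = 0;

    while (fgets(line, sizeof(line), in) != NULL) {
        if (parse_two_fields(line, sku, fnsku)) {
            fprintf(out, "%s\t%s\n", sku, fnsku);
            converted++;
        }
    }
    return converted;
}
```

Because every fgets call consumes a whole line, a line that fails to parse cannot stall the loop the way the failing fscanf did in the question.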

How to skip a line when fscanning a text file?

I want to scan a file and skip a line of text before reading. I tried:
fscanf(pointer,"\n",&(*struct).test[i][j]);
But this syntax simply starts from the first line.
I was able to skip lines with scanf with the following instruction:
fscanf(config_file, "%*[^\n]\n");
The format string matches a line containing any character including spaces. The * in the format string means we are not interested in saving the line, but just in incrementing the file position.
Format string explanation:
% is the character which each scanf format string starts with;
* indicates to not put the found pattern anywhere (typically you save the pattern found into parameters after the format string; in this case no parameter is needed);
[^\n] means any character except newline;
\n means newline;
so the [^\n]\n means a full text line ending with newline.
Reference here.
fgets will get one line, and set the file pointer starting at the next line. Then, you can start reading what you wish after that first line.
char buffer[100];
fgets(buffer, 100, pointer);
It works as long as your first line is less than 100 characters long. Otherwise, you must check and loop.
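The "check and loop" case can be sketched as follows: keep calling fgets until the buffer actually contains the newline. The function name skip_line is hypothetical; the 100-byte buffer is kept from the snippet above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: discard the rest of the current line, whatever its length.
   Returns 1 if a newline was consumed, 0 if end of file or an error came first. */
int skip_line(FILE *fp)
{
    char buffer[100];

    while (fgets(buffer, sizeof(buffer), fp) != NULL) {
        /* A newline in the buffer means the whole line has been read. */
        if (strchr(buffer, '\n') != NULL) {
            return 1;
        }
    }
    return 0;
}
```

After skip_line(pointer) returns 1, the next read starts at the beginning of the second line.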
It's not clear what you are trying to store your data into, so it's not easy to give an exact answer. In any case, you could just skip bytes until you get past a \n:
FILE *in = fopen("file.txt", "r");
Then you can either skip a whole line with fgets, which only works in one call if you can estimate the length of the line in advance, or use fgetc, which works for any length:
int c;
do {
c = fgetc(in);
} while (c != '\n' && c != EOF);
Finally you should have format specifiers inside your fscanf to actually parse data, like
fscanf(in, "%f", &floatVariable);
you can refer here for specifiers.
fgets would work here.
#define MAX_LINE_LENGTH 80
char buf[MAX_LINE_LENGTH];
/* skip the first line (pFile is the pointer to your file handle): */
fgets(buf, MAX_LINE_LENGTH, pFile);
/* now you can read the rest of your formatted lines */
