Check multiple files with "strstr" and "fopen" in C - c

Today I decided to learn to code for the first time in my life. I decided to learn C. I have created a small program that checks a txt file for a specific value. If it finds that value then it will tell you that that specific value has been found.
What I would like to do is that I can put multiple files go through this program. I want this program to be able to scan all files in a folder for a specific string and display what files contain that string (basically a file index)
I just started today and I'm 15 years old so I don't know if my assumptions are correct on how this can be done and I'm sorry if it may sound stupid but I have been thinking of maybe creating a thread for every directory I put into this program and each thread individually runs that code on the single file and then it displays all the directories in which the string can be found.
I have been looking into threading but I don't quite understand it. Here's the working code for one file at a time. Does anyone know how to make this work as I want it?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
//searches for this string in a txt file
char searchforthis[200];
//file name to display at output
char ch, file_name[200];
FILE *fp;
//Asks for full directory of txt file (example: C:\users\...) and reads that file.
//fp is content of file
printf("Enter name of a file you wish to check:\n");
gets(file_name);
fp = fopen(file_name, "r"); // read mode
//If there's no data inside the file it displays following error message
if (fp == NULL)
{
perror("Error while opening the file.\n");
exit(EXIT_FAILURE);
}
//asks for string (what has to be searched)
printf("Enter what you want to search: \n");
scanf("%s", searchforthis);
char* p;
// Find first occurrence of searchforthis in fp
p = strstr(searchforthis, fp);
// Prints the result
if (p) {
printf("This Value was found in following file:\n%s", file_name);
} else
printf("This Value has not been found.\n");
fclose(fp);
return 0;
}

This line,
p = strstr(searchforthis, fp);
is wrong. strstr() is defined as, char *strstr(const char *haystack, const char *needle), no file pointers in it.
Forget about gets(), its prone to overflow, reference, Why is the gets function so dangerous that it should not be used?.
Your scanf("%s",...) is equally dangerous to using gets() as you don't limit the character to be read. Instead, you could re-format it as,
scanf("%199s", searchforthis); /* 199 characters + \0 to mark the end of the string */
Also check the return value of scanf() , in case an input error occurs, final code should look like this,
if (scanf("%199s", searchforthis) != 1)
{
exit(EXIT_FAILURE);
}
It is even better, if you use fgets() for this, though keep in mind that fgets() will also save the newline character in the buffer, you are going to have to strip it manually.
To actually perform checks on the file, you have to read the file line by line, by using a function like, fgets() or fscanf(), or POSIX getline() and then use strstr() on each line to determine if you have a match or not, something like this should work,
char *p;
char buff[500];
int flag = 0, lines = 1;
while (fgets(buff, sizeof(buff), fp) != NULL)
{
size_t len = strlen(buff); /* get the length of the string */
if (len > 0 && buff[len - 1] == '\n') /* check if the last character is the newline character */
{
buff[len - 1] = '\0'; /* place \0 in the place of \n */
}
p = strstr(buff, searchforthis);
if (p != NULL)
{
/* match - set flag to 1 */
flag = 1;
break;
}
}
if (flag == 0)
{
printf("This Value has not been found.\n");
}
else
{
printf("This Value was found in following file:\n%s", file_name);
}
flag is used to determine whether or not searchforthis exists in the file.
Side note, if the line contains more than 499 characters, you will need a larger buffer, or a different function, consider getline() for that case, or even a custom one reading character by character.
If you want to do this for multiple files, you have to place the whole process in a loop. For example,
for (int i = 0; i < 5; i++) /* this will execute 5 times */
{
printf("Enter name of a file you wish to check:\n");
...
}

Related

How to split text from file into words?

I'm trying to get the text from a file and split it into words by removing spaces and other symbols. This is part of my code for handling the file:
void addtext(char wordarray[M][N])
{
FILE *fileinput;
char word[N];
char filename[N];
char *pch;
int i=0;
printf("Input the name of the text file(ex \"test.txt\"):\n");
scanf("%19s", filename);
if ((fileinput = fopen(filename, "rt"))==NULL)
{
printf("cannot open file\n");
exit(EXIT_FAILURE);
}
fflush(stdin);
while (fgets(word, N, fileinput)!= NULL)
{
pch = strtok (word," ,'.`:-?");
while (pch != NULL)
{
strcpy(wordarray[i++], pch);
pch = strtok (NULL, " ,'.`:-?");
}
}
fclose(fileinput);
wordarray[i][0]='\0';
return ;
}
But here is the issue. When the text input from the file is:
Alice was beginning to get very tired of sitting by her sister on the bank.
Then the output when I try to print it is this:
Alice
was
beginning
to
get
very
tired
of
sitting
by
her
s
ister
on
the
bank
As you can see, the word "sister" is split into 2. This happens quite a few times when adding a bigger text file. What am I missing?
If you count the characters you'll see that s is the 57th character. 57 is 19 times 3 which is the number of parsed characters in each cycle, (20 -1, as fgets null terminates the string and leaves the 20th character in the buffer).
As you are reading lines in batches of 19 characters, the line will be cuted every multiple of 19 charater and the rest will be read by the next fgets in the cycle.
The first two times you where lucky enough that the line was cutted at a space, character 19 at the end of beggining, character 38 at the end of tired, the third time it was in the midle of sister so it cuted it in two words.
Two possible fixes:
Replace:
while (fgets(word, N, fileinput)!= NULL)
With:
while (fscanf(fileinput, %19s, word) == 1)
Provided that there are no words larger than 19 in the file, which is the case.
Make word large enough to take whole the line:
char word[80];
80 should be enough for the sample line.
What am I missing?
You are missing that a single fgets call at maximum will read N-1 characters from the file, Consequently the buffer word may contain only the first part of a word. For instance it seems that in your case the s from the word sister was read by one fgets call and that the remaining part, i.e. ister was read by the next fgets call. Consequently, your code detected sister as two words.
So you need to add code that can check whether the end of the is a whole word or a part of a word.
To start with you can increase N to a higher number but to make it work in general you must add code that checks the end of the word buffer.
Also notice that long words may require more than 2 fgets call.
As a simple alternative to fgets and strtok consider fread and a simple char-by-char passing of the input.
Below is a simple, low-performance example of how it can be done.
int isdelim(char c)
{
if (c == '\n') return 1;
if (c == ' ') return 1;
if (c == '.') return 1;
return 0;
}
void addtext(void)
{
FILE *fileinput;
char *filename = "test.txt";
if ((fileinput = fopen(filename, "rt"))==NULL)
{
printf("cannot open file\n");
return;
}
char c;
int state = LOOK_FOR_WORD;
while (fread(&c, 1, 1, fileinput) == 1)
{
if (state == LOOK_FOR_WORD)
{
if (isdelim(c))
{
// Nothing to do.. keep looking for next word
}
else
{
// A new word starts
putchar(c);
state = READING_WORD;
}
}
else
{
if (isdelim(c))
{
// Current word ended
putchar('\n');
state = LOOK_FOR_WORD;
}
else
{
// Current word continues
putchar(c);
}
}
}
fclose(fileinput);
return ;
}
To keep the code simple it prints the words using putchar instead of saving them in an array but that is quite easy to change.
Further, the code only reads one char at the time from the file. Again it's quit easy to change the code and read bigger chunks from the file.
Likewise you can add more delimiters to isdelim as you like (and improve the implementation)

My program creates a file named date.in but it is not inserting all the numbers

Write a C program that reads from the keyboard a natural number n
with up to 9 digits and creates the text file data.out containing the
number n and all its non-zero prefixes, in a single line, separated by
a space, in order decreasing in value. Example: for n = 10305 the data
file.out will contain the numbers: 10305 1030 103 10 1.
This is what I made:
#include <stdio.h>
int main()
{
int n;
FILE *fisier;
fisier=fopen("date.in","w");
printf("n= \n");
scanf("%d",&n);
fprintf(fisier,"%d",n);
while(n!=0)
{
fisier=fopen("date.in","r");
n=n/10;
fprintf(fisier,"%d",n);
}
fclose(fisier);
}
Few things:
Function calls may return error. You need to check that every time.
fisier=fopen("date.in","w");
This should have been followed by an error check. To understand more on what it return, first thing you should do is read the man page for that function. See man page for fopen(). If there is an error in opening the file, it will return NULL and errno is set to a value which indicates what error occurred.
if (NULL == fisier)
{
// Error handling code
;
}
Your next requirement is separating the numbers by a space. There isn't one. The following should do it.
fprintf(fisier, "%d ", n);
The next major problem is opening the file in a loop. Its like you are trying to open a door which is already open.
fisier=fopen("date.in","r");
if(NULL == fisier)
{
// Error handling code
;
}
while(n!=0)
{
n=n/10;
fprintf(fisier,"%d",n);
}
fclose(fisier);
A minor issue that you aren't checking is the number is not having more than 9 digits.
if(n > 999999999)
is apt after you get a number. If you want to deal with negative numbers as well, you can modify this condition the way you want.
In a nutshell, at least to start with, the program should be something similar to this:
#include <stdio.h>
// Need a buffer to read the file into it. 64 isn't a magic number.
// To print a 9 digit number followed by a white space and then a 8 digit number..
// and so on, you need little less than 64 bytes.
// I prefer keeping the memory aligned to multiples of 8.
char buffer[64];
int main(void)
{
size_t readBytes = 0;
int n = 0;
printf("\nEnter a number: ");
scanf("%d", &n);
// Open the file
FILE *pFile = fopen("date.in", "w+");
if(NULL == pFile)
{
// Prefer perror() instead of printf() for priting errors
perror("\nError: ");
return 0;
}
while(n != 0)
{
// Append to the file
fprintf(pFile, "%d ", n);
n = n / 10;
}
// Done, close the file
fclose(pFile);
printf("\nPrinting the file: ");
// Open the file
pFile = fopen("date.in", "r");
if(NULL == pFile)
{
// Prefer perror() instead of printf() for priting errors
perror("\nError: ");
return 0;
}
// Read the file
while((readBytes = fread(buffer, 1, sizeof buffer, pFile)) > 0)
{
// Preferably better way to print the contents of the file on stdout!
fwrite(buffer, 1, readBytes, stdout);
}
printf("\nExiting..\n\n");
return 0;
}
Remember: The person reading your code may not be aware of all the requirements, so comments are necessary. Secondly, I understand english to a decent level but I don't know what 'fisier' means. Its recommended to name variables in such a way that its easy to understand the purpose of the variable. For example, pFile is a pointer to a file. p in the variable immediately gives an idea that its a pointer.
Hope this helps!
To draw a conclusion from all the comments:
fopen returns a file handle when successfull and NULL otherwise. Opening a file twice might result in an error (it does on my machine), such that fisier is set to NULL inside the loop. Obvioulsy fprintf to NULL wont do anything.
You only need to call fopen once, so remove it from the loop. After that it will work as intended.
It's alwas good to check if the fopen succeeded or not:
FILE *fisier;
fisier=fopen("date.in","w");
if(!fisier) { /* handle error */ }
You print no spaces between the numbers. Maybe that's intended, but maybe
fprintf(fisier,"%d ",n);
would be better.

C - Print lines from file with getline()

I am trying to write a simple C program that loads a text-file, prints the first line to screen, waits for the user to press enter and then prints the next line, and so on.
As only argument it accepts a text-file that is loaded as a stream "database". I use the getline()-function for this, according to this example. It compiles fine, successfully loads the text-file, but the program never enters the while-loop and then exits.
#include <stdio.h>
#include <stdlib.h>
FILE *database = NULL; // input file
int main(int argc, char *argv[])
{
/* assuming the user obeyed syntax and gave input-file as first argument*/
char *input = argv[1];
/* Initializing input/database file */
database = fopen(input, "r");
if(database == NULL)
{
fprintf(stderr, "Something went wrong with reading the database/input file. Does it exist?\n");
exit(EXIT_FAILURE);
}
printf("INFO: database file %s loaded.\n", input);
/* Crucial part printing line after line */
char *line = NULL;
size_t len = 0;
ssize_t read;
while((read = getline(&line, &len, database)) != -1)
{
printf("INFO: Retrieved line of length %zu :\n", read);
printf("%s \n", line);
char confirm; // wait for user keystroke to proceed
scanf("%c", &confirm);
// no need to do anything with "confirm"
}
/* tidy up */
free(line);
fclose(database);
exit(EXIT_SUCCESS);
}
I tried it with fgets() -- I can also post that code --, but same thing there: it never enters the while-loop.
It might be something very obvious; I am new to programming.
I use the gcc-compiler on Kali Linux.
Change your scanf with fgetline using stdin as your file parameter.
You should step through this in a debugger, to make sure your claim that it never enters the while loop is correct.
If it truly never enters the while loop, it is necessarily because getline() has returned -1. Either the file is truly empty, or you have an error reading the file.
man getline says:
On success, getline() and getdelim() return the number of
characters
read, including the delimiter character, but not including the termi‐
nating null byte ('\0'). This value can be used to handle embedded
null bytes in the line read.
Both functions return -1 on failure to read a line (including end-of-
file condition). In the event of an error, errno is set to indicate
the cause.
Therefore, you should enhance your code to check for stream errors and deal with errno -- you should do this even when your code works, because EOF is not the only reason for the function
to return -1.
int len = getline(&line, &len, database);
if(len == -1 && ferror(database)) {
perror("Error reading database");
}
You can write more detailed code to deal with errno in more explicit ways.
Unfortunately handling this thoroughly can make your code a bit more verbose -- welcome to C!

fgets() not working after fscanf()

I am using fscanf to read in the date and then fgets to read the note.
However after the first iteration, fscanf returns a value of -1.
I used GDB to debug the program step by step. It works fine until the first use of fgets. When I try print out the line read by fgets on the first iteration, it gives me this:
(gdb) print line
$6 = "\rtest\r18/04/2010\rtest2\r03/05/2010\rtest3\r05/08/2009\rtest4\r\n\000\000\000\000q\352\261\a\370\366\377\267.N=\366\000\000\000\000\003\000\000\000\370xC\000\000\000\000\000\000\000\000\000\001\000\000\000\227\b\000\000\070\367\377\267H\364\377\267\362\202\004\bdoD\000\354\201\004\b\001\000\000\000\304oC\000p\363\377\277\260zC\000D\363\377\277\n!B\000\064\363\377\277\354\201\004\b(\363\377\277TzC\000\000\000\000\000\070\367\377\267\001\000\000\000\000\000\000\000\001\000\000\000\370xC\000\001\000\000\000\000\000\312\000\000\000\000\000\377\260\360\000\001\000\000\000\277\000\000\000\364\317\000\000\344\261\\\000\000\000\000\000p\363\377\277|\233\004\b\350\362\377\277 \204\004\b\005\000\000\000|\233\004\b\030\363\377\277"
It looks like fgets reads the remaining entries and then stores them all in a single string.
I am not sure why it is doing this.
Here is the main code:
int main(int argc, char* argv[]) {
FILE* file;
int numEntries, i = 0;
int index = atoi(argv[1]);
char line[SIZE];
JournalEntry *entry;
/*argument provided is the entry user wants to be displayed*/
if (argc > 2) {
perror("Error: Too many arguments provided");
}
file = fopen("journalentries.txt", "r");
if (file == NULL) {
perror("Error in opening file");
}
if (fscanf(file, "%d", &numEntries) != 1) {
perror("Unable to read number of entries");
}
entry = (JournalEntry*)malloc(numEntries * sizeof(JournalEntry));
if (entry == NULL) {
perror("Malloc failed");
}
for (i = 0; i < numEntries; i++) {
if (fscanf(file, "%d/%d/%d", &entry[i].day, &entry[i].month, &entry[i].year) != 3) {
perror("Unable to read date of entry");
}
if (fgets(line, sizeof(line), file) == NULL) {
perror("Unable to read text of entry");
}
}
printf("%d-%02d-%02d %s: ", entry[index].year, entry[index].month, entry[index].day, entry[index].text);
if(ferror(file)) {
perror("Error with file");
}
fclose(file);
free(entry);
return 0;
}
The file that I have to read:
The very first line contains the number of entries to be read
4
12/04/2010
test
18/04/2010
test2
03/05/2010
test3
05/08/2009
test4
The struct JournalEntry located in the header file:
typedef struct {
int day;
int month;
int year;
char text[250];
} JournalEntry;
It looks like fgets reads the remaining entries and then stores them all in a single string.
Yes, '\r' is not line terminator. So when fscanf stops parsing at the first invalid character, and leaves them in the buffer, then fgets will read them until end of line. And since there are no valid line terminators in the file, that is until end of file.
You should probably fix the file to have valid (Unix?) line endings, for example with suitable text editor which can do it. But that is another question, which has been asked before (like here), and depends on details not included in your question.
Additionally, you need dual check for fscanf return value. Use perror only if return value is -1, otherwise error message will not be related to the error at all. If return value is >=0 but different from what you wanted, then print custom error message "invalid input syntax" or whatever (and possibly use fgets to read rest of the line out of the buffer).
Also, to reliably mix scanf and fgets, I you need to add space in the fscanf format string, so it will read up any whitespace at the end of the line (also at the start of next line and any empty lines, so be careful if that matters), like this:
int items_read = scanf("%d ", &intvalue);
As stated in another answer, it's probably best to read lines with fgets only, then parse them with sscanf line-by-line.
Don't mix fscanf() and fgets(), since the former might leave stuff in the stream's buffer.
For a line-oriented format, read only full lines using fgets(), then use e.g. sscanf() to parse what you've read.
The string you see when running GDB really ends at the first null character:
"\rtest\r18/04/2010\rtest2\r03/05/2010\rtest3\r05/08/2009\rtest4\r\n\000"
The other data after is ignored (when using ordinary str-functions);

C, reading a multiline text file

I know this is a dumb question, but how would I load data from a multiline text file?
while (!feof(in)) {
fscanf(in,"%s %s %s \n",string1,string2,string3);
}
^^This is how I load data from a single line, and it works fine. I just have no clue how to load the same data from the second and third lines.
Again, I realize this is probably a dumb question.
Edit: Problem not solved. I have no idea how to read text from a file that's not on the first line. How would I do this? Sorry for the stupid question.
Try something like:
/edited/
char line[512]; // or however large you think these lines will be
in = fopen ("multilinefile.txt", "rt"); /* open the file for reading */
/* "rt" means open the file for reading text */
int cur_line = 0;
while(fgets(line, 512, in) != NULL) {
if (cur_line == 2) { // 3rd line
/* get a line, up to 512 chars from in. done if NULL */
sscanf (line, "%s %s %s \n",string1,string2,string3);
// now you should store or manipulate those strings
break;
}
cur_line++;
}
fclose(in); /* close the file */
or maybe even...
char line[512];
in = fopen ("multilinefile.txt", "rt"); /* open the file for reading */
fgets(line, 512, in); // throw out line one
fgets(line, 512, in); // on line 2
sscanf (line, "%s %s %s \n",string1,string2,string3); // line 2 is loaded into 'line'
// do stuff with line 2
fgets(line, 512, in); // on line 3
sscanf (line, "%s %s %s \n",string1,string2,string3); // line 3 is loaded into 'line'
// do stuff with line 3
fclose(in); // close file
Putting \n in a scanf format string has no different effect from a space. You should use fgets to get the line, then sscanf on the string itself.
This also allows for easier error recovery. If it were just a matter of matching the newline, you could use "%*[ \t]%*1[\n]" instead of " \n" at the end of the string. You should probably use %*[ \t] in place of all your spaces in that case, and check the return value from fscanf. Using fscanf directly on input is very difficult to get right (what happens if there are four words on a line? what happens if there are only two?) and I would recommend the fgets/sscanf solution.
Also, as Delan Azabani mentioned... it's not clear from this fragment whether you're not already doing so, but you have to either define space [e.g. in a large array or some dynamic structure with malloc] to store the entire dataset, or do all your processing inside the loop.
You should also be specifying how much space is available for each string in the format specifier. %s by itself in scanf is always a bug and may be a security vulnerability.
First off, you don't use feof() like that...it shows a probable Pascal background, either in your past or in your teacher's past.
For reading lines, you are best off using either POSIX 2008 (Linux) getline() or standard C fgets(). Either way, you try reading the line with the function, and stop when it indicates EOF:
while (fgets(buffer, sizeof(buffer), fp) != 0)
{
...use the line of data in buffer...
}
char *bufptr = 0;
size_t buflen = 0;
while (getline(&bufptr, &buflen, fp) != -1)
{
...use the line of data in bufptr...
}
free(bufptr);
To read multiple lines, you need to decide whether you need previous lines available as well. If not, a single string (character array) will do. If you need the previous lines, then you need to read into an array, possibly an array of dynamically allocated pointers.
Every time you call fscanf, it reads more values. The problem you have right now is that you're re-reading each line into the same variables, so in the end, the three variables have the last line's values. Try creating an array or other structure that can hold all the values you need.
The best way to do this is to use a two dimensional array and and just write each line into each element of the array. Here is an example reading from a .txt file of the poem Ozymandias:
int main() {
char line[15][255];
FILE * fpointer = fopen("ozymandias.txt", "rt");
for (int a = 0; a < 15; a++) {
fgets(line[a], 255, fpointer);
}
for (int b = 0; b < 15; b++) {
printf("%s", line[b]);
}
return 0;
This produces the poem output. Notice that the poem is 14 lines long, it is more difficult to print out a file whose length you do not know because reading a blank line will produce the output "x�oA". Another issue is if you check if the next line is null by writing
while (fgets(....) != NULL)) {
each line will be skipped. You could try going back a line each time to solve this but i think this solution is fine for all intents.
I have an even EASIER solution with no confusing snippets of puzzling methods (no offense to the above stated) here it is:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
string line;//read the line
ifstream myfile ("MainMenu.txt"); // make sure to put this inside the project folder with all your .h and .cpp files
if (myfile.is_open())
{
while ( myfile.good() )
{
getline (myfile,line);
cout << line << endl;
}
myfile.close();
}
else cout << "Unable to open file";
return 0;
}
Happy coding

Resources