Finding a keyword from one file in another file - c

So my project for class needs me to find a list of keywords from one file:
Master's,Bachelor's,Professor
And needs me to find it through a resume in another file:
John Smith
1234 Residence Road
johnsmith#gmail.com
Degree level: Bachelor's degree
Major: Applied mathematics
Work Experience:
Professor at local university for multiple math classes, mainly calculus
Worked at a tech company studying the analytics of their online store
Now I have the keywords stored in an array, char spacing[10][15] (with no commas),
and I have the resume saved as char* buffer. Both work, as they both print out, but when trying to find the keywords I keep getting 0 (int KWcount counts how many times a keyword appears). Here's the code for both reading the resume into buffer and my attempt at finding the words.
//Resume
FILE* fp2;
fp2 = fopen("resume.txt", "r");
if (fp == NULL) {
printf("\nFile not found.\n");
return 0;
}
//File Reading
fseek(fp2, 0L, SEEK_END);
numbytes = ftell(fp2);
fseek(fp2, 0L, SEEK_SET);
buffer = (char*)calloc(numbytes, sizeof(char));
if (buffer == NULL)
return 1;
fread(buffer, sizeof(char), numbytes, fp2);
fclose(fp2);
printf("Before process: %i", KWcount);
//Search for keyword
for (i=0; i < 10; i++) {
if (strcmp(buffer, spacing[i]) == 0) {
KWcount++;
}
}
printf("\n\nAfter process:%i\n", KWcount);
fclose(fp2);
}
Before process:0
After process:0
I genuinely cannot figure out what the problem is and my professor is not really any help, so does anyone have any tips or ways to fix this?

strcmp(buffer, spacing[i])
strcmp will return 0 only if the two arguments point to strings which are exactly equal. What you need to do is search buffer word by word and then compare those with the word you expect to find.
You may find the function strtok useful to break apart your resume buffer into words.
And also I recommend using strncmp, since it allows you to cap the maximum number of characters in your comparison. After all, your goal is to compare just a single word from the resume, not the entire string.
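As a rough sketch of the search, assuming the resume is already in a NUL-terminated buffer and the keywords sit in spacing[10][15] as in the question. The helper name count_keywords is made up for illustration, and it uses strstr instead of strtok because strstr leaves the buffer untouched and copes with punctuation such as the apostrophe in "Bachelor's":

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts how many of the first n keywords occur
   at least once anywhere in text. text and every keyword must be
   NUL-terminated strings. */
int count_keywords(const char *text, char keywords[][15], size_t n)
{
    int found = 0;
    for (size_t i = 0; i < n; i++) {
        /* Skip unused (empty) slots: strstr matches "" everywhere. */
        if (keywords[i][0] == '\0')
            continue;
        if (strstr(text, keywords[i]) != NULL)
            found++;
    }
    return found;
}
```

With the question's variables the call would be KWcount = count_keywords(buffer, spacing, 10). Note that calloc(numbytes, ...) followed by reading numbytes bytes leaves no room for a terminator, so buffer would need numbytes + 1 bytes before it can be passed to any string function.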


Why doesn't this code display all the chars in the file? I also tried getc() and fscanf(), and they didn't work either.

#include<stdio.h>
#include<stdlib.h>
int main()
{
char str[1];
FILE *file;
file = fopen("dataloger.txt", "rb");
while(fread(&str[0], sizeof(str), 1, file) ==1){
if(str[0] =='\n'){
str[0] = '#';
}
printf("%s", str);
}
fclose(file);
/************************************************
int c;
if (file) {
while ((c = getc(file)) != EOF){
if(c =='\n') {
c = '#';
}
printf("%c", c);
}
fclose(file);
}
Could someone explain why I don't see the normal output, only #?
I only ran into the problem after I tried to replace the '\n' char with another char and print it.
See the attached screenshot.
I copied in your code minus the commented out bits and did some testing. Following is a version of your code with some tweaks.
#include<stdio.h>
#include<stdlib.h>
int main()
{
char str; /* In reality, "str" is a single character - a string would need room for at least one character plus a NUL terminator */
FILE *fp; /* Don't like to name variables with possibly reserved words */
fp = fopen("dataloger.txt", "rb");
if (fp == NULL) /* Just to make the program more robust */
{
printf("Could not find the file\n");
return -1;
}
while(fread(&str, sizeof(str), 1, fp) ==1)
{
if(str =='\n')
{
str = '#';
}
printf("%c", str);
}
fclose(fp);
printf("\n"); /* Just to be neat */
return 0;
}
Following are some points to note.
First, since the program is supposed to read a character at a time from the text file, it seemed clearer to use a character variable instead of a character array of length 1. Reading into either works, but the original printf("%s", str) passes a one-element array with no NUL terminator, so printf keeps reading past the end of it; printing a single char with %c avoids that.
It's usually a good idea to shy away from variable names that might be mistaken for names with a reserved meaning. Although "file" and "FILE" are different names, it can add to confusion for anyone analyzing this code.
Also, when working with files, it is a good idea to check that the file actually opened; a check that the file pointer is not NULL is cheap, and a failed open might be at the core of your issue depending upon where the text file is in relation to the compiled program.
Testing out this tweaked code, a simple two-line text file was set up with the following test data.
The quick brown fox jumps over the lazy dog.
Now is the time for all good men to come to the aid of their country.
Executing the code resulted in the following output at the terminal (the program was compiled with a name of "ReadFile").
#Dev:~/C_Programs/Console/ReadFile/bin/Release$ ./ReadFile
The quick brown fox jumps over the lazy dog.#Now is the time for all good men to come to the aid of their country.#
Give those tweaks a try to see if it meets the spirit of your project.
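For comparison, here is a minimal sketch of the same replacement done with getc, which the question also tried. The helper name replace_newlines is hypothetical, and it writes to a second stream instead of calling printf so it can be reused with stdout or a file:

```c
#include <stdio.h>

/* Hypothetical helper: copies in to out, writing '#' in place of each
   newline. Returns the number of newlines replaced. */
long replace_newlines(FILE *in, FILE *out)
{
    int c;              /* int, not char, so EOF can be told apart from data */
    long replaced = 0;
    while ((c = getc(in)) != EOF) {
        if (c == '\n') {
            c = '#';
            replaced++;
        }
        putc(c, out);
    }
    return replaced;
}
```

Calling replace_newlines(fp, stdout) after a successful fopen prints the file with the substitutions. If the file has Windows line endings and is opened with "rb", a '\r' stays in front of each '#', and some terminals then overwrite the line they just printed; that is only a guess about the input, since the question does not show the file.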

Check multiple files with "strstr" and "fopen" in C

Today I decided to learn to code for the first time in my life. I decided to learn C. I have created a small program that checks a txt file for a specific value. If it finds that value then it will tell you that that specific value has been found.
What I would like to do is that I can put multiple files go through this program. I want this program to be able to scan all files in a folder for a specific string and display what files contain that string (basically a file index)
I just started today and I'm 15 years old so I don't know if my assumptions are correct on how this can be done and I'm sorry if it may sound stupid but I have been thinking of maybe creating a thread for every directory I put into this program and each thread individually runs that code on the single file and then it displays all the directories in which the string can be found.
I have been looking into threading but I don't quite understand it. Here's the working code for one file at a time. Does anyone know how to make this work as I want it?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
//searches for this string in a txt file
char searchforthis[200];
//file name to display at output
char ch, file_name[200];
FILE *fp;
//Asks for full directory of txt file (example: C:\users\...) and reads that file.
//fp is content of file
printf("Enter name of a file you wish to check:\n");
gets(file_name);
fp = fopen(file_name, "r"); // read mode
//If there's no data inside the file it displays following error message
if (fp == NULL)
{
perror("Error while opening the file.\n");
exit(EXIT_FAILURE);
}
//asks for string (what has to be searched)
printf("Enter what you want to search: \n");
scanf("%s", searchforthis);
char* p;
// Find first occurrence of searchforthis in fp
p = strstr(searchforthis, fp);
// Prints the result
if (p) {
printf("This Value was found in following file:\n%s", file_name);
} else
printf("This Value has not been found.\n");
fclose(fp);
return 0;
}
This line,
p = strstr(searchforthis, fp);
is wrong. strstr() is declared as char *strstr(const char *haystack, const char *needle); both arguments are strings, so it cannot take a FILE pointer, and the string being searched comes first.
Forget about gets(); it is prone to buffer overflow. See "Why is the gets function so dangerous that it should not be used?".
Your scanf("%s",...) is just as dangerous as gets(), because you don't limit the number of characters to be read. Instead, you could write it as,
scanf("%199s", searchforthis); /* 199 characters + \0 to mark the end of the string */
Also check the return value of scanf() in case an input error occurs. The final code should look like this,
if (scanf("%199s", searchforthis) != 1)
{
exit(EXIT_FAILURE);
}
It is even better to use fgets() for this, though keep in mind that fgets() also stores the newline character in the buffer, so you have to strip it manually.
To actually check the file, you have to read it line by line with a function like fgets(), fscanf(), or POSIX getline(), and then use strstr() on each line to determine whether you have a match. Something like this should work,
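A small sketch of reading a line with fgets() and stripping the newline, assuming the caller supplies the buffer. The helper name read_line is made up for this example; strcspn is the standard function that returns the index of the first '\n', or of the terminator when there is none:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from in into buf (capacity size)
   and removes the trailing newline, if any.
   Returns 1 on success, 0 on end of file or error. */
int read_line(FILE *in, char *buf, int size)
{
    if (fgets(buf, size, in) == NULL)
        return 0;
    /* Index of the first '\n', or of the terminator if there is none. */
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

For the search term this would be read_line(stdin, searchforthis, sizeof(searchforthis)), which also accepts terms containing spaces, unlike scanf("%199s", ...).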
char *p;
char buff[500];
int flag = 0, lines = 1;
while (fgets(buff, sizeof(buff), fp) != NULL)
{
size_t len = strlen(buff); /* get the length of the string */
if (len > 0 && buff[len - 1] == '\n') /* check if the last character is the newline character */
{
buff[len - 1] = '\0'; /* place \0 in the place of \n */
}
p = strstr(buff, searchforthis);
if (p != NULL)
{
/* match - set flag to 1 */
flag = 1;
break;
}
}
if (flag == 0)
{
printf("This Value has not been found.\n");
}
else
{
printf("This Value was found in following file:\n%s", file_name);
}
flag is used to determine whether or not searchforthis exists in the file.
Side note: if a line contains more than 499 characters, fgets() returns it in several pieces, so a match that straddles two pieces is missed. In that case you need a larger buffer or a different function; consider getline(), or even a custom one that reads character by character.
If you want to do this for multiple files, you have to place the whole process in a loop. For example,
for (int i = 0; i < 5; i++) /* this will execute 5 times */
{
printf("Enter name of a file you wish to check:\n");
...
}
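One way to make that loop manageable is to move the per-file work into a function and call it once per file; no threads are needed for this. A minimal sketch, where the name file_contains and its return convention are assumptions made for the example:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if needle occurs on some line of the
   file at path, 0 if it does not, and -1 if the file cannot be opened.
   Lines longer than the buffer are read in pieces, so a match that
   straddles a piece boundary would be missed. */
int file_contains(const char *path, const char *needle)
{
    char buff[500];
    int found = 0;
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    while (fgets(buff, sizeof(buff), fp) != NULL) {
        if (strstr(buff, needle) != NULL) {
            found = 1;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

A main() could then take the search string as argv[1], loop over argv[2] onward, and print each file name for which file_contains returns 1. Listing every file in a folder needs a platform-specific API (for example opendir/readdir on POSIX), which is left out here.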

Make array read file and store all data

Okay, so all my code is functional. I am mostly looking for suggestions.
Right now, I have a file being read. Each line of the file has 3 different variables. These variables are being read into an array.
The problem I am trying to get input on is that as the file is read in the while loop, the data overwrites itself. I need all the data stored in one array with spaces between. I am not sure why it is not currently doing that. Is there a better function to be using?
Here is a sample of what I have:
char filepath[1000], filepathBP1[1000];
char BP2_ext [] = "\\BP_2.txt";
char bp2_Val1[80], bp2_Val2[80], bp2_Val3[80], bp2_Line[100];
FILE* fp;
strcpy(filepathBP1, filepath);
strcat(filepathBP1, BP1_ext);
fp = fopen(filepathBP1, "r");
if (fp == NULL)
{
puts("ERROR OPENING FILES");
exit(EXIT_FAILURE);
}
while (!feof(fp))
{
printf("\n\nREADING BP_1.txt...");
fgets(bp1_Line, 100, fp);
sscanf(bp1_Line, "%s\t%s\t%s", bp1_Val1, bp1_Val2, bp1_Val3);
printf("%s\t%s\t%s\n", bp1_Val1, bp1_Val2, bp1_Val3);
}
fclose(fp);
Here is a modified version of your code. Note this is just a basic solution. Please feel free to modify the code according to your needs. Now your basic idea/approach is correct. The only thing you need to do is that you have to have an array of "Strings" to store the "strings". Also, your question isn't clear. Please be more specific as to what you finally want the output to result in or look like.
In my program I have 3 arrays of strings, and each of them stores the strings from one column of the file.
For example if the file data is like this,
abc def zxc
qwe rty uio
Then line_list1 will store the strings abc,qwe, line_list2 will store the strings def,rty and line_list3 will store the strings zxc,uio. I don't know if this is exactly what you want (since you haven't been specific about what the resulting output should look like), but this program should give you an idea of how to make yours work.
Here is the program,
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#define MAX 100
int main(){
char bp1_Line[MAX];
/* Each entry needs its own storage; an array of uninitialized char* has nowhere to copy the text */
char line_list1[MAX][MAX],line_list2[MAX][MAX],line_list3[MAX][MAX];
int index=0,i=0;
FILE* fp=NULL;
fp = fopen("data.txt", "r");
if (fp == NULL){
puts("ERROR OPENING FILES");
exit(EXIT_FAILURE);
}
while ( index<MAX && fgets(bp1_Line, MAX, fp)!= NULL ){
printf("READING:%s\n",bp1_Line);
/* %99s stores at most MAX-1 characters plus the terminator in each field */
if(sscanf(bp1_Line, "%99s %99s %99s", line_list1[index], line_list2[index], line_list3[index]) == 3){
index++;
}
}
fclose(fp);
for(i=0;i<index;i++){
printf("%s\t%s\t%s\n", line_list1[i], line_list2[i], line_list3[i]);
}
return 0;
}
Or, if you want to store all those strings/words in one array of strings, declare char list[3*MAX][MAX]; and change the while loop above to this,
while ( index+2<3*MAX && fgets(bp1_Line, MAX, fp)!= NULL ){
printf("READING:%s\n",bp1_Line);
if(sscanf(bp1_Line,"%99s %99s %99s",list[index],list[index+1],list[index+2]) == 3){
index=index+3;
}
}
You should switch to another programming language. Python may be good for you.
You left out most of the error handling. Python will throw exceptions in such a case, which are hard to ignore.
You use fixed-length character arrays without checking for overflow. Python has built-in string support.
Python has built-in support for data structures like resizable sequences and even dictionaries.

C, reading a multiline text file

I know this is a dumb question, but how would I load data from a multiline text file?
while (!feof(in)) {
fscanf(in,"%s %s %s \n",string1,string2,string3);
}
^^This is how I load data from a single line, and it works fine. I just have no clue how to load the same data from the second and third lines.
Again, I realize this is probably a dumb question.
Edit: Problem not solved. I have no idea how to read text from a file that's not on the first line. How would I do this? Sorry for the stupid question.
Try something like:
char line[512]; // or however large you think these lines will be
in = fopen ("multilinefile.txt", "rt"); /* open the file for reading */
/* "rt" means open the file for reading text */
int cur_line = 0;
while(fgets(line, 512, in) != NULL) {
if (cur_line == 2) { // 3rd line
/* get a line, up to 512 chars from in. done if NULL */
sscanf (line, "%s %s %s \n",string1,string2,string3);
// now you should store or manipulate those strings
break;
}
cur_line++;
}
fclose(in); /* close the file */
or maybe even...
char line[512];
in = fopen ("multilinefile.txt", "rt"); /* open the file for reading */
fgets(line, 512, in); // throw out line one
fgets(line, 512, in); // on line 2
sscanf (line, "%s %s %s \n",string1,string2,string3); // line 2 is loaded into 'line'
// do stuff with line 2
fgets(line, 512, in); // on line 3
sscanf (line, "%s %s %s \n",string1,string2,string3); // line 3 is loaded into 'line'
// do stuff with line 3
fclose(in); // close file
Putting \n in a scanf format string has no different effect from a space. You should use fgets to get the line, then sscanf on the string itself.
This also allows for easier error recovery. If it were just a matter of matching the newline, you could use "%*[ \t]%*1[\n]" instead of " \n" at the end of the string. You should probably use %*[ \t] in place of all your spaces in that case, and check the return value from fscanf. Using fscanf directly on input is very difficult to get right (what happens if there are four words on a line? what happens if there are only two?) and I would recommend the fgets/sscanf solution.
Also, as Delan Azabani mentioned, it's not clear from this fragment whether you are already doing so, but you have to either set aside space to store the entire dataset (e.g. a large array, or some dynamic structure built with malloc) or do all your processing inside the loop.
You should also be specifying how much space is available for each string in the format specifier. %s by itself in scanf is always a bug and may be a security vulnerability.
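A sketch of the fgets/sscanf approach with field widths and a return-value check. The helper name parse_three and the 32-byte field size are assumptions made for this example, not anything from the question:

```c
#include <stdio.h>

/* Hypothetical helper: splits line into three whitespace-separated words.
   Each output buffer must hold at least 32 bytes.
   Returns 1 only if the line contains exactly three words. */
int parse_three(const char *line, char *a, char *b, char *c)
{
    char extra;
    /* %31s stores at most 31 characters plus the terminator.
       The trailing " %c" converts only if a fourth word is present,
       which raises the count to 4. */
    int n = sscanf(line, "%31s %31s %31s %c", a, b, c, &extra);
    return n == 3;
}
```

Each line would first be read with fgets(line, sizeof line, in) and then passed to parse_three, so a line with two or four words is rejected on its own without throwing off the lines after it.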
First off, you don't use feof() like that...it shows a probable Pascal background, either in your past or in your teacher's past.
For reading lines, you are best off using either POSIX 2008 (Linux) getline() or standard C fgets(). Either way, you try reading the line with the function, and stop when it indicates EOF:
while (fgets(buffer, sizeof(buffer), fp) != 0)
{
...use the line of data in buffer...
}
Or, with getline():
char *bufptr = 0;
size_t buflen = 0;
while (getline(&bufptr, &buflen, fp) != -1)
{
...use the line of data in bufptr...
}
free(bufptr);
To read multiple lines, you need to decide whether you need previous lines available as well. If not, a single string (character array) will do. If you need the previous lines, then you need to read into an array, possibly an array of dynamically allocated pointers.
Every time you call fscanf, it reads more values. The problem you have right now is that you're re-reading each line into the same variables, so in the end, the three variables have the last line's values. Try creating an array or other structure that can hold all the values you need.
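A minimal sketch of such a structure, assuming three words per line and at most 31 characters per word. The struct row and the helper read_rows are hypothetical names for this example:

```c
#include <stdio.h>

/* One parsed line: three words of up to 31 characters each (assumed limit). */
struct row {
    char a[32];
    char b[32];
    char c[32];
};

/* Hypothetical helper: reads up to max rows from in into rows.
   Lines that do not contain three words are skipped.
   Returns the number of rows stored. */
size_t read_rows(FILE *in, struct row *rows, size_t max)
{
    char line[512];
    size_t n = 0;
    while (n < max && fgets(line, sizeof line, in) != NULL) {
        if (sscanf(line, "%31s %31s %31s", rows[n].a, rows[n].b, rows[n].c) == 3)
            n++;
    }
    return n;
}
```

After struct row rows[100]; size_t n = read_rows(in, rows, 100); every line stays available as rows[0] through rows[n - 1] instead of being overwritten by the next one.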
A simple way to do this is to use a two dimensional array and just read each line into its own element of the array. Here is an example reading from a .txt file of the poem Ozymandias:
#include <stdio.h>
int main() {
char line[15][255];
int count = 0;
FILE * fpointer = fopen("ozymandias.txt", "r");
if (fpointer == NULL) {
return 1;
}
/* stop after 15 lines or at end of file, whichever comes first */
while (count < 15 && fgets(line[count], 255, fpointer) != NULL) {
count++;
}
fclose(fpointer);
for (int b = 0; b < count; b++) {
printf("%s", line[b]);
}
return 0;
}
This produces the poem output. Notice that the poem is 14 lines long while the array has room for 15. If the loop called fgets a fixed 15 times without checking the result, the last call would fail at end of file and leave line[14] uninitialized, so printing it would produce garbage. Checking the return value of fgets in the loop condition avoids that and also copes with a file whose length you do not know in advance. Lines are only skipped if you call fgets again inside a loop that already calls it in its condition. I think this solution is fine for all intents.
I have an even EASIER solution with no confusing snippets of puzzling methods (no offense to the above stated) here it is:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
string line;//read the line
ifstream myfile ("MainMenu.txt"); // make sure to put this inside the project folder with all your .h and .cpp files
if (myfile.is_open())
{
while ( getline (myfile,line) ) // stops at end of file, so the last line is not printed twice
{
cout << line << endl;
}
myfile.close();
}
else cout << "Unable to open file";
return 0;
}
Happy coding

XOR on a very big file

I would like to XOR a very big file (~50 GB).
More precisely, I would like to do so by XORing each block of 32 bytes of a plaintext file (because of lack of memory) with the key 3847611839 and create (block after block) a new cipher file.
Thank You for any help!!
This sounded like fun, and doesn't sound like a homework assignment.
I don't have a previously XOR-encrypted file to try with, but if you convert one back and forth, there's no diff.
That much I tried, at least. Enjoy! :) This XORs every 4 bytes with 0xE555E5BF; I presume that's what you wanted.
Here's bloxor.c
// bloxor.c - by Peter Boström 2009, public domain, use as you see fit. :)
#include <stdio.h>
unsigned int xormask = 0xE555E5BF; //3847611839 in hex.
int main(int argc, char *argv[])
{
printf("%x\n", xormask);
if(argc < 3)
{
printf("usage: bloxor 'file' 'outfile'\n");
return -1;
}
FILE *in = fopen(argv[1], "rb");
if(in == NULL)
{
printf("Cannot open: %s", argv[1]); /* report the input file, which is the one that failed */
return -1;
}
FILE *out = fopen(argv[2], "wb");
if(out == NULL)
{
fclose(in);
printf("unable to open '%s' for writing.",argv[2]);
return -1;
}
char buffer[1024]; //presuming 1024 is a good block size, I dunno...
int count;
while(count = fread(buffer, 1, 1024, in))
{
int i;
int end = count/4;
if(count % 4)
++end;
for(i = 0;i < end; ++i)
{
((unsigned int *)buffer)[i] ^= xormask;
}
if(fwrite(buffer, 1, count, out) != count)
{
fclose(in);
fclose(out);
printf("cannot write, disk full?\n");
return -1;
}
}
fclose(in);
fclose(out);
return 0;
}
As starblue mentioned in a comment, "Be aware that this is at best obfuscation, not encryption". And it's probably not even obfuscation.
One property of XOR is that (Y xor 0) == Y. What this means for your algorithm is that for anyplace in your very big file where there are runs of zeros (which seems pretty likely given the size of the file), your key will show up in the cipher file. Plain as day.
Another nice feature of XOR encrypted stuff is that if someone has both the plaintext and the ciphertext, XOR'ing those items together nets you an output that has the key used to perform the cipher repeated over and over. If the person knows that the 2 files are a plaintext/ciphertext pair, they've learned the key, which is bad if the key is used for more than one encryption. If the attacker isn't sure whether the plaintext and ciphertext are related, they have a pretty good idea after this, since the key is a repeated pattern in the output. None of this is a problem with a one time pad because each bit of the key is used only once, so no one learns anything new from this attack.
A lot of people make the mistake of assuming that because a one time pad is provably unbreakable, that an XOR encryption might be OK 'if done well' since the fundamental operation performed is the same. The difference is that a one time pad uses each random bit of the key exactly once. So among other things, if the plaintext has a run of zeros, nothing is learned about the key, unlike with a simple fixed-key XOR cipher.
As Bruce Schneier said: "There are two kinds of cryptography in this world: cryptography that will stop your kid sister from reading your files, and cryptography that will stop major governments from reading your files."
An XOR cipher is barely kid sister proof - if even that.
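The zero-run leak is easy to see in a few lines. A small sketch, assuming the key is applied as the four bytes E5 55 E5 BF in that order (the byte order is an assumption; the program above uses whatever order the machine stores an unsigned int in). The helper name xor_repeat is made up for this example:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: XORs len bytes of data in place with a
   repeating 4-byte key. Applying it twice restores the original. */
void xor_repeat(uint8_t *data, size_t len, const uint8_t key[4])
{
    for (size_t i = 0; i < len; i++)
        data[i] ^= key[i % 4];
}
```

Running xor_repeat over a buffer of zero bytes leaves the key itself in the buffer, repeated every four bytes, which is exactly what an attacker would look for in the cipher file.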
You need to build the solution around streaming: read the input file as a stream, one chunk at a time, modify each chunk, and write the result to the output file.
This way, you never have to hold the whole file in memory at once.
If your question is how to do it without using extra space on the disk, I would just read in the chunks in multiples of 32 bytes (as big as you can), work with the chunk in memory, then write it out again. You should be able to use the ftell and fseek functions to do that (assuming your long type is large enough, of course).
It may be faster to memory-map the file if you can spare that much out of your address space (and your OS supports it) but I'd try the easiest solution first.
Of course, if space isn't a problem, just read the chunks in and write them to a new file, something like the following (pseudo-code):
open infile
open outfile
while not end of infile:
read chunk from file
change chunk
write chunk to outfile
close outfile
close infile
This sort of read/process/write is pretty basic stuff. If you have more complicated requirements, you should update your question with them.
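The pseudo-code above could be sketched in C roughly as follows. The function name xor_stream, the 4096-byte chunk size, and the byte-wise use of a 4-byte key are all assumptions made for the example:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: copies in to out, XORing every byte with a
   repeating 4-byte key. Returns 0 on success, -1 on a read or write error. */
int xor_stream(FILE *in, FILE *out, const uint8_t key[4])
{
    uint8_t chunk[4096];
    size_t n;
    size_t pos = 0;   /* position within the key, carried across chunks */

    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
        for (size_t i = 0; i < n; i++)
            chunk[i] ^= key[(pos + i) % 4];
        pos = (pos + n) % 4;
        if (fwrite(chunk, 1, n, out) != n)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

Both files should be opened in binary mode ("rb" and "wb"). Memory use stays at one chunk regardless of file size, and running the output through the function again with the same key gives back the original.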
