Read in individual words from text file and translate - C


I am writing a program (for a class assignment) to translate normal words into their pirate equivalents (hi = ahoy).
I have created the dictionary using two arrays of strings and am now trying to translate an input.txt file and put it into an output.txt file. I am able to write to the output file, but it only writes the translated first word over and over on a new line.
I've done a lot of reading/scouring and from what I can tell, using fscanf() to read my input file isn't ideal, but I cannot figure out what would be a better function to use. I need to read the file word by word (separated by space) and also read in each punctuation mark.
Input File:
Hi, excuse me sir, can you help
me find the nearest hotel? I
would like to take a nap and
use the restroom. Then I need
to find a nearby bank and make
a withdrawal.
Miss, how far is it to a local
restaurant or pub?
Output: ahoy (46 times, each on a separate line)
Translate Function:
void Translate(char inputFile[], char outputFile[], char eng[][20], char pir[][20]){
char currentWord[40] = {[0 ... 39] = '\0'};
char word;
FILE *inFile;
FILE *outFile;
int i = 0;
bool match = false;
//open input file
inFile = fopen(inputFile, "r");
//open output file
outFile = fopen(outputFile, "w");
while(fscanf(inFile, "%s1023", currentWord) == 1){
if( ispunct(currentWord) == 0){
while( match != true){
if( strcasecmp(currentWord, eng[i]) == 0 || i<28){ //Finds word in English array
fprintf(outFile, pir[i]); //Puts pirate word corresponding to English word in output file
match = true;
}
else {i++;}
}
match = false;
i=0;
}
else{
fprintf(outFile, &word);//Attempt to handle punctuation which should carry over to output
}
}
}

As you start matching against different English words, i<28 is initially true. Hence the expression <anything> || i<28 is also immediately true, and the code behaves as though a match was found on the first word in your dictionary.
To avoid this you should handle the "found a match at index i" and the "no match found" conditions separately. This can be achieved as follows:
if (i >= dictionary_size) {
// No pirate equivalent, print English word
fprintf(outFile, "%s", currentWord);
break; // stop matching
}
else if (strcasecmp(currentWord, eng[i]) == 0){
...
}
else {i++;}
where dictionary_size would be 28 in your case (based on your attempt at a stop condition with i<28).
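To make the separation of the two cases concrete, here is a minimal sketch of the lookup pulled out into its own function. The name translate_word and its parameter list are my own invention, not part of the original program, and strcasecmp() is a POSIX function declared in <strings.h>, so this assumes a POSIX system.

```c
#include <stddef.h>
#include <strings.h>  /* strcasecmp (POSIX, not ISO C) */

/* Hypothetical helper: returns the pirate word for `word`, or `word`
   itself when the dictionary has no entry for it. */
const char *translate_word(const char *word,
                           char eng[][20],
                           char pir[][20],
                           size_t dictionary_size)
{
    for (size_t i = 0; i < dictionary_size; i++) {
        if (strcasecmp(word, eng[i]) == 0) {
            return pir[i];   /* match found at index i */
        }
    }
    return word;             /* no match: keep the English word */
}
```

The caller would then write the result with fprintf(outFile, "%s", ...), so that a word containing a % character is never interpreted as a format string.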

Here's a code snippet that I use to parse things out. Here's what it does:
Given this input:
hi, excuse me sir, how are you.
It puts each word into an array of strings based on the DELIMS constant, and deletes any char in the DELIMS const. This will destroy your original input string though. I simply print out the array of strings:
[hi][excuse][me][sir][how][are][you][(null)]
Now this is taking input from stdin, but you can change it around to take it from a file stream. You also might want to consider input limits and such.
#include <stdio.h>
#include <string.h>
#define CHAR_LENGTH 100
const char *DELIMS = " ,.\n";
char *p;
int i;
int parse(char *inputLine, char *arguments[], const char *delimiters)
{
int count = 0;
for (p = strtok(inputLine, delimiters); p != NULL; p = strtok(NULL, delimiters))
{
arguments[count] = p;
count++;
}
return count;
}
int main()
{
char line[1024];
size_t bufferSize = 1024;
char *args[CHAR_LENGTH];
fgets(line, bufferSize, stdin);
int count = parse(line, args, DELIMS);
for (i = 0; i <= count; i++){
printf("[%s]", args[i]);
}
}
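Because strtok() throws the delimiters away, the punctuation that the question wants to carry over is lost with this approach. As a rough sketch of an alternative (a hand-written scanner, not strtok), the function below keeps each punctuation mark as its own token. The name split_tokens, the MAX_TOKEN_LEN limit and the rule that apostrophes belong to words are all assumptions of mine.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_TOKEN_LEN 40  /* assumed limit, same size as currentWord in the question */

/* Hypothetical helper: splits `line` into tokens. A run of letters, digits
   or apostrophes is one word token; every other non-space character
   becomes its own one-character token. Returns the number of tokens. */
size_t split_tokens(const char *line, char tokens[][MAX_TOKEN_LEN], size_t max_tokens)
{
    size_t count = 0;
    const char *p = line;

    while (*p != '\0' && count < max_tokens) {
        unsigned char c = (unsigned char)*p;
        if (isspace(c)) {
            p++;                       /* white space only separates tokens */
        } else if (isalnum(c) || c == '\'') {
            size_t len = 0;
            while (isalnum((unsigned char)*p) || *p == '\'') {
                if (len < MAX_TOKEN_LEN - 1) {
                    tokens[count][len++] = *p;   /* overlong words are truncated */
                }
                p++;
            }
            tokens[count++][len] = '\0';
        } else {
            tokens[count][0] = *p++;   /* punctuation is kept as its own token */
            tokens[count++][1] = '\0';
        }
    }
    return count;
}
```

Each word token can then be looked up in the dictionary, while punctuation tokens are written to the output unchanged.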

Related

strtok string from file to array but missing first line

I'm trying to read a .txt file and save all sentences end with .!? into array. I use getline and strtok to do this. When I save the sentences, it seems work. But when I try to retrieve data later through index, the first line is missing.
The input is in a file input.txt with content below
The wandering earth! In 2058, the aging Sun? is about to turn into a red .giant and threatens to engulf the Earth's orbit!
Below is my code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main() {
FILE *fp = fopen("input.txt", "r+");
char *line = NULL;
size_t len = 0;
char *sentences[100];
if (fp == NULL) {
perror("Cannot open file!");
exit(1);
}
char delimit[] = ".!?";
int i = 0;
while (getline(&line, &len, fp) != -1) {
char *p = strtok(line, delimit);
while (p != NULL) {
sentences[i] = p;
printf("sentences [%d]=%s\n", i, sentences[i]);
i++;
p = strtok(NULL, delimit);
}
}
for (int k = 0; k < i; k++) {
printf("sentence is ----%s\n", sentences[k]);
}
return 0;
}
output is
sentences [0]=The wandering earth
sentences [1]= In 2058, the aging Sun
sentences [2]= is about to turn into a red
sentences [3]=giant and threatens to engulf the Earth's orbit
sentence is ----
sentence is ---- In 2058, the aging Sun
sentence is ---- is about to turn into a red
sentence is ----giant and threatens to engulf the Earth's orbit
I use strtok to split string directly. It worked fine.

Change mode from "r+" to "r".
Changed the list of delimiters from a variable to a constant DELIMITERS and added '\n'. You may or may not want that '\n' in there, but I would need to see the expected output now that you have supplied the input. vim, at least, ends the last line with a '\n', which would generate at least one '\n' token at the end. The other option is to remove leading and trailing white space, and if you end up with an empty string then don't add it as a sentence.
Introduced a constant for number of sentences, and ignore additional sentences beyond what we have space for.
Combined the two strtok() calls (DRY).
Eliminated the two memory leaks.
If your input contains multiple lines, the contents of line will be overwritten. This means the pointers in sentences no longer make sense. The easiest fix is to strdup() each string. Another approach would be to retain an array of line pointers (for subsequent free()) and have getline() allocate a new line each time by resetting line = NULL and len = 0.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define DELIMITERS ".!?\n"
#define SENTENCES_LEN 100
int main() {
FILE *fp = fopen("input.txt", "r");
if (!fp) {
perror("Cannot open file!");
return 1;
}
char *line = NULL;
size_t len = 0;
char *sentences[SENTENCES_LEN];
int i = 0;
while (getline(&line, &len, fp) != -1) {
char *s = line;
for(; i < SENTENCES_LEN; i++) {
char *sentence = strtok(s, DELIMITERS);
if(!sentence)
break;
sentences[i] = strdup(sentence);
printf("sentences [%d]=%s\n", i, sentences[i]);
s = NULL;
}
}
for (int k = 0; k < i; k++) {
printf("sentence is ----%s\n", sentences[k]);
free(sentences[k]);
}
free(line);
fclose(fp);
}
Using the supplied input file, the matching output is:
sentences [0]=The wandering earth
sentences [1]= In 2058, the aging Sun
sentences [2]= is about to turn into a red
sentences [3]=giant and threatens to engulf the Earth's orbit
sentence is ----The wandering earth
sentence is ---- In 2058, the aging Sun
sentence is ---- is about to turn into a red
sentence is ----giant and threatens to engulf the Earth's orbit
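The other option mentioned above, trimming white space and skipping sentences that end up empty, could look roughly like this. trim_in_place is a name I made up for illustration, and the sketch is untested against your real input.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the first non-space character
   of `s` and cuts off trailing white space in place. The result points
   into the original buffer, so it must not be passed to free(). */
char *trim_in_place(char *s)
{
    while (isspace((unsigned char)*s)) {
        s++;
    }
    size_t len = strlen(s);
    while (len > 0 && isspace((unsigned char)s[len - 1])) {
        s[--len] = '\0';
    }
    return s;
}
```

In the loop you would call it on the token before the strdup(), and skip the token when the first character of the result is '\0'.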

How to read specific words from a file?

I have a file that contains words and their synonyms each on a separate line.
I am writing this code that should read the file line by line then display it starting from the second word which is the synonym.
I used the variable count in the first loop in order to be able to count the number of synonyms of each word because the number of synonyms differs from one to another. Moreover I used the condition synonyms[i]==',' because each synonym is separate by a comma.
The purpose of me writing such code is to put them in a binary search tree in order to have a full dictionary.
The code doesn't contain any error yet it is not working.
I have tried to each the loop but that didn't work too.
Sample input from the file:
abruptly - dead, short, suddenly
acquittance - release
adder - common, vipera
Sample expected output:
dead short suddenly
acquittance release
common vipera
Here is the code:
void LoadFile(FILE *fp){
int count;
int i;
char synonyms[50];
char word[50];
while(fgets(synonyms,50,fp)!=NULL){
for (i=0;i<strlen(synonyms);i++)
if (synonyms[i]==',' || synonyms[i]=='\n')
count++;
}
while(fscanf(fp,"%s",word)==1){
for(i=1;i<strlen(synonyms);i++){
( fscanf(fp,"%s",synonyms)==1);
printf("%s",synonyms);
}
}
}
int main(){
char fn[]="C:/Users/CLICK ONCE/Desktop/Semester 4/i2206/Project/Synonyms.txt";
FILE *fp;
fp=fopen(fn,"rt");
if (fp==NULL){
printf("Cannot open this file");
}
else{
LoadFile(fp);
}
return 0;
}

Here is my solution. I have split the work into functions for readability. The actual parsing is done in the parse function. That function takes into account hyphenated compound words such as seventy-two. The word and its synonyms must be separated by a hyphen preceded by at least one space.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
// Trim leading and trailing space characters.
// Warning: string is modified
char* trim(char* s) {
char* p = s;
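size_t l = strlen(p);
// Stop at l == 0 so an empty or all-space string never reads p[-1]
while (l > 0 && isspace((unsigned char)p[l - 1])) p[--l] = 0;
while (*p && isspace((unsigned char)*p)) ++p, --l;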
memmove(s, p, l + 1);
return s;
}
// Warning: string is modified
int parse(char* line)
{
char* token;
char* p;
char* word;
if (line == NULL) {
printf("Missing input line\n");
return 0;
}
// First find the word delimiter: a hyphen preceded by a space
p = line;
while (1) {
p = strchr(p, '-');
if (p == NULL) {
printf("Missing hyphen\n");
return 0;
}
if ((p > line) && (p[-1] == ' ')) {
// We found a hyphen preceded by a space
*p = 0; // Replace by nul character (end of string)
break;
}
p++; // Skip hyphen inside hyphenated word
}
word = trim(line);
printf("%s ", word);
// Next find synonyms delimited by a comma
char delim[] = ", ";
token = strtok(p + 1, delim);
while (token != NULL) {
printf("%s ", token);
token = strtok(NULL, delim);
}
printf("\n");
return 1;
}
int LoadFile(FILE* fp)
{
if (fp == NULL) {
printf("File not open\n");
return 0;
}
int ret = 1;
char str[1024]; // Longest allowed line
while (fgets(str, sizeof(str), fp) != NULL) {
str[strcspn(str, "\r\n")] = 0; // Remove ending \n
ret &= parse(str);
}
return ret;
}
int main(int argc, char *argv[])
{
FILE* fp;
char* fn = "Synonyms.txt";
fp = fopen(fn, "rt");
if (fp == NULL) {
perror(fn);
return 1;
}
int ret = LoadFile(fp);
fclose(fp);
return ret;
}

I think the biggest conceptual misunderstanding demonstrated in the code is a failure to understand how fgets and fscanf work.
Consider the following lines of code:
while(fgets(synonyms,50,fp)!=NULL){
...
while(fscanf(fp,"%49s",word)==1){
for(i=1;i<strlen(synonyms);i++){
fscanf(fp,"%49s",synonyms);
printf("%s",synonyms);
}
}
}
The fgets reads one line of the input. (If an input line is longer than 49 characters, newline included, fgets reads only the first 49 characters; the code should check for that condition and handle it.) The next fscanf then reads a word from the next line of input, so the first line is effectively discarded! If the input is formatted as expected, the second fscanf reads a single - into synonyms. This makes strlen(synonyms) evaluate to 1, so the for loop terminates. The while/fscanf loop then reads another word, and since synonyms still contains a string of length 1, the for loop is never entered. The while/fscanf loop then proceeds to read the rest of the file. The next call to fgets returns NULL (since the fscanf loop has read to the end of the file), so the while/fgets loop terminates after 1 iteration.
I believe the intention was for the scanfs inside the while/fgets to operate on the line read by fgets. To do that, all the fscanf calls should be replaced by sscanf.
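As a rough illustration of what "use sscanf on the line" could mean, here is a sketch that takes one line already read by fgets and collects its synonyms. The function name extract_synonyms and the output buffer are my own assumptions. It also assumes there is no white space before a comma, and that no synonym is longer than 49 characters (a longer one would be split in two).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: for a line such as "abruptly - dead, short, suddenly",
   writes the synonyms separated by single spaces into `out`.
   Returns the number of synonyms found, or -1 if the line is malformed. */
int extract_synonyms(const char *line, char *out, size_t outsz)
{
    char word[50];
    char synonym[50];
    int consumed = 0;
    int count = 0;

    if (outsz == 0) return -1;
    out[0] = '\0';

    /* Read the headword and the " - " separator. %n records how far we got
       and is only reached if the hyphen was actually matched. */
    if (sscanf(line, "%49s - %n", word, &consumed) != 1 || consumed == 0) {
        return -1;
    }

    const char *p = line + consumed;
    int n = 0;
    /* Each synonym is everything up to the next comma or newline. */
    while (sscanf(p, " %49[^,\n]%n", synonym, &n) == 1) {
        if (strlen(out) + strlen(synonym) + 2 > outsz) break;  /* out is full */
        if (count > 0) strcat(out, " ");
        strcat(out, synonym);
        count++;
        p += n;
        if (*p == ',') p++;   /* step over the separator */
    }
    return count;
}
```

Inside the while/fgets loop you would call this once per line and print out, so nothing reads from the file behind fgets's back.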

C Reading a file of digits separated by commas

I am trying to read in a file that contains digits separated by commas and store them in an array without the commas present.
For example: processes.txt contains
0,1,3
1,0,5
2,9,8
3,10,6
And an array called numbers should look like:
0 1 3 1 0 5 2 9 8 3 10 6
The code I had so far is:
FILE *fp1;
char c; //declaration of characters
fp1=fopen(argv[1],"r"); //opening the file
int list[300];
c=fgetc(fp1); //taking character from fp1 pointer or file
int i=0,number,num=0;
while(c!=EOF){ //iterate until end of file
if (isdigit(c)){ //if it is digit
sscanf(&c,"%d",&number); //changing character to number (c)
num=(num*10)+number;
}
else if (c==',' || c=='\n') { //if it is new line or ,then it will store the number in list
list[i]=num;
num=0;
i++;
}
c=fgetc(fp1);
}
But this is having problems if it is a double digit. Does anyone have a better solution? Thank you!

For the data shown with no space before the commas, you could simply use:
while (fscanf(fp1, "%d,", &num) == 1 && i < 300)
list[i++] = num;
This will read the comma after the number if there is one, silently ignoring when there isn't one. If there might be white space before the commas in the data, add a blank before the comma in the format string. The test on i prevents you writing outside the bounds of the list array. The ++ operator comes into its own here.
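Wrapped in a function, with the blank before the comma already in the format string, the idea could be sketched like this. The name read_numbers and the cap parameter are my own additions. The bound is tested before the fscanf call so that no value is read and then discarded once the array is full.

```c
#include <stdio.h>
#include <stddef.h>

/* Reads comma- or newline-separated integers from `fp` into `list`,
   storing at most `cap` values. Returns how many were stored. */
size_t read_numbers(FILE *fp, int list[], size_t cap)
{
    size_t i = 0;
    int num;

    /* "%d" skips leading white space (including newlines); the blank
       skips white space after the number; "," eats one comma if present. */
    while (i < cap && fscanf(fp, "%d ,", &num) == 1) {
        list[i++] = num;
    }
    return i;
}
```

For the sample file in the question this should store the twelve values 0 1 3 1 0 5 2 9 8 3 10 6.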

First, fgetc returns an int, so c needs to be an int.
Other than that, I would use a slightly different approach. I admit that it is slightly overcomplicated. However, this approach may be usable if you have several different types of fields that require different actions, like a parser. For your specific problem, I recommend Jonathan Leffler's answer.
int c=fgetc(f);
while(c!=EOF && i<300) {
if(isdigit(c)) {
fseek(f, -1, SEEK_CUR);
if(fscanf(f, "%d", &list[i++]) != 1) {
// Handle error
}
}
c=fgetc(f);
}
Here I don't care about commas and newlines. I take ANYTHING other than a digit as a separator. What I do is basically this:
read next byte
if byte is digit:
back one byte in the file
read number, regardless of length
else continue
The added condition i<300 is for security reasons. If you really want to check that nothing else than commas and newlines (I did not get the impression that you found that important) you could easily just add an else if (c == ... to handle the error.
Note that you should always check the return value for functions like sscanf, fscanf, scanf etc. Actually, you should also do that for fseek. In this situation it's not as important since this code is very unlikely to fail for that reason, so I left it out for readability. But in production code you SHOULD check it.
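For what it's worth, here is a sketch of the same loop with the return values checked. Note that it swaps fseek for ungetc(), which is guaranteed to push back one character, so this is a variation on the code above and not the same code. The function name is made up, and since a minus sign counts as a separator here, negative numbers lose their sign.

```c
#include <ctype.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: treats every non-digit byte as a separator and
   stores the non-negative integers found in `f` into `list` (at most `cap`). */
size_t read_unsigned_fields(FILE *f, int list[], size_t cap)
{
    size_t i = 0;
    int c;  /* int, not char, so EOF can be told apart from data */

    while (i < cap && (c = fgetc(f)) != EOF) {
        if (!isdigit(c)) {
            continue;               /* anything else is a separator */
        }
        if (ungetc(c, f) == EOF) {  /* put the digit back for fscanf */
            break;
        }
        if (fscanf(f, "%d", &list[i]) != 1) {
            break;                  /* conversion failed: stop reading */
        }
        i++;
    }
    return i;
}
```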

My solution is to read the whole line first and then parse it with strtok_r with comma as a delimiter. If you want portable code you should use strtok instead.
A naive implementation of readline would be something like this:
static char *readline(FILE *file)
{
char *line = malloc(sizeof(char));
int index = 0;
int c = fgetc(file);
if (c == EOF) {
free(line);
return NULL;
}
while (c != EOF && c != '\n') {
line[index++] = c;
char *l = realloc(line, (index + 1) * sizeof(char));
if (l == NULL) {
free(line);
return NULL;
}
line = l;
c = fgetc(file);
}
line[index] = '\0';
return line;
}
Then you just need to parse the whole line with strtok_r, so you would end with something like this:
int main(int argc, char **argv)
{
FILE *file = fopen(argv[1], "re");
int list[300];
if (file == NULL) {
return 1;
}
char *line;
int numc = 0;
while((line = readline(file)) != NULL) {
char *saveptr;
// Get the first token
char *tok = strtok_r(line, ",", &saveptr);
// Now start parsing the whole line
while (tok != NULL) {
// Convert the token to a long if possible
errno = 0; // strtol() never clears errno, so reset it first (needs <errno.h>)
long num = strtol(tok, NULL, 0);
if (errno != 0) {
// Handle no value conversion
// ...
// ...
}
list[numc++] = (int) num;
// Get next token
tok = strtok_r(NULL, ",", &saveptr);
}
free(line);
}
fclose(file);
return 0;
}
And for printing the whole list just use a for loop:
for (int i = 0; i < numc; i++) {
printf("%d ", list[i]);
}
printf("\n");
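If you want the conversion step to be stricter, a sketch along these lines checks the end pointer as well as errno. parse_int is a hypothetical helper of mine, and it deliberately uses base 10 instead of base 0, so that a field like 010 is read as ten and not as octal eight.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts `s` to an int. Returns 1 on success and
   0 when `s` has no digits, has trailing characters, or is out of range. */
int parse_int(const char *s, int *out)
{
    char *end;

    errno = 0;                      /* strtol only sets errno, never clears it */
    long value = strtol(s, &end, 10);

    if (end == s) return 0;         /* no digits were consumed */
    if (*end != '\0') return 0;     /* trailing characters after the number */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) return 0;

    *out = (int)value;
    return 1;
}
```

In the token loop you would store the value only when parse_int returns 1, and also stop once numc reaches the size of list.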

Printf() prints string arguments out of order

I have some C-code that reads in a text file line by line, hashes the strings in each line, and keeps a running count of the string with the biggest hash values.
It seems to be doing the right thing but when I issue the print statement:
printf("Found Bigger Hash:%s\tSize:%d\n", textFile.biggestHash, textFile.maxASCIIHash);
my print returns this in the output:
Preprocessing: dict1
Found BiSize:110h:a
Found BiSize:857h:aardvark
Found BiSize:861h:aardwolf
Found BiSize:937h:abandoned
Found BiSize:951h:abandoner
Found BiSize:1172:abandonment
Found BiSize:1283:abbreviation
Found BiSize:1364:abiogenetical
Found BiSize:1593:abiogenetically
Found BiSize:1716:absentmindedness
Found BiSize:1726:acanthopterygian
Found BiSize:1826:accommodativeness
Found BiSize:1932:adenocarcinomatous
Found BiSize:2162:adrenocorticotrophic
Found BiSize:2173:chemoautotrophically
Found BiSize:2224:counterrevolutionary
Found BiSize:2228:counterrevolutionist
Found BiSize:2258:dendrochronologically
Found BiSize:2440:electroencephalographic
Found BiSize:4893:pneumonoultramicroscopicsilicovolcanoconiosis
Biggest Size:46umonoultTotal Words:71885covolcanoconiosis
So it seems I'm misusing printf(). Below is the code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define WORD_LENGTH 100 // Max number of characters per word
// data1 struct carries information about the dictionary file; preprocess() initializes it
struct data1
{
int numRows;
int maxWordSize;
char* biggestWord;
int maxASCIIHash;
char* biggestHash;
};
int asciiHash(char* wordToHash);
struct data1 preprocess(char* fileName);
int main(int argc, char* argv[]){
//Diagnostics Purposes; Not used for algorithm
printf("Preprocessing: %s\n",argv[1]);
struct data1 file = preprocess(argv[1]);
printf("Biggest Word:%s\t Size:%d\tTotal Words:%d\n", file.biggestWord, file.maxWordSize, file.numRows);
//printf("Biggest hashed word (by ASCII sum):%s\tSize: %d\n", file.biggestHash, file.maxASCIIHash);
//printf("**%s**", file.biggestHash);
return 0;
}
int asciiHash(char* word)
{
int runningSum = 0;
int i;
for(i=0; i<strlen(word); i++)
{
runningSum += *(word+i);
}
return runningSum;
}
struct data1 preprocess(char* fName)
{
static struct data1 textFile = {.numRows = 0, .maxWordSize = 0, .maxASCIIHash = 0};
textFile.biggestWord = (char*) malloc(WORD_LENGTH*sizeof(char));
textFile.biggestHash = (char*) malloc(WORD_LENGTH*sizeof(char));
char* str = (char*) malloc(WORD_LENGTH*sizeof(char));
FILE* fp = fopen(fName, "r");
while( strtok(fgets(str, WORD_LENGTH, fp), "\n") != NULL)
{
// If found a larger hash
int hashed = asciiHash(str);
if(hashed > textFile.maxASCIIHash)
{
textFile.maxASCIIHash = hashed; // Update max hash size found
strcpy(textFile.biggestHash, str); // Update biggest hash string
printf("Found Bigger Hash:%s\tSize:%d\n", textFile.biggestHash, textFile.maxASCIIHash);
}
// If found a larger word
if( strlen(str) > textFile.maxWordSize)
{
textFile.maxWordSize = strlen(str); // Update biggest word size
strcpy(textFile.biggestWord, str); // Update biggest word
}
textFile.numRows++;
}
fclose(fp);
free(str);
return textFile;
}

You forget to remove the \r after reading. This is in your input because (1) your source file comes from a Windows machine (or at least one which uses \r\n line endings), and (2) you use the fopen mode "r", which does not translate line endings on your OS (again, presumably Windows).
This results in the weird output as follows:
Found Bigger Hash:text\r\tSize:123
– see the position of the \r? So what happens when outputting this string, you get at first
Found Bigger Hash:text
and then the cursor gets repositioned to the start of the line by \r. Next, a tab is output – not by printing spaces but merely moving the cursor to the 8th position:
1234567↓
Found Bigger Hash:text
and the rest of the string is printed over the one already shown:
Found BiSize:123h:text
Possible solutions:
Open your file in "rt" "text" mode, and/or
Check for, and remove, the \r code as well as \n.
I'd go for both. strchr is pretty cheap and will make your code a bit more foolproof.
(Also, please simplify your fgets line by splitting it up into several distinct operations.)
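A minimal sketch of the second option, removing both characters in one go, might be the following. The function name is my own; strcspn() returns the length of the leading part of the string that contains neither \r nor \n, so the string is cut at the first line-ending character, or left alone if there is none.

```c
#include <string.h>

/* Cuts the string at the first '\r' or '\n', whichever comes first. */
char *strip_line_ending(char *s)
{
    s[strcspn(s, "\r\n")] = '\0';
    return s;
}
```

You would call it on str right after a successful fgets() and before hashing, which also lets you drop the strtok() from the loop condition.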

Your statement
while( strtok(fgets(str, WORD_LENGTH, fp), "\n") != NULL)
takes no account of the return value from fgets() or the way strtok() works.
The way to do this is something like
char *fptr, *sptr;
while ((fptr = fgets(str, WORD_LENGTH, fp)) != NULL) {
sptr = strtok(fptr, "\n");
while (sptr != NULL) {
printf ("%s,", sptr);
sptr = strtok (NULL, "\n");
}
printf("\n");
}
Note that after the first call to strtok(), subsequent calls on the same sequence must pass NULL as the first parameter.

How do I count occurrences of a list of strings and output them to a new file?

I have been given three '.txt' files.
The first is a list of words.
The second is a document to search.
The third is a blank document that will have my output written to it.
I'm supposed to take each word in the first file, search the second file and print the number of occurrences in the third file as "wordX = numOccurences."
I've got a good function that will return the wordCount, and it returns it correctly for the first word, but then I get a zero for all the remaining words.
I've tried to dereference everything, and I think I've come to a standstill. There's something wrong with the "pointer talk."
I have yet to start outputting the words to a new file, but that printf statement should be a print to file statement in append mode. Easy enough.
Here is the working wordCount function - it works if I just give it a single word, like "testing," but if I give it an array I want to iterate through, it just returns 0.
int countWord(char* filePath, char* word){ //Not mine. This is a working prototype function from SO, returns word count of particular word
FILE *fp;
int count = 0;
int ch, len;
if(NULL==(fp=fopen(filePath, "r")))
return -1;
len = strlen(word);
for(;;){
int i;
if(EOF==(ch=fgetc(fp))) break;
if((char)ch != *word) continue;
for(i=1;i<len;++i){
if(EOF==(ch = fgetc(fp))) goto end;
if((char)ch != word[i]){
fseek(fp, 1-i, SEEK_CUR);
goto next;
}
}
++count;
next: ;
}
end:
fclose(fp);
return count;
}
This is my part of the program, trying to call the function while the loop gets all the words from the first file. The loop IS grabbing the words, because it prints them, but wordCount isn't accepting anything beyond the first word.
int main(){
FILE *ptr_file;
char words[100];
ptr_file = fopen("searchWords.txt", "r");
if(!ptr_file)
return -1;
while( fgets(words, 100, ptr_file)!=NULL )
{
int wordCount = 0;
char key[100] = &*words;
wordCount = countWord("document.txt", words);
printf("%s = %d\n", words, wordCount);
}
fclose(ptr_file);
return 0;
}

fgets reads \n too. That is the problem. To quote:
A newline character makes fgets stop reading, but it is considered a valid character by the function and included in the string copied to str.
To solve this, change it to:
while( fgets(words, 100, ptr_file)!=NULL )
{
size_t len = strlen(words);
if (len > 0 && words[len-1] == '\n')
words[len-1] = '\0'; // drop the newline only if fgets actually stored one

An immediate problem: fgets doesn't strip end-of-line from the string, so whatever you pass to countWord has an embedded newline.
