Parsing words in C; Translating program - c

I'm developing a program that will translate a string from the user (English) into Spanish.
For the assignment I'm given a file that contains a list of 100 words and their Spanish equivalents. I've successfully opened that file and read it into a two-dimensional character array.
What I'm having difficulty with is parsing the words so that I can find the equivalent of each given word; any words that aren't in the list are supposed to be replaced with asterisks (*). Any ideas on how I can parse the words from the user's input string?
Below are snippets of the source code to save some time.
--Thanks
char readFile[100][25];
fp = fopen("words.dat", "r");
if (fp == NULL){
printf ("File failed to load\n");
}
//This is how I stored the file into the two dimensional string.
while (fgets(readFile, 100, fp)){
x++;
}
printf ("User please input string\n");
gets (input);
That's as far as I've gotten. I commented out the for-loop that outputs the words so I can see the words (for the sake of curiosity) and it was successful. The format of each line in the file is
(english word), (spanish word).

First off, the array you declare is 100 arrays of 25-character arrays. If we talk about "lines", it means you have 100 lines where each line can hold 24 characters (remember we need one extra for the terminating '\0' character). If you want 25 lines of 99 characters each, swap the two sizes.
Secondly, you overwrite the same bytes of the array over and over again. And since each sub-array is actually only 25 characters, a single fgets call with a size of 100 can overwrite up to four of those sub-arrays.
I suggest something like this instead:
size_t count = 0;
for (int i = 0; i < sizeof(readFile) / sizeof(readFile[0]) &&
fgets(readFile[i], sizeof(readFile[i]), fp); i++, count++)
{
}
This will make sure you don't read more than you can store, and automatically reads into the correct "line" in the array. After the loop count will contain the number of lines you read.
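For the parsing part of the question, one possible approach is sketched below. It is only a sketch, and it assumes every line really has the "english, spanish" form described in the question. The helper names split_entry and lookup are hypothetical, and returning a single "*" for an unknown word is just one reading of the asterisk rule.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits "english, spanish\n" in place.
   Returns 1 on success, 0 if the line contains no comma. */
int split_entry(char *line, char **english, char **spanish)
{
    char *comma = strchr(line, ',');

    if (comma == NULL)
        return 0;
    *comma = '\0';                       /* terminate the English word */
    *english = line;
    *spanish = comma + 1;
    *spanish += strspn(*spanish, " ");   /* skip blanks after the comma */
    (*spanish)[strcspn(*spanish, "\r\n")] = '\0';  /* drop the newline fgets kept */
    return 1;
}

/* Hypothetical helper: returns the Spanish word stored for word,
   or "*" when word is not in the table. */
const char *lookup(const char *word, char *const english[],
                   char *const spanish[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(word, english[i]) == 0)
            return spanish[i];
    }
    return "*";
}
```

The input string could then be split with strtok(input, " \t\n") and each token passed to lookup. Reading that input with fgets rather than gets avoids overflowing the buffer.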

Related

Implementation of a c counter

I wrote a C program to count the number of times the word "printf" occurs in a specific file (here "document.c"). "document.c" has multiple lines of code. I started with a while loop to iterate over every line of the file, and then I read the characters of each line inside the for loop, using the function strstr.
It does not print anything with my current code. Moreover, I think there are some other minor issues, because an older version did print, but not correctly: it printed a number much larger than the actual number of "printf" occurrences in the document.
I am also a novice in C. Thank you!
int counter() {
FILE * filePointer;
filePointer = fopen("document.c", "r");
int counter = 0;
char singleLine[200];
while(!feof(filePointer)){
fgets(singleLine, 200, filePointer);
for (int i = 0; i < strlen(singleLine); i++){
if(strstr(singleLine, "printf")){
counter++;
}
}
}
fclose(filePointer);
printf("%d",counter);
return 0;
}
You're iterating over each character in the input line, and then asking if the string "printf" appears anywhere in the line. If the line contains 5 characters, you'll ask this 5 times; if it contains 40 characters, you'll ask this 40 times.
Assuming that you're trying to cover the case where "printf" can appear more than once on the line, look up what strstr() returns, and use that to adjust the starting position of the search in the inner loop (which shouldn't iterate over each character, but should loop while new "hits" are found).
(Note to up-voters: I'm answering the question, but not providing code because I don't want to do their homework for them.)

Reading line by line using fscanf

I want to read a file with 3 lines:
The first one with strings, second with a number and the third with strings again.
Example:
Line 1: bird toy book computer water
Line 2: 2
Line 3: toy water
I have this code, that reads a file, word by word storing them in the word array, and then putting the word into the words 2d array.
char words [5][50];
char word [50];
int i,j;
j = 0;
while( (fscanf(file, "%s", word))!=EOF ){
for(i = 0; i<50; i++){
if(word[i] != NULL){
words[j][i] = word[i];
} else{
break;
}
}
j++;
}
It's working, but it reads all the lines. I want a way to do this process for the first line only, then store the second line in an int variable and the third line in another 2D array.
Read more about fscanf. It is not suitable for reading line by line.
Consider instead reading every line with fgets or, even better (on POSIX), with getline, then parsing each line, perhaps with sscanf. Its return value (the count of scanned items) is useful to test, and you might also want to use %n in the scan control format string. As Jonathan Leffler commented, read also about the %ms assignment-allocation modifier, available at least on POSIX systems; see Linux sscanf(3).
BTW, hard-coding limits like 50 for your word length is bad taste (and not robust). Consider using C dynamic memory allocation (malloc, free and friends) and pointers more systematically, perhaps sometimes with flexible array members in some of your structs.
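As a rough sketch of the fgets-then-sscanf idea, the hypothetical helper below splits one line (already read with fgets) into words using sscanf and %n. The name split_words and the fixed 49-character word limit are assumptions carried over from the question's arrays, not something the answer prescribes.

```c
#include <stdio.h>

#define WORD_LEN 50

/* Hypothetical helper: copies up to max_words whitespace-separated words
   from line into words and returns how many were stored.
   A word longer than 49 characters is split into several pieces. */
int split_words(const char *line, char words[][WORD_LEN], int max_words)
{
    int count = 0;
    int used = 0;   /* characters consumed by the latest sscanf call */

    while (count < max_words &&
           sscanf(line, "%49s%n", words[count], &used) == 1) {
        line += used;   /* continue right after the word just read */
        count++;
    }
    return count;
}
```

With this, the first and third lines of the file could each be read with fgets and passed to split_words, while the second line could be read with fgets and converted with sscanf(line, "%d", &n).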

What happens with extra memory using fscanf?

I'm new to C and I have a couple questions about fscanf. I wrote a simple program that reads the contents of a file and spits it back out on the command line:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char* argv[1])
{
if (argc != 2)
{
printf("Usage: fscanf txt\n");
return 1;
}
char* txt = argv[1];
FILE* fp = fopen(txt, "r");
if (fp == NULL)
{
printf("Could not open %s.\n", txt);
return 2;
}
char s[50];
while (fscanf(fp, "%49s", s) == 1)
printf("%s\n", s);
return 0;
}
Let's say the contents of my text file is just "C is cool.", which will output:
C
is
cool.
So I have two questions here:
1) Does fscanf assume that the placeholder "%s" will be a single word (an array of chars only)? According to this program's output, spaces and line breaks seem to prompt the function to return. But what if I wanted to read a whole paragraph? Would I use fread() instead?
2) More importantly I'm wondering what happens with all of the unused space in the array. On the first iteration, I think s[0] = "C" and s[1] = "\0", so are s[2] - s[49] just wasted?
EDIT: while (fscanf(fp, "%49s", s) == 1) - thanks to @M Oehm for pointing this out - enforcing a strong limit here to prevent dangerous buffer overflows
1) Does fscanf assume that the placeholder "%s" will be a single word
(an array of chars only)? According to this program's output, spaces
and line breaks seem to prompt the function to return. But what if I
wanted to read a whole paragraph? Would I use fread() instead?
The %s specifier reads single words that are delimited by white space. The scanf family of functions is very crude; these functions do not normally distinguish between line breaks and spaces, for example.
A line is anything up to the next newline. There is no concept of paragraph, but you might consider anything between blank lines a paragraph. The function to read lines of text is fgets, so you could read lines until you find an empty one. (fgets retains the newline at the end, mind.)
fread is a function for reading binary data. It is not useful for reading structured texts. (But it can be used to read the contents of a whole text file at once.)
2) More importantly I'm wondering what happens with all of the unused
space in the array. On the first iteration, I think c[0] = 'C' and
c[1] = '\0', so are c[2] - c[49] just wasted?
You are right, the data after the null terminator isn't used. "Wasted" is too negative, however: with user input you don't know whether you will eventually encounter a longer word. Because dynamic allocation requires some care in C, allocating "enough for most cases" is a good practice. You should enforce the hard limit when reading, though, to prevent buffer overruns:
fscanf(fp, "%49s", s)
The issue of "wasted" memory becomes more serious if you have an array of arrays of 50 chars. Most of the words will be much shorter than 50 chars. Here, the extra memory might eventually hurt you. 48 extra characters for reading a line are okay, though.
(A strategy to save "compact" arrays of chars is to have a running array of chars that is a concatenation of all strings, including their terminators. The word array is then an array of pointers into that master string.)
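A minimal sketch of that compact layout follows. The struct and function names (word_pool, pool_add) and the two size limits are hypothetical choices for illustration, not part of the answer.

```c
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 4096
#define MAX_WORDS 256

/* Hypothetical layout: one shared buffer plus an array of pointers into it. */
struct word_pool {
    char   text[POOL_SIZE];    /* all words back to back, each with its '\0' */
    size_t used;               /* bytes of text already occupied */
    char  *words[MAX_WORDS];   /* words[i] points at the i-th word in text */
    size_t count;              /* number of words stored so far */
};

/* Appends word to the pool; returns 1 on success, 0 if it does not fit. */
int pool_add(struct word_pool *pool, const char *word)
{
    size_t len = strlen(word) + 1;   /* include the terminator */

    if (pool->count == MAX_WORDS || len > POOL_SIZE - pool->used)
        return 0;
    memcpy(pool->text + pool->used, word, len);
    pool->words[pool->count++] = pool->text + pool->used;
    pool->used += len;
    return 1;
}
```

Starting from a zero-initialized pool, adding "C", "is" and "cool." uses 11 bytes of text instead of three 50-char arrays, at the cost of one pointer per word.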
You use the specifier %s, which reads and stores data in the array s until it encounters a space or newline. As soon as it encounters whitespace, fscanf returns.
I think c[0] = "C" and c[1] = "\0", so are c[2] - c[49] just wasted?
Yes, s[0]='C' and s[1]='\0', and you probably can't do much about the array being larger than needed.
If you want complete string "C is cool" stored in array use fgets.
#define len 1000
char s[len];
while(fgets(s,len,fp)!=NULL) {
//your code
}

Searching for strings that are NULL terminated within a file where they are not NULL terminated

I am writing a program that opens two files for reading: the first file contains 20 names which I store in an array of the form Names[0] = John\0. The second file is a large text file that contains many occurrences of each of the 20 names.
I need my program to scan the entirety of the second file and, each time it finds one of the names, increment a variable Count, so that on completion of the program the total number of appearances of all the names in the text is stored in Count.
Here is my loop which searches for and counts the number of name occurrences:
char LineOfText[85];
char *TempName;
while(fgets(LineOfText, sizeof(LineOfText), fpn)){
for(a = 0; a<NumOfNames; a++){
TempName = strstr(LineOfText, Names[a]);
if(TempName != NULL){
Count++;
}
}
}
No matter what I do, this loop doesn't work as I would expect it to, but I have discovered what is wrong (I think!). My problem is that each name in the array is NULL terminated, but when a name appears in the text file it is not NULL terminated, unless it occurs as the last word of a line. Therefore, this while loop is only counting the number of times any of the names appear at the end of a line, rather than the number of appearances of any of the names anywhere in the text file. How can I adjust this loop to combat this problem?
Thank you for any advice in advance.
The issue here is probably your use of fgets, which does not trim the newline from the line it reads.
If you are creating your names array by reading lines with fgets, then all the names will be terminated with a newline character. The lines in the file being read with fgets will also be terminated with a newline character, so the names will only match at the end of the lines.
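If the names were indeed read with fgets (an assumption, since the question does not show that part), one common way to drop the trailing newline is to cut the string at the position strcspn reports. The helper name strip_newline is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: cuts the string at the first '\r' or '\n', if any.
   A string without a line ending is left unchanged. */
void strip_newline(char *s)
{
    s[strcspn(s, "\r\n")] = '\0';
}
```

Calling this on each name right after it is read would let strstr match the name anywhere in a line, not only at the end.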
strstr does not compare the NUL byte which terminates the pattern string, for obvious reasons. If it did, it would only match suffix strings, which would make it a very different function.
Also, you will only find a maximum of one instance of each name in each line. If you think that a name might appear more than once in the same line, you should replace:
TempName = strstr(LineOfText, Names[a]);
if(TempName != NULL){
Count++;
}
with something like:
for (TempName = LineOfText;
(TempName = strstr(TempName, Names[a])) != NULL;
++Count, ++TempName) {
}
For reference, here is the definition of fgets from the C standard (emphasis added):
The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
This is different from gets, which does not retain the new-line character.
I think the NULL termination of the names array is not an issue (See strstr function reference). The strstr function is not going to compare the terminator. You do have the possibility of missing additional names on each line. See my adjustment below for an example of how you could count multiple names on each line.
char LineOfText[85];
char *TempName;
while(fgets(LineOfText, sizeof(LineOfText), fpn)){
for(a = 0; a<NumOfNames; a++){
TempName = strstr(LineOfText, Names[a]);
/* Iterate through line for multiple occurrences of each name */
while(TempName != NULL){
Count++;
/* Get next occurrence of name on line. fgets is going to
leave a newline at the end of the LineOfText string so
unless some of your names contain a newline, it shouldn't
move past the end of the buffer */
TempName = strstr(TempName + 1, Names[a]);
}
}
}

Why does this code not specify which element of the array is accessed?

#include <stdio.h>
#include <stdlib.h>
FILE *fptr;
main()
{
char fileLine[100];
fptr = fopen("C:\\Users\\user\\Desktop\\Summary.h", "r");
if (fptr != 0){
while (!feof(fptr)){
fgets(fileLine, 100, fptr); // << not specified like fileLine[1] ?
if (!feof(fptr)){
puts(fileLine); // The same thing ?
}
}
}
else
{
printf("\nErorr opening file.\n");
}
fclose(fptr);
return 0;
}
What really confuses me here: why is no array element specified, and how does the array hold the lines?
char fileLine[100];
This is not an array of lines, it's an array of characters. One char represents one character (or more precisely one byte). The declaration char fileLine[100] makes it an array of 100 characters. C doesn't have distinct types for strings and for arrays of characters: a string (such as the content of a line) is just an array of characters, with a null byte after the last character.
At each run through the loop, fileLine contains the line that is read by fgets. That string is printed out by puts. Each call to fgets overwrites the line that was previously stored in the string.
Note that since fgets retains the newline character that terminates each line, and puts adds a newline after printing the string, you will get double-spaced output. If a line is more than 99 characters long (strictly speaking, again, more than 99 bytes long), you'll get a line break after each block of 99 characters.
If you wanted to store all the lines, you'd need an array of strings, i.e. an array of arrays of characters.
char fileLines[42][100];
int i = 0;
while (!feof(fptr)) {
fgets(fileLines[i], 100, fptr);
++i;
}
/* i-1 lines have been read, from fileLines[0] to fileLines[i-2] */
The way you're using feof is quite awkward there. feof tells you whether the last attempt to read reached the end of the file, not whether the next attempt to read would reach the end of the file. For example, here, after the last line has been read, feof() is false (because the program doesn't know yet that this is the last line; it has to attempt to read more); then fgets runs again and returns NULL because it couldn't read anything. Nonetheless i is incremented, and after that feof() returns true, which terminates the loop. Thus i ends up being one plus the number of lines read.
While you can fix this here by decrementing i, the way that actually works even in real-life programs — and that also makes more sense — is to test the result of fgets. You know that you've reached the end of the file because fgets is unable to read a line.
char fileLines[42][100];
int i = 0;
while (fgets(fileLines[i], 100, fptr)) {
++i;
}
/* i lines have been read, from fileLines[0] to fileLines[i-1] */
(This is a toy example, real-life code would need dynamic memory management and error checks for long lines, too many lines, and read errors.)
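As a hedged sketch of two of those checks (too many lines and overly long lines), the hypothetical helper read_lines below stops when the array is full and reports a line that did not fit. The name, the LINE_LEN constant and the -1 return convention are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

#define LINE_LEN 100

/* Hypothetical helper: reads at most max_lines lines from fp into lines.
   Returns the number of lines stored, or -1 if a line filled the buffer
   without a newline (it was cut short, or it is a final 99-byte line
   that has no newline). */
int read_lines(FILE *fp, char lines[][LINE_LEN], int max_lines)
{
    int i = 0;

    while (i < max_lines && fgets(lines[i], LINE_LEN, fp) != NULL) {
        size_t len = strlen(lines[i]);

        if (len == LINE_LEN - 1 && lines[i][len - 1] != '\n')
            return -1;
        i++;
    }
    return i;
}
```

The sketch does not distinguish end of file from a read error; a caller that cares could check ferror(fp) after the function returns.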
The array of characters that is fileLine is treated as a string.
