fgets storing unknown data from text file - c

I was creating a word guessing game that reads a text file line by line until it reaches a randomly chosen word and stores it in a string (word). Then the user enters letters until all the letters of the stored word are revealed.
So far it works perfectly, but whenever the word read is the very first one in the file, some unknown characters get stored at the beginning of char word[20].
Note that I use a C mobile app, which I think uses the clang 6.0 compiler. So does the error come from my code or from their app (which I love)?
Here is the FULL, clearer code:
//guess the right word
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <stdlib.h>
#define TRUE 1
#define FALSE 0
#define NB_OF_WORDS 3 //number of words in text file
int main(void) {
    int begin; //int, not char, so it can hold every value getchar() returns
    char word[20] = { 0 }, guessedletter;
    int num, rightletter, success;
    int i = 0;
    int show[20] = { 0 };//shown letters
    FILE *ressource = NULL;
    srand(time(NULL));
    if ((ressource = fopen("ressource.txt", "r")) == NULL) {
        fprintf(stderr, "Error ressource.txt");//open in read only mode
        exit(EXIT_FAILURE);
    }
    printf("Welcome to Word Guess, a word guessing contest.\n"
           "You have to find the letters of a word and guess what word it is.\n"
           "Begin? (y/n)\n");
    while ((begin = getchar()) == 'y') { //game loop
        /*reinitializations*/
        fseek(ressource, 3, SEEK_SET);//skips the first 3 bytes of the file; replaces rewind(ressource)
        success = FALSE;
        rightletter = 0;
        num = 0;
        num = rand() % NB_OF_WORDS; //random number between 0 and NB-1
        for (i = 0; i < 20; i++) {
            show[i] = FALSE;
            word[i] = 0;
        }
        i = 0;
        /*end of reinitializations*/
        while (i <= num) {//reads line by line until random word is stored
            if (fgets(word, 20, ressource) == NULL) {
                fprintf(stderr, "fgets did not work");
                exit(EXIT_FAILURE);
            }
            i++;
        }
        printf("%s", word);//for testing only: checks that the word was read correctly, which isn't the case if num=0
        for (i = 0; i < 20; i++)
            if (word[i] == '\n')
                word[i] = '\0';//adds zero character to show where the string ends
        while (!success) { //guessing loop
            printf("\nWrite a letter: ");
            scanf("%c", &guessedletter);
            printf("\n");
            for (i = 0; word[i] != '\0'; i++) { //compares entered letter to letter from string. If they match...
                if (word[i] == guessedletter) {
                    if (!show[i])
                        rightletter++;//keeps a letter entered twice from increasing the count
                    show[i] = TRUE;
                }
                if (show[i])
                    printf("%c", word[i]);//...a letter is revealed...
                else
                    printf("*");//else it stays hidden
            }
            if (rightletter == strlen(word))
                success = TRUE;//if all the right letters are found (as many letters found as letters in the word), you win
        }
        printf("\nCongratulations you have won!\nDo you want to replay? (y/n)");
        getchar();//clears newline character
    }
    fclose(ressource);
    return 0;
}
When num=0, I get a weird sign before the printed word, as if the first characters of the text file were not supposed to be there...
Moreover, during the guessing game, if the word to guess (word[20]) is, say, "annoying" and it is the first word in "ressource.txt" (num=0), the word prints like this on the screen once I have guessed all the letters: ***annoying. This does not happen with ANY other word from the list.
I'm new to this site and am posting from my phone, so sorry for any mistakes.
EDIT: replaced fgetc with fgets. I still get several unknown characters when fgets reads the first line.
EDIT 2: added the whole code, translated mot[20] into word[20], added error testing.
EDIT 3: Replacing rewind(ressource); with fseek(ressource, 3, SEEK_SET); solved the problem, which means there are indeed three unknown bytes at the beginning of the text file.

There are multiple problems in the posted code:
You only posted a fragment of code; the rest of the function could cause problems that cannot be analysed from what you posted. Please post the complete code of a minimal program exhibiting the problem.
You do not test for end of file in the while loop. If you try to skip more lines than are present in the file, this loop will run indefinitely.
You do not test the return value of fgets(): if by chance you skipped the full contents of the file in the preceding while loop, fgets() will fail and return NULL, leaving mot in an undetermined state, causing unexpected behavior in the subsequent printf.
EDIT: the modified code still does not check the return value of fgets().
EDIT: thank you for posting the full code, but modifying the question this way makes this answer and the comments irrelevant.
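As a sketch of the two checks described above (stop at end of file, test what fgets() returns), the line-skipping loop can be wrapped in a small helper that fails cleanly when the file is too short. The name read_nth_line and its parameters are hypothetical, not taken from the question. It assumes every line fits in the buffer (a longer line would be counted as several lines, exactly as in the question's loop), and it always restarts at offset 0.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: store line number n (0-based) of fp in dest.
   Returns dest on success, or NULL if the file has fewer than n + 1 lines
   or a read error occurs, so the caller never uses a half-filled buffer. */
char *read_nth_line(FILE *fp, long n, char *dest, int size) {
    rewind(fp);                          /* always start from the first line */
    for (long i = 0; i <= n; i++) {
        if (fgets(dest, size, fp) == NULL)
            return NULL;                 /* end of file or read error */
    }
    dest[strcspn(dest, "\n")] = '\0';    /* drop the trailing newline, if any */
    return dest;
}
```

In the game loop this would take the place of the while (i <= num) loop: if read_nth_line(ressource, num, word, sizeof word) returns NULL, report the error instead of using word.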
Your dictionary file ressource.txt seems to start with a BOM (Byte Order Mark) encoded in UTF-8, that is, the three bytes 0xEF 0xBB 0xBF, which is why skipping 3 bytes with fseek() fixes the problem. These bytes never match a guessed letter, so they stay hidden and are displayed as *** in front of the first word. I am guessing it contains French words, including some with accented letters such as reculées and encodées... Your text editor saved it encoded in UTF-8 with an extra code point at the beginning so that other programs can easily determine this encoding. The problem with that is that your program does not handle UTF-8: it assumes each byte it reads represents a single character. It might by chance match some words with accented letters, but will more likely fail to find them.
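A sketch of a more robust alternative to the hard-coded fseek(ressource, 3, SEEK_SET): check whether the first three bytes really are the UTF-8 BOM and skip them only in that case, so a file saved without a BOM loses no characters. skip_utf8_bom is a hypothetical name, and opening the file in binary mode ("rb") is an assumption that keeps byte offsets exact on every platform. This only removes the BOM; it does not make the game handle multi-byte accented letters.

```c
#include <stdio.h>

/* Position fp just after a UTF-8 byte order mark (EF BB BF) if the file
   starts with one; otherwise go back to the very beginning.
   Returns 1 if a BOM was skipped, 0 if not.
   Assumes fp was opened in binary mode ("rb"). */
int skip_utf8_bom(FILE *fp) {
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    unsigned char buf[3];

    rewind(fp);
    if (fread(buf, 1, 3, fp) == 3 &&
        buf[0] == bom[0] && buf[1] == bom[1] && buf[2] == bom[2])
        return 1;          /* already positioned after the BOM */
    rewind(fp);            /* no BOM: restart at offset 0 */
    return 0;
}
```

In the question's game loop, calling skip_utf8_bom(ressource) would stand in for the fseek() at the top of each round.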

Related

Stopping user input using the enter key in C

I'm trying to write a program that receives strings using fgets, but for some reason I can't get it to go past the user input stage. The input should stop once the user enters a "blank line", i.e. just the Enter key (\n), but even when this key is pressed the loop continues.
Here's the problematic part of my code:
char array[100][256];
for (int i = 0; array[i] != '\n'; i++)
{
    fgets(array[i], 256, stdin);
}
100 and 256 represent the maximum number of lines and the maximum number of characters per line expected, respectively.
Does anyone know where I went wrong?
Here is your code fixed with minimal changes, with explanations in comments. Note that this is not a very good way to solve your problem: long lines, for example, may not behave as you want (they will be split across several array rows).
#include <stdio.h>
#include <string.h>

int main(void)
{
    char array[100][256];
    memset(array, 0, sizeof array); // initialize the memory
    int i = 0;
    while (i < 100) // avoid overflow of lines; also, while may be clearer than a for loop
    {
        if (!fgets(array[i], 256, stdin)) break; // detect read failure
        if (array[i][0] == '\n') break; // got empty line
        // Note [0] above to test first char of line i
        ++i;
    }
    if (i == 100) { /* too many lines */ }
    else if (array[i][0] == 0) { /* read failure */ }
    else { /* indexes 0...i-1 contain data, index i contains empty line */ }
    return 0;
}
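If the splitting of long lines mentioned above is a problem, one possible variation (a sketch, not the answer's code) strips the newline from each stored line and discards the rest of an over-long line, so each row of the array holds at most the start of one input line. read_until_blank, MAX_LINES and LINE_LEN are illustrative names.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINES 100
#define LINE_LEN 256

/* Read lines from `in` into `lines` until an empty line, end of file,
   or MAX_LINES lines. The newline is removed from each stored line, and the
   rest of an over-long line is discarded instead of spilling into the next
   row. Returns the number of non-empty lines stored. */
int read_until_blank(FILE *in, char lines[MAX_LINES][LINE_LEN]) {
    int count = 0;
    while (count < MAX_LINES && fgets(lines[count], LINE_LEN, in) != NULL) {
        char *nl = strchr(lines[count], '\n');
        if (nl != NULL) {
            *nl = '\0';                  /* complete line: strip the '\n' */
        } else {
            int c;                       /* too long, or last line without '\n' */
            while ((c = getc(in)) != EOF && c != '\n')
                ;                        /* discard the remainder of the line */
        }
        if (lines[count][0] == '\0')
            break;                       /* a blank line ends the input */
        count++;
    }
    return count;
}
```

Calling read_until_blank(stdin, array) with a char array[MAX_LINES][LINE_LEN] returns how many lines were read before the blank line.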

Is there a way to read a filestream until a period (.) is found. Then repeat?

I'm fairly new to C and not sure how I would do this. I've found similar questions, but nothing exactly like I want.
What I want to do is read a raw txt file "sentence by sentence", with the end of a sentence being a period (.) or a newline (\n), without assuming a maximum length for any data structure.
My first thought was getline(), but the version of C I'm required to use does not seem to have such a function. So I've tried to use fgets() and then parse the data with sscanf() and a scanset: sscanf(charLine, "%[^.]s", sentence);
The problem with this is that if there is more than one period (.), it stops at the first one and does not start again after that period to collect the others.
I feel like I'm on the right track but just don't know how to expand on this.
while (fgets(charLine, size, readFile) != NULL)
{
    sscanf(charLine, "%[^.]s", sentence);
    // something here...
}
You can write a function that reads the stream until a . or a newline is found. David C. Rankin suggested that just scanning for a . might be too restrictive, causing the embedded periods in www.google.com to act as sentence breaks. One can stop on a . only if it is followed by white space:
#include <ctype.h>
#include <stdio.h>
/* alternative to fgets to stop at `.` and newline */
char *fgetsentence(char *dest, size_t size, FILE *fp) {
    size_t i = 0;
    while (i + 2 < size) {
        int c = getc(fp);
        if (c == EOF)
            break;
        dest[i++] = (char)c;
        if (c == '\n')
            break;
        if (c == '.') {
            int d = getc(fp);
            if (d == EOF)
                break;
            if (isspace(d)) {
                dest[i++] = (char)d;
                break;
            }
            ungetc(d, fp);
        }
    }
    if (i == 0)
        return NULL;
    dest[i] = '\0';
    return dest;
}
If you want to handle arbitrarily long sentences, you would take pointers to dest and size and reallocate the array as required.
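A minimal sketch of that reallocating variant, assuming the caller starts with a NULL buffer and a size of 0 and frees the buffer when done. It keeps the same stopping rules as fgetsentence; the name fgetsentence_alloc and the initial 64-byte allocation are arbitrary choices, not part of the answer.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Variant of fgetsentence that grows *dest with realloc, so sentences of
   any length fit. *dest may start as NULL with *size 0. Returns *dest, or
   NULL at end of file (or if memory runs out) when nothing was stored. */
char *fgetsentence_alloc(char **dest, size_t *size, FILE *fp) {
    size_t i = 0;
    for (;;) {
        /* keep room for up to two more characters plus the '\0' */
        if (*size < i + 3) {
            size_t new_size = *size < 64 ? 64 : *size * 2;
            char *p = realloc(*dest, new_size);
            if (p == NULL)
                return NULL;             /* *dest is still valid; caller frees it */
            *dest = p;
            *size = new_size;
        }
        int c = getc(fp);
        if (c == EOF)
            break;
        (*dest)[i++] = (char)c;
        if (c == '\n')
            break;
        if (c == '.') {
            int d = getc(fp);
            if (d == EOF)
                break;
            if (isspace(d)) {            /* '.' followed by white space ends it */
                (*dest)[i++] = (char)d;
                break;
            }
            ungetc(d, fp);               /* embedded '.', keep reading */
        }
    }
    if (i == 0)
        return NULL;
    (*dest)[i] = '\0';
    return *dest;
}
```

A typical loop is char *s = NULL; size_t n = 0; while (fgetsentence_alloc(&s, &n, fp) != NULL) { /* use s */ } free(s);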
Note that it would be very impractical to use fscanf(fp, "%[^.\n]", dest), because the maximum number of bytes to store into dest cannot be passed as an evaluated argument (it has to be written into the format string itself), and one would need to special-case empty lines and empty sentences.
Note too that stopping on ., even with the above restriction that it must be followed by white space still causes false positives: sentences can contain embedded periods followed by white space that are not the end of the sentence. Example: Thanks to David C. Rankin for his comments on my answer.

Check first six characters in csv file for a specific sequence

I'm trying to write a program (obviously new to this) that checks an input file to ensure that it adheres to the specified format. The input file is supposed to be in csv format with the first two rows beginning with the designations "Class" and "Subject", respectively.
I know checking for the "Subject" designation at the beginning of the second line will be a bit more complicated, but I think I can figure out where to start reading/checking for the string by determining the dimensions of the input dataset through calculation of the number of commas and line breaks.
However, I'm getting a little stuck on how to make sure that the first six characters are "Class,".
I was first just trying to figure out how to scan for a comma within the first six characters (see below), and I got stuck there.
// search the first six characters for "Class,"
FILE *input;
int character;
int comma_check = 0;
int i = input[i];
while (i < 6)
{
    character = fgetc(input);
    if (character == ',')
    {
        comma_check++;
    }
}
if (comma_check == 0)
{
    printf("Input file is not in the correct format.\n");
    fclose(input);
    return 1;
}
I've read through a number of posts that suggest ways to print to the screen (although I haven't figured out how to print just a specified number of characters rather than everything through the end of the file), but I don't know how I would go about saving those first six characters into a string that I could then compare to "Class,". Any help or suggestions would be greatly appreciated. Thank you in advance!
EDIT: Thank you for your help. That makes sense. So similarly, if I wanted to check the end of the file for a line break or carriage return, I could do something like this, right?
fseek(input, -2, SEEK_END);
char buf1[3] = {0};
if ((strncmp(buf1, "\n,", 2) == 0) || (strncmp(buf1, "\r,", 2) == 0))
{
    fclose(input);
    return 4;
}
This seemed logical (\n and \r are each 2 bytes, right), but it doesn't seem to be working.
Use fread to read the first 6 characters and compare them with the key. Example:
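#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *input;
    int match = 0;
    char buf[7] = { 0 }; // 6 characters plus the terminating '\0'
    input = fopen("variation_format_help.txt", "r");
    if (!input)
    {
        printf("error\n");
        return 1;
    }
    // fread returns how many characters it actually read
    if (fread(buf, 1, 6, input) == 6 && strncmp(buf, "Class,", 6) == 0)
    {
        match = 1;
        printf("%s match\n", buf);
    }
    // ... continue reading here if needed, then close the file
    fclose(input);
    return match ? 0 : 1;
}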
At this point you can close the file or continue reading it. If you want to go back to the beginning of the file, call fseek(input, 0, SEEK_SET); (or rewind(input);). There is no need to close the file and open it again.
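Two points about the EDIT in the question: '\n' and '\r' are one byte each, and that snippet never reads anything into buf1, so the comparison always sees zeros. The sketch below uses hypothetical helper names and assumes the second line must start with "Subject," and that the file is opened with "rb". It checks the start of the first two lines with getc, so no commas or line breaks need to be counted, and it reads the file's last byte to see whether the file ends with a newline (a "\r\n" ending also ends with '\n'). On a text stream the C standard only allows offset 0, or a value from ftell() with SEEK_SET, so fseek(fp, -1L, SEEK_END) needs a binary stream; the standard does not strictly guarantee SEEK_END on binary streams either, but it works on POSIX systems and Windows.

```c
#include <stdio.h>

/* Hypothetical helper: report whether the next line of fp starts with
   `prefix`, then skip the rest of that line. Returns 1 on a match. */
static int line_starts_with(FILE *fp, const char *prefix) {
    size_t i = 0;
    int c;
    int match = 1;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (prefix[i] != '\0') {
            if (c != (unsigned char)prefix[i])
                match = 0;
            i++;
        }
    }
    return match && prefix[i] == '\0';  /* line was at least as long as prefix */
}

/* 1 if line 1 starts with "Class," and line 2 with "Subject,", else 0. */
int has_expected_header(FILE *fp) {
    rewind(fp);
    return line_starts_with(fp, "Class,") && line_starts_with(fp, "Subject,");
}

/* 1 if the last byte of the file is '\n', 0 otherwise (also for an empty
   file or a failed seek). Assumes fp was opened in binary mode. */
int ends_with_newline(FILE *fp) {
    if (fseek(fp, -1L, SEEK_END) != 0)
        return 0;
    return getc(fp) == '\n';
}
```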

How would I compare a string (entered by the user) to the first word of a line in a file?

I am really struggling to understand how character arrays work in C. This seems like something that should be really simple, but I do not know what function to use, or how to use it.
I want the user to enter a string, and I want to iterate through a text file, comparing this string to the first word of each line in the file.
By "word" here, I mean substring that consists of characters that aren't blanks.
Help is greatly appreciated!
Edit:
To be more clear, I want to take a single input and search for it in a database in the form of a text file. I know that if it is in the database, it will be the first word of a line, since that is how the database is formatted. I suppose I COULD iterate through every single word of the database, but this seems less efficient.
After finding the input in the database, I need to access the two words that follow it (on the same line) to achieve the program's ultimate goal (which is computational in nature).
Here is some code that will do what you are asking. I think it will help you understand how string functions work a little better. Note - I did not make many assumptions about how well conditioned the input and text file are, so there is a fair bit of code for removing whitespace from the input, and for checking that the match is truly "the first word", and not "the first part of the first word". So this code will not match the input "hello" to the line "helloworld 123 234" but it will match to "hello world 123 234". Note also that it is currently case sensitive.
#include <ctype.h>
#include <stdio.h>
#include <string.h>
int main(void) {
    char buf[100]; // declare space for the input string
    FILE *fp; // pointer to the text file
    char fileBuf[256]; // space to keep a line from the file
    int ii, ll;
    printf("give a word to check:\n");
    if (fgets(buf, 100, stdin) == NULL) // fgets prevents you reading in a string longer than buffer
        return 1;
    printf("you entered: %s\n", buf); // check we read correctly
    // see (for debug) if there are any odd characters:
    printf("In hex, that is ");
    ll = strlen(buf);
    for (ii = 0; ii < ll; ii++) printf("%2X ", (unsigned char)buf[ii]);
    printf("\n");
    // probably see a newline (0A), maybe also a carriage return (0D) depending on OS. Get rid of them!
    // note I could have used the result that ii is strlen(buf) but
    // that makes the code harder to understand
    for (ii = strlen(buf) - 1; ii >= 0; ii--) {
        if (isspace((unsigned char)buf[ii])) buf[ii] = '\0';
    }
    // open the file:
    if ((fp = fopen("myFile.txt", "r")) == NULL) {
        printf("cannot open file!\n");
        return 1;
    }
    while (fgets(fileBuf, 256, fp)) { // read in one line at a time until eof
        printf("line read: %s", fileBuf); // show we read it correctly
        // find whitespace (or the end of the string): we need to keep only the first word.
        ii = 0;
        while (ii < 255 && fileBuf[ii] != '\0' && !isspace((unsigned char)fileBuf[ii])) ii++;
        // now compare input string with first word from input file:
        if (strlen(buf) == (size_t)ii && strstr(fileBuf, buf) == fileBuf) {
            printf("found a matching line: %s\n", fileBuf);
            break;
        }
    }
    fclose(fp);
    // if a match was found, fileBuf contains the line you are interested in;
    // the second and third word of the line are what you are really after.
    return 0;
}
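For the second half of the question (getting the two words that follow the match), sscanf() can split a line into its first three blank-separated words in one call. This is a sketch with a hypothetical name, match_line, assuming no word is longer than 63 characters; like the code above, it is case sensitive and requires the whole first word to match.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if `line` starts with the word `key`, copy the two
   words that follow it into w2 and w3 (64 bytes each) and return 1.
   A "word" is a run of non-blank characters, as in the question. */
int match_line(const char *line, const char *key, char w2[64], char w3[64]) {
    char first[64];
    /* each %63s skips leading white space and stops at the next blank */
    if (sscanf(line, "%63s %63s %63s", first, w2, w3) != 3)
        return 0;                        /* fewer than three words on the line */
    return strcmp(first, key) == 0;
}
```

Inside the while (fgets(...)) loop, match_line(fileBuf, buf, w2, w3) would replace the strlen/strstr test and leave the two following words in w2 and w3.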
Your recent update states that the file is really a database, in which you are looking for a word. This is very important.
If you have enough memory to hold the whole database, you should do just that (read the whole database and arrange it for efficient searching), so you should probably not ask about searching in a file.
Good database designs involve data structures like tries and hash tables. But for a start, you could use the most basic improvement of the database: holding the words in alphabetical order (use the somewhat tricky qsort function to achieve that).
#include <stddef.h>
#include <string.h>

struct Database
{
    size_t count;
    struct Entry // a nested declaration is valid C; the tag Entry is visible at file scope
    {
        char *word;
        char *explanation;
    } *entries;
};

char *find_explanation_of_word(struct Database* db, char *word)
{
    for (size_t i = 0; i < db->count; i++)
    {
        int result = strcmp(db->entries[i].word, word);
        if (result == 0)
            return db->entries[i].explanation;
        else if (result > 0)
            break; // if the database is sorted, this means word is not found
    }
    return NULL; // not found
}
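Once the array is sorted with qsort, the standard bsearch function can replace the linear scan above with a binary search. This sketch assumes the same entry layout (declared at file scope here); sort_database, find_explanation_sorted and compare_entries are hypothetical names. The same comparator has to be used for sorting and for searching.

```c
#include <stdlib.h>
#include <string.h>

struct Entry {
    char *word;
    char *explanation;
};

struct Database {
    size_t count;
    struct Entry *entries;
};

/* qsort/bsearch comparator: order entries by their word. */
static int compare_entries(const void *a, const void *b) {
    const struct Entry *x = a;
    const struct Entry *y = b;
    return strcmp(x->word, y->word);
}

/* Sort once, after loading the database. */
void sort_database(struct Database *db) {
    if (db->count > 0)
        qsort(db->entries, db->count, sizeof db->entries[0], compare_entries);
}

/* Binary search in the sorted array: O(log n) comparisons instead of a
   linear scan. Returns the explanation, or NULL if the word is absent. */
char *find_explanation_sorted(const struct Database *db, const char *word) {
    if (db->count == 0)
        return NULL;
    struct Entry key = { (char *)word, NULL };
    const struct Entry *hit = bsearch(&key, db->entries, db->count,
                                      sizeof db->entries[0], compare_entries);
    return hit ? hit->explanation : NULL;
}
```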
If your database is too big to hold in memory, you should use a trie that holds just the beginnings of the words in the database; for each beginning of a word, have a file offset at which to start scanning the file.
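#include <stdio.h>
#include <string.h>

char* find_explanation_in_file(FILE *f, long offset, char *word)
{
    // static, so the returned pointer is still valid after the function returns
    // (the next call overwrites it)
    static char line[100]; // 100 should be greater than max line in file
    if (fseek(f, offset, SEEK_SET) != 0)
        return NULL;
    while (fgets(line, sizeof(line), f))
    {
        line[strcspn(line, "\n")] = '\0'; // drop the newline
        char *word_in_file = strtok(line, " ");
        if (word_in_file == NULL)
            continue; // blank line
        char *explanation = strtok(NULL, "");
        int result = strcmp(word_in_file, word);
        if (result == 0)
            return explanation;
        else if (result > 0)
            break;
    }
    return NULL; // not found
}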
I think what you need is fseek().
1) Pre-process the database file as follows. Record where every line starts (offset 0 for the first line, and the position just after each '\n' newline for the others) in an array, say a, so that you know that the ith line starts at the a[i]th character from the beginning of the file. If the file is opened in text mode, take these offsets from ftell(), because those are the only nonzero values fseek() is guaranteed to accept for text streams.
2) fseek() is a standard library function declared in <stdio.h>. So, when you need to process an input string, check the first word only at the positions stored in the array a. To do that:
fseek(inFile , a[i] , SEEK_SET);
and then
fscanf(inFile, "%s %s %s", yourFirstWordHere, secondWord, thirdWord);
for checking the ith line.
Alternatively, if nothing has been read since seeking to a[i-1] and the file was opened in binary mode, a relative seek reaches the same position:
fseek(inFile, a[i] - a[i-1], SEEK_CUR);
Explanation: fseek() sets the file position indicator of the stream to the requested position. So, if you know where you need to read or write, you can go there and read or write directly. This way, you won't need to read whole lines just to get the first three words.
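A sketch of step 1, assuming the offsets are collected with ftell() so they remain valid fseek() targets even for text streams. index_lines is a hypothetical name; the caller frees the returned array.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: record the ftell() offset at which every line of fp
   starts. Returns a malloc'ed array (caller frees) and stores the number of
   lines in *count, or returns NULL if memory runs out. */
long *index_lines(FILE *fp, size_t *count) {
    size_t cap = 16, n = 0;
    long *a = malloc(cap * sizeof *a);
    if (a == NULL)
        return NULL;
    rewind(fp);
    for (;;) {
        long pos = ftell(fp);            /* offset of the line about to be read */
        int c = getc(fp);
        if (c == EOF)
            break;                       /* no more lines */
        if (n == cap) {
            long *p = realloc(a, 2 * cap * sizeof *a);
            if (p == NULL) {
                free(a);
                return NULL;
            }
            a = p;
            cap *= 2;
        }
        a[n++] = pos;
        while (c != EOF && c != '\n')    /* skip to the end of this line */
            c = getc(fp);
    }
    *count = n;
    return a;
}
```

To check line i afterwards: fseek(fp, a[i], SEEK_SET) followed by fscanf(fp, "%63s %63s %63s", w1, w2, w3) with 64-byte buffers.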

c detecting empty input for stdin

This seems like it should be a simple thing but after hours of searching I've found nothing...
I've got a function that reads an input string from stdin and sanitizes it. The problem is that when I hit enter without typing anything in, it apparently just reads in some junk from the input buffer.
In the following examples, the prompt is "input?" and everything that occurs after it on the same line is what I type. The line following the prompt echoes what the function has read.
First, here is what happens when I type something in both times. In this case, the function works exactly as intended.
input? abcd
abcd
input? efgh
efgh
Second, here is what happens when I type something in the first time, but just hit enter the second time:
input? abcd
abcd
input?
cd
And here is what happens when I just hit enter both times:
input?
y
input?
y
Each time I run it anew, it returns either 'y' or '#'. 'y' is particularly dangerous for obvious reasons.
Here is my code:
#include <stdio.h>
#include <stdlib.h>
#define STRLEN 128

// prototypes: both functions are defined after main
void promptString(const char* _prompt, char* _writeTo);
void cleanString(char* _str);

int main() {
    char str[STRLEN];
    promptString("input?", str);
    printf("%s\n", str);
    promptString("input?", str);
    printf("%s\n", str);
    return EXIT_SUCCESS;
}
void promptString(const char* _prompt, char* _writeTo) {
    printf("%s ", _prompt);
    fgets(_writeTo, STRLEN, stdin);
    cleanString(_writeTo);
    return;
}
void cleanString(char* _str) {
    char temp[STRLEN];
    int i = 0;
    int j = 0;
    while (_str[i] < 32 || _str[i] > 126)
        i++;
    while (_str[i] > 31 && _str[i] < 127) {
        temp[j] = _str[i];
        i++;
        j++;
    }
    i = 0;
    while (i < j) {
        _str[i] = temp[i];
        i++;
    }
    _str[i] = '\0';
    return;
}
I've tried various methods (even the unsafe ones) of flushing the input buffer (fseek, rewind, fflush). None of it has fixed this.
How can I detect an empty input so that I can re-prompt, instead of this annoying and potentially dangerous behavior?
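This part of cleanString
while (_str[i] < 32 || _str[i] > 126)
    i++;
jumps over the \0 when the string is empty: if you just press Enter, fgets stores "\n", so the loop skips both the '\n' and the terminating '\0' (both are below 32) and keeps going into whatever stale bytes are left in the array.
You should add _str[i] != '\0' to the loop's condition.
To detect an empty string, simply check its length just after the input: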
do {
    printf("%s ", _prompt);
    if (fgets(_writeTo, STRLEN, stdin) == NULL) { // end of file or read error
        _writeTo[0] = '\0';
        return;
    }
} while (strlen(_writeTo) < 2);
(comparing with 2 because of the '\n' that fgets stores at the end of the buffer)
Why do you have a bunch of variable names with leading underscores? That's nasty.
Anyway, the first thing you must do is check the return value of fgets. If it returns NULL, you didn't get any input. (You can then test feof or ferror to find out why you didn't get input.)
Moving on to cleanString, you have a while loop that consumes a sequence of non-printable characters (and you could use isprint for that instead of magic numbers), followed by a while loop that consumes a sequence of printable characters. If the input string doesn't consist of a sequence of non-printables followed by a sequence of printables, you will either consume too much or not enough. Why not use a single loop?
while (str[i]) {
    if (isprint((unsigned char)str[i]))
        temp[j++] = str[i];
    ++i;
}
This is guaranteed to consume the whole string until the \0 terminator, and it can't keep going past the terminator, and it copies the "good" characters to temp. I assume that's what you wanted.
You don't even really need to use a temp buffer, you could just copy from str[i] to str[j], since j can never get ahead of i you'll never be overwriting anything that you haven't already processed.
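A sketch combining both answers: clean the string in place with isprint (copying str[i] to str[j] as suggested above), check the return value of fgets, and re-prompt while the cleaned result is empty. The names clean_string and prompt_string are hypothetical (chosen without leading underscores). A space counts as printable, so input made only of spaces is accepted, and a line longer than STRLEN - 1 characters leaves its remainder in stdin for the next call.

```c
#include <ctype.h>
#include <stdio.h>

#define STRLEN 128

/* Remove every non-printable character (including the '\n' kept by fgets)
   in place; j never passes i, so nothing unread is overwritten. */
void clean_string(char *str) {
    size_t i, j = 0;
    for (i = 0; str[i] != '\0'; i++) {
        if (isprint((unsigned char)str[i]))
            str[j++] = str[i];
    }
    str[j] = '\0';
}

/* Prompt until the cleaned input is non-empty. Returns 1 on success,
   0 if fgets fails (end of file or read error). */
int prompt_string(const char *prompt, char *write_to) {
    for (;;) {
        printf("%s ", prompt);
        fflush(stdout);                  /* make sure the prompt is shown */
        if (fgets(write_to, STRLEN, stdin) == NULL)
            return 0;                    /* no input at all */
        clean_string(write_to);
        if (write_to[0] != '\0')
            return 1;                    /* got something printable */
    }
}
```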
