I'm trying to write a program (obviously new to this) that checks an input file to ensure that it adheres to the specified format. The input file is supposed to be in csv format with the first two rows beginning with the designations "Class" and "Subject", respectively.
I know checking for the "Subject" designation at the beginning of the second line will be a bit more complicated, but I think I can figure out where to start reading/checking for the string by determining the dimensions of the input dataset through calculation of the number of commas and line breaks.
However, I'm getting a little stuck on how to make sure that the first six characters are "Class,".
I was first just trying to figure out how to scan for a comma within the first six characters (see below), and I got stuck there.
// search the first six characters for "Class,"
FILE *input;
int character;
int comma_check = 0;
int i = input[i];
while (i < 6)
{
character = fgetc(input);
if (character == ',')
{
comma_check++;
}
}
if (comma_check == 0)
{
printf("Input file is not in the correct format.\n");
return 1;
fclose(input);
}
I've read through a number of posts that suggest ways to print to the screen (although I haven't figured out how to print just a specified number of characters rather than everything through the end of the file), but I don't know how I would go about saving those first six characters into a string that I could then compare to "Class,". Any help or suggestions would be greatly appreciated. Thank you in advance!
EDIT: Thank you for your help. That makes sense. So similarly, if I wanted to check the end of the file for a line break or carriage return, I could do something like this, right?
fseek(input, -2, SEEK_END);
char buf1[3] = {0};
if ((strncmp(buf1, "\n,", 2) == 0) || (strncmp(buf1, "\r,", 2) == 0))
{
return 4;
fclose(input);
}
This seemed logical (\n and \r are each 2 bytes, right), but it doesn't seem to be working.
Use fread to read the first 6 characters and compare them with the key. Example:
FILE *input;
int match = 0;
char buf[7] = { 0 };
input = fopen("variation_format_help.txt", "r");
if (!input)
{
printf("error\n");
return 0;
}
fread(buf, 1, 6, input);
if (strncmp(buf, "Class,", 6) == 0)
{
match = 1;
printf("%s match\n", buf);
}
At this point you can close the file or continue reading from it. To go back to the beginning of the file, call fseek(input, 0, SEEK_SET); (or rewind(input);). There is no need to close the file and open it again.
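The same pattern extends to the end-of-file check from the question's edit. Note that '\n' and '\r' are one byte each, and that the buffer has to be filled by an actual read after the seek. The following is only a sketch: the function names starts_with_class and ends_with_line_break are made up for illustration, and it assumes the file was opened in binary mode ("rb"), since seeking to a nonzero offset from SEEK_END is only portable on binary streams.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the stream begins with the 6 bytes "Class,", else 0.
   (Hypothetical helper name, not part of the original answer.) */
int starts_with_class(FILE *input)
{
    char buf[7] = {0};

    if (fseek(input, 0L, SEEK_SET) != 0)
        return 0;
    if (fread(buf, 1, 6, input) != 6)   /* file shorter than 6 bytes */
        return 0;
    return strncmp(buf, "Class,", 6) == 0;
}

/* Returns 1 if the last byte of the stream is '\n' or '\r', else 0.
   Assumes a binary-mode stream so that the seek from the end is valid. */
int ends_with_line_break(FILE *input)
{
    int c;

    if (fseek(input, -1L, SEEK_END) != 0)
        return 0;                       /* empty or unseekable file */
    c = fgetc(input);                   /* actually read the byte */
    return c == '\n' || c == '\r';
}
```

Both helpers reposition the stream themselves, so they can be called in any order on the same open file.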
I was creating a word guessing game that reads a text file line per line until it finds a random word and stores it in a string (word). Then the user enters letters until all the letters of the stored word are revealed.
So far it works perfectly but every time it's the very first word that is read some unknown characters get stored at the beginning of char word[20].
Note that I use a C mobile app, which I think uses the clang 6.0 compiler. So does the error come from my code, or does it come from the app (which I love)?
Here is the FULL, clearer code:
//guess the right word
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <stdlib.h>
#define TRUE 1
#define FALSE 0
#define NB_OF_WORDS 3 //number of words in text file
main() {
char begin, word[20] = { 0 }, guessedletter;
int num, rightletter, success;
int i = 0;
int show[20] = { 0 };//shown letters
FILE *ressource = NULL;
srand(time(NULL));
if ((ressource = fopen("ressource.txt", "r")) == NULL) {
fprintf(stderr, "Error ressource.txt");//open in read only mode
exit(EXIT_FAILURE);
}
printf("Welcome to Word Guess, a word guessing contest.\n"
"You have to find the letters of a word and guess what word it is.\n"
"Begin? (y/n)\n");
while ((begin = getchar()) == 'y') { //game loop
/*reinitializations*/
fseek(ressource, 3, SEEK_SET);//replaces rewind(ressource)
success = FALSE;
rightletter = 0;
num = 0;
num = rand() % NB_OF_WORDS; //random number between 0 and NB-1
for (i = 0; i < 20; i++) {
show[i] = FALSE;
word[i] = 0;
}
i = 0;
/*end of reinitializations*/
while (i <= num) {//reads line by line until random word is stored
if (fgets(word, 20, ressource) == NULL) {
fprintf(stderr, "fgets did not work");
exit(EXIT_FAILURE);
}
i++;
}
printf("%s", word);//here is just for testing if the read word is well read. Which isn't the case if num=0
for (i = 0; i < 20; i++)
if (word[i] == '\n')
word[i] = '\0';//adds zero character to show where the string ends
while (!success) { //guessing loop
printf("\nWrite a letter: ");
scanf("%c", &guessedletter);
printf("\n");
for (i = 0; word[i] != '\0'; i++) { //compares entered letter to letter from string. If they match...
if (word[i] == guessedletter) {
if (!show[i])
rightletter++;//avoids letters entered twice from increasing count
show[i] = TRUE;
}
if (show[i])
printf("%c", word[i]);//...a letter is revealed...
else
printf("*");//else it stays hidden
}
if (rightletter == strlen(word))
success=TRUE;//if all the right letters found (same number of letters found as number of letters in the words) you win
}
printf("\nCongratulations you have won!\nDo you want to replay? (y/n)");
getchar();//clears newline character
}
fclose(ressource);
return 0;
}
When num=0, I get a weird sign before the printed word as if the first characters of the text file were not supposed to be there...
Moreover, during the guessing game, if the word to guess (word[20]) is, say, "annoying" and it is the first word in "ressource.txt" (num=0), the word prints like this once I have guessed all the letters: ***annoying. This does not happen with ANY other word from the list.
I'm new to this site and post from my phone... sorry for any mistakes.
EDIT: removed fgetc for fgets. Still get several unknown characters if fgets reads first line.
EDIT 2: added the whole code, translated mot[20] into word[20], added error testing
EDIT 3: Replacing rewind(ressource); by fseek(ressource, 3, SEEK_SET); solved the problem. Which means there are indeed three unknown characters at the beginning of the text file
There are multiple problems in the posted code:
You only posted a fragment of code, the rest of the function could cause problems that cannot be analysed from what you posted. Please post the complete code to a minimal program exhibiting the problem.
You do not test for end of file in the while loop. If you try to skip more lines than are present in the file, this loop will run indefinitely.
You do not test the return value of fgets(): if by chance you skipped the full contents of the file in the preceding while loop, fgets() will fail and return NULL, leaving mot in an undetermined state, causing unexpected behavior in the subsequent printf.
EDIT: the modified code still does not check the return value of fgets().
EDIT: thank you for posting the full code, but modifying the question this way makes this answer and the comments irrelevant.
Your dictionary file ressource.txt seems to start with a BOM (Byte Order Mark) encoded in UTF-8. I am guessing it contains French words including some with accented letters such as reculées and encodées... Your text editor saved it encoded in UTF-8 with an extra code-point at the beginning for other programs to determine this encoding easily. The problem with that is your program does not handle UTF-8. It assumes each byte read from stdin represents a single character. It might by chance match some words with accented letters, but will more likely fail to find them.
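A UTF-8 BOM is the three bytes 0xEF 0xBB 0xBF, which matches the three unknown characters reported in EDIT 3. Rather than hard-coding fseek(ressource, 3, SEEK_SET), the program could check for those bytes and skip them only when they are present. This is a minimal sketch; the helper name skip_utf8_bom is an assumption, not something from the posted code.

```c
#include <stdio.h>

/* Positions the stream just past a UTF-8 BOM (EF BB BF) if the file starts
   with one, otherwise at the very beginning. Returns 1 if a BOM was skipped.
   (Hypothetical helper; call it wherever the code would rewind the file.) */
int skip_utf8_bom(FILE *fp)
{
    unsigned char bom[3];

    rewind(fp);
    if (fread(bom, 1, 3, fp) == 3 &&
        bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF)
        return 1;           /* stream now points at the first real character */

    rewind(fp);             /* no BOM: start again from offset 0 */
    return 0;
}
```

With this in place the same program works whether or not the text editor saved the dictionary with a BOM.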
I'm trying to get my text to be read back to front and printed in reverse order in that file, but my for loop doesn't seem to be working. Also, my while loop is counting 999 characters even though it should be 800 and something (can't remember exactly). I think it might be because there is an empty line between the two paragraphs, but then again there are no characters there.
Here is my code for the two loops -:
/*Reversing the file*/
char please;
char work[800];
int r, count, characters3;
characters3 = 0;
count = 0;
r = 0;
fgets(work, 800, outputfile);
while (work[count] != NULL)
{
characters3++;
count++;
}
printf("The number of characters to be copied is-: %d", characters3);
for (characters3; characters3 >= 0; characters3--)
{
please = work[characters3];
work[r] = please;
r++;
}
fprintf(outputfile, "%s", work);
/*Closing all the file streams*/
fclose(firstfile);
fclose(secondfile);
fclose(outputfile);
/*Message to direct the user to where the files are*/
printf("\n Merged the first and second files into the output file
and reversed it! \n Check the outputfile text inside the Debug folder!");
There are a couple of huge conceptual flaws in your code.
The very first one is that you state that it "doesn't seem to [be] working" without saying why you think so. Just running your code reveals what the problem is: you do not get any output at all.
Here is why. You reverse your string, and so the terminating zero comes at the start of the new string. You then print that string – and it ends immediately at the first character.
Fix this by starting the loop one position earlier: decrement characters3 before the loop so that the terminating zero is not copied to the front.
Next, why not print a few intermediate results? That way you can see what's happening.
string: [This is a test.
]
The number of characters to be copied is-: 15
result: [
.tset aa test.
]
Hey look, there seems to be a problem with the newline (it ends up at the start of the line), which is exactly what should happen – after all, it is part of the string – but more likely not what you intend to do.
Apart from that, you can clearly see that the reversing itself is not correct!
The problem now is that you are reading and writing from the same string:
please = work[characters3];
work[r] = please;
You write the character at the end into position #0, decrease the end and increase the start, and repeat until done. So, the second half of reading/writing starts copying the end characters back from the start into the end half again!
Two possible fixes: 1. read from one string and write to a new one, or 2. adjust the loop so it stops copying after 'half' is done (since you are doing two swaps per iteration, you only need to loop half the number of characters).
You also need to think more about what swapping means. As it is, your code overwrites a character in the string. To correctly swap two characters, you need to save one first in a temporary variable.
void reverse (FILE *f)
{
char please, why;
char work[800];
int r, count, characters3;
characters3 = 0;
count = 0;
r = 0;
fgets(work, 800, f);
printf ("string: [%s]\n", work);
while (work[count] != 0)
{
characters3++;
count++;
}
characters3--; /* do not count last zero */
characters3--; /* do not count the return */
printf("The number of characters to be copied is-: %d\n", characters3);
for (characters3; characters3 >= (count>>1); characters3--)
{
please = work[characters3];
why = work[r];
work[r] = please;
work[characters3] = why;
r++;
}
printf ("result: [%s]\n", work);
}
As a final note: you do not need to 'manually' count the number of characters; there is a function for that. All that's needed instead of the count loop is this:
characters3 = strlen(work);
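Putting strlen together with fix 2 (stop at the halfway point and swap through a temporary) gives a compact in-place reversal. This is a sketch rather than a drop-in replacement for the function above: the name reverse_line is made up, and it assumes the goal is to keep a trailing newline at the end of the string instead of moving it to the front.

```c
#include <string.h>

/* Reverses s in place. If s ends with '\n', the newline stays at the end.
   (Hypothetical helper name.) */
void reverse_line(char *s)
{
    size_t len = strlen(s);
    size_t i, j;

    if (len > 0 && s[len - 1] == '\n')
        len--;                  /* leave the newline where it is */
    if (len < 2)
        return;                 /* nothing to swap */

    /* Walk inwards from both ends, swapping via a temporary. */
    for (i = 0, j = len - 1; i < j; i++, j--) {
        char tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
    }
}
```

Because the two indices meet in the middle, each character is moved exactly once and nothing is overwritten before it has been read.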
Here's a complete and heavily commented function that will take in a filename to an existing file, open it, then reverse the file character-by-character. Several improvements/extensions could include:
Add an argument to adjust the maximum buffer size allowed.
Dynamically increase the buffer size as the input file exceeds the original memory.
Add a strategy for recovering the original contents if something goes wrong when writing the reversed characters back to the file.
#include <stdint.h> // uint32_t, int64_t
#include <stdio.h>
// naming convention of l_ for local variable and p_ for pointers
// Returns 1 on success and 0 on failure
int reverse_file(char *filename) {
FILE *p_file = NULL;
// r+ enables read & write, preserves contents, starts pointer p_file at beginning of file, and will not create a
// new file if one doesn't exist. Consider a nested fopen(filename, "w+") if creation of a new file is desired.
p_file = fopen(filename, "r+");
// Exit with failure value if file was not opened successfully
if(p_file == NULL) {
perror("reverse_file() failed to open file."); // p_file is NULL here, so there is nothing to close
return 0;
}
// Assumes entire file contents can be held in volatile memory using a buffer of size l_buffer_size * sizeof(char)
uint32_t l_buffer_size = 1024;
char l_buffer[l_buffer_size]; // holds the file contents; each int returned by fgetc() is narrowed to char when stored
// Cursor for moving within the l_buffer
int64_t l_buffer_cursor = 0;
// Temporary storage for current char from file
// fgetc() returns the character read as an unsigned char cast to an int or EOF on end of file or error.
int l_temp;
for (l_buffer_cursor = 0; (l_temp = fgetc(p_file)) != EOF; ++l_buffer_cursor) {
// Verify our assumption that the file can completely fit in volatile memory <= l_buffer_size * sizeof(char)
// is still valid before storing, so the buffer is never written out of bounds. Return an error otherwise.
if (l_buffer_cursor >= l_buffer_size) {
fprintf(stderr, "reverse_file() in memory buffer size of %u char exceeded. %s is too large.\n",
l_buffer_size, filename);
fclose(p_file);
return 0;
}
// Store the current char into our buffer in the original order from the file
l_buffer[l_buffer_cursor] = (char)l_temp; // explicitly narrow the int returned by fgetc() to char
}
// At the conclusion of the for loop, l_buffer contains a copy of the file in memory and l_buffer_cursor points
// to the index 1 past the final char read in from the file. Decrement l_buffer_cursor by 1 so that it indexes
// the final char before proceeding. Nothing is written here: appending a '\0' would grow the file by one byte.
--l_buffer_cursor;
// To reverse the file contents, reset the p_file cursor to the beginning of the file then write data to the file by
// reading from l_buffer in reverse order by decrementing l_buffer_cursor.
// NOTE: A less verbose/safe alternative to fseek is: rewind(p_file);
if ( fseek(p_file, 0, SEEK_SET) != 0 ) {
fclose(p_file); // do not leak the open file on failure
return 0;
}
for (l_temp = 0; l_buffer_cursor >= 0; --l_buffer_cursor) {
l_temp = fputc(l_buffer[l_buffer_cursor], p_file); // write buffered char to the file, advance f_open pointer
if (l_temp == EOF) {
fprintf(stderr, "reverse_file() failed to write %c at index %lld back to the file %s.\n",
l_buffer[l_buffer_cursor], (long long)l_buffer_cursor, filename);
}
}
fclose(p_file);
return 1;
}
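One way to approach the "dynamically sized buffer" improvement listed above is to measure the file first and allocate exactly that much. The sketch below is an assumption about how that could look, not part of the answer's function: read_all is a made-up name, and it relies on ftell reporting the size in bytes, which is only guaranteed for binary streams (open with "rb" or "rb+").

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole stream into a freshly allocated, NUL-terminated buffer and
   stores the number of bytes read in *len. Returns NULL on failure.
   The caller must free() the result. (Hypothetical helper name.) */
char *read_all(FILE *fp, size_t *len)
{
    long size;
    char *buf;

    if (fseek(fp, 0L, SEEK_END) != 0)
        return NULL;
    size = ftell(fp);               /* file size in bytes for binary streams */
    if (size < 0)
        return NULL;
    rewind(fp);

    buf = malloc((size_t)size + 1); /* +1 for the terminating '\0' */
    if (buf == NULL)
        return NULL;

    *len = fread(buf, 1, (size_t)size, fp);
    buf[*len] = '\0';
    return buf;
}
```

The reversed contents could then be written back from buf[*len - 1] down to buf[0] after seeking to the start, exactly as in the fixed-size version.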
So I'm supposed to write a block of code that opens a file called "words" and writes the last word in the file to a file called "lastword". This is what I have so far:
FILE *f;
FILE *fp;
char string1[100];
f = fopen("words","w");
fp=fopen("lastword", "w");
fscanf(f,
fclose(fp)
fclose(f);
The problem here is that I don't know how to read in the last word of the text file. How would I know which word is the last word?
This is similar to what the tail tool does: seek to a certain offset from the end of the file, read the block there, then search backwards. Once you meet a space or a newline, the word that starts right after it is the last word. The basic code looks like this:
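char string[1024 + 1]; // one extra byte for the terminating '\0'
char *last = string;
f = fopen("words", "r");
// Seek to 1024 bytes before the end; fall back to the start for shorter files.
if (fseek(f, -1024L, SEEK_END) != 0)
    rewind(f);
size_t nread = fread(string, 1, sizeof string - 1, f);
string[nread] = '\0';
// Drop trailing whitespace so the scan starts inside the last word.
while (nread > 0 && isspace((unsigned char)string[nread - 1]))
    string[--nread] = '\0';
// Walk backwards to the whitespace that precedes the last word (isspace needs <ctype.h>).
for (size_t i = nread; i > 0; i--) {
    if (isspace((unsigned char)string[i - 1])) {
        last = &string[i];
        break;
    }
}
fprintf(fp, "%s", last);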
If the word boundary is not found in the first block, continue with the second-to-last block, then the third, until you find it, then print all the characters after that position.
There are plenty of ways to do this.
Easy way
One easy approach would be to loop on reading words:
f = fopen("words.txt","r"); // attention !! open in "r" mode !!
...
int rc;
do {
rc=fscanf(f, "%99s", string1); // attempt to read
} while (rc==1 && !feof(f)); // while it's successful.
... // here string1 contains the last string successfully read
However, this treats any run of characters separated by whitespace as a word. Note the use of the width field in the fscanf() format to make sure that there will be no buffer overflow.
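For completeness, here is how the easy way might look as a self-contained function that also writes the result out. It is only a sketch: copy_last_word is a made-up name, and it copies each word into a second buffer instead of relying on fscanf leaving the buffer untouched when it fails.

```c
#include <stdio.h>
#include <string.h>

/* Copies the last whitespace-delimited word of `in` to `out`.
   Returns 1 if a word was found, 0 if the input contained no words.
   (Hypothetical helper name.) */
int copy_last_word(FILE *in, FILE *out)
{
    char word[100];
    char last[100] = "";
    int found = 0;

    /* %99s leaves room for the terminating '\0' in a 100-byte buffer. */
    while (fscanf(in, "%99s", word) == 1) {
        strcpy(last, word);     /* remember the most recent word */
        found = 1;
    }
    if (found)
        fputs(last, out);
    return found;
}
```

A caller would open "words" with "r" and "lastword" with "w", pass both streams in, and close them afterwards.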
More exact way
Building on previous attempt, if you want a stricter definition of words, you can just replace the call to scanf() with a function of your own:
rc=read_word(f, string1, 100);
The function would be something like:
int read_word(FILE *fp, char *s, int szmax) {
int started=0, c;
while ((c=fgetc(fp))!=EOF && szmax>1) {
if (isalpha(c)) { // copy only alphabetic chars to string (isalpha needs <ctype.h>)
started=1;
*s++=c;
szmax--;
}
else if (started) // first char after the alphabetics
break; // will end the word.
}
if (started)
*s=0; // if we have found a word, we end it.
return started;
}
Okay, so after reading both: How to read a specific line in a text file in C (integers) and What is the easiest way to count the newlines in an ASCII file? I figured that I could use the points mentioned in both to both efficiently and quickly read a single line from a file.
Here's the code I have:
char buf[BUFSIZ];
intmax_t lines = 2; // when set to zero, reads two extra lines.
FILE *fp = fopen(filename, "r");
while ((fscanf(fp, "%*[^\n]"), fscanf(fp, "%*c")) != EOF)
{
/* globals.lines_to_feed__queue is the line that we _do_ want to print,
that is we want to ignore all lines up to that point:
feeding them into "nothingness" */
if (lines == globals.lines_to_feed__queue)
{
fgets(buf, sizeof buf, fp);
}
++lines;
}
fprintf(stdout, "%s", buf);
fclose(fp);
Now the above code works wonderfully, and I'm extremely pleased with myself for figuring out that you can fscanf a file up to a certain point, and then use fgets to read whatever data is at said point into a buffer, instead of having to fgets every single line and then fprintf the buf, when all I care about is the line that I'm printing: I don't want to be storing strings that I couldn't care less about in a buffer that I'm only going to use once for a single line.
However, the only issue I've run into, as noted by the // when set to zero, reads two extra lines comment: when lines is initialized with a value of 0, and the line I want is like 200, the line I'll get will actually be line 202. Could someone please explain what I'm doing wrong here/why this is happening and whether my quick fix lines = 2; is fine or if it is insufficient (as in, is something really wrong going on here, and it just happens to work?)
There are two reasons why you have to set the lines to 2, and both can be derived from the special case where you want the first line.
On one hand, in the while loop the first thing you do is use fscanf to consume a line, then you check if the lines counter matches the line you want. The thing is that if the line you want is the one you just consumed you are out of luck. On the other hand you are basically moving through lines by finding the next \n and incrementing lines after you check if the current line is the one you're after.
These two factors combined cause the offset in the lines count, so the following is a version of the same function taking them into account. Additionally it also contains a break; statement once you get to the line you are looking for, so that the while loop stops looking further into the file.
void read_and_print_line(char * filename, int line) {
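char buf[BUFSIZ] = ""; // BUFSIZ is defined in <stdio.h>; start empty in case fgets fails
int lines = 0;
FILE *fp = fopen(filename, "r");
if (fp == NULL)
return; // nothing to read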
do
{
if (++lines == line) {
fgets(buf, sizeof buf, fp);
break;
}
}while((fscanf(fp, "%*[^\n]"), fscanf(fp, "%*c")) != EOF);
if(lines == line)
printf("%s", buf);
fclose(fp);
}
Just as another way of looking at the problem… Assuming that your global specifies 1 when the first line is to be printed, 2 for the second, etc, then:
char buf[BUFSIZ];
FILE *fp = fopen(filename, "r");
if (fp == 0)
return; // Error exit — report error.
for (int lineno = 1; lineno < globals.lines_to_feed__queue; lineno++)
{
fscanf(fp, "%*[^\n]");
if (fscanf(fp, "%*c") == EOF)
break;
}
if (fgets(buf, sizeof(buf), fp) != 0)
fprintf(stdout, "%s", buf);
else
…requested line not present in file…
fclose(fp);
You could replace the break with fclose(fp); and return; if that's appropriate (but do make sure you close the file before exiting; otherwise, you leak resources).
If your line numbers are counted from 0, then change the lower limit of the for loop to 0.
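The same idea can be packaged as a reusable function. The sketch below swaps the two fscanf calls for a plain fgetc loop, which skips a line in one pass and treats empty lines like any other. The name read_nth_line and its parameters are assumptions for illustration; lines are counted from 1.

```c
#include <stdio.h>

/* Reads line number n (1-based) of fp into buf, which holds `size` bytes.
   Returns 1 on success, 0 if n is invalid or the file has fewer lines.
   (Hypothetical helper name.) */
int read_nth_line(FILE *fp, long n, char *buf, size_t size)
{
    int c;
    long line;

    if (n < 1 || size == 0)
        return 0;

    rewind(fp);
    for (line = 1; line < n; line++) {
        /* Discard everything up to and including the next newline. */
        while ((c = fgetc(fp)) != EOF && c != '\n')
            ;
        if (c == EOF)
            return 0;           /* ran out of lines before reaching n */
    }
    return fgets(buf, (int)size, fp) != NULL;
}
```

Only the requested line is ever stored; the skipped lines are read and thrown away one character at a time.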
First, about what is wrong here: this code is unable to read the very first line in the file (what happens if globals.lines_to_feed__queue is 0?). It would also miscount lines should the file contain successive newlines.
Second, you must realize that there is no magic. Since you don't know at which offset the string in question lives, you have to patiently read file character by character, counting end-of-strings along the way. It doesn't matter if you delegate the reading/counting to fgets/fscanf, or fgetc each character for manual inspection - either way an uninteresting piece of file will make its way from the disk into the OS buffers, and then into the userspace for interpretation.
Your gut feeling is absolutely correct: the code is broken.
I am really struggling to understand how character arrays work in C. This seems like something that should be really simple, but I do not know what function to use, or how to use it.
I want the user to enter a string, and I want to iterate through a text file, comparing this string to the first word of each line in the file.
By "word" here, I mean substring that consists of characters that aren't blanks.
Help is greatly appreciated!
Edit:
To be more clear, I want to take a single input and search for it in a database in the form of a text file. I know that if it is in the database, it will be the first word of a line, since that is how the database is formatted. I suppose I COULD iterate through every single word of the database, but this seems less efficient.
After finding the input in the database, I need to access the two words that follow it (on the same line) to achieve the program's ultimate goal (which is computational in nature)
Here is some code that will do what you are asking. I think it will help you understand how string functions work a little better. Note - I did not make many assumptions about how well conditioned the input and text file are, so there is a fair bit of code for removing whitespace from the input, and for checking that the match is truly "the first word", and not "the first part of the first word". So this code will not match the input "hello" to the line "helloworld 123 234" but it will match to "hello world 123 234". Note also that it is currently case sensitive.
#include <stdio.h>
#include <string.h>
#include <ctype.h> // for isspace()
int main(void) {
char buf[100]; // declare space for the input string
FILE *fp; // pointer to the text file
char fileBuf[256]; // space to keep a line from the file
int ii, ll;
printf("give a word to check:\n");
fgets(buf, 100, stdin); // fgets prevents you reading in a string longer than buffer
printf("you entered: %s\n", buf); // check we read correctly
// see (for debug) if there are any odd characters:
printf("In hex, that is ");
ll = strlen(buf);
for(ii = 0; ii < ll; ii++) printf("%2X ", buf[ii]);
printf("\n");
// probably see a carriage return - depends on OS. Get rid of it!
// note I could have used the result that ii is strlen(buf) but
// that makes the code harder to understand
for(ii = strlen(buf) - 1; ii >=0; ii--) {
if (isspace(buf[ii])) buf[ii]='\0';
}
// open the file:
if((fp=fopen("myFile.txt", "r"))==NULL) {
printf("cannot open file!\n");
return 0;
}
while( fgets(fileBuf, 256, fp) ) { // read in one line at a time until eof
printf("line read: %s", fileBuf); // show we read it correctly
// find whitespace: we need to keep only the first word.
ii = 0;
while(fileBuf[ii] != '\0' && !isspace((unsigned char)fileBuf[ii])) ii++; // stop at the end of the string too
// now compare input string with first word from input file:
if (strlen(buf)==ii && strstr(fileBuf, buf) == fileBuf) {
printf("found a matching line: %s\n", fileBuf);
break;
}
}
// when you get here, fileBuf will contain the line you are interested in
// the second and third word of the line are what you are really after.
}
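Once the matching line is found, the two words that follow can be pulled out with sscanf. The sketch below folds the match and the extraction into one helper. The name lookup, the 64-byte word limit, and the assumption that every database line has at least three whitespace-separated words are all illustrative choices, not something stated in the question.

```c
#include <stdio.h>
#include <string.h>

/* Searches fp for a line whose first word equals key. On a match, the next
   two words are left in second and third (each at least 64 bytes) and 1 is
   returned. Returns 0 if no line matches. The output buffers may be
   overwritten even when nothing matches. (Hypothetical helper name.) */
int lookup(FILE *fp, const char *key, char *second, char *third)
{
    char line[256];
    char first[64];

    rewind(fp);
    while (fgets(line, sizeof line, fp)) {
        /* %63s reads one whitespace-delimited word of at most 63 chars. */
        if (sscanf(line, "%63s %63s %63s", first, second, third) == 3 &&
            strcmp(first, key) == 0)
            return 1;
    }
    return 0;
}
```

Because strcmp compares whole words, "hello" does not match a line that starts with "helloworld", which is the same behaviour the program above checks for by hand.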
Your recent update states that the file is really a database, in which you are looking for a word. This is very important.
If you have enough memory to hold the whole database, you should do just that (read the whole database and arrange it for efficient searching), so you should probably not ask about searching in a file.
Good database designs involve data structures like trie and hash table. But for a start, you could use the most basic improvement of the database - holding the words in alphabetical order (use the somewhat tricky qsort function to achieve that).
struct Database
{
size_t count;
struct Entry // not sure about C syntax here; I usually code in C++; sorry
{
char *word;
char *explanation;
} *entries;
};
char *find_explanation_of_word(struct Database* db, char *word)
{
for (size_t i = 0; i < db->count; i++)
{
int result = strcmp(db->entries[i].word, word);
if (result == 0)
return db->entries[i].explanation;
else if (result > 0)
break; // if the database is sorted, this means word is not found
}
return NULL; // not found
}
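Since qsort is described above as somewhat tricky, here is a small sketch of sorting the entries and then using the standard bsearch to look a word up in logarithmic time instead of scanning linearly. The struct is redeclared so the example stands alone; sort_entries and find_explanation are made-up names.

```c
#include <stdlib.h>
#include <string.h>

struct Entry {
    const char *word;
    const char *explanation;
};

/* Comparison callback shared by qsort and bsearch: orders entries by word. */
static int compare_entries(const void *a, const void *b)
{
    const struct Entry *ea = a;
    const struct Entry *eb = b;
    return strcmp(ea->word, eb->word);
}

/* Sorts the entries alphabetically by word. */
void sort_entries(struct Entry *entries, size_t count)
{
    qsort(entries, count, sizeof entries[0], compare_entries);
}

/* Binary search over entries that were sorted with sort_entries.
   Returns the explanation, or NULL if the word is not present. */
const char *find_explanation(const struct Entry *entries, size_t count,
                             const char *word)
{
    struct Entry key = { word, NULL };
    const struct Entry *hit =
        bsearch(&key, entries, count, sizeof entries[0], compare_entries);
    return hit ? hit->explanation : NULL;
}
```

The tricky part of qsort is mostly the callback: it receives pointers to the array elements as const void *, so they have to be converted back to the element type before comparing.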
If your database is too big to hold in memory, you should use a trie that holds just the beginnings of the words in the database; for each beginning of a word, have a file offset at which to start scanning the file.
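char* find_explanation_in_file(FILE *f, long offset, char *word)
{
    // static, so the returned pointer stays valid after the function returns;
    // 100 should be greater than max line in file
    static char line[100];
    if (fseek(f, offset, SEEK_SET) != 0)
        return NULL;
    while (fgets(line, sizeof(line), f))
    {
        char *word_in_file = strtok(line, " ");
        if (word_in_file == NULL)
            continue; // nothing on this line
        char *explanation = strtok(NULL, "");
        int result = strcmp(word_in_file, word);
        if (result == 0)
            return explanation;
        else if (result > 0)
            break; // the file is sorted, so the word cannot appear later
    }
    return NULL; // not found
}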
I think what you need is fseek().
1) Pre-process the database file as follows. Find the positions of all the '\n' characters (newlines), and store the offset that follows each one in an array, say a, so that you know that the ith line starts at the a[i]th character from the beginning of the file.
2) fseek() is a library function in stdio.h, and works as given here. So, when you need to process an input string, just start from the start of the file, and check the first word, only at the stored positions in the array a. To do that:
fseek(inFile , a[i] , SEEK_SET);
and then
fscanf(inFile, "%s %s %s", yourFirstWordHere, secondWord, thirdWord);
for checking the ith line.
Or, if the position indicator is still at a[i-1] (that is, nothing has been read since the previous seek), you could use a relative seek:
fseek(inFile, a[i] - a[i-1], SEEK_CUR);
Explanation: What fseek() does is, it sets the read/write position indicator associated with the file at the desired position. So, if you know at which point you need to read or write, you can just go there and read directly or write directly. This way, you won't need to read whole lines just to get first three words.
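The pre-processing step from 1) can be sketched as follows. The function name index_lines and the fixed-capacity offsets array are assumptions made for illustration. Offsets are taken from ftell so that they are valid arguments for a later fseek with SEEK_SET.

```c
#include <stdio.h>

/* Stores the starting offset of each line of fp in offsets[] (at most max
   entries) and returns how many were stored. (Hypothetical helper name.) */
size_t index_lines(FILE *fp, long *offsets, size_t max)
{
    size_t n = 0;
    int at_line_start = 1;
    int c;

    rewind(fp);
    for (;;) {
        long pos = ftell(fp);       /* offset of the character about to be read */
        c = fgetc(fp);
        if (c == EOF)
            break;
        if (at_line_start) {
            if (n == max)
                break;              /* offsets[] is full */
            offsets[n++] = pos;
        }
        at_line_start = (c == '\n');
    }
    return n;
}
```

After indexing, fseek(inFile, offsets[i], SEEK_SET) followed by fscanf reads the words of line i directly. The index still costs one full pass over the file, so it pays off only when many lookups follow.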