I've come across this example of getword.
I understand all the checks, but I have a problem with ungetc.
When c satisfies if ((!isalpha(c)) || c == EOF) and does not satisfy while (isalnum(c)), i.e. it is neither a letter nor a digit, ungetc rejects that character.
Let's suppose it is '\n'.
Then execution reaches return word, but that character can't be returned since it was never saved in the array. What happens then?
while (isalnum(c)) {
if (cur >= size) {
size += buf;
word = realloc(word, sizeof(char) * size);
}
word[cur] = c;
cur++;
c = fgetc(fp);
}
if ((!isalpha(c)) || c == EOF) {
ungetc(c, fp);
}
return word;
EDIT
@Mark Byers - thanks, but that c was rejected for a purpose; won't it fail the condition again and again, in an infinite loop?
The terminal condition, just before the line you don't understand, is not good. It should probably be:
int c;
...
if (!isalpha(c) && c != EOF)
ungetc(c, fp);
This means that if the last character read was a real character (not EOF) and wasn't an alphabetic character, push it back for reprocessing by whatever next uses the input stream fp. That is, suppose you read a blank; the blank will terminate the loop and the blank will be pushed back so that the next getc(fp) will read the blank again (as would fscanf() or fread() or any other read operation on the file stream fp). If, instead of blank, you got EOF, then there is no attempt to push back the EOF in my revised code; in the original code, the EOF would be pushed back.
Note that c must be an int rather than a char.
ungetc pushes the character back onto the stream so that the next read will return that character again.
ungetc(c, fp); /* Push the character c onto the stream. */
/* ...etc... */
c = fgetc(fp); /* Reads the same value again. */
This can sometimes be convenient if you are reading characters to find out when the current token is complete, but aren't yet ready to read the next token.
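A minimal sketch of that pattern, assuming a hypothetical helper read_word() that fills a caller-supplied fixed-size buffer (the getword in the question grows its buffer with realloc instead). It collects alphanumeric characters and pushes the first non-matching character back, so whoever reads the stream next still sees it:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: reads one run of alphanumeric characters from fp
 * into buf (at most size - 1 of them) and NUL-terminates the result.
 * Returns the number of characters stored. */
size_t read_word(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int c;                      /* int, so EOF stays distinguishable */

    assert(fp != NULL && buf != NULL && size > 0);

    while ((c = fgetc(fp)) != EOF && isalnum(c)) {
        if (n + 1 < size)
            buf[n++] = (char)c; /* characters that do not fit are dropped */
    }
    if (c != EOF)
        ungetc(c, fp);          /* leave the delimiter for the next reader */

    buf[n] = '\0';
    return n;
}
```

After read_word() returns, the next fgetc(fp) yields the delimiter (for example '\n'), which is why the caller has to consume or skip it before asking for the next word.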
OK. Now I understand why that case with e.g. '\n' was troubling me. I'm just dumb and forgot about the section in main() referring to getword. Of course, before calling getword there are a couple of tests (another ungetc there), and it fputs the characters not satisfying isalnum.
It follows that the while loop in getword always starts with at least one character that passes isalnum, and the check at the end is just for the characters that follow.
Related
I have a program that reads a file into a buffer structure. The problem I'm having is that when I look at the output of the file, there's an extra EOF character at the end. I'll post the related functions: (NOTE: I removed parameter checks and only posted code in the function related to the issue)
b_load
int b_load(FILE * const fi, Buffer * const pBD){
unsigned char character; /*Variable to hold read character from file*/
Buffer * tempBuffer; /*Temparary Bufer * to prevent descruction of main Buffer*/
short num_chars = 0; /*Counter of the amount of characters read into the buffer*/
/*Assigns main Buffer to tempBuffer*/
tempBuffer = pBD;
/*Infinite loop that breaks after EOF is read*/
while(1){
/*calls fgetc() and returns the char into the character variable*/
character = (unsigned char)fgetc(fi);
if(!feof(fi)){
tempBuffer = b_addc(pBD,character);
if(tempBuffer == NULL)
return LOAD_FAIL;
++num_chars;
}else{
break;
}
}
return num_chars;
}
b_print
int b_print(Buffer * const pBD){
int num_chars = 0;
if(pBD->addc_offset == 0)
printf("The buffer is empty\n");
/*Sets getc_offset to 0*/
b_set_getc_offset(pBD, 0);
pBD->eob=0;
/*b_eob returns the structures eob field*/
while (!b_eob(pBD)){
printf("%c",b_getc(pBD));
++num_chars;
}
printf("\n");
return num_chars;
}
b_getc
char b_getc(Buffer * const pBD){
if(pBD->getc_offset == pBD->addc_offset){
pBD->eob = 1;
return R_FAIL_1;
}
pBD->eob = 0;
return pBD->ca_head[(pBD->getc_offset)++];
}
at the end I end up with:
"a catÿ"
(the ÿ is the EOF character)
It prints an EOF character but is never added to the buffer. When the driver code adds an EOF character to the end of the buffer, 2 appear. Any idea what is causing this? I might be using feof() wrong so that may be it, but it is required in the code
There is no "EOF character". EOF is a value returned by getchar() and related functions to indicate that they have no more input to read. It's a macro that expands to a negative integer constant expression, typically (-1).
(For Windows text files, an end-of-file condition may be triggered by a Control-Z character in a file. If you read such a file in text mode, you won't see that character; it will just act like it reached the end of the file at that point.)
Don't use the feof() function to detect that there's no more input to read. Instead, look at the value returned by whatever input function you're using. Different input functions use different ways to indicate that they weren't able to read anything; read the documentation for whichever one you're using. For example, fgets() returns a null pointer, getchar() returns EOF, and scanf() returns the number of items it was able to read.
getchar(), for example, returns either the character it just read (treated as an unsigned char and converted to int) or the value EOF to indicate that it wasn't able to read anything. The negative value of EOF is chosen specifically to avoid colliding with any valid value of type unsigned char. Which means you need to store the value returned by getchar() in an int object; if you store it in a char or unsigned char instead, you can lose information, and an actual character with the value 0xff can be mistaken for EOF.
The feof() function returns the value of the end-of-file indicator for the file you're reading from. That indicator becomes true after you've tried and failed to read from the file. And if you ran out of input because of an error, rather than because of an end-of-file condition, feof() will never become true.
You can use feof() and/or ferror() to determine why there was no more input to be read, but only after you've detected it by other means.
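As a rough sketch of that order of operations (the enum and the function name count_chars are made up for illustration): the loop is driven by the value fgetc() returns, and ferror() is consulted only afterwards to find out why the input stopped.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative names: why did the input stop? */
enum read_status { READ_EOF, READ_ERROR };

/* Counts the remaining characters in fp. The loop tests fgetc()'s
 * return value; ferror() is checked only after fgetc() has already
 * reported that nothing more could be read. */
enum read_status count_chars(FILE *fp, long *count)
{
    int c;                      /* int, NOT char or unsigned char */

    assert(fp != NULL && count != NULL);

    *count = 0;
    while ((c = fgetc(fp)) != EOF)
        ++*count;

    return ferror(fp) ? READ_ERROR : READ_EOF;
}
```

Note that feof(fp) is still false right after the last character has been read successfully; it only becomes true once a read has been attempted and failed.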
Recommended reading: Section 12 of the comp.lang.c FAQ, which covers stdio. (And the rest of it.)
UPDATE :
I haven't seen enough of your code to understand what you're doing with the Buffer objects. Your input loop actually looks (almost) correct, though it's written in a clumsy way.
The usual idiom for reading characters from a file is:
int c; /* `int`, NOT `char` or `unsigned char` */
while ((c = fgetc(fi)) != EOF) {
/* process character in `c` */
}
But your approach, which I might rearrange like this:
while (1) {
c = fgetc(fi);
if (feof(fi) || ferror(fi)) {
/* no more input */
break;
}
/* process character in c */
}
should actually work. Note that I've added a check for ferror(fi). Could it be that you have an error on input (which you're not detecting)? That would cause c to contain EOF, or the value of EOF converted to the type of c. That's doubtful, though, since it would probably give you an infinite loop.
Suggested approach: Using either an interactive debugger or added printf calls, show the value of character every time through the loop. If your input loop is working correctly, then build a stripped-down version of your program with a hard-wired sequence of calls to b_addc(), and see if you can reproduce the problem that way.
There you go ...
int b_load(FILE * const fi, Buffer * const pBD){
int character; /*Variable to hold read character from file; int so that EOF is distinguishable*/
Buffer * tempBuffer; /*Temporary Buffer * to prevent destruction of main Buffer*/
short num_chars; /*Counter of the amount of characters read into the buffer*/
/*Infinite loop that breaks WHEN EOF is read*/
for(num_chars = 0; 1; num_chars++ ) {
character = fgetc(fi);
if (character == EOF || feof(fi)) break; // since you insist on the silly feof() ...
tempBuffer = b_addc(pBD, (unsigned char) character);
if(tempBuffer == NULL) return LOAD_FAIL;
}
return num_chars;
}
I was trying an exercise from K&R (ex 1-17), and I came up with my own solution.
The problem is that my program appears to hang, perhaps in an infinite loop. I omitted the NUL ('\0') character insertion as I find C generally automatically attaches it to the end of a string (Doesn't it?).
Can somebody please help me find out what's wrong?
I'm using the GCC compiler with Cygwin on win8(x64), if that helps..
Question - Print all input lines that are longer than 80 characters
#include<stdio.h>
#define MINLEN 80
#define MAXLEN 1000
/* getlin : inputs the string and returns its length */
int getlin(char line[])
{
int c,index;
for(index = 0 ; (c != '\n') && ((c = getchar()) != EOF) && (index < MAXLEN) ; index++)
line[index] = c;
return (index); // Returns length of the input string
}
main()
{
int len;
char chArr[MAXLEN];
while((len = getlin(chArr))>0)
{
/* A printf here,(which I had originally inserted for debugging purposes) Miraculously solves the problem!!*/
if(len>=MINLEN)
printf("\n%s",chArr);
}
return 0;
}
And I omitted the null('\0') character insertion as I find C generally automatically attaches it to the end of a string (Doesn't it?).
No, it doesn't. You're using getchar() to read input characters one at a time. If you put the chars in an array yourself, you'll have to terminate it yourself.
The C functions that return a string will generally terminate it, but that's not what you're doing here.
Your input loop is a little weird. The logical AND operator evaluates its operands left to right and only evaluates the right-hand side if the left-hand side is true (it's called "short-circuiting"). In your loop, c != '\n' is tested first, before c has been assigned anything. Rearranging the order of the tests in the loop should help.
for(index = 0 ; (index < MAXLEN) && ((c = getchar()) != EOF) && (c != '\n'); index++)
line[index] = c;
This way, c receives a value from getchar() before you perform tests on its contents.
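Putting the reordered tests and the missing terminator together, one possible sketch of getlin looks like this. Taking the stream and the buffer size as parameters is an assumption made here so the function is easy to test; the original reads from stdin and uses MAXLEN. It also returns -1 at end of input, so that an empty line (length 0) does not end the caller's loop:

```c
#include <assert.h>
#include <stdio.h>

/* Sketch of getlin: reads at most maxlen - 1 characters of one line
 * from in, stores them in line without the newline, and NUL-terminates
 * the result. Returns the length, or -1 if end of input was reached
 * before any character was read. */
int getlin(FILE *in, char line[], int maxlen)
{
    int c = 0;                  /* initialised, and int so EOF fits */
    int index = 0;

    assert(in != NULL && line != NULL && maxlen > 0);

    /* Bounds check first, then read, then test what was read. */
    while (index < maxlen - 1 && (c = fgetc(in)) != EOF && c != '\n')
        line[index++] = (char)c;

    line[index] = '\0';         /* C does not do this for you */

    if (c == EOF && index == 0)
        return -1;
    return index;
}
```

A caller would then loop with while ((len = getlin(stdin, chArr, MAXLEN)) >= 0) and print the lines whose length exceeds the threshold.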
I'm not positive about what's wrong, but you don't provide the input to the program so I'm guessing.
My guess is that in getlin your variable c gets set to '\n' and at that point it never gets another character. It just keeps returning and looping.
You never SET c to anything inside your getlin function before you test it, is the problem.
C does not insert a NUL terminator at the end of strings automatically. Some functions might do so (e.g. snprintf). Consult your documentation. Additionally, take care to initialize all your variables, like c in getlin().
1) C doesn't add a final \0 to your string. You are responsible for using an array of at least 81 chars and putting the final \0 after the last character you write in it.
2) You're testing the value of c before reading it
3) Your program may appear to print nothing because printf writes to stdout through a buffer, which is usually flushed when you send \n if the output is a terminal. Modify this statement to print a final \n:
printf("\n%s",chArr);
to become:
printf("%s\n",chArr);
4) To send an EOF to your program you should do a Ctrl+D under unix and I don't know if it's possible for windows. This may be the reason why the program never ends.
I am reading from a text file, iterating with a while(!feof) loop,
but whenever I use this condition the loop iterates an extra time.
I solved the problem with this 'patchy' code
while (stop == FALSE)
{
...
terminator = fgetc(input);
if (terminator == EOF)
stop = TRUE;
else
fseek(input, -1, SEEK_CUR);
}
But it looks and feels very bad.
You can take advantage of the fact that an assignment gets evaluated as the value being assigned, in this case to the character being read:
while((terminator = fgetc(input))!= EOF) {
// ...
}
Here is an idiomatic example (source):
fp = fopen("datafile.txt", "r"); // error check this!
// this while-statement assigns into c, and then checks against EOF:
while((c = fgetc(fp)) != EOF) {
/* ... */
}
fclose(fp);
Similarly, you can read line-by-line:
char buf[MAXLINE];
// ...
while((fgets(buf,MAXLINE,stdin)) != NULL) {
do_something(buf);
}
Since fgets copies the detected newline character, you can detect end of line by checking the last character before the terminating null, i.e. buf[strlen(buf) - 1]. You can use realloc to resize the buffer (be sure you keep a pointer to the beginning of the buffer, but pass buf+n to the next fgets, where n is the number of characters read so far). From the documentation of fgets:
Reads characters from stream and stores them as a C string into str until (num-1) characters have been read or either a newline or the end-of-file is reached, whichever happens first. A newline character makes fgets stop reading, but it is considered a valid character by the function and included in the string copied to str.
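The newline check described above could be sketched like this. The name read_line and its return convention are invented for the example:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one chunk with fgets() and strips a trailing newline if present.
 * Returns  1 if a complete line was read,
 *          0 if the chunk ended without a newline (the line was longer
 *            than the buffer, or the input ended without a newline),
 *         -1 at end of input or on a read error. */
int read_line(FILE *fp, char *buf, size_t size)
{
    size_t len;

    assert(fp != NULL && buf != NULL && size > 1);

    if (fgets(buf, (int)size, fp) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';    /* complete line: drop the newline */
        return 1;
    }
    return 0;                   /* partial line: caller may read on */
}
```

A return value of 0 is the cue to grow the buffer (or just call again) and keep reading the rest of the same line.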
Alternatively, you could read the whole file in one go using fread() (see example following the link).
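For the fread() route, here is a rough sketch that does not rely on knowing the file size in advance. The function name read_all and the starting capacity are arbitrary choices for the example, not anything from the linked page:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads the rest of fp into a malloc'd, NUL-terminated buffer.
 * Stores the byte count in *len and returns the buffer (the caller
 * frees it), or returns NULL on allocation failure or read error. */
char *read_all(FILE *fp, size_t *len)
{
    size_t cap = 4096, used = 0, got;
    char *data, *tmp;

    assert(fp != NULL && len != NULL);

    data = malloc(cap);
    if (data == NULL)
        return NULL;

    /* Always keep one byte free for the terminator. */
    while ((got = fread(data + used, 1, cap - used - 1, fp)) > 0) {
        used += got;
        if (used == cap - 1) {          /* buffer full: double it */
            tmp = realloc(data, cap * 2);
            if (tmp == NULL) {
                free(data);
                return NULL;
            }
            data = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {                   /* stopped by an error, not EOF */
        free(data);
        return NULL;
    }
    data[used] = '\0';
    *len = used;
    return data;
}
```

Here too the loop is controlled by fread()'s return value, with ferror() checked only once the reads have stopped.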
I'm reading a book about c programming and don't understand a shown example. Or more precisely I don't understand why it works because I would think it shouldn't.
The code is simple, it reads the content of a text file and outputs it in output area. As far as I understand it, I would think that the
ch = fgetc(stream);
ought to be inside the while loop, because it only reads one int at a time and needs to read the next int after the current one has been output. Well, it turns out that this code indeed works fine, so I hope someone can explain my mistake to me. Thanks!
#include <stdio.h>
int main(int argc, char *argv[]) {
FILE *stream;
char filename[67];
int ch;
printf("Please enter the filename?\n");
gets(filename);
if((stream = fopen(filename, "r")) == NULL) {
printf("Error opening the file\n");
exit(1);
}
ch = fgetc(stream);
while (!feof(stream)) {
putchar(ch);
ch = fgetc(stream);
}
fclose(stream);
}
I think you are confused because of feof():
Doc: int feof ( FILE * stream );
Checks whether the end-of-File indicator associated with stream is
set, returning a value different from zero if it is.
This indicator is generally set by a previous operation on the stream
that attempted to read at or past the end-of-file.
ch = fgetc(stream); <---"Read current symbol from file"
while (!feof(stream)) { <---"Check whether the last fgetc() call hit EOF"
putchar(ch); <---"Output last read symbol, that was not EOF"
ch = fgetc(stream); <---"Read next symbol from file"
}
<-- control reaches here when EOF is found
A much better way is to write your loop like:
while((ch = fgetc(stream))!= EOF){ <--" Read while EOF not found"
putchar(ch); <-- "inside loop print a symbol that is not EOF"
}
Additionally, Note: int fgetc ( FILE * stream );
Returns the character currently pointed by the internal file position
indicator of the specified stream. The internal file position
indicator is then advanced to the next character.
If the stream is at the end-of-file when called, the function returns
EOF and sets the end-of-file indicator for the stream (feof).
If a read error occurs, the function returns EOF and sets the error
indicator for the stream (ferror).
If the fgetc outside while is removed, like this:
while (!feof(stream)) {
putchar(ch);
ch = fgetc(stream);
}
ch will be un-initialized the first time putchar(ch) is called.
By the way, don't use gets, because it may cause buffer overflow. Use fgets or gets_s instead. gets is removed in C11.
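A small sketch of the fgets replacement for that gets(filename) call. The helper name read_name is made up, and the stream is passed in as a parameter only to keep the example testable:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one line from in into name (at most size - 1 characters) and
 * removes the trailing newline that fgets() keeps.
 * Returns 1 on success, 0 at end of input or on a read error. */
int read_name(FILE *in, char *name, size_t size)
{
    assert(in != NULL && name != NULL && size > 0);

    if (fgets(name, (int)size, in) == NULL)
        return 0;

    /* strcspn gives the index of the first '\n', or of the
     * terminator if there is none, so this is always in bounds. */
    name[strcspn(name, "\n")] = '\0';
    return 1;
}
```

In the program above this would be called as read_name(stdin, filename, sizeof filename), which cannot write past the 67 bytes of filename.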
The code you have provided has 'ch =fgetc(stream);' before the While loop and also
'ch = fgetc(stream);' within the body of the loop.
It would be logical that the statement within the loop is retrieving the ch from the stream one at a time as you correctly state.
It is inside and outside as you see. The one outside is responsible for reading the first character (which may be already the end of file, then the while wouldn't be entered anyway and nothing is printed), then it enters the loop, puts the character and reads the next one.. as long as the read character is not the end of file, the loop continues.
This works because of the second fgetc, which is called on every iteration until while (!feof(stream)) fails.
fgetc() reads one char (byte) and returns it; which byte it reads depends on the current position of the file's read pointer.
Once fgetc() has successfully read one byte, the read pointer moves to the next byte, so the next read returns the next byte, and this continues until the end of the file, where fgetc() returns EOF.
Actually this part here:
while (!feof(stream)) {
putchar(ch);
ch = fgetc(stream);
}
is pretty unsafe and you should avoid checking EOF like that (here is why).
The way you should read a file using fgetc is like so:
int ch;
while ((ch = fgetc(stream)) != EOF)
{
printf("%c", ch);
}
This code does work, but only just. fgetc reads the last character and leaves the position at the end of the file, yet the end-of-file indicator is not set until a read actually fails. So when the while condition is checked, !feof is still true and the last character is output; the next fgetc returns EOF, sets the indicator, and the loop ends.
feof does not prevent reading past the end of the file: for empty files fgetc will be called before feof!
Unless there is some benefit in console handling, two simpler options exist:
Using feof (the test has to come right after the read, otherwise the EOF value itself gets printed):
while (1) {
ch=fgetc(stream);
if (feof(stream) || ferror(stream)) break;
putchar(ch);
}
Without using feof - because fgetc returns EOF when there are no more characters:
while ((ch=fgetc(stream))!=EOF) putchar(ch);
I can't figure out why my while loop won't work. The code works fine without it... The purpose of the code is to find a secret message in a bin file. So I got the code to find the letters, but now when I try to get it to loop until the end of the file, it doesn't work. I'm new at this. What am I doing wrong?
main(){
FILE* message;
int i, start;
long int size;
char keep[1];
message = fopen("c:\\myFiles\\Message.dat", "rb");
if(message == NULL){
printf("There was a problem reading the file. \n");
exit(-1);
}
//the first 4 bytes contain an int that tells how many subsequent bytes you can throw away
fread(&start, sizeof(int), 1, message);
printf("%i \n", start); //#of first 4 bytes was 280
fseek(message, start, SEEK_CUR); //skip 280 bytes
keep[0] = fgetc(message); //get next character, keep it
printf("%c", keep[0]); //print character
while( (keep[0] = getc(message)) != EOF) {
fread(&start, sizeof(int), 1, message);
fseek(message, start, SEEK_CUR);
keep[0] = fgetc(message);
printf("%c", keep[0]);
}
fclose(message);
system("pause");
}
EDIT:
After looking at my code in the debugger, it looks like having "getc" in the while loop threw everything off. I fixed it by creating a new char called letter, and then replacing my code with this:
fread(&start, sizeof(int), 1, message);
fseek(message, start, SEEK_CUR);
while( (letter = getc(message)) != EOF) {
printf("%c", letter);
fread(&start, sizeof(int), 1, message);
fseek(message, start, SEEK_CUR);
}
It works like a charm now. Any more suggestions are certainly welcome. Thanks everyone.
The return value from getc() and its relatives is an int, not a char.
If you assign the result of getc() to a char, one of two things happens when it returns EOF:
If plain char is unsigned, then EOF is converted to 0xFF, and 0xFF != EOF, so the loop never terminates.
If plain char is signed, then EOF is equivalent to a valid character (in the 8859-1 code set, that's ÿ, y-umlaut, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS), and your loop may terminate early.
Given the problem you face, we can tentatively guess you have plain char as an unsigned type.
The reason that getc() et al return an int is that they have to return every possible value that can fit in a char and also a distinct value, EOF. In the C standard, it says:
ISO/IEC 9899:2011 §7.21.7.1 The fgetc() function
int fgetc(FILE *stream);
If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int ...
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF.
Similar wording applies to the getc() function and the getchar() function: they are defined to behave like the fgetc() function except that if getc() is implemented as a macro, it may take liberties with the file stream argument that are not normally granted to standard macros — specifically, the stream argument expression may be evaluated more than once, so calling getc() with side-effects (getc(fp++)) is very silly (but change to fgetc() and it would be safe, but still eccentric).
In your loop, you could use:
int c;
while ((c = getc(message)) != EOF) {
keep[0] = c;
This preserves the assignment to keep[0]; I'm not sure you truly need it.
You should be checking the other calls to fgets(), getc(), fread() to make sure you are getting what you expect as input. Especially on input, you cannot really afford to skip those checks. Sooner, rather than later, something will go wrong and if you aren't religiously checking the return statuses, your code is likely to crash, or simply 'go wrong'.
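To illustrate what checking those calls might look like, here is a sketch of the decoding loop with every return value tested. It assumes the layout implied by the question's final code (an int skip count, that many bytes to discard, then one message character, repeated); the function name decode_message and the output-buffer interface are invented for the example:

```c
#include <assert.h>
#include <stdio.h>

/* Decodes the hidden message from in into out (NUL-terminated).
 * Returns the message length, or -1 on a read/seek error, a negative
 * skip count, or if out is too small. */
int decode_message(FILE *in, char *out, size_t out_size)
{
    size_t n = 0;
    int skip;
    int c;                              /* int, so EOF is detectable */

    assert(in != NULL && out != NULL && out_size > 0);

    /* fread returns the number of items read; anything other than 1
     * means no complete skip count was available. */
    while (fread(&skip, sizeof skip, 1, in) == 1) {
        if (skip < 0 || fseek(in, skip, SEEK_CUR) != 0)
            return -1;
        c = getc(in);
        if (c == EOF)
            break;                      /* no character after the skip */
        if (n + 1 >= out_size)
            return -1;                  /* message does not fit */
        out[n++] = (char)c;
    }
    if (ferror(in))
        return -1;

    out[n] = '\0';
    return (int)n;
}
```

The file has to be opened in binary mode ("rb"), as in the question, for the fread and fseek arithmetic to line up.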
There are 256 different char values that might be returned by getc() and stored in a char variable like keep[0] (yes, I'm oversummarising wildly). To detect end-of-file reliably, EOF has to have a value different from all of them. That's why getc() returns int rather than char: because a 257th distinct value for EOF wouldn't fit into a char.
Thus you need to store the value returned by getc() in an int at least until you check it against EOF:
int tmpc;
while( (tmpc = getc(message)) != EOF) {
keep[0] = tmpc;
...
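The difference can be seen with a file that contains a 0xFF byte. This is only a sketch with made-up function names; the second function uses signed char explicitly to mimic a platform where plain char is signed, and it assumes the usual case where EOF is -1 and 0xFF narrows to -1:

```c
#include <assert.h>
#include <stdio.h>

/* Counts bytes up to end of input, keeping getc()'s result in an int. */
long count_with_int(FILE *fp)
{
    long n = 0;
    int c;

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}

/* Same loop, but the result is narrowed to signed char before the
 * comparison. On typical implementations a 0xFF byte then compares
 * equal to EOF and the loop stops early. */
long count_with_signed_char(FILE *fp)
{
    long n = 0;
    signed char c;

    assert(fp != NULL);
    while ((c = (signed char)getc(fp)) != EOF)
        n++;
    return n;
}
```

For a file holding the three bytes 'a', 0xFF, 'b', the int version counts all three, while the narrowed version typically stops at the 0xFF byte.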