In K&R, at the beginning of chapter 5, the function getint is presented. It
performs free-format input conversion by breaking a stream of characters into integer values, one integer per call.
The function is pretty simple, but I can't understand why c is pushed back into the buffer in the first if-statement. Because of this, every call to getint will do the same thing, since it will read the next character in the buffer, which is c.
Is this strategy intended to be a kind of security mechanism?
int getint(int *pn) {
int c, sign;
while(isspace(c = getch()))
;
if(!isdigit(c) && c != EOF && c != '+' && c != '-') {
ungetch(c);
return 0;
}
sign = (c == '-') ? -1 : 1;
if(c == '+' || c == '-')
c = getch();
for(*pn = 0; isdigit(c); (c = getch()))
*pn = 10 * *pn + (c - '0');
*pn *= sign;
if(c != EOF)
ungetch(c);
return c;
}
The code you ask about causes getint to stop if the next character in the stream cannot be part of a numeral, that is, if it is not a digit or a sign character (leading white space has already been skipped).
The notion is that as long as you call getint while there are acceptable numerals in the input, it will convert those numerals to int values and return them. When you call getint while there is not an acceptable numeral next in the input, it will not perform a conversion. Since it is not performing a conversion, it leaves the character it is not using in the stream.
A proper routine would return an error indication so the caller can easily determine that getint is not performing a conversion for this reason. As it is, the caller cannot distinguish between getint returning 0 for a “0” in the input and getint returning 0 for a non-numeral in the input. However, as this is only tutorial code, features like this are omitted.
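As a rough illustration of the "leave the unused character in place" idea, here is a minimal sketch that parses from a string cursor instead of from getch(). The name read_int, the cursor parameter and the three status values are assumptions made for this example and are not from K&R. Unlike getint, it also leaves a lone sign in place, and it does no overflow checking.

```c
#include <ctype.h>

/* Hypothetical helper, not from K&R: parse one integer from a string cursor.
 * Returns 1 if a number was converted and stored in *out, 0 if the next
 * non-space character cannot start a number (the cursor is left pointing
 * at it), and -1 at the end of the string. */
int read_int(const char **cursor, int *out)
{
    const char *p = *cursor;
    int sign = 1;
    int value = 0;

    while (isspace((unsigned char)*p))      /* skip leading white space */
        p++;
    if (*p == '\0') {
        *cursor = p;
        return -1;                          /* nothing left to read */
    }
    if (*p == '+' || *p == '-') {
        if (!isdigit((unsigned char)p[1])) {
            *cursor = p;                    /* lone sign: not a number */
            return 0;
        }
        sign = (*p == '-') ? -1 : 1;
        p++;
    } else if (!isdigit((unsigned char)*p)) {
        *cursor = p;                        /* leave the unused character */
        return 0;
    }
    while (isdigit((unsigned char)*p))
        value = 10 * value + (*p++ - '0');
    *out = sign * value;
    *cursor = p;                            /* next call resumes here */
    return 1;
}
```

Because the cursor is not advanced past a non-numeral, calling read_int again returns 0 again until the caller consumes that character, which is the same behavior the question observed with getint.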
I'm aware that code snippets from textbooks are just for demonstration purposes and shouldn't be held to production standards but K&R 2nd edition on page 164-165 says:
fgets and fputs, here they are, copied from the standard library
on our system:
char *fgets(char *s, int n, FILE *iop)
{
register int c;
register char *cs;
cs = s;
while (--n > 0 && (c = getc(iop)) != EOF)
if ((*cs++ = c) == '\n')
break;
*cs = '\0';
return (c == EOF && cs == s) ? NULL : s;
}
Why is the return statement not return (ferror(iop) || (c == EOF && cs == s)) ? NULL : s; since:
ANSI C89 standard says:
If a read error occurs during the operation, the array contents are
indeterminate and a null pointer is returned.
Even the standard library illustrated in Appendix B of the book says so. From Page 247:
fgets returns s, or NULL if end of file or error occurs.
K&R uses ferror(iop) in fputs implementation given just below this fgets implementation on the same page.
With the above implementation, fgets will return s even if there is a read error after reading some characters. Maybe this is an oversight or am I missing something?
You are correct that the behavior of the posted implementation of the function fgets does not comply with the C89 standard. For the same reasons, it also does not comply with the modern C11/C18 standard.
The posted implementation of fgets handles end-of-file correctly, by only returning NULL if not a single character has been read. However, it does not handle a stream error correctly. According to your quote of the C89 standard (which is identical to the C11/C18 standard in this respect), the function should always return NULL if an error occurred, regardless of the number of characters read. Therefore, the posted implementation of fgets is wrong to handle an end-of-file in exactly the same way as a stream error.
It is worth noting that the second edition of K&R is from 1988, which is from before the ANSI C89 standard was published. Therefore, the exact wording of the standard may not have been finalized yet, when the second edition of the book was written.
The posted implementation of fgets also does not comply with the quoted specification of Appendix B. Assuming that the function fgets is supposed to behave as specified in Appendix B, then the posted implementation of fgets handles errors correctly, but it does not handle end-of-file correctly. According to the quote from Appendix B, the function should always return NULL when an end-of-file occurs (even if characters have been successfully read, which is not meaningful).
It is also worth noting that using the statement
return (ferror(iop) || (c == EOF && cs == s)) ? NULL : s;
as suggested in the question will not make the implementation of the function fgets fully comply with the C89/C11/C18 standards. When a stream error occurs "during the operation", the function is supposed to return NULL. However, when ferror returns nonzero, it may be impossible to tell whether the error occurred "during the operation" or whether the stream's error indicator was already set before the function fgets was called. It is possible that the stream's error indicator was already set due to an error that occurred before fgets was called, but that all subsequent stream operations succeeded or failed due to end-of-file (i.e. not due to stream error). The function fgets is also not allowed to simply call clearerr at the start of the function in order to distinguish these cases, because it would then have to restore the state of the stream's error indicator before returning. Setting the stream's error indicator is not possible in the C standard library; it would require an implementation-specific function. Looking at the return value of getc will not always resolve this ambiguity, because a return value of EOF can mean either end-of-file or error.
Why is the return statement not return (ferror(iop) || (c == EOF && cs == s)) ? NULL : s;
Because this is not the best option either.
return (c == EOF && cs == s) ? NULL : s; handles the case of EOF due to end-of-file well, but not EOF due to input error.
ferror(iop) is true when an input error has just occurred or when an input error occurred earlier. It is possible for fgetc() to return a non-EOF value after returning EOF due to an input error. This differs from end-of-file, which is sticky: once EOF occurs due to end-of-file, feof() continues to return nonzero unless the end-of-file flag is cleared.
A better alternative would be to ensure an EOF just occurred rather than use an unqualified ferror().
if (c == EOF) {
if (feof(iop)) {
if (cs == s) return NULL;
} else {
return NULL; // Input error just occurred.
}
}
return s;
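For context, the fragment above can be dropped into the K&R loop roughly as follows. This is only a sketch: the name fgets_sketch is made up for the example, it assumes n >= 1, and it does not address the other pedantic cases discussed in this answer.

```c
#include <stdio.h>

/* Sketch only: K&R's fgets loop with the return logic shown above.
 * fgets_sketch is a hypothetical name. Assumes n >= 1. */
char *fgets_sketch(char *s, int n, FILE *iop)
{
    int c = 0;              /* initialized so c == EOF is defined when n == 1 */
    char *cs = s;

    while (--n > 0 && (c = getc(iop)) != EOF)
        if ((*cs++ = (char)c) == '\n')
            break;
    *cs = '\0';
    if (c == EOF) {
        if (!feof(iop))
            return NULL;    /* getc just failed with an input error */
        if (cs == s)
            return NULL;    /* end-of-file before any character was read */
    }
    return s;
}
```

A final line without a trailing newline is still returned as a string, and only the following call returns NULL.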
Pedantic n
Pathological cases:
The below suffers when n <= 0 as *cs = '\0' writes into cs[] outside its legal range. --n is a problem when n == INT_MIN.
while (--n > 0 && (c = getc(iop)) != EOF) // Is --n OK?
if ((*cs++ = c) == '\n')
break;
*cs = '\0'; // Is n big enough
// Alternative
while (n > 1 && (c = getc(iop)) != EOF) {
n--;
*cs++ = c;
if (c == '\n') {
break;
}
}
if (n > 0) *cs = '\0';
Pedantic CHAR_MAX >= INT_MAX
To note: On rare machines (some old graphics processors) where char is as wide as int, a return value equal to EOF may also be a valid character value. Alternative code not presented.
Uninitialized c
Testing c == EOF is undefined behavior when n is small (n <= 1), because the loop may never assign c.
Better to initialize c: register int c = 0;
More:
What are all the reasons fgetc() might return EOF?.
Is fgets() returning NULL with a short buffer compliant?.
I'm writing code that needs to limit the user to entering only characters from A to H. Anything greater than H should not be accepted.
I saw that with numbers I can use something like:
if (input == 0 - 9) return 1;
But how do I do that for A to H (char)?
The C Standard does not specify that character encoding should be ASCII, though it is likely. Nonetheless, it is possible for the encoding to be other (EBCDIC, for example), and the characters of the Latin alphabet may not be encoded in a contiguous sequence. This would cause problems for solutions that compare char values directly.
One solution is to create a string that holds valid input characters, and to use strchr() to search for the input in this string in order to validate:
#include <stdio.h>
#include <string.h>
int main(void)
{
char *valid_input = "ABCDEFGH";
char input;
printf("Enter a letter from 'A' - 'H': ");
if (scanf("%c", &input) == 1) {
if (input == '\0' || strchr(valid_input, input) == NULL) {
printf("Input '%c' is invalid\n", input);
} else {
puts("Valid input");
}
}
return 0;
}
This approach is portable, though solutions which compare ASCII values are likely to work in practice. Note that in the original code that I posted, an edge case was missed, as pointed out by @chux. It is possible to enter a '\0' character from the keyboard (or to obtain one by other methods), and since a string contains the '\0' character, this would be accepted as valid input. I have updated the validation code to check for this condition.
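The '\0' edge case can be isolated in a small helper. The following is a sketch; the function name is_valid_choice and its parameters are chosen for this example only.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if input is one of the characters in valid,
 * 0 otherwise. The explicit '\0' test is needed because strchr() treats the
 * terminating null character as part of the string and would find it. */
int is_valid_choice(char input, const char *valid)
{
    return input != '\0' && strchr(valid, input) != NULL;
}
```

With this helper, is_valid_choice('H', "ABCDEFGH") is 1, while both 'I' and '\0' are rejected.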
Yet there is another advantage to using the above solution. Consider the following comparison-style code:
if (input >= 'A' && input <= 'H') {
puts("Valid input");
} else {
puts("Invalid input");
}
Now, suppose that conditions for valid input change, and the program must be modified. It is simpler to modify a validation string, for example to change to:
char *valid_input = "ABCDEFGHIJ";
With the comparison code, which may occur in more than one location, each comparison must be found in the code. But with the validation string, only one line of code needs to be found and modified.
Further, the validation string is simpler for more complex requirements. For example, if valid input is a character in the range 'A' - 'I' or a character in the range '0' - '9', the validation string can simply be changed to:
char *valid_input = "ABCDEFGHI0123456789";
The comparison method begins to look unwieldy:
if ((input >= 'A' && input <= 'I') || (input >= '0' && input <= '9')) {
puts("Valid input");
} else {
puts("Invalid input");
}
Do note that one of the few requirements placed on character encoding by the C Standard is that the characters '0', ..., '9' be encoded in a contiguous sequence. This does allow for portable direct comparison of decimal digit characters, and also for reliably finding the integer value associated with a decimal digit character through subtraction:
char ch = '3';
int num;
if (ch >= '0' && ch <= '9') {
printf("'%c' is a decimal digit\n", ch);
num = ch - '0';
printf("'%c' represents integer value %d\n", ch, num);
}
The if statement you present here is equal to:
if (input == -9) return 1;
which will return 1 in the case of an input equal to -9, so there is no range checking at all.
To allow numbers from 0 to 9 you have to compare like:
if (input >= 0 && input <= 9) /* range valid */
or with the characters that you want (A to H) [1]:
if (input >= 'A' && input <= 'H') /* range valid */
If you want to return 1 if the input is not in a valid range, just put the logical not operator (!) in front of the condition:
if (!(input >= 'A' && input <= 'H')) return 1; /* range invalid */
[1] Take care when writing conditions that use character ranges: the comparison relies on an encoding that places the letters in increasing order without any gaps inside the range (ASCII, for example: A = 65, B = 66, C = 67, ..., Z = 90).
There are encodings where this rule breaks. As the other answer by @DavidBowling states, EBCDIC is one example (A = 193, B = 194, ..., I = 201, J = 209, ..., Z = 233): it has gaps within the range from A to Z. Nevertheless, the condition (input >= 'A' && input <= 'H') will work with both encodings, because A to I are contiguous in EBCDIC.
I have never come across an implementation where the condition fails, and it is very unlikely that you will. Most implementations use ASCII, for which the condition works.
Nevertheless his answer provides a solution that is working in every case.
It's as simple as:
if(input >='A' && input<='H') return 1;
C doesn't let you specify ranges like 0 - 9.
In fact that's an arithmetic expression "zero minus nine" and evaluates to minus nine (of course).
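To make the difference concrete, here is a small sketch with two hypothetical helper functions (the names are invented for this example): one uses the expression from the question, the other an actual range check.

```c
/* What the question's condition actually tests: 0 - 9 is evaluated first,
 * so this compares input against -9. */
int equals_zero_minus_nine(int input)
{
    return input == 0 - 9;
}

/* An actual range check: true for 0, 1, ..., 9. */
int in_zero_to_nine(int input)
{
    return input >= 0 && input <= 9;
}
```

Only -9 satisfies the first function, while the second accepts exactly the values 0 through 9.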
Nerd Corner:
As others point out this is not guaranteed by the C standard because it doesn't specify a character encoding though in practice all modern platforms encode these characters the same as ASCII. So it's very unlikely you will come unstuck and if you're working in an environment where it won't work you'd have been told!
A truly portable implementation could be:
#include <string.h> /* declares strchr() */
const char *alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const char *pos = strchr(alpha, input);
if (pos != NULL && (pos - alpha) < 8) return 1;
This tries to find the character in an alphabet string then determines if the character (if any) pointed to is before 'I'.
This is total over engineering and not the answer you're looking for.
I am going through the "C Programming Book" by K&R.
Now the code for the function "getint()" is as follows:
#include <stdio.h>
#include <ctype.h> /* isspace(), isdigit() */
#define BUFSIZE 100
char buf[BUFSIZE];
int bufp = 0;
int getch(void) {
return (bufp > 0)?buf[--bufp]:getchar();
}
void ungetch(int c) {
if(bufp >= BUFSIZE)
printf("ungetch: too many characters\n");
else
buf[bufp++] = c;
}
int getint(int *pn) {
int c, sign;
while(isspace(c = getch()));
if(!isdigit(c) && c != EOF && c != '-' && c != '+') {
ungetch(c);
return 0;
}
sign = (c == '-')?-1:1;
if(c == '+' || c == '-')
c = getch();
*pn = 0;
while(isdigit(c)) {
*pn = (*pn * 10) + (c - '0');
c = getch();
}
*pn *= sign;
if(c != EOF)
ungetch(c);
return c;
}
int main(int argc, char** argv) {
int r, i;
while((r = getint(&i)) != EOF)
if(r != 0)
printf("res: %d\n", i);
return 0;
}
I don't get the step-by-step working of this function, even though I tried to trace it by hand on paper.
In particular, when I input "23", how does it get converted to 23? I know there is logic to convert "23" to 23, but c = getch() doesn't store the remaining "3" in the buffer after input, so how does it get the 3 back during the conversion?
Does getchar() have its own buffer where it stores all the input characters and fetches them one by one?
Any help is highly appreciated.
In the code snippet you provided, the main logic is here:
1. *pn = 0;
2. while(isdigit(c)) {
3. *pn = (*pn * 10) + (c - '0');
4. c = getch();
5. }
pn points to the int that will hold the final value, and c is the character read each time, one by one, through getch() (which calls getchar()). So, when you are reading in a "23", here's what is happening:
'2' is read into c
*pn = 0; c = '2'; on line 3 (from the snippet with the main logic) we multiply the value in *pn by 10 and add (0x32 - 0x30)
*pn = 2; c = '2';
'3' is read into c
multiplying *pn by 10 gives 20; adding (0x33 - 0x30) gives the final 23.
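The same accumulation step can be tried in isolation. The sketch below uses a hypothetical function, digits_to_int, that reads the digits from a string rather than from getch(); it does no overflow checking.

```c
#include <ctype.h>

/* Hypothetical helper: convert the leading decimal digits of s to an int,
 * using the same multiply-by-10-and-add step as getint. */
int digits_to_int(const char *s)
{
    int n = 0;

    while (isdigit((unsigned char)*s)) {
        n = n * 10 + (*s - '0');    /* "23": 0*10 + 2 = 2, then 2*10 + 3 = 23 */
        s++;
    }
    return n;
}
```

Calling digits_to_int("23") goes through exactly the two steps listed above and yields 23.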
Things to keep in mind:
getchar() reads chars one by one from stdin
adding simple printf() statements would help you in understanding the flow of your program
try to run it under gdb, examine values of variables
In particular, when I input "23", how does it get converted to 23? I know there is logic to convert "23" to 23, but c = getch() doesn't store the remaining "3" in the buffer after input, so how does it get the 3 back during the conversion? Does getchar() have its own buffer where it stores all the input characters and fetches them one by one?
From this, I read that you expect getch() to somehow receive your whole line of input. Well, that's wrong.
First a quick side-note about getchar() vs getch(). getch() is not a standard C function; in the code shown it is the helper defined next to ungetch(), which returns a pushed-back character if there is one and otherwise calls getchar(). The standard function with a similar name is getc(), which requires an argument of type FILE *. This is a stream. getchar() is equivalent to getc(stdin).
What you have to know is that stdio.h FILE * streams are buffered. There are different modes (no buffering, line buffering and full buffering) available.
stdin is your default input stream. It will typically come from the keyboard (but your program doesn't care about that, it could be redirected to come from a file, pipe, etc). When stdin is connected to a terminal, input is typically line buffered.
So what happens when you input 23 <enter> is that the 2 will only go in the input buffer of stdin, as well as the 3, and only when a newline follows (pressing the enter key, this is the character \n), there's finally something available to read on stdin.
getchar() doesn't care about buffering. It reads from stdin, waiting until there is something available to read. Then it reads a single character, so if there are more characters in stdin's input buffer, they will stay there until read by getchar() or any other function reading from stdin.
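The standard library offers the same one-character pushback that K&R's ungetch() provides, through ungetc(). The following sketch uses a hypothetical helper, peek_char, to show that reading one character leaves the rest in the stream and that a pushed-back character is the next one returned.

```c
#include <stdio.h>

/* Hypothetical helper: look at the next character of fp without consuming it.
 * Returns EOF if nothing can be read. */
int peek_char(FILE *fp)
{
    int c = getc(fp);       /* take one character from the stream */

    if (c != EOF)
        ungetc(c, fp);      /* push it back so the next read sees it again */
    return c;
}
```

For input "23\n", peek_char() reports '2', and the following reads still deliver '2', '3' and '\n' in order.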
I'm studying the K&R book. Currently I'm reading the function getop() on p.78.
I do understand the code, but I need clarification about two things.
The code of getop() is as follows:
int getch(void);
void ungetch(int);
/* getop: get next character or numeric operand */
int getop(char s[])
{
int i, c;
while ((s[0] = c = getch()) == ' ' || c == '\t')
;
s[1] = '\0';
if (!isdigit(c) && c != '.')
return c; /* not a number */
i = 0;
if (isdigit(c)) /* collect integer part */
while (isdigit(s[++i] = c = getch()))
;
if (c == '.') /* collect fraction part */
while (isdigit(s[++i] = c = getch()))
;
s[i] = '\0';
if (c != EOF)
ungetch(c);
return NUMBER;
}
My question is about: s[0] in:
while ((s[0] = c = getch()) == ' ' || c == '\t')
The idea behind the while loop is to skip spaces and horizontal tabs, so why are we saving 'c' in s[0]? Why didn't the authors simply write:
while ((c = getch()) == ' ' || c == '\t')
We are not going to use spaces and tabs later on, so why do we need to save c in s[0]? What is the need for s[0] here?
My second question is about:
s[1] = '\0';
Why are we assigning '\0' (end of string) to s[1] here?
I have read some of the previous answers posted on stackoverflow.com about it, but I'm not totally convinced!
The accepted answer about the above question is: "Because the function might return before the remaining input is read, and then s needs to be a complete (and terminated) string."
Ok. But what if input has one white space at the beginning and followed by an operand or operator? In this case, s[1] = '\0' will close the string too early? isn't it?
In answer to your first question, the assignment to s[0] in this case is a convenient coding shortcut. The value of c is copied to s[0] for every character read by getch(), regardless of whether it will be used or discarded. If it is to be discarded, no big deal; it will be overwritten on the next iteration of the while() loop. If it is to be used, then it has already been copied into its necessary location in the destination array s[].
In answer to your second question,
But what if input has one white space at the beginning and followed by
an operand or operator?
Note that the previous while() loop prevents white space characters (spaces and tabs) from appearing in s[0] after exit from the loop. Therefore, after execution of
s[1] = '\0';
the s[] string will consist of a single character that is neither a space nor a tab, followed by a string terminator.
In the next statement
if (!isdigit(c) && c != '.')
return c; /* not a number */
the function will return if the character is anything but a digit or a decimal point. This is why it was necessary to terminate the string.
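To see what s holds in each case, here is a sketch of getop's logic reading from a string cursor instead of from getch(). The names getop_from and next_char, the cursor parameter, and the use of 0 in place of EOF are assumptions made for this example; the casts to unsigned char are added so that isdigit() is never given a negative value.

```c
#include <ctype.h>

#define NUMBER '0'  /* signal that a number was found, as in K&R */

/* Hypothetical stand-in for getch(): returns the next character of the
 * string, or 0 at the end (used here in place of EOF). */
static int next_char(const char **p)
{
    unsigned char c = (unsigned char)**p;

    if (c != '\0')
        (*p)++;
    return c;
}

/* Sketch of getop reading from a string cursor. */
int getop_from(const char **in, char s[])
{
    int i = 0, c;

    while ((s[0] = c = next_char(in)) == ' ' || c == '\t')
        ;                       /* blanks stored in s[0] are overwritten */
    s[1] = '\0';                /* s is now a valid one-character string */
    if (!isdigit(c) && c != '.')
        return c;               /* not a number: early return */
    if (isdigit(c))             /* collect integer part, overwriting s[1] */
        while (isdigit((unsigned char)(s[++i] = c = next_char(in))))
            ;
    if (c == '.')               /* collect fraction part */
        while (isdigit((unsigned char)(s[++i] = c = next_char(in))))
            ;
    s[i] = '\0';
    if (c != '\0')
        (*in)--;                /* push back the character that ended the number */
    return NUMBER;
}
```

For the input " 42+", the first call returns NUMBER with s equal to "42", and the second call returns '+' with s equal to "+"; the leading blank never survives in s[0].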
But what if input has one white space at the beginning and followed by an operand or operator? In this case, s[1] = '\0' will close the string too early? isn't it?
Nope,
i = 0;
if (isdigit(c)) /* collect integer part */
while (isdigit(s[++i] = c = getch()))
This makes sure that, if there is more of the number to be read, the \0 gets overwritten: since i = 0, s[++i] stores into s[1], which is where the \0 was placed.
For your first question, about s[0] in:
while ((s[0] = c = getch()) == ' ' || c == '\t')
Saving c in s[0] stores the first character of the number in advance, so that the code that follows can simply start from index 1.
i = 0;
if (isdigit(c)) /* collect integer part */
while (isdigit(s[++i] = c = getch()))
The above code stores the following characters of the string, starting at index i = 1.
About your second question:
We cannot do
s[0] = '\0';
because at that point the first character is already stored in the string at s[0];
see
(s[0] = c = getch())
The answers given here are already good, though I would like to add one more point on the second question.
"Ok. But what if input has one white space at the beginning and followed by an operand or operator? In this case, s[1] = '\0' will close the string too early? isn't it?"
In this case we do not care about the string at all (it would be overwritten anyway if a number is encountered), because the string is used only if a decimal number is encountered; other characters such as '+', '-' or '\n' are returned directly.
I am currently studying the well-known C book, The C Programming Language, 2nd edition. While trying the code on p.29, I think I found something wrong in the getline function:
int getline(char s[], int lim) {
int c, i;
for (i=0; i<lim-1 && (c=getchar()) != EOF && c!='\n'; i++)
s[i] = c;
if (c == '\n') {
s[i] = c;
i++;
}
s[i] = '\0';
return i;
}
What if, when the for loop ends, i == lim-1 and c == '\n'? In this case, I think the array would be accessed out of bounds, since s[lim] would be set to '\0'.
Does anyone think this is wrong? Thanks for your help.
The && operator has "early-out" semantics. This means that if i == lim-1, the rest of the condition is not executed - in particular, c = getchar() will not be called.
This means that in this case, c will have its value from the last iteration of the loop - and since the loop condition includes c != '\n', this value can't be '\n' (or the loop would have exited last time around).
This is true as long as lim is greater than 1, which must be a precondition of the function (because calling the function with lim less than or equal to 1 would cause the uninitialised value of c to be read).
So, let's look at some cases:
What if lim == 0: this invokes undefined behavior, in two places:
We will execute no iterations of the for loop, giving i == 0 and c == undefined.
We then access c at (c == '\n'). It has no defined value yet, so it's undefined behavior.
We then cause undefined behavior again by overflowing s with: s[i] = '\0';
What if lim == 1:
The for loop will not be run, because the condition is not met.
We will hit undefined behavior just like in lim == 0 because c has no value.
The last line will work fine.
What if lim == 2, and the input string is "ab":
The for loop will grab 'a', and place it into s.
The for loop will exit on the next iteration, with the value of c still being 'a'.
The if conditional fails.
The adding of the null character works fine.
So s == "a\0"
What if lim == 2 and the input string is "a\n" (Which is the case you're worried about):
The for loop will grab 'a', and place it into s.
The for loop will exit on the next iteration, with the value of c still being 'a'.
The if conditional fails.
The adding of the null character works fine.
So s == "a\0"
The loop continues as long as the condition i < lim-1 is true. When i == lim-1, the condition becomes false and the loop terminates, so the last character stored by the loop is in s[lim-2]. It will not go out of bounds.
I think you are correct, but in a different way.
There might seem to be a problem if the limit is reached (i == lim-1) while c holds '\n' from the previous iteration, but this cannot happen: the previous iteration would have exited the loop on c != '\n'.
There is a problem with lim <= 1. The for loop exits with c not yet initialized, so if (c == '\n') is undefined behavior. This could be fixed with
int c = 0;
As mentioned by others, there is an additional problem with lim = 0 and s[i] = '\0';
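Putting both fixes together might look like the sketch below. The name read_line and the FILE * parameter are assumptions for this example (the original reads from stdin with getchar()), and returning 0 when lim <= 0 is one possible choice, not something the book specifies.

```c
#include <stdio.h>

/* Sketch of K&R's getline reading from fp, with the two edge cases guarded.
 * read_line is a hypothetical name. */
int read_line(FILE *fp, char s[], int lim)
{
    int c = 0;      /* defined even if the loop condition never calls getc */
    int i;

    if (lim <= 0)
        return 0;   /* no room even for the terminator: leave s untouched */
    for (i = 0; i < lim - 1 && (c = getc(fp)) != EOF && c != '\n'; i++)
        s[i] = (char)c;
    if (c == '\n') {    /* only possible when i < lim - 1, so s[i] is in range */
        s[i] = (char)c;
        i++;
    }
    s[i] = '\0';
    return i;
}
```

With lim == 2 and the input "a\n", the first call stores "a" and the newline is picked up by the next call, matching the case analysis in the other answers.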
It isn't possible for i to be lim-1 and c to be '\n' at the same time. If i==lim-1, then i<lim-1 will be false, so it will never read the next character. If c was '\n', then the loop would have terminated before i got to be lim-1.
The loop is equivalent to this:
i=0;
while (i<lim-1) {
c = getchar();
if (c==EOF) break;
if (c=='\n') break;
s[i] = c;
i++;
}