query regarding getop() in K&R C - c

In the function below:
Why does it terminate the string right away with s[1] = '\0';?
After i = 0, why does it start storing characters at s[1] rather than at s[0]?
#include <stdio.h>  /* EOF */
#include <ctype.h>  /* isdigit */

#define NUMBER '0'
#define MAXSIZE 100

int getch(void);
void ungetch(int);

char s[MAXSIZE];

/* getop: get next character or numeric operand */
int getop(char s[ ])
{
    int i, c;

    while ((s[0] = c = getch()) == ' ' || c == '\t')
        ;
    s[1] = '\0';
    if (!isdigit(c) && c != '.')
        return c; /* not a number */
    i = 0;
    if (isdigit(c)) /* collect integer part */
        while (isdigit(s[++i] = c = getch()))
            ;
    if (c == '.') /* collect fraction part */
        while (isdigit(s[++i] = c = getch()))
            ;
    s[i] = '\0';
    if (c != EOF)
        ungetch(c);
    return NUMBER;
}

The function stores its result in the caller's array s as a zero-terminated C string. The first getch loop stores the first non-blank character in s[0], and if that character is neither a digit nor a period, the function returns immediately. Writing '\0' into s[1] beforehand makes sure the returned string is properly terminated: it contains exactly one character.
After that initial step you already have one valid character, and it's stored in s[0]. Because i starts at 0 and the loops store into s[++i], all following getch calls store from index 1 onwards; otherwise they would overwrite the first character entered.

Related

writing a code that checks an ascending sequence only using loops and condition statments only

I was asked to write code in C that checks the alphabetical order of its input and determines how many "legal" sequences it contains. The rules are as follows: I receive a mix of digits (1-9) and letters (A-Z), followed by '#', which marks the end of the input. Once a number is received, I should check the letters that follow it (if I receive the number 3, I should check the next 3 letters, and so on); these letters should be in ascending alphabetical order. For example, in ABC3DEF# the part ABC is not a legal sequence, but 3DEF is, so in total I have one legal sequence.
I tried something but it doesn't work: the output is always 0! (Note: I am only allowed to use loops and condition statements, meaning no arrays, functions, pointers, ...) Is there an idea that I am missing, or was my idea wrong?
int i, x, sum = 0;
char cha, c = 0;
while((cha = getchar()) != '#') {
    scanf("%d %c", &x, &cha);
    cha = c;
    if(x >= 1 && x <= 9) {
        for(i = 0; i < x; i++) {
            if(c == cha - i) {
                sum++;
            }
            c--;
        }
    }
}
Every time you test the while loop condition, you read one character, and if it is not an #, you discard it. Those discarded characters are part of the data you need to parse.
Every time you call scanf() and the next character is a decimal digit or a + or a -, you parse that and all the following digits as a decimal number, and then read and ultimately discard the next character.
You do not attempt to read any following characters at all, nor to test whether they are letters rather than digits.
If there is no # character in the input, and in some cases even if there is one, then the program will never terminate. To solve this, it should test for EOF in addition to testing for '#'. Moreover, to do this properly you must store the return value of getchar() as the int it is; char cannot represent EOF.
Basically, almost nothing about the code presented works as described.
Parsers such as you need to write tend to be written explicitly or implicitly as state machines. You start in some initial state. You read some number of characters of input, and according to the current state and the input read, you perform an appropriate action, often including changing the current state to a different one. Lather, rinse, repeat, until a terminal state is reached (which at minimum should be achieved at the end of the input).
As pointed out in the comments, mixing scanf and getchar is full of pitfalls. Take your code as an example and your goal to parse the string "ABC3DEF#". Your calls to getchar and scanf exemplify the problem.
while((cha = getchar()) != '#') {
    scanf("%d %c", &x, &cha);
    ...
}
When you read cha, the test (cha = getchar()) != '#' is true for every character except '#'. So for your sequence, the first read gives cha = 'A' and leaves "BC3DEF#" in stdin. You then attempt to read with scanf("%d %c", &x, &cha); and a matching failure occurs because 'B' is not a valid start of an integer value. Character extraction from stdin ceases, leaving 'B' unread in stdin, and neither x nor cha is assigned a value.
You then attempt:
cha = c;
if(x >= 1 && x <= 9) {
Which sets cha = 0; and invokes Undefined Behavior by accessing the value of x (a variable with automatic storage duration) while the value is indeterminate. See C11 Standard - 6.3.2.1 Lvalues, arrays, and function designators(p2).
So all bets are off at that point. You loop again, this time reading 'B' with getchar and the matching failure and all subsequent errors repeat.
Further, the type for cha must be int as that is the proper return type for getchar and necessary to evaluate whether EOF has been reached (which you do not check). See: man 3 getchar
Instead, eliminate scanf from your code entirely and simply loop with:
while ((c = getchar()) != EOF && c != '\n') {
    ...
}
(Note: I have used int c; where you used char cha;)
You can then handle everything else that needs to be done with three primary conditions, e.g.
while ((c = getchar()) != EOF && c != '\n') {
    if (isdigit (c)) { /* if c is digit */
        ...
    }
    else if (isalpha(c)) { /* if c is [A-Za-z] */
        ...
    }
    else if (c == '#') { /* if c is end character */
        ...
    }
}
From there you simply define several variables that help you track whether you are in a sequence (in), whether the sequence is currently legal (lgl), the number of characters read as part of the sequence (nchr) and the integer number (num) converted at the beginning of the sequence, in addition to keeping track of the previous (prev) character, e.g.
char buf[MAXC];
int c, in = 0, lgl = 1, nchr = 0, num = 0, prev = 0;
With that you can simply read a character at a time, keeping track of the current "state" of operations within your loop:
while ((c = getchar()) != EOF && c != '\n') {
    if (isdigit (c)) { /* if c is digit */
        if (!in) /* if not in seq. set in = 1 */
            in = 1;
        num *= 10; /* build number from digits */
        num += c - '0';
    }
    else if (isalpha(c)) { /* if c is [A-Za-z] */
        if (in) {
            if (prev >= c) /* previous char greater or equal? */
                lgl = 0; /* not a legal sequence */
            prev = c; /* hold previous char in order */
            if (nchr < MAXC - 1) /* make sure there is room in buf */
                buf[nchr++] = c; /* add char to buf */
        }
    }
    else if (c == '#') { /* if c is end character */
        /* if in and legal and at least 1 char and no. char == num */
        if (in && lgl && nchr && nchr == num && num < MAXC) {
            buf[num] = 0; /* nul-terminate buf */
            printf ("legal: %2d - %s\n", num, buf); /* print result */
        }
        lgl = 1; /* reset all values */
        in = nchr = num = prev = 0;
    }
}
Putting it all together in a short example that saves the characters of each sequence in buf, so that a legal sequence can be output when the '#' is reached, you could do something similar to the following (which will handle sequences up to 8191 characters):
#include <stdio.h>
#include <ctype.h>

#define MAXC 8192 /* if you need a constant, #define one (or more) */
                  /* (don't skimp on buffer size!) */

int main (void) {

    char buf[MAXC];
    int c, in = 0, lgl = 1, nchr = 0, num = 0, prev = 0;

    while ((c = getchar()) != EOF && c != '\n') {
        if (isdigit (c)) { /* if c is digit */
            if (!in) /* if not in seq. set in = 1 */
                in = 1;
            num *= 10; /* build number from digits */
            num += c - '0';
        }
        else if (isalpha(c)) { /* if c is [A-Za-z] */
            if (in) {
                if (prev >= c) /* previous char greater or equal? */
                    lgl = 0; /* not a legal sequence */
                prev = c; /* hold previous char in order */
                if (nchr < MAXC - 1) /* make sure there is room in buf */
                    buf[nchr++] = c; /* add char to buf */
            }
        }
        else if (c == '#') { /* if c is end character */
            /* if in and legal and at least 1 char and no. char == num */
            if (in && lgl && nchr && nchr == num && num < MAXC) {
                buf[num] = 0; /* nul-terminate buf */
                printf ("legal: %2d - %s\n", num, buf); /* print result */
            }
            lgl = 1; /* reset all values */
            in = nchr = num = prev = 0;
        }
    }
}
(note: you can adjust MAXC to change the number of characters as needed)
Now you can go exercise the code and make sure it will handle your conversion needs (you can adjust the output to fit your requirements). While care has been taken in putting the logic together, any corner-cases that may need to be handled are left to you.
Example Use/Output
$ echo "ABC3DEF#11abcdefghijk#4AZaz#3AbC#" | ./bin/sequences
legal: 3 - DEF
legal: 11 - abcdefghijk
legal: 4 - AZaz
or
$ echo "ABC3DEF#11abcdefghijk#3AbC#4AZaz#" | ./bin/sequences
legal: 3 - DEF
legal: 11 - abcdefghijk
legal: 4 - AZaz
or without a final '#' even an otherwise legal sequence will be discarded
$ echo "ABC3DEF#11abcdefghijk#3AbC#4AZaz" | ./bin/sequences
legal: 3 - DEF
legal: 11 - abcdefghijk
Note: if you did want to accept the final legal sequence even though there is no closing '#' before EOF, you could simply add an additional conditional after your while loop terminates, e.g.
/* handle final sequence before EOF */
if (in && lgl && nchr && nchr == num && num < MAXC) {
    buf[num] = 0; /* nul-terminate */
    printf ("legal: %2d - %s\n", num, buf); /* print result */
}
With that change the final example above would match the output of the others.

How does the function getop() from "The C Programming Language" work?

int getop(char *s) {
    int i, c;
    while((s[0] = c = getch()) == ' ' || c == '\t')
        ;
    s[1] = '\0';
    if(!isdigit(c) && c != '.') {
        return c;
    }
    i = 0;
    if(isdigit(c))
        while(isdigit(s[++i] = c = getch()))
            ;
    if(c == '.')
        while(isdigit(s[++i] = c = getch()))
            ;
    s[i] = '\0';
    if(!isdigit(c))
        ungetch(c);
    return NUMBER;
}
I came across this function while working on an example named "Reverse Polish calculator".
We can input numbers for the calculator's operations via this function, but I don't understand how it works. For example,
if we enter some input like ---->
12.34 11.34 +
From
while((s[0] = c = getch()) == ' ' || c == '\t')
;
s[1] = '\0';
s will contain 1. But how does the rest of the input get into s?
I've gone through this function and I can see that it works, but I want to understand it in depth: the complete flow.
Any help is highly appreciated.
edit:-
After testing various inputs I started to wonder what the need for getch() and ungetch() is. I understand they are there to unread a character that is not needed, but then look at this test case:
int getop(char *s) {
    int i, c;
    while((s[0] = c = getchar()) == ' ' || c == '\t')
        ;
    s[1] = '\0';
    if(!isdigit(c) && c != '.') {
        return c;
    }
    i = 0;
    if(isdigit(c))
        while(isdigit(s[++i] = c = getchar()))
            ;
    if(c == '.')
        while(isdigit(s[++i] = c = getchar()))
            ;
    s[i] = '\0';
    /* if(!isdigit(c))
        ungetch(c);*/
    return NUMBER;
}
Here I replaced getch() with getchar() and it still accepted the input
12a 12 -
and the output was absolutely correct, and it unread the 'a' character as well:
144
and the same was the case when I was using getch()?
Let's break it down piece by piece:
while((s[0] = c = getch()) == ' ' || c == '\t') skips all spaces and tabs at the beginning of the input. At the end of this loop s[0] contains the first character that is neither of those two.
Now if c is something other than . or a digit we simply return that.
if(!isdigit(c) && c != '.') {
    return c;
}
If c is neither a digit nor '.', we are done, and that is why it is right to have set s[1] to '\0' beforehand: s already holds a complete one-character string.
Otherwise, if c is a digit, we read as many digit characters as we can, overwriting s from position 1. s[1] was '\0', but that does not matter because s[++i] is s[1] on the very first iteration of
i = 0;
if(isdigit(c))
    while(isdigit(s[++i] = c = getch()))
so s is kept in a consistent state.

Clarification in getop()

In Dennis Ritchie's "The C Programming Language" book,
in the getop function,
he sets s[1] = '\0'.
Why does he end the array at index 1? What's the significance and need?
Later on he does use other parts of the array.
int getch(void);
void ungetch(int);

/* getop: get next character or numeric operand */
int getop(char s[])
{
    int i, c;

    while ((s[0] = c = getch()) == ' ' || c == '\t')
        ;
    s[1] = '\0';
    if (!isdigit(c) && c != '.')
        return c; /* not a number */
    i = 0;
    if (isdigit(c)) /* collect integer part */
        while (isdigit(s[++i] = c = getch()))
            ;
    if (c == '.') /* collect fraction part */
        while (isdigit(s[++i] = c = getch()))
            ;
    s[i] = '\0';
    if (c != EOF)
        ungetch(c);
    return NUMBER;
}
Because the function might return before the remaining input is read, and then s needs to be a complete (and terminated) string.
The zero char will be overwritten later, unless the test if (!isdigit(c) && c != '.') is true so that the function returns early. Some coding guidelines would discourage returns from the middle of a function, btw.
Only when the function returns early is the s[1] = '\0' relevant. Therefore, one could have coded
int getop(char s[])
{
    int i, c;

    while ((s[0] = c = getch()) == ' ' || c == '\t')
        ;
    if (!isdigit(c) && c != '.')
    {
        s[1] = '\0'; /* Do this only when we return here */
        return c; /* not a number */
    }
    i = 0;
    if (isdigit(c)) /* collect integer part */
    /* ... */
It is terse code. These guys knew their language and algorithms. But the snippet lacks error handling and thus depends on correct input (invalid input may let s be empty or let s overflow).
That is not untypical. Processing of valid data is often straightforward and short; handling all the contingencies makes the code convoluted and baroque.
s[] might hold only an operator such as "+" or "-".
In this case s[1] has to be '\0' so that s is a proper string.

How does `ungetch` work?

I searched everywhere, but I cannot find the answer to this! I'm working on the exercise from the K&R C book with a function they call getop. When it peeks at the next character from the input and sees that it isn't a digit, where does the character get stored when ungetch is called? I can compile and run the code so I know it works, I just want to know where the character has been stored.
int getop(char s[])
{
    int i, c;

    while ((s[0] = c = getch()) == ' ' || c == '\t')
        ;
    s[1] = '\0';
    if (!isdigit(c) && c != '.')
        return c; /* not a number */
    i = 0;
    if (isdigit(c)) /* collect integer part */
        while (isdigit(s[++i] = c = getch()))
            ;
    if (c == '.') /* collect fraction part */
        while (isdigit(s[++i] = c = getch()))
            ;
    s[i] = '\0';
    if (c != EOF)
        ungetch(c);
    return NUMBER;
}
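It doesn't really peek: getch actually reads the character, and ungetch then hands it back. In K&R, getch and ungetch are ordinary functions defined alongside getop, and ungetch stores the character in a small char array that getch checks before it reads any new input. The standard library's ungetc() plays the same role using the stream's own buffer; the C standard guarantees only one character of pushback, so repeated ungetc() calls without an intervening read may fail.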

Confusion about calculator program

int getop(char s[])
{
    int i = 0, c, next;

    /* Skip whitespace */
    while((s[0] = c = getch()) == ' ' || c == '\t')
        ;
    s[1] = '\0';
    /* Not a number but may contain a unary minus. */
    if(!isdigit(c) && c != '.' && c != '-')
        return c;
    if(c == '-')
    {
        next = getch();
        if(!isdigit(next) && next != '.')
        {
            if(next != EOF)
                ungetch(next); /* not part of a number: give the lookahead back */
            return c;
        }
        c = next;
    }
    else
        c = getch();
    while(isdigit(s[++i] = c)) //HERE
        c = getch();
    if(c == '.') /* Collect fraction part. */
        while(isdigit(s[++i] = c = getch()))
            ;
    s[i] = '\0';
    if(c != EOF)
        ungetch(c);
    return NUMBER;
}
What if there is no blank space or tab: what value will s[0] be set to? And what is the use of s[1]='\0'?
what if there is no blank space or tab, then what value will s[0] be initialized to
The following loop will continue executing until getch() returns a character that's neither a space nor a tab:
while((s[0] = c = getch()) == ' ' || c == '\t')
;
what is the use of s[1]='\0'
It converts s into a C string of length 1, the only character of which has been read by getch(). The '\0' is the required NUL-terminator.
If there is no space or tab, the loop stops after its first read: the condition is false for that character, which stays in s[0].
s[1]='\0' is a way of marking the end so functions like strlen() know when to stop reading through C strings. It's called "null-terminating" a string: http://chortle.ccsu.edu/assemblytutorial/Chapter-20/ass20_2.html
while((s[0] = c = getch()) == ' ' || c == '\t')
Read characters until one is not a tab or space.
s[1] = '\0';
Convert the char array s to a proper string in C format (all strings in C must be terminated with a null byte, which is represented by '\0').

Resources