Why do we use extra expression? - c

Here is a sample from Kernighan & Ritchie's "The C Programming Language":
int getline(char s[], int lim)
{
int c, i = 0;
while (--lim > 0; && (c=getchar()) !=EOF && c !='\n')
{
s[i++] = c;
}
if (c =='\n')
{
s[i++] = c;
}
s[i] = '\0';
return i;
}
Why should we check that c != '\n' when we execute s[i++] = c after the loop anyway?

The function reads characters from standard input until either EOF or a newline character is found.
The second check ensures that only a newline character is put into the char array at that point. EOF shouldn't occur in a proper C string. Also, if the character isn't a newline, that means we might have filled up our C string, in which case we shouldn't put any more characters into it.
Notice we still append the '\0'. We've ensured that there's still room for one more character in our C string, because we use the prefix decrement operator, which is evaluated before the comparison.

The comparison ensures that getline terminates when it encounters a newline character (the '\n'). On the iteration where it does, the loop ends without adding the newline to the string, so the statement after the loop adds it back. If the loop ended for one of the other reasons (EOF or the limit), no newline is appended.

There is a bug in the code.
If the size of s is N bytes and the user types a newline as the (N-1)th character, the Nth character will become a '\n' and the (N+1)th character (which is not allocated) will become a '\0'.

You do that just to exit the while loop on a newline. Otherwise you would have to check for it in the loop body and use break.

That ensures that you stop at the end of the line even if it's not the end of the input. Then, if there is a newline, the \n is added to the end of the line and i is incremented one more time to avoid overwriting it with the \0.

int getline(char s[], int lim)
{
int c, i;
i=0;
/* While staying within the limit, there is a char in stdin, and it's not a newline */
while (--lim > 0 && (c=getchar()) !=EOF && c !='\n')
/* Store char at the current position in array, advance current pos by one */
s[i++] = c;
/* If While loop stopped on new-line, store it in array, advance current pos by one */
if (c =='\n')
s[i++] = c;
/* finally terminate string with \0 */
s[i] = '\0';
return i;
}

I'm not sure whether I understand the question. c !='\n' is used to stop reading the line when the end of line (linefeed) occurs. Otherwise we would always read it until the limit even if it ends before. The first s[i++] = c; in the while-loop doesn't occur if a linefeed has been reached. That's why there is the special test afterwards and the other s[i++] = c; in case it was a linefeed which broke the loop.

Not answering your question, but I'll write some comments anyway:
I don't remember all K&R rules, but the function you've listed will fail if lim is equal to one. Then the loop never runs, which leaves c uninitialised, but the variable is still used in the if (c == '\n') check.
Also, the while (--lim > 0; ...) line will not compile. Remove the ';' and it does.
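For illustration, here is a minimal sketch of how both points might be addressed. It is not K&R's code: the name read_line and the FILE * parameter are assumptions made here so the function can be exercised on any stream, not only stdin.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical variant of the question's getline(): reads from `in`
 * instead of stdin. Returns the number of characters stored in s,
 * not counting the terminating '\0'. */
int read_line(FILE *in, char s[], int lim)
{
    int c = EOF;   /* has a defined value even if the loop never runs */
    int i = 0;

    assert(in != NULL && s != NULL);
    if (lim <= 0)  /* no room even for the terminator */
        return 0;

    /* No stray ';' in the condition. The prefix decrement keeps one
     * slot free for the final '\0'. */
    while (--lim > 0 && (c = getc(in)) != EOF && c != '\n')
        s[i++] = (char)c;
    if (c == '\n')
        s[i++] = (char)c;
    s[i] = '\0';
    return i;
}
```

With lim equal to one, the loop body is skipped, c still holds EOF, and the function stores only the terminator and returns 0.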

Related

I don't understand how this code works

I'm doing Exercise 1-9 in the K&R Book, while trying to find solutions I came across this code:
int main()
{
int c;
while ((c = getchar()) != EOF) {
if (c == ' ') {
while ((c = getchar()) == ' ');
putchar(' ');
if (c == EOF) break;
}
putchar(c);
}
}
Why does the first if statement work even if I input a letter? From my understanding it will only execute if the character I input is a blank space.
Btw the exercise is to write a program that replaces multiple consecutive blank spaces with a single one.
This program prints every entered character other than the blank character ' ' as is, until the user ends the input.
In this while loop
while ((c = getchar()) == ' ');
each blank character is read but not output. After the loop, only one blank character is output
putchar(' ');
That is, the program removes adjacent blank characters, leaving only one blank character for each run of blanks in the sequence of characters entered by the user.
I've formatted and commented the code. Hope that'll help. The code converts every
sequence of spaces within stdin into one space:
"123   456  a b   c  " -> "123 456 a b c "
Code:
int main() {
int c;
/* we read stdin character after character */
while ((c = getchar()) != EOF) {
/* if we have read space */
if (c == ' ') {
/* we skip ALL spaces */
while ((c = getchar()) == ' ')
; /* skipping ALL spaces: we do nothing */
/* and then we print just ONE space instead of many skipped */
putchar(' ');
/* if we are at the end of stdin, we have nothing more to print */
if (c == EOF)
break;
}
/* we print every non space character */
putchar(c);
}
}
It loops as long as the input stream doesn't end (EOF = End Of File).
If the entered character is a space, it will ignore any follow-up spaces and only print one space afterwards.
Otherwise it will output the entered character.
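To try the logic above without typing at a terminal, the same loop can be wrapped in a function that works on arbitrary streams. This is only a sketch: the name squeeze_spaces and the two FILE * parameters are assumptions made for this example, with getc/putc standing in for getchar/putchar.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies `in` to `out`, collapsing every run of
 * spaces into a single space. Same control flow as the main() in the
 * question. */
void squeeze_spaces(FILE *in, FILE *out)
{
    int c;

    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF) {
        if (c == ' ') {
            /* empty loop body: read and drop the following spaces */
            while ((c = getc(in)) == ' ')
                ;
            putc(' ', out);      /* one space for the whole run */
            if (c == EOF)        /* input ended inside the run */
                break;
        }
        putc(c, out);            /* c is never a space here */
    }
}
```

Calling squeeze_spaces(stdin, stdout) from main() should behave like the original program.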
int main()
{
int c;
while ((c = getchar()) != EOF) {
For each character of input...
if (c == ' ') {
...if the character is a space...
while ((c = getchar()) == ' ');
... this loop skips all spaces that come after the first... Note the semicolon on the right: the loop reads a character, checks that it is a space, and does nothing with it. Writing it this way is tricky, because it is easy to assume that the loop body is the next statement below, while in fact the loop has an empty body. After a while loop, you can assume that the condition that let you enter the loop is false, so we have a true assertion: c certainly does not hold a space (it can still be EOF, which is not a character, so we need to test for it before printing, and we do that after the next statement).
putchar(' ');
... after the loop, only a single space is output, corresponding to the one tested in the first if statement you mentioned in your question. Remember that c is not a space character (so we cannot putchar(c); here), as we skipped spaces until none remained. It can still be an EOF indicator, which is checked below.
if (c == EOF) break;
If c holds the EOF indicator rather than a character, we need to get out of the loop so we don't print it in the next statement...
}
putchar(c);
... as we have been reading characters in c until we got a non space (nor an EOF indicator, as we got out of the loop above in that case) we need to print that char anyway. This putchar(c); statement will always print a non-space character. It is out of the if statement, as it must be done for all the characters that were initially nonspaces and the characters that followed a sequence of spaces.
}
...As above, we can assume the test condition of the loop is false, so here we can ensure that c has got EOF (but only because the break statement inside the loop also happens when c == EOF).
}
Et voila!!!
Note
while trying to find solutions I came across this code:
A final note, it is better for you if you post your attempt at programming a solution, instead of finding already made solutions by searching. You'll learn more on programming and less on googling, which IMHO is not your primary interest.

Usage of scanf ... getchar

Is the following pattern ok in C to get a string up until a newline?
int n = scanf("%40[^\n]s", title);
getchar();
It seems to work in being a quick way to strip off the trailing newline, but I'm wondering if there are shortcomings I'm not seeing here.
The posted code has multiple problems:
the s in the format string is not what you think it is: the specification is %40[^\n] and the s will try and match an s in the input stream, which may occur after 40 bytes have been stored into title.
scanf() will fail to convert anything if the pending input is a newline, leaving title unchanged and potentially uninitialized.
getchar() will not necessarily read the newline: if more than 40 characters are present on the line, it will just read the next character.
If you want to read a line, up to 40 bytes and ignore the rest of the line up to and including the newline, use this:
char title[41];
*title = '\0';
if (scanf("%40[^\n]", title) == EOF) {
// end of file reached before reading anything, handle this case
} else {
scanf("%*[^\n]"); // discard the rest of the line, if any
getchar(); // discard the newline if any (or use scanf("%*1[\n]"))
}
It might be more readable to write:
char title[41];
int c, len = 0;
while ((c = getchar()) != EOF && c != '\n') {
if (len < 40)
title[len++] = c;
}
title[len] = '\0';
if (c == EOF && len == 0) {
// end of file reached before reading a line
} else {
// possibly empty line of length len was read in title
}
You can also use fgets():
char title[41];
if (fgets(title, sizeof title, stdin)) {
char *p = strchr(title, '\n');
if (p != NULL) {
// strip the newline
*p = '\0';
} else {
// no newline found: discard remaining characters and the newline if any
int c;
while ((c = getchar()) != EOF && c != '\n')
continue;
}
} else {
// at end of file: nothing was read in the title array
}
First, a note: the s should be removed. It's not part of the specifier and is enough to mess up your read: scanf will try to match a literal s in the input after the characters it has stored, and if the next character is not an s, matching stops there.
To answer your question: using a single getchar is not the best approach. You can use this common routine to clear the buffer:
int n = scanf(" %40[^\n]", title);
int c;
while((c = getchar()) != '\n' && c != EOF){}
if(c == EOF){
// in the rare cases this can happen, it may be unrecoverable
// it's best to just abort
return EXIT_FAILURE;
}
//...
Why is this useful? It reads and discards all the characters remaining in the stdin buffer up to and including the newline, regardless of what they are.
In a situation where the input string has, say, 45 characters, this approach will clear the rest of the line, whereas a single getchar only clears 1 character.
Note that I added a space before the specifier, this is useful because it discards all white spaces before the first parseable character is found, newlines, spaces, tabs, etc. This is usually the desired behavior, for instance, if you hit Enter, or space Enter it will discard those and keep waiting for the input, but if you want to parse empty lines you should remove it, or alternatively use fgets.
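The effect of the leading space can be checked without a terminal by running the same two format strings through sscanf(), which parses a string instead of stdin. The helper name parse_title and its skip_leading_ws flag are made up for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parses up to 40 non-newline characters from
 * `input` into `title`. Returns what sscanf() returns: 1 if a title
 * was stored, 0 if nothing matched, EOF if the input ran out first. */
int parse_title(const char *input, char title[41], int skip_leading_ws)
{
    assert(input != NULL && title != NULL);
    title[0] = '\0';   /* defined content even when nothing matches */
    if (skip_leading_ws)
        return sscanf(input, " %40[^\n]", title);  /* skips blanks and newlines first */
    return sscanf(input, "%40[^\n]", title);       /* fails on a leading newline */
}
```

With the input "\n  hello world\nrest", the first form stores "hello world", while the second form stops at the leading newline and stores nothing.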
There are a number of problems with your code like n never being used and wrong specifier for scanf.
The better approach is to use fgets. fgets will also read the newline character (if present before the buffer is full) but it's easy to remove.
See Removing trailing newline character from fgets() input
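As a rough sketch of that approach, the helper below reads one line with fgets(), removes the newline with strcspn(), and throws away whatever did not fit. The name read_trimmed_line and the FILE * parameter are assumptions for this example; pass stdin to read from the console.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in` into buf (capacity
 * `size`), without the trailing newline. Characters that do not fit
 * are discarded. Returns 1 if a line was read, 0 at end of file. */
int read_trimmed_line(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 1);
    if (fgets(buf, (int)size, in) == NULL)
        return 0;

    size_t len = strcspn(buf, "\n");   /* index of '\n', or of '\0' if none */
    if (buf[len] == '\n') {
        buf[len] = '\0';               /* strip the newline */
    } else {
        int c;                         /* line was longer than the buffer */
        while ((c = getc(in)) != EOF && c != '\n')
            ;
    }
    return 1;
}
```

For the title in the question this would be called as read_trimmed_line(stdin, title, sizeof title) with char title[41].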

What's the role of c != '\n' condition in "for" loop (C language)?

The input of this function is s[] - char array and lim - max possible length of this array.
The function itself is used to determine the length of a char array entered in the console.
The question is, what's the main idea of c != '\n' condition in for loop?
I guess it's used to break a loop. It's quite clear. But I can't get how it can be implemented if I don't type \n in my input.
Is that a terminator at the end of an array like \0?
If that's the case why should we use the if (c == '\n') condition after that?
The code:
int getline(char s[],int lim)
{
int c, i;
for (i=0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
s[i] = c;
if (c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
As you press Enter a \n newline character is added to the input buffer, stdin, this means that this is the last character in the input buffer.
So it's only fitting the cycle continues on until getchar() retrieves this last character.
The cycle control states that c must be:
Different from \n.
Different from EOF.
And the iterator i must be lower than lim - 1.
If one of these conditions is not met the cycle breaks, so it's possible that c is not equal to \n.
After that, if c is equal to \n, it will be added to s, the condition if (c == '\n') is there because, as stated, c might not be equal to \n, and in that case it shouldn't be added to s.
Lastly s must be null terminated (s[i] = '\0') so it can be properly interpreted as a string, aka a null terminated char array. \n is not a null terminator, \0 is.
The fact that \n is added is only because the implementer wanted it to be so, it's part of the implementation, it wouldn't have to be, but it is.
It can be useful in some cases, for instance fgets library function has a similar implementation, it adds \n to the containing buffer and then null terminates it.
Well you want to get a line. \0 terminates the string but a string can exist of multiple lines. As the function's name indicates you only want a single line. Lines are terminated with \n and after that a new line begins.
The condition
c != '\n'
is true as long as the character read is not a newline.
Since this is a getline() implementation, a newline is a very good reason to stop reading from the input.
Please note how, after this check ends the loop, s[i] = '\0'; is executed in order to terminate the string one position after the newline character.
So, how does getline() work?
The core of that function is the for-loop
for (i=0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
The condition i < lim-1 && (c=getchar())!=EOF && c!='\n' means: "Assign to c a character read with getchar() until...":
the read character is EOF
the read character is \n (newline)
the number of characters requested by the caller (lim) has been read
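To see which of the three conditions ended the loop in a given call, the function can be instrumented. The sketch below is an illustration only: the enum stop_reason, the name get_line_why and the FILE * parameter do not appear in the original code.

```c
#include <assert.h>
#include <stdio.h>

enum stop_reason { STOP_EOF, STOP_NEWLINE, STOP_LIMIT };

/* Hypothetical variant of getline() that reads from `in` and reports
 * through *why which condition ended the for loop. */
int get_line_why(FILE *in, char s[], int lim, enum stop_reason *why)
{
    int c = EOF, i;

    assert(in != NULL && s != NULL && why != NULL && lim > 0);
    for (i = 0; i < lim - 1 && (c = getc(in)) != EOF && c != '\n'; ++i)
        s[i] = (char)c;

    if (i >= lim - 1)
        *why = STOP_LIMIT;     /* getc() was not even called this time */
    else if (c == '\n')
        *why = STOP_NEWLINE;
    else
        *why = STOP_EOF;

    if (c == '\n') {           /* keep the newline, as the original does */
        s[i] = (char)c;
        ++i;
    }
    s[i] = '\0';
    return i;
}
```

Only in the STOP_NEWLINE case does the returned string end in '\n'; in the other two cases the terminator directly follows the last character stored.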
Because '\n' is only one of the conditions.
The character can also be EOF, which the author does not want to save, and the character limit can also be reached, in which case we do not want to write the nul terminator outside the bounds of the char array.
The nul termination is not related to any of those conditions and always happens.
If the character read c is a return to new line \n, then the for loop will stop, and the array will end with \n followed by \0.
And that's what the function is supposed to do, it's called getline(), and lines end with \n.
The logic is simply this:
In the for loop we're saying that when the user hits Enter, stop reading; the loop ends because the condition c!='\n' becomes false, and the rest of the input is left unread.
In the if statement, we check whether Enter was what stopped the loop; if so, the newline is stored in s. The next call to getline() will then start reading the rest of the input from where this call left off.
The if statement is also a way of telling the caller that there are separate lines in the input: a returned string that ends in \n is a complete line.

How to get last character on getchar()!=EOF

Is there any way to get the last element before EOF while using getchar()?
E.g I have a text saying
"Hey there
people"
What would be the condition to check the last char in that text (in our example e). My thoughts are the following but i am not sure what the if condition should be.
//pseudocode
int c;
while(c=getchar() != EOF)
if(c==EOF-1) //assuming EOF is the end EOF -1 would be the last character.
if(c==O) print O;
else if(c==P) print P;
else if (c==e) print e;
I want to check the very last character and if it's a specific letter to print it.
Thank you.
There is no indication that the character just returned from getchar is the last available character. To print the last character before EOF, you must remember the return from getchar. When EOF is returned, then print the previously remembered character.
For example, this code prints the last character of a stream:
#include <stdio.h>
int main(void)
{
int c, previous = EOF;
while (1)
{
c = getchar();
if (c == EOF)
break;
previous = c;
}
if (previous == EOF)
printf("There were no characters in the stream.\n");
else
printf("The last character was %c.\n", previous);
}
When you try the above, you are likely to find the last character is a new-line character, '\n'.
In general, it would be impossible for a C implementation to know the character just returned is the last character. Input might be coming from a terminal, for example, and the user has just typed a character, which getchar() returns. At this point, we do not know what the user will do next: they might type another character, or they might type an end-of-file indication (as by pressing control-D twice, in a Unix system). So, having just gotten a character, we do not know whether what is coming next is another character or is EOF.
char c;
int tmp = 0;
while(tmp != EOF)
{
c = (char)tmp;
tmp=getchar();
}
c becomes 0 (null terminator) in case of an empty input, otherwise it is the last character before EOF.
You have to write the code to remember the last character(if any at all) before EOF.
int c;
int lc = EOF;
while((c=getchar()) != EOF) {
lc = c;
}
//here lc will hold the last character read before EOF,
//or it will also be EOF if no characters got read at all.
Note that your original condition was while(c=getchar() != EOF), which is incorrect: it is evaluated as while(c=(getchar() != EOF)). You have to write while((c=getchar()) != EOF).
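The difference between the two forms is easy to observe by looking at what ends up in c. In this sketch the helper names and the FILE * parameter are invented for the example, and getc(in) plays the role of getchar().

```c
#include <assert.h>
#include <stdio.h>

/* Without the inner parentheses, != binds tighter than =, so c
 * receives the result of the comparison: 1 if a character was read,
 * 0 at end of file. The character itself is lost. */
int read_without_parens(FILE *in)
{
    int c;

    assert(in != NULL);
    c = getc(in) != EOF;
    return c;
}

/* With the inner parentheses, c receives the character, and the
 * comparison is applied to that value afterwards. */
int read_with_parens(FILE *in)
{
    int c;

    assert(in != NULL);
    if ((c = getc(in)) != EOF)
        return c;
    return EOF;
}
```

This is why a test such as c == 'e' can never succeed inside the loop from the question: c is only ever 0 or 1 there.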

Don't know how to use getchar function correctly / C

I wanted to create a function that inputs a character in the console, but the problem is that sometimes when I input 'a' it is considered an empty char and the program asks me to re-input a char.
This is the function :
char readChar()
{
char character;
character = getchar();
character = toupper(character);
while(getchar() != '\n' && getchar() != '\0');
return character;
}
Converting a barrage of comments into an answer.
1. Note that getchar() returns an int, not a char. And your loop needs to take into account EOF more than a null byte, though it's probably OK to detect null bytes.
My best guess about the trouble is that sometimes you do scanf("%d", &i) or something similar, and then call this function — but scanf() doesn't read the newline, so your function reads a newline left over by previous I/O operations. But without an MCVE (Minimal, Complete, and Verifiable Example), we can't demonstrate that my hypothesis is accurate.
2. Also, your 'eat the rest of the line' loop should only call getchar() once on each iteration; you call it twice. One option would be to use:
int readChar(void)
{
int c;
while ((c = getchar()) != EOF && isspace(c))
;
if (c == EOF)
return EOF;
int junk;
while ((junk = getchar()) != '\n' && junk != EOF /* && junk != '\0' */)
;
return toupper(c);
}
This eats white space until it gets a non-white-space character, and then reads any junk characters up to the next newline. It would fix my hypothetical scenario. Beware EOF — take EOF into account always.
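The hypothetical scenario can be reproduced without a terminal by running the same logic against a stream that starts with a leftover newline. The sketch below uses an invented name, read_char_from, and takes a FILE * so that a temporary file can stand in for stdin.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical stream-based variant of readChar(): skips leading
 * white space, takes the first other character, discards the rest of
 * that line, and returns the character in upper case (or EOF). */
int read_char_from(FILE *in)
{
    int c;

    assert(in != NULL);
    while ((c = getc(in)) != EOF && isspace(c))
        ;                      /* skip blanks and leftover newlines */
    if (c == EOF)
        return EOF;

    int junk;
    while ((junk = getc(in)) != '\n' && junk != EOF)
        ;                      /* one getc() per iteration */
    return toupper(c);
}
```

Given the input "\n  a bc\nz", successive calls should return 'A', then 'Z', then EOF: the leading newline is skipped rather than being mistaken for the answer.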
Based on reading the Q&A about scanf() not reading a newline, Voltini proposed a fix:
char readChar()
{
char character;
scanf(" %c", &character);
getchar(); //I just added this line
character = toupper(character);
return character;
}
3. That is often a good way to work. Note that it has still not dealt with EOF — you always have to worry about EOF. The getchar() after the scanf() will read the newline if the user typed a and newline, but not if they typed a-z and then newline. You have to decide what you want done with that – and a character gobbling loop is often a good idea instead of the single getchar() call:
int c;
while ((c = getchar()) != EOF && c != '\n')
;
And in response to a comment along the lines of:
Please explain the importance of handling EOF.
4. If you don't ask, you won't necessarily learn about it! Input and output (I/O) and especially input, is fraught. Users don't type what you told them to type; they add spaces before or after what you told them to type; you expect something short like good and they type supercalifragilisticexpialidocious. And sometimes things go wrong and there is no more data available to be read — the state known as EOF or "end of file".
5. In the function with char character; scanf(" %c", &character); and no check, if there is no input (the user types ^D on Unix or ^Z on Windows, or the data file ended), you have no idea what value is going to be in the variable — it is quasi-random (indeterminate), and using it invokes undefined behaviour. That's bad. Further, in the code from the question, you have this loop, which would never end if the user indicates EOF.
while (getchar() != '\n' && getchar() != '\0') // Should handle EOF!
;
6. And, to add to the complexity, if plain char is an unsigned type, assigning to character and testing for EOF will always fail, and if plain char is a signed type, you will detect EOF on a valid character (often ÿ — small latin letter y with diaeresis in Unicode and 8859-1 or 8859-15 code sets). That's why my code uses int c; for character input. So, as you can see (I hope), there are solid reasons why you have to pay attention to EOF at all times. It can occur when you don't expect it, but your code shouldn't go into an infinite loop because of that.
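A small sketch can make the point about char and EOF concrete. It stores a getchar()-style return value in three different types and then tests for EOF. The helper names are invented here, signed char and unsigned char are used explicitly so the result does not depend on how plain char behaves, and the signed case assumes a typical platform with 8-bit chars and EOF equal to -1.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Precondition shared by the helpers: `value` is something getchar()
 * could return, that is EOF or an unsigned char value. */
static void check_value(int value)
{
    assert(value == EOF || (value >= 0 && value <= UCHAR_MAX));
}

int eof_via_int(int value)
{
    check_value(value);
    int c = value;
    return c == EOF;          /* true only for a real EOF */
}

int eof_via_unsigned_char(int value)
{
    check_value(value);
    unsigned char c = (unsigned char)value;
    int widened = c;          /* always in 0..UCHAR_MAX */
    return widened == EOF;    /* never true: EOF is negative */
}

int eof_via_signed_char(int value)
{
    check_value(value);
    signed char c = (signed char)value;  /* implementation-defined above SCHAR_MAX */
    int widened = c;
    return widened == EOF;    /* on typical platforms also true for 0xFF */
}
```

Under those assumptions, eof_via_unsigned_char(EOF) is 0 (EOF is missed) and eof_via_signed_char(0xFF) is 1 (a valid character is mistaken for EOF), while eof_via_int() gets both cases right.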
I'm not sure how and where to … implement this … in my code.
7. There are two parts to that. One is in the readChar() function, which needs to return an int and not a char (for the same reasons that getchar() returns an int and not a char), or which needs an alternative interface such as:
bool readChar(char *cp)
{
int c;
while ((c = getchar()) != EOF && isspace(c))
;
if (c == EOF)
return false;
*cp = toupper(c);
while ((c = getchar()) != '\n' && c != EOF /* && c != '\0' */)
;
return true;
}
so that you can call:
if (readChar(&character))
{
…process valid input…
}
else
{
…EOF or other major problems — abandon hope all ye who enter here…
}
With the function correctly detecting EOF, you then have your calling code to fix so that it handles an EOF (error indication) from readChar().
Note that empty loop bodies are indicated by a semicolon indented on a line on its own. This is the way K&R (Kernighan and Ritchie in The C Programming Language — 1988) wrote loops with empty bodies, so you find it widely used.
You will find over time that an awful lot of the code you write in C is for error handling.
Do not read potentially twice per loop as with while(getchar() != '\n' && getchar() != '\0'); @Jonathan Leffler
To read a single and first character from a line of user input:
A text stream is an ordered sequence of characters composed into lines, each line
consisting of zero or more characters plus a terminating new-line character. Whether the
last line requires a terminating new-line character is implementation-defined. C11dr §7.21.2 2
Use getchar(). It returns an int in the range of unsigned char or EOF.
Read the rest of the line.
Sample code, a bit like OP's.
int readChar_and_make_uppercase() {
int ch = getchar();
if (ch != '\n' && ch != EOF) {
// read rest of line and throw it away
int dummy;
while ((dummy = getchar()) != '\n' && dummy != EOF) {
;
}
}
return toupper(ch);
}
Apparently, I forgot to take care of the newline character stored in the buffer (correct me if I'm wrong).
char readChar()
{
char character;
scanf(" %c", &character);
getchar(); //I just added this line
character = toupper(character);
return character;
}

Resources