On page 29 of 'The C Programming Language' (second edition) by K&R I read a procedure that I think is broken. Since I'm a beginner, I expect I'm the one who is wrong, but I cannot explain why.
Here's the code:
#include <stdio.h>
#define MAXLINE 1000 // Maximum input line size
int get1line(char line[], int maxline);
void copy(char to[], char from[]);
// Print longest input line
int
main()
{
int len; // Current line length
int max; // Maximum length seen so far
char line[MAXLINE]; // Current input line
char longest[MAXLINE]; // Longest line saved here
max = 0;
while ((len = get1line(line, MAXLINE)) > 0)
if (len > max) {
max = len;
copy(longest, line);
}
if (max > 0) // There was a line to read
printf("Longest string read is: %s", longest);
return 0;
}
// `get1line()` : save a line from stdin into `s`, return its length
int
get1line(char s[], int lim)
{
int c, i;
for (i = 0; i < lim -1 && (c = getchar()) != EOF && c != '\n'; ++i)
s[i] = c;
if (c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
// `copy()` : copy `from` into `to`; assuming
// `to` is big enough.
void
copy(char to[], char from[])
{
int i;
i = 0;
while ((to[i] = from[i]) != '\0')
++i;
}
My concern is this: suppose that in get1line the for loop ends with i equal to lim - 1. The following if statement would then increase i to lim, so the next instruction (the one that puts the null character at the end of the string) would corrupt the stack, since s[lim] is not allocated in that case.
Is the code broken?
Summary: It's not possible to exit the loop with both i == lim-1 and c == '\n', so the case you are worried about never arises.
In detail: We can rewrite the for loop (while preserving its meaning) to make the order of events clear.
i = 0;
for (;;) {
if (i >= lim-1) break; /* (1) */
c = getchar();
if (c == EOF) break; /* (2) */
if (c == '\n') break; /* (3) */
s[i] = c;
++i;
}
At loop exit (1) it can't be the case that c == '\n' because if that were the case then the loop would have exited at (3) the previous time around.*
At loop exits (2) and (3) it can't be the case that i == lim-1 because if that were the case then the loop would have exited at (1).
* This depends on lim being at least 2, so that there was in fact a previous time around the loop. The program only ever calls get1line with lim equal to MAXLINE, so this is always the case.**
** You could make the function safe when lim is less than 2 by initializing c to a value other than '\n' before the loop begins. But if you are concerned about this possibility, then you might also want to be concerned about the possibility that lim is INT_MIN, so that lim-1 results in undefined behaviour due to integer overflow.
The code is wrong if lim == 0 because it uses c uninitialised and adds the \0.
It is also wrong if lim == 1 because it uses c uninitialised. Calling the function with lim < 2 is not very useful, but it should not fail like this.
If lim > 1 then the function is OK:
for (i = 0; i < lim -1 && (c = getchar()) != EOF && c != '\n'; ++i)
s[i] = c;
The loop exits either if i == lim-1 or if c == EOF or if c == '\n'.
If the first condition is true (i == lim-1), then the last condition is definitely not true (unless lim < 2, as noted above).
If the first condition is false (i < lim-1), then even if the loop exits with c == \n, we know that there is space in the buffer because we know that i < lim-1.
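For readers who want the function to stay well defined for every value of lim, here is a minimal sketch of the fixes discussed above. The name get1line_safe and the FILE * parameter are assumptions made for illustration (the book's version reads stdin through getchar()); the guard on lim and the initial value of c are the only changes to the logic.

```c
#include <stdio.h>

/* Hypothetical variant of get1line(): reads from `fp` instead of stdin
 * and stays well defined for any `lim`, including lim < 2. */
int get1line_safe(FILE *fp, char s[], int lim)
{
    int c = 0;   /* initialised, so the test after the loop never reads garbage */
    int i = 0;

    if (lim < 1)           /* no room even for the terminator: touch nothing */
        return 0;          /* returning early also avoids lim - 1 on INT_MIN */

    while (i < lim - 1 && (c = getc(fp)) != EOF && c != '\n')
        s[i++] = (char)c;

    if (c == '\n')         /* only reachable with i < lim - 1, so s[i] fits */
        s[i++] = (char)c;

    s[i] = '\0';
    return i;
}
```

With lim == 1 the loop body never runs, c keeps its initial value, and the function stores only the terminator in s[0].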
I could not understand what mgetline does in this code.
Can anyone help me?
int mgetline(char s[],int lim)
{
int c, i;
for(i = 0; i < lim - 1 && (c = getchar()) != EOF && c != '\n'; ++i)
s[i] = c;
if(c == '\n')
{
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
The function reads characters one by one from the standard input stream stdin until it sees a \n (newline), reaches end of input, or reaches the array limit of s, lim. The characters are stored in the char s[] and the length of what was read is returned.
It's hard to answer with more detail since it's a little unclear what it is you don't understand, but I've tried to annotate the code to make it somewhat clearer.
This is the same code, only reformatted to fit my comments.
int mgetline(char s[], int lim) {
int c, i;
for(i = 0; // init-statement, start with `i` at zero
i < lim - 1 // condition, `i` must be less than `lim - 1`
&& // condition, logical AND
(c = getchar()) !=EOF // (function call, assignment) condition, `c` must not be EOF
&& // condition, logical AND
c != '\n'; // condition, `c` must not be `\n` (newline)
++i) // iteration_expression, increase i by one
{
s[i] = c; // store the value of `c` in `s[i]`
}
if(c == '\n') { // if a newline was the last character read
s[i] = c; // store it
++i; // and increase i by one
}
s[i] = '\0'; // store a null terminator last
return i; // return the length of the string stored in `s`
}
The condition part of the for loop has three conditions that must all be true for the loop body to execute. I've made that body into a code block to make its scope easier to see. EOF is a special value that getchar() returns when the input stream (stdin) has reached end-of-file or a read error occurs.
Note: If you pass an array of one char (lim == 1) this function will cause undefined behavior. Any program reading uninitialized variables has undefined behavior - and that's a bad thing. In this case, if lim == 1, you will read c after the loop and c will then still be uninitialized.
Either initialize it:
int mgetline(char s[], int lim) {
int c = 0, i;
or bail out of the function:
int mgetline(char s[], int lim) {
if(lim < 2) {
if(lim == 1) s[0] = '\0';
return 0;
}
int c, i;
I'm working on the exercise and am about halfway through, but I've run into something weird that I cannot figure out.
Below is the code snippet. I know it is still steps away from finished, but I think it's worth figuring out why the result comes out like this!
#define MAXLINE 1000
int my_getline(char line[], int maxline);
int main(){
int len;
char line[MAXLINE];/* current input line */
int j;
while((len = my_getline(line, MAXLINE)) > 0 ){
for (j = 0 ; j <= len-1 && line[j] != ' ' && line[j] != '\t'; j++){
printf("%c", line[j]);
}
}
return 0;
}
int my_getline(char s[], int limit){
int c,i;
for (i = 0 ; i < limit -1 && (c = getchar()) != EOF && c != '\n'; i++)
s[i] = c;
if (c == '\n'){
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
It compiles successfully with cc: cc code.c. But the result is puzzling!
It works for lines without tabs or blanks:
hello
hello
but it does not work for the line in the picture:
I typed hel[blank][blank]lo[blank]\n:
Could anyone help me a bit? many thanks!
The problem is that you are trying to read a whole line and then process it. It is better to process the input character by character (most of the K&R exercises work this way). Don't print blanks and tabs as soon as you see them; save them in a buffer instead, and print them only when the next character that is not a blank, a tab, or a newline arrives. If a newline arrives first, the buffered run was trailing and is discarded. The same idea handles blank lines: remember the last character you kept (blanks are not recorded, since trailing ones are eliminated before a newline). If that character was a newline, the newline you have just read is not printed, so only the first of a sequence of two or more newlines is printed. Here is a complete sample program that does this:
#include <stdio.h>
#include <stdlib.h>
#define F(_f) __FILE__":%d:%s: "_f, __LINE__, __func__
int main()
{
char buffer[1000];
int bs = 0;
int last_char = '\n', c;
unsigned long
eliminated_spntabs = 0,
eliminated_nl = 0;
while((c = getchar()) != EOF) {
switch(c) {
case '\t': case ' ':
if (bs >= sizeof buffer) {
/* full buffer, cannot fit more blanks/tabs */
fprintf(stderr,
"we can only hold upto %d blanks/tabs in"
" sequence\n", (int)sizeof buffer);
exit(1);
}
/* add to buffer */
buffer[bs++] = c;
break;
default: /* normal char */
/* print intermediate spaces, if any */
if (bs > 0) {
printf("%.*s", bs, buffer);
bs = 0;
}
/* and the read char */
putchar(c);
/* we only update last_char on nonspaces and
* \n's. */
last_char = c;
break;
case '\n':
/* eliminate the accumulated spaces */
if (bs > 0) {
eliminated_spntabs += bs;
/* this trace to stderr to indicate the number of
* spaces/tabs that have been eliminated.
* Erase it when you are happy with the code. */
fprintf(stderr, "<<%d>>", bs);
bs = 0;
}
if (last_char != '\n') {
putchar('\n');
} else {
eliminated_nl++;
}
last_char = '\n';
break;
} /* switch */
} /* while */
fprintf(stderr,
F("Eliminated tabs: %lu\n"),
eliminated_spntabs);
fprintf(stderr,
F("Eliminated newl: %lu\n"),
eliminated_nl);
return 0;
}
The program prints (on stderr, so as not to interfere with the normal output) the number of eliminated tabs/spaces, surrounded by << and >>, each time it discards a run. At the end it also prints the total number of eliminated blanks/tabs and the number of eliminated newlines. A line containing only spaces is considered a blank line, so it is eliminated. If you don't want lines containing only spaces to be eliminated (the spaces themselves will be eliminated anyway, as they are trailing), just assign the spaces/tabs seen to the variable last_char.
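The hold-back idea can also be sketched on an in-memory string, which makes it easy to try on small inputs. This is only an illustration under stated assumptions: the name strip_trailing_blanks is hypothetical, and because the whole input is already in memory the sketch remembers where a run of blanks started instead of copying the run into a separate buffer.

```c
#include <stddef.h>

/* Hypothetical helper: copy `in` to `out`, dropping blanks/tabs that sit
 * directly before a newline and dropping lines that end up empty.
 * `out` must have room for strlen(in) + 1 bytes. Returns the output length. */
size_t strip_trailing_blanks(const char *in, char *out)
{
    size_t n = 0;          /* characters written to out */
    size_t pending = 0;    /* blanks/tabs seen but not yet written */
    size_t run_start = 0;  /* index in `in` where the pending run began */
    int last = '\n';       /* last non-blank character handled */

    for (size_t i = 0; in[i] != '\0'; ++i) {
        char c = in[i];
        if (c == ' ' || c == '\t') {
            if (pending == 0)
                run_start = i;
            ++pending;               /* hold back until we know what follows */
        } else if (c == '\n') {
            pending = 0;             /* the run was trailing: discard it */
            if (last != '\n')
                out[n++] = '\n';     /* keep the newline only for non-empty lines */
            last = '\n';
        } else {
            for (size_t k = 0; k < pending; ++k)
                out[n++] = in[run_start + k];   /* the run was interior: emit it */
            pending = 0;
            out[n++] = c;
            last = c;
        }
    }
    out[n] = '\0';
    return n;
}
```

Blanks left pending when the input ends are dropped as well, which matches the behaviour of the stream version above.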
In addition to the good answer by @LuisColorado, there are several ways you can look at your problem that may simplify things for you. Rather than using multiple conditionals to check for c == ' ' and c == '\t' and c == '\n', include ctype.h and use isspace() to determine if the current character is whitespace (it also covers '\r', '\v' and '\f'). It is a much clearer way to go.
Regarding the return value: POSIX getline() uses ssize_t as its signed return type, which allows it to return -1 on error. That type is somewhat awkward, but you can do the same with long (or int64_t for a guaranteed exact width).
I am a bit unclear on what you are trying to accomplish, but you appear to want to read the line of input and ignore whitespace. (While POSIX getline() and fgets() both keep the trailing '\n' in the buffer, it may be more advantageous to read (consume) the '\n' but not include it in the buffer filled by my_getline(); that is up to you.) So from the example output provided above, it looks like you want both "hello" and "hel lo " to be read and stored as "hello".
If that is the case, then you can simplify your function as:
long my_getline (char *s, size_t limit)
{
int c = 0;
long n = 0;
while ((size_t)n + 1 < limit && (c = getchar()) != EOF && c != '\n') {
if (!isspace (c))
s[n++] = c;
}
s[n] = 0;
return n ? n : c == EOF ? -1 : 0;
}
The return statement is just the combination of two ternary clauses which will return the number of characters read, including 0 if the line was all whitespace, or it will return -1 if EOF is encountered before a character is read. (a ternary simply being a shorthand if ... else ... statement in the form test ? if_true : if_false)
Also note the choice made above for handling the '\n' was to read the '\n' but not include it in the buffer filled. You can change that by removing the && c != '\n' from the while() test and handling the newline inside the loop instead: store it in the buffer and break. Note that isspace('\n') is true, so the if (!isspace (c)) test on its own would skip the newline rather than store it.
Putting together a short example, you would have:
#include <stdio.h>
#include <ctype.h>
#define MAXC 1024
long my_getline (char *s, size_t limit)
{
int c = 0;
long n = 0;
while ((size_t)n + 1 < limit && (c = getchar()) != EOF && c != '\n') {
if (!isspace (c))
s[n++] = c;
}
s[n] = 0;
return n ? n : c == EOF ? -1 : 0;
}
int main (void) {
char str[MAXC];
long nchr = 0;
fputs ("enter line: ", stdout);
if ((nchr = my_getline (str, MAXC)) != -1)
printf ("%s (%ld chars)\n", str, nchr);
else
puts ("EOF before any valid input");
}
Example Use/Output
With your two input examples, "hello" and "hel lo ", you would have:
$ ./bin/my_getline
enter line: hello
hello (5 chars)
Or with included whitespace:
$ ./bin/my_getline
enter line: hel lo
hello (5 chars)
Testing the error condition by pressing Ctrl + d (or Ctrl + z on Windows):
$ ./bin/my_getline
enter line: EOF before any valid input
There are many ways to put these pieces together, this is just one possible solution. Look things over and let me know if you have further questions.
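As a sketch of the newline-keeping alternative mentioned earlier in this answer: because isspace('\n') is true, the newline has to be stored explicitly before the whitespace test. The name my_getline_keep_nl, the FILE * parameter, and the guard for limit == 0 are assumptions added for illustration and are not part of the code above.

```c
#include <stdio.h>
#include <ctype.h>

/* Hypothetical variant of my_getline(): reads from `fp`, skips other
 * whitespace, but stores the terminating '\n' in the buffer. */
long my_getline_keep_nl(FILE *fp, char *s, size_t limit)
{
    int c = 0;
    long n = 0;

    if (limit == 0)              /* no room for the terminator: touch nothing */
        return 0;

    while ((size_t)n + 1 < limit && (c = getc(fp)) != EOF) {
        if (c == '\n') {
            s[n++] = (char)c;    /* isspace('\n') is true, so store it explicitly */
            break;
        }
        if (!isspace(c))
            s[n++] = (char)c;
    }
    s[n] = '\0';
    return n ? n : c == EOF ? -1 : 0;
}
```

With this variant an empty line comes back as "\n" with length 1, so a return of -1 still only happens when EOF arrives before anything was stored.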
My question is concerning some code from section 1.9-character arrays in the Kernighan and Ritchie book. The code is as follows:
#include <stdio.h>
#define MAXLINE 1000 /* maximum input line size */
int getline(char line[], int maxline);
void copy(char to[], char from[]);
/* print longest input line */
main()
{
int len; /* current line length */
int max; /* maximum length seen so far */
char line[MAXLINE]; /* current input line */
char longest[MAXLINE]; /* longest line saved here */
max = 0;
while ((len = getline(line, MAXLINE)) > 0)
if (len > max) {
max = len;
copy(longest, line);
}
if (max > 0) /* there was a line */
printf("%s", longest);
return 0;
}
/* get line: read a line into s, return length */
int getline(char s[], int lim) //
{
int c, i;
for (i=0; i<lim-1 && ((c=getchar()) != EOF) && c != '\n' ; ++i)
s[i] = c;
if (c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
/* copy : copy 'from' into 'to'; assume to is big enough */
void copy(char to[], char from[])
{
int i;
i = 0;
while ((to[i] = from[i]) != '\0')
++i;
}
My question is about the getline function. Now looking at this following output from my command line as a reference:
me@laptop
$ characterarray.exe
aaaaa
a
aaaaaa
^Z
aaaaaa
me@laptop
$
When I type in the first character which is 'a', does the character 'a' go through the for loop, in the getline function, and initialize s[0] - s[998] = a ? And my second part of the question is once the program leaves the for loop and goes to the
s[i] = '\0'
return i;
wouldn't it be initialized s[998] = '\0' and the return integer is 998? I've spent over an hour staring at this problem and I can't seem to grasp what is going on.
Character 'a' does not go through the for loop. Look here:
for (i=0; i<lim-1 && ((c=getchar()) != EOF) && c != '\n' ; ++i)
c = getchar() is where your program stops and waits for user input, not just once but in a loop! It is part of the condition for continuing the for loop, so getchar() is called each time the condition is checked. The loop handles only one char at a time. Each time, a char is added at the current end of the array (indicated by i), and the loop ends when a newline is entered, when end of input is reached, or when the size limit is reached. This is very compact code, but it is old-school C that is not very readable by today's more relaxed standards. It can be broken into the following parts:
i = 0;
while(i < lim - 1) {
c = getchar();
if (c == EOF || c == '\n') break;
s[i] = c;
++i;
}
s[i] = '\0';
return i;
(also I don't think a newline symbol is needed to be included in the string, so I omitted the
if (c == '\n') {
s[i] = c;
++i;
}
part.)
Here is how the input is working in the for-loop.
Before the for-loop, the array s is empty (ish; it's full of whatever garbage is in that memory). If you type in a single a and hit enter, the for-loop sees 2 characters, 'a' and '\n'. For 'a', the variable c becomes 'a' from getchar() in the for-loop condition, and that gets saved in the i spot (which is 0) of s. So, the array s is now
s[0] = 'a' and the rest of s has random garbage in it.
Then c becomes '\n' from getchar(). This stops the for-loop because of the c != '\n' check. The if-statement has s[i], where i is 1, become '\n' and i bumps up to 2.
Now s is
s[0] = 'a'
s[1] = '\n'
getline finishes up by making s[2] be '\0', which is the end of string character, and returns i.
Your end result is
s[0] = 'a'
s[1] = '\n'
s[2] = '\0'
and i, your length, is 2.
So I'm working through the K&R C book and there was a bug in my code that I simply cannot figure out.
The program is supposed to remove all the comments from a C program. Obviously I'm just using stdin
#include <stdio.h>
int getaline (char s[], int lim);
#define MAXLINE 1000 //maximum number of characters to put into string[]
#define OUTOFCOMMENT 0
#define INASINGLECOMMENT 1
#define INMULTICOMMENT 2
int main(void)
{
int i;
int isInComment;
char string[MAXLINE];
getaline(string, MAXLINE);
for (i = 0; string[i] != EOF; ++i) {
//finds whether loop is in a comment or not
if (string[i] == '/') {
if (string[i+1] == '/')
isInComment = INASINGLECOMMENT;
if (string[i+1] == '*')
isInComment = INMULTICOMMENT;
}
//fixes the problem of print messing up after the comment
if (isInComment == INASINGLECOMMENT && string[i] == '\0')
printf("\n");
//if the line is done, restates all the variables
if (string[i] == '\0') {
getaline(string, MAXLINE);
i = 0;
if (isInComment != INMULTICOMMENT)
isInComment = OUTOFCOMMENT;
}
//prints current character in loop
if(isInComment == OUTOFCOMMENT && string[i] != EOF)
printf("%c", string[i]);
//checks to see of multiline comment is over
if(string[i] == '*' && string[i+1] == '/' ) {
++i;
isInComment = OUTOFCOMMENT;
}
}
return 0;
}
So this works great except for one problem. Whenever a line starts with a comment, it prints that comment.
So for instance, if I had a line that was simply
//this is a comment
without anything before the comment begins, it will print that comment even though it's not supposed to.
I thought I was making good progress, but this bug has really been holding me up. I hope this isn't some super easy thing I've missed.
EDIT: Forgot the getaline function
//puts line into s[], returns length of that line
int getaline(char s[], int lim)
{
int c, i;
for (i = 0; i < lim-1 && (c = getchar()) != '\n'; ++i)
s[i] = c;
if (c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
There are many problems in your code:
isInComment is not initialized in function main.
As pointed out by others, string[i] != EOF is wrong. You need to test for end of file more precisely, especially for files that do not end with a linefeed. This test only works if the char type is signed and EOF is a valid signed char value. It will nonetheless mistakenly stop on a stray \377 character, which is legal in a string or in a comment.
When you detect the end of line, you read another line and reset i to 0, but i will be incremented by the for loop before you test again for single line comment... hence the bug!
You do not handle special cases such as /* // */ or // /*
You do not handle strings. This is not a comment: "/*", nor this: '//'
You do not handle \ at end of line (escaped linefeed). This can be used to extend single line comments, strings, etc. There are more subtle cases related to \ handling and if you really want completeness, you should handle trigraphs too.
Your implementation has a limit for line size, this is not needed.
The problem you are assigned is a bit tricky. Instead of reading and parsing lines, read one character at a time and implement a state machine to parse escaped linefeeds, strings, and both comment styles. The code is not too difficult if you do it right with this method.
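A minimal sketch of such a state machine is shown here, working on an in-memory string so it is easy to experiment with. It is not a complete solution: the name strip_comments and the state names are hypothetical, escaped linefeeds and trigraphs are deliberately left out, and a removed comment is simply dropped rather than replaced by a space.

```c
#include <stddef.h>

enum state { CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR,
             STRING, STRING_ESC, CHAR_LIT, CHAR_ESC };

/* Hypothetical helper: copy `src` to `dst` without C comments.
 * `dst` needs room for strlen(src) + 1 bytes. Returns the output length. */
size_t strip_comments(const char *src, char *dst)
{
    enum state st = CODE;
    size_t n = 0;

    for (; *src != '\0'; ++src) {
        char c = *src;
        switch (st) {
        case CODE:
            if (c == '/') {
                st = SLASH;               /* might start a comment: hold it back */
            } else {
                if (c == '"')       st = STRING;
                else if (c == '\'') st = CHAR_LIT;
                dst[n++] = c;
            }
            break;
        case SLASH:
            if (c == '/')       st = LINE_COMMENT;
            else if (c == '*')  st = BLOCK_COMMENT;
            else {
                dst[n++] = '/';           /* it was an ordinary slash */
                st = CODE;
                --src;                    /* re-examine c in the CODE state */
            }
            break;
        case LINE_COMMENT:
            if (c == '\n') { dst[n++] = c; st = CODE; }   /* keep the newline */
            break;
        case BLOCK_COMMENT:
            if (c == '*') st = BLOCK_STAR;
            break;
        case BLOCK_STAR:
            if (c == '/')       st = CODE;                /* comment closed */
            else if (c != '*')  st = BLOCK_COMMENT;
            break;
        case STRING:
            dst[n++] = c;
            if (c == '\\')      st = STRING_ESC;
            else if (c == '"')  st = CODE;
            break;
        case STRING_ESC:
            dst[n++] = c;                 /* escaped character, e.g. \" */
            st = STRING;
            break;
        case CHAR_LIT:
            dst[n++] = c;
            if (c == '\\')      st = CHAR_ESC;
            else if (c == '\'') st = CODE;
            break;
        case CHAR_ESC:
            dst[n++] = c;
            st = CHAR_LIT;
            break;
        }
    }
    if (st == SLASH)
        dst[n++] = '/';                   /* input ended right after a lone slash */
    dst[n] = '\0';
    return n;
}
```

Because every decision depends only on the current state and the current character, a comment at the very start of a line needs no special handling, and comment markers inside string or character literals are copied through untouched.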
if (string[i] == '\0') {
getaline(string, MAXLINE);
i = 0;
if (isInComment != INMULTICOMMENT)
isInComment = OUTOFCOMMENT;
}
When you start a new line, you initialize i to 0. But then in the next iteration:
for (i = 0; string[i] != EOF; ++i)
i will be incremented, so you'll begin the new line with index 1. Therefore there is a bug when the line begins with //.
You can see that it solves the problem if you write instead:
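if (string[i] == '\0') {
getaline(string, MAXLINE);
i = -1; /* the for loop's ++i brings this back to 0 */
if (isInComment != INMULTICOMMENT)
isInComment = OUTOFCOMMENT;
continue; /* skip the rest of the body so string[-1] is never read */
}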
though it's usually considered bad style to modify for loop indices inside the loop. You may want to redesign your implementation in a more readable way.
Here's something that looks like a bug to me, however I am confused that my observation does not seem to pop up anywhere else on the internet, given the age and popularity of the book. Or maybe I am just bad at searching or it is not a bug at all.
I am talking about the "print out the longest input line" program from chapter one. Here's the code:
#include <stdio.h>
#define MAXLINE 1000 /* maximum input line length */
int getline(char line[], int maxline);
void copy(char to[], char from[]);
/* print the longest input line */
main()
{
int len; /* current line length */
int max; /* maximum length seen so far */
char line[MAXLINE]; /* current input line */
char longest[MAXLINE]; /* longest line saved here */
max = 0;
while ((len = getline(line, MAXLINE)) > 0)
if (len > max) {
max = len;
copy(longest, line);
}
if (max > 0) /* there was a line */
printf("%s", longest);
return 0;
}
/* getline: read a line into s, return length */
int getline(char s[],int lim)
{
int c, i;
for (i=0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
s[i] = c;
if (c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
/* copy: copy 'from' into 'to'; assume to is big enough */
void copy(char to[], char from[])
{
int i;
i = 0;
while ((to[i] = from[i]) != '\0')
++i;
}
Now, it seems to me that it should be lim-2 as opposed to lim-1 in the for condition of getline. Otherwise, when the input is exactly of maximum length, that is 999 characters followed by '\n', getline will index into s[MAXLINE], which is out of bounds, and also all kinds of horrible things might happen when copy is called and from[] does not end with a '\0'.
I think you're confused somewhere. This loop condition:
for (i=0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
Ensures that i is never greater than lim - 2 inside the loop body, so in the maximum-length case i is lim-1 after the loop exits, and the null character is stored into that last position.
In the case of 999 non-\n characters followed by a \n, c never equals \n. When the for loop exits, c is equal to the last non-newline character.
So c != '\n', the block that does ++i is not entered, and i never goes out of bounds.
When the input is 999 characters long and is followed by \n, lim is 1000, so lim - 1 is 999 and the loop test i < lim - 1 becomes false when i reaches 999, before the \n is read. The last character stored by the loop is s[998], c == '\n' is never true during this call, and the '\0' goes into s[999], not s[1000].
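The boundary case can be reproduced with a small sketch. The loop is the book's, unchanged; the only assumptions are the name getline_from and the FILE * parameter, introduced here so the 999-character input can come from a file instead of being typed.

```c
#include <stdio.h>

#define MAXLINE 1000

/* The book's getline() logic, reading from `fp` instead of stdin
 * (a hypothetical change, made only so the case is easy to reproduce). */
int getline_from(FILE *fp, char s[], int lim)
{
    int c = 0, i;

    for (i = 0; i < lim - 1 && (c = getc(fp)) != EOF && c != '\n'; ++i)
        s[i] = (char)c;
    /* With 999 non-newline characters available, the loop stops on
     * i < lim - 1 with i == 999; the '\n' has not been read yet. */
    if (c == '\n') {
        s[i] = (char)c;
        ++i;
    }
    s[i] = '\0';   /* i is at most lim - 1, the last valid index */
    return i;
}
```

Called with a 1000-byte buffer on 999 'a' characters followed by '\n', this returns 999 with the terminator in s[999]; the newline is left in the stream and comes back from the next call as a separate line of length 1.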