getline function definition in K&R - c

The following is the getline implementation from K&R, page 29:
# define MAXLINE 1000
int getLine(char s[], int lim)
{
int c ,i ;
for(i = 0; i < lim - 1 && (c = getchar()) != EOF && c!= '\n'; ++i)
s[i] = c;
if(c == '\n'){
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
I don't understand why we need "i < lim - 1" in the for loop. For correct indexing, why doesn't "i < lim" suffice?
Any help would be much appreciated...

The function builds a string (stored in the array passed as the parameter char s[]), that is, a sequence of characters ending with the terminating null character '\0'.
So one element of the array is reserved for the terminating null character, which is appended to the array before the function returns:
s[i] = '\0';
For example, if lim is equal to 5, you can store at most 4 characters in the array, at indices [0, 3]. If all lim-1 characters are filled in, the terminating null character '\0' is written to the last element of the array, at index 4.
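The lim == 5 case can be checked with a small sketch. To make it runnable without typing input, it reads from a string through next_char, a hypothetical stand-in for getchar() that is not part of K&R. The function is called read_line here (also my naming), partly because POSIX declares a different getline in <stdio.h>.

```c
#include <stdio.h>

/* Hypothetical stand-in for getchar(): returns the next byte of *src,
   or EOF once the terminating '\0' of the source string is reached. */
static int next_char(const char **src)
{
    if (**src == '\0')
        return EOF;
    return (unsigned char)*(*src)++;
}

/* Same logic as the K&R getline, but reading from *src instead of stdin. */
int read_line(const char **src, char s[], int lim)
{
    int c = 0, i;

    for (i = 0; i < lim - 1 && (c = next_char(src)) != EOF && c != '\n'; ++i)
        s[i] = c;
    if (c == '\n') {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';    /* i is at most lim - 1, so this stays inside s */
    return i;
}
```

With a 5-element buffer and the input "abcdefgh\n", a call stores "abcd" at indices 0 to 3, writes '\0' at index 4 and returns 4; the rest of the line is left for the next call. With "i < lim" the loop would store five characters and the '\0' would land at index 5, one past the end of the array.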

Related

Can anyone tell me what mgetline is doing? I can't understand it

I could not understand what mgetline does in this code.
Can anyone help me?
int mgetline(char s[],int lim)
{
int c, i;
for(i = 0; i < lim - 1 && (c = getchar()) != EOF && c != '\n'; ++i)
s[i] = c;
if(c == '\n')
{
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
The function reads characters one by one from the standard input stream stdin until it reads a \n (newline), reaches the end of input (EOF), or has stored lim - 1 characters, which is one less than the size lim of the array s. The characters are stored in char s[] and the length of what was stored is returned.
It's hard to answer with more detail since it's a little unclear what it is you don't understand, but I've tried to annotate the code to make it somewhat clearer.
This is the same code, only reformatted to fit my comments.
int mgetline(char s[], int lim) {
int c, i;
for(i = 0; // init-statement, start with `i` at zero
i < lim - 1 // condition, `i` must be less than `lim - 1`
&& // condition, logical AND
(c = getchar()) !=EOF // (function call, assignment) condition, `c` must not be EOF
&& // condition, logical AND
c != '\n'; // condition, `c` must not be `\n` (newline)
++i) // iteration_expression, increase i by one
{
s[i] = c; // store the value of `c` in `s[i]`
}
if(c == '\n') { // if a newline was the last character read
s[i] = c; // store it
++i; // and increase i by one
}
s[i] = '\0'; // store a null terminator last
return i; // return the length of the string stored in `s`
}
In the condition part of the for loop there are three conditions that must all be true for the loop to execute its body. I've made the body into a code block to make it easier to see the scope. EOF is a special value that getchar() returns when the input stream (stdin) has reached its end or a read error occurs.
Note: If you pass an array of one char (lim == 1), this function has undefined behavior. Any program reading uninitialized variables has undefined behavior, and that's a bad thing. In this case, if lim == 1, the loop condition fails before getchar() is ever called, so c is read after the loop while it is still uninitialized.
Either initialize it:
int mgetline(char s[], int lim) {
int c = 0, i;
or bail out of the function:
int mgetline(char s[], int lim) {
if(lim < 2) {
if(lim == 1) s[0] = '\0';
return 0;
}
int c, i;
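For reference, here is what the "bail out" variant might look like as a complete function. This is only a sketch: it reads from a string through next_char, a hypothetical stand-in for getchar(), and the name read_line_guarded is mine.

```c
#include <stdio.h>

/* Hypothetical stand-in for getchar(): returns the next byte of *src,
   or EOF once the terminating '\0' of the source string is reached. */
static int next_char(const char **src)
{
    if (**src == '\0')
        return EOF;
    return (unsigned char)*(*src)++;
}

int read_line_guarded(const char **src, char s[], int lim)
{
    int c, i;

    if (lim < 2) {          /* no room for any character */
        if (lim == 1)
            s[0] = '\0';    /* room for the terminator only */
        return 0;
    }
    /* lim >= 2 here, so the first condition holds for i == 0 and
       c is always assigned before it is read after the loop. */
    for (i = 0; i < lim - 1 && (c = next_char(src)) != EOF && c != '\n'; ++i)
        s[i] = c;
    if (c == '\n') {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';
    return i;
}
```

Called with lim == 1, it stores an empty string and returns 0 without consuming any input.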

function to read a line on c programming language

In the C book (second edition, page 29), I read the following:
/* getline: read a line into s, return length */
int getline(char s[], int lim)
{
int c, i;
for(i=0; i<lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
s[i] = c;
if(c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
My question is: why is it i<lim-1 and not i<lim in the for condition test? (PS: lim is the maximum length of a line.)
Question 2: In C, is EOF counted as a character?
Space needs to be reserved for the null terminator \0 that is appended to the string after the loop. (This is how strings are modelled in C.)
EOF is not a character; it is a special value that denotes the end of a file. Note how getchar() returns an int: this is chiefly so the value of EOF doesn't have to be within the range of a char.
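A small sketch of why the int return type matters. getchar() reports a byte as an unsigned char converted to int, while EOF is a negative int, so the two cannot collide as long as the result is kept in an int. The helper names below are made up for illustration.

```c
#include <limits.h>
#include <stdio.h>

/* Mimics how getchar() reports a byte: as an unsigned char converted to int. */
int byte_as_getchar_result(unsigned char byte)
{
    return byte;
}

/* Returns 1 if no byte value can be mistaken for EOF when held in an int. */
int eof_is_out_of_band(void)
{
    int b;

    for (b = 0; b <= UCHAR_MAX; ++b)
        if (byte_as_getchar_result((unsigned char)b) == EOF)
            return 0;
    return 1;
}
```

This is also why c is declared as int in getline: if the result were stored in a char first, a byte such as 0xFF could become indistinguishable from EOF on some platforms, or EOF could never be detected on others.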

K&R - Section 1.9 - Character Arrays (concerning the getline function)

My question is concerning some code from section 1.9-character arrays in the Kernighan and Ritchie book. The code is as follows:
#include <stdio.h>
#define MAXLINE 1000 /* maximum input line size */
int getline(char line[], int maxline);
void copy(char to[], char from[]);
/* print longest input line */
main()
{
int len; /* current line length */
int max; /* maximum length seen so far */
char line[MAXLINE]; /* current input line */
char longest[MAXLINE]; /* longest line saved here */
max = 0;
while ((len = getline(line, MAXLINE)) > 0)
if (len > max) {
max = len;
copy(longest, line);
}
if (max > 0) /* there was a line */
printf("%s", longest);
return 0;
}
/* get line: read a line into s, return length */
int getline(char s[], int lim) //
{
int c, i;
for (i=0; i<lim-1 && ((c=getchar()) != EOF) && c != '\n' ; ++i)
s[i] = c;
if (c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
/* copy : copy 'from' into 'to'; assume to is big enough */
void copy(char to[], char from[])
{
int i;
i = 0;
while ((to[i] = from[i]) != '\0')
++i;
}
My question is about the getline function. Now looking at this following output from my command line as a reference:
me#laptop
$ characterarray.exe
aaaaa
a
aaaaaa
^Z
aaaaaa
me#laptop
$
When I type in the first character which is 'a', does the character 'a' go through the for loop, in the getline function, and initialize s[0] - s[998] = a ? And my second part of the question is once the program leaves the for loop and goes to the
s[i] = '\0'
return i;
wouldn't it be initialized s[998] = '\0' and the return integer is 998? I've spent over an hour staring at this problem and I can't seem to grasp what is going on.
No, a single 'a' does not run through the whole for loop and fill s[0] to s[998]. Look here:
for (i=0; i<lim-1 && ((c=getchar()) != EOF) && c != '\n' ; ++i)
c = getchar() is where your program stops and waits for user input, not just once but on every pass through the loop. It is part of the loop condition, so getchar() is called each time the condition is checked. The loop handles only one character at a time: each character is stored at the current end of the array (indicated by i), and the loop ends when a newline is read, when EOF is reached, or when the size limit is hit. This is very compact code, in an old-school C style that many find hard to read today. It can be broken into the following parts:
i = 0;
while(i < lim - 1) {
c = getchar();
if (c == EOF || c == '\n') break;
s[i] = c;
++i;
}
s[i] = '\0';
return i;
(Also, I don't think the newline needs to be included in the string, so I omitted the
if (c == '\n') {
s[i] = c;
++i;
}
part.)
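One thing to watch if the newline is left out: an empty line then has length 0, which the caller's "> 0" test cannot tell apart from end of input. The sketch below compares the two variants; it reads from a string through next_char, a hypothetical stand-in for getchar(), and the function names are mine.

```c
#include <stdio.h>

/* Hypothetical stand-in for getchar(): returns the next byte of *src,
   or EOF once the terminating '\0' of the source string is reached. */
static int next_char(const char **src)
{
    if (**src == '\0')
        return EOF;
    return (unsigned char)*(*src)++;
}

/* K&R behaviour: the newline is stored and counted. */
int read_line_keep_nl(const char **src, char s[], int lim)
{
    int c = 0, i;

    for (i = 0; i < lim - 1 && (c = next_char(src)) != EOF && c != '\n'; ++i)
        s[i] = c;
    if (c == '\n') {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';
    return i;
}

/* The while-loop variant: the newline is consumed but not stored. */
int read_line_drop_nl(const char **src, char s[], int lim)
{
    int c, i = 0;

    while (i < lim - 1) {
        c = next_char(src);
        if (c == EOF || c == '\n')
            break;
        s[i] = c;
        ++i;
    }
    s[i] = '\0';
    return i;
}
```

For the input "\nabc\n", the first variant returns 1, then 4, then 0. The second returns 0 on the very first call, so a loop like while ((len = getline(line, MAXLINE)) > 0) would stop at the first empty line even though "abc" is still unread.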
Here is how the input is working in the for-loop.
Before the for-loop, the array s is empty (ish; it's full of whatever garbage is in that memory). If you type in a single a and hit enter, the for-loop goes through 2 characters, 'a' and '\n'. For 'a', the variable c becomes 'a' from getchar() in the for-loop condition, and that gets saved in the i spot (which is 0) of s. So, the array s is now
s[0] = 'a' and the rest of s has random garbage in it.
Then c becomes '\n' from getchar(). This stops the for-loop because of the c != '\n' check. The if-statement then sets s[i], where i is 1, to '\n' and bumps i up to 2.
Now s is
s[0] = 'a'
s[1] = '\n'
getline finishes up by making s[2] be '\0', which is the end of string character, and returns i.
Your end result is
s[0] = 'a'
s[1] = '\n'
s[2] = '\0'
and i, your length, is 2.
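To see how the whole session from the question plays out, here is a sketch that feeds the three typed lines through the same logic and keeps the longest one, as main() does. It reads from a string via next_char, a hypothetical stand-in for getchar(); read_line and longest_line are also names I made up.

```c
#include <stdio.h>
#include <string.h>

#define MAXLINE 1000

/* Hypothetical stand-in for getchar(): returns the next byte of *src,
   or EOF once the terminating '\0' of the source string is reached. */
static int next_char(const char **src)
{
    if (**src == '\0')
        return EOF;
    return (unsigned char)*(*src)++;
}

/* Same logic as the K&R getline, but reading from *src instead of stdin. */
static int read_line(const char **src, char s[], int lim)
{
    int c = 0, i;

    for (i = 0; i < lim - 1 && (c = next_char(src)) != EOF && c != '\n'; ++i)
        s[i] = c;
    if (c == '\n') {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';
    return i;
}

/* Mirrors the loop in main(): copies the longest line of input into
   longest (at least MAXLINE chars) and returns its length. */
int longest_line(const char *input, char longest[])
{
    char line[MAXLINE];
    int len, max = 0;

    longest[0] = '\0';
    while ((len = read_line(&input, line, MAXLINE)) > 0)
        if (len > max) {
            max = len;
            strcpy(longest, line);
        }
    return max;
}
```

For the input "aaaaa\na\naaaaaa\n" the three calls return 6, 2 and 7 (each count includes the newline), and the fourth call returns 0 at end of input, which ends the loop. The result is "aaaaaa\n" with length 7, nowhere near 998.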

Embracing/or not ++i in one part of a code. K&R longest line example

int getline(char s[], int lim)
{
int c, i;
for(i=0; i<lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
s[i] = c;
if(c=='\n'){
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
This example is from the K&R book on C, chapter 1.9 on arrays. What I do not understand is why we have to enclose ++i in the braces of the if statement. Writing it outside should do the same work.
if(c=='\n')
s[i] = c;
++i;
s[i] = '\0'
return 0;
}
With ++i inside the braces the program works as intended, but in the second case (which in my opinion should do the same work, and is why I edited that part) it doesn't. I ran it through a debugger and watched i, which in both cases was correctly calculated and returned. But the program still won't work without the braces around ++i. I don't get the output from my printf statement, and Ctrl+D just won't work in the terminal or in XTerm (through CodeBlocks). I can't figure out why. Any hint, please? Am I missing some logical step? Here is the complete code:
//Program that reads lines and prints the longest
/*----------------------------------------------------------------------------*/
#include <stdio.h>
#define MAXLINE 1000
int getline(char currentline[], int maxlinelenght);
void copy(char saveto[], char copyfrom[]);
/*----------------------------------------------------------------------------*/
int main(void)
{
int len, max;
char line[MAXLINE], longest[MAXLINE];
max = 0;
while( (len = getline(line, MAXLINE)) > 0 )
if(len > max){
max = len;
copy(longest, line);
}
if(max > 0)
printf("StrLength:%d\nString:%s", max, longest);
return 0;
}
/*----------------------------------------------------------------------------*/
int getline(char s[], int lim)
{
int c, i;
for(i=0; i<lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
s[i] = c;
if(c=='\n'){
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
/*----------------------------------------------------------------------------*/
void copy(char to[], char from[])
{
int i;
i = 0;
while( (to[i]=from[i]) != '\0')
++i;
}
/*----------------------------------------------------------------------------*/
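When the loop stops because of the input rather than the length limit, the line
if(c == '\n')
is equivalent to
if(c != EOF)
since '\n' and EOF are the only two values of c that end the loop in that case. Does that help explain why the braces are there?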
There is a logic there:
if(c=='\n'){
s[i] = c;
++i;
}
It means that only if you read a newline do you store it and increment i once more, to make room for the \0 character. If you put ++i outside the if block, i is always increased by 1, even when no newline was read. In that case i was already incremented in the for loop, so it already points at the slot for \0, and incrementing it again is wrong. You can print the value of i to see how it works.
The index specified by i is the location where the terminating null should be placed when there is no more input for the line. The location just before the index i contains the last valid character in the string.
Keep in mind that the loop that reads data from stdin can terminate for reasons other than reading a \n character.
If you had this construct:
if(c=='\n')
s[i] = c;
++i;
then if the last character read from stdin wasn't a newline you would increment the index by one without writing anything into the location specified by the pre-incremented value of i. You would be effectively adding an unspecified character to the result.
Worse(?), if the for loop terminated because of the i<lim-1 condition you would end up writing the terminating null character after the specified end of the array, resulting in undefined behavior (memory corruption).
The ++i is inside the if statement because we do not want to increment i if we are not placing the \n character in the current index; that would result in leaving an index in between the last character of the input and the \0 at the end of the character string.
The for loop can exit for one of three reasons, which fall into two cases:
1. The character limit was reached or EOF was encountered
2. A newline was encountered
In the first case we need to store the null character into string s. Since i already points to the position after the last valid character read, there is no need to increment i.
In the second case i also points to the position after the last valid character read, so we store the newline at that position and then increment i before storing the null character.
That's why we need to increment i in the second case but not in the first.
if(c=='\n'){
s[i] = c;
++i;
}
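The Ctrl+D symptom can be reproduced with a small sketch of the unbraced version. It assumes the edited function still ends with return i, and it reads from a string through next_char, a hypothetical stand-in for getchar(); the name read_line_unbraced is mine.

```c
#include <stdio.h>

/* Hypothetical stand-in for getchar(): returns the next byte of *src,
   or EOF once the terminating '\0' of the source string is reached. */
static int next_char(const char **src)
{
    if (**src == '\0')
        return EOF;
    return (unsigned char)*(*src)++;
}

/* The edited version: without braces only the assignment belongs to
   the if, and ++i runs on every call. */
int read_line_unbraced(const char **src, char s[], int lim)
{
    int c = 0, i;

    for (i = 0; i < lim - 1 && (c = next_char(src)) != EOF && c != '\n'; ++i)
        s[i] = c;
    if (c == '\n')
        s[i] = c;
    ++i;            /* not part of the if */
    s[i] = '\0';
    return i;
}
```

For a normal line such as "ab\n" this returns 3, the same as the braced version, which fits what the debugger showed. At end of input, however, it returns 1 instead of 0, and it does so on every further call, so while ((len = getline(line, MAXLINE)) > 0) never ends and the printf after the loop is never reached.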

Newbie C Array question over functions

I bought a C book called "The C (ANSI C) PROGRAMMING LANGUAGE" to try and teach myself, well C. Anyhow, the book includes a lot of examples and practices to follow across the chapters, which is nice.
Anyhow, the code below is my answer to the book's "print the longest line" type of program. The authors use a for-loop in the function getLine(char s[], int lim), which allows for a proper display of the string line inside the main() function. However, using while won't work, for a reason unknown to me. Perhaps someone can shed some light on what my error is.
EDIT: To summarize the above. printf("%s\n", line); won't display anything.
Thankful for any help.
#include <stdio.h>
#define MAXLINE 1024
getLine(char s[], int lim) {
int c, i = 0;
while((c = getchar()) != EOF && c != '\n' && i < lim) {
s[++i] = c;
}
if(c == '\n' && i != 0) {
s[++i] = c;
s[++i] = '\0';
}
return i;
}
main(void) {
int max = 0, len;
char line[MAXLINE], longest[MAXLINE];
while((len = getLine(line,MAXLINE)) > 0) {
if(len > max) {
max = len;
printf("%s\n", line);
}
}
return 0;
}
You have a number of serious bugs. Here are the ones I found and how to fix them.
Change your code to postincrement i to avoid leaving the first array member uninitialised, and to avoid double printing the final character:
s[++i] = c;
...
s[++i] = c;
s[++i] = '\0';
to
s[i++] = c;
...
// s[++i] = c; see below
...
s[i] = '\0'; // no increment: the terminator is not part of the length
and fix your EOF bug:
if(c == '\n' && i != 0) {
s[++i] = c;
s[++i] = '\0';
}
to
if(c == '\n')
{
s[i++] = '\n';
}
s[i] = '\0';
Theory
When writing programs that deal with strings, arrays or other vector-type structures it is vitally important that you check the logic of your program. You should do this by hand, and run a few sample cases through it, providing sample inputs to your program and thinking out what happens.
The cases you need to run through it are:
a couple general cases
all the edge cases
In this case, your edge cases are:
first character ever is EOF
first character is 'x', second character ever is EOF
first character is '\n', second character is EOF
first character is 'x', second character is '\n', third character is EOF
a line has equal to lim characters
a line has one less than lim characters
a line has one more than lim characters
Sample edge case
first character is 'x', second character is '\n', third character is EOF
getLine(line[MAXLINE], MAXLINE)
  (s := line[MAXLINE] = '!!!!!!!!!!!!!!!!!!!!!!!!...')
  c := undef, i := 0
  while(...)
    c := 'x'
    i := 1
    s[1] := 'x' => s == '!x!!!!...' <- first bug found
  while(...)
    c := '\n'
    end of while(...)
  if (...)
    (c == '\n' (T) && i != 0 (T)) = T
    i := i + 1 = 2
    s[2] = '\n' => s == '!x\n!!!!'
    i := i + 1 = 3
    s[3] = '\0' => s == '!x\n\0!!!' <- good, it's terminated
  return i = 3
((len = i = 3) > 0) = T (the while eval)
if (...)
  len (= 3) > max (= 0) = T
  max = 3 <- does this make sense? is '!x\n' a line 3 chars long? perhaps. why did we keep the '\n' character? this is likely to be another bug.
  printf("%s\n", line) <- oh, we are adding ANOTHER \n character? it was definitely a bug.
  outputs "!x\n\n\0" <- oh, I expected it to print "x\n". we know why it didn't.
while(...)
  getLine(...)
    (s := line[MAXLINE] = '!x\n\0!!!!!!!!!!!!!!!!!!!...') <- oh, that's fun.
    c := undef, i := 0
    while(...)
      c := EOF
      while terminates without executing body
    (c == '\n' && i != 0) = F
    if body not executed
    return i = 0
  ((len = i = 0) > 0) = F
  while terminates
program stops.
So you see this simple process, that can be done in your head or (preferably) on paper, can show you in a matter of minutes whether your program will work or not.
By following through the other edge cases and a couple general cases you will discover the other problems in your program.
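If you would rather let the machine do the hand trace, the edge cases listed above can be written down as a small check. This is only a sketch: it uses a getLine with the fixes described above, plus a loop bound of lim - 1 (my assumption, so the terminator always fits), and it reads from a string through next_char, a hypothetical stand-in for getchar(). check_case and edge_cases_pass are made-up helper names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for getchar(): returns the next byte of *src,
   or EOF once the terminating '\0' of the source string is reached. */
static int next_char(const char **src)
{
    if (**src == '\0')
        return EOF;
    return (unsigned char)*(*src)++;
}

/* getLine with postincrement, an unconditional terminator, and the
   bound checked before reading. */
static int read_line(const char **src, char s[], int lim)
{
    int c = 0, i = 0;

    while (i < lim - 1 && (c = next_char(src)) != EOF && c != '\n')
        s[i++] = c;
    if (c == '\n')
        s[i++] = '\n';
    s[i] = '\0';
    return i;
}

/* Runs one call with a buffer of lim chars (1 <= lim <= 16) and reports
   whether the returned length and the stored string match. */
int check_case(const char *input, int lim, int want_len, const char *want)
{
    char buf[16];
    int len;

    memset(buf, '!', sizeof buf);      /* visible garbage, as in the trace */
    len = read_line(&input, buf, lim);
    return len == want_len && strcmp(buf, want) == 0;
}

/* The edge cases from the list, with lim fixed at 4. Returns 1 if all pass. */
int edge_cases_pass(void)
{
    return check_case("", 4, 0, "")             /* EOF first */
        && check_case("x", 4, 1, "x")           /* 'x' then EOF */
        && check_case("\n", 4, 1, "\n")         /* '\n' then EOF */
        && check_case("x\n", 4, 2, "x\n")       /* 'x', '\n', EOF */
        && check_case("abcd\n", 4, 3, "abc")    /* lim characters */
        && check_case("abc\n", 4, 3, "abc")     /* lim - 1 characters */
        && check_case("abcde\n", 4, 3, "abc");  /* lim + 1 characters */
}
```

Filling the buffer with '!' first plays the same role as the '!!!!' garbage in the trace: any slot the function forgets to write shows up in the comparison.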
It's not clear from your question exactly what problem you're having with getLine (compile error? runtime error?), but there are a couple of bugs in your implementation. Instead of
s[++i] = something;
You should be using the postfix operator:
s[i++] = something;
The difference is that the first version stores 'something' at the index of (i+1), but the second version stores it at the index of i. In C/C++, arrays are indexed from 0, so you need to make sure it stores the character in s[0] on the first pass through your while loop, in s[1] on the second pass, and so on. With the code you posted, s[0] is never assigned to, which will cause the printf() to print out uninitialised data.
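A tiny sketch of the two forms side by side, in case it helps to see which slots get written. The function names are made up for the illustration.

```c
/* Copies src into s the way `s[++i] = c` does; returns the final i. */
int fill_preincrement(char s[], const char *src)
{
    int i = 0;

    while (*src != '\0')
        s[++i] = *src++;    /* i is bumped first, so s[0] is skipped */
    return i;
}

/* Copies src into s the way `s[i++] = c` does; returns the final i. */
int fill_postincrement(char s[], const char *src)
{
    int i = 0;

    while (*src != '\0')
        s[i++] = *src++;    /* s[i] is written, then i is bumped */
    return i;
}
```

Both return 2 for the source "xy", but the first writes s[1] and s[2] and leaves s[0] untouched, while the second writes s[0] and s[1].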
The following implementation of getline works for me:
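int getLine(char s[], int lim) {
int c = 0; /* initialised in case the loop never calls getchar() (lim < 2) */
int i = 0;
/* check the bound first, so no character is read and then dropped */
while(i < lim - 1 && (c = getchar()) != EOF && c != '\n') {
s[i++] = c;
}
if(c == '\n') {
s[i++] = c;
}
s[i] = '\0'; /* always terminate; the terminator is not counted */
return i;
}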
By doing ++i instead of i++, you are not assigning anything to s[0] in getLine()!
Also, you are unnecessarily incrementing i when assigning '\0' at the end, which, by the way, you should always assign, so take it out of the conditional.
Also add return types to the functions (int main and int getLine).
Watch out for the overflow as well: the loop runs up to a limit of i == lim, so you may end up assigning to s[MAXLINE], which is one past the end of the array. That is a buffer overflow on the stack.
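To put a number on the overflow, the sketch below replays the indexing of the getLine from the question and reports the highest index it stores to. The caller passes a buffer with lim + 2 elements so that the replay itself stays in bounds. It reads from a string through next_char, a hypothetical stand-in for getchar(), and highest_index_written is a made-up name.

```c
#include <stdio.h>

/* Hypothetical stand-in for getchar(): returns the next byte of *src,
   or EOF once the terminating '\0' of the source string is reached. */
static int next_char(const char **src)
{
    if (**src == '\0')
        return EOF;
    return (unsigned char)*(*src)++;
}

/* Same indexing as the question's getLine, but returns the highest
   index written (-1 if nothing was written) instead of the length.
   s must have at least lim + 2 elements. */
int highest_index_written(const char **src, char s[], int lim)
{
    int c, i = 0, hi = -1;

    while ((c = next_char(src)) != EOF && c != '\n' && i < lim) {
        s[++i] = c;
        hi = i;
    }
    if (c == '\n' && i != 0) {
        s[++i] = c;
        s[++i] = '\0';
        hi = i;
    }
    return hi;
}
```

With lim == 4, the input "abcdef\n" gives 4, which is already outside a 4-element array, and the input "abc\n" gives 5, because the newline and the terminator are both stored after a further increment.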

Resources