Writing in the location outside of array - c

I've just started learning programming. This is my first post. I'm reading a book "C Programming Language" by Kernighan and Ritchie, and I came across an example that I don't understand (section 1.9, p 30).
This program takes text as input, determines the longest line, and prints it.
Char array line[MAXLINE] is declared, where MAXLINE is 1000. This should mean that the last element of this array has index of MAXLINE-1, which is 999.
However, if you look at function getline, which is being passed line[] array as an argument (and MAXLINE as lim), it appears that if user input is a line longer than MAXLINE, i will be incremented until i = lim, that is, i = MAXLINE. Therefore, the statement line[i] = '\0' will be line[MAXLINE] = '\0'.
This looks wrong to me - how can we write to the line[MAXLINE] location, if the size of line[] is MAXLINE. Wouldn't it be writing into the location outside of the array?
The only explanation I can come up with is that when declaring char array[size], C language actually creates char array[size+1] array, where the last element is reserved for the NULL character. If so, this is pretty confusing, and isn't mentioned in the book. Can anyone confirm this, or explain what's going on?
#include <stdio.h>
#define MAXLINE 1000 /* maximum input line length */
int getline(char line[], int maxline);
void copy(char to[], char from[]);
/* print the longest input line */
main()
{
int len; /* current line length */
int max; /* maximum length seen so far */
char line[MAXLINE]; /* current input line */
char longest[MAXLINE]; /* longest line saved here */
max = 0;
while ((len = getline(line, MAXLINE)) > 0)
if (len > max) {
max = len;
copy(longest, line);
}
if (max > 0) /* there was a line */
printf("%s", longest);
return 0;
}
/* getline: read a line into s, return length */
int getline(char s[],int lim)
{
int c, i;
for (i=0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
s[i] = c;
if (c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
/* copy: copy 'from' into 'to'; assume to is big enough */
void copy(char to[], char from[])
{
int i;
i = 0;
while ((to[i] = from[i]) != '\0')
++i;
}

This for loop appears to be doing the reading in getline:
for (i=0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
s[i] = c;
It looks like i is incremented until it reaches lim - 1, not lim (where lim here is equal to MAXLINE in the case you were talking about). Hence, if the line is longer than MAXLINE, it stops after reading MAXLINE-1 characters, and tacks on the '\0' at the end like you expect.

If you look at this line, you can see that the condition i < lim-1 ends the loop once i reaches lim-1, so the loop body never writes past s[lim-2].
for (i=0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
If the character that ended the loop was a \n, it is appended at index lim-2 at the latest, so the 0 byte lands at index lim-1 at the latest, which is the last element of the array. That is correct, because the 0 byte counts toward the limit.

No, I think it is clean.
Note that since the book was written, POSIX has standardized a getline() function with a completely different interface; this can cause some grief, but it is fixable by renaming the function from K&R.
The code is:
int getline(char s[],int lim)
{
int c, i;
for (i = 0; i < lim-1 && (c=getchar()) != EOF && c != '\n'; ++i)
s[i] = c;
if (c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
Let's consider 2 cases:
998 characters followed by newline.
999 characters followed by newline.
In the first case, when the character before the newline is read, i is 997, which is less than 999 (lim-1), so the getchar() is executed, the character is neither EOF nor newline, and s[997] is assigned, and i is incremented to 998. Since i is still less than 999, the newline is read, and the loop is terminated. Because c is the newline, s[998] is given the newline and i is incremented to 999. Then the assignment s[i] = '\0'; writes to element 999, which is safe.
The analysis in the second case is similar. When the character before the newline is read, i is 998, which is less than 999, so getchar() is executed, the character is neither EOF nor newline, so s[998] is assigned, and i is incremented to 999. Since i is no longer less than 999, the loop exits without reading the newline; since c is not a newline, the body of the if after the loop is not executed; then the null is written to s[999], which is safe.
If EOF is detected before the newline (so the file doesn't end with a newline and technically isn't a text file according to the C standard), the loop is safely broken without overflowing the buffer.
Is there a case that isn't covered?
This is called testing the boundary conditions. It is important to test just below a limit (to make sure it works OK) and at the limit (to ensure it handles that OK). Most of the time, the algorithm doesn't need more than one test just below and one test at the limit; sometimes, if the algorithm handles several numbers either side of a limit (e.g. average of 3 cells), then you have to do more testing at the upper boundary. Lower boundary testing is also important — testing for 0, 1, 2, ... is very valuable.
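The two boundary cases above can also be checked mechanically. The following is a minimal sketch, not the book's code: it renames the function to get_line (to sidestep the POSIX clash), initializes c so that a limit of 1 is harmless, and replaces getchar() with a hypothetical next_char() that reads from a string. Both input and next_char are assumptions made only so the cases can be run without a terminal.

```c
#include <assert.h>
#include <stdio.h>   /* EOF */

/* Hypothetical stand-in for getchar(): reads from a string so the
   boundary cases can be exercised without a terminal. */
const char *input = "";

int next_char(void)
{
    return *input != '\0' ? (unsigned char)*input++ : EOF;
}

/* K&R's getline, renamed, reading through next_char(). */
int get_line(char s[], int lim)
{
    int c = EOF, i;

    assert(lim >= 1);
    for (i = 0; i < lim - 1 && (c = next_char()) != EOF && c != '\n'; ++i)
        s[i] = (char)c;
    if (c == '\n') {
        s[i] = (char)c;
        ++i;
    }
    s[i] = '\0';   /* i is at most lim - 1 here */
    return i;
}
```

With a limit of 5, the input "abc\n" (lim-2 characters plus newline) and the input "abcd\n" (lim-1 characters plus newline) both make get_line return 4 and place the terminator in s[4], the last element. In the second case the newline stays unread and is returned by the next call as a line of its own.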

general answer
reading/writing outside of allocated memory is undefined behaviour.
In many cases it will lead to the dreaded Segmentation fault.
In some cases you might get away due to sheer luck (e.g. because the actual memory you have accessed is physically/logically existing and not used otherwise).
the simple answer is: do not do this!! protect your code against accessing out-of-bounds memory.
C never does any magic, like allocating n+1 bytes when you only asked it to allocate n bytes.
as for your specific example
for (i=0; i < lim-1 /* ... */ ; ++i)
this will not increment i up to lim: the condition keeps i smaller than lim-1 inside the loop body, so as soon as i reaches lim-1 (which is still a valid index within s[]) the for-loop stops.
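To make the "no magic" point concrete, here is a small sketch (the function names line_capacity and literal_size are made up for illustration). sizeof reports exactly the number of elements requested for a char array. The one place where the compiler does append a terminator on its own is a string literal.

```c
#include <assert.h>
#include <stddef.h>  /* size_t */

#define MAXLINE 1000

char line[MAXLINE];  /* same declaration as in the K&R program */

/* Size in bytes of the array declared as char line[MAXLINE]. */
size_t line_capacity(void)
{
    assert(sizeof line == MAXLINE);   /* exactly MAXLINE, not MAXLINE + 1 */
    return sizeof line;
}

/* Size of a string literal: three letters plus the '\0' the compiler adds. */
size_t literal_size(void)
{
    return sizeof "abc";
}
```

Here line_capacity() yields 1000 and literal_size() yields 4, so valid indices for line run from 0 to 999 and nothing beyond that is reserved.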

Related

Kernighan and Ritchie C exercise 1-16

I tried to implement a solution for the exercise on the C language of K&R's book. I wanted to ask here if this could be considered a legal "solution", just modifying the main without changing things inside external functions.
Revise the main routine of the longest-line program so it will
correctly print the length of arbitrary long input lines, and as much
as possible of the text.
#include <stdio.h>
#define MAXLINE 2 ////
int get_line1(char s[], int lim)
{
int c, i;
for (i = 0; i < lim - 1 && ((c = getchar()) != EOF) && c != '\n'; i++) {
s[i] = c;
}
if (c == '\n') {
s[i] = c;
i++;
}
s[i] = '\0';
return i;
}
int main()
{
int len;
int max = MAXLINE;
char line[MAXLINE];
int tot = 0;
int text_l = 0;
while ((len = get_line1(line, max)) > 0) {
if (line[len - 1] != '\n') {
tot = tot + len;
}
if (line[1] == '\n' || line[0] == '\n') {
printf("%d\n", tot + 1);
text_l = text_l + (tot + 1);
tot = 0;
}
}
printf("%d\n", text_l);
}
The idea is to set the max length of the string considered for the array line to 2.
For a string such as abcdef\n, the array line will be ab. Since the last element of the array is not \n (so the line we are considering is not over), we save the length so far and repeat the cycle. We then get the array cd, then ef, and at the end we get the array containing just \n. Then the second if condition is true, since the first element of this array is \n, and we print the tot length obtained from the previous additions. We add 1 in order to also count the newline character \n. This also works for strings of odd length: with abcdefg\n the process goes on until we reach g\n and the sum is computed correctly.
Outside the loop then we print the total amount of text.
Is this a correct way to do the exercise?
The exercise says to “Revise the main routine,” but you altered the definition of MAXLINE, which is outside of main, so that is not a valid solution.
Also, your code does not have the copy or getline routines of the original. Your get_line1 appears to be identical except for the name. However, a correct solution would use identical source code except for the code inside main.
Additionally, the exercise says to print “as much as possible of the text.” That is unclearly stated, but I expect it means to keep a buffer of MAXLINE characters (with MAXLINE at its original value of 1000) and use it to print the first MAXLINE−1 characters of the longest line.
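One possible reading of that requirement, sketched under assumptions rather than offered as the intended solution: leave the line-reading routine alone and let the caller add up the chunks of a long line until a chunk ends in a newline, keeping only the first chunk as the printable text. In this sketch longest_line stands in for the revised body of main, MAXLINE is shrunk to 8 so long lines are easy to produce, and input and next_char are hypothetical replacements for getchar() so the logic can be run on a string.

```c
#include <assert.h>
#include <stdio.h>   /* EOF, NULL */

#define MAXLINE 8    /* deliberately tiny for the demonstration */

/* Hypothetical stand-in for getchar() that reads from a string. */
const char *input = "";

int next_char(void)
{
    return *input != '\0' ? (unsigned char)*input++ : EOF;
}

/* K&R's getline and copy; only the name, the use of next_char(),
   and the initialization of c differ from the book. */
int get_line(char s[], int lim)
{
    int c = EOF, i;

    for (i = 0; i < lim - 1 && (c = next_char()) != EOF && c != '\n'; ++i)
        s[i] = (char)c;
    if (c == '\n') {
        s[i] = (char)c;
        ++i;
    }
    s[i] = '\0';
    return i;
}

void copy(char to[], char from[])
{
    int i = 0;

    while ((to[i] = from[i]) != '\0')
        ++i;
}

/* Revised caller logic: returns the true length of the longest line and
   leaves up to MAXLINE-1 of its leading characters in longest. */
int longest_line(char longest[])
{
    char line[MAXLINE];    /* current chunk */
    char first[MAXLINE];   /* first chunk of the current line */
    int len, total = 0, max = 0;

    assert(longest != NULL);
    longest[0] = '\0';
    while ((len = get_line(line, MAXLINE)) > 0) {
        if (total == 0)
            copy(first, line);           /* a new line starts here */
        total += len;
        if (line[len - 1] == '\n') {     /* the line is complete */
            if (total > max) {
                max = total;
                copy(longest, first);
            }
            total = 0;
        }
    }
    if (total > max) {                   /* final line without a newline */
        max = total;
        copy(longest, first);
    }
    return max;
}
```

For the input "ab\nabcdefghijkl\nxyz\n" this reports a length of 13 for the middle line (12 characters plus the newline) while keeping only "abcdefg" as text, which is as much as a buffer of 8 can hold.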

K&R exercise 1-22 Hints

The exercise states as follows
Write a program to "fold" long input lines into two or more shorter lines after the last non-blank character that occurs before the n-th column of input. Make sure your program does something intelligent with very long lines, and if there are no blanks or tabs before the specified column.
My question is on how to implement the foldStrings function. I have tried some things but none of them worked.
Can you give me some hints on how to do this? Please don't write the solution down; I want to figure it out myself.
I have written some code but I am stuck at the folding part
#include <stdio.h>
#include <string.h>
int getline(char s[], int lim);
void emptystring(char s[]);
void foldStrings(char s[],int len);
int main(){
int len ;
char line[255];
while((len = getline(line,255))>0)
{
foldStrings(line,len);
}
return 0;
}
int getline(char s[],int lim)
{
int c , i ;
for( i = 0 ; i < lim-1 && ( c = getchar()) != EOF && c !='\n';++i)
s[i] = c;
if ( c == '\n')
{
s[i] = c;
++i;
}
return i;
}
void foldStrings(char s[], int len)
{
}
void emptystring(char s[])
{
int i;
int len = strlen(s);
for( i = 0 ; i < len ; ++i){
s[i] = 0 ;
}
}
I am stuck at the foldStrings function.
P.S I am using the empty string function to print the lines, so print a segmented line and then empty it, fill it up again and print it and so on.
Update
I have tried doing the foldStrings, here is one of my implementations
void foldStrings(char s[], int len)
{
int i ;
char temp[255];
for(i = 1;i < len-1 ;++i)
{
if( i % 16 != 0)
{
temp[i-1] = s[i-1];
}
else if(i%16 == 0)
{
printf("%s",temp);
emptystring(temp);
}
}
}
When getline() is done, s is not necessarily null character terminated.
// for( i = 0 ; i < lim-1 && ...
for( i = 0 ; i < lim-2 && ( c = getchar()) != EOF && c !='\n';++i)
...
s[i] = '\0'; // add
return i;
Same for foldStrings(). missing null character.
temp[i-1] = s[i-1];
temp[i] = '\0'; // add
Other problems may exist
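On the terminator point: printf's %s keeps reading until it meets a '\0', so a segment copied into temp has to be terminated before it is printed. As a side note, and only as a sketch, a precision such as %.*s bounds how many characters are output, so a segment can be shown without copying or terminating it first. The helper name format_segment is made up, and snprintf is used instead of printf only so the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>

/* Formats at most n characters of s, starting at index start, into out.
   The precision in "%.*s" stops the output after n characters, so the
   segment itself needs no terminating '\0'. The caller must make sure
   that start lies within s. Returns snprintf's result. */
int format_segment(char out[], size_t outsize,
                   const char s[], int start, int n)
{
    assert(start >= 0 && n >= 0 && outsize > 0);
    return snprintf(out, outsize, "%.*s", n, s + start);
}
```

Calling format_segment(out, sizeof out, "hello world", 6, 3) leaves "wor" in out. The same format string works with printf directly.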
This exercise is a little challenging.
At first, you don't need to buffer at all, as you only have two kinds of read chars (blank/non blank) and for a non blank character you always have to print it, so your main loop can be something like
while((c = getchar()) != EOF) {
...
}
(much the style all exercises in K&R are written)
Note that when blank characters are input, you only have to count them, and reset the counter on \n input.
As you ask not to reveal the final solution, I'll respect that, but the trick is to count characters as you read the line, outputting (and counting) on nonblanks and not outputting (but only counting) on blank characters. If the character read is a blank and you have passed the limit, you'll fold the line (emit a \n).
EDIT 1
In my first attempt to write the code I discovered that when a blank-to-nonblank transition crosses the maximum line length, the line has to be broken at the first blank character, and all the nonblank characters read since then have to be remembered. In that case I need at most one maximum output line length of storage for those nonblank characters, because if more of them arrive, the line must have been broken before that point.
My first attempt will be to store the number of blank characters read so far, followed by a buffer of maximum output line length (not input line length, which is unbounded, as specified in the problem). The possible statuses will be: (to be continued in the next edit)

Char Array truncates itself (K&R 1-16 & 1-17)

In the following code, the char array prints up to 100 characters at location 1 and 2 but at location 3 it only prints 22. What is the reason for this behavior?
#include<stdio.h>
/* print the longest input line */
/*Exercise 1-16. Revise the main routine of the longest-line program so it will correctly print the length of arbitrary long input lines, and as much as possible of the text.*/
#define MAXLENGTH 100
int mygetline(char s[], int limit);
void charcopy(char to[], char from[]);
int main(){
char current[MAXLENGTH];
char longest[MAXLENGTH];
int curlen;
int maxlen;
maxlen = 0;
while( (curlen = mygetline(current, MAXLENGTH)) > 0 ){
if (curlen > 80)
printf("\nvery long:%d; %s\n", curlen, current);//#1# prints 100 digits
if(curlen>maxlen){
maxlen=curlen;
charcopy(longest, current);
printf("\nlonger:%d; %s\n", maxlen, longest);//#2# prints 100 digits
}
}
if (maxlen)//char array seems to truncates itself at scope boundry.
printf("\nlongest:%d; %s\n", maxlen, longest);//#3# prints 22 digits
printf("\nall done!\n");
return 0;
}
int mygetline(char s[], int limit){
int i, c;
for(i=0; i < limit-1 && ((c=getchar()) != EOF) && c != '\n'; ++i)
s[i]=c;
if(c=='\n'){
s[i]=c;
++i;}
else
if(i >= limit-1)
while (((c=getchar()) != EOF) && c != '\n')
++i;
s[i]='\0';
return i-1;
}
void charcopy(char to[], char from[]){
int i;
i=0;
while( (to[i] = from[i]) != '\0'){
++i;}
}
It's the location marked #3 in the comments that prints only 22 characters instead of the full 100. It's very weird.
Edit:
As per Scotts answer, i have changed mygetline to this:
int mygetline(char s[], int limit){
int i, c, k;
for(i=0; i < limit-1 && ((c=getchar()) != EOF) && c != '\n'; ++i)
s[i]=c;
if((c=='\n') && (i < limit -1)){
s[i]=c;
++i;}
else{//if we are over the limit, just store the num of char entered without storing chars
k = 0;
while (((c=getchar()) != EOF) && c != '\n')
++k;}
s[i]='\0';
return i+k;
}
As can be seen, if the input overshoots the limit, the number of extra characters entered is stored in an entirely new variable, k, which does not touch the array. I still get truncation of the last printed line, and I get weird values such as 32770 as line lengths. Why? As can be seen, the array is being babysat and coddled and fed just the precise number of chars and no more.
Edit: The problem with the first listing, as pointed out by Scott, was that I was overshooting the array. The problem with the second mygetline was that k = 0; was set deep inside the if/else nest, so k was uninitialized whenever the else branch did not run. Moving the initialization up to the top of the function seems to solve the second issue.
working mygetline as follows:
int mygetline(char s[], int limit){
int i, c, k;
k=0;
for(i=0; i < limit-1 && ((c=getchar()) != EOF) && c != '\n'; ++i)
s[i]=c;
if((c=='\n') && (i < limit -1)){
s[i]=c;
++i;}
else{//if we are over the limit, just add the num of char entered without storing chars
while (((c=getchar()) != EOF) && c != '\n')
++k;}
s[i]='\0';
return i+k;
}
Ok, so the thing you need to know about C is that it doesn't babysit you at all. If you have an array declared as char foo[4] and try writing to foo[20], C won't complain at all. (It will typically throw a segmentation violation if you write into restricted memory, like NULL, but if you have access to the memory, you can do whatever you want to it.)
So, what happens when you write to an array further than you should? The official answer is "undefined behavior", a blanket term meaning the C standard places no requirements on what happens. However, with most C compilers, the practical result is something called corrupting your stack.
The memory you ask for in any function - including main - is all allocated in a nice single coherent block. So in your main function, you have 100 bytes for current, 100 bytes for longest, 4 bytes for curlen and 4 bytes for maxlen (assuming 32-bit integers. They may also be 64 - again, depends on the compiler.) If you write to current[123], C will let you do it - and it will put whatever you wrote in the place of longest[23]. (Usually. Again, it's technically undefined behavior, so it's not guaranteed this will happen.)
Your problem is the line in mygetline where you set s[i] = '\0';. The problem is, you've let i get bigger than the array. If you printf("i = %d\n", i); right before that line, you'll see that i = 123. Your last line is not as big as your biggest line, so you're overwriting the data in longest that you don't want to overwrite.
There are lots of ways to fix this. The key is that whenever you set something to '\0', you make sure that i <= limit - 1. (You can do this by moving the s[i] = '\0' line to above your while !EOF line and setting it to s[limit - 1] just to make sure you don't go over. You'll also need to add {} to your if statement. Generally, it's a good policy to add them to any if or while statement. They take up a line, but make sure you're coding in the right place.) Remember, the instruction is to get the maximum length, not the maximum string.
I'd be surprised if you're actually seeing 100 characters in the first 2 lines. From what I can tell, you should be seeing 122.
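One way to apply this, sketched under assumptions: keep two separate counters, one for how many characters have been stored and one for how many have been read, so the array index can never follow the line length past the end of the array. The names count_line, input and next_char are hypothetical, and next_char stands in for getchar() so the function can be tried on a string.

```c
#include <assert.h>
#include <stdio.h>   /* EOF */

/* Hypothetical stand-in for getchar() that reads from a string. */
const char *input = "";

int next_char(void)
{
    return *input != '\0' ? (unsigned char)*input++ : EOF;
}

/* Reads one whole line. Stores at most limit-1 characters in s, always
   terminates s, and returns the full line length including the newline. */
int count_line(char s[], int limit)
{
    int c;
    int n = 0;   /* characters read: may exceed limit */
    int j = 0;   /* characters stored: never exceeds limit - 1 */

    assert(limit >= 1);
    while ((c = next_char()) != EOF) {
        if (j < limit - 1)
            s[j++] = (char)c;
        ++n;
        if (c == '\n')
            break;
    }
    s[j] = '\0';   /* j <= limit - 1, so this stays inside the array */
    return n;
}
```

With a limit of 5 and the input "abcdefgh\nxy\n", the first call returns 9 and stores "abcd", the second returns 3 and stores "xy\n", and the third returns 0.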

Bug in K&R second edition?

Here's something that looks like a bug to me, however I am confused that my observation does not seem to pop up anywhere else on the internet, given the age and popularity of the book. Or maybe I am just bad at searching or it is not a bug at all.
I am talking about the "print out the longest input line" program from chapter one. Here's the code:
#include <stdio.h>
#define MAXLINE 1000 /* maximum input line length */
int getline(char line[], int maxline);
void copy(char to[], char from[]);
/* print the longest input line */
main()
{
int len; /* current line length */
int max; /* maximum length seen so far */
char line[MAXLINE]; /* current input line */
char longest[MAXLINE]; /* longest line saved here */
max = 0;
while ((len = getline(line, MAXLINE)) > 0)
if (len > max) {
max = len;
copy(longest, line);
}
if (max > 0) /* there was a line */
printf("%s", longest);
return 0;
}
/* getline: read a line into s, return length */
int getline(char s[],int lim)
{
int c, i;
for (i=0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
s[i] = c;
if (c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
/* copy: copy 'from' into 'to'; assume to is big enough */
void copy(char to[], char from[])
{
int i;
i = 0;
while ((to[i] = from[i]) != '\0')
++i;
}
Now, it seems to me that it should be lim-2 as opposed to lim-1 in the for condition of getline. Otherwise, when the input is exactly of maximum length, that is 999 characters followed by '\n', getline will index into s[MAXLINE], which is out of bounds, and also all kinds of horrible things might happen when copy is called and from[] does not end with a '\0'.
I think you're confused somewhere. This loop condition:
for (i=0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
This ensures that i is never greater than lim - 2 inside the loop body, so in the maximum-length case i is lim-1 after the loop exits, and the null character is stored into that last position.
In the case of 999 non-\n characters followed by a \n, c never equals \n. When the for loop exits, c is equal to the last non-newline character.
So c != '\n' and doesn't enter the block that does i++ so i never goes out of bounds.
When the input is 999 characters followed by \n, lim is 1000, so lim-1 is 999 and the loop test i < lim-1 becomes false when i reaches 999, before the \n is read. So c == '\n' is never true, and the null character is stored at s[999], not s[1000].
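The index arithmetic in these answers can be modeled without any I/O. The function below is a hypothetical model, not part of the book: it assumes a line of n ordinary characters followed by a newline and reports the index at which getline would store '\0' for a given lim.

```c
#include <assert.h>

/* Models K&R's getline for a line of n ordinary characters followed by
   '\n'. Returns the index where the terminating '\0' would be stored. */
int terminator_index(int lim, int n)
{
    int i;

    assert(lim >= 2 && n >= 0);
    for (i = 0; i < lim - 1 && i < n; ++i)
        ;                     /* s[i] = c would happen here */
    if (i < lim - 1)          /* the loop stopped because '\n' was read */
        ++i;                  /* s[i] = '\n' would happen before this */
    return i;                 /* s[i] = '\0' happens here */
}
```

For lim equal to 1000, both terminator_index(1000, 998) and terminator_index(1000, 999) give 999, and no value of n gives more than 999, so s[1000] is never touched.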

Embracing/or not ++i in one part of a code. K&R longest line example

int getline(char s[], int lim)
{
int c, i;
for(i=0; i<lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
s[i] = c;
if(c=='\n'){
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
This example is from K&R book on C, chapter 1.9 on arrays. What I do not understand is why do we have to embrace ++i inside if statement? Writing it outside should do the same work.
if(c=='\n')
s[i] = c;
++i;
s[i] = '\0'
return 0;
}
In case of embracing i program works as intended, but on the second case(which in my opinion should do the same work and this is why I edited that part) it doesn't. I ran it through debugger and watched i which in both cases was correctly calculated and returned. But program still won't work without embracing ++i. I don't get my print from printf statement, and Ctrl+D just won't work in terminal or XTerm(thorough CodeBlocks) I can't figure out why. Any hint please? Am I missing some logical step? Here is a complete code:
//Program that reads lines and prints the longest
/*----------------------------------------------------------------------------*/
#include <stdio.h>
#define MAXLINE 1000
int getline(char currentline[], int maxlinelenght);
void copy(char saveto[], char copyfrom[]);
/*----------------------------------------------------------------------------*/
int main(void)
{
int len, max;
char line[MAXLINE], longest[MAXLINE];
max = 0;
while( (len = getline(line, MAXLINE)) > 0 )
if(len > max){
max = len;
copy(longest, line);
}
if(max > 0)
printf("StrLength:%d\nString:%s", max, longest);
return 0;
}
/*----------------------------------------------------------------------------*/
int getline(char s[], int lim)
{
int c, i;
for(i=0; i<lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
s[i] = c;
if(c=='\n'){
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
/*----------------------------------------------------------------------------*/
void copy(char to[], char from[])
{
int i;
i = 0;
while( (to[i]=from[i]) != '\0')
++i;
}
/*----------------------------------------------------------------------------*/
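The line
if(c == '\n')
is true only when the loop stopped because a newline was read; it is false when the loop stopped at EOF or because i reached lim-1.
Does that help explain why the embracing occurs?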
There is a logic there:
if(c=='\n'){
s[i] = c;
++i;
}
It means that only if you read a newline do you need to increment i once more, in order to keep space for the \0 character. If you put ++i outside the if block, it will always increase i by 1 even when there is no newline in the input. In that case, since i was already incremented in the for loop, there is already space for \0, so incrementing i again would be wrong. You can print the value of i and see how it works.
The index specified by i is the location where the terminating null should be placed when there is no more input for the line. The location just before the index i contains the last valid character in the string.
Keep in mind that the loop that reads data from stdin can terminate for reasons other than reading a \n character.
If you had this construct:
if(c=='\n')
s[i] = c;
++i;
then if the last character read from stdin wasn't a newline you would increment the index by one without writing anything into the location specified by the pre-incremented value of i. You would be effectively adding an unspecified character to the result.
Worse(?), if the for loop terminated because of the i<lim-1 condition you would end up writing the terminating null character after the specified end of the array, resulting in undefined behavior (memory corruption).
The ++i is inside the if statement because we do not want to increment i if we are not placing the \n character in the current index; that would result in leaving an index in between the last character of the input and the \0 at the end of the character string.
The for loop can exit for three reasons, which fall into two cases:
1. The character limit was reached or EOF was encountered
2. A newline was encountered
In the first case we need to store the null character into string s; since i already points to the position after the last valid character read, there is no need to increment i.
In the second case, i also points to the position after the last valid character read, so we store the newline at that position and then increment i to make room for the null character.
That's why we need to increment i in the second case but not in the first.
if(c=='\n'){
s[i] = c;
++i;
}
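The difference between the two placements of ++i can be seen by running both on the same input. This is a sketch under assumptions: read_line, its braced flag, input and next_char are all hypothetical, with next_char standing in for getchar() so the comparison can be run on a string.

```c
#include <assert.h>
#include <stdio.h>   /* EOF */

/* Hypothetical stand-in for getchar() that reads from a string. */
const char *input = "";

int next_char(void)
{
    return *input != '\0' ? (unsigned char)*input++ : EOF;
}

/* braced != 0: K&R's version, ++i only when a newline was stored.
   braced == 0: the edited version, ++i on every call.
   The unbraced variant can write s[lim], so s needs lim + 1 elements
   whenever the limit might be reached. */
int read_line(char s[], int lim, int braced)
{
    int c = EOF, i;

    assert(lim >= 1);
    for (i = 0; i < lim - 1 && (c = next_char()) != EOF && c != '\n'; ++i)
        s[i] = (char)c;
    if (braced) {
        if (c == '\n') {
            s[i] = (char)c;
            ++i;
        }
    } else {
        if (c == '\n')
            s[i] = (char)c;
        ++i;               /* runs whether or not a newline was read */
    }
    s[i] = '\0';
    return i;
}
```

On the input "ab" with no trailing newline, the braced version returns 2 and terminates right after the b, while the unbraced version returns 3 and leaves whatever was in s[2] as part of the string. At end of input the braced version returns 0, but the unbraced version returns 1, so a loop of the form while ((len = getline(...)) > 0) would never see a zero. That would be consistent with Ctrl+D appearing to have no effect.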

Resources