I'm going through the K & R book and the answer to one of the exercises is troubling me.
In the solutions manual, exercise 1-22 declares a char array:
#define MAXCOL 10
char line[MAXCOL];
My understanding is that C arrays are indexed from 0 to n-1. If that's the case, the declaration above allocates a char array of length 10 whose valid indices are 0 through 9, so line[10] would be out of bounds. A function in the sample program is eventually passed an integer value pos that is equal to 10, and the following comparison takes place:
int findblnk(int pos) {
while(pos > 0 && line[pos] != ' ')
--pos;
if (pos == 0) //no blanks in line ?
return MAXCOL;
else //at least one blank
return pos+1; //position after blank
}
If pos is 10 and line[] is only of length 10, then isn't line[pos] out of bounds for the array?
Is it okay to make comparisons this way in C, or could this potentially lead to a segmentation fault? I am sure the solutions manual is right; this just really confused me. I can also post the entire program if necessary. Thanks!
Thanks for the speedy and very helpful responses, I guess it is definitely a bug then. It is called through the following branch:
else if (++pos >= MAXCOL) {
pos = findblnk(pos);
printl(pos);
pos = newpos(pos);
}
MAXCOL is defined as 10, as stated above, so in this branch findblnk(pos) is always called with pos equal to at least 10.
Do you think the solution manual for K & R is worth going through or is it known for having buggy code examples?
It is never, ever okay to over-run the bounds of an array in C. (Or any language really).
If 10 is really passed to that function, that is certainly a bug. While there are better ways of doing it, that function should at least verify that pos is within the bounds of line before attempting to use it as an index.
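As a rough sketch of that kind of check (not the manual's code): the variant below clamps pos to the last valid index before it is used as a subscript. The name findblnk_checked is hypothetical, and passing the buffer and its length as parameters is an assumption made here so the example stands alone; the original uses a global line[MAXCOL].

```c
/* Hypothetical bounds-checked variant of findblnk from the question.
 * Assumption: the buffer and its length are passed in explicitly. */
int findblnk_checked(const char *line, int len, int pos)
{
    if (len <= 0)
        return 0;               /* nothing to scan */
    if (pos >= len)
        pos = len - 1;          /* line[len] does not exist, so clamp */
    if (pos < 0)
        pos = 0;
    while (pos > 0 && line[pos] != ' ')
        --pos;
    if (pos == 0)               /* no blank found above index 0 */
        return len;
    return pos + 1;             /* position after the blank */
}
```

With a 10-character line, calling findblnk_checked(line, 10, 10) starts the scan at index 9 instead of reading line[10].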
If pos is indeed 10, then this is an out-of-bounds access. Accessing an array out of bounds is undefined behavior, so anything can happen, including a program that appears to work properly at the time; the results are unreliable. Annex J.2 (undefined behavior) of the draft C99 standard contains the following bullet:
An array subscript is out of range, even if an object is apparently accessible with the
given subscript (as in the lvalue expression a[1][7] given the declaration int
a[4][5]) (6.5.6).
I don't have a copy of K&R handy, but the errata do not list anything for this problem. My best guess is that the condition should be < instead of >=.
The code above is fine as long as pos is at most 9 when it's passed to that function. If pos == 10 when it's passed, then it's undefined behaviour and, you are correct, it should be avoided.
However, it may or may not cause a segmentation fault.
my_type buffer[SOME_CONSTANT_NAME]; almost always is a bug.
Code like the one you present in the question is the source of the majority of security problems: when the buffer overflows, it invokes undefined behaviour, and that undefined behaviour (if it does not directly crash the program) can frequently be exploited by attackers to execute their own code within your process.
So, my advice is to stay away from all fixed buffer sizes and either use C++'s std::vector<> or dynamically allocate enough memory to fit. This is quite easy even in C: the POSIX 2008 standard added getline(), and asprintf() and friends are widely available as GNU/BSD extensions.
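One portable way to "allocate enough memory to fit", sketched here with standard snprintf() and malloc() rather than asprintf(): call snprintf() once with a NULL buffer to learn the required length, then allocate exactly that much. The helper name make_pair and the "name=value" format are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a freshly allocated "name=value" string,
 * or NULL on failure. The caller must free() the result. */
char *make_pair(const char *name, int value)
{
    /* First pass: a NULL buffer of size 0 only computes the length (C99). */
    int n = snprintf(NULL, 0, "%s=%d", name, value);
    if (n < 0)
        return NULL;

    char *buf = malloc((size_t)n + 1);   /* +1 for the terminating '\0' */
    if (buf == NULL)
        return NULL;

    /* Second pass: the buffer is now exactly large enough. */
    snprintf(buf, (size_t)n + 1, "%s=%d", name, value);
    return buf;
}
```

The buffer size is derived from the data instead of a compile-time constant, so there is no fixed limit to overrun.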
int main ()
{
/*
char a[] = "abc";
printf("strlen(a): %li", strlen(a));
printf("\nsizeof(a): %li", sizeof(a));
*/
char b[3];
printf("\nstrlen(b): %li", strlen(b));
printf("\nsizeof(b): %li", sizeof(b));
printf("\nb = ");
puts(b);
return 0;
}
When I run the above code it outputs the following:
strlen(b): 1
sizeof(b): 3
b =
but if I undo the comment, it outputs:
strlen(a): 3
sizeof(a): 4
strlen(b): 6
sizeof(b): 3
b = ���abc
Why does this happen? I would appreciate a good in depth explanation about it principally and if possible a quick "fix" for it so I don't get this problem again.
I'm relatively a beginner in programming and C in general and based on what I learned until now, this shouldn't happen
thanks and sorry if I broke any rule from this website, I'm new here too!
strlen(b) causes undefined behavior because the array b is not initialized. The contents of the array are therefore indeterminate. strlen may return a small number if there happens to be a null byte in the garbage contents of the array (acting as a null terminator), or a large number if there is no null byte in the array but there is one in memory adjacent to it (that happens not to crash when accessed), or it may segfault, or fail in some other unpredictable way. The particular misbehavior you observe can easily depend on the contents of other nearby memory and therefore be influenced by adding or removing other variables, or altering surrounding code in apparently unrelated ways.
puts(b) is similarly undefined behavior.
(Another bug: sizeof and strlen both return size_t, for which the correct printf format specifier is %zu, not %li which would be for long int.)
I would appreciate a good in depth explanation about it principally and if possible a quick "fix" for it so I don't get this problem again.
Do not attempt to read or use the contents of local variables that have not been initialized.
See also What happens to a declared, uninitialized variable in C? Does it have a value? and (Why) is using an uninitialized variable undefined behavior?.
If you enable compiler warnings, your compiler can warn you about some instances of this, e.g. gcc catches this example. Tools like valgrind can help too.
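A minimal sketch of the two fixes mentioned above: the array is initialized before strlen reads it, and %zu is used for the size_t values. The helper report_sizes is hypothetical; it writes into a caller-supplied buffer instead of printing so the result is easy to inspect.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the same report as the question's program,
 * but with b initialized and the correct format specifiers. */
int report_sizes(char *out, size_t cap)
{
    char b[3] = "";   /* initialized: all three elements are '\0' */

    /* strlen and sizeof both yield size_t, hence %zu. */
    return snprintf(out, cap, "strlen(b): %zu, sizeof(b): %zu",
                    strlen(b), sizeof(b));
}
```

With b initialized to the empty string, strlen(b) is 0 regardless of what other variables exist nearby.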
I'm relatively a beginner in programming and C in general and based on what I learned until now, this shouldn't happen
On the contrary, such behavior is extremely common in C. The C language does not guarantee any checks for bugs like this, and implementations generally don't provide them. You should get used to the possibility that the language will not stop you from doing something erroneous, and will instead misbehave in unpredictable ways (or worse, appear to work just fine for a while). As a result, when programming in C, you have to be much more careful and attentive to the language rules than when working with "safer" languages. It's a tough and unfriendly language for beginners.
I get the outcome for squares
squares = [ 512, 1, 4, 9, 16, 25, 36, 49 ].
I know I reached the boundaries of my limit but where did 512 come from? Can you give me an explanation of all the individual steps involved in the error occurring?
int main()
{
unsigned squares[8];
unsigned cubes[8];
for (int i = 0; i <= 8; i++) {
squares[i] = i * i;
cubes[i] = i * i * i;
}
}
I would say the same thing as everyone is saying. It is undefined behavior and you should not do that.
Now, you would ask whether the undefined behavior is the reason for whatever is happening.
I would say, Yes, probably.
Now, you may think this is an easy escape from answering the question.
It may be, but..
The main problem with this kind of question is that it is very hard to re-create the same case and investigate what actually made the program behave the way it did. Because undefined behavior, well, behaves in a very undefined way.
That is the reason people do not try answering this kind of question, and why people advise staying away from undefined behavior territory.
I know i reached the boundaries of my limit
Then you should know the consequences, too. Accessing out of bound memory invokes undefined behavior.
Stay within the valid memory limit. Use
for (int i = 0; i <8; i++)
You're accessing memory beyond the limit
for (int i = 0; i <= 8; i++)
should be
for (int i = 0; i <8; i++)
Remember that unsigned squares[8]; lets you legally access squares[0] up to squares[7].
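A sketch of the corrected loop, assuming the element count is passed alongside the arrays so the bound and the array size cannot drift apart. The function name fill_powers is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: fills squares[0..n-1] and cubes[0..n-1].
 * The loop uses i < n, so the last index written is n - 1. */
void fill_powers(unsigned *squares, unsigned *cubes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned u = (unsigned)i;
        squares[i] = u * u;
        cubes[i] = u * u * u;
    }
}
```

Called as fill_powers(squares, cubes, 8) on the two 8-element arrays from the question, this leaves squares[0] at 0.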
I know i reached the boundaries of my limit but where did 512 come
from.
The consequences of an illegal memory access are undefined, as per ISO/IEC 9899:201x 6.5.9, footnote 109:
Two objects may be adjacent in memory because they are adjacent
elements of a larger array or adjacent members of a structure with no
padding between them, or because the implementation chose to place
them so, even though they are unrelated. If prior invalid pointer
operations (such as accesses outside array bounds) produced undefined
behavior, subsequent comparisons also produce undefined behavior
You may use a debugger (say gdb) or an instrumentation framework (say valgrind) to find where the value came from. Here 512 looks like the cube of 8, but there is no guarantee that you will get the same value on the next run. Moreover, there is a chance that the program might crash.
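One plausible explanation, and only that: if the compiler happened to place squares directly after cubes, then cubes[8] and squares[0] would be the same storage. The sketch below models that assumed layout legally, using one larger array to stand in for the stack region; the function name and the layout are assumptions, not something the standard promises.

```c
/* Model of one possible layout; the real placement is up to the compiler.
 * mem[0..7] plays the role of cubes, mem[8..15] the role of squares, and
 * mem[16] is a spare slot so the model itself never leaves the array. */
unsigned squares0_after_loop(void)
{
    unsigned mem[17] = {0};
    unsigned *cubes = mem;
    unsigned *squares = mem + 8;

    for (unsigned i = 0; i <= 8; i++) {   /* same off-by-one bound as the question */
        squares[i] = i * i;
        cubes[i] = i * i * i;             /* i == 8: lands on mem[8], i.e. squares[0] */
    }
    return squares[0];                    /* 512 in this model */
}
```

In this model the final iteration stores 8 * 8 * 8 = 512 into the slot that squares[0] occupies, which matches the output in the question.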
I am working through last year's C programming exam, and I came across this question:
A program (see below) defines the two variables x and y.
It produces the given output. Explain why the character ‘A’ appears in the output of variable x.
Program:
#include <stdio.h>
main ()
{
char x[6] = "12345\0";
char y[6] = "67890\0";
y[7]='A';
printf("X: %s\n",x);
printf("Y: %s\n",y);
}
Program output:
X: 1A345
Y: 67890
It has pretty high points (7). And I don't know how to explain it in detail. My answer would be:
The char array y only has 6 chars allocated, so writing to y[7] changes whatever comes after it on the stack.
Any help would be highly appreciated! (I'm only in 1st year)
Your formal answer should be that this program yields undefined behavior.
The C-language standard does not define the result of an out-of-bound access operation.
With char y[6], by reading from or writing into y[7], this is exactly what you are doing.
Some compilers may choose to allocate array x[6] immediately after array y[6] on the stack.
So by writing 'A' into y[7], this program might indeed write 'A' into x[1].
But the standard does not dictate that, so it depends on compiler implementation.
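Since the language will not reject y[7] for you, one option is to route writes through a small helper that checks the index first. This is a sketch with a hypothetical name, set_char_checked, and it assumes the caller knows the array size (for example via sizeof).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: store c at buf[idx] only if idx is a valid index
 * for an array of the given size. */
bool set_char_checked(char *buf, size_t size, size_t idx, char c)
{
    if (idx >= size)
        return false;   /* refuse the out-of-bounds write */
    buf[idx] = c;
    return true;
}
```

For char y[6], set_char_checked(y, sizeof y, 7, 'A') returns false and leaves every other object untouched.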
As others have implied on previous comments to your question, if it was really given on a formal exam, then you may want to consider continuing your studies elsewhere...
The classic stack corruption problem in C. With the help of a debugger, you will find that your stack frame looks like this after the original assignments:
67890\012345\0
y points to the char 6. y[7] means 7 positions after that (2). So y[7] = 'A' replaces the char 2.
Accessing an array beyond its bounds is undefined in the C standard, just one more quirk of C to be aware of. Some references:
Understanding stack corruption
Why do compilers not warn about out-of-bounds static array indices?
I have a program that I expect to crash, but it doesn't. Can you please tell me why?
char a[5];
strncpy(a,"abcdefg",7);
a[7] = '\0';
printf("%s\n",a);
Shouldn't the program crash at strncpy() or at a[7] = '\0', since index 7 is beyond the array size of 5? I get abcdefg as output. I'm using the gcc compiler.
The array is declared as char a[5];, so it has five elements, yet you are writing at index 7. That is a buffer overrun, and the behavior of your code is undefined at run time.
strncpy(a,"abcdefg",7);
a[7] = '\0';
Both are wrong. You need to define the array like this:
#define SIZE 9 /* must be greater than 7 */
char a[SIZE];
Notice that "abcdefg" needs 8 chars: 7 letters plus one extra for the terminating '\0' null character.
read: a string ends with a null character, literally a '\0' character
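A minimal sketch of a copy that can never write past the destination, assuming truncation is acceptable when the source does not fit. The helper name copy_truncated is hypothetical; it uses memcpy with a clamped length and always stores the terminating '\0' inside the buffer.

```c
#include <string.h>

/* Hypothetical helper: copies at most dstsize - 1 characters of src into
 * dst and always null-terminates. Returns the number of characters copied. */
size_t copy_truncated(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return 0;                 /* no room even for the terminator */

    size_t len = strlen(src);
    if (len > dstsize - 1)
        len = dstsize - 1;        /* leave one slot for '\0' */

    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

With char a[5], copy_truncated(a, sizeof a, "abcdefg") stores "abcd" and touches only a[0] through a[4].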
In your example, your program happens to have access to the memory beyond a + 5 (where a is the starting address of the array), because the program's stack extends past the array. Hence, although the code appears to work, it is still undefined behavior.
C often assumes you know what you're doing, even (especially) when you've done something wrong. There is no bounds checking on arrays, and you'll only get an error if you're lucky enough to touch an unmapped memory location and get a segmentation fault. Otherwise you'll be able to access and change memory, with whatever results follow.
You can't give a definition to undefined behaviour, as you are attempting by stating that it should crash. Another example of undefined behaviour that doesn't commonly crash is int x = INT_MAX + 1;, and int x = 0; x = x++ + ++x;. These might work on your system, if only by coincidence. That doesn't stop them from wreaking havoc on other systems!
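For the INT_MAX + 1 case, the usual way to stay out of undefined territory is to test before adding. The sketch below is one common formulation; the name add_checked is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true, or returns
 * false without adding when the sum would not fit in an int. */
bool add_checked(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||   /* would exceed INT_MAX */
        (b < 0 && a < INT_MIN - b))     /* would go below INT_MIN */
        return false;
    *out = a + b;
    return true;
}
```

The comparisons themselves never overflow, because INT_MAX - b is only evaluated for positive b and INT_MIN - b only for negative b.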
Consider "Colourless, green ideas sleep furiously", or "The typewriter passed the elephant to the blackness". Do either of these statements make any sense in English? How would you interpret them? This is a similar situation to how C implementations might treat undefined behaviour.
Let us consider what might happen if you ask me to put 42 eggs in my carton that can store at least 12 eggs. The container most certainly has bounds, but you insist that they can all fit in there. I find that the container can only store 12 eggs. You won't know what happens to the 30 remaining eggs, so the behaviour is undefined.
I came across this code accidentally:
#include<stdio.h>
int main()
{
int i;
int array[3];
for(i=0;i<=3;i++)
array[i]=0;
return 0;
}
When I run this code my terminal hangs; the program does not terminate.
When I replace 3 with 2, the code runs successfully and terminates without a problem.
In C there is no bound checking on arrays, so what's the problem with the above code that is causing it to not terminate?
Platform - Ubuntu 10.04
Compiler - gcc
Just because there's no bound checking doesn't mean that there are no consequences to writing out of bounds. Doing so invokes Undefined Behavior, so there's no telling what may happen.
This time, on this compiler, on this architecture, it happens that when you write to array[3], you actually set i to zero, because i was positioned right after array on the stack.
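To see why that layout would loop forever, here is a sketch that reproduces it legally: a 4-element block stands in for the stack, with array in the first three slots and i in the fourth. The layout, the function name and the cap parameter (added so the demonstration terminates) are all assumptions for illustration.

```c
/* Model of the layout described above, not the real stack.
 * mem[0..2] plays the role of array and mem[3] the role of i. */
int count_iterations_model(int cap)
{
    int mem[4] = {0};
    int *array = mem;
    int *i = &mem[3];
    int steps = 0;

    /* Same loop as the question, plus a cap on the number of iterations. */
    for (*i = 0; *i <= 3 && steps < cap; ++*i) {
        array[*i] = 0;   /* when *i == 3 this stores 0 into i itself */
        ++steps;
    }
    return steps;        /* always reaches cap: i never gets past 3 */
}
```

Each time the counter reaches 3 it is reset to 0 and then incremented to 1, so the condition *i <= 3 stays true and only the cap stops the loop.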
Your code is writing beyond the bounds of the array, which causes undefined behavior.
When you declare an array of size 3, the valid index range is from 0 to 2,
while your loop runs from 0 to 3.
If you access anything beyond the valid range of an array, then it is undefined behavior and your program may hang, crash, or show any other behavior. The C standard does not mandate any specific behavior in such cases.
When you say C does not do bounds checking, it actually means that it is the programmer's responsibility to ensure that the program does not access memory beyond the bounds of the allocated array. Failing to do so means all bets are off and any behavior is possible.
int array[3];
This declares an array of 3 ints, having indices 0, 1, and 2.
for(i=0;i<=3;i++)
array[i]=0;
This writes four ints into the array, at indices 0, 1, 2, and 3. That's a problem.
Nobody here can tell exactly what you're seeing -- you haven't even specified what platform you're working on. All we can say is that the code is broken, and that leads to whatever result you're seeing. One possibility is that i is stored right after array, so you end up setting i back to 0 when you do array[3]=0;. But that's just a guess.
The highest valid index for array is 2. Writing past that index invokes undefined behaviour.
What you're seeing is a manifestation of the undefined behaviour.
Contrast this with the following two snippets, both of which are correct:
/* 1 */
int array[3];
for(i=0;i<3;i++) { array[i] = 0; }
/* 2 */
int array[4];
for(i=0;i<4;i++) { array[i] = 0; }
You declared an array of size 3, which means 0, 1 and 2 are the valid indexes.
If you try to write 0 to a memory location that does not belong to the array, unexpected things can happen (this is generally called UB, undefined behavior).
The elements in an array are numbered 0 to (n-1). Your array has 3 slots, but the loop initializes 4 locations (0, 1, 2, 3). Typically, you'd have your for loop say i < 3 so that the loop count matches the array size and you don't go over the upper bound of the array.
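One way to keep the loop bound and the array size from drifting apart is to derive the bound from the array itself. The sketch below uses the common sizeof idiom; the macro name ARRAY_LEN and the function zero_array are hypothetical, and the idiom only works on a real array, not on a pointer.

```c
#include <stddef.h>

/* Element count of a true array (not valid for a pointer parameter). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static int array[3];

/* Hypothetical helper: zeroes every element and returns how many it wrote. */
size_t zero_array(void)
{
    size_t n = 0;
    for (size_t i = 0; i < ARRAY_LEN(array); i++) {
        array[i] = 0;
        n++;
    }
    return n;
}
```

If the declaration later changes to int array[4], the loop follows along without any other edit.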