I am working through a previous year's C programming exam, and I came across this question:
A program (see below) defines the two variables x and y.
It produces the given output. Explain why the character ‘A’ appears in the output of variable x.
Program:
#include <stdio.h>
int main(void)
{
    char x[6] = "12345\0";
    char y[6] = "67890\0";
    y[7] = 'A';   /* out of bounds: y only has indices 0..5 */
    printf("X: %s\n", x);
    printf("Y: %s\n", y);
    return 0;
}
Program output:
X: 1A345
Y: 67890
It is worth quite a lot of points (7), and I don't know how to explain it in detail. My answer would be:
The char array y only has 6 chars allocated, so writing to y[7] will change whatever comes after it on the stack.
Any help would be highly appreciated! (I'm only in my first year.)
Your formal answer should be that this program yields undefined behavior.
The C-language standard does not define the result of an out-of-bound access operation.
With char y[6], reading from or writing to y[7] is exactly such an access.
Some compilers may choose to allocate array x[6] immediately after array y[6] on the stack.
So by writing 'A' into y[7], this program might indeed write 'A' into x[1].
But the standard does not dictate that, so it depends on the compiler implementation.
As others have implied in previous comments on your question, if this was really given on a formal exam, then you may want to consider continuing your studies elsewhere...
This is the classic stack corruption problem in C. With the help of a debugger, you will find that your stack frame looks like this after the initial assignments:
6 7 8 9 0 \0 1 2 3 4 5 \0
y points to the char '6'. y[7] is 7 positions after that, which is the char '2', so y[7] = 'A' replaces that '2'.
Accessing an array beyond its bounds is undefined behavior in the C standard; it's just one more quirk of C to be aware of. Some references:
Understanding stack corruption
Why do compilers not warn about out-of-bounds static array indices?
#include <stdio.h>
#include <string.h>
int main ()
{
/*
char a[] = "abc";
printf("strlen(a): %li", strlen(a));
printf("\nsizeof(a): %li", sizeof(a));
*/
char b[3];
printf("\nstrlen(b): %li", strlen(b));
printf("\nsizeof(b): %li", sizeof(b));
printf("\nb = ");
puts(b);
return 0;
}
When I run the above code it outputs the following:
strlen(b): 1
sizeof(b): 3
b =
But if I uncomment the block, it outputs:
strlen(a): 3
sizeof(a): 4
strlen(b): 6
sizeof(b): 3
b = ���abc
Why does this happen? I would appreciate a good in-depth explanation of the underlying principle and, if possible, a quick "fix" so I don't run into this problem again.
I'm a relative beginner in programming and in C in general, and based on what I have learned so far, this shouldn't happen.
Thanks, and sorry if I broke any rules of this website; I'm new here too!
strlen(b) causes undefined behavior because the array b is not initialized. The contents of the array are therefore indeterminate. strlen may return a small number if there happens to be a null byte in the garbage contents of the array (acting as a null terminator), or a large number if there is no null byte in the array but there is one in memory adjacent to it (that happens not to crash when accessed), or it may segfault, or fail in some other unpredictable way. The particular misbehavior you observe can easily depend on the contents of other nearby memory and therefore be influenced by adding or removing other variables, or altering surrounding code in apparently unrelated ways.
puts(b) is similarly undefined behavior.
(Another bug: sizeof and strlen both return size_t, for which the correct printf format specifier is %zu, not %li which would be for long int.)
I would appreciate a good in-depth explanation of the underlying principle and, if possible, a quick "fix" so I don't run into this problem again.
Do not attempt to read or use the contents of local variables that have not been initialized.
See also What happens to a declared, uninitialized variable in C? Does it have a value? and (Why) is using an uninitialized variable undefined behavior?.
If you enable compiler warnings, your compiler can warn you about some instances of this, e.g. gcc catches this example. Tools like valgrind can help too.
I'm a relative beginner in programming and in C in general, and based on what I have learned so far, this shouldn't happen
On the contrary, such behavior is extremely common in C. The C language does not guarantee any checks for bugs like this, and implementations generally don't provide them. You should get used to the possibility that the language will not stop you from doing something erroneous, and will instead misbehave in unpredictable ways (or worse, appear to work just fine for a while). As a result, when programming in C, you have to be much more careful and attentive to the language rules than when working with "safer" languages. It's a tough and unfriendly language for beginners.
I was practicing C programming and trying to create a 2D array with a fixed number of rows but a variable number of columns. So I used the "array of pointers" concept, i.e. I created an array int* b[4].
This is the code I wrote:
#include <stdio.h>
int main(void) {
int* b[4];
int c[]={1,2,3};
int d[]={4,5,6,7,8, 9};
int e[]={10};
int f[]={11, 12, 13};
b[0]=c;
b[1]=d;
b[2]=e;
b[3]=f;
//printing b[0][0] to b[0][2] i.e. c[0] to c[2]
printf("b[0][0]= %d\tb[0][1]=%d\tb[0][2]=%d\n", b[0][0], b[0][1], b[0][2]);
//printing b[1][0] to b[1][5] i.e. d[0] to d[5]
printf("b[1][0]= %d\tb[1][1]=%d\tb[1][2]=%d\tb[1][3]=%d\tb[1][4]=%d\tb[1][5]=%d\n", b[1][0], b[1][1], b[1][2], b[1][3], b[1][4], b[1][5]);
//printing b[2][0] i.e. e[0]
printf("b[2][0]= %d\n", b[2][0]);
//printing b[3][0] to b[3][2] i.e. f[0] to f[2]
printf("b[3][0]= %d\tb[3][1]=%d\tb[3][2]=%d\n", b[3][0], b[3][1], b[3][2]);
return 0;
}
and the output was as expected:
b[0][0]= 1 b[0][1]=2 b[0][2]=3
b[1][0]= 4 b[1][1]=5 b[1][2]=6 b[1][3]=7 b[1][4]=8 b[1][5]=9
b[2][0]= 10
b[3][0]= 11 b[3][1]=12 b[3][2]=13
So, I think memory has been allocated this way:
But a question arises when this code is executed:
#include <stdio.h>
int main(void) {
int* b[4];
int c[]={1,2,3};
int d[]={4,5,6,7,8, 9};
int e[]={10};
int f[]={11, 12, 13};
b[0]=c;
b[1]=d;
b[2]=e;
b[3]=f;
int i, j;
for (i=0; i<4; i++)
{
for (j=0; j<7; j++)
{
printf("b[%d][%d]= %d ", i, j, b[i][j]);
}
printf("\n");
}
return 0;
}
And the output is something unusual:
b[0][0]= 1 b[0][1]= 2 b[0][2]= 3 b[0][3]= 11 b[0][4]= 12 b[0][5]= 13 b[0][6]= -1079668976
b[1][0]= 4 b[1][1]= 5 b[1][2]= 6 b[1][3]= 7 b[1][4]= 8 b[1][5]= 9 b[1][6]= -1216782128
b[2][0]= 10 b[2][1]= 1 b[2][2]= 2 b[2][3]= 3 b[2][4]= 11 b[2][5]= 12 b[2][6]= 13
b[3][0]= 11 b[3][1]= 12 b[3][2]= 13 b[3][3]= -1079668976 b[3][4]= -1079668936 b[3][5]= -1079668980 b[3][6]= -1079668964
One can observe that b[0][i] continues with the values of b[3][i]; b[2][i] continues with the values of b[0][i] followed by those of b[3][i]; and b[3][i] and b[1][i] run into garbage values.
Every time this program is executed, the same pattern appears. So is there something more to the way memory is allocated, or is this mere coincidence?
As Hrishi notes in comments, the reason this is happening is that you're trying to access beyond the end of your arrays. So what's actually happening?
The short version is that you're reading past the end of your arrays, and reading into the next array (or into unallocated memory). But why is this happening?
A brief aside on C-style arrays
In C, an array used in most expressions is converted ("decays") to a pointer to its first element (see footnote 1). So b behaves like a pointer to the start of the array, and *b gives the first element of the array (which in this case is b[0], a pointer to the start of c).
The syntax b[i] is just syntactic sugar; it's the same as *(b + i), which is doing pointer arithmetic. It's literally saying: "Take the address i elements after b, and tell me what's stored there" (see footnote 2).
So if we look at, for example, b[0][3], we can translate that into *((*b) + 3): you take the pointer stored at the start of b (that is, b[0], which points to c), and then read whatever is stored three ints past it.
So what's happening to you?
As it happens, your computer has stored b[3] (that is, f) starting at that address. That's what this is really telling you: where your computer has placed each sub-array in memory. Each array is always laid out contiguously, one element right after another (that's what makes the pointer arithmetic trick work). But because you defined c, d, e, and f as separate arrays, the compiler was free to place them wherever it wanted relative to one another, and the resulting pattern is just what it came up with. As best I can tell, your arrays are laid out in memory like this:
--------
| e[0] |
--------
| c[0] |
--------
| c[1] |
--------
| c[2] |
--------
| f[0] |
--------
| f[1] |
--------
| f[2] |
--------
d is located somewhere in memory as well, but it could be before or after this contiguous block; we don't know.
However, you can't rely on this. As I mention in a footnote, the ordering of these arrays in memory is not defined by the language, so it could (and does) change depending on any number of factors. Build this same code with another compiler, another compiler version, or different optimization flags, and the layout may well be different.
The next obvious question is: "What about b[0][6]? Why is that such a weird number?"
The answer is that you've run out of array, and you're now reading memory that doesn't belong to any of your arrays.
When your program runs, the operating system gives it a certain chunk of memory and says "Here, do with that whatever you like." When you declare a local variable on the stack (as you have here) or allocate memory on the heap (with malloc), part of that memory is set aside and handed to you (see footnote 4). All the memory you're not currently using is still there, but you have no idea what is stored there; it's just leftover data from whatever last used that particular chunk of memory. Reading it is also undefined behaviour in C, so you get no guarantee about what you will see.
I should note that most other languages (Java, for instance) wouldn't allow you to do anything like this; it would throw an exception because you're trying to access beyond the bounds of an array. C, however, isn't that smart. C likes to give you enough rope to hang yourself, so you need to do your own bounds checking.
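1 This is a simplification. The truth is slightly more complicated: an array is a distinct object (sizeof applied to an array gives the size of the whole array, for example), but in most expressions it is converted to a pointer to its first element.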
2 This implementation is why array indices start at 0.
3 This is an example of undefined behaviour, which is Very Bad. Basically it means that this result isn't guaranteed to be consistent. It happens the same way every time on your computer, right now. Try it on a friend's computer, or even on your computer an hour from now, and you might get something completely different.
4 This is another oversimplification, but for your purposes it's close enough to true.
Your little drawing is right; the only thing to add is that, since you declared the arrays one after another in your function, they're all on the stack, and here the compiler happened to place them side by side. So by accessing beyond one array's limits, you're accessing the next array.
Compile with all warnings & debug info (gcc -Wall -Wextra -g). Then use the debugger (gdb). Beware of undefined behavior (UB).
Your b[2] is e, which is an array of one element. At some point you access b[2][3] (indeed, anything past b[2][0]). This is a buffer overflow (an instance of UB). What really happens is implementation specific (it can vary with the compiler, its version, the ABI, the processor, the kernel, the moon, the compiler flags, ...). You may want to study the assembled code to understand more (gcc -fverbose-asm -S).
BTW, you should not suppose that arrays c, d, e, f have some particular memory layout.
When you print the array elements within their bounds, you get the exact array values. But when you index past the end, you get garbage values, because C performs no bounds checking. When you access a location beyond the memory you own, all you get is whatever data happens to be stored there, often called a garbage value. So to get correct results, you need to check that every access stays within the bounds defined for that array.
I have a question:
#include <stdio.h>
char *c[] = {"GeksQuiz", "MCQ", "TEST", "QUIZ"};
char **cp[] = {c+3, c+2, c+1, c};
char ***cpp = cp;
int main()
{
printf("%s ", *--*++cpp+3);
}
I am not able to understand the output, which is supposed to be sQUIZ.
My approach: first it will point to cpp+3, i.e. c; now ++c means pointing to "MCQ", and * of that gives the value "MCQ". I can't understand what the -- before * does here. Or is my approach totally wrong?
I will post it as an answer, as was mentioned in the comments. You should first read this: http://en.wikipedia.org/wiki/Sequence_point; also look here, and you can search for dozens of articles across the Internet about sequence points. This stuff is as BAD as undefined behaviour and unspecified behaviour. You can read this post, especially the part "What is the relation between Undefined Behaviour and Sequence Points?" in the accepted answer.
Perhaps this interview question was meant to test your knowledge of sequence points, in which case it is not as bad as I see it; nevertheless, NEVER EVER write such code, even for your pet projects, let alone production code. This is silly.
If they are looking for an experienced C/C++ developer, they shouldn't ask such questions at all.
EDIT
A tip about sequence points, because I saw some misunderstandings in another posted answer and in the comments: *--*++cpp+3 is neither unspecified nor undefined behaviour (although it is bad code in general), but this IS:
int i =1;
*--*++cpp+i+i++;
The code above is undefined behaviour: i is both read and modified in the same expression with no sequence point in between (the accesses are unsequenced). Please read about the differences between undefined behaviour, unspecified behaviour, implementation-defined behaviour and sequence points, e.g. here. I wrote all this to explain why you should avoid such terrible code altogether, whether or not it is legal from the point of view of the language standard. Yes, your code is legal, but unreadable, and, as my edit shows, a small change made it illegal. Don't think we don't want to help you; I mean that code like yours is bad code wherever it is asked about. It would be a better interview question if they asked you to explain WHY such code is bad and fragile.
P.S. The actual output is an empty string (followed by the space from the format string), because the pointer passed to printf points at a null terminator. See the excellent answer below; it explains the output in terms of C operator precedence (you should learn that too, and then such questions will not bother you at all).
All variables in this expression are modified only once. Maybe I don't understand something about sequence points, but I have no idea why people call this expression undefined behavior.
char *c[] = {"GeksQuiz", "MCQ", "TEST", "QUIZ"};
char **cp[] = {c+3, c+2, c+1, c};
char ***cpp = cp;
/*1*/ cpp; // == &cp[0]
/*2*/ ++cpp; // == &cp[1] (`cpp` changed)
/*3*/ *++cpp; // == cp[1] == c+2
/*4*/ --*++cpp; // == c+2-1 == &c[1] (`cp[1]` changed)
/*5*/ *--*++cpp; // == "MCQ"
/*6*/ *--*++cpp+3; // == "MCQ"+3 - it's a pointer to '\0'
So it prints nothing (apart from the space in the format string).
I'm going through the K & R book and the answer to one of the exercises is troubling me.
In the solutions manual, exercise 1-22 declares a char array:
#define MAXCOL 10
char line[MAXCOL];
So my understanding is that C arrays are indexed from 0 to n-1. If that's the case, then the above declaration should allocate memory for a char array of length 10, with indices 0 through 9. More to the point, isn't line[10] out of bounds? A function in the sample program is eventually passed an integer value pos equal to 10, and the following comparison takes place:
int findblnk(int pos) {
while(pos > 0 && line[pos] != ' ')
--pos;
if (pos == 0) //no blanks in line ?
return MAXCOL;
else //at least one blank
return pos+1; //position after blank
}
If pos is 10 and line[] is only of length 10, then isn't line[pos] out of bounds for the array?
Is it okay to make comparisons this way in C, or could this potentially lead to a segmentation fault? I am sure the solutions manual is right; this just really confused me. I can also post the entire program if necessary. Thanks!
Thanks for the speedy and very helpful responses, I guess it is definitely a bug then. It is called through the following branch:
else if (++pos >= MAXCOL) {
pos = findblnk(pos);
printl(pos);
pos = newpos(pos);
}
MAXCOL is defined as 10, as stated above, so in this branch findblnk(pos) would be passed a pos of at least 10.
Do you think the solution manual for K & R is worth going through or is it known for having buggy code examples?
It is never, ever okay to overrun the bounds of an array in C (or in any language, really).
If 10 is really passed to that function, that is certainly a bug. While there are better ways of doing it, that function should at least verify that pos is within the bounds of line before attempting to use it as an index.
If pos is indeed 10, then it is an out-of-bounds access, and accessing an array out of bounds is undefined behavior; therefore anything can happen, even a program that appears to work properly at the time, so the results are unreliable. The draft C99 standard's Annex J.2 (undefined behavior) contains the following bullet:
An array subscript is out of range, even if an object is apparently accessible with the
given subscript (as in the lvalue expression a[1][7] given the declaration int
a[4][5]) (6.5.6).
I don't have a copy of K&R handy, but the errata does not list anything for this problem. My best guess is that the condition should be < instead of >=.
The code above is fine as long as pos is at most 9 when it's passed to that function. If pos == 10 when it's passed, then it's undefined behaviour and, you are correct, it should be avoided.
However, it may or may not cause a segmentation fault.
A declaration like my_type buffer[SOME_CONSTANT_NAME]; is almost always a bug.
Code like the one you present in the question is the source of the majority of security problems: when the buffer overflows, it invokes undefined behaviour, and that undefined behaviour (if it does not directly crash the program) can frequently be exploited by attackers to execute their own code within your process.
So my advice is to stay away from fixed buffer sizes altogether and either use C++'s std::vector<> or dynamically allocate enough memory to fit. POSIX 2008 makes this quite easy even in C with functions such as getline() and open_memstream(), and asprintf() and friends have long been available as GNU and BSD extensions.
I have a program that I expect to crash, but it doesn't. Can you please tell me why?
char a[5];
strncpy(a,"abcdefg",7);
a[7] = '\0';
printf("%s\n",a);
Shouldn't the program crash at strncpy() or at a[7] = '\0', since index 7 is beyond the array size of 5? I get the output abcdefg. I'm using the gcc compiler.
The size of array a is five (char a[5];), and you are writing past the end of it: strncpy writes 7 chars, and a[7] is the 8th element. That is a buffer overrun, and the behavior of your code is undefined at run time.
strncpy(a,"abcdefg",7);
a[7] = '\0';
Both are wrong; you need to define the array like this:
#define size 9 // must be greater than 7
char a[size];
Notice that "abcdefg" needs 8 chars: one extra for the '\0' null character.
Read: A string ends with a null character, literally a '\0' character
In your example, your program still has access to the memory beyond a + 5 (the starting address of the array plus 5), because the program's stack extends past it. Hence, although the code appears to work, its behavior is undefined.
C often assumes you know what you're doing, even (especially) when you've done something wrong. There is no bounds checking on arrays, and you'll only get an error if you're lucky enough to reach an unmapped memory location and get a segmentation fault. Otherwise you'll be able to access and change memory, with whatever results follow.
You can't give a definition to undefined behaviour, as you are attempting to do by stating that it should crash. Other examples of undefined behaviour that don't commonly crash are int x = INT_MAX + 1; and int x = 0; x = x++ + ++x;. These might work on your system, if only by coincidence. That doesn't stop them from wreaking havoc on other systems!
Consider "Colourless, green ideas sleep furiously", or "The typewriter passed the elephant to the blackness". Do either of these statements make any sense in English? How would you interpret them? This is a similar situation to how C implementations might treat undefined behaviour.
Let us consider what might happen if you ask me to put 42 eggs in my carton, which holds 12 eggs. The container most certainly has bounds, but you insist that all the eggs fit in there. I find that the container can store only 12 eggs. You won't know what happens to the 30 remaining eggs, so the behaviour is undefined.