#include <stdio.h>
#include <stdlib.h>
int test(void)
{
int a=0,b=0;
char buf[4];
gets(buf);
printf("a:%d b:%d",a,b);
}
int main()
{
test();
return 0;
}
The question is: why, with the input aaaaa, does a become 97 instead of b? Given the order in which the variables are declared inside test, shouldn't an overflow of buf affect b first and then a?
The variables a and b are not necessarily contiguous with buf, so you cannot predict from the declaration order which of them an overflow of buf will hit. The behaviour is undefined.
However, the C standard does guarantee that the elements of a single array, such as buf, are stored in contiguous memory locations.
Here you can check the documentation:
An array is a series of elements of the same type placed in contiguous
memory locations that can be individually referenced by adding an
index to a unique identifier.
Undefined behaviour is undefined. There's nothing in the language standard about the relative locations of different variables in a function, and there's definitely no guarantees about what will happen in a buffer overflow situation.
As part of our training in the Academy of Programming Languages, we also learned C. During the test, we encountered the question of what the program output would be:
#include <stdio.h>
#include <string.h>
int main(){
char str[] = "hmmmm..";
const char * const ptr1[] = {"to be","or not to be","that is the question"};
char *ptr2 = "that is the qusetion";
(&ptr2)[3] = str;
strcpy(str,"(Hamlet)");
for (int i = 0; i < sizeof(ptr1)/sizeof(*ptr1); ++i){
printf("%s ", ptr1[i]);
}
printf("\n");
return 0;
}
Later, after examining the answers, it became clear that the cell (&ptr2)[3] was the same memory cell as ptr1[2], so the output of the program is: to be or not to be (Hamlet)
My question is: is it possible to know, just from code written in a notebook and without checking any compiler, that a certain pointer (or any variable in general) follows or precedes other variables in memory?
Note that I do not mean array variables, where all the elements of the array must be in sequence.
In this statement:
(&ptr2)[3] = str;
ptr2 was defined with char *ptr2 inside main. With this definition, the compiler is responsible for providing storage for ptr2. The compiler is allowed to use whatever storage it wants for this—it could be before ptr1, it could be after ptr1, it could be close, it could be far away.
Then &ptr2 takes the address of ptr2. This is allowed, but we do not know where that address will be in relation to ptr1 or anything else, because the compiler is allowed to use whatever storage it wants.
Since ptr2 is a char *, &ptr2 is a pointer to char *, also known as char **.
Then (&ptr2)[3] attempts to refer to element 3 of an array of char * that is at &ptr2. But there is no array there in C’s model of computation. There is just one char * there. When you try to refer to element 3 of an array that has no element 3, the behavior is not defined by the C standard.
Thus, this code is a bad example. It appears the test author misunderstood C, and this code does not illustrate what was intended.
char *ptr2 = some initializer;
(&ptr2)[3] = str;
When you evaluate &ptr2, you obtain the address of the memory where the pointer itself is stored (the pointer that points to that initializer).
When you do (&ptr2)[3] = something, you try to write the address of a string 3*sizeof(char *) bytes beyond the location of ptr2. This is invalid, and it can easily end in a segmentation fault or silently overwrite another object.
No, it's not possible and no such assumptions can be made.
By writing outside a variable's space, this code invokes undefined behavior. It's basically "illegal", and anything can happen when you run it. The C language specification says nothing about variables being allocated on a stack in some particular order that you can exploit. It does, however, say that accessing random memory is undefined behavior.
Basically this code is pretty horrible and should never be used, even less so in a teaching environment. It makes me sad, how people mis-understand C and still teach it to others. :/
A program usually is loaded in memory with this structure:
Stack, Mmap'ed files, Heap, BSS (uninitialized static variables), Data segment (Initialized static variables) and Text (Compiled code)
You can learn more here:
https://manybutfinite.com/post/anatomy-of-a-program-in-memory/
Depending on how you declare the variable, it will go to one of the places mentioned above.
The compiler and linker arrange the BSS and data segment variables however they wish at build time, so there you usually cannot tell. The same goes for heap variables (the allocator picks whichever free block fits the request).
On the stack (which is a LIFO structure) the variables are placed one on top of another, so if you have:
int a = 5;
int b = 10;
a and b will often end up next to each other. That is only typical behaviour, though: the compiler may reorder them, pad them, or keep them in registers, so you cannot rely on it.
The real exception is a structure or an array: array elements are always contiguous, and structure members are laid out in declaration order (possibly with padding between them).
In your code ptr1 is an array of pointers to char, so its three pointers are stored one after another (the strings they point to need not be).
In fact, do the following exercise:
#include <stdio.h>
#include <string.h>
int main(void){
    const char * const ptr1[] = {"to be","or not to be","that is the question"};
    for (int i = 0; i < 3; i++) {
        for (size_t j = 0; j < strlen(ptr1[i]); j++)
            /* %p expects a void *, so cast the address of each character */
            printf("%p -> %c\n", (void *)&ptr1[i][j], ptr1[i][j]);
        printf("\n");
    }
    return 0;
}
and you will see the memory address and its content!
Have a nice day.
I know I can just copy the function by reference, but I want to understand what's going on in the following code that produces a segfault.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int return0()
{
return 0;
}
int main()
{
int (*r0c)(void) = malloc(100);
memcpy(r0c, return0, 100);
printf("Address of r0c is: %x\n", r0c);
printf("copied is: %d\n", (*r0c)());
return 0;
}
Here's my mental model of what I thought should work.
The process owns the memory allocated to r0c. We are copying the data from the data segment corresponding to return0, and the copy is successful.
I thought that dereferencing a function pointer is the same as calling the data segment that the function pointer points to. If that's the case, then the instruction pointer should move to the data segment corresponding to r0c, which will contain the instructions for function return0. The binary code corresponding to return0 doesn't contain any jumps or function calls that would depend on the address of return0, so it should just return 0 and restore ip... 100 bytes is certainly enough for the function pointer, and 0xc3 is well within the bounds of r0c (it is at byte 11).
So why the segmentation fault? Is this a misunderstanding of the semantics of C's function pointers or is there some security feature that prevents self-modifying code that I'm unaware of?
The memory pages used by malloc to allocate memory are not marked as executable. You can't copy code to the heap and expect it to run.
If you want to do something like that, you have to go deeper into the operating system and allocate pages yourself, then mark them as executable. Whether an ordinary process is allowed to do so depends on the platform and its security policy.
And it's really dangerous. If you do this in a program you distribute and have some kind of bug that lets an attacker use your program to write to those allocated memory pages, then the attacker can run code of their choosing with your program's privileges.
There are also other problems with your code. Pointers to functions might not convert cleanly to general object pointers on all platforms. It's very hard (not to mention non-standard) to predict or otherwise get the size of a function. You also print pointers incorrectly in your example: use the "%p" format to print a void *, and cast the pointer to void * first.
Also when you declare a function like int fun() that's not the same as declaring a function that takes no arguments. If you want to declare a function that takes no arguments you should explicitly use void as in int fun(void).
The standard says:
The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1.
[C2011, 7.24.2.1/2; emphasis added]
In the standard's terminology, functions are not "objects". The standard does not define behavior for the case where the source pointer points to a function, therefore such a memcpy() call produces undefined behavior.
Additionally, the pointer returned by malloc() is an object pointer. C does not provide for direct conversion of object pointers to function pointers, and it does not provide for objects to be called as functions. It is possible to convert between object pointer and function pointer by means of an intermediate integer value, but the effect of doing so is at minimum doubly implementation-defined. Under some circumstances it is undefined.
As in other cases, UB can turn out to be precisely the behavior you hoped for, but it is not safe to rely on that. In this particular case, other answers present good reasons to not expect to get the behavior you hoped for.
As was said in some comments, you need to make the data executable. This requires communicating with the OS to change protections on the data. On Linux, this is the system call int mprotect(void* addr, size_t len, int prot) (see http://man7.org/linux/man-pages/man2/mprotect.2.html).
Here is a Windows solution using VirtualProtect.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#ifdef _WIN32
#include <Windows.h>
#endif
int return0()
{
return 0;
}
int main()
{
int (*r0c)(void) = malloc(100);
memcpy((void*) r0c, (void*) return0, 100);
printf("Address of r0c is: %p\n", (void*) r0c);
#ifdef _WIN32
long unsigned int out_protect;
if(!VirtualProtect((void*) r0c, 100, PAGE_EXECUTE_READWRITE, &out_protect)){
puts("Failed to mark r0c as executable");
exit(1);
}
#endif
printf("copied is: %d\n", (*r0c)());
return 0;
}
And it works.
malloc returns a pointer to a block of allocated memory (100 bytes in your case). This memory is uninitialized. Even assuming the CPU could execute it, for your code to work you would have to fill those 100 bytes with the executable instructions that the function implements (if indeed they fit in 100 bytes). But as has been pointed out, your allocation is on the heap, not in the text (program) segment, and I don't think it can be executed as instructions. Perhaps this would achieve what you want:
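#include <stdio.h>
int return0(void)
{
    return 0;
}
typedef int (*r0c)(void);
int main(void)
{
    r0c pf = return0;   /* point at the existing function instead of copying it */
    /* casting a function pointer to void * for %p is a common extension, not strict ISO C */
    printf("Address of r0c is: %p\n", (void *)pf);
    printf("copied is: %d\n", pf());
    return 0;
}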
I'm very new to C, and I'm not understanding this behavior. Upon printing the length of this empty array I get 3 instead of 0.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct entry entry;
struct entry{
char arr[16];
};
int main(){
entry a;
printf("%d\n",strlen(a.arr));
return 0;
}
What am I not understanding here?
The statement entry a; does not initialize the struct, so its value is likely garbage. Therefore, there's no guarantee that strlen on any of its members will return anything sensible. In fact, it might even crash the program, or worse.
There is no such thing as an "empty array" in C. Your char[16] array always contains 16 bytes. Because it is an uninitialized local variable, each char has an unspecified value. In addition, if none of these unspecified values happens to be 0, strlen will read outside the array and your code will have undefined behaviour.
Additionally strlen returns size_t and using %d to print this has undefined behaviour too; you must use %zu where z says that the corresponding argument is size_t.
(If by happenstance you're using the MSVC++ "C" compiler, do note that it might not support %zu. Get a real C compiler and C standard library instead.)
Here's the source code to strlen():
size_t strlen(const char *str)
{
const char *s;
for (s = str; *s; ++s);
return(s - str);
}
Wait, you mean there's source code to strlen()? Why yes. Many standard C library functions are themselves written in C, and the above is one typical implementation.
This function starts at the memory address specified by str. It then uses a for loop to move forward from that address, byte by byte, until it reaches a zero byte. How does the for loop do that? First it assigns str to s. Then it checks the value s points to. If it's zero (i.e. if *s evaluates to zero), the loop is done. If it is not zero, s is incremented and the zero check is repeated, over and over, until a zero is found.
Finally, the distance the s pointer has moved from the original pointer you passed in (s - str) is the result of strlen().
In other words, strlen() just walks through memory until it finds the next zero character, and returns the number of characters between the original pointer and that zero.
But what if it doesn't find a zero? Does it stop? Nope. It will just trudge on and on until it finds a zero or the program crashes.
That is why strlen() is so easy to misuse, and why it's a source of many critical bugs in modern software. This doesn't mean you can't use it, but it does mean you must be very careful to make sure that whatever you pass in is a null-terminated string (i.e. a sequence of zero or more non-zero characters followed by a zero character).
Remember also that in C, you basically have no idea what memory contains when you allocate it or set it aside. If you want it to be all zeros, then you need to make sure to fill it with zeros yourself!
Anyway, the answer to your question involves the use of the memset() function. You'll have to pass memset() the pointer to the beginning of your array, the length of that array, and the value to fill it with (in your case, zero of course!)
a is never initialized, which leads to undefined behavior.
C "strings" are '\0'-terminated arrays of char. So strlen() will walk through memory from the given address until it either finds a '\0' or causes a segmentation fault.
What am I not understanding here?
Perhaps the mis-understanding is that auto variables, such as:
entry a;
are assigned memory from the process' stack. The pre-existing content of that stack memory is not zeroed-out for your benefit. Hence the value(s) of the elements of a, which will also be located on the process stack, will not be initially zeroed-out for your benefit. Rather, the entire content of a and its elements (including .arr) will contain bizarre and perhaps unexpected values.
C programmers learn to initialize auto variables by zeroing them out, or initializing them with a desirable value.
For example, the question code might do this as follows:
int main(){
entry a =
{
.arr[0] = 0
};
...
}
Or:
int main(){
entry a;
memset(&a, 0, sizeof(a));
...
}
So here is the deal: I was writing a Bulls and Cows game in C and noticed something interesting. My program was outputting values for variables that were never initialized. First look at the whole code:
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <math.h>
int main(){
int i,j,bulls,cows;
//int bulls = 0;
//int cows = 0;
char guess[4];
char chosenword[4] = "onea";
puts("Enter Your best guess <4 letter word>");
gets(guess);
for(i=0;i<4;i++){
if(guess[i] == chosenword[i]){
bulls++;
}
else {
for(j=0;j<4;j++){
if(chosenword[i]==guess[j]){
cows++;
}
}
}
}
printf("Bulls: %d And Cows: %d",bulls,cows);
return 0;
}
As you can see, I have not initialized the variables bulls and cows to 0, but my program still outputs some value or other, e.g.:
Here, as you can see, there are three trials. While the value of bulls changes, the value of cows remains constant. Can anyone please explain the logic?
Because, without initialization, your variables contain garbage values. You cannot predict what those values are.
In your code, bulls and cows are local variables with automatic storage duration (auto), so the compiler won't initialize them unless you do so explicitly in your code.
This is in contrast to static or global variables, which are zero-initialized.
Without initialization, bulls++ or cows++ reads the variable before it has ever been written, which is undefined behavior. Always initialize your variables.
To avoid the issue: Please uncomment the initialization part in your code. :-)
A word of advice: Please don't use gets(). Use fgets(). It's a lot safer.
Next,
char guess[4];
char chosenword[4] = "onea";
change to
char guess[5]; //to hold the terminating null character also
char chosenword[ ] = "onea"; //while initializing, you don't need to specify size explicitly.
EDIT:
In your case, cows produces a constant output because, for your inputs, the if condition never fails.
You need to avoid gets(); use fgets() instead.
Using uninitialized variables leads to undefined behavior.
The value of an uninitialized variable is indeterminate. The values you are seeing are garbage values. Once the behavior is undefined, anything might happen, and there is no point in trying to explain a particular result.
Please initialize your local variables in order to get the right output.
int main(void)
{
char name1[5];
int count;
printf("Please enter names\n");
count = scanf("%s",name1);
printf("You entered name1 %s\n",name1);
return 0;
}
When I entered more than 5 characters, the program printed all the characters I had entered, even though the char array is declared as:
char name1[5];
Why did this happen?
Because the extra characters are stored at the addresses after the end of the array's storage. This is very dangerous and can lead to crashes.
E.g. suppose you enter the name Michael and the name1 variable starts at 0x1000.
name1:   M      i      c      h      a      e      l      \0
       0x1000 0x1001 0x1002 0x1003 0x1004 0x1005 0x1006 0x1007
       [................................]
The allocated space is shown with [...]
This means that memory from 0x1005 onwards is overwritten.
Solution:
Copy only 5 characters (including the \0 at the end) or check the length of the entered string before you copy it.
This is undefined behavior, you are writing beyond the bounds of allocated memory. Anything can happen, including a program that appears to work correctly.
The C99 draft standard section J.2 Undefined Behavior says:
The behavior is undefined in the following circumstances:
and contains the following bullet:
An array subscript is out of range, even if an object is apparently accessible with the
given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).
This applies to the more general case since E1[E2] is identical to (*((E1)+(E2))).
This is undefined behavior, you can't count on it. It just happens to work, it may not work on another machine.
To avoid buffer overflow, use
fgets(name1, sizeof(name1), stdin);
or, in C11, where the optional Annex K functions are available,
gets_s(name1, sizeof(name1));
Another example to make things clearer:
#include <stdio.h>
int array[5] ;
int main ( void )
{
array[-1] = array[-1] ; // looks strange?
printf ( "%d" , array[-1] ) ; // may appear to work, but is undefined behavior
return 0 ;
}
Here array decays to an address, and array[-1] reads whatever happens to lie just before that address. This may appear to work, but it is undefined behavior, just like stepping a pointer out of bounds with ++ or --.
It's very clear from other answers that this constitutes some kind of vulnerability to your program.
What can be learned from this? Let's assume:
int func(void)
{
char buffer[1];
...
In almost every implementation of C, the code generated here creates a local stack area and lets you access it through the address given by buffer. Other important data resides on this stack too, for example the address of the next instruction to be executed after the function returns to its caller.
You could, therefore, theoretically:
Enter a lot of code into your input function,
Create code that defines (in binary) a new function that does something ugly,
Overwrite the correct return address (on the stack) with the address the new function will have once you have written it beyond the buffer's bounds.
This is called a buffer overflow exploit; you can read up on it here (and in many other places).
Yes, the compiler accepts it, because C does no bounds checking. The behavior is still undefined, though.