What is the trick behind strcpy() and the uninitialized char pointer in this code? - c

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void main()
{
    char *imsi;
    unsigned int i;
    int val;
    char *dest;

    imsi = "405750111";
    strncpy(dest, imsi, 5);
    printf("%s", dest);
    /* i = 10; */
}
In the above code, when the i = 10 assignment is commented out as above, the code runs without error. When the assignment is included, a segmentation fault occurs at strncpy(dest,imsi,5);.
By preventing optimization of the variable i (i.e., declaring it volatile int i;), the error goes away even with the assignment i = 10 included.

In your code, by saying
strncpy(dest,imsi,5);
you're trying to write through an uninitialized pointer dest. It can (and most probably will) point to some memory which is not accessible from your program (invalid memory). That invokes undefined behavior.
There is nothing that can be guaranteed about a program having UB. It can work as expected (depends on what you're expecting, actually) or it may crash or open your bank account and transfer all money to some potential terrorist organization.
N.B. - I hope reading that last line scared you, so the bottom line is
Don't try to write into any uninitialized pointer (memory area). Period.

The behaviour of this code is unpredictable because the pointer dest is used before it is initialised. The difference in observed behaviour is only indirectly related to the root-cause bug, which is the uninitialised variable. In C it is the programmer's responsibility to allocate storage for the output of strncpy(), and you haven't done that.
The simplest fix is to define an output buffer like this:
char dest[10];

Assuming you compiled this C source code into machine code for some "normal" architecture and then ran it, the possible effects of read-undefined UB basically boil down to what value floating around in registers or memory ends up getting used.
If the compiler happens to use the same value both times, and that value happened to point to a writeable memory address (and didn't overwrite anything that would break printf), it could certainly happen to work. UB doesn't guarantee a crash. It doesn't guarantee anything. Part of the point of UB is to let the compiler make assumptions and optimize based on them.
Any changes to surrounding code will affect code-gen for that function, and thus can affect what's in the relevant register when the call happens, or which stack slot is used for dest. Reading from a different stack address will give dest a different value.
Before main, calls to dynamic-linker functions might have dirtied some memory, leaving some pointers floating around in there, maybe including apparently some to writeable memory.
Or main's caller might have a pointer to some writeable memory in a register, e.g. a stack address.
Although that's less likely: if a compiler were going to just not set a register before making a call, strncpy would probably get main's first arg, an integer argc, unless the compiler had used that register as a temporary first. But string literals normally go in read-only memory, so that's an unlikely explanation in this case. (Even on an ISA / calling convention like ARM, where gcc's favourite register for temporaries is R0, the return-value register but also the first arg-passing register; if optimization is disabled so statements compile separately, most expressions will use R0.)

Related

Returning a pointer to a variable stored in the stack

I am currently learning the C language, so please excuse me if my question is stupid.
As far as I know, returning pointers to variables stored on the stack is a bad idea, since the memory that contains the variable is cleared when the function returns. Hence I expect to get a segmentation fault when executing the following piece of code:
#include <stdio.h>

int *foo()
{
    int j = 5;
    int *ptr = &j;
    return ptr;
}

int main()
{
    int *p;
    p = foo();
    printf("%d\n", *p);
    return 0;
}
However when compiled with gcc (version 8.3.0) the program works apparently fine (no compiler warnings as well) and outputs 5, rather than a Segmentation fault. My question is why does this piece of code work when it is supposed not to.
Thank you in advance!
Yes, returning the address of a local variable is a bad idea, as that local (j) ceases to exist when the function returns, and that pointer is now invalid.
However, dereferencing an invalid pointer is not guaranteed to lead to a segfault. The behavior is undefined, which means quite literally anything can happen, including appearing to work correctly.
What’s likely happening is that the portion of the stack that contained j has not yet been overwritten, so the printf just happens to work.
… the memory that contains the variable is cleared when the function returns…
That is not correct. When the function returns, the storage for its local objects is merely no longer reserved for use for those objects.
The C standard says that, when an object’s lifetime begins, storage (memory) is reserved for it. That means the C implementation provides some memory that can be used for that object, and that it will not use for any other purpose. When the object’s lifetime ends, the C standard says that storage is no longer reserved. The standard does not say that the memory is cleared or unmapped. It also does not say that it is not cleared or that it is not unmapped.
A C implementation could clear the memory, but normal C implementations do not, because that is generally a waste of resources. Most commonly, the memory remains unchanged until it is used for other purposes. But other effects are possible too, such as removing the memory from the process’ virtual address space. So it is normal that you would be able to use the memory after the function returns (and before you call any other functions that alter the memory), but it is also normal that errors would occur in your program if you use the memory. The behavior is not guaranteed either way.

Exceeding array bound in C -- Why does this NOT crash?

I have this piece of code, and it runs perfectly fine, and I don't why:
int main() {
    int len = 10;
    char arr[len];
    arr[150] = 'x';
}
Seriously, try it! It works (at least on my machine)!
It doesn't, however, work if I try to change elements at indices that are too large, for instance index 20,000. So the compiler apparently isn't smart enough to just ignore that one line.
So how is this possible? I'm really confused here...
Okay, thanks for all the answers!
So I can use this to write into memory consumed by other variables on the stack, like so:
#include <stdio.h>

int main() {
    char b[4] = "man";
    char a[10];
    a[10] = 'c';
    puts(b);
}
Outputs "can". That's a really bad thing to do.
Okay, thanks.
C compilers generally do not generate code to check array bounds, for the sake of efficiency. Out-of-bounds array accesses result in "undefined behavior", and one possible outcome is that "it works". It's not guaranteed to cause a crash or other diagnostic, but if you're on an operating system with virtual memory support, and your array index points to a virtual memory location that hasn't yet been mapped to physical memory, your program is more likely to crash.
So how is this possible?
Because the stack was, on your machine, large enough that there happened to be a memory location on the stack at the location to which &arr[150] happened to correspond, and because your small example program exited before anything else referred to that location and perhaps crashed because you'd overwritten it.
The compiler you're using doesn't check for attempts to go past the end of the array (the C99 spec says that the result of arr[150], in your sample program, would be "undefined", so it could fail to compile it, but most C compilers don't).
Most implementations don't check for these kinds of errors. Memory access granularity is often very large (4 KiB boundaries), and the cost of finer-grained access control means that it is not enabled by default. There are two common ways for errors to cause crashes on modern OSs: either you read or write data from an unmapped page (instant segfault), or you overwrite data that leads to a crash somewhere else. If you're unlucky, then a buffer overrun won't crash (that's right, unlucky) and you won't be able to diagnose it easily.
You can turn instrumentation on, however. When using GCC, compile with Mudflap enabled. (Mudflap has since been removed from GCC; on modern toolchains, -fsanitize=address plays a similar role.)
$ gcc -fmudflap -Wall -Wextra test999.c -lmudflap
test999.c: In function ‘main’:
test999.c:3:9: warning: variable ‘arr’ set but not used [-Wunused-but-set-variable]
test999.c:5:1: warning: control reaches end of non-void function [-Wreturn-type]
Here's what happens when you run it:
$ ./a.out
*******
mudflap violation 1 (check/write): time=1362621592.763935 ptr=0x91f910 size=151
pc=0x7f43f08ae6a1 location=`test999.c:4:13 (main)'
/usr/lib/x86_64-linux-gnu/libmudflap.so.0(__mf_check+0x41) [0x7f43f08ae6a1]
./a.out(main+0xa6) [0x400a82]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f43f0538ead]
Nearby object 1: checked region begins 0B into and ends 141B after
mudflap object 0x91f960: name=`alloca region'
bounds=[0x91f910,0x91f919] size=10 area=heap check=0r/3w liveness=3
alloc time=1362621592.763807 pc=0x7f43f08adda1
/usr/lib/x86_64-linux-gnu/libmudflap.so.0(__mf_register+0x41) [0x7f43f08adda1]
/usr/lib/x86_64-linux-gnu/libmudflap.so.0(__mf_wrap_alloca_indirect+0x1a4) [0x7f43f08afa54]
./a.out(main+0x45) [0x400a21]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f43f0538ead]
number of nearby objects: 1
Oh look, it crashed.
Note that Mudflap is not perfect, it won't catch all of your errors.
Native C arrays do not get bounds checking. That would require additional instructions and data structures. C is designed for efficiency and leanness, so it doesn't specify features that trade performance for safety.
You can use a tool like valgrind, which runs your program in a kind of emulator and attempts to detect such things as buffer overflows by tracking which bytes are initialized and which aren't. But it's not infallible, for example if the overflowing access happens to perform an otherwise-legal access to another variable.
Under the hood, array indexing is just pointer arithmetic. When you say arr[150], you are adding 150 times the size of one element to the address of arr to obtain the address of a particular object. That address is just a number, and it might be nonsense, invalid, or itself an arithmetic overflow. Some of these conditions result in the hardware generating a crash, when it can't find memory to access or detects virus-like activity, but none result in software-generated exceptions, because there is no room for a software hook. If you want a safe array, you'll need to build functions around the principle of addition.
By the way, the array in your example isn't even technically of fixed size.
int len = 10; /* variable of type int */
char arr[len]; /* variable-length array */
Using a non-constant object to set the array size is a feature added in C99. You could just as well have len be a function parameter, user input, etc. A true constant is better for compile-time analysis:
#define LEN 10 /* integer constant expression */
char arr[LEN]; /* fixed-length array */
(Note that in C, unlike C++, even a const int is not an integer constant expression, so char arr[len] with const int len is formally still a variable-length array.)
For the sake of completeness: the C standard doesn't specify bounds checking, but neither is it prohibited. Out-of-bounds access falls under the category of undefined behavior, i.e. errors that need not generate error messages and can have any effect. It is possible to implement safe arrays, and various approximations of the feature exist. C does nod in this direction by making it illegal, for example, to take the difference between pointers into two different arrays in order to compute the out-of-bounds index that would reach an arbitrary object A from array B. But the language is very free-form, and if A and B are part of the same memory block from malloc, such arithmetic is legal. In other words, the more C-specific memory tricks you use, the harder automatic verification becomes, even with C-oriented tools.
Under the C spec, accessing an element past the end of an array is undefined behaviour. Undefined behaviour means that the specification does not say what would happen -- therefore, anything could happen, in theory. The program might crash, or it might not, or it might crash hours later in a completely unrelated function, or it might wipe your harddrive (if you got unlucky and poked just the right bits into the right place).
Undefined behaviour is not easily predictable, and it should absolutely never be relied upon. Just because something appears to work does not make it right, if it invokes undefined behaviour.
Because you were lucky. Or rather unlucky, because it means it's harder to find the bug.
The program will only crash if you touch memory the operating system has not mapped for your process (or, on some systems, otherwise protected memory). Your application is given a certain amount of memory when it starts, which in this case is enough, and you can mess about in your own memory as much as you like, but you'll give yourself a nightmare of a debugging job.

Why is this C program returning correct value in VC++2008?

We know that automatic variables are destroyed upon the return of the function.
Then, why is this C program returning correct value?
#include <stdio.h>
#include <process.h>

int *ReturningPointer()
{
    int myInteger = 99;
    int *ptrToMyInteger = &myInteger;
    return ptrToMyInteger;
}

int main()
{
    int *pointerToInteger = ReturningPointer();
    printf("*pointerToInteger = %d\n", *pointerToInteger);
    system("PAUSE");
}
Output
*pointerToInteger = 99
Edit
Then why is this giving garbage values?
#include <stdio.h>
#include <process.h>

char *ReturningPointer()
{
    char array[13] = "Hello World!";
    return array;
}

int main()
{
    printf("%s\n", ReturningPointer());
    system("PAUSE");
}
Output
xŤ
There is no answer to that question: your code exhibits undefined behavior. It could print "the right value" as you are seeing, it could print anything else, it could segfault, it could order pizza online with your credit card.
Dereferencing that pointer in main is illegal, it doesn't point to valid memory at that point. Don't do it.
There's a big difference between your two examples: in the first case, *pointer is evaluated before calling printf. So, given that there are no function calls between the line where you get the pointer value and the printf, chances are high that the stack location the pointer points to will not have been overwritten. So the value that was stored there prior to calling printf is likely to be output (that value is passed on printf's stack, not the pointer).
In the second case, you're passing a pointer to the stack to printf. The call to printf overwrites (a part of) that same stack region the pointer is pointing to, and printf ends up trying to print its own stack (more or less) which doesn't have a high chance of containing something readable.
Note that you can't rely on getting gibberish either. Your implementation is free to use a different stack for the printf call if it feels like it, as long as it follows the requirements laid out by the standard.
This is undefined behavior, and it could have launched a missile instead. But it just happened to give you the correct answer.
Think about it; it kind of makes sense -- what else did you expect? Should it have given you zero? If so, the compiler would have to insert special instructions at the end of the scope to erase the variable's contents -- a waste of resources. The most natural thing for the compiler to do is to leave the contents unchanged -- so you just got the correct output from undefined behavior, by chance.
You could call this behavior implementation-dependent. For example, another compiler (or the same compiler in "Release" mode) may decide to keep myInteger purely in a register (it may not actually be able to once you take its address, but for the sake of argument...); in that case no memory would be allocated for 99 and you would get garbage output.
As a more illustrative (but totally untested) example -- if you insert some malloc and exercise some memory usage before printf you may find the garbage value you were looking for :P
Answer to "Edited" part
The "real" answer that you want needs to be answered in disassembly. A good place to start is gcc -S and gcc -O3 -S. I will leave the in-depth analysis for wizards that will come around. But I did a cursory peek using GCC and it turns out that printf("%s\n") gets translated to puts, so the calling convention is different. Since local variables are allocated on the stack, calling a function could "destroy" previously allocated local variables.
Destroyed is the wrong word, imho. Locals reside on the stack; when the function returns, that stack space may be reused. Until then it is not overwritten and is still reachable through pointers, which you usually do not want (because such a pointer may no longer point to anything valid).
Pointers address space in memory, and for local pointers the same as described above applies. However, here the pointer is passed back to the main program.
If it really is the address that stored the former integer, it will yield 99 up until the point in the execution of your program when the program overwrites that memory location. It may also be another 99 by coincidence. Anyway: do not do this.
These kinds of errors will lead to trouble some day, maybe on other machines, other OSes, other compilers or compiler options -- imagine you upgrade your compiler, which may change the memory-usage behaviour, or build with optimization flags, e.g. release builds vs. default debug builds; you name it.
In most C/C++ programs, local variables live on the stack, and "destroyed" means overwritten with something else. In this case that particular location had not been overwritten yet when it was passed as a parameter to printf().
Of course, having such code is asking for trouble because per the C and C++ standards it exhibits undefined behavior.
That is undefined behavior. That means that anything can happen, even what you would expect.
The tricky part of UB is when it gives you the result you expect, and so you think that you are doing it right. Then, any change in an unrelated part of the program changes that...
Answering your question more specifically: you are returning a pointer to an automatic variable that no longer exists when the function returns, but since you call no other functions in between, it happens to keep the old value.
If you call, for example printf twice, the second time it will most likely print a different value.
The key idea is that a variable represents a name and type for value stored somewhere in memory. When it is "destroyed", it means that a) that value can no longer be accessed using that name, and b) the memory location is free to be overwritten.
The behavior is undefined because the implementation of the compiler is free to choose what time after "destruction" the location is actually overwritten.

C Tutorial - Wonder about `int i = *(int *)&s;`

Working my way through a C tutorial
#include <stdio.h>

int main() {
    short s = 10;
    int i = *(int *)&s; // wonder about this
    printf("%i", i);
    return 0;
}
When I tell C that the address of s is an int, should it not read 4 bytes, starting from the address of s? Since the short was only assigned 2 bytes, I don't know what the extra bytes being read are. Isn't this critically dangerous?
Shouldn't this crash for trying to access memory that I haven't assigned / that doesn't belong to me?
Don't do that ever
Throw away the tutorial if it teaches/preaches that.
As you pointed out, it will read more bytes than were actually allocated, so it reads some garbage value from memory not allocated to your variable.
In fact it is dangerous and it breaks the Strict Aliasing Rule[Detail below] and causes an Undefined Behavior.
The compiler should give you a warning like this.
warning: dereferencing type-punned pointer will break strict-aliasing rules
And you should always listen to your compiler when it cries out that warning.
[Detail]
Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)
The exception to the rule is a char*, which is allowed to point to any type.
First of all, never do this.
As to why it doesn't crash: since s is a local, it's allocated on the stack. If short and int have different sizes in your architecture (which is not a given), then you will probably end up reading a few more bytes from memory that's on the same memory page as the stack, so there will be no access violation (even though you will read garbage).
Probably.
This is dangerous and undefined behaviour, just as you said.
The reason why it doesn't crash on 32- (or 64-) bit platforms is that most compilers allocate at least 32 bits for each stack variable. This makes the access faster, but on e.g. an 8-bit processor you would get garbage data in the upper bits instead.
No it's not going to crash your program, however it is going to be reading a portion of other variables (or possibly garbage) on the stack. I don't know what tutorial you got this from, but that kind of code is scary.
First of all, all data pointers are the same size on a given platform: on a 64-bit architecture, each char *, short *, or int * occupies 8 bytes.
When using a star before an ampersand it will cancel the effect, so *&x is semantically equivalent to just x.
Basically you are right in the sense that since you are accessing an int * pointer, this will fetch 4 bytes instead of the only 2 reserved for 's' storage and the resulting content won't be a perfect reflection of what 's' really means.
However this most likely won't crash since 's' is located on the stack so depending on how your stack is laid out at this point, you will most likely read data pushed during the 'main' function prologue...
For a program to crash due to an invalid memory read, you need to access a memory region that is not mapped, which triggers a 'segmentation fault' at the user level (a 'page fault' at the kernel level). By 'mapped' I mean there is a known mapping between a virtual memory region and a physical memory region; such mappings are handled by the operating system. That is why accessing a NULL pointer raises such an exception: there is no valid mapping there. A valid mapping is usually given to you by calling something like malloc() (note that malloc() is not a syscall but a smart wrapper that manages your virtual memory blocks). Your stack is no exception, since it is just memory like anything else, but a pre-mapped area is already set up for you, so that when you create a local variable in a block you don't have to worry about its memory location. In this case you are not reaching far enough to touch anything non-mapped.
Now let's say you do something like that:
short s = 10;
int *i = (int *)&s;
*i = -1;
Then your program is more likely to crash, since now you are overwriting data. Depending on the data you touch, the effect might range from harmless program misbehavior to a crash, for instance if you overwrite the return address pushed on the stack... Data corruption is to me one of the hardest (if not the hardest) categories of bugs to deal with, since its effects can hit your system randomly, with a non-deterministic pattern, and might appear long after the offending instructions were actually executed.
If you want to understand more about internal memory management, you probably want to look into Virtual Memory Management in Operating System designs.
Hope it helps,

C - calling a function via func_ptr, why doesn't it work?

I have the following code:

void print(const char *str)
{
    system_call(4, 1, str, strlen(str));
}

void foo2(void) { print("goo \n"); }

void buz(void) { ... }

int main()
{
    char buf[256];
    void (*func_ptr)(void) = (void (*)(void))buf;
    memcpy(buf, foo2, ((void *)buz) - ((void *)foo2));
    func_ptr();
    return 0;
}
The question is: why will this code fail? The answer given was something about calls made not via a pointer using relative addresses, but I haven't been able to figure out what's wrong here. Which line is the problematic one?
Thank you for your help.
Well to begin with, there is nothing which says that foo2() and buz() must be next to each other in memory. And for another, as you guess, the code must be relative for stunts like that to work. But most of all, it is not allowed by the standard.
As Chris Luts referred to, stack (auto) variables are not executable on many operating systems, as protection against attacks.
The first two lines in your main() function are problematic.
Line 1. (void(*)(void))buf
converting buf to a function pointer is undefined
Line 2. ((void*)buz)-((void*)foo2)
subtraction of pointers is undefined unless the pointers point within the same array.
Also, Section 5.8 Functions of H&S says "Although a pointer to a function is often assumed to be the address of the function's code in memory, on some computers a function pointer actually points to a block of information needed to invoke the function."
First and foremost, the C function pointer mechanism is for calling functions of equal signature through an abstraction. This is powerful and error-prone enough without stunts like these.
I can't see an advantage or any sense in trying to copy code from one place to another. As some have commented, it's not easy to tell how much of the code within a C function is relative/relocatable.
You tried copying the code of a function onto a data memory region. Some microcontrollers would just tell you "Buzz off!". On machine architectures with separate data/program memories, given a very understanding compiler (or one that recognizes data/code modifiers/attributes), it would compile to specific code-to-data move instructions, and it might seem it would work... However, even on architectures with separate data/code memories, executing instructions from data memory is not possible.
On the other hand, on "normal" PCs with shared data/code memory, it would likely still not work, because data/code segments are declared (by the loader) in the MMU of the processor. Depending on the processor and OS, an attempt to run code in a data segment causes a segmentation fault.
