C Tutorial - Wonder about `int i = *(int *)&s;`

Working my way through a C tutorial
#include <stdio.h>

int main() {
    short s = 10;
    int i = *(int *)&s; // wonder about this
    printf("%i", i);
    return 0;
}
When I tell C that the address of s is an int, shouldn't it read 4 bytes, starting from the leftmost of the 2 bytes of s? In that case, isn't this critically dangerous, since I don't know what it is reading beyond the 2 bytes the short was assigned?
Shouldn't this crash for trying to access memory that I haven't allocated / that doesn't belong to me?

Don't ever do that.
Throw away the tutorial if it teaches/preaches that.
As you pointed out, it will read more bytes than were actually allocated, so it reads some garbage value from memory not allocated to your variable.
It is in fact dangerous: it breaks the strict aliasing rule [detail below] and causes undefined behavior.
The compiler should give you a warning like this.
warning: dereferencing type-punned pointer will break strict-aliasing rules
And you should always listen to your compiler when it cries out that warning.
[Detail]
Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)
The exception to the rule is a char*, which is allowed to point to any type.
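If the goal is just to see the bytes of s widened into an int, the portable route is memcpy, which copies through unsigned char and so is exempt from the aliasing rule. A minimal sketch (what gets printed depends on your platform's endianness; on a little-endian machine it prints 10):

#include <stdio.h>
#include <string.h>

int main(void) {
    short s = 10;
    int i = 0;                 /* start from a fully defined value */
    memcpy(&i, &s, sizeof s);  /* copy only the 2 bytes s actually owns */
    printf("%i\n", i);         /* 10 on little-endian; representation-dependent */
    return 0;
}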

First of all, never do this.
As to why it doesn't crash: since s is a local variable, it's allocated on the stack. If short and int have different sizes on your architecture (which is not a given), you will probably end up reading a few extra bytes from memory on the same memory page as the stack, so there will be no access violation (even though you will read garbage).
Probably.

This is dangerous and undefined behaviour, just as you said.
The reason it doesn't crash on 32- (or 64-) bit platforms is that most compilers allocate at least 32 bits for each stack variable. This makes access faster, but on, e.g., an 8-bit processor you would get garbage data in the upper bits instead.

No, it's not going to crash your program; however, it is going to read a portion of other variables (or possibly garbage) on the stack. I don't know what tutorial you got this from, but that kind of code is scary.

First of all, all addresses are the same size: on a 64-bit architecture, every char *, short * or int * occupies 8 bytes.
A star applied directly to an ampersand cancels it, so *&x is semantically equivalent to just x. Here, though, the cast in between changes the type the read is performed through, and that is exactly what makes the expression dangerous.
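For contrast, a small sketch of the difference the cast makes:

short s = 10;
short a = *&s;          /* fine: * cancels &, a is just s */
int   b = *(int *)&s;   /* not fine: reads through the wrong type, undefined behavior */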

Basically you are right, in the sense that since you are reading through an int * pointer, this fetches 4 bytes instead of the 2 reserved for s's storage, so the resulting value won't be a faithful reflection of what s really holds.
However, this most likely won't crash, since s is located on the stack; depending on how your stack is laid out at this point, you will most likely read data pushed during the main function's prologue...
For a program to crash on an invalid memory read, you need to access a memory region that is not mapped, which triggers a 'segmentation fault' at the user level and a 'page fault' at the kernel level. By 'mapped' I mean there is a known translation between a virtual memory region and a physical memory region (such mappings are handled by the operating system). That is why accessing a NULL pointer raises this exception: there is no valid mapping at that address. A valid mapping is usually handed to you by calling something like malloc() (note that malloc() is not a syscall but a smart wrapper that manages your virtual memory blocks). Your stack is no exception, since it is just memory like anything else, but a pre-mapped area has already been set up for it, so that when you create a local variable in a block you don't have to worry about its memory location. In this case you are simply not reaching far enough past s to touch anything unmapped.
Now let's say you do something like that:
short s = 10;
int *i = (int *)&s;
*i = -1;
Then your program is much more likely to crash, since now you are overwriting data. Depending on the data you touch, the effect ranges from harmless program misbehavior to a crash, for instance if you overwrite the return address pushed on the stack... Data corruption is to me one of the hardest (if not the hardest) categories of bug to deal with, since its effects can hit your system randomly, with a non-deterministic pattern, and may surface long after the offending instructions were actually executed.
If you want to understand more about internal memory management, you probably want to look into Virtual Memory Management in Operating System designs.
Hope it helps,

Related

What is the trick behind the uninitialized char pointer in this strncpy() code?

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void main()
{
    char *imsi;
    unsigned int i;
    int val;
    char *dest;

    imsi = "405750111";
    strncpy(dest, imsi, 5);
    printf("%s", dest);
    /* i = 10; */
}
In the above code, with the i = 10 assignment commented out as above, the code runs without error. When the assignment is included, a segmentation fault occurs at strncpy(dest,imsi,5);.
By keeping i from being optimized away (i.e., declaring volatile int i;), the error goes away even with the assignment (i = 10) included.
In your code, by saying
strncpy(dest,imsi,5);
you're trying to write through an uninitialized pointer, dest. It can (and most probably will) point to some memory that is not accessible to your program (invalid memory). That invokes undefined behavior.
Nothing is guaranteed about a program that has UB. It can work as expected (depending on what you're expecting, actually), or it may crash, or it may open your bank account and transfer all the money to some terrorist organization.
N.B. I hope reading that last line scared you. The bottom line is:
Don't write through an uninitialized pointer (memory area). Period.
The behaviour of this code is unpredictable because the pointer dest is used before it is initialised. The difference in observed behaviour is only indirectly related to the root-cause bug, which is the uninitialised variable. In C it is the programmer's responsibility to allocate storage for the output of strncpy(), and you haven't done that.
The simplest fix is to define an output buffer like this:
char dest[10];
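Putting that together, a corrected sketch of the program (note that strncpy does not NUL-terminate when the source is longer than the count, so the terminator must be added by hand):

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *imsi = "405750111";
    char dest[10];            /* storage the program actually owns */
    strncpy(dest, imsi, 5);   /* copies 5 chars, adds no terminator */
    dest[5] = '\0';           /* terminate explicitly */
    printf("%s\n", dest);     /* prints "40575" */
    return 0;
}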
Assuming you compiled this C source code into machine code for some "normal" architecture and then ran it, the possible effects of read-undefined UB basically boil down to what value floating around in registers or memory ends up getting used.
If the compiler happens to use the same value both times, and that value happened to point to a writeable memory address (and didn't overwrite anything that would break printf), it could certainly happen to work. UB doesn't guarantee a crash. It doesn't guarantee anything. Part of the point of UB is to let the compiler make assumptions and optimize based on them.
Any change to the surrounding code will affect code generation for that function, and thus can affect what's in the relevant register when the call happens, or which stack slot is used for dest. Reading from a different stack address will give dest a different value.
Before main, calls to dynamic-linker functions might have dirtied some memory, leaving some pointers floating around in there, maybe including apparently some to writeable memory.
Or main's caller might have a pointer to some writeable memory in a register, e.g. a stack address.
Although that's less likely; if a compiler were simply not going to set a register before making a call, strncpy would probably receive main's first argument, the integer argc, unless the compiler had used that register as a temporary first. And string literals normally go in read-only memory, so that's an unlikely explanation in this case. (This holds even on an ISA / calling convention like ARM, where GCC's favourite register for temporaries is R0, which is both the return-value register and the first argument-passing register; with optimization disabled, so that statements compile separately, most expressions will use R0.)

Exceeding array bound in C -- Why does this NOT crash?

I have this piece of code, and it runs perfectly fine, and I don't know why:
int main(){
    int len = 10;
    char arr[len];
    arr[150] = 'x';
}
Seriously, try it! It works (at least on my machine)!
It doesn't, however, work if I try to change elements at indices that are too large, for instance index 20,000. So the compiler apparently isn't smart enough to just ignore that one line.
So how is this possible? I'm really confused here...
Okay, thanks for all the answers!
So I can use this to write into memory consumed by other variables on the stack, like so:
#include <stdio.h>

int main(){
    char b[4] = "man";
    char a[10];
    a[10] = 'c';
    puts(b);
}
Outputs "can". That's a really bad thing to do.
Okay, thanks.
C compilers generally do not generate code to check array bounds, for the sake of efficiency. Out-of-bounds array accesses result in "undefined behavior", and one possible outcome is that "it works". It's not guaranteed to cause a crash or other diagnostic, but if you're on an operating system with virtual memory support, and your array index points to a virtual memory location that hasn't yet been mapped to physical memory, your program is more likely to crash.
So how is this possible?
Because, on your machine, the stack was large enough that there happened to be a memory location at the address to which &arr[150] corresponded, and because your small example program exited before anything else referred to that location and perhaps crashed because you'd overwritten it.
The compiler you're using doesn't check for attempts to go past the end of the array (the C99 spec says that the result of arr[150] in your sample program is "undefined", so a compiler could refuse to compile it, but most C compilers don't).
Most implementations don't check for these kinds of errors. Memory access granularity is often very large (4 KiB boundaries), and the cost of finer-grained access control means that it is not enabled by default. There are two common ways for errors to cause crashes on modern OSs: either you read or write data from an unmapped page (instant segfault), or you overwrite data that leads to a crash somewhere else. If you're unlucky, then a buffer overrun won't crash (that's right, unlucky) and you won't be able to diagnose it easily.
You can turn instrumentation on, however. When using GCC, compile with Mudflap enabled.
$ gcc -fmudflap -Wall -Wextra test999.c -lmudflap
test999.c: In function ‘main’:
test999.c:3:9: warning: variable ‘arr’ set but not used [-Wunused-but-set-variable]
test999.c:5:1: warning: control reaches end of non-void function [-Wreturn-type]
Here's what happens when you run it:
$ ./a.out
*******
mudflap violation 1 (check/write): time=1362621592.763935 ptr=0x91f910 size=151
pc=0x7f43f08ae6a1 location=`test999.c:4:13 (main)'
/usr/lib/x86_64-linux-gnu/libmudflap.so.0(__mf_check+0x41) [0x7f43f08ae6a1]
./a.out(main+0xa6) [0x400a82]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f43f0538ead]
Nearby object 1: checked region begins 0B into and ends 141B after
mudflap object 0x91f960: name=`alloca region'
bounds=[0x91f910,0x91f919] size=10 area=heap check=0r/3w liveness=3
alloc time=1362621592.763807 pc=0x7f43f08adda1
/usr/lib/x86_64-linux-gnu/libmudflap.so.0(__mf_register+0x41) [0x7f43f08adda1]
/usr/lib/x86_64-linux-gnu/libmudflap.so.0(__mf_wrap_alloca_indirect+0x1a4) [0x7f43f08afa54]
./a.out(main+0x45) [0x400a21]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f43f0538ead]
number of nearby objects: 1
Oh look, it crashed.
Note that Mudflap is not perfect, it won't catch all of your errors.
Native C arrays do not get bounds checking. That would require additional instructions and data structures. C is designed for efficiency and leanness, so it doesn't specify features that trade performance for safety.
You can use a tool like valgrind, which runs your program in a kind of emulator and attempts to detect such things as buffer overflows by tracking which bytes are initialized and which aren't. But it's not infallible, for example if the overflowing access happens to perform an otherwise-legal access to another variable.
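For example (assuming the program is saved as test.c):

$ gcc -g test.c -o test   # -g lets valgrind report line numbers
$ valgrind ./test

Bear in mind that valgrind's default memcheck tool watches heap allocations closely but has limited visibility into overruns of stack arrays like the one here.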
Under the hood, array indexing is just pointer arithmetic. When you write arr[150], you are just taking 150 times the size of one element and adding it to the address of arr to obtain the address of a particular object. That address is just a number, and it might be nonsense, invalid, or itself an arithmetic overflow. Some of these conditions make the hardware generate a crash, when it can't find memory to access or detects suspicious activity, but none of them produces a software-generated exception, because there is no room for a software hook. If you want a safe array, you need to wrap that addition in functions that check the index first, as in the sketch below.
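A minimal sketch of what such a checked accessor could look like (the names are hypothetical, not a standard API):

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bounds-checked store: fails loudly instead of corrupting memory. */
static void checked_store(char *arr, size_t len, size_t idx, char value)
{
    if (idx >= len) {
        fprintf(stderr, "index %zu out of bounds for length %zu\n", idx, len);
        abort();
    }
    arr[idx] = value;
}

With this, checked_store(arr, 10, 150, 'x') aborts with a message instead of silently scribbling past the end of the array.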
By the way, the array in your example isn't even technically of fixed size.
int len = 10; /* variable of type int */
char arr[len]; /* variable-length array */
Using a non-constant object to set the array size is a feature new in C99. You could just as well have len be a function parameter, user input, etc. Note that in C (unlike C++) a const-qualified int is still not an integer constant expression, so const int len = 10; would give you a variable-length array as well. For a true fixed-size array, which is friendlier to compile-time analysis, use an enumeration constant or a macro:
enum { LEN = 10 }; /* integer constant expression */
char arr[LEN];     /* fixed-size array */
For the sake of completeness: the C standard doesn't specify bounds checking, but neither does it prohibit it. It falls under the category of undefined behavior, that is, errors that need not generate error messages and can have any effect. It is possible to implement safe arrays, and various approximations of the feature exist. C does nod in this direction by making it illegal, for example, to take the difference between pointers into two different arrays in order to find the out-of-bounds index that would reach an arbitrary object A from array B. But the language is very free-form, and if A and B are part of the same memory block from malloc, it is legal. In other words, the more C-specific memory tricks you use, the harder automatic verification becomes, even with C-oriented tools.
Under the C spec, accessing an element past the end of an array is undefined behaviour. Undefined behaviour means that the specification does not say what would happen -- therefore, anything could happen, in theory. The program might crash, or it might not, or it might crash hours later in a completely unrelated function, or it might wipe your harddrive (if you got unlucky and poked just the right bits into the right place).
Undefined behaviour is not easily predictable, and it should absolutely never be relied upon. Just because something appears to work does not make it right, if it invokes undefined behaviour.
Because you were lucky. Or rather unlucky, because it means the bug is harder to find.
The runtime will only crash if you touch memory that isn't mapped into your process (or, on systems without memory protection, memory belonging to another process). Your application is given a certain amount of memory when it starts, which in this case is enough, and you can mess about in your own memory as much as you like, but you'll give yourself a nightmare of a debugging job.

C program help: Insufficient memory allocation but still works...why? [duplicate]

Possible Duplicate:
behaviour of malloc(0)
I'm trying to understand memory allocation in C, so I am experimenting with malloc. I allocated 0 bytes for this pointer, yet it can still hold an integer. In fact, no matter what number I put into the parameter of malloc, it can still hold any number I give it. Why is this?
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *ptr = (int*)malloc(0);
    *ptr = 9;
    printf("%i", *ptr); // 9
    free(ptr);
    return 0;
}
It still prints 9, what's up with that?
If size is 0, then malloc() returns either NULL, or a unique pointer
value that can later be successfully passed to free().
I guess you are hitting the 2nd case.
Anyway, that pointer just happens, by accident, to land in an area where you can write without generating a segmentation fault, but you are probably writing into the space of some other variable, messing up its value.
A lot of good answers here. But it is definitely undefined behavior. Some people declare that undefined behavior means that purple dragons may fly out of your computer or something like that... there's probably some history behind that outrageous claim that I'm missing, but I promise you that purple dragons won't appear regardless of what the undefined behavior will be.
First of all, let me mention that in the absence of an MMU, on a system without virtual memory, your program would have direct access to all of the memory on the system, regardless of its address. On a system like that, malloc() is merely a helper that carves out pieces of memory for you in an orderly manner; the system can't actually force you to use only the addresses that malloc() gave you.
On a system with virtual memory, the situation is slightly different... well, OK, a lot different. But within your program, any code can access any part of the virtual address space that's mapped via the MMU to real physical memory. It doesn't matter whether you got an address from malloc() or whether you called rand() and happened to get an address that falls in a mapped region of your program: if it's mapped and not marked execute-only, you can read it, and if it isn't marked read-only, you can write it as well. Yes, even if you didn't get it from malloc().
Let's consider the possibilities for the malloc(0) undefined behavior:
malloc(0) returns NULL.
OK, this is simple enough. There really is a physical address 0x00000000 in most computers, and even a virtual address 0x00000000 in every process, but the OS intentionally doesn't map any memory to that address so that it can trap null pointer accesses. There's a whole page (generally 4KB) there that's never mapped at all, and maybe much more than 4KB. Therefore if you try to read or write through a null pointer, even with an offset from it, you'll hit those unmapped pages of virtual memory, and the MMU will throw an exception (a hardware exception, or interrupt) that the OS catches and turns into SIGSEGV (on Linux/Unix) or an access violation (on Windows).
malloc(0) returns a valid address to previously unallocated memory of the smallest allocable unit.
With this, you actually get a real piece of memory that you can legally call your own, of some size you don't know. You really shouldn't write anything there (and probably not read either), because you don't know how big it is, and for that matter you don't know whether this is the particular case you're experiencing (see the following cases). If it is, the block of memory you were given is almost guaranteed to be at least 4 bytes, probably 8, perhaps even larger; it all depends on the size of your implementation's minimum allocable unit.
malloc(0) intentionally returns the address of an unmapped page of memory other than NULL.
This is probably a good option for an implementation, as it would allow you or the system to track & pair together malloc() calls with their corresponding free() calls, but in essence, it's the same as returning NULL. If you try to access (read/write) via this pointer, you'll crash (SEGV or illegal access).
malloc(0) returns an address in some other mapped page of memory that may be used by "someone else".
I find it highly unlikely that a commercially available system would take this route, as it serves to hide bugs rather than bring them out as soon as possible. But if it did, malloc() would be returning a pointer to somewhere in memory that you do not own. If this is the case, sure, you can write to it all you want, but you'd be corrupting some other code's memory, though it would still be memory in your own program's process, so you can be assured that you're at least not stomping on another program's memory. (I hear someone getting ready to say, "But it's UB, so technically it could be stomping on some other program's memory." Yes, in some environments, like an embedded system, that is right. But no modern commercial OS would let one process reach another process's memory as easily as simply calling malloc(0); in fact, you simply can't get from one process to another process's memory without going through the OS to do it for you.) Anyway, back to reality... This is the one where "undefined behavior" really kicks in: if you're writing to "someone else's memory" (in your own program's process), you'll be changing the behavior of your program in difficult-to-predict ways. Knowing the structure of your program and where everything is laid out in memory, it's fully predictable. But from one system to another, things will be laid out at different locations in memory, so the effect on one system would not necessarily be the same as the effect on another system, or on the same system at a different time.
And finally... no, that's it. There really, truly, are only those four possibilities. You could argue for special-case subsets of the last two, but the end result will be the same.
For one thing, your compiler may be seeing these two lines back to back and optimizing them:
*ptr = 9;
printf("%i", *ptr);
With such a simplistic program, your compiler may actually be optimizing away the entire memory allocate/free cycle and using a constant instead. A compiler-optimized version of your program could end up looking more like simply:
printf("9");
The only way to tell if this is indeed what is happening is to examine the assembly that your compiler emits. If you're trying to learn how C works, I recommend explicitly disabling all compiler optimizations when you build your code.
Regarding your particular malloc usage, remember that you will get a NULL pointer back if allocation fails. Always check the return value of malloc before you use it for anything. Blindly dereferencing it is a good way to crash your program.
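A minimal sketch of the same program with that check in place (and requesting room for a whole int rather than 0 bytes):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *ptr = malloc(sizeof *ptr);   /* room for one int, not 0 bytes */
    if (ptr == NULL) {                /* allocation can fail: always check */
        fprintf(stderr, "out of memory\n");
        return 1;
    }
    *ptr = 9;
    printf("%i\n", *ptr);
    free(ptr);
    return 0;
}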
The link that Nick posted gives a good explanation about why malloc(0) may appear to work (note the significant difference between "works" and "appears to work"). To summarize the information there, malloc(0) is allowed to return either NULL or a pointer. If it returns a pointer, you are expressly forbidden from using it for anything other than passing it to free(). If you do try to use such a pointer, you are invoking undefined behavior and there's no way to tell what will happen as a result. It may appear to work for you, but in doing so you may be overwriting memory that belongs to another program and corrupting their memory space. In short: nothing good can happen, so leave that pointer alone and don't waste your time with malloc(0).
You can find the answer to why the malloc(0)/free() calls don't crash here:
zero size malloc
As for *ptr = 9: it is just like overflowing a buffer (like malloc'ing 10 bytes and accessing the 11th); you are writing to memory you don't own, and doing that is asking for trouble. In this particular implementation, malloc(0) happens to return a pointer instead of NULL.
Bottom line, it is wrong even if it seems to work on a simple case.
Some memory allocators have a notion of "minimum allocatable size". So even if you pass zero, they will return a pointer to, for example, a word-sized piece of memory. You need to check your system allocator's documentation. But even if it does return a pointer to some memory, it would be wrong to rely on that, since the pointer is only supposed to be passed to realloc() or free().
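If you are curious, glibc exposes the actual size of a block through the non-portable malloc_usable_size(); a small sketch, assuming a glibc system:

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* glibc-specific header */

int main(void)
{
    void *p = malloc(0);
    /* Often prints a non-zero size even for malloc(0);
       a glibc extension, not something portable code may rely on. */
    printf("%zu\n", malloc_usable_size(p));
    free(p);
    return 0;
}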

Allocating less space than necessary for a certain type?

I am relatively new to C programming and having a hard time understanding the whole memory allocation issue.
Let's say, I do:
int *n = malloc(sizeof(char));
// (assuming malloc doesn't return NULL of course)
That provides a pointer to int, but I didn't allocate enough memory for an int. Why does it work, then? I could even cast the result to int * explicitly and it wouldn't bother gcc. I am aware of C compilers being very minimalist, but even if I assign a value to *n which doesn't fit in a char, like:
*n = 300;
... and print it out afterwards:
printf("%d", *n);
... it works perfectly fine, although by this point at the latest I'd expect some error like a segmentation fault.
I mean, sizeof(char) is 1 and sizeof(int) is 4 on my machine. Hence 3 bytes are written to some place in memory which hasn't been allocated properly.
Does it work just because it doesn't leave the stack?
Could somebody please point me to a place where I might find enlightenment concerning that stuff?
That provides a pointer to int, but I didn't allocate enough memory for an int. Why does it work, then?
The return value of malloc is void *, and the language allows this to be implicitly converted to any object pointer type, in this case int *. Compilers don't typically include behavior to check that what you passed to malloc meets the size requirement of the destination type; in real-world code that can be very difficult (when non-constant sizes not known at compile time are passed to malloc). As you said, C compilers are usually rather minimalist. There are such things as "static analysis" tools, which can analyze code to try to find these bugs, but that's a whole different class of tool than a compiler.
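A common idiom that makes this particular mistake hard to commit is to take the size from the target pointer itself rather than naming a type; a small sketch:

#include <stdlib.h>

int main(void)
{
    int *n = malloc(sizeof *n);   /* sizeof *n == sizeof(int): cannot mismatch */
    if (n != NULL) {
        *n = 300;                 /* now genuinely within the allocation */
        free(n);
    }
    return 0;
}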
... it works perfectly fine, although by this point at the latest I'd expect some error like a segmentation fault. I mean, sizeof(char) is 1 and sizeof(int) is 4 on my machine. Hence 3 bytes are written to some place in memory which hasn't been allocated properly.
Writing beyond the bounds of allocated memory is what is called "undefined behavior". That means that a compliant compiler can do whatever it wants when that happens. Sometimes it will crash, sometimes it can write over some other variable in your program, sometimes nothing will happen, and sometimes nothing will seem to happen and your program will crash at a later date.
In this particular case what is happening is that most implementations of malloc allocate a minimum of 16 bytes (or more or less, say 8 or 32) even if you ask for fewer. So when you overwrite your single allocated byte, you're writing into "extra" memory that was not being used for anything. You should absolutely not rely on that behavior in any real program.
Does it work just because it doesn't leave the stack?
The stack has nothing to do with this particular situation.
Could somebody please point me to a place where I might find enlightenment concerning that stuff?
Any good C book will have information of this type, take a look here: The Definitive C Book Guide and List
Generally a 32-bit machine will align new allocations on a 32-bit boundary; it makes memory access faster.
So it has allocated one byte, but the next 3 bytes are unused.
Don't rely on this!

Is unused memory in address space protected

Is the unused memory in the address space of a process protected, say by having only read permission, so that writing to a location pointed to by an uninitialized pointer, for example, always causes a page fault that is trapped by the OS? Or is that not the case, and every memory location besides the code (which of course is read-only) is given write access?
I'm asking this because my friend was showing me code in which he didn't initialize a pointer and wrote to the memory it pointed to, yet his program wasn't crashing when built with the MinGW GCC compiler for Windows, while it always crashed when built with Visual C++, or on Mac or Linux.
What I think is that the OS does not protect memory in unused areas, and the crashing was caused by the fact that in the code generated by MinGW the random pointer value happened to point into some used area such as the stack, heap or code, while in the other cases it pointed into some free area. But if the OS really doesn't protect the unused areas, wouldn't this sort of bug, such as uninitialized pointers, be very difficult to debug?
I guess this is why it is advised to always assign NULL to a pointer after calling delete or free, so that a later access through it really causes a visible crash.
Uninitialized pointers don't necessarily point to unused address space. They could very well hold values that happen to point to writeable memory, such as a pointer on the stack that happened to be where a previously executed function stored a valid address.
In a typical, current server/desktop OS (and quite a few smaller systems such as cell phones as well) you have virtual memory. This means the OS builds a table that translates from the virtual address your code uses, to a physical address that specifies the actual memory being addressed. This mapping is normally done in "pages" -- e.g., a 4KB chunk of memory at a time.
At least in the usual case, parts of the address space that aren't in use at all simply won't be mapped at all -- i.e., the OS won't build an entry in the table for that part of the address space. Note, however, that memory that is allocated will (of necessity) be rounded to a multiple of the page size, so each chunk of memory that's in use will often be followed by some small amount that's not really in use, but still allocated and "usable". Since protection is also (normally) done on a per-page basis, if the rest of that page is (say) Read-only, the remainder at the tail end will be the same.
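On a POSIX system you can query that page granularity directly; a small sketch:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Mapping and protection are applied in units of this size (often 4096). */
    long page = sysconf(_SC_PAGESIZE);
    printf("page size: %ld bytes\n", page);
    return 0;
}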
It depends on the implementation of the OS. In some configurations, for example, ExecShield will protect most of the memory outside the bounds of the program, and it is also common for the first few bytes of the data segment to be protected (to catch accesses through NULL pointers), but it may well be that the pointer actually points to a valid, arbitrary memory address within the program.
Memory protection is not provided by C/C++. You may find that the pointer just happens to contain a pointer to valid memory, e.g. a previous function had a ptr variable on the stack, and another function called later happens to use the same stack slot for its pointer.
The following code will print "Hello" if compiled and run with gcc:
#include <stdio.h>

char buffer[10];

void function1(void) {
    char *ptr = buffer;
    sprintf(buffer, "Hello");
    return;
}

void function2(void) {
    char *ptr2;
    printf("%s\n", ptr2);
}

int main(int argc, char *argv[]) {
    function1();
    function2();
}
For debug builds some compilers (I know that Visual Studio used to do this) will secretly initialise all variables like ptr2 to a bad value to detect these kinds of error.
With C, you normally find out that memory has been abused when the OS kills your program.
Put simply, I assume the answer is "No, unused memory in the address space is not protected." C isn't sophisticated enough to handle such instances.
