Accessing memory below the stack on linux - c

This program accesses memory below the stack.
I would assume to get a segfault or just nuls when going out of stack bounds but I see actual data. (This is assuming 100kb below stack pointer is beyond the stack bounds)
Or is the system actually letting me see memory below the stack? Weren't there supposed to be kernel level protections against this, or does that only apply to allocated memory?
Edit: With 1024*127 below char pointer it randomly segfaults or runs, so the stack doesn't seem to be a fixed 8MB, and there seems to be a bit of random to it too.
#include <stdio.h>
int main(){
char * x;
int a;
for( x = (char *)&x-1024*127; x<(char *)(&x+1); x++){
a = *x & 0xFF;
printf("%p = 0x%02x\n",x,a);
}
}
Edit: Another wierd thing. The first program segfaults at only 1024*127 but if I printf downwards away from the stack I don't get a segfault and all the memory seems to be empty (All 0x00):
#include <stdio.h>
int main(){
char * x;
int a;
for( x = (char *)(&x); x>(char *)&x-1024*1024; x--){
a = *x & 0xFF;
printf("%p = 0x%02x\n",x,a);
}
}

When you access memory, you're accessing the process address space.
The process address space is divided into pages (typically 4 KB on x86). These are virtual pages: their contents are held elsewhere. The kernel manages a mapping from virtual pages to their contents. Contents can be provided by:
A physical page, for pages that are currently backed by physical RAM. Accesses to these happen directly (via the memory management hardware).
A page that's been swapped out to disk. Accessing this will cause a page fault, which the kernel handles. It needs to fill a physical page with the on-disk contents, so it finds a free physical page (perhaps swapping that page's contents out to disk), reads in the contents from disk, and updates the mapping to state that "virtual page X is in physical page Y".
A file (i.e. a memory mapped file).
Hardware devices (i.e. hardware device registers). These don't usually concern us in user space.
Suppose that we have a 4 GB virtual address space, split into 4 KB pages, giving us 1048576 virtual pages. Some of these will be mapped by the kernel; others will not. When the process starts (i.e. when main() is invoked), the virtual address space will contain, amongst other things:
Program code. These pages are usually readable and executable.
Program data (i.e. for initialised variables). This usually has some read-only pages and some read-write pages.
Code and data from libraries that the program depends on.
Some pages for the stack.
These things are all mapped as pages in the 4 GB address space. You can see what's mapped by looking at /proc/(pid)/maps, as one of the comments has pointed out. The precise contents and location of these pages depend on (a) the program in question, and (b) address space layout randomisation (ASLR), which makes locations of things harder to guess, thereby making certain security exploitation techniques more difficult.
You can access any particular location in memory by defining a pointer and dereferencing it:
*(unsigned char *)0x12345678
If this happens to point to a mapped page, and that page is readable, then the access will succeed and yield whatever's mapped at that address. If not, then you'll receive a SIGSEGV from the kernel. You could handle that (which is useful in some cases, such as JIT compilers), but normally you don't, and the process will be terminated. As noted above, due to ASLR, if you do this in a program and run the program several times then you'll get non-deterministic results for some addresses.

There is usually quite a bit of accessible memory below the stack pointer, because that memory is used when you grow the stack normally. The stack itself is only controlled by the value of the stack pointer - it is a software entity, not a hardware entity.
However, system code may assume typical stack usage. I. e., on some systems, the stack is used to store state for a context switch, while a signal handler runs, etc. This also depends on whether the hardware automatically switches stack pointers when leaving user mode. If the system does use your stack for this, it will clobber the data you stored there, and that can really happen at every point in your program.
So it is not safe to manipulate stack memory below the stack pointer. It's not even safe to assume that a value that has successfully been written will still be the same in the next line code. Only the portion above the stack pointer is guaranteed not to be touched by the runtime/kernel.
It goes without saying, that this code invokes undefined behavior. The pointer arithmetic is invalid, because the address &x-1024*127 is not allocated to the variable x, so that dereferencing this pointer invokes undefined behavior.

This is undefined behavior in C. You're accessing a random memory address which, depending on the platform, may or may not be on the stack. It may or may not be in memory this user can access; if not you will get a segfault or similar. There are absolutely no promises either way.
Actually, it's not undefined behaviour, it's pretty well defined. Accessing memory locations through pointers is and was always defined since C is as close to the hardware as it can be.
I however agree that accessing hardware through pointers when you don't know exactly what you're doing is a dangerous thing to do.
Don't Do That. (If you're one of the five or six people who has a legitimate reason to do this, you already know it and don't need our advice.)
It would be a poor world with only five or six people legitimately programming operating systems, embedded devices and drivers (although it sometimes appears as if the latter is the case...).

This is undefined behavior in C. You're accessing a random memory address which, depending on the platform, may or may not be on the stack. It may or may not be in memory this user can access; if not you will get a segfault or similar. There are absolutely no promises either way.
Don't Do That. (If you're one of the five or six people who has a legitimate reason to do this, you already know it and don't need our advice.)

Related

why am I not gettting Segmentation error?

I have
x=(int *)malloc(sizeof(int)*(1));
but still I am able to read x[20] or x[4].
How am I able to access those values? Shouldn't I be getting segmentation error while accessing those memory?
The basic premise is that of Sourav Ghosh's answer: accessing memory returned from malloc beyond the size you asked for is undefined behavior, so a conforming implementation is allowed to do pretty much anything, including happily returning bizarre values.
But given a "normal" implementation on mainstream operating systems on "normal" machines (gcc/MSVC/clang, Linux/Windows/macOS, x86/ARM) why do you sometimes get segmentation faults (or access violations), and sometimes not?
Pretty much every "regular" C implementation doesn't perform any kind of memory check when reading/writing through pointers1; these loads/stores get generally translated straight to the corresponding machine code, which accesses the memory at a given location without much regard for the size of the "abstract C machine" objects.
However, on these machines the CPU doesn't straight access the physical memory (RAM) of the PC, but a translation layer (MMU) is introduced2; whenever your program tries to access an address, the MMU checks to see whether anything has been mapped there, and if your process has permissions to write over there. In case any of those checks fail3, you get a segmentation fault and your process gets killed. This is why uninitialized and NULL pointer values generally give nice segfaults: some memory at the beginning of the virtual address space is reserved unmapped just to spot NULL dereferences, and in general if you throw a dart at random into a 32 bit address space (or even better, a 64 bit one) you are most likely to find zones of memory that have never been mapped to anything.
As good as it is, the MMU cannot catch all your memory errors for several reasons.
First of all, the granularity of memory mappings is quite coarse compared to most "run of the mill" allocations; on PCs memory pages (the smallest unit of memory that can be mapped and have protection attributes) are generally 4 KB in size. There is of course a tradeoff here: very small pages would require a lot of memory themselves (as there's a target physical address plus protection attributes associated to each page, and those have to be stored somewhere) and slow down the MMU operation3. So, if you access memory out of "logical" boundaries but still within the same memory page, the MMU cannot help you: as far as the hardware is concerned, you are still accessing valid memory.
Besides, even if you go outside of the last page of your allocation, it may be that the page that follows is "valid" as far as the hardware is concerned; indeed, this is pretty common for memory you get from the so-called heap (malloc & friends).
This comes from the fact that malloc, for smaller allocations, doesn't ask the OS for "new" blocks of memory (which in theory may be allocated keeping a guard page at both ends); instead, the allocator in the C runtime asks the OS for memory in big sequential chunks, and logically partitions them in smaller zones (usually kept in linked lists of some kind), which are handed out on malloc and returned back by free.
Now, when in your program you step outside the boundaries of the requested memory, you probably don't get any error as:
the memory chunk you are using isn't near a page boundary, so your out-of-bounds read doesn't trigger an access violation;
even if it was at the end of a page, the page that follows is still mapped, as it still belongs to the heap; it may either be memory that has been given to some other code of your process (so you are reading data of some unrelated part of your code), or a free memory zone (so you are reading whatever garbage happened to be left by the previous owner of the block when it freed it), or a zone used by the allocator to keep its bookkeping data (so you are reading parts of such data).
In all these cases except for the "free block" one, even if you were to write there you wouldn't get a segmentation fault, but you could corrupt unrelated data or the data structures of the heap (which generally results in crashes later, as the allocator finds inconsistencies in its data).
Notes
Although modern compilers provide special instrumented builds to trap some of these errors; gcc and clang, in particular, provide the so-called "address sanitizer".
This allows to introduce transparent paging (swapping out to disk memory zones that aren't actively used in case of low physical memory availability) and, most importantly, memory protection and address space separation (when a user-mode process is running, it "sees" a full virtual address space containing only his stuff, and nothing from the other processes or the kernel).
And it's not a failure put there on purpose by the operating system to be notified that the processes is trying to access memory that has been swapped out.
Given that each access to memory needs to go through the MMU, the mapping must be very fast, so the most used page mappings are kept in a cache; if you make the pages very small and the cache can hold just as many entries, you effectively have a smaller memory range covered by the cache.
No, accessing invalid memory is undefined behavior, and segmantation fault is one of the many side effects of UB. It is not guaranteed.
That said,
Always check for the success of the malloc() by checking the returned pointer against NULL before using the returned pointer.
Please see this: Do I cast the result of malloc?

Memcpy() works on out of bounds memory?

I have been playing around with the idea that memcpy() could be used for malevolent purposes. I made several test applications to see if I could "steal" data in memory from different regions. I have tested three regions thus far, heap, stack, and constant( read only ) memory. The constant memory was the only one to crash in my tests, provoking an error from MinGW.
Here is an example to illustrate my latest test :
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void removeTerminatingCharacters( char ** string, const int length )
{
int i = 0;
for ( ; i < length; ++i )
if ( !( *string )[i] )
( *string )[i] = '0';
return;
}
int main()
{
int * naive = malloc( sizeof( int ) );
*naive = 0;
char * stolenData = malloc( 2000 );
memset( stolenData, 0, 2000 );
memcpy( stolenData, naive, 1999 );
removeTerminatingCharacters( &stolenData, 2000 );
printf( "%s\n", stolenData );
free( stolenData );
return 0;
}
Output :
0000-0:0Væ1lDk¦#:00000[æ0`Dk00p,:0-0:0MAIN=Computer0USERNAME=JohnDoe0USERPRO
FILE=C:\Users\JohnDoe0WATCOM=C:\watcom0windir=C:\Windows00?æ1+Ik?000S?000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00??????????????????????????000000 00000000 000000?0?0?
00000000000 0 0 ?0000000000 0000000000 0000 00000???????????????????????0???????
0 00000000000000000000000000000000000000000000000
000000000000000000abcdefghijklmnopqrstuvwxyz000000ABCDEFGHIJKLMNOPQRSTUVWXYZ0000
0000â000000Ü0£0P00000000000è0î0Ä 0000000000¬0000000000¦0000¦00000aßGpSsµtFTOd8fe
ä?:0ú?:0-?:0T?:0)?:0`?:0£?:0-?:0p?:0?:0??:0+?:08?:0T?:0¦?:0µ?:0²?:0¶?:03?:0D?:0R
?:0ä :0- :0+ :0a :0v :0?!:0^!:0q!:0ë!:0ñ!:0+!:0±!:0¤":0P":0g":0ó":0¦":0+":0¦":0?
#:0.#:0B#:0W#:0x#:0ë#:00000ALLUSERSPROFILE=C:\ProgramData0APPDATA=C:\Users\Chris
topher\AppData\Roaming0asl.log=Destination=file0CLASSPATH=.;C:\Program Files (x8
6)\Java\jre6\lib\ext\QTJava.zip0CommonProgramFiles=C:\Program Files (x86)\Common
Files0CommonProgramFiles(x86)=C:\Program Files (x86)\Common Files0CommonProgram
W6432=C:\Program Files\Common Files0COMPUTERNAME=COMPUTER0ComSpec=C:\Windows\sys
tem32\cmd.exe0FPPUILang=en-US0FP_NO_HOST_CHECK=NO0HOMEDRIVE=C:0HOMEPATH=\Users\C
hristopher0HuluDesktopPath=C:\Users\JohnDoe\AppData\Local\HuluDesktop\instan
ces\0.9.13.1\HuluDesktop.exe0LOCALAPPDATA=C:\Users\JohnDoe\AppData\Local0LOG
ONSERVER=\\COMPUTER0NUMBER_OF_PROCESSORS=20OnlineServices=Online Services0OOBEUI
Lang=en-US0OS=Windows_NT0Path=.;F:\CodeBlocks\MinGW\bin;F:\CodeBlocks\MinGW;C:\M
inGW\bin;C:\MinGW;C:\Windows\System32;C:\Windows;C:\Windows\System32\wbem;C:\Pro
gram Files\Common Files\Microsoft Shared\Windows Live;C:\Program Files (x86)\Com
mon Files\Microsoft Shared\Windows Live;C:\Windows\System32\WindowsPowerShell\v1
.0;c:\Program Files (x86)\ATI Technologies\ATI.ACE\Core-Static;c:\Program Files
(x86)\Common F0Uô1m+k
My code isn't pretty, but it demonstrates my point. As you can see, the data is mostly garbage values, but there are a few interesting strings thrown in from the heap.
My primary question is why doesn't this action cause a memory access violation error?
Memory access violation errors are caught by virtual memory hardware, when you access an unmapped page. Not every address which is out of bounds is in an unmapped page. Pages are generally of equal sizes. A typical page size if 4096 bytes. (Page sizes are hardware-specific: some chips have memory management that allows for programmable page sizes, and even mixtures of different page sizes for different areas of memory.) Sometimes only part of a page contains valid data. It's not possible for just a fragment of a page to be unmapped, so the part which contains garbage is also mapped. Also, memory managers like malloc do not always give memory back to the operating system; they keep free areas for re-use. Those areas are valid memory (mapped pages). Also, making up pointer values, you can, by fluke, simply end up going beyond out-of-bounds, and can "land" in memory that corresponds to a valid object.
This is just how it works on your PC, with hits virtual memory OS. Virtual memory is not ubiquitous. On computers without virtual memory (nowadays, small embedded systems), you can access any location in memory. However, accessing certain areas may have side effects that change the state of hardware (namely I/O registers). Some address ranges may trigger a "bus error" type CPU exception because no hardware exists for that range, and so the access request times out. Other than that, as far as valid program memory goes, it is not protected from out-of-bounds accesses.
Operating systems without memory protection were used for early interactive systems in the 1960's. That history then repeated itself when personal microcomputers appeared: again, they had operating systems without memory protection, due to small memories and unsophisticated CPU's. On these operating systems, applications often stomped over each other's memory spaces, leading to frequent crashes. (Imagine that with your memcpy, you not only copy your own out-of-bounds area, such as some malloc block which was previously freed, but also an area from a completely different running program.) Users sometimes spotted patterns, like when certain programs were loaded into memory in certain orders, there were fewer problems, or so-called "conflicts" between applications.
Yes, it does can be used to malevolent purposes, but not in the way you may be thinking.
The memory your application read and write is your process' virtual memory, it is not the physical memory. You can't even know in what address of the physical memory yours or any application really is, only the system kernel knows that.
You cannot interact with active memory from other processes without proper permission and use of syscalls aswell.
You can howover, read the garbage left in unmapped memory areas that were once used by other processes, and you may even stumble in sensitive information left there, like passwords, certificate keys or personal stuff the user typed at some point, but you are on highly volatile grounds, the information there is most likely corrupted and there is no easy way to seek specific pieces of information.
Here is an article about that: https://security.stackexchange.com/questions/29019/are-passwords-stored-in-memory-safe
You must not try to access out of bounds memory.
Accessing memory you must not access results in Undefined Behavior.
Anything may happen.
You may be able to "steal" the memory. But that's not something you should rely on and write your code based on that.
memcpy() doesn't do any bounds check, the programmer has to take care of it. In your case, it causes a heap overflow.
This pretty much sums up the consequences of a heap overflow:
https://www.owasp.org/index.php/Heap_overflow
Description
A heap overflow condition is a buffer overflow, where the buffer that
can be overwritten is allocated in the heap portion of memory,
generally meaning that the buffer was allocated using a routine such
as the POSIX malloc() call.
Consequences
Availability: Buffer overflows generally lead to crashes. Other attacks leading to lack of availability are possible, including
putting the program into an infinite loop.
Access control (memory and instruction processing): Buffer overflows often can be used to execute arbitrary code, which is
usually outside the scope of a program's implicit security policy.
Other: When the consequence is arbitrary code execution, this can often be used to subvert any other security service.
Avoidance and mitigation
Pre-design: Use a language or compiler that performs automatic bounds checking.
Design: Use an abstraction library to abstract away risky APIs. Not a complete solution.
Pre-design through Build: Canary style bounds checking, library changes which ensure the validity of chunk data, and other such fixes
are possible, but should not be relied upon.
Operational: Use OS-level preventative functionality. Not a complete solution.
Discussion
Heap overflows are usually just as dangerous as stack overflows.
Besides important user data, heap overflows can be used to overwrite
function pointers that may be living in memory, pointing it to the
attacker's code.
Even in applications that do not explicitly use
function pointers, the run-time will usually leave many in memory. For
example, object methods in C++ are generally implemented using
function pointers. Even in C programs, there is often a global offset
table used by the underlying runtime.
To answer your question of "why doesn't this action cause a memory access violation error?", the answer is that it's because the memory in the area for the naive allocation has been allocated to the process. Probably it's memory that the C runtime has prepared for further possible malloc() requests.
On most desktop systems, the smallest unit of memory protection is a page, which can vary in size but will often be in the range of 4KB or so. You might well have run into a memory access violation if your read request were larger (but still maybe not - the heap might be much larger than your small allocation). Then again, your allocation could come near then end of a page and hit an access error right away (debug allocators will sometimes do this purposefully to help catch errors).
memcpy can't realistically be used for "malevolent" purposes. The only memory that memcpy can access is the memory belonging to your own process (which you own, anyway). If you screw up your own memory, your program takes the fall, but the OS will protect all other processes (and you cannot touch their memory spaces without mmap or similar).
Most of those "interesting strings" are environment variables set by the OS and passed to your program.

Declare a pointer to an integer at address 0x200 in memory

I have a couple of doubts, I remember some where that it is not possible for me to manually put a variable in a particular location in memory, but then I came across this code
#include<stdio.h>
void main()
{
int *x;
x=0x200;
printf("Number is %lu",x); // Checkpoint1
scanf("%d",x);
printf("%d",*x);
}
Is it that we can not put it in a particular location, or we should not put it in a particular location since we will not know if it's a valid location or not?
Also, in this code, till the first checkopoint, I get output to be 512.
And then after that Seg Fault.
Can someone explain why? Is 0x200 not a valid memory location?
In the general case - the behavior you will get is undefined - everything can happen.
In linux for example, the first 1GB is reserved for kernel, so if you try to access it - you will get a seg fault because you are trying to access a kernel memory in user mode.
No idea how it works in windows.
Reference for linux claim:
Currently the 32 bit x86 architecture is the most popular type of
computer. In this architecture, traditionally the Linux kernel has
split the 4GB of virtual memory address space into 3GB for user
programs and 1GB for the kernel.
Adding to what #amit wrote:
In windows it is the same. In general it is the same for all protected-mode operating systems. Since DOS etc. are no longer around it is the same with all systems except kernel-mode (km-drivers) and embedded systems.
The operating system manages which memory-pages you are allowed to write to and places markers that will make the cpu automatically raise access-violations if some other page is written to.
Up until the "checkpoint", you haven't accessed memory location 0x200, so everything works fine.
There I'd a local variable x in the function main. It is of type "pointer to int". x is assigned the value 0x200, and then that value is printed. But the target of x hasn't been accessed, so up to this point it doesn't matter whether x holds a valid memory address or not.
Then scanf tries to write to the memory address you passed in, which is the 0x200 stored in x. Then you get a seg fault, which is certainly sac possible result of trying to write to an arbitrary memory address.
So what are your doubts? What makes you think that this might work, when you come across this code that clearly doesn't?
Writing to a particular memory address might work under certain conditions, but is extremely unlikely to in general. Under all modern OSes, normal programs do not have control over their memory layout. The OS decides where initial things like the program's code, stack, and globals go. The OS will probably also be using some memory space, and it is not required to tell you what it's using. Instead you ask for memory (either by making variables or by calling memory allocation routines), and you use that.
So writing to particular addresses is very very likely to get either memory that hasn't been allocated, or memory that is being used for some other purpose. Neither of those is good, even if you do manage to hit an address that is actually writable. What if you clobber sundry some piece of data used by one of your program's other variables? Or some other part of your program clobbers the value you just wrote?
You should never be choosing a particular hard-coded memory address, you should be using an address of something you know is a variable, or an address you got from something like malloc.

How Process Size is determined? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I am very new to these concepts but I want to ask you all a question that is very basic I think, but I am confused, So I am asking it.
The question is...
How is the size of a process determined by the OS?
Let me clear it first, suppose that I have written a C program and I want to know that how much memory it is going to take, how can I determine it? secondly I know that there are many sections like code section, data section, BSS of a process. Now does the size of these are predetermined? secondly how the size of Stack and heap are determined. does the size of stack and heap also matters while the Total size of process is calculated.
Again we say that when we load the program , an address space is given to the process ( that is done by base and limit register and controlled by MMU, I guess) and when the process tries to access a memory location that is not in its address space we get segmentation fault. How is it possible for a process to access a memory that is not in its address space. According to my understanding when some buffer overflows happens then the address gets corrupted. Now when the process wants to access the corrupted location then we get the segmentation fault. Is there any other way of Address violation.
and thirdly why the stack grows downward and heap upwards.Is this process is same with all the OS. How does it affects the performance.why can't we have it in other way?
Please correct me, if I am wrong in any of the statement.
Thanks
Sohrab
When a process is started it gets his own virtual address space. The size of the virtual address space depends on your operating system. In general 32bit processes get 4 GiB (4 giga binary) addresses and 64bit processes get 18 EiB (18 exa binary) addresses.
You cannot in any way access anything that is not mapped into your virtual address space as by definition anything that is not mapped there does not have an address for you. You may try to access areas of your virtual address space that are currently not mapped to anything, in which case you get a segfault exception.
Not all of the address space is mapped to something at any given time. Also not all of it may be mapped at all (how much of it may be mapped depends on the processor and the operating system). On current generation intel processors up to 256 TiB of your address space may be mapped. Note that operating systems can limit that further. For example for 32 bit processes (having up to 4 GiB addresses) Windows by default reserves 2 GiB for the system and 2 GiB for the application (but there's a way to make it 1 GiB for the system and 3 GiB for the application).
How much of the address space is being used and how much is mapped changes while the application runs. Operating system specific tools will let you monitor what the currently allocated memory and virtual address space is for an application that is running.
Code section, data section, BSS etc. are terms that refer to different areas of the executable file created by the linker. In general code is separate from static immutable data which is separate from statically allocated but mutable data. Stack and heap are separate from all of the above. Their size is computed by the compiler and the linker. Note that each binary file has his own sections, so any dynamically linked libraries will be mapped in the address space separately each with it's own sections mapped somewhere. Heap and stack, however, are not part of the binary image, there generally is just one stack per process and one heap.
The size of the stack (at least the initial stack) is generally fixed. Compilers and/or linkers generally have some flags you can use to set the size of the stack that you want at runtime. Stacks generally "grow backward" because that's how the processor stack instructions work. Having stacks grow in one direction and the rest grow in the other makes it easier to organize memory in situations where you want both to be unbounded but do not know how much each can grow.
Heap, in general, refers to anything that is not pre-allocated when the process starts. At the lowest level there are several logical operations that relate to heap management (not all are implemented as I describe here in all operating systems).
While the address space is fixed, some OSs keep track of which parts of it are currently reclaimed by the process. Even if this is not the case, the process itself needs to keep track of it. So the lowest level operation is to actually decide that a certain region of the address space is going to be used.
The second low level operation is to instruct the OS to map that region to something. This in general can be
some memory that is not swappable
memory that is swappable and mapped to the system swap file
memory that is swappable and mapped to some other file
memory that is swappable and mapped to some other file in read only mode
the same mapping that another virtual address region is mapped to
the same mapping that another virtual address region is mapped to, but in read only mode
the same mapping that another virtual address region is mapped to, but in copy on write mode with the copied data mapped to the default swap file
There may be other combinations I forgot, but those are the main ones.
Of course the total space used really depends on how you define it. RAM currently used is different than address space currently mapped. But as I wrote above, operating system dependent tools should let you find out what is currently happening.
The sections are predetermined by the executable file.
Besides that one, there may be those of any dynamically linked libraries. While the code and constant data of a DLL is supposed to be shared across multiple processes using it and not be counted more than once, its process-specific non-constant data should be accounted for in every process.
Besides, there can be dynamically allocated memory in the process.
Further, if there are multiple threads in the process, each of them will have its own stack.
What's more, there are going to be per-thread, per-process and per-library data structures in the process itself and in the kernel on its behalf (thread-local storage, command line params, handles to various resources, structures for those resources as well and so on and so forth).
It's difficult to calculate the full process size exactly without knowing how everything is implemented. You might get a reasonable estimate, though.
W.r.t. According to my understanding when some buffer overflows happens then the address gets corrupted. It's not necessarily true. First of all, the address of what? It depends on what happens to be in the memory near the buffer. If there's an address, it can get overwritten during a buffer overflow. But if there's another buffer nearby that contains a picture of you, the pixels of the picture can get overwritten.
You can get segmentation or page faults when trying to access memory for which you don't have necessary permissions (e.g. the kernel portion that's mapped or otherwise present in the process address space). Or it can be a read-only location. Or the location can have no mapping to the physical memory.
It's hard to tell how the location and layout of the stack and heap are going to affect performance without knowing the performance of what we're talking about. You can speculate, but the speculations can turn out to be wrong.
Btw, you should really consider asking separate questions on SO for separate issues.
"How is it possible for a process to access a memory that is not in its address space?"
Given memory protection it's impossible. But it might be attempted. Consider random pointers or access beyond buffers. If you increment any pointer long enough, it almost certainly wanders into an unmapped address range. Simple example:
char *p = "some string";
while (*p++ != 256) /* Always true. Keeps incrementing p until segfault. */
;
Simple errors like this are not unheard of, to make an understatement.
I can answer to questions #2 and #3.
Answer #2
When in C you use pointers you are really using a numerical value that is interpreted as address to memory (logical address on modern OS, see footnotes). You can modify this address at your will. If the value points to an address that is not in your address space you have your segmentation fault.
Consider for instance this scenario: your OS gives to your process the address range from 0x01000 to 0x09000. Then
int * ptr = 0x01000;
printf("%d", ptr[0]); // * prints 4 bytes (sizeof(int) bytes) of your address space
int * ptr = 0x09100;
printf("%d", ptr[0]); // * You are accessing out of your space: segfault
Mostly the causes of segfault, as you pointed out, are the use of pointers to NULL (that is mostly 0x00 address, but implementation dependent) or the use of corrupted addresses.
Note that, on linux i386, base and limit register are not used as you may think. They are not per-process limits but they point to two kind of segments: user space or kernel space.
Answer #3
The stack growth is hardware dependent and not OS dependent. On i386 assembly instruction like push and pop make the stack grow downwards with regard to stack related registers. For instance the stack pointer automatically decreases when you do a push, and increases when you do a pop. OS cannot deal with it.
Footnotes
In a modern OS, a process uses the so called logic address. This address is mapped with physical address by the OS. To have a note of this compile yourself this simply program:
#include <stdio.h>
int main()
{
int a = 10;
printf("%p\n", &a);
return 0;
}
If you run this program multiple times (even simultaneously) you would see, even for different instances, the same address printed out. Of course this is not the real memory address, but it is a logical address that will be mapped to physical address when needed.

Is unused memory in address space protected

Is the unused memory in address space of a process protected by just having read permission, so that writing to a location pointed by an unitialized pointer for example always cause a page fault to be trapped by the OS? Or is it not the case, and every memory location besides the code (which ofcourse is given read only access), is given write access?
I'm asking this because my friend was showing me his code where he didn't initialize a pointer and wrote in the memory pointed by it, but still his program wasn't crashing with mingw gcc compiler for windows but always crashing with visual c++, in mac or linux.
What I think is that the OS do not protect memory for unused areas and the crashing was being caused because in the code generated by the mingw, the random pointer value was pointing to some used area such as stack, heap or code, while in other cases it was pointing to some free area. But if the OS really doesn't protect the unused areas, wouldn't these sort of bugs, such as uninitialized pointers be difficult to debug?
I guess this is why it is advised to always assign NULL to a pointer after calling delete or free, so that when something is accessed with it, it really causes a visible crash.
Uninitialized pointers don't necessarily to point to unused address space. They could very well be values that happen to point to writeable memory. Such as a pointer on the stack that happened to be where a previously executed function stored a valid address.
In a typical, current server/desktop OS (and quite a few smaller systems such as cell phones as well) you have virtual memory. This means the OS builds a table that translates from the virtual address your code uses, to a physical address that specifies the actual memory being addressed. This mapping is normally done in "pages" -- e.g., a 4KB chunk of memory at a time.
At least in the usual case, parts of the address space that aren't in use at all simply won't be mapped at all -- i.e., the OS won't build an entry in the table for that part of the address space. Note, however, that memory that is allocated will (of necessity) be rounded to a multiple of the page size, so each chunk of memory that's in use will often be followed by some small amount that's not really in use, but still allocated and "usable". Since protection is also (normally) done on a per-page basis, if the rest of that page is (say) Read-only, the remainder at the tail end will be the same.
It depends on the implementation of the OS. In some configurations, for example, ExecShield will protect most of the memory that goes out of the bounds of the program, and also it is common that the first few bytes of the data segment to be protected (to signal access with NULL pointers), but it may be the case that the pointer actually points to a valid, arbitrary, memory address within the program.
Memory protection is not provided by c/c++. You may find that the pointer just happens to contain a pointer to valid memory, e.g. a previous function has a ptr variable on the stack and another function called later just happens to use the same stack space for a pointer.
The following code will print "Hello" if compiled and ran with gcc:
#include
char buffer[10];
void function1(void) {
char * ptr = buffer;
sprintf(buffer, "Hello");
return;
}
void function2(void) {
char * ptr2;
printf("%s\n", ptr2);
}
int main(int argc, char * argv[]) {
function1();
function2();
}
For debug builds some compilers (I know that Visual Studio used to do this) will secretly initialise all variables like ptr2 to a bad value to detect these kinds of error.
With C normally you find out that memory has been abused by the OS killing your program.
Simply, I assume the answer is "No, unused memory in address is not space protected." C isn't sophisticated enough to handle such instances.

Resources