Why are virtual addresses so big in my C program?

I recently learned about virtual memory and paging, and that compilers only generate virtual addresses starting at 1 and simply counting upwards. I thought I'd test this and wrote the short C program below, which defines a global variable and prints its address. I expected a very small value, since the CPU only sees the virtual addresses, but instead I get 4247584. What is going on here? Are my assumptions wrong? And if possible, what would be a program that shows virtual addresses being generated from 1 up?
My program:
#include <stdio.h>
int x = 0;
int main(void){
printf("%p\n", (void *)&x); // %p expects a void *; passing a pointer to %d is undefined behavior
return 0;
}
(I'm using gcc 4.8.1 on Windows 10)

The actual value of a virtual address is largely irrelevant (precisely because it's virtual). Nothing is "wasted" when addresses don't start at 0. The only requirement on address values is that the program, its data and all its associated shared libraries actually fit into the address space.
For security reasons, however, it makes sense to place the various code and data areas of a process in virtual address space in a way a potential attacker cannot reproduce (this makes code-injection attacks that rely on fixed addresses much harder). That is why modern operating systems choose the virtual addresses for a program randomly.
On some operating systems like Linux you may be able to switch off virtual address space layout randomization and thus make it reproducible. Addresses will most probably still not start at zero, because libraries and startup code will most likely occupy addresses lower than your own program.

Related

Will memory addresses be the same if I run a program in a VM from two different computers?

Fairly new to C, and I learned that addresses depend on a few things like the operating system and the CPU. I have a lab for one of my C courses that asks: if we run a program and print out the address of each variable, will they have the same address and value as another student's (exact same program)? They are local variables, stored on the stack. Normally I would say no, but all of us are required to ssh to our university's lab, and our programs are being run on the same machines with the same specs. This is where I'm confused: I'm pretty sure that the values will be the same, but I don't know what exactly determines these addresses. Here is a piece of code from the program:
int g2(int a, int b)
{
int c = g1(a + 3, b - 11);
printf("g2: %d %d %d \n", a,b,c);
printf("a's address is %p b's address is %p C's address is %p\n", (void *)&a, (void *)&b, (void *)&c);
return c - b;
}
For me a's address is 0x7ffe9bce4a0c. Also, I'm not just looking for a homework answer; I'm asking here because none of my teammates have sent me their addresses, which we were allowed to do. I have researched it but can't find an answer that matches this sort of situation. Any help is greatly appreciated, thank you!
"Will memory addresses be the same if I run a program in a VM from two different computers?"
No, they probably won't even be the same when running in the same environment on the same machine. There is no guarantee that a variable will get the same address.
A modern OS assigns memory arbitrarily (within certain regions, of course).
And there is a good reason for this: it protects against the exploitation of memory vulnerabilities that an attacker could use to harm the program or even the OS.
This technique is called Address Space Layout Randomization (ASLR).
The variables may happen to have the same address on several executions, but there is no guarantee that this will happen again, even on the very next run. In fact, if the OS supports ASLR, it is almost guaranteed that the addresses will differ.
The virtual machine should have no influence on that behavior. You may want to check the documentation for your particular virtual machine (and whether it supports ASLR), but it should follow the same rules.
Short answer: no.
The operating system loads the program at a different position every time.
The address that you see is not the actual address in physical memory. There is an abstraction layer of addresses supplied by the operating system. You can read about virtual memory addresses if you like; you will probably learn about them in a course on operating systems.
Whether you get the same address or varying addresses depends on the operating system.
Not too many years ago, if a program printed the address of one of the local variables in its function, that address would be the same every time the program was run, as long as the function was called in the same point in program execution with the same program input and other circumstances. (Which functions are called, including recursive calls, and how much stack space they use could be affected by program input and other factors.) This was true because, when the program was loaded and initialized, its stack was always started at the same memory address.
This behavior was exploited by malicious people—if there were bugs in the program, they might be exploited, and knowing which addresses were used in the program helps some exploits. So common operating systems have changed it. Now, when a program is started, the locations of its stack and other parts of its memory layout are adjusted randomly. This is called Address Space Layout Randomization (ASLR).
So, in common modern operating systems, you will get varying addresses from run to run when printing the address of a local variable. In specialized operating systems, such as for embedded devices, you may get the same address every time.
The title of your question asks about “a VM,” presumably for virtual machine, but this is not mentioned in the body of your question. To the extent that a virtual machine implements a machine properly, it should produce identical behavior. So whether a program is running in a virtual machine or not should be irrelevant to this question.

Access to a specific memory address

I'm a new C programmer, still learning the language itself.
Anyway -
I'm trying to access a specific memory address.
I've written this code:
#include <stdio.h>
int main()
{
int* p = (int*) 0x4e0f68;
*p = 12;
getchar();
}
When I try to access a specific memory address like that, the program crashes.
I don't know if this information is relevant, but I'm using Windows 7 and Linux Ubuntu.
(I've tried this code only on Windows 7).
Any explanations why the program crashes?
How can I access a specific memory address (an address known at compile time; I don't mean dynamic memory allocation)?
Thanks.
That's memory you don't own and accessing it is undefined behavior. Anything can happen, including crashing.
On most systems, you'd be able to inspect the memory (although technically still undefined behavior), but writing to it is a whole different story.
Strictly speaking, you cannot create a valid pointer like this. Valid pointers must point to valid objects (local or global variables, or memory obtained from malloc).
For most modern operating systems you have a virtual memory space that only your process can see. As you request more memory from the system (malloc, VirtualAlloc, mmap, etc) this virtual memory is mapped into real usable memory that you can safely read and write to. So you can't just take an arbitrary address and try to use it without OS cooperation.
An example for windows:
#include <windows.h>
#include <stdio.h>
int main(void)
{
SYSTEM_INFO sysinfo;
GetSystemInfo(&sysinfo);
unsigned pageSize = sysinfo.dwPageSize;
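printf("page size: %u\n", pageSize);
void* target = (void*)0x4e0f68;
printf("trying to allocate memory containing 0x%p...\n", target);
// MEM_COMMIT on its own fails unless the range is already reserved,
// so reserve and commit in one call
void* ptr = VirtualAlloc(target, pageSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
if (ptr) {
printf("got: 0x%p\n", ptr); // the base is rounded down, so ptr <= target
VirtualFree(ptr, 0, MEM_RELEASE); // give the memory back
} else {
printf("failed! OS won't let us use that address.\n");
}
return 0;
}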
Note that this will give you different results on different runs. Try it more than once.
Just to clarify one phrase the OP wrote: strictly speaking, no address associated with a program (code or data) is known at compile time. Programs are usually loaded at whatever address the OS decides. The final address a program uses (for example, to read a global variable) is patched into the program code itself by the OS loader, using some sort of relocation table. Calls to DLL functions use a similar mechanism: the .idata section of the executable is turned into a jump table whose entries are filled in with the actual addresses of the functions in the DLLs loaded in memory.
That said, it is indeed possible to know in advance where a variable will be placed, if the program is linked without relocation information. This is possible on Windows, where you can tell the linker to base the program at an absolute virtual address. The OS loader will try to load your program at that address, if possible.
However, this feature is not recommended, because it makes security holes easier to exploit. If an attacker discovers a security hole in a program and tries to inject code into it, the job is much easier if all of the program's variables and functions sit at known addresses, because the malicious code then knows exactly where to patch to gain control of the program.
What you're getting is a segfault: you're trying to access memory you don't have permission to access. Pointers, at least in user space, must point to some variable, object, function, etc. You can set a pointer to a variable's address with the & operator (int* somePtr = &variableToPointTo;) or copy another pointer (int* someNewPtr = somePtr;). In kernel mode (ring 0) or in OS development you can use raw addresses, BUT IT IS NOT ADVISED TO DO SO. In MS-DOS you could overwrite the operating system itself or hang the machine, because there was no memory protection at all.

Why would setting a variable to its own address give different results on different program runs?

Yesterday I came across this obfuscated C code implementing Conway's Game of Life. As a pseudorandom generator, it writes code to this effect:
int pseudoRand = (int) &pseudoRand;
According to the author's comments on the program:
This is a big number that should be different on each run, so it works nicely as a seed.
I am fairly confident that the behavior here is either implementation-defined or undefined. However, I'm not sure why this value would vary from run to run. My understanding of how most OSes work is that, due to virtual memory, the stack is initialized to the same virtual address each time the program is run, so the address should be the same each time.
Will this code actually produce different results across different runs on most operating systems? Is it OS-dependent? If so, why would the OS map the same program to different virtual addresses on each run?
Thanks!
While the assignment of addresses to objects with automatic storage is unspecified (and the conversion of an address to an integer is implementation-defined), what you're doing in your case is simply stealing the entropy the kernel assigned to the initial stack address as part of Address space layout randomization (ASLR). It's a bad idea to use this as a source of entropy which may leak out of your program, especially in applications interacting over a network with untrusted, possibly malicious remote hosts, since you're essentially revealing the random address base the kernel gave you to an attacker who might want to know it and thereby defeating the purpose of ASLR. (Even if you just use this as a seed, as long as the attacker knows the PRNG algorithm, they can reverse it to get the seed.)

Declare a pointer to an integer at address 0x200 in memory

I have a couple of doubts. I remember reading somewhere that it is not possible for me to manually put a variable at a particular location in memory, but then I came across this code:
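#include <stdio.h>
int main(void)
{
int *x;
x = (int *)0x200; // converting an integer to a pointer needs an explicit cast
printf("Number is %lu\n", (unsigned long)x); // Checkpoint1
scanf("%d", x);
printf("%d", *x);
return 0;
}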
Is it that we can not put it in a particular location, or we should not put it in a particular location since we will not know if it's a valid location or not?
Also, in this code, up to the first checkpoint I get the output 512.
After that, a seg fault.
Can someone explain why? Is 0x200 not a valid memory location?
In the general case - the behavior you will get is undefined - everything can happen.
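In Linux, for example, the lowest part of the address space is never mapped for ordinary processes (the vm.mmap_min_addr setting keeps it empty so that null-pointer dereferences fault), and 0x200 falls inside it, so accessing it gives you a seg fault. On 32-bit x86 the kernel also reserves the top 1GB of the address space, which user-mode code cannot access either.
No idea how it works in Windows.
Reference for the Linux 3GB/1GB claim: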
Currently the 32 bit x86 architecture is the most popular type of computer. In this architecture, traditionally the Linux kernel has split the 4GB of virtual memory address space into 3GB for user programs and 1GB for the kernel.
Adding to what @amit wrote:
In Windows it is the same. In general it is the same for all protected-mode operating systems. Since DOS and the like are no longer around, this applies everywhere except kernel-mode code (drivers) and embedded systems.
The operating system manages which memory pages you are allowed to write to, and marks them so that the CPU automatically raises an access violation if any other page is written to.
Up until the "checkpoint", you haven't accessed memory location 0x200, so everything works fine.
There is a local variable x in the function main. It is of type "pointer to int". x is assigned the value 0x200, and then that value is printed. But the target of x hasn't been accessed, so up to this point it doesn't matter whether x holds a valid memory address or not.
Then scanf tries to write to the memory address you passed in, which is the 0x200 stored in x. Then you get a seg fault, which is certainly a possible result of trying to write to an arbitrary memory address.
So what are your doubts? What makes you think that this might work, when you come across this code that clearly doesn't?
Writing to a particular memory address might work under certain conditions, but is extremely unlikely to work in general. Under all modern OSes, normal programs do not have control over their memory layout. The OS decides where initial things like the program's code, stack, and globals go. The OS will probably also be using some memory space, and it is not required to tell you what it's using. Instead you ask for memory (either by making variables or by calling memory allocation routines), and you use that.
So writing to particular addresses is very, very likely to hit either memory that hasn't been allocated, or memory that is being used for some other purpose. Neither of those is good, even if you do manage to hit an address that is actually writable. What if you clobber some piece of data used by one of your program's other variables? Or some other part of your program clobbers the value you just wrote?
You should never be choosing a particular hard-coded memory address, you should be using an address of something you know is a variable, or an address you got from something like malloc.

Strange memory allocation code in C, how it works?

How does this code work???
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *addr = (int*) 0x4888d0;
*addr = 30;
printf("%i %p\n", *addr, (void *)addr);
return 0;
}
It works by assuming 0x4888d0 is the address of a writable block of memory of at least sizeof(int) bytes that does not interfere with the functionality of printf or the C runtime system.
Or rather, it doesn't work, at least not on my system (Segmentation fault).
There is nothing strange in it; however, it is quite dangerous. What this program tries to do is write 30 to a specific location, i.e., the location at address 0x4888d0.
Why this code is written like this and why this particular address, well this is anybody's guess.
int *addr = (int*) 0x4888d0; sets addr to the address 0x4888d0. That might be a valid address, but there is no guarantee that it will always work.
As stated above, in normal applications this tends to segfault or corrupt memory. However,
modern computers tend to have reserved memory addresses that do magic things, like controlling I/O, setting CPU modes, updating memory maps, etc. Pages containing such addresses are "real" (physical) addresses that are not mapped into the virtual memory regular applications get. This is where the kernel communicates with hardware controllers. The fact that the provided fragment pokes a memory location and then promptly reads it back is typical of asking a controller for some kind of status and then reading the status back (any write to a magic word can update the status the controller makes available to the software; the value written may not matter).
So, if this code is from kernel space, or if this is in some micro-controller, or other strange system, the magic memory addresses could be available. Another possibility is that a privileged application has requested special virtual memory mapping from the kernel that can also expose magic pages to it. This can get weird as while the application requested something be mapped to virtual memory location including 0x4888d0, the real memory page could be quite different (and unavailable to the application).
