Static and Dynamic Memory Addresses in C

printf("address of literal: %p \n", "abc");
char alpha[] = "abcdef";
printf("address of alpha: %p \n", alpha);
Above, the literal is stored in static memory and alpha is stored in dynamic memory. I read in a book that some compilers show these two addresses using a different number of bits (I only tried using gcc on Linux, and it does show a different number of bits). Does it depend on the compiler, or on the operating system and hardware?

I only tried using gcc on Linux, and it does show a different number of bits
It's not that it "uses a different number of bits". As far as I know, Linux – at least when running the major platforms I know of (e.g. x86, x64, ARM32) – doesn't have "near" and "far" pointers. For instance, on x86, every pointer is 32 bits wide and on x64, every pointer is 64 bits wide.
It's just that…
the compiler probably allocates the alpha array on the stack (which it is allowed to do, since alpha has automatic storage duration; it is almost certainly not stored in "dynamic memory", because that would involve a superfluous dynamic allocation, which is one of the slowest things you can do with memory);
Meanwhile the literals themselves, having static storage duration, are stored elsewhere (usually in the data segment of the executable);
and on top of that, the OS's memory manager happens to place these two things (the stack and the executable image) far apart, so one of them has addresses that start with a lot of zeroes, while addresses in the other one don't have that many leading zeroes.
Furthermore, the default behavior of %p in your libc implementation happens to not print leading zeroes.
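To see this concretely, here is a minimal, self-contained variant of the original snippet (with the casts to void * that %p strictly expects). The exact numbers will differ on your machine, but the two addresses will typically land in very different regions:
#include <stdio.h>

int main(void)
{
    char alpha[] = "abcdef";   /* automatic storage duration: typically on the stack */

    /* The string literal "abc" has static storage duration and usually lives
       in a read-only part of the executable image. */
    printf("address of literal: %p\n", (void *)"abc");
    printf("address of alpha:   %p\n", (void *)alpha);
    return 0;
}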

alpha is stored on the stack (or in another writable, run-time memory region). The literal is stored in a read-only section of the executable image, typically grouped with the code segment. These are different address ranges.
The addresses are platform dependent. The pointer size is fixed (e.g. 4 bytes on a 32-bit platform), but the addresses for the different segments lie in different ranges.
The linker is responsible for the address assignment. You may want to enable an option to let the linker produce an address map file.
The dynamic, writable parts are also called data segments; the static, read-only parts are code segments. You will find a lot of literature by searching for these terms for your platform (e.g. search for "x86 memory segmentation").

Related

Difference between variables' addresses

Why do variable addresses differ by a specific amount each time I run a program? For example, printf("%d %d\n", &a, &b); prints "1000 988" in one run, "924 912" in another, "1288 1276", and so on. Does the compiler occupy a set amount of memory after each variable declaration where nothing can be written? If so, what does that depend on?
Using some variables in a program of mine, the smallest difference between them was 12 bytes, and it reached up to 212; that was the only case where the difference was not a multiple of twelve (in the other cases it was 24, 36 or 48 bytes). Is there any reason behind that? Since my variables were of type int (occupying 4 bytes on my system), could the difference between my variable addresses be less than 12 (for example 4)? Do those address differences depend on the variable types? If so, in what way? Thank you in advance!
Most OSes today use address-space layout randomization in order to make it harder to write certain kinds of malware. (The kind that writes code to memory and then tries to get the program to hand over control to it; now has to guess what address to get the program to jump to.) As a result, variables won’t be at the same addresses every time you run a program.
Depending on the type of the variable, how it’s allocated and which OS and architecture you’re running on, the size and alignment of variables will vary. The compiler and runtime might or might not always put them on a four-, eight- or sixteen-byte boundary. For example, the x86_64 function-call ABI always starts a function’s stack frame on a sixteen-byte boundary, and some implementations of malloc() always return an address divisible by sixteen because that’s required to store vectors on some CPUs.
If you want to know what the compiler is doing, you can try compiling to assembly. On gcc or clang, you can do this with the -S flag.
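As a rough, hedged illustration (the spacing between locals is entirely up to the compiler, the optimization level and the ABI, and with ASLR the absolute values change between runs), you can print a few addresses and the gap between two of them yourself:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int a, b, c;

    /* The gaps are implementation details; the standard says nothing about
       how unrelated objects are laid out relative to each other. */
    printf("a at %p\n", (void *)&a);
    printf("b at %p\n", (void *)&b);
    printf("c at %p\n", (void *)&c);
    printf("gap between a and b: %lld bytes\n",
           (long long)((intptr_t)&a - (intptr_t)&b));
    return 0;
}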
If you're asking why the memory address for a variable differs between different runs of the executable, the answer is ASLR, which exists to make it harder to exploit security issues in code (see https://en.wikipedia.org/wiki/Address_space_layout_randomization).
If you disable ASLR you will get the same address for a given variable each time you run your executable.
See also Difference between gdb addresses and "real" addresses?
Your linker (and to some degree, your compiler) lays out the address space of your application. The linker typically builds a relocatable image based at some address (e.g., zero). Apparently, your loader is placing the relocatable image at different locations when it is run.
Does the compiler occupy a set amount of memory after each variable declaration where nothing can be written?
Typically no, unless the next variable needs to be aligned. Variables are normally aligned to addresses that are multiples of the variable's size.
It sounds like your compiler is allocating memory for something that you simply are not accounting for.
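A small sketch of how alignment padding creates gaps you might not be accounting for (the exact offsets depend on the ABI, but on common 32- and 64-bit platforms this prints offsets 0 and 4 and a total size of 8):
#include <stdio.h>
#include <stddef.h>

struct example {
    char c;   /* 1 byte */
    int  i;   /* typically 4 bytes, aligned to a 4-byte boundary */
};

int main(void)
{
    /* Padding is inserted after c so that i starts at a suitably aligned offset. */
    printf("offset of c: %zu\n", offsetof(struct example, c));
    printf("offset of i: %zu\n", offsetof(struct example, i));
    printf("total size:  %zu\n", sizeof(struct example));
    return 0;
}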

char *c="1234". Address stored in c is always the same

This was a question asked by an interviewer:
#include <stdio.h>
int main()
{
    char *c = "123456";
    printf("%d\n", c);
    return 0;
}
This piece of code always prints a fixed number (e.g. 13451392), no matter how many times you execute it. Why?
Your code contains undefined behavior: printing a pointer needs to be done using %p format specifier, and only after converting it to void*:
printf("%p\n", (void*)c);
This would produce a system-dependent number, which may or may not be the same on different platforms.
The reason that it is fixed on your platform is probably that the operating system always loads your executable into the same spot of virtual memory (which may be mapped to different areas of physical memory, but your program would never know). String literal, which is part of the executable, would end up in the same spot as well, so the printout would be the same all the time.
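For completeness, a corrected version of the interview snippet might look like this; the value printed is system-dependent, and with ASLR enabled it can change between runs:
#include <stdio.h>

int main(void)
{
    char *c = "123456";

    /* %p with a cast to void * is the portable way to print a pointer. */
    printf("%p\n", (void *)c);
    return 0;
}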
To answer your question, the character string "123456" is a static constant in memory, and when the .exe is loaded, it always goes into the same memory location.
What c is (or rather what it contains) is the memory address of that character string which, as I said, is always at the same location. If you print the address as a decimal number, you see the address, in decimal.
Of course, as #dasblinkenlight said, you should print it as a pointer, because different machines/languages have different conventions about the size of pointers versus the size of ints.
Most executable file formats have a field that tells the OS loader at which virtual address to load the executable. For example, the PE format used by Windows has an ImageBase field for this, usually set to 0x00400000 for applications.
When the loader loads the executable, it first tries to place it at that address. If the address is free (which is usually the case), the image is loaded there; if it is already in use, the system picks a different address.
The case here is that the offset to your "123456" in the data section is the same, and the OS loads the image at the same base address, so you always get the same virtual address: base + offset.
But this is not always the case. For one, as noted above, the base address may already be in use; a lot of Windows DLLs compiled with MSVC set their base address to 0x10000000, so at most one of them can actually be loaded at that address.
Another case is address space layout randomization (ASLR), a security feature. If it is supported and enabled by the system (MSVC exposes it through the linker option /DYNAMICBASE), the system will ignore the specified image base and give you a different, random address on its own.
Two things to conclude:
You should not depend on this behavior; the system can load your program at any address, so you may get a different address on another run.
Use %p for printing addresses; on some systems int is 4 bytes and pointers are 8 bytes, so with %d part of your address will be chopped off, as the small check below illustrates.
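A quick way to check the size mismatch on your own system (a typical 64-bit platform prints 4 and 8, which is exactly why %d can silently truncate a pointer):
#include <stdio.h>

int main(void)
{
    printf("sizeof(int)    = %zu\n", sizeof(int));
    printf("sizeof(void *) = %zu\n", sizeof(void *));
    return 0;
}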

How do you know the exact address of a variable?

So I'm looking through my C programming text book and I see this code.
#include <stdio.h>
int j, k;
int *ptr;
int main(void)
{
    j = 1;
    k = 2;
    ptr = &k;
    printf("\n");
    printf("j has the value %d and is stored at %p\n", j, (void *)&j);
    printf("k has the value %d and is stored at %p\n", k, (void *)&k);
    printf("ptr has the value %p and is stored at %p\n", (void *)ptr, (void *)&ptr);
    printf("The value of the integer pointed to by ptr is %d\n", *ptr);
    return 0;
}
I ran it and the output was:
j has the value 1 and is stored at 0x4030e0
k has the value 2 and is stored at 0x403100
ptr has the value 0x403100 and is stored at 0x4030f0
The value of the integer pointed to by ptr is 2
My question is: if I had not run this through a compiler, how would you know the addresses of those variables just by looking at the code? I'm just not sure how to get the actual address of a variable. Thanks!
Here's my understanding of it:
The absolute addresses of things in memory in C are unspecified; they are not standardised by the language. Because of this, you can't know the locations of things in memory by looking at just the code. (However, if you use the same compiler, code, compiler options, runtime and operating system, the addresses may be consistent.)
When you're developing applications, this is not behaviour you should rely on. You may rely on the difference between the locations of two things in some contexts, however. For example, you can determine the difference between the addresses of pointers to two array elements to determine how many elements apart they are.
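For instance, a minimal sketch of that pointer-difference technique (the array and indices here are made up for illustration; the subtraction itself is well defined because both pointers point into the same array):
#include <stdio.h>
#include <stddef.h>

int main(void)
{
    int arr[10];
    int *p = &arr[2];
    int *q = &arr[7];

    ptrdiff_t distance = q - p;   /* number of elements between them: 5 */
    printf("elements apart: %td\n", distance);
    return 0;
}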
By the way, if you are considering using the memory locations of variables to solve a particular problem, you may find it helpful to post a separate question asking how to do so without relying on this behaviour.
There is no other way to "know the exact address" of a variable in Standard C than to print it with "%p". The actual address is determined by many factors not under control of the programmer writing code. It's a matter of OS, the linker, the compiler, options used and probably others.
That said, in the embedded systems world, there are ways to express this variable must reside at this address, for example if registers of external devices are mapped into the address space of a running program. This usually happens in what is called a linker file or map file or by assigning an integral value to a pointer (with a cast). All of these methods are non-standard.
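As a hedged sketch of the pointer-cast approach: the register name and address below are made up for illustration (real values come from your device's reference manual or linker script), and the access only makes sense on the target hardware where that address is actually mapped to a device; on a hosted OS it would simply fault:
#include <stdint.h>

/* Hypothetical memory-mapped status register at a made-up fixed address. */
#define STATUS_REG  (*(volatile uint32_t *)0x40021000u)

int main(void)
{
    uint32_t status = STATUS_REG;   /* read the device register through its fixed address */
    (void)status;                   /* suppress unused-variable warnings */
    return 0;
}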
For the purpose of your everyday garden-variety programs, though, the point of writing C is that you need not and should not care where your variables are stored.
You can't.
Different compilers can put the variables in different places. On some machines the address is not a simple integer anyway.
The compiler only knows things like "the third integer global variable" and "the four bytes allocated 36 bytes down from the stack pointer." It refers to global vars, pointers to subroutines (functions), subroutine arguments and local vars only in relative terms. (Never mind the extra stuff for polymorphic objects in C++, yikes!) These relative references are saved in the object file (.o or .obj) as special codes and offset values.
The Linker can fill in some details. It may modify some of these sketchy location references when joining several object files. Global variable locations will share a space (the Data Section) when globals from multiple compilation units are merged; the linker decides what order they all go in, but still describing them as relative to the start of the entire set of global vars. The result is an executable file with the final opcodes, but addresses still being sketchy and based on relative offsets.
It's not until the executable is loaded that the Loader replaces all the relative addresses with actual addresses. This is possible now, because the loader (or some part of the operating system it depends on) decides where in the whole virtual address space of the process to store the program's opcodes (Text Section), global variables (BSS, Data Sections) and call stack, and other things. The loader can do the math, and write the actual address into every spot in the executable, typically as part of "load immediate" opcodes and all opcodes involving memory access.
Google "relocation table" for more. See http://www.iecc.com/linker/linker07.html (somewhat old) for a more detailed explanation for particular platforms.
In real life, it's all complicated by the fact that virtual addresses are mapped to physical addresses by a virtual memory system, using segments or some other mechanism to keep each process in a separate address space.
I would like to further build upon the answers already provided by pointing out that some compilers, such as Visual Studio's, have a feature called Address Space Layout Randomization (ASLR), which makes programs start at a random memory address as an anti-exploit security feature. Given the addresses that you have in your output, I'd say that you compiled without it (programs without it start at address 0x400000, I think). My source for this information is an answer to this question.
That said, the compiler is what determines the memory addresses at which local variables will be stored. The addresses will most likely change from compiler to compiler, and probably also with each version of the source code.
Every process has its own logical address space starting from zero. The addresses your program can access are all relative to zero. The absolute address of any memory location is decided only after the process is loaded into main memory; this is done by modern operating systems using dynamic relocation. Hence, every time a process is loaded into memory it may be placed at a different location, depending on the availability of memory. So letting user processes know the exact address of data stored in memory does not make much sense: what your code is printing is a logical address, not the exact (physical) address.
Continuing on the answers described above, please do not forget that processes run in their own virtual address space (process isolation). This ensures that when your program corrupts some memory, other running processes are not affected.
Process Isolation:
http://en.wikipedia.org/wiki/Process_isolation
Inter-Process Communication
http://en.wikipedia.org/wiki/Inter-process_communication

Need explanation of how memory addresses work in this C program

I have a very simple C program where I am (out of my own curiosity) investigating which memory addresses are used to allocate local variables. My program is:
#include <stdio.h>
int main()
{
    char buffer_1[8], buffer_2[8], buffer_3[8];
    printf("address of buffer_1 %p\n", buffer_1);
    printf("address of buffer_2 %p\n", buffer_2);
    printf("address of buffer_3 %p\n", buffer_3);
    return 0;
}
output is as follows:
address of buffer_1 0x7fff5fbfec30
address of buffer_2 0x7fff5fbfec20
address of buffer_3 0x7fff5fbfec10
My question is: why do the addresses seem to be getting smaller? Is there some logic to this? Thank you.
The compiler is allowed to do whatever it wants with your automatic variables. In this case it just looks like it's putting them consecutively on the stack. On most popular systems in use today, stacks grow downwards.
Most compilers allocate stack memory for local variables in one step, at the very beginning of the function. The memory is allocated as a single contiguous block. Under these circumstances the compiler, obviously, is free to use absolutely any memory layout for local variables inside that block. It can put them there so that the addresses increase in the order of declaration. Or decrease. Or are arranged randomly. It is an implementation detail, and there's not much logic behind it.
It is quite possible that in your case the compiler tried to "pretend" that the memory for the arrays was allocated on the stack sequentially and independently (even though that was not the case). If on your platform the stack grows downwards (as it does on many platforms), then it is expected that objects declared later will have smaller addresses.
But again, functions don't allocate local objects individually. And on top of that the language makes no guarantees about any relationships between local object addresses. So, there's no real reason to prefer one ordering over the other.
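If you are curious, a rough empirical probe looks something like the sketch below. Note that comparing addresses of objects in different stack frames is not given any meaning by the C standard, and inlining or optimization can distort the result, so treat it purely as an experiment:
#include <stdio.h>

static void callee(char *caller_local)
{
    char callee_local;

    /* Purely an experiment: the standard does not define this comparison. */
    if (&callee_local < caller_local)
        printf("the stack appears to grow downwards\n");
    else
        printf("the stack appears to grow upwards\n");
}

int main(void)
{
    char caller_local;
    callee(&caller_local);
    return 0;
}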
The output of your C program is platform-dependent and compiler-dependent.
There cannot be just one perfect answer because the address arrangements vary based on:
Whether the system is little or big endian.
What kind of OS you are compiling on.
What kind of memory architecture you are compiling for.
What kind of compiler you are using (and compilers might have bugs too).
Whether you are on 64-bit or 32-bit platform.
And so much more.
But most important of all is the type of processor architecture. :)
Here is a list of stack growth strategies per processor:
x86, PDP-11: downwards
System z: in a linked-list fashion, downwards, mostly
ARM: selectable; can grow either upwards or downwards
Mostek 6502: downwards (but only 256 bytes)
SPARC: in a circular fashion with a sliding window, a limited-depth stack
RCA 1802A: subject to the SCRT (Standard Call and Return Technique) implementation
But, in general, your compiler, at compile time, maps those addresses into the generated binary file. Then at run time, the binary file may occupy (or may pretend to occupy) a sequential set of memory addresses. And in your case, the addresses printed by your C program show that the stack is growing downwards.
Basically, the compiler is responsible for allocating memory for all the variables.
The arrays get addresses on the stack, but that by itself has nothing to do with the output you are getting.
The thing is that the compiler found a contiguous chunk of memory free at that point and allocated it to your program.

What is the difference between far pointers and near pointers?

Can anybody tell me the difference between far pointers and near pointers in C?
On a 16-bit x86 segmented memory architecture, four registers are used to refer to the respective segments:
DS → data segment
CS → code segment
SS → stack segment
ES → extra segment
A logical address on this architecture is written segment:offset. Now to answer the question:
Near pointers refer (as an offset) to the current segment.
Far pointers use segment info and an offset to point across segments. So, to use them, DS or CS must be changed to the specified value, the memory will be dereferenced and then the original value of DS/CS restored. Note that pointer arithmetic on them doesn't modify the segment portion of the pointer, so overflowing the offset will just wrap it around.
And then there are huge pointers, which are normalized to have the highest possible segment for a given address (contrary to far pointers).
On 32-bit and 64-bit architectures, memory models are using segments differently, or not at all.
Since nobody mentioned DOS, let's forget about old DOS PC computers and look at this from a generic point of view. Then, very simplified, it goes like this:
Any CPU has a data bus, which is the maximum amount of data the CPU can process in one single instruction, i.e. equal to the size of its registers. The data bus width is expressed in bits: 8 bits, or 16 bits, or 64 bits etc. This is where the term "64 bit CPU" comes from - it refers to the data bus.
Any CPU has an address bus, also with a certain bus width expressed in bits. Any memory cell in your computer that the CPU can access directly has an unique address. The address bus is large enough to cover all the addressable memory you have.
For example, if a computer has 65536 bytes of addressable memory, you can cover these with a 16 bit address bus, 2^16 = 65536.
Most often, but not always, the data bus width is as wide as the address bus width. It is nice if they are of the same size, as it keeps both the CPU instruction set and the programs written for it clearer. If the CPU needs to calculate an address, it is convenient if that address is small enough to fit inside the CPU registers (often called index registers when it comes to addresses).
The non-standard keywords far and near are used to describe pointers on systems where you need to address memory beyond the normal CPU address bus width.
For example, it might be convenient for a CPU with 16 bit data bus to also have a 16 bit address bus. But the same computer may also need more than 2^16 = 65536 bytes = 64kB of addressable memory.
The CPU will then typically have special instructions (that are slightly slower) which allows it to address memory beyond those 64kb. For example, the CPU can divide its large memory into n pages (also sometimes called banks, segments and other such terms, that could mean a different thing from one CPU to another), where every page is 64kB. It will then have a "page" register which has to be set first, before addressing that extended memory. Similarly, it will have special instructions when calling/returning from sub routines in extended memory.
In order for a C compiler to generate the correct CPU instructions when dealing with such extended memory, the non-standard near and far keywords were invented. Non-standard as in they aren't specified by the C standard, but they are de facto industry standard and almost every compiler supports them in some manner.
far refers to memory located in extended memory, beyond the width of the address bus. Since it refers to addresses, most often you use it when declaring pointers. For example: int far * x; means "give me a pointer that points to extended memory". And the compiler will then know that it should generate the special instructions needed to access such memory. Similarly, function pointers that use far will generate special instructions to jump to/return from extended memory. If you didn't use far then you would get a pointer to the normal, addressable memory, and you'd end up pointing at something entirely different.
near is mainly included for consistency with far; it refers to anything in the normal addressable memory and is equivalent to a regular pointer. So it is mainly a useless keyword, save for some rare cases where you want to ensure that code is placed inside the standard addressable memory. You could then explicitly label something as near. The most typical case is low-level hardware programming where you write interrupt service routines. They are called by hardware from an interrupt vector with a fixed width, which is the same as the address bus width. Meaning that the interrupt service routine must be in the standard addressable memory.
The most famous use of far and near is perhaps the mentioned old MS DOS PC, which is nowadays regarded as quite ancient and therefore of mild interest.
But these keywords exist on more modern CPUs too! Most notably in embedded systems where they exist for pretty much every 8 and 16 bit microcontroller family on the market, as those microcontrollers typically have an address bus width of 16 bits, but sometimes more than 64kB memory.
Whenever you have a CPU where you need to address memory beyond the address bus width, you will need far and near. Generally, such solutions are frowned upon though, since it is quite a pain to program for them and to always take the extended memory into account.
One of the main reasons why there was a push to develop the 64 bit PC was actually that the 32 bit PCs had come to the point where their memory usage was starting to hit the address bus limit: they could only address 4GB of RAM (2^32 = 4.29 billion bytes = 4GB). In order to enable the use of more RAM, the options were then either to resort to some burdensome extended memory solution like in the DOS days, or to expand the computers, including their address bus, to 64 bits.
Far and near pointers were used in old platforms like DOS.
I don't think they're relevant on modern platforms. But you can learn about them here and here (as pointed out by other answers). Basically, a far pointer is a way to extend the addressable memory of a computer, i.e. to address more than 64 KB of memory on a 16-bit platform.
A pointer basically holds an address. As described above, the 16-bit Intel memory model divides memory into segments, addressed through four segment registers.
So when the address pointed to by a pointer is within the same segment, it is a near pointer and therefore requires only 2 bytes for the offset.
On the other hand, when a pointer points to an address outside the current segment (that is, in another segment), it is a far pointer. It consists of 4 bytes: two for the segment and two for the offset.
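As a hedged sketch of what these declarations looked like in 16-bit DOS-era compilers (non-standard Borland/Microsoft syntax; modern compilers will reject the keywords, and the MK_FP macro came from <dos.h> in those toolchains):
#include <dos.h>   /* Borland/Microsoft 16-bit compilers only */

char near *np;   /* 16-bit offset into the current data segment               */
char far  *fp;   /* 32-bit segment:offset pair, can point across segments     */
char huge *hp;   /* like far, but normalized so pointer arithmetic is uniform */

int main(void)
{
    /* Classic example: a far pointer to the CGA/VGA text-mode video memory. */
    char far *video = (char far *)MK_FP(0xB800, 0x0000);
    *video = 'A';   /* write a character directly into screen memory */
    return 0;
}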
Well, in DOS it was kind of funny dealing with registers and segments; it was all about the maximum amount of RAM that could be addressed.
Today it is pretty much irrelevant. All you really need to read about is the difference between user space, kernel space and virtual memory.
Since Windows NT 4 (when they borrowed ideas from *nix), Microsoft programmers started to use what are called user/kernel memory spaces,
and direct access to physical controllers has been avoided since then. The problem of dealing with memory segments directly disappeared as well: everything is read and written through the OS.
However, if you insist on understanding and manipulating far/near pointers, look at the Linux kernel source and how it works; you will never come back, I guess.
And if you still need to use CS (code segment)/DS (data segment) in DOS, look at these:
https://en.wikipedia.org/wiki/Intel_Memory_Model
http://www.digitalmars.com/ctg/ctgMemoryModel.html
I would like to point to the excellent answer from Lundin; I was too lazy to answer properly, but Lundin gave a very detailed and sensible explanation. Thumbs up!
