Consider the really simple code below:
#include <stdio.h>
#include <stdlib.h>
int main() {
int* a = (int*) malloc(10 * sizeof(int));
printf("a = %p, a+1 = %p", a, a+1);
return 0;
}
The output is this:
a = 0x127f190, a+1 = 0x127f194
Since the size of an int is 4 bytes, I am assuming from the result above that a pointer value is then the index of a byte on my RAM memory. Hence a+1 increases in fact the value of a by sizeof(int) = 4 (bytes). Is that correct?
If yes, then why do I get 32 bit memory addresses from my program? This machine is 64bit running a 64bit version of Ubuntu. How do I get the program to print a full 64bit address? Do I have to compile it with special flags?
You are correct about your pointer. Memory is traditionally organized and addressed in bytes, pointers point to the first byte of whatever they are pointing to (EDIT: they don't HAVE to, but on the usual platforms and compilers they do).
You are not seeing a "64 bit" pointer simply because the output strips the leading 0s :-) If you do a sizeof(a), chances are that you'll get "4" on a 32 bit system, and "8" on a 64 bit system.
the index of a byte on my RAM memory
That's close enough for most purposes, but not entirely accurate. It's the index of a byte in your process's address space. Virtual addresses do not relate directly to physical RAM, there's a thing called a "virtual memory manager" that's responsible for keeping track of what virtual addresses in each process refer to what physical RAM (and things other than RAM).
Normally you can forget about this, and just think of the virtual address space as RAM (or as "memory" to keep it abstract). But the same virtual address in different processes could refer to different physical memory, or the same memory could be referred to in different processes by different virtual addresses. Or the same virtual address in the same process could refer to different physical memory at different times, if the OS has noticed that the page hasn't been used for a while, swapped it to disk and then back to memory when it's used again. So it's not really "the" address of the RAM itself, it's just the address that your process has been given by the OS, to refer to some RAM.
The reason you're seeing an increase of four is because you're allocating memory for integers, which are fixed at four bytes long (in Intel Linux gcc) -- whether you've compiled 32bit or 64bit code. As already said, the pointers you get refer to virtual memory addresses, not physical memory.
If you change your int to long, you'll see a 4 byte increase with 32bit code and an 8 byte increase with 64bit code.
Additionally, if you look at sizeof(void *) it will tell you if your pointers are 32bit or 64bit. If your pointers are 64bit, then you'll be getting 64bit pointers printed with %p.
I edited your program to run on my copy of Ubuntu, adding:
printf("Size of pointer = %d\n", (int)sizeof(void *));
Here's the output:
a = 0x2067010, a+1 = 0x2067014
Size of pointer = 8
So the pointer is indeed 64bit.
Sorry, to not point you an exact answer. But here it comes:
Look for Pointer Arithmetic, you will find all you are looking for.
Even if you system runs x64, most of the compilers come default with x86 unless you specifically declare to compile for x64. So search compiler documentation for x64 flag, and for relevant options.
When you wrote int* a = (int*) malloc(10 * sizeof(int));, you were allocating memory for an array of ten elements. That is equivalent to int a[10]; and a = 0x127f190 is the address of the first element a[0].
Related
I was reading Operating Systems: Three Easy Pieces. To learn how the virtual address space for a program look like, I run the following code.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
printf("location of code : %p\n", (void *) main);
printf("location of heap : %p\n", (void *) malloc(1));
int x = 3;
printf("location of stack : %p\n", (void *) &x);
return x;
}
Its output is:
location of code : 0x564eac1266fa
location of heap : 0x564ead8e5670
location of stack : 0x7fffd0e77e54
Why the code segment's location is 0x564eac1266fa? What does so large a (virtual) space before it use for? Why doesn't it start from or near 0x0)
And, why the program's virtual address is so large?(from the stack location, it's 48 bits wide) What's the point of it?
The possible virtual address space organizations are defined by the hardware you are using, specifically the MMU it supports. The OS may then use any organization that the hardware can be coerced into using, but generally it just uses it directly (possibly with some subsetting), as that is most efficient.
The x86_64 architecture defines a 48-bit virtual address space1, and most OSes reserve half of that for system use, so user programs see a 47 bit address space. Within that address space, most OSes will randomize the addresses used for any given program, so as to make exploiting bugs in the programs harder.
1Strictly speaking, the architecture defines a 64-bit virtual address space, but then reserves all addresses that do not have the top 17 bits all 0 or all 1.
You are barking up the wrong tree with what you are trying to do here. A process has multiple stacks, may have multiple heaps, and main might not be the start of the code. Viewing an address space as a code segment, stack segment, heap segment, ... as horrible operating systems books do is only going to get you confused.
Because of logical addressing, the memory mapped into the address space does not have to be contiguous.
Why the code segment's location is 0x564eac1266fa? What does so large a (virtual) space before it use for? Why doesn't it start from or near 0x0)
The start of code in your process would well be at 0x564eac1266f8. The fact that you have a high address does not mean the lower addresses have been mapped into the process address space.
And, why the program's virtual address is so large?(from the stack location, it's 48 bits wide) What's the point of it?
Stacks generally start high and grow low.
I used this chunk of code:
int
main(int argc, char *argv[])
{
int *p; // memory for pointer is on "stack"
p = malloc(sizeof(int)); // malloc'd memory is on "heap"
assert(p != NULL);
printf("(pid:%d) addr of p: %llx\n", (int) getpid(),
(unsigned long long) &p);
printf("(pid:%d) addr stored in p: %llx\n", (int) getpid(),
(unsigned long long) p);
return 0;
}
However, I get:
addr of p: 7ffc0c53e3e0
addr stored in p: 558ae195c260
Now, first of all, since the program only does this, I do not understand why malloc() does not start with the address 00200000? Second, can I say that the 7ffc0c53e3e0 address is in heap, and the address 558ae195c260 is in stack? Third, if my guess with 00200000 is wrong, is there any logic with the addresses that I get, or is it completely random?
When I think about it, the address is not even 32 bits, it is 48 bits. Even if it is to be more than 32 bits (I have 8 GB memory, so I believe it must be more than 32 anyways), why is it not expressed in 64 bits, since the processor is 64 bits.
Thank you for your help.
Linux implements ASLR, so as far as I'm aware you always get random addresses. 558ae195c260 is actually allocated on the heap via malloc(), whereas 7ffc0c53e3e0 is allocated on the stack when you declare int *p;. 48 bits is still enough for 256 TB of RAM, but beyond that some architectures don't allow all address lines to be a full 64 bits (like AMD64).
Hope that helps. If anything I've said is wrong or misleading please correct me in the comments.
No, p is in stack (or global) and it points to a bunch of memory in the heap.
Regarding malloc() , if you are working on an OS, it depends on the kernel, and how it is managing the memory.
Finally, obviously some of your data is wrong. A 32bit bus data cannot manage 8GB of RAM (no more than 2^32=4GB). However, it makes sense in a 64bits bus, because a 64bits variable has enough space to contain an 8GB address.
Consider the following code in a linux machine with 32 bit OS:
void foo(int *pointer){
int *buf;
int *buf1 = pointer;
....
}
What is the maximum memory address buf and buf1 can point to using the above declaration (OS allocates the address)? E.g., can it point to address 2^32-200?
The reason I asked is that I may do pointer arithmetic on these buffers and I am concern that this pointer arithmetic can wrap around. E.g., assume the len is smaller than the size of buf and buf1. Assume some_pointer points to the end of the buffer.
unsigned char len = 255;
if(buf + len > some_pointer)
//do something
if(buf1 + len > some_pointer)
//do something
The standard says that
For two elements of an array, the address of the element with the lower subscript will always compare less to the address of the object with the higher subscript.
Comparing any two elements that are not part of the same aggregate (array or struct) is undefined behavior.
So if buf + len and some_pointer point to elments in the same array as buf (or one past the array), you don't have to worry about wrap arround. If one of them doesn't, you have undefined behavior anyway.
You shouldn't ever rely on the addresses provided by the allocator falling within a specific range. Even if you could show that on a particular Linux setup, malloc can only generate addresses between X and Y, there is no guarantee--it could change with any future update. The only guarantee from malloc is that successful allocations won't start at NULL (address 0 in code, for Linux and most other typical platforms).
Yes, for a 32 bit or 64 bit OS. Whether there's anything usable there, or if you'll get an access violation trying to dereference the pointer, is up to the compiler and OS.
The OS can map pages of physical memory anywhere in the address space. The addresses you see don’t correspond to physical RAM chips at all. The OS might, for example, have virtual memory or copy-on-write pages.
Basically my question is how much bytes does a single address take / have?
I mean a char takes 1 byte on my platform and has 1 address. But an int takes 4 bytes. How many addresses does this int take? Does it still have only 1 address or does it have 4?
For example :
char c = 'A'; //Address at 0xdeadbeee
int i = 45846; //Address at 0xdeadbeef
int* iPtr = &i;
iPtr++; //Address at 0xdeadbef3 now
What happens with the addresses between 0xdeadbeef and 0xdeadbef3? Are they all reserved for i? What happens to i when I point to 0xdeadbeee(should be exactly one address | byte or whatever under i) and change it's value?
Edit:
for those who will still answer, I don't want to know how big an integer is. I want to know if it has also 4 addresses when taking 4 bytes of memory and what happens (if it has 4 addresses) when changing one of these addresses' value.
I hope it's clearer now.
The sizes of the built-in types (char, short, int, long) are implementation specific and platform specific. If we assume your int is 32 bits, then we can address some of your questions:
If i resides at 0xdeadbeef, then 0xdeadbeef, 0xdeadbef0, 0xdeadbef1, and 0xdeadbef2 byte addresses would all be used to store i. If you were to set iPtr to 0xdeadbeee and write a value, 0xdeadbeee and the following three addresses would then contain the value you wrote. If you then attempt to read c or i, you would find the value corrupted.
Some things to consider: not all architectures allow byte addressing. A char may be one byte on your system, but due to limitations, 4 bytes may be reserved. Likewise, you may not be able to read or write a pointer that points to non-aligned addresses. For example, a system that can only access memory on 32 bit boundaries could only access 0xdeadbeec or 0xdeadbef0.
How could you find this out? How about:
printf("%zu\n", sizeof(iPtr));
But, as #H2CO3 points out, you're really asking about pointer arithmetic. Read up more on that for more info.
Yes, the address &i+1byte is an address of the second byte of i.
If you live on Memory Street 100 in 4 houses, you have four addresses. But those addresses address different buildings. Although, depending on your postal service the mail may not be delivered if it's not the canonical address (the same goes for memory access — depends on the platform).
You can find out how many bytes a pointer takes by using sizeof:
size_t int_ptr_size = sizeof(int*);
If you try to access data through a pointer that isn't properly aligned for the type, you invoke undefined behavior, so it's unpredictable what will happen. On some architectures, the program will crash with a Bus Error.
An address refers to the start of the data. So the size of the address doesn't change depending on the size of the data.
The actual size of the address will, however, depend on the platform. On many newer systems, that size will be 64 bits. But we can't say exactly without knowing your platform.
You can use sizeof() in your code to get the size of an address.
Everything depends on your hardware platform memory organization.
In case the memory is organized in 4 bytes cells, the variable which length is bellow or equal to 4 bytes (assuming correct memory adjustment), is hold in just one single memory cells, so it is pointed by only one single address value.
I'm just starting C. I have read about pointers in various books/tutorials and I understand the basics. But one thing I haven't seen explained is what are the numbers.
For example:
int main(){
int anumber = 10;
int *apointer;
apointer = &anumber;
printf("%i", &apointer);
}
may return a number like 4231168. What does this number represent? Is it some storage designation in the RAM?
Lots of PC programmer replies as always. Here is a reply from a generic programming point-of-view.
You will be quite interested in the actual numerical value of the address when doing any form of hardware-related programming. For example, you can access hardware registers in a computer in the following way:
#define MY_REGISTER (*(volatile unsigned char*)0x1234)
This code assumes you know that there is a specific hardware register located at address 0x1234. All addresses in a computer are by tradition/for convenience expressed in hexadecimal format.
In this example, the address is 16 bits long, meaning that the address bus on the computer used is 16-bits wide. Every memory cell in your computer has an address. So on a 16-bit address bus you could have a maximum of 2^16 = 65536 addressable memory cells.
On a PC for example, the address would typically be 32 bits long, giving you 4.29 billion addressable memory cells, ie 4.29 Gigabyte.
To explain that macro in detail:
0x1234 is the address of the register / memory location.
We need to access this memory location through a pointer, so therefore we typecast the integer constant 0x1234 into an unsigned char pointer = a pointer to a byte.
This assumes that the register we are interested in is 1 byte large. Had it been two bytes large, we would perhaps have used unsigned short instead.
Hardware registers may update themselves at any time (their contents are "volatile"), so the program can't be allowed to make any assumptions/optimizations of what's stored inside them. The program has to read the value from the register at every single time the register is used in the code. To enforce this behavior, we use the volatile keyword.
Finally, we want to access the register just as if it was a plain variable. Therefore the * is added, to take the contents of the pointer.
Now the specific memory location can be accessed by the program:
MY_REGISTER = 1;
unsigned char var = MY_REGISTER;
For example, code like this is used everywhere in embedded applications.
(But as already mentioned in other replies, you can't do things like this in modern PCs, since they are using something called virtual addressing, giving you a slap on the fingers should you attempt it.)
It's the address or location of the memory to which the pointer refers. However, it's best if you regard this as an opaque quantity - you are never interested in the actual value of the pointer, only that to which it refers.
How the address then relates to physical memory is a service that the system provides and actually varies across systems.
That's a virtual address of anumber variable. Every program has its own memory space and that memory space is mapped to the physical memory. The mapping id done by the processor and the service data used for that is maintained by the operating system. So your program never knows where it is in the physical memory.
It's the address of the memory1 location where your variable is stored. You shouldn't care about the exact value, you should just know that different variables have different addresses, that "contiguous memory" (e.g. arrays) has contiguous addresses, ...
By the way, to print the address stored in a pointer you should use the %p specifier in printf.
Notice that I did not say "RAM", because in most modern OSes the "memory" your process sees is virtual memory, i.e. an abstraction of the actual RAM managed by the OS.
A lot of people told you, that the numeric value of a pointer will designate its address. This is one way how implementations can do it, but it is very important, what the C standard has to say about pointers:
The nil pointer has always numeric value 0 when operated on in the C programming language. However the actual memory containing the pointer may have any value, as long as this special, architecture dependent value is consistently treated nil, and the implementation takes care that this value is seen as 0 by C source code. This is important to know, since 0 pointers may appear as a different value on certain architectures when inspected with a low level memory debugger.
There's no requirement whatsoever that the values of the pointer are in any way related to actual addresses. They may be as well abstract identifiers, resolved by a LUT or similar.
If a pointer addresses an array, the rules of pointer arithmetic must hold, i.e. int array[128]; int a, b; a = (int)&array[120]; b = (int)&array[100]; a - b == 20 ; array + (a-b) == &array[20]; &array[120] == (int*)a
Pointer arithmetic between pointers to different objects is undefined and causes undefined behaviour.
The mapping pointer to integer must be reversible, i.e. if a number corresponds to a valid pointer, the conversion to this pointer must be valid. However (pointer) arithmetic on the numerical representation of pointers to different objects is undefined.
Yes, exactly that - it's the address of the apointer data in memory. Local variable such as anumber and apointer will be allocated in your program's stack, so it will refer to an address in the main() function's frame in the stack.
If you had allocated the memory with malloc() instead it would refer to a position in your program's heap space. If it was a fixed string it may refer to a location in your program's data or rodata (read-only data) segments instead.
in this case &apointer represent the address in RAM memory of the pointer variable apointer
apointer is the "address" of the variable anumber. In theory, it could be the actual physical place in RAM where the value of anumber is stored, but in reality (on most OS'es) it's likely to be a place in virtual memory. The result is the same though.
It's a memory address, most likely to the current location in your program's stack. Contrary to David's comment, there are times when you'll calculate pointer offsets, but this is only if you have some kind of array that you are processing.
It's the address of the pointer.
"anumber" takes up some space in RAM, the data at this spot contains the number 10.
"apointer" also takes up some space in RAM, the data at this spot contains the location of "anumber" in RAM.
So, say you have 32 bytes of ram, addresses 0..31
At e.g. position 16 you have 4 bytes, the "anumber" value 10
At e.g. position 20 you have 4 bytes, the "apointer" value 16, "anumber"'s position in RAM.
What you print is 20, apointer's position in RAM.
Note that this isn't really directly in RAM, it's in virtual address space which is mapped to RAM. For the purpose of understanding pointers you can completely ignore virtual address space.
it is not the address of the variable anumber that is printed but it is the address of the pointer which gets printed.look carefully.had it been just "apointer",then we would have seen the address of the anumber variable.