How many bytes does an address take? - c

Basically my question is: how many bytes does a single address take?
I mean, a char takes 1 byte on my platform and has 1 address. But an int takes 4 bytes. How many addresses does this int take? Does it still have only 1 address or does it have 4?
For example :
char c = 'A'; //Address at 0xdeadbeee
int i = 45846; //Address at 0xdeadbeef
int* iPtr = &i;
iPtr++; //Address at 0xdeadbef3 now
What happens with the addresses between 0xdeadbeef and 0xdeadbef3? Are they all reserved for i? What happens to i when I point to 0xdeadbeee (which should be exactly one address, or byte, below i) and change its value?
Edit:
For those who will still answer: I don't want to know how big an integer is. I want to know whether it also has 4 addresses when it takes 4 bytes of memory and, if so, what happens when the value at one of those addresses is changed.
I hope it's clearer now.

The sizes of the built-in types (char, short, int, long) are implementation specific and platform specific. If we assume your int is 32 bits, then we can address some of your questions:
If i resides at 0xdeadbeef, then 0xdeadbeef, 0xdeadbef0, 0xdeadbef1, and 0xdeadbef2 byte addresses would all be used to store i. If you were to set iPtr to 0xdeadbeee and write a value, 0xdeadbeee and the following three addresses would then contain the value you wrote. If you then attempt to read c or i, you would find the value corrupted.
Some things to consider: not all architectures allow byte addressing. A char may be one byte on your system, but due to limitations, 4 bytes may be reserved. Likewise, you may not be able to read or write a pointer that points to non-aligned addresses. For example, a system that can only access memory on 32 bit boundaries could only access 0xdeadbeec or 0xdeadbef0.

How could you find this out? How about:
printf("%zu\n", sizeof(iPtr));
But, as @H2CO3 points out, you're really asking about pointer arithmetic. Read up more on that for more info.

Yes, the address one byte past &i, that is (char *)&i + 1, is the address of the second byte of i.
If you live on Memory Street 100 in 4 houses, you have four addresses. But those addresses address different buildings. Although, depending on your postal service the mail may not be delivered if it's not the canonical address (the same goes for memory access — depends on the platform).

You can find out how many bytes a pointer takes by using sizeof:
size_t int_ptr_size = sizeof(int*);
If you try to access data through a pointer that isn't properly aligned for the type, you invoke undefined behavior, so it's unpredictable what will happen. On some architectures, the program will crash with a Bus Error.

An address refers to the start of the data. So the size of the address doesn't change depending on the size of the data.
The actual size of the address will, however, depend on the platform. On many newer systems, that size will be 64 bits. But we can't say exactly without knowing your platform.
You can use sizeof() in your code to get the size of an address.

Everything depends on your hardware platform memory organization.
If the memory is organized in 4-byte cells, a variable whose length is at most 4 bytes (assuming correct alignment) is held in a single memory cell, so it is pointed to by a single address value.

Related

Memory allocation and unused byte in basic C program

I have a question regarding memory allocation on the basic C program.
#include <stdio.h>
int main()
{
int iarray[3];
char carray[3];
printf("%p\n", &iarray); // b8
printf("%p\n", &iarray+1); // c4
printf("%p\n", &carray); // c5
return 0;
}
Given the code above, you can see that &iarray+1 and &carray differ by one byte. I'm not sure what its purpose is or what is in it. Why does the compiler leave an unused byte between the two arrays?
I thought maybe it is used to record the array size, but I understand that sizeof is a compile-time operator that knows the size without allocating real memory, so there is no use for storing the array size.
Note: The output can be seen on the comments of each printf.
b8
c4
c5
Playground: https://onlinegdb.com/cTdzccpDvI
Thanks.
Compilers are free to arrange variables in memory any way that they see fit. Typically, they will be placed at memory offsets whose value is a multiple of the variable's size, for example a 4 byte int or int array will start at an address which is a multiple of 4.
In this case, you have an int array starting at an address which is a multiple of 4, followed by an unused byte, followed by a char array of size 3. In theory, an int or long could immediately follow the char array in memory if one were defined, since the next available address (c8) is a multiple of 8.
From your output it looks like the stack is arranged like this for these local variables:
b8-bb: 1st integer of iarray
bc-bf: 2nd integer of iarray
c0-c3: 3rd integer of iarray
c4: padding probably, only compiler knows
c5-c7: carray
Now when you do &iarray+1, you are taking the address of an array of type int[3] and adding 1 unit of that array type to it. In other words, you are getting the address of the next int[3] array, which would indeed be at c4 (but isn't, because there's just one int[3]).
This code is actually valid. You must not dereference this pointer, but because it points exactly +1 past the iarray, having the pointer and printing its value is legal (in other words, not Undefined Behavior, like &iarray+2 would be).
If you also print this:
printf("%p\n", iarray+1);
You should get the result bc, because now you take a pointer to int (iarray is treated as a pointer to its first int) and add 1 to it, getting the address of the next int.
The reason this happens is the particular compiler you are using allocates memory from high addresses to low in this particular situation.
The compiler analyzes the main routine and sees it needs 3 int for iarray and 3 char for carray. For whatever reason, it decides to work on carray first.
The compiler starts with a planned stack frame that is required to have 16-byte alignment at certain points. Additional data is needed on the stack with the result that the point where the compiler starts putting local variables is at an address that is 8 modulo 16 (so its hexadecimal representation ends in 8). That is, from some address like 7FFC565E90A8 and up, memory is used for managing the stack frame. The first bytes for local objects will be at 7FFC565E90A7, 7FFC565E90A6, 7FFC565E90A5, 7FFC565E90A4, and so on.
The compiler takes the first three bytes of that space for carray. Recall we are working from high addresses to low addresses. (For historical reasons, that is the direction that stacks grow; some high address is assigned as the starting point, and new data is put in lower addresses.) So carray is put at address 7FFC565E90A5. It fills the bytes at 7FFC565E90A5, 7FFC565E90A6, and 7FFC565E90A7.
Then the compiler needs to assign 12 bytes for iarray. The next available 12 bytes are from 7FFC565E9099 to 7FFC565E90A4. However, the int elements in iarray require 4-byte alignment, so they cannot start at 7FFC565E9099. Therefore, the compiler adjusts to have them start at 7FFC565E9098. Then iarray fills bytes from 7FFC565E9098 to 7FFC565E90A3, and 7FFC565E90A4 is unused.
Note that in other situations, the compiler may arrange local objects in different ways. When you have multiple objects with different alignments, the compiler may choose to cluster all objects with the same alignment to reduce the number of places it needs to insert padding. A compiler could also choose to allocate memory for objects in alphabetical order by their names. Or it could do it in the order it happens to store them in its hash table. Or some combination of these things, such as clustering all objects by alignment requirement but then sorting by name within each cluster.
This behavior is purely compiler- and implementation-specific. What probably happens is this:
When a function (main() in this case) that has local variables is invoked, memory for those variables is allocated on the stack. In this case, 15 bytes are needed, but it is likely that 4-byte alignment is required for the stack allocation, so that 16 bytes are allocated.
It is also likely that the int-array must be 4-byte aligned. Hence the address of the int array is a multiple of 4. The char-array does not have any alignment requirements so it can be placed anywhere in the 4 remaining bytes.
So in short, the additional byte is unused, but allocated due to alignment.

The maximum memory location my C stack pointer can point to during initialization

Consider the following code in a linux machine with 32 bit OS:
void foo(int *pointer){
int *buf;
int *buf1 = pointer;
....
}
What is the maximum memory address buf and buf1 can point to using the above declaration (OS allocates the address)? E.g., can it point to address 2^32-200?
The reason I ask is that I may do pointer arithmetic on these buffers and I am concerned that this pointer arithmetic can wrap around. E.g., assume that len is smaller than the size of buf and buf1. Assume some_pointer points to the end of the buffer.
unsigned char len = 255;
if(buf + len > some_pointer)
//do something
if(buf1 + len > some_pointer)
//do something
The standard says that
For two elements of an array, the address of the element with the lower subscript will always compare less to the address of the object with the higher subscript.
Comparing any two elements that are not part of the same aggregate (array or struct) is undefined behavior.
So if buf + len and some_pointer point to elements in the same array as buf (or one past the array), you don't have to worry about wraparound. If one of them doesn't, you have undefined behavior anyway.
You shouldn't ever rely on the addresses provided by the allocator falling within a specific range. Even if you could show that on a particular Linux setup, malloc can only generate addresses between X and Y, there is no guarantee--it could change with any future update. The only guarantee from malloc is that successful allocations won't start at NULL (address 0 in code, for Linux and most other typical platforms).
Yes, for a 32 bit or 64 bit OS. Whether there's anything usable there, or if you'll get an access violation trying to dereference the pointer, is up to the compiler and OS.
The OS can map pages of physical memory anywhere in the address space. The addresses you see don’t correspond to physical RAM chips at all. The OS might, for example, have virtual memory or copy-on-write pages.

Initializing variable at address zero in C

This may be a pretty basic question. I understand that there is a C convention to set the value of null pointers to zero. Is it possible that you can ever allocate space for a new variable in Windows, and the address of that allocated space happens to be zero? If not, what usually occupies that address region?
On MS-DOS the null pointer is a fairly valid pointer and due to the OS running in real mode it was actually possible to overwrite the 0x0 address with garbage and corrupt the kernel. You could do something like:
int i;
unsigned char* ptr = (unsigned char *)0x0;
for(i = 0; i < 1024; i++)
ptr[i] = 0x0;
Modern operating systems (e.g. Linux, Windows) run in protected mode which never gives you direct access to physical memory.
The processor maps the virtual addresses that your program uses to physical addresses.
It also keeps track of what you access, and should you dare touch something that does not belong to you, you will be in trouble (your program will segfault). This most definitely includes trying to dereference the 0x0 address.
When you "set the value of a pointer to zero" as in
int *p = 0;
it will not necessarily end up pointing to physical address zero, as you seem to believe. When a pointer is assigned a constant zero value (or initialized with it), the compiler is required to recognize that situation and treat it in a special way. The compiler is required to replace that zero with implementation-dependent null-pointer value. The latter does not necessarily point to zero address.
Null pointer value is supposed to be represented by a physical address that won't be used for any other purpose. If in some implementation physical address zero is a usable address, then such implementation will have to use a different physical address to represent null pointers. For example, some implementation might use address 0xFFFFFFFF for that purpose. In such implementation the initialization
int *p = 0;
will actually initialize p with physical 0xFFFFFFFF, not with physical zero.
P.S. You might want to take a look at the FAQ: http://c-faq.com/null/index.html, which is mostly dedicated to exactly that issue.
The value 0 has no special meaning. It is a convention to set a pointer to 0 and the C compiler has to interpret it accordingly. However, there is no connection to the physical address 0 and in fact, that address can be a valid address. In many systems, though, the lower addresses contain hardware-related addresses, such as interrupt vectors. On the Amiga, for example, address 4 was the entry point into the operating system, which is also an arbitrary decision.
If the address of allocated space is zero, there is insufficient memory available. That means your variable could not be allocated.
The address at 0x0 is where the CPU starts executing when you power it on. Usually at this address there's a jump to the BIOS code and IIRC the first 64K (or more) are reserved for other tasks (determined by the BIOS/UEFI). It's an area which is not accessible by an application.
Given that it should be clear that you cannot have a variable at address 0x0 in Windows.

pointer typecasting

int main()
{
int *p,*q;
p=(int *)1000;
q=(int *)2000;
printf("%d:%d:%d",q,p,(q-p));
}
output
2000:1000:250
1. I cannot understand the p=(int *)1000; line. Does this mean that p is pointing to 1000 address location? What if I do *p=22: is this value stored at address 1000, overwriting the existing value? If it overwrites the value, what if another program is working with that address?
2. how q-p=250?
EDIT: I tried printf("%u:%u:%u",q,p,(q-p)); the output is the same
int main()
{
int *p;
int i=5;
p=&i;
printf("%u:%d",p,i);
return 0;
}
the output
3214158860:5
does this mean the addresses used by compiler are integers? there is no difference between normal integers and address integers?
does this mean that p is pointing to 1000 address location?
Yes.
what if I do *p=22
It's invoking undefined behavior - your program will most likely crash with a segfault.
Note that in modern OSes, addresses are virtual: you can't overwrite another process's address space like this, but you can attempt to write to an invalid memory location in your own process's address space.
how q-p=250?
Because pointer arithmetic works like this (in order to be compatible with array indexing). The difference of two pointers is the difference of their value divided by sizeof(*ptr). Similarly, adding n to a pointer ptr of type T results in a numeric value ptr + n * sizeof(T).
Read this on pointers.
does this mean the addresses used by compiler are integers?
That "used by compiler" part is not even necessary. Addresses are integers, it's just an abstraction in C that we have nice pointers to ease our life. If you were coding in assembly, you would just treat them as unsigned integers.
By the way, writing
printf("%u:%d", p, i);
is also undefined behavior - the %u format specifier expects an unsigned int, and not a pointer. To print a pointer, use %p:
printf("%p:%d", (void *)p, i);
Yes, with *p=22 you write to 1000 address.
q-p is 250 because the size of int is 4, so it's (2000-1000)/4 = 250.
The meaning of p = (int *) 1000 is implementation-defined. But yes, in a typical implementation it will make p point to address 1000.
Doing *p = 22 afterwards will indeed attempt to store 22 at address 1000. However, in general case this attempt will lead to undefined behavior, since you are not allowed to just write data to arbitrary memory locations. You have to allocate memory in one way or another in order to be able to use it. In your example you didn't make any effort to allocate anything at address 1000. This means that most likely your program will simply crash, because it attempted to write data to a memory region that was not properly allocated. (Additionally, on many platforms in order to access data through pointers these pointers must point to properly aligned locations.)
Even if you somehow succeed in writing your 22 at address 1000, it does not mean that it will in any way affect "other programs". On some old platforms it would (DOS, for one example). But modern platforms implement independent virtual memory for each running program (process). This means that each running process has its own separate address 1000 and it cannot see the other program's address 1000.
Yes, p is pointing to virtual address 1000. If you use *p = 22;, you are likely to get a segmentation fault; quite often, the whole first 1024 bytes are invalid for reading or writing. It can't affect another program assuming you have virtual memory; each program has its own virtual address space.
The value of q - p is the number of units of sizeof(*p) or sizeof(*q) or sizeof(int) between the two addresses.
Casting arbitrary integers to pointers gives an implementation-defined result, and dereferencing such a pointer is undefined behavior. Anything can happen, including nothing, a segmentation fault or silently overwriting other processes' memory (unlikely in the modern virtual memory models).
But we used to use absolute addresses like this back in the real mode DOS days to access interrupt tables and BIOS variables :)
About q-p == 250, it's the result of the semantics of pointer arithmetic. Apparently sizeof(int) is 4 on your system. So when you add 1 to an int pointer, the address actually gets incremented by 4, so that it points to the next int, not the next byte. This behavior helps with array access.
does this mean that p is pointing to 1000 address location?
Yes. But address 1000 may belong to some other process's address space. In that case, you are illegally accessing the memory of another process, which may result in a segmentation fault.

C pointers and the physical address

I'm just starting C. I have read about pointers in various books/tutorials and I understand the basics. But one thing I haven't seen explained is what are the numbers.
For example:
int main(){
int anumber = 10;
int *apointer;
apointer = &anumber;
printf("%i", &apointer);
}
may return a number like 4231168. What does this number represent? Is it some storage designation in the RAM?
Lots of PC programmer replies as always. Here is a reply from a generic programming point-of-view.
You will be quite interested in the actual numerical value of the address when doing any form of hardware-related programming. For example, you can access hardware registers in a computer in the following way:
#define MY_REGISTER (*(volatile unsigned char*)0x1234)
This code assumes you know that there is a specific hardware register located at address 0x1234. All addresses in a computer are by tradition/for convenience expressed in hexadecimal format.
In this example, the address is 16 bits long, meaning that the address bus on the computer used is 16-bits wide. Every memory cell in your computer has an address. So on a 16-bit address bus you could have a maximum of 2^16 = 65536 addressable memory cells.
On a PC, for example, the address would typically be 32 bits long, giving you about 4.29 billion addressable memory cells, i.e. 4 GiB.
To explain that macro in detail:
0x1234 is the address of the register / memory location.
We need to access this memory location through a pointer, so therefore we typecast the integer constant 0x1234 into an unsigned char pointer = a pointer to a byte.
This assumes that the register we are interested in is 1 byte large. Had it been two bytes large, we would perhaps have used unsigned short instead.
Hardware registers may update themselves at any time (their contents are "volatile"), so the program can't be allowed to make any assumptions/optimizations of what's stored inside them. The program has to read the value from the register at every single time the register is used in the code. To enforce this behavior, we use the volatile keyword.
Finally, we want to access the register just as if it was a plain variable. Therefore the * is added, to take the contents of the pointer.
Now the specific memory location can be accessed by the program:
MY_REGISTER = 1;
unsigned char var = MY_REGISTER;
For example, code like this is used everywhere in embedded applications.
(But as already mentioned in other replies, you can't do things like this in modern PCs, since they are using something called virtual addressing, giving you a slap on the fingers should you attempt it.)
It's the address or location of the memory to which the pointer refers. However, it's best if you regard this as an opaque quantity - you are never interested in the actual value of the pointer, only that to which it refers.
How the address then relates to physical memory is a service that the system provides and actually varies across systems.
That's a virtual address of the anumber variable. Every program has its own memory space and that memory space is mapped to the physical memory. The mapping is done by the processor and the service data used for that is maintained by the operating system. So your program never knows where it is in the physical memory.
It's the address of the memory [1] location where your variable is stored. You shouldn't care about the exact value; you should just know that different variables have different addresses, that "contiguous memory" (e.g. arrays) has contiguous addresses, and so on.
By the way, to print the address stored in a pointer you should use the %p specifier in printf.
[1] Notice that I did not say "RAM", because in most modern OSes the "memory" your process sees is virtual memory, i.e. an abstraction of the actual RAM managed by the OS.
A lot of people told you, that the numeric value of a pointer will designate its address. This is one way how implementations can do it, but it is very important, what the C standard has to say about pointers:
The null pointer always has numeric value 0 when operated on in the C programming language. However, the actual memory containing the pointer may have any value, as long as this special, architecture-dependent value is consistently treated as null, and the implementation takes care that this value is seen as 0 by C source code. This is important to know, since 0 pointers may appear as a different value on certain architectures when inspected with a low-level memory debugger.
There's no requirement whatsoever that the values of the pointer are in any way related to actual addresses. They may be as well abstract identifiers, resolved by a LUT or similar.
If a pointer addresses an array, the rules of pointer arithmetic must hold, i.e. int array[128]; int *a, *b; a = &array[120]; b = &array[100]; a - b == 20; array + (a - b) == &array[20]; &array[120] == a
Pointer arithmetic between pointers to different objects is undefined and causes undefined behaviour.
The mapping pointer to integer must be reversible, i.e. if a number corresponds to a valid pointer, the conversion to this pointer must be valid. However (pointer) arithmetic on the numerical representation of pointers to different objects is undefined.
Yes, exactly that: it's the address of the apointer data in memory. Local variables such as anumber and apointer will be allocated on your program's stack, so it will refer to an address in the main() function's stack frame.
If you had allocated the memory with malloc() instead it would refer to a position in your program's heap space. If it was a fixed string it may refer to a location in your program's data or rodata (read-only data) segments instead.
In this case, &apointer represents the address in RAM of the pointer variable apointer.
apointer is the "address" of the variable anumber. In theory, it could be the actual physical place in RAM where the value of anumber is stored, but in reality (on most OS'es) it's likely to be a place in virtual memory. The result is the same though.
It's a memory address, most likely to the current location in your program's stack. Contrary to David's comment, there are times when you'll calculate pointer offsets, but this is only if you have some kind of array that you are processing.
It's the address of the pointer.
"anumber" takes up some space in RAM, the data at this spot contains the number 10.
"apointer" also takes up some space in RAM, the data at this spot contains the location of "anumber" in RAM.
So, say you have 32 bytes of ram, addresses 0..31
At e.g. position 16 you have 4 bytes, the "anumber" value 10
At e.g. position 20 you have 4 bytes, the "apointer" value 16, "anumber"'s position in RAM.
What you print is 20, apointer's position in RAM.
Note that this isn't really directly in RAM, it's in virtual address space which is mapped to RAM. For the purpose of understanding pointers you can completely ignore virtual address space.
It is not the address of the variable anumber that is printed, but the address of the pointer. Look carefully. Had it been just "apointer", we would have seen the address of the anumber variable.

Resources