How can pointer addresses have different lengths?

I just executed this code example:
int *i = (int*) malloc( sizeof(int) );
printf( "%p, %p\n", &i , i );
and this is what I got:
0x7fff38fed6a8, 0x10f7010
So I wonder why is the second address shorter than the first one?

i is on the stack, while the chunk of memory it points to is in the heap. On your platform these are two very different areas of memory, and it just so happens that the heap address is relatively low numerically, so it has a lot of leading zeroes which are not shown, i.e.
&i = 0x7fff38fed6a8; // stack
i = 0x0000010f7010; // heap

i is an address on the heap, while &i is an address on the stack. The heap and stack occupy different address ranges, so you see different numbers.
The pointers aren't actually different lengths: the shorter one is preceded by zeroes. You are probably running this on a 64-bit machine, so each pointer has 64 bits (16 hex digits).
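One way to make the equal width visible is to print an address zero-padded to the full pointer width. The following is only a sketch: format_ptr is a hypothetical helper name, and it assumes the platform provides uintptr_t and converts pointers to integers as plain flat addresses.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes p into buf as "0x" followed by a
   zero-padded hex number with two digits per byte of a pointer.
   Returns what snprintf returns (the length that was needed). */
int format_ptr(char *buf, size_t n, const void *p)
{
    int digits = (int)(2 * sizeof(void *));
    return snprintf(buf, n, "0x%0*" PRIxPTR, digits, (uintptr_t)p);
}
```

Formatting &i and i from the question this way would give two strings of identical length, with the heap address showing its leading zeroes.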

It is not shorter; the number is just smaller. The address &i is on the stack and the address i is on the heap.

There is no requirement that the %p formatting specifier pads the output to any fixed length. So you can't deduce any information about the in-memory "length" of an address from the printed representation. For instance, if you do this:
const void *nada = NULL;
printf("NULL is at %p\n", nada);
You might well see something like this:
NULL is at 0x0
Of course, this doesn't mean that the void * type is magically occupying only 4 bits when the value is NULL, it simply means that when the pointer value was converted to string, leading zeros were omitted.
UPDATE: Mis-read the question's code, I deleted the irrelevant text.

In addition to the other answers:
Since you didn't include <stdlib.h>, there is a good chance that the compiler incorrectly assumes that malloc returns int rather than void*. This is a potentially severe bug, which you have hidden by casting malloc's return value.
If int has a different width than a pointer on your specific system, as on many 16-bit or 64-bit platforms, you will get incorrect results.
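For comparison, here is a minimal sketch of the allocation with the header included and no cast, so that a missing declaration is not hidden from the compiler. The helper name new_int is hypothetical.

```c
#include <stdlib.h>   /* declares malloc and free */

/* Hypothetical helper: allocates one int on the heap and stores
   value in it. Returns NULL if the allocation fails. */
int *new_int(int value)
{
    int *p = malloc(sizeof *p);   /* no cast: void * converts implicitly */
    if (p != NULL)
        *p = value;
    return p;
}
```

The caller is responsible for passing the returned pointer to free when it is done with it.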

Related

Memory allocation and unused byte in basic C program

I have a question regarding memory allocation on the basic C program.
#include <stdio.h>
int main()
{
int iarray[3];
char carray[3];
printf("%p\n", &iarray); // b8
printf("%p\n", &iarray+1); // c4
printf("%p\n", &carray); // c5
return 0;
}
Given the code above, you can see that &iarray+1 and &carray differ by one byte. I'm not sure what that byte is for or what is in it. Why does the compiler leave an unused byte between the two arrays?
I thought maybe it is used to record the array size, but as I understand it sizeof is a compile-time operator that knows the size without any memory being allocated, so there is no use for storing the array size.
Note: The output can be seen in the comments of each printf.
b8
c4
c5
Playground: https://onlinegdb.com/cTdzccpDvI
Thanks.

Compilers are free to arrange variables in memory any way that they see fit. Typically, each variable is placed at an address that is a multiple of its alignment requirement (for an array, that of its element type); for example, a 4-byte int or an int array will start at an address which is a multiple of 4.
In this case, you have an int array starting at an address which is a multiple of 4, followed by an unused byte, followed by a char array of size 3. In theory, an int or long, had one been defined, could immediately follow the char array in memory, since the next available address (c8) is a multiple of 8.

From your output it looks like the stack is arranged like this for these local variables:
b8-bb: 1st integer of iarray
bc-bf: 2nd integer of iarray
c0-c3: 3rd integer of iarray
c4: padding probably, only compiler knows
c5-c7: carray
Now when you do &iarray+1, you are taking the address of an int[3] array and adding 1 in units of that array type. In other words, you are getting the address of the next int[3] array, which indeed would be at c4 (but isn't because there's just one int[3]).
This code is actually valid. You must not dereference this pointer, but because it points exactly +1 past the iarray, having the pointer and printing its value is legal (in other words, not Undefined Behavior, like &iarray+2 would be).
If you also print this:
printf("%p\n", iarray+1);
You should get the result bc, because now you take a pointer to int (iarray is treated as a pointer to its first int) and add 1 to that, getting the address of the next int.
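The difference between the two kinds of +1 can be measured in bytes by casting to char *. This is a small sketch with hypothetical function names; it assumes nothing about the size of int beyond what sizeof reports.

```c
#include <stddef.h>

/* Byte distance covered by adding 1 to a pointer to the whole array. */
size_t step_of_array_pointer(void)
{
    int iarray[3];
    int (*whole)[3] = &iarray;            /* type: pointer to int[3] */
    return (size_t)((char *)(whole + 1) - (char *)whole);
}

/* Byte distance covered by adding 1 to a pointer to the first element. */
size_t step_of_element_pointer(void)
{
    int iarray[3];
    int *first = iarray;                  /* array decays to pointer to int */
    return (size_t)((char *)(first + 1) - (char *)first);
}
```

The first function returns 3 * sizeof(int) (the b8 to c4 step in the question when int is 4 bytes) and the second returns sizeof(int) (the b8 to bc step).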

The reason this happens is the particular compiler you are using allocates memory from high addresses to low in this particular situation.
The compiler analyzes the main routine and sees that it needs 3 int for iarray and 3 char for carray. For whatever reason, it decides to work on carray first.
The compiler starts with a planned stack frame that is required to have 16-byte alignment at certain points. Additional data is needed on the stack with the result that the point where the compiler starts putting local variables is at an address that is 8 modulo 16 (so its hexadecimal representation ends in 8). That is, from some address like 7FFC565E90A8 and up, memory is used for managing the stack frame. The first bytes for local objects will be at 7FFC565E90A7, 7FFC565E90A6, 7FFC565E90A5, 7FFC565E90A4, and so on.
The compiler takes the first three bytes of that space for carray. Recall we are working from high addresses to low addresses. (For historical reasons, that is the direction that stacks grow; some high address is assigned as the starting point, and new data is put in lower addresses.) So carray is put at address 7FFC565E90A5. It fills the bytes at 7FFC565E90A5, 7FFC565E90A6, and 7FFC565E90A7.
Then the compiler needs to assign 12 bytes for iarray. The next available 12 bytes are from 7FFC565E9099 to 7FFC565E90A4. However, the int elements in iarray require 4-byte alignment, so they cannot start at 7FFC565E9099. Therefore, the compiler adjusts to have them start at 7FFC565E9098. Then iarray fills bytes from 7FFC565E9098 to 7FFC565E90A3, and 7FFC565E90A4 is unused.
Note that in other situations, the compiler may arrange local objects in different ways. When you have multiple objects with different alignments, the compiler may choose to cluster all objects with the same alignment to reduce the number of places it needs to insert padding. A compiler could also choose to allocate memory for objects in alphabetical order by their names. Or it could do it in the order it happens to store them in its hash table. Or some combination of these things, such as clustering all objects by alignment requirement but then sorting by name within each cluster.
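The adjustment from 7FFC565E9099 to 7FFC565E9098 described above is a round-down to the nearest multiple of the alignment. A sketch of that arithmetic follows; align_down is a hypothetical name, and this illustrates the calculation only, not what any particular compiler does internally.

```c
#include <stdint.h>

/* Rounds addr down to the nearest multiple of align.
   align must be a power of two (true of alignments on
   mainstream platforms). */
uint64_t align_down(uint64_t addr, uint64_t align)
{
    return addr & ~(align - 1);
}
```

With addr = 0x7FFC565E9099 and align = 4, the result is 0x7FFC565E9098, leaving the byte at 0x7FFC565E90A4 unused once the 12 bytes of iarray are placed.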

This behavior is purely (compiler) implementation defined. What probably happens is this:
When a function (main() in this case) that has local variables is invoked, memory for those variables is allocated on the stack. In this case, 15 bytes are needed, but it is likely that 4-byte alignment is required for the stack allocation, so that 16 bytes are allocated.
It is also likely that the int-array must be 4-byte aligned. Hence the address of the int array is a multiple of 4. The char-array does not have any alignment requirements so it can be placed anywhere in the 4 remaining bytes.
So in short, the additional byte is unused, but allocated due to alignment.
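The 15-to-16 rounding in this explanation can be written out as a short calculation. This is a sketch under the answer's assumption of 4-byte alignment; align_up is a hypothetical helper name.

```c
#include <stddef.h>

/* Rounds size up to the nearest multiple of align.
   align must be a power of two. */
size_t align_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}
```

Here align_up(15, 4) yields 16: 12 bytes for iarray, 3 for carray, and 1 byte of padding.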

The maximum memory location my C stack pointer can points to during initialization

Consider the following code in a linux machine with 32 bit OS:
void foo(int *pointer){
int *buf;
int *buf1 = pointer;
....
}
What is the maximum memory address buf and buf1 can point to using the above declaration (OS allocates the address)? E.g., can it point to address 2^32-200?
The reason I ask is that I may do pointer arithmetic on these buffers, and I am concerned that this pointer arithmetic can wrap around. E.g., assume len is smaller than the size of buf and buf1. Assume some_pointer points to the end of the buffer.
unsigned char len = 255;
if(buf + len > some_pointer)
//do something
if(buf1 + len > some_pointer)
//do something

The standard says that
For two elements of an array, the address of the element with the lower subscript will always compare less than the address of the element with the higher subscript.
Comparing pointers to two elements that are not part of the same aggregate (array or struct) with a relational operator is undefined behavior.
So if buf + len and some_pointer point to elements in the same array as buf (or one past the end of the array), you don't have to worry about wraparound. If one of them doesn't, you have undefined behavior anyway.
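If len is not known to fit, one option is to compare it against the remaining space instead of forming buf + len at all, since computing a pointer more than one past the end of the array is itself undefined. A minimal sketch, with a hypothetical function name and assuming end points into or one past the same array as buf:

```c
#include <stddef.h>

/* Returns 1 if len bytes starting at buf lie within [buf, end), else 0.
   buf and end must point into (or one past) the same array,
   with buf <= end. */
int fits_in_buffer(const unsigned char *buf, const unsigned char *end,
                   size_t len)
{
    return len <= (size_t)(end - buf);
}
```

The subtraction end - buf is always well defined under those preconditions, so no wraparound can occur regardless of where the buffer sits in the address space.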

You shouldn't ever rely on the addresses provided by the allocator falling within a specific range. Even if you could show that on a particular Linux setup, malloc can only generate addresses between X and Y, there is no guarantee--it could change with any future update. The only guarantee from malloc is that successful allocations won't start at NULL (address 0 in code, for Linux and most other typical platforms).

Yes, for a 32 bit or 64 bit OS. Whether there's anything usable there, or if you'll get an access violation trying to dereference the pointer, is up to the compiler and OS.

The OS can map pages of physical memory anywhere in the address space. The addresses you see don’t correspond to physical RAM chips at all. The OS might, for example, have virtual memory or copy-on-write pages.

allocating from stack - data alignment issues in C

In another post, I asked a coding question and in the source code to that question, I declared some variables in the following manner:
char datablock[200];
char *pointer1=datablock;
char *pointer2=datablock+100;
However someone mentioned that the code may be incompatible with 64-bit systems because 100 isn't divisible by 8? I can't remember what it was.
But what I want to do is reserve a huge chunk of memory for use with my program and make it execute as fast as possible. I remember that, because of the way memory caching works, using data from the same block of memory is faster than using data from separate blocks. Using malloc is also asking for slower memory.
So in code, This is an example of what I want to do. I want to allocate 40,000 bytes and give 4 pointers access to 10,000 bytes each:
char data[40000];
char *string0=data;
char *string1=data+10000;
char *string2=data+20000;
char *string3=data+30000;
This however is not what I want to do as I believe different sections of memory will be accessed:
char string0[10000];
char string1[10000];
char string2[10000];
char string3[10000];
I believe my idea is correct, but is the only thing I need to be concerned about that the offset value is a multiple of 8 on 64-bit systems and a multiple of 4 on 32-bit systems?
I don't want to pick wrong numbers and receive segmentation faults.

The alignment problems that may arise are related to storing something that has a specified alignment outside of its alignment rules.
This is not your case. You are not storing pointers in unaligned addresses, you are just storing addresses in aligned pointers.
Just to make it clear:
char *pointer2=datablock+100;
This declares a pointer which could be on the stack or in a register, depending on how the code is compiled, but allocating space for the pointer itself is left to the compiler, which will do it correctly for the underlying architecture.
The problem can arise when you do something like:
int* asInteger = (int*) (datablock+1);
*asInteger = 10; /* writes an int through a possibly misaligned pointer */
In this situation you are trying to store a value which has an alignment requirement (int) in an address which could be unaligned to the requirement of int.
In any case, if I remember correctly, x86 architecture allows it to work but it is slower.
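A portable way to put an int at an arbitrary byte offset is to copy its bytes with memcpy instead of dereferencing a cast pointer. The sketch below uses hypothetical helper names; it sidesteps the alignment requirement because memcpy works on bytes.

```c
#include <string.h>

/* Stores value at dst, which may have any alignment. */
void store_int(char *dst, int value)
{
    memcpy(dst, &value, sizeof value);
}

/* Reads back an int from src, which may have any alignment. */
int load_int(const char *src)
{
    int value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

For example, store_int(datablock + 1, 10) followed by load_int(datablock + 1) gives back 10 without ever creating a misaligned int pointer.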

Whether the system is 32 or 64 bit will not cause the code you mention to have a segmentation fault. In this example of pointer arithmetic :
char *pointer2=datablock+100;
you are saying : from address pointed to by datablock advance 100 times the size of a char. The number you advance doesn't have to be a multiple of any other number.
Regarding the last code snippet, it's likely that the 4 sections of memory will be consecutive on the stack, but that is not guaranteed.
You can verify and see what's happening by printing pointer addresses. E.g.
printf("string0 %p\n", string0);
printf("string1 %p\n", string1);
printf("string2 %p\n", string2);
printf("string3 %p\n", string3);

How much bytes does an address take?

Basically my question is: how many bytes does a single address take / have?
I mean a char takes 1 byte on my platform and has 1 address. But an int takes 4 bytes. How many addresses does this int take? Does it still have only 1 address or does it have 4?
For example :
char c = 'A'; //Address at 0xdeadbeee
int i = 45846; //Address at 0xdeadbeef
int* iPtr = &i;
iPtr++; //Address at 0xdeadbef3 now
What happens with the addresses between 0xdeadbeef and 0xdeadbef3? Are they all reserved for i? What happens to i when I point to 0xdeadbeee (which should be exactly one address, or byte, below i) and change its value?
Edit:
for those who will still answer, I don't want to know how big an integer is. I want to know if it has also 4 addresses when taking 4 bytes of memory and what happens (if it has 4 addresses) when changing one of these addresses' value.
I hope it's clearer now.

The sizes of the built-in types (char, short, int, long) are implementation specific and platform specific. If we assume your int is 32 bits, then we can address some of your questions:
If i resides at 0xdeadbeef, then 0xdeadbeef, 0xdeadbef0, 0xdeadbef1, and 0xdeadbef2 byte addresses would all be used to store i. If you were to set iPtr to 0xdeadbeee and write a value, 0xdeadbeee and the following three addresses would then contain the value you wrote. If you then attempt to read c or i, you would find the value corrupted.
Some things to consider: not all architectures allow byte addressing. A char may be one byte on your system, but due to limitations, 4 bytes may be reserved. Likewise, you may not be able to read or write a pointer that points to non-aligned addresses. For example, a system that can only access memory on 32 bit boundaries could only access 0xdeadbeec or 0xdeadbef0.
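On a byte-addressable platform, each byte of an int can be reached through an unsigned char pointer, which is a well-defined way to see that the int spans several addresses. A minimal sketch with a hypothetical function name follows; which byte holds which part of the value depends on the platform's byte order.

```c
#include <stddef.h>

/* Returns a copy of value with the byte at offset index replaced by b.
   index must be less than sizeof(int). */
int with_byte_replaced(int value, size_t index, unsigned char b)
{
    unsigned char *bytes = (unsigned char *)&value;  /* address of byte 0 */
    bytes[index] = b;          /* same as *(bytes + index) = b */
    return value;
}
```

Changing any single one of those byte addresses changes the int as a whole, which is the corruption the answer describes.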

How could you find this out? How about:
printf("%zu\n", sizeof(iPtr));
But, as @H2CO3 points out, you're really asking about pointer arithmetic. Read up more on that for more info.

Yes, the address one byte past &i is the address of the second byte of i.
If you live on Memory Street 100 in 4 houses, you have four addresses. But those addresses address different buildings. Although, depending on your postal service the mail may not be delivered if it's not the canonical address (the same goes for memory access — depends on the platform).

You can find out how many bytes a pointer takes by using sizeof:
size_t int_ptr_size = sizeof(int*);
If you try to access data through a pointer that isn't properly aligned for the type, you invoke undefined behavior, so it's unpredictable what will happen. On some architectures, the program will crash with a Bus Error.

An address refers to the start of the data. So the size of the address doesn't change depending on the size of the data.
The actual size of the address will, however, depend on the platform. On many newer systems, that size will be 64 bits. But we can't say exactly without knowing your platform.
You can use sizeof() in your code to get the size of an address.

Everything depends on your hardware platform memory organization.
If the memory is organized in 4-byte cells, a variable whose length is less than or equal to 4 bytes (assuming correct alignment) is held in a single memory cell, so it is referred to by a single address value.

pointer typecasting

int main()
{
int *p,*q;
p=(int *)1000;
q=(int *)2000;
printf("%d:%d:%d",q,p,(q-p));
}
output
2000:1000:250
1. I cannot understand the p=(int *)1000; line. Does this mean that p is pointing to 1000 address location? What if I do *p=22: is this value stored at address 1000, overwriting the existing value? If it overwrites the value, what if another program is working with the 1000 address space?
2. how q-p=250?
EDIT: I tried printf("%u:%u:%u",q,p,(q-p)); the output is the same
int main()
{
int *p;
int i=5;
p=&i;
printf("%u:%d",p,i);
return 0;
}
the output
3214158860:5
does this mean the addresses used by compiler are integers? there is no difference between normal integers and address integers?

does this mean that p is pointing to 1000 address location?
Yes.
what if I do *p=22
It's invoking undefined behavior - your program will most likely crash with a segfault.
Note that in modern OSes, addresses are virtual - you can't overwrite another process's address space like this, but you can attempt to write to an invalid memory location in your own process's address space.
how q-p=250?
Because pointer arithmetic works like this (in order to be compatible with array indexing). The difference of two pointers is the difference of their numeric values divided by sizeof(*ptr). Similarly, adding n to a pointer ptr of type T * results in the numeric value ptr + n * sizeof(T).
does this mean the addresses used by compiler are integers?
That "used by compiler" part is not even necessary. Addresses are integers, it's just an abstraction in C that we have nice pointers to ease our life. If you were coding in assembly, you would just treat them as unsigned integers.
By the way, writing
printf("%u:%d", p, i);
is also undefined behavior - the %u format specifier expects an unsigned int, and not a pointer. To print a pointer, use %p:
printf("%p:%d", (void *)p, i);

Yes, with *p=22 you write to 1000 address.
q-p is 250 because the size of int is 4, so it's (2000-1000)/4 = 250

The meaning of p = (int *) 1000 is implementation-defined. But yes, in a typical implementation it will make p point to address 1000.
Doing *p = 22 afterwards will indeed attempt to store 22 at address 1000. However, in general case this attempt will lead to undefined behavior, since you are not allowed to just write data to arbitrary memory locations. You have to allocate memory in one way or another in order to be able to use it. In your example you didn't make any effort to allocate anything at address 1000. This means that most likely your program will simply crash, because it attempted to write data to a memory region that was not properly allocated. (Additionally, on many platforms in order to access data through pointers these pointers must point to properly aligned locations.)
Even if you somehow succeed in writing your 22 at address 1000, it does not mean that it will in any way affect "other programs". On some old platforms it would (like DOS, for one example). But modern platforms implement independent virtual memory for each running program (process). This means that each running process has its own separate address 1000 and it cannot see the other program's address 1000.

Yes, p is pointing to virtual address 1000. If you use *p = 22;, you are likely to get a segmentation fault; quite often, the whole first 1024 bytes are invalid for reading or writing. It can't affect another program assuming you have virtual memory; each program has its own virtual address space.
The value of q - p is the number of units of sizeof(*p) or sizeof(*q) or sizeof(int) between the two addresses.

Dereferencing a pointer made by casting an arbitrary integer is undefined behavior (the conversion itself is implementation-defined). Anything can happen, including nothing, a segmentation fault or silently overwriting other processes' memory (unlikely in the modern virtual memory models).
But we used to use absolute addresses like this back in the real mode DOS days to access interrupt tables and BIOS variables :)
About q-p == 250, it's the result of the semantics of pointer arithmetic. Apparently sizeof(int) is 4 on your system. So when you add 1 to an int pointer, it actually gets incremented by 4, so it points to the next int, not the next byte. This behavior helps with array access.

does this mean that p is pointing to 1000 address location?
Yes. But address 1000 may belong to some other process's address space. In that case, you are illegally accessing another process's memory, which may result in a segmentation fault.
