Allocating from stack - data alignment issues in C

In another post, I asked a coding question and in the source code to that question, I declared some variables in the following manner:
char datablock[200];
char *pointer1=datablock;
char *pointer2=datablock+100;
However, someone mentioned that the code may be incompatible with 64-bit systems because 100 isn't divisible by 8. I can't remember exactly what the objection was.
But what I want to do is reserve a large chunk of memory for my program and make it run as fast as possible. As I recall, because of the way memory caching works, using data from the same block of memory is faster than using data from separate blocks. I also believe that using malloc means asking for slower memory.
So in code, This is an example of what I want to do. I want to allocate 40,000 bytes and give 4 pointers access to 10,000 bytes each:
char data[40000];
char *string0=data;
char *string1=data+10000;
char *string2=data+20000;
char *string3=data+30000;
This however is not what I want to do as I believe different sections of memory will be accessed:
char string0[10000];
char string1[10000];
char string2[10000];
char string3[10000];
I believe my idea is correct, but is the only thing I need to be concerned about that the offset value is a multiple of 8 on 64-bit systems and a multiple of 4 on 32-bit systems?
I don't want to pick wrong numbers and receive segmentation faults.

The alignment problems that may arise are related to storing something that has a specified alignment outside of its alignment rules.
This is not your case. You are not storing pointers in unaligned addresses, you are just storing addresses in aligned pointers.
Just to make it clear:
char *pointer2=datablock+100;
This declares a pointer, which may live on the stack or in a register depending on how the code is compiled. Either way, the storage for the pointer itself is allocated by the compiler, which will align it correctly for the underlying architecture.
The problem can arise when you do something like:
int* asInteger = (int*) (datablock+1);
*asInteger = 10;
In this situation you are trying to store a value which has an alignment requirement (int) in an address which could be unaligned to the requirement of int.
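In any case, if I remember correctly, the x86 architecture allows such an access to work, but it is slower. As far as the C language is concerned it is still undefined behavior, so it should not be relied on.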

Whether the system is 32 or 64 bit will not cause the code you mention to have a segmentation fault. In this example of pointer arithmetic :
char *pointer2=datablock+100;
you are saying : from address pointed to by datablock advance 100 times the size of a char. The number you advance doesn't have to be a multiple of any other number.
Regarding the last code snippet, it's likely that the 4 arrays will be consecutive on the stack, but that is not guaranteed.
You can verify and see what's happening by printing pointer addresses. E.g.
printf("string0 %p\n", (void *)string0);
printf("string1 %p\n", (void *)string1);
printf("string2 %p\n", (void *)string2);
printf("string3 %p\n", (void *)string3);

Related

How many bytes does an address take?

Basically my question is: how many bytes does a single address take?
I mean a char takes 1 byte on my platform and has 1 address. But an int takes 4 bytes. How many addresses does this int take? Does it still have only 1 address or does it have 4?
For example :
char c = 'A'; //Address at 0xdeadbeee
int i = 45846; //Address at 0xdeadbeef
int* iPtr = &i;
iPtr++; //Address at 0xdeadbef3 now
What happens with the addresses between 0xdeadbeef and 0xdeadbef3? Are they all reserved for i? What happens to i when I point to 0xdeadbeee (which should be exactly one address, or byte, below i) and change its value?
Edit:
for those who will still answer, I don't want to know how big an integer is. I want to know if it has also 4 addresses when taking 4 bytes of memory and what happens (if it has 4 addresses) when changing one of these addresses' value.
I hope it's clearer now.
The sizes of the built-in types (char, short, int, long) are implementation specific and platform specific. If we assume your int is 32 bits, then we can address some of your questions:
If i resides at 0xdeadbeef, then 0xdeadbeef, 0xdeadbef0, 0xdeadbef1, and 0xdeadbef2 byte addresses would all be used to store i. If you were to set iPtr to 0xdeadbeee and write a value, 0xdeadbeee and the following three addresses would then contain the value you wrote. If you then attempt to read c or i, you would find the value corrupted.
Some things to consider: not all architectures allow byte addressing. A char may be one byte on your system, but due to limitations, 4 bytes may be reserved. Likewise, you may not be able to read or write a pointer that points to non-aligned addresses. For example, a system that can only access memory on 32 bit boundaries could only access 0xdeadbeec or 0xdeadbef0.
How could you find this out? How about:
printf("%zu\n", sizeof(iPtr));
But, as @H2CO3 points out, you're really asking about pointer arithmetic. Read up more on that for more info.
Yes, the address one byte past &i, that is (char *)&i + 1, is the address of the second byte of i.
If you live on Memory Street 100 in 4 houses, you have four addresses. But those addresses address different buildings. Although, depending on your postal service the mail may not be delivered if it's not the canonical address (the same goes for memory access — depends on the platform).
You can find out how many bytes a pointer takes by using sizeof:
size_t int_ptr_size = sizeof(int*);
If you try to access data through a pointer that isn't properly aligned for the type, you invoke undefined behavior, so it's unpredictable what will happen. On some architectures, the program will crash with a Bus Error.
An address refers to the start of the data. So the size of the address doesn't change depending on the size of the data.
The actual size of the address will, however, depend on the platform. On many newer systems, that size will be 64 bits. But we can't say exactly without knowing your platform.
You can use sizeof() in your code to get the size of an address.
Everything depends on your hardware platform memory organization.
If the memory is organized in 4-byte cells, a variable whose length is less than or equal to 4 bytes (assuming correct alignment) is held in a single memory cell, so it is referred to by a single address value.

In C, how can I print out what is stored at a specific memory address?

Is it possible to print out what is stored at a specific memory address? For example I want to know what is stored at the address 0x7FFFFF0. How would I do this? I do not know what is stored at the address before hand ie. it could be an int or char or a null terminator.
Depending on your environment, you may be able to simply declare a pointer and dereference it:
volatile unsigned int *p = (volatile unsigned int *)0x7FFFFF0;
printf("%u\n", *p);
This operation requires your program to have permission to access that memory, of course. Your mileage may vary on different operating systems and environments.
You definitely won't be able to extract any type information at runtime without doing some more work to figure out what that memory represents, semantically speaking, and then extracting the bytes you care about in that context.

pointer typecasting

int main()
{
int *p,*q;
p=(int *)1000;
q=(int *)2000;
printf("%d:%d:%d",q,p,(q-p));
}
output
2000:1000:250
1. I cannot understand the p=(int *)1000; line. Does this mean that p is pointing to 1000 address location? What if I do *p=22: is this value stored at address 1000, overwriting the existing value? If it overwrites the value, what if another program is working with address 1000?
how q-p=250?
EDIT: I tried printf("%u:%u:%u",q,p,(q-p)); the output is the same
int main()
{
int *p;
int i=5;
p=&i;
printf("%u:%d",p,i);
return 0;
}
the output
3214158860:5
does this mean the addresses used by compiler are integers? there is no difference between normal integers and address integers?
does this mean that p is pointing to 1000 address location?
Yes.
what if I do *p=22
It's invoking undefined behavior - your program will most likely crash with a segfault.
Note that in modern OSes, addresses are virtual - you can't overwrite another process's address space like this, but you can attempt to write to an invalid memory location in your own process's address space.
how q-p=250?
Because pointer arithmetic works like this (in order to be compatible with array indexing). The difference of two pointers is the difference of their numeric values divided by sizeof(*ptr). Similarly, adding n to a pointer ptr of type T * results in the numeric value ptr + n * sizeof(T).
Read this on pointers.
does this mean the addresses used by compiler are integers?
That "used by compiler" part is not even necessary. Addresses are integers, it's just an abstraction in C that we have nice pointers to ease our life. If you were coding in assembly, you would just treat them as unsigned integers.
By the way, writing
printf("%u:%d", p, i);
is also undefined behavior - the %u format specifier expects an unsigned int, and not a pointer. To print a pointer, use %p:
printf("%p:%d", (void *)p, i);
Yes, with *p=22 you write to address 1000.
q-p is 250 because the size of int is 4, so it's (2000-1000)/4 = 250.
The meaning of p = (int *) 1000 is implementation-defined. But yes, in a typical implementation it will make p point to address 1000.
Doing *p = 22 afterwards will indeed attempt to store 22 at address 1000. However, in the general case this attempt will lead to undefined behavior, since you are not allowed to just write data to arbitrary memory locations. You have to allocate memory in one way or another in order to be able to use it. In your example you didn't make any effort to allocate anything at address 1000. This means that most likely your program will simply crash, because it attempted to write data to a memory region that was not properly allocated. (Additionally, on many platforms, pointers used to access data must point to properly aligned locations.)
Even if you somehow succeed in writing your 22 at address 1000, it does not mean that it will in any way affect "other programs". On some old platforms it would (DOS, for one example). But modern platforms implement independent virtual memory for each running program (process). This means that each running process has its own separate address 1000 and it cannot see the other program's address 1000.
Yes, p is pointing to virtual address 1000. If you use *p = 22;, you are likely to get a segmentation fault; quite often, the whole first 1024 bytes are invalid for reading or writing. It can't affect another program assuming you have virtual memory; each program has its own virtual address space.
The value of q - p is the number of units of sizeof(*p) or sizeof(*q) or sizeof(int) between the two addresses.
Casting arbitrary integers to pointers gives an implementation-defined result, and dereferencing such a pointer is undefined behavior. Anything can happen, including nothing, a segmentation fault or silently overwriting other processes' memory (unlikely in the modern virtual memory models).
But we used to use absolute addresses like this back in the real mode DOS days to access interrupt tables and BIOS variables :)
About q-p == 250: it's the result of the semantics of pointer arithmetic. Apparently sizeof(int) is 4 on your system. So when you add 1 to an int pointer, its address actually increases by 4, so that it points to the next int, not the next byte. This behavior helps with array access.
does this mean that p is pointing to 1000 address location?
Yes. But address 1000 may not belong to your process. In that case you are illegally accessing memory outside your process's address space, which may result in a segmentation fault.

How can pointer addresses have different lengths?

I just executed this code example:
int *i = (int*) malloc( sizeof(int) );
printf( "%p, %p\n", &i , i );
and this is what I got:
0x7fff38fed6a8, 0x10f7010
So I wonder why is the second address shorter than the first one?
i is on the stack, while the chunk of memory it points to is in the heap. On your platform these are two very different areas of memory, and it just so happens that the heap address is relatively low, numerically, so it has a lot of leading zeroes which are not shown, i.e.
&i = 0x7fff38fed6a8; // stack
i = 0x0000010f7010; // heap
i is an address on the heap, while &i is an address on the stack. The heap and stack occupy different address ranges, so you see different numbers.
The pointers aren't actually different lengths: the shorter one is preceded by zeroes. You are probably running this on a 64-bit machine, so each pointer has 64 bits (16 hex digits).
It is not shorter; the number is just smaller. The pointer &i is on the stack and i is on the heap.
There is no requirement that the %p formatting specifier pads the output to any fixed length. So you can't deduce any information about the in-memory "length" of an address from the printed representation. For instance, if you do this:
const void *nada = NULL;
printf("NULL is at %p\n", nada);
You might well see something like this:
NULL is at 0x0
Of course, this doesn't mean that the void * type is magically occupying only 4 bits when the value is NULL, it simply means that when the pointer value was converted to string, leading zeros were omitted.
In addition to the other answers:
Since you didn't include <stdlib.h> there is a good chance that the compiler incorrectly assumes that malloc returns int rather than void*. This is a possibly severe bug, which you have hidden away with the typecast of malloc's return value. Read this and this.
If int has a different width than a pointer on your specific system, as is the case on many 16-bit and 64-bit platforms, the address returned by malloc can be truncated and you will get incorrect results.

This code runs on Turbo C but not on the gcc compiler?

This code runs on Turbo C but not on the gcc compiler.
Error:syntax error before '*' token
#include<stdio.h>
int main()
{
char huge *near *far *ptr1;
char near *far *huge *ptr2;
char far *huge *near *ptr3;
printf("%d, %d, %d\n", sizeof(ptr1), sizeof(ptr2), sizeof(ptr3));
return 0;
}
Turbo C output is: 4, 4, 2
Can you explain the output on Turbo C?
The qualifiers huge, far and near, are non-standard. So, while they might work in Turbo C, you can't rely on them working in other compilers (such as gcc).
Borland's C/C++ compilers for DOS supported multiple memory models.
A memory model is a way to access code and data through pointers.
Since DOS runs in the so-called real mode of the CPU, in which memory is accessed through pairs of a segment value and an offset value (each normally being 16 bits long), a full memory address is naturally 4 bytes long.
But segment values need not be always specified explicitly. If everything a program needs to access is contained within one segment (a 64KB block of memory aligned on a 16-byte boundary), a single segment value is enough and once it's loaded into the CPU's segment registers (CS, SS, DS, ES), the program can access everything by only using 16-bit offsets. Btw, many .COM-type programs work exactly like that, they use only one segment.
So, there you have 2 possible ways to access memory, with an explicit segment value or without.
In these lines:
char huge *near *far *ptr1;
char near *far *huge *ptr2;
char far *huge *near *ptr3;
the modifiers far, huge and near specify the proximities of the objects that ptr1, ptr2 and ptr3 will point to. They tell the compiler that the *ptr1 and *ptr2 objects will be "far away" from the program's main/current segment(s), that is, they will be in some other segments, and therefore need to be accessed through 4-byte pointers, and the *ptr3 object is "near", within the program's own segment(s), and a 2-byte pointer is sufficient.
This explains the different pointer sizes.
Depending on the memory model that you choose for your program to compile in, function and data pointers will default to either near or far or huge and spare you from spelling them out explicitly, unless you need non-default pointers.
The program memory models are:
tiny: 1 segment for everything; near pointers
small: 1 code segment, 1 data/stack segment; near pointers
medium: multiple code segments, 1 data/stack segment; far code pointers, near data pointers
compact: 1 code segment, multiple data segments; near code pointers, far data pointers
large: multiple code and data segments; far pointers
huge: multiple code and data segments; huge pointers
Huge pointers don't have certain limitations of far pointers, but are slower to operate with.
You forgot to put a comma between the variables :).
Variables cannot have the same name if they are in the same scope.

Resources