gcc compiler is not allocating memory contigously - c

I writing a simple program, after declaring a variables I am checking the addresses of variables but memory is not allocated contiguously there are gaps in between.
here is my program. why is leaving gaps i am not understanding
#include < stdio.h >
#include < stdint.h >
int main()
{
char char_one,char_two;
int a = 5,b = 7,*ptr,*ptr_one;
static int *sum_ptr;
printf("address of a %u\n",&a);
printf("address of variable b %u\n",&b);
printf("address of ptr variable %u\n",&ptr);
printf("address of ptr_one variable %u\n",&ptr_one);
printf("address of char_one var %u\n",&char_one);
printf("address of char_two var %u\n",&char_two);
return 0;
}
output:
address of a 2636128020
address of variable b 2636128024
address of ptr variable 2636128000
address of ptr_one variable 2636128008
address of char_one var 2636128030
address of char_two var 2636128031

The C standard does not require memory for variables to be allocated contiguously. In fact, memory might not even be allocated at all if the compiler decides it can optimize by keeping a value in a register instead.
If you declare a struct, the contents of the struct will be ordered the way you declare them, but you still might need to consider how the data is aligned within the struct - for example, ints are aligned on 4-byte boundaries in many architectures. So if you have:
struct foo
{
char a;
int b;
}
a is guaranteed to come before b in memory, but they will still be padded with extra bytes between them to keep the correct alignment (so it takes 8 bytes to store your struct, even though it only "really" needs 5).
Here's a good resource for how structure alignment works:
The lost art of C structure packing

It is not necessary that compiler will allocate memory contiguously (excluding array, In case of structures may be padding there). It is free to allocate memory from where it finds a free location.

This is by design and on purpose, even on a freshly booted machine you won't get contiguous malloc()s.
Read here: http://en.wikipedia.org/wiki/Address_space_layout_randomization
Upon request: address space layout randomization is a techique to make sure memory is allocated in a more or less unpredictable way to ensure attacks relying on a fixed address layout won't succeed. This makes it (more) difficult for an attacker to exploit known deficiencies (or even detect them first).

Related

Dereferencing pointer to arbitrary address gives Segmentation fault

I have written a simple C code for pointers. As per my understanding, Pointer is a variable which holds the address of another variable.
Eg :
int x = 25; // address - 1024
int *ptr = &x;
printf("%d", *ptr); // *ptr will give value at address of x i.e, 25 at 1024 address.
However when I try below code I'm getting segmentation fault
#include "stdio.h"
int main()
{
int *ptr = 25;
printf("%d", *ptr);
return 0;
}
What's wrong in this? Why can't a pointer variable return the value at address 25? Shouldn't I be able to read the bytes at that address?
Unless you're running on an embedded system with specific known memory locations, you can't assign an arbitrary value to a pointer an expect to be able to dereference it successfully.
Section 6.5.3.2p4 of the C standard states the following regarding the indirection operator *:
The unary
* operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an
object, the result is an lvalue designating the object. If
the operand has type "pointer to type", the result has
type "type". If an invalid value has been assigned to
the pointer, the behavior of the unary
* operator is undefined.
As mentioned in the passage above, the C standard only allows for pointers to point to known objects or to dynamically allocated memory (or NULL), not arbitrary memory locations. Some implementations may allow that in specific situations, but not in general.
Although the behavior of your program is undefined according to the C standard, your code is actually correct in the sense that it is doing exactly what you intend. It is attempting to read from memory address 25 and print the value at that address.
However, in most modern operating systems, such as Windows and Linux, programs use virtual memory and not physical memory. Therefore, you are most likely attempting to access a virtual memory address that is not mapped to a physical memory address. Accessing an unmapped memory location is illegal and causes a segmentation fault.
Since the memory address 0 (which is written in C as NULL) is normally reserved to specify an invalid memory address, most modern operating systems never map the first few kilobytes of virtual memory addresses to physical memory. That way, a segmentation fault will occur when an invalid NULL pointer is dereferenced (which is good, because it makes it easier to detect bugs).
For this reason, you can be reasonably certain that also the address 25 (which is very close to address 0) is never mapped to physical memory and will therefore cause a segmentation fault if you attempt to access that address.
However, most other addresses in your program's virtual memory address space will most likely have the same problem. Since the operating system tries to save physical memory if possible, it will not map more virtual memory address space to physical memory than necessary. Therefore, trying to guess valid memory addresses will fail, most of the time.
If you want to explore the virtual address space of your process to find memory addresses that you can read without a segmentation fault occuring, you can use the appropriate API supplied by your operating system. On Windows, you can use the function VirtualQuery. On Linux, you can read the pseudo-filesystem /proc/self/maps. The ISO C standard itself does not provide any way of determining the layout of your virtual memory address space, as this is operating system specific.
If you want to explore the virtual memory address layout of other running processes, then you can use the VirtualQueryEx function on Windows and read /proc/[pid]/maps on Linux. However, since other processes have a separate virtual memory address space, you can't access their memory directly, but must use the ReadProcessMemory and WriteProcessMemory functions on Windows and use /proc/[pid]/mem on Linux.
Disclaimer: Of course, I don't recommend messing around with the memory of other processes, unless you know exactly what you are doing.
However, as a programmer, you normally don't want to explore the virtual memory address space. Instead, you normally work with memory that has been assigned to your program by the operating system. If you want the operating system to give you some memory to play around with, which you are allowed to read from and write to at will (i.e. without segmentation faults), then you can just declare a large array of chars (bytes) as a global variable, for example char buffer[1024];. Be careful with declaring larger arrays as local variables, as this may cause a stack overflow. Alternatively, you can ask the operating system for dynamically allocated memory, for example using the malloc function.
You should consider all warnings that the compiler issues.
This statement
int *ptr = 25;
is incorrect. You are trying to assign an integer to a pointer as an address of memory. Thus in this statement
printf("%d", *ptr);
there is an attempt to access memory at address 25 that does not belong to your program.
What you mean is the following
#include "stdio.h"
int main( void )
{
int x = 25;
int *ptr = &x;
printf("%d", *ptr);
return 0;
}
Or
#include "stdio.h"
#include <stdlib.h>
int main( void )
{
int *ptr = malloc( sizeof( int ) );
*ptr = 25;
printf("%d", *ptr);
free( ptr );
return 0;
}

Is memory allocated when the variable is not used in c

#include<stdio.h>
int main()
{
int a,b;
float e;
char f;
printf("int &a = %u\n",&a);
printf("int &b = %u\n",&b);
printf("float &e = %u\n",&e);
printf("char &f = %u\n",&f);
}
The Output is
int &a = 2293324
int &b = 2293320
float &e = 2293316
char &f = 2293315
But when i use this code and replace the printf for float--
#include<stdio.h>
int main()
{
int a,b;
float e;
char f;
printf("int &a = %u\n",&a);
printf("int &b = %u\n",&b);
printf("char &f = %u\n",&f);
}
Then the Output is
int &a = 2293324
int &b = 2293320
char &f = 2293319
here address is not provided to float, but it is declared on top.
My questions are
Is memory not allocated to variables not used in program?
Why addresses allocated in decreasing order. ex- it's going from 2293324 to 2293320?
1) Is memory not allocated to variables not used in program?
Yes that can happen, the compiler is allowed to optimize it out.
2) Why addresses allocated in decreasing order. ex- it's going from 2293324 to 2293320?
That is usual for most local storage implementations, that they use the CPU supported stack pointer going from stack top to stack bottom. All those local variables will be allocated at the stack most probably.
1) Is memory not allocated to variables not used in program?
It's an allowed optimization; if an unused variable doesn't affect the program's observable behavior, a compiler may just discard it completely. Note that most modern compilers will warn you about unused variables (so you can either remove them from the code or do something with them).
2) Why addresses allocated in decreasing order. ex- it's going from 2293324 to 2293320?
The compiler is not required to allocate storage for separate objects in any particular order, so don't assume that your variables will be allocated in the order they were declared. Also, remember that on x86 and some other systems, the stack grows "downwards" towards decreasing addresses. Remember the top of any stack is simply the location where something was most recently pushed - it has nothing to do with relative address values.
While not specifically required by the standard, local variables are universally located on the program stack.
When you enter a function, one of the first thing done is to decrement the stack pointer to provide space for the local variables.
SUBL #SOMETHING, SP
Where SOMETHING is the amount of space required and SP is the stack pointer register.. In your first example, SOMETHING is probably 13. Then the address of:
f is 0(SP)
e is 1(sp)
b is 5(sp)
a is 9(sp)
I am assuming your compiler did not align the stack pointer. Often they do giving something more like:
f is 3(SP)
e is 4(sp)
b is 8(sp)
a is 12(sp)
And SOMETHING would be rounded up to 16 on a 32-bit system.
You might want to generate an assembly listing using your compiler to see what is going on underneath.
Is memory not allocated to variables not used in program?
Note that for local variable memory is not really allocated. A variable is temporarily bound to a location on the program stack (stack is not required by the standard but is how it is done in most cases). That is why the variable's initial value is undefined. It could have been bound to something else previously.
The compiler does not need to reserve space for variables that are not used. They can be optimized away. Usually, there are compiler settings to instruct not to do this for debugging.
Why addresses allocated in decreasing order. ex- it's going from 2293324 to 2293320?
Program stacks generally grow downward. Starting ye olde days, the program would be at the bottom of the address space, the heap above that and the stack at the opposite end.
The heap would grow towards higher addresses. The stack would grow towards the heap (lower addresses).
While the address spaces can be more complicated than that these days, the downward growth of stacks has stayed.
There is no particular requirement that the compiler map the variables to the stack in descending order but there's a 50/50 chance it will do it that way.

The maximum memory location my C stack pointer can points to during initialization

Consider the following code in a linux machine with 32 bit OS:
void foo(int *pointer){
int *buf;
int *buf1 = pointer;
....
}
What is the maximum memory address buf and buf1 can point to using the above declaration (OS allocates the address)? E.g., can it point to address 2^32-200?
The reason I asked is that I may do pointer arithmetic on these buffers and I am concern that this pointer arithmetic can wrap around. E.g., assume the len is smaller than the size of buf and buf1. Assume some_pointer points to the end of the buffer.
unsigned char len = 255;
if(buf + len > some_pointer)
//do something
if(buf1 + len > some_pointer)
//do something
The standard says that
For two elements of an array, the address of the element with the lower subscript will always compare less to the address of the object with the higher subscript.
Comparing any two elements that are not part of the same aggregate (array or struct) is undefined behavior.
So if buf + len and some_pointer point to elments in the same array as buf (or one past the array), you don't have to worry about wrap arround. If one of them doesn't, you have undefined behavior anyway.
You shouldn't ever rely on the addresses provided by the allocator falling within a specific range. Even if you could show that on a particular Linux setup, malloc can only generate addresses between X and Y, there is no guarantee--it could change with any future update. The only guarantee from malloc is that successful allocations won't start at NULL (address 0 in code, for Linux and most other typical platforms).
Yes, for a 32 bit or 64 bit OS. Whether there's anything usable there, or if you'll get an access violation trying to dereference the pointer, is up to the compiler and OS.
The OS can map pages of physical memory anywhere in the address space. The addresses you see don’t correspond to physical RAM chips at all. The OS might, for example, have virtual memory or copy-on-write pages.

When do I need dynamic memory? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Malloc or normal array definition?
We learn that there is dynamic memory in C and dynamic variables:
#include <stdio.h>
int a = 17;
int main(void)
{
int b = 18; //automatic stack memory
int * c;
c = malloc( sizeof( int ) ); //dynamic heap memory
*c = 19;
printf("a = %d at address %x\n", a, &a);
printf("b = %d at address %x\n", b, &b);
printf("c = %d at address %x\n", *c, c);
free(c);
system("PAUSE");
return 0;
}
How do I know which type of memory to use? When do I ned one or the other?
Use dynamic in the following situations:
When you need a lot of memory. Typical stack size is 1 MB, so anything bigger than 50-100KB should better be dynamically allocated, or you're risking crash. Some platforms can have this limit even lower.
When the memory must live after the function returns. Stack memory gets destroyed when function ends, dynamic memory is freed when you want.
When you're building a structure (like array, or graph) of size that is unknown (i.e. may get big), dynamically changes or is too hard to precalculate. Dynamic allocation allows your code to naturally request memory piece by piece at any moment and only when you need it. It is not possible to repeatedly request more and more stack space in a for loop.
Prefer stack allocation otherwise. It is faster and can not leak.
You use dynamic memory when the size of your allocation is not known in advance only on runtime.
For example you ask a user to input names (lets say up to 10 names) and store them in a string array. Since you do not know how much names the user will provide (only on runtime that is) you will have to allocate the array only after you know how much to allocate so you will use dynamic allocation.
You can of course use an array of fixed sized 10 but for larger amounts this will be wasteful
Use dynamic memory allocation, if you don't know exactly how much memory your program will need to allocate at compile-time.
int a[n] for example will limit your array size to n. Also, it allocated n x 4 bytes of memory whether you use it or not. This is allocated on the stack, and the variable n must be known at compile time.
int *a = (int *)malloc(n * sizeof (int)) on the other hand allocated at runtime, on the heap, and the n needs to be known only at runtime, not necessarily at compile-time.
This also ensures you allocate exactly as much memory as you really need. However, as you allocated it at runtime, the cleaning up has to be done by you using free.
You should use dynamic memory when:
If you want your object to persist beyond the scope in which it was created.
Usually, stack sizes are limited and hence if your object occupies a lot of memory then you might run out of stack space in such cases one would usually go for dynamic memory allocation.
Note that c99 standard introduces Variable Length Arrays(VLA) in C so you need not use dynamic memory allocation just because you do not know the array dimensions before hand(unless ofcourse #2 mentioned above is the case)
It is best to avoid dynamic memory allocations as much as you can because it means explicitly managing the memory instead of the automatic mechanism provided by the language.Explicit memory management means that you are prone to make more errors, which might lead to catastrophic effects.
Having said that dynamic memory allocations cannot be avoided always and must be used when the use is imperative(two cases mentioned above).
If you can program without dynamic allocation don't use it!
But a day you will be blocked, and the only way to unblock you will be to use dynamic allocation then now you can use it
Als made an interesting point that you should allocate memory from the heap if your object needs to persist beyond the scope in which it was created. In the code above, you don't need to allocate memory from heap at all. You can rewrite it like this:
#include <stdio.h>
int a = 17;
int main(void)
{
int b = 18; //automatic stack memory
int c[1]; // allocating stack memory. sizeof(int) * 1
c[0] = 19;
printf("a = %d at address %x\n", a, &a);
printf("b = %d at address %x\n", b, &b);
printf("c = %d at address %x\n", c[0], c);
system("PAUSE");
return 0;
}
In fact, as part of the C99 standard (Variable-length array), you can use the [] operator to allocate dynamic space for an array on the stack just as you would normally do to create an array. You don't even need to know the size of the array at compilation time. The compiler will just adjust the esp register (for x86 machines) based on the requested allocation space and you're good to go.

C Language: Why do dynamically-allocated objects return a pointer, while statically-allocated objects give you a choice?

This is actually a much more concise, much more clear question than the one I had asked here before(for any who cares): C Language: Why does malloc() return a pointer, and not the value? (Sorry for those who initially think I'm spamming... I hope it's not construed as the same question since I think the way I phrased it there made it unintentionally misleading)
-> Basically what I'm trying to ask is: Why does a C programmer need a pointer to a dynamically-allocated variable/object? (whatever the difference is between variable/object...)
If a C programmer has the option of creating just 'int x' or just 'int *x' (both statically allocated), then why can't he also have the option to JUST initialize his dynamically-allocated variable/object as a variable (and NOT returning a pointer through malloc())?
*If there are some obscure ways to do what I explained above, then, well, why does malloc() seem the way that most textbooks go about dynamic-allocation?
Note: in the following, byte refers to sizeof(char)
Well, for one, malloc returns a void *. It simply can't return a value: that wouldn't be feasible with C's lack of generics. In C, the compiler must know the size of every object at compile time; since the size of the memory being allocated will not be known until run time, then a type that could represent any value must be returned. Since void * can represent any pointer, it is the best choice.
malloc also cannot initialize the block: it has no knowledge of what's being allocated. This is in contrast with C++'s operator new, which does both the allocation and the initialization, as well as being type safe (it still returns a pointer instead of a reference, probably for historical reasons).
Also, malloc allocates a block of memory of a specific size, then returns a pointer to that memory (that's what malloc stands for: memory allocation). You're getting a pointer because that's what you get: an unitialized block of raw memory. When you do, say, malloc(sizeof(int)), you're not creating a int, you're allocating sizeof(int) bytes and getting the address of those bytes. You can then decide to use that block as an int, but you could also technically use that as an array of sizeof(int) chars.
The various alternatives (calloc, realloc) work roughly the same way (calloc is easier to use when dealing with arrays, and zero-fills the data, while realloc is useful when you need to resize a block of memory).
Suppose you create an integer array in a function and want to return it. Said array is a local variable to the function. You can't return a pointer to a local variable.
However, if you use malloc, you create an object on the heap whose scope exceeds the function body. You can return a pointer to that. You just have to destroy it later or you will have a memory leak.
It's because objects allocated with malloc() don't have names, so the only way to reference that object in code is to use a pointer to it.
When you say int x;, that creates an object with the name x, and it is referenceable through that name. When I want to set x to 10, I can just use x = 10;.
I can also set a pointer variable to point to that object with int *p = &x;, and then I can alternatively set the value of x using *p = 10;. Note that this time we can talk about x without specifically naming it (beyond the point where we acquire the reference to it).
When I say malloc(sizeof(int)), that creates an object that has no name. I cannot directly set the value of that object by name, since it just doesn't have one. However, I can set it by using a pointer variable that points at it, since that method doesn't require naming the object: int *p = malloc(sizeof(int)); followed by *p = 10;.
You might now ask: "So, why can't I tell malloc to give the object a name?" - something like malloc(sizeof(int), "x"). The answer to this is twofold:
Firstly, C just doesn't allow variable names to be introduced at runtime. It's just a basic restriction of the language;
Secondly, given the first restriction the name would have to be fixed at compile-time: if this is the case, C already has syntax that does what you want: int x;.
You are thinking about things wrong. It is not that int x is statically allocated and malloc(sizeof(int)) is dynamic. Both are allocated dynamically. That is, they are both allocated at execution time. There is no space reserved for them at the time you compile. The size may be static in one case and dynamic in the other, but the allocation is always dynamic.
Rather, it is that int x allocates the memory on the stack and malloc(sizeof(int)) allocates the memory on the heap. Memory on the heap requires that you have a pointer in order to access it. Memory on the stack can be referenced directly or with a pointer. Usually you do it directly, but sometimes you want to iterate over it with pointer arithmetic or pass it to a function that needs a pointer to it.
Everything works using pointers. "int x" is just a convenience - someone, somewhere got tired of juggling memory addresses and that's how programming languages with human-readable variable names were born.
Dynamic allocation is... dynamic. You don't have to know how much space you are going to need when the program runs - before the program runs. You choose when to do it and when to undo it. It may fail. It's hard to handle all this using the simple syntax of static allocation.
C was designed with simplicity in mind and compiler simplicity is a part of this. That's why you're exposed to the quirks of the underlying implementations. All systems have storage for statically-sized, local, temporary variables (registers, stack); this is what static allocation uses. Most systems have storage for dynamic, custom-lifetime objects and system calls to manage them; this is what dynamic allocation uses and exposes.
There is a way to do what you're asking and it's called C++. There, "MyInt x = 42;" is a function call or two.
I think your question comes down to this:
If a C programmer has the option of creating just int x or just int *x (both statically allocated)
The first statement allocates memory for an integer. Depending upon the placement of the statement, it might allocate the memory on the stack of a currently executing function or it might allocate memory in the .data or .bss sections of the program (if it is a global variable or static variable, at either file scope or function scope).
The second statement allocates memory for a pointer to an integer -- it hasn't actually allocated memory for the integer itself. If you tried to assign a value using the pointer *x=1, you would either receive a very quick SIGSEGV segmentation violation or corrupt some random piece of memory. C doesn't pre-zero memory allocated on the stack:
$ cat stack.c
#include <stdio.h>
int main(int argc, char *argv[]) {
int i;
int j;
int k;
int *l;
int *m;
int *n;
printf("i: %d\n", i);
printf("j: %d\n", j);
printf("k: %d\n", k);
printf("l: %p\n", l);
printf("m: %p\n", m);
printf("n: %p\n", n);
return 0;
}
$ make stack
cc stack.c -o stack
$ ./stack
i: 0
j: 0
k: 32767
l: 0x400410
m: (nil)
n: 0x4005a0
l and n point to something in memory -- but those values are just garbage, and probably don't belong to the address space of the executable. If we store anything into those pointers, the program would probably die. It might corrupt unrelated structures, though, if they are mapped into the program's address space.
m at least is a NULL pointer -- if you tried to write to it, the program would certainly die on modern hardware.
None of those three pointers actually point to an integer yet. The memory for those integers doesn't exist. The memory for the pointers does exist -- and is initially filled with garbage values, in this case.
The Wikipedia article on L-values -- mostly too obtuse to fully recommend -- makes one point that represented a pretty significant hurdle for me when I was first learning C: In languages with assignable variables it becomes necessary to distinguish between the R-value (or contents) and the L-value (or location) of a variable.
For example, you can write:
int a;
a = 3;
This stores the integer value 3 into whatever memory was allocated to store the contents of variable a.
If you later write:
int b;
b = a;
This takes the value stored in the memory referenced by a and stores it into the memory location allocated for b.
The same operations with pointers might look like this:
int *ap;
ap=malloc(sizeof int);
*ap=3;
The first ap= assignment stores a memory location into the ap pointer. Now ap actually points at some memory. The second assignment, *ap=, stores a value into that memory location. It doesn't update the ap pointer at all; it reads the value stored in the variable named ap to find the memory location for the assignment.
When you later use the pointer, you can choose which of the two values associated with the pointer to use: either the actual contents of the pointer or the value pointed to by the pointer:
int *bp;
bp = ap; /* bp points to the same memory cell as ap */
int *bp;
bp = malloc(sizeof int);
*bp = *ap; /* bp points to new memory and we copy
the value pointed to by ap into the
memory pointed to by bp */
I found assembly far easier than C for years because I found the difference between foo = malloc(); and *foo = value; confusing. I hope I found what was confusing you and seriously hope I didn't make it worse.
Perhaps you misunderstand the difference between declaring 'int x' and 'int *x'. The first allocates storage for an int value; the second doesn't - it just allocates storage for the pointer.
If you were to "dynamically allocate" a variable, there would be no point in the dynamic allocation anyway (unless you then took its address, which would of course yield a pointer) - you may as well declare it statically. Think about how the code would look - why would you bother with:
int x = malloc(sizeof(int)); *x = 0;
When you can just do:
int x = 0;

Resources