where is memory allocated for pointers and their data? - c

My question is: if I have some function
void func1(){
char * s = "hello";
char * c;
int b;
c = (char *) malloc(15);
strcpy(c,s);
}
I think the s pointer is allocated on the stack, but where is the data "hello" stored? Does that go in the data segment of the program? As for c and b, they are uninitialized, and since c is supposed to hold some memory address and it doesn't have one yet, how does that work? And b also has no contents, so can it not be stored on the stack?
Then, when we allocate memory for c on the heap with malloc, c now has some memory address. How is this uninitialized c variable given the address of the first byte for that string on the heap?

We need to consider what memory location a variable has and what its contents are. Keep this in mind.
For an int, the variable has a memory address and has a number as its contents.
For a char pointer, the variable has a memory address and its contents is a pointer to a string--the actual string data is at another memory location.
To understand this, we need to consider two things:
(1) the memory layout of a program
(2) the memory layout of a function when it's been called
Program layout [typical]. Lower memory address to higher memory address:
code segment -- where instructions go:
...
machine instructions for func1
...
data segment -- where initialized global variables and constants go:
...
int myglobal_inited = 23;
...
"hello"
...
bss segment -- for uninitialized globals:
...
int myglobal_tbd;
...
heap segment -- where malloc data is stored (grows upward towards higher memory
addresses):
...
stack segment -- starts at top memory address and grows downward toward end
of heap
Now here's a stack frame for a function. It will be within the stack segment somewhere. Note, this is higher memory address to lower:
function arguments [if any]:
arg2
arg1
arg0
function's return address [where it will go when it returns]
function's stack/local variables:
char *s
char *c
int b
char buf[20]
Note that I've added a "buf". If we changed func1 to return a string pointer (e.g. "char *func1(arg0,arg1,arg2)") and we added "strcpy(buf,s)" or "strcpy(buf,c)", buf would be usable by func1. func1 could return either c or s, but not buf.
That's because with "s" the data is stored in the data segment and persists after func1 returns. Likewise, c can be returned because the data is in the heap segment.
But, buf would not work (e.g. return buf) because the data is stored in func1's stack frame and that is popped off the stack when func1 returns [meaning it would appear as garbage to caller]. In other words, data in the stack frame of a given function is available to it and any function that it may call [and so on ...]. But, this stack frame is not available to a caller of that function. That is, the stack frame data only "persists" for the lifetime of the called function.
Here's the fully adjusted sample program:
#include <stdlib.h>
#include <string.h>
int myglobal_initialized = 23;
int myglobal_tbd;
char *
func1(int arg0,int arg1,int arg2)
{
char *s = "hello";
char *c;
int b;
char buf[20];
char *ret;
c = malloc(15);
strcpy(c,s);
strcpy(buf,s);
// ret can be c, s, but _not_ buf
ret = ...;
return ret;
}
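As a minimal sketch of the rule above, the two safe cases can be written as separate functions. The names return_literal and return_heap_copy are made up for this illustration and are not part of the original program. The literal lives for the whole run of the program, and the heap block lives until the caller frees it, so both pointers stay valid after the function returns.

```c
#include <stdlib.h>
#include <string.h>

/* Safe: the string literal has static storage duration. */
const char *return_literal(void)
{
    const char *s = "hello";
    return s;
}

/* Safe: the heap block stays allocated until the caller frees it. */
char *return_heap_copy(void)
{
    const char *s = "hello";
    char *c = malloc(strlen(s) + 1);   /* room for the text plus '\0' */
    if (c != NULL)
        strcpy(c, s);
    return c;                          /* may be NULL if malloc failed */
}
```

The caller owns the result of return_heap_copy and should pass it to free when done; the result of return_literal must not be freed or modified.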

Let's divide this answer in two points of view of the same stuff, because the standards only complicate understanding of this topic, but they're standards anyway :).
Subject common to both parts
void func1() {
char *s = "hello";
char *c;
int b;
c = (char*)malloc(15);
strcpy(c, s);
}
Part I: From a standardese point of view
According to the standards, there's this useful concept known as automatic storage duration, in which a variable's space is reserved automatically upon entering a given scope (with uninitialized values, a.k.a. garbage!), it may be set/accessed or not during such a scope, and such a space is freed for future use upon leaving that scope. Note: In C++, this also involves construction and destruction of objects.
So, in your example, you have three automatic variables:
char *s, which gets initialized to whatever the address of "hello" happens to be.
char *c, which holds garbage until it's initialized by a later assignment.
int b, which holds garbage all of its lifetime.
BTW, how storage works with functions is unspecified by the standards.
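To contrast automatic storage duration with its counterpart, static storage duration, here is a small sketch. The function names count_automatic and count_static are hypothetical and chosen only for this example. The automatic variable is created anew on every call, while the static one is initialized once and keeps its value between calls.

```c
/* Automatic storage: `count` is created afresh on every call. */
int count_automatic(void)
{
    int count = 0;   /* explicit initializer; without it the value is indeterminate */
    count++;
    return count;    /* always 1 */
}

/* Static storage: `count` is initialized once and survives between calls. */
int count_static(void)
{
    static int count = 0;
    count++;
    return count;    /* 1, 2, 3, ... on successive calls */
}
```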
Part II: From a real-world point of view
On any decent computer architecture you will find a data structure known as the stack. The stack's purpose is to hold space that can be used and recycled by automatic variables, as well as some space for some stuff needed for recursion/function calling, and can serve as a place to hold temporary values (for optimization purposes) if the compiler decides to.
The stack works in a PUSH/POP fashion, and in the diagrams below it grows downwards. Let me explain it a little better. Imagine an empty stack like this:
[Top of the Stack]
[Bottom of the Stack]
If you, for example, PUSH an int of value 5, you get:
[Top of the Stack]
5
[Bottom of the Stack]
Then, if you PUSH -2:
[Top of the Stack]
5
-2
[Bottom of the Stack]
And, if you POP, you retrieve -2, and the stack looks as before -2 was PUSHed.
The bottom of the stack is a barrier that can be moved upon PUSHing and POPing. On most architectures, the bottom of the stack is recorded by a processor register known as the stack pointer. Think of it as an unsigned char*. You can decrease it, increase it, do pointer arithmetic on it, etcetera. Everything with the sole purpose of doing black magic on the stack's contents.
Reserving (space for) automatic variables in the stack is done by decreasing it (remember, it grows downwards), and releasing them is done by increasing it. Based on this, the previous theoretical PUSH -2 is shorthand for something like this in pseudo-assembly:
SUB %SP, $4 # Subtract sizeof(int) from the stack pointer
MOV $-2, (%SP) # Copy the value `-2` to the address pointed by the stack pointer
POP whereToPop is merely the inverse:
MOV (%SP), whereToPop # Get the value
ADD %SP, $4 # Free the space
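The same PUSH/POP mechanics can be imitated in plain C. This is only a toy model, assuming a fixed array stands in for stack memory and an index stands in for the stack pointer; the names stack_memory, sp, push and pop are invented for this sketch. PUSH moves the pointer down and then stores, POP loads and then moves the pointer back up.

```c
#include <stddef.h>

#define STACK_WORDS 16

static int stack_memory[STACK_WORDS];
/* "Stack pointer": starts one past the highest slot, so the stack is empty. */
static size_t sp = STACK_WORDS;

/* PUSH: decrease the stack pointer, then store. Returns -1 if the stack is full. */
int push(int value)
{
    if (sp == 0)
        return -1;
    sp--;
    stack_memory[sp] = value;
    return 0;
}

/* POP: load the value, then increase the stack pointer. Returns -1 if empty. */
int pop(int *out)
{
    if (sp == STACK_WORDS)
        return -1;
    *out = stack_memory[sp];
    sp++;
    return 0;
}
```

Pushing 5 and then -2 and popping twice yields -2 first and 5 second, matching the diagrams above.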
Now, compiling func1() may yield the following pseudo-assembly (Note: you are not expected to understand this at its fullest):
.rodata # Read-only data goes here!
.STR0 = "hello" # The string literal goes here
.text # Code goes here!
func1:
SUB %SP, $12 # sizeof(char*) + sizeof(char*) + sizeof(int)
LEA .STR0, (%SP) # Copy the address (LEA, load effective address) of `.STR0` (the string literal) into the first 4-byte space in the stack (a.k.a `char *s`)
PUSH $15 # Pass argument to `malloc()` (note: arguments are pushed last to first)
CALL malloc
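ADD %SP, $4 # The caller cleans up the stack/pops arguments
MOV %RV, 4(%SP) # Move the return value of `malloc()` (%RV) to the second 4-byte variable allocated (`4(%SP)`, a.k.a `char *c`)
PUSH (%SP) # Second argument to `strcpy()` (`s`)
PUSH 8(%SP) # First argument to `strcpy()` (`c`; its offset is 8 here because the previous PUSH moved %SP down by 4)
CALL strcpy
ADD %SP, $8 # The caller pops both arguments
ADD %SP, $12 # Release the space of the automatic variables
RET # Return with no value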
I hope this has shed some light on the matter!

Related

Is memory allocated when the variable is not used in c

#include<stdio.h>
int main()
{
int a,b;
float e;
char f;
printf("int &a = %u\n",&a);
printf("int &b = %u\n",&b);
printf("float &e = %u\n",&e);
printf("char &f = %u\n",&f);
}
The Output is
int &a = 2293324
int &b = 2293320
float &e = 2293316
char &f = 2293315
But when I use this code, with the printf for the float removed:
#include<stdio.h>
int main()
{
int a,b;
float e;
char f;
printf("int &a = %u\n",&a);
printf("int &b = %u\n",&b);
printf("char &f = %u\n",&f);
}
Then the Output is
int &a = 2293324
int &b = 2293320
char &f = 2293319
Here no address is printed for the float, but it is still declared at the top.
My questions are
Is memory not allocated to variables not used in program?
Why addresses allocated in decreasing order. ex- it's going from 2293324 to 2293320?
1) Is memory not allocated to variables not used in program?
Yes that can happen, the compiler is allowed to optimize it out.
2) Why addresses allocated in decreasing order. ex- it's going from 2293324 to 2293320?
That is usual for most local storage implementations: they use the CPU-supported stack pointer, going from stack top to stack bottom. All those local variables will most probably be allocated on the stack.
1) Is memory not allocated to variables not used in program?
It's an allowed optimization; if an unused variable doesn't affect the program's observable behavior, a compiler may just discard it completely. Note that most modern compilers will warn you about unused variables (so you can either remove them from the code or do something with them).
2) Why addresses allocated in decreasing order. ex- it's going from 2293324 to 2293320?
The compiler is not required to allocate storage for separate objects in any particular order, so don't assume that your variables will be allocated in the order they were declared. Also, remember that on x86 and some other systems, the stack grows "downwards" towards decreasing addresses. Remember the top of any stack is simply the location where something was most recently pushed - it has nothing to do with relative address values.
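A side note on the code in the question: %u expects an unsigned int, so passing a pointer to it is undefined behaviour. The portable way to print an address is %p with a void * argument. The sketch below shows this; the helper names print_address and show_locals are made up for the example, and the printed values and their order depend on the compiler and platform.

```c
#include <stdio.h>

/* Prints one labelled address; returns the number of characters written. */
int print_address(const char *label, const void *address)
{
    return printf("%s = %p\n", label, address);
}

/* Prints the addresses of a few locals. Values and ordering are implementation-specific. */
void show_locals(void)
{
    int a = 0, b = 0;
    char f = 0;
    print_address("&a", (void *)&a);
    print_address("&b", (void *)&b);
    print_address("&f", (void *)&f);
}
```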
While not specifically required by the standard, local variables are almost always located on the program stack (or kept in registers).
When you enter a function, one of the first things done is to decrement the stack pointer to provide space for the local variables.
SUBL #SOMETHING, SP
Where SOMETHING is the amount of space required and SP is the stack pointer register. In your first example, SOMETHING is probably 13. Then the address of:
f is 0(SP)
e is 1(sp)
b is 5(sp)
a is 9(sp)
I am assuming your compiler did not align the stack pointer. Often they do, giving something more like:
f is 3(SP)
e is 4(sp)
b is 8(sp)
a is 12(sp)
And SOMETHING would be rounded up to 16 on a 32-bit system.
You might want to generate an assembly listing using your compiler to see what is going on underneath.
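The arithmetic behind "13 rounded up to 16" can be sketched as follows. This only illustrates the rounding, assuming a power-of-two alignment; the helper names round_up and packed_locals_size are hypothetical, and a real compiler is free to lay out the frame differently.

```c
#include <stddef.h>

/* Rounds n up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Bytes needed for two ints, a float and a char placed back to back
   (13 when int and float are 4 bytes each). */
size_t packed_locals_size(void)
{
    return sizeof(int) + sizeof(int) + sizeof(float) + sizeof(char);
}
```

With 4-byte int and float, round_up(packed_locals_size(), 4) gives 16, the figure quoted above.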
Is memory not allocated to variables not used in program?
Note that for local variables, memory is not really allocated. A variable is temporarily bound to a location on the program stack (a stack is not required by the standard but is how it is done in most cases). That is why the variable's initial value is undefined. It could have been bound to something else previously.
The compiler does not need to reserve space for variables that are not used. They can be optimized away. Usually, there are compiler settings to instruct not to do this for debugging.
Why addresses allocated in decreasing order. ex- it's going from 2293324 to 2293320?
Program stacks generally grow downward. Starting in ye olde days, the program would be at the bottom of the address space, the heap above that and the stack at the opposite end.
The heap would grow towards higher addresses. The stack would grow towards the heap (lower addresses).
While the address spaces can be more complicated than that these days, the downward growth of stacks has stayed.
There is no particular requirement that the compiler map the variables to the stack in descending order but there's a 50/50 chance it will do it that way.

Finding stack frame size

The stack frame of a caller function can be easily obtained via __builtin_frame_address(1), but what about the stack frame size?
Is there a function that will let me know how big is the stack frame of the caller function?
My first reaction would have been, why would anybody want this? It should be considered bad practice for a C function to dynamically determine the size of the stack frame. The whole point of cdecl (the classic C calling convention) is that the function itself (the 'callee') has no knowledge of the size of the stack frame. Any deviation from that philosophy may cause your code to break when switching over to a different platform, a different address size (e.g. from 32-bit to 64-bit), a different compiler or even different compiler settings (in particular optimizations).
On the other hand, since gcc already offers this function __builtin_frame_address, it will be interesting to see how much information can be derived from there.
From the documentation:
The frame address is normally the address of the first word pushed on to the stack by the function.
On x86, a function typically starts with:
push ebp ; bp for 16-bit, ebp for 32-bit, rbp for 64-bit
In other words, __builtin_frame_address returns the base pointer of the caller's stack frame.
Unfortunately, the base pointer says little or nothing about where any stack frame starts or ends;
the base pointer points to a location that is somewhere in the middle of the stack frame (between the parameters and the local variables).
If you are only interested in the part of the stack frame that holds the local variables, then the function itself has all the knowledge. The size of that part is the difference between the stack pointer and the base pointer.
register char * const basepointer asm("ebp");
register char * const stackpointer asm("esp");
size_localvars = basepointer - stackpointer;
Please keep in mind that gcc seems to allocate space on the stack right from the beginning that is used to hold parameters for other functions called from inside the callee. Strictly speaking, that space belongs to the stack frames of those other functions, but the boundary is unclear. Whether this is a problem depends on your purpose: what are you going to do with the calculated stack frame size?
As for the other part (the parameters), that depends. If your function has a fixed number of parameters, then you could simply measure the size of the (formal) parameters. It does not guarantee that the caller actually pushed the same number of parameters on the stack, but assuming the caller compiled without warnings against the callee's prototype, it should be OK.
void callee(int a, int b, int c, int d)
{
size_params = sizeof d + (char *)&d - (char *)&a;
}
You can combine the two techniques to get the full stackframe (including return address and saved base pointer):
register char * const stackpointer asm("esp");
void callee(int a, int b, int c, int d)
{
total_size = sizeof d + (char *)&d - stackpointer;
}
If, however, your function has a variable number of parameters (an 'ellipsis', like printf has), then the size of the parameters is known only to the caller. Unless the callee has a way to derive the size and number of parameters (in case of a printf-style function, by analyzing the format string), you would have to let the caller pass that information on to the callee.
EDIT:
Please note, this only works to let a function measure its own stack frame. A callee cannot calculate its caller's stack frame size; the callee will have to ask the caller for that information.
However, callee can make an educated guess about the size of caller's local variables. This block starts where callee's parameters end (sizeof d + (char *)&d), and ends at caller's base pointer (__builtin_frame_address(1)). The start address may be slightly inaccurate due to address alignment imposed by the compiler; the calculated size may include a piece of unused stack space.
void callee(int a, int b, int c, int d)
{
size_localvars_of_caller = (char *)__builtin_frame_address(1) - sizeof d - (char *)&d;
}
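One way to "let the caller pass that information" is for the caller to hand over its own frame address. The sketch below is GCC/Clang-specific and only a rough estimate under stated assumptions: frame_distance is a hypothetical name, the result covers everything between the two frame addresses (saved registers, return address, padding, and so on), and its sign depends on the direction in which the stack grows.

```c
#include <stdint.h>

/* GCC/Clang only. noinline keeps this function in its own stack frame. */
__attribute__((noinline))
intptr_t frame_distance(const void *caller_frame)
{
    /* Level 0 asks for the frame address of the current function. */
    const void *own_frame = __builtin_frame_address(0);

    /* Compare as integers; the two addresses belong to different frames. */
    return (intptr_t)((uintptr_t)caller_frame - (uintptr_t)own_frame);
}
```

A caller would invoke it as frame_distance(__builtin_frame_address(0)). Using level 0 on both sides avoids walking up the stack with a nonzero argument, which depends on frame pointers being present.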

Holding variables in memory, C++

Today something strange came to my mind. When I want to hold some string in C (C++) the old way, without using string header, I just create array and store that string into it. But, I read that any variable definition in C in local scope of function ends up in pushing these values onto the stack.
So, the string is actually twice as big as needed. Because first, the push instructions are located in memory, but then when they are executed (pushed onto the stack) another "copy" of the string is created. First the push instructions, then the stack space is used for one string.
So, why is it this way? Why doesn't the compiler just add the string (or other variables) to the program instead of creating them once again when executed? Yes, I know you cannot just have some data inside a program block, but it could just be attached to the end of the program, with some jump instruction before. And then, we would just point to this data? Because it is stored in RAM when the program is executed.
Thanks.
There are a couple of ways of dealing with static strings in C and C++:
char string[] = "Contents of the string";
char const *string2 = "Contents of another string";
If you do these inside a function, the first creates a string on the stack, about like you described. The second just creates a pointer to a statically allocated string that's embedded into the executable, about like you imply that you want.
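A short sketch of the practical difference between the two declarations, with a made-up function name (array_copy_is_independent): the array is a private, writable copy, while the pointer refers to the single literal embedded in the program.

```c
#include <string.h>

/* Returns 1 if the array behaves as an independent, writable copy of the literal. */
int array_copy_is_independent(void)
{
    char string[] = "Contents of the string";         /* array: holds its own copy */
    const char *string2 = "Contents of the string";   /* pointer to the literal */

    string[0] = 'c';   /* fine: modifies only the local array */
    /* string2[0] = 'c'; would be rejected (const); writing to a literal is undefined behaviour */

    return string[0] == 'c'
        && string2[0] == 'C'
        && sizeof string == strlen(string2) + 1;   /* array size includes the '\0' */
}
```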
A very good question indeed. You do know that using the static keyword on a variable declaration does just what you described, right?
As far as locals are concerned, performance optimization is the key. A local variable cannot be accessed outside the scope of a function. Why then would the compiler try to persist memory for it outside of the stack?
It is not how it works. Nothing gets "pushed", the compiler simply reserves space in the stack frame. You cannot return such a string from the function, you'll return a pointer to a dead stack frame. Any subsequent function call will destroy the string.
Return strings by letting the caller pass a pointer to a buffer, as well as an argument that says how large the buffer is so you won't overrun the end of the buffer when the string is too long.
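A minimal sketch of that pattern, assuming a hypothetical function get_greeting: the caller owns the buffer and passes its size, and snprintf never writes past it. The return convention (0 on success, -1 if the text did not fit) is a choice made for this example.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes a greeting into the caller's buffer.
   Returns 0 on success, -1 if the text did not fit or formatting failed. */
int get_greeting(char *buf, size_t size, const char *name)
{
    int needed = snprintf(buf, size, "Hello, %s!", name);
    if (needed < 0 || (size_t)needed >= size)
        return -1;   /* output error, or the result was truncated */
    return 0;
}
```

A caller would typically declare char buf[32]; and call get_greeting(buf, sizeof buf, "world"), then check the return value before using buf.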
If you have:
extern void some_function(char * s, int l);
void do_it(void) {
char str[] = "I'm doing it!";
some_function(str, sizeof(str) );
}
This would turn into something like (in pseudo asm for a made-up processor):
.data
local .do_it.str ; The contents of str are stored in a static variable
.text ; text is where code lives within the executable or object file
do_it:
subtract (sizeof(str)) from stack_pointer ; This reserves the space for str on the stack
copy (sizeof(str)) bytes from .do_it.str to [stack_pointer+0] ; this loads the local variable
; into the memory at the top of the stack
; This copy can be a function call or inline code.
push sizeof(str) ; push second argument first
push stack_pointer+4 ; assuming that we have 4 byte integers,
; this is the memory just on the other side of where we pushed
; sizeof(str), which is where str[0] ended up
call some_function
add (sizeof(str)+8) to stack_pointer ; reclaim the memory used by str, sizeof(str),
; and the pointer to str from the stack
return
From this you can see that your assumptions about how the local variable str is created aren't completely correct, but this still is not necessarily as efficient as it could be.
If you did
void do_it(void) {
static char str[] = "I'm doing it!";
Then the compiler would not reserve the space on the stack for the string and then copy it onto the stack. If some_function were to alter the contents of str then the next (or a concurrent) call to do_it (in the same process) would be using the altered version of str.
If some_function had been declared as:
extern void some_function(const char * s, int l);
Then, since the compiler can see that there are no operations that change str within do_it it could also get away with not making a local copy of str on the stack even if str were not declared static.
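The difference between the automatic and the static version can be observed directly. In this sketch, some_function is a stand-in that overwrites the first character (the original leaves its body unspecified, and its length parameter is a size_t here rather than an int), and first_char_auto and first_char_static are hypothetical names. Each function reports the first character it saw before calling some_function.

```c
#include <stddef.h>

/* Stand-in for the answer's some_function: overwrites the first character. */
static void some_function(char *s, size_t l)
{
    if (l > 1)
        s[0] = 'X';
}

/* Automatic array: re-initialized from the literal on every call. */
char first_char_auto(void)
{
    char str[] = "I'm doing it!";
    char first = str[0];
    some_function(str, sizeof str);
    return first;   /* 'I' on every call */
}

/* Static array: initialized once, so an earlier modification is still visible. */
char first_char_static(void)
{
    static char str[] = "I'm doing it!";
    char first = str[0];
    some_function(str, sizeof str);
    return first;   /* 'I' on the first call, 'X' afterwards */
}
```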

char x[256] vs. char* = malloc(256*sizeof(char));

Someone here recently pointed out to me in a piece of code of mine I am using
char* name = malloc(256*sizeof(char));
// more code
free(name);
I was under the impression that this way of setting up an array was identical to using
char name[256];
and that both ways would require the use of free(). Am I wrong and if so could someone please explain in low level terms what the difference is?
In the first code, the memory is dynamically allocated on the heap. That memory needs to be freed with free(). Its lifetime is arbitrary: it can cross function boundaries, etc.
In the second code, the 256 bytes are allocated on the stack, and are automatically reclaimed when the function returns (or at program termination if it is outside all functions). So you don't have to (and cannot) call free() on it. It can't leak, but it also won't live beyond the end of the function.
Choose between the two based on the requirements for the memory.
Addendum (Pax):
If I may add to this, Ned, most implementations will typically provide more heap than stack (at least by default). This won't typically matter for 256 bytes unless you're already running out of stack or doing heavily recursive stuff.
Also, sizeof(char) is always 1 according to the standard so you don't need that superfluous multiply. Even though the compiler will probably optimize it away, it makes the code ugly IMNSHO.
End addendum (Pax).
and that both ways would require the use of free().
No, only the first needs the use of a free. The second is allocated on the stack. That makes it incredibly fast to allocate. Look here:
void doit() {
/* ... */
/* SP += 10 * sizeof(int) */
int a[10];
/* ... (using a) */
} /* SP -= 10 * sizeof(int) */
When you create it, the compiler knows its size at compile time and will allocate the right amount of stack space for it. The stack is a large chunk of contiguous memory located somewhere. Putting something on the stack will just increment (or decrement, depending on your platform) the stack pointer. Going out of scope will do the reverse, and your array is freed. That happens automatically. Therefore variables created that way have automatic storage duration.
Using malloc is different. It will request an arbitrarily large memory chunk (from a place called the free store, or heap). The runtime will have to look up a sufficiently large block of memory. The size can be determined at runtime, so the compiler generally cannot optimize it at compile time. Because the pointer can go out of scope, or be copied around, there is no inherent coupling between the memory allocated and the pointer to which the memory address is assigned, so the memory is still allocated even if you have left the function long ago. You have to call free manually, passing it the address you got from malloc, when the time has come to do so.
Some "recent" form of C, called C99, allows you to give arrays a runtime size, i.e. you are allowed to do:
void doit(int n) {
int a[n]; /* allocate n * sizeof(int) bytes on the stack */
}
But that feature is better avoided if you don't have a reason to use it. One reason is that it's not failsafe: if no memory is available anymore, anything can happen. Another is that variable-length arrays are not portable among compilers: C11 made them an optional feature.
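For comparison, here is a sketch of a runtime-sized buffer obtained with malloc, where failure can be detected and handled. The function sum_of_squares and its -1 error convention are invented for this example.

```c
#include <stdlib.h>

/* Sums 0*0 + 1*1 + ... + (n-1)*(n-1) using a heap buffer of n elements.
   Returns -1 if the buffer cannot be allocated. */
long sum_of_squares(size_t n)
{
    if (n == 0)
        return 0;
    if (n > (size_t)-1 / sizeof(long))
        return -1;                     /* n * sizeof(long) would overflow */

    long *a = malloc(n * sizeof *a);
    if (a == NULL)
        return -1;                     /* unlike with a VLA, failure is detectable */

    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        a[i] = (long)(i * i);
        sum += a[i];
    }

    free(a);                           /* every successful malloc needs a matching free */
    return sum;
}
```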
There is a third possibility here, which is that the array can be declared external to a function, but statically, eg,
// file foo.c
char name[256];
int foo() {
// do something here.
}
I was rather surprised in answers to another question on SO that someone felt this was inappropriate in C; here it's not even mentioned, and I'm a little confused and surprised (like "what are they teaching kids in school these days?") about this.
If you use this definition, the memory is allocated statically, neither on the heap nor the stack, but in data space in the image. Thus it neither needs to be managed as with malloc/free, nor do you have to worry about the address being reused as you would with an auto definition.
It's useful to recall the whole "declared" vs "defined" thing here. Here's an example
/* example.c */
char buf1[256] ; /* declared extern, defined in data space */
static char buf2[256] ; /* declared static, defined in data space */
char * buf3 ; /* declared extern, defined one ptr in data space */
int example(int c) { /* c declared here, defined on stack */
char buf4[256] ; /* declared here, defined on stack */
char * buf5 = malloc(256); /* pointer declared here, defined on stack */
/* and buf5 is address of 256 bytes alloc'd on heap */
buf3 = malloc(256); /* now buf3 contains address of 256 more bytes on heap */
return 0; /* stack unwound; buf4 and buf5 lost. */
/* NOTICE buf5 memory on heap still allocated */
/* so this leaks 256 bytes of memory */
}
Now in a whole different file
/* example2.c */
extern char buf1[]; /* gets the SAME chunk of memory as from example.c */
static char buf2[256]; /* DIFFERENT 256 char buffer than example.c */
extern char * buf3 ; /* Same pointer as from example.c */
void undoes() {
free(buf3); /* this will work as long as example() called first */
return ;
}
This is incorrect - the array declaration does not require a free. Further, if this is within a function, it is allocated on the stack (if memory serves) and is automatically released when the function returns - don't pass a reference to it back to the caller!
Break down your statement
char* name = malloc(256*sizeof(char)); // one statement
char *name; // Step 1 declare pointer to character
name = malloc(256*sizeof(char)); // assign address to pointer of memory from heap
name[2]; // access 3rd item in array
*(name+2); // access 3rd item in array
name++; // move name to the 2nd item (index 1); keep the original address around for free()
Translation: name is now a pointer to character which is assigned the address of some memory on the heap
char name[256]; // declare an array on the stack
name++; // error: name is an array and cannot be modified
*(name+2); // access 3rd item in array
name[2]; // access 3rd item in array
char *p = name;
p[2]; // access 3rd item in array
*(p+2); // access 3rd item in array
p++; // move p to the 2nd item (index 1)
p[0]; // now the 2nd item of the array
Translation: name is an array of characters located on the stack; in most expressions it converts to a pointer to its first element
In C, arrays and pointers are closely related but not the same thing: in most expressions an array converts ("decays") to a pointer to its first element, but the array itself cannot be reassigned, and sizeof reports the size of the whole array. The main difference here is that when you call malloc you take your memory from the heap, and any memory taken from the heap must be freed with free. When you declare the array with a size inside a function, it is assigned memory from the stack. You can't free this memory because free is made to free memory from the heap. The memory on the stack will automatically be freed when the current function returns. In the second example free(p) would be an error also: p is a pointer to the name array on the stack, so by freeing p you would be attempting to free memory on the stack.
This is no different from:
int n = 10;
int *p = &n;
freeing p in this case would be an error because p points to n which is a variable on the stack. Therefore p holds a memory location in the stack and cannot be freed.
int *p = (int *) malloc(sizeof(int));
*p = 10;
free(p);
in this case the free is correct because p points to a memory location on the heap which was allocated by malloc.
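One way to see that an array and a pointer to it are different objects is sizeof. In this sketch the function names are made up: sizeof applied to the array gives the size of all 256 elements, while sizeof applied to a pointer gives only the size of the pointer itself.

```c
#include <stddef.h>

/* Size of the array object itself. */
size_t array_bytes(void)
{
    char name[256];
    return sizeof name;      /* 256: the whole array */
}

/* Size of a pointer that refers to the array. */
size_t pointer_bytes(void)
{
    char name[256];
    char *p = name;          /* the array converts to a pointer to its first element */
    return sizeof p;         /* size of the pointer, commonly 4 or 8 */
}

/* Returns 1: the converted array and &name[0] are the same address. */
int decays_to_first_element(void)
{
    char name[256];
    char *p = name;
    return p == &name[0];
}
```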
Depending on where you are running this, stack space might be at a HUGE premium. If, for example, you're writing BREW code for Verizon/Alltel handsets, you are generally restricted to minuscule stacks but have ever-increasing heap access.
Also, as char[] are most often used for strings, it's not a bad idea to allow the string constructing method to allocate the memory it needs for the string in question, rather than hope that for ever and always 256 (or whatever number you decree) will suffice.

How do I access an individual character from an array of strings in c?

Just trying to understand how to address a single character in an array of strings. Also, this of course will allow me to understand pointers to pointers subscripting in general.
If I have char **a and I want to reach the 3rd character of the 2nd string, does this work: **((a+1)+2)? Seems like it should...
Almost, but not quite. The correct answer is:
*((*(a+1))+2)
because you need to first de-reference to one of the actual string pointers and then you need to de-reference that selected string pointer down to the desired character. (Note that I added extra parentheses for clarity in the order of operations there).
Alternatively, this expression:
a[1][2]
will also work!....and perhaps would be preferred because the intent of what you are trying to do is more self evident and the notation itself is more succinct. This form may not be immediately obvious to people new to the language, but understand that the reason the array notation works is because in C, an array indexing operation is really just shorthand for the equivalent pointer operation. ie: *(a+x) is same as a[x]. So, by extending that logic to the original question, there are two separate pointer de-referencing operations cascaded together whereby the expression a[x][y] is equivalent to the general form of *((*(a+x))+y).
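The equivalence can be checked with two tiny functions, one per notation. The names char_at_index and char_at_pointer are hypothetical; both assume that x and y are within bounds.

```c
#include <stddef.h>

/* Array notation. */
char char_at_index(char **a, size_t x, size_t y)
{
    return a[x][y];
}

/* The equivalent pointer notation: step to string x, dereference, step to character y. */
char char_at_pointer(char **a, size_t x, size_t y)
{
    return *((*(a + x)) + y);
}
```

For an array holding "alpha" and "hello", both functions return 'l' for x = 1, y = 2.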
You don't have to use pointers.
#include <stdio.h>
int main(int argc, char **argv){
/* assumes argv[1] exists and has at least three characters */
printf("The third character of argv[1] is [%c].\n", argv[1][2]);
return 0;
}
Then:
$ ./main hello
The third character of argv[1] is [l].
That's a one and an l.
You could use pointers if you want...
*(argv[1] +2)
or even
*((*(a+1))+2)
As someone pointed out above.
This works because subscripting is defined in terms of pointer arithmetic: a[i] is the same as *(a+i).
There's a brilliant C programming explanation in the book Hacking: The Art of Exploitation, 2nd Edition by Jon Erickson, which discusses pointers and strings. It is worth a mention for the programming explanation section alone: https://leaksource.files.wordpress.com/2014/08/hacking-the-art-of-exploitation.pdf
Although the question has already been answered, someone else wanting to know more may find the following highlights from Erickson's book useful to understand some of the structure behind the question.
Headers
Examples of header files available for variable manipulation you will probably use.
stdio.h - http://www.cplusplus.com/reference/cstdio/
stdlib.h - http://www.cplusplus.com/reference/cstdlib/
string.h - http://www.cplusplus.com/reference/cstring/
limits.h - http://www.cplusplus.com/reference/climits/
Functions
Examples of general purpose functions you will probably use.
malloc() - http://www.cplusplus.com/reference/cstdlib/malloc/
calloc() - http://www.cplusplus.com/reference/cstdlib/calloc/
strcpy() - http://www.cplusplus.com/reference/cstring/strcpy/
Memory
"A compiled program’s memory is divided into five segments: text, data, bss, heap, and stack. Each segment represents a special portion of memory that is set aside for a certain purpose. The text segment is also sometimes called the code segment. This is where the assembled machine language instructions of the program are located".
"The execution of instructions in this segment is nonlinear, thanks to the aforementioned high-level control structures and functions, which compile
into branch, jump, and call instructions in assembly language. As a program
executes, the EIP is set to the first instruction in the text segment. The
processor then follows an execution loop that does the following:"
"1. Reads the instruction that EIP is pointing to"
"2. Adds the byte length of the instruction to EIP"
"3. Executes the instruction that was read in step 1"
"4. Goes back to step 1"
"Sometimes the instruction will be a jump or a call instruction, which
changes the EIP to a different address of memory. The processor doesn’t
care about the change, because it’s expecting the execution to be nonlinear
anyway. If EIP is changed in step 3, the processor will just go back to step 1 and read the instruction found at the address of whatever EIP was changed to".
"Write permission is disabled in the text segment, as it is not used to store variables, only code. This prevents people from actually modifying the program code; any attempt to write to this segment of memory will cause the program to alert the user that something bad happened, and the program
will be killed. Another advantage of this segment being read-only is that it
can be shared among different copies of the program, allowing multiple
executions of the program at the same time without any problems. It should
also be noted that this memory segment has a fixed size, since nothing ever
changes in it".
"The data and bss segments are used to store global and static program
variables. The data segment is filled with the initialized global and static variables, while the bss segment is filled with their uninitialized counterparts. Although these segments are writable, they also have a fixed size. Remember that global variables persist, despite the functional context (like the variable j in the previous examples). Both global and static variables are able to persist because they are stored in their own memory segments".
"The heap segment is a segment of memory a programmer can directly
control. Blocks of memory in this segment can be allocated and used for
whatever the programmer might need. One notable point about the heap
segment is that it isn’t of fixed size, so it can grow larger or smaller as needed".
"All of the memory within the heap is managed by allocator and deallocator algorithms, which respectively reserve a region of memory in the heap for use and remove reservations to allow that portion of memory to be reused for later reservations. The heap will grow and shrink depending on how
much memory is reserved for use. This means a programmer using the heap
allocation functions can reserve and free memory on the fly. The growth of
the heap moves downward toward higher memory addresses".
"The stack segment also has variable size and is used as a temporary scratch pad to store local function variables and context during function calls. This is what GDB’s backtrace command looks at. When a program calls a function, that function will have its own set of passed variables, and the function’s code will be at a different memory location in the text (or code) segment. Since the context and the EIP must change when a function is called, the stack is used to remember all of the passed variables, the location the EIP should return to after the function is finished, and all the local variables used by that function. All of this information is stored together on the stack in what is collectively called a stack frame. The stack contains many stack frames".
"In general computer science terms, a stack is an abstract data structure that is used frequently. It has first-in, last-out (FILO) ordering
, which means the first item that is put into a stack is the last item to come out of it. Think of it as putting beads on a piece of string that has a knot on one end—you can’t get the first bead off until you have removed all the other beads. When an item is placed into a stack, it’s known as pushing, and when an item is removed from a stack, it’s called popping".
"As the name implies, the stack segment of memory is, in fact, a stack data structure, which contains stack frames. The ESP register is used to keep track of the address of the end of the stack, which is constantly changing as items are pushed into and popped off of it. Since this is very dynamic behavior, it makes sense that the stack is also not of a fixed size. Opposite to the dynamic growth of the heap, as the stack change
s in size, it grows upward in a visual listing of memory, toward lower memory addresses".
"The FILO nature of a stack might seem odd, but since the stack is used
to store context, it’s very useful. When a function is called, several things are pushed to the stack together in a stack frame. The EBP register, sometimes called the frame pointer (FP) or local base (LB) pointer, is used to reference local function variables in the current stack frame. Each stack frame contains the parameters to the function, its local variables, and two pointers that are necessary to put things back the way they were: the saved frame pointer (SFP) and the return address. The SFP is used to restore EBP to its previous value, and the return address is used to restore EIP to the next instruction found after the function call. This restores the functional context of the previous stack frame".
Strings
"In C, an array is simply a list of n elements of a specific data type. A 20-character array is simply 20 adjacent characters located in memory. Arrays are also referred to as buffers".
#include <stdio.h>
int main()
{
char str_a[20];
str_a[0] = 'H';
str_a[1] = 'e';
str_a[2] = 'l';
str_a[3] = 'l';
str_a[4] = 'o';
str_a[5] = ',';
str_a[6] = ' ';
str_a[7] = 'w';
str_a[8] = 'o';
str_a[9] = 'r';
str_a[10] = 'l';
str_a[11] = 'd';
str_a[12] = '!';
str_a[13] = '\n';
str_a[14] = 0;
printf(str_a);
}
"In the preceding program, a 20-element character array is defined as
str_a, and each element of the array is written to, one by one. Notice that the number begins at 0, as opposed to 1. Also notice that the last character is a 0".
"(This is also called a null byte.) The character array was defined, so 20 bytes are allocated for it, but only 12 of these bytes are actually used. The null byte Programming at the end is used as a delimiter character to tell any function that is dealing with the string to stop operations right there. The remaining extra bytes are just garbage and will be ignored. If a null byte is inserted in the fifth element of the character array, only the characters Hello would be printed by the printf() function".
"Since setting each character in a character array is painstaking and strings are used fairly often, a set of standard functions was created for string manipulation. For example, the strcpy() function will copy a string from a source to a destination, iterating through the source string and copying each byte to the destination (and stopping after it copies the null termination byte)".
"The order of the functions arguments is similar to Intel assembly syntax destination first and then source. The char_array.c program can be rewritten using strcpy() to accomplish the same thing using the string library. The next version of the char_array program shown below includes string.h since it uses a string function".
#include <stdio.h>
#include <string.h>
int main()
{
char str_a[20];
strcpy(str_a, "Hello, world!\n");
printf(str_a);
}
Find more information on C strings
http://www.cs.uic.edu/~jbell/CourseNotes/C_Programming/CharacterStrings.html
http://www.tutorialspoint.com/cprogramming/c_strings.htm
Pointers
"The EIP register is a pointer that “points” to the current instruction during a programs execution by containing its memory address. The idea of pointers is used in C, also. Since the physical memory cannot actually be moved, the information in it must be copied. It can be very computationally expensive to copy large chunks of memory to be used by different functions or in different places. This is also expensive from a memory standpoint, since space for the new destination copy must be saved or allocated before the source can be copied. Pointers are a solution to this problem. Instead of copying a large block of memory, it is much simpler to pass around the address of the beginning of that block of memory".
"Pointers in C can be defined and used like any other variable type. Since memory on the x86 architecture uses 32-bit addressing, pointers are also 32 bits in size (4 bytes). Pointers are defined by prepending an asterisk (*) to the variable name. Instead of defining a variable of that type, a pointer is defined as something that points to data of that type. The pointer.c program is an example of a pointer being used with the char data type, which is only 1byte in size".
#include <stdio.h>
#include <string.h>
int main()
{
char str_a[20]; // A 20-element character array
char *pointer; // A pointer, meant for a character array
char *pointer2; // And yet another one
strcpy(str_a, "Hello, world!\n");
pointer = str_a; // Set the first pointer to the start of the array.
printf(pointer);
pointer2 = pointer + 2; // Set the second one 2 bytes further in.
printf(pointer2); // Print it.
strcpy(pointer2, "y you guys!\n"); // Copy into that spot.
printf(pointer); // Print again.
}
"As the comments in the code indicate, the first pointer is set at the beginning of the character array. When the character array is referenced like this, it is actually a pointer itself. This is how this buffer was passed as a pointer to the printf() and strcpy() functions earlier. The second pointer is set to the first pointers address plus two, and then some things are printed (shown in the output below)".
reader@hacking:~/booksrc $ gcc -o pointer pointer.c
reader@hacking:~/booksrc $ ./pointer
Hello, world!
llo, world!
Hey you guys!
reader@hacking:~/booksrc $
"The address-of operator is often used in conjunction with pointers, since pointers contain memory addresses. The addressof.c program demonstrates
the address-of operator being used to put the address of an integer variable
into a pointer. This line is shown in bold below".
#include <stdio.h>
int main()
{
int int_var = 5;
int *int_ptr;
int_ptr = &int_var; // put the address of int_var into int_ptr
}
"An additional unary operator called the dereference operator exists for use with pointers. This operator will return the data found in the address the pointer is pointing to, instead of the address itself. It takes the form of an asterisk in front of the variable name, similar to the declaration of a pointer. Once again, the dereference operator exists both in GDB and in C".
"A few additions to the addressof.c code (shown in addressof2.c) will
demonstrate all of these concepts. The added printf() functions use format
parameters, which I’ll explain in the next section. For now, just focus on the program’s output".
#include <stdio.h>
int main()
{
int int_var = 5;
int *int_ptr;
int_ptr = &int_var; // Put the address of int_var into int_ptr.
printf("int_ptr = %p\n", (void *)int_ptr); // %p is the portable way to print a pointer
printf("&int_ptr = %p\n", (void *)&int_ptr);
printf("*int_ptr = 0x%08x\n\n", *int_ptr);
printf("int_var is located at %p and contains %d\n", (void *)&int_var, int_var);
printf("int_ptr is located at %p, contains %p, and points to %d\n\n", (void *)&int_ptr, (void *)int_ptr, *int_ptr);
}
"When the unary operators are used with pointers, the address-of operator can be thought of as moving backward, while the dereference operator moves forward in the direction the pointer is pointing".
Find out more about Pointers & memory allocation
Professor Dan Hirschberg, Computer Science Department, University of California on computer memory https://www.ics.uci.edu/~dan/class/165/notes/memory.html
http://cslibrary.stanford.edu/106/
http://www.programiz.com/c-programming/c-dynamic-memory-allocation
Arrays
There's a simple tutorial on multi-dimensional arrays by a chap named Alex Allain, available here: http://www.cprogramming.com/tutorial/c/lesson8.html
There's information on arrays by a chap named Todd A Gibson, available here: http://www.augustcouncil.com/~tgibson/tutorial/arr.html
Iterate an Array
#include <stdio.h>
int main()
{
int i;
char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
int int_array[5] = {1, 2, 3, 4, 5};
char *char_pointer;
int *int_pointer;
char_pointer = char_array;
int_pointer = int_array;
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer.
printf("[integer pointer] points to %p, which contains the integer %d\n", (void *)int_pointer, *int_pointer);
int_pointer = int_pointer + 1;
}
for(i=0; i < 5; i++) { // Iterate through the char array with the char_pointer.
printf("[char pointer] points to %p, which contains the char '%c'\n", (void *)char_pointer, *char_pointer);
char_pointer = char_pointer + 1;
}
}
Linked Lists vs Arrays
Arrays are not the only option available. For information on linked lists, see:
http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_linklist.aspx
Conclusion
This information was written simply to pass on some of what I have read throughout my research on the topic that might help others.
Iirc, a string is actually an array of chars, so this should work:
a[1][2]
Quote from the wikipedia article on C pointers -
In C, array indexing is formally defined in terms of pointer arithmetic; that is,
the language specification requires that array[i] be equivalent to *(array + i). Thus in C, arrays can be thought of as pointers to consecutive areas of memory (with no gaps),
and the syntax for accessing arrays is identical for that which can be used to dereference
pointers. For example, an array can be declared and used in the following manner:
int array[5]; /* Declares 5 contiguous (per Plauger Standard C 1992) integers */
int *ptr = array; /* Arrays can be used as pointers */
ptr[0] = 1; /* Pointers can be indexed with array syntax */
*(array + 1) = 2; /* Arrays can be dereferenced with pointer syntax */
So, in response to your question - yes, pointers to pointers can be used as an array without any kind of other declaration at all!
Try a[1][2]. Or *(*(a+1)+2).
Basically, array references are syntactic sugar for pointer dereferencing. a[2] is the same as *(a+2), and also the same as 2[a] (if you really want unreadable code). An array of strings behaves like a double pointer when you index it. So you can extract the second string using either a[1] or *(a+1). You can then find the third character in that string (call it 'b' for now) with either b[2] or *(b + 2). Substituting the original second string for 'b', we end up with either a[1][2] or *(*(a+1)+2).

Resources