I have a
LS_Led* LS_vol_leds[10];
declared in one C module, and the proper externs in the other modules that access it.
In func1() I have this line:
/* Debug */
LS_Led led = *(LS_vol_leds[0]);
And it does not cause an exception. Then
I call func2() in another C module (right after above line), and do the same line, namely:
/* Debug */
LS_Led led = *(LS_vol_leds[0]);
as the very first statement, and an exception is thrown!
I don't think I have the powers to debug this one on my own.
Before anything LS_vol_leds is initialized in func1() with:
LS_vol_leds[0] = &led3;
LS_vol_leds[1] = &led4;
LS_vol_leds[2] = &led5;
LS_vol_leds[3] = &led6;
LS_vol_leds[4] = &led7;
LS_vol_leds[5] = &led8;
LS_vol_leds[6] = &led9;
LS_vol_leds[7] = &led10;
LS_vol_leds[8] = &led11;
LS_vol_leds[9] = &led12;
My externs look like
extern LS_Led** LS_vol_leds;
So does that lead to disaster, and how do I prevent disaster?
Thanks.
This leads to disaster:
extern LS_Led** LS_vol_leds;
You should try this instead:
extern LS_Led *LS_vol_leds[];
If you really want to know why, you should read Expert C Programming - Deep C Secrets, by Peter Van Der Linden (amazing book!), especially chapter 4, but the quick answer is that this is one of those corner cases where pointers and arrays are not interchangeable: a pointer is a variable which holds the address of another one, whereas an array name is an address. extern LS_Led** LS_vol_leds; is lying to the compiler and generating the wrong code to access LS_vol_leds[i].
With this:
extern LS_Led** LS_vol_leds;
The compiler will believe that LS_vol_leds is a pointer, and thus, LS_vol_leds[i] involves reading the value stored in the memory location that is responsible for LS_vol_leds, use that as an address, and then scale i accordingly to get the offset.
However, since LS_vol_leds is an array and not a pointer, the compiler should instead use the address of LS_vol_leds directly. In other words: your original extern makes the compiler read the memory where the array starts (which really holds LS_vol_leds[0]) and treat that value as the base address of the array, so every LS_vol_leds[i] access goes through one dereference too many.
UPDATE: Fun fact - the back cover of the book talks about this specific case:
So that's why extern char *cp isn't the same as extern char cp[]. I
knew that it didn't work despite their superficial equivalence, but I
didn't know why. [...]
UPDATE2: Ok, since you asked, let's dig deeper. Consider a program split into two files, file1.c and file2.c. Its contents are:
file1.c
#define BUFFER_SIZE 1024
char cp[BUFFER_SIZE];
/* Lots of code using cp[i] */
file2.c
extern char *cp;
/* Code using cp[i] */
The moment you try to assign to cp[i] or read cp[i] in file2.c, your code will most likely crash. This is deeply tied to the mechanics of C and to the code that the compiler generates for array-based and pointer-based accesses.
When you have a pointer, you must think of it as a variable. A pointer is a variable like an int, float or something similar, but instead of storing an integer or a float, it stores a memory address - the address of another object.
Note that variables have addresses. When you have something like:
int a;
Then you know that a is the name for an integer object. When you assign to a, the compiler emits code that writes into whatever address is associated with a.
Now consider you have:
char *p;
What happens when you access *p? Remember - a pointer is a variable. This means that the memory address associated with p holds an address - namely, an address holding a character. When you assign to p (i.e., make it point to somewhere else), then the compiler grabs the address of p and writes a new address (the one you provide it) into that location.
For example, if p lives at 0x27, it means that reading memory location 0x27 yields the address of the object pointed to by p. So, if you use *p in the right hand side of an assignment, the steps to get the value of *p are:
Read the contents of 0x27 - say it's 0x80 - this is the value of the pointer, or, equivalently, the address of the pointed-to object
Read the contents of 0x80 - this finally gives you *p.
What if p is an array? If p is an array, then the variable p itself represents the array. By convention, the address representing an array is the address of its first element. If the compiler chooses to store the array in address 0x59, it means that the first element of p lives at 0x59. So when you read p[0] (or *p), the generated code is simpler: the compiler knows that the variable p is an array, and the address of an array is the address of the first element, so p[0] is the same as reading 0x59. Compare this to the case for which p is a pointer.
If you lie to the compiler, and tell it you have a pointer instead of an array, the compiler will (wrongly) generate code that does what I showed for the pointer case. You're basically telling it that 0x59 is not the address of an array, it's the address of a pointer. So, reading p[i] will cause it to use the pointer version:
Read the contents of 0x59 - note that, in reality, this is p[0]
Use that as an address, and read its contents.
So, what happens is that the compiler thinks that p[0] is an address, and will try to use it as such.
Why is this a corner case? Why don't I have to worry about this when passing arrays to functions?
Because what is really happening is that the compiler manages it for you. Yes, when you pass an array to a function, a pointer to the first element is passed, and inside the called function you have no way to know if it is a "real" array or a pointer. However, the address passed into the function is different depending on whether you're passing a real array or a pointer. If you're passing a real array, the pointer you get is the address of the first element of the array (in other words: the compiler immediately grabs the address associated to the array variable from the symbol table). If you're passing a pointer, the compiler passes the address that is stored in the address associated with that variable (and that variable happens to be the pointer), that is, it does exactly those 2 steps mentioned before for pointer-based access. Again, note that we're discussing the value of the pointer here. You must keep this separated from the address of the pointer itself (the address where the address of the pointed-to object is stored).
That's why you don't see a difference. In most situations, arrays are passed around as function arguments, and this rarely raises problems. But sometimes, with some corner cases (like yours), if you don't really know what is happening down there, well.. then it will be a wild ride.
Personal advice: read the book, it's totally worth it.
Related
As I understand it, all of the cases where C has to handle an address involve the use of a pointer. For example, the & operator hands the program a pointer instead of just giving the bare address as data (i.e. it never gives the address without going through a pointer first):
scanf("%d", &foo)
Or when using the & operator
int i; //a variable
int *p; //a variable that stores an address
p = &i; //The & operator returns a pointer to its operand, and p is set to that pointer.
My question is: Is there a reason why C programs always have to use a pointer to manage addresses? Is there a case where C can handle a bare address (the numerical value of the address) on its own or with another method? Or is that completely impossible? (Being because of system architecture, memory allocation changing during and in each runtime, etc). And finally, would that be useful being that addresses change because of memory management? If that was the case, it would be a reason why pointers are always needed.
I'm trying to figure out if the use of pointers is a must in standard C. Not because I want to use something else, but because I want to know for sure that the only way to use addresses is with pointers, and just forget about everything else.
Edit: Since part of the question was answered in the comments by Eric Postpischil, Michał Marszałek, user3386109, Mike Holt and Gecko, I'll group those bits here: Yes, bare addresses are of little to no use, for several reasons (pointers allow a number of operations, addresses may change each time the program is run, etc). As Michał Marszałek pointed out (no pun intended), scanf() takes a pointer because C passes arguments by copy, so a pointer is needed to change the caller's variable, i.e.
int foo;
scanf("%d", foo); //Wrong: passes a copy of foo's value, so scanf cannot store into foo (undefined behavior)
scanf("%d", &foo); //Now foo can be changed, since we pass its address.
Finally, as Gecko mentioned, pointers are there to represent indirection, so that the compiler can tell data from addresses.
John Bode's answer covers most of those topics, so I'll accept that one.
A pointer is an address (or, more properly, it’s an abstraction of an address). Pointers are how we deal with address values in C.
Outside of a few domains, a “bare address” value simply isn’t useful on its own. We’re less interested in the address than the object at that address. C requires us to use pointers in two situations:
When we want a function to write to a parameter
When we need to track dynamically allocated memory
In these cases, we don’t really care what the address value actually is; we just need it to access the object we’re interested in.
Yes, in the embedded world specific address values are meaningful. But you still use pointers to access those locations. Like I said above, a pointer is an address for our purposes.
C allows you to convert pointers to integers. The <stdint.h> header provides a uintptr_t type with the property that any pointer to void can be converted to uintptr_t and back, and the result will compare equal to the original pointer.
Per C 2018 6.3.2.3 6, the result of converting a pointer to an integer is implementation-defined. Non-normative note 69 says “The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.”
Thus, on a machine where addresses are a simple numbering scheme, converting a pointer to a uintptr_t ought to give you the natural machine address, even though the standard does not require it. There are, however, environments where addresses are more complicated, and the result of converting a pointer to an integer may not be straightforward.
int i; //a variable
int *p; //a variable that stores an address
i = 10; //now i is set to 10
p = &i; //now p holds the address of i
*p = 20; //store 20 at that address, so i is now 20
int tab[10]; //an array
p = tab; //p now holds the address of tab[0]
p++; //pointer arithmetic: p now points to the next element, tab[1]
Through pointers we can operate on addresses: move forward or backward, and read from or write to a given address.
In C, if a function has to hand back more than one result, we must use pointers; the return value alone can carry only one value.
C has no references, so we must use pointers for this.
void fun(int j){
    j = 10; // changes only the local copy
}
void fun2(int *j){
    *j = 10; // changes the caller's variable
}
int i;
i = 5; // now i is set to 5
fun(i);
// printing i here shows 5
fun2(&i);
// printing i here shows 10
I'm learning about pointers and have been told this: "The purpose of pointers is to allow you to manually, directly access a block of memory."
Say I have int var = 5;. Can't I use the variable 'var' to access the block of memory where the value 5 is stored, since I can change the value of the variable whenever I want var = 6;? Do I really need a pointer when I can access any variable's value just by using its variable, instead of using a pointer that points to the address where the value is stored?
"The purpose of pointers is to allow you to manually, directly access a block of memory."
This is not always true. Consider
*(int*)(0x1234) = some_value;
this is "direct" memory access. Though
int a = some_value, *ptr = &a;
*ptr = some_other_value;
you are now accessing a indirectly.
Can't I use the variable 'var' to access the block of memory where the
value 5 is stored, since I can change the value of the variable
whenever I want var = 6; ?
Surely; but the semantics is different.
Do I really need a pointer when I can access any variable's value just by using its variable, instead of using a pointer that points to the address where the value is stored?
No, you don't. Consider the first example: within the scope where a has been declared, modifying its value through ptr is rather pointless! However, what if you are not within the scope of a? That is
void foo(int x)
{
x = 5;
}
int main(void)
{
int x = 10;
foo(x);
}
In foo, when you do x = 5, there is an ambiguity: do you want to modify foo::x or main::x? C resolves it in favor of foo's own copy. Modifying main::x has to be "requested" explicitly, and the fact that this happens through pointers (or, better, through indirection) is a language choice. Other languages make other choices.
Pointer types have some traits that make them really useful:
It's guaranteed that a pointer is large enough to hold any address supported by the architecture (on 32-bit x86 that is 32 bits, a.k.a. 4 bytes, and on x64 it is 64 bits, a.k.a. 8 bytes).
Dereferencing and indexing the memory is done per object, not per byte.
int buffer[10];
char *x = (char *)buffer;
int *y = buffer;
That way, x[1] isn't y[1]: x[1] is the second byte of the buffer, y[1] is the second int.
Neither is guaranteed if you use plain ints to hold your addresses. The first trait is at least guaranteed by uintptr_t (but not by size_t, although most of the time they have the same size; for example, size_t can be 2 bytes on systems with a segmented memory layout while uintptr_t is still 4 bytes).
While using ints might work at first, you always:
have to turn the value into a pointer
have to dereference the pointer
and have to make sure that you don't go beyond certain values for your "pointer". For a 16-bit int, you cannot go beyond 0xFFFF, for 32-bit it's 0xFFFFFFFF; once you do, your "pointer" might overflow without you noticing until it's too late.
That is also the reason why linked lists and pointers to incomplete types work: the compiler already knows the size of the pointers you are going to use, and just allocates memory for them. On mainstream platforms all object pointers have the same size (4 or 8 bytes on 32-bit/64-bit architectures); the pointed-to type just tells the compiler how to dereference the value. char*s take up the same space as void*s, but you cannot dereference void*s. The compiler won't let you.
Also, if you are just dealing with simple integers, there's a good chance that you will slow down your program significantly due to something called "aliasing", which basically forces the compiler to re-read the value at a given address all the time. Memory accesses are slow, though, so you want the compiler to be able to optimize them out.
You can compare a memory address to a street address:
If you order something, you tell the shop your address, so that they can send you what you bought.
If you don't want to use your address, you have to send them your house, such that they can place the parcel inside. Later they return your house to you. This is a bit more cumbersome than using the address!
If you're not at home, the parcel can be delivered to your neighbor if they have your address, but this is not possible if you sent them your house instead.
The same is true for pointers: They are small and can be transported easily, while the object they point to might be large, and less easily transportable.
With pointer arithmetic, pointers can also be used to reach objects other than the one they originally pointed to, such as neighboring elements of the same array.
When you call a function from main() or from another function, the called function can only return one value.
Let's say you want three values changed: you pass pointers to them to the called function. That way you don't have to use global variables.
One possible advantage is that it can make it easier to have one function modify a variable that will be used by many other functions. Without pointers, your best option is for the modifying function to return a value to the caller and then to pass this value to the other functions. That can lead to a lot of passing around. Instead, you can give the modifying function a pointer where it stores its output, and all the other functions directly access that memory address. Kind of like global variables.
I have just started learning pointers in C and have the following doubts. Answers to the questions below would really help me understand the concept of pointers in C. Thanks in advance.
i)
char *cptr;
int value = 2345;
cptr = (char *)value;
What is the use of (char *), and what does it mean in the above code snippet?
ii)
char *cptr;
int value = 2345;
cptr = value;
This also compiles without any error. Then what's the difference between code snippets i and ii?
iii) &value returns the address of the variable. Is it a virtual memory address in RAM? Suppose another C program is running in parallel: can that program have the same memory address as &value? Can each process have addresses that duplicate those of another process while staying independent of each other?
iv)
#include <stdio.h>
#define MY_REGISTER (*(volatile unsigned char*)0x1234)
int main(void)
{
MY_REGISTER=12;
printf("value in the address tamil is %d",(MY_REGISTER));
return 0;
}
The above snippet compiles successfully, but running it produces a segmentation fault. I don't know what mistake I am making. I want to know how to access the value at an arbitrary address using pointers. Is there any way? Will the program really have the address 0x1234?
v) printf("value at the address %d",*(236632)); //consider the address 236632 available in the stack
Why does the above printf statement show an error?
1. That's a type cast, it tells the compiler to treat one type as some other (possibly unrelated) type. As for the result, see point 2 below.
2. That makes cptr point to the address 2345.
3. Modern operating systems isolate the processes. The address of one variable in one process is not valid in another process, even if started with the same program. In fact, the second process may have a completely different memory map due to Address Space Layout Randomisation (ASLR).
4. It's because you try to write to address 0x1234 which might be a valid address on some systems, but not on most, and almost never on a PC running e.g. Windows or Linux.
i)
(char *) means that you cast the data stored in value to a pointer ptr, which points to a char. This means that ptr points to the memory location 2345. In your code snippet ptr is undefined, though. I guess there is more in that program.
ii)
The difference is that you now write to cptr, which is (as you defined) a pointer to char. There is not much of a difference from i), except that you write to a different variable and rely on an implicit conversion, which the compiler performs for you (most compilers will at least warn about it). Again, cptr now points to location 2345 and expects a char there.
iii)
Yes you can say it is a virtual address. Also segmentation plays some parts in this game, but at your stage you don't need to worry about it at all. The OS will resolve that for you and makes sure, that you only overwrite variables in the memory space dedicated to your program. So if you run a program twice at the same time, and you print a pointer, it is most likely the same value, but they won't point at the same value in memory.
iv)
Didn't see the write instruction at first. You can't just write anywhere into memory, as you could overwrite another program's value.
v)
Similar issue as above. You cannot just dereference any number you want to; you first need to cast it to a pointer, otherwise neither the compiler, your OS nor your CPU will have a clue what exactly it is pointing to.
Hope I could help you, but I recommend, that you dive again in some books about pointers in C.
i.) Type cast: you cast the integer to a char pointer (char *).
ii.) You point to the address of 2345.
iii.) Refer to answer from Joachim Pileborg. ^ ASLR
iv.) You can't directly write into an address without knowing if there's already something in / if it even exists.
v.) Because you're applying * to a plain integer rather than to a pointer, which should produce an error such as MSVC's C2100: illegal indirection.
You may think pointers like numbers on mailboxes. When you set a value to a pointer, e.g cptr = 2345 is like you move in front of mailbox 2345. That's ok, no actual interaction with the memory, hence no crash. When you state something like *cptr, this refers to the actual "content of the mailbox". Setting a value for *cptr is like trying to put something in the mailbox in front of you (memory location). If you don't know who it belongs to (how the application uses that memory), it's probably a bad idea. You could use "malloc" to initialize a pointer / allocate memory, and "free" to cleanup after you finish the job.
I've just started to learn C so please be kind.
From what I've read so far regarding pointers:
int * test1; //this is a pointer which is basically an address to the process
//memory and usually has the size of 2 bytes (not necessarily, I know)
float test2; //this is an actual value and usually has the size of 4 bytes,
//being of float type
test2 = 3.0; //this assigns 3 to `test2`
Now, what I don't completely understand:
*test1 = 3; //does this assign 3 at the address
//specified by `pointerValue`?
test1 = 3; //this says that the pointer is basically pointing
//at the 3rd byte in process memory,
//which is somehow useless, since anything could be there
&test1; //this I really don't get,
//is it the pointer to the pointer?
//Meaning, the address at which the pointer address is kept?
//Is it of any use?
Similarly:
*test2; //does this has any sense?
&test2; //is this the address at which the 'test2' value is found?
//If so, it's a pointer, which means that you can have pointers pointing
//both to the heap address space and stack address space.
//I ask because I've always been confused by people who speak about
//pointers only in the heap context.
Great question.
Your first block is correct. A pointer is a variable that holds the address of some data. The type of that pointer tells the code how to interpret the contents of the address being held by that pointer.
The construct:
*test1 = 3
Is called dereferencing a pointer. That means you can access the address that the pointer points to, and read and write to it like a normal variable. Note:
int *test;
/*
* test is a pointer to an int - (int *)
* *test behaves like an int - (int)
*
* So you can think of (*test) as a pseudo-variable which has the type 'int'
*/
The above is just a mnemonic device that I use.
It is rare that you ever assign a numeric value to a pointer... maybe if you're developing for a specific environment which has some 'well-known' memory addresses, but at your level, I wouldn't worry too much about that.
Using
*test2
would result in an error. You'd be trying to dereference something that is not a pointer: since test2 is a float, the compiler rejects the expression outright.
&test1 and &test2 are, indeed, pointers to test1 and test2.
Pointers to pointers are very useful and a search of pointer to a pointer will lead you to some resources that are way better than I am.
It looks like you've got the first part right.
An incidental thought: there are various conventions about where to put that * sign. I prefer mine nestled with the variable name, as in int *test1 while others prefer int* test1. I'm not sure how common it is to have it floating in the middle.
Another incidental thought: test2 = 3.0 assigns a floating-point 3 to test2. The same end could be achieved with test2=3, in which case the 3 is implicitly converted from an integer to a floating point number. The convention you have chosen is probably safer in terms of clarity, but is not strictly necessary.
Non-incidentals
*test1=3 does assign 3 to the address specified by test1 (which is only safe once test1 points at a valid int).
test1=3 is a line that has meaning, but which I consider meaningless. We do not know what is at memory location 3, if it is safe to touch it, or even if we are allowed to touch it.
That's why it's handy to use something like
int var=3;
int *pointy=&var;
*pointy=4;
//Now var==4.
The command &var returns the memory location of var and stores it in pointy so that we can later access it with *pointy.
But I could also do something like this:
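int var[]={1,2,3};
int *pointy=var; //an array name converts to a pointer to its first element
int offset=2;
*(pointy+offset)=4;
//Now var[2]==4.
Note that the offset here is a plain integer, not a pointer: adding an integer n to a pointer moves it forward by n elements. Small numbers like 3 belong in offsets such as this one, rather than in the pointer itself as in test1=3.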
&test1 is a pointer to a pointer, but that sounds kind of confusing to me. It's really the address in memory where the value of test1 is stored. And test1 just happens to store as its value the address of another variable. Once you start thinking of pointers in this way (address in memory, value stored there), they become easier to work with... or at least I think so.
I don't know if *test2 has "meaning", per se. In principle, it could have a use in that we might imagine that the * command will take the value of test2 to be some location in memory, and it will return the value it finds there. But since you define test2 as a float, it is difficult to predict where in memory we would end up, setting test2=3 will not move us to the third spot of anything (look up the IEEE754 specification to see why). But I would be surprised if a compiler would allow such thing.
Let's look at another quick example:
int var=3;
int *pointy1=&var;
int **pointy2=&pointy1;
*pointy1=4; //Now var==4
**pointy2=5; //Now var==5
So you see that you can chain pointers together like this, as many in a row as you'd like. This might show up if you had an array of pointers which was filled with the addresses of many structures you'd created from dynamic memory, and those structures contained pointers to dynamically allocated things themselves. When the time comes to use a pointer to a pointer, you'll probably know it. For now, don't worry too much about them.
First let's add some confusion: the word "pointer" can refer to either a variable (or object) with a pointer type, or an expression with the pointer type. In most cases, when people talk about "pointers" they mean pointer variables.
A pointer can (must) point to a thing (An "object" in standards parlance). It can only point to the right kind of thing; a pointer to int is not supposed to point to a float object. A pointer can also be NULL; in that case there is no thing to point to.
A pointer type is also a type, and a pointer object is also an object. So it is allowable to construct a pointer to pointer: the pointer-to-pointer just stores the address of the pointer object.
What a pointer can not be:
It cannot point to a value: p = &4; is impossible. 4 is a literal value, which is not stored in an object, and thus has no address.
the same goes for expressions: p = &(1+4); is impossible, because the expression "1+4" does not have a location.
the same goes for return values: p = &sin(pi); is impossible; the return value is not an object and thus has no address.
variables marked as "register" (almost extinct now) cannot have an address.
you cannot take the address of a bit-field, basically because these can be smaller than a character (or have a finer granularity), hence it would be possible that different bit-fields would have the same address.
There are some "exceptions" to the above skeleton (void pointers, casting, pointing one element beyond an array object) but for clarity these should be seen as refinements/amendments, IMHO.
Just trying to understand how to address a single character in an array of strings. Also, this of course will allow me to understand pointers to pointers subscripting in general.
If I have char **a and I want to reach the 3rd character of the 2nd string, does this work: **((a+1)+2)? Seems like it should...
Almost, but not quite. The correct answer is:
*((*(a+1))+2)
because you need to first de-reference to one of the actual string pointers, and then de-reference that selected string pointer down to the desired character. (Note that I added extra parentheses for clarity in the order of operations there).
Alternatively, this expression:
a[1][2]
will also work!....and perhaps would be preferred because the intent of what you are trying to do is more self evident and the notation itself is more succinct. This form may not be immediately obvious to people new to the language, but understand that the reason the array notation works is because in C, an array indexing operation is really just shorthand for the equivalent pointer operation. ie: *(a+x) is same as a[x]. So, by extending that logic to the original question, there are two separate pointer de-referencing operations cascaded together whereby the expression a[x][y] is equivalent to the general form of *((*(a+x))+y).
You don't have to use pointers.
#include <stdio.h>
int main(int argc, char **argv){
    if (argc < 2) return 1; /* expects one argument */
    printf("The third character of argv[1] is [%c].\n", argv[1][2]);
    return 0;
}
Then:
$ ./main hello
The third character of argv[1] is [l].
That's a one and an l.
You could use pointers if you want...
*(argv[1] +2)
or even
*((*(a+1))+2)
As someone pointed out above.
This works because a[x] is defined as *(a+x), and an array name is converted to a pointer to its first element in most expressions (the array itself is not a pointer).
There's a brilliant C programming explanation in the book Hacking: The Art of Exploitation, 2nd Edition by Jon Erickson, which discusses pointers and strings; it is worth a mention for the programming explanation section alone: https://leaksource.files.wordpress.com/2014/08/hacking-the-art-of-exploitation.pdf.
Although the question has already been answered, someone else wanting to know more may find the following highlights from Erickson's book useful to understand some of the structure behind the question.
Headers
Examples of header files available for variable manipulation you will probably use.
stdio.h - http://www.cplusplus.com/reference/cstdio/
stdlib.h - http://www.cplusplus.com/reference/cstdlib/
string.h - http://www.cplusplus.com/reference/cstring/
limits.h - http://www.cplusplus.com/reference/climits/
Functions
Examples of general purpose functions you will probably use.
malloc() - http://www.cplusplus.com/reference/cstdlib/malloc/
calloc() - http://www.cplusplus.com/reference/cstdlib/calloc/
strcpy() - http://www.cplusplus.com/reference/cstring/strcpy/
Memory
"A compiled program’s memory is divided into five segments: text, data, bss, heap, and stack. Each segment represents a special portion of memory that is set aside for a certain purpose. The text segment is also sometimes called the code segment. This is where the assembled machine language instructions of the program are located".
"The execution of instructions in this segment is nonlinear, thanks to the aforementioned high-level control structures and functions, which compile
into branch, jump, and call instructions in assembly language. As a program
executes, the EIP is set to the first instruction in the text segment. The
processor then follows an execution loop that does the following:"
"1. Reads the instruction that EIP is pointing to"
"2. Adds the byte length of the instruction to EIP"
"3. Executes the instruction that was read in step 1"
"4. Goes back to step 1"
"Sometimes the instruction will be a jump or a call instruction, which
changes the EIP to a different address of memory. The processor doesn’t
care about the change, because it’s expecting the execution to be nonlinear
anyway. If EIP is changed in step 3, the processor will just go back to step 1 and read the instruction found at the address of whatever EIP was changed to".
"Write permission is disabled in the text segment, as it is not used to store variables, only code. This prevents people from actually modifying the program code; any attempt to write to this segment of memory will cause the program to alert the user that something bad happened, and the program
will be killed. Another advantage of this segment being read-only is that it
can be shared among different copies of the program, allowing multiple
executions of the program at the same time without any problems. It should
also be noted that this memory segment has a fixed size, since nothing ever
changes in it".
"The data and bss segments are used to store global and static program
variables. The data segment is filled with the initialized global and static variables, while the bss segment is filled with their uninitialized counterparts. Although these segments are writable, they also have a fixed size. Remember that global variables persist, despite the functional context (like the variable j in the previous examples). Both global and static variables are able to persist because they are stored in their own memory segments".
"The heap segment is a segment of memory a programmer can directly
control. Blocks of memory in this segment can be allocated and used for
whatever the programmer might need. One notable point about the heap
segment is that it isn’t of fixed size, so it can grow larger or smaller as needed".
"All of the memory within the heap is managed by allocator and deallocator algorithms, which respectively reserve a region of memory in the heap for use and remove reservations to allow that portion of memory to be reused for later reservations. The heap will grow and shrink depending on how
much memory is reserved for use. This means a programmer using the heap
allocation functions can reserve and free memory on the fly. The growth of
the heap moves downward toward higher memory addresses".
"The stack segment also has variable size and is used as a temporary scratch pad to store local function variables and context during function calls. This is what GDB’s backtrace command looks at. When a program calls a function, that function will have its own set of passed variables, and the function’s code will be at a different memory location in the text (or code) segment. Since the context and the EIP must change when a function is called, the stack is used to remember all of the passed variables, the location the EIP should return to after the function is finished, and all the local variables used by that function. All of this information is stored together on the stack in what is collectively called a stack frame. The stack contains many stack frames".
"In general computer science terms, a stack is an abstract data structure that is used frequently. It has first-in, last-out (FILO) ordering
, which means the first item that is put into a stack is the last item to come out of it. Think of it as putting beads on a piece of string that has a knot on one end—you can’t get the first bead off until you have removed all the other beads. When an item is placed into a stack, it’s known as pushing, and when an item is removed from a stack, it’s called popping".
"As the name implies, the stack segment of memory is, in fact, a stack data structure, which contains stack frames. The ESP register is used to keep track of the address of the end of the stack, which is constantly changing as items are pushed into and popped off of it. Since this is very dynamic behavior, it makes sense that the stack is also not of a fixed size. Opposite to the dynamic growth of the heap, as the stack change
s in size, it grows upward in a visual listing of memory, toward lower memory addresses".
"The FILO nature of a stack might seem odd, but since the stack is used
to store context, it’s very useful. When a function is called, several things are pushed to the stack together in a stack frame. The EBP register—sometimes called the frame pointer (FP) or local base (LB) pointer
—is used to reference local function variables in the current stack frame. Each stack frame contains the parameters to the function, its local variables, and two pointers that are necessary to put things back the way they were: the saved frame pointer (SFP) and the return address. The
SFP is used to restore EBP to its previous value, and the return address
is used to restore EIP to the next instruction found after the function call. This restores the functional context of the previous stack
frame".
Strings
"In C, an array is simply a list of n elements of a specific data type. A 20-character array is simply 20 adjacent characters located in memory. Arrays are also referred to as buffers".
#include <stdio.h>
int main()
{
char str_a[20];
str_a[0] = 'H';
str_a[1] = 'e';
str_a[2] = 'l';
str_a[3] = 'l';
str_a[4] = 'o';
str_a[5] = ',';
str_a[6] = ' ';
str_a[7] = 'w';
str_a[8] = 'o';
str_a[9] = 'r';
str_a[10] = 'l';
str_a[11] = 'd';
str_a[12] = '!';
str_a[13] = '\n';
str_a[14] = 0;
printf(str_a);
}
"In the preceding program, a 20-element character array is defined as
str_a, and each element of the array is written to, one by one. Notice that the number begins at 0, as opposed to 1. Also notice that the last character is a 0".
"(This is also called a null byte.) The character array was defined, so 20 bytes are allocated for it, but only 12 of these bytes are actually used. The null byte Programming at the end is used as a delimiter character to tell any function that is dealing with the string to stop operations right there. The remaining extra bytes are just garbage and will be ignored. If a null byte is inserted in the fifth element of the character array, only the characters Hello would be printed by the printf() function".
"Since setting each character in a character array is painstaking and strings are used fairly often, a set of standard functions was created for string manipulation. For example, the strcpy() function will copy a string from a source to a destination, iterating through the source string and copying each byte to the destination (and stopping after it copies the null termination byte)".
"The order of the functions arguments is similar to Intel assembly syntax destination first and then source. The char_array.c program can be rewritten using strcpy() to accomplish the same thing using the string library. The next version of the char_array program shown below includes string.h since it uses a string function".
#include <stdio.h>
#include <string.h>
int main()
{
char str_a[20];
strcpy(str_a, "Hello, world!\n");
printf(str_a);
}
Find more information on C strings
http://www.cs.uic.edu/~jbell/CourseNotes/C_Programming/CharacterStrings.html
http://www.tutorialspoint.com/cprogramming/c_strings.htm
Pointers
"The EIP register is a pointer that “points” to the current instruction during a programs execution by containing its memory address. The idea of pointers is used in C, also. Since the physical memory cannot actually be moved, the information in it must be copied. It can be very computationally expensive to copy large chunks of memory to be used by different functions or in different places. This is also expensive from a memory standpoint, since space for the new destination copy must be saved or allocated before the source can be copied. Pointers are a solution to this problem. Instead of copying a large block of memory, it is much simpler to pass around the address of the beginning of that block of memory".
"Pointers in C can be defined and used like any other variable type. Since memory on the x86 architecture uses 32-bit addressing, pointers are also 32 bits in size (4 bytes). Pointers are defined by prepending an asterisk (*) to the variable name. Instead of defining a variable of that type, a pointer is defined as something that points to data of that type. The pointer.c program is an example of a pointer being used with the char data type, which is only 1byte in size".
#include <stdio.h>
#include <string.h>
int main()
{
char str_a[20]; // A 20-element character array
char *pointer; // A pointer, meant for a character array
char *pointer2; // And yet another one
strcpy(str_a, "Hello, world!\n");
pointer = str_a; // Set the first pointer to the start of the array.
printf(pointer);
pointer2 = pointer + 2; // Set the second one 2 bytes further in.
printf(pointer2); // Print it.
strcpy(pointer2, "y you guys!\n"); // Copy into that spot.
printf(pointer); // Print again.
}
"As the comments in the code indicate, the first pointer is set at the beginning of the character array. When the character array is referenced like this, it is actually a pointer itself. This is how this buffer was passed as a pointer to the printf() and strcpy() functions earlier. The second pointer is set to the first pointers address plus two, and then some things are printed (shown in the output below)".
reader@hacking:~/booksrc $ gcc -o pointer pointer.c
reader@hacking:~/booksrc $ ./pointer
Hello, world!
llo, world!
Hey you guys!
reader@hacking:~/booksrc $
"The address-of operator is often used in conjunction with pointers, since pointers contain memory addresses. The addressof.c program demonstrates
the address-of operator being used to put the address of an integer variable
into a pointer. This line is shown in bold below".
#include <stdio.h>
int main()
{
int int_var = 5;
int *int_ptr;
int_ptr = &int_var; // put the address of int_var into int_ptr
}
"An additional unary operator called the dereference operator exists for use with pointers. This operator will return the data found in the address the pointer is pointing to, instead of the address itself. It takes the form of an asterisk in front of the variable name, similar to the declaration of a pointer. Once again, the dereference operator exists both in GDB and in C".
"A few additions to the addressof.c code (shown in addressof2.c) will
demonstrate all of these concepts. The added printf() functions use format
parameters, which I’ll explain in the next section. For now, just focus on the programs output".
#include <stdio.h>
int main()
{
int int_var = 5;
int *int_ptr;
int_ptr = &int_var; // Put the address of int_var into int_ptr.
// %p with a (void *) cast prints pointers portably; %x only happens to work for them on 32-bit x86.
printf("int_ptr = %p\n", (void *)int_ptr);
printf("&int_ptr = %p\n", (void *)&int_ptr);
printf("*int_ptr = 0x%08x\n\n", *int_ptr);
printf("int_var is located at %p and contains %d\n", (void *)&int_var, int_var);
printf("int_ptr is located at %p, contains %p, and points to %d\n\n", (void *)&int_ptr, (void *)int_ptr, *int_ptr);
}
"When the unary operators are used with pointers, the address-of operator can be thought of as moving backward, while the dereference operator moves forward in the direction the pointer is pointing".
Find out more about Pointers & memory allocation
Professor Dan Hirschberg, Computer Science Department, University of California on computer memory https://www.ics.uci.edu/~dan/class/165/notes/memory.html
http://cslibrary.stanford.edu/106/
http://www.programiz.com/c-programming/c-dynamic-memory-allocation
Arrays
There's a simple tutorial on multi-dimensional arrays by a chap named Alex Allain available here http://www.cprogramming.com/tutorial/c/lesson8.html
There's information on arrays by a chap named Todd A Gibson available here http://www.augustcouncil.com/~tgibson/tutorial/arr.html
Iterate an Array
#include <stdio.h>
int main()
{
int i;
char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
int int_array[5] = {1, 2, 3, 4, 5};
char *char_pointer;
int *int_pointer;
char_pointer = char_array;
int_pointer = int_array;
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer.
printf("[integer pointer] points to %p, which contains the integer %d\n", int_pointer, *int_pointer);
int_pointer = int_pointer + 1;
}
for(i=0; i < 5; i++) { // Iterate through the char array with the char_pointer.
printf("[char pointer] points to %p, which contains the char '%c'\n", char_pointer, *char_pointer);
char_pointer = char_pointer + 1;
}
}
Linked Lists vs Arrays
Arrays are not the only option available. Here is some information on linked lists:
http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_linklist.aspx
Conclusion
This information was written simply to pass on some of what I have read throughout my research on the topic that might help others.
Iirc, a string is actually an array of chars, so this should work:
a[1][2]
Quote from the wikipedia article on C pointers -
In C, array indexing is formally defined in terms of pointer arithmetic; that is,
the language specification requires that array[i] be equivalent to *(array + i). Thus in C, arrays can be thought of as pointers to consecutive areas of memory (with no gaps),
and the syntax for accessing arrays is identical for that which can be used to dereference
pointers. For example, an array can be declared and used in the following manner:
int array[5]; /* Declares 5 contiguous (per Plauger Standard C 1992) integers */
int *ptr = array; /* Arrays can be used as pointers */
ptr[0] = 1; /* Pointers can be indexed with array syntax */
*(array + 1) = 2; /* Arrays can be dereferenced with pointer syntax */
So, in response to your question - yes, pointers to pointers can be used as an array without any kind of other declaration at all!
Try a[1][2]. Or *(*(a+1)+2).
Basically, array references are syntactic sugar for pointer dereferencing. a[2] is the same as *(a+2), and also the same as 2[a] (if you really want unreadable code). An array of strings can be indexed the same way as a double pointer, even though the two are different types. So you can extract the second string using either a[1] or *(a+1). You can then find the third character in that string (call it 'b' for now) with either b[2] or *(b + 2). Substituting the original second string for 'b', we end up with either a[1][2] or *(*(a+1)+2).