As part of our training in the Academy of Programming Languages, we also learned C. During the test, we encountered the question of what the program output would be:
#include <stdio.h>
#include <string.h>
int main(){
char str[] = "hmmmm..";
const char * const ptr1[] = {"to be","or not to be","that is the question"};
char *ptr2 = "that is the qusetion";
(&ptr2)[3] = str;
strcpy(str,"(Hamlet)");
for (int i = 0; i < sizeof(ptr1)/sizeof(*ptr1); ++i){
printf("%s ", ptr1[i]);
}
printf("\n");
return 0;
}
Later, after examining the answers, it became clear that the cell (& ptr2)[3] was identical to the memory cell in &ptr1[2], so the output of the program is: to be or not to be (Hamlet)
My question is, is it possible to know, only by written code in the notebook, without checking any compiler, that a certain pointer (or all variables in general) follow or precede other variables in memory?
Note, I do not mean array variables, so all the elements in the array must be in sequence.
In this statement:
(&ptr2)[3] = str;
ptr2 was defined with char *ptr2 inside main. With this definition, the compiler is responsible for providing storage for ptr2. The compiler is allowed to use whatever storage it wants for this—it could be before ptr1, it could be after ptr1, it could be close, it could be far away.
Then &ptr2 takes the address of ptr2. This is allowed, but we do not know where that address will be in relation to ptr1 or anything else, because the compiler is allowed to use whatever storage it wants.
Since ptr2 is a char *, &ptr2 is a pointer to char *, also known as char **.
Then (&ptr2)[3] attempts to refer to element 3 of an array of char * that is at &ptr2. But there is no array there in C’s model of computation. There is just one char * there. When you try to refer to element of 3 of an array when there is no element 3 of an array, the behavior is not defined by the C standard.
Thus, this code is a bad example. It appears the test author misunderstood C, and this code does not illustrate what was intended.
char *ptr2 = some initializer;
(&ptr2)[3] = str;
When you evaluate &ptr2, you obtain the address of memory where is stored the pointer that points to that initializer.
When you do (&ptr2)[3]=something you try to write 3*sizeof(void*) locations further from the location of ptr2, the address of a string. This is invalid and almost sure it finishes with segmentation fault.
No, it's not possible and no such assumptions can be made.
By writing outside a variable's space, this code invokes undefined behavior, it's basically "illegal" and anything can happen when you run it. The C language specification says nothing about variables being allocated on a stack in some particular order that you can exploit, it does however say that accessing random memory is undefined behavior.
Basically this code is pretty horrible and should never be used, even less so in a teaching environment. It makes me sad, how people mis-understand C and still teach it to others. :/
A program usually is loaded in memory with this structure:
Stack, Mmap'ed files, Heap, BSS (uninitialized static variables), Data segment (Initialized static variables) and Text (Compiled code)
You can learn more here:
https://manybutfinite.com/post/anatomy-of-a-program-in-memory/
Depending on how you declare the variable it will go to one of the places said before.
The compiler will arrange the BSS and Data segment variables as he wishes on compilation time so usually no chance. Neither heap vars (the OS will get the memory block that fits better the space allocated)
In the stack (which is a LIFO structure) the variables are put one over eachother so if you have:
int a = 5;
int b = 10;
You can say that a and b will be placed one following the other. So, in this case you can tell.
There is another exception and that is if the variable is an structure or an array, they are always placed like i said before, each one following the last.
In your code ptr1 is an array of arrays of chars so it will follow the exception i said.
In fact, do the following exercise:
#include <stdio.h>
#include <string.h>
int main(){
const char * const ptr1[] = {"to be","or not to be","that is the question"};
for (int i = 0; i < 3; i++) {
for (int j = 0; j < strlen(ptr1[i]); j++)
printf("%p -> %c\n", &ptr1[i][j], ptr1[i][j]);
printf("\n");
}
}
and you will see the memory address and its content!
Have a nice day.
Related
I have a question about this code below:
#include <stdio.h>
char abcd(char array[]);
int main(void)
{
char array[4] = { 'a', 'b', 'c', 'd' };
printf("%c\n", abcd(array));
return 0;
}
char abcd(char array[])
{
char *p = array;
while (*p) {
putchar(*p);
p++;
}
putchar(*p);
putchar(p[4]);
return *p;
}
Why isn't segmentation fault generated when this program comes across putchar(*p) right after exiting while loop? I think that after *p went beyond the array[3] there is supposed to be no value assigned to other memory locations. For example, trying to access p[4] would be illegal because it would be out of the bound, I thought. On the contrary, this program runs with no errors. Is this because any other memories which no value are assigned (in this case any other memories than array[4]) should be null, whose value is '\0'?
OP seems to think accessing an array out-of-bounds, something special should happen.
Accessing outside array bounds is undefined behavior (UB). Anything may happen.
Let's clarify what a undefined behavior is.
The C standard is a contract between the developer and the compiler as to what the code means. However, it just so happens that you can write things that are just outside what is defined by the standard.
One of the most common cases is trying to do out-of-bounds access. Other languages say that this should result in an exception or another error. C does not. An argument is that it would imply adding costly checks at every array access.
The compiler does not know that what you are writing is undefined behavior¹. Instead, the compiler assumes that what you write contains no undefined behavior, and translate your code to assembly accordingly.
If you want an example, compile the code below with or without optimizations:
#include <stdio.h>
int table[4] = {0, 0, 0, 0};
int exists_in_table(int v)
{
for (int i = 0; i <= 4; i++) {
if (table[i] == v) {
return 1;
}
}
return 0;
}
int main(void) {
printf("%d\n", exists_in_table(3));
}
Without optimizations, the assembly I get from gcc does what you might expect: it just goes too far in the memory, which might cause a segfault if the array is allocated right before a page boundary.
With optimizations, however, the compiler looks at your code and notices that it cannot exit the loop (otherwise, it would try to access table[4], which cannot be), so the function exists_in_table necessarily returns 1. And we get the following, valid, implementation:
exists_in_table(int):
mov eax, 1
ret
Undefined behavior means undefined. They are very tricky to detect since they can be virtually invisible after compiling. You need advanced static analyzer to interpret the C source code and understand whether what it does can be undefined behavior.
¹ in the general case, that is; modern compilers use some basic static analyzer to detect the most common errors
C does no bounds checking on array accesses; because of how arrays and array subscripting are implemented, it can't do any bounds checking. It simply doesn't know that you've run past the end of the array. The operating environment will throw a runtime error if you cross a page boundary, but up until that point you can read or clobber any memory following the end of the array.
The behavior on subscripting past the end of the array is undefined - the language definition does not require the compiler or the operating environment to handle it any particular way. You may get a segfault, you may get corrupted data, you may clobber a frame pointer or return instruction address and put your code in a bad state, or it may work exactly as expected.
There are few remark points inside your program:
array inside the main and abcd function are different. In main, it is array of 4 elements, in abcd, it is an input variable with array type. If inside main, you call something like array[4] there will be compiler warnings for this. But there won't be compiler warning if you call in side abcd.
*p is a pointer point to array or in other word, it point to first element of array. In C, there isn't any boundary or limit for p. Your program is lucky because the memory after array contains 0 value to stop the while(*p) loop. If you did check the address of pointer p (&p). It might not equal to array[4].
I've written a function that returns an array whilst I know that I should return a dynamically allocated pointer instead, but still I wanted to know what happens when I am returning an array declared locally inside a function (without declaring it as static), and I got surprised when I noticed that the memory of the internal array in my function wasn't deallocated, and I got my array back to main.
The main:
int main()
{
int* arr_p;
arr_p = demo(10);
return 0;
}
And the function:
int* demo(int i)
{
int arr[10] = { 0 };
for (int i = 0; i < 10; i++)
{
arr[i] = i;
}
return arr;
}
When I dereference arr_p I can see the 0-9 integers set in the demo function.
Two questions:
How come when I examined arr_p I saw that its address is the same as arr which is in the demo function?
How come demo_p is pointing to data which is not deallocated (the 0-9 numbers) already in demo? I expected that arr inside demo will be deallocated as we got out of demo scope.
One of the things you have to be careful of when programming is to pay good attention to what the rules say, and not just to what seems to work. The rules say you're not supposed to return a pointer to a locally-allocated array, and that's a real, true rule.
If you don't get an error when you write a program that returns a pointer to a locally-allocated array, that doesn't mean it was okay. (Although, it means you really ought to get a newer compiler, because any decent, modern compiler will warn about this.)
If you write a program that returns a pointer to a locally-allocated array and it seems to work, that doesn't mean it was okay, either. Be really careful about this: In general, in programming, but especially in C, seeming to work is not proof that your program is okay. What you really want is for your program to work for the right reasons.
Suppose you rent an apartment. Suppose, when your lease is up, and you move out, your landlord does not collect your key from you, but does not change the lock, either. Suppose, a few days later, you realize you forgot something in the back of one closet. Suppose, without asking, you sneak back to try to collect it. What happens next?
As it happens, your key still works in the lock. Is this a total surprise, or mildly unexpected, or guaranteed to work?
As it happens, your forgotten item still is in the closet. It has not yet been cleared out. Is this a total surprise, or mildly unexpected, or guaranteed to happen?
In the end, neither your old landlord, nor the police, accost you for this act of trespass. Once more, is this a total surprise, or mildly unexpected, or just about completely expected?
What you need to know is that, in C, reusing memory you're no longer allowed to use is just about exactly analogous to sneaking back in to an apartment you're no longer renting. It might work, or it might not. Your stuff might still be there, or it might not. You might get in trouble, or you might not. There's no way to predict what will happen, and there's no (valid) conclusion you can draw from whatever does or doesn't happen.
Returning to your program: local variables like arr are usually stored on the call stack, meaning they're still there even after the function returns, and probably won't be overwritten until the next function gets called and uses that zone on the stack for its own purposes (and maybe not even then). So if you return a pointer to locally-allocated memory, and dereference that pointer right away (before calling any other function), it's at least somewhat likely to "work". This is, again, analogous to the apartment situation: if no one else has moved in yet, it's likely that your forgotten item will still be there. But it's obviously not something you can ever depend on.
arr is a local variable in demo that will get destroyed when you return from the function. Since you return a pointer to that variable, the pointer is said to be dangling. Dereferencing the pointer makes your program have undefined behavior.
One way to fix it is to malloc (memory allocate) the memory you need.
Example:
#include <stdio.h>
#include <stdlib.h>
int* demo(int n) {
int* arr = malloc(sizeof(*arr) * n); // allocate
for (int i = 0; i < n; i++) {
arr[i] = i;
}
return arr;
}
int main() {
int* arr_p;
arr_p = demo(10);
printf("%d\n", arr_p[9]);
free(arr_p) // free the allocated memory
}
Output:
9
How come demo_p is pointing to data which is not deallocated (the 0-9 numbers) already in demo? I expected that arr inside demo will be deallocated as we got out of demo scope.
The life of the arr object has ended and reading the memory addresses previously occupied by arr makes your program have undefined behavior. You may be able to see the old data or the program may crash - or do something completely different. Anything can happen.
… I noticed that the memory of the internal array in my function wasn't deallocated…
Deallocation of memory is not something you can notice or observe, except by looking at the data that records memory reservations (in this case, the stack pointer). When memory is reserved or released, that is just a bookkeeping process about what memory is available or not available. Releasing memory does not necessarily erase memory or immediately reuse it for another purpose. Looking at the memory does not necessarily tell you whether it is in use or not.
When int arr[10] = { 0 }; appears inside a function, it defines an array that is allocated automatically when the function starts executing (or at certain times within the function execution if the definition is in some nested scope). This is commonly done by adjusting the stack pointer. In common systems, programs have a region of memory called the stack, and a stack pointer contains an address that marks the end of the portion of the stack that is currently reserved for use. When a function starts executing, the stack pointer is changed to reserve more memory for that function’s data. When execution of the function ends, the stack pointer is changed to release that memory.
If you keep a pointer to that memory (how you can do that is another matter, discussed below), you will not “notice” or “observe” any change to that memory immediately after the function returns. That is why you see the value of arr_p is the address that arr had, and it is why you see the old data in that memory.
If you call some other function, the stack pointer will be adjusted for the new function, that function will generally use the memory for its own purposes, and then the contents of that memory will have changed. The data you had in arr will be gone. A common example of this that beginners happen across is:
int main(void)
{
int *p = demo(10);
// p points to where arr started, and arr’s data is still there.
printf("arr[3] = %d.\n", p[3]);
// To execute this call, the program loads data from p[3]. Since it has
// not changed, 3 is loaded. This is passed to printf.
// Then printf prints “arr[3] = 3.\n”. In doing this, it uses memory
// on the stack. This changes the data in the memory that p points to.
printf("arr[3] = %d.\n", p[3]);
// When we try the same call again, the program loads data from p[3],
// but it has been changed, so something different is printed. Two
// different things are printed by the same printf statement even
// though there is no visible code changing p[3].
}
Going back to how you can have a copy of a pointer to memory, compilers follow rules that are specified abstractly in the C standard. The C standard defines an abstract lifetime of the array arr in demo and says that lifetime ends when the function returns. It further says the value of a pointer becomes indeterminate when the lifetime of the object it points to ends.
If your compiler is simplistically generating code, as it does when you compile using GCC with -O0 to turn off optimization, it typically keeps the address in p and you will see the behaviors described above. But, if you turn optimization on and compile more complicated programs, the compiler seeks to optimize the code it generates. Instead of mechanically generating assembly code, it tries to find the “best” code that performs the defined behavior of your program. If you use a pointer with indeterminate value or try to access an object whose lifetime has ended, there is no defined behavior of your program, so optimization by the compiler can produce results that are unexpected by new programmers.
As you know dear, the existence of a variable declared in the local function is within that local scope only. Once the required task is done the function terminates and the local variable is destroyed afterwards. As you are trying to return a pointer from demo() function ,but the thing is the array to which the pointer points to will get destroyed once we come out of demo(). So indeed you are trying to return a dangling pointer which is pointing to de-allocated memory. But our rule suggests us to avoid dangling pointer at any cost.
So you can avoid it by re-initializing it after freeing memory using free(). Either you can also allocate some contiguous block of memory using malloc() or you can declare your array in demo() as static array. This will store the allocated memory constant also when the local function exits successfully.
Thank You Dear..
#include<stdio.h>
#define N 10
int demo();
int main()
{
int* arr_p;
arr_p = demo();
printf("%d\n", *(arr_p+3));
}
int* demo()
{
static int arr[N];
for(i=0;i<N;i++)
{
arr[i] = i;
}
return arr;
}
OUTPUT : 3
Or you can also write as......
#include <stdio.h>
#include <stdlib.h>
#define N 10
int* demo() {
int* arr = (int*)malloc(sizeof(arr) * N);
for(int i = 0; i < N; i++)
{
arr[i]=i;
}
return arr;
}
int main()
{
int* arr_p;
arr_p = demo();
printf("%d\n", *(arr_p+3));
free(arr_p);
return 0;
}
OUTPUT : 3
Had the similar situation when i have been trying to return char array from the function. But i always needed an array of a fixed size.
Solved this by declaring a struct with a fixed size char array in it and returning that struct from the function:
#include <time.h>
typedef struct TimeStamp
{
char Char[9];
} TimeStamp;
TimeStamp GetTimeStamp()
{
time_t CurrentCalendarTime;
time(&CurrentCalendarTime);
struct tm* LocalTime = localtime(&CurrentCalendarTime);
TimeStamp Time = { 0 };
strftime(Time.Char, 9, "%H:%M:%S", LocalTime);
return Time;
}
I posted a question about some pointer issues I've been having earlier in this question:
C int pointer segmentation fault several scenarios, can't explain behaviour
From some of the comments, I've been led to believe that the following:
#include <stdlib.h>
#include <stdio.h>
int main(){
int *p;
*p = 1;
printf("%d\n", *p);
return 0;
}
is undefined behaviour. Is this true? I do this all the time, and I've even seen it in my C course.
However, when I do
#include <stdlib.h>
#include <stdio.h>
int main(){
int *p=NULL;
*p = 1;
printf("%d\n", *p);
return 0;
}
I get a seg fault right before printing the contents of p (after the line *p=1;). Does this mean I should have always been mallocing any time I actually assign a value for a pointer to point to?
If that's the case, then why does char *string = "this is a string" always work?
I'm quite confused, please help!
This:
int *p;
*p = 1;
Is undefined behavior because p isn't pointing anywhere. It is uninitialized. So when you attempt to dereference p you're essentially writing to a random address.
What undefined behavior means is that there is no guarantee what the program will do. It might crash, it might output strange results, or it may appear to work properly.
This is also undefined behaivor:
int *p=NULL;
*p = 1;
Because you're attempting to dereference a NULL pointer.
This works:
char *string = "this is a string" ;
Because you're initializing string with the address of a string constant. It's not the same as the other two cases. It's actually the same as this:
char *string;
string = "this is a string";
Note that here string isn't being dereferenced. The pointer variable itself is being assigned a value.
Yes, doing int *p; *p = 1; is undefined behavior. You are dereferencing an uninitialized pointer (accessing the memory to which it points). If it works, it is only because the garbage in p happened to be the address of some region of memory which is writable, and whose contents weren't critical enough to cause an immediate crash when you overwrote them. (But you still might have corrupted some important program data causing problems you won't notice until later...)
An example as blatant as this should trigger a compiler warning. If it doesn't, figure out how to adjust your compiler options so it does. (On gcc, try -Wall -O).
Pointers have to point to valid memory before they can be dereferenced. That could be memory allocated by malloc, or the address of an existing valid object (p = &x;).
char *string = "this is a string"; is perfectly fine because this pointer is not uninitialized; you initialized it! (The * in char *string is part of its declaration; you aren't dereferencing it.) Specifically, you initialized it with the address of some memory which you asked the compiler to reserve and fill in with the characters this is a string\0. Having done that, you can safely dereference that pointer (though only to read, since it is undefined behavior to write to a string literal).
is undefined behaviour. Is this true?
Sure is. It just looks like it's working on your system with what you've tried, but you're performing an invalid write. The version where you set p to NULL first is segfaulting because of the invalid write, but it's still technically undefined behavior.
You can only write to memory that's been allocated. If you don't need the pointer, the easiest solution is to just use a regular int.
int p = 1;
In general, avoid pointers when you can, since automatic variables are much easier to work with.
Your char* example works because of the way strings work in C--there's a block of memory with the sequence "this is a string\0" somewhere in memory, and your pointer is pointing at that. This would be read-only memory though, and trying to change it (i.e., string[0] = 'T';) is undefined behavior.
With the line
char *string = "this is a string";
you are making the pointer string point to a place in read-only memory that contains the string "this is a string". The compiler/linker will ensure that this string will be placed in the proper location for you and that the pointer string will be pointing to the correct location. Therefore, it is guaranteed that the pointer string is pointing to a valid memory location without any further action on your part.
However, in the code
int *p;
*p = 1;
p is uninitialized, which means it is not pointing to a valid memory location. Dereferencing p will therefore result in undefined behavior.
It is not necessary to always use malloc to make p point to a valid memory location. It is one possible way, but there are many other possible ways, for example the following:
int i;
int *p;
p = &i;
Now p is also pointing to a valid memory location and can be safely dereferenced.
Consider the code:
#include <stdio.h>
int main(void)
{
int i=1, j=2;
int *p;
... some code goes here
*p = 3;
printf("%d %d\n", i, j);
}
Would the statement *p = 2; write to i, j, or neither? It would write to i or j if p points to that object, but not if p points somewhere else. If the ... portion of the code doesn't do anything with p, then p might happen point to i, or j, or something within the stdout object, or anything at all. If it happens to point to i or j, then the write *p = 3; might affect that object without any side effects, but if it points to information within stdout that controls where output goes, it might cause the following printf to behave in unpredictable fashion. In a typical implementation, p might point anywhere, and there will be so many things to which p might point that it would be impossible to predict all of the possible effects of writing to them.
Note that the Standard classifies many actions as "Undefined Behavior" with the intention that many or even most implementations will extend the semantics of the language by documenting their behavior. Most implementations, for example, extend the meaning of the << operator to allow it to be used to multiply negative numbers by power of two. Even on implementations that extend the language to specify that an assignment like *p = 3; will always perform a word-sized write of the value 3 to the indicated address, with whatever consequence results, there would be relatively few platforms(*) where it would be possible to fully characterize all possible effects of that action in cases where nothing is known about the value of p. In cases where pointers are read rather than written, some systems may be able to offer useful behavioral guarantees about the effect of arbitrary stray reads, but not all(**).
(*) Some freestanding platforms which keep code in read-only storage may be able to uphold some behavioral guarantees even if code writes to arbitrary pointer addresses. Such behavioral guarantees may be useful in systems whose state might be corrupted by electrical interference, but even when targeting such systems writing to a stray pointer would never be useful.
(**) On many platforms, stray reads will either yield a meaningless value without side effects or force an abnormal program termination, but on an Apple II which a Disk II card in the customary slot-6 location, if code reads from address 0xC0EF within a second of performing a disk access, the drive head to start overwriting whatever happens to be on the last track accessed. This is by design (software that needs to write to the disk does so by accessing address 0xC0EF, and having hardware respond to both reads and writes required one less logic gate--and thus one less chip--than would be required for hardware that only responded to writes) but does mean that code must be careful not to perform any stray reads.
Do C pointer (always) start with a valid address memory? For example If I have the following piece of code:
int *p;
*p = 5;
printf("%i",*p); //shows 5
Why does this piece of code work? According to books (that I read), they say a pointer always needs a valid address memory and give the following and similar example:
int *p;
int v = 5;
p = &v;
printf("%i",*p); //shows 5
Do C pointer (always) start with a valid address memory?
No.
Why does this code work?
The code invokes undefined behavior. If it appears to work on your particular system with your particular compiler options, that's merely a coincidence.
No. Uninitialized local variables have indeterminate values and using them in expressions where they get evaluated cause undefined behavior.
The behaviour is undefined. A C compiler can optimize the pointer access away, noting that in fact the p is not used, only the object *p, and replace the *p with q and effectively produce the program that corresponds to this source code:
#include <stdio.h>
int main(void) {
int q = 5;
printf("%i", q); //shows 5
}
Such is the case when I compile the program with GCC 7.3.0 and -O3 switch - no crash. I get a crash if I compile it without optimization. Both programs are standard-conforming interpretations of the code, namely that dereferencing a pointer that does not point to a valid object has undefined behaviour.
No.
On older time, it was common to initialize pointer to selected memory addresses (e.g. linked to hardware).
char *start_memory buffer = (char *)0xffffb000;
Compiler has no way to find if this is a valid address. This involve a cast, so it is cheating.
Consider
static int *p;
p will have the value of NULL, which doesn't point to a valid address (Linux, but on Kernel, it invalidate such address, other OS could use memory on &NULL to store some data.
But you may also create initialized variables, so with undefined initial values (which probably it is wrong).
Just trying to understand how to address a single character in an array of strings. Also, this of course will allow me to understand pointers to pointers subscripting in general.
If I have char **a and I want to reach the 3rd character of the 2nd string, does this work: **((a+1)+2)? Seems like it should...
Almost, but not quite. The correct answer is:
*((*(a+1))+2)
because you need to first de-reference to one of the actual string pointers and then you to de-reference that selected string pointer down to the desired character. (Note that I added extra parenthesis for clarity in the order of operations there).
Alternatively, this expression:
a[1][2]
will also work!....and perhaps would be preferred because the intent of what you are trying to do is more self evident and the notation itself is more succinct. This form may not be immediately obvious to people new to the language, but understand that the reason the array notation works is because in C, an array indexing operation is really just shorthand for the equivalent pointer operation. ie: *(a+x) is same as a[x]. So, by extending that logic to the original question, there are two separate pointer de-referencing operations cascaded together whereby the expression a[x][y] is equivalent to the general form of *((*(a+x))+y).
You don't have to use pointers.
int main(int argc, char **argv){
printf("The third character of
argv[1] is [%c].\n", argv[1][2]);
}
Then:
$ ./main hello The third character of
argv[1] is [l].
That's a one and an l.
You could use pointers if you want...
*(argv[1] +2)
or even
*((*(a+1))+2)
As someone pointed out above.
This is because array names are pointers.
Theres a brilliant C programming explanation in the book Hacking the art of exploitation 2nd Edition by Jon Erickson which discusses pointers, strings, worth a mention for the programming explanation section alone https://leaksource.files.wordpress.com/2014/08/hacking-the-art-of-exploitation.pdf.
Although the question has already been answered, someone else wanting to know more may find the following highlights from Ericksons book useful to understand some of the structure behind the question.
Headers
Examples of header files available for variable manipulation you will probably use.
stdio.h - http://www.cplusplus.com/reference/cstdio/
stdlib.h - http://www.cplusplus.com/reference/cstdlib/
string.h - http://www.cplusplus.com/reference/cstring/
limits.h - http://www.cplusplus.com/reference/climits/
Functions
Examples of general purpose functions you will probably use.
malloc() - http://www.cplusplus.com/reference/cstdlib/malloc/
calloc() - http://www.cplusplus.com/reference/cstdlib/calloc/
strcpy() - http://www.cplusplus.com/reference/cstring/strcpy/
Memory
"A compiled program’s memory is divided into five segments: text, data, bss, heap, and stack. Each segment represents a special portion of memory that is set aside for a certain purpose. The text segment is also sometimes called the code segment. This is where the assembled machine language instructions of the program are located".
"The execution of instructions in this segment is nonlinear, thanks to the aforementioned high-level control structures and functions, which compile
into branch, jump, and call instructions in assembly language. As a program
executes, the EIP is set to the first instruction in the text segment. The
processor then follows an execution loop that does the following:"
"1. Reads the instruction that EIP is pointing to"
"2. Adds the byte length of the instruction to EIP"
"3. Executes the instruction that was read in step 1"
"4. Goes back to step 1"
"Sometimes the instruction will be a jump or a call instruction, which
changes the EIP to a different address of memory. The processor doesn’t
care about the change, because it’s expecting the execution to be nonlinear
anyway. If EIP is changed in step 3, the processor will just go back to step 1 and read the instruction found at the address of whatever EIP was changed to".
"Write permission is disabled in the text segment, as it is not used to store variables, only code. This prevents people from actually modifying the program code; any attempt to write to this segment of memory will cause the program to alert the user that something bad happened, and the program
will be killed. Another advantage of this segment being read-only is that it
can be shared among different copies of the program, allowing multiple
executions of the program at the same time without any problems. It should
also be noted that this memory segment has a fixed size, since nothing ever
changes in it".
"The data and bss segments are used to store global and static program
variables. The data segment is filled with the initialized global and static variables, while the bss segment is filled with their uninitialized counterparts. Although these segments are writable, they also have a fixed size. Remember that global variables persist, despite the functional context (like the variable j in the previous examples). Both global and static variables are able to persist because they are stored in their own memory segments".
"The heap segment is a segment of memory a programmer can directly
control. Blocks of memory in this segment can be allocated and used for
whatever the programmer might need. One notable point about the heap
segment is that it isn’t of fixed size, so it can grow larger or smaller as needed".
"All of the memory within the heap is managed by allocator and deallocator algorithms, which respectively reserve a region of memory in the heap for use and remove reservations to allow that portion of memory to be reused for later reservations. The heap will grow and shrink depending on how
much memory is reserved for use. This means a programmer using the heap
allocation functions can reserve and free memory on the fly. The growth of
the heap moves downward toward higher memory addresses".
"The stack segment also has variable size and is used as a temporary scratch pad to store local function variables and context during function calls. This is what GDB’s backtrace command looks at. When a program calls a function, that function will have its own set of passed variables, and the function’s code will be at a different memory location in the text (or code) segment. Since the context and the EIP must change when a function is called, the stack is used to remember all of the passed variables, the location the EIP should return to after the function is finished, and all the local variables used by that function. All of this information is stored together on the stack in what is collectively called a stack frame. The stack contains many stack frames".
"In general computer science terms, a stack is an abstract data structure that is used frequently. It has first-in, last-out (FILO) ordering
, which means the first item that is put into a stack is the last item to come out of it. Think of it as putting beads on a piece of string that has a knot on one end—you can’t get the first bead off until you have removed all the other beads. When an item is placed into a stack, it’s known as pushing, and when an item is removed from a stack, it’s called popping".
"As the name implies, the stack segment of memory is, in fact, a stack data structure, which contains stack frames. The ESP register is used to keep track of the address of the end of the stack, which is constantly changing as items are pushed into and popped off of it. Since this is very dynamic behavior, it makes sense that the stack is also not of a fixed size. Opposite to the dynamic growth of the heap, as the stack change
s in size, it grows upward in a visual listing of memory, toward lower memory addresses".
"The FILO nature of a stack might seem odd, but since the stack is used
to store context, it’s very useful. When a function is called, several things are pushed to the stack together in a stack frame. The EBP register—sometimes called the frame pointer (FP) or local base (LB) pointer
—is used to reference local function variables in the current stack frame. Each stack frame contains the parameters to the function, its local variables, and two pointers that are necessary to put things back the way they were: the saved frame pointer (SFP) and the return address. The
SFP is used to restore EBP to its previous value, and the return address
is used to restore EIP to the next instruction found after the function call. This restores the functional context of the previous stack
frame".
Strings
"In C, an array is simply a list of n elements of a specific data type. A 20-character array is simply 20 adjacent characters located in memory. Arrays are also referred to as buffers".
#include <stdio.h>
int main()
{
char str_a[20];
str_a[0] = 'H';
str_a[1] = 'e';
str_a[2] = 'l';
str_a[3] = 'l';
str_a[4] = 'o';
str_a[5] = ',';
str_a[6] = ' ';
str_a[7] = 'w';
str_a[8] = 'o';
str_a[9] = 'r';
str_a[10] = 'l';
str_a[11] = 'd';
str_a[12] = '!';
str_a[13] = '\n';
str_a[14] = 0;
printf(str_a);
}
"In the preceding program, a 20-element character array is defined as
str_a, and each element of the array is written to, one by one. Notice that the number begins at 0, as opposed to 1. Also notice that the last character is a 0".
"(This is also called a null byte.) The character array was defined, so 20 bytes are allocated for it, but only 12 of these bytes are actually used. The null byte Programming at the end is used as a delimiter character to tell any function that is dealing with the string to stop operations right there. The remaining extra bytes are just garbage and will be ignored. If a null byte is inserted in the fifth element of the character array, only the characters Hello would be printed by the printf() function".
"Since setting each character in a character array is painstaking and strings are used fairly often, a set of standard functions was created for string manipulation. For example, the strcpy() function will copy a string from a source to a destination, iterating through the source string and copying each byte to the destination (and stopping after it copies the null termination byte)".
"The order of the functions arguments is similar to Intel assembly syntax destination first and then source. The char_array.c program can be rewritten using strcpy() to accomplish the same thing using the string library. The next version of the char_array program shown below includes string.h since it uses a string function".
#include <stdio.h>
#include <string.h>
int main()
{
char str_a[20];
strcpy(str_a, "Hello, world!\n");
printf(str_a);
}
Find more information on C strings
http://www.cs.uic.edu/~jbell/CourseNotes/C_Programming/CharacterStrings.html
http://www.tutorialspoint.com/cprogramming/c_strings.htm
Pointers
"The EIP register is a pointer that “points” to the current instruction during a programs execution by containing its memory address. The idea of pointers is used in C, also. Since the physical memory cannot actually be moved, the information in it must be copied. It can be very computationally expensive to copy large chunks of memory to be used by different functions or in different places. This is also expensive from a memory standpoint, since space for the new destination copy must be saved or allocated before the source can be copied. Pointers are a solution to this problem. Instead of copying a large block of memory, it is much simpler to pass around the address of the beginning of that block of memory".
"Pointers in C can be defined and used like any other variable type. Since memory on the x86 architecture uses 32-bit addressing, pointers are also 32 bits in size (4 bytes). Pointers are defined by prepending an asterisk (*) to the variable name. Instead of defining a variable of that type, a pointer is defined as something that points to data of that type. The pointer.c program is an example of a pointer being used with the char data type, which is only 1byte in size".
#include <stdio.h>
#include <string.h>
int main()
{
char str_a[20]; // A 20-element character array
char *pointer; // A pointer, meant for a character array
char *pointer2; // And yet another one
strcpy(str_a, "Hello, world!\n");
pointer = str_a; // Set the first pointer to the start of the array.
printf(pointer);
pointer2 = pointer + 2; // Set the second one 2 bytes further in.
printf(pointer2); // Print it.
strcpy(pointer2, "y you guys!\n"); // Copy into that spot.
printf(pointer); // Print again.
}
"As the comments in the code indicate, the first pointer is set at the beginning of the character array. When the character array is referenced like this, it is actually a pointer itself. This is how this buffer was passed as a pointer to the printf() and strcpy() functions earlier. The second pointer is set to the first pointers address plus two, and then some things are printed (shown in the output below)".
reader#hacking:~/booksrc $ gcc -o pointer pointer.c
reader#hacking:~/booksrc $ ./pointer
Hello, world!
llo, world!
Hey you guys!
reader#hacking:~/booksrc $
"The address-of operator is often used in conjunction with pointers, since pointers contain memory addresses. The addressof.c program demonstrates
the address-of operator being used to put the address of an integer variable
into a pointer. This line is shown in bold below".
#include <stdio.h>
int main()
{
int int_var = 5;
int *int_ptr;
int_ptr = &int_var; // put the address of int_var into int_ptr
}
"An additional unary operator called the dereference operator exists for use with pointers. This operator will return the data found in the address the pointer is pointing to, instead of the address itself. It takes the form of an asterisk in front of the variable name, similar to the declaration of a pointer. Once again, the dereference operator exists both in GDB and in C".
"A few additions to the addressof.c code (shown in addressof2.c) will
demonstrate all of these concepts. The added printf() functions use format
parameters, which I’ll explain in the next section. For now, just focus on the programs output".
#include <stdio.h>
int main()
{
int int_var = 5;
int *int_ptr;
int_ptr = &int_var; // Put the address of int_var into int_ptr.
printf("int_ptr = 0x%08x\n", int_ptr);
printf("&int_ptr = 0x%08x\n", &int_ptr);
printf("*int_ptr = 0x%08x\n\n", *int_ptr);
printf("int_var is located at 0x%08x and contains %d\n", &int_var, int_var);
printf("int_ptr is located at 0x%08x, contains 0x%08x, and points to %d\n\n", &int_ptr, int_ptr, *int_ptr);
}
"When the unary operators are used with pointers, the address-of operator can be thought of as moving backward, while the dereference operator moves forward in the direction the pointer is pointing".
Find out more about Pointers & memory allocation
Professor Dan Hirschberg, Computer Science Department, University of California on computer memory https://www.ics.uci.edu/~dan/class/165/notes/memory.html
http://cslibrary.stanford.edu/106/
http://www.programiz.com/c-programming/c-dynamic-memory-allocation
Arrays
Theres a simple tutorial on multi-dimensional arrays by a chap named Alex Allain available here http://www.cprogramming.com/tutorial/c/lesson8.html
Theres information on arrays by a chap named Todd A Gibson available here http://www.augustcouncil.com/~tgibson/tutorial/arr.html
Iterate an Array
#include <stdio.h>
int main()
{
int i;
char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
int int_array[5] = {1, 2, 3, 4, 5};
char *char_pointer;
int *int_pointer;
char_pointer = char_array;
int_pointer = int_array;
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer.
printf("[integer pointer] points to %p, which contains the integer %d\n", int_pointer, *int_pointer);
int_pointer = int_pointer + 1;
}
for(i=0; i < 5; i++) { // Iterate through the char array with the char_pointer.
printf("[char pointer] points to %p, which contains the char '%c'\n", char_pointer, *char_pointer);
char_pointer = char_pointer + 1;
}
}
Linked Lists vs Arrays
Arrays are not the only option available, information on Linked List.
http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_linklist.aspx
Conclusion
This information was written simply to pass on some of what I have read throughout my research on the topic that might help others.
Iirc, a string is actually an array of chars, so this should work:
a[1][2]
Quote from the wikipedia article on C pointers -
In C, array indexing is formally defined in terms of pointer arithmetic; that is,
the language specification requires that array[i] be equivalent to *(array + i). Thus in C, arrays can be thought of as pointers to consecutive areas of memory (with no gaps),
and the syntax for accessing arrays is identical for that which can be used to dereference
pointers. For example, an array can be declared and used in the following manner:
int array[5]; /* Declares 5 contiguous (per Plauger Standard C 1992) integers */
int *ptr = array; /* Arrays can be used as pointers */
ptr[0] = 1; /* Pointers can be indexed with array syntax */
*(array + 1) = 2; /* Arrays can be dereferenced with pointer syntax */
So, in response to your question - yes, pointers to pointers can be used as an array without any kind of other declaration at all!
Try a[1][2]. Or *(*(a+1)+2).
Basically, array references are syntactic sugar for pointer dereferencing. a[2] is the same as a+2, and also the same as 2[a] (if you really want unreadable code). An array of strings is the same as a double pointer. So you can extract the second string using either a[1] or *(a+1). You can then find the third character in that string (call it 'b' for now) with either b[2] or *(b + 2). Substituting the original second string for 'b', we end up with either a[1][2] or *(*(a+1)+2).