Why Use Pointers in C? - c

I'm still wondering why in C you can't simply set something to be another thing using plain variables. A variable itself is a pointer to data, is it not? So why make pointers point to the data in the variable when you can simply use the original variable? Is it to access specific bits (or bytes, I guess) of data within said variable?
I'm sure it's logical, however I have never fully grasped the concept and when reading code seeing *pointers always throws me off.

One common place where pointers are helpful is when you are writing functions. Functions take their arguments 'by value', which means that they get a copy of what is passed in and if a function assigns a new value to one of its arguments that will not affect the caller. This means that you couldn't write a "doubling" function like this:
void doubling(int x)
{
x = x * 2;
}
This makes sense because otherwise what would the program do if you called doubling like this:
doubling(5);
Pointers provide a tool for solving this problem because they let you write functions that take the address of a variable, for example:
void doubling2(int *x)
{
(*x) = (*x) * 2;
}
The function above takes the address of an integer as its argument. The one line in the function body dereferences that address twice: on the left-hand side of the equal sign we are storing into that address and on the right-hand side we are getting the integer value from that address and then multiply it by 2. The end result is that the value found at that address is now doubled.
As an aside, when we want to call this new function we can't pass in a literal value (e.g. doubling2(5)) as it won't compile because we are not properly giving the function an address. One way to give it an address would look like this:
int a = 5;
doubling2(&a);
The end result of this would be that our variable a would contain 10.

A variable itself is a pointer to data
No, it is not. A variable represents an object, an lvalue. The concept of lvalue is fundamentally different from the concept of a pointer. You seem to be mixing the two.
In C it is not possible to "rebind" an lvalue to make it "point" to a different location in memory. The binding between lvalues and their memory locations is determined and fixed at compile time. It is not always 100% specific (e.g. absolute location of a local variable is not known at compile time), but it is sufficiently specific to make it non-user-adjustable at run time.
The whole idea of a pointer is that its value is generally determined at run time and can be made to point to different memory locations at run time.

No, a variable is not a pointer to data. If you declare two integers with int x, y;, then there is no way to make x and y refer to the same thing; they are separate.
Whenever you read or write from a variable, your computer has to somehow determine the exact location of that variable in your computer's memory. Your computer will look at the code you wrote and use that to determine where the variable is. A pointer can represent the situation where the location is not known at the time when you compile your code; the exact address is only computed later when you actually run your code.
If you weren't allowed to use pointers or arrays, every line of code you write would have to access specific variables that are known at compile time. You couldn't write a general piece of code that reads and writes from different places in memory that are specified by the caller.
Note: You can also use arrays with a variable index to access variables whose location is not known at compile time, but arrays are mostly just syntactical sugar for pointers. You can think about all array operations in terms of pointer operations instead. Arrays are not as flexible as pointers.
Another caveat: As AnT points out, the location of local variables is usually on the stack, so they are a type of variable where the location isn't known at compile time. But the only reason that the stack works for storing local variables in a reentrant function is that your compiler implements hidden pointers called the stack pointer and/or frame pointer, and functions use these pointers to find out which part of memory holds their arguments and local variables. Pointers are so useful that the compiler actually uses them behind your back without telling you.

Another reason: C was designed to build operating systems and lots of low level code that deals with hardware. Every piece of hardware exposes its interface by means of registers, and on nearly all architectures, registers are mapped into the CPU memory space, and they have not to be in the same address always (thanks to jumper settings, PnP, autoconfig, and so on)
So the OS writer, while writing a driver for instance, needs a way to deal with what seems memory locations, only that they don't refer RAM cells.
Pointers serve to this purpose by allowing the OS writer to specify what memory location he or she wants to access to.

Related

Where is the address to the first element of an array stored?

I was playing with C, and I just discovered that a and &a yield to the same result that is the address to the first element of the array. By surfing here over the topics, I discovered they are only formatted in a different way. So my question is: where is this address stored?
This is an interesting question! The answer will depend on the specifics of the hardware you're working with and what C compiler you have.
From the perspective of the C language, each object has an address, but there's no specific prescribed mechanism that accounts for how that address would actually be stored or accessed. That's left up to the compiler to decide.
Let's imagine that you've declared your array as a local variable, and then write something like array[137], which accesses the 137th element of the array. How does the generated program know how to find your array? On most systems, the CPU has a dedicated register called the stack pointer that keeps track of the position of the memory used for all the local variables of the current function. As the compiler translates your C code into an actual executable file, it maintains an internal table mapping each local variable to some offset away from where the stack pointer points. For example, it might say something like "because 64 bytes are already used up for other local variables in this function, I'm going to place array 64 bytes past where the stack pointer points." Then, whenever you reference array, the compiler generates machine instructions of the form "look 64 bytes past the stack pointer to find the array."
Now, imagine you write code like this:
printf("%p\n", array); // Print address of array
How does the compiler generate code for this? Well, internally, it knows that array is 64 bytes past the stack pointer, so it might generate code of the form "add 64 to the stack pointer, then pass that as an argument to printf."
So in that sense, the answer to your question could be something like "the hardware stores a single pointer called the stack pointer, and the generated code is written in a way that takes that stack pointer and then adds some value to it to get to the point in memory where the array lives."
Of course, there are a bunch of caveats here. For example, some systems have both a stack pointer and a frame pointer. Interpreters use a totally different strategy and maintain internal data structures tracking where everything is. And if the array is stored at global scope, there's a different mechanism used altogether.
Hope thi shelps!
It isn't stored anywhere - it's computed as necessary.
Unless it is the operand of the sizeof, _Alignof, or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" is converted ("decays") to an expression of type "pointer to T", and the value of the expression is the address of the first element of the array.
When you declare an array like
T a[N]; // for any non-function type T
what you get in memory is
+---+
| | a[0]
+---+
| | a[1]
+---+
...
+---+
| | a[N-1]
+---+
That's it. No storage is materialized for any pointer. Instead, whenever you use a in any expression, the compiler will compute the address of a[0] and use that instead.
Consider this C code:
int x;
void foo(void)
{
int y;
...
}
When implementing this program, a C compiler will need to generate instructions that access the int objects named x and y and the int object allocated by the malloc. How does it tell those instructions where the objects are?
Each processor architecture has some way of referring to data in memory. This includes:
The machine instruction includes some bits that identify a processor register. The address in memory is in that processor register.
The machine instruction includes some bits that specify an address.
The machine instruction includes some bits that specify a processor register and some bits that specify an offset or displacement.
So, the compiler has a way of giving an address to the processor. It still needs to know that address. How does it do that?
One way is the compiler could decide exactly where everything in memory is going to go. It could decide it is going to put all the program’s instructions at addresses 0 to 10,000, and it is going to put data at 10,000 and on, and that x will go at address 12300. Then it could write an instruction to fetch x from address 12300. This is called absolute addressing, and it is rarely used anymore because it is inflexible.
Another option is that the compiler can let the program loader decide where to put the data. When the software that loads the program into memory is running, it will read the executable, see how much space is needed for instructions, how much is needed for data that is initialized to zero, how much space is needed for data with initial values listed in the executable file, how much space is needed for data that does not need to be initialized, how much space is requested for the stack, and so on. Then the loader will decide where to put all of these things. As it does so, it will set some processor registers, or some tables in memory, to contain the addresses where things go.
In this case, the compiler may know that x goes at displacement 2300 from the start of the “zero-initialized data” section, and that the loader sets register r12 to contain the base address of that section. Then, when the compiler wants to access x, it will generate an instruction that says “Use register r12 plus the displacement 2300.” This is largely the method used today, although there are many embellishments involving linking multiple object modules together, leaving a placeholder in the object module for the name x that the linker or loader fills in with the actual displacement as they do their work, and other features.
In the case of y, we have another problem. There can be two or more instances of y existing at once. The function foo might call itself, which causes there to be a y for the first call and a different y for the second call. Or foo might call another function that calls foo. To deal with this, most C implementations use a stack. One register in the processor is chosen to be a stack pointer. The loader allocates a large amount of space and sets the stack pointer register to point to the “top” of the space (usually the high-address end, but this is arbitrary). When a function is called, the stack pointer is adjusted according to how much space the new function needs for its local data. When the function executes, it puts all of its local data in memory locations determined by the value of the stack pointer when the function started executing.
In this model, the compiler knows that the y for the current function call is at a particular offset relative to the current stack pointer, so it can access y using instructions with addresses such as “the contents of the stack pointer plus 84 bytes.” (This can be done with a stack pointer alone, but often we also have a frame pointer, which is a copy of the stack pointer at the moment the function was called. This provides a firmer base address for working with local data, one that might not change as much as the stack pointer does.)
In either of these models, the compiler deals with the address of an array the same way it deals with the address of a single int: It knows where the object is stored, relative to some base address for its data segment or stack frame, and it generates the same sorts of instruction addressing forms.
Beyond that, when you access an array, such as a[i], or possibly a multidimensional array, a[i][j][k], the compiler has to do more calculations. To do this, compiler takes the starting address of the array and does the arithmetic necessary to add the offsets for each of the subscripts. Many processors have instructions that help with these calculations—a processor may have an addressing form that says “Take a base address from one register, add a fixed offset, and add the contents of another register multiplied by a fixed size.” This will help access arrays of one dimension. For multiple dimensions, the compiler has to write extra instructions to do some of the calculations.
If, instead of using an array element, like a[i], you take its address, as with &a[i], the compiler handles it similarly. It will get a base address from some register (the base address for the data segment or the current stack pointer or frame pointer), add the offset to where a is in that segment, and then add the offset required for i elements. All of the knowledge of where a[i] is is built into the instructions the compiler writes, plus the registers that help manage the program’s memory layout.
Yet one more point of view, a TL;DR answer if you will: When the compiler produces the binary, it stores the address everywhere where it is needed in the generated machine code.
The address may be just plain number in the machine code, or it may be a calculation of some sort, such as "stack frame base address register + a fixed offset number", but in either case it is duplicated everywhere in the machine code where it is needed.
In other words, it is not stored in any one location. Talking more technically, &some_array is not an lvalue, and trying to take the address of it, &(&some_array), will produce compiler error.
This actually applies to all variables, array is not special in any way here. The address of a variable can be used in the machine code directly (and if compiler actually generates code which does store the address somewhere, you have no way to know that from C code, you have to look at the assembly code).
The one thing special about arrays, which seems to be the source of your confusion is, that some_array is bascially a more convenient syntax for &(some_array[0]), while &some_array means something else entirely.
Another way to look at it:
The address of the first element doesn't have to be stored anywhere.
An array is a chunk of memory. It has an address simply because it exists somewhere in memory. That address may or may not have to be stored somewhere depending on a lot of things that others have already mentioned.
Asking where the address of the array has to be stored is like asking where reality stores the location of your car. The location doesn't have to be stored - your car is located where your car happens to be - it's a property of existing. Sure, you can make a note that you parked your car in row 97, spot 114 of some huge lot, but you don't have to. And your car will be wherever it is regardless of your note-taking.

What are the benefits to use const pointers or const data?

How can a non const pointer (that allows to access in nearby memory) or a non const data be exploited?
I studied that a non const pointer can be exploited to access in nearby memory of computer.
But my question is: how?
I mean, if I have started my compiled program, how can an attacker exploit it if it is already compiled and running?
Is this a dangerous vulnerability? Or it isn't so important?
Constant Pointer
A constant pointer in C cannot change the address of the variable to which it is pointing, i.e., the address will remain constant. Therefore, we can say that if a constant pointer is pointing to some variable, then it cannot point to any other variable.
Constant Data
Code Maintenance
Say I have a need to use the speed of light for calculations throughout my project. The speed of light is not going to change so there is no need for a variable. I could simply copy and paste 300,000,000 (obviously without the commas) anytime I need the speed of light (in meters per second). This will work fine for now but down the road my calculations may require more precision so if I want to change 300,000,000 to 299,792,458 it will be a fairly painful process if I used this value a lot in my code. If I use a const I can use this the name of this const throughout the code so if I need to change its value I can just change it once at the beginning of the code.
Memory
This isn’t usually relevant given the computing power of modern devices but for software on embedded devices there may not be the RAM space for variables and by using a const its value is put into the program when the code is compiled. This allows storage space (on the HDD/SSD) to be used instead of the RAM.
Accidental Changes
If you need a value to stay constant use a const so that you don’t accidentally change its value later on because you forget that you need its value to remain the same. Think of it this way, if its value is not going to change throughout the code why wouldn’t you use a const.
Pointers in C are the most dangerous entity that has potential exploit your RAM. If you somehow know the base address of the pointer where it is pointing, use that address to move forward or backward in the memory and do whatever you want since in C there is no exception handling of OutofBoundException.
For that, you need to have a good understanding of pointers.

Can I assign any integer value to a pointer variable directly?

Since addresses are numbers and can be assigned to a pointer variable, can I assign any integer value to a pointer variable directly, like this:
int *pPtr = 60000;
You can, but unless you're developing for an embedded device with known memory addresses with a compiler that explicitly allows it, attempting to dereference such a pointer will invoke undefined behavior.
You should only assign the address of a variable or the result of a memory allocation function such as malloc, or NULL.
Yes you can.
You should only assign the address of a variable or the result of a memory allocation function such as malloc, or NULL.
According to the pointer conversion rules, e.g. as described in this online c++ standard draft, any integer may be converted to a pointer value:
6.3.2.3 Pointers
(5) An integer may be converted to any pointer type. Except as
previously specified, the result is implementation-defined, might not
be correctly aligned, might not point to an entity of the referenced
type, and might be a trap representation.
Allowing the conversion does, however, not mean that you are allowed to dereference the pointer then.
You can but there's a lot of considerations.
1) What does that mean?
The only really useful abstraction when this actually gets used is that you need to access a specific memory location because something is mapped to a specific point, generally hardware control registers (less often: a specific area in flash or from the linker table). The fact that you are assigning 60000 (a decimal number rather than a hexadecimal address or a symbolic mnemonic) makes me quite worried.
2) Do you have "odd" pointers?
Some microcontrollers have pointers with strange semantics (near vs far, tied to a specific memory page, etc.) You may have to do odd things to make the pointer make sense. In addition, some pointers can do strange things depending upon where they point. For example, the PIC32 series can point to the exact same data but with different upper bits that will retrieve a cached copy or an uncached copy.
3) Is that value the correct size for the pointer?
Different architectures need different sizes. The newer data types like intptr_t are meant to paper over this.

how to find out which variable malloc() is being assigned to?

I'm trying to track the usage of malloc'ed area through variables that point to the are in a profiler. For example, for the following assignment inside function func().
uint64_t * dictionary = (uint64_t *) malloc(sizeof(uint64_t)*128);
I need to figure out the variable name (which is 'dictionary' in the above example) that points to the malloc'ed memory region. I instrumented malloc() to record the start address and size of the allocation. However, still no knowledge of variable 'dictionary', what I'm thinking is to examine the stack frame of function func(), finding out the local pointer variable pointing to a data type that matches that of malloc'ed type. The approach would need to instrument malloc() to go back one frame to func() to find out the possible local variables, and then fuzzy match by type. Wondering whether there are any other neat ways to implement this.
In general, I would expect this to be impossible. :)
You can't, of course, assume that the variable name is available, the best bet in general would be (I guess) a stack offset in the calling function's frame. If debugging symbols are available you might perhaps be able to map that to a name, though.
I guess it's possible that there is no name; that the return address is put in a register and perhaps manipulated there, before (if ever) being written to memory. If this means your code needs to start analyzing the calling code to track what it does with the return value, that sounds difficult.
What do you want to do with the variable reference once you've isolated it? I assume you're instrumenting malloc() for debugging purposes, so probably you're going to store it somewhere.

Direct invocation vs indirect invocation in C

I am new to C and I was reading about how pointers "point" to the address of another variable. So I have tried indirect invocation and direct invocation and received the same results (as any C/C++ developer could have predicted). This is what I did:
int cost;
int *cost_ptr;
int main()
{
cost_ptr = &cost; //assign pointer to cost
cost = 100; //intialize cost with a value
printf("\nDirect Access: %d", cost);
cost = 0; //reset the value
*cost_ptr = 100;
printf("\nIndirect Access: %d", *cost_ptr);
//some code here
return 0; //1
}
So I am wondering if indirect invocation with pointers has any advantages over direct invocation or vice-versa? Some advantages/disadvantages could include speed, amount of memory consumed performing the operation (most likely the same but I just wanted to put that out there), safeness (like dangling pointers) , good programming practice, etc.
1Funny thing, I am using the GNU C Compiler (gcc) and it still compiles without the return statement and everything is as expected. Maybe because the C++ compiler will automatically insert the return statement if you forget.
Indirect and direct invocation are used in different places. In your sample the direct invocation is prefered. Some reasons to use pointers:
When passing a large data structure to a function one can passa pointer instead of copying the whole data structure.
When you want a function to be able to modify a variable in the calling function you have to pass a pointer to the original variable. If you pass it by value the called function will just modify a copy.
When you don't know exactly which value you want to pass at compile time, you can set a pointer to the right value during runtime.
When dynamically allocating a block of memory you have to use pointers, since the storage location is not known until runtime.
The pointer notation involves two memory fetches (or one fetch and one write) where the direct invocation involves one.
Since memory accesses are slow (compared to computation in registers; fast compared to access on disk), the pointer access will slow things down.
However, there are times when only a pointer allows you to do what you need. Don't get hung up on the difference and do use pointers when you need them. But avoid pointers when they are not necessary.
In the "printf(..., *cost) case, the value of "cost" is copied to the stack. While trivial in this instance, imagine if you were calling a different function and "cost_ptr" pointed to a multi-megabyte data structure.

Resources