Benefits of pointers? - c

I recently developed an interest in C programming so I got myself a book (K&R) and started studying.
Coming from a University course in Java (basics), pointers are a totally new chapter, and from what I read online it's a rather difficult concept to get your head around. Before getting to the pointer chapter I was under the impression that pointers are a major part of C and provide great benefits.
Upon reading the chapter and getting a basic idea of what pointers are and how they work, the benefits are not obvious to me.
For example (please correct me if I got this totally wrong) in the introduction of pointers in the K&R book it says that since we call by value, when passing a variable in a function call we pretty much pass a copy of the variable for the function to handle and therefore the function can't do anything to the original variable and we can overcome this with pointers.
In a later example that uses a char pointer, the book says that incrementing the char pointer is legal since the function has a private copy of the pointer. Aren't 'private copies' a reason to use pointers instead?
I guess I'm a bit confused on the whole use of pointers. If asked I can use pointers instead of using array subscripts for example, but I doubt this is the main use of pointers.
Linux and Open source programming was the main reason I got into C. I got the source code of a C project to study (Geany IDE) and I can see that pointers are used throughout the source code.
I also did a bit of searching in the forums and a found a couple of posts with similar questions. An answer was (I quote):
If you don't know when you should use pointers just don't use them.
It will become apparent when you need to use them, every situation is different.
Is it safe for me to avoid using pointers at the time being and only use them in specific situations (where the need for pointers will be apparent?)

One benefit of pointers is when you use them in function arguments, you don't need to copy large chunks of memory around, and you can also change the state by dereferencing the pointer.
For example, you may have a huge struct MyStruct, and you have a function a().
void a (struct MyStruct* b) {
// You didn't copy the whole `b` around, just passed a pointer.
}

Coming from Java, you'll have a slightly different perspective than what is presented in K&R (K&R doesn't assume that the reader knows any other modern programming language).
A pointer in C is like a slightly more capable version of a reference in Java. You can see this similarity through the Java exception named NullPointerException. One important aspect of pointers in C is that you can change what they point to by increment and decrement.
In C, you can store a bunch of things in memory in an array, and you know that they are sitting side by side each other in memory. If you have a pointer to one of them, you can make that pointer point to the "next" one by incrementing it. For example:
int a[5];
int *p = a; // make p point to a[0]
for (int i = 0; i < 5; i++) {
printf("element %d is %d\n", i, *p);
p++; // make `p` point to the next element
}
The above code uses the pointer p to point to each successive element in the array a in sequence, and prints them out.
(Note: The above code is an expository example only, and you wouldn't usually write a simple loop that way. It would be easier to access the elements of the array as a[i], and not use a pointer there.)

Your highlighted rule is very wise. It will keep you out of trouble but sooner or later you have to learn pointers.
So why do we want to use pointers?
Say I opened a textfile and read it into a giant string. I can't pass you the giant string by value because it's too big to fit on the stack (say 10mb). So I tell you where the string is and say "Go look over there at my string".
An array is a pointer ( well almost ).
int[] and int* are subtly different but interchangeable for the most part.
int[] i = new int[5]; // garbage data
int* j = new int[5] // more garbage data but does the same thing
std::cout << i[3] == i + 3 * sizeof(int); // different syntax for the same thing
A more advanced use of pointers that's extremely useful is the use of function pointers. In C and C++ functions aren't first class data types, but pointers are. So you can pass a pointer to a function that you want called and they can do so.
Hopefully that helps but more than likely will be confusing.

In a later example that uses a char pointer, the book says that incrementing the char pointer is legal since the function has a private copy of the pointer.
I'd say that this means that they are incrementing the pointer itself, which means changing the address (and therefore making it point to a different value). This could be useful if they were passed the first item of an array and want to continue in the array, but not change the values.

Remember that C (and the K&R book) very old, probably older than anything you've learned before (definitely ages older than Java). Pointers aren't an extra feature of C, they're a very basic part of how computers work.
Pointer theory isn't particularly hard to master, just that they're very powerful so an error will most likely crash your application, and compilers have a hard time trying to catch pointer-related errors. One of the big novelties about Java was to have 'almost' as much power as C without pointers.
So, in my opinion trying to write C avoiding pointers is like trying to ride a bike without one pedal. Yes, it's doable but you'll work twice as hard.

Ignore that answer please.
If you don't know how to use pointers, learn how to use them. Simple.
Pointers as you say allow you to pass more than one variable into a function via a pointer to it, as you have rightly observed. Another use of pointers is in referring to arrays of data, which you can step through using pointer arithmetic. Finally, pointers allow you allocate memory dynamically. Therefore advising you don't use pointers is going to severely limit what you can do.
Pointers are C's way to talk about memory addresses. For that reason, they are critical. As you have K&R, read that.
For one example of usage, see my answer to this. As I say in that answer, it isn't necessarily how you'd do it, given the question.
However, that technique is exactly how libraries like MPIR and GMP work. libgmp if you haven't met it powers mathematica, maple etc and is the library for arbitrary precision arithmetic. You'll find mpn_t is a typedef to a pointer; depending on the OS depends on what it points to. You'll also find a lot of pointer arithmetic in the slow, C, versions of this code.
Finally, I mentioned memory management. If you want to allocate an array of something you need malloc and free which deal with pointers to memory spaces; specifically malloc returns a pointer to some memory once allocated or NULL on failure.
One of my favourite uses of pointers yet is to make a C++ class member function act as a thread on windows using the win32 API i.e. have the class contain a thread. Unfortunately CreateThread on windows won't accept C++ class functions for obvious reasons - CreateThread is a C function that doesn't understand class instances; so you need to pass it a static function. Here's the trick:
DWORD WINAPI CLASSNAME::STATICFUNC(PVOID pvParam)
{
return ((CLASSNAME*)pvParam)->ThreadFunc();
}
(PVOID is a void *)
What happens is it returns ThreadFunc which gets executed "instead of" (actually STATICFUNC calls it) STATICFUNC and can indeed access all the private member variables of CLASSNAME. Ingenious, eh?
If that isn't enough to convince you pointers kinda are C, I don't know what is. Or maybe there's no point in C without pointers. Or...

Considering that you're coming from a Java background, here's the simplest way to get your head around what use pointers have.
Let's say you have a class like this in Java:
public class MyClass {
private int myValue;
public int getMyValue() { return myValue; }
public void setMyValue(int value) { myValue = value; }
}
Here's your main function, that creates one of your objects and calls a function on it.
public static void Main(String[] args) {
MyClass myInstance = new MyClass();
myInstance.setMyValue(1);
System.out.printLn(myInstance.getMyValue()); // prints 1
DoSomething(myInstance);
System.out.printLn(myInstance.getMyValue()); // prints 2
}
public static void DoSomething(MyClass instance) {
instance.setMyValue(2);
}
The myInstance variable you declared in Main is a reference. It's basically a handle that the JVM uses to keep tabs on your object instance. Now, let's do the same thing in C.
typedef struct _MyClass
{
int myValue;
} MyClass;
void DoSomething(MyClass *);
int main()
{
MyClass myInstance;
myInstance.myValue = 1;
printf("i", myInstance.myValue); // prints 1
MyClass *myPointer = &myInstance; // creates a pointer to the address of myInstance
DoSomething(myPointer);
printf("i", myInstance.myValue); // prints 2
return 0;
}
void DoSomething(MyClass *instance)
{
instance->myValue = 2;
}
Of course, pointers are much more flexible in C, but this is the gist of how they work in a Java sort of way.

If you are dealing with dynamically allocated memory, you have to use pointers to access the allocated space. Many programs deal with allocated memory, so many programs have to use pointers.

In this example:
int f1(int i);
f1(x);
the parameter i is passed by value, so the function f1 could not change the value of variable x of the caller.
But in this case:
int f2(int* i);
int x;
int* px = &x;
f2(px);
Here we still pass px parameter by value, but in the same time we pass x by refference!. So if the callee (f2) will change its int* i it will have no effect on px in the caller. However, by changing *i the callee will change the value of x in the caller.

Let me explain this more in terms of Java references (as pointed out by #Greg's answer)
In Java, there are reference types (i.e. reference to a class) and value types (i.e. int). Like C, Java is pass by value, only. If you pass a primitive type into a function, you actually pass the value of the value (after all, it is a "value type"), and therefore any modifications to that value inside that function are not reflected in the calling code. If you pass a reference type into a function, you can modify the value of that object, because when you pass the reference type, you pass the reference to that reference type by value.

Pointers make it possible to dynamically dispatch code given conditions or state of the program. A simple way of understanding this concept is to think of a tree structure where each node represents either a function call, variable, or a pointer to a sublevel node. Once you understand this you then use pointers to point to established memory locations that the program can reference at will to understand the intitial state and thus the first dereference and offset. Then each node will contain it's own pointers to further understand the state whereby a further dereference can take place, a function can be called, or a value grabbed.
Of course this is just one way of visualizing how pointers can be used since a pointer is nothing more than an address of a memory location. You can also use pointers for message passing, virtual function calls, garbage collection, etc. In fact I have used them to recreate c++ style virtual function calls. The result is that the c virtual function and c++ virtual functions run at the exact same speed. However the c implementation was much smaller in size (66% less in KB) and slightly more portable. But replicating features from other languages in C will not always be advantageous, obviously b/c other languages could be better equipped to gather information that the compiler can use for optimization of that data structure.
In summary there is a lot that can be done with pointers. The C language and pointers are devalued nowadays b/c most higher level languages come equipped with the more often used data structures / implementations that you would have had to build on your own with the use of pointers. But there are always times when a programmer may need to implement a complex routine and in this case knowing how to use pointers is a very good thing.

Related

Why do arrays in C decay to pointers?

[This is a question inspired by a recent discussion elsewhere, and I'll provide an answer right with it.]
I was wondering about the odd C phenomenon of arrays "decaying" to pointers, e.g. when used as function arguments. That just seems so unsafe. It is also inconvenient to pass the length explicitly with it. And I can pass the other type of aggregate -- structs -- perfectly well by value; structs do not decay.
What is the rationale behind this design decision? How does it integrate with the language? Why is there a difference to structs?
Rationale
Let's examine function calls because the problems are nicely visible there: Why are arrays not simply passed to functions as arrays, by value, as a copy?
There is first a purely pragmatic reason: Arrays can be big; it may not be advisable to pass them by value because they
could exceed the stack size, especially in the 1970s. The first compilers were written on a PDP-7 with about 9 kB RAM.
There is also a more technical reason rooted in the language. It would be hard to generate code for a function call with arguments whose size is not known at compile time. For all arrays, including variable length arrays in modern C, simply the addresses are put on the call stack. The size of an address is of course well known. Even languages with elaborate array types carrying run time size information do not pass the objects proper on the stack. These languages typically pass "handles" around, which is what C has effectively done, too, for 40 years. See Jon Skeet here and an illustrated explanation he references (sic) here.
Now a language could make it a requirement that an array always have a complete type; i.e. whenever it is used, its complete declaration including the size must be visible. This is, after all, what C requires from structures (when they are accessed). Consequently, structures can be passed to functions by value. Requiring the complete type for arrays as well would make function calls easily compilable and obviate the need to pass additional length arguments: sizeof() would still work as expected inside the callee. But imagine what that means. If the size were really part of the array's argument type, we would need a distinct function for each array size:
// for user input.
int average_ten(int arr[10]);
// for my new Hasselblad.
int average_twohundredfivemilliononehundredfourtyfivethousandsixhundred(int arr[16544*12400]);
// ...
In fact it would be totally comparable to passing structures, which differ in type if their elements differ (say, one struct with 10 int elements and one with 16544*12400). It is obvious that arrays need more flexibility. For example, as demonstrated one could not sensibly provide generally usable library functions which take array arguments.
This "strong typing conundrum" is, in fact, what happens in C++ when a function takes a reference to an array; that is also the reason why nobody does it, at least not explicitly. It is totally inconvenient to the point of being useless except for cases which target specific uses, and in generic code: C++ templates provide compile-time flexibility which is not available in C.
If, in existing C, indeed arrays of known sizes should be passed by value there is always the possibility to wrap them in a struct. I remember that some IP related headers on Solaris defined address family structures with arrays in them, allowing to copy them around. Because the byte layout of the struct was fixed and known, that made sense.
For some background it's also interesting to read The Development of the C Language by Dennis Ritchie about the origins of C. C's predecessor BCPL didn't have any arrays; the memory was just homogeneous linear memory with pointers into it.
The answer to this question can be found in Dennis Ritchie's "The Development of the C Language" paper (see "Embryonic C" section)
According to Dennis Ritchie, the nascent versions of C directly inherited/adopted array semantics from B and BCPL languages - predecessors of C. In those languages arrays were literally implemented as physical pointers. These pointers pointed to independently allocated blocks of memory containing the actual array elements. These pointers were initialized at run time. I.e. back in B and BCPL days arrays were implemented as "binary" (bipartite) objects: an independent pointer pointing to an independent block of data. There was no difference between pointer and array semantics in those languages, aside from the fact that array pointers were initialized automatically. At any time it was possible to re-assign an array pointer in B and BCPL to make it point somewhere else.
Initially, this approach to array semantics got inherited by C. However, its drawbacks became immediately obvious when struct types were introduced into the language (something neither B nor BCPL had). And the idea was that structs should naturally be able to contain arrays. However, continuing to stick with the above "bipartite" nature of B/BCPL arrays would immediately lead to a number of obvious complications with structs. E.g. struct objects with arrays inside would require non-trivial "construction" at the point of definition. It would become impossible to copy such struct objects - a raw memcpy call would copy the array pointers without copying the actual data. One wouldn't be able to malloc struct objects, since malloc can only allocate raw memory and does not trigger any non-trivial initializations. And so on and so forth.
This was deemed unacceptable, which led to the redesign of C arrays. Instead of implementing arrays through physical pointers Ritchie decided to get rid of the pointers entirely. The new array was implemented as a single immediate memory block, which is exactly what we have in C today. However, for backward compatibility reasons the behavior of B/BCPL arrays was preserved (emulated) as much as possible at superficial level: the new C array readily decayed to a temporary pointer value, pointing to the beginning of the array. The rest of the array functionality remained unchanged, relying on that readily available result of the decay.
To quote the aforementioned paper
The solution constituted the crucial jump in the evolutionary chain
between typeless BCPL and typed C. It eliminated the materialization
of the pointer in storage, and instead caused the creation of the
pointer when the array name is mentioned in an expression. The rule,
which survives in today's C, is that values of array type are
converted, when they appear in expressions, into pointers to the first
of the objects making up the array.
This invention enabled most existing B code to continue to work,
despite the underlying shift in the language's semantics. The few
programs that assigned new values to an array name to adjust its
origin—possible in B and BCPL, meaningless in C—were easily repaired.
More important, the new language retained a coherent and workable (if
unusual) explanation of the semantics of arrays, while opening the way
to a more comprehensive type structure.
So, the direct answer to your "why" question is as follows: arrays in C were designed to decay to pointers in order to emulate (as close as possible) the historical behavior of arrays in B and BCPL languages.
Take your time machine and travel back to 1970. Start designing a programming language. You want the following code to compile and do the expected thing:
size_t i;
int* p = (int *) malloc (10 * sizeof (int));
for (i = 0; i < 10; ++i) p [i] = i;
int a [10];
for (i = 0; i < 10; ++i) a [i] = i;
At the same time, you want a language that is simple. Simple enough that you can compile it on a 1970's computer. The rule that "a" decays to "pointer to first element of a" achieves that nicely.

Argument without type in C

I am fairly new to programming languages and wonder if it is possible to pass an argument without specific type to a function. For instance I have the following piece of code that defines a funcion add that will take a block of memory, check it if is filled via another function, and then adds an element to the list related to that block of memory.
This element can be an int, a float or a char. So I would like to write:
add(arrs1,20); //or also
add(arrs2,'b'); //or also
add(arrs3, 4.5);
Where arrs# are defined by struct arrs arrs#, and they refer to arrays of either floats, ints or chars but not mixed. How could I accomplish this?
int add(arrs list, NEW_ELEMENT){//adds NEW_ELEMENT at the end of an arrs
int check_if_resize;
check_if_resize=resize(list, list->size + 1);
list->ptr[list->used++] = NEW_ELEMENT;
return check_if_resize;
}
I appreciate your help.
C does, by design, not allow a single function to accept more than a single type for each argument. There are various ways to do something equivalent in C, though:
First and foremost, you can just write multiple different functions that do the same, but on different types. For instance, instead of add you could have three functions named add_int, add_char and add_float. I would recommend doing this in most cases, as it is by far the easiest and least error-prone.
Secondly, you might have noticed how printf can print both strings and numbers? So-called variadic functions, like printf, can take different types of arguments, but in the case of printf, you must still specify the type of arguments you want in the format string.
Finally, you can use void pointers if you need to work with the memory an object occupies, regardless of its type. Functions like memcpy and memset do this, for instance to copy the contents of one object directly to another. This is a bit harder to manage properly, as it is easy to make mistakes and end up corrupting memory, but its still doable (and sometimes even the best option).
But if you're a beginner in C, as you state you are, the first option is probably the easiest, especially when dealing with only a few different data types (in this case, three).
You can pass pretty much anything as a void * in a function that will add this content to a linked list by the use of memcpy (or even safe memmove).
As long as you have a pointer to the next node of your list, you don't have to worry about the type of the stored data.
Just be sure not to dereference a void *, but rather to cast it and use this casted variable (as a char if you want to work on this data byte by byte for instance)

Direct invocation vs indirect invocation in C

I am new to C and I was reading about how pointers "point" to the address of another variable. So I have tried indirect invocation and direct invocation and received the same results (as any C/C++ developer could have predicted). This is what I did:
int cost;
int *cost_ptr;
int main()
{
cost_ptr = &cost; //assign pointer to cost
cost = 100; //intialize cost with a value
printf("\nDirect Access: %d", cost);
cost = 0; //reset the value
*cost_ptr = 100;
printf("\nIndirect Access: %d", *cost_ptr);
//some code here
return 0; //1
}
So I am wondering if indirect invocation with pointers has any advantages over direct invocation or vice-versa? Some advantages/disadvantages could include speed, amount of memory consumed performing the operation (most likely the same but I just wanted to put that out there), safeness (like dangling pointers) , good programming practice, etc.
1Funny thing, I am using the GNU C Compiler (gcc) and it still compiles without the return statement and everything is as expected. Maybe because the C++ compiler will automatically insert the return statement if you forget.
Indirect and direct invocation are used in different places. In your sample the direct invocation is prefered. Some reasons to use pointers:
When passing a large data structure to a function one can passa pointer instead of copying the whole data structure.
When you want a function to be able to modify a variable in the calling function you have to pass a pointer to the original variable. If you pass it by value the called function will just modify a copy.
When you don't know exactly which value you want to pass at compile time, you can set a pointer to the right value during runtime.
When dynamically allocating a block of memory you have to use pointers, since the storage location is not known until runtime.
The pointer notation involves two memory fetches (or one fetch and one write) where the direct invocation involves one.
Since memory accesses are slow (compared to computation in registers; fast compared to access on disk), the pointer access will slow things down.
However, there are times when only a pointer allows you to do what you need. Don't get hung up on the difference and do use pointers when you need them. But avoid pointers when they are not necessary.
In the "printf(..., *cost) case, the value of "cost" is copied to the stack. While trivial in this instance, imagine if you were calling a different function and "cost_ptr" pointed to a multi-megabyte data structure.

Is it a best practice to wrap arrays and their length variable in a struct in C?

I will begin to use C for an Operating Systems course soon and I'm reading up on best practices on using C so that headaches are reduced later on.
This has always been one my first questions regarding arrays as they are easy to screw up.
Is it a common practice out there to bundle an array and its associated variable containing it's length in a struct?
I've never seen it in books and usually they always keep the two separate or use something like sizeof(array[]/array[1]) kind of deal.
But with wrapping the two into a struct, you'd be able to pass the struct both by value and by reference which you can't really do with arrays unless using pointers, in which case you have to again keep track of the array length.
I am beginning to use C so the above could be horribly wrong, I am still a student.
Cheers,
Kai.
Yes this is a great practice to have in C. It's completely logical to wrap related values into a containing structure.
I would go ever further. It would also serve your purpose to never modify these values directly. Instead write functions which act on these values as a pair inside your struct to change length and alter data. That way you can add invariant checks and also make it very easy to test.
Sure, you can do that. Not sure if I'd call it a best practice, but it's certainly a good idea to make C's rather rudimentary arrays a bit more manageable. If you need dynamic arrays, it's almost a requirement to group the various fields needed to do the bookkeeping together.
Sometimes you have two sizes in that case: one current, and one allocated. This is a tradeoff where you trade fewer allocations for some speed, paying with a bit of memory overhead.
Many times arrays are only used locally, and are of static size, which is why the sizeof operator is so handy to determine the number of elements. Your syntax is slightly off with that, by the way, here's how it usually looks:
int array[4711];
int i;
for(i = 0; i < sizeof array / sizeof *array; i++)
{
/* Do stuff with each element. */
}
Remember that sizeof is not a function, the parenthesis are not always needed.
EDIT: One real-world example of a wrapping exactly as that which you describe is the GArray type provided by glib. The user-visible part of the declaration is exactly what you describe:
typedef struct {
gchar *data;
guint len;
} GArray;
Programs are expected to use the provided API to access the array whenever possible, not poke these fields directly.
There are three ways.
For static array (not dynamically allocated and not passed as pointer) size is knows at compile time so you can used sizeof operator, like this: sizeof(array)/sizeof(array[0])
Use terminator (special value for last array element which cannot be used as regular array value), like null-terminated strings
Use separate value, either as a struct member or independent variable. It doesn't really matter because all the standard functions that work with arrays take separate size variable, however joining the array pointer and size into one struct will increase code readability. I suggest to use to have a cleaner interface for your own functions. Please note that if you pass your struct by value, called function will be able to change the array, but not the size variable, so passing struct pointer would be a better option.
For public API I'd go with the array and the size value separated. That's how it is handled in most (if not all) c library I know. How you handle it internally it's completely up to you. So using a structure plus some helper functions/macros that do the tricky parts for you is a good idea. It's always making me head-ache to re-think how to insert an item or to remove one, so it's a good source of problems. Solving that once and generic, helps you getting bugs from the beginning low. A nice implementation for dynamic and generic arrays is kvec.
I'd say it's a good practice. In fact, it's good enough that in C++ they've put it into the standard library and called it vector. Whenever you talk about arrays in a C++ forum, you'll get inundated with responses that say to use vector instead.
I don't see anything wrong with doing that but I think the reason that that is not usually done is because of the overhead incurred by such a structure. Most C is bare-metal code for performance reasons and as such, abstractions are often avoided.
I haven't seen it done in books much either, but I've been doing the same the same thing for a while now. It just seems to make sense to "package" those things together. I find it especially useful if you need to return an allocated array from a method for instance.
If you use static arrays you have access to the size of array using sizeof operator. If you'll put it into struct, you can pass it to function by value, reference and pointer. Passing argument by reference and by pointer is the same on assembly level (I'm almost sure of it).
But if you use dynamic arrays, you don't know the size of array at compile time. So you can store this value in struct, but you will also store only a pointer to array in structure:
struct Foo {
int *myarray;
int size;
};
So you can pass this structure by value, but what you realy do is passing pointer to int (pointer to array) and int (size of array).
In my opinion it won't help you much. The only thing that is in plus, is that you store the size and the array in one place and it is easy to get the size of the array. If you will use a lot of dynamic arrays you can do it this way. But if you will use few arrays, easier will be not to use structures.
I've never seen it done that way, but I haven't done OS level work in over a decade... :-) Seems like a reasonable approach at first glance. Only concern would be to make sure that the size somehow stays accurate... Calculating as needed doesn't have that concern.
considering you can calculate the length of the array (in bytes, that is, not # of elements) and the compiler will replace the sizeof() calls with the actual value (its not a function, calls to it are replaced by the compiler with the value it 'returns'), then the only reason you'd want to wrap it in a struct is for readability.
It isn't a common practice, if you did it, someone looking at your code would assume the length field was some special value, and not just the array size. That's a danger lurking in your proposal, so you'd have to be careful to comment it properly.
I think this part of your question is backwards:
"But with wrapping the two into a struct, you'd be able to pass the struct both by value and by reference which you can't really do with arrays unless using pointers, in which case you have to again keep track of the array length."
Using pointers when passing arrays around is the default behavior; that doesn't buy you anything in terms of passing the entire array by value. If you want to copy the entire array instead of having it decay to a pointer you NEED to wrap the array in a struct. See this answer for more information:
Pass array by value to recursive function possible?
And here is more on the special behavior of arrays:
http://denniskubes.com/2012/08/20/is-c-pass-by-value-or-reference/

What is the real difference between Pointers and References?

AKA - What's this obsession with pointers?
Having only really used modern, object oriented languages like ActionScript, Java and C#, I don't really understand the importance of pointers and what you use them for. What am I missing out on here?
It's all just indirection: The ability to not deal with data, but say "I'll direct you to some data, over there". You have the same concept in Java and C#, but only in reference format.
The key differences are that references are effectively immutable signposts - they always point to something. This is useful, and easy to understand, but less flexible than the C pointer model. C pointers are signposts that you can happily rewrite. You know that the string you're looking for is next door to the string being pointed at? Well, just slightly alter the signpost.
This couples well with C's "close to the bone, low level knowledge required" approach. We know that a char* foo consists of a set of characters beginning at the location pointed to by the foo signpost. If we also know that the string is at least 10 characters long, we can change the signpost to (foo + 5) to point at then same string, but start half the length in.
This flexibility is useful when you know what you're doing, and death if you don't (where "know" is more than just "know the language", it's "know the exact state of the program"). Get it wrong, and your signpost is directing you off the edge of a cliff. References don't let you fiddle, so you're much more confident that you can follow them without risk (especially when coupled with rules like "A referenced object will never disappear", as in most Garbage collected languages).
You're missing out on a lot! Understanding how the computer works on lower levels is very useful in several situations. C and assembler will do that for you.
Basically a pointer lets you write stuff to any point in the computer's memory. On more primitive hardware/OS or in embedded systems this actually might do something useful. Say turn the blinkenlichts on and off again.
Of course this doesn't work on modern systems. The operating system is the Lord and Master of main memory. If you try to access a wrong memory location, your process will pay for its hubris with its life.
In C, pointers are the way of passing references to data. When you call a function, you don't want to copy a million bits to a stack. Instead you just tell where the data resides in the main memory. In other words, you give a pointer to the data.
To some extent that is what happens even with Java. You pass references to objects, not the objects themselves. Remember, ultimately every object is a set of bits in the computer main memory.
Pointers are for directly manipulating the contents of memory.
It's up to you whether you think this is a good thing to do, but it's the basis of how anything gets done in C or assembler.
High-level languages hide pointers behind the scenes: for example a reference in Java is implemented as a pointer in almost any JVM you'll come across, which is why it's called NullPointerException rather than NullReferenceException. But it doesn't let the programmer directly access the memory address it points to, and it can't be modified to take a value other than the address of an object of the correct type. So it doesn't offer the same power (and responsibility) that pointers in low-level languages do.
[Edit: this is an answer to the question 'what's this obsession with pointers?'. All I've compared is assembler/C-style pointers with Java references. The question title has since changed: had I set out to answer the new question I might have mentioned references in languages other than Java]
This is like asking, “what's this obsession with CPU instructions? Do I miss out on something by not sprinkling x86 MOV instructions all over the place?”
You just need pointers when programming on a low level. In most higher-level programming language implementations, pointers are used just as extensively as in C, but hidden from the user by the compiler.
So... Don't worry. You're using pointers already -- and without the dangers of doing so incorrectly, too. :)
I see pointers as a manual transmission in a car. If you learn to drive with a car that has an automatic transmission, that won't make for a bad driver. And you can still do most everything that the drivers that learned on a manual transmission can do. There will just be a hole in your knowledge of driving. If you had to drive a manual you'd probably be in trouble. Sure, it is easy to understand the basic concept of it, but once you have to do a hill start, you're screwed. But, there is still a place for manual transmissions. For instance, race car drivers need to be able to shift to get the car to respond in the most optimal way to the current racing conditions. Having a manual transmission is very important to their success.
This is very similar to programming right now. There is a need for C/C++ development on some software. Some examples are high-end 3D games, low level embedded software, things where speed is a critical part of the software's purpose, and a lower level language that allows you closer access to the actual data that needs to be processed is key to that performance. However, for most programmers this is not the case and not knowing pointers is not crippling. However, I do believe everybody can benefit from learning about C and pointers, and manual transmissions too.
Since you have been programming in object-oriented languages, let me put it this way.
You get Object A instantiate Object B, and you pass it as a method parameter to Object C. The Object C modifies some values in the Object B. When you are back to Object A's code, you can see the changed value in Object B. Why is this so?
Because you passed in a reference of Object B to Object C, not made another copy of Object B. So Object A and Object C both hold references to the same Object B in memory. Changes from one place and be seen in another. This is called By Reference.
Now, if you use primitive types instead, like int or float, and pass them as method parameters, changes in Object C cannot be seen by Object A, because Object A merely passed a copy instead of a reference of its own copy of the variable. This is called By Value.
You probably already knew that.
Coming back to the C language, Function A passes to Function B some variables. These function parameters are natively copies, By Value. In order for Function B to manipulate the copy belonging to Function A, Function A must pass a pointer to the variable, so that it becomes a pass By Reference.
"Hey, here's the memory address to my integer variable. Put the new value at that address location and I will pick up later."
Note the concept is similar but not 100% analogous. Pointers can do a lot more than just passing "by reference". Pointers allow functions to manipulate arbitrary locations of memory to whatever value required. Pointers are also used to point to new addresses of execution code to dynamically execute arbitrary logic, not just data variables. Pointers may even point to other pointers (double pointer). That is powerful but also pretty easy to introduce hard-to-detect bugs and security vulnerabilities.
If you haven't seen pointers before, you're surely missing out on this mini-gem:
void strcpy(char *dest, char *src)
{
while(*dest++ = *src++);
}
Historically, what made programming possible was the realization that memory locations could hold computer instructions, not just data.
Pointers arose from the realization that memory locations could also hold the address of other memory locations, thus giving us indirection. Without pointers (at a low level) most complicated data structures would be impossible. No linked-lists, binary-trees or hash-tables. No pass by reference, only by value. Since pointers can point to code, without them we would also have no virtual functions or function look up tables.
I use pointers and references heavily in my day to day work...in managed code (C#, Java) and unmanaged (C++, C). I learned about how to deal with pointers and what they are by the master himself...[Binky!!][1] Nothing else needs to be said ;)
The difference between a pointer and reference is this. A pointer is an address to some block of memory. It can be rewritten or in other words, reassigned to some other block of memory. A reference is simply a renaming of some object. It can only be assigned once! Once it is assigned to an object, it cannot be assigned to another. A reference is not an address, it is another name for the variable. Check out C++ FAQ for more on this.
Link1
LInk2
I'm currently waist-deep in designing some high level enterprise software in which chunks of data (stored in an SQL database, in this case) are referenced by 1 or more other entities. If a chunk of data remains when no more entities reference it, we're wasting storage. If a reference points so data that's not present, that's a big problem too.
There's a strong analogy to be made between our issues, and those of memory management in a language that uses pointers. It's tremendously useful to be able to talk to my colleagues in terms of that analogy. Not deleting unreferenced data is a "memory leak". A reference that goes nowhere is a "dangling pointer". We can choose explicit "frees", or we can implement "garbage collection" using "reference counting".
So here, understanding low-level memory management is helping design high-level applications.
In Java you're using pointers all the time. Most variables are pointers to objects - which is why:
StringBuffer x = new StringBuffer("Hello");
StringBuffer y = x;
x.append(" boys");
System.out.println(y);
... prints "Hello boys" and not "Hello".
The only difference in C is that it's common to add and subtract from pointers - and if you get the logic wrong you can end up messing with data you shouldn't be touching.
Strings are fundamental to C (and other related languages). When programming in C, you must manage your memory. You don't just say "okay, I'll need a bunch of strings"; you need to think about the data structure. How much memory do you need? When will you allocate it? When will you free it? Let's say you want 10 strings, each with no more than 80 characters.
Okay, each string is an array of characters (81 characters - you mustn't forget the null or you'll be sorry!) and then each string is itself in an array. The final result will be a multidimensional array something like
char dict[10][81];
Note, incidentally, that dict isn't a "string" or an "array", or a "char". It's a pointer. When you try to print one of those strings, all you're doing is passing the address of a single character; C assumes that if it just starts printing characters it will eventually hit a null. And it assumes that if you are at the start of one string, and you jump forward 81 bytes, you'll be at the start of the next string. And, in fact taking your pointer and adding 81 bytes to it is the only possible way to jump to the next string.
So, why are pointers important? Because you can't do anything without them. You can't even do something simple like print out a bunch of strings; you certainly can't do anything interesting like implement linked lists, or hashes, or queues, or trees, or a file system, or some memory management code, or a kernel or...whatever. You NEED to understand them because C just hands you a block of memory and let's you do the rest, and doing anything with a block of raw memory requires pointers.
Also many people suggest that the ability to understand pointers correlates highly with programming skill. Joel has made this argument, among others. For example
Now, I freely admit that programming with pointers is not needed in 90% of the code written today, and in fact, it's downright dangerous in production code. OK. That's fine. And functional programming is just not used much in practice. Agreed.
But it's still important for some of the most exciting programming jobs. Without pointers, for example, you'd never be able to work on the Linux kernel. You can't understand a line of code in Linux, or, indeed, any operating system, without really understanding pointers.
From here. Excellent article.
To be honest, most seasoned developers will have a laugh (hopefully friendly) if you don't know pointers.
At my previous Job we had two new hires last year (just graduated) that didn't know about pointers, and that alone was the topic of conversation with them for about a week. No one could believe how someone could graduate without knowing pointers...
References in C++ are fundamentally different from references in Java or .NET languages; .NET languages have special types called "byrefs" which behave much like C++ "references".
A C++ reference or .NET byref (I'll use the latter term, to distinguish from .NET references) is a special type which doesn't hold a variable, but rather holds information sufficient to identify a variable (or something that can behave as one, such as an array slot) held elsewhere. Byrefs are generally only used as function parameters/arguments, and are intended to be ephemeral. Code which passes a byref to a function guarantees that the variable which is identified thereby will exist at least until that function returns, and functions generally guarantee not to keep any copy of a byref after they return (note that in C++ the latter restriction is not enforced). Thus, byrefs cannot outlive the variables identified thereby.
In Java and .NET languages, a reference is a type that identifies a heap object; each heap object has an associated class, and code in the heap object's class can access data stored in the object. Heap objects may grant outside code limited or full access to the data stored therein, and/or allow outside code to call certain methods within their class. Using a reference to calling a method of its class will cause that reference to be made available to that method, which may then use it to access data (even private data) within the heap object.
What makes references special in Java and .NET languages is that they maintain, as an absolute invariant, that every non-null reference will continue to identify the same heap object as long as that reference exists. Once no reference to a heap object exists anywhere in the universe, the heap object will simply cease to exist, but there is no way a heap object can cease to exist while any reference to it exists, nor is there any way for a "normal" reference to a heap object to spontaneously become anything other than a reference to that object. Both Java and .NET do have special "weak reference" types, but even they uphold the invariant. If no non-weak references to an object exist anywhere in the universe, then any existing weak references will be invalidated; once that occurs, there won't be any references to the object and it can thus be invalidated.
Pointers, like both C++ references and Java/.NET references, identify objects, but unlike the aforementioned types of references they can outlive the objects they identify. If the object identified by a pointer ceases to exist but the pointer itself does not, any attempt to use the pointer will result in Undefined Behavior. If a pointer isn't known either to be null or to identify an object that presently exists, there's no standard-defined way to do anything with that pointer other than overwrite it with something else. It's perfectly legitimate for a pointer to continue to exist after the object identified thereby has ceased to do so, provided that nothing ever uses the pointer, but it's necessary that something outside the pointer indicate whether or not it's safe to use because there's no way to ask the pointer itself.
The key difference between pointers and references (of either type) is that references can always be asked if they are valid (they'll either be valid or identifiable as null), and if observed to be valid they will remain so as long as they exist. Pointers cannot be asked if they are valid, and the system will do nothing to ensure that pointers don't become invalid, nor allow pointers that become invalid to be recognized as such.
For a long time I didn't understand pointers, but I understood array addressing. So I'd usually put together some storage area for objects in an array, and then use an index to that array as the 'pointer' concept.
SomeObject store[100];
int a_ptr = 20;
SomeObject A = store[a_ptr];
One problem with this approach is that after I modified 'A', I'd have to reassign it to the 'store' array in order for the changes to be permanent:
store[a_ptr] = A;
Behind the scenes, the programming language was doing several copy-operations. Most of the time this didn't affect performance. It mostly made the code error-prone and repetitive.
After I learned to understand pointers, I moved away from implementing the array addressing approach. The analogy is still pretty valid. Just consider that the 'store' array is managed by the programming language's run-time.
SomeObject A;
SomeObject* a_ptr = &A;
// Any changes to a_ptr's contents hereafter will affect
// the one-true-object that it addresses. No need to reassign.
Nowadays, I only use pointers when I can't legitimately copy an object. There are a bunch of reasons why this might be the case:
To avoid an expensive object-copy
operation for the sake of
performance.
Some other factor doesn't permit an
object-copy operation.
You want a function call to have
side-effects on an object (don't
pass the object, pass the pointer
thereto).
In some languages- if you want to
return more than one value from a
function (though generally
avoided).
Pointers are the most pragmatic way of representing indirection in lower-level programming languages.
Pointers are important! They "point" to a memory address, and many internal structures are represented as pointers, IE, An array of strings is actually a list of pointers to pointers! Pointers can also be used for updating variables passed to functions.
You need them if you want to generate "objects" at runtime without pre allocate memory on the stack
Parameter efficency - passing a pointer (Int - 4 bytes) as opposed to copying a whole (arbitarily large) object.
Java classes are passed via reference (basically a pointer) also btw, its just that in java that's hidden from the programmer.
Programming in languages like C and C++ you are much closer to the "metal". Pointers hold a memory location where your variables, data, functions etc. live. You can pass a pointer around instead of passing by value (copying your variables and data).
There are two things that are difficult with pointers:
Pointers on pointers, addressing, etc. can get very cryptic. It leads to errors, and it is hard to read.
Memory that pointers point to is often allocated from the heap, which means you are responsible for releasing that memory. Bigger your application gets, harder it is to keep up with this requirement, and you end up with memory leaks that are hard to track down.
You could compare pointer behavior to how Java objects are passed around, with the exception that in Java you do not have to worry about freeing the memory as this is handled by garbage collection. This way you get the good things about pointers but do not have to deal with the negatives. You can still get memory leaks in Java of course if you do not de-reference your objects but that is a different matter.
Also just something to note, you can use pointers in C# (as opposed to normal references) by marking a block of code as unsafe. Then you can run around changing memory addresses directly and do pointer arithmetic and all that fun stuff. It's great for very fast image manipulation (the only place I personally have used it).
As far as I know Java and ActionScript don't support unsafe code and pointers.
I am always distressed by the focus on such things as pointers or references in high-level languages. It's really useful to think at a higher level of abstraction in terms of the behavior of objects (or even just functions) as opposed to thinking in terms of "let me see, if I send the address of this thing to there, then that thing will return me a pointer to something else"
Consider even a simple swap function. If you have
void swap(int & a, int & b)
or
procedure Swap(var a, b : integer)
then interpret these to mean that the values can be changed. The fact that this is being implemented by passing the addresses of the variables is just a distraction from the purpose.
Same with objects --- don't think of object identifiers as pointers or references to "stuff". Instead, just think of them as, well, OBJECTS, to which you can send messages. Even in primitive languages like C++, you can go a lot further a lot faster by thinking (and writing) at as high a level as possible.
Write more than 2 lines of c or c++ and you'll find out.
They are "pointers" to the memory location of a variable. It is like passing a variable by reference kinda.

Resources