C Language: Why does malloc() return a pointer, and not the value? - c

From my understanding of C it seems that you are supposed to use malloc(size) whenever you are trying to initialize, for instance, an array whose size you do not know of until runtime.
But I was wondering why the function malloc() returns a pointer to the location of the variable and why you even need that.
Basically, why doesn't C just hide it all from you, so that whenever you do something like this:
// 'n' gets stdin'ed from the user
...
int someArray[n];
for(int i = 0; i < n; i++)
someArray[i] = 5;
you can do it without ever having to call malloc() or some other function? Do other languages do it like this (by hiding the memory properties/location altogether)? I feel that as a beginner this whole process of dealing with the memory locations of variables you use just confuse programmers (and since other languages don't use it, C seems to make a simple initialization process such as this overly complicated)...
Basically, what I'm trying to ask is why malloc() is even necessary, because why the language doesn't take care of all that for you internally without the programmer having to be concerned about or having to see memory. Thanks
*edit: Ok, maybe there are some versions of C that I'm not aware of that allows you to forgo the use of malloc() but let's try to ignore that for now...

C lets you manage every little bit of your program. You can manage when memory gets allocated; you can manage when it gets deallocated; you can manage how to grow a small allocation, etc.
If you prefer not to manage that and let the compiler do it for you, use another language.

Actually C99 allows this (so you're not the only one thinking of it). The feature is called VLA (VAriable Length Array).
It's legal to read an int and then have an array of that size:
int n;
fscanf("%d", &n);
int array[n];
Of course there are limitations since malloc uses the heap and VLAs use the stack (so the VLAs can't be as big as the malloced objects).
*edit: Ok, maybe there are some versions of C that I'm not aware of that allows you to forgo the use of malloc() but let's try to ignore
that for now...
So we can concentrate on the flame ?

Basically, what I'm trying to ask is why malloc() is even necessary,
because why the language doesn't take care of all that for you
internally without the programmer having to be concerned about or
having to see memory.
The very point of malloc(), it's raison d'être, it's function, if you will, is to allocate a block of memory. The way we refer to a block of memory in C is by its starting address, which is by definition a pointer.
C is close to 40 years old, and it's not nearly as "high level" as some more modern languages. Some languages, like Java, attempt to prevent mistakes and simplify programming by hiding pointers and explicit memory management from the programmer. C is not like that. Why? Because it just isn't.

Basically, what I'm trying to ask is why malloc() is even necessary, because why the language doesn't take care of all that for you internally without the programmer having to be concerned about or having to see memory. Thanks
One of the hallmarks of C is its simplicity (C compilers are relatively easy to implement); one way of making a language simple is to force the programmer to do all his own memory management. Clearly, other languages do manage objects on the heap for you - Java and C# are modern examples, but the concept isn't new at all; Lisp implementations have been doing it for decades. But that convenience comes at a cost in both compiler complexity and runtime performance.
The Java/C# approach helps eliminate whole classes of memory-management bugs endemic to C (memory leaks, invalid pointer dereferences, etc.). By the same token, C provides a level of control over memory management that allows the programmer to achieve high levels of performance that would be difficult (not impossible) to match in other languages.

If the only purpose of dynamic allocation were to allocate variable-length arrays, then malloc() might not be necessary. (But note that malloc() was around long before variable-length arrays were added to the language.)
But the size of a VLA is fixed (at run time) when the object is created. It can't be resized, and it's deallocated only when you leave the scope in which it's declared. (And VLAs, unlike malloc(), don't have a mechanism for reporting allocation failures.)
malloc() gives you a lot more flexibility.
Consider creating a linked list. Each node is a structure, containing some data and a pointer to the next node in the list. You might know the size of each node in advance, but you don't know how many nodes to allocate. For example, you might read lines from a text file, creating and appending a new node for each line.
You can also use malloc() along with realloc() to create a buffer (say, an array of unsigned char) whose size can be changed after you created it.
Yes, there are languages that don't expose pointers, and that handle memory management for you.
A lot of them are implemented in C.

Maybe the question should be "why do you need something like int array[n] when you can use pointers?"
After all, pointers allow you to keep an object alive beyond the scope it was created in, you can use pointer to slice and dice arrays (for example strchr() returns a pointer to a string), pointers are light-weight objects, so it's cheap to pass them to functions and return them from functions, etc.
But the real answer is "that's how it is". Other options are possible, and the proof is that there are other languages that do other things (and even C99 allows different things).

C is treated as highly developed low-level language, basically malloc is used in dynamic arrays which is a key component in stack & queue. for other languages that hides the pointer part from the developer are not well capable of doing hardware related programming.

The short answer to your question is to ponder this question: What if you also need to control exactly when the memory is de-allocated?

C is a compiled language, not an interpreted one. If you don't know n at compile time, how is the compiler supposed to produce a binary?

Related

Why do so many standard C functions tamper with parameters instead of returning values?

Many functions like strcat, strcpy and alike don't return the actual value but change one of the parameters (usually a buffer). This of course creates a boatload of side effects.
Wouldn't it be far more elegant to just return a new string? Why isn't this done?
Example:
char *copy_string(char *text, size_t length) {
char *result = malloc(sizeof(char) * length);
for (int i = 0; i < length; ++i) {
result[i] = text[i];
}
return result;
}
int main() {
char *copy = copy_string("Hello World", 12);
// *result now lingers in memory and can not be freed?
}
I can only guess it has something to do with memory leaking since there is dynamic memory being allocated inside of the function which you can not free internally (since you need to return a pointer to it).
Edit: From the answers it seems that it is good practice in C to work with parameters rather than creating new variables. So I should aim for building my functions like that?
Edit 2: Will my example code lead to a memory leak? Or can *result be free'd?
To answer your original question: C, at the time it was designed, was tailored to be a language of maximum efficiency. It was, basically, just a nicer way of writing assembly code (the guy who designed it, wrote his own compiler for it).
What you say (that parameters are often used rather than return codes) is mainly true for string handling. Most other functions (those that deal with numbers for example) work through return codes as expected. Or they only modify values for parameters if they have to return more than one value.
String handling in C today is considered one of the major (if not THE major) weakness in C. But those functions were written with performance in mind, and with the machines available those days (and the intent of performance) working on the callers buffers was the way of choice.
Re your edit 1: Today other intents may apply. Performance usually isn't the limiting factor. Equally or important are readability, robustness, pronenees to error. And generally, as said, the string handling in C is today generally considered an horrible relic of the past. So it's basically your choice, depending on your intent.
Re your edit 2: Yes, the memory will leak. You need to call free(copy); Which ties into edit 1: proneness of error - it's easy to forget the free and create leaks that way (or attempt to free it twice or access it after it was freed). It may be more readable and more more prone to error too (even more than the clunky original C approach of modifying the caller's buffer).
Generally, I'd suggest, whenever you have the choice, to work with a newer dialect that support std-string or something similar.
Why do so many standard C functions tamper with parameters instead of returning values?
Because that's often what the users of the C library wants.
Many functions like strcat, strcpy and alike don't return the actual value but change one of the parameters (usually a buffer). This of course creates a boatload of side effects. Wouldn't it be far more elegant to just return a new string? Why isn't this done?
It's not very efficient to allocate a memory and it'll require the user to free() them later, which is an unnecessary burden on the user. Efficiency and letting users do what they want (even if they want shoot themselves in the foot) is a part of C's philosophy.
Besides, there are syntax/implementation issues. For example, how can the following be done if the strcpy() function actually returns a newly allocated string?
char arr[256] = "Hello";
strcpy(arr, "world");
Because C doesn't allow you assign something to an array (arr).
Basically, you are questioning C is the way it is. For that question, the common answer is "historical reasons".
Two reasons:
Properly designed functions should only concern themselves with their designated purpose, and not unrelated things such as memory allocation.
Making a hard copy of the string would make the function far slower.
So for your example, if there is a need for a hard copy, the caller should malloc the buffer and afterwards call strcpy. That separates memory allocation from the algorithm.
On top of that, good design practice dictates that the module that allocated memory should also be responsible for freeing it. Otherwise the caller might not even realize that the function is allocating memory, and there would be a memory leak. If the caller instead is responsible for the allocation, then it is obvious that the caller is also responsible for clean-up.
Overall, C standard library functions are designed to be as fast as possible, meaning they will strive to meet the case where the caller has minimal requirements. A typical example of such a function is malloc, which doesn't even set the allocated data to zero, because that would take extra time. Instead they added an additional function calloc for that purpose.
Other languages have different philosophies, where they would for example force a hard copy for all string handling functions ("immutable objects"). This makes the function easier to work with and perhaps also the code easier to read, but it comes at the expense of a slower program, which needs more memory.
This is one of the main reasons why C is still widely used for development. It tends is much faster and more efficient than any other language (except raw assembler).

Why does Passing Pointers as input arguments allow "swapping" and other possibly unintended side effects?

C beginner. Below is a page from the lecture notes at my university (4 hours total to learn structured programming). The procedure manages to change the places that global pointers address, despite not having any output. This is confusing as hell, and seems a dangerous thing to be allowed. Can anyone explain what is happening in the second case, and how it happens. Ie: how are the normal rules bypassed?
I know this has been asked from a "I want to use this" background here and less generally in other places. But I am asking why this is allowed, when normally, changing values in a function would produce no effect unless they are returned?
That is basically the whole purpose of having pointers: you can change the values contained at a location in memory, and you can share the same pointer between different pieces of code.
In C, you can make a pointer to almost anything, and then pass that pointer to another function so that the other function can modify the object the pointer points to. Whew, that's a mouthful. C and C++ are fairly unusual in letting you make pointers so freely, many other languages have tight restrictions on what pointers can point to. Java, for example, only allows pointers to point to object instances, and you can't do pointer arithmetic. (Yes, Java has pointers.)
And yes, it is "dangerous", in the sense that it's easy to write incorrect programs that misuse pointers. Dangling pointers, buffer overflows, null pointer dereferencing, and many other types of programming errors are possible in C because of the way pointers work. If you're lucky, your program will crash and you can debug it. If you're unlucky, it won't crash.
The danger in using pointers is one of the primary motivations between other languages such as Java, things like std::unique_ptr<> in C++, C#, Rust, Go, etc. If you are writing C, you just have to be careful. So, why use C? Sometimes, you need to use those pointers and scribble over memory however you want. The Linux kernel is mostly written in C, and plenty of language runtimes are at least partially written in C (I know this applies at least to Python, Java, and Go).
People still write in assembly language, too. Sometimes even C doesn't let you do what you need.

Assert the allocation of a variable-length array

I apologize for the possible duplicate (have not been able to find an answer to that):
Do we need to ensure that the allocation of a variable-length array has completed successfully?
For example:
void func(int size)
{
int arr[size];
if (arr == NULL)
{
// Exit with a failure
}
else
{
// Continue as planned
}
}
It seems obvious that the answer is yes, but the syntax arr == NULL feels a bit unusual.
Thanks
UPDATE:
I admit to the fact that I haven't made sure that the code above even compiles (assuming that it does).
If it doesn't compile, then it means that there is no way to assert the allocation of a variable-length array.
Hence, I assume that if the allocation fails, then the program crashes immediately.
This would be a very awkward case, as it makes sense for a program to crash after an illegal memory access (read or write), but not after a non-successful memory allocation.
Or perhaps the allocation will not cause anything, but as soon as I access the array at an entry which "falls" outside the stack, I might get a memory access violation (as in a stack-overflow)...?
To be honest, I can't even see how VLAs are allocated on the stack if any more local variables follow them (other VLAs in particular), so I would appreciate an answer on that issue as well.
This question proceeds from a slightly flawed first premise. You cannot check if an array is NULL because, as is a popular discussion topic, an array is not a pointer in C. An array is the storage object, in-place.
You cannot get to code where the array name is accessible without the array having been allocated. A local array is exactly the same as any other local variable: its existence is inherent and assumed for the surrounding code to be running at all, and there's no notion in the language for checking whether any given variable slot has been "allocated" at all (as the comments on the question note, "the stack" is a notion below the level C operates on - the language assumes it "happens", by unspecified magic). It has to assume this much always succeeds in order for the code to make sense on a most basic level.
What happens in the case where the array couldn't be allocated is therefore the same as whatever happens when the runtime can't allocate space for any other local variable - the situation is inherently undefined and undefinable, because an assumption made by the C language abstract machine was violated. The language has no (fully formal) concepts that can even express this, let alone check for it or recover from it, so testing for it is similarly out of scope. Like a stack overflow, this is basically guaranteed to lead to a fatal crash.
This does not make VLAs useless, for several reasons:
Many uses of VLAs aren't going to be life-threateningly huge. Perhaps the only use of the variation is to choose a number between 3 and 5? This is no worse for space than using a few more scalar locals.
Just as avoiding infinite recursion requires the programmer to prove certain properties that a C compiler doesn't, similarly you should design your program with at least a weak bound on the amount of space VLAs will be allowed to consume at any given time. For instance, you can prove to yourself that no VLA functions are ever recursive, or called from a recursive function, and none of them use more than e.g. 10K space - that's plenty useful and should be safe.
You can view VLAs as an optimisation to allow you to save space where you otherwise would have had to allocate a statically-sized local array (e.g. in the first example, always allocating 5 instead of 3). As long as you know, and design around, the static upper bound, they are effectively guaranteed to make your program safer from overflow, by providing an option to not always use as much space when it isn't required.

Why isn't there a "memsize" in C which returns the size of a memory block allocated in the heap using malloc?

ok. It can be called anything else as in _msize in Visual Studio.
But why is it not in the standard to return the size of the memory given the memory block alloced using malloc? Since we can not tell how much memory is pointed to by the return pointer following malloc, we could use this "memsize" call to return that information should we need it. "memsize" would be implementation specific as are malloc/free
Just asking as I had to write a wrapper sometime back to store some additional bytes for the size.
Because the C library, including malloc, was designed for minimum overhead. A function like the one you want would require the implementation to record the exact size of the allocation, while implementations may now choose to "round" the size up as they please, to prevent actually reallocating in realloc.
Storing the size requires an extra size_t per allocation, which may be heavy for embedded systems. (And for the PDP-11s and 286s that were still abundant when C89 was written.)
To turn this around, why should there be? There's plenty of stuff in the Standards already, particularly the C++ standard. What are your use cases?
You ask for an adequately-sized chunk of memory, and you get it (or a null pointer or exception). There may or may not be additional bytes allocated, and some of these may be reserved. This is conceptually simple: you ask for what you want, and you get something you can use.
Why complicate it?
I don't think there is any definite answer. The developers of the standard probably considered it, and weighed the pros and cons. Anything that goes into a standard must be implemented by every implementation, so adding things to it places a significant burden on developers. I guess they just didn't find that feature useful enough to warrant this.
In C++, the wrapper that you talk about is provided by the standard. If you allocate a block of memory with std::vector, you can use the member function vector::size() to determine the size of the array and use vector::capacity() to determine the size of the allocation (which might be different).
C, on the other hand, is a low-level language which leaves such concerns to be managed by the developer, since tracking it dynamically (as you suggest) is not strictly necessary and would be redundant in many cases.

What is the real difference between Pointers and References?

AKA - What's this obsession with pointers?
Having only really used modern, object oriented languages like ActionScript, Java and C#, I don't really understand the importance of pointers and what you use them for. What am I missing out on here?
It's all just indirection: The ability to not deal with data, but say "I'll direct you to some data, over there". You have the same concept in Java and C#, but only in reference format.
The key differences are that references are effectively immutable signposts - they always point to something. This is useful, and easy to understand, but less flexible than the C pointer model. C pointers are signposts that you can happily rewrite. You know that the string you're looking for is next door to the string being pointed at? Well, just slightly alter the signpost.
This couples well with C's "close to the bone, low level knowledge required" approach. We know that a char* foo consists of a set of characters beginning at the location pointed to by the foo signpost. If we also know that the string is at least 10 characters long, we can change the signpost to (foo + 5) to point at then same string, but start half the length in.
This flexibility is useful when you know what you're doing, and death if you don't (where "know" is more than just "know the language", it's "know the exact state of the program"). Get it wrong, and your signpost is directing you off the edge of a cliff. References don't let you fiddle, so you're much more confident that you can follow them without risk (especially when coupled with rules like "A referenced object will never disappear", as in most Garbage collected languages).
You're missing out on a lot! Understanding how the computer works on lower levels is very useful in several situations. C and assembler will do that for you.
Basically a pointer lets you write stuff to any point in the computer's memory. On more primitive hardware/OS or in embedded systems this actually might do something useful. Say turn the blinkenlichts on and off again.
Of course this doesn't work on modern systems. The operating system is the Lord and Master of main memory. If you try to access a wrong memory location, your process will pay for its hubris with its life.
In C, pointers are the way of passing references to data. When you call a function, you don't want to copy a million bits to a stack. Instead you just tell where the data resides in the main memory. In other words, you give a pointer to the data.
To some extent that is what happens even with Java. You pass references to objects, not the objects themselves. Remember, ultimately every object is a set of bits in the computer main memory.
Pointers are for directly manipulating the contents of memory.
It's up to you whether you think this is a good thing to do, but it's the basis of how anything gets done in C or assembler.
High-level languages hide pointers behind the scenes: for example a reference in Java is implemented as a pointer in almost any JVM you'll come across, which is why it's called NullPointerException rather than NullReferenceException. But it doesn't let the programmer directly access the memory address it points to, and it can't be modified to take a value other than the address of an object of the correct type. So it doesn't offer the same power (and responsibility) that pointers in low-level languages do.
[Edit: this is an answer to the question 'what's this obsession with pointers?'. All I've compared is assembler/C-style pointers with Java references. The question title has since changed: had I set out to answer the new question I might have mentioned references in languages other than Java]
This is like asking, “what's this obsession with CPU instructions? Do I miss out on something by not sprinkling x86 MOV instructions all over the place?”
You just need pointers when programming on a low level. In most higher-level programming language implementations, pointers are used just as extensively as in C, but hidden from the user by the compiler.
So... Don't worry. You're using pointers already -- and without the dangers of doing so incorrectly, too. :)
I see pointers as a manual transmission in a car. If you learn to drive with a car that has an automatic transmission, that won't make for a bad driver. And you can still do most everything that the drivers that learned on a manual transmission can do. There will just be a hole in your knowledge of driving. If you had to drive a manual you'd probably be in trouble. Sure, it is easy to understand the basic concept of it, but once you have to do a hill start, you're screwed. But, there is still a place for manual transmissions. For instance, race car drivers need to be able to shift to get the car to respond in the most optimal way to the current racing conditions. Having a manual transmission is very important to their success.
This is very similar to programming right now. There is a need for C/C++ development on some software. Some examples are high-end 3D games, low level embedded software, things where speed is a critical part of the software's purpose, and a lower level language that allows you closer access to the actual data that needs to be processed is key to that performance. However, for most programmers this is not the case and not knowing pointers is not crippling. However, I do believe everybody can benefit from learning about C and pointers, and manual transmissions too.
Since you have been programming in object-oriented languages, let me put it this way.
You get Object A instantiate Object B, and you pass it as a method parameter to Object C. The Object C modifies some values in the Object B. When you are back to Object A's code, you can see the changed value in Object B. Why is this so?
Because you passed in a reference of Object B to Object C, not made another copy of Object B. So Object A and Object C both hold references to the same Object B in memory. Changes from one place and be seen in another. This is called By Reference.
Now, if you use primitive types instead, like int or float, and pass them as method parameters, changes in Object C cannot be seen by Object A, because Object A merely passed a copy instead of a reference of its own copy of the variable. This is called By Value.
You probably already knew that.
Coming back to the C language, Function A passes to Function B some variables. These function parameters are natively copies, By Value. In order for Function B to manipulate the copy belonging to Function A, Function A must pass a pointer to the variable, so that it becomes a pass By Reference.
"Hey, here's the memory address to my integer variable. Put the new value at that address location and I will pick up later."
Note the concept is similar but not 100% analogous. Pointers can do a lot more than just passing "by reference". Pointers allow functions to manipulate arbitrary locations of memory to whatever value required. Pointers are also used to point to new addresses of execution code to dynamically execute arbitrary logic, not just data variables. Pointers may even point to other pointers (double pointer). That is powerful but also pretty easy to introduce hard-to-detect bugs and security vulnerabilities.
If you haven't seen pointers before, you're surely missing out on this mini-gem:
void strcpy(char *dest, char *src)
{
while(*dest++ = *src++);
}
Historically, what made programming possible was the realization that memory locations could hold computer instructions, not just data.
Pointers arose from the realization that memory locations could also hold the address of other memory locations, thus giving us indirection. Without pointers (at a low level) most complicated data structures would be impossible. No linked-lists, binary-trees or hash-tables. No pass by reference, only by value. Since pointers can point to code, without them we would also have no virtual functions or function look up tables.
I use pointers and references heavily in my day to day work...in managed code (C#, Java) and unmanaged (C++, C). I learned about how to deal with pointers and what they are by the master himself...[Binky!!][1] Nothing else needs to be said ;)
The difference between a pointer and reference is this. A pointer is an address to some block of memory. It can be rewritten or in other words, reassigned to some other block of memory. A reference is simply a renaming of some object. It can only be assigned once! Once it is assigned to an object, it cannot be assigned to another. A reference is not an address, it is another name for the variable. Check out C++ FAQ for more on this.
Link1
LInk2
I'm currently waist-deep in designing some high level enterprise software in which chunks of data (stored in an SQL database, in this case) are referenced by 1 or more other entities. If a chunk of data remains when no more entities reference it, we're wasting storage. If a reference points so data that's not present, that's a big problem too.
There's a strong analogy to be made between our issues, and those of memory management in a language that uses pointers. It's tremendously useful to be able to talk to my colleagues in terms of that analogy. Not deleting unreferenced data is a "memory leak". A reference that goes nowhere is a "dangling pointer". We can choose explicit "frees", or we can implement "garbage collection" using "reference counting".
So here, understanding low-level memory management is helping design high-level applications.
In Java you're using pointers all the time. Most variables are pointers to objects - which is why:
StringBuffer x = new StringBuffer("Hello");
StringBuffer y = x;
x.append(" boys");
System.out.println(y);
... prints "Hello boys" and not "Hello".
The only difference in C is that it's common to add and subtract from pointers - and if you get the logic wrong you can end up messing with data you shouldn't be touching.
Strings are fundamental to C (and other related languages). When programming in C, you must manage your memory. You don't just say "okay, I'll need a bunch of strings"; you need to think about the data structure. How much memory do you need? When will you allocate it? When will you free it? Let's say you want 10 strings, each with no more than 80 characters.
Okay, each string is an array of characters (81 characters - you mustn't forget the null or you'll be sorry!) and then each string is itself in an array. The final result will be a multidimensional array something like
char dict[10][81];
Note, incidentally, that dict isn't a "string" or an "array", or a "char". It's a pointer. When you try to print one of those strings, all you're doing is passing the address of a single character; C assumes that if it just starts printing characters it will eventually hit a null. And it assumes that if you are at the start of one string, and you jump forward 81 bytes, you'll be at the start of the next string. And, in fact taking your pointer and adding 81 bytes to it is the only possible way to jump to the next string.
So, why are pointers important? Because you can't do anything without them. You can't even do something simple like print out a bunch of strings; you certainly can't do anything interesting like implement linked lists, or hashes, or queues, or trees, or a file system, or some memory management code, or a kernel or...whatever. You NEED to understand them because C just hands you a block of memory and let's you do the rest, and doing anything with a block of raw memory requires pointers.
Also many people suggest that the ability to understand pointers correlates highly with programming skill. Joel has made this argument, among others. For example
Now, I freely admit that programming with pointers is not needed in 90% of the code written today, and in fact, it's downright dangerous in production code. OK. That's fine. And functional programming is just not used much in practice. Agreed.
But it's still important for some of the most exciting programming jobs. Without pointers, for example, you'd never be able to work on the Linux kernel. You can't understand a line of code in Linux, or, indeed, any operating system, without really understanding pointers.
From here. Excellent article.
To be honest, most seasoned developers will have a laugh (hopefully friendly) if you don't know pointers.
At my previous Job we had two new hires last year (just graduated) that didn't know about pointers, and that alone was the topic of conversation with them for about a week. No one could believe how someone could graduate without knowing pointers...
References in C++ are fundamentally different from references in Java or .NET languages; .NET languages have special types called "byrefs" which behave much like C++ "references".
A C++ reference or .NET byref (I'll use the latter term, to distinguish from .NET references) is a special type which doesn't hold a variable, but rather holds information sufficient to identify a variable (or something that can behave as one, such as an array slot) held elsewhere. Byrefs are generally only used as function parameters/arguments, and are intended to be ephemeral. Code which passes a byref to a function guarantees that the variable which is identified thereby will exist at least until that function returns, and functions generally guarantee not to keep any copy of a byref after they return (note that in C++ the latter restriction is not enforced). Thus, byrefs cannot outlive the variables identified thereby.
In Java and .NET languages, a reference is a type that identifies a heap object; each heap object has an associated class, and code in the heap object's class can access data stored in the object. Heap objects may grant outside code limited or full access to the data stored therein, and/or allow outside code to call certain methods within their class. Using a reference to calling a method of its class will cause that reference to be made available to that method, which may then use it to access data (even private data) within the heap object.
What makes references special in Java and .NET languages is that they maintain, as an absolute invariant, that every non-null reference will continue to identify the same heap object as long as that reference exists. Once no reference to a heap object exists anywhere in the universe, the heap object will simply cease to exist, but there is no way a heap object can cease to exist while any reference to it exists, nor is there any way for a "normal" reference to a heap object to spontaneously become anything other than a reference to that object. Both Java and .NET do have special "weak reference" types, but even they uphold the invariant. If no non-weak references to an object exist anywhere in the universe, then any existing weak references will be invalidated; once that occurs, there won't be any references to the object and it can thus be invalidated.
Pointers, like both C++ references and Java/.NET references, identify objects, but unlike the aforementioned types of references they can outlive the objects they identify. If the object identified by a pointer ceases to exist but the pointer itself does not, any attempt to use the pointer will result in Undefined Behavior. If a pointer isn't known either to be null or to identify an object that presently exists, there's no standard-defined way to do anything with that pointer other than overwrite it with something else. It's perfectly legitimate for a pointer to continue to exist after the object identified thereby has ceased to do so, provided that nothing ever uses the pointer, but it's necessary that something outside the pointer indicate whether or not it's safe to use because there's no way to ask the pointer itself.
The key difference between pointers and references (of either type) is that references can always be asked if they are valid (they'll either be valid or identifiable as null), and if observed to be valid they will remain so as long as they exist. Pointers cannot be asked if they are valid, and the system will do nothing to ensure that pointers don't become invalid, nor allow pointers that become invalid to be recognized as such.
For a long time I didn't understand pointers, but I understood array addressing. So I'd usually put together some storage area for objects in an array, and then use an index to that array as the 'pointer' concept.
SomeObject store[100];
int a_ptr = 20;
SomeObject A = store[a_ptr];
One problem with this approach is that after I modified 'A', I'd have to reassign it to the 'store' array in order for the changes to be permanent:
store[a_ptr] = A;
Behind the scenes, the programming language was doing several copy-operations. Most of the time this didn't affect performance. It mostly made the code error-prone and repetitive.
After I learned to understand pointers, I moved away from implementing the array addressing approach. The analogy is still pretty valid. Just consider that the 'store' array is managed by the programming language's run-time.
SomeObject A;
SomeObject* a_ptr = &A;
// Any changes to a_ptr's contents hereafter will affect
// the one-true-object that it addresses. No need to reassign.
Nowadays, I only use pointers when I can't legitimately copy an object. There are a bunch of reasons why this might be the case:
To avoid an expensive object-copy
operation for the sake of
performance.
Some other factor doesn't permit an
object-copy operation.
You want a function call to have
side-effects on an object (don't
pass the object, pass the pointer
thereto).
In some languages- if you want to
return more than one value from a
function (though generally
avoided).
Pointers are the most pragmatic way of representing indirection in lower-level programming languages.
Pointers are important! They "point" to a memory address, and many internal structures are represented as pointers, IE, An array of strings is actually a list of pointers to pointers! Pointers can also be used for updating variables passed to functions.
You need them if you want to generate "objects" at runtime without pre allocate memory on the stack
Parameter efficency - passing a pointer (Int - 4 bytes) as opposed to copying a whole (arbitarily large) object.
Java classes are passed via reference (basically a pointer) also btw, its just that in java that's hidden from the programmer.
Programming in languages like C and C++ you are much closer to the "metal". Pointers hold a memory location where your variables, data, functions etc. live. You can pass a pointer around instead of passing by value (copying your variables and data).
There are two things that are difficult with pointers:
Pointers on pointers, addressing, etc. can get very cryptic. It leads to errors, and it is hard to read.
Memory that pointers point to is often allocated from the heap, which means you are responsible for releasing that memory. Bigger your application gets, harder it is to keep up with this requirement, and you end up with memory leaks that are hard to track down.
You could compare pointer behavior to how Java objects are passed around, with the exception that in Java you do not have to worry about freeing the memory as this is handled by garbage collection. This way you get the good things about pointers but do not have to deal with the negatives. You can still get memory leaks in Java of course if you do not de-reference your objects but that is a different matter.
Also just something to note, you can use pointers in C# (as opposed to normal references) by marking a block of code as unsafe. Then you can run around changing memory addresses directly and do pointer arithmetic and all that fun stuff. It's great for very fast image manipulation (the only place I personally have used it).
As far as I know Java and ActionScript don't support unsafe code and pointers.
I am always distressed by the focus on such things as pointers or references in high-level languages. It's really useful to think at a higher level of abstraction in terms of the behavior of objects (or even just functions) as opposed to thinking in terms of "let me see, if I send the address of this thing to there, then that thing will return me a pointer to something else"
Consider even a simple swap function. If you have
void swap(int & a, int & b)
or
procedure Swap(var a, b : integer)
then interpret these to mean that the values can be changed. The fact that this is being implemented by passing the addresses of the variables is just a distraction from the purpose.
Same with objects --- don't think of object identifiers as pointers or references to "stuff". Instead, just think of them as, well, OBJECTS, to which you can send messages. Even in primitive languages like C++, you can go a lot further a lot faster by thinking (and writing) at as high a level as possible.
Write more than 2 lines of c or c++ and you'll find out.
They are "pointers" to the memory location of a variable. It is like passing a variable by reference kinda.

Resources