Using Structs in Functions - c

I have a function and i'm accessing a struct's members a lot of times in it.
What I was wondering about is what is the good practice to go about this?
For example:
struct s
{
int x;
int y;
}
and I have allocated memory for 10 objects of that struct using malloc.
So, whenever I need to use only one of the object in a function, I usually create (or is passed as argument) pointer and point it to the required object (My superior told me to avoid array indexing because it adds a calculation when accessing any member of the struct)
But is this the right way? I understand that dereferencing is not as expensive as creating a copy, but what if I'm dereferencing a number of times (like 20 to 30) in the function.
Would it be better if i created temporary variables for the struct variables (only the ones I need, I certainly don't use all the members) and copy over the value and then set the actual struct's value before returning?
Also, is this unnecessary micro optimization? Please note that this is for embedded devices.

This is for an embedded system. So, I can't make any assumptions about what the compiler will do. I can't make any assumptions about word size, or the number of registers, or the cost of accessing off the stack, because you didn't tell me what the architecture is. I used to do embedded code on 8080s when they were new...
OK, so what to do?
Pick a real section of code and code it up. Code it up each of the different ways you have listed above. Compile it. Find the compiler option that forces it to print out the assembly code that is produced. Compile each piece of code with every different set of optimization options. Grab the reference manual for the processor and count the cycles used by each case.
Now you will have real data on which to base a decision. Real data is much better that the opinions of a million highly experience expert programmers. Sit down with your lead programmer and show him the code and the data. He may well show you better ways to code it. If so, recode it his way, compile it, and count the cycles used by his code. Show him how his way worked out.
At the very worst you will have spent a weekend learning something very important about the way your compiler works. You will have examined N ways to code things times M different sets of optimization options. You will have learned a lot about the instruction set of the machine. You will have learned how good, or bad, the compiler is. You will have had a chance to get to know your lead programmer better. And, you will have real data.
Real data is the kind of data that you must have to answer this question. With out that data nothing anyone tells you is anything but an ego based guess. Data answers the question.
Bob Pendleton

First of all, indexing an array is not very expensive (only like one operation more expensive than a pointer dereference, or sometimes none, depending on the situation).
Secondly, most compilers will perform what is called RVO or return value optimisation when returning structs by value. This is where the caller allocates space for the return value of the function it calls, and secretly passes the address of that memory to the function for it to use, and the effect is that no copies are made. It does this automatically, so
struct mystruct blah = func();
Only constructs one object, passes it to func for it to use transparently to the programmer, and no copying need be done.
What I do not know is if you assign an array index the return value of the function, like this:
someArray[0] = func();
will the compiler pass the address of someArray[0] and do RVO that way, or will it just not do that optimisation? You'll have to get a more experienced programmer to answer that. I would guess that the compiler is smart enough to do it though, but it's just a guess.
And yes, I would call it micro optimisation. But we're C programmers. And that's how we roll.

Generally, the case in which you want to make a copy of a passed struct in C is if you want to manipulate the data in place. That is to say, have your changes not be reflected in the struct it self but rather only in the return value. As for which is more expensive, it depends on a lot of things. Many of which change implementation to implementation so I would need more specific information to be more helpful. Though, I would expect, that in an embedded environment you memory is at a greater premium than your processing power. Really this reads like needless micro optimization, your compiler should handle it.

In this case creating temp variable on the stack will be faster. But if your structure is much bigger then you might be better with dereferencing.

Related

Economizing on variable use

I am working on some (embedded) device, recently I just started thinking maybe to use less memory, in case stack size isn't that big.
I have long functions (unfortunately).
And inside I was thinking to save space in this way.
Imagine there is code
1. void f()
2. {
3. ...
4. char someArray[300];
5. char someOtherArray[300];
6. someFunc(someArray, someOtherArray);
7. ...
8. }
Now, imagine, someArray and someOtherArray are never used in f function beyond line: 6.
Would following save some stack space??
1. void f()
2. {
3. ...
4. {//added
5. char someArray[300];
6. char someOtherArray[300];
7. someFunc(someArray, someOtherArray);
8. }//added
9. ...
8. }
nb: removed second part of the question
For the compiler proper both are exactly the same and thus makes no difference. The preprocessor would replace all instances of TEXT1 with the string constant.
#define TEXT1 "SomeLongStringLiteral"
someFunc(TEXT1)
someOtherFunc(TEXT1)
After the preprocessor's job is done, the above snippet becomes
someFunc("SomeLongStringLiteral");
someOtherFunc("SomeLongStringLiteral");
Thus it makes no difference performance or memory-wise.
Aside: The reason #define TEXT1 "SomeLongStringLiteral" is done is to have a single place to change all instances of TEXT1s usage; but that's a convinience only for the programmer and has no effect on the produced output.
recently I just started thinking maybe to use less memory, in case stack size isn't that big.
Never micro optimise or prematurely optimise. In case the stack size isn't that big, you'll get to know it when you benchmark/measure it. Don't make any assumptions when you optimise; 99% of the times it'd be wrong.
I am working on some device
Really? Are you? I wouldn't have thought that.
Now, imagine, someArray and someOtherArray are never used in f function beyond line 6. Would following save some stack space?
On a good compiler, it wouldn't make a difference. By the standard, it isn't specified if it saves or not, it isn't even specified if there is a stack or not.
But on a not so good compiler, the one with the additional {} may be better. It is worth a test: compile it and look at the generated assembler code.
it seems my compiler doesn't allow me to do this (this is C), so never mind...
But it should so. What happens then? Maybe you are just confusing levels of {} ...
I'll ask another one here.
Would better be a separate question...
someFunc("SomeLongStringLiteral");
someOtherFunc("SomeLongStringLiteral");
vs.
someFunc(TEXT1)
someOtherFunc(TEXT1)
A #define is processed before any compilation step, so it makes absolutely no difference.
If it happens within the same compilation unit, the compiler will tie them together anyway. (At least, in this case. On an ATXmega, if you use PSTR("whatever") for having them in flash space only, each occurrence of them will be put into flash separately. But that's a completely different thing...)
Modern compilers should push stack variables before they are used, and pop them when they are no longer needed. The old thinking with { ... } marking the start and end of a stack push/pop should be rather obsolete by now.
Since 1999, C allows stack variables to be allocated anywhere and not just immediately after a {. C++ allowed this far earlier. Today, where the local variable is declared inside the scope has little to do with when it actually starts to exist in the machine code. And similarly, the } has little to do with when is ceases to exist.
So regarding adding extra { }, don't bother. It is premature optimization and only adds pointless clutter.
Regarding the #define it absolutely makes no difference in terms of efficiency. Macros are just text replacement.
Furthermore, from the generic point-of-view, data must always be allocated somewhere. Data used by a program cannot be allocated in thin air! That's a very common misunderstanding. For example, many people incorrectly believe that
int x = func();
if(x == something)
consumes more memory than
if(func() == something)
But both examples compile into identical machine code. The result of func must be stored somewhere, it cannot be stored in thin air. In the first example, the result is stored in a memory segment that the programmer may refer to as x.
In the second example, it is stored in the very same memory segment, taking up the same amount of space, for the same duration of program execution. The only difference is that the memory segment is anonymous and the programmer has no name for it. As far as the machine code is concerned, that doesn't matter, since no variable names exist in machine code.
And this would be why every professional C programmer needs to understand a certain amount of assembler. You cannot hope to ever do any kind of manual code optimization if you don't.
(Please don't ask two questions in one, this is really annoying since you get two types of answer for your two different questions.)
For your first question. Probably putting {} around the use of a variable will not help. The lifetime of automatic variables that are not VLA (see below) is not bound to the scope in which it is declared. So compilers may have a hard time in figuring out how the use of the stack may be optimized, and maybe don't do such an optimization at all. In your case this is most likely the case, since you are exporting pointers to your data to a function that is perhaps not visible, here. The compiler has no way to figure out if there is a valid use of the arrays later on in the code.
I see two ways to "force" the compiler into optimizing that space, functions or VLA. The first, functions is simple: instead of putting the block around the code, put it in a static function. Function calls are quite optimized on modern platforms, and here the compiler knows exactly how he may clear the stack at the end.
The second alternative in your case is a VLA, variable length array, if you compiler supports that c99 feature. Arrays that have a size that doesn't depend on a compile time constant have a special rule for their lifetime. That lifetime exactly ends at the end of the scope where they are defined. Even a const-qualified variable could be used for that:
{
size_t const len = 300;
char someArray[len];
char someOtherArray[len];
someFunc(someArray, someOtherArray);
}
At the end, on a given platform, you'd really have to inspect what assembler your compiler produces.

Global Variables performance effect (c, c++)

I'm currently developing a very fast algorithm, with one part of it being an extremely fast scanner and statistics function.
In this quest, i'm after any performance benefit.
Therefore, I'm also interested in keeping the code "multi-thread" friendly.
Now for the question :
i've noticed that putting some very frequently accessed variables and arrays into "Global", or "static local" (which does the same), there is a measurable performance benefit (in the range of +10%).
I'm trying to understand why, and to find a solution about it, since i would prefer to avoid using these types of allocation.
Note that i don't think the difference comes from "allocation", since allocating a few variables and small array on the stack is almost instantaneous. I believe the difference comes from "accessing" and "modifying" data.
In this search, i've found this old post from stackoverflow :
C++ performance of global variables
But i'm very disappointed by the answers there. Very little explanation, mostly ranting about "you should not do that" (hey, that's not the question !) and very rough statements like 'it doesn't affect performance', which is obviously incorrect, since i'm measuring it with precise benchmark tools.
As said above, i'm looking for an explanation, and, if it exists, a solution to this issue. So far, i've got the feeling that calculating the memory address of a local (dynamic) variable costs a bit more than a global (or local static). Maybe something like an ADD operation difference. But that doesn't help finding a solution...
It really depends on your compiler, platform, and other details. However, I can describe one scenario where global variables are faster.
In many cases, a global variable is at a fixed offset. This allows the generated instructions to simply use that address directly. (Something along the lines of MOV AX,[MyVar].)
However, if you have a variable that's relative to the current stack pointer or a member of a class or array, some math is required to take the address of the array and determine the address of the actual variable.
Obviously, if you need to place some sort of mutex on your global variable in order to keep it thread-safe, then you'll almost certainly more than lose any performance gain.
Creating local variables can be literally free if they are POD types. You likely are overflowing a cache line with too many stack variables or other similar alignment-based causes which are very specific to your piece of code. I usually find that non-local variables significantly decrease performance.
It's hard to beat static allocation for speed, and while the 10% is a pretty small difference, it could be due to address calculation.
But if you're looking for speed,
your example in a comment while(p<end)stats[*p++]++; is an obvious candidate for unrolling, such as:
static int stats[M];
static int index_array[N];
int *p = index_array, *pend = p+N;
// ... initialize the arrays ...
while (p < pend-8){
stats[p[0]]++;
stats[p[1]]++;
stats[p[2]]++;
stats[p[3]]++;
stats[p[4]]++;
stats[p[5]]++;
stats[p[6]]++;
stats[p[7]]++;
p += 8;
}
while(p<pend) stats[*p++]++;
Don't count on the compiler to do it for you. It might or might not be able to figure it out.
Other possible optimizations come to mind, but they depend on what you're actually trying to do.
If you have something like
int stats[256]; while (p<end) stats[*p++]++;
static int stats[256]; while (p<end) stats[*p++]++;
you are not really comparing the same thing because for the first instance you are not doing an initialization of your array. Written explicitly the second line is equivalent to
static int stats[256] = { 0 }; while (p<end) stats[*p++]++;
So to be a fair comparison you should have the first read
int stats[256] = { 0 }; while (p<end) stats[*p++]++;
Your compiler might deduce much more things if he has the variables in a known state.
Now then, there could be runtime advantage of the static case, since the initialization is done at compile time (or program startup).
To test if this makes up for your difference you should run the same function with the static declaration and the loop several times, to see if the difference vanishes if your number of invocations grows.
But as other said already, best is to inspect the assembler that your compiler produces to see what effective difference there are in the code that is produced.

Why variables start out with random values in C

I think this is wrong, it should start as NULL and not with a random value. In the case that you have a pointer with a random memory address as its default value it could be a very dangerous thing, no?
The variables start out uninitialized because that's the fastest way - why waste the CPU cycles on initialization if you're going to write another value there anyway?
If you want a variable to be initialized after creation, just initialize it. :)
About it being a dangerous thing: Every good compiler will warn you if you try to use a variable without initialization.
No. C is a very efficient language, one that has traditionally been faster that a lot of other languages. One of the reasons for this is that it doesn't do too much on it's own. The programmer controls this.
In the case of initialization, C variables are not initialized to a random value. Rather, they are not initialized and so they contain whatever was at the memory location before.
If you wanted to initialize a variable to, say, 1 in your program, then it would be inefficient if the variable had already been initialized to zero or null. That would mean it was initialized twice.
Execution speed and overhead (or lack thereof) are the main reasons why. C is notorious for letting you walk off the proverbial cliff because it always assumes that the user knows better than it does.
Note that if you declared the variable as static it actually is guaranteed to be initialized to 0.
Variables start out with a random value because you are just handed a block of memory and told to deal with it yourself. It has whatever value that block of memory had before hand. Why should the program waste time setting the value to some arbitrary default when you are likely going to set it yourself later?
The design choice is performance, and it is one of the many reasons why C isn't the preferred language for most projects.
This has nothing to do with "if C were being designed today" or with efficiency of one initialization. Instead think of something like
void foo()
{
struct bar *ptrs[10000];
/* do something where only a few indices end up actually getting used */
}
Any language that forces useless initialization on you is doomed to be slow as hell for algorithms that can make use of sparse arrays where you don't care about the majority of the values, and have an easy way of knowing which values you care about.
If you don't like my example with such a large object on the stack, substitute malloc instead. It has the same semantics with regard to initialization.
In either case, if you want zero-initialization, you can get it with {0} or calloc.
It was a design choice made many ears ago, probably for efficiency reasons.
Statically allocated variables (globals and statics) are initialized to 0 if there's no explicit initialization - this could be justified even taking efficiency into account becuase it only occurs once. I'd guess the thinking was that for automatic variables (locals) that are allocated each time a scope is entered, implicit initialization was considered something that might cost too much and therefore should be left to the programmer's responsibility.
If C were being designed today, I wouldn't be surprised if that design decision were changed - especially since compilers are intelligent enough today to be able to optimize away an initialization that gets overwritten before any other use (or potential use).
However, there are so many C compiler toolchains that follow the spec of not initializing automatically, it would be foolish for a compiler to perform implicit initialization to a 'useful' value (like 0 or NULL). That would just encourage people targeting that tool chain to write code that didn't work correctly on other tool chains.
However, compilers can initialize local variables, and they often do. It's just that they initialize the locals to a values that's not generally useful (especially, that doesn't set a pointer to the null pointer). That kind of initialization isn't useful in writing your programming logic against, and it's not intended for that. It's intended to cause deterministic and reproducible errors so that if you erroneously use values that have been set by implicit initialization, you'll be able to find it easily in test/debug.
Usually this compiler behavior is turned on only for debug builds; I could see an argument being made for turning it on in release builds as well - particular if the release build can still optimize it away when the compiler can prove that the implicit initialized value is never used.

Code optimization

If I have a big structure(having lot of member variables). This structure pointer is passed to many functions in my code. Some member variables of this structure are used very often, in almost all functions.
If I put those frequently used member variables at the beginning in the structure declaration, will it optmize the code for MCPS - Million cycles per second(time consumed by the code). If i put frequently accessed members at time, will they be accessed efficiently/lesser time than if they are put randomly in the structure of at bottom of structure declaration? If yes what is the logic?
If I have a structure member being accessed in some function as follows:
structurepointer1->member_variable
Will it help in optimizing it in MCPS aspect if I assign it to a local variable and then access the local variable, as shown below?
local_variable = structurepointer1->member_variable;
If yes, then how does it help?
1) The position of a field in a structure should have no effect on its access time except to the extent that, if your structure is very large and spans multiple pages, it may be a good idea to position members that are often used in quick succession close together in order to increase locality of reference and try to decrease cache misses.
2) Maybe / maybe not. In fact it may make things slower. If the variable is not volatile, your compiler may be smart enough to store the field in a register anyway. Even if not, your processor will cache its value, but this may not help if is uses are somewhat far apart, with lots of other memory access in between. If the value would have either been stored in a register or would have stayed in your processor's cache, then assigning it to a local will only be unnecessary extra work.
Standard Optimizations Disclaimer: Always profile before optimizing. Make sure that what you are trying to optimize is worth optimizing. Always profile your attempted optimizations and make sure they actually made things faster (and not slower).
First, the obligatory disclaimer: for all performance questions, you must profile the code to see where improvements can be made.
In general though, anything you can do to keep your data in the processor cache will help. Putting the most commonly accessed items close together will facilitate this.
I know this is not really answering your question, but before you delve into super-optimizing your code, go through this presentation http://dl.fefe.de/optimizer-isec.pdf. I saw it live and it was a good eye opening experience showing compilers are getting far more advanced in optimization than we tend to think and readable code is more important than small optimizations.
On 2, you most likely are better off not declaring a local variable. The compiler is usually smart enough to figure out when and how variable is used and utilize registers to keep it around.
Also, I would second Mark Ransom's suggestion, profile the code before making assumptions about bottlenecks.
I think your question is related with data alignment and data structure padding. In modern compilers this is handled automatically the most of the times, trying to avoid the alignment faults that could happen on memory. You can read about this here. Of course, you can change the alignment for your data, but I think you would need to specify some compiler options to disable auto-alignment and rearrange the fields on the structure to match the architecture you are aiming to.
I would say this is a very low level optimization.
The location of the field in the structure is irrelevant as that will be calculated by the compiler. A more promising optimization is to make sure that your most-used fields are byte-aligned with the word size of your processor.
If you are using the variable local to a function, this should have no impact. If you are passing it to other functions (separate from the larger structure) than that might help a bit.
As with all of the other answers, you need to run a profile baseline before optimizing, to make sure changes are effective. If you're worried about execution time, profile your algorithms and optimize them before you worry about the code a compiler creates, more bang for the buck.
Also, if you want to know what is going to happen, you should consider compiling your c code into assembly output. This will give you an idea of what the compiler is going to do and how you may go about further "fine tuning".
Structure access is most always indexed indirect access. The assembly code will effectively pull memory knowing the pointer to the structure as the base plus and index to get the right field. This is usually an expensive operation, but for modern CPU's its probably not that slow.
This depends on the locality of the data being accessed. First and foremost accessing the structure the first time will be the most expensive. Accessing the data afterwards, can be quick if the data is already in a processor register, however, this may not be the case depending on the processor used. Storing to a local variable should be less expensive since the memory access instructions for such an operation is less expensive. Again, I think now days processors are fast enough that this optimization is minimal.
I still think that there are probably better places to optimize your code. It is good though that there is someone out there that thinks about this still, in a world of code bloat ;) Embedded computing, you still need to worry about these things.
This depends on the size of your fields and caching details. Look at using valgrind for profiling this.
If you doing this dereferencing a lot it would cost time. A decent optimizing compiler will effectively do the storing the pointer into the local variable optimization as you described. It will do a better job than you will and it will do it in an architecture-specific way.
What you want to do in this situation, overall, is make sure that you test the correctness and the performance of each optimization you are trying. Otherwise you are poking around in the dark.
Remember that fine optimizations at the C line level will virtually never trump higher-order algorithm/design optimizations.
Yes, it can help. But as people have already stated, it depends and can even be counter productive.
The reason why I think it can help, has to do with pointer aliasing. If you access your variables via a pointer, and the compiler can not guarantee that the structure was not changed elsewhere (via your pointer or another) he will generate code to reload or save the variable even if he could have hold the value in a register. Here an example to show what I mean:
calc = structurepointer1->member_variable * x + c;
/* Do something in function which doesn't involve member_variable; */
function(structurepointer1);
calc2 = structurepointer1->member_variable * y;
The compiler will make a memory access for both references to member_variable, because it can not be sure that the called function has modified that field.
If you're sure the function doesn't change that value, doing this would save 1 memory access
int temp = structurepointer1->member_variable;
calc = temp * x + something;
function(structurepointer1);
calc2 = temp * y;
There's also another reason you can use a local variable for your member variables, it can make the code much more readable.

What is the real difference between Pointers and References?

AKA - What's this obsession with pointers?
Having only really used modern, object oriented languages like ActionScript, Java and C#, I don't really understand the importance of pointers and what you use them for. What am I missing out on here?
It's all just indirection: The ability to not deal with data, but say "I'll direct you to some data, over there". You have the same concept in Java and C#, but only in reference format.
The key differences are that references are effectively immutable signposts - they always point to something. This is useful, and easy to understand, but less flexible than the C pointer model. C pointers are signposts that you can happily rewrite. You know that the string you're looking for is next door to the string being pointed at? Well, just slightly alter the signpost.
This couples well with C's "close to the bone, low level knowledge required" approach. We know that a char* foo consists of a set of characters beginning at the location pointed to by the foo signpost. If we also know that the string is at least 10 characters long, we can change the signpost to (foo + 5) to point at then same string, but start half the length in.
This flexibility is useful when you know what you're doing, and death if you don't (where "know" is more than just "know the language", it's "know the exact state of the program"). Get it wrong, and your signpost is directing you off the edge of a cliff. References don't let you fiddle, so you're much more confident that you can follow them without risk (especially when coupled with rules like "A referenced object will never disappear", as in most Garbage collected languages).
You're missing out on a lot! Understanding how the computer works on lower levels is very useful in several situations. C and assembler will do that for you.
Basically a pointer lets you write stuff to any point in the computer's memory. On more primitive hardware/OS or in embedded systems this actually might do something useful. Say turn the blinkenlichts on and off again.
Of course this doesn't work on modern systems. The operating system is the Lord and Master of main memory. If you try to access a wrong memory location, your process will pay for its hubris with its life.
In C, pointers are the way of passing references to data. When you call a function, you don't want to copy a million bits to a stack. Instead you just tell where the data resides in the main memory. In other words, you give a pointer to the data.
To some extent that is what happens even with Java. You pass references to objects, not the objects themselves. Remember, ultimately every object is a set of bits in the computer main memory.
Pointers are for directly manipulating the contents of memory.
It's up to you whether you think this is a good thing to do, but it's the basis of how anything gets done in C or assembler.
High-level languages hide pointers behind the scenes: for example a reference in Java is implemented as a pointer in almost any JVM you'll come across, which is why it's called NullPointerException rather than NullReferenceException. But it doesn't let the programmer directly access the memory address it points to, and it can't be modified to take a value other than the address of an object of the correct type. So it doesn't offer the same power (and responsibility) that pointers in low-level languages do.
[Edit: this is an answer to the question 'what's this obsession with pointers?'. All I've compared is assembler/C-style pointers with Java references. The question title has since changed: had I set out to answer the new question I might have mentioned references in languages other than Java]
This is like asking, “what's this obsession with CPU instructions? Do I miss out on something by not sprinkling x86 MOV instructions all over the place?”
You just need pointers when programming on a low level. In most higher-level programming language implementations, pointers are used just as extensively as in C, but hidden from the user by the compiler.
So... Don't worry. You're using pointers already -- and without the dangers of doing so incorrectly, too. :)
I see pointers as a manual transmission in a car. If you learn to drive with a car that has an automatic transmission, that won't make for a bad driver. And you can still do most everything that the drivers that learned on a manual transmission can do. There will just be a hole in your knowledge of driving. If you had to drive a manual you'd probably be in trouble. Sure, it is easy to understand the basic concept of it, but once you have to do a hill start, you're screwed. But, there is still a place for manual transmissions. For instance, race car drivers need to be able to shift to get the car to respond in the most optimal way to the current racing conditions. Having a manual transmission is very important to their success.
This is very similar to programming right now. There is a need for C/C++ development on some software. Some examples are high-end 3D games, low level embedded software, things where speed is a critical part of the software's purpose, and a lower level language that allows you closer access to the actual data that needs to be processed is key to that performance. However, for most programmers this is not the case and not knowing pointers is not crippling. However, I do believe everybody can benefit from learning about C and pointers, and manual transmissions too.
Since you have been programming in object-oriented languages, let me put it this way.
You get Object A instantiate Object B, and you pass it as a method parameter to Object C. The Object C modifies some values in the Object B. When you are back to Object A's code, you can see the changed value in Object B. Why is this so?
Because you passed in a reference of Object B to Object C, not made another copy of Object B. So Object A and Object C both hold references to the same Object B in memory. Changes from one place and be seen in another. This is called By Reference.
Now, if you use primitive types instead, like int or float, and pass them as method parameters, changes in Object C cannot be seen by Object A, because Object A merely passed a copy instead of a reference of its own copy of the variable. This is called By Value.
You probably already knew that.
Coming back to the C language, Function A passes to Function B some variables. These function parameters are natively copies, By Value. In order for Function B to manipulate the copy belonging to Function A, Function A must pass a pointer to the variable, so that it becomes a pass By Reference.
"Hey, here's the memory address to my integer variable. Put the new value at that address location and I will pick up later."
Note the concept is similar but not 100% analogous. Pointers can do a lot more than just passing "by reference". Pointers allow functions to manipulate arbitrary locations of memory to whatever value required. Pointers are also used to point to new addresses of execution code to dynamically execute arbitrary logic, not just data variables. Pointers may even point to other pointers (double pointer). That is powerful but also pretty easy to introduce hard-to-detect bugs and security vulnerabilities.
If you haven't seen pointers before, you're surely missing out on this mini-gem:
void strcpy(char *dest, char *src)
{
while(*dest++ = *src++);
}
Historically, what made programming possible was the realization that memory locations could hold computer instructions, not just data.
Pointers arose from the realization that memory locations could also hold the address of other memory locations, thus giving us indirection. Without pointers (at a low level) most complicated data structures would be impossible. No linked-lists, binary-trees or hash-tables. No pass by reference, only by value. Since pointers can point to code, without them we would also have no virtual functions or function look up tables.
I use pointers and references heavily in my day to day work...in managed code (C#, Java) and unmanaged (C++, C). I learned about how to deal with pointers and what they are by the master himself...[Binky!!][1] Nothing else needs to be said ;)
The difference between a pointer and reference is this. A pointer is an address to some block of memory. It can be rewritten or in other words, reassigned to some other block of memory. A reference is simply a renaming of some object. It can only be assigned once! Once it is assigned to an object, it cannot be assigned to another. A reference is not an address, it is another name for the variable. Check out C++ FAQ for more on this.
Link1
LInk2
I'm currently waist-deep in designing some high level enterprise software in which chunks of data (stored in an SQL database, in this case) are referenced by 1 or more other entities. If a chunk of data remains when no more entities reference it, we're wasting storage. If a reference points so data that's not present, that's a big problem too.
There's a strong analogy to be made between our issues, and those of memory management in a language that uses pointers. It's tremendously useful to be able to talk to my colleagues in terms of that analogy. Not deleting unreferenced data is a "memory leak". A reference that goes nowhere is a "dangling pointer". We can choose explicit "frees", or we can implement "garbage collection" using "reference counting".
So here, understanding low-level memory management is helping design high-level applications.
In Java you're using pointers all the time. Most variables are pointers to objects - which is why:
StringBuffer x = new StringBuffer("Hello");
StringBuffer y = x;
x.append(" boys");
System.out.println(y);
... prints "Hello boys" and not "Hello".
The only difference in C is that it's common to add and subtract from pointers - and if you get the logic wrong you can end up messing with data you shouldn't be touching.
Strings are fundamental to C (and other related languages). When programming in C, you must manage your memory. You don't just say "okay, I'll need a bunch of strings"; you need to think about the data structure. How much memory do you need? When will you allocate it? When will you free it? Let's say you want 10 strings, each with no more than 80 characters.
Okay, each string is an array of characters (81 characters - you mustn't forget the null or you'll be sorry!) and then each string is itself in an array. The final result will be a multidimensional array something like
char dict[10][81];
Note, incidentally, that dict isn't a "string" or an "array", or a "char". It's a pointer. When you try to print one of those strings, all you're doing is passing the address of a single character; C assumes that if it just starts printing characters it will eventually hit a null. And it assumes that if you are at the start of one string, and you jump forward 81 bytes, you'll be at the start of the next string. And, in fact taking your pointer and adding 81 bytes to it is the only possible way to jump to the next string.
So, why are pointers important? Because you can't do anything without them. You can't even do something simple like print out a bunch of strings; you certainly can't do anything interesting like implement linked lists, or hashes, or queues, or trees, or a file system, or some memory management code, or a kernel or...whatever. You NEED to understand them because C just hands you a block of memory and let's you do the rest, and doing anything with a block of raw memory requires pointers.
Also many people suggest that the ability to understand pointers correlates highly with programming skill. Joel has made this argument, among others. For example
Now, I freely admit that programming with pointers is not needed in 90% of the code written today, and in fact, it's downright dangerous in production code. OK. That's fine. And functional programming is just not used much in practice. Agreed.
But it's still important for some of the most exciting programming jobs. Without pointers, for example, you'd never be able to work on the Linux kernel. You can't understand a line of code in Linux, or, indeed, any operating system, without really understanding pointers.
From here. Excellent article.
To be honest, most seasoned developers will have a laugh (hopefully friendly) if you don't know pointers.
At my previous Job we had two new hires last year (just graduated) that didn't know about pointers, and that alone was the topic of conversation with them for about a week. No one could believe how someone could graduate without knowing pointers...
References in C++ are fundamentally different from references in Java or .NET languages; .NET languages have special types called "byrefs" which behave much like C++ "references".
A C++ reference or .NET byref (I'll use the latter term, to distinguish from .NET references) is a special type which doesn't hold a variable, but rather holds information sufficient to identify a variable (or something that can behave as one, such as an array slot) held elsewhere. Byrefs are generally only used as function parameters/arguments, and are intended to be ephemeral. Code which passes a byref to a function guarantees that the variable which is identified thereby will exist at least until that function returns, and functions generally guarantee not to keep any copy of a byref after they return (note that in C++ the latter restriction is not enforced). Thus, byrefs cannot outlive the variables identified thereby.
In Java and .NET languages, a reference is a type that identifies a heap object; each heap object has an associated class, and code in the heap object's class can access data stored in the object. Heap objects may grant outside code limited or full access to the data stored therein, and/or allow outside code to call certain methods within their class. Using a reference to calling a method of its class will cause that reference to be made available to that method, which may then use it to access data (even private data) within the heap object.
What makes references special in Java and .NET languages is that they maintain, as an absolute invariant, that every non-null reference will continue to identify the same heap object as long as that reference exists. Once no reference to a heap object exists anywhere in the universe, the heap object will simply cease to exist, but there is no way a heap object can cease to exist while any reference to it exists, nor is there any way for a "normal" reference to a heap object to spontaneously become anything other than a reference to that object. Both Java and .NET do have special "weak reference" types, but even they uphold the invariant. If no non-weak references to an object exist anywhere in the universe, then any existing weak references will be invalidated; once that occurs, there won't be any references to the object and it can thus be invalidated.
Pointers, like both C++ references and Java/.NET references, identify objects, but unlike the aforementioned types of references they can outlive the objects they identify. If the object identified by a pointer ceases to exist but the pointer itself does not, any attempt to use the pointer will result in Undefined Behavior. If a pointer isn't known either to be null or to identify an object that presently exists, there's no standard-defined way to do anything with that pointer other than overwrite it with something else. It's perfectly legitimate for a pointer to continue to exist after the object identified thereby has ceased to do so, provided that nothing ever uses the pointer, but it's necessary that something outside the pointer indicate whether or not it's safe to use because there's no way to ask the pointer itself.
The key difference between pointers and references (of either type) is that references can always be asked if they are valid (they'll either be valid or identifiable as null), and if observed to be valid they will remain so as long as they exist. Pointers cannot be asked if they are valid, and the system will do nothing to ensure that pointers don't become invalid, nor allow pointers that become invalid to be recognized as such.
For a long time I didn't understand pointers, but I understood array addressing. So I'd usually put together some storage area for objects in an array, and then use an index to that array as the 'pointer' concept.
SomeObject store[100];
int a_ptr = 20;
SomeObject A = store[a_ptr];
One problem with this approach is that after I modified 'A', I'd have to reassign it to the 'store' array in order for the changes to be permanent:
store[a_ptr] = A;
Behind the scenes, the programming language was doing several copy-operations. Most of the time this didn't affect performance. It mostly made the code error-prone and repetitive.
After I learned to understand pointers, I moved away from implementing the array addressing approach. The analogy is still pretty valid. Just consider that the 'store' array is managed by the programming language's run-time.
SomeObject A;
SomeObject* a_ptr = &A;
// Any changes to a_ptr's contents hereafter will affect
// the one-true-object that it addresses. No need to reassign.
Nowadays, I only use pointers when I can't legitimately copy an object. There are a bunch of reasons why this might be the case:
To avoid an expensive object-copy
operation for the sake of
performance.
Some other factor doesn't permit an
object-copy operation.
You want a function call to have
side-effects on an object (don't
pass the object, pass the pointer
thereto).
In some languages- if you want to
return more than one value from a
function (though generally
avoided).
Pointers are the most pragmatic way of representing indirection in lower-level programming languages.
Pointers are important! They "point" to a memory address, and many internal structures are represented as pointers, IE, An array of strings is actually a list of pointers to pointers! Pointers can also be used for updating variables passed to functions.
You need them if you want to generate "objects" at runtime without pre allocate memory on the stack
Parameter efficency - passing a pointer (Int - 4 bytes) as opposed to copying a whole (arbitarily large) object.
Java classes are passed via reference (basically a pointer) also btw, its just that in java that's hidden from the programmer.
Programming in languages like C and C++ you are much closer to the "metal". Pointers hold a memory location where your variables, data, functions etc. live. You can pass a pointer around instead of passing by value (copying your variables and data).
There are two things that are difficult with pointers:
Pointers on pointers, addressing, etc. can get very cryptic. It leads to errors, and it is hard to read.
Memory that pointers point to is often allocated from the heap, which means you are responsible for releasing that memory. Bigger your application gets, harder it is to keep up with this requirement, and you end up with memory leaks that are hard to track down.
You could compare pointer behavior to how Java objects are passed around, with the exception that in Java you do not have to worry about freeing the memory as this is handled by garbage collection. This way you get the good things about pointers but do not have to deal with the negatives. You can still get memory leaks in Java of course if you do not de-reference your objects but that is a different matter.
Also just something to note, you can use pointers in C# (as opposed to normal references) by marking a block of code as unsafe. Then you can run around changing memory addresses directly and do pointer arithmetic and all that fun stuff. It's great for very fast image manipulation (the only place I personally have used it).
As far as I know Java and ActionScript don't support unsafe code and pointers.
I am always distressed by the focus on such things as pointers or references in high-level languages. It's really useful to think at a higher level of abstraction in terms of the behavior of objects (or even just functions) as opposed to thinking in terms of "let me see, if I send the address of this thing to there, then that thing will return me a pointer to something else"
Consider even a simple swap function. If you have
void swap(int & a, int & b)
or
procedure Swap(var a, b : integer)
then interpret these to mean that the values can be changed. The fact that this is being implemented by passing the addresses of the variables is just a distraction from the purpose.
Same with objects --- don't think of object identifiers as pointers or references to "stuff". Instead, just think of them as, well, OBJECTS, to which you can send messages. Even in primitive languages like C++, you can go a lot further a lot faster by thinking (and writing) at as high a level as possible.
Write more than 2 lines of c or c++ and you'll find out.
They are "pointers" to the memory location of a variable. It is like passing a variable by reference kinda.

Resources