Suppose that the function
void foo(int n, double x[])
sorts the n-vector x, does some operations on x, and then restores the original ordering to x before returning. So internally, foo needs some temporary storage, e.g., at least an n-vector of integers so that it can store the original ordering.
What's the best way to handle this temporary storage? I can think of two obvious approaches:
1. foo declares its own workspace by declaring an internal array, i.e., at the top of foo we have
int temp[n];
2. In the main calling routine, dynamically allocate the n-vector of ints once and pass in the storage at each call to a version of foo that accepts the temporary storage as a 3rd arg, i.e.,
int *temp = malloc(n * sizeof *temp);
foo(n, x, temp);
I'm worried that option 1 is inefficient (the function foo will get called many times with the same n), and option 2 is just plain ugly, since I have to carry around this temporary storage so that it's always available wherever I happen to need a call to foo(n,x).
Are there other more elegant options?
If you end up using option 2 – that is, the function uses memory that is allocated elsewhere – use proper encapsulation.
In a nutshell, don't pass in a raw array; pass in a context object which has matching init and release functions.
The user must still pass in the context and properly set it up and tear it down, but the details of the allocation are hidden from her and she doesn't need to care about them. This is a common pattern in C.
typedef struct {
double* storage;
} foo_context;
void foo_context_init(foo_context*, int n);
void foo_context_free(foo_context*);
void foo(foo_context* context, int n, double x[]);
Now, for a very simple case this is clearly a tremendous overhead and I agree with Oli that option 1 is called for.
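To make the context pattern concrete, here is a minimal sketch of what the three functions might look like. Several details are assumptions that are not in the answer above: the capacity field, the int return value of foo_context_init (used to report allocation failure), and a foo that returns the median as a stand-in for "some operations". It also restores the ordering from a backup copy of x rather than from an index permutation, which is simpler but not necessarily what the asker had in mind.

```c
#include <stdlib.h>
#include <string.h>

/* Scratch space owned by the context; callers never touch it directly. */
typedef struct {
    double *storage;  /* backup copy of the caller's vector */
    int capacity;     /* number of doubles storage can hold (assumed field) */
} foo_context;

/* Returns 0 on success, -1 if the allocation fails. */
int foo_context_init(foo_context *ctx, int n)
{
    ctx->storage = malloc((size_t)n * sizeof *ctx->storage);
    ctx->capacity = ctx->storage ? n : 0;
    return ctx->storage ? 0 : -1;
}

void foo_context_free(foo_context *ctx)
{
    free(ctx->storage);
    ctx->storage = NULL;
    ctx->capacity = 0;
}

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Precondition: 0 < n <= ctx->capacity.
   Sorts x, reads the median, then restores the original ordering. */
double foo(foo_context *ctx, int n, double x[])
{
    double median;

    memcpy(ctx->storage, x, (size_t)n * sizeof *x);  /* save original order */
    qsort(x, (size_t)n, sizeof *x, cmp_double);      /* sort in place */
    median = x[n / 2];                               /* work on sorted data */
    memcpy(x, ctx->storage, (size_t)n * sizeof *x);  /* restore original order */
    return median;
}
```

The caller initializes one context, calls foo as often as needed with the same n, and frees the context once at the end, so the allocation cost is paid a single time.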
Option 1 is clearly the cleanest (because it's completely encapsulated). So go with Option 1 until profiling has determined that this is a bottleneck.
Update
@R's comment below is correct; this could blow your stack if n is large. The pre-C99 "encapsulated" method would be to malloc the local array, rather than putting it on the stack.
On most architectures option 1 is very efficient since it allocates memory on the stack and is typically an add to the stack and/or frame pointer. Just be careful not to make n too large.
As Oli said in his answer the best is to have the function being autonomous about this temporary array. A single allocation is not going to cost a lot unless that function is called in a very fast loop... so get it right first, then profile and then decide if it's worth doing an optimization.
That said, in a few cases, after profiling and when the temp data structure needed was a bit more complex than a single int array, I adopted the following approach:
void foo(int n, ... other parameters ...)
{
static int *temp_array, temp_array_size;
if (n > temp_array_size)
{
/* The temp array we have is not big enough, increase it */
temp_array = realloc(temp_array, n*sizeof(int));
if (!temp_array) abort(); /* out of memory; note that abort() takes no arguments */
temp_array_size = n;
}
... use temp_array ...
}
Note that using a static array rules out, for example, multithreading or recursion, and this should be clearly stated in the documentation.
Let's say I have the following structure:
typedef struct s_tuple{
double x;
double y;
double z;
double w;
} t_tuple;
Let's say I have the two following functions:
t_tuple tuple_sub_values(t_tuple a, t_tuple b)
{
a.x -= b.x;
a.y -= b.y;
a.z -= b.z;
a.w -= b.w;
return (a);
}
t_tuple tuple_sub_pointers(t_tuple *a, t_tuple *b)
{
t_tuple c;
c.x = a->x - b->x;
c.y = a->y - b->y;
c.z = a->z - b->z;
c.w = a->w - b->w;
return (c);
}
Will there be a performance difference between the functions? Is one of these better than the other?
Basically, what are the pros and cons of passing by value vs. passing by pointer when all of the structure elements are accessed?
Edit: Completely changed my structure and functions to give a more precise example
I found this post that is related to my question but is for C++: https://stackoverflow.com/questions/40185665/performance-cost-of-passing-by-value-vs-by-reference-or-by-pointer#:~:text=In%20short%3A%20It%20is%20almost,reference%20parameters%20than%20value%20parameters.
Context: My structures are not huge in this example, but I am coding a ray-tracer and some structs of size around 100B can be passed millions of times, so I'd like to try to optimize these calls. My structs are nested inside one another, so it would be a mess to copy them here; this is why I tried to ask my question using a general example.
Getting to the core of the question: for optimal arg-passing/value-returning performance, you basically want to follow the ABI of your platform to try and make sure that things are in registers and stay in registers. If they aren't in registers and/or cannot stay in registers, then passing larger-than-pointer-size data by pointer will likely save some copying (unless the copying would need to be done in the callee anyway: void pass_copy(struct large x){ use(&x); } could actually be a small bit better for codegen than void pass_copy2(struct large const*x){ struct large cpy=*x; use(&cpy); }).
The concrete rules for e.g., the sysv x86-64 ABI are a bit complicated (see the chapter on calling conventions).
But a short version might be: args/return-vals go through registers as long as their type is "simple enough" and appropriate argument passing registers are available (6 for integer vals and 8 for doubles). Structs of up to two eightbytes can go through registers (as arguments or a return value) provided they're "simple enough".
Supposing your doubles are already loaded in registers (or aren't aggregated into t_tuples that you could point the callee to), the most efficient way to pass them on x86-64 SysV ABI would be individually or via structs of two doubles each, but you'd still need to return them via memory because the ABI can only accommodate two-double retvals with registers, not 4-double retvals. If you returned a four-double struct, the compiler would stack-alloc memory in the caller, pass a pointer to it as a hidden first argument, and then return a pointer to the allocated memory (under the covers). A more flexible approach would be to not return such a large aggregate but instead explicitly pass a pointer to a struct-to-be-filled. That way the struct can be anywhere you want it (rather than auto-alloced on the stack by the compiler).
So something like
void tuple_sub_values(t_tuple *retval,
t_twodoubles a0, t_twodoubles a1,
t_twodoubles b0, t_twodoubles b1);
would be a better API for avoiding memory spillage on x86-64 SysV ABI (Linux, macOS, BSDs...).
If your measurements show the codesize savings / performance boost to be worth it for you, you could wrap it in an inline function that'd do the struct-splitting.
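A sketch of that wrapper might look as follows. The t_twodoubles type is not defined in the answer, so its layout here is an assumption, as is the wrapper name tuple_sub; the pairing of (x, y) and (z, w) into the two halves is likewise just one possible choice.

```c
typedef struct s_tuple { double x, y, z, w; } t_tuple;

/* Assumed helper type: 16 bytes, i.e. two eightbytes, so it is eligible
   for register passing under the x86-64 SysV rules described above. */
typedef struct { double a, b; } t_twodoubles;

/* The register-friendly entry point: four two-double structs in,
   result written through an explicit pointer. */
void tuple_sub_values(t_tuple *retval,
                      t_twodoubles a0, t_twodoubles a1,
                      t_twodoubles b0, t_twodoubles b1)
{
    retval->x = a0.a - b0.a;
    retval->y = a0.b - b0.b;
    retval->z = a1.a - b1.a;
    retval->w = a1.b - b1.b;
}

/* Hypothetical inline wrapper that does the struct-splitting, so call
   sites can keep working with whole t_tuple objects. */
static inline void tuple_sub(t_tuple *retval, const t_tuple *a, const t_tuple *b)
{
    t_twodoubles a0 = { a->x, a->y }, a1 = { a->z, a->w };
    t_twodoubles b0 = { b->x, b->y }, b1 = { b->z, b->w };

    tuple_sub_values(retval, a0, a1, b0, b1);
}
```

Whether this beats the plain pointer version depends on the compiler and on whether the operands are already in registers at the call site, so it is worth measuring before adopting it.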
When it comes to performance, the answer will most likely be implementation-specific, for reasons beyond the scope of this post, but we're most likely talking about microseconds in the worst case. Now when it comes to the pros and cons:
Passing by value will only give you a copy of that struct, and modifications will be local only. In other words, your function will receive an entirely new copy of the struct, and it will only be able to modify that copy.
In contrast, passing by reference gives you the ability to modify the given struct directly from the function, and is often seen when multiple values need to be returned from a function.
It's entirely up to you to choose which one works for your case. But to add some extra help:
Passing by reference will reduce the function call overhead because you won't have to copy 32 bytes for the new function. It also helps significantly if you want to keep the memory footprint low and plan to call the function many times. Why? Because instead of creating a separate struct for each call, you simply tell every call to reuse the same one. This is common in games, where structs may be thousands of bytes large.
I have this struct:
typedef struct data{
char name[100], pseudo[100];
int num0, num1, num2, num3, num4;
long int lnum0, lnum1, lnum2, lnum3, lnum4;
double dnum0, dnum1;
}data;
data list[50];
I create an array of this struct and sort it with a quicksort algorithm.
To do that I must swap elements using this function:
void swap(data list[], int i, int j){
    data tmp;
    tmp.num1 = list[i].num1;
    list[i].num1 = list[j].num1;
    list[j].num1 = tmp.num1;
    // using memmove to avoid the overlap problems of strcpy;
    // the + 1 copies the terminating '\0' as well
    memmove(tmp.name, list[i].name, strlen(list[i].name) + 1);
    memmove(list[i].name, list[j].name, strlen(list[j].name) + 1);
    memmove(list[j].name, tmp.name, strlen(tmp.name) + 1);
}
I have 16 elements in my struct, and I have to repeat this 16 times to swap them all.
My question is: Is there a simpler, faster or nicer way to proceed, or can we optimize this function?
This is a typical workaround to sort an array with N elements of type T where both N and sizeof(T) are assumed to be large.
Create a temporary array of N pointers to T.
Fill the temporary array with pointers to the elements in your actual array.
Sort the temporary array. (When comparing elements, you have to dereference the pointers. When exchanging elements, you only have to swap single pointers.)
Rearrange the elements in your original array such that they are in the same order as pointed to by the pointers in your temporary array.
Free the temporary array again.
This technique has the advantage that you only have to perform O(N) swaps of T while you might do O(N log(N)) swaps of T*. The downside is that you'll have to allocate the temporary buffer and go through an additional pointer indirection when comparing elements. You'll have to benchmark in order to see whether or not this pays off for your type.
A possible optimization is to allocate the temporary array on the stack as it never outlives the sorting routine. Putting huge arrays on the stack might cause a stack overflow, though, so be careful about the size.
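The steps above can be sketched for the data struct from the question as follows. This is only one possible arrangement: the function and comparator names are made up for illustration, sorting by name is an assumed choice of key, the temporary arrays live on the heap, and the final rearrangement goes through a scratch copy of the array for simplicity (an in-place permutation would avoid that extra buffer).

```c
#include <stdlib.h>
#include <string.h>

typedef struct data {
    char name[100], pseudo[100];
    int num0, num1, num2, num3, num4;
    long int lnum0, lnum1, lnum2, lnum3, lnum4;
    double dnum0, dnum1;
} data;

/* qsort hands us pointers to array elements; the elements are themselves
   pointers to data, hence the extra level of indirection. */
static int compare_ptr_by_name(const void *x, const void *y)
{
    const data *a = *(const data *const *)x;
    const data *b = *(const data *const *)y;

    return strcmp(a->name, b->name);
}

/* Returns 0 on success, -1 if a temporary allocation fails
   (in which case list is left untouched). */
int sort_by_name_via_pointers(data list[], size_t n)
{
    data **ptrs;
    data *sorted;
    size_t i;

    if (n == 0)
        return 0;
    ptrs = malloc(n * sizeof *ptrs);
    sorted = malloc(n * sizeof *sorted);
    if (ptrs == NULL || sorted == NULL) {
        free(ptrs);
        free(sorted);
        return -1;
    }
    for (i = 0; i < n; i++)          /* fill the pointer array */
        ptrs[i] = &list[i];
    qsort(ptrs, n, sizeof *ptrs, compare_ptr_by_name);  /* swaps pointers only */
    for (i = 0; i < n; i++)          /* one whole-struct copy per element */
        sorted[i] = *ptrs[i];
    memcpy(list, sorted, n * sizeof *list);
    free(sorted);
    free(ptrs);
    return 0;
}
```

Note that the struct copies use plain assignment (sorted[i] = *ptrs[i]), which copies every member at once, so no field-by-field swap function is needed.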
I'm not sure this is what you're looking for, but a simple way to speed up these swaps would be to store pointers to "struct data" in "list", rather than storing the entire structs themselves. That way when you swap, you only swap 4 or 8 bytes at a time (for 32-bit and 64-bit pointers respectively), instead of a whopping 256 bytes or more.
If you're set on storing and swapping all the memory for those structs contiguously, your best bet is to use vector intrinsics (SIMD). Here's a guide for gcc. Here's one for msvc.
If it weren't for the fact that you're asking about optimisation, I'd assume this is a homework task. Homework tasks of the sorting variety don't usually involve optimisation, though. Nonetheless, your institution would've taught you in the real world never to reinvent the wheel unless the benefits outweigh the costs. In this case, they don't.
Imagine if your fastest code for x86 were also the slowest for ARM. This is such a common scenario that the standard library includes two functions within <stdlib.h>: qsort and bsearch. The odds are that the authors of the standard library have a better idea of how to write, test and tune a sorting algorithm for each platform.
Imagine if every process running at every given time reinvented the wheel, leading to lots of duplicate sorting functions being kept and swapped around in memory... One major benefit to using standard library code is that it can be shared among many processes, leading to less duplicate resource consumption. Less resource consumption also happens to most likely lead to faster code, and in this case sharing this resource between multiple processes is also likely to reduce cache thrashing.
To use qsort and bsearch you first need to define a comparison function. This can be as simple as wrapping strcmp, if you can guarantee that the field to sort based on is in fact a string (i.e. the character sequence ends with a '\0'). The comparison function needs to use the signature int compare(void const *x, void const *y);, for example:
int compare_data_by_name(void const *x, void const *y) {
data const *foo = x, *bar = y;
return strcmp(foo->name, bar->name);
}
Calling qsort(list, sizeof list / sizeof *list, sizeof *list, compare_data_by_name); will sort list.
Calling bsearch(&(data){ .name = "fred" }, list, sizeof list / sizeof *list, sizeof *list, compare_data_by_name); on the sorted list will return a pointer to an item with "fred" as the name, or NULL if there is none.
Working on my C muscle lately and looking through the many libraries I've been working with has certainly given me a good idea of what is good practice. One thing that I have NOT seen is a function that returns a struct:
something_t make_something() { ... }
From what I've absorbed this is the "right" way of doing this:
something_t *make_something() { ... }
void destroy_something(something_t *object) { ... }
The architecture in code snippet 2 is FAR more popular than snippet 1. So now I ask, why would I ever return a struct directly, as in snippet 1? What differences should I take into account when I'm choosing between the two options?
Furthermore, how does this option compare?
void make_something(something_t *object)
When something_t is small (read: copying it is about as cheap as copying a pointer) and you want it to be stack-allocated by default:
something_t make_something(void);
something_t stack_thing = make_something();
something_t *heap_thing = malloc(sizeof *heap_thing);
*heap_thing = make_something();
When something_t is large or you want it to be heap-allocated:
something_t *make_something(void);
something_t *heap_thing = make_something();
Regardless of the size of something_t, and if you don’t care where it’s allocated:
void make_something(something_t *);
something_t stack_thing;
make_something(&stack_thing);
something_t *heap_thing = malloc(sizeof *heap_thing);
make_something(heap_thing);
This is almost always about ABI stability. Binary stability between versions of the library. In the cases where it is not, it is sometimes about having dynamically sized structs. Rarely it is about extremely large structs or performance.
It is exceedingly rare that allocating a struct on the heap and returning it is nearly as fast as returning it by-value. The struct would have to be huge.
Really, speed is not the reason behind technique 2, return-by-pointer, instead of return-by-value.
Technique 2 exists for ABI stability. If you have a struct and your next version of the library adds another 20 fields to it, consumers of your previous version of the library are binary compatible if they are handed pre-constructed pointers. The extra data beyond the end of the struct they know about is something they don't have to know about.
If you return it on the stack, the caller is allocating the memory for it, and they must agree with you on how big it is. If your library updated since they last rebuilt, you are going to trash the stack.
Technique 2 also permits you to hide extra data both before and after the pointer you return (appending data to the end of the struct in later versions is one variant of this). You could end the structure with a variable sized array, or prepend the pointer with some extra data, or both.
If you want stack-allocated structs in a stable ABI, almost all functions that talk to the struct need to be passed version information.
So
something_t make_something(unsigned library_version) { ... }
where library_version is used by the library to determine what version of something_t it is expected to return and it changes how much of the stack it manipulates. This isn't possible using standard C, but
void make_something(something_t* here) { ... }
is. In this case, something_t might have a version field as its first element (or a size field), and you would require that it be populated prior to calling make_something.
Other library code taking a something_t would then query the version field to determine what version of something_t they are working with.
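As a rough sketch of the size-field variant: the caller stores sizeof its own struct in the first member, and the library never writes more bytes than that. Everything concrete here is hypothetical, including the two struct layouts, the field names, the values written, and the int return code. The function takes void * and reads the size with memcpy so that an "old" caller's smaller struct can be passed without type punning; a real library would expose a typed pointer in its header instead.

```c
#include <stddef.h>
#include <string.h>

/* What callers built against version 1 of the library know about. */
typedef struct {
    size_t size;   /* caller sets this to sizeof its own struct */
    int    id;
} something_v1_t;

/* Version 2 appends a field; the v1 members keep their offsets. */
typedef struct {
    size_t size;
    int    id;
    double scale;  /* new in v2 */
} something_t;

/* Fills in at most as many bytes as the caller's struct has.
   Returns 0 on success, -1 if the size matches no known version. */
int make_something(void *here)
{
    something_t full;
    size_t caller_size;

    /* The size field is always the first member, whatever the version. */
    memcpy(&caller_size, here, sizeof caller_size);
    if (caller_size < sizeof(something_v1_t))
        return -1;
    if (caller_size > sizeof full)
        caller_size = sizeof full;   /* caller is newer than this library */

    memset(&full, 0, sizeof full);
    full.size = caller_size;
    full.id = 42;
    full.scale = 1.5;
    memcpy(here, &full, caller_size);  /* never writes past the caller's struct */
    return 0;
}
```

A caller compiled against version 1 passes a something_v1_t with size set to sizeof(something_v1_t) and simply never sees scale, while a version 2 caller gets all three fields.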
As a rule of thumb, you should never pass struct objects by value. In practice, it will be fine to do so as long as they are smaller or equal to the maximum size that your CPU can handle in a single instruction. But stylistically, one typically avoids it even then. If you never pass structs by value you can later on add members to the struct and it won't affect performance.
I think that void make_something(something_t *object) is the most common way to use structures in C. You leave the allocation to the caller. It is efficient but not pretty.
However, object-oriented C programs use something_t *make_something() since they are built with the concept of opaque type, which forces you to use pointers. Whether the returned pointer points at dynamic memory or something else depends on the implementation. OO with opaque type is often one of the most elegant and best ways to design more complex C programs, but sadly, few C programmers know/care about it.
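For readers who have not met the opaque-type idiom, a minimal sketch follows. In a real project the two halves would live in separate files; they are shown together here so the example compiles as one unit. The int value member, the value parameter of make_something, and the something_value accessor are assumptions added for illustration.

```c
#include <stdlib.h>

/* ---- something.h (public): callers only ever see an incomplete type ---- */
typedef struct something something_t;

something_t *make_something(int value);
int something_value(const something_t *object);
void destroy_something(something_t *object);

/* ---- something.c (private): the layout can change freely, because no
   caller can take sizeof(something_t) or access members directly ---- */
struct something {
    int value;
};

/* Returns NULL if the allocation fails. */
something_t *make_something(int value)
{
    something_t *object = malloc(sizeof *object);

    if (object != NULL)
        object->value = value;
    return object;
}

int something_value(const something_t *object)
{
    return object->value;
}

void destroy_something(something_t *object)
{
    free(object);
}
```

Because the struct definition is hidden, code outside something.c cannot declare a something_t variable at all, which is exactly what forces the pointer-returning interface.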
Some pros of the first approach:
Less code to write.
More idiomatic for the use case of returning multiple values.
Works on systems that don't have dynamic allocation.
Probably faster for small or smallish objects.
No memory leak due to forgetting to free.
Some cons:
If the object is large (say, a megabyte), it may cause a stack overflow, or may be slow if compilers don't optimize it well.
May surprise people who learned C in the 1970s when this was not possible, and haven't kept up to date.
Does not work with objects that contain a pointer to a part of themselves.
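The last con is easy to overlook, so here is a small illustration. The line_buf type and its functions are invented for this example. Copying the struct (which is what returning it by value does) copies the pointer bit for bit, so the copy's pointer still refers to the original object's buffer.

```c
#include <string.h>

typedef struct {
    char  text[16];
    char *cursor;   /* meant to point into this object's own text[] */
} line_buf;

/* Initializes in place, so cursor refers to the caller's object. */
void line_buf_init(line_buf *lb, const char *s)
{
    strncpy(lb->text, s, sizeof lb->text - 1);
    lb->text[sizeof lb->text - 1] = '\0';
    lb->cursor = lb->text;
}

/* Returns 1 if the internal pointer refers to lb's own buffer, else 0. */
int line_buf_is_consistent(const line_buf *lb)
{
    return lb->cursor == lb->text;
}
```

After line_buf a, b; line_buf_init(&a, "hello"); b = a; the check succeeds for a but fails for b, because b.cursor still points into a.text. A function that built such an object locally and returned it by value would hand back a pointer into a dead stack frame, which is why this kind of type is better initialized through a pointer.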
I'm somewhat surprised.
The difference is that example 1 creates a structure on the stack, example 2 creates it on the heap. In C, or C++ code which is effectively C, it's idiomatic and convenient to create most objects on the heap. In C++ it is not; mostly they go on the stack. The reason is that if you create an object on the stack, the destructor is called automatically; if you create it on the heap, it must be called explicitly. So it's a lot easier to ensure there are no memory leaks and to handle exceptions if everything goes on the stack. In C, the destructor must be called explicitly anyway, and there's no concept of a special destructor function (you have destructors, of course, but they are just normal functions with names like destroy_myobject()).
Now the exception in C++ is for low-level container objects, e.g. vectors, trees, hash maps and so on. These do retain heap members, and they have destructors. Now most memory-heavy objects consist of a few immediate data members giving sizes, ids, tags and so on, and then the rest of the information in STL structures, maybe a vector of pixel data or a map of English word / value pairs. So most of the data is in fact on the heap, even in C++.
And modern C++ is designed so that this pattern
#include <string>
#include <vector>
class big
{
public:
    std::vector<double> observations; // thousands of observations
    int station_x; // a bit of data associated with them
    int station_y;
    std::string station_name;
};
big retrieveobservations(int a, int b, int c)
{
    big answer;
    // lots of code to fill in the structure here
    return answer;
}
void high_level()
{
    big myobservations = retrieveobservations(1, 2, 3);
}
Will compile to pretty efficient code. The large observation member won't generate unnecessary makework copies.
Unlike some other languages (like Python), C does not have the concept of a tuple. For example, the following is legal in Python:
def foo():
    return 1, 2

x, y = foo()
print(x, y)
The function foo returns two values as a tuple, which are assigned to x and y.
Since C doesn't have the concept of a tuple, it's inconvenient to return multiple values from a function. One way around this is to define a structure to hold the values, and then return the structure, like this:
#include <stdio.h>
typedef struct { int x, y; } stPoint;
stPoint foo( void )
{
    stPoint point = { 1, 2 };
    return point;
}
int main( void )
{
    stPoint point = foo();
    printf( "%d %d\n", point.x, point.y );
}
This is but one example where you might see a function return a structure.
If I need to write a function that returns an array: int*, which way is better?
int* f(..data..)
or: void f(..data..,int** arr)
and we call f like this: int* x; f(&x);. (maybe they are both the same but I am not sure. but if I need to return an ErrorCode(it's an enum) too, then in the first way f will get ErrorCode* and in the second way, f will return an ErrorCode).
Returning an array is just returning a variable amount of data.
That's a really old problem, and C programmers developed many answers for it:
- Caller passes in buffer:
    - The necessary size is documented and not passed; too-short buffers are Undefined Behavior: strcpy()
    - The necessary size is documented and passed; errors are signaled by the return value: strcpy_s()
    - The buffer size is passed by pointer, and the called function reallocates with the documented allocator as needed: POSIX getline()
    - The necessary size is unknown, but can be queried by calling the function with buffer-length 0: snprintf()
    - The necessary size is unknown and cannot be queried; as much as fits in a buffer of the passed size is returned. If necessary, additional calls must be made to get the rest: fread()
    - ⚠ The necessary size is unknown, cannot be queried, and passing too small a buffer is Undefined Behavior. This is a design defect, therefore the function is deprecated / removed in newer versions, and just mentioned here for completeness: gets().
- Caller passes a callback:
    - The callback-function gets a context-parameter: qsort_s()
    - The callback-function gets no context-parameter. Getting the context requires magic: qsort()
- Caller passes an allocator: Not found in the C standard library. All allocator-aware C++ containers support that though.
- Callee contract specifies the deallocator. Calling the wrong one is Undefined Behavior: fopen()->fclose() strdup()->free()
- Callee returns an object which contains the deallocator: COM-Objects
- Callee uses an internal shared buffer: asctime()
Be aware that either the returned array must contain a sentinel object or other marker, you have to return the length separately, or you have to return a struct containing a pointer to the data and the length.
Pass-by-reference (pointer to size or such) helps there.
In general, whenever the user has to guess the size or look it up in the manual, he will sometimes get it wrong. If he does not get it wrong, a later revision might invalidate his careful work, so it doesn't matter he was once right. Anyway, this way lies madness (UB).
For the rest, choose the most comfortable and efficient one you can.
Regarding an error code: Remember there's errno.
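As an illustration of the "query the size with buffer-length 0" variant from the list above, here is a small sketch built on snprintf. The function name format_pair and the "label=value" format are made up for the example; the snprintf behaviour it relies on (returning the length that would have been written) is standard since C99.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns a malloc'd string of the form "<label>=<value>", or NULL on
   failure. The caller releases it with free(). */
char *format_pair(const char *label, int value)
{
    /* First call: no buffer, length 0, just ask how many chars are needed
       (the count excludes the terminating '\0'). */
    int needed = snprintf(NULL, 0, "%s=%d", label, value);
    char *out;

    if (needed < 0)
        return NULL;
    out = malloc((size_t)needed + 1);
    if (out != NULL)
        snprintf(out, (size_t)needed + 1, "%s=%d", label, value);  /* second call fills it */
    return out;
}
```

This combines two entries from the list: the size is queried rather than guessed, and the contract names free() as the deallocator.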
Usually it's more convenient and semantic to return the array
int* f(..data..)
If you ever need complex error handling (e.g., returning error values), you should return the error as an int and hand the array back through an output parameter.
There is no "better" here: you decide which approach fits the needs of the callers better.
Note that both functions are bound to give a user an array that they allocate internally, so deallocating the resultant array becomes a responsibility of the caller. In other words, somewhere inside f() you would have a malloc, and the user who receives the data must call free() on it.
You have another option here - let the caller pass the array into you, and return back a number that says how many items you put back into it:
size_t f(int *buffer, size_t max_length)
This approach lets the caller pass you a buffer in a static or in the automatic memory, thus improving flexibility.
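A minimal sketch of such a function is shown below. The extra wanted parameter and the values written (0, 1, 2, ...) are placeholders invented for the example, since the answer does not say what f computes.

```c
#include <stddef.h>

/* Writes up to `wanted` items into buffer, but never more than max_length.
   Returns the number of items actually written. */
size_t f(size_t wanted, int *buffer, size_t max_length)
{
    size_t count = wanted < max_length ? wanted : max_length;
    size_t i;

    for (i = 0; i < count; i++)
        buffer[i] = (int)i;   /* placeholder data */
    return count;
}
```

The caller can pass an automatic array such as int small[3], a static array, or heap memory; f neither knows nor cares, and the return value tells the caller how many entries are valid.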
The classic model is (assuming you need to return an error code too)
int f(...., int **arr)
even though it doesn't flow as nicely as a function returning the array.
Note this is why the lovely Go language supports multiple return values.
It's also one of the reasons for exceptions: they get the error indicators out of the function I/O space.
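Spelled out, the classic model might look like the sketch below. The ErrorCode enumerators, the n parameter, and the array contents (squares) are assumptions made up for the example; only the shape of the signature comes from the answer.

```c
#include <stdlib.h>

typedef enum { ERR_OK = 0, ERR_NO_MEMORY, ERR_BAD_ARGUMENT } ErrorCode;

/* On success, stores a malloc'd array of n ints in *arr and returns ERR_OK;
   the caller must free() it. On failure, *arr is set to NULL. */
ErrorCode f(size_t n, int **arr)
{
    size_t i;

    if (arr == NULL)
        return ERR_BAD_ARGUMENT;
    *arr = NULL;
    if (n == 0)
        return ERR_BAD_ARGUMENT;
    *arr = malloc(n * sizeof **arr);
    if (*arr == NULL)
        return ERR_NO_MEMORY;
    for (i = 0; i < n; i++)
        (*arr)[i] = (int)(i * i);   /* placeholder contents */
    return ERR_OK;
}
```

A call then reads int *x; if (f(4, &x) == ERR_OK) { ... free(x); }, which keeps the error path in the return value and the data in the output parameter.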
The first one is better if there is no requirement to deal with an already existent pointer in the function.
The second one is used when you already have a defined pointer that points to an already allocated container (for example a list) and inside the function the value of the pointer can be changed.
If you must call f like int* x; f(&x);, you do not have much of a choice. You must use the second syntax, i.e., void f(..data..,int** arr). This is because you are not using return value anyways in your code.
The approach depends on a specific task and perhaps on your personal taste or a coding convention adopted in your project.
In general, I'd like to pass pointers as "output" parameters instead of return'ing an array for a number of reasons.
You likely want to return a number of elements in the array together with the array itself. But if you do this:
int f(const void* data, int** out_array);
Then, when you see the signature for the first time, you can't quite tell whether the function returns the number of elements or an error code, so I prefer to do this:
void f(const void* data, int** out_array, int* out_array_nelements);
Or even better:
void f(const void* data, int** out_array, size_t* out_array_nelements);
The function signature must be self-explanatory, and the parameter names help to achieve that.
The output array needs to be stored somewhere. You need to allocate some memory for the array. If you return a pointer to the array without passing the same pointer as argument, then you can't allocate memory on the stack. I mean, you cannot do this:
int *f(const void *data) {
    int array[10];
    return array; /* wrong: the array's lifetime ends when the function returns */
}
Instead, you have to do static int array[10] (which is not thread-safe) or int *array = malloc(...), which leaks memory unless the caller remembers to free it.
So I suggest passing a pointer to an array that is already allocated before the function call, like this:
void f(const void *data, int* out_array, size_t* out_nelements, size_t max_nelements);
The benefit is you are free to choose where to allocate the array:
On the stack:
int array[10] = { 0 };
size_t max_nelements = sizeof(array)/sizeof(array[0]);
size_t nelements = 0;
f(data, array, &nelements, max_nelements);
Or on the heap:
size_t nelements = 0;
size_t max_nelements = 10;
int *array = malloc(max_nelements * sizeof(int));
f(data, array, &nelements, max_nelements);
See, with this approach you are free to choose how to allocate the memory.
I'm working on a C library, and part of it deals with some mathematical types and manipulating them. Each type has a factory constructor/destructor function that allocates and frees them dynamically. For example:
/* Example type, but illustrates situation very well. */
typedef struct {
float x;
float y;
float z;
} Vector3D;
/* Constructor */
Vector3D* Vector3D_new(float x, float y, float z) {
Vector3D* vector = malloc(sizeof *vector);
/* Initialization code here...*/
return vector;
}
/* Destructor */
void Vector3D_destroy(Vector3D* vector) {
free(vector);
}
Nice & simple, and it also relieves the user of the burden of proper initialization.
Now my main concern is how to handle functions that operate upon these types (specifically how to return the result values.) Almost every binary operation will result in a new instance of the same type, and therefore, I need to consider how to give this back to the user. I could just return things by value, but passing around pointers would be preferred, since it is faster, compatible with the construct/destructor methods, and doesn't leave as much burden on the user.
Currently I have it implemented by having functions dynamically allocate the result, and then return a pointer to it:
/* Perform an operation, and dynamically return resultant vector */
Vector3D* addVectors(Vector3D* a, Vector3D* b) {
Vector3D* c = Vector3D_new(
a->x + b->x,
a->y + b->y,
a->z + b->z);
return c;
}
By returning the value directly to the user, it has the advantage of being able to be chained (e.g. to be passed directly into another function as a parameter), such as:
/* Given three Vector3D*s : a, b, & c */
float dot = dotProduct(crossProduct(a, addVectors(b, c)));
But given the current method, this would result in a memory leak, since the result of addVectors() would be passed directly to crossProduct(), and the user wouldn't have a chance to free() it (and the same thing with crossProduct()'s result that is passed into dotProduct()). To make this work, a person would have to make a pointer to hold the value(s), use that, and then free() it via said pointer.
Vector3D* d = addVectors(b, c);
Vector3D* e = crossProduct(a, d);
float dot = dotProduct(e);
Vector3D_destroy(d);
Vector3D_destroy(e);
This works but is much less intuitive, and loses the chaining effect I so desire.
Another possibility is to have the operation functions take 3 arguments; two for the operands, and one to store the result in, but again not very intuitive.
My question is then: What are some elegant & productive ways of working with dynamic memory in binary operations? As a bonus, a solution that has been used in a real world library would be pretty cool. Any ideas? :)
In addition to the memory-leak you mentioned there are a few other problems with your current system:
Allocating to the heap is significantly slower than plain stack operations.
Every allocation will also need to be free()d, meaning every instance will require at least 2 function invocations, whereas just using a stack based design would require none.
Since memory has to be manually managed it leaves much more room for memory leaks.
Memory allocations can fail! A stack based system would alleviate this.
Using pointers would require dereferencing. This is slower than direct access, and requires more (perhaps sloppy) syntax.
In addition to this, the memory at the top of a program's stack is almost always already in the CPU cache, which can provide significant improvements over freshly allocated heap memory (which often is not).
In short, simply relying on the stack for everything would be best, not only for performance, but also for maintenance and clean code. The only thing to remember is that the stack is finite, and it is easy to go overboard. Use the stack for short term data (a binary operation result in this case), and the heap for heavier long term data.
I hope this helps! :)
Note: Much of the info in this answer is thanks to @Justin.
Allocating inside the operator isn't as convenient as it may seem.
This is mostly because you don't have garbage collection, and also because you have to worry about failed allocations.
Consider this code:
Vector3D *v1,*v2,*v3;
Vector3D *v4 = addVectors(v1,multiplyVectors(v2,v3));
Seems nice.
But what happens with the vector returned from multiplyVectors? A memory leak.
And what happens if allocation fails? A crash in some other function.
I'd go for addition in-place:
void addVectors(Vector3D *target, const Vector3D *src);
This is equivalent to target += src;.
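A possible body for that in-place version is sketched here, with the Vector3D typedef repeated so the fragment stands alone. Nothing is allocated, so there is nothing to leak and no failure to handle.

```c
typedef struct {
    float x;
    float y;
    float z;
} Vector3D;

/* target += src; src is left unchanged. */
void addVectors(Vector3D *target, const Vector3D *src)
{
    target->x += src->x;
    target->y += src->y;
    target->z += src->z;
}
```

The price is that the left operand is overwritten, so a caller who still needs the original value has to copy it first.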
I would do as simple as
Vector3D addVectors(Vector3D a, Vector3D b) {
Vector3D c;
c.x = a.x + b.x;
c.y = a.y + b.y;
c.z = a.z + b.z;
return c;
}
If the caller really needs it on the heap, he can copy it by himself.
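With by-value functions the chaining asked for in the question works without any cleanup. The sketch below assumes two-argument crossProduct and dotProduct with the usual mathematical definitions; those signatures are my assumption, since the question only shows calls.

```c
typedef struct {
    float x;
    float y;
    float z;
} Vector3D;

Vector3D addVectors(Vector3D a, Vector3D b)
{
    Vector3D c;

    c.x = a.x + b.x;
    c.y = a.y + b.y;
    c.z = a.z + b.z;
    return c;
}

Vector3D crossProduct(Vector3D a, Vector3D b)
{
    Vector3D c;

    c.x = a.y * b.z - a.z * b.y;
    c.y = a.z * b.x - a.x * b.z;
    c.z = a.x * b.y - a.y * b.x;
    return c;
}

float dotProduct(Vector3D a, Vector3D b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

An expression such as dotProduct(crossProduct(a, addVectors(b, c)), d) then creates only temporaries with automatic storage, so there is nothing to destroy afterwards.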