In Go there are various ways to return a struct value or slice thereof. For individual ones I've seen:
type MyStruct struct {
Val int
}
func myfunc() MyStruct {
return MyStruct{Val: 1}
}
func myfunc() *MyStruct {
return &MyStruct{}
}
func myfunc(s *MyStruct) {
s.Val = 1
}
I understand the differences between these. The first returns a copy of the struct, the second a pointer to the struct value created within the function, the third expects an existing struct to be passed in and overrides the value.
I've seen all of these patterns used in various contexts, and I'm wondering what the best practices are. When would you use which? For instance, the first one could be OK for small structs (because the overhead is minimal), the second for bigger ones, and the third if you want to be extremely memory efficient, because you can easily reuse a single struct instance between calls. Are there any best practices for when to use which?
Similarly, the same question regarding slices:
func myfunc() []MyStruct {
return []MyStruct{ MyStruct{Val: 1} }
}
func myfunc() []*MyStruct {
return []*MyStruct{ &MyStruct{Val: 1} }
}
func myfunc(s *[]MyStruct) {
*s = []MyStruct{ MyStruct{Val: 1} }
}
func myfunc(s *[]*MyStruct) {
*s = []*MyStruct{ &MyStruct{Val: 1} }
}
Again: what are the best practices here? I know slices are always pointers, so returning a pointer to a slice isn't useful. However, should I return a slice of struct values or a slice of pointers to structs, or should I pass in a pointer to a slice as an argument (a pattern used in the Go App Engine API)?
tl;dr:
Methods using receiver pointers are common; the rule of thumb for receivers is, "If in doubt, use a pointer."
Slices, maps, channels, strings, function values, and interface values are implemented with pointers internally, and a pointer to them is often redundant.
Elsewhere, use pointers for big structs or structs you'll have to change, and otherwise pass values, because getting things changed by surprise via a pointer is confusing.
One case where you should often use a pointer:
Receivers are pointers more often than other arguments. It's not unusual for methods to modify the thing they're called on, or for named types to be large structs, so the guidance is to default to pointers except in rare cases.
Jeff Hodges' copyfighter tool automatically searches for non-tiny receivers passed by value.
Some situations where you don't need pointers:
Code review guidelines suggest passing small structs like type Point struct { latitude, longitude float64 }, and maybe even things a bit bigger, as values, unless the function you're calling needs to be able to modify them in place.
Value semantics avoid aliasing situations where an assignment over here changes a value over there by surprise.
Passing small structs by value can be more efficient by avoiding cache misses or heap allocations. In any case, when pointers and values perform similarly, the Go-y approach is to choose whatever provides the more natural semantics rather than squeeze out every last bit of speed.
So, Go Wiki's code review comments page suggests passing by value when structs are small and likely to stay that way.
If the "large" cutoff seems vague, it is; arguably many structs are in a range where either a pointer or a value is OK. As a lower bound, the code review comments suggest slices (three machine words) are reasonable to use as value receivers. As something nearer an upper bound, bytes.Replace takes 10 words' worth of args (three slices and an int). You can find situations where copying even large structs turns out a performance win, but the rule of thumb is not to.
For slices, you don't need to pass a pointer to change elements of the array. io.Reader.Read(p []byte) changes the bytes of p, for instance. It's arguably a special case of "treat little structs like values," since internally you're passing around a little structure called a slice header (see Russ Cox (rsc)'s explanation). Similarly, you don't need a pointer to modify a map or communicate on a channel.
For slices you'll reslice (change the start/length/capacity of), built-in functions like append accept a slice value and return a new one. I'd imitate that; it avoids aliasing, returning a new slice helps call attention to the fact that a new array might be allocated, and it's familiar to callers.
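A minimal sketch of that append-style pattern follows. The helper name appendEvens is hypothetical and not from any library; it only illustrates taking a slice value and returning the possibly reallocated result.

```go
package main

import "fmt"

// appendEvens is a hypothetical helper: it appends the even numbers from src
// to dst and returns the resulting slice, in the same style as append.
func appendEvens(dst, src []int) []int {
	for _, v := range src {
		if v%2 == 0 {
			dst = append(dst, v)
		}
	}
	return dst
}

func main() {
	nums := []int{10}
	// The caller reassigns the result, just as with the built-in append.
	nums = appendEvens(nums, []int{1, 2, 3, 4})
	fmt.Println(nums) // [10 2 4]
}
```

The reassignment at the call site is the cue to the reader that the backing array may have changed.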
It's not always practical to follow that pattern. Some tools like database interfaces or serializers need to append to a slice whose type isn't known at compile time. They sometimes accept a pointer to a slice in an interface{} parameter.
Maps, channels, strings, and function and interface values, like slices, are internally references or structures that contain references already, so if you're just trying to avoid getting the underlying data copied, you don't need to pass pointers to them. (rsc wrote a separate post on how interface values are stored).
You still may need to pass pointers in the rarer case that you want to modify the caller's struct: flag.StringVar takes a *string for that reason, for example.
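To make the distinction concrete, here is a small sketch. The function names zero, count, and setName are made up for illustration. The first two change data reachable through a slice or map value; the third has to take a pointer because it replaces the caller's variable itself.

```go
package main

import "fmt"

// zero overwrites the elements of p in place. The caller sees the change
// because both slice headers refer to the same backing array.
func zero(p []byte) {
	for i := range p {
		p[i] = 0
	}
}

// count mutates the caller's map through the map value itself.
func count(m map[string]int, key string) {
	m[key]++
}

// setName needs a pointer, because it assigns to the caller's string variable.
func setName(dst *string, name string) {
	*dst = name
}

func main() {
	buf := []byte{1, 2, 3}
	zero(buf)

	hits := map[string]int{}
	count(hits, "a")

	var name string
	setName(&name, "gopher")

	fmt.Println(buf, hits, name) // [0 0 0] map[a:1] gopher
}
```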
Where you use pointers:
Consider whether your function should be a method on whichever struct you need a pointer to. People expect a lot of methods on x to modify x, so making the modified struct the receiver may help to minimize surprise. There are guidelines on when receivers should be pointers.
Functions that have effects on their non-receiver params should make that clear in the godoc, or better yet, the godoc and the name (like reader.WriteTo(writer)).
You mention accepting a pointer to avoid allocations by allowing reuse; changing APIs for the sake of memory reuse is an optimization I'd delay until it's clear the allocations have a nontrivial cost, and then I'd look for a way that doesn't force the trickier API on all users:
For avoiding allocations, Go's escape analysis is your friend. You can sometimes help it avoid heap allocations by making types that can be initialized with a trivial constructor, a plain literal, or a useful zero value like bytes.Buffer.
Consider a Reset() method to put an object back in a blank state, like some stdlib types offer. Users who don't care or can't save an allocation don't have to call it.
Consider writing modify-in-place methods and create-from-scratch functions as matching pairs, for convenience: existingUser.LoadFromJSON(json []byte) error could be wrapped by NewUserFromJSON(json []byte) (*User, error). Again, it pushes the choice between laziness and pinching allocations to the individual caller.
Callers seeking to recycle memory can let sync.Pool handle some details. If a particular allocation creates a lot of memory pressure, you're confident you know when the alloc is no longer used, and you don't have a better optimization available, sync.Pool can help. (CloudFlare published a useful (pre-sync.Pool) blog post about recycling.)
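As a rough sketch of the Reset and matching-pair ideas above: the User type and its fields are hypothetical, and LoadFromJSON / NewUserFromJSON simply follow the signatures mentioned in the text.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// User is a hypothetical type used only for illustration.
type User struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// Reset puts u back into its zero state so the value can be reused.
func (u *User) Reset() { *u = User{} }

// LoadFromJSON overwrites u in place with the decoded data.
func (u *User) LoadFromJSON(data []byte) error {
	u.Reset()
	return json.Unmarshal(data, u)
}

// NewUserFromJSON is the create-from-scratch counterpart of LoadFromJSON.
func NewUserFromJSON(data []byte) (*User, error) {
	u := new(User)
	if err := u.LoadFromJSON(data); err != nil {
		return nil, err
	}
	return u, nil
}

func main() {
	u, err := NewUserFromJSON([]byte(`{"name":"Ann","age":30}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(u.Name, u.Age) // Ann 30

	// A caller that cares about allocations can reuse the same value.
	if err := u.LoadFromJSON([]byte(`{"name":"Bob"}`)); err != nil {
		panic(err)
	}
	fmt.Println(u.Name, u.Age) // Bob 0
}
```

Callers who don't care about the extra allocation use only the constructor; callers who do can keep one User around and reload it.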
Finally, on whether your slices should be of pointers: slices of values can be useful, and save you allocations and cache misses. There can be blockers:
The API to create your items might force pointers on you, e.g. you have to call NewFoo() *Foo rather than let Go initialize with the zero value.
The desired lifetimes of the items might not all be the same. The whole slice is freed at once; if 99% of the items are no longer useful but you have pointers to the other 1%, all of the array remains allocated.
Copying or moving the values might cause you performance or correctness problems, making pointers more attractive. Notably, append copies items when it grows the underlying array. Pointers to slice items from before the append may not point to where the item was copied after, copying can be slower for huge structs, and for e.g. sync.Mutex copying isn't allowed. Insert/delete in the middle and sorting also move items around so similar considerations can apply.
Broadly, value slices can make sense if either you get all of your items in place up front and don't move them (e.g., no more appends after initial setup), or if you do keep moving them around but you're confident that's OK (no/careful use of pointers to items, and items are small or you've measured the perf impact). Sometimes it comes down to something more specific to your situation, but that's a rough guide.
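The stale-pointer hazard with append can be shown in a few lines. This is an illustrative sketch with a made-up item type; the capacity is deliberately set equal to the length so that the append has to reallocate.

```go
package main

import "fmt"

// item is a hypothetical element type.
type item struct{ n int }

// staleAfterAppend takes a pointer to a slice element, then appends past the
// slice's capacity. It returns the element as seen through the slice and as
// seen through the old pointer.
func staleAfterAppend() (current, stale int) {
	items := make([]item, 1, 1) // len 1, cap 1: the next append must reallocate
	p := &items[0]

	items = append(items, item{n: 2}) // copies items[0] into a new array

	p.n = 99 // writes to the old array, not to the slice's current element
	return items[0].n, p.n
}

func main() {
	fmt.Println(staleAfterAppend()) // 0 99
}
```

With a slice of pointers ([]*item) the append would copy only the pointers, so p and items[0] would keep referring to the same item.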
If you can (e.g. a non-shared resource that does not need to be passed by reference), use a value, for the following reasons:
Your code will be nicer and more readable, avoiding pointer operators and nil checks.
Your code will be safer against nil pointer panics.
Your code will often be faster: yes, faster! Why?
Reason 1: you will allocate fewer items on the heap. Allocating/deallocating on the stack is practically free, but allocating/deallocating on the heap may be very expensive (allocation time + garbage collection). You can see some basic numbers here: http://www.macias.info/entry/201802102230_go_values_vs_references.md
Reason 2: especially if you store returned values in slices, your objects will be more compact in memory: looping over a slice whose items are contiguous is much faster than iterating over a slice whose items are pointers to other parts of memory. The cost comes not from the indirection step itself but from the increase in cache misses.
Myth breaker: a typical x86 cache line is 64 bytes. Most structs are smaller than that. The time to copy a cache line in memory is similar to the time to copy a pointer.
Only if a critical part of your code is slow would I try some micro-optimization and check whether using pointers improves the speed, at the cost of some readability and maintainability.
Three main reasons when you would want to use method receivers as pointers:
"First, and most important, does the method need to modify the receiver? If it does, the receiver must be a pointer."
"Second is the consideration of efficiency. If the receiver is large, a big struct for instance, it will be much cheaper to use a pointer receiver."
"Next is consistency. If some of the methods of the type must have pointer receivers, the rest should too, so the method set is consistent regardless of how the type is used"
Reference : https://golang.org/doc/faq#methods_on_values_or_pointers
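The first point is easy to see in a small sketch. Counter is a hypothetical type, and IncByValue is deliberately ineffective to show what a value receiver does.

```go
package main

import "fmt"

// Counter is a hypothetical type used only for illustration.
type Counter struct{ n int }

// IncByValue has a value receiver: it increments a copy, so the caller's
// Counter is left unchanged.
func (c Counter) IncByValue() { c.n++ }

// Inc has a pointer receiver: it increments the caller's Counter.
func (c *Counter) Inc() { c.n++ }

func main() {
	var c Counter
	c.IncByValue()
	fmt.Println(c.n) // 0

	c.Inc() // shorthand for (&c).Inc(), since c is addressable
	fmt.Println(c.n) // 1
}
```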
Edit: Another important thing is to know the actual "type" that you are sending to the function. The type can be either a 'value type' or a 'reference type'.
Even though slices and maps act as references, we might still want to pass them as pointers in scenarios like changing the length of the slice inside the function.
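A short sketch of that scenario, using two made-up functions: appending to a slice parameter changes only the callee's copy of the slice header, while appending through a pointer changes the caller's slice. Returning the new slice, as append does, is the other common way to get the same effect.

```go
package main

import "fmt"

// growByValue appends to its own copy of the slice header; the caller's
// length is unaffected.
func growByValue(s []int) {
	s = append(s, 1)
	_ = s // the grown slice is discarded when the function returns
}

// growByPointer appends through a pointer, so the caller's slice header
// (pointer, length, capacity) is updated.
func growByPointer(s *[]int) {
	*s = append(*s, 1)
}

func main() {
	a := []int{}
	growByValue(a)
	fmt.Println(len(a)) // 0

	growByPointer(&a)
	fmt.Println(len(a)) // 1
}
```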
A case where you generally need to return a pointer is when constructing an instance of some stateful or shareable resource. This is often done by functions prefixed with New.
Because they represent a specific instance of something and they may need to coordinate some activity, it doesn't make a lot of sense to generate duplicated/copied structures representing the same resource -- so the returned pointer acts as the handle to the resource itself.
Some examples:
func NewTLSServer(handler http.Handler) *Server -- instantiate a web server for testing
func Open(name string) (*File, error) -- return a file access handle
In other cases, pointers are returned just because the structure may be too large to copy by default:
func NewRGBA(r Rectangle) *RGBA -- allocate an image in memory
Alternatively, returning pointers directly could be avoided by instead returning a copy of a structure that contains the pointer internally, but maybe this isn't considered idiomatic:
No such examples found in the standard libraries...
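For what that alternative might look like, here is a hypothetical sketch (Handle, state, and NewHandle are invented names, not taken from any library): the constructor returns a struct by value, and copies of it share one underlying resource through an internal pointer.

```go
package main

import "fmt"

// state is the shared, mutable part of the hypothetical resource.
type state struct{ hits int }

// Handle is returned by value but carries a pointer, so all copies of a
// Handle refer to the same state.
type Handle struct{ s *state }

// NewHandle returns a Handle value rather than a *Handle.
func NewHandle() Handle { return Handle{s: &state{}} }

// Hit and Hits use value receivers; they still reach the shared state
// through the internal pointer.
func (h Handle) Hit()      { h.s.hits++ }
func (h Handle) Hits() int { return h.s.hits }

func main() {
	a := NewHandle()
	b := a // copying the Handle copies only the pointer
	a.Hit()
	b.Hit()
	fmt.Println(a.Hits(), b.Hits()) // 2 2
}
```

One trade-off of this design is that the zero value Handle{} holds a nil pointer, so calling Hit on it would panic.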
Related question: Embedding in Go with pointer or with value
Regarding struct vs. pointer return values, I got confused after reading many highly starred open source projects on GitHub, as there are many examples of both, until I found this amazing article:
https://www.ardanlabs.com/blog/2014/12/using-pointers-in-go.html
"In general, share struct type values with a pointer unless the struct type has been implemented to behave like a primitive data value.
If you are still not sure, this is another way to think about. Think of every struct as having a nature. If the nature of the struct is something that should not be changed, like a time, a color or a coordinate, then implement the struct as a primitive data value. If the nature of the struct is something that can be changed, even if it never is in your program, it is not a primitive data value and should be implemented to be shared with a pointer. Don’t create structs that have a duality of nature."
Completely convinced.
I am being passed an array from a C program that does not include the size of the array; that is, it just passes a pointer to the array. The array is a generic type <Item>. How can I determine the end of the array in order to detect a buffer overflow?
I tried iterating through the array until I received something that wasn't an <Item>. That worked most of the time, but sometimes the nonsense at the end would be of type <Item>. I am using C and calling a function from an external class I had no part in developing. <Item> is a struct with multiple references to other arrays (sort of like a linked list).
EDIT:
The API stated that the array was intended to be a read-only version. The problem is I cannot read it if I do not know the size. There doesn't appear to be a sentinel value. There is a random comment stating that if the size is needed, use sizeOf (array)/sizeOf (Item), which doesn't work. It was developed by a team that no longer works here. The problem is that other code already relies on this C code and I cannot change it without fear of breaking that code.
It is not possible to determine the end of an array based on just a pointer to an element of that array.
I tried iterating through the array until I received something that wasn't an <Item>
It's also not possible to determine whether a particular memory location contains an object of a particular type, or whether it contains any object at all. Even if you could, how would you determine whether the object that you find is really part of the array and not just a separate <Item> object that happens to be there?
A possible solution is to use a sentinel value to represent the end of an array. For example, you could define the interface such that <Item>.member == 0 if and only if that is the last element of the array. This is similar to how null-terminated strings work.
If all you have is a pointer and no size or known "end-of-array" marker (sentinel) in the data, then you have an impossible situation. There is no way in that case to determine the size/end of the passed array.
First things first. I am aware that there is no direct way to find out the size of an incoming array in a method, as arrays are received as pointers. The situation I am facing is similar but different in nature.
I have a method something like this:
void method( int no_of_elements, int arr[])
{
/* checking if array has 'no_of_elements' elements */
}
I want to check if the integer array arr[] has no_of_elements or not. Based on this my method shall process further. I know it almost sounds like finding the size of arr[]. At least that's the only way I could think of to check my requirement. Any approach to solve this would be appreciated.
I want to check if the integer array arr[] has no_of_elements or not. Based on this my method shall process further.
This is not possible. The caller must pass the size of the array to this function, for example as an additional parameter.
Situation I am facing is something similar but different in nature.
No, it is the very same thing. You can't know how large the pointed-at chunk of data is, unless that information is provided, period.
Related, you can force an array to be of a certain size:
void method( int no_of_elements, int arr[no_of_elements])
or you can force it to be at least of a certain size:
void method( int no_of_elements, int arr[static no_of_elements])
None of these will likely yield compiler errors/warnings though, since arrays "decay" into pointers to first element when passed to functions. The above are rather to be regarded as documentation, a contract you sign with the caller. If they violate the contract by passing something else, it is their fault.
But very good compilers, or external static analyser tools, may be able to spot type-related bugs if you use these methods. If you just use an int*, they won't.
I know one can use SetWindowLongPtr + GWLP_USERDATA to store a pointer which points to some data.
But could one store the data directly, for example "a handle", "a bool", "an int" or other larger data?
From http://msdn.microsoft.com/zh-tw/library/windows/desktop/ms644898%28v=vs.85%29.aspx, it says:
Sets new extra information that is private to the application, such as handles or pointers.
So I guess storing a handle is OK. I also used this method to store an RGB value without problems.
But I don't know if this is a good idea to do things like this. And can we store other data which is large (for example, a structure)?
p.s: The motivation of this question is: When I create a dialog window, I want to store data for each of its controls. Of course I can use static variables in the window procedure and pass pointer (to them) to SetWindowLongPtr function. But this is not "perfect" in theory, because when the dialog window is closed, I don't need these data anymore. Of course, in practice, the data I need to use is very small, and I should not care about the usage of memory. But I still like to know if there is a better way.
You only need one pointer to store anything you want. Declare a struct with the data you want to store. Allocate it before the CreateWindowEx() call and pass the pointer as the last argument. You get it back in your window procedure for the WM_CREATE message, in the CREATESTRUCT.lpCreateParams field. Now call SetWindowLongPtr to store that pointer.
Anytime you need it back, use GetWindowLongPtr to recover the pointer to the struct. You'll need to clean up again; use the WM_NCDESTROY message to release the pointer.
Note that this is a standard technique used in C++ class libraries that wrap the winapi. Do consider using one of them instead of spinning this yourself.
The SetWindowLongPtr function can store a piece of data which has the same size as LONG_PTR (most likely 32bit or 64bit). If your data can be stored in that size, you're fine. I.e. a bool would be fine, so would most handles (since handles tend to be pointers, too).
A typical RGB value would work as well since it's stored as three bytes (one byte per color component) or four bytes (an extra byte for the alpha channel).
If you need more space than this, you should allocate a structure somewhere else and store a pointer to that structure.
Here is a matrix declared as pointer to an array of pointers to rows.
(source: Numerical Recipes in C)
What is the better way to pass this matrix to a function along with its dimensions?
void printMatrix(float **matrix, int rows, int cols);
Or pack it in a struct
struct Matrix {
int rows, cols;
float **data;
};
and pass a pointer to the struct?
void printMatrix(struct Matrix *m);
Both ways work; however, the approach using a struct is a bit "easier" to use. You (or whoever will use this) won't have to worry about passing the correct size as well, and there is nothing extra to keep organized. You just handle one struct, i.e. one logical object. If you split everything up, you'll have to handle the data as well as the metadata yourself (i.e. storing/passing data and dimensions).
Is there a downside using the struct? Not that I know of (other than having to handle one more pointer). However there is one huge advantage: Using the struct you could use a function wanting data and meta data separated as well (by passing the struct elements rather than a pointer to the struct). This isn't that easy the other way around.
As for "is it worth it?" considering "should I do it for organisation?": Do it, if the grouping is logical. Lots of Windows APIs work with structs that way, but I'm not a real fan of them if the grouping isn't logical or it creates additional "pains". In other words: Don't group your parameters into a struct if they're not related or if the user most likely wouldn't have them in that form (i.e. they're grouped for this call only).
Edit:
As an example:
I'd group your example data, as width and height belong to the matrix data and they're related (plus they might be used in other functions the same way).
However, I wouldn't group parameters such as this: write_log(LOG_INFO, "All data has been processed"); Adding a struct here would add complexity that isn't required. It's very likely that this group of data won't be used elsewhere and makes calling the function more complicated (as you'll have to create the struct first).
For the sake of optimization, I would consider simply passing the struct by value. i.e.
void printMatrix(struct Matrix m);
without the pointer. It's a very small data structure and the processor might just store this top-level data in the cache. The compiler and processor may be able to optimize access to this top-level data.
Then again, it might do nothing or even make it worse. Optimization can be a black art.
(And don't forget that if you make changes to the top-level Matrix struct, then you'll need to return it somehow). So maybe this should only be considered in place of const struct Matrix *m.
There is no single perfect method. In the appropriate chapter of c-faq you can see 5 methods and their comparison.