Alternative "null" value - c

It seems common to have the tail pointer at the end of a linked list be null (0).
What if I want to have two possible different "tails"?
My use case is a big integer representation that supports two's complement: I want to have a tail corresponding to "the rest of this number is zeros" and "the rest of this number is ones", where I can tell them apart just by performing a pointer equality.
It seems like this should be common enough to have standard practice, but it's hard to think of exactly what to search for. It seems somewhat arbitrary that we only get one "forbidden" pointer value (that will give a useful-ish error when accidentally dereferenced).
Options seem to include:
Use some arbitrary second value (like 1, or 0xdeadbeef). This seems evil. For one thing, I guess it needs to be aligned? Also, I will have obscure bugs if malloc happens to allocate a real linked list cell at the same address. Is there some region of memory malloc is guaranteed not to use?
Call malloc with a dummy non-zero size. This seems more sensible, but ideally I would have the pointer value be const, rather than requiring initialisation.
Take the address of something arbitrary, like a function defined in the file. This seems very evil, but does seem to lack any practical disadvantages (assuming it would work).

Given some ListItem type and a desire to have a ListItem * value that serves as a sentinel (also see sentinel node), we can simply define a ListItem object to serve that purpose:
ListItem SentinelObject;
ListItem * const SentinelValue = &SentinelObject;
Both could also be made static if they will be used only in one translation unit.
The named object could be eliminated by using a compound literal:
ListItem * const SentinelValue = & (ListItem) {0};
(The initializer may need adjustment if 0 is not a suitable initializer for the first member of ListItem.)
Alternately, wasting space could be avoided by overlapping the unused ListItem object with some other object:
union { SomeUsefulType SomeUsefulThing; ListItem SentinelObject; } MyUnion;
ListItem * const SentinelValue = &MyUnion.SentinelObject;
While this gives SomeUsefulThing and SentinelObject the same address, that is unlikely to be a problem given they have different types.
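Applied to the question's use case, the named-object approach might look like the sketch below. The Limb type, the AllZeros/AllOnes names, and the helper functions are hypothetical and not from the question; the only point is that two dummy objects give two distinct, constant pointer values that malloc can never return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cell type for a big integer stored least significant limb first. */
typedef struct Limb {
    unsigned value;
    struct Limb *next;
} Limb;

/* Two dummy objects whose only job is to have distinct, stable addresses. */
static Limb ZerosObject;
static Limb OnesObject;

/* Constant sentinel pointers; no run-time initialisation is needed. */
static Limb * const AllZeros = &ZerosObject; /* "the rest of this number is zeros" */
static Limb * const AllOnes  = &OnesObject;  /* "the rest of this number is ones"  */

/* True if p is one of the two tails rather than a real cell. */
bool is_tail(const Limb *p)
{
    return p == AllZeros || p == AllOnes;
}

/* Walk to the tail; in two's complement the number is negative
   exactly when the tail stands for an endless run of one bits. */
bool is_negative(const Limb *n)
{
    assert(n != NULL);
    while (!is_tail(n))
        n = n->next;
    return n == AllOnes;
}
```

Because the sentinels are ordinary objects, dereferencing one by accident will not trap the way a null pointer usually does, so a traversal has to test for the tails explicitly, as is_negative does here.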

Related

Is it better to create new variables or using pointers in C? [duplicate]

In Go there are various ways to return a struct value or slice thereof. For individual ones I've seen:
type MyStruct struct {
Val int
}
func myfunc() MyStruct {
return MyStruct{Val: 1}
}
func myfunc() *MyStruct {
return &MyStruct{}
}
func myfunc(s *MyStruct) {
s.Val = 1
}
I understand the differences between these. The first returns a copy of the struct, the second a pointer to the struct value created within the function, the third expects an existing struct to be passed in and overrides the value.
I've seen all of these patterns be used in various contexts, I'm wondering what the best practices are regarding these. When would you use which? For instance, the first one could be ok for small structs (because the overhead is minimal), the second for bigger ones. And the third if you want to be extremely memory efficient, because you can easily reuse a single struct instance between calls. Are there any best practices for when to use which?
Similarly, the same question regarding slices:
func myfunc() []MyStruct {
return []MyStruct{ MyStruct{Val: 1} }
}
func myfunc() []*MyStruct {
return []*MyStruct{ &MyStruct{Val: 1} }
}
func myfunc(s *[]MyStruct) {
*s = []MyStruct{ MyStruct{Val: 1} }
}
func myfunc(s *[]*MyStruct) {
*s = []*MyStruct{ &MyStruct{Val: 1} }
}
Again: what are best practices here. I know slices are always pointers, so returning a pointer to a slice isn't useful. However, should I return a slice of struct values, a slice of pointers to structs, should I pass in a pointer to a slice as argument (a pattern used in the Go App Engine API)?
tl;dr:
Methods using receiver pointers are common; the rule of thumb for receivers is, "If in doubt, use a pointer."
Slices, maps, channels, strings, function values, and interface values are implemented with pointers internally, and a pointer to them is often redundant.
Elsewhere, use pointers for big structs or structs you'll have to change, and otherwise pass values, because getting things changed by surprise via a pointer is confusing.
One case where you should often use a pointer:
Receivers are pointers more often than other arguments. It's not unusual for methods to modify the thing they're called on, or for named types to be large structs, so the guidance is to default to pointers except in rare cases.
Jeff Hodges' copyfighter tool automatically searches for non-tiny receivers passed by value.
Some situations where you don't need pointers:
Code review guidelines suggest passing small structs like type Point struct { latitude, longitude float64 }, and maybe even things a bit bigger, as values, unless the function you're calling needs to be able to modify them in place.
Value semantics avoid aliasing situations where an assignment over here changes a value over there by surprise.
Passing small structs by value can be more efficient by avoiding cache misses or heap allocations. In any case, when pointers and values perform similarly, the Go-y approach is to choose whatever provides the more natural semantics rather than squeeze out every last bit of speed.
So, Go Wiki's code review comments page suggests passing by value when structs are small and likely to stay that way.
If the "large" cutoff seems vague, it is; arguably many structs are in a range where either a pointer or a value is OK. As a lower bound, the code review comments suggest slices (three machine words) are reasonable to use as value receivers. As something nearer an upper bound, bytes.Replace takes 10 words' worth of args (three slices and an int). You can find situations where copying even large structs turns out a performance win, but the rule of thumb is not to.
For slices, you don't need to pass a pointer to change elements of the array. io.Reader.Read(p []byte) changes the bytes of p, for instance. It's arguably a special case of "treat little structs like values," since internally you're passing around a little structure called a slice header (see Russ Cox (rsc)'s explanation). Similarly, you don't need a pointer to modify a map or communicate on a channel.
For slices you'll reslice (change the start/length/capacity of), built-in functions like append accept a slice value and return a new one. I'd imitate that; it avoids aliasing, returning a new slice helps call attention to the fact that a new array might be allocated, and it's familiar to callers.
It's not always practical to follow that pattern. Some tools like database interfaces or serializers need to append to a slice whose type isn't known at compile time. They sometimes accept a pointer to a slice in an interface{} parameter.
Maps, channels, strings, and function and interface values, like slices, are internally references or structures that contain references already, so if you're just trying to avoid getting the underlying data copied, you don't need to pass pointers to them. (rsc wrote a separate post on how interface values are stored).
You still may need to pass pointers in the rarer case that you want to modify the caller's struct: flag.StringVar takes a *string for that reason, for example.
Where you use pointers:
Consider whether your function should be a method on whichever struct you need a pointer to. People expect a lot of methods on x to modify x, so making the modified struct the receiver may help to minimize surprise. There are guidelines on when receivers should be pointers.
Functions that have effects on their non-receiver params should make that clear in the godoc, or better yet, the godoc and the name (like reader.WriteTo(writer)).
You mention accepting a pointer to avoid allocations by allowing reuse; changing APIs for the sake of memory reuse is an optimization I'd delay until it's clear the allocations have a nontrivial cost, and then I'd look for a way that doesn't force the trickier API on all users:
For avoiding allocations, Go's escape analysis is your friend. You can sometimes help it avoid heap allocations by making types that can be initialized with a trivial constructor, a plain literal, or a useful zero value like bytes.Buffer.
Consider a Reset() method to put an object back in a blank state, like some stdlib types offer. Users who don't care or can't save an allocation don't have to call it.
Consider writing modify-in-place methods and create-from-scratch functions as matching pairs, for convenience: existingUser.LoadFromJSON(json []byte) error could be wrapped by NewUserFromJSON(json []byte) (*User, error). Again, it pushes the choice between laziness and pinching allocations to the individual caller.
Callers seeking to recycle memory can let sync.Pool handle some details. If a particular allocation creates a lot of memory pressure, you're confident you know when the alloc is no longer used, and you don't have a better optimization available, sync.Pool can help. (CloudFlare published a useful (pre-sync.Pool) blog post about recycling.)
Finally, on whether your slices should be of pointers: slices of values can be useful, and save you allocations and cache misses. There can be blockers:
The API to create your items might force pointers on you, e.g. you have to call NewFoo() *Foo rather than let Go initialize with the zero value.
The desired lifetimes of the items might not all be the same. The whole slice is freed at once; if 99% of the items are no longer useful but you have pointers to the other 1%, all of the array remains allocated.
Copying or moving the values might cause you performance or correctness problems, making pointers more attractive. Notably, append copies items when it grows the underlying array. Pointers to slice items from before the append may not point to where the item was copied after, copying can be slower for huge structs, and for e.g. sync.Mutex copying isn't allowed. Insert/delete in the middle and sorting also move items around so similar considerations can apply.
Broadly, value slices can make sense if either you get all of your items in place up front and don't move them (e.g., no more appends after initial setup), or if you do keep moving them around but you're confident that's OK (no/careful use of pointers to items, and items are small or you've measured the perf impact). Sometimes it comes down to something more specific to your situation, but that's a rough guide.
If you can (e.g. a non-shared resource that does not need to be passed as a reference), use a value, for the following reasons:
Your code will be nicer and more readable, avoiding pointer operators and nil checks.
Your code will be safer against nil pointer panics.
Your code will often be faster: yes, faster! Why?
Reason 1: you will allocate fewer items on the heap. Allocating/deallocating on the stack is immediate, but allocating/deallocating on the heap may be very expensive (allocation time + garbage collection). You can see some basic numbers here: http://www.macias.info/entry/201802102230_go_values_vs_references.md
Reason 2: especially if you store returned values in slices, your objects will be more compact in memory: looping over a slice where all the items are contiguous is much faster than iterating over a slice where all the items are pointers to other parts of memory. This is not because of the indirection step itself but because of the increase in cache misses.
Myth breaker: a typical x86 cache line is 64 bytes. Most structs are smaller than that. The time to copy a cache line in memory is similar to the time to copy a pointer.
Only if a critical part of your code is slow would I try some micro-optimization and check whether using pointers improves the speed somewhat, at the cost of readability and maintainability.
Three main reasons when you would want to use method receivers as pointers:
"First, and most important, does the method need to modify the receiver? If it does, the receiver must be a pointer."
"Second is the consideration of efficiency. If the receiver is large, a big struct for instance, it will be much cheaper to use a pointer receiver."
"Next is consistency. If some of the methods of the type must have pointer receivers, the rest should too, so the method set is consistent regardless of how the type is used"
Reference : https://golang.org/doc/faq#methods_on_values_or_pointers
Edit : Another important thing is to know the actual "type" that you are sending to function. The type can either be a 'value type' or 'reference type'.
Even though slices and maps act as references, we might want to pass them as pointers in scenarios like changing the length of the slice in the function.
A case where you generally need to return a pointer is when constructing an instance of some stateful or shareable resource. This is often done by functions prefixed with New.
Because they represent a specific instance of something and they may need to coordinate some activity, it doesn't make a lot of sense to generate duplicated/copied structures representing the same resource -- so the returned pointer acts as the handle to the resource itself.
Some examples:
func NewTLSServer(handler http.Handler) *Server -- instantiate a web server for testing
func Open(name string) (*File, error) -- return a file access handle
In other cases, pointers are returned just because the structure may be too large to copy by default:
func NewRGBA(r Rectangle) *RGBA -- allocate an image in memory
Alternatively, returning pointers directly could be avoided by instead returning a copy of a structure that contains the pointer internally, but maybe this isn't considered idiomatic:
No such examples found in the standard libraries...
Related question: Embedding in Go with pointer or with value
Regarding struct vs. pointer return values, I got confused after reading many highly starred open source projects on GitHub, as there are many examples of both cases, until I found this amazing article:
https://www.ardanlabs.com/blog/2014/12/using-pointers-in-go.html
"In general, share struct type values with a pointer unless the struct type has been implemented to behave like a primitive data value.
If you are still not sure, this is another way to think about. Think of every struct as having a nature. If the nature of the struct is something that should not be changed, like a time, a color or a coordinate, then implement the struct as a primitive data value. If the nature of the struct is something that can be changed, even if it never is in your program, it is not a primitive data value and should be implemented to be shared with a pointer. Don’t create structs that have a duality of nature."
Completely convinced.

C : Does creating a new instance of a value takes space if it exists already?

I was wondering in C if doing this
void aFunction(Type* pItem){
Type item = *pItem;
...do stuff with item
}
is less efficient in terms of speed or memory than always using *pItem in the function, that is not instantiating Type item = *pItem;. Or is it essentially the same after compiling ?
Thank you
Compilers generally optimize the code they generate when invoked with optimization enabled. Any good compiler will produce the same code for simple routines that use item after Type item = *pItem; as it does for routines that just use *pItem without saving it in item.
However, suppose the routine is not simple. Suppose you have:
void aFunction(Type *pItem, Type *qItem)
{
Type item = *pItem;
*qItem = SomeValue;
printf("%Format\n", item);
printf("%Format\n", *pItem);
}
In this case, the compiler cannot know that *pItem is the same as item, because pItem and qItem might point to the same object, so *qItem = SomeValue might have changed *pItem. Therefore, to implement the second printf, the compiler must load *pItem after executing the *qItem = SomeValue.
For this reason, using Type item = *pItem; may actually be better than not creating a new local object if you know that pItem and qItem will always point to different objects, because it allows the compiler to load *pItem once and keep it in a processor register instead of reloading it, perhaps multiple times if *pItem and *qItem are accessed multiple times throughout the routine.
In this case, there is a way to tell the compiler that this potential equality of pointers does not occur. The restrict qualifier will tell the compiler that the object pItem points to is accessed only through the pItem pointer:
void aFunction(Type * restrict pItem, Type *qItem)
However, in general, these situations can become very complicated. Type might be a structure that contains pointers to other objects of type Type. For example, Type might be a tree node that contains members left and right that point to subtrees. For the most part, you should write code in a way that is clear and let the compiler optimize it. If it is convenient for you to save *pItem in a local object and use that, then do so. As you gain experience, you will come to learn more about how compilers behave and how you can write code that allows a compiler to optimize.
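The aliasing point can be made concrete with a small sketch. The function names below are made up for illustration and use int in place of Type. The first version has to re-read *src after the store to *dst, the second reads it once into a local, and the third uses restrict to promise that the two pointers never refer to the same object.

```c
#include <assert.h>

/* No promise about aliasing: *src must be re-read after the first store,
   because dst and src may point to the same int. */
int add_twice(int *dst, const int *src)
{
    *dst += *src;
    *dst += *src;   /* sees the updated value when dst == src */
    return *dst;
}

/* Local copy: the source is read exactly once, so a later store through
   dst cannot change the value being added. */
int add_twice_copy(int *dst, const int *src)
{
    const int s = *src;
    *dst += s;
    *dst += s;
    return *dst;
}

/* restrict: the caller promises dst and src do not overlap, so the compiler
   may keep *src in a register. Calling this with dst == src is undefined. */
int add_twice_restrict(int * restrict dst, const int * restrict src)
{
    *dst += *src;
    *dst += *src;
    return *dst;
}
```

With distinct objects all three return the same result. With the same int passed for both arguments, add_twice quadruples it while add_twice_copy triples it, which is exactly the difference the compiler has to preserve when it cannot rule out aliasing.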

Changing a pointer as a result of destroying an "object" in C

As part of a course I am attending at the moment, we are working in C with self-developed low level libraries, and we are now working in our final project, which is a game.
At a certain point, it seemed relevant to have a struct (serving as a sort of object) that held some important information about the current game status, namely a pointer to a player "object" (can't really call the simulated objects we are using actual objects, can we?).
It would go something like this:
typedef struct {
//Holds relevant information about game current state
state_st currstate;
//Buffer of events to process ('array of events')
//Needs to be pointers because of deallocating memory
event_st ** event_buffer;
//Indicates the size of the event buffer array above
unsigned int n_events_to_process;
//... Other members ...
//Pointer to a player (Pointer to allow allocation and deallocation)
Player * player;
//Flag that indicates if a player has been created
bool player_created;
} Game_Info;
The problem is the following:
If we are to stick to the design philosophy that is used in most of this course, we are to "abstract" these "objects" using functions like Game_Info * create_game_info() and destroy_game_info(Game_Info * gi_ptr) to act as constructors and destructors for these "objects" (also, "member functions" would be something like update_game_state(Game_Info * gi_ptr), acting like C++ by passing the normally implicit this as the first argument).
Therefore, as a way of detecting if the player object inside a Game_Info "instance" had already been deleted I am comparing the player pointer to NULL, since in all of the "destructors", after deallocating the memory I set the passed pointer to NULL, to show that the object was successfully deallocated.
This obviously causes a problem (which I did not detect at first, and thus the player_created bool flag that fixed it while I still was getting a grasp on what was happening) which is that because the pointer is passed by copy and not by reference, it is not set to NULL after the call to the "object" "destructor", and thus comparing it to NULL is not a reliable way to know if the pointer was deallocated.
I am writing this, then, to ask for input on what would be the best way to overcome this problem:
A flag to indicate if an "object" is "instanced" or not - using the flag instead of ptr == NULL in comparisons to assert if the "object" is "instanced" - the solution I am currently using
Passing a pointer to the pointer (calling the functions with &player instead of only player) - would enable setting to NULL
Setting the pointer to NULL one "level" above, after calling the "destructor"
Any other solution, since I am not very experienced in C and am probably overlooking an easier way to solve this problem.
Thank you for reading and for any advice you might be able to provide!
I am writing this, then, to ask for input on what would be the best way to overcome this problem: …
What would be the best way is primarily opinion-based, but of the ways you listed the worst is the first, where one has to keep two variables (pointer and flag) synchronized.
Any other solution…
Another solution would be using a macro, e. g.:
#define destroy_player(p) do { /* whatever cleanup needed */ free(p); (p) = NULL; } while (0)
…
destroy_player(gi_ptr->player);
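For comparison, the pointer-to-pointer option from the question could be sketched as follows. The Player layout and create_player are hypothetical stand-ins, since the course's real types are not shown; only the shape of the "destructor" matters.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the course's Player type. */
typedef struct {
    int health;
} Player;

/* "Constructor": returns NULL if allocation fails. */
Player *create_player(int health)
{
    Player *p = malloc(sizeof *p);
    if (p != NULL)
        p->health = health;
    return p;
}

/* "Destructor" that takes the address of the caller's pointer, so it can
   both free the object and reset the caller's own copy to NULL. */
void destroy_player(Player **pp)
{
    assert(pp != NULL);
    free(*pp);      /* free(NULL) is a no-op, so a second call is harmless */
    *pp = NULL;
}
```

Called as destroy_player(&gi_ptr->player), this leaves the member NULL afterwards, so the ptr == NULL test becomes reliable and no separate flag has to be kept in sync.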

If var seems to deep copy arrays in Swift. Does if let?

In Swift 3.0, the code below gives different addresses for thisArray[0], suggesting that the array was deep copied. Is this actually the case, or am I missing something in my analysis? Does if let behave the same way? It may be irrelevant for if let, as it is immutable...
var thisArray: [String]? = ["One", "Two"]
withUnsafePointer(to: &thisArray![0]) {
print("thisArray[0] has address \($0)")
}
if var thisArray = thisArray {
withUnsafePointer(to: &thisArray[0]) {
print("thisArray[0] has address \($0)")
}
}
Relevant: https://developer.apple.com/swift/blog/?id=10.
In Swift, Array, String, and Dictionary are all value types.
So, if you assign an existing value type via var or let then a copy occurs. If you assign an existing reference type (such as a class) via var or let then you'll be assigning a reference.
@CharlieS's answer is mostly correct but glosses over some important details...
Semantically, assigning a value type to a different binding (whether a var variable or let constant) always creates a copy. That is, your program code can always safely assume that modifications to one binding of a value type will never affect others.
Or to put it a different way: if you were building your own version of the Swift compiler / runtime / standard library from scratch, you could make every var a = b allocate new memory for a and copy all the memory contents of b, regardless of which value type a and b are. All other things being equal, your implementation would be compatible with all Swift programs.
The downside to value type reassignment always being a copy is that for large types (like collections or composite types), all that copying wastes time and memory. So...
In practice, value types can be implemented in ways that maintain the semantic always-a-copy guarantee of value types while providing performance optimizations like copy-on-write. The Swift Standard Library collection types (arrays, dictionaries, sets, etc) do this, and it's possible for custom value types (including yours) to implement copy-on-write too. (For details on how, this WWDC 2015 talk provides a good overview.)
To make copy-on-write work, an implementing value type needs to use reference types internally (as noted in that WWDC talk). And it has to do it in such a way that the language guarantee for value types — that assignments are always semantically copies — continues to hold in all cases.
One of the ways that a copy-on-write array implementation could fail that guarantee would be to allow unguarded access to its underlying storage buffer — if you can get a raw pointer into that storage, you could mutate the contents in ways that cause other bindings (that is, semantic copies) to mutate, violating the language guarantee.
To preserve the copy-on-write guarantee, the standard library's collection types make sure that certain operations that could perform unguarded mutation create copies. (Although even then, sometimes the copies created involve enough reference manipulation that the memory and time costs of the copies remain low up until an actual mutation happens.)
You can see a bit of how this works in the Swift compiler & standard library source code: start from a search for isUniquelyReferenced and follow the callers and callees of its various use cases in ArrayBuffer etc.
For an illustration of what's going on here, let's try a variation on your test:
var thisArray: [String] = ["One", "Two"]
withUnsafePointer(to: &thisArray[0]) {
print("thisArray[0] has address \($0)")
}
var thatArray = thisArray // comment/uncomment here
withUnsafePointer(to: &thisArray[0]) {
print("thisArray[0] has address \($0)")
}
When you comment out the assignment thatArray = thisArray, both addresses are the same. Once thisArray is no longer uniquely referenced, though, accessing even the original array's underlying buffer requires a copy (or at least some internal indirection).

A no-op device for function pointer tables?

void table_no_op()
{
// this is for function table elements that do nothing,
// fills space between states, use less of them
return;
}
I am currently using this to define a "zero" in a function pointer table, where the input index is supposed to do nothing. Is it okay or something glaringly wrong?
While there's nothing wrong with the no-op per se (in general, this is called Null Object Pattern), I would be worried about the function declaration - i.e. does every function in the table take 0 arguments and return void?
A counterexample would be OpenGL, where you often retrieve a pointer to a function and cast it to the desired type yourself. Casting a void->void pointer to something else, e.g. (int, int)->int, and then calling it through that type would be undefined behavior and would likely cause a crash (or an uninitialized return value, or something else).
So, if the functions in the table are homogeneous - go for it. If not - better do something else.
EDIT: You can only do 2 things with a function pointer - cast it to a different function pointer; and call, but only with the original type.
See http://blog.frama-c.com/index.php?post/2013/08/24/Function-pointers-in-C for details. Raymond Chen has another example here - http://blogs.msdn.com/b/oldnewthing/archive/2011/05/06/10161590.aspx
EDIT2: However, you may make a number of no-ops (noop_IntInt_Int, noop_IntDouble_Double and so on); if you match the types every time, that might work.
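For the homogeneous case, a minimal sketch might look like this. The handler names, the counter, and the dispatch function are invented for illustration; only table_no_op comes from the question. Every entry has the same type, void (*)(void), so each call goes through the function's own type.

```c
#include <assert.h>
#include <stddef.h>

/* Every table entry has this one type, so no casts are needed. */
typedef void (*Handler)(void);

int counter = 0;   /* hypothetical state touched by the real handlers */

void handler_increment(void) { counter++; }
void handler_reset(void)     { counter = 0; }

/* Filler for indices that are supposed to do nothing. */
void table_no_op(void) { }

/* Hypothetical table: real handlers with no-op entries in between. */
static const Handler table[] = {
    handler_increment,
    table_no_op,
    handler_reset,
    table_no_op,
};

enum { TABLE_SIZE = sizeof table / sizeof table[0] };

/* Look up and call the handler for an input index. */
void dispatch(size_t index)
{
    assert(index < TABLE_SIZE);
    table[index]();
}
```

Filling the gaps with a real function instead of NULL means dispatch never needs a null check before the call, which is the main appeal of the Null Object Pattern here.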

Resources