efficient sort with custom comparison, but no callback function - c

I have a need for an efficient sort that doesn't have a callback, but is as customizable as using qsort(). What I want is for it to work like an iterator, where it continuously calls into the sort API in a loop until it is done, doing the comparison in the loop rather than off in a callback function. This way the custom comparison is local to the calling function (and therefore has access to local variables, is potentially more efficient, etc). I have implemented this for an inefficient selection sort, but need it to be efficient, so prefer a quick sort derivative.
Has anyone done anything like this? I tried to do it for quick sort, but trying to turn the algorithm inside out hurt my brain too much.
Below is how it might look in use.
// the array of data we are sorting
MyData array[5000], *firstP, *secondP;
// (assume data is filled in)
Sorter sorter;
// initialize sorter
int result = sortInit (&sorter, array, 5000,
(void **)&firstP, (void **)&secondP, sizeof(MyData));
// loop until complete
while (sortIteration (&sorter, result) == 0) {
// here's where we do the custom comparison...here we
// just sort by member "value" but we could do anything
result = firstP->value - secondP->value;
}

Turning the sort function inside out as you propose isn't likely to make it faster. You're trading indirection on the comparison function for indirection on the item pointers.
It appears you want your comparison function to have access to state information. The quick-n-dirty way to create global variables or a global structure, assuming you don't have more than one thread going at once. The qsort function won't return until all the data is sorted, so in a single threaded environment this should be safe.
The only other thing I would suggest is to locate a source to qsort and modify it to take an extra parameter, a pointer to your state structure. You can then pass this pointer into your comparison function.

Take an existing implementation of qsort and update it to reference the Sorter object for its local variables. Instead of calling a compare function passed in, it would update its state and return to the caller.
Because of recursion in qsort, you'll need to keep some sort of a state stack in your Sorter object. You could accomplish that with an array or a linked-list using dynamic allocation (less efficient). Since most qsort implementations use tail recursion for the larger half and make a recursive call to qsort for the smaller half of the pivot point, you can sort at least 2n elements if your array can hold n states.

A simple solution is to use a inlineble sort function and a inlineble compare callback. When compiled with optimisation, both call get flatten into each other exactly like you want. The only downside is that your choice of sort algorithm is limited because if you recurse or alloc more memory you potentially lose any benefit from doing this. Method with small overhead, like this, work best with small data set.
You can use generic sort function with compare method, size, offset and stride.This way custom comparison can be done by parameter rather then callback. With this way you can use any algorithm. Just use some macro to fill in the most common case because you will have a lot of function argument.
Also, check out the STB library (https://github.com/nothings/stb).
It has sorting function similar to this among many other useful C tools.

What you're asking for has already been done -- it's called std::sort, and it's already in the C++ standard library. Better support for this (among many other things) is part of why well-written C++ is generally faster than C.

You could write a preprocessor macro to output a sort routine, and have the macro take a comparison expression as an argument.
#define GENERATE_SORT(name, type, comparison_expression) \
void name(type* begin, type* end) \
{ /* ... when needed, fill a and b and use comparison_expression */ }
GENERATE_SORT(sort_ints, (*a<*b))
void foo()
{
int array[10];
sort_ints(array, array+10);
}

Two points. I).
_asm
II). basic design limits of compilers.
Compilers have, as a basic purpose, the design goal of avoiding assembler or machine code. They achieve this by imposing certain limits. In this case, we give up a flexibility that we can easily do in assembly code. i.e. split the generated code of the sort into two pieces at the call to the compare function. copy the first half to somewhere. next copy the generated code of the compare function to there, just after the previous copied code of the first part. then copy the last half of the sort code. Finally, we have to deal with a whole series of minor details. See also the concept of "hot patching" running programs.

Related

Why do filter method implementations create another array instead of modifying the current array?

From what I know, many popular implementations of filter collection methods, e.g. JavaScript's Array#filter method, tend to create a new array rather than modifying it. (As #Berthur mentioned, this is also generally useful in terms of functional programming as well).
However, from what I've seen in homemade methods of filter implementations, sometimes the author chooses to use a while / for loop on a dynamically allocated array (e.g. an ArrayList in Java) and remove elements instead.
I have a general idea of why this is the case (since removing elements requires the rest of the array's elements afterwards to be shifted over, which is O(n) while adding elements is O(1)), but I also know that in the same case, if an element is added to the end of an array when the array is full, it requires memory to be allocated, which requires, in the case for Java, the array to be copied.
Thus, is there some mathematical reason of why creating a new array for filtering is (generally) faster than removing & moving elements over, or is it just for the guaranteed immutability over the original array that it guarantees?
It's not generally faster, and it's not done for performance reasons. It's more of a programming paradigm, as well as being a convenient tool.
While in-place algorithms are often faster for performance and/or memory critical applications, they need to know about the underlying implementation of the data structure, and become more specific. This immutable approach allows for more general functionality, apart from being convenient. The approach is common in functional programming. As you say, it guarantees immutability, which makes it compatible with this way of thinking.
In your Javascript example, for instance, notice that you can call filter on a regular array, but you could also call it on a TypedArray. Now, typed arrays cannot be resized, so performing an in-place filter would not be possible in the first place. But the filter method behaves in the same way through their common interface, following the principles of polymorphism.
Ultimately, these functions are just available to you and while they can be very convenient for many cases, it is up to you as a programmer to decide whether they cover your specific need or whether you must implement your own custom algorithm.

Is it better to create new variables or using pointers in C? [duplicate]

In Go there are various ways to return a struct value or slice thereof. For individual ones I've seen:
type MyStruct struct {
Val int
}
func myfunc() MyStruct {
return MyStruct{Val: 1}
}
func myfunc() *MyStruct {
return &MyStruct{}
}
func myfunc(s *MyStruct) {
s.Val = 1
}
I understand the differences between these. The first returns a copy of the struct, the second a pointer to the struct value created within the function, the third expects an existing struct to be passed in and overrides the value.
I've seen all of these patterns be used in various contexts, I'm wondering what the best practices are regarding these. When would you use which? For instance, the first one could be ok for small structs (because the overhead is minimal), the second for bigger ones. And the third if you want to be extremely memory efficient, because you can easily reuse a single struct instance between calls. Are there any best practices for when to use which?
Similarly, the same question regarding slices:
func myfunc() []MyStruct {
return []MyStruct{ MyStruct{Val: 1} }
}
func myfunc() []*MyStruct {
return []MyStruct{ &MyStruct{Val: 1} }
}
func myfunc(s *[]MyStruct) {
*s = []MyStruct{ MyStruct{Val: 1} }
}
func myfunc(s *[]*MyStruct) {
*s = []MyStruct{ &MyStruct{Val: 1} }
}
Again: what are best practices here. I know slices are always pointers, so returning a pointer to a slice isn't useful. However, should I return a slice of struct values, a slice of pointers to structs, should I pass in a pointer to a slice as argument (a pattern used in the Go App Engine API)?
tl;dr:
Methods using receiver pointers are common; the rule of thumb for receivers is, "If in doubt, use a pointer."
Slices, maps, channels, strings, function values, and interface values are implemented with pointers internally, and a pointer to them is often redundant.
Elsewhere, use pointers for big structs or structs you'll have to change, and otherwise pass values, because getting things changed by surprise via a pointer is confusing.
One case where you should often use a pointer:
Receivers are pointers more often than other arguments. It's not unusual for methods to modify the thing they're called on, or for named types to be large structs, so the guidance is to default to pointers except in rare cases.
Jeff Hodges' copyfighter tool automatically searches for non-tiny receivers passed by value.
Some situations where you don't need pointers:
Code review guidelines suggest passing small structs like type Point struct { latitude, longitude float64 }, and maybe even things a bit bigger, as values, unless the function you're calling needs to be able to modify them in place.
Value semantics avoid aliasing situations where an assignment over here changes a value over there by surprise.
Passing small structs by value can be more efficient by avoiding cache misses or heap allocations. In any case, when pointers and values perform similarly, the Go-y approach is to choose whatever provides the more natural semantics rather than squeeze out every last bit of speed.
So, Go Wiki's code review comments page suggests passing by value when structs are small and likely to stay that way.
If the "large" cutoff seems vague, it is; arguably many structs are in a range where either a pointer or a value is OK. As a lower bound, the code review comments suggest slices (three machine words) are reasonable to use as value receivers. As something nearer an upper bound, bytes.Replace takes 10 words' worth of args (three slices and an int). You can find situations where copying even large structs turns out a performance win, but the rule of thumb is not to.
For slices, you don't need to pass a pointer to change elements of the array. io.Reader.Read(p []byte) changes the bytes of p, for instance. It's arguably a special case of "treat little structs like values," since internally you're passing around a little structure called a slice header (see Russ Cox (rsc)'s explanation). Similarly, you don't need a pointer to modify a map or communicate on a channel.
For slices you'll reslice (change the start/length/capacity of), built-in functions like append accept a slice value and return a new one. I'd imitate that; it avoids aliasing, returning a new slice helps call attention to the fact that a new array might be allocated, and it's familiar to callers.
It's not always practical follow that pattern. Some tools like database interfaces or serializers need to append to a slice whose type isn't known at compile time. They sometimes accept a pointer to a slice in an interface{} parameter.
Maps, channels, strings, and function and interface values, like slices, are internally references or structures that contain references already, so if you're just trying to avoid getting the underlying data copied, you don't need to pass pointers to them. (rsc wrote a separate post on how interface values are stored).
You still may need to pass pointers in the rarer case that you want to modify the caller's struct: flag.StringVar takes a *string for that reason, for example.
Where you use pointers:
Consider whether your function should be a method on whichever struct you need a pointer to. People expect a lot of methods on x to modify x, so making the modified struct the receiver may help to minimize surprise. There are guidelines on when receivers should be pointers.
Functions that have effects on their non-receiver params should make that clear in the godoc, or better yet, the godoc and the name (like reader.WriteTo(writer)).
You mention accepting a pointer to avoid allocations by allowing reuse; changing APIs for the sake of memory reuse is an optimization I'd delay until it's clear the allocations have a nontrivial cost, and then I'd look for a way that doesn't force the trickier API on all users:
For avoiding allocations, Go's escape analysis is your friend. You can sometimes help it avoid heap allocations by making types that can be initialized with a trivial constructor, a plain literal, or a useful zero value like bytes.Buffer.
Consider a Reset() method to put an object back in a blank state, like some stdlib types offer. Users who don't care or can't save an allocation don't have to call it.
Consider writing modify-in-place methods and create-from-scratch functions as matching pairs, for convenience: existingUser.LoadFromJSON(json []byte) error could be wrapped by NewUserFromJSON(json []byte) (*User, error). Again, it pushes the choice between laziness and pinching allocations to the individual caller.
Callers seeking to recycle memory can let sync.Pool handle some details. If a particular allocation creates a lot of memory pressure, you're confident you know when the alloc is no longer used, and you don't have a better optimization available, sync.Pool can help. (CloudFlare published a useful (pre-sync.Pool) blog post about recycling.)
Finally, on whether your slices should be of pointers: slices of values can be useful, and save you allocations and cache misses. There can be blockers:
The API to create your items might force pointers on you, e.g. you have to call NewFoo() *Foo rather than let Go initialize with the zero value.
The desired lifetimes of the items might not all be the same. The whole slice is freed at once; if 99% of the items are no longer useful but you have pointers to the other 1%, all of the array remains allocated.
Copying or moving the values might cause you performance or correctness problems, making pointers more attractive. Notably, append copies items when it grows the underlying array. Pointers to slice items from before the append may not point to where the item was copied after, copying can be slower for huge structs, and for e.g. sync.Mutex copying isn't allowed. Insert/delete in the middle and sorting also move items around so similar considerations can apply.
Broadly, value slices can make sense if either you get all of your items in place up front and don't move them (e.g., no more appends after initial setup), or if you do keep moving them around but you're confident that's OK (no/careful use of pointers to items, and items are small or you've measured the perf impact). Sometimes it comes down to something more specific to your situation, but that's a rough guide.
If you can (e.g. a non-shared resource that does not need to be passed as reference), use a value. By the following reasons:
Your code will be nicer and more readable, avoiding pointer operators and null checks.
Your code will be safer against Null Pointer panics.
Your code will be often faster: yes, faster! Why?
Reason 1: you will allocate less items in the heap. Allocating/deallocating from stack is immediate, but allocating/deallocating on Heap may be very expensive (allocation time + garbage collection). You can see some basic numbers here: http://www.macias.info/entry/201802102230_go_values_vs_references.md
Reason 2: especially if you store returned values in slices, your memory objects will be more compacted in memory: looping a slice where all the items are contiguous is much faster than iterating a slice where all the items are pointers to other parts of the memory. Not for the indirection step but for the increase of cache misses.
Myth breaker: a typical x86 cache line are 64 bytes. Most structs are smaller than that. The time of copying a cache line in memory is similar to copying a pointer.
Only if a critical part of your code is slow I would try some micro-optimization and check if using pointers improves somewhat the speed, at the cost of less readability and mantainability.
Three main reasons when you would want to use method receivers as pointers:
"First, and most important, does the method need to modify the receiver? If it does, the receiver must be a pointer."
"Second is the consideration of efficiency. If the receiver is large, a big struct for instance, it will be much cheaper to use a pointer receiver."
"Next is consistency. If some of the methods of the type must have pointer receivers, the rest should too, so the method set is consistent regardless of how the type is used"
Reference : https://golang.org/doc/faq#methods_on_values_or_pointers
Edit : Another important thing is to know the actual "type" that you are sending to function. The type can either be a 'value type' or 'reference type'.
Even as slices and maps acts as references, we might want to pass them as pointers in scenarios like changing the length of the slice in the function.
A case where you generally need to return a pointer is when constructing an instance of some stateful or shareable resource. This is often done by functions prefixed with New.
Because they represent a specific instance of something and they may need to coordinate some activity, it doesn't make a lot of sense to generate duplicated/copied structures representing the same resource -- so the returned pointer acts as the handle to the resource itself.
Some examples:
func NewTLSServer(handler http.Handler) *Server -- instantiate a web server for testing
func Open(name string) (*File, error) -- return a file access handle
In other cases, pointers are returned just because the structure may be too large to copy by default:
func NewRGBA(r Rectangle) *RGBA -- allocate an image in memory
Alternatively, returning pointers directly could be avoided by instead returning a copy of a structure that contains the pointer internally, but maybe this isn't considered idiomatic:
No such examples found in the standard libraries...
Related question: Embedding in Go with pointer or with value
Regarding to struct vs. pointer return value, I got confused after reading many highly stared open source projects on github, as there are many examples for both cases, util I found this amazing article:
https://www.ardanlabs.com/blog/2014/12/using-pointers-in-go.html
"In general, share struct type values with a pointer unless the struct type has been implemented to behave like a primitive data value.
If you are still not sure, this is another way to think about. Think of every struct as having a nature. If the nature of the struct is something that should not be changed, like a time, a color or a coordinate, then implement the struct as a primitive data value. If the nature of the struct is something that can be changed, even if it never is in your program, it is not a primitive data value and should be implemented to be shared with a pointer. Don’t create structs that have a duality of nature."
Completedly convinced.

How to manage a large number of variables in C?

In an implementation of the Game of Life, I need to handle user events, perform some regular (as in periodic) processing and draw to a 2D canvas. The details are not particularly important. Suffice it to say that I need to keep track of a large(-ish) number of variables. These are things like: a structure representing the state of the system (live cells), pointers to structures provided by the graphics library, current zoom level, coordinates of the origin and I am sure a few others.
In the main function, there is a game loop like this:
// Setup stuff
while (!finished) {
while (get_event(&e) != 0) {
if (e.type == KEYBOARD_EVENT) {
switch (e.key.keysym) {
case q:
case x:
// More branching and nesting follows
The maximum level of nesting at the moment is 5. It quickly becomes unmanageable and difficult to read, especially on a small screen. The solution then is to split this up into multiple functions. Something like:
while (!finished {
while (get_event(&e) !=0) {
handle_event(state, origin_x, origin_y, &canvas, e...) //More parameters
This is the crux of the question. The subroutine must necessarily have access to the state (represented by the origin, the canvas, the live cells etc.) in order to function. Passing them all explicitly is error prone (which order does the subroutine expect them in) and can also be difficult to read. Aside from that, having functions with potentially 10+ arguments strikes me as a symptom of other design flaws. However the alternatives that I can think of, don't seem any better.
To summarise:
Accept deep nesting in the game loop.
Define functions with very many arguments.
Collate (somewhat) related arguments into structs - This really only hides the problem, especially since the arguments are only loosely related.
Define variables that represent the application state with file scope (static int origin_x; for example). If it weren't for the fact that it has been drummed into me that global variable are usually a terrible idea, this would be my preferred option. But if I want to display two views of the same instance of the Game of Life in the future, then the file scope no longer looks so appealing.
The question also applies in slightly more general terms I suppose: How do you pass state around a complicated program safely and in a readable way?
EDIT:
My motivations here are not speed or efficiency or performance or anything like this. If the code takes 20% longer to run as a result of the choice made here that's just fine. I'm primarily interested in what is less likely to confuse me and cause the least headache in 6 months time.
I would consider the canvas as one variable, containing a large 2D array...
consider static allocation
bool canvas[ROWS][COLS];
or dynamic
bool *canvas = malloc(N*M*sizeof(int));
In both cases you can refer to the cell at position i,j as canvas[i][j]
though for dynamic allocation, do not forget to free(canvas) at the end. You can then use a nested loop to update your state.
Search for allocating/handling a 2d array in C and examples or tutorials... Possibly check something like this or similar? Possibly this? https://www.geeksforgeeks.org/nested-loops-in-c-with-examples/
Also consider this Fastest way to zero out a 2d array in C?

Returning multiple values from a C function

Important: Please see this very much related question: Return multiple values in C++.
I'm after how to do the same thing in ANSI C? Would you use a struct or pass the addresses of the params in the function? I'm after extremely efficient (fast) code (time and space), even at the cost of readability.
EDIT: Thanks for all the answers. Ok, I think I owe some explanation: I'm writing this book about a certain subset of algorithms for a particular domain. I have set myself the quite arbitrary goal of making the most efficient (time and space) implementations for all my algos to put up on the web, at the cost of readability and other stuff. That is in part the nature of my (general) question.
Answer: I hope I get this straight, from (possibly) fastest to more common-sensical (all of this a priori, i.e. without testing):
Store outvalues in global object (I would assume something like outvals[2]?), or
Pass outvalues as params in the function (foo(int in, int *out1, int *out2)), or
return a struct with both outvals, or
(3) only if the values are semantically related.
Does this make sense? If so, I think Jason's response is the closest, even though they all provide some piece of the "puzzle". Robert's is fine, but at this time semantics is not what I'm after (although his advice is duly noted).
Both ways are valid, certianly, but I would would consider the semantics (struct vs parameter reference) to decide which way best communicates you intentions to the programmer.
If the values you are returning are tightly coupled, then it is okay to return them as a structure. But, if you are simply creating artificial mechanism to return values together (as a struct), then you should use a parameter reference (i.e. pass the address of the variables) to return the values back to the calling function.
As Neil says, you need to judge it for yourself.
To avoid the cost of passing anything, use a global. Next best is a single structure passed by pointer/reference. After that are individual pointer/reference params.
However, if you have to pack data into the structure and then read it back out after the call, you may be better off passing individual parameters.
If you're not sure, just write a bit of quick test code using both approaches, execute each a few hundred thousand times, and time them to see which is best.
You have described the two possible solutions and your perceived performance constraint. Where you go from here is really up to you - we don't have enough information to make an informed judgement.
Easiest to read should be passed addresses in the function, and it should be fast also, pops and pushes are cheap:
void somefunction (int inval1, int inval2, int *outval1, int *outval2) {
int x = inval1;
int y = inval2;
// do some processing
*outval1 = x;
*outval2 = y;
return;
}
The fastest Q&D way that I can think of is to pass the values on a global object, this way you skip the stack operation just keep in mind that it won't be thread safe.
I think that when you return a struct pointer, you probably need to manually find some memory for that. Addresses in parameter list are allocated on the stack, which is way faster.
Keep in mind that sometimes is faster to pass parameters by value and update on return (or make local copies on the stack) than by reference... This is very evident with small structures or few parameters and lots of accesses.
This depends massively on your architecture, and also if you expect (or can have) the function inlined. I'd first write the code in the simplest way, and then worry about speed if that shows up as an expensive part of your code.
I would pass the address to a struct. If the information to be returned isn't complex, then just passing in the addresses to the values would work too.
Personally, it really comes down to how messy the interface would be.
void SomeFunction( ReturnStruct* myReturnVals )
{
// Fill in the values
}
// Do some stuff
ReturnStruct returnVals;
SomeFunction( &returnVals);
// Do more stuff
In either case, you're passing references, so performance should be similar. If there is a chance that the function never actually returns a value, you could avoid the cost of the malloc with the "return a struct" option since you'd simply return null.
My personal preference is to return a dynamically allocated (malloc'd) struct. I avoid using function arguments for output because I think it makes code more confusing and less maintainable in the long-term.
Returning a local copy of the structure is bad because if the struct was declared as non-static inside the function, it becomes null and void once you exit the function.
And to all the folks suggesting references, well the OP did say "C," and C doesn't have them (references).
And sweet feathery Jesus, can I wake up tomorrow and not have to see anything about the King of Flop on TV?

GCC function attributes vs caching

I have one costly function that gets called many times and there is a very limited set of possible values for the parameter.
Function return code depends only on arguments so the obvious way to speed things up is to keep a static cache within the function for possible arguments and corresponding return codes, so for every combination of the parameters, the costly operation will be performed only once.
I always use this approach in such situations and it works fine but it just occurred to me that GCC function attributes const or pure probably can help me with this.
Does anybody have experience with this? How GCC uses pure and const attributes - only at compile time or at runtime as well?
Can I rely on GCC to be smart enough to call a function, declared as
int foo(int) __attribute__ ((pure))
just once for the same parameter value, or there is no guarantee whatsoever and I better stick to caching approach?
EDIT: My question is not about caching/memoization/lookup tables, but GCC function atributes.
I think you are confusing the GCC pure attribute with memoization.
The GCC pure attribute allows the compiler to reduce the number of times the function is called in certain circumstances (such as loop unrolling). However it makes no guarantees that it will do so, only if it think it's appropriate.
What you appear to be looking for is memoization of your function. Memoization is an optimization where calculations for the same input should not be repeated. Instead the previous result should be returned. The GCC pure attribute does not make a function work in this way. You would have to hand implement this.
I have one costly function that gets called many times and there is very limited set of possible values for the parameter.
Why not use a static constant map then (the arguments' can be hashed to generate a key, the return code the value)?
This sounds like it might be solved with a template function. If all if the known parameters and return values are known at compile-time, you could perhaps generate a template instance of the function for each possible parameter. Essentially you'd be calling a different instance of the function for each possible parameter. Not sure it would be any easier than the static cache you've already implemented, but might be worth exploring.
Check out template metaprogramming. The concepts are similar to 'memoization', suggested by JaredPar, even using the same introductory example of a factorial function. It might be appropriate to say that these kinds of templates are compile-time implementations of memoization.
I dont like to reopen old threads, but there was a particularly offensive comment here:
"templates are for dealing with different types, rather than different values of the same type"
Now, take a simple template factorial implementation:
template<int n> struct Factorial {
static const int value = n * Factorial<n-1>::value;
};
template<> struct Factorial<0> {
static const int value = 1;
};
The template parameter here is an integer, not a typename.

Resources