Memoization Libraries for C?

For a project I'm working on, there are a number of states where calculations can be relied upon to return the same results (and have no side effects). The obvious solution would be to use memoization for all the costly functions.
I would need to have memoization that handles more than one state (so that I could invalidate one cache set without invalidating another). Does anybody know a good C library for this sort of thing? (Note that it can't be C++, we're talking C.)
I've worked with some good implementations in Python that use decorators to be able to flexibly memoize a bunch of different functions. I'm kind of wondering if there's a generic library that could do similar things with C (though probably with explicit function wrapping rather than convenient syntax). I just think it would be silly to have to add caching to each function individually when it's a common enough issue that there must be some off-the-shelf solutions for it.
The characteristics I would look for are the following:
Can cache functions with various types of input and output
Manages multiple different caches (so you can have short-term and long term caching)
Has good functions for invalidating caches
Intended to be used by wrapping functions, rather than altering existing functions
Anybody know a C implementation that can handle all or most of these requisites?

Okay, seeing as there were no memoization libraries for C and I was looking for a drop-in solution for memoizing existing C functions in a code base, I made my own little memoization library that I'm releasing under the APL 2.0. Hopefully people will find this useful and it won't crash and burn on other compilers. If it does have issues, message me here and I'll look into it whenever I have the time (which would probably be measured in increments of months).
This library is not built for speed, but it works and has been tested to make sure it is fairly straightforward to use and doesn't display any memory leaks in my testing. Fundamentally, this lets me add memoization to functions similar to the decorator pattern that I'm used to in Python.
The library is currently on SourceForge as the C-Memo Library. It comes with a little user manual and a couple of 3rd party permissively licensed libraries for generic hashing. If the location changes, I'll try to update this link. I found this helpful in working on my project, hopefully others will find it useful for their projects.
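For anyone who just wants to see the general shape of the wrapping approach, here is a minimal sketch. It is not the C-Memo API: the names memo_cache, memo_init, memo_call and memo_invalidate are made up for illustration, and it assumes functions of the form long f(long) and a small direct-mapped table. Each cache is its own object, so one can be invalidated without touching another.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MEMO_SLOTS 64  /* hypothetical fixed table size */

typedef long (*memo_fn)(long);

/* One independent cache wrapping one function. */
typedef struct {
    memo_fn fn;
    long keys[MEMO_SLOTS];
    long vals[MEMO_SLOTS];
    unsigned char used[MEMO_SLOTS];
    unsigned long misses;  /* calls that actually reached the wrapped function */
} memo_cache;

void memo_init(memo_cache *c, memo_fn fn)
{
    assert(c != NULL && fn != NULL);
    memset(c->used, 0, sizeof c->used);
    c->fn = fn;
    c->misses = 0;
}

/* Forget every stored result in this cache only. */
void memo_invalidate(memo_cache *c)
{
    memset(c->used, 0, sizeof c->used);
}

/* Return the cached result for key, computing it on a miss.
 * A colliding key simply overwrites the slot (direct-mapped). */
long memo_call(memo_cache *c, long key)
{
    size_t slot = (size_t)((unsigned long)key % MEMO_SLOTS);
    if (!c->used[slot] || c->keys[slot] != key) {
        c->vals[slot] = c->fn(key);
        c->keys[slot] = key;
        c->used[slot] = 1;
        c->misses++;
    }
    return c->vals[slot];
}

/* Stand-in for an expensive function with no side effects. */
long slow_square(long x)
{
    return x * x;
}
```

A short-term and a long-term cache would then be two memo_cache variables initialised with the same function. Handling arbitrary argument and return types would need a generic hash over the argument bytes, which this sketch leaves out.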

Memoization is all but built into the Haskell language. You can call this functionality from C.
Update:
I'm still learning about functional programming, but I do know that memoization is fairly common in functional programming because the language features make it easy. I'm learning F#. I don't know Haskell, but it is the only functional language I know of that will interact with C. You might be able to find another functional programming language that interfaces with C in a more suitable fashion than what Haskell provides.

Why can't it be C++?
As a starting point, look at this memoization function (note that it only remembers the most recent key):
Declaration:
template<typename T, typename F>
auto Memoize(T key, F function) {
static T memory_key = key;
static auto memory = function(memory_key);
if (memory_key != key) {
memory_key = key;
memory = function(memory_key);
}
return memory;
}
Usage example:
auto index = Memoize(value, IndexByLetter);

Related

Emulating lambdas in C?

I should mention that I'm generating code in C, as opposed to doing this manually. I say this because it doesn't matter too much if there's a lot of code behind it, because the compiler should manage it all. Anyway, how would I go about emulating a lambda in C? I was thinking I could just generate a function with some random name somewhere in the source code and then call that? I'm not too sure. I haven't really tried anything just yet, since I wanted to get the idea down before I implement it.
Is there some kind of preprocessor directive I can do, or some macro that will make this cleaner to do? I've been inspired by Jon Blow to try out compiler development, and he seemed to implement Lambdas in his language Jai. However, I think he does something where he generates bytecode, and then into C? I'm not sure.
Edit:
I'm working on a compiler, the compiler is just a project of mine to keep me busy, plus I wanted to learn more about compilers. I primarily use clang, I'm on Ubuntu 14.10. I don't have any garbage collection, but I wanted to try my hand at some kind of smart pointer-y/rust/ARC inspired memory model for garbage collection, i.e. little to no overhead. I chose C because I wanted to dabble in it more. My project is free software, just a hobby project.
There are several ways of doing it ("having" lambdas in C). The important thing to understand is that lambdas give closures and that closures are mixing "code" with "data" (the closed values); notice that objects are also mixing "code" with "data" and there is a similarity between objects and closures. See also this answer on Programmers.
Traditionally, in C, you not only use function pointers, but you adopt a convention regarding callbacks. This for instance is the case with GTK: every time you pass a function pointer, you also pass some data with it. You can view callbacks (the convention of giving C function pointer with some void*data) as a way to implement closures.
Since you generate C code (which is a wise idea, I'm doing similar things in MELT which -on Linux- generates C++ code at runtime, compile it into a shared object, and dlopen-s that) you could adopt a callback convention and pass some closed values to every function that you generate.
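As a rough illustration of that callback convention, here is a minimal sketch of what a code generator might emit for a lambda that captures one integer. All names (closure, adder_env, lambda_add_n, make_adder, closure_call) are hypothetical, and the sketch assumes the captured environment outlives the closure, so it says nothing about memory management.

```c
#include <assert.h>
#include <stddef.h>

/* A closure is a code pointer plus a pointer to its closed values. */
typedef struct closure {
    int (*code)(const struct closure *self, int arg);
    const void *env;
} closure;

/* Closed values for one particular lambda, e.g. (x) => x + n. */
struct adder_env {
    int n;
};

/* The generated function body; it receives its closure as first argument. */
int lambda_add_n(const closure *self, int arg)
{
    const struct adder_env *env = self->env;
    return arg + env->n;
}

/* Pair the generated function with the captured environment. */
closure make_adder(const struct adder_env *env)
{
    assert(env != NULL);
    closure c = { lambda_add_n, env };
    return c;
}

/* Every call site goes through the same calling convention. */
int closure_call(const closure *c, int arg)
{
    return c->code(c, arg);
}
```

The generated name can be anything unique, since callers only ever see the closure value.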
You might also consider closed values as static variables, but this approach is generally unwise.
In the past there have been lambda.h header libraries that generate machine-specific trampoline code for closures (essentially generating code which pushes some closed values as arguments and then calls some routine). You might use some JIT compilation techniques (using libjit, GNU lightning, LLVM, asmjit, ...) to do the same. See also libffi to call an arbitrary function (of signature known at runtime only).
Notice that there is a strong -but indirect- relation between closures and garbage collection (read the GC handbook for more), and it is not by accident that every functional language has a GC. C++11 lambda functions are an exception on this (and it is difficult to understand all the intricacies of memory management of C++11 closures). So if you are generating C code, you could and probably should use Boehm's conservative garbage collector (which is wrapping dlopen) and you would have closure GC-ed values. (You could use some other GC libraries, e.g. Ravenbrook's MPS or my unmaintained Qish...) Then you could have the convention that every generated C function takes its closure as first argument.
I would suggest to read Scott's book on Programming Language Pragmatics and (assuming you know a tiny bit of Scheme or Lisp; if you don't you should learn a bit of Scheme and read SICP) Queinnec's book Lisp In Small Pieces (if you happen to read French, read the latest French variant).

How much optimized is Vala generated C code over hand written C code?

Is Vala-generated code as optimized as normal hand-written C code? Is there any performance overhead in using the GObject system compared with not using it?
NOTE: For my next C project I am researching whether or not to use Vala. The project is not a GUI application; it is an interpreter kind of application which has to be platform independent. I am using gcc as the compiler.
As a Vala developer I wouldn't suggest Vala for an interpreter. In an interpreter you're going to create many objects for ast, data types, possible intermediate objects, codegen objects and so on. In Vala itself I've personally measured that the major overhead is creating objects (that are simple GTypeInstance, not even GObject).
Vala is designed to work with gobjects, but gobjects aren't designed to be allocated fast.
So, for your project I'd still be using glib/gio for cross-platform stuff, like networking, string utils, unicode, data structures and so on, because they have a clean, consistent and convenient API, but I wouldn't create ast objects as gobjects/gtypeinstance.
In an interpreter you want fast allocation, that's the whole point.
My personal advice is: use vala if you want to build desktop applications, dbus services, gstreamer stuff or anything that touches the g* world, nothing else.
It depends on what you would have done writing C. In particular:
Since I can write GObject based C code by hand, what is your threshold? Handwritten GObject-based C versus Vala-written GObject-based C? Probably comparable since Vala is going to generate more or less the same library calls as a human would.
GObject classes are technically optional. You can mark a class as [Compact] to skip all the GLib code generation for a class, which will be much faster, although you will lose many of the features, such as virtual methods, if you do so. This will still have slightly more overhead than an object written in C, but it comes with thread-safe reference counting and a few other things that a typical C programmer wouldn't bother doing.
Vala generates a lot of temporary variables. If your C compiler has optimisation at all, most of these temporaries will be eliminated. The bulk of Vala's control structures match with their C counterparts, so a Vala if will not be shockingly more expensive than the C if.
Vala tracks references to do memory management at compilation time. Normally, this is cheap, but it can cause extra duplication of arrays and strings. Particularly, if you copy an unowned string to an owned variable, strdup will be automatically called. This means generated Vala will create more of these small, temporary objects, but, if it really is a problem, you can judiciously use unowned to limit their creation.
The vala compiler generated code uses GObject library. In case it is needed to avoid GObject, I suggest using the aroop compiler which uses vala parser to parse vala code but does not use GObject in the generated code.
Aroop compiler generates code that uses object pool which is optimized for object creation and manipulation. The collection of objects has data oriented features. For example the objects can be flagged and the flag can be selected while traversing the objects in a very efficient way and the objects are all in close distance in perspective of memory location.
The aroop compiler is used to write the shotodol project, which does not have a GUI of its own. It has a module and plugin system. It has a command line interface that enables people to write server applications. An example of a server application using shotodol exists here as shotodol_web. I hope people who like this project will share their issues on the project page.
Generated code is never as optimized as well-designed hand-written code, because the optimizer cannot know the design goal. However, an optimizer creates optimized code more consistently than a human programmer would. Also, you should define your goals and then check if the performance requirements are met by the selected tools, not the other way around. Optimizing is not a design goal, it's a task that may need to be addressed, so first define your requirements and then think about how to reach them.
Premature optimization is the root of all evil. :)

Writing unit tests for C code

I'm a C++ developer and when it comes to testing, it's easy to test a class by injecting dependencies, overriding member functions, and so on, so that you can test edge cases easily. However, in C, you can't use those wonderful features. I'm finding it hard to add unit tests to code because of some of the 'standard' ways that C code is written. What are the best ways to tackle the following:
Passing around a large 'context' struct pointer:
void some_func( global_context_t *ctx, .... )
{
/* lots of code, depending on the state of context */
}
No easy way to test failure on dependent functions:
void some_func( .... )
{
if (!get_network_state() && !some_other_func()) {
do_something_func();
....
}
...
}
Functions with lots of parameters:
void some_func( global_context_t *, int i, int j, other_struct_t *t, out_param_t **out, ...)
{
/* hundreds and hundreds of lines of code */
}
Static or hidden functions:
static void foo( ... )
{
/* some code */
}
void some_public_func( ... )
{
/* call static functions */
foo( ... );
}
In general, I agree with Wes's answer - it is going to be much harder to add tests to code that isn't written with tests in mind. There's nothing inherent in C that makes it impossible to test - but, because C doesn't force you to write in a particular style, it's also very easy to write C code that is difficult to test.
In my opinion, writing code with tests in mind will encourage shorter functions, with few arguments, which helps alleviate some of the pain in your examples.
First, you'll need to pick a unit testing framework. There are a lot of examples in this question (though sadly a lot of the answers are C++ frameworks - I would advise against using C++ to test C).
I personally use TestDept, because it is simple to use, lightweight, and allows stubbing. However, I don't think it is very widely used yet. If you're looking for a more popular framework, many people recommend Check - which is great if you use automake.
Here are some specific answers for your use cases:
Passing around a large 'context' struct pointer
For this case, you can build an instance of the struct with the pre conditions manually set, then check the status of the struct after the function has run. With short functions, each test will be fairly straightforward.
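A minimal sketch of that pattern, with no framework involved. The conn_ctx struct and conn_note_failure function are hypothetical stand-ins for a real context struct and the function under test:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context, standing in for a large global_context_t. */
typedef struct {
    int connected;
    int retries;
    int max_retries;
} conn_ctx;

/* Hypothetical function under test: records a failed attempt and
 * returns 1 if another retry is allowed, 0 otherwise. */
int conn_note_failure(conn_ctx *ctx)
{
    assert(ctx != NULL);
    ctx->connected = 0;
    if (ctx->retries < ctx->max_retries) {
        ctx->retries++;
        return 1;
    }
    return 0;
}

/* Test: set the preconditions by hand, run, then inspect the struct. */
void test_failure_at_retry_limit(void)
{
    conn_ctx ctx = { .connected = 1, .retries = 3, .max_retries = 3 };

    int again = conn_note_failure(&ctx);

    assert(again == 0);
    assert(ctx.connected == 0);
    assert(ctx.retries == 3);  /* must not count past the limit */
}
```

Each test only fills in the fields the function actually reads, which also documents what part of the context it depends on.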
No easy way to test failure on dependent functions
I think this is one of the biggest hurdles with unit testing C.
I've had success using TestDept, which allows run time stubbing of dependent functions. This is great for breaking up tightly coupled code. Here's an example from their documentation:
void test_stringify_cannot_malloc_returns_sane_result() {
replace_function(&malloc, &always_failing_malloc);
char *h = stringify('h');
assert_string_equals("cannot_stringify", h);
}
Depending on your target environment, this may or may not work for you. See their documentation for more details.
Functions with lots of parameters
This probably isn't the answer you're looking for, but I would just break these up into smaller functions with fewer parameters. Much much easier to test.
Static or hidden functions
It's not super clean, but I have tested static functions by including the source file directly, enabling calls of static functions. Combined with TestDept for stubbing out anything not under test, this works fairly well.
#include "implementation.c"
/* Now I can call foo(), defined static in implementation.c */
A lot of C code is legacy code with few tests - and in those cases, it is generally easier to add integration tests that test large parts of the code first, rather than finely grained unit tests. This allows you to start refactoring the code underneath the integration test to a unit-testable state - though it may or may not be worth the investment, depending on your situation. Of course, you'll want to be able to add unit tests to any new code written during this period, so having a solid framework up and running early is a good idea.
If you are working with legacy code, the book Working Effectively with Legacy Code by Michael Feathers is great further reading.
That was a very good question designed to lure people into believing that C++ is better than C because it's more testable. However, it's hardly that simple.
Having written lots of testable C++ and C code both, and an equally impressive amount of untestable C++ and C code, I can confidently say you can wrap crappy untestable code in both languages. In fact the majority of the issues you present above are equally as problematic in C++. EG, lots of people write non-object encapsulated functions in C++ and use them inside classes (see the extensive use of C++ static functions within classes, as an example, such as MyAscii::fromUtf8() type functions).
And I'm quite sure that you've seen a gazillion C++ class functions with too many parameters. And if you think that just because a function only has one parameter it's better, consider the case that internally it's frequently masking the passed in parameters by using a bunch of member variables. Let alone "static or hidden" functions (hint, remember that "private:" keyword) being just as big of a problem.
So, the real answer to your question isn't "C is worse for exactly the reasons you state" but rather "you need to architect it properly in C, just as you would in C++". For example, if you have dependent functions, then put them in a different file and return the number of possible answers they might provide by implementing a bogus version of that function when testing the super-function. And that's the barely-getting-by change. Don't make static or hidden functions if you want to test them.
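Here is a single-file approximation of the "bogus version" idea. Note that it swaps in a different technique: instead of replacing a separate source file at link time, it routes the dependency through a function pointer that a test can redirect. All names are hypothetical.

```c
#include <assert.h>

/* Production implementation; a stand-in for a real network check. */
int real_get_network_state(void)
{
    return 1;
}

/* The seam: callers go through this pointer, so a test can redirect it. */
int (*get_network_state)(void) = real_get_network_state;

/* Function under test: returns -1 when the network is down. */
int send_report(int value)
{
    if (!get_network_state())
        return -1;
    return value;
}

/* Bogus version used only by tests to force the failure path. */
int fake_network_down(void)
{
    return 0;
}

void test_send_report_when_network_down(void)
{
    get_network_state = fake_network_down;
    assert(send_report(42) == -1);
    get_network_state = real_get_network_state;  /* restore for other tests */
}
```

The link-time variant looks the same from the test's point of view, except that fake_network_down would simply be named get_network_state and live in a file that only the test build links.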
The real problem is that you seem to state in your question that you're writing tests for someone else's library that you didn't write and architect for proper testability. However, there are a ton of C++ libraries that exhibit the exact same symptoms and if you were handed one of them to test, you'd be just as equally annoyed.
The solution to all problems like this is always the same: write the code properly and don't use someone else's improperly written code.
When unit testing C you normally include the .c file in the test so you can first test the static functions before you test the public ones.
If you have complex functions and you want to test code calling them then it is possible to work with mock objects. Take a look at the cmocka unit testing framework which offers support for mock objects.

C for an Object-Oriented programmer

Having learned Java and C++, I've learned the OO-way. I want to embark on a fairly ambitious project but I want to do it in C. I know how to break problems down into classes and how to turn them into class hierarchies. I know how to abstract functionality into abstract classes and interfaces. I'm even somewhat proficient at using polymorphism in an effective way.
The problem is that when I'm presented with a problem, the only way I know how to do it is in an Object-Oriented way. I've become too dependent on Object-Oriented design philosophies and methodologies.
I want to learn how to think in a strictly procedural way. How do I do things in a world that lacks classes, interfaces, polymorphism, function overloading, constructors, etc.
How do you represent complex concepts using only non-object-oriented structs? How do you get around a lack of function overloading? What are some tips and tricks for thinking in a procedural way?
The procedural way is to, on one side, have your data structures, and, on the other, your algorithms. Then you take your data structures and pass them to your algorithms. Without encapsulation, it takes a somewhat higher amount of discipline to do this and if you increase the abstraction level to make it easier to do it right, you're doing a considerable part of OO in C.
I think you have a good plan. Doing things the completely OO way in C, while quite possible, is enough of a pain that you would soon drop it anyway. (Don't fight the language.)
If you want a philosophical statement on mapping the OO way to the C way, in part it happens by pushing object creation up one level. A module can still implement its object as a black box, and you can still use reasonable programming style, but basically it's too much of a pain to really hide the object, so the caller allocates it and passes it down, rather than the module allocating it and returning it back up. You usually punt on getters and setters, or implement them as macros.
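A minimal sketch of that caller-allocates style, using a hypothetical ring module. The struct is visible so the caller can put it on the stack, but by convention it is only touched through the ring_* functions and the RING_COUNT macro.

```c
#include <assert.h>
#include <stddef.h>

#define RING_CAP 8  /* hypothetical fixed capacity */

/* Visible layout, so callers can allocate it themselves. */
typedef struct {
    int data[RING_CAP];
    size_t head;
    size_t count;
} ring;

/* Plays the role of a constructor on caller-provided storage. */
void ring_init(ring *r)
{
    assert(r != NULL);
    r->head = 0;
    r->count = 0;
}

/* Append a value; returns 0 when the ring is full. */
int ring_push(ring *r, int v)
{
    if (r->count == RING_CAP)
        return 0;
    r->data[(r->head + r->count) % RING_CAP] = v;
    r->count++;
    return 1;
}

/* Remove the oldest value into *out; returns 0 when the ring is empty. */
int ring_pop(ring *r, int *out)
{
    if (r->count == 0)
        return 0;
    *out = r->data[r->head];
    r->head = (r->head + 1) % RING_CAP;
    r->count--;
    return 1;
}

/* Getter implemented as a macro rather than a function. */
#define RING_COUNT(r) ((r)->count)
```

The caller writes ring r; ring_init(&r); and no heap allocation or destructor is needed.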
Consider also that all of those abstractions you mentioned are a relatively thin layer on top of ordinary structs, so you aren't really very far away from what you want to do. It just isn't packaged quite as nicely.
The C toolkit consists of functions, function pointers and macros. Function pointers can be used to emulate polymorphism.
You are taking the reverse trip old C programmers did for learning OO.
Even before C++ was a standard, OO techniques were used in C.
They included defining structs with a pointer to the struct (usually called this...)
Then defining function pointers in the struct, and during runtime initializing those pointers to the relevant functions.
All those functions received the struct pointer this as their first parameter.
Don't think C in the complete OOP way. If you have to use C, you should learn procedural programming. Doing this would not take more time than learning how to realize all the OOP features in C. Furthermore, basic encapsulation is probably fine, but a lot of other OOP features come with overhead on performance when you mimic them (not when the language is designed to support OOP). The overhead may be huge if you strictly follow the C++ design methodology to represent every small thing as an object. Programming languages have specific purposes in design. When you break the boundary, you always have to pay something as the cost.
Don't think you have to shelve your knowledge of object-oriented work - you can "program into the language".
I had to work in C after being primarily experienced in object-oriented work. C allows for some level of object concepts to pull through. At the job, I had to implement a red-black tree in C, for use in a sweep-line algorithm to find the intersection points in a set of segments. Since the algorithm used different comparison functions, I ended up using function pointers to achieve the same effect as lambdas in Scheme or delegates in C#. It worked well, and also allowed the balanced tree to be reusable.
The other feature of the balanced tree was using void pointers to store arbitrary data. Again, void and function pointers in C are a pain (if you don't know their ins and outs), but they can be used to approximate creating a generic data structure.
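To make the void pointer plus function pointer combination concrete, here is a minimal sketch. It is a plain unbalanced binary search tree, not the red-black tree described above, and every name in it is hypothetical. The tree never looks inside the items; the caller supplies the comparison function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef int (*cmp_fn)(const void *a, const void *b);

typedef struct node {
    void *item;                 /* arbitrary caller-owned data */
    struct node *left, *right;
} node;

typedef struct {
    node *root;
    cmp_fn cmp;                 /* ordering supplied by the caller */
} tree;

/* Insert item; returns 1 on success, 0 on duplicate or allocation failure. */
int tree_insert(tree *t, void *item)
{
    assert(t != NULL && t->cmp != NULL);
    node **link = &t->root;
    while (*link != NULL) {
        int c = t->cmp(item, (*link)->item);
        if (c == 0)
            return 0;
        link = c < 0 ? &(*link)->left : &(*link)->right;
    }
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->item = item;
    n->left = n->right = NULL;
    *link = n;
    return 1;
}

/* Return the stored item that compares equal to key, or NULL. */
void *tree_find(const tree *t, const void *key)
{
    const node *n = t->root;
    while (n != NULL) {
        int c = t->cmp(key, n->item);
        if (c == 0)
            return n->item;
        n = c < 0 ? n->left : n->right;
    }
    return NULL;
}

static void node_free(node *n)
{
    if (n == NULL)
        return;
    node_free(n->left);
    node_free(n->right);
    free(n);
}

/* Free the nodes only; the items still belong to the caller. */
void tree_free(tree *t)
{
    node_free(t->root);
    t->root = NULL;
}

int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int cmp_str(const void *a, const void *b)
{
    return strcmp(a, b);
}
```

The same tree code stores ints with cmp_int and strings with cmp_str, which is the reuse the answer is describing.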
One final note: use the right tool for the job. If you want to use C simply to master procedural technique, then choose a problem that is well-suited to a procedural approach. I didn't have a choice in the matter (legacy application written in C, and people demand the world and refuse to enter the 21st century), so I had to be creative. C is great for low/medium abstractions from the machine, say if you wanted to write a command-line packet inspection program.
The standard way to do polymorphic behavior in C is to use function pointers. You'll find a lot of C APIs (such as the standard qsort(3) and bsearch(3)) take function pointers as parameters; some non-standard ones such as qsort_r take a function pointer and a context pointer (thunk in this case) which serves no purpose other than to be passed back to the callback function. The context pointer functions exactly like the this pointer in object-oriented languages, when dealing with function objects (e.g. functors).
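For example, the standard qsort and bsearch can be pointed at different comparison functions to handle different element types. This is a small sketch; the helper names (by_int, by_str, sort_ints, sort_words, contains_int) are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparator for arrays of int. */
int by_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Comparator for arrays of C strings: each element is a char pointer. */
int by_str(const void *a, const void *b)
{
    const char *const *x = a;
    const char *const *y = b;
    return strcmp(*x, *y);
}

void sort_ints(int *v, size_t n)
{
    assert(v != NULL);
    qsort(v, n, sizeof *v, by_int);
}

void sort_words(const char **v, size_t n)
{
    assert(v != NULL);
    qsort(v, n, sizeof *v, by_str);
}

/* v must already be sorted with the same ordering as by_int. */
int contains_int(const int *v, size_t n, int key)
{
    return bsearch(&key, v, n, sizeof *v, by_int) != NULL;
}
```

qsort itself has no idea what it is sorting; the behaviour that varies lives entirely in the function passed in.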
See also:
Can you write object-oriented code in C?
Object-Orientation in C
Try not to use OOP in C. But if you need to, use structures. For the functions, take a pointer to the structure as an argument, like so:
#include <stdio.h>
typedef struct{
    int age;
    char* name;
    const char* dialog;
} Human;
void make_dialog(Human* human){
    /* Write through the pointer so the caller's struct is updated. */
    human->dialog="Hi";
}
which works much like Python's self. To call other functions belonging to that "class":
void get_dialog(Human* human){
    make_dialog(human);
    /* Pass the text as an argument, never as the format string. */
    printf("%s", human->dialog);
}

How should I structure complex projects in C? [closed]

I have little more than beginner-level C skills and would like to know if there are any de facto "standards" to structure a somewhat complex application in C. Even GUI based ones.
I have been always using the OO paradigm in Java and PHP and now that I want to learn C I'm afraid that I might structure my applications in the wrong way. I'm at a loss on which guidelines to follow to have modularity, decoupling and dryness with a procedural language.
Do you have any readings to suggest? I couldn't find any application framework for C, even if I don't use frameworks I've always found nice ideas by browsing their code.
The key is modularity. This is easier to design, implement, compile and maintain.
Identify modules in your app, like classes in an OO app.
Separate interface and implementation for each module, put in interface only what is needed by other modules. Remember that there is no namespace in C, so you have to make everything in your interfaces unique (e.g., with a prefix).
Hide global variables in implementation and use accessor functions for read/write.
Don't think in terms of inheritance, but in terms of composition. As a general rule, don't try to mimic C++ in C, this would be very difficult to read and maintain.
If you have time for learning, take a look at how an Ada app is structured, with its mandatory package (module interface) and package body (module implementation).
This is for coding.
For maintaining (remember that you code once, but you maintain several times) I suggest to document your code; Doxygen is a nice choice for me. I suggest also to build a strong regression test suite, which allows you to refactor.
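As a small illustration of the coding points above (prefixed names, a narrow interface, and a global hidden behind accessor functions), here is a sketch of a hypothetical logcfg module. In a real project the two halves would be logcfg.h and logcfg.c; they are shown in one block only to keep the example self-contained.

```c
#include <assert.h>

/* ---- would live in logcfg.h: only what other modules need ---- */
int logcfg_get_level(void);
int logcfg_set_level(int level);  /* returns 0 if level is out of range */

/* ---- would live in logcfg.c: the implementation ---- */
#define LOGCFG_MAX_LEVEL 3

/* File-scope static: invisible to other translation units. */
static int logcfg_level = 1;

int logcfg_get_level(void)
{
    return logcfg_level;
}

int logcfg_set_level(int level)
{
    if (level < 0 || level > LOGCFG_MAX_LEVEL)
        return 0;
    logcfg_level = level;
    return 1;
}
```

Because every write goes through logcfg_set_level, the range check cannot be bypassed by another module, and the logcfg_ prefix stands in for the namespace C does not have.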
It's a common misconception that OO techniques can't be applied in C. Most can -- it's just that they are slightly more unwieldy than in languages with syntax dedicated to the job.
One of the foundations of robust system design is the encapsulation of an implementation behind an interface. FILE* and the functions that work with it (fopen(), fread() etc.) is a good example of how encapsulation can be applied in C to establish interfaces. (Of course, since C lacks access specifiers you can't enforce that no-one peeks inside a struct FILE, but only a masochist would do so.)
If necessary, polymorphic behaviour can be had in C using tables of function pointers. Yes, the syntax is ugly but the effect is the same as virtual functions:
struct IAnimal {
int (*eat)(int food);
int (*sleep)(int secs);
};
/* "Subclass"/"implement" IAnimal, relying on C's guaranteed equivalence
* of memory layouts */
struct Cat {
struct IAnimal _base;
int (*meow)(void);
};
int cat_eat(int food) { ... }
int cat_sleep(int secs) { ... }
int cat_meow(void) { ... }
/* "Constructor" */
struct Cat* CreateACat(void) {
struct Cat* x = (struct Cat*) malloc(sizeof (struct Cat));
x->_base.eat = cat_eat;
x->_base.sleep = cat_sleep;
x->meow = cat_meow;
return x;
}
struct IAnimal* pa = (struct IAnimal*) CreateACat(); /* OK: _base is the first member of struct Cat */
pa->eat(42); /* Calls cat_eat() */
((struct Cat*) pa)->meow(); /* "Downcast" */
All good answers.
I would only add "minimize data structure". This might even be easier in C, because if C++ is "C with classes", OOP is trying to encourage you to take every noun / verb in your head and turn it into a class / method. That can be very wasteful.
For example, suppose you have an array of temperature readings at points in time, and you want to display them as a line-chart in Windows. Windows has a PAINT message, and when you receive it, you can loop through the array doing LineTo functions, scaling the data as you go to convert it to pixel coordinates.
What I have seen entirely too many times is, since the chart consists of points and lines, people will build up a data structure consisting of point objects and line objects, each capable of DrawMyself, and then make that persistent, on the theory that that is somehow "more efficient", or that they might, just maybe, have to be able to mouse over parts of the chart and display the data numerically, so they build methods into the objects to deal with that, and that, of course, involves creating and deleting even more objects.
So you end up with a huge amount of code that is oh-so-readable and merely spends 90% of its time managing objects.
All of this gets done in the name of "good programming practice" and "efficiency".
At least in C the simple, efficient way will be more obvious, and the temptation to build pyramids less strong.
The GNU coding standards have evolved over a couple of decades. It'd be a good idea to read them, even if you don't follow them to the letter. Thinking about the points raised in them gives you a firmer basis on how to structure your own code.
If you know how to structure your code in Java or C++, then you can follow the same principles with C code. The only difference is that you don't have the compiler at your side and you need to do everything extra carefully manually.
Since there are no packages and classes, you need to start by carefully designing your modules. The most common approach is to create a separate source folder for each module. You need to rely on naming conventions for differentiating code between different modules. For example prefix all functions with the name of the module.
You can't have classes with C, but you can easily implement "Abstract Data Types". You create a .C and .H file for every abstract data type. If you prefer you can have two header files, one public and one private. The idea is that all structures, constants and functions that need to be exported go to the public header file.
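A minimal sketch of such an abstract data type, using a hypothetical stack of ints. The public header part declares only an incomplete type, so code that includes it cannot reach the fields; the struct definition stays in the implementation file.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- public header (e.g. stack.h): callers see an incomplete type ---- */
typedef struct stack stack;
stack *stack_create(size_t capacity);
void   stack_destroy(stack *s);
int    stack_push(stack *s, int v);     /* returns 0 when full */
int    stack_pop(stack *s, int *out);   /* returns 0 when empty */

/* ---- private implementation (e.g. stack.c) ---- */
struct stack {
    int *items;
    size_t len, cap;
};

stack *stack_create(size_t capacity)
{
    assert(capacity > 0);
    stack *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->items = malloc(capacity * sizeof *s->items);
    if (s->items == NULL) {
        free(s);
        return NULL;
    }
    s->len = 0;
    s->cap = capacity;
    return s;
}

void stack_destroy(stack *s)
{
    if (s != NULL) {
        free(s->items);
        free(s);
    }
}

int stack_push(stack *s, int v)
{
    if (s->len == s->cap)
        return 0;
    s->items[s->len++] = v;
    return 1;
}

int stack_pop(stack *s, int *out)
{
    if (s->len == 0)
        return 0;
    *out = s->items[--s->len];
    return 1;
}
```

The trade-off is that callers can no longer allocate a stack themselves, so the module has to provide the create and destroy functions.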
Your tools are also very important. A useful tool for C is lint, which can help you find bad smells in your code. Another tool you can use is Doxygen, which can help you generate documentation.
Encapsulation is always key to a successful development, regardless of the development language.
A trick I've used to help encapsulate "private" methods in C is to not include their prototypes in the ".h" file.
I'd suggest you check out the code of any popular open source C project, like... hmm... the Linux kernel, or Git, and see how they organize it.
The number one rule for a complex application: it should be easy to read.
To make a complex application simpler, I employ divide and conquer.
I would suggest reading a C/C++ textbook as a first step. For example, C Primer Plus is a good reference. Looking through the examples would give you an idea of how to map your Java OO to a more procedural language like C.
