Inline functions, macros and other solutions to allocate on caller's stack - c

I have a lot of code that looks like this:
int bufferSize = fooBufferSize(); // hate having to do this; this logic should be in `foo`
char buffer[bufferSize];
foo(buffer);
bar(buffer);
It happens all the time for me. In the wild, I see something similar a lot:
int bufferSize = snprintf(NULL, 0, format, ...); // exact same issue as above
char buffer[bufferSize+1];
sprintf(buffer, format, ...);
Besides being tedious for the user to write, the above probably redo a lot of computation, which is not only inefficient but also not DRY. I know I could just malloc the buffer within foo, but that has a lot of issues: memory fragmentation, remembering to call free, and the overhead of malloc/free.
char *foo() {
char *buffer = malloc(...);
// process the data in the buffer
return buffer;
}
int main(void) {
char *buffer = foo();
bar(buffer);
free(buffer); // and I have to remember this every time
}
There are a lot of cases where I probably would use malloc (or a buffer pool), for things that are constantly created and destroyed (e.g. projectile objects in a game). However, the case that comes up a lot for me is that I want to allocate an object on the stack and then dispose of it when my function returns. The issue is that the object is generally allocated in a stack frame further down from the stack frame I want the object to live in. I'd prefer to just build on top of the stack that foo uses. Like, what if I did this instead:
void foo(char **bufferPtr) {
char buffer[...];
// process buffer
*bufferPtr = buffer;
jmp __builtin_return_address(0); // pseudocode to jump to return address
}
int main(void) {
char *buffer = NULL;
foo(&buffer);
bar(buffer);
}
I don't even know the exact syntax to make this approach work, but it would be both GCC-specific and extremely hacky. In addition, if the compiler doesn't understand the jump, local variable states might not be restored properly. I guess what I really want is for foo to behave like a macro, e.g. like this:
#define foo(buffer) \
char buffer[4]; \
strcpy(buffer, "hey");
int main() {
foo(buffer)
bar(buffer);
}
However, I really don't like using macros (terrible error messages, bad IDE support, slippery slope, etc.)
The macro above looks nice, but in my current use case I'm building a computation graph (similar to TensorFlow), and some of the node constructors would look really awkward as macros.
#include <assert.h>
#include <stdio.h>
typedef struct {
float *data; // buffer to store output data of computation
int order; // number of dimensions
int *dimensions; // e.g. [3,4] for a 3x4 matrix
} Node;
typedef struct {
Node super; // it's still a node, so just pass a ref to this whenever you need a Node*
Node *A;
Node *B;
} MatMulNode;
void printMatrix(const char *name, Node *node) {
assert(node->order == 2);
printf("%s: [%d x %d]\n", name, node->dimensions[0], node->dimensions[1]);
}
// look at all these backslashes
// also, `return -1` might not make sense in the context this macro is used.
#define matmul(node, left, right) \
if (left.order != 2 || right.order != 2) {\
return -1;\
}\
if (left.dimensions[1] != right.dimensions[0]) {\
return -1;\
}\
int dimensions[2];\
dimensions[0] = left.dimensions[0];\
dimensions[1] = right.dimensions[1];\
float data[dimensions[0] * dimensions[1]];\
MatMulNode node = {\
.super = {\
.data = data,\
.order = 2,\
.dimensions = dimensions,\
},\
.A = &left,\
.B = &right,\
};
int main() {
Node A = {
NULL,
2,
(int[]) {1, 2}
};
Node B = {
NULL,
2,
(int[]) {2, 3}
};
matmul(C, A, B);
printMatrix("A", &A);
printMatrix("B", &B);
printMatrix("C", &C.super);
}
I'm honestly surprised by how well this works, but I also hate the fact that I have to use macros for it and refuse to believe that this is the best API I can make which avoids malloc.
I tried using inline functions, but inline is just a suggestion unless I'm using the always_inline attribute (but that's GCC only), and AFAICT, it doesn't seem to work with alloca. I'm not even sure if the below code has defined behavior, or if I'm just getting lucky:
static inline __attribute__((always_inline)) Node *matmul(Node *left, Node *right) {
if (left->order != 2 || right->order != 2) {
return NULL;
}
if (left->dimensions[1] != right->dimensions[0]) {
return NULL;
}
int dimensions[2];
dimensions[0] = left->dimensions[0];
dimensions[1] = right->dimensions[1];
float data[dimensions[0] * dimensions[1]];
MatMulNode node = {
.super = {
.data = data,
.order = 2,
.dimensions = dimensions,
},
.A = left,
.B = right,
};
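return &node.super; // dangles: node and data end their lifetime when matmul returns, inlined or not (undefined behavior)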
}
The final approach I know of is to use continuation-passing style, e.g.
void matmul(Node *A, Node *B, void callback(void *, Node *C), void *context) {
MatMulNode C = ...;
callback(context, &C.super);
}
The obvious tail-call optimization of this approach is nice, and the fact that the stack is obviously preserved would make it much less hacky than things like jmp, but the API it presents to the user is really ugly. For example, what if I want to do matmul(A, matmul(B, C))? The code I'd have to write is extremely counter-intuitive, especially because I have to pass a context variable to the callbacks, when ideally they would just have access to the entire stack and pick whatever variables they need from it.
void callbackABC(void *context, Node *ABC) {
assert(ABC->order == 2);
printf("A(BC) is [%d x %d]\n", ABC->dimensions[0], ABC->dimensions[1]);
}
void callbackBC(void *context, Node *BC) {
Node *A = context;
matmul(A, BC, callbackABC, NULL);
}
int main() {
Node A = {
NULL,
2,
(int[]) {1, 2}
};
Node B = {
NULL,
2,
(int[]) {2, 3}
};
Node C = {
NULL,
2,
(int[]) {3, 4}
};
matmul(&B, &C, callbackBC, &A);
}
Overall, I think that inline functions are the closest thing to what I want, but rather than ask "How do I force a function to always inline?" I figured I'd ask with the full context of what I want to achieve and why none of my solutions work.
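One macro-free alternative keeps the two-step call the question dislikes but moves all of the logic into ordinary functions: one function reports how much storage the result needs (the same contract as snprintf(NULL, 0, ...)), and another initialises storage the caller has already declared in its own frame. This is a sketch under assumptions: matmul_size and matmul_init are hypothetical names, and a dims[2] member was added to MatMulNode so the node owns its dimension array instead of pointing at a local array in some other frame.

```c
#include <stddef.h>

/* Node as in the question. */
typedef struct {
    float *data;     /* buffer to store output data of computation */
    int order;       /* number of dimensions */
    int *dimensions; /* e.g. [3,4] for a 3x4 matrix */
} Node;

/* MatMulNode as in the question, plus dims[] (an assumption of this sketch). */
typedef struct {
    Node super;
    Node *A;
    Node *B;
    int dims[2];
} MatMulNode;

/* Returns how many floats the product needs, or 0 if the shapes don't match. */
static size_t matmul_size(const Node *left, const Node *right) {
    if (left->order != 2 || right->order != 2)
        return 0;
    if (left->dimensions[1] != right->dimensions[0])
        return 0;
    return (size_t)left->dimensions[0] * (size_t)right->dimensions[1];
}

/* Fills in caller-provided storage. Returns 0 on success, -1 on a shape error.
   `data` must hold at least matmul_size(left, right) floats. */
static int matmul_init(MatMulNode *node, float *data, Node *left, Node *right) {
    if (matmul_size(left, right) == 0)
        return -1;
    node->dims[0] = left->dimensions[0];
    node->dims[1] = right->dimensions[1];
    node->super.data = data;
    node->super.order = 2;
    node->super.dimensions = node->dims;
    node->A = left;
    node->B = right;
    return 0;
}
```

In the caller, check that `size_t n = matmul_size(&A, &B);` is non-zero, then write `float data[n]; MatMulNode C; matmul_init(&C, data, &A, &B);`. Everything lives in the caller's frame, errors are reported by return value rather than a hidden `return -1`, and A(BC) just means initialising BC first and passing `&BC.super`.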

I'm not recommending it, but you may also want to take a look at GCC's __attribute__((cleanup(freefunc))) feature. It calls the cleanup function automatically when the associated variable goes out of scope.
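A minimal sketch of that attribute, assuming GCC or Clang (it is a compiler extension, not standard C). The cleanup function receives a pointer to the variable, so for a `char *` it takes a `char **`. The names free_charp, formatted_length and the cleanup_runs counter are illustrative only.

```c
#include <stdio.h>
#include <stdlib.h>

/* Counts how many times the cleanup handler ran (for demonstration only). */
static int cleanup_runs = 0;

/* The compiler passes a pointer to the variable that is going out of scope. */
static void free_charp(char **p) {
    free(*p);
    cleanup_runs++;
}

/* Formats n into a heap buffer that is released on every return path. */
static int formatted_length(int n) {
    char *buf __attribute__((cleanup(free_charp))) = malloc(32);
    if (!buf)
        return -1;          /* free_charp still runs; free(NULL) is a no-op */
    return snprintf(buf, 32, "value=%d", n);
}                           /* free_charp(&buf) runs here automatically */
```

The return value is computed before the cleanup runs, so it is safe to return something derived from the buffer. Note that this only automates the free; the memory is still on the heap, not in the caller's stack frame.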


How to pass a string as a macro?

Many functions in the C libraries take macros (named constants) as input.
I wonder, if I have an array of strings, with contents of macros, like so:
const char *s[] = {"SIGINT", "SIGKILL", "SIGSTOP"};
How can I pass these strings as macros? (Like so:)
signal(s[0], do_something);
where do_something is a function pointer.
(and yes, technically I can pass ints in this case, but... hypothetically, ya know?)
EDIT:
As @RemyLebeau and SGeorgiades point out, SIGINT, SIGKILL, etc. are macros that expand to integer constants, and therefore can be stored in an int array, like so:
int s[3] = {SIGINT, SIGKILL, SIGSTOP};
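A small sketch of the int-array approach, assuming a POSIX system (SIGKILL and SIGSTOP are defined by POSIX, not ISO C). The names do_something, sigs and install_all are illustrative. One thing worth knowing about this particular array: SIGKILL and SIGSTOP cannot be caught, so signal() returns SIG_ERR for them and only the SIGINT install succeeds.

```c
#include <signal.h>
#include <stddef.h>

/* Placeholder handler; a real one should only do async-signal-safe work. */
static void do_something(int signo) { (void)signo; }

/* Signal numbers stored as plain ints, like `s` in the EDIT above. */
static const int sigs[] = { SIGINT, SIGKILL, SIGSTOP };

/* Tries to install do_something for every entry; returns how many succeeded. */
static int install_all(void) {
    int ok = 0;
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (signal(sigs[i], do_something) != SIG_ERR)
            ok++;
    return ok;
}
```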
Although SGeorgiades and Remy Lebeau already gave you the answer, here is something that I've used in the past to allow conversion and pretty printing of signal numbers and names:
#include <stdio.h>
#include <signal.h>
#include <string.h>
struct sigfun {
int signo;
const char *signame;
};
#define SIGFUN(_sig) \
{ \
.signo = _sig, \
.signame = #_sig \
}
struct sigfun siglist[] = {
SIGFUN(SIGINT),
SIGFUN(SIGKILL),
SIGFUN(SIGSTOP),
// ...
{ .signo = 0, .signame = NULL }
};
#define SIGFORALL(_sig) \
_sig = siglist; _sig->signame != NULL; ++_sig
int
signame_to_signo(const char *signame)
{
struct sigfun *sig;
for (SIGFORALL(sig)) {
if (strcmp(sig->signame,signame) == 0)
break;
}
return sig->signo;
}
const char *
signo_to_signame(int signo)
{
struct sigfun *sig;
for (SIGFORALL(sig)) {
if (signo == sig->signo)
break;
}
return sig->signame;
}
UPDATE:
"Why not put the for into SIGFORALL?" (comment by tstanisl)
For a few reasons ...
I've done that before (e.g.):
#define SIGFORALL(_sig) \
for (_sig = siglist; _sig->signame != 0; ++_sig)
SIGFORALL(sig) {
// do stuff
}
This tends to confuse certain IDEs and/or tools that parse the code without running it through the preprocessor.
It's also more difficult for programmers to quickly (without digesting the macro) skip over it.
They don't see a for and have trouble figuring out what SIGFORALL(sig) { does.
Is the macro a wrapper for if, for, or while?
With:
#define SIGFORALL(_sig) \
_sig = siglist; _sig->signame != 0; ++_sig
for (SIGFORALL(sig)) {
// do stuff
}
there is a better chance they can continue around the construct because they can understand (i.e. skip over) the for (...) [syntactically] without having to know what the macro is doing. That is, nobody has to "drill down" into the macro unless they wish to.
Another reason is that without the for in the macro, we can add extra code to the for loop's initialization and iteration expressions. It's more flexible.
For example, I've used a similar macro for linked list traversal and wanted to know the index/count of an element:
#define LLFORALL(_node) \
_node = nodelist; _node != NULL; _node = _node->next
int idx;
for (idx = 0, LLFORALL(node), ++idx) {
if (node->value == 5)
printf("found value at index %d\n",idx);
}
There's no absolute rule about this. Ultimately, it's a [personal] style preference.
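A self-contained version of the linked-list example, assuming a minimal struct llnode and a global nodelist head (both hypothetical, since the answer doesn't show them). With the macro expanded, the loop header becomes `for (idx = 0, node = nodelist; node != NULL; node = node->next, ++idx)`, which is why extra expressions can be spliced in with commas.

```c
#include <stddef.h>

/* Hypothetical node type and list head used by the macro. */
struct llnode {
    int value;
    struct llnode *next;
};
static struct llnode *nodelist;

#define LLFORALL(_node) \
    _node = nodelist; _node != NULL; _node = _node->next

/* Returns the index of the first node holding value, or -1 if absent. */
static int find_index(int value) {
    struct llnode *node;
    int idx;
    for (idx = 0, LLFORALL(node), ++idx) {
        if (node->value == value)
            return idx;
    }
    return -1;
}
```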
Perhaps what you want instead is:
int s[3] = { SIGINT, SIGKILL, SIGSTOP };
signal(s[0], do_something);

GTK interface structure: Why is it built as casting interface?

GTK 3 provides functions that make casting necessary, as in
gtk_grid_attach(GTK_GRID(grid), button, 0,0,0,0)
However, this method will always take a grid. So why wouldn't the cast be inside the function? The call would become:
gtk_grid_attach(grid, button, 0,0,0,0)
This would be much shorter and easier to read, because there is no redundancy.
edit:
Following the discussion in the comments, I'll try to make my question clearer using the example from David Ranieri below.
Given the code:
typedef struct {char *sound;} animal_t;
typedef struct {animal_t animal; char *name;} dog_t;
typedef struct {animal_t animal;} cat_t;
#define DOG(animal) ((dog_t *)(animal))
#define CAT(animal) ((cat_t *)(animal))
There are two ways to implement the function dog_print_name without losing the type-check mechanism. In the examples below I hand over a cat where a dog is expected.
(1)
void dog_print_name(dog_t *dog)
{
puts(dog->name);
}
int main(){
// ...
dog_print_name(DOG(cat)); // perform type check here
// and fail on 'cat'
}
(2)
void dog_print_name(void *dog)
{
dog_t *dog_ = DOG(dog); // check performed
// will fail, if 'dog' is a cat.
puts(dog_->name);
}
int main(){
// ...
dog_print_name(cat); // no check performed here
}
These two pieces of code being given, what is the reason one would choose implementation (1) over implementation (2)?
The goal is to get a warning / error on a call where dog is expected, but cat is given.
It's needed because C doesn't handle polymorphism: it can't check whether an argument's type is valid based on the class hierarchy. So the macro triggers a check for that, and also makes sure that even if you stored your pointer in the right pointer type, the content it points to (a GObject-derived object) has the right type.
Here, grid has the right type, but not the right content. Making an explicit check makes debugging much easier.
GtkGrid *grid = gtk_image_new();
GtkButton *button = gtk_button_new();
gtk_grid_attach(grid, button, 0,0,0,0); // Won't catch early the fact that grid is a GtkImage, not a GtkGrid.
Also, most of the time you just declare everything as a pointer to a GtkWidget. This is because, depending on which methods you call, the class they come from may be different, so you'll need to cast anyway.
GtkWidget *grid = gtk_grid_new();
GtkWidget *button = gtk_button_new();
gtk_grid_attach(GTK_GRID(grid), button, 0,0,0,0); // WILL check at runtime that grid is a GtkGrid.
gtk_grid_remove_row(GTK_GRID(grid), 0);
gtk_container_add(GTK_CONTAINER(grid), button);
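To make the runtime check concrete, here is a minimal sketch of a checked cast in plain C, loosely modelled on what the GTK_GRID()-style macros do. It assumes a hypothetical type tag stored in the base struct; GObject instead uses registered GType values and reports a failed cast as a warning. Unlike GTK, this version returns NULL on a bad cast, and dog_cast and the TYPE_* tags are illustrative names.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical type tags standing in for GType. */
typedef enum { TYPE_DOG, TYPE_CAT } animal_type;

typedef struct { animal_type type; const char *sound; } animal_t;
typedef struct { animal_t animal; const char *name; } dog_t;
typedef struct { animal_t animal; int age; } cat_t;

/* Checked downcast: warns and returns NULL if a is not really a dog. */
static dog_t *dog_cast(animal_t *a, const char *caller) {
    if (a == NULL || a->type != TYPE_DOG) {
        fprintf(stderr, "%s: invalid cast to dog_t\n", caller);
        return NULL;
    }
    return (dog_t *)a;   /* valid: animal_t is the first member of dog_t */
}

/* Accepts a pointer to any struct whose first member is animal_t. */
#define DOG(a) dog_cast((animal_t *)(a), __func__)
```

Passing a cat through DOG() is caught at the call site with a message naming the calling function, which is the early failure the answer describes for `GTK_GRID(grid)`.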
Just to expand a bit on the nice answer by @liberfforce:
The only type allowed to receive a pointer to a different type without a cast is void *, so you need the cast because some functions (like gtk_grid_attach) expect to receive the derived object (GtkGrid), not the base (GtkWidget).
A little example that illustrates how inheritance works in gtk:
#include <stdio.h>
#include <stdlib.h>
typedef struct {char *sound;} animal_t;
typedef struct {animal_t animal; char *name;} dog_t;
typedef struct {animal_t animal; int age;} cat_t;
#define DOG(animal) ((dog_t *)(animal))
#define CAT(animal) ((cat_t *)(animal))
static animal_t *new_dog(char *name)
{
dog_t *dog = malloc(sizeof *dog);
dog->animal.sound = "worf";
dog->name = name;
return &dog->animal;
}
static animal_t *new_cat(int age)
{
cat_t *cat = malloc(sizeof *cat);
cat->animal.sound = "meow";
cat->age = age;
return &cat->animal;
}
void animal_print_sound(animal_t *animal)
{
puts(animal->sound);
}
void dog_print_name(dog_t *dog)
{
puts(dog->name);
}
void cat_print_age(cat_t *cat)
{
printf("%d\n", cat->age);
}
int main(void)
{
animal_t *dog = new_dog("Bobbie");
animal_t *cat = new_cat(5);
animal_print_sound(dog);
animal_print_sound(cat);
dog_print_name(DOG(dog));
cat_print_age(CAT(cat));
return 0;
}
So why wouldn't you implement:
void dog_print_name(animal_t *dog) { puts(DOG(dog)->name); }
void cat_print_age(animal_t *cat) { printf("%d\n", CAT(cat)->age); }
?
From a user's perspective, dog_print_name(dog); looks more understandable to me.
Because that way the compiler cannot check the type for you: think about what would happen if you called dog_print_name(cat); with your approach.

How to implement a 'Pop' function that returns the "popped" element (i.e the data/value) ? (linked list stacks)

I'm confused about how to implement a single function that pops the element and returns it at the same time.
So far, all I've seen are pop functions that return a pointer to the new head of the stack.
Here's a start, but...
#define VALUE int
typedef struct node_t {
VALUE item;
struct node_t *next;
} node;
.
.
.
// Function
VALUE pop(node *stack_head) {
// Used to store the node we will delete
node *deleteNode = stack_head;
// Error Checking // <<====== (btw, is this actually necessary ?)
if (!deleteNode || !stack_head) {
if (!stack_head) fprintf(stderr, "\nPop failed. --> ...\n");
if (!deleteNode) fprintf(stderr, "\nPop Failed. --> ...\n");
return 0;
}
// Storing the value in a variable
VALUE popped_item = stack_head->item;
// Updating the head
stack_head = stack_head->next; // <<====== THERE'S A PROBLEM HERE! (I think)
// Freeing/Deleting the 'popped' node
free(deleteNode);
// Return 'popped' value
return popped_item;
}
.
.
.
stack_head = stack_head->next;
Doesn't actually change the address that the pointer stack_head (i.e the head of the stack) points to... and so the value is indeed returned for the first pop but subsequent pops return errors.
Yes, because it is a local variable, but then how would you change the actual pointer (the one that points to the head of the stack) so that it points to the new head?
The parameter stack_head is local to the function pop, so when you modify it the result is not visible outside of the function.
You need to pass the address of the variable you want to modify, then in the function you dereference the pointer parameter to change what it points to.
So change your function to this:
VALUE pop(node **stack_head) {
node *deleteNode = *stack_head;
if (!*stack_head) {
fprintf(stderr, "\nPop failed. --> ...\n");
return 0;
}
VALUE popped_item = (*stack_head)->item;
*stack_head = (*stack_head)->next;
free(deleteNode);
return popped_item;
}
And call it like this:
node *stack_head = NULL;
// do something to push onto the stack
VALUE v = pop(&stack_head);
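Putting the pieces together, here is a minimal runnable version of the double-pointer approach. The push function is an assumption (the question doesn't show one), and VALUE is a typedef here instead of a #define.

```c
#include <stdio.h>
#include <stdlib.h>

typedef int VALUE;

typedef struct node_t {
    VALUE item;
    struct node_t *next;
} node;

/* Allocates a node and makes it the new head. Returns 0, or -1 if malloc fails. */
static int push(node **stack_head, VALUE v) {
    node *n = malloc(sizeof *n);
    if (!n)
        return -1;
    n->item = v;
    n->next = *stack_head;
    *stack_head = n;
    return 0;
}

/* Removes the head, updates the caller's pointer, and returns the popped item. */
static VALUE pop(node **stack_head) {
    node *deleteNode = *stack_head;
    if (!deleteNode) {
        fprintf(stderr, "Pop failed: stack is empty\n");
        return 0;
    }
    VALUE popped_item = deleteNode->item;
    *stack_head = deleteNode->next;
    free(deleteNode);
    return popped_item;
}
```

Because 0 is both a valid item and the error value, a variant such as `int pop(node **head, VALUE *out)` that returns a status and writes the item through `out` may be preferable.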
Okay, this will be a pretty long digest, but hopefully worth it. You can see a testcase of the code I've presented as my conclusion here and obtain a modular version of the code here. My suggestion would be that you use a structure like this:
struct {
size_t top;
T value[];
}
The reason you probably shouldn't use classical linked lists for this (or anything, really) is covered by this video courtesy of Bjarne Stroustrup. The basis of the problem is that the majority of your overhead is in allocation and cache misses which don't occur so much when you keep everything in one allocation.
If I were to write this for convenient use, perhaps:
#define stack_of(T) struct { size_t top; T value[]; }
This should allow you to declare empty stacks fairly sensibly, like:
int main(void) {
stack_of(int) *fubar = NULL;
}
This is similar enough to templates in other languages to work fairly well, and it's not a hideous abuse of the preprocessor. I'm sure I've written a push_back function somewhere that we can adapt into this version of push, which I've linked to externally as it's not important for the conclusion of this answer (bear with me here; we'll come back to that momentarily)...
So now we have stack_of(T) and push(list, value) which we can use like:
int main(void) {
stack_of(int) *fubar = NULL;
push(fubar, 42);
push(fubar, -1);
}
The simplest solution for pop might be something like:
#define pop(list) (assert((list) && (list)->top), (list)->value[--(list)->top])
... but this does suffer a drawback we'll discuss later. For now we have as a testcase:
int main(void) {
stack_of(int) *fubar = NULL;
int x;
push(fubar, 42);
push(fubar, -1);
x = pop(fubar); printf("popped: %d\n", x);
x = pop(fubar); printf("popped: %d\n", x);
x = pop(fubar); printf("popped: %d\n", x);
}
... and as you'll see when debugging, the assert fails at run time, telling us we've popped more than we've pushed, which is probably a good thing to have. Still, this doesn't actually reduce the size of the stack. To do that we need something more like push again, except that we get rid of these lines:
list->top = y; \
list->value[x] = v; \
So there's an opportunity for refactoring. Thus I bring you operate():
#define operate(list, ...) { \
size_t x = list ? list->top : 0 \
, y = x + 1; \
if ((x & y) == 0) { \
void *temp = realloc(list, sizeof *list \
+ (x + y) * sizeof list->value[0]); \
if (!temp) \
return EXIT_FAILURE; \
list = temp; \
} \
__VA_ARGS__; \
}
Now we can redefine push in terms of operate:
#define push(list, v) operate(list, list->value[x] = v; list->top = y)
... and pop looks kind of like it did before, but with an invocation of operate on the end to cause list to shrink (from quadruple its size, for example when you've popped 3 elements off of a list of 4) to no larger than double its size.
#define pop(list) (assert(list && list->top), list->value[--list->top]); \
operate(list, )
Summing it all up, you can see a testcase of the code I've presented here and obtain a modular version of the code here...
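For readers who prefer functions to macros, the same growth rule can be written for a single element type. This sketch is specialised to int, and int_stack and its functions are hypothetical names. As in operate, the resize happens exactly when (x & y) == 0, i.e. when x + 1 is a power of two, and the new size is x + y slots.

```c
#include <assert.h>
#include <stdlib.h>

/* int-only counterpart of stack_of(int): top plus a flexible array member. */
typedef struct {
    size_t top;
    int value[];
} int_stack;

/* Mirrors operate(): resize to x + y slots when (x & y) == 0.
   Returns 0 on success, -1 if realloc fails (*list is then unchanged). */
static int int_stack_reserve(int_stack **list, size_t x) {
    size_t y = x + 1;
    if ((x & y) == 0) {
        int_stack *temp = realloc(*list, sizeof **list + (x + y) * sizeof (*list)->value[0]);
        if (!temp)
            return -1;
        *list = temp;
    }
    return 0;
}

/* Appends v, allocating the stack on first use. Returns 0 or -1. */
static int int_stack_push(int_stack **list, int v) {
    size_t x = *list ? (*list)->top : 0;
    if (int_stack_reserve(list, x) != 0)
        return -1;
    (*list)->value[x] = v;
    (*list)->top = x + 1;
    return 0;
}

/* Removes and returns the top element, then shrinks like pop + operate. */
static int int_stack_pop(int_stack **list) {
    assert(*list && (*list)->top);
    int v = (*list)->value[--(*list)->top];
    (void)int_stack_reserve(list, (*list)->top); /* a failed shrink keeps the old block */
    return v;
}
```

Being a function, int_stack_pop can be used in any expression, which the two-statement pop macro cannot; the price is losing the macro's genericity over T.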

Modern C++ pattern for ugly C struct allocation

I'm making an ioctl call from C++ into a driver I don't own/maintain, and I'm trying to sort out if there's a clean, "safe-ish" mechanism to deal with some of the ugly struct allocation required.
A slimmed-down version of some of the structures involved:
// IOCTL expects an instance of this structure "first"
typedef struct {
int param1;
int param2;
} s_ioctl_request;
//... followed by an instance of this. If attr_length
// is > sizeof(s_attr_header), more data is allowed to follow.
typedef struct {
uint32_t attr_length;
uint32_t attr_type;
} s_attr_header;
// Example that uses more data than just the header.
typedef struct {
s_attr_header hdr;
uint32_t attr_param;
} s_attr_type1;
// Another example.
typedef struct {
s_attr_header hdr;
uint32_t attr_param1;
uint32_t attr_param2;
} s_attr_type2;
The ioctl requires that s_ioctl_request be immediately followed by an s_attr_header, or other struct containing it, where attr_length is set to the size of the outer struct in bytes.
In C, to write a wrapper for the ioctl it would be done via something along these lines:
int do_ugly_ioctl(int fd, int p1, int p2, s_attr_header * attr)
{
int res;
// Allocate enough memory for both structures.
s_ioctl_request *req = malloc( sizeof(*req) + attr->attr_length );
if (!req)
return -1;
// Copy the 2nd, (variable length) structure after the first.
memcpy( ((char*)req) + sizeof(*req), attr, attr->attr_length);
// Modify params as necessary
req->param1 = p1;
req->param2 = p2;
// Make the driver call, free mem, and return result.
res = ioctl(fd, SOME_IOCTL_ID, req);
free(req);
return res;
}
// Example invocation.
s_attr_type1 a1;
a1.hdr.attr_length = sizeof(a1);
a1.hdr.attr_type = 1;
do_ugly_ioctl(fd, 10, 20, &a1.hdr);
A couple of options I'm considering:
1. Throw modern C++-isms out the window, and do exactly what I've shown above.
2. Allocate the storage with a std::vector, then do ugly casts with the resulting std::vector::data() pointer, so at least I'm not doing new[]/delete[] or malloc/free.
3. Create a unique wrapper method for each s_attr_type* that uses its own "special" struct. This seems "safest", i.e. least likely for the user of the wrapper method to screw it up. And bonus points: it allows pass-by-reference.
Method #3 example:
int do_ugly_ioctl(int fd, int param1, int param2, s_attr_type2& attr){
struct RequestData {
s_ioctl_request ioreq;
s_attr_type2 attr;
};
RequestData r;
r.ioreq.param1 = param1;
r.ioreq.param2 = param2;
r.attr = attr;
r.attr.hdr.attr_length = sizeof(attr); // Might as well enforce this here.
return ioctl(fd, SOME_IOCTL_ID, (void*) &r);
}
So I guess some questions here are:
Is it "worth it" to C++-ize a solution to this problem? (as opposed to relying on the more error-prone C impl).
If I go with method #3 or similar, is there anything that I can do with <type_traits> to make a template of this function and only accept structs with an s_attr_header as the first member?
Any other brilliant ideas?
Totally worth it, and your solution is quite nice.
You might want to declare your structures as packed (there are compiler extensions to achieve this) to avoid having extra padding when combining multiple structures.
You can also set the size of the structure within the constructor.
struct RequestData
{
RequestData() : ioreq{}, attr{}
{
attr.hdr.attr_length = sizeof(attr);
}
s_ioctl_request ioreq;
s_attr_type2 attr;
};
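Instead of (or in addition to) packing, the layout assumptions can be checked at compile time. Below is a sketch in C11 using offsetof and _Static_assert, with the structures copied from the question; in C++ the same checks can be written with static_assert, since these are standard-layout types. The build fails if the compiler inserts padding between the request and the attribute, or if hdr is not the first member.

```c
#include <stddef.h>
#include <stdint.h>

/* Structures as in the question. */
typedef struct { int param1; int param2; } s_ioctl_request;
typedef struct { uint32_t attr_length; uint32_t attr_type; } s_attr_header;
typedef struct {
    s_attr_header hdr;
    uint32_t attr_param1;
    uint32_t attr_param2;
} s_attr_type2;

/* The combined request from method #3. */
struct RequestData {
    s_ioctl_request ioreq;
    s_attr_type2 attr;
};

/* The attribute must start immediately after the request... */
_Static_assert(offsetof(struct RequestData, attr) == sizeof(s_ioctl_request),
               "unexpected padding between ioreq and attr");
/* ...and the header must be the first member of the attribute struct. */
_Static_assert(offsetof(s_attr_type2, hdr) == 0,
               "hdr must be the first member");
```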
Concerning your second question, you could split the assignment in two. It's not very elegant, but it's easy, and if you pass something without a correct header, it will lead to a compiler error:
template<typename Attr>
int do_ugly_ioctl(int fd, int param1, int param2, Attr& attr){
struct RequestData {
s_ioctl_request ioreq;
Attr attr;
};
RequestData r;
r.ioreq.param1 = param1;
r.ioreq.param2 = param2;
s_attr_header hdr = attr.hdr; // this will lead to a compilation error if the type is not what we expect
(void) hdr;
r.attr = attr;
r.attr.hdr.attr_length = sizeof(attr); // Might as well enforce this here.
return ioctl(fd, SOME_IOCTL_ID, (void*) &r);
}

Timing behavior of null pointers in C

The code below was mostly tested with Microsoft CL version 17.00.50727.1 on Windows 7, but I see something similar with g++. I'm quite sure the logic of the function is correct; the question is only about the timing.
Essentially I have a function that dynamically returns new "blocks" of data as needed. If it runs out of space in its "page", it makes a new page.
The purpose of the blocks is to match incoming data keys. If the key is found, that's great. If not, then a new data key is added to the block. If the block runs out of space, then a new block is created and linked to the old one.
The code works whether or not the block-making function explicitly sets the "next" pointer in the new block to NULL. The theory is that calloc() has set the content to 0 already.
The first odd thing is that the block-making function takes about 5 times(!) longer to run when that "next" pointer is explicitly set to NULL. However, when that is done, the timing of the overall example behaves as expected: it takes linearly longer to match a new key, the more entries there are in the key list. The only difference occurs when a key is added which causes a new block to be fetched. The overhead of doing that is similar to the time taken to call the block-making function.
The only problem with this is that the block-making function is unacceptably slow then.
When the pointer is NOT explicitly set to NULL, then the block-making function becomes nice and fast -- maybe half to a quarter of the key-matching function, instead of as long or even longer.
But then the key-matching function starts to exhibit odd timing behavior. It does mostly increase linearly with the number of keys. It still has jumps at 16 and 32 keys (since the list length is 16). But it also has a large jump at key number 0, and it has large jumps at keys number 17, 33 etc.
These are the key numbers when the program first has to look at the "next" pointer. Apparently it takes a long time to figure out that the 0 value from calloc is really a NULL pointer? Once it knows this, the next times are faster.
The second weird thing is that the effect goes away if the data struct consists exclusively of the key. Now the jumps at 0, 17, 33 etc. go away whether or not the "next" pointer is explicitly set to NULL. But when "int unused[4]" is also in the struct, then the effect returns.
Maybe the compiler (with option /O2 or with -O3 for g++) optimizes away the struct when it consists of a single number? But I still don't see why that would affect the timing behavior in this way.
I've tried to simplify the example as much as I could from the real code, but I'm sorry that it's still quite long. It's not that complicated, though.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#ifdef _WIN32
#include <windows.h>
#endif
void timer_start(int n);
void timer_end(int n);
void print_times();
// There are pages of blocks, and data entries per block.
// We don't know ahead of time how many there will be.
// GetNextBlock() returns new blocks, and if necessary
// makes new pages. MatchOrStore() goes through data in
// a block to try to match the key. It won't ever match
// in this example, so the function makes a new data entry.
struct dataType
{
// Surprise number 1: If the line with "unused" is
// commented out, things behave as expected, even if
// Surprise number 2 is in effect.
int unused[4];
int key;
};
#define DATA_PER_BLOCK 16
struct blockType
{
char nextEntryNo;
struct dataType list[DATA_PER_BLOCK];
struct blockType * next;
};
struct pageType
{
int nextBlockNo;
struct blockType * list;
struct pageType * next;
struct pageType * prev;
};
struct blockType * GetNextBlock();
void MatchOrStore(
struct dataType * dp,
struct blockType * bp);
struct pageType * pagep;
int main(int argc, char * argv[])
{
pagep = (struct pageType *) 0;
struct dataType data;
for (int j = 0; j < 50000; j++)
{
struct blockType * blockp = GetNextBlock();
// Make different keys each time.
for (data.key = 0; data.key < 40; data.key++)
{
// One timer per key number, useful for statistics.
timer_start(data.key);
MatchOrStore(&data, blockp);
timer_end(data.key);
}
}
print_times();
exit(0);
}
#define BLOCKS_PER_PAGE 5000
struct blockType * GetNextBlock()
{
if (pagep == NULL ||
pagep->nextBlockNo == BLOCKS_PER_PAGE)
{
// If this runs out of page space, it makes some more.
struct pageType * newpagep = (struct pageType *)
calloc(1, sizeof(struct pageType));
newpagep->list = (struct blockType *)
calloc(BLOCKS_PER_PAGE, sizeof(struct blockType));
// I never actually free this, but you get the idea.
newpagep->nextBlockNo = 0;
newpagep->next = NULL;
newpagep->prev = pagep;
if (pagep)
pagep->next = newpagep;
pagep = newpagep;
}
struct blockType * bp = &pagep->list[ pagep->nextBlockNo++ ];
// Surprise number 2: If this line is active, then the
// timing behaves as expected. If it is commented out,
// then presumably calloc still sets next to NULL.
// But the timing changes in an unexpected way.
// bp->next = (struct blockType *) 0;
return bp;
}
void MatchOrStore(
struct dataType * dp,
struct blockType * blockp)
{
struct blockType * bp = blockp;
while (1)
{
for (int i = 0; i < bp->nextEntryNo; i++)
{
// This will spend some time traversing the list of
// blocks, failing to find the key, because that's
// the way I've set up the data for this example.
if (bp->list[i].key != dp->key) continue;
// It will never match.
return;
}
if (! bp->next) break;
bp = bp->next;
}
if (bp->nextEntryNo == DATA_PER_BLOCK)
{
// Once in a while it will run out of space, so it
// will make a new block and add it to the list.
timer_start(99);
struct blockType * bptemp = GetNextBlock();
bp->next = bptemp;
bp = bptemp;
timer_end(99);
}
// Since it didn't find the key, it will store the key
// in the list here.
bp->list[ bp->nextEntryNo++ ].key = dp->key;
}
#define NUM_TIMERS 100
#ifdef _WIN32
#include <time.h>
LARGE_INTEGER
tu0[NUM_TIMERS],
tu1[NUM_TIMERS];
#else
#include <sys/time.h>
struct timeval
tu0[NUM_TIMERS],
tu1[NUM_TIMERS];
#endif
int ctu[NUM_TIMERS],
number[NUM_TIMERS];
void timer_start(int no)
{
number[no]++;
#ifdef _WIN32
QueryPerformanceCounter(&tu0[no]);
#else
gettimeofday(&tu0[no], NULL);
#endif
}
void timer_end(int no)
{
#ifdef _WIN32
QueryPerformanceCounter(&tu1[no]);
ctu[no] += (tu1[no].QuadPart - tu0[no].QuadPart);
#else
gettimeofday(&tu1[no], NULL);
ctu[no] += 1000000 * (tu1[no].tv_sec - tu0[no].tv_sec )
+ (tu1[no].tv_usec - tu0[no].tv_usec);
#endif
}
void print_times()
{
printf("%5s %10s %10s %8s\n",
"n", "Number", "User ticks", "Avg");
for (int n = 0; n < NUM_TIMERS; n++)
{
if (number[n] == 0)
continue;
printf("%5d %10d %10d %8.2f\n",
n,
number[n],
ctu[n],
ctu[n] / (double) number[n]);
}
}
