Cheapest structure for dereferencing? - c

Let's say I have an associative array keyed by unsigned int; values could be of any fixed-size type. There is some pre-defined maximum number of instances.
API usage example: MyStruct * valuePtr = get(1234); and put(6789, &myStructInstance); ...basic.
I want to minimise cache misses as I read entries rapidly and at random from this array, so I pre-malloc(sizeof(MyType) * MAX_ENTRIES) to ensure locality of reference as far as possible.
Genericity is important for the values array. I've looked at C pseudo-generics, but I prefer void * for simplicity; however, I'm not sure whether this is at odds with my performance goals. Ultimately, I would like to know what is best for performance.
How should I implement my associative array for performance? Thoughts thus far...
Do I pass the associative array a single void * pointer to the malloced values array and let it use that internally (in which case we would need to guarantee a matching keys array size)? Can I do this generically, given that the type needs(?) to be known in order to index into the values array?
Do I keep a separate void * valuePtrs[] within the associative array, and have these pointers point to each element of the malloced values array? This would seem to avoid the need to know about the concrete type?
Do I use C pseudo-generics and thus allow get() to return a specific value type? Surely in this case the only benefit is not having to cast explicitly, e.g. MyStruct* value = (MyStruct*) get(...); the array element still has to be dereferenced and so carries the same overhead?
And, in general, does the above approach to minimising cache misses appear to make sense?

In both cases, the performance is basically the same.
With the first one (the void * implementation), you need to look up the pointer and then dereference it, so that's two steps.
With the other implementation, you need to multiply the index by the size of the value and then dereference the result, so that is also two steps.
However, the first implementation is easier and cleaner to implement. It also keeps the array fully generic: the array itself does not need to know what kind of structures it stores.
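For illustration, the two lookups being compared might look roughly like this (hypothetical helper names; the offset variant assumes the element size was given to the map up front):

#include <stddef.h>

/* Lookup via an array of void * pointers: index the pointer array, then
 * dereference the stored pointer. */
void *get_via_pointers (void **value_ptrs, unsigned int slot)
{
    return value_ptrs[slot];               /* one load for the pointer; the caller loads the value */
}

/* Lookup via one contiguous block of values: scale the index by the element
 * size, then hand back the resulting address. */
void *get_via_offset (void *values, size_t value_size, unsigned int slot)
{
    return (char *) values + (size_t) slot * value_size;   /* pointer arithmetic, then one load */
}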

See solutions categorised below in terms of pros and cons (thanks to Ruben for assisting my thinking). I've implemented Options 2 and 5 for my use case, which is somewhat generalised; I recommend Option 4 if you need a very specific, one-off data structure (a sketch of it follows the list below). Option 3 is the most flexible while being trivial to code, but it is also the slowest. Option 4 is the quickest. Option 5 is a little slower, but offers flexibility in the array size and ease of general use.
Option 1 - associative array struct points to array of typed pointers:
pros: no failure value required, explicit casts not required, does not need compile-time size of array
cons: costly double deref, requires generic library code
Option 2 - associative array struct holds array of void * pointers:
pros: no failure value required, no generic library code
cons: costly double deref, explicit casts following get(), needs compile-time size of array if VLAs are not used
Option 3 - associative array struct points to array of void * values:
pros: no generic library code, does not need compile-time size of array
cons: costly triple deref, explicit casts following get(), requires an offset calculation, which requires sizeof the value to be passed in explicitly
Option 4 - associative array struct holds array of typed values:
pros: cheap single deref, explicit casts not required, keys and entries allocated contiguously
cons: requires generic library code, failure value must be supplied, needs compile-time size of array if VLAs are not used
Option 5 - associative array struct points to array of typed values:
pros: explicit casts not required, flexible array size
cons: costly double deref, requires generic library code, failure value must be supplied, needs compile-time size of array if VLAs are not used
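For reference, a minimal pseudo-generic sketch of Option 4 might look something like this (the macro name is illustrative, and keys are used directly as slot indices for brevity; a real implementation would map arbitrary unsigned int keys onto slots):

#define DEFINE_DENSE_MAP(NAME, TYPE, MAX, FAIL)                              \
    typedef struct {                                                         \
        unsigned char used[MAX];   /* occupancy flags                  */    \
        TYPE values[MAX];          /* values stored contiguously       */    \
    } NAME;                                                                  \
                                                                             \
    static TYPE NAME##_get (const NAME *m, unsigned int key)                 \
    {                                                                        \
        if (key < (MAX) && m->used[key])                                     \
            return m->values[key];      /* single deref into the struct */   \
        return (FAIL);                  /* failure value must be supplied */ \
    }                                                                        \
                                                                             \
    static void NAME##_put (NAME *m, unsigned int key, const TYPE *v)        \
    {                                                                        \
        if (key < (MAX)) {                                                   \
            m->used[key] = 1;                                                \
            m->values[key] = *v;                                             \
        }                                                                    \
    }

/* Usage (MyStruct is the caller's value type):
 *   DEFINE_DENSE_MAP(MyStructMap, MyStruct, 1024, (MyStruct){0})
 *   MyStructMap map = {0};
 *   MyStructMap_put(&map, 42, &someInstance);
 *   MyStruct v = MyStructMap_get(&map, 42);
 */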

Related

Dynamic array guarantee clarification

In Skiena's Algorithm Design Manual, he mentions at one point:
The primary thing lost using dynamic arrays is the guarantee that each array access takes constant time in the worst case. Now all the queries will be fast, except for those relatively few queries triggering array doubling. What we get instead is a promise that the nth array access will be completed quickly enough that the total effort expended so far will still be O(n).
I'm struggling to understand this. How will an array query expand the array?
Dynamic arrays are arrays whose size does not need to be specified up front (think of an ArrayList in Java). Under the hood, a dynamic array is implemented using a regular array, and because it is a regular array, the implementation (e.g. of ArrayList) must fix the size of that underlying array.
The typical way to handle this is to initialise the underlying array with a certain capacity; when it reaches its maximum number of elements, the array is doubled in size.
Because of this underlying behaviour, adding to a dynamic array usually takes constant time, but occasionally the 'under the hood' array has to be doubled in size, which takes longer than a normal add.
If your confusion lies with his use of the word 'query', I believe he means 'adding to or removing from the array', because a simple 'get' query is unaffected by the size of the underlying array.
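A minimal C sketch of that doubling strategy (names illustrative):

#include <stdlib.h>

typedef struct {
    int    *items;
    size_t  count;     /* elements currently stored */
    size_t  capacity;  /* size of the backing array */
} DynArray;

/* Appending is O(1) most of the time; when the backing array is full it is
 * reallocated at twice the size, which costs O(n) for that one call but keeps
 * the total work for n appends at O(n) overall (amortized constant time). */
int dynarray_push (DynArray *a, int value)
{
    if (a->count == a->capacity) {
        size_t new_cap = a->capacity ? a->capacity * 2 : 1;
        int *grown = realloc (a->items, new_cap * sizeof *grown);
        if (!grown)
            return -1;           /* out of memory */
        a->items    = grown;
        a->capacity = new_cap;
    }
    a->items[a->count++] = value;
    return 0;
}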

Fundamental limitations of cell arrays, arrays of structs, and scalar structs?

I've been using Matlab on and off for decades. I thought I had a good grip on arrays, structs, cell arrays, tables, an array of structs, and a struct in which each field is an array. For the latter two, I assumed that each field needed to be of uniform type. I'm finding that no such limitation exists:
Perhaps Matlab is becoming more flexible over the years (I'm using 2015b), but it does undermine my confidence in choosing the best type of variable for a task if my understanding of the limitations of each type turns out to be wrong. For the purpose of this question, I can't really articulate the needs of the task, because the manner in which I break a large to-do down into tasks depends on my understanding of the data types at my disposal and their advantages/limitations.
I can (and have) read online documentation ad nauseam, and while it walks you through code to illustrate what the data types are able to do, I haven't yet come across a succinct description of the comparative limitations between cell arrays, arrays of structs, and structs whose fields are themselves arrays -- to the point that I can use that knowledge to choose the best structure in a given situation. Basic things I do find, e.g. that the same field names occur in each struct of a struct array (though, as the above example shows, each field of each struct can contain highly heterogeneous data types and/or array sizes).
THE QUESTION
Can anyone point to such a comparison of limitations between cell arrays, arrays of structs, and scalar structs whose fields are themselves arrays? I'm looking for a treatment at a level that informs a coder in deciding on the best trade-off between (i) speed, (ii) memory, and (iii) readability, maintainability, and evolvability.
I've deliberately left out tables because, although I'm enamoured of their convenient access to, and subsetting of, data sets (and presentation thereof), they have proved rather slow for manipulation of data. They have their uses, and I use them liberally, but I'm not interested in them for the purpose of this comparison, which is under-the-hood algorithm coding.
I think your question eventually narrows down to these three "types" of data structures:
comparative limitations between cell arrays, arrays of structs, and structs whose fields are themselves arrays
[Note that "structs whose fields are themselves arrays" I translate as "scalar structs" here. An array of structs can also contain arbitrary arrays. My thinking becomes clear below, I hope.]
To me, these are not very different. All three are containers for heterogeneous data. (Heterogeneous data is non-uniform data, each data element is potentially of a different type and size.) Each of these statements can return an array of any type, unrelated to the type of any other array in the container:
cell array: array{i,j}
struct array: array(i,j).value
scalar struct: array.value
So it all depends on how you want to index:
array(i,j).value
      ^    ^
      A    B
If you want to index using A only, use a cell array (though you then need curly braces, of course). If you want to index using B only, use a scalar struct. If you want both A and B, use a struct array.
There is no difference in cost that I'm aware of. Each of the arrays contained in these containers takes up some space. The spatial overhead of the various containers is similar, and I have never noted a time overhead difference.
However, there is a huge difference between these two:
array(i).value % s1
array.value(i) % s2
I think that the question deals with this difference also. s1 has a lot more spatial overhead than s2:
>> s1=struct('value',num2cell(1:100))
s1 =
1×100 struct array with fields:
value
>> s2=struct('value',1:100)
s2 =
struct with fields:
value: [1×100 double]
>> whos
  Name      Size      Bytes  Class     Attributes

  s1        1x100     12064  struct
  s2        1x1         976  struct
The data needs 800 bytes, so s2 has 176 bytes of overhead, whereas s1 has 11264 (1408%)!
The reason is not the container, but the fact that we're storing one array with 100 elements in one, and 100 arrays with one element in the other. Each array has a header of a certain size that MATLAB uses to know what type of array it is, what sizes it has, to manage its storage and the delayed copy mechanism. The fewer arrays one has, the less memory one uses.
So, don't use a heterogeneous container to store scalars! These things only make sense when you need to store larger arrays, or arrays of different type or size.
The heterogeneous container that is not explicitly asked about (and after the edit is explicitly excluded) is the table. A table is similar to a scalar struct in that each column of the table is a single array, and different columns can have different types. Note that it is possible to use a cell array as a column, allowing heterogeneous elements to be stored within a column, but tables make most sense when this is not the case.
One difference with a scalar struct is that each column must have the same number of rows. Another difference is that indexing can look like that of a cell array, a scalar struct, or a struct array.
Thus, the table forces some constraints upon the contained data, which is very beneficial in some circumstances.
However, as the OP noted, working with tables is slower than working with structs. This is because table is a custom class, not a native type like structs and cell arrays. If you type edit table in MATLAB, you'll see its source code and how it's implemented: it's a classdef file, just like something any of us could write. Consequently, it has the same speed limitations: the JIT is not optimized for it, indexing into a table implies running a function written as an M-file, etc.
One more thing: Don't create cell arrays of structs, or scalar structs with cell arrays. This increases the levels of containers, which increases overhead (both in space and time), and makes the contents more difficult to use. I have seen questions here on SO related to difficulty accessing data, caused by this type of construct:
data{i,j}.value % A cell array with structs. Don't do this!
data.value{i,j} % A struct with cell arrays. Don't do this!
The first example is equivalent to a struct array (with a lot more overhead), except that there is no control over the struct fields within each cell: it is possible for one of the cells not to have a .value field.
The second example makes sense only if value is a different size than a second struct field. If all struct fields are (supposed to be) cell arrays of the same size like this, then use a struct array. Again, less overhead and more uniformity.

Why have arrays in Go?

I understand the difference between arrays and slices in Go. But what I don't understand is why it is helpful to have arrays at all. Why is it helpful that an array type definition specifies a length and an element type? Why can't every "array" that we use be a slice?
There is more to arrays than just the fixed length: they are comparable, and they are values (not reference or pointer types).
There are countless advantages of arrays over slices in certain situations, all of which together more than justify the existence of arrays (along with slices). Let's see them. (I'm not even counting arrays being the building blocks of slices.)
1. Being comparable means you can use arrays as keys in maps, but not slices. You might now ask: why not make slices comparable then, so that this alone wouldn't justify the existence of both? Because equality is not well defined on slices. FAQ: Why don't maps allow slices as keys?
They don't implement equality because equality is not well defined on such types; there are multiple considerations involving shallow vs. deep comparison, pointer vs. value comparison, how to deal with recursive types, and so on.
2. Arrays can also give you higher compile-time safety, as the index bounds can be checked at compile time (array length must evaluate to a non-negative constant representable by a value of type int):
s := make([]int, 3)
s[3] = 3 // "Only" a runtime panic: runtime error: index out of range
a := [3]int{}
a[3] = 3 // Compile-time error: invalid array index 3 (out of bounds for 3-element array)
3. Passing around or assigning array values implicitly makes a copy of the entire array, so the copy is "detached" from the original value. If you pass a slice, a copy is still made, but only of the slice header; the copied header still points to the same backing array. This may or may not be what you want. If you want to "detach" a slice from the "original" one, you have to copy the contents explicitly, e.g. with the builtin copy() function, into a new slice.
a := [2]int{1, 2}
b := a
b[0] = 10 // This only affects b, a will remain {1, 2}
sa := []int{1, 2}
sb := sa
sb[0] = 10 // Affects both sb and sa
4. Also since the array length is part of the array type, arrays with different length are distinct types. On one hand this may be a "pain in the ass" (e.g. you write a function which takes a parameter of type [4]int, you can't use that function to take and process an array of type [5]int), but this may also be an advantage: this may be used to explicitly specify the length of the array that is expected. E.g. you want to write a function which takes an IPv4 address, it can be modeled with the type [4]byte. Now you have a compile-time guarantee that the value passed to your function will have exactly 4 bytes, no more and no less (which would be an invalid IPv4 address anyway).
5. Related to the previous point, the array length may also serve a documentation purpose. A type [4]byte properly documents that an IPv4 address has 4 bytes. An rgb variable of type [3]byte tells you there is one byte for each color component. In some cases the length is even taken out and documented separately; for example, in the crypto/md5 package, md5.Sum() returns a value of type [Size]byte where md5.Size is a constant equal to 16: the length of an MD5 checksum.
6. They are also very useful when planning memory layout of struct types, see JimB's answer here, and this answer in greater detail and real-life example.
7. Also, since slices are headers and are (almost) always passed around as-is (without pointers), the language spec is more restrictive regarding pointers to slices than pointers to arrays. For example, the spec provides several shorthands for operating on pointers to arrays, while the same constructs give compile-time errors for pointers to slices (because it's rare to use pointers to slices; if you still want or have to do so, you must handle it explicitly; read more in this answer).
Such examples are:
Slicing a pointer to an array p: p[low:high] is shorthand for (*p)[low:high]. If p is a pointer to a slice, this is a compile-time error (spec: Slice expressions).
Indexing a pointer to an array p: p[i] is shorthand for (*p)[i]. If p is a pointer to a slice, this is a compile-time error (spec: Index expressions).
Example:
pa := &[2]int{1, 2}
fmt.Println(pa[1:1]) // OK
fmt.Println(pa[1]) // OK
ps := &[]int{3, 4}
println(ps[1:1]) // Error: cannot slice ps (type *[]int)
println(ps[1]) // Error: invalid operation: ps[1] (type *[]int does not support indexing)
8. Accessing (single) array elements is more efficient than accessing slice elements; as in case of slices the runtime has to go through an implicit pointer dereference. Also "the expressions len(s) and cap(s) are constants if the type of s is an array or pointer to an array".
It may be surprising, but you can even write:
type IP [4]byte
const x = len(IP{}) // x will be 4
It's valid, and is evaluated at compile time, even though IP{} is not a constant expression, so e.g. const i = IP{} would be a compile-time error! After this, it's not even surprising that the following also works:
const x2 = len((*IP)(nil)) // x2 will also be 4
Note: When ranging over a complete array vs a complete slice, there may be no performance difference at all as obviously it may be optimized so that the pointer in the slice header is only dereferenced once. For details / example, see Array vs Slice: accessing speed.
See related questions where an array can be used / makes more sense than a slice:
Why use arrays instead of slices?
Why can't Go slice be used as keys in Go maps pretty much the same way arrays can be used as keys?
Hash with key as an array type
How do I check the equality of three values elegantly?
Slicing a slice pointer passed as argument
And this is just for curiosity: a slice can contain itself while an array can't. (Actually this property makes comparison easier as you don't have to deal with recursive data structures).
Must-read blogs:
Go Slices: usage and internals
Arrays, slices (and strings): The mechanics of 'append'
Arrays are values, and it is often useful to have a value instead of a pointer.
Values can be compared, hence you can use arrays as map keys.
Values are always initialized, so you don't need to initialize or make them like you do with a slice.
Arrays give you better control of memory layout: whereas you can't allocate space directly in a struct with a slice, you can with an array:
type Foo struct {
buf [64]byte
}
Here, a Foo value will contain 64 bytes of storage directly, rather than a slice header which would need to be initialized separately. Arrays are also used to pad structs to match alignment when interoperating with C code and to prevent false sharing for better cache performance.
Another aspect for improved performance is that you can better define memory layout than with slices, because data locality can have a very big impact on memory intensive calculations. Dereferencing a pointer can take considerable time compared to the operations being performed on the data, and copying values smaller than a cache line incurs very little cost, so performance critical code often uses arrays for that reason alone.
Arrays are more efficient in saving space. If you never update the size of the slice (i.e. start with a predefined size and never go past it) there really is not much of a performance difference. But there is extra overhead in space, as a slice is simply a wrapper containing the array at its core. Contextually, it also improves clarity as it makes the intended use of the variable more apparent.
Every array could be a slice but not every slice could be an array. If you have a fixed collection size you can get a minor performance improvement from using an array. At the very least you'll save the space occupied by the slice header.

thread-safe cache for sparse, lazy, immutable arrays

I have an application that involves a collection of arrays which can be very large (indices up to the maximum value of an int), but which are lazy - their contents are calculated on the fly and are not actually known until requested. The arrays are also immutable - the value of each element of each array is constant throughout the life of the program. The arrays are sparse in the sense that often only a small subset of all array elements are ever requested (the arrays do not contain large blocks of zeros and are not "sparse" in that sense.)
Looking up (and possibly calculating in the process) an array element can be expensive, so I want to add a caching layer. The cache should implement the following interface:
void point_cache_store (gpointer data, gsize idx, gdouble value);
gdouble point_cache_fetch (gpointer data, gsize idx);
where data serves as a unique handle for each array (there can be many of these). point_cache_fetch() should return the value argument passed to point_cache_store() with the same data and idx arguments, or indicate a cache miss by returning the special value DATUM_UNKNOWN_VALUE (the caller will never call point_cache_store with DATUM_UNKNOWN_VALUE).
The question is: how can I implement point_cache_fetch() and point_cache_store()? (They are currently no-op stubs.)
Points to consider:
The cache implementation must be thread-safe. Several threads are running simultaneously and any of these can call point_cache_store() or point_cache_fetch() with any data or idx arguments.
The cache truly is a cache; it's always OK for point_cache_fetch() to return DATUM_UNKNOWN_VALUE, even if it once knew that value. The caller will just perform an ordinary lookup in that case.
Remember, the arrays are immutable - for given data and idx arguments, the caller will always provide the same value argument.
I realize that there are many ways to do this and that there are tradeoffs involved. For this question, though, I am going to evaluate answers by one very specific criterion: whether they improve performance in one particular benchmark in the application that inspired the question. If you want to go the extra mile and run the benchmark yourself, here is how to do it:
git clone git://github.com/gbenison/starparse
git clone git://github.com/gbenison/burrow-owl.git -b point-cache-base
The functions point_cache_fetch() and point_cache_store() are found in "burrow/spectrum/point_cache.c". The relevant benchmark is "benchmarks/b_cache".
A "very large sparse lazy array"? Sounds like you need a hash table.
From your point_cache_fetch function prototype and all through your question, I am confused about whether your cached values are doubles or immutable arrays.
I'm not going to provide an implementation, as this is not a 'coding challenge' website. You should try to find and reuse existing libraries of threadsafe hashtables and compare their performance for your specific needs.
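For what it's worth, a lossy, mutex-guarded, direct-mapped table along the lines suggested above could be sketched like this (assumes GLib >= 2.32 for statically allocated GMutex, and that DATUM_UNKNOWN_VALUE is defined by the project's headers; the table size and hash are illustrative):

#include <glib.h>

#define POINT_CACHE_SIZE  (1u << 16)   /* tune to taste */

typedef struct {
    gpointer data;    /* array handle  */
    gsize    idx;     /* element index */
    gdouble  value;
    gboolean valid;
} PointCacheSlot;

static PointCacheSlot slots[POINT_CACHE_SIZE];
static GMutex         cache_lock;

static gsize
slot_for (gpointer data, gsize idx)
{
    /* Cheap hash mixing the array handle and the index. */
    return (GPOINTER_TO_SIZE (data) * 2654435761u + idx) % POINT_CACHE_SIZE;
}

void
point_cache_store (gpointer data, gsize idx, gdouble value)
{
    gsize s = slot_for (data, idx);
    g_mutex_lock (&cache_lock);
    slots[s].data  = data;     /* overwrite whatever was there: it's only a cache */
    slots[s].idx   = idx;
    slots[s].value = value;
    slots[s].valid = TRUE;
    g_mutex_unlock (&cache_lock);
}

gdouble
point_cache_fetch (gpointer data, gsize idx)
{
    gsize   s      = slot_for (data, idx);
    gdouble result = DATUM_UNKNOWN_VALUE;
    g_mutex_lock (&cache_lock);
    if (slots[s].valid && slots[s].data == data && slots[s].idx == idx)
        result = slots[s].value;
    g_mutex_unlock (&cache_lock);
    return result;
}

A single global lock is the simplest way to satisfy the thread-safety requirement; if it turns out to be a bottleneck in the benchmark, striping the lock over ranges of slots (or using per-slot atomics) is the obvious next step.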

C - How to implement Set data structure?

Is there any tricky way to implement a set data structure (a collection of unique values) in C? All elements in a set will be of the same type, and there is a huge amount of RAM available.
As I know, for integers it can be done really fast'N'easy using value-indexed arrays. But I'd like to have a very general Set data type. And it would be nice if a set could include itself.
There are multiple ways of implementing set (and map) functionality, for example:
tree-based approach (ordered traversal)
hash-based approach (unordered traversal)
Since you mentioned value-indexed arrays, let's try the hash-based approach which builds naturally on top of the value-indexed array technique.
Beware of the advantages and disadvantages of hash-based vs. tree-based approaches.
You can design a hash-set (a special case of hash-tables) of pointers to hashable PODs, with chaining, internally represented as a fixed-size array of buckets of hashables, where:
all hashables in a bucket have the same hash value
a bucket can be implemented as a dynamic array or linked list of hashables
a hashable's hash value is used to index into the array of buckets (hash-value-indexed array)
one or more of the hashables contained in the hash-set could be (a pointer to) another hash-set, or even to the hash-set itself (i.e. self-inclusion is possible)
With large amounts of memory at your disposal, you can size your array of buckets generously and, in combination with a good hash method, drastically reduce the probability of collision, achieving virtually constant-time performance.
You would have to implement:
the hash function for the type being hashed
an equality function for the type being used to test whether two hashables are equal or not
the hash-set contains/insert/remove functionality.
You can also use open addressing as an alternative to maintaining and managing buckets.
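A bare-bones sketch of such a chained hash-set over void * elements, with caller-supplied hash and equality functions (all names illustrative):

#include <stdlib.h>

#define NBUCKETS 1024   /* sized generously to keep chains short */

typedef struct node {
    void        *elem;
    struct node *next;
} Node;

typedef struct {
    Node *buckets[NBUCKETS];
    size_t (*hash)  (const void *);
    int    (*equal) (const void *, const void *);
} HashSet;

int hashset_contains (const HashSet *s, const void *elem)
{
    for (Node *n = s->buckets[s->hash (elem) % NBUCKETS]; n; n = n->next)
        if (s->equal (n->elem, elem))
            return 1;
    return 0;
}

int hashset_insert (HashSet *s, void *elem)
{
    if (hashset_contains (s, elem))
        return 0;                        /* already present: sets hold unique values */
    size_t b = s->hash (elem) % NBUCKETS;
    Node *n = malloc (sizeof *n);
    if (!n)
        return -1;
    n->elem = elem;
    n->next = s->buckets[b];
    s->buckets[b] = n;                   /* push onto the bucket's chain */
    return 1;
}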
Sets are usually implemented as some variety of a binary tree. Red black trees have good worst case performance.
These can also be used to build a map to allow key / value lookups.
This approach requires some sort of ordering on the elements of the set and the key values in a map.
I'm not sure how you would manage a set that could possibly contain itself using binary trees if you limit set membership to well defined types in C ... comparison between such constructs could be problematic. You could do it easily enough in C++, though.
The way to get genericity in C is by void *, so you're going to be using pointers anyway, and pointers to different objects are unique. This means you need a hash map or binary tree containing pointers, and this will work for all data objects.
The downside of this is that you can't enter rvalues independently. You can't have a set containing the value 5; you have to assign 5 to a variable, which means it won't match a random 5. You could enter it as (void *) 5, and for practical purposes this is likely to work with small integers, but if your integers can get into large enough sizes to compete with pointers this has a very small probability of failing.
Nor does this work with string values. Given char a[] = "Hello, World!"; char b[] = "Hello, World!";, a set of pointers would find a and b to be different. You would probably want to hash the values, but if you're concerned about hash collisions you should save the string in the set and do a strncmp() to compare the stored string with the probing string.
(There's similar problems with floating-point numbers, but trying to represent floating-point numbers in sets is a bad idea in the first place.)
Therefore, you'd probably want a tagged value, one tag for any sort of object, one for integer value, and one for string value, and possibly more for different sorts of values. It's complicated, but doable.
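Such a tagged value could be as simple as this (type and field names illustrative):

#include <string.h>

typedef enum { TAG_POINTER, TAG_INT, TAG_STRING } Tag;

typedef struct {
    Tag tag;
    union {
        void       *ptr;   /* identity compared by address    */
        long        i;     /* integers compared by value      */
        const char *str;   /* strings compared with strcmp()  */
    } as;
} SetValue;

int setvalue_equal (const SetValue *a, const SetValue *b)
{
    if (a->tag != b->tag)
        return 0;
    switch (a->tag) {
    case TAG_POINTER: return a->as.ptr == b->as.ptr;
    case TAG_INT:     return a->as.i == b->as.i;
    case TAG_STRING:  return strcmp (a->as.str, b->as.str) == 0;
    }
    return 0;
}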
If the maximum number of elements in the set (the cardinality of the underlying data type) is small enough, you might want to consider using a plain old array of bits (or whatever you call them in your favourite language).
Then you have a simple set membership check: bit n is 1 if element n is in the set. You could even count 'ordinary' members from 1, and only make bit 0 equal to 1 if the set contains itself.
This approach will probably require some other data structure (or function) to translate from the member data type to a position in the bit array (and back), but it makes basic set operations (union, intersection, membership test, difference, insertion, removal, complement) very, very easy. It is only suitable for relatively small sets, though; you wouldn't want to use it for sets of 32-bit integers, I don't suppose.
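A sketch of such a bit-array set for members 0..N-1 (names illustrative; mapping member values to bit positions is left to the caller):

#include <limits.h>
#include <stddef.h>

#define N      1024                               /* cardinality of the member type */
#define NBYTES ((N + CHAR_BIT - 1) / CHAR_BIT)

typedef struct { unsigned char bits[NBYTES]; } BitSet;

static void bitset_add      (BitSet *s, unsigned n) { s->bits[n / CHAR_BIT] |=  (1u << (n % CHAR_BIT)); }
static void bitset_remove   (BitSet *s, unsigned n) { s->bits[n / CHAR_BIT] &= ~(1u << (n % CHAR_BIT)); }
static int  bitset_contains (const BitSet *s, unsigned n) { return (s->bits[n / CHAR_BIT] >> (n % CHAR_BIT)) & 1; }

/* Union and intersection are just bitwise OR / AND over the bytes. */
static void bitset_union (BitSet *out, const BitSet *a, const BitSet *b)
{
    for (size_t i = 0; i < NBYTES; i++)
        out->bits[i] = a->bits[i] | b->bits[i];
}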

Resources