I was trying to write a library for linear algebra operations in Haskell. In order to be able to define safe operations for matrices and vectors I wanted to encode their dimensions in their types. After some research I found that using DataKinds one is able to do that, similar to the way it's done here. For example:
data Vector (n :: Nat) a
dot :: Num a => Vector n a -> Vector n a -> a
In the aforementioned article, as well as in some libraries, the size of the vectors is a phantom type and the vector type itself is a wrapper around an Array. While trying to figure out whether the standard library has an array type with its size at the type level, I started wondering about the underlying representation of arrays. From what I could gather from this commentary on GHC memory layout, arrays need to store their size on the heap, so a 3-dimensional vector would take up one more word than necessary. Of course we could use the following definition:
data Vector3 a = Vector3 a a a
which might be fine if we only care about 3D geometry, but it doesn't allow for vectors of arbitrary size, and it also makes indexing awkward.
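For concreteness, here is a minimal sketch of the phantom-type approach I have in mind (the names are mine, not from any particular library): the length is a phantom Nat, and the payload is an ordinary boxed vector, which still stores its length at runtime:

{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}
import           GHC.TypeLits (Nat, KnownNat, natVal)
import           Data.Proxy   (Proxy (..))
import qualified Data.Vector  as V

-- The length n lives only in the type; the runtime representation is V.Vector.
newtype Vector (n :: Nat) a = Vector (V.Vector a)

-- Safe construction: the element count comes from the type, not from the caller.
replicateV :: forall n a. KnownNat n => a -> Vector n a
replicateV x = Vector (V.replicate (fromIntegral (natVal (Proxy :: Proxy n))) x)

-- The dot product from above: both arguments are forced to share the same n.
dot :: Num a => Vector n a -> Vector n a -> a
dot (Vector u) (Vector v) = V.sum (V.zipWith (*) u v)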
So, my question is this. Wouldn't it be useful, and a potential memory optimization, to have an array type with statically known size in the standard library? As far as I understand, the only thing it would need is a different info table, which would store the size instead of it being stored at each heap object. Also, the compiler could choose between Array and SmallArray automatically.
Wouldn't it be useful and a potential memory optimization to have an array type with statically known size in the standard library?
Sure. I suspect if you wrote up your use case carefully and implemented this, GHC HQ would accept a patch. You might want to do the writeup first and double-check that they're into it to avoid wasting time on a patch they won't accept, though; I certainly don't speak for them.
Also, the compiler could choose between Array and SmallArray automatically.
I'm not an expert here, but I kinda doubt this. Usually supporting polymorphism means you need a uniform representation.
Related
Is it possible in modern Fortran to use a vector to index a multidimensional array? That is, given, say,
integer, dimension(3) :: index = [4,6,9]
double precision, dimension(10,10,10) :: data
is there a better (more general) way to access data(4,6,9) than writing data(index(1), index(2), index(3))? It would be good not to have to hard-code the rank of the data array.
(Naively I would like to write data(index) but of course this actually means something different - subset "gathering" - requiring data to be a rank-one array itself.)
For what it's worth this is essentially the same question as multidimensional index by array of indices in JavaScript, but in Fortran instead. Unfortunately the clever answers there won't work with predefined array ranks.
No. And all the workarounds I can think of are ghastly hacks; you're better off writing a function that takes data and index as arguments and spits out the element(s) you want.
You might, however, be able to use modern Fortran's capabilities for array rank remapping to do exactly the opposite, which might satisfy your wish to play fast-and-loose with array ranks.
Given the declaration
double precision, dimension(1000), target :: data
you can define a rank-3 pointer
double precision, pointer :: index_3d(:,:,:)
and then set it like this:
index_3d(1:10,1:10,1:10) => data
and hey presto, you can now use both rank-3 and rank-1 indices into data, which is close to what you want to do. I've not used this in anger yet, but a couple of simple tests haven't revealed any serious problems.
I'm writing a matrix library (part of SciRuby) with multiple storage types ('stypes') and multiple data types ('dtypes'). For example, a matrix's stype may currently be dense, yale (AKA 'csr'), or list-of-lists; and its dtype may be int8, int16, int32, int64, float32, float64, complex64, etc.
It's super easy to write a template processor in Ruby or sed which takes a basic function (like sparse matrix multiplication) and creates a custom version for each possible dtype. I could even write such a template to handle two different dtypes, say if we wanted to multiply an int32 by a float64.
The same can be done in certain cases for different stypes. Eventually, though, you could end up with a very large set of functions, many of which never even get used in the course of most people's use.
It's also easy to use function pointer arrays to enable access to these functions -- and imagining even a 3-dimensional function pointer array is not too hard:
MultFuncs[lhs->stype][lhs->dtype][rhs->dtype](lhs->shape[0], rhs->shape[1], lhs->data, rhs->data, result->data);
// This might point to some function like this:
// i32_f64_dense_mult(size_t, size_t, int32_t*, float64*, float64*);
The extreme alternative to function pointer arrays, of course, which would be incredibly complicated to code and maintain, is hierarchical switch or if/else statements:
switch(lhs->stype) {
  case STYPE_SPARSE:
    switch(lhs->dtype) {
      case DTYPE_INT32:
        switch(rhs->dtype) {
          case DTYPE_FLOAT64:
            i32_f64_mult(lhs->shape[0], rhs->shape[1], lhs->ija, rhs->ija, lhs->a, rhs->a, result->data);
            break;
          // ... and so on ...
It also seems that this would be O(s·d²), where s = number of stypes and d = number of dtypes, for every operation, whereas the function pointer array would be O(r), where r = number of dimensions in the array.
But there's also a third option.
The third option is to use function pointer arrays for common operations (e.g., copying from one unknown type to another):
SetFuncs[lhs->dtype][rhs->dtype](5,    // copy five consecutive items
    &to,                               // destination
    dtype_sizeof[lhs->dtype],          // dtype_sizeof is a const size_t array giving sizeof(int8_t), sizeof(int16_t), etc.
    &from,                             // source
    dtype_sizeof[rhs->dtype]);
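For reference, the table itself could be declared along these lines (just a sketch; set_func_t and NUM_DTYPES are names invented here, not the actual library's):

#include <stddef.h>   /* size_t */

/* One entry per (destination dtype, source dtype) pair, all sharing a uniform signature. */
typedef void (*set_func_t)(size_t count,
                           void *to, size_t to_elem_size,
                           const void *from, size_t from_elem_size);

extern const set_func_t SetFuncs[NUM_DTYPES][NUM_DTYPES];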
And then to call that from a generic sparse matrix multiplication function, which might be declared like this:
void generic_sparse_multiply(size_t* ija, size_t* ijb, void* a, void* b, int dtype_a, int dtype_b);
And that would use SetFuncs[dtype_a][dtype_b] to reference the correct assignment function, for example. The downside, then, is that you might have to implement a whole bunch of these -- IncrementFuncs, DecrementFuncs, MultFuncs, AddFuncs, SubFuncs, etc. -- because you'd never know what types to expect.
So, finally, my questions:
What is the cost, if any, of having enormous multi-dimensional const arrays of function pointers? Large library or executable? Slow load time? etc.
Does use of generics like IncrementFuncs, SetFuncs, etc. (which all probably depend on memcpy or typecasts) present barriers to compile-time optimization?
If one were to use switch statements as described above, would these be optimized out by modern compilers? Or would they be evaluated every single time?
I realize this is an incredibly complicated array of questions.
If you can simply refer me to a good resource and prefer not to answer directly, that's perfectly fine. I used the Google extensively before posting this, but wasn't quite sure what search terms to use.
First of all, try to reduce the complexity of the function(s). You should be able to have a declaration like
Result_t function (Param_t*);
where Param_t is a struct containing all those things you pass around. To use generic types, include an enum in the struct telling which type is used, and a void* pointing to data of that type.
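A minimal sketch of what that might look like (the type and field names below are made up for illustration):

#include <stddef.h>   /* size_t */

typedef enum { DTYPE_INT32, DTYPE_FLOAT64 /* ... */ } dtype_t;

typedef struct {
    dtype_t dtype;   /* which concrete element type `data` points to */
    size_t  rows;
    size_t  cols;
    void   *data;    /* element storage, interpreted according to dtype */
} Param_t;

typedef struct {
    int   status;
    void *value;
} Result_t;

Result_t function(Param_t *param);   /* one uniform signature for every operation */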
1. What is the cost, if any, of having enormous multi-dimensional const arrays of function pointers? Large library or executable? Slow load time? etc.
Definitely a larger executable. Load time depends on what system the code is for. If it is for a RAM-based system (a PC, etc.), then the load time might increase, but it shouldn't have any major impact on performance. Though of course it depends on how large "enormous" is :)
2. Does use of generics like IncrementFuncs, SetFuncs, etc. (which all probably depend on memcpy or typecasts) present barriers to compile-time optimization?
Probably not; there's only so much the compiler can optimize here anyway. When dealing with generic data types in C, it often boils down to memcpy() in the end, which in itself is hopefully implemented to be as fast as copying gets.
3. If one were to use switch statements as described above, would these be optimized out by modern compilers? Or would they be evaluated every single time?
Ironically, the compiler would probably optimize it into something like an array of function pointers. However, the compiler likely cannot predict the nature of the data, especially if it is set at runtime.
I want to tackle some image-processing problems in Haskell. I'm working with both bitonal (bitmap) and color images with millions of pixels. I have a number of questions:
On what basis should I choose between Vector.Unboxed and UArray? They are both unboxed arrays, but the Vector abstraction seems heavily advertised, particularly around loop fusion. Is Vector always better? If not, when should I use which representation?
For color images I will wish to store triples of 16-bit integers or triples of single-precision floating-point numbers. For this purpose, is either Vector or UArray easier to use? More performant?
For bitonal images I will need to store only 1 bit per pixel. Is there a predefined datatype that can help me here by packing multiple pixels into a word, or am I on my own?
Finally, my arrays are two-dimensional. I suppose I could deal with the extra indirection imposed by a representation as "array of arrays" (or vector of vectors), but I'd prefer an abstraction that has index-mapping support. Can anyone recommend anything from a standard library or from Hackage?
I am a functional programmer and have no need for mutation :-)
For multi-dimensional arrays, the current best option in Haskell, in my view, is repa.
Repa provides high-performance, regular, multi-dimensional, shape-polymorphic parallel arrays. All numeric data is stored unboxed. Functions written with the Repa combinators are automatically parallel, provided you supply +RTS -Nwhatever on the command line when running the program.
Recently, it has been used for some image processing problems:
Real time edge detection
Efficient Parallel Stencil Convolution in Haskell
I've started writing a tutorial on the use of repa, which is a good place to start if you already know Haskell arrays, or the vector library. The key stepping stone is the use of shape types instead of simple index types, to address multidimensional indices (and even stencils).
The repa-io package includes support for reading and writing .bmp image files, though support for more formats is needed.
Addressing your specific questions in turn, with discussion:
On what basis should I choose between Vector.Unboxed and UArray?
They have approximately the same underlying representation; the primary difference, however, is the breadth of the API for working with vectors: they have almost all the operations you'd normally associate with lists (backed by a fusion-driven optimization framework), while UArray has almost no API.
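For instance, with Data.Vector.Unboxed the usual list-style pipeline is available and fuses into a single loop (a tiny illustrative sketch):

import qualified Data.Vector.Unboxed as U

-- map and sum fuse, so no intermediate vector is allocated.
sumSquares :: Int -> Int
sumSquares n = U.sum (U.map (\x -> x * x) (U.enumFromTo 1 n))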
For color images I will wish to store triples of 16-bit integers or triples of single-precision floating-point numbers.
UArray has better support for multi-dimensional data, as it can use arbitrary data types for indexing. While this is possible in Vector (by writing an instance of UA for your element type), it isn't the primary goal of Vector -- instead, this is where Repa steps in, making it very easy to use custom data types stored in an efficient manner, thanks to the shape indexing.
In Repa, your triple of shorts would have the type:
Array DIM3 Word16
That is, a 3D array of Word16s.
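For illustration, a small sketch of building and indexing such an array (this uses the later repa 3 API, where the array type also carries a representation tag, hence Array U DIM3 Word16 rather than Array DIM3 Word16 as above):

import Data.Array.Repa
import Data.Word (Word16)

-- A 2x2 image with 3 channels per pixel, filled from a list.
img :: Array U DIM3 Word16
img = fromListUnboxed (Z :. 2 :. 2 :. 3) [0 .. 11]

-- Indexing uses the same shape type: row 1, column 0, channel 2.
channel :: Word16
channel = img ! (Z :. 1 :. 0 :. 2)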
For bitonal images I will need to store only 1 bit per pixel.
UArrays pack Bools as bits; Vector uses the instance for Bool, which does not do bit packing, instead using a representation based on Word8. However, it is easy to write a bit-packing implementation for vectors -- here is one, from the (obsolete) uvector library. Under the hood, Repa uses Vectors, so I think it inherits that library's representation choices.
Is there a predefined datatype that can help me here by packing multiple pixels into a word
You can use the existing instances for any of the libraries, for different word types, but you may need to write a few helpers using Data.Bits to roll and unroll packed data.
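Such helpers could look roughly like this (a minimal sketch with made-up names, packing eight pixels per Word8 with the most significant bit first; not taken from any of the libraries above):

import Data.Bits (setBit, testBit)
import Data.Word (Word8)

-- Pack up to eight Bools into one Word8.
packByte :: [Bool] -> Word8
packByte bs = foldl set 0 (zip [7,6..0] bs)
  where set w (i, b) = if b then setBit w i else w

-- Unpack a Word8 back into eight Bools.
unpackByte :: Word8 -> [Bool]
unpackByte w = [testBit w i | i <- [7,6..0]]

-- e.g. packByte [True,False,False,False,False,False,False,True] == 0x81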
Finally, my arrays are two-dimensional
UArray and Repa support efficient multi-dimensional arrays. Repa also has a rich interface for doing so. Vector on its own does not.
Notable mentions:
hmatrix, a custom array type with extensive bindings to linear algebra packages. Should be bound to use the vector or repa types.
ix-shapeable, getting more flexible indexing from regular arrays
chalkboard, Andy Gill's library for manipulating 2D images
codec-image-devil, read and write various image formats to UArray
I once reviewed the features of Haskell array libraries that matter to me and compiled a comparison table (spreadsheet only: direct link). So I'll try to answer.
On what basis should I choose between Vector.Unboxed and UArray? They are both unboxed arrays, but the Vector abstraction seems heavily advertised, particularly around loop fusion. Is Vector always better? If not, when should I use which representation?
UArray may be preferred over Vector if one needs two-dimensional or multi-dimensional arrays, but Vector has a nicer API for manipulating, well, vectors. In general, Vector is not well suited for simulating multi-dimensional arrays.
Vector.Unboxed cannot be used with parallel strategies. I suspect that UArray cannot be either, but at least it is very easy to switch from UArray to the boxed Array and see whether the parallelization benefits outweigh the boxing costs.
For color images I will wish to store triples of 16-bit integers or triples of single-precision floating-point numbers. For this purpose, is either Vector or UArray easier to use? More performant?
I tried using Arrays to represent images (though I needed only grayscale images). For color images I used the Codec-Image-DevIL library to read/write images (bindings to the DevIL library); for grayscale images I used the pgm library (pure Haskell).
My major problem with Array was that it provides only random-access storage: it doesn't offer many means of building Array algorithms, nor does it come with ready-to-use libraries of array routines (it doesn't interface with linear algebra libraries, and it doesn't let you express convolutions, FFTs and other transforms).
Almost every time a new Array has to be built from an existing one, an intermediate list of values has to be constructed (as in the matrix multiplication from the Gentle Introduction). The cost of array construction often outweighs the benefit of faster random access, to the point that a list-based representation is faster in some of my use cases.
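For reference, the Gentle Introduction's matrix multiplication goes roughly like this (simplified here to Int indices and without the bounds check); the association list passed to array is exactly the intermediate list I mean:

import Data.Array

matMult :: Num d => Array (Int, Int) d -> Array (Int, Int) d -> Array (Int, Int) d
matMult x y =
  array ((li, lj'), (ui, uj'))
        [ ((i, j), sum [ x ! (i, k) * y ! (k, j) | k <- range (lj, uj) ])
        | i <- range (li, ui), j <- range (lj', uj') ]
  where
    ((li, lj ), (ui, uj )) = bounds x
    ((_ , lj'), (_ , uj')) = bounds y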
STUArray could have helped me, but I didn't like fighting cryptic type errors and the effort necessary to write polymorphic code with STUArray.
So the problem with Arrays is that they are not well suited for numerical computations. Hmatrix' Data.Packed.Vector and Data.Packed.Matrix are better in this respect, because they come along with a solid matrix library (attention: GPL license). Performance-wise, on matrix multiplication, hmatrix was sufficiently fast (only slightly slower than Octave), but very memory-hungry (consumed several times more than Python/SciPy).
There is also the blas library for matrices, but it doesn't build on GHC 7.
I don't have much experience with Repa yet, and I don't understand repa code well. From what I can see, it has a very limited range of ready-to-use matrix and array algorithms written on top of it, but at least it is possible to express important algorithms by means of the library. For example, there are already routines for matrix multiplication and for convolution in repa-algorithms. Unfortunately, it seems that convolution is currently limited to 7×7 kernels (not enough for me, but it should suffice for many uses).
I didn't try the Haskell OpenCV bindings. They should be fast, because OpenCV is really fast, but I am not sure whether the bindings are complete and good enough to be usable. Also, OpenCV by its nature is very imperative, full of destructive updates, and I suppose it's hard to design a nice and efficient functional interface on top of it. If one goes the OpenCV way, one is likely to use the OpenCV image representation everywhere and use OpenCV routines to manipulate it.
For bitonal images I will need to store only 1 bit per pixel. Is there a predefined datatype that can help me here by packing multiple pixels into a word, or am I on my own?
As far as I know, unboxed arrays of Bools take care of packing and unpacking bit vectors. I remember looking at the implementation of arrays of Bools in other libraries, and I didn't see this elsewhere.
Finally, my arrays are two-dimensional. I suppose I could deal with the extra indirection imposed by a representation as "array of arrays" (or vector of vectors), but I'd prefer an abstraction that has index-mapping support. Can anyone recommend anything from a standard library or from Hackage?
Apart from Vector (and simple lists), all the other array libraries are capable of representing two-dimensional arrays or matrices. I suppose they avoid unnecessary indirection.
Although this doesn't exactly answer your question and isn't really even Haskell as such, I would recommend taking a look at the CV or CV-combinators libraries on Hackage. They bind the many rather useful image processing and vision operators from the OpenCV library and make working with machine vision problems much faster.
It would be rather great if someone figured out how repa or some such array library could be used directly with OpenCV.
Here is a new Haskell Image Processing library that can handle all of the tasks in question and much more. Currently it uses the Repa and Vector packages for its underlying representations, so it inherits fusion, parallel computation, mutation and most of the other goodies that come with those libraries. It provides an easy-to-use interface that is natural for image manipulation:
2D indexing and unboxed pixels with arbitrary precision (Double, Float, Word16, etc.)
all essential functions like map, fold, zipWith, traverse, ...
support for various color spaces: RGB, HSI, grayscale, bitonal, Complex, etc.
common image processing functionality:
Binary morphology
Convolution
Interpolation
Fourier transform
Histogram plotting
etc.
Ability to treat pixels and images as regular numbers.
Reading and writing common image formats through the JuicyPixels library
Most importantly, it is a pure Haskell library, so it does not depend on any external programs. It is also highly extensible: new color spaces and image representations can be introduced.
One thing it does not do is pack multiple binary pixels into a Word; instead it uses a Word per binary pixel. Maybe in the future...
I want to implement (what abstractly represents) a two-dimensional 4x4 matrix. All the code I write for matrix multiplication et cetera will be entirely "unrolled", as it were -- that is to say, I will not be using loops to access and write data entries in the matrix.
My question is: In C, would it be faster to use a struct as such:
typedef struct {
    double e0, e1, e2, e3, e4, ..., e15;
} My4x4Matrix;
Or would this be faster:
typedef double My4x4Matrix[16];
Given that I will be accessing each matrix element individually as such:
My4x4Matrix a,b,c;
// (Some initialization of a and b.)
...
c.e0=a.e0+b.e0;
c.e1=a.e1+b.e1;
...
Or
My4x4Matrix a,b,c;
// (Some initialization of a and b.)
...
c[0]=a[0]+b[0];
c[1]=a[1]+b[1];
...
Or are they exactly the same speed?
Any decent compiler will generate the exact same code, byte-for-byte. However, using arrays allows you a lot more flexibility; when accessing the matrix elements, you can choose whether you want to access fixed locations or address positions with variables.
I also highly question your choice to "unwind" (unroll?) all the operations by hand. Any good compiler can fully unroll loops with a constant number of iterations for you, and can perhaps even generate SIMD code and/or optimally schedule the order of instructions. You'll have a hard time doing better by hand, and you'll end up with code that's hideous to read. The fact that you asked this question suggests to me that you're probably not sufficiently experienced to do better than even a naive optimizing compiler.
Struct elements (fields) can only be accessed by their names explicitly specified in the program's source, which means that every time you access a field the actual field must be selected and hardcoded at compile time. If you wanted to implement the same thing with arrays, that would mean that you would use explicit constant compile-time array indices (as in your example). In this case the performance of the two will be exactly the same and the code generated will be exactly the same (excluding from consideration "malicious" compilers).
However, note that arrays provide you with an extra degree of freedom: if necessary, you can select array elements by a run-time index. This is something that's not possible with structs. Only you know whether it matters to you.
On the other hand, note also that arrays in C are not copyable by assignment, which means that you'll be forced to use memcpy to copy your array-based My4x4Matrix. With the struct-based version, normal language-level copying will work. With arrays, this issue can be worked around by wrapping the actual array in a struct.
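For illustration, a minimal sketch of that workaround (the names here are made up):

typedef struct {
    double e[16];   /* the array is a member, so plain assignment copies it */
} Mat4;

void example(void) {
    Mat4 a = {{0}}, b;
    b = a;                /* struct assignment copies all 16 doubles */
    double x = b.e[5];    /* elements can still be indexed, even with a run-time index */
    (void)x;              /* silence the unused-variable warning */
}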
I guess both are the same speed. The difference between a struct and an array is just its meaning (in human terms); both will be compiled down to accesses at fixed memory offsets.
I would say the best way is to create a test to try it yourself. Results may vary based on system environments and compilers.