haskell FFI passing array of C structs in and out

haskell FFI passing array of C structs in and out - c

I am trying to perform a series of transforms on graphical files using Haskell and Repa/DevIL. The starting example used was provided by the Haskell wiki page https://wiki.haskell.org/Numeric_Haskell:_A_Repa_Tutorial. I am an imperative programmer of 30 years experience with some erlang for good measure, trying to learn Haskell outside a classroom environment.
The problem is manipulating the data after the file load was first transformed into a Repa array:
import Data.Array.Repa.IO.DevIL (runIL,readImage,writeImage,Image(RGB),IL)
import qualified Data.Array.Repa as R
import Data.Vector.Unboxed as DVU
import Control.Monad
main :: IO ()
main = do
[f] <- getArgs
(RGB a) <- runIL $ Data.Array.Repa.IO.DevIL.readImage f
let
c = (computeP (R.traverse a id rgbTransform)) :: IL (Array U DIM3 Float)
which is successfully cast to type "Array F DIM3 Float" as output from the rgbTransform. From that point on it has been a nightmare to use the data. Flicking the array storage type between F(oreign) and U(nboxed) changes all following call's usability, plus the Repa-added monad layer IL forces use of liftM for nearly every equation following the 1st transform:
let -- continued
sh = liftM R.extent c -- IL DIM3
v = liftM R.toUnboxed c -- IL (Vector Float)
lv = liftM DVU.length v -- IL Int
f = liftM indexed v -- vector of tuples: (Int,a) where Int is idx
k = (Z :. 2) :. 2 :. 0 :: DIM3
These are the routines I can call without error. The IO monad's print command produces no output if placed in or after this 'let' list, due to the IL monad layer.
The game plan for the curious:
read the graphic file (done, via Repa)
resize image (not done, no resize in Repa, must be hand-coded)
transform and convert image from Word8 to Float (done)
get a Stablepointer to the transformed Float data (not done)
transform in-place the Float data as an array of C structs
of {Float a,b,c;}, by an external C routine via FFI (not completely
done). This is done hopefully without marshalling a new graphic
array by passing a pointer to the data
perform more passes over the transformed data to extract more info (partly done).
I am looking for help with issues 4 and 5.
4 -> The type system has been difficult to deal with while attempting to get C-usable memory pointers. Going thru the mountains of haskell library calls has not helped.
5 -> The external C routine is of type:
foreign import ccall unsafe "transform.h xform"
c_xform :: Ptr (CFloat,CFloat,CFloat) ->
CInt ->
IO ()
The Ptr is expected to point to an unboxed flat C array of rgb_t structs:
typedef struct
{
float r;
float g;
float b;
} rgb_t;
Available web-based FFI descriptions of how to deal with array pointers in FFI are non-existent if not downright obscure. The fairly straightforward idea of unfreezing and passing in a C array of floating-point RGB structs, modifying them in-place and then freezing the result is what I had in mind. The external transform is pure in the sense that the same input will produce predictable output, does not use threads, does not use global vars nor depend upon obscure libraries.

Foreign.Marshal.Array seems to provide a way to convert haskell data to C data and other way around.
I tested interfacing C code and haskell using the following files (Haskell + FFI for the first time for me)
hsc2hs rgb_ffi.hsc
ghc main.hs rgb_ffi.hs rgb.c
rgb.h
#ifndef RGB_H
#define RGB_H
#include <stdlib.h>
typedef struct {
float r;
float g;
float b;
} rgb_t;
void rgb_test(rgb_t * rgbs, ssize_t n);
#endif
rgb.h
#include <stdlib.h>
#include <stdio.h>
#include "rgb.h"
void rgb_test(rgb_t * rgbs, ssize_t n)
{
int i;
for(i=0; i<n; i++) {
printf("%.3f %.3f %.3f\n", rgbs[i].r, rgbs[i].g, rgbs[i].b);
rgbs[i].r *= 2.0;
rgbs[i].g *= 2.0;
rgbs[i].b *= 2.0;
}
}
rgb_ffi.hsc
{-# LANGUAGE ForeignFunctionInterface #-}
{-# LANGUAGE CPP #-}
module RGB where
import Foreign
import Foreign.C
import Control.Monad (ap)
#include "rgb.h"
data RGB = RGB {
r :: CFloat, g :: CFloat, b :: CFloat
} deriving Show
instance Storable RGB where
sizeOf _ = #{size rgb_t}
alignment _ = alignment (undefined :: CInt)
poke p rgb_t = do
#{poke rgb_t, r} p $ r rgb_t
#{poke rgb_t, g} p $ g rgb_t
#{poke rgb_t, b} p $ b rgb_t
peek p = return RGB
`ap` (#{peek rgb_t, r} p)
`ap` (#{peek rgb_t, g} p)
`ap` (#{peek rgb_t, b} p)
foreign import ccall "rgb.h rgb_test" crgbTest :: Ptr RGB -> CSize -> IO ();
rgbTest :: [RGB] -> IO [RGB]
rgbTest rgbs = withArray rgbs $ \ptr ->
do
crgbTest ptr (fromIntegral (length rgbs))
peekArray (length rgbs) ptr
rgbAlloc :: [RGB] -> IO (Ptr RGB)
rgbAlloc rgbs = newArray rgbs
rgbPeek :: Ptr RGB -> Int -> IO [RGB]
rgbPeek rgbs l = peekArray l rgbs
rgbTest2 :: Ptr RGB -> Int -> IO ()
rgbTest2 ptr l =
do
crgbTest ptr (fromIntegral l)
return ()
main.hs
module Main (main) where
import RGB
main =
do
let a = [RGB {r = 1.0, g = 1.0, b = 1.0},
RGB {r = 2.0, g = 2.0, b = 2.0},
RGB {r = 3.0, g = 3.0, b = 3.0}]
let l = length a
print a
-- b <- rgbTest a
-- print b
c <- rgbAlloc a
rgbTest2 c l
rgbTest2 c l
d <- rgbPeek c l
print d
return ()

Related

Haskell vs C speed: Sieve of Eratosthenes

Haskell one is implemented using optimized Data.IntSet with complexity O(lg n). However, there is a 15x (previously 30x) speed difference for n = 2000000 despite Haskell code is already optimized for even number cases. I would like to know whether/why my implementation in Haskell is imperfect.
Original Haskell
primesUpTo :: Int -> [Int]
primesUpTo n = 2 : put S.empty [3,5..n]
where put :: S.IntSet -> [Int] -> [Int]
put _ [] = []
put comps (x:xs) =
if S.member x comps
then put comps xs
else x : put (S.union comps multiples) xs
where multiples = S.fromList [x*2, x*3 .. n]
Update
fromDistinctAscList gives a 4x speed increase. 2-3-5-7-Wheel speeds up by another 50%.
primesUpTo :: Int -> [Int]
primesUpTo n = 2 : 3 : 5 : 7 : put S.empty (takeWhile (<=n) (spin wheel 11))
where put :: S.IntSet -> [Int] -> [Int]
put _ [] = []
put comps (x:xs) =
if S.member x comps
then put comps xs
else x : put (S.union comps multiples) xs
where multiples = S.fromDistinctAscList [x*x, x*(x+2) .. n]
spin (x:xs) n = n : spin xs (n + x)
wheel = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel
Benchmarking
All time are measured by *nix time command, real space
Haskell original : 2e6: N/A; 2e7: >30s
Haskell optimized: 2e6: 0.396s; 2e7: 6.273s
C++ Set (ordered): 2e6: 4.694s; 2e7: >30s
C++ Bool Array : 2e6: 0.039s; 2e7: 0.421s
Haskell optimized is slower than C++ Bool by 10~15x, and faster than C++ Set by 10x.
Source code
C Compiler options: g++ 5.3.1, g++ -std=c++11
Haskell options: ghc 7.8.4, ghc
C code (Bool array) http://pastebin.com/W0s7cSWi
prime[0] = prime[1] = false;
for (int i=2; i<=limit; i++) { //edited
if (!prime[i]) continue;
for (int j=2*i; j<=n; j+=i)
prime[j] = false;
}
C code (Set) http://pastebin.com/sNpghrU4
nonprime.insert(1);
for (int i=2; i<=limit; i++) { //edited
if (nonprime.count(i) > 0) continue;
for (int j=2*i; j<=n; j+=i)
nonprime.insert(j);
}
Haskell code http://pastebin.com/HuMqwvRW
Code as written above.

I would like to know whether/why my implementation in Haskell is imperfect.
Instead of fromList you better use fromDistinctAscList which performs linearly. You may also add only odd multiples starting with x*x not x*2, because all the smaller odd multiples have already been added. Style-wise, a right fold may fit better than recursion.
Doing so, I get more than 3 times performance improvement for n equal to 2,000,000:
import Data.IntSet (member, union, empty, fromDistinctAscList)
sieve :: Int -> [Int]
sieve n = 2: foldr go (const []) [3,5..n] empty
where
go i run obs
| member i obs = run obs
| otherwise = i: run (union obs inc)
where inc = fromDistinctAscList [i*i, i*(i + 2)..n]
Nevertheless, an array has both O(1) access and cache friendly memory allocation. Using mutable arrays, I see more than 15 times performance improvement over your Haskell code (again n equal to 2,000,000):
{-# LANGUAGE FlexibleContexts #-}
import Data.Array.ST (STUArray)
import Control.Monad (forM_, foldM)
import Control.Monad.ST (ST, runST)
import Data.Array.Base (newArray, unsafeWrite, unsafeRead)
sieve :: Int -> [Int]
sieve n = reverse $ runST $ do
arr <- newArray (0, n) False :: ST s (STUArray s Int Bool)
foldM (go arr) [2] [3,5..n]
where
go arr acc i = do
b <- unsafeRead arr i
if b then return acc else do
forM_ [i*i, i*(i + 2).. n] $ \k -> unsafeWrite arr k True
return $ i: acc

passing in R array to C function with "NA"

I work in R using C libraries. I need to pass to a C function an array with numbers between 1 and 10 but that could also be "NA". Then in C, depending on the value I need to set the output.
Here's a simplified code
heredyn.load("ranking.so")
fun <- function(ranking) {
nrak <- length(ranking)
out <- .C("ranking", as.integer(nrak), as.character(ranking), rr = as.integer(vector("integer",nrak)))
out$rr
}
ranking <- sample(c(NA,seq(1,10)),10,replace=TRUE)
rr <- fun(ranking)
The C function could simply be such as
#include <R.h>
void ranking(int *nrak, char *ranking, int *rr) {
int i ;
for (i=0;i<*nrak;i++) {
if (ranking[i] == 'NA')
rr[i] = 1 ;
else
rr[i] = (int) strtol(&ranking[i],(char **)NULL,10) ;
}
}
Due to the "NA" value I set ranking as character but maybe there's another way to do that, using integer and without replacing "NA" to 0 before calling the function?
(The code like this, gives me always an array of zeros...)

Test for whether the value is an NA using R_NaInt, like
#include <R.h>
void ranking_c(int *nrak, int *ranking, int *rr) {
for (int i=0; i < *nrak; i++)
rr[i] = R_NaInt == ranking[i] ? -1 : ranking[i];
}
Invoke from R by explicitly allowing NAs
> x = c(1:2, NA_integer_)
> .C("ranking_c", length(x), as.integer(x), integer(length(x)), NAOK=TRUE)[[3]]
[1] 1 2 -1
Alternatively, use R's .Call() interface. Each R object is represented as an S-expression. There are C-level functions to manipulate S-expressions, e.g., length Rf_length(), data access INTEGER(), and allocation Rf_allocVector() of different types of S-expressions such as INTSXP for integer vectors.
R memory management uses a garbage collector that can run on any call that allocates memory. It is therefore best practice to PROTECT() any R allocation while in scope.
Your function will accept 0 or more S-expressions as input, and return a single S-expression; it might be implemented as
#include <Rinternals.h>
#include <R_ext/Arith.h>
SEXP ranking_call(SEXP ranking)
{
/* allocate space for result, PROTECTing from garbage collection */
SEXP result = PROTECT(Rf_allocVector(INTSXP, Rf_length(ranking)));
/* assign result */
for (int i = 0; i < Rf_length(ranking); ++i)
INTEGER(result)[i] =
R_NaInt == INTEGER(ranking)[i] ? -1 : INTEGER(ranking)[i];
UNPROTECT(1); /* no more need to protect */
return result;
}
And invoked from R with .Call("ranking_call", as.integer(ranking)).
Using .Call is more efficient than .C in terms of speed and memory allocation (.C may copy atomic vectors on the way in), but the primary reason to use it is for the flexibility it offers in terms of working directly with R's data structures. This is especially important when the return values are more complicated than atomic vectors.

You are attempting to address a couple of delicate and non-trivial points, least of all how to compile code with R, and to test for non-finite values.
You asked for help with C. I would like to suggest C++ -- which you do not need to use in a complicated way. Consider this short file with contains a function to process a vector along the lines you suggest (I just test for NA and then assign 42 as a marker for simplicit) or else square the value:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector foo(NumericVector x) {
unsigned int n = x.size();
for (unsigned int i=0; i<n; i++)
if (NumericVector::is_na(x[i]))
x[i] = 42.0;
else
x[i] = pow(x[i], 2);
return x;
}
/*** R
foo( c(1, 3, NA, NaN, 6) )
*/
If I save this on my box as /tmp/foo.cpp, in order compile, link, load and even run the embedded R use example, I only need one line to call sourceCpp():
R> Rcpp::sourceCpp("/tmp/foo.cpp")
R> foo( c(1, 3, NA, NaN, 6))
[1] 1 9 42 42 36
R>
We can do the same with integers:
// [[Rcpp::export]]
IntegerVector bar(IntegerVector x) {
unsigned int n = x.size();
for (unsigned int i=0; i<n; i++)
if (IntegerVector::is_na(x[i]))
x[i] = 42;
else
x[i] = pow(x[i], 2);
return x;
}

return IO(Data.Vector.Storable.Vector) from Haskell to C

I am trying to pass IO (Data.Vector.Storable.Vector Double) from Haskell to C, but there seem to be a problem as Haskell does not allow that. In this case, should I pass a pointer to the vector back to the C program (then how to wrap this object in Ptr) or is there another way?
Also, Data.Vector.Storable seem to have toList function, but when I try to use it, the program does not compile complaining that the Vector is wrapped in IO Monad.
I have tried to find a way to unwrap Vector from IO Monad, but failed to find anything specific, seems like to do so, the IO wrapped object has to be passed to another function, which would have to return back the IO wrapped object again, which, in my opinion, would lead to the same problem again as IO wrapped objects cannot be passed back to C program from the interface.
{-# LANGUAGE ForeignFunctionInterface #-}
module Safe where
import Foreign
import Foreign.C.Types
import Data.Vector.Storable
import AI.HNN.Recurrent.Network
foreign export ccall process :: Ptr CInt -> Ptr Double -> IO (Data.Vector.Storable.Vector Double) -> IO ()
feed :: [Double] -> IO (Data.Vector.Storable.Vector Double)
feed adj = do
let numNeurons = 3
numInputs = 1
thresholds = Prelude.replicate numNeurons 0.5
inputs = [[0.38], [0.75]]
n <- createNetwork numNeurons numInputs adj thresholds :: IO (Network Double)
output <- evalNet n inputs sigmoid
return output
peekInt :: Ptr CInt -> IO Int
peekInt = fmap fromIntegral . peek
process :: Ptr CInt -> Ptr Double -> IO (Data.Vector.Storable.Vector Double) -> IO ()
process n xs result = do
n <- peekInt n
es <- peekArray n xs
poke result $ (feed es)
Calling and collecting from C
double res = [output_size];
int asize = sizeof(weights)/sizeof(double);
process(&asize, &weights, &res);

Performance of changing values of STUArray in bulk

I am implementing a topological sort in Haskell with the requirement to be as efficient as possible. I have profiled my current solution and found out the the following method is taking 60% of total time (and 0 amount of additional space):
import Control.Monad.ST
import Control.Monad
import Data.Array.ST
import Data.Array.Unboxed
import Data.Word
import Data.Array.Base
zeroElementsAfterDecrement' :: (MArray a e m, Num e, Eq e) => a Int e -> [Int] -> m [Int]
zeroElementsAfterDecrement' arr is = foldr k (return []) is
where k i a = do xs <- a
decremented <- liftM (subtract 1) (unsafeRead arr i)
unsafeWrite arr i decremented
if decremented == 0 then return (i:xs) else return xs
largenum :: Int
largenum = 10000000
test = runST $ do arr <- newArray (1, largenum) 100 :: ST s (STUArray s Int Word32)
zeroElementsAfterDecrement' arr [1..largenum]
main = (putStrLn . show) test
The function takes an array (I use unboxed mutable arrays) and a list of indexes, decrements elements by these indexes and returns indexes of elements that became zero during this operation. Right now this is more than 10 times slower than the optimized C++ code but still pretty good compared to Python (or maybe I don't know Python way to optimize this). I understand there is an overhead from executing a monadic code, but maybe there are still ways to optimize I am not aware of?
Edit:
GHC: -O -fllvm: 0.54s
GHC (with unsafeWrite/unsafeRead and Word32): 0.34s
g++: 0.24s
g++ -O2: 0.05s
python3: 2.66s
Also when I change foldr to foldl' it starts allocating some memory and is 4 times slower as a result, why is that?
Here is a C++ version I compared it to:
#include <iostream>
#include <vector>
using namespace std;
#define LARGENUM 10000000
int main()
{
vector <int> arr;
for (int i = 0; i < LARGENUM; i++) {
arr.push_back(100);
}
for (int i = 0; i < arr.size(); i++) {
arr[i]--;
if (arr[i] == 0)
cout << i << endl;
}
return 0;
}
And a Python version:
arr = [100] * 10000000
for x in range (0, 10000000 - 1):
arr[x] = arr[x] - 1
if arr[x] == 0:
print x

How to read a file into Vector of Doubles?

I have a name of a file (as a string), and that file contains certain amount (1000000, for example) double-precision floating-point values (stored as binary, 8 bytes for each, obviously).
What would be the best way to read those doubles into a vector?

Here's how I did it in the end:
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as VM
import qualified Data.ByteString.Lazy as BS
import Data.Binary
import Data.Binary.Get
import System.IO.Unsafe (unsafePerformIO)
import Unsafe.Coerce
readDoubles :: Int -> FilePath -> IO (V.Vector Double)
readDoubles n f = BS.readFile f >>= return . runGet (getVector n)
getVector :: Int -> Get (V.Vector Double)
{-# INLINE getVector #-}
getVector n = do
mv <- liftGet $ VM.new n
let fill i
| i < n = do
x <- fmap unsafeCoerce getWord64be
(unsafePerformIO $ VM.unsafeWrite mv i x) `seq` return ()
fill (i+1)
| otherwise = return ()
fill 0
liftGet $ V.unsafeFreeze mv
liftGet :: IO b -> Get b
liftGet = return . unsafePerformIO