## Why does `clockTime` not seem to work for benchmarking in Idris?

There are probably libraries for this (although I haven't found any), but I want to measure the time a function takes to run in Idris. The approach I found is to call `clockTime` from `System` before and after the function runs and take the difference between the two readings. Here is an example:

    module Main

    import Data.String
    import System

    factorial : Integer -> Integer
    factorial 0 = 1
    factorial 1 = 1
    factorial n = n * factorial (n - 1)

    main : IO ()
    main = do
      args <- getArgs
      case args of
        [self] => putStrLn "Please enter a value"
        [_, ar] => do
          case parseInteger ar of
            Just a => do
              t1 <- clockTime
              let r = factorial a
              t2 <- clockTime
              let elapsed = (nanoseconds t2) - (nanoseconds t1)
              putStrLn $ "fact(" ++ show a ++ ") = "
                        ++ show r ++ " in "
                        ++ (show elapsed) ++ " ns"
            Nothing => putStrLn "Not a valid number"

To keep Idris from evaluating the factorial at compile time, the program takes its input as a command-line argument.
This code doesn't work, though: whatever number I enter (10000, for example), Idris always reports 0 nanoseconds. That makes me quite sceptical, since even allocating a bigint takes time. I compile with `idris main.idr -o main`.
What am I doing wrong in my code? Is `clockTime` a poor choice for benchmarks?

Idris 1 is no longer being maintained.
In Idris 2, `clockTime` can be used:

    clockTime : (typ : ClockType) -> IO (clockTimeReturnType typ)

An example of its use for benchmarking can be found within the Idris 2 compiler source.
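
For comparison with the Haskell questions listed below, here is a minimal sketch of the same before/after timing pattern in Haskell rather than Idris. It is an illustration only: `getCPUTime` and `evaluate` come from GHC's `base` library, the input 10000 is taken from the question, and whether the 0 ns reading in Idris 1 has a related cause is not established here. The point it shows is that the computation has to be forced between the two clock readings, otherwise there is nothing to measure.

```haskell
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- Same function as in the question, written with product for brevity.
factorial :: Integer -> Integer
factorial n = product [1 .. n]

main :: IO ()
main = do
  start <- getCPUTime                 -- CPU time in picoseconds
  r <- evaluate (factorial 10000)     -- force the result before the second reading
  end <- getCPUTime
  let elapsedNs = (end - start) `div` 1000
  putStrLn $ "fact(10000) has " ++ show (length (show r))
          ++ " digits, computed in " ++ show elapsedNs ++ " ns"
```

The clock resolution is platform dependent, so very short computations can still report small or zero values.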

## Related

### Haskell -> C FFI performance

This is the dual question of "Performance considerations of Haskell FFI / C?": I would like to call a C function with as small an overhead as possible.

To set the scene, I have the following C function:

    typedef struct {
        uint64_t RESET;
    } INPUT;

    typedef struct {
        uint64_t VGA_HSYNC;
        uint64_t VGA_VSYNC;
        uint64_t VGA_DE;
        uint8_t VGA_RED;
        uint8_t VGA_GREEN;
        uint8_t VGA_BLUE;
    } OUTPUT;

    void Bounce(const INPUT* input, OUTPUT* output);

Let's run it from C and time it, with `gcc -O3`:

    int main (int argc, char **argv)
    {
        INPUT input;
        input.RESET = 0;
        OUTPUT output;

        int cycles = 0;

        for (int j = 0; j < 60; ++j)
        {
            for (;; ++cycles)
            {
                Bounce(&input, &output);
                if (output.VGA_HSYNC == 0 && output.VGA_VSYNC == 0) break;
            }

            for (;; ++cycles)
            {
                Bounce(&input, &output);
                if (output.VGA_DE) break;
            }
        }

        printf("%d cycles\n", cycles);
    }

Running it for 25152001 cycles takes ~400 ms:

    $ time ./Bounce
    25152001 cycles

    real    0m0.404s
    user    0m0.403s
    sys     0m0.001s

Now let's write some Haskell code to set up FFI (note that `Bool`'s `Storable` instance really does use a full `int`):

    data INPUT = INPUT
        { reset :: Bool
        }

    data OUTPUT = OUTPUT
        { vgaHSYNC, vgaVSYNC, vgaDE :: Bool
        , vgaRED, vgaGREEN, vgaBLUE :: Word64
        }
        deriving (Show)

    foreign import ccall unsafe "Bounce" topEntity :: Ptr INPUT -> Ptr OUTPUT -> IO ()

    instance Storable INPUT where ...
    instance Storable OUTPUT where ...

And let's do what I believe to be functionally equivalent to our C code from before:

    main :: IO ()
    main = alloca $ \inp -> alloca $ \outp -> do
        poke inp $ INPUT{ reset = False }

        let loop1 n = do
                topEntity inp outp
                out@OUTPUT{..} <- peek outp
                let n' = n + 1
                if not vgaHSYNC && not vgaVSYNC then loop2 n' else loop1 n'
            loop2 n = do
                topEntity inp outp
                out <- peek outp
                let n' = n + 1
                if vgaDE out then return n' else loop2 n'
            loop3 k n
                | k < 60 = do
                    n <- loop1 n
                    loop3 (k + 1) n
                | otherwise = return n

        n <- loop3 (0 :: Int) (0 :: Int)
        printf "%d cycles" n

I build it with GHC 8.6.5, using `-O3`, and I get... more than 3 seconds!

    $ time ./.stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/sim-ffi/sim-ffi
    25152001 cycles

    real    0m3.468s
    user    0m3.146s
    sys     0m0.280s

And it's not a constant overhead at startup, either: if I run for 10 times the cycles, I get roughly 3.5 seconds from C and 34 seconds from Haskell.

What can I do to reduce the Haskell -> C FFI overhead?

I managed to reduce the overhead so that the 25 M calls now finish in 1.2 seconds. The changes were:

1. Make `loop1`, `loop2` and `loop3` strict in the `n` argument (using `BangPatterns`)
2. Add an `INLINE` pragma to `peek` in `OUTPUT`'s `Storable` instance

Point #1 is silly, of course, but that's what I get for not profiling earlier. That change alone gets me to 1.5 seconds...

Point #2, however, makes a ton of sense and is generally applicable. It also addresses the comment from @Thomas M. DuBuisson:

"Do you ever need the Haskell structure in Haskell? If you can just keep it as a pointer to memory and have a few test functions such as `vgaVSYNC :: Ptr OUTPUT -> IO Bool` then that will save a lot of copying, allocation, GC work on every call."

In the eventual full program, I do need to look at all the fields of `OUTPUT`. However, with `peek` inlined, GHC is happy to do the case-of-case transformation, so I can see in Core that now there is no `OUTPUT` value allocated; the output of `peek` is consumed directly.
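
The two changes described above can be sketched without any C code. This is an assumption-laden illustration, not the author's program: `Sample`, `flag`, `level` and `countUntilFlag` are hypothetical names, the byte offsets 0 and 8 are chosen for this sketch only, and a `poke` stands in for the foreign call. It shows a loop that is strict in its counter via a bang pattern, and a `Storable` instance whose `peek` carries an `INLINE` pragma.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Data.Word (Word64)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable (..))

-- Hypothetical two-field record standing in for OUTPUT.
data Sample = Sample { flag :: !Bool, level :: !Word64 }

instance Storable Sample where
  sizeOf _ = 16
  alignment _ = 8
  -- Inlining peek lets GHC see the record construction at the call site.
  {-# INLINE peek #-}
  peek p = do
    f <- peekByteOff p 0 :: IO Word64
    l <- peekByteOff p 8
    return (Sample (f /= 0) l)
  poke p (Sample f l) = do
    pokeByteOff p 0 (if f then 1 else 0 :: Word64)
    pokeByteOff p 8 l

-- Counts iterations until the flag is set; the bang keeps n evaluated
-- on every iteration instead of building a chain of (+ 1) thunks.
countUntilFlag :: Ptr Sample -> IO Int
countUntilFlag p = go 0
  where
    go :: Int -> IO Int
    go !n = do
      -- Stand-in for the foreign call: sets the flag on the 1000th iteration.
      poke p (Sample (n >= 999) (fromIntegral n))
      s <- peek p
      if flag s then return (n + 1) else go (n + 1)

main :: IO ()
main = alloca $ \p -> countUntilFlag p >>= print
```

Run as written, this prints `1000`. Whether the intermediate `Sample` is actually eliminated depends on the optimisation level and is best checked in the generated Core, as the answer does.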

### Haskell: looping over user input gracefully

This is an example from Learn You a Haskell:

    main = do
        putStrLn "hello, what's your name?"
        name <- getLine
        putStrLn ("Hey, " ++ name ++ ", you rock!")

The same redone without `do` for clarity:

    main =
        putStrLn "hello, what's your name?" >>
        getLine >>= \name ->
        putStrLn $ "Hey, " ++ name ++ ", you rock!"

How am I supposed to loop it cleanly (until "q"), the Haskell way (use of `do` discouraged)?

I borrowed this from "Haskell - loop over user input":

    main = mapM_ process . takeWhile (/= "q") . lines =<< getLine
        where process line = do
                  putStrLn line

for starters, but it won't loop.

You can call `main` again and check whether your string is "q" or not.

    import Control.Monad

    main :: IO ()
    main =
        putStrLn "hello, what's your name?" >>
        getLine >>= \name ->
        when (name /= "q") $ (putStrLn $ "Hey, " ++ name ++ ", you rock!") >> main

A sample session:

    λ> main
    hello, what's your name?
    Mukesh Tiwari
    Hey, Mukesh Tiwari, you rock!
    hello, what's your name?
    Alexey Orlov
    Hey, Alexey Orlov, you rock!
    hello, what's your name?
    q
    λ>

Maybe you can also use laziness in the `IO` type by adapting the `System.IO.Lazy` package. It basically includes only the `run :: T a -> IO a` and `interleave :: IO a -> T a` functions, which convert IO actions to lazy ones and back.

    import qualified System.IO.Lazy as LIO

    getLineUntil :: String -> IO [String]
    getLineUntil s = LIO.run ((sequence . repeat $ LIO.interleave getLine) >>= return . takeWhile (/=s))

    printData :: IO [String] -> IO ()
    printData d = d >>= print . sum . map (read :: String -> Int)

    *Main> printData $ getLineUntil "q"
    1
    2
    3
    4
    5
    6
    7
    8
    9
    q
    45

In the above code we construct an infinite list of lazy `getLine`s with `repeat $ LIO.interleave getLine`, of type `[T String]`. With `sequence` we turn it into `T [String]` and keep reading until "q" is received. The `printData` utility function sums up and prints the entered integers.

### Example of while loop in Haskell using monads

I want to write a loop in Haskell using monads, but I am having a hard time understanding the concept. Could someone provide me with one simple example of a while loop that runs while some condition is satisfied and involves an IO action? I don't want an abstract example but a real, concrete one that works.

Below there's a horrible example. You have been warned.

Consider the pseudocode:

    var x = 23
    while (x != 0) {
       print "x not yet 0, enter an adjustment"
       x = x + read()
    }
    print "x reached 0! Exiting"

Here's its piece-by-piece translation in Haskell, using an imperative style as much as possible.

    import Data.IORef

    main :: IO ()
    main = do
       x <- newIORef (23 :: Int)
       let loop = do
              v <- readIORef x
              if v == 0
                 then return ()
                 else do
                    putStrLn "x not yet 0, enter an adjustment"
                    a <- readLn
                    writeIORef x (v+a)
                    loop
       loop
       putStrLn "x reached 0! Exiting"

The above is indeed horrible Haskell. It simulates the while loop using the recursively-defined `loop`, which is not too bad. But it uses IO everywhere, including for mimicking imperative-style mutable variables.

A better approach could be to remove those `IORef`s.

    main = do
       let loop 0 = return ()
           loop v = do
              putStrLn "x not yet 0, enter an adjustment"
              a <- readLn
              loop (v+a)
       loop 23
       putStrLn "x reached 0! Exiting"

Not elegant code by any stretch, but at least the "while guard" now does not do unnecessary IO.

Usually, Haskell programmers strive hard to separate pure computation from IO as much as possible. This is because it often leads to better, simpler and less error-prone code.
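
To illustrate the separation of pure computation from IO mentioned above, here is a minimal sketch under stated assumptions: `statesUntilZero` is a hypothetical helper, and the adjustments are a fixed list rather than user input, so this variant is not interactive. The pure function computes every value `x` takes before it reaches 0; IO is only used to print.

```haskell
-- Pure part: every value x takes, starting from the initial value,
-- stopping just before the first 0.
statesUntilZero :: Int -> [Int] -> [Int]
statesUntilZero start = takeWhile (/= 0) . scanl (+) start

main :: IO ()
main = do
  -- Hypothetical fixed adjustments, chosen so that x does reach 0.
  let adjustments = [-3, -10, -10, 5]
  mapM_ (\v -> putStrLn ("x is " ++ show v ++ ", not yet 0"))
        (statesUntilZero 23 adjustments)
  putStrLn "x reached 0! Exiting"
```

With these adjustments the loop reports the values 23, 20 and 10 before the exit message. Because `statesUntilZero` is pure, it can be checked in GHCi without typing any input.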

### I/O Loop in Haskell

I am currently working on this Haskell problem and I seem to be stuck.

Write a function, `evalpoly`, that will ask a user for the degree of a single variable polynomial, then read in the coefficients for the polynomial (from highest power to lowest), then for a value, and will output the value of the polynomial evaluated at that value. As an example run:

    > evalpoly
    What is the degree of the polynomial: 3
    What is the x^3 coefficient: 1.0
    What is the x^2 coefficient: - 2.0
    What is the x^1 coefficient: 0
    What is the x^0 coefficient: 10.0
    What value do you want to evaluate at: -1.0
    The value of the polynomial is 7.0

As of now, I have this:

    evalpoly :: IO ()
    evalpoly = putStr "What is the degree of the polynomial: " >>
               getLine >>= \xs ->
               putStr "What is the x^" >>
               putStr (show xs) >>
               putStr " coefficient: " >>
               putStrLn ""

How would I go about adding the loop and calculations?

Warning: I spoil this completely, so feel free to stop at any point and try to go on yourself.

Instead of pushing it all into this single function, I will break it down into smaller tasks/functions. So let's start with this.

#### 1. Input

One obvious part is to ask for a value, and while we are at it we can make sure that the user input is any good (I am using `Text.Read.readMaybe` for this):

    query :: Read a => String -> IO a
    query prompt = do
      putStr $ prompt ++ ": "
      val <- readMaybe <$> getLine
      case val of
        Nothing -> do
          putStrLn "Sorry that's a wrong value - please reenter"
          query prompt
        Just v -> return v

Please note that I appended the ": " part already, so you don't have to do this for your prompts.

Having this, all the questions to your user become almost trivial:

    queryDegree :: IO Int
    queryDegree = query "What is the degree of the polynomial"

    queryCoef :: Int -> IO (Int, Double)
    queryCoef i = do
      c <- query prompt
      return (i,c)
      where prompt = "What is the x^" ++ show i ++ " coefficient"

    queryPoint :: IO Double
    queryPoint = query "What value do you want to evaluate at"

Please note that I provide the powers together with the coefficients. This makes the calculation a bit easier but is not strictly necessary here, I guess (you could argue that this is more than the function should do at this point, and later use `zip` to get the powers too).

Asking for all the inputs is now really easy once you've seen `mapM` and what it can do. It's the point where you usually would want to write a loop:

    queryPoly :: IO [(Int, Double)]
    queryPoly = do
      n <- queryDegree
      mapM queryCoef [n,n-1..0]

#### 2. Evaluation

To evaluate this I just need to evaluate each term at the given point (that is, each power, coefficient pair in the list), which you can do using `map`. After that we just need to sum it all up (`sum` can do this):

    evaluate :: Double -> [(Int, Double)] -> Double
    evaluate x = sum . map (\ (i,c) -> c*x^i)

#### 3. Output

Is rather boring:

    presentResult :: Double -> IO ()
    presentResult v = putStrLn $ "The value of the polynomial is " ++ show v

#### 4. Getting it all together

I just have to ask for the inputs, evaluate the value and then present it:

    evalpoly :: IO ()
    evalpoly = do
      p <- queryPoly
      x <- queryPoint
      presentResult $ evaluate x p

#### 5. Test-Run

Here is an example run:

    What is the degree of the polynomial: 3
    What is the x^3 coefficient: 1.0
    What is the x^2 coefficient: -2.0
    What is the x^1 coefficient: Hallo
    Sorry that's a wrong value - please reenter
    What is the x^1 coefficient: 0
    What is the x^0 coefficient: 10.0
    What value do you want to evaluate at: -1.0
    The value of the polynomial is 7.0

#### Complete code

Note that I like to switch off buffering because I occasionally run into trouble on Windows if I don't. You can probably live without it.

    module Main where

    import Control.Monad (mapM)
    import Text.Read (readMaybe)
    import System.IO (BufferMode(..), stdout, hSetBuffering)

    query :: Read a => String -> IO a
    query prompt = do
      putStr $ prompt ++ ": "
      val <- readMaybe <$> getLine
      case val of
        Nothing -> do
          putStrLn "Sorry that's a wrong value - please reenter"
          query prompt
        Just v -> return v

    queryDegree :: IO Int
    queryDegree = query "What is the degree of the polynomial"

    queryCoef :: Int -> IO (Int, Double)
    queryCoef i = do
      c <- query prompt
      return (fromIntegral i,c)
      where prompt = "What is the x^" ++ show i ++ " coefficient"

    queryPoint :: IO Double
    queryPoint = query "What value do you want to evaluate at"

    queryPoly :: IO [(Int, Double)]
    queryPoly = do
      n <- queryDegree
      mapM queryCoef [n,n-1..0]

    evaluate :: Double -> [(Int, Double)] -> Double
    evaluate x = sum . map (\ (i,c) -> c*x^i)

    presentResult :: Double -> IO ()
    presentResult v = putStrLn $ "The value of the polynomial is " ++ show v

    evalpoly :: IO ()
    evalpoly = do
      p <- queryPoly
      x <- queryPoint
      presentResult $ evaluate x p

    -- Entry point: disable output buffering (see the note above), then run.
    main :: IO ()
    main = do
      hSetBuffering stdout NoBuffering
      evalpoly
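
As an aside on the evaluation step: since the coefficients arrive from highest power to lowest, the polynomial can also be evaluated without carrying the exponents around. The following is a sketch of Horner's rule, which is a different technique from the `sum . map` approach used in the answer; `horner` is a hypothetical name introduced here for illustration.

```haskell
-- Horner's rule: coefficients are given from highest power to lowest,
-- so each step multiplies the running total by x and adds the next one.
horner :: Double -> [Double] -> Double
horner x = foldl (\acc c -> acc * x + c) 0

main :: IO ()
main = print (horner (-1.0) [1.0, -2.0, 0, 10.0])
```

For the example run in the question (coefficients 1.0, -2.0, 0, 10.0 evaluated at -1.0) this prints `7.0`, matching the expected output.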

### Comparing speed of Haskell and C for the computation of primes

I initially wrote this (brute force and inefficient) method of calculating primes with the intent of making sure that there was no difference in speed between using "if-then-else" versus guards in Haskell (and there is no difference!). But then I decided to write a C program to compare and I got the following (Haskell slower by just over 25%):

(Note: I got the ideas of using `rem` instead of `mod`, and the `-O3` option in the compiler invocation, from the following post: "On improving Haskell's performance compared to C in fibonacci micro-benchmark".)

Haskell: Forum.hs

    divisibleRec :: Int -> Int -> Bool
    divisibleRec i j
      | j == 1         = False
      | i `rem` j == 0 = True
      | otherwise      = divisibleRec i (j-1)

    divisible::Int -> Bool
    divisible i = divisibleRec i (i-1)

    r = [ x | x <- [2..200000], divisible x == False]

    main :: IO()
    main = print(length(r))

C: main.cpp

    #include <stdio.h>

    bool divisibleRec(int i, int j){
      if(j==1){ return false; }
      else if(i%j==0){ return true; }
      else{ return divisibleRec(i,j-1); }
    }

    bool divisible(int i){ return divisibleRec(i, i-1); }

    int main(void){
      int i, count =0;
      for(i=2; i<200000; ++i){
        if(divisible(i)==false){
          count = count+1;
        }
      }
      printf("number of primes = %d\n",count);
      return 0;
    }

The results I got were as follows:

Compilation times

    time (ghc -O3 -o runProg Forum.hs)
    real    0m0.355s
    user    0m0.252s
    sys     0m0.040s

    time (gcc -O3 -o runProg main.cpp)
    real    0m0.070s
    user    0m0.036s
    sys     0m0.008s

and the following running times:

Running times on Ubuntu 32 bit

    Haskell
    17984
    real    0m54.498s
    user    0m51.363s
    sys     0m0.140s

    C++
    number of primes = 17984
    real    0m41.739s
    user    0m39.642s
    sys     0m0.080s

I was quite impressed with the running times of Haskell. However, my question is this: can I do anything to speed up the Haskell program without:

1. Changing the underlying algorithm (it is clear that massive speedups can be gained by changing the algorithm; but I just want to understand what I can do on the language/compiler side to improve performance)
2. Invoking the llvm compiler (because I don't have this installed)

[EDIT: Memory usage]

After a comment by Alan I noticed that the C program uses a constant amount of memory, whereas the Haskell program slowly grows in memory size. At first I thought this had something to do with recursion, but gspr explains below why this is happening and provides a solution. Will Ness provides an alternative solution which (like gspr's solution) also ensures that the memory remains static.

[EDIT: Summary of bigger runs]

- max number tested: 200,000: (54.498s/41.739s) = Haskell 30.5% slower
- max number tested: 400,000: 3m31.372s/2m45.076s = 211.37s/165s = Haskell 28.1% slower
- max number tested: 800,000: 14m3.266s/11m6.024s = 843.27s/666.02s = Haskell 26.6% slower

[EDIT: Code for Alan]

This was the code that I had written earlier, which does not have recursion and which I had tested on 200,000:

    #include <stdio.h>

    bool divisibleRec(int i, int j){
      while(j>0){
        if(j==1){ return false; }
        else if(i%j==0){ return true; }
        else{ j -= 1;}
      }
    }

    bool divisible(int i){ return divisibleRec(i, i-1); }

    int main(void){
      int i, count =0;
      for(i=2; i<8000000; ++i){
        if(divisible(i)==false){
          count = count+1;
        }
      }
      printf("number of primes = %d\n",count);
      return 0;
    }

The results for the C code with and without recursion are as follows (for 800,000):

    With recursion:    11m6.024s
    Without recursion: 11m5.328s

Note that the executable seems to take up 60kb (as seen in System monitor) irrespective of the maximum number, and therefore I suspect that the compiler is detecting this recursion.

This isn't really answering your question, but rather what you asked in a comment regarding growing memory usage when the number 200000 grows.

When that number grows, so does the list `r`. Your code needs all of `r` at the very end, to compute its length. The C code, on the other hand, just increments a counter. You'll have to do something similar in Haskell too if you want constant memory usage. The code will still be very Haskelly, and in general it's a sensible proposition: you don't really need the list of numbers for which `divisible` is `False`, you just need to know how many there are.

You can try with

    main :: IO ()
    main = print $ foldl' (\s x -> if divisible x then s else s+1) 0 [2..200000]

(`foldl'` is a stricter `foldl` from `Data.List` that avoids thunks being built up).
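
Put together with the definitions from the question, the counting fold can be sketched as a complete program. The assumptions here are the wrapper `countPrimes`, which is a hypothetical name, and the much smaller limit of 100, used only so the example finishes instantly.

```haskell
import Data.List (foldl')

-- Brute-force divisibility test from the question, unchanged in behaviour.
divisibleRec :: Int -> Int -> Bool
divisibleRec i j
  | j == 1 = False
  | i `rem` j == 0 = True
  | otherwise = divisibleRec i (j - 1)

divisible :: Int -> Bool
divisible i = divisibleRec i (i - 1)

-- Counts the numbers in [2..limit] with no proper divisor, keeping only
-- a strict running total instead of the whole list.
countPrimes :: Int -> Int
countPrimes limit =
  foldl' (\s x -> if divisible x then s else s + 1) 0 [2 .. limit]

main :: IO ()
main = print (countPrimes 100)
```

This prints `25`, the number of primes up to 100. Replacing 100 with 200000 reproduces the workload from the question.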

Well, bang patterns give you a very small win (as does llvm, but you seem to have expected that):

    {-# LANGUAGE BangPatterns #-}

    divisibleRec !i !j | j == 1         = False

And on my x86-64 I get a very big win by switching to smaller representations, such as `Word32`:

    divisibleRec :: Word32 -> Word32 -> Bool
    ...
    divisible :: Word32 -> Bool

My timings:

    $ time ./so   -- Int
    2262

    real    0m2.332s

    $ time ./so   -- Word32
    2262

    real    0m1.424s

This is a closer match to your C program, which is only using `int`. It still doesn't match performance-wise; I suspect we'd have to look at Core to figure out why.

EDIT: the memory use, as was already noted I see, is about the named list `r`. I just inlined `r`, made it output a 1 for each non-divisible value and took the sum:

    main = print $ sum $ [ 1 | x <- [2..800000], not (divisible x) ]

Another way to write down your algorithm is

    main = print $ length [()|x<-[2..200000], and [rem x d>0|d<-[x-1,x-2..2]]]

Unfortunately, it runs slower. Using `all ((>0).rem x) [x-1,x-2..2]` as a test, it runs slower still. But maybe you'd test it on your setup nevertheless.

Replacing your code with an explicit loop with bang patterns made no difference whatsoever:

    {-# OPTIONS_GHC -XBangPatterns #-}

    r4::Int->Int
    r4 n = go 0 2 where
      go !c i | i>n = c
              | True = go (if not(divisible i) then (c+1) else c) (i+1)

    divisibleRec::Int->Int->Bool
    divisibleRec i !j | j == 1 = False
                      | i `rem` j == 0 = True
                      | otherwise = divisibleRec i (j-1)

When I started programming in Haskell I was also impressed by its speed. You may be interested in reading point 5, "The speed of Haskell", of this article.