Why does `clockTime` not seem to work for benchmarking, in Idris?

Why does `clockTime` not seem to work for benchmarking, in Idris? - benchmarking

There are probably libraries to do this (although I haven't found any), I'm actually looking to measure the time a function takes to run in Idris. The way I've found is by using clockTime from System and differentiating between before and after a function runs. Here is an example code:
module Main
import Data.String
import System
factorial : Integer -> Integer
factorial 0 = 1
factorial 1 = 1
factorial n = n * factorial (n - 1)
main : IO ()
main = do
args <- getArgs
case args of
[self ] => putStrLn "Please enter a value"
[_, ar] => do
case parseInteger ar of
Just a => do
t1 <- clockTime
let r = factorial a
t2 <- clockTime
let elapsed = (nanoseconds t2) - (nanoseconds t1)
putStrLn $ "fact(" ++ show a ++ ") = "
++ show r ++ " in "
++ (show elapsed) ++ " ns"
Nothing => putStrLn "Not a valid number"
To avoid Idris optimising the program by already evaluating the factorial, I just asked that the program be called with an argument.
This code doesn't work though: no matter what numbers I enter, such as 10000, Idris always returns 0 nanoseconds, which makes me quite sceptical, even just allocating a bigint takes time. I compile with idris main.idr -o main.
What am I doing wrong in my code? Is clockTime not a good plan for benchmarks?

Idris 1 is no longer being maintained.
In Idris 2, clockTime can be used.
clockTime : (typ : ClockType) -> IO (clockTimeReturnType typ)
An example of its use for benchmarking can be found within the Idris2 compiler, here.

Related

Haskell -> C FFI performance

This is the dual question of Performance considerations of Haskell FFI / C?: I would like to call a C function with as small an overhead as possible.
To set the scene, I have the following C function:
typedef struct
{
uint64_t RESET;
} INPUT;
typedef struct
{
uint64_t VGA_HSYNC;
uint64_t VGA_VSYNC;
uint64_t VGA_DE;
uint8_t VGA_RED;
uint8_t VGA_GREEN;
uint8_t VGA_BLUE;
} OUTPUT;
void Bounce(const INPUT* input, OUTPUT* output);
Let's run it from C and time it, with gcc -O3:
int main (int argc, char **argv)
{
INPUT input;
input.RESET = 0;
OUTPUT output;
int cycles = 0;
for (int j = 0; j < 60; ++j)
{
for (;; ++cycles)
{
Bounce(&input, &output);
if (output.VGA_HSYNC == 0 && output.VGA_VSYNC == 0) break;
}
for (;; ++cycles)
{
Bounce(&input, &output);
if (output.VGA_DE) break;
}
}
printf("%d cycles\n", cycles);
}
Running it for 25152001 cycles takes ~400 ms:
$ time ./Bounce
25152001 cycles
real 0m0.404s
user 0m0.403s
sys 0m0.001s
Now let's write some Haskell code to set up FFI (note that Bool's Storable instance really does use a full int):
data INPUT = INPUT
{ reset :: Bool
}
data OUTPUT = OUTPUT
{ vgaHSYNC, vgaVSYNC, vgaDE :: Bool
, vgaRED, vgaGREEN, vgaBLUE :: Word64
}
deriving (Show)
foreign import ccall unsafe "Bounce" topEntity :: Ptr INPUT -> Ptr OUTPUT -> IO ()
instance Storable INPUT where ...
instance Storable OUTPUT where ...
And let's do what I believe to be functionally equivalent to our C code from before:
main :: IO ()
main = alloca $ \inp -> alloca $ \outp -> do
poke inp $ INPUT{ reset = False }
let loop1 n = do
topEntity inp outp
out#OUTPUT{..} <- peek outp
let n' = n + 1
if not vgaHSYNC && not vgaVSYNC then loop2 n' else loop1 n'
loop2 n = do
topEntity inp outp
out <- peek outp
let n' = n + 1
if vgaDE out then return n' else loop2 n'
loop3 k n
| k < 60 = do
n <- loop1 n
loop3 (k + 1) n
| otherwise = return n
n <- loop3 (0 :: Int) (0 :: Int)
printf "%d cycles" n
I build it with GHC 8.6.5, using -O3, and I get.. more than 3 seconds!
$ time ./.stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/sim-ffi/sim-ffi
25152001 cycles
real 0m3.468s
user 0m3.146s
sys 0m0.280s
And it's not a constant overhead at startup, either: if I run for 10 times the cycles, I get roughly 3.5 seconds from C and 34 seconds from Haskell.
What can I do to reduce the Haskell -> C FFI overhead?

I managed to reduce the overhead so that the 25 M calls now finish in 1.2 seconds. The changes were:
Make loop1, loop2 and loop3 strict in the n argument (using BangPatterns)
Add an INLINE pragma to peek in OUTPUT's Storable instance
Point #1 is silly, of course, but that's what I get for not profiling earlier. That change alone gets me to 1.5 seconds....
Point #2, however, makes a ton of sense and is generally applicable. It also addresses the comment from #Thomas M. DuBuisson:
Do you ever need the Haskell structure in haskell? If you can just keep it as a pointer to memory and have a few test functions such as vgaVSYNC :: Ptr OUTPUT -> IO Bool then that will save a log of copying, allocation, GC work on every call.
In the eventual full program, I do need to look at all the fields of OUTPUT. However, with peek inlined, GHC is happy to do the case-of-case transformation, so I can see in Core that now there is no OUTPUT value allocated; the output of peek is consumed directly.

Haskell: looping over user input gracefully

This is an example from Learn You a Haskell:
main = do
putStrLn "hello, what's your name?"
name <- getLine
putStrLn ("Hey, " ++ name ++ ", you rock!")
The same redone without do for clarity:
main =
putStrLn "hello, what's your name?" >>
getLine >>= \name ->
putStrLn $ "Hey, " ++ name ++ ", you rock!"
How am I supposed to loop it cleanly (until "q"), the Haskell way (use of do discouraged)?
I borrowed this from Haskell - loop over user input
main = mapM_ process . takeWhile (/= "q") . lines =<< getLine
where process line = do
putStrLn line
for starters, but it won't loop.

You can call main again and check if your string is "q" or not.
import Control.Monad
main :: IO ()
main =
putStrLn "hello, what's your name?" >>
getLine >>= \name ->
when (name /= "q") $ (putStrLn $ "Hey, " ++ name ++ ", you rock!") >> main
λ> main
hello, what's your name?
Mukesh Tiwari
Hey, Mukesh Tiwari, you rock!
hello, what's your name?
Alexey Orlov
Hey, Alexey Orlov, you rock!
hello, what's your name?
q
λ>

May be you can also use laziness on IO type by adapting the System.IO.Lazy package. It basically includes only run :: T a -> IO a and interleave :: IO a -> T a functions to convert IO actions into lazy ones back and forth.
import qualified System.IO.Lazy as LIO
getLineUntil :: String -> IO [String]
getLineUntil s = LIO.run ((sequence . repeat $ LIO.interleave getLine) >>= return . takeWhile (/=s))
printData :: IO [String] -> IO ()
printData d = d >>= print . sum . map (read :: String -> Int)
*Main> printData $ getLineUntil "q"
1
2
3
4
5
6
7
8
9
q
45
In the above code we construct an infinite list of lazy getLines by repeat $ LIO.interleave getLine of type [T String] and by sequence we turn it into T [String] type and proceed reading up until "q" is received. The printData utility function is summing up and printing the entered integers.

Example of while loop in Haskell using monads

I want to write a loop in haskell using monads but I am having a hard time understanding the concept.
Could someone provide me with one simple example of a while loop while some conditions is satisfied that involves IO action? I don't want an abstract example but a real concrete one that works.

Below there's a horrible example. You have been warned.
Consider the pseudocode:
var x = 23
while (x != 0) {
print "x not yet 0, enter an adjustment"
x = x + read()
}
print "x reached 0! Exiting"
Here's its piece-by-piece translation in Haskell, using an imperative style as much as possible.
import Data.IORef
main :: IO ()
main = do
x <- newIORef (23 :: Int)
let loop = do
v <- readIORef x
if v == 0
then return ()
else do
putStrLn "x not yet 0, enter an adjustment"
a <- readLn
writeIORef x (v+a)
loop
loop
putStrLn "x reached 0! Exiting"
The above is indeed horrible Haskell. It simulates the while loop using the recursively-defined loop, which is not too bad. But it uses IO everywhere, including for mimicking imperative-style mutable variables.
A better approach could be to remove those IORefs.
main = do
let loop 0 = return ()
loop v = do
putStrLn "x not yet 0, enter an adjustment"
a <- readLn
loop (v+a)
loop 23
putStrLn "x reached 0! Exiting"
Not elegant code by any stretch, but at least the "while guard" now does not do unnecessary IO.
Usually, Haskell programmers strive hard to separate pure computation from IO as much as possible. This is because it often leads to better, simpler and less error-prone code.

I/O Loop in Haskell

I am currently working on this Haskell problem and I seem to be stuck.
Write a function, evalpoly, that will ask a user for the degree of a single variable polynomial, then read in the coefficients for the polynomial (from highest power to lowest), then for a value, and will output the value of the polynomial evaluated at that value. As an example run:
> evalpoly
What is the degree of the polynomial: 3
What is the x^3 coefficient: 1.0
What is the x^2 coefficient: - 2.0
What is the x^1 coefficient: 0
What is the x^0 coefficient: 10.0
What value do you want to evaluate at: -1.0
The value of the polynomial is 7.0
As of now, I have this:
evalpoly :: IO ()
evalpoly = putStr "What is the degree of the polynomial: " >>
getLine >>= \xs ->
putStr "What is the x^" >>
putStr (show xs) >>
putStr " coefficient: " >>
putStrLn ""
How would I go about adding the loop and calculations?

Warning:
I spoil this completely so feel free to stop at any point and try to go on yourself
Instead of pushing it all into this single function I will instead break this down into smaller tasks/functions.
So let's start with this.
1. Input
On obvious part is to ask for an value - and if we are on it we can make sure that the user input is any good (I am using Text.Read.readMaybe for this:
query :: Read a => String -> IO a
query prompt = do
putStr $ prompt ++ ": "
val <- readMaybe <$> getLine
case val of
Nothing -> do
putStrLn "Sorry that's a wrong value - please reenter"
query prompt
Just v -> return v
please note that I appended the ": " part already so you don't have to do this for your prompts
having this all the questions to your user become almost trivial:
queryDegree :: IO Int
queryDegree = query "What is the degree of the polynomial"
queryCoef :: Int -> IO (Int, Double)
queryCoef i = do
c <- query prompt
return (i,c)
where prompt = "What is the x^" ++ show i ++ " coefficient"
queryPoint :: IO Double
queryPoint = query "What value do you want to evaluate at"
please note that I provide the powers together with the coefficients - this make the calculation a bit easier but is not strictly necessary here I guess (you could argue that this is more than the function should do at this point and later use zip to get the powers too)
Asking all the inputs is now really easy once you've seen mapM and what it can do - it's the point where you usually would want to write a loop:
queryPoly :: IO [(Int, Double)]
queryPoly = do
n <- queryDegree
mapM queryCoef [n,n-1..0]
2. Evaluation
Do evaluate this I just need to evaluate each term at the given point (that is each power, coefficient pair in the list) - which you can do using map - after we just need to sum this all up (sum can do this):
evaluate :: Double -> [(Int, Double)] -> Double
evaluate x = sum . map (\ (i,c) -> c*x^i)
3. Output
Is rather boring:
presentResult :: Double -> IO ()
presentResult v = putStrLn $ "The vaule of the polynomial is " ++ show v
4. Getting it all together
I just have to ask for the inputs, evaluate the value and then present it:
evalpoly :: IO ()
evalpoly = do
p <- queryPoly
x <- queryPoint
presentResult $ evaluate x p
5. Test-Run
Here is an example run
What is the degree of the polynomial: 3
What is the x^3 coefficient: 1.0
What is the x^2 coefficient: -2.0
What is the x^1 coefficient: Hallo
Sorry that's a wrong value - please reenter
What is the x^1 coefficient: 0
What is the x^0 coefficient: 10.0
What value do you want to evaluate at: -1.0
The vaule of the polynomial is 7.0
complete Code
Note that I like to enter the no-buffering because I run into trouble on Windows occasionally if I don't have it - you probably can live without
module Main where
import Control.Monad (mapM)
import Text.Read (readMaybe)
import System.IO (BufferMode(..), stdout, hSetBuffering)
query :: Read a => String -> IO a
query prompt = do
putStr $ prompt ++ ": "
val <- readMaybe <$> getLine
case val of
Nothing -> do
putStrLn "Sorry that's a wrong value - please reenter"
query prompt
Just v -> return v
queryDegree :: IO Int
queryDegree = query "What is the degree of the polynomial"
queryCoef :: Int -> IO (Int, Double)
queryCoef i = do
c <- query prompt
return (fromIntegral i,c)
where prompt = "What is the x^" ++ show i ++ " coefficient"
queryPoint :: IO Double
queryPoint = query "What value do you want to evaluate at"
queryPoly :: IO [(Int, Double)]
queryPoly = do
n <- queryDegree
mapM queryCoef [n,n-1..0]
evaluate :: Double -> [(Int, Double)] -> Double
evaluate x = sum . map (\ (i,c) -> c*x^i)
presentResult :: Double -> IO ()
presentResult v = putStrLn $ "The vaule of the polynomial is " ++ show v
evalpoly :: IO ()
evalpoly = do
p <- queryPoly
x <- queryPoint
presentResult $ evaluate x p

Comparing speed of Haskell and C for the computation of primes

I initially wrote this (brute force and inefficient) method of calculating primes with the intent of making sure that there was no difference in speed between using "if-then-else" versus guards in Haskell (and there is no difference!). But then I decided to write a C program to compare and I got the following (Haskell slower by just over 25%) :
(Note I got the ideas of using rem instead of mod and also the O3 option in the compiler invocation from the following post : On improving Haskell's performance compared to C in fibonacci micro-benchmark)
Haskell : Forum.hs
divisibleRec :: Int -> Int -> Bool
divisibleRec i j
| j == 1 = False
| i `rem` j == 0 = True
| otherwise = divisibleRec i (j-1)
divisible::Int -> Bool
divisible i = divisibleRec i (i-1)
r = [ x | x <- [2..200000], divisible x == False]
main :: IO()
main = print(length(r))
C : main.cpp
#include <stdio.h>
bool divisibleRec(int i, int j){
if(j==1){ return false; }
else if(i%j==0){ return true; }
else{ return divisibleRec(i,j-1); }
}
bool divisible(int i){ return divisibleRec(i, i-1); }
int main(void){
int i, count =0;
for(i=2; i<200000; ++i){
if(divisible(i)==false){
count = count+1;
}
}
printf("number of primes = %d\n",count);
return 0;
}
The results I got were as follows :
Compilation times
time (ghc -O3 -o runProg Forum.hs)
real 0m0.355s
user 0m0.252s
sys 0m0.040s
time (gcc -O3 -o runProg main.cpp)
real 0m0.070s
user 0m0.036s
sys 0m0.008s
and the following running times :
Running times on Ubuntu 32 bit
Haskell
17984
real 0m54.498s
user 0m51.363s
sys 0m0.140s
C++
number of primes = 17984
real 0m41.739s
user 0m39.642s
sys 0m0.080s
I was quite impressed with the running times of Haskell. However my question is this : can I do anything to speed up the haskell program without :
Changing the underlying algorithm (it is clear that massive speedups can be gained by changing the algorithm; but I just want to understand what I can do on the language/compiler side to improve performance)
Invoking the llvm compiler (because I dont have this installed)
[EDIT : Memory usage]
After a comment by Alan I noticed that the C program uses a constant amount of memory where as the Haskell program slowly grows in memory size. At first I thought this had something to do with recursion, but gspr explains below why this is happening and provides a solution. Will Ness provides an alternative solution which (like gspr's solution) also ensures that the memory remains static.
[EDIT : Summary of bigger runs]
max number tested : 200,000:
(54.498s/41.739s) = Haskell 30.5% slower
max number tested : 400,000:
3m31.372s/2m45.076s = 211.37s/165s = Haskell 28.1% slower
max number tested : 800,000:
14m3.266s/11m6.024s = 843.27s/666.02s = Haskell 26.6% slower
[EDIT : Code for Alan]
This was the code that I had written earlier which does not have recursion and which I had tested on 200,000 :
#include <stdio.h>
bool divisibleRec(int i, int j){
while(j>0){
if(j==1){ return false; }
else if(i%j==0){ return true; }
else{ j -= 1;}
}
}
bool divisible(int i){ return divisibleRec(i, i-1); }
int main(void){
int i, count =0;
for(i=2; i<8000000; ++i){
if(divisible(i)==false){
count = count+1;
}
}
printf("number of primes = %d\n",count);
return 0;
}
The results for the C code with and without recursion are as follows (for 800,000) :
With recursion : 11m6.024s
Without recursion : 11m5.328s
Note that the executable seems to take up 60kb (as seen in System monitor) irrespective of the maximum number, and therefore I suspect that the compiler is detecting this recursion.

This isn't really answering your question, but rather what you asked in a comment regarding growing memory usage when the number 200000 grows.
When that number grows, so does the list r. Your code needs all of r at the very end, to compute its length. The C code, on the other hand, just increments a counter. You'll have to do something similar in Haskell too if you want constant memory usage. The code will still be very Haskelly, and in general it's a sensible proposition: you don't really need the list of numbers for which divisible is False, you just need to know how many there are.
You can try with
main :: IO ()
main = print $ foldl' (\s x -> if divisible x then s else s+1) 0 [2..200000]
(foldl' is a stricter foldl from Data.List that avoids thunks being built up).

Well bang patters give you a very small win (as does llvm, but you seem to have expected that):
{-# LANUGAGE BangPatterns #-}
divisibleRec !i !j | j == 1 = False
And on my x86-64 I get a very big win by switching to smaller representations, such as Word32:
divisibleRec :: Word32 -> Word32 -> Bool
...
divisible :: Word32 -> Bool
My timings:
$ time ./so -- Int
2262
real 0m2.332s
$ time ./so -- Word32
2262
real 0m1.424s
This is a closer match to your C program, which is only using int. It still doesn't match performance wise, I suspect we'd have to look at core to figure out why.
EDIT: and the memory use, as was already noted I see, is about the named list r. I just inlined r, made it output a 1 for each non-divisble value and took the sum:
main = print $ sum $ [ 1 | x <- [2..800000], not (divisible x) ]

Another way to write down your algorithm is
main = print $ length [()|x<-[2..200000], and [rem x d>0|d<-[x-1,x-2..2]]]
Unfortunately, it runs slower. Using all ((>0).rem x) [x-1,x-2..2] as a test, it runs slower still. But maybe you'd test it on your setup nevertheless.
Replacing your code with explicit loop with bang patterns made no difference whatsoever:
{-# OPTIONS_GHC -XBangPatterns #-}
r4::Int->Int
r4 n = go 0 2 where
go !c i | i>n = c
| True = go (if not(divisible i) then (c+1) else c) (i+1)
divisibleRec::Int->Int->Bool
divisibleRec i !j | j == 1 = False
| i `rem` j == 0 = True
| otherwise = divisibleRec i (j-1)

When I started programming in Haskell I was also impressed about its speed. You may be interested in reading point 5 "The speed of Haskell" of this article.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Why does `clockTime` not seem to work for benchmarking, in Idris? - benchmarking

Idris 1 is no longer being maintained. In Idris 2, clockTime can be used. clockTime : (typ : ClockType) -> IO (clockTimeReturnType typ) An example of its use for benchmarking can be found within the Idris2 compiler, here.

Related

Haskell -> C FFI performance

Haskell: looping over user input gracefully

Example of while loop in Haskell using monads

I/O Loop in Haskell

Comparing speed of Haskell and C for the computation of primes

Categories

Resources