Idiomatic exceptions for exiting loops in OCaml

In OCaml, imperative-style loops can be exited early by raising exceptions.
While imperative loops are not idiomatic per se in OCaml, I'd like to know the most idiomatic ways to emulate imperative loops with early exits (taking into account aspects such as performance, if possible).
For instance, an old OCaml FAQ mentions exception Exit:
Exit: used to jump out of loops or functions.
Is it still current? The standard library simply mentions it as a general-purpose exception:
The Exit exception is not raised by any library function. It is provided for use in your programs.
Relatedly, this answer to another question mentions using a precomputed let exit = Exit exception to avoid allocations inside the loop. Is it still required?
Also, sometimes one wants to exit from the loop with a specific value, such as raise (Leave 42). Is there an idiomatic exception or naming convention to do this? Should I use references in this case (e.g. let res = ref (-1) in ... <loop body> ... res := 42; raise Exit)?
Finally, using Exit in nested loops does not cover cases where one would like to exit several loops at once, like break <label> in Java. This would require defining exceptions with different names, or at least using an integer to indicate how many scopes should be exited (e.g. Leave 2 to indicate that 2 levels should be exited). Again, is there an approach/exception naming that is idiomatic here?

As originally posted in comments, the idiomatic way to do early exit in OCaml is using continuations. At the point where you want the early return to go to, you create a continuation, and pass it to the code that might return early. This is more general than labels for loops, since you can exit from just about anything that has access to the continuation.
Also, as posted in comments, note the usage of raise_notrace for exceptions whose trace you never want the runtime to generate.
A "naive" first attempt:
module Continuation :
sig
  (* This is the flaw with this approach: there is no good choice for
     the result type. *)
  type 'a cont = 'a -> unit
  (* with_early_exit f passes a function "k" to f. If f calls k,
     execution resumes as if with_early_exit completed
     immediately. *)
  val with_early_exit : ('a cont -> 'a) -> 'a
end =
struct
  type 'a cont = 'a -> unit
  (* Early return is implemented by throwing an exception. The ref
     cell is used to store the value with which the continuation is
     called - this is a way to avoid having to generate an exception
     type that can store 'a for each 'a this module is used with. The
     integer is supposed to be a unique identifier for distinguishing
     returns to different nested contexts. *)
  type 'a context = 'a option ref * int64
  exception Unwind of int64
  let make_cont ((cell, id) : 'a context) =
    fun result -> cell := Some result; raise_notrace (Unwind id)
  let generate_id =
    let last_id = ref 0L in
    fun () -> last_id := Int64.add !last_id 1L; !last_id
  let with_early_exit f =
    let id = generate_id () in
    let cell = ref None in
    let cont : 'a cont = make_cont (cell, id) in
    try
      f cont
    with Unwind i when i = id ->
      match !cell with
      | Some result -> result
      (* This should never happen... *)
      | None -> failwith "with_early_exit"
end
let _ =
  let nested_function i k = k 15; i in
  Continuation.with_early_exit (nested_function 42)
  |> string_of_int
  |> print_endline
As you can see, the above implements early exit by hiding an exception. The continuation is actually a partially applied function that knows the unique id of the context for which it was created, and has a reference cell to store the result value while the exception is being thrown to that context. The code above prints 15. You can pass the continuation k as deep as you want. You can also define the function f immediately at the point where it is passed to with_early_exit, giving an effect similar to having a label on a loop. I use this very often.
The problem with the above is the result type of 'a cont, which I arbitrarily set to unit. Actually, a function of type 'a cont never returns, so we want it to behave like raise – be usable where any type is expected. However, this doesn't immediately work. If you do something like type ('a, 'b) cont = 'a -> 'b, and pass that down to your nested function, the type checker will infer a type for 'b in one context, and then force you to call continuations only in contexts with the same type, i.e. you won't be able to do things like
(if ... then 3 else k 15)
...
(if ... then "s" else k 16)
because the first expression forces 'b to be int, but the second requires 'b to be string.
To solve this, we need to provide a function analogous to raise for early return, i.e.
(if ... then 3 else throw k 15)
...
(if ... then "s" else throw k 16)
This means stepping away from pure continuations. We have to un-partially-apply make_cont above (and I renamed it to throw), and pass the naked context around instead:
module BetterContinuation :
sig
  type 'a context
  (* The result type 'b is unconstrained, so throw can be used wherever
     any type is expected, just like raise. *)
  val throw : 'a context -> 'a -> 'b
  val with_early_exit : ('a context -> 'a) -> 'a
end =
struct
  type 'a context = 'a option ref * int64
  exception Unwind of int64
  let throw ((cell, id) : 'a context) =
    fun result -> cell := Some result; raise_notrace (Unwind id)
  (* Same as in Continuation above. *)
  let generate_id =
    let last_id = ref 0L in
    fun () -> last_id := Int64.add !last_id 1L; !last_id
  let with_early_exit f =
    let id = generate_id () in
    let cell = ref None in
    let context = (cell, id) in
    try
      f context
    with Unwind i when i = id ->
      match !cell with
      | Some result -> result
      | None -> failwith "with_early_exit"
end
let _ =
  let nested_function i k = ignore (BetterContinuation.throw k 15); i in
  BetterContinuation.with_early_exit (nested_function 42)
  |> string_of_int
  |> print_endline
The expression throw k v can be used in contexts where different types are required.
I use this approach pervasively in some big applications I work on. I prefer it even to regular exceptions. I have a more elaborate variant, where with_early_exit has a signature roughly like this:
val with_early_exit : ('a context -> 'b) -> ('a -> 'b) -> 'b
where the first function represents an attempt to do something, and the second represents the handler for errors of type 'a that may result. Together with variants and polymorphic variants, this gives a more explicitly-typed take on exception handling. It is especially powerful with polymorphic variants, as the set of error variants can be inferred by the compiler.
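As a rough illustration of that handler-taking variant, here is a minimal sketch. The module name Checked, the function parse_age and the error tags are made up for this example, and this is my guess at the shape of such an API rather than the actual implementation referred to above. It assumes OCaml 4.05 or later (for int_of_string_opt).

```ocaml
module Checked : sig
  type 'e context
  val throw : 'e context -> 'e -> 'b
  val with_early_exit : ('e context -> 'b) -> ('e -> 'b) -> 'b
end = struct
  type 'e context = 'e option ref * int64
  exception Unwind of int64
  let next_id = ref 0L
  let throw (cell, id) err = cell := Some err; raise_notrace (Unwind id)
  let with_early_exit attempt handler =
    next_id := Int64.add !next_id 1L;
    let id = !next_id in
    let cell = ref None in
    (* The handler runs outside the scope that catches Unwind. *)
    match attempt (cell, id) with
    | result -> result
    | exception Unwind i when Int64.equal i id ->
      (match !cell with
       | Some err -> handler err
       | None -> failwith "with_early_exit")
end

(* Hypothetical usage: the set of error tags is inferred by the compiler,
   and throw is used once where an int is expected and once where unit is. *)
let parse_age (s : string) : (int, string) result =
  Checked.with_early_exit
    (fun ctx ->
       let n =
         match int_of_string_opt s with
         | Some n -> n
         | None -> Checked.throw ctx (`Not_a_number s)
       in
       if n < 0 then Checked.throw ctx (`Negative n);
       Ok n)
    (function
      | `Not_a_number s -> Error ("not a number: " ^ s)
      | `Negative n -> Error ("negative: " ^ string_of_int n))
```

If a new error tag is thrown in the first function but not matched in the handler, the type checker rejects the program, which is the "explicitly-typed" benefit mentioned above.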
The Jane Street approach effectively does the same as what is described here, and in fact I previously had an implementation that generated exception types with first-class modules. I am not sure anymore why I eventually chose this one – there may be subtle differences :)

Just to answer a specific part of my question which was not mentioned in other answers:
... using a precomputed let exit = Exit exception to avoid allocations inside the loop. Is it still required?
I did some micro-benchmarks using Core_bench on 4.02.1+fp and the results indicate no significant difference: when comparing two identical loops, one containing a local exit declared before the loop and another one without it, the time difference is minimal.
The difference between raise Exit and raise_notrace Exit in this example was also minimal, about 2% in some runs, up to 7% in others, but it could well be within the error margins of such a short experiment.
Overall, I couldn't measure any noticeable difference, so unless someone would have examples where Exit/exit significantly affect performance, I would prefer the former since it is clearer and avoids creating a mostly useless variable.
Finally, I also compared the difference between two idioms: using a reference to a value before exiting the loop, or creating a specific exception type containing the return value.
With reference to result value + Exit:
let res = ref 0 in
let r =
  try
    for i = 0 to n-1 do
      if a.(i) = v then
        (res := v; raise_notrace Exit)
    done;
    assert false
  with Exit -> !res
in ...
With specific exception type:
exception Res of int
let r =
  try
    for i = 0 to n-1 do
      if a.(i) = v then
        raise_notrace (Res v)
    done;
    assert false
  with Res v -> v
in ...
Once again, the differences were minimal and varied a lot between runs. Overall, the first version (reference + Exit) seemed to have a slight advantage (0% to 10% faster), but the difference was not significant enough to recommend one version over the other.
Since the former requires defining an initial value (which may not exist) or using an option type to initialize the reference, and the latter requires defining a new exception per type of value returned from the loop, there is no ideal solution here.
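For completeness, the option-typed reference mentioned above could look roughly like this. It is a sketch only; find_index is a name invented for the example, and it was not part of the benchmark.

```ocaml
(* Returns the index of the first element equal to v, if any.
   The reference starts at None, so no dummy initial value is needed. *)
let find_index (a : int array) (v : int) : int option =
  let res = ref None in
  (try
     for i = 0 to Array.length a - 1 do
       if a.(i) = v then begin
         res := Some i;
         raise_notrace Exit
       end
     done
   with Exit -> ());
  !res
```

The cost is an extra Some allocation on exit and a match at the use site, in exchange for not needing a new exception per result type.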

Exit is OK (I'm not sure whether I can call it idiomatic). But make sure you use raise_notrace if your compiler is recent enough (4.02 or later).
An even better solution is to use with_return from the OCaml Core library. It has no problems with scope, because it creates a fresh exception type for each nesting.
Of course, you can achieve the same result yourself, or simply borrow the source code of Core's implementation.
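To give an idea of what "achieving the same result yourself" might involve, here is a minimal sketch in the spirit of with_return. It is not Core's actual source; it assumes OCaml 4.04 or later (for let exception), and find_pair is a made-up example function.

```ocaml
(* The field is polymorphic in 'b so that return can be called
   wherever any type is expected, like raise. *)
type 'a return = { return : 'b. 'a -> 'b }

let with_return (type a) (f : a return -> a) : a =
  (* A fresh exception is created on every call, so nested uses
     cannot capture each other's exits. *)
  let exception Return of a in
  let return v = raise_notrace (Return v) in
  try f { return } with Return v -> v

(* Hypothetical usage: leave two nested loops at once. *)
let find_pair (m : int array array) (target : int) : (int * int) option =
  with_return (fun outer ->
      for i = 0 to Array.length m - 1 do
        for j = 0 to Array.length m.(i) - 1 do
          if m.(i).(j) = target then outer.return (Some (i, j))
        done
      done;
      None)
```

Naming the record value (outer here) plays the role of a loop label: an inner with_return would get its own value, and each return exits exactly the scope that created it.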
Even more idiomatic is not to use exceptions for short-circuiting your iteration at all: consider using an existing algorithm (find, find_map, exists, etc.), or just write a recursive function if no algorithm suits you.
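A small sketch of both options, using only standard library functions (Array.exists needs OCaml 4.03 or later, List.find_opt needs 4.05 or later); the function names are invented for the example.

```ocaml
(* Library functions that already stop at the first hit. *)
let contains (a : int array) (v : int) : bool =
  Array.exists (fun x -> x = v) a

let first_negative (l : int list) : int option =
  List.find_opt (fun x -> x < 0) l

(* Hand-written recursion: simply not recursing is the early exit. *)
let index_of (a : int array) (v : int) : int option =
  let rec go i =
    if i >= Array.length a then None
    else if a.(i) = v then Some i
    else go (i + 1)
  in
  go 0
```

The recursive helper is tail-recursive, so it runs in constant stack space like the corresponding for loop.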

Regarding the point
using a precomputed let exit = Exit exception to avoid allocations
inside the loop. Is it still required?
the answer is no with sufficiently recent versions of OCaml. Here is the relevant excerpt from the Changelog of OCaml 4.02.0.
PR#6203: Constant exception constructors no longer allocate (Alain Frisch)
Here is PR6203: http://caml.inria.fr/mantis/view.php?id=6203

Related

Overriding assignment of value in Lua

I am using Lua v5.2.2 within a C application (embedded environment/MCU).
I need to expose some "parameters" in Lua whose reads and writes must access the hardware directly (thus a C call is needed). However, I am looking for a way to implement this other than plain old getters and setters.
I am mostly exploring the meta-programming power of Lua, but also I believe I can create a simpler interface for the user.
What I want to achieve is behaviour like the following:
my_param = createParameter{name="hw_param1", type="number", min=0, max=100}
my_param = 5
result = my_param + 3
On the first line a new parameter is created. This is a call towards a C function. Userdata is pushed to stack with a properly initialized struct. The hardware is also initialized as needed. A new table is returned.
On the second line an assignment is done to the parameter object. I want this to call a C function with a single argument (that of the assignment), so the value can be stored to the hardware registers.
On the third line the parameter is read. I again need a call towards a C function that will get the value of the parameter from the hardware registers, and that will return the result.
Note that the actual value of this parameter may change outside the scope of Lua, so reading the value once during initialization is not correct. The C function must be called each time to get the actual value. Similarly writing to the value must cause an immediate write to the hardware.
How can I accomplish this? Specifically can I alter the metatable of the parameter to achieve lines 2 and 3? (I am aware of how to implement line 1).
Also is it necessary to return a table from the constructor? May I, for example, return a primitive Lua type (e.g. a number) that will behave like above?
Yes, you can modify the metatable metamethods.
Line 2, however, simply rebinds the variable my_param to the number 5 and discards the parameter object; no metamethod is invoked when a variable itself is assigned.
If you instead set a field on the parameter object, as in my_param.x = n, the __newindex metamethod is invoked (for keys not already present in the table), and you can define it to behave as you like. In your case it would forward the value to the hardware through a C function call.
The same principle applies to line 3: define the __add metamethod and have it fetch the current value from the hardware whenever the object is used in an addition.
http://lua-users.org/wiki/MetamethodsTutorial
This isn't exactly what you're asking for, but it's close:
function createParameter(t)
  -- These must be local; as globals they would be shared (and
  -- overwritten) by every call to createParameter.
  local param = {}
  param.data = t
  local backingTable = {}
  local metatable = {}
  function metatable.__index(t, k)
    -- You can intercept the read here if you
    -- want and pass it on to your C function.
    return backingTable[k]
  end
  function metatable.__newindex(t, k, v)
    -- You can intercept the value here if you
    -- want and pass it on to your C function.
    backingTable[k] = v
  end
  setmetatable(param, metatable)
  return param
end
--------------------------------------------------------
my_param = createParameter{name="hw_param1", type="number", min=0, max=100}
my_param.value = 5
result = my_param.value + 3
print(result) -- prints 8
print(my_param.data.name) -- prints hw_param1
You might be able to do something tricky by assigning a metatable to the global table _G, but I think that would be kind of tricky to get set up right and could lead to unexpected outcomes.
Edit:
If you really hate having to have a level of indirection, and you really want to be able to set it directly, here's how you can do it by setting the global table.
globalMetatable = {}
globalParamNames = {}
globalParams = {}
function globalMetatable.__index(t, k)
  if globalParamNames[k] then
    -- You can intercept the read here if you
    -- want and pass it on to your C function.
    print("Read from param " .. k)
    return globalParams[k]
  else
    return rawget(_G, k)
  end
end
function globalMetatable.__newindex(t, k, v)
  if globalParamNames[k] then
    -- You can intercept the value here if you
    -- want and pass it on to your C function.
    print("Wrote to param " .. k)
    globalParams[k] = v
  else
    rawset(_G, k, v)
  end
end
setmetatable(_G, globalMetatable)
function createParameter(t)
  globalParamNames[t.varname] = true
end
--------------------------------------------------------
createParameter{varname="my_param", name="hw_param1", type="number", min=0, max=100}
my_param = 5
result = my_param + 3
print(result) -- prints 8
print(my_param) -- prints 5

haskell reading and iterating

I need your help, guys.
I'm trying to learn Haskell by doing a simple task, but it's still hard for me.
What I'm trying to do is: read a line of numbers separated by whitespace, iterate over that list, check the values, and add 1 for each value that is not zero and -1 otherwise. I tried to do it by following some tutorials and other projects' code, but it just outputs a bunch of errors.
My code:
import System.Environment
import Control.Monad
import Text.Printf
import Data.List
import System.IO
solve :: IO ()
solve = do
    nums <- map read . words <$> getLine
    print (calculate nums)
calculate (x:xs) = x + check xs
check num
    | num == 0 =
        -1
    | otherwise =
        1
main :: IO ()
main = do
    n <- readLn
    if n /= 0
        then do
            printf "Case: "
            solve
        else main
Errors:
C:\Users\Donatas\Documents\haskell\la3.hs:9:21: error:
* Ambiguous type variable `b0' arising from a use of `read'
prevents the constraint `(Read b0)' from being solved.
Probable fix: use a type annotation to specify what `b0' should be.
These potential instances exist:
instance Read BufferMode -- Defined in `GHC.IO.Handle.Types'
instance Read Newline -- Defined in `GHC.IO.Handle.Types'
instance Read NewlineMode -- Defined in `GHC.IO.Handle.Types'
...plus 25 others
...plus six instances involving out-of-scope types
(use -fprint-potential-instances to see them all)
* In the first argument of `map', namely `read'
In the first argument of `(.)', namely `map read'
In the first argument of `(<$>)', namely `map read . words'
|
9 | nums <- map read . words <$> getLine
| ^^^^
C:\Users\Donatas\Documents\haskell\la3.hs:10:9: error:
* Ambiguous type variable `a0' arising from a use of `print'
prevents the constraint `(Show a0)' from being solved.
Probable fix: use a type annotation to specify what `a0' should be.
These potential instances exist:
instance Show HandlePosn -- Defined in `GHC.IO.Handle'
instance Show BufferMode -- Defined in `GHC.IO.Handle.Types'
instance Show Handle -- Defined in `GHC.IO.Handle.Types'
...plus 27 others
...plus 13 instances involving out-of-scope types
(use -fprint-potential-instances to see them all)
* In a stmt of a 'do' block: print (calculate nums)
In the expression:
do nums <- map read . words <$> getLine
print (calculate nums)
In an equation for `solve':
solve
= do nums <- map read . words <$> getLine
print (calculate nums)
|
10 | print (calculate nums)
| ^^^^^^^^^^^^^^^^^^^^^^
C:\Users\Donatas\Documents\haskell\la3.hs:12:1: error:
* Non type-variable argument in the constraint: Num [a]
(Use FlexibleContexts to permit this)
* When checking the inferred type
calculate :: forall a. (Eq a, Num [a], Num a) => [a] -> a
|
12 | calculate (x:xs) = x + check xs
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Failed, no modules loaded.
To start with, I suggest you default to always writing type annotations. And before you start implementing anything, sketch out what the types of your program look like. For this program I suggest you start from:
main :: IO ()
solve :: String -> String
calculate :: [Int] -> Int
check :: Int -> Int
The names could also probably be improved to better convey what it is they're doing.
Note that there is only one function with type IO _. This serves to isolate the impure part of your program, which will make your life easier (e.g. testing, code reasoning, etc).
You're not far off. Just try reworking your code to fit into the above types. And be aware that you're missing a pattern in your calculate implementation ;)
If you inspect your code and follow the types, it is crystal-clear where the error is. Yes, you can add type annotations -- that is highly recommended -- but I find your code is so simple you could get away with just a bit of equational reasoning.
It starts with solve, it is easy to see that nums is of type Read a => [a], given that you split a string by words (i.e. [String]) and map read over it. So a list of as is what you give to calculate. As you know, a list is the disjoint sum between (1) the empty list ([]) and (2) a cons cell made of a head, an element of type a, and a tail, the rest of the list ((x:xs)).
First thing you notice is that the case of the empty list is missing; let's add it:
calculate [] = 0 -- I assume this is correct
On to the body of calculate and check. The latter clearly expects a number. By the way, you can write it a bit more concisely:
check 0 = -1
check _ = 1
Now if you look at calculate, you see that you are calling check and passing it xs. What is xs? It is bound in the pattern (x:xs), which is how you uncons a cons cell. Clearly, xs is the tail of the cell and so a list itself. But check expects a number! The only number available here is x, not xs. So let's change your code to
calculate (x:xs) = check x + ...
Your specifications state that you want to iterate over the list. That can only happen if you do something with xs. What can you do with it? The only answer to that is to call calculate recursively:
calculate (x:xs) = check x + calculate xs
... and with these changes, your code is fine.

Scala convert IndexedSeq[AnyVal] to Array[Int]

I'm trying to solve Codility's GenomicRangeQuery using Scala and to that end I wrote the following function:
def solution(s: String, p: Array[Int], q: Array[Int]): Array[Int] = {
for (i <- p.indices) yield {
val gen = s.substring(p(i), q(i) + 1)
if (gen.contains('A')) 1
else if (gen.contains('C')) 2
else if (gen.contains('G')) 3
else if (gen.contains('T')) 4
}
}
I haven't done a lot of testing but it seems to solve the problem.
My issue is the for comprehension returns an scala.collection.immutable.IndexedSeq[AnyVal], while the function must return an Array[Int] and therefore it's throwing a type mismatch error.
Is there any way to make the for comprehension return an Array[Int] or transform the IndexedSeq[AnyVal] into an Array[Int]?
sheunis' answer above mostly covers it.
You can coerce an IndexedSeq into an Array with a call to toArray so the first bit's quite straightforward. For the second part, because there's a possible logical branch where you drop through all of your if... else... cases, it's possible for your yield to return both Int and Unit types, whose closest common ancestor is AnyVal.
Note that if you replaced your if... else... with pattern matching instead then you would explicitly get a compiler warning because you're not catching every possible case.
gen match {
case _ if gen.contains("A") => 1
case _ if gen.contains("C") => 2
...
// Throws warning unless you include a `case _ =>` with no `if` clause
}
def solution(s: String, p: Array[Int], q: Array[Int]): Array[Int] = {
(for (i <- p.indices) yield {
val gen = s.substring(p(i), q(i) + 1)
if (gen.contains('A')) 1
else if (gen.contains('C')) 2
else if (gen.contains('G')) 3
else 4
}).toArray
}
The problem with the if statement is that there is no default value, which is why you get an IndexedSeq[AnyVal] instead of an IndexedSeq[Int].
There are two problems here. The first comes from p.indices, which returns a scala.collection.immutable.Range instead of an Array. Calling p.indices.toArray (or adding .toArray at the end, as @sheunis suggested) fixes the problem.
The other issue comes from your if statement, which is incomplete: if all conditions are false, the expression evaluates to (): Unit (an implicit else branch added by the compiler). Adding a default case, such as else -1, as the last branch should solve this second issue.
Edit: If the default case can never happen, you could throw an exception as follows:
else {
val err = "the input String can only contain the characters ACGT"
throw new IllegalArgumentException(err)
}
This informs both the next programmer and the compiler of what's going on in your code. Note that throw expressions have type Nothing, so the least upper bound of (Int, Int, Int, Nothing) is correctly computed as Int, unlike that of (Int, Int, Int, Unit), which is AnyVal.

Looping over array values in Lua

I have a variable as follows
local armies = {
[1] = "ARMY_1",
[2] = "ARMY_3",
[3] = "ARMY_6",
[4] = "ARMY_7",
}
Now I want to do an action for each value. What is the best way to loop over the values? The typical thing I'm finding on the internet is this:
for i, armyName in pairs(armies) do
doStuffWithArmyName(armyName)
end
I don't like that as it results in an unused variable i. The following approach avoids that and is what I am currently using:
for i in pairs(armies) do
doStuffWithArmyName(armies[i])
end
However, this is still not as readable and simple as I'd like, since it iterates over the keys and then gets the value using the key (rather imperatively). Another gripe I have with both approaches is that pairs is needed. The value being looped over here is one I have control over, and I'd prefer that it can be looped over as easily as possible.
Is there a better way to do such a loop if I only care about the values? Is there a way to address the concerns I listed?
I'm using Lua 5.0 (and am quite new to the language)
The idiomatic way to iterate over an array is:
for _, armyName in ipairs(armies) do
doStuffWithArmyName(armyName)
end
Note that:
Use ipairs over pairs for arrays
If the key isn't what you are interested in, use _ as a placeholder.
If, for some reason, that _ placeholder still concerns you, make your own iterator. Programming in Lua provides it as an example:
function values(t)
local i = 0
return function() i = i + 1; return t[i] end
end
Usage:
for v in values(armies) do
print(v)
end

Growing arrays in Haskell

I have the following (imperative) algorithm that I want to implement in Haskell:
Given a sequence of pairs [(e0,s0), (e1,s1), (e2,s2),...,(en,sn)], where both "e" and "s" parts are natural numbers not necessarily different, at each time step one element of this sequence is randomly selected, let's say (ei,si), and based in the values of (ei,si), a new element is built and added to the sequence.
How can I implement this efficiently in Haskell? The need for random access would make it bad for lists, while the need for appending one element at a time would make it bad for arrays, as far as I know.
Thanks in advance.
I suggest using either Data.Set or Data.Sequence, depending on what you need it for. The latter in particular provides you with logarithmic index lookup (as opposed to linear for lists) and O(1) appending on either end.
"while the need for appending one element at a time would make it bad for arrays" Algorithmically, it seems like you want a dynamic array (aka vector, array list, etc.), which has amortized O(1) time to append an element. I don't know of a Haskell implementation of it off-hand, and it is not a very "functional" data structure, but it is definitely possible to implement it in Haskell in some kind of state monad.
If you know approximately how many elements you will need in total, you can create an array of that size which is "sparse" at first and then put elements into it as needed.
Something like below can be used to represent this new array:
data MyArray = MyArray (Array Int Int) Int
(where the last Int represent how many elements are used in the array)
If you really need stop-and-start resizing, you could think about using the simple-rope package along with a StringLike instance for something like Vector. In particular, this might accommodate scenarios where you start out with a large array and are interested in relatively small additions.
That said, adding individual elements into the chunks of the rope may still induce a lot of copying. You will need to try out your specific case, but you should be prepared to use a mutable vector as you may not need pure intermediate results.
If you can build your array in one shot and just need the indexing behavior you describe, something like the following may suffice,
import Data.Array.IArray
test :: Array Int (Int,Int)
test = accumArray (flip const) (0,0) (0,20) [(i, f i) | i <- [0..19]]
  where f 0 = (1,0)
        f i = let (e,s) = test ! (i `div` 2) in (e*2,s+1)
Taking a note from ivanm, I think Sets are the way to go for this.
import Data.Set (Set)
import qualified Data.Set as Set
import System.Random (RandomGen, getStdGen)
startSet :: Set (Int, Int)
startSet = Set.fromList [(1,2), (3,4)] -- etc. Whatever the initial set is
-- grow the set by randomly producing "n" elements.
growSet :: (RandomGen g) => g -> Set (Int, Int) -> Int -> (Set (Int, Int), g)
growSet g s n | n <= 0 = (s, g)
              | otherwise = growSet g'' s' (n-1)
  where s' = Set.insert (x,y) s
        ((x,_), g') = randElem s g
        ((_,y), g'') = randElem s g'
randElem :: (RandomGen g) => Set a -> g -> (a, g)
randElem = undefined
main = do
    g <- getStdGen
    let (grownSet,_) = growSet g startSet 2
    print grownSet -- or whatever you want to do with it
This assumes that randElem is an efficient, definable method for selecting a random element from a Set. (I asked this SO question regarding efficient implementations of such a method). One thing I realized upon writing up this implementation is that it may not suit your needs, since Sets cannot contain duplicate elements, and my algorithm has no way to give extra weight to pairings that appear multiple times in the list.

Resources