How to create a matrix in OCaml?

I'm learning OCaml and am currently trying to understand how iteration works in OCaml and how to create a matrix. I want a 5 x 5 array filled with 0. I know there is an issue with shared references, so I create a new array at each iteration; however, I'm having issues in other places, specifically at line 6. Please also point out other issues, such as indentation practices.
open Array;;
let n = ref 5 and i = ref 0 in
let m = Array.make !n 0 in
while !i < !n do
m.(!i) <- Array.make !n 0;;
i := !i + 1;;
done
m;;

You are using ;; too much. Contrary to popular belief, ;; is not part of ordinary OCaml syntax (in my opinion anyway). It's just a special way to tell the toplevel (the REPL) that you want it to evaluate what you've typed so far.
Leave the ;; after open Array. But change all but the last ;; to ; instead.
(Since you reference the Array module by name in your code, which IMHO is good style, you can also just leave out the open Array;; altogether.)
You want the last ;; because you do want the toplevel to evaluate what you've typed so far.
Your syntax error is caused by the overall shape of your code, which is like this:
let ... in
let ... in
while ... do
...
done
m
The while is one expression (in OCaml everything is an expression) and m is another expression. If you want to have two expressions in a row you need ; between them. So you need ; after done.
You also have a type error. When you create m you're creating an array of ints (your given initial value is 0). So you can't make it into a matrix (an array of arrays) later in the code.
Also (not trying to overload you with criticisms :-) this code reads like imperative code. It's not particularly idiomatic OCaml code. In most people's code, using ref is pretty rare. One immediate improvement I see would just be to say let n = 5. You're not changing the value of n anywhere that I see (though maybe this is part of a larger chunk of code). Another improvement would be to use for instead of while.
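For comparison, here is a minimal sketch of the loop-based version with the fixes discussed so far applied: a plain `let n = 5`, `;` instead of `;;` between expressions (including after `done`), a `for` loop, and `m` created as an `int array array` from the start. The name `make_zero_matrix` is just illustrative.

```ocaml
(* Loop-based n x n zero matrix, with the fixes discussed above. *)
let make_zero_matrix n =
  (* Start from an array of empty rows, so m ends up as an
     int array array rather than an int array. *)
  let m = Array.make n [||] in
  for i = 0 to n - 1 do
    (* A fresh row on every iteration, so no two rows share storage. *)
    m.(i) <- Array.make n 0
  done;  (* ; sequences the loop with the final expression m *)
  m

let n = 5
let m = make_zero_matrix n
```

The `[||]` placeholders only exist to build the outer array; every slot is overwritten inside the loop.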
Finally, you can do this entire operation in one function call:
let n = 5 in
let m = Array.init n (fun _ -> Array.make n 0) in
m
Using explicit loops is actually also quite rare in OCaml (at least in my code).
Or you could try this:
let n = 5 in
let m = Array.make_matrix n n 0 in
m
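The question mentions the shared-reference problem. A minimal sketch of what goes wrong with `Array.make n (Array.make n 0)`: the inner `Array.make` is evaluated only once, so every row is the same physical array, whereas `Array.make_matrix` (like the `Array.init` version above) allocates a separate row for each index. The names `shared` and `separate` are only for illustration.

```ocaml
(* Pitfall: the inner Array.make is evaluated once, so all 3 rows alias it. *)
let shared = Array.make 3 (Array.make 3 0)

(* Array.make_matrix allocates a distinct row for each index. *)
let separate = Array.make_matrix 3 3 0

let () =
  shared.(0).(0) <- 1;
  separate.(0).(0) <- 1;
  (* The write to row 0 of [shared] is visible through row 2 as well. *)
  Printf.printf "shared.(2).(0)   = %d\n" shared.(2).(0);
  Printf.printf "separate.(2).(0) = %d\n" separate.(2).(0)
```

This prints 1 for `shared` and 0 for `separate`.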

Related

What is the fastest way to flatten an array of arrays in ocaml?

What is the fastest way to flatten an array of arrays in ocaml? Note that I mean arrays, and not lists.
I'd like to do this in linear time, with the smallest constant factors possible.
The OCaml standard library is rather deficient and requires you to implement many things from scratch. That's why we have extended libraries like Batteries and Core. I would suggest using them, so that you won't face such problems.
Still, for the sake of completeness, let's try to implement our own solution, and then compare it with the proposed fun xxs -> Array.(concat (to_list xxs)) solution.
The implementation has a few small problems. First, to construct an array we need to provide a value for each cell. We can't just create an uninitialized array, as that would break the type system. We could, of course, use the Obj module, but that is rather ugly. Second, the input array can be empty, so we need to handle that case somehow. We could just raise an exception, but I prefer to make my functions total. It is not obvious how to create an empty polymorphic array, but it is not impossible:
let empty () = Array.init 0 (fun _ -> assert false)
This function creates an empty polymorphic array. We use a bottom value (a value that inhabits every type), written as assert false. Since the length is 0, the initializer is never called, so assert false is never evaluated. This is type-safe and neat.
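A quick sketch checking that `empty` can be used at any element type and that its initializer never runs. The names `no_ints` and `no_strings` are only for illustration.

```ocaml
(* Same definition as above: the initializer is never called for length 0. *)
let empty () = Array.init 0 (fun _ -> assert false)

(* The same function yields empty arrays of unrelated element types. *)
let no_ints : int array = empty ()
let no_strings : string array = empty ()

let () =
  (* Prints "0 0"; no Assert_failure is raised. *)
  Printf.printf "%d %d\n" (Array.length no_ints) (Array.length no_strings)
```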
Next is how to create an array without having a default value. We could write rather complex code that uses Array.init and translates the i-th index into the j-th index of the n-th array, but this is tedious, error-prone and quite inefficient. Another approach is to find the first element of the first non-empty inner array and use it as the default. Here comes another problem: the standard library doesn't have an Array.find function. Sic. It's a shame that in the 21st century we need to write an Array.find function ourselves, but that's how life is. Again, use the Core (or Core_kernel) or Batteries library; there are lots of excellent libraries in the OCaml community available via opam.
Back to our problem: since we don't have a find function, we will use a custom solution. We could use fold_left, but it would traverse the whole array, even though we only need the first element. The solution is to use exceptions for a non-local exit. Don't be afraid, this is idiomatic in OCaml, and raising and catching an exception in OCaml is very fast. Besides exiting early, we also need to send back the value we've found. We could use a reference cell as a communication channel, but that is rather ugly, so we let the exception itself carry the value. Since we don't know the element type in advance, we use two modern features of the OCaml language: locally abstract types and local modules. So let's go for the implementation:
let array_concat (type t) xxs =
  let module Search = struct exception Done of t end in
  try
    Array.iter (fun xs ->
        if Array.length xs <> 0
        then raise_notrace (Search.Done xs.(0))) xxs;
    empty ()
  with Search.Done default ->
    let len =
      Array.fold_left (fun n xs -> n + Array.length xs) 0 xxs in
    let ys = Array.make len default in
    let _ : int = Array.fold_left (fun i xs ->
        let len = Array.length xs in
        Array.blit xs 0 ys i len;
        i+len) 0 xxs in
    ys
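The early-exit search at the top of `array_concat` is really the missing `Array.find`. As a minimal sketch, it can be pulled out into a standalone helper using the same locally abstract type plus local module trick; the name `array_find` is hypothetical, not a standard-library function. (Newer standard libraries, from OCaml 4.13 on, provide `Array.find_opt` for this.)

```ocaml
(* Hypothetical helper: return the first element satisfying [p], if any.
   A locally defined exception performs the non-local exit and carries the
   found element, whose type is the locally abstract type [a]. *)
let array_find (type a) (p : a -> bool) (xs : a array) : a option =
  let module Search = struct exception Found of a end in
  try
    (* Stop at the first match instead of scanning the whole array. *)
    Array.iter (fun x -> if p x then raise_notrace (Search.Found x)) xs;
    None
  with Search.Found x -> Some x

let () =
  (* Finding the first non-empty row, as array_concat does. *)
  match array_find (fun row -> Array.length row > 0) [| [||]; [| 1; 2 |]; [| 3 |] |] with
  | Some row -> Printf.printf "first non-empty row has %d elements\n" (Array.length row)
  | None -> print_endline "all rows are empty"
```

Because the exception is declared inside the function, each call gets its own `Found` constructor at the right element type, so no `Obj` tricks or reference cells are needed.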
Now, the interesting part: benchmarking! Let's use the proposed solution for comparison:
let default_concat xxs = Array.concat (Array.to_list xxs)
Here goes our testing harness:
let random_array =
  Random.init 42;
  let max = 100000 in
  Array.init 1000 (fun _ -> Array.init (Random.int max) (fun i -> i))
let test name f =
  Gc.major ();
  let t0 = Sys.time () in
  let xs = f random_array in
  let t1 = Sys.time () in
  let n = Array.length xs in
  Printf.printf "%s: %g sec (%d bytes)\n%!" name (t1 -. t0) n
let () =
  test "custom " array_concat;
  test "default" default_concat
And... the results:
$ ./array_concat.native
custom : 0.38 sec (49203647 bytes)
default: 0.20 sec (49203647 bytes)
They don't surprise me, by the way. Our solution is roughly two times slower than the standard library (0.38 s vs 0.20 s). The moral of this story is:
Always benchmark before optimizing
Use extended libraries (core, batteries, containers, ...)
Update (concatenating arrays using Base)
With the Base library (assuming open Base), we can concatenate arrays easily:
let concat_base = Array.concat_map ~f:Fn.id
And here's our benchmark:
./example.native
custom : 0.524071 sec (49203647 bytes)
default: 0.308085 sec (49203647 bytes)
base : 0.201688 sec (49203647 bytes)
So Base's implementation is now both the fastest and the shortest to write.

Haskell : Increment index in a loop

I have a function that calculates f(n) in Haskell.
I have to write a loop so that it will start calculating values from f(0) to f(n), and will every time compare the value of f(i) with some fixed value.
I am an expert in OOP, hence I am finding it difficult to think in the functional way.
For example, I have to write something like
while (number < f(i))
i++
How would I write this in Haskell?
The standard approach here is
Create an infinite list containing all values of f(n).
Search this list until you find what you're after.
For example,
takeWhile (number <) $ map f [0..]
If you want to give up after you reach "n", you can easily add that as a separate step:
takeWhile (number <) $ take n $ map f [0..]
or, alternatively,
takeWhile (number <) $ map f [0 .. n]
You can do all sorts of other filtering, grouping and processing in this way. But it requires a mental shift. It's a bit like the difference between writing a for-loop to search a table, versus writing an SQL query. Think about Haskell as a bit like SQL, and you'll usually see how to structure your code.
You can generate the list of all i such that f i is larger than your number:
[ i | i<-[0..] , f i > number ]
Then, you can simply take the first one, if that's all you want:
head [ i | i<-[0..] , f i > number ]
Many idiomatic loops in imperative programming can be rephrased as list comprehensions, or expressed through map, filter, foldl and foldr. In the general case, when the loop is more complex, you can always use recursion instead.
Keep in mind that a "blind" translation from imperative to functional programming will often lead to non-idiomatic, hard-to-read code, as would be the case when translating in the opposite direction. Still, I find it reassuring that such a translation is always possible.
If you are new to functional programming, I would advise against learning it by translating what you know about imperative programming. Rather, start from scratch following a good book (LYAH is a popular choice).
The first thing that's weird from a functional point of view is that it's unclear what the result of your computation is. Do you care about the final result of f (i)? Perhaps you care about i itself. Without side effects, everything needs to have a value.
Let's assume you want the final value of the function f (i) as soon as some comparison fails. You can simulate your own while loops using recursion and guards!
while :: Int -> Int -> (Int -> Int) -> Int
while start number f
  | val >= number = val
  | otherwise     = while (start + 1) number f
  where
    val = f start
Instead of explicit recursion, you can use until, e.g.:
findGreaterThan :: (Int -> Int) -> Int -> Int -> (Int, Int)
findGreaterThan f init max = until (\(v, i) -> v >= max) (\(v, i) -> (f v, i + 1)) (init, 0)
This returns a pair containing the first value that fails the loop condition (i.e. the first v with v >= max) and the number of times f was applied. Note that this version iterates f on its own output (init, f init, f (f init), ...) rather than evaluating f 0, f 1, f 2, ...

How to write "good" Julia code when dealing with multiple types and arrays (multiple dispatch)

OP UPDATE: Note that in the latest version of Julia (v0.5), the idiomatic approach to answering this question is to just define mysquare(x::Number) = x^2. The vectorised case is covered using automatic broadcasting, i.e. x = randn(5) ; mysquare.(x). See also the new answer explaining dot syntax in more detail.
I am new to Julia, and given my Matlab origins, I am having some difficulty determining how to write "good" Julia code that takes advantage of multiple dispatch and Julia's type system.
Consider the case where I have a function that provides the square of a Float64. I might write this as:
function mysquare(x::Float64)
return(x^2);
end
Sometimes I want to square all the Float64s in a one-dimensional array, but don't want to write out a loop over mysquare every time, so I use multiple dispatch and add the following:
function mysquare(x::Array{Float64, 1})
y = Array(Float64, length(x));
for k = 1:length(x)
y[k] = x[k]^2;
end
return(y);
end
But now I am sometimes working with Int64, so I write out two more functions that take advantage of multiple dispatch:
function mysquare(x::Int64)
return(x^2);
end
function mysquare(x::Array{Int64, 1})
y = Array(Float64, length(x));
for k = 1:length(x)
y[k] = x[k]^2;
end
return(y);
end
Is this right? Or is there a more idiomatic way to deal with this situation? Should I use type parameters like this?
function mysquare{T<:Number}(x::T)
return(x^2);
end
function mysquare{T<:Number}(x::Array{T, 1})
y = Array(Float64, length(x));
for k = 1:length(x)
y[k] = x[k]^2;
end
return(y);
end
This feels sensible, but will my code run as quickly as the case where I avoid parametric types?
In summary, there are two parts to my question:
If fast code is important to me, should I use parametric types as described above, or should I write out multiple versions for different concrete types? Or should I do something else entirely?
When I want a function that operates on arrays as well as scalars, is it good practice to write two versions of the function, one for the scalar, and one for the array? Or should I be doing something else entirely?
Finally, please point out any other issues you can think of in the code above as my ultimate goal here is to write good Julia code.
Julia compiles a specialized version of your function for each combination of argument types it is called with, as required. Thus, to answer part 1, there is no performance difference, and the parametric way is the way to go.
As for part 2, it might be a good idea in some cases to write a separate version (sometimes for performance reasons, e.g., to avoid a copy). In your case, however, you can use the built-in macro @vectorize_1arg to automatically generate the array version, e.g.:
function mysquare{T<:Number}(x::T)
    return(x^2)
end
@vectorize_1arg Number mysquare
println(mysquare([1,2,3]))
As for general style, don't use semicolons, and mysquare(x::Number) = x^2 is a lot shorter.
As for your vectorized mysquare, consider the case where T is a BigFloat. Your output array, however, is always Float64, so the results would be converted to Float64 and lose precision. One way to handle this would be to change it to
function mysquare{T<:Number}(x::Array{T,1})
    n = length(x)
    y = Array(T, n)
    for k = 1:n
        @inbounds y[k] = x[k]^2
    end
    return y
end
where I've added the @inbounds macro to boost speed by skipping bounds checks, which is safe here because k only ranges over 1:n and both arrays have length n. This function could still have issues in the event that the type of x[k]^2 isn't T. An even more defensive version would perhaps be
function mysquare{T<:Number}(x::Array{T,1})
    n = length(x)
    y = Array(typeof(one(T)^2), n)
    for k = 1:n
        @inbounds y[k] = x[k]^2
    end
    return y
end
where one(T) would give 1 if T is an Int, and 1.0 if T is a Float64, and so on. These considerations only matter if you want to make hyper-robust library code. If you really only will be dealing with Float64s or things that can be promoted to Float64s, then it isn't an issue. It seems like hard work, but the power is amazing. You can always just settle for Python-like performance and disregard all type information.
As of Julia 0.6 (c. June 2017), the "dot syntax" provides an easy and idiomatic way to apply a function to a scalar or an array.
You only need to provide the scalar version of the function, written in the normal way.
function mysquare(x::Number)
    return(x^2)
end
Append a . to the function name (or prepend it to the operator) to call it on every element of an array:
x = [1 2 3 4]
x2 = mysquare(2)       # 4
xs = mysquare.(x)      # [1 4 9 16]
xs = mysquare.(x'*x)   # [1 4 9 16; 4 16 36 64; 9 36 81 144; 16 64 144 256]
y = x .+ 1             # [2 3 4 5]
Note that the dot-call will handle broadcasting, as in the last example.
If you have multiple dot-calls in the same expression, they will be fused so that y = sqrt.(sin.(x)) makes a single pass/allocation, instead of creating a temporary array containing sin.(x) and forwarding it to the sqrt() function. (This is different from Matlab/Numpy/Octave/Python/R, which don't make such a guarantee.)
The macro @. vectorizes everything on a line, so y = @. sqrt(sin(x)) is the same as y = sqrt.(sin.(x)). This is particularly handy with polynomials, where the repeated dots can be confusing...

Appending to array within a loop

I am a SAS programmer learning R.
I have a matrix receivables read from a csv file.
I wish to read the value in the "transit" column, if the value of "id" column of a row is >= 1100000000.
I did this (loop 1):
x = vector()
for (i in 1:length(receivables[,"transit"])){
if(receivables[i,"id"] >= 1100000000){
append(x, receivables[i,"transit"]);
}
}
But, it does not work because after running the loop x is still empty.
>x
logical(0)
However, I was able to accomplish my task with (loop 2):
k=0
x=vector()
for (i in 1:length(receivables[,"transit"])){
if(receivables[i,"id"] >= 1100000000){
k=k+1
x[k] = receivables[i,"transit"]
}
}
Or, with (loop 3):
x = vector()
for (i in 1:length(receivables[,"transit"])){
if(receivables[i,"id"] >= 1100000000){
x <- append(x, receivables[i,"transit"]);
}
}
Why didn't the append function work in the loop as it would in command line?
Actually, to teach me how to fish, what is the attitude/beatitude one must bear in mind when operating functions in a loop as opposed to operating them in command line.
Which is more efficient? Loop 2 or loop 3?
Ok, a few things.
append didn't work in your first attempt because you did not assign the result to anything. In general, R does not work "in place". It is more of a functional language, which means that changes must always be assigned to something. (There are exceptions, but trying to bend this rule too early will get you in trouble.)
A bigger point is that "growing" objects in R is a big no-no. You will quickly start to wonder why anyone could possibly use R, because growing objects like this becomes very, very slow: each time the vector grows, R generally has to allocate a new, larger copy and move the old contents into it.
Instead, learn to use vectorized operations, like:
receivables[receivables[,"id"] >= 1100000000,"transit"]

Dynamic programming with Data.Vector

I am using Data.Vector and currently need to compute the contents of a vector for use in computing a cryptographic hash (SHA-1). I created the following code.
dynamic :: a -> Int -> (Int -> Vector a -> a) -> Vector a
dynamic e n f =
  let
    start = Data.Vector.replicate n e
  in step start 0
  where
    step vector i = if i == n then vector
                    else step (vector // [(i, f i vector)]) (i + 1)
I created this so that the function f filling out the vector has access to the partial results along the way. Surely something like this must already exist in Data.Vector, no?
The problem statement is the following: you are to solve a dynamic programming problem where the finished result is an array. You know the size of the array and you have a recursive function for filling it out.
You probably already saw the function generate, which takes a size n and a function f of type Int -> a and then produces a Vector a of size n. What you probably weren't aware of is that when using this function you actually do have access to the partial results.
What I mean to say is that inside the function you pass to generate you can refer to the vector you're defining and due to Haskell's laziness it will work fine (unless you make it so that the different items of the vector depend on each other in a circular fashion, of course).
Example:
import Data.Vector
tenFibs = generate 10 fib
  where fib 0 = 0
        fib 1 = 1
        fib n = tenFibs ! (n-1) + tenFibs ! (n-2)
tenFibs is now a vector containing the first 10 Fibonacci numbers.
Maybe you could use one of Data.Vector's scan functions?
http://hackage.haskell.org/packages/archive/vector/0.6.0.2/doc/html/Data-Vector.html#32
