Implementation of Array.length in OCaml - arrays

I want to understand how Array.length is implemented. I managed to write it with Array.fold_left:
let length a = Array.fold_left (fun x _ -> x + 1) 0 a
However in the standard library, fold_left uses length so that can't be it. For length there is just this line in the stdlib which I don't understand:
external length : 'a array -> int = "%array_length"
How can I write length without using fold_left?
EDIT:
I tried to do it with pattern matching, however it is not exhaustive, how can I make the matching more precise? (The aim is to remove the last element and return i+1 when only one element is left)
let length a =
let rec aux arr i =
match arr with
| [|h|] -> i+1
| [|h;t|] -> aux [|h|] (i+1)
in aux a 0;;

The array type is a primitive type, like the int type.
Many primitive functions on those primitive types are implemented either as C functions or as compiler primitives.
The Array.length function belongs to the compiler primitive category and it is defined in the standard library by:
external length : 'a array -> int = "%array_length"
Here, this declaration binds the value length to the compiler primitive %array_length (compiler primitive names start with a % symbol) with type 'a array -> int. The compiler translates such a compiler primitive into a lower-level implementation while translating the source code to either native code or bytecode.
In other words, you cannot reimplement Array.length or the array type in general in an efficient way yourself because this type is a basic building block defined by the compiler itself.
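As a small illustration of the declaration quoted above, the same primitive can be bound under another name in user code. This is only a sketch: my_length is a hypothetical name chosen for the example, and the primitive name is the one quoted from the standard library.

```ocaml
(* my_length is a hypothetical name; "%array_length" is the compiler
   primitive quoted from the standard library declaration above. *)
external my_length : 'a array -> int = "%array_length"

let () =
  (* The binding behaves like Array.length on any element type. *)
  assert (my_length [| 1; 2; 3 |] = 3);
  assert (my_length [| "a"; "b" |] = 2);
  assert (my_length [||] = 0)
```

This does not reimplement anything: the compiler still supplies the actual code, which is the point made above.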

For length there is just this line in the stdlib which I don't understand:
The external keyword indicates that this function is implemented outside of OCaml itself. Names that start with %, such as "%array_length", refer to primitives built into the compiler, whereas names without the % refer to C symbols. The OCaml runtime is implemented in C and some types, like arrays, are built-in (also called primitives).
See for example Implementing primitives in Chapter 20: Interfacing C with OCaml
I tried to do it with pattern matching, however it is not exhaustive, how can I make the matching more precise
Note that OCaml tells you which pattern is not matched:
Here is an example of a case that is not matched:
[| |]
So you have to account for empty vectors as well.
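An array pattern such as [|h|] or [|h;t|] only matches arrays of exactly that length, so no finite set of array patterns covers every size. One way to count elements without Array.length or fold_left is to probe indices until the bounds check fails. The following is a sketch only: length_by_probing is a hypothetical name, and it assumes bounds checking is enabled (that is, the code is not compiled with -unsafe).

```ocaml
(* length_by_probing is a hypothetical helper, not part of the stdlib.
   It reads a.(0), a.(1), ... until the access raises Invalid_argument. *)
let length_by_probing a =
  let rec go i =
    match a.(i) with
    | _ -> go (i + 1)                     (* index i exists, try the next one *)
    | exception Invalid_argument _ -> i   (* first invalid index = length *)
  in
  go 0

let () =
  assert (length_by_probing [||] = 0);
  assert (length_by_probing [| 'a' |] = 1);
  assert (length_by_probing [| 1; 2; 3 |] = 3)
```

This takes linear time, whereas the built-in primitive reads the length from the array's header, so it is useful for understanding only.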


Fortran Array Splice Initialization

I'm trying to initialize an array with equal spacing between 0 and 1 in fortran.
My code is :
program test
double precision :: h
double precision, dimension(:), allocatable :: x
h = 1./11
if(.not. allocated(x)) allocate(x(10))
x(1:10) = [h:(1-h):h] (*)
end program
The error I am given is "The highest data type rank permitted is INTEGER(KIND=8)" at the starred line.
I've tried to change it with
x(1:10) = h:(1-h):h
x = h:(1-h):h
x(1:10) = (/(h:(1-h):h)/)
and various other forms with no luck.
The syntax you're using is not valid Fortran and implied DO loops can't have non-integer bounds. You want something like this:
x = h * real([(i,i=1,size(x))],kind(h))
For more information, look up "array constructors" in the standard or a textbook.
Don't use (1:10) on the left side - see https://software.intel.com/en-us/blogs/2008/03/31/doctor-it-hurts-when-i-do-this
This expression
[h:(1-h):h]
is, from a Fortran point of view, broken. It looks a bit like an array slice, but that would require integers, not reals, and ( and ) rather than the [ and ]. And it looks a bit like an array constructor, for which [ and ] are correct, but h:(1-h):h isn't.
Try this
x = REAL([(ix,ix=1,10)],real64)/11
(having declared ix to be an integer). The expression (ix,ix=1,10) is an implied do-loop and, inside the [ and ] produces an array with integers from 1 to 10. I trust the rest is obvious.
Incidentally, since the lhs of my suggested replacement is the whole of x you don't actually need to explicitly allocate it, Fortran will automatically do that for you. This is one of the differences between a whole array section, such as your x(1:10) and the whole array, x.
And if that doesn't produce the results you want let us know.

What is the fastest way to flatten an array of arrays in ocaml?

What is the fastest way to flatten an array of arrays in ocaml? Note that I mean arrays, and not lists.
I'd like to do this linearly, with the lowest coefficients possible.
The OCaml Standard Library is rather deficient and requires you to implement many things from scratch. That's why we have extended libraries like Batteries and Core. I would suggest using them, so that you will not face such problems.
Still, for the sake of completeness, let's try to implement our own solution, and then compare it with a proposed fun xxs -> Array.(concat (to_list xxs)) solution.
In the implementation we have a few small problems. First of all, in order to construct an array we need to provide a value for each cell. We can't just create an uninitialized array, as this would break the type system. We could, of course, use the Obj module, but this is rather ugly. Another problem is that the input array can be empty, so we need to handle this case somehow. We could just raise an exception, but I prefer to make my functions total. It is not obvious how to create an empty array, but it is not impossible:
let empty () = Array.init 0 (fun _ -> assert false)
This is a function that will create an empty polymorphic array. We use a bottom value (a value that is an inhabitant of every type), denoted as assert false. This is typesafe and neat.
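A quick check of the helper above, as a sketch. It repeats the definition of empty from the text so that the snippet is self-contained, and it also compares the result with the array literal [||], which is not used in the original answer.

```ocaml
(* The helper from the text: a zero-length array of any element type. *)
let empty () = Array.init 0 (fun _ -> assert false)

let () =
  (* The initializer is never called because the requested length is 0,
     so assert false is never evaluated. *)
  let ints : int array = empty () in
  let strs : string array = empty () in
  assert (Array.length ints = 0);
  assert (Array.length strs = 0);
  (* The literal [||] is structurally equal to the result. *)
  assert (ints = [||])
```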
Next is how to create an array without having a default value. We could write very complex code that uses Array.init and translates the i'th index to the j'th index of the n'th array. But this is tedious, error prone and quite inefficient. Another approach would be to find the first value in the input array and use it as a default. Here comes another problem, as in the Standard Library we don't have an Array.find function. It's a shame that in the 21st century we need to write an Array.find function, but this is how life is. Again, use the Core (or Core_kernel) library or Batteries. There are lots of excellent libraries in the OCaml community available via opam.
But back to our problem: since we don't have a find function we will use our own custom solution. We could use fold_left, but it would traverse the whole array, although we only need to find the first element. There is a solution: we can use exceptions for non-local exits. Don't be afraid, this is idiomatic in OCaml. Also, raising and catching an exception in OCaml is very fast.
Other than the non-local exit, we also need to send back the value that we've found. We could use a reference cell as a communication channel, but this is rather ugly, so we will use the exception itself to carry the value for us. Since we don't know the type of an element in advance, we will use two modern features of the OCaml language: local abstract types and local modules. So let's go for the implementation:
let array_concat (type t) xxs =
  let module Search = struct exception Done of t end in
  try
    Array.iter (fun xs ->
        if Array.length xs <> 0
        then raise_notrace (Search.Done xs.(0))) xxs;
    empty ()
  with Search.Done default ->
    let len =
      Array.fold_left (fun n xs -> n + Array.length xs) 0 xxs in
    let ys = Array.make len default in
    let _ : int = Array.fold_left (fun i xs ->
        let len = Array.length xs in
        Array.blit xs 0 ys i len;
        i+len) 0 xxs in
    ys
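The non-local exit used in array_concat can be isolated into a small helper, which may make the technique easier to see. This is a sketch under assumptions: find_first and the module name M are hypothetical names introduced for illustration, not part of the original answer or of the standard library.

```ocaml
(* find_first is a hypothetical helper: it returns the first element
   satisfying p, leaving the iteration early through a local exception. *)
let find_first (type a) (p : a -> bool) (xs : a array) : a option =
  (* The local abstract type a lets the exception carry an element. *)
  let module M = struct exception Found of a end in
  try
    Array.iter (fun x -> if p x then raise_notrace (M.Found x)) xs;
    None                      (* the whole array was scanned, no match *)
  with M.Found x -> Some x    (* iteration stopped at the first match *)

let () =
  assert (find_first (fun x -> x > 2) [| 1; 2; 3; 4 |] = Some 3);
  assert (find_first (fun x -> x > 9) [| 1; 2 |] = None);
  assert (find_first (fun s -> s <> "") [| ""; "a"; "b" |] = Some "a")
```

Because the exception is declared inside the function, it cannot be confused with an exception raised by the predicate's caller.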
Now, the interesting part. Benchmarking! Let's use a proposed solution for comparison:
let default_concat xxs = Array.concat (Array.to_list xxs)
Here goes our testing harness:
let random_array =
  Random.init 42;
  let max = 100000 in
  Array.init 1000 (fun _ -> Array.init (Random.int max) (fun i -> i))
let test name f =
  Gc.major ();
  let t0 = Sys.time () in
  let xs = f random_array in
  let t1 = Sys.time () in
  let n = Array.length xs in
  Printf.printf "%s: %g sec (%d bytes)\n%!" name (t1 -. t0) n
let () =
  test "custom " array_concat;
  test "default" default_concat
And... the results:
$ ./array_concat.native
custom : 0.38 sec (49203647 bytes)
default: 0.20 sec (49203647 bytes)
They don't surprise me, by the way. Our solution is twice as slow as the standard library. The moral of this story is:
Always benchmark before optimizing
Use extended libraries (core, batteries, containers, ...)
Update (concatenating arrays using Base)
With the base library, we can concatenate arrays easily,
let concat_base = Array.concat_map ~f:ident
And here's our benchmark:
./example.native
custom : 0.524071 sec (49203647 bytes)
default: 0.308085 sec (49203647 bytes)
base : 0.201688 sec (49203647 bytes)
So now the base implementation is the fastest and the smallest.

Why do I need a '<' overload for an Array class?

I'm trying to add functionality to an Array class.
So I attempted to add a sort() similar to Ruby's lexicon.
For this purpose I chose the name 'ricSort()' in deference to Swift's sort().
But the compiler says it can't find an overload for '<', although 'sort({$0 < $1})' by itself works okay.
Why?
var myArray:Array = [5,4,3,2,1]
myArray.sort({$0 < $1}) <-- [1, 2, 3, 4, 5]
myArray.ricSort() <-- this doesn't work.
Here's a solution that is close to what you are looking for, followed by a discussion.
var a:Int[] = [5,4,3,2,1]
extension Array {
func ricSort(fn: (lhs: T, rhs: T) -> Bool) -> T[] {
let tempCopy = self.copy()
tempCopy.sort(fn)
return tempCopy
}
}
var b = a.ricSort(<) // [1, 2, 3, 4, 5]
There are two problems with the original code. The first, a fairly simple mistake, is that Array.sort returns no value whatsoever (represented as () which is called void or Unit in some other languages). So your function, which ends with return self.sort({$0 < $1}) doesn't actually return anything, which I believe is contrary to your intention. So that's why it needs to return tempCopy instead of return self.sort(...).
This version, unlike yours, makes a copy of the array to mutate, and returns that instead. You could easily change it to make it mutate itself (the first version of the post did this if you check the edit history). Some people argue that sort's behavior (mutating the array, instead of returning a new one) is undesirable. This behavior has been debated on some of the Apple developer lists. See http://blog.human-friendly.com/swift-arrays-the-bugs-the-bad-and-the-ugly-incomplete
The other problem is that the compiler does not have enough information to generate the code that would implement ricSort, which is why you are getting the type error. It sounds like you are wondering why it is able to work when you use myArray.sort but not when you try to execute the same code inside a function on the Array.
The reason is that you told the compiler what myArray consists of:
var myArray:Array = [5,4,3,2,1]
This is shorthand for
var myArray: Array<Int> = [5,4,3,2,1]
In other words, the compiler inferred that myArray consists of Int, and it so happens that Int conforms to the Comparable protocol that supplies the < operator (see: https://developer.apple.com/library/prerelease/ios/documentation/General/Reference/SwiftStandardLibraryReference/Comparable.html#//apple_ref/swift/intf/Comparable). From the docs, you can see that < has the following signature:
#infix func < (lhs: Self, rhs: Self) -> Bool
Depending on what languages you have a background in, it may surprise you that < is defined in terms of the language, rather than just being a built-in operator. But if you think about it, < is just a function that takes two arguments and returns true or false. The #infix means that it can appear between its two arguments, so you don't have to write < 1 2.
(The type "Self" here means, "whatever the type is that this protocol implements," see Protocol Associated Type Declaration in https://developer.apple.com/library/prerelease/ios/documentation/swift/conceptual/swift_programming_language/Declarations.html#//apple_ref/doc/uid/TP40014097-CH34-XID_597)
Compare this to the signature of Array.sort: isOrderedBefore: (T, T) -> Bool
That is the generic signature. By the time the compiler is working on this line of code, it knows that the real signature is isOrderedBefore: (Int, Int) -> Bool
The compiler's job is now simple: it just has to figure out whether there is a function named < that matches the expected signature, namely, one that takes two values of type Int and returns a Bool. Obviously < does match the signature here, so the compiler allows the function to be used here. It has enough information to guarantee that < will work for all values in the array. This is in contrast to a dynamic language, which cannot anticipate this. You have to actually attempt to perform the sort in order to learn whether the types can actually be sorted. Some dynamic languages, like JavaScript, will make every possible attempt to continue without failing, so that expressions such as 0 < "1" evaluate correctly, while others, such as Python and Ruby, will throw an exception. Swift does neither: it prevents you from running the program until you have fixed the bug in your code.
So, why doesn't ricSort work? Because there is no type information for it to work with until you have created an instance of a particular type. It cannot infer whether the ricSort will be correct or not.
For example, suppose instead of myArray, I had this:
enum Color {
case Red, Orange, Yellow, Green, Blue, Indigo, Violet
}
var myColors = [Color.Red, Color.Blue, Color.Green]
var sortedColors = myColors.ricSort() // Kaboom!
In that case, myColors.ricSort would fail based on a type error, because < hasn't been defined for the Color enumeration. This can happen in dynamic languages, but is never supposed to happen in languages with sophisticated type systems.
Can I still use myColors.sort? Sure. I just need to define a function that takes two colors and says whether the first comes before the second in some order that makes sense for my domain (EM wavelength? Alphabetical order? Favorite color?):
func colorComesBefore(lhs: Color, rhs: Color) -> Bool { ... }
Then, I can pass that in: myColors.sort(colorComesBefore)
This shows, hopefully, that in order to make ricSort work, we need to construct it in such a way that its definition guarantees that when it is compiled, it can be shown to be correct, without having to run it or write unit tests.
Hopefully that explains the solution. Some proposed modifications to the Swift language may make this less painful in the future. In particular creating parameterized extensions should help.
The reason you are getting an error is that the compiler cannot guarantee that the type stored in the Array can be compared with the < operator.
You can see the same sort closure on an array whose type can be compared using < like an Int:
var list = [3,1,2]
list.sort {$0 < $1}
But you will get an error if you try to use a type that cannot be compared with <:
var URL1 = NSURL()
var URL2 = NSURL()
var list = [URL1, URL2]
list.sort {$0 < $1} // error
Especially with all the syntax you can leave out in Swift, I don't see a reason to define a method for this. The following is valid and works as expected:
list.sort(<)
You can do this because < actually defines a function that takes two Ints and returns a Bool just like the sort method is expecting.

Concise notation for last element of an array

Is there a concise notation to access last element of an array, similar to std::vector::back() in C++? Do I have to write:
veryLongArrayName.[veryLongArrayName.Length-1]
each time?
Expanding from comment
The built-in option is Seq.last veryLongArrayName, but note that this is O(N) rather than O(1), so for all but the smallest arrays probably too inefficient for practical use.
That said, there's no harm in abstracting this functionality yourself:
[<CompilationRepresentation(CompilationRepresentationFlags.ModuleSuffix)>]
[<RequireQualifiedAccess>]
module Array =
let inline last (arr:_[]) = arr.[arr.Length - 1]
Now you can do Array.last veryLongArrayName with no overhead whatsoever, while keeping the code very idiomatic and readable.
I cannot find it in the official documentation, but F# 4 seems to have Array.last implemented out of the box:
/// Returns the last element of the array.
/// array: The input array.
val inline last : array:'T [] -> 'T
Link to implementation at github.
As an alternative to writing a function for _[], you can also write an extension property for IList<'T>:
open System.Collections.Generic
[<AutoOpen>]
module IListExtensions =
type IList<'T> with
member self.Last = self.[self.Count - 1]
let lastValue = [|1; 5; 13|].Last // 13

Checking OCaml type signature from C

Let's say I have an OCaml function
let _ = register "cbf_coh_insert" (fun k v -> print_endline ("Inserted key=" ^ k ^ " value=" ^ v))
That is a function that takes two arguments. On the C side, I would call that with caml_callback2(*caml_named_value("cbf_coh_insert"), k, v);. Is there a way, on the C side, to check that the number of arguments (2 in this case) match? Other than I guess calling it and trying to trap a SIGSEGV. Thanks!
UPDATE: some background.
No way.
This should be ensured at compile time (either manually or by code generation or by parsing and checking whether C and OCaml code are in sync)
UPDATE
Example register function :
let on_cbf_coh_insert (f : string -> string -> unit) = register "cbf_coh_insert" f
UPDATE
I wish it was possible to pass a closure/let binding straight into C.
Why do you think it is not possible? Look at existing bindings that do this all the time.
BTW This question is a perfect illustration for XY problem.
