The documentation of Data.Array reads:
Haskell provides indexable arrays, which may be thought of as
functions whose domains are isomorphic to contiguous subsets of the
integers. Functions restricted in this way can be implemented
efficiently; in particular, a programmer may reasonably expect rapid
access to the components.
I wonder how fast (!) and (//) can be. Can I expect O(1) complexity from them, as I would from their imperative counterparts?
In general, yes, you should be able to expect O(1) from !, although I'm not sure if that's guaranteed by the standard.
You might want to look at the vector package if you want faster arrays, though (it uses stream fusion). It is also better designed.
Note that // is probably O(n), though, because it has to traverse the list (just like an imperative program would). If you need a lot of mutation you can use MArray or MVector.
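For concreteness, here is a small sketch of the two operations under discussion (the bounds and values are made up for illustration):

    import Data.Array

    main :: IO ()
    main = do
      let a = listArray (0, 4) [10, 20, 30, 40, 50] :: Array Int Int
      print (a ! 2)           -- single lookup, typically O(1)
      print (a // [(2, 99)])  -- bulk update: builds a whole new array, so roughly O(n)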
My first question is whether I can express the following formula in Z3Py:
Exists i::Integer s.t. (0<=i<|arr|) & (avg(arr)+t<arr[i])
This means: whether there is a position i, with 0<=i<|arr|, whose value arr[i] is greater than the average of the array, avg(arr), plus a given threshold t.
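Concretely, I imagine writing something along these lines in Z3Py (just a rough sketch: I track the length n separately because Z3 arrays are unbounded, and I leave avg as a free symbol because I don't know how to define an average over the built-in array theory):

    from z3 import Array, IntSort, Int, And, Exists, Solver

    arr = Array('arr', IntSort(), IntSort())   # index -> value
    n   = Int('n')                             # stands for |arr|
    avg = Int('avg')                           # placeholder for avg(arr)
    t   = Int('t')
    i   = Int('i')

    phi = Exists(i, And(i >= 0, i < n, avg + t < arr[i]))

    s = Solver()
    s.add(phi)
    print(s.check())   # sat, but only because avg is left unconstrained here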
I know this kind of expression can be written in Dafny, and (since Dafny uses Z3 underneath) I guess it can also be done in Z3Py.
My second question is: how expressive is the decidable fragment involving arrays in Z3?
I read this paper on how the full theory of arrays is not decidable (http://theory.stanford.edu/~arbrad/papers/arrays.pdf), and how only a particular fragment, the array property fragment, is.
Is there any interesting paper/tutorial on what can and cannot be done with arrays+quantifiers+functions in Z3?
You found the best paper to read regarding reasoning with arrays, so I doubt there's a better resource or tutorial out there for you.
I think sequence logic (not yet officially part of SMTLib, but supported by z3) is the right logic to use for reasoning about these sorts of problems; see: https://microsoft.github.io/z3guide/docs/theories/Sequences/
Having said that, most properties about arrays/sequences of "arbitrary size" require inductive proofs. This is because most interesting functions on them are essentially recursive (or iterative), and induction is the only way to prove properties of such programs. While SMT solvers have improved significantly in their support for recursive definitions and induction, they still perform nowhere near as well as a traditional theorem prover. (This is, of course, to be expected.)
I'd recommend looking at sequence logic and playing around with recursive definitions. You might get some mileage out of that, though don't expect proofs for anything that requires induction, especially if the inductive hypothesis needs some clever invariant to be specified.
Note that if you know the length of your array concretely (i.e., 10, 15, or some other hopefully not too large number), then it's best to allocate the elements symbolically yourself and not use arrays/sequences at all. (And you can repeat your proof for lengths 0, 1, 2, ... up to some fixed number.) But if you want proofs that work for arbitrary lengths, your best bet is to use sequences in z3, not arrays, with all the caveats I mentioned above.
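To make that last suggestion concrete, here is a minimal Z3Py sketch assuming a made-up concrete length of 5. The bounded existential becomes a finite disjunction, and multiplying through by the length avoids integer division when expressing the average:

    from z3 import Int, Or, Sum, Solver, sat

    n   = 5                                     # assumed concrete length
    arr = [Int('a_%d' % i) for i in range(n)]   # each element is its own symbolic integer
    t   = Int('t')

    # avg(arr) + t < arr[i]  is rewritten as  Sum(arr) + n*t < n*arr[i]
    exists_big = Or([Sum(arr) + n * t < n * arr[i] for i in range(n)])

    s = Solver()
    s.add(t > 0, exists_big)                    # t > 0 just to make the model interesting
    if s.check() == sat:
        print(s.model())                        # witness: some element exceeds avg(arr) + t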
Say that I have N threads accessing an array with N elements. The array has been prepared before the threads start. Each thread will access a different element (thread i will access element i, both for reading and writing).
In theory, I'd expect such an access pattern not to cause any race conditions, but will Ruby actually guarantee thread safety in this case?
but will Ruby actually guarantee thread safety in this case
Ruby does not have a defined memory model, so there are no guarantees of any kind.
YARV has a Giant VM Lock which prevents multiple Ruby threads from running at the same time, which gives some implicit guarantees, but this is a private, internal implementation detail of YARV. For example, TruffleRuby, JRuby, and Rubinius can run multiple Ruby threads in parallel.
Since there is no specification of what the behavior should be, any Ruby implementation is free to do whatever it wants. Most commonly, Ruby implementors try to mimic the behavior of YARV, but even that is not well-defined. In YARV, data structures are generally not thread-safe, so if you want to mimic the behavior of YARV, do you make all your data structures not thread-safe? But in YARV multiple threads also cannot run at the same time, so a lot of operations are implicitly thread-safe anyway; so, if you want to mimic YARV, should you make your data structures thread-safe?
Or, in order to mimic YARV, should you prevent multiple threads from running at the same time? But being able to run multiple threads in parallel is actually one of the reasons why people choose, for example, JRuby over YARV.
As you can see, this is very much not a trivial question.
The best solution is to verify the behavior of each Ruby implementation separately. Actually, that is the second best solution.
The best solution is to use something like the concurrent-ruby gem, where someone else has already done the work of verifying the behavior of each Ruby implementation for you. The concurrent-ruby maintainers have a close relationship with several Ruby implementations (Chris Seaton, one of the two lead maintainers of concurrent-ruby, is also the lead developer of TruffleRuby, a JRuby core developer, and a member of ruby-core, for example), and so you can generally be certain that everything that is in concurrent-ruby is safe on all supported Ruby implementations (currently YARV, JRuby, and TruffleRuby).
Concurrent Ruby has a Concurrent::Array class which is thread-safe; you can see how it is implemented at https://github.com/ruby-concurrency/concurrent-ruby/blob/master/lib/concurrent-ruby/concurrent/array.rb. As you can see, for YARV, Concurrent::Array is actually the same as ::Array, but for other implementations, more work is required.
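As a rough illustration of how the original N-threads/N-elements scenario might look with it (the numbers here are made up):

    require 'concurrent'   # from the concurrent-ruby gem

    n   = 8
    arr = Concurrent::Array.new(n, 0)   # thread-safe on YARV, JRuby, and TruffleRuby

    threads = (0...n).map do |i|
      Thread.new { arr[i] += 1 }        # each thread only touches its own index
    end
    threads.each(&:join)

    p arr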
The concurrent-ruby developers are also working on specifying Ruby's memory model, so that in the future, both programmers know what to expect and what not to expect, and implementors know what they are allowed to optimize and what they aren't.
Alternatives to Mutable Arrays
In standard Ruby implementations, an Array is not thread-safe. However, a Queue is. On the other hand, a Queue is not quite an Array, so you don't have all the methods on Queue that you may be looking for.
The Concurrent Ruby gem provides a thread-safe Array class, but as a rule thread-safe classes will be slower than those that aren't. Depending on your data this may not matter, but it's certainly a design consideration.
If you know from the beginning that you're going to be heavily reliant on threading, you should build your application on a Ruby implementation that offers concurrency and threading to begin with (e.g. consider JRuby or TruffleRuby), and design your application to take advantage of Ractors or use other concurrency models that treat data as immutable rather than sharing objects between threads.
Immutable data is a better pattern for threading than shared objects. You may or may not have problems with any given mutable object given enough due care, but Ractors and fiber-local variables should be faster and safer than trying to make mutable objects thread-safe. YMMV, though.
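For contrast, here is a minimal sketch of the Queue-based approach mentioned above (the worker payload is invented for the example):

    results = Queue.new                        # the core Queue class is thread-safe
    workers = 4.times.map do |i|
      Thread.new { results << [i, i * i] }     # pushing from multiple threads is safe
    end
    workers.each(&:join)

    p Array.new(results.size) { results.pop }  # drain the queue into a plain Array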
I have a Python function that loops over a list and I want to convert it to Cython for performance gain.
The lists it accepts contain a mix of strings, integers, and floats, so I'm not sure how to statically type the variables involved (I don't know C).
What would be the most efficient way to implement something like this in Cython?
You seem to be hoping for a C type that has all the flexibility of the Python object, but is somehow magically faster.
There is basically a good option and a bad option here:
The good option is to accept that such a type doesn't really exist.
Therefore, you should leave your data extracted from the list untyped so that it remains a regular Python object. Not everything in Cython needs to be typed - the vast majority of Python code should run unchanged.
It might be worth typing your list as list, since Cython can generate slightly more efficient loops when it knows the iterable is a list.
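A minimal sketch of this "good option" in a .pyx module might look like the following (process_items and the filtering logic are made-up names for illustration):

    def process_items(list items):        # typing the argument as `list` speeds up the loop
        cdef double total = 0.0           # C-level accumulator
        cdef Py_ssize_t count = 0
        for item in items:                # `item` stays an untyped Python object
            if isinstance(item, (int, float)):
                total += item             # Cython coerces the Python number to a C double
                count += 1
        return total, count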
The bad option is to use a feature of C called a union, which represents a variable that is one of a limited number of dissimilar types. I am not recommending this (especially for someone who doesn't know C already) because there is no "easy" Cython wrapping (you'll have to dive directly into the C details). You will find handling strings in a union particularly challenging.
Pursue this option at your own peril.
Quite often when I look at legacy Fortran code for linear algebra subroutines, I notice this:
Instead of working with a bunch of separate arrays, they will concatenate all their arrays into one big workspace array and use pointers to demarcate where a variable begins.
They even concatenate independent non-array variables into arrays. Are there benefits to doing this, and should I be doing this if I want to write optimized code?
No, don't do that if you want to keep a sane mind. This is a practice from the 1960s-1980s, when dynamic allocation was not available and people wanted only a small number of work arrays in the argument list.
In old subroutines you had a long list of arguments and then one or two working arrays:
call SUB(N1, N2, N3, M1, M2, M3, A, B, C, WRK, IWRK)
If you needed to pass 10 work arrays instead of one, the call would become unwieldy.
But in the 21st century the most important thing is to keep your code readable and clear, and only after that to optimize it.
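For contrast, the modern equivalent is simply to declare local allocatable work arrays where you need them (a minimal sketch with made-up names):

    ! Modern style: each intermediate quantity gets its own allocatable array,
    ! instead of an offset into one packed WRK array.
    subroutine solve(n, a, b)
      implicit none
      integer, intent(in)  :: n
      real,    intent(in)  :: a(n)
      real,    intent(out) :: b(n)
      real,    allocatable :: tmp(:)   ! local work array, allocated on demand

      allocate(tmp(n))
      tmp = 2.0 * a                    ! whatever intermediate work is needed
      b   = tmp + 1.0
      deallocate(tmp)                  ! optional; freed automatically on return anyway
    end subroutine solve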
BTW, having some quantities too close in memory can even be detrimental due to false sharing.
That does not mean you should fragment your memory too much, but it makes sense to keep stuff together when you will indeed access it sequentially. That's why structures of arrays are used instead of arrays of structures.
In general (independent of the programming language used): having "consecutive" blocks of, well, anything is often helpful.
The operating system, or even the hardware, might be able to benefit from having a single huge section of memory to deal with, compared to looking at 50 or 100 different locations.
A good starting point for such discussions would be this question, for example.
But I agree 100% with the other answer: unless you get massive performance gains out of using such techniques, you should always prefer to write "clean" (aka readable) code. And that translates to avoiding such practices.
Do I understand the new Standard correctly that shared_ptr is not required to use a reference count, only that it is likely to be implemented this way?
I could imagine an implementation that uses a hidden linked list somehow. In N3291, "20.7.2.2.5 (8) shared_ptr observers [util.smartptr.shared.obs]", the note says
[ Note: use_count() is not necessarily efficient. — end note ]
which gave me that idea.
You're right, nothing in the spec requires the use of an explicit "counter", and other possibilities exist.
For example, a linked-list implementation was suggested for the implementation of boost's shared_ptr; however, the proposal was ultimately rejected because it introduced costs in other areas (size, copy operations, and thread safety).
Abstract description
Some people say that shared_ptr is a "reference counting smart pointer". I don't think that is the right way to look at it.
Actually, shared_ptr is all about (non-exclusive) ownership: all the shared_ptr instances that are copies of a shared_ptr initialised with a pointer p are owners.
shared_ptr keeps track of the set of owners, to guarantee that:
while the set of owners is non-empty, delete p is not called;
when the set of owners becomes empty, delete p (or a copy of the destruction functor D) is called immediately.
Of course, to determine when the set of owners becomes empty, shared_ptr only needs a counter. The abstract description is just slightly easier to think about.
Possible implementation techniques
To keep track of the number of owners, a counter is not only the most obvious approach, it's also relatively obvious how to make it thread-safe using atomic compare-and-modify operations.
To keep track of all the owners, a linked list of owners is not only the obvious solution, it's also an easy way to avoid allocating any memory for each set of owners. The problem is that it isn't easy to make such an approach efficiently thread-safe (anything can be made thread-safe with a global lock, which is against the very idea of parallelism).
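For illustration, here is a highly simplified sketch of the counter-based approach (this is not how any real standard library does it; it ignores weak_ptr, custom deleters, aliasing, make_shared, assignment, and so on):

    #include <atomic>

    template <typename T>
    class toy_shared_ptr {
        struct control_block {
            std::atomic<long> use_count;
            T* ptr;
            explicit control_block(T* p) : use_count(1), ptr(p) {}
        };
        control_block* cb_ = nullptr;

    public:
        explicit toy_shared_ptr(T* p) : cb_(new control_block(p)) {}

        toy_shared_ptr(const toy_shared_ptr& other) : cb_(other.cb_) {
            if (cb_) cb_->use_count.fetch_add(1, std::memory_order_relaxed);  // join the owner set
        }

        toy_shared_ptr& operator=(const toy_shared_ptr&) = delete;  // omitted for brevity

        ~toy_shared_ptr() {
            // leave the owner set; the last owner destroys the pointee and the block
            if (cb_ && cb_->use_count.fetch_sub(1, std::memory_order_acq_rel) == 1) {
                delete cb_->ptr;
                delete cb_;
            }
        }

        long use_count() const { return cb_ ? cb_->use_count.load() : 0; }
        T*   get()       const { return cb_ ? cb_->ptr : nullptr; }
    };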
In the case of a multi-threaded implementation
On the one hand, we have a small, fixed-size (unless a custom destruction function is used) memory allocation that is very easy to optimise, plus simple atomic integer operations.
On the other hand, there is costly and complicated linked-list handling; and if a per-owner-set mutex is needed (as I think it is), the cost of memory allocation is back, at which point we can just replace the mutex with the counter!
About multiple possible implementations
How many times have I read that many implementations are possible for a "standard" class?
Who has never heard the fantasy that the complex class could be implemented with polar coordinates? This is idiotic, as we all know: complex must use Cartesian coordinates. If polar coordinates are preferred, another class must be created; there is no way a polar complex class is going to be used as a drop-in replacement for the usual complex class.
Same for a (non-standard) string class: there is no reason for a string class to be internally NUL-terminated and not store the length as an integer, just for the fun and inefficiency of repeatedly calling strlen.
We now know that designing std::string to tolerate COW was a bad idea; it is the reason for the unusual invalidation semantics of const iterators.
std::vector is now guaranteed to be contiguous.
The end of the fantasy
At some point, the fantasy where standard classes have many significantly different reasonable implementations has to be dropped. Standard classes are primitive building blocks; not only should they be very efficient, they should also have predictable efficiency.
A programmer should be able to make portable assumptions about the relative speed of basic operations. A complex class is useless for serious number crunching if even the simplest addition turns into a bunch of transcendental computations. If a string class is not guaranteed to have very fast copying via data sharing, the programmer will have to minimize string copies.
An implementer is free to choose a different implementation technique only when it doesn't make a common cheap operation extremely costly (by comparison).
For many classes, this means that there is exactly one viable implementation strategy, with sometimes a few degrees of freedom (like the size of a block in a std::deque).