Is shared_ptr<> not required to use a reference count?

Do I understand the new standard correctly in that shared_ptr is not required to use a reference count, only that it is likely to be implemented that way?
I could imagine an implementation that uses a hidden linked list somehow. In N3291, 20.7.2.2.5 (8), shared_ptr observers [util.smartptr.shared.obs], the note says
[ Note: use_count() is not necessarily efficient. — end note ]
which gave me that idea.

You're right, nothing in the spec requires the use of an explicit "counter", and other possibilities exist.
For example, a linked-list implementation was suggested for the implementation of boost's shared_ptr; however, the proposal was ultimately rejected because it introduced costs in other areas (size, copy operations, and thread safety).
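Whatever the internal technique, the value reported by use_count() is what the standard pins down. A minimal sketch of that observable behaviour (the helper name is mine):

```cpp
#include <memory>

// Returns the owner count observed after copying a shared_ptr once.
// The standard only guarantees the *value* returned, not that a counter
// is stored: an implementation could, in principle, walk a list of owners.
long owners_after_copy() {
    auto p = std::make_shared<int>(42);
    auto q = p;               // q joins the set of owners
    return p.use_count();     // reports 2, however it is computed
}
```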

Abstract description
Some people say that shared_ptr is a "reference-counting smart pointer". I don't think that is the right way to look at it.
Actually, shared_ptr is all about (non-exclusive) ownership: all the shared_ptr instances that are copies of a shared_ptr initialised with a pointer p are owners.
shared_ptr keeps track of the set of owners to guarantee that:
while the set of owners is non-empty, delete p is not called;
when the set of owners becomes empty, delete p (or a copy of D, the destruction functor) is called immediately.
Of course, to determine when the set of owners becomes empty, shared_ptr only needs a counter. The abstract description is just slightly easier to think about.
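A small sketch of those two guarantees, using a counting deleter to make the "set of owners becomes empty" moment observable (names are illustrative):

```cpp
#include <memory>

// Counts how many times the managed object has been destroyed.
int destroyed = 0;

void make_and_drop_owners() {
    auto d = [](int* p) { ++destroyed; delete p; };  // copy of D, the destruction functor
    std::shared_ptr<int> a(new int(1), d);
    {
        std::shared_ptr<int> b = a;   // two owners now
    }                                 // b leaves the set: still non-empty, no delete
    // here `a` still owns the int, so destroyed == 0
}   // last owner leaves the set: the deleter runs exactly once
```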
Possible implementations techniques
To keep track of the number of owners, a counter is not only the most obvious approach; it is also relatively obvious how to make it thread-safe using atomic compare-and-modify operations.
To keep track of all the owners, a linked list of owners is not only the obvious solution, but also an easy way to avoid allocating any memory for each set of owners. The problem is that it isn't easy to make such an approach efficiently thread-safe (anything can be made thread-safe with a global lock, but that goes against the very idea of parallelism).
In the case of a multi-threaded implementation
On the one hand, we have a small, fixed-size memory allocation (unless a custom destruction function is used) that is very easy to optimise, plus simple atomic integer operations.
On the other hand, there is costly and complicated linked-list handling; and if a per-owner-set mutex is needed (as I think it is), the cost of memory allocation is back, at which point we might as well replace the mutex with the counter!
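The counter approach can be sketched in a few lines. This is not the layout of any real standard library, just an illustration of why the atomic-counter design is so cheap:

```cpp
#include <atomic>

// Illustrative control block: one atomic counter shared by all owners.
struct ctrl_block {
    std::atomic<long> use_count{1};   // created with one owner
};

// Copying an owner is a single atomic increment...
void add_owner(ctrl_block* cb) {
    cb->use_count.fetch_add(1, std::memory_order_relaxed);
}

// ...and destroying one is a single atomic decrement. The owner that
// brings the count to zero would delete the managed object (elided here).
bool release_owner(ctrl_block* cb) {
    return cb->use_count.fetch_sub(1, std::memory_order_acq_rel) == 1;
}
```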
About multiple possible implementations
How many times have I read that many implementations are possible for a "standard" class?
Who has never heard the fantasy that the complex class could be implemented using polar coordinates? That is absurd, as we all know: complex must use Cartesian coordinates. If polar coordinates are preferred, another class must be created; there is no way a polar complex class could serve as a drop-in replacement for the usual complex class.
The same goes for a (non-standard) string class: there is no reason for a string class to be internally NUL-terminated rather than storing the length as an integer, just for the fun and inefficiency of repeatedly calling strlen.
We now know that designing std::string to tolerate copy-on-write (COW) was a bad idea; it is the reason for the unusual invalidation semantics of its const iterators.
std::vector is now guaranteed to be contiguous.
The end of the fantasy
At some point, the fantasy that standard classes have many significantly different reasonable implementations has to be dropped. Standard classes are primitive building blocks; not only should they be very efficient, they should also have predictable efficiency.
A programmer should be able to make portable assumptions about the relative speed of basic operations. A complex class is useless for serious number crunching if even the simplest addition turns into a bunch of transcendental computations. If a string class is not guaranteed to have very fast copies via data sharing, the programmer will have to minimize string copies.
An implementer is free to choose a different implementation technique only when it doesn't make a commonly cheap operation extremely costly (by comparison).
For many classes, this means that there is exactly one viable implementation strategy, with sometimes a few degrees of freedom (such as the size of a block in a std::deque).


Is access to ruby Array thread-safe?

Say that I have N threads accessing an array with N elements. The array has been prepared before the threads start. Each thread will access a different element (thread i will access element i, both for reading and writing).
In theory, I'd expect such an access pattern not to cause any race conditions, but will Ruby actually guarantee thread safety in this case?
but will Ruby actually guarantee thread safety in this case
Ruby does not have a defined memory model, so there are no guarantees of any kind.
YARV has a Giant VM Lock which prevents multiple Ruby threads from running at the same time, which gives some implicit guarantees, but this is a private, internal implementation detail of YARV. For example, TruffleRuby, JRuby, and Rubinius can run multiple Ruby threads in parallel.
Since there is no specification of what the behavior should be, any Ruby implementation is free to do whatever they want. Most commonly, Ruby implementors try to mimic the behavior of YARV, but even that is not well-defined. In YARV, data structures are generally not thread-safe, so if you want to mimic the behavior of YARV, do you make all your data structures not thread-safe? But in YARV, also multiple threads cannot run at the same time, so in a lot of cases, operations are implicitly thread-safe, so if you want to mimic YARV, should you make your data structures thread-safe?
Or, in order to mimic YARV, should you prevent multiple threads from running at the same time? But, being able to run multiple threads in parallel is actually one of the reasons why people choose, for example JRuby over YARV.
As you can see, this is very much not a trivial question.
The best solution is to verify the behavior of each Ruby implementation separately. Actually, that is the second best solution.
The best solution is to use something like the concurrent-ruby Gem where someone else has already done the work of verifying the behavior of each Ruby implementation for you. The concurrent-ruby maintainers have a close relationship with several Ruby implementations (Chris Seaton, one of the two lead maintainers of concurrent-ruby is also the lead developer of TruffleRuby, a JRuby core developer, and a member of ruby-core, for example), and so you can generally be certain that everything that is in concurrent-ruby is safe on all supported Ruby implementations (currently YARV, JRuby, and TruffleRuby).
Concurrent Ruby has a Concurrent::Array class which is thread-safe. You can see how it is implemented here: https://github.com/ruby-concurrency/concurrent-ruby/blob/master/lib/concurrent-ruby/concurrent/array.rb As you can see, for YARV, Concurrent::Array is actually the same as ::Array, but for other implementations, more work is required.
The concurrent-ruby developers are also working on specifying Ruby's memory model, so that in the future, both programmers know what to expect and what not to expect, and implementors know what they are allowed to optimize and what they aren't.
Alternatives to Mutable Arrays
In standard Ruby implementations, an Array is not thread-safe. However, a Queue is. On the other hand, a Queue is not quite an Array, so you don't have all the methods on Queue that you may be looking for.
The Concurrent Ruby gem provides a thread-safe Array class, but as a rule thread-safe classes will be slower than those that aren't. Depending on your data this may not matter, but it's certainly a design consideration.
If you know from the beginning that you're going to be heavily reliant on threading, you should build your application on a Ruby implementation that offers concurrency and threading to begin with (e.g. consider JRuby or TruffleRuby), and design your application to take advantage of Ractors or use other concurrency models that treat data as immutable rather than sharing objects between threads.
Immutable data is a better pattern for threading than shared objects. You may or may not have problems with any given mutable object given enough due care, but Ractors and fiber-local variables should be faster and safer than trying to make mutable objects thread-safe. YMMV, though.

linear genetic programming constants

What are the advantages and disadvantages of explicitly defining constants within the memory register, as opposed to initially seeding individuals with constants and mutating them through genetic operators?
Specifically, reading this from a book I have on Linear Genetic Programming:
In our implementation all registers hold floating-point values. Internally,
constants are stored in registers that are write-protected, i.e., may not become
destination registers. As a consequence, the set of possible constants
remains fixed.
What I am wondering is: is this a better approach than simply generating the constants randomly at initialisation, integrating them into the program, and then improving them through the training process?
Thanks
From your use of the term memory register I'm going to assume you are referring to a memory enhanced genetic programming technique.
Even with memory registers it isn't normal to use constants. Instead, a technique like Memory with Memory in genetic programming can allow you to mutate your memory registers slowly instead of making abrupt changes that make the memory irrelevant. Still, it is common to introduce the values with random initialization so you can do random restarts to attempt to escape local optima.
Perhaps you are thinking of setting all the memory registers to zero or some human estimated values?
EDIT:
It sounds like they have some values they wish to carry through all generations unmodified. Personally, I would think that kind of value is better represented by the environment your GA lives in, assuming they are shared across all instances.
If they are not shared across instances, then this may reflect an intention to seed the population with some random values at setup which should not be allowed to evolve (at least, not during a given run).
The first type of constant (the environment constant) can be useful if you need to evolve against a complex background and you want to get good fit against a fixed environment first before trying to get a good fit for the more complex case of variable environments.
The second kind of constant (per DNA instance) isn't something I have used, but I guess if the analyst has identified some "volatile" variables that are too chaotic to allow continuous evolution it might be useful to "protect" them as constants for a run.

How fast is Data.Array?

The documentation of Data.Array reads:
Haskell provides indexable arrays, which may be thought of as
functions whose domains are isomorphic to contiguous subsets of the
integers. Functions restricted in this way can be implemented
efficiently; in particular, a programmer may reasonably expect rapid
access to the components.
I wonder how fast can (!) and (//) be. Can I expect O(1) complexity from these, as I would have from their imperative counterparts?
In general, yes, you should be able to expect O(1) from !, although I'm not sure whether that's guaranteed by the standard.
You might want to look at the vector package if you want faster arrays, though (through its use of stream fusion). It is also better designed.
Note that // is probably O(n), though, because it has to traverse the list (just as an imperative program would). If you need a lot of mutation, you can use MArray or MVector.

Is it okay to use functions to stay organized in C?

I'm a relatively new C programmer, and I've noticed that many conventions from other higher-level OOP languages don't exactly hold true on C.
Is it okay to use short functions to have your coding stay organized (even though it will likely be called only once)? An example of this would be 10-15 lines in something like void init_file(void), then calling it first in main().
I would have to say that not only is it OK, it's generally encouraged. Just don't overly fragment the train of thought by creating myriad tiny functions. Try to ensure that each function performs a single cohesive, well... function, with a clean interface (too many parameters can be a hint that the function is performing work which is not sufficiently separate from its caller).
Furthermore, well-named functions can serve to replace comments that would otherwise be needed. As well as providing re-use, functions can also (or instead) provide a means to organize the code and break it down into smaller units which can be more readily understood. Using functions in this way is very much like creating packages and classes/modules, though at a more fine-grained level.
Yes. Please. Don't write long functions. Write short ones that do one thing and do it well. The fact that they may only be called once is fine. One benefit is that if you name your function well, you can avoid writing comments that will get out of sync with the code over time.
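To make the idea concrete, here is a sketch of the init_file example from the question. The file contents and path are hypothetical; the point is that main() reads as a table of contents even though the helper is called only once:

```cpp
#include <cstdio>

// Hypothetical single-use helper: setup work pulled out of main()
// so that main() states *what* happens, not *how*.
FILE* init_file(const char* path) {
    FILE* f = std::fopen(path, "w");
    if (f != nullptr) {
        std::fputs("# generated header\n", f);  // illustrative setup work
    }
    return f;
}

// In main() one would then simply write:
//     FILE* f = init_file("out.txt");
//     if (f == NULL) return 1;
```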
If I can take the liberty to do some quoting from Code Complete:
(These reason details have been abbreviated and in spots paraphrased, for the full explanation see the complete text.)
Valid Reasons to Create a Routine
Note the reasons overlap and are not intended to be independent of each other.
Reduce complexity - The single most important reason to create a routine is to reduce a program's complexity (hide away details so you don't need to think about them).
Introduce an intermediate, understandable abstraction - Putting a section of code into a well-named routine is one of the best ways to document its purpose.
Avoid duplicate code - The most popular reason for creating a routine. Saves space and is easier to maintain (only have to check and/or modify one place).
Hide sequences - It's a good idea to hide the order in which events happen to be processed.
Hide pointer operations - Pointer operations tend to be hard to read and error prone. Isolating them into routines shifts focus to the intent of the operation instead of the mechanics of pointer manipulation.
Improve portability - Use routines to isolate nonportable capabilities.
Simplify complicated boolean tests - Putting complicated boolean tests into a function makes the code more readable because the details of the test are out of the way and a descriptive function name summarizes the purpose of the tests.
Improve performance - You can optimize the code in one place instead of several.
To ensure all routines are small? - No. With so many good reasons for putting code into a routine, this one is unnecessary. (This is the one thrown into the list to make sure you are paying attention!)
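As a quick illustration of the "simplify complicated boolean tests" item above, a condition hidden behind a descriptive name (the thresholds here are made up for illustration):

```cpp
// A descriptive name summarizes the purpose of the test and hides
// the mechanics at the call site.
bool is_printable_ascii(int c) {
    return c >= 0x20 && c <= 0x7E;
}

// The call site then reads as intent, not mechanics:
//     if (is_printable_ascii(ch)) { ... }
```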
And one final quote from the text (Chapter 7: High-Quality Routines)
One of the strongest mental blocks to
creating effective routines is a
reluctance to create a simple routine
for a simple purpose. Constructing a
whole routine to contain two or three
lines of code might seem like
overkill, but experience shows how
helpful a good small routine can be.
If a group of statements can be thought of as a thing - then make them a function
I think it is more than OK; I would recommend it! Short, easy-to-verify functions with well-thought-out names lead to code that is more self-documenting than long, complex functions.
Any compiler worth using will be able to inline these calls to generate efficient code if needed.
Functions are absolutely necessary to stay organized. You need to first design the problem, and then depending on the different functionality you need to split them into functions. Some segment of code which is used multiple times, probably needs to be written in a function.
First think about the problem at hand, break it down into components, and try writing a function for each component. When writing a function, if you see some code segment doing the same thing, break it into a sub-function; likewise, a sub-module is a candidate for another function. But at some point this decomposition should stop, and where is up to you. Generally, do not make too many overly large functions, nor too many tiny ones.
When constructing functions, design for high cohesion and low coupling.
EDIT:
You might also want to consider separate modules. For example, if you need a stack or queue for some application, make it a separate module whose functions can be called from other functions. This way you avoid re-coding commonly used modules, by programming them as a group of functions stored separately.
Yes
I follow a few guidelines:
DRY (aka DIE)
Keep Cyclomatic Complexity low
Functions should fit in a Terminal window
Each one of these principles at some point will require that a function be broken up, although I suppose #2 could imply that two functions with straight-line code should be combined. It's somewhat more common to do what is called method extraction than actually splitting a function into a top and bottom half, because the usual reason is to extract common code to be called more than once.
#1 is quite useful as a decision aid. It's the same thing as saying, as I do, "never copy code".
#2 gives you a good reason to break up a function even if there is no repeated code. If the decision logic passes a certain complexity threshold, we break it up into more functions that make fewer decisions.
It is indeed a good practice to refactor code into functions, irrespective of the language being used. Even if your code is short, it will make it more readable.
If your function is quite short, you can consider inlining it.
IBM Publib article on inlining

How to implement continuations?

I'm working on a Scheme interpreter written in C. Currently it uses the C runtime stack as its own stack, which is presenting a minor problem with implementing continuations. My current solution is manual copying of the C stack to the heap then copying it back when needed. Aside from not being standard C, this solution is hardly ideal.
What is the simplest way to implement continuations for Scheme in C?
A good summary is available in Implementation Strategies for First-Class Continuations, an article by Clinger, Hartheimer, and Ost. I recommend looking at Chez Scheme's implementation in particular.
Stack copying isn't that complex, and there are a number of well-understood techniques for improving its performance. Using heap-allocated frames is also fairly simple, but you trade off overhead in the "normal" situation where you aren't using explicit continuations.
If you convert input code to continuation-passing style (CPS), you can get away with eliminating the stack altogether. However, while CPS is elegant, it adds another processing step in the front end and requires additional optimisation to overcome certain performance implications.
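As a sketch of what the CPS transformation does, here is a direct-style function next to its continuation-passing equivalent. C++ is used purely for illustration, not as a Scheme implementation technique:

```cpp
#include <functional>

// Direct style: the pending multiplication lives on the call stack.
int fact(int n) {
    return n == 0 ? 1 : n * fact(n - 1);
}

// CPS: the pending work lives in the continuation `k` instead of on a
// stack the language controls -- which is what allows an implementation
// to drop the conventional stack entirely.
int fact_cps(int n, std::function<int(int)> k) {
    if (n == 0) return k(1);
    return fact_cps(n - 1, [n, k](int r) { return k(n * r); });
}
```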
I remember reading an article that may be of help to you: Cheney on the M.T.A. :-)
Some implementations of Scheme I know of, such as SISC, allocate their call frames on the heap.
#ollie: You don't need to do the hoisting if all your call frames are on the heap. There's a tradeoff in performance, of course: the time to hoist, versus the overhead required to allocate all frames on the heap. Maybe it should be a tunable runtime parameter in the interpreter. :-P
If you are starting from scratch, you really should look in to Continuation Passing Style (CPS) transformation.
Good sources include "LISP in small pieces" and Marc Feeley's Scheme in 90 minutes presentation.
It seems Dybvig's thesis is unmentioned so far. It is a delight to read. The heap-based model is the easiest to implement, but the stack-based model is more efficient. Ignore the string-based model.
R. Kent Dybvig. "Three Implementation Models for Scheme".
http://www.cs.indiana.edu/~dyb/papers/3imp.pdf
Also check out the implementation papers on ReadScheme.org.
https://web.archive.org/http://library.readscheme.org/page8.html
The abstract is as follows:
This dissertation presents three implementation models for the Scheme
Programming Language. The first is a heap-based model used in some
form in most Scheme implementations to date; the second is a new
stack-based model that is considerably more efficient than the
heap-based model at executing most programs; and the third is a new
string-based model intended for use in a multiple-processor
implementation of Scheme.
The heap-based model allocates several important data structures in a
heap, including actual parameter lists, binding environments, and call
frames.
The stack-based model allocates these same structures on a stack
whenever possible. This results in less heap allocation, fewer memory
references, shorter instruction sequences, less garbage collection,
and more efficient use of memory.
The string-based model allocates versions of these structures right in
the program text, which is represented as a string of symbols. In the
string-based model, Scheme programs are translated into an FFP
language designed specifically to support Scheme. Programs in this
language are directly executed by the FFP machine, a
multiple-processor string-reduction computer.
The stack-based model is of immediate practical benefit; it is the
model used by the author's Chez Scheme system, a high-performance
implementation of Scheme. The string-based model will be useful for
providing Scheme as a high-level alternative to FFP on the FFP machine
once the machine is realized.
Besides the nice answers you've got so far, I recommend Andrew Appel's Compiling with Continuations. It's very well written and while not dealing directly with C, it is a source of really nice ideas for compiler writers.
The Chicken Wiki also has pages that you'll find very interesting, such as internal structure and compilation process (where CPS is explained with an actual example of compilation).
Examples that you can look at are: Chicken (a Scheme implementation, written in C that support continuations); Paul Graham's On Lisp - where he creates a CPS transformer to implement a subset of continuations in Common Lisp; and Weblocks - a continuation based web framework, which also implements a limited form of continuations in Common Lisp.
Continuations aren't the problem: you can implement those with regular higher-order functions using CPS. The issue with naive stack allocation is that tail calls are never optimised, which means the result can't be a proper Scheme (which requires tail-call optimisation).
The best current approach to mapping Scheme's spaghetti stack onto the C stack is to use trampolines: essentially, extra infrastructure to handle non-C-like calls and exits from procedures. See Trampolined Style (ps).
There's some code illustrating both of these ideas.
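A minimal trampoline can be sketched as follows (the struct and names are mine; real Scheme trampolines thread far more state, but the control pattern is the same):

```cpp
#include <functional>

// A "bounce" either carries a final value or a thunk describing the
// next call. Tail calls return a thunk instead of calling directly,
// so the C stack never grows.
struct Bounce {
    bool done;
    long value;                    // valid when done
    std::function<Bounce()> next;  // valid when !done
};

// Tail-recursive sum 1..n written in trampolined style.
Bounce sum_to(long n, long acc) {
    if (n == 0) return {true, acc, nullptr};
    return {false, 0, [n, acc] { return sum_to(n - 1, acc + n); }};
}

// The driver loop: bounces until a final value appears,
// at constant C-stack depth.
long trampoline(Bounce b) {
    while (!b.done) b = b.next();
    return b.value;
}
```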
The traditional way is to use setjmp and longjmp, though there are caveats.
Here's a reasonably good explanation
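One of the caveats is that setjmp/longjmp only gives you escape-only (one-shot, upward) continuations: you can jump out of a deep call, but you cannot re-enter a frame that has already returned. A sketch of that restricted use (the search scenario is invented):

```cpp
#include <csetjmp>

static std::jmp_buf escape;
static int found_depth;   // carries the "result" across the jump

// Deep recursion that "invokes the continuation" by longjmp-ing out,
// abandoning all intermediate frames at once.
static void search(int depth, int target) {
    if (depth == target) {
        found_depth = depth;
        std::longjmp(escape, 1);   // never returns
    }
    search(depth + 1, target);
}

// Returns the depth at which the search escaped.
int run_search(int target) {
    if (setjmp(escape) != 0) {
        return found_depth;        // control arrives here via longjmp
    }
    search(1, target);
    return -1;                     // not reached when target >= 1
}
```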
Continuations basically consist of the saved state of the stack and CPU registers at context-switch points. At the very least, you don't have to copy the entire stack to the heap when switching; you could just redirect the stack pointer.
Continuations are trivially implemented using fibers (http://en.wikipedia.org/wiki/Fiber_%28computer_science%29). The only things that need careful encapsulation are parameter passing and return values.
On Windows, fibers are created with the CreateFiber/SwitchToFiber family of calls; on POSIX-compliant systems the same can be done with makecontext/swapcontext.
boost::coroutine has a working implementation of coroutines for C++ that can serve as a reference point for implementation.
As soegaard pointed out, the main reference remains R. Kent Dybvig. "Three Implementation Models for Scheme".
The idea is that a continuation is a closure that keeps its evaluation control stack. The control stack is required in order to continue the evaluation from the moment the continuation was created using call/cc.
Frequently invoking continuations can lead to long execution times and fill memory with duplicated stacks. I wrote the following deliberately wasteful code to demonstrate this; in MIT Scheme it makes the interpreter crash.
The code sums the first 1000 numbers 1+2+3+...+1000.
(call-with-current-continuation
(lambda (break)
((lambda (s) (s s 1000 break))
(lambda (s n cc)
(if (= 0 n)
(cc 0)
(+ n
;; non-tail-recursive,
;; the stack grows at each recursive call
(call-with-current-continuation
(lambda (__)
(s s (- n 1) __)))))))))
If you change 1000 to 100 000, the code takes about 2 seconds, and if you increase the input number further it will crash.
Use an explicit stack instead.
Patrick is correct: the only way you can really do this is to use an explicit stack in your interpreter, and hoist the appropriate segment of that stack into the heap when you need to convert it to a continuation.
This is basically the same as what is needed to support closures in languages that support them (closures and continuations being somewhat related).
