Swift: Do For-In Loop Parameters Optimize Away Needless Accesses?

Context:
Suppose I want to enumerate the NSTableColumn objects of an NSTableView and I do this:
for column: NSTableColumn in someTableView.tableColumns
{
[...]
}
In Objective-C, it was a very well-known performance optimization to avoid unnecessary selector calls in loop parameters, so we'd factor that out like this:
NSArray *columns = [someTableView tableColumns];
for (NSTableColumn *column in columns)
{
[...]
}
Question:
Is this still needed in Swift, or will the compiler now factor out the repetitive access during optimization? Since Array is now a value type, if each loop iteration involved re-creating it, that would seem expensive.
(Apologies if it's been asked before; it's a really hard thing to Google.)

Related

How to manage a large number of variables in C?

In an implementation of the Game of Life, I need to handle user events, perform some regular (as in periodic) processing and draw to a 2D canvas. The details are not particularly important. Suffice it to say that I need to keep track of a large(-ish) number of variables. These are things like: a structure representing the state of the system (live cells), pointers to structures provided by the graphics library, current zoom level, coordinates of the origin and I am sure a few others.
In the main function, there is a game loop like this:
// Setup stuff
while (!finished) {
    while (get_event(&e) != 0) {
        if (e.type == KEYBOARD_EVENT) {
            switch (e.key.keysym) {
                case q:
                case x:
                    // More branching and nesting follows
The maximum level of nesting at the moment is 5. It quickly becomes unmanageable and difficult to read, especially on a small screen. The solution then is to split this up into multiple functions. Something like:
while (!finished) {
    while (get_event(&e) != 0) {
        handle_event(state, origin_x, origin_y, &canvas, e, ...); // More parameters
This is the crux of the question. The subroutine must necessarily have access to the state (represented by the origin, the canvas, the live cells, etc.) in order to function. Passing it all explicitly is error-prone (in which order does the subroutine expect the arguments?) and can also be difficult to read. Aside from that, having functions with potentially 10+ arguments strikes me as a symptom of other design flaws. However, the alternatives I can think of don't seem any better.
To summarise:
Accept deep nesting in the game loop.
Define functions with very many arguments.
Collate (somewhat) related arguments into structs (see the sketch after this list) - This really only hides the problem, especially since the arguments are only loosely related.
Define variables that represent the application state with file scope (static int origin_x; for example). If it weren't for the fact that it has been drummed into me that global variables are usually a terrible idea, this would be my preferred option. But if I want to display two views of the same instance of the Game of Life in the future, then file scope no longer looks so appealing.
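For concreteness, a minimal sketch of the struct option (all names here are hypothetical, and Canvas/Event stand in for whatever the graphics library provides):

#include <stdbool.h>

typedef struct Canvas Canvas;        /* opaque handle from the graphics library */
typedef struct { int type; } Event;  /* placeholder for the library's event type */

typedef struct {
    bool *cells;             /* live-cell grid, rows * cols entries */
    int rows, cols;
    int origin_x, origin_y;  /* view origin */
    double zoom;             /* current zoom level */
    Canvas *canvas;
} AppState;

/* The whole game state now travels as a single pointer. */
static void handle_event(AppState *st, const Event *e)
{
    /* ... dispatch on e->type and mutate *st ... */
    (void)st; (void)e;
}

Passing an AppState * also keeps the two-views scenario open (just create a second instance), which file-scope statics would rule out.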
The question also applies in slightly more general terms I suppose: How do you pass state around a complicated program safely and in a readable way?
EDIT:
My motivations here are not speed or efficiency or performance or anything like this. If the code takes 20% longer to run as a result of the choice made here that's just fine. I'm primarily interested in what is less likely to confuse me and cause the least headache in 6 months time.
I would consider the canvas as one variable, containing a large 2D array...
consider static allocation
bool canvas[ROWS][COLS];
or dynamic
bool (*canvas)[COLS] = malloc(ROWS * sizeof *canvas);
In both cases you can refer to the cell at position i,j as canvas[i][j], though for dynamic allocation, do not forget to free(canvas) at the end. You can then use a nested loop to update your state.
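As a minimal sketch of such an update pass (ROWS, COLS, and the plain copy standing in for the real rule are placeholders):

#include <stdbool.h>

enum { ROWS = 64, COLS = 64 };  /* hypothetical grid size */

void update(bool (*cur)[COLS], bool (*next)[COLS])
{
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            next[i][j] = cur[i][j];  /* replace with the actual Life rule */
}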
Search for tutorials on allocating and handling 2D arrays in C, and on nested loops, for example: https://www.geeksforgeeks.org/nested-loops-in-c-with-examples/
Also see this related question: Fastest way to zero out a 2d array in C?

Iterating for `setindex!`

I have some specially-defined arrays in Julia which you can think of being just a composition of many arrays. For example:
type CompositeArray{T}
    x::Vector{T}
    y::Vector{T}
end
with an indexing scheme
getindex(c::CompositeArray, i::Int) = i <= length(c.x) ? c.x[i] : c.y[i - length(c.x)]
I do have one caveat: the multidimensional indexing scheme just goes to x itself:
getindex(c::CompositeArray,i::Int...) = c.x[i...]
Now the iterator through these can easily be made as the chain of the iterator on x and then the iterator on y. This makes iterating through the values have almost no extra cost. However, can something similar be done for iteration that supports setindex!?
I was thinking of having a separate dispatch on CartesianIndex{2} just for indexing x vs y and the index, and building an eachindex iterator for that, similar to what CatViews.jl does. However, I'm not certain how that will interact with the i... dispatch, or whether it will be useful in this case.
In addition, will broadcasting automatically use this fast iteration scheme if it's built on eachindex?
Edits:
length(c::CompositeArray) = length(c.x) + length(c.y)
In the real case, x can be any AbstractArray (and thus has a linear index), but since only the linear indexing is used (except for that one user-facing getindex function), the problem really boils down to finding out how to do this with x a Vector.
Making X[CartesianIndex(2,1)] mean something different from X[2,1] is certainly not going to end well. And I would expect similar troubles from the fact that X[100,1] may mean something different from X[100] or if length(X) != prod(size(X)). You're free to break the rules, but you shouldn't be surprised when functions in Base and other packages expect you to follow them.
The safe way to do this would be to make eachindex(::CompositeArray) return a custom iterator over objects that you control entirely. Maybe just wrap CartesianRange and CartesianIndex{2} and forward methods to them, if that data structure is helpful. Then when you get one of these custom index types, you know that SplitIndex(CartesianIndex(1,2)) is indeed intended to refer to the first element in the second array.

Array.isDefinedAt for n-dimensional arrays in scala

Is there an elegant way to express
val a = Array.fill(2, 10) { 1 }
def do_to_elt(i: Int, j: Int) {
  if (a.isDefinedAt(i) && a(i).isDefinedAt(j)) f(a(i)(j))
}
in scala?
I recommend that you not use arrays of arrays for 2D arrays, for three main reasons. First, it allows inconsistency: not all columns (or rows, take your pick) need to be the same size. Second, it is inefficient--you have to follow two pointers instead of one. Third, very few library functions exist that work transparently and usefully on arrays of arrays as 2D arrays.
Given these things, you should either use a library that supports 2D arrays, like scalala, or you should write your own. If you do the latter, among other things, this problem magically goes away.
So in terms of elegance: no, there isn't a way. But beyond that, the path you're starting on contains lots of inelegance; you would probably do best to step off of it quickly.
You just need to check with isDefinedAt that the outer array is defined at index i before indexing into it:
def do_to_elt(i: Int, j: Int): Unit =
  if (a.isDefinedAt(i) && a(i).isDefinedAt(j)) f(a(i)(j))
EDIT: Missed that part about the elegant solution as I focused on the error in the code before your edit.
Concerning elegance: no, per se there is no way to express it more elegantly. Some might tell you to use the pimp-my-library pattern to make it look more elegant, but in fact it does not help in this case.
If your only use case is to execute a function with an element of a multidimensional array when the indices are valid, then this code does that and you should use it. You could generalize the method by changing its signature to take the function to apply to the element, and a fallback value for when the indices are invalid, like this:
def do_to_elt[A](i: Int, j: Int)(f: Int => A, g: => A): A =
  if (a.isDefinedAt(i) && a(i).isDefinedAt(j)) f(a(i)(j)) else g
but I would not change anything beyond this. This also does not look more elegant but widens your use case.
(Also: If you are working with arrays you mostly do that for performance reasons and in that case it might even be better to not use isDefinedAt but perform validity checks based on the length of the arrays.)

What is most efficient way to do immutable byte arrays in Scala?

I want to get an array of bytes (Array[Byte]) from somewhere (read from a file, from a socket, etc.) and then provide an efficient way to pull bits out of it (e.g. provide a function to extract a 32-bit integer from offset N in the array). I would then like to wrap the byte array (hiding it), providing functions to pull bits out of the array (probably using a lazy val for each bit to pull out).
I would imagine having a wrapping class that takes an immutable byte array type in the constructor, to guarantee the array contents are never modified. IndexedSeq[Byte] seemed relevant, but I could not work out how to go from Array[Byte] to IndexedSeq[Byte].
Part 2 of the question: if I use IndexedSeq[Byte], will the resulting code be any slower? I need the code to execute as fast as possible, so I would stick with Array[Byte] if the compiler could do a better job with it.
I could write a wrapper class around the array, but that would slow things down - one extra level of indirection for each access to bytes in the array. Performance is critical due to the number of array accesses that will be required. I need fast code, but would like to do the code nicely at the same time. Thanks!
PS: I am a Scala newbie.
Treating Array[T] as an IndexedSeq[T] could hardly be simpler:
Array(1: Byte): IndexedSeq[Byte] // trigger an Implicit View
wrapByteArray(Array(1: Byte)) // explicitly calling
Unboxing will kill you long before an extra layer of indirection.
C:\> scala -Xprint:erasure -e "{val a = Array(1: Byte); val b1: Byte = a(0); val b2 = (a: IndexedSeq[Byte])(0)}"
[[syntax trees at end of erasure]] // Scala source: scalacmd5680604016099242427.scala
val a: Array[Byte] = scala.Array.apply((1: Byte), scala.this.Predef.wrapByteArray(Array[Byte]{}));
val b1: Byte = a.apply(0);
val b2: Byte = scala.Byte.unbox((scala.this.Predef.wrapByteArray(a): IndexedSeq).apply(0));
To avoid this, the Scala collections library would need to be specialized on the element type, in the same style as Tuple1 and Tuple2. I'm told this is planned, but it's a bit more involved than simply slapping @specialized everywhere, so I don't know how long it will take.
UPDATE
Yes, WrappedArray is mutable, although collection.IndexedSeq[Byte] doesn't have methods to mutate, so you could just trust clients not to cast to a mutable interface. The next release of Scalaz will include ImmutableArray which prevents this.
The boxing comes from retrieving an element from the collection via this generic method:
trait SeqLike[+A, +Repr] extends IterableLike[A, Repr] { self =>
  def apply(idx: Int): A
}
At the JVM level, this signature is type-erased to:
def apply(idx: Int): Object
If your collection contains primitives, that is, subtypes of AnyVal, they must be boxed in the corresponding wrapper to be returned from this method. For some applications, this is a major performance concern. Entire libraries have been written in Java to avoid this, notably fastutil.
Annotation-directed specialization was added in Scala 2.8 to instruct the compiler to generate versions of a class or method tailored to the permutations of primitive types. It has been applied to a few places in the standard library already, e.g. TupleN, ProductN, Function{0, 1, 2}. If it were also applied to the collections hierarchy, this performance cost could be alleviated.
If you want to work with sequences in Scala, I recommend you choose one of these:
Immutable seqs:
(linked seqs) List, Stream, Queue
(indexed seqs) Vector
Mutable seqs:
(linked seq) ListBuffer
(indexed seq) ArrayBuffer
The new (2.8) Scala collections have been hard for me to grasp, primarily due to a shortage of (correct) documentation but also because of the source code (complex hierarchies). To clear my mind I made this picture to visualize the basic structure:
(diagram of the Scala 2.8 collections hierarchy; source: programmera.net)
Also, note that Array is not part of the tree structure, it is a special case, since it wraps the Java array (which is a special case in Java).

efficient sort with custom comparison, but no callback function

I have a need for an efficient sort that doesn't use a callback, but is as customizable as qsort(). What I want is for it to work like an iterator, where the caller continuously calls into the sort API in a loop until it is done, doing the comparison in the loop rather than off in a callback function. This way the custom comparison is local to the calling function (and therefore has access to local variables, is potentially more efficient, etc.). I have implemented this for an inefficient selection sort, but need it to be efficient, so I would prefer a quicksort derivative.
Has anyone done anything like this? I tried to do it for quick sort, but trying to turn the algorithm inside out hurt my brain too much.
Below is how it might look in use.
// the array of data we are sorting
MyData array[5000], *firstP, *secondP;
// (assume data is filled in)
Sorter sorter;
// initialize sorter
int result = sortInit(&sorter, array, 5000,
                      (void **)&firstP, (void **)&secondP, sizeof(MyData));
// loop until complete
while (sortIteration(&sorter, result) == 0) {
    // here's where we do the custom comparison...here we
    // just sort by member "value" but we could do anything
    result = firstP->value - secondP->value;
}
Turning the sort function inside out as you propose isn't likely to make it faster. You're trading indirection on the comparison function for indirection on the item pointers.
It appears you want your comparison function to have access to state information. The quick-and-dirty way is to create global variables or a global structure, assuming you don't have more than one thread going at once. The qsort function won't return until all the data is sorted, so in a single-threaded environment this should be safe.
The only other thing I would suggest is to locate a source to qsort and modify it to take an extra parameter, a pointer to your state structure. You can then pass this pointer into your comparison function.
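Note that on common platforms you may not even need to modify anything: glibc ships qsort_r, which threads exactly such a state pointer through to the comparator (BSD/macOS and MSVC have variants with different signatures, so this is not portable as-is). A minimal glibc-flavored sketch, with a hypothetical SortState:

#define _GNU_SOURCE          /* glibc: exposes qsort_r */
#include <stdlib.h>

typedef struct { int direction; } SortState;  /* whatever local state you need */

static int cmp_with_state(const void *a, const void *b, void *arg)
{
    const SortState *st = arg;
    int x = *(const int *)a, y = *(const int *)b;
    return st->direction * ((x > y) - (x < y));
}

/* usage:
       int data[] = { 3, 1, 2 };
       SortState st = { -1 };                            // descending
       qsort_r(data, 3, sizeof data[0], cmp_with_state, &st);
*/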
Take an existing implementation of qsort and update it to reference the Sorter object for its local variables. Instead of calling a compare function passed in, it would update its state and return to the caller.
Because of recursion in qsort, you'll need to keep some sort of a state stack in your Sorter object. You could accomplish that with an array or a linked list using dynamic allocation (less efficient). Since most qsort implementations use tail recursion for the larger half and make a recursive call to qsort for the smaller half of the pivot point, you can sort at least 2^n elements if your array can hold n states.
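To make that concrete, here is a sketch of such an inside-out quicksort (hypothetical sort_init/sort_step names that only approximate the sortInit/sortIteration API above; Lomuto partitioning; int elements and a small fixed-size range stack for brevity):

typedef struct { int lo, hi; } Range;

typedef struct {
    int *a;            /* the array being sorted */
    Range stack[64];   /* pending subranges; fixed size keeps the sketch short */
    int top;
    int lo, hi;        /* current range; the pivot lives at a[hi] */
    int i, j;          /* Lomuto partition cursors */
    int active;        /* a partition is in progress */
    int pending;       /* the caller owes us a comparison result */
} Sorter;

static void push_range(Sorter *s, int lo, int hi)
{
    if (lo < hi)
        s->stack[s->top++] = (Range){ lo, hi };
}

static int start_partition(Sorter *s)
{
    if (s->top == 0)
        return s->active = 0;
    Range r = s->stack[--s->top];
    s->lo = r.lo;
    s->hi = r.hi;
    s->i = r.lo - 1;   /* end of the "sorts before pivot" region */
    s->j = r.lo;       /* next element to test against the pivot */
    return s->active = 1;
}

void sort_init(Sorter *s, int *a, int n)
{
    s->a = a;
    s->top = 0;
    s->pending = 0;
    push_range(s, 0, n - 1);
    start_partition(s);
}

/* Returns 0 when sorting is complete. Otherwise *first and *second point
   at the next two elements to compare; pass a negative result into the
   following call if *first should sort before *second. */
int sort_step(Sorter *s, int result, int **first, int **second)
{
    if (s->pending) {                 /* consume the caller's verdict */
        s->pending = 0;
        if (result < 0) {             /* a[j] belongs left of the pivot */
            int t = s->a[++s->i];
            s->a[s->i] = s->a[s->j];
            s->a[s->j] = t;
        }
        if (++s->j == s->hi) {        /* partition finished */
            int p = s->i + 1;         /* move pivot into its final slot */
            int t = s->a[p];
            s->a[p] = s->a[s->hi];
            s->a[s->hi] = t;
            push_range(s, s->lo, p - 1);
            push_range(s, p + 1, s->hi);
            if (!start_partition(s))
                return 0;
        }
    } else if (!s->active) {
        return 0;                     /* empty input or already done */
    }
    *first = &s->a[s->j];             /* candidate element */
    *second = &s->a[s->hi];           /* pivot */
    s->pending = 1;
    return 1;
}

The comparison then stays local to the caller, as the question asks:

int *p, *q, result = 0;
Sorter s;
sort_init(&s, data, n);
while (sort_step(&s, result, &p, &q))
    result = *p - *q;   /* any expression over local state works here */

A production version would push the larger half and continue partitioning the smaller one, exactly as described above, to bound the stack depth.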
A simple solution is to use an inlinable sort function and an inlinable compare callback. When compiled with optimization, the two calls get flattened into each other exactly as you want. The only downside is that your choice of sorting algorithm is limited: if you recurse or allocate more memory, you potentially lose any benefit from doing this. Methods with small overhead, like this one, work best with small data sets.
You can also use a generic sort function taking compare method, size, offset, and stride. That way the custom comparison is selected by parameters rather than by a callback, and you can use any algorithm. Use some macros to fill in the most common cases, because you will end up with a lot of function arguments.
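A sketch of that idea, with the key type fixed to int so that the comparison is configured by parameters (element size and key offset) rather than a callback (names hypothetical; insertion sort for brevity):

#include <stddef.h>
#include <string.h>

/* Sorts n records of elem_size bytes each, ascending by a signed int
   key stored key_offset bytes into every record. */
void sort_by_int_key(void *base, size_t n, size_t elem_size, size_t key_offset)
{
    unsigned char *a = base, tmp[256];  /* assumes elem_size <= 256 */
    for (size_t i = 1; i < n; i++) {
        int key;
        memcpy(tmp, a + i * elem_size, elem_size);
        memcpy(&key, tmp + key_offset, sizeof key);
        size_t j = i;
        while (j > 0) {
            int prev;
            memcpy(&prev, a + (j - 1) * elem_size + key_offset, sizeof prev);
            if (prev <= key)
                break;
            memmove(a + j * elem_size, a + (j - 1) * elem_size, elem_size);
            j--;
        }
        memcpy(a + j * elem_size, tmp, elem_size);
    }
}

Something like sort_by_int_key(array, 5000, sizeof(MyData), offsetof(MyData, value)) would then mirror the example in the question.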
Also, check out the STB library (https://github.com/nothings/stb).
It has a sorting function similar to this, among many other useful C tools.
What you're asking for has already been done -- it's called std::sort, and it's already in the C++ standard library. Better support for this (among many other things) is part of why well-written C++ is generally faster than C.
You could write a preprocessor macro to output a sort routine, and have the macro take a comparison expression as an argument.
#define GENERATE_SORT(name, type, comparison_expression) \
void name(type *begin, type *end) \
{ /* ... when needed, fill a and b and use comparison_expression */ }

GENERATE_SORT(sort_ints, int, (*a < *b))
void foo()
{
    int array[10];
    sort_ints(array, array + 10);
}
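If you want the macro to expand to something runnable, one way to flesh it out (insertion sort for brevity; a quicksort body would work the same way, with a and b naming the two elements being compared):

#define GENERATE_SORT(name, type, comparison_expression)  \
void name(type *begin, type *end)                         \
{                                                         \
    for (type *i = begin + 1; i < end; ++i) {             \
        type key = *i;                                    \
        type *j = i;                                      \
        while (j > begin) {                               \
            type *a = &key, *b = j - 1;                   \
            if (!(comparison_expression))                 \
                break;                                    \
            *j = *(j - 1);                                \
            --j;                                          \
        }                                                 \
        *j = key;                                         \
    }                                                     \
}

GENERATE_SORT(sort_doubles, double, (*a < *b))

With optimization on, the expression is compiled inline; there is no call through a function pointer anywhere.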
Two points.
I). Inline assembly (_asm).
II). Basic design limits of compilers.
Compilers have, as a basic purpose, the design goal of hiding assembler and machine code, and they impose certain limits to achieve this. In this case, we give up a flexibility that is easy to achieve in assembly code: split the generated code of the sort into two pieces at the call to the compare function; copy the first half somewhere; next, copy the generated code of the compare function there, just after the previously copied first part; then copy the last half of the sort code. Finally, we have to deal with a whole series of minor details. See also the concept of "hot patching" running programs.
