Iterating for `setindex!` - arrays

I have some specially-defined arrays in Julia which you can think of being just a composition of many arrays. For example:
type CompositeArray{T}
x::Vector{T}
y::Vector{T}
end
with an indexing scheme
getindex(c::CompositeArray,i::Int) = i <= length(c) ? c.x[i] : c.y[i-length(c.x)]
I do have one caveat: the higher indexing scheme just goes to x itself:
getindex(c::CompositeArray,i::Int...) = c.x[i...]
Now the iterator through these can easily be made as the chain of the iterator on x and then on y. This makes iterating through the values have almost no extra cost. However, can something similar be done for iteration to setindex!?
I was thinking of having a separate dispatch on CartesianIndex{2} just for indexing x vs y and the index, and building an eachindex iterator for that, similar to what CatViews.jl does. However, I'm not certain how that will interact with the i... dispatch, or whether it will be useful in this case.
In addition, will broadcasting automatically use this fast iteration scheme if it's built on eachindex?
Edits:
length(c::CompositeArray) = length(c.x) + length(c.y)
In the real case, x can be any AbstractArray (and thus has a linear index), but since only the linear indexing is used (except for that one user-facing getindex function), the problem really boils down to finding out how to do this with x a Vector.

Making X[CartesianIndex(2,1)] mean something different from X[2,1] is certainly not going to end well. And I would expect similar troubles from the fact that X[100,1] may mean something different from X[100] or if length(X) != prod(size(X)). You're free to break the rules, but you shouldn't be surprised when functions in Base and other packages expect you to follow them.
The safe way to do this would be to make eachindex(::CompositeArray) return a custom iterator over objects that you control entirely. Maybe just throw a wrapper around and forward methods to CartesianRange and CartesianIndex{2} if that data structure is helpful. Then when you get one of these custom index types, you know that SplitIndex(CartesianIndex(1,2)) is indeed intending to refer to the first element in the second array.

Related

Why do filter method implementations create another array instead of modifying the current array?

From what I know, many popular implementations of filter collection methods, e.g. JavaScript's Array#filter method, tend to create a new array rather than modifying it. (As #Berthur mentioned, this is also generally useful in terms of functional programming as well).
However, from what I've seen in homemade methods of filter implementations, sometimes the author chooses to use a while / for loop on a dynamically allocated array (e.g. an ArrayList in Java) and remove elements instead.
I have a general idea of why this is the case (since removing elements requires the rest of the array's elements afterwards to be shifted over, which is O(n) while adding elements is O(1)), but I also know that in the same case, if an element is added to the end of an array when the array is full, it requires memory to be allocated, which requires, in the case for Java, the array to be copied.
Thus, is there some mathematical reason of why creating a new array for filtering is (generally) faster than removing & moving elements over, or is it just for the guaranteed immutability over the original array that it guarantees?
It's not generally faster, and it's not done for performance reasons. It's more of a programming paradigm, as well as being a convenient tool.
While in-place algorithms are often faster for performance and/or memory critical applications, they need to know about the underlying implementation of the data structure, and become more specific. This immutable approach allows for more general functionality, apart from being convenient. The approach is common in functional programming. As you say, it guarantees immutability, which makes it compatible with this way of thinking.
In your Javascript example, for instance, notice that you can call filter on a regular array, but you could also call it on a TypedArray. Now, typed arrays cannot be resized, so performing an in-place filter would not be possible in the first place. But the filter method behaves in the same way through their common interface, following the principles of polymorphism.
Ultimately, these functions are just available to you and while they can be very convenient for many cases, it is up to you as a programmer to decide whether they cover your specific need or whether you must implement your own custom algorithm.

Efficient/Parallel Apply an operation to each element in an Array in Java <= 1.7?

I am interested in ways of applying an operation (independetly) to each element in a Java Array. For example, clipping each element of a numeric array to be no more than a given element.
Example:
myArray.clip(5)
Or
Utils.clip(myArray, 5)
woud set each element greater than 5 to be 5
The most straightforward method is to iterate over the elements, but I don't like it for two reasons:
It is Iterative and doesn't exploit the parrallization chances
The code doesn't look beautiful, a mapping or vecotorized syntax would be nicer
I need to do such a clipping operation about 5000 times, each time on a 70 X 10 (2D Array)
If Java <= 1.7 doesn't provide ways to do that in the standard library. How can I achieve what I want (two points above) using other libs or Java 1.8 Features.
The traditional Java 7 technique for operating on the data in a collection in parallel is the "fork/join" pattern. The notion is that one decomposes the work to be done into smaller pieces, forks off a task to complete those pieces, and then join each task at the end.
I'm not sure your case warrants this, though. Depending on how complicated a "clip" operation is, it might be more straightforward to create a pair of threads to operate on the two axes simultaneously. Though, I may be misunderstanding your requirements.
Another solution might be a time/space trade-off. It may be more useful to maintain a clipped version of the data as you update the primary collection. Sort of a producer-consumer model, where there is something that looks for changes to the collection, and maintains a clipped version.
I am not sure about Arrays themselves, but if you have a "Collection" compatible type you can use parallel aggregate operations in Java 8.
Refer to https://docs.oracle.com/javase/tutorial/collections/streams/index.html
Using Java 8 features, this is how I went about it, I iterated over the rows and did parallel operations on it. I didn't find any standard methods that takes a
double [] []
and use it as a stream or a
parallelSetAll(double [] [] ...)
My code, I use Apache commons-math library for my matrix implementation because it provides utility functions based on the RealMatrix type implemented in the library
Can this code be optimized further without having to go too much too level. Like take things out of the for loop or do the assignments later , etc
public static RealMatrix clipLower(RealMatrix m, double lowerBound){
// TODO see if you really need to modify m or just return a modified copy of m
//process the matrix row by row, each row gets processed in parallel
// TODO See if you can directly parallelize on the entire matrix not row by row
// OPEN implement using Arrays.parallelSetAll Method?
int nrOfRows = m.getRowDimension();
for(int i = 0; i < nrOfRows; i++){
// TODO move the declaration outside the for loop
double[] currRow = m.getRow(i);
double[] newRow = Arrays.stream(currRow)
.parallel()
.map((number) -> (number < lowerBound) ? lowerBound : number)
.toArray();
m.setRow(i, newRow);
}
return m;
}

Sorting and managing numerous variables

My project has classes which, unavoidably, contain hundreds upon hundreds of variables that I'm always having to keep straight. For example, I'm always having to keep track of specific kinds of variables for a recurring set of "items" that occur inside of a class, where placing those variables between multiple classes would cause a lot of confusion.
How do I better sort my variables to keep from going crazy, especially when it comes time to save my data?
Am I missing something? Actionscript is an Object Oriented language, so you might have hundreds of variables, but unless you've somehow treated it like a grab bag and dumped it all in one place, everything should be to hand. Without knowing what all you're keeping track of, it's hard to give concrete advice, but here's an example from a current project I'm working on, which is a platform for building pre-employment assessments.
The basic unit is a Question. A Question has a stem, text that can go in the status bar, a collection of answers, and a collection of measures of things we're tracking about what the user does in that particular type of questions.
The measures are, again, their own type of object, and come in two "flavors": one that is used to track a time limit and one that isn't. The measure has a name (so we know where to write back to the database) and a value (which tells us what). Timed ones also have a property for the time limit.
When we need to time the question, we hand that measure to yet another object that counts the time down and a separate object that displays the time (if appropriate for the situation). The answers, known as distractors, have a label and a value that they can impart to the appropriate measure based on the user selection. For example, if a user selects "d", its value, "4" is transferred to the measure that stores the user's selection.
Once the user submits his answer, we loop through all the measures for the question and send those to the database. If those were not treated as a collection (in this case, a Vector), we'd have to know exactly what specific measures are being stored for each question and each question would have a very different structure that we'd have to dig through. So if looping through collections is your issue, I think you should revisit that idea. It saves a lot of code and is FAR more efficient than "var1", "var2", "var3."
If the part you think is unweildy is the type checking you have to do because literally anything could be in there, then Vector could be a good solution for you as long as you're using at least Flash Player 10.
So, in summary:
When you have a lot of related properties, write a Class that keeps all of those related bits and pieces together (like my Question).
When objects have 0-n "things" that are all of the same or very similar, use a collection of some sort, such as an Array or Vector, to allow you to iterate through them as a group and perform the same operation on each (for example, each Question is part of a larger grouping that allows each question to be presented in turn, and each question has a collection of distractors and another of measures.
These two concepts, used together, should help keep your information tidy and organized.
While I'm certain there are numerous ways of keeping arrays straight, I have found a method that works well for me. Best of all, it collapses large amounts of information into a handful of arrays that I can parse to an XML file or other storage method. I call this method my "indexed array system".
There are actually multiple ways to do this: creating a handful of 1-dimensional arrays, or creating 2-dimensional (or higher) array(s). Both work equally well, so choose the one that works best for your code. I'm only going to show the 1-dimensional method here. Those of you who are familiar with arrays can probably figure out how to rewrite this to use higher dimensional arrays.
I use Actionscript 3, but this approach should work with almost any programming or scripting language.
In this example, I'm trying to keep various "properties" of different "activities" straight. In this case, we'll say these properties are Level, High Score, and Play Count. We'll call the activities Pinball, Word Search, Maze, and Memory.
This method involves creating multiple arrays, one for each property, and creating constants that hold the integer "key" used for each activity.
We'll start by creating the constants, as integers. Constants work for this, because we never change them after compile. The value we put into each constant is the index the corresponding data will always be stored at in the arrays.
const pinball:int = 0;
const wordsearch:int = 1;
const maze:int = 2;
const memory:int = 3;
Now, we create the arrays. Remember, arrays start counting from zero. Since we want to be able to modify the values, this should be a regular variable.
Note, I am constructing the array to be the specific length we need, with the default value for the desired data type in each slot. I've used all integers here, but you can use just about any data type you need.
var highscore:Array = [0, 0, 0, 0];
var level:Array = [0, 0, 0, 0];
var playcount:Array = [0, 0, 0, 0];
So, we have a consistent "address" for each property, and we only had to create four constants, and three arrays, instead of 12 variables.
Now we need to create the functions to read and write to the arrays using this system. This is where the real beauty of the system comes in. Be sure this function is written in public scope if you want to read/write the arrays from outside this class.
To create the function that gets data from the arrays, we need two arguments: the name of the activity and the name of the property. We also want to set up this function to return a value of any type.
GOTCHA WARNING: In Actionscript 3, this won't work in static classes or functions, as it relies on the "this" keyword.
public function fetchData(act:String, prop:String):*
{
var r:*;
r = this[prop][this[act]];
return r;
}
That queer bit of code, r = this[prop][this[act]], simply uses the provided strings "act" and "prop" as the names of the constant and array, and sets the resulting value to r. Thus, if you feed the function the parameters ("maze", "highscore"), that code will essentially act like r = highscore[2] (remember, this[act] returns the integer value assigned to it.)
The writing method works essentially the same way, except we need one additional argument, the data to be written. This argument needs to be able to accept any
GOTCHA WARNING: One significant drawback to this system with strict typing languages is that you must remember the data type for the array you're writing to. The compiler cannot catch these type errors, so your program will simply throw a fatal error if it tries to write the wrong value type.
One clever way around this is to create different functions for different data types, so passing the wrong data type in an argument will trigger a compile-time error.
public function writeData(act:String, prop:String, val:*):void
{
this[prop][this[act]] = val;
}
Now, we just have one additional problem. What happens if we pass an activity or property name that doesn't exist? To protect against this, we just need one more function.
This function will validate a provided constant or variable key by attempting to access it, and catching the resulting fatal error, returning false instead. If the key is valid, it will return true.
function validateName(ID:String):Boolean
{
var checkthis:*
var r:Boolean = true;
try
{
checkthis = this[ID];
}
catch (error:ReferenceError)
{
r = false;
}
return r;
}
Now, we just need to adjust our other two functions to take advantage of this. We'll wrap the function's code inside an if statement.
If one of the keys is invalid, the function will do nothing - it will fail silently. To get around this, just put a trace (a.k.a. print) statement or a non-fatal error in the else construct.
public function fetchData(act:String, prop:String):*
{
var r:*;
if(validateName(act) && validateName(prop))
{
r = this[prop][this[act]];
return r;
}
}
public function writeData(act:String, prop:String, val:*):void
{
if(validateName(act) && validateName(prop))
{
this[prop][this[act]] = val;
}
}
Now, to use these functions, you simply need to use one line of code each. For the example, we'll say we have a text object in the GUI that shows the high score, called txtHighScore. I've omitted the necessary typecasting for the sake of the example.
//Get the high score.
txtHighScore.text = fetchData("maze", "highscore");
//Write the new high score.
writeData("maze", "highscore", txtHighScore.text);
I hope ya'll will find this tutorial useful in sorting and managing your variables.
(Afternote: You can probably do something similar with dictionaries or databases, but I prefer the flexibility with this method.)

Array.isDefinedAt for n-dimensional arrays in scala

Is there an elegant way to express
val a = Array.fill(2,10) {1}
def do_to_elt(i:Int,j:Int) {
if (a.isDefinedAt(i) && a(i).isDefinedAt(j)) f(a(i)(j))
}
in scala?
I recommend that you not use arrays of arrays for 2D arrays, for three main reasons. First, it allows inconsistency: not all columns (or rows, take your pick) need to be the same size. Second, it is inefficient--you have to follow two pointers instead of one. Third, very few library functions exist that work transparently and usefully on arrays of arrays as 2D arrays.
Given these things, you should either use a library that supports 2D arrays, like scalala, or you should write your own. If you do the latter, among other things, this problem magically goes away.
So in terms of elegance: no, there isn't a way. But beyond that, the path you're starting on contains lots of inelegance; you would probably do best to step off of it quickly.
You just need to check the array at index i with isDefinedAt if it exists:
def do_to_elt(i:Int, j:Int): Unit =
if (a.isDefinedAt(i) && a(i).isDefinedAt(j)) f(a(i)(j))
EDIT: Missed that part about the elegant solution as I focused on the error in the code before your edit.
Concerning elegance: no, per se there is no way to express it in a more elegant way. Some might tell you to use the pimp-my-library-Pattern to make it look more elegant but in fact it does not in this case.
If your only use case is to execute a function with an element of a multidimensional array when the indices are valid then this code does that and you should use it. You could generalize the method by changing the signature of to take the function to apply to the element and maybe a value if the indices are invalid like this:
def do_to_elt[A](i: Int, j: Int)(f: Int => A, g: => A = ()) =
if (a.isDefinedAt(i) && a(i).isDefinedAt(j)) f(a(i)(j)) else g
but I would not change anything beyond this. This also does not look more elegant but widens your use case.
(Also: If you are working with arrays you mostly do that for performance reasons and in that case it might even be better to not use isDefinedAt but perform validity checks based on the length of the arrays.)

Why no immutable arrays in scala standard library?

Scala has all sorts sorts of immutable sequences like List, Vector,etc. I have been surprised to find no implementation of immutable indexed sequence backed by a simple array (Vector seems way too complicated for my needs).
Is there a design reason for this? I could not find a good explanation on the mailing list.
Do you have a recommendation for an immutable indexed sequence that has close to the same performances as an array? I am considering scalaz's ImmutableArray, but it has some issues with scala trunk for example.
Thank you
You could cast your array into a sequence.
val s: Seq[Int] = Array(1,2,3,4)
The array will be implicitly converted to a WrappedArray. And as the type is Seq, update operations will no longer be available.
So, let's first make a distinction between interface and class. The interface is an API design, while the class is the implementation of such API.
The interfaces in Scala have the same name and different package to distinguish with regards to immutability: Seq, immutable.Seq, mutable.Seq.
The classes, on the other hand, usually don't share a name. A List is an immutable sequence, while a ListBuffer is a mutable sequence. There are exceptions, like HashSet, but that's just a coincidence with regards to implementation.
Now, and Array is not part of Scala's collection, being a Java class, but its wrapper WrappedArray shows clearly where it would show up: as a mutable class.
The interface implemented by WrappedArray is IndexedSeq, which exists are both mutable and immutable traits.
The immutable.IndexedSeq has a few implementing classes, including the WrappedString. The general use class implementing it, however, is the Vector. That class occupies the same position an Array class would occupy in the mutable side.
Now, there's no more complexity in using a Vector than using an Array, so I don't know why you call it complicated.
Perhaps you think it does too much internally, in which case you'd be wrong. All well designed immutable classes are persistent, because using an immutable collection means creating new copies of it, so they have to be optimized for that, which is exactly what Vector does.
Mostly because there are no arrays whatsoever in Scala. What you're seeing is java's arrays pimped with a few methods that help them fit into the collection API.
Anything else wouldn't be an array, with it's unique property of not suffering type erasure, or the broken variance. It would just be another type with indexes and values. Scala does have that, it's called IndexedSeq, and if you need to pass it as an array to some 3rd party API then you can just use .toArray
Scala 2.13 has added ArraySeq, which is an immutable sequence backed by an array.
Scala 3 now has IArray, an Immutable Array.
It is implemented as an Opaque Type Alias, with no runtime overhead.
The point of the scala Array class is to provide a mechanism to access the abilities of Java arrays (but without Java's awful design decision of allowing arrays to be covariant within its type system). Java arrays are mutable, hence so are those in the scala standard library.
Suppose there were also another class immutable.Array in the library but that the compiler were also to use a Java array as the underlying structure (for efficiency/speed). The following code would then compile and run:
val i = immutable.Array("Hello")
i.asInstanceOf[Array[String]](0) = "Goodbye"
println( i(0) ) //I thought i was immutable :-(
That is, the array would really be mutable.
The problem with Arrays is that they have a fixed size. There is no operation to add an element to an array, or remove one from it.
You can keep an array that you guess will be long enough as a backing store, "wasting" the memory you're not using, keep track of the last used index, and copy to a larger array if you need the extra space. That copying is O(N) obviously.
Changing a single element is also O(N) as you will need to copy over the entire array. There is no structural sharing, which is the lynchpin of performant functional datastructures.
You could also allocate an extra array for the "overflowing" elements, and somehow keep track of your arrays. At that point you're on your way of re-inventing Vector.
In short, due to their unsuitablility for structural sharing, immutable facades for arrays have terrible runtime performance characteristics for most common operations like adding an element, removing an element, and changing an element.
That only leaves the use-case of a fixed size fixed content data-carrier, and that use-case is relatively rare. Most uses better served with List, Stream or Vector
You can simply use Array[T].toIndexSeq to convert Array[T] to ArraySeq[T], which is of type immutable.IndexedSeq[T].
(after Scala 2.13.0)
scala> val array = Array(0, 1, 2)
array: Array[Int] = Array(0, 1, 2)
scala> array.toIndexedSeq
res0: IndexedSeq[Int] = ArraySeq(0, 1, 2)

Resources