Why do filter method implementations create another array instead of modifying the current array? - arrays

From what I know, many popular implementations of filter collection methods, e.g. JavaScript's Array#filter method, tend to create a new array rather than modifying it. (As #Berthur mentioned, this is also generally useful in terms of functional programming as well).
However, from what I've seen in homemade methods of filter implementations, sometimes the author chooses to use a while / for loop on a dynamically allocated array (e.g. an ArrayList in Java) and remove elements instead.
I have a general idea of why this is the case (since removing elements requires the rest of the array's elements afterwards to be shifted over, which is O(n) while adding elements is O(1)), but I also know that in the same case, if an element is added to the end of an array when the array is full, it requires memory to be allocated, which requires, in the case for Java, the array to be copied.
Thus, is there some mathematical reason of why creating a new array for filtering is (generally) faster than removing & moving elements over, or is it just for the guaranteed immutability over the original array that it guarantees?

It's not generally faster, and it's not done for performance reasons. It's more of a programming paradigm, as well as being a convenient tool.
While in-place algorithms are often faster for performance and/or memory critical applications, they need to know about the underlying implementation of the data structure, and become more specific. This immutable approach allows for more general functionality, apart from being convenient. The approach is common in functional programming. As you say, it guarantees immutability, which makes it compatible with this way of thinking.
In your Javascript example, for instance, notice that you can call filter on a regular array, but you could also call it on a TypedArray. Now, typed arrays cannot be resized, so performing an in-place filter would not be possible in the first place. But the filter method behaves in the same way through their common interface, following the principles of polymorphism.
Ultimately, these functions are just available to you and while they can be very convenient for many cases, it is up to you as a programmer to decide whether they cover your specific need or whether you must implement your own custom algorithm.

Related

How to manage a large number of variables in C?

In an implementation of the Game of Life, I need to handle user events, perform some regular (as in periodic) processing and draw to a 2D canvas. The details are not particularly important. Suffice it to say that I need to keep track of a large(-ish) number of variables. These are things like: a structure representing the state of the system (live cells), pointers to structures provided by the graphics library, current zoom level, coordinates of the origin and I am sure a few others.
In the main function, there is a game loop like this:
// Setup stuff
while (!finished) {
while (get_event(&e) != 0) {
if (e.type == KEYBOARD_EVENT) {
switch (e.key.keysym) {
case q:
case x:
// More branching and nesting follows
The maximum level of nesting at the moment is 5. It quickly becomes unmanageable and difficult to read, especially on a small screen. The solution then is to split this up into multiple functions. Something like:
while (!finished {
while (get_event(&e) !=0) {
handle_event(state, origin_x, origin_y, &canvas, e...) //More parameters
This is the crux of the question. The subroutine must necessarily have access to the state (represented by the origin, the canvas, the live cells etc.) in order to function. Passing them all explicitly is error prone (which order does the subroutine expect them in) and can also be difficult to read. Aside from that, having functions with potentially 10+ arguments strikes me as a symptom of other design flaws. However the alternatives that I can think of, don't seem any better.
To summarise:
Accept deep nesting in the game loop.
Define functions with very many arguments.
Collate (somewhat) related arguments into structs - This really only hides the problem, especially since the arguments are only loosely related.
Define variables that represent the application state with file scope (static int origin_x; for example). If it weren't for the fact that it has been drummed into me that global variable are usually a terrible idea, this would be my preferred option. But if I want to display two views of the same instance of the Game of Life in the future, then the file scope no longer looks so appealing.
The question also applies in slightly more general terms I suppose: How do you pass state around a complicated program safely and in a readable way?
EDIT:
My motivations here are not speed or efficiency or performance or anything like this. If the code takes 20% longer to run as a result of the choice made here that's just fine. I'm primarily interested in what is less likely to confuse me and cause the least headache in 6 months time.
I would consider the canvas as one variable, containing a large 2D array...
consider static allocation
bool canvas[ROWS][COLS];
or dynamic
bool *canvas = malloc(N*M*sizeof(int));
In both cases you can refer to the cell at position i,j as canvas[i][j]
though for dynamic allocation, do not forget to free(canvas) at the end. You can then use a nested loop to update your state.
Search for allocating/handling a 2d array in C and examples or tutorials... Possibly check something like this or similar? Possibly this? https://www.geeksforgeeks.org/nested-loops-in-c-with-examples/
Also consider this Fastest way to zero out a 2d array in C?

Array vs ArraySeq comparison

This is a bit of a general question but I was wondering if anybody could advise me on what would be advantages of working with Array vs ArraySeq. From what I have seen Array is scala's representation of java Array and there are not too many members in its API whereas ArraySeq seems to contain a much richer API.
There are actually four different classes you could choose from to get mutable array-like functionality.
Array + ArrayOps
WrappedArray
ArraySeq
ArrayBuffer
Array is a plain old Java array. It is by far the best way to go for low-level access to arrays of primitives. There's no overhead. Also it can act like the Scala collections thanks to implicit conversion to ArrayOps, which grabs the underlying array, applies the appropriate method, and, if appropriate, returns a new array. But since ArrayOps is not specialized for primitives, it's slow (as slow as boxing/unboxing always is).
WrappedArray is a plain old Java array, but wrapped in all of Scala's collection goodies. The difference between it and ArrayOps is that WrappedArray returns another WrappedArray--so at least you don't have the overhead of having to re-ArrayOps your Java primitive array over and over again for each operation. It's good to use when you are doing a lot of interop with Java and you need to pass in plain old Java arrays, but on the Scala side you need to manipulate them conveniently.
ArraySeq stores its data in a plain old Java array, but it no longer stores arrays of primitives; everything is an array of objects. This means that primitives get boxed on the way in. That's actually convenient if you want to use the primitives many times; since you've got boxed copies stored, you only have to unbox them, not box and unbox them on every generic operation.
ArrayBuffer acts like an array, but you can add and remove elements from it. If you're going to go all the way to ArraySeq, why not have the added flexibility of changing length while you're at it?
From the scala-lang.org forum:
Array[T] - Benefits: Native, fast -
Limitations: Few methods (only apply,
update, length), need to know T at
compile-time, because Java bytecode
represents (char[] different from
int[] different from Object[])
ArraySeq[T] (the class formerly known
as GenericArray[T]): - Benefits: Still
backed by a native Array, don't need
to know anything about T at
compile-time (new ArraySeq[T] "just
works", even if nothing is known about
T), full suite of SeqLike methods,
subtype of Seq[T] - Limitations: It's
backed by an Array[AnyRef], regardless
of what T is (if T is primitive, then
elements will be boxed/unboxed on
their way in or out of the backing
Array)
ArraySeq[Any] is much faster than
Array[Any] when handling primitives.
In any code you have Array[T], where T
isn't <: AnyRef, you'll get faster
performance out of ArraySeq.
Array is a direct representation of Java's Array, and uses the exact same bytecode on the JVM.
The advantage of Array is that it's the only collection type on the JVM to not undergo type erasure, Arrays are also able to directly hold primitives without boxing, this can make them very fast under some circumstances.
Plus, you get Java's messed up array covariance behaviour. (If you pass e.g. an Array[Int] to some Java class it can be assigned to a variable of type Array[Object] which will then throw an ArrayStoreException on trying to add anything that isn't an int.)
ArraySeq is rarely used nowadays, it's more of a historic artifact from older versions of Scala that treated arrays differently. Seeing as you have to deal with boxing anyway, you're almost certain to find that another collection type is a better fit for your requirements.
Otherwise... Arrays have exactly the same API as ArraySeq, thanks to an implicit conversion from Array to ArrayOps.
Unless you have a specific need for the unique properties of arrays, try to avoid them too.
See This Talk at around 19:30 or This Article for an idea of the sort of problems that Arrays can introduce.
After watching that video, it's interesting to note that Scala uses Seq for varargs :)
As you observed correctly, ArraySeq has a richer API as it is derived from IndexedSeq (and so on) whereas Array is a direct representation of Java arrays.
The relation between the both could be roughly compared to the relation of the ArrayList and arrays in Java.
Due to it's API, I would recommend using the ArraySeq unless there is a specific reason not to do so. Using toArray(), you can convert to an Array any time.

Why no immutable arrays in scala standard library?

Scala has all sorts sorts of immutable sequences like List, Vector,etc. I have been surprised to find no implementation of immutable indexed sequence backed by a simple array (Vector seems way too complicated for my needs).
Is there a design reason for this? I could not find a good explanation on the mailing list.
Do you have a recommendation for an immutable indexed sequence that has close to the same performances as an array? I am considering scalaz's ImmutableArray, but it has some issues with scala trunk for example.
Thank you
You could cast your array into a sequence.
val s: Seq[Int] = Array(1,2,3,4)
The array will be implicitly converted to a WrappedArray. And as the type is Seq, update operations will no longer be available.
So, let's first make a distinction between interface and class. The interface is an API design, while the class is the implementation of such API.
The interfaces in Scala have the same name and different package to distinguish with regards to immutability: Seq, immutable.Seq, mutable.Seq.
The classes, on the other hand, usually don't share a name. A List is an immutable sequence, while a ListBuffer is a mutable sequence. There are exceptions, like HashSet, but that's just a coincidence with regards to implementation.
Now, and Array is not part of Scala's collection, being a Java class, but its wrapper WrappedArray shows clearly where it would show up: as a mutable class.
The interface implemented by WrappedArray is IndexedSeq, which exists are both mutable and immutable traits.
The immutable.IndexedSeq has a few implementing classes, including the WrappedString. The general use class implementing it, however, is the Vector. That class occupies the same position an Array class would occupy in the mutable side.
Now, there's no more complexity in using a Vector than using an Array, so I don't know why you call it complicated.
Perhaps you think it does too much internally, in which case you'd be wrong. All well designed immutable classes are persistent, because using an immutable collection means creating new copies of it, so they have to be optimized for that, which is exactly what Vector does.
Mostly because there are no arrays whatsoever in Scala. What you're seeing is java's arrays pimped with a few methods that help them fit into the collection API.
Anything else wouldn't be an array, with it's unique property of not suffering type erasure, or the broken variance. It would just be another type with indexes and values. Scala does have that, it's called IndexedSeq, and if you need to pass it as an array to some 3rd party API then you can just use .toArray
Scala 2.13 has added ArraySeq, which is an immutable sequence backed by an array.
Scala 3 now has IArray, an Immutable Array.
It is implemented as an Opaque Type Alias, with no runtime overhead.
The point of the scala Array class is to provide a mechanism to access the abilities of Java arrays (but without Java's awful design decision of allowing arrays to be covariant within its type system). Java arrays are mutable, hence so are those in the scala standard library.
Suppose there were also another class immutable.Array in the library but that the compiler were also to use a Java array as the underlying structure (for efficiency/speed). The following code would then compile and run:
val i = immutable.Array("Hello")
i.asInstanceOf[Array[String]](0) = "Goodbye"
println( i(0) ) //I thought i was immutable :-(
That is, the array would really be mutable.
The problem with Arrays is that they have a fixed size. There is no operation to add an element to an array, or remove one from it.
You can keep an array that you guess will be long enough as a backing store, "wasting" the memory you're not using, keep track of the last used index, and copy to a larger array if you need the extra space. That copying is O(N) obviously.
Changing a single element is also O(N) as you will need to copy over the entire array. There is no structural sharing, which is the lynchpin of performant functional datastructures.
You could also allocate an extra array for the "overflowing" elements, and somehow keep track of your arrays. At that point you're on your way of re-inventing Vector.
In short, due to their unsuitablility for structural sharing, immutable facades for arrays have terrible runtime performance characteristics for most common operations like adding an element, removing an element, and changing an element.
That only leaves the use-case of a fixed size fixed content data-carrier, and that use-case is relatively rare. Most uses better served with List, Stream or Vector
You can simply use Array[T].toIndexSeq to convert Array[T] to ArraySeq[T], which is of type immutable.IndexedSeq[T].
(after Scala 2.13.0)
scala> val array = Array(0, 1, 2)
array: Array[Int] = Array(0, 1, 2)
scala> array.toIndexedSeq
res0: IndexedSeq[Int] = ArraySeq(0, 1, 2)

What's the difference between an object and a struct in OOP?

What distinguishes and object from a struct?
When and why do we use an object as opposed to a struct?
How does an array differ from both, and when and why would we use an array as opposed to an object or a struct?
I would like to get an idea of what each is intended for.
Obviously you can blur the distinctions according to your programming style, but generally a struct is a structured piece of data. An object is a sovereign entity that can perform some sort of task. In most systems, objects have some state and as a result have some structured data behind them. However, one of the primary functions of a well-designed class is data hiding — exactly how a class achieves whatever it does is opaque and irrelevant.
Since classes can be used to represent classic data structures such as arrays, hash maps, trees, etc, you often see them as the individual things within a block of structured data.
An array is a block of unstructured data. In many programming languages, every separate thing in an array must be of the same basic type (such as every one being an integer number, every one being a string, or similar) but that isn't true in many other languages.
As guidelines:
use an array as a place to put a large group of things with no other inherent structure or hierarchy, such as "all receipts from January" or "everything I bought in Denmark"
use structured data to compound several discrete bits of data into a single block, such as you might want to combine an x position and a y position to describe a point
use an object where there's a particular actor or thing that thinks or acts for itself
The implicit purpose of an object is therefore directly to associate tasks with the data on which they can operate and to bundle that all together so that no other part of the system can interfere. Obeying proper object-oriented design principles may require discipline at first but will ultimately massively improve your code structure and hence your ability to tackle larger projects and to work with others.
Generally speaking, objects bring the full object oriented functionality (methods, data, virtual functions, inheritance, etc, etc) whereas structs are just organized memory. Structs may or may not have support for methods / functions, but they generally won't support inheritance and other full OOP features.
Note that I said generally speaking ... individual languages are free to overload terminology however they want to.
Arrays have nothing to do with OO. Indeed, pretty much every language around support arrays. Arrays are just blocks of memory, generally containing a series of similar items, usually indexable somehow.
What distinguishes and object from a struct?
There is no notion of "struct" in OOP. The definition of structures depends on the language used. For example in C++ classes and structs are the same, but class members are private by defaults while struct members are public to maintain compatibility with C structs. In C# on the other hand, struct is used to create value types while class is for reference types. C has structs and is not object oriented.
When and why do we use an object as opposed to a struct?
Again this depends on the language used. Normally structures are used to represent PODs (Plain Old Data), meaning that they don't specify behavior that acts on the data and are mainly used to represent records and not objects. This is just a convention and is not enforced in C++.
How does an array differ from both,
and when and why would we use an
array as opposed to an object or a
struct?
An array is very different. An array is normally a homogeneous collection of elements indexed by an integer. A struct is a heterogeneous collection where elements are accessed by name. You'd use an array to represent a collection of objects of the same type (an array of colors for example) while you'd use a struct to represent a record containing data for a certain object (a single color which has red, green, and blue elements)
Short answer: Structs are value types. Classes(Objects) are reference types.
By their nature, an object has methods, a struct doesn't.
(nothing stops you from having an object without methods, jus as nothing stops you from, say, storing an integer in a float-typed variable)
When and why do we use an object as opposed to a struct?
This is a key question. I am using structs and procedural code modules to provide most of the benefits of OOP. Structs provide most of the data storage capability of objects (other than read only properties). Procedural modules provide code completion similar to that provided by objects. I can enter module.function in the IDE instead of object.method. The resulting code looks the same. Most of my functions now return stucts rather than single values. The effect on my code has been dramatic, with code readability going up and the number of lines being greatly reduced. I do not know why procedural programming that makes extensive use of structs is not more common. Why not just use OOP? Some of the languages that I use are only procedural (PureBasic) and the use of structs allows some of the benefits of OOP to be experienced. Others languages allow a choice of procedural or OOP (VBA and Python). I currently find it easier to use procedural programming and in my discipline (ecology) I find it very hard to define objects. When I can't figure out how to group data and functions together into objects in a philosophically coherent collection then I don't have a basis for creating classes/objects. With structs and functions, there is no need for defining a hierarchy of classes. I am free to shuffle functions between modules which helps me to improve the organisation of my code as I go. Perhaps this is a precursor to going OO.
Code written with structs has higher performance than OOP based code. OOP code has encapsulation, inheritance and polymorphism, however I think that struct/function based procedural code often shares these characteristics. A function returns a value only to its caller and only within scope, thereby achieving encapsulation. Likewise a function can be polymorphic. For example, I can write a function that calculates the time difference between two places with two internal algorithms, one that considers the international date line and one that does not. Inheritance usually refers to methods inheriting from a base class. There is inheritance of sorts with functions that call other functions and use structs for data transfer. A simple example is passing up an error message through a stack of nested functions. As the error message is passed up, it can be added to by the calling functions. The result is a stack trace with a very descriptive error message. In this case a message inherited through several levels. I don't know how to describe this bottom up inheritance, (event driven programming?) but it is a feature of using functions that return structs that is absent from procedural programming using simple return values. At this point in time I have not encountered any situations where OOP would be more productive than functions and structs. The surprising thing for me is that very little of the code available on the internet is written this way. It makes me wonder if there is any reason for this?
Arrays are ordered collection of items that (usually) are of the same types. Items can be accessed by index. Classic arrays allow integer indices only, however modern languages often provide so called associative arrays (dictionaries, hashes etc.) that allow use e.g. strings as indices.
Structure is a collection of named values (fields) which may be of 'different types' (e.g. field a stores integer values, field b - string values etc.). They (a) group together logically connected values and (b) simplify code change by hiding details (e.g. changing structure layout don't affect signature of function working with this structure). The latter is called 'encapsulation'.
Theroretically, object is an instance of structure that demonstrates some behavior in response to messages being sent (i.e., in most languages, having some methods). Thus, the very usefullness of object is in this behavior, not its fields.
Different objects can demonstrate different behavior in response to the same messages (the same methods being called), which is called 'polymorphism'.
In many (but not all) languages objects belong to some classes and classes can form hierarchies (which is called 'inheritance').
Since object methods can work with its fields directly, fields can be hidden from access by any code except for this methods (e.g. by marking them as private). Thus encapsulation level for objects can be higher than for structs.
Note that different languages add different semantics to this terms.
E.g.:
in CLR languages (C#, VB.NET etc) structs are allocated on stack/in registers and objects are created in heap.
in C++ structs have all fields public by default, and objects (instances of classes) have all fields private.
in some dynamic languages objects are just associative arrays which store values and methods.
I also think it's worth mentioning that the concept of a struct is very similar to an "object" in Javascript, which is defined very differently than objects in other languages. They are both referenced like "foo.bar" and the data is structured similarly.
As I see it an object at the basic level is a number of variables and a number of methods that manipulate those variables, while a struct on the other hand is only a number of variables.
I use an object when you want to include methods, I use a struct when I just want a collection of variables to pass around.
An array and a struct is kind of similar in principle, they're both a number of variables. Howoever it's more readable to write myStruct.myVar than myArray[4]. You could use an enum to specify the array indexes to get myArray[indexOfMyVar] and basically get the same functionality as a struct.
Of course you can use constants or something else instead of variables, I'm just trying to show the basic principles.
This answer may need the attention of a more experienced programmer but one of the differences between structs and objects is that structs have no capability for reflection whereas objects may. Reflection is the ability of an object to report the properties and methods that it has. This is how 'object explorer' can find and list new methods and properties created in user defined classes. In other words, reflection can be used to work out the interface of an object. With a structure, there is no way that I know of to iterate through the elements of the structure to find out what they are called, what type they are and what their values are.
If one is using structs as a replacement for objects, then one can use functions to provide the equivalent of methods. At least in my code, structs are often used for returning data from user defined functions in modules which contain the business logic. Structs and functions are as easy to use as objects but functions lack support for XML comments. This means that I constantly have to look at the comment block at the top of the function to see just what the function does. Often I have to read the function source code to see how edge cases are handled. When functions call other functions, I often have to chase something several levels deep and it becomes hard to figure things out. This leads to another benefit of OOP vs structs and functions. OOP has XML comments which show up as tool tips in the IDE (in most but not all OOP languages) and in OOP there are also defined interfaces and often an object diagram (if you choose to make them). It is becoming clear to me that the defining advantage of OOP is the capability of documenting the what code does what and how it relates to other code - the interface.

Does using lists of structs make sense in cocoa?

This question has spawned out of this one. Working with lists of structs in cocoa is not simple. Either use NSArray and encode/decode, or use a C type array and lose the commodities of NSArray. Structs are supposed to be simple, but when a list is needed, one would tend to build a class instead.
When does using lists of structs make sense in cocoa?
I know there are already many questions regarding structs vs classes, and I've read users argue that it's the same answer for every language, but at least cocoa should have its own specific answers to this, if only because of KVC or bindings (as Peter suggested on the first question).
Cocoa has a few common types that are structs, not objects: NSPoint, NSRect, NSRange (and their CG counterparts).
When in doubt, follow Cocoa's lead. If you find yourself dealing with a large number of small, mostly-data objects, you might want to make them structs instead for efficiency.
Using NSArray/NSMutableArray as the top-level container, and wrapping the structs in an NSValue will probably make your life a lot easier. I would only go to a straight C-type array if you find NSArray to be a performance bottleneck, or possibly if the array is essentially read-only.
It is convenient and useful at times to use structs, especially when you have to drop down to C, such as when working with an existing library or doing system level stuff. Sometimes you just want a compact data structure without the overhead of a class. If you need many instances of such structs, it can make a real impact on performance and memory footprint.
Another way to do an array of structs is to use the NSPointerArray class. It takes a bit more thought to set up but it works pretty much just like an NSArray after that and you don't have to bother with boxing/unboxing or wrapping in a class so accessing the data is more convenient, and it doesn't take up the extra memory of a class.
NSPointerFunctions *pf = [[NSPointerFunctions alloc] initWithOptions:NSPointerFunctionsMallocMemory |
NSPointerFunctionsStructPersonality |
NSPointerFunctionsCopyIn];
pf.sizeFunction = keventSizeFunction;
self.pending = [[NSPointerArray alloc] initWithPointerFunctions:pf];
In general, the use of a struct implies the existence of a relatively simple data type that has no logic associated with it nor should have any logic associated with it. Take an NSPoint for instance - it is merely a (x,y) representation. Given this, there are also some issues that arise from it's use. In general, this is OK for this type of data as we usually observe for a change in the point rather than the y-coordinate of a point (fundamentally, (0,1) isn't the same as (1,1) shifted down by 1 unit). If this is an undesirable behavior, it may be a better idea to use a class.

Resources