Apologies if the answer is obvious, but I don't get it. I have a function that accepts a FloatArray so I passed a Array<Float> to it but it rejects it! I thought FloatArray was just another way of creating Array<Float>. What's the difference?
Short answer: one is an array of primitives, the other an array of references to Float objects.
The difference is mostly hidden from you in Kotlin, so to explain it's probably best to go back to Java…
Java has nine basic types (if I've counted correctly). Eight of them hold a value directly: boolean, byte, short, char, int, long, float, and double — those are called ‘primitives’. The other type is a reference, which can point to an instance of an object or array.
Because there are cases when you need to pass one of those primitive values around as an object, Java also provides some objects which simply wrap a primitive value: java.lang.Boolean, java.lang.Byte, and so on. There's one for each primitive type.
Most code uses primitives directly, but sometimes it's handy to be able to pass an object reference. (For one thing, primitives are not nullable, so if you need to support a null, then you'll need an object reference. For another, generic code such as List and the other classes in the collections framework can handle only object references.)
However, object wrappers are less efficient, because each instance is a full object and takes a certain amount of memory (e.g. 16–32 bytes, depending on the Java runtime) — and that's in addition to the size of references to it (perhaps 8 bytes). The JVM caches commonly-used wrappers (e.g. true and false for booleans, and some small numbers), but for anything else you'll be creating new objects on the heap.
The wrappers are clearly distinguished from the primitive types — they're capitalised (and, in the case of Integer, spelled differently). In early versions of Java, they were not interchangeable; you needed to explicitly wrap (e.g. Int(someValue) and unwrap (e.g. someReference.intValue()) when needed. Java 5 added ‘autoboxing’, where in many cases the compiler would do that for you. This blurs the distinction a bit, but most of the time you still need to be aware of it.
One of the benefits of Kotlin is that it removes some of Java's unnecessary complexity. One of the ways it does this is by hiding that distinction almost completely. The Kotlin language has no primitives: everything looks like an object. However, for reasons of efficiency, compiled Kotlin uses primitives ‘under the hood’ where possible. For example:
var i: Int
That declares an Int value — which will be stored as a primitive field. However:
var i: Int?
That declares a reference to an integer wrapper. (That's because primitives are not nullable, and so a primitive can't store a null value.)
This is an implementation detail: most of the time, when you're writing Kotlin, you don't need to be aware of this. But the distinction is still there at runtime, and arrays are one of the rare times it becomes visible:
FloatArray is an array of primitives. It uses the minimum of memory, and interoperates with Java code that uses a float[] type.
Array<Float> is an array of references to Float objects. It's more flexible, and interoperates with Java code that uses a Float[] type.
So you can see that these are two different types, even though they do similar things.
If you're interoperating with existing code, that will control which one you should use. If you're writing new code, then you have the choice: FloatArray is likely to be more efficient and use less memory — but Array<Float> tends to be better supported in other code (which may be able to process all the relevant types just by accepting a generic Array, instead of having to support FloatArray and IntArray and LongArray and all the others).
Some information about arrays in Kotlin is available here: https://kotlinlang.org/docs/basic-types.html#primitive-type-arrays
Kotlin also has classes that represent arrays of primitive types without boxing overhead: ByteArray, ShortArray, IntArray, and so on. These classes have no inheritance relation to the Array class, but they have the same set of methods and properties.
So FloatArray and Array<Float> are not the same, the difference is that the first has no boxing overhead.
Look at how FloatArray is declared in the documentation. It is just another class, not related to the Array<T> class at all. Sure, they represent very similar things, with the difference being that one of them would box Float values, and the other doesn't, as explained by the other answer. But from the perspective of the type system, they are totally unrelated. It's as if I declared:
class A
class B
and tried to pass an instance of A to a parameter expecting a B.
There are builtin methods to convert between these types though:
floatArrayOf(1f,2f,3f).toTypedArray() // FloatArray to Array<Float>
arrayOf(1f,2f,3f).toFloatArray() // Array<Float> to FloatArray
It's just that there is no implicit conversion between them, because these are unrelated types, unlike if you have subclasses and superclasses for example.
Related
I have read multiple times the answers in this question about the TArray<T> and the array of T. From what I have understood the use of the first is more versatile than the latter because for a dynamic array I should declare a type like...
type
TMyFlexibleArray = array of Integer;
... that is needed (in certain cases) because I cannot return an array of Integer for example. Instead, of course, I can return a generic type. Dynamic arrays don't have a fixed length and memory for them is reallocated with the SetLength procedure. TArray is a generic class with static methods; the documentation about it states:
You should not create instances of this class, because its only
purpose is to provide sort and search static methods.
They have two different natures/functions but do they have the same result (for example when passed as parameter or when I just need a flexible container)? I see that TArray has also some useful method.
Is is correct if I say that TArray<T> is a dynamic array built with generics and type K = array of T is an own dynamic array (a custom one)? In my question I assume that they are equivalent in their function of being dynamic arrays (and I prefer the generic way just for comfort).
Generic dynamic arrays and non generic dynamic arrays are identical in every way, apart from their generic, or otherwise, nature. That is the single difference.
That difference drives decision making in the few scenarios where one can be used but not the other. For instance:
For reasons outlined in your question, generic arrays are sometimes necessary when working with generic types.
On the other hand, when writing code that you wish to compile on old compilers that pre-date generics, then you cannot use generic arrays.
If this seems obvious, that's because it is. There really is just this one single difference between generic and non-generic arrays.
You also mention the class TArray from System.Generics.Collections. That's a static class containing methods to search and sort arrays. It's completely different from any dynamic array types and something of a distraction here. Although the names are similar, TArray<T> and TArray these are quite different things. Ignore TArray for the purpose of this question.
So in Swift, what's the difference between
var arr = ["Foo", "Bar"] // normal array in Swift
and
var arr = NSMutableArray.array() // 'NSMutableArray' object
["Foo", "Bar"].map {
arr.addObject($0)
}
other than being different implementations of the same thing.
Both appear to have all the basic features that one might need (.count, the ability to insert/remove objects etc.).
NSMutableArray was invented back in the Obj-C days, obviously to provide a more modern solution instead of the regular C-style arrays. But how does it compare to Swift's built-in array?
Which one is safer and/or faster?
The most important difference, in my opinion, is that NSMutableArray is a class type and Array is a value type. Ergo, an NSMutableArray will be passed as a reference, whereas a Swift Array will be passed by value.
Furthermore NSMutableArray is a subclass of NSObject whereas Array has no parent class. - this means that you get access to all NSObject methods and other 'goodies' when utilising NSMutableArray.
An NSMutableArray will not be copied when you amend it, a Swift Array will be.
Which one is best really depends on your application.
I find (when working with UIKit and Cocoa touch) that NSMutableArray is great when I need a persistent model, whereas Array is great for performance and throw away arrays.
These are just my initial thoughts, I'm sure someone from the community can offer much deeper insight.
Reference Type When:(NSMutableArray)
Subclasses of NSObject must be class types
Comparing instance identity with === makes sense
You want to create shared, mutable state
Value Type When: (Swift array)
Comparing instance data with == makes sense (Equatable protocol)
You want copies to have independent state
The data will be used in code across multiple threads (avoid explicit synchronization)
Interestingly enough, the Swift standard library heavily favors value types:Primitive types (Int, Double, String, …) are value types
Standard collections (Array, Dictionary, Set, …) are value types
Aside from what is illustrated above, the choice really depends on what you are trying to implement. As a rule of thumb, if there is no specific constraint that forces you to opt for a reference type, or you are not sure which option is best for your specific use case, you could start by implementing your data structure using a value type. If needed, you should be able to convert it to a reference type later with relatively little effort.
Conclusion:
Reference types incur more memory overhead, from reference counting and also for storing its data on the heap.
It's worth knowing that copying value types is relatively cheap in Swift,
But it’s important to keep in mind that if your value types become too large, the performance cost of copying can become greater than the cost of using reference types.
This is a bit of a general question but I was wondering if anybody could advise me on what would be advantages of working with Array vs ArraySeq. From what I have seen Array is scala's representation of java Array and there are not too many members in its API whereas ArraySeq seems to contain a much richer API.
There are actually four different classes you could choose from to get mutable array-like functionality.
Array + ArrayOps
WrappedArray
ArraySeq
ArrayBuffer
Array is a plain old Java array. It is by far the best way to go for low-level access to arrays of primitives. There's no overhead. Also it can act like the Scala collections thanks to implicit conversion to ArrayOps, which grabs the underlying array, applies the appropriate method, and, if appropriate, returns a new array. But since ArrayOps is not specialized for primitives, it's slow (as slow as boxing/unboxing always is).
WrappedArray is a plain old Java array, but wrapped in all of Scala's collection goodies. The difference between it and ArrayOps is that WrappedArray returns another WrappedArray--so at least you don't have the overhead of having to re-ArrayOps your Java primitive array over and over again for each operation. It's good to use when you are doing a lot of interop with Java and you need to pass in plain old Java arrays, but on the Scala side you need to manipulate them conveniently.
ArraySeq stores its data in a plain old Java array, but it no longer stores arrays of primitives; everything is an array of objects. This means that primitives get boxed on the way in. That's actually convenient if you want to use the primitives many times; since you've got boxed copies stored, you only have to unbox them, not box and unbox them on every generic operation.
ArrayBuffer acts like an array, but you can add and remove elements from it. If you're going to go all the way to ArraySeq, why not have the added flexibility of changing length while you're at it?
From the scala-lang.org forum:
Array[T] - Benefits: Native, fast -
Limitations: Few methods (only apply,
update, length), need to know T at
compile-time, because Java bytecode
represents (char[] different from
int[] different from Object[])
ArraySeq[T] (the class formerly known
as GenericArray[T]): - Benefits: Still
backed by a native Array, don't need
to know anything about T at
compile-time (new ArraySeq[T] "just
works", even if nothing is known about
T), full suite of SeqLike methods,
subtype of Seq[T] - Limitations: It's
backed by an Array[AnyRef], regardless
of what T is (if T is primitive, then
elements will be boxed/unboxed on
their way in or out of the backing
Array)
ArraySeq[Any] is much faster than
Array[Any] when handling primitives.
In any code you have Array[T], where T
isn't <: AnyRef, you'll get faster
performance out of ArraySeq.
Array is a direct representation of Java's Array, and uses the exact same bytecode on the JVM.
The advantage of Array is that it's the only collection type on the JVM to not undergo type erasure, Arrays are also able to directly hold primitives without boxing, this can make them very fast under some circumstances.
Plus, you get Java's messed up array covariance behaviour. (If you pass e.g. an Array[Int] to some Java class it can be assigned to a variable of type Array[Object] which will then throw an ArrayStoreException on trying to add anything that isn't an int.)
ArraySeq is rarely used nowadays, it's more of a historic artifact from older versions of Scala that treated arrays differently. Seeing as you have to deal with boxing anyway, you're almost certain to find that another collection type is a better fit for your requirements.
Otherwise... Arrays have exactly the same API as ArraySeq, thanks to an implicit conversion from Array to ArrayOps.
Unless you have a specific need for the unique properties of arrays, try to avoid them too.
See This Talk at around 19:30 or This Article for an idea of the sort of problems that Arrays can introduce.
After watching that video, it's interesting to note that Scala uses Seq for varargs :)
As you observed correctly, ArraySeq has a richer API as it is derived from IndexedSeq (and so on) whereas Array is a direct representation of Java arrays.
The relation between the both could be roughly compared to the relation of the ArrayList and arrays in Java.
Due to it's API, I would recommend using the ArraySeq unless there is a specific reason not to do so. Using toArray(), you can convert to an Array any time.
What distinguishes and object from a struct?
When and why do we use an object as opposed to a struct?
How does an array differ from both, and when and why would we use an array as opposed to an object or a struct?
I would like to get an idea of what each is intended for.
Obviously you can blur the distinctions according to your programming style, but generally a struct is a structured piece of data. An object is a sovereign entity that can perform some sort of task. In most systems, objects have some state and as a result have some structured data behind them. However, one of the primary functions of a well-designed class is data hiding — exactly how a class achieves whatever it does is opaque and irrelevant.
Since classes can be used to represent classic data structures such as arrays, hash maps, trees, etc, you often see them as the individual things within a block of structured data.
An array is a block of unstructured data. In many programming languages, every separate thing in an array must be of the same basic type (such as every one being an integer number, every one being a string, or similar) but that isn't true in many other languages.
As guidelines:
use an array as a place to put a large group of things with no other inherent structure or hierarchy, such as "all receipts from January" or "everything I bought in Denmark"
use structured data to compound several discrete bits of data into a single block, such as you might want to combine an x position and a y position to describe a point
use an object where there's a particular actor or thing that thinks or acts for itself
The implicit purpose of an object is therefore directly to associate tasks with the data on which they can operate and to bundle that all together so that no other part of the system can interfere. Obeying proper object-oriented design principles may require discipline at first but will ultimately massively improve your code structure and hence your ability to tackle larger projects and to work with others.
Generally speaking, objects bring the full object oriented functionality (methods, data, virtual functions, inheritance, etc, etc) whereas structs are just organized memory. Structs may or may not have support for methods / functions, but they generally won't support inheritance and other full OOP features.
Note that I said generally speaking ... individual languages are free to overload terminology however they want to.
Arrays have nothing to do with OO. Indeed, pretty much every language around support arrays. Arrays are just blocks of memory, generally containing a series of similar items, usually indexable somehow.
What distinguishes and object from a struct?
There is no notion of "struct" in OOP. The definition of structures depends on the language used. For example in C++ classes and structs are the same, but class members are private by defaults while struct members are public to maintain compatibility with C structs. In C# on the other hand, struct is used to create value types while class is for reference types. C has structs and is not object oriented.
When and why do we use an object as opposed to a struct?
Again this depends on the language used. Normally structures are used to represent PODs (Plain Old Data), meaning that they don't specify behavior that acts on the data and are mainly used to represent records and not objects. This is just a convention and is not enforced in C++.
How does an array differ from both,
and when and why would we use an
array as opposed to an object or a
struct?
An array is very different. An array is normally a homogeneous collection of elements indexed by an integer. A struct is a heterogeneous collection where elements are accessed by name. You'd use an array to represent a collection of objects of the same type (an array of colors for example) while you'd use a struct to represent a record containing data for a certain object (a single color which has red, green, and blue elements)
Short answer: Structs are value types. Classes(Objects) are reference types.
By their nature, an object has methods, a struct doesn't.
(nothing stops you from having an object without methods, jus as nothing stops you from, say, storing an integer in a float-typed variable)
When and why do we use an object as opposed to a struct?
This is a key question. I am using structs and procedural code modules to provide most of the benefits of OOP. Structs provide most of the data storage capability of objects (other than read only properties). Procedural modules provide code completion similar to that provided by objects. I can enter module.function in the IDE instead of object.method. The resulting code looks the same. Most of my functions now return stucts rather than single values. The effect on my code has been dramatic, with code readability going up and the number of lines being greatly reduced. I do not know why procedural programming that makes extensive use of structs is not more common. Why not just use OOP? Some of the languages that I use are only procedural (PureBasic) and the use of structs allows some of the benefits of OOP to be experienced. Others languages allow a choice of procedural or OOP (VBA and Python). I currently find it easier to use procedural programming and in my discipline (ecology) I find it very hard to define objects. When I can't figure out how to group data and functions together into objects in a philosophically coherent collection then I don't have a basis for creating classes/objects. With structs and functions, there is no need for defining a hierarchy of classes. I am free to shuffle functions between modules which helps me to improve the organisation of my code as I go. Perhaps this is a precursor to going OO.
Code written with structs has higher performance than OOP based code. OOP code has encapsulation, inheritance and polymorphism, however I think that struct/function based procedural code often shares these characteristics. A function returns a value only to its caller and only within scope, thereby achieving encapsulation. Likewise a function can be polymorphic. For example, I can write a function that calculates the time difference between two places with two internal algorithms, one that considers the international date line and one that does not. Inheritance usually refers to methods inheriting from a base class. There is inheritance of sorts with functions that call other functions and use structs for data transfer. A simple example is passing up an error message through a stack of nested functions. As the error message is passed up, it can be added to by the calling functions. The result is a stack trace with a very descriptive error message. In this case a message inherited through several levels. I don't know how to describe this bottom up inheritance, (event driven programming?) but it is a feature of using functions that return structs that is absent from procedural programming using simple return values. At this point in time I have not encountered any situations where OOP would be more productive than functions and structs. The surprising thing for me is that very little of the code available on the internet is written this way. It makes me wonder if there is any reason for this?
Arrays are ordered collection of items that (usually) are of the same types. Items can be accessed by index. Classic arrays allow integer indices only, however modern languages often provide so called associative arrays (dictionaries, hashes etc.) that allow use e.g. strings as indices.
Structure is a collection of named values (fields) which may be of 'different types' (e.g. field a stores integer values, field b - string values etc.). They (a) group together logically connected values and (b) simplify code change by hiding details (e.g. changing structure layout don't affect signature of function working with this structure). The latter is called 'encapsulation'.
Theroretically, object is an instance of structure that demonstrates some behavior in response to messages being sent (i.e., in most languages, having some methods). Thus, the very usefullness of object is in this behavior, not its fields.
Different objects can demonstrate different behavior in response to the same messages (the same methods being called), which is called 'polymorphism'.
In many (but not all) languages objects belong to some classes and classes can form hierarchies (which is called 'inheritance').
Since object methods can work with its fields directly, fields can be hidden from access by any code except for this methods (e.g. by marking them as private). Thus encapsulation level for objects can be higher than for structs.
Note that different languages add different semantics to this terms.
E.g.:
in CLR languages (C#, VB.NET etc) structs are allocated on stack/in registers and objects are created in heap.
in C++ structs have all fields public by default, and objects (instances of classes) have all fields private.
in some dynamic languages objects are just associative arrays which store values and methods.
I also think it's worth mentioning that the concept of a struct is very similar to an "object" in Javascript, which is defined very differently than objects in other languages. They are both referenced like "foo.bar" and the data is structured similarly.
As I see it an object at the basic level is a number of variables and a number of methods that manipulate those variables, while a struct on the other hand is only a number of variables.
I use an object when you want to include methods, I use a struct when I just want a collection of variables to pass around.
An array and a struct is kind of similar in principle, they're both a number of variables. Howoever it's more readable to write myStruct.myVar than myArray[4]. You could use an enum to specify the array indexes to get myArray[indexOfMyVar] and basically get the same functionality as a struct.
Of course you can use constants or something else instead of variables, I'm just trying to show the basic principles.
This answer may need the attention of a more experienced programmer but one of the differences between structs and objects is that structs have no capability for reflection whereas objects may. Reflection is the ability of an object to report the properties and methods that it has. This is how 'object explorer' can find and list new methods and properties created in user defined classes. In other words, reflection can be used to work out the interface of an object. With a structure, there is no way that I know of to iterate through the elements of the structure to find out what they are called, what type they are and what their values are.
If one is using structs as a replacement for objects, then one can use functions to provide the equivalent of methods. At least in my code, structs are often used for returning data from user defined functions in modules which contain the business logic. Structs and functions are as easy to use as objects but functions lack support for XML comments. This means that I constantly have to look at the comment block at the top of the function to see just what the function does. Often I have to read the function source code to see how edge cases are handled. When functions call other functions, I often have to chase something several levels deep and it becomes hard to figure things out. This leads to another benefit of OOP vs structs and functions. OOP has XML comments which show up as tool tips in the IDE (in most but not all OOP languages) and in OOP there are also defined interfaces and often an object diagram (if you choose to make them). It is becoming clear to me that the defining advantage of OOP is the capability of documenting the what code does what and how it relates to other code - the interface.
I want to get an array of bytes (Array[Byte]) from somewhere (read from file, from socket, etc) and then provide a efficient way to pull bits out of it (e.g. provide a function to extract a 32-bit integer from offset N in array). I would then like to wrap the byte array (hiding it) providing functions to pull bits out from the array (probably using lazy val for each bit to pull out).
I would imagine having a wrapping class that takes an immutable byte array type in the constructor to prove the array contents is never modified. IndexedSeq[Byte] seemed relevant, but I could not work out how to go from Array[Byte] to IndexedSeq[Byte].
Part 2 of the question is if I used IndexedSeq[Byte] will the resultant code be any slower? I need the code to execute as fast as possible, so would stick with Array[Byte] if the compiler could do a better job with it.
I could write a wrapper class around the array, but that would slow things down - one extra level of indirection for each access to bytes in the array. Performance is critical due to the number of array accesses that will be required. I need fast code, but would like to do the code nicely at the same time. Thanks!
PS: I am a Scala newbie.
Treating Array[T] as an IndexedSeq[T] could hardly be simpler:
Array(1: Byte): IndexedSeq[Byte] // trigger an Implicit View
wrapByteArray(Array(1: Byte)) // explicitly calling
Unboxing will kill you long before an extra layer of indirection.
C:\>scala -Xprint:erasure -e "{val a = Array(1: Byte); val b1: Byte = a(0); val
b2 = (a: IndexedSeq[Byte])(0)}"
[[syntax trees at end of erasure]]// Scala source: scalacmd5680604016099242427.s
cala
val a: Array[Byte] = scala.Array.apply((1: Byte), scala.this.Predef.
wrapByteArray(Array[Byte]{}));
val b1: Byte = a.apply(0);
val b2: Byte = scala.Byte.unbox((scala.this.Predef.wrapByteArray(a): IndexedSeq).apply(0));
To avoid this, the Scala collections library should be specialized on the element type, in the same style as Tuple1 and Tuple2. I'm told this is planned, but it's a bit more involved than simply slapping #specialized everywhere, so I don't know how long it will take.
UPDATE
Yes, WrappedArray is mutable, although collection.IndexedSeq[Byte] doesn't have methods to mutate, so you could just trust clients not to cast to a mutable interface. The next release of Scalaz will include ImmutableArray which prevents this.
The boxing comes retrieving an element from the collection via this generic method:
trait SeqLike[+A, +Repr] extends IterableLike[A, Repr] { self =>
def apply(idx: Int): A
}
At the JVM level, this signature is type-erased to:
def apply(idx: Int): Object
If your collection contains primitives, that is, subtypes of AnyVal, they must be boxed in the corresponding wrapper to be returned from this method. For some applications, this is a major performance concern. Entire libraries have been written in Java to avoid this, notably fastutils.
Annotation directed specialization was added to Scala 2.8 to instruct the compiler to generate various versions of a class or method tailored to the permutations of primitive types. This has been applied to a few places in the standard library already, e.g. TupleN, ProductN, Function{0, 1, 2}. If this was also applied to the collections hierarchy, this performance cost could be alleviated.
If you want to work with sequences in Scala, I recommend you choose one of these:
Immutable seqs:
(linked seqs) List, Stream, Queue
(indexed seqs) Vector
Mutable seqs:
(linked seq) ListBuffer
(indexed seq) ArrayBuffer
The new (2.8) Scala collections have been hard to grasp for me, primarily due to shortage of (correct) documentation but also because of the source code (complex hierarchys). To clear my mind I made this pic to visualize the basic structure:
(source: programmera.net)
Also, note that Array is not part of the tree structure, it is a special case, since it wraps the Java array (which is a special case in Java).