In Swift 3.0, the code below gives different addresses for thisArray[0], suggesting that the array was deep copied. Is this actually the case, or am I missing something in my analysis? Does if let behave the same way? It may be irrelevant for if let, as it is immutable...
var thisArray: [String]? = ["One", "Two"]
withUnsafePointer(to: &thisArray![0]) {
print("thisArray[0] has address \($0)")
}
if var thisArray = thisArray {
withUnsafePointer(to: &thisArray[0]) {
print("thisArray[0] has address \($0)")
}
}
Relevant: https://developer.apple.com/swift/blog/?id=10.
In Swift, Array, String, and Dictionary are all value types.
So, if you assign an existing value type via var or let then a copy occurs. If you assign an existing reference type (such as a class) via var or let then you'll be assigning a reference.
#CharlieS's answer is mostly correct but glosses over some important details...
Semantically, assigning a value type to a different binding (whether a var variable or let constant) always creates a copy. That is, your program code can always safely assume that modifications to one binding of a value type will never affect others.
Or to put it a different way: if you were building your own version of the Swift compiler / runtime / standard library from scratch, you could make every var a = b allocate new memory for a and copy all the memory contents of b, regardless of which value type a and b are. All other things being equal, your implementation would be compatible with all Swift programs.
The downside to value type reassignment always being a copy is that for large types (like collections or composite types), all that copying wastes time and memory. So...
In practice, value types can be implemented in ways that maintain the semantic always-a-copy guarantee of value types while providing performance optimizations like copy-on-write. The Swift Standard Library collection types (arrays, dictionaries, sets, etc) do this, and it's possible for custom value types (including yours) to implement copy-on-write too. (For details on how, this WWDC 2015 talk provides a good overview.)
To make copy-on-write work, an implementing value type needs to use reference types internally (as noted in that WWDC talk). And it has to do it in such a way that the language guarantee for value types — that assignments are always semantically copies — continues to hold in all cases.
One of the ways that a copy-on-write array implementation could fail that guarantee would be to allow unguarded access to its underlying storage buffer — if you can get a raw pointer into that storage, you could mutate the contents in ways that cause other bindings (that is, semantic copies) to mutate, violating the language guarantee.
To preserve the copy-on-write guarantee, the standard library's collection types make sure that copies certain operations that could perform unguarded mutation create copies. (Although even then, sometimes the copies created involve enough reference manipulation that the memory and time costs of the copies remain low up until an actual mutation happens.)
You can see a bit of how this works in the Swift compiler & standard library source code — start from a search for isUniquelyReferenced and follow the callers and callees of is various use cases in ArrayBuffer etc.
For an illustration of what's going on here, let's try a variation on your test:
var thisArray: [String] = ["One", "Two"]
withUnsafePointer(to: &thisArray[0]) {
print("thisArray[0] has address \($0)")
}
var thatArray = thisArray // comment/uncomment here
withUnsafePointer(to: &thisArray[0]) {
print("thisArray[0] has address \($0)")
}
When you comment out the assignment thatArray = thisArray, both addresses are the same. Once thisArray is no longer uniquely referenced, though, accessing even the original array's underlying buffer requires a copy (or at least some internal indirection).
Related
In Go there are various ways to return a struct value or slice thereof. For individual ones I've seen:
type MyStruct struct {
Val int
}
func myfunc() MyStruct {
return MyStruct{Val: 1}
}
func myfunc() *MyStruct {
return &MyStruct{}
}
func myfunc(s *MyStruct) {
s.Val = 1
}
I understand the differences between these. The first returns a copy of the struct, the second a pointer to the struct value created within the function, the third expects an existing struct to be passed in and overrides the value.
I've seen all of these patterns be used in various contexts, I'm wondering what the best practices are regarding these. When would you use which? For instance, the first one could be ok for small structs (because the overhead is minimal), the second for bigger ones. And the third if you want to be extremely memory efficient, because you can easily reuse a single struct instance between calls. Are there any best practices for when to use which?
Similarly, the same question regarding slices:
func myfunc() []MyStruct {
return []MyStruct{ MyStruct{Val: 1} }
}
func myfunc() []*MyStruct {
return []MyStruct{ &MyStruct{Val: 1} }
}
func myfunc(s *[]MyStruct) {
*s = []MyStruct{ MyStruct{Val: 1} }
}
func myfunc(s *[]*MyStruct) {
*s = []MyStruct{ &MyStruct{Val: 1} }
}
Again: what are best practices here. I know slices are always pointers, so returning a pointer to a slice isn't useful. However, should I return a slice of struct values, a slice of pointers to structs, should I pass in a pointer to a slice as argument (a pattern used in the Go App Engine API)?
tl;dr:
Methods using receiver pointers are common; the rule of thumb for receivers is, "If in doubt, use a pointer."
Slices, maps, channels, strings, function values, and interface values are implemented with pointers internally, and a pointer to them is often redundant.
Elsewhere, use pointers for big structs or structs you'll have to change, and otherwise pass values, because getting things changed by surprise via a pointer is confusing.
One case where you should often use a pointer:
Receivers are pointers more often than other arguments. It's not unusual for methods to modify the thing they're called on, or for named types to be large structs, so the guidance is to default to pointers except in rare cases.
Jeff Hodges' copyfighter tool automatically searches for non-tiny receivers passed by value.
Some situations where you don't need pointers:
Code review guidelines suggest passing small structs like type Point struct { latitude, longitude float64 }, and maybe even things a bit bigger, as values, unless the function you're calling needs to be able to modify them in place.
Value semantics avoid aliasing situations where an assignment over here changes a value over there by surprise.
Passing small structs by value can be more efficient by avoiding cache misses or heap allocations. In any case, when pointers and values perform similarly, the Go-y approach is to choose whatever provides the more natural semantics rather than squeeze out every last bit of speed.
So, Go Wiki's code review comments page suggests passing by value when structs are small and likely to stay that way.
If the "large" cutoff seems vague, it is; arguably many structs are in a range where either a pointer or a value is OK. As a lower bound, the code review comments suggest slices (three machine words) are reasonable to use as value receivers. As something nearer an upper bound, bytes.Replace takes 10 words' worth of args (three slices and an int). You can find situations where copying even large structs turns out a performance win, but the rule of thumb is not to.
For slices, you don't need to pass a pointer to change elements of the array. io.Reader.Read(p []byte) changes the bytes of p, for instance. It's arguably a special case of "treat little structs like values," since internally you're passing around a little structure called a slice header (see Russ Cox (rsc)'s explanation). Similarly, you don't need a pointer to modify a map or communicate on a channel.
For slices you'll reslice (change the start/length/capacity of), built-in functions like append accept a slice value and return a new one. I'd imitate that; it avoids aliasing, returning a new slice helps call attention to the fact that a new array might be allocated, and it's familiar to callers.
It's not always practical follow that pattern. Some tools like database interfaces or serializers need to append to a slice whose type isn't known at compile time. They sometimes accept a pointer to a slice in an interface{} parameter.
Maps, channels, strings, and function and interface values, like slices, are internally references or structures that contain references already, so if you're just trying to avoid getting the underlying data copied, you don't need to pass pointers to them. (rsc wrote a separate post on how interface values are stored).
You still may need to pass pointers in the rarer case that you want to modify the caller's struct: flag.StringVar takes a *string for that reason, for example.
Where you use pointers:
Consider whether your function should be a method on whichever struct you need a pointer to. People expect a lot of methods on x to modify x, so making the modified struct the receiver may help to minimize surprise. There are guidelines on when receivers should be pointers.
Functions that have effects on their non-receiver params should make that clear in the godoc, or better yet, the godoc and the name (like reader.WriteTo(writer)).
You mention accepting a pointer to avoid allocations by allowing reuse; changing APIs for the sake of memory reuse is an optimization I'd delay until it's clear the allocations have a nontrivial cost, and then I'd look for a way that doesn't force the trickier API on all users:
For avoiding allocations, Go's escape analysis is your friend. You can sometimes help it avoid heap allocations by making types that can be initialized with a trivial constructor, a plain literal, or a useful zero value like bytes.Buffer.
Consider a Reset() method to put an object back in a blank state, like some stdlib types offer. Users who don't care or can't save an allocation don't have to call it.
Consider writing modify-in-place methods and create-from-scratch functions as matching pairs, for convenience: existingUser.LoadFromJSON(json []byte) error could be wrapped by NewUserFromJSON(json []byte) (*User, error). Again, it pushes the choice between laziness and pinching allocations to the individual caller.
Callers seeking to recycle memory can let sync.Pool handle some details. If a particular allocation creates a lot of memory pressure, you're confident you know when the alloc is no longer used, and you don't have a better optimization available, sync.Pool can help. (CloudFlare published a useful (pre-sync.Pool) blog post about recycling.)
Finally, on whether your slices should be of pointers: slices of values can be useful, and save you allocations and cache misses. There can be blockers:
The API to create your items might force pointers on you, e.g. you have to call NewFoo() *Foo rather than let Go initialize with the zero value.
The desired lifetimes of the items might not all be the same. The whole slice is freed at once; if 99% of the items are no longer useful but you have pointers to the other 1%, all of the array remains allocated.
Copying or moving the values might cause you performance or correctness problems, making pointers more attractive. Notably, append copies items when it grows the underlying array. Pointers to slice items from before the append may not point to where the item was copied after, copying can be slower for huge structs, and for e.g. sync.Mutex copying isn't allowed. Insert/delete in the middle and sorting also move items around so similar considerations can apply.
Broadly, value slices can make sense if either you get all of your items in place up front and don't move them (e.g., no more appends after initial setup), or if you do keep moving them around but you're confident that's OK (no/careful use of pointers to items, and items are small or you've measured the perf impact). Sometimes it comes down to something more specific to your situation, but that's a rough guide.
If you can (e.g. a non-shared resource that does not need to be passed as reference), use a value. By the following reasons:
Your code will be nicer and more readable, avoiding pointer operators and null checks.
Your code will be safer against Null Pointer panics.
Your code will be often faster: yes, faster! Why?
Reason 1: you will allocate less items in the heap. Allocating/deallocating from stack is immediate, but allocating/deallocating on Heap may be very expensive (allocation time + garbage collection). You can see some basic numbers here: http://www.macias.info/entry/201802102230_go_values_vs_references.md
Reason 2: especially if you store returned values in slices, your memory objects will be more compacted in memory: looping a slice where all the items are contiguous is much faster than iterating a slice where all the items are pointers to other parts of the memory. Not for the indirection step but for the increase of cache misses.
Myth breaker: a typical x86 cache line are 64 bytes. Most structs are smaller than that. The time of copying a cache line in memory is similar to copying a pointer.
Only if a critical part of your code is slow I would try some micro-optimization and check if using pointers improves somewhat the speed, at the cost of less readability and mantainability.
Three main reasons when you would want to use method receivers as pointers:
"First, and most important, does the method need to modify the receiver? If it does, the receiver must be a pointer."
"Second is the consideration of efficiency. If the receiver is large, a big struct for instance, it will be much cheaper to use a pointer receiver."
"Next is consistency. If some of the methods of the type must have pointer receivers, the rest should too, so the method set is consistent regardless of how the type is used"
Reference : https://golang.org/doc/faq#methods_on_values_or_pointers
Edit : Another important thing is to know the actual "type" that you are sending to function. The type can either be a 'value type' or 'reference type'.
Even as slices and maps acts as references, we might want to pass them as pointers in scenarios like changing the length of the slice in the function.
A case where you generally need to return a pointer is when constructing an instance of some stateful or shareable resource. This is often done by functions prefixed with New.
Because they represent a specific instance of something and they may need to coordinate some activity, it doesn't make a lot of sense to generate duplicated/copied structures representing the same resource -- so the returned pointer acts as the handle to the resource itself.
Some examples:
func NewTLSServer(handler http.Handler) *Server -- instantiate a web server for testing
func Open(name string) (*File, error) -- return a file access handle
In other cases, pointers are returned just because the structure may be too large to copy by default:
func NewRGBA(r Rectangle) *RGBA -- allocate an image in memory
Alternatively, returning pointers directly could be avoided by instead returning a copy of a structure that contains the pointer internally, but maybe this isn't considered idiomatic:
No such examples found in the standard libraries...
Related question: Embedding in Go with pointer or with value
Regarding to struct vs. pointer return value, I got confused after reading many highly stared open source projects on github, as there are many examples for both cases, util I found this amazing article:
https://www.ardanlabs.com/blog/2014/12/using-pointers-in-go.html
"In general, share struct type values with a pointer unless the struct type has been implemented to behave like a primitive data value.
If you are still not sure, this is another way to think about. Think of every struct as having a nature. If the nature of the struct is something that should not be changed, like a time, a color or a coordinate, then implement the struct as a primitive data value. If the nature of the struct is something that can be changed, even if it never is in your program, it is not a primitive data value and should be implemented to be shared with a pointer. Don’t create structs that have a duality of nature."
Completedly convinced.
Apologies if the answer is obvious, but I don't get it. I have a function that accepts a FloatArray so I passed a Array<Float> to it but it rejects it! I thought FloatArray was just another way of creating Array<Float>. What's the difference?
Short answer: one is an array of primitives, the other an array of references to Float objects.
The difference is mostly hidden from you in Kotlin, so to explain it's probably best to go back to Java…
Java has nine basic types (if I've counted correctly). Eight of them hold a value directly: boolean, byte, short, char, int, long, float, and double — those are called ‘primitives’. The other type is a reference, which can point to an instance of an object or array.
Because there are cases when you need to pass one of those primitive values around as an object, Java also provides some objects which simply wrap a primitive value: java.lang.Boolean, java.lang.Byte, and so on. There's one for each primitive type.
Most code uses primitives directly, but sometimes it's handy to be able to pass an object reference. (For one thing, primitives are not nullable, so if you need to support a null, then you'll need an object reference. For another, generic code such as List and the other classes in the collections framework can handle only object references.)
However, object wrappers are less efficient, because each instance is a full object and takes a certain amount of memory (e.g. 16–32 bytes, depending on the Java runtime) — and that's in addition to the size of references to it (perhaps 8 bytes). The JVM caches commonly-used wrappers (e.g. true and false for booleans, and some small numbers), but for anything else you'll be creating new objects on the heap.
The wrappers are clearly distinguished from the primitive types — they're capitalised (and, in the case of Integer, spelled differently). In early versions of Java, they were not interchangeable; you needed to explicitly wrap (e.g. Int(someValue) and unwrap (e.g. someReference.intValue()) when needed. Java 5 added ‘autoboxing’, where in many cases the compiler would do that for you. This blurs the distinction a bit, but most of the time you still need to be aware of it.
One of the benefits of Kotlin is that it removes some of Java's unnecessary complexity. One of the ways it does this is by hiding that distinction almost completely. The Kotlin language has no primitives: everything looks like an object. However, for reasons of efficiency, compiled Kotlin uses primitives ‘under the hood’ where possible. For example:
var i: Int
That declares an Int value — which will be stored as a primitive field. However:
var i: Int?
That declares a reference to an integer wrapper. (That's because primitives are not nullable, and so a primitive can't store a null value.)
This is an implementation detail: most of the time, when you're writing Kotlin, you don't need to be aware of this. But the distinction is still there at runtime, and arrays are one of the rare times it becomes visible:
FloatArray is an array of primitives. It uses the minimum of memory, and interoperates with Java code that uses a float[] type.
Array<Float> is an array of references to Float objects. It's more flexible, and interoperates with Java code that uses a Float[] type.
So you can see that these are two different types, even though they do similar things.
If you're interoperating with existing code, that will control which one you should use. If you're writing new code, then you have the choice: FloatArray is likely to be more efficient and use less memory — but Array<Float> tends to be better supported in other code (which may be able to process all the relevant types just by accepting a generic Array, instead of having to support FloatArray and IntArray and LongArray and all the others).
Some information about arrays in Kotlin is available here: https://kotlinlang.org/docs/basic-types.html#primitive-type-arrays
Kotlin also has classes that represent arrays of primitive types without boxing overhead: ByteArray, ShortArray, IntArray, and so on. These classes have no inheritance relation to the Array class, but they have the same set of methods and properties.
So FloatArray and Array<Float> are not the same, the difference is that the first has no boxing overhead.
Look at how FloatArray is declared in the documentation. It is just another class, not related to the Array<T> class at all. Sure, they represent very similar things, with the difference being that one of them would box Float values, and the other doesn't, as explained by the other answer. But from the perspective of the type system, they are totally unrelated. It's as if I declared:
class A
class B
and tried to pass an instance of A to a parameter expecting a B.
There are builtin methods to convert between these types though:
floatArrayOf(1f,2f,3f).toTypedArray() // FloatArray to Array<Float>
arrayOf(1f,2f,3f).toFloatArray() // Array<Float> to FloatArray
It's just that there is no implicit conversion between them, because these are unrelated types, unlike if you have subclasses and superclasses for example.
I have two arrays:
struct Data {
all_objects: Vec<Rc<dyn Drawable>>;
selected_objects: Vec<Rc<dyn Drawable>>;
}
selected_objects is guarenteed to be a subset of all_objects. I want to be able to somehow be able to add or remove mutable references to selected objects.
I can add the objects easily enough to selected_objects:
Rc::get_mut(selected_object).unwrap().select(true);
self.selected_objects.push(selected_object.clone());
However, if I later try:
for obj in self.selected_objects.iter_mut() {
Rc::get_mut(obj).unwrap().select(false);
}
This gives a runtime error, which matches the documentation for get_mut: "Returns None otherwise, because it is not safe to mutate a shared value."
However, I really want to be able to access and call arbitrary methods on both arrays, so I can efficiently perform operations on the selection, while also being able to still perform operations for all objects.
It seems Rc does not support this, it seems RefMut is missing a Clone() that alows me to put it into multiple arrays, plus not actually supporting dyn types. Box is also missing a Clone(). So my question is, how do you store writable pointers in multiple arrays? Is there another type of smart pointer for this purpose? Do I need to nest them? Is there some other data structure more suitable? Is there a way to give up the writable reference?
Ok, it took me a bit of trial and error, but I have a ugly solution:
struct Data {
all_objects: Vec<Rc<RefCell<dyn Drawable>>>;
selected_objects: Vec<Rc<RefCell<dyn Drawable>>>;
}
The Rc allows you to store multiple references to an object. RefCell makes these references mutable. Now the only thing I have to do is call .borrow() every time I use a object.
While this seems to work and be reasonably versitle, I'm still open for cleaner solutions.
So in Swift, what's the difference between
var arr = ["Foo", "Bar"] // normal array in Swift
and
var arr = NSMutableArray.array() // 'NSMutableArray' object
["Foo", "Bar"].map {
arr.addObject($0)
}
other than being different implementations of the same thing.
Both appear to have all the basic features that one might need (.count, the ability to insert/remove objects etc.).
NSMutableArray was invented back in the Obj-C days, obviously to provide a more modern solution instead of the regular C-style arrays. But how does it compare to Swift's built-in array?
Which one is safer and/or faster?
The most important difference, in my opinion, is that NSMutableArray is a class type and Array is a value type. Ergo, an NSMutableArray will be passed as a reference, whereas a Swift Array will be passed by value.
Furthermore NSMutableArray is a subclass of NSObject whereas Array has no parent class. - this means that you get access to all NSObject methods and other 'goodies' when utilising NSMutableArray.
An NSMutableArray will not be copied when you amend it, a Swift Array will be.
Which one is best really depends on your application.
I find (when working with UIKit and Cocoa touch) that NSMutableArray is great when I need a persistent model, whereas Array is great for performance and throw away arrays.
These are just my initial thoughts, I'm sure someone from the community can offer much deeper insight.
Reference Type When:(NSMutableArray)
Subclasses of NSObject must be class types
Comparing instance identity with === makes sense
You want to create shared, mutable state
Value Type When: (Swift array)
Comparing instance data with == makes sense (Equatable protocol)
You want copies to have independent state
The data will be used in code across multiple threads (avoid explicit synchronization)
Interestingly enough, the Swift standard library heavily favors value types:Primitive types (Int, Double, String, …) are value types
Standard collections (Array, Dictionary, Set, …) are value types
Aside from what is illustrated above, the choice really depends on what you are trying to implement. As a rule of thumb, if there is no specific constraint that forces you to opt for a reference type, or you are not sure which option is best for your specific use case, you could start by implementing your data structure using a value type. If needed, you should be able to convert it to a reference type later with relatively little effort.
Conclusion:
Reference types incur more memory overhead, from reference counting and also for storing its data on the heap.
It's worth knowing that copying value types is relatively cheap in Swift,
But it’s important to keep in mind that if your value types become too large, the performance cost of copying can become greater than the cost of using reference types.
I'm reading the documentation and I am constantly shaking my head at some of the design decisions of the language. But the thing that really got me puzzled is how arrays are handled.
I rushed to the playground and tried these out. You can try them too. So the first example:
var a = [1, 2, 3]
var b = a
a[1] = 42
a
b
Here a and b are both [1, 42, 3], which I can accept. Arrays are referenced - OK!
Now see this example:
var c = [1, 2, 3]
var d = c
c.append(42)
c
d
c is [1, 2, 3, 42] BUT d is [1, 2, 3]. That is, d saw the change in the last example but doesn't see it in this one. The documentation says that's because the length changed.
Now, how about this one:
var e = [1, 2, 3]
var f = e
e[0..2] = [4, 5]
e
f
e is [4, 5, 3], which is cool. It's nice to have a multi-index replacement, but f STILL doesn't see the change even though the length has not changed.
So to sum it up, common references to an array see changes if you change 1 element, but if you change multiple elements or append items, a copy is made.
This seems like a very poor design to me. Am I right in thinking this? Is there a reason I don't see why arrays should act like this?
EDIT: Arrays have changed and now have value semantics. Much more sane!
Note that array semantics and syntax was changed in Xcode beta 3 version (blog post), so the question no longer applies. The following answer applied to beta 2:
It's for performance reasons. Basically, they try to avoid copying arrays as long as they can (and claim "C-like performance"). To quote the language book:
For arrays, copying only takes place when you perform an action that has the potential to modify the length of the array. This includes appending, inserting, or removing items, or using a ranged subscript to replace a range of items in the array.
I agree that this is a bit confusing, but at least there is a clear and simple description of how it works.
That section also includes information on how to make sure an array is uniquely referenced, how to force-copy arrays, and how to check whether two arrays share storage.
From the official documentation of the Swift language:
Note that the array is not copied when you set a new value with subscript syntax, because setting a single value with subscript syntax does not have the potential to change the array’s length. However, if you append a new item to array, you do modify the array’s length. This prompts Swift to create a new copy of the array at the point that you append the new value. Henceforth, a is a separate, independent copy of the array.....
Read the whole section Assignment and Copy Behavior for Arrays in this documentation. You will find that when you do replace a range of items in the array then the array takes a copy of itself for all items.
The behavior has changed with Xcode 6 beta 3. Arrays are no longer reference types and have a copy-on-write mechanism, meaning as soon as you change an array's content from one or the other variable, the array will be copied and only the one copy will be changed.
Old answer:
As others have pointed out, Swift tries to avoid copying arrays if possible, including when changing values for single indexes at a time.
If you want to be sure that an array variable (!) is unique, i.e. not shared with another variable, you can call the unshare method. This copies the array unless it already only has one reference. Of course you can also call the copy method, which will always make a copy, but unshare is preferred to make sure no other variable holds on to the same array.
var a = [1, 2, 3]
var b = a
b.unshare()
a[1] = 42
a // [1, 42, 3]
b // [1, 2, 3]
The behavior is extremely similar to the Array.Resize method in .NET. To understand what's going on, it may be helpful to look at the history of the . token in C, C++, Java, C#, and Swift.
In C, a structure is nothing more than an aggregation of variables. Applying the . to a variable of structure type will access a variable stored within the structure. Pointers to objects do not hold aggregations of variables, but identify them. If one has a pointer which identifies a structure, the -> operator may be used to access a variable stored within the structure identified by the pointer.
In C++, structures and classes not only aggregate variables, but can also attach code to them. Using . to invoke a method will on a variable ask that method to act upon the contents of the variable itself; using -> on a variable which identifies an object will ask that method to act upon the object identified by the variable.
In Java, all custom variable types simply identify objects, and invoking a method upon a variable will tell the method what object is identified by the variable. Variables cannot hold any kind of composite data type directly, nor is there any means by which a method can access a variable upon which it is invoked. These restrictions, although semantically limiting, greatly simplify the runtime, and facilitate bytecode validation; such simplifications reduced the resource overhead of Java at a time when the market was sensitive to such issues, and thus helped it gain traction in the marketplace. They also meant that there was no need for a token equivalent to the . used in C or C++. Although Java could have used -> in the same way as C and C++, the creators opted to use single-character . since it was not needed for any other purpose.
In C# and other .NET languages, variables can either identify objects or hold composite data types directly. When used on a variable of a composite data type, . acts upon the contents of the variable; when used on a variable of reference type, . acts upon the object identified by it. For some kinds of operations, the semantic distinction isn't particularly important, but for others it is. The most problematical situations are those in which a composite data type's method which would modify the variable upon which it is invoked, is invoked on a read-only variable. If an attempt is made to invoke a method on a read-only value or variable, compilers will generally copy the variable, let the method act upon that, and discard the variable. This is generally safe with methods that only read the variable, but not safe with methods that write to it. Unfortunately, .does has not as yet have any means of indicating which methods can safely be used with such substitution and which can't.
In Swift, methods on aggregates can expressly indicate whether they will modify the variable upon which they are invoked, and the compiler will forbid the use of mutating methods upon read-only variables (rather than having them mutate temporary copies of the variable which will then get discarded). Because of this distinction, using the . token to call methods that modify the variables upon which they are invoked is much safer in Swift than in .NET. Unfortunately, the fact that the same . token is used for that purpose as to act upon an external object identified by a variable means the possibility for confusion remains.
If had a time machine and went back to the creation of C# and/or Swift, one could retroactively avoid much of the confusion surrounding such issues by having languages use the . and -> tokens in a fashion much closer to the C++ usage. Methods of both aggregates and reference types could use . to act upon the variable upon which they were invoked, and -> to act upon a value (for composites) or the thing identified thereby (for reference types). Neither language is designed that way, however.
In C#, the normal practice for a method to modify a variable upon which it is invoked is to pass the variable as a ref parameter to a method. Thus calling Array.Resize(ref someArray, 23); when someArray identifies an array of 20 elements will cause someArray to identify a new array of 23 elements, without affecting the original array. The use of ref makes clear that the method should be expected to modify the variable upon which it is invoked. In many cases, it's advantageous to be able to modify variables without having to use static methods; Swift addresses that means by using . syntax. The disadvantage is that it loses clarify as to what methods act upon variables and what methods act upon values.
To me this makes more sense if you first replace your constants with variables:
a[i] = 42 // (1)
e[i..j] = [4, 5] // (2)
The first line never needs to change the size of a. In particular, it never needs to do any memory allocation. Regardless of the value of i, this is a lightweight operation. If you imagine that under the hood a is a pointer, it can be a constant pointer.
The second line may be much more complicated. Depending on the values of i and j, you may need to do memory management. If you imagine that e is a pointer that points to the contents of the array, you can no longer assume that it is a constant pointer; you may need to allocate a new block of memory, copy data from the old memory block to the new memory block, and change the pointer.
It seems that the language designers have tried to keep (1) as lightweight as possible. As (2) may involve copying anyway, they have resorted to the solution that it always acts as if you did a copy.
This is complicated, but I am happy that they did not make it even more complicated with e.g. special cases such as "if in (2) i and j are compile-time constants and the compiler can infer that the size of e is not going to change, then we do not copy".
Finally, based on my understanding of the design principles of the Swift language, I think the general rules are these:
Use constants (let) always everywhere by default, and there won't be any major surprises.
Use variables (var) only if it is absolutely necessary, and be vary careful in those cases, as there will be surprises [here: strange implicit copies of arrays in some but not all situations].
What I've found is: The array will be a mutable copy of the referenced one if and only if the operation has the potential to change the array's length. In your last example, f[0..2] indexing with many, the operation has the potential to change its length (it might be that duplicates are not allowed), so it's getting copied.
var e = [1, 2, 3]
var f = e
e[0..2] = [4, 5]
e // 4,5,3
f // 1,2,3
var e1 = [1, 2, 3]
var f1 = e1
e1[0] = 4
e1[1] = 5
e1 // - 4,5,3
f1 // - 4,5,3
Delphi's strings and arrays had the exact same "feature". When you looked at the implementation, it made sense.
Each variable is a pointer to dynamic memory. That memory contains a reference count followed by the data in the array. So you can easily change a value in the array without copying the whole array or changing any pointers. If you want to resize the array, you have to allocate more memory. In that case the current variable will point to the newly allocated memory. But you can't easily track down all of the other variables that pointed to the original array, so you leave them alone.
Of course, it wouldn't be hard to make a more consistent implementation. If you wanted all variables to see a resize, do this:
Each variable is a pointer to a container stored in dynamic memory. The container holds exactly two things, a reference count and pointer to the actual array data. The array data is stored in a separate block of dynamic memory. Now there is only one pointer to the array data, so you can easily resize that, and all variables will see the change.
A lot of Swift early adopters have complained about this error-prone array semantics and Chris Lattner has written that the array semantics had been revised to provide full value semantics ( Apple Developer link for those who have an account). We will have to wait at least for the next beta to see what this exactly means.
I use .copy() for this.
var a = [1, 2, 3]
var b = a.copy()
a[1] = 42
Did anything change in arrays behavior in later Swift versions ? I just run your example:
var a = [1, 2, 3]
var b = a
a[1] = 42
a
b
And my results are [1, 42, 3] and [1, 2, 3]