Optional array vs. empty array in Swift - arrays

I have a simple Person class in Swift that looks about like this:
class Person {
var name = "John Doe"
var age = 18
var children = [Person]?
\\ init function goes here, but does not initialize children array
}
Instead of declaring children to be an optional array, I could simply declare it and initialize it as an empty array like this:
var children = [Person]()
I am trying to decide which approach is better. Declaring the array as an optional array means that it will not take up any memory at all, whereas an empty array has at least some memory allocated for it, correct? So using the optional array means that there will be at least some memory saving. I guess my first question is: Is there really any actual memory saving involved here, or are my assumptions about this incorrect?
On the other hand, if it is optional then each time I try to use it I will have to check to see if it is nil or not before adding or removing objects from it. So there will be be some loss of efficiency there (but not much, I imagine).
I kind of like the optional approach. Not every Person will have children, so why not let children be nil until the Person decides to settle down and raise a family?
At any rate, I would like to know if there are any other specific advantages or disadvantages to one approach or the other. It is a design question that will come up over and over again.

I'm going to make the opposite case from Yordi - an empty array just as clearly says "this Person has no children", and will save you a ton of hassle. children.isEmpty is an easy check for the existence of kids, and you won't ever have to unwrap or worry about an unexpected nil.
Also, as a note, declaring something as optional doesn't mean it takes zero space - it's the .None case of an Optional<Array<Person>>.

The ability to choose between an empty array or an optional gives us the ability to apply the one that better describe the data from a semantic point of view.
I would choose:
An empty array if the list can be empty, but it's a transient status and in the end it should have at least one element. Being non optional makes clear that the array should not be empty
An optional if it's possible for the list to be empty for the entire life cycle of the container entity. Being an optional makes clear that the array can be empty
Let me make some examples:
Purchase order with master and details (one detail per product): a purchase order can have 0 details, but that's a transient status, because it wouldn't make sense having a purchase order with 0 products
Person with children: a person can have no children for his entire life. It is not a transient status (although not permanent as well), but using an optional it's clear that it's legit for a person to have no children.
Note that my opinion is only about making the code more clear and self-explainatory - I don't think there is any significant difference in terms of performance, memory usage, etc. for choosing one option or the other.

Interestingly enough, we have recently had few discussions regarding this very same question at work.
Some suggest that there are subtle semantic differences. E.g. nil means a person has no children whatsoever, but then what does 0 mean? Does it mean "has children, the whole 0 of them"? Like I said, pure semantics "has 0 children" and "has no children" makes no difference when working with this model in code. In that case why not choosing more straightforwards and less guard-let-?-y approach?
Some suggest that keeping a nil there may be an indication that, for example, when fetching model from backend something went wrong and we got error instead of children. But I think model should not try to have this type of semantics and nil should not be used as indication of some error in the past.
I personally think that the model should be as dumb as possible and the dumbest option in this case is empty array.
Having an optional will make you drag that ? until the end of days and use guard let, if let or ?? over and over again.
You will have to have extra unwrapping logic for NSCoding implementation, you will have to do person.children?.count ?? 0 instead of straightforward person.children.count when you display that model in any view controller.
The final goal of all that manipulation is to display something on UI.
Would you really say
"This person has no children" and "This person has 0 children" for nil and empty array correspondingly? I hope you would not :)
Last Straw
Finally, and this is really the strongest argument I have
What is the type of subviews property of UIView: it's var subviews: [UIView] { get }
What is the type of children property of SKNode: it's var children: [SKNode] { get }
There's tons of examples like this in Cocoa framework: UIViewController::childViewControllers and more.
Even from pure Swift world: Dictionary::keys though this may be a bit far fetched.
Why is it OK for person to have nil children, but not for SKNode? For me the analogy is perfect. Hey, even the SKNode's method name is children :)
My view: there must be an obvious reason for keeping those arrays as optionals, like a really good one, otherwise empty array offers same semantics with less unwrapping.
The Last Last Straw
Finally, some references to very good articles, each of those
http://www.theswiftlearner.com/2015/05/08/empty-or-optional-arrays/
https://www.natashatherobot.com/ios-optional-vs-empty-data-source-swift/
In Natasha's post, you will find a link to NSHipster's blog post and in Swiftification paragraph you can read this:
For example, instead of marking NSArray return values as nullable, many APIs have been modified to return an empty array—semantically these have the same value (i.e., nothing), but a non-optional array is far simpler to work with

Sometimes there's a difference between something not existing and being empty.
Let's say we have an app where a user can modify a list of phone numbers and we save said modifications as modifiedPhoneNumberList. If no modification has ever occurred the array should be nil. If the user has modified the parsed numbers by deleting them all the array should be empty.
Empty means we're going to delete all the existing phone numbers, nil means we keep all the existing phone numbers. The difference matters here.
When we can't differentiate between a property being empty or not existing or it doesn't matter empty is the way to go. If a Person were to lose their only child we should simply have to remove that child and have an empty array rather than have to check if the count is 1 then set the entire array to nil.

I always use empty arrays.
In my humble opinion, the most important purpose of optionals in Swift is to safely wrap some value that may be nil. An array already act as this type of wrapper - you can ask the array if it has anything inside & access its value(s) safely with for loops, mapping, etc. Do we need to put a wrapper within a wrapper? I don't think so.

Swift is designed to take advantage of optional value's and optional unwrapping.
You could also declare the array as nil, as it will save you a very small (almost not noticable) amount of memory.
I would go with an optional array instead of an array that represents a nil value to keep Swift's Design Patterns happy :)
I also think
if let children = children {
}
looks nicer than :
if(children != nil){
}

Related

Shared pointer in rust arrays

I have two arrays:
struct Data {
all_objects: Vec<Rc<dyn Drawable>>;
selected_objects: Vec<Rc<dyn Drawable>>;
}
selected_objects is guarenteed to be a subset of all_objects. I want to be able to somehow be able to add or remove mutable references to selected objects.
I can add the objects easily enough to selected_objects:
Rc::get_mut(selected_object).unwrap().select(true);
self.selected_objects.push(selected_object.clone());
However, if I later try:
for obj in self.selected_objects.iter_mut() {
Rc::get_mut(obj).unwrap().select(false);
}
This gives a runtime error, which matches the documentation for get_mut: "Returns None otherwise, because it is not safe to mutate a shared value."
However, I really want to be able to access and call arbitrary methods on both arrays, so I can efficiently perform operations on the selection, while also being able to still perform operations for all objects.
It seems Rc does not support this, it seems RefMut is missing a Clone() that alows me to put it into multiple arrays, plus not actually supporting dyn types. Box is also missing a Clone(). So my question is, how do you store writable pointers in multiple arrays? Is there another type of smart pointer for this purpose? Do I need to nest them? Is there some other data structure more suitable? Is there a way to give up the writable reference?
Ok, it took me a bit of trial and error, but I have a ugly solution:
struct Data {
all_objects: Vec<Rc<RefCell<dyn Drawable>>>;
selected_objects: Vec<Rc<RefCell<dyn Drawable>>>;
}
The Rc allows you to store multiple references to an object. RefCell makes these references mutable. Now the only thing I have to do is call .borrow() every time I use a object.
While this seems to work and be reasonably versitle, I'm still open for cleaner solutions.

What is the difference between filter(_:).first and first(where:)?

Please consider the following strings array:
let strings = ["str1", "str2", "str10", "str20"]
Let's assume that what required is to get the first element (String) which contains 5 characters, I could get it by using filter(_:) as follows:
let filterString = strings.filter { $0.count == 5 }.first
print(filterString!) // str10
but after reviewing the first(where:) method, I recognized that I will be able to get the same output:
let firstWhereString = strings.first(where: { $0.count == 5 })
print(firstWhereString!) // str10
So what is the benefit of using one instead of the other? is it only about that the filter(_:) returns a sequence and the first(where:) returns a single element?
Update:
I noticed that the filter(_:) took 5 times to do such a process, while first(where:) took 4 times:
You are correct in observing that filter(_:) returns all elements that satisfy a predicate and that first(where:) returns the first element that satisfy a predicate.
So, that leaves us with the more interesting question of what the difference is between elements.filter(predicate).first and
elements.first(where: predicate).
As you've already noticed they both end up with the same result. The difference is in their "evaluation strategy". Calling:
elements.filter(predicate).first
will "eagerly" check the predicate against all elements to filter the full list of elements, and then pick the first element from the filterer list. By comparison, calling:
elements.first(where: predicate)
will "lazily" check the predicate against the elements until it finds one that satisfies the predicate, and then return that element.
As a third alternative, you can explicitly use "a view onto [the list] that provides lazy implementations of normally eager operations, such as map and filter":
elements.lazy.filter(predicate).first
This changes the evaluation strategy to be "lazy". In fact, it's so lazy that just calling elements.lazy.filter(predicate) won't check the predicate against any elements. Only when the first element is "eagerly" evaluated on this lazy view will it evaluate enough elements to return one result.
Separately from any technical differences between these alternatives, I'd say that you should use the one that most clearly describes your intentions. If you're looking for the first element that matches a criteria/predicate then first(where:) communicates that intent best.
I believe we should start from considering each method separately and their purpose.
filter(_:) is not purposefully designed to prepare us to obtain first element. It is about filtering and that's it. It merely returns us a sequence. first can be used after filter, but that's just an instance of usage, a possible case. first is called for a collection and if we want, of course we are free to use it directly after filter.
But first(where:) was designed to filter and return a single element and is evidently the go-to approach to accomplish that kind of a task. That said, it is easy to presume that it's also preferred from performance perspective. As mentioned, we filter entire sequence, but with first(where:) we only process the portion of a sequence until condition is met.

What is the technical term for a value which can be either a scalar or an array?

I'm working with the JSON API format, which has the notion of a data property which can hold either a scalar (single) or array (multiple) value. I'm writing code for encoding and decoding into the format, and when naming my types, was trying to come up with a good name for such types of values. In TypeScript, it would be
type Poly<T> = T | T[];
For your information, here is the relevant part of the JSON API doc (my emphasis):
Primary data MUST be either:
a single resource object, a single resource identifier object, or null, for requests that target single resources
an array of resource objects, an array of resource identifier objects, or an empty array ([]), for requests that target resource collections
As an example, here is a mapping function for such mutant values:
function polymap<T, U>(data: Poly<T>, fn: (input: T) => U, thisArg?: any): Poly<U> {
if (data instanceof Array) return (data as T[]).map(fn, thisArg);
return fn.call(thisArg, data as T);
}
Anyway, as you can see, I'm going with "poly", but is there any established terminology for this, or other good suggestions?
First, the difference between scalars and arrays isn't the number of elements, it's the dimensionality.
Scalars are arrays. Specifically, they're 0-dimensional arrays. So you'd just call all of them arrays.
But note that usually the focus isn't on what values the variable can hold, but what operations are allowed on the variable's potential values.
Some operations can generalize from 1 element to N elements, which seems to be what you want.
The CS-y term for this kind of operation is a "vectorizing operation".
The math term for this kind of operation is a "lifting operation".
I've never heard of anything like this and judging from the fact that the available JSON data types shown here
http://www.tutorialspoint.com/json/json_data_types.htm do not mention anything related to such a data type, I'm betting that this is just smoke and mirrors implemented behind the scenes.
Scalar values and arrays are structured very differently. Combining the two into a true data type would be self contradicting. I'm betting that when the variable is instantiated a method is called somewhere to check the value for an array or scalar value, at which point one of two things takes place. 1- The data type is automatically set as an array and if a scalar is given, it is converted to an array of length 1 and whenever it is called behind the scenes an array is being accessed at index 0. Or 2- The behind the scenes method checks what the data type that was passed in is and sets the same data type for the variable given before instantiation actually takes place.
If a MonoPoly Mohn-O-Pohl-E (if you will because I can't find anything else to call it and it looks like Monopoly) was a scalar and is set to an array data type at some point, the old value can be destroyed and a new one assigned with the same name but as an array. This can happen vise versa and all this can be done behind the scenes as well, making this data type appear to house the description of a scalar along with an array.
is there any established terminology for this, or other good
suggestions?
I'm going to ignore "established terminology", and likely "good" as well, and make the following suggestions on a linguistic basis. My personal choice, due to its sense of one or more rungs and its being the latin origin of scalar, would be:
ladder
Another possible noun:
assemblage
Like constant, I think the following adjectives are ripe to become nouns:
inconstant (similar to incontinent, "insufficient voluntary control" which seem to be what you described you have over this API issue.)
transferable
indiscrete (nicely homophonic to indiscreet)
dual-purpose
multipurpose (the adjective our schools love)
This might even be an oportunity to coin a new word:
polyunary
versutility (my second favorite)
Such functionality is likely replacing a scalar with an array when another element is "added" to a slot. AFAIK there's no term for this, so the question seems more like an English language question.
Try these:
elastic scalar
expandable scalar
scarray (scalar/array)
scalarray (another portmanteau)
arrayable
tardis (holds more than it appears to)

Using Array[Boolean] in Scala to find out progress of foreach

I have a class in Scala that has a method to perform a bunch of calculations sequentially using foreach on a list which is provided in the constructor. The class has a field val progress: Array[Boolean] = list.map(_ => false).toArray. Some of these calculations can take a long time so at the end of each one I set the appropriate index in progress to true. Then I can get progress to determine where I am in the calculations from outside the class.
This does not seem like the best approach in Scala (because I'm using a mutable data structure) so any advice to improve it would be much appreciated.
I don't think your approach is bad. The alternative is to use a var progress: List[Boolean] as an immutable data structure and have a long list of immutable lists pointed at by that variable. You don't really gain anything, you lose the ability to reserve the exact memory you will need in a single step and memory allocation is going to make this slower.
There is a reason why mutable data structures exist and that is because they are incredibly useful and very needed, same as why you can still define var instead of val, the important piece is not that one is "bad" and the other "good", it is a matter of knowing when you can use val and sacrifice mutability in exchange for security. In your example you just can't.
Side note: Instead of using
val progress: Array[Boolean] = list.map(_ => false).toArray
This is much clearer and faster IMHO:
val progress = Array.fill(list.size)(false)
Well, it depends on what you want to do with that information. If you are interested in specific events (e.g., 50% done or something like that), you could pass a listener into your foreach method and ask to be notified. But if you really need to inquire about the current state at any time, then ... well, if you need to know the state, then you have to keep the state, there is no way around that :)
Array of booleans seems to be an overkill (you could just keep the current index instead), but you mentioned that you were planning to keep se additional info around as well, so, it looks reasonable.

Sorting and managing numerous variables

My project has classes which, unavoidably, contain hundreds upon hundreds of variables that I'm always having to keep straight. For example, I'm always having to keep track of specific kinds of variables for a recurring set of "items" that occur inside of a class, where placing those variables between multiple classes would cause a lot of confusion.
How do I better sort my variables to keep from going crazy, especially when it comes time to save my data?
Am I missing something? Actionscript is an Object Oriented language, so you might have hundreds of variables, but unless you've somehow treated it like a grab bag and dumped it all in one place, everything should be to hand. Without knowing what all you're keeping track of, it's hard to give concrete advice, but here's an example from a current project I'm working on, which is a platform for building pre-employment assessments.
The basic unit is a Question. A Question has a stem, text that can go in the status bar, a collection of answers, and a collection of measures of things we're tracking about what the user does in that particular type of questions.
The measures are, again, their own type of object, and come in two "flavors": one that is used to track a time limit and one that isn't. The measure has a name (so we know where to write back to the database) and a value (which tells us what). Timed ones also have a property for the time limit.
When we need to time the question, we hand that measure to yet another object that counts the time down and a separate object that displays the time (if appropriate for the situation). The answers, known as distractors, have a label and a value that they can impart to the appropriate measure based on the user selection. For example, if a user selects "d", its value, "4" is transferred to the measure that stores the user's selection.
Once the user submits his answer, we loop through all the measures for the question and send those to the database. If those were not treated as a collection (in this case, a Vector), we'd have to know exactly what specific measures are being stored for each question and each question would have a very different structure that we'd have to dig through. So if looping through collections is your issue, I think you should revisit that idea. It saves a lot of code and is FAR more efficient than "var1", "var2", "var3."
If the part you think is unweildy is the type checking you have to do because literally anything could be in there, then Vector could be a good solution for you as long as you're using at least Flash Player 10.
So, in summary:
When you have a lot of related properties, write a Class that keeps all of those related bits and pieces together (like my Question).
When objects have 0-n "things" that are all of the same or very similar, use a collection of some sort, such as an Array or Vector, to allow you to iterate through them as a group and perform the same operation on each (for example, each Question is part of a larger grouping that allows each question to be presented in turn, and each question has a collection of distractors and another of measures.
These two concepts, used together, should help keep your information tidy and organized.
While I'm certain there are numerous ways of keeping arrays straight, I have found a method that works well for me. Best of all, it collapses large amounts of information into a handful of arrays that I can parse to an XML file or other storage method. I call this method my "indexed array system".
There are actually multiple ways to do this: creating a handful of 1-dimensional arrays, or creating 2-dimensional (or higher) array(s). Both work equally well, so choose the one that works best for your code. I'm only going to show the 1-dimensional method here. Those of you who are familiar with arrays can probably figure out how to rewrite this to use higher dimensional arrays.
I use Actionscript 3, but this approach should work with almost any programming or scripting language.
In this example, I'm trying to keep various "properties" of different "activities" straight. In this case, we'll say these properties are Level, High Score, and Play Count. We'll call the activities Pinball, Word Search, Maze, and Memory.
This method involves creating multiple arrays, one for each property, and creating constants that hold the integer "key" used for each activity.
We'll start by creating the constants, as integers. Constants work for this, because we never change them after compile. The value we put into each constant is the index the corresponding data will always be stored at in the arrays.
const pinball:int = 0;
const wordsearch:int = 1;
const maze:int = 2;
const memory:int = 3;
Now, we create the arrays. Remember, arrays start counting from zero. Since we want to be able to modify the values, this should be a regular variable.
Note, I am constructing the array to be the specific length we need, with the default value for the desired data type in each slot. I've used all integers here, but you can use just about any data type you need.
var highscore:Array = [0, 0, 0, 0];
var level:Array = [0, 0, 0, 0];
var playcount:Array = [0, 0, 0, 0];
So, we have a consistent "address" for each property, and we only had to create four constants, and three arrays, instead of 12 variables.
Now we need to create the functions to read and write to the arrays using this system. This is where the real beauty of the system comes in. Be sure this function is written in public scope if you want to read/write the arrays from outside this class.
To create the function that gets data from the arrays, we need two arguments: the name of the activity and the name of the property. We also want to set up this function to return a value of any type.
GOTCHA WARNING: In Actionscript 3, this won't work in static classes or functions, as it relies on the "this" keyword.
public function fetchData(act:String, prop:String):*
{
var r:*;
r = this[prop][this[act]];
return r;
}
That queer bit of code, r = this[prop][this[act]], simply uses the provided strings "act" and "prop" as the names of the constant and array, and sets the resulting value to r. Thus, if you feed the function the parameters ("maze", "highscore"), that code will essentially act like r = highscore[2] (remember, this[act] returns the integer value assigned to it.)
The writing method works essentially the same way, except we need one additional argument, the data to be written. This argument needs to be able to accept any
GOTCHA WARNING: One significant drawback to this system with strict typing languages is that you must remember the data type for the array you're writing to. The compiler cannot catch these type errors, so your program will simply throw a fatal error if it tries to write the wrong value type.
One clever way around this is to create different functions for different data types, so passing the wrong data type in an argument will trigger a compile-time error.
public function writeData(act:String, prop:String, val:*):void
{
this[prop][this[act]] = val;
}
Now, we just have one additional problem. What happens if we pass an activity or property name that doesn't exist? To protect against this, we just need one more function.
This function will validate a provided constant or variable key by attempting to access it, and catching the resulting fatal error, returning false instead. If the key is valid, it will return true.
function validateName(ID:String):Boolean
{
var checkthis:*
var r:Boolean = true;
try
{
checkthis = this[ID];
}
catch (error:ReferenceError)
{
r = false;
}
return r;
}
Now, we just need to adjust our other two functions to take advantage of this. We'll wrap the function's code inside an if statement.
If one of the keys is invalid, the function will do nothing - it will fail silently. To get around this, just put a trace (a.k.a. print) statement or a non-fatal error in the else construct.
public function fetchData(act:String, prop:String):*
{
var r:*;
if(validateName(act) && validateName(prop))
{
r = this[prop][this[act]];
return r;
}
}
public function writeData(act:String, prop:String, val:*):void
{
if(validateName(act) && validateName(prop))
{
this[prop][this[act]] = val;
}
}
Now, to use these functions, you simply need to use one line of code each. For the example, we'll say we have a text object in the GUI that shows the high score, called txtHighScore. I've omitted the necessary typecasting for the sake of the example.
//Get the high score.
txtHighScore.text = fetchData("maze", "highscore");
//Write the new high score.
writeData("maze", "highscore", txtHighScore.text);
I hope ya'll will find this tutorial useful in sorting and managing your variables.
(Afternote: You can probably do something similar with dictionaries or databases, but I prefer the flexibility with this method.)

Resources