Why and when to use lazy with Array in Swift?

[1, 2, 3, -1, -2].filter({ $0 > 0 }).count // => 3
[1, 2, 3, -1, -2].lazy.filter({ $0 > 0 }).count // => 3
What is the advantage of adding lazy in the second statement? As I understand it, a lazy variable is one whose storage is initialized only when it is first used. How does that idea apply in this context?
I'm trying to understand the use of LazySequence in a little more detail. I have used map, reduce and filter on sequences, but never on a lazy sequence. Why would I use one?

lazy changes the way the array is processed. When lazy is not used, filter processes the entire array and stores the results in a new array. When lazy is used, the values in the sequence or collection are produced on demand, as the downstream functions ask for them. The values are not stored in an array; they are produced only when needed.
Consider this modified example in which I've used reduce instead of count so that we can print out what is happening:
Not using lazy:
In this case, all items will be filtered first before anything is counted.
[1, 2, 3, -1, -2].filter({ print("filtered one"); return $0 > 0 })
.reduce(0) { (total, elem) -> Int in print("counted one"); return total + 1 }
filtered one
filtered one
filtered one
filtered one
filtered one
counted one
counted one
counted one
Using lazy:
In this case, reduce asks for an item to count, and filter works until it finds one; then reduce asks for another and filter works until it finds another.
[1, 2, 3, -1, -2].lazy.filter({ print("filtered one"); return $0 > 0 })
.reduce(0) { (total, elem) -> Int in print("counted one"); return total + 1 }
filtered one
counted one
filtered one
counted one
filtered one
counted one
filtered one
filtered one
When to use lazy:
Option-clicking on lazy gives this explanation:
From the Discussion for lazy:
Use the lazy property when chaining operations:
to prevent intermediate operations from allocating storage
or
when you only need a part of the final collection to avoid unnecessary computation
I would add a third:
when you want the downstream processes to get started sooner and not have to wait for the upstream processes to do all of their work first
So, for example, you'd want to use lazy before filter if you were searching for the first positive Int: the search stops as soon as one is found, which saves filter from having to process the whole array and saves allocating storage for the filtered result.
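A small sketch of that first-positive-Int case (the array is just an example of mine):
let numbers = [-3, -1, 2, 5, -2]
let firstPositive = numbers.lazy.filter { $0 > 0 }.first
print(firstPositive as Any) // Optional(2) — filtering stops at the first match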
For the third point, imagine a program that displays the prime numbers in the range 1...10_000_000 by using filter on that range. You would rather show the primes as you find them than wait until they have all been computed before showing anything.
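A minimal sketch of that idea (isPrime and the much smaller range are my own choices, purely for illustration):
func isPrime(_ n: Int) -> Bool {
    guard n >= 2 else { return false }
    guard n >= 4 else { return true }          // 2 and 3 are prime
    var d = 2
    while d * d <= n {
        if n % d == 0 { return false }
        d += 1
    }
    return true
}

// Each prime can be shown (here, printed) as soon as it is produced,
// instead of waiting for the whole range to be filtered first.
for prime in (1...100).lazy.filter(isPrime).prefix(10) {
    print(prime)
}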

I hadn't seen this before so I did some searching and found it.
The syntax you post creates a lazy collection. A lazy collection avoids creating a whole series of intermediate arrays for each step of your code. It isn't that relevant when you only have a single filter statement; it has a much bigger effect if you chain something like filter.map.map.filter.map, since without the lazy collection a new array is created at each step.
See this article for more information:
https://medium.com/developermind/lightning-read-1-lazy-collections-in-swift-fa997564c1a3
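To see what "a new array at each step" means in practice, here is a small sketch of an eager chain versus its lazy counterpart (my own example, not from the article):
let values = Array(1...10)

// Eager: filter and each map allocate a full intermediate array.
let eager = values.filter { $0 % 2 == 0 }.map { $0 * $0 }.map { String($0) }

// Lazy: no intermediate arrays; the work happens when the result is realized.
let lazyResult = Array(values.lazy.filter { $0 % 2 == 0 }.map { $0 * $0 }.map { String($0) })

print(eager == lazyResult) // true — same result, different intermediate storage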
EDIT:
I did some benchmarking, and a series of higher-order functions like maps and filters is actually a little slower on a lazy collection than on a "regular" collection.
It looks like lazy collections give you a smaller memory footprint at the cost of slightly slower performance.
Edit #2:
import Foundation

// `formatter` was not shown in the original snippet; a plain NumberFormatter is assumed here.
let formatter = NumberFormatter()

@discardableResult func timeTest() -> Double {
    let start = Date()
    let array = 1...1000000
    let random = array
        .map { (value) -> UInt32 in
            let random = arc4random_uniform(100)
            //print("Mapping", value, "to random val \(random)")
            return random
        }
    let result = random.lazy //Remove the .lazy here to compare
        .filter {
            let result = $0 % 100 == 0
            //print("  Testing \($0) % 100 == 0:", result)
            return result
        }
        .map { (val: UInt32) -> NSNumber in
            //print("  Mapping", val, "to NSNumber")
            return NSNumber(value: val)
        }
        .compactMap { (number) -> String? in
            //print("  Mapping", number, "to String")
            return formatter.string(from: number)
        }
        .sorted { (lhv, rhv) -> Bool in
            //print("  Sorting strings")
            return (lhv.compare(rhv, options: .numeric) == .orderedAscending)
        }
    let elapsed = Date().timeIntervalSince(start)
    print("Completed in", String(format: "%0.3f", elapsed), "seconds. count = \(result.count)")
    return elapsed
}
In the code above, if you change the line
let result = random.lazy //Remove the .lazy here to compare
to
let result = random //Removes the .lazy here
Then it runs faster. In my benchmark, the version with .lazy takes about 1.5 times longer than the version using a plain array.
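If you want to average out noise across runs, a tiny driver like this works (my own addition, not part of the original answer):
let runs = 5
let total = (1...runs).reduce(0.0) { sum, _ in sum + timeTest() }
print("Average:", String(format: "%0.3f", total / Double(runs)), "seconds")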

Related

How to check if two identical array exist in a single two-dimensional array? [Swift]

if I have a two-dimensional array like this: [[1,2,3], [3,2,1], [4,9,3]], I want to be able to find out that there are two identical arrays inside this array, which are [1,2,3] and [3,2,1]. How can I achieve this?
Thank you for all your answers. I was focusing on the LeetCode threeSum problem, so I didn't leave any comments. But since I am a programming newbie, my answer exceeded the time limit, so what I actually wanted was to find the duplicated arrays and remove the duplicates, leaving only one unique array in the multi-dimensional array. I have added some extra code based on @Oleg's answer, and thought I would put my function here:
func removeDuplicates(_ nums: inout [[Int]]) -> [[Int]] {
    let sorted = nums.map { $0.sorted() }
    var indexs = [Int]()
    for (pos, item) in sorted.enumerated() {
        for i in pos+1..<sorted.count {
            if item == sorted[i] {
                if nums.indices.contains(i) {
                    indexs.append(i)
                }
            }
        }
    }
    indexs = Array(Set<Int>(indexs))
    indexs = indexs.sorted(by: { $0 > $1 })
    for index in indexs {
        nums.remove(at: index)
    }
    return nums
}
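For reference, calling the function above on the original input looks like this:
var input = [[1, 2, 3], [3, 2, 1], [4, 9, 3]]
let deduplicated = removeDuplicates(&input)
print(deduplicated) // [[1, 2, 3], [4, 9, 3]] — only the first of each duplicate pair is kept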
My solution is quite simple and easy to understand.
let input = [[1,2,3], [3,2,1], [4,9,3]]
First, let's sort all the elements of the nested arrays. (This gives us a bit more efficiency.)
let sorted = input.map{$0.sorted()}
Then we should compare the elements.
for (pos, item) in sorted.enumerated() {
    for i in pos+1..<sorted.count {
        if item == sorted[i] {
            print(input[pos])
            print(input[i])
        }
    }
}
Output:
[1, 2, 3]
[3, 2, 1]
One simple and easy brute-force approach that comes to mind:
Iterate over each row and sort its values, so 1,2,3 stays 1,2,3 and 3,2,1 also becomes 1,2,3.
Now store it in a key-value map. Your key will be "123" and it will map to the array 1,2,3 or 3,2,1.
Note: your key is all the sorted elements combined together as a string, without commas.
This way you will know how many arrays inside the 2-D array are identical (a sketch follows below).
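A small Swift sketch of that idea, grouping rows by a key built from the sorted elements (I join the elements with a separator rather than concatenating digits, to avoid key collisions such as [1, 23] vs. [12, 3]):
let input = [[1, 2, 3], [3, 2, 1], [4, 9, 3]]

var groups = [String: [[Int]]]()
for row in input {
    // Key: sorted elements joined with a separator, e.g. "1,2,3"
    let key = row.sorted().map(String.init).joined(separator: ",")
    groups[key, default: []].append(row)
}

let duplicates = groups.values.filter { $0.count > 1 }
print(duplicates) // [[[1, 2, 3], [3, 2, 1]]]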
There is a very efficient algorithm using permutation hashing method.
1) Preprocess the 2-dim array so that all elements are non-negative (by subtracting the smallest element from every element).
2) For each sub-array A:
compute hash[A] = sum(base^A[i]) over all indexes i of sub-array A. Choose base to be a very large prime (1e9 + 7, for example). You can simply ignore integer overflow while computing this, because only additions and multiplications are used.
3) Now you have an array "hash" of hash codes, one per sub-array. If the array has 2 identical sub-arrays, they must have the same hash codes. Find all pairs of sub-arrays with equal hash codes (using hashing again, or sorting, ... whatever).
4) For each such pair, check whether these sub-arrays actually match (sort and compare, ... whatever). Return true if you can find 2 sub-arrays that actually match, false otherwise.
In practice, this method runs extremely fast even though it is slow in theory, because the hashing step prunes most of the search space and this hash function is very strong. I am 99.99% sure that if such a pair exists, the sub-arrays with equal hash codes will actually match.
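A rough Swift sketch of the hashing step described above (the base value, the wrapping-arithmetic helpers and the names are my own choices, not from the answer):
func wrappingPow(_ base: UInt64, _ exponent: Int) -> UInt64 {
    // base^exponent with wrapping arithmetic; overflow is deliberately ignored
    var result: UInt64 = 1
    var b = base
    var e = exponent
    while e > 0 {
        if e & 1 == 1 { result = result &* b }
        b = b &* b
        e >>= 1
    }
    return result
}

func permutationHash(_ row: [Int], base: UInt64 = 1_000_000_007) -> UInt64 {
    // sum of base^element — order-independent, so permutations hash equally
    row.reduce(0 as UInt64) { $0 &+ wrappingPow(base, $1) }
}

let rows = [[1, 2, 3], [3, 2, 1], [4, 9, 3]]
let hashes = rows.map { permutationHash($0) }
print(hashes[0] == hashes[1]) // true  — candidates for a full comparison
print(hashes[0] == hashes[2]) // false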

Array Contains Too Slow Swift

I have been porting over an algorithm I've been using in Java (Android) to Swift (iOS), and have run into some issues with speed on the Swift version.
The basic idea is there are objects with depths (comment tree), and I can hide and show replies from the dataset by matching against a list of hidden objects. Below is a visualization
Top
- Reply 1
- - Reply 2
- - Reply 3
- Reply 4
and after hiding from the dataset
Top
- Reply 1
- Reply 4
The relevant methods I've converted from Java are as follows
//Gets the "real" position of the index provided in the "position" variable. The comments array contains all the used data, and the hidden array is an array of strings that represent items in the dataset that should be skipped over.
func getRealPosition(position: Int) -> Int {
    let hElements = getHiddenCountUpTo(location: position)
    var diff = 0
    var i = 0
    while i < hElements {
        diff += 1
        if comments.count > position + diff && hidden.contains(comments[position + diff].getId()) {
            i -= 1
        }
        i += 1
    }
    return position + diff
}

func getHiddenCountUpTo(location: Int) -> Int {
    var count = 0
    var i = 0
    repeat {
        if comments.count > i && hidden.contains(comments[i].getId()) {
            count += 1
        }
        i += 1
    } while i <= location && i < comments.count
    return count
}
This is used with a UITableViewController to display comments as a tree.
In Java, using array.contains was quick enough to not cause any lag, but the Swift version calls the getRealPosition function many times when calling heightForRowAt and when populating the cell, leading to increasing lag as more comment ids are added to the "hidden" array.
Is there any way I can improve on the speed of the array "contains" lookups (possibly with a different type of collection)? I did profiling on the application and "contains" was the method that took up the most time.
Thank you
Both Java and Swift have to go through all elements contained in the array. This gets slower and slower as the array gets larger.
There is no a priori reason for Java to fare better, as they both use the exact same algorithm. However, strings are implemented very differently in each language, so that could make string comparisons more expensive in Swift.
In any case, if string comparison slows you down, then you must avoid it.
Easy fix: use a Set
If you want a simple performance boost, you can replace an array of strings with a set of strings. A set in Swift is implemented with a hash table, meaning that you have expected constant time query. In practice, this means that for large sets, you will see better performance.
var hiddenSet = Set<String>()
for item in hidden {
    hiddenSet.insert(item)
}
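As an aside, a more concise way to build such a set, with a quick look at the lookups (the ids here are illustrative, not from the question):
let hiddenIds = ["id2", "id5"]        // example hidden ids
let hiddenSet = Set(hiddenIds)        // hash table: expected O(1) contains
print(hiddenSet.contains("id2"))      // true
print(hiddenSet.contains("id9"))      // false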
For best performance: use a BitSet
But you should be able to do a whole lot better than even a set can do. Let us look at your code
hidden.contains(comments[i].getId())
If you are always accessing hidden in this manner, then it means that what you have is a map from integers (i) to Boolean values (true or false).
Then you should do the following...
import Bitset

let hidden = Bitset()
// replace hidden.append(comments[i].getId()) with this:
hidden.add(i)
// replace hidden.contains(comments[i].getId()) with this:
hidden.contains(i)
Then your code will really fly!
To use a fast BitSet implementation in Swift, include the following in Package.swift (it is free software):
import PackageDescription

let package = Package(
    name: "fun",
    dependencies: [
        .Package(url: "https://github.com/lemire/SwiftBitset.git", majorVersion: 0)
    ]
)
I think you need the realPosition to link a tap on a row in the table view back to the source array?
1) Make a second array with data only for the table view data source.
Copy all visible elements to this new array. Create a special ViewModel as a class, or better a struct, which has only the data necessary to display in the table view. Also save the real position in the source data in this new ViewModel; now you have a back-link to the source array.
2) Then populate the table view only from this new data source.
3) Look more into functional programming in Swift; it lets you work over arrays more elegantly, for example:
var array1 = ["a", "b", "c", "d", "e"]
let array2 = ["a", "c", "d"]
array1 = array1.filter { !array2.contains($0) }
or in your case:
let newArray = comments.filter{ !hidden.contains($0.getId()) }
or use enumerated() to create the view model:
struct CommentViewModel {
    var id: Int
    var text: String
    var realPosition: Int
}

let visibleComments: [CommentViewModel] = comments
    .enumerated()
    .map { (index, element) in
        return CommentViewModel(id: element.getId(), text: element.getText(), realPosition: index)
    }
    .filter { !hidden.contains($0.id) }

Does joined() or flatMap(_:) perform better in Swift 3?

I'm curious about the performance characteristics of joined() and .flatMap(_:) in flattening a multidimensional array:
let array = [[1,2,3],[4,5,6],[7,8,9]]
let j = Array(array.joined())
let f = array.flatMap{$0}
They both flatten the nested array into [1, 2, 3, 4, 5, 6, 7, 8, 9]. Should I prefer one over the other for performance? Also, is there a more readable way to write the calls?
TL;DR
When it comes just to flattening 2D arrays (without any transformations or separators applied, see @dfri's answer for more info about that aspect), array.flatMap{$0} and Array(array.joined()) are both conceptually the same and have similar performance.
The main difference between flatMap(_:) and joined() (note that this isn't a new method, it has just been renamed from flatten()) is that joined() is always lazily applied (for arrays, it returns a special FlattenBidirectionalCollection<Base>).
Therefore in terms of performance, it makes sense to use joined() over flatMap(_:) in situations where you only want to iterate over part of a flattened sequence (without applying any transformations). For example:
let array2D = [[2, 3], [8, 10], [9, 5], [4, 8]]
if array2D.joined().contains(8) {
print("contains 8")
} else {
print("doesn't contain 8")
}
Because joined() is lazily applied & contains(_:) will stop iterating upon finding a match, only the first two inner arrays will have to be 'flattened' to find the element 8 from the 2D array. Although, as @dfri correctly notes below, you are also able to lazily apply flatMap(_:) through the use of a LazySequence/LazyCollection – which can be created through the lazy property. This would be ideal for lazily applying both a transformation & flattening a given 2D sequence.
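For instance, this small illustration of mine applies a transformation and flattens lazily; contains(_:) stops as soon as it finds a match, so only the first two inner arrays are ever touched:
let lazy2D = [[2, 3], [8, 10], [9, 5], [4, 8]]

// Transform (× 10) and flatten lazily; the search stops at the first match.
let found = lazy2D.lazy.flatMap { inner in inner.lazy.map { $0 * 10 } }.contains(80)
print(found) // true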
In cases where joined() is iterated fully through, it is conceptually no different from using flatMap{$0}. Therefore, these are all valid (and conceptually identical) ways of flattening a 2D array:
array2D.joined().map{$0}
Array(array2D.joined())
array2D.flatMap{$0}
In terms of performance, flatMap(_:) is documented as having a time-complexity of:
O(m + n), where m is the length of this sequence and n is the length of the result
This is because its implementation is simply:
public func flatMap<SegmentOfResult : Sequence>(
    _ transform: (${GElement}) throws -> SegmentOfResult
) rethrows -> [SegmentOfResult.${GElement}] {
    var result: [SegmentOfResult.${GElement}] = []
    for element in self {
        result.append(contentsOf: try transform(element))
    }
    return result
}
As append(contentsOf:) has a time-complexity of O(n), where n is the length of sequence to append, we get an overall time-complexity of O(m + n), where m will be total length of all sequences appended, and n is the length of the 2D sequence.
When it comes to joined(), there is no documented time-complexity, as it is lazily applied. However, the main bit of source code to consider is the implementation of FlattenIterator, which is used to iterate over the flattened contents of a 2D sequence (which will occur upon using map(_:) or the Array(_:) initialiser with joined()).
public mutating func next() -> Base.Element.Iterator.Element? {
    repeat {
        if _fastPath(_inner != nil) {
            let ret = _inner!.next()
            if _fastPath(ret != nil) {
                return ret
            }
        }
        let s = _base.next()
        if _slowPath(s == nil) {
            return nil
        }
        _inner = s!.makeIterator()
    } while true
}
Here _base is the base 2D sequence, _inner is the current iterator from one of the inner sequences, and _fastPath & _slowPath are hints to the compiler to aid with branch prediction.
Assuming I'm interpreting this code correctly & the full sequence is iterated through, this also has a time complexity of O(m + n), where m is the length of the sequence, and n is the length of the result. This is because it goes through each outer iterator and each inner iterator to get the flattened elements.
So, performance wise, Array(array.joined()) and array.flatMap{$0} both have the same time complexity.
If we run a quick benchmark in a debug build (Swift 3.1):
import QuartzCore
func benchmark(repeatCount: Int = 1, name: String? = nil, closure: () -> ()) {
    let d = CACurrentMediaTime()
    for _ in 0..<repeatCount {
        closure()
    }
    let d1 = CACurrentMediaTime() - d
    print("Benchmark of \(name ?? "closure") took \(d1) seconds")
}

let arr = [[Int]](repeating: [Int](repeating: 0, count: 1000), count: 1000)

benchmark {
    _ = arr.flatMap{$0} // 0.00744s
}
benchmark {
    _ = Array(arr.joined()) // 0.525s
}
benchmark {
    _ = arr.joined().map{$0} // 1.421s
}
flatMap(_:) appears to be the fastest. I suspect that joined() being slower could be due to the branching that occurs within the FlattenIterator (although the hints to the compiler minimise this cost). Just why map(_:) is so slow, I'm not too sure; I would certainly be interested to know if anyone else knows more about this.
However, in an optimised build, the compiler is able to optimise away this big performance difference, giving all three options comparable speed, although flatMap(_:) is still fastest by a fraction of a second:
let arr = [[Int]](repeating: [Int](repeating: 0, count: 10000), count: 1000)

benchmark {
    let result = arr.flatMap{$0} // 0.0910s
    print(result.count)
}
benchmark {
    let result = Array(arr.joined()) // 0.118s
    print(result.count)
}
benchmark {
    let result = arr.joined().map{$0} // 0.149s
    print(result.count)
}
(Note that the order in which the tests are performed can affect the results – both of above results are an average from performing the tests in the various different orders)
From the Swiftdoc.org documentation of Array (Swift 3.0/dev) we read [emphasis mine]:
func flatMap<SegmentOfResult : Sequence>(_: @noescape (Element) throws -> SegmentOfResult)
Returns an array containing the concatenated results of calling the
given transformation with each element of this sequence.
...
In fact, s.flatMap(transform) is equivalent to Array(s.map(transform).flatten()).
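A quick check of that equivalence (my own snippet; the flatten() of Swift 3-dev is spelled joined() in later Swift):
let s = [[1, 2], [3, 4]]
let transform = { (inner: [Int]) in inner.map { $0 * 2 } }

let viaFlatMap = s.flatMap(transform)
let viaJoined = Array(s.map(transform).joined())
print(viaFlatMap == viaJoined) // true — [2, 4, 6, 8]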
We may also take a look at the actual implementations of the two in the Swift source code (from which Swiftdoc is generated ...)
swift/stdlib/public/core/Join.swift
swift/stdlib/public/core/FlatMap.swift
Most notably the latter source file, where the flatMap implementations whose closure (transform) does not yield an optional value (as is the case here) are all described as
/// Returns the concatenated results of mapping `transform` over
/// `self`. Equivalent to
///
/// self.map(transform).joined()
From the above (assuming the compiler can be clever w.r.t. a simple { $0 } transform over self), it would seem that, performance-wise, the two alternatives should be equivalent, but joined does, imo, better show the intent of the operation.
In addition to the semantic intent, there is one apparent use case where joined is preferable to (and not entirely comparable with) flatMap: using joined with its init(separator:) initializer to join sequences with a separator:
let array = [[1,2,3],[4,5,6],[7,8,9]]
let j = Array(array.joined(separator: [42]))
print(j) // [1, 2, 3, 42, 4, 5, 6, 42, 7, 8, 9]
The corresponding result using flatMap is not really as neat, as we explicitly need to remove the final additional separator after the flatMap operation (two different use cases, with or without trailing separator)
let f = Array(array.flatMap{ $0 + [42] }.dropLast())
print(f) // [1, 2, 3, 42, 4, 5, 6, 42, 7, 8, 9]
See also a somewhat outdated post by Erica Sadun discussing flatMap vs. flatten() (note: joined() was named flatten() in Swift < 3).
Erica Sadun- Beta 6: flatten #swiftlang

how to count specific items in array in swift

Let's say I have the array of objects below. I'm looking for a way to count specific items in the array, like this:
var OSes = ["iOS", "Android", "Android","Android","Windows Phone", 25]
Is there a short way in Swift to do something like the following?
OSes.count["Android"] // 3
A fast, compact and elegant way to do it is by using the reduce method:
let count = OSes.reduce(0) { $1 == "Android" ? $0 + 1 : $0 }
It's more compact than a for loop, and faster than a filter, because it doesn't generate a new array.
The reduce method takes an initial value, 0 in our case, and a closure, applied to each element of the array.
The closure takes 2 parameters:
the value at the previous iteration (or the initial value, 0 in our case)
the array element for the current iteration
The value returned by the closure is used as the first parameter in the next iteration, or as the return value of the reduce method once the last element has been processed.
The closure simply checks whether the current element is "Android":
if not, it returns the aggregate value unchanged (the first parameter passed to the closure)
if so, it returns that value plus one
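To make the accumulation concrete, here is a small trace (my own snippet; strictly speaking the array in the question is heterogeneous, so it is [Any] and the comparison needs a cast):
let OSes: [Any] = ["iOS", "Android", "Android", "Android", "Windows Phone", 25]

let count = OSes.reduce(0) { total, element in
    (element as? String) == "Android" ? total + 1 : total
}
// Accumulator: 0 → 0 ("iOS") → 1 → 2 → 3 (three "Android") → 3 ("Windows Phone") → 3 (25)
print(count) // 3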
It's pretty simple with .filter:
OSes.filter({$0 == "Android"}).count // 3
With count(where:) (SE-0220; accepted for Swift 5, although it only shipped in the standard library later)
let countOfAndroid = OSes.count(where: { $0 == "Android" })
With filter(_:) on earlier Swift versions
let countOfAndroid = OSes.filter({ $0 == "Android" }).count

Efficient way to convert Scala Array to Unique Sorted List

Can anybody optimize following statement in Scala:
// maybe large
val someArray = Array(9, 1, 6, 2, 1, 9, 4, 5, 1, 6, 5, 0, 6)
// output a sorted list which contains unique element from the array without 0
val newList=(someArray filter (_>0)).toList.distinct.sort((e1, e2) => (e1 > e2))
Since the performance is critical, is there a better way?
Thank you.
This simple line is one of the fastest codes so far:
someArray.toList.filter (_ > 0).sortWith (_ > _).distinct
but the clear winner so far, by my measurements, is Jed Wesley-Smith's. It might look different once Rex's code is fixed.
Typical disclaimer 1 + 2:
I modified the code to accept an Array and return a List.
Typical benchmark considerations:
This was uniformly distributed random data. For 1 million elements, I created an Array of 1 million Ints between 0 and 1 million, so with more or fewer zeros and duplicates the results might vary.
It might also depend on the machine, etc. I used a single-core CPU, Intel Linux 32-bit, JDK 1.6, Scala 2.9.0.1.
Here is the underlying benchcoat-code and the concrete code to produce the graph (gnuplot). Y-axis: time in seconds. X-axis: 100 000 to 1 000 000 elements in Array.
update:
After finding the problem with Rex's code, his code is as fast as Jed's, but the last operation is a transformation of his Array into a List (to fulfill my benchmark interface). Using var result = List[Int]() and result = someArray(i) :: result speeds his code up, so that it is about twice as fast as Jed's code.
Another, maybe interesting, finding: if I rearrange my code in the order of filter/sort/distinct (fsd => dsf, dfs, fsd, ...), none of the 6 possibilities differs significantly.
I haven't measured, but I'm with Duncan, sort in place then use something like:
util.Sorting.quickSort(array)
array.foldRight(List.empty[Int]) {
  case (a, b) =>
    if (!b.isEmpty && b(0) == a)
      b
    else
      a :: b
}
In theory this should be pretty efficient.
Without benchmarking I can't be sure, but I imagine the following is pretty efficient:
val list = collection.SortedSet(someArray.filter(_>0) :_*).toList
Also try adding .par after someArray in your version. It's not guaranteed to be quicker, but it might be. You should run a benchmark and experiment.
sort is deprecated. Use .sortWith(_ > _) instead.
Boxing primitives is going to give you a 10-30x performance penalty. Therefore if you really are performance limited, you're going to want to work off of raw primitive arrays:
def arrayDistinctInts(someArray: Array[Int]) = {
  java.util.Arrays.sort(someArray)
  var overzero = 0
  var ndiff = 0
  var last = 0
  var i = 0
  while (i < someArray.length) {
    if (someArray(i) <= 0) overzero = i + 1
    else if (someArray(i) > last) {
      last = someArray(i)
      ndiff += 1
    }
    i += 1
  }
  val result = new Array[Int](ndiff)
  var j = 0
  i = overzero
  last = 0
  while (i < someArray.length) {
    if (someArray(i) > last) {
      result(j) = someArray(i)
      last = someArray(i)
      j += 1
    }
    i += 1
  }
  result
}
You can get slightly better than this if you're careful (and be warned, I typed this off the top of my head; I might have typoed something, but this is the style to use), but if you find the existing version too slow, this should be at least 5x faster and possibly a lot more.
Edit (in addition to fixing up the previous code so it actually works):
If you insist on ending with a list, then you can build the list as you go. You could do this recursively, but I don't think in this case it's any clearer than the iterative version, so:
def listDistinctInts(someArray: Array[Int]): List[Int] = {
  if (someArray.length == 0 || someArray(someArray.length - 1) <= 0) List[Int]()
  else {
    java.util.Arrays.sort(someArray)
    var last = someArray(someArray.length - 1)
    var list = last :: Nil
    var i = someArray.length - 2
    while (i >= 0) {
      if (someArray(i) < last) {
        last = someArray(i)
        if (last <= 0) return list
        list = last :: list
      }
      i -= 1
    }
    list
  }
}
Also, if you must not destroy the original array by sorting it, you are by far best off duplicating the array and destroying the copy (array copies of primitives are really fast).
And keep in mind that there are special-case solutions that are far faster yet depending on the nature of the data. For example, if you know that you have a long array, but the numbers will be in a small range (e.g. -100 to 100), then you can use a bitset to track which ones you've encountered.
For efficiency, depending on your value of large:
val a = someArray.toSet.filter(_>0).toArray
java.util.Arrays.sort(a) // quicksort, mutable data structures bad :-)
res15: Array[Int] = Array(1, 2, 4, 5, 6, 9)
Note that this does the sort using qsort on an unboxed array.
I'm not in a position to measure, but some more suggestions...
Sorting the array in place before converting to a list might well be more efficient, and you might look at removing dups from the sorted list manually, as they will be grouped together. The cost of removing 0's before or after the sort will also depend on their ratio to the other entries.
How about adding everything to a sorted set?
val a = scala.collection.immutable.SortedSet(someArray filter (0 !=): _*)
Of course, you should benchmark the code to check what is faster, and, more importantly, that this is truly a hot spot.
