Okay, so I have a stream that is constantly receiving arrays at high speed. Here's what I want to do...
If the array is the same, don't do anything. If it is different, make a new array with nil as every value except the changed ones.
Example:
Incoming array 1: [1,1,1,1]
Incoming array 2: [1,1,2,1]
I want to create: [nil, nil, 2, nil]. Only marking the changes.
I made something that worked, I just don't think it's efficient. Is it the best way to do it?
var storedArray = [Int](count: 10, repeatedValue: 0) //array for comparing
func incomingArray(data: [Int]) {
if data == storedArray {return} //do nothing if it's the same
var tempArray = [Int?](count: 10, repeatedValue: nil) //nil array
for index in 0...9 {
if storedArray[index] != data[index] {
tempArray[index] = data[index] //replace the temp array index
}
}
//send the completed tempArray to do work ....
storedArray = data //save the current data as the stored array
}
So the above code works. It's just not efficient. Any better ideas for this?
Thanks.
UPDATE 1:
I made a mistake in the original post: the values are UInt8, not Int.
If you’re concerned about performance, the first thing to look for is hidden loops. Here’s one:
if data == storedArray {return}
This is presumably here for efficiency – if the two arrays are equal, don't bother doing anything. But really, this might be self-defeating. That comparison isn't constant time – it loops over the elements and compares them. Since you're going to loop over them later anyway, that probably doesn't gain you much.
You could argue it saves you allocating a new array, but that leads to the next question: do you really need to create an array with all those nil values? Why not instead generate an array of the indices at which the arrays differ? That way, the recipient of your differences will only have to loop over the differences (maybe only a couple) rather than the whole array.
It probably makes sense to factor out the array diffing from the processing and storage. Here’s a function that takes two arrays and returns an array of indices where they differ:
func differences<T: Equatable>(lhs: [T], rhs: [T]) -> [Int] {
// indexedPairs is a sequence of (index, (left-hand val, right-hand val))
let indexedPairs = enumerate(zip(lhs,rhs))
// the lazy may or may not help here, benchmark to find out...
return lazy(indexedPairs).filter { (index, pair) in
// only return different pairs
pair.0 != pair.1
}.map {
// only return the index not the values
$0.0
}.array
}
Note this is a pure function – that is, it takes inputs and produces a result without referencing any external state. This makes it easier to test and debug as a standalone function.
You could then rewrite your original function in terms of it:
func incomingArray(data: [Int]) {
let diffs = differences(storedArray, data)
if !diffs.isEmpty {
// send new data and diff indices for further processing
// then overwrite the old array
storedArray = data
}
}
Update
Benchmarking suggests the filter/map version performs horribly, compared to a simple loop, so here’s a version of differences that just uses for…in:
func differences<T: Equatable>(lhs: [T], rhs: [T]) -> [Int] {
var diffs: [Int] = []
// still using zip, since this guards against the two
// arrays being of different sizes - doesn’t seem to
// impact performance
for (i,(l,r)) in zip(indices(lhs),zip(lhs,rhs)) {
if l != r { diffs.append(i) }
}
return diffs
}
Some quick tests suggest this version gets a significant speedup when the input is large and the number of differences is small, but it performs identically when the arrays are mostly different.
Here are some ideas to make this code faster:
1) Instead of using an array of Int?, use a plain Int and instead of marking elements as nil, mark them as some special integer value. I don't know what that value is, maybe 0 is fine, or -1, or Int.max.
Update: The above change gives me a ~ 10% performance increase
2) Recycle your result array. So that you can skip the following code:
var tempArray = [Int?](count: 10, repeatedValue: nil)
Or maybe better, let the caller pass it in via an inout parameter so that you don't have to worry about ownership of it.
Update: The above change gives me a ~ 50% performance increase
Here is the code for all the versions suggested in this question:
import UIKit
import XCTest
var storedArray1 = [Int?](count: 10, repeatedValue: 0) //array for comparing
func processIncomingArray1(data: [Int]) {
var tempArray = [Int?](count: 10, repeatedValue: nil) //nil array
for index in 0...9 {
if storedArray1[index] != data[index] {
tempArray[index] = data[index] //replace the temp array index
}
}
storedArray1 = tempArray
}
var storedArray2 = [Int](count: 10, repeatedValue: 0)
func processIncomingArray2(data: [Int]) {
var tempArray = [Int](count: 10, repeatedValue: Int.max)
for index in 0...9 {
if storedArray2[index] != data[index] {
tempArray[index] = data[index]
}
}
storedArray2 = tempArray
}
var storedArray3 = [Int](count: 10, repeatedValue: Int.max)
func processIncomingArray3(data: [Int], inout result: [Int]) {
for index in 0...9 {
if result[index] != data[index] {
result[index] = data[index]
}
}
}
// Given two sequences, return a sequence of 2-tuples (pairs)
public func zip<A: SequenceType, B: SequenceType>(a: A, b: B)
-> ZipSequence<A, B>
{
return ZipSequence(a, b)
}
// Lazy sequence of tuples created from values from two other sequences
public struct ZipSequence<A: SequenceType, B: SequenceType>: SequenceType {
private var a: A
private var b: B
public init (_ a: A, _ b: B) {
self.a = a
self.b = b
}
public func generate() -> ZipGenerator<A.Generator, B.Generator> {
return ZipGenerator(a.generate(), b.generate())
}
}
// Generator that creates tuples of values from two other generators
public struct ZipGenerator<A: GeneratorType, B: GeneratorType>: GeneratorType {
private var a: A
private var b: B
public init(_ a: A, _ b: B) {
self.a = a
self.b = b
}
mutating public func next() -> (A.Element, B.Element)? {
switch (a.next(), b.next()) {
case let (.Some(aValue), .Some(bValue)):
return (aValue, bValue)
default:
return nil
}
}
}
func differences<T: Equatable>(lhs: [T], rhs: [T]) -> [Int] {
// indexedPairs is a sequence of (index, (left-hand val, right-hand val))
let indexedPairs = enumerate(zip(lhs,rhs))
// the lazy may or may not help here, benchmark to find out...
return lazy(indexedPairs).filter { (index, pair) in
// only return different pairs
pair.0 != pair.1
}.map {
// only return the index not the values
$0.0
}.array
}
var storedArray4 = [Int](count: 10, repeatedValue: Int.max)
func processIncomingArray4(data: [Int]) {
let diffs = differences(storedArray4, data)
if !diffs.isEmpty {
// send new data and diff indices for further processing
// then overwrite the old array
storedArray4 = data
}
}
func differences5<T: Equatable>(lhs: [T], rhs: [T]) -> [Int] {
var diffs: [Int] = []
// still using zip, since this guards against the two
// arrays being of different sizes - doesn’t seem to
// impact performance
for (i,(l,r)) in zip(indices(lhs),zip(lhs,rhs)) {
if l != r { diffs.append(i) }
}
return diffs
}
var storedArray5 = [Int](count: 10, repeatedValue: Int.max)
func processIncomingArray5(data: [Int]) {
let diffs = differences5(storedArray5, data)
if !diffs.isEmpty {
// send new data and diff indices for further processing
// then overwrite the old array
storedArray5 = data
}
}
class StackOverflowTests: XCTestCase {
func testPerformanceExample1() {
var data = [1,2,3,4,5,6,7,8,9,10]
self.measureBlock() {
for i in 1...100000 {
processIncomingArray1(data)
}
}
}
func testPerformanceExample2() {
var data = [1,2,3,4,5,6,7,8,9,10]
self.measureBlock() {
for i in 1...100000 {
processIncomingArray2(data)
}
}
}
func testPerformanceExample3() {
var data = [1,2,3,4,5,6,7,8,9,10]
self.measureBlock() {
for i in 1...100000 {
processIncomingArray3(data, &storedArray3)
}
}
}
func testPerformanceExample4() {
var data = [1,2,3,4,5,6,7,8,9,10]
self.measureBlock() {
for i in 1...100000 {
processIncomingArray4(data)
}
}
}
func testPerformanceExample5() {
var data = [1,2,3,4,5,6,7,8,9,10]
self.measureBlock() {
for i in 1...100000 {
processIncomingArray5(data)
}
}
}
}
I think I have the best answer. Instead of the whole nil array, I made my temp array hold Bool values. Then if a value changes, I mark its index as true.
So here's an example.
Incoming array 1: [1,1,1,1]
Incoming array 2: [1,1,2,1]
I then export the full array plus a bool array of: [false, false, true, false].
Then I just check whether there's a change and pull the value. It turned out to work much faster than the other answers. I also recycled the temp array to speed it up. My guess is that since its values can only be true or false, it's a lot faster than an optional UInt8.
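Here's a rough sketch of the idea (the names and the fixed size of 10 are just for illustration, not my exact production code):
// Rough sketch of the approach described above.
var storedArray = [UInt8](count: 10, repeatedValue: 0)      // last seen data
var changedFlags = [Bool](count: 10, repeatedValue: false)  // recycled marker array
func incomingArray(data: [UInt8]) {
    var anyChange = false
    for index in 0..<storedArray.count {
        let changed = storedArray[index] != data[index]
        changedFlags[index] = changed        // true only where the value changed
        if changed { anyChange = true }
    }
    if anyChange {
        // send data plus changedFlags for further processing,
        // then remember the new data
        storedArray = data
    }
}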
Thanks for the help guys. Let me know if any other ideas come up.
Related
I have an array of location coordinates like this
let coodinatesArray: [CLLocationCoordinate2D] = [
CLLocationCoordinate2DMake(-37.986866915266461, 145.0646907496548),
CLLocationCoordinate2DMake(-30.082868871929833, 132.65902204771416),
CLLocationCoordinate2DMake(-21.493671405743246, 120.25335334577362),
CLLocationCoordinate2DMake(-20.311181400000002, 118.58011809999996),
CLLocationCoordinate2DMake(-20.311183981008153, 118.58011757850542),
CLLocationCoordinate2DMake(-8.3154534852154622, 119.32445770062185),
CLLocationCoordinate2DMake(4.0574731310941274, 120.06879782273836),
CLLocationCoordinate2DMake(16.244430153007979, 120.68908125783528),
CLLocationCoordinate2DMake(27.722556142642798, 121.4334213799517),
CLLocationCoordinate2DMake(37.513067999999976, 122.12041999999997)
]
I found some answers (1, 2) but I cannot use them since my array is not Equatable. Is it possible to remove the duplicates using filter in Swift?
You can create an extension to CLLocationCoordinate2D to make it conform to Hashable.
extension CLLocationCoordinate2D: Hashable {
public var hashValue: Int {
return Int(latitude * 1000)
}
static public func == (lhs: CLLocationCoordinate2D, rhs: CLLocationCoordinate2D) -> Bool {
// Due to the precision here you may wish to use alternate comparisons
// The following compares to less than 1/100th of a second
// return abs(rhs.latitude - lhs.latitude) < 0.000001 && abs(rhs.longitude - lhs.longitude) < 0.000001
return lhs.latitude == rhs.latitude && lhs.longitude == rhs.longitude
}
}
Then you can get unique locations using Set:
let coodinatesArray: [CLLocationCoordinate2D] = ... // your array
let uniqueLocations = Array(Set(coodinatesArray))
If you need to keep the original order you can replace that last line with:
let uniqueLocations = NSOrderedSet(array: coodinatesArray).array as! [CLLocationCoordinate2D]
I've just read a post by Basem Emara about creating a threadsafe array Type in Swift. While I glanced through the code example, I asked myself if there isn't a way to achieve this with quite less code.
Suppose I create this class:
// MARK: Class Declaration
class ThreadsafeArray<Element> {
// Private Variables
private var __array: [Element] = []
private var __arrayQueue: DispatchQueue = DispatchQueue(
label: "ThreadsafeArray.__concurrentArrayQueue",
attributes: .concurrent
)
}
// MARK: Interface
extension ThreadsafeArray {
// ReadWrite Variables
var threadsafe: [Element] {
get {
return self.__arrayQueue.sync {
return self.__array
}
}
set(newArray) {
self.__arrayQueue.async(flags: .barrier) {
self.__array = newArray
}
}
}
}
If, from now on, I only accessed the actual array through .threadsafe, would this suffice to make the array threadsafe?
Also, could I implement it as a struct instead of a class to get the mutating checks as well?
I am aware that the objects inside this array would not be threadsafe themselves through this but this is not the point, so let's assume I only put threadsafe stuff in there.
(Of course, to avoid the calls to .threadsafe, I would make the shiny new class conform to ExpressibleByArrayLiteral, Collection and RangeReplaceableCollection, so I can use it like a normal array.)
Edit
Meanwhile, I've tried testing it in a playground and have come to believe that it doesn't suffice.
Playground code:
import Foundation
import PlaygroundSupport
PlaygroundPage.current.needsIndefiniteExecution = true
// Testing //
// Thread-unsafe array
func unsafeArray() {
var array: [Int] = []
var iterations: Int = 1000
let start: TimeInterval = Date().timeIntervalSince1970
DispatchQueue.concurrentPerform(iterations: iterations) { index in
let last: Int = array.last ?? 0
array.append(last + 1)
DispatchQueue.global().sync {
iterations -= 1
// Final loop
guard iterations <= 0 else { return }
print(String(
format: "Unsafe loop took %.3f seconds, count: %d.",
Date().timeIntervalSince1970 - start, array.count
))
}
}
}
// Thread-safe array
func safeArray() {
let array: ThreadsafeArray<Int> = ThreadsafeArray<Int>()
var iterations: Int = 1000
let start: TimeInterval = Date().timeIntervalSince1970
DispatchQueue.concurrentPerform(iterations: iterations) { index in
let last: Int = array.threadsafe.last ?? 0
array.threadsafe.append(last + 1)
DispatchQueue.global().sync {
iterations -= 1
// Final loop
guard iterations <= 0 else { return }
print(String(
format: "Safe loop took %.3f seconds, count: %d.",
Date().timeIntervalSince1970 - start, array.threadsafe.count
))
}
}
}
unsafeArray()
safeArray()
Output:
Most of the time:
experiments(31117,0x7000038d0000) malloc: *** error for object 0x11f663d28: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Sometimes:
IndexError: Index out of range
Unfortunately also:
Unsafe loop took 1.916 seconds, count: 994.
Safe loop took 11.258 seconds, count: 515.
It doesn't seem to suffice (and it also performs very poorly).
The synchronization mechanism in your question, with concurrent queue and judicious use of barrier is known as the “reader-writer” pattern. In short, it offers concurrent synchronous reads and non-concurrent asynchronous writes. This is a fine synchronization mechanism. It is not the problem here.
But there are a few problems:
In the attempt to pare back the implementation, this class has become very inefficient. Consider:
class ThreadSafeArray<Element> {
private var array: [Element]
private let queue = DispatchQueue(label: "ThreadsafeArray.reader-writer", attributes: .concurrent)
init(_ array: [Element] = []) {
self.array = array
}
}
extension ThreadSafeArray {
var threadsafe: [Element] {
get { queue.sync { array } }
set { queue.async(flags: .barrier) { self.array = newValue } }
}
}
let numbers = ThreadSafeArray([1, 2, 3])
numbers.threadsafe[1] = 42 // !!!
What that numbers.threadsafe[1] = 42 line is really doing is as follows:
Fetching the whole array;
Changing the second item in a copy of the array; and
Replacing the whole array with a copy of the array that was just created.
That is obviously very inefficient.
The intuitive solution is to add an efficient subscript operator in the implementation:
extension ThreadSafeArray {
typealias Index = Int
subscript(index: Index) -> Element {
get { queue.sync { array[index] } }
set { queue.async(flags: .barrier) { self.array[index] = newValue} }
}
}
Then you can do:
numbers[1] = 42
That will perform a synchronized update of the existing array “in place”, without needing to copy the array at all. In short, it is an efficient, thread-safe mechanism.
As one adds more and more basic “array” functionality (especially mutating methods such as removing and adding items), you end up with an implementation not dissimilar to the original implementation you found online. This is why the article you referenced implemented all of those methods: it exposes array-like functionality while offering an efficient and (seemingly) thread-safe interface.
While the above addresses the data races, there is a deep problem in that code sample you found online, as illuminated by your thread-safety test.
To illustrate this, let’s first assume we flesh out our ThreadSafeArray to have last, append() and make it print-able:
class ThreadSafeArray<Element> {
private var array: [Element]
private let queue = DispatchQueue(label: "ThreadsafeArray.reader-writer", attributes: .concurrent)
init(_ array: [Element] = []) {
self.array = array
}
}
extension ThreadSafeArray {
typealias Index = Int
subscript(index: Index) -> Element {
get { queue.sync { array[index] } }
set { queue.async(flags: .barrier) { self.array[index] = newValue} }
}
var last: Element? {
queue.sync { array.last }
}
func append(_ newElement: Element) {
queue.async(flags: .barrier) {
self.array.append(newElement)
}
}
}
extension ThreadSafeArray: CustomStringConvertible {
var description: String {
queue.sync { array.description }
}
}
That implementation (a simplified version of the rendition found on that web site) looks OK, as it solves the data race and avoids unnecessary copying of the array. But it has its own problems. Consider this rendition of your thread-safety test:
let numbers = ThreadSafeArray([0])
DispatchQueue.concurrentPerform(iterations: 1_000) { _ in
let lastValue = numbers.last! + 1
numbers.append(lastValue)
}
print(numbers) // !!!
The strict data race is solved, but the result will not be [0, 1, 2, ..., 1000]. The problem is these lines:
let lastValue = numbers.last! + 1
numbers.append(lastValue)
That does a synchronized retrieval of last followed by a separate synchronized append. The problem is that another thread might slip in between these two synchronized calls and fetch the same last value! You need to wrap the whole “fetch last value, increment it, and append this new value” in a single, synchronized task.
To solve this, we would often give the thread-safe object a method that would provide a way to perform multiple statements as a single, synchronized, task. E.g.:
extension ThreadSafeArray {
func synchronized(block: @escaping (inout [Element]) -> Void) {
queue.async(flags: .barrier) { [self] in
block(&array)
}
}
}
Then you can do:
let numbers = ThreadSafeArray([0])
DispatchQueue.concurrentPerform(iterations: 1_000) { _ in
numbers.synchronized { array in
let lastValue = array.last! + 1
array.append(lastValue)
}
}
print(numbers) // OK
So let’s return to your intuition that the author’s class can be simplified. You are right, that it can and should be simplified. But my rationale is slightly different than yours.
The complexity of the implementation is not my concern. It actually is an interesting pedagogical exercise to understand barriers and the broader reader-writer pattern.
My concern, per my earlier point, is that the author's implementation lulls an application developer into a false sense of security provided by the low-level thread safety. As your tests demonstrate, a higher level of synchronization is almost always needed.
In short, I would stick to a very basic implementation, one that exposes the appropriate high-level, thread-safe interface, not a method-by-method and property-by-property interface to the underlying array, which almost always will be insufficient. In fact, this desire for a high-level, thread-safe interface is a motivating idea behind a more modern thread-safety mechanism, namely actors in Swift concurrency.
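For illustration, a minimal actor-based sketch of that high-level interface might look like this (my own example, with assumed names, not code from the article):
// A hedged sketch: the actor exposes one high-level operation instead of a
// property-by-property mirror of Array.
actor NumberStore {
    private var array: [Int] = [0]
    // The whole "read last, increment, append" step is a single isolated
    // operation, so no other task can interleave between the read and the write.
    func appendIncrementedLast() {
        array.append((array.last ?? 0) + 1)
    }
    var count: Int { array.count }
}
// Usage (requires an async context):
// let store = NumberStore()
// await withTaskGroup(of: Void.self) { group in
//     for _ in 1...1_000 { group.addTask { await store.appendIncrementedLast() } }
// }
// print(await store.count) // 1001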
I suspect this line is your issue:
DispatchQueue.global().sync { ...
If you specify a single serial queue to use here, you should get the result you want.
Something like:
let array = SynchronizedArray<Int>()
var iterations = 1000
let queue = DispatchQueue(label: "queue")
DispatchQueue.concurrentPerform(iterations: 1000) { index in
array.append(array.last ?? 0)
queue.sync {
iterations -= 1
if iterations == 0 {
print(array.count)
}
}
}
Another method of locking objects is:
func lock(obj: AnyObject, work: () -> ()) {
objc_sync_enter(obj)
work()
objc_sync_exit(obj)
}
Could your class use this to lock its standard array when needed?
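For example, a sketch of how a wrapper class might use it (the class and member names here are hypothetical):
// Hypothetical usage of the lock helper above.
class LockedArray<Element> {
    private var storage: [Element] = []
    func append(_ element: Element) {
        lock(obj: self) { self.storage.append(element) }
    }
    var last: Element? {
        var result: Element?
        lock(obj: self) { result = self.storage.last }
        return result
    }
}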
I'd like a function runningSum on an array of numbers a (or any ordered collection of addable things) that returns an array of the same length where each element i is the sum of all elements in a up to and including i.
Examples:
runningSum([1,1,1,1,1,1]) -> [1,2,3,4,5,6]
runningSum([2,2,2,2,2,2]) -> [2,4,6,8,10,12]
runningSum([1,0,1,0,1,0]) -> [1,1,2,2,3,3]
runningSum([0,1,0,1,0,1]) -> [0,1,1,2,2,3]
I can do this with a for loop, or whatever. Is there a more functional option? It's a little like a reduce, except that it builds a result array that has all the intermediate values.
Even more general would be to have a function that takes any sequence and provides a sequence that's the running total of the input sequence.
The general combinator you're looking for is often called scan, and can be defined (like all higher-order functions on lists) in terms of reduce:
extension Array {
func scan<T>(initial: T, _ f: (T, Element) -> T) -> [T] {
return self.reduce([initial], combine: { (listSoFar: [T], next: Element) -> [T] in
// because we seeded it with a non-empty
// list, it's easy to prove inductively
// that this unwrapping can't fail
let lastElement = listSoFar.last!
return listSoFar + [f(lastElement, next)]
})
}
}
(But I would suggest that that's not a very good implementation.)
This is a very useful general function, and it's a shame that it's not included in the standard library.
You can then generate your cumulative sum by specializing the starting value and operation:
let cumSum = els.scan(0, +)
And you can omit the zero-length case rather simply:
let cumSumTail = els.scan(0, +).dropFirst()
Swift 4
The general sequence case
Citing the OP:
Even more general would be to have a function that takes any sequence
and provides a sequence that's the running total of the input
sequence.
Consider some arbitrary sequence (conforming to Sequence), say
var seq = 1... // 1, 2, 3, ... (CountablePartialRangeFrom)
To create another sequence which is the (lazy) running sum over seq, you can make use of the global sequence(state:next:) function:
var runningSumSequence =
sequence(state: (sum: 0, it: seq.makeIterator())) { state -> Int? in
if let val = state.it.next() {
defer { state.sum += val }
return val + state.sum
}
else { return nil }
}
// Consume and print accumulated values less than 100
while let accumulatedSum = runningSumSequence.next(),
accumulatedSum < 100 { print(accumulatedSum) }
// 1 3 6 10 15 21 28 36 45 55 66 78 91
// Consume and print next
print(runningSumSequence.next() ?? -1) // 120
// ...
If we'd like (for the joy of it), we could condense the closure to sequence(state:next:) above somewhat:
var runningSumSequence =
sequence(state: (sum: 0, it: seq.makeIterator())) {
(state: inout (sum: Int, it: AnyIterator<Int>)) -> Int? in
state.it.next().map { (state.sum + $0, state.sum += $0).0 }
}
However, type inference tends to break (still some open bugs, perhaps?) for these single-line returns of sequence(state:next:), forcing us to explicitly specify the type of state – hence the gritty explicit closure signature before the in above.
Alternatively: custom sequence accumulator
protocol Accumulatable {
static func +(lhs: Self, rhs: Self) -> Self
}
extension Int : Accumulatable {}
struct AccumulateSequence<T: Sequence>: Sequence, IteratorProtocol
where T.Element: Accumulatable {
var iterator: T.Iterator
var accumulatedValue: T.Element?
init(_ sequence: T) {
self.iterator = sequence.makeIterator()
}
mutating func next() -> T.Element? {
if let val = iterator.next() {
if accumulatedValue == nil {
accumulatedValue = val
}
else { defer { accumulatedValue = accumulatedValue! + val } }
return accumulatedValue
}
return nil
}
}
var accumulator = AccumulateSequence(1...)
// Consume and print accumulated values less than 100
while let accumulatedSum = accumulator.next(),
accumulatedSum < 100 { print(accumulatedSum) }
// 1 3 6 10 15 21 28 36 45 55 66 78 91
The specific array case: using reduce(into:_:)
As of Swift 4, we can use reduce(into:_:) to accumulate the running sum into an array.
let runningSum = arr
.reduce(into: []) { $0.append(($0.last ?? 0) + $1) }
// [2, 4, 6, 8, 10, 12]
By using reduce(into:_:), the [Int] accumulator will not be copied in subsequent reduce iterations; citing the Language reference:
This method is preferred over reduce(_:_:) for efficiency when the
result is a copy-on-write type, for example an Array or a
Dictionary.
See also the implementation of reduce(into:_:), noting that the accumulator is provided as an inout parameter to the supplied closure.
However, each iteration will still result in an append(_:) call on the accumulator array; amortized O(1) averaged over many invocations, but still an arguably unnecessary overhead here as we know the final size of the accumulator.
Because arrays increase their allocated capacity using an exponential
strategy, appending a single element to an array is an O(1) operation
when averaged over many calls to the append(_:) method. When an array
has additional capacity and is not sharing its storage with another
instance, appending an element is O(1). When an array needs to
reallocate storage before appending or its storage is shared with
another copy, appending is O(n), where n is the length of the array.
Thus, knowing the final size of the accumulator, we could explicitly reserve such a capacity for it using reserveCapacity(_:) (as is done e.g. for the native implementation of map(_:))
let runningSum = arr
.reduce(into: [Int]()) { (sums, element) in
if let sum = sums.last {
sums.append(sum + element)
}
else {
sums.reserveCapacity(arr.count)
sums.append(element)
}
} // [2, 4, 6, 8, 10, 12]
For the joy of it, condensed:
let runningSum = arr
.reduce(into: []) {
$0.append(($0.last ?? ($0.reserveCapacity(arr.count), 0).1) + $1)
} // [2, 4, 6, 8, 10, 12]
Swift 3: Using enumerated() for subsequent calls to reduce
Another Swift 3 alternative (with an overhead ...) is using enumerated().map in combination with reduce within each element mapping:
func runningSum(_ arr: [Int]) -> [Int] {
return arr.enumerated().map { arr.prefix($0).reduce($1, +) }
} /* thanks #Hamish for improvement! */
let arr = [2, 2, 2, 2, 2, 2]
print(runningSum(arr)) // [2, 4, 6, 8, 10, 12]
The upside is that you won't have to use an array as the collector in a single reduce (instead, reduce is called repeatedly).
Just for fun: The running sum as a one-liner:
let arr = [1, 2, 3, 4]
let rs = arr.map({ () -> (Int) -> Int in var s = 0; return { (s += $0, s).1 } }())
print(rs) // [1, 3, 6, 10]
It does the same as the (updated) code in JAL's answer, in particular,
no intermediate arrays are generated.
The sum variable is captured in an immediately-evaluated closure returning the transformation.
If you just want it to work for Int, you can use this:
func runningSum(array: [Int]) -> [Int] {
return array.reduce([], combine: { (sums, element) in
return sums + [element + (sums.last ?? 0)]
})
}
If you want it to be generic over the element type, you have to do a lot of extra work declaring the various number types to conform to a custom protocol that provides a zero element, and (if you want it generic over both floating point and integer types) an addition operation, because Swift doesn't do that already. (A future version of Swift may fix this problem.)
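For comparison, here's a sketch of such a generic version, assuming a Swift version that has the standard AdditiveArithmetic protocol (Swift 5 and later), which supplies the zero element and the addition operation:
// Sketch only: AdditiveArithmetic provides .zero and +, so no custom protocol is needed.
func runningSum<T: AdditiveArithmetic>(array: [T]) -> [T] {
    return array.reduce(into: [T]()) { sums, element in
        sums.append((sums.last ?? .zero) + element)
    }
}
// runningSum(array: [1, 2, 3])       == [1, 3, 6]
// runningSum(array: [1.5, 2.5, 3.0]) == [1.5, 4.0, 7.0]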
Assuming an array of Ints, it sounds like you can use map to manipulate the input:
let arr = [0,1,0,1,0,1]
var sum = 0
let val = arr.map { (sum += $0, sum).1 }
print(val) // "[0, 1, 1, 2, 2, 3]\n"
I'll keep working on a solution that doesn't use an external variable.
I thought it'd be cool to extend Sequence with a generic scan function, as suggested in the great first answer.
Given this extension, you can get the running sum of an array like this: [1,2,3].scan(0, +)
But you can also get other interesting things…
Running product: array.scan(1, *)
Running max: array.scan(Int.min, max)
Running min: array.scan(Int.max, min)
Because the implementation is a function on Sequence and returns a Sequence, you can chain it together with other sequence functions. It is efficient, having linear running time.
Here's the extension…
extension Sequence {
func scan<Result>(_ initialResult: Result, _ nextPartialResult: @escaping (Result, Self.Element) -> Result) -> ScanSequence<Self, Result> {
return ScanSequence(initialResult: initialResult, underlying: self, combine: nextPartialResult)
}
}
struct ScanSequence<Underlying: Sequence, Result>: Sequence {
let initialResult: Result
let underlying: Underlying
let combine: (Result, Underlying.Element) -> Result
typealias Iterator = ScanIterator<Underlying.Iterator, Result>
func makeIterator() -> Iterator {
return ScanIterator(previousResult: initialResult, underlying: underlying.makeIterator(), combine: combine)
}
var underestimatedCount: Int {
return underlying.underestimatedCount
}
}
struct ScanIterator<Underlying: IteratorProtocol, Result>: IteratorProtocol {
var previousResult: Result
var underlying: Underlying
let combine: (Result, Underlying.Element) -> Result
mutating func next() -> Result? {
guard let nextUnderlying = underlying.next() else {
return nil
}
previousResult = combine(previousResult, nextUnderlying)
return previousResult
}
}
One solution using reduce:
func runningSum(array: [Int]) -> [Int] {
return array.reduce([], combine: { (result: [Int], item: Int) -> [Int] in
if result.isEmpty {
return [item] //first item, just take the value
}
// otherwise take the previous value and append the new item
return result + [result.last! + item]
})
}
I'm very late to this party. The other answers have good explanations, but none of them has provided the initial result in a generic way. This implementation is useful to me.
public extension Sequence {
/// A sequence of the partial results that `reduce` would employ.
func scan<Result>(
_ initialResult: Result,
_ nextPartialResult: @escaping (Result, Element) -> Result
) -> AnySequence<Result> {
var iterator = makeIterator()
return .init(
sequence(first: initialResult) { partialResult in
iterator.next().map {
nextPartialResult(partialResult, $0)
}
}
)
}
}
extension Sequence where Element: AdditiveArithmetic & ExpressibleByIntegerLiteral {
var runningSum: AnySequence<Element> { scan(0, +).dropFirst() }
}
I've been manipulating byte arrays in Swift 2.1 lately, and I often find myself writing code like this:
// code to add functions to a [UInt8] object
extension CollectionType where Generator.Element == UInt8 {
func xor(with byte: UInt8) -> [UInt8] {
return map { $0 ^ byte }
}
}
// example usage: [67, 108].xor(with: 0) == [67, 108]
Is there an easy way to parallelize this map call, so that multiple threads can operate on non-overlapping areas of the array at the same time?
I could write code to manually divide the array into sub-arrays and call map on each sub-array in distinct threads.
But I wonder if some framework exists in Swift to do the division automatically, since map is a functional call that can work in a thread-safe environment without side-effects.
Clarifying notes:
The code only needs to work on a [UInt8] object, not necessarily every CollectionType.
The easiest way to perform a loop of calculations in parallel is concurrentPerform (previously called dispatch_apply; see Performing Loop Iterations Concurrently in the Concurrency Programming Guide). But, no, there is no map rendition that will do this for you. You have to do this yourself.
For example, you could write an extension to perform the concurrent tasks:
extension Array {
public func concurrentMap<T>(_ transform: (Element) -> T) -> [T] {
var results = [Int: T](minimumCapacity: count)
let lock = NSLock()
DispatchQueue.concurrentPerform(iterations: count) { index in
let result = transform(self[index])
lock.synchronized {
results[index] = result
}
}
return (0 ..< results.count).compactMap { results[$0] }
}
}
Where
extension NSLocking {
func synchronized<T>(block: () throws -> T) rethrows -> T {
lock()
defer { unlock() }
return try block()
}
}
You can use whatever synchronization mechanism you want (locks, serial queues, reader-writer), but the idea is to perform transform concurrently and then synchronize the update of the collection.
Note:
This will block the thread you call it from (just like the non-concurrent map will), so make sure to dispatch this to a background queue.
One needs to ensure that there is enough work on each thread to justify the inherent overhead of managing all of these threads. (E.g. a simple xor call per loop is not sufficient, and you'll find that it's actually slower than the non-concurrent rendition.) In these cases, make sure you stride (see Improving Loop Code that balances the amount of work per concurrent block). For example, rather than doing 5000 iterations of one extremely simple operation, do 10 iterations of 500 operations per loop. You may have to experiment with suitable striding values.
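To make the striding idea concrete, here's a rough sketch (a variation on the concurrentMap above, with an assumed batch size; not part of the original code): the heavy transform work happens per batch without the lock, and the lock is taken only once per batch.
// A hedged sketch of striding; relies on the NSLocking.synchronized extension above.
extension Array {
    public func concurrentStridedMap<T>(batchSize: Int = 500, _ transform: (Element) -> T) -> [T] {
        let batchCount = (count + batchSize - 1) / batchSize
        var batchResults = [Int: [T]](minimumCapacity: batchCount)
        let lock = NSLock()
        DispatchQueue.concurrentPerform(iterations: batchCount) { batch in
            let start = batch * batchSize
            let end = Swift.min(start + batchSize, count)
            let partial = self[start..<end].map(transform)      // heavy work, no lock held
            lock.synchronized { batchResults[batch] = partial } // brief synchronized update
        }
        return (0..<batchCount).flatMap { batchResults[$0]! }
    }
}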
While I suspect you don't need this discussion, for readers unfamiliar with concurrentPerform (formerly known as dispatch_apply), I'll illustrate its use below. For a more complete discussion on the topic, refer to the links above.
For example, let's consider something far more complicated than a simple xor (because with something that simple, the overhead outweighs any performance gained), such as a naive Fibonacci implementation:
func fibonacci(_ n: Int) -> Int {
if n == 0 || n == 1 {
return n
}
return fibonacci(n - 1) + fibonacci(n - 2)
}
If you had an array of Int values for which you wanted to calculate Fibonacci numbers, rather than:
let results = array.map { fibonacci($0) }
You could:
var results = [Int](repeating: 0, count: array.count)
DispatchQueue.concurrentPerform(iterations: array.count) { index in
let result = self.fibonacci(array[index])
synchronize.update { results[index] = result } // use whatever synchronization mechanism you want
}
Or, if you want a functional rendition, you can use that extension I defined above:
let results = array.concurrentMap { fibonacci($0) }
For Swift 2 rendition, see previous revision of this answer.
My implementation seems to be correct and performs well by comparison with all the others I've seen. Tests and benchmarks are here
extension RandomAccessCollection {
/// Returns `self.map(transform)`, computed in parallel.
///
/// - Requires: `transform` is safe to call from multiple threads.
func concurrentMap<B>(_ transform: (Element) -> B) -> [B] {
let batchSize = 4096 // Tune this
let n = self.count
let batchCount = (n + batchSize - 1) / batchSize
if batchCount < 2 { return self.map(transform) }
return Array(unsafeUninitializedCapacity: n) {
uninitializedMemory, resultCount in
resultCount = n
let baseAddress = uninitializedMemory.baseAddress!
DispatchQueue.concurrentPerform(iterations: batchCount) { b in
let startOffset = b * n / batchCount
let endOffset = (b + 1) * n / batchCount
var sourceIndex = index(self.startIndex, offsetBy: startOffset)
for p in baseAddress+startOffset..<baseAddress+endOffset {
p.initialize(to: transform(self[sourceIndex]))
formIndex(after: &sourceIndex)
}
}
}
}
}
Hope this helps,
-Dave
You can use parMap(), which is a parallel map. You can use Activity Monitor to check that it is actually running in parallel.
func map<T: Collection, U>( _ transform: (T.Iterator.Element) -> U, _ xs: T) -> [U] {
return xs.reduce([U](), {$0 + [transform($1)]})
}
public func parMap<T,U>(_ transform: @escaping (T)->U, _ xs: [T]) -> [U] {
let len = xs.count
var results = [U?](repeating: nil, count: len)
let process = { (i: Int) -> Void in results[i] = transform(xs[i]) }
DispatchQueue.concurrentPerform(iterations: len, execute: process)
return map({$0!}, results)
}
func test() {
parMap({_ in Array(1...10000000).reduce(0,+)}, Array(1...10))
}
Consider the following silly, simple example:
let arr = ["hey", "ho"]
let doubled = arr.map {$0 + $0}
let capitalized = arr.map {$0.capitalizedString}
As you can see, I'm processing the same initial array in multiple ways in order to end up with multiple processed arrays.
Now imagine that arr is very long and that I have many such processes generating many final arrays. I don't like the above code because we are looping multiple times, once for each map call. I'd prefer to loop just once.
Now, obviously we could handle this by brute force, i.e. by starting with multiple mutable arrays and writing into all of them on each iteration:
let arr = ["hey", "ho"]
var doubled = [String]()
var capitalized = [String]()
for s in arr {
doubled.append(s + s)
capitalized.append(s.capitalizedString)
}
Fine. But now we don't get the joy of using map. So my question is: is there a better, Swiftier way? In a hazy way I imagine myself using map, or something like map, to generate something like a tuple and magically splitting that tuple out into all resulting arrays as we iterate, as if I could say something like this (pseudocode, don't try this at home):
let arr = ["hey", "ho"]
let (doubled, capitalized) = arr.map { /* ???? */ }
If I were designing my own language, I might even permit a kind of splatting by assignment into a pseudo-array of lvalues:
let arr = ["hey", "ho"]
let [doubled, capitalized] = arr.map { /* ???? */ }
No big deal if it can't be done, but it would be fun to be able to talk this way.
How about a function, multimap, that takes a collection of transformations and applies each one, returning the results as an array of arrays:
// yay protocol extensions
extension SequenceType {
// looks like T->U works OK as a constraint
func multimap
<U, C: CollectionType
where C.Generator.Element == Generator.Element->U>
(transformations: C) -> [[U]] {
return transformations.map {
self.map($0)
}
}
}
Then use it like this:
let arr = ["hey", "ho"]
let double: String->String = { $0 + $0 }
let uppercase: String->String = { $0.uppercaseString }
arr.multimap([double, uppercase])
// returns [["heyhey", "hoho"], ["HEY", "HO"]]
Or it might be quite nice in variadic form:
extension SequenceType {
func multimap<U>(transformations: (Generator.Element->U)...) -> [[U]] {
return self.multimap(transformations)
}
}
arr.multimap({ $0 + $0 }, { $0.uppercaseString })
Edit: if you want separate variables, I think the best you can do is a destructure function (which unfortunately you have to declare separately for each tuple size):
// I don't think this can be expressed as a protocol extension quite yet
func destructure<C: CollectionType>(source: C) -> (C.Generator.Element,C.Generator.Element) {
precondition(source.count == 2)
return (source[source.startIndex],source[source.startIndex.successor()])
}
// and, since it's a function, let's declare pipe forward
// to make it easier to call
infix operator |> { }
func |> <T,U>(lhs: T, rhs: T->U) -> U {
return rhs(lhs)
}
And then you can declare the variables like this:
let (doubled,uppercased)
= arr.multimap({ $0 + $0 }, { $0.uppercaseString }) |> destructure
Yes this is a teensy bit inefficient because you have to build the array then rip it apart – but that’s really not going to be material, since the arrays are copy-on-write and we’re talking about a small number of them in the outer array.
edit: an excuse to use the new guard statement:
func destructure<C: Sliceable where C.SubSlice.Generator.Element == C.Generator.Element>(source: C) -> (C.Generator.Element,C.Generator.Element) {
guard let one = source.first else { fatalError("empty source") }
guard let two = dropFirst(source).first else { fatalError("insufficient elements") }
return (one,two)
}
What is wrong with your suggestion of a tuple?
let arr = ["hey", "ho"]
let mapped = arr.map {e in
return (e + e, e.capitalizedString)
}
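If you then want two separate arrays, you can split them out afterwards (an extra, but cheap, pass over the mapped pairs):
// Splitting the array of tuples back into two arrays
let doubled = mapped.map { $0.0 }      // ["heyhey", "hoho"]
let capitalized = mapped.map { $0.1 }  // ["Hey", "Ho"]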
How about this: we build the 'capitalized' array while we map the 'doubled' array:
let arr = ["hey", "ho"]
var capitalized = [String]()
let doubled = arr.map { (myString) -> String in
capitalized.append(myString.capitalizedString)
return myString + myString
}
//doubled ["heyhey", "hoho"]
//capitalized: ["Hey", "Ho"]