Best Data get/set uint8 at index / Data masking - arrays

I'm trying to create a Data masking function. I found two ways:
1. Using Data subscripts: very slow.
2. Creating an array from the Data, changing it, and then converting it back: about 70 times faster, but it uses twice as much memory.
Why is Data subscripting so slow?
Is there a better way to get/set a UInt8 at an index without duplicating the memory?
Here is my test:
var data = Data(bytes: [UInt8](repeating: 123, count: 100_000_000))
let a = CFAbsoluteTimeGetCurrent()
// data masking
for i in 0..<data.count {
    data[i] = data[i] &+ 1
}
let b = CFAbsoluteTimeGetCurrent()
// creating array
var bytes = data.withUnsafeBytes {
    [UInt8](UnsafeBufferPointer(start: $0, count: data.count))
}
for i in 0..<bytes.count {
    bytes[i] = bytes[i] &+ 1
}
data = Data(bytes: bytes)
let c = CFAbsoluteTimeGetCurrent()
print(b - a) // 8.8887130022049
print(c - b) // 0.12415999174118

I cannot tell you exactly why the first method (subscripting the Data value) is so slow. According to Instruments, a lot of time is spent in objc_msgSend when calling methods on the underlying NSMutableData object.
But you can mutate the bytes without copying the data to an array:
data.withUnsafeMutableBytes { (bytes: UnsafeMutablePointer<UInt8>) -> Void in
    for i in 0..<data.count {
        bytes[i] = bytes[i] &+ 1
    }
}
which is even faster than your "copy to array" method.
On a MacBook I got the following results:
Data subscripting: 7.15 sec
Copy to array and back: 0.238 sec
withUnsafeMutableBytes: 0.0659 sec
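In newer Swift versions, Data.withUnsafeMutableBytes passes an UnsafeMutableRawBufferPointer to the closure instead of a typed pointer; a minimal sketch of the same masking loop under that API (assuming the same data variable as above) is:
data.withUnsafeMutableBytes { (buffer: UnsafeMutableRawBufferPointer) in
    // Each element of the raw buffer is a UInt8, so the masking loop is unchanged.
    for i in buffer.indices {
        buffer[i] = buffer[i] &+ 1
    }
}
This should perform comparably to the typed-pointer version, since neither copies the underlying storage.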

Related

Why is an Array of Float64Arrays taking so much memory in Node.js?

I found a very interesting situation on Node.js 6.11.0, Windows 10. After running this code
function rand() { return Math.floor(Math.random() * 10); }
let s = new Array(10000000).fill(0).map(a => new Float64Array([rand(), rand()]));
and calling global.gc() a few times, the Node.js process was taking 1.7 GB of memory. I have no explanation for this: a Float64Array of two numbers takes 16 bytes, which times 10000000 is ~160 MB. Even if you assume that each element of the array s is actually a pointer to a Float64Array, which is another 8 bytes, that makes 240 MB, but certainly not 1.7 GB.
What could be the explanation for this?
Looking at the node --inspect (Node 9.5.0) output for
function rand() {
    return Math.floor(Math.random() * 10);
}
const arr = [];
for (var i = 0; i < 1000000; i++) {
    arr.push(new Float64Array([rand(), rand()]));
    if (i % 1000 == 0) {
        console.log(i);
    }
}
global.x = arr;
it looks like each of those two-element Float64Arrays requires 208 bytes of memory, so there is "simply" a significant per-object overhead.
If you need something like this, I'd suggest allocating a single flat Float64Array of 2 * 10000000 items and indexing into it. (FWIW, I just tried that: the single 200-million-item Float64Array consumes 600 megabytes of memory, and the allocation and execution were near-instant.)

Why is this for-loop approach so slow compared with the map approach?

I tested my code in a Playground, but as the discussion points out, a Playground uses the debug configuration; once I put the code into a real app, the two approaches don't make a big difference. I didn't know about this debug/release distinction before.
This is a Swift performance question: I need to loop through the pixel offsets of an image. First I attempted it this way:
func p1() -> [[Int]] {
    var offsets = [[Int]]()
    for row in 0..<height {
        var rowOffset = [Int]()
        for col in 0..<width {
            let offset = width * row + col
            rowOffset.append(offset)
        }
        offsets.append(rowOffset)
    }
    return offsets
}
But it is very slow. I searched and found a code snippet that loops through the offsets this way:
func p2() -> [[Int]] {
    return (0..<height).map { row in
        (0..<width).map { col in
            let offset = width * row + col
            return offset
        }
    }
}
I tested p1 and p2 on an image with height = 128 and width = 128, and p1 is 18 times slower than p2. Why is p1 so slow compared with p2? I'm also wondering whether there is any other, faster approach for this task.
The most obvious reason why the map approach is faster is that map allocates the array's capacity up front (since it knows how many elements will be in the resulting array). You can do this in your code too by calling ary.reserveCapacity(n) on your arrays, e.g.
func p1() -> [[Int]] {
    var offsets = [[Int]]()
    offsets.reserveCapacity(height) // NEW LINE
    for row in 0..<height {
        var rowOffset = [Int]()
        rowOffset.reserveCapacity(width) // NEW LINE
        for col in 0..<width {
            let offset = width * row + col
            rowOffset.append(offset)
        }
        offsets.append(rowOffset)
    }
    return offsets
}
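If the caller can work with a flat array instead of [[Int]], a single reserved allocation avoids the nested arrays entirely. A minimal sketch, assuming the same width and height properties used by p1 and p2 (p3 is just a hypothetical name):
func p3() -> [Int] {
    var offsets = [Int]()
    offsets.reserveCapacity(width * height)
    for row in 0..<height {
        for col in 0..<width {
            // The pixel at (row, col) lives at this flat index.
            offsets.append(width * row + col)
        }
    }
    return offsets
}
The offset for (row, col) is then read back as offsets[width * row + col], so no per-row array allocations are needed.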

Dereference UnsafeMutablePointer<UnsafeMutableRawPointer>

I have a block that passes in data that I'd like to convert to an array of arrays of floats, e.g. [[0.1, 0.2, 0.3, 1.0], [0.3, 0.4, 0.5, 1.0], [0.5, 0.6, 0.7, 1.0]]. The data is passed to me in the form of data: UnsafeMutablePointer<UnsafeMutableRawPointer>. (The inner arrays are RGBA values.)
fwiw -- the block parameters are from SCNParticleEventBlock
How can I dereference data into a [[Float]]? Once I have the array containing the inner arrays, I can reference the inner array (colorArray) data with:
let rgba: UnsafeMutablePointer<Float> = UnsafeMutablePointer(mutating: colorArray)
let count = 4
for i in 0..<count {
    print((rgba + i).pointee)
}
fwiw -- this is Apple's example Objective-C code for referencing the data (from SCNParticleSystem handle(_:forProperties:handler:) )
[system handleEvent:SCNParticleEventBirth
      forProperties:@[SCNParticlePropertyColor]
          withBlock:^(void **data, size_t *dataStride, uint32_t *indices, NSInteger count) {
    for (NSInteger i = 0; i < count; ++i) {
        float *color = (float *)((char *)data[0] + dataStride[0] * i);
        if (rand() & 0x1) { // Switch the green and red color components.
            color[0] = color[1];
            color[1] = 0;
        }
    }
}];
You can actually subscript the typed UnsafeMutablePointer without having to create an UnsafeMutableBufferPointer, as in:
let colorsPointer: UnsafeMutableRawPointer = data[0] + dataStride[0] * i
let rgbaBuffer = colorsPointer.bindMemory(to: Float.self, capacity: dataStride[0])
if arc4random_uniform(2) == 1 {
    rgbaBuffer[0] = rgbaBuffer[1]
    rgbaBuffer[1] = 0
}
Were you ever able to get your solution to work? It appears only a handful of SCNParticleProperties can be used within an SCNParticleEventBlock block.
Based on this answer, I've written the particle system handler function in Swift as:
ps.handle(SCNParticleEvent.birth, forProperties: [SCNParticleSystem.ParticleProperty.color]) {
    (data: UnsafeMutablePointer<UnsafeMutableRawPointer>, dataStride: UnsafeMutablePointer<Int>, indices: UnsafeMutablePointer<UInt32>?, count: Int) in
    for i in 0..<count {
        // Get an UnsafeMutableRawPointer to the i-th rgba element in the data.
        let colorsPointer: UnsafeMutableRawPointer = data[0] + dataStride[0] * i
        // Convert the UnsafeMutableRawPointer to a typed pointer by binding it to a type.
        let floatPtr = colorsPointer.bindMemory(to: Float.self, capacity: dataStride[0])
        // Convert that to an UnsafeMutableBufferPointer.
        var rgbaBuffer = UnsafeMutableBufferPointer(start: floatPtr, count: dataStride[0])
        // At this point I could convert the buffer to an Array, but doing so copies the data,
        // and changes made to the array would not be reflected in the original data.
        // UnsafeMutableBufferPointer is subscriptable, which is nice.
        // var rgbaArray = Array(rgbaBuffer)
        // About half the time, mess with the red and green components.
        if arc4random_uniform(2) == 1 {
            rgbaBuffer[0] = rgbaBuffer[1]
            rgbaBuffer[1] = 0
        }
    }
}
I'm really not certain whether this is the most direct way to go about it, and it seems rather cumbersome compared to the Objective-C code (see the question above). I'm certainly open to other solutions and/or comments on this solution.
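For comparison, a more compact sketch of the same handler, assuming ps is the SCNParticleSystem used above; the closure's parameter types are fixed by SCNParticleEventBlock, so they can usually be inferred:
ps.handle(.birth, forProperties: [.color]) { data, dataStride, _, count in
    for i in 0..<count {
        // View the i-th color as four contiguous Floats (RGBA).
        let rgba = (data[0] + dataStride[0] * i).bindMemory(to: Float.self, capacity: 4)
        if arc4random_uniform(2) == 1 {
            rgba[0] = rgba[1] // copy green into red
            rgba[1] = 0       // clear green
        }
    }
}
This is only a sketch; depending on the Swift version, you may still need to spell out the closure signature as in the answer above.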

Integer vs Boolean array Swift Performance

I tried executing the Sieve of Eratosthenes algorithm using a large Integer array and a large Bool array.
The integer version seems to execute MUCH faster than the boolean one. What is the possible reason for this?
import Foundation

var n: Int = 100000000
var prime = [Bool](repeating: true, count: n + 1)
var p = 2
let start = DispatchTime.now()
while (p * p) <= n {
    if prime[p] == true {
        var i = p * 2
        while i <= n {
            prime[i] = false
            i = i + p
        }
    }
    p = p + 1
}
let stop = DispatchTime.now()
let time = Double(stop.uptimeNanoseconds - start.uptimeNanoseconds) / 1000000.0
print("Time = \(time) ms")
Boolean array execution time : 78223.342295 ms
import Foundation

var n: Int = 100000000
var prime = [Int](repeating: 1, count: n + 1)
var p = 2
let start = DispatchTime.now()
while (p * p) <= n {
    if prime[p] == 1 {
        var i = p * 2
        while i <= n {
            prime[i] = 0
            i = i + p
        }
    }
    p = p + 1
}
let stop = DispatchTime.now()
let time = Double(stop.uptimeNanoseconds - start.uptimeNanoseconds) / 1000000.0
print("Time = \(time) ms")
Integer array execution time : 8535.54546 ms
TL;DR:
Do not attempt to optimize your code in a Debug build. Always run it through the Profiler. Int was faster than Bool in Debug, but the opposite was true when run through the Profiler.
Heap allocation is expensive. Use your memory judiciously. (This question discusses the complications in C, but it is also applicable to Swift.)
Long answer
First, let's refactor your code for easier execution:
func useBoolArray(n: Int) {
    var prime = [Bool](repeating: true, count: n + 1)
    var p = 2
    while (p * p) <= n {
        if prime[p] == true {
            var i = p * 2
            while i <= n {
                prime[i] = false
                i = i + p
            }
        }
        p = p + 1
    }
}

func useIntArray(n: Int) {
    var prime = [Int](repeating: 1, count: n + 1)
    var p = 2
    while (p * p) <= n {
        if prime[p] == 1 {
            var i = p * 2
            while i <= n {
                prime[i] = 0
                i = i + p
            }
        }
        p = p + 1
    }
}
Now, run it in the Debug build:
let count = 100_000_000
let start = DispatchTime.now()
useBoolArray(n: count)
let boolStop = DispatchTime.now()
useIntArray(n: count)
let intStop = DispatchTime.now()
print("Bool array:", Double(boolStop.uptimeNanoseconds - start.uptimeNanoseconds) / Double(NSEC_PER_SEC))
print("Int array:", Double(intStop.uptimeNanoseconds - boolStop.uptimeNanoseconds) / Double(NSEC_PER_SEC))
// Bool array: 70.097249517
// Int array: 8.439799614
So Bool is a lot slower than Int, right? Let's run it through the Profiler by pressing Cmd + I and choosing the Time Profiler template. (Somehow the Profiler wasn't able to separate these functions, probably because they were inlined, so I had to run only one function per attempt.)
let count = 100_000_000
useBoolArray(n: count)
// useIntArray(n: count)
// Bool: 1.15ms
// Int: 2.36ms
Not only are they dramatically faster than in Debug, but the results are reversed: Bool is now faster than Int! The Profiler doesn't tell us why, so we must go on a witch hunt. Let's check the memory allocation by adding an Allocations instrument:
Ha! Now the differences are laid bare. The Bool array uses only one-eighth as much memory as the Int array. A Swift Array's storage is allocated on the heap, and heap allocation is expensive.
When you think about it some more: a Bool value carries only 1 bit of information, while an Int takes 64 bits on a 64-bit machine. Swift represents a Bool with a single byte, while an Int takes 8 bytes, hence the memory ratio. In Debug, this size difference may account for the gap, as the runtime must do all kinds of checks to ensure that it's actually dealing with a Bool value, so the Bool array method takes significantly longer.
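The byte sizes mentioned here can be checked directly; a quick sketch (values are for a 64-bit platform):
print(MemoryLayout<Bool>.size) // 1
print(MemoryLayout<Int>.size)  // 8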
Moral of the story: don't optimize your code in Debug mode. It can be misleading!
(A partial answer ...)
As @MartinR mentions in his comments to the question, there is no such major difference between the two cases if you build for release mode (with optimizations); the Bool case is slightly faster due to its smaller memory footprint (but equally fast as e.g. UInt8, which has the same footprint).
Running Instruments to profile the (non-optimized) debug build, we clearly see that the array element access and assignment is the culprit for the Bool case (and, as far as my brief testing has shown, for all types except the integer ones: Int, UInt16, and so on).
We can further ascertain that it's not the writing part in particular that yields the overhead, but rather the repeated accessing of the i-th element.
The same explicit read-access tests for an array of integer elements show no such large overhead.
It would almost seem as if random element access is, for some reason, not working as it should (for non-integer types) when compiling with the debug build configuration.
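As a quick way to verify the footprint argument in a release build, here is a minimal sketch of a UInt8 variant of the refactored sieve above (useUInt8Array is a hypothetical name); it should land in the same ballpark as the Bool version, since both use one byte per element:
func useUInt8Array(n: Int) {
    var prime = [UInt8](repeating: 1, count: n + 1)
    var p = 2
    while (p * p) <= n {
        if prime[p] == 1 {
            var i = p * 2
            while i <= n {
                prime[i] = 0
                i = i + p
            }
        }
        p = p + 1
    }
}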

Copy contents of Swift Array to Struct embedded Tuple

For communicating with a BLE characteristic, I have a Swift struct that looks like:
struct Packet {
    var control1: UInt8 = 0
    var control2: UInt8 = 0
    var payload: (UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8) = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)

    init(control1: UInt8, control2: UInt8) {
        self.control1 = control1
        self.control2 = control2
    }
}
I have payload defined as a tuple because that seems to be the only way to embed a fixed-size array (of bytes, in this case) in a Swift struct. Verbose, but whatever.
I have a big ol' source: [UInt8] that I want to pull swatches of into that Packet struct, so I can send them via BLE to the remote device. When I do:
var packet = Packet(control1: self.pageIndex, control2: sentenceIndex)
let offset = (Int(self.pageIndex) * self.pageSize) + (Int(sentenceIndex) * self.sentenceSize)
let limit = offset + self.sentenceSize
packet.payload = self.source[offset..<limit]
For the last line, I get the rather confusing error:
Cannot subscript a value of type '[UInt8]'
Cryptic, I say, because it actually can be subscripted. If I take out the assignment to packet.payload, it has no problem subscripting the value.
What I'm really interested in, at a higher level, is how one puts together a struct with a fixed-size array of bytes and then copies swatches of a large buffer into it. I would like to both understand the above and know how to solve my problem.
UPDATE:
I ended up backing up a little, influenced by both answers below, and rethinking. My main driving force was that I wanted a simple/clever way to convert a struct with an internal array to/from NSData, primarily for BLE communications. What I ended up doing was:
struct Packet {
    var pageIndex: UInt8 = 0
    var sentenceIndex: UInt8 = 0
    var payload: ArraySlice<UInt8> = []

    var nsdata: NSData {
        let bytes: [UInt8] = [self.pageIndex, self.sentenceIndex] + self.payload
        return NSData(bytes: bytes, length: bytes.count)
    }
}
It's not the most efficient approach, because I have to create the intermediate [UInt8] array, but I decided that a simpler way to convert didn't exist and that I'd otherwise have to resort to as conversions or memcpy and friends.
I'm not sure which of the two answers below to mark as the answer, since both influenced what I ended up with.
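For reference, a minimal usage sketch of this updated Packet, assuming pageIndex, sentenceIndex, source, offset, and limit are defined as in the original code above:
var packet = Packet()
packet.pageIndex = pageIndex
packet.sentenceIndex = sentenceIndex
packet.payload = source[offset..<limit] // ArraySlice assignment, no tuple needed
let data = packet.nsdata                // ready to write to the BLE characteristic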
There are two ugly/simple solutions:
To assign each member of the tuple separately:
var offset = ...
packet.payload = (source[offset++], source[offset++], ... , source[offset++])
To just copy the raw memory (recommended)
var values = Array(source[offset..<limit])
memcpy(&packet.payload, &values, sentenceSize)
Note that it's possible to create an array from a tuple:
func tupleToArray<T>(tuple: Any, t: T.Type) -> [T] {
    return Mirror(reflecting: tuple).children.flatMap { $0.value as? T }
}
tupleToArray((1, 2, 3, 4, 5), t: Int.self) // [1, 2, 3, 4, 5]
But the other way around doesn't work, as Swift's reflection is read-only.
Another much more complicated but more elegant solution would be to use dependent types, which enable you to have arrays with a compile-time-known length. Check out this great blog post, in which the author also mentions this post on the Apple Developer Forums, which is basically what you'd need:
let vector = 3.0 ⋮ 4.0 ⋮ 5.0 // [3.0, 4.0, 5.0]
vector[1] // 4.0
vector.count // 3
sizeofValue(vector) // 3 * 8 ( same size as a tuple with 3 elements)
First of all, don't use tuples to create contiguous arrays of memory; go ahead and use the [UInt8] type. I would recommend using a stride function to create your indices for you, like this. You will have to handle the case where your data source is not a multiple of the Packet payload size.
struct Packet {
    var control1: UInt8 = 0
    var control2: UInt8 = 0
    static let size = 16
    var payload = [UInt8].init(count: Packet.size, repeatedValue: 0)

    init(control1: UInt8, control2: UInt8) {
        self.control1 = control1
        self.control2 = control2
    }
}

// random values between 0...255
let blob = (0..<(Packet.size * 3)).map { _ in UInt8(arc4random_uniform(UInt32(UInt8.max))) }

for index in 0.stride(through: blob.count - 1, by: Packet.size) {
    var packet = Packet(control1: 4, control2: 5)
    packet.payload[0..<Packet.size] = blob[index..<index + Packet.size]
    print(packet.payload)
}
As far as the "cannot subscript" error goes, I encountered that too. I suspect that this has changed recently. I was able to eliminate the error by matching the packet index slice with the data source slice.
UPDATE
A commenter correctly pointed out that the Packet structure contained a reference to an Array and therefore did not meet the OP's need. While I was focused more on iterating through a large data source using stride, here is an alternative that uses an untyped [UInt8] buffer for such a simple data structure.
// payload size in count of UInt8
let size = 16

// field offsets
let control1 = 0
let control2 = 1
let payload = 2..<(2 + size)

// random values between 0...255
let blob = (0..<size * 3).map { _ in UInt8(arc4random_uniform(UInt32(UInt8.max))) }

for index in 0.stride(through: blob.count - 1, by: size) {
    var buffer = [UInt8](count: 2 + size, repeatedValue: 0)
    buffer[control1] = 255
    buffer[control2] = 0
    buffer[payload] = blob[index..<index + size]
    let data = NSData(bytesNoCopy: &buffer, length: buffer.count, freeWhenDone: false)
    // send data
}
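In newer Swift versions there is also a middle ground between memcpy and assigning each tuple member: copying a slice of the source directly into the tuple-backed payload of the original Packet with withUnsafeMutableBytes(of:). A minimal sketch, assuming source, offset, and limit come from the question and that limit - offset equals the 16-byte payload:
var packet = Packet(control1: 1, control2: 2)
withUnsafeMutableBytes(of: &packet.payload) { dest in
    source.withUnsafeBytes { src in
        // Copy the selected 16 bytes straight into the tuple's storage.
        dest.copyBytes(from: src[offset..<limit])
    }
}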
