xy array in Swift performance - arrays

I have two large arrays of floats (several thousand values each) and would like to combine them into an array of xy points for further processing, e.g. plotting.
Right now, in an Xcode playground, I am doing this:
let xArray = // read from datafile, fast
let yArray = // read from another datafile, fast
struct xyPoint {
let x: Float
let y: Float
}
var spectrum: [xyPoint] = []
for i in 0..<xArray.count {
let xy = xyPoint(x: xArray[i], y: yArray[i])
spectrum.append(xy)
}
Now when I run the playground, this takes a really long time to do.
Any ideas how I can speed this up?

I checked the performance for various solutions to your problem. You can download my tests from this link to github
A) Your code
var spectrum: [XYPoint] = []
for i in 0..<xArray.count {
let xy = XYPoint(x: xArray[i], y: yArray[i])
spectrum.append(xy)
}
B) Zip + map (Martin R's answer)
let spectrumB = zip(xArray, yArray).map(XYPoint.init)
C) Range + map (My solution)
let spectrum = (0 ..< xArray.count).map { i in
return XYPoint(x: xArray[i], y: yArray[i])
}
D) ReserveCapacity + append (Duncan C's answer)
var spectrum: [XYPoint] = []
spectrum.reserveCapacity(xArray.count)
for (index, value) in xArray.enumerated() {
spectrum.append(XYPoint(x: xArray[index], y: yArray[index]))
}
My results (in seconds)
            ╭──────────────┬──────────────┬──────────────┬──────────────╮
            │      A       │      B       │      C       │      D       │
╭───────────╬══════════════╪══════════════╪══════════════╪══════════════╡
│       100 ║  0.000009426 │  0.000002401 │  0.000000571 │  0.000000550 │
│       200 ║  0.000003356 │  0.000002629 │  0.000000911 │  0.000000866 │
│       500 ║  0.000005610 │  0.000007288 │  0.000002236 │  0.000002012 │
│      1000 ║  0.000010638 │  0.000009181 │  0.000003905 │  0.000005030 │
│      2000 ║  0.000019377 │  0.000013316 │  0.000007116 │  0.000008732 │
│      5000 ║  0.000023430 │  0.000019304 │  0.000019809 │  0.000019092 │
│     10000 ║  0.000050463 │  0.000031669 │  0.000035121 │  0.000035420 │
│     20000 ║  0.000087040 │  0.000058664 │  0.000069300 │  0.000069456 │
│     50000 ║  0.000272357 │  0.000204213 │  0.000176962 │  0.000192996 │
│    100000 ║  0.000721436 │  0.000459551 │  0.000415024 │  0.000437604 │
│    200000 ║  0.001114534 │  0.000924621 │  0.000816374 │  0.000896202 │
│    500000 ║  0.002576687 │  0.002094998 │  0.001860833 │  0.002060462 │
│   1000000 ║  0.007063596 │  0.005924892 │  0.004319181 │  0.004869024 │
│   2000000 ║  0.014474969 │  0.013594134 │  0.008568550 │  0.009388957 │
│   5000000 ║  0.038348767 │  0.035136008 │  0.021276415 │  0.023855382 │
│  10000000 ║  0.081750925 │  0.078742713 │  0.043578664 │  0.047700495 │
│  20000000 ║  0.202616669 │  0.199960563 │  0.148141266 │  0.145360923 │
│  50000000 ║  0.567078563 │  0.552158644 │  0.370327555 │  0.397115294 │
│ 100000000 ║  1.136993625 │  1.101725386 │  0.713406642 │  0.740150322 │
└───────────╨──────────────┴──────────────┴──────────────┴──────────────┘

The easiest way to create the array of points would be
let spectrum = zip(xArray, yArray).map(XYPoint.init)
(I have taken the liberty of calling the struct XYPoint, as Swift types
should start with uppercase letters.) This also allows the result array
to be defined as a constant.
However, it is not the fastest with respect to execution time.
Possible reasons:
zip() operates on general sequences and does not take advantage
of the input being arrays.
zip() returns a Sequence and therefore map()
does not know the number of elements which are to be created.
As a consequence, the destination array will be reallocated several
times.
Therefore an explicit loop is faster if you reserve the needed
capacity in advance:
var spectrum: [XYPoint] = []
spectrum.reserveCapacity(xArray.count)
for i in 0..<xArray.count {
let xy = XYPoint(x: xArray[i], y: yArray[i])
spectrum.append(xy)
}
In my test (on a 1.2 GHz Intel Core m5 MacBook, compiled in Release
mode) with two arrays of 10,000 elements, the first method took
about 0.65 milliseconds and the second method about 0.42 milliseconds.
For 1,000,000 elements I measured 12 milliseconds vs 6 milliseconds.

Once you have 2 separate arrays, combining them is a little awkward, and there isn't a neat "Swifty" way to do it. If you had an array of structs, where each struct contained an x and a y value, you could use a map statement to transform that array into an array of CGPoint values (CGPoint is itself a struct type).
You start out by telling us:
let xArray = // read from datafile, fast
let yArray = // read from another datafile, fast
It might be better to rework that code you don't show, so that instead of reading all of the x points data file, then reading all of the y points data file, you:
read an x point
read a y point
create a CGPoint for that X/Y pair
Add the new CGPoint to your output array of CGPoint values
Or even, restructure your code that creates the data files so that it writes a file containing an array of X/Y pairs rather than 2 separate files.
If you have 2 separate arrays, you might use a variant of for... in that gives you both an index and a value for each array entry:
import CoreGraphics // provides CGFloat and CGPoint

let xArray: [CGFloat] = [0.1, 0.2, 0.3, 0.4]
let yArray: [CGFloat] = [0.4, 0.3, 0.2, 0.1]
var output = [CGPoint]()
output.reserveCapacity(xArray.count) // avoid repeated reallocation while appending
for (index, value) in xArray.enumerated() {
let yValue = yArray[index]
let aPoint = CGPoint(x: value, y: yValue)
output.append(aPoint)
}
The code above will crash if yArray has fewer values than xArray, and will miss the last values in yArray if it contains more values than xArray. A complete implementation should really do error checking first and handle the cases where the arrays have different numbers of values.

When you run code in the main playground file, you will likely have logging enabled. This adds a huge performance hit to the code.
I tried out your code in the question as a function. Putting the function in the main swift file for arrays of size 10000 took over 10 minutes!
I moved the function to a separate swift file in the sources folder of the playground with the same size arrays, and it finished instantly.
The code I used was from your question (within a func) rather than the optimized versions.

Related

How to combine and make an array into a cell of DataFrames.jl?

Let's say we have
df = DataFrame(a=[1])
Row │ a
│ Int64
─────┼───────
1 │ 1
Attempt 1: combine the data and make a new column holding arrays
combine(df, :a => x->[1,2])
Row │ a_function
│ Int64
─────┼────────────
1 │ 1
2 │ 2
Attempt 2: combine the data and make a new column holding tuples
combine(df, :a => x->(1,2))
Row │ a_function
│ Tuple…
─────┼────────────
1 │ (1, 2)
Why doesn't the first attempt work as intended, i.e. hold the whole [1,2] array in one cell instead of creating 2 rows?
I converted the array into a tuple and it worked, but I wonder why the two behave so differently.

Clickhouse SELECT Array values that have all elements in a specific order

I need to find an array that contains all of the values of another array, in the presented order, similar to 'hasSubstr' or 'hasAll', or an idea of how to go about this.
Example Data
dataCol = [1,2,3,4]
hasSubstr
hasSubstr is close; however, it only matches when the values appear as an exact, contiguous run.
hasSubstr(dataCol, [1,2,4]) will return 0
But I need a 1 here because 1, 2 and 4 are in dataCol in the order of 1 then 2 then 4.
Has All
hasAll is also close; however, it doesn't care about the order.
hasAll(dataCol, [4,2,1]) will return 1
But I need a 0 here because the order is incorrect.
Function or Query?
Something the equivalent of the 'imaginary' function: hasAllOrdered(dataCol, [1,3,4]) = 1
Or an idea of how to construct a query for this. Maybe a combination of hasAll and some query logic magic?
Edit: To clarify my intended result, I need to run a query to select multiple columns as could be used with a function.
SELECT
path AS dataCol,
track
FROM tracks
WHERE time_start > 1645232556
AND { magic here returning rows containing [276,277,279] in dataCol }
LIMIT 10
Query id: cac9b576-193e-475f-98e4-84354bf13af4
┌─dataCol───────────────────────────────────┬──track─┐
│ [211,210,207,205,204] │ 413354 │
│ [211,210,207,205,204] │ 413355 │
│ [73,74,142,209,277,276,208] │ 413356 │
│ [73,74,142,209,277,276,208] │ 413357 │
│ [280,279] │ 413358 │
│ [280,279] │ 413359 │
│ [272,208,276,277,278,346,347,273,206,207] │ 413360 │
│ [208,276,277,278,346,272,273,206,207,347] │ 413361 │
│ [276,277,278,279,348,208,209,141] │ 413362 │
│ [141,276,208,209,277,278,279,348] │ 413363 │
└───────────────────────────────────────────┴────────┘
10 rows in set. Elapsed: 0.007 sec. Processed 13.59 thousand rows, 273.88 KB (1.86 million rows/s., 37.49 MB/s.)
Ref: https://clickhouse.com/docs/en/sql-reference/functions/array-functions/
You have 2 arrays, e.g. [a, b] and [a, d, b].
Let's rebuild the second array as indexes into the first (indexOf + arrayMap): [a, d, b] --> [1, 0, 2]. Then remove the zeros (d is not found, so indexOf returns 0): --> [1, 2].
Now we want the array only if the indexes grow; otherwise the elements are in the wrong order.
arrayDifference: [1, 2] --> [0, 1]. If this array has negative elements, then the indexes do not grow.
-- so require: not arrayExists(j -> j < 0, ...)
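The steps above can be sketched outside ClickHouse. The JavaScript below is only an illustration of the logic, not ClickHouse code: hasAllOrdered is the imaginary function name from the question, and positions and differences are helper names assumed here to mirror indexOf and arrayDifference.

```javascript
// 1-based position of each needle value in the haystack (0 = not found),
// mirroring what indexOf returns in ClickHouse.
const positions = (haystack, needle) =>
  needle.map((v) => haystack.indexOf(v) + 1);

// Differences between neighbouring elements, with 0 in first place,
// mirroring arrayDifference.
const differences = (xs) => xs.map((x, k) => (k === 0 ? 0 : x - xs[k - 1]));

const hasAllOrdered = (haystack, needle) => {
  const pos = positions(haystack, needle);
  if (pos.includes(0)) return false;           // plays the role of hasAll
  return !differences(pos).some((d) => d < 0); // no negative step: indexes grow
};

console.log(hasAllOrdered([1, 2, 3, 4], [1, 2, 4])); // true
console.log(hasAllOrdered([1, 2, 3, 4], [4, 2, 1])); // false
```

Like the query, this looks only at the first occurrence of each value, so arrays with repeated values may need extra care.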
create table tracks( dataCol Array(UInt64), track UInt64 ) Engine = Memory;
insert into tracks values
( [211,210,207,205,204] , 413354)
( [211,210,207,205,204] , 413355)
( [280,279] , 413358)
( [280,279] , 413359)
( [272,208,276,277,278,346,347,273,206,207], 413360)
( [208,276,277,278,346,272,273,206,207,347], 413361)
( [276,277,278,279,348,208,209,141] , 413362)
( [141,276,208,209,277,278,279,348] , 413363);
select *
from tracks
where hasAll(dataCol, [276,277,279] as x) and not arrayExists(j -> j < 0, arrayDifference(arrayFilter(k -> k <> 0, arrayMap(i -> indexOf(dataCol, i), x))))
┌─dataCol───────────────────────────┬──track─┐
│ [276,277,278,279,348,208,209,141] │ 413362 │
│ [141,276,208,209,277,278,279,348] │ 413363 │
└───────────────────────────────────┴────────┘

Evaluate whether an array contains an element other than given elements

I am trying to determine if my given array _servicetype contains an element other than 12,1,2,3.
Below is what I have so far,
Scenario 1: if my array is {1,2,3,6015} I want FALSE
Scenario 2: if my array is {1,2,12} I want TRUE
Scenario 3: if my array is {1,2} I want true
I ended up creating the iif statement as a User defined function in Postgres and got the following below:
IIF(_servicetype@>ARRAY['12']::INT[]
OR _servicetype@>ARRAY['1'] ::INT[]
OR _servicetype@>ARRAY['2'] ::INT[]
OR _servicetype@>ARRAY['3'] ::INT[],TRUE,FALSE)::BOOLEAN
My concern is it will not work for Scenario 1.
You can check that ARRAY[12, 1, 2, 3] is a superset of _servicetype using the @> (contains) operator, i.e. if _servicetype contains anything not in ARRAY[12, 1, 2, 3], return false:
WITH examples(_servicetype) AS (
VALUES
('{1,2,3,6015}'::int[]),
('{2,1}'::int[]),
('{1}'::int[])
)
SELECT _servicetype, '{12, 1, 2, 3}' @> _servicetype
FROM examples;
┌──────────────┬──────────┐
│ _servicetype │ ?column? │
├──────────────┼──────────┤
│ {1,2,3,6015} │ f │
│ {2,1} │ t │ -- set-wise "contains", order does not matter
│ {1} │ t │
└──────────────┴──────────┘
(3 rows)

How to create two dimensional Möbius strip, Klein bottle and projective plane arrays?

I was reading the following article about “Games on Strange Boards”. It describes various locally two-dimensional array topologies such as:
Cylinder
Torus
Möbius strip
Klein bottle
Projective plane
In the above diagrams, sides with the same arrows are glued together in such a way that the arrows match up. Hence, if the arrows point in the same direction then the sides are glued normally. However, if they point in different directions then the sides are glued after twisting.
For example, moving off the top right edge of a cylinder will wrap you back to the top left edge. However, moving off the top right edge of a Möbius strip will wrap you back to the bottom left edge.
Now, creating cylindrical and toroidal arrays is easy. You use the modulo operation to make the rows and columns wrap around. Consider the code for calculating the coordinates of a toroidal array with m rows and n columns:
const mod = (x, y) => (x % y + y) % y; // floored division modulo operation
const coords = (m, n) => (i, j) => [mod(i, m), mod(j, n)]; // toroidal array
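As a quick check of the wrap-around, here is a minimal sketch that calls the toroidal version with out-of-range indices. The names torus and cylinder are introduced here for illustration, and the cylinder variant assumes that only the column index wraps (the left and right edges are glued, the top and bottom are not).

```javascript
const mod = (x, y) => (x % y + y) % y; // floored division modulo operation

// Toroidal array: both rows and columns wrap around.
const torus = (m, n) => (i, j) => [mod(i, m), mod(j, n)];

// Cylindrical array (assumption: only the columns wrap).
const cylinder = (m, n) => (i, j) => [i, mod(j, n)];

console.log(torus(3, 4)(-1, 4));   // [2, 0]
console.log(torus(3, 4)(3, -1));   // [0, 3]
console.log(cylinder(3, 4)(1, 5)); // [1, 1]
```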
How would you calculate the coordinates of a Möbius strip, Klein bottle or projective plane? Are there any special cases to handle considering that these are non-orientable surfaces?
Consider a cylinder made from a rectangular sheet of paper. The sheet has two sides, front and back. When we glue the sheet into a cylinder, we can't reach the back side (inside of the cylinder) from the front side (outside of the cylinder). However, if we glue the sheet of paper into a Möbius strip then we can. Here's what a grid on a Möbius strip would look like if we separated the two sides and flattened it:
┌────┬────┬────┬────┰────┬────┬────┬────┐
│ a4 │ b4 │ c4 │ d4 ┃ A1 │ B1 │ C1 │ D1 │
├────┼────┼────┼────╂────┼────┼────┼────┤
│ a3 │ b3 │ c3 │ d3 ┃ A2 │ B2 │ C2 │ D2 │
├────┼────┼────┼────╂────┼────┼────┼────┤
│ a2 │ b2 │ c2 │ d2 ┃ A3 │ B3 │ C3 │ D3 │
├────┼────┼────┼────╂────┼────┼────┼────┤
│ a1 │ b1 │ c1 │ d1 ┃ A4 │ B4 │ C4 │ D4 │
└────┴────┴────┴────┸────┴────┴────┴────┘
Note that the squares on the left (i.e. the ones in lowercase) are on the front whereas the squares on the right (i.e. the ones in uppercase) are on the back. Squares with only a case difference are the same square, only on opposite sides of the Möbius strip. One thing to notice is that this flattened Möbius strip is a lot like a cylinder, except that the left and right sides coincide.
Here's what the code for a Möbius strip would look like:
const mod = (x, y) => (x % y + y) % y;
const coords = (m, n) => (i, j) => {
j = mod(j, 2 * n); // wrapping around like a cylinder
if (j < n) return [i, j]; // front side
return [m - i - 1, j - n]; // back side, translated to front side
};
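To see the twist in action, the following sketch walks along one row of a 4×4 strip. The name mobius is introduced here for illustration; the body is the same as coords above.

```javascript
const mod = (x, y) => (x % y + y) % y;

const mobius = (m, n) => (i, j) => {
  j = mod(j, 2 * n);          // wrapping around like a cylinder
  if (j < n) return [i, j];   // front side
  return [m - i - 1, j - n];  // back side, translated to front side
};

const at = mobius(4, 4);
console.log(at(0, 3)); // [0, 3]  last column on the front side
console.log(at(0, 4)); // [3, 0]  one step further: the row index is flipped
console.log(at(0, 8)); // [0, 0]  twice around: back at the starting square
```

Going around once lands on the vertically mirrored row, and going around twice returns to the start, which is what the flattened diagram shows.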
A Klein bottle is exactly the same as a Möbius strip, except that it behaves like a torus instead of a cylinder. Here's what the code for a Klein bottle would look like:
const mod = (x, y) => (x % y + y) % y;
const coords = (m, n) => (i, j) => {
i = mod(i, m); // wrapping around
j = mod(j, 2 * n); // like a torus
if (j < n) return [i, j]; // front side
return [m - i - 1, j - n]; // back side, translated to front side
};
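A possible way to use this with an actual 2D array is sketched below. The helper get and the sample board are assumptions made for illustration; klein has the same body as coords above.

```javascript
const mod = (x, y) => (x % y + y) % y;

const klein = (m, n) => (i, j) => {
  i = mod(i, m);              // wrapping around
  j = mod(j, 2 * n);          // like a torus
  if (j < n) return [i, j];   // front side
  return [m - i - 1, j - n];  // back side, translated to front side
};

// Hypothetical helper: read a cell of a rectangular 2D array using
// Klein bottle wrap-around for out-of-range indices.
const get = (board, i, j) => {
  const [r, c] = klein(board.length, board[0].length)(i, j);
  return board[r][c];
};

const board = [
  ["a2", "b2", "c2"],
  ["a1", "b1", "c1"],
];

console.log(get(board, 2, 0));  // "a2"  rows wrap around normally
console.log(get(board, 0, 3));  // "a1"  columns wrap around with a flip
console.log(get(board, -1, 4)); // "b2"
```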
A projective plane also behaves like a torus. However, each of its sides can have two orientations, regular and rotated by 180°. Here's what a flattened projective plane would look like:
┌────┬────┬────┬────┰────┬────┬────┬────┐
│ a4 │ b4 │ c4 │ d4 ┃ A1 │ B1 │ C1 │ D1 │
├────┼────┼────┼────╂────┼────┼────┼────┤
│ a3 │ b3 │ c3 │ d3 ┃ A2 │ B2 │ C2 │ D2 │
├────┼────┼────┼────╂────┼────┼────┼────┤
│ a2 │ b2 │ c2 │ d2 ┃ A3 │ B3 │ C3 │ D3 │
├────┼────┼────┼────╂────┼────┼────┼────┤
│ a1 │ b1 │ c1 │ d1 ┃ A4 │ B4 │ C4 │ D4 │
┝━━━━┿━━━━┿━━━━┿━━━━╋━━━━┿━━━━┿━━━━┿━━━━┥
│ D4 │ C4 │ B4 │ A4 ┃ d1 │ c1 │ b1 │ a1 │
├────┼────┼────┼────╂────┼────┼────┼────┤
│ D3 │ C3 │ B3 │ A3 ┃ d2 │ c2 │ b2 │ a2 │
├────┼────┼────┼────╂────┼────┼────┼────┤
│ D2 │ C2 │ B2 │ A2 ┃ d3 │ c3 │ b3 │ a3 │
├────┼────┼────┼────╂────┼────┼────┼────┤
│ D1 │ C1 │ B1 │ A1 ┃ d4 │ c4 │ b4 │ a4 │
└────┴────┴────┴────┸────┴────┴────┴────┘
So, here's what the code for a projective plane would look like:
const mod = (x, y) => (x % y + y) % y;
const coords = (m, n) => (i, j) => {
i = mod(i, 2 * m); // wrapping around
j = mod(j, 2 * n); // like a torus
if (i >= m) { // collapse to Klein bottle topology
i -= m;
j = mod(n - j - 1, 2 * n);
}
if (j < n) return [i, j]; // front side
return [m - i - 1, j - n]; // back side, translated to front side
};
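One way to cross-check this against the flattened diagram is to print the label of every cell in the doubled grid. The sketch below assumes that row index 0 is the top row of the diagram and ignores the upper/lower case distinction; projective has the same body as coords above, while label and flattenedRow are names introduced here.

```javascript
const mod = (x, y) => (x % y + y) % y;

const projective = (m, n) => (i, j) => {
  i = mod(i, 2 * m);            // wrapping around
  j = mod(j, 2 * n);            // like a torus
  if (i >= m) {                 // collapse to Klein bottle topology
    i -= m;
    j = mod(n - j - 1, 2 * n);
  }
  if (j < n) return [i, j];     // front side
  return [m - i - 1, j - n];    // back side, translated to front side
};

// Label a cell as in the diagram, lowercase only: column letter a, b, c, ...
// followed by the row number counted from the bottom.
const label = (m, [i, j]) => String.fromCharCode(97 + j) + (m - i);

// One row of the flattened (2m x 2n) picture as a string.
const flattenedRow = (m, n, i) => {
  const at = projective(m, n);
  const cells = [];
  for (let j = 0; j < 2 * n; j++) cells.push(label(m, at(i, j)));
  return cells.join(" ");
};

for (let i = 0; i < 8; i++) console.log(flattenedRow(4, 4, i));
```

Apart from letter case, the printed rows should match the diagram; for example, the fifth row comes out as d4 c4 b4 a4 d1 c1 b1 a1.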
Hope that helps.

Broadcasting, multiple n*m returns instead of n*m element [Julia 1.0]

Suppose I want to write a function that requires arrays of any length as argument inputs:
e.g.,
f = function(x,y)
z = x * y
outputs = DataFrame(x = x, y = y, z = z)
return(outputs)
end
The return, f.([1,2],[1,2]) is a 2-element array of two 1x3 DataFrames. But, in this scenario I want one 2x3 DataFrame.
I could accomplish this by defining z prior to a nested for loop:
f = function(x,y)
z=fill(0,length(x))
for i in 1:length(x)
z[i] = x[i] * y[i]
end
outputs = DataFrame(x = x, y = y, z = z)
return(outputs)
end
Here, f([1,2],[1, 2]) gets me what I desire. But the problems are that I have to define all in-function variables twice and add a for loop, while remembering to include the iterated variable, i. Is there something I'm missing? My question is, how do I get my desired n*m element as opposed to an n*m array...
I tried to follow this Julia blog post. Also this Julia discussion post specifically address the issue but I think the solutions were outdated for 1.0.
---- EDIT
Using a for loop could work as would using dots to denote element-wise operations.
The larger issue I'm concerned about is consistency.
Suppose I have two functions. One function (f1) returns one-dimensional output while the other (f2) has a two-dimensional output.
function f1(x, y)
z = x .* y
DataFrame(x = x, y = y, z = z)
end
function f2(x, y)
z = x * y
return(z)
end
Here the correct calls when x = [1,2] and y = [1,2] would be f1([1,2], [1,2]) and f2.([1,2], [1,2]).
What I'm calling inconsistent here is that (from the point of view of a user who doesn't know the internal function code), to get an output where z is the length of x and y, . is used with f2 but not f1. The only workaround I can see is to define z = x .* y (or alternatively use a loop over each index) in f2. In that case, both f1 and f2 can be called without a dot. Is that an appropriate solution? To be clear, what I am aiming for is that f1 and f2 are called identically by a user whether x and y are single- or multiple-element arrays. My preference would be to have the user call both functions without a dot if x and y are single elements and with a . if each variable has multiple elements. This doesn't seem possible. Therefore, the part that I have to learn to live with is having to write many . or [i]'s in my functions (if I desire "consistency"). Correct?
Alternatively, I could add documentation that explicitly states that my functions which return one variable need to be called with . when arguments are of length>1 and functions that return a dataframe need not be called with . for any reason.
[forgive any misuse of technical language; my background is ecology]
Is that what you want?
julia> function f(x, y)
z = x .* y
DataFrame(x = x, y = y, z = z)
end
f (generic function with 1 method)
julia> f([1,2], [1,2])
2×3 DataFrame
│ Row │ x │ y │ z │
├─────┼───┼───┼───┤
│ 1 │ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │ 4 │
You could also write f(x, y) = DataFrame(x = x, y = y, z = x .* y) in short.
The way you wrote the function definition suggests that you know R. In Julia, as opposed to R, scalars and arrays are entirely separate types (e.g. Float64 and Vector{Float64}), and have to be treated differently; but usually, just adding enough broadcasting at the right places works (and broadcasting works by putting a . after any function name in a call or before any operator).
To be sure not to mix such things up, you can add types to the arguments: f(x::Vector{Float64}, y::Vector{Float64}) or whatever suits you.
My preferences would be to have the user call both functions without a dot if x and y are single elements and with a . if each variable had multiple elements.
You need a function that specializes on the types of its arguments. The most elegant way to do this, and the fastest at execution time once compiled, is with the @generated macro.
using DataFrames
@generated function f(a,b)
if a<:Array && b<:Array
code = quote
DataFrame(x = a, y = b, z = a .* b)
end
else
code = quote
DataFrame(x = a, y = b, z = a * b)
end
end
code
end
Now let us test it. Please note how the function behavior depends on the types of the arguments (Float64 vs Int). Each of the parameters can be either an Array or a scalar.
julia> f(3,4)
1×3 DataFrame
│ Row │ x │ y │ z │
├─────┼───┼───┼────┤
│ 1 │ 3 │ 4 │ 12 │
julia> f(3,4.0)
1×3 DataFrame
│ Row │ x │ y │ z │
├─────┼───┼─────┼──────┤
│ 1 │ 3 │ 4.0 │ 12.0 │
julia> f(3.0,[1,2,3])
3×3 DataFrame
│ Row │ x │ y │ z │
├─────┼─────┼───┼─────┤
│ 1 │ 3.0 │ 1 │ 3.0 │
│ 2 │ 3.0 │ 2 │ 6.0 │
│ 3 │ 3.0 │ 3 │ 9.0 │
julia> f([1,2,3],4)
3×3 DataFrame
│ Row │ x │ y │ z │
├─────┼───┼───┼────┤
│ 1 │ 1 │ 4 │ 4 │
│ 2 │ 2 │ 4 │ 8 │
│ 3 │ 3 │ 4 │ 12 │
julia> f([6,7,8],[1,2,3])
3×3 DataFrame
│ Row │ x │ y │ z │
├─────┼───┼───┼────┤
│ 1 │ 6 │ 1 │ 6 │
│ 2 │ 7 │ 2 │ 14 │
│ 3 │ 8 │ 3 │ 24 │

Resources