The bounty expires in 5 days. Answers to this question are eligible for a +100 reputation bounty.
johnbakers is looking for an answer from a reputable source:
Desiring a good understanding of why copy-on-write is not interfering with multithreaded updates to different array indexes and whether this is in fact safe to do from a specification standpoint, as it appears to be.
I see frequent mention that Swift arrays, due to copy-on-write, are not threadsafe, but have found this works, as it updates different and unique elements in an array from different threads simultaneously:
//pixels is [(UInt8, UInt8, UInt8)]
let q = DispatchQueue(label: "processImage", attributes: .concurrent)
q.sync {
DispatchQueue.concurrentPerform(iterations: n) { i in
... do work ...
pixels[i] = ... store result ...
}
}
(simplified version of this function)
If threads never write to the same indexes, does copy-on-write still interfere with this? I'm wondering if this is safe since the array itself is not changing length or memory usage. But it does seem that copy-on-write would prevent the array from staying consistent in such a scenario.
If this is not safe, and since doing parallel computations on images (pixel arrays) or other data stores is a common requirement in parallel computation, what is the best idiom for this? Is it better that each thread have its own array and then they are combined after all threads complete? It seems like additional overhead and the memory juggling from creating and destroying all these arrays doesn't feel right.
Updated answer:
Having thought about this some more, I suppose the main thing is that there's no copy-on-write happening here either way.
COW happens because arrays (and dictionaries, etc) in Swift behave as value types. With value types, if you pass a value to a function you're actually passing a copy of the value. But with array, you really don't want to do that because copying the entire array is a very expensive operation. So Swift will only perform the copy when the new copy is edited.
But in your example, you're not actually passing the array around in the first place, so there's no copy on write happening. The array of pixels exists in some scope, and you set up a DispatchQueue to update the pixel values in place. Copy-on-write doesn't come into play here because you're not copying in the first place.
I see frequent mention that Swift arrays, due to copy-on-write, are not threadsafe
To the best of my knowledge, this is more or less the opposite of the actual situation. Swift arrays are thread-safe because of copy-on-write. If you make an array and pass it to multiple different threads which then edit the array (the local copy of it), it's the thread performing the edits that will make a new copy for its editing; threads only reading data will keep reading from the original memory.
Consider the following contrived example:
import Foundation
/// Replace a random element in the array with a random int
func mutate(array: inout [Int]) {
let idx = Int.random(in: 0..<array.count)
let val = Int.random(in: 1000..<10_000)
array[idx] = val
}
class Foo {
var numbers: [Int]
init(_ numbers: [Int]) {
// No copying here; the local `numbers` property
// will reference the same underlying memory buffer
// as the input array of numbers. The reference count
// of the underlying buffer is increased by one.
self.numbers = numbers
}
func mutateNumbers() {
// Copy on write can happen when we call this function,
// because we are not allowed to edit the underlying
// memory buffer if more than one array references it.
// If we have unique access (refcount is 1), we can safely
// edit the buffer directly.
mutate(array: &self.numbers)
}
}
var numbers = [0, 1, 2, 3, 4, 5]
var foo_instances: [Foo] = []
for _ in 0..<4 {
let t = Thread() {
let f = Foo(numbers)
foo_instances.append(f)
for _ in 0..<5_000_000 {
f.mutateNumbers()
}
}
t.start()
}
for _ in 0..<5_000_000 {
// Copy on write can potentially happen here too,
// because we can get here before the threads have
// started mutating their arrays. If that happens,
// the *original* `numbers` array in the global will
// make a copy of the underlying buffer, point to the
// the new one and decrement the reference count of the
// previous buffer, potentially releasing it.
mutate(array: &numbers)
}
print("Global numbers:", numbers)
for foo in foo_instances {
print(foo.numbers)
}
Copy-on-write can happen when the threads mutate their numbers, and it can happen when the main thread mutates the original array, and but in neither case will it affect any of the data used by the other objects.
Arrays and copy-on-write are both thread-safe. The copying is done by the party responsible for the editing, not the other instances referencing the memory, so two threads will never step on each others toes here.
However, what you're doing isn't triggering copy-on-write in the first place, because the different threads are writing to the array in place. You're not passing the value of the array to the queue. Due to the how the closure works, it's more akin to using the inout keyword on a function. The reference count of the underlying buffer remains 1 but the reference count of the array goes up, because the threads executing the work are all pointing to the same array. This means that COW doesn't come into play at all.
As for this part:
If this is not safe, and since doing parallel computations on images (pixel arrays) or other data stores is a common requirement in parallel computation, what is the best idiom for this?
It depends. If you're simply doing a parallel map function, executing some function on each pixel that depends solely on the value of that pixel, then just doing a concurrentPerform for each pixel seems like it should be fine. But if you want to do something like apply a multi-pixel filter (like a convolution for example), then this approach does not work. You can either divide the pixels into 'buckets' and give each thread a bucket for itself, or you can have a read-only input pixel buffer and an output buffer.
Old answer below:
As far as I can tell, it does actually work fine. This code below runs fine, as best as I can tell. The dumbass recursive Fibonacci function means the latter values in the input array take a bit of time to run. It maxes out using all CPUs in my computer, but eventually only the slowest value to compute remains (the last one), and it drops down to just one thread being used.
As long as you're aware of all the risks of multi-threading (don't read the same data you're writing, etc), it does seem to work.
I suppose you could use withUnsafeMutableBufferPointer on the input array to make sure that there's no overhead from COW or reference counting.
import Foundation
func stupidFib(_ n: Int) -> Int {
guard n > 1 else {
return 1
}
return stupidFib(n-1) + stupidFib(n-2)
}
func parallelMap<T>(over array: inout [T], transform: (T) -> T) {
DispatchQueue.concurrentPerform(iterations: array.count) { idx in
array[idx] = transform(array[idx])
}
}
var data = (0..<50).map{$0} // ([0, 1, 2, 3, ... 49]
parallelMap(over: &data, transform: stupidFib) // uses all CPU cores (sort of)
print(data) // prints first 50 numbers in the fibonacci sequence
Following to the question published in How expressive can we be with arrays in Z3(Py)? An example, I expressed the following formula in Z3Py:
Exists i::Integer s.t. (0<=i<|arr|) & (avg(arr)+t<arr[i])
This means: whether there is a position i::0<i<|arr| in the array whose value a[i] is greater than the average of the array avg(arr) plus a given threshold t.
The solution in Z3Py:
t = Int('t')
avg_arr = Int('avg_arr')
len_arr = Int('len_arr')
arr = Array('arr', IntSort(), IntSort())
phi_1 = And(0 <= i, i< len_arr)
phi_2 = (t+avg_arr<arr[i])
phi = Exists(i, And(phi_1, phi_2))
s = Solver()
s.add(phi)
print(s.check())
print(s.model())
Note that, (1) the formula is satisfiable and (2) each time I execute it, I get a different model. For instance, I just got: [avg_a = 0, t = 7718, len_arr = 1, arr = K(Int, 7719)].
I have three questions now:
What does arr = K(Int, 7719)] mean? Does this mean the array contains one Int element with value 7719? In that case, what does the K mean?
Of course, this implementation is wrong in the sense that the average and length values are independent from the array itself. How can I implement simple avg and len functions?
Where is the i index in the model given by the solver?
Also, in which sense would this implementation be different using sequences instead of arrays?
(1) arr = K(Int, 7719) means that it's a constant array. That is, at every location it has the value 7719. Note that this is truly "at every location," i.e., at every integer value. There's no "size" of the array in SMTLib parlance. For that, use sequences.
(2) Indeed, your average/length etc are not related at all to the array. There are ways of modeling this using quantifiers, but I'd recommend staying away from that. They are brittle, hard to code and maintain, and furthermore any interesting theorem you want to prove will get an unknown as answer.
(3) The i you declared and the i you used as the existential is completely independent of each other. (Latter is just a trick so z3 can recognize it as a value.) But I guess you removed that now.
The proper way to model such problems is using sequences. (Although, you shouldn't expect much proof performance there either.) Start here: https://microsoft.github.io/z3guide/docs/theories/Sequences/ and see how much you can push it through. Functions like avg will need a recursive definition most likely, for that you can use RecAddDefinition, for an example see: https://stackoverflow.com/a/68457868/936310
Stack-overflow works the best when you try to code these yourself and ask very specific questions about how to proceed, as opposed to overarching questions. (But you already knew that!) Best of luck..
I have a 2D array of type f32 (from ndarray::ArrayView2) and I want to find the index of the maximum value in each row, and put the index value into another array.
The equivalent in Python is something like:
import numpy as np
for i in range (0, max_val, batch_size):
sims = xp.dot(batch, vectors.T)
# sims is the dot product of batch and vectors.T
# the shape is, for example, (1024, 10000)
best_rows[i: i+batch_size] = sims.argmax(axis = 1)
In Python, the function .argmax is very fast, but I don't see any function like that in Rust. What's the fastest way of doing so?
Consider the easy case of a general Ord type: The answer will differ slightly depending on whether you know the values are Copy or not, but here's the code:
fn position_max_copy<T: Ord + Copy>(slice: &[T]) -> Option<usize> {
slice.iter().enumerate().max_by_key(|(_, &value)| value).map(|(idx, _)| idx)
}
fn position_max<T: Ord>(slice: &[T]) -> Option<usize> {
slice.iter().enumerate().max_by(|(_, value0), (_, value1)| value0.cmp(value1)).map(|(idx, _)| idx)
}
The basic idea is that we pair [a reference to] each item in the array (really, a slice - it doesn't matter if it's a Vec or an array or something more exotic) with its index, use std::iter::Iterator functions to find the maximum value according to the value only (not the index), then return just the index. If the slice is empty None will be returned. Per the documentation, the rightmost index will be returned; if you need the leftmost, do rev() after enumerate().
rev(), enumerate(), max_by_key(), and max_by() are documented here; slice::iter() is documented here (but that one needs to be on your shortlist of things to recall without documentation as a rust dev); map is Option::map() documented here (ditto). Oh, and cmp is Ord::cmp but most of the time you can use the Copy version which doesn't need it (e.g. if you're comparing integers).
Now here's the catch: f32 isn't Ord because of the way IEEE floats work. Most languages ignore this and have subtly wrong algorithms. The most popular crate to provide a total order on Ord (by declaring all NaN to be equal, and greater than all numbers) seems to be ordered-float. Assuming it's implemented correctly it should be very very lightweight. It does pull in num_traits but this is part of the most popular numerics library so might well be pulled in by other dependencies already.
You'd use it in this case by mapping ordered_float::OrderedFloat (the "constructor" of the tuple type) over the slice iter (slice.iter().map(ordered_float::OrderedFloat)). Since you only want the position of the maximum element, no need to extract the f32 afterward.
The approach from #David A is cool, but as mentioned, there's a catch: f32 & f64 do not implement Ord::cmp. (Which is really a pain in your-know-where.)
There are multiple ways of solving that: You can implement cmp yourself, or you can use ordered-float, etc..
In my case, this is a part of a bigger project and we are very careful about using external packages. Besides, I am pretty sure we don't have any NaN values. Therefore I would prefer using fold, which, if you take a close look at the max_by_key source code, is what they have been using too.
for (i, row) in matrix.axis_iter(Axis(1)).enumerate() {
let (max_idx, max_val) =
row.iter()
.enumerate()
.fold((0, row[0]), |(idx_max, val_max), (idx, val)| {
if &val_max > val {
(idx_max, val_max)
} else {
(idx, *val)
}
});
}
Say I have an array of integers A such that A[i] = j, and I want to "invert it"; that is, to create another array of integers B such that B[j] = i.
This is trivial to do procedurally in linear time in any language; here's a Python example:
def invert_procedurally(A):
B = [None] * (max(A) + 1)
for i, j in enumerate(A):
B[j] = i
return B
However, is there any way to do this functionally (as in functional programming, using map, reduce, or functions like those) in linear time?
The code might look something like this:
def invert_functionally(A):
# We can't modify variables in FP; we can only return a value
return map(???, A) # What goes here?
If this is not possible, what is the best (most efficient) alternative when doing functional programming?
In this context are arrays mutable or immutable? Generally I'd expect the mutable case to be about as straightforward as your Python implementation, perhaps aside from a few wrinkles with types. I'll assume you're more interested in the immutable scenario.
This operation inverts the indices and elements, so it's also important to know something about what constitutes valid array indices and impose those same constraints on the elements. Haskell has a class for index constraints called Ix. Any Ix type is ordered and has a range implementation to make an ordered list of indices ranging from one specified index to another. I think this Haskell implementation does what you want.
import Data.Array.IArray
invertArray :: (Ix x) => Array x x -> Array x x
invertArray arr = listArray (low,high) newElems
where oldElems = elems arr
newElems = indices arr
low = minimum oldElems
high = maximum oldElems
Under the hood listArray uses zipWith and range to associate indices in the specified range to the listed elements. That part ought to be linear time, and so is the one-time operation of extracting elements and indices from an array.
Whenever the sets of the input arrays indices and elements differ some elements will be undefined, which for better or worse blow up faster than Python's None. I believe you could overcome the undefined issue by implementing new Ix a instances over the Maybe monad, for instance.
Quick side-note: check out the invPerm example in the Haskell 98 Library Report. It does something similar to invertArray, but assumes up front that input array's elements are a permutation of its indices.
A solution needing mapand 3 operations:
toTuples views an the array as a list of tuples (i,e) where i is the index and e the element in the array at that index.
fromTuples creates and loads an array from a list of tuples.
swap which takes a tuple (a,b) and returns (b,a)
Hence the solution would be (in Haskellish notation):
invert = fromTuples . map swap . toTuples
I'd like something like this:
each[i_, {1,2,3},
Print[i]
]
Or, more generally, to destructure arbitrary stuff in the list you're looping over, like:
each[{i_, j_}, {{1,10}, {2,20}, {3,30}},
Print[i*j]
]
Usually you want to use Map or other purely functional constructs and eschew a non-functional programming style where you use side effects. But here's an example where I think a for-each construct is supremely useful:
Say I have a list of options (rules) that pair symbols with expressions, like
attrVals = {a -> 7, b -> 8, c -> 9}
Now I want to make a hash table where I do the obvious mapping of those symbols to those numbers. I don't think there's a cleaner way to do that than
each[a_ -> v_, attrVals, h[a] = v]
Additional test cases
In this example, we transform a list of variables:
a = 1;
b = 2;
c = 3;
each[i_, {a,b,c}, i = f[i]]
After the above, {a,b,c} should evaluate to {f[1],f[2],f[3]}. Note that that means the second argument to each should be held unevaluated if it's a list.
If the unevaluated form is not a list, it should evaluate the second argument. For example:
each[i_, Rest[{a,b,c}], Print[i]]
That should print the values of b and c.
Addendum: To do for-each properly, it should support Break[] and Continue[]. I'm not sure how to implement that. Perhaps it will need to somehow be implemented in terms of For, While, or Do since those are the only loop constructs that support Break[] and Continue[].
And another problem with the answers so far: they eat Return[]s. That is, if you are using a ForEach loop in a function and want to return from the function from within the loop, you can't. Issuing Return inside the ForEach loop seems to work like Continue[]. This just (wait for it) threw me for a loop.
I'm years late to the party here, and this is perhaps more an answer to the "meta-question", but something many people initially have a hard time with when programming in Mathematica (or other functional languages) is approaching a problem from a functional rather than structural viewpoint. The Mathematica language has structural constructs, but it's functional at its core.
Consider your first example:
ForEach[i_, {1,2,3},
Print[i]
]
As several people pointed out, this can be expressed functionally as Scan[Print, {1,2,3}] or Print /# {1,2,3} (although you should favor Scan over Map when possible, as previously explained, but that can be annoying at times since there is no infix operator for Scan).
In Mathematica, there's usually a dozen ways to do everything, which is sometimes beautiful and sometimes frustrating. With that in mind, consider your second example:
ForEach[{i_, j_}, {{1,10}, {2,20}, {3,30}},
Print[i*j]
]
... which is more interesting from a functional point of view.
One possible functional solution is to instead use list replacement, e.g.:
In[1]:= {{1,10},{2,20},{3,30}}/.{i_,j_}:>i*j
Out[1]= {10,40,90}
...but if the list was very large, this would be unnecessarily slow since we are doing so-called "pattern matching" (e.g., looking for instances of {a, b} in the list and assigning them to i and j) unnecessarily.
Given a large array of 100,000 pairs, array = RandomInteger[{1, 100}, {10^6, 2}], we can look at some timings:
Rule-replacement is pretty quick:
In[3]:= First[Timing[array /. {i_, j_} :> i*j;]]
Out[3]= 1.13844
... but we can do a little better if we take advantage of the expression structure where each pair is really List[i,j] and apply Times as the head of each pair, turning each {i,j} into Times[i,j]:
In[4]:= (* f###list is the infix operator form of Apply[f, list, 1] *)
First[Timing[Times ### array;]]
Out[4]= 0.861267
As used in the implementation of ForEach[...] above, Cases is decidedly suboptimal:
In[5]:= First[Timing[Cases[array, {i_, j_} :> i*j];]]
Out[5]= 2.40212
... since Cases does more work than just the rule replacement, having to build an output of matching elements one-by-one. It turns out we can do a lot better by decomposing the problem differently, and take advantage of the fact that Times is Listable, and supports vectorized operation.
The Listable attribute means that a function f will automatically thread over any list arguments:
In[16]:= SetAttributes[f,Listable]
In[17]:= f[{1,2,3},{4,5,6}]
Out[17]= {f[1,4],f[2,5],f[3,6]}
So, since Times is Listable, if we instead had the pairs of numbers as two separate arrays:
In[6]:= a1 = RandomInteger[{1, 100}, 10^6];
a2 = RandomInteger[{1, 100}, 10^6];
In[7]:= First[Timing[a1*a2;]]
Out[7]= 0.012661
Wow, quite a bit faster! Even if the input wasn't provided as two separate arrays (or you have more than two elements in each pair,) we can still do something optimal:
In[8]:= First[Timing[Times##Transpose[array];]]
Out[8]= 0.020391
The moral of this epic is not that ForEach isn't a valuable construct in general, or even in Mathematica, but that you can often obtain the same results more efficiently and more elegantly when you work in a functional mindset, rather than a structural one.
Newer versions of Mathematica (6.0+) have generalized versions of Do[] and Table[] that do almost precisely what you want, by taking an alternate form of iterator argument. For instance,
Do[
Print[i],
{i, {1, 2, 3}}]
is exactly like your
ForEach[i_, {1, 2, 3,},
Print[i]]
Alterntatively, if you really like the specific ForEach syntax, you can make a HoldAll function that implements it, like so:
Attributes[ForEach] = {HoldAll};
ForEach[var_Symbol, list_, expr_] :=
ReleaseHold[
Hold[
Scan[
Block[{var = #},
expr] &,
list]]];
ForEach[vars : {__Symbol}, list_, expr_] :=
ReleaseHold[
Hold[
Scan[
Block[vars,
vars = #;
expr] &,
list]]];
This uses symbols as variable names, not patterns, but that's how the various built-in control structures like Do[] and For[] work.
HoldAll[] functions allow you to put together a pretty wide variety of custom control structures. ReleaseHold[Hold[...]] is usually the easiest way to assemble a bunch of Mathematica code to be evaluated later, and Block[{x = #}, ...]& allows variables in your expression body to be bound to whatever values you want.
In response to dreeves' question below, you can modify this approach to allow for more arbitrary destructuring using the DownValues of a unique symbol.
ForEach[patt_, list_, expr_] :=
ReleaseHold[Hold[
Module[{f},
f[patt] := expr;
Scan[f, list]]]]
At this point, though, I think you may be better off building something on top of Cases.
ForEach[patt_, list_, expr_] :=
With[{bound = list},
ReleaseHold[Hold[
Cases[bound,
patt :> expr];
Null]]]
I like making Null explicit when I'm suppressing the return value of a function. EDIT: I fixed the bug pointed out be dreeves below; I always like using With to interpolate evaluated expressions into Hold* forms.
The built-in Scan basically does this, though it's uglier:
Scan[Print[#]&, {1,2,3}]
It's especially ugly when you want to destructure the elements:
Scan[Print[#[[1]] * #[[2]]]&, {{1,10}, {2,20}, {3,30}}]
The following function avoids the ugliness by converting pattern to body for each element of list.
SetAttributes[ForEach, HoldAll];
ForEach[pat_, lst_, bod_] := Scan[Replace[#, pat:>bod]&, Evaluate#lst]
which can be used as in the example in the question.
PS: The accepted answer induced me to switch to this, which is what I've been using ever since and it seems to work great (except for the caveat I appended to the question):
SetAttributes[ForEach, HoldAll]; (* ForEach[pattern, list, body] *)
ForEach[pat_, lst_, bod_] := ReleaseHold[ (* converts pattern to body for *)
Hold[Cases[Evaluate#lst, pat:>bod];]]; (* each element of list. *)
The built-in Map function does exactly what you want. It can be used in long form:
Map[Print, {1,2,3}]
or short-hand
Print /# {1,2,3}
In your second case, you'd use "Print[Times###]&/#{{1,10}, {2,20}, {3,30}}"
I'd recommend reading the Mathematica help on Map, MapThread, Apply, and Function. They can take bit of getting used to, but once you are, you'll never want to go back!
Here is a slight improvement based on the last answer of dreeves that allows to specify the pattern without Blank (making the syntax similar to other functions like Table or Do) and that uses the level argument of Cases
SetAttributes[ForEach,HoldAll];
ForEach[patt_/; FreeQ[patt, Pattern],list_,expr_,level_:1] :=
Module[{pattWithBlanks,pattern},
pattWithBlanks = patt/.(x_Symbol/;!MemberQ[{"System`"},Context[x]] :> pattern[x,Blank[]]);
pattWithBlanks = pattWithBlanks/.pattern->Pattern;
Cases[Unevaluated#list, pattWithBlanks :> expr, {level}];
Null
];
Tests:
ForEach[{i, j}, {{1, 10}, {2, 20}, {3, 30}}, Print[i*j]]
ForEach[i, {{1, 10}, {2, 20}, {3, 30}}, Print[i], 2]
Mathematica have map functions, so lets say you have a function Functaking one argument. Then just write
Func /# list
Print /# {1, 2, 3, 4, 5}
The return value is a list of the function applied to each element in the in-list.
PrimeQ /# {10, 2, 123, 555}
will return {False,True,False,False}
Thanks to Pillsy and Leonid Shifrin, here's what I'm now using:
SetAttributes[each, HoldAll]; (* each[pattern, list, body] *)
each[pat_, lst_List, bod_] := (* converts pattern to body for *)
(Cases[Unevaluated#lst, pat:>bod]; Null); (* each element of list. *)
each[p_, l_, b_] := (Cases[l, p:>b]; Null); (* (Break/Continue not supported) *)