How to clone an array with length bigger than 32?

A fixed-length array of a native type (or of a type that implements the Copy trait) can be cloned in Rust up to the length of 32. That is, this compiles:
fn main() {
    let source: [i32; 32] = [0; 32]; // length 32
    let _cloned = source.clone();
}
But this doesn't:
fn main() {
    let source: [i32; 33] = [0; 33]; // length 33
    let _cloned = source.clone(); // <-- compile error
}
In fact, the standard library only provides Clone implementations for arrays of each length from 0 to 32.
What is an efficient and idiomatic way to clone a generic array of length, say, 33?

You can't add the impl Clone in your own code: both Clone and the array type are defined in the standard library, so the coherence rules won't let you. This problem will be fixed at some point; in the meantime you can mostly work around it with varying amounts of effort:
If you just have a local variable of a concrete type and the type is Copy (as in your example), you can simply copy rather than cloning, i.e., let _cloned = source;.
If the array is a field of a struct you want to implement Clone for (and derive won't work), you can still manually implement Clone and use the above trick in the implementation, as sketched after this list.
Cloning an array of non-Copy types is trickier, because the plain-copy trick no longer applies. You could write out [x[0].clone(), x[1].clone(), ...] for as many elements as you need; it's a lot of work, but at least it's certain to be correct.
If all else fails, you can still create a newtype wrapper. This requires quite a bit of boilerplate to delegate all the other traits you need, but then you can (again, manually) implement Clone.
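For the struct case (second item above), a minimal sketch of such a hand-written impl might look like this; the struct and field names are made up, and the trick relies on arrays being Copy at any length:
// Hypothetical struct whose array field is too long for derive(Clone) on pre-1.21 compilers.
struct Big {
    data: [i32; 33],
}

impl Clone for Big {
    fn clone(&self) -> Big {
        // [i32; 33] is Copy (Copy for arrays is not length-limited),
        // so a plain copy of the field stands in for the missing Clone impl.
        Big { data: self.data }
    }
}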

You can clone arbitrary-length arrays since Rust 1.21.0. The "Libraries" section of the CHANGELOG says:
Generate builtin impls for Clone for all arrays and tuples that are T: Clone
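In other words, on Rust 1.21.0 or later the failing example from the question compiles as-is:
fn main() {
    let source: [i32; 33] = [0; 33]; // length 33
    let _cloned = source.clone(); // fine since Rust 1.21.0
}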

Related

Passing shared memory variables in python multiprocessing

I have a bunch of files that I want to read in parallel using Python's multiprocessing and collect all the data in a single NumPy array. For this purpose, I want to define a shared memory NumPy array and pass its slices to different processes to read in parallel. A toy illustration of what I am trying to do is given in the following code where I am trying to modify a numpy array using multiprocessing.
Example 1:
import numpy as np
import multiprocessing

def do_stuff(i, arr):
    arr[:] = i
    return

def print_error(err):
    print(err)

if __name__ == '__main__':
    idx = [0, 1, 2, 3]
    # Need to fill this array in parallel
    arr = np.zeros(4)
    p = multiprocessing.Pool(4)
    # Passing slices to arr to modify using multiprocessing
    for i in idx:
        p.apply(do_stuff, args=(i, arr[i:i+1]))
    p.close()
    p.join()
    print(arr)
In this code, I want arr to be filled with 0, 1, 2, 3. However, this prints arr as all zeros. After reading the answers here, I used multiprocessing.Array to define the shared memory variable and modified my code as follows.
Example 2:
import numpy as np
import multiprocessing

def do_stuff(i, arr):
    arr[:] = i
    return

def print_error(err):
    print(err)

if __name__ == '__main__':
    idx = [0, 1, 2, 3]
    p = multiprocessing.Pool(4)
    # Shared memory Array
    shared = multiprocessing.Array('d', 4)
    arr = np.ctypeslib.as_array(shared.get_obj())
    for i in idx:
        p.apply(do_stuff, args=(i, arr[i:i+1]))
    p.close()
    p.join()
    print(arr)
This also prints all zeros for arr. However, when I define the array outside main and use pool.map, the code works. For example, the following code works:
Example 3:
import numpy as np
import multiprocessing

shared = multiprocessing.Array('d', 4)
arr = np.ctypeslib.as_array(shared.get_obj())

def do_stuff(i):
    arr[i] = i
    return

def print_error(err):
    print(err)

if __name__ == '__main__':
    idx = [0, 1, 2, 3]
    p = multiprocessing.Pool(4)
    shared = multiprocessing.Array('d', 4)
    p.map(do_stuff, idx)
    p.close()
    p.join()
    print(arr)
This prints [0,1,2,3].
I am very confused by all this. My questions are:
When I define arr = np.zeros(4), which process owns this variable? When I then send slices of this array to different processes, what is being sent if this variable is not defined on those processes?
Why doesn't example 2 work while example 3 does?
I am working on Linux with Python 3.7.4.
When I define arr = np.zeros(4), which process owns this variable?
Only the main process should have access to this. If you use "fork" for the start method, everything will be accessible to the child processes, but as soon as something tries to modify it, it will be copied into the child's own private memory space before being modified (copy-on-write). This reduces overhead if you have large read-only arrays, but doesn't help you much when writing data back to those arrays.
what is being sent if this variable is not defined on those processes?
A new array is created within the child process when the arguments are re-constructed after being sent from the main process via a pipe and pickle. The data is serialized (pickled) and re-constructed, so no information other than the values in the slice remains; it's a totally new object.
Why doesn't example 2 work while example 3 does?
Example 3 works because at the time of the "fork" (the moment you call Pool), arr has already been created, and will be shared. It's also important that you used an Array to create it, so when you attempt to modify the data, it is actually shared (the exact mechanics of this are complicated).
Example 2 does not work for much the same reason example 1 does not work: you pass a slice of an array as an argument, which gets converted into a totally new object, so arr inside your do_stuff function is just a copy of arr[i:i+1] from the main process. It is still important to create anything that will be shared between processes before calling Pool (if you're relying on "fork" to share the data), but that's not why this example fails.
You should know: example 3 only works because you're on Linux, where the default start method is fork. fork is not the preferred start method because of the possibility of deadlocks when lock objects are copied in a locked state. It will not work on Windows at all, and won't work on macOS by default on 3.8 and above.
The best (most portable) solution to all this is to pass the Array itself as the argument and re-construct the numpy array inside the child process. The complication is that "shared objects" can only be passed as arguments at the creation of the child process. This isn't as big a deal if you use Process, but with Pool you basically have to pass any shared objects as arguments to an initialization function and access the re-constructed array as a global variable of the child's scope. In this example, for instance, you will get an error if you try to pass buf as an argument with p.map or p.apply, but not when passing buf as initargs=(buf,) to Pool():
import numpy as np
from multiprocessing import Pool, Array

def init_child(buf):
    global arr  # use global context (for each process) to pass arr to do_stuff
    arr = np.frombuffer(buf.get_obj(), dtype='d')

def do_stuff(i):
    global arr
    arr[i] = i

if __name__ == '__main__':
    idx = [0, 1, 2, 3]
    buf = Array('d', 4)
    arr = np.frombuffer(buf.get_obj(), dtype='d')
    arr[:] = 0
    # "with" context is easier than writing "close" and "join" all the time
    with Pool(4, initializer=init_child, initargs=(buf,)) as p:
        for i in idx:
            p.apply(do_stuff, args=(i,))  # you could pass more args to get slice indices too
    print(arr)
With 3.8 and above there's a new module, shared_memory, which is better than Array or any of the other sharedctypes classes. It is a bit more complicated to use and has some additional OS-dependent nastiness, but it's theoretically lower overhead and faster. If you want to go down the rabbit hole, I've written a few answers on the topic of shared_memory, and have recently been answering lots of questions on concurrency in general, if you want to take a gander at my answers from the last month or two.
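As a rough sketch of what the same fill-an-array example could look like with shared_memory (Python 3.8+; helper names such as init_child are just illustrative, not part of the library):
import numpy as np
from multiprocessing import Pool
from multiprocessing.shared_memory import SharedMemory

def init_child(shm_name, shape, dtype):
    # Re-attach to the existing block in each worker; keep a reference to the
    # SharedMemory object so its buffer isn't released while the worker runs.
    global arr, _shm
    _shm = SharedMemory(name=shm_name)
    arr = np.ndarray(shape, dtype=dtype, buffer=_shm.buf)

def do_stuff(i):
    arr[i] = i

if __name__ == '__main__':
    shm = SharedMemory(create=True, size=4 * np.dtype('d').itemsize)
    arr = np.ndarray((4,), dtype='d', buffer=shm.buf)
    arr[:] = 0
    # Workers attach by name via the initializer; on some Python versions the
    # resource tracker may warn about the worker-side handles at shutdown.
    with Pool(4, initializer=init_child, initargs=(shm.name, (4,), 'd')) as p:
        p.map(do_stuff, range(4))
    print(arr)
    shm.close()
    shm.unlink()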

In Perl 6, can I use an Array as a Hash key?

In the Hash documentation, the section on Object keys seems to imply that you can use any type as a Hash key as long as you indicate the key's type when declaring the hash, but I am having trouble when trying to use an array as the key:
> my %h{Array};
{}
> %h{[1,2]} = [3,4];
Type check failed in binding to parameter 'key'; expected Array but got Int (1)
in block <unit> at <unknown file> line 1
Is it possible to do this?
The [1,2] inside the %h{[1,2]} = [3,4] is interpreted as a slice, so it tries to assign to %h{1} and %h{2}. And since the key must be an Array, that does not typecheck well, which is what the error message is telling you.
If you itemize the array, it "does" work:
my %h{Array};
%h{ $[1,2] } = [3,4];
say %h.perl; # (my Any %{Array} = ([1, 2]) => $[3, 4])
However, that probably does not get what you want, because:
say %h{ $[1,2] }; # (Any)
That's because object hashes use the value of the .WHICH method as the key in the underlying hash.
say [1,2].WHICH; say [1,2].WHICH;
# Array|140324137953800
# Array|140324137962312
Note that the .WHICH values for those seemingly identical arrays are different.
That's because Arrays are mutable (as Lists can be), so that's not really going to work.
So what are you trying to achieve? If the order of the values in the array is not important, you can probably use Sets as keys:
say [1,2].Set.WHICH; say [1,2].Set.WHICH
# Set|AEA2F4CA275C3FE01D5709F416F895F283302FA2
# Set|AEA2F4CA275C3FE01D5709F416F895F283302FA2
Note that these two .WHICHes are the same. So you could maybe write this as:
my %h{Set};
dd %h{ (1,2).Set } = (3,4); # $(3, 4)
dd %h; # (my Any %{Set} = ((2,1).Set) => $(3, 4))
Hope this clarifies things. More info at: https://docs.raku.org/routine/WHICH
If you are really only interested in use of an Object Hash for some reason, refer to Liz's answer here and especially the answers to, and comments on, a similar earlier question.
The (final1) focus of this answer is a simple way to use an Array like [1,'abc',[3/4,Mu,["more",5e6],9.9],"It's {<sunny rainy>.pick} today"] as a regular string hash key.
The basic principle is use of .perl to approximate an immutable "value type" array until such time as there is a canonical immutable Positional type with a more robust value type .WHICH.
A simple way to use an array as a hash key
my %hash;
%hash{ [1,2,3].perl } = 'foo';
say %hash{ [1,2,3].perl }; # displays 'foo'
.perl converts its argument to a string of Perl 6 code that's a literal version of that argument.
say [1,2,3].perl; # displays '[1, 2, 3]'
Note how spaces have been added but that doesn't matter.
This isn't a perfect solution. You'll obviously get broken results if you mutate the array between key accesses. Less obviously you'll get broken results corresponding to any limitations or bugs in .perl:
say [my %foo{Array},42].perl; # displays '[(my Any %{Array}), 42]'
1 This is, hopefully, the end of my final final answer to your question. See my earlier 10th (!!) version of this answer for discussion of the alternative of using prefix ~ to achieve a more limited but similar effect and/or to try make some sense of my exchange with Liz in the comments below.

How would you create a multidimensional array with n dimensions in Swift?

For instance, asume
var hierarchicalFileSystem: [[String]] = []
This allows one to support one layer of folders, but there appears to be no way to create an array in Swift like the one above but with an undefined number of nested String arrays.
Am I missing something here?
An array of arrays (of arrays of arrays...) of strings doesn't really make much sense to represent a file system.
What I'd instead recommend is making a class or struct to represent objects in the file system. Perhaps something like this:
struct FileSystemObject {
    let name: String
    let `extension`: String?  // `extension` is a keyword, so it needs backticks
    let isFolder: Bool
    let contents: [FileSystemObject]?
}
Something like this lets us represent a file system quite nicely.
let fileSystem = [FileSystemObject]()
So, your fileSystem variable here is an array of FileSystemObjects, and it represents the root. Each object within the root has its own set of details (its name, its file extension if it has one, and whether or not it's a folder). If it's a folder, it has a non-nil contents property, and if it's a non-empty folder, that contents array of FileSystemObjects contains more file system objects (some of which are folders themselves, which have contents of their own).
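For instance, a tiny hierarchy built with this struct might look like the following (the names and layout are just made up for illustration):
// Hypothetical layout: a root containing one folder "docs" with one file "readme.txt".
let readme = FileSystemObject(name: "readme", `extension`: "txt", isFolder: false, contents: nil)
let docs = FileSystemObject(name: "docs", `extension`: nil, isFolder: true, contents: [readme])
let root: [FileSystemObject] = [docs]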
What you can perhaps do is create an array of AnyObject and add new depths as you need them:
var fileSystem: [AnyObject] = []
This would be a very bad way of representing a file system, however, and you should really go with some kind of tree structure instead, for example a class so that each node can keep a reference back to its parent:
class Node {
    var name: String
    var children: [Node] = []
    weak var parent: Node?

    init(name: String) {
        self.name = name
    }
}
Swift is a type-safe language. You have to declare the type of your variable, or set it to AnyObject, but please don't. So, answering your question: yes, it's possible:
var array: [AnyObject] = [[[1,2,3], [1,2,3]], [[1,2,3],[1,2,3]]]
But this is awful. Try to figure out a better representation for your problem, maybe custom structures.
You can have as many dimensions in an array as you want. Is it a good idea? I don't think so...
var threeDArray: Array<Array<Array<String>>> = []
let oneDArray = ["1", "2", "3"]
let twoDArray1: Array<Array<String>> = [oneDArray, oneDArray, oneDArray, oneDArray, oneDArray]
let twoDArray2 = twoDArray1 + [["4", "5", "6"], ["7", "8", "9"]]
threeDArray.append(twoDArray1)
threeDArray.append(twoDArray2)
let arr = [threeDArray, threeDArray, threeDArray]
print(type(of: arr)) // Array<Array<Array<Array<String>>>> (dynamicType was removed in Swift 3)

Why do I get an error when creating an array in Swift?

In my program I have an array with some values:
let pointArray = [
    [[185,350],8],
    [[248.142766340927,337.440122864078],5],
    [[301.67261889578,301.67261889578],5],
    [[337.440122864078,248.142766340927],5],
    [[350,185],8],
    [[327.371274561396,101.60083825503],5],
    [[301.67261889578,68.3273811042197],5],
    [[248.142766340927,32.5598771359224],5],
    [[185,20],8],
    [[101.60083825503,42.6287254386042],5],
    [[68.3273811042197,68.3273811042197],5],
    [[42.6287254386042,101.60083825503],5],
    [[20,185],8],
    [[32.5598771359224,248.142766340927],5],
    [[68.3273811042197,301.67261889578],8],
    [[101.60083825503,327.371274561396],5]
]
When compiling I get the following error:
Expression was too complex to be solved in reasonable time; consider
breaking up the expression into distinct sub-expressions
Why am I getting the error? Is it just because the array is too large?
The Swift compiler is generally not happy if you give it a big array without telling it the type. It has to parse all of that data to try to infer a type. It will work if you declare the type of the array:
let pointArray:[[Any]] = [[[185,350],8],[[248.142766340927, ...
But, you'll have to cast to read the values. You should really consider putting your values into a struct and letting the array hold that.
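A minimal sketch of that struct approach (the type and field names are just illustrative):
struct WeightedPoint {
    let point: [Double]
    let count: Int
}

let pointArray: [WeightedPoint] = [
    WeightedPoint(point: [185, 350], count: 8),
    WeightedPoint(point: [248.142766340927, 337.440122864078], count: 5)
]
let firstPoint = pointArray[0].point // [185.0, 350.0]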
For your data, an array of tuples might also work nicely:
let pointArray: [(point: [Double], count: Int)] = [
    ([185, 350], 8),
    ([248.142766340927, 337.440122864078], 5),
    ([301.67261889578, 301.67261889578], 5)
]
let point = pointArray[0].point // [185, 350]
let count = pointArray[0].count // 8
Theoretically it should work, but there is a ceiling on the complexity of an expression that the compiler will not go above, so having the Swift compiler give up is intentional. And since the type isn't declared (e.g. [[AnyObject]]), if you put your code in a playground it will spin and spin, your fans will start whirling, and the compiler will essentially quit.
Apple is working to reduce these errors. On the Apple Dev forums, they are asking people to file these errors as radar reports.

Go: Define multidimensional array with existing array's type and values?

Is it possible to a) define and b) initialize a new multidimensional array using an existing array, like in the following code: instead of var b [2][3]int, just saying something like var b [2]a?
That is, using a's type, whatever it is, instead of hardcoding it (which misses the point of using [...] for a).
And perhaps handling initialization (copying of the values) at the same time?
package main
func main() {
    a := [...]int{4, 5, 6}
    var b [2][3]int
    b[0], b[1] = a, a
}
(I'm aware of ease and convenience of slices, but this question is about understanding arrays.)
Edit: I can't believe I forgot about var b [2][len(a)]int, beginner's brain freeze. A one-line answer would be var b = [2][len(a)]int{a, a}. That's a type conversion, right?
The following code would also work. Both your example and mine do the same thing, and neither should be much faster than the other.
Unless you use reflect to make a slice (not an array) of your [3]int, it is impossible not to repeat [3]int in your new type. Even that is not possible in the current release; it is in tip and will be released in Go 1.1.
package main

import "fmt"

func main() {
    a := [...]int{4, 5, 6}
    var b = [2][3]int{a, a}
    fmt.Println(b)
}
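For what it's worth, the one-liner from the edit to the question also compiles, because len applied to an array (as opposed to a slice) is a constant expression; strictly speaking it is a composite literal rather than a type conversion:
package main

import "fmt"

func main() {
    a := [...]int{4, 5, 6}
    var b = [2][len(a)]int{a, a} // len(a) is the constant 3 here
    fmt.Println(b)               // [[4 5 6] [4 5 6]]
}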
