In Python we can index a numpy.ndarray with a tuple, like
cube = numpy.zeros((3,3,3,))
print(cube[(0,1,2,)])
However, in Haskell, indexing a multi-layered (nested) list can only be done with multiple !!'s, which seems pretty ad hoc.
I tried foldl:
foldl
(!!)
[[[1, 2, ....], [1, 2, ....], [1, 2, ....]],
[[1, 2, ....], [1, 2, ....], [1, 2, ....]],
[[1, 2, ....], [1, 2, ....], [1, 2, ....]]]
[0, 1, 2]
However, foldl only accepts functions of type a -> b -> a, whereas (!!) has type [a] -> Int -> a, so the accumulator would have to change type at every step. I have also read that hmatrix can do things like numpy in Python, but it only handles matrices and vectors, where the number of dimensions is fixed.
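For comparison, here is a minimal Python sketch of the fold the question attempts. It works in Python only because the accumulator is allowed to change type at each step (list of lists of lists, then list of lists, then list, then element), which is exactly what the type of foldl rules out. The helper name index_nested and the sample cube contents are made up for illustration.

```python
from functools import reduce
from operator import getitem


def index_nested(nested, indices):
    """Apply each index in turn, i.e. nested[i0][i1]...[ik]."""
    return reduce(getitem, indices, nested)


# A 3x3x3 nested list whose entries encode their own position
# as 100*page + 10*row + column (illustrative data only).
cube = [[[100 * p + 10 * r + c for c in range(3)]
         for r in range(3)]
        for p in range(3)]

print(index_nested(cube, (0, 1, 2)))  # 12
```

Each step still walks a list from its head, so this is only a convenience over chained indexing, not a faster structure.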
This can always be done with C-style indexing, i.e. put all the data in a 1D list and compute the position with multiplications, 0 + 1*3 + 2*9, but that seems crude: it loses the dimension information, so the compiler can no longer help keep the indices in the proper order.
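The flat-index arithmetic mentioned above can be sketched in Python as follows. This sketch assumes row-major (C) order, where the last axis varies fastest; the expression 0 + 1*3 + 2*9 in the question corresponds to the opposite, column-major, convention. The function names are hypothetical.

```python
def row_major_strides(shape):
    """Number of flat elements to skip per unit step along each axis."""
    strides = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    return tuple(strides)


def flat_index(shape, index):
    """Map a multi-dimensional index to a position in a flat list."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same length")
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {index} out of bounds for shape {shape}")
    return sum(i * s for i, s in zip(index, row_major_strides(shape)))


print(row_major_strides((3, 3, 3)))      # (9, 3, 1)
print(flat_index((3, 3, 3), (0, 1, 2)))  # 5
```

Keeping the shape next to the flat data, as done here, is what array libraries do internally, so the dimension information does not have to be lost.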
How can this be done in a more abstract way?
It is not quite clear to me from the question what you are trying to achieve, but if it is only about indexing multidimensional arrays in Haskell, I'll try to answer to the best of my ability. Thanks to @leftaroundabout for suggesting massiv in the comments section; being the author of that library, I am inclined to agree with his comment.
One thing is certain: for multiple reasons you do not want to use nested lists as arrays. Linear-time indexing and abysmal performance are only some of those reasons.
Constructing an array
Let's see how we can get it done with massiv. First I'll translate your numpy example:
cube :: Array P Ix3 Float
cube = A.replicate Seq (Sz (3 :> 3 :. 3)) 0
Note that because we actually have types in Haskell, we need some annotations saying what kind of array we are trying to construct, e.g. boxed vs unboxed, mutable vs immutable, etc. I recommend reading through the library's documentation for more information on those topics. Here I'll focus on indices, since that is what the question is about. To get the element of the above 3D array at page 0, row 1 and column 2 (zero-based, the cube[(0,1,2,)] from the numpy example), we can use the O(1) operator ! with the index supplied on its right-hand side:
λ> cube ! (0 :> 1 :. 2)
0.0
Note that the indexing operator ! is partial and will throw a runtime exception when the index is out of bounds:
λ> cube ! (10 :> 1 :. 2)
*** Exception: IndexOutOfBoundsException: (10 :> 1 :. 2) is not safe for (Sz (3 :> 3 :. 3))
CallStack (from HasCallStack):
throwEither, called at src/Data/Massiv/Core/Common.hs:807:11 in massiv-1.0.1.0-...
This can easily be avoided with its safer variant !?:
λ> cube !? (0 :> 1 :. 2) :: Maybe Float
Just 0.0
λ> cube !? (10 :> 1 :. 2) :: Maybe Float
Nothing
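For readers coming from Python, the difference between the partial and the safe lookup can be sketched on nested lists like this. It is only an analogy, not how massiv works: the hypothetical safe_get returns None where !? returns Nothing, and it cannot tell a stored None apart from a failed lookup.

```python
def safe_get(nested, indices):
    """Return nested[i0][i1]...[ik], or None if any index is out of bounds."""
    current = nested
    for i in indices:
        # Reject non-lists and negative indices too, so that -1 does not
        # silently wrap around as it normally would in Python.
        if not isinstance(current, list) or not 0 <= i < len(current):
            return None
        current = current[i]
    return current


cube = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]

print(safe_get(cube, (0, 1, 2)))   # 0.0
print(safe_get(cube, (10, 1, 2)))  # None
```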
Index syntax
As with numpy, it is possible to use tuples for indexing massiv arrays. However, because tuples are polymorphic, it is sometimes trickier for the type checker to infer the right thing, and tuples are supported in massiv only up to 5 dimensions. That's why I will show examples for the Ix n type instead, where n is the number of dimensions, which can be arbitrary.
When working with flat vectors, a regular Int is used for indexing (it corresponds to Ix 1):
λ> let vec = makeVectorR P Seq (Sz 10) id
λ> vec
Array P Seq (Sz1 10)
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
λ> vec ! 7
7
For two dimensions there is a special operator :. (corresponds to Ix 2):
λ> let mat = makeArrayR P Seq (Sz (2 :. 10)) $ \(i :. j) -> i + j
λ> mat
Array P Seq (Sz (2 :. 10))
[ [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
, [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
]
λ> mat ! (1 :. 3)
4
An index for any dimension larger than 2 is built with the :> operator (corresponds to Ix n):
λ> let arr3D = makeArrayR P Seq (Sz (3 :> 2 :. 1)) $ \(i :> j :. k) -> i + j + k
λ> arr3D ! (2 :> 1 :. 0)
3
λ> let arr4D = makeArrayR P Seq (Sz (4 :> 3 :> 2 :. 1)) $ \(h :> i :> j :. k) -> h + i + j + k
λ> arr4D ! (3 :> 2 :> 1 :. 0)
6
More info on indices with examples can be found in the README's #index section.
Related
Dear friends in stack overflow,
I am having trouble with a calculation using NumPy and SymPy. A is defined by
import numpy as np
import sympy as sym
sym.var('x y')
f = sym.Matrix([0,x,y])
func = sym.lambdify( (x,y), f, "numpy")
X=np.array([1,2,3])
Y=np.array([1,2,3])
A = func(X,Y)
Here, X and Y are just examples. In general, X and Y are one-dimensional numpy arrays of the same length. The output A is then
array([[0],
[array([1, 2, 3])],
[array([1, 2, 3])]], dtype=object).
But, I'd like to get this as
np.array([[0,0,0],[1,2,3],[1,2,3]]).
If we call this B, how can A be converted to B automatically? B's first row is filled with 0 and has the same length as X and Y.
Do you have any ideas?
First let's make sure we understand what is happening:
In [52]: x, y = symbols('x y')
In [54]: f = Matrix([0,x,y])
...: func = lambdify( (x,y), f, "numpy")
In [55]: f
Out[55]:
⎡0⎤
⎢ ⎥
⎢x⎥
⎢ ⎥
⎣y⎦
In [56]: print(func.__doc__)
Created with lambdify. Signature:
func(x, y)
Expression:
Matrix([[0], [x], [y]])
Source code:
def _lambdifygenerated(x, y):
    return (array([[0], [x], [y]]))
See how the numpy function looks just like the sympy expression, with sym.Matrix replaced by np.array. lambdify just does a lexical translation; it does not have a deep knowledge of the differences between the languages.
With scalars the func runs as expected:
In [57]: func(1,2)
Out[57]:
array([[0],
[1],
[2]])
With arrays the result is this ragged array (a new enough numpy adds this warning):
In [59]: func(np.array([1,2,3]),np.array([1,2,3]))
<lambdifygenerated-2>:2: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
return (array([[0], [x], [y]]))
Out[59]:
array([[0],
[array([1, 2, 3])],
[array([1, 2, 3])]], dtype=object)
If you don't know numpy, sympy is not a short cut to filling in your knowledge gaps.
The simplest fix is to replace the original 0 with another symbol.
Even in sympy, the 0 is not expanded:
In [65]: f.subs({x:Matrix([[1,2,3]]), y:Matrix([[4,5,6]])})
Out[65]:
⎡ 0 ⎤
⎢ ⎥
⎢[1 2 3]⎥
⎢ ⎥
⎣[4 5 6]⎦
In [74]: Matrix([[0,0,0],[1,2,3],[4,5,6]])
Out[74]:
⎡0 0 0⎤
⎢ ⎥
⎢1 2 3⎥
⎢ ⎥
⎣4 5 6⎦
In [75]: Matrix([[0],[1,2,3],[4,5,6]])
...
ValueError: mismatched dimensions
To make the desired array in numpy we have to do something like:
In [71]: arr = np.zeros((3,3), int)
In [72]: arr[1:,:] = [[1,2,3],[4,5,6]]
In [73]: arr
Out[73]:
array([[0, 0, 0],
[1, 2, 3],
[4, 5, 6]])
That is, initialize the array and fill selected rows. There isn't a simple expression that will do the desired 'automatically fill the first row with 0', much less one that can be naively translated from sympy.
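If the goal is only to expand a scalar entry to the length of the other rows, one possible sketch is to let numpy broadcast the rows against each other before stacking them. The helper name stack_rows is made up for this example; np.broadcast_arrays and np.stack are regular numpy functions. This is applied to the row values directly and does not change what lambdify generates.

```python
import numpy as np


def stack_rows(*rows):
    """Broadcast scalars and 1-D arrays to a common shape, then stack as rows."""
    expanded = np.broadcast_arrays(*rows)
    return np.stack(expanded)


X = np.array([1, 2, 3])
Y = np.array([1, 2, 3])

B = stack_rows(0, X, Y)
print(B)
# [[0 0 0]
#  [1 2 3]
#  [1 2 3]]
```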
I am wondering if there is a 1 liner to do this assignment in the array in Julia:
h = .1
L = 1
x = 0:h:L
n = length(x)
discretized = zeros(n,n)
#really any old function
f(x,y) = x*y + cos(x) + sin(y)
for i in 1:n
    for j in 1:n
        discretized[i, j] = f(x[i], x[j])
    end
end
Or do I explicitly have to write out the loops?
You could broadcast the function over an array and its transpose; Julia will return the result as a 2D Array:
x = 0:0.1:1
f(x,y) = x*y + cos(x) + sin(y)
A = f.(x,x') # the `.` before the bracket broadcasts the dimensions
# 11×11 Array{Float64,2}
or, if you have more complicated expressions or functions and don't want to write out lots of dots, use the @. macro, e.g.:
A = @. f(x,x') + x^2
Once A already exists, you can also do
@. A = f(x,x') + x^2
which uses .= to write the result in place into each element of A, and hence is non-allocating.
Broadcasting goes much further than this easy extension of scalar functions to arrays, allowing "fusion" of multiple calculations into a single fast operation https://julialang.org/blog/2017/01/moredots
You could do:
discretized = [f(i, j) for i in x, j in x]
For more information, see https://docs.julialang.org/en/v1/manual/arrays/#Comprehensions-1
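For comparison with the numpy examples elsewhere on this page, the same grid evaluation can be sketched as a nested comprehension in plain Python. The grid is built from integer steps (an assumption made here to avoid accumulating floating-point error), and the result is a list of rows rather than a true 2D array.

```python
import math


def f(x, y):
    return x * y + math.cos(x) + math.sin(y)


h = 0.1
n = 11                          # points 0.0, 0.1, ..., 1.0
x = [i * h for i in range(n)]

# discretized[i][j] == f(x[i], x[j]), mirroring the explicit double loop
discretized = [[f(xi, xj) for xj in x] for xi in x]

print(len(discretized), len(discretized[0]))  # 11 11
```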
Edit: Based on the comments, here's a brief overview of what the : operator does in indexing:
julia> a = [1, 2, 3]
3-element Array{Int64,1}:
1
2
3
julia> a[:]
3-element Array{Int64,1}:
1
2
3
julia> ans === a
false
julia> a[:] .= [2, 3, 4]
3-element view(::Array{Int64,1}, :) with eltype Int64:
2
3
4
julia> a
3-element Array{Int64,1}:
2
3
4
Suppose I have a 2-D array such that the first column is composed of only two integers 1 and 2:
1 5 1 7 0.5
2 4 5 6 0.1
1 9 3 4 0.6
2 8 7 2 0.2
I want to separate two matrices out of this, such that the first column of each contains the same integer (so the first column of first matrix contains only integer 1, same goes for 2 in the second matrix).
So it would become:
1 5 1 7 0.5
1 9 3 4 0.6
and
2 4 5 6 0.1
2 8 7 2 0.2
I don't know exactly how to start. I was thinking of using the count at the beginning (well, because I have a way larger matrix with 10 different integers in the first column), then according to the counted number of each integer I construct the dimension of each [sub]matrix. After that, the only thing I could think of is the count(mask), and if the value is true it's then added to the matrix by if statement.
You can't have mixed types (integer and real) in the same array in Fortran, so I will suppose all data are real in the 2-dim array:
program split
   implicit none
   real, allocatable :: a(:, :), b(:, :)
   integer :: i, ids = 10
   integer, allocatable :: id(:), seq(:)
   a = reshape([real :: 1, 5, 1, 7, 0.5, &
              & 2, 4, 5, 6, 0.1, &
              & 1, 9, 3, 4, 0.6, &
              & 2, 8, 7, 2, 0.2], [5, 4])
   seq = [(i, i = 1, size(a, 2))]
   do i = 1, ids
      print*, "i = ", i
      ! here we are creating a vector with all the line indices that start with i
      ! e.g. for i = 1 we get id = [1, 3], for i = 2 we get [2, 4], for i = 3 we get [], ...
      id = pack(seq, a(1,:) == i)
      ! here we use a Fortran feature named vector-subscript
      b = a(:, id)
      print*, b
   end do
end
If you want the first column (or any column) to be integer, you can declare it as a separate array and use the same vector subscripts to gather the desired lines.
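The same selection idea can be sketched in Python for comparison: instead of counting first and then sizing each submatrix, rows are appended to a group keyed by their first entry. The function name split_by_first_column is hypothetical, and the sketch assumes the first column holds whole numbers stored as floats, as in the sample data.

```python
from collections import defaultdict


def split_by_first_column(rows):
    """Group rows by the integer value in their first column, keeping order."""
    groups = defaultdict(list)
    for row in rows:
        groups[int(row[0])].append(row)
    return dict(groups)


data = [
    [1, 5, 1, 7, 0.5],
    [2, 4, 5, 6, 0.1],
    [1, 9, 3, 4, 0.6],
    [2, 8, 7, 2, 0.2],
]

parts = split_by_first_column(data)
print(parts[1])  # [[1, 5, 1, 7, 0.5], [1, 9, 3, 4, 0.6]]
print(parts[2])  # [[2, 4, 5, 6, 0.1], [2, 8, 7, 2, 0.2]]
```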
Suppose there are two 1-D arrays of the same length:
let x = fromListUnboxed (ix1 4) [1, 2, 3, 4]
let y = fromListUnboxed (ix1 4) [5, 6, 7, 8]
Now I would like to stack these two arrays into one 2-D array so that these arrays form the rows. How can I do it in repa?
Basically, I'm looking for an equivalent of numpy's row_stack:
>>> x = np.array([1, 2, 3, 4])
>>> y = np.array([5, 6, 7, 8])
>>> np.row_stack((x, y))
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
Note. The two arrays, x and y, come from outside, i.e. I cannot create the 2-D array from scratch.
As I mentioned in the initial comment, all you need is to reshape and then append (both in Data.Array.Repa).
ghci> let x' = reshape (ix2 4 1) x
ghci> let y' = reshape (ix2 4 1) y
ghci> z <- computeP $ x' `append` y' :: IO (Array U DIM2 Int)
ghci> z
AUnboxed ((Z :. 4) :. 2) [1,5,2,6,3,7,4,8]
As for pretty-printing, repa isn't very good (likely because there is no good way to pretty-print higher dimensions). Here is a one-line hack to display z:
ghci> putStr $ unlines [ unwords [ show $ z ! ix2 i j | i<-[0..3] ] | j<-[0..1] ]
1 2 3 4
5 6 7 8
I have a very simple function in my application that does a lot of work and occupies the most computation time:
f :: Int -> Array (Int,Int) Int -> [Int]
f x arr = [v | v <- range (l,u), vv <- [g!(x,v)], vv /= 0]
  where ((_,l), (_,u)) = bounds arr
What this does is extract the row at index x from the array arr and return all the column indices whose elements are /= 0. So, for instance, given the following matrix with bounds ((0,0),(2,2)):
arr = [[0, 0, 5],
[4, 0, 3],
[0, 3, 1]] -- for simplicity in [[a]] notation
the expected output is
f 0 arr == [2]
f 1 arr == [0,2]
f 2 arr == [1,2]
How do I speed up f and profile with more detail what actually takes most of the computation time in f (list construction, array access, etc)?
Thank you!
f x arr = [v | v <- range (l,u), vv <- [g!(x,v)], vv /= 0]
I suppose g is a typo and should be arr.
Instead of range (l,u) use [l .. u], the compiler may be able to optimise the former as well as the latter, but [l .. u] is more idiomatic (and maybe the compiler can't optimise the former as well).
Don't create a pointless one-element list; you can use the direct test,
f x arr = [v | v <- [l .. u], arr!(x,v) /= 0]
The compiler may again be able to rewrite the former to the latter, but a) the latter is much clearer, and b) doesn't risk that the compiler can't.
To find out where time is spent, you can insert cost-centre annotations,
f x arr = [v | v <- {-# SCC "Range" #-} [l .. u], {-# SCC "ZeroTest" #-} (arr!(x,v) /= 0)]
but such annotations disable many optimisations (compiling for profiling always does), so the picture you get from profiling can be skewed and differ from what actually happens in the optimised programme.
Here is documentation on profiling:
http://www.haskell.org/ghc/docs/latest/html/users_guide/profiling.html
And a book chapter:
http://book.realworldhaskell.org/read/profiling-and-optimization.html
If your code turns out to be slow because of array updates, use Data.Vector.* from the vector package.