how to calculate the mean value of an incomplete matrix with maxima? - symbolic-math

Maybe I'm missing the obvious, but how do I get the mean value of the following matrix?
matrix( [ , , ],
[7.5133, , 5.3 ],
[4.93 , 5.7667 , 2.9067 ] );
I tried mean, geometric_mean, ... and the other commands from the descriptive package, but they don't work with missing values.
regards,
Marcus

I think one has to implement it oneself. For example:
M: matrix(
[ und , und, und],
[7.5133, und, 5.3],
[4.93 , 5.7667 , 2.9067] ) $
/* count the entries that are not 'und' */
ulength(x):=block([n: 0], matrixmap(lambda([e], if e#'und then n: n + 1), x), n) $
/* sum the entries that are not 'und' */
usum(x):=block([s: 0], matrixmap(lambda([e], if e#'und then s: s + e), x), s) $
umean(x):=usum(x)/ulength(x) $
umean(M);
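As an aside (Python, not Maxima): NumPy has this missing-value mean built in as np.nanmean, which ignores NaN entries. A minimal sketch of the same matrix:

```python
import numpy as np

# The question's matrix, with missing entries encoded as NaN
M = np.array([[np.nan, np.nan, np.nan],
              [7.5133, np.nan, 5.3],
              [4.93,   5.7667, 2.9067]])

print(np.nanmean(M))  # mean of the five present entries, ~5.28334
```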

Related

How do i slice an array for any size

I have a 2D array called X and a 1D array holding X's classes. What I want to do is slice the first N percent of the elements of each class and store them in a new array, ideally in a simple way without for loops:
For the following X array which is 2D:
[[0.612515 0.385088 ]
[0.213345 0.174123 ]
[0.432596 0.8714246]
[0.700230 0.730789 ]
[0.455105 0.128509 ]
[0.518423 0.295175 ]
[0.659871 0.320614 ]
[0.459677 0.940614 ]
[0.823733 0.831789 ]
[0.236175 0.10750 ]
[0.379032 0.241121 ]
[0.512535 0.8522193]]
Say N gives 3 elements per class.
Then I'd like to keep the first 3 indices that belong to class 0 and the first 3 that belong to class 1, maintaining the occurrence order of the indices, giving the following output:
First 3 from each class:
[1 0 0 1 0 1]
New_X =
[[0.612515 0.385088 ]
[0.213345 0.174123 ]
[0.432596 0.8714246]
[0.700230 0.730789 ]
[0.455105 0.128509 ]
[0.518423 0.295175 ]]
First, 30% is only 2 elements from each class (even when using np.ceil).
Second, I'll assume both arrays are numpy.array.
Given the 2 arrays, we can find the desired indices using np.where and array y in the following way:
in_ = sorted([*np.where(y == 0)[0][:np.ceil(0.3 * 6).astype(int)], *np.where(y == 1)[0][:np.ceil(0.3 * 6).astype(int)]])  # [0, 1, 2, 3]
Now we can simply slice X like so:
X[in_]
# array([[0.612515 , 0.385088 ],
# [0.213345 , 0.174123 ],
# [0.432596 , 0.8714246],
# [0.70023 , 0.730789 ]])
The definition of X and y are:
X = np.array([[0.612515 , 0.385088 ],
[0.213345 , 0.174123 ],
[0.432596 , 0.8714246],
[0.70023 , 0.730789 ],
[0.455105 , 0.128509 ],
[0.518423 , 0.295175 ],
[0.659871 , 0.320614 ],
[0.459677 , 0.940614 ],
[0.823733 , 0.831789 ],
[0.236175 , 0.1075 ],
[0.379032 , 0.241121 ],
[0.512535 , 0.8522193]])
y = np.array([1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0])
Edit
The expression np.where(y==0)[0][:np.ceil(0.3*6).astype(int)] does the following:
np.where(y==0)[0] - returns all the indices where y==0
Since you wanted only 30%, we slice those indices to keep the first 30% of them - [:np.ceil(0.3*6).astype(int)]
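The two np.where calls can be generalized to any number of classes with a short list comprehension. A sketch assuming the same X, y, and 30% fraction as above:

```python
import numpy as np

X = np.array([[0.612515, 0.385088], [0.213345, 0.174123],
              [0.432596, 0.8714246], [0.700230, 0.730789],
              [0.455105, 0.128509], [0.518423, 0.295175],
              [0.659871, 0.320614], [0.459677, 0.940614],
              [0.823733, 0.831789], [0.236175, 0.10750],
              [0.379032, 0.241121], [0.512535, 0.8522193]])
y = np.array([1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0])

frac = 0.3
# For each class, take the first ceil(frac * class_count) indices,
# then sort so the original occurrence order is preserved.
idx = np.sort(np.concatenate(
    [np.where(y == c)[0][:int(np.ceil(frac * (y == c).sum()))]
     for c in np.unique(y)]))
New_X = X[idx]
print(idx)  # [0 1 2 3]
```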

Haskell - Reproduce numpy's reshape

Getting into Haskell, I'm trying to reproduce something like numpy's reshape with lists. Specifically, given a flat list, reshape it into an n-dimensional list:
import numpy as np
a = np.arange(1, 18)
b = a.reshape([-1, 2, 3])
# b =
#
# array([[[ 1, 2, 3],
# [ 4, 5, 6]],
#
# [[ 7, 8, 9],
# [10, 11, 12]],
#
# [[13, 14, 15],
# [16, 17, 18]]])
I was able to reproduce the behaviour with fixed indices, e.g.:
*Main> reshape23 [1..18]
[[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]],[[13,14,15],[16,17,18]]]
My code is:
takeWithRemainder :: (Integral n) => n -> [a] -> ([a], [a])
takeWithRemainder _ [] = ([], [])
takeWithRemainder 0 xs = ([], xs)
takeWithRemainder n (x:xs) = (x : taken, remaining)
  where (taken, remaining) = takeWithRemainder (n-1) xs

chunks :: (Integral n) => n -> [a] -> [[a]]
chunks _ [] = []
chunks chunkSize xs = chunk : chunks chunkSize remainderOfList
  where (chunk, remainderOfList) = takeWithRemainder chunkSize xs

reshape23 = chunks 2 . chunks 3
Now, I can't seem to find a way to generalise this to an arbitrary shape. My original idea was doing a fold:
reshape :: (Integral n) => [n] -> [a] -> [b]
reshape ns list = foldr (\n acc -> (chunks n) . acc) id ns list
But, no matter how I go about it, I always get a type error from the compiler. From my understanding, the problem is that at some moment, the type for acc is inferred to be id's i.e. a -> a, and it doesn't like the fact that the list of functions in the fold all have a different (although compatible for composition) type signature. I run into the same problem trying to implement this with recursion myself instead of a fold.
This confused me because originally I had intended for the [b] in reshape's type signature to be a stand-in for "another, dissociated type" that could be anything from [[a]] to [[[[[a]]]]].
How am I going wrong about this? Is there a way to actually achieve the behaviour I intended, or is it just plain wrong to want this kind of "dynamic" behaviour in the first place?
There are two details here that are qualitatively different from Python, ultimately stemming from dynamic vs. static typing.
The first one you have noticed yourself: at each chunking step the resulting type is different from the input type. This means you cannot use foldr, because it expects a function of one specific type. You could do it via recursion though.
The second problem is a bit less obvious: the return type of your reshape function depends on what the first argument is. Like, if the first argument is [2], the return type is [[a]], but if the first argument is [2, 3], then the return type is [[[a]]]. In Haskell, all types must be known at compile time. And this means that your reshape function cannot take the first argument that is defined at runtime. In other words, the first argument must be at the type level.
Type-level values may be computed via type functions (aka "type families"), but because it's not just the type (i.e. you also have a value to compute), the natural (or the only?) mechanism for that is a type class.
So, first let's define our type class:
class Reshape (dimensions :: [Nat]) from to | dimensions from -> to where
  reshape :: from -> to
The class has three parameters: dimensions of kind [Nat] is a type-level array of numbers, representing the desired dimensions. from is the argument type, and to is the result type. Note that, even though it is known that the argument type is always [a], we have to have it as a type variable here, because otherwise our class instances won't be able to correctly match the same a between argument and result.
Plus, the class has a functional dependency dimensions from -> to to indicate that if I know both dimensions and from, I can unambiguously determine to.
Next, the base case: when dimensions is an empty list, the function just degrades to id:
instance Reshape '[] [a] [a] where
  reshape = id
And now the meat: the recursive case.
instance (KnownNat n, Reshape tail [a] [b]) => Reshape (n:tail) [a] [[b]] where
  reshape = chunksOf n . reshape @tail
    where n = fromInteger . natVal $ Proxy @n
First it makes the recursive call reshape @tail to chunk out the previous dimension, and then it chunks out the result of that using the value of the current dimension as chunk size.
Note also that I'm using the chunksOf function from the library split. No need to redefine it yourself.
Let's test it out:
λ reshape @'[1] [1,2,3]
[[1],[2],[3]]
λ reshape @'[1,2] [1,2,3,4]
[[[1,2]],[[3,4]]]
λ reshape @'[2,3] [1..12]
[[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]]
λ reshape @'[2,3,4] [1..24]
[[[[1,2,3,4],[5,6,7,8],[9,10,11,12]],[[13,14,15,16],[17,18,19,20],[21,22,23,24]]]]
For reference, here's the full program with all imports and extensions:
{-# LANGUAGE
MultiParamTypeClasses, FunctionalDependencies, TypeApplications,
ScopedTypeVariables, DataKinds, TypeOperators, KindSignatures,
FlexibleInstances, FlexibleContexts, UndecidableInstances,
AllowAmbiguousTypes
#-}
import Data.Proxy (Proxy(..))
import Data.List.Split (chunksOf)
import GHC.TypeLits (Nat, KnownNat, natVal)
class Reshape (dimensions :: [Nat]) from to | dimensions from -> to where
  reshape :: from -> to

instance Reshape '[] [a] [a] where
  reshape = id

instance (KnownNat n, Reshape tail [a] [b]) => Reshape (n:tail) [a] [[b]] where
  reshape = chunksOf n . reshape @tail
    where n = fromInteger . natVal $ Proxy @n
@Fyodor Soikin's answer is perfect with respect to the actual question. But there is a bit of a problem with the question itself: lists of lists are not the same thing as an array. It is a common misconception that Haskell doesn't have arrays and you are forced to deal with lists, which could not be further from the truth.
Because the question is tagged with array and there is a comparison to numpy, I would like to add a proper answer that handles this situation for multidimensional arrays. There are a couple of array libraries in the Haskell ecosystem, one of which is massiv.
Functionality like numpy's reshape can be achieved with the resize' function:
λ> 1 ... (18 :: Int)
Array D Seq (Sz1 18)
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 ]
λ> resize' (Sz (3 :> 2 :. 3)) (1 ... (18 :: Int))
Array D Seq (Sz (3 :> 2 :. 3))
[ [ [ 1, 2, 3 ]
, [ 4, 5, 6 ]
]
, [ [ 7, 8, 9 ]
, [ 10, 11, 12 ]
]
, [ [ 13, 14, 15 ]
, [ 16, 17, 18 ]
]
]

How can I assign slices of one structured Numpy array to another?

I have two numpy structured arrays arr1, arr2.
arr1 has fields ['f1','f2','f3'].
arr2 has fields ['f1','f2','f3','f4'].
I.e.:
arr1 = [[f1_1_1, f2_1_1, f3_1_1 ], arr2 = [[f1_2_1, f2_2_1, f3_2_1, f4_2_1 ],
[f1_1_2, f2_1_2, f3_1_2 ], [f1_2_2, f2_2_2, f3_2_2, f4_2_2 ],
... , ... ,
[f1_1_N1, f2_1_N1, f3_1_N1]] [f1_2_N2, f2_2_N2, f3_2_N2, f4_2_N2]]
I want to assign various slices of arr1 to the corresponding slice of arr2 (slices in the indexes and in the fields).
See below for the various cases.
From answers I found (to related, but not exactly the same, questions) it seemed to me that the only way to do it is assigning one slice at a time, for a single field, i.e., something like
arr2['f1'][0:1] = arr1['f1'][0:1]
(and I can confirm this works), looping over all source fields in the slice.
Is there a way to assign all intended source fields in the slice at a time?
I mean to assign, say, the elements x in the image
Case 1 (only some fields in arr1)
arr1 = [[ x , x , f3_1_1 ], arr2 = [[ x , x , f3_2_1, f4_2_1 ],
[ x , x , f3_1_2 ], [ x , x , f3_2_2, f4_2_2 ],
... , ... ,
[f1_1_N1, f2_1_N1, f3_1_N1]] [f1_2_N2, f2_2_N2, f3_2_N2, f4_2_N2]]
Case 2 (all fields in arr1)
arr1 = [[ x , x , x ], arr2 = [[ x , x , x , f4_2_1 ],
[ x , x , x ], [ x , x , x , f4_2_2 ],
... , ... ,
[f1_1_N1, f2_1_N1, f3_1_N1]] [f1_2_N2, f2_2_N2, f3_2_N2, f4_2_N2]]
Case 3
arr1 has fields ['f1','f2','f3','f5'].
arr2 has fields ['f1','f2','f3','f4'].
Assign a slice of ['f1','f2','f3']
Sources:
Python Numpy Structured Array (recarray) assigning values into slices
Convert a slice of a structured array to regular NumPy array in NumPy 1.14
You can do it, for example, like this:
import numpy as np
x = np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)], dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
y = np.array([('Carl', 10, 75.0), ('Joe', 7, 76.0)], dtype=[('name2', 'U10'), ('age2', 'i4'), ('weight', 'f4')])
print(x[['name', 'age']])
print(y[['name2', 'age2']])
# multiple field indexing
y[['name2', 'age2']] = x[['name', 'age']]
print(y[['name2', 'age2']])
# you can also use slicing if you want specific parts or the size does not match
y[:1][['name2', 'age2']] = x[1:][['name', 'age']]
print(y[:][['name2', 'age2']])
The field names can be different; I am not sure about the dtypes and whether there is (down)casting.
https://docs.scipy.org/doc/numpy/user/basics.rec.html#assignment-from-other-structured-arrays
https://docs.scipy.org/doc/numpy/user/basics.rec.html#accessing-multiple-fields
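For Case 3 from the question, the same multi-field indexing handles the shared fields in one statement. A minimal sketch with hypothetical dtypes and made-up values (slicing first so the left-hand side is a view into arr2):

```python
import numpy as np

# arr1 has fields f1, f2, f3, f5; arr2 has f1, f2, f3, f4 (Case 3)
arr1 = np.zeros(4, dtype=[('f1', 'f8'), ('f2', 'f8'), ('f3', 'f8'), ('f5', 'f8')])
arr2 = np.ones(4, dtype=[('f1', 'f8'), ('f2', 'f8'), ('f3', 'f8'), ('f4', 'f8')])
arr1['f1'] = [1, 2, 3, 4]
arr1['f2'] = [5, 6, 7, 8]

# Assign the shared fields of the first two rows in one go;
# f4 and rows 2:4 of arr2 are left untouched.
shared = ['f1', 'f2', 'f3']
arr2[:2][shared] = arr1[:2][shared]

print(arr2['f1'])  # [1. 2. 1. 1.]
```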

Filter 2d arrays containing a 1d array inside a 3d array

I have got a 3d array (an array of triangles). I would like to get the triangles (2d arrays) containing a given point (1d array).
I went through in1d, where, and argwhere, but I am still unsuccessful...
For instance with :
import numpy as np
import numpy.random as rd
t = rd.random_sample((10,3,3))
v0 = np.array([1,2,3])
t[1,2] = v0
t[5,0] = v0
t[8,1] = v0
I would like to get:
array([[[ 0.87312   ,  0.33411403,  0.56808291],
        [ 0.36769417,  0.66884858,  0.99675896],
        [ 1.        ,  2.        ,  3.        ]],

       [[ 0.31995867,  0.58351034,  0.38731405],
        [ 1.        ,  2.        ,  3.        ],
        [ 0.04435288,  0.96613852,  0.83228402]],

       [[ 1.        ,  2.        ,  3.        ],
        [ 0.28647107,  0.95755263,  0.5378722 ],
        [ 0.73731078,  0.8777235 ,  0.75866665]]])
to then get the set of v0 adjacent points
{[ 0.87312 , 0.33411403, 0.56808291],
[ 0.36769417, 0.66884858, 0.99675896],
[ 0.31995867, 0.58351034, 0.38731405],
[ 0.04435288, 0.96613852, 0.83228402],
[ 0.28647107, 0.95755263, 0.5378722 ],
[ 0.73731078, 0.8777235 , 0.75866665]}
without looping, the array being quite big.
For instance
In [28]: np.in1d(v0,t[8]).all()
Out[28]: True
works as a test on a line, but I can't get it over the all array.
Thanks for your help.
What I mean is the vectorized equivalent to:
In[54]:[triangle for triangle in t if v0 in triangle ]
Out[54]:
[array([[ 0.87312 , 0.33411403, 0.56808291],
[ 0.36769417, 0.66884858, 0.99675896],
[ 1. , 2. , 3. ]]),
array([[ 0.31995867, 0.58351034, 0.38731405],
[ 1. , 2. , 3. ],
[ 0.04435288, 0.96613852, 0.83228402]]),
array([[ 1. , 2. , 3. ],
[ 0.28647107, 0.95755263, 0.5378722 ],
[ 0.73731078, 0.8777235 , 0.75866665]])]
You can simply do -
t[(t==v0).all(axis=-1).any(axis=-1)]
We are performing ALL and ANY reductions along the last axis with axis=-1 there. First, .all(axis=-1) looks for rows exactly matching the array v0, and then .any(axis=-1) looks for ANY match in each of the 2D blocks. This results in a boolean array of the same length as the input array, which we use to filter out the valid elements of the input array.
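Putting the one-liner together with the question's setup (using a seeded generator so the run is reproducible), and then extracting the adjacent points by masking out the v0 rows:

```python
import numpy as np

rng = np.random.default_rng(42)
t = rng.random((10, 3, 3))
v0 = np.array([1.0, 2.0, 3.0])
t[1, 2] = v0
t[5, 0] = v0
t[8, 1] = v0

# .all(axis=-1): rows exactly equal to v0        -> shape (10, 3)
# .any(axis=-1): triangles containing such a row -> shape (10,)
triangles = t[(t == v0).all(axis=-1).any(axis=-1)]

# The adjacent points are the remaining rows of those triangles
adjacent = triangles[~(triangles == v0).all(axis=-1)]
print(triangles.shape, adjacent.shape)  # (3, 3, 3) (6, 3)
```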

Using the apply functions to get sum of products of arrays in R

Say I have two three-dimensional arrays and I would like the sum of the products of the arrays over one of the indices. What I would like is the sum in the last line of the example code below. I know I can use a loop, but I'd like to do this in an efficient way, hoping that there is some R function that does something like this. Any help would be greatly appreciated.
a <- array(1:12, dim=c(3, 2, 2))
b <- array(1, dim=c(3, 2, 2))
a[1, , ] %*% t(b[1, , ]) + a[2, , ] %*% t(b[2, , ]) + a[3, , ] %*% t(b[3, , ])
Unless you actually experience serious inefficiency issues, just do it with a for loop. You can't really use the built-in apply on two objects. (See comment)
Note that apply isn't guaranteed to be faster than regular for loops.
EDIT: As a result of the comments:
Reduce(`+`, lapply(1:dim(a)[1], function(i) a[i, , ] %*% t(b[i, , ])))
is a potential solution using the apply family. Though I doubt it is more efficient than a straightforward
sum <- matrix(0, ncol = dim(a)[2], nrow = dim(a)[2])
for (i in 1:dim(a)[1]) sum <- sum + a[i, , ] %*% t(b[i, , ])
which I think is much clearer about what it's trying to do.
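As an aside for NumPy users reading this: the same index-sum of matrix products is a single einsum contraction. A sketch, rebuilding R's column-major arrays with order='F' (the expected result for this a and b works out to [[30, 30], [48, 48]]):

```python
import numpy as np

# R's array(1:12, dim=c(3, 2, 2)) fills column-major
a = np.arange(1, 13).reshape((3, 2, 2), order='F').astype(float)
b = np.ones((3, 2, 2))

# sum_i a[i] @ b[i].T  ==  sum over i and k of a[i,j,k] * b[i,l,k]
res = np.einsum('ijk,ilk->jl', a, b)
print(res)  # [[30. 30.]
            #  [48. 48.]]
```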

Resources