convert dictionary to abstract matrix in julia

I'm trying to do dimensionality reduction, and I have a:
d = Dict{Tuple{String, String}, Vector{Float64}}
I want to apply umap to it.
However, umap only accepts an AbstractMatrix. I tried collect(d), but that converts the dict into a Vector (of key => value pairs), not a matrix.
How do I convert it correctly so that I can apply umap?

You should be able to use
hcat(values(d)...)
values(d) gives you an iterator over the dictionary's values, each of which is a vector. hcat concatenates them horizontally, but it takes each vector as a separate argument, so you need to splat the collection into its elements; that is what the three dots ... do.
See the documentation on splatting.
As noted in the comments, a more efficient alternative is
reduce(hcat, values(d))
which achieves the same result without splatting.
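
One thing to keep in mind is that a Dict has no guaranteed order, so it helps to record which key ended up in which column. The sketch below uses made-up toy data (the keys and numbers are hypothetical) and relies on keys(d) and values(d) iterating in the same order. The values are collected into a Vector first, on the assumption that reduce(hcat, ...) is most efficient when handed a plain vector of vectors.

```julia
# Hypothetical toy data standing in for the real dictionary.
d = Dict(
    ("a", "x") => [1.0, 2.0, 3.0],
    ("b", "y") => [4.0, 5.0, 6.0],
)

ks = collect(keys(d))     # keys, in the dictionary's iteration order
vs = collect(values(d))   # values, in the same order as `ks`

# One column per dictionary entry: a 3×2 Matrix{Float64} here.
X = reduce(hcat, vs)

# Column j of X belongs to key ks[j].
for (j, k) in enumerate(ks)
    @assert X[:, j] == d[k]
end
```

Whether umap wants one sample per column or per row is worth checking against the package documentation; if it is per row, permutedims(X) flips the layout.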

Related

Indexing array by tuples in Julia?

I would like to create (in Julia) a two-dimensional array Y storing the spherical harmonics Y_lm(x) evaluated at some fixed x, indexed by an integer l>=0 and -l<=m<=l.
How can I create the array Y such that I may access elements via tuples, e.g., to access Y_20(x) I would call Y[(2,0)]?
More generally does Julia allow arrays indexed by tuples (x1,...xn) if we don't know anything about the possible range of the xi (like a dictionary, but indexed by tuples of integers instead of strings)?

Short answer: that isn't what arrays are "for" in Julia; this is what Dict is for. In Julia (and many languages), what is generally meant by an array is something that is indexed by a series of contiguous integer values. (That said, you can implement your own type implementing the AbstractArray interface that might work differently...).
A Dict allows for any arbitrary set of indices, that can be any type you want, not just strings. For example:
Y = Dict()
Y[(2,0)] = "Hello, World"
println(Y[(2,0)])
For your particular problem there may be a more efficient solution, but I don't know enough about spherical harmonics to know what it would be. It would be worth looking at the package mentioned in the comments. It probably has a more idiomatic approach.
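
As a rough sketch of how the Dict approach might look for the (l, m) indexing in the question, the example below uses a typed dictionary so that keys and values have concrete types. The cutoff lmax and the stored values are placeholders invented for illustration; they are not real spherical harmonics.

```julia
# Keys are (l, m) tuples; values are complex numbers.
Y = Dict{Tuple{Int,Int},ComplexF64}()

lmax = 2   # hypothetical cutoff, chosen only for this illustration
for l in 0:lmax, m in -l:l
    # Placeholder value, NOT an actual spherical harmonic.
    Y[(l, m)] = complex(l, m)
end

Y[(2, 0)]              # look up the entry stored under (2, 0)
haskey(Y, (2, 3))      # false: m = 3 is outside -l:l for l = 2
get(Y, (5, 0), 0.0im)  # fallback value for a key that was never stored
```

Declaring the key and value types up front, instead of using a bare Dict(), lets Julia avoid storing everything as Any.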

Why [1:2] != Array[1:2]

I am learning Julia following the Wikibook, but I don't understand why the following two commands give different results:
julia> [1:2]
1-element Array{UnitRange{Int64},1}:
1:2
julia> Array[1:2]
1-element Array{Array,1}:
[1,2]
Apologies if there is an explanation I haven't seen in the Wikibook, I have looked briefly but didn't find one.

Type[a] runs convert on the elements, and there is a simple conversion from a Range to an Array (collect). So Array[1:2] converts 1:2 to an array, and then makes an array of objects like that. This is the same reason that Float64[1;2;3] is an array of Float64.
These previous parts answered the wrong thing. Oops...
a:b is not an array, it's a UnitRange. Why would you create an array for A = a:b? It only takes two numbers to store it, and you can calculate A[i] basically for free for any i. Using an array would take an amount of memory proportional to b-a, and thus for larger ranges would take a lot of time to allocate, whereas allocation for a UnitRange is essentially free.
These kinds of types in Julia are known as lazy iterators. LinSpace (called LinRange in Julia 1.0 and later) is another. Another interesting set of types are the special matrix types: why use more than a one-dimensional array to store a Diagonal? The UniformScaling operator acts as the identity matrix while only storing one value (its scale) to make A-kI efficient.
Since Julia has a robust type system, there is no reason to make all of these things arrays. Instead, you can make them specialized types which act (*, +, etc.) and index like arrays, but actually aren't. This makes them take less memory and be faster. If you ever need the array, just call collect(A) or full(A) (in Julia 1.0 and later, Array(A) or Matrix(A) replaces full(A)).
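
To make the lazy types described above a little more concrete, here is a small sketch using current (1.x) Julia names. The variable names and numbers are arbitrary choices for illustration only.

```julia
using LinearAlgebra   # standard library; provides Diagonal and I

r = 1:1_000_000       # stores only the two endpoints
r[10]                 # computed on demand, no array is allocated
v = collect(r)        # materialises a Vector{Int} with a million elements

D = Diagonal([1.0, 2.0])   # stores only the diagonal entries
M = Matrix(D)              # dense 2×2 matrix, for when one is really needed

A = [1.0 2.0; 3.0 4.0]
B = A - 2I                 # I is a UniformScaling; no identity matrix is built
```

Each of these objects indexes and participates in arithmetic like an ordinary array, and is only turned into a full array when collect or Matrix is called explicitly.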
I realized that you posted something a little more specific. The reason here is that Array[1:2] calls the getindex function for an array. This getindex function has a special dispatch on a Range so that it "acts like it's indexed by an array" (see the discussion from earlier). So that's "special-cased", but in actuality it just has dispatches to act like an array, just as it does with every other function. [A] gives an array of typeof(A) no matter what A is, so there's no magic here.
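
The difference between the two forms can be checked directly at the REPL. This is a minimal sketch; the variable names are arbitrary, and the types in the comments are what a recent 64-bit Julia reports.

```julia
r = 1:2

a = [r]        # the range itself becomes the single element
b = Array[r]   # the element is converted to an Array first

typeof(a)      # Vector{UnitRange{Int64}} on a 64-bit machine
typeof(b)      # Vector{Array}
b[1]           # [1, 2]

# The same mechanism with another element type:
c = Float64[1, 2, 3]   # the integers are converted to Float64
```

In both typed forms, the type written before the brackets fixes the element type, and each element is converted to it on the way in.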

Concatenate dataset arrays in matlab

Hi, I have many arrays of different lengths, and I want to create ONE long (1D) array out of all of them. Counterintuitively, vertcat gives me a dimension error, even though I do not see why the dimensions of my arrays should have to match.
Am I using vertcat wrong?

Your vectors are probably row vectors of different lengths (or matrices); vertcat requires every input to have the same number of columns. Suppose A to D are the arrays you want to create a 1D vector from. Try "flattening" each of them into a column with (:), and concatenate vertically thereafter, like this:
long_1D_vector = [A(:); B(:); C(:); D(:)];
The result is a column vector. You may transpose it if you want a row vector instead:
long_1D_vector = [A(:); B(:); C(:); D(:)].';

How to get mean, median, and other statistics over entire matrix, array or dataframe?

I know this is a basic question but for some strange reason I am unable to find an answer.
How should I apply basic statistical functions like mean, median, etc. over an entire array, matrix or dataframe to get a single value and not a vector over rows or columns?

Since this comes up a fair bit, I'm going to treat this a little more comprehensively, to include the 'etc.' piece in addition to mean and median.
For a matrix, or array, as the others have stated, mean and median will return a single value. However, var will compute the covariances between the columns of a two dimensional matrix. Interestingly, for a multi-dimensional array, var goes back to returning a single value. sd on a 2-d matrix will work, but is deprecated, returning the standard deviation of the columns. Even better, mad returns a single value on a 2-d matrix and a multi-dimensional array. If you want a single value returned, the safest route is to coerce using as.vector() first. Having fun yet?
For a data.frame, mean is deprecated, but will again act on the columns separately. median requires that you coerce to a vector first, or unlist. As before, var will return the covariances, and sd is again deprecated but will return the standard deviation of the columns. mad requires that you coerce to a vector or unlist. In general for a data.frame if you want something to act on all values, you generally will just unlist it first.
Edit: Late breaking news(): In R 3.0.0 mean.data.frame is defunctified:
o mean() for data frames and sd() for data frames and matrices are
defunct.

By default, mean, median, etc. work over an entire array or matrix.
E.g.:
# array:
m <- array(runif(100),dim=c(10,10))
mean(m) # returns *one* value.
# matrix:
mean(as.matrix(m)) # same as before
For data frames, you can coerce them to a matrix first (the reason this is by default over columns is because a dataframe can have columns with strings in it, which you can't take the mean of):
# data frame
mdf <- as.data.frame(m)
# mean(mdf) returns column means
mean( as.matrix(mdf) ) # one value.
Just be careful that your dataframe has all numeric columns before coercing to matrix. Or exclude the non-numeric ones.

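You can use the dplyr package (install it with install.packages('dplyr')) and then
library(dplyr)
dataframe.mean <- dataframe %>%
summarise_all(mean) # use median here instead to get medians
Note that this returns one value per column, not a single value for the whole data frame.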

How can I use any() on a multidimensional array?

I'm testing an arbitrarily-large, arbitrarily-dimensioned array of logicals, and I'd like to find out if any one or more of them are true. any() only works on a single dimension at a time, as does sum(). I know that I could test the number of dimensions and repeat any() until I get a single answer, but I'd like a quicker, and frankly, more-elegant, approach.
Ideas?
I'm running 2009a (R17, in the old parlance, I think).

If your data is in a matrix A, try this:
anyAreTrue = any(A(:));
EDIT: To explain a bit more for anyone not familiar with the syntax, A(:) uses the colon operator to take the entire contents of the array A, no matter what the dimensions, and reshape them into a single column vector (of size numel(A)-by-1). Only one call to ANY is needed to operate on the resulting column vector.

As pointed out, the correct solution is to reshape the array into a vector. Then any will give the desired result. Thus,
any(A(:))
gives the global result, true if any of numel(A) elements were true. You could also have used
any(reshape(A,[],1))
which uses the reshape operator explicitly. If you don't wish to do the extra step of converting your matrices into vectors to apply any, then another approach is to write a function of your own. For example, here is a function that would do it for you:
======================
function result = myany(A)
% determines if any element at all in A was non-zero
result = any(A(:));
======================
Save this as an m-file on your search path. The beauty of MATLAB (true for any programming language) is that it is fully extensible. If there is some capability that you wish it had, just write a little idiom that does it. If you do this often enough, you will have customized the environment to fit your needs.