Non-scalar enumeration in Matlab

Non-scalar enumeration in Matlab - arrays

Is it possible to have enumeration members that are non-scalar?
For example, how can I enumerate colors such that each color is a 1x3 double (as needed for plots), without using methods?
With the following class definition
classdef color
properties
R, G, B
end
methods
function c = color(r, g, b)
c.R = r;
c.G = g;
c.B = b;
end
function g = get(c)
g = [c.R, c.G, c.B];
end
end
enumeration
red (1, 0, 0)
green (0, 1, 0)
end
end
I can write color.green.get() to get [0 1 0], but I would like the same result with color.green to make the code cleaner.
A different solution may be setting color as a global struct, but it's not practical because global variables can cause confusion and I have to write global color; in each script/function.

I'm not sure exactly what you're asking here, but I think the main answer is that you're currently doing basically the right thing (although I'd suggest a few small changes).
You can certainly have non-scalar arrays of enumeration values - using your class, for example, you could create mycolors = [color.red, color.green]. You can also have an enumeration with non-scalar properties, such as the following:
classdef color2
properties
RGB
end
methods
function c = color2(r, g, b)
c.RGB = [r,g,b];
end
end
enumeration
red (1, 0, 0)
green (0, 1, 0)
end
end
and then you could just say color2.red.RGB and you'd get [1,0,0].
But I'm guessing that neither of those is really what you want. The thing that I imagine you're aiming for, and unfortunately what you explicitly can't do, is something like:
classdef color3 < double
enumeration
red ([1,0,0])
green ([0,1,0])
end
end
where you would then just type color3.red and you'd get [1,0,0]. You can't do that, because when an enumeration inherits from a built-in, it has to be a scalar.
Personally, I would do basically what you're doing, but instead of calling your method get I would call it toRGB, so you'd say color.red.toRGB, which feels quite natural (especially if you also give it some other methods like toHSV or toHex as well). I'd also modify it slightly, so that it could accept arrays of colors:
function rgb = toRGB(c)
rgb = [[c.R]', [c.G]', [c.B]'];
end
That way you can pass in an array of n colors, and it will output an n-by-3 array of RGB values. For example, you could say mycolors = [color.red, color.green]; mycolors.toRGB and you'd get [1,0,0;0,1,0].
Hope that helps!

Related

How do I set a vector's elements to point to the first element in an array of arrays?

I've been learning Julia by trying to write a simple rigid body simulation, but I'm still somewhat confused about the assignment and mutating of variables.
I'm storing the points making up the shape of a body into an array of arrays where one vector holds the x,y,z coordinates of a point. For plotting the body with PyPlot the points are first transformed from local coordinates into world coordinates and then assigned to three arrays which hold the x, y, and z coordinates for the points respectively. I would like to have the three arrays only reference the array of arrays values instead of having copies of the values.
The relevant part of my code looks like this
type Rigidbody
n::Integer
k::Integer
bodyXYZ::Array{Array{Float64,1},2}
worldXYZ::Array{Array{Float64,1},2}
worldX::Array{Float64,2}
worldY::Array{Float64,2}
worldZ::Array{Float64,2}
Rotmat::Array{Float64,2}
x::Array{Float64,1}
end
# body.worldXYZ[1,1] = [x; y; z]
# and body.worldX[1,1] should be body.worldXYZ[1,1][1]
function body_to_world(body::Rigidbody)
for j in range(1, body.k)
for i in range(1, body.n)
body.worldXYZ[i,j] = body.x + body.Rotmat*body.bodyXYZ[i,j]
body.worldX[i,j] = body.worldXYZ[i,j][1]
body.worldY[i,j] = body.worldXYZ[i,j][2]
body.worldZ[i,j] = body.worldXYZ[i,j][3]
end
end
return nothing
end
After calling the body_to_world() and checking the elements with === they evaluate to true but if I then for example set
body.worldXYZ[1,1][1] = 99.999
the change is not reflected in body.worldX. The problem is probably something trivial but as can be seen from my code, I am a beginner and could use some help.

body.worldX[i,j] = body.worldXYZ[i,j][1]
You're setting a number to a number here. Numbers are not mutable, so body.worldX[i,j] won't refer back to body.worldXYZ[i,j][1]. What you're thinking of is that the value of an array will be a reference, but numbers don't have references, just the values themselves.
However, I would venture to say that if you're doing something like that, you're going about the problem wrong. You should probably be using types somewhere. Remember, types in Julia give good performance, so don't be afraid of them (and immutable types should be almost perfectly optimized after carneval's PR, so there's really no need to be afraid). Instead, I would make world::Array{Point,2} where
immutable Point{T}
x::T
y::T
z::T
end
Then you can get body.world[i,j].x for the x coordinate, etc. And then for free you can use map((i,j)->Ref(body.world[i,j].x),size(body.world)...) to get an array of references to the x's.
Or, you should be adding dispatches to your type. For example
import Base: size
size(body::Rigidbody) = (body.n, body.k)
now size(body) outputs (n,k), as though it's an array. You can complete the array interface with getindex and setindex!. This kind of adding dispatches to your type will help clean up the code immensely.

How to write "good" Julia code when dealing with multiple types and arrays (multiple dispatch)

OP UPDATE: Note that in the latest version of Julia (v0.5), the idiomatic approach to answering this question is to just define mysquare(x::Number) = x^2. The vectorised case is covered using automatic broadcasting, i.e. x = randn(5) ; mysquare.(x). See also the new answer explaining dot syntax in more detail.
I am new to Julia, and given my Matlab origins, I am having some difficulty determining how to write "good" Julia code that takes advantage of multiple dispatch and Julia's type system.
Consider the case where I have a function that provides the square of a Float64. I might write this as:
function mysquare(x::Float64)
return(x^2);
end
Sometimes, I want to square all the Float64s in a one-dimensional array, but don't want to write out a loop over mysquare every time, so I use multiple dispatch and add the following:
function mysquare(x::Array{Float64, 1})
y = Array(Float64, length(x));
for k = 1:length(x)
y[k] = x[k]^2;
end
return(y);
end
But now I am sometimes working with Int64, so I write out two more functions that take advantage of multiple dispatch:
function mysquare(x::Int64)
return(x^2);
end
function mysquare(x::Array{Int64, 1})
y = Array(Float64, length(x));
for k = 1:length(x)
y[k] = x[k]^2;
end
return(y);
end
Is this right? Or is there a more idiomatic way to deal with this situation? Should I use type parameters like this?
function mysquare{T<:Number}(x::T)
return(x^2);
end
function mysquare{T<:Number}(x::Array{T, 1})
y = Array(Float64, length(x));
for k = 1:length(x)
y[k] = x[k]^2;
end
return(y);
end
This feels sensible, but will my code run as quickly as the case where I avoid parametric types?
In summary, there are two parts to my question:
1. If fast code is important to me, should I use parametric types as described above, or should I write out multiple versions for different concrete types? Or should I do something else entirely?
2. When I want a function that operates on arrays as well as scalars, is it good practice to write two versions of the function, one for the scalar, and one for the array? Or should I be doing something else entirely?
Finally, please point out any other issues you can think of in the code above as my ultimate goal here is to write good Julia code.

Julia compiles a specific version of your function for each set of inputs as required. Thus to answer part 1, there is no performance difference. The parametric way is the way to go.
As for part 2, it might be a good idea in some cases to write a separate version (sometimes for performance reasons, e.g., to avoid a copy). In your case however you can use the in-built macro @vectorize_1arg to automatically generate the array version, e.g.:
function mysquare{T<:Number}(x::T)
return(x^2)
end
@vectorize_1arg Number mysquare
println(mysquare([1,2,3]))
As for general style, don't use semicolons, and mysquare(x::Number) = x^2 is a lot shorter.
As for your vectorized mysquare, consider the case where T is a BigFloat. Your output array, however, is Float64. One way to handle this would be to change it to
function mysquare{T<:Number}(x::Array{T,1})
n = length(x)
y = Array(T, n)
for k = 1:n
@inbounds y[k] = x[k]^2
end
return y
end
where I've added the @inbounds macro to boost speed: we know the lengths, so there is no need to check for a bounds violation on every access. This function could still have issues in the event that the type of x[k]^2 isn't T. An even more defensive version would perhaps be
function mysquare{T<:Number}(x::Array{T,1})
n = length(x)
y = Array(typeof(one(T)^2), n)
for k = 1:n
@inbounds y[k] = x[k]^2
end
return y
end
where one(T) would give 1 if T is an Int, and 1.0 if T is a Float64, and so on. These considerations only matter if you want to make hyper-robust library code. If you really only will be dealing with Float64s or things that can be promoted to Float64s, then it isn't an issue. It seems like hard work, but the power is amazing. You can always just settle for Python-like performance and disregard all type information.

As of Julia 0.6 (c. June 2017), the "dot syntax" provides an easy and idiomatic way to apply a function to a scalar or an array.
You only need to provide the scalar version of the function, written in the normal way.
function mysquare(x::Number)
return(x^2)
end
Append a . to the function name (or prepend it to the operator) to call it on every element of an array:
x = [1 2 3 4]
x2 = mysquare(2) # 4
xs = mysquare.(x) # [1 4 9 16]
xs = mysquare.(x'*x) # [1 4 9 16; 4 16 36 64; 9 36 81 144; 16 64 144 256]
y = x .+ 1 # [2 3 4 5]
Note that the dot-call will handle broadcasting, as in the last example.
If you have multiple dot-calls in the same expression, they will be fused so that y = sqrt.(sin.(x)) makes a single pass/allocation, instead of creating a temporary array containing sin.(x) and passing it to the sqrt() function. (This is different from Matlab/Numpy/Octave/Python/R, which don't make such a guarantee).
The macro @. vectorizes everything on a line, so @. y=sqrt(sin(x)) is the same as y = sqrt.(sin.(x)). This is particularly handy with polynomials, where the repeated dots can be confusing...

Smart and Fast Indexing of multi-dimensional array with R

This is another step of my battle with multi-dimensional arrays in R, previous question is here :)
I have a big R array with the following dimensions:
> data = array(..., dim = c(x, y, N, value))
I'd like to perform a sort of bootstrap comparing the mean (see here for a discussion about it) obtained with:
> vmean = apply(data, c(1,2,3), mean)
With the mean obtained by sampling the N values randomly with replacement. To explain better: if data[1,1,,1] is equal to [v1 v2 v3 ... vN], I'd like to replace it with something like [v_k1 v_k2 v_k3 ... v_kN], with the k indices drawn by sample(N, N, replace = T).
Of course I want to AVOID a for loop. I've read this but I don't know how to perform an efficient indexing of this array avoiding a loop through x and y.
Any ideas?
UPDATE: the important thing here is that I want a different sample for each sample in the fourth (value) dimension, otherwise it would be simple to do something like:
> dataSample = data[,,sample(N, N, replace = T), ]

Also there's the compiler package which speeds up for loops by using a Just In Time compiler.
Adding these lines at the top of your code enables the compiler for all code.
require("compiler")
compilePKGS(enable=T)
enableJIT(3)
setCompilerOptions(suppressAll=T)

Implementing chained iterators in a Ruby C extension

I see that there's a relatively new feature in Ruby which allows chained iteration -- in other words, instead of each_with_indices { |x,i,j| ... } you might do each.with_indices { |x,i,j| ... }, where #each returns an Enumerator object, and Enumerator#with_indices causes the additional yield parameters to be included.
So, Enumerator has its own method #with_index, presumably for one-dimensional objects, source found here. But I can't figure out the best way to adapt this to other objects.
To be clear, and in response to comments: Ruby doesn't have an #each_with_indices right now -- it's only got an #each_with_index. (That's why I want to create one.)
A series of questions, themselves chained:
1. How would one adapt chained iteration to a one-dimensional object? Simply do an include Enumerable?
2. Presumably the above (#1) would not work for an n-dimensional object. Would one create an EnumerableN class, derived from Enumerable, but with #with_index converted into #with_indices?
3. Can #2 be done for Ruby extensions written in C? For example, I have a matrix class which stores various types of data (floats, doubles, integers, sometimes regular Ruby objects, etc.). Enumeration needs to check the data type (dtype) first as per the example below.
Example:
VALUE nm_dense_each(VALUE nmatrix) {
volatile VALUE nm = nmatrix; // Not sure this actually does anything.
DENSE_STORAGE* s = NM_STORAGE_DENSE(nm); // get the storage pointer
RETURN_ENUMERATOR(nm, 0, 0);
if (NM_DTYPE(nm) == nm::RUBYOBJ) { // matrix stores VALUEs
// matrix of Ruby objects -- yield those objects directly
for (size_t i = 0; i < nm_storage_count_max_elements(s); ++i)
rb_yield( reinterpret_cast<VALUE*>(s->elements)[i] );
} else { // matrix stores non-Ruby data (int, float, etc)
// We're going to copy the matrix element into a Ruby VALUE and then operate on it. This way user can't accidentally
// modify it and cause a seg fault.
for (size_t i = 0; i < nm_storage_count_max_elements(s); ++i) {
// rubyobj_from_cval() converts any type of data into a VALUE using macros such as INT2FIX()
VALUE v = rubyobj_from_cval((char*)(s->elements) + i*DTYPE_SIZES[NM_DTYPE(nm)], NM_DTYPE(nm)).rval;
rb_yield( v ); // yield to the copy we made
}
}
return nm; // a VALUE function must return something; #each conventionally returns the receiver
}
So, to combine my three questions into one: How would I write, in C, a #with_indices to chain onto the NMatrix#each method above?
I don't particularly want anyone to feel like I'm asking them to code this for me, though if you did want to, we'd love to have you involved in our project. =)
But if you know of some example elsewhere on the web of how this is done, that'd be perfect -- or if you could just explain in words, that'd be lovely too.

#with_index is a method of Enumerator: http://ruby-doc.org/core-1.9.3/Enumerator.html#method-i-with_index
I suppose you could make a subclass of Enumerator that has #with_indices and have your #each return an instance of that class? That's the first thing that comes to mind, although your enumerator might have to be pretty coupled to the originating class...
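A rough sketch of that idea, in plain Ruby rather than C, might look like the following. MatrixEnumerator, TinyMatrix and unravel are hypothetical names made up for illustration; they are not part of NMatrix or of Ruby itself. The shape is handed to the enumerator explicitly, which is exactly the coupling mentioned above.

```ruby
# Hypothetical Enumerator subclass that remembers the shape of the matrix it
# came from, so it can turn a flat position into one index per dimension.
class MatrixEnumerator < Enumerator
  def initialize(shape, &block)
    @shape = shape
    super(&block)
  end

  # Yields each element followed by its indices, e.g. |x, i, j| for 2 dims.
  def with_indices
    return to_enum(:with_indices) unless block_given?
    each_with_index do |element, flat|
      yield element, *unravel(flat)
    end
  end

  private

  # Row-major conversion of a flat position into per-dimension indices.
  def unravel(flat)
    indices = []
    @shape.reverse_each do |dim|
      indices.unshift(flat % dim)
      flat /= dim
    end
    indices
  end
end

# Hypothetical stand-in for a matrix class, backed by nested Ruby arrays.
class TinyMatrix
  def initialize(rows)
    @rows = rows
  end

  def shape
    [@rows.size, @rows.first.size]
  end

  # Without a block, returns the custom enumerator so calls can be chained.
  def each(&block)
    elements = @rows.flatten
    enum = MatrixEnumerator.new(shape) { |y| elements.each { |e| y << e } }
    block ? enum.each(&block) : enum
  end
end

m = TinyMatrix.new([[10, 20], [30, 40]])
m.each.with_indices { |x, i, j| puts "#{x} at (#{i}, #{j})" }
```

A C extension would presumably build the same object through the C API instead of Enumerator.new, but the division of labour would stay the same: #each decides which enumerator class to return, and that class owns #with_indices.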

Since you say that you are also interested in Ruby linguistics, not just C, let me contribute my 5 cents, without claiming to actually answer the question. #each_with_index and #with_index have already become so idiomatic that most people rely on the index being a number. Therefore, if you implement your NMatrix#each_with_index in such a way that, in the block { |e, i| ... }, it supplies e.g. arrays [0, 0], [0, 1], [0, 2], [1, 0], [1, 1], ... as index i, you would surprise people. Also, if others chain your NMatrix#each enumerator with the #with_index method, they will receive just a single number as index. So, indeed, you are right to conclude that you need a distinct method to take care of the 2-index case (or, more generally, n indices for higher-dimensional matrices):
matrix.each_with_indices { |e, indices| ... }
This method should yield the indices as a single 2-element (in general, n-element) array, indices == [i, j]. You should not go for the version:
matrix.each_with_indices { |e, i, j| ... }
As for the #with_index method, it is not your concern at all. If your NMatrix provides #each method (which it certainly does), then #with_index will work normally with it, out of your control. And you do not need to ponder about introducing matrix-specific #with_indices, because #each itself is not really specific to matrices, but to one-dimensional ordered collections of any sort. Finally, sorry for not being a skilled C programmer to cater to your C-related part of the question.
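To make the distinction concrete, here is a minimal pure-Ruby sketch. Grid is a hypothetical stand-in for a matrix class and is not NMatrix; it only assumes a nested-array backing store. It shows #each_with_indices yielding the indices as one array, while the stock #with_index chained onto #each still hands out a single flat number.

```ruby
# Hypothetical two-dimensional container, used only for illustration.
class Grid
  include Enumerable

  def initialize(rows)
    @rows = rows
  end

  # Plain element-wise iteration; returns an Enumerator when no block is given.
  def each(&block)
    return enum_for(:each) unless block
    @rows.each { |row| row.each(&block) }
    self
  end

  # Yields each element together with its indices packed into one array.
  def each_with_indices
    return enum_for(:each_with_indices) unless block_given?
    @rows.each_with_index do |row, i|
      row.each_with_index { |e, j| yield e, [i, j] }
    end
    self
  end
end

grid = Grid.new([[:a, :b], [:c, :d]])

p grid.each_with_indices.to_a
# => [[:a, [0, 0]], [:b, [0, 1]], [:c, [1, 0]], [:d, [1, 1]]]

p grid.each.with_index.to_a
# => [[:a, 0], [:b, 1], [:c, 2], [:d, 3]]
```

Because the indices arrive as one array, the same block signature { |e, indices| ... } works unchanged for any number of dimensions.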

Matlab array of struct : Fast assignment

Is there any way to "vector" assign an array of structs?
Currently I can
edges(1000000) = struct('weight',1.0); % This does not actually assign the value; I checked on 2009A.
for i=1:1000000; edges(i).weight=1.0; end;
But that is slow, I want to do something more like
edges(:).weight=[rand(1000000,1)]; % with or without the square brackets.
Any ideas/suggestions to vectorize this assignment, so that it will be faster?
Thanks in advance.

This is much faster than deal or a loop (at least on my system):
N=10000;
edge(N) = struct('weight',1.0); % initialize the array
values = rand(1,N); % set the values as a vector
W = mat2cell(values, 1,ones(1,N)); % convert values to a cell
[edge(:).weight] = W{:};
Using curly braces on the right gives a comma-separated list of all the values in W (i.e. N outputs), and using square brackets on the left assigns those N outputs to the N values in edge(:).weight.

You can try using the Matlab function deal, but I found that it requires tweaking the input a little (using this question: In Matlab, for a multiple input function, how to use a single input as multiple inputs?). Maybe there is something simpler.
n=100000;
edges(n)=struct('weight',1.0);
m=mat2cell(rand(n,1),ones(n,1),1);
[edges(:).weight]=deal(m{:});
Also I found that this is not nearly as fast as the for loop on my computer (~0.35s for deal versus ~0.05s for the loop) presumably because of the call to mat2cell. The difference in speed is reduced if you use this more than once but it stays in favor of the for loop.

You could simply write:
edges = struct('weight', num2cell(rand(1000000,1)));

Is there something requiring you to particularly use a struct in this way?
Consider replacing your array of structs with simply a separate array for each member of the struct.
weights = rand(1, 1000);
If you have a struct member which is an array, you can make an extra dimension:
matrices = rand(3, 3, 1000);
If you just want to keep things neat, you could put these arrays into a struct:
edges.weights = weights;
edges.matrices = matrices;
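But if you need to keep an array of structs, I think you can do the following, assuming edges is already a 1-by-1000 struct array. The values have to be expanded through a cell array, because a plain numeric array on the right-hand side supplies only one output:
w = num2cell(rand(1, 1000));
[edges.weight] = w{:};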

The reason that the structs in your example don't get initialized properly is that the syntax you're using only addresses the very last element in the struct array. For a nonexistent array, the rest of them get implicitly filled in with structs that have the default value [] in all their fields.
To make this behavior clear, try doing a short array with clear edges; edges(1:3) = struct('weight',1.0) and looking at each of edges(1), edges(2), and edges(3). The edges(3) element has 1.0 in its weight like you want; the others have [].
The syntax for efficiently initializing an array of structs is one of these.
% Using repmat and full assignment
edges = repmat(struct('weight', 1.0), [1 1000]);
% Using indexing
% NOTE: Only correct if variable is uninitialized!!!
edges(1:1000) = struct('weight', 1.0); % QUESTIONABLE
Note the 1:1000 instead of just 1000 when indexing in to the uninitialized edges array.
There's a problem with the edges(1:1000) form: if edges is already initialized, this syntax will just update the values of selected elements. If edges has more than 1000 elements, the others will be left unchanged, and your code will be buggy. Or if edges is a different type, you could get an error or weird behavior depending on its existing datatype. To be safe, you need to do clear edges before initializing using the indexing syntax. So it's better to just do full assignment with the repmat form.
BUT: Regardless of how you initialize it, an array-of-structs like this is always going to be inherently slow to work with for larger data sets. You can't do real "vectorized" operations on it because your primitive arrays are all broken up into separate mxArrays inside each struct element. That includes the field assignment in your question: it is not possible to vectorize that. Instead, you should switch to a struct-of-arrays like Brian L's answer suggests.

You can use a reversed struct (a single struct whose field holds the whole array) and then do all operations without any errors, like this:
x.E(1)=1;
x.E(2)=3;
x.E(2)=8;
x.E(3)=5;
and then the operation like the following
x.E
ans =
1 8 5
or like this
x.E(1:2)=2
x =
E: [2 2 5]
or maybe this
x.E(1:3)=[2,3,4]*5
x =
E: [10 15 20]
It is much faster than a for loop, and you do not need other heavy functions that slow your program down.