OP UPDATE: Note that in the latest version of Julia (v0.5), the idiomatic approach to answering this question is to just define mysquare(x::Number) = x^2. The vectorised case is covered using automatic broadcasting, i.e. x = randn(5) ; mysquare.(x). See also the new answer explaining dot syntax in more detail.
I am new to Julia, and given my Matlab origins, I am having some difficulty determining how to write "good" Julia code that takes advantage of multiple dispatch and Julia's type system.
Consider the case where I have a function that provides the square of a Float64. I might write this as:
function mysquare(x::Float64)
return(x^2);
end
Sometimes, I want to square all the Float64s in a one-dimentional array, but don't want to write out a loop over mysquare everytime, so I use multiple dispatch and add the following:
function mysquare(x::Array{Float64, 1})
y = Array(Float64, length(x));
for k = 1:length(x)
y[k] = x[k]^2;
end
return(y);
end
But now I am sometimes working with Int64, so I write out two more functions that take advantage of multiple dispatch:
function mysquare(x::Int64)
return(x^2);
end
function mysquare(x::Array{Int64, 1})
y = Array(Float64, length(x));
for k = 1:length(x)
y[k] = x[k]^2;
end
return(y);
end
Is this right? Or is there a more ideomatic way to deal with this situation? Should I use type parameters like this?
function mysquare{T<:Number}(x::T)
return(x^2);
end
function mysquare{T<:Number}(x::Array{T, 1})
y = Array(Float64, length(x));
for k = 1:length(x)
y[k] = x[k]^2;
end
return(y);
end
This feels sensible, but will my code run as quickly as the case where I avoid parametric types?
In summary, there are two parts to my question:
If fast code is important to me, should I use parametric types as described above, or should I write out multiple versions for different concrete types? Or should I do something else entirely?
When I want a function that operates on arrays as well as scalars, is it good practice to write two versions of the function, one for the scalar, and one for the array? Or should I be doing something else entirely?
Finally, please point out any other issues you can think of in the code above as my ultimate goal here is to write good Julia code.
Julia compiles a specific version of your function for each set of inputs as required. Thus to answer part 1, there is no performance difference. The parametric way is the way to go.
As for part 2, it might be a good idea in some cases to write a separate version (sometimes for performance reasons, e.g., to avoid a copy). In your case however you can use the in-built macro #vectorize_1arg to automatically generate the array version, e.g.:
function mysquare{T<:Number}(x::T)
return(x^2)
end
#vectorize_1arg Number mysquare
println(mysquare([1,2,3]))
As for general style, don't use semicolons, and mysquare(x::Number) = x^2 is a lot shorter.
As for your vectorized mysquare, consider the case where T is a BigFloat. Your output array, however, is Float64. One way to handle this would be to change it to
function mysquare{T<:Number}(x::Array{T,1})
n = length(x)
y = Array(T, n)
for k = 1:n
#inbounds y[k] = x[k]^2
end
return y
end
where I've added the #inbounds macro to boost speed because we don't need to check the bound violation every time — we know the lengths. This function could still have issues in the event that the type of x[k]^2 isn't T. An even more defensive version would perhaps be
function mysquare{T<:Number}(x::Array{T,1})
n = length(x)
y = Array(typeof(one(T)^2), n)
for k = 1:n
#inbounds y[k] = x[k]^2
end
return y
end
where one(T) would give 1 if T is an Int, and 1.0 if T is a Float64, and so on. These considerations only matter if you want to make hyper-robust library code. If you really only will be dealing with Float64s or things that can be promoted to Float64s, then it isn't an issue. It seems like hard work, but the power is amazing. You can always just settle for Python-like performance and disregard all type information.
As of Julia 0.6 (c. June 2017), the "dot syntax" provides an easy and idiomatic way to apply a function to a scalar or an array.
You only need to provide the scalar version of the function, written in the normal way.
function mysquare{x::Number)
return(x^2)
end
Append a . to the function name (or preprend it to the operator) to call it on every element of an array:
x = [1 2 3 4]
x2 = mysquare(2) # 4
xs = mysquare.(x) # [1,4,9,16]
xs = mysquare.(x*x') # [1 4 9 16; 4 16 36 64; 9 36 81 144; 16 64 144 256]
y = x .+ 1 # [2 3 4 5]
Note that the dot-call will handle broadcasting, as in the last example.
If you have multiple dot-calls in the same expression, they will be fused so that y = sqrt.(sin.(x)) makes a single pass/allocation, instead of creating a temporary expression containing sin(x) and forwarding it to the sqrt() function. (This is different from Matlab/Numpy/Octave/Python/R, which don't make such a guarantee).
The macro #. vectorizes everything on a line, so #. y=sqrt(sin(x)) is the same as y = sqrt.(sin.(x)). This is particularly handy with polynomials, where the repeated dots can be confusing...
I'm seeing an issue when I try and reference an object property after having used a dot notation to apply a method.
it only occurs when I try to index the initial object
classdef myclassexample
properties
data
end
methods
function obj = procData(obj)
if numel(obj)>1
for i = 1:numel(obj)
obj(i) = obj(i).procData;
end
return
end
%do some processing
obj.data = abs(obj.data);
end
end
end
then assigning the following
A = myclassexample;
A(1).data= - -1;
A(2).data = -2;
when calling the whole array and collecting the property data it works fine
[A.procData.data]
if i try and index A then i only get a scalar out
[A([1 2]).procData.data]
even though it seems to do fine without the property call
B = A([1 2]).procData;
[B.data]
any ideas?
I would definitely call this a bug in the parser; A bug because it did not throw an error to begin with, and instead allowed you to write: obj.method.prop in the first place!
The fact that MATLAB crashed in some variations of this syntax is a serious bug, and should definitely be reported to MathWorks.
Now the general rule in MATLAB is that you should not "index into a result" directly. Instead, you should first save the result into a variable, and then index into that variable.
This fact is clear if you use the form func(obj) rather than obj.func() to invoke member methods for objects (dot-notation vs. function notation):
>> A = MyClass;
>> A.procData.data % or A.procData().data
ans =
[]
>> procData(A).data
Undefined variable "procData" or class "procData".
Instead, as you noted, you should use:
>> B = procData(A): % or: B = A.pocData;
>> [B.data]
FWIW, this is also what happens when working with plain structures and regular functions (as opposed to OOP objects and member functions), as you cannot index into the result of a function call anyway. Example:
% a function that works on structure scalar/arrays
function s = procStruct(s)
if numel(s) > 1
for i=1:numel(s)
s(i) = procStruct(s(i));
end
else
s.data = abs(s.data);
end
end
Then all the following calls will throw errors (as they should):
% 1x2 struct array
>> s = struct('data',{1 -2});
>> procStruct(s).data
Undefined variable "procStruct" or class "procStruct".
>> procStruct(s([1 2])).data
Undefined variable "procStruct" or class "procStruct".
>> feval('procStruct',s).data
Undefined variable "feval" or class "feval".
>> f=#procStruct; f(s([1 2])).data
Improper index matrix reference.
You might be asking yourself why they decided to not allow such syntax. Well it turns out there is a good reason why MATLAB does not allow indexing into a function call (without having to introduce a temporary variable that is), be it dot-indexing or subscript-indexing.
Take the following function for example:
function x = f(n)
if nargin == 0, n=3; end
x = magic(n);
end
If we allowed indexing into a function call, then there would be an ambiguity in how to interpret the following call f(4):
should it be interpreted as: f()(4) (that is call function with no arguments, then index into the resulting matrix using linear indexing to get the 4th element)
or should it interpreted as: f(4) (call the function with one argument being n=4, and return the matrix magic(4))
This confusion is caused by several things in the MATLAB syntax:
it allows calling function with no arguments simply by their name, without requiring the parentheses. If there is a function f.m, you can call it as either f or f(). This makes parsing M-code harder, because it is not clear whether tokens are variables or functions.
parentheses are used for both matrix indexing as well as function calls. So if a token x represents a variable, we use the syntax x(1,2) as indexing into the matrix. At the same time if x is the name of a function, then x(1,2) is used to call the function with two arguments.
Another point of confusion is comma-separated lists and functions that return multiple outputs. Example:
>> [mx,idx] = max(magic(3))
mx =
8 9 7
idx =
1 3 2
>> [mx,idx] = max(magic(3))(4) % now what?
Should we return the 4th element of each output variables from MAX, or 4th element from only the first output argument along with the full second output? What about when the function returns outputs of different sizes?
All of this still applies to the other types of indexing: f()(3)/f(3), f().x/f.x, f(){3}/f{3}.
Because of this, MathWorks decided avoid all the above confusion and simply not allow directly indexing into results. Unfortunately they limited the syntax in the process. Octave for example has no such restriction (you can write magic(4)(1,2)), but then again the new OOP system is still in the process of being developed, so I don't know how Octave deals with such cases.
For those interested, this reminds me of another similar bug with regards to packages and classes and directly indexing to get a property. The results were different whether you called it from the command prompt, from a script, or from a M-file function...
Is there any way to "vector" assign an array of struct.
Currently I can
edges(1000000) = struct('weight',1.0); //This really does not assign the value, I checked on 2009A.
for i=1:1000000; edges(i).weight=1.0; end;
But that is slow, I want to do something more like
edges(:).weight=[rand(1000000,1)]; //with or without the square brackets.
Any ideas/suggestions to vectorize this assignment, so that it will be faster.
Thanks in advance.
This is much faster than deal or a loop (at least on my system):
N=10000;
edge(N) = struct('weight',1.0); % initialize the array
values = rand(1,N); % set the values as a vector
W = mat2cell(values, 1,ones(1,N)); % convert values to a cell
[edge(:).weight] = W{:};
Using curly braces on the right gives a comma separated value list of all the values in W (i.e. N outputs) and using square braces on the right assigns those N outputs to the N values in edge(:).weight.
You can try using the Matlab function deal, but I found it requires to tweak the input a little (using this question: In Matlab, for a multiple input function, how to use a single input as multiple inputs?), maybe there is something simpler.
n=100000;
edges(n)=struct('weight',1.0);
m=mat2cell(rand(n,1),ones(n,1),1);
[edges(:).weight]=deal(m{:});
Also I found that this is not nearly as fast as the for loop on my computer (~0.35s for deal versus ~0.05s for the loop) presumably because of the call to mat2cell. The difference in speed is reduced if you use this more than once but it stays in favor of the for loop.
You could simply write:
edges = struct('weight', num2cell(rand(1000000,1)));
Is there something requiring you to particularly use a struct in this way?
Consider replacing your array of structs with simply a separate array for each member of the struct.
weights = rand(1, 1000);
If you have a struct member which is an array, you can make an extra dimension:
matrices = rand(3, 3, 1000);
If you just want to keep things neat, you could put these arrays into a struct:
edges.weights = weights;
edges.matrices = matrices;
But if you need to keep an array of structs, I think you can do
[edges.weight] = rand(1, 1000);
The reason that the structs in your example don't get initialized properly is that the syntax you're using only addresses the very last element in the struct array. For a nonexistent array, the rest of them get implicitly filled in with structs that have the default value [] in all their fields.
To make this behavior clear, try doing a short array with clear edges; edges(1:3) = struct('weight',1.0) and looking at each of edges(1), edges(2), and edges(3). The edges(3) element has 1.0 in its weight like you want; the others have [].
The syntax for efficiently initializing an array of structs is one of these.
% Using repmat and full assignment
edges = repmat(struct('weight', 1.0), [1 1000]);
% Using indexing
% NOTE: Only correct if variable is uninitialized!!!
edges(1:1000) = struct('weight', 1.0); % QUESTIONABLE
Note the 1:1000 instead of just 1000 when indexing in to the uninitialized edges array.
There's a problem with the edges(1:1000) form: if edges is already initialized, this syntax will just update the values of selected elements. If edges has more than 1000 elements, the others will be left unchanged, and your code will be buggy. Or if edges is a different type, you could get an error or weird behavior depending on its existing datatype. To be safe, you need to do clear edges before initializing using the indexing syntax. So it's better to just do full assignment with the repmat form.
BUT: Regardless of how you initialize it, an array-of-structs like this is always going to be inherently slow to work with for larger data sets. You can't do real "vectorized" operations on it because your primitive arrays are all broken up in to separate mxArrays inside each struct element. That includes the field assignment in your question – it is not possible to vectorize that. Instead, you should switch a struct-of-arrays like Brian L's answer suggests.
You can use a reverse struct and then do all operations without any errors
like this
x.E(1)=1;
x.E(2)=3;
x.E(2)=8;
x.E(3)=5;
and then the operation like the following
x.E
ans =
3 8 5
or like this
x.E(1:2)=2
x =
E: [2 2 5]
or maybe this
x.E(1:3)=[2,3,4]*5
x =
E: [10 15 20]
It is really faster than for_loop and you do not need other big functions to slow your program.
I'd like something like this:
each[i_, {1,2,3},
Print[i]
]
Or, more generally, to destructure arbitrary stuff in the list you're looping over, like:
each[{i_, j_}, {{1,10}, {2,20}, {3,30}},
Print[i*j]
]
Usually you want to use Map or other purely functional constructs and eschew a non-functional programming style where you use side effects. But here's an example where I think a for-each construct is supremely useful:
Say I have a list of options (rules) that pair symbols with expressions, like
attrVals = {a -> 7, b -> 8, c -> 9}
Now I want to make a hash table where I do the obvious mapping of those symbols to those numbers. I don't think there's a cleaner way to do that than
each[a_ -> v_, attrVals, h[a] = v]
Additional test cases
In this example, we transform a list of variables:
a = 1;
b = 2;
c = 3;
each[i_, {a,b,c}, i = f[i]]
After the above, {a,b,c} should evaluate to {f[1],f[2],f[3]}. Note that that means the second argument to each should be held unevaluated if it's a list.
If the unevaluated form is not a list, it should evaluate the second argument. For example:
each[i_, Rest[{a,b,c}], Print[i]]
That should print the values of b and c.
Addendum: To do for-each properly, it should support Break[] and Continue[]. I'm not sure how to implement that. Perhaps it will need to somehow be implemented in terms of For, While, or Do since those are the only loop constructs that support Break[] and Continue[].
And another problem with the answers so far: they eat Return[]s. That is, if you are using a ForEach loop in a function and want to return from the function from within the loop, you can't. Issuing Return inside the ForEach loop seems to work like Continue[]. This just (wait for it) threw me for a loop.
I'm years late to the party here, and this is perhaps more an answer to the "meta-question", but something many people initially have a hard time with when programming in Mathematica (or other functional languages) is approaching a problem from a functional rather than structural viewpoint. The Mathematica language has structural constructs, but it's functional at its core.
Consider your first example:
ForEach[i_, {1,2,3},
Print[i]
]
As several people pointed out, this can be expressed functionally as Scan[Print, {1,2,3}] or Print /# {1,2,3} (although you should favor Scan over Map when possible, as previously explained, but that can be annoying at times since there is no infix operator for Scan).
In Mathematica, there's usually a dozen ways to do everything, which is sometimes beautiful and sometimes frustrating. With that in mind, consider your second example:
ForEach[{i_, j_}, {{1,10}, {2,20}, {3,30}},
Print[i*j]
]
... which is more interesting from a functional point of view.
One possible functional solution is to instead use list replacement, e.g.:
In[1]:= {{1,10},{2,20},{3,30}}/.{i_,j_}:>i*j
Out[1]= {10,40,90}
...but if the list was very large, this would be unnecessarily slow since we are doing so-called "pattern matching" (e.g., looking for instances of {a, b} in the list and assigning them to i and j) unnecessarily.
Given a large array of 100,000 pairs, array = RandomInteger[{1, 100}, {10^6, 2}], we can look at some timings:
Rule-replacement is pretty quick:
In[3]:= First[Timing[array /. {i_, j_} :> i*j;]]
Out[3]= 1.13844
... but we can do a little better if we take advantage of the expression structure where each pair is really List[i,j] and apply Times as the head of each pair, turning each {i,j} into Times[i,j]:
In[4]:= (* f###list is the infix operator form of Apply[f, list, 1] *)
First[Timing[Times ### array;]]
Out[4]= 0.861267
As used in the implementation of ForEach[...] above, Cases is decidedly suboptimal:
In[5]:= First[Timing[Cases[array, {i_, j_} :> i*j];]]
Out[5]= 2.40212
... since Cases does more work than just the rule replacement, having to build an output of matching elements one-by-one. It turns out we can do a lot better by decomposing the problem differently, and take advantage of the fact that Times is Listable, and supports vectorized operation.
The Listable attribute means that a function f will automatically thread over any list arguments:
In[16]:= SetAttributes[f,Listable]
In[17]:= f[{1,2,3},{4,5,6}]
Out[17]= {f[1,4],f[2,5],f[3,6]}
So, since Times is Listable, if we instead had the pairs of numbers as two separate arrays:
In[6]:= a1 = RandomInteger[{1, 100}, 10^6];
a2 = RandomInteger[{1, 100}, 10^6];
In[7]:= First[Timing[a1*a2;]]
Out[7]= 0.012661
Wow, quite a bit faster! Even if the input wasn't provided as two separate arrays (or you have more than two elements in each pair,) we can still do something optimal:
In[8]:= First[Timing[Times##Transpose[array];]]
Out[8]= 0.020391
The moral of this epic is not that ForEach isn't a valuable construct in general, or even in Mathematica, but that you can often obtain the same results more efficiently and more elegantly when you work in a functional mindset, rather than a structural one.
Newer versions of Mathematica (6.0+) have generalized versions of Do[] and Table[] that do almost precisely what you want, by taking an alternate form of iterator argument. For instance,
Do[
Print[i],
{i, {1, 2, 3}}]
is exactly like your
ForEach[i_, {1, 2, 3,},
Print[i]]
Alterntatively, if you really like the specific ForEach syntax, you can make a HoldAll function that implements it, like so:
Attributes[ForEach] = {HoldAll};
ForEach[var_Symbol, list_, expr_] :=
ReleaseHold[
Hold[
Scan[
Block[{var = #},
expr] &,
list]]];
ForEach[vars : {__Symbol}, list_, expr_] :=
ReleaseHold[
Hold[
Scan[
Block[vars,
vars = #;
expr] &,
list]]];
This uses symbols as variable names, not patterns, but that's how the various built-in control structures like Do[] and For[] work.
HoldAll[] functions allow you to put together a pretty wide variety of custom control structures. ReleaseHold[Hold[...]] is usually the easiest way to assemble a bunch of Mathematica code to be evaluated later, and Block[{x = #}, ...]& allows variables in your expression body to be bound to whatever values you want.
In response to dreeves' question below, you can modify this approach to allow for more arbitrary destructuring using the DownValues of a unique symbol.
ForEach[patt_, list_, expr_] :=
ReleaseHold[Hold[
Module[{f},
f[patt] := expr;
Scan[f, list]]]]
At this point, though, I think you may be better off building something on top of Cases.
ForEach[patt_, list_, expr_] :=
With[{bound = list},
ReleaseHold[Hold[
Cases[bound,
patt :> expr];
Null]]]
I like making Null explicit when I'm suppressing the return value of a function. EDIT: I fixed the bug pointed out be dreeves below; I always like using With to interpolate evaluated expressions into Hold* forms.
The built-in Scan basically does this, though it's uglier:
Scan[Print[#]&, {1,2,3}]
It's especially ugly when you want to destructure the elements:
Scan[Print[#[[1]] * #[[2]]]&, {{1,10}, {2,20}, {3,30}}]
The following function avoids the ugliness by converting pattern to body for each element of list.
SetAttributes[ForEach, HoldAll];
ForEach[pat_, lst_, bod_] := Scan[Replace[#, pat:>bod]&, Evaluate#lst]
which can be used as in the example in the question.
PS: The accepted answer induced me to switch to this, which is what I've been using ever since and it seems to work great (except for the caveat I appended to the question):
SetAttributes[ForEach, HoldAll]; (* ForEach[pattern, list, body] *)
ForEach[pat_, lst_, bod_] := ReleaseHold[ (* converts pattern to body for *)
Hold[Cases[Evaluate#lst, pat:>bod];]]; (* each element of list. *)
The built-in Map function does exactly what you want. It can be used in long form:
Map[Print, {1,2,3}]
or short-hand
Print /# {1,2,3}
In your second case, you'd use "Print[Times###]&/#{{1,10}, {2,20}, {3,30}}"
I'd recommend reading the Mathematica help on Map, MapThread, Apply, and Function. They can take bit of getting used to, but once you are, you'll never want to go back!
Here is a slight improvement based on the last answer of dreeves that allows to specify the pattern without Blank (making the syntax similar to other functions like Table or Do) and that uses the level argument of Cases
SetAttributes[ForEach,HoldAll];
ForEach[patt_/; FreeQ[patt, Pattern],list_,expr_,level_:1] :=
Module[{pattWithBlanks,pattern},
pattWithBlanks = patt/.(x_Symbol/;!MemberQ[{"System`"},Context[x]] :> pattern[x,Blank[]]);
pattWithBlanks = pattWithBlanks/.pattern->Pattern;
Cases[Unevaluated#list, pattWithBlanks :> expr, {level}];
Null
];
Tests:
ForEach[{i, j}, {{1, 10}, {2, 20}, {3, 30}}, Print[i*j]]
ForEach[i, {{1, 10}, {2, 20}, {3, 30}}, Print[i], 2]
Mathematica have map functions, so lets say you have a function Functaking one argument. Then just write
Func /# list
Print /# {1, 2, 3, 4, 5}
The return value is a list of the function applied to each element in the in-list.
PrimeQ /# {10, 2, 123, 555}
will return {False,True,False,False}
Thanks to Pillsy and Leonid Shifrin, here's what I'm now using:
SetAttributes[each, HoldAll]; (* each[pattern, list, body] *)
each[pat_, lst_List, bod_] := (* converts pattern to body for *)
(Cases[Unevaluated#lst, pat:>bod]; Null); (* each element of list. *)
each[p_, l_, b_] := (Cases[l, p:>b]; Null); (* (Break/Continue not supported) *)