I'd like something like this:
each[i_, {1,2,3},
Print[i]
]
Or, more generally, to destructure arbitrary stuff in the list you're looping over, like:
each[{i_, j_}, {{1,10}, {2,20}, {3,30}},
Print[i*j]
]
Usually you want to use Map or other purely functional constructs and eschew a non-functional programming style where you use side effects. But here's an example where I think a for-each construct is supremely useful:
Say I have a list of options (rules) that pair symbols with expressions, like
attrVals = {a -> 7, b -> 8, c -> 9}
Now I want to make a hash table where I do the obvious mapping of those symbols to those numbers. I don't think there's a cleaner way to do that than
each[a_ -> v_, attrVals, h[a] = v]
Additional test cases
In this example, we transform a list of variables:
a = 1;
b = 2;
c = 3;
each[i_, {a,b,c}, i = f[i]]
After the above, {a,b,c} should evaluate to {f[1],f[2],f[3]}. Note that that means the second argument to each should be held unevaluated if it's a list.
If the unevaluated form is not a list, it should evaluate the second argument. For example:
each[i_, Rest[{a,b,c}], Print[i]]
That should print the values of b and c.
Addendum: To do for-each properly, it should support Break[] and Continue[]. I'm not sure how to implement that. Perhaps it will need to somehow be implemented in terms of For, While, or Do since those are the only loop constructs that support Break[] and Continue[].
And another problem with the answers so far: they eat Return[]s. That is, if you are using a ForEach loop in a function and want to return from the function from within the loop, you can't. Issuing Return inside the ForEach loop seems to work like Continue[]. This just (wait for it) threw me for a loop.
I'm years late to the party here, and this is perhaps more an answer to the "meta-question", but something many people initially have a hard time with when programming in Mathematica (or other functional languages) is approaching a problem from a functional rather than structural viewpoint. The Mathematica language has structural constructs, but it's functional at its core.
Consider your first example:
ForEach[i_, {1,2,3},
Print[i]
]
As several people pointed out, this can be expressed functionally as Scan[Print, {1,2,3}] or Print /# {1,2,3} (although you should favor Scan over Map when possible, as previously explained, but that can be annoying at times since there is no infix operator for Scan).
In Mathematica, there's usually a dozen ways to do everything, which is sometimes beautiful and sometimes frustrating. With that in mind, consider your second example:
ForEach[{i_, j_}, {{1,10}, {2,20}, {3,30}},
Print[i*j]
]
... which is more interesting from a functional point of view.
One possible functional solution is to instead use list replacement, e.g.:
In[1]:= {{1,10},{2,20},{3,30}}/.{i_,j_}:>i*j
Out[1]= {10,40,90}
...but if the list was very large, this would be unnecessarily slow since we are doing so-called "pattern matching" (e.g., looking for instances of {a, b} in the list and assigning them to i and j) unnecessarily.
Given a large array of 100,000 pairs, array = RandomInteger[{1, 100}, {10^6, 2}], we can look at some timings:
Rule-replacement is pretty quick:
In[3]:= First[Timing[array /. {i_, j_} :> i*j;]]
Out[3]= 1.13844
... but we can do a little better if we take advantage of the expression structure where each pair is really List[i,j] and apply Times as the head of each pair, turning each {i,j} into Times[i,j]:
In[4]:= (* f###list is the infix operator form of Apply[f, list, 1] *)
First[Timing[Times ### array;]]
Out[4]= 0.861267
As used in the implementation of ForEach[...] above, Cases is decidedly suboptimal:
In[5]:= First[Timing[Cases[array, {i_, j_} :> i*j];]]
Out[5]= 2.40212
... since Cases does more work than just the rule replacement, having to build an output of matching elements one-by-one. It turns out we can do a lot better by decomposing the problem differently, and take advantage of the fact that Times is Listable, and supports vectorized operation.
The Listable attribute means that a function f will automatically thread over any list arguments:
In[16]:= SetAttributes[f,Listable]
In[17]:= f[{1,2,3},{4,5,6}]
Out[17]= {f[1,4],f[2,5],f[3,6]}
So, since Times is Listable, if we instead had the pairs of numbers as two separate arrays:
In[6]:= a1 = RandomInteger[{1, 100}, 10^6];
a2 = RandomInteger[{1, 100}, 10^6];
In[7]:= First[Timing[a1*a2;]]
Out[7]= 0.012661
Wow, quite a bit faster! Even if the input wasn't provided as two separate arrays (or you have more than two elements in each pair,) we can still do something optimal:
In[8]:= First[Timing[Times##Transpose[array];]]
Out[8]= 0.020391
The moral of this epic is not that ForEach isn't a valuable construct in general, or even in Mathematica, but that you can often obtain the same results more efficiently and more elegantly when you work in a functional mindset, rather than a structural one.
Newer versions of Mathematica (6.0+) have generalized versions of Do[] and Table[] that do almost precisely what you want, by taking an alternate form of iterator argument. For instance,
Do[
Print[i],
{i, {1, 2, 3}}]
is exactly like your
ForEach[i_, {1, 2, 3,},
Print[i]]
Alterntatively, if you really like the specific ForEach syntax, you can make a HoldAll function that implements it, like so:
Attributes[ForEach] = {HoldAll};
ForEach[var_Symbol, list_, expr_] :=
ReleaseHold[
Hold[
Scan[
Block[{var = #},
expr] &,
list]]];
ForEach[vars : {__Symbol}, list_, expr_] :=
ReleaseHold[
Hold[
Scan[
Block[vars,
vars = #;
expr] &,
list]]];
This uses symbols as variable names, not patterns, but that's how the various built-in control structures like Do[] and For[] work.
HoldAll[] functions allow you to put together a pretty wide variety of custom control structures. ReleaseHold[Hold[...]] is usually the easiest way to assemble a bunch of Mathematica code to be evaluated later, and Block[{x = #}, ...]& allows variables in your expression body to be bound to whatever values you want.
In response to dreeves' question below, you can modify this approach to allow for more arbitrary destructuring using the DownValues of a unique symbol.
ForEach[patt_, list_, expr_] :=
ReleaseHold[Hold[
Module[{f},
f[patt] := expr;
Scan[f, list]]]]
At this point, though, I think you may be better off building something on top of Cases.
ForEach[patt_, list_, expr_] :=
With[{bound = list},
ReleaseHold[Hold[
Cases[bound,
patt :> expr];
Null]]]
I like making Null explicit when I'm suppressing the return value of a function. EDIT: I fixed the bug pointed out be dreeves below; I always like using With to interpolate evaluated expressions into Hold* forms.
The built-in Scan basically does this, though it's uglier:
Scan[Print[#]&, {1,2,3}]
It's especially ugly when you want to destructure the elements:
Scan[Print[#[[1]] * #[[2]]]&, {{1,10}, {2,20}, {3,30}}]
The following function avoids the ugliness by converting pattern to body for each element of list.
SetAttributes[ForEach, HoldAll];
ForEach[pat_, lst_, bod_] := Scan[Replace[#, pat:>bod]&, Evaluate#lst]
which can be used as in the example in the question.
PS: The accepted answer induced me to switch to this, which is what I've been using ever since and it seems to work great (except for the caveat I appended to the question):
SetAttributes[ForEach, HoldAll]; (* ForEach[pattern, list, body] *)
ForEach[pat_, lst_, bod_] := ReleaseHold[ (* converts pattern to body for *)
Hold[Cases[Evaluate#lst, pat:>bod];]]; (* each element of list. *)
The built-in Map function does exactly what you want. It can be used in long form:
Map[Print, {1,2,3}]
or short-hand
Print /# {1,2,3}
In your second case, you'd use "Print[Times###]&/#{{1,10}, {2,20}, {3,30}}"
I'd recommend reading the Mathematica help on Map, MapThread, Apply, and Function. They can take bit of getting used to, but once you are, you'll never want to go back!
Here is a slight improvement based on the last answer of dreeves that allows to specify the pattern without Blank (making the syntax similar to other functions like Table or Do) and that uses the level argument of Cases
SetAttributes[ForEach,HoldAll];
ForEach[patt_/; FreeQ[patt, Pattern],list_,expr_,level_:1] :=
Module[{pattWithBlanks,pattern},
pattWithBlanks = patt/.(x_Symbol/;!MemberQ[{"System`"},Context[x]] :> pattern[x,Blank[]]);
pattWithBlanks = pattWithBlanks/.pattern->Pattern;
Cases[Unevaluated#list, pattWithBlanks :> expr, {level}];
Null
];
Tests:
ForEach[{i, j}, {{1, 10}, {2, 20}, {3, 30}}, Print[i*j]]
ForEach[i, {{1, 10}, {2, 20}, {3, 30}}, Print[i], 2]
Mathematica have map functions, so lets say you have a function Functaking one argument. Then just write
Func /# list
Print /# {1, 2, 3, 4, 5}
The return value is a list of the function applied to each element in the in-list.
PrimeQ /# {10, 2, 123, 555}
will return {False,True,False,False}
Thanks to Pillsy and Leonid Shifrin, here's what I'm now using:
SetAttributes[each, HoldAll]; (* each[pattern, list, body] *)
each[pat_, lst_List, bod_] := (* converts pattern to body for *)
(Cases[Unevaluated#lst, pat:>bod]; Null); (* each element of list. *)
each[p_, l_, b_] := (Cases[l, p:>b]; Null); (* (Break/Continue not supported) *)
Related
Following to the question published in How expressive can we be with arrays in Z3(Py)? An example, I expressed the following formula in Z3Py:
Exists i::Integer s.t. (0<=i<|arr|) & (avg(arr)+t<arr[i])
This means: whether there is a position i::0<i<|arr| in the array whose value a[i] is greater than the average of the array avg(arr) plus a given threshold t.
The solution in Z3Py:
t = Int('t')
avg_arr = Int('avg_arr')
len_arr = Int('len_arr')
arr = Array('arr', IntSort(), IntSort())
phi_1 = And(0 <= i, i< len_arr)
phi_2 = (t+avg_arr<arr[i])
phi = Exists(i, And(phi_1, phi_2))
s = Solver()
s.add(phi)
print(s.check())
print(s.model())
Note that, (1) the formula is satisfiable and (2) each time I execute it, I get a different model. For instance, I just got: [avg_a = 0, t = 7718, len_arr = 1, arr = K(Int, 7719)].
I have three questions now:
What does arr = K(Int, 7719)] mean? Does this mean the array contains one Int element with value 7719? In that case, what does the K mean?
Of course, this implementation is wrong in the sense that the average and length values are independent from the array itself. How can I implement simple avg and len functions?
Where is the i index in the model given by the solver?
Also, in which sense would this implementation be different using sequences instead of arrays?
(1) arr = K(Int, 7719) means that it's a constant array. That is, at every location it has the value 7719. Note that this is truly "at every location," i.e., at every integer value. There's no "size" of the array in SMTLib parlance. For that, use sequences.
(2) Indeed, your average/length etc are not related at all to the array. There are ways of modeling this using quantifiers, but I'd recommend staying away from that. They are brittle, hard to code and maintain, and furthermore any interesting theorem you want to prove will get an unknown as answer.
(3) The i you declared and the i you used as the existential is completely independent of each other. (Latter is just a trick so z3 can recognize it as a value.) But I guess you removed that now.
The proper way to model such problems is using sequences. (Although, you shouldn't expect much proof performance there either.) Start here: https://microsoft.github.io/z3guide/docs/theories/Sequences/ and see how much you can push it through. Functions like avg will need a recursive definition most likely, for that you can use RecAddDefinition, for an example see: https://stackoverflow.com/a/68457868/936310
Stack-overflow works the best when you try to code these yourself and ask very specific questions about how to proceed, as opposed to overarching questions. (But you already knew that!) Best of luck..
For example, if I want to read the middle value from magic(5), I can do so like this:
M = magic(5);
value = M(3,3);
to get value == 13. I'd like to be able to do something like one of these:
value = magic(5)(3,3);
value = (magic(5))(3,3);
to dispense with the intermediate variable. However, MATLAB complains about Unbalanced or unexpected parenthesis or bracket on the first parenthesis before the 3.
Is it possible to read values from an array/matrix without first assigning it to a variable?
It actually is possible to do what you want, but you have to use the functional form of the indexing operator. When you perform an indexing operation using (), you are actually making a call to the subsref function. So, even though you can't do this:
value = magic(5)(3, 3);
You can do this:
value = subsref(magic(5), struct('type', '()', 'subs', {{3, 3}}));
Ugly, but possible. ;)
In general, you just have to change the indexing step to a function call so you don't have two sets of parentheses immediately following one another. Another way to do this would be to define your own anonymous function to do the subscripted indexing. For example:
subindex = #(A, r, c) A(r, c); % An anonymous function for 2-D indexing
value = subindex(magic(5), 3, 3); % Use the function to index the matrix
However, when all is said and done the temporary local variable solution is much more readable, and definitely what I would suggest.
There was just good blog post on Loren on the Art of Matlab a couple days ago with a couple gems that might help. In particular, using helper functions like:
paren = #(x, varargin) x(varargin{:});
curly = #(x, varargin) x{varargin{:}};
where paren() can be used like
paren(magic(5), 3, 3);
would return
ans = 16
I would also surmise that this will be faster than gnovice's answer, but I haven't checked (Use the profiler!!!). That being said, you also have to include these function definitions somewhere. I personally have made them independent functions in my path, because they are super useful.
These functions and others are now available in the Functional Programming Constructs add-on which is available through the MATLAB Add-On Explorer or on the File Exchange.
How do you feel about using undocumented features:
>> builtin('_paren', magic(5), 3, 3) %# M(3,3)
ans =
13
or for cell arrays:
>> builtin('_brace', num2cell(magic(5)), 3, 3) %# C{3,3}
ans =
13
Just like magic :)
UPDATE:
Bad news, the above hack doesn't work anymore in R2015b! That's fine, it was undocumented functionality and we cannot rely on it as a supported feature :)
For those wondering where to find this type of thing, look in the folder fullfile(matlabroot,'bin','registry'). There's a bunch of XML files there that list all kinds of goodies. Be warned that calling some of these functions directly can easily crash your MATLAB session.
At least in MATLAB 2013a you can use getfield like:
a=rand(5);
getfield(a,{1,2}) % etc
to get the element at (1,2)
unfortunately syntax like magic(5)(3,3) is not supported by matlab. you need to use temporary intermediate variables. you can free up the memory after use, e.g.
tmp = magic(3);
myVar = tmp(3,3);
clear tmp
Note that if you compare running times with the standard way (asign the result and then access entries), they are exactly the same.
subs=#(M,i,j) M(i,j);
>> for nit=1:10;tic;subs(magic(100),1:10,1:10);tlap(nit)=toc;end;mean(tlap)
ans =
0.0103
>> for nit=1:10,tic;M=magic(100); M(1:10,1:10);tlap(nit)=toc;end;mean(tlap)
ans =
0.0101
To my opinion, the bottom line is : MATLAB does not have pointers, you have to live with it.
It could be more simple if you make a new function:
function [ element ] = getElem( matrix, index1, index2 )
element = matrix(index1, index2);
end
and then use it:
value = getElem(magic(5), 3, 3);
Your initial notation is the most concise way to do this:
M = magic(5); %create
value = M(3,3); % extract useful data
clear M; %free memory
If you are doing this in a loop you can just reassign M every time and ignore the clear statement as well.
To complement Amro's answer, you can use feval instead of builtin. There is no difference, really, unless you try to overload the operator function:
BUILTIN(...) is the same as FEVAL(...) except that it will call the
original built-in version of the function even if an overloaded one
exists (for this to work, you must never overload
BUILTIN).
>> feval('_paren', magic(5), 3, 3) % M(3,3)
ans =
13
>> feval('_brace', num2cell(magic(5)), 3, 3) % C{3,3}
ans =
13
What's interesting is that feval seems to be just a tiny bit quicker than builtin (by ~3.5%), at least in Matlab 2013b, which is weird given that feval needs to check if the function is overloaded, unlike builtin:
>> tic; for i=1:1e6, feval('_paren', magic(5), 3, 3); end; toc;
Elapsed time is 49.904117 seconds.
>> tic; for i=1:1e6, builtin('_paren', magic(5), 3, 3); end; toc;
Elapsed time is 51.485339 seconds.
OP UPDATE: Note that in the latest version of Julia (v0.5), the idiomatic approach to answering this question is to just define mysquare(x::Number) = x^2. The vectorised case is covered using automatic broadcasting, i.e. x = randn(5) ; mysquare.(x). See also the new answer explaining dot syntax in more detail.
I am new to Julia, and given my Matlab origins, I am having some difficulty determining how to write "good" Julia code that takes advantage of multiple dispatch and Julia's type system.
Consider the case where I have a function that provides the square of a Float64. I might write this as:
function mysquare(x::Float64)
return(x^2);
end
Sometimes, I want to square all the Float64s in a one-dimentional array, but don't want to write out a loop over mysquare everytime, so I use multiple dispatch and add the following:
function mysquare(x::Array{Float64, 1})
y = Array(Float64, length(x));
for k = 1:length(x)
y[k] = x[k]^2;
end
return(y);
end
But now I am sometimes working with Int64, so I write out two more functions that take advantage of multiple dispatch:
function mysquare(x::Int64)
return(x^2);
end
function mysquare(x::Array{Int64, 1})
y = Array(Float64, length(x));
for k = 1:length(x)
y[k] = x[k]^2;
end
return(y);
end
Is this right? Or is there a more ideomatic way to deal with this situation? Should I use type parameters like this?
function mysquare{T<:Number}(x::T)
return(x^2);
end
function mysquare{T<:Number}(x::Array{T, 1})
y = Array(Float64, length(x));
for k = 1:length(x)
y[k] = x[k]^2;
end
return(y);
end
This feels sensible, but will my code run as quickly as the case where I avoid parametric types?
In summary, there are two parts to my question:
If fast code is important to me, should I use parametric types as described above, or should I write out multiple versions for different concrete types? Or should I do something else entirely?
When I want a function that operates on arrays as well as scalars, is it good practice to write two versions of the function, one for the scalar, and one for the array? Or should I be doing something else entirely?
Finally, please point out any other issues you can think of in the code above as my ultimate goal here is to write good Julia code.
Julia compiles a specific version of your function for each set of inputs as required. Thus to answer part 1, there is no performance difference. The parametric way is the way to go.
As for part 2, it might be a good idea in some cases to write a separate version (sometimes for performance reasons, e.g., to avoid a copy). In your case however you can use the in-built macro #vectorize_1arg to automatically generate the array version, e.g.:
function mysquare{T<:Number}(x::T)
return(x^2)
end
#vectorize_1arg Number mysquare
println(mysquare([1,2,3]))
As for general style, don't use semicolons, and mysquare(x::Number) = x^2 is a lot shorter.
As for your vectorized mysquare, consider the case where T is a BigFloat. Your output array, however, is Float64. One way to handle this would be to change it to
function mysquare{T<:Number}(x::Array{T,1})
n = length(x)
y = Array(T, n)
for k = 1:n
#inbounds y[k] = x[k]^2
end
return y
end
where I've added the #inbounds macro to boost speed because we don't need to check the bound violation every time — we know the lengths. This function could still have issues in the event that the type of x[k]^2 isn't T. An even more defensive version would perhaps be
function mysquare{T<:Number}(x::Array{T,1})
n = length(x)
y = Array(typeof(one(T)^2), n)
for k = 1:n
#inbounds y[k] = x[k]^2
end
return y
end
where one(T) would give 1 if T is an Int, and 1.0 if T is a Float64, and so on. These considerations only matter if you want to make hyper-robust library code. If you really only will be dealing with Float64s or things that can be promoted to Float64s, then it isn't an issue. It seems like hard work, but the power is amazing. You can always just settle for Python-like performance and disregard all type information.
As of Julia 0.6 (c. June 2017), the "dot syntax" provides an easy and idiomatic way to apply a function to a scalar or an array.
You only need to provide the scalar version of the function, written in the normal way.
function mysquare{x::Number)
return(x^2)
end
Append a . to the function name (or preprend it to the operator) to call it on every element of an array:
x = [1 2 3 4]
x2 = mysquare(2) # 4
xs = mysquare.(x) # [1,4,9,16]
xs = mysquare.(x*x') # [1 4 9 16; 4 16 36 64; 9 36 81 144; 16 64 144 256]
y = x .+ 1 # [2 3 4 5]
Note that the dot-call will handle broadcasting, as in the last example.
If you have multiple dot-calls in the same expression, they will be fused so that y = sqrt.(sin.(x)) makes a single pass/allocation, instead of creating a temporary expression containing sin(x) and forwarding it to the sqrt() function. (This is different from Matlab/Numpy/Octave/Python/R, which don't make such a guarantee).
The macro #. vectorizes everything on a line, so #. y=sqrt(sin(x)) is the same as y = sqrt.(sin.(x)). This is particularly handy with polynomials, where the repeated dots can be confusing...
I see that there's a relatively new feature in Ruby which allows chained iteration -- in other words, instead of each_with_indices { |x,i,j| ... } you might do each.with_indices { |x,i,j| ... }, where #each returns an Enumerator object, and Enumerator#with_indices causes the additional yield parameters to be included.
So, Enumerator has its own method #with_index, presumably for one-dimensional objects, source found here. But I can't figure out the best way to adapt this to other objects.
To be clear, and in response to comments: Ruby doesn't have an #each_with_indices right now -- it's only got an #each_with_index. (That's why I want to create one.)
A series of questions, themselves chained:
How would one adapt chained iteration to a one-dimensional object? Simply do an include Enumerable?
Presumably the above (#1) would not work for an n-dimensional object. Would one create an EnumerableN class, derived from Enumerable, but with #with_index converted into #with_indices?
Can #2 be done for Ruby extensions written in C? For example, I have a matrix class which stores various types of data (floats, doubles, integers, sometimes regular Ruby objects, etc.). Enumeration needs to check the data type (dtype) first as per the example below.
Example:
VALUE nm_dense_each(VALUE nm) {
volatile VALUE nm = nmatrix; // Not sure this actually does anything.
DENSE_STORAGE* s = NM_STORAGE_DENSE(nm); // get the storage pointer
RETURN_ENUMERATOR(nm, 0, 0);
if (NM_DTYPE(nm) == nm::RUBYOBJ) { // matrix stores VALUEs
// matrix of Ruby objects -- yield those objects directly
for (size_t i = 0; i < nm_storage_count_max_elements(s); ++i)
rb_yield( reinterpret_cast<VALUE*>(s->elements)[i] );
} else { // matrix stores non-Ruby data (int, float, etc)
// We're going to copy the matrix element into a Ruby VALUE and then operate on it. This way user can't accidentally
// modify it and cause a seg fault.
for (size_t i = 0; i < nm_storage_count_max_elements(s); ++i) {
// rubyobj_from_cval() converts any type of data into a VALUE using macros such as INT2FIX()
VALUE v = rubyobj_from_cval((char*)(s->elements) + i*DTYPE_SIZES[NM_DTYPE(nm)], NM_DTYPE(nm)).rval;
rb_yield( v ); // yield to the copy we made
}
}
}
So, to combine my three questions into one: How would I write, in C, a #with_indices to chain onto the NMatrix#each method above?
I don't particularly want anyone to feel like I'm asking them to code this for me, though if you did want to, we'd love to have you involved in our project. =)
But if you know of some example elsewhere on the web of how this is done, that'd be perfect -- or if you could just explain in words, that'd be lovely too.
#with_index is a method of Enumerator: http://ruby-doc.org/core-1.9.3/Enumerator.html#method-i-with_index
I suppose you could make a subclass of Enumerator that has #with_indices and have your #each return an instance of that class? That's the first thing that comes to mind, although your enumerator might have to be pretty coupled to the originating class...
Since you are saying that you are also interested in Ruby linguistics, not just C, let me contribute my 5 cents, without claiming to actually answer the question. #each_with_index and #with_index already became so idiomatic, that majority of the people rely on the index being a number. Therefore, if you go and implement your NMatrix#each_with_index in such way, that in the block { |e, i| ... } it would supply eg. arrays [0, 0], [0, 1], [0, 2], [1, 0], [1, 1], ... as index i, you would surprise people. Also, if others chain your NMatrix#each enumerator with #with_index method, they will receive just a single number as index. So, indeed, you are right to conclude that you need a distinct method to take care for the 2 indices-type (or, more generally, n indices for higher dimension matrices):
matrix.each_with_indices { |e, indices| ... }
This method should return a 2-dimensional (n-dimensional) array as indices == [i, j] . You should not go for the version:
matrix.each_with_indices { |e, i, j| ... }
As for the #with_index method, it is not your concern at all. If your NMatrix provides #each method (which it certainly does), then #with_index will work normally with it, out of your control. And you do not need to ponder about introducing matrix-specific #with_indices, because #each itself is not really specific to matrices, but to one-dimensional ordered collections of any sort. Finally, sorry for not being a skilled C programmer to cater to your C-related part of the question.
Is there any way to "vector" assign an array of struct.
Currently I can
edges(1000000) = struct('weight',1.0); //This really does not assign the value, I checked on 2009A.
for i=1:1000000; edges(i).weight=1.0; end;
But that is slow, I want to do something more like
edges(:).weight=[rand(1000000,1)]; //with or without the square brackets.
Any ideas/suggestions to vectorize this assignment, so that it will be faster.
Thanks in advance.
This is much faster than deal or a loop (at least on my system):
N=10000;
edge(N) = struct('weight',1.0); % initialize the array
values = rand(1,N); % set the values as a vector
W = mat2cell(values, 1,ones(1,N)); % convert values to a cell
[edge(:).weight] = W{:};
Using curly braces on the right gives a comma separated value list of all the values in W (i.e. N outputs) and using square braces on the right assigns those N outputs to the N values in edge(:).weight.
You can try using the Matlab function deal, but I found it requires to tweak the input a little (using this question: In Matlab, for a multiple input function, how to use a single input as multiple inputs?), maybe there is something simpler.
n=100000;
edges(n)=struct('weight',1.0);
m=mat2cell(rand(n,1),ones(n,1),1);
[edges(:).weight]=deal(m{:});
Also I found that this is not nearly as fast as the for loop on my computer (~0.35s for deal versus ~0.05s for the loop) presumably because of the call to mat2cell. The difference in speed is reduced if you use this more than once but it stays in favor of the for loop.
You could simply write:
edges = struct('weight', num2cell(rand(1000000,1)));
Is there something requiring you to particularly use a struct in this way?
Consider replacing your array of structs with simply a separate array for each member of the struct.
weights = rand(1, 1000);
If you have a struct member which is an array, you can make an extra dimension:
matrices = rand(3, 3, 1000);
If you just want to keep things neat, you could put these arrays into a struct:
edges.weights = weights;
edges.matrices = matrices;
But if you need to keep an array of structs, I think you can do
[edges.weight] = rand(1, 1000);
The reason that the structs in your example don't get initialized properly is that the syntax you're using only addresses the very last element in the struct array. For a nonexistent array, the rest of them get implicitly filled in with structs that have the default value [] in all their fields.
To make this behavior clear, try doing a short array with clear edges; edges(1:3) = struct('weight',1.0) and looking at each of edges(1), edges(2), and edges(3). The edges(3) element has 1.0 in its weight like you want; the others have [].
The syntax for efficiently initializing an array of structs is one of these.
% Using repmat and full assignment
edges = repmat(struct('weight', 1.0), [1 1000]);
% Using indexing
% NOTE: Only correct if variable is uninitialized!!!
edges(1:1000) = struct('weight', 1.0); % QUESTIONABLE
Note the 1:1000 instead of just 1000 when indexing in to the uninitialized edges array.
There's a problem with the edges(1:1000) form: if edges is already initialized, this syntax will just update the values of selected elements. If edges has more than 1000 elements, the others will be left unchanged, and your code will be buggy. Or if edges is a different type, you could get an error or weird behavior depending on its existing datatype. To be safe, you need to do clear edges before initializing using the indexing syntax. So it's better to just do full assignment with the repmat form.
BUT: Regardless of how you initialize it, an array-of-structs like this is always going to be inherently slow to work with for larger data sets. You can't do real "vectorized" operations on it because your primitive arrays are all broken up in to separate mxArrays inside each struct element. That includes the field assignment in your question – it is not possible to vectorize that. Instead, you should switch a struct-of-arrays like Brian L's answer suggests.
You can use a reverse struct and then do all operations without any errors
like this
x.E(1)=1;
x.E(2)=3;
x.E(2)=8;
x.E(3)=5;
and then the operation like the following
x.E
ans =
3 8 5
or like this
x.E(1:2)=2
x =
E: [2 2 5]
or maybe this
x.E(1:3)=[2,3,4]*5
x =
E: [10 15 20]
It is really faster than for_loop and you do not need other big functions to slow your program.