I have an array called skj. skj contains 2 million rows of numbers (2000000x1 uint32).
I want to compute the following
string_skj = num2str(skj);
When I run the above line it takes about 1 minute. Is there a faster way of doing it?
Hennadii Madan's answer got me thinking about whether there is a way to do this for column vectors more efficiently than the standard Matlab num2str (or int2str), and I've come up with 2 solutions that do.
EDIT: And after all that work @Luis Mendo comes in and blows it all out of the water :'(
EDIT: Now @Daniel has improved on all of the previous options again!
Given our column vector, V, as
V = uint32(randi(100, 200000, 1));
we can achieve the same result as
A = num2str(V);
with *
B = char(strsplit(num2str(V.')).');
or without the error checking of num2str
C = char(strsplit(sprintf('%d\n', V)).');
C = C(1:end-1, :); % Remove extraneous '\n'
B and C are slightly different to A. num2str pre-pads with a space, ' ', whilst B and C post-pad with a space.
In the below D and E are pre-padded with 0's and so do not match A, B or C exactly.
Benchmarks
-----num2str() on row vector [Original]-----
Elapsed time is 3.501976 seconds.
Name Size Bytes Class Attributes
A 200000x3 1200000 char
-----num2str() on column vector [IKavanagh modified from Hennadii Madan]-----
Elapsed time is 0.660878 seconds.
Name Size Bytes Class Attributes
B 200000x3 1200000 char
-----sprintf() on row vector [IKavanagh]-----
Elapsed time is 0.582472 seconds.
Name Size Bytes Class Attributes
C 200000x3 1200000 char
-----dec2base() on row vector [Luis Mendo]-----
Elapsed time is 0.042563 seconds.
Name Size Bytes Class Attributes
D 200000x3 1200000 char
-----myfastint2str() on row vector [Daniel]-----
Elapsed time is 0.011894 seconds.
Name Size Bytes Class Attributes
E 200000x3 1200000 char
Code
clear all
close all
clc
V = uint32(randi(100, 200000, 1));
for k = 1:50000
tic(); elapsed = toc(); % Warm up tic/toc
end
disp('-----num2str() on row vector [Original]-----');
tic;
A = num2str(V);
toc, whos A
disp('-----num2str() on column vector [IKavanagh modified from Hennadii Madan]-----');
tic;
B = char(strsplit(num2str(V.')).');
toc, whos B
disp('-----sprintf() on row vector [IKavanagh]-----');
tic;
C = char(strsplit(sprintf('%d\n', V)).');
C = C(1:end-1, :); % Remove extraneous '\n'
toc, whos C
disp('-----dec2base() on row vector [Luis Mendo]-----');
tic;
D = dec2base(V, 10);
toc, whos D
disp('-----myfastint2str() on row vector [Daniel]-----');
tic;
E = myfastint2str(V);
toc, whos E
* Credit for the idea to transpose should go to Hennadii Madan
Implementing int2str yourself, you can beat the performance of the original function by far.
function [ o ] = myfastint2str( x )
maxvalue=max(x(:));
%maxvalue=intmax(class(x));%Alternative implementation based on class
required_digits=ceil(log(double(maxvalue+1))/log(10));
o=repmat(x(1)*0,size(x,1),required_digits);%initialize array of required size
for c=size(o,2):-1:1
o(:,c)=mod(x,10);
x=(x-o(:,c))/10;
end
o=char(o+'0');
end
For the example input, my function required less than 0.15 seconds, while both int2str and num2str took about 15 seconds.
The output is slightly different as it generates leading zeros instead of blanks.
The following is much faster on my machine:
y = dec2base(skj,10);
Here's a quick test:
>> skj = uint32(2^32*rand(1e6,1)); %// random data
>> tic, y = num2str(skj); toc
Elapsed time is 22.823348 seconds.
>> tic, z = dec2base(skj,10); toc
Elapsed time is 1.235942 seconds.
Note that using dec2base gives leading zeros instead of leading spaces.
>> y(1:5,:)
ans =
3864067979
1572155259
1067755677
2492696731
 561648530
>> z(1:5,:)
ans =
3864067979
1572155259
1067755677
2492696731
0561648530
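If the leading zeros are a problem, they can be blanked out after the fact. The following is only a minimal sketch on a tiny made-up input (the variable names Z and lead are illustrative, not from the answers above), assuming the zero-padded char matrix comes from dec2base as shown:

```matlab
% Sketch: turn the leading zeros of a zero-padded char matrix into blanks,
% so the result looks like the num2str output. Names are illustrative.
V = uint32([7; 42; 100; 0]);
Z = dec2base(V, 10);                 % zero-padded rows: '007', '042', '100', '000'
lead = cumsum(Z ~= '0', 2) == 0;     % true while only '0' has been seen in a row
lead(:, end) = false;                % always keep the last digit, so 0 stays '0'
Z(lead) = ' ';                       % rows are now '  7', ' 42', '100', '  0'
disp(Z)
```

The extra pass costs one cumsum over the char matrix, so whether it is still worthwhile on 2 million rows would need to be timed.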
If you really need to increase speed, have you considered writing a MEX function in C? It's a little bit complicated, but it's worth investing the time if you have some small routines that can easily be coded in C/C++. Once compiled, the MEX function can be called from the MATLAB command prompt, just like a .m function.
See http://www.mathworks.com/help/matlab/call-mex-files-1.html for more details.
Warning: the output is not in the requested format, but it may be workable.
Edit: A super fast 'solution'. The output is not a char column but a single string with line breaks as separators. If you print it, it will look the same.
>> tic;a = sprintf('%d\n',skj);toc
Elapsed time is 0.422143 seconds
Edit: Old 'solution'
Try transposing before and after. Like num2str(skj.').'
>> skj = ones(2000000,1,'uint32');
>> tic;num2str(skj);toc
Elapsed time is 23.305860 seconds.
>> tic;num2str(skj.');toc
Elapsed time is 1.044551 seconds.
Related
I'm trying to optimize the value N used to split arrays into chunks for a vectorized computation, so that it runs the quickest on different machines. I have some test code below
#example use random values
clear all,
t=rand(1,556790);
inner_freq=rand(8193,6);
N=100; # use N chunks
nn = int32(linspace(1, length(t)+1, N+1))
aa_sig_combined=zeros(size(t));
total_time_so_far=0;
for ii=1:N
tic;
ind = nn(ii):nn(ii+1)-1;
aa_sig_combined(ind) = sum(diag(inner_freq(1:end-1,2)) * cos(2 .* pi .* inner_freq(1:end-1,1) * t(ind)) .+ repmat(inner_freq(1:end-1,3),[1 length(ind)]));
toc
total_time_so_far=total_time_so_far+sum(toc)
end
fprintf('- Complete test in %4.4fsec or %4.4fmins\n',total_time_so_far,total_time_so_far/60);
This takes 162.7963sec or 2.7133mins to complete when N=100 on a 16gig i7 machine running ubuntu
Is there a way to find out what value N should be to get this to run the fastest on different machines?
PS: I'm running Octave 3.8.1 on 16gig i7 ubuntu 14.04 but it will also be running on even a 1 gig raspberry pi 2.
This is the Matlab test script that I used to time each parameter. The return is used to break it after the first iteration as it looks like the rest of the iterations are similar.
%example use random values
clear all;
t=rand(1,556790);
inner_freq=rand(8193,6);
N=100; % use N chunks
nn = int32( linspace(1, length(t)+1, N+1) );
aa_sig_combined=zeros(size(t));
D = diag(inner_freq(1:end-1,2));
for ii=1:N
ind = nn(ii):nn(ii+1)-1;
tic;
cosPara = 2 * pi * inner_freq(1:end-1,1) * t(ind);
toc;
cosResult = cos( cosPara );
sumParaA = D * cosResult;
toc;
sumParaB = repmat(inner_freq(1:end-1,3),[1 length(ind)]);
toc;
aa_sig_combined(ind) = sum( sumParaA + sumParaB );
toc;
return;
end
The output is indicated as follows. Note that I have a slow computer.
Elapsed time is 0.156621 seconds.
Elapsed time is 17.384735 seconds.
Elapsed time is 17.922553 seconds.
Elapsed time is 18.452994 seconds.
As you can see, the cos operation is what takes so long. You are running cos on an 8192x5568 matrix (45,613,056 elements), so it makes sense that it takes this long.
If you wish to improve performance, use parfor as it appears each iteration is independent. Assuming you had 100 cores to run your 100 iterations, your script would be done in 17 seconds + parfor overhead.
Within the cos calculation, you might want to look into if another method exists to calculate cos of a value faster and more parallel than the stock method.
Another minor optimization is this line. It ensures that the diag function isn't called within the loop as the diagonal matrix is constant. You don't want a 8192x8192 diagonal matrix to be generated every time! I just stored it outside the loop and it gives a bit of a performance boost as well.
D = diag(inner_freq(1:end-1,2));
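Going one step further than hoisting diag out of the loop, the diagonal matrix can be avoided altogether: summing the rows of diag(d)*C is the same as the row-vector product d.'*C, and summing the repmat term over rows gives the constant sum(c). This is not from the original answer; it is a small sketch on made-up sizes (f, d and c stand in for columns 1, 2 and 3 of inner_freq) that only checks the identity numerically:

```matlab
% Sketch: sum(diag(d)*C + repmat(c, 1, n)) equals d.'*C + sum(c).
% f, d, c are hypothetical stand-ins for inner_freq(1:end-1, 1:3).
f = rand(50, 1);
d = rand(50, 1);
c = rand(50, 1);
t = rand(1, 200);

C   = cos(2 * pi * f * t);                       % 50x200, the expensive part
ref = sum(diag(d) * C + repmat(c, [1 numel(t)])); % formulation from the question
alt = d.' * C + sum(c);                           % no 50x50 diagonal matrix, no repmat

max_err = max(abs(ref - alt));
fprintf('max abs difference: %g\n', max_err);
```

The cos call still dominates, so this would only remove the matrix product and repmat overhead, not the bulk of the runtime.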
Note that I didn't use the Matlab profiler as it didn't work for me, but you should use it in the future for code that is organized into functions.
I have a dataset (Data) which is a vector of, let's say, 1000 real numbers. I would like to extract 10 contiguous numbers at random from Data, 100 times. I don't know how to use datasample for that purpose.
Thanks in advance for your help.
You can just pick 100 random numbers between 1 and 991:
I = randi(991, 100, 1)
Then use them as the starting points to index 10 contiguous elements:
cell2mat(arrayfun(@(x)(Data(x:x+9)), I, 'uni', false))
Here is a snippet. Instead of using datasample, I used randi to generate random indices.
n_times = 100;
l_data = length(Data);
index_random = randi(l_data-9,n_times,1); % '- 9' to not to surpass the vector limit when you read the 10 items
for ind1 = 1:n_times
random_number(ind1,:) = Data(index_random(ind1):index_random(ind1)+9)
end
This is similar to Dan's answer, but avoids using cells and arrayfun, so it may be faster.
Let Ns denote the number of contiguous numbers you want (10 in your example), and Nt the number of times (100 in your example). Then:
result = Data(bsxfun(@plus, randi(numel(Data)-Ns+1, Nt, 1), 0:Ns-1)); %// Nt x Ns
Here is another solution, close to @Luis's, but with cumsum instead of bsxfun:
A = rand(1,1000); % The vector to sample
sz = size(A,2);
N = 100; % no. of samples
B = 10; % size of one sample
first = randi(sz-B+1,N,1); % the starting point for all blocks
rand_blocks = A(cumsum([first ones(N,B-1)],2)); % the result
This results in an N-by-B matrix (rand_blocks), each row of it is one sample. Of course, this could be one-lined, but it won't make it faster, and I want to keep it clear. For small N or B this method is slightly faster. If N or B becomes very large then the bsxfun method is slightly faster. This ranking is not affected by the size of A.
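Both vectorized answers build the same N-by-B matrix of indices and differ only in how they get there. A minimal sketch to check this, assuming the same starting points are fed to both (the names idx_bsxfun and idx_cumsum are illustrative):

```matlab
% Sketch: the bsxfun and cumsum constructions give identical index matrices.
A = rand(1, 1000);                        % the vector to sample
N = 100;                                  % number of samples
B = 10;                                   % length of one sample
first = randi(numel(A) - B + 1, N, 1);    % random starting point of each block

idx_bsxfun = bsxfun(@plus, first, 0:B-1);        % start + [0 1 ... B-1]
idx_cumsum = cumsum([first ones(N, B-1)], 2);    % start, start+1, ... via running sum

same = isequal(idx_bsxfun, idx_cumsum);
rand_blocks = A(idx_cumsum);              % N-by-B, one contiguous block per row
```

Because A is a vector and the index is a matrix, the result takes the shape of the index matrix, which is why no reshape is needed.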
I'm trying to find the fastest way to find the unique values in an array while excluding 0 as a possible unique value.
Right now I have two solutions:
result1 = setxor(0, dataArray(1:end,1)); % This gives the correct solution
result2 = unique(dataArray(1:end,1)); % This solution is faster but doesn't give the same result as result1
dataArray is equivalent to :
dataArray = [0 0; 0 2; 0 4; 0 6; 1 0; 1 2; 1 4; 1 6; 2 0; 2 2; 2 4; 2 6]; % This is a small array, but in my case there are usually over 10 000 lines.
So in this case, result1 is equal to [1; 2] and result2 is equal to [0; 1; 2].
The unique function is faster but I don't want 0 to be considered. Is there a way to do this with unique and not consider 0 as a unique value? Is there another alternative?
EDIT
I wanted to time the various solutions.
clc
dataArray = floor(10*rand(10e3,10));
dataArray(mod(dataArray(:,1),3)==0)=0;
% Initial
tic
for ii = 1:10000
FCT1 = setxor(0, dataArray(:,1));
end
toc
% My solution
tic
for ii = 1:10000
FCT2 = unique(dataArray(dataArray(:,1)>0,1));
end
toc
% Pursuit solution
tic
for ii = 1:10000
FCT3 = unique(dataArray(:, 1));
FCT3(FCT3==0) = [];
end
toc
% Pursuit solution with chappjc comment
tic
for ii = 1:10000
FCT32 = unique(dataArray(:, 1));
FCT32 = FCT32(FCT32~=0);
end
toc
% chappjc solution
tic
for ii = 1:10000
FCT4 = setdiff(unique(dataArray(:,1)),0);
end
toc
% chappjc 2nd solution
tic
for ii = 1:10000
FCT5 = find(accumarray(dataArray(:,1)+1,1))-1;
FCT5 = FCT5(FCT5>0);
end
toc
And the results:
Elapsed time is 5.153571 seconds. % FCT1 Initial
Elapsed time is 3.837637 seconds. % FCT2 My solution
Elapsed time is 3.464652 seconds. % FCT3 Pursuit solution
Elapsed time is 3.414338 seconds. % FCT32 Pursuit solution with chappjc comment
Elapsed time is 4.097164 seconds. % FCT4 chappjc solution
Elapsed time is 0.936623 seconds. % FCT5 chappjc 2nd solution
However, the solutions with sparse and accumarray only work with integer-valued data. They won't work with non-integer doubles.
Here's a wacky suggestion with accumarray, demonstrated using Floris' test data:
a = floor(10*rand(100000, 1)); a(mod(a,3)==0)=0;
result = find(accumarray(nonzeros(a(:,1))+1,1))-1;
Thanks to Luis Mendo for pointing out that with nonzeros, it is not necessary to perform result = result(result>0)!
Note that this solution requires integer-valued data (not necessarily an integer data type, but just not with decimal components). Comparing floating point values for equality, as unique would do, is perilous. See here and here.
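To make the indexing trick and its integer-only restriction concrete, here is a small worked sketch on a made-up vector. The guard at the top is an addition for illustration; the original suggestion does not include it:

```matlab
% Sketch: unique nonzero values via accumarray, on a tiny hand-checkable input.
a = [0; 3; 0; 5; 3; 7];

% The values are used as subscripts, so they must be non-negative integers.
assert(all(a >= 0 & a == floor(a)), 'accumarray trick needs integer-valued data');

counts = accumarray(a + 1, 1);   % counts(k+1) = number of occurrences of value k
vals   = find(counts) - 1;       % values that occur at least once: [0; 3; 5; 7]
vals   = vals(vals > 0);         % drop zero: [3; 5; 7]
```

The size of counts grows with max(a), so this is only attractive when the values are small relative to the amount of data.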
Original suggestion: Combine unique with setdiff:
result = setdiff(unique(a(:,1)),0)
Or remove with logical indexing after unique:
result = unique(a(:,1));
result = result(result>0);
I generally prefer not to assign [] as in (result(result==0)=[];) since it gets very inefficient for large data sets.
Removing zeros after unique should be faster since it operates on less data (unless every element is unique, or a/dataArray is very short).
Just to add to the general clamor - here are three different methods. They all give the same answer, but slightly different timings:
a = floor(10*rand(100000, 1));
a(mod(a,3)==0)=0;
tic
b1 = unique(a(:,1));
b1(b1==0) = [];
toc
tic
b2 = find(sparse(a(:,1)+1, 1, 1)) - 1;
b2(b2==0)=[];
toc
tic
b3 = setxor(0, a(:, 1), 'rows');
toc
display(b1)
display(b2)
display(b3)
On my machine, the timings (for an array of 100000 elements) were as follows:
0.0087 s - for unique
0.0142 s - for find(sparse)
0.0302 s - for setxor
I always like sparse for a problem like this - you get the count of elements at the same time as their unique values.
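As a small illustration of getting the counts together with the values, the three-output form of find can be applied to the sparse column. This is a sketch on made-up data, with illustrative variable names:

```matlab
% Sketch: sparse accumulates duplicate subscripts, so the stored values
% are the occurrence counts of each distinct element of a.
a = [0; 3; 0; 5; 3; 7];

s = sparse(a + 1, 1, 1);        % column vector; entry k+1 counts occurrences of k
[rows, ~, cnt] = find(s);       % nonzero positions and their accumulated counts
vals = rows - 1;                % distinct values: [0; 3; 5; 7]

keep = vals ~= 0;               % drop the zero value, as the question asks
vals = vals(keep);              % [3; 5; 7]
cnt  = cnt(keep);               % [2; 1; 1]
```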
EDIT per @chappjc's suggestion, I added a fourth option
b4 = find(accumarray(a(:,1)+1,1))-1;
b4(b4==0) = [];
Time:
0.0029 s , THREE TIMES FASTER THAN UNIQUE
Ladies and gentlemen, we have a winner.
AFTERWORD: the index-based methods (sparse and accumarray) only work with integer-valued inputs (although they can be of double type). This seemed OK based on the input array given in the question, but doesn't work for non-integer valued inputs. Of course, unique is a tricky concept when you have doubles: numbers that "look" the same may be represented differently. You might consider truncating the input array (sanitizing the data) to make sure this is not a problem. For example, if you did
a = 0.001 * double(int32(a * 1000));
You would round all values to no more than 3 decimal places, and because you went "via an int" you are sure that you don't end up with values that are "very subtly different" (say in the 8th digit or beyond). Of course in that case you could also do
a = round(a * 1000);
mina = min(a(:));
b = find(accumarray(a - mina + 1, 1)) + mina - 1;
b = 0.001 * b(b ~= 0);
This is "fairly robust" for non-integer values (in the above case it handles values with up to three decimal places; if you need more, the space requirements will eventually get too large and this method will be slower than unique, which in fact has to sort the data).
Why not remove the zeros as a second step:
result2 = unique(.....);
result2 = result2(result2~=0);
I also found another way to do it :
result2 = unique(dataArray(dataArray(:,1)>0,1));
Searching around here one finds many questions how one can convert cell arrays of doubles into one big matrix.
In my application I have a two dimensional cell array (lets call it celldata of size m times n) of all same sized double matrices (lets say of size a times b).
I want to convert that data structure into one big 4D double (m times n times a times b).
At the moment I do that by
reshape(cat(3,celldata{:}),m,n,a,b)
but maybe there are other methods doing that directly? Maybe with a call like
cat([3 4],celldata{:,:})
or similar.
I think
cell2mat(permute(celldata, [3 4 1 2]))
will do the trick. However,
%// create some bogus data
m = 1.1e2;
n = 1.2e2;
a = 1.3e2;
b = 1.4e2;
celldata = cellfun(@(~) randi(10, a,b, 'uint8'), cell(m,n), 'UniformOutput', false);
%// new method
tic
cell2mat(permute(celldata, [3 4 1 2]));
toc
%// your current method
tic
reshape(cat(3,celldata{:}),m,n,a,b);
toc
Results:
Elapsed time is 1.745495 seconds. % cell2mat/permute
Elapsed time is 0.305368 seconds. % reshape/cat
cell2mat is a matlab m-file (with necessary inefficiencies in the loop due to compatibility issues), while reshape and cat are built-ins. This is where that difference comes from.
I'd stick with your current method :)
Now, I'm asking you why you'd want to do this conversion in the first place. Is it an indexing problem? Because
celldata{x,y}(w,z)
prevents you from having to do the conversion, so you can index like
converted_celldata(x,y,w,z)
I don't see other reasons, because matrix/vector operations don't work anyway on 4D arrays...
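One thing worth checking with any of these conversions is the element order. reshape never moves data, so if the goal is that converted_celldata(x,y,w,z) equals celldata{x,y}(w,z), a permute is needed as well. The following is only a sketch under that assumption, on small made-up sizes (stacked and X are illustrative names):

```matlab
% Sketch: build an m x n x a x b array X with X(i,j,w,z) == celldata{i,j}(w,z).
m = 2; n = 3; a = 4; b = 5;
celldata = cell(m, n);
for k = 1:numel(celldata)
    celldata{k} = rand(a, b);              % bogus data, one a x b matrix per cell
end

stacked = cat(3, celldata{:});             % a x b x (m*n), cells in column-major order
X = permute(reshape(stacked, a, b, m, n), [3 4 1 2]);   % reorder to m x n x a x b

% Verify the mapping for every cell.
ok = true;
for i = 1:m
    for j = 1:n
        ok = ok && isequal(squeeze(X(i, j, :, :)), celldata{i, j});
    end
end
```

The permute copies the data once, so it adds some cost on top of the cat/reshape timing shown above.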
Background
My question is motivated by simple observations, which somewhat undermine the beliefs/assumptions often held/made by experienced MATLAB users:
MATLAB is very well optimized when it comes to the built-in functions and the fundamental language features, such as indexing vectors and matrices.
Loops in MATLAB are slow (despite the JIT) and should generally be avoided if the algorithm can be expressed in a native, 'vectorized' manner.
The bottom line: core MATLAB functionality is efficient and trying to outperform it using MATLAB code is hard, if not impossible.
Investigating performance of vector indexing
The example codes shown below are as fundamental as it gets: I assign a scalar value to all vector entries. First, I allocate an empty vector x:
tic; x = zeros(1e8,1); toc
Elapsed time is 0.260525 seconds.
Having x I would like to set all its entries to the same value. In practice you would do it differently, e.g., x = value*ones(1e8,1), but the point here is to investigate the performance of vector indexing. The simplest way is to write:
tic; x(:) = 1; toc
Elapsed time is 0.094316 seconds.
Let's call it method 1 (from the value assigned to x). It seems to be very fast (faster at least than memory allocation). Because the only thing I do here is operate on memory, I can estimate the efficiency of this code by calculating the obtained effective memory bandwidth and comparing it to the hardware memory bandwidth of my computer:
eff_bandwidth = numel(x) * 8 bytes per double * 2 / time
In the above, I multiply by 2 because unless SSE streaming is used, setting values in memory requires that the vector is both read from and written to the memory. In the above example:
eff_bandwidth(1) = 1e8*8*2/0.094316 = 17 GB/s
The STREAM-benchmarked memory bandwidth of my computer is around 17.9 GB/s, so indeed MATLAB delivers close to peak performance in this case! So far, so good.
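The bandwidth estimate is plain arithmetic and can be reproduced directly. A minimal sketch using the numbers quoted above (the variable names are illustrative):

```matlab
% Sketch: effective memory bandwidth of x(:) = 1 for a 1e8-element double vector.
n_elements       = 1e8;        % numel(x)
bytes_per_double = 8;
traffic_factor   = 2;          % each element is read and written (no streaming stores)
elapsed          = 0.094316;   % seconds, measured time for method 1

eff_bandwidth = n_elements * bytes_per_double * traffic_factor / elapsed;  % bytes/s
fprintf('Effective bandwidth: %.1f GB/s\n', eff_bandwidth / 1e9);
```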
Method 1 is suitable if you want to set all vector elements to some value. But if you want to access every step-th element, you need to replace the : with, e.g., 1:step:end. Below is a direct speed comparison with method 1:
tic; x(1:end) = 2; toc
Elapsed time is 0.496476 seconds.
While you would not expect it to perform any different, method 2 is clearly big trouble: factor 5 slowdown for no reason. My suspicion is that in this case MATLAB explicitly allocates the index vector (1:end). This is somewhat confirmed by using explicit vector size instead of end:
tic; x(1:1e8) = 3; toc
Elapsed time is 0.482083 seconds.
Methods 2 and 3 perform equally badly.
Another possibility is to explicitly create an index vector id and use it to index x. This gives you the most flexible indexing capabilities. In our case:
tic;
id = 1:1e8; % colon(1,1e8);
x(id) = 4;
toc
Elapsed time is 1.208419 seconds.
Now that is really something - 12 times slowdown compared to method 1! I understand it should perform worse than method 1 because of the additional memory used for id, but why is it so much worse than methods 2 and 3?
Let's give loops a try, as hopeless as it may sound.
tic;
for i=1:numel(x)
x(i) = 5;
end
toc
Elapsed time is 0.788944 seconds.
A big surprise - a loop beats a vectorized method 4, but is still slower than methods 1, 2 and 3. It turns out that in this particular case you can do it better:
tic;
for i=1:1e8
x(i) = 6;
end
toc
Elapsed time is 0.321246 seconds.
And that is probably the most bizarre outcome of this study: a MATLAB-written loop significantly outperforms native vector indexing. That should certainly not be so. Note that the JIT'ed loop is still 3 times slower than the theoretical peak almost obtained by method 1. So there is still plenty of room for improvement. It is just surprising (a stronger word would be more suitable) that the usual 'vectorized' indexing (1:end) is even slower.
Questions
is simple indexing in MATLAB very inefficient (methods 2, 3, and 4 are slower than method 1), or did I miss something?
why is method 4 (so much) slower than methods 2 and 3?
why does using 1e8 instead of numel(x) as a loop bound speed up the code by factor 2?
Edit
After reading Jonas's comment, here is another way to do that using logical indices:
tic;
id = logical(ones(1, 1e8));
x(id) = 7;
toc
Elapsed time is 0.613363 seconds.
Much better than method 4.
For convenience:
function test
tic; x = zeros(1,1e8); toc
tic; x(:) = 1; toc
tic; x(1:end) = 2; toc
tic; x(1:1e8) = 3; toc
tic;
id = 1:1e8; % colon(1,1e8);
x(id) = 4;
toc
tic;
for i=1:numel(x)
x(i) = 5;
end
toc
tic;
for i=1:1e8
x(i) = 6;
end
toc
end
I can, of course, only speculate. However when I run your test with the JIT compiler enabled vs disabled, I get the following results:
% with JIT no JIT
0.1677 0.0011 %# init
0.0974 0.0936 %# #1 I added an assignment before this line to avoid issues with deferring
0.4005 0.4028 %# #2
0.4047 0.4005 %# #3
1.1160 1.1180 %# #4
0.8221 48.3239 %# #5 This is where "don't use loops in Matlab" comes from
0.3232 48.2197 %# #6
0.5464 %# logical indexing
Dividing shows us where there is any speed increase:
% withoutJit./withJit
0.0067 %# w/o JIT, the memory allocation is deferred
0.9614 %# no JIT
1.0057 %# no JIT
0.9897 %# no JIT
1.0018 %# no JIT
58.7792 %# numel
149.2010 %# no numel
The apparent speed-up on initialization happens because, with the JIT turned off, MATLAB appears to delay the memory allocation until it is used, so x=zeros(...) does not really do anything. (thanks, @angainor).
Methods 1 through 4 don't seem to benefit from the JIT. I guess that #4 could be slow due to additional input testing in subsref to make sure that the input is of the proper form.
The numel result could have something to do with it being harder for the compiler to deal with an uncertain number of iterations, or with some overhead due to checking whether the bound of the loop is ok (though the no-JIT tests suggest only ~0.1s for that).
Surprisingly, on R2012b on my machine, logical indexing seems to be slower than #4.
I think that this goes to show, once again, that MathWorks have done great work in speeding up code, and that "don't use loops" isn't always best if you're trying to get the fastest execution time (at least at the moment). Nevertheless, I find that vectorizing is in general a good approach, since (a) the JIT fails on more complex loops, and (b) learning to vectorize makes you understand Matlab a lot better.
Conclusion: If you want speed, use the profiler, and re-profile if you switch Matlab versions.
As pointed out by @Adriaan in the comments, nowadays it may be better to use timeit() to measure execution speed.
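A minimal sketch of what a timeit-based comparison could look like. Since an anonymous function cannot contain an assignment, the indexed assignments are wrapped in small helper functions; the names demo_timeit, fillColon and fillRange are made up for illustration, and note that each timing here also includes the zeros allocation:

```matlab
function [t_colon, t_range] = demo_timeit
% Sketch: compare x(:) = v and x(1:end) = v with timeit instead of tic/toc.
% timeit runs each handle several times and returns a typical time in seconds.
    n = 1e6;                                   % smaller than 1e8 to keep the demo quick
    t_colon = timeit(@() fillColon(n));
    t_range = timeit(@() fillRange(n));
    fprintf('x(:)     = 1 : %.6f s\n', t_colon);
    fprintf('x(1:end) = 2 : %.6f s\n', t_range);
end

function x = fillColon(n)
    x = zeros(n, 1);
    x(:) = 1;          % method 1 from the question
end

function x = fillRange(n)
    x = zeros(n, 1);
    x(1:end) = 2;      % method 2 from the question
end
```

Saved as demo_timeit.m, calling demo_timeit prints both timings; the absolute numbers will depend on the MATLAB version and machine.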
For reference, I used the following slightly modified test function
function tt = speedTest
tt = zeros(8,1);
tic; x = zeros(1,1e8); tt(1)=toc;
x(:) = 2;
tic; x(:) = 1; tt(2)=toc;
tic; x(1:end) = 2; tt(3)=toc;
tic; x(1:1e8) = 3; tt(4)=toc;
tic;
id = 1:1e8; % colon(1,1e8);
x(id) = 4;
tt(5)=toc;
tic;
for i=1:numel(x)
x(i) = 5;
end
tt(6)=toc;
tic;
for i=1:1e8
x(i) = 6;
end
tt(7)=toc;
%# logical indexing
tic;
id = true(1e8,1);
x(id)=7;
tt(8)=toc;
I do not have an answer to all the problems, but I do have some refined speculations on methods 2, 3 and 4.
Regarding methods 2 and 3: it does indeed seem that MATLAB allocates memory for the vector indices and fills it with values from 1 to 1e8. To understand this, let's see what is going on. By default, MATLAB uses double as its data type. Allocating the index array takes the same time as allocating x
tic; x = zeros(1e8,1); toc
Elapsed time is 0.260525 seconds.
For now, the index array contains only zeros. Assigning values to the x vector in an optimal way, as in method 1, takes 0.094316 seconds. Now, the index vector has to be read from the memory so that it can be used in indexing. That is additional 0.094316/2 seconds. Recall that in x(:)=1 vector x has to be both read from and written to the memory. So only reading it takes half the time. Assuming this is all that is done in x(1:end)=value, the total time of methods 2 and 3 should be
t = 0.260525+0.094316+0.094316/2 = 0.402
It is almost correct, but not quite. I can only speculate, but filling the index vector with values is probably done as an additional step and takes additional 0.094316 seconds. Hence, t=0.4963, which more or less fits with the time of methods 2 and 3.
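The cost model above can be written out step by step. This sketch only reproduces the arithmetic with the measured numbers quoted in this answer; the breakdown into steps is the speculation described in the text, not something confirmed about MATLAB's internals:

```matlab
% Sketch: speculative cost model for x(1:end) = value (methods 2 and 3).
t_alloc = 0.260525;            % allocate the index vector (same as zeros(1e8,1))
t_set   = 0.094316;            % write x in the optimal way (method 1)
t_read  = t_set / 2;           % read the index vector (read only, so half of t_set)

t_basic = t_alloc + t_set + t_read;    % 0.402 s, a bit too low
t_model = t_basic + t_set;             % add one pass to fill the index vector: 0.4963 s

fprintf('basic model: %.4f s, with index fill: %.4f s\n', t_basic, t_model);
fprintf('measured   : %.4f s (method 2), %.4f s (method 3)\n', 0.496476, 0.482083);
```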
These are only speculations, but they do seem to confirm that MATLAB explicitly creates index vectors when doing native vector indexing. Personally, I consider this to be a performance bug. MATLAB's JIT compiler should be smart enough to understand this trivial construct and convert it to a call to a correct internal function. As it is now, on today's memory-bandwidth-bound architectures, indexing performs at around 20% of the theoretical peak.
So if you do care about performance, you will have to implement x(1:step:end) as a MEX function, something like
set_value(x, 1, step, 1e8, value);
Now this is clearly illegal in MATLAB, since you are NOT ALLOWED to modify arrays in the MEX files inplace.
Edit Regarding method 4, one can try to analyze the performance of the individual steps as follows:
tic;
id = 1:1e8; % colon(1,1e8);
toc
tic
x(id) = 4;
toc
Elapsed time is 0.475243 seconds.
Elapsed time is 0.763450 seconds.
The first step, allocating and filling the index vector, takes the same time as methods 2 and 3 alone. That seems like way too much: it should take at most the time needed to allocate the memory and to set the values (0.260525s+0.094316s = 0.3548s), so there is an additional overhead of 0.12 seconds somewhere, which I cannot explain. The second part (x(id) = 4) also looks very inefficient: it should take the time needed to set the values of x and to read the id vector (0.094316s+0.094316/2s = 0.1415s) plus some error checks on the id values. Programmed in C, the two steps take:
create id 0.214259
x(id) = 4 0.219768
The code used checks that a double index in fact represents an integer, and that it fits the size of x:
tic();
id = malloc(sizeof(double)*n);
for(i=0; i<n; i++) id[i] = i;
toc("create id");
tic();
for(i=0; i<n; i++) {
long iid = (long)id[i];
if(iid>=0 && iid<n && (double)iid==id[i]){
x[iid] = 4;
} else break;
}
toc("x(id) = 4");
The second step takes longer than the expected 0.1415s - that is due to the necessity of error checks on id values. The overhead seems too large to me - maybe it could be written better. Still, the time required is 0.4340s , not 1.208419s. What MATLAB does under the hood - I have no idea. Maybe it is necessary to do it, I just don't see it.
Of course, using doubles as indices introduces two additional levels of overhead:
the size of a double is twice the size of a uint32. Recall that memory bandwidth is the limiting factor here.
doubles need to be cast to integers for indexing
Method 4 can be written in MATLAB using integer indices:
tic;
id = uint32(1):1e8;
toc
tic
x(id) = 8;
toc
Elapsed time is 0.327704 seconds.
Elapsed time is 0.561121 seconds.
Which clearly improved the performance by 30% and proves that one should use integers as vector indices. However, the overhead is still there.
As I see it now, we can not do anything to improve the situation working within the MATLAB framework, and we have to wait till Mathworks fixes these issues.
Just a quick note to show how in 8 years of development, the performance characteristics of MATLAB have changed a lot.
This is on R2017a (5 years after OP's post):
Elapsed time is 0.000079 seconds. % x = zeros(1,1e8);
Elapsed time is 0.101134 seconds. % x(:) = 1;
Elapsed time is 0.578200 seconds. % x(1:end) = 2;
Elapsed time is 0.569791 seconds. % x(1:1e8) = 3;
Elapsed time is 1.602526 seconds. % id = 1:1e8; x(id) = 4;
Elapsed time is 0.373966 seconds. % for i=1:numel(x), x(i) = 5; end
Elapsed time is 0.374775 seconds. % for i=1:1e8, x(i) = 6; end
Note how the loop for 1:numel(x) is faster than indexing x(1:end), it seems that the array 1:end is still being created, whereas for the loop it is not. It is now better in MATLAB to not vectorize!
(I did add an assignment x(:)=0 after allocating the matrix, outside of any timed regions, to actually have the memory allocated, since zeros only reserves the memory.)
On MATLAB R2020b (online) (3 years later) I see these times:
Elapsed time is 0.000073 seconds. % x = zeros(1,1e8);
Elapsed time is 0.084847 seconds. % x(:) = 1;
Elapsed time is 0.084643 seconds. % x(1:end) = 2;
Elapsed time is 0.085319 seconds. % x(1:1e8) = 3;
Elapsed time is 1.393964 seconds. % id = 1:1e8; x(id) = 4;
Elapsed time is 0.168394 seconds. % for i=1:numel(x), x(i) = 5; end
Elapsed time is 0.169830 seconds. % for i=1:1e8, x(i) = 6; end
x(1:end) is now optimized in the same way as x(:); the vector 1:end is no longer being explicitly created.