Multithreading Performance in MATLAB C API

I was reading the article Multi-threaded Mex from Undocumented Matlab and decided to benchmark the given example (a max(a,b) function without an explicit output; the function updates a in place with the larger value at each corresponding index of the two matrices).
Multithreading starts to pay off for matrices with more than 1 million elements (a 1000 x 1000 matrix, for example). For small matrices, since the main function is very simple (a for-loop and an if-statement that copies b[i] into a[i] when b[i] > a[i]), the measured time is essentially the time needed to create the threads. I expected multithreading to be slower in this case, but not that much slower (more than a hundred times). So I decided to come here and ask whether those results are reasonable.
The .c file can be found in MATLAB's File Exchange and the benchmark routine was the following.
function t = max_in_place_tester(r,n,~)
if nargin < 1
r = [1e3,5e2,1e2,1e2,1e1,1e0,1e0,1e0];
r = [r.',r.'].';
r = r(1:(end-1));
end
if isempty(r)
r = 1;
end
if nargin < 2
m = maxNumCompThreads;
n = [1e1,1e2,1e3,1e4,1e5,1e6,1e7,1e8];
n = [n.',5*n.'].';
n = n(1:(end-1));
t = zeros(m,size(n,2));
if size(r,2) == 1
r = repmat(r,1,size(n,2));
end
for i = 1:size(n,2)
t(:,i) = max_in_place_tester(r(i),n(i),[]);
end
n = log10(n);
t = t ./ r(1,:);
%t = t ./ t(1,:);
figure('Color','White');
hold on, grid on;
xlabel('log10(Number of Elements)');
ylabel('Relative Time Spent');
for i = 1:m
plot(n,t(i,:)./t(1,:),'LineWidth',2.5,'DisplayName',sprintf('Number of Threads: %d',i));
end
legend;
else
m = maxNumCompThreads;
n = round(sqrt(n));
t = zeros(m,1);
a = rand(n,n);
b = rand(n,n);
c = a;
d = b;
%getaddress(a,b,c,d)
c(1,1) = a(1,1);
d(1,1) = b(1,1);
%getaddress(a,b,c,d)
for i = 1:m
a = c;
b = d;
%getaddress(a,b,c,d)
a(1,1) = c(1,1);
b(1,1) = d(1,1);
%getaddress(a,b,c,d)
maxNumCompThreads(i);
if nargin > 2
s = tic;
for j = 1:r
max_in_place(a,b);
end
t(i,1) = toc(s);
else
fprintf('Number of Threadings: %d\n',maxNumCompThreads);
tic;
for j = 1:r
max_in_place(a,b);
end
toc;
end
end
end
end
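The observation in the question (thread start-up dominating for small inputs) suggests capping the number of threads by problem size. The following is a minimal C sketch of that idea, not the File Exchange source: the names `max_in_place_range`, `pick_thread_count`, `chunk_bounds` and the constant `MIN_ELEMS_PER_THREAD` are hypothetical, and the threshold value is an assumption that would need tuning on real hardware. The driver walks the chunks serially so the partitioning can be checked without any threading library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning knob: below this many elements per thread,
   thread start-up cost is assumed to outweigh the work itself. */
#define MIN_ELEMS_PER_THREAD 100000u

/* Serial kernel: a[i] = max(a[i], b[i]) over the half-open range [lo, hi). */
void max_in_place_range(double *a, const double *b, size_t lo, size_t hi)
{
    for (size_t i = lo; i < hi; i++) {
        if (b[i] > a[i]) {
            a[i] = b[i];
        }
    }
}

/* Pick a thread count so that each thread gets at least
   MIN_ELEMS_PER_THREAD elements; never fewer than one thread. */
size_t pick_thread_count(size_t n, size_t max_threads)
{
    assert(max_threads >= 1);
    size_t useful = n / MIN_ELEMS_PER_THREAD;
    if (useful < 1) {
        useful = 1;
    }
    return useful < max_threads ? useful : max_threads;
}

/* Bounds of the t-th of nthreads contiguous chunks covering [0, n). */
void chunk_bounds(size_t n, size_t nthreads, size_t t, size_t *lo, size_t *hi)
{
    assert(nthreads >= 1 && t < nthreads);
    *lo = t * n / nthreads;
    *hi = (t + 1) * n / nthreads;
}

/* Driver: processes the chunks one after another. A threaded version
   would hand each chunk to its own worker instead of looping here. */
void max_in_place(double *a, const double *b, size_t n, size_t max_threads)
{
    size_t nthreads = pick_thread_count(n, max_threads);
    for (size_t t = 0; t < nthreads; t++) {
        size_t lo, hi;
        chunk_bounds(n, nthreads, t, &lo, &hi);
        max_in_place_range(a, b, lo, hi);
    }
}
```

With this threshold, a 10-element input is handled by a single chunk, so no thread creation cost would be paid where it cannot be recovered.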

Related

compressed sparse row and Jacobi iterative method

I've tried implementing the Jacobi method for a matrix in compressed sparse row format, but I couldn't get the correct output. Below is the code I tried. I'm testing with a 4 by 4 tridiagonal matrix, which is stored in compressed form before the Jacobi iteration runs. Please help.
clear all;
close all;
clc;
H=4;
a=2;
b=-1;
c=-1;
A = diag(a*ones(1,H)) + diag(b*ones(1,H-1),1) + diag(c*ones(1,H-1),-1);%Matrix A
n = size(A,1); % no of rows
m = size(A,2); % no of columns
V = [];
C = [];
R = [];
counter=1;
R= [counter];
for i=1:n
for j=1:m
if (A(i,j) ~= 0)
V = [V A(i,j)];
C = [C j];
counter=counter+1;
end
R(i+1)=counter;
end
end
b = [9,18,24,3];
x_new = [1 ; 1 ; 1 ; 1];
eps = 1e-5; % convergence tolerance, 1 x 10^(-5)
error = 1000; % use any large value greater than eps to make sure that the loop can work
counter2=1;
while (error > eps)
x_old = x_new;
for i=1:length(R)-1 %modified
t = 0;
for j=R(i):R(i+1)-1 %modified
if (C(j)~=i) %not equal
t = t + x_old(C(j))*A(i,C(j)); %modified
end
end
x_new(i,1) = (b(i) - t)/A(i,C(j)); % is a row vector
end
error = norm(x_new-x_old);
counter2=counter2+1;
end
x_new % print x
The expected output is
[28.1987 47.3978 48.5979 25.7986]
Thank you for your time and consideration.
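For comparison, here is a minimal C sketch of a Jacobi sweep that reads only the compressed arrays: the diagonal entry and the off-diagonal products both come from the stored values, never from the dense matrix. The function name `jacobi_csr`, the 0-based indexing, and the max-norm stopping test are assumptions made for this illustration (the question's code uses 1-based indices and the 2-norm).

```c
#include <assert.h>
#include <stddef.h>

/* Jacobi iteration for A x = b with A in CSR form (0-based indices).
   val/col hold the nonzeros row by row; entries row_ptr[i] up to
   row_ptr[i+1]-1 belong to row i. x holds the initial guess on entry
   and the result on return; x_old is caller-provided scratch of
   length n. Returns the number of sweeps performed. */
int jacobi_csr(size_t n, const double *val, const size_t *col,
               const size_t *row_ptr, const double *b,
               double *x, double *x_old, double tol, int max_iter)
{
    assert(n > 0 && tol > 0.0 && max_iter > 0);
    int iter = 0;
    double change = tol + 1.0;
    while (change > tol && iter < max_iter) {
        for (size_t i = 0; i < n; i++) {
            x_old[i] = x[i];
        }
        change = 0.0;
        for (size_t i = 0; i < n; i++) {
            double off = 0.0;   /* sum of off-diagonal terms of row i */
            double diag = 0.0;  /* diagonal entry of row i */
            for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
                if (col[k] == i) {
                    diag = val[k];
                } else {
                    off += val[k] * x_old[col[k]];
                }
            }
            assert(diag != 0.0);
            x[i] = (b[i] - off) / diag;
            /* track the largest absolute update (max-norm of x - x_old) */
            double d = x[i] - x_old[i];
            if (d < 0.0) {
                d = -d;
            }
            if (d > change) {
                change = d;
            }
        }
        iter++;
    }
    return iter;
}
```

For the 4 by 4 tridiagonal system in the question, the exact solution is [28.2 47.4 48.6 25.8], which the expected output above approximates.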

Convert c code to haskell code using recursion instead of loops (no lists)

I want to convert the following C code to Haskell, without using lists. For a given n, it counts the pairs of numbers (a, b) that satisfy n = (a*a)*(b*b*b).
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main(void) {
    int n = 46656;
    int i, j, counter = 0, res = 1;
    int tr = sqrt(n);
    for (i = 1; i <= tr; i++) {
        for (j = 1; j <= tr; j++) {
            res = (i*i) * (j*j*j);
            if (res == n) {
                counter = counter + 1;
            }
            printf("%d\n", res);
        }
    }
    printf("%d\n", counter);
}
I've managed to do something similar in Haskell as far as the loops go, but only for computing the overall sum. I find it difficult to implement the if part and the counter part (see the C code) in Haskell. Any help is much appreciated! Here's my Haskell code:
sumF :: (Int->Int)->Int->Int
sumF f 0 = 0
sumF f n = sumF f (n-1) + f n
sumF1n1n :: (Int->Int->Int)->Int->Int
sumF1n1n f 0 = 0
sumF1n1n f n = sumF1n1n f (n-1)
    + sumF (\i -> f i n) (n-1)
    + sumF (\j -> f n j) (n-1)
    + f n n
func :: Int->Int->Int
func 0 0 = 0
func a b = res
  where
    res = (a^2 * b^3)
call :: Int->Int
call n = sumF1n1n func n
I guess an idiomatic translation would look like this:
n = 46656
tr = floor (sqrt (fromIntegral n))
counter = length
  [ ()
  | i <- [1..tr]
  , j <- [1..tr]
  , i*i*j*j*j == n
  ]
Doing it without lists is possible, but definitely not the best looking:
counter n = go tr tr
  where
    tr = floor (sqrt (fromIntegral n))
    go 0 _ = 0
    go i tr = (go2 tr 0 i) + (go (i - 1) tr)
    go2 0 c i = c
    go2 j c i = go2 (j - 1) (if i^2 * j^3 == n then c + 1 else c) i
A general and relatively straightforward way to translate imperative code is to replace each basic block with a function, and give it a parameter for every piece of state it uses. If it’s a loop, it will repeatedly tail-call itself with different values of those parameters. If you don’t care about printing the intermediate results, this translates straightforwardly:
The main program prints the result of the outer loop, which begins with i = 1 and counter = 0.
main = print (outer 1 0)
  where
These are constants, so we can just bind them outside the loops:
    n = 46656
    tr = floor (sqrt (fromIntegral n))
The outer loop tail-calls itself with increasing i, and counter updated by the inner loop, until i > tr, then it returns the final counter.
    outer i counter
      | i <= tr = outer (i + 1) (inner 1 counter)
      | otherwise = counter
      where
The inner loop tail-calls itself with increasing j, and its counter (counter') incremented when i^2 * j^3 == n, until j > tr, then it returns the updated counter back to outer. Note that this is inside the where clause of outer because it uses i to calculate res—you could alternatively make i an additional parameter.
        inner j counter'
          | j <= tr = inner (j + 1) $
              let res = i ^ 2 * j ^ 3
              in if res == n then counter' + 1 else counter'
          | otherwise = counter'
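The block-to-function translation described above can also be mirrored back in C, which makes the correspondence with the original loops easy to check. This is a sketch under a few assumptions: the names `isqrt_floor`, `inner`, `outer` and `count_pairs` are made up for illustration, `long long` replaces `int` because i*i*j*j*j exceeds the 32-bit range for i = j = 216, and the integer square root replaces the floating-point `sqrt` call. C compilers are not required to eliminate tail calls, so the recursion depth here grows with `tr`.

```c
#include <assert.h>

/* Largest t with t*t <= n, found by counting up (no floating point). */
long long isqrt_floor(long long n)
{
    assert(n >= 0);
    long long t = 0;
    while ((t + 1) * (t + 1) <= n) {
        t++;
    }
    return t;
}

/* Inner loop as a function: for a fixed i, scans j, j+1, ..., tr and
   returns the counter, incremented once per j with i^2 * j^3 == n. */
long long inner(long long n, long long tr, long long i, long long j,
                long long counter)
{
    if (j > tr) {
        return counter;
    }
    long long res = (i * i) * (j * j * j);
    return inner(n, tr, i, j + 1, res == n ? counter + 1 : counter);
}

/* Outer loop as a function: scans i, i+1, ..., tr, threading the
   counter through the inner loop on each step. */
long long outer(long long n, long long tr, long long i, long long counter)
{
    if (i > tr) {
        return counter;
    }
    return outer(n, tr, i + 1, inner(n, tr, i, 1, counter));
}

/* Entry point: the constants are computed once, then the outer loop
   starts with i = 1 and counter = 0. */
long long count_pairs(long long n)
{
    long long tr = isqrt_floor(n);
    return outer(n, tr, 1, 0);
}
```

For n = 46656 the matching pairs (i, j) are (216, 1), (27, 4), (8, 9) and (1, 36), so `count_pairs(46656)` yields 4.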

Need suggestion on Code conversion to Matlab

I am new to Matlab programming but have to convert a C program to Matlab. A few parts are confusing me. I am posting those parts here in both C and Matlab, and I am looking for suggestions to improve the code, because the full program does not give the right output:
C Code:
j = 0;
for (i=0;i<256;i++){
j = (j+S[i]+key[i%strlen(key)]) %256;
int t = S[i];
S[i] = S[j];
S[j] = t;
}
Matlab Code:
le = length(key);
sc = 0:255;
output = 0;
for i0 = 1:255
output=rem((output+sc(i0+1)+key(rem(i0,le)+1)),256);
tm = sc(i0+1);
sc(i0+1) = sc(outpt+1);
sc(outpt+1) = tm;
end
Since you're using the expression sc(i0+1) to calculate the remainder, you should start the for loop from 0.
le = length(key);
sc = 0:255;
output = 0;
for i0 = 0:255
output=rem((output+sc(i0+1)+key(rem(i0,le)+1)),256);
end
For this C code:
j = 0;
for (i=0;i<256;i++)
{
j = (j+S[i]+key[i%strlen(key)]) %256;
int t = S[i];
S[i] = S[j];
S[j] = t;
}
I would get this Matlab code:
j = 0;
for i = 1:256
j = mod(j + S(i) + key(mod(i-1, length(key)) + 1), 256);
t = S(i);
S(i) = S(j+1);
S(j+1) = t;
end
So two issues:
For integer operands, % in C (C99 and later) truncates toward zero, so it behaves like rem in Matlab rather than mod. The two Matlab functions agree when all your numbers are positive, in which case it doesn't matter. If you are dealing with negative numbers, you need to check which behaviour you're after.
An indexing loop from 0 -> 255 in C should go from 1 -> 256 in Matlab, because Matlab begins indexing arrays at 1 rather than 0 as in C.
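The first issue can be made concrete with a small C sketch. `rem_trunc` simply wraps C's `%`, whose result takes the sign of the dividend; `mod_floor` is a hypothetical helper whose result takes the sign of the divisor, which is how Matlab's mod behaves for nonzero divisors. Both function names are made up for this illustration.

```c
#include <assert.h>

/* Truncated remainder: what C's % computes (C99 and later).
   The result takes the sign of the dividend a. */
int rem_trunc(int a, int b)
{
    assert(b != 0);
    return a % b;
}

/* Floored modulo: the result takes the sign of the divisor b. */
int mod_floor(int a, int b)
{
    assert(b != 0);
    int r = a % b;
    /* shift by one period when the remainder and divisor disagree in sign */
    if (r != 0 && ((r < 0) != (b < 0))) {
        r += b;
    }
    return r;
}
```

The two helpers agree whenever both operands are positive, which is the case for the key-scheduling loop above since j, S[i] and the key bytes are non-negative.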

Filtering out Non-integers in my array OCTAVE/MATLAB

I have a code that determines prime factors written as:
N=12345678
for i = 2 : N
q = 0;
while N/i == floor(N/i)
N = N/i;
q = q + 1;
end
if q > 0
fac=i
if N == 1
break
end
end
end
However, I want to collect the resulting values, which are 2, 3, 47, and 14593, into one single matrix.
How can I do this?
If, as it seems, your code is MATLAB, you can simply do this:
N=12345678
fac = [];
for i = 2 : N
q = 0;
while N/i == floor(N/i)
N = N/i;
q = q + 1;
end
if q > 0
fac=[fac, i];
if N == 1
break
end
end
end
Did you try to do it yourself on purpose? You could use Matlab's factor function instead. It returns repeated prime factors as well, so wrap it in unique,
unique(factor(N))
which gives the same result.
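The accumulate-into-an-array idea carries over directly to C. The sketch below is an illustration rather than a translation of the loop above: the function name `distinct_prime_factors` and the caller-supplied capacity `cap` are assumptions, and the loop stops at i*i <= n (a standard trial-division shortcut) instead of running up to N, with whatever remains above 1 being the last prime factor.

```c
#include <assert.h>
#include <stddef.h>

/* Stores the distinct prime factors of n in out (ascending order) and
   returns how many were stored. At most cap entries are written; if
   the buffer fills up, the factors found so far are returned. */
size_t distinct_prime_factors(unsigned long long n,
                              unsigned long long *out, size_t cap)
{
    assert(n >= 2);
    size_t count = 0;
    for (unsigned long long i = 2; i * i <= n; i++) {
        if (n % i == 0) {
            if (count == cap) {
                return count;
            }
            out[count++] = i;
            /* divide out every power of i so it is recorded only once */
            while (n % i == 0) {
                n /= i;
            }
        }
    }
    /* whatever is left above 1 is a prime larger than sqrt of the rest */
    if (n > 1 && count < cap) {
        out[count++] = n;
    }
    return count;
}
```

For N = 12345678 this fills the buffer with 2, 3, 47 and 14593, matching the values listed in the question.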

Code benchmarking statistics

As I wrote in my previous topic, "Benchmarking code - am I doing it right?", I need a way to get benchmark statistics such as the mean (average), standard deviation, etc. How can I do this using the methods I posted? Note that my approach benchmarks code over a time interval, not by calling a function many times. Any ideas?
I came up with just one and don't know if it's correct (pseudocode):
buffsize = 1024;
buffer [buffsize];
totalcycles = 0
// arrays
walltimeresults = []
cputimeresults = []
// benchmarking
for i in range (0, iterations):
    start = walltime();
    fun2measure(args, buffer);
    end = walltime();
    walltimeresults[i] = end - start;
    start = cputime();
    fun2measure(args, buffer);
    end = cputime();
    cputimeresults[i] = end - start;
    c1 = cyclecount();
    fun2measure(args, buffer);
    c2 = cyclecount();
    cyclesperbyte = (c2-c1)/buffsize;
    totalcycles += cyclesperbyte;
sum = 0;
for i in range (0, iterations) : sum += walltimeresults[i];
avg_wall_time = sum / iterations;
sum = 0;
for i in range (0, iterations) : sum += cputimeresults[i];
avg_cpu_time = sum / iterations;
avg_cycles = totalcycles / iterations;
Is it correct? How about mean, standard deviation, etc?
Your average looks OK.
Mean (i.e. average) is
mean = 1/N * sum( x[i] )
Standard deviation is square root of variance:
sigma = sqrt( 1/N * sum( (x[i]-mean)^2 ) )
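The two formulas above can be sketched in C as a two-pass computation over the recorded timings. The struct `bench_stats` and the function `sample_stats` are hypothetical names; the sketch returns the population variance (divisor N, as in the formula), and the standard deviation would be obtained by passing that value to `sqrt` from `<math.h>`, which is left out here so the block needs no math library.

```c
#include <assert.h>
#include <stddef.h>

struct bench_stats {
    double mean;      /* 1/N * sum(x[i]) */
    double variance;  /* 1/N * sum((x[i] - mean)^2); sigma is its square root */
};

/* Two passes over the samples: first the mean, then the mean of the
   squared deviations from it. */
struct bench_stats sample_stats(const double *x, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    double mean = sum / (double)n;

    double sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - mean;
        sq += d * d;
    }

    struct bench_stats s = { mean, sq / (double)n };
    return s;
}
```

Applied to walltimeresults or cputimeresults from the pseudocode, this gives the mean and spread of the individual runs. For the samples 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the variance is 4, so sigma is 2.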
