I am new to Matlab programming but have to convert a C program to Matlab. A few parts are confusing me. I am posting those parts here in both C and Matlab and am looking for suggestions to improve the code, because the full code does not give the right output:
C Code:
j = 0;
for (i=0;i<256;i++){
j = (j+S[i]+key[i%strlen(key)]) %256;
int t = S[i];
S[i] = S[j];
S[j] = t;
}
Matlab Code:
le = length(key);
sc = 0:255;
output = 0;
for i0 = 1:255
output=rem((output+sc(i0+1)+key(rem(i0,le)+1)),256);
tm = sc(i0+1);
sc(i0+1) = sc(outpt+1);
sc(outpt+1) = tm;
end
Since you're using the expression sc(i0+1) to calculate the remainder, you should start the for loop from 0.
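le = length(key);
sc = 0:255;
output = 0;
for i0 = 0:255
output=rem((output+sc(i0+1)+key(rem(i0,le)+1)),256);
% swap sc(i0+1) and sc(output+1), as in the C loop
tm = sc(i0+1);
sc(i0+1) = sc(output+1);
sc(output+1) = tm;
end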
For this C code:
j = 0;
for (i=0;i<256;i++)
{
j = (j+S[i]+key[i%strlen(key)]) %256;
int t = S[i];
S[i] = S[j];
S[j] = t;
}
I would get this Matlab code:
j = 0;
for i = 1:256
j = mod(j + S(i) + key(mod(i-1, length(key)) + 1), 256);
t = S(i);
S(i) = S(j+1);
S(j+1) = t;
end
So two issues:
For non-negative operands, % in C gives the same result as both rem and mod in Matlab, so here it doesn't matter which you use. For negative operands they differ: C's % (since C99) truncates toward zero, like rem, while mod takes the sign of the divisor, so check which behaviour you need.
An index loop from 0 to 255 in C should run from 1 to 256 in Matlab, because Matlab indexes arrays from 1 rather than from 0 as C does.
The computational cost only counts how many times c = c+1; is executed.
I want to express it in Big O notation as a function of n.
count = 0; index = 0; c = 0;
while (index <= n) {
count = count + 1;
index = index + count;
c = c + 1;
}
I think that if count reaches k when index reaches n, then k(k+1)/2 = n.
So I think the answer is O(root(n)).
Is that the right solution to this question?
This is easy to test. The value of c when your while loop has finished will be the number of times the loop has run (and, thus, the number of times the c = c + 1; statement is executed). So, let us examine the values of c, for various n, and see how they differ from the posited O(√n) complexity:
#include <stdio.h>
#include <math.h>
int main()
{
printf(" c root(n) ratio\n"); // rubric
for (int i = 1; i < 10; ++i) {
int n = 10000000 * i;
int count = 0;
int index = 0;
int c = 0;
while (index < n) {
count = count + 1;
index = index + count;
c = c + 1;
}
double d = sqrt(n);
printf("%5d %8.3lf %8.5lf\n", c, d, c / d);
}
return 0;
}
Output:
c root(n) ratio
4472 3162.278 1.41417
6325 4472.136 1.41431
7746 5477.226 1.41422
8944 6324.555 1.41417
10000 7071.068 1.41421
10954 7745.967 1.41416
11832 8366.600 1.41419
12649 8944.272 1.41420
13416 9486.833 1.41417
We can see that, even though there are some 'rounding' errors, the last column appears reasonably constant (and, as it happens, an approximation to √2, which will generally improve as n becomes larger) – thus, as we ignore constant coefficients in Big-O notation, the complexity is, as you predicted, O(√n).
Let's first see how index changes for each loop iteration:
index = 0 + 1 = 1
index = 0 + 1 + 2 = 3
index = 0 + 1 + 2 + 3 = 6
...
index = 0 + 1 + ... + i-1 + i = O(i^2)
Then we need to figure out how many times the loop runs, which is equivalent to isolating i in the equation:
i^2 = n =>
i = sqrt(n)
So your algorithm runs in O(sqrt(n)) which also can be written as O(n^0.5).
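The constant factor that the O hides can also be worked out. The derivation below is a sketch that assumes the question's loop condition index <= n, so the loop stops at the first k for which the running sum exceeds n; with the strict < used in the test program the count differs by at most one.

```latex
\text{index after } k \text{ iterations} \;=\; \sum_{j=1}^{k} j \;=\; \frac{k(k+1)}{2}

\frac{k(k+1)}{2} > n \;\iff\; k > \frac{-1 + \sqrt{1 + 8n}}{2}

c \;=\; \left\lfloor \frac{-1 + \sqrt{1 + 8n}}{2} \right\rfloor + 1 \;\approx\; \sqrt{2n} \;=\; \sqrt{2}\,\sqrt{n}
```

This agrees with the ratio of about 1.414 in the test program's output, and since constant factors are dropped the result is still O(sqrt(n)).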
I was reading the article Multi-threaded Mex from Undocumented Matlab and decided to benchmark the given example (a max(a,b) function without an explicit output; the function updates a in-place with the maximal values from corresponding indices of both matrices).
The multithreading starts to show its power for matrices with more than 1 million elements (a 1000 x 1000 matrix, for example). For small matrices, since the main function is very simple (a for-loop and an if-statement that copies the value from b to a if b[i] > a[i]), the run time is basically the time needed to create the threads. I was expecting the multithreaded version to be slower in this context, but not that much slower (more than a hundred times). So I decided to come here and ask whether those results are reasonable.
The .c file can be found in MATLAB's File Exchange and the benchmark routine was the following.
function t = max_in_place_tester(r,n,~)
if nargin < 1
r = [1e3,5e2,1e2,1e2,1e1,1e0,1e0,1e0];
r = [r.',r.'].';
r = r(1:(end-1));
end
if isempty(r)
r = 1;
end
if nargin < 2
m = maxNumCompThreads;
n = [1e1,1e2,1e3,1e4,1e5,1e6,1e7,1e8];
n = [n.',5*n.'].';
n = n(1:(end-1));
t = zeros(m,size(n,2));
if size(r,2) == 1
r = repmat(r,1,size(n,2));
end
for i = 1:size(n,2)
t(:,i) = max_in_place_tester(r(i),n(i),[]);
end
n = log10(n);
t = t ./ r(1,:);
%t = t ./ t(1,:);
figure('Color','White');
hold on, grid on;
xlabel('log10(Number of Elements)');
ylabel('Relative Time Spent');
for i = 1:m
plot(n,t(i,:)./t(1,:),'LineWidth',2.5,'DisplayName',sprintf('Number of Threads: %d',i));
end
legend;
else
m = maxNumCompThreads;
n = round(sqrt(n));
t = zeros(m,1);
a = rand(n,n);
b = rand(n,n);
c = a;
d = b;
%getaddress(a,b,c,d)
c(1,1) = a(1,1);
d(1,1) = b(1,1);
%getaddress(a,b,c,d)
for i = 1:m
a = c;
b = d;
%getaddress(a,b,c,d)
a(1,1) = c(1,1);
b(1,1) = d(1,1);
%getaddress(a,b,c,d)
maxNumCompThreads(i);
if nargin > 2
s = tic;
for j = 1:r
max_in_place(a,b);
end
t(i,1) = toc(s);
else
fprintf('Number of Threadings: %d\n',maxNumCompThreads);
tic;
for j = 1:r
max_in_place(a,b);
end
toc;
end
end
end
end
I am struggling with a simple question.
I want to get prime numbers.
I will use this algorithm
and I finished writing the code like this.
int k = 0, x = 1, n, prim, lim = 1;
int p[100000];
int xCount=0, limCount=0, kCount=0;
p[0] = 2;
scanf("%d", &n);
start = clock();
do
{
x += 2; xCount++;
if (sqrt(p[lim]) <= x)
{
lim++; limCount++;
}
k = 2; prim = true;
while (prim && k<lim)
{
if (x % p[k] == 0)
prim = false;
k++; kCount++;
}
if (prim == true)
{
p[lim] = x;
printf("prime number : %d\n", p[lim]);
}
} while (k<n);
I want to count how many times each of these statements runs (x+=2; lim++; k++;),
so I used the xCount, limCount and kCount variables.
When the input (n) is 10, the results are x : 14, lim : 9, k : 43, which is wrong.
The correct answer is (14,3,13).
Did I write the code incorrectly?
Please tell me what I should correct.
If you want to adapt an algorithm to your needs, it's always a good idea to implement it verbatim first, especially if you have pseudocode that is detailed enough to allow such a verbatim translation into C code (even more so with Fortran, but I digress).
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
int main (void){
// type index 1..n
int index;
// var
// x: integer
int x;
//i, k, lim: integer
int i, k, lim;
// prim: boolean
bool prim;
// p: array[index] of integer {p[i] = i'th prime number}
/*
We cannot do that directly, we need to know the value of "index" first
*/
int res;
res = scanf("%d", &index);
if(res != 1 || index < 1){
fprintf(stderr,"Only integral values >= 1, please. Thank you.\n");
return EXIT_FAILURE;
}
/*
The array from the pseudocode is a one-based array, take care
*/
int p[index + 1];
// initialize the whole array with distinguishable values in case of debugging
for(i = 0;i<index;i++){
p[i] = -i;
}
/*
Your variables
*/
int lim_count = 0, k_count = 0;
// begin
// p[1] = 2
p[1] = 2;
// write(2)
puts("2");
// x = 1
x = 1;
// lim = 1
lim = 1;
// for i:=2 to n do
for(i = 2;i < index; i++){
// repeat (until prim)
do {
// x = x + 2
x += 2;
// if(sqr(p[lim]) <= x) then
if(p[lim] * p[lim] <= x){
// lim = lim +1
lim++;
lim_count++;
}
// k = 2
k = 2;
// prim = true
prim = true;
// while (prim and (k < lim)) do
while (prim && (k < lim)){
// prim = "x is not divisible by p[k]"
if((x % p[k]) == 0){
prim = false;
}
// k = k + 1
k++;
k_count++;
}
// (repeat) until prim
} while(!prim);
// p[i] := x
p[i] = x;
// write(x)
printf("%d\n",x);
}
// end
printf("x = %d, lim_count = %d, k_count = %d \n",x,lim_count,k_count);
for(i = 0;i<index;i++){
printf("%d, ",p[i]);
}
putchar('\n');
return EXIT_SUCCESS;
}
It will print an index - 1 number of primes starting at 2.
You can easily change it now--for example: print only the primes up to index instead of index - 1 primes.
In your case, the numbers for all six primes up to 13 are
x = 13, lim_count = 2, k_count = 3
which is distinctly different from the result you want.
Your translation looks very sloppy.
for i:= 2 to n do begin
must translate to:
for (i=2; i<=n; i++)
repeat
....
until prim
must translate to:
do {
...
} while (!prim);
The while prim... loop is inside the repeat...until prim loop.
I leave it to you to apply this to your code and to check that all constructs have been translated properly. It doesn't look too difficult to do that correctly.
Note: it looks like the algorithm uses 1-based arrays whereas C uses 0-based arrays.
This is the function I have written for 2D Convolution in C:
typedef struct PGMImage{
int w;
int h;
int* data;
}GrayImage;
GrayImage Convolution2D(GrayImage image,GrayImage kernel){
int aH,aW,bW,bH,r,c,x,y,xx,yy,X,Y;
int temp = 0;
GrayImage conv;
CreateGrayImage(&conv,image.w,image.h);
aH = image.h;
aW = image.w;
bH = kernel.h;
bW = kernel.w;
if(aW < bW || aH < bH){
fprintf(stderr,"Image cannot have smaller dimensions than the blur kernel");
}
for(r = aH-1;r >= 0;r--){
for(c = aW-1;c >= 0;c--){
temp = 0;
for(y = bH-1;y >= 0;y--){
yy = bH - y -1;
for(x = bW-1;x >= 0;x--){
xx = bW - x - 1;
X = c + (x - (bW/2));
Y = r + (y - (bH/2));
if(X >= 0 && X < aW && Y >= 0 && Y < aH){
temp += ((kernel.data[(yy*bW)+xx])*(image.data[(Y*aW)+X]));
}
}
}
conv.data[(r*aW)+c] = temp;
}
}
return conv;
}
I reproduced this function in Matlab and found that it overestimates the values for certain pixels compared to Matlab's regular 2D convolution function (conv2). I can't figure out where I am going wrong with the logic. Please help.
EDIT:
Here's the stock image I am using (512*512):
https://drive.google.com/file/d/0B3qeTSY-DQRvdWxCZWw5RExiSjQ/view?usp=sharing
Here's the kernel (3*3):
https://drive.google.com/file/d/0B3qeTSY-DQRvdlQzamcyVmtLVW8/view?usp=sharing
On using the above function I get
46465 46456 46564
45891 46137 46158
45781 46149 46030
But Matlab's conv2 gives me
46596 46618 46627
46073 46400 46149
45951 46226 46153
for the same pixels (rows 239:241, columns 316:318).
This is the Matlab code I am using to compare the values:
pgm_img = imread('path\to\lena512.pgm');
kernel = imread('path\to\test_kernel.pgm');
sz_img = size(pgm_img);
sz_ker = size(kernel);
conv = conv2(double(pgm_img),double(kernel),'same');
pgm_img = padarray(pgm_img,floor(0.5*sz_ker),'both');
convolve = zeros(sz_img);
for i=floor(0.5*sz_ker(1))+1:floor(0.5*sz_ker(1))+sz_img(1)
for j=floor(0.5*sz_ker(2))+1:floor(0.5*sz_ker(2))+sz_img(2)
startX = j - floor(sz_ker(2)/2);
startY = i - floor(sz_ker(1)/2);
endX = j + floor(sz_ker(2)/2);
endY = i + floor(sz_ker(1)/2);
block = pgm_img(startY:endY,startX:endX);
prod = double(block).*double(kernel);
convolve(i-floor(0.5*sz_ker(1)),j-floor(0.5*sz_ker(2))) = sum(sum(prod));
end
end
disp(conv(239:241,316:318));
disp(convolve(239:241,316:318));
One obvious difference is that your C code uses ints, while the Matlab code uses doubles. Change your C code to use doubles, and see if the results are still different.
I created an image convolution library for the simple case where the image is a plain 2D float array.
The function supports arbitrary kernels and has been verified against MATLAB's implementation.
So all you need to do on your side is call it with your generated kernel.
You can use its generated DLL inside MATLAB and see that it yields the same results as MATLAB's image convolution functions.
Image Convolution - GitHub.
I am given a set of elements from, say, 10 to 21 (always sequential).
I generate arrays of the same size, where the size is determined at runtime.
Example of 3 generated arrays (the number of arrays is dynamic, as is the number of elements in each array; some elements can be 0, meaning not used):
A1 = [10, 11, 12, 13]
A2 = [14, 15, 16, 17]
A3 = [18, 19, 20, 21]
These generated arrays will be given to different processes to do some computations on the elements. My aim is to balance the load across the processes that each get an array. What I mean is:
With the given example, there are
A1 = 46
A2 = 62
A3 = 78
potential iterations over the elements given to each process.
I want to rearrange the initial arrays to give an equal amount of work to each process, for example:
A1 = [21, 11, 12, 13] = 57
A2 = [14, 15, 16, 17] = 62
A3 = [18, 19, 20, 10] = 67
(Not an equal distribution, but fairer than the initial one.) Distributions can differ, as long as they approach some optimal distribution and are better than the worst (initial) case of the first and last arrays. As I see it, different distributions can be achieved using different indexing [where the split of the arrays is made {it can be uneven}].
This works fine for the given example, but there may be weird cases.
So I see this as a reflection problem (I lack the knowledge of the proper term), where the arrays should be seen with a diagonal through them, like:
10 | 11 12 13
14 15 | 16 17
18 19 20 | 21
And then an obvious substitution can be done.
I tried to implement it like this:
if(rest == 0)
payload_size = (upper-lower)/(processes-1);
else
payload_size = (upper-lower)/(processes-1) + 1;
//printf("payload size: %d\n", payload_size);
long payload[payload_size];
int m = 0;
int k = payload_size/2;
int added = 0; //track what been added so far (to skip over already added elements)
int added2 = 0; // same as 'added'
int p = 0;
for (i = lower; i <= upper; i=i+payload_size){
for(j = i; j<(i+payload_size); j++){
if(j <= upper){
if((j-i) > k){
if(added2 > j){
added = j;
payload[(j-i)] = j;
printf("1 adding data: %d at location: %d\n", payload[(j-i)], (j-i));
}else{
printf("else..\n");
}
}else{
if(added < upper - (m+1)){
payload[(j-i)] = upper - (p*payload_size) - (m++);
added2 = payload[(j-i)];
printf("2 adding data: %d at location: %d\n", payload[(j-i)], (j-i));
}else{
payload[(j-i)] = j;
printf("2.5 adding data: %d at location: %d\n", payload[(j-i)], (j-i));
}
}
}else{ payload[(j-i)] = '\0'; }
}
p++;
k=k/2;
//printf("send to proc: %d\n", ((i)/payload_size)%(processes-1)+1);
}
..but it failed horribly.
You can definitely see the problems in the implementation: it scales poorly, is incomplete, messy, badly written, and so on.
So I need help either with the implementation or with an idea for a better approach to what I want to achieve, given the description.
P.S. I need the solution to be as 'in-liney' as possible (avoiding loop nesting), which is why I am using a bunch of flags and global indexes.
Surely this can be done with extra loops and unnecessary iterations. I invite people who can do, and appreciate, the art of indexing when it comes to arrays.
I am sure there is a solution somewhere out there, but I just cannot come up with an appropriate Google query to find it.
Hint? I thought of using index % size_of_my_data to achieve this task.
P.S. Application: described here
Here is an O(n) solution I wrote using a deque (double-ended queue; a deque is not necessary and a simple array can be used, but a deque keeps the code clean because of pop and popleft). The code is Python, not pseudocode, but it should be pretty easy to understand (because it's Python):
def balancingSumProblem(seqStart = None, seqStop = None, numberOfArrays = None):
    from random import randint
    from collections import deque
    seq = deque(xrange(seqStart or randint(1, 10),
                       seqStop and seqStop + 1 or randint(11,30)))
    arrays = [[] for _ in xrange(numberOfArrays or randint(1,6))]
    print "# of elements: {}".format(len(seq))
    print "# of arrays: {}".format(len(arrays))
    averageNumElements = float(len(seq)) / len(arrays)
    print "average number of elements per array: {}".format(averageNumElements)
    oddIteration = True
    try:
        while seq:
            for array in arrays:
                if len(array) < averageNumElements and oddIteration:
                    array.append(seq.pop()) # pop() is like popright()
                elif len(array) < averageNumElements:
                    array.append(seq.popleft())
            oddIteration = not oddIteration
    except IndexError:
        pass
    print arrays
    print [sum(array) for array in arrays]
balancingSumProblem(10,21,3) # Given Example
print "\n---------\n"
balancingSumProblem() # Randomized Test
Basically, from iteration to iteration, it alternates between grabbing large elements and distributing them evenly in the arrays and grabbing small elements and distributing them evenly in the arrays. It goes from out to in (though you could go from in to out) and tries to use what should be the average number of elements per array to balance it out further.
It's not 100 percent accurate with all tests but it does a good job with most randomized tests. You can try running the code here: http://repl.it/cJg
With a simple sequence to assign, you can just iteratively add the min and max elements to each list in turn. There are some termination details to fix up, but that's the general idea. Applied to your example the output would look like:
john-schultzs-macbook-pro:~ jschultz$ ./a.out
10 21 13 18 = 62
11 20 14 17 = 62
12 19 15 16 = 62
A simple reflection assignment like this will be optimal when num_procs evenly divides num_elems. It will be sub-optimal, but still decent, when it doesn't:
#include <stdio.h>
int compute_dist(int lower, int upper, int num_procs)
{
if (lower > upper || num_procs <= 0)
return -1;
int num_elems = upper - lower + 1;
int num_elems_per_proc_floor = num_elems / num_procs;
int num_elems_per_proc_ceil = num_elems_per_proc_floor + (num_elems % num_procs != 0);
int procs[num_procs][num_elems_per_proc_ceil];
int i, j, sum;
// assign pairs of (lower, upper) to each process until we can't anymore
for (i = 0; i + 2 <= num_elems_per_proc_floor; i += 2)
for (j = 0; j < num_procs; ++j)
{
procs[j][i] = lower++;
procs[j][i+1] = upper--;
}
// handle left overs similarly to the above
// NOTE: actually you could use just this loop alone if you set i = 0 here, but the above loop is more understandable
for (; i < num_elems_per_proc_ceil; ++i)
for (j = 0; j < num_procs; ++j)
if (lower <= upper)
procs[j][i] = ((0 == i % 2) ? lower++ : upper--);
else
procs[j][i] = 0;
// print assignment results
for (j = 0; j < num_procs; ++j)
{
for (i = 0, sum = 0; i < num_elems_per_proc_ceil; ++i)
{
printf("%d ", procs[j][i]);
sum += procs[j][i];
}
printf(" = %d\n", sum);
}
return 0;
}
int main()
{
compute_dist(10, 21, 3);
return 0;
}
I have used this implementation, which I mentioned in this report. (The implementation works for the cases I used for testing, the (1-15K), (1-30K) and (1-100K) datasets; I am not saying that it will be valid for all cases.):
int aFunction(long lower, long upper, int payload_size, int processes)
{
long result, i, j;
MPI_Status status;
long payload[payload_size];
int m = 0;
int k = (payload_size/2)+(payload_size%2)+1;
int lastAdded1 = 0;
int lastAdded2 = 0;
int p = 0;
int substituted = 0;
int allowUpdate = 1;
int s;
int times = 1;
int times2 = 0;
for (i = lower; i <= upper; i=i+payload_size){
for(j = i; j<(i+payload_size); j++){
if(j <= upper){
if(k != 0){
if((j-i) >= k){
payload[(j-i)] = j- (m);
lastAdded2 = payload[(j-i)];
}else{
payload[(j-i)] = upper - (p*payload_size) - (m++) + (p*payload_size);
if(allowUpdate){
lastAdded1 = payload[(j-i)];
allowUpdate = 0;
}
}
}else{
int n;
int from = lastAdded1 > lastAdded2 ? lastAdded2 : lastAdded1;
from = from + 1;
int to = lastAdded1 > lastAdded2 ? lastAdded1 : lastAdded2;
int tempFrom = (to-from)/payload_size + ((to-from)%payload_size>0 ? 1 : 0);
for(s = 0; s < tempFrom; s++){
int restIndex = -1;
for(n = from; n < from+payload_size; n++){
restIndex = restIndex + 1;
payload[restIndex] = '\0';
if(n < to && n >= from){
payload[restIndex] = n;
}else{
payload[restIndex] = '\0';
}
}
from = from + payload_size;
}
return 0;
}
}else{ payload[(j-i)] = '\0'; }
}
p++;
k=(k/2)+(k%2)+1;
allowUpdate = 1;
}
return 0;
}