Prolog loop while dividing list over N lists

What I want to achieve when running divide([1,2], 3, X). is something like the output below:
I should get all the permutations of the first list, divided over N lists.
X = [[],[],[1,2]] ;
X = [[],[],[2,1]] ;
X = [[],[2],[1]] ;
X = [[],[1],[2]] ;
X = [[],[1,2],[]] ;
X = [[],[2,1],[]] ;
X = [[],[],[2,1]] ;
X = [[],[],[1,2]] ;
X = [[],[1],[2]] ;
X = [[],[2],[1]] ;
X = [[],[2,1],[]] ;
X = [[],[1,2],[]] ;
X = [[2],[],[1]] ;
X = [[2],[1],[]] ;
X = [[1],[],[2]] ;
X = [[1],[2],[]] ;
X = [[1,2],[],[]] ;
X = [[2,1],[],[]] ;
But for some reason, if my list has more than 2 items, the code below seems to go into a loop and shows far too much output.
% Divides a list over N sets
divide(_,N,[]) :- N < 1.
divide(Items,1,[Items]).
divide(Items,N,[Selected|Other]) :- N > 1,
sublistPerm(Items,Selected,Rest),
N1 is N-1,
divide(Rest,N1,Other).
The sublistPerm/3 predicate works as it should (you can test it if you want).
% Splits a list into a subset and its complement, and permutes both
sublistPerm(Items, Sel, Rest) :- sublist(Items, Temp1, Temp2),
permutation(Temp1, Sel),
permutation(Temp2, Rest).
% Enumerates every subset of a list (YS) together with its complement (ZS)
sublist([], [], []).
sublist([X|XS], YS, [X|ZS]) :- sublist(XS, YS, ZS).
sublist([X|XS], [X|YS], ZS) :- sublist(XS, YS, ZS).
If you take the trouble to run the following query, you will see the redundant output I am getting: divide([1,2,3], 3, X). I have absolutely no idea why it doesn't just terminate, as it should.
As you can see in my example, there are no duplicates. Normally they won't occur, but if they do, they should be removed.
Thanks to anyone who can point me in the right direction.
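To pin down the answer set the question is after, independently of the Prolog code, here is a small Python sketch (divide_all is a hypothetical name, and the cut-point construction is an illustration, not the poster's method; items are assumed to be distinct). Each answer is one permutation of the items cut into N consecutive, possibly empty, pieces, so no answer is produced twice. For [1,2] and N = 3 it yields 12 answers, which matches the number of distinct lines in the example above.

```python
from itertools import combinations_with_replacement, permutations


def divide_all(items, n):
    """Hypothetical helper: yield every way to spread `items` over `n` ordered
    lists, where the order inside each list matters. Assumes distinct items,
    so no answer is produced twice."""
    if n < 1:
        return
    length = len(items)
    for perm in permutations(items):
        # n - 1 non-decreasing cut points in 0..length; equal cuts give empty lists.
        for cuts in combinations_with_replacement(range(length + 1), n - 1):
            bounds = (0, *cuts, length)
            yield [list(perm[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]


answers = list(divide_all([1, 2], 3))
print(len(answers))  # 12
```

With N = 1 the sketch yields [[1, 2]] and [[2, 1]], and for [1,2,3] with N = 3 it yields 60 answers.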

There are several issues with your code, but looping is not one of them. We can set that issue aside very quickly:
?- divide([1,2], 3, X), false.
This terminates. No termination issues with this query.
There are some redundant solutions, but again, this is not really a problem. What is most problematic is that your relation is incomplete. A minimal example is:
?- divide([1,2], 1, [[2,1]]).
which should succeed but fails. So let's attack this issue first. The fact
divide(Items,1,[Items]).
has to be generalized to cover all permutations.
divide(Items,1,[ItemsP]) :-
permutation(Items, ItemsP).
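As for the redundant answers/solutions: the second permutation/2 goal in sublistPerm/3, permutation(Temp2, Rest), is not needed, because with the generalized fact above the order of the remaining items gets permuted later anyway. You can replace it by (=)/2 or rewrite your program accordingly.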
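Both observations can be checked without a Prolog system by transliterating the program into Python generators, which stand in for backtracking here (an analogy, not Prolog's actual execution model; all function names are hypothetical). divide_original follows the question's clauses; divide_fixed applies the changes suggested above: the N = 1 case enumerates permutations, and permutation(Temp2, Rest) becomes plain unification. For [1,2,3] and N = 3 the original produces a finite but redundant set of 198 answers (60 distinct), while the fixed version produces each of the 60 answers exactly once.

```python
from itertools import permutations


def sublist(items):
    """Mirror of sublist/3: yield (selected, rest) for every subset of items."""
    if not items:
        yield [], []
        return
    head, tail = items[0], items[1:]
    for sel, rest in sublist(tail):      # clause 2: head goes to the rest
        yield sel, [head] + rest
    for sel, rest in sublist(tail):      # clause 3: head goes to the selection
        yield [head] + sel, rest


def divide_original(items, n):
    """Mirror of the question's divide/3."""
    if n < 1:
        yield []                         # divide(_, N, []) :- N < 1.
    elif n == 1:
        yield [list(items)]              # divide(Items, 1, [Items]).
    else:
        for temp1, temp2 in sublist(items):
            for sel in permutations(temp1):
                for rest in permutations(temp2):          # permutation(Temp2, Rest)
                    for other in divide_original(list(rest), n - 1):
                        yield [list(sel)] + other


def divide_fixed(items, n):
    """Same relation with the answer's two changes applied."""
    if n < 1:
        yield []
    elif n == 1:
        for perm in permutations(items):                  # permutation(Items, ItemsP)
            yield [list(perm)]
    else:
        for temp1, temp2 in sublist(items):
            for sel in permutations(temp1):
                for other in divide_fixed(temp2, n - 1):  # Rest = Temp2
                    yield [list(sel)] + other


def distinct(answers):
    return {tuple(map(tuple, a)) for a in answers}


orig = list(divide_original([1, 2, 3], 3))
fixed = list(divide_fixed([1, 2, 3], 3))
print(len(orig), len(distinct(orig)))    # 198 60: finite, but redundant
print(len(fixed), len(distinct(fixed)))  # 60 60
```

For [1,2] the same comparison gives 18 answers (12 distinct) versus 12, and divide_original([1, 2], 1) never yields [[2, 1]], which is the incompleteness pointed out above.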

MATLAB Broadcasting for unifrnd

I am coming back from NumPy to MATLAB and don't quite have the hang of the broadcasting here.
Can someone explain to me why the first call fails and the second (more explicit) one works?
As I understand it, x0 and x1 are both 1x2 arrays, so it should be possible to expand them to 5x2.
n_a = 5;
n_b = 2;
x0 = [1, 2];
x1 = [11, 22];
% c = unifrnd(x0, x1, [n_a, n_b])
% Error using unifrnd
% Size information is inconsistent.
% c = unifrnd(x0, x1, [n_a, 1]) % also fails
c = unifrnd(ones(n_a, n_b) .* x0, ones(n_a, n_b) .* x1, [n_a, n_b])
% works
There is a size check inside the unifrnd function (type open unifrnd at the command line to see its code). It throws the error if the size given as the third input is not consistent with the sizes of the first two inputs:
[err, sizeOut] = internal.stats.statsizechk(2,a,b,varargin{:});
if err > 0
error(message('stats:unifrnd:InputSizeMismatch'));
end
If you skip this check, though (for example, by writing a custom function without it), both of your failing calls will actually work, thanks to implicit expansion. The real question is whether calling the function this way makes sense.
TL;DR: It is not the broadcasting that fails; it is the function that does not accept this combination of inputs.
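The size rule behind implicit expansion can be sketched in Python (expanded_size is a hypothetical helper that models only the rule itself: dimension by dimension, sizes must match or one of them must be 1; like MATLAB, it aligns dimensions from the first one and pads missing trailing dimensions with 1, whereas NumPy aligns from the last). Under this rule both rejected calls expand without trouble, which is consistent with the point that the error comes from unifrnd's own size check.

```python
from itertools import zip_longest


def expanded_size(*sizes):
    """Hypothetical helper: size produced by implicit expansion, or ValueError.
    Missing trailing dimensions count as 1, as in MATLAB."""
    result = []
    for dims in zip_longest(*sizes, fillvalue=1):
        non_singleton = {d for d in dims if d != 1}
        if len(non_singleton) > 1:
            raise ValueError(f"incompatible sizes: {sizes}")
        result.append(non_singleton.pop() if non_singleton else 1)
    return tuple(result)


size_x0 = size_x1 = (1, 2)                       # x0 and x1 are 1x2
print(expanded_size(size_x0, size_x1, (5, 2)))   # (5, 2): unifrnd(x0, x1, [n_a, n_b])
print(expanded_size(size_x0, size_x1, (5, 1)))   # (5, 2), not the requested 5x1
```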
unifrnd essentially calls rand and applies scaling and shifting to the desired interval. So you can use rand and do the scaling and shifting manually, which allows you to employ broadcasting (singleton expansion):
c = x0 + (x1-x0).*rand(n_a, n_b);
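For readers coming from NumPy, here is the same scale-and-shift idea spelled out element by element in plain Python (uniform_matrix is a hypothetical name; NumPy is deliberately avoided to keep the sketch dependency-free). Every row reuses the 1x2 bounds, which is what singleton expansion does for x0 and x1 in the MATLAB one-liner. In NumPy, with x0 and x1 as arrays, the equivalent would be x0 + (x1 - x0) * rng.random((n_a, n_b)) with rng = numpy.random.default_rng().

```python
import random


def uniform_matrix(lo, hi, n_rows, rng=random):
    """Hypothetical helper: n_rows x len(lo) matrix, entry [i][j] drawn uniformly
    between lo[j] and hi[j]. The same 1 x n bounds are reused for every row,
    which is what singleton expansion does."""
    return [[l + (h - l) * rng.random() for l, h in zip(lo, hi)]
            for _ in range(n_rows)]


n_a = 5                      # rows; the column count (n_b = 2) comes from len(x0)
x0, x1 = [1, 2], [11, 22]
c = uniform_matrix(x0, x1, n_a, rng=random.Random(0))
```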

How to fast solve multiple equations in R?

I'm trying to quickly solve many linear systems (of the form x %*% res = y) stored in a large array in R.
I have the data x and y and want to compute res.
What is the best (i.e., fastest) way to do this? Thanks a lot!
Here is an example with some approaches (it seems like solve is the fastest?):
# setup:
p = 20 # dimension of matrix to solve
nmkt= 3000 # length of array, i.e., number of equations to solve
res = matrix(0,p,nmkt) # result matrix
x = array(rnorm(p*p*nmkt),c(p,p,nmkt)) # data
# make each x[, , i] symmetric positive definite (hence invertible)
for(i in 1:nmkt){ x[, , i]= crossprod(x[, , i])+diag(p)*0.01}
y = matrix(rnorm(p*nmkt),nmkt,p) # data
# computation and test:
R=100 # number of replications (much larger in my actual application: R=1e5 or 1e7)
system.time(for(r in 1:R){ for(i in 1:nmkt){res[,i] = qr.solve(x[, , i], y[i,], tol = 1e-7)}})
system.time(for(r in 1:R){ for(i in 1:nmkt){res[,i] = solve(x[, , i], y[i,], tol = 1e-7)}})
system.time(for(r in 1:R){ for(i in 1:nmkt){res[,i] = crossprod( chol2inv(chol( x[, , i] )) , y[i,] )}})
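The third approach forms the full inverse with chol2inv before multiplying. A common alternative (an assumption here, not something benchmarked in the question) is to reuse the Cholesky factor and do one forward and one backward triangular substitution per system; in R that would be backsolve(R, forwardsolve(t(R), b)) with R <- chol(A). The plain-Python sketch below only shows the arithmetic of that route for symmetric positive definite systems (cholesky and chol_solve are hypothetical helper names).

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution (no inverse formed)."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):                       # L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):             # L^T x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


A = [[4.0, 2.0], [2.0, 3.0]]
b = [2.0, 1.0]
x = chol_solve(cholesky(A), b)   # [0.5, 0.0]
```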
Is looping through the array a good solution?
Or should I use a sparse matrix?
require(Matrix)
j = c(matrix(1:(p*nmkt),p,p*nmkt,byrow=TRUE))
i = c(aperm( array(j,c(p,p,nmkt)), c(2,1,3)))
system.time(for(r in 1:R){ res= solve(sparseMatrix(i=i, j=j, x = c(x)), c(t(y)), tol = 1e-7)} )
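The two index vectors in the sparse attempt place block k of x at rows and columns p*k through p*k + p - 1, i.e. they build one block-diagonal matrix so that a single solve handles all nmkt systems. A Python transliteration of the index arithmetic (0-based, following the column-major order of c(x); block_diag_indices is a hypothetical name) makes the layout easier to see:

```python
def block_diag_indices(p, nmkt):
    """0-based (row, col) for each entry of c(x), where x is p x p x nmkt and
    stored column-major; equals the R code's i and j vectors minus 1."""
    rows, cols = [], []
    for t in range(p * p * nmkt):
        k = t // (p * p)          # which system (third array index)
        r = t % p                 # row inside the block
        c = (t // p) % p          # column inside the block
        rows.append(r + p * k)
        cols.append(c + p * k)
    return rows, cols


rows, cols = block_diag_indices(2, 2)
print(rows)  # [0, 1, 0, 1, 2, 3, 2, 3]
print(cols)  # [0, 0, 1, 1, 2, 2, 3, 3]
```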

My OpenCL code changes its output based on a seemingly no-op statement

I'm running the same OpenCL kernel code on an Intel CPU and on an NVIDIA GPU; the results are wrong on the former but right on the latter. The strange thing is that if I make some seemingly irrelevant changes, the output is as expected in both cases.
The goal of the function is to calculate the matrix multiplication between A (triangular) and B (regular), where the position of A in the operation is determined by the value of the variable left. The bug only appears when left is true and the for loop iterates at least twice.
Here is a fragment of the code; for clarity, I have omitted some bits that shouldn't affect the result.
__kernel void blas_strmm(int left, int upper, int nota, int unit, int row, int dim, int m, int n,
float alpha, __global const float *a, __global const float *b, __global float *c) {
/* [...] */
int ty = get_local_id(1);
int y = ty + BLOCK_SIZE * get_group_id(1);
int by = y;
__local float Bs[BLOCK_SIZE][BLOCK_SIZE];
/* [...] */
for(int i=start; i<end; i+=BLOCK_SIZE) {
if(left) {
ay = i+ty;
bx = i+tx;
}
else {
ax = i+tx;
by = i+ty;
}
barrier(CLK_LOCAL_MEM_FENCE);
/* [...] (Load As) */
if(bx >= m || by >= n)
Bs[tx][ty] = 0;
else
Bs[tx][ty] = b[bx*n+by];
barrier(CLK_LOCAL_MEM_FENCE);
/* [...] (Calculate Csub) */
}
if(y < n && x < (left ? row : m)) // In bounds
c[x*n+y] = alpha*Csub;
}
Now it gets weird.
As you can see, by always equals y if left is true. I checked (with some printfs, mind you) that left is always true, and the code in the else branch inside the loop is never executed. Nevertheless, if I remove or comment out the by = i+ty line there, the code works. Why? I don't know yet, but I thought it might be related to by not having the expected value.
My train of thought led me to check whether there was ever a discrepancy between by and y, as they should always have the same value; I added a line that checked if by != y, but that comparison always returned false, as expected. So I went on and replaced that occurrence of by with y, so the line
if(bx >= m || by >= n)
transformed into
if(bx >= m || y >= n)
and it worked again, even though I'm still using the variable by properly three lines below.
Keeping an open mind, I tried some other things and got to the point where the code works if I add the following line inside the loop, as long as it is placed anywhere after the initial if/else and before the if condition I just mentioned.
if(y >= n) left = 1;
The body (left = 1) can be replaced by anything (a printf, another useless assignment, etc.), but the condition is more restrictive. Here are some examples that make the code output the correct values:
if(y >= n) left = 1;
if(y < n) left = 1;
if(y+1 < n+1) left = 1;
if(n > y) left = 1;
And some that don't work (note that m = n in the particular example I'm testing):
if(y >= n+1) left = 1;
if(y > n) left = 1;
if(y >= m) left = 1;
/* etc. */
That's where I am now. I have added a line that shouldn't affect the program at all, yet it makes it work. This magic fix is not satisfactory to me, and I would like to know what's happening inside my CPU and why.
Just to be sure I'm not forgetting anything, here is the full function code and a gist with example inputs and outputs.
Thank you very much.
Solution
Both users DarkZeros and sharpneli were right in their assumptions: the barriers inside the for loop weren't being hit the right number of times. In particular, there was a bug involving the very first element of each local group that made it run one iteration fewer than the rest, causing undefined behaviour. It was painfully obvious in hindsight.
Thank you all for your answers and time.
Have you checked that get_local_size always returns the correct value?
You said "In short, the full length of the matrix is divided in local blocks of BLOCK_SIZE and run in parallel; ". Remember that OpenCL only lets you rely on concurrency within a work-group. So if you call enqueueNDRange with a global size of [32,32] and a local size of [16,16], it is possible that the first work-group runs from start to finish, then the second one, then the third, etc. You cannot synchronize between work-groups.
What are your enqueueNDRange call(s)? An example of the calls required to get your example output would be much appreciated (I'm mostly interested in the global and local size arguments).
(I'd ask this in a comment but I am a new user).
Edit (I thought I had an answer, but on verification I didn't; I still need more info):
http://multicore.doc.ic.ac.uk/tools/GPUVerify/
Using that tool, I got a complaint that a barrier could be reached under nonuniform control flow.
It all depends on what values dim, nota and upper get. Could you provide some examples?
I did some testing, assuming left = 1, nota != upper, dim = 32, and row = 16 or 32 or whatever; it still worked, and I got the following result:
...
gid0: 2 gid1: 0 lid0: 14 lid1: 13 start: 0 end: 32
gid0: 2 gid1: 0 lid0: 14 lid1: 14 start: 0 end: 32
gid0: 2 gid1: 0 lid0: 14 lid1: 15 start: 0 end: 32
gid0: 2 gid1: 0 lid0: 15 lid1: 0 start: 0 end: 48
gid0: 2 gid1: 0 lid0: 15 lid1: 1 start: 0 end: 48
gid0: 2 gid1: 0 lid0: 15 lid1: 2 start: 0 end: 48
...
So if my assumptions about the variable values are even close to correct, you have a barrier divergence issue there: some threads encounter a barrier that other threads never will. I'm surprised it did not deadlock.
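The log lines above can be checked mechanically. If the loop is for(i = start; i < end; i += BLOCK_SIZE), each work-item runs ceil((end - start) / BLOCK_SIZE) iterations, and because the loop body contains barriers, all work-items of a work-group must agree on that number. A small Python check over the printed values (assuming BLOCK_SIZE = 16, which the local ids 0..15 suggest but the post does not state; trip_count is a hypothetical helper):

```python
def trip_count(start, end, block_size):
    """Iterations of for(i = start; i < end; i += block_size)."""
    return max(0, -(-(end - start) // block_size))   # ceiling division


BLOCK_SIZE = 16  # assumption, inferred from local ids 0..15
# (lid0, lid1, start, end) copied from the output above, group gid0 = 2, gid1 = 0
log = [(14, 13, 0, 32), (14, 14, 0, 32), (14, 15, 0, 32),
       (15, 0, 0, 48), (15, 1, 0, 48), (15, 2, 0, 48)]
counts = {trip_count(s, e, BLOCK_SIZE) for _, _, s, e in log}
print(sorted(counts))  # [2, 3]: the barriers are not reached uniformly
```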
The first thing I see that can fail terribly is that you are using barriers inside a for loop.
If not all threads execute the for loop the same number of times, the results are completely undefined. And you clearly state that the problem only occurs when the for loop runs more than once.
Do you ensure this condition?
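The failure mode can be illustrated with Python threads standing in for the work-items of one work-group. This is only an analogy: threading.Barrier with a timeout turns the divergence into a clean BrokenBarrierError, whereas in OpenCL the behaviour is undefined and, as in the question, may silently produce wrong results instead of hanging. run_work_group is a hypothetical name, and the two waits per pass imitate the two barrier(CLK_LOCAL_MEM_FENCE) calls in the kernel's loop.

```python
import threading


def run_work_group(iterations, timeout=1.0):
    """One thread per "work-item"; item k runs iterations[k] loop passes with
    two barrier waits each. Returns True if every barrier completed, False if
    one was abandoned because some item left the loop early."""
    barrier = threading.Barrier(len(iterations))
    broken = []

    def work_item(n_iters):
        try:
            for _ in range(n_iters):
                barrier.wait(timeout)   # after loading the shared tile
                barrier.wait(timeout)   # after using it
        except threading.BrokenBarrierError:
            broken.append(n_iters)

    threads = [threading.Thread(target=work_item, args=(n,)) for n in iterations]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return not broken


print(run_work_group([2, 2, 2, 2]))               # True
print(run_work_group([2, 2, 2, 1], timeout=0.5))  # False: one item ran one pass fewer
```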

Enumerating a sequence and if-then-else

Suppose I want to emulate a simple loop like this one:
start = something;
incr = something_else;
end = yet_something_else; /* all three are numerical values, int or float */
while (start <= end) {
/* do something for its side effect, for example: */
printf("%d %d\n", start, start*start);
start += incr;
}
I could write either:
loop1(Start, End, _Incr) :-
Start > End, !. % yes, the cut is necessary!
loop1(Start, End, Incr) :-
Start =< End,
/* do something for its side effect, for example */
format('~d ~d~n', [Start, Start*Start]),
Next is Start + Incr,
loop1(Next, End, Incr).
or:
loop2(Start, End, Incr) :-
( Start =< End
-> format('~d ~d~n', [Start, Start*Start]),
Next is Start + Incr,
loop2(Next, End, Incr)
; true
).
loop1/3 and loop2/3 must be (and always will be) called with all arguments instantiated to numbers.
I should be using the second version, right? The only reason I have doubts is that the if-then-else construct is almost absent from introductory Prolog material, and I can't figure out why (Learn Prolog Now!, for example, otherwise good introductory material, doesn't even mention it!). At the same time, cuts are flying haphazardly every which way.
Thanks for the help!
My preferred way, which resembles structured programming, is between/3 combined with forall/2.
?- forall(between(1,3,N), writeln(N)).
Here is an 'applicative' example from the ICLP2013 contest:
icecream(N) :-
loop(N, top(N)),
left, loop(N+1, center), nl,
loop(N+1, bottom(N)).
:- meta_predicate loop(+, 1).
loop(XH, PR) :-
H is XH,
forall(between(1, H, I), call(PR, I)).
top(N, I) :-
left, spc(N-I+1), pop,
( I > 1
-> pop,
spc(2*(I-2)),
pcl
; true
),
pcl, nl.
bottom(N, I) :-
left, spc(I-1), put(\), spc(2*(N-I+1)), put(/), nl.
center(_) :- put(/), put(\).
left :- spc(4).
pop :- put(0'().
pcl :- put(0')).
spc(Ex) :- V is Ex, forall(between(1, V, _), put(0' )).
yields
2 ?- [icecream].
% icecream compiled 0.00 sec, 10 clauses
true.
3 ?- icecream(5).
         ()
        (())
       ((  ))
      ((    ))
     ((      ))
    /\/\/\/\/\/\
    \          /
     \        /
      \      /
       \    /
        \  /
         \/
true.
I don't know why they don't mention it. All practical programmers use it.
But we can avoid cut and if-then-else altogether if we rewrite your code as a failure-driven loop.
loop(From, To, Incr, Val) :-
From =< To,
( Val = From
; Next is From + Incr,
loop(Next, To, Incr, Val)
).
print_squares(Start, End, Incr) :-
loop(Start, End, Incr, Val),
Square is Val * Val,
format('~d ~d~n', [Val, Square]),
fail
;
true.
When Incr = 1, you can use between/3 from the standard library:
print_squares(Start, End) :-
between(Start, End, Val),
Square is Val * Val,
format('~d ~d~n', [Val, Square]),
fail
;
true.
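For readers more at home in imperative languages, loop/4 plays the role of a generator: each backtrack into it produces the next Val, and the fail in print_squares/3 is what requests that next value. A rough Python analogy follows (not a translation of Prolog's execution model; the names simply mirror the Prolog predicates).

```python
def loop(start, end, incr):
    """Like loop/4: produce start, start + incr, ... while the value is <= end."""
    val = start
    while val <= end:
        yield val
        val += incr


def print_squares(start, end, incr):
    # The for loop pulls values the way `fail` forces backtracking into loop/4.
    for val in loop(start, end, incr):
        print(f"{val} {val * val}")


print_squares(1, 10, 3)
```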
If you know Russian (or can translate it), I can recommend my book http://sourceforge.net/projects/uranium-test/files/prolog/speed_prolog.pdf/download as introductory material for Prolog.
Probably a better way to enumerate a sequence of (float) numbers:
sequence(First, Step, Last, R) :-
D is Last - First,
sign(Step) =:= sign(D),
N is floor(D / Step),
between(0, N, X),
R is First + X * Step.
One of the virtues of this solution is that it does not accumulate floating-point error, as repeatedly computing Next is This + Step does.
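The difference can be seen with a short Python transliteration (sequence mirrors sequence/4 and accumulate mirrors the Next is This + Step style of loop1/3; both names are illustrative). Stepping from 0.0 to 1.0 by 0.1, the index-based version ends exactly on 1.0, while repeated addition ends on 0.9999999999999999. Note that, like the Prolog version, sequence yields nothing when First = Last, because sign(0) differs from the sign of a nonzero step.

```python
import math


def sign(v):
    return (v > 0) - (v < 0)


def sequence(first, step, last):
    """Like sequence/4: yield first + x*step for x = 0..floor((last - first) / step)."""
    d = last - first
    if sign(step) != sign(d):
        return
    for x in range(math.floor(d / step) + 1):
        yield first + x * step


def accumulate(first, step, last):
    """The `Next is This + Step` style, for comparison (positive step only)."""
    val = first
    while val <= last:
        yield val
        val += step


print(list(sequence(0.0, 0.1, 1.0))[-1])    # 1.0
print(list(accumulate(0.0, 0.1, 1.0))[-1])  # 0.9999999999999999
```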

Understanding Matlab code

I've got some code that I've been trying to make some minor tweaks to. It used fgets to read a single character from each line and used that character to colour points in a 3D plot. So it would read
a
p
p
n
c
and then use other data files to assign x, y, z coordinates to these points. The result is a really pretty 3D plot.
I've edited the input file so it reads
0
1
1
0
2
2
0
and I want points with the same number to get the same colour.
This is where I've gotten so far with the code:
function PlotCluster(mcStep)
clear all
filename = input('Please enter filename: ', 's');
disp('Loading hopping site coordinates ...')
load x.dat
load y.dat
load z.dat
temp = z;
z = x;
x = temp;
n_sites = length(x);
disp('Loading hopping site types ...')
fp = fopen([filename]);
data = load(filename); %# Load the data
% Plot the devices
% ----------------
disp('Plotting the sample surface ...')
figure
disp('Hello world!')
ia = data == 0;
in = data == 1;
ip = data == 2;
disp('Hello Again')
plot3(x(ia),y(ia),z(ia),'b.') %,'MarkerSize',4)
hold on
plot3(x(ic),y(ic),z(ic),'b.') %,'MarkerSize',4)
plot3(x(in),y(in),z(in),'g.') %,'MarkerSize',4)
plot3(x(ip),y(ip),z(ip),'r.') %,'MarkerSize',4)
daspect([1 1 1])
set(gca,'Projection','Perspective')
set(gca,'FontSize',16)
axis tight
xlabel('z (nm)','FontSize',18)
ylabel('y (nm)','FontSize',18)
zlabel('x (nm)','FontSize',18)
%title(['Metropolis Monte Carlo step ' num2str(mcStep)])
view([126.5 23])
My issue is that I'm getting this error:
Index exceeds matrix dimensions.
Error in PlotCluster (line 34)
plot3(x(ia),y(ia),z(ia),'b.') %,'MarkerSize',4)
I don't see why ia would go out of bounds of the x array. Is it to do with replacing fgets with a load statement? That was the only way to get it to read the correct numbers (rather than 49s and 50s, which was very odd).
The parts I'm stuck on are these lines (where the numbers used to correspond to 'a', 'n', 'p', etc.):
ia = data == 0;
in = data == 1;
ip = data == 2;
They look like implicit if statements that assign from data to ia etc., where ia becomes an array, but I'm not sure.
Any help understanding this would be greatly appreciated.
I've fixed the issue: I hadn't updated my input correctly. To clear this up for anyone who comes across this question: ia = data == 0 means 'make an array the same size as data, and fill it with 1 or 0 depending on whether the logic (data == 0) is true or false'.
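Both points from this thread can be illustrated in a short Python sketch (plain lists stand in for MATLAB arrays; mask_select and the coordinate values are hypothetical). Reading the digits as characters yields their character codes, which is where the 49s and 50s came from ('1' is 49 and '2' is 50). The expression data == 0 is an element-wise comparison that produces a logical mask the same size as data, and that mask then selects elements of x. If the mask selects a position past the end of x, the selection cannot succeed, which is consistent with the 'Index exceeds matrix dimensions' error when the input file and the coordinate files disagree.

```python
def mask_select(values, mask):
    """Return values[k] for every k where mask[k] is True (like x(ia))."""
    selected = []
    for k, keep in enumerate(mask):
        if keep:
            if k >= len(values):
                # a selected position past the end of values cannot be read
                raise IndexError(f"mask selects position {k + 1}, "
                                 f"but only {len(values)} values exist")
            selected.append(values[k])
    return selected


print([ord(ch) for ch in "012"])   # [48, 49, 50]: character codes, not numbers

data = [0, 1, 1, 0, 2, 2, 0]                     # the new input file
ia = [d == 0 for d in data]                      # logical array, same size as data
x = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]   # hypothetical coordinates
print(mask_select(x, ia))                        # [10.0, 13.0, 16.0]
```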
