Halo exchange not working properly in MPI - c

I'm writing some code which does calculations on a large 3D grid, and uses a halo exchange procedure so that it can work using MPI. I'm getting the wrong results from my code, and I'm pretty sure it's because of the halo exchange not working properly.
Basically I have a large 3D array, a chunk of which is held on each process. Each process has an array which is 2 elements bigger in each dimension than the chunk of data it is holding - so that we can halo exchange into each face of the array without affecting the data stored in the rest of the array. I have the following code to do the halo exchange communication:
MPI_Type_vector(g->ny, g->nx, g->nx, MPI_DOUBLE, &face1);
MPI_Type_commit(&face1);
MPI_Type_vector(2*g->ny, 1, g->nx, MPI_DOUBLE, &face2);
MPI_Type_commit(&face2);
MPI_Type_vector(g->nz, g->nx, g->nx * g->ny, MPI_DOUBLE, &face3);
MPI_Type_commit(&face3);
/* Send to WEST receive from EAST */
MPI_Sendrecv(&(g->data)[current][0][0][0], 1, face1, g->west, tag,
&(g->data)[current][0][0][0], 1, face1, g->east, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
/* Send to EAST receive from WEST */
MPI_Sendrecv(&(g->data)[current][g->nz-1][0][0], 1, face1, g->east, tag,
&(g->data)[current][g->nz-1][0][0], 1, face1, g->west, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
/* Send to NORTH receive from SOUTH */
MPI_Sendrecv(&(g->data)[current][0][0][0], 1, face2, g->north, tag,
&(g->data)[current][0][0][0], 1, face2, g->south, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
/* Send to SOUTH receive from NORTH */
MPI_Sendrecv(&(g->data)[current][0][g->ny-1][0], 1, face2, g->south, tag,
&(g->data)[current][0][0][0], 1, face2, g->north, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
/* Send to UP receive from DOWN */
MPI_Sendrecv(&(g->data)[current][0][0][0], 1, face3, g->up, tag,
&(g->data)[current][0][0][0], 1, face3, g->down, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
/* Send to DOWN receive from UP */
MPI_Sendrecv(&(g->data)[current][0][0][g->nx-1], 1, face3, g->down, tag,
&(g->data)[current][0][0][g->nx-1], 1, face3, g->up, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
g->nx, g->ny and g->nz are the sizes of the array chunk that this process is holding, and g->west, g->east, g->north, g->south, g->up and g->down are the ranks of the adjacent processes in each direction, found using the following code:
/* Who are my neighbours in each direction? */
MPI_Cart_shift( cart_comm, 2, 1, &g->north, &g->south);
MPI_Cart_shift( cart_comm, 1, 1, &g->west, &g->east);
MPI_Cart_shift( cart_comm, 0, 1, &g->up, &g->down);
The array on each process is defined as:
array[2][g->nz][g->ny][g->nx]
(It has two copies because I need one to update into each time through my update routine, once I've done the halo exchange).
Can anyone tell me if I'm doing the communication correctly? Particularly the defining of the vector types. Will the vector types I've defined in the code extract each face of a 3D array? And do the MPI_Sendrecv calls look right?
I'm completely lost as to why my code isn't working, but I'm pretty sure it's communications related.

So I'm a big fan of using MPI_Type_create_subarray for pulling out slices of arrays; it's easier to keep straight than vector types. In general, you can't use a single vector type to describe multi-d guardcells (because there are multiple strides, you need to create vectors of vectors), but I think because you're only using 1 guardcell in each direction here that you're ok.
So let's consider the x-face GC; here you're sending an entire y-z plane to your x-neighbour. In memory, this looks like this given your array layout:
  +---------+
  |        #|
  |        #|
  |        #|
  |        #|   z=2
  |        #|
  +---------+
  |        #|
  |        #|
  |        #|   z=1
  |        #|
  |        #|
  +---------+
  |        #|
 ^|        #|
 ||        #|   z=0
 y|        #|
  |        #|
  +---------+
     x->
so you're looking to send count=(ny*nz) blocks of 1 value, each strided by nx. I'm assuming here nx, ny, and nz include guardcells, and that you're sending the corner values. If you're not sending corner values, subarray is the way to go. I'm also assuming, crucially, that g->data is a contiguous block of nx*ny*nz*2 (or 2 contiguous blocks of nx*ny*nz) doubles, otherwise all is lost.
So your type create should look like
MPI_Type_vector((g->ny*g->nz), 1, g->nx, MPI_DOUBLE, &face1);
MPI_Type_commit(&face1);
Note that we are sending a total of count*blocksize = ny*nz values, which is right, and we are striding over count*stride = nx*ny*nz memory in the process, which is also right.
Ok, so the y face looks like this:
  +---------+
  |#########|
  |         |
  |         |
  |         |   z=2
  |         |
  +---------+
  |#########|
  |         |
  |         |   z=1
  |         |
  |         |
  +---------+
  |#########|
 ^|         |
 ||         |   z=0
 y|         |
  |         |
  +---------+
     x->
So you have nz blocks of nx values, each separated by stride nx*ny. So your type create should look like
MPI_Type_vector(g->nz, g->nx, (g->nx)*(g->ny), MPI_DOUBLE, &face2);
MPI_Type_commit(&face2);
And again double-checking, you're sending count*blocksize = nz*nx values, striding count*stride = nx*ny*nz memory. Check.
Finally, sending z-face data involves sending an entire x-y plane:
  +---------+
  |#########|
  |#########|
  |#########|   z=2
  |#########|
  |#########|
  +---------+
  |         |
  |         |
  |         |   z=1
  |         |
  |         |
  +---------+
  |         |
 ^|         |
 ||         |   z=0
 y|         |
  |         |
  +---------+
     x->
MPI_Type_vector(1, (g->nx)*(g->ny), 1, MPI_DOUBLE, &face3);
MPI_Type_commit(&face3);
And again double-checking, you're sending count*blocksize = nx*ny values, striding count*stride = nx*ny memory. Check.
Update:
I didn't take a look at your Sendrecvs, but there might be something there, too. Notice that with a vector data type you have to pass a pointer to the first piece of data you're sending.
First off, if you have array size nx in the x direction, and you have two guardcells (one on either side), your left guardcell is 0, right is nx-1, and your 'real' data extends from 1..nx-2. So to send your westmost data to your west neighbour, and to receive into your eastmost guardcell from your east neighbour, you would want
/* Send to WEST receive from EAST */
MPI_Sendrecv(&(g->data)[current][0][0][1], 1, face1, g->west, westtag,
             &(g->data)[current][0][0][g->nx-1], 1, face1, g->east, westtag,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
/* Send to EAST receive from WEST */
MPI_Sendrecv(&(g->data)[current][0][0][g->nx-2], 1, face1, g->east, easttag,
             &(g->data)[current][0][0][0], 1, face1, g->west, easttag,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
(I like to use different tags for each stage of communication, helps keep things sorted.)
Likewise for the other directions.

Related

Excel: Problems w. INDIRECT, Arrays, and Aggregate Functions (SUM, MAX, etc.)

Objective
I have a Microsoft Excel spreadsheet containing a price list that may change over time (B2:B6 in the example). Separately, I have a budget that may also change over time (D2). I am attempting to construct a formula for E2 that outputs the number of items, taken in list order, that can be purchased with the budget in D2. Thereafter, I'll attempt to construct formulas to output the change left over (F2) and a comma-delimited list of the purchasable items (G2).
Note: It unfortunately isn't possible to add an intermediate calculation column to the list, such as a running total. As such, I'm trying for formulas for single cells (i.e., E2, F2, and G2).
Note: I'm using Excel for Mac 2019.
       A         B       C       D        E        F                  G
  +---------+---------+-----+---------+-------+---------+---------------------------+
1 | Label   | Price   |     | Budget  | Items | Change  | Item(s)                   |
  +---------+---------+-----+---------+-------+---------+---------------------------+
2 | Item #1 | $ 10.00 |     | $ 40.00 |     3 | $  4.50 | Item #1, Item #2, Item #3 |
  +---------+---------+-----+---------+-------+---------+---------------------------+
3 | Item #2 | $ 20.00 |     |         |       |         |                           |
  +---------+---------+-----+---------+-------+---------+---------------------------+
4 | Item #3 | $  5.50 |     |         |       |         |                           |
  +---------+---------+-----+---------+-------+---------+---------------------------+
5 | Item #4 | $ 25.00 |     |         |       |         |                           |
  +---------+---------+-----+---------+-------+---------+---------------------------+
6 | Item #5 | $ 12.50 |     |         |       |         |                           |
  +---------+---------+-----+---------+-------+---------+---------------------------+
For E2, I've attempted:
{=MAX(N(SUM(INDIRECT("$B$2:$B$"&ROW($B$2:$B$6)))<=$D2)*ROW($B$2:$B$6)-MIN(ROW($B$2:$B$6))+1)}
Though, the above values and this formula result in an output of -1.
Note: The formulas for F2 and G2 seem to follow easily from E2; e.g. {=$D2-SUM(IF((ROW($B$2:$B$6)-MIN(ROW($B$2:$B$6))+1)<=$E2,$B$2:$B$6,0))} and {=TEXTJOIN(", ",TRUE,INDIRECT("$A$2:$A$"&(MIN(ROW($B$2:$B$6))+$E2-1)))}, respectively, seem to work well.
Observations
{="$B$2:$B$"&ROW($B$2:$B$6)} evaluates to {"$B$2:$B$2";"$B$2:$B$3";...;"$B$2:$B$6"} (as desired);
{=INDIRECT("$B$2:$B$"&ROW($B$2:$B$6))} should evaluate to the equivalent of {{$B$2:$B$2},{$B$2:$B$3},...,{$B$2:$B$6}}; though, as a 1x5 multi-cell array formula, it evaluates to the equivalent of {#VALUE!,#VALUE!,#VALUE!,#VALUE!,#VALUE!} and, with F9, to {10;#N/A;#N/A;#N/A;12.5};
{=SUM(INDIRECT("$B$2:$B$"&ROW($B$2:$B$6)))<=$D2}, as a 1x5 multi-cell array formula, evaluates to the equivalent of {TRUE;TRUE;TRUE;FALSE;FALSE} (as desired); though, with F9 does to #VALUE!;
{=N(SUM(INDIRECT("$B$2:$B$"&ROW($B$2:$B$6)))<=$D2)}, as a 1x5 multi-cell array formula, evaluates to the equivalent of {1;1;1;0;0} (as desired); though, with F9, it again evaluates to #VALUE!;
{=N(SUM(INDIRECT("$B$2:$B$"&ROW($B$2:$B$6)))<=$D2)*ROW($B$2:$B$6)}, as a 1x5 multi-cell array formula, evaluates to the equivalent of {2,3,4,0,0} (as desired); though, with F9, it evaluates to {#VALUE!,#VALUE!,#VALUE!,#VALUE!,#VALUE!};
{=N(SUM(INDIRECT("$B$2:$B$"&ROW($B$2:$B$6)))<=$D2)*ROW($B$2:$B$6)-MIN(ROW($B$2:$B$6))+1}, as a 1x5 multi-cell array formula, evaluates to the equivalent of {1,2,3,-1,-1} (as desired); though, with F9 does again to {#VALUE!,#VALUE!,#VALUE!,#VALUE!,#VALUE!}; and,
{=MAX(N(SUM(INDIRECT("$B$2:$B$"&ROW($B$2:$B$6)))<=$D2)*ROW($B$2:$B$6)-MIN(ROW($B$2:$B$6))+1)} evaluates to -1
Interestingly:
If {=N(SUM(INDIRECT("$B$2:$B$"&ROW($B$2:$B$6)))<=$D2)*ROW($B$2:$B$6)-MIN(ROW($B$2:$B$6))+1} is placed as the multi-cell array formula in, say, E10:E14, a =MAX($E$10:$E$14) results in 3 (as desired).
Speculation
At present, I'm speculating that, when entered as a single cell array formula, the INDIRECT is not being assessed to be array producing and/or the SUM, as part of a single cell array formula, is not producing an array result.
Please assist. And, thank you in advance.
Solutions (Thanks to Contributors Below)
For E2, {=IF($B$2<=$D2,MATCH(1,0/(MMULT(N(ROW($B$2:$B$6)>=TRANSPOSE(ROW($B$2:$B$6))),$B$2:$B$6)<=$D2)),0)} (thank you Jos Woolley);
For F2, =IF($E2=0,MAX(0,$D2),$D2-SUM($B$2:INDEX($B$2:$B$6,$E2))) (thank you P.b); and,
For G2, =IF($E2=0,"",TEXTJOIN(", ",TRUE,$A$2:INDEX($A$2:$A$6,$E2))) (thank you P.b).
The first point to make, as I mentioned in the comments, is that piecemeal evaluation of a formula (highlighting subsections of it in the formula bar and pressing F9) will not necessarily correspond to the actual evaluation.
Evaluating part of a formula with F9 in the formula bar always forces that part to be evaluated as an array. This is misleading, since the overall construction may not actually evaluate that part as an array.
The second point to make is that SUM cannot iterate over an array of ranges, though SUBTOTAL, for example, can; so replacing SUM( with SUBTOTAL(9, in your current formula should work.
However, you would still be left with a construction which is volatile (because of INDIRECT), so I would recommend this non-volatile alternative:
=MATCH(1,0/(MMULT(N(ROW(B2:B6)>=TRANSPOSE(ROW(B2:B6))),B2:B6)<=D2))
In E2 you can use:
=MATCH(TRUE,--SUBTOTAL(9,OFFSET(B2:B6,,,ROW(B2:B6)))>=D2,0)
In F2 you can use:
=D2-SUM(B2:INDEX(B2:B6,E2))
In G2 you can use:
=TEXTJOIN(", ",1,A2:INDEX(A2:A6,E2))

make x in a cell equal 8 and total

I need an Excel formula that will look at a cell and, if it contains an x, treat it as 8 and add it to the total at the bottom of the table. I have done this in the past, but I am so rusty that I cannot remember how I did it.
Generally, I try and break this sort of problem into steps. In this case, that'd be:
Determine if a cell is 'x' or not, and create new value accordingly.
Add up the new values.
If your values are in column A (for example), in column B, fill in:
=if(A1="x", 8, 0) (or, in R1C1 mode, =if(RC[-1]="x", 8, 0)).
Then just sum those values (e.g. =sum(B1:B3)) for your total.
      A         B
 +---------+---------+
 | VALUES  | TEMP    |
 +---------+---------+
 | 0       | 0       | <------ '=if(A1="x", 8, 0)'
 | x       | 8       |
 | fish    | 0       |
 +---------+---------+
 | TOTAL   | 8       | <------ '=sum(B1:B3)'
 +---------+---------+
If you want to be tidy, you could also hide the column with your intermediate values in.
(I should add that the way your question is worded, it almost sounds like you want to 'push' a value into the total; as far as I've ever known, you can really only 'pull' values into a total.)
Try this one for total sum:
=SUMIF(<range you want to sum>, "<>x", <range you want to sum>) + 8 * COUNTIF(<range you want to sum>, "x")

C - MPI: Parallel Processing of Column Arrays

I have a matrix (c) of 10x10 (M = 10) elements in which I divide the matrix by rows to be executed by 5 different processes (slaves = 5) with each process corresponding to 2 rows of that matrix.
offset = 0;
rows = (M / slaves);
MPI_Send(&c[offset][0], rows*M, MPI_DOUBLE, id_slave,0,MPI_COMM_WORLD);
offset= offset+rows;
Now I want to divide the matrix by columns instead. I tried the following, swapping the array indices, but it does not work:
MPI_Send(&c[0][offset], rows*M, MPI_DOUBLE, id_slave,0,MPI_COMM_WORLD);
Do you know how to do it? Thank you.
You are using the wrong datatype. As noted by Jonathan Dursi, you need to create a strided datatype that tells MPI how to access the memory in such a way that it matches the data layout of a column or a set of consecutive columns.
In your case, instead of
MPI_Send(&c[0][offset], rows*M, MPI_DOUBLE, id_slave, 0, MPI_COMM_WORLD);
you have to do:
MPI_Datatype dt_columns;
MPI_Type_vector(M, rows, M, MPI_DOUBLE, &dt_columns);
MPI_Type_commit(&dt_columns);
MPI_Send(&c[0][offset], 1, dt_columns, id_slave, 0, MPI_COMM_WORLD);
MPI_Type_vector(M, rows, M, MPI_DOUBLE, &dt_columns) creates a new MPI datatype that consists of M blocks of rows elements of MPI_DOUBLE each with the heads of the consecutive blocks M elements apart (stride M). Something like this:
|<------------ stride = M ------------->|
|<---- rows --->|                       |
+---+---+---+---+---+---+---+---+---+---+--
| x | x | x | x |   |   |   |   |   |   | ^
+---+---+---+---+---+---+---+---+---+---+ |
| x | x | x | x |   |   |   |   |   |   | |
+---+---+---+---+---+---+---+---+---+---+
  .   .   .   .   .   .   .   .   .   .   M blocks
+---+---+---+---+---+---+---+---+---+---+
| x | x | x | x |   |   |   |   |   |   | |
+---+---+---+---+---+---+---+---+---+---+ |
| x | x | x | x |   |   |   |   |   |   | v
+---+---+---+---+---+---+---+---+---+---+--
>> ------ C stores such arrays row-wise ------ >>
If you set rows equal to 1, you create a type that corresponds to a single column. This type cannot be used to send multiple columns, though (e.g., two columns), because MPI will look for the second column where the first one ends, which is at the bottom of the matrix. You have to tell MPI to pretend that a column is just one element wide, i.e. resize the datatype. This can be done using MPI_Type_create_resized:
MPI_Datatype dt_temp, dt_column;
MPI_Type_vector(M, 1, M, MPI_DOUBLE, &dt_temp);
MPI_Type_create_resized(dt_temp, 0, sizeof(double), &dt_column);
MPI_Type_commit(&dt_column);
You can use this type to send as many columns as you like:
// Send one column
MPI_Send(&c[0][offset], 1, dt_column, id_slave, 0, MPI_COMM_WORLD);
// Send five columns
MPI_Send(&c[0][offset], 5, dt_column, id_slave, 0, MPI_COMM_WORLD);
You can also use dt_column in MPI_Scatter[v] and/or MPI_Gather[v] to scatter and/or gather entire columns.
The problem with your code is the following:
Your c array is contiguous in memory and, in C, it is stored in row-major order, so dividing it by rows the way you do just adds a constant offset from the beginning.
The way you are trying to divide it by columns, however, just gives you the wrong offset.
Picture it for a 3x3 matrix and 3 slave processes:
a[3][3] = {{a00, a01, a02},
           {a10, a11, a12},
           {a20, a21, a22}}
which in memory actually looks like:
A = {a00, a01, a02, a10, a11, a12, a20, a21, a22}
For example, say we want to send data to the CPU with id = 1. In this case a[1][0] points to the fourth element of A, and a[0][1] points to the second element of A. In both cases you just send rows*M elements starting from that point in A.
In first case it will be:
a10,a11,a12
And in second case:
a01,a02,a10
One way to get what you want is to transpose your matrix and then send it.
It is also more natural to use MPI_Scatter than MPI_Send for this problem,
as explained here: scatter

Understanding MPI group communication

I am trying to understand the concept of MPI group communication through the following example:
int new_rank, rank, size, ranks1[4]={0,1,2,3}, ranks2[4]={4,5,6,7};
MPI_Group Original_Group, New_Group;
MPI_Comm comm1;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_group(MPI_COMM_WORLD, &Original_Group);
if (rank < 4) {
MPI_Group_incl(Original_Group, 4, ranks1, &New_Group);
}
else {
MPI_Group_incl(Original_Group, 4, ranks2, &New_Group);
}
MPI_Comm_create(MPI_COMM_WORLD, New_Group, &comm1);
MPI_Group_rank(New_Group, &new_rank);
printf("rank= %d newrank= %d \n",rank,new_rank);
In fact, I want to divide an n*n matrix among a set of processors, say 9 processors, such that each row of the matrix is divided into 3 blocks that are sent to the 3 processors of one group to perform some calculation. For example, the following figure shows the n*n matrix divided into sqrt(p) x sqrt(p) blocks (p = 9 processors) and assigned to different processors such that each row forms one group, i.e. P1, P2, P3 is group 1, P4, P5, P6 is group 2, and so on.
| P1 | P2 | P3 |
| P4 | P5 | P6 |
| P7 | P8 | P9 |
I tried to modify the previous code by creating 3 groups (one group for each row of processors), but my program crashed. Does that mean I don't need to make 3 groups? If so, how does P1, for example, know its neighbour processors in its group?

Determine whether a 3D object is hidden by another 3D object

I have some GeometryModel3D balls in a Viewport3D. Some of them are visible and some of them are hidden by a blue cube.
(Although the image below is in 2D, let's pretend that all the objects are 3D.)
I want to determine which of the red balls can be seen and which are hidden.
How can I do this ?
This problem is also known as Occlusion Culling, although you're interested in counting the occluded primitives. Given the conditions of your scene, a brute force approach to solve this problem (given that you're using perspective projection) is the following pseudocode:
occludedSpheresCount = 0
spheres = {Set of spheres}
cubes = {Set of cubes}
normalizedCubes = {}
# First, build the set of normalized cubes: take the cubes that are free in
# space and transform their coordinates into the normalized range (x and y
# between -1 and 1, depth z between 0 and 1 for the projection matrix below).
# They are the same cubes, but their coordinates now lie in that range.
# To do that, bring each bounding box into camera (view) space, apply the
# ProjectionMatrix to its 8 corners, divide by w, and take the bounds.
projectionMatrix = GetProjectionMatrix(perspectiveCamera, aspectRatio)
for each cube in cubes do
    Rect3D boundingBox = cube.Bounds()
    Rect3D normalizedBBox = projectionMatrix.transform(boundingBox)
    normalizedCubes.add(normalizedBBox)
end for
# Now take every sphere, normalize its bounding box
# and check whether it is occluded by some normalized cube
for each sphere in spheres do
    Rect3D sphereBBox = sphere.Bounds()
    Rect3D normalizedSphere = projectionMatrix.transform(sphereBBox)
    # Rect3D.X/Y/Z is the minimum corner, so max = min + size
    sx0 = normalizedSphere.X
    sy0 = normalizedSphere.Y
    sz0 = normalizedSphere.Z
    sxf = normalizedSphere.X + normalizedSphere.SizeX
    syf = normalizedSphere.Y + normalizedSphere.SizeY
    for each normalizedCube in normalizedCubes do
        x0 = normalizedCube.X
        y0 = normalizedCube.Y
        z0 = normalizedCube.Z
        xf = normalizedCube.X + normalizedCube.SizeX
        yf = normalizedCube.Y + normalizedCube.SizeY
        # First, check that the normalized sphere is behind the front face
        # of the normalized cube by comparing their z-front values
        # (normalized depth grows away from the camera)
        if sz0 >= z0 then
            # The sphere is behind the front face of the cube; now check
            # whether it is fully contained in the cube's x-y extent,
            # in which case it is occluded
            if sx0 >= x0 and sxf <= xf and sy0 >= y0 and syf <= yf then
                occludedSpheresCount++
                # Here you can even avoid rendering the sphere altogether
                break   # count each sphere once
            end if
        end if
    end for
end for
A way to get the projectionMatrix is using the following code (extracted from here):
private static Matrix3D GetProjectionMatrix(PerspectiveCamera camera, double aspectRatio)
{
// This math is identical to what you find documented for
// D3DXMatrixPerspectiveFovRH with the exception that in
// WPF the camera's horizontal rather than the vertical
// field-of-view is specified.
double hFoV = MathUtils.DegreesToRadians(camera.FieldOfView);
double zn = camera.NearPlaneDistance;
double zf = camera.FarPlaneDistance;
double xScale = 1 / Math.Tan(hFoV / 2);
double yScale = aspectRatio * xScale;
double m33 = (zf == double.PositiveInfinity) ? -1 : (zf / (zn - zf));
double m43 = zn * m33;
return new Matrix3D(
xScale, 0, 0, 0,
0, yScale, 0, 0,
0, 0, m33, -1,
0, 0, m43, 0);
}
The only drawback of this method is in the following case:
+--------------+--------------+
|             -|-             |
|           /  |  \           |
|          |   |   |          |
|           \  |  /           |
|             -|-             |
+--------------+--------------+
or
    intersection here
            |
            v
+----------+--+--------------+
|          | -|-             |
|         /|  | \            |
|        | |  |  |           |
|         \|  | /            |
|          | -|-             |
+----------+--+--------------+
In these cases two adjacent or intersecting cubes together occlude the sphere. To handle that, you have to build a set of sets of normalized cubes (Set{ Set{cube1, cube2}, Set{cube3, cube4}, ... }) whenever two or more cube areas intersect (this can be done in the first loop), and the containment test becomes more complex. I don't know whether intersecting cubes are allowed in your program, though.
This algorithm is a brute-force approach: it tests every sphere against every cube, so it is O(n·m) for n spheres and m cubes (O(n^2) when the counts are similar). I hope it gives you a hint toward a definitive solution; if you're looking for a more efficient and more general solution, use something like Hierarchical Z-Buffering.

Resources