What causes the stack overflow, and how can I resolve it? - c

I was doing homework for computer graphics.
We need to use flood fill to paint an area, but no matter how I change the reserve stack size in Visual Studio, it always throws a stack overflow.
void Polygon_FloodFill(HDC hdc, int x0, int y0, int fillColor, int borderColor) {
    int interiorColor;
    interiorColor = GetPixel(hdc, x0, y0);
    if ((interiorColor != borderColor) && (interiorColor != fillColor)) {
        SetPixel(hdc, x0, y0, fillColor);
        Polygon_FloodFill(hdc, x0 + 1, y0, fillColor, borderColor);
        Polygon_FloodFill(hdc, x0, y0 + 1, fillColor, borderColor);
        Polygon_FloodFill(hdc, x0 - 1, y0, fillColor, borderColor);
        Polygon_FloodFill(hdc, x0, y0 - 1, fillColor, borderColor);
    }
}

You may have too large an area to fill, so the recursive calls exhaust your program's execution stack.
Your options:
- grow the execution stack even further, if you can
- reduce the area (how about just 100x100 or 20x20?)
- stop using the execution stack and use a data structure that works similarly but can hold more elements (by being more compact and/or able to grow)
- use a different algorithm (e.g. consider going from individual pixels to horizontal spans of pixels; there will be far fewer of the latter than the former)
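A minimal sketch of the third option: the same 4-way fill, but pending pixels go on a heap-allocated stack instead of the call stack, so capacity is limited by memory rather than by the stack reserve. The grid, sizes and colour values here are illustrative, not from the original code:

```c
#include <assert.h>
#include <stdlib.h>

#define W 8
#define H 8

typedef struct { int x, y; } Point;

static void flood_fill(int grid[H][W], int x0, int y0, int fill, int border)
{
    size_t cap = 64, top = 0;
    Point *stack = malloc(cap * sizeof *stack);
    if (!stack) return;
    stack[top++] = (Point){x0, y0};

    while (top > 0) {
        Point p = stack[--top];
        if (p.x < 0 || p.x >= W || p.y < 0 || p.y >= H)
            continue;                        /* explicit bounds check */
        int c = grid[p.y][p.x];
        if (c == border || c == fill)
            continue;
        grid[p.y][p.x] = fill;
        if (top + 4 > cap) {                 /* grow instead of overflowing */
            cap *= 2;
            stack = realloc(stack, cap * sizeof *stack);
        }
        stack[top++] = (Point){p.x + 1, p.y};
        stack[top++] = (Point){p.x - 1, p.y};
        stack[top++] = (Point){p.x, p.y + 1};
        stack[top++] = (Point){p.x, p.y - 1};
    }
    free(stack);
}
```

The same pattern scales to any bitmap size; only the heap limit matters, and the explicit bounds check removes the reliance on GetPixel returning a no-match value out of range.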

What causes the stackoverflow?
What is the range of x0? +/- 2,000,000,000? That is your potential stack depth.
The code does not obviously prevent going out of range unless GetPixel(out-of-range) returns a no-match value.
And how can I resolve it?
The code needs to be more selective about recursive calls.
When a row of pixels can be set, do so without recursion.
Then examine that row's neighbors and recurse only once per contiguous run of neighbors that need setting.
A promising approach handles the middle first and then looks at the 4 cardinal directions.
// Pseudo code
Polygon_FloodFill(x, y, c)
    if (pixel(x,y) needs filling) {
        set pixel(x,y,c);
        for each of the 4 directions {
            // example: east
            // fill the east run first
            i = 1;
            while (pixel(x+i, y) needs filling) {
                set pixel(x+i, y, c);
                i++;
            }
            // now examine the line above the "east" run
            recursed = false;
            for (j = 1; j < i; j++) {
                if (pixel(x+j, y+1) needs filling) {
                    if (!recursed) {
                        recursed = true;
                        Polygon_FloodFill(x+j, y+1, c);
                    }
                    // else: no need to recurse, the previous call covers this run
                } else {
                    recursed = false;
                }
            }
            // same for the line below the "east" run (y-1)
            // do the same for south, west, north
        }
    }

How many pixels do you need to fill? Each pixel adds one level of recursion, and each call stores its local variables, its operands, and the return address. So for each pixel you store roughly this:
void Polygon_FloodFill(HDC hdc, int x0, int y0, int fillColor, int borderColor) {
int interiorColor;
In a 32-bit environment I estimate this in [bytes]:
4 Polygon_FloodFill return address
4 HDC hdc ?
4 int x0
4 int y0
4 int fillColor
4 int borderColor
4 int interiorColor
-------------------
~ 7*4 = 28 Bytes
There might be even more, depending on the compiler and calling convention.
Now if your filled area is, for example, 256x256 pixels, then you need:
7*4*256*256 = 1.75 MByte
of stack memory. How much you have depends on the settings you compile/link with, so go to the project options and look for the stack/heap size limits...
How to deal with this?
Lower the per-call stack usage
Simply do not pass operands to your flood fill; move them to global variables instead:
HDC floodfill_hdc;
int floodfill_x0, floodfill_y0, floodfill_fillColor, floodfill_borderColor;
void _Polygon_FloodFill()
{
    // here your original filling code
    int interiorColor;
    ...
}
void Polygon_FloodFill(HDC hdc, int x0, int y0, int fillColor, int borderColor) // this is what you call when you want to fill something
{
    floodfill_hdc = hdc;
    floodfill_x0 = x0;
    floodfill_y0 = y0;
    floodfill_fillColor = fillColor;
    floodfill_borderColor = borderColor;
    _Polygon_FloodFill();
}
This will let you fill a several-times-larger area, since each recursive call now pushes only the return address (plus whatever the compiler saves) instead of five arguments as well.
Limit the recursion depth
This is sometimes implemented with a deferred work queue... You add one global counter that tracks the current recursion depth, and when it hits a limit you do not recurse. Instead, you add the pixel position to a list that is processed after the current recursion unwinds.
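A sketch of this depth-limiting idea (all names and sizes are illustrative; the deferred pixels go into a plain array rather than any particular queue structure):

```c
#include <assert.h>

#define W 16
#define H 16
#define MAX_DEPTH 8

static int grid[H][W];
static int todo[4 * W * H][2];   /* deferred pixel positions */
static int todo_count;
static int depth;                /* global recursion-depth counter */

static void fill(int x, int y, int fillc, int borderc)
{
    if (x < 0 || x >= W || y < 0 || y >= H) return;
    if (grid[y][x] == borderc || grid[y][x] == fillc) return;
    if (depth >= MAX_DEPTH) {                /* defer instead of recursing */
        todo[todo_count][0] = x;
        todo[todo_count][1] = y;
        todo_count++;
        return;
    }
    grid[y][x] = fillc;
    depth++;
    fill(x + 1, y, fillc, borderc);
    fill(x - 1, y, fillc, borderc);
    fill(x, y + 1, fillc, borderc);
    fill(x, y - 1, fillc, borderc);
    depth--;
}

static void fill_bounded(int x, int y, int fillc, int borderc)
{
    fill(x, y, fillc, borderc);
    while (todo_count > 0) {                 /* restart from deferred pixels */
        todo_count--;
        int tx = todo[todo_count][0];
        int ty = todo[todo_count][1];
        fill(tx, ty, fillc, borderc);        /* runs again at depth 0 */
    }
}
```

Each deferred pixel is re-attempted with the depth counter back at zero, so the stack never grows past MAX_DEPTH frames regardless of the area's size.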
Change filling from pixels to lines
This eliminates a large number of recursive calls (as a very rough estimate, from n down to about sqrt(n)). You fill a whole line from the start point in a predetermined direction until you hit the border, so you recurse once per line instead of once per pixel. Here is an example (see [edit2]):
Paint algorithm leaving white pixels at the edges when I color
However, the function name Polygon_FloodFill implies you have the border polygon in vector form. If that is the case, filling it will be much faster using polygon rasterization techniques like:
how to rasterize rotated rectangle (in 2d by setpixel)
but for that the polygon must be convex, so if it is not, you need to triangulate it or break it down into convex polygons first (for example with ear clipping).

Related

OpenGL - Is it efficient to call glBufferSubData (nearly) each frame?

I have a spritesheet that contains a simple sprite animation. I can successfully extract each animation frame sprite and display it as a texture. However, I managed to do that by calling glBufferSubData to change the texture coordinates (before the game loop, at initialization). I want to play a simple sprite animation using these extracted textures, and I guess I will do it by changing the texture coordinates each frame (except when the animation is triggered by user input). Anyway, this results in calling glBufferSubData almost every frame (to change the texture data), and my question is: is this approach efficient? If not, how can I solve the issue? (On Reddit, I saw a comment saying that the goal must be to minimize the traffic between CPU and GPU memory in modern OpenGL, and I guess my approach violates this goal.) Thanks in advance.
For anyone interested, here is my approach:
void set_sprite_texture_from_spritesheet(Sprite* sprite, const char* path, int x_offset, int y_offset, int sprite_width, int sprite_height)
{
    float* uv_coords = get_uv_coords_from_spritesheet(path, x_offset, y_offset, sprite_width, sprite_height);
    for (int i = 0; i < 8; i++)
    {
        /* 8 means that I am changing a total of 8 texture coordinates (2 for each of the 4 vertices) */
        edit_vertex_data_by_index(sprite, &uv_coords[i], (i / 2) * 5 + 3 + (i % 2 != 0));
        /*
           the last argument in this function gives the index of the desired texture coordinate
           (5 is the stride, 3 the offset of the texture coordinates in each row)
        */
    }
    free(uv_coords);
    sprite->texture = load_texture(path); /* loads the texture -
                                             since the UV coordinates are
                                             adjusted based on the spritesheet,
                                             I am loading the entire spritesheet
                                             as a texture. */
}
void edit_vertex_data_by_index(Sprite *sprite, float *data, unsigned int start_index)
{
    glBindBuffer(GL_ARRAY_BUFFER, sprite->vbo);
    glBufferSubData(GL_ARRAY_BUFFER, start_index * sizeof(float), sizeof(float), data); /* sizeof(float), not sizeof(data): data is a pointer */
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    /*
       My concern is that if I call this almost every frame, it could be inefficient, but I am not sure.
    */
}
Editing buffers is fine. Literally every game has buffers that change every frame. Buffers are how you get the data to the GPU so it can render it! (And uniforms. Your driver is likely to secretly put uniforms in buffers though!)
Yes, you should minimize the amount of buffer updates. You should minimize everything, really. The less stuff the computer does, the faster it can do it! That doesn't mean you should avoid doing stuff entirely. It means you should only do as much stuff as you need to, instead of doing wasteful stuff that you don't need.
Every time you call an OpenGL function, the driver takes some time to check how to process your request, which buffer is bound, that it's big enough, that the GPU isn't using it at the same time, etc. You want to do as few calls as possible, because that way, the driver has to check all this stuff less often.
You are doing 8 separate glBufferSubData calls in this function. If you put the UV coordinates all next to each other in the buffer, you could update them all at once with 1 call. And if you have lots of animated sprites, you should try to put all of their UV coordinates in one big array, and update the whole array in one call - all the sprites at once.
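One way to get down to a single call, assuming you are free to keep a quad's UVs in their own tightly packed buffer (a hypothetical layout, not the poster's actual interleaved one): pack the 8 floats contiguously, then upload them all with one glBufferSubData. The GL calls are left as comments so the sketch stands alone:

```c
#include <string.h>

/* Pack the 8 UV floats of one sprite quad contiguously so a single
   glBufferSubData can replace eight scattered single-float updates.
   The quad vertex order here is an assumption for illustration. */
static void pack_frame_uvs(float out[8],
                           float u0, float v0, float u1, float v1)
{
    /* quad order: bottom-left, bottom-right, top-right, top-left */
    const float uv[8] = { u0, v0,  u1, v0,  u1, v1,  u0, v1 };
    memcpy(out, uv, sizeof uv);
    /* with a dedicated, tightly packed UV buffer bound, this would be
       a single update instead of eight:
       glBindBuffer(GL_ARRAY_BUFFER, sprite->uv_vbo);
       glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof uv, out); */
}
```

The same idea extends to many sprites: keep all their UVs in one array and upload the whole array once per frame.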
And loading textures from paths is really slow. Maybe your program can load 100 textures per second, but that still means you blew half your frame-time budget on texture loading. The texture hasn't changed anyway, so load it once and cache it instead of reloading it every time.

Having trouble of drawing hands of an analog clock

I have an assignment to make an analog clock on a C PIC18 starter kit.
I need to draw all the hands of the clock: seconds, minutes and hours.
I have initialized all 60 points around the clock face in a 2D array of x and y values, and I have the coordinates of the clock center.
In order to draw the hands, I've been provided with a drawLine function for this assignment, which looks like this:
void drawLine( BYTE x0, BYTE y0, BYTE x1, BYTE y1, LineWidth lw )
x0, y0 are where the line starts, and x1, y1 are where the line ends.
The drawLine function works like a XOR, so if I call it again with the same values the line disappears from the screen.
Screen coordinates start at x=0, y=0 in the top-left corner of the screen.
I have built a function to draw the hour hand of the clock.
I have an i value which increments by 1 for each coordinate of the clock, and it goes like this:
center[0][0]+(cord[i][0]-center[0][0])/2
But for some reason it only works in the 4th quadrant of the clock (i.e. when i is between 15 and 30); otherwise, the lines it draws don't resemble a clock hand.
Below is the full code; I would like to know what is wrong with my function, and what I need to do for it to draw correctly in the rest of the quadrants.
BYTE cord[60][2] = {
{67,0},{71,1},{74,2},{79,3},{81,4},
{82,5},{84,6},{88,9},{91,12},{92,14},
{93,16},{95,18},{96,21},{97,25},{98,29},
{98,32},{98,35},{97,39},{96,43},{95,45},
{93,47},{92,50},{89,53},{85,56},{84,58},
{82,58},{79,60},{76,61},{73,62},{70,63},
{67,63},{64,63},{60,62},{57,61},{54,60},
{51,58},{47,56},{45,54},{43,52},{41,50},
{40,47},{38,44},{37,41},{36,38},{35,35},
{35,32},{35,29},{36,26},{37,22},{38,18},
{40,16},{41,13},{44,10},{47,7},{49,6},
{51,5},{52,4},{54,3},{57,2},{61,1}};
void main(void)
{
    BYTE xtemp, ytemp;
    BYTE i = 0;
    BYTE center[1][2] = {{67,32}};
    InitializeSystem();
    while(1)
    {
        xtemp = center[0][0]+(cord[i][0]-center[0][0])/2;
        ytemp = center[0][1]+(cord[i][1]-center[0][1])/2;
        drawLine( center[0][0], center[0][1], xtemp, ytemp, thick );
        DelayMs(50);
        drawLine( center[0][0], center[0][1], xtemp, ytemp, thick );
        i++;
        if(i>60)
            i = 0;
    }
}
There is no standard datatype BYTE in C. I guess you have a typedef of an unsigned type.
When you do a signed calculation like:
xtemp = center[0][0]+(cord[i][0]-center[0][0])/2;
you need a signed type like int for the intermediate values.
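To make the failure mode concrete, here is a sketch: if BYTE maps to (or the intermediate is kept in) a wide unsigned type, the subtraction wraps around instead of going negative. The values mimic the 3rd-quadrant case (cord x = 35, center x = 67); the function name is just for illustration:

```c
#include <assert.h>
#include <stdint.h>

static int32_t halfway_demo(void)
{
    uint32_t cord_x = 35, center_x = 67;

    /* unsigned arithmetic: 35 - 67 wraps to 4294967264, not -32 */
    uint32_t bad = center_x + (cord_x - center_x) / 2;

    /* signed arithmetic: (35 - 67) / 2 == -16, as the formula intends */
    int32_t good = (int32_t)center_x
                 + ((int32_t)cord_x - (int32_t)center_x) / 2;

    assert(good == 51);            /* correct halfway point */
    assert(bad != 51);             /* wrapped result is way off */
    return good;
}
```

With plain unsigned char the integer promotions happen to rescue the subtraction, but computing and storing the intermediates as int makes the intent explicit and safe on any BYTE typedef.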

Suggestions on optimizing a Z-buffer implementation?

I'm writing a 3D graphics library as part of a project of mine, and I'm at the point where everything works, but not well enough.
In particular, my main headache is that my pixel fill-rate is horribly slow -- I can't even manage 30 FPS when drawing a triangle that spans half of an 800x600 window on my target machine (which is admittedly an older computer, but it should be able to manage this...)
I ran gprof on my executable, and I end up with the following interesting lines:
  %   cumulative   self              self     total
 time    seconds   seconds    calls  ms/call  ms/call  name
43.51       9.50      9.50                             vSwap
34.86      17.11      7.61   179944     0.04     0.04  grInterpolateHLine
13.99      20.17      3.06                             grClearDepthBuffer
<snip>
 0.76      21.78      0.17      624     0.27    12.46  grScanlineFill
The function vSwap is my double-buffer swapping function, and it also performs vsyncing, so it makes sense to me that the test program will spend much of its time waiting in there. grScanlineFill is my triangle-drawing function, which creates an edge list and then calls grInterpolateHLine to actually fill in the triangle.
My engine is currently using a Z-buffer to perform hidden surface removal. If we discount the (presumed) vsynch overhead, then it turns out that the test program is spending something like 85% of its execution time either clearing the depth buffer, or writing pixels according to the values in the depth buffer. My depth buffer clearing function is simplicity itself: copy the maximum value of a float into each element. The function grInterpolateHLine is:
void grInterpolateHLine(int x1, int x2, int y, float z, float zstep, int colour) {
    for (; x1 <= x2; x1++, z += zstep) {
        if (z < grDepthBuffer[x1 + y*VIDEO_WIDTH]) {
            vSetPixel(x1, y, colour);
            grDepthBuffer[x1 + y*VIDEO_WIDTH] = z;
        }
    }
}
I really don't see how I can improve that, especially considering that vSetPixel is a macro.
My entire stock of ideas for optimization has been whittled down to precisely one:
Use an integer/fixed-point depth buffer.
The problem that I have with integer/fixed-point depth buffers is that interpolation can be very annoying, and I don't actually have a fixed-point number library yet. Any further thoughts out there? Any advice would be most appreciated.
You should have a look at the source code to something like Quake - considering what it could achieve on a Pentium, 15 years ago. Its z-buffer implementation used spans rather than per-pixel (or fragment) depth. Otherwise, you could look at the rasterization code in Mesa.
It's hard to tell what higher-order optimizations can be done without seeing the rest of the code. I have a couple of minor observations, though.
There's no need to calculate x1 + y * VIDEO_WIDTH more than once in grInterpolateHLine. i.e.:
void grInterpolateHLine(int x1, int x2, int y, float z, float zstep, int colour) {
    int offset = x1 + (y * VIDEO_WIDTH);
    for (; x1 <= x2; x1++, z += zstep, offset++) {
        if (z < grDepthBuffer[offset]) {
            vSetPixel(x1, y, colour);
            grDepthBuffer[offset] = z;
        }
    }
}
Likewise, I'm guessing that your vSetPixel does a similar calculation, so you should be able to use the same offset there as well, and then you only need to increment offset and not x1 in each loop iteration. Chances are this can be extended back to the function that calls grInterpolateHLine, and you would then only need to do the multiplication once per triangle.
There are some other things you could do with the depth buffer. Most of the time if the first pixel of the line either fails or passes the depth test, then the rest of the line will have the same result. So after the first test you can write a more efficient assembly block to test the entire line in one shot, then if it passes you can use a more efficient block memory setter to block-set the pixel and depth values instead of doing them one at a time. You would only need to test/set per pixel if the line is only partially occluded.
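A rough sketch of that span idea in plain C (names invented for illustration; a real version would use an assembly block or wider stores as suggested): test the whole span first, and only fall back to per-pixel test-and-set when it is partially occluded.

```c
#include <assert.h>

/* Returns 1 if every interpolated z along the span passes the depth test. */
static int span_all_pass(const float *depth, int n, float z, float zstep)
{
    for (int i = 0; i < n; i++, z += zstep)
        if (z >= depth[i])
            return 0;       /* partially occluded: caller falls back */
    return 1;
}

static void fill_span(float *depth, int *pixels, int n,
                      float z, float zstep, int colour)
{
    if (span_all_pass(depth, n, z, zstep)) {
        for (int i = 0; i < n; i++, z += zstep) {   /* branch-free writes */
            pixels[i] = colour;
            depth[i] = z;
        }
    } else {
        for (int i = 0; i < n; i++, z += zstep) {   /* per-pixel test */
            if (z < depth[i]) {
                pixels[i] = colour;
                depth[i] = z;
            }
        }
    }
}
```

In the common case (span fully visible or fully hidden) the inner loop has no data-dependent branch, which is what makes the bulk path amenable to block stores.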
Also, not sure what you mean by older computer, but if your target computer is multi-core then you can break it up among multiple cores. You can do this for the buffer clearing function as well. It can help quite a bit.
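For the clearing function, one common trick (a sketch, not the poster's code) is to fill a small prefix and then double it with memcpy, which usually beats a per-element float loop because the copies run through the optimized block-move routine:

```c
#include <float.h>
#include <string.h>

/* Fill n floats with `value` by writing a short prefix, then repeatedly
   doubling the initialized region with memcpy. */
static void clear_depth(float *buf, size_t n, float value)
{
    size_t filled = n < 16 ? n : 16;
    for (size_t i = 0; i < filled; i++)
        buf[i] = value;
    while (filled < n) {
        size_t chunk = filled <= n - filled ? filled : n - filled;
        memcpy(buf + filled, buf, chunk * sizeof *buf);
        filled += chunk;
    }
}
```

With FLT_MAX as the clear value this matches the "copy the maximum value of a float into each element" behavior described above.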
I ended up solving this by replacing the Z-buffer with the painter's algorithm. I used SSE to write a Z-buffer implementation that created a bitmask of the pixels to paint (plus the range optimization suggested by Gerald), and it still ran far too slowly.
Thank you, everyone, for your input.

Looking for a fast outlined line rendering algorithm

I'm looking for a fast algorithm to draw an outlined line. For this application, the outline only needs to be 1 pixel wide. It should be possible, whether by default or through an option, to make two lines connect together seamlessly, if they share a common point.
Excuse the ASCII art but this is probably the best way to demonstrate it.
Normal line:
##
##
##
##
##
##
"Outlined" line:
**
*##**
**##**
**##**
**##**
**##**
**##*
**
I'm working on a dsPIC33FJ128GP802. It's a small microcontroller/digital signal processor, capable of 40 MIPS (million instructions per second). It is only capable of integer math (add, subtract and multiply; it can do division, but that takes ~19 cycles). It's being used to process an OSD layer at the same time, and only 3-4 MIPS of the processing time is available for calculations, so speed is critical. Each pixel has one of three states: black, white and transparent; and the video field is 192x128 pixels. This is for Super OSD, an open source project: http://code.google.com/p/super-osd/
The first solution I thought of was to draw 3x3 rectangles, with outlined pixels on the first pass and normal pixels on the second pass, but this could be slow, as for every pixel at least 3 pixels are overwritten and the time spent drawing them is wasted. So I'm looking for a faster way. Each pixel costs around 30 cycles. The target is <50,000 cycles to draw a line of 100 pixels length.
I suggest this (a C/pseudocode mix):
void draw_outline(int x1, int y1, int x2, int y2)
{
    int x, y;
    double slope;
    if (abs(x2-x1) >= abs(y2-y1)) {
        // line closer to horizontal than vertical
        if (x2 < x1) swap_points(1, 2);
        // now x1 <= x2
        slope = 1.0*(y2-y1)/(x2-x1);
        draw_pixel(x1-1, y1, '*');
        for (x = x1; x <= x2; x++) {
            y = y1 + round(slope*(x-x1));
            draw_pixel(x, y-1, '*');
            draw_pixel(x, y+1, '*');
            // here draw_line() does draw_pixel(x, y, '#');
        }
        draw_pixel(x2+1, y2, '*');
    }
    else {
        // same as above, but swap x and y
    }
}
Edit: If you want successive lines to connect seamlessly, I think you really have to draw all the outlines in the first pass, and then the lines. I edited the code above to draw only the outlines. The draw_line() function would be exactly the same but with one single draw_pixel(x, y, '#'); instead of four draw_pixel(..., ..., '*');.
And then you just:
void draw_polyline(point p[], int n)
{
    int i;
    for (i = 0; i < n-1; i++)
        draw_outline(p[i].x, p[i].y, p[i+1].x, p[i+1].y);
    for (i = 0; i < n-1; i++)
        draw_line(p[i].x, p[i].y, p[i+1].x, p[i+1].y);
}
My approach would be to use the Bresenham to draw multiple lines. Looking at your ASCII art, you'll note that the outline lines are just the same as the Bresenham line, just shifted 1 pixel up and down -- plus a single pixel to the left of the first point and to the right of the last.
For a generic version, you'll need to determine whether your line is flat or steep -- i.e., whether abs(y1 - y0) <= abs(x1 - x0). For steep lines, the outlines are shifted by 1 pixel to the left and right, and the closing pixels are above the starting and below the ending point.
It could be worth optimizing this by drawing the line and two outline pixels in one go for each line pixel. However, if you need seamless outlines, the simplest solution would be to first draw all outlines, then the lines themselves -- which wouldn't work with the "three-pixel-Bresenham" optimization.
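A sketch of the flat-line case (|dy| <= |dx|) in integer-only C, plotting the line pixel and the two shifted outline pixels in the same Bresenham pass, with the end caps added outside the loop. It assumes x0 < x1 and coordinates at least one pixel from the border; the grid and glyphs are illustrative:

```c
#include <stdlib.h>

#define W 16
#define H 16

/* Draw a flat outlined line into a character grid: '#' for the line,
   '*' for the one-pixel outline above, below, and at the two end caps. */
static void outlined_line_flat(char grid[H][W],
                               int x0, int y0, int x1, int y1)
{
    int dx = x1 - x0, dy = abs(y1 - y0);
    int ystep = y1 > y0 ? 1 : -1;
    int err = dx / 2, y = y0;

    grid[y0][x0 - 1] = '*';               /* west end cap */
    for (int x = x0; x <= x1; x++) {
        grid[y - 1][x] = '*';             /* outline on one side */
        grid[y + 1][x] = '*';             /* outline on the other */
        grid[y][x] = '#';                 /* the line itself */
        err -= dy;                        /* integer Bresenham step */
        if (err < 0) { y += ystep; err += dx; }
    }
    grid[y1][x1 + 1] = '*';               /* east end cap */
}
```

This stays within add/subtract/compare, so it fits the integer-only DSP; the steep case is the same with x and y swapped.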

Pruning short line segments from edge detector output?

I am looking for an algorithm to prune short line segments from the output of an edge detector. As can be seen in the image (and link) below, there are several small edges detected that aren't "long" lines. Ideally I'd like just the 4 sides of the quadrangle to show up after processing, but if there are a couple of stray lines, it won't be a big deal... Any suggestions?
Image Link
Before finding the edges, pre-process the image with an open or close operation (or both); that is, erode followed by dilate, or dilate followed by erode. This should remove the smaller objects but leave the larger ones roughly the same.
I've looked for online examples, and the best I could find was on page 41 of this PDF.
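To make the open operation concrete, here is a small self-contained sketch of binary erosion and dilation with a 3x3 square structuring element (sizes illustrative; a real implementation would use OpenCV's erode/dilate):

```c
#define W 8
#define H 8

/* Binary erosion: a pixel survives only if its full 3x3 neighborhood
   (clipped at the border, treated as background) is set. */
static void erode(const unsigned char src[H][W], unsigned char dst[H][W])
{
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            unsigned char all = 1;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= H || xx < 0 || xx >= W ||
                        !src[yy][xx])
                        all = 0;
                }
            dst[y][x] = all;
        }
}

/* Binary dilation: a pixel is set if any pixel in its 3x3 neighborhood is. */
static void dilate(const unsigned char src[H][W], unsigned char dst[H][W])
{
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            unsigned char any = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int yy = y + dy, xx = x + dx;
                    if (yy >= 0 && yy < H && xx >= 0 && xx < W &&
                        src[yy][xx])
                        any = 1;
                }
            dst[y][x] = any;
        }
}
```

An open (erode then dilate) removes blobs smaller than the structuring element while restoring larger shapes; a close (dilate then erode) fills gaps smaller than the element, which is what helps reconnect the rectangle's edges.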
I doubt that this can be done with a simple local operation. Look at the rectangle you want to keep: there are several gaps, so performing a local operation to remove short line segments would probably heavily reduce the quality of the desired output.
Consequently, I would try to detect the rectangle as important content by closing the gaps, fitting a polygon, or something like that, and then in a second step discard the remaining unimportant content. Maybe the Hough transform could help.
UPDATE
I just used this sample application using a Kernel Hough Transform with your sample image and got four nice lines fitting your rectangle.
In case somebody stumbles on this thread, OpenCV 2.x ships an example named squares.cpp that basically nails this task.
I made a slight modification to the application to improve the detection of the quadrangle.
Code:
#include "highgui.h"
#include "cv.h"
#include <iostream>
#include <math.h>
#include <string.h>
using namespace cv;
using namespace std;
void help()
{
    cout <<
    "\nA program using pyramid scaling, Canny, contours, contour simplification and\n"
    "memory storage (it's got it all folks) to find\n"
    "squares in a list of images pic1-6.png\n"
    "Returns sequence of squares detected on the image.\n"
    "the sequence is stored in the specified memory storage\n"
    "Call:\n"
    "./squares\n"
    "Using OpenCV version %s\n" << CV_VERSION << "\n" << endl;
}
int thresh = 70, N = 2;
const char* wndname = "Square Detection Demonized";
// helper function:
// finds a cosine of angle between vectors
// from pt0->pt1 and from pt0->pt2
double angle( Point pt1, Point pt2, Point pt0 )
{
    double dx1 = pt1.x - pt0.x;
    double dy1 = pt1.y - pt0.y;
    double dx2 = pt2.x - pt0.x;
    double dy2 = pt2.y - pt0.y;
    return (dx1*dx2 + dy1*dy2)/sqrt((dx1*dx1 + dy1*dy1)*(dx2*dx2 + dy2*dy2) + 1e-10);
}
// returns sequence of squares detected on the image.
// the sequence is stored in the specified memory storage
void findSquares( const Mat& image, vector<vector<Point> >& squares )
{
    squares.clear();
    Mat pyr, timg, gray0(image.size(), CV_8U), gray;
    // karlphillip: dilate the image so this technique can detect the white square,
    Mat out(image);
    dilate(out, out, Mat(), Point(-1,-1));
    // then blur it so that the ocean/sea becomes one big segment, to avoid detecting it as 2 big squares.
    medianBlur(out, out, 3);
    // down-scale and upscale the image to filter out the noise
    pyrDown(out, pyr, Size(out.cols/2, out.rows/2));
    pyrUp(pyr, timg, out.size());
    vector<vector<Point> > contours;
    // find squares only in the first color plane
    for( int c = 0; c < 1; c++ ) // was: c < 3
    {
        int ch[] = {c, 0};
        mixChannels(&timg, 1, &gray0, 1, ch, 1);
        // try several threshold levels
        for( int l = 0; l < N; l++ )
        {
            // hack: use Canny instead of zero threshold level.
            // Canny helps to catch squares with gradient shading
            if( l == 0 )
            {
                // apply Canny. Take the upper threshold from the slider
                // and set the lower to 0 (which forces edge merging)
                Canny(gray0, gray, 0, thresh, 5);
                // dilate canny output to remove potential
                // holes between edge segments
                dilate(gray, gray, Mat(), Point(-1,-1));
            }
            else
            {
                // apply threshold if l!=0:
                // tgray(x,y) = gray(x,y) < (l+1)*255/N ? 255 : 0
                gray = gray0 >= (l+1)*255/N;
            }
            // find contours and store them all as a list
            findContours(gray, contours, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);
            vector<Point> approx;
            // test each contour
            for( size_t i = 0; i < contours.size(); i++ )
            {
                // approximate contour with accuracy proportional
                // to the contour perimeter
                approxPolyDP(Mat(contours[i]), approx, arcLength(Mat(contours[i]), true)*0.02, true);
                // square contours should have 4 vertices after approximation,
                // relatively large area (to filter out noisy contours)
                // and be convex.
                // Note: the absolute value of the area is used because
                // the area may be positive or negative, in accordance with the
                // contour orientation
                if( approx.size() == 4 &&
                    fabs(contourArea(Mat(approx))) > 1000 &&
                    isContourConvex(Mat(approx)) )
                {
                    double maxCosine = 0;
                    for( int j = 2; j < 5; j++ )
                    {
                        // find the maximum cosine of the angle between joint edges
                        double cosine = fabs(angle(approx[j%4], approx[j-2], approx[j-1]));
                        maxCosine = MAX(maxCosine, cosine);
                    }
                    // if the cosines of all angles are small
                    // (all angles are ~90 degrees) then write the quadrangle
                    // vertices to the resultant sequence
                    if( maxCosine < 0.3 )
                        squares.push_back(approx);
                }
            }
        }
    }
}
// the function draws all the squares in the image
void drawSquares( Mat& image, const vector<vector<Point> >& squares )
{
    for( size_t i = 1; i < squares.size(); i++ )
    {
        const Point* p = &squares[i][0];
        int n = (int)squares[i].size();
        polylines(image, &p, &n, 1, true, Scalar(0,255,0), 3, CV_AA);
    }
    imshow(wndname, image);
}
int main(int argc, char** argv)
{
    if (argc < 2)
    {
        cout << "Usage: ./program <file>" << endl;
        return -1;
    }
    static const char* names[] = { argv[1], 0 };
    help();
    namedWindow( wndname, 1 );
    vector<vector<Point> > squares;
    for( int i = 0; names[i] != 0; i++ )
    {
        Mat image = imread(names[i], 1);
        if( image.empty() )
        {
            cout << "Couldn't load " << names[i] << endl;
            continue;
        }
        findSquares(image, squares);
        drawSquares(image, squares);
        imwrite("out.jpg", image);
        int c = waitKey();
        if( (char)c == 27 )
            break;
    }
    return 0;
}
The Hough Transform can be a very expensive operation.
An alternative that may work well in your case is the following:
run 2 mathematical morphology operations called an image close (http://homepages.inf.ed.ac.uk/rbf/HIPR2/close.htm) with a horizontal and vertical line (of a given length determined from testing) structuring element respectively. The point of this is to close all gaps in the large rectangle.
run connected component analysis. If you have done the morphology effectively, the large rectangle will come out as one connected component. Then it only remains to iterate through all the connected components and pick out the most likely candidate for the large rectangle.
Perhaps find the connected components, then remove components with fewer than X pixels (empirically determined), followed by dilation along horizontal/vertical lines to reconnect the gaps within the rectangle.
It's possible to follow two main techniques:
Vector-based operation: map your pixel islands into clusters (blobs, Voronoi zones, whatever). Then apply some heuristics to rectify the segments, like the Teh-Chin chain approximation algorithm, and do your pruning on vector elements (start point, endpoint, length, orientation and so on).
Set-based operation: cluster your data (as above). For every cluster, compute the principal components and distinguish lines from circles or other shapes by looking for clusters with only 1 significant eigenvalue (or 2 if you look for "fat" segments that could resemble ellipses). Check the eigenvectors associated with the eigenvalues to get information about the blobs' orientation, and make your choice.
Both ways could be easily explored with OpenCV (the former, indeed, falls under "Contour analysis" category of algos).
Here is a simple morphological filtering solution along the lines of @Tom10:
Solution in matlab:
se1 = strel('line',5,180); % linear horizontal structuring element
se2 = strel('line',5,90); % linear vertical structuring element
I = rgb2gray(imread('test.jpg'))>80; % threshold (since I had a grayscale version of the image)
Idil = imdilate(imdilate(I,se1),se2); % dilate contours so that they connect
Idil_area = bwareaopen(Idil,1200); % area filter them to remove the small components
The idea is basically to connect the horizontal contours into one large component, and then filter it with an area-opening filter to obtain the rectangle.
Results:
