Looking for a fast outlined line rendering algorithm - c

I'm looking for a fast algorithm to draw an outlined line. For this application, the outline only needs to be 1 pixel wide. It should be possible, whether by default or through an option, to make two lines connect together seamlessly, if they share a common point.
Excuse the ASCII art but this is probably the best way to demonstrate it.
Normal line:
##
##
##
##
##
##
"Outlined" line:
**
*##**
**##**
**##**
**##**
**##**
**##*
**
I'm working on a dsPIC33FJ128GP802. It's a small microcontroller/digital signal processor, capable of 40 MIPS (million instructions per second.) It is only capable of integer math (add, subtract and multiply: it can do division, but it takes ~19 cycles.) It's being used to process an OSD layer at the same time and only 3-4 MIPS of the processing time is available for calculations, so speed is critical. The pixels occupy three states: black, white and transparent; and the video field is 192x128 pixels. This is for Super OSD, an open source project: http://code.google.com/p/super-osd/
The first solution I thought of was to draw 3x3 rectangles with outlined pixels on the first pass and normal pixels on the second pass, but this could be slow, as for every pixel at least 3 pixels are overwritten and the time spent drawing them is wasted. So I'm looking for a faster way. Each pixel costs around 30 cycles. The target is <50, 000 cycles to draw a line of 100 pixels length.

I suggest this (C/pseudocode mix) :
void draw_outline(int x1, int y1, int x2, int y2)
{
int x, y;
double slope;
if (abs(x2-x1) >= abs(y2-y1)) {
// line closer to horizontal than vertical
if (x2 < x1) swap_points(1, 2);
// now x1 <= x2
slope = 1.0*(y2-y1)/(x2-x1);
draw_pixel(x1-1, y1, '*');
for (x = x1; x <= x2; x++) {
y = y1 + round(slope*(x-x1));
draw_pixel(x, y-1, '*');
draw_pixel(x, y+1, '*');
// here draw_line() does draw_pixel(x, y, '#');
}
draw_pixel(x2+1, y2, '*');
}
else {
// same as above, but swap x and y
}
}
Edit: If you want to have successive lines connect seamlessly, I
think you really have to draw all the outlines in the first pass, and
then the lines. I edited the code above to draw only the outlines. The
draw_line() function would be exactly the same but with one single
draw_pixel(x, y, '#'); instead of four draw_pixel(..., ..., '*');.
And then you just:
void draw_polyline(point p[], int n)
{
int i;
for (i = 0; i < n-1; i++)
draw_outline(p[i].x, p[i].y, p[i+1].x, p[i+1].y);
for (i = 0; i < n-1; i++)
draw_line(p[i].x, p[i].y, p[i+1].x, p[i+1].y);
}

My approach would be to use the Bresenham to draw multiple lines. Looking at your ASCII art, you'll note that the outline lines are just the same as the Bresenham line, just shifted 1 pixel up and down -- plus a single pixel to the left of the first point and to the right of the last.
For a generic version, you'll need to determine whether your line is flat or steep -- i.e., whether abs(y1 - y0) <= abs(x1 - x0). For steep lines, the outlines are shifted by 1 pixel to the left and right, and the closing pixels are above the starting and below the ending point.
It could be worth optimizing this by drawing the line and two outline pixels in one go for each line pixel. However, if you need seamless outlines, the simplest solution would be to first draw all outlines, then the lines themselves -- which wouldn't work with the "three-pixel-Bresenham" optimization.

Related

What causes the stackoverflow? And how can I resolve it?

I was doing the homework for computer graphics.
We need to use floodfill to paint an area, but no matter how I changed the reserve stack of Visual Studio, it would always jump out stackoverflow.
void Polygon_FloodFill(HDC hdc, int x0, int y0, int fillColor, int borderColor) {
int interiorColor;
interiorColor = GetPixel(hdc, x0, y0);
if ((interiorColor != borderColor) && (interiorColor != fillColor)) {
SetPixel(hdc, x0, y0, fillColor);
Polygon_FloodFill(hdc, x0 + 1, y0, fillColor, borderColor);
Polygon_FloodFill(hdc, x0, y0 + 1, fillColor, borderColor);
Polygon_FloodFill(hdc, x0 - 1 ,y0, fillColor, borderColor);
Polygon_FloodFill(hdc, x0, y0 - 1, fillColor, borderColor);
}
You may have too large an area to fill, which causes recursive calls to consume all of the execution stack in your program.
Your options:
grow the execution stack even further, if you can
reduce the area (how about just 100x100 or 20x20?)
stop using the execution stack and use a data structure that works similarly but can contain more elements (by being more efficient and/or being able to grow/be larger)
use a different algorithm (e.g. consider going from individual pixels to horizontal spans of pixels, there will be many fewer of the latter than the former)
What causes the stackoverflow?
What is the range of x0? +/- 2,000,000,000? That is your stack depth potential.
Code does not obviously prevent going out of range unless GetPixel(out-of-range) returns a no-match value.
And how can I resolve it?
Code needs to be more selective on recursive calls.
When a row of pixels can be set, do so without recursion.
Then examine that row's neighbors and only recurse when the neighbors were not continuously in need of setting.
A promising approach would handle the middle and then look at the 4 cardinal directions.
// Pseudo code
Polygon_FloodFill(x,y,c)
if (pixel(x,y) needs filling) {
set pixel(x,y,c);
for each of the 4 directions
// example: east
i = 1;
// fill the east line first
while (pixel(x+i,y) needs filling) {
i++;
set pixel(x,y,c);
}
// now examine the line above the "east" line
recursed = false;
for (j=1; j<i; j++) {
if (pixel(x+j, y+j) needs filling) {
if (!recursed) {
recursed = true;
Polygon_FloodFill(x+j,y+j,c)
} else {
// no need to call Polygon_FloodFill as will be caught with previous call
}
} else {
recursed = false;
}
}
// Same for line below the "east" line
// do same for south, west, north.
}
how many pixels to fill? each pixel is one level deep of recursion and you got a lot of variables all local ones and operands of the recursive function + return value and address so for reach pixel you store this:
void Polygon_FloodFill(HDC hdc, int x0, int y0, int fillColor, int borderColor) {
int interiorColor;
in 32 bit environment I estimate this in [Bytes]:
4 Polygon_FloodFill return address
4 HDC hdc ?
4 int x0
4 int y0
4 int fillColor
4 int borderColor
4 int interiorColor
-------------------
~ 7*4 = 28 Bytes
There might be even more depending on the C engine and calling sequence.
Now if your filled area has for example 256x256 pixel then you need:
7*4*256*256 = 1.75 MByte
of memory on the stack/heap. How much memory you got depends on the settings you compile/link with so go to project option and look for memory stack/heap limits...
How to deal with this?
lower the stack/heap trashing
simply do not use operands for your flood_fill instead move them to global variables:
HDC floodfill_hdc;
int floodfill_x0,floodfill_y0,floodfill_fillColor,floodfill_borderColor;
void _Polygon_FloodFill()
{
// here your original filling code
int interiorColor;
...
}
void PolygonFloodFill(HDC hdc, int x0, int y0, int fillColor, int borderColor) // this is what you call when want to fill something
{
floodfill_hdc=hdc;
floodfill_x0=x0;
floodfill_y0=y0;
floodfill_fillColor=fillColor;
floodfill_borderColor=borderColor;
_Polygon_FloodFill();
}
this will allow to fill ~14 times bigger area.
limit recursion depth
This is also sometimes called priority que ... You just add one gobal counter that is counting actual depth of recursion and if hit limit value then do not allow recursion. Instead add pixel position to some list that will be processed after actual recursion stops.
change filling from pixels to lines
this simply eliminates a lot of recursive calls in wildly rough estimate to sqrt(n) recursions from n... You simply fill whole line from a start point to predetermined direction until you hit the border ... So you would have just recursion call per each line instead of per pixel. Here example (see [edit2]):
Paint algorithm leaving white pixels at the edges when I color
However the function name Polygon_FloodFill implies you got the border polygon in vector form. If the case than filling it will be much faster using polygon rasterization techniques like:
how to rasterize rotated rectangle (in 2d by setpixel)
but for that the polygon must be convex one so if not the case you need to triangulate or break down to convex polygons first (for example with Ear clipping).

What is the most efficient way to put multiple colours on a window, especially in frame-by-frame format?

I am making a game with C and X11. I've been trying for quite a while to find a way to put different coloured pixels on a window, frame by frame. I've seen fully developed games get thousands of frames per second. What is the most efficient way of doing this?
I have seen 2-coloured bitmaps with XImages, allocating 256 colours on a fade of black-white, and using XPutPixel with XImages (which I wasn't able to figure how to create an XImage properly that could later have pixels put on it).
I have made this for loop that creates a random image, but it is, obviously, pixel-by-pixel instead of frame-by-frame and takes 18 seconds to render one entire frame.
XColor pixel;
for (int x = 0; x < currentWindowWidth; x++) {
for (int y = 0; y < currentWindowHeight; y++) {
pixel.red = rand() % 256 * 256; //Converting 16-bit colour to 8-bit colour
pixel.green = rand() % 256 * 256;
pixel.blue = rand() % 256 * 256;
XAllocColor(display, XDefaultColormap(display, screenNumber), &pixel); //This probably takes the most time,
XSetForeground(display, graphics, pixel.pixel); //as does this.
XDrawPoint(display, window, graphics, x, y);
}
}
After three or so more weeks of testing things off and on, I finally figured out how to do it, and it was rather simple. As I said in the OP, XAllocColor and XSetForground take quite a bit of time (relatively) to work. XDrawPoint also was slow, as it does more than just put a pixel at a point on an image.
First I tested how Xlib's colour format works (for the unsigned long int represented as pixel.pixel, which was what I needed XAllocColor for), and it appears to have 100% red set to 16711680, 100% green set to 65280, and 100% blue set to 255, which is obviously a pattern. I found the maximum to be a 50% of all colours, 4286019447, which is a solid grey.
Next, I made sure my XVisualInfo would be supported by my system with a test using XMatchVisualInfo([expected visual info values]). That ensures the depth I will use and the TrueColor class works.
Finally, I made an XImage copied from the root window's image for manipulation. I used XPutPixel for each pixel on the window and set it to a random value between 0 and 4286019448, creating the random image. I then used XPutImage to paste the image to the window.
Here's the final code:
if (!XMatchVisualInfo(display, screenNumber, 24, TrueColor, &visualInfo)) {
exit(0);
}
frameImage = XGetImage(display, rootWindow, 0, 0, screenWidth, screenHeight, AllPlanes, ZPixmap);
while (1) {
for (unsigned short x = 0; x < currentWindowWidth; x += pixelSize) {
for (unsigned short y = 0; y < currentWindowHeight; y += pixelSize) {
XPutPixel(frameImage, x, y, rand() % 4286019447);
}
}
XPutImage(display, window, graphics, frameImage, 0, 0, 0, 0, currentWindowWidth, currentWindowHeight);
}
This puts a random image on the screen, at a stable 140 frames per second on fullscreen. I don't necessarily know if this is the most efficient way, but it works way better than anything else I've tried. Let me know if there is any way to make it better.
Thousands of frames per second is not possible. The monitor frequency is about 100 Hz, or 100 cycles per second, that's roughly the maximum frame rate. This is still very fast. Human eye wouldn't pick up faster frame rates.
The monitor response time is about 5ms, so any single point on the screen cannot be refreshed more than 200 times per second.
8-bit is 1 byte, so 8-bit image uses one byte per pixel, each pixel is from 0 to 256. The pixel doesn't have red, blue, green component. Instead each pixel points to an index in the color table. The color table holds 256 colors. There is a trick where you keep the pixels the same and change the color table, this makes the image fade in and out or do other weird things.
In a 24-bit image, each pixel has blue, red, green component. Each color is 1 byte, so each pixel is 3 bytes, or 24 bits.
uint8_t red = rand() % 256;
uint8_t grn = rand() % 256;
uint8_t blu = rand() % 256;
A 16-bit image uses an odd format to store red, blue, green. 16 is not divisible by 3, often times 2 colors are assigned 5-bits, and the 3rd color gets 6-bits. Then you have to fit these colors on one uint16_t sized pixel. It's probably not worth it to explore this.
The slowness of your routine is because you are painting one pixel at a time. You should paint to a buffer instead, and render the buffer once per frame. You might consider using other frame works like SDL. Other games may use things like OpenGL which takes advantage of GPU optimization for matrix operation etc.
You must use a GPU. GPUs have a highly parallel architecture optimized for graphics (hence the name). To access the GPU you will use an API like OpenGL or Vulkan or make use of a Game Engine.

Standard deviation of pixel values in a masked image

I have a DICOM image with a mask on. It looks like a black background with a white circle in the middle (area not covered and zeroed with the mask).
The code for which is:
import numpy as np
import dicom
import pylab
ds = dicom.read_file("C:\Users\uccadmin\Desktop\James_Phantom_CT_Dec_16th\James Phantom CT Dec 16th\Images\SEQ4Recon_3_34\IM-0268-0001.dcm")
lx, ly = ds.pixel_array.shape
X, Y = np.ogrid[0:lx, 0:ly]
mask = (X - lx/2)**2 + (Y - ly/2)**2 > lx*ly/8 # defining mask
ds.pixel_array[mask] = 0
print np.std(ds.pixel_array) # trying to get standard deviation
pylab.imshow(ds.pixel_array, cmap=pylab.cm.bone) # shows image with mask
I want to get the standard deviation of the pixel values INSIDE the white circle ONLY i.e. exclude the black space outside the circle (the mask).
I do not think the value I am getting with the above code is correct, as it is ~500, and the white circle is almost homogenous.
Any ideas how to make sure that I get the standard deviation of the pixel values within the white circle ONLY in a Pythonic way?
I think the reason you are getting a big number is because your standard deviation is including all the zero values.
Is it enough for you to simply ignore all zero values? (This will be okay, providing that no or very few pixels in the circle have value 0.) If so
np.std([x for x in ds.pixel_array if x > 0])
should do the trick. If this isn't good enough, then you can reverse the condition in your mask to be
mask = (X - lx/2)**2 + (Y - ly/2)**2 < lx*ly/8 # defining mask, < instead of >
and do
mp.std(ds.pixel_array[mask])

Suggestions on optimizing a Z-buffer implementation?

I'm writing a 3D graphics library as part of a project of mine, and I'm at the point where everything works, but not well enough.
In particular, my main headache is that my pixel fill-rate is horribly slow -- I can't even manage 30 FPS when drawing a triangle that spans half of an 800x600 window on my target machine (which is admittedly an older computer, but it should be able to manage this . . .)
I ran gprof on my executable, and I end up with the following interesting lines:
% cumulative self self total
time seconds seconds calls ms/call ms/call name
43.51 9.50 9.50 vSwap
34.86 17.11 7.61 179944 0.04 0.04 grInterpolateHLine
13.99 20.17 3.06 grClearDepthBuffer
<snip>
0.76 21.78 0.17 624 0.27 12.46 grScanlineFill
The function vSwap is my double-buffer swapping function, and it also performs vsyching, so it makes sense to me that the test program will spend much of its time waiting in there. grScanlineFill is my triangle-drawing function, which creates an edge list and then calls grInterpolateHLine to actually fill in the triangle.
My engine is currently using a Z-buffer to perform hidden surface removal. If we discount the (presumed) vsynch overhead, then it turns out that the test program is spending something like 85% of its execution time either clearing the depth buffer, or writing pixels according to the values in the depth buffer. My depth buffer clearing function is simplicity itself: copy the maximum value of a float into each element. The function grInterpolateHLine is:
void grInterpolateHLine(int x1, int x2, int y, float z, float zstep, int colour) {
for(; x1 <= x2; x1 ++, z += zstep) {
if(z < grDepthBuffer[x1 + y*VIDEO_WIDTH]) {
vSetPixel(x1, y, colour);
grDepthBuffer[x1 + y*VIDEO_WIDTH] = z;
}
}
}
I really don't see how I can improve that, especially considering that vSetPixel is a macro.
My entire stock of ideas for optimization has been whittled down to precisely one:
Use an integer/fixed-point depth buffer.
The problem that I have with integer/fixed-point depth buffers is that interpolation can be very annoying, and I don't actually have a fixed-point number library yet. Any further thoughts out there? Any advice would be most appreciated.
You should have a look at the source code to something like Quake - considering what it could achieve on a Pentium, 15 years ago. Its z-buffer implementation used spans rather than per-pixel (or fragment) depth. Otherwise, you could look at the rasterization code in Mesa.
Hard to really tell what higher order optimizations can be done without seeing the rest of the code. I have a couple of minor observation, though.
There's no need to calculate x1 + y * VIDEO_WIDTH more than once in grInterpolateHLine. i.e.:
void grInterpolateHLine(int x1, int x2, int y, float z, float zstep, int colour) {
int offset = x1 + (y * VIDEO_WIDTH);
for(; x1 <= x2; x1 ++, z += zstep, offset++) {
if(z < grDepthBuffer[offset]) {
vSetPixel(x1, y, colour);
grDepthBuffer[offset] = z;
}
}
}
Likewise, I'm guessing that your vSetPixel does a similar calculation, so you should be able to use the same offset there as well, and then you only need to increment offset and not x1 in each loop iteration. Chances are this can be extended back to the function that calls grInterpolateHLine, and you would then only need to do the multiplication once per triangle.
There are some other things you could do with the depth buffer. Most of the time if the first pixel of the line either fails or passes the depth test, then the rest of the line will have the same result. So after the first test you can write a more efficient assembly block to test the entire line in one shot, then if it passes you can use a more efficient block memory setter to block-set the pixel and depth values instead of doing them one at a time. You would only need to test/set per pixel if the line is only partially occluded.
Also, not sure what you mean by older computer, but if your target computer is multi-core then you can break it up among multiple cores. You can do this for the buffer clearing function as well. It can help quite a bit.
I ended up solving this by replacing the Z-buffer with the Painter's Algorithm. I used SSE to write a Z-buffer implementation that created a bitmask w/the pixels to paint (plus the range optimization suggested by Gerald), and it still ran far too slowly.
Thank you, everyone, for your input.

Pruning short line segments from edge detector output?

I am looking for an algorithm to prune short line segments from the output of an edge detector. As can be seen in the image (and link) below, there are several small edges detected that aren't "long" lines. Ideally I'd like just the 4 sides of the quadrangle to show up after processing, but if there are a couple of stray lines, it won't be a big deal... Any suggestions?
Image Link
Before finding the edges pre-process the image with an open or close operation (or both), that is, erode followed by dilate, or dilate followed by erode. this should remove the smaller objects but leave the larger ones roughly the same.
I've looked for online examples, and the best I could find was on page 41 of this PDF.
I doubt that this can be done with a simple local operation. Look at the rectangle you want to keep - there are several gaps, hence performing a local operation to remove short line segments would probably heavily reduce the quality of the desired output.
In consequence I would try to detect the rectangle as important content by closing the gaps, fitting a polygon, or something like that, and then in a second step discard the remaining unimportant content. May be the Hough transform could help.
UPDATE
I just used this sample application using a Kernel Hough Transform with your sample image and got four nice lines fitting your rectangle.
In case somebody steps on this thread, OpenCV 2.x brings an example named squares.cpp that basically nails this task.
I made a slight modification to the application to improve the detection of the quadrangle
Code:
#include "highgui.h"
#include "cv.h"
#include <iostream>
#include <math.h>
#include <string.h>
using namespace cv;
using namespace std;
void help()
{
cout <<
"\nA program using pyramid scaling, Canny, contours, contour simpification and\n"
"memory storage (it's got it all folks) to find\n"
"squares in a list of images pic1-6.png\n"
"Returns sequence of squares detected on the image.\n"
"the sequence is stored in the specified memory storage\n"
"Call:\n"
"./squares\n"
"Using OpenCV version %s\n" << CV_VERSION << "\n" << endl;
}
int thresh = 70, N = 2;
const char* wndname = "Square Detection Demonized";
// helper function:
// finds a cosine of angle between vectors
// from pt0->pt1 and from pt0->pt2
double angle( Point pt1, Point pt2, Point pt0 )
{
double dx1 = pt1.x - pt0.x;
double dy1 = pt1.y - pt0.y;
double dx2 = pt2.x - pt0.x;
double dy2 = pt2.y - pt0.y;
return (dx1*dx2 + dy1*dy2)/sqrt((dx1*dx1 + dy1*dy1)*(dx2*dx2 + dy2*dy2) + 1e-10);
}
// returns sequence of squares detected on the image.
// the sequence is stored in the specified memory storage
void findSquares( const Mat& image, vector<vector<Point> >& squares )
{
squares.clear();
Mat pyr, timg, gray0(image.size(), CV_8U), gray;
// karlphillip: dilate the image so this technique can detect the white square,
Mat out(image);
dilate(out, out, Mat(), Point(-1,-1));
// then blur it so that the ocean/sea become one big segment to avoid detecting them as 2 big squares.
medianBlur(out, out, 3);
// down-scale and upscale the image to filter out the noise
pyrDown(out, pyr, Size(out.cols/2, out.rows/2));
pyrUp(pyr, timg, out.size());
vector<vector<Point> > contours;
// find squares only in the first color plane
for( int c = 0; c < 1; c++ ) // was: c < 3
{
int ch[] = {c, 0};
mixChannels(&timg, 1, &gray0, 1, ch, 1);
// try several threshold levels
for( int l = 0; l < N; l++ )
{
// hack: use Canny instead of zero threshold level.
// Canny helps to catch squares with gradient shading
if( l == 0 )
{
// apply Canny. Take the upper threshold from slider
// and set the lower to 0 (which forces edges merging)
Canny(gray0, gray, 0, thresh, 5);
// dilate canny output to remove potential
// holes between edge segments
dilate(gray, gray, Mat(), Point(-1,-1));
}
else
{
// apply threshold if l!=0:
// tgray(x,y) = gray(x,y) < (l+1)*255/N ? 255 : 0
gray = gray0 >= (l+1)*255/N;
}
// find contours and store them all as a list
findContours(gray, contours, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);
vector<Point> approx;
// test each contour
for( size_t i = 0; i < contours.size(); i++ )
{
// approximate contour with accuracy proportional
// to the contour perimeter
approxPolyDP(Mat(contours[i]), approx, arcLength(Mat(contours[i]), true)*0.02, true);
// square contours should have 4 vertices after approximation
// relatively large area (to filter out noisy contours)
// and be convex.
// Note: absolute value of an area is used because
// area may be positive or negative - in accordance with the
// contour orientation
if( approx.size() == 4 &&
fabs(contourArea(Mat(approx))) > 1000 &&
isContourConvex(Mat(approx)) )
{
double maxCosine = 0;
for( int j = 2; j < 5; j++ )
{
// find the maximum cosine of the angle between joint edges
double cosine = fabs(angle(approx[j%4], approx[j-2], approx[j-1]));
maxCosine = MAX(maxCosine, cosine);
}
// if cosines of all angles are small
// (all angles are ~90 degree) then write quandrange
// vertices to resultant sequence
if( maxCosine < 0.3 )
squares.push_back(approx);
}
}
}
}
}
// the function draws all the squares in the image
void drawSquares( Mat& image, const vector<vector<Point> >& squares )
{
for( size_t i = 1; i < squares.size(); i++ )
{
const Point* p = &squares[i][0];
int n = (int)squares[i].size();
polylines(image, &p, &n, 1, true, Scalar(0,255,0), 3, CV_AA);
}
imshow(wndname, image);
}
int main(int argc, char** argv)
{
if (argc < 2)
{
cout << "Usage: ./program <file>" << endl;
return -1;
}
static const char* names[] = { argv[1], 0 };
help();
namedWindow( wndname, 1 );
vector<vector<Point> > squares;
for( int i = 0; names[i] != 0; i++ )
{
Mat image = imread(names[i], 1);
if( image.empty() )
{
cout << "Couldn't load " << names[i] << endl;
continue;
}
findSquares(image, squares);
drawSquares(image, squares);
imwrite("out.jpg", image);
int c = waitKey();
if( (char)c == 27 )
break;
}
return 0;
}
The Hough Transform can be a very expensive operation.
An alternative that may work well in your case is the following:
run 2 mathematical morphology operations called an image close (http://homepages.inf.ed.ac.uk/rbf/HIPR2/close.htm) with a horizontal and vertical line (of a given length determined from testing) structuring element respectively. The point of this is to close all gaps in the large rectangle.
run connected component analysis. If you have done the morphology effectively, the large rectangle will come out as one connected component. It then only remains iterating through all the connected components and picking out the most likely candidate that should be the large rectangle.
Perhaps finding the connected components, then removing components with less than X pixels (empirically determined), followed by dilation along horizontal/vertical lines to reconnect the gaps within the rectangle
It's possible to follow two main techniques:
Vector based operation: map your pixel islands into clusters (blob, voronoi zones, whatever). Then apply some heuristics to rectify the segments, like Teh-Chin chain approximation algorithm, and make your pruning upon vectorial elements (start, endpoint, length, orientation and so on).
Set based operation: cluster your data (as above). For every cluster, compute principal components and detect lines from circles or any other shape by looking for clusters showing only 1 significative eigenvalue (or 2 if you look for "fat" segments, that could resemble to ellipses). Check eigenvectors associated with eigenvalues to have information about orientation of the blobs, and make your choice.
Both ways could be easily explored with OpenCV (the former, indeed, falls under "Contour analysis" category of algos).
Here is a simple morphological filtering solution following the lines of #Tom10:
Solution in matlab:
se1 = strel('line',5,180); % linear horizontal structuring element
se2 = strel('line',5,90); % linear vertical structuring element
I = rgb2gray(imread('test.jpg'))>80; % threshold (since i had a grayscale version of the image)
Idil = imdilate(imdilate(I,se1),se2); % dilate contours so that they connect
Idil_area = bwareaopen(Idil,1200); % area filter them to remove the small components
The idea is to basically connect the horizontal contours to make a large component and filter by an area opening filter later on to obtain the rectangle.
Results:

Resources