Calling user-defined functions in Thrust - C

I'm loading a .png file using OpenCV, and I want to extract its blue intensity values using the Thrust library.
My code goes like this:
Loading an image using an OpenCV IplImage pointer
Copying the image data into a thrust::device_vector
Extracting the blue intensity values from the device vector inside a structure, using the Thrust library
Now I have a problem extracting the blue intensity values from the device vector.
I already wrote this code in CUDA and am now converting it to use the Thrust library.
I fetch the blue intensity values inside the function below.
I want to know how to call the struct FetchBlueValues from the main function.
Code:
#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <thrust/device_malloc.h>
#include <thrust/copy.h>
#include "cv.h"
#include "highgui.h"

#define ImageWidth 14
#define ImageHeight 10

thrust::device_vector<int> BinaryImage(ImageWidth*ImageHeight);
thrust::device_vector<int> ImageVector(ImageWidth*ImageHeight*3);

struct FetchBlueValues
{
    __host__ __device__ void operator()()
    {
        int index = 0;
        // blue is the first of the three interleaved BGR channels, so step by 3
        for (int i = 0; i < ImageHeight*ImageWidth*3; i += 3)
        {
            BinaryImage[index] = ImageVector[i];
            index++;
        }
    }
};

int main()
{
    IplImage* src = cvLoadImage("../Input/test.png", CV_LOAD_IMAGE_COLOR);
    unsigned char* raw_ptr = (unsigned char*) src->imageData;
    thrust::device_ptr<unsigned char> dev_ptr = thrust::device_malloc<unsigned char>(ImageHeight*src->widthStep);
    thrust::copy(raw_ptr, raw_ptr + (src->widthStep*ImageHeight), dev_ptr);
    int index = 0;
    for (int j = 0; j < ImageHeight; j++)
    {
        for (int i = 0; i < ImageWidth; i++)
        {
            ImageVector[index]   = (int) dev_ptr[(j*src->widthStep) + (i*src->nChannels) + 0];
            ImageVector[index+1] = (int) dev_ptr[(j*src->widthStep) + (i*src->nChannels) + 1];
            ImageVector[index+2] = (int) dev_ptr[(j*src->widthStep) + (i*src->nChannels) + 2];
            index += 3;
        }
    }
    return 0;
}

Since the image is stored in pixel format and each pixel contains distinct color components, there is a natural "stride" when accessing the individual color components. In this case the components of a pixel are stored as three successive int quantities, so the access stride for a given color component is three.
An example strided-range iterator methodology is covered here.
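Thrust's examples include a full strided_range iterator for exactly this pattern. As a minimal sketch of the same idea (assuming the interleaved BGR layout above; fetch_blue and stride_index are illustrative names, not part of Thrust):
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>

// maps linear index i to the strided source index i*stride
struct stride_index
{
    int stride;
    __host__ __device__ stride_index(int s) : stride(s) {}
    __host__ __device__ int operator()(int i) const { return i * stride; }
};

// gather every third element, starting at offset 0 (the blue channel in BGR)
void fetch_blue(const thrust::device_vector<int>& interleaved,
                thrust::device_vector<int>& blue)
{
    thrust::copy(
        thrust::make_permutation_iterator(
            interleaved.begin(),
            thrust::make_transform_iterator(thrust::make_counting_iterator(0), stride_index(3))),
        thrust::make_permutation_iterator(
            interleaved.begin(),
            thrust::make_transform_iterator(thrust::make_counting_iterator((int)blue.size()), stride_index(3))),
        blue.begin());
}
// usage, with the vectors defined in the question:
// fetch_blue(ImageVector, BinaryImage);
This keeps the whole operation on the device and avoids indexing a device_vector from device code, which is not allowed.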

Related

implementing object detection like openCV

I'm trying to implement the Viola-Jones algorithm for object detection using Haar cascades (like OpenCV's implementation) in C, to detect faces. I'm writing the C code in a Vivado HLS compatible way, so I can port the implementation to an FPGA. My main goal is to learn as much as possible, rather than just getting it to work. I would also appreciate any help with improving my question.
I basically started reading G. Bradski's Learning OpenCV, watched some online tutorials and got started writing the code. Sure enough, it's not detecting faces and I don't know why. At this point I care more about understanding my mistakes than about being able to detect faces.
My Implementation Steps
I'm not sure how much detail is appropriate, but to keep it short:
Extracting the Haar cascade data from haarcascade_frontalface_default.xml into C-readable structures (huge arrays)
Writing a function to create an integral image of any given 8-bit greyscale image of size 24x24 (the same size as listed in the cascade; a minimal sketch of such a helper appears after the code below)
Applying knowledge from this great post to make the necessary calculations
My Testing Scheme
Implementing a Python script that detects faces using the OpenCV library with the same Haar cascade as above to create golden data; a detected face is cut out of the image (ensuring 24x24 size) and stored
Stored images are converted to one-dimensional C arrays, containing pixel values row-wise: img = {row0col0, row0col1, row1col0, row1col1, ... }
The integral image is calculated and face detection applied
Result
Faces pass only 6 of the 25 stages of the Haar cascade and are therefore not detected by my implementation, even though I know they should have been detected, since the Python script with OpenCV and the same Haar cascade did detect them.
My Code
/*
* This is detectFace.c
*/
#include <stdio.h>
#include "detectFace.h"
// define constants based on Haar cascade in use
// Each feature is made of max 3 rects
//#define FEAT_NO 1 // max no. of features (= 2912 for face_default.xml)
#define RECTS_IN_FEAT 3 // max no. of rect's per feature
//#define INTS_IN_RECT 5 // no. of int's needed to describe a rect
// each node has one feature (bijective relation) and three doubles
#define STAGE_NO 25 // no. of stages
#define NODE_NO 211 // no of nodes per stage, corresponds to FEAT_NO since each Node has always one feature in haarcascade_frontalface_default.xml
//#define ELMNT_IN_NODE 3 // no. of doubles needed to describe a node
// constants for frame size
#define WIN_WIDTH 24 // width = height =24
//int detectFace(int features[FEAT_NO][RECTS_IN_FEAT][INTS_IN_RECT], double stages[STAGE_NO][NODE_NO][ELMNT_IN_NODE], double stageThresh[STAGE_NO], int ii[24][24]){
int detectFace(
int ii[576],
int stageNum,
int stageOrga[25],
float stageThresholds[25],
float nodes[8739],
int featOrga[2913],
int rectangles[6383][5])
{
int passedStages = 0; // number of stages passed in this run
int faceDetected = 0; // set to 1 if a face is detected and 0 if it's not
// Debug:
int nodesUsed = 0; // number of floats out of nodes[] processed, use to skip to the unprocessed floats
int rectsUsed = 0; // number of rects processed
int droppedInStage0 = 0;
// loop through all stages
int i;
detectFace_label1:
for (i = 0; i < STAGE_NO; i++)
{
double tmp = 0.0; //variable to accumulate node-values, to then compare to stage threshold
int nodeNum = stageOrga[i]; // get number of nodes for this stage from stageOrga using stage index i
// loop through nodes inside each stage
// NOTE: it is assumed that each node maps to one corresponding feature. Ex: node[0] has feat[0] and node[1] has feat[1]
// because this is how it is written in the haarcascade_frontalface_default.xml
int j;
detectFace_label0:
for (j = 0; j < NODE_NO; j++)
{
// a node is defined by 3 values:
double nodeThresh = nodes[nodesUsed]; // the first value is the node threshold
double lValue = nodes[nodesUsed + 1]; // the second value is the left value
double rValue = nodes[nodesUsed + 2]; // the third value is the right value
int sum = 0; // contains the weighted value of rectangles in one Haar feature
// loop through rect's in a feature, some have 2 and some have 3 rect's.
// Each node always refers to one feature in a way that node0 maps to feature0 and node1 to feature1 (the XML file is built like that)
//int rectNum = featOrga[j]; // get number of rects for current feature using current node index j
int k;
detectFace_label2:
for (k = 0; k < RECTS_IN_FEAT; k++)
{
int x = 0, y = 0, width = 0, height = 0, weight = 0, coordUpL = 0, coordUpR = 0, coordDownL = 0, coordDownR = 0;
// a rect is defined by 5 values:
x = rectangles[rectsUsed][0]; // the first value is the x coordinate of the top left corner pixel
y = rectangles[rectsUsed][1]; // the second value is the y coordinate of the top left corner pixel
width = rectangles[rectsUsed][2]; // the third value is the width of the current rectangle
height = rectangles[rectsUsed][3]; // the fourth value is the height of this rectangle
weight = rectangles[rectsUsed][4]; // the fifth value is the weight of this rectangle
// calculating 1-Dim index for points of interest. Formula: index = width * row + column, assuming values are stored in row order
coordUpL = ((WIN_WIDTH * y) - WIN_WIDTH) + (x - 1);
coordUpR = coordUpL + width;
coordDownL = coordUpL + (height * WIN_WIDTH);
coordDownR = coordDownL + width;
// calculate the area sum according to Viola-Jones
//sum += (ii[x][y] + ii[x+width][y+height] - ii[x][y+height] - ii[x+width][y]) * weight;
sum += (ii[coordUpL] + ii[coordDownR] - ii[coordUpR] - ii[coordDownL]) * weight;
// Debug: counting the number of actual rectangles used
rectsUsed++; //
}
// decide whether the result of the feature calculation reaches the node threshold
if (sum < nodeThresh)
{
tmp += lValue; // add left value to tmp if node threshold was not reached
}
else
{
tmp += rValue; // add right value to tmp if node threshold was reached
}
nodesUsed = nodesUsed + 3; // one node is processed, increase nodesUsed by the number of floats needed to represent a node (3)
}
//######## at this point we went through each node in the current stage #######
// check if threshold of current stage was reached
if (tmp < stageThresholds[i])
{
faceDetected = 0; // if any stage threshold is not reached the operation is done and no face is present
// Debug: show in which stage the frame was dropped
printf("Face detection failed in stage %d \n", i);
//i = stageNum; // breaks out this loop, because i is supposed to stay smaller than STAGE_NO
}
else
{
passedStages++; // stage threshold is reached, therefore passedStages will count up
}
}
//######## at this point we went through all stages ###############################
//----------------------------------------------------------------------------------
// if the number of passed stages reaches the total number of stages, a face is detected
if (passedStages == stageNum)
{
faceDetected = 1; // one symbolizes that the input is a face
}
else
{
faceDetected = 0; // zero symbolizes that the input is not a face
};
return faceDetected;
}
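For reference, here is a minimal sketch of the integral-image helper described in the implementation steps above (assuming an 8-bit greyscale 24x24 input stored row-wise, as in the testing scheme; integralImage is an illustrative name, not the asker's function):
#include <stdint.h>
#define WIN_WIDTH 24
// ii(x,y) = sum of all pixels above and to the left of (x,y), inclusive,
// computed in one pass with the recurrence: ii = pixel + left + up - upleft
void integralImage(const uint8_t img[WIN_WIDTH * WIN_WIDTH], int ii[WIN_WIDTH * WIN_WIDTH])
{
    for (int y = 0; y < WIN_WIDTH; y++)
    {
        for (int x = 0; x < WIN_WIDTH; x++)
        {
            int left   = (x > 0)          ? ii[y * WIN_WIDTH + (x - 1)]       : 0;
            int up     = (y > 0)          ? ii[(y - 1) * WIN_WIDTH + x]       : 0;
            int upleft = (x > 0 && y > 0) ? ii[(y - 1) * WIN_WIDTH + (x - 1)] : 0;
            ii[y * WIN_WIDTH + x] = img[y * WIN_WIDTH + x] + left + up - upleft;
        }
    }
}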

Subtracting two images using NEON

I'm trying to subtract two images (greyscale) using NEON intrinsics as an exercise, and I don't know the best way to subtract two vectors using the C intrinsics.
void subtractTwoImagesNeonOnePass( uint8_t *src, uint8_t*dest, uint8_t*result, int srcWidth)
{
for (int i = 0; i<srcWidth; i++)
{
// load 8 pixels
uint8x8x3_t srcPixels = vld3_u8 (src);
uint8x8x3_t dstPixels = vld3_u8 (src);
// subtract them
uint8x8x3_t subPixels = vsub_u8(srcPixels, dstPixels);
// store the result
vst1_u8 (result, subPixels);
// move 8 pixels
src+=8;
dest+=8;
result+=8;
}
}
It looks like you're using the wrong kind of loads and stores. Did you copy this from a three-channel example? I think this is what you need:
#include <stdint.h>
#include <arm_neon.h>
void subtractTwoImagesNeon( uint8_t *src, uint8_t*dst, uint8_t*result, int srcWidth, int srcHeight)
{
for (int i = 0; i<(srcWidth/8); i++)
{
// load 8 pixels
uint8x8_t srcPixels = vld1_u8(src);
uint8x8_t dstPixels = vld1_u8(dst);
// subtract them
uint8x8_t subPixels = vsub_u8(srcPixels, dstPixels);
// store the result
vst1_u8 (result, subPixels);
// move 8 pixels
src+=8;
dst+=8;
result+=8;
}
}
You should also check that srcWidth is a multiple of 8, and you'd need to process all the lines of the image: as written, your code only handles the first line (maybe you know this and just cut down the example for simplicity). A sketch covering both points follows.
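Here is a sketch that handles a whole image and a width that is not a multiple of 8, under the assumption that rows are stored contiguously with no padding between them (subtractTwoImagesNeonFull is an illustrative name):
#include <stdint.h>
#include <arm_neon.h>
void subtractTwoImagesNeonFull(const uint8_t *src, const uint8_t *dst,
                               uint8_t *result, int width, int height)
{
    int total = width * height; // rows assumed contiguous, so treat the image as one long vector
    int i = 0;
    // bulk of the data: 8 pixels at a time
    for (; i + 8 <= total; i += 8)
    {
        uint8x8_t srcPixels = vld1_u8(src + i);
        uint8x8_t dstPixels = vld1_u8(dst + i);
        vst1_u8(result + i, vsub_u8(srcPixels, dstPixels));
    }
    // scalar tail for the remaining pixels
    for (; i < total; i++)
        result[i] = (uint8_t)(src[i] - dst[i]);
}
If the image has row padding (a stride larger than the width), you would instead run the same loop per row over width pixels.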

How to apply line segment detector (LSD) on a video , frame by frame?

#include <stdio.h>
#include "lsd.h"
#include "cv.h"
#include "highgui.h"
int main()
{
image_double image;
ntuple_list out;
unsigned int xsize,ysize,depth;
int x,y,i,j,width,height,step;
uchar *p;
IplImage* img = 0;
IplImage* dst = 0;
img = cvLoadImage("D:\\Ahram.jpg",CV_LOAD_IMAGE_COLOR);
width = img->width;
height = img->height;
dst=cvCreateImage(cvSize(width,height),IPL_DEPTH_8U,1);
cvCvtColor(img,dst,CV_RGB2GRAY);
width=dst->width;
height=dst->height;
step=dst->widthStep;
p=(uchar*)dst->imageData;
image=new_image_double(dst->width,dst->height);
xsize=dst->width;
for(i=0;i<height;i++)
{
for(j=0;j<width;j++)
{
image->data[i+j*xsize]=p[i*step+j];
}
}
/* call LSD */
out = lsd(dst);
/* print output */
printf("%u line segments found:\n",out->size);
for(i=0;i<out->size;i++)
{
for(j=0;j<out->dim;j++)
printf("%f ",out->values[ i * out->dim + j ]);
printf("\n");
}
/* free memory */
free_image_double(image);
free_ntuple_list(out);
return 0;
}
N.B.: it compiles without errors, but when I run it, LSD gives an internal error: invalid image input.
Start by researching how PGM is structured:
Each PGM image consists of the following:
1. A "magic number" for identifying the file type.
A pgm image's magic number is the two characters "P5".
2. Whitespace (blanks, TABs, CRs, LFs).
3. A width, formatted as ASCII characters in decimal.
4. Whitespace.
5. A height, again in ASCII decimal.
6. Whitespace.
7. The maximum gray value (Maxval), again in ASCII decimal.
Must be less than 65536, and more than zero.
8. A single whitespace character (usually a newline).
9. A raster of Height rows, in order from top to bottom.
Each row consists of Width gray values, in order from left to right.
Each gray value is a number from 0 through Maxval, with 0 being black
and Maxval being white. Each gray value is represented in pure binary
by either 1 or 2 bytes. If the Maxval is less than 256, it is 1 byte.
Otherwise, it is 2 bytes. The most significant byte is first.
For PGM type P2 the pixels are stored as human-readable ASCII text in the file, but for P5 they are stored in binary format.
One important thing you should know is that this format has only one channel per pixel, which means PGM can only store greyscale images. Remember this!
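For illustration, a minimal sketch that writes an 8-bit greyscale buffer as a binary (P5) PGM file following the structure above (write_pgm is an illustrative name):
#include <stdio.h>
int write_pgm(const char *path, const unsigned char *pixels, int width, int height)
{
    FILE *f = fopen(path, "wb");
    if (!f) return 0;
    // header: magic number, whitespace-separated dimensions, maximum grey value
    fprintf(f, "P5\n%d %d\n255\n", width, height);
    // raster: height rows of width 1-byte grey values, top to bottom
    fwrite(pixels, 1, (size_t)width * height, f);
    fclose(f);
    return 1;
}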
Now, if you're using OpenCV to load images from a file, you should load them using CV_LOAD_IMAGE_GRAYSCALE:
IplImage* cv_img = cvLoadImage("chairs.png", CV_LOAD_IMAGE_GRAYSCALE);
if(!cv_img)
{
std::cout << "ERROR: cvLoadImage failed" << std::endl;
return -1;
}
But if you use any other flag on this function or if you create an image with cvCreateImage(), or if you're capturing frames from a camera or something like that, you'll need to convert each frame to its grayscale representation using cvCvtColor().
I downloaded lsd-1.5 and noticed that there is an example there that shows how to use the library. One of the source code files, named lsd_cmd.c, manually reads a PGM file and assembles an image_double with it. The function that does this trick is read_pgm_image_double(), and it reads the pixels from a PGM file and stores them inside image->data. This is important because if the following does not work, you'll have to iterate on the pixels of IplImage and do this yourself.
After successfully loading a gray scaled image into IplImage* cv_img, you can try to create the structure you need with:
image_double image = new_image_double(cv_img->width, cv_img->height);
image->data = (double *) cv_img->imageData;
In case this doesn't work, you'll need to check the file I suggested above and iterate through the pixels of cv_img->imageData and copy them one by one (doing the proper type conversion) to image->data.
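For instance, a minimal sketch of that per-pixel copy, assuming cv_img is 8-bit and single-channel (image_double stores pixels row-wise in a flat double array of xsize*ysize elements):
for (int y = 0; y < cv_img->height; y++)
    for (int x = 0; x < cv_img->width; x++)
        // convert each 8-bit pixel to double, honoring the IplImage row stride
        image->data[x + y * image->xsize] =
            (double)(unsigned char) cv_img->imageData[x + y * cv_img->widthStep];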
At the end, don't forget to free this resource when you're done using it:
free_image_double(image);
This question helped me some time ago. You've probably solved it already, so sorry for the delay, but I'm sharing the answer now.
I'm using LSD 1.6, and its interface is a little different from the one you are using (the lsd function interface changed from 1.5 to 1.6).
int i, j;
CvCapture* capture;
capture = cvCreateCameraCapture (0);
assert( capture != NULL );
//get capture properties
int width = cvGetCaptureProperty(capture, CV_CAP_PROP_FRAME_WIDTH);
int height = cvGetCaptureProperty(capture, CV_CAP_PROP_FRAME_HEIGHT);
//create OpenCV image structs
IplImage *frame;
IplImage *frameBW = cvCreateImage( cvSize( width, height ), IPL_DEPTH_8U, 1 );
//create LSD image type
double *image;
image = (double *) malloc( width * height * sizeof(double) );
while (1) {
frame = cvQueryFrame( capture );
if( !frame ) break;
//convert to grayscale
cvCvtColor( frame , frameBW, CV_RGB2GRAY);
//cast into LSD image type
uchar *data = (uchar *)frameBW->imageData;
for (i=0;i<width;i++){
for(j=0;j<height;j++){
image[ i + j * width ] = data[ i + j * width];
}
}
//run LSD
double *list;
int n;
list = lsd( &n, image, width, height );
//DO PROCESSING DRAWING ETC
//draw segments on frame
for (int j=0; j<n ; j++){
//define segment end-points
CvPoint pt1 = cvPoint(list[ 0 + j * 7 ],list[ 1 + j * 7 ]);
CvPoint pt2 = cvPoint(list[ 2 + j * 7 ],list[ 3 + j * 7 ]);
// draw line segment on frame
cvLine(frame,pt1,pt2,CV_RGB(255,0,0),1.5,8,0);
}
cvShowImage("FRAME WITH LSD",frame);
//free memory
free( (void *) list );
char c = cvWaitKey(1);
if( c == 27 ) break; // ESC QUITS
}
//free memory
free( (void *) image );
cvReleaseImage( &frame );
cvReleaseImage( &frameBW );
cvDestroyWindow( "FRAME WITH LSD");
Hope this helps you or someone in the future! LSD works really great.

How to read back a CUDA Texture for testing?

OK, so far I can create an array on the host computer (of type float), copy it to the GPU, and then bring it back to the host as another array (to test whether the copy was successful by comparing it to the original).
I then create a CUDA array from the array on the GPU. Then I bind that array to a CUDA texture.
I now want to read that texture back and compare with the original array (again to test that it copied correctly). I saw some sample code that uses the readTexel() function shown below. It doesn't seem to work for me... (basically everything works except for the section in the bindToTexture(float* deviceArray) function starting at the readTexels(SIZE, testArrayDevice) line).
Any suggestions of a different way to do this? Or are there some obvious problems I missed in my code?
Thanks for the help guys!
#include <stdio.h>
#include <assert.h>
#include <cuda.h>
#define SIZE 20
//Create a channel description to use.
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
//Create a texture to use.
texture<float, 2, cudaReadModeElementType> cudaTexture;
//cudaTexture.filterMode = cudaFilterModeLinear;
//cudaTexture.normalized = false;
__global__ void readTexels(int amount, float *Array)
{
int index = blockIdx.x * blockDim.x + threadIdx.x;
if (index < amount)
{
float x = tex1D(cudaTexture, float(index));
Array[index] = x;
}
}
float* copyToGPU(float* hostArray, int size)
{
//Create pointers, one for the array to be on the device, and one for bringing it back to the host for testing.
float* deviceArray;
float* testArray;
//Allocate some memory for the two arrays so they don't get overwritten.
testArray = (float *)malloc(sizeof(float)*size);
//Allocate some memory for the array to be put onto the GPU device.
cudaMalloc((void **)&deviceArray, sizeof(float)*size);
//Actually copy the array from hostArray to deviceArray.
cudaMemcpy(deviceArray, hostArray, sizeof(float)*size, cudaMemcpyHostToDevice);
//Copy the deviceArray back to testArray in host memory for testing.
cudaMemcpy(testArray, deviceArray, sizeof(float)*size, cudaMemcpyDeviceToHost);
//Make sure contents of testArray match the original contents in hostArray.
for (int i = 0; i < size; i++)
{
if (hostArray[i] != testArray[i])
{
printf("Location [%d] does not match in hostArray and testArray.\n", i);
}
}
//Don't forget free these arrays after you're done!
free(testArray);
return deviceArray; //TODO: FREE THE DEVICE ARRAY VIA cudaFree(deviceArray);
}
cudaArray* bindToTexture(float* deviceArray)
{
//Create a CUDA array to translate deviceArray into.
cudaArray* cuArray;
//Allocate memory for the CUDA array.
cudaMallocArray(&cuArray, &cudaTexture.channelDesc, SIZE, 1);
//Copy the deviceArray into the CUDA array.
cudaMemcpyToArray(cuArray, 0, 0, deviceArray, sizeof(float)*SIZE, cudaMemcpyHostToDevice);
//Release the deviceArray
cudaFree(deviceArray);
//Bind the CUDA array to the texture.
cudaBindTextureToArray(cudaTexture, cuArray);
//Make a test array on the device and on the host to verify that the texture has been saved correctly.
float* testArrayDevice;
float* testArrayHost;
//Allocate memory for the two test arrays.
cudaMalloc((void **)&testArrayDevice, sizeof(float)*SIZE);
testArrayHost = (float *)malloc(sizeof(float)*SIZE);
//Read the texels of the texture to the test array in the device.
readTexels(SIZE, testArrayDevice);
//Copy the device test array to the host test array.
cudaMemcpy(testArrayHost, testArrayDevice, sizeof(float)*SIZE, cudaMemcpyDeviceToHost);
//Print contents of the array out.
for (int i = 0; i < SIZE; i++)
{
printf("%f\n", testArrayHost[i]);
}
//Free the memory for the test arrays.
free(testArrayHost);
cudaFree(testArrayDevice);
return cuArray; //TODO: UNBIND THE CUDA TEXTURE VIA cudaUnbindTexture(cudaTexture);
//TODO: FREE THE CUDA ARRAY VIA cudaFree(cuArray);
}
int main(void)
{
float* hostArray;
hostArray = (float *)malloc(sizeof(float)*SIZE);
for (int i = 0; i < SIZE; i++)
{
hostArray[i] = 10.f + i;
}
float* deviceAddy = copyToGPU(hostArray, SIZE);
free(hostArray);
return 0;
}
Briefly:
------------- in your main.cu ----------------------------------------------------
1. Define the texture as a global variable:
texture<uint, 2, cudaReadModeElementType> refTex; // global variable!
// meaning: address the texture with (x,y) (2D) and get an unsigned int
In the main function:
2. Use a CUDA array combined with the texture:
cudaArray* myArray; // declaration
// ask for memory
cudaMallocArray ( &myArray,
&refTex.channelDesc, /* with this you don't need to fill a channel descriptor */
width,
height);
3. Copy data from CPU to GPU (to the array):
cudaMemcpyToArray ( myArray, // destination: the array
0, 0, // offsets
sourceData, // pointer uint*
width*height*sizeof(uint), // total amount of bytes to be copied
cudaMemcpyHostToDevice);
4. Bind the texture and the array:
cudaBindTextureToArray( refTex, myArray );
5. Change some parameters in the texture:
refTex.normalized = false; // don't automatically convert fetched data to [0,1)
refTex.addressMode[0] = cudaAddressModeClamp; // if the indexing is out of bounds, clamp to a valid index (0 for a negative index, the last element for an index that is too large)
refTex.addressMode[1] = cudaAddressModeClamp;
---------- in the kernel --------------------------------------------------------
// find out the indexes (f,c) this thread should process
uint f = (blockIdx.x * blockDim.x) + threadIdx.x;
uint c = (blockIdx.y * blockDim.y) + threadIdx.y;
// this is curious and necessary: indexes for reading from a texture
// are floats! Even if you are certain to access (4,5), you have to
// address the "center" of the texel, which is (4.5, 5.5)
uint read = tex2D( refTex, c+0.5f, f+0.5f); // refTex is a global variable
You then process "read" and write the results to another zone of device global
memory, not to the texture itself!
readTexels() is a kernel (__global__) function, i.e. it runs on the GPU. Therefore you need to use the correct syntax to launch a kernel.
Take a look through the CUDA Programming Guide and some of the SDK samples, both available via the NVIDIA CUDA site to see how to launch a kernel.
Hint: It'll end up something like readTexels<<<grid,block>>>(...)
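For instance, a minimal sketch of such a launch for the readTexels() call in bindToTexture() (the block size is an arbitrary choice):
int threadsPerBlock = 128;
int blocks = (SIZE + threadsPerBlock - 1) / threadsPerBlock; // enough blocks to cover SIZE elements
readTexels<<<blocks, threadsPerBlock>>>(SIZE, testArrayDevice);
cudaDeviceSynchronize(); // make sure the kernel has finished before copying the results back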

Pruning short line segments from edge detector output?

I am looking for an algorithm to prune short line segments from the output of an edge detector. As can be seen in the image (and link) below, there are several small edges detected that aren't "long" lines. Ideally I'd like just the 4 sides of the quadrangle to show up after processing, but if there are a couple of stray lines, it won't be a big deal... Any suggestions?
Image Link
Before finding the edges, pre-process the image with an open or a close operation (or both): erode followed by dilate, or dilate followed by erode. This should remove the smaller objects but leave the larger ones roughly the same.
I've looked for online examples, and the best I could find was on page 41 of this PDF.
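As a minimal sketch of the open operation with the OpenCV C++ API (assuming edges is an 8-bit binary edge image; the kernel size is something to determine from testing):
#include <opencv2/opencv.hpp>
// morphological opening (erode then dilate) removes components smaller
// than the structuring element while leaving larger ones roughly intact
cv::Mat pruneSmallEdges(const cv::Mat& edges)
{
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::Mat opened;
    cv::morphologyEx(edges, opened, cv::MORPH_OPEN, kernel);
    return opened;
}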
I doubt that this can be done with a simple local operation. Look at the rectangle you want to keep: there are several gaps, so performing a local operation to remove short line segments would probably heavily reduce the quality of the desired output.
Consequently, I would try to detect the rectangle as important content by closing the gaps, fitting a polygon, or something like that, and then in a second step discard the remaining unimportant content. Maybe the Hough transform could help.
UPDATE
I just ran this sample application, which uses a Kernel Hough Transform, on your sample image and got four nice lines fitting your rectangle.
In case somebody stumbles upon this thread: OpenCV 2.x ships an example named squares.cpp that basically nails this task.
I made a slight modification to the application to improve the detection of the quadrangle.
Code:
#include "highgui.h"
#include "cv.h"
#include <iostream>
#include <math.h>
#include <string.h>
using namespace cv;
using namespace std;
void help()
{
cout <<
"\nA program using pyramid scaling, Canny, contours, contour simpification and\n"
"memory storage (it's got it all folks) to find\n"
"squares in a list of images pic1-6.png\n"
"Returns sequence of squares detected on the image.\n"
"the sequence is stored in the specified memory storage\n"
"Call:\n"
"./squares\n"
"Using OpenCV version %s\n" << CV_VERSION << "\n" << endl;
}
int thresh = 70, N = 2;
const char* wndname = "Square Detection Demonized";
// helper function:
// finds a cosine of angle between vectors
// from pt0->pt1 and from pt0->pt2
double angle( Point pt1, Point pt2, Point pt0 )
{
double dx1 = pt1.x - pt0.x;
double dy1 = pt1.y - pt0.y;
double dx2 = pt2.x - pt0.x;
double dy2 = pt2.y - pt0.y;
return (dx1*dx2 + dy1*dy2)/sqrt((dx1*dx1 + dy1*dy1)*(dx2*dx2 + dy2*dy2) + 1e-10);
}
// returns sequence of squares detected on the image.
// the sequence is stored in the specified memory storage
void findSquares( const Mat& image, vector<vector<Point> >& squares )
{
squares.clear();
Mat pyr, timg, gray0(image.size(), CV_8U), gray;
// karlphillip: dilate the image so this technique can detect the white square,
Mat out(image);
dilate(out, out, Mat(), Point(-1,-1));
// then blur it so that the ocean/sea become one big segment to avoid detecting them as 2 big squares.
medianBlur(out, out, 3);
// down-scale and upscale the image to filter out the noise
pyrDown(out, pyr, Size(out.cols/2, out.rows/2));
pyrUp(pyr, timg, out.size());
vector<vector<Point> > contours;
// find squares only in the first color plane
for( int c = 0; c < 1; c++ ) // was: c < 3
{
int ch[] = {c, 0};
mixChannels(&timg, 1, &gray0, 1, ch, 1);
// try several threshold levels
for( int l = 0; l < N; l++ )
{
// hack: use Canny instead of zero threshold level.
// Canny helps to catch squares with gradient shading
if( l == 0 )
{
// apply Canny. Take the upper threshold from slider
// and set the lower to 0 (which forces edges merging)
Canny(gray0, gray, 0, thresh, 5);
// dilate canny output to remove potential
// holes between edge segments
dilate(gray, gray, Mat(), Point(-1,-1));
}
else
{
// apply threshold if l!=0:
// tgray(x,y) = gray(x,y) < (l+1)*255/N ? 255 : 0
gray = gray0 >= (l+1)*255/N;
}
// find contours and store them all as a list
findContours(gray, contours, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);
vector<Point> approx;
// test each contour
for( size_t i = 0; i < contours.size(); i++ )
{
// approximate contour with accuracy proportional
// to the contour perimeter
approxPolyDP(Mat(contours[i]), approx, arcLength(Mat(contours[i]), true)*0.02, true);
// square contours should have 4 vertices after approximation
// relatively large area (to filter out noisy contours)
// and be convex.
// Note: absolute value of an area is used because
// area may be positive or negative - in accordance with the
// contour orientation
if( approx.size() == 4 &&
fabs(contourArea(Mat(approx))) > 1000 &&
isContourConvex(Mat(approx)) )
{
double maxCosine = 0;
for( int j = 2; j < 5; j++ )
{
// find the maximum cosine of the angle between joint edges
double cosine = fabs(angle(approx[j%4], approx[j-2], approx[j-1]));
maxCosine = MAX(maxCosine, cosine);
}
// if cosines of all angles are small
// (all angles are ~90 degrees) then write quadrangle
// vertices to resultant sequence
if( maxCosine < 0.3 )
squares.push_back(approx);
}
}
}
}
}
// the function draws all the squares in the image
void drawSquares( Mat& image, const vector<vector<Point> >& squares )
{
for( size_t i = 1; i < squares.size(); i++ )
{
const Point* p = &squares[i][0];
int n = (int)squares[i].size();
polylines(image, &p, &n, 1, true, Scalar(0,255,0), 3, CV_AA);
}
imshow(wndname, image);
}
int main(int argc, char** argv)
{
if (argc < 2)
{
cout << "Usage: ./program <file>" << endl;
return -1;
}
static const char* names[] = { argv[1], 0 };
help();
namedWindow( wndname, 1 );
vector<vector<Point> > squares;
for( int i = 0; names[i] != 0; i++ )
{
Mat image = imread(names[i], 1);
if( image.empty() )
{
cout << "Couldn't load " << names[i] << endl;
continue;
}
findSquares(image, squares);
drawSquares(image, squares);
imwrite("out.jpg", image);
int c = waitKey();
if( (char)c == 27 )
break;
}
return 0;
}
The Hough transform can be a very expensive operation.
An alternative that may work well in your case is the following:
Run two mathematical morphology "close" operations (http://homepages.inf.ed.ac.uk/rbf/HIPR2/close.htm), with a horizontal and a vertical line structuring element respectively (of a length determined from testing). The point of this is to close all the gaps in the large rectangle.
Run connected component analysis. If you have done the morphology effectively, the large rectangle will come out as one connected component. It then only remains to iterate through all the connected components and pick out the most likely candidate, which should be the large rectangle.
Perhaps find the connected components, then remove components with fewer than X pixels (empirically determined), followed by dilation along horizontal/vertical lines to reconnect the gaps within the rectangle. A sketch of the component filtering follows.
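A minimal sketch of that idea with the OpenCV C++ API (connectedComponentsWithStats is available in OpenCV 3 and later; minArea is the empirical X above, and keepLargeComponents is an illustrative name):
#include <opencv2/opencv.hpp>
cv::Mat keepLargeComponents(const cv::Mat& bin, int minArea)
{
    cv::Mat labels, stats, centroids;
    int n = cv::connectedComponentsWithStats(bin, labels, stats, centroids);
    cv::Mat out = cv::Mat::zeros(bin.size(), CV_8U);
    for (int i = 1; i < n; i++) // label 0 is the background
        if (stats.at<int>(i, cv::CC_STAT_AREA) >= minArea)
            out.setTo(255, labels == i); // keep only sufficiently large components
    return out;
}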
It's possible to follow two main techniques:
Vector-based operation: map your pixel islands into clusters (blobs, Voronoi zones, whatever). Then apply some heuristics to rectify the segments, like the Teh-Chin chain approximation algorithm, and do your pruning on vectorial elements (start point, end point, length, orientation and so on).
Set-based operation: cluster your data (as above). For every cluster, compute the principal components and distinguish lines from circles or any other shape by looking for clusters with only one significant eigenvalue (or two, if you look for "fat" segments that could resemble ellipses). Check the eigenvectors associated with the eigenvalues for information about the orientation of the blobs, and make your choice. (A small sketch of the eigenvalue test follows.)
Both ways can easily be explored with OpenCV (the former, indeed, falls under the "contour analysis" category of algorithms).
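As a small sketch of the eigenvalue test with the OpenCV C++ API (pts holds one cluster's pixel coordinates as rows of an Nx2 CV_32F matrix; looksLikeLine and the 0.05 ratio are illustrative assumptions):
#include <opencv2/opencv.hpp>
bool looksLikeLine(const cv::Mat& pts)
{
    cv::PCA pca(pts, cv::Mat(), cv::PCA::DATA_AS_ROW);
    float e0 = pca.eigenvalues.at<float>(0); // dominant eigenvalue
    float e1 = pca.eigenvalues.at<float>(1); // secondary eigenvalue
    // a line-like cluster has one significant eigenvalue;
    // pca.eigenvectors row 0 gives the blob's orientation
    return e1 < 0.05f * e0;
}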
Here is a simple morphological filtering solution following the lines of #Tom10:
Solution in matlab:
se1 = strel('line',5,180); % linear horizontal structuring element
se2 = strel('line',5,90); % linear vertical structuring element
I = rgb2gray(imread('test.jpg'))>80; % threshold (since i had a grayscale version of the image)
Idil = imdilate(imdilate(I,se1),se2); % dilate contours so that they connect
Idil_area = bwareaopen(Idil,1200); % area filter them to remove the small components
The idea is basically to connect the horizontal contours into one large component and then filter with an area-opening filter to obtain the rectangle.
Results:
