Convert image document to black and white with OpenCV - C

I'm new to OpenCV and image processing and I'm not sure how to solve my problem.
I have a photo of a document taken with an iPhone and I want to convert the document to black and white. I tried to use thresholding, but the text did not come out well (a little blurry and unreadable). I'd like the text to look the same as in the original image, only black, with the background white. What can I do?
P.S. When I take a photo of only a part of the document, where the text is quite big, the result is fine.
I will be grateful for any help.
Here is the example image I use and the result:

My attempt, maybe a little more readable than yours:
IplImage* pRGBImg = cvLoadImage(input_file.c_str(), CV_LOAD_IMAGE_UNCHANGED);
if (!pRGBImg)
{
    std::cout << "ERROR: Failed to load input image" << std::endl;
    return -1;
}
// Allocate the grayscale image
IplImage* pGrayImg = cvCreateImage(cvSize(pRGBImg->width, pRGBImg->height), pRGBImg->depth, 1);
// Convert it to grayscale (cvLoadImage returns BGR channel order)
cvCvtColor(pRGBImg, pGrayImg, CV_BGR2GRAY);
// Dilate (the iterations argument must be an integer; one pass is enough here)
cvDilate(pGrayImg, pGrayImg, NULL, 1);
// Binarize; with CV_THRESH_OTSU the threshold value (30) is ignored and computed automatically
cvThreshold(pGrayImg, pGrayImg, 30, 255, CV_THRESH_BINARY | CV_THRESH_OTSU);
// Slight blur to soften the binarized edges
cvSmooth(pGrayImg, pGrayImg, CV_BLUR, 2, 2);
cvSaveImage("out.png", pGrayImg);

Thresholding is used for a different purpose.
If you just want to convert the image to black and white, simply do this (using OpenCV 2.2):
cv::Mat image_name = cv::imread("fileName", 0);
The second parameter 0 tells imread to load the color image as a grayscale image.
And if you want to save it as a grayscale image file, use this:
cv::Mat image_name = cv::imread("fileName", 0);
cv::imwrite("bw_filename.jpg", image_name);

Using Adaptive Gaussian Thresholding is a good idea here. This will also enhance the quality of text written in the image.
You can do that with cv::adaptiveThreshold, whose parameters follow this pattern:
adaptiveThreshold(src_Mat, dst_Mat, Max_value, Adaptive_Thresholding_Method, Thresholding_type, blocksize, C);
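For reference, here is a minimal sketch of how that might look with the C++ interface; the file names are placeholders, and the block size (which must be odd) and the constant C usually need tuning for the scan resolution:
#include <opencv2/opencv.hpp>

int main()
{
    // Load the document photo directly as grayscale
    cv::Mat gray = cv::imread("document.jpg", 0);
    if (gray.empty())
        return -1;

    cv::Mat bw;
    cv::adaptiveThreshold(gray, bw, 255,
                          cv::ADAPTIVE_THRESH_GAUSSIAN_C,  // local Gaussian-weighted mean
                          cv::THRESH_BINARY,
                          11,    // size of the neighborhood used for the local threshold
                          10);   // constant subtracted from the local mean
    cv::imwrite("document_bw.png", bw);
    return 0;
}
Because the threshold is computed per neighborhood rather than globally, uneven lighting across the photographed page affects the result much less than a single global threshold would.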

Related

OpenCV canny; output image is pure gray

I am learning OpenCV, reading a book and following its examples. The book introduced the Canny filter. However, there is some problem with my output. As an input image I have given a 512x512 grayscale image, but the filter output is a pure gray image. Here are the images:
This is the input image.
And this is the output image.
And here is the snippet:
#include <opencv\cv.h>
#include <opencv2\highgui\highgui.hpp>
#include "Resources.h"
IplImage* doCanny(
    IplImage* in,
    double lowThresh,
    double highThresh,
    double aperture
) {
    if (in->nChannels != 1)
    {
        return 0; // Canny only handles gray scale images.
    }
    IplImage* out = cvCreateImage(
        cvGetSize(in),
        IPL_DEPTH_8U,
        1
    );
    cvCanny(in, out, lowThresh, highThresh, aperture);
    return out;
}
int main(int argc, char** argv)
{
    IplImage* image = cvLoadImage(IMAGE_FRUIT);
    IplImage* output = doCanny(image, 200, 201, 1);
    cvNamedWindow("Canny", CV_WINDOW_AUTOSIZE);
    cvShowImage("Canny", output);
    cvWaitKey(0);
    cvReleaseImage(&output);
    cvDestroyWindow("Canny");
    return 0;
}
Visual Studio 2015, OpenCV version 2.4.13
I think if you step through your code, you will realize the cvCanny function never gets triggered; the output returned from doCanny is a null pointer.
OpenCV's Canny edge detection algorithm only accepts grayscale images, which is why the original code has the "if (in->nChannels != 1)" check, so you need to convert your input image into a grayscale image first.
// Convert to grayscale first
IplImage* gray_image = cvCreateImage(cvGetSize(image), IPL_DEPTH_8U, 1);
cvCvtColor(image, gray_image, CV_BGR2GRAY);
// Perform Canny
IplImage* output = doCanny(gray_image, 200, 201, 3);
Additionally, I think your "aperture" parameter for cvCanny is also invalid; try the default value 3 (or 5, 7), and you should be able to see the result.
I would also recommend using the C++ interface instead of the deprecated C interface.
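As a rough sketch, the same pipeline with the C++ interface could look like this (the file name stands in for IMAGE_FRUIT):
#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat image = cv::imread("fruit.jpg");
    if (image.empty())
        return -1;

    cv::Mat gray, edges;
    cv::cvtColor(image, gray, CV_BGR2GRAY);   // Canny expects a single-channel image
    cv::Canny(gray, edges, 200, 201, 3);      // low threshold, high threshold, aperture size 3

    cv::namedWindow("Canny", CV_WINDOW_AUTOSIZE);
    cv::imshow("Canny", edges);
    cv::waitKey(0);
    return 0;
}
cv::Mat manages its memory automatically, so there is no cvReleaseImage call to forget.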

Font layouting & rendering with cairo and freetype

I have a system that has only the freetype2 and cairo libraries available. What I want to achieve is:
getting the glyphs for a UTF-8 text
laying out the text, storing position information (by myself)
getting cairo paths for each glyph for rendering
Unfortunately the documentation doesn't really explain how this should be done, as it expects one to use a higher-level library like Pango.
What I think could be right is: create a scaled font with cairo_scaled_font_create and then retrieve the glyphs for the text using cairo_scaled_font_text_to_glyphs. cairo_glyph_extents then gives the extents for each glyph. But how can I then get things like kerning and the advance? Also, how can I then get paths for each glyph?
Are there some more resources on this topic? Are these functions the expected way to go?
Okay, so I found what's needed.
You first need to create a cairo_scaled_font_t, which represents a font at a specific size. To do so, one can simply use cairo_get_scaled_font after setting a font; it creates a scaled font for the current settings in the context.
Next, you convert the input text using cairo_scaled_font_text_to_glyphs; this gives an array of glyphs and also clusters as output. The cluster mappings describe which part of the UTF-8 string belongs to the corresponding glyphs in the glyph array.
To get the extents of glyphs, cairo_scaled_font_glyph_extents is used. It gives dimensions, advances and bearings of each glyph/set of glyphs.
Finally, the paths for glyphs can be put in the context using cairo_glyph_path. These paths can then be drawn as wished.
The following example converts an input string to glyphs, retrieves their extents and renders them:
const char* text = "Hello world";
int fontSize = 14;
cairo_font_face_t* fontFace = ...;
// get the scaled font object
cairo_set_font_face(cr, fontFace);
cairo_set_font_size(cr, fontSize);
auto scaled_face = cairo_get_scaled_font(cr);
// get glyphs for the text
cairo_glyph_t* glyphs = NULL;
int glyph_count;
cairo_text_cluster_t* clusters = NULL;
int cluster_count;
cairo_text_cluster_flags_t clusterflags;
auto stat = cairo_scaled_font_text_to_glyphs(scaled_face, 0, 0, text, strlen(text),
    &glyphs, &glyph_count, &clusters, &cluster_count, &clusterflags);
// check if conversion was successful
if (stat == CAIRO_STATUS_SUCCESS) {
    // text paints on bottom line
    cairo_translate(cr, 0, fontSize);
    // draw each cluster
    int glyph_index = 0;
    int byte_index = 0;
    for (int i = 0; i < cluster_count; i++) {
        cairo_text_cluster_t* cluster = &clusters[i];
        cairo_glyph_t* clusterglyphs = &glyphs[glyph_index];
        // get extents for the glyphs in the cluster
        cairo_text_extents_t extents;
        cairo_scaled_font_glyph_extents(scaled_face, clusterglyphs, cluster->num_glyphs, &extents);
        // ... for later use
        // put paths for current cluster to context
        cairo_glyph_path(cr, clusterglyphs, cluster->num_glyphs);
        // draw black text with green stroke
        cairo_set_source_rgba(cr, 0.2, 0.2, 0.2, 1.0);
        cairo_fill_preserve(cr);
        cairo_set_source_rgba(cr, 0, 1, 0, 1.0);
        cairo_set_line_width(cr, 0.5);
        cairo_stroke(cr);
        // advance glyph/byte position
        glyph_index += cluster->num_glyphs;
        byte_index += cluster->num_bytes;
    }
}
Those functions seem to be the best way, considering Cairo's text system. It just shows even more that Cairo isn't really meant for text. It won't be able to do kerning or paths really. Pango, I believe, would have its own complex code for doing those things.
For best advancement of Ghost, I would recommend porting Pango, since you (or someone else) will probably eventually want it anyway.

Why is pango_layout_get_pixel_size slightly wrong on Ubuntu Linux in C

I'm experiencing a frustrating issue trying to draw text using Pango and Cairo libraries in C in a Gtk application running on Ubuntu Linux.
I'm creating a Pango layout and then drawing it at a given location which is determined by the size of the text as reported by pango_layout_get_pixel_size, but the size returned by that function is wrong in both width and height, especially in height. Here is my full code:
// Create a cairo context with which to draw
// Note that we already have a GtkDrawingArea widget m_pGtkDrawingArea
cairo_t *cr = gdk_cairo_create(m_pGtkDrawingArea->window);
// Text to draw
std::string szText("NO DATA AVAILABLE");
// Create the layout
PangoLayout *pLayout = gtk_widget_create_pango_layout(m_pGtkDrawingArea, szText.c_str());
// Set layout properties
pango_layout_set_alignment(pLayout, PANGO_ALIGN_LEFT);
pango_layout_set_width(pLayout, -1);
// The family to use
std::string szFontFamily("FreeSans");
// The font size to use
double dFontSize = 36.0;
// Format the font description string
char szFontDescription[32];
memset(&(szFontDescription[0]), 0, sizeof(szFontDescription));
snprintf(szFontDescription, sizeof(szFontDescription) - 1, "%s %.1f", szFontFamily.c_str(), dFontSize);
// Get a new pango font description
PangoFontDescription *pFontDescription = pango_font_description_from_string(szFontDescription);
// Set up the pango font description
pango_font_description_set_weight(pFontDescription, PANGO_WEIGHT_NORMAL);
pango_font_description_set_style(pFontDescription, PANGO_STYLE_NORMAL);
pango_font_description_set_variant(pFontDescription, PANGO_VARIANT_NORMAL);
pango_font_description_set_stretch(pFontDescription, PANGO_STRETCH_NORMAL);
// Set this as the pango font description on the layout
pango_layout_set_font_description(pLayout, pFontDescription);
// Use auto direction
pango_layout_set_auto_dir(pLayout, TRUE);
// Get the pixel size of this text - this reports a size of 481x54 pixels
int iPixelWidth = 0, iPixelHeight = 0;
pango_layout_get_pixel_size(pLayout, &iPixelWidth, &iPixelHeight);
// Calculate the text location based on iPixelWidth and iPixelHeight
double dTextLocX = ...;
double dTextLocY = ...;
// Set up the cairo context for drawing the text
cairo_set_source_rgba(cr, 1.0, 1.0, 1.0, 1.0);
cairo_set_antialias(cr, CAIRO_ANTIALIAS_BEST);
// Move into place
cairo_move_to(cr, dTextLocX, dTextLocY);
// Draw the layout
pango_cairo_show_layout(cr, pLayout);
//
// pango_layout_get_pixel_size() reported a size of 481x54 pixels,
// but the actual size when drawn is 478x37 pixels!
//
//
// Clean up...
//
So, as described at the bottom of the above code, the pango_layout_get_pixel_size function reports a size of 481x54 pixels, but the size of the text on the screen is actually 478x37 pixels.
What am I doing wrong here? How can I get the actual correct pixel size?
Thanks in advance for your help!
The text you are displaying ("NO DATA AVAILABLE") is all-caps, and consequently has no descenders (letters which are partly below the baseline, like j, p and q.) Normally, when you measure the extent of a text box, you include room for the descenders whether or not they are present; otherwise, you will see odd artifacts such as inconsistent line separation depending on whether or not a given line has a descender.
Pango provides APIs which return both the logical extent (which includes the full height of the font) and the ink extent (which is the bounding box of the inked part of the image). I suspect you are looking for the ink extent.
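As a rough sketch of what that looks like in code (pLayout is the layout from the question; the variable names are placeholders):
// Both rectangles are reported in device (pixel) units.
// logical_rect matches what pango_layout_get_pixel_size() returns;
// ink_rect is the bounding box of the pixels actually drawn.
PangoRectangle ink_rect, logical_rect;
pango_layout_get_pixel_extents(pLayout, &ink_rect, &logical_rect);
int iInkWidth = ink_rect.width;
int iInkHeight = ink_rect.height;
The ink extent of an all-caps string reserves no room for descenders, so it should be much closer to the 478x37 pixels you measured on screen.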

openCV get subimage in C

I am using OpenCV for image manipulation in C. Please forgive me if this question is answered in the documentation, but I have found the OpenCV docs to be pretty badly formed and difficult to read.
I have a CvMat* that I have extracted from an image file as below:
CvMat* mat = cvLoadImageM((char*) filename, CV_LOAD_IMAGE_COLOR);
What I need to do is get a subimage of that by cropping out a certain bounded region. A logical command for this might be:
CvMat* subMat = cvGetSubImage(mat, minx, maxx, miny, maxy);
where minx, maxx, miny, and maxy define the boundaries of the cropped region. Is there a built in way to do this easily?
Take a look at http://nashruddin.com/OpenCV_Region_of_Interest_(ROI)/
The tutorial there does the following with a Region of Interest:
cvSetImageROI(img1, cvRect(10, 15, 150, 250));
IplImage *img2 = cvCreateImage(cvGetSize(img1),
img1->depth,
img1->nChannels);
cvCopy(img1, img2, NULL);
cvResetImageROI(img1);
OpenCV has built in capabilities for setting the region which you care about and copying that region out of an image, just as you want to achieve.
If you want a sub-pixel accurate rectangular section of a source image, use cvGetRectSubPix or cv::getRectSubPix (this creates an independent copy of the data; it is not an ROI!).
Example:
cv::Size size(dst_width,dst_height);
cv::Point2f center(src_centerx,src_center_y);
cv::Mat dst;
cv::getRectSubPix(src,size, center,dst,CV_8U);
Generally this is done by cropping an ROI (region of interest). This blog post goes into some detail on cropping:
/* load image */
IplImage *img1 = cvLoadImage("elvita.jpg", 1);
/* sets the Region of Interest
Note that the rectangle area has to be __INSIDE__ the image */
cvSetImageROI(img1, cvRect(10, 15, 150, 250));
/* create destination image
Note that cvGetSize will return the width and the height of ROI */
IplImage *img2 = cvCreateImage(cvGetSize(img1),
img1->depth,
img1->nChannels);
/* copy subimage */
cvCopy(img1, img2, NULL);
/* always reset the Region of Interest */
cvResetImageROI(img1);
To convert between IplImage (legacy OpenCV) and cvMat (OpenCV 2.x), simply use the cvMat constructor or look at this question for more methods.
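If you move to the C++ interface, a minimal sketch of the same crop (minx, miny, maxx and maxy are the bounds from the question) could be:
cv::Mat img = cv::imread(filename, CV_LOAD_IMAGE_COLOR);
// cv::Rect takes (x, y, width, height), so convert the min/max bounds first
cv::Rect roi(minx, miny, maxx - minx, maxy - miny);
// img(roi) is only a header sharing the original data; clone() makes an independent copy
cv::Mat subMat = img(roi).clone();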

OpenCV cvCanny memory exception

I am trying to do the examples in the OpenCV book and I got to the part regarding cvCanny. I am trying to use it, but I keep getting a memory exception error of
Unhandled exception at 0x75d8b760 in Image_Transform.exe: Microsoft C++ exception: cv::Exception at memory location 0x0011e7a4..
I have also looked at another post that was similar to this question, but it did not help; I got the same error each time. Any help is greatly appreciated, and the source code for the function is located below.
void example2_4(IplImage* img)
{
    // Create windows to show input and output images
    cvNamedWindow("Example 2-4 IN", CV_WINDOW_AUTOSIZE);
    cvNamedWindow("Example 2-4 OUT", CV_WINDOW_AUTOSIZE);
    // Display our input image
    cvShowImage("Example 2-4 IN", img);
    // Create an image to hold our modified input image
    IplImage* out = cvCreateImage(cvGetSize(img), IPL_DEPTH_8U, 3);
    // Do some smoothing
    //cvSmooth(img, out, CV_GAUSSIAN, 3, 3);
    // Do some edge detection
    cvCanny(img, out, 10, 20, 3);
    // Show the results
    cvShowImage("Example 2-4 OUT", out);
    // Release the memory used by the transformed image
    cvReleaseImage(&out);
    // Wait for the user to hit a key, then clean up the windows
    cvWaitKey(0);
    cvDestroyWindow("Example 2-4 IN");
    cvDestroyWindow("Example 2-4 OUT");
}
int main()
{
    // Load in an image
    IplImage* img = cvLoadImage("images/00000038.jpg");
    // Run the transform
    example2_4(img);
    // Clean the image from memory
    cvReleaseImage(&img);
    return 0;
}
You forgot to say if you are able to see the original image being displayed on the screen.
I never get tired of telling people that checking the return value of functions is a must!
Consider IplImage* img = cvLoadImage("images/00000038.jpg"); how can you tell whether this call succeeded or not? As far as I can tell, the error you are having might come from a function failing before cvCanny() is even called.
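A minimal check, assuming it sits at the top of main() with <iostream> included, would look something like this:
IplImage* img = cvLoadImage("images/00000038.jpg");
if (!img)
{
    // cvLoadImage returns NULL when the file is missing or unreadable
    std::cerr << "ERROR: could not load images/00000038.jpg" << std::endl;
    return -1;
}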
Anyway, I recently posted a code that uses cvCanny to improve circle detection. You can check that code and see what you are doing differently.
EDIT:
Your problem in this case is that you are passing cvCanny input and output images with 3 channels, when it accepts only single-channel images. Check the docs:
void cvCanny(const CvArr* image, CvArr* edges, double threshold1, double threshold2, int aperture_size=3)
Implements the Canny algorithm for edge detection.
Parameters:
* image – Single-channel input image
* edges – Single-channel image to store the edges found by the function
* threshold1 – The first threshold
* threshold2 – The second threshold
* aperture_size – Aperture parameter for the Sobel operator (see Sobel)
So, change your code to:
// Create an image to hold our modified input image
IplImage* out = cvCreateImage(cvGetSize(img), IPL_DEPTH_8U, 1);
// Do some smoothing
//cvSmooth(img, out, CV_GAUSSIAN, 3, 3);
IplImage* gray = cvCreateImage(cvGetSize(img), IPL_DEPTH_8U, 1);
cvCvtColor(img, gray, CV_BGR2GRAY);
// Do some Edge detection
cvCanny(gray, out, 10, 20, 3);
