I am starting to learn OpenCV and have a question about it.
My target is to recognize captchas.
First I must preprocess the image.
There is an example of the captcha here.
So the problem is how to crop the symbols out of the image and put them into a 2D array (bitmap).
Automatic Partition Detection
The first thing you need to do is create the filter array of background colors: the array containing the colors that occur in the background. For that purpose you can just take a 20x20 area at some fixed offset, or leave it as a user option, depending on your project standpoint.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char Pixel [3];
typedef Pixel *PixelArray;

// Function to return the byte offset of the pixel at x/y coordinates
int bmp_get_offset (int width, int x, int y)
{
    const int channels = 3;                  // RGB
    const int bpp = 8;                       // bits per channel
    const int single = (channels * bpp) / 8; // bytes per pixel
    const int offset = 54;                   // size of the BMP headers
    int rowsize = width * single;
    int pixAddress;
    if(rowsize % 4 != 0) rowsize += 4 - (rowsize % 4); // rows are padded to a multiple of 4 bytes
    pixAddress = offset + y * rowsize + x * single;
    return pixAddress;
}
// Function to return a specific area (pseudo-code)
PixelArray bmp_get_area (FILE * bmp, int x, int y, int w, int h)
{
    PixelArray buffer = buffer_new(bmp); // loads the file into a memory-allocated buffer
    PixelArray area = (PixelArray)malloc(w * h * sizeof(Pixel)); // caller must free()
    const int src_width = *((int*)&((unsigned char*)buffer)[0x12]); // width field of the DIB header
    for(int iHeight = 0; iHeight < h; iHeight++)
        for(int iWidth = 0; iWidth < w; iWidth++)
            memcpy(area[iHeight * w + iWidth],
                   (unsigned char*)buffer + bmp_get_offset(src_width, x + iWidth, y + iHeight),
                   sizeof(Pixel));
    return area;
}
Well, it didn't turn out to be that much of pseudo-code after all.
Now that you have the filter array, you can rule out the background pixels.
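As a sketch of what that filter might look like in code (my illustration, not from the original answer): collect the colors of the sampled area, then test any pixel against the list with a small tolerance:

// Sketch: background filter built from a sampled area of count pixels.
static Pixel filter [20 * 20];
static int filterCount = 0;

void filter_build (PixelArray area, int count)
{
    filterCount = count > 400 ? 400 : count;
    memcpy(filter, area, filterCount * sizeof(Pixel));
}

// Returns 1 when p is close to (within tol per channel) some background color.
int is_background (const unsigned char *p)
{
    const int tol = 16; // assumed tolerance, tune per captcha
    for(int i = 0; i < filterCount; i++) {
        if(abs(p[0] - filter[i][0]) <= tol &&
           abs(p[1] - filter[i][1]) <= tol &&
           abs(p[2] - filter[i][2]) <= tol)
            return 1;
    }
    return 0;
}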
Now what you need is a vertical raster scan, or just a vertical scan, over the entire captcha image. Each pixel of the vertical line is additionally checked against the colors of the already-obtained filter area. If all the pixels of the line (which has the size of the image's height) test positive, i.e. each one is close to or matches a background color, an array index is incremented, so that we know where the last character ends.
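To make the scan concrete, here is a minimal sketch of the column scan (using the is_background() helper sketched above; again my illustration, not the original code):

// Sketch: scan columns left to right; a column whose pixels are all background
// separates two characters. Records [start, end) column spans into starts/ends.
int find_character_columns (PixelArray buffer, int width, int height,
                            int *starts, int *ends)
{
    int count = 0, inChar = 0;
    for(int x = 0; x < width; x++) {
        int allBackground = 1;
        for(int y = 0; y < height; y++) {
            unsigned char *p = (unsigned char*)buffer + bmp_get_offset(width, x, y);
            if(!is_background(p)) { allBackground = 0; break; }
        }
        if(!allBackground && !inChar) { starts[count] = x; inChar = 1; }
        else if(allBackground && inChar) { ends[count++] = x; inChar = 0; }
    }
    if(inChar) ends[count++] = width; // last character touches the right edge
    return count; // number of character spans found
}

Each (starts[i], ends[i]) pair can then be handed to bmp_get_area() to crop one symbol.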
Edit 1
For 3 seconds I GIMPed the color curves of the image, resulting in a plain background:
This simplifies the filtering process quite a lot.
The color-curves magic I did is actually only a brightness/contrast adjustment, which is maybe the easiest color processing you can implement (after the invert).
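For reference, such an adjustment is just a per-channel linear transform, out = contrast * in + brightness, clamped to the 8-bit range. A minimal sketch (names are mine, not from the post):

// Sketch: linear brightness/contrast adjustment of one 8-bit channel value.
unsigned char adjust_channel (unsigned char in, float contrast, int brightness)
{
    int out = (int)(contrast * in) + brightness;
    if(out < 0)   out = 0;   // clamp to the
    if(out > 255) out = 255; // valid 8-bit range
    return (unsigned char)out;
}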
I might edit periodically to clarify some more. This will certainly give you good practice. Real practice.
Doc:
BMP File Format (more than enough information for you to start working with bitmaps; the most important part is the bitmap structure, which is the combination of the BITMAPFILEHEADER and the DIB header, BITMAPINFOHEADER).
Tesseract OCR (an alternative that will do everything for you; however, if you solve your problem with the easiest solution, it will not make you a better programmer)
Related
I wonder if it is possible to solve a certain problem.
In short: get optimal performance by filling the buffer not only line by line but also column by column.
Description below:
A graphic buffer is given (i.e. intended to hold a bitmap)
#include <stdlib.h>

#define WIDTH 320
#define HEIGHT 256

typedef struct
{
    unsigned char r, g, b, a;
} sRGBA;

sRGBA* bufor_1;

int main()
{
    bufor_1 = (sRGBA*)malloc(WIDTH * HEIGHT * sizeof(sRGBA));
}
There is no problem with filling it horizontally, line by line, because that is the 'cache friendly' case, the best one, e.g. floor and ceiling raycasting:
int main()
{
    bufor_1 = (sRGBA*)malloc(WIDTH * HEIGHT * sizeof(sRGBA));
    for (int y = 0; y < HEIGHT; ++y)
    {
        for (int x = 0; x < WIDTH; ++x)
        {
            bufor_1[x + y * WIDTH].r = 100; // consecutive addresses: cache friendly
        }
    }
}
The difference in performance appears when we want to fill such a buffer vertically, i.e. column by column, e.g. when rendering walls, which is done this way:
int main()
{
    bufor_1 = (sRGBA*)malloc(WIDTH * HEIGHT * sizeof(sRGBA));
    for (int x = 0; x < WIDTH; ++x)
    {
        for (int y = 0; y < HEIGHT; ++y)
        {
            bufor_1[x + y * WIDTH].r = 100; // stride of WIDTH elements: cache unfriendly
        }
    }
}
The question that arises is whether it is possible to somehow combine efficient line-by-line and column-by-column filling.
From a few tests that I performed, it turned out that if the buffer is represented as two-dimensional, column-by-column filling is even faster than line-by-line filling of a one-dimensional buffer; but then it is the other way around, i.e. filling such a two-dimensional buffer line by line becomes inefficient.
Solutions I was thinking about:
rotating the buffer 90 degrees; unfortunately that takes too much time, at least with the algorithms I checked, unless there is some mega-fast O(1) way;
some sort of buffer remapping, so that a table holds pointers to the next pixels in a column, but that probably won't be 'cache friendly' either, or will be even worse; I haven't checked. One more option is sketched below.
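One well-known technique that could help here (my suggestion, not from the original post) is loop blocking, or tiling: walk the buffer in small square blocks, so that even column-order work touches memory in cache-line-sized chunks that stay resident while the block is processed. A minimal sketch, using the sRGBA buffer from above:

// Sketch: fill column by column, but inside cache-friendly square blocks.
#define BLOCK 16 // tune to the cache line / L1 size of the target CPU
for (int by = 0; by < HEIGHT; by += BLOCK)
    for (int bx = 0; bx < WIDTH; bx += BLOCK)
        for (int x = bx; x < bx + BLOCK && x < WIDTH; ++x)
            for (int y = by; y < by + BLOCK && y < HEIGHT; ++y)
                bufor_1[x + y * WIDTH].r = 100;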
I have a function that successfully reads RGB values from a PPM and a function that successfully writes them to a PPM. What I am trying to write is a function called denoiseImage that changes the RGB values of a PPM using mean filtering with a window of size n by n, where n is odd.

My intent is to go through each pixel, using it as the center point of the n by n window that surrounds it. I then take the sums of each color (r, g, b) over the window, divide by the number of pixels in the window, and assign those new mean values to the RGB of every pixel in the window.

However, I am unable to implement a check for the cases where the window does not fully fit over the image (for example, when the center point is the top-right pixel, a 3x3 window will reach non-existent points). When it does not fit fully, I intend to use only the available pixels and take the mean of those instead. So far, my code only works for the cases where the window fully fits. My function:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

RGB *denoiseImage(int width, int height, const RGB *image, int n)
{
    int firstPos, lastPos, i = 0, j, k, numofPix;
    int sumR, sumG, sumB;
    numofPix = width * height;
    RGB *pixels = malloc(numofPix * sizeof(RGB));
    if (n == 1) // Window size 1: the image does not change, so return a copy.
    {
        memcpy(pixels, image, numofPix * sizeof(RGB));
        return pixels;
    }
    for (j = 0; j < numofPix; j++)
    {
        sumR = sumG = sumB = 0; // Reset the sums for each center pixel.
        firstPos = (j - width) - ((n - 1)/2);
        lastPos = (j + width) + ((n - 1)/2);
        // Need to check boundary cases here to prevent a segmentation fault:
        // firstPos can be negative and lastPos can point past the last pixel.
        for (k = firstPos; k <= lastPos; k++) // Seg fault. Unable to shrink the window for the cases where it does not fit.
        {
            sumR += image[k].r;
            sumG += image[k].g;
            sumB += image[k].b;
            i++;
            if (i == n) // Used to jump over the elements not in the window.
            {
                k += (width - n);
                i = 0;
            }
        }
        sumR = sumR/(n*n); // Calculating mean values
        sumG = sumG/(n*n);
        sumB = sumB/(n*n);
        for (k = firstPos; k <= lastPos; k++) // Assigning the RGB values with the new mean values.
        {
            pixels[k].r = sumR;
            pixels[k].g = sumG;
            pixels[k].b = sumB;
            printf("%d %d %d ", pixels[k].r, pixels[k].g, pixels[k].b);
            i++;
            if (i == n) // Same row jump as above.
            {
                k += (width - n);
                i = 0;
            }
        }
    }
    return pixels;
}
int main()
{
    RGB *RGBValues;
    int width, height, max;
    int testemp = 3; // sample window size
    char *testfile = "test.ppm";
    char *testfile2 = "makeme.ppm";
    RGBValues = readPPM(testfile, &width, &height, &max); // Function reads values from a PPM file correctly
    RGBValues = denoiseImage(width, height, RGBValues, testemp);
    writePPM(testfile2, width, height, max, RGBValues); // Function writes values to a PPM file correctly
}
How would I implement a way to check whether the window fits or not?
This is a great question, and the problem is luckily well known in the image-processing community.
Edges are always treated specially when it comes to 2D filtering.
One way to look at it is to extend the image in 2D and fill the added edge pixels with values extrapolated from the inside.
For example, you may look into http://www.librow.com/articles/article-1 and search for the median filter.
I am sure that you will find a solution soon, since you are going in the right direction.
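As a concrete illustration of handling the borders, here is a minimal sketch of a mean filter that simply shrinks the window by clamping its bounds to the image and divides by the number of pixels actually visited (it reuses the RGB struct from the question; the function name is mine):

#include <stdlib.h>

RGB *denoiseClamped(int width, int height, const RGB *image, int n)
{
    RGB *out = (RGB*)malloc((size_t)width * height * sizeof(RGB));
    int half = (n - 1) / 2;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            // Clamp the window bounds so they never leave the image.
            int x0 = x - half < 0 ? 0 : x - half;
            int x1 = x + half >= width  ? width  - 1 : x + half;
            int y0 = y - half < 0 ? 0 : y - half;
            int y1 = y + half >= height ? height - 1 : y + half;
            int sumR = 0, sumG = 0, sumB = 0, count = 0;
            for (int wy = y0; wy <= y1; wy++)
                for (int wx = x0; wx <= x1; wx++) {
                    const RGB *p = &image[wy * width + wx];
                    sumR += p->r; sumG += p->g; sumB += p->b;
                    count++;
                }
            out[y * width + x].r = sumR / count; // divide by the pixels
            out[y * width + x].g = sumG / count; // actually visited, not n*n
            out[y * width + x].b = sumB / count;
        }
    return out;
}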
I know, another dynamic array question; this one is a bit different though, so maybe it'll be worth answering. I am making a terrain generator in C with SDL. I am drawing 9 chunks surrounding the screen, proportional to the screen size, so that terrain can be generated more easily in the future.
This means that I have to be able to resize the array at any given point, so I made a dynamic array (at least, according to an answer I found on Stack Overflow, it is one) and everything SEEMS to work fine; nothing is crashing, and it even draws a single tile... but just one. Looking at it, sure enough it's iterating through the array but only writing to one portion of memory. I am using a struct called Tile that just holds the x, y, w, and h of a rectangle.
This is the code I am using to allocate the array:
Tile* TileMap = (Tile*)malloc(0 * sizeof(Tile*));
int arrayLen = sizeof(TileMap);
TileMap = (Tile*)realloc(TileMap, (totalTiles) * sizeof(Tile));
arrayLen = sizeof(totalTiles * sizeof(Tile));
totalTiles is just the number of tiles that I calculated previously to be on the screen. I've checked the math and it's correct, and it even allocates the proper amount of memory. Here is the code I use to initialize the array:
//Clear all elements to zero.
for (int i = 0; i < arrayLen; i++)
{
Tile tile = {};
TileMap[i] = tile;
}
So what's weird to me is that it considers the size of a tile (16 bytes) times totalTiles (78,000) to equal 4... When I drill down into the array, it only has one single rect in it, which gets cleared as well. So then I go to calculate the sizes of each tile:
//Figure out Y and heights
for (int i = startY; i <= (startY*(-1)) * 2; i += TILE_HEIGHT)
{
TileMap[i].y = i * TILE_HEIGHT;
TileMap[i].h = TILE_HEIGHT;
//Figure out X and widths
for (int j = startX; j <= (startX*(-1)) * 2; j += TILE_WIDTH)
{
TileMap[i].x = i * TILE_WIDTH;
TileMap[i].w = TILE_WIDTH;
}
}
*Side note: startX is the negative offset I am using to draw chunks behind the camera, so I multiply it by -1 to make it positive and then multiply by two to get one chunk in front of the camera.
Alright, so obviously that only initializes one, and here is the render code:
for (int i = 0; i < totalTiles; i++)
{
SDL_Rect currentTile;
currentTile.x = TileMap[i].x;
currentTile.y = TileMap[i].y;
currentTile.w = TileMap[i].w;
currentTile.h = TileMap[i].h;
SDL_RenderDrawRect(renderer, &currentTile);
}
free(TileMap);
So what am I doing wrong here? I mean, I am literally just baffled right now... And before vectors get recommended in place of dynamic arrays: I don't really like using them, and I want to learn to deal with stuff like this, not just implement some simple fix.
Lots of confusion here (which is commonplace with C pointers).
The following code doesn't produce the expected result: arrayLen = sizeof(totalTiles * sizeof(Tile));
totalTiles * sizeof(Tile) is not even a type; I'm surprised it compiles at all. Edit: see molbnilo's comment below; sizeof can also take an expression, so it compiles and yields the size of the expression's type.
Anyway, the proper answer should be:
arrayLen = totalTiles;
Because that's what you need in your next loop:
//Clear all elements to zero.
for (int i = 0; i < arrayLen; i++)
{
Tile tile = {};
TileMap[i] = tile;
}
You don't need the size of the array in bytes; you need its number of elements.
There are other confusions in your sample; they don't directly impact the rest of the code, but you'd better correct them:
Tile* TileMap = (Tile*)malloc(0 * sizeof(Tile*)); : avoid allocating a size of 0.
int arrayLen = sizeof(TileMap); : no, that's not the array length, just the size of the pointer (hence 4 bytes in 32-bit binaries). Remember that TileMap is not defined as an array, but as a pointer allocated with malloc() and then realloc().
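Putting those corrections together, a minimal sketch of the allocation (variable names from the question; error handling kept short):

int arrayLen = totalTiles;                       // number of elements, not bytes
Tile *TileMap = (Tile*)malloc(arrayLen * sizeof(Tile));
if (TileMap != NULL) {
    for (int i = 0; i < arrayLen; i++) {
        Tile tile = {0};
        TileMap[i] = tile;                       // clear all elements to zero
    }
}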
How can I access an RGB Mat as a 1D array? I looked at the documentation but couldn't find how the 3-channel data is laid out in that case.
I'm trying to loop over each pixel with one for loop going from n = 0 to n = img.rows * img.cols - 1, and access the R, G, and B values at each pixel.
Any help would be greatly appreciated.
I don't really understand why you need only one loop, so I will propose several options (using 1 or 2 for-loops) that I know from experience to be efficient.
If you really want to iterate over all the values with only one loop in a safe way, you can reshape the matrix and turn a 3-channel 2D image into a 1-channel 1D array using cv::Mat::reshape(...) (doc):
cv::Mat rgbMat = cv::imread(...); // Read original image
// As a 1D-1 channel, we have 1 channel and 3*the number of pixels samples
cv::Mat arrayFromRgb = rgbMat.reshape(1, rgbMat.channels()*rgbMat.size().area());
There are two caveats:
reshape() returns a new cv::Mat reference, hence its output needs to be assigned to a variable (it won't operate in-place)
you are not allowed to change the number of elements in the matrix.
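For example, iterating the reshaped matrix with a single loop could look like this (a minimal sketch; DoSomething() is a hypothetical stand-in, as in the options below):

// Single loop over every sample of the 1-channel, 1-column reshaped matrix.
for (int i = 0; i < arrayFromRgb.rows; ++i) {
    DoSomething(arrayFromRgb.at<uint8_t>(i, 0));
}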
OpenCV stores the matrix data in row-major order.
Thus, an alternative is to iterate over the rows by getting a pointer to each row start.
This way, you will not do anything unsafe because of possible padding data at the end of the rows:
cv::Mat rgbMat = cv::imread(...);
for (int y = 0; y < rgbMat.size().height; ++y) {
// Option 1: get a pointer to a 3-channel element
cv::Vec3b* pointerToRgbPixel = rgbMat.ptr<cv::Vec3b>(y);
for (int x = 0; x < rgbMat.size().width; ++x, ++pointerToRgbPixel) {
uint8_t blue = (*pointerToRgbPixel)[0];
uint8_t green = (*pointerToRgbPixel)[1];
uint8_t red = (*pointerToRgbPixel)[2];
DoSomething(red, green, blue);
}
// Option 2: get a pointer to the first sample and iterate
uint8_t* pointerToSample = rgbMat.ptr<uint8_t>(y);
for (int x = 0; x < rgbMat.channels()*rgbMat.size().width; ++x) {
DoSomething(*pointerToSample);
++pointerToSample;
}
}
Why do I like the iteration over the rows?
Because it is easy to make parallel.
If you have a multi-core computer, you can use any framework (such as OpenMP or GCD) to handle each line in parallel in a safe way.
Using OpenMP, it is as easy as adding a #pragma omp parallel for before the outer loop.
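For instance, a minimal sketch (compile with OpenMP enabled, e.g. -fopenmp; DoSomething() as above):

#pragma omp parallel for
for (int y = 0; y < rgbMat.size().height; ++y) {
    cv::Vec3b* pointerToRgbPixel = rgbMat.ptr<cv::Vec3b>(y);
    for (int x = 0; x < rgbMat.size().width; ++x, ++pointerToRgbPixel) {
        // Each row is handled independently, so rows can run in parallel.
        DoSomething((*pointerToRgbPixel)[2], (*pointerToRgbPixel)[1], (*pointerToRgbPixel)[0]);
    }
}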
Yes, it is referenced over there in the documentation.
Have a look at the snippet below:
#include <opencv2/opencv.hpp>
using namespace cv;

// Writes newVal into channel N of the pixel at (x, y).
template<int N>
void SetPixel(Mat &img, int x, int y, unsigned char newVal) {
    *(img.data + (y * img.cols + x) * img.channels() + N) = newVal;
}

int main() {
    Mat img = Mat::zeros(1000, 1000, CV_8UC4);
    SetPixel<0>(img, 100, 100, 120); // example coordinates; the original call omitted x and y
    SetPixel<1>(img, 100, 100, 120);
    SetPixel<2>(img, 100, 100, 120);
    imwrite("out.jpg", img);
    return 0;
}
But this is not the safe way: it assumes the Mat data is laid out contiguously in memory (with no gap bytes between its rows). So better check Mat::isContinuous() before using this snippet.
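The check could look like this (a minimal sketch):

if (img.isContinuous()) {
    // Safe: the data is one flat block of rows * cols * channels() bytes.
    unsigned char *flat = img.data;
    size_t total = img.total() * img.channels();
    for (size_t i = 0; i < total; ++i) {
        flat[i] = 120; // touch every sample
    }
}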
//C++ Code Below
//your RGB image
cv::Mat image;
//your 1D array
cv::Mat newimage;
//the function to convert the image into 1D array
image.reshape(0, 1).convertTo(newimage, CV_32F);
//http://docs.opencv.org/modules/core/doc/basic_structures.html#mat-reshape
Since my previous question was certainly not very clear, and accordingly I couldn't implement an exact solution to my problem, I have been working on a function that returns the byte offset of a pixel located at X/Y coords. For that purpose I have this:
dword bmp_find_xy (dword xp, dword yp)
{
    dword w = 50; // to clarify, that's the real width of the sample image I use
    dword bpx = (3*8); // bits per pixel; using 3*8 as a reminder
    dword offset = (2+sizeof(BMP)+sizeof(DIB)); // this is the offset 54
    dword pitch = w * 3; // width of a row in bytes
    dword pixAddress; // result variable
    if(pitch % 4 != 0) pitch += 4 - (pitch % 4); // finding the pitch (row + padding)
    pixAddress = (offset) + pitch * yp + ((xp * bpx) / 8); // finding the address
    return pixAddress;
}
So the question isn't "What am I doing wrong / why am I receiving weird errors". The question is: am I doing it correctly? On first tests it seems to work, but I am somehow unsure. Once it is confirmed that this is the correct way, I'll delete the question.
Your code looks like it gives the correct result to me. However, it is internally inconsistent.
In the row (yp) addressing, you assume that every pixel has 3 bytes.
In the column (xp) addressing, you assume that every pixel has 3*8 bits.
So why use bytes in the first case and bits in the second? I think the code would be cleaner like this:
dword bmp_find_xy (dword xp, dword yp)
{
    dword width = 50;                // image width
    dword channels = 3;              // number of color channels
    dword bpp = 8;                   // depth in bits per channel
    dword single = (channels*bpp)/8; // size of a pixel in bytes
    dword offset = (2+sizeof(BMP)+sizeof(DIB)); // this is the offset 54
    dword rowsize = width*single;    // size of a row in memory
    if (rowsize % 4 != 0)
        rowsize += 4 - (rowsize % 4); // account for padding
    dword pixAddress;                // result variable
    pixAddress = offset + yp*rowsize + xp*single; // finding the address
    return pixAddress;
}
Also, you can read the width, the number of channels, and the bpp from the header.
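For example, in a standard BITMAPFILEHEADER + BITMAPINFOHEADER layout, the width field sits at file offset 0x12 and the bits-per-pixel field at 0x1C; a minimal sketch, assuming the file has already been read into a byte buffer:

#include <stdint.h>
#include <string.h>

// Sketch: pull width and bit depth out of a BMP header already in memory.
// buf must point at the very start of the file.
void bmp_read_header (const unsigned char *buf, int32_t *width, uint16_t *bpp)
{
    memcpy(width, buf + 0x12, sizeof *width); // biWidth
    memcpy(bpp,   buf + 0x1C, sizeof *bpp);   // biBitCount
}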
Next, your code would be faster if you first get the address of the first pixel in a row, and then keep it while iterating through the row (instead of recomputing the whole thing every time). Here is an illustration of a typical task running over all pixels; note that I do not use the same coding style as in the original question here.
unsigned char maxGreen = 0;
for (int y = 0; y < height; y++) {
    unsigned char *row = bitmap.getRowPtr(y);
    for (int x = 0; x < width; x++) {
        unsigned char *pixel = row + bitmap.getColumnOffset(x);
        if (pixel[1] > maxGreen) // BMP stores pixels as BGR, so index 1 is green
            maxGreen = pixel[1];
    }
}
// maxGreen holds the maximum value of the green channel observed in the image
As you can see, in this example the offset, padding, etc. calculations only need to be done once per row, in the getRowPtr() function. Per pixel, we only need to do the column-offset calculation (a simple multiplication) in the getColumnOffset() function.
This makes the example much faster, as a breakdown of how many calculations are done per pixel shows.
Last, I would never write code to read a BMP myself; use a library for that!