How can I access an RGB Mat as a 1D array? I looked at the documentation but couldn't find how the 3-channel data is laid out in that case.
I'm trying to loop over each pixel with a single for loop going from n = 0 to n = img.rows*img.cols - 1, and access the R, G, and B values of each pixel.
Any help would be greatly appreciated.
I don't understand why you need only one loop, so I will propose several options (including one or two for-loops) that I know from experience to be efficient.
If you really want to iterate over all the values with only one loop in a safe way, you can reshape the matrix and turn a 3-channel 2D image into a 1-channel 1D array using cv::Mat::reshape(...) (see the OpenCV documentation):
cv::Mat rgbMat = cv::imread(...); // Read original image
// As a 1D, 1-channel array we have 1 channel and channels * (number of pixels) samples
cv::Mat arrayFromRgb = rgbMat.reshape(1, rgbMat.channels()*rgbMat.size().area());
There are two caveats:
reshape() returns a new cv::Mat header (pointing to the same data), hence its output needs to be assigned to a variable (it won't operate in place);
you are not allowed to change the total number of elements in the matrix.
OpenCV stores matrix data in row-major order.
Thus, an alternative is to iterate over the rows by getting a pointer to the start of each row.
This way, you will not do anything unsafe, even if there is padding data at the end of the rows:
cv::Mat rgbMat = cv::imread(...);
for (int y = 0; y < rgbMat.size().height; ++y) {
    // Option 1: get a pointer to a 3-channel element
    cv::Vec3b* pointerToRgbPixel = rgbMat.ptr<cv::Vec3b>(y);
    for (int x = 0; x < rgbMat.size().width; ++x, ++pointerToRgbPixel) {
        uint8_t blue = (*pointerToRgbPixel)[0];
        uint8_t green = (*pointerToRgbPixel)[1];
        uint8_t red = (*pointerToRgbPixel)[2];
        DoSomething(red, green, blue);
    }
    // Option 2: get a pointer to the first sample and iterate
    uint8_t* pointerToSample = rgbMat.ptr<uint8_t>(y);
    for (int x = 0; x < rgbMat.channels()*rgbMat.size().width; ++x) {
        DoSomething(*pointerToSample);
        ++pointerToSample;
    }
}
Why do I like iterating over the rows?
Because it is easy to parallelize.
If you have a multi-core computer, you can use any framework (such as OpenMP or GCD) to process each row in parallel in a safe way.
Using OpenMP, it is as easy as adding #pragma omp parallel for before the outer loop.
Yes, it is described in the documentation.
Also, take a look at the snippet below:
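#include <opencv2/opencv.hpp>
using namespace cv;

template<int N>
void SetPixel(Mat &img, int x, int y, unsigned char newVal) {
    // Pixel (x, y) starts (y * cols + x) pixels into the data; N selects the channel
    *(img.data + (y * img.cols + x) * img.channels() + N) = newVal;
}
int main() {
    Mat img = Mat::zeros(1000, 1000, CV_8UC4);
    // Set channels 0, 1 and 2 of the pixel at example coordinates (x = 10, y = 20)
    SetPixel<0>(img, 10, 20, 120);
    SetPixel<1>(img, 10, 20, 120);
    SetPixel<2>(img, 10, 20, 120);
    imwrite("out.jpg", img);
    return 0;
}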
But this is not the safe way: it assumes the Mat data lies contiguously in memory (with no padding bytes between its rows). So you had better check Mat::isContinuous() before using this snippet.
// C++ code below
#include <opencv2/opencv.hpp>
// your RGB image
cv::Mat image;
// your 1D array
cv::Mat newimage;
// reshape into a single row (channel count unchanged), then convert to float
image.reshape(0, 1).convertTo(newimage, CV_32F);
// http://docs.opencv.org/modules/core/doc/basic_structures.html#mat-reshape
Related
I have a function that successfully reads RGB values from a PPM file and a function that successfully writes to a PPM file. What I am trying to write is a function called denoiseImage that changes the RGB values of a PPM image using mean filtering with an n by n window, where n is odd. My intent is to go through each pixel, using it as the center point of the n by n window that surrounds it. I then sum the values of each color (r, g, b), divide by the number of pixels in the window, and assign those new values to the RGB of every pixel in the window. However, I am unable to implement a check for the cases where the window does not fully fit inside the image (for example, when the center point is the top-right pixel, a 3x3 window extends to nonexistent points). When it does not fully fit, I intend to use only the available pixels and take the mean of those instead. So far, my code only works for cases where the window fully fits. My function:
RGB *denoiseImage(int width, int height, const RGB *image, int n)
{
    int firstPos, lastPos, i = 0, j = 0, k, numofPix;
    int sumR=0,sumG=0,sumB=0;
    numofPix = (width * height);
    RGB *pixels = malloc(numofPix * sizeof(RGB));
    if (n == 1) //Case where the window size is 1 and therefore the image does not get changed.
    {
        return pixels;
    }
    for (j=0;j < numofPix;j++)
    {
        firstPos = (j - width) - ((n - 1)/2);
        lastPos = (j + width) + ((n - 1)/2);
        //Need to check boundary cases to prevent segmentation fault
        for (k=firstPos;k<=lastPos;k++) //Seg fault. Unable to shrink frame to compensate for cases where the frame does not fit.
        {
            sumR+=image[k].r;
            sumG+=image[k].g;
            sumB+=image[k].b;
            i++;
            if (i = n) //Used to skip elements not in frame
            {
                j += (width-n);
                i = 0;
            }
        }
        sumR = sumR/(n*n); //Calculating mean values
        sumG = sumG/(n*n);
        sumB = sumB/(n*n);
        for (k=firstPos;k<=lastPos;k++) //Assigning the RGB values with the new mean values.
        {
            pixels[k].r=sumR;
            pixels[k].g=sumG;
            pixels[k].b=sumB;
            printf("%d %d %d ",pixels[k].r, pixels[k].g, pixels[k].b);
        }
    }
    return pixels;
}
int main()
{
    RGB *RGBValues;
    int width, height, max;
    int j = 0, testemp = 3; //testemp is a sample frame size
    char *testfile = "test.ppm";
    char *testfile2 = "makeme.ppm";
    RGBValues = readPPM(testfile, &width, &height, &max); //Function reads values from a ppm file correctly
    RGBValues = denoiseImage(width, height, RGBValues, testemp);
    writePPM(testfile2, width, height, max, RGBValues); //Function writes values to a ppm file correctly
}
How would I implement a way to check if the frame fits or not?
This is a great question, and luckily the problem is well known in the image processing community.
Edges are always treated differently when it comes to 2D filtering.
One way to handle them is to extend the image in 2D and fill the border with values extrapolated from the interior.
For example, you could look at http://www.librow.com/articles/article-1 and search for the median filter.
You are heading in the right direction, so I am sure you will find a solution soon.
Goal: implement the diagram shown below in OpenCL. The main thing needed from the OpenCL kernel is to multiply the coefficient array and the temp array and then accumulate all those values into one at the end. (That is probably the most time-intensive operation; parallelism would be really helpful here.)
I am using a helper function for the kernel that does the multiplication and addition (I am hoping this function will be parallel as well).
Description of the picture:
One at a time, the values are passed into an array (the temp array) that is the same size as the coefficient array. Every time a single value is passed into this array, the temp array is multiplied element-wise with the coefficient array in parallel, and the products are then summed into one single output element. This continues until the input array reaches its final element.
What happens with my code?
For 60 input elements it takes over 8000 ms, and I have a total of 1.2 million inputs that still have to be passed in. I know for a fact that there is a much better way to do what I am attempting. Here is my code below.
Here are some things that I know for sure are wrong with the code. When I try to multiply the coefficient values with the temp array, it crashes; this is because of the global_id. All I want this line to do is multiply the two arrays in parallel.
I tried to figure out why the FIFO function was taking so long, so I started commenting lines out. First I commented out everything except the first for loop of the FIFO function; that took 50 ms. When I uncommented the next loop, it jumped to 8000 ms. So the delay must have to do with the transfer of data.
Is there a register shift that I could use in OpenCL? Perhaps some logical shifting method for integer arrays? (I know there is a '>>' operator.)
float constant temp[58];
float constant tempArrayForShift[58];
float constant multipliedResult[58];
float fifo(float inputValue, float *coefficients, int sizeOfCoeff) {
    //take array of 58 elements (or same size as number of coefficients)
    //shift all elements to the right one
    //bring next element into index 0 from input
    //multiply the coefficient array with the array that's the same size as coefficients and accumulate
    //store into one output value of the output array
    //repeat till input array has reached the end
    int globalId = get_global_id(0);
    float output = 0.0f;
    //Shift everything down from 1 to 57
    //takes about 50ms here
    for(int i=1; i<58; i++){
        tempArrayForShift[i] = temp[i];
    }
    //Input the new value passed from main kernel. Rest of values were shifted over so element is written at index 0.
    tempArrayForShift[0] = inputValue;
    //Takes about 8000ms with this loop included
    //Write values back into temp array
    for(int i=0; i<58; i++){
        temp[i] = tempArrayForShift[i];
    }
    //all 58 elements of the coefficient array and temp array are multiplied at the same time and stored in a new array
    //I am 100% sure this line is crashing the program.
    //multipliedResult[globalId] = coefficients[globalId] * temp[globalId];
    //Sum the temp array with each other. Temp array consists of coefficients*fifo buffer
    for (int i = 0; i < 58; i ++) {
        // output = multipliedResult[i] + output;
    }
    //Returned summed value of temp array
    return output;
}
__kernel void lowpass(__global float *Array, __global float *coefficients, __global float *Output) {
    //Initialize the temporary array values to 0
    for (int i = 0; i < 58; i ++) {
        temp[i] = 0;
        tempArrayForShift[i] = 0;
        multipliedResult[i] = 0;
    }
    //fifo adds one element in and calls the fifo function. ALL I NEED TO DO IS SEND ONE VALUE AT A TIME HERE.
    for (int i = 0; i < 60; i ++) {
        Output[i] = fifo(Array[i], coefficients, 58);
    }
}
I have had this problem with OpenCL for a long time. I am not sure how to implement parallel and sequential instructions together.
Another alternative I was thinking about
In the main cpp file, I was thinking of implementing the fifo buffer there and having the kernel do the multiplication and addition. But this would mean I would have to call the kernel 1000+ times in a loop. Would this be the better solution? Or would it just be completely inefficient.
To get good performance out of a GPU, you need to split your work across many threads. Your code uses just a single thread, and a GPU is very slow per thread but can be very fast when many threads run at the same time. In this case you can use one thread for each output value. You do not actually need to shift values through an array: each output value depends on a window of 58 input values, so you can read those values from memory, multiply them by the coefficients, and write back the result.
A simple implementation would be (launch with as many threads as output values):
__kernel void lowpass(__global float *Array, __global float *coefficients, __global float *Output)
{
    int globalId = get_global_id(0);
    float sum = 0.0f;
    for (int i = 0; i < 58; i++)
    {
        float tmp = 0;
        if (globalId + i > 56) // skip samples before the start of Array
        {
            tmp = Array[i + globalId - 57] * coefficients[57 - i];
        }
        sum += tmp;
    }
    Output[globalId] = sum;
}
This is not perfect, as the memory access pattern it generates is not optimal for GPUs. The cache will likely help a bit, but there is clearly a lot of room for optimization, since each value is reused several times. The operation you are trying to perform is called (1D) convolution. NVIDIA has a 2D example called oclConvolutionSeparable in their GPU Computing SDK that shows an optimized version; you can adapt its convolutionRows kernel for a 1D convolution.
Here's another kernel you can try out. There are a lot of synchronization points (barriers), but this should perform fairly well. The 65-item work group is not very optimal.
the steps:
1. init local values to 0
2. copy coefficients to local variable
looping over the output elements to compute:
3. shift existing elements (work items > 0 only)
4. copy new element (work item 0 only)
5. compute dot product
   5a. multiplication - one per work item
   5b. reduction loop to compute sum
6. copy dot product to output (WI 0 only)
7. final barrier
the code:
//outputSize: number of outputs to compute, passed in as a kernel argument
__kernel void lowpass(__global float *Array, __constant float *coefficients, __global float *Output, __local float *localArray, __local float *localSums, int outputSize){
    int globalId = get_global_id(0);
    int localId = get_local_id(0);
    int localSize = get_local_size(0);
    //1 init local values to 0
    localArray[localId] = 0.0f;
    //2 copy coefficients to local
    //don't bother with this if __constant is working for you
    //requires another local to be passed in: localCoeff
    //localCoeff[localId] = coefficients[localId];
    //barrier for both steps 1 and 2
    barrier(CLK_LOCAL_MEM_FENCE);
    float tmp = 0.0f;
    for(int i = 0; i < outputSize; i++)
    {
        //3 shift elements (+barrier)
        if(localId > 0){
            tmp = localArray[localId - 1];
        }
        barrier(CLK_LOCAL_MEM_FENCE);
        localArray[localId] = tmp;
        //4 copy new element (work item 0 only, + barrier)
        if(localId == 0){
            localArray[0] = Array[i];
        }
        barrier(CLK_LOCAL_MEM_FENCE);
        //5 compute dot product
        //5a multiply + barrier
        localSums[localId] = localArray[localId] * coefficients[localId];
        barrier(CLK_LOCAL_MEM_FENCE);
        //5b reduction loop + barrier
        for(int j = 1; j < localSize; j <<= 1) {
            int mask = (j << 1) - 1;
            //the bounds check matters when localSize is not a power of 2 (e.g. 65)
            if ((localId & mask) == 0 && localId + j < localSize) {
                localSums[localId] += localSums[localId + j];
            }
            barrier(CLK_LOCAL_MEM_FENCE);
        }
        //6 copy dot product (WI 0 only)
        if(localId == 0){
            Output[i] = localSums[0];
        }
        //7 barrier
        //only needed if there is more code after the loop.
        //the barrier in #3 covers this in the case where the loop continues
        //barrier(CLK_LOCAL_MEM_FENCE);
    }
}
What about more work groups?
This is slightly simplified so that a single 1x65 work group computes the entire 1.2M outputs. To allow multiple work groups, you could divide outputSize by get_num_groups(0) to calculate the amount of work each group should do (workAmount), and adjust the i for-loop:
for (int i = workAmount * get_group_id(0); i < workAmount * (get_group_id(0) + 1) && i < outputSize; i++)
Step #1 must be changed as well, to initialize localArray to the correct starting state (the input samples just before the group's first output) rather than all 0s.
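//1 init local values
int start = workAmount * get_group_id(0); //first output handled by this group
if(start - 1 - localId >= 0){
    localArray[localId] = Array[start - 1 - localId];
}else{
    localArray[localId] = 0.0f; //group 0: nothing before the first sample
}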
These two changes should allow you to use a more optimal number of work groups; I suggest some multiple of the number of compute units on the device. Try to keep the amount of work for each group in the thousands, though. Play around with this: sometimes what seems optimal at a high level turns out to be detrimental to the kernel when it's running.
Advantages
At almost every point in this kernel, the work items have something to do. The only time fewer than 100% of the items are working is during the reduction loop in step 5b.
Disadvantages
The barriers will slow down the kernel just by the nature of what barriers do: they pause a work item until the others reach that point. Maybe there is a way to implement this with fewer barriers, but I still feel this is close to optimal for the problem you are trying to solve.
There isn't room for more work items per group, and 65 is not a very optimal size. Ideally, you should try to use a power of 2, or a multiple of 64. This won't be a huge issue though, because there are a lot of barriers in the kernel which makes them all wait fairly regularly.
I know, another dynamic array question, but this one is a bit different, so maybe it'll be worth answering. I am making a terrain generator in C with SDL. I am drawing 9 chunks surrounding the screen, proportional to the screen size, so that terrain can be generated more easily in the future.
This means that I have to be able to resize the array at any given point, so I made a dynamic array (at least according to an answer I found on Stack Overflow, it is one), and everything SEEMS to work fine: nothing is crashing, and it even draws a single tile... but just one. Looking at it, sure enough it's iterating through the array but only writing to one portion of memory. I am using a struct called Tile that just holds the x, y, w, and h of a rectangle.
This is the code I am using to allocate the array
Tile* TileMap = (Tile*)malloc(0 * sizeof(Tile*));
int arrayLen = sizeof(TileMap);
TileMap = (Tile*)realloc(TileMap, (totalTiles) * sizeof(Tile));
arrayLen = sizeof(totalTiles * sizeof(Tile));
totalTiles is just the number of tiles that I previously calculated to be on the screen. I've checked the math and it's correct, and it even allocates the proper amount of memory. Here is the code I use to initialize the array:
//Clear all elements to zero.
for (int i = 0; i < arrayLen; i++)
{
    Tile tile = {};
    TileMap[i] = tile;
}
What's weird to me is that it considers the size of a tile (16 bytes) times totalTiles (78,000) to equal 4. When I drill down into the array, it only has one single rect in it, which gets cleared as well, so then when I go to calculate the sizes of each tile:
//Figure out Y and heights
for (int i = startY; i <= (startY*(-1)) * 2; i += TILE_HEIGHT)
{
    TileMap[i].y = i * TILE_HEIGHT;
    TileMap[i].h = TILE_HEIGHT;
    //Figure out X and widths
    for (int j = startX; j <= (startX*(-1)) * 2; j += TILE_WIDTH)
    {
        TileMap[i].x = i * TILE_WIDTH;
        TileMap[i].w = TILE_WIDTH;
    }
}
*Side note: startX is the negative offset I am using to draw chunks behind the camera, so I multiply it by -1 to make it positive and then multiply it by two to get one chunk in front of the camera.
Alright, so obviously that only initializes one. Here is the render code:
for (int i = 0; i < totalTiles; i++)
{
    SDL_Rect currentTile;
    currentTile.x = TileMap[i].x;
    currentTile.y = TileMap[i].y;
    currentTile.w = TileMap[i].w;
    currentTile.h = TileMap[i].h;
    SDL_RenderDrawRect(renderer, &currentTile);
}
free(TileMap);
So what am I doing wrong here? I'm literally just baffled right now... And before vectors get recommended in place of dynamic arrays: I don't really like using them, and I want to learn to deal with stuff like this, not just implement some simple fix.
Lots of confusion (which is commonplace with C pointers).
The following code doesn't give the expected answer:
arrayLen = sizeof(totalTiles * sizeof(Tile));
totalTiles * sizeof(Tile) is not a type but an expression, and sizeof applied to an expression gives the size of the expression's type (here size_t), not its value (Edit: see molbnilo's comment below). That is where the 4 comes from on a 32-bit build.
Anyway, the proper code should be:
arrayLen = totalTiles;
because that's what you need in your next loop:
//Clear all elements to zero.
for (int i = 0; i < arrayLen; i++)
{
    Tile tile = {};
    TileMap[i] = tile;
}
You don't need the size of the table, you need its number of elements.
There are other confusions in your sample; they don't directly impact the rest of the code, but you should correct them:
Tile* TileMap = (Tile*)malloc(0 * sizeof(Tile*)); : avoid allocating a size of 0. Also note that sizeof(Tile*) is the size of a pointer, not of a Tile.
int arrayLen = sizeof(TileMap); : no, this is not the array length, just the size of the pointer (hence 4 bytes on 32-bit binaries). Remember that TileMap is not defined as an array, but as a pointer allocated with malloc() and then realloc().
I am building a particle simulation, and I want to display each particle's position as a dot in a 3D scatter plot using MathGL in C (not C++!). I am having trouble with the C interface.
So far I found two interesting examples:
A C++ example that seems to be close to what I want: http://mathgl.sourceforge.net/doc_en/Dots-sample.html (but this is in C++, and I have been unable to find the C equivalent)
This is a piece of C code that constructs a 3D surf plot with dots.
#include <mgl2/mgl_cf.h>
int main()
{
    HMGL gr = mgl_create_graph(600,400);
    HMDT a,x,y;
    a = mgl_create_data_size(30,40,1);
    x = mgl_create_data_size(30,1,1);
    y = mgl_create_data_size(40,1,1);
    mgl_data_modify(a,"pi*(1-2*x)*exp(-4*y^2-4*(2*x-1)^2)",0);
    mgl_data_fill(x,-1.,1.,'x');
    mgl_data_fill(y,0.,1.,'x');
    mgl_rotate(gr,40.,60.,0.);
    mgl_set_light(gr,1);
    mgl_box(gr,1);
    mgl_surf_xy(gr,x,y,a,".","");
    mgl_delete_data(a);
    mgl_delete_data(y);
    mgl_delete_data(x);
    mgl_write_frame(gr,"test.png","");
    mgl_delete_graph(gr);
    return 0;
}
Example 2 is close to what I want to do, but it is annoying that a is not a linear array of just N particles. It also takes a function to evaluate the values of a (the z-axis), whereas I just want to pass the z-coordinate manually for each dot.
My data is just a 1D array of structs, similar to this:
struct particle {
double x, y, z, velocity;
};
How do I plot these particles as dots in a 3D (scatter) plot with MathGL in C? I guess I have to use mgl_dots, but how does it read from my array of values? (I could use velocity as color coding, but that is optional)
I was right about using mgl_dots, and the data can be prepared using mgl_create_data_size and mgl_data_put_val, e.g.:
HMDT z,x,y;
int N = 1000;
x = mgl_create_data_size(N,1,1);
z = mgl_create_data_size(N,1,1);
y = mgl_create_data_size(N,1,1);
for(int i=0; i < N; i++) {
    // Set position of particle[i]
    printf("%lf\n", i/(double) N);
    mgl_data_put_val(x, i/(double) N, i, 0, 0);
    mgl_data_put_val(y, i/(double) N, i, 0, 0);
    mgl_data_put_val(z, i/(double) N, i, 0, 0);
}
I am starting to learn OpenCV and have a question about it.
My goal is to recognize a captcha.
First I must preprocess the image.
Here is an example of the captcha.
So the problem is how to crop the symbols from the image and put each one into a 2D array (bitmap).
Automatic Partition Detection
The first thing you need to do is create a filter array of background colors: an array containing the colors that occur in the background. For that purpose you can just take a 20x20 area at some offset, or leave it as a user option, depending on your project.
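#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char Pixel[3];   // one 24-bit pixel (B, G, R)
typedef Pixel *PixelArray;

// Function to return offset byte of x/y coordinate
// Assumes a 24-bit BMP with a 54-byte header and no palette.
// BMP rows are padded to a multiple of 4 bytes and are usually stored bottom-up.
int bmp_get_offset (int width, int x, int y)
{
    const int channels = 3;
    const int bpp = 8;                          // bits per channel
    const int single = (channels * bpp) / 8;    // bytes per pixel
    const int offset = 54;
    int rowsize = width * single;
    if(rowsize % 4 != 0) rowsize += 4 - (rowsize % 4);
    return offset + y * rowsize + x * single;
}

// Function to return a specific w x h area of a BMP file already read into memory.
// Returns a malloc'd array of w*h pixels (caller frees), or NULL.
PixelArray bmp_get_area (const unsigned char *buffer, int x, int y, int w, int h)
{
    // The image width is a little-endian 32-bit value at offset 0x12 of the file.
    int32_t src_width = (int32_t)((uint32_t)buffer[0x12] | (uint32_t)buffer[0x13] << 8
                      | (uint32_t)buffer[0x14] << 16 | (uint32_t)buffer[0x15] << 24);
    PixelArray area = malloc((size_t)w * h * sizeof(Pixel));
    if (area == NULL) return NULL;
    for(int iHeight = 0; iHeight < h; iHeight++)
        for(int iWidth = 0; iWidth < w; iWidth++)
            memcpy(area[iHeight * w + iWidth],
                   &buffer[bmp_get_offset(src_width, x + iWidth, y + iHeight)],
                   sizeof(Pixel));
    return area;
}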
Well, it didn't end up being much pseudo-code.
Now that you have the filter, you can exclude the background pixels.
Next, you need a vertical raster scan (a column-by-column scan) over the entire captcha image.
Each pixel of the vertical line is additionally checked against the colors from the already-obtained area.
If every pixel of the line (which is as long as the image is tall) matches, or is close to, a background color, an array index is incremented, so that you know where the last character ends.
Edit 1
In about 3 seconds in GIMP, I adjusted the color curves of the image, resulting in a plain background.
This simplifies the filtering process quite a lot.
The color curves magic I did is really just a brightness/contrast adjustment, which is maybe the easiest color processing you can implement (after inversion).
I might edit this periodically to clarify more. This will certainly give you good practice. Real practice.
Doc:
BMP File Format (more than enough information to start working with bitmaps. The most important part is the bitmap structure, which is a combination of the BITMAPFILEHEADER and the DIB header, e.g. BITMAPINFOHEADER.)
Tesseract OCR (an alternative that will do everything for you. However, if you always solve your problems with the easiest solution, it will not make you a better programmer.)