I've been using the FJCore library in a Silverlight project to help with some realtime image processing, and I'm trying to figure out how to get a tad more compression and performance out of the library. Now, as I understand it, the JPEG standard allows you to specify a chroma subsampling ratio (see http://en.wikipedia.org/wiki/Chroma_subsampling and http://en.wikipedia.org/wiki/Jpeg); and it appears that this is supposed to be implemented in the FJCore library using the HsampFactor and VsampFactor arrays:
public static readonly byte[] HsampFactor = { 1, 1, 1 };
public static readonly byte[] VsampFactor = { 1, 1, 1 };
However, I'm having a hard time figuring out how to use them. It looks to me like the current values are supposed to represent 4:4:4 subsampling (i.e., no subsampling at all), and that if I wanted to get 4:1:1 subsampling, the right values would be something like this:
public static readonly byte[] HsampFactor = { 2, 1, 1 };
public static readonly byte[] VsampFactor = { 2, 1, 1 };
At least, that's the way that other similar libraries use these values (for instance, see the example code here for libjpeg).
However, neither the above values of {2, 1, 1} nor any other set of values I've tried besides {1, 1, 1} produces a legible image. Nor, looking at the code, does it seem like that's the way it's written. But for the life of me, I can't figure out what the FJCore code is actually trying to do. It seems like it's just using the sample factors to repeat operations it has already done -- i.e., if I didn't know better, I'd say it was a bug. But this is a fairly established library, based on some fairly well-established Java code, so I'd be surprised if that were the case.
Does anybody have any suggestions for how to use these values to get 4:2:2 or 4:1:1 chroma subsampling?
For what it's worth, here's the relevant code from the JpegEncoder class:
for (comp = 0; comp < _input.Image.ComponentCount; comp++)
{
    Width = _input.BlockWidth[comp];
    Height = _input.BlockHeight[comp];
    inputArray = _input.Image.Raster[comp];
    for (i = 0; i < _input.VsampFactor[comp]; i++)
    {
        for (j = 0; j < _input.HsampFactor[comp]; j++)
        {
            xblockoffset = j * 8;
            yblockoffset = i * 8;
            for (a = 0; a < 8; a++)
            {
                // set Y value. check bounds
                int y = ypos + yblockoffset + a;
                if (y >= _height) break;
                for (b = 0; b < 8; b++)
                {
                    int x = xpos + xblockoffset + b;
                    if (x >= _width) break;
                    dctArray1[a, b] = inputArray[x, y];
                }
            }
            dctArray2 = _dct.FastFDCT(dctArray1);
            dctArray3 = _dct.QuantizeBlock(dctArray2, FrameDefaults.QtableNumber[comp]);
            _huf.HuffmanBlockEncoder(buffer, dctArray3, lastDCvalue[comp], FrameDefaults.DCtableNumber[comp], FrameDefaults.ACtableNumber[comp]);
            lastDCvalue[comp] = dctArray3[0];
        }
    }
}
Notice that in the i and j loops, they're not controlling any kind of pixel skipping: if HsampFactor[0] is set to two, the code just grabs two blocks instead of one.
I figured it out. I had thought that by setting the sampling factors, you were telling the library to subsample the raster components itself. It turns out that when you set the sampling factors, you're actually telling the library the relative size of the raster components that you're providing. In other words, you need to do the chroma subsampling of the image yourself, before you ever submit it to FJCore for compression. Something like this is what it's looking for:
private byte[][,] GetSubsampledRaster()
{
    byte[][,] raster = new byte[3][,];
    raster[Y] = new byte[width / hSampleFactor[Y], height / vSampleFactor[Y]];
    raster[Cb] = new byte[width / hSampleFactor[Cb], height / vSampleFactor[Cb]];
    raster[Cr] = new byte[width / hSampleFactor[Cr], height / vSampleFactor[Cr]];

    int rgbaPos = 0;
    for (short y = 0; y < height; y++)
    {
        int Yy = y / vSampleFactor[Y];
        int Cby = y / vSampleFactor[Cb];
        int Cry = y / vSampleFactor[Cr];
        int Yx = 0, Cbx = 0, Crx = 0;
        for (short x = 0; x < width; x++)
        {
            // Convert to YCbCr colorspace. After the in-place conversion,
            // r, g, and b hold the Y, Cb, and Cr values respectively.
            byte b = RgbaSample[rgbaPos++];
            byte g = RgbaSample[rgbaPos++];
            byte r = RgbaSample[rgbaPos++];
            YCbCr.fromRGB(ref r, ref g, ref b);

            // Only include the byte in question in the raster if it matches
            // the appropriate sampling factor.
            if (IncludeInSample(Y, x, y))
            {
                raster[Y][Yx++, Yy] = r;
            }
            if (IncludeInSample(Cb, x, y))
            {
                raster[Cb][Cbx++, Cby] = g;
            }
            if (IncludeInSample(Cr, x, y))
            {
                raster[Cr][Crx++, Cry] = b;
            }

            // For YCbCr, we ignore the Alpha byte of the RGBA byte structure,
            // so advance beyond it.
            rgbaPos++;
        }
    }
    return raster;
}

static private bool IncludeInSample(int slice, short x, short y)
{
    // Hopefully this gets inlined . . .
    return ((x % hSampleFactor[slice]) == 0) && ((y % vSampleFactor[slice]) == 0);
}
There might be additional ways to optimize this, but it's working for now.
Related
I'm making a 3D voxel game to learn OpenGL (think Minecraft). I know that rendering each face of every cube is slow, so I'm working on meshing. My meshing algorithm of choice is similar to greedy meshing, although it doesn't merge adjacent quads into larger ones. Here's what some of my important code looks like:
void build_mesh(chunk *c) {
    if (c->meshes != NULL) {
        vector_free(c->meshes); // delete old mesh list
    }
    c->meshes = vector_create(); // create a new mesh list
    for (int x = 0; x < CHUNK_SIZE; x++) {
        for (int y = 0; y < CHUNK_HEIGHT; y++) {
            for (int z = 0; z < CHUNK_SIZE; z++) {
                if (c->data[x][y][z] == 1) {
                    mesh m;
                    m.pos.x = x;
                    m.pos.y = y;
                    m.pos.z = z;
                    if (x - 1 < 0 || c->data[x - 1][y][z] == 0) {
                        // if we're in here that means we have to render the quad
                        m.type = X_MIN;
                        vector_add(&c->meshes, m);
                    }
                    if (x + 1 >= CHUNK_SIZE || c->data[x + 1][y][z] == 0) {
                        m.type = X_POS;
                        vector_add(&c->meshes, m);
                    }
                    if (y - 1 < 0 || c->data[x][y - 1][z] == 0) {
                        m.type = Y_MIN;
                        vector_add(&c->meshes, m);
                    }
                    if (y + 1 >= CHUNK_HEIGHT || c->data[x][y + 1][z] == 0) {
                        m.type = Y_POS;
                        vector_add(&c->meshes, m);
                    }
                    if (z - 1 < 0 || c->data[x][y][z - 1] == 0) {
                        m.type = Z_MIN;
                        vector_add(&c->meshes, m);
                    }
                    if (z + 1 >= CHUNK_SIZE || c->data[x][y][z + 1] == 0) {
                        m.type = Z_POS;
                        vector_add(&c->meshes, m);
                    }
                }
            }
        }
    }
}

void render_chunk(chunk *c, vert *verts, unsigned int program, mat4 model, unsigned int modelLoc, bool greedy) {
    // meshing code
    if (greedy) {
        for (int i = 0; i < vector_size(c->meshes); i++) {
            glm_translate_make(model, (vec3){c->meshes[i].pos.x, c->meshes[i].pos.y, c->meshes[i].pos.z});
            setMat4(modelLoc, model);
            glBindVertexArray(verts[c->meshes[i].type].VAO);
            glDrawArrays(GL_TRIANGLES, 0, 6);
        }
        return;
    }
    for (int x = 0; x < CHUNK_SIZE; x++) {
        for (int y = 0; y < CHUNK_HEIGHT; y++) {
            for (int z = 0; z < CHUNK_SIZE; z++) {
                for (int i = 0; i < 6; i++) {
                    if (c->data[x][y][z] == 1) {
                        glm_translate_make(model, (vec3){x, y, z});
                        setMat4(modelLoc, model);
                        glBindVertexArray(verts[i].VAO);
                        glDrawArrays(GL_TRIANGLES, 0, 6);
                    }
                }
            }
        }
    }
}
build_mesh is only called when the chunk is updated, and render_chunk is called every frame. If greedy is true, the meshed path is used. The problem is that the "greedy" path is significantly slower than just rendering everything, which should not be happening. Does anyone have any idea what's going on?
Edit: After timing the mesh rendering, it takes ~30-40 ms per frame. However, it scales up really well and still takes 30-40 ms regardless of how large the chunk is.
18,432 calls to glDrawArrays is far too many; each call is a performance hit on its own because of the way GL works.
You should group your meshes into far fewer VAOs/VBOs ... for example 128 or fewer ... you can divide your voxel space into slices, so if you have 128x32x32 cubes, try putting 32x32 cubes into a single VAO/VBO and see if it makes any difference in speed ... I would also get rid of the per-cube translation and store the cube vertices in the VBO already translated.
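To illustrate the batching idea, here is a minimal sketch (not the poster's code) that packs all visible faces of a chunk into one pre-translated VBO, so the whole chunk becomes a single glDrawArrays call. It reuses the question's chunk/mesh types and assumes a hypothetical face_vertices table holding the six untranslated unit-cube faces (6 vertices of 3 floats each):

#include <stdlib.h>   /* malloc, free; GL calls assume an initialized context and loader */

void build_chunk_vbo(chunk *c, const float face_vertices[6][6][3],
                     unsigned int *vbo_out, int *vert_count_out) {
    int faces = vector_size(c->meshes);
    float *buf = malloc((size_t)faces * 6 * 3 * sizeof(float));
    int v = 0;
    for (int i = 0; i < faces; i++) {
        mesh m = c->meshes[i];
        for (int j = 0; j < 6; j++) {                           /* 6 vertices per face */
            buf[v++] = face_vertices[m.type][j][0] + m.pos.x;   /* bake the translation in */
            buf[v++] = face_vertices[m.type][j][1] + m.pos.y;
            buf[v++] = face_vertices[m.type][j][2] + m.pos.z;
        }
    }
    glGenBuffers(1, vbo_out);
    glBindBuffer(GL_ARRAY_BUFFER, *vbo_out);
    glBufferData(GL_ARRAY_BUFFER, (GLsizeiptr)(v * sizeof(float)), buf, GL_STATIC_DRAW);
    free(buf);
    *vert_count_out = faces * 6;
}

/* Per frame the whole chunk is then one call (identity model matrix,
 * since positions are already baked in):
 *     glBindVertexArray(chunkVAO);
 *     glDrawArrays(GL_TRIANGLES, 0, vert_count);
 */

Rebuild this buffer only when build_mesh runs; the per-frame loop then no longer pays per-cube driver overhead.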
My answer in the (sadly deleted) duplicate Q&A:
How to best write a voxel engine in C with performance in mind
went one step further: it represented the voxel space in a 3D texture, where each texel represents a voxel, and ray-traced it in the fragment shader with a single glDraw call rendering one quad covering the screen, using the same technique as a Wolfenstein-style ray cast, just ported to 3D.
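As a rough illustration of that approach (an assumption-laden sketch, not the deleted answer's actual code), the voxel occupancy can be uploaded into a 3D texture once per chunk update and then ray-cast in a fragment shader while drawing a single screen-covering quad; the shader itself is not shown here:

/* Sketch only: flatten c->data into the x-fastest layout glTexImage3D expects
 * (width = x, height = y, depth = z) and upload it as an 8-bit 3D texture. */
unsigned int upload_voxels_3d(chunk *c) {
    static unsigned char texels[CHUNK_SIZE * CHUNK_HEIGHT * CHUNK_SIZE];
    int n = 0;
    for (int z = 0; z < CHUNK_SIZE; z++)
        for (int y = 0; y < CHUNK_HEIGHT; y++)
            for (int x = 0; x < CHUNK_SIZE; x++)
                texels[n++] = c->data[x][y][z] ? 255 : 0;

    unsigned int tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_3D, tex);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage3D(GL_TEXTURE_3D, 0, GL_R8,
                 CHUNK_SIZE, CHUNK_HEIGHT, CHUNK_SIZE,
                 0, GL_RED, GL_UNSIGNED_BYTE, texels);
    return tex;
    /* Rendering is then one glDrawArrays(GL_TRIANGLES, 0, 6) over a
     * fullscreen quad, with the fragment shader stepping a ray per pixel
     * through this texture. */
}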
The ray-tracing part (the vertex shader casts the start rays) was ported from this:
raytrace through 3D mesh
The preview from the deleted Q&A was, IIRC, 128x128x128 or 256x256x256 voxels rendered in 12.4 ms (ignore the fps; it was measuring something else). There was a lot of room to optimize the shaders further, but I wanted to keep them as simple and understandable as I could (so no more advanced optimizations)...
There are also other options, like using point sprites, a geometry shader emitting the cubes, etc.
If lowering the number of draw calls is not enough of a speed boost, you might want to implement a BVH structure to speed up rendering ... however, for a single 128x32x32 space I see no point in that, as it should be handled with ease...
I'm currently struggling to make a 3D Sobel edge detector in C (which I am quite new to). It's not working as expected (it highlights non-edges within a solid 3D object), and I was hoping someone might see where I've gone wrong. (And sorry for the poor spacing in this post.)
First of all, im is the input image, which has been copied into tm with a 1-pixel border on each side.
I loop through the image:
for (z = im.zlo; z <= im.zhi; z++) {
    for (y = im.ylo; y <= im.yhi; y++) {
        for (x = im.xlo; x <= im.xhi; x++) {
I make an array which will house the change in the x, y, and z directions, and loop through a 3x3x3 cube:
            int dxdydz[3] = {0, 0, 0};
            for (a = -1; a < 2; a++) {
                for (b = -1; b < 2; b++) {
                    for (c = -1; c < 2; c++) {
Now here's the meat, where it gets a bit tricky. I'm weighting my Sobel operator such that if you imagine one 2D surface of the kernel, it would be {{1,2,1},{2,4,2},{1,2,1}}. In other words, the weight of a kernel pixel is related to its 4-connected nearness to the center pixel.
To accomplish this, I define e as 3 - (|a| + |b| + |c|), so that it is either 0, 1, or 2. The kernel will be weighted by 3^e at each pixel.
The sign of the kernel pixel will just be determined by the sign of a, b, or c.
                        int e = 3 - (abs(a) + abs(b) + abs(c));
Now I loop over a, b, and c by packaging them into an array and looping i from 0 to 2. When a, for example, is 0, we don't want to add anything to the x component, so we exclude that case with an if statement (8 levels deep!).
                        int abc[3] = {a, b, c};
                        for (i = 0; i < 3; i++) {
                            if (abc[i] != 0) {
The value to add should just be the image value at that pixel multiplied by the kernel value at that pixel. abc[i] is just -1 or 1, and (int)pow(3, e) is the nearness-to-center weight.
                                dxdydz[i] += abc[i]*(int)pow(3, e)*tm.u[z+a][y+b][x+c];
                            }
                        }
                    }
                }
            }
Lastly, I take the square root of the sum of the squared changes in x, y, and z:
            int mag2 = 0;
            for (i = 0; i < 3; i++) {
                mag2 += (int)pow(dxdydz[i], 2);
            }
            im.u[z][y][x] = (int)sqrt(mag2);
        }
    }
}
Of course I could just loop through the image and multiply 3x3x3 cubes by the 3D kernels:
int kx[3][3][3] = {{{-1,-2,-1},{ 0, 0, 0},{ 1, 2, 1}},
                   {{-2,-4,-2},{ 0, 0, 0},{ 2, 4, 2}},
                   {{-1,-2,-1},{ 0, 0, 0},{ 1, 2, 1}}};
int ky[3][3][3] = {{{-1,-2,-1},{-2,-4,-2},{-1,-2,-1}},
                   {{ 0, 0, 0},{ 0, 0, 0},{ 0, 0, 0}},
                   {{ 1, 2, 1},{ 2, 4, 2},{ 1, 2, 1}}};
int kz[3][3][3] = {{{-1, 0, 1},{-2, 0, 2},{-1, 0, 1}},
                   {{-2, 0, 2},{-4, 0, 4},{-2, 0, 2}},
                   {{-1, 0, 1},{-2, 0, 2},{-1, 0, 1}}};
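Applying those explicit kernels would look roughly like this (just a sketch, reusing the same im/tm arrays, loop bounds, and neighbour indexing as the loop version above):

for (z = im.zlo; z <= im.zhi; z++) {
    for (y = im.ylo; y <= im.yhi; y++) {
        for (x = im.xlo; x <= im.xhi; x++) {
            int gx = 0, gy = 0, gz = 0;
            for (a = -1; a < 2; a++) {
                for (b = -1; b < 2; b++) {
                    for (c = -1; c < 2; c++) {
                        int v = tm.u[z+a][y+b][x+c];
                        gx += kx[a+1][b+1][c+1] * v;   /* same index order as the access above */
                        gy += ky[a+1][b+1][c+1] * v;
                        gz += kz[a+1][b+1][c+1] * v;
                    }
                }
            }
            im.u[z][y][x] = (int)sqrt((double)(gx*gx + gy*gy + gz*gz));
        }
    }
}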
But I think the loop approach is a lot sexier.
This is the function I have written for 2D Convolution in C:
typedef struct PGMImage{
    int w;
    int h;
    int* data;
}GrayImage;

GrayImage Convolution2D(GrayImage image,GrayImage kernel){
    int aH,aW,bW,bH,r,c,x,y,xx,yy,X,Y;
    int temp = 0;
    GrayImage conv;
    CreateGrayImage(&conv,image.w,image.h);
    aH = image.h;
    aW = image.w;
    bH = kernel.h;
    bW = kernel.w;
    if(aW < bW || aH < bH){
        fprintf(stderr,"Image cannot have smaller dimensions than the blur kernel");
    }
    for(r = aH-1;r >= 0;r--){
        for(c = aW-1;c >= 0;c--){
            temp = 0;
            for(y = bH-1;y >= 0;y--){
                yy = bH - y - 1;
                for(x = bW-1;x >= 0;x--){
                    xx = bW - x - 1;
                    X = c + (x - (bW/2));
                    Y = r + (y - (bH/2));
                    if(X >= 0 && X < aW && Y >= 0 && Y < aH){
                        temp += ((kernel.data[(yy*bW)+xx])*(image.data[(Y*aW)+X]));
                    }
                }
            }
            conv.data[(r*aW)+c] = temp;
        }
    }
    return conv;
}
I reproduced this function in MATLAB and found that it overestimates the values for certain pixels compared to MATLAB's built-in 2D convolution function (conv2). I can't figure out where I am going wrong with the logic. Please help.
EDIT:
Here's the stock image I am using (512*512):
https://drive.google.com/file/d/0B3qeTSY-DQRvdWxCZWw5RExiSjQ/view?usp=sharing
Here's the kernel (3*3):
https://drive.google.com/file/d/0B3qeTSY-DQRvdlQzamcyVmtLVW8/view?usp=sharing
On using the above function I get
46465 46456 46564
45891 46137 46158
45781 46149 46030
But MATLAB's conv2 gives me
46596 46618 46627
46073 46400 46149
45951 46226 46153
for the same pixels (rows 239-241, columns 316-318).
This is the MATLAB code I am using to compare the values:
pgm_img = imread('path\to\lena512.pgm');
kernel = imread('path\to\test_kernel.pgm');
sz_img = size(pgm_img);
sz_ker = size(kernel);
conv = conv2(double(pgm_img),double(kernel),'same');
pgm_img = padarray(pgm_img,floor(0.5*sz_ker),'both');
convolve = zeros(sz_img);
for i=floor(0.5*sz_ker(1))+1:floor(0.5*sz_ker(1))+sz_img(1)
    for j=floor(0.5*sz_ker(2))+1:floor(0.5*sz_ker(2))+sz_img(2)
        startX = j - floor(sz_ker(2)/2);
        startY = i - floor(sz_ker(1)/2);
        endX = j + floor(sz_ker(2)/2);
        endY = i + floor(sz_ker(1)/2);
        block = pgm_img(startY:endY,startX:endX);
        prod = double(block).*double(kernel);
        convolve(i-floor(0.5*sz_ker(1)),j-floor(0.5*sz_ker(2))) = sum(sum(prod));
    end
end
disp(conv(239:241,316:318));
disp(convolve(239:241,316:318));
One obvious difference is that your C code uses ints, while the MATLAB code uses doubles. Change your C code to use doubles and see whether the results are still different.
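For example, a sketch of that change (GrayImageD is a hypothetical variant of the question's struct with a double* data field; the loop keeps the same kernel flipping and bounds check as the original):

#include <stdlib.h>   /* malloc */

typedef struct {
    int w, h;
    double *data;
} GrayImageD;

GrayImageD Convolution2D_d(GrayImageD image, GrayImageD kernel) {
    GrayImageD conv = { image.w, image.h,
                        malloc((size_t)image.w * image.h * sizeof(double)) };
    for (int r = 0; r < image.h; r++) {
        for (int c = 0; c < image.w; c++) {
            double acc = 0.0;                                  /* double accumulator */
            for (int y = 0; y < kernel.h; y++) {
                for (int x = 0; x < kernel.w; x++) {
                    /* flipped kernel, same mapping as the original code */
                    int Y = r + (kernel.h - 1 - y) - kernel.h / 2;
                    int X = c + (kernel.w - 1 - x) - kernel.w / 2;
                    if (X >= 0 && X < image.w && Y >= 0 && Y < image.h)
                        acc += kernel.data[y * kernel.w + x] * image.data[Y * image.w + X];
                }
            }
            conv.data[r * image.w + c] = acc;
        }
    }
    return conv;
}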
I created an image convolution library for the simple case of an image that is a plain 2D float array.
The function supports arbitrary kernels and has been verified against MATLAB's implementation.
So all that's needed on your side is to call it with your generated kernel.
You can use the generated DLL inside MATLAB and see that it yields the same results as MATLAB's image convolution functions.
Image Convolution - GitHub.
I implemented a blurring algorithm and it works: the result is a blurred image. But if I run the algorithm over my image multiple times, the image remains unchanged; the extra passes (more than one) seem to have no effect.
for (f=0; f<100; f++) {
    for (y = 0; y < image->h; y++) {
        for (x = 0; x < image->w; x++) {
            int SUM = 0;
            // ... 3x3 neighbourhood of image->pixels summed into SUM
            //     (omitted from the post) ...
            imageBlur->pixels[y * imageBlur->w + x] = SUM / 9;
        }
    }
}
It doesn't matter whether f is 1 or 500; the result is the same as a one-pass blur.
On each pass you are reading the same source image again, without having replaced it with the blurred one in imageBlur.
You need to write the result back somehow, something like:
image->pixels[y * imageBlur->w + x] = SUM / 9;
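Another way to make repeated passes take effect, without blurring in place, is to ping-pong between two buffers so each pass reads the output of the previous one. A rough sketch, assuming src and dst point at two same-sized int pixel buffers (e.g. image->pixels and imageBlur->pixels) with w and h the dimensions:

for (int f = 0; f < passes; f++) {
    for (int y = 1; y < h - 1; y++) {          /* borders skipped for brevity */
        for (int x = 1; x < w - 1; x++) {
            int sum = 0;
            for (int dy = -1; dy <= 1; dy++)   /* 3x3 box sum */
                for (int dx = -1; dx <= 1; dx++)
                    sum += src[(y + dy) * w + (x + dx)];
            dst[y * w + x] = sum / 9;
        }
    }
    int *tmp = src;                            /* swap: next pass reads this pass's output */
    src = dst;
    dst = tmp;
}
/* after the loop, src points at the most recently blurred buffer */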
I have a problem in some C code; I assume it belongs here rather than on the Mathematics exchange.
I have an array of changes in x and y position generated by a user dragging a mouse. How can I determine whether a straight line was drawn or not?
I am currently using linear regression; is there a better (more efficient) way to do this?
EDIT:
Hough transformation attempt:
#define abSIZE 100
#define ARRAYSIZE 10

int A[abSIZE][abSIZE]; //points in the a-b plane
int dX[10] = {0, 10, 13, 8, 20, 18, 19, 22, 12, 23};
int dY[10] = {0, 2, 3, 1, -1, -2, 0, 0, 3, 1};
int absX[10]; //absolute positions
int absY[10];
int error = 0;
int sumx = 0, sumy = 0, i;

//Convert deltas to absolute positions
for (i = 0; i < 10; i++) {
    absX[i] = sumx += dX[i];
    absY[i] = sumy += dY[i];
}

//initialise array to zero
int a, b, x, y;
for (a = -abSIZE/2; a < abSIZE/2; a++) {
    for (b = -abSIZE/2; b < abSIZE/2; b++) {
        A[a+abSIZE/2][b+abSIZE/2] = 0;
    }
}

//Hough transform
int aMax = 0;
int bMax = 0;
int highest = 0;
for (i = 0; i < 10; i++) {
    x = absX[i];
    y = absY[i];
    for (a = -abSIZE/2; a < abSIZE/2; a++) {
        for (b = -abSIZE/2; b < abSIZE/2; b++) {
            if (a*x + b == y) {
                A[a+abSIZE/2][b+abSIZE/2] += 1;
                if (A[a+abSIZE/2][b+abSIZE/2] > highest) {
                    highest++; //highest = A[a+abSIZE/2][b+abSIZE/2]
                    aMax = a;
                    bMax = b;
                }
            }
        }
    }
}
printf("Line is Y = %d*X + %d\n", aMax, bMax);

//Calculate MSE
int e;
for (i = 0; i < ARRAYSIZE; i++) {
    e = absY[i] - (aMax * absX[i] + bMax);
    e = (int) pow((double)e, 2);
    error += e;
}
printf("error is: %d\n", error);
Though linear regression sounds like a perfectly reasonable way to solve the task, here's another suggestion: the Hough transform, which might be somewhat more robust against outliers. Here is a very rough sketch of how it can be applied:
- Initialize a large matrix A with zeros.
- Transform your deltas to absolute coordinates (x, y) in the x-y plane (e.g., starting at (0, 0)).
- For each point: there are (non-unique) parameters a and b such that a*x + b = y; all such pairs (a, b) define a straight line in the a-b plane. Draw this "line" in the a-b plane by adding ones to the corresponding cells of A, which represents the quantized plane.
- Now find the maximum in the a-b-plane matrix A; it corresponds to the parameters (a, b) of the straight line in the x-y plane that has the most support from the original points.
- Finally, calculate the MSE against the original points and decide with some threshold whether the move was a straight line.
More details e.g. here:
http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/MARSHALL/node32.html
Edit: here's a quote from Wikipedia that explains why it's better to use a different parametrization to deal with vertical lines (where a would become infinite in ax+b=y):
However, vertical lines pose a problem. They are more naturally described as x = a and would give rise to unbounded values of the slope parameter m. Thus, for computational reasons, Duda and Hart proposed the use of a different pair of parameters, denoted r and theta, for the lines in the Hough transform. These two values, taken in conjunction, define a polar coordinate.
Thanks to Zaw Lin for pointing this out.
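For completeness, here is a sketch of what the voting loop might look like in the (r, theta) parametrization, reusing the absX/absY arrays and ARRAYSIZE from the question's attempt. THETA_STEPS and R_MAX are illustrative constants, and <math.h> is needed for cos/sin/lround:

#define THETA_STEPS 180
#define R_MAX       512   /* must exceed the largest |r| = |x*cos + y*sin| expected */

static int acc[THETA_STEPS][2 * R_MAX + 1];   /* zero-initialized accumulator */
const double PI = 3.14159265358979323846;
int bestT = 0, bestR = 0, best = 0;

for (i = 0; i < ARRAYSIZE; i++) {
    for (int t = 0; t < THETA_STEPS; t++) {
        double theta = t * PI / THETA_STEPS;   /* 0 .. pi */
        int r = (int)lround(absX[i] * cos(theta) + absY[i] * sin(theta));
        if (r < -R_MAX || r > R_MAX) continue;
        if (++acc[t][r + R_MAX] > best) {      /* each point votes for one r bin per theta */
            best = acc[t][r + R_MAX];
            bestT = t;
            bestR = r;
        }
    }
}
printf("Best line: r = %d, theta = %f rad (votes: %d)\n",
       bestR, bestT * PI / THETA_STEPS, best);

Because r and theta are always bounded, vertical strokes accumulate votes just like any other direction, which is the point of the parametrization quoted above.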