OpenGL Meshing Slower Than Not Meshing - C

I'm making a 3D voxel game to learn OpenGL (think Minecraft). I know that rendering each face of each cube is slow, so I'm working on meshing. My meshing algorithm is similar to greedy meshing, although it doesn't merge adjacent quads into larger ones. Here's what some of my important code looks like:
void build_mesh(chunk *c) {
    if (c->meshes != NULL) {
        vector_free(c->meshes); // delete the old mesh list
    }
    c->meshes = vector_create(); // create a new mesh list
    for (int x = 0; x < CHUNK_SIZE; x++) {
        for (int y = 0; y < CHUNK_HEIGHT; y++) {
            for (int z = 0; z < CHUNK_SIZE; z++) {
                if (c->data[x][y][z] == 1) {
                    mesh m;
                    m.pos.x = x;
                    m.pos.y = y;
                    m.pos.z = z;
                    if (x - 1 < 0 || c->data[x - 1][y][z] == 0) {
                        // if we're in here, we have to render the quad
                        m.type = X_MIN;
                        vector_add(&c->meshes, m);
                    }
                    if (x + 1 >= CHUNK_SIZE || c->data[x + 1][y][z] == 0) {
                        m.type = X_POS;
                        vector_add(&c->meshes, m);
                    }
                    if (y - 1 < 0 || c->data[x][y - 1][z] == 0) {
                        m.type = Y_MIN;
                        vector_add(&c->meshes, m);
                    }
                    if (y + 1 >= CHUNK_HEIGHT || c->data[x][y + 1][z] == 0) {
                        m.type = Y_POS;
                        vector_add(&c->meshes, m);
                    }
                    if (z - 1 < 0 || c->data[x][y][z - 1] == 0) {
                        m.type = Z_MIN;
                        vector_add(&c->meshes, m);
                    }
                    if (z + 1 >= CHUNK_SIZE || c->data[x][y][z + 1] == 0) {
                        m.type = Z_POS;
                        vector_add(&c->meshes, m);
                    }
                }
            }
        }
    }
}
void render_chunk(chunk *c, vert *verts, unsigned int program, mat4 model, unsigned int modelLoc, bool greedy) {
    // meshing code
    if (greedy) {
        for (int i = 0; i < vector_size(c->meshes); i++) {
            glm_translate_make(model, (vec3){c->meshes[i].pos.x, c->meshes[i].pos.y, c->meshes[i].pos.z});
            setMat4(modelLoc, model);
            glBindVertexArray(verts[c->meshes[i].type].VAO);
            glDrawArrays(GL_TRIANGLES, 0, 6);
        }
        return;
    }
    for (int x = 0; x < CHUNK_SIZE; x++) {
        for (int y = 0; y < CHUNK_HEIGHT; y++) {
            for (int z = 0; z < CHUNK_SIZE; z++) {
                for (int i = 0; i < 6; i++) {
                    if (c->data[x][y][z] == 1) {
                        glm_translate_make(model, (vec3){x, y, z});
                        setMat4(modelLoc, model);
                        glBindVertexArray(verts[i].VAO);
                        glDrawArrays(GL_TRIANGLES, 0, 6);
                    }
                }
            }
        }
    }
}
build_mesh only gets called when the chunk gets updated, and render_chunk gets called every frame. If greedy is true, the meshed path is used. However, the problem is that the meshed rendering is significantly slower than just rendering everything, which should not be happening. Does anyone have any ideas what's going on?
Edit: After timing the mesh rendering, it takes ~30-40 ms per frame. However, it scales well: it still takes 30-40 ms regardless of how large the chunk is.

18432 calls to glDrawArrays is far too many; each call is a performance hit by itself due to the way GL works.
You should group your meshes into far fewer VAOs/VBOs ... for example 128 or fewer ... you can divide your voxel space into slices, so if you have 128x32x32 cubes, try putting 32x32 cubes into a single VAO/VBO and see if it makes any difference in speed ... I would also get rid of the per-cube translation and store the cube vertices in the VBO already translated.
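As a rough illustration of that grouping (this is a sketch, not code from your project): assuming a hypothetical face_verts[6][6][3] lookup table holding the six untranslated vertices of each face type, and reusing the mesh list that build_mesh already produces, the whole chunk can be packed into one VBO with the positions baked in:
// Build one VAO/VBO for the whole chunk. Every visible face is written out
// already translated by its cube position, so no per-face uniform updates
// are needed at draw time. face_verts is an assumed table of unit-cube
// face vertices (positions only, for brevity).
void build_chunk_vbo(chunk *c, unsigned int *vao, unsigned int *vbo, int *vert_count) {
    int faces = vector_size(c->meshes);
    float *data = malloc((size_t)faces * 6 * 3 * sizeof *data);
    int k = 0;
    for (int i = 0; i < faces; i++) {
        mesh m = c->meshes[i];
        for (int v = 0; v < 6; v++) {
            data[k++] = face_verts[m.type][v][0] + m.pos.x;
            data[k++] = face_verts[m.type][v][1] + m.pos.y;
            data[k++] = face_verts[m.type][v][2] + m.pos.z;
        }
    }
    glGenVertexArrays(1, vao);
    glGenBuffers(1, vbo);
    glBindVertexArray(*vao);
    glBindBuffer(GL_ARRAY_BUFFER, *vbo);
    glBufferData(GL_ARRAY_BUFFER, (GLsizeiptr)(k * sizeof *data), data, GL_STATIC_DRAW);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), (void *)0);
    glEnableVertexAttribArray(0);
    free(data);
    *vert_count = faces * 6;
}
Rendering the chunk then comes down to one glBindVertexArray plus a single glDrawArrays(GL_TRIANGLES, 0, vert_count), with the model matrix set once per chunk instead of once per face.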
My answer in the duplicate (sadly deleted) QA:
How to best write a voxel engine in C with performance in mind
went one step further, representing your voxel space in a 3D texture where each texel represents a voxel, and ray tracing it in the fragment shader using just a single glDraw rendering a single QUAD covering the screen. It uses the same technique as a Wolfenstein-style ray cast, just ported to 3D.
The ray tracing (vertex shader casts the start rays) stuff was ported from this:
raytrace through 3D mesh
Here is a preview from the deleted QA:
IIRC it was 128x128x128 or 256x256x256 voxels rendered in 12.4 ms (ignore the fps; it was measuring something else). There was a lot of room to optimize the shaders further, as I wanted to keep them as simple and understandable as I could (so no more advanced optimizations)...
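Just to make the 3D-texture idea concrete (this is an illustrative sketch, not the deleted answer's code), uploading the occupancy grid once could look roughly like this, assuming c->data is a contiguous array of 8-bit voxel values:
// Upload the voxel grid as an unsigned-integer 3D texture that the fragment
// shader can sample with texelFetch() while marching rays.
GLuint voxel_tex;
glGenTextures(1, &voxel_tex);
glBindTexture(GL_TEXTURE_3D, voxel_tex);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
// Note the axis order: the innermost index of c->data becomes the texture width.
glTexImage3D(GL_TEXTURE_3D, 0, GL_R8UI,
             CHUNK_SIZE, CHUNK_HEIGHT, CHUNK_SIZE, 0,
             GL_RED_INTEGER, GL_UNSIGNED_BYTE, c->data);
// A single full-screen quad is then drawn, and the fragment shader steps
// through the texture cell by cell (the 3D analogue of the Wolfenstein cast).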
There are also other options, like using point sprites, or a geometry shader emitting the cubes, etc ...
In case lowering the number of glDraw calls is not enough of a speed boost, you might want to implement BVH structures to speed up rendering ... however, for a single 128x32x32 space I see no point in this, as that should be handled with ease...

Related

3D Sobel Operator Algorithm in C

I'm currently struggling to make a 3D Sobel edge detector in C (which I am quite new to). It's not exactly working as expected (it highlights non-edges within a solid 3D object), and I was hoping someone might see where I've gone wrong. (And sorry for the poor spacing in this post.)
First of all, im is the input image, which has been copied into tm with a 1-pixel border on each side.
I loop through the image:
for (z = im.zlo; z <= im.zhi; z++) {
    for (y = im.ylo; y <= im.yhi; y++) {
        for (x = im.xlo; x <= im.xhi; x++) {
I make an array which will house the change in the x, y, and z directions, and loop through a 3x3x3 cube:
            int dxdydz[3] = {0, 0, 0};
            for (a = -1; a < 2; a++) {
                for (b = -1; b < 2; b++) {
                    for (c = -1; c < 2; c++) {
Now here's the meat, where it gets a bit tricky. I'm weighting my Sobel operator such that if you imagine one 2D surface of the kernel, it would be {{1,2,1},{2,4,2},{1,2,1}}. In other words, the weight of a kernel pixel is related to its 4-connected nearness to the center pixel.
To accomplish this, I define e as 3 - (|a| + |b| + |c|), so that it is either 0, 1, or 2. The kernel will be weighted by 3^e at each pixel.
The sign of the kernel pixel will just be determined by the sign of a, b, or c.
                        int e = 3 - (abs(a) + abs(b) + abs(c));
Now I loop through a, b, and c by packaging them into an array and looping from 0 to 2. When a, for example, is 0, we don't want to add anything for that direction, so we exclude it with an if statement (8 levels deep!).
                        int abc[3] = {a, b, c};
                        for (i = 0; i < 3; i++) {
                            if (abc[i] != 0) {
The value to add should just be the image value at that pixel multiplied by the kernel value at that pixel. abc[i] is just -1 or 1, and (int)pow(3, e) is the nearness-to-center weight.
                                dxdydz[i] += abc[i] * (int)pow(3, e) * tm.u[z+a][y+b][x+c];
                            }
                        }
                    }
                }
            }
Lastly take the sqrt of the sum of the squared changes in x, y, and z.
            int mag2 = 0;
            for (i = 0; i < 3; i++) {
                mag2 += (int)pow(dxdydz[i], 2);
            }
            im.u[z][y][x] = (int)sqrt(mag2);
        }
    }
}
Of course I could just loop through the image and multiply 3x3x3 cubes by the 3D kernels:
int kx[3][3][3] = {{{-1,-2,-1},{0,0,0},{1,2,1}},
                   {{-2,-4,-2},{0,0,0},{2,4,2}},
                   {{-1,-2,-1},{0,0,0},{1,2,1}}};
int ky[3][3][3] = {{{-1,-2,-1},{-2,-4,-2},{-1,-2,-1}},
                   {{0,0,0},{0,0,0},{0,0,0}},
                   {{1,2,1},{2,4,2},{1,2,1}}};
int kz[3][3][3] = {{{-1,0,1},{-2,0,2},{-1,0,1}},
                   {{-2,0,2},{-4,0,4},{-2,0,2}},
                   {{-1,0,1},{-2,0,2},{-1,0,1}}};
But I think the loop approach is a lot sexier.
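For what it's worth, here is roughly what that kernel-based version would look like (a sketch, untested, reusing the same tm/im structs and loop variables as above):
// Direct 3D convolution: accumulate the three directional gradients with the
// kernels defined above, then store the gradient magnitude.
for (z = im.zlo; z <= im.zhi; z++) {
    for (y = im.ylo; y <= im.yhi; y++) {
        for (x = im.xlo; x <= im.xhi; x++) {
            int gx = 0, gy = 0, gz = 0;
            for (a = -1; a < 2; a++) {
                for (b = -1; b < 2; b++) {
                    for (c = -1; c < 2; c++) {
                        int v = tm.u[z+a][y+b][x+c];
                        gx += kx[a+1][b+1][c+1] * v;
                        gy += ky[a+1][b+1][c+1] * v;
                        gz += kz[a+1][b+1][c+1] * v;
                    }
                }
            }
            im.u[z][y][x] = (int)sqrt((double)(gx*gx + gy*gy + gz*gz));
        }
    }
}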

Check whether the polygon is convex

I need to check whether a polygon is convex.
I know that there have been questions here about this before, but I need my code checked; is it right?
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int check_figure(float* x_points[], float* y_points[]);

int main(void) {
    int n;
    scanf("%i", &n);
    int i = 0;
    float **x_points = NULL, **y_points = NULL;
    x_points = (float**) malloc(sizeof(float*) * (n + 1));
    if (x_points == NULL) {
        return 0;
    }
    y_points = (float**) malloc(sizeof(float*) * (n + 1));
    if (y_points == NULL) {
        return 0;
    }
    for (i = 0; i < n; i++) {
        x_points[i] = (float*) malloc((n + 1) * sizeof(float));
        scanf("%f", x_points[i]);
        y_points[i] = (float*) malloc((n + 1) * sizeof(float));
        scanf("%f", y_points[i]);
    }
    for (i = 0; i < n - 1; i++) {
        if ((x_points[i] == NULL) || (y_points[i] == NULL)) {
            return 0;
        }
    }
    x_points[n] = NULL;
    y_points[n] = NULL;
    int convex = check_figure(x_points, y_points);
    if (convex == 1) {
        printf("%s", "true");
    } else {
        printf("%s", "false");
    }
    free(x_points);
    free(y_points);
    //free(convex);
    return 0;
}

int check_figure(float *x_points[], float *y_points[]) {
    float first = 0, booll = 1, sign = 0, result = 0;
    int i = 0;
    //int *convex = (int*)malloc(sizeof(int));
    int convex;
    while (1) {
        if (x_points[i] != NULL) {
            i++;
        } else {
            break;
        }
    }
    first = *x_points[i - 1] * *y_points[0] - *y_points[i - 1] * *x_points[0];
    sign = first / fabsf(first);
    int k;
    for (k = 0; k < i - 2; k++) {
        result = *x_points[k] * *y_points[k + 1] - *x_points[k + 1] * *y_points[k];
        booll = booll * sign * result / fabsf(result);
        if (booll < 0) {
            convex = 0;
            return convex;
        } else {
            convex = 1;
            return convex;
        }
    }
}
Here is a sample: for example, I input 4 and then I input 0,2; 2,-2; 0,0; -2,-2; and it returns true, but the polygon is not convex... I really don't get it.
A polygon is convex if every interior angle is at most 180 degrees (at exactly 180 it isn't really an angle, but depending on what you're doing, those sometimes happen). So, just make sure every vertex turns the same way.
In 2 dimensions this is not that hard: you make sure your polygons are always wound the same way, then to test the angle between segments ab and bc, you create another vector by rotating vec2(a - b) 90 degrees towards the middle of the polygon (always the same rotation, because it's always wound the same way; a 90-degree rotation here can be accomplished by swapping the x and y values and then negating one of them, based on which way you are rotating). Then, if the dot product of that rotated vector and the vector vec2(c - b) is positive, the angle is convex; if it is negative, it is reflex; and if it is 0 it is a straight line. In three dimensions it is also not that hard, but you have to rotate within the same plane as the original angle.
Looking at your code, I just have no idea how it is supposed to determine whether or not an angle is convex. At some point you need an angle, or a sine of an angle, or a cosine of an angle (obtained in this method via a dot product), or something somehow related to an angle. There are more direct ways to do this as well; this is just a relatively performant one.
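To illustrate the same idea without trigonometry (a sketch, not a fix of your code above; it assumes the coordinates are stored in plain float arrays rather than your pointer arrays): the polygon is convex exactly when the z components of the cross products of consecutive edges all share a sign.
#include <stdbool.h>

// Returns true when the n vertices (in order, either winding) form a convex
// polygon: every pair of consecutive edges must turn the same way.
static bool is_convex(const float *xs, const float *ys, int n) {
    int pos = 0, neg = 0;
    for (int i = 0; i < n; i++) {
        float ex1 = xs[(i + 1) % n] - xs[i];
        float ey1 = ys[(i + 1) % n] - ys[i];
        float ex2 = xs[(i + 2) % n] - xs[(i + 1) % n];
        float ey2 = ys[(i + 2) % n] - ys[(i + 1) % n];
        float cross = ex1 * ey2 - ey1 * ex2;  // z component of the 2D cross product
        if (cross > 0) pos++;
        if (cross < 0) neg++;
    }
    return pos == 0 || neg == 0;              // mixed signs mean a reflex vertex
}
With your sample input 0,2; 2,-2; 0,0; -2,-2 this reports false, because the turn at (0,0) goes the opposite way from the other three.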

Using two Arrays in C/Gameboy programming

For a game in Gameboy programming, I am using four arrays called top, oldTop, bottom and oldBottom:
struct Point { int x, y; };
struct Rect { struct Point xx, yy; };
Rect top[size], oldTop[size];
Rect bottom[size], oldBottom[i];
where Rect is a struct made of two struct Points, the top-left and the bottom-right corner points.
The idea of the game is to have random-height blocks hanging down from the ceiling and rising up from the floor. It is similar to the classic copter game.
In my infinite while loop, I shift all of the rectangles across by one pixel using the following code:
while (1)
{
    for (int i = 0; i < size; i++)
    {
        //in Struct Rect, xx is the top-left corner point, and yy is the bottom right
        top[i].xx.x--;
        top[i].yy.x--;
        bottom[i].xx.x--;
        bottom[i].yy.x--;
        if (top[i].xx.x < 0)
        {
            top[i].xx.x += 240;
            top[i].yy.x += 240;
        }
        if (bottom[i].xx.x < 0)
        {
            bottom[i].xx.x += 240;
            bottom[i].yy.x += 240;
        }
    }
    for (int i = 0; i < size; i++)
    {
        drawRect(oldTop[i], colorBlack);
        drawRect(oldBottom[i], colorBlack);
    }
    /*call delay function that wait for Vertical Blank*/
    for (int i = 0; i < size; i++)
    {
        drawRect(top[i], colorGreen);
        drawRect(bottom[i], colorGreen);
        oldTop[i] = top[i];
        oldBottom[i] = bottom[i];
    }
}
The drawRect method uses DMA to draw the rectangle.
with this code, the code should display the rectangles like this: (drew this up in paint)
But the result I get is
What is odd is that if I don't draw the bottom row at all, then the top row draws fine. The result only messes up when I draw both. This is really weird because I think that the code should be working fine, and the code is not very complicated. Is there a specific reason this is happening, and is there a way to remedy this?
Thanks.
The code that I use to draw the rectangle looks like this:
void drawRect(int row, int col, int width, int height) {
    int i;
    for (i = 0; i < height; i++)
    {
        DMA[3].src = &color;
        DMA[3].dst = videoBuffer + (row + i) * 240 + col;
        DMA[3].cnt = DMA_ON | DMA_FIXED_SOURCE | width;
    }
}
Here's a debugging SSCCE (Short, Self-Contained, Correct Example) based on your code. There are assertions in this code that fire; it runs, but is known not to be correct. I've renamed bottom to btm and oldBottom to oldBtm so that the names are symmetric; it makes the code layout more systematic (but is otherwise immaterial).
#include <assert.h>
#include <stdio.h>

typedef struct Point { int x, y; } Point;
typedef struct Rect { struct Point xx, yy; } Rect;

enum { size = 2 };

typedef enum { colourGreen = 0, colourBlack = 1 } Colour;

/*ARGSUSED*/
static void drawRect(Rect r, Colour c)
{
    printf(" (%3d)(%3d)", r.xx.x, r.yy.x);
}

int main(void)
{
    Rect top[size], oldTop[size];
    Rect btm[size], oldBtm[size];
    int counter = 0;

    for (int i = 0; i < size; i++)
    {
        top[i].xx.x = 240 - 4 * i;
        top[i].xx.y = 0 + 10 + i;
        top[i].yy.x = 240 - 14 * i;
        top[i].yy.y = 0 + 20 + i;
        btm[i].xx.x = 0 + 72 * i;
        btm[i].xx.y = 0 + 10 * i;
        btm[i].yy.x = 0 + 12 * i;
        btm[i].yy.y = 0 + 20 * i;
        oldTop[i] = top[i];
        oldBtm[i] = btm[i];
    }

    while (1)
    {
        if (counter++ > 480) // Limit amount of output!
            break;
        for (int i = 0; i < size; i++)
        {
            //in Struct Rect, xx is the top-left corner point, and yy is the bottom right
            top[i].xx.x--;
            top[i].yy.x--;
            btm[i].xx.x--;
            btm[i].yy.x--;
            if (top[i].xx.x < 0)
            {
                top[i].xx.x += 240;
                top[i].yy.x += 240;
            }
            if (btm[i].xx.x < 0)
            {
                btm[i].xx.x += 240;
                btm[i].yy.x += 240;
            }
        }
        for (int i = 0; i < size; i++)
        {
            assert(top[i].xx.x >= 0 && top[i].yy.x >= 0);
            assert(btm[i].xx.x >= 0 && btm[i].yy.x >= 0);
        }
        for (int i = 0; i < size; i++)
        {
            drawRect(oldTop[i], colourBlack);
            drawRect(oldBtm[i], colourBlack);
        }
        /*call delay function that wait for Vertical Blank*/
        for (int i = 0; i < size; i++)
        {
            drawRect(top[i], colourGreen);
            drawRect(btm[i], colourGreen);
            oldTop[i] = top[i];
            oldBtm[i] = btm[i];
        }
        putchar('\n');
    }
    return(0);
}
As noted in a late comment, one big difference between this and your code is that oldBottom in your code is declared as:
Rect top[size], oldTop[size];
Rect bottom[size], oldBottom[i];
using the size i instead of size. This probably accounts for the array overwriting issues you see.
There's a second problem though; the assertions in the loop in the middle fire:
(240)(240) ( 0)( 0) (236)(226) ( 72)( 12) (239)(239) (239)(239) (235)(225) ( 71)( 11)
(239)(239) (239)(239) (235)(225) ( 71)( 11) (238)(238) (238)(238) (234)(224) ( 70)( 10)
(238)(238) (238)(238) (234)(224) ( 70)( 10) (237)(237) (237)(237) (233)(223) ( 69)( 9)
(237)(237) (237)(237) (233)(223) ( 69)( 9) (236)(236) (236)(236) (232)(222) ( 68)( 8)
(236)(236) (236)(236) (232)(222) ( 68)( 8) (235)(235) (235)(235) (231)(221) ( 67)( 7)
(235)(235) (235)(235) (231)(221) ( 67)( 7) (234)(234) (234)(234) (230)(220) ( 66)( 6)
(234)(234) (234)(234) (230)(220) ( 66)( 6) (233)(233) (233)(233) (229)(219) ( 65)( 5)
(233)(233) (233)(233) (229)(219) ( 65)( 5) (232)(232) (232)(232) (228)(218) ( 64)( 4)
(232)(232) (232)(232) (228)(218) ( 64)( 4) (231)(231) (231)(231) (227)(217) ( 63)( 3)
(231)(231) (231)(231) (227)(217) ( 63)( 3) (230)(230) (230)(230) (226)(216) ( 62)( 2)
(230)(230) (230)(230) (226)(216) ( 62)( 2) (229)(229) (229)(229) (225)(215) ( 61)( 1)
(229)(229) (229)(229) (225)(215) ( 61)( 1) (228)(228) (228)(228) (224)(214) ( 60)( 0)
Assertion failed: (btm[i].xx.x >= 0 && btm[i].yy.x >= 0), function main, file video.c, line 63.
I think your 'not negative' checks should be revised to:
if (top[i].xx.x < 0)
    top[i].xx.x += 240;
if (top[i].yy.x < 0)
    top[i].yy.x += 240;
if (btm[i].xx.x < 0)
    btm[i].xx.x += 240;
if (btm[i].yy.x < 0)
    btm[i].yy.x += 240;
This stops anything going negative. However, it is perfectly plausible that you should simply be checking on the bottom-right x-coordinate (instead of the top-left coordinate) using the original block. Or the wraparound may need to be more complex altogether. That's for you to decipher. But I think that the odd displays occur because you were providing negative values where you didn't intend to and weren't supposed to.
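For the record, that alternative (a sketch of one plausible reading, not necessarily the behaviour your game wants) would wrap only once the bottom-right x goes negative, so a rectangle slides fully off-screen before reappearing:
if (top[i].yy.x < 0)
{
    top[i].xx.x += 240;
    top[i].yy.x += 240;
}
if (btm[i].yy.x < 0)
{
    btm[i].xx.x += 240;
    btm[i].yy.x += 240;
}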
The key points to note here are:
When you're debugging an algorithm, you don't have to use the normal display mechanisms.
When you're debugging, reduce loop sizes where you can (size == 2).
Printing just the relevant information (here, the x-coordinates) helped reduce the output.
Putting the counter code to limit the amount of output simplifies things.
If things are going wrong, look for patterns in what is going wrong early.
I had various versions of the drawRect() function before I got to the design shown, which works well in a wide (e.g. 120x65) terminal window.

Progressive loop through pairs of increasing integers

Suppose one wanted to search for pairs of integers x and y that satisfy some equation, such as (off the top of my head) 7x^2 + xy - 3y^2 = 5.
(I know there are quite efficient methods for finding integer solutions to quadratics like that, but this is irrelevant for the purpose of the present question.)
The obvious approach is a simple double loop: "for x = -max to max; for y = -max to max { blah }". But to allow the search to be stopped and resumed, a more convenient approach, picturing the possible values of x and y as a square lattice of points in the plane, is to work round a "square spiral" outward from the origin, starting and stopping at (say) the top-right corner.
So basically, I am asking for a simple and sound "pseudo-code" for the loops to start and stop this process at points (m, m) and (n, n) respectively.
For extra kudos, if the reader is inclined, I suggest also providing the loops for the case where one of x and y can be assumed non-negative, or where both can be. This is probably somewhat easier, especially the second case.
I could whump this up myself without much difficulty, but am interested in seeing neat ideas of others.
This would make quite a good "constructive" interview challenge for those dreaded interviewers who like to torture candidates with white boards ;-)
def enumerateIntegerPairs(fromRadius, toRadius):
    for radius in range(fromRadius, toRadius + 1):
        if radius == 0: yield (0, 0)
        for x in range(-radius, radius): yield (x, radius)
        for y in range(-radius, radius): yield (radius, -y)
        for x in range(-radius, radius): yield (-x, -radius)
        for y in range(-radius, radius): yield (-radius, y)
Here is a straightforward implementation (also on ideone):
#include <stdio.h>

void turn(int *dr, int *dc) {
    int tmp = *dc;
    *dc = -*dr;
    *dr = tmp;
}

int main(void) {
    int N = 3;
    int r = 0, c = 0;
    int sz = 0;
    int dr = 1, dc = 0, cnt = 0;
    while (r != N+1 && c != N+1) {
        printf("%d %d\n", r, c);
        if (cnt == sz) {
            turn(&dr, &dc);
            cnt = 0;
            if (dr == 0 && dc == -1) {
                r++;
                c++;
                sz += 2;
            }
        }
        cnt++;
        r += dr;
        c += dc;
    }
    return 0;
}
The key to the implementation is the turn function, which performs a right turn given a pair of {delta-Row, delta-Col} deltas. The rest is straightforward arithmetic.
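As a usage sketch (purely illustrative, treating r as x and c as y), the printf in the loop is where each candidate pair would be tested against an equation such as the one in the question:
// Hypothetical test of each visited lattice point against 7x^2 + xy - 3y^2 = 5.
if (7*r*r + r*c - 3*c*c == 5)
    printf("solution: x = %d, y = %d\n", r, c);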

VsampFactor and HsampFactor in FJCore library

I've been using the FJCore library in a Silverlight project to help with some realtime image processing, and I'm trying to figure out how to get a tad more compression and performance out of the library. Now, as I understand it, the JPEG standard allows you to specify a chroma subsampling ratio (see http://en.wikipedia.org/wiki/Chroma_subsampling and http://en.wikipedia.org/wiki/Jpeg); and it appears that this is supposed to be implemented in the FJCore library using the HsampFactor and VsampFactor arrays:
public static readonly byte[] HsampFactor = { 1, 1, 1 };
public static readonly byte[] VsampFactor = { 1, 1, 1 };
However, I'm having a hard time figuring out how to use them. It looks to me like the current values are supposed to represent 4:4:4 subsampling (e.g., no subsampling at all), and that if I wanted to get 4:1:1 subsampling, the right values would be something like this:
public static readonly byte[] HsampFactor = { 2, 1, 1 };
public static readonly byte[] VsampFactor = { 2, 1, 1 };
At least, that's the way that other similar libraries use these values (for instance, see the example code here for libjpeg).
However, neither the above values of {2, 1, 1} nor any other set of values I've tried besides {1, 1, 1} produces a legible image. Nor, looking at the code, does it seem like that's the way it's written. But for the life of me, I can't figure out what the FJCore code is actually trying to do. It seems like it's just using the sample factors to repeat operations it has already done -- i.e., if I didn't know better, I'd say it was a bug. But this is a fairly established library, based on some fairly well-established Java code, so I'd be surprised if that were the case.
Does anybody have any suggestions for how to use these values to get 4:2:2 or 4:1:1 chroma subsampling?
For what it's worth, here's the relevant code from the JpegEncoder class:
for (comp = 0; comp < _input.Image.ComponentCount; comp++)
{
    Width = _input.BlockWidth[comp];
    Height = _input.BlockHeight[comp];
    inputArray = _input.Image.Raster[comp];
    for (i = 0; i < _input.VsampFactor[comp]; i++)
    {
        for (j = 0; j < _input.HsampFactor[comp]; j++)
        {
            xblockoffset = j * 8;
            yblockoffset = i * 8;
            for (a = 0; a < 8; a++)
            {
                // set Y value. check bounds
                int y = ypos + yblockoffset + a; if (y >= _height) break;
                for (b = 0; b < 8; b++)
                {
                    int x = xpos + xblockoffset + b; if (x >= _width) break;
                    dctArray1[a, b] = inputArray[x, y];
                }
            }
            dctArray2 = _dct.FastFDCT(dctArray1);
            dctArray3 = _dct.QuantizeBlock(dctArray2, FrameDefaults.QtableNumber[comp]);
            _huf.HuffmanBlockEncoder(buffer, dctArray3, lastDCvalue[comp], FrameDefaults.DCtableNumber[comp], FrameDefaults.ACtableNumber[comp]);
            lastDCvalue[comp] = dctArray3[0];
        }
    }
}
And notice that in the i & j loops, they're not controlling any kind of pixel skipping: if HsampFactor[0] is set to two, it's just grabbing two blocks instead of one.
I figured it out. I thought that by setting the sampling factors, you were telling the library to subsample the raster components itself. Turns out that when you set the sampling factors, you're actually telling the library the relative size of the raster components that you're providing. In other words, you need to do the chroma subsampling of the image yourself, before you ever submit it to the FJCore library for compression. Something like this is what it's looking for:
private byte[][,] GetSubsampledRaster()
{
    byte[][,] raster = new byte[3][,];
    raster[Y] = new byte[width / hSampleFactor[Y], height / vSampleFactor[Y]];
    raster[Cb] = new byte[width / hSampleFactor[Cb], height / vSampleFactor[Cb]];
    raster[Cr] = new byte[width / hSampleFactor[Cr], height / vSampleFactor[Cr]];

    int rgbaPos = 0;
    for (short y = 0; y < height; y++)
    {
        int Yy = y / vSampleFactor[Y];
        int Cby = y / vSampleFactor[Cb];
        int Cry = y / vSampleFactor[Cr];
        int Yx = 0, Cbx = 0, Crx = 0;
        for (short x = 0; x < width; x++)
        {
            // Convert to YCbCr colorspace.
            byte b = RgbaSample[rgbaPos++];
            byte g = RgbaSample[rgbaPos++];
            byte r = RgbaSample[rgbaPos++];
            YCbCr.fromRGB(ref r, ref g, ref b);

            // Only include the byte in question in the raster if it matches the appropriate sampling factor.
            if (IncludeInSample(Y, x, y))
            {
                raster[Y][Yx++, Yy] = r;
            }
            if (IncludeInSample(Cb, x, y))
            {
                raster[Cb][Cbx++, Cby] = g;
            }
            if (IncludeInSample(Cr, x, y))
            {
                raster[Cr][Crx++, Cry] = b;
            }

            // For YCbCr, we ignore the Alpha byte of the RGBA byte structure, so advance beyond it.
            rgbaPos++;
        }
    }
    return raster;
}

static private bool IncludeInSample(int slice, short x, short y)
{
    // Hopefully this gets inlined . . .
    return ((x % hSampleFactor[slice]) == 0) && ((y % vSampleFactor[slice]) == 0);
}
There might be additional ways to optimize this, but it's working for now.
