I have an app that generates a bunch of JPEGs that I need to turn into a WebM video. I'm trying to get my RGB data from the JPEGs into the vpxenc sample. I can see the basic shapes from the original JPEGs in the output video, but everything is tinted green (even pixels that should be black are about halfway green) and every other scanline has some garbage in it.
I'm trying to feed it VPX_IMG_FMT_YV12 data, which I'm assuming is structured like so:
for each frame:
    8-bit Y data
    8-bit averages of each 2x2 V block
    8-bit averages of each 2x2 U block
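In code terms, I'm assuming the three planes sit back-to-back, something like this sketch (frame points at one frame's buffer; width and height are assumed even):

size_t ySize = (size_t)width * height;         // full-resolution luma plane
size_t cSize = ySize / 4;                      // each chroma plane is quarter size
unsigned char *yPlane = frame;                 // Y comes first
unsigned char *vPlane = frame + ySize;         // then V (YV12 stores V before U)
unsigned char *uPlane = frame + ySize + cSize; // then U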
Here is a source image and a screenshot of the video that is coming out. (Images omitted.)
It's entirely possible that I'm doing the RGB->YV12 conversion incorrectly, but even if I only encode the 8-bit Y data and set the U and V blocks to 0, the video looks about the same. I'm basically running my RGB data through these equations:
// (R, G, and B are 0-255)
float y = 0.299f*R + 0.587f*G + 0.114f*B;
float v = (R-y)*0.713f;
float u = (B-y)*0.565f;
.. and then to produce the 2x2 filtered values for U and V that I write into vpxenc, I just do (a + b + c + d) / 4, where a,b,c,d are the U or V values of each 2x2 pixel block.
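In code, that filtering step looks something like this sketch (u is my full-resolution U plane, uOut the downsampled one; V is handled the same way):

for (int y = 0; y < height; y += 2) {
    for (int x = 0; x < width; x += 2) {
        // average the four chroma samples of this 2x2 block
        int sum = u[y*width + x]     + u[y*width + x + 1]
                + u[(y+1)*width + x] + u[(y+1)*width + x + 1];
        uOut[(y/2)*(width/2) + (x/2)] = (unsigned char)(sum / 4);
    }
}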
So I'm wondering:
Is there an easier way (in code) to take RGB data and feed it to vpx_codec_encode to get a nice WebM video?
Is my RGB->YV12 conversion wrong somewhere?
Any help would be greatly appreciated.
freefallr: Sure. Here is the code. Note that it's converting the RGB->YUV in place as well as putting the YV12 output into pFullYPlane/pDownsampledUPlane/pDownsampledVPlane. This code produced nice looking WebM videos when I modified their vpxenc sample to use this data.
void RGB_To_YV12( unsigned char *pRGBData, int nFrameWidth, int nFrameHeight, void *pFullYPlane, void *pDownsampledUPlane, void *pDownsampledVPlane )
{
    int nRGBBytes = nFrameWidth * nFrameHeight * 3;

    // Convert RGB -> YV12. We do this in-place to avoid allocating any more memory.
    unsigned char *pYPlaneOut = (unsigned char*)pFullYPlane;
    int nYPlaneOut = 0;

    for ( int i=0; i < nRGBBytes; i += 3 )
    {
        unsigned char B = pRGBData[i+0]; // input pixels are BGR-ordered
        unsigned char G = pRGBData[i+1];
        unsigned char R = pRGBData[i+2];

        float y = (float)( R*66 + G*129 + B*25 + 128 ) / 256 + 16;
        float u = (float)( R*-38 + G*-74 + B*112 + 128 ) / 256 + 128;
        float v = (float)( R*112 + G*-94 + B*-18 + 128 ) / 256 + 128;

        // NOTE: We're converting pRGBData to YUV in-place here as well as
        // writing out YUV to pFullYPlane/pDownsampledUPlane/pDownsampledVPlane.
        pRGBData[i+0] = (unsigned char)y;
        pRGBData[i+1] = (unsigned char)u;
        pRGBData[i+2] = (unsigned char)v;

        // Write out the Y plane directly here rather than in another loop.
        pYPlaneOut[nYPlaneOut++] = pRGBData[i+0];
    }

    // Downsample to U and V by point-sampling the top-left pixel of each 2x2 block.
    int halfHeight = nFrameHeight >> 1;
    int halfWidth = nFrameWidth >> 1;

    unsigned char *pVPlaneOut = (unsigned char*)pDownsampledVPlane;
    unsigned char *pUPlaneOut = (unsigned char*)pDownsampledUPlane;

    for ( int yPixel=0; yPixel < halfHeight; yPixel++ )
    {
        int iBaseSrc = ( (yPixel*2) * nFrameWidth * 3 );

        for ( int xPixel=0; xPixel < halfWidth; xPixel++ )
        {
            pVPlaneOut[yPixel * halfWidth + xPixel] = pRGBData[iBaseSrc + 2];
            pUPlaneOut[yPixel * halfWidth + xPixel] = pRGBData[iBaseSrc + 1];

            iBaseSrc += 6;
        }
    }
}
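For completeness, this is roughly how I hand those planes to libvpx afterwards (a sketch: raw is a vpx_image_t allocated with alignment 1 so each stride equals its plane width, and codec/frameIndex are assumed to exist):

vpx_image_t raw;
vpx_img_alloc(&raw, VPX_IMG_FMT_YV12, nFrameWidth, nFrameHeight, 1);

// copy the three planes into the image; libvpx points planes[] at the right
// offsets for YV12, so U and V go to their named slots
memcpy(raw.planes[VPX_PLANE_Y], pFullYPlane,        (size_t)nFrameWidth * nFrameHeight);
memcpy(raw.planes[VPX_PLANE_U], pDownsampledUPlane, (size_t)nFrameWidth * nFrameHeight / 4);
memcpy(raw.planes[VPX_PLANE_V], pDownsampledVPlane, (size_t)nFrameWidth * nFrameHeight / 4);

vpx_codec_encode(&codec, &raw, frameIndex, 1, 0, VPX_DL_GOOD_QUALITY);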
Never mind. The scheme I was using was correct but I had a bug in the U/V downsampling code.
My function gets an Image and I am trying to show a mirror of it - a horizontal flip. I tried to do something like a swap function, but it prints the original picture. The size of the image is m*n and the function knows the values of m and n.
Here is my code:
void flipfunc(Img *img)
{
int y;
int x;
const int middleX = m / 2;
char tmp;
char* p;
for (x = 0; x < middleX; ++x)
{
p = image->data + x * m;
for (y = 0; y <3*n; y+=3)
{
// swap pixels
tmp = p[y];
p[y] = p[3*n - 1 - y];
p[3*n - 1 - y] = tmp;
tmp = p[y+1];
p[y+1] = p[3*n - 1 - (y+1)];
p[3*n - 1 - y] = tmp;
tmp = p[y+2];
p[y+2] = p[3*n - 1 - (y+2)];
p[3*n - 1 - (y+2)] = tmp;
}
}
}
/* Image type - contains height, width, and RGB data */
struct Img {
unsigned long X;
unsigned long Y;
char *data;
};
Leaving aside the issues that others mentioned in the comments, I'll try to answer with a few hints:
1) In your function, you want to do an in-place mirroring of the given RGB image. That's reasonable.
2) You were thinking in the right direction with your "middleX" and your pixel-swapping approach. BUT it seems you did it wrong: you ignore half of the rows completely, and in the other half your inner loop runs over the whole row, so every pair of pixels gets swapped twice! That's why you end up with the same image in the end. So why don't you just apply your "middle" logic to the inner loop instead of the outer loop?
Images are usually stored in a row-major raster order. That is, all pixels of the first row come before (=their addresses are smaller than) pixels of the second row, the third row, etc.
To visualize an example image of width 5 and height 4:
RGB RGB RGB RGB RGB
RGB RGB RGB RGB RGB
RGB RGB RGB RGB RGB
RGB RGB RGB RGB RGB
Each letter represents a byte in memory; spaces are just for clarity.
To find a specific sample in the image, you must do some arithmetic. For example, locate a G sample of pixel with x=1 and y=2. I marked it with a capital G:
,,, ,,, ,,, ,,, ,,,
,,, ,,, ,,, ,,, ,,,
,,, ,G. ... ... ...
... ... ... ... ...
To find its offset in the data array, count the number of samples that come before it (I marked them with ,). If you fiddle with it enough, you can discover this formula:
offset = y * 3 * sizeX + x * 3 + k;
where k is 0, 1, or 2, depending on which colour you are looking for.
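As a sketch, the same formula wrapped in a helper (using the Img struct from the question, taking its X field to be the width):

/* pointer to colour component k (0, 1 or 2) of the pixel at (x, y) */
char *sample_at(Img *img, unsigned long x, unsigned long y, int k)
{
    return &img->data[y * 3 * img->X + x * 3 + k];
}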
Your code has (for example)
x * m + y
which is not correct at all.
BTW you shouldn't use variable names like m or n - they are extremely confusing. Just use
image->sizeX
or, if you want to use temporary variables:
unsigned long sizeX = image->sizeX;
This should be working:
void flipfunc(Img *img)
{
    unsigned long y;
    unsigned long l, r;
    int d;
    char tmp;

    for (y = 0; y < img->Y; ++y) {
        for (l = 0, r = img->X - 1; l < r; ++l, --r) {
            for (d = 0; d < 3; ++d) { /* swap all three colour components of the pixel pair */
                tmp = img->data[(y * img->X + l) * 3 + d];
                img->data[(y * img->X + l) * 3 + d] = img->data[(y * img->X + r) * 3 + d];
                img->data[(y * img->X + r) * 3 + d] = tmp;
            }
        }
    }
}
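For example (a sketch; the pixel data would really come from your image loader):

char buffer[64 * 48 * 3];     /* RGB data, filled elsewhere */
Img img = { 64, 48, buffer }; /* X = width, Y = height */
flipfunc(&img);               /* buffer now holds the mirrored rows */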
If you want something, let's say, more optimized, build it with -O2.
Reducing a raw pixelmap to 50% of its size is easy. I simply slide a 2x2 square across the map and average the RGB components of the four pixels as follows:
img = XGetImage(d_remote,RootWindow(d_remote,0),0,0,attr.width,attr.height,XAllPlanes(),ZPixmap);
int i;
int j;
for(i=0;i<attr.height;i=i+2){
for(j=0;j<attr.width;j=j+2) {
unsigned long p1 = XGetPixel(img, j, i);
unsigned long p1R = p1 & 0x00ff0000;
unsigned long p1G = p1 & 0x0000ff00;
unsigned long p1B = p1 & 0x000000ff;
unsigned long p2 = XGetPixel(img, j+1, i);
unsigned long p2R = p2 & 0x00ff0000;
unsigned long p2G = p2 & 0x0000ff00;
unsigned long p2B = p2 & 0x000000ff;
unsigned long p3 = XGetPixel(img, j, i+1);
unsigned long p3R = p3 & 0x00ff0000;
unsigned long p3G = p3 & 0x0000ff00;
unsigned long p3B = p3 & 0x000000ff;
unsigned long p4 = XGetPixel(img, j+1, i+1);
unsigned long p4R = p4 & 0x00ff0000;
unsigned long p4G = p4 & 0x0000ff00;
unsigned long p4B = p4 & 0x000000ff;
unsigned long averageR = (p1R+p2R+p3R+p4R)/4 & 0x00ff0000;
unsigned long averageG = (p1G+p2G+p3G+p4G)/4 & 0x0000ff00;
unsigned long averageB = (p1B+p2B+p3B+p4B)/4 & 0x000000ff;
int average = averageR | averageG | averageB;
XPutPixel(newImg, j/2, i/2, average);
}
}
This turns a pixelmap that is 500x500 into one that is 250x250 - a 50% reduction. But what if I wanted to reduce by a factor that isn't a power of 2? For example, I would like my 500x500 image to turn into 400x400, i.e. a 20% reduction. The smallest square I can slide is a 2x2, so I don't see how I can get a reduction factor that is not a perfect power of 2.
Solution:
How's this for effort?? I modified a script I found that does bi-linear interpolation to work on XImages. It should work for any generic pixelmap. I do find the code ugly, though, since I see images as 2D arrays; I don't see why all the image code maps them onto 1D arrays - it's harder to visualize. This works for any resize.
void resize(XImage* input, XImage* output, int sourceWidth, int sourceHeight, int targetWidth, int targetHeight)
{
int a, b, c, d, x, y, index;
float x_ratio = ((float)(sourceWidth - 1)) / targetWidth;
float y_ratio = ((float)(sourceHeight - 1)) / targetHeight;
float x_diff, y_diff, blue, red, green ;
int offset = 0 ;
int i=0;
int j=0;
int* inputData = (int*)input->data;
int* outputData = (int*)output->data;
for (i = 0; i < targetHeight; i++)
{
for (j = 0; j < targetWidth; j++)
{
x = (int)(x_ratio * j) ;
y = (int)(y_ratio * i) ;
x_diff = (x_ratio * j) - x ;
y_diff = (y_ratio * i) - y ;
index = (y * sourceWidth + x) ;
a = inputData[index] ;
b = inputData[index + 1] ;
c = inputData[index + sourceWidth] ;
d = inputData[index + sourceWidth + 1] ;
// blue element
blue = (a&0xff)*(1-x_diff)*(1-y_diff) + (b&0xff)*(x_diff)*(1-y_diff) +
(c&0xff)*(y_diff)*(1-x_diff) + (d&0xff)*(x_diff*y_diff);
// green element
green = ((a>>8)&0xff)*(1-x_diff)*(1-y_diff) + ((b>>8)&0xff)*(x_diff)*(1-y_diff) +
((c>>8)&0xff)*(y_diff)*(1-x_diff) + ((d>>8)&0xff)*(x_diff*y_diff);
// red element
red = ((a>>16)&0xff)*(1-x_diff)*(1-y_diff) + ((b>>16)&0xff)*(x_diff)*(1-y_diff) +
((c>>16)&0xff)*(y_diff)*(1-x_diff) + ((d>>16)&0xff)*(x_diff*y_diff);
outputData[offset++] = (int)red << 16 | (int)green << 8 | (int)blue;
}
}
}
Here is some pseudocode for downscaling. WS,HS is the target image size and WB,HB is the source size; WS is less than WB, and HS is less than HB.
double row[WB];
double Xratio= WB/WS;
double Yratio= HB/HS;
double curYratio= Yratio;
double remainY= Yratio - floor(Yratio);
double remainX= Xratio - floor(Xratio);
double curXratio;
double rfac, cfac;
int icol,irow, orow, ocol;
zero-out row
orow= 0;
for(irow=0..HB-1)
{
// we find out how much of this row we will add to the current sum
if (curYratio>=1.0) rfac= 1.0; else rfac= curYratio;
// we add it
for(icol=0..WB-1) row[icol] += rfac * input[irow][icol];
// we reduce the total weight
curYratio -= rfac;
// if the total weight is now zero, we have a complete row,
// otherwise we still need some of the next row
if (curYratio!=0.0) continue;
// we have a complete row, compute the weighted average
for(icol=0..WB-1) row[icol]/= Yratio;
// now we can scale the row in horizontal
curXratio= Xratio;
ocol= 0;
double pixel= 0.0;
for(icol=0..WB-1)
{
if (curXratio>=1.0) cfac= 1.0; else cfac= curXratio;
pixel+= row[icol]*cfac;
curXratio -= cfac;
if (curXratio!=0) continue;
// now we have a complete pixel
out[orow][ocol]= pixel / Xratio;
pixel= remainX * row[icol];
curXratio= Xratio - remainX;
ocol++;
}
orow++;
// let's put the remainder of the last input row into 'row'
for(icol=0..WB-1) row[icol]= remainY*input[irow][icol];
curYratio= Yratio - remainY;
}
This took longer than I thought it would, but there it is. Anyway, it's not very wise to run this directly on an input bitmap. The pixel values in a common bitmap are sRGB-encoded; they are just "names" for the real intensities that should be used in computations. You should convert each pixel from its sRGB value to a linear value before doing any arithmetic, and convert back when you're done. Look up sRGB on Wikipedia; it has good information.
If you do the arithmetic without converting to linear and back, you will get a darker image when you scale down.
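For reference, a sketch of that conversion (the standard sRGB transfer function, with channel values normalized to 0..1):

#include <math.h>

/* sRGB-encoded value (0..1) -> linear intensity */
double srgb_to_linear(double c)
{
    return (c <= 0.04045) ? c / 12.92 : pow((c + 0.055) / 1.055, 2.4);
}

/* linear intensity (0..1) -> sRGB-encoded value */
double linear_to_srgb(double c)
{
    return (c <= 0.0031308) ? c * 12.92 : 1.055 * pow(c, 1.0 / 2.4) - 0.055;
}

/* example: average two samples gamma-correctly - decode, average in linear, re-encode */
unsigned char average_srgb(unsigned char a, unsigned char b)
{
    double la = srgb_to_linear(a / 255.0);
    double lb = srgb_to_linear(b / 255.0);
    return (unsigned char)(linear_to_srgb((la + lb) / 2.0) * 255.0 + 0.5);
}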
I have images as bitmaps and JPEGs. I need to retrieve the pixels from the images so that the RGB values of all pixels are obtained. Please suggest a method for retrieving the RGB values from an image file. I would appreciate it if there are functions available in C.
You can parse a JPEG and get a bitmap out of it using libjpeg - it is pretty simple.
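A minimal decode loop looks roughly like this (a sketch: error handling is omitted, and it assumes the file decodes to 8-bit RGB):

#include <stdio.h>
#include <stdlib.h>
#include <jpeglib.h>

/* returns a malloc'd buffer of 8-bit R,G,B triplets in row-major order */
unsigned char *decode_jpeg(const char *path, int *width, int *height)
{
    struct jpeg_decompress_struct cinfo;
    struct jpeg_error_mgr jerr;
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    cinfo.err = jpeg_std_error(&jerr);
    jpeg_create_decompress(&cinfo);
    jpeg_stdio_src(&cinfo, f);
    jpeg_read_header(&cinfo, TRUE);
    jpeg_start_decompress(&cinfo);

    *width = cinfo.output_width;
    *height = cinfo.output_height;
    int stride = cinfo.output_width * cinfo.output_components;
    unsigned char *rgb = malloc((size_t)stride * cinfo.output_height);

    while (cinfo.output_scanline < cinfo.output_height) {
        unsigned char *row = rgb + (size_t)cinfo.output_scanline * stride;
        jpeg_read_scanlines(&cinfo, &row, 1);
    }

    jpeg_finish_decompress(&cinfo);
    jpeg_destroy_decompress(&cinfo);
    fclose(f);
    return rgb;
}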
Suppose you have an RGB bitmap in 'rgb'. The result will be placed in the 'yuv420p' vector. (width and height are the image dimensions, and colors is the number of bytes per input pixel.)
void rgb2yuv420p(std::vector<BYTE>& rgb, std::vector<BYTE>& yuv420p,
                 unsigned int width, unsigned int height, unsigned int colors)
{
    unsigned int i = 0;
    unsigned int numpixels = width * height;
    unsigned int ui = numpixels;                 // U plane starts right after Y
    unsigned int vi = numpixels + numpixels / 4; // V plane starts after U
    unsigned int s = 0;

// the input is BGR-ordered: B first, R last
#define sR (BYTE)(rgb[s+2])
#define sG (BYTE)(rgb[s+1])
#define sB (BYTE)(rgb[s+0])

    yuv420p.resize(numpixels * 3 / 2); // Y plane plus two quarter-size chroma planes

    for (unsigned int j = 0; j < height; j++)
        for (unsigned int k = 0; k < width; k++)
        {
            yuv420p[i] = (BYTE)( ((66*sR + 129*sG + 25*sB + 128) >> 8) + 16 );

            // one U and one V sample per 2x2 block
            if (0 == j%2 && 0 == k%2)
            {
                yuv420p[ui++] = (BYTE)( ((-38*sR - 74*sG + 112*sB + 128) >> 8) + 128 );
                yuv420p[vi++] = (BYTE)( ((112*sR - 94*sG - 18*sB + 128) >> 8) + 128 );
            }

            i++;
            s += colors; // colors = bytes per input pixel (3 for 24-bit, 4 for 32-bit)
        }

#undef sR
#undef sG
#undef sB
}
If you want to do this yourself, here's the Wikipedia article that I worked from when I did this at work, about a year back:
http://en.wikipedia.org/wiki/YUV
This is pretty good too:
http://www.fourcc.org/fccyvrgb.php
But MUCH easier is jpeglib - that wasn't an option in my case, because the data wasn't jpeg in the first place.
Can anyone spot any way to improve the speed of the following bilinear resizing algorithm?
I need to improve speed, as this is critical, while keeping good image quality. It is expected to be used on mobile devices with low-speed CPUs.
The algorithm is used mainly for up-scale resizing. Any other, faster bilinear algorithm would also be appreciated. Thanks.
void resize(int* input, int* output, int sourceWidth, int sourceHeight, int targetWidth, int targetHeight)
{
int a, b, c, d, x, y, index;
float x_ratio = ((float)(sourceWidth - 1)) / targetWidth;
float y_ratio = ((float)(sourceHeight - 1)) / targetHeight;
float x_diff, y_diff, blue, red, green ;
int offset = 0 ;
for (int i = 0; i < targetHeight; i++)
{
for (int j = 0; j < targetWidth; j++)
{
x = (int)(x_ratio * j) ;
y = (int)(y_ratio * i) ;
x_diff = (x_ratio * j) - x ;
y_diff = (y_ratio * i) - y ;
index = (y * sourceWidth + x) ;
a = input[index] ;
b = input[index + 1] ;
c = input[index + sourceWidth] ;
d = input[index + sourceWidth + 1] ;
// blue element
blue = (a&0xff)*(1-x_diff)*(1-y_diff) + (b&0xff)*(x_diff)*(1-y_diff) +
(c&0xff)*(y_diff)*(1-x_diff) + (d&0xff)*(x_diff*y_diff);
// green element
green = ((a>>8)&0xff)*(1-x_diff)*(1-y_diff) + ((b>>8)&0xff)*(x_diff)*(1-y_diff) +
((c>>8)&0xff)*(y_diff)*(1-x_diff) + ((d>>8)&0xff)*(x_diff*y_diff);
// red element
red = ((a>>16)&0xff)*(1-x_diff)*(1-y_diff) + ((b>>16)&0xff)*(x_diff)*(1-y_diff) +
((c>>16)&0xff)*(y_diff)*(1-x_diff) + ((d>>16)&0xff)*(x_diff*y_diff);
output [offset++] =
0x000000ff | // alpha in the low byte
((((int)red) << 24)&0xff000000) |
((((int)green) << 16)&0xff0000) |
((((int)blue) << 8)&0xff00);
}
}
}
Off the top of my head:
Stop using floating-point, unless you're certain your target CPU has it in hardware with good performance.
Make sure memory accesses are cache-optimized, i.e. clumped together.
Use the fastest data types possible. Sometimes this means smallest, sometimes it means "most native, requiring least overhead".
Investigate if signed/unsigned for integer operations have performance costs on your platform.
Investigate if look-up tables rather than computations gain you anything (but these can blow the caches, so be careful).
And, of course, do lots of profiling and measurements.
In-Line Cache and Lookup Tables
Cache your computations in your algorithm.
Avoid duplicate computations (like (1-y_diff) or (x_ratio * j))
Go through all the lines of your algorithm, and try to identify patterns of repetitions. Extract these to local variables. And possibly extract to functions, if they are short enough to be inlined, to make things more readable.
Use a lookup-table
It's quite likely that, if you can spare some memory, you can implement a "store" for your RGB values and simply "fetch" them based on the inputs that produced them. Maybe you don't need to store all of them, but you could experiment and see if some come back often. Alternatively, you could "fudge" your colors and thus end up with fewer values to store for more lookup inputs.
If you know the boundaries of your inputs, you can calculate the complete domain space and figure out what makes sense to cache. For instance, if you can't cache the whole R, G, B values, maybe you can at least pre-compute the shifts ((b>>16) and so forth) that are most likely deterministic in your case.
Use the Right Data Types for Performance
If you can avoid double and float variables, use int. On most architectures, int is the fastest type for computations because of the memory model. You can still achieve decent precision by simply shifting your units (i.e. use 1026 as an int instead of 1.026 as a double or float). It's quite likely that this trick would be enough for you.
x = (int)(x_ratio * j) ;
y = (int)(y_ratio * i) ;
x_diff = (x_ratio * j) - x ;
y_diff = (y_ratio * i) - y ;
index = (y * sourceWidth + x) ;
This could surely use some optimization: you were already computing x_ratio * (j-1) just a few cycles earlier, so all you really need here is something like x += x_ratio.
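Combined with the fixed-point advice above, the index computation could look like this sketch (16.16 fixed-point; the names are mine):

// precompute once, outside the loops
int x_ratio_fx = (int)((((long long)(sourceWidth - 1)) << 16) / targetWidth);
int y_ratio_fx = (int)((((long long)(sourceHeight - 1)) << 16) / targetHeight);

int y_fx = 0;
for (int i = 0; i < targetHeight; i++, y_fx += y_ratio_fx) {
    int y      = y_fx >> 16;    // integer part: source row
    int y_diff = y_fx & 0xFFFF; // fractional part, 0..65535
    int x_fx = 0;
    for (int j = 0; j < targetWidth; j++, x_fx += x_ratio_fx) {
        int x      = x_fx >> 16;
        int x_diff = x_fx & 0xFFFF;
        // ... the same four-tap blend as before, with integer weights
    }
}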
My random guess (use a profiler instead of letting people guess!):
The compiler has to generate code that works even when input and output overlap, which means it has to emit loads of redundant stores and loads. Add restrict to the input and output parameters to remove that safety feature.
You could also try using a=b; and c=d; instead of loading them again.
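For the restrict part, the signature would become (C99):

void resize(int * restrict input, int * restrict output,
            int sourceWidth, int sourceHeight,
            int targetWidth, int targetHeight);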
Here is my version - steal some ideas. My C-fu is quite weak, so treat it as a sketch and fix anything that doesn't compile.
void resize(int* input, int* output,
            int sourceWidth, int sourceHeight,
            int targetWidth, int targetHeight
) {
    // Let's create some lookup tables!
    // you can move them into 2-dimensional arrays to
    // group together values used at the same time to help the processor cache
    int sx[targetWidth];  // target->source X lookup
    int sy[targetHeight]; // target->source Y lookup
    int mx[targetWidth];  // fraction toward the right pixel (0..255)
    int my[targetHeight]; // fraction toward the bottom pixel (0..255)

    // we don't have to calc indexes every time, find out when
    int reloadPixels[targetWidth];
    int shiftPixels[targetWidth];
    int shiftReloadPixels[targetWidth]; // can be combined if necessary

    int v; // temporary value

    for (int j = 0; j < targetWidth; j++){
        // (8bit + targetBits + sourceBits) should be < max int
        v = 256 * j * (sourceWidth-1) / (targetWidth-1);
        sx[j] = v / 256;
        mx[j] = v % 256;

        reloadPixels[j] = j ? ( sx[j-1] != sx[j] ? 1 : 0)
                            : 1; // always load first pixel
        // if no reload -> then no shift too
        shiftPixels[j] = j ? ( sx[j-1]+1 == sx[j] ? 2 : 0)
                           : 0; // nothing to shift at first pixel
        shiftReloadPixels[j] = reloadPixels[j] | shiftPixels[j];
    }

    for (int i = 0; i < targetHeight; i++){
        v = 256 * i * (sourceHeight-1) / (targetHeight-1);
        sy[i] = v / 256;
        my[i] = v % 256;
    }

    int shiftReload;
    int srcIndex;
    int srcRowIndex;
    int offset = 0;
    int lm, rm, tm, bm; // left / right / top / bottom multipliers
    int a, b, c, d;     // top-left / top-right / bottom-left / bottom-right pixels
    int leftOutput, rightOutput;

    for (int i = 0; i < targetHeight; i++){
        srcRowIndex = sy[ i ] * sourceWidth;
        bm = my[i];    // bottom row's weight
        tm = 255 - bm; // top row's weight

        for (int j = 0; j < targetWidth; j++){
            // too many ifs can be too slow, measure.
            // always true for the first pixel in a row
            if( (shiftReload = shiftReloadPixels[ j ]) ){
                srcIndex = srcRowIndex + sx[j];
                if( shiftReload & 2 ){
                    // the new left column is the old right column
                    a = b;
                    c = d;
                }else{
                    a = input[ srcIndex ];
                    c = input[ srcIndex + sourceWidth ];
                }
                b = input[ srcIndex + 1 ];
                d = input[ srcIndex + 1 + sourceWidth ];
            }

            rm = mx[j];    // right column's weight
            lm = 255 - rm; // left column's weight

            // Note the byte-order change:
            // Input  AA RR GG BB
            // Output RR GG BB AA
            leftOutput =
                // blue element
                ((( ( (a&0xFF)*tm
                    + (c&0xFF)*bm )*lm
                  ) & 0xFF0000 ) >> 8)
                // green element
                | ((( ( ((a>>8)&0xFF)*tm
                      + ((c>>8)&0xFF)*bm )*lm
                    ) & 0xFF0000 )) // no need to shift
                // red element
                | ((( ( ((a>>16)&0xFF)*tm
                      + ((c>>16)&0xFF)*bm )*lm
                    ) & 0xFF0000 ) << 8 )
                ;
            rightOutput =
                // blue element
                ((( ( (b&0xFF)*tm
                    + (d&0xFF)*bm )*rm
                  ) & 0xFF0000 ) >> 8)
                // green element
                | ((( ( ((b>>8)&0xFF)*tm
                      + ((d>>8)&0xFF)*bm )*rm
                    ) & 0xFF0000 )) // no need to shift
                // red element
                | ((( ( ((b>>16)&0xFF)*tm
                      + ((d>>16)&0xFF)*bm )*rm
                    ) & 0xFF0000 ) << 8 )
                ;
            output[offset++] =
                // alpha in the low byte; the halves can simply be added,
                // since lm+rm == 255 keeps every 8-bit field below 256
                0x000000ff
                | (leftOutput + rightOutput)
                ;
        }
    }
}
Because I'm masochistic I'm trying to write something in C to decode an 8-bit PNG file (it's a learning thing, I'm not trying to reinvent libpng...)
I've got to the point where the stuff in my inflated (decompressed), unfiltered data buffer unmistakably resembles the source image (see below), but it's still quite, erm, wrong, and I'm pretty sure there's something askew with my implementation of the filtering algorithms. Most of them are quite simple, but there's one major thing I don't understand in the docs, not being good at maths or ever having taken a comp-sci course:
Unsigned arithmetic modulo 256 is used, so that both the inputs and outputs fit into bytes.
What does that mean?
If someone can tell me that I'd be very grateful!
For reference (and I apologise for the crappy C), my noddy implementation of the filtering algorithms described in the docs looks like:
unsigned char paeth_predictor (unsigned char a, unsigned char b, unsigned char c) {
// a = left, b = above, c = upper left
char p = a + b - c; // initial estimate
char pa = abs(p - a); // distances to a, b, c
char pb = abs(p - b);
char pc = abs(p - c);
// return nearest of a,b,c,
// breaking ties in order a,b,c.
if (pa <= pb && pa <= pc) return a;
else if (pb <= pc) return b;
else return c;
}
void unfilter_sub(char* out, char* in, int bpp, int row, int rowlen) {
for (int i = 0; i < rowlen; i++)
out[i] = in[i] + (i < bpp ? 0 : out[i-bpp]);
}
void unfilter_up(char* out, char* in, int bpp, int row, int rowlen) {
for (int i = 0; i < rowlen; i++)
out[i] = in[i] + (row == 0 ? 0 : out[i-rowlen]);
}
void unfilter_paeth(char* out, char* in, int bpp, int row, int rowlen) {
char a, b, c;
for (int i = 0; i < rowlen; i++) {
a = i < bpp ? 0 : out[i - bpp];
b = row < 1 ? 0 : out[i - rowlen];
c = i < bpp ? 0 : (row == 0 ? 0 : out[i - rowlen - bpp]);
out[i] = in[i] + paeth_predictor(a, b, c);
}
}
And the images I'm seeing:
Source: http://img220.imageshack.us/img220/8111/testdn.png
Output: http://img862.imageshack.us/img862/2963/helloworld.png
It means that, in the algorithm, whenever an arithmetic operation is performed, it is performed modulo 256, i.e. if the result is greater than 255 it "wraps around" so that it always fits in a byte. The result is that all values will always fit into 8 bits and not overflow.
Unsigned types already behave this way by mandate, so if you use unsigned char (and a byte on your system is 8 bits, which it probably is), your calculation results will naturally just never overflow beyond 8 bits.
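Concretely, for the Paeth predictor in the question, that means doing the intermediate arithmetic in int so that the estimate and the distances are not truncated, and letting the final unsigned char result wrap on its own. A sketch:

#include <stdlib.h> /* abs */

unsigned char paeth_predictor(unsigned char a, unsigned char b, unsigned char c)
{
    int p = (int)a + (int)b - (int)c; /* initial estimate; may not fit a byte */
    int pa = abs(p - a);              /* distances to a, b, c */
    int pb = abs(p - b);
    int pc = abs(p - c);
    /* nearest of a, b, c, breaking ties in order a, b, c */
    if (pa <= pb && pa <= pc) return a;
    if (pb <= pc) return b;
    return c;
}

The filter addition itself (out[i] = in[i] + predictor) then wraps modulo 256 automatically, provided the buffers are unsigned char rather than plain char.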
It means only the last 8 bits of the result are used. 2^8 = 256, so the last 8 bits of an unsigned value v are the same as (v % 256).
For example, 2 + 255 = 257, which is 100000001 in binary; the last 8 bits are 00000001, i.e. 1, and 257 % 256 is also 1.
In 'simple language' it means that you never go "out" of your byte size.
For example, in C# this will fail:
byte test = 255 + 255;
(1,13): error CS0031: Constant value '510' cannot be converted to a 'byte'
Even an explicit cast fails for a constant expression:
byte test = (byte)(255 + 255);
(1,13): error CS0221: Constant value '510' cannot be converted to a 'byte' (use 'unchecked' syntax to override)
For every calculation you have to do modulo 256 (C#: % 256).
Instead of writing % 256 you can also AND with 255, which gives the same result for non-negative values:
(175 + 205) mod 256 = (175 + 205) AND 255
Some C# samples:
byte test = ((255 + 255) % 256);
// test: 254
byte test = ((255 + 255) & 255);
// test: 254
byte test = ((1 + 379) % 256);
// test: 124
byte test = ((1 + 379) & 0xFF);
// test: 124
Note that you can sometimes simplify a series of byte additions:
(byteVal1 + byteVal2 + byteVal3) % 256
= (((byteVal1 % 256) + (byteVal2 % 256)) % 256 + (byteVal3 % 256)) % 256