I am trying to crop an image captured by an ESP32-CAM. The image is in JPG format, and I would like to crop it. As the image is stored as a single-dimensional array, I tried to rearrange the elements in the array, but no changes occurred.
I have cropped the image in RGB565, but I am struggling to understand the single-dimensional array (image buffer).
camera_config_t config;
config.ledc_channel = LEDC_CHANNEL_0;
config.ledc_timer = LEDC_TIMER_0;
config.pin_d0 = Y2_GPIO_NUM;
config.pin_d1 = Y3_GPIO_NUM;
config.pin_d2 = Y4_GPIO_NUM;
config.pin_d3 = Y5_GPIO_NUM;
config.pin_d4 = Y6_GPIO_NUM;
config.pin_d5 = Y7_GPIO_NUM;
config.pin_d6 = Y8_GPIO_NUM;
config.pin_d7 = Y9_GPIO_NUM;
config.pin_xclk = XCLK_GPIO_NUM;
config.pin_pclk = PCLK_GPIO_NUM;
config.pin_vsync = VSYNC_GPIO_NUM;
config.pin_href = HREF_GPIO_NUM;
config.pin_sscb_sda = SIOD_GPIO_NUM;
config.pin_sscb_scl = SIOC_GPIO_NUM;
config.pin_pwdn = PWDN_GPIO_NUM;
config.pin_reset = RESET_GPIO_NUM;
config.xclk_freq_hz = 20000000;
config.pixel_format = PIXFORMAT_RGB565;
config.frame_size = FRAMESIZE_SVGA;
// config.jpeg_quality = 10;
config.fb_count = 2;
esp_err_t result = esp_camera_init(&config);
if (result != ESP_OK) {
return false;
}
camera_fb_t * fb = NULL;
fb = esp_camera_fb_get();
if (!fb) {
Serial.println("Camera capture failed");
}
The fb buffer is a single-dimensional array; I want to extract each individual RGB value.
JPG is a compressed format, meaning that its rows and columns do not correspond to what you would see by displaying a 1:1 grid on the screen. You need to convert it to a plain RGB (or equivalent) format and then copy it.
JPG achieves compression by splitting the image into YCbCr components, applying a mathematical transformation and then filtering. For additional information I refer to this page.
Luckily you can follow this tutorial to do the inverse JPEG transformation on an Arduino (tip: forget about doing this in real time, unless your time constraints are very relaxed).
The idea is to use a library that converts the JPEG image into an array of data:
Using the library is fairly simple: we give it the JPEG file, and the library will start generating arrays of pixels, so-called Minimum Coded Units, or MCUs for short. An MCU is a block of 16 by 8 pixels. The functions in the library will return the color value for each pixel as a 16-bit color value. The upper 5 bits are the red value, the middle 6 are green, and the lower 5 are blue. Now we can send these values over any sort of communication channel we like.
For your use case you won't send the data through a communication channel; instead, store it in a local array by pushing the blocks into adjacent tiles, then do the crop, roughly as sketched below.
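A minimal sketch of that assembly step. The callback name and signature here are hypothetical (the actual API depends on the decoder library you pick); it assumes the decoder hands us one 16x8 RGB565 MCU at a time together with its pixel offset:

#include <stdint.h>

#define IMG_W 800
#define IMG_H 600
static uint16_t full_image[IMG_W * IMG_H]; // one uint16_t per RGB565 pixel

// hypothetical decoder callback: one MCU plus its top-left pixel offset
void on_mcu_decoded(const uint16_t *mcu, int mcu_x, int mcu_y) {
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 16; x++) {
            int px = mcu_x + x, py = mcu_y + y;
            if (px < IMG_W && py < IMG_H) // edge MCUs may overhang the image
                full_image[py * IMG_W + px] = mcu[y * 16 + x];
        }
    }
}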
That depends on what kind of hardware (camera and board) you are using.
I'm basing this on the OV2640 camera module because it's the one I've been working with. It delivers the image to the frame buffer already encoded, so I'm guessing this might be what you are facing.
Trying to crop the image after it has been encoded can be tricky, but you might be able to instruct the camera chip to only deliver a certain part of the sensor output in the first place using a window function.
The easiest way to use this setting is to define a small wrapper function:
void setWindow(int resolution, int xOffset, int yOffset, int xLength, int yLength) {
    sensor_t * s = esp_camera_sensor_get();
    s->set_res_raw(s, resolution, 0, 0, 0, xOffset, yOffset, xLength, yLength, xLength, yLength, true, true);
}
/*
 * resolution = 0 // 1600 x 1200
 * resolution = 1 // 800 x 600
 * resolution = 2 // 400 x 296
 */
where (xOffset, yOffset) is the origin of the window in pixels and (xLength, yLength) is the size of the window in pixels. Be aware that changing the resolution will effectively overwrite these settings. Otherwise this works great for me, although for some reason only if the 4:3 aspect ratio is preserved in the window size.
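For example, a hypothetical call for a centered 800x600 window out of the full 1600x1200 sensor area (keeping the 4:3 aspect ratio mentioned above):

setWindow(0, 400, 300, 800, 600); // offset (400,300), window size 800x600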
Looking at the output format table for the ESP32 Camera Driver one can see that most output formats are non-JPEG. If you can handle a RAW format instead (it will be slower to save/transfer, and be MUCH larger), then that would allow you to crop the image more easily by making a copy with a couple of loops. JPEG is compressed and not easily cropped. The page linked also mentions this:
Using YUV or RGB puts a lot of strain on the chip because writing to PSRAM is not particularly fast. The result is that image data might be missing. This is particularly true if WiFi is enabled. If you need RGB data, it is recommended that JPEG is captured and then turned into RGB using fmt2rgb888 or fmt2bmp/frame2bmp
If you are using PIXFORMAT_RGB565 (which means each pixel value will be kept in TWO bytes, and the image is not jpeg compressed) and FRAMESIZE_SVGA (800x600 pixels), you should be able to access the framebuffer as a two-dimensional array if you want:
uint16_t *buffer = (uint16_t *)fb->buf; // fb->buf is a uint8_t*, so cast it
uint16_t pxl = buffer[row * 800 + column]; // 800 is the SVGA width
// pxl now contains 5 R-bits, 6 G-bits, 5 B-bits
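Building on that, a minimal crop sketch under the same assumptions (RGB565, 800x600 SVGA). Here cropX, cropY, cropW and cropH are hypothetical values of your choosing, and note that depending on the driver the two bytes of each RGB565 pixel may arrive byte-swapped:

uint16_t *src = (uint16_t *)fb->buf;
uint16_t *cropped = (uint16_t *)malloc(cropW * cropH * sizeof(uint16_t));
for (int row = 0; row < cropH; row++) {
    for (int col = 0; col < cropW; col++) {
        cropped[row * cropW + col] = src[(cropY + row) * 800 + (cropX + col)];
    }
}

// extracting the individual channels from one RGB565 pixel:
uint16_t pxl = cropped[0];
uint8_t r = (pxl >> 11) & 0x1F; // upper 5 bits
uint8_t g = (pxl >> 5) & 0x3F;  // middle 6 bits
uint8_t b = pxl & 0x1F;         // lower 5 bits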
I am making a game with C and X11. I've been trying for quite a while to find a way to put different coloured pixels on a window, frame by frame. I've seen fully developed games get thousands of frames per second. What is the most efficient way of doing this?
I have seen 2-coloured bitmaps with XImages, allocating 256 colours on a black-to-white fade, and using XPutPixel with XImages (though I wasn't able to figure out how to properly create an XImage that could later have pixels put on it).
I have made this for loop that creates a random image, but it is, obviously, pixel-by-pixel instead of frame-by-frame and takes 18 seconds to render one entire frame.
XColor pixel;
for (int x = 0; x < currentWindowWidth; x++) {
for (int y = 0; y < currentWindowHeight; y++) {
pixel.red = rand() % 256 * 256; // scale 8-bit colour up to X11's 16-bit range
pixel.green = rand() % 256 * 256;
pixel.blue = rand() % 256 * 256;
XAllocColor(display, XDefaultColormap(display, screenNumber), &pixel); //This probably takes the most time,
XSetForeground(display, graphics, pixel.pixel); //as does this.
XDrawPoint(display, window, graphics, x, y);
}
}
After three or so more weeks of testing things off and on, I finally figured out how to do it, and it was rather simple. As I said in the OP, XAllocColor and XSetForeground take quite a bit of time (relatively) to work. XDrawPoint was also slow, as it does more than just put a pixel at a point on an image.
First I tested how Xlib's colour format works (for the unsigned long int represented as pixel.pixel, which was what I needed XAllocColor for), and it appears to have 100% red at 16711680, 100% green at 65280, and 100% blue at 255, which is obviously a pattern. I found the maximum usable value to be 4286019447, which is a roughly 50% grey.
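Based on that pattern, a pixel value can be composed directly from 8-bit components (a sketch assuming the common 24-bit TrueColor layout observed above), which avoids XAllocColor entirely:

// pack 8-bit R, G, B into the 0x00RRGGBB layout described above
unsigned long rgb_pixel(unsigned char r, unsigned char g, unsigned char b) {
    return ((unsigned long)r << 16) | ((unsigned long)g << 8) | (unsigned long)b;
}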
Next, I made sure my XVisualInfo would be supported by my system with a test using XMatchVisualInfo([expected visual info values]). That ensures the depth I will use and the TrueColor class work.
Finally, I made an XImage copied from the root window's image for manipulation. I used XPutPixel for each pixel on the window and set it to a random value between 0 and 4286019448, creating the random image. I then used XPutImage to paste the image to the window.
Here's the final code:
if (!XMatchVisualInfo(display, screenNumber, 24, TrueColor, &visualInfo)) {
    exit(0); // the display doesn't support 24-bit TrueColor
}
frameImage = XGetImage(display, rootWindow, 0, 0, screenWidth, screenHeight, AllPlanes, ZPixmap);
while (1) {
    for (unsigned short x = 0; x < currentWindowWidth; x += pixelSize) {
        for (unsigned short y = 0; y < currentWindowHeight; y += pixelSize) {
            XPutPixel(frameImage, x, y, rand() % 4286019447); // client-side write, no server round-trip
        }
    }
    // a single XPutImage per frame instead of one XDrawPoint per pixel
    XPutImage(display, window, graphics, frameImage, 0, 0, 0, 0, currentWindowWidth, currentWindowHeight);
}
This puts a random image on the screen, at a stable 140 frames per second on fullscreen. I don't necessarily know if this is the most efficient way, but it works way better than anything else I've tried. Let me know if there is any way to make it better.
Thousands of frames per second is not possible. Monitor refresh rates are on the order of 100 Hz, i.e. about 100 cycles per second, so that is roughly the maximum useful frame rate. This is still very fast; the human eye wouldn't pick up much faster frame rates.
The monitor response time is about 5 ms, so any single point on the screen cannot be refreshed more than 200 times per second.
8 bits is 1 byte, so an 8-bit image uses one byte per pixel, and each pixel value ranges from 0 to 255. The pixel doesn't have red, green, and blue components. Instead, each pixel points to an index in the color table, which holds 256 colors. There is a trick where you keep the pixels the same and change the color table; this makes the image fade in and out or do other weird things (see the sketch below).
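To illustrate, here is a sketch of that color-table trick; base_palette, set_color_table and wait_for_vsync are hypothetical stand-ins for whatever palette API your target provides:

unsigned char faded[256][3];
for (int step = 255; step >= 0; step--) {  // fade the whole image to black
    for (int i = 0; i < 256; i++) {
        faded[i][0] = base_palette[i][0] * step / 255;
        faded[i][1] = base_palette[i][1] * step / 255;
        faded[i][2] = base_palette[i][2] * step / 255;
    }
    set_color_table(faded); // hypothetical: upload the 256-entry table; pixels stay untouched
    wait_for_vsync();       // hypothetical: pace the fade to the refresh rate
}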
In a 24-bit image, each pixel has red, green, and blue components. Each color is 1 byte, so each pixel is 3 bytes, or 24 bits:
uint8_t red = rand() % 256;
uint8_t grn = rand() % 256;
uint8_t blu = rand() % 256;
A 16-bit image uses an odd format to store red, green, and blue. 16 is not divisible by 3, so often two colors are assigned 5 bits each and the third color gets 6 bits. Then you have to fit these colors into one uint16_t-sized pixel. It's probably not worth exploring this.
The slowness of your routine comes from painting one pixel at a time. You should paint into a buffer instead, and render the buffer once per frame, as in the sketch below. You might consider using other frameworks like SDL. Other games may use things like OpenGL, which takes advantage of GPU optimization for matrix operations etc.
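For comparison, a minimal SDL2 sketch of the buffer-per-frame approach (not your Xlib code, just the same idea: paint into a client-side buffer, then upload and present it once per frame):

#include <SDL2/SDL.h>
#include <stdlib.h>

int main(void) {
    SDL_Init(SDL_INIT_VIDEO);
    SDL_Window *win = SDL_CreateWindow("pixels", SDL_WINDOWPOS_CENTERED,
                                       SDL_WINDOWPOS_CENTERED, 640, 480, 0);
    SDL_Renderer *ren = SDL_CreateRenderer(win, -1, 0);
    SDL_Texture *tex = SDL_CreateTexture(ren, SDL_PIXELFORMAT_ARGB8888,
                                         SDL_TEXTUREACCESS_STREAMING, 640, 480);
    static Uint32 pixels[640 * 480]; // the client-side frame buffer

    int running = 1;
    while (running) {
        SDL_Event e;
        while (SDL_PollEvent(&e))
            if (e.type == SDL_QUIT) running = 0;
        for (int i = 0; i < 640 * 480; i++) // paint pixels into the buffer
            pixels[i] = 0xFF000000u | ((Uint32)rand() & 0xFFFFFF);
        SDL_UpdateTexture(tex, NULL, pixels, 640 * (int)sizeof(Uint32));
        SDL_RenderCopy(ren, tex, NULL, NULL); // one upload + one draw per frame
        SDL_RenderPresent(ren);
    }
    SDL_Quit();
    return 0;
}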
You must use a GPU. GPUs have a highly parallel architecture optimized for graphics (hence the name). To access the GPU you will use an API like OpenGL or Vulkan or make use of a Game Engine.
I've been battling with an issue when playing certain sources of uncompressed YUV 4:2:0 planar video data with SDL_Overlay (SDL 1.2.5).
I have no problems playing, say, 640x480 video. But I have just attempted playing a video with the resolution 854x480, and I get a strange effect. The line wraps 1-2 pixels too late (causing a shear-like transformation) and the chroma disappears, to be replaced with alternating R, G or B on each line. See this screenshot
The YUV data itself is correct, as I can save it to a file and play it in another player. It is not padded at this point - the pitch matches the line length.
My suspicion is that some issue occurs when the resolution is not a multiple of 4. Perhaps SDL_Surface expects an SDL_Overlay to have a chroma resolution as a multiple of 2?
Adding to my suspicion, I note that the RGB SDL_Surface that I create at a size of 854*480 has a pitch of 2564, not the 3*854 = 2562 I would expect.
If I add 1 or 2 pixels to the width of the SDL_Surface (but keep the overlay and rectangle the same), it works fine, albeit with a black border to the right. Of course this then breaks with videos which are a multiple of four.
Setup
screen = SDL_SetVideoMode(width, height, 24, SDL_SWSURFACE|SDL_ANYFORMAT|SDL_ASYNCBLIT);
if ( screen == NULL ) {
return 0;
}
YUVOverlay = SDL_CreateYUVOverlay(width, height, SDL_IYUV_OVERLAY, screen);
Ydata = new unsigned char[luma_size];
Udata = new unsigned char[chroma_size];
Vdata = new unsigned char[chroma_size];
YUVOverlay->pixels[0] = Ydata;
YUVOverlay->pixels[1] = Udata;
YUVOverlay->pixels[2] = Vdata;
SDL_DisplayYUVOverlay(YUVOverlay, dest);
Rendering loop:
SDL_LockYUVOverlay(YUVOverlay);
memcpy(Ydata, buffer, luma_size);
memcpy(Udata, buffer+luma_size, chroma_size);
memcpy(Vdata, buffer+luma_size+chroma_size, chroma_size);
int i = SDL_DisplayYUVOverlay(YUVOverlay, dest);
SDL_UnlockYUVOverlay(YUVOverlay);
The easiest fix for me to do is increase the RGB SDL_Surface size so that it is a multiple of 4 in each dimension. But then this adds a black border.
Is there a correct way of fixing this issue? Should I try playing with padding on my YUV data?
Each plane of your input data must start on an address divisible by 8, and the stride of each row must be divisible by 8. To be clear: your chroma planes need to obey this too.
This requirement seems to be from the SDL library's use of MMX multimedia instructions on an x86 cpu. See the comments in src/video/SDL_yuv_mmx.c in the distribution.
update: I looked at the actual assembly code, and there are additional assumptions not mentioned in the source code comments. This is for SDL 1.2.14. In addition to the modulo 8 assumption described above, the code assumes that both the input luma and input chroma planes are packed perfectly (i.e. width == stride).
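If you cannot change the video source, one workaround (a sketch, untested against your exact pipeline) is to create the overlay at a width rounded up so that both the luma and chroma strides are multiples of 8, and copy each row individually, replicating the last pixel instead of leaving a black border. Since the overlay is created at the padded width, its planes stay packed (width == stride), which also satisfies the assumption from the update:

static void copy_plane(unsigned char *dst, int dst_pitch,
                       const unsigned char *src, int src_w, int h) {
    for (int y = 0; y < h; y++) {
        memcpy(dst + y * dst_pitch, src + y * src_w, src_w);
        memset(dst + y * dst_pitch + src_w,           // pad with the last pixel
               src[y * src_w + src_w - 1], dst_pitch - src_w);
    }
}

int padded = (width + 15) & ~15; // 854 -> 864, so the 432-wide chroma planes are also % 8 == 0
YUVOverlay = SDL_CreateYUVOverlay(padded, height, SDL_IYUV_OVERLAY, screen);

SDL_LockYUVOverlay(YUVOverlay);
copy_plane(YUVOverlay->pixels[0], YUVOverlay->pitches[0], buffer, width, height);
copy_plane(YUVOverlay->pixels[1], YUVOverlay->pitches[1],
           buffer + luma_size, width / 2, height / 2);
copy_plane(YUVOverlay->pixels[2], YUVOverlay->pitches[2],
           buffer + luma_size + chroma_size, width / 2, height / 2);
SDL_UnlockYUVOverlay(YUVOverlay);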
I am trying to find the Distance Transform (DT) for each pixel of a binary image, using the OpenCV library for C. According to the rule of the DT, the value of each zero (black) pixel should be 0, and that of each 255 (white) pixel should be the shortest distance to a zero (black) pixel after applying the Distance Transform.
I post the code here.
FILE *f = fopen("dt_values.txt", "w"); // missing from the original snippet; file name assumed
IplImage *im = cvLoadImage("black_white.jpg", CV_LOAD_IMAGE_GRAYSCALE);
IplImage *tmp = cvCreateImage(cvGetSize(im), 32, 1); // depth 32 = IPL_DEPTH_32F
cvThreshold(im, im, 128, 255, CV_THRESH_BINARY_INV);
//cvSaveImage("out.jpg", im);
cvDistTransform(im, tmp, CV_DIST_L1, 3, 0, 0);
uchar *d = (uchar*)tmp->imageData;
uchar *da = (uchar*)im->imageData;
int i, j;
for(i=0;i<tmp->height;i++)
    for(j=0;j<tmp->width;j++)
    {
        //if((int)da[i*im->widthStep + j] == 255)
        fprintf(f, "pixel value = %d DT = %d\n", (int)da[i*im->widthStep + j], (int)d[i*tmp->widthStep + j]);
    }
cvShowImage("H", tmp);
cvWaitKey(0);
cvDestroyWindow("H");
fclose(f);
I write the pixel values along with their DT values to a file. As it turns out, some of the 0 pixels have DT values like 65, 128, etc., i.e. they are not 0. Moreover, I also have some white pixels whose DT values are 0 (which, I guess, shouldn't happen, as they should be at least 1).
Any kind of help will be appreciated.
Thanks in advance.
I guess it is because of CV_THRESH_BINARY_INV, which inverts your image. So the areas you expect to be white are in fact black for the DT.
Of course, inverting the image may be your intention. Display the image im and compare it with tmp for verification.
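As a side note, independent of the inversion: tmp was created with a depth of 32, i.e. IPL_DEPTH_32F, so cvDistTransform writes 32-bit floats into it, and reading imageData through a uchar pointer reinterprets those float bytes, which can also produce strange values. A sketch of reading the result as float instead:

float *dt = (float *)tmp->imageData;
int fstep = tmp->widthStep / sizeof(float); // widthStep is in bytes
for (i = 0; i < tmp->height; i++)
    for (j = 0; j < tmp->width; j++)
        fprintf(f, "pixel value = %d DT = %.1f\n",
                (int)da[i * im->widthStep + j], dt[i * fstep + j]);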
So I wrote a program that reads in a bitmap and prints it into the console using windows.h.
Windows (in the console) allows me to have two colors for each character space - a foreground color, and a background color.
I am limited to the 4-bit palette for these colors:
http://www.infotart.com/blog/wp-content/uploads/2008/06/windows_4bit_color_swatches.png
My program works fine for 16 colors, but I'm having trouble getting 256 figured out. (or figuring out if it's even possible)
I need to take an indexed color's RGB value (from the 256 8-bit colors, something like 224, 64, 0) and display it as two of the 16 available colors, with one of them dithered.
The foreground character is going to be one of the ASCII dither characters (176, 177, 178, I think).
So I figure each of the background colours needs to have R, G, B values of 0, 128, 255, etc.,
and the foreground can be 0, 32, 64, 96, 128, 160, 192, 224, or 255.
So if I had the number RGB = 192,0,0
I could set the background to RGB = 128,0,0
and have the foreground be RGB = 255,0,0 with ASCII character 176 (25% dither)
It seems like this would be pretty simple if I had a separate dither character available for red green and blue individually, but sadly I do not.
I know that the console is an awful choice, but I have to try and do this without the help of the Windows GDI.
I'm completely stumped trying to figure out the algorithm for this, and having trouble even seeing if my logic is making any sense.
Anybody able to shed some light on this? all help appreciated, I've hit a wall.
Although this may not be a direct answer about going from an RGB value to a colored ASCII representation, the 8088 Corruption program may be a good reference to get an idea of approaches that one can take to go from a bitmap image to a CGA screen.
The 8088 Corruption program was designed to run full-motion video with sound on an original IBM PC (Google Video link).
In an explanation of how the video codec was designed (presentation available at archive.org), the creator tried several techniques, one of which was to use the "ASCII dither characters", but he wasn't satisfied with the final quality of the picture.
So he went on to try a method where he would map multiple pixels onto an ASCII character. For example, if there were two lines overlapping perpendicularly, the ASCII character X would be drawn on the screen.
I haven't actually taken a look at the source code (which I believe is written in x86 assembly), but from the descriptions of the techniques used that I've read, it may be something that may be worth taking a look at.
Well, generally, you have to "invent" a mapping from any RGB to your specific subset of colored characters.
Since the exact formula is hard to compute, I would probably stick to a huge precomputed lookup table. The table has to be 3-dimensional (one dimension each for R, G, B), with [0..255] in each dimension. Each cell of the table should contain three pieces of information (packed into 2 bytes): the representing character, the foreground color, and the background color.
The table should be precomputed in the following manner: for each character that you want to use as output, select each foreground and background color, then compute the resulting RGB mixture of that character displayed with those colors. Then the cell at the given RGB mixture coordinates should be updated with the info of that character and those colors.
There will be empty cells, of course, as we have at most 256*16*16 variations of colored characters for 256^3 colors, so we have to fill the empty cells from the best (nearest) filled cells.
Then, for any input pixel we just look up that table, retrieve the character and the colors, and put them in the output.
It's possible to work the opposite way as well: compute a 256x16x16 table with the resulting RGB mixtures, then search it for the mixture that best fits the input RGB. A sketch of the forward-table idea follows.
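In this sketch the three dither glyphs and their foreground coverages are assumptions (e.g. 176 = 25%, 177 = 50%, 178 = 75%), and the table is quantized to 6 bits per channel to keep it at a manageable size:

typedef struct { unsigned char ch, fg, bg; } Cell;

static Cell lut[64][64][64]; // ~768 KB, indexed by quantized R, G, B

static const unsigned char glyphs[3]  = {176, 177, 178};
static const int          coverage[3] = {25, 50, 75}; // % foreground pixels (assumed)

void build_lut(const int pr[16], const int pg[16], const int pb[16]) {
    for (int d = 0; d < 3; d++)
        for (int fg = 0; fg < 16; fg++)
            for (int bg = 0; bg < 16; bg++) {
                // RGB mixture of glyph d drawn with colors fg over bg
                int r = (coverage[d] * pr[fg] + (100 - coverage[d]) * pr[bg]) / 100;
                int g = (coverage[d] * pg[fg] + (100 - coverage[d]) * pg[bg]) / 100;
                int b = (coverage[d] * pb[fg] + (100 - coverage[d]) * pb[bg]) / 100;
                lut[r >> 2][g >> 2][b >> 2] =
                    (Cell){glyphs[d], (unsigned char)fg, (unsigned char)bg};
            }
    // a real implementation would now propagate the filled cells into the
    // empty ones, e.g. with a nearest-neighbour pass, as described above
}

// lookup for an input pixel: Cell c = lut[r >> 2][g >> 2][b >> 2];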
I would suggest reading ImageMagick (Apache 2.0 license) quantization (Color Reduction) document as a starting place. Then you can look at color quantization, of which I believe the most popular two methods used are a median cut method, or using octrees.
You may also prefer to work in a non-RGB color space, such as the Lab color space, as it has some nice properties: Euclidean distance is more consistent with perceptual difference (src: Wikipedia).
There are several kinds of dithering: ordered (pattern) dithering, random dithering, and error-diffusion dithering. In this case I believe you want error diffusion, to reduce the apparent color error.
The 4-bit swatches have RGB values; when you mix two of these with a dither character, the resulting RGB value is a weighted average of the separate RGB components. The weight depends on the dither pattern used, so for the chequer-board pattern each RGB value has equal weight, and red+green becomes:
[255,0,0] + [0,128, 0] = [(255+0)/2, (0+128)/2, (0+0)/2] = [127, 64, 0] (a shade of brown)
The weightings for the other patterns would be determined by the proportion of foreground pixels vs background pixels.
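As a sketch, the weighted average for an arbitrary pattern, where w is the fraction of foreground pixels in the pattern (0.0 to 1.0):

void mix_colors(const int fg[3], const int bg[3], double w, int out[3]) {
    for (int i = 0; i < 3; i++)
        out[i] = (int)(w * fg[i] + (1.0 - w) * bg[i] + 0.5); // round to nearest
}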
Using that to efficiently find the nearest colour is probably the hard part! With three characters, 16 colours and two foreground/background options, there is a large number of combinations, though I imagine there might be large gaps in the gamut. If you only need to translate from a 256-colour palette to one of these combinations, rather than from full RGB, then a simple solution would be to write a program that exhaustively searches for the best-fit combination of foreground, background, and dither for each of the 256 colours, and generates a look-up table that can then be used in the final application as a direct lookup.
Of course this fixed look-up table approach only works if the 256-colour palette is also fixed (which is not necessarily the case). If it is not, then you may need a more efficient method of finding the best-match colour. I am sure it is possible to be smarter than a mere exhaustive search.
As the other answers have already pointed out, the technique to use ASCII shade characters to generate more colors from the 16 base colors is called dithering. Dithering comes at the cost of some image resolution. Also see the legendary 8088 Corruption / 8088 Domination programs.
I'd like to provide you some code on how to find the color pair and dithering shade character algorithmically. The approach below works both in the Windows/Linux consoles as well as over SSH and in the Windows Subsystem for Linux.
The general procedure is:
scale the source image down to the console resolution
map the color of each pixel to the console color that matches best
draw block/shade characters with the selected color
As a test image, I use a HSV color map:
At first, here are the 16 colors at double vertical resolution. With the block character (char)223 (▀), you can double the vertical resolution by using the text/background color to draw the upper and lower half of every character independently. For matching the color, I use the squared distance between the target and probe RGB components and brute-force test all 16 colors. The function sq(x) returns the square x*x.
static inline int sq(const int x) { return x*x; } // squared-difference helper

int get_console_color(const int color) {
const int r=(color>>16)&255, g=(color>>8)&255, b=color&255;
const int matches[16] = {
sq(r- 0)+sq(g- 0)+sq(b- 0), // color_black 0 0 0 0
sq(r- 0)+sq(g- 55)+sq(b-218), // color_dark_blue 1 0 55 218
sq(r- 19)+sq(g-161)+sq(b- 14), // color_dark_green 2 19 161 14
sq(r- 58)+sq(g-150)+sq(b-221), // color_light_blue 3 58 150 221
sq(r-197)+sq(g- 15)+sq(b- 31), // color_dark_red 4 197 15 31
sq(r-136)+sq(g- 23)+sq(b-152), // color_magenta 5 136 23 152
sq(r-193)+sq(g-156)+sq(b- 0), // color_orange 6 193 156 0
sq(r-204)+sq(g-204)+sq(b-204), // color_light_gray 7 204 204 204
sq(r-118)+sq(g-118)+sq(b-118), // color_gray 8 118 118 118
sq(r- 59)+sq(g-120)+sq(b-255), // color_blue 9 59 120 255
sq(r- 22)+sq(g-198)+sq(b- 12), // color_green 10 22 198 12
sq(r- 97)+sq(g-214)+sq(b-214), // color_cyan 11 97 214 214
sq(r-231)+sq(g- 72)+sq(b- 86), // color_red 12 231 72 86
sq(r-180)+sq(g- 0)+sq(b-158), // color_pink 13 180 0 158
sq(r-249)+sq(g-241)+sq(b-165), // color_yellow 14 249 241 165
sq(r-242)+sq(g-242)+sq(b-242) // color_white 15 242 242 242
};
int m=195075, k=0;
for(int i=0; i<16; i++) if(matches[i]<m) m = matches[k=i];
return k;
}
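To actually draw at double vertical resolution with the half-block character as described above, two image rows map onto one text row. This is a sketch assuming a hypothetical pixel_at(x, y) accessor for the scaled-down image and the print(...) helper linked at the end:

for (int y = 0; y + 1 < image_h; y += 2) {
    for (int x = 0; x < image_w; x++) {
        const int top    = get_console_color(pixel_at(x, y    ));
        const int bottom = get_console_color(pixel_at(x, y + 1));
        print("\u2580", top, bottom); // (char)223 on Windows: upper half = text color
    }
    print("\n", 15, 0); // newline
}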
The 16 colors are quite a limitation, so the workaround is dithering: mixing two colors to get better colors at the cost of image resolution. I use the shade characters (char)176/(char)177/(char)178 (Windows) or \u2591/\u2592/\u2593 (Linux); these are represented as (░/▒/▓). In the 12x7 font size that I use, the color mix ratios are 1:6, 2:5 and 1:2 respectively. To find the mixing ratios for your font settings, print the three shade characters in the console, take a screenshot, zoom in and count the pixels.
The three different shade ratios turn the 16 base colors into a whopping 616 colors, not counting duplicates. For matching the closest color, I first mix the colors with the shade character ratios, then compute the distance vector of target to probe rgb color components and brute force this for all probe color combinations. To encode which shade character is used and which two colors are foreground and background colors, I use bit shifting to get it all into one int return value.
int get_console_color_dither(const int color) {
const int r=(color>>16)&255, g=(color>>8)&255, b=color&255;
const int red [16] = { 0, 0, 19, 58,197,136,193,204,118, 59, 22, 97,231,180,249,242};
const int green[16] = { 0, 55,161,150, 15, 23,156,204,118,120,198,214, 72, 0,241,242};
const int blue [16] = { 0,218, 14,221, 31,152, 0,204,118,255, 12,214, 86,158,165,242};
int m=195075, k=0;
for(int i=0; i<16; i++) {
for(int j=0; j<16; j++) {
const int mixred=(red[i]+6*red[j])/7, mixgreen=(green[i]+6*green[j])/7, mixblue=(blue[i]+6*blue[j])/7; // (char)176: pixel ratio 1:6
const int match = sq(r-mixred)+sq(g-mixgreen)+sq(b-mixblue);
if(match<m) {
m = match;
k = i<<4|j;
}
}
}
for(int i=0; i<16; i++) {
for(int j=0; j<16; j++) {
const int mixred=(2*red[i]+5*red[j])/7, mixgreen=(2*green[i]+5*green[j])/7, mixblue=(2*blue[i]+5*blue[j])/7; // (char)177: pixel ratio 2:5
const int match = sq(r-mixred)+sq(g-mixgreen)+sq(b-mixblue);
if(match<m) {
m = match;
k = 1<<8|i<<4|j;
}
}
}
for(int i=0; i<16; i++) {
for(int j=0; j<i; j++) {
const int mixred=(red[i]+red[j])/2, mixgreen=(green[i]+green[j])/2, mixblue=(blue[i]+blue[j])/2; // (char)178: pixel ratio 1:2
const int match = sq(r-mixred)+sq(g-mixgreen)+sq(b-mixblue);
if(match<m) {
m = match;
k = 2<<8|i<<4|j;
}
}
}
return k;
}
Finally, you extract the shade character and the two colors by bit shifting and bit masking:
const int dither = get_console_color_dither(rgb_color);
const int textcolor=(dither>>4)&0xF, backgroundcolor=dither&0xF;
const int shade = dither>>8;
string character = "";
switch(shade) {
#if defined(_WIN32)
case 0: character += (char)176; break;
case 1: character += (char)177; break;
case 2: character += (char)178; break;
#elif defined(__linux__)
case 0: character += "\u2591"; break;
case 1: character += "\u2592"; break;
case 2: character += "\u2593"; break;
#endif // Windows/Linux
}
print(character, textcolor, backgroundcolor);
The print(...) function is provided here. The resulting image looks like this:
Finally, no ASCII art post is complete without the Lenna test image. This shows you what to expect from dithering.