Simple way to write uncompressed JPEG or PNG image? - c

I have a raw image in memory, organized as an array of 32-bit RGB values. I'd like to write that out as quickly as possible to an image file so that I can free up the memory. Is there a way to do the following to write either an uncompressed JPEG, PNG or TIFF image? Or perhaps I should say, what image formats are compatible with this approach to writing raw pixel data? Note that the top-left pixel is in the first 4 bytes of the pixel data.
void write_image(uint32_t *pixels, int width, int height) {
FILE *file=fopen("file.jpg","wb");
write_header (file, width, height);
fwrite (pixels,1,width*height*4,file); // write raw pixel data
write_end (file);
fclose(file);
}

There seem to be two different issues or motivations on your part.
First, there is the desire to write an image in some uncompressed format to (presumably) gain speed. PNG and JPEG are compressed formats, though you can instruct the encoder (at least in some PNG implementations) to use the "no compress" setting.
However: a) there are few scenarios in which that "optimization" would make a critical difference, the normal compressors are quite quick.
b) Even when encoding using some compression_level=0 setting, you are still encoding the image in a particular format (typically a header, to start with). What leads us to the second motivation.
Second, it seems that you want to avoid not exactly (only) the compression, but the encoding. That is, you want to write the pixels in your unencoded ("raw") format. IN that case, of course, you cannot write a PNG or JPEG image. You can use your own or some standard RAW format, or the quasi-raw BMP format. But you still need to take care of how the pixels are organized in memory (for example, one byte per channel? RGB? BGR? RGBA ?) and perhaps some other issues (for example, BMP requires that the bytes per line are multiple of 4).

Uncompressed JPEG and PNG are non-trivial, and the results would likely have portability issues. Your simplest option might be TGA:
TGA was designed to store rasterized images that could be quickly loaded for display into frame buffers. It has a very simple 18-byte header, then raw pixel data. It's streamable, meaning image data can be written as generated, provided it's rasterized. There is no requirement for length/offset/CRC tables. The biggest limitation is samples must be in rasterized BGR order with an optional fourth channel (usually alpha but can be anything). The width/height fields in the header must be 16-bit little-endian. TGA format isn't as widely supported as GIF/JPEG/PNG, but you should find some viewers capable of rendering it.
For BGR data, the header would probably be (hexadecimal values):
0x00 0x00 0x02 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
(width%256) (width/256) (height%256) (height/256) 0x18 0x20
For BGRA data, the header would probably be:
0x00 0x00 0x02 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
(width%256) (width/256) (height%256) (height/256) 0x20 0x28
If your data is not in BGR/BGRA order and you can't or don't want to convert it, my second recommendation would be TIFF:
TIFF design is focused on storing rasterized scans of documents and uses a more complex TTLV-style header. It is not as streamable and requires offset tables, but you can get around this with brute-force precalculation and by defining the entire image as a single strip. Once you get the header done, then you can stream the raw pixel data in rasterized RGB or RGBA order as it's generated. TIFF can also support either big-endian or little-endian and up to 16-bits per sample. It is often the output format of choice for scanners, so viewer/editor support is fairly common.
The first four bytes are a magic number (byte order and TIFF file version), followed by a 4-byte offset of the image file directory (IFD). Usually this offset is just 8 bytes into the file to place the IFD immediately next. The IFD is a count value (N) followed by a table of N 12-byte entries in the form of tag, type, count, and value/offset. The rest of the details are too much to describe here but you can read the specification for more info.

TGA format is a good option for this. It is just a simple header followed by your data. 24 or 32 bits.
https://en.wikipedia.org/wiki/Truevision_TGA

I'd recommend svpng for debugging purposes.
https://github.com/miloyip/svpng
A minimalistic C function for saving RGB/RGBA image into uncompressed PNG.
Features
RGB or RGBA color format
Single function
32 lines of ANSI C code
No dependency
Customizable output stream (default with C file descriptor)
Example
void test_rgba(void) {
unsigned char rgba[256 * 256 * 4], *p = rgba;
unsigned x, y;
FILE* fp = fopen("rgba.png", "wb");
for (y = 0; y < 256; y++)
for (x = 0; x < 256; x++) {
*p++ = (unsigned char)x; /* R */
*p++ = (unsigned char)y; /* G */
*p++ = 128; /* B */
*p++ = (unsigned char)((x + y) / 2); /* A */
}
svpng(fp, 256, 256, rgba, 1);
fclose(fp);
}

Related

is there any way to crop an jpg image captured by esp cam?

//I am trying to crop an image captured by espcam the image is in a jpg format I would like to crop it. As the image is stored as a single-dimensional array I tried to rearrange the elements in the array but no changes occurred //
I have cropped the image in RGB565 but I am struggling to understand the single-dimensional array(image buffer)
camera_config_t config;
config.ledc_channel = LEDC_CHANNEL_0;
config.ledc_timer = LEDC_TIMER_0;
config.pin_d0 = Y2_GPIO_NUM;
config.pin_d1 = Y3_GPIO_NUM;
config.pin_d2 = Y4_GPIO_NUM;
config.pin_d3 = Y5_GPIO_NUM;
config.pin_d4 = Y6_GPIO_NUM;
config.pin_d5 = Y7_GPIO_NUM;
config.pin_d6 = Y8_GPIO_NUM;
config.pin_d7 = Y9_GPIO_NUM;
config.pin_xclk = XCLK_GPIO_NUM;
config.pin_pclk = PCLK_GPIO_NUM;
config.pin_vsync = VSYNC_GPIO_NUM;
config.pin_href = HREF_GPIO_NUM;
config.pin_sscb_sda = SIOD_GPIO_NUM;
config.pin_sscb_scl = SIOC_GPIO_NUM;
config.pin_pwdn = PWDN_GPIO_NUM;
config.pin_reset = RESET_GPIO_NUM;
config.xclk_freq_hz = 20000000;
config.pixel_format = PIXFORMAT_RGB565;
config.frame_size = FRAMESIZE_SVGA;
// config.jpeg_quality = 10;
config.fb_count = 2;
esp_err_t result = esp_camera_init(&config);
if (result != ESP_OK) {
return false;
}
camera_fb_t * fb = NULL;
fb = esp_camera_fb_get();
if (!fb)
{
Serial.println("Camera capture failed");
}
the Fb buffer is a single-dimensional array I want to extract each individual RGB value.
JPG is a compressed format, meaning that your rows and columns are not corresponding to what you would see by displaying a 1:1 grid on the screen. You need to convert it to the plain RGB (or equivalents) format and then copy it.
JPG achieves compression by splitting the image into YCbCR components, using a mathematical transformation and then filtering. For additional information I refer to this page.
Luckily you can follow this tutorial to do the inverse JPEG transformation on an Arduino (tip: forget to do this in real time, unless your time constraints are very relaxed).
The idea is to use a library that converts the JPEG image into an array of data:
Using the library is fairly simple: we give it the JPEG file, and the library will start generating arrays of pixels – so called Minimum Coded Units, or MCUs for short. The MCU is a block of 16 by 8 pixels. The functions in the library will return the color value for each pixel as 16-bit color value. The upper 5 bits are the red value, the middle 6 are green and the lower 5 are blue. Now we can send these values by any sort of communication channel we like.
For your use case you won't send the data through the communication channel, but rather store it in a local array by pushing the blocks into adjacent tiles, then do the crop.
That depends on what kind of hardware (camera and board) you are using.
I'm basing this on the OV2640 camera module because it's the one I've been working with. It delivers the image to the frame buffer already encoded, so I'm guessing this might be what you are facing.
Trying to crop the image after it has been encoded can be tricky, but you might be able to instruct the camera chip to only deliver a certain part of the sensor output in the first place using a window function.
The easiest way to access this setting is to define a function to access this:
void setWindow(int resolution , int xOffset, int yOffset, int xLength, int yLength) {
sensor_t * s = esp_camera_sensor_get();
resolution = 0;
s->set_res_raw(s, resolution, 0, 0, 0, xOffset, yOffset, xLength, yLength, xLength, yLength, true, true);
}
/*
* resolution = 0 \\ 1600 x 1200
* resolution = 1 \\ 800 x 600
* resolution = 2 \\ 400 x 296
*/
where (xOffset,yOffset) is the origin of the window in pixels and (xLength,yLength) is the size of the window in pixels. Be aware that changing the resolution will effectively overwrite these settings. Otherwise this works great for me, although for some reason only if the aspect ratio of 4:3 is preserved in the window size.
Looking at the output format table for the ESP32 Camera Driver one can see that most output formats are non-jpeg. If you can handle a RAW format instead (it will be slower to save/transfer, and be MUCH larger) then that would allow you to more easily crop the image by make a copy with a couple of loops. JPEG is compressed and not easily cropped. The page linked also mentions this:
Using YUV or RGB puts a lot of strain on the chip because writing to PSRAM is not particularly fast. The result is that image data might be missing. This is particularly true if WiFi is enabled. If you need RGB data, it is recommended that JPEG is captured and then turned into RGB using fmt2rgb888 or fmt2bmp/frame2bmp
If you are using PIXFORMAT_RGB565 (which means each pixel value will be kept in TWO bytes, and the image is not jpeg compressed) and FRAMESIZE_SVGA (800x600 pixels), you should be able to access the framebuffer as a two-dimensional array if you want:
uint16_t *buffer = fb->buf;
uint16_t pxl = buffer[row * 800 + column]; // 800 is the SVGA width
// pxl now contains 5 R-bits, 6 G-bits, 5 B-bits

Reading SQLite header

I was trying to parse the header from an SQLite database file, using this (fragment of the actual) code:
struct Header_info {
char *filename;
char *sql_string;
uint16_t page_size;
};
int read_header(FILE *db, struct Header_info *header)
{
assert(db);
uint8_t sql_buf[100] = {0};
/* load the header */
if(fread(sql_buf, 100, 1, db) != 1) {
return ERR_SIZE;
}
/* copy the string */
header->sql_string = strdup((char *)sql_buf);
/* verify that we have a proper header */
if(strcmp(header->sql_string, "SQLite format 3") != 0) {
return ERR_NOT_HEADER;
}
memcpy(&header->page_size, (sql_buf + 16), 2);
return 0;
}
Here are the relevant bytes of the file I'm testing it on:
0000000: 5351 4c69 7465 2066 6f72 6d61 7420 3300 SQLite format 3.
0000010: 1000 0101 0040 2020 0000 c698 0000 1a8e .....# ........
Following this spec, the code looks correct to me.
Later I print header->page_size with this line:
printf("\tPage size: %"PRIu16"\n", header->page_size);
But that line prints out 16, instead of the expected 4096. Why? I'm almost certain it's some basic thing that I've just overlooked.
It's an endianness problem. x86 is little-endian, that is, in memory, the least significant byte is stored first. When you load 10 00 into memory on a little-endian architecture, you therefore get 00 10 in human-readable form, which is 16 instead of 4096.
Your problem is therefore that memcpy is not an appropriate tool to read the value.
See the following section of the SQLite file format spec :
1.2.2 Page Size
The two-byte value beginning at offset 16 determines the page size of
the database. For SQLite versions 3.7.0.1 and earlier, this value is
interpreted as a big-endian integer and must be a power of two between
512 and 32768, inclusive. Beginning with SQLite version 3.7.1, a page
size of 65536 bytes is supported. The value 65536 will not fit in a
two-byte integer, so to specify a 65536-byte page size, the value is
at offset 16 is 0x00 0x01. This value can be interpreted as a
big-endian 1 and thought of is as a magic number to represent the
65536 page size. Or one can view the two-byte field as a little endian
number and say that it represents the page size divided by 256. These
two interpretations of the page-size field are equivalent.
It seems an endianness issue. If you are on a little-endian machine this line:
memcpy(&header->page_size, (sql_buf + 16), 2);
copies the two bytes 10 00 into an uint16_t which will have the low-order byte at the lower address.
You can do this instead:
header->page_size = sql_buf[17] | (sql_buf[16] << 8);
Update
For the record, note that the solution I propose will work regardless of the endianness of the machine (see this Rob Pike's Article).

Playing YUV video with SDL: Line wraps too late when resolution is not multiple of 4

I've been battling with an issue when playing certain sources of uncompressed YUV 4:2:0 planar video data with SDL_Overlay (SDL 1.2.5).
I have no problems playing, say, 640x480 video. But I have just attempted playing a video with the resolution 854x480, and I get a strange effect. The line wraps 1-2 pixels too late (causing a shear-like transformation) and the chroma disappears, to be replaced with alternating R, G or B on each line. See this screenshot
The YUV data itself is correct, as I can save it to a file and play it in another player. It is not padded at this point - the pitch matches the line length.
My suspicion is that some issue occurs when the resolution is not a multiple of 4. Perhaps SDL_Surface expects an SDL_Overlay to have a chroma resolution as a multiple of 2?
Adding to my suspicion, I note that the RGB SDL_Surface that I create at a size of 854*480 has a pitch of 2564, not the 3*854 = 2562 I would expect.
If I add 1 or 2 pixels to the width of the SDL_Surface (but keep the overlay and rectangle the same), it works fine, albeit with a black border to the right. Of course this then breaks with videos which are a multiple of four.
Setup
screen = SDL_SetVideoMode(width, height, 24, SDL_SWSURFACE|SDL_ANYFORMAT|SDL_ASYNCBLIT);
if ( screen == NULL ) {
return 0;
}
YUVOverlay = SDL_CreateYUVOverlay(width, height, SDL_IYUV_OVERLAY, screen);
Ydata = new unsigned char[luma_size];
Udata = new unsigned char[chroma_size];
Vdata = new unsigned char[chroma_size];
YUVOverlay->pixels[0] = Ydata;
YUVOverlay->pixels[1] = Udata;
YUVOverlay->pixels[2] = Vdata;
SDL_DisplayYUVOverlay(YUVOverlay, dest);
Rendering loop:
SDL_LockYUVOverlay(YUVOverlay);
memcpy(Ydata, buffer, luma_size);
memcpy(Udata, buffer+luma_size, chroma_size);
memcpy(Vdata, buffer+luma_size+chroma_size, chroma_size);
int i = SDL_DisplayYUVOverlay(YUVOverlay, dest);
SDL_UnlockYUVOverlay(YUVOverlay);
The easiest fix for me to do is increase the RGB SDL_Surface size so that it is a multiple of 4 in each dimension. But then this adds a black border.
Is there a correct way of fixing this issue? Should I try playing with padding on my YUV data?
Each plane of your input data must start on an address divisible by 8, and the stride of each row must be divisible by 8. To be clear: your chroma planes need to obey this too.
This requirement seems to be from the SDL library's use of MMX multimedia instructions on an x86 cpu. See the comments in src/video/SDL_yuv_mmx.c in the distribution.
update: I looked at the actual assembly code, and there are additional assumptions not mentioned in the source code comments. This is for SDL 1.2.14. In addition to the modulo 8 assumption described above, the code assumes that both the input luma and input chroma planes are packed perfectly (i.e. width == stride).

Getting width and height from jpeg image file

I wrote this function to given filename(a jpeg file) shall print its size in pixels, w and h. According to tutorial that I'm reading,
//0xFFC0 is the "Start of frame" marker which contains the file size
//The structure of the 0xFFC0 block is quite simple [0xFFC0][ushort
length][uchar precision][ushort x][ushort y]
So, I wrote this struct
#pragma pack(1)
struct imagesize {
unsigned short len; /* 2-bytes */
unsigned char c; /* 1-byte */
unsigned short x; /* 2-bytes */
unsigned short y; /* 2-bytes */
}; //sizeof(struct imagesize) == 7
#pragma pack()
and then:
#define SOF 0xC0 /* start of frame */
void jpeg_test(const char *filename)
{
FILE *fh;
unsigned char buf[4];
unsigned char b;
fh = fopen(filename, "rb");
if(fh == NULL)
fprintf(stderr, "cannot open '%s' file\n", filename);
while(!feof(fh)) {
b = fgetc(fh);
if(b == SOF) {
struct imagesize img;
#if 1
ungetc(b, fh);
fread(&img, 1, sizeof(struct imagesize), fh);
#else
fread(buf, 1, sizeof(buf), fh);
int w = (buf[0] << 8) + buf[1];
int h = (buf[2] << 8) + buf[3];
img.x = w;
img.y = h;
#endif
printf("%dx%d\n",
img.x,
img.y);
break;
}
}
fclose(fh);
}
But I'm getting 520x537 instead of 700x537, that's the real size.
Can someone point and explain where I'm wrong?
A JPEG file consists of a number of sections. Each section starts with 0xff, followed by 1-byte section identifier, followed by number of data bytes in the section (in 2 bytes), followed by the data bytes. The sequence 0xffc0, or any other 0xff-- two-byte sequence, inside the data byte sequence, has no significance and does not mark a start of a section.
As an exception, the very first section does not contain any data or length.
You have to read each section header in turn, parse the length, then skip corresponding number of bytes before starting to read next section. You cannot just search for 0xffc0, let alone just 0xc0, without regard to the section structure.
Source.
There are several issues to consider, depending on how "universal" you want your program to be. First, I recommend using libjpeg. A good JPEG parser can be a bit gory, and this library does a lot of the heavy lifting for you.
Next, to clarify n.m.'s statement, you have no guarantee that the first 0xFFCO pair is the SOF of interest. I've found that modern digital cameras like to load up the JPEG header with a number of APP0 and APP1 blocks, which can mean that the first SOF marker you encounter during a sequential read may actually be the image thumbnail. This thumbnail is usually stored in JPEG format (as far as I have observed, anyway) and is thus equipped with its own SOF marker. Some cameras and/or image editing software can include an image preview that is larger than a thumbnail (but smaller than the actual image). This preview image is usually JPEG and again has it's own SOF marker. It's not unusual for the image SOF marker to be the last one.
Most (all?) modern digital cameras also encode the image attributes in the EXIF tags. Depending upon your application requirements, this might be the most straightforward, unambiguous way to obtain the image size. The EXIF standard document will tell you all you need to know about writing an EXIF parser. (libExif is available, but it never fit my applications.) Regardless, if you roll your own EXIF or rely on a library, there are some good tools for inspecting EXIF data. jhead is very good tool, and I've also had good luck with ExifTool.
Lastly, pay attention to endianess. SOF and other standard JPEG markers are big-endian, but EXIF markers may vary.
As you mention, the spec states that the marker is 0xFFC0. But it seems that you only ever look for a single byte with the code if (b==SOF)
If you open the file up with a hex editor, and search for 0xFFC0 you'll find the marker. Now as long as the first 0xC0 in the file is the marker, your code will work. If it's not though, you get all sorts of undefined behaviour.
I'd be inclined to read the whole file first. It's a jpg right, how big could it be? (thought this is important if on an embedded system) Then just step through it looking for the first char of my marker. When found, I'd use a memcmp to see if the next 3bytes mathed the rest of the sig.

Loading an 8bpp grayscale BMP in C

I can't make sense of the BMP format, I know its supposed to be simple, but somehow I'm missing something. I thought it was 2 headers followed by the actual bytes defining the image, but the numbers do not add up.
For instance, I'm simply trying to load this BMP file into memory (640x480 8bpp grayscale) and just write it back to a different file. From what I understand, there are two different headers BITMAPFILEHEADER and BITMAPINFOHEADER. The BITMAPFILEHEADER is 14 bytes, and the BITMAPINFOHEADER is 40 bytes (this one depends on the BMP, how can I tell that's another story). Anyhow, the BITMAPFILEHEADER, through its parameter bfOffBits says that the bitmap bits start at offset 1078. This means that there are 1024 ( 1078 - (40+14) ) other bytes, carrying more information. What are those bytes, and how do I read them, this is the problem. Or is there a more correct way to load a BMP and write it to disk ?
For reference here is the code I used ( I'm doing all of this under windows btw.)
#include <windows.h>
#include <iostream>
#include <stdio.h>
HANDLE hfile;
DWORD written;
BITMAPFILEHEADER bfh;
BITMAPINFOHEADER bih;
int main()
hfile = CreateFile("image.bmp",GENERIC_READ,FILE_SHARE_READ,NULL,OPEN_EXISTING,FILE_ATTRIBUTE_NORMAL,NULL);
ReadFile(hfile,&bfh,sizeof(bfh),&written,NULL);
ReadFile(hfile,&bih,sizeof(bih),&written,NULL);
int imagesize = bih.biWidth * bih.biHeight;
image = (unsigned char*) malloc(imagesize);
ReadFile(hfile,image,imagesize*sizeof(char),&written,NULL);
CloseHandle(hfile);
I'm then doing the exact opposite to write to a file,
hfile = CreateFile("imageout.bmp",GENERIC_WRITE,FILE_SHARE_WRITE,NULL,CREATE_ALWAYS,FILE_ATTRIBUTE_NORMAL,NULL);
WriteFile(hfile,&bfh,sizeof(bfh),&written,NULL);
WriteFile(hfile,&bih,sizeof(bih),&written,NULL);
WriteFile(hfile,image,imagesize*sizeof(char),&written,NULL);
CloseHandle(hfile);
Edit --- Solved
Ok so I finally got it right, it wasn't really complicated after all. As Viktor pointed out, these 1024 bytes represent the color palette.
I added the following to my code:
RGBQUAD palette[256];
// [...] previous declarations [...] int main() [...] then read two headers
ReadFile(hfile,palette,sizeof(palette),&written,NULL);
And then when I write back I added the following,
WriteFile(hfile,palette,sizeof(palette),&written,NULL);
"What are those bytes, and how do I read them, this is the problem."
Those bytes are Palette (or ColorTable in .BMP format terms), as Retired Ninja mentioned in the comment. Basically, it is a table which specifies what color to use for each 8bpp value encountered in the bitmap data.
For greyscale the palette is trivial (I'm not talking about color models and RGB -> greyscale conversion):
for(int i = 0 ; i < 256 ; i++)
{
Palette[i].R = i;
Palette[i].G = i;
Palette[i].B = i;
}
However, there's some padding in the ColorTable's entries, so it takes 4 * 256 bytes and not 256 * 3 needed by you. The fourth component in the ColorTable's entry (RGBQUAD Struct) is not the "alpha channel", it is just something "reserved". See the MSDN on RGBQUAD (MSDN, RGBQUAD).
The detailed format description can be found on the wikipedia page:Wiki, bmp format
There's also this linked question on SO with RGBQUAD structure: Writing BMP image in pure c/c++ without other libraries
As Viktor says in his answer, those bits are the pallete. As for how should you read them, take a look at this header-only bitmap class. In particular look at references to ColorTable for how it treats the pallette bit depending on the type of BMP is it was given.

Resources