Processing YUV I420 from framebuffer? - c

I have a byte array named buf, that contains a single video frame in YUV I420 format obtained from a framebuffer. For every video frame I also have the following information:
Size (e.g. 320x180)
Stride Y (e.g. 384)
Stride U (e.g. 384)
Stride V (e.g. 384)
Plane offset Y (e.g. 0)
Plane offset U (e.g. 69120)
Plane offset V (e.g. 69312)
Concatenating multiple video frames into a file and passing that, together with the size information, to a raw video decoder in VLC or FFmpeg just produces garbled colors, so I think the bytes in buf need to be reordered using the information above to produce playable output. But I'm completely new to working with video, so this may be wrong.
In which order should the size, stride and offset information be combined with the bytes in buf to produce a byte stream that could be played raw in a video player?
Example:
https://transfer.sh/E8LNy5/69518644-example-01.yuv

The layout of the data seems odd, but using the given offsets and strides it is decodable as YUV.
First there are 384 * 180 bytes of luma.
The chroma lines follow, each 192 bytes long... but the U and V lines take turns! This is what the strange offsets account for: the U offset points exactly to the end of the luma, the V offset is 192 bytes further, and reading each chroma plane leapfrogs ahead 384 bytes at a time.
Here's code that extracts those planes and assembles them as I420, for decoding with cvtColor:
#!/usr/bin/env python3
import numpy as np
import cv2 as cv
def extract(data, offset, stride, width, height):
    data = data[offset:] # skip to...
    data = data[:height * stride] # get `height` lines
    data.shape = (height, stride)
    return data[:, :width] # drop overscan/padding
width, height = 320, 180
Yoffset = 0
Uoffset = 69120 # 384*180
Voffset = 69312 # 384*180 + 192
Ystride = 384
Ustride = 384
Vstride = 384
data = np.fromfile("69518644-example-01.yuv", dtype=np.uint8)
Y = extract(data, Yoffset, Ystride, width, height)
U = extract(data, Uoffset, Ustride, width // 2, height // 2)
V = extract(data, Voffset, Vstride, width // 2, height // 2)
# construct I420: Y,U,V planes in order
i420 = np.concatenate([Y.flat, U.flat, V.flat])
i420.shape = (height * 3 // 2, width)
result = cv.cvtColor(i420, cv.COLOR_YUV2BGR_I420)
cv.namedWindow("result", cv.WINDOW_NORMAL)
cv.resizeWindow("result", width * 4, height * 4)
cv.imshow("result", result)
cv.waitKey()
cv.destroyAllWindows()
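To feed a whole capture to a player instead of previewing a single frame, the same repacking can be appended to the script above and written out frame by frame. A minimal sketch, assuming the frames are simply concatenated in a file named capture.bin (the file name and the per-frame size of 384 * 270 = 103680 bytes are assumptions derived from the offsets above):
FRAME_BYTES = 384 * 270  # luma (384*180) plus interleaved chroma lines (384*90); an assumption
raw = np.fromfile("capture.bin", dtype=np.uint8)  # "capture.bin" is a placeholder name
with open("out.yuv", "wb") as f:
    for frame in raw.reshape(-1, FRAME_BYTES):
        y = extract(frame, Yoffset, Ystride, width, height)
        u = extract(frame, Uoffset, Ustride, width // 2, height // 2)
        v = extract(frame, Voffset, Vstride, width // 2, height // 2)
        f.write(y.tobytes() + u.tobytes() + v.tobytes())
# the repacked stream is plain I420 and should play with e.g.:
#   ffplay -f rawvideo -pixel_format yuv420p -video_size 320x180 out.yuv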

Related

Is there any way to crop a JPG image captured by an ESP cam?

I am trying to crop an image captured by an ESP cam. The image is in JPG format, and I would like to crop it. As the image is stored as a single-dimensional array, I tried to rearrange the elements in the array, but no changes occurred.
I have cropped the image in RGB565, but I am struggling to understand the single-dimensional array (image buffer).
camera_config_t config;
config.ledc_channel = LEDC_CHANNEL_0;
config.ledc_timer = LEDC_TIMER_0;
config.pin_d0 = Y2_GPIO_NUM;
config.pin_d1 = Y3_GPIO_NUM;
config.pin_d2 = Y4_GPIO_NUM;
config.pin_d3 = Y5_GPIO_NUM;
config.pin_d4 = Y6_GPIO_NUM;
config.pin_d5 = Y7_GPIO_NUM;
config.pin_d6 = Y8_GPIO_NUM;
config.pin_d7 = Y9_GPIO_NUM;
config.pin_xclk = XCLK_GPIO_NUM;
config.pin_pclk = PCLK_GPIO_NUM;
config.pin_vsync = VSYNC_GPIO_NUM;
config.pin_href = HREF_GPIO_NUM;
config.pin_sscb_sda = SIOD_GPIO_NUM;
config.pin_sscb_scl = SIOC_GPIO_NUM;
config.pin_pwdn = PWDN_GPIO_NUM;
config.pin_reset = RESET_GPIO_NUM;
config.xclk_freq_hz = 20000000;
config.pixel_format = PIXFORMAT_RGB565;
config.frame_size = FRAMESIZE_SVGA;
// config.jpeg_quality = 10;
config.fb_count = 2;
esp_err_t result = esp_camera_init(&config);
if (result != ESP_OK) {
    return false;
}
camera_fb_t * fb = NULL;
fb = esp_camera_fb_get();
if (!fb) {
    Serial.println("Camera capture failed");
}
The fb buffer is a single-dimensional array; I want to extract each individual RGB value.
JPG is a compressed format, meaning that the rows and columns of the buffer do not correspond to what you would see by displaying a 1:1 grid on the screen. You need to convert it to a plain RGB (or equivalent) format first and then copy from that.
JPG achieves compression by splitting the image into YCbCr components, applying a mathematical transformation, and then filtering. For additional information I refer to this page.
Luckily you can follow this tutorial to do the inverse JPEG transformation on an Arduino (tip: forget about doing this in real time, unless your time constraints are very relaxed).
The idea is to use a library that converts the JPEG image into an array of data:
Using the library is fairly simple: we give it the JPEG file, and the library will start generating arrays of pixels – so called Minimum Coded Units, or MCUs for short. The MCU is a block of 16 by 8 pixels. The functions in the library will return the color value for each pixel as 16-bit color value. The upper 5 bits are the red value, the middle 6 are green and the lower 5 are blue. Now we can send these values by any sort of communication channel we like.
For your use case you won't send the data through a communication channel, but rather store it in a local array by pasting the blocks into adjacent tiles, then do the crop, as sketched below.
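As a rough sketch of that tiling idea (Python for brevity; decode_mcus() is a hypothetical stand-in for the JPEG library's per-MCU output, and the 16x8 MCU size is taken from the quote above):
import numpy as np

MCU_W, MCU_H = 16, 8  # MCU size from the quote above

def assemble(mcus, img_w, img_h):
    # paste each (MCU_H, MCU_W) tile of RGB565 values into its slot,
    # left to right, top to bottom
    img = np.zeros((img_h, img_w), dtype=np.uint16)
    tiles_per_row = img_w // MCU_W
    for n, tile in enumerate(mcus):
        y = (n // tiles_per_row) * MCU_H
        x = (n % tiles_per_row) * MCU_W
        img[y:y + MCU_H, x:x + MCU_W] = tile
    return img

# img = assemble(decode_mcus(jpeg_bytes), 800, 600)  # decode_mcus is hypothetical
# crop = img[y0:y1, x0:x1]                           # cropping is then a plain slice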
That depends on what kind of hardware (camera and board) you are using.
I'm basing this on the OV2640 camera module because it's the one I've been working with. It delivers the image to the frame buffer already encoded, so I'm guessing this might be what you are facing.
Trying to crop the image after it has been encoded can be tricky, but you might be able to instruct the camera chip to deliver only a certain part of the sensor output in the first place, using a window function.
The easiest way is to define a helper function for this setting:
void setWindow(int resolution, int xOffset, int yOffset, int xLength, int yLength) {
    sensor_t * s = esp_camera_sensor_get();
    s->set_res_raw(s, resolution, 0, 0, 0, xOffset, yOffset, xLength, yLength, xLength, yLength, true, true);
}
/*
 * resolution = 0 // 1600 x 1200
 * resolution = 1 // 800 x 600
 * resolution = 2 // 400 x 296
 */
where (xOffset,yOffset) is the origin of the window in pixels and (xLength,yLength) is the size of the window in pixels. Be aware that changing the resolution will effectively overwrite these settings. Otherwise this works great for me, although for some reason only if the aspect ratio of 4:3 is preserved in the window size.
Looking at the output format table for the ESP32 Camera Driver, one can see that most output formats are non-JPEG. If you can handle a RAW format instead (it will be slower to save/transfer, and be MUCH larger), that would allow you to crop the image more easily by making a copy with a couple of loops. JPEG is compressed and not easily cropped. The page linked also mentions this:
Using YUV or RGB puts a lot of strain on the chip because writing to PSRAM is not particularly fast. The result is that image data might be missing. This is particularly true if WiFi is enabled. If you need RGB data, it is recommended that JPEG is captured and then turned into RGB using fmt2rgb888 or fmt2bmp/frame2bmp
If you are using PIXFORMAT_RGB565 (which means each pixel value will be kept in TWO bytes, and the image is not jpeg compressed) and FRAMESIZE_SVGA (800x600 pixels), you should be able to access the framebuffer as a two-dimensional array if you want:
uint16_t *buffer = (uint16_t *) fb->buf; // fb->buf is a byte pointer, so cast it
uint16_t pxl = buffer[row * 800 + column]; // 800 is the SVGA width
// pxl now contains 5 R-bits, 6 G-bits, 5 B-bits
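To pull the individual R, G and B values out of such a pixel, and to crop by copying with a couple of loops as suggested above, the bit layout works out as below (a Python sketch of the same shifts and masks; note that some drivers deliver RGB565 with the two bytes swapped, so a byte swap may be needed first):
def rgb565_to_rgb888(pxl):
    r = (pxl >> 11) & 0x1F  # top 5 bits
    g = (pxl >> 5) & 0x3F   # middle 6 bits
    b = pxl & 0x1F          # bottom 5 bits
    return (r << 3, g << 2, b << 3)  # scale up to 0..255

def crop(buf, src_w, x0, y0, w, h):
    # buf is the flat array of uint16 pixels, src_w the source width (800 for SVGA)
    return [buf[(y0 + row) * src_w + (x0 + col)]
            for row in range(h)
            for col in range(w)]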

Is there any way for image thresholding without GDI+? [duplicate]

Is it possible to directly read/write to a WriteableBitmap's pixel data? I'm currently using WriteableBitmapEx's SetPixel() but it's slow and I want to access the pixels directly without any overhead.
I haven't used HTML5's canvas in a while, but if I recall correctly you could get its image data as a single array of numbers and that's kind of what I'm looking for
Thanks in advance
To answer your question, you can access a WriteableBitmap's pixel data more directly by using the Lock, write, Unlock pattern, as demonstrated below, but it is typically not necessary unless you are basing your drawing on the existing contents of the image. More typically, you can just create a new buffer and make a bitmap from it, rather than the other way around.
That being said, there are many extensibility points in WPF to perform innovative drawing without resorting to pixel manipulation. For most controls, the existing WPF primitives (Border, Line, Rectangle, Image, etc...) are more than sufficient - don't be concerned about using many of them, they are rather cheap to use. For complex controls, you can use the DrawingContext to draw D3D primitives. For image effects, you can implement GPU assisted shaders using the Effect class or use the built in effects (Blur and Shadow).
But, if your situation requires direct pixel access, pick a pixel format and start writing. I suggest BGRA32 because it is easy to understand and is probably the most common one to be discussed.
BGRA32 means the pixel data is stored in memory as 4 bytes representing the blue, green, red, and alpha channels of an image, in that order. It is convenient because each pixel ends up on a 4-byte boundary, lending it to storage in a 32-bit integer. When dealing with a 32-bit integer, keep in mind the byte order will be reversed on most platforms (check BitConverter.IsLittleEndian to determine the proper byte order at runtime if you need to support multiple platforms; x86 and x86_64 are both little-endian).
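A quick check of that layout (a Python sketch of the byte reversal described above; the four bytes B, G, R, A in memory read back as A-R-G-B from the high end of a little-endian 32-bit integer):
import struct

b, g, r, a = 0x11, 0x22, 0x33, 0xFF  # memory order: B, G, R, A
(as_int,) = struct.unpack('<I', bytes([b, g, r, a]))
print(hex(as_int))  # 0xff332211 -> A, R, G, B from high byte to low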
The image data is stored in horizontal strips, each one stride wide, and each strip composes a single row of the image. The stride is always greater than or equal to the pixel width of the image multiplied by the number of bytes per pixel in the selected format. Certain situations, specific to certain architectures, can cause the stride to be longer than width * bytesPerPixel, so you must use the stride to calculate the start of a row rather than multiplying by the width. Since we are using a 4-byte-wide pixel format, our stride does happen to be width * 4, but you should not rely on that.
As mentioned, the only case where I would suggest using a WriteableBitmap is when you are accessing an existing image, so that is the example below:
Before / After:
// must be compiled with /UNSAFE
// get an image to draw on and convert it to our chosen format
BitmapSource srcImage = JpegBitmapDecoder.Create(File.Open("img13.jpg", FileMode.Open),
BitmapCreateOptions.None, BitmapCacheOption.OnLoad).Frames[0];
if (srcImage.Format != PixelFormats.Bgra32)
srcImage = new FormatConvertedBitmap(srcImage, PixelFormats.Bgra32, null, 0);
// get a writable bitmap of that image
var wbitmap = new WriteableBitmap(srcImage);
int width = wbitmap.PixelWidth;
int height = wbitmap.PixelHeight;
int stride = wbitmap.BackBufferStride;
int bytesPerPixel = (wbitmap.Format.BitsPerPixel + 7) / 8;
wbitmap.Lock();
byte* pImgData = (byte*)wbitmap.BackBuffer;
// set alpha to 50% for any pixel with red < 0x44 and invert the others
int cRowStart = 0;
int cColStart = 0;
for (int row = 0; row < height; row++)
{
    cColStart = cRowStart;
    for (int col = 0; col < width; col++)
    {
        byte* bPixel = pImgData + cColStart;
        UInt32* iPixel = (UInt32*)bPixel;
        if (bPixel[2 /* bgRa */] < 0x44)
        {
            // set to 50% transparent
            bPixel[3 /* bgrA */] = 0x7f;
        }
        else
        {
            // invert but maintain alpha
            *iPixel = *iPixel ^ 0x00ffffff;
        }
        cColStart += bytesPerPixel;
    }
    cRowStart += stride;
}
wbitmap.Unlock();
// if you are going across threads, you will need to additionally freeze the source
wbitmap.Freeze();
However, it really isn't necessary if you are not modifying an existing image. For example, you can draw a checkerboard pattern using all safe code:
Output:
// draw rectangles
int width = 640, height = 480, bytesperpixel = 4;
int stride = width * bytesperpixel;
byte[] imgdata = new byte[width * height * bytesperpixel];
int rectDim = 40;
UInt32 darkcolorPixel = 0xffaaaaaa;
UInt32 lightColorPixel = 0xffeeeeee;
UInt32[] intPixelData = new UInt32[width * height];
for (int row = 0; row < height; row++)
{
    for (int col = 0; col < width; col++)
    {
        intPixelData[row * width + col] = ((col / rectDim) % 2) != ((row / rectDim) % 2) ?
            lightColorPixel : darkcolorPixel;
    }
}
Buffer.BlockCopy(intPixelData, 0, imgdata, 0, imgdata.Length);
// compose the BitmapImage
var bsCheckerboard = BitmapSource.Create(width, height, 96, 96, PixelFormats.Bgra32, null, imgdata, stride);
And you don't really even need an Int32 intermediate, if you write to the byte array directly.
Output:
// draw using byte array
int width = 640, height = 480, bytesperpixel = 4;
int stride = width * bytesperpixel;
byte[] imgdata = new byte[width * height * bytesperpixel];
// draw a gradient from green to red from left to right (G ff -> 00; R 00 -> ff)
// draw a gradient of alpha from transparent to opaque, top to bottom
// Blue constant at 00
for (int row = 0; row < height; row++)
{
    for (int col = 0; col < width; col++)
    {
        // BGRA
        imgdata[row * stride + col * 4 + 0] = 0;
        imgdata[row * stride + col * 4 + 1] = Convert.ToByte((1 - (col / (float)width)) * 0xff);
        imgdata[row * stride + col * 4 + 2] = Convert.ToByte((col / (float)width) * 0xff);
        imgdata[row * stride + col * 4 + 3] = Convert.ToByte((row / (float)height) * 0xff);
    }
}
var gradient = BitmapSource.Create(width, height, 96, 96, PixelFormats.Bgra32, null, imgdata, stride);
Edit: apparently you are trying to use WPF to make some sort of image editor. I would still use WPF primitives for shapes and source bitmaps, then implement translation, scaling and rotation as RenderTransforms and bitmap effects as Effects, keeping everything within the WPF model. But if that does not work for you, we have many other options.
You could use WPF primitives to render to a RenderTargetBitmap which has a chosen PixelFormat to use with WritableBitmap as below:
Canvas cvRoot = new Canvas();
// position primitives on canvas
var rtb = new RenderTargetBitmap(width, height, dpix, dpiy, PixelFormats.Bgra32);
var wb = new WriteableBitmap(rtb);
You could use a WPF DrawingVisual to issue GDI style commands then render to a bitmap as demonstrated on the sample on the RenderTargetBitmap page.
You could use GDI via an InteropBitmap created using System.Windows.Interop.Imaging.CreateBitmapSourceFromHBitmap from an HBITMAP retrieved from the Bitmap.GetHbitmap method. Make sure you don't leak the HBITMAP, though.
After a nice long headache, I found this article that explains a way to do it without using bit arithmetic, and allows me to treat it as an array instead:
unsafe
{
    IntPtr pBackBuffer = bitmap.BackBuffer;
    byte* pBuff = (byte*)pBackBuffer.ToPointer();
    pBuff[4 * x + (y * bitmap.BackBufferStride)] = 255;
    pBuff[4 * x + (y * bitmap.BackBufferStride) + 1] = 255;
    pBuff[4 * x + (y * bitmap.BackBufferStride) + 2] = 255;
    pBuff[4 * x + (y * bitmap.BackBufferStride) + 3] = 255;
}
You can access the raw pixel data by calling the Lock() method and using the BackBuffer property afterwards. When you're finished, don't forget to call AddDirtyRect and Unlock.
For a simple example, you can take a look at this: http://cscore.codeplex.com/SourceControl/latest#CSCore.Visualization/WPF/Utils/PixelManipulationBitmap.cs

Finding xy position of a bitmap

I wrote a function to get the file position of a requested pixel (x250 y230, the center of the entire 500x460 picture). The problem is that the function returns a position that is off by 17 pixels up and 12 pixels to the right. What am I missing... the padding? How can I write this function properly?
size_t find (FILE* fp, dword xp, dword yp)
{
    int i;
    int pointer = (sizeof(DIB) + sizeof(BMP) + 2) + (250 * 3);
    for (i = 0; i < 460; i++)
    {
        fseek(fp, pointer + (i * pointer), SEEK_SET);
    }
    return ftell(fp);
}
As I said in my comments, you are indeed missing the padding, but not only that.
A bitmap file is composed of multiple parts: headers, a color map, and a pixel map (mainly).
From what I understand of your question, you need your function to return the offset in the file fp (assumed to be a bitmap file) of the pixel at position xp ; yp. To do that you need at least three things:
The offset of the pixel map's beginning: it is stored as the last 4 bytes (a dword) of the bitmap file header, so you can get it by reading a dword at offset 10 in the file.
The pixel-per-row (or image width) number : you will find it in the BITMAPINFOHEADER
The bit-per-pixel number : you will find it in the BITMAPINFOHEADER
When you have these three values, the address of your pixel in the file is:
rowSizeInBytes = ((bitPerPixel * imageWidth + 31) / 32) * 4;
pixAddress = pixelMapStartAddress + rowSizeInBytes * yp + (xp * bitPerPixel) / 8;
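Putting the three values together, a minimal sketch in Python of the same arithmetic (note that unless the height field in the header is negative, BMP rows are stored bottom-up, which by itself produces a vertical offset if ignored):
import struct

def pixel_offset(bmp, xp, yp):
    data_start, = struct.unpack_from('<I', bmp, 10)     # pixel map offset: dword at byte 10
    width, height = struct.unpack_from('<ii', bmp, 18)  # from the BITMAPINFOHEADER
    bpp, = struct.unpack_from('<H', bmp, 28)            # bits per pixel
    row_bytes = ((bpp * width + 31) // 32) * 4          # rows are padded to 4-byte multiples
    row = yp if height < 0 else abs(height) - 1 - yp    # rows are bottom-up unless height < 0
    return data_start + row * row_bytes + (xp * bpp) // 8

with open('image.bmp', 'rb') as f:  # 'image.bmp' is a placeholder name
    print(pixel_offset(f.read(), 250, 230))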

a better way to draw grid as background

I want to draw a grid as in the picture below.
I know a trick to draw this: draw 6 vertical and 6 horizontal lines instead of 6 x 6 small rectangles.
But with a smaller zoom (zoom for viewing the picture) there are many lines. For example, say my view window is of size 800 x 600 and it is viewing a picture of size 400 x 300 (so the zoom factor is 2). There will be 400 x 300 rectangles of size 2 x 2 (each rectangle represents a pixel).
If I draw each cell in a loop (say 400 x 300 times), it is very slow (when I move the window...).
Using the trick solves the problem.
But I am still curious whether there is a better way to do this task in winapi, GDI(+). For example, a function like DrawGrid(HDC hdc, int x, int y, int numOfCellsH, int numOfCellsV)?
A further question: if I don't resize or move the window and don't change the zoom, the grid doesn't change. So even if I update the picture continuously (screen capture), it is unnecessary to redraw the grid. But I use StretchBlt and BitBlt to capture the screen (to a memory DC, then to the window's HDC), and if I don't redraw the grid in the memory DC, the grid disappears. Is there a way to make the grid stick there while updating the screen-capture bitmap beneath it?
PS: This is not a real issue, since I only want to draw the grid when the zoom is at least 10 (so each cell is of size 10 x 10 or larger). In that case there are at most 100 + 100 = 200 lines to draw, and that is fast. I am just curious whether there is a faster way.
Have you considered using CreateDIBSection? It will give you a pointer to the bits so that you can manipulate the R, G, B values rapidly. For example, the following creates a 256x256x24 bitmap and paints green squares at 64-pixel intervals:
BITMAPINFO BI = {0};
BITMAPINFOHEADER &BIH = BI.bmiHeader;
BIH.biSize = sizeof(BITMAPINFOHEADER);
BIH.biBitCount = 24;
BIH.biWidth = 256;
BIH.biHeight = 256;
BIH.biPlanes = 1;
LPBYTE pBits = NULL;
HBITMAP hBitmap = CreateDIBSection(NULL, &BI, DIB_RGB_COLORS, (void**) &pBits, NULL, 0);
LPBYTE pDst = pBits;
for (int y = 0; y < 256; y++)
{
    for (int x = 0; x < 256; x++)
    {
        BYTE R = 0;
        BYTE G = 0;
        BYTE B = 0;
        if (x % 64 == 0) G = 255;
        if (y % 64 == 0) G = 255;
        *pDst++ = B;
        *pDst++ = G;
        *pDst++ = R;
    }
}
HDC hMemDC = CreateCompatibleDC(NULL);
HGDIOBJ hOld = SelectObject(hMemDC, hBitmap);
BitBlt(hdc, 0, 0, 256, 256, hMemDC, 0, 0, SRCCOPY);
SelectObject(hMemDC, hOld);
DeleteDC(hMemDC);
DeleteObject(hBitmap);
Generally speaking, the major limiting factors for these kinds of graphics operations are the fill rate and the number of function calls.
The fill rate is how fast the machine can change the pixel values. In general, blits (copying rectangular areas) are very fast because they're highly optimized and designed to touch memory in a cache friendly order. But a blit touches all the pixels in that region. If you're going to overdraw or if most of those pixels don't really need to change, then it's likely more efficient to draw just the pixels you need, even if that's not quite as cache-friendly.
If you're drawing n primitives by making n function calls, the call overhead itself might become a limiting factor as n gets large, and it can make sense to look for an API call that lets you draw several (or all) of the lines at once.
Your "trick" demonstrates both of these optimizations. Drawing 20 lines is fewer calls than 100 rectangles, and it touches far fewer pixels. And as the window grows or your grid size decreases, the lines approach will increase linearly both in number of calls and in pixels touched while the rectangle method will grow as n^2.
I don't think you can do any better when it comes to touching the minimum number of pixels. But I suppose the number of function calls might become a factor if you're drawing very many lines. I don't know GDI+, but in plain GDI, there are functions like Polyline and PolyPolyline which will let you draw several lines in one call.

Getting Image size of JPEG from its binary

I have a lot of JPEG files with varying image sizes. For instance, here are the first 64 bytes, as given by hexdump, of an image of size 256*384 (pixels):
0000000: ffd8 ffe0 0010 4a46 4946 0001 0101 0048 ......JFIF.....H
0000010: 0048 0000 ffdb 0043 0003 0202 0302 0203 .H.....C........
0000020: 0303 0304 0303 0405 0805 0504 0405 0a07 ................
0000030: 0706 080c 0a0c 0c0b 0a0b 0b0d 0e12 100d ................
I guess the size information must be within these lines, but I am unable to see which bytes give the sizes. Can anyone help me find the fields that contain the size information?
According to the Syntax and structure section of the JPEG page on wikipedia, the width and height of the image don't seem to be stored in the image itself -- or, at least, not in a way that's quite easy to find.
Still, quoting from the JPEG image compression FAQ, part 1/2:
Subject: [22] How can my program extract image dimensions from a JPEG file?
The header of a JPEG file consists of a series of blocks, called "markers". The image height and width are stored in a marker of type SOFn (Start Of Frame, type N). To find the SOFn you must skip over the preceding markers; you don't have to know what's in the other types of markers, just use their length words to skip over them. The minimum logic needed is perhaps a page of C code. (Some people have recommended just searching for the byte pair representing SOFn, without paying attention to the marker block structure. This is unsafe because a prior marker might contain the SOFn pattern, either by chance or because it contains a JPEG-compressed thumbnail image. If you don't follow the marker structure you will retrieve the thumbnail's size instead of the main image size.) A profusely commented example in C can be found in rdjpgcom.c in the IJG distribution (see part 2, item 15). Perl code can be found in wwwis, from http://www.tardis.ed.ac.uk/~ark/wwwis/.
(Ergh, that link seems broken...)
Here's a portion of C code that could help you, though : Decoding the width and height of a JPEG (JFIF) file
This function will read the JPEG properties:
function jpegProps(data) { // data is an array of bytes
    var off = 0;
    while (off < data.length) {
        while (data[off] == 0xff) off++;
        var mrkr = data[off]; off++;
        if (mrkr == 0xd8) continue; // SOI
        if (mrkr == 0xd9) break; // EOI
        if (0xd0 <= mrkr && mrkr <= 0xd7) continue; // RSTn
        if (mrkr == 0x01) continue; // TEM
        var len = (data[off] << 8) | data[off + 1]; off += 2;
        if (mrkr == 0xc0) return {
            bpc : data[off], // precision (bits per channel)
            h : (data[off + 1] << 8) | data[off + 2],
            w : (data[off + 3] << 8) | data[off + 4],
            cps : data[off + 5] // number of color components
        };
        off += len - 2;
    }
}
 
I have converted the C++ code from the top answer into a Python script.
"""
Source: https://stackoverflow.com/questions/2517854/getting-image-size-of-jpeg-from-its-binary
"""
def get_jpeg_size(data):
    """
    Gets the JPEG size from the array of data passed to the function; file reference: http://www.obrador.com/essentialjpeg/headerinfo.htm
    """
    data_size = len(data)
    i = 0  # keeps track of the position within the file
    # check for a valid SOI header
    if data[i] == 0xFF and data[i+1] == 0xD8 and data[i+2] == 0xFF and data[i+3] == 0xE0:
        i += 4
        # check for a valid JPEG header (null-terminated JFIF)
        if data[i+2] == ord('J') and data[i+3] == ord('F') and data[i+4] == ord('I') and data[i+5] == ord('F') and data[i+6] == 0x00:
            # retrieve the block length of the first block, since the first block will not contain the size of the file
            block_length = data[i] * 256 + data[i+1]
            while i < data_size:
                i += block_length  # increase the file index to get to the next block
                if i >= data_size: return False  # protect against reading past the end
                if data[i] != 0xFF: return False  # check that we are truly at the start of another block
                if data[i+1] == 0xC0:  # 0xFFC0 is the "Start of frame" marker which contains the size
                    # the structure of the 0xFFC0 block is quite simple:
                    # [0xFFC0][ushort length][uchar precision][ushort height][ushort width]
                    height = data[i+5] * 256 + data[i+6]
                    width = data[i+7] * 256 + data[i+8]
                    return height, width
                else:
                    i += 2  # skip the block marker
                    block_length = data[i] * 256 + data[i+1]  # go to the next block
            return False  # if this point is reached then no size was found
        else:
            return False  # not a valid JFIF string
    else:
        return False  # not a valid SOI header

with open('path/to/file.jpg', 'rb') as handle:
    data = handle.read()
h, w = get_jpeg_size(data)
print(h, w)
This is how I implemented it in JS. The marker you are looking for is the SOFn marker, and the pseudocode is basically:
start from the first byte
the beginning of a segment will always be FF followed by another byte indicating the marker type (those 2 bytes are called the marker)
if that other byte is 01 or D0 through D9, there is no data in that segment, so proceed to the next segment
if that marker is C0 or C2 (or any other Cn; more detail in the comments of the code), that's the SOFn marker you're looking for
the bytes following the marker will be L (2 bytes), P (1 byte), Height (2 bytes), Width (2 bytes) respectively
otherwise, the two bytes after the marker are the length property (length of the entire segment excluding the marker, 2 bytes); use it to skip to the next segment
repeat until you find the SOFn marker
function getJpgSize(hexArr) {
    let i = 0;
    let marker = '';
    while (i < hexArr.length) {
        // ff always starts a marker;
        // something's really wrong if the first byte isn't ff
        if (hexArr[i] !== 'ff') {
            console.log(i);
            throw new Error('aaaaaaa');
        }
        // get the second byte of the marker, which indicates the marker type
        marker = hexArr[++i];
        // these are segments that don't have any data stored in them, thus only 2 bytes
        // 01 and D0 through D9
        if (marker === '01' || (!isNaN(parseInt(marker[1])) && marker[0] === 'd')) {
            i++;
            continue;
        }
        /*
        sofn marker: https://www.w3.org/Graphics/JPEG/itu-t81.pdf pg 36
        INFORMATION TECHNOLOGY –
        DIGITAL COMPRESSION AND CODING
        OF CONTINUOUS-TONE STILL IMAGES –
        REQUIREMENTS AND GUIDELINES
        basically, the sofn (start of frame, type n) segment contains information
        about the characteristics of the jpg
        the marker is followed by:
        - Lf [frame header length], two bytes
        - P [sample precision], one byte
        - Y [number of lines in the src img], two bytes, which is essentially the height
        - X [number of samples per line], two bytes, which is essentially the width
        ... [other parameters]
        sofn marker codes: https://www.digicamsoft.com/itu/itu-t81-36.html
        apparently there are other sofn markers but these two are the most common ones
        */
        if (marker === 'c0' || marker === 'c2') {
            break;
        }
        // 2 bytes specifying length of the segment (length excludes the marker)
        // jump to the next segment
        i += parseInt(hexArr.slice(i + 1, i + 3).join(''), 16) + 1;
    }
    const size = {
        height: parseInt(hexArr.slice(i + 4, i + 6).join(''), 16),
        width: parseInt(hexArr.slice(i + 6, i + 8).join(''), 16),
    };
    return size;
}
If you are on a Linux system and have PHP at hand, variations on this PHP script may produce what you are looking for:
#! /usr/bin/php -q
<?php
if (file_exists($argv[1])) {
    $targetfile = $argv[1];
    // get info on the file residing in the /var/tmp directory:
    $safefile = escapeshellcmd($targetfile);
    $getinfo = `/usr/bin/identify $safefile`;
    $imginfo = preg_split("/\s+/", $getinfo);
    $ftype = strtolower($imginfo[1]);
    $fsize = $imginfo[2];
    switch ($fsize) {
        case 0:
            print "FAILED\n";
            break;
        default:
            print $safefile.'|'.$ftype.'|'.$fsize."|\n";
    }
}
// eof
host> imageinfo 009140_DJI_0007.JPG
009140_DJI_0007.JPG|jpeg|4000x3000|
(Outputs filename, file type, file dimensions in pipe-delimited format)
From the man page:
For more information about the 'identify' command, point your browser to [...] http://www.imagemagick.org/script/identify.php.
Dart/Flutter port of a solution from this thread:
import 'dart:typed_data';

class JpegProps {
  final int precision;
  final int height;
  final int width;
  final int compression;

  JpegProps._(this.precision, this.height, this.width, this.compression);

  @override
  String toString() => 'JpegProps($precision,$height,$width,$compression)';

  static JpegProps readImage(Uint8List imageData) {
    // imageData is an array of bytes
    int offset = 0;
    while (offset < imageData.length) {
      while (imageData[offset] == 0xff) offset++;
      var mrkr = imageData[offset];
      offset++;
      if (mrkr == 0xd8) continue; // SOI
      if (mrkr == 0xd9) break; // EOI
      if (0xd0 <= mrkr && mrkr <= 0xd7) continue; // RSTn
      if (mrkr == 0x01) continue; // TEM
      var length = (imageData[offset] << 8) | imageData[offset + 1];
      offset += 2;
      if (mrkr == 0xc0) {
        return JpegProps._(
          imageData[offset],
          (imageData[offset + 1] << 8) | imageData[offset + 2],
          (imageData[offset + 3] << 8) | imageData[offset + 4],
          imageData[offset + 5],
        );
      }
      offset += length - 2;
    }
    throw StateError('No SOF0 marker found');
  }
}
An easy way to get the width and height from a .jpg picture: remove the EXIF and IPTC information from the file. Use the "Save as" function in a picture viewer (I used IrfanView or Paint Shop Pro) and get rid of the EXIF data when saving. Without EXIF, the JPG file always has the height at bytes 000000a3 and 000000a4, and the width at bytes 000000a5 and 000000a6.
I use PHP:
function storrelse_jpg($billedfil) // "billedfil" is Danish for picture file
{
    // address of a jpg file without EXIF info !!!!!
    // width is in bytes 165 to 166, height is in bytes 163 and 164
    // jpg dimensions take 2 bytes (in png the dimensions take 4 bytes)
    $tekst = file_get_contents($billedfil, 0, NULL, 165, 2);  // read 2 bytes from 165 - width
    $tekst1 = file_get_contents($billedfil, 0, NULL, 163, 2); // read 2 bytes from 163 - height
    $n = strlen($tekst); // the length of the string
    echo "Size of picture: ".$billedfil."<br>"; // headline
    $bredde = 0; // width
    $langde = 0; // height
    for ($i = 0; $i < $n; $i++)
    {
        $by = bin2hex($tekst[$i]); // width byte from binary to hex
        $bz = hexdec($by);         // then from hex to decimal
        $ly = bin2hex($tekst1[$i]); // the same for the height byte
        $lz = hexdec($ly);
        $bredde = $bredde + $bz * 256 ** (1 - $i);
        $langde = $langde + $lz * 256 ** (1 - $i);
    }
    // $x is an array: $x[0] is the width and $x[1] is the height
    $x[0] = $bredde;
    $x[1] = $langde;
    return $x;
}
A Python solution based on the "raw" C++ conversion: https://stackoverflow.com/a/62245035/11807679
import struct
from typing import Optional, Tuple

def get_jpeg_resolution(image_bytes: bytes,
                        size: Optional[int] = None) -> Optional[Tuple[int, int]]:
    """
    Function for getting the resolution from a JPEG binary
    :param image_bytes: image binary
    :param size: len(image_bytes); if None it is calculated inside
    :return: (width, height) or None if not found
    """
    size = len(image_bytes) if size is None else size
    header_bytes = (0xff, 0xD8, 0xff, 0xe0)
    if not (size > 11
            and header_bytes == struct.unpack_from('>4B', image_bytes)):
        # incorrect header or below the minimal length
        return None
    jfif_bytes = tuple(ord(s) for s in 'JFIF') + (0x0, )
    if not (jfif_bytes == struct.unpack_from('5B', image_bytes, 6)):
        # not a valid JFIF string
        return None
    index = len(header_bytes)
    block_length, = struct.unpack_from(">H", image_bytes, index)
    index += block_length
    while index < size:
        if image_bytes[index] != 0xFF:
            # check that we are truly at the start of another block
            break
        if image_bytes[index + 1] == 0xC0:
            # 0xFFC0 is the "Start of frame" marker which contains the size.
            # The structure of the 0xFFC0 block is quite simple:
            # [0xFFC0][ushort length][uchar precision][ushort y][ushort x]
            height, width = struct.unpack_from(">HH", image_bytes, index + 5)
            return width, height
        else:
            # skip the block marker and go to the next block
            index += 2
            block_length, = struct.unpack(">H",
                                          image_bytes[index:index + 2])
            # increase the file index to get to the next block
            index += block_length
    # if this point is reached then no size was found
    return None
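Usage is the same as for the earlier port, e.g.:
with open('path/to/file.jpg', 'rb') as handle:
    resolution = get_jpeg_resolution(handle.read())
if resolution:
    width, height = resolution
    print(width, height)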
