Can this library detect if JPG is in RGB or CMYK format? - metadata-extractor

Thanks for the metadata-extractor library, it's really helpful. It gives me all the information I need except whether a JPG is in RGB or CMYK format. Is that information there and I'm just not seeing it, or does the library not return this attribute?
Thanks

From this document on the Java ImageIO package:
https://docs.oracle.com/javase/7/docs/api/javax/imageio/metadata/doc-files/jpeg_metadata.html
When reading, the contents of the stream are interpreted by the usual JPEG conventions, as follows:
If a JFIF APP0 marker segment is present, the colorspace is known to be either grayscale or YCbCr. If an APP2 marker segment containing an embedded ICC profile is also present, then the YCbCr is converted to RGB according to the formulas given in the JFIF spec, and the ICC profile is assumed to refer to the resulting RGB space.
If an Adobe APP14 marker segment is present, the colorspace is determined by consulting the transform flag. The transform flag takes one of three values:
2 - The image is encoded as YCCK (implicitly converted from CMYK on encoding).
1 - The image is encoded as YCbCr (implicitly converted from RGB on encoding).
0 - Unknown. 3-channel images are assumed to be RGB, 4-channel images are assumed to be CMYK.
If neither marker segment is present, the following procedure is followed: Single-channel images are assumed to be grayscale, and 2-channel images are assumed to be grayscale with an alpha channel. For 3- and 4-channel images, the component ids are consulted. If these values are 1-3 for a 3-channel image, then the image is assumed to be YCbCr. Subject to the availability of the optional color space support described above, if these values are 1-4 for a 4-channel image, then the image is assumed to be YCbCrA. If these values are > 4, they are checked against the ASCII codes for 'R', 'G', 'B', 'A', 'C', 'c'. These can encode the following colorspaces:
RGB
RGBA
YCC (as 'Y','C','c'), assumed to be PhotoYCC
YCCA (as 'Y','C','c','A'), assumed to be PhotoYCCA
Otherwise, 3-channel subsampled images are assumed to be YCbCr, 3-channel non-subsampled images are assumed to be RGB, 4-channel subsampled images are assumed to be YCCK, and 4-channel, non-subsampled images are assumed to be CMYK.
All other images are declared uninterpretable.
Metadata Extractor doesn't perform these conversions itself; however, the approach above gives a tested example of the steps you can take to determine the colour format.
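To make those rules concrete, here is a minimal sketch in C++ that parses the markers directly (metadata-extractor itself is a Java library, so this does not use it): it reads the Adobe APP14 colour transform flag and the SOF component count and applies the decision table above. The scanning logic is deliberately simplified; treat it as an illustration, not a production parser.

#include <cstdio>
#include <cstring>

// Sketch: scan JPEG marker segments for the Adobe APP14 colour transform
// flag and the SOF component count, then apply the rules quoted above.
// Simplified on purpose: a real parser must also handle markers without a
// length field, truncated files and other corner cases.
int main(int argc, char **argv) {
    if (argc < 2) return 1;
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    int components = 0, transform = -1;
    if (fgetc(f) != 0xFF || fgetc(f) != 0xD8) { // SOI marker
        fprintf(stderr, "not a JPEG\n");
        return 1;
    }
    for (;;) {
        int c = fgetc(f);
        if (c == EOF) break;
        if (c != 0xFF) continue;                       // resync on 0xFF
        int marker = fgetc(f);
        if (marker == EOF || marker == 0xFF) continue; // fill byte
        if (marker == 0xD9 || marker == 0xDA) break;   // EOI / start of scan
        int hi = fgetc(f), lo = fgetc(f);              // big-endian length
        long next = ftell(f) + ((hi << 8) | lo) - 2;   // end of this segment
        if (marker == 0xEE) {                          // APP14
            unsigned char buf[12];
            if (fread(buf, 1, 12, f) == 12 && memcmp(buf, "Adobe", 5) == 0)
                transform = buf[11];                   // 0, 1 or 2
        } else if ((marker & 0xF0) == 0xC0 && marker != 0xC4 &&
                   marker != 0xC8 && marker != 0xCC) { // SOF0..SOF15
            fseek(f, 5, SEEK_CUR);  // skip precision, height, width
            components = fgetc(f);
        }
        fseek(f, next, SEEK_SET);
    }
    fclose(f);

    if (transform == 2)       printf("YCCK (converted from CMYK)\n");
    else if (transform == 1)  printf("YCbCr (converted from RGB)\n");
    else if (components == 4) printf("assume CMYK\n");
    else if (components == 3) printf("assume YCbCr or RGB\n");
    else                      printf("grayscale or uninterpretable\n");
    return 0;
}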

Related

What should a labelled image for semantic segmentation look like?

As I understand from the explanation below, there are two types of images in semantic segmentation: inputs and masks. Mask images contain a 'label' encoded in the pixel value, which could be an integer (0 for ROAD, 1 for TREE) or an RGB triple ((100,100,100) for ROAD, (0,255,0) for TREE).
Semantic segmentation describes the process of associating each pixel of an image with a class label, (such as flower, person, road, sky, ocean, or car).
https://se.mathworks.com/help/vision/ug/semantic-segmentation-basics.html
According to my research, there are many kinds of labelled images for semantic segmentation. Along with the different extensions (.png, .jpg, .gif, .bmp, ...), some of them are RGB-labelled (3-channel) images and some are grayscale (1-channel) images. Below are two examples that illustrate this situation.
RGB labelled with the extension '.png'
https://github.com/divamgupta/image-segmentation-keras#user-content-preparing-the-data-for-training
Grayscale labelled with the extension '.gif'
https://www.kaggle.com/kmader/vgg16-u-net-on-carvana/#data
If my image is labelled in grayscale, I can make it RGB by copying the gray channel's value into each of the three RGB channels. Conversely, by averaging the RGB channels, I can turn an RGB-labelled image into grayscale. What is the difference? Which one is more suitable for which task (binary segmentation or something else)?
In my case, I have 4 classes and am trying to do multiclass semantic segmentation. I've already labelled about 600 images on DataTurks. That means I just have the objects' polygons, and I have to build my labelled images on my own. For now, the extensions of my input images and mask images are '.jpg' and '.png' respectively. How should I label my images, and which extension should I use?
You can save the masks as grayscale PNG images, where the value at each location is one of 0, 1, 2, 3 (since you have 4 classes) and corresponds to the class (tree, bush, etc.) of the pixel at the same location in the input image.
You can verify that a mask image was generated correctly like this:
import cv2
import numpy as np

lbl_img = '<path_to_mask_image>'
mask = cv2.imread(lbl_img, 0)  # flag 0 loads the image as single-channel grayscale
print(np.unique(mask))
# [0 1 2 3]  -> this will vary based on the classes present in the mask image
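Since you only have the polygons from DataTurks, you still have to rasterise them into such a mask yourself. Below is a minimal sketch using OpenCV's C++ API; the image size, polygon coordinates and class index are hypothetical placeholders for whatever your export contains.

#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    // Hypothetical image size; use the size of the matching input image.
    const int width = 640, height = 480;
    cv::Mat mask = cv::Mat::zeros(height, width, CV_8UC1); // 0 = background

    // Hypothetical polygon for one object of class 1 (e.g. TREE).
    std::vector<cv::Point> tree = { {100, 50}, {180, 60}, {150, 140} };
    std::vector<std::vector<cv::Point>> polys = { tree };
    cv::fillPoly(mask, polys, cv::Scalar(1)); // paint the class index

    // PNG is lossless, so the class indices survive saving and loading.
    cv::imwrite("mask.png", mask);
    return 0;
}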

Images and Filters in OpenCL

Let's say I have an image called Test.jpg.
I just figured out how to bring an image into the project with the following line:
FILE *infile = fopen("Stonehenge.jpg", "rb");
Now that I have the file, do I need to convert this file into a bmp image in order to apply a filter to it?
I have never worked with images before, let alone OpenCL, so there is a lot that is going over my head.
I need further clarification on the following part for my own understanding:
Does this bmp image also need to be stored in an array in order to have a filter applied to it? I have seen a sliding-window technique used a couple of times in other examples. Is the bmp image pretty much split up into RGB values (0-255)? If someone can provide a link on this topic, it would help me understand a lot better.
I know this may seem like a basic question to most but I do not have a mentor on this subject in my workplace.
Now that I have the file, do I need to convert this file into a bmp image in order to apply a filter to it?
Not exactly. bmp is a very specific image serialization format, and quite a complicated one at that (implementing a BMP file parser that deals with all the corner cases correctly is actually rather difficult).
However, what you have there so far is not even file content data; it is just a C stdio FILE handle. You have not even checked whether the file could be opened, so it is not much use yet.
JPEG is a lossy compressed image format. What you need to be able to "work" with it is a pixel value array: either an array of component tuples, or one array per component (depending on your application, either layout may perform better).
Now, implementing image format decoders is tedious. It's not exactly difficult, but it is also not something you can write down in a single evening. The devil is in the details, and writing an implementation that is high quality, covers all the corner cases and is fast is a major effort. That's why for every image (and video and audio) format out there you can usually find only a small number of encoder and decoder implementations. The de-facto standard codec libraries for JPEG are libjpeg and libjpeg-turbo. If your aim is to read just JPEG files, these libraries would be the go-to implementations. However, you may also want to support PNG files, and then maybe EXR and so on, and then things become tedious again. So there are meta-libraries which wrap all those format-specific libraries and offer them through a universal API.
In the OpenGL wiki there's a dedicated page on the current state of image loader libraries: https://www.opengl.org/wiki/Image_Libraries
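For reference, decoding a JPEG into such a pixel value array with libjpeg looks roughly like the sketch below; the real thing must also install a proper error handler, since the default one terminates the process on error.

#include <cstdio>
#include <cstdlib>
#include <jpeglib.h>

// Sketch: decode a JPEG file into a tightly packed 8-bit pixel array.
// Error handling is omitted; production code needs a custom error_exit.
unsigned char *load_jpeg(const char *path, int *w, int *h, int *channels) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    jpeg_decompress_struct cinfo;
    jpeg_error_mgr jerr;
    cinfo.err = jpeg_std_error(&jerr);

    jpeg_create_decompress(&cinfo);
    jpeg_stdio_src(&cinfo, f);
    jpeg_read_header(&cinfo, TRUE);
    jpeg_start_decompress(&cinfo);

    *w = cinfo.output_width;
    *h = cinfo.output_height;
    *channels = cinfo.output_components;   // 1 = gray, 3 = RGB, 4 = CMYK

    size_t stride = (size_t)(*w) * (*channels);
    unsigned char *pixels = (unsigned char *)malloc(stride * (*h));
    while (pixels && cinfo.output_scanline < cinfo.output_height) {
        JSAMPROW row = pixels + cinfo.output_scanline * stride;
        jpeg_read_scanlines(&cinfo, &row, 1);
    }

    jpeg_finish_decompress(&cinfo);
    jpeg_destroy_decompress(&cinfo);
    fclose(f);
    return pixels;                         // caller frees
}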
Does this bmp image also need to be stored in an array in order to have a filter applied to it?
That actually depends on the kind of filter you want to apply. A simple threshold filter for example does not take a pixel's surroundings into account. If you were to perform scanline signal processing (e.g. when processing old analogue television signals) you may require only a single row of pixels at a time.
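For illustration, a threshold filter is a pure point operation over the pixel buffer; the sketch below assumes 8-bit grayscale data.

#include <cstdint>
#include <cstddef>

// Point operation: each output pixel depends only on the corresponding
// input pixel, so the image can be processed row by row, in tiles, or in
// any order; no surrounding pixels are needed.
void threshold(const uint8_t *src, uint8_t *dst, size_t count, uint8_t t) {
    for (size_t i = 0; i < count; ++i)
        dst[i] = (src[i] >= t) ? 255 : 0;
}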
The universal solution, of course, is to keep the whole image in memory, but some pictures are so huge that no average computer's RAM can hold them. There are image processing libraries like VIPS that implement processing graphs which operate on small subregions of an image at a time and can be executed independently.
Is the bmp image pretty much split up into RGB values (0-255)? If someone can provide a link on this item that should help me understand this a lot better.
In case you mean "pixel array" instead of BMP (remember, BMP is a specific data structure), then no. Pixel component values may be of any scalar type and value range. There are in fact colour spaces containing value regions that are mathematically necessary but do not denote actually sensible colours.
When it comes down to pixel data, an image is just an n-dimensional array of scalar component tuples, where each component's value lies in a given range. It doesn't get more specific than that. Only when you introduce colour spaces (RGB, CMYK, YUV, CIE-Lab, CIE-XYZ, etc.) do those values acquire a specific colour meaning. The choice of data type is more or less arbitrary: you can use 8 bits per component (0..255), 10 bits (0..1023) or floating point (0.0..1.0); the choice is yours.
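To make the data-type point concrete, here is a tiny sketch mapping 8-bit components (0..255) onto normalised floats (0.0..1.0); the colours stay the same, only the representation changes.

#include <cstdint>
#include <cstddef>

// The value range is a representation choice: 8-bit 0..255 and float
// 0.0..1.0 can encode the same colours, just at different precision.
void u8_to_float(const uint8_t *src, float *dst, size_t count) {
    for (size_t i = 0; i < count; ++i)
        dst[i] = src[i] / 255.0f;
}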

CGWindowListCreateImage creates image with 4 bytes per pixel. How to change that to 3 bytes per pixel?

The screen capture you get from calling CGWindowListCreateImage is an RGBA image.
How do I force it to return an image in RGB format (3 bytes per pixel) ?
Thank you !!!
CGImage does not support formats without an alpha channel. Even kCGImageAlphaNone just means "ignore the alpha bits."
If you want a 24-bit image format, you will need to convert it yourself. vImage has routines to do this efficiently. For example, you can convert ARGB to RGB with vImageFlatten_ARGB8888ToRGB888.
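A sketch of such a conversion is below. It uses vImageConvert_ARGB8888toRGB888, which simply discards the alpha bytes (vImageFlatten_ARGB8888ToRGB888 composites them against a background colour instead), and it assumes the capture really is in A,R,G,B byte order; check CGImageGetBitmapInfo() and permute the channels first if it is not.

#include <Accelerate/Accelerate.h>
#include <cstddef>

// Sketch: drop the alpha channel of a 4-byte-per-pixel capture with vImage.
// Assumes A,R,G,B byte order; use vImagePermuteChannels_ARGB8888 to reorder
// the channels first if the bitmap info says otherwise.
bool drop_alpha(void *argb, void *rgb, size_t width, size_t height,
                size_t srcRowBytes, size_t dstRowBytes) {
    vImage_Buffer src = { argb, height, width, srcRowBytes };
    vImage_Buffer dst = { rgb,  height, width, dstRowBytes };
    return vImageConvert_ARGB8888toRGB888(&src, &dst, kvImageNoFlags)
        == kvImageNoError;
}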

Intel's IPP to create images from arrays

I am working in C++ and I have a vector container of float values. I want to write an image file to disk where the pixel values of the image are the values from the array. For instance, if I have 40,000 values in my array, I want a 200x200 image file to be created in some format (the format is not very important; however, I would prefer something with lossless coding if possible). I would like to do this using Intel's libraries, IPP. Can somebody tell me which function would be most appropriate for my problem? (At present I'm sticking only to grayscale images.)
One way would be to just write it out as space-delimited numbers in a .raw file and load it with ImageJ; the import dialog will give you the option to specify width, height and bit depth.
A second way, one I have done in the past (if you use Matlab too), is to use Matlab engine commands to figure(data), and then getframe/get(gcf) etc. to imwrite it to your favourite image format (Matlab supports tons of them).
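A third option that needs no external tools: quantise the floats to 8 bits and write a PGM file by hand. PGM is a lossless grayscale format with a trivial header, so it only takes a few lines. The sketch below assumes the values are already normalised to 0..1; for the case above you would call write_pgm("out.pgm", values, 200, 200).

#include <cstdio>
#include <vector>

// Sketch: write a width x height float vector as a binary 8-bit PGM file.
// Assumes values normalised to [0, 1]; rescale before calling if not.
// Lossless apart from the float-to-8-bit quantisation.
bool write_pgm(const char *path, const std::vector<float> &v,
               int width, int height) {
    if ((int)v.size() != width * height) return false;
    FILE *f = fopen(path, "wb");
    if (!f) return false;
    fprintf(f, "P5\n%d %d\n255\n", width, height); // PGM header
    for (float x : v) {
        if (x < 0.0f) x = 0.0f;        // clamp out-of-range values
        if (x > 1.0f) x = 1.0f;
        fputc((unsigned char)(x * 255.0f + 0.5f), f);
    }
    fclose(f);
    return true;
}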

saving H.264 encoded images with libavcodec

I am getting H.264 images from an IP camera and want to save the encoded frames (without decoding them). I am using output-example.c from ffmpeg (libavformat/output-example.c) for this purpose. For saving a raw H.264 frame, I do the following:
AVPacket pkt;
av_init_packet(&pkt);

// rescale the timestamp from the codec's time base to the stream's
if (c->coded_frame->pts != AV_NOPTS_VALUE)
    pkt.pts = av_rescale_q(c->coded_frame->pts, c->time_base, st->time_base);
if (c->coded_frame->key_frame)
    pkt.flags |= PKT_FLAG_KEY;

pkt.stream_index = st->index;
pkt.data = (uint8_t *)ulAddr;  // video_outbuf;
pkt.size = out_size;
save_image(pkt.data, out_size);
Here, ulAddr is the address pointer to the image data and out_size is its size. Instead of saving the images to a video file, I want to save the individual frames. The save_image function simply uses basic fopen and fwrite calls to save the data. If I decode a frame and then save it, everything works fine, but I have problems saving the encoded frames: they are saved with a very small size and then cannot be decoded.
Is there anything wrong? I will really appreciate any help in this regard.
H.264 is not a "picture" encoding format; it is a "movie" encoding format. The encoder does not encode each picture individually: it looks at a group of pictures together and spreads the encoding for any given picture among the pictures in the group.
If you look at a single encoded picture, in most cases you'll find that it has references to other pictures that come before or after it in the encoded stream. A decoder may need to be given several other pictures before it is able to decode your target picture.
You may need to select a different encoding format that allows you to do what you want; with H.264 alone you can't. The formats that encode each picture independently of the rest are called "intra-coded".
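If you are stuck with H.264, one partial workaround is sketched below against the old ffmpeg API the question uses; save_stream is a hypothetical helper name, and it assumes the stream is in Annex-B byte-stream format (as raw camera streams usually are, rather than MP4-style "avcC" data). Write the codec extradata (the SPS/PPS parameter sets) once, then begin saving at a keyframe: the dump is then decodable from that point on, although a single inter-coded frame on its own still cannot be.

#include <cstdio>
extern "C" {
#include <libavcodec/avcodec.h>
}

// Sketch: dump a decodable raw H.264 stream. Write the SPS/PPS headers
// (extradata) once, then start at the first keyframe; single P/B frames
// remain undecodable on their own, which is inherent to H.264.
void save_stream(FILE *f, AVCodecContext *c, AVPacket *pkt) {
    static int started = 0;
    if (!started) {
        if (!(pkt->flags & PKT_FLAG_KEY))
            return;                       // wait for the first keyframe
        if (c->extradata && c->extradata_size > 0)
            fwrite(c->extradata, 1, c->extradata_size, f);
        started = 1;
    }
    fwrite(pkt->data, 1, pkt->size, f);
}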
