Can we extract thumbnails embedded in TIFF files with metadata-extractor? - metadata-extractor

I'm able to use ExifTool to extract an embedded JPEG thumbnail from within a TIFF. Is there a way to do this with Drew Noakes' metadata-extractor?
The source file is here:
https://corpora.tika.apache.org/base/docs/commoncrawl3/RD/RDAFESH5CBBJWWQZMZR4MGJIPYYEL7DN
The extracted thumbnail/preview image is .
I see the 5225 byte count in a metadata item (0x0202) with metadata-extractor, but I'm not able to get the bytes.
Thank you!
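One possible approach, sketched here in Python rather than with metadata-extractor itself: tag 0x0201 (JPEGInterchangeFormat) normally holds the byte offset of the embedded JPEG and tag 0x0202 (JPEGInterchangeFormatLength) holds its length, so once both values are known the bytes can be sliced out of the file. In a TIFF file, offsets count from the start of the file. The function name and the synthetic data below are illustrative only, and whether this particular file stores its preview this way is an assumption.

```python
JPEG_SOI = b"\xff\xd8"  # every JPEG stream starts with this marker


def slice_embedded_jpeg(file_bytes: bytes, offset: int, length: int) -> bytes:
    """Return `length` bytes starting at `offset`, checked to look like a JPEG."""
    if offset < 0 or length <= 0 or offset + length > len(file_bytes):
        raise ValueError("offset/length fall outside the file")
    chunk = file_bytes[offset:offset + length]
    if not chunk.startswith(JPEG_SOI):
        raise ValueError("data at offset does not start with a JPEG SOI marker")
    return chunk


if __name__ == "__main__":
    # Synthetic stand-in for a TIFF: a 16-byte header area, then a tiny fake JPEG.
    fake_jpeg = JPEG_SOI + b"thumbnail-data" + b"\xff\xd9"
    fake_tiff = b"II*\x00" + b"\x00" * 12 + fake_jpeg
    thumb = slice_embedded_jpeg(fake_tiff, offset=16, length=len(fake_jpeg))
    print(len(thumb))
```

With a real file, `offset` would come from the 0x0201 value and `length` from the 0x0202 value (5225 in the question).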

Related

How to download images and bounding boxes from ImageNet such that they have matching names?

I am doing object detection for a specific class, say, chairs.
I want to download images of chairs from ImageNet. I also want to download the annotation XML files (bounding boxes) from ImageNet.
Both of these are provided by ImageNet, and I have successfully downloaded them using a tool called ImageNet_Utils:
https://github.com/tzutalin/ImageNet_Utils
But the downloaded images and bounding boxes don't have matching names, so it is impossible to tell which XML file belongs to which image.
How do I download images and bounding boxes from ImageNet such that corresponding image and annotation XML files have matching names?
The download image URLs page says
The URLs are listed in a single txt file, where each line contains an
image ID and the original URL
Unfortunately, as of 2020-03-06, all the URL mapping files link to an "Oops! The URL is not valid" page. However, you can get the mappings for each node individually. They are available by wnid: http://www.image-net.org/api/text/imagenet.synset.geturls.getmapping?wnid=n03273913
A bounding box annotation file will contain this element.
<filename>n03273913_16800</filename>
The n03273913 is the synset id and the 16800 is the image id. In the synset mapping file you'll find the line
n03273913_16800 http://farm1.static.flickr.com/186/425238103_8fe80b37de.jpg
You can download the image from that location.
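The matching described above can be sketched in Python as follows. The function names are hypothetical, and the `<annotation>` wrapper around `<filename>` in the sample is an assumption about the annotation layout; only the `<filename>` element and the mapping line come from the answer.

```python
import xml.etree.ElementTree as ET


def parse_mapping(text: str) -> dict:
    """Map image names such as 'n03273913_16800' to their original URLs."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # image name, then the rest of the line
        if len(parts) == 2:
            mapping[parts[0]] = parts[1].strip()
    return mapping


def image_name_from_annotation(xml_text: str) -> str:
    """Read the first <filename> element out of one annotation document."""
    root = ET.fromstring(xml_text)
    node = next(root.iter("filename"), None)
    if node is None or not node.text:
        raise ValueError("annotation has no <filename> element")
    return node.text.strip()


if __name__ == "__main__":
    mapping = parse_mapping(
        "n03273913_16800 http://farm1.static.flickr.com/186/425238103_8fe80b37de.jpg\n"
    )
    # Assumed wrapper element; real annotation files contain more fields.
    name = image_name_from_annotation(
        "<annotation><filename>n03273913_16800</filename></annotation>"
    )
    print(name, mapping[name])
```

Saving each downloaded image under the name taken from `<filename>` (for example `n03273913_16800.jpg`) gives image and annotation files that share a base name.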
There's a C++ library called dlib. You can load your downloaded images into it; it has GUI support for drawing boxes on images and saving them in vector form in an XML file. You can refer here for the documentation.

Is Base64 just for JPEG or GIF? How about a TIFF? How can I convert a string (which is part of XML and belongs to a TIFF file) to byte[]?

I wrote this in my application to display a TIFF image:
byte[] b = Convert.FromBase64String("ADFsf/s1ugdGHREHR/+/235gjhjhfcg/+kdhjgvkhfv/gcngcxsfdzsdf......=");
But it doesn't work. I received this message when loading the TIFF:
run-time error '31037'
system error &H800401C2 (-2147221054)
I don't know exactly why.
When I save this TIFF image this way:
File.WriteAllBytes("z.tiff", b);
I can open it, which means it was saved correctly.
Now my problem is that I can't display it in my application; loading the image fails.
Thanks.
Base-64 is certainly not only for JPEG or GIF. It can be used to represent any string of binary data (including plain text.) The base-64-encoded data you gave is malformed (as can be seen by writing the bytestream to a .tiff file), though.
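As a rough illustration of that point, here is a minimal Python sketch (not the C# from the question) that decodes a Base64 string strictly and then checks whether the result begins with one of the two standard TIFF byte-order headers. The function name is hypothetical, and the header check only tests the first four bytes, not the whole file.

```python
import base64
import binascii

# Little-endian ("II") and big-endian ("MM") TIFF headers, each followed by 42.
TIFF_MAGICS = (b"II*\x00", b"MM\x00*")


def decode_and_check_tiff(encoded: str) -> bytes:
    """Decode Base64 text and verify the bytes start like a TIFF file."""
    try:
        raw = base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not valid Base64: {exc}") from exc
    if not raw.startswith(TIFF_MAGICS):
        raise ValueError("decoded bytes do not start with a TIFF header")
    return raw


if __name__ == "__main__":
    # Synthetic sample: a bare TIFF header, encoded and decoded again.
    sample = base64.b64encode(b"II*\x00" + b"\x08\x00\x00\x00").decode("ascii")
    print(len(decode_and_check_tiff(sample)))
```

The same decoding works for any binary payload; only the header check is specific to TIFF.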
My issue is solved!
byte[] b = Convert.FromBase64String("ADFsf/s1ugdGHREHR/+/235gjhjhfcg/+kdhjgvkhfv/gcngcxsfdzsdf......=");
Convert.FromBase64String can be used for any string of binary data.
My problem was with loading a TIFF file.
First I had to identify the pages in the TIFF image.
I split my TIFF image into single pages using the frame dimension method.

File Attributes/Description

I have to put some attributes on a file, like you see on a JPEG file, where you can add many attributes about the image and resolution and also put in information about the camera.
I also saw it on an MP3 file, where you can add information about the song, album, producer, etc.
Is there any way to add these attributes to something like a PDF or TXT file?
Thanks for your time.
I have to put some attributes on a file, like you see on a JPEG file, where you can add many attributes about the image and resolution and also put in information about the camera.
That is part of the JPEG/Exif file specification.
I also saw it on an MP3 file, where you can add information about the song, album, producer, etc.
That is part of the MP3 file specification.
Is there any way to add these attributes to something like a PDF or TXT file?
Metadata is part of the PDF file specification. There is nothing like that for plain text files.

Bing Static Map Image type

How can I control the output format of a Bing Maps static image? Previously it was PNG, but now it is JPEG. How can I switch back to PNG?
Example: Display the image with the link below:
http://dev.virtualearth.net/REST/V1/Imagery/Map/Road/space%20needle,seattle?mapLayer=TrafficFlow&mapVersion=v1&key=BingKey
The image is a JPEG. How can I get a PNG instead? Thanks.
I know it's been a while since you posted this question, but as there is no answer yet and I got here via Google, I'll supply an answer anyway.
You can use the format / fmt parameter.
From: http://msdn.microsoft.com/en-us/library/ff701724.aspx
One of the following image format values:
gif: Use GIF image format.
jpeg: Use JPEG image format. JPEG format is the default for Road, Aerial and AerialWithLabels imagery.
png: Use PNG image format. PNG is the default format for CollinsBart and OrdnanceSurvey imagery.
Examples:
format=jpeg
fmt=gif
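Applied to the URL from the question, this could look like the Python sketch below. It only builds the request URL; the function name is hypothetical, `BingKey` is the placeholder key from the question, and whether the service honours the parameter for a given imagery set is not verified here.

```python
from urllib.parse import quote, urlencode

BASE = "http://dev.virtualearth.net/REST/V1/Imagery/Map/Road/"
ALLOWED_FORMATS = ("gif", "jpeg", "png")  # values listed in the quoted documentation


def static_map_url(query: str, key: str, image_format: str = "png") -> str:
    """Build a static map URL that asks for a specific image format."""
    if image_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {image_format}")
    params = {
        "mapLayer": "TrafficFlow",
        "mapVersion": "v1",
        "format": image_format,
        "key": key,
    }
    # Keep the comma in the location query, percent-encode the space.
    return BASE + quote(query, safe=",") + "?" + urlencode(params)


if __name__ == "__main__":
    print(static_map_url("space needle,seattle", "BingKey"))
```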
I don't think you can request the image in a different format.
From http://msdn.microsoft.com/en-us/library/ff701724.aspx :
This URL returns an image in one of
the following formats:
PNG (image/png)
JPEG (image/jpeg)
GIF (image/gif)
You cannot specify the output format
for the map image. The image type is
chosen based on parameters such as
imagerySet.
If you really want a PNG, you could make the request from a server-side script and then construct a PNG file programmatically before serving it back to the client (using PHP's imagepng function, for example).

Getting a CGImage out of a PDF file

I have a PDF file where every page is a (LZW) TIFF file. I know this because I created it. I want to be able to load it and save it as a bunch of TIFF files.
I can open the PDF file with CGPDFDocumentCreateWithURL, and get a page. I can even draw the page onto the screen.
What I WANT to do is draw the page into a bitmapContext, so that I can use CGBitmapContextCreateImage to get the image into a CGImageRef. However, in order to create a bitmap context, I need to know the size and resolution of the image. I can't seem to find out how to get either a CGPDFDocument or a CGPDFPage to tell me the resolution of the image object on that page.
Is there an easier way to do this that I'm not realizing?
Thanks.
Ghostscript will work for you here:
gs -dNOPAUSE -dBATCH -sDEVICE=tiff32nc -sOutputFile=foo-Page%d.tif foo.pdf
For a 2-page document foo.pdf you should get:
foo-Page1.tif
foo-Page2.tif
From memory I think the output resolution from GS is that of the containing Page, not necessarily the resolution of the embedded file (unless these are the same to begin with).
If this is the case and you want to recover the image at its original resolution, you can use iText (Java) or iTextSharp (.NET) to get to the image content stream (i.e. the bytes) and write them out to disk in the format of your choice, after converting the content stream into a PdfImage, if I recall correctly.
Hope the ghostscript option is applicable to save writing yet another utility...
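On the question of sizing a bitmap for a page: PDF page dimensions are expressed in points (1/72 inch by default), so the pixel size of a render follows from the page size and whatever resolution you choose. The Python sketch below shows only that arithmetic; the function name is hypothetical, the DPI is your own choice, and it does not recover the native resolution of the embedded image.

```python
def bitmap_size(page_width_pt: float, page_height_pt: float, dpi: float) -> tuple:
    """Pixel dimensions needed to render a page of the given size at `dpi`."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    # One point is 1/72 inch, so pixels = points * dpi / 72.
    width_px = round(page_width_pt * dpi / 72.0)
    height_px = round(page_height_pt * dpi / 72.0)
    return width_px, height_px


if __name__ == "__main__":
    # A US Letter page (612 x 792 points) rendered at 300 DPI.
    print(bitmap_size(612, 792, 300))  # (2550, 3300)
```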

Resources