I need to get geolocation info from my photos. Lat/Lon and GPSVersion.
I've already found some info related to this question: I compared different EXIF headers and found a hexadecimal dump that gives me the coordinates. Now I need to get it from the file.
The question might seem very simple. How do I open a JPEG file in Delphi to get the necessary hexadecimal dumps?
I have already tried reading Chars and Integers, but nothing worked. I would prefer not to use any external libraries for this task if possible.
This is basically my major question, but I'll be extremely happy if anyone could answer one more.
Is there an easy way to find the GPS tags without searching the file for specific dumps? Right now I'm searching for the odd byte sequence 12 00 02 00 07 00, which does work. I've read the EXIF documentation, but I couldn't really understand how the GPS tags work.
If you want no external libraries, you can do this with TFileStream and an array of byte. I've done this in a project to obtain the 'picture taken date'; the GPS lat-long coordinates are just another field in the EXIF header. I don't have the code here, but the method is straightforward. Once you have a TFileStream to the JPEG file:
Read the first 2 bytes and check that they are in fact $FF $D8 (just to be sure it's a valid JPEG).
Read the next 2 bytes and check whether they are $FF $E1.
If they are not, then depending on which segment it is, read two more bytes (a big-endian word) giving the segment length, which counts those two length bytes themselves, and skip the rest of the segment by calling the stream's Seek method; then repeat. There's a list of segments here: https://en.wikipedia.org/wiki/JPEG#Syntax_and_structure
If they are, skip the 2-byte segment length, then read 4 bytes and see if they are 'Exif' ($45 $78 $69 $66).
What follows is $00 $00 and an 8-byte TIFF header, which holds general information such as endianness. After that come the EXIF tags, which you need to work through to grab the ones you need. I had a quick search and found a list here: http://www.exiv2.org/tags.html
Since it's safe to assume that the EXIF data is in the first kilobytes of the JPEG file, you could read this much in a byte array (or TMemoryStream) and process the data there, which should perform better than separate small reads from a TFileStream.
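The steps above can be sketched as follows. This is written in Python rather than Delphi, purely to illustrate the byte layout; the function names `find_exif_tiff` and `find_gps_ifd` are made up for this example. It assumes that every segment before the Exif one carries a length field, and that the GPS data is reached through tag $8825 (the GPS IFD pointer) in the first image file directory, which is how the EXIF specification links to the GPS tags.

```python
import struct


def find_exif_tiff(data: bytes):
    """Return (offset of the TIFF header, struct byte-order prefix), or None."""
    if data[:2] != b"\xff\xd8":                 # SOI marker: must start a JPEG
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None                         # lost sync with the markers
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):              # image data or end of image
            return None
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xE1 and data[pos + 4:pos + 10] == b"Exif\x00\x00":
            tiff = pos + 10
            order = data[tiff:tiff + 2]
            if order == b"II":
                return tiff, "<"                # little-endian TIFF
            if order == b"MM":
                return tiff, ">"                # big-endian TIFF
            raise ValueError("bad TIFF byte-order mark")
        pos += 2 + length                       # skip marker plus segment
    return None


def find_gps_ifd(data: bytes, tiff: int, order: str):
    """Return the absolute offset of the GPS IFD, or None if there is none."""
    magic, ifd0 = struct.unpack(order + "HI", data[tiff + 2:tiff + 8])
    if magic != 42:
        raise ValueError("bad TIFF header")
    base = tiff + ifd0                          # offsets are TIFF-relative
    (count,) = struct.unpack(order + "H", data[base:base + 2])
    for i in range(count):
        entry = base + 2 + 12 * i               # each IFD entry is 12 bytes
        tag, _type, _n, value = struct.unpack(order + "HHII",
                                              data[entry:entry + 12])
        if tag == 0x8825:                       # GPS IFD pointer
            return tiff + value
    return None
```

The GPS IFD found this way has the same 12-byte entry layout, so the latitude, longitude and version tags can be read with the same loop instead of searching for a fixed byte pattern.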
I am trying to patch a file using bsdiff, and my problem is that I have to do it with very little memory available. Because of this constraint, I need to modify the source file in place with the patch in order to get the target file.
The basics of bsdiff are as follows:
header: not very relevant in this explanation.
Control data block:
mixlen-> number of bytes to be produced by combining bytes from the source
file with bytes obtained from the diff block.
copylen-> number of bytes to be added. This is entirely new data
that needs to be added to our file. These bytes are read from the
extra block.
seeklen-> number of bytes to seek in the source file to reach the position we have to read from next.
Compressed control block.
Compressed diff block.
Compressed extra block.
Patch file format:
0       8    BSDIFF_CONFIG_MAGIC
8       8    X
16      8    Y
24      8    sizeof(newfile)
32      X    control block
32+X    Y    diff block
32+X+Y  ???  extra block
with control block a set of triples (x,y,z) meaning "add x bytes
from oldfile to x bytes from the diff block; copy y bytes from the
extra block; seek forwards in oldfile by z bytes".
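To make the meaning of the triples concrete, here is a minimal sketch in Python of how they are applied. It assumes the control, diff and extra blocks have already been decompressed, and the function name `apply_triples` is made up for this example. It keeps the whole old file in memory, so it is not a low-memory implementation; it only shows where the old data is read. Note that z is a signed value, so the read position in the old file can also move backwards.

```python
def apply_triples(old: bytes, triples, diff: bytes, extra: bytes) -> bytes:
    """Apply bsdiff-style control triples (x, y, z) and return the new file."""
    new = bytearray()
    oldpos = diffpos = extrapos = 0
    for x, y, z in triples:
        # Add x bytes of the old file to x bytes of the diff block (mod 256).
        for i in range(x):
            src = oldpos + i
            old_byte = old[src] if 0 <= src < len(old) else 0
            new.append((diff[diffpos + i] + old_byte) & 0xFF)
        diffpos += x
        oldpos += x
        # Copy y bytes of entirely new data from the extra block.
        new += extra[extrapos:extrapos + y]
        extrapos += y
        # Move the read position in the old file; z may be negative.
        oldpos += z
    return bytes(new)


# Insert "brave " into the middle of a short string.
patched = apply_triples(b"hello world",
                        [(6, 6, 0), (5, 0, 0)],
                        diff=bytes(11),
                        extra=b"brave ")
print(patched)
```

Because `oldpos` can jump back to a region that an earlier triple already consumed, overwriting the old file while writing the new one destroys data that a later triple may still need.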
So the problem is that bsdiff assumes the source file is always available unmodified, so it reads from data that I have already modified (if I use the same file as both source and target). First I tried to reorder the modifications, but in some cases a modification overwrites memory that a later modification still needs. Maybe the algorithm is not suitable for what I want.
Is there another algorithm suitable for this? Is there any implementation of bsdiff or similar that does what I need?
Before going deeper into bsdiff I did some research and found VCDIFF (used by Xdelta), but it seems to behave the same way. I haven't dug into the code, though, so I don't know yet whether it generates the patch in the same way as bsdiff does.
I should also mention that I am trying to implement this in C.
Edited 04/10/2016:
I have tried to reorder the patch. With the addresses to modify sorted from smallest to largest, I thought I could handle this problem by storing the original data in a buffer until the next modification that needs it had been done. But it seems that the order of the patch operations also matters; maybe bsdiff modifies the same part of memory several times until it gets the right data. Any ideas are very welcome if someone knows about this.
Best regards,
Iván
We cannot eliminate the dependency on source data without impacting the compressed delta size. So, you will need to have source data unmodified to make BSDIFF work in the scenario you explained.
I'm currently working on a project which involves reading file's magic files (without bindings). I'd like to know how it would be possible to read the file tests from the compiled binary magic.mgc directly, in another language (like Go), as I'm unsure of how its contents should be interpreted.
According to Christos Zoulas, main contributor of file:
If you want to use them directly you
need to understand the binary format (which changes over time) and load
it in your own data structures. [...] The code that parses the file is in apprentice.c. See check_buffer()
for the reader and apprentice_compile() for the writer. There is
a 4 byte magic number, followed by a 4 byte version number followed
by MAGIC_SETS (2) number of 4 byte counts one for each set (ascii,
binary) followed by an array of 'struct magic' entries, in native
byte format.
So that's the format one should expect! Nevertheless, it has to be parsed just like the raw files.
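As a rough illustration of the layout described in that quote, the fixed part of the header could be read like this. The sketch is in Python for brevity (the same reads map directly onto Go's `encoding/binary`); the function name `read_mgc_header` is made up, and the field layout is taken only from the quote above, so it should be checked against apprentice.c for the version of file you target. The 'struct magic' entries that follow are not parsed here because their layout depends on the version.

```python
import struct

MAGIC_SETS = 2  # number of sets (ascii, binary), as stated in the quote


def read_mgc_header(data: bytes, byteorder: str = "="):
    """Read the magic number, version and per-set counts from a .mgc buffer.

    byteorder is a struct prefix: "=" means the native order of this machine,
    which matches a file compiled on the same machine.
    """
    fmt = f"{byteorder}{2 + MAGIC_SETS}I"       # all fields are 4-byte ints
    fields = struct.unpack_from(fmt, data, 0)
    return {
        "magic": fields[0],
        "version": fields[1],
        "counts": fields[2:],                   # one entry count per set
        "header_size": struct.calcsize(fmt),
    }
```

If the magic number read this way does not match the expected constant from the source, the file was probably written on a machine with the opposite byte order, and the header can be re-read with the other prefix ("<" or ">").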
I have several address lists in my Thunderbird (TBIRD) address book.
Every time I need to edit an address that is contained in several lists, it is a pain in the neck to find which lists contain the address to be modified.
As a helper tool, I want to read the files and give the user, in a single search, a list of the
xxx.MAB files that include the searched address.
With that list, the user can simply go and edit just the right address lists.
I would like to know the minimum about the format of these MAB files, so that I can open them and search for strings in them.
thanks in advance
juan
P.S. I have asked on the Mozilla forum, but Mozilla has no plans to consolidate the addresses into one master file and have the different lists contain only links to the master. One individual is thinking of doing that, but he has no idea when, due to lack of resources.
On this forum there is a similar question mentioning MORK files, but my current Thunderbird appears to keep all addresses in MAB files.
I am afraid there is no answer that will give you a proper solution for this question.
Mork is a textual database format used for Address Book data (.mab files) and Mail Folder Summaries (.msf files).
The format, written by David McCusker, is a mix of various numerical namespaces, is undocumented, and seems to be no longer developed, maintained, or supported. The only way to get to grips with it is to reverse engineer it while reading the source code that uses this format.
However, experienced people have tried to write parsers for this file format without any success. According to Wikipedia, former Netscape engineer Jamie Zawinski had this to say about the format:
...the single most brain-damaged file format that I have ever seen in
my nineteen year career
This page states the following:
In brief, let's count its (Mork's) sins:
Two different numerical namespaces that overlap.
It can't decide what kind of character-quoting syntax to use: Backslash? Hex encoding with dollar-sign?
C++ line comments are allowed sometimes, but sometimes // is just a pair of characters in a URL.
It goes to all this serious compression effort (two different string-interning hash tables) and then writes out Unicode strings
without using UTF-8: writes out the unpacked wchar_t characters!
Worse, it hex-encodes each wchar_t with a 3-byte encoding, meaning the file size will be 3x or 6x (depending on whether wchar_t is 2
bytes or 4 bytes.)
It masquerades as a "textual" file format when in fact it's just another binary-blob file, except that it represents all its magic
numbers in ASCII. It's not human-readable, it's not hand-editable, so
the only benefit there is to the fact that it uses short lines and
doesn't use binary characters is that it makes the file bigger. Oh
wait, my mistake, that isn't actually a benefit at all.
The frustration shines through here and it is obviously not a simple task.
Consequently, there are apparently no parsers outside Mozilla products that can actually parse this format.
I have reverse-engineered complex file formats in the past and know it can be done with patience and enough energy.
Sadly, this seems to be your only option as well. A good place to start would be to take a look at Thunderbird's source code.
I know this doesn't give you a straight-up solution but I think it is the only answer to the question considering the circumstances for this format.
And of course, you can always look into the extension API to see if that allows you to access the data you need in a more structured way than handling the file format directly.
Sample code which reads mork
Node.js: https://www.npmjs.com/package/mork-parser
Perl: http://metacpan.org/pod/Mozilla::Mork
Python: https://github.com/KevinGoodsell/mork-converter
More links: https://wiki.mozilla.org/Mork
For the purposes of this example, suppose there exist 2 binary files A and B, each containing a variation of, say, a YouTube video, where
A contains a 5 second ad
B contains no ad
With the exception of the ad, A contains the same content as B
Total length of file A is 60 seconds
Total length of file B is 55 seconds
As a general rule, if we were to compare the bit patterns of each file, would we arrive at the same conclusion: that the files contain 55 seconds' worth of common bits?
If we extend the problem further, say to 2 jars whose only difference is their comments, would it be appropriate to compare the order of bits and, based on what we find, determine the degree of likeness?
It's easy to determine whether files are identical or not. Will the approach of comparing bits help accurately determine the degree to which files are close to one another?
The question is not about video files, but about binary files in general. I mention video files above for example purposes only.
It depends on the file-format, but in your examples — no, probably not.
Video with and without initial ad: videos are usually encoded by breaking them into small time-blocks, and then encoding and compressing those blocks; if you insert an ad at the beginning, then you will most likely cause the block-transitions to happen at different time offsets within the main video.
Jar-file with and without comments (or with different comments): same story; changing the length of a comment within a file will affect the splitting of the entire file into compressible blocks, so all blocks after an altered comment will be compressed differently. (This is, of course, assuming that the jar-file actually includes the comments. Just because comments were in the source-code, that doesn't mean the jar-file will have them; that depends on compiler settings and so on.)
Most video compression these days is done with lossy algorithms. The compression is done both within a frame and BETWEEN frames. If the extra video frames added in your "A" video "leak" into the original movie because of the inter-frame compression, then by definition your two video files will be different videos, even though logically they're the same movie with 5 seconds of ad tacked onto the front. The compression algorithm will have merged 1 or more frames of the two videos into a hybrid of the two, and this fundamentally changes things.
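The effect both answers describe can be reproduced on a small scale. The sketch below is only an illustration under simple assumptions: it uses zlib (lossless, general-purpose compression) as a stand-in for a real video codec or jar compressor, compares bytes rather than individual bits, and the helper `shared_fraction` is a made-up similarity measure for this example, not a standard one.

```python
import zlib


def shared_fraction(a: bytes, b: bytes, n: int = 8) -> float:
    """Fraction of b's n-byte windows that also occur somewhere in a."""
    windows_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    windows_b = [b[i:i + n] for i in range(len(b) - n + 1)]
    if not windows_b:
        return 0.0
    return sum(w in windows_a for w in windows_b) / len(windows_b)


# "B" is the main content; "A" is the same content with an ad in front.
main = "".join(f"frame {i}: value {i * i % 97}\n" for i in range(2000)).encode()
ad = b"buy now! " * 200
file_b = main
file_a = ad + main

# Uncompressed, every window of B also appears in A.
raw = shared_fraction(file_a, file_b)

# After compression, the shared content no longer lines up byte for byte.
packed = shared_fraction(zlib.compress(file_a), zlib.compress(file_b))

print(f"raw: {raw:.2f}  compressed: {packed:.2f}")
```

On the uncompressed data the measure reports that all of B is contained in A, while on the compressed streams most of that overlap disappears, even though the underlying content is identical apart from the prefix.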
I'm working on a C library that reads tag information from music files. I've already got ID3v2 taken care of, but I can't figure out how Ogg files are structured.
I opened a .ogg file in a hexeditor and I could find the tag data because that was all human readable. But everything from the beginning of the file to the tag data looked like garbage. How is this data encoded?
I don't need any help with the actual code; I just need help visualizing what an Ogg header looks like and what encoding it uses so that I can read it. I'd like to use a non-hacky approach to reading Ogg files.
I've been looking at the FLAC format, which has been helpful.
The FLAC file I'm looking at has about 350 bytes between the "fLaC" identifier and the human-readable comments section, and none of it is human-readable in my hex editor, so I'm sure there has to be something important in there.
I'm using Linux, and I have no intention of porting to Windows or OS X. So if I need to use a glibc only function to convert the encoding, I'm fine with that.
The Ogg file format is documented here. There is a very nice graphical visualization as you requested with a detailed written description.
You may also want to look at libogg, which is an open-source, BSD-licensed library for reading and writing Ogg files.
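For a quick picture of what the "garbage" at the start of the file is, here is a minimal sketch of reading one Ogg page header, written in Python for brevity even though the question targets C. It follows the page layout from the Ogg specification (a 27-byte fixed header in little-endian order, followed by a segment table); the function name `parse_ogg_page` is made up for this example, and CRC checking and packets that span several pages are left out.

```python
import struct

# capture pattern, version, header type flags, granule position,
# stream serial number, page sequence number, CRC, number of segments
PAGE_HEADER = struct.Struct("<4sBBqIIIB")      # 27 bytes, little-endian


def parse_ogg_page(data: bytes, pos: int = 0) -> dict:
    """Parse the Ogg page header starting at pos and locate its payload."""
    (capture, version, flags, granule,
     serial, sequence, crc, nsegs) = PAGE_HEADER.unpack_from(data, pos)
    if capture != b"OggS":
        raise ValueError("no Ogg page at this offset")
    table_start = pos + PAGE_HEADER.size
    segment_table = data[table_start:table_start + nsegs]
    payload_start = table_start + nsegs
    payload_len = sum(segment_table)            # one length byte per segment
    return {
        "version": version,
        "flags": flags,                         # 0x02 marks the first page
        "granule": granule,
        "serial": serial,
        "sequence": sequence,
        "crc": crc,
        "payload_start": payload_start,
        "payload_len": payload_len,
        "next_page": payload_start + payload_len,
    }
```

In an Ogg Vorbis file, the first page's payload starts with the byte $01 followed by "vorbis" (the identification header), and the tags live in the comment header packet, which starts with $03 followed by "vorbis".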
As is described in the link you provided, the following metadata blocks can occur between the "fLaC" marker and the VORBIS_COMMENT metadata block.
STREAMINFO: This block has information about the whole stream, like sample rate, number of channels, total number of samples, etc. It must be present as the first metadata block in the stream. Other metadata blocks may follow, and ones that the decoder doesn't understand, it will skip.
APPLICATION: This block is for use by third-party applications. The only mandatory field is a 32-bit identifier. This ID is granted upon request to an application by the FLAC maintainers. The remainder of the block is defined by the registered application. Visit the registration page if you would like to register an ID for your application with FLAC.
PADDING: This block allows for an arbitrary amount of padding. The contents of a PADDING block have no meaning. This block is useful when it is known that metadata will be edited after encoding; the user can instruct the encoder to reserve a PADDING block of sufficient size so that when metadata is added, it will simply overwrite the padding (which is relatively quick) instead of having to insert it into the right place in the existing file (which would normally require rewriting the entire file).
SEEKTABLE: This is an optional block for storing seek points. It is possible to seek to any given sample in a FLAC stream without a seek table, but the delay can be unpredictable since the bitrate may vary widely within a stream. By adding seek points to a stream, this delay can be significantly reduced. Each seek point takes 18 bytes, so 1% resolution within a stream adds less than 2k. There can be only one SEEKTABLE in a stream, but the table can have any number of seek points. There is also a special 'placeholder' seekpoint which will be ignored by decoders but which can be used to reserve space for future seek point insertion.
Just after the above description, there's also the specification of the format of each of those blocks. The link also says
All numbers used in a FLAC bitstream are integers; there are no floating-point representations. All numbers are big-endian coded. All numbers are unsigned unless otherwise specified.
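Those roughly 350 bytes can be walked block by block. As a minimal sketch (in Python for brevity, with a made-up function name `list_flac_blocks`): each metadata block starts with a 4-byte header, where the top bit of the first byte flags the last metadata block, the low 7 bits give the block type, and the next 3 bytes are the big-endian length of the block body. The type numbers in the table come from the FLAC format specification.

```python
BLOCK_TYPES = {
    0: "STREAMINFO",
    1: "PADDING",
    2: "APPLICATION",
    3: "SEEKTABLE",
    4: "VORBIS_COMMENT",
    5: "CUESHEET",
    6: "PICTURE",
}


def list_flac_blocks(data: bytes):
    """Return (type name, body offset, body length) for each metadata block."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    pos = 4
    blocks = []
    while True:
        if pos + 4 > len(data):
            raise ValueError("truncated metadata block header")
        first = data[pos]
        is_last = bool(first & 0x80)            # top bit: last metadata block
        block_type = first & 0x7F               # low 7 bits: block type
        length = int.from_bytes(data[pos + 1:pos + 4], "big")
        name = BLOCK_TYPES.get(block_type, f"UNKNOWN({block_type})")
        blocks.append((name, pos + 4, length))
        pos += 4 + length                       # jump to the next header
        if is_last:
            break
    return blocks
```

With this, a tag reader can skip straight to the VORBIS_COMMENT block without interpreting STREAMINFO, SEEKTABLE or PADDING at all.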
So, what are you missing? You say
I'd like a non-hacky approach to reading Ogg files.
Why re-write a library to do that when they already exist?