Is there a tool to search differences in hex files? - file

I have two hex files, and want to search for values that have changed from 9C in one file to 9D in the next one.
Even a tool that synchronizes the windows would work, as I can't seem to find any that enable me to do this. All I've found are tools that let me search for the same value and same address between two files.
Thanks!

"There's no such thing as a hex file."
Oh well perhaps I should delete all my hex files.
I use Motorola and Intel formats of hex files (they are just text files representing bytes, but include a record structure for addresses, checksum etc per line. These files need to be treated as binary (in .gitattributes) since they cannot be merged. I use Kdiff3 to see differences (or any other text diff tool).

Related

What is the difference between .bin and .dat file?

When writing to a binary file, when should I use .bin vs .dat? If I'm just trying to store information not meant to be read by humans, like item description/serial number pair, does it matter which one I pick if I'm just trying to make it unreadable from a text editor?
Let me give you some brief details about these files :
.BIN File : The BIN file type is primarily associated with 'Binary File'. Binary files are used for a wide variety of content and can be associated with a great many different programs. In general, a .BIN file will look like garbage when viewed in a file editor.
.DAT File : The DAT file type is primarily associated with 'Data'. Can be just about anything: text, graphic, or general binary data. Data file in special format or ASCII.
Reference:
Abhijit Banerjee answered that question on quora
.dat is a more frequently used suffix for binary data. It doesn't matter what extension you pick, as long as you are on Unix or Linux based systems.
Sufixes can mean whatever you want them to mean... Those rules are more like guidelines than actual rules...
However, BIN seems like a short to binary, so a BIN file will likely hold data in binary form. DAT seems like a short to data, so a DAT file will contain information in whatever format the developer of the program that reads that file seems fit (ASCII, Binary, a mix of them, something else entirely)
On a UNIX system, there is no difference. The extensions are interchangeable.
If you do not put any extension, it makes it kinda hard for someone not knowing what the file's extension should be, to open the file. Additionally, with Unix or Linux, if you place a dot (period) before the file name, the file hides itself.

How to read content of unknown file

I have a file that holds manufacturing orders for a machine.
I would like to read the content of this file and edit it, but when I open it in a text editor i.e. Notepad++, I get a bunch of wierd charecters:
xÚ¥—_HSQÀo«a)’êaAXŽâê×pD8R‰¬©s“i+ƒ´#¡$
-þl-ó/ÓíºIúPôàƒHˆP–%a&RÎÈn÷ü¹·;Ú;ç<ìòÝÃý}¿ó}‡{϶«rWg>˜›ãR‡)Çn0³Ûf³yÎW[5–šw½ÇRW{ñ’rO6¹ŽŸp¦ÙœcÏ.9yÀnýg
)Ë—e90ejÕø£rC. f¦}3ËŒ˜hü”å1g[…ø±ú ÜJøz®‹˜YfÈ,4`ŽKÉ—ù“ÔË¿d„þlG3#=˜Ž´+hF¬¦£€«šm¿áØ
ïÖµv‡ËpíÍ~™‡Aù
šëÈÚ]ÿç™DŒÉFØ ïƒæsij  ¦y=-74Æ/t=ÕŠr\˜š»Âä‰Ý­¨žã΢
dz·à‡'fœ½­yâ½4qåPjácòÄŒeÊhñ“ý™ÙÎÕ÷5ôlñ=˜Õ{ú;ø=Û;4OêYä>Ìpxbæâ­'è"oëB×1gQ9“'¹]Ô³’Ô³ø!ÌózÞyŸõžÓIŽù*&OÌXPÕ"ŽWžpíOÌè‚Þ3Òr0{Ž†R=_?…/¼žÞ0,ê=/?£ûÓËîy“2Z<ij³[ËÁì™÷–ôžÎ’Ããa÷<Maêéí…¼ž}©žYýZ-˜=­”á¤}π>3°¢÷œ$ïè‰3ìž«ƒÄs¿—xnŒÀ*¯gi$ÕómDËÁìùIeоû‡À¬?3°x¾"~ª§c˜öÝÇî颌°›x¾Fßb>Ï}QXÓ{öFi-êÙßóR”œe^Ñ÷ü‘¿g[Lë ŽwJZϘë¹3”³L©gH‚,^Ïe 2ôžWGøëÙ2‚Î
øœL¾ÅqÈäõ,ýç\œË3¾þeྗ&`Ϻ<KÒf“’»ðù]í‰ãžU^wèþåÔÖy”H}ò•6ø6
It looks like the file is encoded.
Any idea how to find the encoding and make the file readable and editable?
It's binary and probably encoded so without knowledge of data structure you can't do much - just reverse engineering based on trying and checking what changed, operating with hex editor.
It isn't impossible, tho. If you can change the data the way you know (eg. change number of orders from 1 to 2) and export to file, you can compare binary values and find which byte holds that number. Of course if it is encrypted and you don't know the key... It's easier to find another way.
For further read, check this out - https://en.wikibooks.org/wiki/Reverse_Engineering/File_Formats
If you've got access to a Linux box why not use
hexdump -C <filename>
You will be able to get a much better insight into how the file is structured, than by using a text editor.
There are also many "hexdump" equivalent commands on Windows

Memo fields in dbf

I am aware that .dbf database holds (text) fields larger than 254 characters in separate .dbt files linking them with an M memo field.
I have a legacy database which I can plainly see contains a field with a stated (max) length of 255 characters.
When I edit that file and save it with OpenOffice Calc, it creates a .dbf and a .dbt. I would like to leave the edited file in the format I have found, that is with a 255 characters field.
Is that possible?
Does it depend on the character set (the only option I can see when using OpenOffice and Excel, versions that supported .dbf)?
There are many versions of dbf files; some of them, such as Clipper, actually allowed for character fields longer than 254 to be stored in the dbf file itself.
I don't believe OpenOffice nor Excel support writing to those formats, so if you want to make changes (and not just read the data) you will need to find another tool.
So far as telling the xBase version, you could start with Data File Header Structure for the
dBASE Version 7 Table File.
As for DBF file editors, Google is your friend. A quick search gave me, for example, GTK DBF Editor and CDBF. The latter I remember using about three years back with a client who was still running Clipper 5.2 apps under Microsoft Windows 98.

Weird directory entries in FAT file system

So I'm trying to figure out how the FAT FS works and got confused by the root directory table. I have two files in the partition: test.txt and innit.eh which results in the following table:
The entries starting with 0xE5 are deleted, so I assume these were created due to renaming. The entries for the actual files look like this:
TEST TXT *snip*
INNIT EH *snip*
What I don't understand is where the entries like
At.e.s.t......t.x.t
Ai.n.n.i.t.....e.h.
are coming from and what are they for. They do not start with 0xE5, so should be treated as existing files.
By the way, I'm using Debian Linux to create filesystems and files, but I noticed similar behaviour on FS and files created on Windows.
The ASCII parts of the name (where the letters were close to each other) is the legacy 8.3 DOS shortname. You see it only uses capital letters. In DOS, only these would be there.
The longer parts (with 0x00 in between) is the long name (shown in Windows) which is Unicode, and uses 16bits per character.
The intervening bytes are all 0x00, which gives a strong feeling that they are stored in UTF-16 instead of UTF-8. Perhaps they are there as an extension similar to the other VFAT extensions for long filenames?

How to read lines from a pdf file into a c program using ghostscript?

I am currently taking a curse in C programming, and for our final project we need to read some text from a pdf into a string, so we can manipulate the string.
In essence what i am looking for is something similar to this, only with a .pdf instead of a .txt file.
char *line;
fscanf(myfile.txt," %[^\n]", line);
I have no experience with ghostscript, so I have no idea if this is even possible, although we where told that we should use ghostscript.
The current version of Ghostscript includes the 'txtwrite' device, which will extract text from any supported input (PostScript, PDF, XPS, PCL) and will emit it in a variety of forms.
The UTF-8 output would probably be most useful to you.
Caveat! Many things which appear to be text in PDF files are not text, and no attempt is made to deal with these.
ps2ascii is deprecated with the release of the txtwrite device, but in any case its perfectly capable (despite the name) of dealing with PDF as an input.
I can't think why anyone assigned you this project, PDF files are not text files, and cannot be treated as such. In addition to the fact that PDF files are generally compressed, identifying the contents stream and all the other streams it relies on (which may themselves include text) is non-trivial. Plus, the text is often encoded in a way which can be difficult to understand (this is particularly true of CIDFonts and TrueType fonts).
Perhaps your tutor expected you to first become expert in the PDF format, but that seems excessive for a C course.
You can convert your PDF to Postscript using pdf2ps, and then to ASCII using ps2ascii. You already know how to read ASCII.
Both utilities mentioned are in the ghostscript package.

Resources