I have an offline .EDB file (Exchange database) that I want to pull information from, such as the computer name, the flags, etc. I have found the following offsets from http://www.edbsearch.com/edb.html which indicate that the computer name and related fields come from bytes 0x24 0x10. However, looking at the EDB file in 010 Editor, the value appears to be nonexistent at that offset. It appears later on within the file, but not in a constant place.
Is there a constant offset from which I can reliably pull the computer name out of the .EDB file? I am working on backups from another computer, and all of the solutions that I have found are for live versions of .EDB files, which are useless to me since I have offline databases.
Many thanks,
With database replication (CCR in 2007, DAGs in 2010+), the concept of a computer name isn't that helpful. After a failover/switchover, what should the computer name be?
I don't think that the Computer Name is populated anymore. If eseutil.exe -mh doesn't report it, then it's not there.
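If you need to run that check across a pile of offline databases, here is a minimal sketch that shells out to eseutil and scrapes its header dump. The field labels (such as "State") vary between ESE/Exchange versions, so treat the regex and the keys as assumptions to verify against your own eseutil output:

```python
import re
import subprocess

def edb_header_fields(edb_path):
    """Run eseutil /mh (it must be on PATH) and scrape the 'Label: value'
    lines of its header dump into a dict. Field labels differ between
    Exchange versions, so check the keys against your own output."""
    dump = subprocess.run(
        ["eseutil", "/mh", edb_path],
        capture_output=True, text=True, check=True,
    ).stdout
    fields = {}
    for line in dump.splitlines():
        match = re.match(r"\s*([\w /()]+?):\s+(\S.*?)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields

fields = edb_header_fields(r"C:\backups\priv1.edb")  # hypothetical path
print(fields.get("State"))  # e.g. 'Clean Shutdown'
```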
Also check out JetGetDatabaseFileInfo. http://msdn.microsoft.com/en-us/library/windows/desktop/gg269239(v=exchg.10).aspx Note that the documentation is for esent.dll (Windows), and that ese.dll (Exchange) is not documented. While esent.dll and ese.dll are very similar, and for simple things (such as this) you can treat them similarly and get away with it, they are NOT identical, and you will sometimes come upon incompatibilities. In other words: Do it at your own risk, your mileage may vary, etc. etc. :)
-martin
File paths are inherently dubious when working with data.
Let's say I have a hypothetical situation with a program called find_brca and some data called my.genome, and both are in the /Users/Desktop/ directory.
find_brca takes a single argument, a genome, runs for about 4 hours, and returns the probability of that individual developing breast cancer in their lifetime. Some people, presented with a very high % probability, might then immediately have both of their breasts removed as a precaution.
Obviously, in this scenario, it is absolutely vital that /Users/Desktop/my.genome actually contains the genome we think it does. There are no do-overs. "Oops, we used an old version of the file from a previous backup" or any other technical issue will not be acceptable to the patient. How do we ensure we are analysing the file we think we are analysing?
To make matters trickier, let's also stipulate that we cannot modify find_brca itself, because we didn't write it; it's closed source, proprietary, whatever.
You might think MD5 or other cryptographic checksums could come to the rescue, and while they do help to a degree, you can only MD5 the file before and/or after find_brca has run; you can never know exactly what data find_brca actually used (without doing some serious low-level system probing with DTrace/ptrace, etc.).
The root of the problem is that file paths do not have a 1:1 relationship with actual data. Only in a filesystem where files can only be requested by their checksum, and where the checksum changes as soon as the data is modified, can we ensure that when we feed find_brca the genome's file path 4fded1464736e77865df232cbcb4cd19, we are actually reading the correct genome.
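To make "requested by checksum" concrete, here is a minimal sketch in plain Python. Note it only verifies what this process reads; it cannot prove what find_brca itself reads a moment later, which is exactly the gap described above. The digest is the question's hypothetical example value:

```python
import hashlib

def read_by_checksum(path, expected_md5):
    """Read a file only if its contents hash to the expected digest.
    This verifies what *this* process reads; it cannot prove what
    find_brca itself will read a moment later."""
    data = open(path, "rb").read()
    actual = hashlib.md5(data).hexdigest()
    if actual != expected_md5:
        raise ValueError(f"{path}: expected {expected_md5}, got {actual}")
    return data

genome = read_by_checksum(
    "/Users/Desktop/my.genome",
    "4fded1464736e77865df232cbcb4cd19",  # the question's example digest
)
```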
Are there any filesystems that work like this? If I wanted to create such a filesystem because none currently exists, how would you recommend I go about doing it?
I have my doubts about the stability, but hashfs looks exactly like what you want: http://hashfs.readthedocs.io/en/latest/
HashFS is a content-addressable file management system. What does that mean? Simply, that HashFS manages a directory where files are saved based on the file's hash. Typical use cases for this kind of system are ones where:
Files are written once and never change (e.g. image storage).
It's desirable to have no duplicate files (e.g. user uploads).
File metadata is stored elsewhere (e.g. in a database).
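For a feel of the API, a short sketch based on the hashfs docs (pip install hashfs). The method names and HashAddress fields are taken from that documentation, so double-check them against the version you install:

```python
import io
from hashfs import HashFS

# Shard stored files into a 4-level directory tree, keyed by sha256.
fs = HashFS("genome_store", depth=4, width=1, algorithm="sha256")

# Store a blob: its address IS its content hash.
address = fs.put(io.BytesIO(b"ACGT..."))
print(address.id)       # sha256 digest of the contents
print(address.abspath)  # where HashFS placed it on disk

# Retrieve strictly by hash, so there is no path to go stale.
with fs.open(address.id) as stored:
    data = stored.read()
```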
Note: not to be confused with the hashfs a student of mine wrote a couple of years ago: http://dl.acm.org/citation.cfm?id=1849837
I would say that the question is a little vague; however, there are several answers which can be given to parts of it.
First of all, not all filesystems lack path/data correspondence. On many (if not most) filesystems, the file is identified only by its path, not by any IDs.
Next, if you want to guarantee that the data is not changed while the application handles it, the approach depends on the filesystem being used and on the way the application works with the file (whether it keeps it open, or opens and closes it as needed).
Finally, if you are concerned about an attacker altering the data on the filesystem while the file is in use, then you probably have a bigger problem than just file paths, and that problem should be addressed first.
On a side note, you can implement a virtual filesystem (FUSE on Linux, our CBFS on Windows) which will feed your application data taken from elsewhere, be it memory, a database, or a cloud. This approach answers your question as well.
Update: if you want to get rid of file paths entirely and have the data addressed by hash, then a NoSQL database, where the hash is the key, would probably be your best bet.
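A minimal sketch of that suggestion, with SQLite standing in for the key-value store (any store with the digest as key works the same way; the table and function names are illustrative):

```python
import hashlib
import sqlite3

# A tiny content-addressed store: the key is the SHA-256 of the blob,
# so a given key can never silently map to different data.
db = sqlite3.connect("blobs.db")
db.execute("CREATE TABLE IF NOT EXISTS blobs (hash TEXT PRIMARY KEY, data BLOB)")

def put(data: bytes) -> str:
    digest = hashlib.sha256(data).hexdigest()
    db.execute("INSERT OR IGNORE INTO blobs VALUES (?, ?)", (digest, data))
    db.commit()
    return digest

def get(digest: str) -> bytes:
    row = db.execute("SELECT data FROM blobs WHERE hash = ?", (digest,)).fetchone()
    if row is None:
        raise KeyError(digest)
    return row[0]

key = put(b"ACGT...")
assert get(key) == b"ACGT..."
```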
I have several address lists in my Thunderbird (TBIRD) address book.
Every time I need to edit an address that is contained in several lists, it is a pain in the neck to find which list contains the address to be modified.
As a helper tool, I want to read the files and give the user, in a single search, a list of which xxx.MAB files include the searched address.
Having that list, the user can simply go and edit just the right address lists.
I would like to know a minimum about the format of the mentioned MAB files, so I can OPEN + SEARCH for strings in the files.
thanks in advance
juan
PS: I have asked on the Mozilla forum, but there are no plans from Mozilla to consolidate the addresses in one master file, with the different lists just containing links to the master. There is one individual thinking of doing that, but he has no idea when, due to lack of resources.
On this forum there is a similar question mentioning MORK files, but my current Thunderbird seems to keep all addresses in MAB files.
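Before getting into the format itself, here is a naive sketch of the OPEN + SEARCH idea: report which .mab files contain a given address as a literal substring. Mork hex-escapes some characters (dollar-sign sequences), so this can miss mangled entries, but plain ASCII addresses are normally stored verbatim. The profile path is hypothetical:

```python
from pathlib import Path

def lists_containing(profile_dir, address):
    """Report which .mab files contain the address as a literal substring."""
    needle = address.lower().encode("ascii")
    return [mab.name
            for mab in Path(profile_dir).rglob("*.mab")
            if needle in mab.read_bytes().lower()]

# hypothetical profile path
print(lists_containing(r"C:\Users\juan\AppData\Roaming\Thunderbird",
                       "someone@example.com"))
```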
I am afraid there is no answer that will give you a proper solution for this question.
Mork is a textual database format used for Address Book Data (.mab files) and Mail Folder Summaries (.msf files).
The format, written by David McCusker, is a mix of various numerical namespaces, is undocumented, and seems to no longer be developed/maintained/supported. The only way to get to grips with it is to reverse engineer it in parallel with reading the source code that uses it.
However, experienced people have tried to write parsers for this file format without much success. According to Wikipedia, former Netscape engineer Jamie Zawinski had this to say about the format:
...the single most brain-damaged file format that I have ever seen in my nineteen year career
This page states the following:
In brief, let's count its (Mork's) sins:
Two different numerical namespaces that overlap.
It can't decide what kind of character-quoting syntax to use: Backslash? Hex encoding with dollar-sign?
C++ line comments are allowed sometimes, but sometimes // is just a pair of characters in a URL.
It goes to all this serious compression effort (two different string-interning hash tables) and then writes out Unicode strings without using UTF-8: writes out the unpacked wchar_t characters!
Worse, it hex-encodes each wchar_t with a 3-byte encoding, meaning the file size will be 3x or 6x (depending on whether wchar_t is 2 bytes or 4 bytes.)
It masquerades as a "textual" file format when in fact it's just another binary-blob file, except that it represents all its magic numbers in ASCII. It's not human-readable, it's not hand-editable, so the only benefit there is to the fact that it uses short lines and doesn't use binary characters is that it makes the file bigger. Oh wait, my mistake, that isn't actually a benefit at all.
The frustration shines through here and it is obviously not a simple task.
Consequently, there apparently exist no parsers outside Mozilla products that are fully able to parse this format.
I have reverse engineered complex file formats in the past and know it can be done with patience and the right amount of energy.
Sadly, this seems to be your only option here as well. A good place to start would be to take a look at Thunderbird's source code.
I know this doesn't give you a straight-up solution but I think it is the only answer to the question considering the circumstances for this format.
And of course, you can always look into the extension API to see if that allows you to access the data you need in a more structured way than handling the file format directly.
Sample code which reads mork
Node.js: https://www.npmjs.com/package/mork-parser
Perl: http://metacpan.org/pod/Mozilla::Mork
Python: https://github.com/KevinGoodsell/mork-converter
More links: https://wiki.mozilla.org/Mork
Is there a way to read a file's data but continue reading the data on the hard drive past the end of the file? For normal file I/O I could just use fread(), but, obviously, that will only read to the end of the file. I should probably add that I need this on a Windows computer.
All my Googling for a way to do this is instead coming up with results about unrelated topics concerning EOF, such as people having problems with normal I/O.
My reasoning for this is that I accidentally deleted part of the text in a text file I was working on, and it was an entire day's worth of work. I Googled a bunch of file-recovery material, but it all seems to be about recovering deleted files, whereas my problem is that the file is still there but missing some of its information. I'm hoping some of that data still exists directly after the currently marked end of file and is neither fragmented elsewhere nor already claimed or otherwise overwritten. Since I can't find a program that helps with this specifically, I'm hoping I can quickly make something up for it (I understand that, depending on what is involved, this might not be as feasible as just redoing the work, but I'm hoping that's not the case).
As far as I can foresee, though I might not be correct (I'm not sure, which is why I'm asking for help), there are three possibilities.
Worst of the three: I have to look up Windows API functions that allow direct access to the entire hard drive (similar to its functions for memory, perhaps? those I have experience with) and scan the entire thing for the data that I still have access to from the file and then just continue looking at what's after it.
Second: I can get a pointer to the file, then I still have to get raw access to the hard drive, but at least I'd have a pointer to where the file sits within it?
Best of the three: just open the file for write access, seek to the end, then write a ways past EOF to claim more space, and hope that Windows doesn't clean the data before handing it over, so that the "garbage" I get is actually the previous data in that spot, which would be exactly what I'm looking for. This would be awesome if it were that simple, but I'm afraid to test it out because I'd lose the data if it failed, so hopefully someone else already knows. The PC in question is running Vista Home Premium, if that matters to anyone who knows the gory details of Windows.
Do any of those three seem plausible? Either way, I'm also open (and eager) to other suggestions, especially ones better than my silly ideas, and especially if they come with pointers to specific functions for getting the job done.
Also, if anyone else actually has heard of a recovery program that doesn't just recover deleted files but which would actually work for a situation like this, and which is free and trustworthy, that works too.
Thanks in advance for any assistance.
You should get a utility for scanning the free space of a hard drive and recovering data from it, for example PhotoRec or foremost. Note however that if you've been using the machine much at all (even web browsing, which will create files in your cache), the data has likely already been overwritten. Do not save your recovery tools on the same hard drive, or even use the same PC to download them; get them from another computer and save them to a USB device, then run them from that device.
As for the conceptual content of your question, files are abstract objects. There is no such thing as data "past EOF", except (depending on the implementation) perhaps up to the next multiple of the filesystem/disk block size. In any case, Windows zero-fills newly allocated file space precisely so that programs cannot read other people's leftover data, which rules out your third option. It's also possible (very likely) that your editor "saved" the file by truncating it and rewriting everything from the beginning, meaning there's not necessarily any correspondence between the old and the new storage.
Your question doesn't make a lot of sense: by definition there is nothing in the file after the EOF. From your further description, it appears that you want to read whatever happens to be on the disk after the last byte used by the file, which might be random garbage (unused space) or might be some other file. In either case, this isn't "data after the EOF"; it's just data on the disk that's not part of the file. It's even possible that it's some other part of the same file, if the filesystem happens to lay out its data that way; some filesystems scatter blocks in seemingly random ways across the disk, and figuring out which bytes belong to which files requires understanding the filesystem metadata.
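For the first option (raw access to the drive), here is a minimal sketch under stated assumptions: Windows exposes volumes via the \\.\C: device path, opening it needs Administrator rights, and raw reads must be sector-aligned (reading in large fixed chunks keeps them so). If at all possible, run the search from another OS instance or drive, since any write to C: can overwrite the very clusters you are hoping to recover:

```python
# Scan the raw C: volume for a phrase remembered from the lost text.
# Needs Administrator; the needle below is a placeholder to replace.
CHUNK = 1024 * 1024  # a multiple of the sector size, as raw reads must be

def scan_volume(volume=r"\\.\C:", needle=b"a phrase from the lost work"):
    offset = 0   # absolute position of the start of the current block
    tail = b""   # carry-over so matches spanning two chunks are found
    with open(volume, "rb", buffering=0) as raw:
        while True:
            block = raw.read(CHUNK)
            if not block:
                break
            buf = tail + block
            hit = buf.find(needle)
            if hit != -1:
                print(f"match near byte offset {offset - len(tail) + hit}")
            tail = buf[-(len(needle) - 1):]  # assumes needle longer than 1 byte
            offset += len(block)

scan_volume()
```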
Morning all,
I've gone and told a customer I could migrate some of their old data out of a DOS-based system into the new system I've developed for them. However, I said that without actually looking at the files that stored the data in the old system; I just figured a quick Google would solve the problem for me... I was wrong!
Anyway, this program has a folder with hundreds... well, 800 files with all sorts of file extensions: .ave, .bak, .brw, .dat, .001, .002, ..., .007, .dbf, .dbe and .his.
.Bak obviously isn't a SQL backup file.
Does anyone have programming experience with any of those file types who might be able to point me toward some way to read and extract the data?
I can't mention the program's name, because I don't think the original developer would allow it...
Thanks.
I'm willing to bet that the .dbf file is in DBase format, which is really straightforward. The contents of that might provide clues to the rest of them.
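To see just how straightforward: the classic dBASE III header is 32 fixed bytes followed by 32-byte field descriptors terminated by 0x0D, so a summary dump is a few lines of code. The offsets below are the dBASE III ones (FoxPro variants extend the header but keep them), and the file name is hypothetical:

```python
import struct

def dbf_summary(path):
    """Decode the fixed 32-byte dBASE header and the field descriptors."""
    with open(path, "rb") as f:
        header = f.read(32)
        version = header[0]
        records, header_len, record_len = struct.unpack("<IHH", header[4:12])
        print(f"version byte 0x{version:02x}, {records} records, "
              f"{record_len} bytes each")
        # Field descriptors: 32 bytes apiece, list terminated by 0x0D.
        while True:
            desc = f.read(32)
            if not desc or desc[0] == 0x0D:
                break
            name = desc[:11].split(b"\x00")[0].decode("ascii", "replace")
            ftype = chr(desc[11])
            length = desc[16]
            print(f"  field {name!r}, type {ftype}, {length} bytes")

dbf_summary("CUSTOMER.DBF")  # hypothetical file name
```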
The unix 'file' utility can be used to recognize many file types by their 'magic number'. It examines the file's contents and compares them against thousands of known formats. If the files are in any kind of common format, this can probably save you a good amount of work.
If they're NOT in a common format, it may send you chasing after red herrings. Take its suggestions as just that: suggestions.
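If there's no unix box handy, the same idea is a few lines of Python. The signature table below is a tiny, assumed starting set that you would grow as you identify the app's formats, and the directory name is hypothetical:

```python
from pathlib import Path

# Order matters: check multi-byte magics before the weak one-byte dBASE tag.
SIGNATURES = [
    (b"PK\x03\x04", "zip archive"),
    (b"\x1f\x8b", "gzip stream"),
    (b"MZ", "DOS/Windows executable"),
    (b"\x03", "dBASE III? (version byte, weak signature)"),
]

for path in sorted(Path("old_system_data").iterdir()):  # hypothetical dir
    if not path.is_file():
        continue
    head = path.read_bytes()[:8]
    label = next((name for magic, name in SIGNATURES if head.startswith(magic)),
                 "unknown")
    print(f"{path.name:15} {head.hex(' '):25} {label}")
```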
As a complement to the sites suggested by Greg and Dmitriy, there's also the repository of file formats at http://www.wotsit.org ("What's its format?").
If that doesn't help, a good hex editor (with dump display) is your friend... I've always found it amazing how easy it can be to read and recognize many file formats.
Could be anything. Best bet is to open it with a hex editor and see what you can see.
Most older systems used a basic ISAM, which had one file per table containing a set of fixed-length data records. The other files are probably indexes.
As you only need the data, not the indexes, just look for the files with repeating data patterns (they often look like pretty patterns on the hex editor screen).
When you find the file with the data, try to locate a known record, e.g. "Mr Smith", and see if you can work out the other fields. Integers are often stored byte for byte, dates are often encoded as days from a known start date, and money could be in BCD.
If you see a strong pattern, then most likely each record is a fixed length. There will probably be a header block on the file, say 128 or 256 bytes, and then the fixed-length records.
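That hypothesis is easy to test in code: for a few plausible header sizes, list the record lengths that divide the remainder of the file exactly. The header sizes and divisor range below are guesses to adjust, and the file name is hypothetical:

```python
import os

def candidate_record_lengths(path, header_sizes=(0, 128, 256, 512)):
    """For each guessed header size, list record lengths (16..4096) that
    divide the remaining file size exactly. The true record length is
    usually among the smaller divisors that repeat across similar files."""
    size = os.path.getsize(path)
    for header in header_sizes:
        body = size - header
        if body <= 0:
            continue
        divisors = [n for n in range(16, 4097) if body % n == 0]
        print(f"header {header:4}: {divisors[:12]}")

candidate_record_lengths("ORDERS.DAT")  # hypothetical file name
```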
Many old systems were written in COBOL. There is plenty of info on the net about COBOL formats, and some companies even sell COBOL ODBC drivers!
I think Greg is right about the .dbf file. You should try to find information about the other file formats using sites like http://filext.com and http://dotwhat.net. A .bak file is usually a copy of another file with the same name but a different extension; for example, there may be a database.dbf file and a database.bak file containing a backup of it. You should also ask your customer (if possible) for any details/documentation/source code of the application that used these files.
Back in the DOS days, programmers used to make up their own file extensions pretty much as they saw fit. The .DBF might well be a dBASE file, which is easy enough to read, and the .BAK is probably a backup of one of the other important files, or just a backup left by a text editor.
For the remaining files, the first thing I would do is check whether they are in a readable ASCII format by opening them in a text editor.
If this doesn't give you a good result, try opening them in a binary editor that shows hex and ASCII side by side with control characters blanked out. Look for repeating patterns that might correspond to record fields. For example, say the .HIS is something like an order history file; it might contain embedded product codes or names. If this is the case, count the number of bytes between such fields. If it is a regular number, you probably have a flat binary file of records. This is best decoded by opening the file in the app, looking for values in a given record, and searching for the corresponding values in the binary file. Time consuming, and a pain in the ass, but workable enough once you get the hang of it.
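If you'd rather stay in a scripting environment than a binary editor, the side-by-side view is easy to reproduce; a minimal hexdump sketch (the file name is hypothetical):

```python
def hexdump(path, length=256, width=16):
    """Print offset, hex bytes, and ASCII side by side for the first
    `length` bytes, with control characters shown as dots."""
    data = open(path, "rb").read(length)
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        hexed = " ".join(f"{b:02x}" for b in row)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        print(f"{offset:08x}  {hexed:<{width * 3}} {text}")

hexdump("ORDERS.HIS")  # hypothetical file name
```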
Happy hacking!
.DBF is a dBASE or early FoxPro database.
.DAT was used by Btrieve, and IIRC Paradox for DOS.
The .DBE and .00x files are probably either temporary or index files related to the .DAT files.
.DBF is easy. They'll open with MS Access or Excel (pre-2007 versions of Office, anyway), or with ADO or ODBC.
If the .DAT files are indeed Btrieve, you're in a world of hurt. They're a mess, even if you can get your hands on the right version of the data dictionary and a copy of the Btrieve structure. (Been there, done that, wore out the t-shirt before I got done.)
As others have suggested, I recommend a hex editor if you can't figure out what those files are, and that .dbf is probably dBASE.
BAK seems to be a backup file. I'm thinking that *.001, *.002, etc. might be part of the backup. Are they all the same size? Maybe the backup was broken up into smaller pieces so that it could fit onto removable media?
Finally, take this as a life lesson. Before sending that Statement of Work over, if the customer asks you to import data from System A to System B, always ask for a sample schema, sample data, and sample files. Lots of times, things that seem straightforward end up being nightmares.
Good luck!
Be sure to use the Modified dates on the files as clues: if the .001, .002, etc. all have similar time stamps, maybe along with the .BAK, they might be part of the backup. Also, there may be some old cruft in the directory that you can (somewhat safely) ignore. Look for .BAT files and try to dissect them as well.
One hint: if the .dbf files are from dBASE, FoxPro, or one of the other products that used that format, then you may be able to read them using ODBC. My system still has the ODBC driver for .dbf (Vista, with VS 2008; how it got there I'd have to hunt up, but I'd guess it was MDAC, Microsoft Data Access Components, which put it there). So you may not have a "world of unpicking" to do, if the ODBC driver will read the .dbf files.
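For example, with the pyodbc package and the dBASE ODBC driver. The driver name and DriverID below are the long-standing MDAC ones, but verify them in your ODBC Data Source Administrator; on 64-bit Windows the driver is often 32-bit only, so match your Python build. The directory and table name are hypothetical:

```python
# pip install pyodbc
import pyodbc

conn = pyodbc.connect(
    r"Driver={Microsoft dBase Driver (*.dbf)};DriverID=277;Dbq=C:\old_system_data;"
)
# Each .dbf in the Dbq directory is a table named after the file,
# so this reads CUSTOMER.dbf (a hypothetical file).
for row in conn.execute("SELECT * FROM CUSTOMER"):
    print(row)
```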
I seem to remember (with little confidence, from tinkering with dBASE III 20+ years ago) that dBASE used .001, .002, ... files for memo (big text) fields.
Good luck trying to salvage the data.
The DBF format is fairly common.
The other files are puzzling.
I'm guessing that either you're dealing with old Btrieve files (bad), or (hopefully) with the results of some ill-conceived backup scheme where someone backed up the database into the same directory rather than onto a separate drive, in which case you could ignore these.
It's now part of Pervasive, but years ago I used Data Junction to migrate data between lots of file types. Have a look, unless you want to write a parser.
.dat can also be old Clarion 2.1 files... it works on an ISAM basis as well, with key/index files.
A game that I play stores all of its data in a .DAT file. There has been some work done by people in examining the file. There are also some existing tools, but I'm not sure about their current state. I think it would be fun to poke around in the data myself, but I've never tried to examine a file, much less anything like this before.
Is there anything I should know about examining a file format for data extraction purposes before I dive headfirst into this?
EDIT: I would like very general tips, as examining file formats seems interesting. I would like to be able to take File X and learn how to approach the problem of learning about it.
You'll definitely want a hex editor before you get too far. It will let you see the raw data as numbers instead of as large empty blocks in whatever font Notepad (or whatever text editor) is using.
Try opening it in any archive extractors you have (zip, 7z, rar, gz, tar, etc.) to see if it's just a renamed archive format (.PK3 is something like that).
Look for headers of known file formats somewhere within the file, which will help you discover where certain parts of the data are stored (e.g. search for "IHDR", the first chunk tag of a PNG, to find any (uncompressed) png files embedded within; a sketch automating this search appears after these tips).
If you do find where a certain piece of data is stored, take a note of its location and length, and see if you can find numbers equal to either of those values near the beginning of the file, which usually act as pointers to the actual data.
Sometimes you just have to guess, or intuit, what a certain value means, and if you're wrong, well, keep moving. There's not much you can do about it.
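Here is that header-hunting sketch: scan the .DAT for a few well-known signatures and print every hit. Extend the table to whatever formats you suspect; the file name is hypothetical:

```python
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"PK\x03\x04": "zip local file header",
    b"\x1f\x8b": "gzip stream",
    b"OggS": "Ogg container",
}

def carve(path):
    """List every offset where a known signature occurs in the file."""
    data = open(path, "rb").read()
    for magic, name in SIGNATURES.items():
        start = 0
        while (hit := data.find(magic, start)) != -1:
            print(f"0x{hit:08x}  {name}")
            start = hit + 1

carve("game.dat")  # hypothetical file name
```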
I have found that http://www.wotsit.org is particularly useful for known file type formats, for help finding headers within the .dat file.
Back up the file first. Once you've restricted the amount of damage you can do, just poke around as Ed suggested.
Looking at your rep level, I guess a basic primer on hexadecimal numbers, endianness, representations for various data types, and all that would be a bit superfluous. A good tool that can show the data in hex is of course essential, as is the ability to write quick scripts to test complex assumptions about the data's structure. All of these should be obvious to you, but might perhaps help someone else so I thought I'd mention them.
One of the best ways to attack unknown file formats, when you have some control over the contents, is to take a differential approach. Save a file, make a small and controlled change, and save again. Do a binary compare of the files to find the difference, preferably using a tool that can detect inserts and deletions. If you're dealing with an encrypted file, a small change will trigger a massive difference. If it's just compressed, the difference will not be localized. And if the file format is trivial, a simple change in state will result in a simple change to the file.
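The differential approach in code: a sketch that lists where two same-size saves disagree (the file names are hypothetical). A handful of localized changes suggests a simple format; wholesale differences hint at compression or encryption, and a size change hints at inserts or deletes:

```python
def diff_offsets(path_a, path_b, limit=20):
    """Print the first `limit` offsets where two files disagree."""
    a = open(path_a, "rb").read()
    b = open(path_b, "rb").read()
    if len(a) != len(b):
        print(f"sizes differ: {len(a)} vs {len(b)} (insert/delete, or compressed)")
    hits = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    for off in hits[:limit]:
        print(f"0x{off:08x}: {a[off]:02x} -> {b[off]:02x}")
    print(f"{len(hits)} differing bytes in the common prefix")

diff_offsets("save_before.dat", "save_after.dat")  # hypothetical saves
```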
The other thing is to look at some of the common compression techniques, notably zip and gzip, and learn their "signatures". Most of these formats are "self identifying" so when they start decompressing, they can do quick sanity checks that what they're working on is in a format they understand.
Barring encryption, an archive file format is basically some kind of indexing mechanism (a directory of sorts) plus a way to locate those elements within the archive via pointers in the index.
With the ubiquity of the standard compression algorithms, it's mostly a matter of finding where those blocks start and trying to hunt down the index, or table of contents.
Some will have the index all in one spot (like a file system does); others will simply precede each element within the archive with its identity information. But in the end, somewhere there is information about offsets from one block to another, and information about data types (for example, if they're storing GIF files, GIFs have a signature as well), etc.
Those are the patterns that you're trying to hunt down within the file.
It would be nice if you could somehow get your hands on two versions of the data in the same format. For example, with a game, you might be able to get the initial version off the CD and a newer, patched version. These can really highlight the information you're looking for.