FAT deleted file recovery capability

I've been exploring File Allocation Table (FAT) recovery for the last couple of weeks. My goal is to locate a possibly deleted file by its signature (for example, a ZIP file by the bytes "50 4B 03 04") and recover the whole thing so I can search inside it.
I've found there's a problem with FAT: the file system uses the allocation table entries both to store cluster chains and to mark deleted files' clusters as free, which at first sight makes recovery impossible.
But there's plenty of recovery software advertising recovery of files deleted from FAT file systems, so I assume there must be a workaround.
I've found that we can successfully recover files stored contiguously on disk. The first cluster gives us an index, and the index address gives us a good chance of finding the directory entry where the file size is stored. But is that the end of it? I'd like to recover fragmented files as well, but I can't find a way.
Does anyone know a workaround and can help me out a bit, please?

The FAT file system uses a directory entry for each file and folder. It stores the starting cluster, file name, date, and size. To access a file, the system looks in the directory, finds the file, and notes the starting cluster. Then it goes to the FAT (file allocation table) entry that corresponds to the starting cluster. That entry contains the number of the next cluster; the next cluster's entry points to the one after that, and so on until you reach an end-of-file marker, which means you have found the last cluster used by the file.
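To make the chain walk concrete, here is a minimal C sketch, assuming the FAT32 table has already been loaded into memory as an array of 32-bit entries (the names and the loading step are mine, not from the answer above):

#include <stdint.h>
#include <stdio.h>

#define FAT32_EOC 0x0FFFFFF8u   /* entries >= this value mark end of chain */

/* Print every cluster of the file that starts at start_cluster. */
void walk_chain(const uint32_t *fat, uint32_t start_cluster)
{
    uint32_t c = start_cluster;
    while (c >= 2 && c < FAT32_EOC) {
        printf("cluster %u\n", c);
        c = fat[c] & 0x0FFFFFFFu;   /* only the low 28 bits are meaningful */
    }
}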
When you delete a file or folder, the system locates the directory it resides in, changes the first byte of the file or folder name in the directory entry to 0xE5 (the deleted-entry marker), and clears the file's FAT chain.
That is why, once a file is deleted, you can only recover contiguous files on a FAT file system. All data recovery utilities use this method; there is no other approach available unless you can find traces of the FAT with the correct cluster chains still in place.
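As an illustration of that method, here is a hedged C sketch of carving one contiguous deleted file, assuming the whole volume has been read into memory and that bytes_per_cluster and first_data_offset were taken from the boot sector (all names are mine; entry offsets are from the FAT32 spec):

#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define DELETED_MARK 0xE5u
#define ENTRY_SIZE   32

/* Read little-endian words from a directory entry. */
static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t le32(const uint8_t *p) { return le16(p) | ((uint32_t)le16(p + 2) << 16); }

/* Carve one deleted entry: returns bytes copied, or 0 if not recoverable. */
size_t carve_contiguous(const uint8_t *volume, size_t first_data_offset,
                        size_t bytes_per_cluster, const uint8_t *entry,
                        uint8_t *out, size_t out_cap)
{
    if (entry[0] != DELETED_MARK)           /* 0xE5 marks a deleted entry */
        return 0;
    uint32_t cluster = ((uint32_t)le16(entry + 20) << 16) | le16(entry + 26);
    uint32_t size    = le32(entry + 28);    /* file size at offset 0x1C   */
    if (cluster < 2 || size == 0 || size > out_cap)
        return 0;
    /* Data area starts at cluster 2; assume the file was contiguous. */
    size_t off = first_data_offset + (size_t)(cluster - 2) * bytes_per_cluster;
    memcpy(out, volume + off, size);
    return size;
}

Fragmented files defeat this sketch for exactly the reason given above: once the FAT chain is cleared, nothing on disk says where the next fragment lives.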

Related

Retrieve file count without walking through entire file system

I am on a VxWorks 6.9 platform. I want to know how many files are in a folder. The file system is DOSFS (FAT). The only way I know how to do this is simply to loop through every file in the folder and count. This gets very expensive as the number of files in the folder grows. Is there a more sensible way to do this? Does there exist some internal database or count of all files in a folder?
The FAT filesystem does not keep track of the number of files it contains. What it does contain is:
A boot sector
A filesystem information sector (on FAT32) including:
Last number of known free clusters
Number of the most recently allocated cluster
Two copies of the file allocation table
An area for the root directory (on FAT12 and FAT16)
Data clusters
You'll need to walk the directory tree to get a count.
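So the counting loop is unavoidable. A minimal sketch using the POSIX dirent API (which VxWorks' DOSFS exposes); names and error handling are my own:

#include <dirent.h>
#include <string.h>

/* Count the entries in a directory, excluding "." and "..".
 * Returns -1 if the directory cannot be opened. */
int count_files(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;
    int n = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0)
            n++;
    }
    closedir(dir);
    return n;
}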

Date in NLog file name and limit the number of log files

I'd like to achieve the following behaviour with NLog for rolling files:
1. prevent renaming or moving the file when starting a new file, and
2. limit the total number or size of old log files to avoid capacity issues over time
The first requirement can be achieved e.g. by adding a timestamp like ${shortdate} to the file name. Example:
logs\trace2017-10-27.log <-- today's log file to write
logs\trace2017-10-26.log
logs\trace2017-10-25.log
logs\trace2017-10-24.log <-- keep only the last 2 files, so delete this one
According to other posts it is however not possible to use date in the file name and archive parameters like maxArchiveFiles together. If I use maxArchiveFiles, I have to keep the log file name constant:
logs\trace.log <-- today's log file to write
logs\archive\trace2017-10-26.log
logs\archive\trace2017-10-25.log
logs\archive\trace2017-10-24.log <-- keep only the last 2 files, so delete this one
But in this case, every day on the first write, it moves yesterday's trace to the archive and starts a new file.
The reason I'd like to prevent moving the trace file is because we use Splunk log monitor that is watching the files in the log folder for updates, reads the new lines and feeds to Splunk.
My concern is that if I have an event written at 23:59:59.567, the next event at 00:00:00.002 clears the previous content before the log monitor is able to read it in that fraction of a second.
To be honest, I haven't tested this scenario, since it would be complicated to set up (my team doesn't own Splunk, etc.), so please correct me if this cannot happen.
Note also I know that it is possible to directly feed Splunk other ways like via network connection, but the current setup for Splunk at our company is reading from log files so it would be easier that way.
Any idea how to solve this with NLog?
When using NLog 4.4 (or older) you have to go into Halloween mode and do some trickery.
This example makes hourly log files in the same folder and ensures archive cleanup is performed after 840 hours (35 days):
fileName="${logDirectory}/Log.${date:format=yyyy-MM-dd-HH}.log"
archiveFileName="${logDirectory}/Log.{#}.log"
archiveDateFormat="yyyy-MM-dd-HH"
archiveNumbering="Date"
archiveEvery="Year"
maxArchiveFiles="840"
archiveFileName - Using {#} allows the archive cleanup to generate a proper file wildcard.
archiveDateFormat - Must match the ${date:format=} of the fileName (so remember to update both date formats if a change is needed).
archiveNumbering=Date - Configures the archive cleanup to support parsing of file names as dates.
archiveEvery=Year - Activates the archive cleanup, but also the archive file operation. Because the configured fileName already produces a new file every hour, we don't want any additional archive operations (e.g. this avoids generating extra empty files at midnight).
maxArchiveFiles - How many archive files to keep around.
With NLog 4.5 (still in beta), it will be a lot easier, as one just has to specify maxArchiveFiles. See also https://github.com/NLog/NLog/pull/1993

How to modify a single file inside a very large zip without re-writing the entire zip?

I have large zip files that contain huge files. There are "metadata" text files within the zip archives that need to be modified. However, it is not possible to extract the entire zip and re-compress it. I need to locate the target text file inside the zip, edit it, and possibly append the change to the zip file. The file name of the text file is always the same, so it can be hard-coded. Is this possible? Is there a better way?
There are two approaches. First, if you're just trying to avoid recompressing the entire zip file, you can use any existing zip utility to update a single file in the archive. This entails effectively copying the entire archive and creating a new one with the replaced entry, then deleting the old zip file. This does not recompress the data that isn't being replaced, so it should be relatively fast; roughly the time required to copy the zip archive.
If you want to avoid copying the entire zip file, then you can effectively delete the entry you want to replace by changing the name within the local and central headers in the zip file (keeping the name the same length) to a name that you won't use otherwise and that indicates that the file should be ignored. E.g. replacing the first character of the name with a tilde. Then you can append a new entry with the updated text file. This requires rewriting the central directory at the end of the zip file, which is pretty small.
(A suggestion in another answer to not refer to the unwanted entry in the central directory will not necessarily work, depending on the utility being used to read the zip file. Some utilities will read the local headers for the zip file entry information, and ignore the central directory. Other utilities will do the opposite. So the local and central entry information should be kept in sync.)
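A minimal sketch of the in-place rename described above, assuming the byte offset of the entry's local header is already known and the file has been read or mapped into buf (header offsets are from the public .zip format spec, APPNOTE.TXT; the tilde convention is just the example given above):

#include <stdint.h>
#include <stddef.h>

/* Overwrite the first character of the entry name in a local file header
 * with '~', marking the entry as one to ignore. The same change must also
 * be made to the matching central directory header, as noted above. */
int hide_local_entry(uint8_t *buf, size_t local_header_off)
{
    uint8_t *h = buf + local_header_off;
    /* local file header signature: PK\x03\x04 (0x04034b50) */
    if (!(h[0] == 0x50 && h[1] == 0x4B && h[2] == 0x03 && h[3] == 0x04))
        return -1;
    uint16_t name_len = (uint16_t)(h[26] | (h[27] << 8)); /* name length at offset 26 */
    if (name_len == 0)
        return -1;
    h[30] = '~';                                          /* name starts at offset 30 */
    return 0;
}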
There are "metadata" text files within the zip archives that need to be modified.
However, it is not possible to extract the entire zip and re-compress it.
This is a good lesson in why, when dealing with huge datasets, keeping the metadata in the same place as the data is a bad idea.
The .zip file format isn't particularly complicated, and it is definitely possible to replace something inside it. The problem is that the size of the new data might increase, and not fit anymore into the location of the old data. Thus there is no standard routine or tool to accomplish that.
If you are skilled enough, you can theoretically write your own zip handling functions to provide the "file replace" routine. If it is only the (smallish) metadata, you do not even need to compress it. The .zip "central directory" is located at the end of the file, after the compressed data (the format was optimized for appending new files). The general concept is: read the "central directory" into memory, append the new modified file after the compressed data, update the central directory in memory with the new file offset of the modified file, and write the central directory back after the modified file. (The old file would still be sitting somewhere inside the .zip, but would no longer be referenced by the "central directory".) All the operations happen at the end of the file, without touching the rest of the archive's content.
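The first step of that concept, finding the central directory, is done by scanning backwards for the end-of-central-directory (EOCD) signature. A hedged C sketch with names of my own (offsets per APPNOTE.TXT):

#include <stdint.h>

/* Return the offset of the EOCD record in buf, or -1 if not found.
 * buf holds the tail of the .zip file; the EOCD lies within its last
 * 65557 bytes (22-byte fixed part + up to 65535 bytes of comment). */
long find_eocd(const uint8_t *buf, long len)
{
    for (long i = len - 22; i >= 0; i--) {
        if (buf[i] == 0x50 && buf[i+1] == 0x4B &&
            buf[i+2] == 0x05 && buf[i+3] == 0x06)
            return i;   /* signature PK\x05\x06 (0x06054b50) */
    }
    return -1;
}

From the EOCD you can read the entry count (offset 10) and the central directory's size and start offset (offsets 12 and 16), which is everything needed to load the directory into memory, patch it, and append it back.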
But practically speaking, I would recommend simply keeping the data and the metadata separate.

How to detect end of directory in the data area of FAT32?

I am working with the disk file directly. Since the size of a directory is 0 in its directory entry, I wonder how to detect the end of a directory file on the disk.
DIR_Name[0] == 0x00
The above way to detect the end of a directory doesn't seem reliable to me. I found on a wiki that the size of the root directory in FAT32 is fixed at 512 entries, but what about other subdirectories? I might need to traverse down directories using the FAT and the cluster number.
From the first Google search result for "fat32 on disk format", page 24:
When a directory is created (a file with the ATTR_DIRECTORY bit set in its DIR_Attr field), you set its DIR_FileSize to 0. DIR_FileSize is not used and is always 0 on a file with the ATTR_DIRECTORY attribute (directories are sized by simply following their cluster chains to the EOC mark).
Also: The FAT32 root directory size is not fixed at 512 entries; its size is determined in exactly the same way as any other directory.
From another reliable source:
Reading Directories
The first step in reading directories is finding and reading the root directory. On FAT 12 or FAT 16 volumes the root directory is at a fixed position immediately after the file allocation tables:
first_root_dir_sector = first_data_sector - root_dir_sectors;
In FAT32, the root directory appears in the data area at a given cluster and can be a cluster chain:
root_cluster_32 = extBS_32->root_cluster;
A non-root directory is just a file.
The root directory starts at a fixed place on the disk (following the FAT) on FAT12/16, and at the cluster named in the boot sector on FAT32. An entry in the root directory contains a cluster number. That cluster contains the data of the file or directory. The entry for that cluster number in the FAT, i.e. FAT[cluster_number], contains the number of the next cluster that belongs to the file or directory. That cluster contains more data of the file or directory, and its FAT entry gives the next cluster, etcetera, until you encounter the end-of-cluster-chain mark: a value (after masking off the top four bits) equal to or greater than 0x0FFFFFF8.
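Putting the two rules together (follow the cluster chain to the EOC mark; within the clusters, a first name byte of 0x00 ends the used entries), here is a hedged C sketch of enumerating a FAT32 directory; cluster loading is left as a hypothetical read_cluster() helper:

#include <stdint.h>
#include <stddef.h>

#define FAT32_EOC  0x0FFFFFF8u
#define ENTRY_SIZE 32

/* Hypothetical helper: read the given cluster's bytes into buf. */
extern void read_cluster(uint32_t cluster, uint8_t *buf);

/* Visit every used entry of the directory starting at dir_cluster. */
void scan_dir(const uint32_t *fat, uint32_t dir_cluster,
              size_t bytes_per_cluster, uint8_t *buf)
{
    uint32_t c = dir_cluster;
    while (c >= 2 && c < FAT32_EOC) {
        read_cluster(c, buf);
        for (size_t off = 0; off + ENTRY_SIZE <= bytes_per_cluster; off += ENTRY_SIZE) {
            uint8_t first = buf[off];
            if (first == 0x00)     /* no entries follow: end of directory */
                return;
            if (first == 0xE5)     /* deleted entry: skip it */
                continue;
            /* process the 32-byte entry at buf + off here */
        }
        c = fat[c] & 0x0FFFFFFFu;  /* next cluster in the chain */
    }
}

So DIR_Name[0] == 0x00 marks the end of the used entries within the chain, and the chain's EOC mark bounds the directory as a whole.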

Filesystem links on a FAT32 formatted storage

I know that FAT32, as well as FAT16/12, supports neither symbolic links nor hard links. However, I came up with this idea:
The FAT specification describes that every file is associated with a directory entry. In my understanding, one could say that a file's entry in a directory points, one way or another, to the file's content.
So, how can I define two directory entries which point to the same file content? And what could prevent me from doing so?
Use case: I have a USB mass storage device for my car radio, and I want to use directories as playlists since the radio software doesn't support playlists. So it isn't important to me how Windows behaves when doing this.
This should work for simple cases, i.e. as a hack/workaround. I don't know what happens if you rename, move, or remove files, so you should not do this on your main HDD.
I edited the directory entries manually using a hex editor. I modified the clusters as well as the file sizes and successfully faked hard links. My car radio and even Windows (7, 64-bit) have no problems playing back the original and "hard-linked" MP3 files I used.
When I open the device again in the hex editor, none of my modifications have been reverted (see the chkdsk issue in the other answer; but as far as I know, chkdsk has to be started manually anyway).
What you are talking about ("two directory entries which point to the same file content") are hard links. chkdsk will report them as cross-links and break them, "repairing" the files (in fact making copies).
MichaelPh posted instructions on SuperUser:
https://superuser.com/a/486829/51237
It's possible to use Disk Probe (on XP only; I've yet to get it to write the changes on Win7) to modify the cluster a FAT directory entry references. This method can be used to redirect the DCIM folder (or a subfolder) to point to the folder used by a different scan device.
Whether this is a good idea or not is a different matter and you use this at your own risk.
1. Insert the Eye-Fi card either in its USB card reader or directly into an SD slot, and note the drive letter it's mounted as (assumed to be F:\ for simplicity).
2. Ensure all Windows Explorer windows for the card and its sub-directories are closed.
3. Run Disk Probe.
4. Select Drives->Logical Volume.
5. In the Open Logical Volume dialog, double-click F:\ in the Logical Volumes list.
6. Click the Set Active button for the handle F: has been selected as. You can leave the handle as read-only for now.
7. Select Tools->Search Sectors...
8. Check Exhaustive Search, enter DCIM in "Enter characters to search for", and Search.
9. You should find a match (mine is at 8192). Select No on the "Found match..." dialog to cancel the rest of the search.
10. Select Sectors->Read and increase Number of Sectors to at least two so that the whole directory table is included.
11. Find DCIM in the ASCII pane on the right of the Disk Probe screen; this is the start of the directory entry for the folder. Make a note of the hex value of the 27th byte of the record (each entry is 32 bytes); this is the directory's cluster reference. This value is required to revert the DCIM directory back to normal use later.
12. Find the entry for the directory you want to redirect DCIM to, and again make a note of the 27th byte in the record.
13. Go back to the 27th byte of the DCIM record and change it to the value noted in the previous step.
14. Select Sectors->Write and then click "Write it" on the Write Sector dialog. A warning will come up if you opened the sectors as read-only; choose Yes to overwrite if you're happy to make the change.
Opening the DCIM directory in Windows Explorer will now show the contents of the target directory.
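For reference, here is a sketch of the 32-byte directory entry those steps edit by hand, with field offsets from the FAT specification (the field names are my own). The "27th byte" above is the high byte of the low cluster word at offset 0x1A; on FAT32 the high word at offset 0x14 belongs to the same cluster number:

#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint8_t  name[11];        /* 0x00: 8.3 name; 0xE5 here = deleted entry */
    uint8_t  attr;            /* 0x0B: ATTR_DIRECTORY = 0x10               */
    uint8_t  nt_res;          /* 0x0C: reserved                            */
    uint8_t  crt_time_tenth;  /* 0x0D: creation time, tenths of a second   */
    uint16_t crt_time;        /* 0x0E: creation time                       */
    uint16_t crt_date;        /* 0x10: creation date                       */
    uint16_t lst_acc_date;    /* 0x12: last access date                    */
    uint16_t fst_clus_hi;     /* 0x14: first cluster, high word (FAT32)    */
    uint16_t wrt_time;        /* 0x16: last write time                     */
    uint16_t wrt_date;        /* 0x18: last write date                     */
    uint16_t fst_clus_lo;     /* 0x1A: first cluster, low word             */
    uint32_t file_size;       /* 0x1C: size in bytes; 0 for directories    */
} fat_dirent_t;
#pragma pack(pop)

Pointing two entries' cluster fields at the same first cluster is exactly the faked hard link described above; chkdsk sees it as a cross-link because the FAT chain is shared.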
