Analyze VMDK (VMware virtual machine disk) files for changes - filesystems

Is there a good way to analyze VMware delta VMDK files between snapshots to list changed blocks, so one can use a tool to tell which NTFS files are changed?

I do not know of a tool that does this out of the box, but it should not be too difficult to build one.
The VMDK file format specification is publicly available and the format is not that complex. As far as I remember, a VMDK file consists of a large number of 64 KB blocks (grains). At the beginning of the VMDK file there is a directory that records where each logical block is stored in the physical file.
It should be pretty easy to detect whether a logical block is stored in both files and then compare the data in the two versions of the VMDK file.
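For what it's worth, here is a minimal sketch in C of the first half of that job: it lists which grains (64 KB blocks by default) a hosted sparse/delta VMDK actually contains, based on my reading of VMware's published Virtual Disk Format specification. The header layout, the little-endian encoding and the convention that a grain table entry of 0 means "grain not present in this extent" are taken from that spec and should be double-checked against the VMDK version you have; error handling is kept minimal.

    #define _FILE_OFFSET_BITS 64        /* 64-bit file offsets on glibc */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* On-disk header of a hosted sparse extent, fields little-endian,
     * per the Virtual Disk Format spec (only the fields used here). */
    #pragma pack(push, 1)
    typedef struct {
        uint32_t magicNumber;        /* ASCII "KDMV" */
        uint32_t version;
        uint32_t flags;
        uint64_t capacity;           /* virtual disk size, in 512-byte sectors */
        uint64_t grainSize;          /* grain size in sectors, typically 128 = 64 KB */
        uint64_t descriptorOffset;
        uint64_t descriptorSize;
        uint32_t numGTEsPerGT;       /* entries per grain table, usually 512 */
        uint64_t rgdOffset;
        uint64_t gdOffset;           /* grain directory location, in sectors */
        uint64_t overHead;
    } SparseExtentHeader;
    #pragma pack(pop)

    int main(int argc, char **argv)
    {
        if (argc != 2) { fprintf(stderr, "usage: %s delta.vmdk\n", argv[0]); return 1; }
        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }

        SparseExtentHeader h;
        if (fread(&h, sizeof h, 1, f) != 1 || h.magicNumber != 0x564d444bu) {
            fprintf(stderr, "not a hosted sparse extent\n"); return 1;
        }
        /* Note: streamOptimized images put the grain directory at the end of
         * the file and need extra handling; this sketch ignores that case. */

        uint64_t gtesPerGT    = h.numGTEsPerGT;
        uint64_t sectorsPerGT = h.grainSize * gtesPerGT;
        uint64_t numGTs       = (h.capacity + sectorsPerGT - 1) / sectorsPerGT;

        uint32_t *gd = malloc(numGTs * sizeof *gd);
        uint32_t *gt = malloc(gtesPerGT * sizeof *gt);
        if (!gd || !gt) { fprintf(stderr, "out of memory\n"); return 1; }

        fseeko(f, (off_t)h.gdOffset * 512, SEEK_SET);          /* _fseeki64 on Windows */
        if (fread(gd, sizeof *gd, numGTs, f) != numGTs) { perror("read GD"); return 1; }

        for (uint64_t i = 0; i < numGTs; i++) {
            if (gd[i] == 0) continue;                          /* no grains under this table */
            fseeko(f, (off_t)gd[i] * 512, SEEK_SET);
            if (fread(gt, sizeof *gt, gtesPerGT, f) != gtesPerGT) { perror("read GT"); return 1; }

            for (uint64_t j = 0; j < gtesPerGT; j++) {
                if (gt[j] == 0) continue;                      /* grain not present in this delta */
                uint64_t lba = (i * gtesPerGT + j) * h.grainSize;   /* guest LBA in sectors */
                printf("changed grain: guest offset %llu, length %llu bytes\n",
                       (unsigned long long)(lba * 512),
                       (unsigned long long)(h.grainSize * 512));
            }
        }
        free(gd); free(gt); fclose(f);
        return 0;
    }

Every grain reported for a snapshot's delta file is a block that was written after the snapshot was taken; mapping those byte ranges back to files is then a matter of finding which NTFS clusters and MFT records they overlap, for example with an NTFS-aware forensic tool.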

Related

Scan a USB drive for folders which have MP3 files in them using ELM ChaN FatFs?

I am trying to scan a USB MSC (mass storage) device on an STM32 for audio files. The MP3 files are scattered across many folders which are unknown to the application.
First I scan for directories in the root folder, and then I scan each folder I find for MP3 files.
This is very time consuming, especially for a depth of 8 folders with many files in each folder.
Is there any way to scan just the folders which have MP3 files in them, or otherwise a better approach?
Directory structure for testing is something like this:
It is not clear what your problem might be, since you have not provided any code showing how you are scanning, any quantitative information on the file/folder structure ("many files" is rather vague), or even the media type used.
However, a solution that might overcome all the variables of filesystem performance, hardware I/O, driver implementation and media type, and make access more deterministic regardless, is to maintain a separate index file or database in a single file in the root directory that maps each MP3 file to its path. You then only need to search the index/database for the MP3 you want (or use it to list all MP3s directly without scanning the file system).
If you keep that file sorted (or keep a separate sorted index file), you can use a binary search to find a specific file. Or simply use a real database, though that might be a rather heavyweight solution for this purpose. You might even load the metadata into memory for even faster access, and write it back to the filesystem only when it changes.
Either way, the solution I suggest is to isolate your application from the variability of the filesystem/media and from the lack of scalability of FAT in general by maintaining your own "metadata" file(s) indicating what is stored and where. You can then use that to access the files directly, without scanning the file system with findfirst/findnext semantics and recursion, which is best avoided but is the obvious way to scan a directory tree; a sketch of building such an index follows below.
Incidentally, this is precisely how iTunes works, for example. The "iTunes Library.xml" file contains metadata about "songs", including their location. Clearly you need not have anything quite that detailed, but the principle is the same, and there may be merit in using XML or JSON for your application, given a suitable library for updating and accessing such a file.
By doing that, the performance is more directly within your control rather than dependent on the filesystem, media and/or device driver level. However, you still have some control over (and responsibility for) the media and its interface (SPI, SDIO, USB or whatever) and the device I/O layer (DMA, interrupts, polled, bit-banged), and while you may have little control over the choice of FAT and the ELM FatFs implementation, you can certainly affect its performance greatly at the device driver, hardware interface and physical media level.
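To make the index idea concrete, here is a sketch of the one-time scan that builds such an index using standard FatFs calls (f_opendir, f_readdir, f_open, f_puts). The index file name, path buffer size and one-path-per-line format are arbitrary choices for the sketch, and f_puts requires FF_USE_STRFUNC to be enabled in ffconf.h.

    #include <stdio.h>          /* snprintf */
    #include <string.h>
    #include "ff.h"             /* ELM ChaN FatFs */

    #define INDEX_FILE "0:/mp3.idx"      /* arbitrary name for this sketch */

    /* Case-insensitive check for a ".mp3" suffix (works with 8.3 and LFN names). */
    static int has_mp3_ext(const char *name)
    {
        size_t n = strlen(name);
        if (n < 5) return 0;
        const char *e = &name[n - 4];
        return e[0] == '.' &&
               (e[1] == 'm' || e[1] == 'M') &&
               (e[2] == 'p' || e[2] == 'P') &&
               e[3] == '3';
    }

    /* Walk 'path' recursively and append every MP3 path to the open index file.
     * Recursion depth equals directory depth (8 here), so stack use stays small. */
    static FRESULT index_mp3s(char *path, size_t pathsize, FIL *idx)
    {
        DIR dir;
        FILINFO fno;
        FRESULT res = f_opendir(&dir, path);
        if (res != FR_OK) return res;

        for (;;) {
            res = f_readdir(&dir, &fno);
            if (res != FR_OK || fno.fname[0] == 0) break;     /* error or end of directory */

            size_t len = strlen(path);
            snprintf(&path[len], pathsize - len, "/%s", fno.fname);

            if (fno.fattrib & AM_DIR) {
                res = index_mp3s(path, pathsize, idx);        /* descend into subfolder */
                if (res != FR_OK) break;
            } else if (has_mp3_ext(fno.fname)) {
                f_puts(path, idx);                            /* one full path per line */
                f_puts("\n", idx);
            }
            path[len] = 0;                                    /* restore the parent path */
        }
        f_closedir(&dir);
        return res;
    }

    /* Build (or rebuild) the index; call only when the media content changes. */
    FRESULT build_mp3_index(void)
    {
        static char path[256] = "0:";                         /* working path buffer */
        FIL idx;
        FRESULT res = f_open(&idx, INDEX_FILE, FA_WRITE | FA_CREATE_ALWAYS);
        if (res != FR_OK) return res;
        res = index_mp3s(path, sizeof path, &idx);
        f_close(&idx);
        return res;
    }

At start-up the application then reads mp3.idx (or a sorted copy of it) instead of walking the directory tree, and the index is rebuilt only when the media content changes.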

HSQL data file sparse?

I have a situation where my HSQL database has stopped working, citing that it has exceeded the size limit:
org.hsqldb.HsqlException: error in script file line: 41 file input/output errorerror org.hsqldb.HsqlException: wrong database file version: requires large database support opening file - file hsql/mydb.data;
When I checked the size of the data file using "du -sh" it was only around 730 MB, but "ls -alh" gave me a shocking size of 16 GB, which explains why HSQL probably reports it as a 'large database'. So the data file seems to be a sparse file.
But nobody changed the data file into a sparse file. Does HSQL maintain the data file as a sparse file, or has the file system marked it as sparse?
How do I work around this to get back my HSQL database without corrupting the data in it? I was thinking of using the hsqldb.cache_file_scale property, but that would still mean I would hit the problem again when the file grows to 64 GB.
Just in case it matters, I am running this on a Debian 3.2 box with Java 7u25.
You need to perform CHECKPOINT DEFRAG to compact the file from time to time.
When a lot of data is deleted, the space in the .data file is lost. The above command rewrites the current .data file to a much smaller new file.
If the file has already grown very large, or if you need to have a huge database, you can connect with the hsqldb.large_data=true property, which enables very large databases.
http://hsqldb.org/doc/2.0/guide/dbproperties-chapt.html

Where filesystems store their file metadata

So I know that a file is composed of its data and also metadata, which is information about it (usually the name, the type of the file, dates of creation and modification, etc.).
My question is where exactly that information is stored. I know it can be kept inside the file, in the directory, or in a database, but for the Windows, Linux and macOS file systems I can't seem to find this information...
Most of this information is proprietary in the case of Windows and Mac.
For Windows I can say for certain that a close-enough version of the NTFS file system driver has been written for Linux. You can look into that; there is also some documentation, most of it written by Richard 'Flatcap' Russon (http://www.flatcap.org/ntfs/).
Documentation on the FAT Filesystem has been made public a long time ago with the intent that it would provide ample information for developers and engineers working on flash drives and things like that. (http://msdn.microsoft.com/en-us/library/windows/hardware/gg463080.aspx)
Documentation on the Ext Filesystem used by Linux distributions can be found on the web easily. (Ext2 : http://www.nongnu.org/ext2-doc/ext2.html)
I have no clue what Mac uses but I bet it's some kind of proprietary abomination derived from an existing format (probably ext). This is just my opinion, do not take this as fact.
All these formats have some sort of structure that holds meta-data. The file is just a stream of bytes somewhere on the physical drive. Most filesystems should have a structure that stores at least the file's location (usually a starting cluster for each fragment of the file) and the file's size. The rest of meta-data is up to each file system to implement.
For example, in the FAT filesystem there are tables for each directory, and each directory table stores the metadata about the files it contains. The filesystem also has the FAT (file allocation table) itself, which holds the fragment (cluster) locations for each file in the filesystem; the on-disk directory entry is sketched below.
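For illustration, this is the 32-byte short-name directory entry as it appears in those per-directory tables, following the Microsoft FAT specification linked above (long-file-name entries use a different layout); treat it as a reading aid rather than production code.

    #include <stdint.h>

    /* 32-byte FAT short-name directory entry, per the Microsoft FAT spec. */
    #pragma pack(push, 1)
    typedef struct {
        uint8_t  DIR_Name[11];      /* 8.3 name, space padded, no dot */
        uint8_t  DIR_Attr;          /* ATTR_DIRECTORY, ATTR_ARCHIVE, ... */
        uint8_t  DIR_NTRes;         /* reserved */
        uint8_t  DIR_CrtTimeTenth;  /* creation time, 10 ms units */
        uint16_t DIR_CrtTime;       /* creation time (2 s resolution) */
        uint16_t DIR_CrtDate;       /* creation date */
        uint16_t DIR_LstAccDate;    /* last access date */
        uint16_t DIR_FstClusHI;     /* high 16 bits of first cluster (FAT32) */
        uint16_t DIR_WrtTime;       /* last write time */
        uint16_t DIR_WrtDate;       /* last write date */
        uint16_t DIR_FstClusLO;     /* low 16 bits of first cluster */
        uint32_t DIR_FileSize;      /* file size in bytes */
    } FatDirEntry;
    #pragma pack(pop)

    _Static_assert(sizeof(FatDirEntry) == 32, "directory entries are 32 bytes");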
The NTFS filesystem has a big table called the Master File Table (MFT) that holds a record of metadata for each file in the filesystem, including the table itself. Each record holds all of the file's metadata, including the location on the physical drive of each fragment.
The directory structure itself is held as data inside directory file records. NTFS also has further structures that hold information about files, such as the USN Journal and the Volume Bitmap.
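To give an idea of what an MFT record looks like, here is a sketch of the FILE record header based on the reverse-engineered layout documented in the flatcap/ntfs-3g material mentioned above. The actual metadata (timestamps, names, data runs) lives in the attributes that follow this header, such as $STANDARD_INFORMATION (type 0x10), $FILE_NAME (0x30) and $DATA (0x80); the exact field list should be checked against that documentation.

    #include <stdint.h>

    /* Header of an NTFS MFT FILE record, per the reverse-engineered layout
     * documented by the flatcap/ntfs-3g projects. A record (usually 1 KB)
     * starts with this header and is followed by a list of attributes that
     * carry the actual metadata. */
    #pragma pack(push, 1)
    typedef struct {
        uint8_t  magic[4];           /* "FILE" */
        uint16_t usa_ofs;            /* offset of the update sequence array */
        uint16_t usa_count;          /* entries in the update sequence array */
        uint64_t lsn;                /* $LogFile sequence number */
        uint16_t sequence_number;    /* bumped each time the record is reused */
        uint16_t link_count;         /* number of hard links */
        uint16_t attrs_offset;       /* offset of the first attribute */
        uint16_t flags;              /* 0x01 = in use, 0x02 = directory */
        uint32_t bytes_in_use;       /* used size of this record */
        uint32_t bytes_allocated;    /* allocated size (usually 1024) */
        uint64_t base_mft_record;    /* base record for extension records */
        uint16_t next_attr_instance;
        /* NTFS 3.1+ also stores the record number after this point */
    } MftRecordHeader;
    #pragma pack(pop)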
To access the metadata held by the file system you either have to parse the raw volume or use functions exposed by the operating system's API. The API generally doesn't give you all the information you might want about the metadata. For example, the Windows API gives you functions to iterate through the USN Journal to find information about a particular file, but you can't get the MFT attributes of a file directly; a sketch of that approach follows below.
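As an illustration of that API route, the sketch below lists file names from the MFT through the change-journal interface (FSCTL_QUERY_USN_JOURNAL and FSCTL_ENUM_USN_DATA via DeviceIoControl). It needs administrator rights, the volume letter and buffer size are arbitrary choices, and error handling is minimal; treat it as a starting point, not a reference implementation.

    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        /* Volume letter is hard-coded for the sketch; needs admin rights. */
        HANDLE vol = CreateFileW(L"\\\\.\\C:", GENERIC_READ,
                                 FILE_SHARE_READ | FILE_SHARE_WRITE,
                                 NULL, OPEN_EXISTING, 0, NULL);
        if (vol == INVALID_HANDLE_VALUE) { fprintf(stderr, "open failed\n"); return 1; }

        USN_JOURNAL_DATA jd;
        DWORD bytes;
        if (!DeviceIoControl(vol, FSCTL_QUERY_USN_JOURNAL, NULL, 0,
                             &jd, sizeof jd, &bytes, NULL)) {
            fprintf(stderr, "volume has no USN journal\n"); CloseHandle(vol); return 1;
        }

        /* MFT_ENUM_DATA_V0 is plain MFT_ENUM_DATA in older SDKs. */
        MFT_ENUM_DATA_V0 med = { 0, 0, jd.NextUsn };
        DWORDLONG buf[8192];                      /* 64 KB, 8-byte aligned */

        while (DeviceIoControl(vol, FSCTL_ENUM_USN_DATA, &med, sizeof med,
                               buf, sizeof buf, &bytes, NULL)) {
            /* Output: a DWORDLONG (next file reference number) + USN_RECORDs. */
            PUSN_RECORD rec = (PUSN_RECORD)((BYTE *)buf + sizeof(DWORDLONG));
            DWORD left = bytes - sizeof(DWORDLONG);
            while (left > 0) {
                wprintf(L"%.*s\n", (int)(rec->FileNameLength / sizeof(WCHAR)),
                        (WCHAR *)((BYTE *)rec + rec->FileNameOffset));
                left -= rec->RecordLength;
                rec = (PUSN_RECORD)((BYTE *)rec + rec->RecordLength);
            }
            med.StartFileReferenceNumber = buf[0];   /* continue from next FRN */
        }
        CloseHandle(vol);
        return 0;
    }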
Again, I have to stress that even with most of the documentation on these proprietary file systems, you're taking shots in the dark, since it's their intellectual property. Some, if not most, of the documentation that we have now comes from reverse engineering.
It depends on the filesystem. Take a look at http://lxr.free-electrons.com/source/fs/fat/fat.h for instance.
The Wikipedia comparison of file systems (the "Metadata" section) shows details on which file systems store which information in their metadata.

Adding Capability to NFS Server - Compressing/Decompressing Stored/Retrieved Files

I need to build a custom SUSE Linux NFS server that compresses certain files as they are stored on disk and decompresses them as they are read from disk. This needs to be transparent to the remote users of the file system, meaning that if a user saves a 10 MB file named XYZZY.tif to /archiveDirectoryOnNFSServer, then when they do an ls -l on that mounted directory they will see a 10 MB file called XYZZY.tif, even though the actual file stored on the disk of the NFS server will be XYZZY.tif.compressed and will be 2 MB in size.
I'm expecting that I need to build this as a driver that sits below the NFS server software stack, but I'm having difficulty finding where to start. Are there existing NFS servers that provide this level of customization through APIs? Will I need to modify the source of an open source NFS server, and if so, is there one that would be easiest to start with, and is it modularly structured such that this will be straightforward? I'm having difficulty locating relevant content on the internet, and any pointers will be greatly appreciated.
IMO that kind of functionality is absolutely not the NFS server's responsibility (an NFS server should, well, serve files over NFS) but the underlying filesystem's. However, there's not that much choice in Linux land; you could start by checking out fusecompress and btrfs.
This post is a bit old, so you may already be aware of some options here, but there are a couple of others (both for the server side).
http://zfsonlinux.org/
The ZFS filesystem has built-in compression. I typically use lzjb, as it is the fastest compression algorithm and does a reasonable job (MySQL DBs get 2-4x compression, filesystems with non-compressed data get around 4x). You have a choice of algorithm depending on how much CPU time you wish to spend on compression.
If you want different file types compressed differently, then you may consider layering Gluster on top of a set of ZFS filesystems.
Gluster will allow you to store certain file types (by extension) on different underlying filesystems.
In this case you specify the underlying filesystem as a ZFS volume with the particular options you need. For example, .zip and .png files go on an uncompressed filesystem, while things you write once and read many times, like static HTML files, might go on a higher compression setting: you pay once when the file is written, but reads should be really fast since they scan fewer disk blocks and decompression is very fast.
ZFS will manage the NFS mounts if you use it as your NFS server; you won't want this if you layer Gluster on top.
It's easy to specify other attributes dynamically per filesystem (atime/noatime, the number of copies if you want redundancy beyond your normal RAID; you can also add SSDs as cache devices to get more performance).
In these solutions you still send the full uncompressed files over the wire, so they don't help with network performance, but they give you a lot of options if you're trying to speed up disk I/O or get more utilization out of your drives.

How are access times, modified times, encodings and filenames stored in files on NTFS and EXT3/4?

For academic and task-related purposes I need to know how file-related data is associated with files on NTFS and ext. How does the operating system know a file's name? How do editors know which encoding to use for the file's contents?
Are these details stored in a separate location on NTFS/ext, or are they included within the file itself?
On NTFS such information is stored not in the file itself but in the master file table (MFT).
You are asking many questions. I suggest you read up on the subject. Here is the short version, and here is everything in full detail.
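For completeness, here is a minimal POSIX C sketch showing how a program reads that metadata on Linux without touching the file's contents: the kernel fetches it from the filesystem's own structures (the inode on ext3/4, or the MFT record via the NTFS driver) and returns it through stat(2). The file path is just a placeholder.

    #include <stdio.h>
    #include <sys/stat.h>
    #include <time.h>

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "example.txt";   /* placeholder path */
        struct stat st;
        if (stat(path, &st) != 0) { perror("stat"); return 1; }

        /* None of this comes from the file's data stream: the kernel reads it
         * from the filesystem's own structures (the inode on ext3/4, the MFT
         * record on NTFS) and returns it through the stat interface. */
        printf("size     : %lld bytes\n", (long long)st.st_size);
        printf("accessed : %s", ctime(&st.st_atime));
        printf("modified : %s", ctime(&st.st_mtime));
        printf("changed  : %s", ctime(&st.st_ctime));
        return 0;
    }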
