Weird directory entries in FAT file system - filesystems

So I'm trying to figure out how the FAT FS works and got confused by the root directory table. I have two files in the partition: test.txt and innit.eh which results in the following table:
The entries starting with 0xE5 are deleted, so I assume these were created due to renaming. The entries for the actual files look like this:
TEST TXT *snip*
INNIT EH *snip*
What I don't understand is where the entries like
At.e.s.t......t.x.t
Ai.n.n.i.t.....e.h.
are coming from and what are they for. They do not start with 0xE5, so should be treated as existing files.
By the way, I'm using Debian Linux to create filesystems and files, but I noticed similar behaviour on FS and files created on Windows.

The ASCII parts of the name (where the letters were close to each other) is the legacy 8.3 DOS shortname. You see it only uses capital letters. In DOS, only these would be there.
The longer parts (with 0x00 in between) is the long name (shown in Windows) which is Unicode, and uses 16bits per character.

The intervening bytes are all 0x00, which gives a strong feeling that they are stored in UTF-16 instead of UTF-8. Perhaps they are there as an extension similar to the other VFAT extensions for long filenames?

Related

Is there a tool to search differences in hex files?

I have two hex files, and want to search for values that have changed from 9C in one file to 9D in the next one.
Even a tool that synchronizes the windows would work, as I can't seem to find any that enable me to do this. All I've found are tools that let me search for the same value and same address between two files.
Thanks!
"There's no such thing as a hex file."
Oh well perhaps I should delete all my hex files.
I use Motorola and Intel formats of hex files (they are just text files representing bytes, but include a record structure for addresses, checksum etc per line. These files need to be treated as binary (in .gitattributes) since they cannot be merged. I use Kdiff3 to see differences (or any other text diff tool).

Reading in the output of a program, saving it as a string, and using that string in the original program

I am using trying to get specific information from a group of MP3 files, currently I am in the main cygwin64 that holds MP3 files and a .C file which simply contains
FILE * fp;
It contains that single line of code because when that line of code is in place and I type and run "thing.c" in the cygwin command line it outputs what seems the be the information of the contents of the folder. For example it outputs,
home: sticky, directory
lib: directory
sbin: directory
setup-x86_64.exe: PE32+ executable (GUI) x86-64 (stripped to external PDB), for MS Windows
song.mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1, 128 kbps, 44.1 kHz, JntStereo
song1.mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1, 128 kbps, 44.1 kHz, JntStereo
thing.c: ASCII text, with CRLF line terminators
thing.txt: empty
What I want to do is be able to pull that output into a string that I can then use in my C file and alter and then re print out the new altered information. However I'm not sure where the output really is coming from or how I might be able to get it or save the output as a .txt file or back into a C file.
Any advice is appreciated Thanks!
This file is not really a C file at all. Because you're in Cygwin, you're likely operating on a case-insensitive filesystem (NTFS). As such, Cygwin's file command is running when you run the .c file. The way you've attempted to declare a variable (apparently) just so happens to be doing a 'file * fp' command. I'm sure you're getting fp: Cannot open "fp" or something similar after the rest of your output.
This is not anything C-related at all but is just being interpreted as a script by your shell.
It sounds like you have a lot to learn if you want to do this in C. More likely, you can probably write a shell script to accomplish what you want. While I've never used it, mp3info (https://github.com/jaalto/cygwin-package--mp3info) exists for pulling tag information from MP3 files. You could possibly get the exact information you want from that, or pipe the output into sed, awk, or a number of other tools.

Header and structure of a tar format

I have a project for school which implies making a c program that works like tar in unix system. I have some questions that I would like someone to explain to me:
The dimension of the archive. I understood (from browsing the internet) that an archive has a define number of blocks 512 bytes each. So the header has 512 bytes, then it's followed by the content of the file (if it's only one file to archive) organized in blocks of 512 bytes then 2 more blocks of 512 bytes.
For example: Let's say that I have a txt file of 0 bytes to archive. This should mean a number of 512*3 bytes to use. Why when I'm doing with the tar function in unix and click properties it has 10.240 bytes? I think it adds some 0 (NULL) bytes, but I don't know where and why and how many...
The header chcksum. As I know this should be the size of the archive. When I check it with hexdump -C it appears like a number near the real size (when clicking properties) of the archive. For example 11200 or 11205 or something similar if I archive a 0 byte txt file. Is this size in octal or decimal? My bets are that is in octal because all information you put in the header it needs to be in octal. My second question at this point is what is added more from the original size of 10240 bytes?
Header Mode. Let's say that I have a file with 664, the format file will be 0, then I should put in header 0664. Why, on a authentic archive is printed 3 more 0 at the start (000064) ?
There have been various versions of the tar format, and not all of the extensions to previous formats were always compatible with each other. So there's always a bit of guessing involved. For example, in very old unix systems, file names were not allowed to have more than 14 bytes, so the space for the file name (including path) was plenty; later, with longer file names, it had to be extended but there wasn't space, so the file name got split in 2 parts; even later, gnu tar introduced the ##LongLink pseudo-symbolic links that would make older tars at least restore the file to its original name.
1) Tar was originally a *T*ape *Ar*chiver. To achieve constant througput to tapes and avoid starting/stopping the tape too much, several blocks needed to be written at once. 20 Blocks of 512 bytes were the default, and the -b option is there to set the number of blocks. Very often, this size was pre-defined by the hardware and using wrong blocking factors made the resulting tape unusable. This is why tar appends \0-filled blocks until the tar size is a multiple of the block size.
2) The file size is in octal, and contains the true size of the original file that was put into the tar. It has nothing to do with the size of the tar file.
The checksum is calculated from the sum of the header bytes, but then stored in the header as well. So the act of storing the checksum would change the header, thus invalidate the checksum. That's why you store all other header fields first, set the checksum to spaces, then calculate the checksum, then replace the spaces with your calculated value.
Note that the header of a tarred file is pure ascii. This way, In those old days, when a tar file (whose components were plain ascii) got corrupted, an admin could just open the tar file with an editor and restore the components manually. That's why the designers of the tar format were afraid of \0 bytes and used spaces instead.
3) Tar files can store block devices, character devices, directories and such stuff. Unix stores these file modes in the same place as the permission flags, and the header file mode contains the whole file mode, including file type bits. That's why the number is longer than the pure permission.
There's a lot of information at http://en.wikipedia.org/wiki/Tar_%28computing%29 as well.

Binary structure for a directory with files

I'm trying to solve the following problem. I want to create a set of directories with files in them , but in memory using C# , using strings / byte arrays, and I am trying to figure out what's the format and byte sequence for all of this. i mean something like
<magic sequence for top directory header> <magic sequence for file header> </ end file> ... file 2 file 3 ... etc ... </magic sequence for the directory header> , etc.
I'm talking about windows formats here.
Could you point me to a location where i can read about this or even better, give me some existing examples?
Thanks !
Angel
What you are looking depends on the filesystem in use. The filesystem defines how files and directories are stored. On windows a common format is FAT32 and NTFS (which is not publically available).
As shown in the link above, the FAT format is specified by Microsoft

What are reserved filenames for various platforms?

I'm not asking about general syntactic rules for file names. I mean gotchas that jump out of nowhere and bite you. For example, trying to name a file "COM<n>" on Windows?
From: http://www.grouplogic.com/knowledge/index.cfm/fuseaction/view_Info/docID/111.
The following characters are invalid as file or folder names on Windows using NTFS: / ? < > \ : * | " and any character you can type with the Ctrl key.
In addition to the above illegal characters the caret ^ is also not permitted under Windows Operating Systems using the FAT file system.
Under Windows using the FAT file system file and folder names may be up to 255 characters long.
Under Windows using the NTFS file system file and folder names may be up to 256 characters long.
Under Window the length of a full path under both systems is 260 characters.
In addition to these characters, the following conventions are also illegal:
Placing a space at the end of the name
Placing a period at the end of the name
The following file names are also reserved under Windows:
aux,
com1,
com2,
...
com9,
lpt1,
lpt2,
...
lpt9,
con,
nul,
prn
Full description of legal and illegal filenames on Windows: http://msdn.microsoft.com/en-us/library/aa365247.aspx
A tricky Unix gotcha when you don't know:
Files which start with - or -- are legal but a pain in the butt to work with, as many command line tools think you are providing options to them.
Many of those tools have a special marker "--" to signal the end of the options:
gzip -9vf -- -mydashedfilename
As others have said, device names like COM1 are not possible as filenames under Windows because they are reserved devices.
However, there is an escape method to create and access files with these reserved names, for example, this command will redirect the output of the ver command into a file called COM1:
ver > "\\?\C:\Users\username\COM1"
Now you will have a file called COM1 that 99% of programs won't be able to open, and will probably freeze if you try to access.
Here's the Microsoft article that explains how this "file namespace" works. Basically it tells Windows not to do any string processing on the text and to pass it straight through to the filesystem. This trick can also be used to work with paths longer than 260 characters.
The boost::filesystem Portability Guide has a lot of good info.
Well, for MSDOS/Windows, NUL, PRN, LPT<n> and CON. They even cause problems if used with an extension: "NUL.TXT"
Unless you're touching special directories, the only illegal names on Linux are '.' and '..'. Any other name is possible, although accessing some of them from the shell requires using escape sequences.
EDIT: As Vinko Vrsalovic said, files starting with '-' and '--' are a pain from the shell, since those character sequences are interpreted by the application, not the shell.

Resources