Creation of maximum size file in C: drive (OS drive) fails - c

I have developed disk and file wiping software (using the Win32 API) which also has an option to wipe a drive's free space. I do this by creating a file the size of the drive's available free space and then writing random bytes to that file (applying various standard wiping schemes).
My problem is that it works well on every drive except the one that has the Windows operating system installed (in my case, C:). It gives a "Not enough disk space" error although that drive has lots of free space available. My program runs with administrative privileges. Is it some kind of privileges issue? Do I need to give my program more privileges even when it is run as administrator? I would want to do that programmatically using the Windows API.
I am testing mostly on the NTFS file system. I create the file with the CreateFile call and, to make its size exactly equal to the available free space, I use the defragmentation API to get the free space and then SetEndOfFile to extend the file.
Any help would be appreciated.

Windows doesn't handle running out of disk space well, so it wouldn't surprise me if the system reserves some disk space that ordinary users, and even administrators, can't use. Regardless of whether that is true, your scheme is flawed. The amount of allocated space can easily change between the time you find out how much space is free and the time you try to allocate it all. And even if it works, things might start breaking because the disk is full, even if only for a short while.
Since you're already using the defragmentation API, I'd use it to wipe all the clusters on the disk without trying to allocate them all at once. First create a file that fills up most of the disk space, but leaves plenty of room for other processes to allocate files. Then use FSCTL_GET_VOLUME_BITMAP to get the bitmap of allocated and free clusters, and FSCTL_MOVE_FILE to move clusters from the file you created into the free clusters found in the bitmap. You'll need to be ready for FSCTL_MOVE_FILE to fail because something allocated one of the clusters marked as free. In that case I would keep halving the number of clusters you were moving at once until it worked. If it fails with only one cluster, then you know that it's the cluster (or one of the clusters) that's been allocated.
Something like this pseudocode:
// unalloc_start and unalloc_len describe an unallocated region in the free space bitmap
wipe_unallocated_clusters(hwipefile, unalloc_start, unalloc_len, max_chunk_len) {
    unalloc_vcn = unalloc_start
    unalloc_end = unalloc_start + unalloc_len
    // never move more clusters at once than the region contains
    max_chunk_len = min(max_chunk_len, unalloc_len)
    clen = max_chunk_len
    while (unalloc_vcn < unalloc_end) {
        // don't run past the end of the region
        clen = min(clen, unalloc_end - unalloc_vcn)
        if (fsctl_move_file(hwipefile, 0, unalloc_vcn, clen) == FAILED) {
            if (clen == 1)
                unalloc_vcn++ // skip over allocated cluster
            else
                clen /= 2 // try again with half as many clusters
            continue
        }
        unalloc_vcn += clen
        clen = min(clen * 2, max_chunk_len) // double the clusters if it worked
    }
}
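To feed a loop like the one above, the bitmap returned by FSCTL_GET_VOLUME_BITMAP has to be turned into (start, length) runs of free clusters. Here is a minimal sketch of that step in portable C. The names `free_run` and `find_free_runs` are made up for illustration, and it assumes the layout where bit i % 8 of byte i / 8 describes cluster i and a set bit means the cluster is in use. It only scans a byte buffer; it does not call any Windows API itself.

```c
#include <stddef.h>
#include <stdint.h>

/* One run of consecutive free clusters (hypothetical helper type). */
typedef struct {
    uint64_t start; /* first free cluster, relative to the start of the bitmap */
    uint64_t len;   /* number of free clusters in the run */
} free_run;

/* Assumption: bit (i % 8) of byte (i / 8) describes cluster i,
 * and a set bit means "cluster in use". */
static int cluster_in_use(const uint8_t *bitmap, uint64_t i)
{
    return (bitmap[i / 8] >> (i % 8)) & 1;
}

/* Scan nclusters bits and store up to max_runs free runs in runs[].
 * Returns the number of runs stored. */
size_t find_free_runs(const uint8_t *bitmap, uint64_t nclusters,
                      free_run *runs, size_t max_runs)
{
    size_t n = 0;
    uint64_t i = 0;

    while (i < nclusters && n < max_runs) {
        while (i < nclusters && cluster_in_use(bitmap, i))
            i++;                        /* skip allocated clusters */
        if (i == nclusters)
            break;

        uint64_t start = i;
        while (i < nclusters && !cluster_in_use(bitmap, i))
            i++;                        /* extend the free run */

        runs[n].start = start;
        runs[n].len = i - start;
        n++;
    }
    return n;
}
```

Each run's start would then be offset by the bitmap's StartingLcn before being handed to the wiping loop as unalloc_start, with the run's len as unalloc_len.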

Seeing some code and specific values that fail would be nice, but I'll try to use my psychic powers. I have two theories.
1) Your file size exceeds the maximum file size for the filesystem on C:. FAT16 supports files up to 2GB, FAT32 supports files up to 4GB.
2) You have too many files in your root directory. There are some reports that the Windows FAT32 implementation supports only 1000 file entries in the root directory.

Related

Is it possible to read a file without loading it into memory?

I want to read a file but it is too big to load it completely into memory.
Is there a way to read it without loading it into memory? Or there is a better solution?
I want to read a file but it is too big to load it completely into memory.
Be aware that, in practice, files are an abstraction (so somehow an illusion) provided by your operating system through file systems. Read Operating Systems: Three Easy Pieces (freely downloadable) to learn more about OSes. Files can be quite big (even if most of them are small), e.g. many dozens of gigabytes on current laptops or desktops (and many terabytes on servers, and perhaps more).
You don't define what is memory, and the C11 standard n1570 uses that word in a different way, speaking of memory locations in §3.14, and of memory management functions in §7.22.3...
In practice, a process has its virtual address space, related to virtual memory.
On many operating systems -notably Linux and POSIX- you can change the virtual address space with mmap(2) and related system calls, and you could use memory-mapped files.
Is there a way to read it without loading it into memory?
Of course, you can read and write partial chunks of some file (e.g. using fread, fwrite, fseek, or the lower-level system calls read(2), write(2), lseek(2), ...). For performance reasons, better use large buffers (several kilobytes at least). In practice, most checksums (or cryptographic hash functions) can be computed chunkwise, on a very long stream of data.
Many libraries are built on top of such primitives (doing direct I/O by chunks). For example, the sqlite database library is able to handle database files of many terabytes (more than the available RAM). You could also use an RDBMS (these are programs written in C or C++).
So of course you can deal with files larger than available RAM and read or write them by chunks (or "records"), and this has been true since at least the 1960s. I would even say that intuitively, files can (usually) be much larger than RAM, but smaller than a single disk (however, even this is not always true; some file systems are able to span several physical disks, e.g. using LVM techniques).
(On my Linux desktop with 32 Gbytes of RAM, the largest file has 69 Gbytes, on an ext4 filesystem with 669G available and 780G total space, and I have had files above 100 Gbytes in the past.)
You might find it worthwhile to use some database like sqlite (or be a client of some RDBMS like PostgreSQL, etc...), or you could be interested in libraries for indexed files like gdbm. Of course you can also do direct I/O operations (e.g. fseek then fread or fwrite, or lseek then read or write, or pread(2) or pwrite ...).
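To make the "read only the part you need" idea concrete, here is a minimal POSIX sketch built on pread(2). The helper name `read_chunk_at` is made up for illustration. It pulls one chunk from an arbitrary offset into a caller-supplied buffer, so memory use is bounded by the buffer size no matter how large the file is.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes starting at byte offset off, without touching the
 * rest of the file. Returns the number of bytes read (less than len only
 * at end of file), or -1 on error. */
ssize_t read_chunk_at(const char *path, off_t off, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t total = 0;
    while (total < len) {
        /* pread may return fewer bytes than requested, so loop. */
        ssize_t n = pread(fd, (char *)buf + total, len - total,
                          off + (off_t)total);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)
            break;                      /* end of file */
        total += (size_t)n;
    }
    close(fd);
    return (ssize_t)total;
}
```

A real program would probably keep the descriptor open across calls instead of reopening the file for every chunk; the sketch reopens it only to stay self-contained.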
I need the content to do a checksum, so I need the complete message
Many checksum libraries support incremental updates to the checksum. For example, the GLib has g_checksum_update(). So you can read the file a block at a time with fread and update the checksum as you read.
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include <glib.h>

int main(void) {
    char filename[] = "test.txt";

    // Create a SHA256 checksum
    GChecksum *sum = g_checksum_new(G_CHECKSUM_SHA256);
    if( sum == NULL ) {
        fprintf(stderr, "Could not create checksum.\n");
        exit(1);
    }

    // Open the file we'll be checksumming.
    FILE *fp = fopen( filename, "rb" );
    if( fp == NULL ) {
        fprintf(stderr, "Could not open %s: %s.\n", filename, strerror(errno));
        exit(1);
    }

    // Read one buffer full at a time (BUFSIZ is from stdio.h)
    // and update the checksum.
    unsigned char buf[BUFSIZ];
    size_t size_read = 0;
    while( (size_read = fread(buf, 1, sizeof(buf), fp)) != 0 ) {
        // Update the checksum
        g_checksum_update(sum, buf, (gssize)size_read);
    }

    // fread returns 0 on a read error as well as at end of file.
    if( ferror(fp) ) {
        fprintf(stderr, "Could not read %s.\n", filename);
        exit(1);
    }
    fclose(fp);

    // Print the checksum.
    printf("%s %s\n", g_checksum_get_string(sum), filename);

    g_checksum_free(sum);
    return 0;
}
And we can check it works by comparing the result with sha256sum.
$ ./test
0c46af5bce717d706cc44e8c60dde57dbc13ad8106a8e056122a39175e2caef8 test.txt
$ sha256sum test.txt
0c46af5bce717d706cc44e8c60dde57dbc13ad8106a8e056122a39175e2caef8 test.txt
One way to do this, if the problem is RAM, not virtual address space, is memory mapping the file, either via mmap on POSIX systems, or CreateFileMapping/MapViewOfFile on Windows.
That can get you what looks like a raw array of the file bytes, but with the OS responsible for paging the contents in (and writing them back to disk if you alter them) as you go. When mapped read-only, it's quite similar to just malloc-ing a block of memory and fread-ing to populate it, but:
It's lazy: For a 1 GB file, you're not waiting the 5-30 seconds for the whole thing to be read in before you can work with any part of it, instead, you just pay for each page on access (and sometimes, the OS will pre-read in the background, so you don't even have to wait on the per-page load in)
It responds better under memory pressure; if you run out of memory, the OS can just drop clean pages from memory without writing them to swap, knowing it can page them back in from the golden copy in the file whenever they're needed; with malloc-ed memory, it has to write it out to swap, increasing disk traffic at a time when you're likely oversubscribed on the disk already
Performance-wise, this can be slightly slower under default settings (since, without memory pressure, reading the whole file in mostly guarantees it will be in memory when asked for, while random access to a memory mapped file is likely to trigger on-demand page faults to populate each page on first access), though you can use posix_madvise with POSIX_MADV_WILLNEED (POSIX systems) or PrefetchVirtualMemory (Windows 8 and higher) to provide a hint that the entire file will be needed, causing the system to (usually) page it in in the background, even as you're accessing it. On POSIX systems, other advice hints can be used for more granular hinting when paging the whole file in at once isn't necessary (or possible), e.g. using POSIX_MADV_SEQUENTIAL if you're reading the file data in order from beginning to end usually triggers more aggressive prefetch of subsequent pages, increasing the odds that they're in memory by the time you get to them. By doing so, you get the best of both worlds; you can begin accessing the data almost immediately, with a delay on accessing pages not paged in yet, but the OS will be pre-loading the pages for you in the background, so you eventually run at full speed (while still being more resilient to memory pressure, since the OS can just drop clean pages, rather than writing them to swap first).
The main limitation here is virtual address space. If you're on a 32 bit system, you're likely limited to (depending on how fragmented the existing address space is) 1-3 GB of contiguous address space, which means you'd have to map the file in chunks, and can't have on-demand random access to any point in the file at any time without additional system calls. Thankfully, on 64 bit systems, this limitation rarely comes up; even the most limiting 64 bit systems (Windows 7) provide 8 TB of user virtual address space per process, far larger than the vast, vast majority of files you're likely to encounter (and later versions increase the cap to 128 TB).
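As a rough illustration of the POSIX side of this, the sketch below maps a file read-only, gives the kernel a sequential-access hint, and walks the bytes as if they were an ordinary array. The function name `sum_file_mmap` is made up for this example, and the byte sum just stands in for whatever processing you actually need.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum every byte of a file through a read-only mapping.
 * Returns 0 on success and stores the sum in *out, -1 on error. */
int sum_file_mmap(const char *path, uint64_t *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    *out = 0;
    if (st.st_size == 0) {              /* mmap rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    /* Hint only: we read front to back. Ignoring a failure is harmless. */
    (void)posix_madvise(p, len, POSIX_MADV_SEQUENTIAL);

    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];                    /* pages are faulted in on first touch */

    munmap(p, len);
    *out = sum;
    return 0;
}
```

On a 32 bit system the single whole-file mmap call would have to become a loop over smaller windows, as described above.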

Managing Log File Sizes

I am implementing a simple log file handler for an embedded device. I cannot use syslog because it is already reserved for other uses. The device's SSD size is limited, so there is a real risk of the log file using all of the disk space, which will crash the device.
What is the cheapest way I can guarantee I will have at least X remaining disk space after a write?
On Linux, the only way to find out the amount of remaining disk space is the statfs(2) syscall. If that's too slow for you, I think you'll just have to call it less frequently and assume that you aren't logging so much in between calls that you're filling up too much.
On many modern filesystems, it can generally be difficult to try and map how much less free space will remain after a particular write. Not only do you have block-granularity in allocation (or not, in case your filesystem supports tail-packing), but on some filesystems you may also be affected by sudden copy-on-write allocation after data de-duplication, or lazy allocation of zeroed blocks and whatnot. Trying to be too smart about this is bound to get you in trouble when switching between filesystems, so I'd recommend just setting some reasonable low-water mark on available space and stop writing more data after it has been reached.
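A low-water-mark check along those lines might look like the sketch below. It uses statvfs(3), the POSIX interface to the same information statfs(2) provides on Linux. The function name `can_log` and its parameters are made up for illustration, and the answer it gives is only an estimate, for the reasons given above.

```c
#include <sys/statvfs.h>

/* Returns 1 if, by the filesystem's own free-space count, writing `bytes`
 * more bytes under `path` would still leave at least `reserve` bytes
 * available to unprivileged users, 0 if not, and -1 if the filesystem
 * could not be queried. */
int can_log(const char *path, unsigned long long bytes,
            unsigned long long reserve)
{
    struct statvfs vfs;
    if (statvfs(path, &vfs) != 0)
        return -1;

    /* f_bavail counts free blocks in units of f_frsize bytes. */
    unsigned long long avail =
        (unsigned long long)vfs.f_bavail * vfs.f_frsize;

    if (avail < reserve)
        return 0;                       /* already below the low-water mark */
    return bytes <= avail - reserve;
}
```

If the call is too slow to make before every write, the logger could cache the result and only re-check every N writes or every few seconds.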

Fragmentation in modern file systems

I was tinkering with the Pintos OS file system and wondered:
How do modern file systems handle fragmentation issues, including internal, external and data fragmentation?
OK, so it's file fragmentation you are interested in.
The answer is that it depends entirely on the file system and the operating system. In the case of traditional Unix file systems, the disk is inherently fragmented. There is no concept whatsoever of contiguous files. Files are stored in chained data blocks. This is why paging is done to partitions and most database systems on Unix use partitions.
"Hard" file systems that allow contiguous files manage them in different ways. A file consists of one or more "extents." If the initial extent gets filled, the file system manager creates a new extent and chains to it. In some systems there are many options for file creation. One can specify the initial size of the file and reserve space for subsequent allocations (i.e., the size of the first extent and the size of additional extents).
When a hard file system gets fragmented, there are different approaches for dealing with it. In some systems, the normal way of "defragging" is to do an image backup to secondary storage and then restore. This can be part of the normal system maintenance process.
Other systems use "defragging" utilities that either run as part of the regular system schedule or are run manually.
The problem of disk fragmentation is often exaggerated. If you have a disk with a reasonable amount of space, you don't really tend to get much file fragmentation. Disk fragmentation—yes; but this is not really much of a problem if you have sufficient free disk space. File fragmentation occurs when (1) you don't have enough free contiguous disk space or (2) [most likely with reasonable disk space] you have a file that continually gets added data.
Most file systems indeed have ways to deal with fragmentation. I'll however describe the situations for the usual file systems that are not too complex.
For Ext2, for each file there are 12 direct block pointers that point to the blocks where the file is contained. If they are not enough, there is one singly indirect block that points to block_size / 4 blocks. If they are still not enough, there is a doubly indirect block that points to block_size / 4 singly indirect blocks. If not yet enough, there is a triply indirect block that points to block_size / 4 doubly indirect blocks. This way, the file system allows fragmentation at block boundaries.
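To put numbers on that pointer tree, here is a small sketch that counts how many data blocks a single inode can reach for a given block size, assuming 4-byte block pointers as in the description above. The function name `ext2_max_blocks` is made up for illustration. Only the reach of the pointers is counted; an actual implementation may impose further limits on file size.

```c
#include <stdint.h>

/* Number of data blocks one inode can address through its 12 direct
 * pointers and its singly, doubly and triply indirect blocks, given the
 * block size in bytes and 4-byte block pointers. */
uint64_t ext2_max_blocks(uint64_t block_size)
{
    uint64_t ptrs = block_size / 4;     /* pointers per indirect block */

    return 12                           /* direct blocks */
         + ptrs                         /* via the singly indirect block */
         + ptrs * ptrs                  /* via the doubly indirect block */
         + ptrs * ptrs * ptrs;          /* via the triply indirect block */
}
```

With 1024-byte blocks this gives 12 + 256 + 65536 + 16777216 = 16843020 addressable blocks.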
For ISO 9660, which is the usual file system for CDs and DVDs, the file system doesn't support fragmentation as is. However, it's possible to use multiple consecutive directory records in order to split a big (more than 2G/4G, the maximum describable file size) file into describable files. This might cause fragmentation.
For FAT, the file allocation table describes the location and status of all data clusters on the disk in order to allow fragmentation. So when reading the next cluster, the driver looks up in the file allocation table to find the number of the next cluster.
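The lookup described for FAT amounts to following a linked list stored in an array. The sketch below walks such a chain over an in-memory table. It is deliberately simplified: `fat_chain` and the single `FAT_EOC` end marker are made up for illustration, whereas a real FAT has reserved entries and a range of end-of-chain values that depends on the FAT variant.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT_EOC 0xFFFFFFFFu /* hypothetical end-of-chain marker for this sketch */

/* Follow the cluster chain that starts at `first`, storing the visited
 * cluster numbers in out[]. Returns the chain length, or 0 if the chain is
 * longer than max_out or points outside the table (corrupt or looping). */
size_t fat_chain(const uint32_t *fat, size_t fat_len, uint32_t first,
                 uint32_t *out, size_t max_out)
{
    size_t n = 0;
    uint32_t c = first;

    while (c != FAT_EOC) {
        if (c >= fat_len || n == max_out)
            return 0;
        out[n++] = c;
        c = fat[c];         /* the table entry names the next cluster */
    }
    return n;
}
```

A file whose chain is 2, 5, 3 is fragmented on disk, yet reading it only takes three table lookups.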

How to check if a file of given length can be created?

I want to create a non-sparse file of a given length (e.g. 2GB), but I want to check if that is possible before actually writing stuff to disk.
In other words I want to avoid getting ENOSPC (No space left on device) while writing. I'd prefer not to create a "test file" of size 2GB or things like that just to check that there is enough space left.
Is that possible?
Use posix_fallocate(3).
From the description:
The function posix_fallocate() ensures that disk space is allocated
for the file referred to by the descriptor fd for the bytes in the
range starting at offset and continuing for len bytes. After a
successful call to posix_fallocate(), subsequent writes to bytes in
the specified range are guaranteed not to fail because of lack of
disk space
You can use the statvfs function to determine how many free bytes (and inodes) a given filesystem has.
That should be enough for a quick check, but do remember that it's not a guarantee that you'll be able to write as much (or, for that matter, that writing more than that would have failed) - other applications could also be writing to (or deleting from) the same filesystem. So do continue to check for various write errors.
fallocate or posix_fallocate can be used to allocate (and deallocate) a large chunk. That is probably a better option for your use case. (Check the man page; there are a lot of options for space management that you might find interesting.)
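A minimal sketch of the posix_fallocate approach follows. The helper name `reserve_file` is made up for illustration, and the O_EXCL flag is just one possible choice so that an existing file is never clobbered. Note that posix_fallocate reports failure through its return value rather than through errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `path` and reserve `len` bytes of disk space for it.
 * Returns 0 on success, or an errno value (e.g. ENOSPC) on failure. */
int reserve_file(const char *path, off_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return errno;

    /* Returns the error number directly; it does not set errno. */
    int err = posix_fallocate(fd, 0, len);
    if (err != 0)
        unlink(path);       /* don't leave a partial file behind */

    close(fd);
    return err;
}
```

If this returns 0, later writes within the first len bytes should not fail for lack of space; a return of ENOSPC tells you up front that the file will not fit.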

Secure File Delete in C

Secure File Deleting in C
I need to securely delete a file in C, here is what I do:
1) use fopen to get a handle of the file
2) calculate the size using lseek/ftell
3) get random seed depending on current time/or file size
4) write (size) bytes to the file from a loop with 256 bytes written each iteration
5) fflush/fclose the file handle
6) reopen the file and re-do steps 3-6 for 10~15 times
7) rename the file then delete it
Is that how it's done? Because I read the name "Gutmann 25 passes" in Eraser, so I guess 25 is the number of times the file is overwritten and 'Gutmann' is the Randomization Algorithm?
You can't do this securely without the cooperation of the operating system - and often not even then.
When you open a file and write to it there is no guarantee that the OS is going to put the new file on the same bit of spinning rust as the old one. Even if it does you don't know if the new write will use the same chain of clusters as it did before.
Even then you aren't sure that the drive hasn't mapped out the disk block because of some fault - leaving your plans for world domination on a block that is marked bad but is still readable.
ps - the 25x overwrite is no longer necessary, it was needed on old low density MFM drives with poor head tracking. On modern GMR drives overwriting once is plenty.
Yes. In fact, the Gutmann method overwrites the file with a fixed series of different patterns:
It does so by writing a series of 35 patterns over the
region to be erased.
The selection of patterns assumes that the user doesn't know the
encoding mechanism used by the drive, and so includes patterns
designed specifically for three different types of drives. A user who
knows which type of encoding the drive uses can choose only those
patterns intended for their drive. A drive with a different encoding
mechanism would need different patterns.
More information is here.
@Martin Beckett is correct; there is no such thing as "secure deletion" unless you know everything about what the hardware is doing all the way down to the drive. (And even then, I would not make any bets on what a sufficiently well-funded attacker could recover given access to the physical media.)
But assuming the OS and disk will re-use the same blocks, your scheme does not work for a more basic reason: fflush does not generally write anything to the disk.
On most multi-tasking operating systems (including Windows, Linux, and OS X), fflush merely forces data from the user-space buffer into the kernel. The kernel will then do its own buffering, only writing to disk when it feels like it.
On Linux, for example, you need to call fsync(fileno(handle)). (Or just use file descriptors in the first place.) OS X is similar. Windows has FlushFileBuffers.
Bottom line: The loop you describe is very likely merely to overwrite a kernel buffer 10-15 times instead of the on-disk file. There is no portable way in C or C++ to force data to disk. For that, you need to use a platform-dependent interface.
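For the POSIX case, one overwrite pass that actually pushes the data out of both buffers could be sketched like this. The function name `overwrite_pass` is made up for illustration. Even with fsync, this only asks the kernel to hand the data to the device; it does not guarantee that the drive overwrites the same physical blocks.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Overwrite the first `size` bytes of `path` with `pattern`, then flush
 * the stdio buffer and the kernel's cache.
 * Returns 0 on success, -1 on error. */
int overwrite_pass(const char *path, long size, unsigned char pattern)
{
    FILE *fp = fopen(path, "r+b");      /* r+b: update in place, no truncation */
    if (fp == NULL)
        return -1;

    unsigned char buf[4096];
    memset(buf, pattern, sizeof buf);

    int ok = 1;
    for (long left = size; ok && left > 0; ) {
        size_t n = left < (long)sizeof buf ? (size_t)left : sizeof buf;
        ok = fwrite(buf, 1, n, fp) == n;
        left -= (long)n;
    }
    if (ok && fflush(fp) != 0)          /* user-space buffer -> kernel */
        ok = 0;
    if (ok && fsync(fileno(fp)) != 0)   /* kernel cache -> storage device */
        ok = 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

On Windows the fsync call would be replaced by FlushFileBuffers on the underlying file handle.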
The MFT (Master File Table) is similar to the FAT (File Allocation Table).
The MFT keeps a record for each file: its offsets on disk, file name, date/time, ID, file size, and even the file data itself if it fits inside the record's free space, which is about 512 bytes (one record is 1 KB).
Note: on a new HDD the data is set to 0x00 (just to let you know).
Let's say you want to overwrite file1.txt. The OS finds this file's offset inside its MFT record.
You begin overwriting file1.txt with zero bytes (00000000) in binary mode.
You will overwrite the file data on disk 100%; this is why the MFT holds the file's offset on disk.
Afterwards you rename the file and delete it.
NOTE: The MFT will mark the file as deleted, but you can still get some data about this file, i.e. date/time (created, modified, accessed), file offset, attributes, flags.
1- Create a folder in c:\ and move the file into it, renaming it at the same time (use the rename function); rename the file to 0000000000 or any other name without an extension
2- Overwrite the file with 0x00 and check that the file was overwritten
3- Change the date/time
4- Clear the file's attributes
5- Leave the file size untouched, so the OS reuses the empty space faster
6- Delete the file
7- Repeat steps 1-6 for all files
8- Delete the folder
or
(1, 2, 6, 7, 8)
9- Find the files in the MFT and remove the records of these files.
The Gutmann method worked fine for older disk technology encoding schemes, and its 35-pass wiping scheme is no longer required, which even Gutmann acknowledges. See the Criticism section of the Gutmann method article at https://en.wikipedia.org/wiki/Gutmann_method, where Gutmann discusses the differences.
It is usually sufficient to make at most a few random passes to securely delete a file (with possibly an extra zeroing pass).
The secure-delete package from thc.org contains the sfill command to securely wipe disk and inode space on a hard drive.
