Premptively getting files into Windows page cache - c

I have a program written in C that allows the user to scroll through a display of about a zillion small files. Each file needs to undergo a certain amount of processing (read only) before it's displayed to the user. I've implemented a buffer that preprocesses the files in a certain radius around the user's position, so if they're working linearly through them, there's not much delay. For various reasons, I can only actually run my processing algorithm on one file at a time (though I can have multiple files open, and I can read from them) so my buffer loads sequentially.
My processing algorithms are as optimized as they're going to get, but I'm running into I/O problems. At first, my loading process is slow, but when the files have been accessed a few times, it speeds up by about 5x. Therefore I strongly suspect that what's slowing me down is waiting for the Windows page cache to pull my files into memory. I know very little about that sort of thing. If I could ensure my files were in the cache before my processing algorithm needed them, I'd be in business.
My question is: is there a way to persuade/cajole/trick/intimidate Windows into loading my files into the page cache before I actually get around to reading/processing them?

There's only one way to get a file into the file system cache: reading it. That's a chicken-and-egg problem. You can get the egg first by using a helper thread that reads files. It would have to have some kind of smarts about what file is likely to be next. And not read too much.

On a POSIX system, you'd use posix_fadvise:
POSIX_FADV_WILLNEED
        Specifies that the application expects to access the specified data in the near future.
However, that doesn't seem to exist on Windows. What is fadvise/madvise equivalent on windows ? - Stack Overflow has some alternatives.

Related

FileSystemWatcher handling moving file - another solution

Hi
I was trying to use FileSystemWatcher to detect if some files or directories has been moved to another location. The problem was, i had to use onCreated and onDeleted events to handle this, but there are many issues using this solution
how could i detect change if i will select more than one file and press Ctrl+C, Ctrl+V, or right-click and select Copy and then Paste in the same directory?
how could i detect, if i will select more than one directory?
the last one, what if i simulate moving file? I could delete file and create with same name in different place.
I know i could use, Timers, process locking detection, verification which process uses file (if explorer.exe then it could be moving file), but this solution is not perfect and it's very ineffective. I was whinking about this how to solve this issue, and i have decided to implement this in low-level language. Is this possible to do this using C, or assembler? I know that every thing is possible to do using assembler, so is it possible to implement this in asm? I would like to create my own FileSystemWatcher using assembler or C but where should i looking for info how to do this?
File movement within the same filesystem can be detected easily using a filesystem filter driver, as the filesystem received the corresponding request from the OS. Other scenarios such as moving to the other disk or moving by copy/delete sequence are hardly traceable even with the filter driver because you would need to match between the file which have been created/written to and the file which is being deleted (possibly on the other disk).
If you plan to write some security mechanism (like a DRM), then I need to remind that the data can be altered during copying (eg. encrypted or compressed), which makes your task even harder.
Still you can look at filesystem filter drivers - should you decide to go on with detection of filesystem events, such driver is a much more reliable and powerful mechanism than FileSystemWatcher.

Garbage data in file after sudden power loss

I am using a flash with FAT32 system. I am continuosly writing data to a file using file system APIs from rtos(SMX). However, after sudden poweroffs, the file contains garbage values just above the first file entry on system reboot.
I run chkdsk utility, but it doesn't fix any problem.
Any idea how can i get rid of these garbage entries even on unclean power offs?
If you expect sudden power loss, you'll need to disable all caching/buffering on file writes. Of course you'll also need to deal with partially-written files, but that should at least prevent trailing garbage.
I don't know the API you're using, but this might be done by mounting the drive "synchronously" (e.g., mount -o sync in Linux) or by opening individual files with specific options. If you do disable buffering on individual file writes, you may still run the risk of corrupting the FAT, however, and losing all the files.

How to write in memory data to DVD on Linux using C?

I've data in MP4 format which needs to be copied to DVD on Linux platform. Now I am creating MP4 file on hard disk and then burning that file to DVD using growisofs command.
It would be more efficient if I didn't have to write the MP4 data to hard disk before they are burned to DVD. Please let me know if there is a way to write in memory data to DVD using C program.
By reimplementing the tasks growisofs performs. DVDs are different to randomly accessible storage. First the data to be burnt onto the blank medium must be prepared into in a certain format, namely ISO9660, this includes a certain error correction scheme. The result of this is a complete Track. In the ISO9660 scheme it's not possible to record single files, only whole file systems. Once you got the FS you must implement the whole program for controlling the recording process.
This is what growisofs does. Now you could take the source of growisofs and replace the code it uses to read the files with code to read from some shared memory. But then you must make sure, that your program can deliver the data continously, without falling into pauses. Once started, the recording process should not be interrupted.
Anyway: If you're under Linux your program could provide the filesystem structure through FUSE.

Are intermediate files bad practice?

I was recently downvoted (which only bugged me a little :) ) for an answer I gave to this question. The person offered no explanation for the down vote which started me thinking: "Why would you avoid producing intermediate files?" Especially in a language like Python where File IO is laughably easy.
There seemed to be consensus that it was a bad idea, but I know for a fact that intermediate files are used regularly in practice. I worked for a very well respected research firm (let's just say S.O. wouldn't exist without this firm) where it was assumed that your programs would produce files as output. We did this because if your program indeed deserved to be a standalone program then it would need debuggable output and some way of passing its output between processes that could later be examined in case we discovered an error in our output further downstream.
Is it considered bad practice (in cases like the question linked above) to use intermediate files? Why?
One problem with intermediate files happens when multi-threading.
If Clients C1 and C2 are handled simultaneously by server process S (which may or not have forked into seperate processes, used threads, or whatever concurrency system..), you may get weird issues when both try to create the same intermediate file.
I believe one of Unix philosophies is that all programs should act as filters, however this doesn't necessarily mean creating files on the disk, and using intermediate files leads to unwieldly behaviour in my opinion. One should also treat the disk as a last resort and only use it for storing/retrieving data that should be available after powering off the computer, and maybe even take care to allow programs to run on read-only media.
Well, there are some issues when you use files, especially there may be many unexpected failures while accessing or creating the files. The following listed are all the issues that I personally have experienced.
1) The file location is on the remote machine and the network is down. (NFS mounted).
2) There is not enough free space while creating the file.
3) In between the process the user press Ctrl-C to cancel the process the file is not deleted.
4) The file is mounted on the NFS and the network is slow.
5) The folder in which file was created was a soft link and the original link was deleted.
But still we have to use file because there are hardly any options while working in bash. But in C,C++ i think disk access should be considered as the last resort. Program producing files as output is ok, if that is the only way to communicate with the user. But atleast for intermediate savings use of disk files should be minimized.
If you create temporary files properly (with setting platform-specific 'temporary' flag meaning do not flush cache to disk when no urgent need) they are perfectly good if task requires them.
There are almost no things in IT that you can't use while having a good reason to. :-)

Read data from damaged media

Is it possible to read damaged media (cd, hdd, dvd,...) even if windows explorer bombs out?
What I mean to ask is, whether there is a set of APIs or something that can access the disk at a very low level (below explorer?) and read whatever can be retrieved even if it is only partial, especially if you can still see the file is there from explorer, but can't do anything with it because it is damaged somehow (scratch on cd, etc)?
The main problem with Windows Explorer is that it doesn't support resuming copying after a read error. Most superficially scratched CDs, for example, will fail on different areas of the disk every time you eject and reinsert them.
Therefore, with a utility that supports resuming copy operations, it is possible to read the entire contents of a damaged CD with by doing "eject/reload/resume" a few times.
In fact, this is what a utility I wrote does, and I've never needed anything fancier to read scratched disks. (It simply uses ReadFile and WriteFile.)
One step lower would be opening the raw partition (i.e. disk image) by passing a string such as "\.\F:" (note: slashes are literal here) to CreateFile. It would allow you to read raw sectors from a drive, but reconstructing files from that data would be hard.
In fact, the "\.\" syntax allows you to open devices in the "\GLOBAL??" branch of the Windows Object Manager namespace as if they were files. It's not unlike calling dd with /dev/x as a parameter. There is also a "\Device" branch, but that's only accessible via DeviceIoControl() (i.e. ioctl()), meaning there's no simple ReadFile()/WriteFile() interface.
Anything lower level than that would be device-specific, I guess; like reading raw CD-ROM data (including ECC bits) the way some CD-burning programs do. You'd have to do some research on the specific media (CD, flash, DVD) and what your hardware allows you to do on them.
Note: The backslashes seem to get lost on the way to the web page; you need to pass "backslash backslash dot backslash DeviceName" to CreateFile. You need to escape them, too, of course.
If you want to do it, do it from the Linux side - see: http://sourceforge.net/projects/monkeycity/ opensource
or ready made app and freeware too: http://www.theabsolute.net/sware/dskinv.html
the first step is dd_rescue. After that, you're free to try anything to reconstruct the data.
And there's GNU ddrescue
GNU ddrescue is a data recovery tool. It copies data from one file or block device (hard disc, cdrom, etc) to another, trying to rescue the good parts first in case of read errors.
Make sure to use the 3-arg version (manual):
ddrescue [options] infile outfile [mapfile]
That is, do use a mapfile even if it's optional, because:
If you use the mapfile feature of ddrescue, the data is rescued very efficiently, (only the needed blocks are read). Also you can interrupt the rescue at any time and resume it later at the same point. The mapfile is an essential part of ddrescue's effectiveness. Use it unless you know what you are doing.
And it's also included in Cygwin and Homebrew.
I don't know what layer exists between Windows Explorer and the Win32 APIs. You can try to write a program with the Win32 File I/O stuff. If that doesn't work, then you have to write your own device driver to get any lower.
I've had some luck from the linux side, or using BartPE (http://www.nu2.nu/pebuilder/), but just seeing the file doesn't always mean the file is going to be recoverable, whether you're trying from Windows or Linux. You're best bet might be to use a trial of a recovery program.
I have had two disks start to disintegrate on me. From the pattern of unreadable sectors I think they had internal flaking of their emulsion. WinXP Explorer just threw up its hands and said the drive didn't even exist.
In both cases I used "GetDataBack for NTFS" from Runtime Software (http://www.runtime.org/). You can download a free trial which will show you what you could get back if you paid for it. When I bought it it was $49, but I see it is now $79.
This program is amazing. It's not necessarily fast as it will reread some sectors over and over, trying to get a consensus value from multiple tries, but when it's done you can get back stuff that you thought was gone forever. I had one drive that it took over 10 hours to analyze, but when it was done I got back over 97% of a 500GB drive. Definitely worth the price.
Another great tool is Beyond Compare. I have rev 2.5.3, but it is currently at 3.?? and costs $30. They have a full-functionality, 30-day trail. It does a great job of copying large quantities of files (and only those that need to be copied) and, unlike Explorer, it doesn't blow up if something fails. It's sort of like a visual rsync for Windows, if you're familiar with that program from the Samba people.
I have no connection with either of the comapnies mentioned other than being a very satisfied customer.
The gold standard for recovering data from a magnetic storage device would have to be SpinRite. It's a commerical app though, so you probably wouldn't learn much from it.
If you have a Linux machine around, I can recommend dvdisaster. It is originally meant for creating error correction files, but it also reads DVDs into an image and ignores read errors; and you can use different drives one after another to get missing sectors filled in the image.

Resources