Library or tools for managing shared mmapped files - c

Disclaimer: This is probably a research question as I cannot find what I am looking for, and it is rather specific.
Problem: I have a custom search application that needs to read between 100K and 10M files, each between 0.01 MB and about 10.0 MB. Each file contains one array that could be loaded directly as an array via mmap. I am looking for a solution to prefetch files into RAM before they are needed and, if system memory is full, evict ones that were already processed.
I know this sounds a lot like a combination of OS memory management and something like memcached. What I am actually looking for is something like memcached that doesn't return strings or values for a key, but rather the address of the start of a chosen array. In addition (and this is a different topic), I would like the shared memory to be managed such that the distance between the CPU core and the RAM is the shortest on NUMA machines.
My question is: "does a tool/library like this already exist?"

Your question is related to this one
I'm not sure you need to find a library. You just need to understand how to efficiently use system calls.
I believe the readahead system call could help you.
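For illustration, a minimal sketch of what that could look like (prefetch_file is a made-up helper name; readahead() is Linux-specific and needs _GNU_SOURCE, posix_fadvise() is the portable equivalent):

    #define _GNU_SOURCE         /* for readahead() */
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Hypothetical helper: ask the kernel to start pulling a file into
     * the page cache before the application actually needs it. */
    static void prefetch_file(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return;

        struct stat st;
        if (fstat(fd, &st) == 0) {
            /* Linux-specific: schedule readahead of the whole file. */
            readahead(fd, 0, st.st_size);
            /* Portable alternative:
             * posix_fadvise(fd, 0, st.st_size, POSIX_FADV_WILLNEED); */
        }
        close(fd);
    }

If the array is already mmap()ed, madvise() with MADV_WILLNEED gives the same hint on the mapping itself, and MADV_DONTNEED lets the kernel drop the pages of files you have already processed.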

Indeed you have many, many files (and perhaps too many of them). I hope that your filesystem is good enough, or that they are spread across many directories. Having millions of files may become a concern if the filesystem is not tuned appropriately (but I won't venture advice on that).
I don't know whether it is your application that writes and reads that many files. Perhaps you might consider switching to a fast DBMS like PostgreSQL or MySQL, or perhaps you could use GDBM.

I once did this for a search-engine kind of application. It used an LRU chain, which was also addressable (via a hash table) by file ID and, IIRC, by memory address. On every access, the hot items were repositioned to the head of the LRU chain. When memory got tight (mmap can fail ...) the tail of the LRU chain was unmapped.
The pitfall of this scheme is that the program can get blocked on page faults. And since it was single-threaded, it was really blocked. Altering this to a multithreaded architecture would involve protecting the hash and LRU structures with locks and semaphores.
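A stripped-down sketch of that kind of structure (names are invented and locking is omitted, as discussed above):

    #include <stddef.h>
    #include <sys/mman.h>

    /* One cached, mmap()ed file. */
    struct entry {
        int           file_id;
        void         *addr;        /* start of the mapped array */
        size_t        len;
        struct entry *prev, *next; /* doubly linked LRU chain   */
    };

    struct cache {
        struct entry *head;        /* most recently used */
        struct entry *tail;        /* eviction candidate */
    };

    /* On every access, move the hot entry to the head of the chain. */
    static void touch(struct cache *c, struct entry *e)
    {
        if (c->head == e)
            return;
        if (e->prev) e->prev->next = e->next;
        if (e->next) e->next->prev = e->prev;
        if (c->tail == e) c->tail = e->prev;
        e->prev = NULL;
        e->next = c->head;
        if (c->head) c->head->prev = e;
        c->head = e;
        if (!c->tail) c->tail = e;
    }

    /* When memory gets tight (or mmap() fails), unmap from the tail. */
    static void evict(struct cache *c)
    {
        struct entry *e = c->tail;
        if (!e)
            return;
        c->tail = e->prev;
        if (c->tail) c->tail->next = NULL;
        else         c->head = NULL;
        munmap(e->addr, e->len);
        /* caller frees e and removes it from the hash table */
    }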
After that, I realised that I was doing double buffering: the OS itself has a perfectly good LRU disk-buffer mechanism, which is probably smarter than mine. Just open()ing or mmap()ing every single file on every request is only one system call away and, given recent activity, just as fast as or even faster than my own buffering layer.
Regarding a DBMS: using one is a clean design, but you pay the overhead of at least three system calls just to get the first block of data, and it will certainly (always) block. On the other hand, it lends itself reasonably well to a multi-threaded design and relieves you of the pain of locks and buffer management.

Related

Directly accessing video memory within the Linux kernel in a driver-agnostic manner

TL;DR at end in bold if you don't want rationale/context (which I'm providing since it's always good to explain the core issue and not simply ask for help with Method X which might not be the best approach)
I frequently do software performance analysis on older hardware, which shows up race errors, single-frame graphical glitches and other issues more readily than more modern silicon.
Often, it would be really cool to be able to take screenshots of a misbehaving application that might render garbage for one or two frames or display erroneous values for a few fractions of a second. Unfortunately, problems most frequently arise when the systems in question are swapping heavily to disk, making it consistently unlikely that the screenshots I try to take will contain the bugs I'm trying to capture.
The obvious solution would be a capture device, and I definitely want to explore pixel-perfect image and video recording in the future when I have the resources for that (it sounds like a hugely fun opportunity to explore FPGAs).
I recently realized, however, that the kernel is what is performing the swapping, and that if I move screenshotting into kernelspace, I don't have to wait for my screenshot keystroke to make its way through the X input layer into the screenshot program, then wait for that to do its XSHM dance and fetch the screenshot data, all while the system is heavily I/O loaded (e.g., 5-second system load of >10) - I can simply have the kernel memcpy() the displayed area of video memory to a preallocated buffer at the exact fraction of a second I hit PrtSc!
TL;DR: Where should I start looking to figure out how to "portably" (within the sense of Linux having different graphics drivers, each with different architectural designs) access the currently-displayed area of video memory?
I get the impression I should be looking at libdrm, possibly within KMS, but I would really appreciate some pointers to knowing what actually accesses video memory.
I'm also guessing there are probably some caveats and gotchas to reading video memory directly on certain chipsets? I don't expect my code to make it into the Linux kernel (who knows, but I doubt it) but I'd still like whatever I build to be fairly portable across computers for convenience.
NOTE: I am not using compositing with the systems in question, in case this changes anything. I'm interested to know whether I could write a compositing-compatible system; I suspect this would be nontrivial.
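For concreteness, the simplest thing that reads displayed pixels without going through X at all is the legacy fbdev interface. This is not the kernel-side or KMS/libdrm approach being asked about, and it only works where a framebuffer driver is bound, but it shows the geometry/stride bookkeeping any such code needs (a sketch, assuming /dev/fb0 exists):

    #include <fcntl.h>
    #include <linux/fb.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/fb0", O_RDONLY);
        if (fd < 0) { perror("open /dev/fb0"); return 1; }

        struct fb_var_screeninfo var;
        struct fb_fix_screeninfo fix;
        if (ioctl(fd, FBIOGET_VSCREENINFO, &var) < 0 ||
            ioctl(fd, FBIOGET_FSCREENINFO, &fix) < 0) {
            perror("ioctl");
            return 1;
        }

        /* line_length is the stride in bytes; it may be larger than
         * xres * bytes-per-pixel, so a real tool copies row by row. */
        size_t bpp   = var.bits_per_pixel / 8;
        size_t bytes = (size_t)fix.line_length * var.yres;
        unsigned char *buf = malloc(bytes);
        if (buf && read(fd, buf, bytes) == (ssize_t)bytes)
            fprintf(stderr, "captured %ux%u, %zu bytes per pixel\n",
                    var.xres, var.yres, bpp);
        free(buf);
        close(fd);
        return 0;
    }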

What is the most efficient way to traverse a hard disk and determine whether a given block is currently in use?

I'm working on a personal project to regularly (monthly-ish) traverse my hard disk and shred (overwrite with zeros) any blocks on the disk not currently allocated to any inode(s).
C seemed like the most logical language to do this in given the low-level nature of the project, but I am not sure how best to find the unused blocks in the filesystem. I've found some questions around S.O. and other places that are similar to this, but did not see any consensus on the best way to efficiently and effectively find these unused blocks.
df has come up in every question even remotely similar to this one, but I don't believe it has the resolution necessary to specify exact block offsets, unless I am missing something. Is there another utility I should look into, or some other direction entirely?
Whatever solution I develop would need to be able to handle, at minimum, ext3 filesystems, and preferably ext4 also.
There isn't really any general solution for finding out which blocks are in use, other than writing your own implementation that reads and parses the on-disk filesystem data, which is highly specific to the filesystems you want to support. How the data is laid out on disk is often undocumented outside of the code for that filesystem, and when it is documented, the documentation is often out of date compared to the actual implementation.
Your best bet is to read the implementation of fsck for the filesystem you want to support since it does more or less what you're interested in, but be warned that many fsck implementations out there don't always check all of the data that belongs to the filesystem. You might have alternate superblocks and certain metadata that fsck doesn't check (or only checks in case the primary superblock is corrupted).
If you really want to do what you describe, and not just learn about filesystems, your best bet is to dump your filesystem like a normal backup, wipe the disk, and restore the backup. I highly doubt anything else is safe, especially considering that your disk-wiping application might break your filesystem after any kernel update.
Linux currently supports over a dozen different filesystems, so the answer will depend on which one you choose.
However, they should all have an easy way of finding free blocks; otherwise, creating new files or extending existing ones would be rather slow.
For example, ext2 has, at the start of each block group, metadata that includes a block bitmap recording which blocks in that group are free. I don't believe this has fundamentally changed in ext4, even though there's a lot of extra machinery in there.
You would probably be far better off walking the free blocks recorded in those block-group structures rather than taking an arbitrary block and trying to figure out whether it's used or free.
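For ext2/3/4 specifically, rather than parsing the on-disk structures by hand, the block bitmaps can be read through libext2fs (the library behind e2fsprogs). A rough sketch of listing free block numbers, assuming the e2fsprogs development headers are installed and the device is unmounted:

    #include <ext2fs/ext2fs.h>
    #include <et/com_err.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s /dev/sdXN (unmounted!)\n", argv[0]);
            return 1;
        }

        ext2_filsys fs;
        errcode_t err = ext2fs_open(argv[1], 0, 0, 0, unix_io_manager, &fs);
        if (err) { com_err(argv[0], err, "while opening %s", argv[1]); return 1; }

        err = ext2fs_read_block_bitmap(fs);
        if (err) { com_err(argv[0], err, "while reading block bitmap"); return 1; }

        /* Walk every block and print the free ones. */
        blk_t blk;
        for (blk = fs->super->s_first_data_block;
             blk < fs->super->s_blocks_count; blk++) {
            if (!ext2fs_test_block_bitmap(fs->block_map, blk))
                printf("%u\n", blk);   /* free block number */
        }

        ext2fs_close(fs);
        return 0;
    }

Link with -lext2fs -lcom_err. Tools such as dumpe2fs and zerofree (which overwrites unused ext2/3/4 blocks with zeros, essentially this project) are built on the same interfaces.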

Securely remove file from ext3 linux

This question has been asked with varying degrees of success in the past...
Are there tools, or C/C++ unix functions to call that would enable me to retrieve the location on disk of a file? Not some virtual address of the file, but the disk/sector/block the file resides in?
The goal here is to enable overwriting of the actual bits that exist on disk. I would probably need a way to bypass the kernel's superimposition of addresses. I am willing to consider an x86 asm based solution...
However, I feel there are tools that do this quite well already.
Thanks for any input on this.
Removing files securely is only possible under very specific circumstances:
There are no uncontrolled layers of indirection between the OS and the actual storage medium.
On modern systems that can no longer be assumed. SSD drives with firmware wear-leveling code do not work like this; they may move or copy data at will with no logging or possibility of outside control. Even magnetic disk drives will routinely leave existing data in sectors that have been remapped after a failure. Hybrid drives do both...
The ATA specification does support a SECURE ERASE command which erases a whole drive, but I do not know how thorough the existing implementations are.
The filesystem driver has a stable and unique mapping of files to physical blocks at all times.
I believe that ext2 has this property. I think ext3 and ext4 also behave this way in the default journaling mode, but not when mounted with the data=journal option, which causes file data, not just metadata, to be stored in the journal.
On the other hand reiserfs definitely works differently, since it stores small amounts of data along with the metadata, unless mounted with the notail option.
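Where that stable mapping does hold, the logical-to-physical block mapping of a file can be queried with the FIBMAP ioctl. A minimal sketch (FIBMAP needs root, FIGETBSZ returns the filesystem block size, and the interface is limited to 32-bit block numbers; FIEMAP is the newer, extent-based alternative):

    #include <fcntl.h>
    #include <linux/fs.h>      /* FIBMAP, FIGETBSZ */
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        int blksz;
        struct stat st;
        if (ioctl(fd, FIGETBSZ, &blksz) < 0 || fstat(fd, &st) < 0) {
            perror("ioctl/fstat");
            return 1;
        }

        /* For each logical block of the file, FIBMAP returns the
         * physical block number on the underlying device (0 = hole). */
        long nblocks = (st.st_size + blksz - 1) / blksz;
        for (long i = 0; i < nblocks; i++) {
            int blk = (int)i;          /* in: logical, out: physical */
            if (ioctl(fd, FIBMAP, &blk) < 0) { perror("FIBMAP"); return 1; }
            printf("logical %ld -> physical %d\n", i, blk);
        }
        close(fd);
        return 0;
    }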
If these two conditions are met, then a program such as shred may be able to securely remove the content of a file by overwriting its content multiple times.
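Conceptually, the overwrite step is just rewriting the file in place and forcing it to the device before unlinking. A bare-bones single-pass version (shred itself does multiple patterned passes; the helper name is made up):

    #include <fcntl.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Single-pass, zero-fill overwrite of an existing file, in place.
     * Only meaningful when the filesystem rewrites blocks in place. */
    static int overwrite_in_place(const char *path)
    {
        int fd = open(path, O_WRONLY);
        if (fd < 0)
            return -1;

        struct stat st;
        if (fstat(fd, &st) < 0) { close(fd); return -1; }

        char buf[4096];
        memset(buf, 0, sizeof buf);

        off_t left = st.st_size;
        while (left > 0) {
            size_t n = left > (off_t)sizeof buf ? sizeof buf : (size_t)left;
            if (write(fd, buf, n) != (ssize_t)n) { close(fd); return -1; }
            left -= n;
        }
        fsync(fd);           /* force the new contents to the device */
        close(fd);
        return unlink(path); /* only now remove the directory entry  */
    }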
This method still does not take into account:
Backups
Virtualized storage
Left over data in the swap space
...
Bottom line:
You can no longer assume that secure deletion is possible. Better assume that it is impossible and use encryption; you should probably be using it anyway if you are handling sensitive data.
There is a reason that protocols regarding sensitive data mandate the physical destruction of the storage medium. There are companies that actually demagnetize their hard disk drives and then shred them before incinerating the remains...

File in both LKM (kernel module) and user space

I remember reading about this concept somewhere, though I do not remember where.
I have a file, say file.c, which I compile along with some other files into a library for use by applications.
Now suppose I compile the same file and build it into a kernel module. The same object code now exists in both user space and kernel space, which (I hope) lets me access kernel data structures without invoking a system call. In other words, I could have APIs in the library through which applications access kernel data structures without system calls. I am not sure whether I could write anything into the kernel this way (which I think is impossible), but would reading some kernel data structures like this work?
Can anyone give me more details about this approach? I could not find anything on Google about it.
I believe this is a conceptually flawed approach, unless I misunderstand what you're talking about.
If I understand you correctly, you want to take the same file and compile it twice: once as a module and once as a userspace program. Then you want to run both of them, so that they can share memory.
So, the obvious problem with that is that even though the programs come from the same source code, they would still exist as separate executables. The module won't be its own process: it would only get invoked when the kernel gets going (i.e. system calls). So by itself, it doesn't let you escape the system call nonsense.
A better solution depends on what your goal is: do you simply want to access kernel data structures because you need something that you can't normally get at? Or, are you concerned about performance and want to access these structures faster than a system call?
For (1), you can create a character device or a procfs file. Both of these allow your userspace programs to reach their dirty little fingers into the kernel.
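To give a feel for option (1), here is a minimal read-only procfs file. This is only a sketch: the procfs API has changed across kernel versions (proc_create_single() assumes a reasonably recent kernel), and the entry name "peek_demo" and the exported value are placeholders.

    #include <linux/module.h>
    #include <linux/proc_fs.h>
    #include <linux/seq_file.h>
    #include <linux/jiffies.h>

    /* Expose one kernel value (jiffies here, as a stand-in for whatever
     * structure you actually care about) via /proc/peek_demo. */
    static int peek_show(struct seq_file *m, void *v)
    {
        seq_printf(m, "jiffies: %lu\n", jiffies);
        return 0;
    }

    static int __init peek_init(void)
    {
        if (!proc_create_single("peek_demo", 0444, NULL, peek_show))
            return -ENOMEM;
        return 0;
    }

    static void __exit peek_exit(void)
    {
        remove_proc_entry("peek_demo", NULL);
    }

    module_init(peek_init);
    module_exit(peek_exit);
    MODULE_LICENSE("GPL");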
For (2), you are in a tough spot, and the problem gets a lot nastier (and more interesting). The right way to solve the speed issue depends a lot on exactly what data you're trying to extract.
Does this help?
There are two ways to do this, the most common being what's called a Character Device, and the other being a Block Device (i.e. something "disk-like").
Here's a guide on how to create drivers that register chardevs.
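A character device is only slightly more code. A hedged skeleton using the kernel's misc-device helper (the device name "chardev_demo" and the message are made up):

    #include <linux/fs.h>
    #include <linux/miscdevice.h>
    #include <linux/module.h>
    #include <linux/uaccess.h>

    static const char msg[] = "hello from kernel space\n";

    static ssize_t demo_read(struct file *f, char __user *buf,
                             size_t count, loff_t *ppos)
    {
        return simple_read_from_buffer(buf, count, ppos, msg, sizeof(msg) - 1);
    }

    static const struct file_operations demo_fops = {
        .owner = THIS_MODULE,
        .read  = demo_read,
    };

    static struct miscdevice demo_dev = {
        .minor = MISC_DYNAMIC_MINOR,
        .name  = "chardev_demo",   /* appears as /dev/chardev_demo */
        .fops  = &demo_fops,
    };

    static int __init demo_init(void)  { return misc_register(&demo_dev); }
    static void __exit demo_exit(void) { misc_deregister(&demo_dev); }

    module_init(demo_init);
    module_exit(demo_exit);
    MODULE_LICENSE("GPL");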

Are there operating systems that aren't based off of or don't use a file/directory system?

It seems like there isn't anything inherent in an operating system that would necessarily require that sort of abstraction/metaphor.
If so, what are they? Are they still used anywhere? I'd be especially interested in knowing about examples that can be run/experimented with on a standard desktop computer.
Examples are Persistent Haskell, Squeak Smalltalk, and KeyKOS and its descendants.
It seems like there isn't anything inherent in an operating system that would necessarily require that sort of abstraction/metaphor.
There isn't any necessity; it's completely bogus. In fact, forcing everything to be accessible via a human-readable name is fundamentally flawed, and precludes security due to Zooko's triangle.
Similar hierarchies appear in DNS, URLs, programming-language module systems (Python and Java are two good examples), torrents, and X.509 PKI.
One system that fixes some of the problems caused by DNS/URLs/X.509 PKI is Waterken's YURL.
All these systems exhibit ridiculous problems because the system is designed around some fancy hierarchy instead of for something that actually matters.
I've been planning on writing some blogs explaining why these types of systems are bad, I'll update with links to them when I get around to it.
I found this http://pages.stern.nyu.edu/~marriaga/papers/beyond-the-hfs.pdf but it's from 2003. Is something like that what you are looking for?
Around 1995, I started to design an object-oriented operating system (SOOOS) that has no file system.
Almost everything is an object that exists in virtual memory, which is mapped/paged directly to disk (either local or networked, i.e. rudimentary cloud computing). There is a lot of overhead in programs just to read and write data in specific formats. Imagine never reading and writing files. In SOOOS there are no such things as files and directories; autonomous objects, which essentially replace files, can be organized to suit your needs rather than being forced into a restrictive hierarchical file system. There are no low-level drive format structures (i.e. clusters) adding extra levels of abstraction and translation overhead. Storage overhead in SOOOS is limited to page tables, which can be indexed as quickly as with ordinary virtual memory paging. Autonomous objects each have their own dynamic virtual memory space, which serves as the persistent data store. When active, they are given a task context, added to the active process task list, and then exist as processes.
A lot of complexity is eliminated in this design: simply instantiate objects in a program and let the memory manager and virtual memory system handle everything consistently, with minimal overhead. Booting the operating system is simply a matter of loading the basic kernel, setting up the virtual memory page tables for the key OS objects, and (re)starting the OS object tasks. When the computer is turned off, shutdown is essentially analogous to hibernation, so the OS is in nearly instant-on status. The parts (pages) of data and code are loaded only as needed.
For example, to edit a document, instead of starting a program by loading an entire executable into memory, you simply load the task control structure of the autonomous object and set the instruction pointer to the function to be performed. The code is paged in only as the instruction pointer traverses its virtual memory. Data is always immediately ready to be used and is paged in only as accessed, with no need to parse files or manage data structures that often have a distinct representation in memory from the one in secondary storage. You simply use the program's native memory allocation mechanism and abstract data types, without disparate and/or redundant data structures. Object-Linking-and-Embedding-style program interaction, memory-mapped I/O, and interprocess communication come practically for free, since memory sharing is implemented using the facilities of the processor's Memory Management Unit.
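The core mechanism being described here (objects whose backing store is simply their virtual memory) can be approximated today with a file-backed mapping. A toy sketch of the idea, not of SOOOS itself; the file name "counter.obj" is made up:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* A "persistent object": whatever lives in this struct survives the
     * process, because its memory is a file-backed shared mapping. */
    struct counter {
        long runs;
    };

    int main(void)
    {
        int fd = open("counter.obj", O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (ftruncate(fd, sizeof(struct counter)) < 0) { perror("ftruncate"); return 1; }

        struct counter *c = mmap(NULL, sizeof *c, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);
        if (c == MAP_FAILED) { perror("mmap"); return 1; }

        c->runs++;                 /* ordinary memory access, no explicit I/O */
        printf("this object has been run %ld times\n", c->runs);

        munmap(c, sizeof *c);      /* pages are written back by the kernel */
        close(fd);
        return 0;
    }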
