I'm creating several programs in C that will have to communicate through files.
They will be using files because the communication will not be linear, e.g. program #5 could use a file that program #2 created.
The execution of these programs will be linear (serial).
There will be a single control program which manages the execution of these cascading programs. This program will be the one creating the files, and it should only pass file names to the programs.
Since disk I/O is slow (let's assume the OS doesn't cache these operations), I would need to use memory-mapped files.
However, the requirement is that the control program can seamlessly switch between regular and memory-mapped files - which means that the cascading programs will have to be unaware of whether they're writing/reading to/from a memory-mapped file or a regular one.
How can I create a file, which presents itself to the rest of the system as a normal file (has a place in the FS hierarchy, a file name, can be read and written), but is in fact in memory and not on the disk?
The terminology you're using here is a little weird - memory-mapping is a way of accessing a file (any file), not a separate type of file from one that's stored on disk.
That being said, if you want to have some of your files written out to disk and some not, the easiest way to do that would be to store them in an in-memory filesystem, such as tmpfs. One of these is usually mounted at /dev/shm on Linux systems.
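For instance, the control program could create each intermediate file either in tmpfs or on disk and hand only the resulting path to the next program, which opens it like any ordinary file. A minimal sketch (the path templates and the --mem flag are made up for illustration):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        /* Switch between a memory-backed file (tmpfs) and a disk file;
         * the cascading programs never see the difference. */
        int in_memory = (argc > 1 && strcmp(argv[1], "--mem") == 0);

        char mempath[]  = "/dev/shm/stage-XXXXXX";  /* tmpfs on most Linux systems */
        char diskpath[] = "/var/tmp/stage-XXXXXX";  /* ordinary disk-backed file */
        char *path = in_memory ? mempath : diskpath;

        int fd = mkstemp(path);                     /* fills in the XXXXXX */
        if (fd < 0) { perror("mkstemp"); return 1; }
        close(fd);

        printf("%s\n", path);   /* pass just the file name down the cascade */
        return 0;
    }

Because mkstemp just returns a path on a normal-looking filesystem, the consuming programs need no changes either way.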
Related
Is there a way to keep a hard drive from reading a certain file? For example: Program A is ordered to open a .txt file, but Program B is overloading the .txt file by opening it hundreds of times a second, so Program A is unable to open it.
So I'm trying to stress test a game engine that relies on extracting all used textures from a single file at once. I think that this extraction method is causing some core problems for the overall game-development experience with the engine. My theory is that the problem is caused by the slow read times of some hard drives, but I'm not sure I'm right about this, and I needed a way to test it.
Most operating systems support file locking and file sharing so that you can establish rules for processes that share access to a file.
.NET, for example (which runs on Windows, Linux, and MacOS), provides the facility to open a file in a variety of sharing modes.
For very rapid access like you describe, you may want to consider a memory-mapped file. They are supported on many operating systems and via various programming languages. .NET also provides support.
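The answer mentions .NET, but the same facilities exist at the OS level. A minimal C sketch of advisory locking with flock(2) on a POSIX system, which would serialize the two programs described in the question:

    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/file.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("shared.txt", O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* Take an exclusive advisory lock; a second process doing the
         * same blocks here until the first one releases the lock. */
        if (flock(fd, LOCK_EX) < 0) { perror("flock"); return 1; }

        /* ... read or modify the file while holding the lock ... */

        flock(fd, LOCK_UN);
        close(fd);
        return 0;
    }

Note that advisory locks only help if every process that touches the file uses them.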
I am wondering if it's possible to write an application that will access a foreign filesystem, but without needing support for that filesystem from the operating system. For example, I'd like to write an app in C that runs on Mac OS X that can browse / copy files from an ext2/ext3 formatted disk. Of course, you'd have to do all the transfers through the application (not through the system using cp or the Finder), but that would be OK for my purpose. Is this possible?
There are user space libraries that allow you to access file systems.
The Linux-NTFS library (libntfs) allows you to access NTFS file systems and there are user space programs like ntfsfix to do things to the file system.
E2fsprogs does the same for ext2, ext3 and ext4 filesystems.
As Basile mentioned, Mtools is another one that provides access to FAT partitions.
There is even a program that does exactly what you're looking for (on Windows): it's called ext2explore and it allows you to access ext2 partitions from Windows.
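To give a flavor of what using such a library looks like, here is a minimal sketch with libext2fs (from e2fsprogs) that lists the root directory of an ext2 image entirely from user space; error handling is trimmed, and you link with -lext2fs -lcom_err:

    #include <stdio.h>
    #include <ext2fs/ext2fs.h>

    /* Callback invoked once per directory entry */
    static int print_entry(struct ext2_dir_entry *dirent, int offset,
                           int blocksize, char *buf, void *priv)
    {
        /* the low byte of name_len holds the actual name length */
        printf("%.*s\n", dirent->name_len & 0xFF, dirent->name);
        return 0;
    }

    int main(int argc, char **argv)
    {
        ext2_filsys fs;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <image-or-device>\n", argv[0]);
            return 1;
        }

        /* Open the filesystem purely in user space -- no kernel driver */
        if (ext2fs_open(argv[1], 0, 0, 0, unix_io_manager, &fs)) {
            fprintf(stderr, "not an ext2/3/4 image?\n");
            return 1;
        }

        /* Walk the root directory and print each entry's name */
        ext2fs_dir_iterate(fs, EXT2_ROOT_INO, 0, NULL, print_entry, NULL);

        ext2fs_close(fs);
        return 0;
    }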
It is possible. For example, the GNU mtools utilities do exactly that (given a way to access the raw device or partition) for MS-DOS FAT file systems.
However, file systems inside the kernel are usually very well tested and optimized.
Yes and no. For a regular user application it is usually not possible, because access to block devices is restricted to root. The application would need read/write permission on each block device it wants to touch. At best this would need a client/server approach, where a service is started on the machine and configured to grant permissions on a per-block-device basis.
The somewhat easier alternative would be to use the MacFUSE implementation.
Look here:
http://code.google.com/p/macfuse/
http://groups.google.com/group/macfuse?pli=1
The MacFUSE project seems to be no longer maintained, but it can give you a starting point for your project.
The quick-and-dirty approach is to run the following as root: chmod 666 /dev/diskN
You can hijack syscalls and library calls from your application and then redirect reads/writes to anything like a KV store or a distributed DB layer (using the regular calls for the "virtual devices" that you do not support).
Then, the possibilities are boundless because you don't have to reach the physical/virtual devices when someone asks for them (resolving privilege issues).
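On Linux, the usual way to pull this off without kernel support is an LD_PRELOAD shim. A minimal sketch that intercepts open(2) and leaves the actual redirection as a stub; compile with cc -shared -fPIC shim.c -o shim.so -ldl and run any program under LD_PRELOAD=./shim.so:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdarg.h>
    #include <fcntl.h>
    #include <dlfcn.h>

    static int (*real_open)(const char *, int, ...);

    int open(const char *path, int flags, ...)
    {
        mode_t mode = 0;
        if (!real_open)     /* look up libc's open the first time through */
            real_open = (int (*)(const char *, int, ...))
                        dlsym(RTLD_NEXT, "open");

        if (flags & O_CREAT) {          /* mode is only passed with O_CREAT */
            va_list ap;
            va_start(ap, flags);
            mode = (mode_t)va_arg(ap, int);
            va_end(ap);
        }

        /* Redirection hook: inspect 'path' here and route the request to
         * a KV store, a DB layer, or the real file. This sketch just logs. */
        fprintf(stderr, "open(\"%s\") intercepted\n", path);
        return real_open(path, flags, mode);
    }

Be aware that statically linked programs and direct syscalls bypass this kind of shim.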
I've heard that if we append some bytes at the end of an EXE file, it can still work properly. Is it true in all cases? And is it a safe approach?
I intend to write the demo using data stored in the program's executable file, so it is safe (at least from normal users) and I don't have to store the data anywhere else.
This is impossible to answer with a definite yes or no.
I assume you will store data at the end of your executable in lieu of storing program state in a configuration file. I further assume you're doing this for fun and the end result does not need to be perfect.
Any code signing mechanism that might be in place on your platform will cry foul with these sorts of tricks. The signatures will only be valid if the executable does not materially change. (At least, in the code signing mechanism I helped implement, the cryptographic signature was computed over the entire contents of the executable -- with the exception of the signature itself -- not just segments marked as executable or data in the program headers.)
Any anti-virus mechanisms that might be in place on your platform will cry foul with these sorts of tricks. Self-modifying code is definitely associated with programs attempting to remain hidden or obscure, and code that writes to itself is going to trigger alarms in behavioral anti-virus tools.
Tools such as tripwire or mtree will always complain about your program. rpm -qa or debsums will always report problems with your executable. It will be difficult to reliably transfer this program from site to site.
The permissions on standard executables in most environments outright forbid this behavior. User accounts do not have privileges to modify most executables on the system; only executables owned by the user running them could be written to. (And even then, a mandatory access control system such as AppArmor, SELinux, TOMOYO, or SMACK could forbid a process from writing to its program file, if properly configured. And almost every reasonable security profile would forbid it.)
No system administrator would let two users execute and write to the executable.
You also have the pragmatic problem of finding the executable file in the first place. At least Linux provides /proc/self/exe, but (a) system administrators may not have /proc mounted, (b) system administrators may not let most processes use it, and (c) if the executable file is replaced while the program is executing, finding the correct file to modify may be difficult.
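For illustration, resolving the running binary's path on Linux is a single readlink(2) call, and it fails in exactly the cases (a) through (c) above:

    #include <stdio.h>
    #include <limits.h>
    #include <unistd.h>

    int main(void)
    {
        char path[PATH_MAX];
        ssize_t n = readlink("/proc/self/exe", path, sizeof(path) - 1);
        if (n < 0) { perror("readlink"); return 1; }
        path[n] = '\0';                /* readlink does not NUL-terminate */
        printf("running from: %s\n", path);
        return 0;
    }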
You also have to decide between two methods of updating the executable: either you modify the existing file in place (fseek(3) and fwrite(3)), or you write the changed contents to a new file and replace the executable. Both approaches are troublesome. If you modify the file, you might have several copies executing simultaneously, trying to write conflicting edits to the file; clever programming can avoid huge problems, but the executable might still not be in a consistent state. If you replace the file, you might have several copies executing simultaneously, and the disk copies of the executable might not be freed and actually removable until the system is rebooted. You could have a dozen copies of your executable program file invisibly taking up disk space, and none of them could share memory while executing, increasing memory pressure.
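A sketch of the second method: copy the executable, append a payload, and swap the copy in with rename(2), which replaces the path atomically. The payload format here is made up, and note that the mode bits have to be copied by hand:

    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Copy 'exe', append 'data', then atomically swap the copy into place. */
    static int append_payload(const char *exe, const void *data, size_t len)
    {
        char tmp[4096];
        snprintf(tmp, sizeof(tmp), "%s.new", exe);

        FILE *in = fopen(exe, "rb");
        FILE *out = fopen(tmp, "wb");
        if (!in || !out) return -1;

        char buf[8192];
        size_t n;
        while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
            fwrite(buf, 1, n, out);            /* duplicate the executable */
        fwrite(data, 1, len, out);             /* append the new payload */
        fclose(in);
        fclose(out);

        struct stat st;                        /* preserve the execute bits */
        if (stat(exe, &st) == 0)
            chmod(tmp, st.st_mode);

        /* rename() replaces the path atomically; already-running copies keep
         * the old, now-unlinked image -- the disk-space problem noted above */
        return rename(tmp, exe);
    }

    int main(int argc, char **argv)
    {
        const char payload[] = "SAVE:level=3";
        if (argc != 2) return 1;
        return append_payload(argv[1], payload, sizeof(payload) - 1) ? 1 : 0;
    }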
Yes, it's possible to keep configuration data in the program executable, and even make it work in some environments. But it isn't production-quality.
Most games come with their resources (models, textures, etc.) packed into special files (like .pk3 files in Quake 3 for example). Apparently, those files somehow get "mounted" and are used as if they were separate file systems.
I'd like to know how this is achieved. The only strategy I came up with so far is placing offset-size information in the file's header, then memory-mapping the file and accessing the resources as if they were independent write-protected chunks of memory.
I'd like to know if my strategy is viable and if there are better alternatives.
Thanks!
Your strategy is quite reasonable; in fact, it's an exact analogue of what a filesystem driver does with a raw block device. What problems are you having with that implementation?
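To make that concrete, here is a minimal sketch of such a pack-file reader. The layout (a header, a table of {name, offset, size} records, then raw data) is invented for illustration and is not Quake 3's actual format:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    struct pack_entry {            /* one table-of-contents record */
        char     name[56];
        uint64_t offset;           /* where the resource starts in the file */
        uint64_t size;             /* resource length in bytes */
    };

    struct pack_header {
        char     magic[4];         /* e.g. "PAK0" */
        uint32_t count;            /* number of entries that follow */
    };

    /* Return a read-only pointer into the mapping for a named resource. */
    static const void *pack_find(const void *base, const char *name,
                                 uint64_t *size)
    {
        const struct pack_header *hdr = base;
        const struct pack_entry *toc = (const void *)(hdr + 1);
        for (uint32_t i = 0; i < hdr->count; i++)
            if (strcmp(toc[i].name, name) == 0) {
                *size = toc[i].size;
                return (const char *)base + toc[i].offset;
            }
        return NULL;
    }

    int main(int argc, char **argv)
    {
        if (argc != 3) return 1;
        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) return 1;

        /* Map the whole pack read-only; pages are faulted in on demand,
         * so unused resources never touch memory. */
        void *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (base == MAP_FAILED) return 1;

        uint64_t size;
        const void *res = pack_find(base, argv[2], &size);
        if (res)
            printf("%s: %llu bytes\n", argv[2], (unsigned long long)size);

        munmap(base, st.st_size);
        close(fd);
        return 0;
    }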
Your approach sounds reasonable. Basically you'll have an ISAM file with (optional) metadata. You can split the file into sections ("directories") based on criteria (content type, frequency of use, locality of use, etc) and have a separate index/table of contents for each section. If you allow a section as a content type, then you can nest them and handle them in a consistent fashion.
If your requirements are fairly basic, you could consider simply using a zip/tar file as a container. Looking at the code for these would probably be a good place to start, regardless.
I can't speak to the exact format of Quake 3, but there are several approaches to do what you need:
archive files (ZIP, Tar, etc)
compound storage of various kinds; on Windows there's Microsoft Structured Storage
embedded single-file database
virtual file system, which implements an actual file system but keeps it somewhere other than a disk partition (in resources, in memory, etc.)
Each of those approaches has its own strengths and weaknesses.
Our SolFS product is an example of virtual file system mentioned above. It was designed for tasks similar to yours.
Archives and some compound file implementations usually have a linear sequential structure: the files are written one by one, with a directory at the beginning or at the end of the file.
Some compound file implementations, databases, and virtual file systems have a page-based structure (a "page" is similar to a sector or cluster in FAT or NTFS) where files can be scattered across the storage.
I am wondering how the OS is reading/writing to the hard drive.
I would like as an exercise to implement a simple filesystem with no directories that can read and write files.
Where do I start?
Will C/C++ do the trick or do I have to go with a more low level approach?
Is it too much for one person to handle?
Take a look at FUSE: http://fuse.sourceforge.net/
This will allow you to write a filesystem without having to actually write a device driver. From there, I'd start with a single file. Basically create a file that's (for example) 100MB in length, then write your routines to read and write from that file.
Once you're happy with the results, then you can look into writing a device driver, and making your driver run against a physical disk.
The nice thing is you can use almost any language with FUSE, not just C/C++.
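For a taste of the API, the classic libfuse "hello" filesystem fits on one page: a read-only filesystem exposing a single file. This sketch uses the FUSE 2.x high-level API; build with cc hello.c `pkg-config fuse --cflags --libs`:

    #define FUSE_USE_VERSION 26
    #include <fuse.h>
    #include <string.h>
    #include <errno.h>
    #include <sys/stat.h>

    static const char *content = "hello from user space\n";

    static int hello_getattr(const char *path, struct stat *st)
    {
        memset(st, 0, sizeof(*st));
        if (strcmp(path, "/") == 0) {
            st->st_mode = S_IFDIR | 0755;       /* the root directory */
            st->st_nlink = 2;
        } else if (strcmp(path, "/hello") == 0) {
            st->st_mode = S_IFREG | 0444;       /* one read-only file */
            st->st_nlink = 1;
            st->st_size = strlen(content);
        } else
            return -ENOENT;
        return 0;
    }

    static int hello_readdir(const char *path, void *buf, fuse_fill_dir_t fill,
                             off_t off, struct fuse_file_info *fi)
    {
        if (strcmp(path, "/") != 0)
            return -ENOENT;
        fill(buf, ".", NULL, 0);
        fill(buf, "..", NULL, 0);
        fill(buf, "hello", NULL, 0);
        return 0;
    }

    static int hello_read(const char *path, char *buf, size_t size, off_t off,
                          struct fuse_file_info *fi)
    {
        size_t len = strlen(content);
        if (strcmp(path, "/hello") != 0)
            return -ENOENT;
        if ((size_t)off >= len)
            return 0;
        if (off + size > len)
            size = len - off;
        memcpy(buf, content + off, size);
        return size;
    }

    static struct fuse_operations hello_ops = {
        .getattr = hello_getattr,
        .readdir = hello_readdir,
        .read    = hello_read,
    };

    int main(int argc, char **argv)
    {
        return fuse_main(argc, argv, &hello_ops, NULL);
    }

Run it as ./hello <mountpoint>, read <mountpoint>/hello with cat, and unmount with fusermount -u <mountpoint>.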
I found it quite easy to understand a simple filesystem by using the FAT filesystem on an AVR microcontroller.
http://elm-chan.org/fsw/ff/00index_e.html
Take a look at the code and you will figure out how FAT works.
For learning the ideas of a filesystem, I think it's not really necessary to use a disk. Just create an array of 512-byte byte-arrays, imagine it is your hard disk, and start to experiment a bit (see the sketch below).
Also, you may want to have a look at some of the standard OS textbooks, like http://codex.cs.yale.edu/avi/os-book/OS8/os8c/index.html
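A minimal sketch of that in-memory "disk": an array of 512-byte sectors behind read/write-by-sector routines, which a toy filesystem can be built on before it ever touches real hardware:

    #include <stdio.h>
    #include <string.h>

    #define SECTOR_SIZE 512
    #define NUM_SECTORS 2048                 /* a 1 MiB pretend disk */

    static unsigned char disk[NUM_SECTORS][SECTOR_SIZE];

    int disk_read(unsigned sector, void *buf)
    {
        if (sector >= NUM_SECTORS) return -1;
        memcpy(buf, disk[sector], SECTOR_SIZE);
        return 0;
    }

    int disk_write(unsigned sector, const void *buf)
    {
        if (sector >= NUM_SECTORS) return -1;
        memcpy(disk[sector], buf, SECTOR_SIZE);
        return 0;
    }

    int main(void)
    {
        unsigned char sector[SECTOR_SIZE] = "superblock goes here";
        disk_write(0, sector);               /* sector 0: the "superblock" */

        unsigned char check[SECTOR_SIZE];
        disk_read(0, check);
        printf("%s\n", check);
        return 0;
    }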
The answer to your first question is that, besides FUSE (as someone else told you), you can also use Dokan, which does the same for Windows; from there it is just a matter of doing reads and writes to a physical partition (http://msdn.microsoft.com/en-us/library/aa363858%28v=vs.85%29.aspx; read particularly the section on physical disks and volumes).
Of course, on Linux or Unix, besides using something like FUSE, you only have to issue a read or write call to the wanted device in /dev/xxx (if you are root); in these terms the Unices are more friendly, or more insecure, depending on your point of view.
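For example, reading the first sector of a raw device on Unix takes nothing more than open(2) and read(2). A sketch (run as root; the default device path is just an example):

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *dev = (argc > 1) ? argv[1] : "/dev/sda";
        unsigned char sector[512];

        int fd = open(dev, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        if (read(fd, sector, sizeof(sector)) != sizeof(sector)) {
            perror("read");
            return 1;
        }

        /* an MBR-partitioned disk ends its first sector with 0x55 0xAA */
        printf("last two bytes of sector 0: %02x %02x\n",
               sector[510], sector[511]);

        close(fd);
        return 0;
    }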
From there, try to implement a simple filesystem like FAT, something more exotic like a tar filesystem, something based on Unix concepts like UFS or Minix, or even just something that logs the calls that are made and their arguments to a log file (this will help you understand the calls that are made to the filesystem driver during regular use of your computer).
Now your second question (which is much simpler to answer): yes, C/C++ will do the trick, since they are the lingua franca of systems development; a lot of the example code you find will also be in C/C++, so you will at least be reading C/C++ during your development.
Now for your third question: yes, this is doable by one person. For example, the ext filesystem (widely known in the Linux world through its successors ext2 and ext3) was made by a single developer, Rémy Card, so don't think these things aren't doable by a single person.
Now the final notes. Remember that a real filesystem interacts with a lot of other subsystems in a regular kernel. For example, if you have a laptop and hibernate it, the filesystem has to flush all changes made to the open files. If you have a pagefile on the partition (or even if the pagefile has its own filesystem), that will affect your filesystem, particularly the block sizes, since they will tend to be equal to, or powers of, the page size: placing a filesystem block in memory is easy when it coincides with the page size, because that's just one transfer.
There is also security, since you will want to control the users and which files they read/write, and that usually means that before opening a file you will have to know which user is logged on and what permissions they have for that file. And obviously, without a filesystem, users can't run any program or interact with the machine. Modern filesystem layers also interact with the network subsystem, due to the fact that there are network and distributed filesystems.
So if you want to go and learn about doing kernel filesystems, those are some of the things you will have to worry about (besides knowing the VFS interface).
P.S.: If you want to make Unix permissions work on Windows, you can use something like what MS uses for NFS on the server versions of Windows (http://support.microsoft.com/kb/262965)