I'm running some very specialized experiments for a research project. These experiments call for controlling memory accesses: my application should not, under any circumstances, swap information with the disk. That is, all information the application needs must stay in RAM for the duration of the execution, but it should use as much RAM as possible.
My question is: is there any way I can control disk access by my application, or at least count disk accesses for later analysis?
This is using C and Linux.
Please let me know if I can clarify the question... I've been working on this for so long that I assume everybody knows exactly what I'm talking about.
One thing you can do is create a ramfs, or RAM file system. Are you working on a Unix platform? If so, you can check out mount and umount for how to create and remove one.
http://linux.die.net/man/8/mount
http://linux.die.net/man/8/umount
Basically, you create a file system that is stored in your RAM, so you no longer have to deal with disk read/write time at all. If I read your question correctly, you want to avoid disk access wherever you can. This is very simple to do, since you can have multiple file systems mounted at once, some located on a hard drive and some in memory.
http://www.cyberciti.biz/faq/howto-create-linux-ram-disk-filesystem/
http://www.alper.net/linuxunix/linux-ram-based-filesystem/
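The man pages above cover the command-line route; if you'd rather set the mount up from inside the program, here is a minimal sketch using the mount(2) system call with tmpfs. It assumes the process runs with root privileges (CAP_SYS_ADMIN) and that the mount point /mnt/myramfs already exists; both the path and the 1 GiB size option are placeholder choices.

```c
/* Minimal sketch: mounting a RAM-backed file system from C with mount(2).
   Assumes root privileges and an existing /mnt/myramfs directory. */
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    /* Mount a tmpfs capped at 1 GiB at /mnt/myramfs. */
    if (mount("tmpfs", "/mnt/myramfs", "tmpfs", 0, "size=1G") != 0) {
        perror("mount");
        return 1;
    }

    /* ... create and use files under /mnt/myramfs, all held in RAM ... */

    if (umount("/mnt/myramfs") != 0) {
        perror("umount");
        return 1;
    }
    return 0;
}
```

One caveat for your no-swap requirement: tmpfs pages can still be swapped out under memory pressure, whereas ramfs pages never are, so ramfs (or tmpfs combined with mlock) may be the better fit.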
Hope this all helped.
The mlock system call allows you to lock part or all of your process's virtual memory into RAM, preventing it from being written to swap space. Note that another process with root privileges can still access that memory area.
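A minimal sketch of that approach, assuming the process's RLIMIT_MEMLOCK is large enough (or the process runs as root); the 64 MiB buffer size is just an illustrative choice.

```c
/* Minimal sketch: pinning memory in RAM with mlock/mlockall.
   The calls may fail with ENOMEM/EPERM if RLIMIT_MEMLOCK is too low. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 64 * 1024 * 1024;          /* 64 MiB working set */
    void *buf = malloc(len);
    if (buf == NULL)
        return 1;

    /* Lock just this buffer ... */
    if (mlock(buf, len) != 0)
        perror("mlock");

    /* ... or lock everything the process maps, now and in the future. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        perror("mlockall");

    memset(buf, 0, len);                    /* touch the pages */
    /* ... run the experiment ... */

    munlock(buf, len);
    free(buf);
    return 0;
}
```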
When looking into DB storage engines, it seems most use mmap to persist data. However, is there a situation where writing to a cache layer and then writing the binary data to disk using read and write makes sense?
What I'm trying to understand is: what is the difference between mmap/munmap and read/write, and when should I use one or the other?
If you can feasibly use mmap(), it's usually the better way. When you use read()/write(), you have to perform a system call for every operation (although libraries like stdio minimize this with user-mode buffering), and these context switches are expensive. Even if the file block is in the buffer cache, you have to first switch into the kernel to check for it. Additionally, the kernel needs to copy the data from the kernel buffer to the caller's memory.
On the other hand, when you use mmap(), you only have to perform a system call when you first open and map the file. From then on, the virtual memory subsystem keeps the application memory synchronized with the file contents. Context switches are only necessary when you try to access a file block that hasn't yet been paged in from disk, not for each part of the file you try to read or write. When you modify the mapped memory, it gets written back to the file lazily.
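For concreteness, here is a minimal sketch contrasting the two approaches. Error handling is abbreviated, and "data.bin" is a placeholder file name.

```c
/* Minimal sketch contrasting read(2) with mmap(2) for scanning a file. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY);
    struct stat st;
    fstat(fd, &st);

    /* read(): one system call (and one kernel-to-user copy) per chunk. */
    char chunk[4096];
    ssize_t n;
    while ((n = read(fd, chunk, sizeof chunk)) > 0) {
        /* process chunk[0..n) */
    }

    /* mmap(): one call up front, then plain memory accesses;
       the kernel pages data in on demand. */
    const char *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map != MAP_FAILED) {
        long sum = 0;
        for (off_t i = 0; i < st.st_size; i++)
            sum += map[i];                  /* no per-access syscall */
        printf("checksum: %ld\n", sum);
        munmap((void *)map, st.st_size);
    }

    close(fd);
    return 0;
}
```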
For most practical applications, you should use whichever method fits the logic of the application best. The performance difference between the two methods will only be significant in highly time-critical applications. When implementing a library, you can't tell the needs of client applications, so of course you try to wring every bit of performance out of it. But for many other applications, premature optimization is the root of all evil.
Before overwriting data in a file, I would like to be pretty sure the old data is stored on disk. It's potentially a very big file (multiple GB), so in-place updates are needed. Usually writes will be 2 MB or larger (my plan is to use a block size of 4 KB).
Instead of (or in addition to) calling fsync(), I would like to retain (not overwrite) old data on disk until the file system has written the new data. The main reason I don't want to rely on fsync() is that most hard disks lie to you about having completed an fsync.
So what I'm looking for is the typical maximum delay a file system, operating system (for example Windows), or hard drive takes until data is actually written to disk, without using fsync or similar methods. I would like to have real-world numbers if possible. I'm not looking for advice to use fsync.
I know there is no 100% reliable way to do it, but I would like to better understand how operating systems and file systems work in this regard.
What I found so far: 30 seconds is / was the default for /proc/sys/vm/dirty_expire_centisecs (the value is stored in centiseconds, so 3000). Then "dirty pages are flushed (written) to disk ... (when) too much time has elapsed since a page has stayed dirty" (but there I couldn't find the default time). So for Linux, 40 seconds seems to be on the safe side. But is this true for all file systems and disks? What about Windows, Android, and so on? I would like to get an answer that applies to all common operating systems, file systems, and disk types, including Windows, Android, regular hard disks, SSDs, and so on.
Let me restate your problem in only slightly uncharitable terms: you're trying to control the behavior of a physical device that even its driver in the operating system cannot control. What you're trying to do seems impossible if what you want is an actual guarantee, rather than a pretty good guess. If all you want is a pretty good guess, fine, but beware of this and document it accordingly.
You might be able to solve this with the right device driver. The SCSI protocol, for example, has a Force Unit Access (FUA) bit in its READ and WRITE commands that instructs the device to bypass any internal cache. Even if the data were originally written buffered, reading unbuffered should be able to verify that it was actually there.
The only way to reliably make sure that data has been synced is to use the OS-specific syncing mechanism, and even then, as PostgreSQL's reliability documentation points out:
When the operating system sends a write request to the storage hardware, there is little it can do to make sure the data has arrived at a truly non-volatile storage area. Rather, it is the administrator's responsibility to make certain that all storage components ensure data integrity.
So no, there are no truly portable solutions, but it is possible (but hard) to write portable wrappers and deploy a reliable solution.
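As an illustration of the OS-specific mechanism on POSIX systems, here is a hedged sketch of the usual sequence: write the data, fsync() the file, then fsync() the containing directory so the directory entry itself is durable. The function name write_durably and its parameters are invented for this example, and, as discussed above, the drive may still cache the data after all of this.

```c
/* Minimal sketch of "as durable as the OS can promise":
   write, fsync the file, then fsync the containing directory.
   Error handling (partial writes, EINTR) is abbreviated. */
#include <fcntl.h>
#include <unistd.h>

int write_durably(const char *dir, const char *path,
                  const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    close(fd);

    /* fsync the directory so the new entry itself reaches the disk. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```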
First of all, thanks for the information that hard disks lie about flushing data; that was new to me.
Now to your problem: you want to be sure that all data that you write has been written to the disk (lowest level). You are saying that there are two parts which need to be controlled: the time when the OS writes to the hard drive and the time when the hard drive writes to the disk.
Your only solution is to use a fuzzy logic timer to estimate when the data will be written.
In my opinion this is the wrong approach. You have control over when the OS writes to the hard drive, so use that control! Then only the lying hard drive is your problem, and that problem can't be solved reliably. I think you should tell the user/admin that they must take care when choosing a hard drive. Of course, it might still be a good idea to implement the additional timer you proposed.
I believe it's up to you to run a series of tests with different hard drives and Brad Fitzpatrick's tool to get a good estimate of when hard drives will have written all data. But of course, if the hard drive wants to lie, you can never be sure that the data really has been written to the disk.
There are a lot of caches involved in giving users a responsive system.
There is the CPU cache, the kernel/filesystem memory cache, the disk drive's own cache, etc. What you are really asking is: how long does it take to flush all the caches?
Or, another way to look at it is, what happens if the disk drive goes bad? All the flushing is not going to guarantee a successful read or write operation.
Disk drives do go bad eventually. The solution you are looking for is how can you have a redundant cpu/disk drive system such that the system survives a component failure and still keeps working.
You could improve the likelihood that the system will keep working with the aid of hardware such as RAID arrays and other high-availability configurations.
As far as software solutions go, I think the answer is: trust the OS to do the optimal thing. Most of them flush buffers routinely.
This is an old question but still relevant in 2019. For Windows, the answer appears to be "at least after every one second" based on this:
To ensure that the right amount of flushing occurs, the cache manager spawns a process every second called a lazy writer. The lazy writer process queues one-eighth of the pages that have not been flushed recently to be written to disk. It constantly reevaluates the amount of data being flushed for optimal system performance, and if more data needs to be written it queues more data.
To be clear, the above says the lazy writer is spawned every second, which is not the same as writing out data every second, but it's the best I can find so far in my own search for an answer to a similar question. (In my case, I have an Android app which lazily writes data back to disk, and I noticed some data loss when using an interval of 3 seconds, so I am going to reduce it to 1 second and see if that helps. It may hurt performance, but losing data hurts performance a whole lot more if you consider the hours it takes to recover it.)
I'm writing a real time library which exports a standardized interface (VST) and is hosted by external applications.
The library must publish a table that is viewable by any thread in the same process (if it knows where to look) - to be clear, this table must be viewable by ALL dlls in the process space - if they know where to look.
Accessing the table must be fast. Virtual memory seems like overkill, and I've considered using a window handle and a message pump (and I still may), but I'd prefer an even faster method, if one is available.
Also, a shared data segment in the PE is something I'd like to avoid if possible. I think I'd almost rather use a window handle.
I'm not concerned with synchronization at the moment, I can handle that after the fact. I'd just like some suggestions for the fastest technique to publish the table within a process space.
You seem to be confused. All threads in the same process share the same address space, so you don't need any form of IPC: if a thread knows the address of the table, it can access it.
Use CreateFileMapping and pass INVALID_HANDLE_VALUE as the file handle.
This creates named shared memory pages that are accessible to anyone who knows the name.
Don't be alarmed by the fact that the MSDN docs say it's backed by the paging file: it will only go to disk if your physical memory is exhausted, just like regular system memory.
In all other regards, since it's backed by the hardware MMU, it's identical to regular memory.
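A minimal sketch of that approach; the mapping name "Local\\MyVstTable" and the 64 KiB size are placeholder choices, not part of any VST convention.

```c
/* Minimal sketch: a named, pagefile-backed shared mapping visible to every
   module in the process (and to other processes that know the name). */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE hMap = CreateFileMappingA(
        INVALID_HANDLE_VALUE,   /* back the mapping with the paging file */
        NULL,
        PAGE_READWRITE,
        0, 64 * 1024,           /* high/low 32 bits of the mapping size */
        "Local\\MyVstTable");
    if (hMap == NULL)
        return 1;

    void *view = MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    if (view == NULL) {
        CloseHandle(hMap);
        return 1;
    }

    /* Any DLL in the process can OpenFileMappingA("Local\\MyVstTable")
       followed by MapViewOfFile() to see the same table. */
    sprintf((char *)view, "hello from the host");

    UnmapViewOfFile(view);
    CloseHandle(hMap);
    return 0;
}
```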
I need to setup and configure a ram-disk from within my C application. Is it possible?
From what I understand, a RAM disk can be set up, mounted, and resized only by root.
My application would not have that privilege.
Is there any alternative to a RAM disk that can be set up programmatically, if this isn't possible with a RAM disk?
The purpose is to make data available across multiple applications which run at different times and over the network. Since the data is huge (~100-150 GB), a RAM disk set up from within the application would keep the data in memory, and the next application would just use it. This would save the expensive writing and reading of the huge data to and from the hard disk.
Would appreciate help on this.
Edit: a little more clarity on the problem statement. Process A runs on machine1, writes about 100 GB of data to machine2 over NFS, and exits. Process B then runs on machine1 and reads this data (100 GB) from machine2 over NFS. The writing and reading of this huge data set is turning out to be the bottleneck. How can I reduce it?
Use shm_open to create a named shared memory object, followed by ftruncate to set the size you need. You can then mmap part or all of it for writing, close it, and later shm_open it again (using the same name) and mmap it from another process. Once you're done with it, you can shm_unlink it.
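A minimal sketch of that sequence, assuming a Linux system; the object name "/my_dataset" and the 1 GiB size are placeholders (older glibc versions need -lrt at link time).

```c
/* Minimal sketch of shm_open + ftruncate + mmap for data shared
   between processes that run at different times. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t len = 1UL << 30;                          /* 1 GiB, for example */

    int fd = shm_open("/my_dataset", O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, len) != 0)
        return 1;

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return 1;
    close(fd);                                       /* mapping stays valid */

    strcpy(p, "written by process A");               /* fill with data */
    /* The object outlives this process: a later process does
       shm_open("/my_dataset", O_RDWR, 0) + mmap() and sees the same bytes.
       shm_unlink("/my_dataset") removes it when no longer needed. */

    munmap(p, len);
    return 0;
}
```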
Use a regular file, but memory map it. That way, the second process can access it just as easily as the first. The OS caching will take care of keeping the "hot" parts of the file in RAM.
Update, based on your mention of NFS: look at the caching settings in Linux and make them very aggressive, so the kernel caches as much as possible and avoids writing back to disk (or NFS).
The solution for you would be to use shared memory (e.g. with mmap). To circumvent the problem that your two processes do not run at the same time, introduce an additional process (call it the "ramdisk" process) that runs permanently and keeps the shared memory alive, while your other processes connect to it.
Usually you set up a RAM disk using admin tools and use it in your program as a normal filesystem. To share data between different processes you could use shared memory.
I'm not sure what you want to achieve by loading 150GB into memory (are you sure you have that much RAM?).
Ten years ago, I tried putting C header files on a RAM disk to speed up compilation; unfortunately this had no measurable effect, because the normal file system already caches them.
I want to write something to a file (or to memory) just before system shutdown or a service shutdown. On the next restart of the system, is it possible to access the same file or the same memory on the disk before the filesystem loads? The actual requirement is like this: we have a driver that sits between the volume-level drivers and the filesystem driver, and in that part of the driver code, I want to access some memory or a file.
Thanks & Regards,
calvin
The logical thing here is to read/write this into the registry if it is not too big. Is there a reason you do not want to use the registry?
If you need to access large data, you are writing a volume or device filter, and you cannot rely on the ZwOpen/Read/Write/Close functions in the kernel, one approach would be to create the file in user mode, get its device name and cluster chain, and store them in the registry. On the next boot, you can read the device and clusters from the registry and do direct I/O on them.
Since you want to access this before the filesystem loads, my first thought is to allocate and use a block of storage space on the hard drive outside of the filesystem. You can create a hidden mini-partition on the drive and use low-level I/O commands to read and write your data.
This is a common task in the world of embedded systems, and we often implement it by adding some sort of non-volatile memory device into the system (flash, battery-backed DRAM, etc) and reading and writing to that device. Since you likely don't have the same level of control over the available hardware as embedded developers do, the closest analogue I can think of would be to reserve a chunk of space on a physical disk that you can read from without having to mount as a filesystem. A dedicated mini-partition might work the best because if you know the size of it, you can treat it as one big raw-access buffer and can avoid having to hassle with filenames, filesystems, etc.
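For what it's worth, here is a hedged user-mode sketch of the "read sectors directly, bypassing the filesystem" idea on Windows, using CreateFile on a physical drive. A volume filter driver would do this with IRPs in the kernel instead, so treat this only as an illustration of raw access; the drive number and starting sector are placeholder choices, and the call requires administrator rights and sector-aligned offsets and lengths.

```c
/* Minimal sketch: raw, filesystem-independent sector access from user mode. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE h = CreateFileA("\\\\.\\PhysicalDrive1",
                           GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE,
                           NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    /* Seek to the start of the reserved mini-partition (placeholder LBA). */
    LARGE_INTEGER off;
    off.QuadPart = 2048LL * 512;                /* sector 2048, 512-byte sectors */
    SetFilePointerEx(h, off, NULL, FILE_BEGIN);

    BYTE sector[512];
    DWORD got = 0;
    if (ReadFile(h, sector, sizeof sector, &got, NULL))
        printf("read %lu bytes of raw data\n", (unsigned long)got);

    CloseHandle(h);
    return 0;
}
```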