How to set up a programmable RAM disk without root permissions on Linux [closed] - c

I need to set up and configure a RAM disk from within my C application. Is it possible?
From what I understand, a RAM disk can be set up, mounted and resized only by root.
My application would not have that privilege.
Is there any alternative to a RAM disk which can be programmed, if it's not possible with a RAM disk?
The purpose is to make data available across multiple applications which run at different times and over the network. Since the data is huge (~100-150 GB), a RAM-disk implementation from within the application would keep the data in memory, and the next application would just use it. This would avoid the expensive writing and reading of this huge data to and from the hard disk.
Would appreciate help on this.
Edit: A little more clarity on the problem statement. Process A runs on machine1, writes about 100 GB of data to machine2 over NFS, and exits. Process B then runs on machine1 and reads this data (100 GB) from machine2 over NFS. The writing and reading of this huge data is turning out to be the bottleneck. How can I reduce this?

Use shm_open to create a named shared memory object, followed by ftruncate to set the size you need. You can then mmap part or all of it for writing, close it, and again shm_open it (using the same name) and mmap it in another process later. Once you're done with it, you can shm_unlink it.
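For illustration, here is a minimal sketch of that approach for the writing side; the object name "/bigdata" and the 1 GiB size are placeholders, not anything from the question:

    /* Writer sketch: create a named POSIX shared memory object, size it,
     * map it, and fill it. Error handling is reduced to perror/return.
     * The name "/bigdata" and the size are placeholders. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t size = 1UL << 30;               /* 1 GiB for the example */
        int fd = shm_open("/bigdata", O_CREAT | O_RDWR, 0600);
        if (fd == -1) { perror("shm_open"); return 1; }
        if (ftruncate(fd, size) == -1) { perror("ftruncate"); return 1; }

        char *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        memcpy(p, "hello", 6);                       /* produce some data */

        munmap(p, size);
        close(fd);                 /* object persists until shm_unlink() */
        return 0;
    }

A later process would call shm_open("/bigdata", O_RDWR, 0) and mmap with the same size to read the data, and shm_unlink("/bigdata") once it is no longer needed. On older glibc you may need to link with -lrt.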

Use a regular file, but memory map it. That way, the second process can access it just as easily as the first. The OS caching will take care of keeping the "hot" parts of the file in RAM.
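As a rough sketch of the reading side of that approach (the path /data/bigfile is a placeholder):

    /* Reader sketch: map an existing regular file and let the page cache
     * serve the "hot" parts from RAM. The path is a placeholder. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/data/bigfile", O_RDONLY);
        if (fd == -1) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) == -1) { perror("fstat"); return 1; }

        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Optional hint that we will read the mapping sequentially. */
        madvise(p, st.st_size, MADV_SEQUENTIAL);

        /* ... read through p[0 .. st.st_size - 1] ... */

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }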
Update, based on your mention of NFS: look into the Linux caching settings and make them very aggressive, so the kernel caches as much as possible and avoids writing back to disk (or NFS).

The solution for you would be to use shared memory (e.g. with mmap). To circumvent the problem that your two processes do not run at the same time, introduce an additional process (call it the "ramdisk" process) that runs permanently and keeps the memory mapping alive, while your other processes connect to it.
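A minimal sketch of such a keeper process, assuming the POSIX shared memory approach from the other answer; the name "/shared-cache" and the size are placeholders:

    /* "ramdisk" keeper sketch: create a shared memory segment, keep it
     * mapped, and remove it when told to quit. Name/size are placeholders. */
    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static volatile sig_atomic_t quit;
    static void on_term(int sig) { (void)sig; quit = 1; }

    int main(void)
    {
        const size_t size = 1UL << 30;
        int fd = shm_open("/shared-cache", O_CREAT | O_RDWR, 0600);
        if (fd == -1) { perror("shm_open"); return 1; }
        if (ftruncate(fd, size) == -1) { perror("ftruncate"); return 1; }

        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        signal(SIGTERM, on_term);
        while (!quit)
            pause();          /* other processes shm_open()/mmap() meanwhile */

        munmap(p, size);
        shm_unlink("/shared-cache");   /* data is gone after this */
        return 0;
    }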

Usually you set up a RAM disk using admin tools and use it in your program as a normal filesystem. To share data between different processes you could use shared memory.
I'm not sure what you want to achieve by loading 150 GB into memory (are you sure you have that much RAM?).
Ten years ago, I tried to put C header files into a RAM disk to speed up compilation; unfortunately this had no measurable effect, because the normal file system already caches them.

Related

Enforcing the type of medium on a Virtual Memory system [closed]

Suppose I'm designing a software application that requires high bandwidth / low latency memory transfers to operate properly.
My OS uses Virtual Memory addressing.
Is there a way to force the variables (that I choose) to be located in DDR and not on the hard drive, for example?
You're conflating virtual memory with swap memory: virtual memory just means that the address space in which a process operates is an abstraction presenting a very orderly structure, while the actual physical address space is occupied in an almost chaotic manner. And yes, virtual memory is part of memory page swapping, but it's not a synonym for it.
One way to achieve what you want is to simply turn off page swapping for the whole system. It can also be done for specific parts of the virtual address space. But before I explain how to do that, I need to tell you this:
You're approaching this from the wrong angle. The system main memory banks you're referring to as DDR (which is just a particular transfer clocking mode, BTW) are just one level in a whole hierarchy of memory. But actually even system main memory is slow compared to the computational throughput of processors. And this has been so since the dawn of computing. This is why computers have cache memory; small amounts of fast memory. And on modern architectures these caches also form the interface between caching hierarchy layers.
If you perform a memory operation on a modern CPU, this memory operation will hit the cache. If it's a read and the cache is hot, the cache will deliver; otherwise it escalates the operation to the next layer. Writes will affect only the caches in the short term and only propagate to main memory through cache eviction or explicit memory barriers.
Normally you don't want to interfere with the decisions an OS takes regarding virtual memory management; you'll hardly be able to outsmart it. If you have a bunch of data sitting in memory which you access at a high frequency, the memory management will see that and won't even consider paging out that part of memory. I think I'll have to write that out again, in clear words: on every modern OS, regions of memory that are in active and repeated use will not be paged out. If swapping happens, it is because the system is running out of memory and tries to juggle things around. This is called thrashing, and locking pages into memory will not help against it; all it will do is force the OS to go around and kill processes that hog memory (likely your process) to get some breathing space.
Anyway, if you really feel you want to lock pages into memory, have a look at the mlock(2) syscall.
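A minimal sketch of what that looks like; the 64 MiB buffer size is arbitrary, and mlock(2) can fail if RLIMIT_MEMLOCK is too low:

    /* Lock a buffer into RAM so it cannot be paged out. Sketch only;
     * the size is arbitrary and RLIMIT_MEMLOCK may need raising. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    int main(void)
    {
        const size_t len = 64UL << 20;               /* 64 MiB */
        void *buf = malloc(len);
        if (buf == NULL) { perror("malloc"); return 1; }

        if (mlock(buf, len) == -1) {                 /* pin the pages */
            perror("mlock");
            return 1;
        }

        /* ... work on buf, now guaranteed resident ... */

        munlock(buf, len);
        free(buf);
        return 0;
    }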
As far as I can tell, there is no way to force certain variables to be stored in DDR vs. HDD when virtual memory handles the memory translations. What you can do is configure your operating system to use different types of secondary storage for virtual memory, such as solid state disks, HDDs, etc.

Multiple processes read/write files. What API to use?

I have a situation where I need to spawn worker processes. On one side, worker processes should read evenly split parts of a file and pass the data over a socket connection. The other side should read that data and write it out in parallel. I plan to split the source file into parts beforehand so that each process gets only one part of the file to read from or write to.
So I'm already using sockets with read/write. From that, I think, it is better for me to continue to use this simple API. But I cannot find any means of setting the file pointer when using file descriptors. I obviously need that when reading from a file that is divided into read/write parts.
I've heard that mmap can help me somehow. But to my understanding mmap needs a lot of RAM, and my app will run many of these transfers. The app is also quite limited in CPU usage.
The question is, what API should I use?
EDIT: I am on Linux. The filesystem is ext4.
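For what it's worth (this is not from the original thread), with plain file descriptors the offset can be set with lseek(2), or avoided entirely with pread(2)/pwrite(2), which take an explicit offset. A minimal sketch of a worker reading its slice; the path, slice offset and slice length are placeholders:

    /* Sketch: each worker reads only its assigned slice of the file,
     * using pread(2) so no shared file offset is needed.
     * Path, slice offset and length are placeholders. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/path/to/source.file", O_RDONLY);
        if (fd == -1) { perror("open"); return 1; }

        off_t  slice_off = 0;          /* this worker's start offset */
        size_t slice_len = 1 << 20;    /* this worker's slice length */
        char   buf[64 * 1024];

        size_t done = 0;
        while (done < slice_len) {
            size_t want = slice_len - done;
            if (want > sizeof buf) want = sizeof buf;
            ssize_t n = pread(fd, buf, want, slice_off + done);
            if (n < 0) { perror("pread"); break; }
            if (n == 0) break;                      /* end of file */
            /* ... write(sockfd, buf, n) ... */
            done += (size_t)n;
        }
        close(fd);
        return 0;
    }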

Can I check for disk access at runtime?

I'm running some very specialized experiments for a research project. These experiments call for controlling memory accesses: my application should not, under any circumstances, swap information with the disk. That is, all information the application needs must stay in RAM for the duration of the execution, but it should use as much RAM as possible.
My question is: is there any way I can control disk access by my application, or at least count disk accesses for later analysis?
This is using C and Linux.
Please let me know if I can clarify the question... been working on this for so long I think everybody knows exactly what I'm talking about.
One thing you can do is actually create a ramfs, or RAM file system. Are you working on a Unix platform? If so, you can check out mount and umount to see how to create them.
http://linux.die.net/man/8/mount
http://linux.die.net/man/8/umount
Basically, you create a file system stored in your RAM, so you don't have to deal with disk read/write times anymore. If I read your question correctly, you want to avoid disk access if you can. It's really very simple to do, since you can have multiple file systems located both on a hard drive and in memory.
http://www.cyberciti.biz/faq/howto-create-linux-ram-disk-filesystem/
http://www.alper.net/linuxunix/linux-ram-based-filesystem/
Hope this all helped.
The mlock system call allows you to lock part or all of your process's virtual memory into RAM, thus preventing it from being written to swap space. Note that another process with root privileges can still access that memory area.
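A hedged sketch of the whole-process variant, mlockall, which locks everything currently mapped plus future allocations; it needs CAP_IPC_LOCK or a sufficiently high RLIMIT_MEMLOCK:

    /* Sketch: lock the entire process (current and future allocations)
     * into RAM so nothing is swapped out during the experiment. */
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
            perror("mlockall");
            return 1;
        }

        /* ... run the experiment; all pages stay resident ... */

        munlockall();
        return 0;
    }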

Embedded File System and power-off

I am working on an embedded application without any OS that needs to use a file system. I've been over this many times with the people in the project, and some agree with me that the system must perform a proper shutdown whenever there is a power failure, or else the file system might go crazy.
Some people say that it doesn't matter if you simply power off the system and let nature run its course, but I think that's one of the worst things to do, especially if you know this will cause problems and probably shorten your product's lifespan.
In the last paragraph I just assumed that it is a problem, but my question remains:
Does a power down have any effect on the file system?
Here is a list of various techniques to help an embedded system tolerate a power failure. These may not be practical for your particular application.
1. Use a journaling file system. It can tolerate incomplete writes due to power failure, OS crash, etc. Most modern filesystems are journaled, but do your homework to confirm.
2. Unless your application needs the write performance, disable all write caching. Check your disk drivers for caching options. Under Linux/Unix, consider mounting the filesystem in sync mode.
3. Unless it must be writable, make it read-only. Try to keep your application executables and operating system files on their own partition(s), with write protections in place (e.g. mounted read-only in Linux). Your read/write data should be on its own partition. Even if your application data gets corrupted, your system should still be able to boot (albeit with a fail-safe default configuration).
3a. For data that is only written once (e.g. configuration settings), try to keep it mounted read-only most of the time. If there is a settings change, mount it as R/W temporarily, update the data, and then remount it read-only (see the sketch after this list).
3b. Use a technique similar to 3a to handle application/OS updates in the field.
3c. If it is impractical for you to mount the FS as read-only, at least consider opening individual files as read-only (e.g. fp = fopen("configuration.ini", "r")).
4. If possible, use separate devices for your storage. Keeping things in separate partitions provides some protection, but there are still edge cases where a partition table may become corrupt and render the entire drive unreadable. Using physically separate devices further isolates against one corrupt device bringing down the whole system. In a perfect world, you would have at least 4 separate devices:
4a. Boot Loader
4b. Operating System & Application Code
4c. Configuration Settings
4d. Application Data
5. Know the characteristics of your storage devices, and control the brand/model/revision of devices used. Some hard disks ignore cache flush commands from the OS. We had cases where some models of CompactFlash cards would corrupt themselves during a power failure, but the "industrial" models did not have this problem. Of course, this information was not published in any datasheet and had to be gathered by experimental testing. We developed a list of approved CF cards and kept an inventory of those cards. We periodically had to update this list as older cards became obsolete or the manufacturer made a revision.
6. Put your temporary files in a RAM disk. If you keep those writes off-disk, you eliminate them as a potential source of corruption. You also reduce flash wear and tear.
7. Develop automated corruption detection and recovery methods. All of the above techniques will not help you if the application simply hangs because of a missing config file. You need to be able to recover as gracefully as possible:
7a. Your system should maintain at least two copies of its configuration settings, a "primary" and a "backup". If the primary fails for some reason, switch to the backup. You should also consider mechanisms for making backups whenever the configuration is changed, or after a configuration has been declared "good" by the user (testing vs. production mode).
7b. Did your Application Data partition fail to mount? Automatically run chkdsk/fsck.
7c. Did chkdsk/fsck fail to fix the problem? Automatically re-format the partition and get it back to a known state.
7d. Do you have a Boot Loader or other method to restore the OS and application after a failure?
7e. Make sure your system will beep, flash an LED, or something to indicate to the user what happened.
8. Power failures should be part of your system qualification testing. The only way you will be sure you have a robust system is to test it. Yank the power cord from the system and document what happens. Try yanking the power at multiple points in the system's operation (during runtime, while booting, mid-configuration, etc.). Repeat each test multiple times.
9. If you cannot mitigate all power failure problems, incorporate a battery or supercapacitor into the system. Keep in mind that you will need a background process in your OS to initiate a graceful shutdown when power gets low. Also, batteries will require periodic testing and replacement with age.
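As a rough illustration of technique 3a on a Linux-based target (not part of the original answer): the mount point /config and the flags are placeholders, and the process needs CAP_SYS_ADMIN to remount:

    /* Sketch of technique 3a: briefly remount the settings partition
     * read/write, update the data, sync, and remount it read-only.
     * Mount point is a placeholder; requires CAP_SYS_ADMIN. */
    #include <stdio.h>
    #include <sys/mount.h>
    #include <unistd.h>

    static int remount(const char *target, unsigned long extra_flags)
    {
        /* MS_REMOUNT changes the flags of an existing mount;
         * the source and fstype arguments are ignored. */
        if (mount("none", target, NULL, MS_REMOUNT | extra_flags, NULL) == -1) {
            perror("mount");
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        if (remount("/config", 0) == -1)            /* read/write */
            return 1;

        /* ... rewrite the settings file under /config here ... */

        sync();                         /* flush before going read-only */
        if (remount("/config", MS_RDONLY) == -1)
            return 1;
        return 0;
    }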
This is an addition to msemack's response; unfortunately my rating is too low to post a comment on his answer, so it has to be a separate answer.
Does a power down have any effect on the file system?
Yes, if proper measures aren't put in place to prevent corruption. See the previous answers for file system options that help mitigate this. However, if ATA flush/sleep aren't properly implemented on your device, you may run into the scenario we did: the device was corrupt beyond the file system, and fdisk/format would not recover it.
Instead, an ATA security-erase was required to recover the device once corruption occurred. To avoid this, we implemented an ATA sleep command prior to power loss. This required a power hold-up of 400 ms to cover the 160 ms the ATA sleep took and to leave some headroom for degradation of the capacitors over the life of the product.
Notes from our scenario:
fdisk/format failed to repair/recover the drive.
Our power-safe file system's check disk utility returned that the device had bad blocks, but there really weren't any.
flush/sync returned success quickly, and most likely was not actually implemented.
Once corrupt, dd could not read the device beyond the first partition boundary and returned I/O errors after that.
hdparm was used to issue the ATA security-erase, as the only method of recovery for some corruption scenarios.
For a non-journalling filesystem, an unexpected power-off can mean corruption of certain data, including the directory structure. This happens if there is unsaved data in the cache, or if the FS is in the middle of a multi-block update and the interruption happens when only some blocks are written.
Journalling mostly addresses this problem: if there is an interruption in the middle, a recovery routine or a check-and-repair operation done by the FS (usually implicitly) brings the filesystem to a consistent state. However, this state is not always the latest; i.e. if there were data in the memory cache, they can be lost even with journalling. This is because journalling saves you from corruption of the filesystem itself, but it doesn't do magic.
Write-through mode (no write caching) reduces the possibility of data loss but doesn't solve the problem completely, as journalling will still act as a cache (for a very short time).
So unfortunately backup or data duplication are the main ways to prevent data loss.
It totally depends on the file system you are using, and on whether it is acceptable to lose some data at power-off based on your project requirements.
One could imagine using a file system that is hardened against unattended power-off and is able to recover from a partial write sequence. So on the application side, if you don't have critical data that absolutely needs to be written before shutting down, there is no need for a specific power-off detection procedure.
Now if you want a more specific answer for your project, you will have to give more information on the file system you are using and on your project requirements.
Edit: As you have critical application data to save before power-off, I think you have answered the question yourself. The only way to handle unattended power-off safely is to have brown-out detection that alerts your embedded device, coupled with some hardware circuitry that keeps delivering enough power to the device to perform the shutdown procedure.
The FAT file system is particularly prone to corruption if a write is in progress or a file is open on shutdown, specifically if there is a buffered operation that has not been flushed. On one project I worked on, the solution was to run a file system integrity check and repair (essentially chkdsk/scandisk) on start-up. This strategy did not prevent data loss, but it did prevent the file system from becoming unusable.
A number of vendors provide journalling add-on components for FAT to counter exactly this problem. These include Segger, Quadros and Micrium, for example.
Either way, your system should generally adopt an open-write-close approach to file access, or open-write-flush if you feel the need to keep the file open.
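A minimal sketch of the open-write-flush-close pattern, assuming a POSIX-like environment (on a bare-metal FAT library you would use its own flush/close calls instead); the filename is a placeholder. fflush() pushes the C library buffer to the OS, and fsync() asks the OS to push it towards the medium:

    /* Sketch of open-write-flush-close: keep files open as briefly as
     * possible and force data out before closing. Filename is a placeholder. */
    #include <stdio.h>
    #include <unistd.h>

    static int save_record(const char *path, const void *data, size_t len)
    {
        FILE *fp = fopen(path, "wb");
        if (fp == NULL) return -1;

        if (fwrite(data, 1, len, fp) != len) { fclose(fp); return -1; }
        if (fflush(fp) != 0)                 { fclose(fp); return -1; } /* C library -> OS */
        if (fsync(fileno(fp)) != 0)          { fclose(fp); return -1; } /* OS -> medium    */

        return fclose(fp);
    }

    int main(void)
    {
        const char msg[] = "state=ok\n";
        return save_record("/data/status.txt", msg, sizeof msg - 1) == 0 ? 0 : 1;
    }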

After how many seconds are file system write buffers typically flushed?

Before overwriting data in a file, I would like to be pretty sure the old data is stored on disk. It's potentially a very big file (multiple GB), so in-place updates are needed. Usually writes will be 2 MB or larger (my plan is to use a block size of 4 KB).
Instead of (or in addition to) calling fsync(), I would like to retain (not overwrite) the old data on disk until the file system has written the new data. The main reason I don't want to rely on fsync() is that most hard disks lie to you about doing an fsync.
So what I'm looking for is the typical maximum delay for a file system / operating system (for example Windows) / hard drive until data is written to disk, without using fsync or similar methods. I would like to have real-world numbers if possible. I'm not looking for advice to use fsync.
I know there is no 100% reliable way to do it, but I would like to better understand how operating systems and file systems work in this regard.
What I found so far: 30 seconds is/was the default for /proc/sys/vm/dirty_expire_centisecs (3000 centiseconds). Then "dirty pages are flushed (written) to disk ... (when) too much time has elapsed since a page has stayed dirty" (but there I couldn't find the default time). So for Linux, 40 seconds seems to be on the safe side. But is this true for all file systems / disks? What about Windows, Android, and so on? I would like to get an answer that applies to all common operating systems / file systems / disk types, including Windows, Android, regular hard disks, SSDs, and so on.
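To illustrate the Linux side of that, here is a small sketch that reads the two writeback sysctls and adds them up as a rough worst-case estimate of the kernel-side delay; this says nothing about the drive's own cache, which is the part that "lies":

    /* Sketch (Linux-specific): read the writeback sysctls to estimate the
     * worst-case delay before dirty pages are written out without fsync(). */
    #include <stdio.h>

    static long read_centisecs(const char *path)
    {
        long v = -1;
        FILE *fp = fopen(path, "r");
        if (fp) {
            if (fscanf(fp, "%ld", &v) != 1) v = -1;
            fclose(fp);
        }
        return v;
    }

    int main(void)
    {
        long expire = read_centisecs("/proc/sys/vm/dirty_expire_centisecs");
        long wakeup = read_centisecs("/proc/sys/vm/dirty_writeback_centisecs");
        if (expire < 0 || wakeup < 0) { fprintf(stderr, "not available\n"); return 1; }

        /* A page becomes eligible for writeback after dirty_expire_centisecs,
         * and the flusher thread wakes every dirty_writeback_centisecs, so roughly: */
        printf("worst-case flush delay ~ %.1f s (kernel side only)\n",
               (expire + wakeup) / 100.0);
        return 0;
    }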
Let me restate your problem in only slightly uncharitable terms: you're trying to control the behavior of a physical device which its driver in the operating system cannot control. What you're trying to do seems impossible, if what you want is an actual guarantee rather than a pretty good guess. If all you want is a pretty good guess, fine, but beware of this and document it accordingly.
You might be able to solve this with the right device driver. The SCSI protocol, for example, has a Force Unit Access (FUA) bit in its READ and WRITE commands that instructs the device to bypass any internal cache. Even if the data were originally written buffered, reading unbuffered should be able to verify that it was actually there.
The only way to reliably make sure that data has been synced is to use the OS-specific syncing mechanism, and even then, as per PostgreSQL's Reliability Docs:
"When the operating system sends a write request to the storage hardware, there is little it can do to make sure the data has arrived at a truly non-volatile storage area. Rather, it is the administrator's responsibility to make certain that all storage components ensure data integrity."
So no, there are no truly portable solutions, but it is possible (but hard) to write portable wrappers and deploy a reliable solution.
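One such wrapper, as a POSIX-only sketch (not from the original answer): write the data, fsync the file, then fsync the containing directory so the directory entry is also on stable storage. The paths are placeholders; on macOS, fcntl(fd, F_FULLFSYNC) would be needed instead of fsync for a comparable guarantee, and the drive's own cache can still lie:

    /* Sketch of a "durable write" wrapper for POSIX systems.
     * Paths are placeholders; error handling is minimal. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static int durable_write(const char *dir, const char *path,
                             const void *buf, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1) return -1;
        if (write(fd, buf, len) != (ssize_t)len || fsync(fd) == -1) {
            close(fd);
            return -1;
        }
        close(fd);

        int dfd = open(dir, O_RDONLY | O_DIRECTORY);  /* sync the dir entry too */
        if (dfd == -1) return -1;
        int rc = fsync(dfd);
        close(dfd);
        return rc;
    }

    int main(void)
    {
        const char msg[] = "important data\n";
        if (durable_write("/tmp", "/tmp/example.dat", msg, strlen(msg)) == -1) {
            perror("durable_write");
            return 1;
        }
        return 0;
    }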
First of all, thanks for the information that hard disks lie about flushing data; that was new to me.
Now to your problem: you want to be sure that all data that you write has been written to the disk (lowest level). You are saying that there are two parts which need to be controlled: the time when the OS writes to the hard drive and the time when the hard drive writes to the disk.
Your only solution is to use a fuzzy logic timer to estimate when the data will be written.
In my opinion this is the wrong way. You have control over when the OS writes to the hard drive, so use that possibility and control it! Then only the lying hard drive is your problem. That problem can't be solved reliably. I think you should tell the user/admin that they must take care when choosing the right hard drive. Of course, it might be a good idea to implement the additional timer you proposed.
I believe it's up to you to run a series of tests with different hard drives and Brad Fitzgerald's tool to get a good estimate of when hard drives will have written all data. But of course, if the hard drive wants to lie, you can never be sure that the data really has been written to the disk.
There are a lot of caches involved in giving users a responsive system.
There is the CPU cache, the kernel/filesystem memory cache, the disk drive memory cache, etc. What you are asking is: how long does it take to flush all the caches?
Or, another way to look at it is, what happens if the disk drive goes bad? All the flushing is not going to guarantee a successful read or write operation.
Disk drives do go bad eventually. The solution you are looking for is how can you have a redundant cpu/disk drive system such that the system survives a component failure and still keeps working.
You could improve the likelihood that system will keep working with aid of hardware such as RAID arrays and other high availability configurations.
As far as software solutions go, I think the answer is to trust the OS to do the optimal thing. Most of them flush buffers out routinely.
This is an old question but still relevant in 2019. For Windows, the answer appears to be "at least after every one second" based on this:
To ensure that the right amount of flushing occurs, the cache manager spawns a process every second called a lazy writer. The lazy writer process queues one-eighth of the pages that have not been flushed recently to be written to disk. It constantly reevaluates the amount of data being flushed for optimal system performance, and if more data needs to be written it queues more data.
To be clear, the above says the lazy writer is spawned every second, which is not the same as writing out data every second, but it's the best I can find so far in my own search for an answer to a similar question. (In my case, I have an Android app which lazily writes data back to disk, and I noticed some data loss when using an interval of 3 seconds, so I am going to reduce it to 1 second and see if that helps... it may hurt performance, but losing data hurts performance a whole lot more if you consider the hours it takes to recover it.)
