I have been playing around with UBIFS some. One test I wrote was a stress test to see if the wear-leveling in the system works as expected. In a nutshell, the test:
1. Writes a file with random data to the file system located on the UBI volume
2. Verifies the file contents
3. Deletes the file
This test is repeated a certain number of times (around 200,000). The mount point of the "stressed" UBI volume was located on another UBI volume. As expected, the maximum erase count for the "stressed" UBI volume went up. What I also noticed is that the maximum erase count for the UBI volume holding the mount point also went up, which I would not have expected.
Anyone know what might cause this? Something in UBI? Or some mechanism in the Linux kernel (like logging)?
Has anyone seen this type of behavior with other file systems that implement wear-leveling?
First guess would be that access-time logging is turned on, or maybe modification-time if the tests are being done in the root of the "stressed" volume. Most likely access time: mount the outer filesystem (probably both, actually) with -o noatime.
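In case it helps, a minimal sketch of doing the same thing from C via mount(2) with MS_NOATIME (the device and mount-point names are placeholders):

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    /* MS_NOATIME stops the kernel from updating access times on reads.
       "ubi0:outer" and "/mnt/outer" are placeholder names. */
    if (mount("ubi0:outer", "/mnt/outer", "ubifs", MS_NOATIME, NULL) != 0) {
        perror("mount");
        return 1;
    }
    return 0;
}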
Two processes in the system communicate via a Unix domain socket. This socket is created on the "mount" UBI volume (not a good location, I know). When I moved this file to a RAM-based location (i.e. /tmp), the writes to the mount UBI volume stopped. During the stress test the socket existed but was not being used. It would be good to know why the file system thinks it needs to write the file out after every sync.
Related
If the answer is NO, meaning that once the root filesystem disk partition is lost for any reason (a dying hard drive, a rootfs booted over NFS whose network is lost, etc.) the kernel no longer has access to read or write previously existing /proc/ files, then that's the answer: NO.
If the answer is YES, meaning the kernel still has access to already existing /proc/ files because they are virtual and not really on any filesystem, and so remain available to the kernel even after root "/" is lost, then how can I do the equivalent of:
"echo 1 > /proc/existingfile", but WITHOUT using call_usermodehelper, via some SYSCALL? That is, with "echo 1 >" replaced by some kernel SYSCALL, so that userland is not relied upon, because it won't be available in my scenario where the root partition has disappeared.
(UPDATE: In reply to a comment: perhaps SYSCALL was the wrong word. I don't care whether SYSCALLs are possible or impossible to call from inside the kernel because they were made with user space in mind. The syscall method is not the point; I simply want to know of any possible method whereby I can trigger a WRITE to an existing /proc/ file without needing to read any input from user space.)
(UPDATE2: It would be nice if some kernel authority could answer whether the kernel still has access to read/write a /proc/ file even after the root "/" file system (say, rootfs over NFS) has become unavailable. So far the comments contradict each other on this issue: some say NO, others say YES, and others are unsure.)
I do not want to simply cut to the chase and just perform the action that putting a 1 into the existing file would have triggered, based on the kernel code. I want it to go through the usual vfs_write etc. pathway before it does what echo 1 into the /proc/ file does. (I'm debugging some crashes/issues; that's why I want it to go via this specific route.)
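To make the intent concrete, here is a minimal sketch of the sort of in-kernel write I am after, assuming a kernel new enough to have kernel_write() (the /proc path is the placeholder name from above):

#include <linux/fs.h>
#include <linux/err.h>

/* Sketch: write "1" to an existing /proc file from kernel context,
   going through the normal VFS path (filp_open -> kernel_write),
   with no reliance on userland. */
static int poke_proc_file(void)
{
    struct file *filp;
    loff_t pos = 0;
    ssize_t n;

    filp = filp_open("/proc/existingfile", O_WRONLY, 0);
    if (IS_ERR(filp))
        return PTR_ERR(filp);

    n = kernel_write(filp, "1", 1, &pos);  /* dispatches via vfs_write */
    filp_close(filp, NULL);
    return n < 0 ? (int)n : 0;
}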
maybe related?
Access /proc fs variable from other parts of Kernel code
I have a SATA block device that reports a capacity smaller than its accessible space, and I would like to read and write past the reported capacity using the file Linux creates for block devices. So I hope to operate using the descriptor returned from open("/dev/sda", O_RDWR). However, when I try to use lseek to seek past the capacity of the device, I get an error and errno is set to EINVAL (22).
Is there a way to access the data past the capacity of the device without modifying the device drivers and while still using the file descriptor returned by open()?
My Linux release is CentOS 7 with kernel 3.10.0-514.21.1.el7.x86_64, although I'd be interested in solutions even if they involve other Linux distributions.
Edit: The drive I am working with is a FLEX protocol drive that reports the conventional capacity of the drive, but also has shingled magnetic recording available at an offset above the reported capacity of the drive. If you are interested, the details of this protocol can be found on the T13 website.
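For reference, a minimal sketch of the failing access pattern (REPORTED_CAPACITY is a placeholder for the byte capacity the drive reports):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Placeholder: the capacity reported by the drive, in bytes. */
#define REPORTED_CAPACITY 500000000000LL

int main(void)
{
    int fd = open("/dev/sda", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* Seeking past the reported capacity fails: the block layer
       clamps seeks on block devices to the device size. */
    if (lseek(fd, REPORTED_CAPACITY + 512, SEEK_SET) == (off_t)-1)
        perror("lseek");  /* prints "lseek: Invalid argument" (EINVAL) */
    close(fd);
    return 0;
}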
If I remember correctly, that error is raised because the device itself wasn't able to read or write that cylinder, indicating it likely does not exist. Note that many manufacturers use 1000 B = 1 KB and the like, and that file systems reserve their own space as well.
The short answer is: you don't. The device will only report the space you can use, and will not report cache sizes either. This misreporting isn't at the OS level, but at the device level.
How to get most recently accessed file in Linux?
I used the stat() call, checking st_atime, but it does not update if I open and read the file.
You can check if your filesystem is mounted with the noatime or relatime option:
greek0@orest:/home/greek0$ cat /proc/mounts
/dev/md0 / ext3 rw,noatime,errors=remount-ro,data=ordered 0 0
...
These mount options are often used because they increase filesystem performance. Without them, every single read of a file turns into a write to the disk (for updating the atime).
In general, you can't rely on atime to have any useful meaning on most computers.
If it's Ok to only detect accesses to files that happen while your program is running, you can look into inotify. It provides a method to be notified of currently ongoing filesystem accesses.
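A minimal sketch of that approach (the watched directory is a placeholder); IN_ACCESS fires whenever a file under the directory is read:

#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    int fd = inotify_init();
    if (fd < 0) {
        perror("inotify_init");
        return 1;
    }
    /* Watch for opens and read accesses; "/some/dir" is a placeholder. */
    if (inotify_add_watch(fd, "/some/dir", IN_OPEN | IN_ACCESS) < 0) {
        perror("inotify_add_watch");
        return 1;
    }
    for (;;) {
        ssize_t len = read(fd, buf, sizeof(buf));
        if (len <= 0)
            break;
        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *)p;
            printf("mask=0x%x name=%s\n", ev->mask,
                   ev->len ? ev->name : "(watched dir)");
            p += sizeof(*ev) + ev->len;
        }
    }
    close(fd);
    return 0;
}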
If that doesn't satisfy your requirements, I'm afraid you're out of luck.
Ok, first a little background to help make my question clear:
I am working on a device that collects certain data from sensors and posts it to a server using a GSM modem. As a GSM connection is not 100% reliable, the device contains a logging mechanism that writes unsent data to an SD card.
We are using Chan's FatFs module for providing us with a file system as we want the log to be readable on a PC.
Now I've been testing the FAT system for boundary conditions, i.e., trying to fill up the card completely.
In the first run I opened the file and set the code to keep writing a string until the drive was full. The program would sync after every write.
I left the code running overnight.
The next day, I examined the SD card. I found that the file was only 150 MB in size. There were about 1.2 million lines written to it. The card could still be read from but not written to or formatted.
Next time I tried the same type of test, but this time I used the f_lseek() function to pre-allocate the file to 1GB. It would then write to that file until that limit was reached. This time the data would be synced after 50 writes. It would then close that file and open another to do the same.
As you can guess, another brave little card lost its mind that day.
So this is what I would like help with:
How to prevent damage to the card while writing large amounts of data?
Does leaving the file open for extended periods have any negative effects?
Since the full code may be too long, here's the main part where the writing happens:
for (file_count = 3; file_count >= 0; --file_count) {
    ax_log_msg(E_LOG_INFO, "===================================");
    ax_log_msg(E_LOG_INFO, file_names[file_count]);
    /* Bug fix: the original discarded f_open's return value,
       so the check below tested a stale 'result'. */
    result = f_open(&file_ptr, file_names[file_count], FA_WRITE | FA_OPEN_ALWAYS);
    if (result != FR_OK) {
        ax_log_msg(E_LOG_INFO, "\n\rf_open Failed\n\rResult code");
        ax_log_msg(E_LOG_INFO, FRESULT_S[result]);
        continue;
    }
    ax_log_msg(E_LOG_INFO, "\n\rf_open Successful");
    /* Pre-allocate the file by seeking to the 1 GB limit. */
    result = f_lseek(&file_ptr, FILE_SIZE_LIMIT_1GB);
    if (result != FR_OK) {
        ax_log_msg(E_LOG_INFO, "\n\rf_lseek Failed for preallocation\n\rResult code");
        ax_log_msg(E_LOG_INFO, FRESULT_S[result]);
        f_close(&file_ptr);
        continue;
    }
    ax_log_msg(E_LOG_INFO, "\n\rf_lseek Successful for preallocation");
    f_lseek(&file_ptr, 0);                        /* rewind to start writing */
    bytes_to_write = sizeof(messages[file_count]);
    write_count = 0;
    while (f_tell(&file_ptr) < FILE_SIZE_LIMIT_1GB) {
        result = f_write(&file_ptr, messages[file_count], bytes_to_write, &bytes_written);
        if (result == FR_OK) {
            ++write_count;
            if (write_count % 50 == 0)            /* sync every 50 writes */
                f_sync(&file_ptr);
        } else {
            ax_log_msg(E_LOG_INFO, "\n\rWrite failed\n\rFRESULT=");
            ax_log_msg(E_LOG_INFO, FRESULT_S[result]);
            break;
        }
    }
    f_close(&file_ptr);
}
Note :
ax_log_msg() is part of the device firmware to print on console.
FRESULT_S[result] is used to convert the enum result code to a string.
If there is any data missing, please do mention it.
Thank You
You probably need to buffer an entire block of data, perhaps 4 KB, to avoid reflashing an entire block with every flush. But the filesystem or driver should do this for you, as long as you don't call f_sync explicitly; that is the real lesson.
Why do you need it to be synced so often? Perhaps a timer would work better than an interval per number of records?
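For illustration, a minimal sketch of that buffering idea against the same FatFs API as the question (the 4 KB buffer size is an assumption; pick your card's block size):

#include <string.h>
#include "ff.h"

#define LOG_BUF_SIZE 4096  /* assumption: one flash block */

static char log_buf[LOG_BUF_SIZE];
static UINT log_len = 0;

/* Accumulate log lines in RAM and write a full block at a time,
   so the card programs each block once per 4 KB instead of per line.
   Sketch only; assumes len <= LOG_BUF_SIZE. */
static FRESULT log_append(FIL *fp, const char *msg, UINT len)
{
    UINT written;
    FRESULT res = FR_OK;

    if (log_len + len > LOG_BUF_SIZE) {  /* buffer full: flush it */
        res = f_write(fp, log_buf, log_len, &written);
        if (res == FR_OK)
            res = f_sync(fp);            /* one sync per block, not per line */
        log_len = 0;
    }
    if (res == FR_OK) {
        memcpy(&log_buf[log_len], msg, len);
        log_len += len;
    }
    return res;
}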
With a limit of around 100,000 write cycles per sector, extending a flash memory's lifespan is a genuinely challenging task. One of my cards died overnight after I ran writing tests on it. I then counted the time periods, and it is indeed easy to perform 100,000 writes to the same sector in just one night (even without doing the calculation beforehand; it comes through experience).
At that time I was told that some filesystems include smart monitors that count and record the number of writes to every sector, so that the write count stays roughly the same across sectors. I neither tried nor tested one.
I have now found an extremely popular, highly voted answer/suggestion for the Raspberry Pi, and I quote it here:
These methods should increase the lifespan of the SD card by minimising the number of read/writes in various ways:
Disable Swap
Swapping is the process of using part of the SD card as virtual memory. It increases the amount of memory available, but results in a high number of read/writes. It is unlikely to increase performance significantly.
Disable swap with the swapoff command:
sudo swapoff --all
You must also prevent it from coming back after a reboot:
For Raspbian, which uses dphys-swapfile to manage a swap file (instead of a "normal" swap partition), you can simply sudo apt-get remove dphys-swapfile to remove it permanently. Removing it is best, because setting CONF_SWAPSIZE to 0, as explained in this answer, doesn't seem to work and still creates a 100 MB swap file after reboot.
For other distributions that use a swap partition instead of a swap file, remove the appropriate line from /etc/fstab
Disabling Journaling on the Filesystem
Using a journaling filesystem such as ext3 or ext4 without its journal is an option to decrease read/writes. The obvious drawback of running with journaling disabled is data loss as a result of an ungraceful dismount (i.e., after a power failure, kernel lockup, etc.).
You can disable journaling on ext3 by mounting it as ext2
You can disable journaling on ext4 on an unmounted drive like this:
tune4fs -O ^has_journal /dev/sdaX
e4fsck -f /dev/sdaX
sudo reboot
The noatime Mount Flag
Assign the noatime mount flag to partitions residing on the SD card by adding it to the options section of the partition in /etc/fstab.
Read accesses to the file system will no longer result in an update to the atime information associated with the file. The importance of the noatime setting is that it eliminates the system's need to write to the file system for files which are simply being read. Since writes can be somewhat expensive, as mentioned in the previous section, this can result in measurable performance gains. Note that the modification time of a file will continue to be updated anytime the file is written to with this option enabled.
Directories in RAM
Highly used directories such as /var/tmp/ and possibly /var/log can be relocated to RAM in /etc/fstab like this:
tmpfs /var/tmp tmpfs nodev,nosuid,size=50M 0 0
This will allow /var/tmp to use 50 MB of RAM as disk space. The only issue with doing this is that anything mounted in RAM will not persist past a reboot. Thus, if you relocate /var/log and your system encounters an error that causes it to reboot, you will not be able to find out why.
Directories in external Hard Disk
You can also mount some directories on a persistent USB hard disk. More details of this can be found in this question.
The Raspberry Pi can also boot its root partition from an external drive. This could be via USB or Ethernet, and means that the SD card will only be used to delegate to a different device during boot. This requires a bit of kernel hacking to accomplish, as I don't think the default kernel supports USB storage. You can find more information at this question, or this external blog post.
Here is one more interesting consideration from another answerer:
Excellent article about flash filesystems.
An important question when talking about flash filesystems is the following: what is wear leveling? Wikipedia article. Basically, on flash disks each block can be written a limited number of times before it goes bad. After that, the filesystem (if there is no built-in wear-leveling management in hardware, as there usually is in the case of SSDs) must mark that block as invalid and avoid using it anymore.
Typical filesystems (for example reiserfs, ntfs, ext3 and so on) are designed for hard disks, which do not have such limitations.
JFFS2
Includes compression and elegant wear-leveling support.
YAFFS2
The single thing that makes the difference: short mount times after a successful unmount.
Implements a write-once property: once data is written to a block, there is no need to rewrite it. This is important for reducing wear.
LogFS
Not very mature, but already included in Linux kernel tree.
Supports larger filesystems than JFFS2/YAFFS2 without problems.
UBIFS
More mature than LogFS
Write caching support
On scalability: article. On large disks, better performance than with JFFS2
ext4
If neither the driver nor the card handles wear leveling (SSD drives, for example, usually do have internal wear leveling), then ext4 is not the best idea, as it is not intended for raw flash usage.
Which one is best?
Of course, it depends on usage and support. From what I have read on the internet, I would recommend UBIFS: good support for large filesystems, a mature development phase, adequate performance and no huge downsides.
Thanks to answerers:
How can I extend the life of my SD card?
Choice of filesystem for GNU/Linux on an SD card
This is a continuation of my question about reading the superblock.
Let's say I want to target the HFS+ file system in Mac OS X. How could I read sector 2 of the boot disk? As far as I know Unix only provides system calls to read from files, which are never stored at that location.
Does this require either 1) the program to run in kernel mode, or 2) the program to be written in assembly? I would prefer to avoid both of these restrictions, particularly the latter.
I've done this myself on the Mac, see my disk editor tool: http://apps.tempel.org/iBored
You'd open the drive using the /dev/diskN or /dev/rdiskN device file (N is a disk index number starting from 0). Then you can use lseek (make sure to use the 64-bit range version!) and read/write calls on the opened file descriptor.
Also, use the shell command "ls /dev/disk*" to see which drives currently exist. Note that the drives also appear with an "sM" suffix, where M is the partition number; that way, you can also read partitions directly.
Or, you could just use the shell tool "xxd" or "dd" to read data and then use their output. Might be easier.
You'll not be able to read your root disk and other internal disks unless you run as root, though. You may be able to access other drives as long as they were mounted by the user or have relaxed permissions. But you may also need to unmount the drive's volumes first; look for the unmount command in the diskutil shell tool.
Hope this helps.
Update 2017: On OS X 10.11 and later, SIP (System Integrity Protection) may also prevent you from directly accessing disk sectors.
In Linux, you can read from the special device file /dev/sda, assuming the hard drive you want to read is the first one. You need to be root to read this file. To read sector 2, you just seek to offset 2*SECTOR_SIZE and read in SECTOR_SIZE bytes.
I don't know if this device file is available on OS X. Check for interestingly named files under /dev such as /dev/sda or /dev/hda.
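To make that concrete, a minimal sketch that reads sector 2 of the first disk (assumes 512-byte sectors and root privileges; on Mac OS X, try /dev/rdisk0 in place of /dev/sda):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define SECTOR_SIZE 512  /* assumption: 512-byte sectors */

int main(void)
{
    unsigned char sector[SECTOR_SIZE];
    int fd = open("/dev/sda", O_RDONLY);  /* Mac OS X: "/dev/rdisk0" */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (lseek(fd, (off_t)2 * SECTOR_SIZE, SEEK_SET) == (off_t)-1 ||
        read(fd, sector, SECTOR_SIZE) != SECTOR_SIZE) {
        perror("read sector 2");
        close(fd);
        return 1;
    }
    /* Dump the first 16 bytes as a quick sanity check. */
    for (int i = 0; i < 16; i++)
        printf("%02x ", sector[i]);
    printf("\n");
    close(fd);
    return 0;
}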
I was also going to suggest hitting the /dev/ device file for the volume, but you might want to contact Amit Singh, who has written an hfsdebug utility and has probably done just what you want to do.
How does this work in terms of permissions? Wouldn't reading from /dev/... be insecure, since if you read far enough you could read files for which you do not have read access?