FAT File system and lots of writes

FAT File system and lots of writes - c

I am considering using a FAT file system for an embedded data logging application. The logger will only create one file to which it continually appends 40 bytes of data every minute. After a couple years of use this would be over one million write cycles. MY QUESTION IS: Does a FAT system change the File Allocation Table every time a file is appended? How does it keep track where the end of the file is? Does it just put an EndOfFile marker at the end or does it store the length in the FAT table? If it does change the FAT table every time I do a write, I would ware out the FLASH memory in just a couple of years. Is a FAT system the right thing to use for this application?
My other thought is that I could just store the raw data bytes in the memory card and put an EndOfFile marker at the end of my data every time I do a write. This is less desirable though because it means the only way of getting data out of the logger is through serial transfers and not via a PC and a card reader.

FAT updates the directory table when you modify the file (at least, it will if you close the file, I'm not sure what happens if you don't). It's not just the file size, it's also the last-modified date:
http://en.wikipedia.org/wiki/File_Allocation_Table#Directory_table
If your flash controller doesn't do transparent wear levelling, and your flash driver doesn't relocate things in an effort to level wear, then I guess you could cause wear. Consult your manual, but if you're using consumer hardware I would have thought that everything has wear-levelling somewhere.
On the plus side, if the event you're worried about only occurs every minute, then you should be able to speed that up considerably in a test to see whether 2 years worth of log entries really does trash your actual hardware. Might even be faster than trying to find the relevant manufacturer docs...

No, a flash file system driver is explicitly designed to minimize the wear and spread it across the memory cells. Taking advantage of the near-zero seek time. Your data rates are low, it's going to last a long time. Specifying a yearly replacement of the media is a simple way to minimize the risk.

If your only operation is appending to one file it may be simpler to forgo a filesystem and use the flash device as a data tape. You have to take into account the type of flash and its block size, though.

Large flash chips are divided into sub-pages that are a power-of-two multiple of 264 (256+8) bytes in size, pages that are a power-of-two multiple of that, and blocks which are a power-of-two multiple of that. A blank page will read as all FF's. One can write a page at a time; the smallest unit one can write is a sub-page. Once a sub-page is written, it may not be rewritten until the entire block containing it is erased. Note that on smaller flash chips, it's possible to write the bytes of a page individually, provided one only writes to blank bytes, but on many larger chips that is not possible. I think in present-generation chips, the sub-page size is 528 bytes, the page size is 2048+64 bytes, and the block size is 128K+4096 bytes.
An MMC, SD, CompactFlash, or other such card (basically anything other than SmartMedia) combines a flash chip with a processor to handle PC-style sector writes. Essentially what happens is that when a sector is written, the controller locates a blank page, writes a new version of that sector there along with up to 16 bytes of 'header' information indicating what sector it is, etc. The controller then keeps a map of where all the different pages of information are located.
A SmartMedia card exposes the flash interface directly, and relies upon the camera, card reader, or other device using it to perform such data management according to standard methods.
Note that keeping track of the whereabouts of all 4,000,000 pages on a 2 gig card would require either having 12-16 megs of RAM, or else using 12-16 meg of flash as a secondary lookup table. Using the latter approach would mean that every write to a flash page would also require a write to the lookup table. I wouldn't be at all surprised if slower flash devices use such an approach (so as to only have to track the whereabouts of about 16,000 'indirect' pages).
In any case, the most important observation is that flash write times are not predictable, but you shouldn't normally have to worry about flash wear.

Did you check what happens to the FAT file system consistency in case of a power failure or reset of your device?
When your device experience such a failure you must not lose only that log entry, that you are just writing. Older entries must stay valid.
No, FAT is not the right thing if you need to read back the data.
You should further consider what happens, if the flash memory is filled with data. How do you get space for new data? You need to define the requirements for this point.

Related

How does Flash Translation Layer store mapping data, unusable block and super block?

Does ftl have private storage space that is not flash？
If not, how does ftl store those meta data while avoiding wear leveling.
Actually I don’t know if there is a super block in ftl, but if you want to locate the mapping data and unusable block whose physical address changes frequently, a certain physical address may be needed. The content on this physical address must change frequently, how To avoid wear leveling of this physical address?

There are many possible solutions to this problem and it's very intertwined with the data representation that the drive uses to store its data, so I'm sure it differs a lot based on the drive / manufacturer. I'll just outline a general approach that could work.
Let's say you design an FTL that maintains several fixed-size, append-only "logs", and for simplicity we always have one "active" log that all writes are appended to. If the user is issuing random writes, the order of LBAs in the active log will be random too. When the active log fills all the space allocated to it, it gets "frozen" and we switch the active log to some empty log elsewhere in the flash. As the data in the frozen log becomes stale, we will eventually need to garbage collect it by copying any still-referenced blocks to a different log before erasing the original so that it can be reused for new writes.
Now, for each write to a log, nothing in our interface so far requires that the blocks be exactly 4KiB (or whatever), so you could append a small header to the data that tells you what its LBA is, and perhaps some other metadata -- write sequence number so you can tell if it's the most recent copy of a block, and maybe a checksum for read integrity checking. When a write finishes, you update an in-RAM copy of the map with the new location for the LBAs that were updated (RAM inside the SSD, not RAM for the main CPU of the computer obviously).
If the FTL crashes or loses power, you can reconstruct the map by reading all headers from all the logs. The downside is that scanning the logs will scale O(number of logs * number of blocks per log), so you optimize that somehow:
you could write the headers to a separate part of the disk by themselves so that you can scan them without also reading the user data (same big-O runtime but a lot faster in practice)
you could periodically flush the in-RAM copy of the map to flash somewhere, along with the latest IO sequence number, so that you only have to read the parts of the logs that were written since the latest map flush
How do you find the portion of the log to start scanning from? do a binary search on the IO sequence numbers in the log headers. So the boot runtime is now O(number of logs * (log_2(number of blocks per log) + number of blocks that need to be scanned))
How do you know when to stop scanning? either you recognize that all data in the block you read is 1's because that part of the log hasn't been written to yet, or you recognize that the checksum and data don't match.
Minor optimization: during a clean shutdown, always write the map to flash, so that this binary search + scanning only needs to happen if there's a crash or unclean shutdown.
So far, this lowers how often you need to write the map by a lot, but it's still probably too often to overwrite it to a fixed location for a drive with a very long lifetime. To resolve that, we have to cycle where we write the map:
The simplest solution would be to designate a small set of X special logs to store all map data and write to them like a circular buffer, where X is chosen to make the map updates last the expected lifetime of the device. To find the most recent map in the log on boot, you'd do binary search within those logs to find the last one that was written. So boot = O(X * log_2(number of maps per log) + runtime to scan the other logs if unclean shutdown).
Probably a more optimal solution (but one that might be more complicated), would include the map writes directly into the logs where the updates are happening. Then you need some way to find where the maps are at boot time -- the most obvious way to do that would be to write the map into the beginning of each active log, or you could allow arbitrary map writes by adding backpointers into the block headers that point back to the latest map in their log.
Another aspect of this is that full map flushes could be expensive, which would add tail latency if it ever interferes with the performance of user IOs -- would it be better to allow incremental updates? That's when you start looking at using something like a log-structured merge (LSM) tree to store your map, so that each incremental write is pretty small and you can amortize the full map write cost.
Obviously there are a bunch of tiny details that this explanation leaves out, but hopefully that's enough to get you started. :-)

How to protect against flash block failure?

I have a small amount of sensitive data (less than 1K) on flash memory which I would like to protect against some forms of data loss. Most notably, I would like to make sure that the data survives if the flash block it resides on fails.
The obvious answer is to have a backup of the file. Then all I need is to ensure somehow that the two files are located on different blocks. Is there a way to do this?
I'm mostly interested in having this work on Linux, so I'm looking for either a Linux-specific solution, or if there isn't any, a file system specific solution will do too.
EDIT: I'm also open to other approaches of protecting against flash block failure.

The easiest way is to create extra partition on this memory and put file there. I would avoid filesystem solution - most filesystem damages start with directory structure. And don't forget about wear leveling controller - you can't be 100% sure, where your data acutally is.

Best solution I can figure out is to put a write counter and CRC (optional) on each page, and increment counter on each write. You can allocate as many pages as you want (2-8?). You overwrite the page with the lowest counter. If a page write fails (and CRC fails?), overwrite over the next lowest number page.
When booting, the app only needs to find the page with the highest block number and intact CRC, and carry on from there.
Pages should be multiples of 1K per sector size of your memory. Check the specs.

How to avoid damaging SD card for large writes?

Ok, first a little background to help make my question clear:
I am working on a device that collects certain data from sensors and posts them to a server using a GSM modem. As a GSM connection is not 100% reliable, it would contain a logging mechanism that would write unsent data to an SD card.
We are using Chan's FatFs module for providing us with a file system as we want the log to be readable on a PC.
Now I've been testing the FAT system for boundary conditions, i.e., trying to fill up the card completely.
In the first run I opened the file and set the code to keep writing a string until the drive was full. The program would synch after every write.
I left the code running overnight.
The next day, I examined the SD card. I found that the file was only 150 MB in size. There were about 1.2 million lines written to it. The card could still be read from but not written to or formatted.
Next time I tried the same type of test, but this time I used the f_lseek() function to pre-allocate the file to 1GB. It would then write to that file until that limit was reached. This time the data would be synced after 50 writes. It would then close that file and open another to do the same.
As you can guess another brave little card lost it's mind that day.
So these are what I would like help with :
How to prevent damage to the card while writing large amounts of data?
Does leaving the file open for extended periods have any negative effects?
Since the full code may be too long, here's the main part where the writing happens
for(file_count=3;file_count>=0;--file_count){
ax_log_msg(E_LOG_INFO,"===================================");
ax_log_msg(E_LOG_INFO,file_names[file_count]);
f_open(&file_ptr,file_names[file_count],FA_WRITE|FA_OPEN_ALWAYS);
if(result!=FR_OK){
ax_log_msg(E_LOG_INFO,"\n\rf_open Failed\n\rResult code");
ax_log_msg(E_LOG_INFO,FRESULT_S[result]);
continue;
}
ax_log_msg(E_LOG_INFO,"\n\rf_open Sucessfull");
result=f_lseek(&file_ptr,FILE_SIZE_LIMIT_1GB);
if(result!=FR_OK){
ax_log_msg(E_LOG_INFO,"\n\rf_lseek Failed for preallocation\n\rResult code");
ax_log_msg(E_LOG_INFO,FRESULT_S[result]);
f_close(&file_ptr);
continue;
}
ax_log_msg(E_LOG_INFO,"\n\rf_lseek Sucessfull for preallocation");
f_lseek(&file_ptr,0);
bytes_to_write=sizeof(messages[file_count]);
write_count=0;
while( (f_tell(&file_ptr) < FILE_SIZE_LIMIT_1GB )){
result=f_write(&file_ptr,messages[file_count],bytes_to_write,&bytes_written);
if(result==FR_OK){
++write_count;
if(write_count%50==0){
f_sync(&file_ptr);
}
}else{
ax_log_msg(E_LOG_INFO,"\n\rWrite failed\n\rFRESULT=");
ax_log_msg(E_LOG_INFO,FRESULT_S[result]);
break;
}
}
f_close(&file_ptr);
}
Note :
ax_log_msg() is part of the device firmware to print on console.
FRESULT_S[result] is used to convert the enum result code to a string.
If there is any data missing, please do mention it.
Thank You

You probably need to buffer an entire block of data, perhaps 4 KB, to avoid flashing an entire block with every flush. But, the filesystem or driver should do this for you, as long as you don't call fflush explicitly, which is the real lesson.
Why do you need it to be synced so often? Perhaps a timer would work better than an interval per number of records?

Due to 100,000 write cycles limit per sector it is a really challenging task to extend a flash memory lifespan. One of my cards died over one night after I run writing tests on it. I then counted time periods, and that's indeed easy to perform 100,000 writes (in the same sector) just in one night (without taking into account a calculation it comes through experience).
At that time I was told that there is a smart monitors in some filesystems and they count and keep writes number for every sector in order to writings number per every sector was the same, I guess. I neither took nor tested one.
I now found some extremely popular/highly voted answer/suggestion for Raspberrypi and I quote it here now:
These methods should increase the lifespan of the SD card by minimising the number of read/writes in various ways:
Disable Swap
Swapping is the process of using part of the SD card as volatile memory. This will increase the amount of RAM available, but it will result in a high number of read/writes. It is unlikely to increase performance significantly.
Disable swap with the swapoff command:
sudo swapoff --all
You must also prevent it from coming back after a reboot:
For Raspbian which uses dphys-swapfile to manage a swap file (instead of a "normal" swap partition) you can simply sudo apt-get remove dphys-swapfile to remove it permanently. Best to remove because setting the CONF_SWAPSIZE to 0, as explained in this answer, doesn't seem to work and still creates a 100MB swap file after reboot.
For other distributions that use a swap partition instead of a swap file, remove the appropriate line from /etc/fstab
Disabling Journaling on the Filesystem
Using a journaling filesystem such as ext3 or ext4 WITHOUT a journal is an option to decrease read/writes. The obvious drawback of using a filesystem with journaling disabled is data loss as a result of an ungraceful dismount (i.e. post power failure, kernel lockup, etc.).
You can disable journaling on ext3 by mounting it as ext2
You can disable journaling on ext4 on an unmounted drive like this:
tune4fs -O ^has_journal /dev/sdaX
e4fsck –f /dev/sdaX
sudo reboot
The noatime Mount Flag
Assign the noatime mount flag to partitions residing on the SD card by adding it to the options section of the partition in /etc/fstab.
Reading accesses to the file system will no longer result in an update to the atime information associated with the file. The importance of the noatime setting is that it eliminates the need by the system to make writes to the file system for files which are simply being read. Since writes can be somewhat expensive as mentioned in previous section, this can result in measurable performance gains. Note that the write time information to a file will continue to be updated anytime the file is written to with this option enabled.
Directories in RAM
Highly used directories such as /var/tmp/ and possibly /var/log can be relocated to RAM in /etc/fstab like this:
tmpfs /var/tmp tmpfs nodev,nosuid,size=50M 0 0
This will allow /var/tmp to use 50MB of RAM as disk space. The only issue with doing this is that any drives mounted in RAM will not persist past a reboot. Thus if you mount /var/log and your system encounters an error that causes it to reboot, you will not be able to find out why.
Directories in external Hard Disk
You can also mount some directories on a persistent USB hard disk. More details of this can be found in this question.
The Raspberry Pi can also boot it's root partition from an external drive. This could be via USB or Ethernet and means that the SD card will only be used to delegate to different device during boot. This requires a bit of kernel hacking to accomplish, as I don't think the default kernel supports USB storage. You can find more information at this question, or this external blog post.
Here is one more interesting consideration from another answerer:
Excellent article about flash filesystems.
Important question when talking about flash filesystems is following: What is wear leveling? Wikipedia article. Basically, on flash disks you can write limited number of times until block goes bad. After that, filesystem (if there is no built-in wear leveling management on hardware, as in case of SSDs there usually is) must mark that block as invalid, and avoid using it anymore.
Typical filesystems (for example reiserfs, ntfs, ext3 and so on) are designed for hard disks, that do not have such limitations.
JFFS2
Includes compression and elegant wear leveling protection.
YAFFS2
Single thing that makes the difference: short mount times, after successful umount.
Implements write once property: once data is written to one block, there is no need to rewrite it. This is important for protecting against wear leveling.
LogFS
Not very mature, but already included in Linux kernel tree.
Supports larger filesystems than JFFS2/YAFFS2 without problems.
UBIFS
More mature than LogFS
Write caching support
On scalability: article. On large disks, better performance than with JFFS2
ext4
If no driver or card (for example SSD drives do have internal wear leveling, at least usually) handle wear leveling, then ext4 is not the best idea, as it is not intended for raw flash usage.
What is best one?
Of course, it depends on usage and support. From what I read from the internet, I would recommend UBIFS. Good support for large filesystems, mature development phase, adequate performance and no huge downsides.
Thanks to answerers:
How can I extend the life of my SD card?
Choice of filesystem for GNU/Linux on an SD card

Optimizing data stream to disk in C (also flash memory)

I have a C program running on Linux that acquires data from a USB device (sensor data), does some processing and streams the result to disk. Currently I save to a text file using fputs(), a line looks like this:
timestamp value1 value2 ... valueN
the sample rate being up to 250Hz.
The program should run on a RPi or similar board and possibly write the data to a flash memory (SD card).
I have following questions:
Should I be optimizing the data stream or let the OS do the job? More specifically, should I be trying to minimize how often data is actually written to disk (also given the use of a flash memory)?
I have read about setbuf() and setvbuf(), as I understand they should effectively delay writing until a "block" is filled. Are these appropriate or is there a better way other than perhaps implementing my own buffer?
Which output function is best suited for data streaming with the above in mind (fputs() / fprintf() / write())?
Should I be trying to increase randomness (as to use all sectors) when writing to a SD card? If yes what's the best way to achieve this?
Here some more thoughts:
I can consider using a binary format to decrease size, but I would prefer keeping the text format to simplify later data handling.
Using a hard drive is also an option in the final design, especially if a high acquisition rate is to be carried on over a long time.
The data rate being relatively low I do not expect bandwidth problem with either hard drive or SD card. It is possible that the rate will be higher in the future (kHz or more).
Thanks for your answers.
EDIT 20130128
Thank you for all the answers so far, they give me some good insight. I'll sum it up a bit:
In general I should not have bandwidth issues, however to avoid unnecessary large log files I might consider a binary format. Yes the log should be human readable, if not I'll make an export function or similar. Yes unwind's assumption is correct, about 10 or 15 data values each line.
The mentioned read/write cycles per cell should be enough for some time, at least in the testing phase, considering we don't always write and delete the same cells. I will play around with buffer size in setvbuf() and set the buffering mode to full buffering to see if I can optimize this while keeping a reasonable save interval (a few seconds or more also depending on sample rate).
In the final design I might use a hard drive to avoid most of the problems mentioned here, or a second SD card which can be easily replaced (might be also good to quickly retrieve the data). I will format this with one of the format suggested here (FAT or JFFS2/F2FS).
Following zmo's suggestion I will try to make the system as read only as possible (at least the system partition), I was already considering this.
A Beaglebone, also mentioned by zmo, is my next choice if I'm not happy with the RPi (I read that its USB bus is not always stable, USB is obviously very important for my application).
I have already implemented a UDP port to send data over network, still I would like to keep at least a local copy of that data and maybe only send a subset of or already processed data, as well as "control data".

Should I be optimizing the data stream or let the OS do the job? More specifically, should I be trying to minimize how often data is actually written to disk (also given the use of a flash memory)?
Well, you can usually assume that the OS does a pretty awesome job at buffering and handling output to the hard drive… As long as you don't do unbuffered writes.
Though, from my experience, you should not write logs to a SD Card, because it definitely kills the SD Card faster than you can imagine. On my first projects, I had installed linux on beaglebones, and between 6 months to 12 months after, all my SD Cards had to be replaced…
Since then, I've learned to run read only systems on the SD card and send any kind of regular updates over the network, the trick being to use a ramdisk for /tmp and /var.
In your case, using a hard drive is an easy solution (which will works smoothly), but you can also use a secondary SD Card where you write the logs. Then you'll be able to use a "stupid" filesystem such as a FAT one where you'll write your data aligned, as your data will be the only thing to be written on the SD. What is killing a SDCard is lots of little read/writes that happen a lot with temporary files, and defragmentation of the drive.
I have read about setbuf() and setvbuf(), as I understand they should effectively delay writing until a "block" is filled. Are these appropriate or is there a better way other than perhaps implementing my own buffer?
well, just keep it to full buffering, it will help write your data aligned on the filesystem.
Which output function is best suited for data streaming with the above in mind (fputs() / fprintf() / write())?
they should all behave similarly for your problematic.
Should I be trying to increase randomness (as to use all sectors) when writing to a SD card? If yes what's the best way to achieve this?
the firmware of the sdcard should be taking care of that for you. The only thing would be to use a simpler filesystem like FAT (or JFFS2/F2FS like ivan-voras suggets), because ext2/ext3/ext4 filesystems do automatic defragmentation which basically is moving around inodes to keep everything aligned. Though I'm not sure if it disables that behavior with SDcards and SSDs.

Writing to the SD card often will definitely kill it sooner, but it also means you can attempt to prolong this time by reducing the number of writes. As others have said, the best solution for you would be to write the logs over the network to a server or just another machine which has proper storage (in the simplest case, maybe you can use syslog(3) or just plain NFS).
If you want to continue with the original plan, then using setvbuf(3) to enable block buffered mode and setting a large buffer size (like 128 KiB or 256 KiB) would be best. A large buffer size also means that you will lose unwritten data from the buffer if power goes out, etc.
However, a large buffer only delays the inevitable and you should search for other options. It's not as alarming as Lundin's answer states because there are many cells and you're not writing always to the same one, so if you get the largest SD card you can buy, then using his method you can calculate approximately how many times you can rewrite the entire card before it fails. Using a flash-friendly file system such as F2FS or JFFS2 will be beneficial.

Here're my thoughts:
It might be a good idea to buffer some data in memory before writing to disk, but keep in mind that this might cause data loss in case of power failure.
I think this is highly dependent on the file system and type of storage you use. There is no generic answer but it could prove useful to implement and benchmark it on your specific configuration.
Considering the huge amount of data you're outputting, I'd choose a binary format (unless you want the file to be human readable)
The firmware of the flash drive should already take care of this. Basically this is the cornerstone of all modern SSDs. (SD card controllers should implement it too.)

NAND RAW access

I'm working with a C++ application in an embedded systems running Linux. This device receives messages (small chunk of few bytes) and need to be stored in a non volatile memory in case of power failure. This worked well with another platform because a static RAM was available.
The problem on this platform is that we only have a NAND Flash to do this and we would like to append different message in the same block without having to erase the whole block before updating it with a new message ! Writing a file per messages is not a good solution because there can be a lot of them ! Moreover, this must be efficient and should be life sparing for the flash by avoiding too much erases ! What I would like to be able to do is writing byte after byte into the flash without worrying about bad blocks.
I found "Petit FAT File System" and I'm wondering if this would suite my needs ... ?
Could someone tell me if this is possible with "Petit FAT File System" or give me any suggestion on how to handle this ?
Thanks !

I haven't looked into Petit file system, but your real limitation is the NAND flash. The manufacture data sheet will likely indicate how many writes you can successfully make to each block, before an erase is required. It's possible that there is no hard limit, but the integrity of the data will not be guaranteed after a max write count.
The answer depends on the process technology and flash cell design. For example, is it SLC or MLC NAND? SLC is going to be able to handle multiple block writes better.
Another question would be what type of flash controller is on your system? If it uses hardware ECC, then you might be limited by the controller, since 2nd writes will invalidate the ECC value of the 1st data write. If it is possible that you can do ECC calculations in software, then it comes back to the NAND limitation.
Small write support might be addressed in the data sheet, via a special set aside memory area that might be provided. So again, check the data sheet.
If you post a link, or indicate what hardware you are using, I can try and give you a more definite answer.

If you are dealing with flash, there's no way around deleting it before writing. All flash memory works in that way. Depending on your real-time requirements and the size of the data, this may or may not be an issue. But since you are using embedded Linux, real-time is probably not a major concern for the application anyhow.
I don't see why you would need a complete file system to store a few bytes?! Why do you need an external memory for this in the first place, can't you write to the internal flash of the MCU? If you just need to store a few bytes, an MCU with on-chip eeprom/data flash would likely suit your needs the best.
Also, that flash circuit doesn't look too promising. First I find it mighty fishy that they don't type out the number of cycles nor the data retention but refer to the "gualification report". This might indicate that the the memory is of poor quality.
And the data sheet says year 2009 and Samsung. If I may be cynical, that probably means that the chip is already obsolete. Samsung doesn't exactly have the best long-life reputation.

I'm curious why you want to use raw flash. Why not use something like JFFS2 or UBIFS on top of the MTD drive? Let the MTD driver manage the ECC while JFFS2 or UBIFS manages the wear-leveling. Then just open one file and write to it whenever you need.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight