Life expectancy of usb stick when datalogging [closed] - c

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 2 years ago.
Improve this question
I know that on average a flashdrive has a life expectancy of roughly 100,000 write cycles. This raises a question for me
I have written a program where I write some values to a csv file on a usb stick every 6 seconds. Every day a new file is created. The machine is a Sigmatek PLC programmed in structure text (similar to pascal) with C library for file handling. The code looks something like file fopen (opens todays file), write some values to the stream along with a timestamp, then file fclose (close the file).
I heard someone say this could mean my usb stick will not last a long time since I'm opening and closing the file every 6 seconds. He suggested I opened the file, write values every 6 seconds as usual, and then close after 10 or 20 minutes, this way the usb stick would last a lot longer. His reasoning being that the usb stick will only be written to at the moment you would actually close the file with Fclose. Can someone confirm this?
Or will this perhaps not become a problem at all even if im opening and closing every 6 seconds, since the usb stick has 16gb of memory, and will only run out of memory after a looooong time (1 file is 500kb max, one file created evey day) , therefore I'm only writing and not writing and erasing from memory? Is the 100,000 write cycles lifetime based on purely writing or writing, erasing and re-writing?

First, regarding a fclose() every 10-20 minutes. This depends on the buffering mode (for C, see setvbuf). In buffered mode, what you were told is correct - any buffered data is written at the time of a fclose(). However, there is an increased risk of losing data (e.g. sudden power loss means unwritten buffer is lost).
We've also made embedded systems using writable flash (not USB). 100,000 write cycles is hugely variable. It means "P/E" (program-erase) cycles. If you're only appending data, then at the rate you cite, I would not bother too much about it. If you're doing other things like erasing/compressing log files which could result in the same storage location being written multiple times, then you need to think more about it. You'd also need to look at what is being done by the OS - for example, any type of auto-defrag should preferably not be enabled.

Related

C fastest way to continously write data to file [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 3 years ago.
Improve this question
I have a string composed of some packet statistics, such as packet length, etc.
I would like to store this to a csv file, but if I use the standard fprintf to write to a file, it writes incredibly slowly, and I end up losing information.
How do I write information to a file as quickly as possible in order to minimize information loss from packets. Ideally I would like to support millions of packets per second, which means I need to write millions of lines per second.
I am using XDP to get packet information and send it to the userspace via an eBPF map if that matters.
The optimal performance will depend on the hard drive, drive fragmentation, the filesystem, the OS and the processor. But optimal performance will never be achieved by writing small chunks of data that do not align well with the filesystem's disk structure.
A simple solution would be to use a memory mapped file and let the OS asynchronously deal with actually committing the data to the file - that way it is likely to be optimal for the system you are running on without you having to deal with all the possible variables or work out the optimal write block size of your system.
Even with regular stream I/O you will improve performance drastically by writing to a RAM buffer. Making the buffer size a multiple of the block size of your file system is likely to be optimal. However since file writes may block if there is insufficient buffering in the file system itself for queued writes or write-back, you may not want to make the buffer too large if the data generation and the data write occur in a single thread.
Another solution is to have a separate write thread, connected to the thread generating the data via a pipe or queue. The writer thread can then simply buffer data from the pipe/queue until it has a "block" (again matching the file system block size is a good idea), then committing the block to the file. The pipe/queue then acts a buffer storing data generated while the thread is stalled writing to the file. The buffering afforded by the pipe, the block, the file system and the disk write-cache will likely accommodate any disk latency so long at the fundamental write performance of the drive is faster then the rate at which data to write is being generated - nothing but a faster drive will solve that problem.
Use sprintf to write to a buffer in memory.
Make that buffer as large as possible, and when it gets full, then use a single fwrite to dump the entire buffer to disk. Hopefully by that point it will contain many hundreds or thousands of lines of CSV data that will get written at once while you begin to fill up another in-memory buffer with more sprintf.

Logging Events---Looking For a Good Way

I created a program that monitors for events.
I want to log these events "in the right way".
Currently I have a string array, log[500][100].
Each line is a string of characters (up to 100) that report something about the event.
I have it set up so that only the last 500 events are saved in the array.
After that, new events overwrite the oldest events.
Currently I just keep revolving through the array until the program terminates, then I write the array to a file.
Going forward I would like to view the log in real time, any time I wish, without disturbing the event processing and logging process.
I considered opening the file for "appending" but here are my concerns:
(1) The program is running on a Raspberry Pi which has a flash memory as a "disk drive". I believe flash memories have a limited number of write cycles before problems can occur. This program runs 24/7 "forever" so I am afraid the "disk drive" will "wear out".
(2) I am using pretty much all the CPU capacity of the RPi so I don't want to add a lot of overhead/CPU cycles.
How would experienced programmers attack this problem?
Please go easy on me, this is my first C program.
[EDIT]
I began reviewing all the information and I became intrigued by Mark A's suggestion for tmpfs. I looked into it more and I am sure this answers my question. It allows the creation of files in RAM not the SD card. They are lost on power down but I don't care.
In order to keep the files from growing to large I created a double buffer approach. First I write 500 events to file A then switch to file B. When 500 events have been written to file B I close and reopen file A (to delete the contents and start at 0 events) and switch to writing to file A. I found I needed to fflush(file...) after each write or else the file was empty until fclose.
Normally that would be OK but right now I am fighting a nasty segmentation fault so I want as much insight into what is going on. When I hit the fault, I never get to my fclose statements.
Welcome to Stack Overflow and to C programming! A wonderful world of possibilities awaits you.
I have several thoughts in response to your situation.
The very short summary is to use stdout and delegate the output-file management to the shell.
The longer, rambling answer full of my personal musing is as follows:
1 : A very typical thing for C programs to do is not be in charge of how outputs are kept. You might have heard of the "built in" file handles, stdin, stdout, and stderr. These file handles are (under normal circumstances) always available to your program for input (from stdin) and output (stdout and stderr). As you might guess from their names stdout is customarily used for regular output and stderr is customarily used for error / exception output. It is exceedingly typical for a C program to simply read from stdin and output to stdout and stderr, and let something else (e.g., the shell) take care of what those actually are.
For example, reading from stdin means that your program can be used for keyboard entry and for file reading, without needing to change your program's code. The same goes for stdout and stderr; simply output to those file handles, and let the user decide whether those should go to the screen or be redirected to a file. And, because stdout and stderr are separate file handles, the user can have them go to separate 'destinations'.
In your case, to implement this, drop the array entirely, and simply
fprintf(stdout, "event notice : %s\n", eventdetailstring);
(or similar) every time your program has something to say. Take a look at fflush(), too, because of potential output buffering.
2a : This gets you continuous output. This itself can help with your concern about memory wear on the Pi's flash disk. If you do something like:
eventmonitor > logfile
then logfile will be being appended to during the lifetime of your program, which will tend to be writing to new parts of the flash disk. Of course, if you only ever append, you will eventually run out of space on the disk, so you might set up a cron job to kill the currently running eventmonitor and restart it every day at midnight. Done with the above command, that would cause it to overwrite logfile once per day. This prevents endless growth, and it might even use a new physical area of the flash drive for the new file (even though it's the same name; underneath, it's a different file, with a different inode, etc.) But even if it reuses the exact same area of the flash drive, now you are down to worrying if this will last more than 10,000 days, instead of 10,000 writes. I'm betting that within 10,000 days, new options will be available -- worst case, you buy a new Pi every 27 years or so!
There are other possible variations on this theme, as well. e.g., you could have a sophisticated script kicked off by cron every day at midnight that kills any currently running eventmonitor, deletes output files older than a week, and starts a new eventmonitor outputting to a file whose filename is based partly on the date so that past days' file aren't overwritten. But all of this is in the realm of using your program. You can make your program easier to use by writing it to use stdin, stdout, and stderr.
2b : Or, you can just have stdout go to the screen, which is typically how it already is when a program is started from an interactive shell / terminal window. I imagine you could have the Pi running headless most of the time, and when you want to see what your program is outputting, hook up a monitor. Generally, things will stay running between disconnecting and reconnecting your monitor. This avoids affecting the flash drive at all.
3 : Another approach is to have your event monitoring program send its output somewhere off-system. This is getting into more advanced programming territory, so you might want to save this for a later enhancement, after you've mastered more of the basics. But, your program could establish a network connection to, say, a JSON API and send event information there. This would let you separate the functions of event monitoring from event reporting.
You will discover as you learn more programming that this idea of separation of concerns is an important concept, and applies at various levels of a program or a system of interoperating programs. In this case, the Pi is a good fit for the data monitoring aspect because it is a lightweight solution, and some other system with more capacity and more stable storage can cover the data collection aspect.

improve performance in file IO in C [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
I need to write bulk of integers to a file after performing heap operations on them, one by one. I am trying to merge sorted files into a single file. As of now, I am writing to file after every operation. I am using min heap to merge files.
My questions are -
When performing file write, is disk accessed every time a file write is made or chunks of memory blocks are written at a time?
Will it improve performance if I'll take output of heap in an array of say size 1024 or may be more and then perform a write at once?
Thank you in advance.
EDIT- Will using setbuffer() help? I feel it should help for certain extend.
1. When performing file write, is disk accessed every time a file write is made
or chunks of memory blocks are written at a time?
No. Your output isn't written until the output buffer is full. You can force a write with fflush to flush output streams causing an immediate write, but otherwise, output is buffered.
other 1. Will it improve performance if I'll take output of heap in an array of
say size 1024 or may be more and then perform a write at once?
If you are not exhausting the heap, then no, you are not going to gain significant performance putting the storage on the stack, etc.. Buffering is always preferred, but if you store all the data in an array and then call write, you still have the same size output buffer to deal with.
When performing file write, is disk accessed every time a file write
is made or chunks of memory blocks are written at a time?
This is up to the kernel. Buffers are flushed when you call fsync() on the file descriptor. fflush() only flushes the data buffered in the FILE structure, it doesn't flush the kernel buffers.
Will it improve performance if I'll take output of heap in an array of
say size 1024 or may be more and then perform a write at once?
I made tests some time ago to compare the performance of write() and fwrite() against a custom implementation, and it turns out you can gain a fair speedup by calling write() directly with large chunks. This is actually what fwrite() does, but due to the infrastructure it has to maintain, it is slower than a custom implementation. As for buffer size, 1024 is certainly too small. 8K or something would perform better.
It is operating system and implementation specific.
On most Linux systems -with a good filesystem like Ext4- the kernel will try hard to avoid disk accesses by caching a lot of file system data. See linuxatemyram
But I would still recommend avoiding making too much of IO operations, and have some buffering (if using stdio(3) routines, pass buffers of several dozens of kilobytes to fwrite(3) and use setvbuf(3) and fflush(3) with care; alternatively use direct syscalls like write(2) or mmap(2) with buffers of e.g. 64Kbytes...)
BTW, using perhaps the posix_fadvise(2) syscall might marginally help performance (if used wisely).
In reality, the bottleneck is often the hardware. Use RAM filesystems (tmpfs) or fast SSD disks if you can.
On Windows systems (which I never used), I have no idea, but the general intuition is that some buffering should help.

Optimizing data stream to disk in C (also flash memory)

I have a C program running on Linux that acquires data from a USB device (sensor data), does some processing and streams the result to disk. Currently I save to a text file using fputs(), a line looks like this:
timestamp value1 value2 ... valueN
the sample rate being up to 250Hz.
The program should run on a RPi or similar board and possibly write the data to a flash memory (SD card).
I have following questions:
Should I be optimizing the data stream or let the OS do the job? More specifically, should I be trying to minimize how often data is actually written to disk (also given the use of a flash memory)?
I have read about setbuf() and setvbuf(), as I understand they should effectively delay writing until a "block" is filled. Are these appropriate or is there a better way other than perhaps implementing my own buffer?
Which output function is best suited for data streaming with the above in mind (fputs() / fprintf() / write())?
Should I be trying to increase randomness (as to use all sectors) when writing to a SD card? If yes what's the best way to achieve this?
Here some more thoughts:
I can consider using a binary format to decrease size, but I would prefer keeping the text format to simplify later data handling.
Using a hard drive is also an option in the final design, especially if a high acquisition rate is to be carried on over a long time.
The data rate being relatively low I do not expect bandwidth problem with either hard drive or SD card. It is possible that the rate will be higher in the future (kHz or more).
Thanks for your answers.
EDIT 20130128
Thank you for all the answers so far, they give me some good insight. I'll sum it up a bit:
In general I should not have bandwidth issues, however to avoid unnecessary large log files I might consider a binary format. Yes the log should be human readable, if not I'll make an export function or similar. Yes unwind's assumption is correct, about 10 or 15 data values each line.
The mentioned read/write cycles per cell should be enough for some time, at least in the testing phase, considering we don't always write and delete the same cells. I will play around with buffer size in setvbuf() and set the buffering mode to full buffering to see if I can optimize this while keeping a reasonable save interval (a few seconds or more also depending on sample rate).
In the final design I might use a hard drive to avoid most of the problems mentioned here, or a second SD card which can be easily replaced (might be also good to quickly retrieve the data). I will format this with one of the format suggested here (FAT or JFFS2/F2FS).
Following zmo's suggestion I will try to make the system as read only as possible (at least the system partition), I was already considering this.
A Beaglebone, also mentioned by zmo, is my next choice if I'm not happy with the RPi (I read that its USB bus is not always stable, USB is obviously very important for my application).
I have already implemented a UDP port to send data over network, still I would like to keep at least a local copy of that data and maybe only send a subset of or already processed data, as well as "control data".
Should I be optimizing the data stream or let the OS do the job? More specifically, should I be trying to minimize how often data is actually written to disk (also given the use of a flash memory)?
Well, you can usually assume that the OS does a pretty awesome job at buffering and handling output to the hard driveā€¦ As long as you don't do unbuffered writes.
Though, from my experience, you should not write logs to a SD Card, because it definitely kills the SD Card faster than you can imagine. On my first projects, I had installed linux on beaglebones, and between 6 months to 12 months after, all my SD Cards had to be replacedā€¦
Since then, I've learned to run read only systems on the SD card and send any kind of regular updates over the network, the trick being to use a ramdisk for /tmp and /var.
In your case, using a hard drive is an easy solution (which will works smoothly), but you can also use a secondary SD Card where you write the logs. Then you'll be able to use a "stupid" filesystem such as a FAT one where you'll write your data aligned, as your data will be the only thing to be written on the SD. What is killing a SDCard is lots of little read/writes that happen a lot with temporary files, and defragmentation of the drive.
I have read about setbuf() and setvbuf(), as I understand they should effectively delay writing until a "block" is filled. Are these appropriate or is there a better way other than perhaps implementing my own buffer?
well, just keep it to full buffering, it will help write your data aligned on the filesystem.
Which output function is best suited for data streaming with the above in mind (fputs() / fprintf() / write())?
they should all behave similarly for your problematic.
Should I be trying to increase randomness (as to use all sectors) when writing to a SD card? If yes what's the best way to achieve this?
the firmware of the sdcard should be taking care of that for you. The only thing would be to use a simpler filesystem like FAT (or JFFS2/F2FS like ivan-voras suggets), because ext2/ext3/ext4 filesystems do automatic defragmentation which basically is moving around inodes to keep everything aligned. Though I'm not sure if it disables that behavior with SDcards and SSDs.
Writing to the SD card often will definitely kill it sooner, but it also means you can attempt to prolong this time by reducing the number of writes. As others have said, the best solution for you would be to write the logs over the network to a server or just another machine which has proper storage (in the simplest case, maybe you can use syslog(3) or just plain NFS).
If you want to continue with the original plan, then using setvbuf(3) to enable block buffered mode and setting a large buffer size (like 128 KiB or 256 KiB) would be best. A large buffer size also means that you will lose unwritten data from the buffer if power goes out, etc.
However, a large buffer only delays the inevitable and you should search for other options. It's not as alarming as Lundin's answer states because there are many cells and you're not writing always to the same one, so if you get the largest SD card you can buy, then using his method you can calculate approximately how many times you can rewrite the entire card before it fails. Using a flash-friendly file system such as F2FS or JFFS2 will be beneficial.
Here're my thoughts:
It might be a good idea to buffer some data in memory before writing to disk, but keep in mind that this might cause data loss in case of power failure.
I think this is highly dependent on the file system and type of storage you use. There is no generic answer but it could prove useful to implement and benchmark it on your specific configuration.
Considering the huge amount of data you're outputting, I'd choose a binary format (unless you want the file to be human readable)
The firmware of the flash drive should already take care of this. Basically this is the cornerstone of all modern SSDs. (SD card controllers should implement it too.)

C program stuck on uninterruptible wait while performing disk I/O on Mac OS X Snow Leopard

One line of background: I'm the developer of Redis, a NoSQL database. One of the new features I'm implementing is Virtual Memory, because Redis takes all the data in memory. Thanks to VM Redis is able to transfer rarely used objects from memory to disk, there are a number of reasons why this works much better than letting the OS do the work for us swapping (redis objects are built of many small objects allocated in non contiguous places, when serialized to disk by Redis they take 10 times less space compared to the memory pages where they live, and so forth).
Now I've an alpha implementation that's working perfectly on Linux, but not so well on Mac OS X Snow Leopard. From time to time, while Redis tries to move a page from memory to disk, the redis process enters the uninterruptible wait state for minutes. I was unable to debug this, but this happens either in a call to fseeko() or fwrite(). After minutes the call finally returns and redis continues working without problems at all: no crash.
The amount of data transfered is very small, something like 256 bytes. So it should not be a matter of a very big amount of I/O performed.
But there is an interesting detail about the swap file that's target of the write operation. It's a big file (26 Gigabytes) created opening a file with fopen() and then enlarged using ftruncate(). Finally the file is unlink()ed so that Redis continues to take a reference to it, but we are sure that when the Redis process will exit the OS will really free the swap file.
Ok that's all but I'm here for any further detail. And BTW you can even find the actual code in the Redis git, but it's not trivial to understand in five minutes given that's a fairly complex system.
Thank you very much for any help.
As I understand it, HFS+ has very poor support for sparse files. So it may be that your write is triggering a file expansion that is initializing/materializing a large fraction of the file.
For example, I know mmap'ing a new large empty file and then writing at a few random locations produces a very large file on disk with HFS+. It's quite annoying since mmap and sparse files are an extremely convenient way of working with data, and virtually every other platform/filesystem out there handles this gracefully.
Is the swap file written to linearly? Meaning we either replace an existing block or write a new block at the end and increment a free space pointer? If so, perhaps doing more frequent smaller ftruncate calls to expand the file would result in shorter pauses.
As an aside, I'm curious why redis VM doesn't use mmap and then just move blocks around in an attempt to concentrate hot blocks into hot pages.
antirez, I'm not sure I'll be much help since my Apple experience is limited to the Apple ][, but I'll give it a shot.
First thing is a question. I would have thought that, for virtual memory, speed of operation would be a more important measure than disk space (especially for a NoSQL DB where speed is the whole point, otherwise you'd be using SQL, no?). But, if your swap file is 26G, maybe not :-)
Some things to try (if possible).
Try to actually isolate the problem to the seek or write. I have a hard time believing a seek could take that long since, at worst, it should be a buffer pointer change. Still, I didn't write OSX so I can't be sure.
Try adjusting the size of the swap file to see if that's what is causing the problem.
Do you ever dynamically expand the swap file (as opposed to pre-allocation)? If you do, that may be what is causing the problem.
Do you always write as low in the file as you can? It may be that creating a 26G file may not actually fill it with data but, if you create it then write to the last byte, the OS may have to zero out the bytes before then (deferring the initialization, if any).
What happens if you just pre-allocate the entire file (write to every byte) and not unlink it? In other words, leave the file there between runs of your program (creating it if it doesn't already exist of course). Then in your startup code for Redis, just initialize the file (pointers and such). This may get rid of any problems like those in point 4 above.
Ask on the various BSD sites as well. I'm not sure how much Apple changed under the covers but OSX is just BSD at the lowest level (Pax ducks for cover).
Also consider asking on the Apple sites (if you haven't already done so).
Well, that's my small contribution, hopefully it'll help. Good luck with your project.
Have you turned off file caching for your file? i.e. fcntl(fd, F_GLOBAL_NOCACHE, 1)
Have you tried debugging with DTrace and or Instruments (Apple's experimental dtrace front-end)?
Exploring Leopard with DTrace
Debugging Chrome on OS X
As Linus said once on the Git mailing list:
"I realize that OS X people have a hard time accepting it, but OS X
filesystems are generally total and utter crap - even more so than
Windows."

Resources