btrfs: how to drain one disk in 'single' profile

Assume a multi-device btrfs with data profile single and metadata profile mirrored.
The first disk is almost full.
The second disk is large enough to hold all data of the whole filesystem.
The first disk needs replacement - is there a way to drain the data from the first disk, like e.g. some btrfs balance filter?
There is devid=1 to select data of the first disk only, but how do I tell btrfs balance to shift all that data to the second disk?

To remove a device and have its contents transferred to the remaining devices, you can use btrfs device remove ....
To replace a device you can use btrfs replace .... Afterwards you may need to btrfs filesystem resize ....
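The two workflows above can be sketched as follows. This is a hedged sketch, not a definitive recipe: the device names (/dev/sdc), the mount point (/mnt), and the assumption that the disk to drain is devid 1 are all hypothetical and must be adapted to your setup.

```shell
# Hypothetical sketch: devid 1 is the failing disk, /dev/sdc its
# replacement, /mnt the filesystem's mount point.

# Option 1: drain devid 1 onto the remaining devices, then detach it.
btrfs device remove 1 /mnt

# Option 2: copy devid 1 directly onto the new disk.
btrfs replace start 1 /dev/sdc /mnt
btrfs replace status /mnt            # poll until the copy finishes
# If /dev/sdc is larger than the old disk, grow the device afterwards:
btrfs filesystem resize 1:max /mnt
```

Option 1 needs enough free space on the remaining devices; option 2 needs a spare drive slot, but moves only the one disk's data and is usually faster.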

Related

exFAT cluster allocation strategy

I am attempting to create an exFAT file system implementation for an embedded processor. I have a question about the cluster allocation of a file.
Suppose I have created a file and then written some data to it. Let's say that the file has used 10 clusters and they are contiguous on disk. As far as I understand it, exFAT has a way to mark the file as contiguous in the file's directory entry.
Let's say I create more files and write data to them. Then I go back to my first file and want to write more data to it. But now there are no free clusters at the end of the file, as other files occupy them. I guess it must be possible to add more space to the file, but how is this recorded in exFAT? The original file was contiguous but now it isn't, and the FAT table wouldn't have been maintained because of that flag. Does the FAT table now have to be written because the file is fragmented? Thanks
I did an experiment using a USB pen drive and a disk utility. It turns out that if you write more data to the file and it becomes fragmented, the contiguous-cluster flag gets cleared in the directory entry, and the FAT chain is then created to indicate where the file's clusters are located.

Data distribution in btrfs single profile array: using file instead of block level?

I have an array of 3 different drives which I use in single profile (no raid). I don't use raid because the data isn't important enough to spend extra money on additional drives.
But what I could not figure out exactly is on what granularity the data is distributed on the 3 drives.
I could find this on the wiki page:
When you have drives with differing sizes and want to use the full capacity of each drive, you have to use the single profile for the data blocks, rather than raid0.
As far as I understand this means that not the whole files are distributed/allocated on one of the 3 drives but each of the file's data blocks.
This is unfortunate, because losing only 1 drive would destroy the whole array. Is it possible to balance a single-profile array at the file level?
I would be fine with the risk of losing all files on 1 drive in the array but not losing the whole array if 1 drive fails.

How does fwrite/putc write to Disk?

Suppose we have an already existing file, say <File>. This file has been opened by a C program for update (r+b). We use fseek to navigate to a point inside <File>, other than the end of it. Now we start writing data using fwrite/fputc. Note that we don't delete any data previously existing in <File>...
How does the system handle those writes? Does it rewrite the whole file to another position in the Disk, now containing the new data? Does it fragment the file and write only the new data in another position (and just remember that in the middle there is some free space)? Does it actually overwrite in place only the part that has changed?
There is a good reason for asking: in the first case, if you continuously update a file, the system can get slow. In the second case, it could be faster but will mess up the file system if done to too many files. In the third case, especially if you have a solid-state disk, updating the same spot of a file over and over again may render that part of the disk useless.
Actually, that's where my question originates from. I've read that, to save disk sectors from overuse, solid-state disks move data to less-used sectors using various techniques. But how exactly do the stdio functions handle such situations?
Thanks in advance for your time! :D
The filesystem keeps a kind of dictionary, itself written to sectors on the disc. When you update the content of a file, the filesystem looks up this dictionary, which tells it in which sectors on the disc the file data is located. Then it spins (or waits until the disc head arrives there) and updates the appropriate sectors.
That's the short version.
So in the case of updating a file, the file is normally not moved to a new place. When you write new data to the file, appending to it, and the data doesn't fit into the existing sectors, then additional sectors are allocated and the data is written there.
If you delete a file, the sectors are usually marked as free and reused. So it is only if you open a new file and rewrite the data that the file may end up in different sectors than before.
But the details can vary, depending on the hardware. AFAIK if you overwrite data on a CD, then the data is newly written (as long as the session is not finalized), because you can not update data on a CD, once it is written.
Your understanding is incorrect: "Note that we don't delete any data previously existing in File"
If you seek into the middle of a file and start writing it will write over whatever was at that position before.
How this is done under the covers probably depends on how the computer in the hard disk implements it. It's supposed to be invisible outside the hard disk and shouldn't matter.
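The overwrite-in-place behaviour is easy to observe from the shell: dd with conv=notrunc performs the same seek-then-write that fseek/fwrite do, replacing bytes in the middle of the file without truncating or growing it (the file name here is arbitrary).

```shell
printf 'AAAAAAAAAA' > demo.bin                          # 10-byte file
printf 'XX' | dd of=demo.bin bs=1 seek=4 conv=notrunc 2>/dev/null
cat demo.bin    # AAAAXXAAAA: bytes at offsets 4-5 replaced in place,
                # the file is still exactly 10 bytes long
```

Without conv=notrunc, dd would truncate the file after the written bytes, which is the behaviour the question is worried about; with it, the write lands over the existing data, exactly like fwrite after fseek.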

How to avoid physical disk I/O

I have a process which writes huge data over the network. Let's say it runs on machine A and dumps around 70-80 GB to a file on machine B over NFS. After process 1 finishes and exits, my process 2 runs on machine A and fetches this file from machine B over NFS. The bottleneck in the entire cycle is the writing and reading of this huge data file. How can I reduce this I/O time? Can I somehow keep the data loaded in memory, ready for use by process 2 even after process 1 has exited?
I'd appreciate ideas on this. Thanks.
Edit: since process 2 'reads' the data directly over the network, would it be better to copy the data locally first and then read from the local disk? I mean, would
(read time over network) > (cp to local disk) + (read from local disk)
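One way to answer the inequality above is to measure it directly. This is only a sketch: the paths below are hypothetical stand-ins for your NFS mount and local disk, and it is not runnable as-is.

```shell
# Hypothetical paths: /mnt/nfs is the NFS mount, /local a local disk.
time cat /mnt/nfs/bigfile > /dev/null      # read straight over NFS
time cp /mnt/nfs/bigfile /local/bigfile    # copy to local disk first
time cat /local/bigfile > /dev/null        # then read locally
```

Drop the page cache (or use a file larger than RAM) between runs, otherwise the second read mostly measures cached data rather than the disk or the network.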
If you want to keep the data loaded in memory, then you'll need 70-80 GB of RAM.
The best is maybe to attach a local storage (hard disk drive) to system A to keep this file locally.
The obvious answer is to reduce network writes, which could save you a great deal of time and improve reliability. There seems very little point in copying a file to another machine only to copy it back, so to answer your question more precisely we would need more information.
There is a lot of network and I/O overhead with this approach, so you may not be able to reduce the latency much further.
Since the file is more than 80 GB, create a file that process 1 mmaps and writes into, and that process 2 can later mmap and read from: no network involved, only machine A is used, though some I/O overhead is still unavoidable.
Faster: both processes can run simultaneously, with a semaphore or other signalling mechanism by which process 1 indicates to process 2 that the file is ready to be read.
Fastest approach: let process 1 create a shared memory segment and share it with process 2. Whenever a limit is reached (the maximum data chunk that can be held in memory, based on your RAM size), let process 1 signal process 2 that the data can be read and processed. This solution is feasible only if the file/data can actually be processed chunk by chunk instead of as one big 80 GB chunk.
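The run-both-processes-at-once idea can be sketched with a named pipe instead of shared memory: the kernel buffers the stream, the writer blocks whenever the reader falls behind, and no intermediate file ever touches the disk. The pipe path and the seq stand-in for "process 1" are arbitrary choices for illustration.

```shell
mkfifo /tmp/stream.fifo

# "process 1": produces the data stream in the background
seq 1 100000 > /tmp/stream.fifo &

# "process 2": consumes the stream as it arrives, concurrently
lines=$(wc -l < /tmp/stream.fifo)
echo "$lines"                        # all 100000 records arrived

wait
rm /tmp/stream.fifo
```

The same structure carries over to the shared-memory version: replace the fifo with a ring buffer in shm and the implicit pipe blocking with an explicit semaphore.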
Whether you use mmap or plain read/write should make little difference; either way, everything happens through the filesystem cache/buffers. The big problem is NFS. The only way you can make this efficient is by storing the intermediate data locally on machine A rather than sending it all over the network to machine B only to pull it back again right afterwards.
Use tmpfs to leverage memory as (temporary) files.
Use mbuffer with netcat to simply relay from one port to another without storing the intermediate stream, but still allowing streaming to occur at varying speeds:
machine1:8001 -> machine2:8002 -> machine3:8003
At machine2 configure a job like:
netcat -l -p 8002 | mbuffer -m 2G | netcat machine3 8003
This will allow at most 2 gigs of data to be buffered. If the buffer is filled 100%, machine2 will just start blocking reads from machine1, delaying the output stream without failing.
When machine1 has completed transmission, the second netcat will stay around till the mbuffer is depleted.
You can use a RAM disk as storage.
NFS is slow. Try an alternative way to transfer the data to the other PC, for example a TCP/IP stream.
Another solution: you can use an in-memory database (TimesTen, for example).

FAT File system and lots of writes

I am considering using a FAT file system for an embedded data-logging application. The logger will only create one file, to which it continually appends 40 bytes of data every minute. After a couple of years of use this would be over one million write cycles. MY QUESTION IS: does a FAT system change the File Allocation Table every time a file is appended? How does it keep track of where the end of the file is? Does it just put an EndOfFile marker at the end, or does it store the length in the FAT table? If it does change the FAT table every time I do a write, I would wear out the FLASH memory in just a couple of years. Is a FAT system the right thing to use for this application?
My other thought is that I could just store the raw data bytes in the memory card and put an EndOfFile marker at the end of my data every time I do a write. This is less desirable though because it means the only way of getting data out of the logger is through serial transfers and not via a PC and a card reader.
FAT updates the directory table when you modify the file (at least, it will if you close the file, I'm not sure what happens if you don't). It's not just the file size, it's also the last-modified date:
http://en.wikipedia.org/wiki/File_Allocation_Table#Directory_table
If your flash controller doesn't do transparent wear levelling, and your flash driver doesn't relocate things in an effort to level wear, then I guess you could cause wear. Consult your manual, but if you're using consumer hardware I would have thought that everything has wear-levelling somewhere.
On the plus side, if the event you're worried about only occurs every minute, then you should be able to speed that up considerably in a test to see whether 2 years worth of log entries really does trash your actual hardware. Might even be faster than trying to find the relevant manufacturer docs...
No, a flash file system driver is explicitly designed to minimize wear and spread it across the memory cells, taking advantage of the near-zero seek time. Your data rates are low, so it's going to last a long time. Specifying a yearly replacement of the media is a simple way to minimize the risk.
If your only operation is appending to one file it may be simpler to forgo a filesystem and use the flash device as a data tape. You have to take into account the type of flash and its block size, though.
Large flash chips are divided into sub-pages that are a power-of-two multiple of 264 (256+8) bytes in size, pages that are a power-of-two multiple of that, and blocks which are a power-of-two multiple of that. A blank page will read as all FF's. One can write a page at a time; the smallest unit one can write is a sub-page. Once a sub-page is written, it may not be rewritten until the entire block containing it is erased. Note that on smaller flash chips, it's possible to write the bytes of a page individually, provided one only writes to blank bytes, but on many larger chips that is not possible. I think in present-generation chips, the sub-page size is 528 bytes, the page size is 2048+64 bytes, and the block size is 128K+4096 bytes.
An MMC, SD, CompactFlash, or other such card (basically anything other than SmartMedia) combines a flash chip with a processor to handle PC-style sector writes. Essentially what happens is that when a sector is written, the controller locates a blank page, writes a new version of that sector there along with up to 16 bytes of 'header' information indicating what sector it is, etc. The controller then keeps a map of where all the different pages of information are located.
A SmartMedia card exposes the flash interface directly, and relies upon the camera, card reader, or other device using it to perform such data management according to standard methods.
Note that keeping track of the whereabouts of all 4,000,000 pages on a 2 gig card would require either having 12-16 MB of RAM, or else using 12-16 MB of flash as a secondary lookup table. Using the latter approach would mean that every write to a flash page would also require a write to the lookup table. I wouldn't be at all surprised if slower flash devices use such an approach (so as to only have to track the whereabouts of about 16,000 'indirect' pages).
In any case, the most important observation is that flash write times are not predictable, but you shouldn't normally have to worry about flash wear.
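The "use the flash as a data tape" idea from the question and the answer above can be sketched like this: fixed 40-byte records are only ever appended, and the end of the data is wherever the medium is still blank (erased flash reads back as all 0xFF). The file name and record contents are hypothetical; a real logger would write to the raw device instead of a file.

```shell
# Hypothetical append-only log image with fixed 40-byte records.
log=/tmp/logger.img
: > "$log"
for i in 1 2 3; do
  # printf %-40s pads each record with spaces to exactly 40 bytes
  printf '%-40s' "sample record $i" >> "$log"
done
wc -c < "$log"     # 3 records x 40 bytes = 120 bytes of data so far
```

On real flash, "find the end" means scanning for the first record that still reads as all 0xFF; no allocation table is ever rewritten, so each cell is written exactly once between erases.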
Did you check what happens to the FAT file system consistency in case of a power failure or reset of your device?
When your device experiences such a failure, you should lose at most the log entry you were writing at that moment; older entries must stay valid.
No, FAT is not the right thing if you need to read back the data.
You should further consider what happens if the flash memory fills up with data. How do you make space for new data? You need to define the requirements for this point.