I need to store a very large amount of data on a hard disk, and I can format it in basically any way I like. That data is essential, so I made a copy of it. However, if some file gets corrupted, I need to know immediately so that I can make a new copy from the only remaining good one.
However, while it is easy to check whether the hard disk as a whole is safe and sound, the only way I can check that a file is not corrupted is to read it and hash it. For very large amounts of data this is nearly infeasible: I can't afford 10 hours of reading and hashing just to check the integrity of all the files. Moreover, continuously reading all the data would keep the hard disk spinning and could wear it out. It seems reasonable, though, that some form of check could be implemented automatically by the file system itself.
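For concreteness, the manual check I'm describing amounts to something like this (just a sketch; the manifest path and the choice of SHA-256 are arbitrary placeholders):

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so huge files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root, manifest_path="checksums.json"):
    """Record a checksum for every file under root."""
    manifest = {str(p): sha256_of(p) for p in Path(root).rglob("*") if p.is_file()}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))

def verify_manifest(manifest_path="checksums.json"):
    """Re-read and re-hash everything -- this is the slow part I want to avoid."""
    manifest = json.loads(Path(manifest_path).read_text())
    return [path for path, digest in manifest.items() if sha256_of(path) != digest]
```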
I know that systems such as RAID exist to ensure file integrity, but those require multiple hard disks, right?
So my question is: given that I know that my hard disk is alive, how can I know if some data on it somewhere got corrupted? Is there any way to make that data recoverable?
Advanced file systems like ZFS (originally the Solaris file system, but also available on Linux) provide file integrity by storing checksums of data blocks.
RAID can provide additional reliability through redundancy, which you can choose to use for critical data.
Related
I'm looking for a proper way to archive and back up my data. This data consists of photos, videos, documents and more.
There are two main things I'm afraid might cause data loss or corruption: hard drive failure and bit rot.
I'm looking for a strategy that can ensure my data's safety.
I came up with the following. One hard drive which I will regularly use to store and display data. A second hard drive which will serve as an onsite backup of the first one. And a third hard drive which will serve as an offsite backup. I am however not sure if this is sufficient.
I would prefer to use regular drives, and not network attached storage, however if it's better suited I will adapt.
One of the things I read about that might help with bit rot is ZFS. ZFS does not prevent bit rot but can detect data corruption by using checksums. This would allow me to recover a corrupted file from a different drive and copy it to the corrupted one.
I need at least 2TB of storage, but I'm considering 4TB to cover potential future needs.
What would be the best way to safely store my data and prevent data loss and corruption?
For your local system plus local backup, I think a RAID configuration / ZFS makes sense because you’re just trying to handle single-disk failures / bit rot, and having a synchronous copy of the data at all times means you won’t lose the data written since your last backup was taken. With two disks ZFS can do a mirror and handles bit rot well, and if you have more disks you may consider using RAIDZ configurations since they use less storage overall to provide single-disk failure recovery. I would recommend using ZFS here over a general RAID solution because it has a better user interface.
For your offsite backup, ZFS could make sense too. If you go that route, periodically use zfs send to copy a snapshot on the source system to the destination system. Debatably, you should use mirroring or RAIDZ on the backup system to protect against bit rot there too.
That said — there are a lot of products that will do the offsite backup for you automatically, and if you have an offsite backup, the only advantage of having an on-site backup is faster recovery if you lose your primary. Since we’re just talking about personal documents and photos, taking a little while to re-download them doesn’t seem super high stakes. If you use Dropbox / Google Drive / etc. instead, this will all be automatic and have a nice UI and support people to yell at if anything goes wrong. Also, storage at those companies will have much higher failure tolerances because they use huge numbers of disks (allowing stuff like RAIDZ with tens of parity disks and replicated across multiple geographic locations) and they have security experts to make sure that all your data is not stolen by hackers.
The only downsides are cost, and not being as intimately involved in building the system, if that part is fun for you like it is for me :).
I have a project which uses BerkeleyDB as a key-value store for up to hundreds of millions of small records.
The way it's used is all the values are inserted into the database, and then they are iterated over using both sequential and random access, all from a single thread.
With BerkeleyDB, I can create in-memory databases that are "never intended to be preserved on disk". If the database is small enough to fit in the BerkeleyDB cache, it will never be written to disk. If it is bigger than the cache, then a temporary file will be created to hold the overflow. This option can speed things up significantly, as it prevents my application from writing gigabytes of dead data to disk when closing the database.
I have found that the BerkeleyDB write performance is too poor, even on an SSD, so I would like to switch to LMDB. However, based on the documentation, it doesn't seem like there is an option for creating a non-persistent database.
What configuration/combination of options should I use to get the best performance out of LMDB if I don't care about persistence or concurrent access at all? i.e. to make it act like an "in-memory database" with temporary backing disk storage?
Just use MDB_NOSYNC and never call mdb_env_sync() yourself. You could also use MDB_WRITEMAP in addition. The OS will still eventually flush dirty pages to disk; you can play with /proc/sys/vm/dirty_ratio etc. to control that behavior.
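If you happen to be using the Python bindings, the same flags are exposed as keyword arguments; a rough sketch (the path and map size are placeholders):

```python
import lmdb

# Open an environment tuned for throwaway data: no fsync on commit
# (MDB_NOSYNC), writable memory map (MDB_WRITEMAP), asynchronous flushes
# (MDB_MAPASYNC), and no reader lock table since access is single-threaded.
env = lmdb.open(
    "/tmp/scratch-db",          # placeholder path; a tmpfs mount works well
    map_size=64 * 1024**3,      # placeholder: must be >= the data you expect
    sync=False,                 # MDB_NOSYNC
    metasync=False,             # MDB_NOMETASYNC
    writemap=True,              # MDB_WRITEMAP
    map_async=True,             # MDB_MAPASYNC
    lock=False,                 # single-threaded, so skip the lock file
)

with env.begin(write=True) as txn:
    txn.put(b"key", b"value")

with env.begin() as txn:
    print(txn.get(b"key"))
```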
From this post: https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/
vm.dirty_ratio is the absolute maximum amount of system memory that can be filled with dirty pages before everything must get committed to disk. When the system gets to this point all new I/O blocks until dirty pages have been written to disk.
If the dirty ratio is too small, then you will see frequent synchronous disk writes.
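To see what your kernel is currently using, a quick check (assuming the usual Linux /proc layout; writing new values requires root, e.g. via sysctl):

```python
from pathlib import Path

# Current write-back thresholds, as percentages of system memory.
# Raising vm.dirty_ratio lets more dirty pages accumulate before writers block.
for name in ("dirty_background_ratio", "dirty_ratio"):
    value = Path(f"/proc/sys/vm/{name}").read_text().strip()
    print(f"vm.{name} = {value}")
```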
I know that in normal cases it is faster to read/write from a file, but what if I created a chat system:
Would it be faster to write and read from a file, or to insert/select data in a db and cache the results?
A database is faster. And, importantly for you, it deals with concurrent access.
Do you really want a mechanical disk action every time someone types? Writing to disk is a horrible idea. Cache messages in memory. Clear the message once it is sent to all users in the room. The cache will stay small, most of the time empty. This is your best option if you don't need a history log.
But if you need a log....
If you write a large amount of data in 1 pass, I guarantee the file will smoke database insert performance. A bulk insert feature of the database may match the file, but it requires a file data source to begin with. You would need to queue up a lot of messages in memory, then periodically flush to the file.
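A minimal sketch of that queue-in-memory, flush-periodically idea (the path and thresholds are just placeholders):

```python
import time

class MessageLog:
    """Buffer chat messages in memory and append them to a log file in batches."""

    def __init__(self, path="chat.log", max_buffered=1000, max_age_seconds=5.0):
        self.path = path
        self.max_buffered = max_buffered
        self.max_age_seconds = max_age_seconds
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, user, text):
        self.buffer.append(f"{time.time():.3f}\t{user}\t{text}\n")
        if (len(self.buffer) >= self.max_buffered
                or time.monotonic() - self.last_flush >= self.max_age_seconds):
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # One large sequential append instead of one tiny write per message.
        with open(self.path, "a", encoding="utf-8") as f:
            f.writelines(self.buffer)
        self.buffer.clear()
        self.last_flush = time.monotonic()
```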
For many small writes the gap will close and the database will pull ahead. Indexes will influence the insert speed. If thousands of users are inserting to a heavily indexed table you may have problems.
Do your own tests to prove what is faster. Simulate a realistic load, not a 1 user test.
Databases by far.
Databases are optimized for data storage which is constantly updated and changed as in your case. File storage is for long-term storage with few changes.
(even if files were faster I would still go with databases because it's easier to develop and maintain)
Since I presume your system would write/read data continuously (as people type their messages), writing to a file would take longer because of the file handling procedure (sketched in code after this list), i.e.:
open file for writing
lock file
write & save
unlock file
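In code, that per-message sequence looks roughly like this (a sketch using advisory locking on a POSIX system; the path is a placeholder):

```python
import fcntl

def append_message(message, path="chat.log"):
    """Open, lock, write, unlock -- overhead paid for every single message."""
    with open(path, "a", encoding="utf-8") as f:     # open file for writing
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)       # lock file
        f.write(message + "\n")
        f.flush()                                    # write & save
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)       # unlock file
```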
I would go with db.
I'm writing a web application that needs to store data sent from one client, wait for another client to request and read it (at short intervals, like 3 or 4 seconds), and then remove the data.
Currently I'm doing this by saving the data to flat files, but I'd like to know whether it would be more efficient to write it to a database.
I know that usually it's more efficient to use a database, but in this case I'll be handling a lot of requests with small amounts of data in them.
Thanks in advance, and sorry about my English :)
I agree with David's comment above. The question is how much I/O will you incur for each read/write. That can be affected by a lot of factors. I'm guessing the flat file option will be fastest, especially if your database is on a remote server and the data has to be sent over your internal network to read and write it.
Depending on how much data you have and how many requests you are handling, the fastest I/O would be to hold the data in memory. Of course, this is not very fault tolerant -- but that is also another consideration. The DB would provide you better integrity (over using flat files) in the event of a failure -- but if that is not a consideration, you may want to just keep it in memory.
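If fault tolerance really isn't a concern, the in-memory option can be as simple as a dictionary guarded by a lock; a rough sketch (all names here are made up):

```python
import threading

class HandoffStore:
    """Hold each payload in memory until the other client picks it up."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}   # key -> payload

    def put(self, key, payload):
        with self._lock:
            self._pending[key] = payload

    def take(self, key):
        """Return the payload and remove it, or None if nothing is waiting yet."""
        with self._lock:
            return self._pending.pop(key, None)
```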
Recently, I read an article entitled "SATA vs. SCSI reliability". It mostly discusses the very high rate of bit flipping in consumer SATA drives and concludes "A 56% chance that you can't read all the data from a particular disk now". Even RAID-5 can't save us, as it must be constantly scanned for problems, and if a disk does die you are pretty much guaranteed to have some flipped bits on your rebuilt file system.
Considerations:
I've heard great things about Sun's ZFS with Raid-Z but the Linux and BSD implementations are still experimental. I'm not sure it's ready for prime time yet.
I've also read quite a bit about the Par2 file format. It seems like storing some extra % parity along with each file would allow you to recover from most problems. However, I am not aware of a file system that does this internally and it seems like it could be hard to manage the separate files.
Backups (Edit):
I understand that backups are paramount. However, without some kind of check in place you could easily be sending bad data to people without even knowing it. Also figuring out which backup has a good copy of that data could be difficult.
For instance, you have a Raid-5 array running for a year and you find a corrupted file. Now you have to go back checking your backups until you find a good copy. Ideally you would go to the first backup that included the file but that may be difficult to figure out, especially if the file has been edited many times. Even worse, consider if that file was appended to or edited after the corruption occurred. That alone is reason enough for block-level parity such as Par2.
That article significantly exaggerates the problem by misunderstanding its source. It assumes that data loss events are independent, i.e. that if I take a thousand disks and get five hundred errors, that's likely to be one error each on five hundred of the disks. But actually, as anyone who has had disk trouble knows, it's probably five hundred errors on one disk (still a tiny fraction of the disk's total capacity), and the other nine hundred and ninety-nine were fine. Thus, in practice it's not that there's a 56% chance that you can't read all of your disk; rather, it's probably more like 1% or less, but most of the people in that 1% will find they've lost dozens or hundreds of sectors even though the disk as a whole hasn't failed.
Sure enough, practical experiments reflect this understanding, not the one offered in the article.
Basically this is an example of "Chinese whispers". The article linked here refers to another article, which in turn refers indirectly to a published paper. The paper says that of course these events are not independent but that vital fact disappears on the transition to easily digested blog format.
ZFS is a start. Many storage vendors also provide 520-byte-sector drives with extra data protection available. However, this only protects your data once it enters the storage fabric. If it was corrupted at the host level, then you are hosed anyway.
On the horizon are some promising standards-based solutions to this very problem. End-to-end data protection.
Consider T10 DIF (Data Integrity Field). This is an emerging standard (it was drafted 5 years ago) and a new technology, but it has the lofty goal of solving the problem of data corruption.
A 56% chance I can't read something? I doubt it. I run a mix of RAID 5 and other goodies, plus good backup practices, and with RAID 5 and a hot spare I have never had data loss, so I'm not sure what all the fuss is about. If you're storing parity information... well, you're effectively creating a RAID system in software; a disk failure in RAID 5 is recovered with a parity-style check that rebuilds the lost disk's data, so that capability is already there.
Run RAID, back up your data, and you'll be fine :)