Any distributed file system that supports constant-time cloning?

Lustre and the Google File System (GFS) split a file into blocks and store them across many nodes, which gives them scalability and distributes traffic.
ZFS, btrfs, and WAFL support constant-time cloning, which gives them fast clones, writable snapshots, and storage savings.
I have been looking for a file system that supports both of these features.
There are many file systems that support constant-time cloning, but I can't find a distributed file system that does. The Lustre team appears to be working on a ZFS-backed Lustre (which would also support cloning), but it hasn't been released yet (it isn't even in the 2.0 beta, so it probably won't arrive any time soon).
Nexenta storage seemed to offer both features through its "namespace NFS", but it doesn't: it only distributes data at the file level. That means it cannot handle a file that exceeds the capacity of a single node's volume. If many cloned files grow large, it can't cope (at best it would have to physically copy the original file to another node rather than shadow it). Maybe I could attach SAN disks to the zvolume of a ZFS node, but I worry about traffic concentrating on that one ZFS node.
So I'm looking for a file system or a solution that can handle both of these requirements.
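To illustrate what I mean by constant-time cloning, here is a minimal ZFS sketch (the pool and dataset names are just examples):
# Snapshot and clone are metadata-only operations, so they complete almost instantly
# regardless of how large tank/images/base is; the clone is writable and shares blocks.
zfs snapshot tank/images/base@golden
zfs clone tank/images/base@golden tank/images/vm42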

One working solution is to combine the Lustre filesystem with the Robinhood Policy Engine in backup mode to continuously back up your filesystem. This mode makes it possible to back up a Lustre v2.x filesystem to external storage. It tracks modifications in the filesystem thanks to the Lustre 2+ changelogs feature (FS events) and copies modified files to the backend storage according to admin-defined migration policies. You can configure your own upcall commands in Robinhood, for example to provide a scalable way to clone your filesystem and schedule sync tasks on several nodes.
With Lustre on ZFS, it should be possible to use the ZFS snapshot feature, but the ZFS stack is not yet ready for production (it is currently being tested on the top-1 supercomputer Sequoia at LLNL).
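For illustration, the changelog stream that Robinhood consumes can also be inspected by hand. A rough sketch (the MDT device name and consumer id are examples; adjust them to your setup):
lctl --device lustre-MDT0000 changelog_register    # registers a consumer and prints its id, e.g. cl1
lfs changelog lustre-MDT0000                       # dump pending FS events (creates, renames, unlinks, ...)
lfs changelog_clear lustre-MDT0000 cl1 1000        # acknowledge records up to record number 1000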

Related

Approach to properly archiving and backing up data, preventing data loss and corruption

I'm looking for a proper way to archive and back up my data. This data consists of photos, videos, documents, and more.
There are two main things I'm afraid might cause data loss or corruption: hard drive failure and bit rot.
I'm looking for a strategy that can ensure my data's safety.
I came up with the following: one hard drive that I will regularly use to store and access data, a second hard drive serving as an onsite backup of the first, and a third hard drive serving as an offsite backup. I am, however, not sure whether this is sufficient.
I would prefer to use regular drives rather than network-attached storage, but if it's better suited I will adapt.
One of the things I read about that might help with bit rot is ZFS. ZFS does not prevent bit rot, but it can detect data corruption using checksums. This would allow me to recover a corrupted file from a different drive and copy it over the corrupted one.
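As I understand it, detection on ZFS is typically triggered by a scrub, something like this minimal sketch (with a hypothetical pool named tank):
zpool scrub tank        # read every block and verify it against its checksum
zpool status -v tank    # list any corrupted files; with a mirror or RAIDZ, ZFS repairs them automatically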
I need at least 2TB of storage, but I'm considering 4TB to cover potential future needs.
What would be the best way to safely store my data and prevent data loss and corruption?
For your local system plus local backup, I think a RAID configuration / ZFS makes sense because you're just trying to handle single-disk failures / bit rot, and having a synchronous copy of the data at all times means you won't lose the data written since your last backup was taken. With two disks ZFS can do a mirror and handles bit rot well, and if you have more disks you may consider using RAIDZ configurations since they use less storage overall to provide single-disk failure recovery. I would recommend ZFS here over general RAID solutions because it has a better user interface.
For your offsite backup, ZFS could make sense too. If you go that route, periodically use zfs send to copy a snapshot on the source system to the destination system. Debatably, you should use mirroring or RAIDZ on the backup system to protect against bit rot there too.
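As a sketch of what that could look like (device names, pool/dataset names, and the backup hostname are all hypothetical):
# On the primary machine: a two-disk mirror survives one disk failure and self-heals bit rot found by checksums.
zpool create tank mirror /dev/sda /dev/sdb
# Periodically replicate a snapshot to the offsite box (later sends can be incremental with zfs send -i).
zfs snapshot tank/data@2024-01-01
zfs send tank/data@2024-01-01 | ssh backup-host zfs receive backuppool/data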
That said — there are a lot of products that will do the offsite backup for you automatically, and if you have an offsite backup, the only advantage of having an on-site backup is faster recovery if you lose your primary. Since we’re just talking about personal documents and photos, taking a little while to re-download them doesn’t seem super high stakes. If you use Dropbox / Google Drive / etc. instead, this will all be automatic and have a nice UI and support people to yell at if anything goes wrong. Also, storage at those companies will have much higher failure tolerances because they use huge numbers of disks (allowing stuff like RAIDZ with tens of parity disks and replicated across multiple geographic locations) and they have security experts to make sure that all your data is not stolen by hackers.
The only downsides are cost, and not being as intimately involved in building the system, if that part is fun for you like it is for me :).

How to transfer rules and configuration to edge devices?

In our application we have a server whose database stores entities along with their relations and processing rules. Connected to that server are any number of clients, such as Raspberry Pis, gateways, and Android apps.
I want to push configuration and processing rules to those clients so that when they read data they can process it on their own. The goal is to make the edge devices self-sufficient and to avoid outages when the server or network is down.
How should I push/pull the configuration? I don't want to maintain databases on the clients and configure replication, because maintaining and patching databases across that many clients would be tough.
Is there a better alternative?
At the same time, I have to push logs upstream to the server.
Thanks in advance.
I have been there. You need an on-device data store. For this class of embedded Linux device, in order of growing development complexity:
Variables: Fast to change and retrieve, makes sense if the data fits in memory. Lost if the process ends.
Filesystem: Requires no special libraries, just read/write access somewhere. Workable if the data is small enough to fit in memory and does not change much during execution (read on startup when lacking network, write on update from server). If your data can be structured as a few object variables, you could write them to JSON files, and there is plenty of documentation on other file storage options for Android apps.
In-memory datastore like Redis: Lightweight dependency, can automate messaging and filesystem-stored backup. Provides a managed framework/hybrid of the previous two.
Lightweight databases, especially SQLite: Lightweight SQL database, stored in one file and popular with Android apps (probably already installed on many of the target devices). It could work for frequent changes on a larger block of data in a memory-constrained environment, but does not look like a great fit. It gets worse for anything heavier.
Redis replication is easy, but indiscriminate, so mainly sensible if your devices receive a changing but identical ruleset. Otherwise, in all these cases, the easiest transfer option may be to request and receive the whole configuration (GET a string, download a JSON file, etc.) and parse the received values.
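A minimal sketch of that pull-the-whole-configuration approach on an embedded Linux client (the URL and paths are hypothetical); the device keeps its last good copy so it can keep operating when the server or network is down:
#!/bin/sh
CONF_URL="https://server.example.com/api/rules"    # hypothetical server endpoint
CONF_FILE="/var/lib/edge-app/rules.json"           # cached copy on local storage
if curl -fsS -o "$CONF_FILE.tmp" "$CONF_URL"; then
    mv "$CONF_FILE.tmp" "$CONF_FILE"               # replace the cache only on a successful download
else
    echo "server unreachable, using cached rules in $CONF_FILE" >&2
fi
# The application then parses $CONF_FILE (JSON) at startup and whenever it changes.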

What is scratch space / a scratch filesystem in HPC?

I am studying HPC applications and parallel filesystems, and I came across the terms scratch space and scratch filesystem.
I cannot visualize where this scratch space exists. Is it on the compute node as a mounted filesystem (/scratch), or on the main storage?
What are its contents?
Is scratch space independent on each compute node, or can two or more nodes share a single scratch space?
So let's say I have a file 123.txt which I want to process in parallel. Will the scratch space contain parts of this file, or will the whole file be copied?
I am confused and couldn't find a clear description anywhere on Google. Please point me to one.
Thanks a Lot.
It all depends on how the cluster was set up and what the users need. When you are given access to a cluster you should also be given some information about how it is meant to be used, which should answer most of your questions.
On one of the clusters I work with, NFS is used for long-term storage and some Lustre space is available for job scratch space. Both the NFS and Lustre filesystems are visible to all of the nodes. Each node also has some local scratch space that only that node can see.
If you want your job to work on 123.txt in parallel, you can copy 123.txt to the shared scratch space (Lustre), or you can copy it to each node's local scratch space in your job file:
for i in `cat $PBS_NODEFILE | sort -u ` ; do scp 123.txt $i:/scratch ; done
Once each node has a copy you can run your job. Once the job is done you need to copy your results to persistent storage, since clusters will often run scripts to clean up scratch space.
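For the same reason, a job script usually ends by pulling results back before any purge; a rough counterpart of the copy above (the result filenames are hypothetical):
mkdir -p $HOME/job_output
for i in `cat $PBS_NODEFILE | sort -u`; do scp $i:/scratch/results.$i.out $HOME/job_output/ ; done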
There are a lot of different ways to think about or deploy scratch space or a scratch file system.
Let's say you have a cluster of Linux nodes, and these nodes all have a hard disk. You could imagine a /scratch space, local to each node. Since the OS image is going to be relatively small, and one cannot procure anything smaller than a terabyte drive nowadays, you end up with close to a terabyte of storage for the node to use.
What would you do with this node-local storage? Oh, lots of things. Scalable Checkpoint-Restart. Local out-of-core operations.
When I first started playing with clusters, it seemed like a good idea to gang all this unused space into a parallel file system. PVFS worked really well for that purpose.
Which lets me segue to a /scratch parallel file system available to all nodes. There is a technology component to this (which parallel file system will a site deploy?), but there is also a policy component: how long will data on this file system be retained? Is it backed up? /scratch often implies that files are not backed up and are in fact purged after some period of not being accessed (typically two weeks).
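The purge itself is usually nothing fancier than a cron job run by the admins, along the lines of (the path and retention period are just examples):
# Delete any file under /scratch that has not been accessed in the last 14 days.
find /scratch -type f -atime +14 -delete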

Does LevelDB support hot backups (or equivalent)?

We are currently evaluating several key-value data stores to replace an older ISAM that has been in use by our main application (for 20-something years!)...
The problem is that our current ISAM doesn't support crash recovery.
So LevelDB seemed OK to us (we are also checking BerkeleyDB, etc.).
But we ran into the question of hot backups, and given that LevelDB is a library and not a server, it feels odd to ask for a 'hot backup', as that would intuitively imply an external backup process.
Perhaps someone would like to propose options (or known solutions)?
For example:
- Hot backup through an internal thread of the main application?
- Hot backup by simply copying the LevelDB data directory?
Thanks in advance
You can do a snapshot iteration through a LevelDB, which is probably the best way to make a hot copy (don't forget to close the iterator).
To back up a LevelDB via the filesystem, I have previously used a script that creates hard links to all the .sst files (which are immutable once written) and normal copies of the log (and MANIFEST, CURRENT, etc.) files into a backup directory on the same partition. This is fast since the log files are small compared to the .sst files.
The DB must be closed (by the application) while the backup runs, but the time taken will obviously be much less than the time taken to copy the entire DB to a different partition or upload it to S3, etc. That slower copy can then be made from the backup directory once the application has re-opened the DB.
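A rough sketch of that kind of script (the paths are hypothetical; the backup directory must be on the same partition for hard links to work):
#!/bin/sh
DB_DIR=/var/lib/myapp/leveldb
BACKUP_DIR=/var/lib/myapp/leveldb-backup
mkdir -p "$BACKUP_DIR"
# Hard-link the immutable table files: near-instant, no data is copied.
ln "$DB_DIR"/*.sst "$BACKUP_DIR"/ 2>/dev/null
ln "$DB_DIR"/*.ldb "$BACKUP_DIR"/ 2>/dev/null    # newer LevelDB versions name table files .ldb
# Plain copies of the small, mutable bookkeeping files.
cp "$DB_DIR"/*.log "$DB_DIR"/MANIFEST-* "$DB_DIR"/CURRENT "$BACKUP_DIR"/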
LMDB is an embedded key-value store, but unlike LevelDB it supports multi-process concurrency, so you can use an external backup process. The mdb_copy utility will make an atomic hot backup of a database; your app doesn't need to stop or do anything special while the backup runs. http://symas.com/mdb/
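For example (hypothetical paths; the destination directory should already exist), a live backup is a single command:
mdb_copy -c /var/lib/myapp/lmdb /backups/myapp-lmdb    # -c compacts the copy as it is written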
I am coming a bit late to this question, but there are forks of LevelDB that offer good live backup capability, such as HyperLevelDB and RocksDB. Both of these are available as npm modules, i.e. level-hyper and level-rocksdb. For more discussion, see How to backup RocksDB? and HyperDex Question.

File Replication Solutions

I'm thinking about a Windows-hosted build process that will periodically drop files to disk, to be replicated to several other Windows servers in the same datacenter. The other machines would run IIS and serve those files to the masses.
The total corpus would be millions of files, hundreds of GB of data. The solution would have to deal with possible contention on the target servers, high-latency links (e.g. over a WAN), and cold-starting clean servers.
Solutions I've thought about so far:
A queued system with daemons that either wake periodically and copy, or run as services.
SAN - expensive, complex, more expensive
ROBOCOPY on a scheduled job - simple but effective, though with lots of internal/indeterminate state, e.g. where it's at in the copy, errors.
Off-the-shelf replication software - less expensive than a SAN, but still expensive.
UNC shared folders and no replication - higher latency, lower cost, but still needs a clustering solution too.
DFS Replication.
What else have other folks used?
I've used rsync scripts with good success for this type of work, across thousands of machines in our case. I believe there is an rsync server for Windows, but I have not used it on anything other than Linux.
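For what it's worth, the core of those scripts is little more than this (the paths, hostname, and options are illustrative):
# Mirror the build drop to a web server; -a preserves attributes, -z compresses, --delete removes stale files.
rsync -az --delete /builds/latest/ web01:/var/www/content/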
Though we do not have millions of files or hundreds of gigabytes to manage, we send and collect lots of files overnight between our main company and its agencies abroad. We have been using allwaysync for a while. It handles folder/FTP synchronization, has a nice interface for analyzing and comparing folders and files, and it can of course be scheduled.
UNC shared folders and no replication has many downsides, especially if IIS is going to use UNC paths as home directories for sites. Under stress, you will run into http://support.microsoft.com/default.aspx/kb/810886 because of the number of simultaneous sessions against the server sharing the folder. Also, you will experience slow IIS site startups since IIS is going to want to scan/index/cache (depending on IIS version and ASP settings) the UNC folder.
I've seen tests with DFS that are very promising, exhibiting none of the above restrictions.
We use ROBOCOPY in my organization to pass files around. It runs very seamlessly and I feel it is worth a recommendation.
Additionally, you are not doing anything too crazy. If you are also proficient in Perl, I am sure you could write a quick script that will fulfill your needs.
