I am running a piece of code that is based on LevelDB. It works fine on my workstation, but when I deploy it to a cluster (with a Lustre file system), the program breaks with an "Invalid argument" error. The error is thrown by LevelDB.
From what I have found on the web after several hours of reading, LevelDB cannot be run in a cluster or multi-process environment. I am not trying to do anything in parallel with the LevelDB database, but it seems that LevelDB just does not like that file system.
Does anybody have suggestions for making LevelDB run in a cluster with a shared file system? Is that even possible? Are there any considerations I should take into account?
Cheers!
It's not LevelDB, but to get this working for SQLite I had to mount my Lustre file system with the localflock option (-o localflock). This is the solution for some other databases as well.
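One way to check whether file locking is the culprit is to probe the target directory directly. The sketch below is a hedged illustration, not part of LevelDB: the function name `probe_fcntl_lock` is made up for this example, and it only assumes that LevelDB takes an fcntl-style lock on its LOCK file when the database is opened, which is what its POSIX environment does. If the probe fails on the Lustre mount but succeeds on a local disk, the mount options are a likely cause.

```python
import errno
import fcntl
import os
import tempfile


def probe_fcntl_lock(directory):
    """Try to take and release an exclusive fcntl lock on a scratch file.

    Returns (True, None) if locking works in `directory`,
    otherwise (False, "<ERRNO NAME>: <message>").
    """
    # Scratch file created inside the directory under test (hypothetical name).
    fd, path = tempfile.mkstemp(prefix="lockprobe-", dir=directory)
    try:
        try:
            # Non-blocking exclusive lock. lockf() wraps the fcntl() locking
            # calls, the same family LevelDB uses for its LOCK file on POSIX.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(fd, fcntl.LOCK_UN)
            return True, None
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            return False, f"{name}: {exc.strerror}"
    finally:
        os.close(fd)
        os.unlink(path)
```

Call it with the directory where the database would live, for example `probe_fcntl_lock("/path/on/lustre")`, and compare the result with a directory on a local disk.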
Related
I have a cluster back in my office for testing purposes. I have a database there, and I would like to do all kinds of "monkey business" to those test machines long before I go to production.
I sipped 2-3 coffees this morning trying to figure out HOW to make the "inFAMOUS" Simian Army chew on my nerves here on my local machines.
Everywhere I read, I saw all kinds of setups for AWS.
Question: Is it possible to deploy the Monkeys on my local cluster? Or is there an alternative to Simian Army?
What is the question that you want to answer with your tests?
ChaosMonkey is a resilience tool that was designed for the cloud. Its main purpose is to verify that AWS Auto Scaling Groups (ASGs) are able to re-provision faulty or offline nodes, and that the application keeps performing in a stable way while this happens. Do you have an automated process like this in your local cluster?
Sorry if my title didn't make any sense. My problem is this: I have one application hosted on a server, the application uses a database hosted on the same server, and that same server also runs Sidekiq to process a lot of queues.
One problem is that a lot of memory is used and everything runs very slowly. Even though I have an 8-core processor, I can't take advantage of it when processing queues, because the application was developed on MRI and uses Unicorn.
I was thinking of moving the queue-processing part to a different server, installing Puma and JRuby there, and processing the queues on it (this should be a lot faster, since it can take advantage of multiple cores).
All the data processed by Sidekiq comes from a database and is stored in a database (currently the same database is used for both reading and writing). Most of the Sidekiq workers receive some information and use it to look up other information, so they need to connect to the same DB as the app.
What would be a good solution for serving the same database to 2 different applications?
And is it a good idea to have another server with Puma and JRuby installed for Sidekiq only (and maybe other things in the future)?
Thank you
Even with MRI and Unicorn you can take advantage of multiple cores: just run multiple Unicorn worker processes, or use the clustered mode provided by Puma. The same goes for Sidekiq. There is no need to switch to JRuby right away.
Accessing the database from multiple applications is no problem. But do yourself a favor and use a dedicated database server. That makes adding more application servers way easier.
We are currently evaluating several key-value data stores to replace an older ISAM that has been used by our main application (for 20-something years!) ...
The problem is that our current ISAM doesn't support crash recovery.
So LevelDB seemed OK to us (we are also checking Berkeley DB, etc.).
But we ran into the question of hot backups. Given that LevelDB is a library and not a server, it is odd to ask for a 'hot backup', as that would intuitively imply an external backup process.
Perhaps someone would like to propose options (or known solutions)?
For example:
- Hot backup through an inner thread of the main application?
- Hot backup by merely copying the LevelDB data directory?
Thanks in advance
You can do a snapshot iteration through a LevelDB, which is probably the best way to make a hot copy (don't forget to close the iterator).
To back up a LevelDB via the file system, I have previously used a script that creates hard links to all the .sst files (which are immutable once written) and normal copies of the log files (and MANIFEST, CURRENT, etc.) in a backup directory on the same partition. This is fast, since the log files are small compared to the .sst files.
The DB must be closed (by the application) while the backup runs, but the time taken will obviously be much less than the time needed to copy the entire DB to a different partition, upload it to S3, etc. That slower copy or upload can then be made from the backup directory once the DB has been re-opened by the application.
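The hard-link approach described above could look roughly like the following. This is a minimal sketch, not the original script: the function name `backup_leveldb_dir` and its parameters are invented for illustration, and it assumes the database is closed while it runs and that the backup directory is on the same partition (hard links cannot cross file systems). Including `.ldb` next to `.sst` is also an assumption, made because newer LevelDB releases use that extension for table files.

```python
import os
import shutil


def backup_leveldb_dir(db_dir, backup_dir, immutable_suffixes=(".sst", ".ldb")):
    """Hard-link immutable table files and copy every other file.

    Assumes the DB is closed and that backup_dir is on the same partition
    as db_dir. Returns (linked_names, copied_names).
    """
    # Refuse to overwrite an existing backup directory.
    os.makedirs(backup_dir, exist_ok=False)
    linked, copied = [], []
    for name in sorted(os.listdir(db_dir)):
        src = os.path.join(db_dir, name)
        dst = os.path.join(backup_dir, name)
        if not os.path.isfile(src):
            continue  # skip subdirectories such as lost/
        if name.endswith(immutable_suffixes):
            # Table files never change once written, so a hard link is safe
            # and takes no extra space.
            os.link(src, dst)
            linked.append(name)
        else:
            # Log, MANIFEST, CURRENT, etc. can change, so take real copies.
            shutil.copy2(src, dst)
            copied.append(name)
    return linked, copied
```

Once the function returns, the application can re-open the DB, and the backup directory can be copied to another partition or uploaded at leisure.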
LMDB is an embedded key-value store, but unlike LevelDB it supports multi-process concurrency, so you can use an external backup process. The mdb_copy utility will make an atomic hot backup of a database; your app doesn't need to stop or do anything special while the backup runs. http://symas.com/mdb/
I am coming a bit late to this question, but there are forks of LevelDB that offer good live backup capability, such as HyperLevelDB and RocksDB. Both of these are available as npm modules, i.e. level-hyper and level-rocksdb. For more discussion, see How to backup RocksDB? and HyperDex Question.
Lustre and Google File System (GFS) split a file into blocks and store them on various nodes. That is how they achieve scalability and distribute traffic.
ZFS, btrfs, and WAFL support constant-time cloning. This gives them fast cloning, writable snapshots, and savings in storage space.
I have been looking for a file system that supports both of these features.
There are a lot of file systems that support constant-time cloning, but I can't find any distributed file system that does. The Lustre team appears to be developing a Lustre that supports ZFS (and therefore cloning), but it has not been released yet (moreover, it is not included in the 2.0 beta, so it may not be released any time soon).
Nexenta storage seemed to support these features through "namespace nfs", but it does not. It only distributes data at the file level. This means that if a file exceeds the volume size of one node, it cannot be handled. If a lot of cloned files grow into big files, it can't handle that either (at least, it has to really copy the original file to another node, not just shadow the nodes). Maybe I could attach SAN disks to a zvolume of a ZFS node, but I'm very worried about traffic concentrating on that ZFS node.
So I'm looking for a file system or a solution that can handle both of the issues above.
One working solution is to combine the Lustre file system with the Robinhood Policy Engine in backup mode to continuously back up your files. This mode makes it possible to back up a Lustre v2.x file system to external storage. It tracks modifications in the file system through the Lustre 2+ changelog feature (FS events) and copies modified files to the backend storage according to admin-defined migration policies. You can configure your own upcall commands in Robinhood, for example to provide a scalable way to clone your file system and schedule sync tasks on several nodes.
With Lustre on ZFS, it should be possible to use the ZFS snapshot feature, but the ZFS stack is not yet ready for production (it is currently being tested on Sequoia at LLNL, the top-ranked supercomputer).
I am thinking about a Windows-hosted build process that will periodically drop files to disk, to be replicated to several other Windows servers in the same datacenter. The other machines would run IIS and serve those files to the masses.
The total corpus size would be millions of files and hundreds of GB of data. It would have to deal with possible contention on the target servers, high-latency links (e.g. over a WAN), and cold-starting clean servers.
Solutions I've thought about so far:
A queued system, with daemons that either wake periodically and copy or run as services.
SAN - expensive, complex, more expensive
ROBOCOPY on a timed job - simple but effective. Lots of internal/indeterminate state, e.g. where it is in the copy, errors
Off-the-shelf replication software - less expensive than a SAN but still expensive
UNC shared folders and no replication. Higher latency, lower cost - still needs a clustering solution too.
DFS Replication.
What else have other folks used?
I've used rsync scripts with good success for this type of work, 1000's of machines in our case. I believe there is an rsync server for windows, but I have not used it on anything other than Linux.
Though we do not have data on that scale to manage, we send and collect lots of files overnight between our main company and its agencies abroad. We have been using allwaysync for a while. It allows folder/FTP synchronization. It has a nice interface that allows analysis and comparison of folders and files, and it can of course be scheduled.
UNC shared folders and no replication has many downsides, especially if IIS is going to use UNC paths as home directories for sites. Under stress, you will run into http://support.microsoft.com/default.aspx/kb/810886 because of the number of simultaneous sessions against the server sharing the folder. Also, you will experience slow IIS site startups since IIS is going to want to scan/index/cache (depending on IIS version and ASP settings) the UNC folder.
I've seen tests with DFS that are very promising, exhibiting none of the above restrictions.
We use ROBOCOPY in my organization to pass files around. It runs seamlessly, and I feel it is worth a recommendation.
Additionally, you are not doing anything too crazy. If you are also proficient in Perl, I am sure you could write a quick script that will fulfill your needs.
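For the ROBOCOPY-on-a-timed-job option, the "indeterminate state" concern can be reduced by having a small wrapper interpret the exit code and write a log. The sketch below uses Python rather than Perl and is only an illustration under stated assumptions: the function names, retry count, and wait time are hypothetical, and the caller supplies the paths. The switches used (`/MIR`, `/R`, `/W`, `/NP`, `/LOG+`) and the bit-field exit codes are standard Robocopy behaviour.

```python
import subprocess

# Robocopy's exit code is a bit field; the 8 and 16 bits signal failures.
ROBOCOPY_FLAGS = {
    1: "files copied",
    2: "extra files or directories detected at destination",
    4: "mismatched files or directories detected",
    8: "some files or directories could not be copied",
    16: "fatal error, nothing was copied",
}


def describe_exit_code(code):
    """Return (succeeded, [messages]) for a Robocopy exit code."""
    if code == 0:
        return True, ["no files needed copying"]
    messages = [text for bit, text in ROBOCOPY_FLAGS.items() if code & bit]
    # Any value of 8 or above means at least one failure occurred.
    return code < 8, messages


def mirror(source, target, log_path, retries=3, wait_seconds=10):
    """Mirror source to target with Robocopy and interpret the result.

    Note that /MIR also deletes files at the target that no longer
    exist at the source.
    """
    command = [
        "robocopy", source, target,
        "/MIR",                # mirror the directory tree
        f"/R:{retries}",       # retries per failed file
        f"/W:{wait_seconds}",  # seconds to wait between retries
        "/NP",                 # no per-file progress in the log
        f"/LOG+:{log_path}",   # append output to a log file
    ]
    completed = subprocess.run(command)
    return describe_exit_code(completed.returncode)
```

A scheduled task could call `mirror(...)` once per target server and alert only when the first element of the result is `False`.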