What is a Couchbase pool - "default"?

In a Couchbase URL, e.g. server:port/pools/default, what exactly is a Couchbase pool? Will it always be "default", or can we change it? There is some explanation at
http://www.couchbase.com/docs/couchbase-manual-1.8/couchbase-admin-restapi-key-concepts-resources.html
but I cannot really follow it 100%. Can anyone explain?

A long time ago the Couchbase engineers intended to build out a concept of pools similar to ZFS pools, but for a distributed database. The feature isn't dead, it just never got much attention compared to other database features that needed to be added. As a result, pools/default ended up being a placeholder for something the engineers wanted to build in the future. In the old days the idea was that a pool would be a subset of buckets assigned to a subset of nodes in the cluster, which would help with management of large clusters (100+ nodes).
So right now I would say don't worry about the whole pools concept, because in the current (2.x) releases it is a placeholder without any special meaning. In the future there will likely be a feature built around the pools concept, and it will be well documented. Please also note that no decisions have been made about what Couchbase will do with pools, how exactly they will work, or when they will be implemented. This post is only meant to give the history of why the pools concept exists.
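To see the placeholder in practice, here's a minimal sketch using Python's requests library against a hypothetical cluster on localhost:8091 with made-up admin credentials; GET /pools should list a single pool named "default", and GET /pools/default returns the cluster-wide details:

    import requests

    # Hypothetical address and credentials; adjust for your cluster.
    BASE = "http://localhost:8091"
    AUTH = ("Administrator", "password")

    # The top-level /pools endpoint lists every pool the cluster knows about.
    pools = requests.get(BASE + "/pools", auth=AUTH).json()
    print([p["name"] for p in pools["pools"]])  # expected: ['default']

    # /pools/default holds the cluster-wide details (nodes, bucket URIs, ...).
    default = requests.get(BASE + "/pools/default", auth=AUTH).json()
    print(default["name"], "-", len(default["nodes"]), "node(s)")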

Related

In need of an embeddable NoSQL database that handles ~1 GB datasets, persisted on disk

I am building an Electron app, for which I need to select an embeddable NoSQL database. In fact, this database is supposed to hold a local subset of data stored on an ArangoDB remote backend. I have been searching the Internet a lot, but have so far failed to converge on a final candidate. I hope that somebody can advise me from experience.
Typical datasets amount to perhaps tens of thousands of documents, and I can imagine cases where the set would grow to ~1 GB over time. Furthermore, I need secondary indexes.
I have looked at PouchDB, UnQlite, LokiJS, LevelDB, NeDB, LinvoDB...
In the end, NeDB and LinvoDB seem like reasonable candidates with persistence to disk (SQLite-like), except that NeDB cannot handle large datasets, something which LinvoDB, a fork of NeDB, seems able to do. LinvoDB does not load the whole database into memory, but appears to index "everything" by default and keep the indexes in memory.
On the other hand, I have tried to follow several conversations regarding their indexes. NeDB's documentation appears to suggest that indexes are persisted to disk once built (https://github.com/louischatriot/nedb#indexing), which then appears to be contradicted on the LinvoDB side (sorry, I lost many of the quotes/sources in the vast number of tabs open...), suggesting indexes are built from scratch on launch. (It may also be that I misunderstand NeDB's documentation altogether.)
Basically, what I need is a JS database solution for an Electron app which may hold "considerable" but not "huge" amounts of data. The app's loading times should be reasonable (i.e., not discourage usage), while the app stays responsive (i.e., the database should support secondary indexes) and respects the user's resources as much as possible.
Questions:
Does anybody have experience with the above or other embedded NoSQL databases, and can recommend any of them for my use case?
If indeed LinvoDB's indexes need to be rebuilt from scratch every time I launch the app, could that be a significant performance hit (loading time on the order of seconds)? (Surely I'd have to benchmark this...)
ArangoDB is not embeddable, but perhaps I should consider just deploying it as a service alongside my native app? The link "NoSQL database: ArangoDB" appears to suggest that the developers themselves do not discourage this. Would this be overkill and/or user-unfriendly? A performance hit?
Any advice would really be appreciated.
I have the same need; linvodb3 seems to be the best choice currently. It is under active development and specifically targets the Electron desktop environment.
Have you considered SQLite?
There is an npm package and it works with Electron; I have tried it myself.
You just have to rebuild the native module against Electron, which can cause some problems.
Here are your answers:
yes I have, but not much
no, I've never tried LinvoDB
no, I've never tried ArangoDB either

Redis for a CakePHP app

I want to start a big CakePHP project in which performance will be an issue. I will have a users table using the Tree behavior (acts as tree) and a lot of financial data related to the users. The application will generate many dynamic reports aggregating data for different tree nodes, etc.
Since there is an easy-to-use library on GitHub which sets a model's data source to Redis, I was wondering whether it's a good idea to use it for the entire app. Does anyone have experience with it, and what could the potential problems be if I decide to depend on Redis as the main/only data storage?
EDIT: I have installed Redis and tried to use RedisModel for two models with a simple HasMany/BelongsTo relation. When I tried to simply use those models like standard AppModels, it simply won't work (Redis Error: Missing key). Apparently you can't use Model->find, Model->save, etc. in the standard way; you have to use Redis methods instead (setKeyValue etc.). This means that pagination and other CakePHP features will also not work. So maybe it is not the best idea to use RedisModel for all my models...
I cannot speak for CakePHP specifically, but I'll talk about Redis in general and the points of your question in particular; it should be applicable to your framework of choice in the end. Let's see:
You mention you want to start an application where performance will be an issue. I just wanted to mention that you should be careful with the assumption that you will need a NoSQL solution, because this is hard to assess beforehand. Redis is hella fast, but MySQL, for instance, has been proven capable of handling millions of records and operations just fine, provided it's properly configured and used, and it's much simpler if you need lots of relational structures.
Concerning Redis as the main and only data store:
Redis is perfectly stable for the job. Instagram reportedly stored 300 million key-value pairs pseudo-sharded using hashes to great effect, and while it's not the only data storage system they use, it goes to show Redis is pretty reliable. This very site (Stack Overflow) also uses Redis extensively for caching purposes.
Redis is also reported to have excellent continuous uptime on average (which shouldn't be surprising considering the point above).
Options exist to mitigate downtime issues: replication is supported to some extent, and Redis Cluster is coming soon to support proper distributed approaches.
The main problem you could face is not properly understanding how its persistence works. You should absolutely read this and this article before you get started, because this point is important. In a nutshell, Redis does not write changes to disk immediately, which means that depending on your configuration, a crash can cause a data loss ranging from a few seconds to several minutes since the last disk write. This might or might not be a problem depending on your use case; if the data is extremely sensitive (i.e., financial records) you might want to think twice before jumping to Redis, or build a system where Redis is not used exclusively but rather combined with another storage system.
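To make the trade-off concrete, here is a small sketch (assuming a local Redis and the redis-py client; the chosen values are illustrative, not a recommendation) that inspects and tightens the durability settings:

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # RDB snapshotting: "save" holds rules like "900 1" (snapshot after
    # 900 seconds if at least 1 key changed); anything written after the
    # last snapshot is lost on a crash.
    print(r.config_get("save"))

    # The append-only file narrows the loss window to roughly a second.
    r.config_set("appendonly", "yes")
    r.config_set("appendfsync", "everysec")  # "always" for maximum durability

    # Force a synchronous snapshot right now, e.g. before a risky deploy.
    r.save()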
Relational structures in a non-relational data store like Redis mean doing more work and often duplicating/denormalizing data. It can be done, but it's something to consider; in your question you mention you'll need to aggregate data to generate dynamic reports. Are you sure you want to use Redis for this? It sounds like a relational database would give you far more flexibility at a very small performance cost. If you know in advance you'll need to run complex queries over your data, it could be a good idea not to reinvent the wheel unless you absolutely need to.
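As an illustration of that extra work, here is a hedged sketch (hypothetical key names, redis-py) of a denormalized users-with-financial-records layout; note how the aggregation becomes a client-side loop instead of a single SQL GROUP BY:

    import redis

    r = redis.Redis(decode_responses=True)

    # Hypothetical denormalized schema: a hash per user, a list of
    # amounts per user, and a set indexing the user ids.
    r.hset("user:1", mapping={"name": "Alice", "parent": "0"})
    r.rpush("user:1:amounts", 100, 250)
    r.sadd("users", "1")

    # "Report" = walk every user and sum their amounts by hand.
    total = 0
    for uid in r.smembers("users"):
        total += sum(int(a) for a in r.lrange("user:%s:amounts" % uid, 0, -1))
    print(total)  # 350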
My advice here would be to first get a better feel for what Redis is and how it works, potentially building your own models instead of relying on others' to better understand what can and cannot be done, and from there assess where you want to take it. Redis is reliable enough to be used standalone, but at the end of the day the smart thing is to use the right tool for the right job, and you might find that some parts of your app work well with Redis while others are better left to a more traditional storage system.

Which DB to go for with tiny data requirements

I need some help choosing databases for my application.
My web application will basically consist of a main table; let's call it the "User" table.
It will have user info like name, ID, password, address, phone, etc.
There will be 5 other related tables where I will save each user's info,
e.g. a table for books read, a table for songs heard, food eaten, etc.
Overall I don't expect my data to go beyond 1,000 users.
So, I have got tiny data requirements.
Generally I would have gone with MySQL, but I am feeling a bit adventurous.
I want to try out some of the new solutions on the block.
My requirements are:
1. pure performance
2. good documentation, ease of use
Since my DB shouldn't be more than a few hundred megs in size, I'd rather keep the entire tablespace in memory for faster performance. How about some of the new NoSQL DBs?
Any recommendations? I have worked mainly on Oracle and MySQL and don't have much idea of all the new exciting stuff out there.
I would suggest going with SQLite if your database requirements are small.
From the SQLite website:
SQLite is a compact library. With all features enabled, the library size can be less than 350KiB, depending on the target platform and compiler optimization settings. (64-bit code is larger. And some compiler optimizations such as aggressive function inlining and loop unrolling can cause the object code to be much larger.) If optional features are omitted, the size of the SQLite library can be reduced below 200KiB. SQLite can also be made to run in minimal stack space (4KiB) and very little heap (100KiB), making SQLite a popular database engine choice on memory constrained gadgets such as cellphones, PDAs, and MP3 players. There is a tradeoff between memory usage and speed. SQLite generally runs faster the more memory you give it. Nevertheless, performance is usually quite good even in low-memory environments.
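Since the asker would rather keep the whole tablespace in memory, it's worth noting SQLite can do exactly that. A minimal sketch using Python's built-in sqlite3 module (table and column names are made up for illustration):

    import sqlite3

    # ":memory:" keeps the entire database in RAM; plenty for ~1,000 users.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
    conn.execute("CREATE TABLE books_read (user_id INTEGER, title TEXT)")

    conn.execute("INSERT INTO user (name, phone) VALUES (?, ?)", ("alice", "555-0100"))
    conn.execute("INSERT INTO books_read VALUES (1, 'Dune')")

    query = ("SELECT u.name, b.title FROM user u "
             "JOIN books_read b ON b.user_id = u.id")
    for row in conn.execute(query):
        print(row)  # ('alice', 'Dune')

Bear in mind that a purely in-memory database vanishes when the process exits; for real use you would connect to a file (and let the OS page cache keep it hot) or periodically copy the in-memory DB out with Connection.backup().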
Object-oriented DBs such as db4o or Versant could also be used.
Neo4j (for Java) is a pretty awesome tool. It's technically a graph database, but by the sounds of your data model, I think it would be well-suited for you. From what I've seen it performs very well, its documentation is just incredibly good, and if you are using Java then it's like second nature. You basically point it at a directory and it sets up shop there.
If you are feeling adventurous and happen to be using Java, I suggest you give it a try.
I think redis is exactly what you want!
Yesterday I downloaded and installed it for the first time. It runs completely in memory, which meets your performance requirement. (It only writes the data to disk as a safeguard against cases like power failure, like a backup, and this does not slow down writes.)
For Linux and the like there is a tar.gz on the download page.
For Windows you can download Dusan's native port: http://redis.io/download - it is precompiled and also includes the client console to try out.
The documentation is very good. For example, this is the page for the data types: http://redis.io/topics/data-types, and you will also find all the other relevant information there as a fast-to-browse reference.
And there is a nice online tutorial to get started quickly: http://try.redis-db.com/ which is actually fun to work through.
I like the atomic operations like "increment by" and the list structures with push and pop.
There is also a hash type.
For python there is redis-py: https://github.com/andymccurdy/redis-py
Being a Python coder myself, I think the data structures that Redis offers match the Python data types very well.
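A quick sketch of those operations with redis-py, assuming a Redis server running on localhost:

    import redis

    r = redis.Redis(decode_responses=True)

    # Atomic counters: no read-modify-write race between clients.
    r.set("pageviews", 0)
    r.incrby("pageviews", 5)
    print(r.get("pageviews"))  # '5'

    # Lists with push and pop, usable as a simple queue.
    r.rpush("queue", "job1", "job2")
    print(r.lpop("queue"))  # 'job1'

    # Hashes map nicely onto Python dicts.
    r.hset("user:1", mapping={"name": "guido", "lang": "python"})
    print(r.hgetall("user:1"))  # {'name': 'guido', 'lang': 'python'}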

Deploy Redis on many servers

My web application uses Redis as its main database, and its performance is very nice. By now my database has grown too big and I want to add some new servers for storage, but I'm still stuck on how to distribute the data in a way that is stable and easy to back up.
Does anyone have any ideas?
Thanks a lot!
From what I understand there is no automatic way of doing this built into Redis itself (and it's hard to implement a generic way, since it depends on what your application will do with the data), so you have to do this yourself (or in the driver, like the Ruby driver does).
I think your best bet is to put that logic in your application. Without any knowledge of your application it is hard to say precisely, but you may decide that the first part of your IDs determines which Redis server a key is stored on.
The Ruby driver simply tries to distribute the keys among the servers, or takes the server index from the key name if it is formatted accordingly (something like "{server_id}mykey", after a quick glance at the code).
[Edit]
Possible solution:
- https://github.com/gmr/mredis
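A minimal sketch of the put-the-logic-in-your-application approach, in Python with redis-py (the server addresses and hashing scheme are illustrative, not a drop-in solution):

    import hashlib
    import redis

    # Hypothetical pool of Redis servers.
    SHARDS = [
        redis.Redis(host="redis-a.example.com"),
        redis.Redis(host="redis-b.example.com"),
        redis.Redis(host="redis-c.example.com"),
    ]

    def shard_for(key):
        # Pick a server from a stable hash of the key, so every read and
        # write for a given key consistently hits the same server.
        digest = hashlib.md5(key.encode()).digest()
        return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]

    shard_for("user:42").set("user:42", "...")
    value = shard_for("user:42").get("user:42")

Note that plain modulo hashing remaps most keys when you add a server; consistent hashing (which the Ruby client uses) softens that, and backups then become a per-server concern, which makes them easier to stagger.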
You don't mention which language you use. If it's Ruby, then the driver has a client-side sharding solution, which solves many problems. antirez is working on a cluster solution for Redis, but it is still unfinished.
Neither client-side sharding nor Redis Cluster can solve every problem, though. If you, for example, need to do unions and intersections of sets, you can't do that unless both sets happen to reside in the same shard (I believe Redis Cluster will have some means to handle this, but not automatically).
Yet another solution is Redis diskstore, but just like the clustering it is not yet finished. Diskstore would mean that you can grow your dataset larger than RAM, and use replication to scale reads.
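To illustrate the union/intersection limitation: if two sets live on different shards, SUNION cannot run server-side, so the client has to fetch both and combine them itself. A hedged continuation of the earlier sharding sketch (hypothetical keys, reusing its shard_for() helper):

    # SUNION/SINTER only work when all keys are on one server; with the
    # sets sharded apart, the combination must happen client-side.
    a = shard_for("tags:post:1").smembers("tags:post:1")
    b = shard_for("tags:post:2").smembers("tags:post:2")
    union = a | b  # plain Python set union, not a Redis operation
    intersection = a & b  # likewise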

Distributed database management system - alternatives?

I am working on an application that needs data distributed across countries. Content will be supplied "per region", but needs to be easily copyable to another region. On top of this I have general information that needs to be shared and synchronized across the databases.
The organisation I work for is considering implementing this system itself, but it feels like there should be some good solutions out there already (I am open to cloud solutions - the less my company needs to manage, the better).
This might be a vague question, but I think it is possible to answer it well.
What are my options when developing this kind of distributed data system?
Update:
I should have elaborated (but I'm not sure how much I can say given the NDA). Suffice it to say, I have "Content" which I need stored in some space (files). I need metadata about the content stored and distributed over several nodes (which might be hosted by us or by someone else) to allow fast-paced communication and regionalized differences in the data. I need to control HOW data is replicated between nodes, but preferably in a standards-compliant way (and preferably not written by us).
You can try CouchDB. Its offline replication model sounds like a good fit for a geo-distributed system.
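For a sense of how simple that can be, here is a sketch (hypothetical hosts and database names, Python requests) that asks a local CouchDB node to continuously replicate a database to a remote region through the standard _replicate endpoint; recent CouchDB versions will additionally require admin credentials:

    import requests

    # Continuously push a local database to a node in another region.
    resp = requests.post(
        "http://localhost:5984/_replicate",
        json={
            "source": "content_eu",
            "target": "http://couch.us.example.com:5984/content_eu",
            "continuous": True,
        },
    )
    print(resp.json())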
Interesting question - but it would really help to get more context.
You talk about "data", which usually means something with a fairly well-defined structure, often implemented in a relational database.
You also talk about "content", which usually means something with a (much) less well-defined structure, often implemented as a document of some type. Many solutions exist for structuring "documents", e.g. file systems or web sites.
Assuming we are talking about structured data, the simplest thing to do is have a single repository, accessible everywhere. Have a look at "cloud" offerings - Amazon's a good bet. Creating your own global data repository is a significant undertaking, but if you're dealing with highly confidential data, or have specific performance requirements, it may be the way to go.
If neither of those options work, you're in the world of "enterprise service bus". Google it, but be careful - it's a complex field, and you really want to find someone who knows what they're doing.
Having said that, using an off-the-shelf ESB is many times less painful than building your own distributed data structure.
I know it's years after the question was asked, but I was looking up the answer to the same question, and it looks like Cassandra may fit the bill. Once set up, it looks and acts much like other database solutions (tables, materialized views, an SQL-like query language called CQL, lightweight transactions, etc.), but it can also be entirely decentralized. Each instance acts as a node in a cluster of other Cassandra nodes. They synchronize behind the scenes, and if one goes down, the others pick up the slack. This makes Cassandra both highly scalable and highly fault tolerant.
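For the cross-country angle specifically, Cassandra lets you declare per-datacenter replication when creating a keyspace. A sketch using the Python cassandra-driver (the contact point, datacenter names, and replica counts are placeholders):

    from cassandra.cluster import Cluster

    # Connect to any one node; the driver discovers the rest of the cluster.
    cluster = Cluster(["10.0.0.1"])
    session = cluster.connect()

    # NetworkTopologyStrategy keeps N replicas in each named datacenter,
    # so every region holds a full, locally readable copy of the data.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS content
        WITH replication = {
            'class': 'NetworkTopologyStrategy',
            'eu_west': 3,
            'us_east': 3
        }
    """)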
