What is the difference between Tokyo Cabinet and Kyoto Cabinet? - tokyo-cabinet

FAL Labs has multiple Tokyo products and Kyoto products:
Tokyo Cabinet and Kyoto Cabinet are both lightweight database libraries.
Tokyo Tyrant and Kyoto Tycoon are both lightweight database servers...
Can someone explain the difference between Tokyo and Kyoto products?

Tokyo Cabinet is more complete and stable; Kyoto Cabinet is still too new (today is Dec 8, 2010) and has some issues. Kyoto, written in C++, is much simpler than Tokyo (written in C), but that simplicity leaves some gaps. Kyoto's performance is slightly worse than Tokyo's, but it works better with threads (at least the documentation promises that).
From the official documentation:
<< In 2007, Tokyo Cabinet was developed as the successor to QDBM on the following purposes. They were achieved and Tokyo Cabinet could replace conventional DBM products.
(...)
In 2009, Kyoto Cabinet was developed as another successor to QDBM. Compared with the sibling product (Tokyo Cabinet), the following advantages were pursued. However, the performance of Tokyo Cabinet is higher than Kyoto Cabinet, at least in single thread operations. >>
I have used both, but I still prefer Tokyo, because I had a problem with Kyoto that no one was able to help me with: "In Kyoto Cabinet Database using File Hash Database, how can avoid file size increasing?" I still don't know how to solve it.
In my personal experience, Kyoto was easier to compile and install, and also easier to use. I had big problems with Tokyo's library dependencies and with linking the native library to the Java interface. With Kyoto everything worked on the first attempt. But, as I said before, I feel I have more control over the database with Tokyo.

Tokyo Cabinet and Tyrant are LGPL and written in C. Kyoto Cabinet and Tycoon are GPLv3 and written in C++.
Kyoto Tycoon supports record expiration in memory, so it can replace memcached.
The developer says Kyoto* isn't the successor of Tokyo*, but that's just a marketing strategy;
if you're not going to develop a commercial product, use Kyoto. It's newer and better.
I also suggest reading the developer's blog (in both Japanese and English) and reading the header files carefully if you're going to use the library.
Good luck.

The most important difference between the two in regard to my use cases is that TC has a "table database" while KC has not.
Yes, you can serialize arbitrary data to string and store it as item value, but then you either cannot search by value at all, or need to iterate over the whole dataset and deserialize each item, or reinvent the wheel and manually index the data.
Tokyo Cabinet's TDB provides excellent query capabilities for nested data (indexes, numeric and string comparison, even regular expressions within "fields"). The Kyoto thing is just a KV store; TC is also a powerful document-oriented database.
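To make the trade-off above concrete, here is a minimal Python sketch of what "reinventing the wheel" on a plain KV store looks like. It is not Tokyo Cabinet's or Kyoto Cabinet's API: the `IndexedKV` class and its methods are hypothetical, and an in-memory dict stands in for the store. It contrasts a full scan that deserializes every value with a hand-maintained secondary index.

```python
import json
from collections import defaultdict


class IndexedKV:
    """Toy KV store (hypothetical) holding JSON strings plus manual indexes."""

    def __init__(self, indexed_fields):
        self.data = {}  # key -> serialized JSON string, as a KV store sees it
        self.indexes = {field: defaultdict(set) for field in indexed_fields}

    def put(self, key, record):
        # Drop stale index entries if the key is being overwritten.
        if key in self.data:
            old = json.loads(self.data[key])
            for field, index in self.indexes.items():
                if field in old:
                    index[old[field]].discard(key)
        self.data[key] = json.dumps(record)
        for field, index in self.indexes.items():
            if field in record:
                index[record[field]].add(key)

    def scan(self, field, value):
        """Search by value without an index: deserialize every item."""
        return sorted(
            key for key, raw in self.data.items()
            if json.loads(raw).get(field) == value
        )

    def lookup(self, field, value):
        """Search by value through the manual index: no deserialization."""
        return sorted(self.indexes[field].get(value, set()))


db = IndexedKV(["city"])
db.put("u1", {"name": "Ann", "city": "Tokyo"})
db.put("u2", {"name": "Bob", "city": "Kyoto"})
db.put("u3", {"name": "Cy", "city": "Tokyo"})
print(db.scan("city", "Tokyo"))    # full scan over all values
print(db.lookup("city", "Tokyo"))  # same answer via the index
```

Both calls return the same keys; the difference is that `scan` touches every record, while `lookup` needs the index to be kept in sync on every write, which is the bookkeeping a table database does for you.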

Also, according to a test I did, Kyoto's protocol is HTTP-based only: more open,
but slower than Tokyo's binary protocol.

Related

CouchBase 1.8 and 2.0 Erlang SDKs? Why is Erlang Left out

I really enjoy seeing the great work being done by the Couchbase team on providing us with a great NoSQL solution. However, although there are few Erlang web developers compared to, say, Ruby, PHP, Java or Python, the number of developers picking up Erlang is increasing. Which brings me to why, on their SDK page, they have consistently left out Erlang. With the Yaws web server, Mochiweb, and many other Erlang web libraries, why in the world would they not support Erlang in their NoSQL realm? It's quite disturbing to discover that they use it in building their DBMS, yet they do not provide a client/SDK for the language. Now, the question: somewhere I read that it's because there is no money in Erlang web development. Is this the only reason they give? Who else knows why Couchbase has consistently declined to provide an Erlang SDK for their NoSQL database?
I received a phone call from the Couchbase company in April this year. They asked me which language I use for programming.
I think it is related to how Couchbase makes money. According to their website, they provide team members for customers' projects for a fee based on days or hours.
These members (Couchbase employees) have to use the same language as their customers, and most customers are using C#, Java, or Python. So they provide SDKs for those languages, not for Erlang.
In the medium term (1 or 2 years), I don't think Couchbase will provide an Erlang SDK.
I believe the primary issue is the amount of demand for an Erlang SDK. There are far more developers for Java, C#, Ruby, and Python than Erlang.
That being said, it should be possible to use Erlang with Couchbase for some features. Couchbase supports the memcached API, so basic key/value lookups should work. You can try this library and see if that works: erlmc. Couchbase 2.0 features such as views may not be accessible. Also, Couchbase is open source, so you could try writing your own client, if you really wanted to.
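For a sense of what "basic key/value lookups" over the memcached API involve, here is a minimal sketch of the memcached text protocol, written in Python rather than Erlang. The helper names are hypothetical, nothing is sent over the network, and whether a particular Couchbase deployment accepts the text protocol (real clients may use the binary protocol instead) is an assumption to verify.

```python
def build_set(key, value, flags=0, exptime=0):
    """Encode a memcached text-protocol `set` command as bytes."""
    data = value.encode("utf-8")
    # Header format: set <key> <flags> <exptime> <bytes>\r\n
    header = f"set {key} {flags} {exptime} {len(data)}\r\n".encode("ascii")
    return header + data + b"\r\n"


def build_get(key):
    """Encode a memcached text-protocol `get` command as bytes."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(raw):
    """Return the value from a single-key `get` reply, or None on a miss."""
    if raw == b"END\r\n":
        return None  # a miss is just the END terminator
    header, rest = raw.split(b"\r\n", 1)
    parts = header.split()
    # Expected header: VALUE <key> <flags> <bytes>
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError("unexpected reply: %r" % header)
    size = int(parts[3])
    return rest[:size].decode("utf-8")


print(build_set("user:1", "hello"))
print(build_get("user:1"))
print(parse_get_response(b"VALUE user:1 0 5\r\nhello\r\nEND\r\n"))
```

A client library does exactly this framing over a TCP socket, plus connection handling and hashing keys across servers.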
cberl is a NIF-based Erlang client which uses libcouchbase. I started working on it but haven't spent much time on it. It is not fully tested and has some broken parts, but all the basic functionality is there, so it is worth a shot. It is now listed as an experimental SDK on the Couchbase website, so I think it will get more traction and have fewer bugs before long.

Berkeley DB java edition, any LGPL or BSD alternatives in Java?

I am dealing with a huge dataset consisting of key-value pairs. The queries are always in the form of range queries on the key space (keys are numbers) hence any persistent B-Tree like structure will handle the situation. I would like to use BDB-Java Edition but the product is closed source and my company doesn't want to buy BDB-JE License. I am wondering, would you please share your experience with any non-GPL java based key-value storage system.
Thanks,
-A
There is also OrientDB, which is a document database written in Java that can be embedded in an application (no external server), like BDB Java Edition. It uses the Apache 2.0 license.
They also have a key/value based variant: OrientKV. I haven't really used Orient myself, just poked around, so I don't know if it supports your use case (range queries on the key space). However, it advertises itself as really fast.
That said, OrientDB does not seem to be very widely used. I even posted a question asking if anybody has any experience to share.
Tokyo Cabinet comes to mind as a very fast KV store which is under the LGPL, is embedded like BDB, and supports B-trees. It is C-based, but a Java client is available and I had no trouble installing it.
MongoDB and CouchDB are nice, but they run as separate servers. Again, Java support is available.
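The access pattern from the question (range queries over numeric keys) can be sketched in a few lines of Python. This is an in-memory stand-in, not a persistent B-tree and not the API of any of the libraries mentioned: the `SortedKV` class is hypothetical, and it only shows why an ordered key structure makes range scans cheap.

```python
import bisect


class SortedKV:
    """Toy ordered key-value store (hypothetical) backed by a sorted key list."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert while keeping order
        self._values[key] = value

    def range(self, low, high):
        """Yield (key, value) pairs with low <= key <= high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        for key in self._keys[start:end]:
            yield key, self._values[key]


kv = SortedKV()
for k in [50, 10, 30, 20, 40]:
    kv.put(k, f"v{k}")
print(list(kv.range(20, 40)))
```

A B-tree store does the same thing on disk: it locates the lower bound with a logarithmic search and then walks keys in order until it passes the upper bound, which is why a hash-only store is a poor fit here.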

What programming language is used to IMPLEMENT google algorithm?

It is well known that Google has the best searching and indexing algorithm.
They also have good relevancy.
They are also quicker at surfacing the latest results.
All that's fine.
What programming language (C, C++, Java, etc.) and database (Oracle, MySQL, etc.) have they used to achieve this, given that they have to process a large volume of data quickly and effectively?
I'm not looking for their in-depth architecture (in case that violates their company policies), but an overview of such things would be useful.
Could anybody please add their suggestions and insight on this?
Google internally uses C++, Java and Python. See Rhino on Rails:
One of the (hundreds of) cool things about working for Google is that they let teams experiment, as long as it's done within certain broad and well-defined boundaries. One of the fences in this big playground is your choice of programming language. You have to play inside the fence defined by C++, Java, Python, and JavaScript.
Google's search algorithm is essentially MapReduce, which stems from functional programming techniques, implemented in C++.
Google has its own storage mechanism for this called the Google File System.
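For readers unfamiliar with the MapReduce pattern mentioned in this answer, here is a minimal single-process sketch in Python using the classic word-count example. It only illustrates the map, shuffle, and reduce steps; the function names are hypothetical, and it says nothing about how Google's C++ implementation or its search ranking actually works.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce step: combine the values collected for one key."""
    return word, sum(counts)


def word_count(documents):
    pairs = [
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    ]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


print(word_count({"a": "the cat", "b": "The dog the end"}))
```

In a real deployment the map and reduce calls run on many machines and the shuffle moves data across the network; the functional shape of the computation is the same.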
Mainly pigeons:
PigeonRank's success relies primarily on the superior trainability of the domestic pigeon (Columba livia) and its unique capacity to recognize objects regardless of spatial orientation. The common gray pigeon can easily distinguish among items displaying only the minutest differences, an ability that enables it to select relevant web sites from among thousands of similar pages.
Relevance of search results is governed by quality of information retrieval algorithms they use, not the programming language.
But C++ is what most of their backend code is written in (for most services).
They don't use any off-the-shelf RDBMS products for data storage. All of that is written in-house.
Check it out, the Bigtable.

Social Networking backend architecture

Ideally, where would an application like Facebook store its "Friends" data?
In a database table? in an xml file?
From Facebooks engineering page:
"Already, we are the second most-trafficked PHP site in the world (Yahoo is #1), and one of the largest MySQL installations anywhere, running thousands of databases."
and
"We've built a lightweight but powerful multi-language RPC framework that allows us to seamlessly and easily tie together subsystems written in any language, running on any platform. Facebook is built in PHP, C++, Perl, Python, Erlang, Java, and even a little bit of ML—and it all works together.
* We are the largest user in the world of memcached, an open-source caching system. Originally developed by LiveJournal, we've since made so many scalability improvements and performance upgrades that we will be the primary contributor of features in the next major release.
* We've created a custom-built search engine serving millions of queries a day, completely distributed and entirely in-memory, with real-time updates."
Relational databases?
Check out this blog: http://highscalability.com/ It has many real-world examples of system architectures to learn from.
"Friends" data is well-described in a graph database. Neo4j is an example, though I know it's not the way Facebook stores this information.
Facebook uses a number of database technologies that may be involved:
a patched version of MySQL
Cassandra
Hadoop
... others
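As a rough illustration of why "Friends" data is described above as a natural fit for a graph, here is a minimal in-memory adjacency-set sketch in Python. It is not Neo4j's API and not how Facebook stores this data; the `FriendGraph` class and its method names are hypothetical.

```python
from collections import defaultdict


class FriendGraph:
    """Toy undirected friendship graph (hypothetical) using adjacency sets."""

    def __init__(self):
        self._adj = defaultdict(set)  # user -> set of direct friends

    def add_friendship(self, a, b):
        # Friendship is symmetric, so store the edge in both directions.
        self._adj[a].add(b)
        self._adj[b].add(a)

    def friends(self, user):
        return set(self._adj.get(user, ()))

    def mutual_friends(self, a, b):
        return self.friends(a) & self.friends(b)

    def friends_of_friends(self, user):
        """People two hops away who are not already direct friends."""
        direct = self.friends(user)
        result = set()
        for friend in direct:
            result |= self.friends(friend)
        return result - direct - {user}


g = FriendGraph()
g.add_friendship("alice", "bob")
g.add_friendship("bob", "carol")
g.add_friendship("alice", "dave")
g.add_friendship("dave", "carol")
print(g.mutual_friends("alice", "carol"))
print(g.friends_of_friends("alice"))
```

In a relational table the same data is a two-column edge list, and queries like `friends_of_friends` become self-joins; a graph database keeps the adjacency structure directly, which is the point the answer is making.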
Most probably it uses some other mechanism. For example, a search engine does not keep its index in a database or an XML file. To get maximum performance, search engines generally keep some kind of tree (a binary search tree or something more complicated) and store it on disk in a performance-efficient manner. So I would guess a similar mechanism is used here.
Certainly not in a XML file.
Yes, in a database, in one or several tables. And in the specific case of Facebook, on several servers.

What is the production ready NonSQL database?

With the rise of non-SQL database usage on high-traffic websites, I'm interested in using one for my project. I've heard several names, like Voldemort, MongoDB and CouchDB. But which of these non-SQL databases is production ready? I've looked at the download pages and it seems that none of them is production ready, because none has reached version 1.0 yet. Are there any others besides these three that can be recommended for production use?
What do you mean by production ready? As far as I know, all of them are being used on live systems.
You should make your choice based on how the features they provide fit your needs.
You can also add Tokyo Cabinet to the list as well as the mnesia database provided by the Erlang VM.
I think you need to start from your project requirements to see what kind of database you really need. There are many non-relational DBMSs out there, and they differ a lot in the kinds of problems they are good at solving. I think the article Should you go Beyond Relational Databases? by Martin Kleppmann is a good starting point for finding out what you need. There are also a lot of Stack Overflow threads on similar topics; these are my favorites:
The Next-gen Databases
Non-Relational Database Design
When shouldn’t you use a relational database?
Good reasons NOT to use a relational database?
When you have narrowed down what you actually need you can take a deeper look into the alternatives to see which DBMS are production ready for your use case. Production readiness isn't a yes/no thing: people may successfully deploy some solution that for example lacks in tool support - in another project this could be a no-go.
As for version numbers different projects have a different take on this, so you can't just compare the version numbers. I'm involved in the graph database project Neo4j and even if it has been in production use for 5+ years by now we still haven't released a version 1.0 final yet.
I'm tempted to answer "use SIRA_PRISE".
It's definitely non-SQL.
And its current version is 1.2, meaning that someone like you must definitely assume it's "production-ready".
But perhaps I shouldn't be answering at all.
Nice article comparing rdbms with 'next gen' and listing some providers:
Is the Relational Database Doomed?
http://readwrite.com/2009/02/12/is-the-relational-database-doomed
I would suggest using ArangoDB.
ArangoDB is a multi-model mostly-memory database with a flexible data model for documents and graphs. It is designed as a “general purpose database”, offering all the features you typically need for modern web applications.
ArangoDB is supposed to grow with the application—the project may start as a simple single-server prototype, nothing you couldn’t do with a relational database equally well. After some time, some geo-location features are needed and a shopping cart requires transactions. ArangoDB’s graph data model is useful for the recommendation system. The smartphone app needs a lean API to the back-end—this is where Foxx, ArangoDB’s integrated Javascript application framework, comes into play.
Another unique feature is ArangoDB’s query language AQL — it makes querying powerful and convenient. AQL enables you to describe complex filter conditions and joins in a readable format, much in the same way as SQL.
You can model your data in several ways:
in key/value pairs
as collections of documents
as graphs with nodes, edges, and properties for both
You can access data in ArangoDB:
using the general HTTP REST API via curl/wget, or your browser
via the ArangoDB shell (“arangosh”)
using a programming language specific client library
Server requirements for ArangoDB:
ArangoDB runs on Linux, OS X and Microsoft Windows.
It runs on 32bit and 64bit systems, though using a 32bit system will limit you to using only approximately 2 to 3 GB of data with ArangoDB.

Resources