I've noticed that rufus-tokyo and other APIs support transactions in Tokyo Tyrant. I couldn't find any mention of transaction support in the TT docs (http://1978th.net/tokyotyrant/spex.html#clientprog).
Is that transaction support simulated? Or is there a way to do a server-side transaction using the C API?
It's Tokyo Cabinet that supports transactions.
Which of the well-known key/value stores also supports transactions? We just need to interleave operations in transactions and roll back from time to time.
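To make the requirement concrete, here is a minimal in-memory sketch of the begin/commit/rollback behaviour being asked for. The `TxKV` class and its method names are hypothetical and do not correspond to the API of any store mentioned in this thread; it keeps an undo log of the previous value for each key touched inside a transaction.

```python
class TxKV:
    """Toy in-memory key/value store with begin/commit/rollback (hypothetical)."""

    _MISSING = object()  # marks "key did not exist before the transaction"

    def __init__(self):
        self._data = {}
        self._undo = None  # undo log; None means no transaction is open

    def begin(self):
        if self._undo is not None:
            raise RuntimeError("transaction already open")
        self._undo = {}

    def put(self, key, value):
        self._remember(key)
        self._data[key] = value

    def delete(self, key):
        self._remember(key)
        self._data.pop(key, None)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def commit(self):
        self._require_tx()
        self._undo = None  # keep the changes, drop the undo log

    def rollback(self):
        self._require_tx()
        # Restore the first-seen old value of every key touched in the transaction.
        for key, old in self._undo.items():
            if old is self._MISSING:
                self._data.pop(key, None)
            else:
                self._data[key] = old
        self._undo = None

    def _remember(self, key):
        # Only the value seen before the first change in the transaction matters.
        if self._undo is not None and key not in self._undo:
            self._undo[key] = self._data.get(key, self._MISSING)

    def _require_tx(self):
        if self._undo is None:
            raise RuntimeError("no open transaction")
```

A real store would also have to deal with durability and concurrent access, which this sketch ignores.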
In case anyone stumbles on this in the future.
FoundationDB is fully ACID.
It looks like FoundationDB is gone, so this is no longer relevant; the link has been updated to point to Wikipedia.
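Personally, I'd recommend using Redis. I've used it for a number of different applications and it's solid, fast and has a great community around it.
It also supports transactions (MULTI/EXEC): here's information on Redis transactional support. Note, though, that Redis does not roll back commands that have already run if a later command in the transaction fails.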
I'm not very knowledgeable about KV databases, but Berkeley DB springs to mind.
FAL Labs has multiple Tokyo products and Kyoto products:
Tokyo Cabinet and Kyoto Cabinet are both lightweight database libraries.
Tokyo Tyrant and Kyoto Tycoon are both lightweight database servers...
Can someone explain the difference between Tokyo and Kyoto products?
Tokyo Cabinet is more complete and stable; Kyoto is still too new (today is Dec 8 2010) and has some issues. Kyoto, written in C++, is (much) simpler than Tokyo (written in C), but that simplicity leaves some gaps. Kyoto's performance is slightly worse than Tokyo's, but it works better with threads (at least the documentation promises that).
From the official documentation:
<< In 2007, Tokyo Cabinet was developed as the successor to QDBM on the following purposes. They were achieved and Tokyo Cabinet could replace conventional DBM products.
(...)
In 2009, Kyoto Cabinet was developed as another successor to QDBM. Compared with the sibling product (Tokyo Cabinet), the following advantages were pursued. However, the performance of Tokyo Cabinet is higher than Kyoto Cabinet, at least in single thread operations. >>
I used both, but I still prefer Tokyo, because I had a problem with Kyoto ("In Kyoto Cabinet Database using File Hash Database, how can avoid file size increasing?") and no one was able to help me. I still don't know how to solve it.
In my personal experience, I found Kyoto easier to compile and install, and also easier to use. I had big problems with Tokyo's library dependencies and with linking the native library to the Java interface. With Kyoto everything worked fine on the first attempt. But, as I said before, I feel I have more control over the database with Tokyo.
Tokyo Cabinet and Tyrant are LGPL and written in C. Kyoto Cabinet and Tycoon are GPLv3 and written in C++.
Kyoto Tycoon supports expiring records in memory, so it can replace memcached.
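As an illustration of what record expiration means for a memcached-style cache, here is a minimal sketch. It is not how either server implements it; `ExpiringStore`, its `ttl` parameter and the injectable `clock` are hypothetical names chosen for this example.

```python
import time


class ExpiringStore:
    """Toy in-memory cache whose records can carry a time-to-live (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be tested without sleeping
        self._items = {}      # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        # ttl is in seconds; None means the record never expires.
        expires_at = None if ttl is None else self._clock() + ttl
        self._items[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]  # lazy expiry: drop the record when it is read
            return default
        return value
```

Expired records here are only removed when they are read; a real cache would normally also evict them in the background to bound memory use.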
The developer says Kyoto* isn't the successor to Tokyo*, but that's just a marketing strategy;
if you're not going to develop a commercial product, use Kyoto. It's newer and better.
And I suggest you read the developer's blog (both Japanese and English) and read the header files carefully (if you're going to use the library).
Good luck.
The most important difference between the two for my use cases is that TC has a "table database" while KC does not.
Yes, you can serialize arbitrary data to a string and store it as the item value, but then you either cannot search by value at all, or need to iterate over the whole dataset and deserialize each item, or reinvent the wheel and index the data manually.
Tokyo Cabinet's TDB provides excellent query capabilities for nested data (indexes, numeric and string comparison, even regular expressions within "fields"). The Kyoto thing is just a KV store; TC is also a powerful document-oriented database.
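The trade-off with a plain KV store can be sketched as follows. This is not Tokyo Cabinet's or Kyoto Cabinet's API: a `dict` stands in for the store, values are JSON strings, and `scan_by_field` and `IndexedStore` are hypothetical names. The sketch assumes the indexed field holds a hashable scalar (string or number).

```python
import json
from collections import defaultdict


def scan_by_field(store, field, wanted):
    """Full scan: deserialize every value and compare one field."""
    return sorted(
        key for key, raw in store.items()
        if json.loads(raw).get(field) == wanted
    )


class IndexedStore:
    """KV store plus a hand-maintained secondary index on one field."""

    def __init__(self, field):
        self.field = field
        self.store = {}                 # key -> serialized record
        self.index = defaultdict(set)   # field value -> set of keys

    def put(self, key, record):
        old = self.store.get(key)
        if old is not None:
            # Remove the stale index entry before overwriting the record.
            self.index[json.loads(old).get(self.field)].discard(key)
        self.store[key] = json.dumps(record)
        self.index[record.get(self.field)].add(key)

    def find(self, wanted):
        # Index lookup: no deserialization, no scan.
        return sorted(self.index.get(wanted, ()))
```

The scan touches every record on each query, while the index has to be kept in sync by hand on every write; a table database does that bookkeeping for you.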
Also, according to a test I ran, Kyoto's protocol is HTTP-based only: more open,
but slower than Tokyo's binary protocol.
Many things depend on BDB. When I go to install the prepackaged software for my server, each piece of software seems to want a different version of BerkeleyDB. But it seems when I compile them I can specify a specific BDB version. (The software involved includes Postfix, OpenLDAP, and Cyrus IMAP.)
I use BDB in Python projects occasionally and I have no clue what impact the different versions have on the database file created.
I would like to know the difference between all the different Berkeley DB versions. It seems difficult to find information about the different versions and any API or file format differences, incompatibilities between versions, et cetera.
I know at minimum the following versions exist:
1.85 (a historical version?)
2.x
3.x
4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8
5.0
Generally, for each release you can find a Change Log in the documentation that gets downloaded with the release. You can also find a history of past releases, as well as upgrade instructions in the Build, Installation and Upgrading Guide here. You can also find the list of historic change logs here.
As you have discovered, different packages link in different versions of the Berkeley DB library. Usually, the BDB library name includes the release number, so that multiple versions can co-exist on a system at the same time. Since Berkeley DB is used by so many different packages, it is not uncommon to have multiple versions of Berkeley DB on your system.
A quick summary of the major releases/features:
1.85: Last UCB release (1994)
2.0: Adds transactions, recovery (1997)
3.0: Adds Queue AM, POSIX threads, subdatabases (1999)
3.3: Adds Bulk get, Secondary Indices, Degree 1 isolation (Dirty Reads)
4.0: Adds Replication (2001)
4.1: Adds Encryption & Checksums
4.2: Adds Java Collections API
4.3: Adds Sequence numbers, Degree 2 isolation
4.4: Adds Database compaction, in-memory databases, Peer-to-Peer HA
4.5: Adds MVCC, Replication Mgr API
4.6: Adds Cache priority per operation
4.7: Adds Java DPL API, Architecture neutral HA
4.8: Adds C# API, C++ STL API, SMP scalability improvements, Table partitioning, Bulk Insert & Delete, Foreign Keys
5.0: Adds SQL API, JDBC/ODBC, Full Text and R-tree search (2010)
The interim releases add support for additional platforms and other features and enhancements.
I hope that this helps.
Regards,
Dave
I am dealing with a huge dataset consisting of key-value pairs. The queries are always range queries on the key space (keys are numbers), so any persistent B-tree-like structure will handle the situation. I would like to use BDB Java Edition, but the product is closed source and my company doesn't want to buy a BDB-JE license. Could you please share your experience with any non-GPL, Java-based key-value storage system?
Thanks,
-A
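For reference, the access pattern in question (range queries over numeric keys) can be sketched in a few lines. This is an in-memory illustration only, not a persistent B-tree and not any of the products discussed here; `SortedKV` and its methods are hypothetical names.

```python
import bisect


class SortedKV:
    """Toy ordered key/value map supporting inclusive range queries (hypothetical)."""

    def __init__(self):
        self._keys = []     # keys kept in sorted order
        self._values = {}   # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert while keeping order
        self._values[key] = value

    def range(self, low, high):
        """Return (key, value) pairs with low <= key <= high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        stop = bisect.bisect_right(self._keys, high)
        return [(k, self._values[k]) for k in self._keys[start:stop]]
```

Any store that keeps keys in sorted order and exposes a cursor or range scan supports the same operation; hash-based stores do not.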
There is also OrientDB, a document database written in Java that can be embedded in an application (no external server), like BDB Java Edition. It uses the Apache 2.0 license.
It also has a key/value-based variant: OrientKV. I haven't really used Orient myself, just poked around, so I don't know whether it supports your use case (range queries on the key space). However, it advertises itself as really fast.
That said, OrientDB does not seem to be very widely used. I even asked a question to find out whether anybody has experience to share.
Tokyo Cabinet comes to mind as a very fast KV store that is under the LGPL, is embedded like BDB, and supports B-trees. It is C-based, but a Java client is available and I had no trouble installing it.
MongoDB and CouchDB are nice, but they run as separate servers. Again, Java support is available.
Ideally, where would an application like Facebook store its "Friends" data?
In a database table? In an XML file?
From Facebook's engineering page:
"Already, we are the second most-trafficked PHP site in the world (Yahoo is #1), and one of the largest MySQL installations anywhere, running thousands of databases."
and
"We've built a lightweight but powerful multi-language RPC framework that allows us to seamlessly and easily tie together subsystems written in any language, running on any platform. Facebook is built in PHP, C++, Perl, Python, Erlang, Java, and even a little bit of ML—and it all works together.
* We are the largest user in the world of memcached, an open-source caching system. Originally developed by LiveJournal, we've since made so many scalability improvements and performance upgrades that we will be the primary contributor of features in the next major release.
* We've created a custom-built search engine serving millions of queries a day, completely distributed and entirely in-memory, with real-time updates."
Relational databases?
Check out this blog: http://highscalability.com/. It has many real-world examples of system architectures to learn from.
"Friends" data is well-described in a graph database. Neo4j is an example, though I know it's not the way Facebook stores this information.
Facebook uses a number of database technologies that may be involved:
a patched version of MySQL
Cassandra
Hadoop
... others
Most probably it uses some other mechanism. For example, a search engine does not keep its index in a database or an XML file. For maximum performance, such systems generally keep some kind of tree (a binary search tree or something more complicated) and store it on disk in a performance-efficient manner. So I would guess a similar mechanism is used here.
Certainly not in an XML file.
Yes, in a database, in one or several tables. And in the specific example of Facebook, on several servers.
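As a rough sketch of the "one table" option, the following uses SQLite from the Python standard library. The `friendship` table, its column names and the helper functions are assumptions made for this example, not Facebook's actual schema.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a single friendship table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE friendship (
               user_id   INTEGER NOT NULL,
               friend_id INTEGER NOT NULL,
               PRIMARY KEY (user_id, friend_id)
           )"""
    )
    return conn


def add_friendship(conn, a, b):
    # Store both directions so a lookup by user_id can use the primary key.
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO friendship (user_id, friend_id) VALUES (?, ?)",
            [(a, b), (b, a)],
        )


def friends_of(conn, user_id):
    rows = conn.execute(
        "SELECT friend_id FROM friendship WHERE user_id = ? ORDER BY friend_id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

Spreading such a table over several servers would typically mean partitioning it by `user_id`, which this single-process sketch does not attempt.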