In-memory nosql database that can do upto a million reads

In-memory nosql database that can do upto a million reads - arm

I have been looking up for databases that can do up to a million writes per second which should also be an in-memory , nosql database which will run on an arm based architecture.
I have read about aerospike but it does not support arm based architectures.
Can somebody suggest a database for the above requirements. Any help will be appreciated. Thank you.

Aerospike has a unique hybrid memory architecture where you can decide how you want to split the data between in-memory & storage.
Link: https://aerospike.com/products/features/hybrid-memory-architecture/
Let me know if you need any help related to Aerospike at any point in time.

Redis makes some claims to extraordinary performance, and at least some have had success building it for ARM. See https://groups.google.com/forum/#!topic/redis-db/4j_ohRJTD5g

Related

Which are the useful metrics to monitor in a Oracle database

I am looking to monitor an Oracle database.
Do you have an idea about the metrics useful that I have to monitor?
Thank you in advance for your help.

As an administration question, this is probably not the right site for this question. However, Oracle provides excellent documentation. Which metrics you monitor depends on your objective: Just keeping the database running (e.g., memory, CPU, storage limits being reached), making it run fast (e.g., optimizing pl/sql code, looking for the longest queries in v$sql, missing indexes/full table scans), etc.
Check out: https://docs.oracle.com/database/121/TGDBA/title.htm

I think:
CPU occupation
Memory occupation
Number of open sessions

OrientDB in distributed architecture works with vertex replication across servers?

I have worked in a project with OrientDB graph database. I've managed to fill the database and perform the queries in it without problems. But after I needed to run my queries using the distributed feature from OrientDB, and I came with an important (maybe trivial) doubt.
I've managed to use the distributed mode also without problems using 3 differente machines, but I wanted to be sure that OrientDB is really storing my database within the 3 machines that I've used. Is there any way to check that?
When I was researching for this answer, I came to the conclusion that OrientDB replicates the entire database across all the machines, is that correct? The goal to use the distributed architecture was to improve performance, but if OrientDB works with replication, and I run one query in a specific machine, the query will be processed using all machines, or only one?
To be short, I want to know if OrientDB when using the distributed mode, distributes the vertex and edges across the machines, and process the queries using all the machines?
I've read the entire documentation : http://orientdb.com/docs/2.0/orientdb.wiki/Distributed-Architecture.html and could not find a clear explanation for this questions.
Thanks in advance!

OrientDB, by default, replicates the entire DB on all the servers. What you're looking for is called "Sharding". OrientDB supports manual sharding (automatic in the future), that means you (the application) can decide where to store the vertices/edges.

suggestions for a replicated datastore

I am looking for a simple key-value datastore which will automatically replicate itself on different machines. Unfortunately, a Distributed Hash Table will not work for me since I need the whole datastore to be available on all machines. I have looked at mnesia from the erlang world but talking to it from different languages is a pain.
Any suggestions on what I should go for?
Thanks!

DNS. It's a distributed cached mapping table replicated across all masters (and partially to clients).
I think we can safely say it scales.

Which embedded database capable of 100 million records has an efficient C or C++ API

I'm looking for a cross-platform database engine that can handle databases up hundreds of millions of records without severe degradation in query performance. It needs to have a C or C++ API which will allow easy, fast construction of records and parsing returned data.
Highly discouraged are products where data has to be translated to and from strings just to get it into the database. The technical users storing things like IP addresses don't want or need this overhead. This is a very important criteria so if you're going to refer to products, please be explicit about how they offer such a direct API. Not wishing to be rude, but I can use Google - please assume I've found most mainstream products and I'm asking because it's often hard to work out just what direct API they offer, rather than just a C wrapper around SQL.
It does not need to be an RDBMS - a simple ISAM record-oriented approach would be sufficient.
Whilst the primary need is for a single-user database, expansion to some kind of shared file or server operations is likely for future use.
Access to source code, either open source or via licensing, is highly desirable if the database comes from a small company. It must not be GPL or LGPL.

you might consider C-Tree by FairCom - tell 'em I sent you ;-)

i'm the author of hamsterdb.
tokyo cabinet and berkeleydb should work fine. hamsterdb definitely will work. It's a plain C API, open source, platform independent, very fast and tested with databases up to several hundreds of GB and hundreds of million items.
If you are willing to evaluate and need support then drop me a mail (contact form on hamsterdb.com) - i will help as good as i can!
bye
Christoph

You didn't mention what platform you are on, but if Windows only is OK, take a look at the Extensible Storage Engine (previously known as Jet Blue), the embedded ISAM table engine included in Windows 2000 and later. It's used for Active Directory, Exchange, and other internal components, optimized for a small number of large tables.
It has a C interface and supports binary data types natively. It supports indexes, transactions and uses a log to ensure atomicity and durability. There is no query language; you have to work with the tables and indexes directly yourself.
ESE doesn't like to open files over a network, and doesn't support sharing a database through file sharing. You're going to be hard pressed to find any database engine that supports sharing through file sharing. The Access Jet database engine (AKA Jet Red, totally separate code base) is the only one I know of, and it's notorious for corrupting files over the network, especially if they're large (>100 MB).
Whatever engine you use, you'll most likely have to implement the shared usage functions yourself in your own network server process or use a discrete database engine.

For anyone finding this page a few years later, I'm now using LevelDB with some scaffolding on top to add the multiple indexing necessary. In particular, it's a nice fit for embedded databases on iOS. I ended up writing a book about it! (Getting Started with LevelDB, from Packt in late 2013).

One option could be Firebird. It offers both a server based product, as well as an embedded product.
It is also open source and there are a large number of providers for all types of languages.

I believe what you are looking for is BerkeleyDB:
http://www.oracle.com/technology/products/berkeley-db/db/index.html
Never mind that it's Oracle, the license is free, and it's open-source -- the only catch is that if you redistribute your software that uses BerkeleyDB, you must make your source available as well -- or buy a license.
It does not provide SQL support, but rather direct lookups (via b-tree or hash-table structure, whichever makes more sense for your needs). It's extremely reliable, fast, ACID, has built-in replication support, and so on.
Here is a small quote from the page I refer to above, that lists a few features:
Data Storage
Berkeley DB stores data quickly and
easily without the overhead found in
other databases. Berkeley DB is a C
library that runs in the same process
as your application, avoiding the
interprocess communication delays of
using a remote database server. Shared
caches keep the most active data in
memory, avoiding costly disk access.
Local, in-process data storage
Schema-neutral, application native data format
Indexed and sequential retrieval (Btree, Queue, Recno, Hash)
Multiple processes per application and multiple threads per process
Fine grained and configurable locking for highly concurrent systems
Multi-version concurrency control (MVCC)
Support for secondary indexes
In-memory, on disk or both
Online Btree compaction
Online Btree disk space reclamation
Online abandoned lock removal
On disk data encryption (AES)
Records up to 4GB and tables up to 256TB
Update: Just ran across this project and thought of the question you posted:
http://tokyocabinet.sourceforge.net/index.html . It is under LGPL, so not compatible with your restrictions, but an interesting project to check out, nonetheless.

SQLite would meet those criteria, except for the eventual shared file scenario in the future (and actually it could probably do that to if the network file system implements file locks correctly).

Many good solutions (such as SQLite) have been mentioned. Let me add two, since you don't require SQL:
HamsterDB fast, simple to use, can store arbitrary binary data. No provision for shared databases.
Glib HashTable module seems quite interesting too and is very
common so you won't risk going into a dead end. On the other end,
I'm not sure there is and easy way to store the database on the
disk, it's mostly for in-memory stuff
I've tested both on multi-million records projects.

As you are familiar with Fairtree, then you are probably also familiar with Raima RDM.
It went open source a few years ago, then dbstar claimed that they had somehow acquired the copyright. This seems debatable though. From reading the original Raima license, this does not seem possible. Of course it is possible to stay with the original code release. It is rather rare, but I have a copy archived away.

SQLite tends to be the first option. It doesn't store data as strings but I think you have to build a SQL command to do the insertion and that command will have some string building.
BerkeleyDB is a well engineered product if you don't need a relationDB. I have no idea what Oracle charges for it and if you would need a license for your application.
Personally I would consider why you have some of your requirements . Have you done testing to verify the requirement that you need to do direct insertion into the database? Seems like you could take a couple of hours to write up a wrapper that converts from whatever API you want to SQL and then see if SQLite, MySql,... meet your speed requirements.

There used to be a product called b-trieve but I'm not sure if source code was included. I think it has been discontinued. The only database engine I know of with an ISAM orientation is c-tree.

Choice of embedded database?

We are building an application on an embedded platform that needs a reasonably high performance database (very low select speeds on tables with > 500,000 entries).
The database needs to be able to :
Store atomic commit information in NVRAM so that such information is preserved if power fails before the commit finishes.
Be written to NAND Flash in such a way as to level wearing across the memory (could be done using, e.g. jffs2 or yaffs2).
Currently our options appear to be a "roll-your own" approach, or possibly SQLite.
Any other options, or pointers about the details of "rolling your own" or working with SQLite appreciated!
Edit: The target has 32MB of RAM, 1MB of NVRAM and 64MB of NAND Flash. The rest of the code is C, so that is the preferred language. The target processor is an ARM. In general, the queries that need to have the most performance are pretty simple. Complex queries don't need to have the same level of performance.

Apple's iPhone (and iPod Touch) uses the SQLite DB for a lot of its functions, so there's definitely a proven flash-based platform there. However, I doubt the amount of data in any of those tables has > 500k rows.

I think this Wikipedia RDBMS comparison might help you in making your choice.
But I don't understand why you have your specific NVRAM requirement.

Codebase provides a solid portable lightweight fast isam with transactions.

If your embedded system has access to the .NET framework, you can embed VistaDB.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

In-memory nosql database that can do upto a million reads - arm

Aerospike has a unique hybrid memory architecture where you can decide how you want to split the data between in-memory & storage. Link: https://aerospike.com/products/features/hybrid-memory-architecture/ Let me know if you need any help related to Aerospike at any point in time.

Redis makes some claims to extraordinary performance, and at least some have had success building it for ARM. See https://groups.google.com/forum/#!topic/redis-db/4j_ohRJTD5g

Related

Which are the useful metrics to monitor in a Oracle database

OrientDB in distributed architecture works with vertex replication across servers?

suggestions for a replicated datastore

Which embedded database capable of 100 million records has an efficient C or C++ API

Choice of embedded database?

Categories

Resources