How does the Freebase storage system work? Does it use some type of distributed key-value store, like Cassandra or MongoDB? I have been looking for documentation on the details.
Freebase uses a proprietary graph database called Graphd to store the types, attributes, topics, etc.
Related
What would be the ideal database schema if you are building an OAuth provider and storing session timings?
https://auth0.com/blog/how-we-store-data-in-the-cloud-at-auth0/ Auth0 uses MongoDB.
I assume it has everything to do with the 'more reads, fewer writes' motto that defines NoSQL, plus many MongoDB-specific features such as replication, which helps protect the integrity of the user data.
For logging specifically, Microsoft recommends Azure Blob Storage (ABS) for .NET, along with logging providers such as Serilog.
NoSQL databases like MongoDB don't follow a rigid schema, so logs could be stored there as unstructured data, but the more modern approach is object storage, which the major cloud providers offer. Unstructured data is simply data stored without any schema, relations, or clear structure. Object storage holds the same kind of data in a flat namespace, without folders or subdirectories, and attaches metadata to each object, which makes it a better fit for logs such as session timings.
While learning about distributed storage systems, I ran into a basic question: what are structured, unstructured, and semi-structured data, and what are their differences? I already know the simple differences between them. What I want to know is how they differ internally.
Structured data has an SQL-like structure: the number of fields (columns) is fixed and every entry in the collection (table) has the same structure. References to other collections/tables are 'hardwired' via foreign keys.
Unstructured data is like MongoDB, where a collection is a loose association of documents that are not required to share the same structure. Each document can have different elements, and references to other documents can be ad hoc.
Semi-structured systems are various hybrids of the two. For example, in Google's Firebase repository each document must have the same elements; however, relationships are ad hoc. Semi-structured data often includes semantics such as inheritance and isA vs. hasA relationships.
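To make the distinction above concrete, here is a minimal Python sketch. The field names, the sample records, and the `conforms` helper are all hypothetical and only illustrate the idea of "every entry has the same structure" versus "each document can have different elements"; no real database is involved.

```python
# Hypothetical fixed schema, as a SQL table would enforce it.
SCHEMA = ("id", "name", "email")

# Structured: every row has exactly the same fields.
rows = [
    {"id": 1, "name": "Ann", "email": "ann@example.com"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"},
]

# Schema-free documents: each one may carry different elements.
documents = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "tags": ["admin"], "address": {"city": "Oslo"}},
]

def conforms(records, fields):
    """Return True if every record has exactly the given set of fields."""
    expected = set(fields)
    return all(set(record) == expected for record in records)

rows_ok = conforms(rows, SCHEMA)            # all rows match the schema
documents_ok = conforms(documents, SCHEMA)  # documents do not
```

A relational engine performs this kind of check on every write; a document store skips it and leaves the interpretation of each document to the application.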
As described above, should I store object storage keys in a database for the purpose of searching, aggregating, etc.? What best practices do you recommend?
Any suggestions and answers are appreciated.
It is a known practice to use AWS DynamoDB as a metadata store for S3 objects for searching and listing objects based on objects metadata.
However, if you need full-text search on objects (e.g. text files in S3), you might need to consider an indexing service such as Amazon Elasticsearch Service. You can also use Amazon Athena if query latency is not critical.
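The metadata-store pattern can be sketched roughly as follows. This is an in-memory stand-in written in Python, not DynamoDB or S3 code: the `MetadataIndex` class, its methods, and the sample keys and attributes are all hypothetical. The point is only that the objects live in object storage while a separate, queryable table maps each object key to its metadata.

```python
from dataclasses import dataclass, field

@dataclass
class MetadataIndex:
    """Hypothetical in-memory stand-in for a metadata table."""
    records: dict = field(default_factory=dict)  # object key -> metadata dict

    def put(self, object_key, **metadata):
        # Record metadata when the object is uploaded to object storage.
        self.records[object_key] = dict(metadata)

    def find(self, **criteria):
        # Search: return the keys of objects whose metadata matches all criteria.
        return sorted(
            key for key, meta in self.records.items()
            if all(meta.get(name) == value for name, value in criteria.items())
        )

    def total(self, attribute):
        # Aggregate: sum a numeric attribute over all indexed objects.
        return sum(meta.get(attribute, 0) for meta in self.records.values())

index = MetadataIndex()
index.put("logs/2024/01/a.log", user="alice", size=120)
index.put("logs/2024/01/b.log", user="bob", size=300)
index.put("logs/2024/02/c.log", user="alice", size=80)

alice_keys = index.find(user="alice")
total_size = index.total("size")
```

In a real deployment the index would be written at upload time (or from storage event notifications), and the returned keys would then be used to fetch the objects themselves.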
My current architecture is based on Berkeley DB, but it doesn't perform well and it's not easy to cluster.
I'm looking for a NoSQL database to store key/value data that can be clustered and that gives the best performance for "get next key in order" and "get by key" operations.
It makes no difference to me whether the storage is in RAM or persistent.
Check this question for tips: Hierarchical, ordered, key-value store?
Also, there is a Google project that looks promising: https://code.google.com/p/leveldb/. I haven't used it; I just found it while looking for a similar NoSQL DB engine.
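For reference, the two operations the question asks about ("get by key" and "get next key in order") can be sketched in a few lines of Python. This is not LevelDB's API or any particular engine; the `OrderedKV` class is a hypothetical, single-process illustration of the access pattern that an ordered key-value store has to support.

```python
import bisect

class OrderedKV:
    """Hypothetical in-memory ordered key/value store."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep the key list sorted
        self._values[key] = value

    def get(self, key):
        # Point lookup; returns None if the key is absent.
        return self._values.get(key)

    def next_key(self, key):
        # Smallest stored key strictly greater than `key`, or None.
        i = bisect.bisect_right(self._keys, key)
        return self._keys[i] if i < len(self._keys) else None

store = OrderedKV()
for k in ["user:003", "user:001", "user:002"]:
    store.put(k, k.upper())
```

Engines that keep keys sorted on disk make `next_key` cheap in the same way; a store that only hashes keys would have to scan or maintain a separate index.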
I am looking for NoSQL key-value stores that also provide for storing and maintaining relationships between stored entities. I know Google App Engine's datastore allows for owned and unowned relationships between entities. Do any of the popular NoSQL stores provide something similar?
Even though most of them are schemaless, are there ways to layer relationships onto a key-value store?
Support for relationships between entities is one of the core features of graph databases. Typically, you model your entities as nodes and the relationships as relationships/edges in the graph. Unlike in an RDBMS, you don't have to define relationships in advance; you just add them to the graph as needed (schema-free). I created a domain modeling gallery giving a few examples of how this can look in practice. The examples use the Neo4j graphdb, a project I'm involved in. The mailing list of this project usually proves very helpful for graph modeling questions.
The document-oriented database Riak has support for links between documents.
You can add support for relationships on top of any database engine (like key/value), but it doesn't come without work. It all comes down to your use case. If you provide more details it's easier to come up with a useful answer.
Oops, I just noticed that the title says "nosql store" while your actual question narrows this down to "nosql key value store". Even though key/value stores have no semantics for defining relationships between entities, I'll still post my answer.
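As a rough illustration of that extra work, here is one way relationships could be layered on a plain key/value store. The Python dict stands in for the store, and the key layout (`entity:<id>` and `rel:<source>:<relation>`) and helper names are assumptions made up for this sketch, not a convention of any specific product.

```python
kv = {}  # stand-in for any key/value store (hypothetical)

def put_entity(entity_id, data):
    kv[f"entity:{entity_id}"] = data

def relate(source_id, relation, target_id):
    # Store the list of related ids under a derived key.
    key = f"rel:{source_id}:{relation}"
    targets = kv.get(key, [])
    if target_id not in targets:
        kv[key] = targets + [target_id]

def related(source_id, relation):
    # Resolve the stored ids back to entities, skipping deleted ones.
    ids = kv.get(f"rel:{source_id}:{relation}", [])
    return [kv[f"entity:{i}"] for i in ids if f"entity:{i}" in kv]

put_entity("alice", {"name": "Alice"})
put_entity("bob", {"name": "Bob"})
put_entity("carol", {"name": "Carol"})
relate("alice", "follows", "bob")
relate("alice", "follows", "carol")
relate("alice", "follows", "bob")  # duplicate, ignored

followed = [person["name"] for person in related("alice", "follows")]
```

Everything a graph database does for you (keeping both directions in sync, cleaning up after deletes, traversing several hops) has to be written by hand in this approach, which is the "work" mentioned above.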
MongoDB is a document database, not a key/value store. It does, however, provide a simple form of inter-document references. These work more or less like SQL foreign keys, except that the database does not enforce them: a reference to a deleted document is left dangling rather than being nulled automatically.
This is adequate for the same sorts of things for which you'd use foreign keys, but it isn't optimized for serious graph traversal.
Relationships in Google App Engine are just keys to entities that are automatically dereferenced when accessed in code; they are treated as plain values only when used in a filter. It's a function of the DB API rather than anything explicit, so accessing a ReferenceProperty simply performs a query against the referenced model to fetch the object.
If you look at something like MongoDB, the relationships are stored in-object (from what I remember), but they can also be stored however you want, in the sense that you would create an API that searches the joined table for the item in the relationship, similar to how App Engine works.
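The dereference-on-access behaviour can be imitated with a Python descriptor. This is not App Engine code: `DATASTORE`, `Reference`, `Author`, and `Post` are hypothetical names, and the dict lookup stands in for the query that the real API would issue. It only shows the idea that the object stores a key and the lookup happens when the attribute is read.

```python
DATASTORE = {}  # key -> entity; hypothetical stand-in for the datastore

class Reference:
    """Descriptor that stores only a key and fetches the entity on access."""

    def __set_name__(self, owner, name):
        self._slot = "_" + name + "_key"  # where the raw key is kept

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        key = getattr(instance, self._slot, None)
        return DATASTORE.get(key)  # lookup happens here, on access

    def __set__(self, instance, key):
        setattr(instance, self._slot, key)

class Author:
    def __init__(self, key, name):
        self.key, self.name = key, name
        DATASTORE[key] = self

class Post:
    author = Reference()

    def __init__(self, title, author_key):
        self.title = title
        self.author = author_key  # only the key is stored

alice = Author("author:1", "Alice")
posts = [Post("Hello", "author:1"), Post("Orphan", "author:9")]

# Filtering compares the stored key as a plain value; no lookup is needed.
by_alice = [p.title for p in posts if p._author_key == alice.key]
```

Reading `posts[0].author` returns the `Author` object, while `posts[1].author` returns `None` because nothing is stored under that key.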
Paul.