I'm not sure if this is the right place to ask this question (please move if it isn't).
I was wondering how NoSQL databases like MongoDB search for items. As I understand it, NoSQL merely means a database that is not SQL (no enforced structure). I'll use MongoDB as the example since it's the only one I've had experience with. MongoDB has collections (instead of tables) that store items in JSON format.
SQL has columns we can search and sort by. If items are stored in JSON format, though, wouldn't the database need a step like parsing or json_decode to extract each item and compare it, thereby slowing down the request?
Appreciate any info in advance.
Every item's location in a NoSQL table is stored in a hash map against the hash of the primary key, so it retrieves the data very fast.
You are mistaken if you think that the external representation as JSON implies the internal data format. (Just as the display as a "table" does not imply anything about the internal structures of a SQL database.) MongoDB, for instance, stores documents internally as BSON, a binary format with typed fields, and builds indexes over those fields.
Fortunately, MongoDB is open source, so you can just look into https://github.com/mongodb/mongo
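As a concrete illustration of the observable behaviour (not the actual internals), here is a minimal sketch with the pymongo driver, with invented collection and field names: once an index exists on a field, lookups walk that index instead of re-parsing every document.

from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
items = client.shop.items

# MongoDB maintains a B-tree index over the BSON field "sku";
# queries on it do not deserialize every document in the collection.
items.create_index([("sku", ASCENDING)])
items.insert_one({"sku": "A-100", "price": 9.99})

# explain() shows an IXSCAN (index scan) stage rather than a full COLLSCAN.
plan = items.find({"sku": "A-100"}).explain()
print(plan["queryPlanner"]["winningPlan"])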
I'm building a mobile application that records information about items and then outputs an automatically generated report.
Each Item may be of various types, each type requires different information to be recorded. The user needs to be able to specify what is to be stored for each type.
Is there a "best" way to store this type of information in a relational database?
My current plan is to have a Type table that maps Types to Attributes that need to be recorded for that Type. Does this sound sensible? I imagine that it may get messy when I come to produce reports from this data.
I guess I need a way of generalising the information that needs to be recorded?
I think I just need some pointers in the right direction.
Thanks!
Only a suggestion, might not be an answer: use JSON and go for a NoSQL database. Today it is often more convenient to operate on and play around with data that is not locked into a strictly relational format.
That way you can define a model (or several), or create your own data structure as mentioned, and store it easily as a collection of documents of that model. NoSQL also allows structure changes without obliging you to define an entire "column" for all "rows" present there ;)
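A small sketch of what that looks like in practice, assuming MongoDB with the pymongo driver (collection and field names invented): each item type carries only the attributes that apply to it.

from pymongo import MongoClient

items = MongoClient()["inventory"]["items"]

# Two types, two different attribute sets, one collection; no schema
# migration is needed when a new type (or a new attribute) appears.
items.insert_many([
    {"type": "book", "title": "Dune", "pages": 412},
    {"type": "camera", "model": "X100", "megapixels": 26.1},
])

# Each type can still be queried by its own attributes.
for doc in items.find({"type": "camera", "megapixels": {"$gte": 20}}):
    print(doc)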
Check this out for an explanation of MongoDB and NoSQL. This is also a beautiful post that I love about data modeling in NoSQL.
We are using Postgres to store and work with app data, the app data contains mainly:
We need to store the incoming request JSON after processing it.
We need to search for a particular JSON document by its identifier field, for which we are creating a separate column in each row of the table.
Clients may also require searching the JSON column itself, i.e. retrieving a JSON document based on a certain key's value inside the JSON.
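To illustrate, this is roughly how we query today (a sketch with invented table, column, and key names; payload is a jsonb column):

import psycopg2

conn = psycopg2.connect("dbname=app")
with conn, conn.cursor() as cur:
    # (2) look up one request by its extracted identifier column
    cur.execute("SELECT payload FROM requests WHERE identifier = %s",
                ("abc-123",))
    # (3) client-driven search on a key inside the JSON itself
    cur.execute("SELECT payload FROM requests WHERE payload->>'status' = %s",
                ("failed",))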
All these things are OK at present with Postgres. But I was reading a blog article that mentioned ElasticSearch can also be used as a backend data store, not just as a search server. If that is possible, can we replace Postgres with ElasticSearch? What advantages would I get by doing this, and what are the pros and cons of Postgres compared with ElasticSearch for my case?
Can anyone give some advice, please?
Responding to the questions one by one:
We need to store the incoming request JSON after processing it.
Yes and no. ElasticSearch allows you to store JSON objects. This works if the JSON structure is known beforehand and/or stable (i.e. the same keys in the JSON always have the same type).
By default the mapping (i.e. the schema of the collection) is dynamic, meaning the schema is inferred from the values inserted. Say we insert this document:
{"amount": 1.5} <-- insert succeeds
And immediately after try to insert this one:
{"amount": {"value" 1.5, "currency": "EUR"]} <-- insert fails
ES will reply with an error message:
Current token (START_OBJECT) not numeric, can not use numeric value accessors\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@757a68a8; line: 1, column: 13]
If you have JSON objects of unknown structure, you can still store them in ES by using the type object and setting the property enabled: false; this will not allow you to do any kind of queries on the content of such a field, though.
We need to search for a particular JSON document by its identifier field, for which we are creating a separate column in each row of the table.
Yes. This can be done using a field of type keyword if the identifier is an arbitrary string, or of type integer if it is an integer.
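A minimal sketch of such a mapping, combining the keyword identifier with the enabled: false trick from point 1 (index name, field names, and URL are invented; assumes a recent single-type Elasticsearch and Python's requests library):

import requests

# Hypothetical index: "identifier" is searchable, while the raw request
# JSON is stored under "payload" but not indexed (enabled: false).
mapping = {
    "mappings": {
        "properties": {
            "identifier": {"type": "keyword"},
            "payload": {"type": "object", "enabled": False},
        }
    }
}
resp = requests.put("http://localhost:9200/requests", json=mapping)
resp.raise_for_status()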
Clients may also require searching the JSON column itself, i.e. retrieving a JSON document based on a certain key's value inside the JSON.
As per 1), yes and no. If the JSON schema is known and strict, it can be done. If the JSON structure is arbitrary, it can be stored but will not be queryable.
Though I would say ElasticSearch is not suitable for your case, there are some guys who make JDBC and ODBC drivers for ElasticSearch, so apparently in some cases ElasticSearch can be used as a relational database.
Elasticsearch is an HTTP wrapper around Apache Lucene, and Apache Lucene stores objects in a columnar fashion in order to speed up search (Lucene segments).
I am complementing Nikolay's very good answer:
The good:
Both Lucene and Elasticsearch are solid projects
Elasticsearch is (in my opinion) the best and easiest software for clustering (sharding and replication)
Supports version-conflict handling (https://www.elastic.co/guide/en/elasticsearch/guide/current/concurrency-solutions.html)
The bad:
Not real-time (https://www.elastic.co/guide/en/elasticsearch/guide/current/near-real-time.html)
No support for ACID transactions (changes to individual documents are ACIDic, but not changes involving multiple documents)
Slow when retrieving a lot of data (you must use search scrolling, which is very slow compared to a fetch from a SQL database)
No authentication or access control
My opinion is to use Elasticsearch as a kind of view of your database, with read-only access.
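A minimal sketch of that read-only-view pattern, assuming Python with psycopg2 for Postgres and plain HTTP for Elasticsearch (connection strings, table, and index names invented):

import json
import psycopg2
import requests

pg = psycopg2.connect("dbname=app")

def save_request(identifier, payload):
    # The durable, transactional write goes to Postgres (system of record).
    with pg, pg.cursor() as cur:
        cur.execute(
            "INSERT INTO requests (identifier, payload) VALUES (%s, %s)",
            (identifier, json.dumps(payload)),
        )
    # Best-effort mirror into Elasticsearch, used only for read-only search;
    # it can be rebuilt from Postgres at any time.
    requests.put(
        "http://localhost:9200/requests/_doc/" + identifier,
        json={"identifier": identifier, "payload": payload},
    )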
What would be the best way to tie a database object to a source code implementation? Basically so that I could have a table of "ingredients" that could be referred to by objects from another table containing a "recipe", while still being able to index and search efficiently by their metadata. Also taking into account that some "ingredients" might inherit from other "ingredients".
Maybe I'm looking at this in a totally wrong way, would appreciate any light on the subject.
If I've correctly understood your goal, there are two main choices:
Use an OR/M and don't try to implement the data mapping yourself from scratch (a sketch follows below).
Switch to NoSQL storage. Analyze your data model and see whether it is not very relational and can be expressed using a document store like MongoDB. For example, MongoDB already supports indexing.
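For the first option, here is a sketch of what an OR/M buys you, assuming SQLAlchemy (class, table, and column names invented): the mapper ties rows to classes, models the ingredient inheritance, and keeps the metadata columns indexable.

from sqlalchemy import Column, ForeignKey, Integer, String, Table
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

# Many-to-many link between recipes and ingredients.
recipe_ingredient = Table(
    "recipe_ingredient", Base.metadata,
    Column("recipe_id", ForeignKey("recipe.id"), primary_key=True),
    Column("ingredient_id", ForeignKey("ingredient.id"), primary_key=True),
)

class Ingredient(Base):
    __tablename__ = "ingredient"
    id = Column(Integer, primary_key=True)
    name = Column(String, index=True)   # indexed, searchable metadata
    kind = Column(String)               # discriminator for inheritance
    __mapper_args__ = {"polymorphic_identity": "ingredient",
                       "polymorphic_on": kind}

class Spice(Ingredient):
    # Single-table inheritance: a Spice is stored as an Ingredient row
    # with kind = "spice", so queries for Ingredient also find Spices.
    __mapper_args__ = {"polymorphic_identity": "spice"}

class Recipe(Base):
    __tablename__ = "recipe"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    ingredients = relationship(Ingredient, secondary=recipe_ingredient)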
I'm currently speccing out a project that stores threaded comment trees.
For those of you unfamiliar with what I'm talking about, I'll explain: basically, every comment has a parent comment, rather than just belonging to a thread. Currently I'm working on a relational SQL Server model for storing this data, simply because it's what I'm used to. It looks like so:
CREATE TABLE Comment (
    Id int PRIMARY KEY,
    ThreadId int,          -- FK to Thread
    UserId int,            -- FK to User
    ParentCommentId int,   -- FK (relates back to Id)
    Comment nvarchar(max),
    Time datetime
);
What I do is select all of the comments by ThreadId and then, in code, recursively build out my object tree. I'm also doing a join to get things like the user's name.
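For reference, a sketch of that in-code tree build (assuming each row comes back as a dict keyed by column name):

from collections import defaultdict

def build_tree(rows):
    # Group comments by their parent id in one pass.
    children = defaultdict(list)
    for row in rows:
        children[row["ParentCommentId"]].append(row)

    def attach(comment):
        comment["Replies"] = [attach(c) for c in children[comment["Id"]]]
        return comment

    # Root comments have no parent (ParentCommentId is NULL -> None).
    return [attach(c) for c in children[None]]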
It just seems to me that maybe a document store like MongoDB, which is NoSQL, would be a better choice for this sort of model. But I don't know anything about it.
What would be the pitfalls if I do choose MongoDB?
If I'm storing it as a Document in MongoDB, would I have to include the User's name on each comment to prevent myself from having to pull up each user record by key, since it's not "relational"?
Do you have to aggressively cache "related" data on the objects you need them on when you're using MongoDB?
EDIT: I did find this article about storing trees of information in MongoDB. Given that one of my requirements is the ability to show a logged-in user a list of his recent comments, I'm now strongly leaning towards just using SQL Server, because I don't think I'll be able to do anything clever with MongoDB that will result in real performance benefits. But I could be wrong. I'm really hoping an expert (or two) on the matter will chime in with more information.
The main advantage of storing hierarchical data in Mongo (and other document databases) is the ability to store multiple copies of the data in ways that make queries more efficient for different use cases. In your case, it would be extremely fast to retrieve the whole thread if it were stored as a hierarchical nested document, but you'd probably also want to store each comment un-nested or possibly in an array under the user's record to satisfy your 2nd requirement. Because of the arbitrary nesting, I don't think that Mongo would be able to effectively index your hierarchy by user ID.
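For illustration, a thread stored as one nested document might look roughly like this (field names invented; note the denormalized user name on each comment):

thread = {
    "_id": 42,
    "title": "Example thread",
    "comments": [
        {
            "userId": 7,
            "userName": "alice",   # denormalized so no user lookup is needed
            "text": "First comment",
            "replies": [
                {"userId": 9, "userName": "bob",
                 "text": "A reply", "replies": []},
            ],
        },
    ],
}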
As with all NoSQL stores, you get more benefit by being able to scale out to lots of data nodes, allowing for many simultaneous readers and writers.
Hope that helps.
I'm reviewing my code and realize I spend a tremendous amount of time
taking rows from a database,
formatting as XML,
sending it via AJAX GET to the browser, and then
converting it back into a hashed JavaScript object as my local datastore.
On updates, I have to reverse the process (except using POST instead of GET).
Having just started looking at Redis, I'm thinking I can save a tremendous amount of time by keeping the objects in a key-value store on the server and just using JSON to transfer them directly to the JS client. But my feeble mind can't anticipate what I'd be giving up by leaving a SQL DB (i.e. I'm scared to give up the GROUP BY/HAVING queries).
For my data, I have:
many-many relationships, i.e. obj-tags, obj-groups, etc.
query objects by a combination of such, i.e. WHERE tag IN ('a', 'b','c') AND group in ('x','y')
self joins, i.e. ALL the tags for each object WHERE tag='a' (sql group_concat())
a lot of outer joins, i.e. OUTER JOIN rating ON o.id = rating.obj_id
and feeds, which seem to be a strong point in REDIS
How do you successfully mix key-value & SQL DBs?
For example, is it practical to join a large list of obj ids from a Redis set with SQL data using a SQL RANGE query (i.e. WHERE obj.id IN (1,4,6,7,8,34,876,9879,567,345, ...)), or vice versa?
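For concreteness, a sketch of the kind of hybrid lookup I mean, assuming redis-py and an embedded SQLite stand-in for the SQL side (key, table, and column names invented):

import sqlite3
import redis

r = redis.Redis()
# Object ids for tag "a" live in a Redis set.
ids = [int(x) for x in r.smembers("tag:a")]

# Hydrate those ids from the relational side with one IN (...) query.
db = sqlite3.connect("app.db")
placeholders = ",".join("?" * len(ids))
rows = db.execute(
    "SELECT id, name FROM obj WHERE id IN (%s)" % placeholders, ids
).fetchall()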
ideas/suggestions welcome.
You may want to take a look at MongoDB. It works with JSON-style objects and comes with SQL-like indexing & querying. Redis is more suitable for storing data structures like lists & sets, when you want a simple lookup instead of a complex query.
Now that the actual problem is better defined (i.e. you spend a lot of time writing repetitive conversion code to move from one layer/representation to the next), maybe you could consider writing (or googling for) something that automates this?
Google returns plenty of results for "convert table to XML" (and the reverse); would this help? Would something going directly from table to key/value pairs be better? Have you tried tackling this problem in a generalized way?
When you say "I spend a tremendous amount of time" do you mean this is a lot of development time, or are you referring to computing time?
Personally, I'd be wary of mixing an RDBMS with a non-RDBMS solution, because this will probably create problems when the two different paradigms clash.