How do I partition mongodb datasets?

I'm stuck on MongoDB sharding and need some help.
My first question: how do I get my database to show partitioned: true in sh.status()?
I've set up the shard servers and mongos, but I need to partition my documents based on a datetime field. I used tags and zone ranges, but I couldn't get this option to become true.
Here is the option I'm talking about:
I tried setting it with sh.shardCollection("db.coll" , partitioned:true) but it doesn't work.

Create the index on the field you would like to shard (partition) on:
use <database>
db.<collection>.createIndex({"<shard key field>":1})
Enable sharding (partitioning) on the database; this is the step that makes sh.status() report partitioned: true for it:
sh.enableSharding("<database>")
Shard the collection:
sh.shardCollection("<database>.<collection>", { "<shard key field>" : 1, ... } )
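If you drive this from Python rather than the mongo shell, the last two steps correspond to the enableSharding and shardCollection admin commands. Here is a minimal sketch that only builds the command documents in the right order; the database name "mydb", the collection "events" and the shard key "created_at" are made-up placeholders, and the helper sharding_commands is hypothetical, not part of any driver.

```python
def sharding_commands(database, collection, shard_key):
    """Return the two admin commands, in the order they must be run."""
    namespace = f"{database}.{collection}"
    return [
        # Step 1: mark the database as sharded (partitioned).
        {"enableSharding": database},
        # Step 2: shard the collection on the already indexed key.
        {"shardCollection": namespace, "key": dict(shard_key)},
    ]


# Placeholder names, chosen only for illustration.
commands = sharding_commands("mydb", "events", {"created_at": 1})
for command in commands:
    print(command)

# Assumption: with pymongo installed and a client connected to a mongos,
# each document could be sent with client.admin.command(command).
```

Keeping the commands as plain dictionaries makes the order explicit: the database has to be enabled for sharding before the collection can be sharded.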

Related

A field with big array on mongodb

I am a beginner with MongoDB and I built a database with the following structure: a few metadata fields and one field that contains the experiment results.
The experiment results are a vector of integers with ~150,000 values.
status = db.DataTest.insert_one(
    {
        "person_num" : num,
        "life_cycle" : cycle,
        "other_metadata" : meta_data,
        "results_of_experiment": big_array
    }
)
I inserted about 7,500 of these documents.
They occupy 8 GB of memory, and find operations are really slow.
I don't need to search by the experiment results; I only need to retrieve them from the DB as a chunk of data.
Is there another way to store the experiment results in the DB?
Is GridFS relevant to this case, and is it not too complicated to use?

Based on your comments, the most common query is
db.DataTest.find( { "life_cycle": { $gt: 800 } }).limit(5)
Without an index on the life_cycle field, MongoDB is forced to do a collection scan. That is, fetch & evaluate all documents in your collection one by one. In a large collection, this will take a long time.
Apart from the default index on _id, MongoDB does not create indexes automatically. You have to observe your most common queries and create indexes to support them. As far as I know, no database software, SQL, NoSQL, or otherwise, creates indexes for your queries automatically.
Database indexing is a deep subject and cannot be explained in a short answer.
Having said that, if you create an index on the life_cycle field, it should improve your query times but only for the query you posted above. Other query types would likely require different indexes. You can do so in the mongo shell:
db.DataTest.createIndex({life_cycle: 1})
I encourage you to read these pages to understand more about indexing in MongoDB:
https://docs.mongodb.com/manual/indexes/
https://docs.mongodb.com/manual/applications/indexes/
https://docs.mongodb.com/manual/tutorial/create-indexes-to-support-queries/
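To make the difference between a collection scan and an index lookup concrete, here is a small pure-Python model of the query above. It is only a conceptual sketch, not how MongoDB is implemented: the collection is a made-up dictionary of 10,000 documents, and the "index" is just a sorted list searched with bisect.

```python
import bisect

# Hypothetical stand-in for the collection: doc_id -> life_cycle.
collection = {doc_id: doc_id // 10 for doc_id in range(10_000)}


def scan_gt(collection, threshold, limit):
    """Collection scan: test documents one by one until `limit` match."""
    matches, examined = [], 0
    for doc_id, life_cycle in collection.items():
        examined += 1
        if life_cycle > threshold:
            matches.append(doc_id)
            if len(matches) == limit:
                break
    return matches, examined


# The "index": life_cycle values kept sorted, each paired with its doc_id.
index = sorted((life_cycle, doc_id) for doc_id, life_cycle in collection.items())
index_keys = [life_cycle for life_cycle, _ in index]


def index_gt(threshold, limit):
    """Index lookup: binary-search to the first key > threshold, then read on."""
    start = bisect.bisect_right(index_keys, threshold)
    entries = index[start:start + limit]
    return [doc_id for _, doc_id in entries], len(entries)


scan_ids, scan_examined = scan_gt(collection, 800, 5)
index_ids, index_examined = index_gt(800, 5)
print(scan_examined, index_examined)
```

Both functions return the same five documents, but the scan has to examine thousands of documents before it finds them, while the index lookup only touches the five entries it returns.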

Cloudant Database Map Reduce

I am new to Cloudant, a NoSQL database (I have worked with MongoDB before).
1) Is there a Cloudant UI for writing queries and inspecting the result set during development?
2) How do I create a map-reduce view in Cloudant?
Could you please reply with your thoughts?

The search indexes are written in JavaScript. (Cloudant has recently launched its own "Cloudant Query", which promises to be easier to work with, but I haven't had the time to try it properly yet.)
Say you have documents in your DB which contain a field called "UserName" and you want to create a view on all of them. You could write a function like this:
function(doc) {
  if ( typeof doc.UserName !== "undefined" ) {
    emit([doc.UserName], doc._id);
  }
}
This outputs the user names and document ids.
If a given user name could be associated with multiple documents, you could do this instead:
function(doc) {
  if ( typeof doc.UserName !== "undefined" ) {
    emit([doc.UserName,doc._id], 1);
  }
}
and also use the built-in "_count" or "_sum" reduce functions that Cloudant provides to tally the number of documents a given user name is associated with, etc.
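If it helps to see what the second map function plus a counting reduce produces, here is a rough Python simulation. It is only a sketch under assumptions: the sample documents are invented, and count_by_user is a hypothetical helper that mimics querying the view with a counting reduce grouped on the first key element (group_level=1); it does not talk to Cloudant.

```python
from collections import Counter

# Hypothetical sample documents; only "UserName" and "_id" matter here.
docs = [
    {"_id": "a1", "UserName": "alice"},
    {"_id": "a2", "UserName": "alice"},
    {"_id": "b1", "UserName": "bob"},
    {"_id": "x1"},  # no UserName, so the map function emits nothing
]


def map_doc(doc):
    """Mirror of the second map function: emit ([UserName, _id], 1)."""
    if "UserName" in doc:
        yield (doc["UserName"], doc["_id"]), 1


def count_by_user(docs):
    """Group emitted rows on the first key element and count them."""
    counts = Counter()
    for doc in docs:
        for (user_name, _doc_id), _value in map_doc(doc):
            counts[user_name] += 1
    return dict(counts)


user_counts = count_by_user(docs)
print(user_counts)
```

Working the logic out like this on a handful of documents is a cheap way to check a view before building it on a real database.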
You can use the UI in the Cloudant DB dashboard to execute queries or (as I personally favour) use a tool like Postman (https://www.getpostman.com/).
One word of warning, though: error and sanity checking of your JavaScript code is pretty much non-existent, and you'll only know that something isn't working when you hit "save & build index", which can be a major pain if you're working on large databases (it can grind the whole thing to a halt). A pro tip, therefore, is to work out your indexes on smaller data sets in some safe little sandbox database before you let them loose on anything important.
All of this is supposedly going to be much better with Cloudant Query.

How to use indexed properties of NodeModels in cypher queries of Neo4django?

I'm a newbie to Django as well as Neo4j. I'm using Django 1.4.5, Neo4j 1.9.2 and neo4django 0.1.8.
I've created a NodeModel for a person node and indexed it on the 'owner' and 'name' properties. Here is my models.py:
from neo4django.db import models as models2
class person_conns(models2.NodeModel):
    owner = models2.StringProperty(max_length=30,indexed=True)
    name = models2.StringProperty(max_length=30,indexed=True)
    gender = models2.StringProperty(max_length=1)
    parent = models2.Relationship('self',rel_type='parent_of',related_name='parents')
    child = models2.Relationship('self',rel_type='child_of',related_name='children')
    def __unicode__(self):
        return self.name
Before I connected to the Neo4j server, I set auto indexing to true and gave the indexable keys in the conf/neo4j.properties file as follows:
# Autoindexing
# Enable auto-indexing for nodes, default is false
node_auto_indexing=true
# The node property keys to be auto-indexed, if enabled
node_keys_indexable=owner,name
# Enable auto-indexing for relationships, default is false
relationship_auto_indexing=true
# The relationship property keys to be auto-indexed, if enabled
relationship_keys_indexable=child_of,parent_of
I followed "Neo4j: Step by Step to create an automatic index" to update the above file and manually create node_auto_index on the Neo4j server.
Below are the indexes that exist on the Neo4j server after running Django's syncdb against the Neo4j database and manually creating the auto index:
graph-person_conns lucene
{"to_lower_case":"true", "_blueprints:type":"MANUAL","type":"fulltext"}
node_auto_index lucene
{"_blueprints:type":"MANUAL", "type":"exact"}
As suggested in https://github.com/scholrly/neo4django/issues/123 I used connection.cypher(queries) to query the neo4j database
For Example:
listpar = connection.cypher("START no=node(*) RETURN no.owner?, no.name?",raw=True)
The above returns the owner and name of all nodes correctly. But when I try to query on an indexed property instead of a node number or '*', as in:
listpar = connection.cypher("START no=node:node_auto_index(name='s2') RETURN no.owner?, no.name?",raw=True)
Above gives 0 rows.
listpar = connection.cypher("START no=node:graph-person_conns(name='s2') RETURN no.owner?, no.name?",raw=True)
Above gives
Exception Value:
Error [400]: Bad Request. Bad request syntax or unsupported method.
Invalid data sent: (' expected but-' found after graph
I tried other strings like name and person_conns instead of graph-person_conns, but each time I get an error saying that the particular index does not exist. Am I making a mistake while adding the indexes?
My project mainly depends on filtering nodes based on their properties, so this part is really essential. Any pointers or suggestions would be appreciated. Thank you.
This is my first post on Stack Overflow, so please be patient if any information is missing or any statement is confusing. Thank you.
UPDATE:
Thank you for the help. For the benefit of others I would like to give example of how to use cypher queries to traverse/find shortest path between two nodes.
from neo4django.db import connection
results = connection.cypher("START source=node:`graph-person_conns`(person_name='s2sp1'),dest=node:`graph-person_conns`(person_name='s2c1') MATCH p=ShortestPath(source-[*]->dest) RETURN extract(i in nodes(p) : i.person_name), extract(j in rels(p) : type(j))")
This is to find shortest path between nodes named s2sp1 and s2c1 on the graph. Cypher queries are really cool and help traverse nodes limiting the hops, types of relations etc.
Can someone comment on the performance of this method? Also please suggest if there are any other efficient methods to access Neo4j from Django. Thank You :)

Hm, why are you using Cypher? neo4django QuerySets work just fine for the above if you set the properties to indexed=True (or not, it'll just be slower for those).
people = person_conns.objects.filter(name='n2')
The neo4django docs have some other querying examples, as do the Django docs. Neo4django executes those queries as Cypher on the backend, so you really shouldn't need to drop down to writing the Cypher yourself unless you have a very particular traversal pattern or a performance issue.
Anyway, to tackle your question more directly: the last example you used needs backticks to escape the index name, like
listpar = connection.cypher("START no=node:`graph-person_conns`(name='s2') RETURN no.owner?, no.name?",raw=True)
The first example should work. One thought: did you turn autoindexing on before or after saving the nodes you're searching for? If after, note that you'll have to manually reindex those nodes, either using the Java API or by re-setting properties on them, since they won't have been autoindexed.
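If you end up building several of these index lookups, a tiny helper keeps the backticks from being forgotten. This is only a sketch: index_lookup_query is a hypothetical function, not part of neo4django, and it deliberately refuses names or values containing quote characters instead of trying to escape them.

```python
def index_lookup_query(index_name, key, value):
    """Build a legacy START query against a named index, backtick-escaped."""
    if "`" in index_name or "'" in value:
        # Escaping rules are out of scope for this sketch.
        raise ValueError("this sketch does not handle embedded quotes")
    return (
        "START no=node:`{index}`({key}='{value}') "
        "RETURN no.owner?, no.name?"
    ).format(index=index_name, key=key, value=value)


query = index_lookup_query("graph-person_conns", "name", "s2")
print(query)
```

The resulting string is the same query as the corrected example above and could be passed to connection.cypher(query, raw=True).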
HTH, and welcome to StackOverflow!

Does the NDB membership query ("IN" operation) performance degrade with lots of possible values?

The documentation for the IN query operation states that those queries are implemented as a big OR'ed equality query:
qry = Article.query(Article.tags.IN(['python', 'ruby', 'php']))
is equivalent to:
qry = Article.query(ndb.OR(Article.tags == 'python',
                           Article.tags == 'ruby',
                           Article.tags == 'php'))
I am currently modelling some entities for a GAE project and plan on using these membership queries with a lot of possible values:
qry = Player.query(Player.facebook_id.IN(list_of_facebook_ids))
where list_of_facebook_ids could have thousands of items.
Will this type of query perform well with thousands of possible values in the list? If not, what would be the recommended approach for modelling this?

This won't work with thousands of values (in fact, I bet it starts degrading with more than 10 values). The only alternative I can think of is some form of precomputation. You'll have to change your schema.

One way you can do it is to create a new model called FacebookPlayer which acts as an index. It would be keyed by facebook_id, and you would update it whenever you add a new player. It looks something like this:
class FacebookPlayer(ndb.Model):
    player = ndb.KeyProperty(kind='Player', required=True)
Now you can avoid queries altogether. You can do this:
# Build keys from facebook ids.
facebook_id_keys = []
for facebook_id in list_of_facebook_ids:
    facebook_id_keys.append(ndb.Key('FacebookPlayer', facebook_id))
# Fetch the index entities by key; ids with no entity come back as None.
keysOfUsersMatchedByFacebookId = []
for facebook_player in ndb.get_multi(facebook_id_keys):
    if facebook_player:
        keysOfUsersMatchedByFacebookId.append(facebook_player.player)
usersMatchedByFacebookId = ndb.get_multi(keysOfUsersMatchedByFacebookId)
If list_of_facebook_ids contains thousands of items, you should do this in batches.
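A minimal sketch of the batching step, in plain Python so it can be tried outside App Engine. The batch size of 1000 and the generated ids are arbitrary assumptions for illustration, and batches is a hypothetical helper; each batch would then go through the key-building and ndb.get_multi calls shown above.

```python
def batches(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical ids; in the answer above these are Facebook ids.
list_of_facebook_ids = [str(n) for n in range(2500)]
batch_sizes = [len(batch) for batch in batches(list_of_facebook_ids, 1000)]
print(batch_sizes)
```

Processing one slice at a time keeps each get_multi call bounded, whatever the length of the original list.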

Opa : insert new element to the Database

I've been using Opa for a few days now and I'm really starting to like it. I'm in my first year of computer science, and we will have a database class next year.
The little I know about databases comes from PHP: I have used MySQL with PHP and SQLite with C++. But this type of database is a bit different from what I've seen.
I have followed the guide about databases in Opa, http://doc.opalang.org/manual/Hello--database, but I have a question.
In the guide we declare a new database:
type user_status = {regular} or {premium} or {admin}
type user_id = int
type user = { user_id id, string name, int age, user_status status }
database users {
  user /all[{id}]
  /all[_]/status = { regular }
}
We learn how to read this database and how to query it with maps, but how do I add a new element? I was testing a bit:
/users/all[{id:0}]/name<-getusername;
but the id should be auto-incremented, from the little I know.
Thanks everyone for the help =D
I really want to get into Opa; the little I have made so far is really impressive!

mongoDB and auto-increment
With MongoDB (the default Opa database) there is no auto-increment like in SQL, for scalability reasons.
But if you really need one, you can use a counter to build this feature yourself:
database users {
  user /all[{id}]
  int /fresh_key
  /all[_]/status = { regular }
}
And increment the key each time you use it: /users/fresh_key++
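The counter idea is not specific to Opa. Here is the same pattern sketched in Python, purely as an illustration: FreshKey is a hypothetical in-memory class, the user names are invented, and a real application would need the increment to happen atomically in the database itself rather than in process memory.

```python
import itertools
import threading


class FreshKey:
    """In-memory counter that hands out each integer exactly once."""

    def __init__(self, start=0):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next(self):
        # The lock keeps two threads from receiving the same key.
        with self._lock:
            return next(self._counter)


fresh_key = FreshKey()
users = {}
for name in ["ann", "ben", "cat"]:
    user_id = fresh_key.next()
    users[user_id] = {"id": user_id, "name": name, "status": "regular"}
print(sorted(users))
```

Every new user takes the current counter value as its id, and the counter moves on, which is what /users/fresh_key++ does in the declaration above.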
Random fresh key
You can also generate a random id, for example with something like Random.string(6)
Read this thread to learn more about this technique: http://lists.owasp.org/pipermail/opa/2012-April/001052.html
User defined unique key
But if you are dealing with users, maybe you already have a unique key: what about using "login" or "email" as the unique key?

You can also use Date.in_milliseconds(Date.now_gmt()) to get an id that is less likely to collide, possibly concatenated with the user id.
