I am trying to update my collections in my mongodb instance hosted on mlab.
I am running the following code:
...
db.collectionOne.insert(someArrayOfJson)
db.collectionTwo.insert(someArrayOfJson)
The first collection gets updated and the second doesn't.
Using the same or different valid JSON arrays produces the same outcome: only the first collection gets updated.
I have seen this question about duplicate documents in the same collection, and I can understand why that wouldn't work. But my problem is across two separate collections.
When inserting the data manually on mlab, the document goes into the second collection fine, so I am led to believe duplicate data is allowed across separate collections.
I am new to mongo - am I missing something simple?
Update:
The response is:
22:01:53.224 [main] DEBUG org.mongodb.driver.protocol.insert - Inserting 20 documents into namespace db.collectionTwo on connection [connectionId{localValue:2, serverValue:41122}] to server ds141043.mlab.com:41043
22:01:53.386 [main] DEBUG org.mongodb.driver.protocol.insert - Insert completed
22:01:53.403 [main] DEBUG org.mongodb.driver.protocol.insert - Inserting 20 documents into namespace db.collectionOne on connection [connectionId{localValue:2, serverValue:41122}] to server ds141043.mlab.com:41043
22:01:55.297 [main] DEBUG org.mongodb.driver.protocol.insert - Insert completed
But there is nothing entered into the db for the second dataset.
Update v2:
If I make a call after the two inserts such as:
db.createCollection("log", { capped : true, size : 5242880, max : 5000 } )
The data collections get updated!
What is the total data size?
Here is sample code that works for me:
db.collectionOne.insert([{ "name1": "John", "age1": 30, "cars1": ["Ford", "BMW", "Fiat"] },
                         { "name1": "John", "age1": 30, "cars1": ["Ford", "BMW", "Fiat"] }]);
db.collectionTwo.insert([{ "name2": "John", "age2": 30, "cars2": ["Ford", "BMW", "Fiat"] },
                         { "name2": "John", "age2": 30, "cars2": ["Ford", "BMW", "Fiat"] }]);
If the data is larger, you can use MongoDB Bulk Write Operations, and you can also refer to the MongoDB limits and thresholds documentation:
https://docs.mongodb.com/manual/reference/limits
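For example, a minimal sketch of a bulk insert in the mongo shell (the collection and field values are placeholders, not taken from your data):
db.collectionOne.bulkWrite([
    { insertOne: { document: { "name1": "John", "age1": 30 } } },
    { insertOne: { document: { "name1": "Jane", "age1": 28 } } }
], { ordered: true })   // ordered: true stops at the first error; false continues past it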
How did you determine that the second collection did not get updated?
I believe you are simply seeing the difference between NoSQL and SQL databases. A SQL database will guarantee you that a read after a successful write will read the data you just wrote. A NoSQL database does not guarantee that you can immediately read data you just wrote. See this answer for more details.
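If you want to be sure the write was actually acknowledged before you read it back, you can pass an explicit write concern to the insert. A minimal sketch in the mongo shell, reusing the array from the question (the write concern values are only an illustration):
db.collectionTwo.insert(someArrayOfJson, { writeConcern: { w: "majority", wtimeout: 5000 } })
db.collectionTwo.count()   // should now reflect the newly inserted documents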
I'm currently working on a MongoDB collection containing documents that look like the following:
{ startTime: Date, endTime: Date, source: String, metaData: {} }
My use case is to retrieve all documents that fall within a queried time frame, so my query looks like this:
db.myCollection.find(
    {
        $and: [
            { "source": aSource },
            { "startTime": { $lte: timeFrame.end } },
            { "endTime": { $gte: timeFrame.start } }
        ]
    }
).sort({ "startTime": 1 })
With an index defined as the following:
db.myCollection.createIndex( { "source" : 1, "startTime": 1, "endTime": 1 } );
The problem is that queries are very slow (several hundred ms on a local database) as soon as the number of documents per source increases.
Using mongo explain shows me that I'm using this index efficiently (only the documents that are actually returned get scanned; everything else is index access only), so the slowness seems to come from the index scan itself, as this query needs to go over a large portion of the index.
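For reference, this is roughly the check being described, using explain's executionStats output (comparing totalKeysExamined with nReturned shows how selective the index is for the query):
db.myCollection.find({
    "source": aSource,
    "startTime": { $lte: timeFrame.end },
    "endTime": { $gte: timeFrame.start }
}).sort({ "startTime": 1 }).explain("executionStats")
// compare executionStats.totalKeysExamined with executionStats.nReturned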
In addition, such an index gets huge pretty quickly and therefore seems inefficient.
Is there anything I'm missing that could help make these queries faster, or is retrieving all the documents belonging to a given source the best way to go? I see that MongoDB now provides some time-series features; could those help with my problem?
I have a set of test results in my mongodb database. Each document in the database contains version information, test data, date, test run information etc...
The version is broken up in the document and stored as individual values. For example: { VER_MAJOR: "0", VER_MINOR: "2", VER_REVISION: "3", VER_PATCH: "20" }
My application wants the ability to specify a specific version and grab the document as well as the previous N documents based on the version.
For example:
If version = 0.2.3.20 and n = 5 then the result would return documents with version 0.2.3.20, 0.2.3.19, 0.2.3.18, 0.2.3.17, 0.2.3.16, 0.2.3.15
The solutions that come to my mind are:
1. Create a new database that contains sorted documents with version information. This can be used to obtain the previous N versions, which can in turn be used to obtain the corresponding N documents in the test results database.
2. Perform the sorting in the test results database itself, as in option 1. Though if the test results database is large, this will take a very long time. It also means considering insertion order every time.
Creating another database like in option 1 doesn't seem like the right way to go, but sorting the test results database seems like a lot of overhead. Am I mistaken to be worried that option 2 produces lots of overhead? I have the impression I'd have to query the entire database and then sort it on the application side, and querying the entire database seems like overkill...
db.collection_name.find().sort({ /* parameters for sorting */ })
You are quite correct that querying and sorting the entire data set would be very excessive. I probably went overboard on this, but I tried to break everything down in detail below.
Terminology
First things first, a couple of terminology nitpicks. I think you're using the term Database when you mean to use the word Collection. Differentiating between these two concepts will help with navigating the documentation and allow for a better understanding of MongoDB.
Collections and Sorting
Second, it is important to understand that documents in a Collection have no inherent ordering. The order in which documents are returned to your app is only applied when retrieving documents from the Collection, such as when specifying .sort() on a query. This means we won't need to copy all of the documents to some other collection; we just need to query the data so that only the desired data is returned in the order we want.
Query
Now to the fun part. The query will look like the following:
db.test_results.find({
    "VER_MAJOR": "0",
    "VER_MINOR": "2",
    "VER_REVISION": "3",
    "VER_PATCH": { "$lte": 20 }
}).sort({
    "VER_PATCH": -1
}).limit(N)
Our query has a direct match on the three leading version fields to limit results to only those values, i.e. the specific version "0.2.3". A range $lte filter is applied on VER_PATCH since we will want more than a single patch revision.
We then sort results by VER_PATCH to return results descending by the patch version. Finally, the limit operator is used to restrict the number of documents being returned.
Index
We're not done yet! Remember how you said that querying the entire collection and sorting it on the app side felt like overkill? Well, the database would be doing exactly that if an index did not exist for this query.
You should follow the equality-sort-range (ESR) rule when determining the order of fields in an index. In this case, this would give us the index:
{ "VER_MAJOR" : 1, "VER_MINOR" : 1, "VER_REVISION" : 1, "VER_PATCH" : 1 }
Creating this index will allow the query to complete by scanning only the results it would return, while avoiding an in-memory sort. More information can be found here.
I have the following structure in my firestore database:
messages:
    m1:
        title: "Message 1"
        ...
        archived: false
    m2:
        title: "Message 2"
        ...
        archived: true
Let's say I have 20k messages and I want to get the archived messages using a "where" clause. Will my query be slower than if I structured my database as follows?
nonArchivedMessages:
    m1:
        title: "Message 1"
        ...
archivedMessages:
    m2:
        title: "Message 2"
        ...
Using the second structure seems, to me, better suited to large datasets, but it introduces issues in some cases, such as getting a message without knowing whether it is archived or not.
One of the guarantees for Cloud Firestore is that the time it takes to retrieve a certain number of documents is not dependent on the total number of documents in the collection.
That means that in your first data model, if you load 100 archived documents and (for example) it takes 1 second, you know that it'll always take about 1 second to load 100 archived documents, no matter how many documents there are in the collection.
With that knowledge, the only difference between your two data models is that in the first model you need a query to find the archived messages, while in the second model you don't need a query. Queries on Cloud Firestore run by accessing an index, so the difference is that one (more) index is being read in the first data model. While this does affect the execution time, the effect is insignificant compared to the time it takes to actually read the documents and return them to the client.
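For illustration, the query for the first data model could look something like this (a sketch using the Firestore web v8 JavaScript SDK and the collection name from the question):
firebase.firestore()
    .collection("messages")
    .where("archived", "==", true)
    .get()
    .then(function(snapshot) {
        snapshot.forEach(function(doc) {
            console.log(doc.id, doc.data().title);
        });
    });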
So: there may be other reasons to prefer the second data model, but the performance to read the archived messages is going to be the same between them.
I am a beginner with Mongo and I made a database with the following topology:
Some fields of metadata and one field that contains the experiment results.
experiment results: a vector of integers with ~150,000 values
status = db.DataTest.insert_one(
    {
        "person_num": num,
        "life_cycle": cycle,
        "other_metadata": meta_data,
        "results_of_experiment": big_array
    }
)
I inserted something like 7,500 of those documents.
They take up about 8 GB and find operations are really slow.
I don't need to search by the experiment results; I only need to be able to retrieve them from the DB as a chunk of data.
Is there another way to store the experiment results in the DB?
Is using GridFS relevant to this case, and not too complicated?
Based on your comments, the most common query is:
db.DataTest.find( { "life_cycle": { $gt: 800 } }).limit(5)
Without an index on the life_cycle field, MongoDB is forced to do a collection scan. That is, fetch & evaluate all documents in your collection one by one. In a large collection, this will take a long time.
MongoDB does not create indexes automatically (apart from the default index on _id). You would have to observe your most common queries and create indexes to support them. As far as I know, there is no automatic index creation in any database software, whether SQL, NoSQL, or otherwise.
Database indexing is a deep subject and cannot be explained in a short answer.
Having said that, if you create an index on the life_cycle field, it should improve your query times but only for the query you posted above. Other query types would likely require different indexes. You can do so in the mongo shell:
db.DataTest.createIndex({life_cycle: 1})
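After creating the index, you can confirm that the query actually uses it by inspecting the query plan, for example:
db.DataTest.find({ "life_cycle": { $gt: 800 } }).limit(5).explain("executionStats")
// the winning plan should now show an IXSCAN on { life_cycle: 1 } rather than a COLLSCAN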
I encourage you to read these pages to understand more about indexing in MongoDB:
https://docs.mongodb.com/manual/indexes/
https://docs.mongodb.com/manual/applications/indexes/
https://docs.mongodb.com/manual/tutorial/create-indexes-to-support-queries/
I have a mongo collection with documents that have a schema structured like the following:
{
    _id: bla,
    fname: foo,
    lname: bar,
    subdocs: [
        {
            subdocname: doc1,
            field1: one,
            field2: two,
            potentially_huge_array: [...]
        },
        ...
    ]
}
I'm using the Ruby mongo driver, which currently does not support elemMatch. I use an aggregation when extracting from subdocs, via a project, unwind and match pipeline.
What I would now like to do is to page results from the potentially_huge_array array contained in the subdocument. I have not been able to figure out how to grab just a subset of the array without dragging the entire subdoc, huge array and all, out of the db into my app.
Is there some way to do this?
Would a different schema be a better way to handle this?
Depending on how huge is huge, you definitely don't want it embedded into another document.
The main reason is that unless you always want the array returned with the document, you probably don't want to store it as part of the document. How you can store it in another collection would depend on exactly how you want to access it.
Reviewing the types of queries you most often perform on your data will usually suggest the best schema - one that will allow you to be efficient about number of queries, the amount of data returned and ease of indexing the data.
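As an illustration only (the collection and field names below are hypothetical, and the parent id is assumed to be available as a variable), one possible way to keep the array in its own collection and page through it:
// one document per array element (or per chunk of elements) in a side collection
db.huge_array_items.insert({
    parent_id: parentId,      // _id of the owning document in the main collection
    subdocname: "doc1",
    seq: 0,                   // position of this element in the original array
    value: elementValue
})

// page through the array 20 elements at a time, in original order
db.huge_array_items.find({ parent_id: parentId, subdocname: "doc1" })
    .sort({ seq: 1 })
    .skip(40)
    .limit(20)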
If your field is really huge and changes often, just place it in a separate collection.