MongoDB: aggregating information into a new collection automatically?

I'm trying to figure out the best way to implement this.
If I have one large collection in my MongoDB database that holds all of the "Inventory" information for my warehouse, regardless of the specific "type" of inventory, what is the best way to continuously aggregate the data into separate collections? (Added after the fact: I'm using a MEAN stack, so maybe some of this is better done server-side with an Angular function rather than by actually keeping a collection updated?)
For instance, my current "Inventory" collection would have items such as
_id: something, name: item1, type: chemical, vendor: something, safetycode: ####, machineUse: n/a, equipmentUse: n/a ...
_id: something2, name: item2, type: machine, vendor: something2, safetycode: n/a, machineUse: "digging", equipmentUse: n/a ...
_id: something3, name: item3, type: equipment, vendor: something3, safetycode: n/a, machineUse: n/a, equipmentUse: "Computer" ...
I'm inclined to use $group, but is it best practice to keep a separate collection updated with the respective groups? In the following, the aggregate function should collect each specific 'type' (Chemical, Machine, Equipment, etc.) and store the details of each item, keeping only the fields used for that 'type' (i.e., chemicals use 'safetycode'; machines do not use a safety code, so it is left out and 'machineUse' is stored instead, etc.).
db.invfulls.aggregate([
    { "$group": {
        "_id": "$itype",
        "total": { "$sum": 1 },
        "items": {
            "$push": {
                "$cond": {
                    "if": { "$eq": ["$itype", "Chemical"] },
                    "then": { "id": "$_id", "name": "$name", "vendor": "$vendor", "review": "$needsreview" },
                    "else": { "id": "$_id", "name": "$name", "vendor": "$vendor" }
                }
            }
        }
    }},
    { "$out": "invByType" }
])
Additionally, would I have to make this a function in the database and call it any time a new "post" is made?
I've read a bit about mapReduce as well, but everything I read says it's very slow and shouldn't be used?
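For comparison, the same grouping can be sketched in plain JavaScript on the Node side of a MEAN stack, which is the alternative the question raises. This is only a sketch under assumptions: `groupByType` is a hypothetical helper, the documents are assumed to be already loaded into an array, the sample data is made up, and the field names (`itype`, `needsreview`) are taken from the pipeline above.

```javascript
// Hypothetical helper: groups inventory documents by their `itype` field,
// mirroring the $group stage above (a count plus a list of per-type details).
function groupByType(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.itype)) {
      groups.set(doc.itype, { _id: doc.itype, total: 0, items: [] });
    }
    const group = groups.get(doc.itype);
    const item = { id: doc._id, name: doc.name, vendor: doc.vendor };
    // Only chemicals carry the review flag, as in the $cond branch.
    if (doc.itype === "Chemical") {
      item.review = doc.needsreview;
    }
    group.total += 1;
    group.items.push(item);
  }
  return Array.from(groups.values());
}

// Made-up sample data for illustration only.
const sampleInventory = [
  { _id: 1, name: "item1", itype: "Chemical", vendor: "v1", needsreview: true },
  { _id: 2, name: "item2", itype: "Machine", vendor: "v2" },
  { _id: 3, name: "item3", itype: "Chemical", vendor: "v3", needsreview: false },
];
const grouped = groupByType(sampleInventory);
console.log(JSON.stringify(grouped, null, 2));
```

Computing the groups on demand like this avoids keeping a second collection in sync, at the cost of redoing the work on every request; whether that trade-off is acceptable depends on the size of the collection.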

Related

Why does this push not add a new field? (mongoDB)

I have a collection of restaurants where each document includes data such as the name, location, reviews, etc, and I'm attempting to push data which includes two reviews to the restaurant located in the borough of Brooklyn. This document happens to not currently have an array field of "reviews". According to everything I've looked at, the $push function should automatically add the array to the document, but it does not. Here is the code:
db.restaurants.updateOne(
    { borough: 'Brooklyn' },
    {
        $push: {
            reviews: {
                $each: [
                    {
                        name: 'Frank Zappa',
                        date: 'January 3, 2019',
                        rating: 3,
                        comments: 'This restaurant is not good',
                    },
                    {
                        name: 'Freddie Mercury',
                        date: 'January 3, 2019',
                        rating: 5,
                        comments: 'This restaurant is my favorite',
                    },
                ],
            },
        },
    }
);
The next command after this is to push the same reviews to another restaurant that does already have the array field "reviews", and also slice to 2. The only difference in code between these two is the filter and the slice modifier. The second works perfectly, the first does not.
Am I misunderstanding how $push should work? Using $push is a requirement in this case, though I did also try $addToSet with the same result.
https://mongoplayground.net/p/MxH01R4dxjU
Your query seems to be working fine. Please check that you provided the right values in the filter.
Turns out I didn't pay close enough attention to how many documents have the "borough" : "Brooklyn" field and there was more than one. Changed to updateMany and the problem was solved. Thanks for the time and attention!
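The difference that caused the confusion can be illustrated without a database. The sketch below is an in-memory illustration only: `pushReviewsToFirst` and `pushReviewsToAll` are hypothetical helpers, not MongoDB API calls, and the sample restaurants are made up. They mimic the behaviour described above, where updating one document touches only the first match and updating many touches every match.

```javascript
// In-memory illustration only; these helpers are hypothetical and are NOT the MongoDB API.

// Mimics updateOne + $push: only the first matching document is modified.
function pushReviewsToFirst(docs, borough, reviews) {
  const doc = docs.find((d) => d.borough === borough);
  if (!doc) return 0;
  // Like $push, create the array if the field does not exist yet.
  doc.reviews = (doc.reviews || []).concat(reviews);
  return 1;
}

// Mimics updateMany + $push: every matching document is modified.
function pushReviewsToAll(docs, borough, reviews) {
  const matches = docs.filter((d) => d.borough === borough);
  for (const doc of matches) {
    doc.reviews = (doc.reviews || []).concat(reviews);
  }
  return matches.length;
}

// Made-up sample data: two restaurants share the same borough.
const restaurants = [
  { name: "A", borough: "Brooklyn" },
  { name: "B", borough: "Brooklyn", reviews: [{ rating: 4 }] },
  { name: "C", borough: "Queens" },
];
const newReviews = [{ rating: 3 }, { rating: 5 }];

const modifiedFirst = pushReviewsToFirst(restaurants, "Brooklyn", newReviews);
const reviewsOnSecondAfterFirst = restaurants[1].reviews.length;
const modifiedAll = pushReviewsToAll(restaurants, "Brooklyn", newReviews);
console.log(modifiedFirst, reviewsOnSecondAfterFirst, modifiedAll);
```

If the document you are inspecting is not the first match, a single-document update looks as if it did nothing, which matches the symptom in the question.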

Efficient way to store frequently requested key-value data with relations?

Let's say I'm building Twitter.
One of the tasks is to track which tweets have been read by a particular user and to store this data on the server. When a user requests somebody's feed, the server should return:
[
    {
        id: 1,
        tweet: "Hey there!",
        isRead: false
    },
    {
        id: 2,
        tweet: "Here's my cat, look",
        isRead: true
    },
    {
        id: 3,
        tweet: "Blue or yellow? Thats the question",
        isRead: true
    },
    ...
]
What is the most efficient way to store which tweets have been read by which user, and to retrieve this data when returning somebody's feed for a particular user?
Any ideas about the data storage architecture are highly appreciated. My current stack is PostgreSQL for storing users and "tweets". Redis, MongoDB and neo4j are also used in the project, so they are available.
The first guess was to use Redis, like:
user_id: tweet_id
-----------------
user_id: tweet_id
-----------------
....
But I think there may be better options that are more suitable for persistent data storage.
Thank you in advance.
Have a look at this Twitter clone written by Redis' author, antirez (a.k.a. Salvatore Sanfilippo): http://redis.io/topics/twitter-clone
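As a rough sketch of the "one set of read tweet ids per user" idea from the question, here is an in-memory version in JavaScript. Everything in it is an assumption for illustration: `markRead` and `buildFeed` are hypothetical helpers, and the `Map` of `Set`s stands in for whatever store is chosen (in Redis, a set per user with SADD and SISMEMBER would play the same role).

```javascript
// In-memory stand-in for the read-tracking store: user id -> Set of read tweet ids.
const readByUser = new Map();

// Hypothetical helper: record that a user has read a tweet.
function markRead(userId, tweetId) {
  if (!readByUser.has(userId)) {
    readByUser.set(userId, new Set());
  }
  readByUser.get(userId).add(tweetId);
}

// Hypothetical helper: decorate a list of tweets with the isRead flag for one user.
function buildFeed(userId, feedTweets) {
  const read = readByUser.get(userId) || new Set();
  return feedTweets.map((t) => ({ id: t.id, tweet: t.tweet, isRead: read.has(t.id) }));
}

// Made-up sample data for illustration only.
const feedTweets = [
  { id: 1, tweet: "Hey there!" },
  { id: 2, tweet: "Here's my cat, look" },
];
markRead("user42", 2);
const feed = buildFeed("user42", feedTweets);
console.log(feed);
```

The tweets themselves stay in the primary store (PostgreSQL in the question); only the membership test per tweet comes from the read-tracking structure.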

MongoDB: Query and retrieve objects inside embedded array?

Let's say I have the following document schema in a collection called 'users':
{
name: 'John',
items: [ {}, {}, {}, ... ]
}
The 'items' array contains objects in the following format:
{
item_id: "1234",
name: "some item"
}
Each user can have multiple items embedded in the 'items' array.
Now, I want to be able to fetch an item by an item_id for a given user.
For example, I want to get the item with id "1234" that belong to the user with name "John".
Can I do this with mongoDB? I'd like to utilize its powerful array indexing, but I'm not sure if you can run queries on embedded arrays and return objects from the array instead of the document that contains it.
I know I can fetch users that have a certain item using db.users.find({"items.item_id": "1234"}). But I want to fetch the actual item from the array, not the user.
Alternatively, is there maybe a better way to organize this data so that I can easily get what I want? I'm still fairly new to mongodb.
Thanks for any help or advice you can provide.
The question is old, but the answer has changed since then. With MongoDB >= 2.2, you can do:
db.users.find( { name: "John"}, { items: { $elemMatch: { item_id: "1234" } } })
You will get:
{
    name: "John",
    items:
    [
        {
            item_id: "1234",
            name: "some item"
        }
    ]
}
See Documentation of $elemMatch
There are a couple of things to note about this:
1) I find that the hardest thing for folks learning MongoDB is UN-learning the relational thinking that they're used to. Your data model looks to be the right one.
2) Normally, what you do with MongoDB is return the entire document into the client program, and then search for the portion of the document that you want on the client side using your client programming language.
In your example, you'd fetch the entire 'user' document and then iterate through the 'items[]' array on the client side.
3) If you want to return just the 'items[]' array, you can do so by using the 'Field Selection' syntax. See http://www.mongodb.org/display/DOCS/Querying#Querying-FieldSelection for details. Unfortunately, it will return the entire 'items[]' array, and not just one element of the array.
4) There is an existing Jira ticket to add this functionality: SERVER-828 (https://jira.mongodb.org/browse/SERVER-828). It looks like it has been added to the latest 2.1 (development) branch, which means it will be available for production use when release 2.2 ships.
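The client-side approach from point 2 can be sketched as follows. This assumes the whole 'user' document has already been fetched; `findItem` is a hypothetical helper name and the sample document is made up to match the schema in the question.

```javascript
// Hypothetical helper: given a fetched user document, return the embedded
// item with the requested item_id, or null when there is no such item.
function findItem(userDoc, itemId) {
  const items = (userDoc && userDoc.items) || [];
  return items.find((item) => item.item_id === itemId) || null;
}

// Made-up document shaped like the schema in the question.
const john = {
  name: "John",
  items: [
    { item_id: "1234", name: "some item" },
    { item_id: "5678", name: "another item" },
  ],
};

const found = findItem(john, "1234");
const missing = findItem(john, "9999");
console.log(found, missing);
```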
If this is an embedded array, then you can't retrieve its elements directly. The retrieved document will have form of a user (root document), although not all fields may be filled (depending on your query).
If you want to retrieve just that element, then you have to store it as a separate document in a separate collection. It will have one additional field, user_id (can be part of _id). Then it's trivial to do what you want.
A sample document might look like this:
{
_id: {user_id: ObjectId, item_id: "1234"},
name: "some item"
}
Note that this structure ensures uniqueness of item_id per user (I'm not sure you want this or not).

How to query in the nested array.(using pymongo)

I'm a newbie to MongoDB.
I made a document with nested arrays like this:
data = {
    "title": "mongo community",
    "description": "I am a new bee",
    "topics": [
        {
            "title": "how to find object in array",
            "comments": [
                {"description": "desc1"}
            ]
        },
        {
            "title": "the case to use ensureIndex",
            "comments": [
                {"description": "before query"},
                {"description": "If you want"}
            ]
        }
    ]
}
After that, I put it in the "community" collection:
db.community.insert(data)
Now I would like to retrieve the "comments" of the topic whose title is "how to find object in array".
So I tried:
data = db.community.find_one({"title":"mongo community","topics.title":"how to find object in array" } )
The result is:
>>> print data
{
    u'topics': [{
        u'comments': [{
            u'description': u'desc1'
        }],
        u'title': u'how to find object in array'
    },
    {
        u'comments': [{
            u'description': u'before query'
        },
        {
            u'description': u'If you want'
        }],
        u'title': u'the case to use ensureIndex'
    }],
    u'_id': ObjectId('4e6ce188d4baa71250000002'),
    u'description': u'I am a new bee',
    u'title': u'mongo community'
}
I don't need the topic "the case to use ensureIndex".
Would you give me any advice?
Thanks.
It looks like you're embedding topics as an array all in a single document. You should try to avoid returning partial documents frequently from MongoDB. You can do it with the "fields" argument of the find method, but it isn't very easy to work with if you're doing it frequently.
To solve this, you could make each topic a separate document. I think that would be easier for you too. If you want to save information about the "community" for the forum, put it in a separate collection. For example, you could use the following in the MongoDB shell:
// add a forum:
var forum = {
    title: "mongo community",
    description: "I am a new bee"
};
db.forums.save(forum);
// add first topic:
var topic = {
    title: "how to find object in array",
    comments: [ {description: "desc1"} ],
    forum: "mongo community"
};
db.topics.save(topic);
// add second topic:
var topic = {
    title: "the case to use ensureIndex",
    comments: [
        {description: "before query"},
        {description: "If you want"}
    ],
    forum: "mongo community"
};
db.topics.save(topic);
print("All topics:");
printjson(db.topics.find().toArray());
print("just the 'how to find object in array' topic:");
printjson(db.topics.find({title: "how to find object in array"}).toArray());
Also, see the document Trees In MongoDB about schema design in MongoDB. It happens to be using a similar schema to what you are working with and expands on it for more advanced use cases.
MongoDB operates on documents, that is, the top level documents (the things you save, update, insert, find, and find_one on). Mongo's query language lets you search within embedded objects, but will always return, update, or manipulate one (or more) of these top-level documents.
MongoDB is often called "schema-less," but something more like "(has) flexible schemas" or "(has) per-document schemas" would be a more accurate description. This is a case where your schema design -- having topics embedded directly within a community -- is not working for this particular query. However there are probably other queries that this schema supports more efficiently, like listing the topics within a community in a single query. You might want to consider the queries you want to make and re-design your schema accordingly.
A few notes on MongoDB limitations:
top-level documents are always returned (optionally with only a subset of fields, as @scott noted; see the mongodb docs on this topic)
each document is limited to 16 megabytes of data (as of version 1.8+), so this schema will not work well if the communities have a long list of topics
For help with schema design, see the mongodb docs on schema design, Kyle Banker's video "Schema Design Basics", and Eliot Horowitz's video "Schema Design at Scale" for an introduction, tips, and considerations.

mongodb - retrieve array subset

What seemed like a simple task turned out to be a challenge for me.
I have the following mongodb structure:
{
    (...)
    "services": {
        "TCP80": {
            "data": [{
                "status": 1,
                "delay": 3.87,
                "ts": 1308056460
            },{
                "status": 1,
                "delay": 2.83,
                "ts": 1308058080
            },{
                "status": 1,
                "delay": 5.77,
                "ts": 1308060720
            }]
        }
    }
}
Now, the following query returns whole document:
{ 'services.TCP80.data.ts':{$gt:1308067020} }
Is it possible to receive only those "data" array entries that match the $gt criteria (a kind of shrunken document)?
I was considering MapReduce, but could not locate even a single example of how to pass external arguments (the timestamp) to the Map() function. (This feature was added in 1.1.4: https://jira.mongodb.org/browse/SERVER-401)
There is also always the alternative of writing a stored JS function, but since we are talking about large quantities of data, DB locks can't be tolerated here.
Most likely I'll have to redesign the structure to something 1-level deep, like:
{
status:1,delay:3.87,ts:1308056460,service:TCP80
},{
status:1,delay:2.83,ts:1308058080,service:TCP80
},{
status:1,delay:5.77,ts:1308060720,service:TCP80
}
but the DB will grow dramatically, since "service" is only one of many options that would be appended to each document.
Please advise!
Thanks in advance.
In version 2.1 with the aggregation framework you are now able to do this:
1: db.test.aggregate(
2: {$match : {}},
3: {$unwind: "$services.TCP80.data"},
4: {$match: {"services.TCP80.data.ts": {$gte: 1308060720}}}
5: );
You can put your own criteria in line 2 to filter the parent documents. If you don't need to filter them, just leave line 2 out.
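To make the effect of the $unwind and $match stages easier to see, here is a plain JavaScript sketch of the same transformation applied to documents already held in memory. It is an illustration under assumptions, not the aggregation framework itself: `unwindAndMatch` is a hypothetical helper and the sample document (including its `_id`) is made up from the data in the question.

```javascript
// Hypothetical helper mirroring the pipeline above:
// $unwind yields one output document per array element, with the array field
// replaced by that single element; $match then keeps only elements with ts >= minTs.
function unwindAndMatch(docs, minTs) {
  return docs.flatMap((doc) =>
    doc.services.TCP80.data
      .filter((entry) => entry.ts >= minTs)
      .map((entry) => ({ _id: doc._id, services: { TCP80: { data: entry } } }))
  );
}

// Made-up sample document shaped like the one in the question.
const monitorDocs = [
  {
    _id: "host1",
    services: {
      TCP80: {
        data: [
          { status: 1, delay: 3.87, ts: 1308056460 },
          { status: 1, delay: 2.83, ts: 1308058080 },
          { status: 1, delay: 5.77, ts: 1308060720 },
        ],
      },
    },
  },
];

const recent = unwindAndMatch(monitorDocs, 1308060720);
console.log(JSON.stringify(recent, null, 2));
```

Note that each result carries a single `data` entry rather than an array, which is also what the $unwind stage produces.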
This is not currently supported. By default you will always receive the whole document/array unless you use field restrictions or the $slice operator. Currently these tools do not allow filtering the array elements based on the search criteria.
You should watch this request for a way to do this: https://jira.mongodb.org/browse/SERVER-828
I'm attempting to do something similar. I tried your suggestion of using the GROUP function, but I couldn't keep the embedded documents separate or was doing something incorrectly.
I needed to pull/get a subset of embedded documents by ID. Here's how I did it using Map/Reduce:
db.parent.mapReduce(
    // map: emit only the parent whose _id matches the requested parent_id
    function(parent_id, child_ids){
        if(this._id == parent_id)
            emit(this._id, {children: this.children, ids: child_ids});
    },
    // reduce: keep only the children whose _id is in the requested list
    function(key, values){
        var toReturn = [];
        values[0].children.forEach(function(child){
            if(values[0].ids.indexOf(child._id.toString()) != -1)
                toReturn.push(child);
        });
        return {children: toReturn};
    },
    {
        mapparams: [
            "4d93b112c68c993eae000001", //example parent id
            ["4d97963ec68c99528d000007", "4debbfd5c68c991bba000014"] //example embedded children ids
        ]
    }
).find()
I've abstracted my collection name to 'parent' and its embedded documents to 'children'. I pass in two parameters: the parent document ID and an array of the embedded document IDs that I want to retrieve from the parent. Those parameters are passed in as the third parameter to the mapReduce function.
In the map function I find the parent document in the collection (which I'm pretty sure uses the _id index) and emit its id and children to the reduce function.
In the reduce function, I take the passed in document and loop through each of the children, collecting the ones with the desired ID. Looping through all the children is not ideal, but I don't know of another way to find by ID on an embedded document.
I also assume in the reduce function that only one document is emitted, since I'm searching by ID. If you expect more than one parent_id to match, then you will have to loop through the values array in the reduce function.
I hope this helps someone out there, as I googled everywhere with no results. Hopefully we'll see a built in feature soon from MongoDB, but until then I have to use this.
Fadi, as for "keeping embedded documents separate" - group should handle this with no issues
function getServiceData(collection, criteria) {
    var res = db[collection].group({
        cond: criteria,
        initial: {vals: [], globalVar: 0},
        reduce: function(doc, out) {
            // keep every second matching document
            if (out.globalVar % 2 == 0)
                out.vals.push(doc.whatever.kind.and.depth);
            out.globalVar++;
        },
        finalize: function(out) {
            if (out.vals.length == 0)
                out.vals = 'sorry, no data';
            return out.vals;
        }
    });
    return res[0];
};
