Why is my MongoDB aggregation query so slow?

I have several IDs (usually 2 or 3) of users whom I need to fetch from the database. Thing is, I also need to know the distance from a certain point. Problem is, my collection has 1,000,000 documents (users) in it, and it takes upwards of 30 seconds to fetch the users.
Why is this happening? When I just use the $in operator on _id it works fine and returns everything in under 200ms, and when I just use the $geoNear stage it also works fine, but when I use the two together everything slows down insanely. What do I do? Again, all I need is a few users with the IDs from the userIds array and their distance from a certain point (user.location).
EDIT: Also wanted to mention that when I use $nin instead of $in the query performs perfectly. Only $in causes the problem when combined with $geoNear.
const user = await User.findById('logged in users id');
const userIds = ['id1', 'id2', 'id3'];

const results = await User.aggregate([
  {
    $geoNear: {
      near: user.location,
      distanceField: 'distance',
      query: {
        _id: { $in: userIds }
      }
    }
  }
]);

I found a workaround: I just query by the _id field, and afterwards I use a library to compute the distance of the returned docs from the central point.
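A minimal sketch of that workaround, assuming user.location is a GeoJSON point; haversineMeters is a hypothetical helper standing in for whatever distance library you pick:

// Fetch by _id only (fast), then compute distances client-side.
// haversineMeters is a hypothetical stand-in for a distance library;
// it expects GeoJSON-style [lng, lat] coordinate pairs.
function haversineMeters([lng1, lat1], [lng2, lat2]) {
  const R = 6371000; // mean Earth radius in meters
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

const docs = await User.find({ _id: { $in: userIds } });
const withDistance = docs.map((doc) => ({
  ...doc.toObject(),
  distance: haversineMeters(user.location.coordinates, doc.location.coordinates),
}));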

Indexing your data could be a solution to your problem. Without an index, MongoDB has to scan through all documents.
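For example, $geoNear requires a geospatial index on the queried location field, while _id is indexed by default. A sketch in the mongo shell, assuming the collection is users and the GeoJSON field is location:

// Create a 2dsphere index so $geoNear can use it.
db.users.createIndex({ location: "2dsphere" })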

Related

Firebase Firestore compound orderBy query not working

In my React app I'm reading my Firestore collections and using a conditional query for filtering. This query contains multiple orderBy clauses (price, date, type), each of which can be ascending or descending. I used it like so:
import { collection, query, orderBy, getDocs } from "firebase/firestore";

const constraints = [];
if (price)
  constraints.push(orderBy("price", price == "1" ? "desc" : "asc"));
if (date)
  constraints.push(orderBy("postedDate", date == "1" ? "desc" : "asc"));
const posts = collection(db, "allPosts");
let q = query(posts, ...constraints);
const qSnapshot = await getDocs(q);
When running this and filtering by only one of them, it works. But when I use them together, only the first constraint is applied, in this case price, no matter if I change the value before or after.
What is the solution for this? Also, does this happen with where queries as well?
Every query you execute against Firestore needs a matching index. For single-field queries, the indexes are generated automatically. But for queries involving multiple fields (including ordering results), you will often need to define the composite index explicitly yourself.
If the index that a query needs does not exist, the server sends back an error and the SDK raises that error. So if you catch errors in your code and log them, you'll find the error message in your logging output. In that error message you'll find a direct link to the Firestore console to generate the exact index that is needed.
So:
1. Catch and log the error (sketched below).
2. Find the message in the logging output.
3. Click the link in the error message.
4. Tell Firestore to generate the index with a single click.
5. Be patient while your existing data is indexed.
6. Try the query again. :)
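For step 1, a minimal sketch of catching and logging the error, reusing the q from the question (the exact wording of the message varies by SDK version):

try {
  const qSnapshot = await getDocs(q);
  // ... render qSnapshot.docs here
} catch (err) {
  // A missing composite index surfaces as a failed-precondition error whose
  // message contains a direct console link to create the exact index needed.
  console.error(err.message);
}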

Mongoose - Can't explain population

I'm doing a certain query, and I want to get the execution time of it (including the population):
const managerId = "023492745";
const company = await Companies.find({
  _id: "1234"
})
  .populate({
    path: "employees",
    match: {
      _id: { $ne: managerId },
    },
  })
  .explain();
I tried to use explain() on the query, but it only retrieves information about the find() part and not about the populate() part. How can I get the execution time of the whole query?
explain is a command executed by the MongoDB server, while populate is a function executed on the client side by Mongoose.
The populate function works by receiving the results of the find from the server, then submitting additional queries to retrieve the corresponding data to place in each document.
The response to the explain command does not contain the found documents, only the statistics and metadata about the query, so there is nothing for populate to operate on.
Instead of explain, you might try increasing the log verbosity or enabling profiling on the mongod server to capture the subsequent queries.
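As a sketch of those two routes (Mongoose's debug flag on the client, the profiler on the server):

// Client side: make Mongoose log every query it sends, which exposes the
// additional queries that populate() issues for the referenced documents.
const mongoose = require('mongoose');
mongoose.set('debug', true);

// Server side, in the mongo shell: record every operation with timings,
// then inspect them in the system.profile collection.
// db.setProfilingLevel(2)
// db.system.profile.find().sort({ ts: -1 }).limit(10)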

Querying Firestore without Primary Key

I'd like my users to be able to update the slug on the URL, like so:
url.co/username/projectname
I could use the primary key, but unfortunately Firestore does not allow any modification of an assigned uid once set, so I created a unique slug field.
Example of structure:
projects: {
  P10syfRWpT32fsceMKEm6X332Yt2: {
    slug: "majestic-slug",
    ...
  },
  K41syfeMKEmpT72fcseMlEm6X337: {
    slug: "beautiful-slug",
    ...
  },
}
A way to modify the slug would be to delete and copy the data on a new document, doing this becomes complicated as I have subcollections attached to the document.
I'm aware I can query by document key like so:
var doc = db.collection("projects");
var query = doc.where("slug", "==", "beautiful-slug").limit(1).get();
Here comes the questions.
Wouldn't this be highly impractical? If I have more than 1,000 docs in my database, wouldn't each visit to a project (url.co/username/projectname) cost 1,000+ reads, since it has to query through all the documents? If yes, what would be the correct way?
As stated in this answer on StackOverflow: https://stackoverflow.com/a/49725001/7846567, only the document returned by a query is counted as a read operation.
Now for your special case:
doc.where("slug", "==", "beautiful-slug").limit(1).get();
Thanks to Firestore's automatic single-field indexes, the server does not have to read every document to find the match. And by using limit(1) you will receive at most one document, so only a single read operation is counted against your limits.
Using the where() function is the correct and recommended approach to your problem.
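For completeness, a small sketch of resolving a slug to its document, following the question's db.collection(...) style (classic Web SDK; error handling omitted):

// Resolve a slug to its project document; returns null if no match.
async function getProjectBySlug(slug) {
  const snapshot = await db
    .collection("projects")
    .where("slug", "==", slug)
    .limit(1)
    .get();
  if (snapshot.empty) return null;
  return snapshot.docs[0]; // QueryDocumentSnapshot; .id is the document's uid
}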

Parse Server, MongoDB - get "liked" state of an object

I am using Parse Server, which runs on MongoDB.
Let's say I have collections User and Comment and a join table of user and comment.
User can like a comment, which creates a new record in a join table.
Specifically in Parse Server, a join table can be defined using a 'relation' field in the collection.
Now when I want to retrieve all comments, I also need to know whether each of them is liked by the current user. How can I do this without doing additional queries?
You might say I could create an array field likers in the Comment table and use $elemMatch, but that doesn't seem like a good idea, because there can potentially be thousands of likes on a comment.
My idea, but I hope there could be a better solution:
I could create an array field someLikers, a relation (join table) field allLikers, and a number field likesCount in the Comment table. Then I would put the first 100 likers in both someLikers and allLikers, and any additional likers only in allLikers. I would always increment likesCount.
Then, when querying a list of comments, I would use $elemMatch to tell me whether the current user is inside someLikers. Once I had the comments, I would check whether any of them have likesCount > 100 AND $elemMatch returned null. If so, I would have to run another query on the join table for those comments, checking (querying by) whether they are liked by the current user.
Is there a better option?
Thanks!
I'd advise against directly accessing MongoDB unless you absolutely have to; after all, the way collections and relations are built is an implementation detail of Parse and in theory could change in the future, breaking your code.
Even though you want to avoid multiple queries, I suggest doing just that (depending on your platform you might even be able to run two Parse queries in parallel):
The first one is the query on Comment for getting all comments you want to display; assuming you have some kind of Post for which comments can be written, the query would find all comments referencing the current post.
The second query again is on Comment, but this time:
constrained to the comments retrieved in the first query, e.g.: containedIn("objectId", arrayOfCommentIDs)
and constrained to the comments having the current user in their likers relation, e.g.: equalTo("likers", currentUser)
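A minimal sketch of those two queries with the Parse JS SDK; post and the Comment/likers names follow the question, and the queries run sequentially here because the second one reuses the IDs from the first:

// Query 1: all comments for the current post.
const commentsQuery = new Parse.Query("Comment");
commentsQuery.equalTo("post", post);
const comments = await commentsQuery.find();

// Query 2: of those comments, the ones whose likers relation contains the user.
const likedQuery = new Parse.Query("Comment");
likedQuery.containedIn("objectId", comments.map((c) => c.id));
likedQuery.equalTo("likers", Parse.User.current());
const liked = await likedQuery.find();

const likedIds = new Set(liked.map((c) => c.id));
// A comment is liked by the current user iff likedIds.has(comment.id).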
Well a join collection is not really a noSQL way of thinking ;-)
I don't know ParseServer, so below is just based on pure MongoDB.
What I would do is, in the Comment document, use an array of ObjectIds, one for each user who likes the comment.
Sample document layout
{
  "_id" : ObjectId(""),
  "name" : "Comment X",
  "liked" : [
    ObjectId(""),
    ....
  ]
}
Then use an aggregation to get the data. I assume you have the _id of the comment and you know the _id of the user.
The following aggregation returns the comment with a like count and a boolean indicating whether the user liked the comment.
db.Comment.aggregate([
  {
    $match: {
      _id: ObjectId("your commentId")
    }
  },
  {
    $project: {
      _id: 1,
      name: 1,
      number_of_likes: { $size: "$liked" },
      user_liked: {
        $gt: [
          {
            $size: {
              $filter: {
                input: "$liked",
                as: "like",
                cond: { $eq: ["$$like", ObjectId("your userId")] }
              }
            }
          },
          0
        ]
      }
    }
  }
]);
This returns:
{
  "_id" : ObjectId(""),
  "name" : "Comment X",
  "number_of_likes" : NumberInt(7),
  "user_liked" : true
}
Hope this is what you're after.

What is the fastest ArangoDB friends-of-friends query (with count)

I'm trying to use ArangoDB to get a list of friends-of-friends. Not just a basic friends-of-friends list: I also want to know how many friends the user and the friend-of-a-friend have in common, and to sort the result.
After several attempts at (re)writing the best performing AQL query, this is what I ended up with:
LET friends = (
  FOR f IN GRAPH_NEIGHBORS('graph', #user, {"direction": "any", "includeData": true, "edgeExamples": { name: "FRIENDS_WITH" }})
    RETURN f._id
)
LET foafs = (
  FOR friend IN friends
    FOR foaf IN GRAPH_NEIGHBORS('graph', friend, {"direction": "any", "includeData": true, "edgeExamples": { name: "FRIENDS_WITH" }})
      FILTER foaf._id != #user AND foaf._id NOT IN friends
      COLLECT foaf_result = foaf WITH COUNT INTO common_friend_count
      RETURN {
        user: foaf_result,
        common_friend_count: common_friend_count
      }
)
FOR foaf IN foafs
  SORT foaf.common_friend_count DESC
  RETURN foaf
Unfortunately, performance is not as good as I would've liked. Compared to the Neo4j versions of the same query (and data), AQL seems quite a bit slower (5-10x).
What I'd like to know is... How can I improve our query to make it perform better?
I am one of the core developers of ArangoDB and tried to optimize your query. As I do not have your dataset I can only talk about my test dataset and would be happy to hear if you can validate my results.
First of all, I am running ArangoDB 2.7, but in this particular case I do not expect a major performance difference from 2.6.
In my dataset I could execute your query as it is in ~7 sec.
First fix:
In your friends statement you use includeData: true but only return the _id. With includeData: false, GRAPH_NEIGHBORS directly returns the _id, and we can also get rid of the subquery here:
LET friends = GRAPH_NEIGHBORS('graph', #user, {
  "direction": "any",
  "edgeExamples": { name: "FRIENDS_WITH" }
})
This got it down to ~1.1 sec on my machine, so I expect this will be close to the performance of Neo4j.
Why does this have a high impact?
Internally we first find the _id values without actually loading the documents' JSON. In your query you do not need any of that data, so we can safely skip opening it.
But now for the real improvement
Your query goes the "logical" way: it first gets the user's neighbors, then finds their neighbors, counts how often each foaf is found, and sorts the result.
This has to build up the complete foaf network in memory and sort it as a whole.
You can also do it in a different way:
1. Find all friends of the user (only _ids)
2. Find all foafs (complete documents)
3. For each foaf, find all foaf_friends (only _ids)
4. Find the intersection of friends and foaf_friends and COUNT them
This query would look like this:
LET fids = GRAPH_NEIGHBORS("graph", #user, {
  "direction": "any",
  "edgeExamples": { "name": "FRIENDS_WITH" }
})
FOR foaf IN GRAPH_NEIGHBORS("graph", #user, {
  "minDepth": 2,
  "maxDepth": 2,
  "direction": "any",
  "includeData": true,
  "edgeExamples": { "name": "FRIENDS_WITH" }
})
  LET commonIds = GRAPH_NEIGHBORS("graph", foaf._id, {
    "direction": "any",
    "edgeExamples": { "name": "FRIENDS_WITH" }
  })
  LET common_friend_count = LENGTH(INTERSECTION(fids, commonIds))
  SORT common_friend_count DESC
  RETURN { user: foaf, common_friend_count: common_friend_count }
In my test graph this executed in ~0.024 sec.
So this gave me a factor-of-250 speedup, and I would expect it to be faster than your current query in Neo4j. As I do not have your dataset I cannot verify this; it would be good if you could test it and tell me.
One last thing
With edgeExamples: { name: "FRIENDS_WITH" } it is the same story as with includeData: we have to load the real edge and look into it. This could be avoided if you store your edges in separate collections based on their name, and then remove the edgeExamples option as well. This will further increase performance (especially if there are a lot of edges).
Future
Stay tuned for our next release, we are right now adding some more functionality to AQL which will make your case much easier to query and should give another performance boost.
