Mongoid: search in embedded documents in an optimized way

I have two documents, User and Book:
class User
  include Mongoid::Document
  embeds_many :books
end
class Book
  include Mongoid::Document
  embedded_in :user
end
I want to search for books that match some conditions. Is there an efficient way to do that, rather than looping over all users and, for each user, looping over its books to retrieve the ones that match?

When you say:
embeds_many :books
that actually produces an array of hashes inside MongoDB and then wraps some Mongoid stuff around that array and its elements. So you search embedded documents just like you'd search any other array of hashes.
For example, if your Book has a title field then you could say:
users = User.where('books.title' => /Pancakes/)
to find all the users with books about pancakes. Of course that gives you a bunch of Users rather than Books. Embedded documents don't exist on their own, they're just parts of their parent document, so you have to go through the parent. But, once you have some Users, you can extract the books you're interested in:
books = users.map(&:books).flatten.select { |b| b.title =~ /Pancakes/ }
You can also throw in an only if your Users are large and you don't want to pull whole Users out of MongoDB:
# Some versions of Mongoid will get upset if you don't include :id
books = users.only(:books, :id).map...
If you're doing this sort of thing a lot then maybe Book shouldn't be embedded inside User.
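The extraction step can be sketched in plain Ruby. This is only an illustration under assumptions: the Structs below are stand-ins for the Mongoid models (no database involved), and matching_books is a hypothetical helper name. With real Mongoid, users would come from the User.where('books.title' => ...) query shown earlier.

```ruby
# Plain-Ruby stand-ins for the Mongoid models (assumption: no database here).
Book = Struct.new(:title)
User = Struct.new(:name, :books)

users = [
  User.new("alice", [Book.new("Pancakes for Breakfast"), Book.new("Ruby Basics")]),
  User.new("bob",   [Book.new("Waffles"), Book.new("More Pancakes")]),
  User.new("carol", [])
]

# Hypothetical helper: gather every user's books into one flat list,
# then keep only the ones whose title matches the pattern.
def matching_books(users, pattern)
  users.flat_map(&:books).select { |book| book.title =~ pattern }
end

books = matching_books(users, /Pancakes/)
puts books.map(&:title).inspect
```

flat_map does the map and the one-level flatten in a single call, so it behaves the same as map(&:books).flatten here.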

Related

Is there any way to sort on a nested value in Azure Cognitive Search?

My use case is that I have a database of songs that are associated with dances that one can dance to that song. Users can vote on the danceability of a dance to a song, so there is a numeric vote tally for each song/dance combination. A core part of the functionality for the search is to be able to do an arbitrary search and sort the results by the popularity of a particular dance.
I am currently modeling this by creating a new top level field with a decorated name (e.g. DNC_Salsa or DNC_Waltz) for each dance. This works. But aside from being clumsy, I can't associate other information with a dance. In addition, I have to dynamically add the dance fields, so I have to use the generic SearchDocument type in the C# library rather than using a POCO type.
I'd much prefer to model this with the dance fields as an array of subdocuments where the subdocuments contain a dance name, a vote count and the other information I'd like to associate with a dance.
A simplified example record would look something like this:
{
  "title": "Baby, It's Cold Outside",
  "artist": "Seth MacFarlane",
  "tempo": 119.1,
  "dances": [
    { "name": "cha cha", "votes": 1 },
    { "name": "foxtrot", "votes": 4 }
  ]
}
I gave this a try and received:
{"error":{"code":"OperationNotAllowed","message":"The request is invalid.","details":[{"code":"CannotEnableFieldForSorting","message":"The field 'Votes' cannot be enabled for sorting because it is directly or indirectly contained in a collection, which makes it a multi-valued field. Sorting is not allowed on multi-valued fields. Parameters: definition"}]}}
It looks like Elasticsearch will do what I want:
Sort search results | Elasticsearch Guide [7.17] | Elastic
If I'm reading the Elasticsearch documentation correctly, you can basically say "I'd like to sort on the dances subdocument by first filtering for name == "cha cha" and then sorting on the vote field."
Is there anything like this in Azure Cognitive Search? Or even something more restrictive? I don't need to do arbitrary sorting on anything in the subdocument. I would be happy to only ever sort on the vote count (although I'd have to be able to do that for any dance name).
It's not clear to me what your records or data model look like. However, from the error message you provided, it's clear that you are trying to sort on a multi-valued property. That is logically impossible.
Imagine a property Color that can contain colors like 'Red' or 'Blue'. If you sort by Color, you would get your red values before the blues. If you instead had 'Colors' that can contain multiple values like both 'Red' and 'Blue', how would you sort it? You can't.
So, if you actually want to sort by a property, that property has to contain a single value.
That said, I have a feeling you are really asking about ranking/boosting, not sorting. Have a look at the examples with boosting and scoring profiles for different genres of music. I believe those examples could help you solve your use case.
https://learn.microsoft.com/en-us/azure/search/index-add-scoring-profiles#extended-example
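To make the single-value constraint concrete, here is a rough sketch of the flattening the question describes: each entry of the dances array becomes its own top-level, single-valued field that could then be marked sortable. The DNC_ prefix comes from the question; the exact casing rule in dance_field and the helper names are assumptions for illustration, and nothing here calls the Azure SDK.

```ruby
# Hypothetical naming rule: "cha cha" -> "DNC_ChaCha".
def dance_field(name)
  "DNC_" + name.split.map(&:capitalize).join
end

# Copy the song without its "dances" array and add one
# single-valued vote field per dance.
def flatten_dances(song)
  flat = song.reject { |key, _| key == "dances" }
  song.fetch("dances", []).each do |dance|
    flat[dance_field(dance["name"])] = dance["votes"]
  end
  flat
end

song = {
  "title"  => "Baby, It's Cold Outside",
  "artist" => "Seth MacFarlane",
  "tempo"  => 119.1,
  "dances" => [
    { "name" => "cha cha", "votes" => 1 },
    { "name" => "foxtrot", "votes" => 4 }
  ]
}

flat = flatten_dances(song)
puts flat.inspect
```

The trade-off is the one already noted in the question: the per-dance fields are sortable, but any other per-dance information has nowhere to live.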

Can I match important Keywords in a string?

Consider a user inputs this search string to a news search engine:
"Oops, Donald Trump Jr. Did It Again (Wikileaks Edition) :: Politics - Paste"
Imagine we have a database of News Titles, and a database of "Important People".
The goal here is: if a search string contains an important person, then return results containing this "substring" with a higher ranking than those results that do NOT contain it.
Using the Yahoo Vespa engine, how can I match a database full of people's names against long news title strings?
*I hope that made sense, sorry everyone, my english not so good :( Thank you !
During document processing/indexing of news titles you could extract named entities from the input text using the "important people" database. This process could be implemented in a custom document processor (see http://docs.vespa.ai/documentation/document-processing-overview.html).
A document definition for the news search could look something like this with a custom ranking function. The document processor reads the input title and populates the entities array.
search news {
  document news {
    field title type string {
      indexing: summary | index
    }
    field entities type array<string> {
      indexing: summary | index
      match: word
    }
  }
  rank-profile entity-ranking {
    first-phase {
      expression: nativeRank(title) + matches(entities)
    }
  }
}
At query time you'll need to do the same named entity extraction on the query input and build a Vespa query tree which can search the title (e.g. using OR or WeakAnd) and also search the entities field for the possible named entities using the Vespa rank operator. E.g. given your query example, the actual query could look something like:
select * from sources * where rank(title contains "oops" or title
contains "donald" or title contains "trump", entities contains "Donald Trump Jr.");
You can build the query tree in a custom searcher (http://docs.vespa.ai/documentation/searcher-development.html) using a shared named entity extraction component.
Some resources
Shared components & writing custom searchers/document processors (to implement the named entity extraction) http://docs.vespa.ai/documentation/jdisc/container-components.html
Ranking http://docs.vespa.ai/documentation/ranking.html
Query language http://docs.vespa.ai/documentation/query-language.html
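As a rough sketch of the query-time step (not Vespa's own API: a real implementation would live in a Java searcher and build a query tree rather than a string), the snippet below does a naive entity lookup against a list of known names and assembles a YQL string shaped like the one above. extract_entities and build_yql are hypothetical helper names, the substring matching is a deliberately simple stand-in for real named entity extraction, and no quote escaping is attempted.

```ruby
# Naive entity extraction: keep every known name that occurs
# (case-insensitively) as a substring of the input text.
def extract_entities(text, known_people)
  lowered = text.downcase
  known_people.select { |name| lowered.include?(name.downcase) }
end

# Build the YQL string. The OR-ed title terms form the first rank()
# argument, which decides what matches; the entity terms follow and
# only contribute to ranking.
# Assumption: terms contain no double quotes (nothing is escaped).
def build_yql(title_terms, entities)
  title_part   = title_terms.map { |t| %(title contains "#{t}") }.join(" or ")
  entity_parts = entities.map { |e| %(entities contains "#{e}") }
  args = [title_part, *entity_parts].join(", ")
  "select * from sources * where rank(#{args});"
end

query_text   = "Oops, Donald Trump Jr. Did It Again (Wikileaks Edition) :: Politics - Paste"
known_people = ["Donald Trump Jr.", "Angela Merkel"]

entities = extract_entities(query_text, known_people)
yql = build_yql(%w[oops donald trump], entities)
puts yql
```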

Solr: Master documents with x children - how to index

There are "dossiers" that are being indexed in Solr.
Each dossier has x persons connected to it.
It should be possible to search for persons and to search for dossiers. When searching for a person, the dossier should also be returned.
I was wondering, what would be a good way to index this?
Do I need to split the index in a "DossierIndex" and a "PersonIndex"? Or just throw them together even though they don't really have common fields. (Dossier has status, etc; Persons have names, birthdays etc)
You should take a look at the BlockJoin capabilities in Solr; with their help you could index "dossiers" with nested persons.
I recommend this article about it: http://blog.griddynamics.com/2013/09/solr-block-join-support.html
More info: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers

Implementing keyword search on Google App Engine?

I'm trying to implement a keyword/tags search for a certain entity type in GAE's datastore:
from google.appengine.ext import db

class Docs(db.Model):
    title = db.StringProperty()
    user = db.StringProperty()
    tags = db.StringListProperty()
I also wrote a very basic search function (using a fake list of tags, not the datastore values), that takes a query string and matches it to each set of tags. It ranks Docs based on how many query words match the tags. This is pretty much all I need it to do, except using the actual datastore.
I have no idea how to get the actual datastore values though. For my search function to work I need a list of all the entities in the datastore, which is impossible (?).
I also tried looking into GAE's experimental full-text search, and Relation Index Entities as a way to search the datastore without using the function I wrote. Neither was successful.
Any other ideas on how to search for entities based on tags?
It's a very simple query. If you need to find all Docs with the tag "findme", it's simply:
num_results = 10
# An equality filter on a list property matches if any element equals the value
query = Docs.all().filter("tags =", "findme")
results = query.fetch(num_results)  # get list of results
It's well documented:
https://developers.google.com/appengine/docs/python/datastore/queries
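Once candidate entities have been fetched, the ranking the question describes (order by how many query words match the tags) is a small in-memory step. The sketch below is written in Ruby purely for illustration, with hashes standing in for datastore entities; rank_by_tag_matches is a hypothetical name, not part of any GAE API.

```ruby
# Score each doc by the number of distinct query words found in its tags,
# drop docs with no match, and return the rest best-first.
def rank_by_tag_matches(docs, query)
  words = query.downcase.split.uniq
  scored = docs.map { |doc| [doc, (doc[:tags].map(&:downcase) & words).size] }
  scored.select { |_, score| score > 0 }
        .sort_by { |_, score| -score }
        .map(&:first)
end

docs = [
  { title: "b", tags: ["python"] },
  { title: "c", tags: ["cooking"] },
  { title: "a", tags: ["Python", "gae", "search"] }
]

ranked = rank_by_tag_matches(docs, "python search")
puts ranked.map { |d| d[:title] }.inspect
```

Docs with equal scores come back in no particular order, since sort_by is not guaranteed to be stable.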

limiting fields returned from a mongoid polymorphic relation

I have a polymorphic relation in mongoid like the following one:
class Company
  include Mongoid::Document
  field :name, :type => String
  has_many :posts, as: :postable
end
class Person
  include Mongoid::Document
  field :name, :type => String
  has_many :posts, as: :postable
end
class Post
  include Mongoid::Document
  belongs_to :postable, polymorphic: true
end
I would like to sometimes (and not most of the time) load only some fields of the postable. In a non polymorphic relation (say only a Person has posts) I can do:
Person.only(:name).find(some_post.postable_id)
But is this possible in a polymorphic relation?
I can see a few ways of doing this, but I am not sure which is best. Complex relations are not a strength of MongoDB and other NoSQL stores, and queries can get expensive if you start scanning multiple documents for your answers.
Using where:
Person.where(name: "name").posts.where(postable_id: "id")
Finding the parent from the post:
Post.where(id: "id").person.only(:name)
(note that postable_id and id are not the same value!)
However, be careful. If all you are doing is listing posts, you are probably better off including the person's name in the post and writing a callback to update it if the person changes their name. Alternatively, if you are linking to comments from a person, why not include the comment IDs (or enough data to make the link) in the Person model so you can get all comments with one document call? Similarly, if you are getting a count, use a callback to maintain a post count in the Person model so you aren't mapping hundreds of Posts every time.
I hope this helps. If not, give me some more information and I'll update this.
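One more option, offered as a sketch rather than a tested Mongoid recipe: a polymorphic belongs_to stores the parent's class name in postable_type next to postable_id, so the class can be resolved from that string and the usual only(...).find(...) chain called on it. The code below shows just that dispatch in plain Ruby. FakeModel, its Scope and postable_with_fields are hypothetical stand-ins so the example runs without a database; in a Rails app the lookup would more likely be post.postable_type.constantize.

```ruby
# Stand-in for a Mongoid model class: only(*fields) returns a scope whose
# find(id) yields the stored record restricted to :id plus those fields.
class FakeModel
  Scope = Struct.new(:model, :fields) do
    def find(id)
      record = model.records.fetch(id)
      record.select { |key, _| key == :id || fields.include?(key) }
    end
  end

  # Each subclass keeps its own in-memory record store.
  def self.records
    @records ||= {}
  end

  def self.only(*fields)
    Scope.new(self, fields)
  end
end

class Person < FakeModel; end
class Company < FakeModel; end

# Stand-in for a Post document with its two polymorphic key fields.
Post = Struct.new(:postable_type, :postable_id)

# Resolve the parent class from the stored type name, then load only
# the requested fields of the parent.
def postable_with_fields(post, *fields)
  klass = Object.const_get(post.postable_type)
  klass.only(*fields).find(post.postable_id)
end

Person.records[1]  = { id: 1, name: "Ada", bio: "long text" }
Company.records[1] = { id: 1, name: "Acme", address: "1 Main St" }

result = postable_with_fields(Post.new("Company", 1), :name)
puts result.inspect
```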
