Aggregations on embedded documents not working - mongoid

I am trying to do some simple aggregations on embedded documents using Mongoid. They do not work, although the same aggregation works when the value is in the top-level document.
This works:
Accounts.avg("sale_price")
>> 500.0
But if the field to average is in an embedded document, it returns 0:
Accounts.avg("sales.sale_price")
>> 0.0
Am I using the wrong syntax to reference the embedded values, or do these aggregation methods not work on embedded documents?
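For reference, the result being asked for can be sketched in plain Ruby, independent of how Mongoid resolves "sales.sale_price". The sample data, the constant names and the embedded_avg helper below are hypothetical. The pipeline is only built as data and never sent to a database; it uses MongoDB's $unwind and $group/$avg stages, and running it through Mongoid (for example via the model's collection) is an assumption not confirmed by the question.

```ruby
# Hypothetical sample data shaped like account documents with embedded sales.
ACCOUNTS = [
  { "name" => "a", "sales" => [{ "sale_price" => 400.0 }, { "sale_price" => 600.0 }] },
  { "name" => "b", "sales" => [{ "sale_price" => 500.0 }] },
  { "name" => "c", "sales" => [] }
].freeze

# Collect the field from every embedded document, then average in memory.
def embedded_avg(docs, relation, field)
  values = docs.flat_map { |doc| doc.fetch(relation, []).map { |sub| sub[field] } }.compact
  return 0.0 if values.empty?
  values.sum.to_f / values.size
end

# The same computation described as a MongoDB aggregation pipeline.
# This is data only; nothing here talks to a database.
PIPELINE = [
  { "$unwind" => "$sales" },
  { "$group" => { "_id" => nil, "avg_price" => { "$avg" => "$sales.sale_price" } } }
].freeze

puts embedded_avg(ACCOUNTS, "sales", "sale_price")
```

The in-memory version is only practical for small collections; the pipeline form keeps the work on the server.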

Related

Solr 8.8.2 reduce recall and improve precision for multi token queries - mm, qs, shingles

I'm facing an issue: I have a huge amount of data in Solr and, as a result, searching for a multi-token query generates a big recall set. For example, if I search for "apple watch series 4 42mm", I get back 4 million results. My parser is edismax, the minimum match setting is 2 as of now, and I am using the WhiteSpace Tokenizer with a bunch of filters. The goal here is to reduce this recall set in order to display more relevant results.
Things that I have explored:
MinimumMatch - I am trying setting mm to 2>2 4>3 to see what results it gives. I also tried finding out whether I could apply mm on individual fields and found out that it used to be possible with local params in Solr but has been discontinued since Solr 7.2. I do not want to get into writing a custom parser or tweaking Solr's code since that could lead to other problems. Nor do I want to change the default parser to Lucene. Is there any other way that I could apply mm separately to category_name, product_name, product_description, brand_name, etc.?
Query slop - I am not using qs as of now, but I tried a few examples converting my query into a phrase query and applying qs. It does reduce recall, but I have a problem there. Suppose I have a product which has "apple" in brand_name and "watch series 4 42mm" in product_name. That is a relevant result, but it will not be returned because the phrase query requires all tokens to be in the same field. Is there a way to apply qs to suit my purpose?
ShingleFilterFactory - I'm trying this filter with outputUnigrams set to true because I still want the individual terms to be indexed. But with that, the index size would explode and the result set won't be that good either. Can I use other levers like mm or something else along with this to make it work? Also, is there a way to make outputUnigrams a query param?
I explored pf2, pf3 and ps as well, but those are used for boosting. Right now, my aim is to filter down to the most relevant results.
Can someone please help me with the above? Thanks

Displaying information about Solr search results

I am struggling with a little problem: I have to display relevant information about the result set returned from Solr, but I can't figure out how to calculate it without iterating over the results (bad).
Basically, I am storing my documents with a state field, and while the search is supposed to return all documents, the UI has to show "Found 15 entities, 5 are in state A, 3 in state B and 8 in C".
At the moment I am using a rather brittle approach of running the query 3 times with additional scoping by type, but I'd rather get that information from the one query I am displaying. (There have been some edge cases where the numbers don't add up, and since Solr can return facets I guess there has to be a way to use that functionality in this case.)
I am using Solr 3.5 from Rails with the sunspot gem.
As you mention yourself, you can use facets for this by setting
facet=true&facet.field=state
I'm not familiar with the sunspot gem, but judging by the documentation you can use facets like this (assuming Entity is your searchable):
search = Entity.search do
  facet :state
end
This should return the states of all entities returned by your query with the number of entities in this state. The Sunspot documentation tells me you can read these facets in the following way:
search.facet(:state).rows.each do |facet|
  puts "State #{facet.value} has #{facet.count} entities"
end
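For comparison, here is a rough sketch of the same idea without Sunspot, using only the Ruby standard library. It builds the facet parameters shown above and reads the counts from a JSON response. The helper names and the sample response are hypothetical; the response assumes Solr's default JSON layout, where each facet field is a flat array alternating value and count.

```ruby
require "uri"
require "json"

# Build the query string for a faceted search on one field.
def facet_query(q, field)
  URI.encode_www_form("q" => q, "facet" => "true", "facet.field" => field, "wt" => "json")
end

# Turn Solr's flat [value, count, value, count, ...] array into a hash.
def facet_counts(response_json, field)
  flat = JSON.parse(response_json).dig("facet_counts", "facet_fields", field) || []
  flat.each_slice(2).to_h
end

# Hypothetical response body, trimmed to the part that matters here.
SAMPLE_RESPONSE = '{"facet_counts":{"facet_fields":{"state":["A",5,"B",3,"C",8]}}}'

puts facet_query("*:*", "state")
facet_counts(SAMPLE_RESPONSE, "state").each do |state, count|
  puts "State #{state} has #{count} entities"
end
```

Because the counts come from the same request as the results, they describe the same result set as the one being displayed.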
Essentially, there are three main sets of functions you can use to gather stats from Solr.
The first is faceting:
http://wiki.apache.org/solr/SimpleFacetParameters
There is also grouping (field collapsing):
https://wiki.apache.org/solr/FieldCollapsing
And the stats package:
https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
Note, however, that stats, facets and grouping may eventually be replaced by the analytics package known as OLAP, which is targeted for Solr 5.0.0:
https://issues.apache.org/jira/browse/SOLR-5302
Good luck.

Different scores from Solr 1 vs Solr 4 Dismax Handler

I've migrated my Solr 1.4 index to Solr 4.0 using this method, and I've kept my solrconfig.xml and schema.xml as unchanged as possible while still being functional.
I'm using the DisjunctionMaxQuery (dismax / solr.DisMaxRequestHandler) request handler and comparing my search results between Solr 1.4 and Solr 4. Using ?debugQuery=on in the URL, I can see that the parsedQuery portion is virtually the same between Solr versions, yet the generated scores are different. (The explain portion is different, but the calculation is long and opaque.)
Example query: q=foo
Example response:
Solr 1.4:
title: "foo (32-bit)"
score: 3.8850176
Solr 4.0:
title: "foo (32-bit)"
score: 2.1525226
Despite having the same request handler and identical indices, what would be causing this significant difference in scores?
If the explain portion is different, then the two versions are using different calculations to compute the scores, so the scores are going to be different. Scores are pretty arbitrary anyway and are basically only useful for comparison within the result set of a single query. In other words, it doesn't make sense to compare scores from one query to the scores of another query. The same probably applies to different versions of Solr, especially if the way the calculations are done is different.

Solr grouping issue

Being new to Solr (3.6.1 is used on the project that I am working on), I am trying to understand how logical grouping can limit the data returned.
I am working with the test data and schema supplied as part of the Solr download. When I run a query like id:1 and id:2, it returns 2 documents, as expected from the data.
But in the next case
(id:1 and popularity:0) and (id:2 and popularity:7)
I would assume that I would only get 1 document back, as there is no document that has a popularity of 0, and yet all 5 documents are returned (I only loaded 5).
In the last case, where I have int1 and (id:2 and popularity:7), I get three documents. Based on the tests I have done (through the admin web page), and and or seem to return the same number of results. What am I missing?
After additional research, it turns out that the parser (at least the one used for the admin window) is case sensitive about operators. A lowercase and is not recognized as an operator, so the clauses end up combined with the default operator, which is normally defined as OR. ANDed clauses must therefore use uppercase AND, not and, for the correct results to be returned.
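A minimal sketch of guarding against this on the client side, assuming the query string is assembled in Ruby before being sent to Solr. The uppercase_operators helper is hypothetical and deliberately naive: it also rewrites the words and, or and not when they appear inside quoted phrases or as field values.

```ruby
# Uppercase bare boolean operators so the query parser treats them as operators.
# Naive: does not skip quoted phrases or field values.
def uppercase_operators(query)
  query.gsub(/\b(and|or|not)\b/) { |op| op.upcase }
end

puts uppercase_operators("(id:1 and popularity:0) and (id:2 and popularity:7)")
# prints "(id:1 AND popularity:0) AND (id:2 AND popularity:7)"
```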

How to do a constant score query in Solr

I'm using SolrNet to access a Solr index where I have a multivalue field called "tags". I want to perform the following pseudo-code query:
(tags:stack)^10 OR (tags:over)^5 OR (tags:flow)^2
where the term "stack" is being boosted by 10, "over" is being boosted by 5 and "flow" is being boosted by 2. The result I'm after is that results with "stack" will appear higher than those with "flow", etc.
The problem I'm having is that if, say, "flow" only appears in a couple of documents but "stack" appears in loads, then due to its high idf value, documents with "flow" appear above those with "stack".
When this project was implemented directly in Lucene, I used ConstantScoreQuery, which eliminated the idf and based the score solely on the boost value.
How can this be achieved with Solr and SolrNet, where I'm effectively just passing Solr a query string? If it can't, is there an alternative way I can approach this problem?
Thanks in advance!
Solr 5.1 and later has this built into the query parser syntax via the ^= operator.
So just take your original query:
(tags:stack)^10 OR (tags:over)^5 OR (tags:flow)^2
And replace the ^ with ^= to change from boosted to constant:
(tags:stack)^=10 OR (tags:over)^=5 OR (tags:flow)^=2
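If the query string is generated rather than typed by hand, the substitution above can be sketched as a small builder. The constant_score_query helper is hypothetical and written in Ruby for illustration only; the same string could be assembled in C# before handing it to SolrNet. It does no escaping of special characters in the terms.

```ruby
# Build "(field:term)^=score" clauses joined with OR, using the ^= syntax above.
# Terms are inserted as-is; escaping is left out of this sketch.
def constant_score_query(field, scores)
  scores.map { |term, score| "(#{field}:#{term})^=#{score}" }.join(" OR ")
end

puts constant_score_query("tags", { "stack" => 10, "over" => 5, "flow" => 2 })
# prints "(tags:stack)^=10 OR (tags:over)^=5 OR (tags:flow)^=2"
```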
I don't think there is any way to directly express a ConstantScoreQuery in Solr, but it seems that range and prefix queries use ConstantScoreQuery under the hood, so you could try faking a range query, e.g. tags:[flow TO flow]
Alternatively, you could implement your own Solr QueryParser.
