Solr score boost based on number of likes

I have added fs_votingapi_result to my Solr documents; it represents the number of likes.
I found the function below to boost the score based on fs_votingapi_result.
But I can't work out the logic behind it - what are the extra parameters $vote_steepness, $total, $total, $vote_boost?
bf=recip(rord(fs_votingapi_result),$vote_steepness,$total,$total)^$vote_boost
I am new to Solr and have not been able to find any document or article that explains this.

This is in the Function Query documentation.
recip
A reciprocal function with recip(x,m,a,b) implementing a/(m*x+b). m,a,b are constants, x is any numeric field or arbitrarily complex function.
rord
The reversed ordinal of the indexed value. (In your case, the function rord(fs_votingapi_result) would yield 1 for the record with the most votes, 2 for the second most votes, etc...)
So
recip(rord(fs_votingapi_result),$vote_steepness,$total,$total)
= $total / ($vote_steepness * rev-ordinal-of-vote-result + $total)
Then the result is boosted by $vote_boost to create the boost function (from bf param).
= ($total / ($vote_steepness * rev-ordinal-of-vote-result + $total)) * $vote_boost
Which is added to the document score from the rest of the query. (Then before scores are returned, they are normalized across all matching docs)
The $<var> values are either defined in solrconfig.xml or more commonly passed as separate http query parameters.
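The arithmetic above can be sketched in a few lines of Python (parameter names mirror the $<var> placeholders; the sample values are illustrative, not Solr defaults):

```python
def recip_rord_boost(rank, vote_steepness, total, vote_boost):
    """recip(rord(field), m, a, b) ^ boost = (a / (m * rank + b)) * boost,
    with a = b = total here. `rank` is the reversed ordinal: 1 for the
    document with the most votes, 2 for the next, and so on."""
    return (total / (vote_steepness * rank + total)) * vote_boost

# The boost decays smoothly as rank grows: the top-voted document gets
# the largest additive boost, lower-ranked documents progressively less.
top = recip_rord_boost(1, 1.0, 100.0, 2.0)
tenth = recip_rord_boost(10, 1.0, 100.0, 2.0)
print(top, tenth)
```

Raising $vote_steepness makes the boost fall off faster with rank; $total controls where the curve flattens out.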
Hope that gives you a starting point.


SQL Report Builder: Use aggregate function on ReportItem

I've entered the following expression for a given cell, which is essentially a dollar value divided by a quantity to get a cents per gallon value, labeled as Textbox41:
=ReportItems!Total_Gross_Profit2.Value / ReportItems!Gallon_Qty3.Value
What I was trying to do is use this expression for an AVG aggregation in another cell =avg(ReportItems!Textbox41.Value), but I'm getting an error:
The Value expression for the textrun
'Textbox79.Paragraphs[0].TextRuns[0]' uses an aggregate function on a
report item. Aggregate functions can be used only on report items
contained in page headers and footers.
Is there some limitation that does not allow aggregations on ReportItems? I've also tried the following, which also did not work:
=AVG(ReportItems!Total_Gross_Profit2.Value / ReportItems!Gallon_Qty3.Value)
Where am I going wrong here?
Regarding your question:
Is there some limitation that does not allow aggregations on ReportItems?
You have your answer in the error message you provided.
As for the resolution, it's hard to give precise guidance with the information you provided, but in general, start thinking in terms of dataset fields instead of report items. If you're operating from inside a matrix or table, and the expressions behind 'Total_Gross_Profit2' and 'Gallon_Qty3' look something like this:
= ReportItems!ProfitsFromX.Value + ReportItems!ProfitsFromY.Value
= ReportItems!GallonQtyA.Value + ReportItems!GallonQtyB.Value
Point to the fields directly instead:
= Fields!ProfitsFromX.Value + Fields!ProfitsFromY.Value
= Fields!GallonQtyA.Value + Fields!GallonQtyB.Value
That way, when it comes to aggregation, it's clearer what to do:
= avg(
(Fields!ProfitsFromX.Value + Fields!ProfitsFromY.Value)
/ (Fields!GallonQtyA.Value + Fields!GallonQtyB.Value)
)
And if you find that cumbersome, you can create calculated fields on the dataset object, and reference those instead where appropriate.
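As a concrete sketch (the field names here are illustrative): define calculated fields TotalProfit and TotalGallons on the dataset with expressions like

```
=Fields!ProfitsFromX.Value + Fields!ProfitsFromY.Value
=Fields!GallonQtyA.Value + Fields!GallonQtyB.Value
```

and the aggregate cell then reduces to

```
=Avg(Fields!TotalProfit.Value / Fields!TotalGallons.Value)
```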

Inconsistent values for getNumberFound() in Search API

I have a full-text search index with 42 documents like in the screenshot below:
When I query the index for "" it correctly returns all 42 documents (good), but when I use the limit and offset options in the query, the value returned for the total number of matches found (results.getNumberFound()) varies from call to call - the same query with different offset values gives different values for results.getNumberFound()!
NOTE: This happens only on the production server after I deploy the app. On the local server everything
works perfectly (i.e. for the same query, the total number of hits found is the same regardless of the offset value).
Query query = Query.newBuilder()
.setOptions(QueryOptions.newBuilder()
.setLimit(limit)
.setOffset(offset).build())
.build(searchPhrase);
Results<ScoredDocument> results = INDEX.search(query);
LOG.warning( "Phrase:'" + searchPhrase +
"' limit:" + limit +
" offset:" + offset +
" num:" + results.getNumberFound());
Here's a screenshot of the log output:
So is there something wrong I'm doing or it's a bug in the Search API because the weird thing is that the issue only happens in the production server not the local one.
The python docs say
number_found
Returns an approximate number of documents matching the query. QueryOptions defining post-processing of the search results. If the QueryOptions.number_found_accuracy parameter were set to 100, then number_found <= 100 is accurate.
Similar API components exist in Java. From your code it appears you haven't set an accuracy. See the Java QueryOptions docs: https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/search/QueryOptions
Having said that, I have seen many questions/discussions about the lack of accuracy in the number of found results.
Surprisingly, this is working as intended (as Tim says).
https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/search/QueryOptions.Builder#setNumberFoundAccuracy(int)
In its default state, the datastore scans the minimal set of data needed to fulfill the request. It provides a very rough estimate of the match count by multiplying the ID range by an estimate of the matching-key density (#keys that matched / #ids scanned during the query).
For small data sets, set the accuracy value higher (500 or 1000) and call it a day. You can also improve the estimate by making sure key IDs are uniformly distributed and by fetching a higher limit each call (though if you don't need the data, just use the accuracy parameter).
This might not be applicable here but this is a general workaround for larger data sets:
Use num_accuracy == 1000. When queries return an estimate of <1000, you can trust that. When a query returns an estimate of >1000, perform your own estimate using a second query:
Include an extra numeric field with your data, which is a value of a discrete probabilistic event (e.g. #0s in a hash of some randomish data). When you get a large estimate from the first query, repeat your query with the additional constraint (e.g. AND ZERO_COUNT == y), where y is chosen based on the first query's estimate to match <1000 entities, producing an exact count for the second query which you can accurately extrapolate. Since you don't need the results of this data, you can set limit to 1 & num_accuracy == 1000.
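The extrapolation step can be sketched in Python. This is a hypothetical illustration, not App Engine API code: it uses trailing zero bits of a hash as one concrete choice of discrete probabilistic event (P(zero_count >= y) == 2 ** -y), so a constrained query with ZERO_COUNT >= y is scaled back up by 2 ** y:

```python
import hashlib

def zero_count(doc_id):
    """Trailing zero bits of a hash of the document id - computed once at
    indexing time and stored as a numeric field on each document."""
    h = int(hashlib.sha256(doc_id.encode()).hexdigest(), 16)
    bits = 0
    while h % 2 == 0:
        h //= 2
        bits += 1
    return bits

def extrapolate(exact_subset_count, y):
    """Estimate the full match count from an exact count of the subset
    that additionally satisfies ZERO_COUNT >= y."""
    return exact_subset_count * (2 ** y)

# e.g. if the constrained query matched exactly 125 documents with y = 3,
# the full result set is estimated at 125 * 2**3 = 1000 documents.
print(extrapolate(125, 3))
```

Pick y from the first query's rough estimate so that the constrained second query lands under the accuracy threshold (<1000) and is therefore exact.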

How to get tf-idf score and bm25f score of a term in a document using whoosh?

I am using whoosh to index a dataset. I want to retrieve the tf-idf score and BM25F score for a given term and document. I have seen scoring.TFIDF() and scoring.TFIDFScorer(). In order to call the TFIDFScorer().score() method I need to pass a matcher object - which matcher object should I pass?
Similarly, what parameters should I pass to BM25FScorer()._score(self, weight, length)? What are the weight and length parameters, and what values are passed by default?
Finally able to figure it out. Here it is for anyone who comes here later.
For finding TFIDF and BM25F score of a term and document.
qp = QueryParser('content', ix.schema)
q = qp.parse(unicode('id:1'))
with ix.searcher(weighting=scoring.TF_IDF()) as searcher_tfidf:
    scoring.TFIDF().scorer(searcher_tfidf, 'body', 'algebra').score(q.matcher(searcher_tfidf))
with ix.searcher(weighting=scoring.BM25F()) as searcher_bm25f:
    scoring.BM25F().scorer(searcher_bm25f, 'body', 'algebra').score(q.matcher(searcher_bm25f))
ix is the Index object obtained using the open_dir() or create_in() method. The key is to get a Matcher object that matches exactly the required document, so use an id or any unique field in the schema to select that particular document via the qp.parse() method.

Using SOLR how do I return the best result for a set of integer preferences

I'm trying to create a query that returns the best product depending on a few required attributes and a few optional ones that just affect the weighting.
Properties 1-3: required
Properties 4-5: optional
Ratings 1-3: optional
The data is structured in the solr db like so:
property1 (string)
property2 (string)
property3 (string)
property4 (string)
property5 (string)
rating1 (int)
rating2 (int)
rating3 (int)
The query I've created so far gets me close, but it does not take into account how close the optional fields are to the specific requested value.
For example, the ratings are valued 1-5 for arbitrary properties such as efficiency or usefulness. I need the query to acknowledge that if the user wants rating1 set to 4, then values 3 and 5 are still valid, just equally less so. Likewise, a value of 2 is weighted more than 1. So it basically creates a scale based on how far the product is from the desired rating value.
defType = dismax
sort = score desc
fl = entity_id,score,property4,property5,rating1,rating2,rating3
fq = property1:215 property2:45 property3:17
bq = property4:(H)^5 OR property5:(87)^5 OR rating1:(1)^5 OR rating2:(3)^5 OR rating3:(5)^5
Since you have the rules for doing the math on the rating, I would go with a function query. You could do any math that you think works best in this case and the result could affect the boost score.
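A sketch of that idea, assuming rating1 is an integer field, the user asked for rating1=4, and 5 is the maximum rating: a linear distance-based boost built from Solr's sub() and abs() functions,

```
bf=sub(5,abs(sub(rating1,4)))
```

which yields 5 when rating1 is exactly 4, 4 when it is 3 or 5, and so on down the scale. Something like recip(abs(sub(rating1,4)),1,5,1) is an alternative if you want the penalty to fall off non-linearly with distance.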

Solr: how to get total number of results for grouped query using the Java API

I have the following:
43 documents indexed in Solr
If I use the Java API to do a query without any grouping, such as:
SolrQuery query = new SolrQuery("*:*");
query.setRows(10);
I can then obtain the total number of matching elements like this:
solrServer.query(query).getResults().getNumFound(); //43 in this case
The getResults() method returns a SolrDocumentList instance which contains this value.
If however I use grouping, something like:
query.set("group", "true");
query.set("group.format", "simple");
query.set("group.field", "someField");
Then the above code for retrieving query results no longer works (it throws an NPE), and I have to instead use:
List<GroupCommand> groupCommands = solrServer.query(query).getGroupResponse().getValues();
List<Group> groups = groupCommands.get(0).getValues();
Group group = groups.get(groupIndex);
I don't understand how to use this part of the API to get the overall number of matching documents (the 43 from the non-grouping query above). First I thought that with grouping is no longer possible to get that, but I've noticed that if I do a similar query in the Solr admin console, with the same grouping and everything, it returns the exact same results as the Java API and also numFound=43. So obviously the code used for the console has some way to retrieve that value even when grouping is used:
My question is, how can I get that overall number of matching documents for a query using grouping executed via Solr Java API?
In looking at the source for Group that is returned from your groups.get(groupIndex) call, it has a getResults() method that returns a SolrDocumentList. The SolrDocumentList has a getNumFound() method that should return the overall number, I believe...
So you should be able to get this as the following:
int numFound = group.getResults().getNumFound();
Hope this helps.
Update: I believe as OP stated, group.getResults().getNumFound() will only return the number of items in the group. However, on the GroupCommand there is a getMatches() method that may be the corresponding count that is desired.
int matches = groupCommands.get(0).getMatches();
If you set the ngroups parameter to true (default false) this will return the number of groups.
eg:
solrQuery.set("group.ngroups", true);
https://cwiki.apache.org/confluence/display/solr/Result+Grouping
this can then be retrieved from your responding GroupCommand with:
int numGroups = tempGroup.getNGroups();
At least that was my understanding?
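Outside of SolrJ, the same two counts are visible in Solr's raw JSON response for a grouped query, which makes it easy to see what getMatches() and getNGroups() correspond to. A minimal sketch (the sample response below is hand-written for illustration; ngroups only appears when group.ngroups=true):

```python
def grouped_counts(response, field):
    """Return (matches, ngroups) for a grouped query on `field`."""
    grouped = response["grouped"][field]
    # "matches" is the overall number of documents matched by the query;
    # "ngroups" is only present when group.ngroups=true was set.
    return grouped["matches"], grouped.get("ngroups")

sample = {
    "grouped": {
        "someField": {
            "matches": 43,   # the overall numFound equivalent
            "ngroups": 5,
            "groups": [],
        }
    }
}

matches, ngroups = grouped_counts(sample, "someField")
print(matches, ngroups)
```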
