Limiting the output from MoreLikeThis in Solr - solr

I'm trying to use MoreLikeThis to get all similar documents, excluding documents with a specific contenttype.
So the first query needs to find the one document that I want "More Like This" results for, and the second query needs to restrict the similar documents to those that are not PDFs (-contenttype:pdf).
Does anyone know if this is possible?
Thanks

When using the MoreLikeThisHandler, all the common parameters are applied to the mlt result set. So you can use the fq parameter to exclude your pdf documents from the mlt results:
http://localhost:8983/solr/mlt?q=test&mlt.fl=text&fq=-contenttype:pdf
The q parameter selects the document used to generate the mlt results (more precisely, the first document matching the initial query is used).
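As a minimal sketch of the same request built from code, the snippet below assembles the URL with Python's standard library so that the filter is URL-encoded properly. The helper name build_mlt_url is made up for illustration; the handler path, field names and query value are taken from the example above.

```python
from urllib.parse import urlencode


def build_mlt_url(base_url, query, mlt_fields, excluded_type):
    """Build a MoreLikeThisHandler URL that filters one contenttype out of the results.

    Hypothetical helper: only the parameter names (q, mlt.fl, fq) come from Solr.
    """
    params = {
        "q": query,                             # selects the source document
        "mlt.fl": mlt_fields,                   # fields used to compute similarity
        "fq": f"-contenttype:{excluded_type}",  # filter applied to the mlt results
    }
    return f"{base_url}?{urlencode(params)}"


url = build_mlt_url("http://localhost:8983/solr/mlt", "test", "text", "pdf")
print(url)
```

The colon in the fq value is percent-encoded by urlencode, which Solr decodes back to -contenttype:pdf.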

Related

Why does Dismax's bq (Boost Query) parameter filter results instead of just boosting them?

I'm using Solr 8.11.2. In order to boost documents with certain field values, I'm using Dismax's bq (Boost Query) parameter.
From what I've read, this should only influence the score of the search results returned by the rest of the query. What I see happening is that it filters out all search results that don't have the field I'm boosting.
I'm using the following query, which returns all documents containing both words procedure and maintenance:
q=((+procedure+maintenance))&rows=10&start=0&wt=xml&q.op=AND&fl=id,score,alias,author,hash,collection,label,url,lastModified,path,extension,objectId,objectDtType,title,DocumentPK_s,Taal_s,Site_s,SharePointId_s&hl=true&hl.qt=highlightRH&hl.fl=content,description,label&hl.snippets=5&defType=dismax&bf=recip(max(0,ms(NOW-3MONTH,creationDate)),3.16e-11,1,1)&pf=content&sort=score DESC
But as soon as I append &bq=language:english^10000, which is supposed to boost documents where the field language is set to english, all documents where the field language doesn't exist are no longer part of the results.
Am I misunderstanding how this parameter is supposed to work? Is it a side effect?

How to boost a solr document at query time based on attribute value

I want to boost, at query time, all documents that have user_id=2. Basically, I want all the documents belonging to a specific user at the top of my results.
After looking at some Solr resources I ended up writing the query below, but it is not working properly.
/solr/public-main/select?q={!boost b=if(div(155623,user_id),2,1)}sometext&wt=json&indent=true&debugQuery=true
Any hints?
Thanks
You don't need a function-based boost for this. Apply a boost query, which boosts all the documents that match it: bq=user_id:2^4. Adjust 4 to a suitable boost value depending on the rest of your boosts (if any in q or qf).
Another option is to use a function query with fl=x,y,userexists:exists(query({!v='user_id:2'})); you can then sort by userexists first and by score second.
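A rough sketch of the bq suggestion as request parameters follows. Note that bq is a parameter of the DisMax and eDisMax parsers, so the sketch sets defType explicitly; choosing edismax here, the helper name boosted_select_params and the default boost of 4 are assumptions for illustration.

```python
from urllib.parse import urlencode


def boosted_select_params(user_query, user_id, boost=4):
    """Return select parameters that boost documents owned by one user.

    Hypothetical helper: bq only takes effect with the dismax/edismax parsers.
    """
    return {
        "q": user_query,
        "defType": "edismax",                # assumption: eDisMax is acceptable here
        "bq": f"user_id:{user_id}^{boost}",  # additive boost for matching documents
        "wt": "json",
    }


params = boosted_select_params("sometext", 2)
print("/solr/public-main/select?" + urlencode(params))
```

Documents that do not match the bq clause still appear in the results; they just score lower, so the boost value may need tuning against the other scoring factors.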

Is it possible to boost mlt queries in solr?

Specifically if I'm doing a query using the solr mlt handler (http://wiki.apache.org/solr/MoreLikeThisHandler) and stream.body to supply the source doc is there any way to boost result documents based on document age?
I already know how to do that for a regular query using dismax (http://wiki.apache.org/solr/FunctionQuery#Date_Boosting) but I can't quite figure out the magic incantation to do it for the mlt handler.
It looks like the mlt handler is written to handle one of two cases:
q=[typical query goodness which can include date boosting]
stream.body=[url]
If q is present, stream.body is ignored and vice-versa, so unfortunately I don't think you'll be able to do what you want in a single call without patching the MoreLikeThisHandler.
BUT: If you need this in a hurry, you can do it with two queries:
1. Run your same MLT query solely for the purpose of retrieving the interesting terms and boosts (e.g. with mlt.interestingTerms=details&mlt.boost=true&rows=0).
2. Using the interesting terms and boosts from (1), run a standard Solr query (non-MLT) with the date-boosting function you desire.
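The second step of that workaround could look roughly like the sketch below. It assumes the interesting terms arrive as a flat list alternating term and boost (the exact response shape depends on your Solr version and response writer settings, so check it first), and it uses a hypothetical date field named creationDate. The helper names are made up, and terms containing query syntax characters would still need escaping.

```python
def terms_to_query(interesting_terms):
    """Turn a flat [term, boost, term, boost, ...] list into a boosted query string.

    Assumption: terms already carry their field prefix, e.g. "text:solr".
    """
    pairs = zip(interesting_terms[0::2], interesting_terms[1::2])
    return " ".join(f"{term}^{boost}" for term, boost in pairs)


def with_date_boost(query, date_field):
    """Wrap a query in the boost parser so newer documents score higher.

    Uses the recip(ms(NOW,field),3.16e-11,1,1) recipe for date boosting;
    date_field is whatever date field your schema defines.
    """
    return f"{{!boost b=recip(ms(NOW,{date_field}),3.16e-11,1,1)}}{query}"


# Example data standing in for the output of step 1 (not real results).
terms = ["text:solr", 1.0, "text:search", 0.62]
q = with_date_boost(terms_to_query(terms), "creationDate")
print(q)
```

The resulting string goes into the q parameter of an ordinary select request, so the usual fq, rows and sort parameters apply to it as well.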

How to boost fields in solr

I already have the boost determined beforehand. I have a field in the Solr index called boost1. This boost field will have a value from 1 to 10, similar to Google's PageRank. This is the boost that should be applied to every query run in Solr. Here are the fields in my index:
Id
Title
Text
Boost1
The boost field should be applied to every query. I am trying to implement functionality similar to Google's PageRank. Is there a way to do this using Solr?
You can add the boost at query time, e.g.
q={!boost b=boost1}
How_can_I_boost_the_score_of_newer_documents
However, this may need to be added explicitly by you.
If you are using dismax or edismax with the request handler, the bf (Boost Functions) parameter can be used to boost the documents.
http://wiki.apache.org/solr/DisMaxQParserPlugin#bf_.28Boost_Functions.29
bf=boost1^0.5
This can be added to defaults with the request handler definition, so that they are applied to all the search queries.
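To make the difference between the two suggestions concrete, here is a small sketch of both parameter shapes. The {!boost} parser multiplies the score by the function value, while bf adds the function value to the score. The helper names, the edismax choice and the sample query "video" are assumptions for illustration; boost1 is the field from the question.

```python
from urllib.parse import urlencode


def multiplicative_boost_params(user_query, boost_field="boost1"):
    """Score is multiplied by the field value via the boost query parser."""
    return {"q": f"{{!boost b={boost_field}}}{user_query}"}


def additive_boost_params(user_query, boost_field="boost1", weight=0.5):
    """Field value (scaled by weight) is added to the score via bf."""
    return {
        "q": user_query,
        "defType": "edismax",  # assumption: bf needs dismax or edismax
        "bf": f"{boost_field}^{weight}",
    }


print(urlencode(multiplicative_boost_params("video")))
print(urlencode(additive_boost_params("video")))
```

With the additive form, the weight usually has to be tuned so that the boost neither swamps nor disappears next to the relevance score.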
You can use function queries to vary the amount of boost: FunctionQuery
I think you need to use index time document boosts. See this if you are indexing XML or this if using DataImportHandler.

Solr Index appears to be valid - but returns no results

Solr newbie here.
I have created a Solr index and written a whole bunch of docs into it. I can see from the Solr admin page that the docs exist and the schema is fine as well.
But when I perform a search using a test keyword I do not get any results back.
On entering *:* into the query (in the Solr admin page) I get all the results.
However, when I enter any other query (e.g. a term or phrase) I get no results.
I have verified that the field being queried is Indexed and contains the values I am searching for.
So I am confused what I am doing wrong.
Probably you don't have a <defaultSearchField> correctly set up. See this question.
Another possibility: your field is of type string instead of text. String fields, in contrast to text fields, are not analyzed, but stored and indexed verbatim.
I had the same issue with a new setup of Solr 8. The accepted answer is not valid anymore, because the <defaultSearchField> configuration has been deprecated.
As I found no answer to why Solr does not return results from any fields despite being indexed, I consulted the query documentation. What I found is the DisMax query parser:
The DisMax query parser is designed to process simple phrases (without complex syntax) entered by users and to search for individual terms across several fields using different weighting (boosts) based on the significance of each field. Additional options enable users to influence the score based on rules specific to each use case (independent of user input).
In contrast, the documentation for the default Lucene parser only talks about searching a single field. So I gave DisMax a try and it worked very well!
Query example:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video
You can also specify which fields to search exactly to prevent unwanted side effects. Multiple fields are separated by spaces which translate to + in URLs:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features+text
Last but not least, give the fields a weight:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features^20.0+text^0.3
If you are using pysolr like I do, you can add those parameters to your search request like this:
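import pysolr

# Core URL taken from the techproducts examples above
solr = pysolr.Solr('http://localhost:8983/solr/techproducts')
results = solr.search('search term', **{
    'defType': 'dismax',
    'qf': 'features text',
})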
In my case the problem was the format of the query. It seems that my setup, by default, was looking for an exact match on the entire value of the field. So, in order to get results when searching for sit, I had to query *sit*, i.e. use wildcards to get the expected result.
With Solr 4, I had to solve this as per Mauricio's answer by setting type="text_en" on the field.
With Solr 6, use text_general.
