I use the following query string to get a document indexed in Solr:
http://localhost:8080/solr/newsarchive/select/?q=ID:bbc-55950440dc8e5f1a550bd736214a1e7e&sort=Date%20desc&version=2.2&start=0&rows=10&indent=on&wt=json
Which returns the specified document of ID bbc-55950440dc8e5f1a550bd736214a1e7e.
My question is: Is there any way to make this query return a number of related document IDs?
There is a way to do this in Solr. It's called More Like This: https://wiki.apache.org/solr/MoreLikeThis
You pass Solr a query and the More Like This handler will return similar documents for each document the query you passed in would return. It determines similarity by looking at the terms in fields that you select and running a Lucene query using those terms.
The fields you select need, at a minimum, to be stored; preferably they should also be set up to store term vectors:
<field name="cat" ... termVectors="true" />
An example query (taken from the documentation):
http://localhost:8983/solr/select?q=apache&mlt=true&mlt.fl=manu,cat
In this case you are querying the index for the word "apache" and requesting a More Like This result set (mlt=true). You are asking Solr to base the similarity on the fields manu and cat. Solr will then look at the terms in those fields and run a search on those fields using those terms to locate similar documents.
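To apply this to the query from the question, here is a minimal Python sketch that only builds the request URL. The parameters mlt, mlt.fl and mlt.count come from the MoreLikeThis documentation; the field names title and body are hypothetical, so substitute fields that actually exist in your schema.

```python
from urllib.parse import urlencode

def mlt_url(base, doc_query, fields, count=5):
    """Build a /select URL that also requests a More Like This section."""
    params = {
        "q": doc_query,              # query selecting the source document(s)
        "mlt": "true",               # enable the MoreLikeThis component
        "mlt.fl": ",".join(fields),  # fields whose terms drive the similarity
        "mlt.count": count,          # similar documents requested per hit
        "wt": "json",
    }
    return base + "?" + urlencode(params)

url = mlt_url(
    "http://localhost:8080/solr/newsarchive/select",
    "ID:bbc-55950440dc8e5f1a550bd736214a1e7e",
    ["title", "body"],  # hypothetical field names, not from the question
)
print(url)
```

The similar documents should then appear in a separate section of the response, next to the normal result list.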
A few more articles/examples:
http://blog.brattland.no/node/18
https://cwiki.apache.org/confluence/display/solr/MoreLikeThis
Related
I'm using Solr 8.11.2. In order to boost documents with certain field values, I'm using Dismax's bq (Boost Query) parameter.
From what I've read, this should only influence the score of the search results returned by the rest of the query. What I see happening is that it filters out all search results that don't have the field I'm boosting.
I'm using the following query, which returns all documents containing both words procedure and maintenance:
q=((+procedure+maintenance))&rows=10&start=0&wt=xml&q.op=AND&fl=id,score,alias,author,hash,collection,label,url,lastModified,path,extension,objectId,objectDtType,title,DocumentPK_s,Taal_s,Site_s,SharePointId_s&hl=true&hl.qt=highlightRH&hl.fl=content,description,label&hl.snippets=5&defType=dismax&bf=recip(max(0,ms(NOW-3MONTH,creationDate)),3.16e-11,1,1)&pf=content&sort=score DESC
But as soon as I append &bq=language:english^10000, which is supposed to boost documents where the field language is set to english, all documents where the field language doesn't exist are no longer part of the results.
Am I misunderstanding how this parameter is supposed to work? Is it a side effect?
I have the following simple query in Solr, with which I want to sort all the records based on their name similarity to a text ("Olive Tasting Room"):
query: name:"Olive Tasting Room"
But when I run it in Solr, it returns only one document, the most similar one, whereas I want a sorted list of all my documents based on their rank (similarity to my query).
How should I do this in Solr/Lucene?
When you use the `field:"Term Term2"` syntax, you're doing a phrase search, i.e. you expect the terms to appear directly after each other, in that order.
The best way to handle more "natural" queries is to use the edismax query parser. You do this by using defType=edismax in the URL. After changing to edismax, you can enter the query itself in q - q=Olive Tasting Room (escape it properly if you enter it directly into a URL), and qf=name (qf is short for "query fields", i.e. which fields the edismax parser should query).
You can also use the pf3=text parameter to give a boost to any documents that feature three words from your query after each other (and pf2 for just two) in the text.
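A rough sketch of how those parameters could be put together in Python, with the URL escaping handled by urlencode. It assumes the phrase boosts are applied to the same name field as qf (the answer uses a field called text for pf3); fl=*,score is added only so the score is visible in the results.

```python
from urllib.parse import urlencode

def edismax_params(text, field):
    """Parameters for a free-text edismax query against a single field."""
    return {
        "defType": "edismax",
        "q": text,         # plain words, no field prefix and no quotes
        "qf": field,       # field(s) the terms are searched in
        "pf2": field,      # boost documents where two query words are adjacent
        "pf3": field,      # boost documents where three query words are adjacent
        "fl": "*,score",   # return all stored fields plus the score
    }

query_string = urlencode(edismax_params("Olive Tasting Room", "name"))
print("/select?" + query_string)
```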
I am trying to do a product search setup using Solr. It does return results for keywords that follow the same order in the product name. However, when the keywords are mixed up, no results are returned. I would like to get results with scores that closely match the given keywords in any order.
My question on scoring has the schema, data configuration and query. Any help will be greatly appreciated.
As long as you enter your query as a regular query, instead of using wildcards, any hits in a text_general field as you've defined should be returned.
You can use the mm parameter to adjust how many of the supplied terms need to match. I suggest using the edismax query parser, as that allows you to do more "natural" queries instead of having to add the field names in the query itself:
defType=edismax&qf=catchall&q=nikon dslr
defType=edismax&qf=catchall&q=dslr nikon
should both give the same set of documents (but possibly different scores when using phrase boosts).
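To make the role of mm concrete, here is a small sketch that builds both queries with an explicit mm value. The value 100% (every term must match) is only an example; a lower value such as 1 would accept documents matching any single term.

```python
from urllib.parse import urlencode, parse_qs

def product_query(keywords, mm="100%"):
    """Return an edismax query string; mm sets how many terms must match."""
    return urlencode({
        "defType": "edismax",
        "qf": "catchall",
        "q": " ".join(keywords),
        "mm": mm,
    })

a = parse_qs(product_query(["nikon", "dslr"]))
b = parse_qs(product_query(["dslr", "nikon"]))

# Same parser, same field and same mm: only the term order in q differs.
print(a["q"], b["q"], a["mm"])
```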
From a previous query I already have the document ID (the uniqueKey in this schema is 'track_id') of the document I'm interested in.
Then I would like to query a sequence of words on that document while highlighting the match.
I can't seem to be able to combine the search parameters in a successful way (all my google searches return purple links :\ ), although I've already tried many combinations these past few days. I also know the field where the matches will be if that's any use in terms of improving match speed.
I'm guessing it should be something like this:
/select?q=track_id:{key_i_already_have} AND/&/{part_I_dont_know} word1 word2 word3
Currently, since I can't combine these two search parameters, I'm only querying the words and thus getting several results from several documents.
Thanks in advance.
Since Solr 4 you can use realtime get, which is much faster than searching the index by id.
http://localhost:8983/solr/get?ids=id1,id2,id3
For index updates to be visible (searchable), some kind of commit must reopen a searcher to a new point-in-time view of the index. The realtime get feature allows retrieval (by unique-key) of the latest version of any documents without the associated cost of reopening a searcher. This is primarily useful when using Solr as a NoSQL data store and not just a search index.
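As a small illustration, the URL above can be assembled from a list of ids like this. The helper name realtime_get_url is made up for the example, and the base URL assumes the same host and path as the URL shown above.

```python
from urllib.parse import urlencode

def realtime_get_url(base, ids):
    """Build a realtime get URL for several unique keys at once."""
    # safe="," keeps the commas readable instead of encoding them as %2C
    return base + "/get?" + urlencode({"ids": ",".join(ids)}, safe=",")

get_url = realtime_get_url("http://localhost:8983/solr", ["id1", "id2", "id3"])
print(get_url)
```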
You could try applying a filter query (fq) on the id. It restricts your search query to that one document, so Solr searches only that document for the keywords and highlights them.
Your query will look like:
/select?fq=track_id:DOC_ID&q=word1 word2 word3
Just make sure the "id" field in your schema.xml is defined with the type string so that filter queries can be applied to it.
<field name="id" type="string" indexed="true" stored="true" required="true" />
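Putting the filter query and highlighting together might look like the sketch below. The document id abc123 and the field name lyrics are placeholders, not values from the question; hl, hl.fl and df are standard Solr parameters, and df is only relevant if the words in q carry no field prefix.

```python
from urllib.parse import urlencode

def highlight_in_document(doc_id, words, field):
    """Search the given words inside one document and highlight the matches."""
    return urlencode({
        "q": " ".join(words),        # the words to look for
        "fq": f"track_id:{doc_id}",  # restrict the search to one document
        "df": field,                 # default field for the unqualified words
        "hl": "true",                # turn highlighting on
        "hl.fl": field,              # field to generate highlight snippets from
    })

qs = highlight_in_document("abc123", ["word1", "word2", "word3"], "lyrics")
print("/select?" + qs)
```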
I'm trying to use MoreLikeThis to get all similar documents but not documents with a specific contenttype.
So the first query needs to find the one document that I want to get "More Like This" of - and the second query needs to limit the similar documents to not be pdf's (-contenttype:pdf)
Does anyone know if this is possible?
Thanks
When using the MoreLikeThisHandler, all the common parameters apply to the MLT result set. So you can use the fq parameter to exclude your pdf documents from the MLT results:
http://localhost:8983/solr/mlt?q=test&mlt.fl=text&fq=-contenttype:pdf
The q parameter selects the document used to generate the MLT results (actually, the first document matching the initial query is the one that is used).
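If you need to build that URL programmatically, a sketch along these lines should work. It assumes the MoreLikeThisHandler is registered at /mlt, as in the example URL; the helper name mlt_handler_url is made up for the illustration.

```python
from urllib.parse import urlencode

def mlt_handler_url(base, q, fields, exclude):
    """Build a MoreLikeThisHandler URL with negative filter queries."""
    params = [("q", q), ("mlt.fl", ",".join(fields))]
    # One fq per excluded clause; the leading "-" negates the filter.
    params += [("fq", f"-{clause}") for clause in exclude]
    # safe=":," keeps colons and commas readable in the resulting URL
    return base + "/mlt?" + urlencode(params, safe=":,")

mlt_request = mlt_handler_url(
    "http://localhost:8983/solr", "test", ["text"], ["contenttype:pdf"]
)
print(mlt_request)
```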