Solr: display some results first when they are part of the results

Consider this Solr pseudo-doc:
<doc>
<field name="title"/>
<field name="name"/>
<field name="keywords"/>
</doc>
Some docs will have the keyword "up", which means they should appear first (regardless of their original position) when, and only when, they are part of the search results.
So let's say I have:
doc1('title1','Bob, Alice','people, up, couple')
doc2('title2','Smart Phone, Laptop, Bob','devices, electronics')
If I query with "title:title2 name:Bob", I should get doc1 first (it has the 'up' keyword).
If I query with "name:Bob", I should still get doc1 first for the same reason.
If I query with "name:Laptop", I should get only doc2 in my results. doc1 should not be included since it doesn't match my search query.
Any suggestions on how to do this?

You have several options:
a function query or boost query (in the dismax handler)
boosting documents at index time
extracting the 'up' keyword into an additional field and sorting by that field, then by score
For example (with the dismax handler):
/select?defType=dismax&q=...&bq=keywords:"up"^1000
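As a rough sketch of how a client could assemble the request above, the following Python builds the query string with the standard library. The `qf` value ("title name") is an assumption based on the fields in the question, and `build_boosted_url` is a hypothetical helper, not part of any Solr client. The point it illustrates is that `bq` only adds score to documents that already match `q`, so it never pulls extra documents into the results.

```python
from urllib.parse import urlencode, parse_qs


def build_boosted_url(user_query, boost=1000):
    """Build a dismax /select URL that boosts docs carrying the 'up' keyword."""
    params = {
        "defType": "dismax",
        "q": user_query,
        # Assumption: the user's terms are searched in these two fields.
        "qf": "title name",
        # Boost query: raises the score of matching docs that also have "up".
        "bq": f'keywords:"up"^{boost}',
    }
    return "/select?" + urlencode(params)


url = build_boosted_url("Bob")
print(url)
# Decode the query string again to show the boost clause as Solr receives it.
print(parse_qs(url.split("?", 1)[1])["bq"])
```

urlencode takes care of escaping the quotes, colon and caret, which is easy to get wrong when concatenating the URL by hand.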

This can be solved with Solr's query-time boosting. Following the guidance in the Solr Relevancy FAQ, you could add an additional boosted search term to all queries, e.g. title:title2 name:Bob keywords:up^2
You could also determine at index time whether the up keyword is present in each document, store that in an additional field (a boolean, for example) in your schema, and boost the query results based on that field.
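To make the intended ordering concrete, here is a small pure-Python simulation (it does not talk to Solr) of the "flag first, then the normal order" behaviour, using the two docs from the question. In Solr terms it would roughly correspond to sorting on a flag field before the score, for example sort=is_up desc,score desc, where is_up is a hypothetical field name.

```python
def split_values(text):
    """Split a comma-separated field value into trimmed items."""
    return [item.strip() for item in text.split(",")]


def search(docs, name_term):
    """Return docs whose name contains name_term, flagged ('up') docs first."""
    hits = [d for d in docs if name_term in split_values(d["name"])]
    # sorted() is stable: flagged docs move to the front,
    # all other docs keep their original relative order.
    return sorted(hits, key=lambda d: "up" not in split_values(d["keywords"]))


doc1 = {"id": "doc1", "name": "Bob, Alice", "keywords": "people, up, couple"}
doc2 = {"id": "doc2", "name": "Smart Phone, Laptop, Bob", "keywords": "devices, electronics"}
docs = [doc2, doc1]  # doc2 deliberately comes first in the "index"

print([d["id"] for d in search(docs, "Bob")])     # ['doc1', 'doc2']
print([d["id"] for d in search(docs, "Laptop")])  # ['doc2']
```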

Related

How to query a specific document by id

From a previous query I already have the document ID (the uniqueKey in this schema is 'track_id') of the document I'm interested in.
Then I would like to query a sequence of words on that document while highlighting the match.
I can't seem to combine the search parameters successfully (all my Google searches return purple links :\ ), although I've tried many combinations these past few days. I also know the field where the matches will be, if that's any use for improving match speed.
I'm guessing it should be something like this:
/select?q=track_id:{key_i_already_have} AND/&/{part_I_dont_know} word1 word2 word3
Currently, since I can't combine these two search parameters, I'm only querying the words and thus getting several results from several documents.
Thanks in advance.
From Solr 4 you can use the realtime get, which is much faster than searching the index by id.
http://localhost:8983/solr/get?ids=id1,id2,id3
For index updates to be visible (searchable), some kind of commit must reopen a searcher to a new point-in-time view of the index. The realtime get feature allows retrieval (by unique-key) of the latest version of any documents without the associated cost of reopening a searcher. This is primarily useful when using Solr as a NoSQL data store and not just a search index.
You could apply a filter query on the ID. It restricts your search to that document, so the keywords are searched only within it; with highlighting turned on (hl=true), the matches are highlighted as well.
Your query will look like:
/select?fq=track_id:DOC_ID&q=word1 word2 word3
Just make sure your "id" field in schema.xml is defined with the type string so that filter queries can be applied to it.
<field name="id" type="string" indexed="true" stored="true" required="true" />
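A minimal sketch of building that request from Python, with highlighting switched on through the hl and hl.fl parameters. The field name "lyrics" is purely an assumption standing in for the field where the matches are expected, and `build_highlight_url` is a hypothetical helper.

```python
from urllib.parse import urlencode, parse_qs


def build_highlight_url(doc_id, words, id_field="track_id", text_field="lyrics"):
    """Search `words` inside one document and ask Solr to highlight matches."""
    params = {
        "q": " ".join(words),
        # Assumption: text_field is the field known to contain the matches.
        "df": text_field,
        # The filter query limits the result set to the single known document.
        # Assumes doc_id itself contains no double quotes.
        "fq": f'{id_field}:"{doc_id}"',
        "hl": "true",
        "hl.fl": text_field,
    }
    return "/select?" + urlencode(params)


url = build_highlight_url("DOC_ID", ["word1", "word2", "word3"])
print(url)
```

Because the ID restriction lives in fq rather than q, it does not influence scoring and can be cached by Solr independently of the words being searched.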

Is it possible to get related documents in a Solr search query?

I use the following query string to get a document indexed in Solr:
http://localhost:8080/solr/newsarchive/select/?q=ID:bbc-55950440dc8e5f1a550bd736214a1e7e&sort=Date%20desc&version=2.2&start=0&rows=10&indent=on&wt=json
This returns the document with ID bbc-55950440dc8e5f1a550bd736214a1e7e.
My question is: is there any way to make this query also return the IDs of a number of related documents?
There is a way to do this in Solr; it's called More Like This: https://wiki.apache.org/solr/MoreLikeThis
You pass Solr a query, and the More Like This handler returns similar documents for each document that your query matches. It determines similarity by looking at the terms in the fields you select and running a Lucene query using those terms.
The fields you select need, at a minimum, to be stored; preferably they should also be set up to store term vectors:
<field name="cat" ... termVectors="true" />
An example query (taken from the documentation):
http://localhost:8983/solr/select?q=apache&mlt=true&mlt.fl=manu,cat
In this case you are querying the index for the word "apache" and requesting a More Like This result set (mlt=true). You are asking Solr to base the similarity on the fields manu and cat. Solr will then look at the terms in those fields and perform a search on those fields using those terms to locate similar documents.
A few more articles/examples:
http://blog.brattland.no/node/18
https://cwiki.apache.org/confluence/display/solr/MoreLikeThis
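Applied to the question's lookup by ID, the request could be assembled along these lines. This is only a sketch: the similarity fields "Title" and "Body" are assumptions (the asker's schema is not shown), and `build_mlt_url` is a hypothetical helper.

```python
from urllib.parse import urlencode, parse_qs


def build_mlt_url(doc_id, similarity_fields, count=5, id_field="ID"):
    """Look up one document by ID and request similar documents for it."""
    params = {
        "q": f'{id_field}:"{doc_id}"',
        "mlt": "true",
        # Fields whose terms are used to judge similarity (assumed names).
        "mlt.fl": ",".join(similarity_fields),
        # Number of similar documents to return per matched document.
        "mlt.count": count,
        "wt": "json",
    }
    return "/solr/newsarchive/select?" + urlencode(params)


url = build_mlt_url("bbc-55950440dc8e5f1a550bd736214a1e7e", ["Title", "Body"])
print(url)
```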

How to enforce an exact match to get the highest priority?

I am indexing and searching 5 fields, which are tokenized/filtered in various ways.
BUT when I search, if the query I entered exactly matches a value in field 1, I would like that document to be the top result I get back.
How would I define:
the field
the query, so that this field gets priority IF there is a 100% match
In my schema, I have the field:
<field name="na_title" type="text_names" indexed="true" stored="false" required="true" />
text_names is: <fieldType name="text_names" class="solr.StrField" />
I have ONLY one entry with na_title="somthing is going on here". But when I search for
text_names:somthing is going on here
I get many results.
Note that there are no analyzers or filters on that field, for either query or index actions.
From the manual:
Lucene allows influencing search results by "boosting" in more than one level:
Document level boosting - while indexing - by calling document.setBoost() before a document is added to the index.
Document's Field level boosting - while indexing - by calling field.setBoost() before adding a field to the document (and before adding the document to the index).
Query level boosting - during search, by setting a boost on a query clause, calling Query.setBoost().
You'll need to index the field twice: once analyzed and once not. Then you can boost the matches in the non-analyzed field over the others.
A shortcut could be to index all those fields as strings and use copyField to copy them as text into a catch-all field. That would simplify the query a little and decrease the number of duplicate fields.
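As an illustration of the "index twice, boost the non-analyzed copy" idea, the sketch below builds a query string with a heavily boosted exact-match clause next to a normal clause on the analyzed copy. The field names na_title_exact and na_title_text are hypothetical, and for brevity the analyzed clause is written as a phrase query too.

```python
def quote_phrase(text):
    """Wrap text in double quotes, escaping backslashes and embedded quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_exact_first_query(user_input,
                            exact_field="na_title_exact",  # string (non-analyzed) copy
                            text_field="na_title_text",    # analyzed copy
                            boost=100):
    """Combine a boosted exact-match clause with a clause on the analyzed field."""
    phrase = quote_phrase(user_input)
    return f"{exact_field}:{phrase}^{boost} OR {text_field}:{phrase}"


print(build_exact_first_query("somthing is going on here"))
# na_title_exact:"somthing is going on here"^100 OR na_title_text:"somthing is going on here"
```

Quoting the whole input matters here: without the quotes, only the first word would be searched in the named field and the rest would go to the default field.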

Solr Faceting Multi-valued vs Tokenizers

I'm trying to set up a subject field in my schema. I'm drawing from a database where a single record can have multiple subjects, and the subjects are listed in a comma-delimited string. Is there a way to facet on just one of the subjects?
Thanks
Check SolrFacetingOverview for a faceting overview.
The Facet Indexing section mentions the field type you should choose for the field that you want to facet on.
You can customize the faceting using SimpleFacetParameters.
You can restrict the results to entities with a particular subject value using a filter query, e.g. fq=subject:"MATH"
The filter returns only the results matching the criteria, and the facet counts are computed from that result set.
If I understand correctly, you want this in the DIH (DataImportHandler) config file:
<entity name="entity" pk="id" query="..." transformer="RegexTransformer">
<field column="subjects" splitBy=","/>
</entity>
and the query for faceting:
http://localhost:8983/solr/select?q=...&facet=true&facet.field=subjects&facet.query=subjects:the-one-you-want
would that work?
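To see why splitting the string into a multi-valued field is what makes per-subject faceting possible, here is a small pure-Python simulation (no Solr involved). The three records are made-up sample data.

```python
from collections import Counter


def split_subjects(raw):
    """Mimic splitBy=",": turn 'MATH, PHYSICS' into ['MATH', 'PHYSICS']."""
    return [s.strip() for s in raw.split(",") if s.strip()]


# Hypothetical database rows: record id -> comma-delimited subjects.
records = {1: "MATH, PHYSICS", 2: "MATH,HISTORY", 3: "ART"}

# After splitting, every record carries a list of individual subject values.
docs = {doc_id: split_subjects(raw) for doc_id, raw in records.items()}

# Facet counts: number of documents per individual subject value.
facet_counts = Counter(s for subjects in docs.values() for s in set(subjects))

# Equivalent of a filter on one subject: keep only docs containing it.
math_docs = sorted(doc_id for doc_id, subjects in docs.items() if "MATH" in subjects)

print(facet_counts["MATH"], math_docs)  # 2 [1, 2]
```

Without the split, "MATH, PHYSICS" and "MATH,HISTORY" would each count as one distinct facet value, and no single value would represent MATH on its own.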

Solr - Results that contain all terms, in any order

In a SOLR install, when I search against a field with a multi-word search term I want SOLR to return documents that have all of the terms in the search, but they do not need to be in the exact order.
For example, if I search for title of Brown Chicken Brown Cow, I want to find all documents that contain all of the terms Brown, Chicken and Cow, irrespective of order in the title field. So, for example, the title "The chicken and the cow have brown poop" should match the query. AFAIK, this is how Google executes searches as well.
I have experimented with the following query formats:
1. Title:Brown AND Title:Chicken
2. Title:Brown AND Chicken
3. Title:Brown+Chicken
I am very confused by the results. In some instances, the first two queries return the same exact set of results. In other instances, the first version will return many results and the second version will return none. The third version seems to meet my needs, but I am confused by the different meaning of the queries.
All of my tests have been run against a field of type text_en.
<field name="Title" multiValued="false" type="text_en" indexed="true" stored="true"/>
So, what's the best SOLR query/setup for this type of search? Also, is there an easy way to make Solr.NET take a user-entered search term and convert it to this type of format?
Also, will SOLR by default give documents that match the order of the search phrase a higher relevancy score? If not, what are the right levers to pull to make that happen?
Edit:
Some of my confusion was caused by searching against non-default fields vs. default fields. Knowing this, the only format that works consistently is the first format.
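If I were you, I would try:
Title:(Brown Chicken)
The parentheses apply the Title field to both terms. With AND as the default operator (q.op=AND) this is equivalent to your query no. 1; with the default OR operator, write Title:(Brown AND Chicken) instead. Quotation marks force Solr to search for an exact phrase match, including spacing and word order.
For a phrase match, try Title:"Brown Chicken", or use the DisMax query parser to handle your queries.
The wiki for the Lucene query parser says (emphasis mine):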
....Since text is the default field, the field indicator is not required.
Note: The field is only valid for the term that it directly precedes,
so the query
title:Do it right
Will only find "Do" in the title field. It will find "it" and "right"
in the default field (in this case the text field).
Do you have only the title field in your data model?
Please run your query with debugQuery=on to see how the results are scored; see it in action: https://stackoverflow.com/a/9262300/604511
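On the question of converting user input into this format: the idea is language-independent, so here it is sketched in Python rather than Solr.NET. The helper names are hypothetical. It removes repeated words, escapes Lucene's special characters, and joins the terms with AND inside a single field group so that every term is searched in the same field.

```python
import re

# Characters with special meaning in the Lucene query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_term(term):
    """Backslash-escape query-syntax characters inside a single term."""
    return _SPECIAL.sub(r"\\\1", term)


def all_terms_query(field, user_input):
    """Build field:(t1 AND t2 ...) from free text, ignoring repeated words."""
    terms, seen = [], set()
    for word in user_input.split():
        key = word.lower()
        if key not in seen:
            seen.add(key)
            terms.append(escape_term(word))
    if not terms:
        raise ValueError("no search terms given")
    # Limitation: bare words such as AND / OR / NOT in the input are not handled.
    return f"{field}:(" + " AND ".join(terms) + ")"


print(all_terms_query("Title", "Brown Chicken Brown Cow"))
# Title:(Brown AND Chicken AND Cow)
```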

Resources