Get list of terms that were highlighted by Solr - solr

When I search for the word "fish" I get back a list of documents containing that word and variants of that word. If I turn on highlighting I might see a snippet that looks like this:
The law requires that anyone <em>fishing</em> in public lakes...
I would like to show the user the above snippet, which works just fine by the way, but I would also like to show the user a complete list of words that would also have been highlighted had I shown all snippets.
For example I would like to be able to show the user the following:
Section 18.32A - Hunting and Fishing
...The law requires that anyone <em>fishing</em> in public lakes...
Document also contains: Fish, Fishing, Fisherman
Is there a way to get that list of words other than having Solr highlight the entire document and then parsing the document myself, looking for em tags and building a list of highlighted words?

I would investigate fragment size (hl.fragsize), synonyms (synonyms.txt), or stemming, which handles variations of a word. With synonyms you can declare fish, fishing, and fished to all mean the same thing. Make sure you understand how the expand option works and whether you want the search to replace each term with the others. Also decide whether to apply the synonym file at index time or at query time; do not use synonyms at both. There is also a switch to enable multiple matches in highlighting.
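For reference, the fallback described in the question (collecting whatever Solr wrapped in em tags) is small enough to sketch. This is a minimal sketch, not a Solr feature: it assumes the default <em></em> markers, that you request highlighting over the whole field so every match appears in some snippet, and the class name, method name, and sample snippets are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HighlightedTerms {
    // Matches the text between the default <em>...</em> highlight markers.
    private static final Pattern EM = Pattern.compile("<em>(.*?)</em>");

    /** Returns the distinct highlighted terms, lower-cased, in first-seen order. */
    static Set<String> collect(List<String> snippets) {
        Set<String> terms = new LinkedHashSet<>();
        for (String snippet : snippets) {
            Matcher m = EM.matcher(snippet);
            while (m.find()) {
                terms.add(m.group(1).toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Hypothetical snippets, shaped like the ones in the question.
        List<String> snippets = Arrays.asList(
            "Section 18.32A - Hunting and <em>Fishing</em>",
            "The law requires that anyone <em>fishing</em> in public lakes",
            "Every <em>fisherman</em> must carry a license to <em>fish</em>");
        System.out.println(collect(snippets)); // prints [fishing, fisherman, fish]
    }
}
```

Lower-casing merges "Fishing" and "fishing" into one entry; drop it if you want to show the terms exactly as they appear in the document.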

Related

How to get exact part where azure search found a match

So I have azure search, and when I search for something in the content it gave me all of the text from file. Is there a way to find a part of the content where it found a match? Like one sentence before + sentence with match + sentence after or just index of the start for match and the length?
Yes. There is a hit highlighting feature for exactly this purpose. To use it, you specify the fields you want this information on in the highlight member of the request (documented here). In your response you will get text snippets with the text surrounding the matches and the hit word marked with a tag of your choice (defaults to <em></em>).

Solr: Separate highlight fragment for each search term occurrence

I use Solr 5 for searching in large (text) documents. For each search result, I display a fragment containing the highlighted search match. This works nicely using Solr's Standard Highlighter. Yet I found that if several matches are found close to each other, they will be merged into one fragment, even with hl.mergeContiguous=false. The parameters are:
SolrQuery query = new SolrQuery();
query.setQuery(rawQuery);
query.set("defType", "lucene");
query.setRows(1000);
query.setHighlight(true);
query.setHighlightFragsize(200);
query.setHighlightSnippets(20);
query.setParam("hl.fl", "content");
query.setParam("hl.maxAnalyzedChars", "-1");
query.setParam("hl.mergeContiguous", false);
Example: I use a bible translation for testing, just because of its length. Searching for beast yields (among many others)
...7:8 Of clean beasts, and of beasts that are not clean, and of birds, and of everything that creeps upon the ground, 7:9 there went in two and two to Noah into...
I would rather have this fragment twice, because it contains two occurrences of the search term. Manually duplicating the fragment in this case appears clumsy to me. Am I missing a query parameter, or do I need a custom BoundaryScanner to achieve this?
You could try the regex-based fragmenter (hl.fragmenter=regex): build the pattern from your terms and pass it with the request as hl.regex.pattern. The related hl.regex.slop and hl.regex.maxAnalyzedChars parameters are worth a look too if you want to try this.
Alternatively, reduce the Standard Highlighter's hl.fragsize to a value small enough that two occurrences of your terms are unlikely to fall within one fragment.
BoundaryScanner works with the FastVectorHighlighter only; it can be an option if no out-of-the-box parameter works.
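If none of those parameters gives the desired result, the manual duplication mentioned in the question can at least be kept small. The following is a minimal client-side sketch, not something Solr provides: it assumes the default <em></em> markers, and the class and method names are hypothetical. It returns one copy of the fragment per match, each copy keeping the highlight on a single occurrence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FragmentPerMatch {
    private static final Pattern EM = Pattern.compile("<em>(.*?)</em>");
    private static final int OPEN_LEN = "<em>".length();
    private static final int CLOSE_LEN = "</em>".length();

    /** Returns one copy of the fragment per highlighted match; each copy keeps only that match's tags. */
    static List<String> split(String fragment) {
        // First pass: record the start and end offsets of every highlighted span.
        List<int[]> spans = new ArrayList<>();
        Matcher m = EM.matcher(fragment);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }

        List<String> result = new ArrayList<>();
        for (int keep = 0; keep < spans.size(); keep++) {
            StringBuilder sb = new StringBuilder();
            int pos = 0;
            for (int i = 0; i < spans.size(); i++) {
                int[] s = spans.get(i);
                sb.append(fragment, pos, s[0]);
                String tagged = fragment.substring(s[0], s[1]);
                // Keep the tags on the chosen match and strip them from all others.
                sb.append(i == keep
                    ? tagged
                    : tagged.substring(OPEN_LEN, tagged.length() - CLOSE_LEN));
                pos = s[1];
            }
            sb.append(fragment.substring(pos));
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        String fragment = "Of clean <em>beasts</em>, and of <em>beasts</em> that are not clean";
        for (String copy : split(fragment)) {
            System.out.println(copy);
        }
    }
}
```

A fragment without any markers yields an empty list, so callers may want to fall back to the original fragment in that case.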

GAE Full Text Search: can only match exact word? how to search like contains(...)?

I just tried GAE (1.7.7, Java) Full Text Search and found that if the search string is work, surprisingly it will not match working, worked, hardworking, or homework. I'd like to know if I am missing something in the API. I read the tutorial but did not find any documentation about this except plural matching.
Thanks.
P.S. I tried unit test for search service, not in working environment.
Tucked away in the docs (but unfortunately not in the table of operators), there is a '~' operator
To search for plural variants of an exact query, use the ~ operator:
~"car" # searches for "car" and "cars"
Not sure how far that will get you. Unfortunately, that's about it.
See https://developers.google.com/appengine/docs/java/search/overview#Queries_on_Fields
There is so little documentation on this, but having just tried it, it only works on plurals.
One approach would be to do your own stemming on the words in the document, (though you wouldn't return that as the text ;-) Then you could perform stemming on your search term and be able to match worked, working etc..
This is a late answer, but to follow up the previous answer, what you want to do is not possible with the basic API functions. The search API works on full-text searching principles. To get around this you can tokenise your searchable data pre-index and store this in a field with the relevant document.
See: Partial matching GAE search API
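The pre-index tokenising suggested above could look roughly like this. It is a minimal sketch under stated assumptions: the class name, method name, and the minimum length of 4 are hypothetical, and no Search API calls are shown. The idea is to store the generated tokens, joined by spaces, in an extra text field of the document so that an exact-word query for work also matches homework or hardworking.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

class PartialTokens {
    /** Returns every substring of at least minLen characters of every word in the text. */
    static Set<String> substrings(String text, int minLen) {
        Set<String> tokens = new LinkedHashSet<>();
        // Split on runs of non-word characters; empty pieces produce no tokens.
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            for (int start = 0; start < word.length(); start++) {
                for (int end = start + minLen; end <= word.length(); end++) {
                    tokens.add(word.substring(start, end));
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> tokens = substrings("Homework, hardworking", 4);
        // Value for the extra, search-only field (field name is up to you).
        System.out.println(String.join(" ", tokens));
    }
}
```

The number of tokens grows quadratically with word length, so a minimum length (and possibly a maximum) keeps the extra field from dominating the index. If only matches at the start of a word are needed (work matching working and worked), generating prefixes alone is much cheaper.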

Solr wildcard search, need to only highlight the matched part not the entire word

My text is like this: The searched word WildCard shall be partially highlighted
I search using wildcard expression "wild*".
What I need is to have the highlight snippet to be [tag]Wild[/tag]Card. What I got was [tag]WildCard[/tag], and I spent lots of time researching it, but could not find an answer.
This behavior can be illustrated on linkedin.com, where you type other people's name at the top right corner.
Once this is figured out, I will have follow-up questions.
Thanks,
I am not sure if you can achieve what you want directly in solr. The obvious solution is to parse the returned doc yourself searching for [tag]WildCard[/tag] and find out what part of the term you need to highlight.
This is not possible with Solr. What you need to do is change the characters Solr uses to highlight found words to something you can easily remove (maybe an html comment) and then build a highlight yourself. You are likely already storing your query in a variable, so just scan the search return docs and highlight all instances of your query.
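A minimal sketch of the "build the highlight yourself" step follows. It assumes the query is a simple trailing-wildcard term such as wild* with the * already stripped, that the text is the stored field value (or a snippet with Solr's own markers removed), and that the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixHighlighter {
    /** Wraps only the matched prefix of each word in the given open/close markers. */
    static String highlight(String text, String prefix, String open, String close) {
        // \b anchors the match at a word start; quote() treats the prefix literally.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(prefix), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (m.find()) {
            // Copy the text before the match, then the match wrapped in the markers.
            out.append(text, pos, m.start()).append(open).append(m.group()).append(close);
            pos = m.end();
        }
        return out.append(text.substring(pos)).toString();
    }

    public static void main(String[] args) {
        String text = "The searched word WildCard shall be partially highlighted";
        System.out.println(highlight(text, "wild", "[tag]", "[/tag]"));
        // prints: The searched word [tag]Wild[/tag]Card shall be partially highlighted
    }
}
```

This matches on raw characters only, so it will not line up with Solr's analysis when the field uses stemming, synonyms, or other token filters.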

Solr configuration

I'm very new to Solr, and I would really like a step-by-step guide to making my Solr search results look like Google's.
To give you an idea, when you search 'PHP' in http://wiki.apache.org/solr/FindPage , the word 'php' shows up in bold .. This is the same result I want to have.
Showing only a parser even if the pdf is a very huge one.
You can use highlighting to show a matching snippet in the results.
http://wiki.apache.org/solr/HighlightingParameters
By default, it will wrap matching words with <em> tags, but you can change this by setting the hl.simple.pre/hl.simple.post parameters.
You may be looking at the wrong part of the returned data. Try looking at the 'highlighting' component of the returned data structure (i.e. don't look at the response docs). This should give you the snippets you want.
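In SolrJ, that highlighting section is exposed by QueryResponse.getHighlighting() as a map from document id to field name to snippets. A minimal sketch of reading it safely is shown below; it takes the map as a parameter so it runs without a Solr server, and the class name, method name, document id, and field name are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HighlightLookup {
    /** Returns the snippets for one document and field, or an empty list if there are none. */
    static List<String> snippets(Map<String, Map<String, List<String>>> highlighting,
                                 String docId, String field) {
        Map<String, List<String>> byField =
            highlighting.getOrDefault(docId, Collections.emptyMap());
        return byField.getOrDefault(field, Collections.emptyList());
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the map returned by QueryResponse.getHighlighting().
        Map<String, Map<String, List<String>>> highlighting = new HashMap<>();
        highlighting.put("doc1", Collections.singletonMap(
            "content", Arrays.asList("an introduction to <em>PHP</em> scripting")));

        for (String snippet : snippets(highlighting, "doc1", "content")) {
            System.out.println(snippet);
        }
    }
}
```

The outer key is the value of the document's unique key field, so each result document can be paired with its snippets by id.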
