suggestions(recommendations) in the solr search results - solr

I am using solr to search contents of the PDF files that I have indexed. I am considering the whole content of the file as content field. Now I need some suggestions in the result as how we get in online shopping sites. Along with the solr result, I need information such as most searched text or something like you can also find: blah blah. And I also need the result to be displayed in the order of most selected answers for the searched text.
Please can anyone brief me on this? I am new to solr. Thanks in advance.

Related

Azure suggester returning all content

I'm trying to implement an Azure suggester feature into our pilot Azure search app and running into issues. The content I'm indexing are PDF files, so my suggester definition is based on the content field itself which can be thousands of lines of text. Following examples online, when I implement the suggester, I'm returned the entire content of the body of text from the PDF file. What I'd really like to do is return just a phrase found in the text.
For instance, suppose I'm indexing a Harry Potter book and I type into my search field "Dum", I'd like to see suggested results back like "Dumbledore", "Dementor", etc VS the whole book. Is this possible?
Tks
If we want to search for words sharing the same prefix, Autocomplete is the right API for this job. https://learn.microsoft.com/en-us/rest/api/searchservice/autocomplete
In contrast, Suggester API helps users find the documents containing words with that prefix. It returns text snippets containing those worlds.
If you still believe suggester api does not behave as expected and autocomplete is not suitable, let me know your source document, query and expected results.

Generate highlighted snippets in Solr for PDFs

I'm new to solr. I've set up a solr server and have indexed a few thousand PDFs. I am trying to query solr via the rest API in a PHP page. I am trying to build something similar to the solritas interface included in the tutorial (solrserver/browse), but I don't know how to generate highlighted snippets. I found in the documentation "hl" is a query parameter and is by default set to false.
When I get http://solrserver/?q=search+term&hl=true I get back a response with a hightlighting section, but it only contains the document IDs, no generated snippets.
I am using the tutorial provided schema and config for solr 4.2.1. I believe that the configuration is fine because solritas is able to display highlighted snippets using the same indexed data. I've tried seeing how solritas is built but it's separated out in .vm template files and I haven't been able to find what I'm looking for yet.
I can see the full text of the PDF in the doc->content area, so it is stored. I think I just don't understand the proper way to generate snippets! Can someone please help!
Thanks :)
I would suggest, you should try using hl.fl parameter. So your query should be something like this:
?q=search+term&hl=true&hl.fl=field1,field2,field3
Where field1, field2 and field3 are three source fields you would like to generate highlights.
In your case, if the field name you want to use for highlighting is content, your query can be:
?q=search+term&hl=true&hl.fl=content
More details: http://docs.lucidworks.com/display/solr/Highlighting
With highlighting, you can even specify fragment size, HTML tags around highlighted text etc...

Solr search – Tika extracted text from PDF not return highlighting snippet

I have successfully indexed Pdf –using Tika- and pure text –fetched from database- in one single collection. Now I am trying to implement highlighting. When I querying Solr i placing in the url the following: http://myhost:8090/solr/ktm/select/?q=BlahBlah&start=0&rows=120&indent=on&hl=true&wt=json . Everything is OK. The received output has the original (not highlighted text) content under “docs” and the highlighted snippets under “highlighting”. But I had noticed the documents that have been extracted by Tika don’t have “highlighting” snippet. That kind of response, cause me many troubles (zero length rows). Is there any workaround in order to tackle it? I have already tried to copyField (at index time) but the response come out blank ({“highlighting”:{}}). I really need help on this.

Solr configuration

I'm very new with Solr,
And I really want a step by step to have my Solr search result like the google one.
To give you an idea, when you search 'PHP' in http://wiki.apache.org/solr/FindPage , the word 'php' shows up in bold .. This is the same result I want to have.
Showing only a parser even if the pdf is a very huge one.
You can use highlighting to show a matching snippet in the results.
http://wiki.apache.org/solr/HighlightingParameters
By default, it will wrap matching words with <em> tags, but you can change this by setting the hl.simple.pre/hl.simple.post parameters.
You may be looking at the wrong part of the returned data. Try looking at the 'highlighting' component of the returned data structure (i.e. don't look at the response docs). This should give you the snippets you want.

Solr display page no of PDF along with the results

My question is just a continuation of this activity where I would like to display page no for the searched word in the input document.
Solr open document after searching a keyword
So I use
1) tika-0.9.jar to extract the output as an intermediate file.
2) Then I create another XML where the extracted output is the input and write the data in the format expected by Solr and then post this xml using post.jar command.
3) I use Solritas Serach UI with Solr 3.2 version (http://localhost:8983/solr/browse) to view the results.
I would like to display the page no's along with the results.
Example :
If I search for a word test in the input PDF's what I have manged so far is to display all set of docs that contain this result and on click of any doc the input PDF will open. I would like to display the page no of the where this word say 'test' is present in each of the input doc.
Please give me some suggestion , like whether this can be done by some how storing the page no in the index .
Your suggestions are most welcome.
Thanks and regards.

Resources