Solr Highlighting - Display Snippet - solr

I have successfully set up highlighting in Solr 4. I am mainly indexing docx, xlsx and PDF files, so I just have fields like url, title and content.
I have Solr highlighting the content field, and it displays a small snippet of text. Sometimes, though, the matched word is in the title rather than the content, and then no snippet is returned.
Is there any way of returning even just the first line or two from the content field so that the snippet is not left blank?

I guess your query URL looks like q=(title:ABC OR content:ABC)&hl=true&hl.fl=title,content
Try adding hl.alternateField=content to the query. Solr then falls back to that field when it cannot generate a highlighted snippet.
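As a rough sketch of what that query could look like when built from code (Python here, purely for illustration): the search term, the fl list and the 200-character limit are assumptions. hl.maxAlternateFieldLength is the standard Solr parameter that caps how much of the alternate field is returned.

```python
from urllib.parse import urlencode

def build_highlight_query(term, max_fallback_chars=200):
    """Build a Solr query string that highlights title and content,
    falling back to the start of the content field when no snippet
    can be generated."""
    params = {
        "q": f"title:{term} OR content:{term}",
        "fl": "url,title",                 # assumed: fields to return per document
        "hl": "true",
        "hl.fl": "title,content",
        "hl.alternateField": "content",    # fallback field when nothing is highlighted
        "hl.maxAlternateFieldLength": str(max_fallback_chars),  # assumed limit
    }
    return urlencode(params)

print(build_highlight_query("ABC"))
```

Without hl.maxAlternateFieldLength the whole alternate field may come back, which can be large for extracted PDF text.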

Use the fl=content parameter with your query. If no highlighted content is returned, generate the snippet yourself from the content field (fl=content) returned with each document in the result set.
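A minimal client-side sketch of that fallback, assuming the JSON response format, a unique key field named id, and a 200-character cut-off (none of which come from the question):

```python
def snippet_for(doc, highlighting, id_field="id", hl_field="content", max_chars=200):
    """Return the highlighted fragments for a document, or the beginning
    of its stored content field when Solr produced no highlight."""
    fragments = highlighting.get(doc[id_field], {}).get(hl_field, [])
    if fragments:
        return " ... ".join(fragments)

    content = doc.get("content", "")
    if isinstance(content, list):          # multiValued fields come back as lists
        content = " ".join(content)
    content = " ".join(content.split())    # collapse whitespace left by extraction
    if len(content) <= max_chars:
        return content
    # Cut at the last space before the limit so a word is not split
    return content[:max_chars].rsplit(" ", 1)[0] + "..."

# Hypothetical response data, shaped like Solr's JSON output
docs = [{"id": "a", "content": "Quarterly report ABC"}, {"id": "b", "content": "Budget " * 80}]
highlighting = {"a": {"content": ["Quarterly report <em>ABC</em>"]}, "b": {}}
for doc in docs:
    print(snippet_for(doc, highlighting))
```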

Related

Solr No "Content" Field in Collection after Indexing PDFs/DOCs

I have a collection of thousands of documents/PDFs, and there are a lot of fields like url, title, date, etc. But there is no content field, which seems like something that must exist in order to be able to search by keywords across the entire document, not just the title. I see some people saying that the content field is usually generated automatically when you index.
How do I go about adding a content field that contains all the text in the PDFs/DOCs? I am on Solr 6, so I know I need to use the API to create a new field to work with managed-schema. But after that, how do I re-index my collection? And if I just name the new field "content", will Solr know that the "content" field should contain all the text in my PDFs/DOCs when it is re-indexing?
Creating a "content" field did not work! Instead, I set stored=true for my _text_ field and everything worked.
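For reference, a sketch of the Schema API payload that would make _text_ stored. The text_general type and the indexed/multiValued values are assumptions based on the default Solr 6 configset, so check them against your own schema before sending anything. replace-field needs the complete field definition, not just the changed attribute.

```python
import json

def make_text_field_stored(field_type="text_general"):
    """Build a Schema API 'replace-field' command that redefines _text_
    with stored=true. field_type is an assumption (the default configset
    uses text_general)."""
    return {
        "replace-field": {
            "name": "_text_",
            "type": field_type,
            "indexed": True,
            "stored": True,       # the change that makes the text retrievable
            "multiValued": True,
        }
    }

# POST this JSON to <solr base URL>/<collection>/schema
print(json.dumps(make_text_field_stored(), indent=2))
```

Documents indexed before the change do not gain stored text, so they have to be re-indexed afterwards.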

Search Highlights in file Contents

We have a .NET application that displays data from the Azure Search service, based on the DotNet search app template.
Is it possible to show only the parts of the file that contain the highlighted term, for example the PDF page with the term highlighted at the top?
Have you looked up using the highlight feature?
https://learn.microsoft.com/en-us/rest/api/searchservice/search-documents#highlightstring-optional
Your search results will then have a new @search.highlights field, which contains a collection of passages with the searched terms. You can also use the $select parameter to decide which fields to retrieve and which ones to ignore.
https://learn.microsoft.com/en-us/rest/api/searchservice/search-documents#selectstring-optional
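A small sketch of how the two parameters fit together in a POST search request body, and how the passages could be read back. It is written in Python for brevity rather than .NET, and the field names content, title and url are assumptions about the index.

```python
def build_search_body(term, highlight_field="content", select_fields=("title", "url")):
    """Request body for the Search Documents REST call (POST form)."""
    return {
        "search": term,
        "highlight": highlight_field,     # field(s) to produce hit highlights for
        "highlightPreTag": "<em>",
        "highlightPostTag": "</em>",
        "select": ",".join(select_fields),
    }

def collect_highlights(response, highlight_field="content"):
    """Return one list of highlighted passages per search hit."""
    return [
        hit.get("@search.highlights", {}).get(highlight_field, [])
        for hit in response.get("value", [])
    ]

# Hypothetical response, shaped like the service's JSON output
sample = {"value": [
    {"title": "a.pdf", "@search.highlights": {"content": ["the <em>term</em> appears"]}},
    {"title": "b.pdf"},
]}
print(build_search_body("term"))
print(collect_highlights(sample))
```

The highlights are passages of text only; showing the matching PDF page itself would still be up to the application.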

Indexing URL pointing to pdf using TIKA in SOLR

I have a requirement where the incoming update request has metadata like "link":"http://example.pdf" (along with some other metadata), and I have to parse the PDF document and index it in another field, like "link_value":"PDF extracted contents". Is this possible in Solr using Tika?
NOTE: I cannot use the Data Import Handler, since the incoming requests do not come from a single source and are sent by an external source.
So, if I understand correctly:
you are getting some /update call to add some doc
the doc contains a 'link' field, which you want to retrieve, extract text with Tika, and index into another field
Yes you can do this in Solr, but you need to do some work:
set up an UpdateRequestProcessor (URP); you could start from TikaLanguageIdentifierUpdateProcessorFactory, as it uses Tika too and you may be able to reuse some of it
you wire your URP so it is used by the /update handler
that URP will kick in every time a doc is added
in the URP code, you retrieve the PDF, programmatically extract the text with Tika, and add it to the target field
You can map content to a specific field and supply specific field values when you're using the ExtractingRequestHandler (if you're using Tika yourself, you'll include the content as a regular document field).
To map the content to a different field, use fmap: fmap.content=link_value, and to include a literal value (i.e. the URL of the document you're indexing), use literal: literal.link=http://example.com/test.pdf (apply URL escaping as necessary).
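As an illustration of the fmap/literal approach, here is a sketch that builds the request URL for the ExtractingRequestHandler. The core URL, the /update/extract path (the conventional mapping for that handler), the doc_id value and the commit flag are assumptions; urlencode takes care of the URL escaping.

```python
from urllib.parse import urlencode

def build_extract_url(solr_core_url, link, doc_id, target_field="link_value"):
    """Build an ExtractingRequestHandler URL that stores the extracted
    text in target_field and the source URL in the 'link' field."""
    params = {
        "literal.id": doc_id,          # assumed unique key field named 'id'
        "literal.link": link,          # literal value stored alongside the content
        "fmap.content": target_field,  # map Tika's 'content' output to our field
        "commit": "true",
    }
    return f"{solr_core_url.rstrip('/')}/update/extract?{urlencode(params)}"

# Hypothetical core URL and document id
print(build_extract_url("http://localhost:8983/solr/mycore",
                        "http://example.com/test.pdf", "doc1"))
```

The PDF itself still has to be sent as the body of that request, so the client (or a URP as described above) needs to download it from the link first.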

SSRS action on field in CSV list

My client requires a report that produces a comma separated list of files in one column of a grid. I know I can use a FOR XML Path in my query to yield these results. However, the client wants to be able to click on an individual value in that CSV and be taken to a link for that element in the list. For example, the column in the report would look like:
1.jpg, 2.jpg, 3.jpg
He needs the ability to click on the 2.jpg, and go to that actual file. I know I can put in an action for the entire field to go to one URL, but can I narrow that action down to a specific part of the CSV list?
You can use placeholders to embed hyperlinks in comma-delimited text. You'll need to wrap each value in an anchor tag with an href. See the URL Embedded in Text section of my earlier answer for an example.
SQLFiddle of sample code for adding markup. It's ugly in SQL but works.
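The markup itself is simple. The linked sample builds it in SQL; the sketch below shows the same idea in Python only to make the target output clear. The base URL is a made-up placeholder.

```python
from html import escape
from urllib.parse import quote

def link_list(file_names, base_url):
    """Wrap each file name in an anchor tag and join them with commas."""
    links = []
    for name in file_names:
        href = base_url.rstrip("/") + "/" + quote(name)   # escape spaces etc. in the path
        links.append(f'<a href="{escape(href, quote=True)}">{escape(name)}</a>')
    return ", ".join(links)

# Hypothetical base URL for the files
print(link_list(["1.jpg", "2.jpg", "3.jpg"], "http://files.example.com/images"))
```

For the tags to render as links, the placeholder's markup type in the text box has to be set to HTML.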
I came up with a workable solution. The problem seems to lie in the fact that there are multiple URLs in the list. If I only return a single URL, it works fine. So, instead of a comma-separated list, I return one URL at a time and use a matrix to group the other fields.

Need plugin to overwrite default title

I'm trying to write a plugin for Nutch based on http://sujitpal.blogspot.com/2009/07/nutch-custom-plugin-to-parse-and-add.html to get a custom title finder.
This works well, and storing the extracted titles in a new field is no problem. But I want to use it in Solr instead of the default title. The problem is that Solr needs multivalued fields, as I have 2 title fields.
metadata.remove("title");
didn't work.
I really want to use the new title instead of the default one created by Nutch. Any suggestions?
Why don't you put your title in a different field, so that it is handled properly?
