I'm trying to write a plugin for Nutch, based on http://sujitpal.blogspot.com/2009/07/nutch-custom-plugin-to-parse-and-add.html, to get a custom title finder.
This works well, and storing the extracted titles in a new field is no problem. But I want to use the new title in Solr instead of the default title. The problem is that Solr then needs a multivalued field, because I end up with two title values.
metadata.remove("title");
didn't work.
I really want to use the new title instead of the default one created by Nutch. Any suggestions?
Why don't you put your title in a different field? That way it will be handled properly.
Related
I have a collection of thousands of documents/PDFs, and there are a lot of fields like url, title, date, etc. But there is no content field, which seems like it must exist in order to be able to search by keywords across the entire document, not just the title. I see some people saying that the content field is usually generated automatically when you index.
How do I go about adding a content field that contains all the text in the PDFs/DOCs? I am on Solr 6, so I know I need to use the Schema API to create a new field to work with managed-schema. But after that, how do I re-index my collection? And if I just name the new field "content", will Solr know that the "content" field should contain all the text in my PDFs/DOCs when it's reindexing?
Creating a "content" field did not work! Instead, I set stored=true for my _text_ field and everything worked.
I am pretty new to Solr, and we have a requirement where I have to modify the value of one JSON property in the incoming request before it is updated and stored. Something like the example below.
e.g.
{
"name":"Google,CA,94043"
}
When I add this JSON via the add/update documents screen in the Solr admin UI, I want this name to be stored as just Google. So when I run a search (query) from the Solr admin UI, it should list the name as Google, not "Google,CA,94043".
I have added a FieldType with PatternReplaceFilterFactory and referenced it from the name field. The search results do not show the updated value, but when I analyze the field value (index/query) using the admin tool, the values are correct. I'm not sure how to achieve this.
Let me know if anyone has steps on how to achieve this.
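One thing that may explain the behaviour: an analysis filter such as PatternReplaceFilterFactory only changes the tokens that go into the index, while the stored value returned in search results stays exactly as it was sent. So the value has to be changed before it is stored. Solr ships an update processor, RegexReplaceProcessorFactory, that can do this on the server side. Another option, assuming you control the client that posts the JSON, is to trim the value before sending it. A minimal sketch of that second option follows; the class name, the method name and the "drop everything from the first comma" rule are assumptions made up for illustration from the example value above.

```java
import java.util.regex.Pattern;

// Hypothetical client-side helper, not part of Solr.
class NameTrimmer {
    // Assumed rule: drop everything from the first comma to the end of the value.
    private static final Pattern AFTER_FIRST_COMMA = Pattern.compile(",.*$", Pattern.DOTALL);

    static String trimName(String raw) {
        if (raw == null) {
            return null;
        }
        // Values without a comma are returned unchanged (apart from trimming whitespace).
        return AFTER_FIRST_COMMA.matcher(raw).replaceFirst("").trim();
    }

    public static void main(String[] args) {
        System.out.println(trimName("Google,CA,94043"));
    }
}
```

The trimmed string is then what goes into the "name" property of the JSON document, so the stored value and the indexed value agree.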
I am using the solr (6.5.1) suggester to return autocomplete results.
I am trying to display a price and a thumbnail with the autocomplete results but can't find a way to do this.
Is there a way to return more fields?
I see these two questions from two years ago that seem to be trying to accomplish what I want, and both say that at the time it is not doable.
Solr Suggestion with multiple payloads
Returning an entire Document on Solr Suggestion
Has anything changed since two years ago?
Is there a different way that this can be accomplished?
Just put all the info you need into one field, and use that field as the payload. For example, you could:
append the values as a string, separated by |: payload:"17|/path/to/thumbnail"
or use a Solr BinaryField and store a serialized Java POJO containing the info you need
I would go the simple route, the first one.
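To illustrate the first option, here is a rough sketch of how a client could build and split such a payload. It assumes exactly two parts (price first, then thumbnail path) and that the price itself never contains a | character; the class and method names are hypothetical and not part of the Solr API.

```java
// Hypothetical helper for a "price|thumbnail" payload string.
class SuggestionPayload {
    final String price;
    final String thumbnail;

    SuggestionPayload(String price, String thumbnail) {
        this.price = price;
        this.thumbnail = thumbnail;
    }

    // Builds the string to put into the payload field at index time.
    static String encode(String price, String thumbnail) {
        return price + "|" + thumbnail;
    }

    // Splits the payload returned with a suggestion back into its parts.
    static SuggestionPayload parse(String payload) {
        // Limit 2: only the first | is treated as the separator.
        String[] parts = payload.split("\\|", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected price|thumbnail but got: " + payload);
        }
        return new SuggestionPayload(parts[0], parts[1]);
    }

    public static void main(String[] args) {
        SuggestionPayload p = parse("17|/path/to/thumbnail");
        System.out.println(p.price + " " + p.thumbnail);
    }
}
```

If more values are needed later, the same idea extends to more separators, but at that point a small JSON string in the payload may be easier to maintain.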
I have several fields in a Drupal article, and I want to generate the title from them by concatenating their values with '||' as the separator. However, it seems the article title has to be entered manually. How can I generate the title automatically?
I used Auto Node Title in a client project, and it worked fine for generating node titles automatically. I don't know whether it can also build the title from other fields, but it may be possible with some additional configuration.
I've crawled a site with Nutch successfully and am trying to return a highlighted abstract using Solr as the indexer/searcher. So, if I query "ocean" then I want to return a 20-30 word abstract from just the text of the web page (not the title or url) containing that query term.
I've copied the Nutch schema.xml as my Solr schema.xml.
So I have two questions:
1. Is the "content" field in the Nutch schema.xml the field for body elements of a web page?
2. If this field is not stored, is there a way to have Solr retrieve that field at search time so that it can be highlighted?
I haven't used Nutch in a long time, but I think it's pretty safe to assume that "content" is the field you want to highlight.
You need to store the field to be able to use highlighting. If you want to use the FastVectorHighlighter, you also need to enable the following attributes for that field: termVectors, termPositions and termOffsets.
If you use FVH, you can also use boundaryScanner in Solr 3.5 and up.
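For the query side, here is a small sketch of the request parameters that ask Solr for one highlighted snippet from the "content" field. The parameters hl, hl.fl, hl.snippets and hl.fragsize are standard Solr highlighting parameters; the class name and the fragment size of 200 are assumptions. Note that hl.fragsize is measured in characters, not words, so 200 is only a rough guess at a 20-30 word abstract.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds the query string for a highlighted search.
class HighlightQuery {
    static String build(String term) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "content:" + term);  // search the page body field
        params.put("hl", "true");            // turn highlighting on
        params.put("hl.fl", "content");      // highlight only the body, not title or url
        params.put("hl.snippets", "1");      // one snippet per document
        params.put("hl.fragsize", "200");    // approximate snippet size in characters (assumed value)

        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(e.getKey()).append('=').append(encode(e.getValue()));
        }
        return query.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("ocean"));
    }
}
```

The resulting string is appended to the select handler URL of your own Solr core. The snippets come back in a separate highlighting section of the response, keyed by document id.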