Can SOLR configuration files be located in parent folders? - solr

I have configured the QueryElevation searchComponent of SOLR as documented here:
http://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field
However, I would like to load the elevate.xml file from several folders above the default one.
I cannot get this to work... all of the following generate an error:
<str name="config-file">../../elevate.xml</str>
<str name="config-file">..\..\elevate.xml</str>
<str name="config-file">c:/elevate.xml</str>
<str name="config-file">c:\elevate.xml</str>

Per the Solr wiki:
Path to the file that defines query elevation. This file must exist in:
${instanceDir}/conf/${config-file} , or
${dataDir}/${config-file}
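The lookup rule quoted from the wiki can be sketched as below. This only spells out the two documented locations; it is not Solr's actual loader, and the function name and example directories are hypothetical.

```python
import posixpath


def elevation_file_candidates(instance_dir, data_dir, config_file):
    """Return the two documented locations, in the order the wiki lists them."""
    return [
        posixpath.join(instance_dir, "conf", config_file),
        posixpath.join(data_dir, config_file),
    ]


# Hypothetical directories, for illustration only.
candidates = elevation_file_candidates(
    "/opt/solr/core1", "/opt/solr/core1/data", "elevate.xml"
)
for path in candidates:
    print(path)
```

Both candidates are built from a fixed base directory plus the config-file value, which is the behaviour the question is running into.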

Related

Dynamic query fields in Solr / Macro substitution in solrconfig.xml

We've got a multilingual search index with the "field-per-language" configuration with a lot of similar aliases in the search handler like this:
<str name="f.content_en.qf">Title_en^10 Text_en^1 ...</str>
<str name="f.content_de.qf">Title_de^10 Text_de^1 ...</str>
...
They are used in the q parameter:
<str name="q">{!edismax qf=$searchField pf=$searchField v=$searchText}</str>
The client knows which language should be used and calls Solr like this, e.g.: /solr/core/search?searchText=TEXT&searchField=content_en
That works fine, but the configuration contains a lot of similar stuff.
I'd like to optimize the config to something like this:
<str name="df">content</str>
<str name="f.content.qf">Title_${lang}^10 Text_${lang}^1...</str>
Then the client would need to provide the lang parameter only.
I tried to use the concat function like this:
paramLang=en
searchFields=concat("Title", "_", "${paramLang}", " ", "Text", "_", "${paramLang}")
and use it as the qf:
q={!edismax qf=$searchFields v=$searchText}
But it seems the qf local param does not support Solr functions.
Is this possible with Solr at all?
Actually, Parameter Substitution / Macro Expansion works fine.
The issue was with those macros in solrconfig.xml: they conflict with Solr's system property substitution, which uses the same ${...} syntax, so Solr could not resolve the query parameter macros.
I could not find a proper way to escape the query parameter macros, so I used the following workaround:
<lst name="invariants">
<str name="defType">edismax</str>
<str name="searchFields">
Title_${lang:${lang}}^10
Text_${lang:${lang}}^1
...
<lst name="defaults">
<str name="q">*</str>
<str name="qf">${searchFields:${searchFields}}</str>
<str name="pf">${searchFields:${searchFields}}</str>
<str name="lang">en</str>
...
Query URL: /search?q=TEXT&lang=en
Update: the proper way to deal with variable substitution in solrconfig.xml is to escape the dollar character with $$:
<str name="searchFields">
Title_$${lang}^10
Text_$${lang}^1
...
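The two substitution passes can be modelled with a small toy sketch. It assumes, as the update above describes, that "$${" is the escape that leaves a literal "${" for the request-time pass. It is not Solr's code, the function names are hypothetical, and treating an unresolved property as an error is an assumption of this sketch.

```python
import re

# Config-load pass: "$${" is an escape, "${name}" / "${name:default}" is a property.
_CONFIG_TOKEN = re.compile(r"\$\$\{|\$\{([^}:]+)(?::([^}]*))?\}")
# Request-time pass: plain "${name}" macros filled from request parameters.
_REQUEST_MACRO = re.compile(r"\$\{(\w+)\}")


def substitute_properties(text, props):
    """Toy model of the pass that runs when solrconfig.xml is loaded."""
    def repl(match):
        if match.group(1) is None:      # matched the "$${" escape
            return "${"
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        # Assumption: an unresolved property without a default is an error.
        raise KeyError(name)
    return _CONFIG_TOKEN.sub(repl, text)


def expand_request_macros(text, params):
    """Toy model of the pass that runs per request; unknown macros are left as-is."""
    return _REQUEST_MACRO.sub(lambda m: params.get(m.group(1), m.group(0)), text)


stored = substitute_properties("Title_$${lang}^10 Text_$${lang}^1", {})
expanded = expand_request_macros(stored, {"lang": "en"})
print(stored)
print(expanded)
```

Without the escape, the first pass would try to resolve lang as a system property before any request exists, which is the conflict described above.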
Update #2: do NOT define macros in the invariants or appends sections when using SolrCloud! Otherwise, you'll get a weird exception, e.g.:
"undefined field: \"Text_$\"
or
"msg": "Error from server at null: org.apache.solr.search.SyntaxError: Query Field '${searchFields}' is not a valid field name"
P.S. wt=json as "invariant" is also NOT compatible with Solr Cloud, giving "unexpected" content-type error.
So many "surprises" :(

SOLR more-like-this query parser returns no results

I am trying to get the more-like-this query parser working on my test system. The test system has SOLR cloud 6.5.0 installed. The MLT handler is enabled with the following configuration:
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
<lst name="defaults">
<str name="mlt.qf">search_text_st</str>
<str name="mlt.fl">search_text_st</str>
<int name="mlt.minwl">4</int>
<int name="mlt.maxwl">18</int>
</lst>
</requestHandler>
When I query for documents similar to a specific document with the handler, I get the expected results. For example:
http://localhost:8983/solr/MyCloud/mlt?q=id:123
The above query will return results:
"response":{"numFound":361,"start":0,"maxScore":113.24594,"docs":[...]}
However, when I try an equivalent query using the MLTQParser with {!mlt qf=search_text_st fl=search_text_st minwl=4 maxwl=18}123, I get no results:
http://localhost:8983/solr/MyCloud/select?q=%7B!mlt+qf%3Dsearch_text_st+fl%3Dsearch_text_st+minwl%3D4+maxwl%3D18%7D123
The response looks like this:
"response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]}
I have done nothing so far to enable or configure MLTQParser, but it does appear to be enabled because I get an error when using a document ID that doesn't exist.
Any idea why this is not working?
I eventually figured out why this was failing. The search_text_st field was being created using copyField. The Cloud MLT Query Parser uses the realtime get handler to retrieve the fields to be mined for keywords. Because of the way the realtime get handler is implemented, it does not return data for fields populated using copyField. (see https://issues.apache.org/jira/browse/SOLR-3743)
Changing the configuration to use the source fields made it work.
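For reference, the local-params query and the encoded URL from the question can be reproduced with the sketch below. The helper name is hypothetical. After the fix, qf and fl would name the source fields instead of search_text_st.

```python
from urllib.parse import quote_plus


def mlt_query(doc_id, **local_params):
    """Build a '{!mlt key=value ...}id' query string for the MLT query parser."""
    params = " ".join(f"{key}={value}" for key, value in local_params.items())
    return "{!mlt " + params + "}" + str(doc_id)


q = mlt_query(123, qf="search_text_st", fl="search_text_st", minwl=4, maxwl=18)
# safe="!" keeps the exclamation mark unescaped so the output matches the
# URL in the question; that choice is purely cosmetic.
url = "http://localhost:8983/solr/MyCloud/select?q=" + quote_plus(q, safe="!")
print(q)
print(url)
```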

How to index a PDF / Word doc in Apache Solr

I am new to the big data environment, so I apologize in advance if this question is naive.
I want to read Word / PDF documents and index them in Solr. I understand that Solr accepts JSON or XML, not Word / PDF / txt files. Is it necessary to convert a Word / PDF document into JSON or XML before sending it to Solr? I initially thought I should use Tika, but my understanding is that Tika can convert a PDF to text, not to JSON.
Could you please explain how to index such documents in Solr?
Thanks for the help
The standard endpoint for indexing 'rich files' is update/extract, so if you post your file to that destination, Solr will run it through Tika internally and extract the text and properties. You can provide literal values through the URL (such as an ID, filename, or other metadata) with literal.fieldname=value arguments.
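A minimal sketch of such a request, using only the Python standard library. The core URL, field names and file name are hypothetical, and post_file is only defined here, not called. It assumes the raw file body is sent with a matching Content-Type header, as in the curl examples in the Solr manual.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def extract_url(base, literals, commit=True):
    """Build an update/extract URL carrying literal.<field>=<value> arguments."""
    params = {f"literal.{name}": value for name, value in literals.items()}
    if commit:
        params["commit"] = "true"
    return f"{base}/update/extract?{urlencode(params)}"


def post_file(url, path, content_type="application/pdf"):
    """POST the raw file body to Solr and return the HTTP status code."""
    with open(path, "rb") as handle:
        request = Request(
            url,
            data=handle.read(),
            headers={"Content-Type": content_type},
            method="POST",
        )
    with urlopen(request) as response:
        return response.status


# Hypothetical core and metadata, for illustration only.
url = extract_url(
    "http://localhost:8983/solr/mycore",
    {"id": "doc1", "filename": "report.pdf"},
)
print(url)
```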
The "Uploading Data with Solr Cell using Apache Tika" section of the manual gives a low-level introduction to submitting documents with curl over HTTP, as well as the configuration options required to enable automatic extraction (which is enabled in a few of the examples; data driven and tech products, IIRC):
If you are not working with the supplied sample_techproducts_configs or data_driven_schema_configs config set, you must configure your own solrconfig.xml to know about the JARs containing the ExtractingRequestHandler and its dependencies:
<lib dir="${solr.install.dir:../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../..}/dist/" regex="solr-cell-\d.*\.jar" />
You can then configure the ExtractingRequestHandler in solrconfig.xml.
<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
<lst name="defaults">
<str name="fmap.Last-Modified">last_modified</str>
<str name="uprefix">ignored_</str>
</lst>
<!--Optional. Specify a path to a tika configuration file. See the Tika docs for details.-->
<str name="tika.config">/my/path/to/tika.config</str>
<!-- Optional. Specify one or more date formats to parse. See DateUtil.DEFAULT_DATE_FORMATS
for default date formats -->
<lst name="date.formats">
<str>yyyy-MM-dd</str>
</lst>
<!-- Optional. Specify an external file containing parser-specific properties.
This file is located in the same directory as solrconfig.xml by default.-->
<str name="parseContext.config">parseContext.xml</str>
</requestHandler>

Set default search fields in Apache Solr

I am trying to implement Apache Solr search through the SolrNet library. So far I have managed to run an instance of Solr on my machine and make some queries based on specific fields.
My code to do it looks like this
var solr = ServiceLocator.Current.GetInstance<ISolrOperations<Product>>();
var results = solr.Query(new SolrQueryByField("id", "SP2514N"));
This works fine, but I would like to make queries without specifying a field, so that when I enter a search keyword Solr looks in all available fields and returns a result. I found the code to do this with the SolrNet library here:
var solr = ServiceLocator.Current.GetInstance<ISolrOperations<Product>>();
var results = solr.Query(new SolrQuery("SP2514N"));
But this never worked. When I dug deeper, I found that I need to set default search fields in the Solr instance so that Solr searches those fields when no field is specified (this is how I understood it; I am not sure about this).
So I went to set the default fields in Solr. I took solrconfig.xml and edited it like this:
<requestHandler name="/query" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="wt">json</str>
<str name="indent">true</str>
<str name="df">text</str>
<str name="df">id</str>
</lst>
</requestHandler>
(I only added <str name="df">id</str> as an extra field.) But this did not help either, and I am stuck. Can anyone tell me how to set the default search field in Solr correctly? Or am I doing something else wrong?
I have uploaded my solrconfig.xml file here
I do not know the SolrNet library, but to set a default search field you need to define defaultSearchField in schema.xml, i.e. <defaultSearchField>FieldName</defaultSearchField>.
You can find this file at <SOLR_HOME>\apache-solr-3.6.0\example\example-DIH\solr\testsyndrome\conf\schema.xml
I hope that's what you are looking for.
Don't start from SolrNet, use Solr's built-in Web Admin interface. Iterate there until you understand the request handlers and the parameters. Then, go back to SolrNet.
In your case, it seems that you changed default request handler and tried to use df parameter twice. I would stick to the original request handler for now just to avoid the extra issue.
With the df parameter, are you trying to search a single field or multiple fields? If a single field, keep only one value for the parameter. If multiple, you need to switch to eDisMax, where you can provide a set of fields through qf.
Again, the admin interface lets you experiment with it; then you can add it to the handler's default parameters.
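As a rough sketch of the multi-field case, the request parameters could look like this. The helper name is hypothetical, and the field names are the two from the question's config. The same values can be tried in the admin interface first and then moved into the handler defaults.

```python
from urllib.parse import urlencode


def edismax_params(text, fields):
    """Encode a query that searches several fields via eDisMax's qf parameter."""
    return urlencode({"q": text, "defType": "edismax", "qf": " ".join(fields)})


query_string = edismax_params("SP2514N", ["id", "text"])
print(query_string)
```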

Solr indexing fails on server.request(up)

While indexing into Solr, I am getting an error like this:
HTTP Status 500 - lazy loading error
org.apache.solr.common.SolrException: lazy loading error
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:260)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
The URL formed is : http://localhost:8080/solr/update/extract?Latitude=51.9125&Longitude=179.5&commit=true&waitFlush=true&waitSearcher=true&wt=javabin&version=2
(I have configured Tomcat using XAMPP on a Windows machine.)
I have been following Stack Overflow and various other blogs and forums and have tried to debug this for hours, but I could not find anything.
I have added the following to solr.xml:
<maxFieldLength>10000</maxFieldLength>
<writeLockTimeout>60000</writeLockTimeout>
<commitLockTimeout>60000</commitLockTimeout>
<lockType>simple</lockType>
<unlockOnStartup>true</unlockOnStartup>
<reopenReaders>true</reopenReaders>
<requestParsers enableRemoteStreaming="true"
multipartUploadLimitInKB="2048000" />
<lst name="defaults">
<!--str name="echoParams">explicit</str-->
<!--int name="rows">10</int-->
<!--str name="df">text</str-->
<str name="Latitude">Latitude</str>
<str name="Longitude">Longitude</str>
</lst>
I even tried adding the following to solrconfig.xml and restarting Tomcat:
<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
<lst name="defaults">
<str name="ext.map.Last-Modified">last_modified</str>
<bool name="ext.ignore.und.fl">true</bool>
</lst>
</requestHandler>
On the Java console it shows an error :
org.apache.solr.common.SolrException: Internal Server Error
I realized the issue might be caused by my Solr home path. I created a new directory, copied all the config files there, and set that as my Solr home. I later updated solrconfig.xml, correcting the paths for all the jars.
I also tried adding the pdfbox and fontbox jars to the Solr lib folder and restarting Tomcat.
My Java code is :
String urlString = "http://localhost:8080/solr";
SolrServer server = new CommonsHttpSolrServer(urlString);
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
String fileName=f.toString();
up.addFile(new File(fileName));
up.setParam("Latitude", Latitude);
up.setParam("Longitude", Longitude);
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(up);
(Port 8080 is the one I have configured.)
Solr indexing is still not working on my end. I have spent hours debugging this. It would be really great if you could give me a hint or point out anything I am doing wrong.
For reference, I have already tried:
http://wiki.apache.org/solr/FrontPage
http://wiki.apache.org/solr/ContentStreamUpdateRequestExample
http://wiki.apache.org/solr/UpdateRichDocuments
http://wiki.apache.org/solr/ExtractingRequestHandler#Configuration
http://lucene.472066.n3.nabble.com/Problem-using-ExtractingRequestHandler-with-tomcat-td494930.html
http://lucene.472066.n3.nabble.com/Internal-Server-Error-td715713.html
How to index pdf's content with SolrJ?
I finally found a way to solve this.
Just modify SOLR_HOME/conf/solrconfig.xml: change the dir attribute of every <lib> tag from "../../dist/" to "../dist/", then save the file.
If you encounter this problem, the Solr directory has probably been moved from apache-solr-x.x.x\example to some other place, so the relative path "../../dist/" needs to be changed accordingly.
Remember to restart Tomcat and check whether it works.
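Why the relative path breaks after a move can be seen with a short sketch. It assumes relative lib directories are resolved against the Solr instance directory; the directory layouts are hypothetical.

```python
import posixpath


def resolve_lib_dir(instance_dir, lib_dir):
    """Resolve a relative <lib dir="..."> value against the instance directory."""
    return posixpath.normpath(posixpath.join(instance_dir, lib_dir))


# Hypothetical layouts: the shipped example location, then a relocated Solr home.
original = resolve_lib_dir("/opt/apache-solr/example/solr", "../../dist/")
moved = resolve_lib_dir("/srv/solr-home", "../../dist/")
print(original)
print(moved)
```

In the shipped layout the path climbs two levels and lands in the distribution's dist folder. After the move the same two levels lead somewhere unrelated, so the jars are not found and the handler fails to load.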
Do you have solr-cell.jar in your classpath?
The ExtractingRequestHandler is in solr-cell.jar, which is not packaged with the default solr-server.
