Get solrconfig.xml components in a Java application using SolrJ - solr

I've set the default field for solr search in solrconfig.xml under requesthandler defaults.
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">text</str>
</lst>
</requestHandler>
Is there any way I can retrieve these values through SolrJ? I need this to implement multiple search domains dynamically.

Not sure about SolrJ, but you can retrieve the Solr config directly over HTTP: http://localhost:8983/solr/admin/file/?contentType=text/xml;charset=utf-8&file=solrconfig.xml
Then parse the returned XML for the element you need, for example the df entry under the /select handler's defaults.
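The parsing step could look roughly like the sketch below. It is written in Python with only the standard library, not SolrJ, and it works on an embedded sample string instead of calling the HTTP endpoint. The function name handler_defaults and the <config> wrapper in the sample are assumptions made for illustration. Values come back as raw strings, so any ${...} placeholders in a real solrconfig.xml would not be resolved.

```python
import xml.etree.ElementTree as ET

# Sample solrconfig.xml content (assumption: in practice this text would be
# the body returned by the HTTP request described above).
SAMPLE_SOLRCONFIG = """\
<config>
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="df">text</str>
    </lst>
  </requestHandler>
</config>
"""


def handler_defaults(xml_text, handler_name):
    """Return the defaults of one request handler as a dict of strings.

    Hypothetical helper: raises KeyError if no handler has that name.
    """
    root = ET.fromstring(xml_text)
    for handler in root.iter("requestHandler"):
        if handler.get("name") != handler_name:
            continue
        defaults = handler.find("lst[@name='defaults']")
        if defaults is None:
            return {}
        # Each child (<str>, <int>, ...) carries its key in the name attribute.
        return {child.get("name"): (child.text or "") for child in defaults}
    raise KeyError(handler_name)


if __name__ == "__main__":
    print(handler_defaults(SAMPLE_SOLRCONFIG, "/select"))
```

The same traversal (find the requestHandler by name, then read the children of its defaults list) carries over to any XML parser available in a Java application.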

Related

Apache Solr distributed search using shards is not working (shardsWhitelist)

I have installed the latest Solr and created multiple cores, called UniCore and SAP. According to the Solr 8.7 documentation for distributed search, the following configuration needs to be added to the solrconfig.xml file in each core's conf directory. Below is the solrconfig.xml from the UniCore conf directory:
<requestHandler name="/select" class="solr.SearchHandler">
<!-- other params go here -->
<shardHandlerFactory class="HttpShardHandlerFactory">
<int name="socketTimeout">1000</int>
<int name="connTimeout">5000</int>
<str name="shardsWhitelist">localhost:8983/solr/SAP</str>
</shardHandlerFactory>
</requestHandler>
The query I used to collect data from the shards:
http://localhost:8983/solr/UniCore/select?q=text:searchString&wt=json&indent=true&shards=localhost:8983/solr/SAP
It produced the issue below:
I have googled a lot and am still not able to find a solution. Reference: Stack Overflow question
Modify solr.xml under the Solr server directory: add the core names to shardsWhitelist in solrInstance\Server\solr\solr.xml and restart Solr. This works for Solr 8.7.
<!-- Whitelist the SAP core for distributed search -->
<shardHandlerFactory name="shardHandlerFactory"
class="HttpShardHandlerFactory">
<int name="socketTimeout">${socketTimeout:600000}</int>
<int name="connTimeout">${connTimeout:60000}</int>
<str name="shardsWhitelist">${solr.shardsWhitelist:hostName:8983/solr/SAP}</str>
</shardHandlerFactory>
Configure the HttpShardHandlerFactory shardsWhitelist for all the cores needed, as a comma-separated list:
<str name="shardsWhitelist">${solr.shardsWhitelist:hostName:port/solr/core1,hostName:port/solr/core2,..,hostName:port/solr/coreN}</str>
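Once the whitelist is in place, the distributed query from the question can be assembled programmatically. The following is a minimal sketch in Python; the function name distributed_query_url and its parameters are hypothetical, and it assumes every entry passed in shards also appears in shardsWhitelist.

```python
from urllib.parse import urlencode


def distributed_query_url(base_url, core, query, shards):
    """Build a /select URL whose shards parameter lists the given shards.

    Hypothetical helper: base_url is the Solr root, e.g. http://host:8983/solr.
    """
    params = {
        "q": query,
        "wt": "json",
        "indent": "true",
        "shards": ",".join(shards),
    }
    # Keep ':', '/' and ',' unescaped so the URL reads like the one in the question.
    return f"{base_url}/{core}/select?" + urlencode(params, safe=":/,")


if __name__ == "__main__":
    print(distributed_query_url(
        "http://localhost:8983/solr",
        "UniCore",
        "text:searchString",
        ["localhost:8983/solr/SAP"],
    ))
```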

Solr 6.0: disable OCR for a specific core

I'm trying to disable OCR for a specific core.
I have Solr 6.0 and I configured /update/extract the following way:
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="tika.config">C:\solr-6.0.0\contrib\extraction\lib\tika.config</str>
</lst>
</requestHandler>
and the file tika.config contains:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
<parsers>
<parser class="org.apache.tika.parser.DefaultParser">
<parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
</parser>
</parsers>
</properties>
It doesn't work; OCR still runs. Any idea how to disable it? (I have Tesseract installed and I need it, so please don't suggest removing it.)
Thanks!

Restoring a backup from Solr 8.2.0 results in empty index

We are running Solr 8.2.0 as a standalone system with multiple cores. To test backups, I used the following command:
sudo -u solr curl 'http://localhost:8983/solr/debt/replication?command=backup&location=/tmp/solrbackups/debt/'
The response I got from the following command:
sudo -u solr curl 'http://localhost:8983/solr/debt/replication?command=details&wt=xml'
was:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">3</int>
</lst>
<str name="status">OK</str>
<lst name="details">
<str name="indexSize">119 bytes</str>
<str name="indexPath">/var/solr/data/debt/data/restore.20191211201823908</str>
<arr name="commits">
<lst>
<long name="indexVersion">1576093177904</long>
<long name="generation">1233</long>
<arr name="filelist">
<str>segments_y9</str>
</arr>
</lst>
</arr>
<str name="isMaster">true</str>
<str name="isSlave">false</str>
<long name="indexVersion">1576093177904</long>
<long name="generation">1233</long>
<lst name="master">
<arr name="replicateAfter">
<str>commit</str>
</arr>
<str name="replicationEnabled">true</str>
</lst>
</lst>
</response>
Then I ran a delete command to test out restoring the index from the backup by using:
sudo -u solr curl http://localhost:8983/solr/debt/update -H "Content-type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
Then I stopped and started Solr via /etc/init.d/solr stop|start
When I ran the following command:
sudo -u solr curl 'http://localhost:8983/solr/debt/replication?command=restore&location=/tmp/solrbackups/debt/'
I got a Status OK from the command line.
But when I go to the web API to look at the core, it shows that there are no documents.
Has anyone else had this problem?
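When comparing the state of the core before the backup and after the restore, it can help to pull the key fields out of the command=details response instead of reading the XML by eye. This is a minimal Python sketch over a trimmed copy of the response shown above; the function name replication_summary is hypothetical, and nothing here is specific to SolrJ or to any Solr client library.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the command=details response from the question.
SAMPLE_DETAILS = """\
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
  </lst>
  <str name="status">OK</str>
  <lst name="details">
    <str name="indexSize">119 bytes</str>
    <str name="indexPath">/var/solr/data/debt/data/restore.20191211201823908</str>
    <long name="indexVersion">1576093177904</long>
    <long name="generation">1233</long>
  </lst>
</response>
"""


def replication_summary(xml_text):
    """Extract index size, index path and generation from a details response."""
    root = ET.fromstring(xml_text)
    details = root.find("lst[@name='details']")
    if details is None:
        raise ValueError("response has no details section")
    summary = {}
    for key in ("indexSize", "indexPath"):
        node = details.find(f"str[@name='{key}']")
        summary[key] = node.text if node is not None else None
    generation = details.find("long[@name='generation']")
    summary["generation"] = int(generation.text) if generation is not None else None
    return summary


if __name__ == "__main__":
    print(replication_summary(SAMPLE_DETAILS))
```

Running this against the response captured right after the backup and again after the restore gives two small dictionaries that are easy to compare.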

Share config files across environments in Solr

We have several environments (including several development, staging and production environments), but we currently copy and paste the Solr conf folders and set up solr-data-config.xml separately for each environment, because the file contains the environment's details:
<dataConfig>
<dataSource name="ds-db" type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://10.0.0.40:3306/***"
user="***"
password="**"/>
How can we separate the solr config from the environment data, so that we just have one config folder per search group and have separate environment data?
I would recommend externalising the environment-dependent parameters:
1) DIH
You can do this using placeholders, e.g.
<dataConfig>
<dataSource name="ds-db" type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url ="${dataimporter.request.url}"
user ="${dataimporter.request.user}"
password ="${dataimporter.request.password}"/>
2) solrconfig.xml
<requestHandler name="/dataimport" class="solr.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
<str name="clean">true</str>
...
<str name="url">${db.url:defaultUrl}</str>
<str name="user">${db.user:defaultUser}</str>
<str name="password">${db.password:}</str>
...
</lst>
</requestHandler>
${property.name:defaultValue} is the syntax to use [1].
Then you need to pass the values as Java system properties to the Solr Java process, for example as -Ddb.url=<value> on its command line.
[1] https://lucene.apache.org/solr/guide/6_6/configuring-solrconfig-xml.html#Configuringsolrconfig.xml-JVMSystemProperties
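To make the placeholder semantics concrete, here is a small Python sketch of how a ${name:default} expression resolves against a set of properties. It only illustrates the rule (a supplied property wins, otherwise the default after the first colon is used); it is not Solr's implementation, and the function name resolve is hypothetical.

```python
import re

# ${name} or ${name:default}; the default may be empty and may contain colons.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve(text, properties):
    """Replace every ${name:default} in text using the given properties dict."""

    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in properties:
            return properties[name]
        if default is not None:
            return default
        # Assumption for this sketch: a missing value without a default is an error.
        raise KeyError(f"no value or default for {name}")

    return _PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    line = '<str name="url">${db.url:defaultUrl}</str>'
    print(resolve(line, {}))
    print(resolve(line, {"db.url": "jdbc:mysql://10.0.0.40:3306/mydb"}))
```

With this rule, one shared conf folder can serve every environment, and only the properties passed to each Solr process differ.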

How do I change schema.xml without restarting Solr?

I've added <copyField source="product" dest="text"/> to schema.xml.
solrconfig.xml:
<requestHandler name="/select" class="solr.SearchHandler">
<!-- default values for query parameters can be specified, these
will be overridden by parameters in the request
-->
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">text</str>
</lst>
</requestHandler>
I restarted Solr and loaded the data again to reflect the changes. My question is whether it is necessary to restart Solr every time I make a change in schema.xml.
You can issue a RELOAD command to the core:
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
That lets you avoid restarting Tomcat or Jetty and avoids most of the downtime, because the old core keeps running until the new core is ready.
However, a few configuration changes still require a restart. See https://issues.apache.org/jira/browse/SOLR-3592 and https://wiki.apache.org/solr/CoreAdmin#RELOAD
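If the reload needs to be scripted, for example as the last step of a deployment that changes schema.xml, the RELOAD call can be wrapped as in the Python sketch below. The function names reload_url and reload_core are hypothetical, and the sketch assumes the CoreAdmin endpoint is reachable without authentication.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def reload_url(solr_base, core):
    """Build the CoreAdmin RELOAD URL for one core."""
    return f"{solr_base}/admin/cores?" + urlencode({"action": "RELOAD", "core": core})


def reload_core(solr_base, core, timeout=30):
    """Send the RELOAD request and return the HTTP status code."""
    with urlopen(reload_url(solr_base, core), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Only prints the URL; call reload_core(...) against a running Solr instance.
    print(reload_url("http://localhost:8983/solr", "core0"))
```

Documents indexed before the schema change are not rewritten by a reload, so reindexing is still needed for changes such as a new copyField to take effect on existing data.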

Resources