We have several environments (including several development, staging and production environments), but we currently copy and paste the Solr conf folders and set up solr-data-config.xml for each environment, because the file contains the environment's details:
<dataConfig>
<dataSource name="ds-db" type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://10.0.0.40:3306/***"
user="***"
password="**"/>
How can we separate the Solr config from the environment data, so that we have just one config folder per search group and keep the environment data separate?
I would recommend externalising the environment-dependent parameters:
1) DIH
You can achieve this with placeholders, e.g.:
<dataConfig>
<dataSource name="ds-db" type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="${dataimporter.request.url}"
user="${dataimporter.request.user}"
password="${dataimporter.request.password}"/>
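2) solrconfig.xml
The defaults declared on the handler are passed to DIH as request parameters, which is what the ${dataimporter.request.*} placeholders in the DIH config read: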
<requestHandler name="/dataimport" class="solr.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
<str name="clean">true</str>
...
<str name="url">${db.url:defaultUrl}</str>
<str name="user">${db.user:defaultUser}</str>
<str name="password">${db.password:}</str>
...
</lst>
</requestHandler>
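${property.name:defaultValue} is the syntax to use [1]; the part after the colon is the value used when the property is not set.
Then you need to pass the values as Java system properties to the Solr Java process (for example, as -Ddb.user=... options on the command line that starts Solr).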
[1] https://lucene.apache.org/solr/guide/6_6/configuring-solrconfig-xml.html#Configuringsolrconfig.xml-JVMSystemProperties
Related
I have installed the latest Solr and created multiple cores called Unicore and SAP. As per the Solr 8.7 distributed search configuration, we need to add the following code to the solrconfig.xml file in the respective configuration directories. Below is the solrconfig.xml from the Unicore conf:
<requestHandler name="/select" class="solr.SearchHandler">
<!-- other params go here -->
<shardHandlerFactory class="HttpShardHandlerFactory">
<int name="socketTimeout">1000</int>
<int name="connTimeout">5000</int>
<str name="shardsWhitelist">localhost:8983/solr/SAP</str>
</shardHandlerFactory>
</requestHandler>
The query I used to collect data from the shards:
http://localhost:8983/solr/UniCore/select?q=text:searchString&wt=json&indent=true&shards=localhost:8983/solr/SAP
I ran into the following issue:
I googled a lot and still could not find a solution. Reference: Stack Overflow question
Modify solr.xml under the Solr server directory: add the core names to shardsWhitelist in solrInstance\server\solr\solr.xml and restart Solr. This works for Solr 8.7.
<!-- Whitelist the SAP core for distributed search -->
<shardHandlerFactory name="shardHandlerFactory"
class="HttpShardHandlerFactory">
<int name="socketTimeout">${socketTimeout:600000}</int>
<int name="connTimeout">${connTimeout:60000}</int>
<str name="shardsWhitelist">${solr.shardsWhitelist:hostName:8983/solr/SAP}</str>
</shardHandlerFactory>
Configure the HttpShardHandlerFactory shardsWhitelist for all the cores needed, as a single comma-separated list:
<str name="shardsWhitelist">${solr.shardsWhitelist:hostName:port/solr/core1,hostName:port/solr/core2,...,hostName:port/solr/coreN}</str>
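As a concrete sketch of the list form above, assuming both cores from the question's query URL (UniCore and SAP) run on the same local Solr instance on port 8983, the entry in solr.xml could look like this. The host, port and timeout defaults are assumptions carried over from the snippets in this thread, so adjust them to your installation.

```xml
<solr>
  <!-- other solr.xml settings stay as they are -->
  <shardHandlerFactory name="shardHandlerFactory"
                       class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:600000}</int>
    <int name="connTimeout">${connTimeout:60000}</int>
    <!-- One comma-separated list of every core that may be used as a shard.
         The default after the first colon applies unless the
         solr.shardsWhitelist system property is set. -->
    <str name="shardsWhitelist">${solr.shardsWhitelist:localhost:8983/solr/UniCore,localhost:8983/solr/SAP}</str>
  </shardHandlerFactory>
</solr>
```

Because the value is wrapped in a property placeholder, each environment can override the list at startup without editing solr.xml.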
I'm trying to disable OCR for a specific core.
I have Solr 6.0 and I configured /update/extract the following way:
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="tika.config">C:\solr-6.0.0\contrib\extraction\lib\tika.config</str>
</lst>
</requestHandler>
and the file tika.config contains:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
<parsers>
<parser class="org.apache.tika.parser.DefaultParser">
<parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
</parser>
</parsers>
</properties>
It doesn't work; I can't disable OCR. Any idea how to disable it? (I have Tesseract installed and I need it, so please don't suggest removing it.)
Thanks!
How to do a full import from PostgreSQL into Solr.
I want to do a full import, but I am not able to. I have all the files. Here is the process I followed:
I created one collection, SA_APP_MASTER.
I have all the data in a local PostgreSQL database.
I created one XML file which connects to my local PostgreSQL and selects the data.
I have one more file, the solrconfig file.
Now my question is: in which directory of my Solr installation do I have to store that file so that I can do a full import?
The Data Import Handler configuration (the XML file which connects to your local PostgreSQL and selects the data) has to be registered in solrconfig.xml (the solrconfig.xml file is located in the conf/ directory of each collection). For example:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">/path/to/my/DIHconfigfile.xml</str>
</lst>
</requestHandler>
Or you can put your DIH XML file in the same folder as solrconfig.xml and register it in solrconfig.xml as follows:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">DIHconfigfile.xml</str>
</lst>
</requestHandler>
The only required parameter is the config parameter, which specifies the location of the DIH configuration file that contains specifications for the data source, how to fetch data, what data to fetch, and how to process it to generate the Solr documents to be posted to the index.
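For the PostgreSQL case in the question, such a DIH configuration file could look roughly like the sketch below. The database name, credentials, table and column names are hypothetical placeholders, and the PostgreSQL JDBC driver jar is assumed to be on Solr's classpath.

```xml
<dataConfig>
  <!-- Data source: how to connect (all values are placeholders) -->
  <dataSource type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/mydb"
              user="solr_reader"
              password="changeme"/>
  <document>
    <!-- Entity: what to fetch; each returned row becomes one Solr document -->
    <entity name="app_master" query="SELECT id, name FROM app_master">
      <!-- Field mapping: database column to a Solr field defined in the schema -->
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```

With the handler registered as /dataimport, a full import is then started with a request such as http://localhost:8983/solr/SA_APP_MASTER/dataimport?command=full-import (host and port assumed to be the defaults).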
You can have multiple DIH configuration files. Each file would require a separate definition in the solrconfig.xml file, specifying a path to the file.
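A minimal sketch of two such definitions side by side is shown here; the handler names and file names are hypothetical, and both files are assumed to sit next to solrconfig.xml in conf/.

```xml
<!-- First import definition (hypothetical names) -->
<requestHandler name="/dataimport-products" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">products-data-config.xml</str>
  </lst>
</requestHandler>

<!-- Second import definition, reachable under its own path -->
<requestHandler name="/dataimport-customers" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">customers-data-config.xml</str>
  </lst>
</requestHandler>
```

Each handler is then triggered independently through its own path, so one import can be run without touching the other.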
I've added <copyField source="product" dest="text"/> to schema.xml.
solrconfig.xml:
<requestHandler name="/select" class="solr.SearchHandler">
<!-- default values for query parameters can be specified, these
will be overridden by parameters in the request
-->
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">text</str>
</lst>
</requestHandler>
I restarted Solr and loaded the data again to reflect the changes made. My question is whether it is necessary to restart Solr every time I make a change in schema.xml.
You can issue a RELOAD command to the core:
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
That lets you avoid restarting Tomcat or Jetty and avoids most of the downtime, as the old core keeps running until the new core is ready.
However, a few configuration changes do require a restart. See https://issues.apache.org/jira/browse/SOLR-3592 and https://wiki.apache.org/solr/CoreAdmin#RELOAD
I am trying to configure Solr with MS SQL Server.
I found only this tutorial, which is a bit old (2011).
Is there an updated tutorial?
Is there a formal tutorial?
Steps to configure Solr on Tomcat:
http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apache-tomcat-7-0-20/
Everything about the Data Import Handler can be found here:
http://wiki.apache.org/solr/DataImportHandler
After creating the DIH configuration (e.g. a file named data-config.xml), you need to add this request handler to solrconfig.xml, as below:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">/home/username/data-config.xml</str>
</lst>
</requestHandler>
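Solr also has to be able to load the handler class and the JDBC driver. One way to do that, sketched here under the assumption of a standard Solr distribution layout and a hypothetical directory for the SQL Server driver jar, is to add <lib> directives to the same solrconfig.xml. The jar name patterns vary between Solr and driver versions, so treat them as placeholders.

```xml
<!-- Load the DataImportHandler jars from the distribution's dist/ folder.
     The relative path depends on where the core lives; adjust as needed. -->
<lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar"/>

<!-- Load the SQL Server JDBC driver; directory and jar pattern are assumptions. -->
<lib dir="/home/username/jdbc/" regex="mssql-jdbc-.*\.jar"/>
```

An alternative that avoids the extra directives is to copy the jars into the core's lib/ directory, which Solr adds to the classpath on its own.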
Sample MS SQL Server DIH configuration:
data-config.xml:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
url="jdbc:sqlserver://myserver;databaseName=mydb;responseBuffering=adaptive;selectMethod=cursor"
user="sa"
password="password"/>
<document>
<entity name="results" query="SELECT statements">
<field column="fielda" name="fielda"/>
<field column="fieldb" name="fieldb"/>
<field column="fieldc" name="fieldc"/>
</entity>
</document>
</dataConfig>