I have configured Solr 3.1 with Apache Tika 0.9 successfully.
I didn't change schema.xml (the default schema) or the solrconfig.xml file.
I passed this command to the browser:
http://localhost:8080/solr/update/extract?literal.id=post1&commit=true%20-F%20%22myfile=#D:\code.txt%22
Output :
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">593</int>
</lst>
</response>
But whenever I search from http://localhost:8080/solr/admin/, it doesn't return any records.
Please help me with this ASAP.
Thanks,
Dhaval,
I think the myfile=#d:\code.txt syntax is only understood by the command-line utility curl. Most browsers won't support it.
Retry it with curl, and I think it should work for you. Then look at further Solr examples for how to do the POST from a browser if you really need to.
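For example, a minimal curl invocation along these lines should do the upload; note that curl references a local file with @ rather than #, and the path and literal.id here are just the values from your example:
# post D:\code.txt to the extract handler and commit
curl "http://localhost:8080/solr/update/extract?literal.id=post1&commit=true" -F "myfile=@D:\code.txt"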
I have installed the latest Solr and created multiple cores called UniCore and SAP. As per the Solr 8.7 distributed-search configuration, the following code needs to be added to the solrconfig.xml file in each core's configuration directory. Below is the solrconfig.xml from the UniCore conf:
<requestHandler name="/select" class="solr.SearchHandler">
<!-- other params go here -->
<shardHandlerFactory class="HttpShardHandlerFactory">
<int name="socketTimeout">1000</int>
<int name="connTimeout">5000</int>
<str name="shardsWhitelist">localhost:8983/solr/SAP</str>
</shardHandlerFactory>
</requestHandler>
The query I used to collect data from the shards:
http://localhost:8983/solr/UniCore/select?q=text:searchString&wt=json&indent=true&shards=localhost:8983/solr/SAP
I found the issue below.
I googled a lot and am still not able to find the solution. Reference: Stack Overflow question.
Modify solr.xml under the Solr server directory: update this file by adding the core names to shardsWhitelist in solrInstance\Server\solr\solr.xml and restart Solr. This works for the Solr 8.7 version.
<!-- whitelisting the SAP core for distributed search -->
<shardHandlerFactory name="shardHandlerFactory"
class="HttpShardHandlerFactory">
<int name="socketTimeout">${socketTimeout:600000}</int>
<int name="connTimeout">${connTimeout:60000}</int>
<str name="shardsWhitelist">${solr.shardsWhitelist:hostName:8983/solr/SAP}</str>
</shardHandlerFactory>
Configure the HttpShardHandlerFactory shardsWhitelist for all the cores needed:
<str name="shardsWhitelist">${solr.shardsWhitelist:hostName:port/solr/core1,hostName:port/solr/core2,...,hostName:port/solr/coreN}</str>
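As a concrete sketch for the two cores described above, assuming both run on the local node on port 8983, the entry would look roughly like this:
<shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
  <int name="socketTimeout">${socketTimeout:600000}</int>
  <int name="connTimeout">${connTimeout:60000}</int>
  <!-- whitelist both cores so either can be queried as a shard -->
  <str name="shardsWhitelist">${solr.shardsWhitelist:localhost:8983/solr/UniCore,localhost:8983/solr/SAP}</str>
</shardHandlerFactory>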
I need to make some adaptations in a project that uses Apache Solr for full-text searches. Someone configured everything on the production machine, and I want to prepare everything locally and deploy the whole new version at once.
I already created a working Vagrant setup for everything, and it works well.
But my problem is that I am not very experienced with configuring Apache Solr and can't manage to get it working.
Here is my installation script:
apt-get install -q -y openjdk-8-jdk
# install apache solr
if [[ ! -e "/etc/default/solr.in.sh" ]]
then
wget http://www-eu.apache.org/dist/lucene/solr/7.7.1/solr-7.7.1.tgz
tar xzf solr-7.7.1.tgz solr-7.7.1/bin/install_solr_service.sh --strip-components=2
chmod u+x ./install_solr_service.sh
./install_solr_service.sh solr-7.7.1.tgz
cat /vagrant/config/solr/solr.in.sh >> /etc/default/solr.in.sh
rm -f /opt/solr-7.7.1/server/solr/solr.xml
ln -s /vagrant/config/solr/solr.xml /opt/solr-7.7.1/server/solr/solr.xml
fi
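As far as I understand, install_solr_service.sh registers Solr as a system service named solr and starts it, so after provisioning I can check it with something like:
# check that the Solr service came up on the Vagrant box
sudo service solr status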
Contents of /vagrant/config/solr/solr.in.sh
(content taken from the production config; I don't really understand its purpose):
# this is just a partial file - we append its contents to the original
SOLR_RECOMMENDED_OPEN_FILES=65000
Content of the linked solr.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<solr>
<str name="coreRootDirectory">${coreRootDirectory:/vagrant/config/solr/cores}</str>
<solrcloud>
<str name="host">${host:}</str>
<int name="hostPort">${jetty.port:8983}</int>
<str name="hostContext">${hostContext:solr}</str>
<bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
<int name="zkClientTimeout">${zkClientTimeout:30000}</int>
<int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
<int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
<str name="zkCredentialsProvider">${zkCredentialsProvider:org.apache.solr.common.cloud.DefaultZkCredentialsProvider}</str>
<str name="zkACLProvider">${zkACLProvider:org.apache.solr.common.cloud.DefaultZkACLProvider}</str>
</solrcloud>
<shardHandlerFactory name="shardHandlerFactory"
class="HttpShardHandlerFactory">
<int name="socketTimeout">${socketTimeout:600000}</int>
<int name="connTimeout">${connTimeout:60000}</int>
<str name="shardsWhitelist">${solr.shardsWhitelist:}</str>
</shardHandlerFactory>
</solr>
The cores directory contains all the information from the production machine; I just added the following value to the core.properties file within each core:
dataDir=/var/solr/data/NAME_OF_CORE
I figured that this way the data would live on the machine while the config stays part of my repository.
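For illustration, a core.properties under the linked cores directory now looks roughly like this (the core name my_core is just a placeholder):
# /vagrant/config/solr/cores/my_core/core.properties
name=my_core
# keep the index data on the VM instead of the shared folder
dataDir=/var/solr/data/my_core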
But when I browse to localhost:8983 (which works perfectly) I don't see any cores. Nor can I create a new core; when creating a new core called "new_core" it says:
new_core: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load conf for core new_core: Error loading solr config from /var/solr/data/new_core/conf/solrconfig.xml
So, how would I provision Solr correctly to keep all my config in git but the data on the machine?
The company that set everything up is not helpful; they provide ZERO information.
Kind regards,
Philipp
How do I do a full import from PostgreSQL into Solr?
I want to do a full import but am not able to. I have all the files. Here is the process I followed:
I created one collection, SA_APP_MASTER.
I have all the data in a local PostgreSQL database.
I created one XML file which connects to my local PostgreSQL and selects the data.
I have one more file, the solrconfig file.
Now my question is: in which directory of my Solr installation do I have to store that file so I can do the full import?
The Data Import Handler configuration file (the one XML file which connects to your local PostgreSQL and selects the data) has to be registered in solrconfig.xml (the solrconfig.xml file is located in the conf/ directory of each collection). For example:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">/path/to/my/DIHconfigfile.xml</str>
</lst>
</requestHandler>
Or you can put your DIH XML file in the same folder where solrconfig.xml is located and register it in solrconfig.xml as follows:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">DIHconfigfile.xml</str>
</lst>
</requestHandler>
The only required parameter is the config parameter, which specifies the location of the DIH configuration file that contains specifications for the data source, how to fetch data, what data to fetch, and how to process it to generate the Solr documents to be posted to the index.
You can have multiple DIH configuration files. Each file would require a separate definition in the solrconfig.xml file, specifying a path to the file.
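For a PostgreSQL source, the DIH configuration file itself might look roughly like this; the database name, table, and column names below are placeholders, and the PostgreSQL JDBC driver jar must also be on Solr's classpath:
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/mydb"
              user="postgres"
              password="xxxx"/>
  <document>
    <!-- map the selected columns onto fields defined in the Solr schema -->
    <entity name="master"
            query="SELECT id, name FROM sa_app_master">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
Once the handler is registered, a full import can then be triggered with a request such as http://localhost:8983/solr/SA_APP_MASTER/dataimport?command=full-import.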
I'm trying to set up a very basic configuration of Solr, to read some text from a MySQL table and index it. I'm following the steps in the DIH Quick Start document.
The document doesn't tell you where to place solrconfig.xml.
At first I tried placing it under the solr-5.3.1 folder (next to bin). That failed. Then I noticed the "Add Core" button was looking for it in server\solr\new_core. So I put it there, but then got this other error:
My data import handler looks like this:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
And here's data-config.xml:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/ctcrets"
user="root"
password="xxxx"/>
<document>
<entity name="id"
query="select RETS_STAGE1_QUEUE_ID as id, LN_LIST_NUMBER as name, xmlText as desc from RETS_STAGE1_QUEUE">
</entity>
</document>
</dataConfig>
What could be the problem?
The document assumes you already know the solr.home [1] directory structure. On top of that, I think it assumes you started the sample Solr instance (e.g. ./solr start -p 8984), where everything should already be set up.
Once started, you can see on the dashboard where exactly the configuration is located. Go there, change the files as suggested, and RELOAD the core through the admin console (Core Admin). If you want, you can also do a stop / restart.
As side notes:
the DIH is not part of the Solr core, so you should put a "lib" directive within solrconfig.xml; as far as I remember, the sample config already has those directives, so you don't need to "import" the DIH lib yourself (see the sketch below)
the JDBC driver that allows the connection to the database is not included, so your classpath (i.e. the JVM or Solr classpath, through the same lib directive) must include this additional lib
[1] http://www.solrtutorial.com/configuring-solr.html
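As a sketch of the lib directives mentioned above: the first pattern matches the DIH jar shipped in Solr's dist/ directory, while the driver directory is just an assumed path you would point at wherever your MySQL connector jar lives.
<!-- pull in the Data Import Handler jars shipped with Solr -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<!-- JDBC driver for the database; adjust the directory to your setup -->
<lib dir="/path/to/jdbc-drivers/" regex="mysql-connector-java-.*\.jar" />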
I've added "copyField source="product" dest="text"/" in schema.xml
solrconfig.xml
<requestHandler name="/select" class="solr.SearchHandler">
<!-- default values for query parameters can be specified, these
will be overridden by parameters in the request
-->
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">text</str>
</lst>
</requestHandler>
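With this df setting and the copyField above, a query without an explicit field should be matched against the copied text field, for example (core name and search term are placeholders):
http://localhost:8983/solr/mycore/select?q=someProductName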
I restarted Solr and loaded the data again for the changes to take effect. My question is whether it is necessary to restart Solr every time I make a change in schema.xml.
You can issue a RELOAD command to the core:
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
That would let you avoid restarting Tomcat or Jetty and avoid most of the downtime, as it will keep the old core running until the new core is ready.
However, there are a few configuration changes that would require a restart. See https://issues.apache.org/jira/browse/SOLR-3592 and https://wiki.apache.org/solr/CoreAdmin#RELOAD