Issue while creating Solr core on HDFS

I am trying to create a Solr core on HDFS in a standalone instance (Solr 5.3.0 and Hadoop 2.7). I have started the service like below:
$ bin/solr start -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.data.dir=hdfs://localhost:9000/tmp -Dsolr.updatelog=hdfs://localhost:9000/tmp -s solr-cores/core1
Waiting up to 30 seconds to see Solr running on port 8983 [/]
Started Solr server on port 8983 (pid=42277). Happy searching!
And I am trying to create the core like below:
bin/solr create -c hdfsstarted -d /home/admin/HadoopTools/solr-5.3.0/server/solr/configsets/data_driven_schema_configs_hdfs/conf -n hdfsstarted
But I am getting the below error:
Setup new core instance directory:
/home/admin/HadoopTools/solr-5.3.0/solr-cores/core1/hdfsstarted
Creating new core 'hdfsstarted' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=hdfsstarted&instanceDir=hdfsstarted
ERROR: Error CREATEing SolrCore 'hdfsstarted': Unable to create core [hdfsstarted] Caused by: Protocol message end-group tag did not match expected tag.
I have modified the solrconfig.xml like below:
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://10.67.5.244:50070/tmp</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">1</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
  <bool name="solr.hdfs.blockcache.write.enabled">false</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory>
<lockType>hdfs</lockType>
Kindly let me know how to create a core correctly on HDFS.

Caused by: Protocol message end-group tag did not match expected tag.
This error happens because you are using an incorrect HDFS port. See the related question "hdfs - ls: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException".
Here you need to change the port from 50070 (which looks like the NameNode web UI port) to 8020, or whatever port you are using as the NameNode RPC port:
<str name="solr.hdfs.home">hdfs://10.67.5.244:50070/tmp</str>
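In the setup from the question, the start command already points at hdfs://localhost:9000, so the matching value would presumably be (a sketch, assuming the NameNode RPC port really is 9000):
<str name="solr.hdfs.home">hdfs://localhost:9000/tmp</str>
You can confirm the NameNode RPC address on the Hadoop side with:
$ hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000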

Related

Solr 8.10 sample core files and how to add new cores

I'm using regular Solr 8.10.1 (no Solr Cloud)
I start it like this: C:\solr-8.10.1\bin\solr start -p 8983
My folder structure:
- solr-8.10.1
  - server
    - solr
      - configsets
        - sample_techproducts_configs
          - conf
      - mytest
        - conf
          - lang
          data-config.xml
          managed-schema
          protwords.txt
          solrconfig.xml
          stopwords.txt
          synonyms.txt
        - data
      - samplecatalog
        - conf
          data-config.xml
          schema.xml
          solrconfig.xml
      solr.xml
I also copied files from my solr 4.3.2 instance samplecatalog to a new folder in 8.10.1.
But when I go to http://localhost:8983/solr/#/~cores
I see no cores.
solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr>
  <int name="maxBooleanClauses">${solr.max.booleanClauses:1024}</int>
  <str name="sharedLib">${solr.sharedLib:}</str>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
    <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
    <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
    <str name="zkCredentialsProvider">${zkCredentialsProvider:org.apache.solr.common.cloud.DefaultZkCredentialsProvider}</str>
    <str name="zkACLProvider">${zkACLProvider:org.apache.solr.common.cloud.DefaultZkACLProvider}</str>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory"
                       class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:600000}</int>
    <int name="connTimeout">${connTimeout:60000}</int>
    <str name="shardsWhitelist">${solr.shardsWhitelist:}</str>
  </shardHandlerFactory>
</solr>
I just want to have a sample core folder with a schema.xml, handlers, and a data-config.xml for my entities, so I can start and expand from that foundation.
I checked the tutorials but I can't find any samples or see where I can define cores via my config files.
I also checked here, but that's for a very old version.
Short answer: cd into the Solr bin directory and run solr create -c "mytest"
(see the solr create command).
Basically you can follow these few steps to define a configuration set and create the corresponding core:
Define SOLR_HOME (where to put Solr core(s) config/data) in Solr's bin/solr.in.sh, or bin\solr.in.cmd on Windows. It's recommended to keep it separate from the Solr sources & binaries.
Create/move your configuration set into the SOLR_HOME directory and ensure the solr user has ownership of it.
Run the solr create command.
Here is a bash script based on one I often use that does the job (I noticed you are on a Windows machine, but the principle remains the same; see the Windows sketch after the script):
#!/bin/bash
SOLR_SRC="/opt/solr" # symlink to your solr-<version> directory
SOLR_ROOT="/var/solr"
SOLR_HOME="${SOLR_ROOT}/data"
CORE="mytest"
# Create core config set in SOLR_HOME
cd ${SOLR_HOME}/
mkdir -p ${CORE}/data
# cp -R ${SOLR_SRC}/server/solr/configsets/_default/conf/ ./${CORE}/ # from default conf
cp -R ${SOLR_SRC}/server/solr/${CORE}/conf/ ./${CORE}/
# Set ownership
chown -R solr:solr ${SOLR_HOME}
# Create core
su - solr -c "${SOLR_SRC}/bin/solr create -c ${CORE}"
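For a rough Windows equivalent of the same steps (a sketch; it assumes SOLR_HOME was already set in bin\solr.in.cmd as described above, and copies the default configset as a starting point):
set SOLR_SRC=C:\solr-8.10.1
xcopy /E /I %SOLR_SRC%\server\solr\configsets\_default\conf %SOLR_HOME%\mytest\conf
%SOLR_SRC%\bin\solr.cmd create -c mytest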

Failed to create solr core in Solr 8.9.0 using Solr API

I have created a Solr war file following the process described in
https://gist.github.com/fschiettecatte/836d13be0c95f1fd159e45d3af861952
as I want to run Solr as a standalone application through my specific version of the jetty server.
After creating the war file I started Solr through jetty successfully by running the below command:
$ java -Djetty.home=/var/solr -Djetty.base=/var/solr
-Dsolr.solr.home=/var/solr/solr -Dsolr.log.dir=/var/solr/solr
-Dbootstrap_confdir=/var/solr/solr/conf -Dcollection.configName=conf
-DzkRun -Djava.util.logging.config.file=/var/solr/solr/solr-log.properties
-jar /var/solr/start.jar
2021-08-20 07:49:40.869:INFO::main: Logging initialized @155ms to
org.eclipse.jetty.util.log.StdErrLog
2021-08-20 07:49:41.021:INFO:oejs.Server:main: jetty-9.4.18.v20190429;
built: 2019-05-10T18:03:12.512Z; git:
7ef7435fd940d3eb73c256b765d93aff5849c6e8; jvm 11.0.5+10
2021-08-20 07:49:41.029:INFO:oejdp.ScanningAppProvider:main:
Deployment monitor [file:///data/git/runtime/solr/webapps/] at
interval 1
2021-08-20 07:49:41.568:INFO:oejw.StandardDescriptorProcessor:main: NO
JSP Support for /solr, did not find
org.apache.jasper.servlet.JspServlet
2021-08-20 07:49:41.572:INFO:oejs.session:main:
DefaultSessionIdManager workerName=node0
2021-08-20 07:49:41.573:INFO:oejs.session:main: No SessionScavenger
set, using defaults
2021-08-20 07:49:41.573:INFO:oejs.session:main: node0 Scavenging every 600000ms
2021-08-20 07:49:41.575:WARN:oejs.SecurityHandler:main:
ServletContext@o.e.j.w.WebAppContext@294425a7{solr,/solr,file:///tmp/jetty-0.0.0.0-8983-solr.war-_solr-any-17126331786779836602.dir/webapp/,STARTING}{/solr.war}
has uncovered http methods for path: /
ERROR StatusLogger No Log4j 2 configuration file found. Using default
configuration (logging only errors to the console), or user
programmatically provided configurations. Set system property
'log4j2.debug' to show Log4j 2 internal initialization logging. See
https://logging.apache.org/log4j/2.x/manual/configuration.html for
instructions on how to configure Log4j 2
2021-08-20 07:49:43.647:INFO:oejsh.ContextHandler:main: Started
o.e.j.w.WebAppContext@294425a7{solr,/solr,file:///tmp/jetty-0.0.0.0-8983-solr.war-_solr-any-17126331786779836602.dir/webapp/,AVAILABLE}{/solr.war}
2021-08-20 07:49:43.652:INFO:oejs.AbstractConnector:main: Started
ServerConnector@6da9dc6{HTTP/1.1,[http/1.1]}{0.0.0.0:8983}
2021-08-20 07:49:43.653:INFO:oejs.Server:main: Started @2939ms
My Solr is working just fine; I ran a few commands to verify:
$ curl "http://0.0.0.0:8983/solr/admin/collections?action=clusterstatus&wt=xml"
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">19</int>
  </lst>
  <lst name="cluster">
    <lst name="collections"/>
    <arr name="live_nodes">
      <str>192.168.1.2:8983_solr</str>
    </arr>
  </lst>
</response>
When I tried to create a core it failed with the below error. Before running this command I had created a folder named “a10” under the Solr home directory “/var/solr/solr/cores”:
$ curl "http://0.0.0.0:8983/solr/admin/cores?action=CREATE&name=a10&instanceDir=cores/a10&shard=shard10&collection=conf1&coreNodeName=a10&wt=xml"
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">400</int>
    <int name="QTime">10067</int>
  </lst>
  <lst name="error">
    <lst name="metadata">
      <str name="error-class">org.apache.solr.common.SolrException</str>
      <str name="root-error-class">org.apache.solr.cloud.ZkController$NotInClusterStateException</str>
    </lst>
    <str name="msg">Error CREATEing SolrCore 'a10': coreNodeName a10 does not exist in shard shard10, ignore the exception if the replica was deleted</str>
    <int name="code">400</int>
  </lst>
</response>
Stack trace for this error in the jetty console:
07:54:36.047 [qtp466505482-21] ERROR org.apache.solr.handler.RequestHandlerBase - org.apache.solr.common.SolrException: Error CREATEing SolrCore 'a1': coreNodeName a1 does not exist in shard shard1, ignore the exception if the replica was deleted
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1136)
at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:92)
at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:360)
at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:396)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:180)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:758)
at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:739)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:511)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1700)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1667)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:505)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:132)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:724)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:830)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.solr.cloud.ZkController$NotInClusterStateException: coreNodeName a1 does not exist in shard shard1, ignore the exception if the replica was deleted
at org.apache.solr.cloud.ZkController.checkStateInZk(ZkController.java:1874)
at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1773)
at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1180)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1097)
... 41 more
My issue was resolved when I started Solr in standalone mode. I had started Solr in cloud mode (note the -DzkRun flag above), and that is why the core create command was failing.
The CoreAdmin API should be used only when Solr is started in standalone mode; the Collections API should be used when Solr is started in cloud mode.
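For illustration, in cloud mode the equivalent create request would go through the Collections API instead, along these lines (a sketch; the shard and replica counts are placeholders):
$ curl "http://0.0.0.0:8983/solr/admin/collections?action=CREATE&name=a10&numShards=1&replicationFactor=1&wt=xml"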

Solr 8.4.1 cloud : bin/post - File not Found problem

I am new to Solr and have been working through the tutorial for 8.4.0. Having successfully followed the techproducts example using SolrCloud, I'm now trying to use a schemaless approach to index some PDF files. For that, I used the following, again from the tutorial, to index several files which are stored in the ~/Documents/pdf folder:
bin/solr create -c localpdf -s 2 -rf 2
bin/post -c localpdf ~/Documents/pdf
When executing the above, I get the following error:
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/localpdf/update/extract. Reason:
<pre> Not Found</pre></p>
</body>
</html>
SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: http://localhost:8983/solr/localpdf/update/extract?resource.name=%2Fhome%2Fuser%2FDocuments%2Fpdf%2Ftest234.pdf&literal.id=%2Fhome%2Fuser%2FDocuments%2Fpdf%2Ftest234.pdf
Running the same command with techproducts, i.e. running:
bin/post -c techproducts ~/Documents/pdf
at least finds the files (it gives me some other errors related to PDFBox and some fonts, but that's another matter).
I can add other files to localpdf, for instance XML files from the example/exampledocs folder, but not the PDFs.
What am I missing here?
You must configure your core / collection to load the extracting request handler - otherwise it's not available. The techproducts core does this by default. Add the jars to the list of jars to load:
<lib dir="${solr.install.dir:../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../..}/dist/" regex="solr-cell-\d.*\.jar" />
And add the request handler definition (from the guide linked above):
<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="uprefix">ignored_</str>
  </lst>
  <!-- Optional. Specify a path to a tika configuration file. See the Tika docs for details. -->
  <str name="tika.config">/my/path/to/tika.config</str>
  <!-- Optional. Specify one or more date formats to parse. See DateUtil.DEFAULT_DATE_FORMATS
       for default date formats -->
  <lst name="date.formats">
    <str>yyyy-MM-dd</str>
  </lst>
  <!-- Optional. Specify an external file containing parser-specific properties.
       This file is located in the same directory as solrconfig.xml by default. -->
  <str name="parseContext.config">parseContext.xml</str>
</requestHandler>
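Note that since localpdf was created in SolrCloud mode, the edited solrconfig.xml has to be pushed back to ZooKeeper before it takes effect; a rough sketch of the full cycle (the configset name, conf path, and ZooKeeper port are assumptions based on the default cloud example):
bin/solr zk upconfig -n localpdf -d server/solr/configsets/localpdf/conf -z localhost:9983
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=localpdf"
bin/post -c localpdf ~/Documents/pdf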

apache solr provisioning - how to keep config in VCS but data on machine

I need to make some adaptations in a project that utilizes Apache Solr for fulltext searches. Someone configured everything on the production machine and I want to prepare everything locally and deploy the whole new version at once.
I already created a working Vagrant setup for everything and it works well.
But my problem is: I am not very experienced with configuring Apache Solr and can't manage to get it working.
Here is my installation script:
apt-get install -q -y openjdk-8-jdk
# install apache solr
if [[ ! -e "/etc/default/solr.in.sh" ]]
then
    wget http://www-eu.apache.org/dist/lucene/solr/7.7.1/solr-7.7.1.tgz
    tar xzf solr-7.7.1.tgz solr-7.7.1/bin/install_solr_service.sh --strip-components=2
    chmod u+x ./install_solr_service.sh
    ./install_solr_service.sh solr-7.7.1.tgz
    cat /vagrant/config/solr/solr.in.sh >> /etc/default/solr.in.sh
    rm -f /opt/solr-7.7.1/server/solr/solr.xml
    ln -s /vagrant/config/solr/solr.xml /opt/solr-7.7.1/server/solr/solr.xml
fi
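After the installer runs, a quick sanity check (assuming the default service name that install_solr_service.sh creates) is:
sudo service solr status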
contents of /vagrant/config/solr/solr.in.sh
(content taken from the production config - I don't really understand its purpose)
# this is just a partial file - we append its contents to the original
SOLR_RECOMMENDED_OPEN_FILES=65000
Content of linked solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr>
  <str name="coreRootDirectory">${coreRootDirectory:/vagrant/config/solr/cores}</str>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
    <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
    <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
    <str name="zkCredentialsProvider">${zkCredentialsProvider:org.apache.solr.common.cloud.DefaultZkCredentialsProvider}</str>
    <str name="zkACLProvider">${zkACLProvider:org.apache.solr.common.cloud.DefaultZkACLProvider}</str>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory"
                       class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:600000}</int>
    <int name="connTimeout">${connTimeout:60000}</int>
    <str name="shardsWhitelist">${solr.shardsWhitelist:}</str>
  </shardHandlerFactory>
</solr>
The cores directory contains all the information from the production machine; I just added the following value to the core.properties file within each core:
dataDir=/var/solr/data/NAME_OF_CORE
I figured this way the data would be part of my machine but the config would be part of my repository.
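For reference, a complete core.properties along those lines might look like this (a sketch; the property values are illustrative, not taken from the production setup):
name=NAME_OF_CORE
config=solrconfig.xml
schema=schema.xml
dataDir=/var/solr/data/NAME_OF_CORE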
But when I browse to localhost:8983 (which works perfectly) I don't see any core. Neither can I create a new core; when creating a new core called "new_core" it says:
new_core: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load conf for core new_core: Error loading solr config from /var/solr/data/new_core/conf/solrconfig.xml
So, how would I provision Solr correctly to keep all my config in git but the data on the machine?
The company that set up everything is not helpful; they provide ZERO information.
Kind regards,
Philipp

Solr 3.1 doesn't index the file

I have configured Solr 3.1 with Apache Tika 0.9 successfully.
I didn't change schema.xml (the default schema) or the solrconfig.xml file.
I have passed this command to the browser:
http://localhost:8080/solr/update/extract?literal.id=post1&commit=true%20-F%20%22myfile=#D:\code.txt%22
Output:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">593</int>
  </lst>
</response>
But whenever I search from http://localhost:8080/solr/admin/ with *:*, it doesn't give any record.
Please help me with that ASAP.
Thanks
Dhaval,
I think the myfile=#D:\code.txt syntax is only understood by the command-line utility curl (whose file-upload operator is actually @, not #). Most browsers won't support it.
Retry it with curl, and I think it should work for you. Then look at further Solr examples for how to do the post from a browser, if you really need to.
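For reference, the working curl form of that request would be along these lines (the @ prefix tells curl to upload the file as multipart form data):
curl "http://localhost:8080/solr/update/extract?literal.id=post1&commit=true" -F "myfile=@D:\code.txt"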
