Solr Cloud in Kubernetes Indexing error - HttpSolrCall Unable to write response

I am trying to run indexing with Solr Cloud running in a Kubernetes cluster. I defined a Data Import Handler and I can see its configuration in the Solr UI.
The Data Import Handler lets me trigger a SQL query and fetch the polygon data for building the index.
<dataConfig>
<dataSource
type="JdbcDataSource" processor="XPathEntityProcessor"
driver="oracle.jdbc.driver.OracleDriver" ...... />
<document>
<entity name="pcode" pk="PC" transformer="ClobTransformer"
query="select PCA as PC, GEOM as WPOLYGON,
SBJ,PD,CD
from SCPCA
where SBC is not null">
<field column="PC" name="pCode" />
<field column="WPOLYGON" name="wpolygon" clob="true"/>
<field column="SBJ" name="sbjcode" clob="true"/>
<field column="PD" name="portid"/>
<field column="CD" name="cancid"/>
</entity>
</document>
</dataConfig>
After I trigger the import via the UI, it runs for around 1 minute and then fails with the following errors in the console:
qtp1046545660-14) [c:sba s:shard1 r:core_node6 x:sba_shard1_replica_n4] o.a.s.u.p.LogUpdateProcessorFactory [sba_shard1_replica_n4] webapp=/solr path=/dataimport params={core=sba&debug=true&optimize=false&indent=on&commit=true&name=dataimport&clean=true&wt=json&command=full-import&_=164589234356779&verbose=true}{deleteByQuery=*:*,commit=} 0 70343
2022-02-26 16:30:38.092 INFO (qtp10465423460-14) [c:sba s:shard1 r:core_node6 x:sba_shard1_replica_n4] o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down => org.eclipse.jetty.io.EofException: Reset cancel_stream_error
at org.eclipse.jetty.http2.server.HTTP2ServerConnectionFactory$HTTPServerSessionListener.onReset(HTTP2ServerConnectionFactory.java:159)
org.eclipse.jetty.io.EofException: Reset cancel_stream_error
I am using Solr Cloud 8.9 with Solr operator 0.5.0. I checked the Jetty config and it has an idle timeout of 120000.
Has anyone faced a similar issue and fixed it?

Jetty's EofException almost always means one specific thing. The client
closed the connection before Solr could respond, so when Solr finally
finished processing and tried to have Jetty send the response, there was
nowhere to send it -- the connection was gone.
In my case I was doing a full data import to Solr and it failed with this "HttpSolrCall Unable to write response" EofException. The cause was my managed-schema / schema.xml: I had not declared all the imported columns correctly in the schema, which made the indexing fail with the EofException. After correcting the schema it worked fine.
It is a bit confusing that a wrong schema surfaces as an EofException. However, with Solr it is always worth checking schema.xml / managed-schema for discrepancies first.
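A quick way to look for the kind of mismatch described above is to compare the field names the DIH config writes to with the fields the schema declares. This is only a minimal sketch: the two XML snippets are made-up stand-ins (not the asker's real files), the helper names are hypothetical, and it ignores dynamicField and copyField rules, so a reported name may still be covered by a dynamic field.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down DIH config and schema; load your own files instead.
DATA_CONFIG = """
<dataConfig>
  <document>
    <entity name="pcode" query="select 1 from dual">
      <field column="PC" name="pCode"/>
      <field column="WPOLYGON" name="wpolygon"/>
      <field column="SBJ" name="sbjcode"/>
    </entity>
  </document>
</dataConfig>
"""

SCHEMA = """
<schema name="example" version="1.6">
  <field name="pCode" type="string" indexed="true" stored="true"/>
  <field name="wpolygon" type="string" indexed="true" stored="true"/>
</schema>
"""


def dih_target_fields(data_config_xml):
    """Return the Solr field names that the DIH entities write to."""
    root = ET.fromstring(data_config_xml)
    # Fall back to the column name when no explicit name is given.
    return {f.get("name") or f.get("column") for f in root.iter("field")}


def schema_fields(schema_xml):
    """Return the names of the explicitly declared <field> elements."""
    root = ET.fromstring(schema_xml)
    return {f.get("name") for f in root.iter("field")}


def missing_fields(data_config_xml, schema_xml):
    """Return DIH target fields that the schema does not declare, sorted."""
    return sorted(dih_target_fields(data_config_xml) - schema_fields(schema_xml))


print(missing_fields(DATA_CONFIG, SCHEMA))  # prints ['sbjcode']
```

Any name it prints is a candidate for a missing or misspelled field declaration in the schema.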

Related

Can not apply patch LUCENE-2899.patch to SOLR on Windows

I am trying to apply the patch LUCENE-2899.patch to Solr.
So far I have:
Cloned Solr from the official repo (I am on the master branch).
Downloaded and installed Ant and GNU patch; I got patch from http://gnuwin32.sourceforge.net/packages/patch.htm
Added Ant and GNU patch to the PATH environment variable.
Then I got this:
```
D:\utils\solr_master\lucene-solr>patch -p1 -i LUCENE-2899.patch --dry-run
patching file dev-tools/idea/.idea/ant.xml
Assertion failed: hunk, file ../patch-2.5.9-src/patch.c, line 354
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
```
UPDATE 1
I tried to compile, but the build failed.
D:\utils\solr_master\lucene-solr>ant compile
Buildfile: D:\utils\solr_master\lucene-solr\build.xml
BUILD FAILED
D:\utils\solr_master\lucene-solr\build.xml:21: The following error occurred while executing this line:
D:\utils\solr_master\lucene-solr\lucene\common-build.xml:623: java.lang.NullPointerException
at java.util.Arrays.stream(Arrays.java:5004)
at java.util.stream.Stream.of(Stream.java:1000)
at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:545)
at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:438)
at org.apache.tools.ant.util.ChainedMapper.lambda$mapFileName$1(ChainedMapper.java:36)
at java.util.stream.ReduceOps$1ReducingSink.accept(ReduceOps.java:80)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:484)
at org.apache.tools.ant.util.ChainedMapper.mapFileName(ChainedMapper.java:35)
at org.apache.tools.ant.util.CompositeMapper.lambda$mapFileName$0(CompositeMapper.java:32)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:545)
at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:438)
at org.apache.tools.ant.util.CompositeMapper.mapFileName(CompositeMapper.java:33)
at org.apache.tools.ant.taskdefs.PathConvert.execute(PathConvert.java:363)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:292)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:346)
at org.apache.tools.ant.Target.execute(Target.java:448)
at org.apache.tools.ant.helper.ProjectHelper2.parse(ProjectHelper2.java:172)
at org.apache.tools.ant.taskdefs.ImportTask.importResource(ImportTask.java:221)
at org.apache.tools.ant.taskdefs.ImportTask.execute(ImportTask.java:165)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:292)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:346)
at org.apache.tools.ant.Target.execute(Target.java:448)
at org.apache.tools.ant.helper.ProjectHelper2.parse(ProjectHelper2.java:183)
at org.apache.tools.ant.ProjectHelper.configureProject(ProjectHelper.java:93)
at org.apache.tools.ant.Main.runBuild(Main.java:824)
at org.apache.tools.ant.Main.startAnt(Main.java:228)
at org.apache.tools.ant.launch.Launcher.run(Launcher.java:283)
at org.apache.tools.ant.launch.Launcher.main(Launcher.java:101)
Total time: 0 seconds
UPDATE 2
I have downloaded Solr from
https://builds.apache.org/job/Solr-Artifacts-7.3/lastSuccessfulBuild/artifact/solr/package/ and https://builds.apache.org/job/Solr-Artifacts-master/lastSuccessfulBuild/artifact/solr/package/
but I don't see an opennlp dir in the contrib dir for either the 7.3 or the 8.0 (master) version. Where can I find it?
UPDATE 3
I ran the master-branch build which I downloaded from https://builds.apache.org/job/Solr-Artifacts-master/lastSuccessfulBuild/artifact/solr/package/ and tried to run OpenNLP like the gentleman in this post:
Exception while integrating openNLP with Solr
But I get the same error as he did.
numberplate_shard1_replica_n1:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load conf for core numberplate_shard1_replica_n1: Can't load schema managed-schema: Plugin init failure for [schema.xml] fieldType "text_opennlp_nvf": Plugin init failure for [schema.xml] analyzer/tokenizer: Error instantiating class: 'org.apache.lucene.analysis.opennlp.OpenNLPTokenizerFactory'
If the patch LUCENE-2899 has been merged into master, why do I get this error?
UPDATE 5
I restarted Solr and the errors were gone. But...
I then tried to add fields (to managed-schema) from the example ( https://wiki.apache.org/solr/OpenNLP ):
<fieldType name="text_opennlp" class="solr.TextField">
<analyzer>
<tokenizer class="solr.OpenNLPTokenizerFactory"
sentenceModel="opennlp/en-sent.bin"
tokenizerModel="opennlp/en-token.bin"
/>
</analyzer>
</fieldType>
<field name="content" type="text_opennlp" indexed="true" termOffsets="true" stored="true" termPayloads="true" termPositions="true" docValues="false" termVectors="true" multiValued="true" required="true"/>
But when I try to run Solr in Cloud mode I get this:
D:\utils\solr-7.3.0-7\solr-7.3.0-7\bin>solr -e cloud
Welcome to the SolrCloud example!
This interactive session will help you launch a SolrCloud cluster on your local workstation.
To begin, how many Solr nodes would you like to run in your local cluster? (specify 1-4 nodes) [2]:
1
Ok, let's start up 1 Solr nodes for your example SolrCloud cluster.
Please enter the port for node1 [8983]:
Solr home directory D:\utils\solr-7.3.0-7\solr-7.3.0-7\example\cloud\node1\solr already exists.
Starting up Solr on port 8983 using command:
"D:\utils\solr-7.3.0-7\solr-7.3.0-7\bin\solr.cmd" start -cloud -p 8983 -s "D:\utils\solr-7.3.0-7\solr-7.3.0-7\example\cloud\node1\solr"
Waiting up to 30 to see Solr running on port 8983
Started Solr server on port 8983. Happy searching!
INFO - 2018-03-26 14:42:26.961; org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at localhost:9983 ready
Now let's create a new collection for indexing documents in your 1-node cluster.
Please provide a name for your new collection: [gettingstarted]
numberplate
Collection 'numberplate' already exists!
Do you want to re-use the existing collection or create a new one? Enter 1 to reuse, 2 to create new [1]:
1
Enabling auto soft-commits with maxTime 3 secs using the Config API
POSTing request to Config API: http://localhost:8983/solr/numberplate/config
{"set-property":{"updateHandler.autoSoftCommit.maxTime":"3000"}}
ERROR: Error from server at http://localhost:8983/solr: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/numberplate/config. Reason:
<pre> Not Found</pre></p>
</body>
</html>
SolrCloud example running, please visit: http://localhost:8983/solr
D:\utils\solr-7.3.0-7\solr-7.3.0-7\bin>
UPDATE 6
I created a new collection and now get a more precise error:
test_collection_shard1_replica_n1: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load conf for core test_collection_shard1_replica_n1: Can't load schema managed-schema: org.apache.solr.core.SolrResourceNotFoundException: Can't find resource 'opennlp/en-sent.bin' in classpath or '/configs/_default', cwd=D:\utils\solr-7.3.0-7\solr-7.3.0-7\server
Please check your logs for more information
Maybe I need to copy the OpenNLP models ( http://opennlp.sourceforge.net/models-1.5/ ) somewhere?
But where should I put these models?
Can you help me? What am I doing wrong?
As you can see on LUCENE-2899, the patch is already applied to 8.0 (master), as well as 7.3.
You can find pre-built nightlies at Solr-Artifacts-master for (currently) 8.0 and at Solr-Artifacts-7.3 for 7.3.
The opennlp libraries are bundled inside the artifacts:
solr-8.0.0-3304 find . -name '*nlp*'
[...]
./contrib/langid/lib/opennlp-tools-1.8.3.jar
./contrib/analysis-extras/lib/opennlp-maxent-3.0.3.jar
./contrib/analysis-extras/lib/opennlp-tools-1.8.3.jar
./contrib/analysis-extras/lucene-libs/lucene-analyzers-opennlp-8.0.0-3304.jar
You then have to tell Solr to load these jars, which you can do through solrconfig.xml.
<lib dir="../../../contrib/analysis-extras/lib/" regex="opennlp-.*\.jar" />
<lib dir="../../../contrib/analysis-extras/lucene-libs/" regex="lucene-analyzers-opennlp-.*\.jar" />
Confirm that the jars are loaded as you expect in Solr's log file.
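Before restarting Solr, it can help to check which jar names a given regex would actually select. The sketch below is just an illustration under one assumption: that the lib directive's regex must match the whole file name, which is how I understand it to behave. The jar names come from the find listing above; the helper name is made up.

```python
import re

# Jar file names taken from the `find` listing above.
JARS = [
    "opennlp-maxent-3.0.3.jar",
    "opennlp-tools-1.8.3.jar",
    "lucene-analyzers-opennlp-8.0.0-3304.jar",
]


def jars_matching(pattern, names):
    """Return the names that the pattern matches in full (assumed <lib> behaviour)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


# Regex for the lib/ directory: picks up only the two OpenNLP jars.
print(jars_matching(r"opennlp-.*\.jar", JARS))
# Regex for the lucene-libs/ directory: picks up only the analyzer jar.
print(jars_matching(r"lucene-analyzers-opennlp-.*\.jar", JARS))
```

If a pattern returns an empty list here, the matching lib line would most likely load nothing either.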

Calling HTTPS URL with SOLR's DataImportHandler returns 403

(This took me a while, so I'm providing the Question and Answer thinking it's worth it.)
The URL from which the DataImportHandler has to retrieve the data is secured via HTTPS and an additional auth parameter. The configuration of the DataImportHandler looks like this:
<dataConfig>
<dataSource type="URLDataSource"
baseUrl="https://www.gutscheinpony.de/"
encoding="UTF-8"/>
<document>
<entity name="pony"
pk="id"
url="feeds.xml?auth=XXX"
processor="XPathEntityProcessor"
forEach="/data/offers/offer"
xsl="xslt/gutscheinpony.xsl">
<!-- fields omitted -->
</entity>
</document>
</dataConfig>
Running this on a regular SOLR 6 installation will fail with a 403 Forbidden code while a quick test on the same URL via curl succeeds (showing only the interesting output):
curl https://www.gutscheinpony.de/feeds.xml?auth=XXX -Iv
> Host: www.gutscheinpony.de
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
Is it possible to set the User Agent for DataImportHandler connections without writing custom Java code?
The difference is the User-Agent header. curl sends curl/7.43.0, while Java's built-in HTTP client sends only a generic default (Java/<version>) unless told otherwise, and this server apparently rejects that. Neither Solr nor the DataImportHandler overrides it for you.
You can set the User-Agent for a Java process with the system property http.agent. The value only matters if the remote server checks it.
Thus, the DataImportHandler will run fine when Solr is started like this:
bin/solr -f -Dhttp.agent="test/me"
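To confirm that the User-Agent really is what the server objects to, you can reproduce the request outside Solr with different header values. This is only a rough sketch using Python's standard library; the function names are made up, and probe needs network access to the feed URL.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Build a HEAD request that carries an explicit User-Agent header."""
    return urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )


def probe(url, user_agent):
    """Send the request and return the HTTP status code (needs network access)."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; report their status code.
        return err.code
```

For example, comparing probe(url, "curl/7.43.0") with probe(url, "Java/1.8.0") (the second value is just an illustrative guess at what a JVM might send) should show whether the 403 depends on the header.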

Errors while trying to configure Solr 5.3.1 on Windows 10

I'm trying to set up a very basic configuration of Solr, to read some text from a MySQL table and index it. I'm following the steps in the DIH Quick Start document.
The document doesn't tell you where to place solrconfig.xml.
At first I tried placing it under the solr5.3.1 folder (next to bin). That failed. Then I noticed the "add core" button was looking for it in server\solr\new_core. So I put it there, but then got this other error:
My data import handler looks like this:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
And here's data-config.xml:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/ctcrets"
user="root"
password="xxxx"/>
<document>
<entity name="id"
query="select RETS_STAGE1_QUEUE_ID as id, LN_LIST_NUMBER as name, xmlText as desc from RETS_STAGE1_QUEUE">
</entity>
</document>
</dataConfig>
What could be the problem?
The document assumes you already know the solr.home [1] directory structure. On top of that, I think it assumes you started the sample Solr instance (e.g. ./solr start -p 8984), where everything should already be set up.
Once it has started, you can see on the dashboard exactly where the configuration is located. Go there, change the files as suggested and RELOAD the core through the admin console (CoreAdmin). If you prefer, you can also stop and restart Solr.
Side notes:
DIH is not part of the Solr core, so solrconfig.xml needs a "lib" directive for it. As far as I remember, the sample config already has those directives, so you don't need to "import" the DIH lib yourself.
The JDBC driver that allows the connection to the database is not included, so your classpath (i.e. the JVM or Solr classpath, through the same lib directive) must include this additional lib.
[1] http://www.solrtutorial.com/configuring-solr.html

SOLR Field not reflected in schema browser

I created a Solr core using bin/solr -c core1, then copied the schema.xml file from the basic config set to the core1/conf folder and added a field
<field name="title" type="text" indexed="true" stored="true"/>.
But this field is not reflected in the schema browser.
What configuration do I need so that new fields show up in the schema browser in the Solr admin UI?
I am using Solr 5.3.1.
By default, a newly created Solr core uses the managed schema. You will see the following configuration in solrconfig.xml after the core is created.
<schemaFactory class="ManagedIndexSchemaFactory">
<bool name="mutable">true</bool>
<str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
Above this configuration you will find comments on how to use managed-schema. Comment this block out and uncomment the following to use schema.xml instead:
<schemaFactory class="ClassicIndexSchemaFactory"/>
Then you need to reload the core: go to http://yourhost:8983/solr/#/~cores/core1 and press the "Reload" button.
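If you would rather not click through the UI, the same reload can be requested through the CoreAdmin API's RELOAD action. Here is a small sketch that only builds the URL; the host and core name are the placeholders used above, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def reload_core_url(base_url, core):
    """Build the CoreAdmin RELOAD URL for one core."""
    query = urlencode({"action": "RELOAD", "core": core})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


print(reload_core_url("http://yourhost:8983/solr", "core1"))
# prints http://yourhost:8983/solr/admin/cores?action=RELOAD&core=core1
```

Opening that URL in a browser or with curl should have the same effect as the "Reload" button.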

Configure DataImportHandler in SolrCloud with ZooKeeper

I have a SolrCloud configured like this: exploration of SolrCloud; the difference is that I use Solr 4.0.0 Beta. In short, the configuration is:
ZooKeeper on default port 2181
3 instances of Solr running on different ports
This is just for testing purposes. The desired configuration has 3 ZooKeeper instances (one for every Solr instance). I have managed to index some XML files with a curl command.
Questions:
How can I configure DIH per collection? I managed to change solrconfig.xml (the config for the dataimport handler) and to add the proper driver for the DB connection to lib, but in the Solr admin I get "sorry, no dataimport-handler defined!" The changes are visible in ZooKeeper (I see the data_config.xml), and in the Solr admin panel I can see the updated version of solrconfig.xml.
Is there any good tutorial for a production deployment of SolrCloud (with something like the desired configuration mentioned before) on a single machine or multiple machines running Ubuntu 12.04 LTS?
Any advice would be appreciated! Thanks in advance!
Normally the DIH config has nothing to do with whether you're using a single Solr instance or multiple instances in a SolrCloud setup. DIH writes the data through the current instance, and it's then up to SolrCloud (coordinated through ZooKeeper) to spread it to the other instances.
Make sure your DIH is properly configured:
In solrconfig.xml, all necessary libraries must be loaded. This means the two DIH jars:
<lib dir="../../../dist/" regex="solr-dataimporthandler-4.3.0.jar" />
<lib dir="../../../dist/" regex="solr-dataimporthandler-extras-4.3.0.jar" />
as well as any other jars you may need (such as the database JDBC driver).
Still in solrconfig.xml, make sure the DIH handler is declared, something like this:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
Finally, the config file you declared in the DIH handler (data-config.xml) should be in the same "conf" dir as solrconfig.xml and should have proper content, something like:
<dataConfig>
<dataSource type="JdbcDataSource" name="myDataSource" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@someHost:1521:someDb" user="someUser" password="somePassword" batchSize="5000"/>
<document name="myDoc" >
<entity name="myDoc" dataSource="myDataSource" transformer="my.custom.Transformer" query="select col1, col2, col3 from table1 where whatever" />
</document>
</dataConfig>
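One easy slip in a config like this is an entity whose dataSource attribute does not exactly match the name of any declared dataSource (a different letter case is enough). The following is a minimal sketch of a check for that, assuming names are compared exactly; the sample XML and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical config with a deliberate case mismatch in the entity's dataSource.
SAMPLE = """
<dataConfig>
  <dataSource type="JdbcDataSource" name="myDataSource"/>
  <document name="myDoc">
    <entity name="myDoc" dataSource="myDatasource" query="select col1 from table1"/>
  </document>
</dataConfig>
"""


def undefined_data_sources(data_config_xml):
    """Return dataSource names used by entities but never declared, sorted."""
    root = ET.fromstring(data_config_xml)
    declared = {ds.get("name") for ds in root.iter("dataSource")}
    # Entities without a dataSource attribute use the default and are skipped.
    used = {e.get("dataSource") for e in root.iter("entity") if e.get("dataSource")}
    return sorted(used - declared)


print(undefined_data_sources(SAMPLE))  # prints ['myDatasource']
```

An empty list means every entity refers to a dataSource that is actually declared.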

Resources